Getting EC numbers for PDB-chains

If you ever need Enzyme Commission (EC) numbers for a large number of protein structures, you can use the script below.
Just feed it a text file with one PDB id and chain identifier per line as the first argument, and it prints the EC number for each chain. I don't know what the PDB folks would say if you tried to grab EC numbers for the whole database, but getting a couple of hundred at a time should be fine. If you do pull down the ECs for every chain, I'd appreciate it if you made the list available and left a comment or dropped me a line saying where we can find it.
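To make the input format concrete, here is a minimal sketch of how one line splits into id and chain, assuming (as the script does) that the chain identifier is the last non-blank character on the line. The helper name `split_id_chain` and the entry `1abc A` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one input line into (PDB id, chain).
# Assumes the chain identifier is the last non-blank character.
sub split_id_chain {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
    return if $line eq '';        # blank line: nothing to parse
    my $chain = chop($line);      # remove and keep the last character
    $line =~ s/\s+$//;            # drop any separator before the chain
    return ($line, $chain);
}

# "1abc A" is an illustrative entry, not a real lookup.
my ($id, $chain) = split_id_chain("  1abc A\n");
print "$id $chain\n";
```

Under that assumption, both `1abc A` and `1abcA` should yield the same id and chain.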

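#!/usr/bin/perl
use strict;
use warnings;

# Read one "PDB-id chain" entry per line from the file given as first argument.
while (my $line = <>) {
    $line =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
    next unless length $line;       # skip blank lines
    my $id = $line;
    my $chain = chop($id);          # the last character is the chain identifier
    $id =~ s/\s+$//;                # drop any separator between id and chain
    # Quote the URL so the shell does not try to interpret the '?'.
    my $page = `curl -s 'http://www.rcsb.org/pdb/explore.do?structureId=$id'`;
    # Use the capture only if the match succeeded; otherwise print an empty field.
    my $ecno = ($page =~ m/Chains.*\Q$chain\E.*\n.*EC no.*([0-9]+\.[0-9]+\.[0-9]+\.[0-9-]+)/)
        ? $1 : '';
    $line =~ s/\s+/ /g;             # normalise internal whitespace for output
    print "$line $ecno\n";
}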
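If you want to check the extraction step without hitting the server, the following sketch applies the script's pattern (with the chain identifier quoted via `\Q...\E`) to a string. The helper name `ec_for_chain` is hypothetical, and the sample text is made up purely to fit the pattern; the real page markup may well differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the EC number for one chain out of page text.
# Returns undef when the pattern does not match.
sub ec_for_chain {
    my ($page, $chain) = @_;
    return $page =~ m/Chains.*\Q$chain\E.*\n.*EC no.*([0-9]+\.[0-9]+\.[0-9]+\.[0-9-]+)/
        ? $1 : undef;
}

# Made-up text shaped to fit the pattern; not actual page content.
my $sample = "Chains: A, B\nEC no: 3.4.21.4\n";
my $ec = ec_for_chain($sample, 'A');
print defined $ec ? "$ec\n" : "no EC number found\n";
```

Note that the pattern expects the chain list and the EC number on consecutive lines, so it depends on the page layout staying exactly as it is.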

Enjoy!