Getting EC numbers for PDB-chains
March 7, 2008
If you ever need Enzyme Commission (EC) numbers, the enzyme classification maintained by the IUBMB, for a large number of protein structures, you can use the script below.
Just feed it a text file with one PDB id and chain identifier per line as the first argument, and it prints the EC number for each chain. I don’t know what the PDB folks would say if you tried to grab EC numbers for the whole database, but fetching a couple of hundred at a time should be fine. If you do pull down the EC numbers for every chain, I’d appreciate it if you made the list available and left a comment or dropped me a line saying where we can find it.
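#!/usr/bin/perl
use strict;
use warnings;

# Read one "PDB-id chain" entry per line from the file given as first argument.
while (my $line = <>) {
    $line =~ s/^\s+|\s+$//g;        # trim leading and trailing whitespace
    next unless length $line;       # skip blank lines
    my $id    = $line;
    my $chain = chop($id);          # the last character is the chain identifier
    $id =~ s/\s+$//;                # drop any separator left between id and chain
    # Fetch the structure page; the URL is quoted so the shell leaves '?' alone.
    my $page = `curl -s 'http://www.rcsb.org/pdb/explore.do?structureId=$id'`;
    # Scrape the EC number listed for this chain (depends on the page layout).
    my ($ecno) = $page =~ m/Chains.*\Q$chain\E.*\n.*EC no.*([0-9]+\.[0-9]+\.[0-9]+\.[0-9-]+)/;
    $ecno = '' unless defined $ecno;   # no match: print the entry without an EC number
    $line =~ s/\s+/ /g;             # normalise internal whitespace for output
    print "$line $ecno\n";
}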
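To make the input format concrete, here is a minimal sketch of how a line can be split into a PDB id and a chain, assuming (as in the script above) that the last non-blank character on the line is the chain identifier. The helper name `parse_entry` and the sample ids are made up for illustration; nothing here touches the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): split one input line
# into a PDB id and a one-character chain identifier.
sub parse_entry {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
    return unless length $line;    # blank line: nothing to parse
    my $chain = chop($line);       # last character is the chain identifier
    $line =~ s/\s+$//;             # drop any separator left after the id
    return ($line, $chain);
}

# Sample entries; the ids are placeholders chosen only for illustration.
for my $entry ("1abc A", "  2xyzB\n", "3def\tC") {
    my ($id, $chain) = parse_entry($entry);
    print "$id $chain\n";
}
```

All three sample lines come out in the same "id chain" form, so it does not matter whether the chain is separated from the id by a space, a tab, or nothing at all.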
Enjoy!










