Friday, January 17, 2014

Welcome to prooffreaderplus: for my first trick, a small database of EC numbers for biochemists

Why on earth would I want two blogs? Well, my other blog is intended to be of general interest; prooffreaderplus is where I will post the tools I make (or steal) that might be useful for others wanting to do something similar (or totally different). I'm thinking some Excel macros, databases, Python or R scripts, etc. I work in genomics, so there might be an occasional proteomics or bioinformatics post as well.

I'll start off with something simple: a database of Enzyme Commission (EC) numbers as of Jan. 1, 2014 (they are updated periodically). EC numbers catalogue a hierarchy of chemical reactions catalyzed by enzymes. Each one comprises four numbers separated by periods, each number adding a level of specificity. A few times I've had to explore this hierarchy on BRENDA or elsewhere, and I've wished I could just browse it on my own, so here it is. It's in a long, not wide, format, but with some parsing it could be normalized, pivoted, or cast, or turned into XML or JSON.
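To make the "each number adds a level of specificity" idea concrete, here is a minimal Python sketch that splits an EC number into the code for each level of the hierarchy. The function name ec_levels is my own invention for illustration; it is not part of the spreadsheet or of any library.

```python
def ec_levels(ec_number):
    """Return the code at each level of the hierarchy, broadest first.

    ec_levels is a hypothetical helper name used only for this example.
    """
    parts = ec_number.strip().split(".")
    # Join the first 1, 2, 3, ... parts back together to get each level's code.
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(ec_levels("1.1.1.1"))  # ['1', '1.1', '1.1.1', '1.1.1.1']
```

Splitting on the period rather than taking single characters matters, because the individual numbers can have more than one digit.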

Database of E.C. numbers (Google Spreadsheet)

There are eight columns: the code for a hierarchical level, then its definition, then the code for the next level, its definition, and so on. For example, the first row is:

1 | Oxidoreductases | 1.1 | Acting on the CH-OH group of donors | 1.1.1 | With NAD or NADP as acceptor | | alcohol dehydrogenase

The last two columns are different in every row; the earlier columns increment and roll over as you move down the rows. It's kind of obvious once you look at the whole data set.
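As a rough illustration of the "turned into JSON" idea, here is a small Python sketch that folds eight-column rows into a nested dictionary. It assumes each row is already a list of eight strings (code, definition, code, definition, and so on); nest_rows and SAMPLE_ROWS are names I made up, and the two sample rows are typed in by hand rather than copied from the spreadsheet.

```python
import json

# Hand-typed sample rows (an assumption, not an export of the spreadsheet).
SAMPLE_ROWS = [
    ["1", "Oxidoreductases", "1.1", "Acting on the CH-OH group of donors",
     "1.1.1", "With NAD or NADP as acceptor", "1.1.1.1", "alcohol dehydrogenase"],
    ["1", "Oxidoreductases", "1.1", "Acting on the CH-OH group of donors",
     "1.1.1", "With NAD or NADP as acceptor", "1.1.1.2", "alcohol dehydrogenase (NADP+)"],
]


def nest_rows(rows):
    """Build a nested dict from rows of four (code, definition) pairs."""
    tree = {}
    for row in rows:
        node = tree
        # Walk the pairs from the broadest level to the most specific one.
        for i in range(0, 8, 2):
            code, name = row[i].strip(), row[i + 1].strip()
            # Reuse the entry if an earlier row already created this level.
            entry = node.setdefault(code, {"name": name, "children": {}})
            node = entry["children"]
    return tree


tree = nest_rows(SAMPLE_ROWS)
print(json.dumps(tree, indent=2))
```

Because the earlier columns repeat from row to row, setdefault means each class, subclass and sub-subclass is stored once, with the individual enzymes collected underneath it.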

If this can be useful to anyone, please accept it with my compliments.
