Saturday, July 26, 2008

WikiPathways - the challenge of community curation

"But as the influx and complexity of biological data continue to grow, so will the challenge of organizing and maintaining these databases.
Fortunately, the biology community can provide an answer that will scale with the challenge: community curation." [DOI 10.1371/journal.pbio.0060184]
WikiPathways follows the principle of community-based data curation, the same strategy adopted by other community-curated tools. The database is available as a Subversion (SVN) repository in the GenMAPP Pathway Markup Language (GPML) format. PathVisio is used for user-friendly data editing.

References
A.R. Pico, T. Kelder, M.P. van Iersel, K. Hanspers, B.R. Conklin, C. Evelo, WikiPathways: Pathway Editing for the People, PLoS Biol., 2008, 6, e184. PMID: 18651794. DOI 10.1371/journal.pbio.0060184

10 comments:

ChemSpiderMan said...

I was talking with the team this past week about how we could have mutual connectivities between our databases. Watch this space. Community curation from both sides will be excellent...clean up the small molecule database on our side and the pathways on their side for a "purer" outcome for the community.

Joerg Kurt Wegner said...

Do you know UM-BBD (Biocatalysis and Biodegradation Database)? It already contains some small-molecule information, and some of its concepts are nice.

And, of course, you should check the pioneering work of Prof. Gasteiger's group in this area. When they curated and created this knowledge (I worked on another topic, but did my master's there), they invested in a lot of people and spent a lot of time creating this information.

M. Reitz, O. Sacher, A. Tarkhov, D. Trümbach, J. Gasteiger, Enabling the exploration of biochemical pathways, Org. Biomol. Chem., 2004, 2, 3226-3237. DOI 10.1039/B410949J

ChemSpiderMan said...

Yes, we know UM-BBD. It's already on ChemSpider: https://www.chemspider.com/Search.aspx?dsn=Uni%20Minnesota%20-%20Biocatalysis/Biodegradation%20Database

I'm aware of Prof Gasteiger's work but don't know whether the database of small molecules is available for download for us to index and link back. Do you know?

Joerg Kurt Wegner said...

I checked quickly and they put it into a product called BioPath.

BTW, they also have a business model around (only) 1M Commercially Available Compounds.

ChemSpiderMan said...

I've done a proof-of-concept link to WikiPathways here for pyruvate: http://www.chemspider.com/Chemical-Structure.96901.html

Check the data source and you will see the link through: Homo sapiens:Krebs-TCA Cycle: Pyruvate

Now the question is whether the link should be from the pyruvate anion only or pyruvic acid itself or both. Comments?

Joerg Kurt Wegner said...

For me, the question is rather: in which link does pyruvate occur?

Same problem for UM-BBD. Take the acetamide example: is it a substrate, a product, or both?
Acetonitrile to Acetamide (reacID# r1315)
Acetamide to Acetate (reacID# r1440)

In other words, sometimes (or always, for those databases) it might be interesting to have links between two ChemSpider compounds that are connected by a reaction, where the reaction itself can carry meta-information pointing to other sources, e.g. proteins.
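A minimal sketch of that idea in Python, using the two acetamide reactions quoted above. The `ReactionLink` class, its field names, and the `roles` helper are hypothetical names chosen for illustration; they are not part of ChemSpider, UM-BBD, or WikiPathways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionLink:
    """A directed link between two compounds, carried by one reaction (hypothetical model)."""
    substrate: str
    product: str
    reaction_id: str
    source: str
    annotations: tuple = ()  # e.g. references to proteins; left empty here


# The two reactions mentioned in the comment above.
links = [
    ReactionLink("Acetonitrile", "Acetamide", "r1315", "UM-BBD"),
    ReactionLink("Acetamide", "Acetate", "r1440", "UM-BBD"),
]


def roles(compound, links):
    """Return the set of roles a compound plays across all reaction links."""
    found = set()
    for link in links:
        if link.substrate == compound:
            found.add("substrate")
        if link.product == compound:
            found.add("product")
    return found


print(roles("Acetamide", links))
```

Because the role lives on the link rather than on the compound, acetamide can be both a substrate and a product without any conflict.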

Joerg Kurt Wegner said...

BTW, pyruvate also has a cycle in the KEGG database, and neither the node nor the (fancy) incoming or outgoing links are mentioned in ChemSpider.

Anyway, here is the KEGG compound entry for Pyruvate.

ChemSpiderMan said...

Pyruvate might have a cycle in KEGG but here is Pyruvate on KEGG. http://www.genome.jp/dbget-bin/www_bget?cpd+C00022

It is pyruvic acid. For me this is a misassociation of name and structure but I understand why. Pyruvic acid from ChemSpider IS linked to this record on KEGG since this is the structure on KEGG, not the anion.

Joerg Kurt Wegner said...

Right, CID 1031 looks fine. I guess you used the InChIKey skeleton to find similar matches?
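One reading of "InChIKey skeleton" is the first block of the key. The sketch below assumes the standard InChIKey layout, in which the first 14-character block hashes the connectivity and the final character flags the protonation state. The helper names are hypothetical, and the two keys are given as the standard keys for pyruvic acid and its anion; neither appears in the thread.

```python
def skeleton(inchikey: str) -> str:
    """Return the first hyphen-separated block of an InChIKey (the connectivity hash)."""
    return inchikey.split("-")[0]


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True when two InChIKeys share the same first block."""
    return skeleton(key_a) == skeleton(key_b)


# Assumed standard InChIKeys: neutral acid ends in N, singly deprotonated anion in M.
PYRUVIC_ACID = "LCTONWCANYUPML-UHFFFAOYSA-N"
PYRUVATE = "LCTONWCANYUPML-UHFFFAOYSA-M"

print(same_skeleton(PYRUVIC_ACID, PYRUVATE))
```

Under this assumption the acid and the anion differ only in the trailing protonation flag, so a skeleton match would group them together.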

I still like the idea of having customized edges/links between entries (besides the InChIKey skeletons), because you could find, annotate, and curate compounds with protecting groups, e.g. t-BOC, or pro-drugs. Then you could also browse them easily or find all possible edges/links of this type.

I must admit that the complexity of this would be high. On the other hand, I assume the edge density would be low, leading only to sparse graphs.
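As a rough sketch of what such typed edges could look like, here is a small Python example that stores only the edges that exist, which suits a sparse graph. The `CompoundGraph` class, the edge-type labels, and the two example pairs are illustrative assumptions, not an existing ChemSpider feature.

```python
from collections import defaultdict


class CompoundGraph:
    """Sparse directed graph whose edges carry a curated type label (hypothetical)."""

    def __init__(self):
        # edge type -> list of (source, target) pairs; only existing edges are stored
        self._by_type = defaultdict(list)

    def add_edge(self, source, target, edge_type):
        """Record a curated link of the given type from source to target."""
        self._by_type[edge_type].append((source, target))

    def edges_of_type(self, edge_type):
        """Return all (source, target) pairs of one type, or an empty list."""
        return list(self._by_type.get(edge_type, []))

    def edge_count(self):
        """Total number of stored edges across all types."""
        return sum(len(pairs) for pairs in self._by_type.values())


graph = CompoundGraph()
# Illustrative entries: a t-BOC protected compound and a pro-drug.
graph.add_edge("Boc-glycine", "glycine", "protecting_group")
graph.add_edge("enalapril", "enalaprilat", "prodrug")

print(graph.edges_of_type("prodrug"))
```

Indexing by edge type makes "find all links of this type" a single lookup, and memory grows with the number of curated edges rather than with the number of compound pairs.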

ChemSpiderMan said...

No...didn't use the InChI skeleton...I have noticed that many people use carboxylate anion and acid nomenclature interchangeably and made an assumption...happened to be correct.

Regarding your idea...interesting idea. We'll put it on the list of other things to discuss if we run out of things to do...looks like a long time before that happens though :-)