At the beginning there was ... chaos
I think many people know that I am a big fan of structured data mining. As Noel and Peter have highlighted, wikis are one way of collecting this kind of information on a global level. Wikis are also nice because even the data structures (which are just part of the wiki) can evolve.
This idea is not new, but chemical data is still not there yet! Yesterday I raised (again) the question of what would be the best way to extend Wikipedia with structure searches. I was one of those who set up a first attempt at enabling PubChem and eMolecules searches. I think you will agree that this is not general enough, because there are more databases out there. In particular, ChemSpider (@Wikipedia) is riding the web 2.0 wave very well, and it would be nice to see even more sources joining.
I must say that I had several email discussions behind the scenes and, believe me, sometimes I hate it that nobody in a community feels responsible, but everybody has an opinion. In other words, I need more support.
Dear cheminformatics community and friends of structure searches within Wikipedia,
Only if we get unified Wikipedia entries for molecular structures and proper link-outs to all relevant pages do I see a chance that the community can benefit from the massive collaboration efforts.
Please help to reach the tipping point and support this (yes, this!) request by adding comments or offering support!
1. I would like to get unique identifiers within Wikipedia that do not break the drug box, the chem box, or intermediate solutions based on Wikipedia templates. I also do not like unreadable, lengthy SMILES or InChI strings on Wikipedia pages. They do not really add value, because most people will not use them for large molecules.
2. I would like to get multiple search services, not only a few. This is crucial for creating knowledge.
3. I would like to see structure identifiers indexed by Google, which is impossible for reformatted SMILES and InChIs on Wikipedia. On the other hand, it is impossible to keep this information in one piece in the boxes, because it screws them up.
4. I would like to get a consensus and more publicity so that the Wikipedia admins take a step forward, e.g. by supporting a special chemical search site. Otherwise we need a very robust and trustworthy supplier offering this service. Can IUPAC help here?
5. I would like to see intelligent parser functions that overcome the formatting problems of structural information. I think they are robust enough, and they are not that bad. So, please open the chemical and drug boxes for editing again!
Please comment, blog, or do anything else that helps get this message to the right people! This part is especially tricky on Wikipedia, since there might not really be 'one' person in charge.
Cheers, Joerg
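To make points 1 to 3 a bit more concrete, here is a minimal sketch of how one short, fixed-length identifier could drive link-outs to several search services at once. It assumes the InChIKey as that identifier and the 27-character layout of the standard InChIKey (14 letters, hyphen, 10 letters, hyphen, 1 letter); a beta version of the key may use a different length. The service names and URL templates are hypothetical placeholders, not the real query syntax of any database.

```python
import re
from urllib.parse import quote

# Assumed layout of a standard InChIKey: 14 letters, hyphen,
# 10 letters, hyphen, 1 letter (27 characters in total).
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Hypothetical URL templates. Real services each have their own
# query syntax, so these would need to be replaced.
SEARCH_TEMPLATES = {
    "service-a": "https://service-a.example.org/search?q={key}",
    "service-b": "https://service-b.example.org/lookup/{key}",
}


def is_inchikey(token: str) -> bool:
    """Return True if the token has the assumed InChIKey layout."""
    return INCHIKEY_RE.fullmatch(token) is not None


def link_outs(key: str) -> dict:
    """Build one search URL per configured service for a single key."""
    if not is_inchikey(key):
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return {
        name: template.format(key=quote(key))
        for name, template in SEARCH_TEMPLATES.items()
    }
```

Because the key is a single short token without brackets or slashes, it does not need to be reformatted to fit into a template box, and adding another search service only means adding one more entry to the table of templates.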


3 comments:
I wonder whether the right tool for doing what you're asking currently exists. Wikipedia is a general-purpose information system just as Google is. They may be _too_ general.
Chemical information is very peculiar. Molecular information, in particular, requires all sorts of special support mechanisms that, as we've seen, don't exactly play nice with others.
Maybe it's possible to craft those solutions around existing platforms; maybe it's time to think of alternatives.
BTW, would it be possible for you to include pictures (or links) on your blog that illustrate the kinds of issues you see w/ Wikipedia (i.e., showing the broken Drug Box/Chem Box)?
Joerg, you know where I stand on this, I believe. I would certainly love to facilitate link-outs from Wikipedia to ChemSpider. We have provided links from ChemSpider directly into Wikipedia. The accuracy of these links would be far better if we could get a list of the compound names and wiki entries to form the connections. Actually, SMILES or unbroken InChIs would suffice!
We have proven that the InChIKey is unique across 17 million ChemSpider entries. I say GO WITH IT on Wikipedia and do it now, despite the fact that it is in beta. Email exchanges with Google today suggest they have a passion for indexing the InChIKeys, so let's help them.
ChemSpider hasn't really been welcomed at Wikipedia by some, but the community supported the persistence of the page (thanks!). I believe there is a place for each of PubChem, eMolecules and ChemSpider in the boxes, but clearly I am biased. In an email exchange with Klaus Gubernator of eMolecules I offered to index their data collection in ChemSpider and point traffic back to his service, but could not get a response to my suggestion. Maybe it's time to make the same offer to PubChem and have them point out to ChemSpider? I think the request would be that we deposit all of the original data sources. Original data sources should come from the original depositors, in my opinion.
I am very passionate about supporting your venture, Joerg. I am even up for providing a dedicated search engine for Wikipedia entries if it were supported. With InChIKeys on all the entries this would not be difficult in my opinion, but we would need the support of Wiki admin.
Tell me what you need...you'll have it in spades.
I will create a new post including some previous comments and some actual problems.
I think we are not in a hurry. An open discussion might help to sort some things out and to get a clearer view on 'social chemistry'?