As I posted already, there are still several problems with structure and substructure searches, and even with the plain indexing of chemical structures, within Wikipedia. Here are more details.
==One==
1.1. There are technical problems, because a SMILES code might contain characters reserved for the Wikipedia syntax.
1.2. Some people dislike SMILES strings for large molecules.
1.3. It does not seem easy to convince people of the relevance of unified chemical identifiers within Wikipedia, or of substructure and structure search link-outs. The Wikipedia administration in particular seems very skeptical and non-responsive.
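To make point 1.1 a bit more concrete, here is a minimal Python sketch that flags characters in a SMILES string that also carry meaning in wiki markup. The character set and the helper name `wiki_conflicts` are my own assumptions for illustration, not an official or exhaustive list.

```python
# Characters used by SMILES that also have a meaning in wiki markup.
# This set is an illustrative assumption, not an exhaustive list.
WIKI_SENSITIVE = {
    "[": "opens a link",
    "]": "closes a link",
    "=": "separates name and value in a template parameter",
    "#": "starts a numbered list item or a URL fragment",
}


def wiki_conflicts(smiles):
    """Return the sorted wiki-sensitive characters found in a SMILES string."""
    return sorted(set(smiles) & set(WIKI_SENSITIVE))


print(wiki_conflicts("CCO"))           # ethanol
print(wiki_conflicts("C[N+](C)(C)C"))  # tetramethylammonium
print(wiki_conflicts("CC#N"))          # acetonitrile
```

Whether a flagged character really breaks a page depends on where the template places the string, so this is only a first check.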
Substructure search in eMolecules and PubChem added
I have added a SMILES-based substructure search for eMolecules and PubChem. Please have a look at Cetirizine, which already contains SMILES code and can be seen as an example.
Please comment on possible improvements. Things I considered:
- How to improve visualization for SMILES? Smaller font, truncating it?
- Instead of the "search in" text, should we just use images for eMolecules and PubChem?
- Undisplaying SMILES, but keeping the external substructure search option.
JKW 13:47, 27 January 2007 (UTC)
- Hi there. Sorry to raise an objection, but wouldn't this clutter the template much? SMILES is rarely used as is, and wouldn't it really look bad on, say, large molecules such as paclitaxel or vancomycin :)? Fvasconcellos 14:10, 27 January 2007 (UTC)
- Hi, the problem is bigger: the code breaks when there is a "]" or something like that in the SMILES, which occurs in some cases. I'd be interested in a solution, though. Just as a note, there is a discussion about moving all the 'chemboxes' to the new chembox new on the chemistry wikiproject, you might want to join that discussion. --Dirk Beetstra T C 14:14, 27 January 2007 (UTC)
- I agree on both. First, SMILES is an option not a must and a workaround might be just having the link-out for the substructure search, but not displaying the SMILES for all entries. Second, breaking the Wiki-Code is a serious issue and can we not just add SMILES as raw data by using nowiki or pre tags? And in general a substructure search contains a lot of information, so I would like to keep it somehow. The same is also the case for InChI since this is indexed by Google as well. JKW 14:36, 27 January 2007 (UTC)
- I tried a nowiki tag without success, because this blocked the template replacements. I need help here. The SMILES code was excluded, but the search in eMolecules and PubChem was kept. JKW 14:55, 27 January 2007 (UTC)
- I have tried quite some things, but all seem to fail, or to just give an improperly encoded SMILES/InChI. I could code this for you (e.g.
) which would create a proper link (I have done something like that for chemical formulae, see [1], chemistry part of that test-wiki). Might have a look tomorrow. Problem is going to be to convince Brion and Tim to enable these tags on en.wikipedia.org. See you around. --Dirk Beetstra T C 15:03, 27 January 2007 (UTC)
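One way around the link breakage discussed in this thread is to percent-encode the SMILES before it goes into a URL. The following Python sketch shows the idea. The base URL and the `smiles` query parameter are placeholders I made up, not the real eMolecules or PubChem interfaces.

```python
from urllib.parse import quote, unquote

# Placeholder endpoint; not the real eMolecules or PubChem search URL.
BASE_URL = "https://example.org/substructure"


def search_link(smiles):
    """Build a search URL with every reserved character percent-encoded."""
    return BASE_URL + "?smiles=" + quote(smiles, safe="")


link = search_link("C[N+](C)(C)C")
print(link)

# The encoding is lossless: decoding recovers the original SMILES.
assert unquote(link.split("=", 1)[1]) == "C[N+](C)(C)C"
```

The encoded link no longer contains "[" or "]", so a SMILES with brackets should not close a surrounding wiki link early. Doing the same inside a template would still need an equivalent encoding step on the wiki side.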
==Two==
2.1. Besides the issues already mentioned, the drug box template causes problems for unformatted SMILES code. A very recent example from last week is the Dapagliflozin article, about an experimental SGLT2 inhibitor.
2.2. Last week I stumbled over the 'corrupted' drug box in this article.

2.3. I checked the syntax and found that a completely valid SMILES code was used. Unfortunately, a long SMILES string does not wrap automatically, so I reverted to a previous revision; the only change since that revision had been the removal of the HTML line breaks. This means we have a problem here. Either we keep a valid, unformatted SMILES entry and the drug box becomes unreadable, or we keep the box readable by inserting line breaks that leave the SMILES no longer usable as a single string.
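The dilemma can be illustrated with a short Python sketch: splitting a SMILES into fixed-width chunks is harmless for display, but only as long as the stored value stays unbroken. The function name `display_chunks`, the width of 20, and the example string are assumptions for illustration and are not taken from the article.

```python
def display_chunks(smiles, width=20):
    """Split a SMILES string into fixed-width pieces for display only."""
    return [smiles[i:i + width] for i in range(0, len(smiles), width)]


# Arbitrary long example (a long-chain fatty acid), not from any article.
stored = "C" * 30 + "(=O)O"
chunks = display_chunks(stored)

# Joining with an HTML line break changes the stored value ...
broken = "<br>".join(chunks)
assert broken != stored

# ... while joining with nothing recovers the original string exactly.
assert "".join(chunks) == stored
```

In other words, the line breaks would have to be added at display time rather than typed into the SMILES field itself.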


==Three==
3.1. The drug box template and the chem box new template are really different, and some people would like a unified solution to avoid redundancy and a lack of standards. I am, of course, one of them!
3.2. In fact, the situation is even worse, because multiple boxes and templates are in use, which increases the diversity further. Other things in use are the explosive box template and some deprecated solutions, as in the (old) Anthrone article, which was fixed yesterday by Rifleman82. Now the chembox new is used there as well. Nonetheless, there are still some (deprecated!?) templates around that might be used in other articles, e.g. the InChI link template.
3.3. Besides all those organisational problems, there is also the question of whether Wikipedia parser functions (see also MediaWiki help) should be used, and to what degree. I think that we do not have many choices, because properly encoded molecules might contain characters that cause problems for several world-wide-web applications. Believe me, I know that molecular query languages are not easy for several reasons. Anyway, as Rich also highlighted, this is not an easy problem. Sorry Rich, I really dislike the WLN. In other words, I clearly support parser functions, because they can break down syntax barriers when encoding chemistry.
I hope this gives some further insights and again, please comment, blog, distribute, or support! Anything that helps us take some steps forward is highly appreciated.


3 comments:
Hi Joerg, and I just had to reintroduce the breaks yet another time. Yes, I think this is a serious problem; destroying the layout of the wiki page does indeed not really help.
It would be nice if the IUPAC would finally come up with a recommendation on how InChIs may be broken into parts.
Joerg, there is little homogeneity across the Wikipedia boxes, as you know. From the point of view of a potential integrated "information provider" to WP, it's interesting to see PubChem, Drugbank and eMolecules integrated but, it appears, no support for adding ChemSpider. I judge that's down to the "no-support for commercial entities" approach (my words), so it's understandable. (But I do have a judgment about the list of three above in that case :-) )
So, we've gone about it the other way and have tried to form the links from ChemSpider to WP by names. It's good in many cases and WEAK in others. Look at cocaine synonyms at http://www.chemspider.com/RecordView.aspx?id=5557
Many of those are labeled with WIKI linking to records on WP, but these were done "the wrong way" in my opinion. What we need in order to link INTO WP is the set of chemistry records supplied to us, so that we can look into the DrugBox or Chemical Box for synonyms, registry IDs etc. as well as the name of the entry. Then we can link these up in a much cleaner way. PREFERABLY, if each was labeled with the InChIKey, it would be VERY easy to link up, as we would just search the InChIKeys on WP and link off the structure rather than the names. There must be care with how the keys are generated in terms of layers, so we would likely have to go after the skeleton only. It would be better than what we have today.
Caviar, *LOL*, that is funny. I forwarded you a mail, please comment there and add those wishes there.