Friday, January 12, 2007

Molecular Query Languages - FlexMol, MQL, SMARTS, and co. - from alchemy to chemistry

There were several problems with alchemy, as seen from today's standpoint. There was no systematic naming system for new compounds, and the language was esoteric and vague to the point that the terminologies meant different things to different people. Wikipedia
Over the last few weeks, several articles have been posted highlighting the problems of transforming chemical information and molecules into mineable data structures. The key problems here are:
  • the destruction of experimental results through inappropriate data representations that lose the underlying semantics. Example1, Example2. According to Wikipedia, 'Semantics refers to the aspects of meaning that are expressed in a language, code, or other form of representation.' So why not keep all information in a structured way?
  • the lack of collaborative features for complex data structures. I strongly hope that a collaboration service for molecules will evolve over time. Current topics might be
    • collaborative structure-drawing services that allow chemical properties to be assigned
    • PDB-based collaboration tools, so that we do not have to re-edit each PDB file from scratch. Different interpretations may be possible, but that is not a problem: just provide them all with a ranking. But do not drop information, and learn from what you have seen so far.
  • the continuing lack of standards for chemical and biological expert systems. A huge variety of fast and 'simple' aromaticity models for small molecules is available. The underlying problem is the large number of hard-coded chemical expert systems, which makes it difficult to share knowledge and to evolve from alchemy to a better chemistry.
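The idea of keeping every interpretation of a structure together with a ranking, instead of overwriting the record, can be sketched roughly as follows. This is only a minimal illustration: the class names, the score field, and the example labels are all hypothetical and do not correspond to any existing PDB tool or file format.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """One possible reading of an ambiguous structure (hypothetical model)."""
    label: str          # e.g. which tautomer or protonation state was assumed
    score: float        # higher means more plausible under some chosen ranking
    source: str = ""    # who or what proposed this interpretation


@dataclass
class AnnotatedStructure:
    """Raw record plus every interpretation seen so far; nothing is discarded."""
    raw_record: str
    interpretations: list = field(default_factory=list)

    def add(self, interpretation):
        self.interpretations.append(interpretation)

    def ranked(self):
        # Best-scoring interpretation first; the others stay available.
        return sorted(self.interpretations, key=lambda i: i.score, reverse=True)


# Placeholder record and made-up scores, for illustration only.
entry = AnnotatedStructure(raw_record="(original PDB lines kept verbatim)")
entry.add(Interpretation("ligand as charged tautomer B", 0.3, "curator 2"))
entry.add(Interpretation("ligand as neutral tautomer A", 0.6, "curator 1"))
best = entry.ranked()[0]
print(best.label)
```

The raw record is never modified, so a later curator can add a third interpretation or change the ranking without losing what was there before.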
If we focus on the molecular/structural side, the two important improvements of the last few weeks are FlexMol and MQL. Compared to SMILES and SMARTS, both are improved representation systems. Both can be extended and should, in theory, make it possible to keep the semantics, increase collaboration, and avoid reinventing chemical expert systems by providing reusable and extensible molecule and molecule query systems.
In particular, current molecule mining approaches should then be able to improve knowledge enrichment and the transformation of data into knowledge.
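To make the contrast with hard-coded expert systems a little more concrete, here is a small sketch of a query object whose chemical rules are registered from outside rather than built in. It is not FlexMol or MQL syntax; the AtomQuery class, the dictionary-based atom representation, and the predicate names are assumptions made purely for illustration.

```python
# Hypothetical sketch: atoms are plain dicts, and chemical "expert knowledge"
# (here an aromaticity flag) is a pluggable function rather than hard-coded.

def flag_based_aromaticity(atom):
    # Assumption: the aromatic flag was assigned upstream and stored on the atom.
    return atom.get("aromatic", False)


class AtomQuery:
    def __init__(self):
        self.predicates = {}   # name -> function(atom) -> bool

    def register(self, name, predicate):
        # New rules can be added or replaced without touching the matcher.
        self.predicates[name] = predicate

    def match(self, atoms, *names):
        """Return indices of atoms satisfying all named predicates."""
        checks = [self.predicates[n] for n in names]
        return [i for i, a in enumerate(atoms) if all(c(a) for c in checks)]


query = AtomQuery()
query.register("aromatic", flag_based_aromaticity)
query.register("nitrogen", lambda atom: atom["element"] == "N")

# Pyridine ring atoms, written out by hand for illustration.
pyridine = [{"element": "N", "aromatic": True}] + [
    {"element": "C", "aromatic": True} for _ in range(5)
]
print(query.match(pyridine, "aromatic", "nitrogen"))
```

Swapping in a different aromaticity model then means registering another function under the same name, which is the kind of reuse and extension the paragraph above hopes for.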

numly esn 32720-070111-949852-73
© 2007 All Rights Reserved.