Sunday, August 03, 2008

Data, models, or both?

Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all. [C. Anderson, Wired, 2008-07-16]
First, I disagree with this statement. Second, thanks to bioinfoman3@delicious for sharing this information.

Honestly, I do not get why people claim that scientific work, e.g. drug design, is similar to the chip industry or to Google's concepts. Here, Chris Anderson claims that data alone will replace theoretical concepts. As others have said in the comments on his article, data alone is not information. Amund Tveit confirms this by showing that data correlations alone can be misleading, because you can find correlations in everything, even where they make no sense. Data means any data, so if you look up NME on Google you will not find "new molecular entity" as the first hit, though this is what I was expecting. So I have serious doubts that this kind of thinking supports science at all, since it might lead away from the right path.
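To make the point about meaningless correlations concrete, here is a minimal sketch in Python. It is not Amund Tveit's example; the function names, sample sizes, and the seed are my own assumptions, and the data are pure random numbers. It draws one random "target" series and then searches many equally random "candidate" series for the one that correlates best with it.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_samples=10, n_candidates=1000, seed=0):
    """Return the largest |r| between a random target and random candidates.

    Hypothetical helper for illustration: every series is independent
    Gaussian noise, so any correlation found here is due to chance alone.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    print("strongest correlation found in pure noise:",
          round(strongest_chance_correlation(), 3))
```

With few samples and many candidates, the best match is typically a strong correlation, although nothing connects the series. Without a model that says which relationships are plausible, such a hit cannot be told apart from a real effect.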

In drug design and other domains with chemical and biological information it is well known that the raw data web is not ready for chemistry or, at best, suboptimal. Let us assume that we already have fully equipped and finished data curation and ontology projects, which provide information, not only data. Does this information-rich data help in making new drugs without any models? I doubt it! Besides, the curation and ontology projects are far from finished, so I doubt it heavily! We have to accept that any data set will always be incomplete or noisy, even if clean music, book, gaming, and video data were added. Would that support drug design or other scientific areas?
As said earlier:
Just accept that we are working in an area where the chemical space is just too large for allowing us to get the complete picture of it. Thus, embrace incompleteness (bounded rational drug design). [Six Rules For Creating a Data Driven Drug Design Project, Mining Drug Space, 2007-11-02]
I completely agree that
All models are wrong, but some are useful. [G. E. P. Box]
By the way, if Chris Anderson states this at the very beginning of his article, why does he then try to dismiss the usefulness of models afterwards?

If people keep claiming that they know how to change science for the better, then I wonder whether they know that data generation itself can be a serious bottleneck. In drug design some effects, e.g. ADME and toxicity, are really hard and expensive to measure. Believe me, if the model disbelievers could show us that we already have enough data and models for understanding oral drugs in humans, then anyone in the drug design industry would be happy to not only reduce, but stop, human clinical tests and provide life-saving drugs to patients immediately.

Finally, I strongly believe that we need as much connected and information-rich data as possible (so far I agree with C. Anderson), that we need good models to support decision making and to save time and environmental resources, and that we need a combination of both for innovative scientific thinking and novel developments.