Life in Blue: Representation of Molecules

There are many ways to represent molecules on a computer, such as connection tables, Z-matrices, Cartesian coordinate lists, SMILES, InChI, SDF, MOL2, PDB, etc. Different formats serve different purposes. For most uses in cheminformatics, a format should be efficient and compact in terms of size, import and export, and the operations performed on it. For internal use, human readability is not so important. A molecule can be viewed naturally as a labeled graph. Robert B. Nachbar proposed a hierarchical data structure (a tree) for genetic programming in his molecular evolution studies. As in the SMILES format, rings are broken to turn the cyclic graph into a tree, and each broken bond is marked with a label. He called this format the Normal Expression representation. A sample representation in Mathematica is as follows. This view is not only natural but also efficient to parse, though it seems clumsy at first sight.

Ball-Stick Model

Mathematica Normal Expression

Tree Structure
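To make the ring-breaking idea concrete, here is a minimal Python sketch of a molecule stored as a tree with labeled ring closures, plus a function that flattens it back into atom and bond lists. The `Node` layout, the name `tree_to_graph`, and the choice to treat every broken ring bond as a single bond are my own assumptions for illustration; this is not Nachbar's actual Normal Expression syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # element symbol, e.g. "C"
    children: list = field(default_factory=list)  # list of (bond_order, Node)
    ring_labels: tuple = ()                       # labels of broken ring bonds at this atom


def tree_to_graph(root):
    """Flatten the tree into (atoms, bonds); each bond is (i, j, order)."""
    atoms, bonds, open_rings = [], [], {}

    def visit(node, parent, order):
        idx = len(atoms)
        atoms.append(node.symbol)
        if parent is not None:
            bonds.append((parent, idx, order))
        for label in node.ring_labels:
            if label in open_rings:
                # Second occurrence of the label: restore the broken bond.
                # Assumption: broken ring bonds are single bonds.
                bonds.append((open_rings.pop(label), idx, 1))
            else:
                open_rings[label] = idx
        for child_order, child in node.children:
            visit(child, idx, child_order)

    visit(root, None, 0)
    if open_rings:
        raise ValueError(f"unmatched ring labels: {sorted(open_rings)}")
    return atoms, bonds


# Methylcyclopropane (SMILES: CC1CC1), with the ring broken at label 1.
ring_end = Node("C", ring_labels=(1,))
ring_mid = Node("C", children=[(1, ring_end)])
ring_start = Node("C", children=[(1, ring_mid)], ring_labels=(1,))
methyl = Node("C", children=[(1, ring_start)])

atoms, bonds = tree_to_graph(methyl)
print(atoms)
print(bonds)
```

A single depth-first walk is enough to recover the full graph, which is part of why a tree form like this is cheap to parse.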

Although the molecular evolution work has been continued by researchers at the University of Leiden (they even set up a company to sell their software product), the representation format has been abandoned. Instead, they adopted the so-called TreeSMILES format, which is nearly identical to SMILES. Since Lisp was used in their implementation, I don't see why they abandoned the format. Actually, in Lisp the format has another advantage: closure scoping and on-the-fly compilation make it possible to present a molecule as a compiled function. With such an implementation, not only is the efficiency high, but the code is also terse (the molecular representation function also serves as the accessor and other related functions).
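The closure idea can be sketched in Python as well, though only the scoping half of it: Python has closures but nothing like Lisp's compile-on-the-fly, so this shows the shape of the design rather than its speed. The names `make_molecule` and the query strings are hypothetical, chosen just for this example.

```python
def make_molecule(atoms, bonds):
    """Return a function that closes over the molecule's data.

    atoms: sequence of element symbols
    bonds: sequence of (i, j, order) tuples
    """
    atoms = tuple(atoms)
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    def molecule(query, index=None):
        # The one returned function is both the representation and its accessors.
        if query == "atoms":
            return atoms
        if query == "symbol":
            return atoms[index]
        if query == "neighbors":
            return tuple(neighbors[index])
        if query == "degree":
            return len(neighbors[index])
        raise ValueError(f"unknown query: {query!r}")

    return molecule


# Methylcyclopropane again: atom 1 is the ring carbon bearing the methyl group.
mol = make_molecule(
    ["C", "C", "C", "C"],
    [(0, 1, 1), (1, 2, 1), (2, 3, 1), (1, 3, 1)],
)
print(mol("symbol", 0))
print(mol("degree", 1))
print(mol("neighbors", 3))
```

The data lives only in the enclosing scope, so there is no separate molecule object plus a set of accessor functions to maintain; the function is the molecule.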

Thursday, January 08, 2009
