Saturday, May 02, 2009

Labor Day Trip to Ya'an

Mengding Mt - Wide view
Over this Labor Day break, I took a trip to Ya'an with some colleagues and graduate students. The traffic was extremely heavy, and we were stuck on the ramp to the highway for more than two hours. Around 1 pm, we finally reached our first destination, Mengding Mountain (Mengdingshan). The mountain is not very high, only about 1450 m in altitude, but it is famous for its tea. Lunch included several local specialities cooked with fresh tea buds. While waiting for lunch, the students spent time picking tea and, of course, mainly posing for photos. The local tea farmers provided free tea-making services for the visitors.
As viewed from the mountain, the Ya'an area is hilly, scenic and covered by heavy clouds. The area is also known for its plentiful rain, which is one of the "three Ya's": Ya rain, Ya fish and Ya women. Walking on the mountain, the temperature was cool, the air was fresh, and it felt very comfortable. Besides tea, there is a statue of Yu the Great, who is revered by the Chinese for his contribution to flood control, and a museum of the Red Army. The mountain was the site of a historical battle between the Red Army and the Sichuan army led by Liu Xiang during the Long March. Of course, like most famous mountains in China, there are temples that have turned into money-suckers, by either decent or deceitful means. In front of one temple, there are dozens of ginkgo trees which are said to be more than 2000 years old. Their leaves are smaller than the common ones, but the trees are tall and beautiful. They stand still there year after year and witness the men and women passing by.

Superbig Tea Kettle
Since the mountain is famous for its tea, a super-big man-made tea kettle was built at the entrance of the mountain.


Landlords
In a tea field, two tiny figurines were found along the road. I guess they are the landlord grandpa and grandma. People are weak in mind; they need to materialize their hopes and worship what they can see physically.

Sunday, April 12, 2009

Dancing with Words

from Kyoto Journal, interview with Red Pine (Bill Porter).

His account of the levels of translating Chinese poems and Buddhist sutras is vivid and touching.

“When I was translating Cold Mountain, I definitely didn’t have my own voice,” he says. “With Stonehouse it was somewhere in between. I think I didn’t really discover my translation voice until I did Bodhidharma, which gave me a chance to find the rhythms of my language.

“Every project I’ve engaged in taught me an entirely different way of translation,” he says. “I don’t view Chinese poetry today the way I did then. I used to count the words in my English lines and try to do my best to do the same thing they did in Chinese. I was also intrigued about things you can do in English that reflect the Chinese, not to make the English sound Chinese but to do things with it that to me at least seemed unique.

“I tried to do things that I saw happening in Chinese — the Chinese language is a very telegraphic, terse language — time is almost irrelevant, their subject is also dispensed with. A line can be very ambiguous. So I started to play with that in English and still make sense.

“Words carry a lot on their surface, but a lot is under the surface that we don’t see when we see the word — a lot comes from contextual familiarity.

People identify words with context. I was intrigued by the nature of Chinese poetry and its brevity — there were these flashes of meaning.

“What I do now is more of a performance,” he says. “Before, I was usually sort of reading the lines like an actor, but now I perform the book — what I do now is closer to dance. The words have to follow along my physical feel for the rhythm, the feeling of what’s happening in the Chinese poem. I don’t see the Chinese as the origin anymore. The Chinese was what the authors used to write down what they were feeling.
“I’ve gotten so used to the words I don’t have to think about them anymore. I’m more concerned with the spirit. I don’t think I have a philosophy of translation, but you have to be very open.

“You’re trying to get into the heart of another person. I’m fortunate I’ve found materials that present deep hearts. That’s the way I’ve responded with the passion I have. I’m fortunate to have run into the Buddha, Bodhidharma, Cold Mountain, Stonehouse and the other Buddhist poets.”

Saturday, March 21, 2009

Love Hurts



Love hurts, love scars,
Love wounds, and marks,
Any heart, not tough,
Or strong, enough
To take a lot of pain,
Take a lot of pain
Love is like a cloud
Holds a lot of rain
Love hurts, ooh ooh love hurts

I'm young, I know,
But even so
I know a thing, or two
I learned, from you
I really learned a lot,
Really learned a lot
Love is like a flame
It burns you when it's hot
Love hurts, ooh ooh love hurts

Some fools think of happiness
Blissfulness, togetherness
Some fools fool themselves I guess
They're not foolin me

I know it isn't true,
I know it isn't true
Love is just a lie,
Made to make you blue
Love hurts, ooh,ooh love hurts
Ooh,ooh love hurts

[guitar solo]

I know it isn't true,
I know it isn't true
Love is just a lie,
Made to make you blue
Love hurts, ooh ooh love hurts
Ooh ooh love hurts
Ooh ooh...

Thursday, February 05, 2009

Keggin Isomers

This post introduces my understanding of the Keggin structure. The Keggin structure, also called the Baker-Figgis-Keggin structure, is a well-known structural scaffold for heteropoly acids and other inorganic compounds. The models below are extracted from the structure of [AlO4Al12(OH)24(H2O)12]7+ (often called Al13) determined by X-ray crystallography.

Al13, ε Keggin Structure


α Isomer, Td

α Isomer, Wireframe Model



ε Isomer, Td

ε Isomer, Wireframe Model



β Isomer, C3v


γ Isomer, C2v


δ Isomer, C3v

Wednesday, February 04, 2009

Building a Cheminformatics System in M7

Over the past two weeks, I needed to build an in-house compound collection database. There are many open-source and commercial ones available on the Internet, but they do not serve my needs, so I decided to build one of my own. I had no previous experience in building databases, so I started out by wondering whether I could use Mathematica or Lisp to build a database from scratch without relying on third-party engines. Later I realized the enormous amount of work that would require.
After searching the Mathematica 7 documentation, I found that it provides an easy-to-use interface to many database systems, such as MySQL and Oracle. Although the Mathematica versions of the SQL functions are quite limited and cannot be used as freely as normal functions (the expressions have to be convertible to SQL grammar), you can always use SQLExecute[] to run any SQL statement. The interface functions take care of data type conversion. I then chose the open-source MySQL as my database engine.
Mathematica 7 introduced new Import capabilities for SDF and SMILES files, besides its support for MOL2 and PDB. It also provides a closed-source database called ChemicalData. At the beginning, I thought this functionality might save me a lot of time, since most of my data sources are available in either SDF or SMILES format. After trying out several big commercial catalogs, I abandoned this idea. Real-life files do not conform 100% to the file format standards but are filled with all sorts of oddities. The Import function fails in so many situations that I had to write a lot of wrapper code to take care of errors. Then I started writing my own parser, and the job was much simpler than I thought. The SDF format is quite old and its definition is in the FORTRAN style.
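To give an idea of why a hand-written SDF reader can be simple, here is a minimal sketch in Python (not my Mathematica code). It only assumes the two conventions every SDF file is supposed to follow: a record ends with a line containing "$$$$", and a data item starts with a header line of the form "> <NAME>". The function names split_sdf_records and parse_data_items are made up for this illustration, and the molblock itself is left unparsed.

```python
def split_sdf_records(text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    # Tolerate a final record whose '$$$$' terminator is missing.
    if any(line.strip() for line in current):
        records.append(current)
    return records


def parse_data_items(record_lines):
    """Collect the '> <NAME>' data items of one record into a dict."""
    items, name, values = {}, None, []
    for line in record_lines:
        if line.startswith(">"):
            # A new header closes the previous data item, if any.
            if name is not None:
                items[name] = "\n".join(values).strip()
            start, end = line.find("<"), line.rfind(">")
            # Skip malformed headers instead of failing on them.
            name = line[start + 1:end] if 0 <= start < end else None
            values = []
        elif name is not None:
            values.append(line)
    if name is not None:
        items[name] = "\n".join(values).strip()
    return items
```

The point of the sketch is the error handling: a missing terminator or a broken header costs one record or one field at most, instead of aborting the whole import.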
Although Mathematica is not much help in parsing the input files, it is a good choice for workflow coding and for pre- and post-processing of data items. Cheminformatics toolkits take time to write, but Mathematica's interface to external programs offers a convenient pipe to third-party toolkits such as openbabel, oechem, jchem, etc.
After debugging many obvious and less obvious errors, the system is finally running smoothly. On my 2-core Lenovo laptop with 2 GB of memory, the single-process code loads about 800,000 compounds (1.8 GB on hard disk), including the redundancy check and property calculation, in 12 hours. The average CPU load is about 1.5, and the time cost per entry does not seem to increase even as the database grows dramatically in size.

Monday, January 26, 2009

Pfizer buys Wyeth

2008 saw many "impossible" things happen. Super-big Wall Street banks went bankrupt and Obama won the presidency of the United States. Now, in the new year of 2009, Pfizer wants to buy Wyeth for $68 billion, and it would become the largest merger if it is completed as planned.
The New York Times report is linked below.
http://www.nytimes.com/2009/01/26/business/26drug.html?_r=1&hp

Thursday, January 15, 2009

SMILES Parser

A SMILES string and a Mathematica normal expression are essentially the same thing. Both represent a graph by breaking its cycles to obtain labeled trees. It is straightforward to parse a SMILES string into an expression so that further operations can easily be applied with Mathematica functions.
With the aid of string patterns (i.e. regular expressions), recursive programming and the ability to convert a string into an expression (i.e. macro expansion), an extensible parser is not hard to write.
First, there are two kinds of information encoded in SMILES: atomic and structural. Atomic information includes atom symbol, type, bonds, isotope weight, charge, valence, etc. Structural information includes atom order, cycle labels, branches, etc. The configuration around double bonds and chirality are structural information from a chemist's viewpoint. However, they are not graph properties in the topological sense and can only be encoded through predefined conventions, such as the order of atoms. If we need an independent representation of them, we can encode such information as the 3D coordinates of atoms alongside the other information. Atomic information is very limited in SMILES and can be treated as known, or it can be obtained by other means. For the sake of simplicity, we may neglect it for now.
We divide the parsing job into two parts: the static definitions of the atom dictionary and the various string patterns, and the function that scans a SMILES string.

Atom Dictionary

smiElements:= {
{"C", 4},
{"H",1},
{"N", 3,5},
{"O",2},
{"S",2,4,6},
{"B",3},
{"F",1},
{"Cl",1},
{"Br",1},
{"I",1}
};
smiElementsAromatic:={"c","n","o","p","s","as","se"};


String Patterns

(* Need to be refined. *)
smiAtomDefault := (#[[1]] & /@ smiElements);
smiChargePatttern := (("-" | "+") ..) | (("-" | "+") ~~ DigitCharacter ...);
smiChiralPattern := ("@" ..) | ("@" ~~ DigitCharacter) | ("@" ~~ LetterCharacter ~~ LetterCharacter ~~ DigitCharacter);
smiAtomCustom := "[" ~~ DigitCharacter ... ~~ smiAtomDefault ~~
smiChiralPattern ... ~~ ("H" ~~ DigitCharacter ...) ... ~~
smiChargePatttern ... ~~ "]";
smiAtomPattern := smiAtomCustom | smiAtomDefault;
smiAtomAromatic := smiElementsAromatic | ( "[" ~~ DigitCharacter ... ~~ smiElementsAromatic ~~ smiChiralPattern ... ~~ ("H" ~~ DigitCharacter ...) ... ~~smiChargePatttern ... ~~ "]");
smiBondPattern := "-" | "=" | "#" | ":" | "/" | "\\" | ".";
smiBranchBra := "(";
smiBranchKet := ")";
smiBranchEither = smiBranchBra | smiBranchKet;
smiCyclicLabel := (DigitCharacter ~~ ("%" ~~ DigitCharacter ~~ DigitCharacter) ...) | (DigitCharacter ~~ DigitCharacter ~~ ("%" ~~ DigitCharacter ~~ DigitCharacter) ...);
smiAtomAny := ("" | smiBondPattern) ~~ (smiAtomPattern | smiAtomAromatic) ~~ ("" |smiCyclicLabel);


When scanning a SMILES string, we break it into basic nodes as defined by the pattern smiAtomAny, and then handle the different cases accordingly. In general there are three situations: an atomic node, an atomic node with cycle-breaking labels, and a branching node.
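The scanning step can be illustrated with a small tokenizer. This is a Python sketch, not my Mathematica code, and it is deliberately simpler than smiAtomAny: it only recognizes bracket atoms, the organic subset (B, C, N, O, P, S, F, Cl, Br, I), the aromatic atoms c, n, o, p, s, bond symbols, branch parentheses and ring labels. The names SMILES_TOKEN and tokenize are made up for this example.

```python
import re

# One alternative per kind of node; "Cl" and "Br" must be tried before "C" and "B".
SMILES_TOKEN = re.compile(r"""
    (?P<atom>\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[cnops])   # bracket, organic-subset, aromatic atoms
  | (?P<bond>[-=#:/\\.])                            # bond symbols
  | (?P<open>\()                                    # a branch opens
  | (?P<close>\))                                   # a branch closes
  | (?P<ring>%\d{2}|\d)                             # cycle-breaking labels
""", re.VERBOSE)


def tokenize(smiles):
    """Break a SMILES string into (kind, text) pairs, left to right."""
    tokens, pos = [], 0
    while pos < len(smiles):
        match = SMILES_TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize("OC(=O)c1ccccc1"):
        print(kind, text)
```

Unlike smiAtomAny, this sketch keeps the bond and the ring label as separate tokens instead of attaching them to the atom, which makes the three situations easy to tell apart by the token kind alone.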

Parse Smiles

smiParseSmiles[s_String, type_: "String"] :=
Module[{smiParseAtom, smiParse,smiGetLabelIndex, smiParseCyclicLabelMeta},

smiGetLabelIndex[i_String] := Module[{pos, temp}, ...];
smiParseCyclicLabelMeta[bond_String, y_String] := Module[{z, zz, zzz}, ...];
smiParseAtom[ss_String] /; StringMatchQ[ss, smiAtomAny] :=StringReplace[...];
smiParseAtom[ss_String] /; StringMatchQ[ss, smiBranchBra] := (...);
smiParseAtom[ss_String] /; StringMatchQ[ss, smiBranchKet] := (...);
smiParse[ss_String] := Block[{$RecursionLimit = Infinity}, StringReplace[ss, ...]];

StringReplace[s,
StartOfString ~~ a : (smiAtomPattern | smiAtomAromatic) ~~
c : ("" | smiCyclicLabel) ~~ rest___ ~~ EndOfString :>
"Molecule[\"" ~~ a ~~ "\"" ~~
If[StringQ[c] && StringLength[c] > 0,
smiParseCyclicLabelMeta[
If[StringMatchQ[a, smiAtomAromatic], ",Aromatic",
",Single["], c], ""] ~~
If[StringQ[rest] && StringLength[rest] > 0, smiParse[rest], ""] ~~
"]"]
];


Cycle labels are renumbered with unique numbers from the natural number sequence. Cycle-breakage labeling and branch handling use a similar mechanism: a stack-like structure (either a data structure or a function) stores the intermediate results. When an opening symbol is met, a new item is built and pushed onto the stack. When a closing symbol is met, the stored intermediate result is handled and one item is popped from the stack. To avoid using global variables, which are convenient in recursive coding, we define the sub-functions inside smiParseSmiles.
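The stack mechanism can be sketched as follows. This is Python, not my Mathematica code, and it builds a flat atom list and bond list rather than the nested expression shown later in this post. It assumes a simplified SMILES subset (no "/", "\" or "." bonds), and every implicit bond is recorded as "-", even between aromatic atoms. The names TOKEN and parse_smiles are made up for this example.

```python
import re

# Simplified token pattern: bracket atoms, organic subset, aromatic atoms,
# bonds, branch parentheses and ring labels.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFIcnops]|[-=#:]|[()]|%\d{2}|\d")


def parse_smiles(smiles):
    """Return (atoms, bonds, ring_count); bonds are (i, j, symbol) triples."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES syntax")
    atoms, bonds = [], []
    branch_stack = []   # atoms to return to when a ")" is met
    open_rings = {}     # ring label -> (atom index, unique ring number)
    ring_count = 0
    prev, pending = None, None   # previous atom index, explicit bond symbol
    for tok in tokens:
        if tok == "(":
            branch_stack.append(prev)          # opening symbol: push
        elif tok == ")":
            if not branch_stack:
                raise ValueError("unbalanced ')'")
            prev = branch_stack.pop()          # closing symbol: pop
        elif tok in "-=#:":
            pending = tok
        elif tok[0].isdigit() or tok[0] == "%":
            if prev is None:
                raise ValueError("ring label before any atom")
            if tok in open_rings:              # second occurrence closes the cycle
                start, _number = open_rings.pop(tok)
                bonds.append((start, prev, pending or "-"))
            else:                              # first occurrence opens it
                ring_count += 1                # labels are renumbered 1, 2, 3, ...
                open_rings[tok] = (prev, ring_count)
            pending = None
        else:                                  # an atom
            atoms.append(tok)
            index = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, index, pending or "-"))
            prev, pending = index, None
    if branch_stack or open_rings:
        raise ValueError("unclosed branch or ring")
    return atoms, bonds, ring_count
```

Branches and cycle labels are handled the same way here: the opening symbol stores the atom to come back to, and the closing symbol retrieves it. The only difference is that branches are strictly nested, so a list used as a stack is enough, while cycle labels may close in any order and therefore go into a dictionary keyed by the label.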

The function was tested on the 150 SMILES strings in the OpenBabel package. The total time was 30 seconds on my laptop. Below is an example SMILES string and the parsed result.

Example SMILES

OC(=O)C1=C(C=CC=C1)C2=C3C=CC(=O)C(=C3OC4=C2C=CC(=C4Br)O)Br


Normal Expression Representation

Molecule["O",
Single["C", Double["O"],
Single["C", Single[R[1]],
Double["C",
Single["C", Double["C", Single["C", Double["C", Double[R[1]]]]]],
Single["C", Single[R[2]],
Double["C", Double[R[3]],
Single["C",
Double["C",
Single["C", Double["O"],
Single["C",
Double["C", Double[R[3]],
Single["O",
Single["C", Single[R[4]],
Double["C", Double[R[2]],
Single["C",
Double["C",
Single["C", Double["C", Double[R[4]], Single["Br"]],
Single["O"]]]]]]]], Single["Br"]]]]]]]]]]]