Monday, August 24, 2009

Getting Married ~

I am going to get married soon to the girl I mentioned in a post this past May. We plan to register on Sept. 10th, Chinese Teachers' Day, since both of us are teachers. She is in charge of the beginning, primary school education, and I am in charge of the conclusion, college education. The wedding ceremony is planned for the 30th of the coming January (lunar calendar: Layue Shiliu, Dec. 16th).
Thank God for letting us meet, get to know each other and join together. Although our future life will definitely not always be smooth, we see our lives being tied together forever.

Tuesday, July 21, 2009

Develop a Cheminformatics Platform with Mathematica

My first encounter with Mathematica happened more than 12 years ago. Since 2001, I have used it intensively as a tool in my research. By now, I use it virtually every day for many different tasks. I don't think I have mastered every aspect of the tool, but I do think my understanding of Mathematica can benefit researchers who are interested in learning it. I feel the need to organize my previous code into a tutorial for two reasons. The first is that there are few good books, if any, on the market about its application in cheminformatics or bioinformatics, particularly for intermediate users who know the basics of Mathematica but want to sharpen their skills. The other is that I want to clean up my code and thoughts for the development of my own research. This tutorial is intended to show the essentials of Mathematica through the anatomy of projects, from design to implementation.
I see many users (and some tutorial books as well) writing code in the style of C, Fortran or other programming languages they are familiar with. To me, a good programming style is extremely important for mastering the language, in terms of development time, efficiency and terseness. A rule of thumb is that the development process should help you organize your thoughts and understand the problem at a deeper level, instead of feeling cumbersome and making you work like a laborer. On the surface, we could think of Mathematica as an advanced dialect of Lisp. "Advanced" here means the thousands of built-in functions and packages delivered with the language as extensions of the kernel.
My plan is to include the following chapters.
Chapter 1. Introduction
Chapter 2. Compound collection database
Chapter 3. Graph theory representation of molecules and its applications
Chapter 4. Descriptor development, similarity comparison, and virtual screening
Chapter 5. CIF parser, molecular structural operations and interface to quantum mechanics calculation
Chapter 6. Spectra processing (including both Image and Graph)
Chapter 7. Schroedinger equation solutions
Appendices
Subject Index

Saturday, May 02, 2009

Labor Day Trip to Ya'an

Mengding Mt - Wide view
During this Labor Day break, I took a trip to Ya'an with some colleagues and graduate students. The traffic was extremely heavy and we were stuck on the ramp to the highway for more than two hours. Around 1 pm, we finally reached our first destination, Mengdingshan (Mengding Mountain). The mountain is not very high, only about 1450 m in altitude, but it is famous for tea production. The lunch included several local specialities cooked with fresh tea buds. While waiting for lunch, the students spent time picking tea and, of course, mainly posing for photos. The local tea farmers provided free tea-making services for visitors.
As viewed from the mountain, the Ya'an area is hilly, scenic and covered by heavy clouds. The area is also known for plenty of rain, which is one of the "three Ya's": Ya rain, Ya fish and Ya women. Walking on the mountain, the temperature was cool, the air was fresh, and it felt very comfortable. Besides tea, there is a statue of Yu the Great, who is respected by the Chinese for his contribution to flood control, and a museum of the Red Army. The mountain was the site of a historical battle between the Red Army and the Sichuan army led by Liu Xiang during the Long March. Of course, like most famous mountains in China, it has temples that have turned into money-suckers, in either decent or deceptive ways. In front of one temple, there are dozens of ginkgo trees which are said to be more than 2,000 years old. Their leaves are smaller than the common ones, but the trees are tall and beautiful. They stand still there year after year, and witness the men and women passing by.

Superbig Tea Kettle
Since the mountain is famous for its tea, a super-big man-made tea kettle was built at the entrance of the mountain.


Landlords
In a tea field, two tiny figurines were found along the road. I guess they are the landlord grandpa and grandma. People are weak in mind; they need to materialize their hopes and worship what they can see physically.

Sunday, April 12, 2009

Dancing with Words

from Kyoto Journal, interview with Red Pine (Bill Porter).

His account of the levels in translating Chinese poems and Buddhist sutras is vivid and touching.

“When I was translating Cold Mountain, I definitely didn’t have my own voice,” he says. “With Stonehouse it was somewhere in between. I think I didn’t really discover my translation voice until I did Bodhidharma, which gave me a chance to find the rhythms of my language.

“Every project I’ve engaged in taught me an entirely different way of translation,” he says. “I don’t view Chinese poetry today the way I did then. I used to count the words in my English lines and try to do my best to do the same thing they did in Chinese. I was also intrigued about things you can do in English that reflect the Chinese, not to make the English sound Chinese but to do things with it that to me at least seemed unique.

“I tried to do things that I saw happening in Chinese — the Chinese language is a very telegraphic, terse language — time is almost irrelevant, their subject is also dispensed with. A line can be very ambiguous. So I started to play with that in English and still make sense.

“Words carry a lot on their surface, but a lot is under the surface that we don’t see when we see the word — a lot comes from contextual familiarity.

People identify words with context. I was intrigued by the nature of Chinese poetry and its brevity — there were these flashes of meaning.

“What I do now is more of a performance,” he says. “Before, I was usually sort of reading the lines like an actor, but now I perform the book — what I do now is closer to dance. The words have to follow along my physical feel for the rhythm, the feeling of what’s happening in the Chinese poem. I don’t see the Chinese as the origin anymore. The Chinese was what the authors used to write down what they were feeling.
“I’ve gotten so used to the words I don’t have to think about them anymore. I’m more concerned with the spirit. I don’t think I have a philosophy of translation, but you have to be very open.

“You’re trying to get into the heart of another person. I’m fortunate I’ve found materials that present deep hearts. That’s the way I’ve responded with the passion I have. I’m fortunate to have run into the Buddha, Bodhidharma, Cold Mountain, Stonehouse and the other Buddhist poets.”

Saturday, March 21, 2009

Love Hurts



Love hurts, love scars,
Love wounds, and marks,
Any heart, not tough,
Or strong, enough
To take a lot of pain,
Take a lot of pain
Love is like a cloud
Holds a lot of rain
Love hurts, ooh ooh love hurts

I'm young, I know,
But even so
I know a thing, or two
I learned, from you
I really learned a lot,
Really learned a lot
Love is like a flame
It burns you when it's hot
Love hurts, ooh ooh love hurts

Some fools think of happiness
Blissfulness, togetherness
Some fools fool themselves I guess
They're not foolin me

I know it isn't true,
I know it isn't true
Love is just a lie,
Made to make you blue
Love hurts, ooh ooh love hurts
Ooh ooh love hurts

[guitar solo]

I know it isn't true,
I know it isn't true
Love is just a lie,
Made to make you blue
Love hurts, ooh ooh love hurts
Ooh ooh love hurts
Ooh ooh...

Thursday, February 05, 2009

Keggin Isomers

This post introduces my understanding of the Keggin structure. The Keggin structure, also called the Baker-Figgis-Keggin structure, is a well-known structural scaffold for heteropoly acids and other inorganic compounds. The models below are extracted from the structure of [AlO4Al12(OH)24(H2O)12]7+ (often called Al13) determined by X-ray crystallography.

Al13, ε Keggin Structure


α Isomer, Td

α Isomer, Wireframe Model



ε Isomer, Td

ε Isomer, Wireframe Model



β Isomer, C3v


γ Isomer, C2v


δ Isomer, C3v

Wednesday, February 04, 2009

Building a Cheminformatics System in M7

Over the past two weeks, I needed to build an in-house compound collection database. There are many open and commercial ones available on the Internet, but they do not serve my needs, so I decided to build one of my own. I had no previous experience in building databases. I started out by wondering whether I could use Mathematica or Lisp to build a database from scratch, without relying on third-party engines. Later I realized the enormous amount of work that would require.
After searching the Mathematica 7 documentation, I found that it provides an easy-to-use interface to many database systems, such as MySQL and Oracle. Although the Mathematica versions of the SQL functions are quite limited and cannot be used as freely as normal functions (they have to be convertible to SQL grammar), you can always use SQLExecute[] to run any SQL statement. The interface functions take care of data type conversion. I then chose the open-source MySQL as my database engine.
Mathematica 7 introduced new Import capabilities for SDF and SMILES files, in addition to its support for MOL2 and PDB. It also provides a closed-source database called ChemicalData. At the beginning, I thought this functionality might save me a lot of time, since most of my data sources are available in either SDF or SMILES format. After trying out several big commercial catalogs, I abandoned this idea. Real-life files do not conform 100% to the file format standards; they are filled with all sorts of oddities. The Import function fails in so many situations that I would have had to write a lot of wrapper code to take care of the errors. So I started writing my own parser, and the job was much simpler than I thought. The SDF format is quite old and its definition is in the FORTRAN style, with fixed-width fields.
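To give a feeling for why a hand-written parser is manageable, here is a minimal sketch in Python (not my Mathematica code). It assumes V2000-style records: a three-line header, a fixed-width counts line, atom and bond blocks, optional "> <TAG>" data items, and a "$$$$" line closing each record. The function names (split_sdf_records, parse_record, load_sdf) are made up for this illustration. The point is that a malformed record is logged and skipped instead of aborting the whole import.

```python
from typing import Dict, Iterator, List, Tuple


def split_sdf_records(text: str) -> Iterator[str]:
    """Yield the raw text of each record; records end with a '$$$$' line."""
    lines: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if lines:
                yield "\n".join(lines)
            lines = []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):  # tolerate a missing final '$$$$'
        yield "\n".join(lines)


def parse_record(record: str) -> Dict[str, object]:
    """Parse one V2000-style record; raise ValueError if it is malformed."""
    lines = record.split("\n")
    if len(lines) < 4:
        raise ValueError("record shorter than the header plus counts line")
    counts = lines[3]
    try:
        # Fixed-width fields: atom count in columns 1-3, bond count in 4-6.
        n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    except ValueError as exc:
        raise ValueError(f"bad counts line: {counts!r}") from exc
    atom_lines = lines[4:4 + n_atoms]
    if len(atom_lines) < n_atoms:
        raise ValueError("truncated atom block")
    atoms: List[Tuple[str, float, float, float]] = []
    for line in atom_lines:
        # x, y, z occupy three 10-character fields; the symbol follows.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    # Data items: a '> <TAG>' line, value lines, then a blank line.
    data: Dict[str, List[str]] = {}
    tag = None
    for line in lines[4 + n_atoms + n_bonds:]:
        if line.startswith(">"):
            start, end = line.find("<"), line.rfind(">")
            tag = line[start + 1:end] if 0 <= start < end else None
            if tag is not None:
                data[tag] = []
        elif tag is not None:
            if line.strip() == "":
                tag = None
            else:
                data[tag].append(line)
    return {
        "name": lines[0].strip(),
        "atoms": atoms,
        "n_bonds": n_bonds,
        "data": {k: "\n".join(v) for k, v in data.items()},
    }


def load_sdf(text: str):
    """Return (parsed records, [(record index, error message), ...])."""
    good, errors = [], []
    for index, record in enumerate(split_sdf_records(text)):
        try:
            good.append(parse_record(record))
        except (ValueError, IndexError) as exc:
            errors.append((index, str(exc)))
    return good, errors
```

Calling load_sdf on the text of a catalog file returns the usable records together with a list of the ones that failed, so the odd entries can be inspected later without stopping the load.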
Although Mathematica is not much help in parsing the input files, it is a good choice for workflow coding and for pre- and post-processing of data items. Cheminformatics toolkits take time to write, but Mathematica's interface to external programs offers a convenient pipe to third-party toolkits such as Open Babel, OEChem and JChem.
After debugging many obvious and less obvious errors, the system is finally running smoothly. On my dual-core Lenovo laptop with 2 GB of memory, the single-process code loads about 800,000 compounds (1.8 GB on disk), including the redundancy check and property calculation, in 12 hours. The average CPU load is about 1.5, and the time cost per entry does not seem to increase even as the database grows dramatically in size.
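For scale, 800,000 compounds in 12 hours is roughly 18 compounds per second, or about 54 ms per entry. One plausible reason the per-entry cost can stay flat is that the redundancy check is left to an indexed unique column in the database rather than a scan over existing entries. The sketch below illustrates that idea in Python, with the standard-library sqlite3 module standing in for MySQL. The table layout, the column names and the choice of a canonical structure string as the uniqueness key are assumptions for illustration, not my actual schema.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical schema: the UNIQUE constraint creates an index, so each
# duplicate check is an index lookup instead of a scan of the table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS compounds (
    id            INTEGER PRIMARY KEY,
    canonical_key TEXT NOT NULL UNIQUE,
    name          TEXT,
    mol_weight    REAL
)
"""


def load_compounds(
    conn: sqlite3.Connection,
    rows: Iterable[Tuple[str, str, float]],
) -> Tuple[int, int]:
    """Insert (canonical_key, name, mol_weight) rows, skipping duplicates.

    Returns (number inserted, number skipped as redundant).
    """
    conn.execute(SCHEMA)
    inserted = skipped = 0
    with conn:  # one transaction for the whole batch
        for key, name, weight in rows:
            cur = conn.execute(
                "INSERT OR IGNORE INTO compounds "
                "(canonical_key, name, mol_weight) VALUES (?, ?, ?)",
                (key, name, weight),
            )
            if cur.rowcount == 1:
                inserted += 1
            else:
                skipped += 1
    return inserted, skipped


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    batch = [
        ("CCO", "ethanol", 46.07),
        ("O", "water", 18.02),
        ("CCO", "ethyl alcohol", 46.07),  # same structure, other catalog
    ]
    print(load_compounds(connection, batch))
```

In MySQL the corresponding statement is INSERT IGNORE against a column with a UNIQUE index; the rest of the logic carries over unchanged.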