From data to analysis: linking NWChem and Avogadro with the syntax and semantics of Chemical Markup Language by Wibe A de Jong, Andrew M Walker and Marcus D Hanwell. (Journal of Cheminformatics 2013, 5:25 doi:10.1186/1758-2946-5-25)



Multidisciplinary integrated research requires the ability to couple the diverse sets of data obtained from a range of complex experiments and computer simulations. Integrating data requires semantically rich information. In this paper an end-to-end use of semantically rich data in computational chemistry is demonstrated utilizing the Chemical Markup Language (CML) framework. Semantically rich data is generated by the NWChem computational chemistry software with the FoX library and utilized by the Avogadro molecular editor for analysis and visualization.


The NWChem computational chemistry software has been modified and coupled to the FoX library to write CML compliant XML data files. The FoX library was expanded to represent the lexical input files and molecular orbitals used by the computational chemistry software. Draft dictionary entries and a format for molecular orbitals within CML CompChem were developed. The Avogadro application was extended to read in CML data, and display molecular geometry and electronic structure in the GUI, allowing for an end-to-end solution where Avogadro can create input structures and generate input files, NWChem can run the calculation, and Avogadro can then read in and analyse the CML output produced. The developments outlined in this paper will be made available in future releases of NWChem, FoX, and Avogadro.


The production of CML compliant XML files for computational chemistry software such as NWChem can be accomplished relatively easily using the FoX library. The CML data can be read in by a newly developed reader in Avogadro and analysed or visualized in various ways. A community-based effort is needed to further develop the CML CompChem convention and dictionary. This will enable the long-term goal of allowing a researcher to run simple “Google-style” searches of chemistry and physics and have the results of computational calculations returned in a comprehensible form alongside articles from the published literature.
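To make the round trip concrete, here is a minimal sketch of writing and reading back a molecular geometry as CML. It is not the FoX library (a Fortran library) and not Avogadro's reader. It uses only Python's standard `xml.etree.ElementTree` and the core CML `molecule`/`atomArray`/`atom` elements in the `http://www.xml-cml.org/schema` namespace. The helper names `write_cml` and `read_cml` are hypothetical, the water coordinates are illustrative, and none of the CompChem convention markup (modules, dictionary references, molecular orbitals) described in the paper is shown.

```python
import xml.etree.ElementTree as ET

# Core CML namespace; registered as the default so output has no "ns0:" prefix.
CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)


def write_cml(atoms, mol_id="m1"):
    """Hypothetical helper: serialize (element, x, y, z) tuples as a minimal CML <molecule>."""
    molecule = ET.Element(f"{{{CML_NS}}}molecule", id=mol_id)
    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for i, (element, x, y, z) in enumerate(atoms, start=1):
        # x3/y3/z3 hold Cartesian coordinates; str() of a float round-trips exactly.
        ET.SubElement(
            atom_array,
            f"{{{CML_NS}}}atom",
            id=f"a{i}",
            elementType=element,
            x3=str(x),
            y3=str(y),
            z3=str(z),
        )
    return ET.tostring(molecule, encoding="unicode")


def read_cml(text):
    """Hypothetical helper: recover (element, x, y, z) tuples from CML text."""
    root = ET.fromstring(text)
    return [
        (
            atom.get("elementType"),
            float(atom.get("x3")),
            float(atom.get("y3")),
            float(atom.get("z3")),
        )
        for atom in root.iter(f"{{{CML_NS}}}atom")
    ]


# Illustrative water geometry (angstroms), not taken from an NWChem run.
water = [("O", 0.0, 0.0, 0.117), ("H", 0.0, 0.757, -0.469), ("H", 0.0, -0.757, -0.469)]
xml_text = write_cml(water)
print(xml_text)
```

Because the consumer only needs to know the shared vocabulary (`atom`, `elementType`, `x3`), it can recover the geometry without knowing anything about the program that wrote it. That shared semantics, rather than any particular parser, is what lets the NWChem output feed directly into Avogadro.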

Aside from its obvious importance for cheminformatics, I think there is another lesson in this article.

Integration of data required “…semantically rich information…,” but just as importantly, integration was not a goal in and of itself.

Integration was only part of a workflow that had other goals.

No doubt some topic maps are useful as end products of integrated data, but what about cases where integration is only one step in a larger workflow?

Think of the non-reusable data integration mappings that are offered by many enterprise integration packages.

