Archive for the ‘HEP – High Energy Physics’ Category

Data and Software Preservation for Open Science (DASPOS)

Monday, December 17th, 2012

I first read about this in Preserving Science Data and Software for Open Science:

One of the emerging, and soon to be defining, characteristics of science research is the collection, usage and storage of immense amounts of data. In fields as diverse as medicine, astronomy and economics, large data sets are becoming the foundation for new scientific advances. A new project led by University of Notre Dame researchers will explore solutions to the problems of preserving data, analysis software and computational work flows, and how these relate to results obtained from the analysis of large data sets.

Titled “Data and Software Preservation for Open Science (DASPOS),” the National Science Foundation-funded $1.8 million program is focused on high energy physics data from the Large Hadron Collider (LHC) and the Fermilab Tevatron.

The research group, which is led by Mike Hildreth, a professor of physics; Jarek Nabrzyski, director of the Center for Research Computing with a concurrent appointment as associate professor of computer science and engineering; and Douglas Thain, associate professor of computer science and engineering, also will survey and incorporate the preservation needs of other research communities, such as astrophysics and bioinformatics, where large data sets and the derived results are becoming the core of emerging science in these disciplines.

Preservation of data and software semantics. Sounds like topic maps!

Materials you may find useful:

Status Report of the DPHEP Study Group: Towards a Global Effort for Sustainable Data Preservation in High Energy Physics (May 2012, Omitted the last 40 authors so I am omitting the first 50 authors. See the paper for the complete list.)

Data Preservation in High Energy Physics (December 2009, forerunner to the 2012 report)

DASPOS: Common Formats? by Mike Hildreth (slides, 19 November 2012)

DASPOS Overview by Mike Hildreth (slides, 20 November 2012)

Perhaps the most important statement from the 20 November slides:

A “scouting party”: push forward in what looks like a good direction without worrying about full world-wide consensus

I have participated in, seen, or read about any number of projects and, well, this is quite refreshing.

Starting a project with final answers, or developing them prematurely, is a guarantee of poor results.

Both science and the humanities explore to find answers. Why should developing standards be any different?

A great deal to be learned here, even if you are just listening in on the conversations.

TMVA Toolkit for Multivariate Data Analysis with ROOT

Monday, December 27th, 2010

From the website:

The Toolkit for Multivariate Analysis (TMVA) provides a ROOT-integrated machine learning environment for the processing and parallel evaluation of multivariate classification and regression techniques. TMVA is specifically designed to the needs of high-energy physics (HEP) applications, but should not be restricted to these. The package includes:

TMVA consists of object-oriented implementations in C++ for each of these multivariate methods and provides training, testing and performance evaluation algorithms and visualization scripts. The MVA training and testing is performed with the use of user-supplied data sets in form of ROOT trees or text files, where each event can have an individual weight. The true event classification or target value (for regression problems) in these data sets must be known. Preselection requirements and transformations can be applied on this data. TMVA supports the use of variable combinations and formulas.
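The workflow described in the quote above (train on events whose true classification is known, let each event carry its own weight, then evaluate on a separate test set) can be illustrated with a minimal sketch. This is plain Python, not TMVA or ROOT code: the `Event` class, the single variable `x`, and the simple cut-based classifier are all assumptions made for illustration, and TMVA's own methods are far more sophisticated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One event: a single input variable, its known class, and a weight."""
    x: float              # hypothetical discriminating variable
    is_signal: bool       # true classification, required for training/testing
    weight: float = 1.0   # individual event weight


def weighted_accuracy(events: List[Event], cut: float) -> float:
    """Fraction of total event weight classified correctly by 'x > cut'."""
    total = sum(e.weight for e in events)
    correct = sum(e.weight for e in events if (e.x > cut) == e.is_signal)
    return correct / total


def train_cut(events: List[Event]) -> float:
    """Pick the cut, among midpoints of adjacent x values, that maximizes
    weighted accuracy on the training events."""
    values = sorted({e.x for e in events})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda c: weighted_accuracy(events, c))


# Toy training and test samples (invented numbers, for illustration only).
train = [
    Event(0.1, False), Event(0.2, False, 2.0), Event(0.4, False),
    Event(0.6, True), Event(0.8, True, 0.5), Event(0.9, True),
]
test = [Event(0.3, False), Event(0.7, True), Event(0.45, True, 0.5)]

cut = train_cut(train)                      # "training"
test_score = weighted_accuracy(test, cut)   # "testing" on unseen events
print(f"chosen cut: {cut}, weighted test accuracy: {test_score}")
```

The point of the sketch is only the shape of the process: labelled and weighted input, a training step, and a separate evaluation step. Those are the pieces of documentation a topic map would need to tie together for any one TMVA method.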


  1. Review TMVA documentation on one method in detail.
  2. Using a topic map, demonstrate supplementing that documentation with additional literature or examples.
  3. TMVA is not restricted to high energy physics, but do you find citations of its use outside that field?


ROOT

Monday, December 27th, 2010


From the website:

ROOT is a framework for data processing, born at CERN, at the heart of the research on high-energy physics.  Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations.


  • Save data. You can save your data (and any C++ object) in a compressed binary form in a ROOT file.  The object format is also saved in the same file.  ROOT provides a data structure that is extremely powerful for fast access of huge amounts of data – orders of magnitude faster than any database.
  • Access data. Data saved into one or several ROOT files can be accessed from your PC, from the web and from large-scale file delivery systems used e.g. in the GRID.  ROOT trees spread over several files can be chained and accessed as a unique object, allowing for loops over huge amounts of data.
  • Process data. Powerful mathematical and statistical tools are provided to operate on your data.  The full power of a C++ application and of parallel processing is available for any kind of data manipulation.  Data can also be generated following any statistical distribution, making it possible to simulate complex systems.
  • Show results. Results are best shown with histograms, scatter plots, fitting functions, etc.  ROOT graphics may be adjusted real-time by few mouse clicks.  High-quality plots can be saved in PDF or other format.
  • Interactive or built application. You can use the CINT C++ interpreter or Python for your interactive sessions and to write macros, or compile your program to run at full speed. In both cases, you can also create a GUI.
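The "Access data" point above, that data spread over several files can be chained and looped over as a single object, can be illustrated with a small sketch. This is plain Python over CSV files and does not use ROOT or its API: the file names, the `energy` column, and the helper functions are hypothetical, chosen only to show the idea of one loop over many files.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


def write_events(path: Path, energies: List[float]) -> None:
    """Write one toy 'run' file with a single 'energy' column."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["energy"])
        for value in energies:
            writer.writerow([value])


def chain_events(paths: Iterable[Path]) -> Iterator[Dict[str, float]]:
    """Yield events from several files in order, as if they were one
    sequence. Files are opened lazily, one at a time."""
    for path in paths:
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                yield {"energy": float(row["energy"])}


with tempfile.TemporaryDirectory() as tmp:
    paths = [Path(tmp) / "run1.csv", Path(tmp) / "run2.csv"]
    write_events(paths[0], [1.5, 2.5])
    write_events(paths[1], [4.0])

    # A single loop covers both files; the caller never handles file borders.
    energies = [event["energy"] for event in chain_events(paths)]
    total_energy = sum(energies)

print(energies, total_energy)
```

For preservation purposes the interesting question is what identifies each file, column, and producing program, since the chained view hides exactly those boundaries.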

Effective deployment of topic maps requires an understanding of how others identify their subjects.

Note that subjects in this context include not only the subjects in experimental data but also the detectors and programs used to analyze that data. (Think data preservation.)


  1. Review the documentation browser for ROOT.
  2. How would you integrate one or more of the years of RootTalk Digest into that documentation?
  3. What scopes would you create and how would you use them?
  4. How would you use a topic map to integrate subject specific content for data or analysis in ROOT?