Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 30, 2018

TrackML Particle Tracking Challenge [Non-Twitter Big Data]

Filed under: CERN,Physics — Patrick Durusau @ 7:40 pm

TrackML Particle Tracking Challenge

Cutting to the chase:

… can machine learning assist high energy physics in discovering and characterizing new particles?

Details follow:

To explore what our universe is made of, scientists at CERN are colliding protons, essentially recreating mini big bangs, and meticulously observing these collisions with intricate silicon detectors.

While orchestrating the collisions and observations is already a massive scientific accomplishment, analyzing the enormous amounts of data produced from the experiments is becoming an overwhelming challenge.

Event rates have already reached hundreds of millions of collisions per second, meaning physicists must sift through tens of petabytes of data per year. And, as the resolution of detectors improves, ever better software is needed for real-time pre-processing and filtering of the most promising events, producing even more data.

To help address this problem, a team of Machine Learning experts and physics scientists working at CERN (the world’s largest high energy physics laboratory) has partnered with Kaggle and prestigious sponsors to answer the question: can machine learning assist high energy physics in discovering and characterizing new particles?

Specifically, in this competition, you’re challenged to build an algorithm that quickly reconstructs particle tracks from 3D points left in the silicon detectors. This challenge consists of two phases:

  • The Accuracy phase will run on Kaggle from May to July 2018. Here we’ll be focusing on the highest score, irrespective of the evaluation time. This phase is an official IEEE WCCI competition (Rio de Janeiro, Jul 2018).
  • The Throughput phase will run on Codalab from July to October 2018. Participants will submit their software which is evaluated by the platform. Incentive is on the throughput (or speed) of the evaluation while reaching a good score. This phase is an official NIPS competition (Montreal, Dec 2018).

All the necessary information for the Accuracy phase is available here on Kaggle site. The overall TrackML challenge web site is there.
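For a feel of what "reconstructing particle tracks from 3D points" means, here is a toy sketch of my own, not anything from the challenge materials. It assumes every track is a straight line through the origin, so hits on the same track share a direction. Real tracks bend in the detector's magnetic field, so treat this as an illustration of the grouping problem only. The function names and the `min_cos` threshold are hypothetical.

```python
import math

def unit(hit):
    """Return the unit vector pointing from the origin to a 3D hit.

    Assumes the hit is not at the origin itself.
    """
    x, y, z = hit
    r = math.sqrt(x * x + y * y + z * z)
    return (x / r, y / r, z / r)

def reconstruct_tracks(hits, min_cos=0.9999):
    """Greedily group hit indices whose directions nearly coincide.

    Toy assumption: tracks are straight lines through the origin.
    """
    tracks = []  # each entry: (reference direction, list of hit indices)
    for i, hit in enumerate(hits):
        u = unit(hit)
        for direction, members in tracks:
            # Cosine of the angle between this hit and the track direction.
            if sum(a * b for a, b in zip(u, direction)) >= min_cos:
                members.append(i)
                break
        else:
            # No existing track matches, so this hit starts a new one.
            tracks.append((u, [i]))
    return [members for _, members in tracks]

# Nine hits from three straight tracks, listed in mixed order.
hits = [(1, 0, 0), (0, 2, 0), (1, 1, 1),
        (3, 0, 0), (0, 1, 0), (2, 2, 2),
        (2, 0, 0), (3, 3, 3), (0, 3, 0)]
tracks = reconstruct_tracks(hits)
print(tracks)  # [[0, 3, 6], [1, 4, 8], [2, 5, 7]]
```

The greedy loop compares each hit against every existing track, which is fine for nine hits and hopeless at collider scale. That gap is roughly what the Throughput phase is about.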

I know you breathed a sigh of relief upon reading [Non-Twitter Big Data].

There’s nothing wrong with using Twitter to practice big data techniques, but at the end of the day, at best some advertiser can micro-tweak an advertisement for a loser (pronounced “user”). There’s no real bang from that “achievement.”

Unlike tweaking ad targeting, a viable solution to this challenge may make a fundamental difference in high energy physics.

Would you rather be known as an ad tweaker or for advancing ML in high energy physics?

Your call.

December 7, 2017

CatBoost: Yandex’s machine learning algorithm (here be Russians)

Filed under: CERN,Machine Learning — Patrick Durusau @ 3:08 pm

CatBoost: Yandex’s machine learning algorithm is available free of charge by Victoria Zavyalova.

From the post:

Russia’s Internet giant Yandex has launched CatBoost, an open source machine learning service. The algorithm has already been integrated by the European Organization for Nuclear Research to analyze data from the Large Hadron Collider, the world’s most sophisticated experimental facility.

Machine learning helps make decisions by analyzing data and can be used in many different areas, including music choice and facial recognition. Yandex, one of Russia’s leading tech companies, has made its advanced machine learning algorithm, CatBoost, available free of charge for developers around the globe.

“This is the first Russian machine learning technology that’s an open source,” said Mikhail Bilenko, Yandex’s head of machine intelligence and research.

I called out the Russian origin of the CatBoost algorithm, not because I have any nationalistic tendencies, but because you can find frothing paranoids in U.S. government agencies and their familiars who do. In those cases, avoid CatBoost.

If you work in saner environments, or need to use categorical data (read: not converted to numbers by hand), give CatBoost a close look!
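For a sense of how categorical values can be used without converting them to numbers by hand, here is a minimal pure-Python sketch of ordered target statistics, the style of encoding described in CatBoost's documentation. This is not CatBoost's code or API. The function name and the `prior` and `weight` parameters are my own assumptions, and where CatBoost works over random permutations of the rows, this sketch simply takes the rows in the order given.

```python
from collections import defaultdict

def ordered_target_stats(categories, targets, prior=0.5, weight=1.0):
    """Encode each row's category using only the targets of earlier rows.

    Looking only at earlier rows keeps a row's own target out of its
    encoding. `prior` and `weight` are illustrative smoothing values.
    """
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running count of rows per category
    encoded = []
    for cat, y in zip(categories, targets):
        # Smoothed mean of the targets seen so far for this category.
        encoded.append((sums[cat] + weight * prior) / (counts[cat] + weight))
        sums[cat] += y
        counts[cat] += 1
    return encoded

colors = ["red", "blue", "red", "red", "blue"]
clicked = [1, 0, 1, 0, 1]
encoded = ordered_target_stats(colors, clicked)
print(encoded)
```

The first "red" and the first "blue" both fall back to the prior because nothing has been seen for them yet. Later rows drift toward the running mean for their category.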

Enjoy!

March 21, 2014

ROOT Files

Filed under: CERN,Dictionary,Files — Patrick Durusau @ 7:17 pm

ROOT Files

From the webpage:

Today, a huge amount of data is stored into files present on our PC and on the Internet. To achieve the maximum compression, binary formats are used, hence they cannot simply be opened with a text editor to fetch their content. Rather, one needs to use a program to decode the binary files. Quite often, the very same program is used both to save and to fetch the data from those files, but it is also possible (and advisable) that other programs are able to do the same. This happens when the binary format is public and well documented, but may happen also with proprietary formats that became a de facto standard. One of the most important problems of the information era is that programs evolve very rapidly, and may also disappear, so that it is not always trivial to correctly decode a binary file. This is often the case for old files written in binary formats that are not publicly documented, and is a really serious risk for the formats implemented in custom applications.

As a solution to these issues ROOT provides a file format that is a machine-independent compressed binary format, including both the data and its description, and provides an open-source automated tool to generate the data description (or “dictionary”) when saving data, and to generate C++ classes corresponding to this description when reading back the data. The dictionary is used to build and load the C++ code to load the binary objects saved in the ROOT file and to store them into instances of the automatically generated C++ classes.

ROOT files can be structured into “directories”, exactly in the same way as your operating system organizes the files into folders. ROOT directories may contain other directories, so that a ROOT file is more similar to a file system than to an ordinary file.

Amit Kapadia mentions ROOT files in his presentation at CERN on citizen science.

I have only just begun to read the documentation but wanted to pass this starting place along to you.

I don’t find the “machine-independent compressed binary format” argument all that convincing but apparently it has in fact worked for quite some time.

Of particular interest will be the data dictionary aspects of ROOT.
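To make the data dictionary idea concrete, here is a toy Python sketch of a self-describing binary container: the description travels in the same stream as the compressed data, and the reader uses nothing but that description to decode the records. This is emphatically not ROOT's on-disk format, and it returns plain dicts where ROOT generates C++ classes. The layout, the function names and the field list are all my own assumptions.

```python
import io
import json
import struct
import zlib

def write_described(stream, fields, rows):
    """Write a description (the "dictionary") followed by compressed records.

    `fields` is a list of (name, struct format code) pairs.
    """
    description = json.dumps({"fields": fields}).encode("utf-8")
    # "<" pins little-endian byte order and standard sizes on any machine.
    record = struct.Struct("<" + "".join(code for _, code in fields))
    payload = zlib.compress(b"".join(record.pack(*row) for row in rows))
    # Fixed-size header: length of the description, length of the payload.
    stream.write(struct.pack("<II", len(description), len(payload)))
    stream.write(description)
    stream.write(payload)

def read_described(stream):
    """Rebuild rows as dicts using only the description stored in the stream."""
    desc_len, payload_len = struct.unpack("<II", stream.read(8))
    fields = json.loads(stream.read(desc_len).decode("utf-8"))["fields"]
    record = struct.Struct("<" + "".join(code for _, code in fields))
    data = zlib.decompress(stream.read(payload_len))
    names = [name for name, _ in fields]
    return [dict(zip(names, values)) for values in record.iter_unpack(data)]

# Round trip through an in-memory buffer standing in for a file.
buffer = io.BytesIO()
write_described(buffer, [("event", "i"), ("energy", "d")],
                [(1, 13.5), (2, 7.25)])
buffer.seek(0)
rows = read_described(buffer)
print(rows)  # [{'event': 1, 'energy': 13.5}, {'event': 2, 'energy': 7.25}]
```

The reader never hard-codes the record layout, which is the property that matters when the program that wrote the file has long since disappeared.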

Other data and description capturing file formats?

November 26, 2011

INSPIRE

Filed under: CERN,INSPIRE — Patrick Durusau @ 8:01 pm

INSPIRE

From the webpage:

CERN, DESY, Fermilab and SLAC have built the next-generation High Energy Physics (HEP) information system, INSPIRE, which empowers scientists with innovative tools for successful research at the dawn of an era of new discoveries.

INSPIRE combines the successful SPIRES database content, curated at DESY, Fermilab and SLAC, with the Invenio digital library technology developed at CERN. INSPIRE is run by a collaboration of the four labs, and interacts closely with HEP publishers, arXiv.org, NASA-ADS, PDG, and other information resources.

INSPIRE represents a natural evolution of scholarly communication, built on successful community-based information systems, and provides a vision for information management in other fields of science.

INSPIRE builds on SPIRES’ expertise

  • Decades of trusted, curated content
  • Experience in managing a discipline’s wide information resources
  • Close relationship with the worldwide user community

What are the major innovations of INSPIRE?

  • Author disambiguation for high-quality profiles and improved search capabilities
  • Fulltext search and snippet display for access restricted content
  • Faster results
  • Variety of search and display options
  • Detailed record pages
  • Searchable fulltext for 5 years of arXiv content
  • Figures and searchable figure captions extracted from 5 years of arXiv articles
  • LHC experimental notes

What will be available soon?

  • Personalized features (bookshelves, author pages, paper claiming)
  • More APIs for third parties to build new tools
  • More historical content
  • Conference slides

Deeply cool digital library system from CERN.
