Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 31, 2013

Welcome to the Unified Astronomy Thesaurus!

Filed under: Astroinformatics,Thesaurus — Patrick Durusau @ 7:25 pm

Welcome to the Unified Astronomy Thesaurus!

From the webpage:

The Unified Astronomy Thesaurus (UAT) will be an open, interoperable and community-supported thesaurus which unifies the existing divergent and isolated Astronomy & Astrophysics thesauri into a single high-quality, freely-available open thesaurus formalizing astronomical concepts and their inter-relationships. The UAT builds upon the existing IAU Thesaurus with major contributions from the Astronomy portions of the thesauri developed by the Institute of Physics Publishing and the American Institute of Physics. We expect that the Unified Astronomy Thesaurus will be further enhanced and updated through a collaborative effort involving broad community participation.

While the AAS has assumed formal ownership of the UAT, the work will be available under a Creative Commons License, ensuring its widest use while protecting the intellectual property of the contributors. We envision that development and maintenance will be stewarded by a broad group of parties having a direct stake in it. This includes professional associations (IVOA, IAU), learned societies (AAS, RAS), publishers (IOP, AIP), librarians and other curators working for major astronomy institutes and data archives.

The main impetus behind the creation of a single thesaurus has been the wish to support semantic enrichment of the literature, but we expect that use of the UAT (along with other vocabularies and ontologies currently being developed in our community) will be much broader and will have a positive impact on the discovery of a wide range of astronomy resources, including data products and services.

Several thesauri are listed as resources at this site.

Certainly would make an interesting topic map project.

I first saw this at: Science Reference: A New Thesaurus Created for the Astronomy Community by Gary Price.

January 28, 2013

Planets, Stars and Stellar Systems: Volume 2: Astronomical Techniques, Software, and Data

Filed under: Astroinformatics — Patrick Durusau @ 1:17 pm

Planets, Stars and Stellar Systems: Volume 2: Astronomical Techniques, Software, and Data. (Amazon page)

A data mining book I am unlikely to ever see.

Why? The “discount” price at Amazon saves me $93.36. That should be a hint.

List price? $509.00, discount price: $415.64, for a 550 page book.

From the description:

This volume on “Astronomical Techniques, Software, and Data” edited by Howard E. Bond presents accessible review chapters on Astronomical Photometry, Astronomical Spectroscopy, Sky Surveys, Absolute Calibration of Spectrophotometric Standard Stars, Astronomical Polarimetry: polarized views of stars and planets, Infrared Astronomy Fundamentals, Techniques of Radio Astronomy, Radio and Optical Interferometry: Basic Observing Techniques and Data Analysis, Statistical Methods for Astronomy, Numerical Techniques in Astrophysics, Virtual Observatories, Data Mining, and Astroinformatics.

Given the cross-fertilization between fields on data mining techniques, Springer could profit from changing its “price by institutional subscriber base” policies.

But that would require marketing of titles to readers, not simply shipping them to customers who put them on shelves.

January 25, 2013

.Astronomy

Filed under: Astroinformatics,Graphics,Visualization — Patrick Durusau @ 8:16 pm

.Astronomy

From the “about” page:

The Internet provides an incredible platform for astronomers and astrophysical research. .Astronomy (pronounced ‘dot-astronomy’) aims to bring together an international community of astronomy researchers, developers, educators and communicators to showcase and build upon these many web-based projects, from outreach and education to research tools and data analysis.

.Astronomy events are held (almost) once a year. The most recent was in Heidelberg in July 2012. These meetings bring people together to talk, make and do cool stuff for just a few days. The events are focused, but informal, and encourage collaboration on new ideas that can benefit astronomy in a variety of ways.

Presentations from .Astronomy 4 and other resources.

For example: Julie Steele and Noah Iliinsky: How to be a Data Visualisation Star

Has a pointer to: Properties and Best Uses of Visual Encodings by Noah Iliinsky.

Be sure to grab Noah’s chart (PDF) and print it out (on a color printer).

Julie and Noah are the authors of:

Beautiful Visualization: Looking at Data Through the Eyes of Experts

Designing Data Visualizations

Highly recommended, along with the other presentations you will find here.

.Astronomy 5

Filed under: Astroinformatics,Conferences — Patrick Durusau @ 8:16 pm

Come to Cambridge For .Astronomy 5

From the post:

We’re happy to announce that you can now sign up for .Astronomy 5! Our fifth event will be hosted by Harvard’s Seamless Astronomy group at Microsoft’s NERD Center in Cambridge, MA, USA. Mark your diary, iCal, Google Calendar (or whatever system you’ve rigged up to wrangle Twitter into being your PA) for the date: September 16-18, 2013. We’ll be collecting together 50 attendees for a three-day conference, unconference and hack day: all about astronomy online!

Sign up will be slightly different this year, mainly to avoid a race to fill up limited spaces. We limit .Astronomy to roughly 50 people: a number that we find is large enough to inspire productive group work, and that everyone can contribute, but not so large that participants can hide in anonymity. So this year we’re opening up this sign up form today, and will keep it open until February. At that point we’ll pick 50 people based on the information on the forms received, to try and produce the most varied and awesome event yet. We want to ensure a good mix of new people and old hands – as well as good representation of all the different skills participants bring with them.

We’ll post further information about the event as we have it, such as the estimated registration fee (we aim to keep this low) and keynote speakers. If you have any questions about the signup process, then drop us a line. For updates, follow this site or keep an eye on the .Astronomy 5 information page at http://dotastronomy.com/events/five.

Jim Gray (MS) was reported to like astronomy data sets because they were big and free.

Are you interested in really “big data?”

This may be the conference for you.

January 21, 2013

New at Freebase

Filed under: Astroinformatics,Freebase — Patrick Durusau @ 7:29 pm

I saw a note at SemanticWeb.com about Freebase offering a new interface. Went to see.

Looked under astronomy, which had far fewer sub-topics than I would have imagined, and visited the entry for “star.”

“Star” reports:

A star is really meant to be a single stellar object, not just something that looks like a star from earth. However, in many cases, other objects, such as multi-star systems, were originally thought to be stars. Because people have historically believed these to be stars, they are typed as such, but they are also typed as what we now know them to be.

I understand the need to preserve prior “types” but that is a question of scope, not simply adding more types.

Moreover, if “star” means a “single stellar object,” then where do I put different classes of stars? Do they have occurrences too? Does that mean their occurrences get listed under “star” as well?
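One way to picture scope doing that work: record each type assertion together with the scope in which it holds, rather than piling all types into one flat list. A toy sketch (the object and type names are only illustrative, not Freebase’s actual schema):

```python
# One record, typed differently under different scopes.
record = {
    "name": "Albireo",
    "types": [
        {"type": "star",        "scope": "historical"},  # how it was classed
        {"type": "double star", "scope": "current"},     # what we now know
    ],
}

def types_in_scope(rec, scope):
    """Return only the types asserted under the given scope."""
    return [t["type"] for t in rec["types"] if t["scope"] == scope]

print(types_in_scope(record, "historical"))
print(types_in_scope(record, "current"))
```

The historical typing is preserved, but a query can ask for the current view without wading through superseded types.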

January 19, 2013

NASA Support – Dr. Kirk Borne of George Mason University

Filed under: Astroinformatics,Outlier Detection — Patrick Durusau @ 7:07 pm

The Arts and Entertainment Magazine (an unlikely source for me) published TAEM Interview with Dr. Kirk Borne of George Mason University, a delightful interview meant to generate support for NASA.

Of particular interest, Dr. Kirk Borne says:

My current research is focused on outlier detection, which I prefer to call Surprise Discovery – finding the unknown unknowns and the unexpected patterns in the data. These discoveries may reveal data quality problems (i.e., problems with the experiment or data processing pipeline), but they may also reveal totally new astrophysical phenomena: new types of galaxies or stars or whatever. That discovery potential is huge within the huge data collections that are being generated from the large astronomical sky surveys that are taking place now and will take place in the coming decades. I haven’t yet found that one special class of objects or new type of astrophysical process that will win me a Nobel Prize, but you never know what platinum-plated needles may be hiding in those data haystacks.

Topic maps are known for encoding knowns and known patterns in data.

How would you explore a topic map to find “…unknown unknowns and the unexpected patterns in the data?”

BTW, Dr. Borne invented the term “astroinformatics.”

January 18, 2013

Scalable Cross Comparison Service (version 1.1)

Filed under: Astroinformatics,BigData — Patrick Durusau @ 7:16 pm

Scalable Cross Comparison Service (version 1.1)

From the post:

The VAO has released a new version of the Scalable Cross Comparison (SCC) Service v1.1 on 02 January, 2013. SCC is a web-based application that performs spatial crossmatching between user source tables and on-line catalogs. New features of the service include:

  • Indexed cross-match candidate tables for very large and frequently used survey catalogs. New indexed catalogs include PPMXL, WISE, DENIS3, UCAC3, TYCHO2.
  • Interoperability of SCC with other VO tools and services via SAMP. With SAMP, tools can broadcast data from one tool to the next without the need to read or write files. Examples of other SAMP-enabled virtual observatory tools include DDT, Iris, Topcat, Vizier, and DS9. This is a beta release of interoperability in SCC.

Try it now at http://www.usvao.org/tools.

Astronomy (optical and radio) offered examples of big data integration before “big data” was a term.
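The positional crossmatching at the heart of a service like SCC can be illustrated with a deliberately naive sketch: match each source to the nearest catalog entry within an angular tolerance. The real service uses indexed candidate tables precisely to avoid the brute-force scan below; the coordinates and IDs here are made up.

```python
import math

def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees, via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def crossmatch(sources, catalog, radius_arcsec=2.0):
    """Match each (id, ra, dec) source to its nearest catalog entry
    within radius_arcsec.  O(N*M) brute force, for illustration only."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for sid, ra, dec in sources:
        best = None
        for cid, cra, cdec in catalog:
            d = ang_sep_deg(ra, dec, cra, cdec)
            if d <= radius_deg and (best is None or d < best[1]):
                best = (cid, d)
        if best:
            matches.append((sid, best[0], best[1] * 3600.0))  # sep in arcsec
    return matches

sources = [("s1", 150.0001, 2.2001), ("s2", 10.0, -5.0)]
catalog = [("c1", 150.0, 2.2), ("c2", 150.1, 2.3)]
print(crossmatch(sources, catalog))
```

Only “s1” falls within the 2-arcsecond tolerance of a catalog entry; “s2” goes unmatched, which is exactly the kind of result a crossmatch report surfaces.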

January 17, 2013

Virtual Astronomical Observatory – 221st AAS Meeting

Filed under: Astroinformatics,Data — Patrick Durusau @ 7:24 pm

The Virtual Astronomical Observatory (VAO) at the 221st AAS Meeting

From the post:

The VAO is funded to provide a computational infrastructure for virtual astronomy. When complete, it will enable astronomers to discover and access data in archives worldwide, allow them to share and publish datasets, and support analysis of data through an “ecosystem” of interoperable tools.

Nine of the twelve posters are available for download.

Even if you live in an area of severe light pollution, the heavens may only be an IP address away.

Enjoy!

January 8, 2013

Can Extragalactic Data Be Standardized? Part 2

Filed under: Astroinformatics,BigData,Parallel Programming — Patrick Durusau @ 11:44 am

Can Extragalactic Data Be Standardized? Part 2 by Ian Armas Foster.

From the post:

Last week, we profiled an effort headed by the Taiwanese Extragalactic Astronomical Data Center (TWEA-DC) to standardize astrophysical computer science.


Specifically, the objective laid out by the TWEA-DC team was to create a language specifically designed for far-reaching astronomy—a Domain Specific Language. This would create a standard environment from which software could be developed.

For the researchers at the TWEA-DC, one of the bigger issues lies in the software currently being developed for big data management. Sebastien Foucaud and Nicolas Kamennoff co-authored the paper alongside Yasuhiro Hashimoto and Meng-Feng Tsai, who are based in Taiwan, laying out the TWEA-DC. They argue that since parallel processing is a relatively recent phenomenon, many programmers have not been versed in how to properly optimize their software. Specifically, they go into how the developers are brought up in a world where computing power steadily increases.

Indeed, preparing a new generation of computer scientists and astronomers is a main focus of the data center that opened in 2010. “One of the major goals of the TWEA-DC,” the researchers say, “is to prepare the next generation of astronomers, who will have to keep up pace with the changing face of modern Astronomy.”

Standard environments for software are useful, so long as they are recognized as also being ephemeral.

What was the standard environment for software development in the 1960’s wasn’t the same as the 1980’s nor the 1980’s the same as today.

Along with temporary “standard environments,” we should also construct entrances into and be thinking about exits from those environments.

January 3, 2013

Can Extragalactic Data Be Standardized? [Heterogeneity, the default case?]

Filed under: Astroinformatics,BigData — Patrick Durusau @ 7:53 pm

Can Extragalactic Data Be Standardized? by Ian Armas Foster.

From the post:

While lacking the direct practical applications that the study of genomics offers, astronomy is one of the more compelling use cases among big data-related areas of academic research.

The wealth of stars and other astronomical phenomena that one can identify and classify provide an intriguing challenge. The long-term goal will be to eventually use the information from astronomical surveys in modeling the universe.

However, according to recent research written by French computer scientists Nicolas Kamennoff, Sebastien Foucaud, and Sebastien Reybier, the gradual decline of Moore’s Law and the resulting lack of computing power combined with the ever-expanding ability to see outside the Milky Way are creating a significant bottleneck in astronomical research. In particular, software has yet to catch up to strides made in parallel processing.

This article is the first of two focused around an ambitious-sounding institute known as the Taiwan Extragalactic Astronomical Data Center (TWEA-DC). Here, the researchers identified three problems they hope to solve through the TWEA-DC: misuse of resources, the existence of a heterogeneous software ecosystem, and data transfer.

I guess this counts as one of my more “theory” oriented posts on topic maps. 😉

Of particular interest for the recognition that heterogeneity isn’t limited to data. Heterogeneity exists between software systems as well.

Homogeneity, for both data and software, is an artifice constructed to make early digital computers possible.

Whether CS is now strong enough for the default case, heterogeneity of both data and software, remains to be seen.

(On TWEA-DC proper, see: TaiWan Extragalactic Astronomical Data Center — TWEA-DC (website))

December 16, 2012

Asterank: an Accurate 3D Model of the Asteroids in our Solar System

Filed under: Astroinformatics,Mapping,Maps — Patrick Durusau @ 9:02 pm

Asterank: an Accurate 3D Model of the Asteroids in our Solar System by Andrew Vande Moere.

From the post:

Asterank 3D Asteroid Orbit Space Simulation [asterank.com], developed by software engineer Ian Webster, is a 3D WebGL-based model of the first 5 planets and the 30 most valuable asteroids, together with their respective orbits in our inner solar system.

Asterank’s database contains the astronomically accurate locations, as well as some economic and scientific information, of over 580,000 asteroids in our solar system. Each asteroid is accompanied by its “Value of Materials”, in terms of the metals, volatile compounds, or water it seems to contain. The “Cost of Operations” provides a financial estimation of how much it would cost to travel to the asteroid and move the materials back to Earth.

Will you be ready as semantic diversity spreads from the Earth out into the Solar System?

December 11, 2012

The Theoretical Astrophysical Observatory

Filed under: Astroinformatics — Patrick Durusau @ 7:42 pm

The Theoretical Astrophysical Observatory by Darren Croton.

If you guessed from the title that the acronym would be “TAO,” take a point for your house.

This post is not going to be about CANDELS directly, but about work that, in the long run, could play an enormous part in helping CANDELS astronomers analyse and interpret their data.

At Swinburne University in Australia, myself and my group are developing a new tool, called the Theory Astrophysical Observatory (TAO), which will make access to cutting edge supercomputer simulations of galaxy formation almost trivial. TAO will put the latest theory data in to the “cloud” for use by the international astronomy community, plus add a number of science enhancing eResearch tools. It is part of a larger project funded by the Australian Government called the All Sky Virtual Observatory (ASVO).

TAO boasts a clean and intuitive web interface. It avoids the need to know a database query language (like SQL) by providing a custom point-and-click web-form to select virtual galaxies and their properties, which auto-generates the query code in the background. Query results can then be funneled through additional “modules” and sent to a local supercomputer for further processing and manipulation….
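The “auto-generates the query code in the background” step is easy to picture. A minimal sketch, assuming nothing about TAO’s actual implementation (the table name, column names, and filter values below are invented), might translate form selections into SQL like this:

```python
def build_query(table, properties, filters):
    """Translate point-and-click form selections into a SQL string.
    `filters` is a list of (column, operator, value) tuples; only a
    whitelisted set of comparison operators is accepted."""
    allowed_ops = {"<", "<=", ">", ">=", "=", "!="}
    cols = ", ".join(properties)
    clauses = []
    for col, op, val in filters:
        if op not in allowed_ops:
            raise ValueError(f"operator not allowed: {op}")
        clauses.append(f"{col} {op} {val!r}")
    sql = f"SELECT {cols} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql

# Hypothetical galaxy-catalog selection, for illustration:
q = build_query("galaxies",
                ["galaxy_id", "stellar_mass", "sfr"],
                [("redshift", "<", 0.1), ("stellar_mass", ">", 1e10)])
print(q)
```

A production service would validate column names against the schema and use the database driver’s own parameter binding, but the shape of the translation is the same: form state in, query text out.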

You may not have a local supercomputer today, but in a year or two? Maybe as accessible as Facebook is today. Hopefully more useful. 😉

Are you still designing for desk/laptop processing capabilities?

December 10, 2012

Tools for Data-Intensive Astronomy – a VO Community Day in Baltimore, MD (Archive)

Filed under: Astroinformatics,BigData — Patrick Durusau @ 7:19 pm

Tools for Data-Intensive Astronomy – a VO Community Day in Baltimore, MD (Archive)

In case you missed the original webcast, for your viewing pleasure:

2012 VAO (Virtual Astronomical Observatory) – Thursday Nov 29, 2012
Welcome, Overview, Objectives
Bob Hanisch, Matt Mountain  (Space Telescope Science Institute)
Science Capabilities of the VO
Joe Lazio (Jet Propulsion Laboratory)
Spectral Analysis and SEDs
Ivo Busko (Space Telescope Science Institute)
Data Discovery
Tom Donaldson (Space Telescope Science Institute)
CANDELS and the VO
Anton Koekemoer (Space Telescope Science Institute)
The VO and Python, VAO futures
Perry Greenfield  (Space Telescope Science Institute)
Hands-on Session, Q&A
Tom Donaldson, Bob Hanisch, Ivo Busko, Anton Koekemoer (Space Telescope Science Institute)

The academics would call it being “inter-disciplinary.”

I call it being “innovative and successful.”

December 7, 2012

Astronomy Resources [Zillman]

Filed under: Astroinformatics,Data — Patrick Durusau @ 6:38 pm

Astronomy Resources by Marcus P. Zillman.

From the post:

Astronomy Resources (AstronomyResources.info) is a Subject Tracer™ Information Blog developed and created by the Virtual Private Library™. It is designed to bring together the latest resources and sources on an ongoing basis from the Internet for astronomical resources which are listed below….

With some caveats, this may be of interest.

First, the level of content is uneven. It ranges from professional surveys (suitable for topic map explorations) to more primary/secondary education type materials. Nothing against the latter but the mix is rather jarring.

Second, I didn’t test every link, but, for example, the AstroGrid link points to a project that was completed two years ago (2010).

Just in case you stumble across any of the “white papers” at http://www.whitepapers.us/, also by Marcus P. Zillman, do verify resources before citing them to others.

December 6, 2012

VO Inside: CALIFA Survey Data Release

Filed under: Astroinformatics,BigData — Patrick Durusau @ 11:41 am

VO Inside: CALIFA Survey Data Release

From the post:

The Calar Alto Legacy Integral Field Area Survey (CALIFA) is observing approximately 600 galaxies in the local universe using 250 observing nights with the PMAS/PPAK integral field spectrophotometer, mounted on the Calar Alto 3.5 m telescope. The first data release occurred on the 1st of November 2012. This DR comprises 200 datacubes corresponding to 100 CALIFA objects (one per setup: V500 and V1200). The data have been fully reduced, quality tested, and are scientifically useful.

The CALIFA survey team provides information here about accessing data with Topcat and other VO Tools.

Something for the astronomer on your gift list, another big data set!

November 27, 2012

SunPy [Choosing Specific Subject Identity Issues]

Filed under: Astroinformatics,Subject Identity,Topic Maps — Patrick Durusau @ 10:57 am

SunPy: A Community Python Library for Solar Physics

From the homepage:

The SunPy project is an effort to create an open-source software library for solar physics using the Python programming language.

As you have seen in your own experience or read about in my other posting on astronomical data, like elsewhere, subject identity issues abound.

This is another area that may spark someone’s interest in using topic maps to mitigate specific subject identity issues.

“Specific subject identity issues” because the act of mitigation always creates more subjects which could be the sources of subject identity issues. It’s not a problem so long as you choose the issues most important to you.

If and when those other potential subject identity issues become relevant, they can be addressed later. The logic-based approach pretends such issues don’t exist at all. I prefer the former approach; it’s less fragile.

Sky Survey Data Lacks Standardization [Heterogeneous Big Data]

Filed under: Astroinformatics,BigData,Heterogeneous Data,Information Retrieval — Patrick Durusau @ 5:51 am

Sky Survey Data Lacks Standardization by Ian Armas Foster.

From the post:

The Sloan Digital Sky Survey is at the forefront of astronomical research, compiling data from observatories around the world in an effort to truly pinpoint where we lie on the universal map. In order to do that, they must aggregate data from several observatories across the world, an intensive data operation.

According to a report written by researchers at UCLA, even though the SDSS is a data intensive astronomical mapping survey, it has yet to lay down a standardized foundation for retrieving and storing scientific data.

Per sdss.org, the first two projects were responsible for observing “a quarter of the sky” and picking out nearly a million galaxies and over 100,000 quasars. The project started at the Apache Point observatory in New Mexico and has since grown to include 25 observatories across the globe. The SDSS gained recognition in 2009 with the Nobel Prize in physics awarded to the advancement of optical fibers and digital imaging detectors (or CCDs) that allowed the project to grow in scale.

The point is that the datasets that the scientists used seemed to be scattered. Some would come about through informal social contacts such as email while others would simply search for necessary datasets on Google. Further, once these datasets were found, there was even an inconsistency in how they were stored before they could be used. However, this may have had to do with the varying sizes of the sets and how quickly the researchers wished to use the data. The entire SDSS dataset consists of over 130 TB, according to the report, and that volume can be slightly unwieldy.

“Large sky surveys, including the SDSS, have significantly shaped research practices in the field of astronomy,” the report concluded. “However, these large data sources have not served to homogenize information retrieval in the field. There is no single, standardized method for discovering, locating, retrieving, and storing astronomy data.”

So, big data isn’t going to be homogeneous big data but heterogeneous big data.

That sounds like an opportunity for topic maps to me.

You?

November 24, 2012

Tools for Data-Intensive Astronomy – a VO Community Day [Webcast]

Filed under: Astroinformatics,BigData — Patrick Durusau @ 7:38 pm

Tools for Data-Intensive Astronomy – a VO Community Day in Baltimore, MD

Thursday, November 29, 2012
10AM-2PM
Location: Bahcall Auditorium, Space Telescope Science Institute

From the post:

Experts from the VAO will demonstrate tools and services for data-intensive astronomy in the context of a range of science use cases and tutorials including:

  • Data discovery and access
  • Catalog cross comparison
  • Constructing and modeling spectral energy distributions
  • Time series analysis tools
  • Distributed database queries
  • …and more

In the morning we will be showing a number of demonstrations of VO science applications and tools. Lunch will be provided for all participants and there will be informal discussions and Q&A over lunch. Afterwards, from ~12:45 to 2:00pm, there will be some hands-on time with some typical science use cases. You are welcome to bring your laptop and try things out for yourself.

Register now at usvao.org/voday@baltimore

This event will also be webcast live:

For video see: https://webcast.stsci.edu/webcast/
For audio: 1 877-951-4490, Passcode is 4015008

And I thought I would have to miss because of distance!

Thank goodness for webcasts!

It should have a warning label:

Warning: This webcast contains new or different ideas, which may result in questioning of current ideas or even having new ones. Viewer discretion is advised.

November 23, 2012

Data Mining and Machine Learning in Astronomy

Filed under: Astroinformatics,Data Mining,Machine Learning — Patrick Durusau @ 11:30 am

Data Mining and Machine Learning in Astronomy by Nicholas M. Ball and Robert J. Brunner. (International Journal of Modern Physics D, Volume 19, Issue 07, pp. 1049-1106 (2010).)

Abstract:

We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

At fifty-eight (58) pages and three hundred and seventy-five references, this is a great starting place to learn about data mining and machine learning from an astronomy perspective!

And should yield new techniques or new ways to apply old ones to your data, with a little imagination.

Dates from 2010, so word of more recent surveys is welcome!

…Knowledge Extraction From Complex Astronomical Data Sets

Filed under: Astroinformatics,BigData,Data Mining,Knowledge Discovery — Patrick Durusau @ 11:29 am

CLaSPS: A New Methodology For Knowledge Extraction From Complex Astronomical Data Sets by R. D’Abrusco, G. Fabbiano, G. Djorgovski, C. Donalek, O. Laurino and G. Longo. (R. D’Abrusco et al. 2012 ApJ 755 92 doi:10.1088/0004-637X/755/2/92)

Abstract:

In this paper, we present the Clustering-Labels-Score Patterns Spotter (CLaSPS), a new methodology for the determination of correlations among astronomical observables in complex data sets, based on the application of distinct unsupervised clustering techniques. The novelty in CLaSPS is the criterion used for the selection of the optimal clusterings, based on a quantitative measure of the degree of correlation between the cluster memberships and the distribution of a set of observables, the labels, not employed for the clustering. CLaSPS has been primarily developed as a tool to tackle the challenging complexity of the multi-wavelength complex and massive astronomical data sets produced by the federation of the data from modern automated astronomical facilities. In this paper, we discuss the applications of CLaSPS to two simple astronomical data sets, both composed of extragalactic sources with photometric observations at different wavelengths from large area surveys. The first data set, CSC+, is composed of optical quasars spectroscopically selected in the Sloan Digital Sky Survey data, observed in the x-rays by Chandra and with multi-wavelength observations in the near-infrared, optical, and ultraviolet spectral intervals. One of the results of the application of CLaSPS to the CSC+ is the re-identification of a well-known correlation between the αOX parameter and the near-ultraviolet color, in a subset of CSC+ sources with relatively small values of the near-ultraviolet colors. The other data set consists of a sample of blazars for which photometric observations in the optical, mid-, and near-infrared are available, complemented for a subset of the sources, by Fermi γ-ray data. 
The main results of the application of CLaSPS to such data sets have been the discovery of a strong correlation between the multi-wavelength color distribution of blazars and their optical spectral classification in BL Lac objects and flat-spectrum radio quasars, and a peculiar pattern followed by blazars in the WISE mid-infrared colors space. This pattern and its physical interpretation have been discussed in detail in other papers by one of the authors.
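The selection criterion is the interesting part: run several clusterings, then keep the one whose memberships best track a set of labels that were withheld from the clustering. A minimal sketch, using a simple purity score as a stand-in for the paper’s own quantitative correlation measure (the label values below are invented):

```python
from collections import Counter

def label_score(clusters, labels):
    """Purity-style score: fraction of points whose label matches the
    majority label of their cluster.  A stand-in for CLaSPS's own
    correlation measure, not its actual definition."""
    by_cluster = {}
    for c, l in zip(clusters, labels):
        by_cluster.setdefault(c, []).append(l)
    agree = sum(Counter(ls).most_common(1)[0][1] for ls in by_cluster.values())
    return agree / len(labels)

def best_clustering(candidates, labels):
    """Pick the candidate clustering that lines up best with the labels."""
    return max(candidates, key=lambda cl: label_score(cl, labels))

labels       = ["bllac", "bllac", "fsrq", "fsrq", "fsrq"]
clustering_a = [0, 0, 1, 1, 1]   # separates the two classes cleanly
clustering_b = [0, 1, 0, 1, 0]   # scrambles them
print(label_score(clustering_a, labels))  # perfect agreement -> 1.0
print(best_clustering([clustering_a, clustering_b], labels) is clustering_a)
```

The key point carries over from the sketch to the real method: the labels are never fed to the clustering algorithm itself, so agreement between clusters and labels is evidence of genuine structure rather than circularity.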

A new approach for mining “…correlations in complex and massive astronomical data sets produced by the federation of the data from modern automated astronomical facilities.”

Mining complex and massive data sets. I have heard that somewhere recently. I’m sure it will come back to me.

First Light for the Millennium Run Observatory

Filed under: Astroinformatics,Data Mining,Simulations — Patrick Durusau @ 11:29 am

First Light for the Millennium Run Observatory by Cmarchesin.

From the post:

The famous Millennium Run (MR) simulations now appear in a completely new light – literally. The project, led by Gerard Lemson of the MPA and Roderik Overzier of the University of Texas, combines detailed predictions from cosmological simulations with a virtual observatory in order to produce synthetic astronomical observations. In analogy to the moment when newly constructed astronomical observatories receive their “first light”, the Millennium Run Observatory (MRObs) has produced its first images of the simulated universe. These virtual observations allow theorists and observers to analyse the purely theoretical data in exactly the same way as they would purely observational data. Building on the success of the Millennium Run Database, the simulated observations are now being made available to the wider astronomical community for further study. The MRObs browser – a new online tool – allows users to explore the simulated images and interact with the underlying physical universe as stored in the database. The team expects that the advantages offered by this approach will lead to a richer collaboration between theoretical and observational astronomers.

At least with simulated observations, there is no need to worry about cloudy nights. 😉

Interesting in its own right but also as an example of yet another tool for data mining, that of simulation.

Not in the sense of generating “test” data but of deliberately altering data and then measuring the impact of the alterations on data mining tools.

Quite possibly in a double blind context where only some third party knows which data sets were “altered” until all tests have been performed.

Millennium Run Observatory Web Portal and access to the MRObs browser

Tera-scale Astronomical Data Analysis and Visualization

Filed under: Astroinformatics,BigData,Data Analysis,Visualization — Patrick Durusau @ 11:27 am

Tera-scale Astronomical Data Analysis and Visualization by A. H. Hassan, C. J. Fluke, D. G. Barnes, V. A. Kilborn.

Abstract:

We present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of (nearly) terabyte (TB)-sized 3-dimensional images. Using a cluster of 96 GPUs, we demonstrate for a 0.5 TB image: (1) volume rendering using an arbitrary transfer function at 7–10 frames per second; (2) computation of basic global image statistics such as the mean intensity and standard deviation in 1.7 s; (3) evaluation of the image histogram in 4 s; and (4) evaluation of the global image median intensity in just 45 s. Our measured results correspond to a raw computational throughput approaching one teravoxel per second, and are 10–100 times faster than the best possible performance with traditional single-node, multi-core CPU implementations. A scalability analysis shows the framework will scale well to images sized 1 TB and beyond. Other parallel data analysis algorithms can be added to the framework with relative ease, and accordingly, we present our framework as a possible solution to the image analysis and visualization requirements of next-generation telescopes, including the forthcoming Square Kilometre Array pathfinder radiotelescopes.

Looks like the original “big data” folks (astronomy) are moving up to the analysis of near-terabyte-sized images.
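A rough back-of-the-envelope check of the “one teravoxel per second” figure in the abstract, assuming 4-byte (single-precision) voxels; the abstract does not state the voxel size, so that part is my assumption:

```python
# Back-of-the-envelope check of the "one teravoxel per second" figure,
# assuming 32-bit (4-byte) voxels -- an assumption, not stated in the abstract.
image_bytes = 0.5e12            # the 0.5 TB demonstration image
bytes_per_voxel = 4             # assumed single-precision floats
voxels = image_bytes / bytes_per_voxel          # 0.125 teravoxels
for fps in (7, 10):
    throughput = voxels * fps / 1e12            # teravoxels per second
    print(f"{fps} fps -> {throughput:.2f} teravoxels/s")
```

At the quoted 7–10 frames per second that lands between roughly 0.9 and 1.25 teravoxels per second, consistent with “approaching one teravoxel per second.”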

A glimpse of data and techniques that are rapidly approaching.

I first saw this in a tweet by Stefano Bertolo.

November 22, 2012

VAO Software Release: Data Discovery Tool (version 1.4)

Filed under: Astroinformatics,BigData — Patrick Durusau @ 6:07 am

VAO Software Release: Data Discovery Tool (version 1.4)

From the post:

The VAO has released a new version of the Data Discovery Tool (v1.4) on October 11, 2012. With this tool you can find datasets from thousands of astronomical collections known to the VO and over wide areas of the sky. This includes thousands of astronomical collections – photometric catalogs and images – and archives around the world.

New features of the Data Discovery Tool include:

  • New tooltips describing each data field in search results
  • Improved display and manipulation of numeric filters
  • Automatic color assignment for overlay graphics in the all-sky viewer

Try it now at http://www.usvao.org/tools.

From a community that used “big data” long before it became a buzzword for IT marketers.

November 2, 2012

CALIFA First Data Release

Filed under: Astroinformatics,BigData,Data — Patrick Durusau @ 6:35 am

CALIFA (Calar Alto Legacy Integral Field spectroscopy Area survey) First Data Release

From the webpage:

The Calar Alto Legacy Integral Field Area survey is one of the largest IFS surveys performed to date. At its completion it will comprise 600 galaxies, observed with the PMAS spectrograph in the PPAK mode, covering the full spatial extent of these galaxies up to two effective radii. The wavelength range between 3700 and 7500 Å is sampled with two spectroscopic configurations, a high resolution mode (V1200, R~1700, 3700-4200 Å), and a low resolution mode (V500, R~850, 3750-7500 Å). A detailed explanation of the survey is given in the CALIFA Presentation Article (Sánchez et al. 2012).

The first CALIFA Data Release (DR1) provides to the public the fully reduced and quality control tested datacubes of 100 objects in both setups (V500 and V1200). Each datacube contains ~2000 individual spectra, thus in total this DR comprises ~400,000 individual spectra. The details of the data included in this DR are described in the CALIFA DR1 Article (Husemann et al. 2012). The complete list of the DR1 objects for which we deliver data can be found in the following webpage.

The main characteristics of the galaxies included in the full CALIFA mother sample, a subset of which are delivered in this DR, will be given in the CALIFA sample characterization article (Walcher et al. in prep.). This article will provide detailed information of the photometric, morphological and environmental properties of the galaxies, and a description of the statistical properties of the full sample.

The non-technical explanation:

Galaxies are the large-scale building blocks of the cosmos. Their visible ingredients include between millions and hundreds of billions of stars as well as clouds of gas and dust. “Understanding the dynamical processes within and between galaxies that have shaped the way they are today is a key part of understanding our wider cosmic environment.”, explains Dr. Glenn van de Ven, a member of the managing board of the CALIFA survey and staff scientist at the Max Planck Institute for Astronomy (MPIA).

Traditionally, when it came to galaxies, astronomers had to choose between different observational techniques. They could, for instance, take detailed images with astronomical cameras showing the various features of a galaxy as well as their spatial relations, but they could not at the same time perform detailed analyses of the galaxy’s light, that is “obtain a galaxy spectrum”. Taking spectra required a different kind of instrument known as a spectrograph, which, as a downside, would only provide very limited information about the galaxy’s spatial structure.

An increasingly popular observational technique, integral field spectroscopy (IFS), combines the best of both worlds. The IFS instrument PMAS mounted at the Calar Alto Observatory’s 3.5 metre telescope uses 350 optical fibres to guide light from a corresponding number of different regions of a galaxy image into a spectrograph. In this way, astronomers are not restricted to analysing the galaxy as a whole – they can analyse the light coming from many different specific parts of a galaxy. The result is detailed maps of galaxy properties such as their chemical composition, and of the motions of their stars and their gas.

For the CALIFA survey, more than 900 galaxies in the local Universe, namely at distances between 70 and 400 million light years from the Milky Way, were selected from the northern sky to fully fit into the field-of-view of PMAS. They include all possible types, from roundish elliptical to majestic spiral galaxies, similar to our own Milky Way and the Andromeda galaxy. The allocated observation time will allow for around 600 of the pre-selected galaxies to be observed in depth.

From: CALIFA survey publishes intimate details of 100 galaxies

Either way, I thought you would find it an interesting “big data” set to consider over the weekend.

Or if you are an amateur astronomer with a cloudy weekend, something to expand your horizons.
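A quick sanity check of the spectra count quoted in the DR1 description (100 objects, each delivered in both the V500 and V1200 setups, at ~2000 spectra per datacube):

```python
# Sanity check of the CALIFA DR1 spectra count quoted above.
objects = 100
setups = 2                    # V500 and V1200 datacubes per object
spectra_per_cube = 2000       # "~2000 individual spectra" per datacube
total = objects * setups * spectra_per_cube
print(total)                  # matches the "~400,000 individual spectra" in the DR1 text
```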

November 1, 2012

VO Days (current and previous)

Filed under: Astroinformatics — Patrick Durusau @ 6:43 pm

VO Days (current and previous)

VO (virtual observatory) days resource at the US Virtual Astronomical Observatory.

Prior days cover a wide variety of data resources and tools for research astronomers.

If you are interested in a community that was doing “big data” before the term was popular, astronomy is a good place to start.

October 30, 2012

Kepler Telescope Data Release: The Power of Sharing Data

Filed under: Astroinformatics,BigData — Patrick Durusau @ 8:21 am

Additional Kepler Data Now Available to All Planet Hunters

From the post:

The Space Telescope Science Institute in Baltimore, Md., is releasing 12 additional months worth of planet-searching data meticulously collected by one of the most prolific planet-hunting endeavors ever conceived, NASA’s Kepler Mission.

As of Oct. 28, 2012, every observation from the extrasolar planet survey made by Kepler since its launch in 2009 through June 27, 2012, is available to scientists and the public. This treasure-trove contains more than 16 terabytes of data and is housed at the Barbara A. Mikulski Archive for Space Telescopes, or MAST, at the Space Telescope Science Institute. MAST is a huge data archive containing astronomical observations from 16 NASA space astronomy missions, including the Hubble Space Telescope. It is named in honor of Maryland U.S. Senator Barbara A. Mikulski.

Over the past three years the Kepler science team has discovered 77 confirmed planets and 2,321 planet candidates. All of Kepler’s upcoming observations will no longer be exclusive to the Kepler science team, its guest observers, and its asteroseismology consortium members and will be available immediately to the public.

…

In addition to yielding evidence for planets circling some of the target stars, the Kepler data also reveal information about the behavior of many of the other stars being monitored. Kepler astronomers have discovered star spots, flaring stars, double-star systems, and “heartbeat” stars, a class of eccentric binary systems undergoing dynamic tidal distortions and tidally induced pulsations.

There is far more data in the Kepler archives than astronomers have time to analyze quickly. Avid volunteer astronomers are invited to make Kepler discoveries by perusing the archive through a website called “Planet Hunters,” (http://www.planethunters.org/). A tutorial informs citizen scientists how to analyze the Kepler data, so they may assist with the research. Visitors to the website cannot actually see individual planets. Instead, they look for the effects of planets as they sweep across the face of their parent stars. Volunteer scientists have analyzed over 14 million observations so far. Just last week citizen scientists announced the discovery of the first planet to be found in a quadruple-star system.

The additional analysis by volunteer scientists, especially: “the first planet to be found in a quadruple-star system,” illustrates the power of sharing “big data.”
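For the curious, the transit method those volunteers apply can be caricatured in a few lines. This is a toy dip-finder run on synthetic data, not the Planet Hunters pipeline; the threshold and light curve are invented for illustration:

```python
def find_dips(flux, threshold=0.99):
    """Flag indices where normalized brightness drops below the threshold --
    the tell-tale signature of a possible planetary transit."""
    baseline = sorted(flux)[len(flux) // 2]       # median as out-of-transit level
    return [i for i, f in enumerate(flux) if f / baseline < threshold]

# Toy light curve: flat at 1.0 with a 2%-deep, 5-sample transit every 50 samples.
flux = [1.0] * 200
for start in (40, 90, 140, 190):
    for i in range(start, min(start + 5, 200)):
        flux[i] = 0.98

dips = find_dips(flux)
print(len(dips))   # number of samples flagged as in-transit
```

The periodic spacing of the flagged samples is what distinguishes a candidate planet from a one-off glitch in the star’s brightness.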

October 23, 2012

When V = Volume [HST Telemetry Data]

Filed under: Astroinformatics,BigData — Patrick Durusau @ 3:54 am

Personal PCs have TB disk storage. A TB of RAM isn’t far behind. Multi-TBs of both are available in high-end appliances.

One solution when v = volume is to pump up the storage volume. But you can always find data sets that are “big data” for your current storage.

Fact is, “big data” has always outrun current storage. The question of how to store more data than convenient has been asked and answered before. I encountered one of those answers last night.

The abstract to the paper reads:

The Hubble Space Telescope (HST) generates on the order of 7,000 telemetry values, many of which are sampled at 1Hz, and with several hundred parameters being sampled at 40Hz. Such data volumes would quickly tax even the largest of processing facilities. Yet the ability to access the telemetry data in a variety of ways, and in particular, using ad hoc (i.e., no a priori fixed) queries, is essential to assuring the long term viability and usefulness of this instrument. As part of the recent NASA initiative to re-engineer HST’s ground control systems, a concept arose to apply newly available data warehousing technologies to this problem. The Space Telescope Science Institute was engaged to develop a pilot to investigate the technology and to create a proof-of-concept testbed that could be demonstrated and evaluated for operational use. This paper describes this effort and its results.

The authors framed their v = volume problem as:

Then there’s the sheer volume of the telemetry data. At its nominal format and rate, the HST generates over 3,000 monitored samples per second. Tracking each sample as a separate record would generate over 95 giga-records/year, or assuming a 16 year Life-of-Mission (LOM), 1.5 tera-records/LOM. Assuming a minimal 20 byte record per transaction yields 1.9 terabytes/year or 30 terabytes/LOM. Such volumes are supported by only the most exotic and expensive custom database systems made.

We may smile at the numbers now but this was 1998. As always, solutions were needed in the near term, not in a decade or two.
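The quoted figures are easy to verify (they round their numbers, so ours land slightly off theirs):

```python
# Reproducing the paper's volume estimates from its own inputs.
SECONDS_PER_YEAR = 86_400 * 365
samples_per_sec = 3_000          # "over 3,000 monitored samples per second"
bytes_per_record = 20            # "a minimal 20 byte record"
lom_years = 16                   # Life-of-Mission

records_per_year = samples_per_sec * SECONDS_PER_YEAR          # ~95 giga-records
tb_per_year = records_per_year * bytes_per_record / 1e12       # ~1.9 TB/year
tb_per_lom = tb_per_year * lom_years                           # ~30 TB/LOM

# After their change-capture and averaging reductions: 250 records/sec.
reduced_tb_per_lom = 250 * bytes_per_record * SECONDS_PER_YEAR * lom_years / 1e12
print(f"raw: {tb_per_lom:.1f} TB/LOM, reduced: {reduced_tb_per_lom:.1f} TB/LOM")
```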

The authors did find a solution. Their v = 30 terabytes/LOM was reduced to v = 2.5 terabytes/LOM.

In the author’s words:

By careful study of the data, we discovered two properties that could significantly reduce this volume. First, instead of capturing each telemetry measurement, by only capturing when the measurement changed value – we could reduce the volume by almost 3-to-1. Second, we recognized that roughly 100 parameters changed most often (i.e., high frequency parameters) and caused the largest volume of the “change” records. By averaging these parameters over some time period, we could still achieve the necessary engineering accuracy while again reducing the volume of records. In total, we reduced the volume of data down to a reasonable 250 records/sec or approximately 2.5 terabytes/LOM.

Two obvious lessons for v = volume cases:

  • Capture only changes in values
  • Capture average for rapidly changing values over time (if meets accuracy requirements)

Less obvious lesson:

  • Study data carefully to understand its properties relative to your requirements.
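Both of the obvious lessons are easy to demonstrate on synthetic telemetry. This is an illustration of the two techniques, not the authors’ actual implementation:

```python
from statistics import mean

def changes_only(samples):
    """Keep a (time, value) sample only when its value differs from the previous one."""
    kept, prev = [], object()
    for t, v in samples:
        if v != prev:
            kept.append((t, v))
            prev = v
    return kept

def window_average(samples, window=10):
    """Average a high-frequency parameter over fixed-size windows of samples."""
    out = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        out.append((chunk[0][0], mean(v for _, v in chunk)))
    return out

# A slowly varying parameter: long runs of identical values compress dramatically.
slow = [(t, 20 + (t // 50)) for t in range(200)]
print(len(changes_only(slow)))                # far fewer than 200 records

# A noisy high-frequency parameter: averaging trades time resolution for volume.
fast = [(t, 100 + (t % 7)) for t in range(200)]
print(len(window_average(fast, window=10)))   # 200 samples down to 20 records
```

Whether window averaging is acceptable depends, as the authors note, on the engineering accuracy you must preserve.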

Studying your data, and capturing what you learn about it, will benefit you and subsequent researchers working with the same data.

Whether your v = volume is the same as mine or not.


Quotes are from: “A Queriable Repository for HST Telemetry Data, a Case Study in using Data Warehousing for Science and Engineering” by Joseph A. Pollizzi, III and Karen Lezon, Astronomical Data Analysis Software and Systems VII, ASP Conference Series, Vol. 145, 1998, Editors: R. Albrecht, R. N. Hook and H. A. Bushouse, pp.367-370.

There are other insights and techniques of interest in this article but I leave them for another post.

October 1, 2012

PDS – Planetary Data System [The Mother Lode]

Filed under: Astroinformatics,Data — Patrick Durusau @ 4:35 pm

PDS – Planetary Data System

From the webpage:

The PDS archives and distributes scientific data from NASA planetary missions, astronomical observations, and laboratory measurements. The PDS is sponsored by NASA’s Science Mission Directorate. Its purpose is to ensure the long-term usability of NASA data and to stimulate advanced research

Tools, data, guides, etc.

Quick searches include:

  • Mercury
  • Venus
  • Mars
  • Jupiter
  • Saturn
  • Uranus, Neptune, Pluto
  • Rings
  • Asteroids
  • Comets
  • Planetary Dust
  • Earth’s Moon
  • Solar Wind

The ordering here makes a little more sense to me. What about you?

A nice way to teach scientific, mathematical and computer literacy without making it seem like work. 😉

Planetary Data System – Geosciences Node

Filed under: Astroinformatics,Data,Geographic Data — Patrick Durusau @ 3:22 pm

Sounds like SciFi, yes? SciFi? No!

After seeing Google add some sea bed material to Google Maps, I started to wonder about radar-based maps of other places. Like the Moon.

I remember the excitement Ranger 7 images generated. And that in grainy newspaper reproductions.

With just a little searching, I came across the PDS (Planetary Data System) Geosciences Node (Washington University in St. Louis).

From the web page:

The Geosciences Node of NASA’s Planetary Data System (PDS) archives and distributes digital data related to the study of the surfaces and interiors of terrestrial planetary bodies. We work directly with NASA missions to help them generate well-documented, permanent data archives. We provide data to NASA-sponsored researchers along with expert assistance in using the data. All our archives are online and available to the public to download free of charge.

Which includes:

  • Mars
  • Venus
  • Mercury
  • Moon
  • Earth (test data for other planetary surfaces)
  • Asteroids
  • Gravity Models

Even after checking the FAQ, I can’t explain the ordering of these entries. Order from the Sun doesn’t work. Neither does order or distance from Earth. Nor alphabetical sort order. Suggestions?

In any event, enjoy the data set!

September 12, 2012

Pushing Parallel Barriers Skyward (Subject Identity at 1EB/year)

Filed under: Astroinformatics,BigData,Subject Identity — Patrick Durusau @ 5:50 pm

Pushing Parallel Barriers Skyward by Ian Armas Foster

From the post:

As much data as there exists on the planet Earth, the stars and the planets that surround them contain astronomically more. As we discussed earlier, Peter Nugent and the Palomar Transient Factory are using a form of parallel processing to identify astronomical phenomena.

Some researchers believe that parallel processing will not be enough to meet the huge data requirements of future massive-scale astronomical surveys. Specifically, several researchers from the Korea Institute of Science and Technology Information including Jaegyoon Hahm along with Yonsei University’s Yong-Ik Byun and the University of Michigan’s Min-Su Shin wrote a paper indicating that the future of astronomical big data research is brighter with cloud computing than parallel processing.

Parallel processing is holding its own at the moment. However, when these sky-mapping and phenomena-chasing projects grow significantly more ambitious by the year 2020, parallel processing will have no hope.

How ambitious are these future projects? According to the paper, the Large Synoptic Survey Telescope (LSST) will generate 75 petabytes of raw plus catalogued data for its ten years of operation, or about 20 terabytes a night. That pales in comparison to the Square Kilometer Array, which is projected to archive in one year 250 times the amount of information that exists on the planet today.

“The total data volume after processing (the LSST) will be several hundred PB, processed using 150 TFlops of computing power. Square Kilometer Array (SKA), which will be the largest in the world radio telescope in 2020, is projected to generate 10-100PB raw data per hour and archive data up to 1EB every year.”
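The quoted rates are easy to put side by side. Treating a “night” as one per calendar day (an approximation; real observing schedules differ):

```python
# Cross-checking the LSST figures quoted above.
lsst_total_pb = 75
years = 10
tb_per_night = lsst_total_pb * 1000 / (years * 365)   # PB -> TB, spread over nights
print(f"LSST: ~{tb_per_night:.1f} TB per night")      # close to the quoted 20 TB/night

# SKA at the projected 1 EB/year archive rate:
eb_per_year = 1
pb_per_day = eb_per_year * 1000 / 365                 # EB -> PB, per day
print(f"SKA: ~{pb_per_day:.1f} PB per day")
```

An exabyte a year works out to several petabytes archived every single day, which frames the subject identity question below.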

Beyond storage/processing requirements, how do you deal with subject identity at 1EB/year?

Changing subject identity that is.

People are as inconstant with subject identity as they are with marital fidelity. If they do that well.

Now spread that over decades or centuries of research.

Does anyone see a problem here?

