Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 15, 2012

Mining the astronomical literature

Filed under: Astroinformatics,Data Mining — Patrick Durusau @ 1:58 pm

Mining the astronomical literature (A clever data project shows the promise of open and freely accessible academic literature) by Alasdair Allan.

From the post:

There is a huge debate right now about making academic literature freely accessible and moving toward open access. But what would be possible if people stopped talking about it and just dug in and got on with it?

NASA’s Astrophysics Data System (ADS), hosted by the Smithsonian Astrophysical Observatory (SAO), has quietly been working away since the mid-’90s. Without much, if any, fanfare amongst the other disciplines, it has moved astronomers into a world where access to the literature is just a given. It’s something they don’t have to think about all that much.

The ADS service provides access to abstracts for virtually all of the astronomical literature. But it also provides access to the full text of more than half a million papers, going right back to the start of peer-reviewed journals in the 1800s. The service has links to online data archives, along with reference and citation information for each of the papers, and it’s all searchable and downloadable.

(graphic omitted)

The existence of the ADS, along with the arXiv pre-print server, has meant that most astronomers haven’t seen the inside of a brick-built library since the late 1990s.

It also makes astronomy almost uniquely well placed for interesting data mining experiments, experiments that hint at what the rest of academia could do if they followed astronomy’s lead. The fact that the discipline’s literature has been scanned, archived, indexed and catalogued, and placed behind a RESTful API makes it a treasure trove, both for hypothesis generation and sociological research.

That’s the trick isn’t it? “…if they followed astronomy’s lead.”

The technology used by the astronomical community has been equally available to other scientific, technical, medical and humanities disciplines.

Instead of ADS, for example, the humanities have JSTOR. JSTOR is supported by funds that originate with the public but the public has no access.

An example of how a data project reflects the character of the community that gave rise to it.

Astronomers value sharing of information and data, therefore their projects reflect those values.

Other projects reflect other values.

Not a question of technology but one of fundamental values.
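For anyone curious what "placed behind a RESTful API" looks like in practice, here is a minimal sketch of building a search request. It assumes the present-day ADS search endpoint and Bearer-token header as I understand them, not the 2012 interface the quoted post describes. The helper names and the token value are placeholders of mine.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the current ADS search endpoint; check the ADS documentation before relying on it.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_ads_request(query, token, fields=("bibcode", "title", "year"), rows=5):
    """Build (but do not send) a search request; helper name is hypothetical."""
    params = urllib.parse.urlencode(
        {"q": query, "fl": ",".join(fields), "rows": rows}
    )
    request = urllib.request.Request(f"{ADS_SEARCH_URL}?{params}")
    # The API expects a personal token; "token" here is whatever your account issues.
    request.add_header("Authorization", f"Bearer {token}")
    return request


def search_ads(query, token):
    """Send the request and return the list of matching records (needs network access)."""
    with urllib.request.urlopen(build_ads_request(query, token)) as response:
        return json.load(response)["response"]["docs"]


# Building the request is safe to do offline; nothing is sent here.
example = build_ads_request("supernova", "YOUR-TOKEN-HERE")
print(example.full_url)
```

Calling search_ads() with a real token would return a list of dictionaries, one per paper, limited to the requested fields.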

August 10, 2012

First BOSS Data: 3-D Map of 500,000 Galaxies, 100,000 Quasars

Filed under: Astroinformatics,Data,Science — Patrick Durusau @ 9:02 am

First BOSS Data: 3-D Map of 500,000 Galaxies, 100,000 Quasars

From the post:

The Third Sloan Digital Sky Survey (SDSS-III) has issued Data Release 9 (DR9), the first public release of data from the Baryon Oscillation Spectroscopic Survey (BOSS). In this release BOSS, the largest of SDSS-III’s four surveys, provides spectra for 535,995 newly observed galaxies, 102,100 quasars, and 116,474 stars, plus new information about objects in previous Sloan surveys (SDSS-I and II).

“This is just the first of three data releases from BOSS,” says David Schlegel of the U.S. Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab), an astrophysicist in the Lab’s Physics Division and BOSS’s principal investigator. “By the time BOSS is complete, we will have surveyed more of the sky, out to a distance twice as deep, for a volume more than five times greater than SDSS has surveyed before — a larger volume of the universe than all previous spectroscopic surveys combined.”

Spectroscopy yields a wealth of information about astronomical objects including their motion (called redshift and written “z”), their composition, and sometimes also the density of the gas and other material that lies between them and observers on Earth. The BOSS spectra are now freely available to a public that includes amateur astronomers, astronomy professionals who are not members of the SDSS-III collaboration, and high-school science teachers and their students.

The new release lists spectra for galaxies with redshifts up to z = 0.8 (roughly 7 billion light years away) and quasars with redshifts between z = 2.1 and 3.5 (from 10 to 11.5 billion light years away). When BOSS is complete it will have measured 1.5 million galaxies and at least 150,000 quasars, as well as many thousands of stars and other “ancillary” objects for scientific projects other than BOSS’s main goal.

For data access, software tools, tutorials, etc., see: http://sdss3.org/

Interesting data set but also instructive for the sharing of data and development of tools for operations on shared data. You don’t have to have a local supercomputer to process the data. Dare I say a forerunner of the “cloud?”

Be the alpha geek at your local astronomy club this weekend!
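If you want to check the redshift-to-distance figures in the release for yourself, here is a rough sketch. It assumes the "light years away" numbers are light travel times in a flat Lambda-CDM universe, and the values H0 = 70 km/s/Mpc and a matter density of 0.3 are my assumptions, not anything stated by the BOSS team.

```python
import math

# 1/H0 expressed in billions of years when H0 = 100 km/s/Mpc.
HUBBLE_TIME_GYR_AT_H0_100 = 9.778


def lookback_time_gyr(z, h0=70.0, omega_m=0.3, steps=10_000):
    """Light travel time in Gyr for a flat Lambda-CDM universe (midpoint rule)."""
    omega_lambda = 1.0 - omega_m  # flatness assumption
    hubble_time = HUBBLE_TIME_GYR_AT_H0_100 * 100.0 / h0
    dz = z / steps
    total = 0.0
    for i in range(steps):
        zi = (i + 0.5) * dz
        e_z = math.sqrt(omega_m * (1.0 + zi) ** 3 + omega_lambda)
        total += dz / ((1.0 + zi) * e_z)
    return hubble_time * total


for z in (0.8, 2.1, 3.5):
    print(f"z = {z}: about {lookback_time_gyr(z):.1f} billion years")
```

With those assumed parameters the results land close to the figures quoted in the release. Different cosmological parameters shift them by a few percent.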

August 6, 2012

A Big Data Revolution in Astrophysics

Filed under: Astroinformatics,BigData — Patrick Durusau @ 3:34 pm

A Big Data Revolution in Astrophysics by Ian Armas Foster.

Ian writes:

Humanity has been studying the stars for as long as it has been able to gaze at them. The study of stars has led to one revelation after another; that the planet is round, that we are not the center of the universe, and has also spawned Einstein’s general theory of relativity.

As more powerful telescopes are developed, more is learned about the wild happenings in space, including black holes, binary star systems, the movement of galaxies, and even the detection of the Cosmic Microwave Background, which may hint at the beginnings of the universe.

However, all of these discoveries were made relatively slowly, relying on the relaying of information to other stations whose observatories may not be active for several hours or even days—a process that carries a painful amount of time between image and retrieval and potential discovery recognition.

Solving these problems would be huge for astrophysics. According to Peter Nugent, Senior Staff Scientist of Berkeley’s National Laboratory, big data is on its way to doing just that. Nugent has been the expert voice on this issue following his experiences with an ambitious project known as the Palomar Transient Factory.

It’s a good post and is likely to get you interested in astronomical (both senses) data problems.

Quibble: Why no links to the Palomar Transient Factory? This happens too often at many sites to be an oversight. We are all writing in hyperlink-capable media. Yes? Why the poverty of hyperlinks?

BTW:

Palomar Transient Factory, and

Access public spectra (WISEASS)

I don’t mind if you visit other sites. I write to facilitate your use of resources on the WWW. Maybe that’s the difference.

July 29, 2012

AstroPython

Filed under: Astroinformatics,Python — Patrick Durusau @ 3:09 pm

AstroPython

From the webpage:

The purpose of this web site is to act as a community knowledge base for performing astronomy research with Python. It provides lists of useful resources, a forum for general discussion, advice, or relevant news items, collecting users’ code snippets or scripts, and longer tutorials on specific topics. The topics within these pages are presented in a list view with the ability to sort by date or topic. A traditional “blog” view of the most recently posted topics is visible from the site Home page.

Along with the other astronomy applications I have mentioned this weekend I thought you might find this useful.

Skills with Python, data processing and subject identification/mapping skills transfer across disciplines.

SAOImage DS9

Filed under: Astroinformatics,Image Processing — Patrick Durusau @ 2:00 pm

SAOImage DS9

From the webpage:

SAOImage DS9 is an astronomical imaging and data visualization application. DS9 supports FITS images and binary tables, multiple frame buffers, region manipulation, and many scale algorithms and colormaps. It provides for easy communication with external analysis tasks and is highly configurable and extensible via XPA and SAMP.

DS9 is a stand-alone application. It requires no installation or support files. All versions and platforms support a consistent set of GUI and functional capabilities.

DS9 supports advanced features such as 2-D, 3-D and RGB frame buffers, mosaic images, tiling, blinking, geometric markers, colormap manipulation, scaling, arbitrary zoom, cropping, rotation, pan, and a variety of coordinate systems.

The GUI for DS9 is user configurable. GUI elements such as the coordinate display, panner, magnifier, horizontal and vertical graphs, button bar, and color bar can be configured via menus or the command line.

New in Version 7

3-D Data Visualization

Previous versions of SAOImage DS9 would allow users to load 3-D data into traditional 2-D frames, and would allow users to step through successive z-dimension pixel slices of the data cube. To visualize 3-D data in DS9 v. 7.0, a new module, encompassed by the new Frame 3D option, allows users to load and view data cubes in multiple dimensions.

The new module implements a simple ray-trace algorithm. For each pixel on the screen, a ray is projected back into the view volume, based on the current viewing parameters, returning a data value if the ray intersects the FITS data cube. To determine the value returned, there are 2 methods available, Maximum Intensity Projection (MIP) and Average Intensity Projection (AIP). MIP returns the maximum value encountered, AIP returns an average of all values encountered. At this point, normal DS9 operations are applied, such as scaling, clipping and applying a color map.

Color Tags

The purpose of color tags is to highlight (or hide) certain values of data, regardless of the color map selected. The user creates, edits, and deletes color tags via the GUI. From the color parameters dialog, the user can load, save, and delete all color tags for that frame.

Cropping

DS9 now supports cropping the current image, via the GUI, command line, or XPA/SAMP in both 2-D and 3-D. The user may specify a rectangular region of the image data as a center and width/height in any coordinate system via the Crop Dialog, or can interactively select the region of the image to display by clicking and dragging while in Crop Mode.

I encountered SAOImage DS9 in the links section of an astroinformatics blog.

A good example of a very high-end application for exploring and processing images and data cubes.

You are likely to encounter a number of subjects worthy of comment using this application.
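The two projection methods in the Version 7 notes, MIP and AIP, are easy to try on a toy data cube. This is only a sketch: it casts rays straight down the z axis rather than from an arbitrary viewing angle, and it is not DS9's code. The function name and the sample cube are mine.

```python
def project(cube, method="mip"):
    """Collapse a cube indexed as cube[z][y][x] along z into a 2-D image."""
    depth = len(cube)
    height = len(cube[0])
    width = len(cube[0][0])
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # All values one axis-aligned "ray" passes through for this pixel.
            ray = [cube[z][y][x] for z in range(depth)]
            if method == "mip":
                row.append(max(ray))  # Maximum Intensity Projection
            elif method == "aip":
                row.append(sum(ray) / len(ray))  # Average Intensity Projection
            else:
                raise ValueError(f"unknown method: {method}")
        image.append(row)
    return image


# A made-up 3 x 2 x 2 cube (three z slices of a 2 x 2 image).
cube = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[4.0, 0.0], [2.0, 9.0]],
    [[2.0, 5.0], [2.0, 0.0]],
]
print(project(cube, "mip"))  # [[4.0, 5.0], [2.0, 9.0]]
print(project(cube, "aip"))  # [[2.0, 2.0], [2.0, 4.0]]
```

MIP keeps the brightest voxel along each ray, so faint structure behind a bright source disappears. AIP keeps everything but dilutes it.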

July 28, 2012

Montage: An Astronomical Image Mosaic Engine

Filed under: Astroinformatics,Image Processing — Patrick Durusau @ 7:56 pm

Montage: An Astronomical Image Mosaic Engine

From the webpage:

Montage is a toolkit for assembling Flexible Image Transport System (FITS) images into custom mosaics.

Since I mentioned astronomical data earlier today I thought about including this for your weekend leisure time!

Exploring the Universe with Machine Learning

Filed under: Astroinformatics,Machine Learning — Patrick Durusau @ 6:59 pm

Exploring the Universe with Machine Learning by Bruce Berriman.

From the post:

A short while ago, I attended a webinar on the above topic by Alex Gray and Nick Ball. The traditional approach to analytics involves identifying which collections of data or collections of information follow sets of rules. Machine learning (ML) takes a very different approach by finding patterns and making predictions from large collections of data.

The post reviews the presentation, CANFAR + Skytree Webinar Presentation (video here).

Good way to broaden your appreciation for “big data.” Astronomy has been awash in “big data” for years.

July 16, 2012

International BASP Frontiers Workshop 2013

Filed under: Astroinformatics,Biomedical,Conferences,Signal/Collect — Patrick Durusau @ 1:28 pm

International BASP Frontiers Workshop 2013

January 27th – February 1st, 2013 Villars-sur-Ollon (Switzerland)

The international biomedical and astronomical signal processing (BASP) Frontiers workshop was created to promote synergies between selected topics in astronomy and biomedical sciences, around common challenges for signal processing.

The 2013 workshop will concentrate on the themes of sparse signal sampling and reconstruction, for radio interferometry and MRI, but also open its floor to many other interesting hot topics in theoretical, astrophysical, and biomedical signal processing.

Signal processing is one form of “big data” and is rich in subjects, both in the literature and in the data.

Proceedings from the first BASP workshop are available. Be advised it is a 354 MB zip file. If you aren’t on an airport wifi, you can find those proceedings here.

July 1, 2012

SkyQuery: …Parallel Probabilistic Join Engine… [When Static Mapping Isn’t Enough]

Filed under: Astroinformatics,Bayesian Data Analysis,Dynamic Mapping,Identity,Merging,SQL — Patrick Durusau @ 4:41 pm

SkyQuery: An Implementation of a Parallel Probabilistic Join Engine for Cross-Identification of Multiple Astronomical Databases by László Dobos, Tamás Budavári, Nolan Li, Alexander S. Szalay, and István Csabai.

Abstract:

Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects while the ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. The varying statistical error of position measurements, moving and extended objects, and other physical properties make it necessary to perform the cross-identification using a mathematically correct, proper Bayesian probabilistic algorithm, capable of including various priors. One time, pair-wise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on-demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets.

Astronomy is a cool area to study and has data out the wazoo, but I was struck by:

One time, pair-wise cross-identification of these large catalogs is not sufficient for many astronomical scenarios.

Is identity with sharp edges, susceptible to pair-wise mapping, the common case?

Or do we just see some identity issues that way?

Commend the paper to you as an example of dynamic merging practice.
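To give a feel for what a probabilistic (rather than sharp-edged) identity test looks like, here is a small sketch of a two-detection comparison. The Bayes factor below is the Gaussian-error form as I understand it from the probabilistic cross-identification literature. It is not SkyQuery's SQL dialect or its N-way machinery, and the coordinates and error values are invented.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def angular_separation_rad(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula) for coordinates in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(h))


def bayes_factor(separation_rad, sigma1_arcsec, sigma2_arcsec):
    """Evidence that two detections are the same object versus unrelated objects.

    Assumes circular Gaussian position errors; values well above 1 favor a match.
    """
    s2 = ((sigma1_arcsec * ARCSEC_TO_RAD) ** 2
          + (sigma2_arcsec * ARCSEC_TO_RAD) ** 2)
    return (2.0 / s2) * math.exp(-separation_rad ** 2 / (2.0 * s2))


# Hypothetical detections: B is about half an arcsecond from A, C is about 36 arcseconds away.
sep_ab = angular_separation_rad(150.0, 2.0, 150.0001, 2.0001)
sep_ac = angular_separation_rad(150.0, 2.0, 150.01, 2.0)
bf_ab = bayes_factor(sep_ab, 0.2, 0.2)
bf_ac = bayes_factor(sep_ac, 0.2, 0.2)
print(f"A-B: {bf_ab:.3g}  A-C: {bf_ac:.3g}")
```

The point for identity work: the answer is a weight of evidence that depends on each catalog's error model, not a yes/no from a fixed matching radius.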

June 22, 2012

Sage Bionetworks and Amazon SWF

Sage Bionetworks and Amazon SWF

From the post:

Over the past couple of decades the medical research community has witnessed a huge increase in the creation of genetic and other bio molecular data on human patients. However, their ability to meaningfully interpret this information and translate it into advances in patient care has been much more modest. The difficulty of accessing, understanding, and reusing data, analysis methods, or disease models across multiple labs with complementary expertise is a major barrier to the effective interpretation of genomic data. Sage Bionetworks is a non-profit biomedical research organization that seeks to revolutionize the way researchers work together by catalyzing a shift to an open, transparent research environment. Such a shift would benefit future patients by accelerating development of disease treatments, and society as a whole by reducing costs and efficacy of health care.

To drive collaboration among researchers, Sage Bionetworks built an on-line environment, called Synapse. Synapse hosts clinical-genomic datasets and provides researchers with a platform for collaborative analyses. Just like GitHub and Source Forge provide tools and shared code for software engineers, Synapse provides a shared compute space and suite of analysis tools for researchers. Synapse leverages a variety of AWS products to handle basic infrastructure tasks, which has freed the Sage Bionetworks development team to focus on the most scientifically-relevant and unique aspects of their application.

Amazon Simple Workflow Service (Amazon SWF) is a key technology leveraged in Synapse. Synapse relies on Amazon SWF to orchestrate complex, heterogeneous scientific workflows. Michael Kellen, Director of Technology for Sage Bionetworks states, “SWF allowed us to quickly decompose analysis pipelines in an orderly way by separating state transition logic from the actual activities in each step of the pipeline. This allowed software engineers to work on the state transition logic and our scientists to implement the activities, all at the same time. Moreover by using Amazon SWF, Synapse is able to use a heterogeneity of computing resources including our servers hosted in-house, shared infrastructure hosted at our partners’ sites, and public resources, such as Amazon’s Elastic Compute Cloud (Amazon EC2). This gives us immense flexibility in where we run computational jobs which enables Synapse to leverage the right combination of infrastructure for every project.”

The Sage Bionetworks case study (above) and another one, NASA JPL and Amazon SWF, will get you excited about reaching out to the documentation on Amazon Simple Workflow Service (Amazon SWF).

In ways that presentations that consist of reading slides about management advantages to Amazon SWF simply can’t reach. At least not for me.

Take the tip and follow the case studies, then onto the documentation.

Full disclosure: I have always been fascinated by space and really hard bioinformatics problems. And have < 0 interest in DRM antics on material that, if piped to /dev/null, would raise a user's IQ.
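The idea Kellen describes, keeping state-transition logic apart from the activities, can be shown without any AWS calls at all. The sketch below is plain Python, not the Amazon SWF API. The activity names and the run_workflow() loop are inventions of mine to illustrate the split.

```python
# Activities: the pieces a scientist would write. They know nothing about ordering.
def normalize(data):
    peak = max(data)
    return [value / peak for value in data]


def summarize(data):
    return {"mean": sum(data) / len(data)}


ACTIVITIES = {"normalize": normalize, "summarize": summarize}


# Decider: the piece an engineer would write. It knows ordering, not science.
def decide(completed):
    """Return the next activity name given the names already completed, or None."""
    for name in ("normalize", "summarize"):
        if name not in completed:
            return name
    return None


def run_workflow(data):
    """Stand-in for the workflow service: ask the decider, run the activity, repeat."""
    completed = []
    result = data
    while True:
        step = decide(completed)
        if step is None:
            return result
        result = ACTIVITIES[step](result)
        completed.append(step)


print(run_workflow([2.0, 4.0, 8.0]))
```

Because decide() only sees names, the activities could run anywhere (in-house, at a partner site, on EC2) without the transition logic changing.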

May 17, 2012

Exploring The Universe with Machine Learning

Filed under: Astroinformatics,BigData,Machine Learning — Patrick Durusau @ 3:49 pm

Exploring The Universe with Machine Learning

Webinar: Wednesday, May 30, 2012 9:00 AM – 10:00 AM (Pacific Daylight Time), (4:00pm GMT)

From the post:

WHAT IT’S ABOUT:

There is much to discover in the big, actually astronomically big, datasets that are (and will be) available. The challenge is how to effectively mine these massive datasets.

In this webinar attendees will learn how CANFAR (the Canadian Advanced Network for Astronomical Research) is using Skytree’s high performance and scalable machine learning system in the cloud. The combination enables astronomers to focus on their analyses rather than having to waste time implementing scalable complex algorithms and architecting the infrastructure to handle the massive datasets involved.

CANFAR is designed with usability in mind. Implemented as a virtual machine (VM), users can deploy their existing desktop code to the CANFAR cloud – delivering instant scalability (replication of the VM as required), without additional development.

WHO SHOULD ATTEND:

Anyone interested in performing machine learning or advanced analytics on big (astronomical) data sets.

Well, I qualify on two counts. How about you? 😉

From Skytree Big Data Analytics. They have a free server version that I haven’t looked at, yet.

April 27, 2012

And Now, For Really Big Data: 550 billion particles

Filed under: Astroinformatics,BigData — Patrick Durusau @ 6:09 pm

Want big data? Really big data? Consider the following description from 550 billion particles:

The amounts involved in this simulation are simply mindboggling: 92 000 CPUs, 150 PBytes of data, 2 (U.S.) quadrillion flops (2 PFlop/s), the equivalent of 30 million computing hours, each particle has the size of the Milky Way, and so on…

Data from the DEUS (Dark Energy Universe Simulation) project is freely available.

April 23, 2012

scikit-learn – Machine Learning in Python – Astronomy

Filed under: Astroinformatics,Machine Learning,Python — Patrick Durusau @ 5:57 pm

scikit-learn – Machine Learning in Python – Astronomy by Jake VanderPlas. (tutorial)

Jake branched the scikit-learn site for his tutorial on scikit-learn using astronomical data.

Good introduction to scikit-learn and will be of interest to astronomy buffs.

April 22, 2012

Flexibility to Discover…

Filed under: Astroinformatics,Bayesian Data Analysis,Bayesian Models — Patrick Durusau @ 7:08 pm

David W. Hogg writes:

If you want to have the flexibility to discover correct structure in your data, you may have to adopt methods that permit variable model complexity.

Context to follow but think about that for a minute.

Do you want to discover structures or confirm what you already believe to be present?

In context:

On day zero of AISTATS, I gave a workshop on machine learning in astronomy, concentrating on the ideas of (a) trusting unreliable data and (b) the necessity of having a likelihood, or probability of the data given the model, making use of a good noise model. Before me, Zoubin Ghahramani gave a very valuable overview of Bayesian non-parametric methods. He emphasized something that was implied to me by Brendon Brewer’s success on my MCMC High Society challenge and mentioned by Rob Fergus when we last talked about image modeling, but which has rarely been explored in astronomy: If you want to have the flexibility to discover correct structure in your data, you may have to adopt methods that permit variable model complexity. The issues are two-fold: For one, a sampler or an optimizer can easily get stuck in a bad local spot if it doesn’t have the freedom to branch more model complexity somewhere else and then later delete the structure that is getting it stuck. For another, if you try to model an image that really does have five stars in it with a model containing only four stars, you are requiring that you will do a bad job! Bayesian non-parametrics is this kind of argument on speed, with all sorts of processes named after different kinds of restaurants. But just working with the simple dictionary of stars and galaxies, we could benefit from the sampling ideas at least. (emphasis added)

Isn’t that awesome? With all the astronomy data that is coming online? (With lots of it already online.)

Not to mention finding structures in other data as well. Maybe even in “big data.”

April 4, 2012

Astronomers Look to Exascale Computing to Uncover Mysteries of the Universe

Filed under: Astroinformatics,Marketing — Patrick Durusau @ 3:34 pm

Astronomers Look to Exascale Computing to Uncover Mysteries of the Universe by Robert Gelber.

From the post:

Plans are currently underway for development of the world’s most powerful radio telescope. The Square Kilometer Array (SKA) will consist of roughly 3,000 antennae located in Southern Africa or Australia; its final location may be decided later this month. The heart of this system, however, will include one of the world’s fastest supercomputers.

The array is quite demanding of both data storage and processing power. It is expected to generate an exabyte of data per day and require a multi-exaflops supercomputer to process it. Rebecca Boyle of Popsci wrote an article about the telescope’s computing demands, estimating that such a machine would have to deliver between two to thirty exaflops.

The array is not due to go online until 2024 but that really isn’t that far away.

Strides in engineering, processing, programming, and other fields, all of which rely upon information retrieval, are going to be necessary. Will your semantic application advance or retard those efforts?
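To put "an exabyte of data per day" on a human scale, a little arithmetic helps. The sketch assumes decimal units (one exabyte = 10^18 bytes) and takes the two-to-thirty exaflops range quoted above at face value.

```python
EXABYTE = 10 ** 18           # bytes, decimal definition (an assumption)
SECONDS_PER_DAY = 24 * 60 * 60

# Sustained ingest rate implied by one exabyte per day.
bytes_per_second = EXABYTE / SECONDS_PER_DAY
print(f"{bytes_per_second / 10 ** 12:.1f} terabytes per second")

# Floating-point operations available per byte at each end of the quoted range.
for exaflops in (2, 30):
    ops_per_byte = exaflops * 10 ** 18 / bytes_per_second
    print(f"{exaflops} exaflops: about {ops_per_byte:,.0f} operations per byte")
```

That is on the order of ten terabytes arriving every second, around the clock.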

March 28, 2012

Data in an Alien Context: Kepler Visualization Source Code

Filed under: Astroinformatics,Graphics,Visualization — Patrick Durusau @ 4:20 pm

Data in an Alien Context: Kepler Visualization Source Code

Jer Thorp released a visualization of the exoplanets discovered by the Kepler project last year and has updated that visualization to include an additional 1091 candidates. He has also released the source code for his visualization.

Imagine a marriage of Jer’s visualization with additional information as it is discovered by different projects, using different techniques and formats. Topic maps anyone?

March 22, 2012

Vista Stares Deep Into the Cosmos:…

Filed under: Astroinformatics,Data,Dataset — Patrick Durusau @ 7:42 pm

Vista Stares Deep Into the Cosmos: Treasure Trove of New Infrared Data Made Available to Astronomers

From the post:

The European Southern Observatory’s VISTA telescope has created the widest deep view of the sky ever made using infrared light. This new picture of an unremarkable patch of sky comes from the UltraVISTA survey and reveals more than 200 000 galaxies. It forms just one part of a huge collection of fully processed images from all the VISTA surveys that is now being made available by ESO to astronomers worldwide. UltraVISTA is a treasure trove that is being used to study distant galaxies in the early Universe as well as for many other science projects.

ESO’s VISTA telescope has been trained on the same patch of sky repeatedly to slowly accumulate the very dim light of the most distant galaxies. In total more than six thousand separate exposures with a total effective exposure time of 55 hours, taken through five different coloured filters, have been combined to create this picture. This image from the UltraVISTA survey is the deepest [1] infrared view of the sky of its size ever taken.

The VISTA telescope at ESO’s Paranal Observatory in Chile is the world’s largest survey telescope and the most powerful infrared survey telescope in existence. Since it started work in 2009 most of its observing time has been devoted to public surveys, some covering large parts of the southern skies and some more focused on small areas. The UltraVISTA survey has been devoted to the COSMOS field [2], an apparently almost empty patch of sky which has already been extensively studied using other telescopes, including the NASA/ESA Hubble Space Telescope [3]. UltraVISTA is the deepest of the six VISTA surveys by far and reveals the faintest objects.

Another six (6) terabytes of images, just in case you are curious.

And the rate of acquisition of astronomical data is only increasing.

Clever insights into how to more efficiently process and analyze the resulting data are surely welcome.

March 17, 2012

NASA Releases Atlas Of Entire Sky

Filed under: Astroinformatics,Data Mining,Dataset — Patrick Durusau @ 8:19 pm

NASA Releases Atlas Of Entire Sky

J. Nicholas Hoover (InformationWeek) writes:

NASA this week released to the Web an atlas and catalog of 18,000 images consisting of more than 563 million stars, galaxies, asteroids, planets, and other objects in the sky–many of which have never been seen or identified before–along with data on all of those objects.

The space agency’s Wide-field Infrared Survey Explorer (WISE) mission, which was a collaboration of NASA’s Jet Propulsion Laboratory and the University of California Los Angeles, collected the data over the past two years, capturing more than 2.7 million images and processing more than 15 TB of astronomical data along the way. In order to make the data easier to use, NASA condensed the 2.7 million digital images down to 18,000 that cover the entire sky.

The WISE mission, which mapped the entire sky, uncovered a number of never-before-seen objects in the night sky, including an entirely new class of stars and the first “Trojan” asteroid that shares the Earth’s orbital path. The study also determined that there were far fewer mid-sized asteroids near Earth than had been previously thought. Even before the mass release of data to the Web, there have already been at least 100 papers published detailing the more limited results that NASA had already released.

Hoover also says that NASA has developed tutorials to assist developers in working with the data and that the entire database will be available in the not too distant future.

When I see releases like this one, I am reminded of Jim Gray (MS). Jim was reported to like astronomy data sets because they are big and free. See what you think about this one.

March 3, 2012

Tutorial: Data Discovery Portal

Filed under: Astroinformatics — Patrick Durusau @ 10:09 pm

Tutorial: Data Discovery Portal from the US Virtual Astronomical Observatory. Also available as a screencast: http://bit.ly/xwyyyb.

The Data Discovery Tool is only one of many tools accessible through the US Virtual Astronomical Observatory. More tools have been developed by observatories around the world.

The problem they faced years ago was that astronomical data was too voluminous to be easily transferred to users in bulk for local analysis. So the entire community set about creating protocols for interfaces to that data, wherever it was stored, which enables analyzing the data remotely or downloading only small, relevant subsets.

This does not diminish the importance of semantic mappings as nomenclature changes and as new theories spawn new terminologies. It does give a framework within which mappings would be useful.

I am sure there are other scientific data sharing initiatives that I have simply not encountered. Your pointers and suggestions about the same will be greatly appreciated!
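As one illustration of the "download only a small subset" style of protocol, here is a sketch of building a cone search request: give a sky position and a radius, get back only the objects inside it. The RA, DEC and SR parameter names (decimal degrees) follow the IVOA Simple Cone Search convention as I understand it. The base URL is made up, and a real service would typically answer with a VOTable document.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone search query URL; the helper name is hypothetical."""
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("declination must be between -90 and +90 degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must not be negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Some services already carry fixed parameters in their base URL.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


# Placeholder endpoint; substitute a real service address from a VO registry.
url = cone_search_url("https://archive.example.org/scs", 180.0, -30.5, 0.1)
print(url)
```

The same three parameters work against any compliant archive, which is the whole point of agreeing on the protocol first.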

January 21, 2012

Open-sourcing Sky Map….

Filed under: Astroinformatics — Patrick Durusau @ 10:13 pm

Open-sourcing Sky Map and collaborating with Carnegie Mellon University

In May 2009 we launched Google Sky Map: our “window on the sky” for Android phones. Created by half a dozen Googlers at the Pittsburgh office in our 20% time, the app was designed to show off the amazing capabilities of the sensors in the first generation Android phones. Mostly, however, we wrote it because we love astronomy. And, thanks to Android’s broad reach, we have managed to share this passion with over 20 million Android users as well as with our local community at events such as the Urban Sky Party.

Today, we are delighted to announce that we are going to share Sky Map in a different way: we are donating Sky Map to the community. We are collaborating with Carnegie Mellon University in an exciting partnership that will see further development of Sky Map as a series of student projects. Sky Map’s development will now be driven by the students, with Google engineers remaining closely involved as advisors. Additionally, we have open-sourced the app so that other astronomy enthusiasts can take the code and augment it as they wish.

I mention this because I am sure there will be opportunities to use topic maps to map in additional astronomical information to the app.

September 26, 2011

> 100 New KDD Models/Methods Appear Every Month

Filed under: Astroinformatics,Data Mining,KDD,Knowledge Discovery — Patrick Durusau @ 7:00 pm

Got your attention? It certainly got mine when I read:

Make an inventory of existing methods relevant for astrophysical applications (more than 100 new KDD models and methods appear every month on specialized journals).

A line from the charter of the KDD-IG (Knowledge Discovery and Data Mining-Interest Group) of IVOA (International Virtual Observatory Alliance).

See: IVOA Knowledge Discovery in Databases

I checked the A census of Data Mining and Machine Learning methods for astronomy wiki page but it had no takers, much less any content.

I have written to Professor Giuseppe Longo of University Federico II in Napoli, the chair of this activity, to inquire about opportunities to participate in the KDD census. I will post an updated entry when I have more information.

Separate and apart from the census, more than 100 a month works out to over 1,200 new KDD models/methods a year. That is an impressive number. I don’t think a census will make that slow down. If anything, greater knowledge of other efforts may spur the creation of even more new models/methods.

DAta Mining & Exploration (DAME)

Filed under: Astroinformatics,Data Mining,Machine Learning — Patrick Durusau @ 7:00 pm

DAta Mining & Exploration (DAME)

From the website:

What is DAME

Nowadays, many scientific areas share the same need of being able to deal with massive and distributed datasets and to perform on them complex knowledge extraction tasks. This simple consideration is behind the international efforts to build virtual organizations such as, for instance, the Virtual Observatory (VObs). DAME (DAta Mining & Exploration) is an innovative, general purpose, Web-based, distributed data mining infrastructure specialized in Massive Data Sets exploration with machine learning methods.

Initially fine tuned to deal with astronomical data only, DAME has evolved in a general purpose platform program, hosting a cloud of applications and services useful also in other domains of human endeavor.

DAME is an evolving platform and new services as well as additional features are continuously added. The modular architecture of DAME can also be exploited to build applications, finely tuned to specific needs.

Follow DAME on YouTube

The project represents what is commonly considered an important element of e-science: a stronger multi-disciplinary approach based on the mutual interaction and interoperability between different scientific and technological fields (nowadays defined as X-Informatics, such as Astro-Informatics). Such an approach may have significant implications in the Knowledge Discovery in Databases process, where even near-term developments in the computing infrastructure which links data, knowledge and scientists will lead to a transformation of the scientific communication paradigm and will improve the discovery scenario in all sciences.

So far there is only one video at YouTube and it could lose the background music with no ill-effect.

The lessons learned (or applied) here should be applicable to other situations with very large data sets, say from satellites orbiting the Earth?

VOGCLUSTERS: an example of DAME web application

Filed under: Astroinformatics,Data Integration,Marketing — Patrick Durusau @ 6:59 pm

VOGCLUSTERS: an example of DAME web application by Marco Castellani, Massimo Brescia, Ettore Mancini, Luca Pellecchia, and Giuseppe Longo.

Abstract:

We present the alpha release of the VOGCLUSTERS web application, specialized for data and text mining on globular clusters. It is one of the web2.0 technology based services of Data Mining & Exploration (DAME) Program, devoted to mine and explore heterogeneous information related to globular clusters data.

VOGCLUSTERS (The alpha website.)

From the webpage:

This page is the entry point to the VOGCLUSTERS Web Application (alpha release) specialized for data and text mining on globular clusters. It is a toolset of DAME Program to manage and explore GC data in various formats.

In this page the users can obtain news, documentation and technical support about the web application.

The goal of the project VOGCLUSTERS is the design and development of a web application specialized in the data and text mining activities for astronomical archives related to globular clusters. Main services are employed for the simple and quick navigation in the archives (uniformed under VO standards and constraints) and their manipulation to correlate and integrate internal scientific information. The project has not to be intended as a straightforward website for the globular clusters, but as a web application. A website usually refers to the front-end interface through which the public interact with your information online. Websites are typically informational in nature with a limited amount of advanced functionality. Simple websites consist primarily of static content where the data displayed is the same for every visitor and content changes are infrequent. More advanced websites may have management and interactive content. A web application, or equivalently Rich Internet Application (RIA) usually includes a website component but features additional advanced functionality to replace or enhance existing processes. The interface design objective behind a web application is to simulate the intuitive, immediate interaction a user experiences with a desktop application.

Note the use of DAME as a foundation to “…manage and explore GC data in various formats.”

Just in case you are unaware, astronomy/radio astronomy, along with High Energy Physics (HEP), were the original big data.

If you have an interest in astronomy, this would be a good project to follow and perhaps to suggest topic map techniques.

Effective marketing of topic maps requires more than writing papers and hoping that someone reads them. Invest your time and effort into a project, then suggest (appropriately) the use of topic maps. You and your proposal will have more credibility that way.

September 7, 2011

Photometric Catalogue of Quasars and Other Point Sources in the Sloan Digital Sky Survey

Filed under: Astroinformatics,Machine Learning — Patrick Durusau @ 6:57 pm

Photometric Catalogue of Quasars and Other Point Sources in the Sloan Digital Sky Survey by Sheelu Abraham, Ninan Sajeeth Philip, Ajit Kembhavi, Yogesh G Wadadekar, and Rita Sinha. (Submitted on 9 Nov 2010 (v1), last revised 25 Aug 2011 (this version, v3))

Abstract:

We present a catalogue of about 6 million unresolved photometric detections in the Sloan Digital Sky Survey Seventh Data Release classifying them into stars, galaxies and quasars. We use a machine learning classifier trained on a subset of spectroscopically confirmed objects from 14th to 22nd magnitude in the SDSS i-band. Our catalogue consists of 2,430,625 quasars, 3,544,036 stars and 63,586 unresolved galaxies from 14th to 24th magnitude in the SDSS i-band. Our algorithm recovers 99.96% of spectroscopically confirmed quasars and 99.51% of stars to i ~ 21.3 in the colour window that we study. The level of contamination due to data artefacts for objects beyond i = 21.3 is highly uncertain and all mention of completeness and contamination in the paper are valid only for objects brighter than this magnitude. However, a comparison of the predicted number of quasars with the theoretical number counts shows reasonable agreement.

OK, admittedly of more interest to me than probably anyone else who reads this blog.

Still, every machine learning technique and data requirement that you learn has potential application in other fields.
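To make the idea in the abstract concrete, here is a minimal sketch of the general workflow: train a classifier on labelled objects, then measure how many held-out members of a class it recovers. It is not the authors' method. The nearest-centroid rule, the two-colour feature space, the class centres and the synthetic data are all assumptions made up for illustration; none of the numbers come from SDSS.

```python
import random
from statistics import mean

# Hypothetical class centres in a two-colour space. Invented for
# illustration only; these are not SDSS measurements.
CENTRES = {
    "star": (1.2, 0.5),
    "quasar": (0.2, 0.2),
    "galaxy": (1.6, 0.9),
}


def make_sample(rng, n_per_class, scatter=0.1):
    """Draw synthetic (colours, label) pairs around each class centre."""
    sample = []
    for label, (c1, c2) in CENTRES.items():
        for _ in range(n_per_class):
            colours = (rng.gauss(c1, scatter), rng.gauss(c2, scatter))
            sample.append((colours, label))
    return sample


def train(sample):
    """Compute the mean colour vector (centroid) of each labelled class."""
    by_label = {}
    for colours, label in sample:
        by_label.setdefault(label, []).append(colours)
    return {
        label: tuple(mean(axis) for axis in zip(*points))
        for label, points in by_label.items()
    }


def classify(centroids, colours):
    """Assign the label whose centroid is nearest in colour space."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(colours, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def recovery_rate(centroids, sample, label):
    """Fraction of true members of `label` that the classifier recovers."""
    members = [c for c, true_label in sample if true_label == label]
    hits = sum(1 for c in members if classify(centroids, c) == label)
    return hits / len(members)


rng = random.Random(42)
training = make_sample(rng, 200)   # stands in for the confirmed subset
held_out = make_sample(rng, 200)   # stands in for objects to classify
centroids = train(training)
quasar_rate = recovery_rate(centroids, held_out, "quasar")
print(f"quasar recovery on synthetic data: {quasar_rate:.2%}")
```

The recovery rate here plays the role of the completeness figures quoted in the abstract, but on toy data with well-separated classes it says nothing about real survey performance.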

August 1, 2011

Open.NASA

Filed under: Astroinformatics,Data Source — Patrick Durusau @ 3:56 pm

Open.NASA

NASA has shared data and software for years but now has a shiny new website and, to be fair, some introductions to make use of the material easier.

I don’t have a citation for it but Jim Gray (Microsoft) is reported to have said that astronomy data was great because there was so much of it and it was free.

There is a lot of mapping possible twixt and tween astronomy data sets, both historic and recent, so it is a ripe area for exploration with topic maps.


Update:

NASA’s Open Government Site Built On Open Source, an InformationWeek post on the NASA site.

Why InformationWeek mentions Object Oriented Data Technology (OODT) and Disqus but provides no links to the same, I cannot say.

Admittedly I don’t do enough linking for concepts, etc., but I do try to put in links to projects and the like.

November 24, 2010

IRODS

Filed under: Astroinformatics,Software,Space Data — Patrick Durusau @ 2:49 pm

IRODS: Data Grids, Digital Libraries, Persistent Archives, and Real-time Data Systems

From the website:

iRODS™, the Integrated Rule-Oriented Data System, is a data grid software system developed by the Data Intensive Cyber Environments research group (developers of the SRB, the Storage Resource Broker), and collaborators. The iRODS system is based on expertise gained through a decade of applying the SRB technology in support of Data Grids, Digital Libraries, Persistent Archives, and Real-time Data Systems. iRODS management policies (sets of assertions these communities make about their digital collections) are characterized in iRODS Rules and state information. At the iRODS core, a Rule Engine interprets the Rules to decide how the system is to respond to various requests and conditions. iRODS is open source under a BSD license. (emphasis in original)

Provides an umbrella over data sources to present a uniform view to users.

The rules and metadata don’t appear to be as granular as one expects with topic maps.

I mention it here because of its use/importance with space data and as a current research platform into sharing data.
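The Rule Engine idea in the quoted description (rules are interpreted to decide how the system responds to requests and conditions) can be pictured with a toy sketch. This is an assumption-laden illustration only: it is not the iRODS rule language or API, and the `Rule` class, the rule names and the request fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A policy: if `condition` holds for a request, `action` applies."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


# Hypothetical policies, invented for illustration (not iRODS syntax).
RULES = [
    Rule("replicate-large-files",
         lambda req: req["event"] == "put" and req["size_mb"] > 100,
         lambda req: f"replicate {req['path']} to archive"),
    Rule("checksum-on-put",
         lambda req: req["event"] == "put",
         lambda req: f"checksum {req['path']}"),
    Rule("audit-reads",
         lambda req: req["event"] == "get",
         lambda req: f"log read of {req['path']}"),
]


def respond(request, rules=RULES):
    """Return the actions of every rule whose condition matches the request."""
    return [rule.action(request) for rule in rules if rule.condition(request)]


actions = respond({"event": "put", "path": "/zone/survey.fits", "size_mb": 250})
for action in actions:
    print(action)
```

Keeping the policies as data, separate from the code that interprets them, is what lets a collection's managers change behavior without touching the engine itself.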

Questions:

  1. Current and annotated bibliography for the project.
  2. What are the main strengths/weaknesses of this approach? (3-5 pages, citations)

September 20, 2010

Astroinformatics 2010

Filed under: Astroinformatics,Information Retrieval,Searching,Semantics — Patrick Durusau @ 6:17 pm

Astroinformatics 2010.

Conference on astronomical data, its processing and semantics.

The astronomical community has made data interchangeable, in terabyte and soon to be petabyte quantities.

Questions:

Have they solved the problem of interchangeable semantics?

Or have they reduced semantics to the point where interchange becomes easier/possible?

Do semantic interchange problems/issues/opportunities reappear when more semantics are imposed?

What about preservation of semantics?
