Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 2, 2014

Astrostatistics: The Re-Emergence of a Statistical Discipline

Filed under: Astroinformatics,Information Science,Statistics — Patrick Durusau @ 4:52 pm

Astrostatistics: The Re-Emergence of a Statistical Discipline by Joseph M. Hilbe.

From the post:

If statistics can be generically understood as the science of collecting and analyzing data for the purpose of classification and prediction and of attempting to quantify and understand the uncertainty inherent in phenomena underlying data, surely astrostatistics must be considered as one of the oldest, if not the oldest, applications of statistical science to the study of nature. Astrostatistics is the discipline dealing with the statistical analysis of astronomical and astrophysical data. It also has been understood by most researchers in the area to incorporate astroinformatics, which is the science of gathering and digitalizing astronomical data for the purpose of analysis.

I mentioned that astrostatistics is a very old discipline—if we accept the broad criterion I gave for how statistics can be understood. Egyptian and Babylonian priests who assiduously studied the motions of the sun, moon, planets, and stars as long ago as 1500 BCE classified and attempted to predict future events for the purpose of knowing when to plant, determining when a new year began, and so forth. However, their predictions were infused by the attempt to understand the effects of the celestial motions on human affairs (astrology). Later, Thales (d 546 BCE), the Ionian Greek reputed to be both the first philosopher and mathematician, apparently began to divorce mythology from scientific investigation. He is credited with predicting an eclipse in 585 BCE, which he allegedly based on studies made of previous eclipses from records kept by Egyptian priests.

A short but interesting review of the history of astrostatistics and its increasing importance as the rate of astronomical data collection continues to increase.

And a call for more inter-disciplinary work between astronomers, astrophysicists, statisticians and information scientists.

The ability to cross over tribal (disciplinary) boundaries could be eased by cross-disciplinary mappings.

Big Data Illustration

Filed under: BigData,Graphics,Visualization — Patrick Durusau @ 3:55 pm

(big data image omitted)

An image from Stefano Bertolo (Attribution-NonCommercial-ShareAlike 2.0 Generic) for a presentation on big data.

Stefano notes:

A pictured I edited with Inkscape to illustrate the non-linear effects in process management that result from changes in data volumes. I thank the National Library of Scotland for the original.

This illustrates the “…non-linear effects in process management that result from changes in data volumes” but does it also illustrate the increased demands on third parties who try to use the data?

I need an illustration for the proposition that if data (and its structures) are annotated at the moment of creation, that reduces the burden on every subsequent user.

Stefano’s image works fine for talking about the increased burden of undocumented data, but it doesn’t show that burden falling on each user who lacks knowledge of the data, nor show it being lifted when the data is properly prepared.

If you start with an unknown 1 GB of data, there is some additional cost for you to acquire knowledge of the data. If someone uses that data set after you, they have to go through the same process. So the cost of unknown data isn’t static but increases with the number of times it is used.

By the same token, properly documented data doesn’t exert a continual drag on its users.
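While waiting on imagery, the argument can at least be put into rough numbers. What follows is a minimal sketch, not a measured model: the function names and the cost figures (10, 15 and 1 "units" of effort) are assumptions made up for illustration.

```python
def undocumented_cost(users: int, discovery_cost: float) -> float:
    """Total effort when every user must work out what the data means."""
    return users * discovery_cost


def documented_cost(users: int, annotation_cost: float, lookup_cost: float) -> float:
    """Total effort when the creator annotates once and users only read the notes."""
    return annotation_cost + users * lookup_cost


if __name__ == "__main__":
    # Hypothetical effort units: 10 to decipher the data unaided,
    # 15 to annotate it at creation, 1 to read the annotation.
    for users in (1, 5, 20):
        print(users, undocumented_cost(users, 10.0), documented_cost(users, 15.0, 1.0))
```

Under these made-up numbers the annotation costs more than it saves for a single user, but the undocumented total keeps growing with every additional user while the documented total barely moves.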

Suggestions on imagery?

Comments/suggestions?

Stefano’s posting.

Data Science Apprenticeship

Filed under: Data Science — Patrick Durusau @ 3:06 pm

Data Science Apprenticeship by Vincent Granville.

The status of this program is as follows:

Stage 1 (Available now): DIY (do-it-yourself) for self-learners: material is available for free throughout DSC, including data sets and projects to work on. No registration required, get started here.

Stage 2 (April 2014): Participants will purchase my Wiley book (DSC members get a discount) as well as our data science cheat sheet to get jump started.

Stage 3: Projects will be evaluated for a fee, and a certification delivered.

Be aware that the book from Wiley appears to be a collection of blog posts.

Nothing against blogging 😉, but recycled blog posts don’t have a good narrative flow, and they usually aren’t comprehensive examinations of an entire area.

There is no projected date for Stage 3 but I’m watching for updates.

Laptop Security – “A Little Dab’ll Do Ya!”

Filed under: Cybersecurity,Security — Patrick Durusau @ 2:25 pm

Fashion and astronomy lead the way to cost effective tamper protection by Paul Ducklin.

Paul details a very low-tech way to detect tampering with your laptop, one that goes by the unlikely name “Physically Unclonable Functions (PUFs).”

Saying more would spoil Paul’s surprise. See his post and give it some serious consideration.

You could have explosives, acid-protected drives, etc., but sometimes a street light or fence is enough to deter thieves.

Over 2000 D3.js Examples and Demos

Filed under: D3,Graphics,Visualization — Patrick Durusau @ 2:06 pm

Over 2000 D3.js Examples and Demos

From the post:

Here is an update to my over 1000 D3 examples compilation and in addition to many more d3 examples, the list is now sorted alphabetically. Examples are really helpful when doing any kind of development so I am hoping that this big list of D3 examples will be a valuable resource. Bookmark and share with others. Here is the huge list of D3 demos:

An amazing collection that defies general characterization.

The demos run from an Analog Clock and Game of Life to Rotating Winkel Tripel and Spermatozoa.

That leaves you 1,999 more examples/demos to explore, plus the author’s d3 examples.

Sam Hunting, an old hand at topic maps, forwarded this link to me.

January 1, 2014

Discovering Big Dark Data in 2014?

Filed under: BigData — Patrick Durusau @ 8:59 pm

I don’t normally attempt to predict the future. If anything, the future is more fluid than either the past or the present.

On the other hand, we judge predictions from the vantage point of some future time. If our predictions are vague enough, it is hard to be considered wrong. 😉

I will try to avoid the escape hatch of vagueness but you will have to be the judge of my success. I am too close to the author to be considered an unbiased judge.

My first prediction is that Google’s Hummingbird (How semantic search is killing the keyword), a marriage of very coarse annotations (schema.org) to Google’s Knowledge Graph, will demonstrate immediate ROI for low-cost semantic annotation.

The ROI that the Semantic Web of the W3C never demonstrated.

Semantic Web ROI awaits a pie-in-the-sky day when all identifiers are replaced by URIs, those URIs are used consistently by everyone, and everything is written to enable machine reasoning, all at each author’s expense.

Because of that demonstration of ROI from annotation coupled with the knowledge graph and the Google search engine, my second prediction is that a hue and cry will go out for more simple annotations along the lines of those found at schema.org.
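To make “simple annotation” concrete, here is a minimal sketch of a schema.org description of a dataset, built as JSON-LD in Python. The dataset name, description and measured variable are invented for illustration; only the `@context`, the `Dataset` and `PropertyValue` types, and the property names come from the schema.org vocabulary. It makes no claim about how Google actually consumes such markup.

```python
import json

# Hypothetical dataset description; the values are made up for illustration.
annotation = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example county rainfall readings",
    "description": "Daily rainfall totals per weather station.",
    "variableMeasured": {
        "@type": "PropertyValue",
        "name": "rainfall",
        "unitText": "mm",
    },
}


def to_json_ld(data: dict) -> str:
    """Serialize the annotation as JSON-LD text, ready to publish with the data."""
    return json.dumps(data, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json_ld(annotation))
```

A handful of properties like these is coarse, but it is cheap to write at the moment the data is created.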

Commercial interests, governments and NGOs that supported and waited for the Semantic Web for fifteen (15) years, with so little to show for it, will not be as patient this time.

They will want (demand) the same ROI as Google. Immediately if not sooner, not someday by and by.

The coarse annotations invented by governments, organizations, commercial interests and others will be inconsistent and often contradictory. Not to mention it is hard to apply annotations to data you don’t understand.

You and I recognize the semantic opaqueness of keys and values in unfamiliar data. It goes unnoticed by someone familiar with a data set, much in the same way you can’t look at a page and not read it. (Assuming you know the language.)

Data and their structures are much the same way. We can’t look at data we know (or think we do) and not understand what is meant by the data and its structure.

But the opposite is true for data that is foreign to us. Foreign data is semantically opaque to a visitor.
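A small sketch of what that opacity looks like in practice. The record, its keys and the creator's notes below are all hypothetical, chosen only to show how a visitor is left guessing at whatever was never written down.

```python
# A hypothetical record as a visitor might find it: terse keys, no documentation.
record = {"dt": "01/02/14", "amt": 1200, "st": "A"}

# Hypothetical notes the data's creator could have recorded at creation time.
key_notes = {
    "dt": "invoice date, in MM/DD/YY order",
    "amt": "invoice amount in US cents",
}


def opaque_keys(record: dict, key_notes: dict) -> list:
    """Return the keys a visitor would still have to guess at."""
    return sorted(key for key in record if key not in key_notes)


def describe(record: dict, key_notes: dict) -> dict:
    """Pair each value with its note, flagging keys nobody documented."""
    return {key: (value, key_notes.get(key, "UNDOCUMENTED")) for key, value in record.items()}


if __name__ == "__main__":
    print(opaque_keys(record, key_notes))
    print(describe(record, key_notes))
```

Someone familiar with the data would read "st" without a second thought; the visitor cannot even tell whether "01/02/14" is January 2 or February 1 without the creator's note.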

There is a lot of foreign data in big data.

Enough foreign data that my third prediction is that “Big Dark Data” will be one of the major themes of 2014.

I see topic maps (both theory and practice) as an answer for Big Dark Data.

Do you?


Summarizing my predictions for 2014:

  • Google will demonstrate ROI from the use of coarse annotations (schema.org) and its knowledge graph + search engine.
  • Governments, enterprises, organizations, etc., will seek the same semantic ROI as Google.
  • Big Data will become known as Big Dark Data since most of it is foreign to any given user.

Big data sets available for free

Filed under: BigData,Data,Dataset — Patrick Durusau @ 7:54 pm

Big data sets available for free by Vincent Granville.

From the post:

A few data sets are accessible from our data science apprenticeship web page.

(graphic omitted)

  • Source code and data for our Big Data keyword correlation API (see also section in separate chapter, in our book)
  • Great statistical analysis: forecasting meteorite hits (see also section in separate chapter, in our book)
  • Fast clustering algorithms for massive datasets (see also section in separate chapter, in our book)
  • 53.5 billion clicks dataset available for benchmarking and testing
  • Over 5,000,000 financial, economic and social datasets
  • New pattern to predict stock prices, multiplies return by factor 5 (stock market data, S&P 500; see also section in separate chapter, in our book)
  • 3.5 billion web pages: The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages
  • Another large data set – 250 million data points: This is the full resolution GDELT event dataset running January 1, 1979 through March 31, 2013 and containing all data fields for each event record.
  • 125 Years of Public Health Data Available for Download

Just in case you are looking for data for a 2014 demo or data project!
