Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

March 2, 2014

One Thing Leads To Another (#NICAR2014)

Filed under: Data Mining,Government,Government Data,News,Reporting — Patrick Durusau @ 11:51 am

A tweet this morning read:

overviewproject @overviewproject 1h
.@djournalismus talking about handling 2.5 million offshore leaks docs. Content equivalent to 50,000 bibles. #NICAR14

That sounds interesting! Can’t ever tell when a leaked document will prove useful. But where to find this discussion?

Following #NICAR14 leaves you with the impression this is a conference. (I didn’t recognize the hashtag immediately.)

Searching the web for the hashtag led me to: 2014 Computer-Assisted Reporting Conference. (NICAR = National Institute for Computer-Assisted Reporting)

The handle @djournalismus offers the name Sebastian Mondial.

Checking the speakers list, I found this presentation:

Inside the global offshore money maze
Event: 2014 CAR Conference
Speakers: David Donald, Mar Cabra, Margot Williams, Sebastian Mondial
Date/Time: Saturday, March 1 at 2 p.m.
Location: Grand Ballroom West
Audio file: No audio file available.

The International Consortium of Investigative Journalists “Secrecy For Sale: Inside The Global Offshore Money Maze” is one of the largest and most complex cross-border investigative projects in journalism history. More than 110 journalists in about 60 countries analyzed a 260 GB leaked hard drive to expose the systematic use of tax havens. Learn how this multinational team mined 2.5 million files and cracked open the impenetrable offshore world by creating a web app that revealed the ownership behind more than 100,000 anonymous “shell companies” in 10 offshore jurisdictions.

Along the way I discovered the speakers list; the speakers cover a wide range of subjects of interest to anyone mining data.

Another treasure is the Tip Sheets and Tutorial page. Here are six (6) selections out of sixty-one (61) items to pique your interest:

  • Follow the Fracking
  • Maps and charts in R: real newsroom examples
  • Wading through the sea of data on hospitals, doctors, medicine and more
  • Free the data: Getting government agencies to give up the goods
  • Campaign Finance I: Mining FEC data
  • Danger! Hazardous materials: Using data to uncover pollution

Not to mention that NICAR2012 and NICAR2013 are also accessible from the NICAR2014 page, with their own “tip” listings.

If you find this type of resource useful, be sure to check out Investigative Reporters and Editors (IRE).

About the IRE:

Investigative Reporters and Editors, Inc. is a grassroots nonprofit organization dedicated to improving the quality of investigative reporting. IRE was formed in 1975 to create a forum in which journalists throughout the world could help each other by sharing story ideas, newsgathering techniques and news sources.

IRE provides members access to thousands of reporting tip sheets and other materials through its resource center and hosts conferences and specialized training throughout the country. Programs of IRE include the National Institute for Computer Assisted Reporting, DocumentCloud and the Campus Coverage Project.

Learn more about joining IRE and the benefits of membership.

Sounds like a win-win offer to me!

You?

March 1, 2014

Legislative XML Data Mapping Results

Filed under: Government,Law - Sources,XML — Patrick Durusau @ 7:57 pm

Legislative XML Data Mapping Results

You may recall last September (2013) when I posted: Legislative XML Data Mapping [$10K], which was a challenge to convert documents encoded in U.S. Congress and U.K. Parliament markup into Akoma Ntoso.

There were five (5) entries and two (2) winners.

The first place winner reports:

The included web application, an instance of which is running at akoma-ntoso.appspot.com, converts documents to Akoma Ntoso in response to common HTTP requests. Visit the app with a web browser, enter the URL of the source XML into the form, and the app responds with an Akoma Ntoso representation of the source document. Requests can even be made without a browser by passing the source document’s URL directly as the “source” parameter, e.g.,
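The example request in the quote is cut off, but based on that description a call to the converter might look roughly like the following R sketch (the endpoint and the “source” parameter name come from the quote; the sample legislation URL and the response handling are my assumptions):

    # Sketch only: the endpoint and "source" parameter are from the winner's
    # description; the legislation URL below is a placeholder, not a tested example.
    library(httr)

    convert_to_akoma_ntoso <- function(source_url) {
      resp <- GET("http://akoma-ntoso.appspot.com/",
                  query = list(source = source_url))  # pass the source XML's URL directly
      stop_for_status(resp)
      content(resp, as = "text")  # the Akoma Ntoso XML as a character string
    }

    # Hypothetical call against a placeholder bill URL:
    # akn <- convert_to_akoma_ntoso("http://example.gov/bills/hr1234.xml")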

But I was unable to find the files with the included .xsl transforms.

The second place winner reports the use of Perl scripts that can be found at: http://ec2-50-112-47-161.us-west-2.compute.amazonaws.com/xml-akoma-ntoso/XML-AkomaNtoso-0.1.tar.gz

I was unable to find any formal comparison of the entries. Perhaps you will have better luck.

And I am curious: if you encountered a “converted” form of a U.S. or U.K. statute, would you be able to faithfully reconstruct the original?

CrunchBase “semanticsearch”

Filed under: Marketing — Patrick Durusau @ 7:36 pm

CrunchBase “semanticsearch”

In a recent discussion of potential “semantic” products, I was pointed to the Crunchbase.com URL you see above.

Exploring the related tags and their related tags will give you an idea of potential competitors in a particular area.

Even more interesting is to do domain specific searches with your favorite search engine to see how many potential competitors are mentioned in mainstream business publications.

Or whatever market segment you have as a target for your service and/or software.

As opposed to geek literature.

I say that because the geek market is small in comparison to other market segments.

You can also use Crunchbase to identify companies that are successful in areas of interest so you can study their advertising and marketing strategies.

Their strategies will have to be adapted to fit your service/product but you can get a sense for what is likely to work.

In what innovative ways would you use Crunchbase to evaluate a market and/or develop marketing strategies?

How to learn Chinese and Japanese [and computing?]

Filed under: Language,Learning,Topic Maps — Patrick Durusau @ 6:33 pm

How to learn Chinese and Japanese by Victor Mair.

From the post, where Victor concludes after a discussion of various authorities and sources:

If you delay introducing the characters, students’ mastery of pronunciation, grammar, vocabulary, syntax, and so forth, are all faster and more secure. Surprisingly, when later on they do start to study the characters (ideally in combination with large amounts of reading interesting texts with phonetic annotation), students acquire mastery of written Chinese much more quickly and painlessly than if writing is introduced at the same time as the spoken language.

An interesting debate follows in the comments.

I am wondering if the current emphasis on “coding” would be better shifted to an emphasis on computing.

That is, teaching the fundamental concepts of computing, separate and apart from any particular coding language or practice.

Much as I have taught the principles of subject identification separate and apart from a particular model or syntax.

The nooks and crannies of particular models or syntaxes can wait until later.

R and the Weather

Filed under: Data,R,Weather Data — Patrick Durusau @ 6:13 pm

R and the Weather by Joseph Rickert.

From the post:

The weather is on everybody’s mind these days: too much ice and snow east of the Rockies and no rain to speak of in California. Ram Narasimhan has made it a little easier for R users to keep track of what’s going on and also get a historical perspective. His new R package weatherData makes it easy to download weather data from various data-collecting stations around the world. Here is a time series plot of the average temperature recorded at SFO last year with the help of weatherData’s getWeatherForYear() function. It is really nice that the function returns a data frame of hourly data with the Time variable as class POSIXct.
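If you want to try it yourself, a minimal sketch along the lines of the post might look like this (the station code and year follow the post; the temperature column name and the exact arguments may vary by package version, so inspect the returned data frame first):

    # Sketch based on the post: pull a year of SFO observations with weatherData
    # and plot a simple temperature time series. Only the Time column is confirmed
    # by the post; the temperature column name is looked up rather than assumed.
    # install.packages("weatherData")
    library(weatherData)

    sfo <- getWeatherForYear("SFO", 2013)  # hourly data frame, Time is POSIXct
    str(sfo)                               # check which columns came back

    temp_col <- grep("Temperature", names(sfo), value = TRUE)[1]
    plot(sfo$Time, sfo[[temp_col]], type = "l",
         xlab = "2013", ylab = "Temperature (F)",
         main = "Temperature at SFO, 2013")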

Everyone is still talking about winter weather but summer isn’t far off and with that comes hurricane season.

You can capture a historical perspective that goes beyond the highest and lowest temperature for a particular day.

Enjoy!

I first saw this in The week in stats (Feb. 10th edition).

Introducing OData

Filed under: Data — Patrick Durusau @ 5:55 pm

Introducing OData by David Chappell.

From the post:

Describing OData

Our world is awash in data. Vast amounts exist today, and more is created every year. Yet data has value only if it can be used, and it can be used only if it can be accessed by applications and the people who use them.

Allowing this kind of broad access to data is the goal of the Open Data Protocol, commonly called just OData. This paper provides an introduction to OData, describing what it is and how it can be applied. The goal is to illustrate why OData is important and how your organization might use it.

The Problem: Accessing Diverse Data in a Common Way

There are many possible sources of data. Applications collect and maintain information in databases, organizations store data in the cloud, and many firms make a business out of selling data. And just as there are many data sources, there are many possible clients: Web browsers, apps on mobile devices, business intelligence (BI) tools, and more. How can this varied set of clients access these diverse data sources?

One solution is for every data source to define its own approach to exposing data. While this would work, it leads to some ugly problems. First, it requires every client to contain unique code for each data source it will access, a burden for the people who write those clients. Just as important, it requires the creators of each data source to specify and implement their own approach to getting at their data, making each one reinvent this wheel. And with custom solutions on both sides, there’s no way to create an effective set of tools to make life easier for the people who build clients and data sources.

Thinking about some typical problems illustrates why this approach isn’t the best solution. Suppose a Web application wishes to expose its data to apps on mobile phones, for instance. Without some common way to do this, the Web application must implement its own idiosyncratic approach, forcing every client app developer that needs its data to support this. Or think about the need to connect various BI tools with different data sources to answer business questions. If every data source exposes data in a different way, analyzing that data with various tools is hard — an analyst can only hope that her favorite tool supports the data access mechanism she needs to get at a particular data source.

Defining a common approach makes much more sense. All that’s needed is agreement on a way to model data and a protocol for accessing that data — the implementations can differ. And given the Web-oriented world we live in, it would make sense to build this technology with existing Web standards as much as possible. This is exactly the approach taken by OData.

I’ve been looking for an introduction to OData that is more than an elevator speech but less than all the details. I think this one fits the bill.

I was looking because OData Version 4.0 and OData JSON Format Version 4.0 (OData TC at OASIS) recently became OASIS standards.

How you treat data post-acquisition, in a topic map for example, is your concern. Obtaining the data in the first place, however, will be made easier through the use of OData.
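To give a concrete flavor of what “obtaining data” looks like, here is a small R sketch against a public OData sample service ($select, $filter and $top are standard OData system query options; the Northwind service URL is illustrative and may have moved):

    # Sketch: read a few rows from an OData feed over plain HTTP.
    # The Northwind sample service is commonly used in OData demos; swap in any
    # OData endpoint, the query options below are part of the protocol itself.
    library(httr)
    library(jsonlite)

    base <- "https://services.odata.org/V4/Northwind/Northwind.svc"

    resp <- GET(paste0(base, "/Customers"),
                query = list(`$select` = "CustomerID,CompanyName,Country",
                             `$filter` = "Country eq 'Germany'",
                             `$top`    = 5,
                             `$format` = "json"))
    stop_for_status(resp)

    payload <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
    payload$value  # OData JSON returns the matching entities under "value"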

QuaaxTM New Release!

Filed under: QuaaxTM,Topic Map Software,Topic Maps — Patrick Durusau @ 3:32 pm

QuaaxTM 0.8.0 by Johannes Schmidt.

From the webpage:

QuaaxTM is a PHP ISO/IEC 13250 Topic Maps engine which implements PHPTMAPI. This enables developers to work against a standardized API. QuaaxTM uses MySQL with InnoDB or MariaDB with XtraDB as storage engine and benefits from transaction support and referential integrity.

Changes.

Download.

From the news:

0.8.0 passes all unit tests on MariaDB 5.5.35 using XtraDB storage engine. Prior versions of MariaDB should also work but are not tested.

If you don’t know MariaDB, https://mariadb.org/.

Looking good!

