Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 17, 2013

Poderopedia Plug & Play Platform

Filed under: Data Management,News,Reporting — Patrick Durusau @ 4:08 pm

Poderopedia Plug & Play Platform

From the post:

Poderopedia Plug & Play Platform is a Data Intelligence Management System that allows you to create and manage large semantic datasets of information about entities, map and visualize entity connections, include entity related documents, add and show sources of information and news mentions of entities, displaying all the information in a public or private website, that can work as a standalone product or as a public searchable database that can interoperate with a Newsroom website, for example, providing rich contextual information for news content using its archive.

Poderopedia Plug & Play Platform is a free open source software developed by the Poderomedia Foundation, thanks to the generous support of a Knight News Challenge 2011 grant by the Knight Foundation, a Startup Chile 2012 grant and a 2013 Knight fellowship grant by the International Center for Journalists (ICFJ).

WHAT CAN I USE IT FOR?

For anything that involves mapping entities and connections.

A few real examples:

  • NewsStack, an Africa News Challenge Winner, will use it for a pan-African investigation by 10 media organizations into the continent’s extractive industries.
  • Newsrooms from Europe and Latin America want to use it to make their own public searchable databases of entities, reuse their archive to develop new information products, provide context to new stories and make data visualizations—something like making their own Crunchbase.

Other ideas:

  • Use existing data to make searchable databases and visualizations of congresspeople, bills passed, what they own, who funds them, etc.
  • Map lobbyists and who they lobby and for whom
  • Create a NBApedia, Baseballpedia or Soccerpedia. Show data and connections about team owners, team managers, players, all their stats, salaries and related business
  • Map links between NSA, Prism and Silicon Valley
  • Keep track of foundation grants, projects that received funding, etc.
  • Anything related to data intelligence

CORE FEATURES

Plug & Play allows you to create and manage entity profile pages that include: short bio or summary, sheet of connections, long newsworthy profiles, maps of connections of an entity, documents related to the entity, sources of all the information and news river with external news about the entity.

Among several features (please see full list here) it includes:

  • Entity pages
  • Connections data sheet
  • Data visualizations without coding
  • Annotated documents repository
  • Add sources of information
  • News river
  • Faceted Search (using Solr)
  • Semantic ontology to express connections
  • Republish options and metrics record
  • View entity history
  • Report errors and inappropriate content
  • Suggest connections and new entities to add
  • Needs updating alerts
  • Send anonymous tips

Hmmm, when they say:

For anything that involves mapping entities and connections.

Topic maps would say:

For anything that involves mapping subjects and associations.

What Poderopedia does lack is a notion of subject identity that would support “merging.”
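To make “merging” concrete, here is a minimal sketch of what subject identity buys you: two profiles that share an identifier collapse into one, and their names and connections are pooled. Nothing here is Poderopedia's data model or API. The `Topic` class, its fields, and the example.org identifiers are all hypothetical, chosen only to illustrate the idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical stand-in for an entity profile (not a Poderopedia class)."""
    identifiers: set[str]                       # subject identifiers, e.g. URLs
    names: set[str] = field(default_factory=set)
    connections: set[tuple[str, str]] = field(default_factory=set)


def merge_topics(topics: list[Topic]) -> list[Topic]:
    """Merge every group of topics that share at least one subject identifier."""
    merged: list[Topic] = []
    for topic in topics:
        # Work on a copy so the caller's objects are left untouched.
        current = Topic(set(topic.identifiers), set(topic.names), set(topic.connections))
        remaining = []
        for existing in merged:
            if existing.identifiers & current.identifiers:
                # Same subject: pool identifiers, names and connections.
                current.identifiers |= existing.identifiers
                current.names |= existing.names
                current.connections |= existing.connections
            else:
                remaining.append(existing)
        remaining.append(current)
        merged = remaining
    return merged


# Two profiles created independently for the same funder, plus one unrelated profile.
a = Topic({"http://example.org/id/knight-foundation"},
          {"Knight Foundation"},
          {("funded", "Poderomedia Foundation")})
b = Topic({"http://example.org/id/knight-foundation", "http://example.org/id/jsjl-knight"},
          {"John S. and James L. Knight Foundation"},
          {("funded", "Digital tools training series")})
c = Topic({"http://example.org/id/poderomedia"}, {"Poderomedia Foundation"})

result = merge_topics([a, b, c])
```

Without shared identifiers, the two Knight profiles would sit side by side as separate entities, each holding half of the connections.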

I am going to install Poderopedia locally and see what the UI is like.

Appreciate your comments and reports if you do the same.

Plus suggestions about adding topic map capabilities to Poderopedia.

I first saw this in Nat Torkington’s Four Short Links: 5 July 2013.

March 19, 2013

OpenNews Learning… [data recycling?]

Filed under: News,Reporting — Patrick Durusau @ 5:18 am

OpenNews Learning wants to provide lessons to developers in and out of newsrooms by Justin Ellis.

From the post:

If you ever wanted an “Ask This Old House”-style guide set in the universe of newsroom developers and designers, today you’re in luck: OpenNews Learning is a new kind of online education project that looks at the nuts and bolts of interactive projects through the eyes of the people who built them. It’s the newest arm of Knight-Mozilla OpenNews, the two-foundation collaboration that aims to strengthen the bonds between the worlds of journalism and software development.

One of the central ideas behind OpenNews is sharing knowledge, through building community and by putting outside developers directly into newsrooms. OpenNews Learning is an extension of that, designed to help developers (aspiring and otherwise) learn how specific projects were built. Consider it another way to “show your work.”

Following these projects should provide ample opportunities to suggest where topic maps could have been used.

I suspect most researchers would prefer data recycling over data mining.

March 15, 2013

Document Mining with Overview:…

Filed under: Document Classification,Document Management,News,Reporting,Text Mining — Patrick Durusau @ 5:24 pm

Document Mining with Overview:… A Digital Tools Tutorial by Jonathan Stray.

The slides from the Overview presentation I mentioned yesterday.

One of the few webinars I have ever attended where nodding off was not a problem! Interesting stuff.

It is designed for the use case where there “…is too much material to read on deadline.”

A cross between document mining and document management.

A cross that hides a lot of the complexity from the user.

Definitely a project to watch.

March 14, 2013

The Most Expensive Fighter Jet Ever Built, by the Numbers

Filed under: Defense,Government,News,Reporting — Patrick Durusau @ 7:35 pm

The Most Expensive Fighter Jet Ever Built, by the Numbers by Theodoric Meyer.

From the post:

Thanks to the sequester, the Defense Department is now required to cut more than $40 billion this fiscal year out of its $549 billion budget. But one program that’s unlikely to take a significant hit is the F-35 Joint Strike Fighter, despite the fact that it’s almost four times more expensive than any other Pentagon weapons program that’s in the works.

We’ve compiled some of the most headache-inducing figures, from the program’s hefty cost overruns to the billions it’s generating in revenue for Lockheed Martin.

[See the post for the numbers, which are impressive.]

While the F-35 is billions over budget and years behind schedule, the program seems to be doing better recently. A Government Accountability Office report released this week found that Lockheed has made progress in improving supply and manufacturing processes and addressing technical problems.

“We’ve made enormous progress over the last few years,” Steve O’Bryan, Lockheed’s vice president of F-35 business development, told the Washington Post.

The military’s current head of the program, Lt. Gen. Christopher Bogdan, agreed that things have improved but said Lockheed and another major contractor, Pratt & Whitney, still have a ways to go.

“I want them to take on some of the risk of this program,” Bogdan said last month in Australia, which plans to buy 100 of the planes. “I want them to invest in cost reductions. I want them to do the things that will build a better relationship. I’m not getting all that love yet.”

A story that illustrates the utility of a topic map approach to news coverage.

The story has already spanned more than a decade, and language like “[t]he military’s current head of the program…” makes me wonder about the prior military heads of the program.

Or for that matter, it isn’t really Lockheed or Pratt & Whitney that are building (allegedly) the F-35, but identifiable teams of people within those organizations.

And those companies are paying bonuses, stock dividends, etc. during the term of the project.

No one person, or for that matter any one group of people, could chase down all the actors in a story like this one.

However, merging different investigations into distinct aspects of the story could assemble a mosaic clearer than any of its individual pieces.

Perhaps tying poor management, cost overruns, etc., to named individuals will have a greater impact than generalized stories about such practices have when the name is the DoD, Lockheed, etc.


PS: If you aren’t clinically depressed, read the GAO report.

Would you buy a plane where it isn’t known if the helmet mounted display, a critical control system, will work?

It’s like buying a car where a working engine is to-be-determined, maybe.

An F-35 topic map should start with the names, addresses and current status of everyone who signed any paperwork authorizing this project.

“Mixed Messages” on Cybersecurity [China ranks #12 among cyber-attackers]

Filed under: Cybersecurity,Government,Government Data,News,Reporting,Security — Patrick Durusau @ 9:39 am

Do you remember the “mixed messages” Dilbert cartoon?

Mixed Messages

Where an “honest” answer meant “mixed messages?”

I had that feeling this morning when I read Mark Rockwell’s post: German telecom company provides real-time map of Cyber attacks.

From the post:

In hopes of blunting mounting electronic assaults, a German telecommunications carrier unveiled a free online capability that shows where Cyber attacks are happening around the world in real time.

Deutsche Telekom, parent company of T-Mobile, put up what it calls its “Security dashboard” portal on March 6. The map, said the company, is based on attacks on its purpose-built network of decoy “honeypot” systems at 90 locations worldwide.

Deutsche Telekom said it launched the online portal at the CeBIT telecommunications trade show in Hanover, Germany, to increase the visibility of advancing electronic threats.

“New cyber attacks on companies and institutions are found every day. Deutsche Telekom alone records up to 450,000 attacks per day on its honeypot systems and the number is rising. We need greater transparency about the threat situation. With its security radar, Deutsche Telekom is helping to achieve this,” said Thomas Kremer, board member responsible for Data Privacy, Legal Affairs and Compliance.

Which has a handy chart of the sources of attacks over the last month:

Top 15 of Source Countries (Last month)

Source of Attack              Number of Attacks
Russian Federation                    2,402,722
Taiwan, Province of China               907,102
Germany                                 780,425
Ukraine                                 566,531
Hungary                                 367,966
United States                           355,341
Romania                                 350,948
Brazil                                  337,977
Italy                                   288,607
Australia                               255,777
Argentina                               185,720
China                                   168,146
Poland                                  162,235
Israel                                  143,943
Japan                                   133,908

By measured “attacks,” the geographic location of China (not the Chinese government) is #12 as an origin of cyber-attacks.

After Russia, Taiwan (Province of China), Germany, Ukraine, Hungary, United States, and others.
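For anyone who wants to check the ranking rather than take my word for it, here is a small sketch that recomputes it from the figures in the chart. The numbers are copied from the Deutsche Telekom table. The helper names `rank_of` and `share_of` are my own, and the share is a share of the fifteen listed countries only, not of all recorded attacks.

```python
# Attack counts as listed in the "Top 15 of Source Countries (Last month)" chart.
attacks = {
    "Russian Federation": 2_402_722,
    "Taiwan, Province of China": 907_102,
    "Germany": 780_425,
    "Ukraine": 566_531,
    "Hungary": 367_966,
    "United States": 355_341,
    "Romania": 350_948,
    "Brazil": 337_977,
    "Italy": 288_607,
    "Australia": 255_777,
    "Argentina": 185_720,
    "China": 168_146,
    "Poland": 162_235,
    "Israel": 143_943,
    "Japan": 133_908,
}

# Sort by count, largest first, so position in the list is the rank.
ranked = sorted(attacks.items(), key=lambda item: item[1], reverse=True)
total = sum(attacks.values())


def rank_of(country: str) -> int:
    """1-based rank of a country among the listed sources."""
    return next(i for i, (name, _) in enumerate(ranked, start=1) if name == country)


def share_of(country: str) -> float:
    """Fraction of the top-15 total attributed to a country."""
    return attacks[country] / total


print(f"China: rank {rank_of('China')}, {share_of('China'):.1%} of the top-15 total")
```

Run as written, China comes out at rank 12 with roughly 2.3% of the top-15 total, against roughly 32% for the Russian Federation.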

Just in case you missed several recent news cycles, the Chinese government was being singled out as a cyber-attacker for policy or marketing reasons that are not clear.

This service makes the specious nature of those accusations apparent, although the motivations behind the reports remain unclear.

Before you incorporate any government data or report into a topic map, you should verify the information with at least two independent sources.

Document Mining with Overview:… [Webinar – March 15, 2013]

Filed under: News,Reporting,Text Mining — Patrick Durusau @ 9:34 am

Document Mining with Overview: A Digital Tools Tutorial

From the post:

Friday, March 15, 2013 at 2:00pm Eastern Time Enroll Now

Overview is a free tool for journalists that automatically organizes a large set of documents by topic, and displays them in an interactive visualization for exploration, tagging, and reporting. Journalists have already used it to report on FOIA document dumps, emails, leaks, archives, and social media data. In fact it will work on any set of documents that is mostly text. It integrates with DocumentCloud and can import your projects, or you can upload data directly in CSV form.

You can’t read 10,000 pages on deadline, but Overview can help you rapidly figure out which pages are the important ones — even if you’re not sure what you’re looking for.

This training event is part of a series on digital tools in partnership with the American Press Institute and The Poynter Institute, funded by the John S. and James L. Knight Foundation.

See more tools in the Digital Tools Catalog.

I have been meaning to learn more about “Overview” and this looks like a good opportunity.

March 10, 2013

Interviewing Databases???

Filed under: News,Reporting — Patrick Durusau @ 8:42 pm

“We’re going to tell people how to interview databases”: The rise of data (big and small) in journalism

Caroline O’Donovan writes:

Viktor Mayer-Schönberger and Kenneth Cukier published their joint tome on big data this week, Big Data: A Revolution That Will Transform How We Live, Work and Think. Mayer-Schönberger, a professor of Internet governance and regulation at Oxford, and Cukier, the data editor of The Economist, argue that having access to vast amounts of data will soon overwhelm our natural human tendency to look for correlation and causality where there is none. In the near future, we’ll be able to rely on much larger pools of “messy” data rather than small pools of “clean” data to get more accurate answers to our questions.

“We are taking things we never thought of as informational and rendering them in data,” Mayer-Schönberger said in a talk Wednesday at the Berkman Center for Internet & Society at Harvard. “Once we think of it as data, we can organize it and extract new information.”

In their book, Mayer-Schönberger and Cukier give a number of examples of industries that will be changed forever by the new messiness of data. Bradford Cross cofounded FlightCaster.com, which predicted U.S. flight delays using data about flight times and weather patterns. The company was sold in 2011, at which point “Cross turned his sights on another aging industry.” He started Prismatic, one of a number of news aggregators that filters content for users by analyzing data about sharing frequency on social networks and user preferences.

Caroline quotes Cukier on “interviewing databases,” saying:

When we teach journalism in the future, we’re not just going to teach people the fundamentals of how to do an interview, or what a lede paragraph is. We’re going to tell people how to interview databases. And also, just as we train journalists by telling them that sometimes people that we interview are unfaithful and lie, we’re going to have to teach them to be suspicious of the data, because sometimes the data lies, too. You have to bring the same scrutiny as in the analog world — talking to people and observing — to the data as well.

I like the image of interviewing a database.

How many times do you think a database will be asked the same questions by different reporters?

Do you think recording and sharing those answers would save other reporters time and resources?

How about enabling other reporters to ask questions you forgot or didn’t know enough to ask?

If any of that rings a bell, there may be topic maps in your future.
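As a rough illustration of recording and sharing the answers, here is a sketch of an “interview notebook” kept next to the data: each question, the SQL behind it, and the answer are logged, so the next reporter who asks the same thing gets the recorded answer. It uses Python's standard `sqlite3` module. The `interview_log` table, the `ask` helper, and the `records` sample data are all made up for the example, and a recorded answer will of course go stale if the underlying data changes.

```python
import json
import sqlite3


def open_notebook(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) log table that stores questions and answers."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interview_log "
        "(reporter TEXT, question TEXT, sql TEXT, answer TEXT)"
    )


def ask(conn: sqlite3.Connection, reporter: str, question: str, sql: str):
    """Return (answer, reused). Reuse a recorded answer when the same SQL was asked before."""
    row = conn.execute(
        "SELECT answer FROM interview_log WHERE sql = ?", (sql,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0]), True
    # Rows become lists so they look the same before and after the JSON round trip.
    answer = [list(r) for r in conn.execute(sql).fetchall()]
    conn.execute(
        "INSERT INTO interview_log VALUES (?, ?, ?, ?)",
        (reporter, question, sql, json.dumps(answer)),
    )
    return answer, False


# Made-up sample data standing in for a real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (category TEXT, amount INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [("a", 10), ("a", 5), ("b", 7)])
open_notebook(conn)

query = "SELECT category, SUM(amount) FROM records GROUP BY category ORDER BY category"
first, first_reused = ask(conn, "reporter-1", "Totals per category?", query)
second, second_reused = ask(conn, "reporter-2", "How much in each category?", query)
```

Note that the two reporters phrased the question differently and still hit the same recorded answer, because the match is on the query, not the wording. Deciding when two differently worded questions are the same question is where subject identity comes back in.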

March 5, 2013

Marketing Data Sets (Read Topic Maps)

Filed under: Marketing,News,Reporting — Patrick Durusau @ 11:50 am

The National Institute for Computer-Assisted Reporting (NICAR) has forty-seven (47) databases for sale in bulk or by geographic region.

Data sets range from “AJC School Test Scores” and “FAA Accidents and Incidents” to “Social Security Administration Death Master File” and “Wage and Hour Enforcement.”

The data sets cover decades of records.

There is a one hundred (100) record sample for each database.

The samples offer an avenue to show paying customers what more is possible with topic maps, based upon a familiar dataset.

With all the talk of gun control in the United States, consider the Federal Firearms/Explosives Licensees database.

For free you can see:

Main documentation (readme.txt)

Sample Data (sampleatf_ffl.xls)

Record layout (Layout.txt)

Do remember that NICAR already has the attention of an interested audience, should you need a partner in marketing a fuller result.

Tools, Slides and Links from NICAR13 [News Investigation/Reporting]

Filed under: News,Reporting — Patrick Durusau @ 11:14 am

Tools, Slides and Links from NICAR13 by Chrys Wu.

The acronyms were new to me: NICAR (National Institute for Computer-Assisted Reporting), a program of IRE (Investigative Reporters & Editors).

From the post:

NICAR13 brings together some of the sharpest minds and most experienced hands in investigative journalism. Over four days, people share, discuss and teach techniques for hunting leads, gathering data, and presenting stories. Of all the conferences I go to, this one gets the highest marks from attendees for intensive, immediately applicable learning; networking and fun.

NICAR 2014 will be in Baltimore from Feb. 27 to March 2. You should be there.

For additional tutorials, videos, presentations and tips see the lists from 2012 and 2011.

A real wealth of material if you are interested in mining, analyzing and reporting data.

Enjoy!

I first saw this in a tweet by Chrys Wu.

June 22, 2012

Business Intelligence and Reporting Tools (BIRT)

Filed under: BIRT,Business Intelligence,Reporting — Patrick Durusau @ 3:50 pm

Business Intelligence and Reporting Tools (BIRT)

From the homepage:

BIRT is an open source Eclipse-based reporting system that integrates with your Java/Java EE application to produce compelling reports.

Being reminded by the introduction that reports can consist of lists, charts, crosstabs, letters & documents, compound reports, I was encouraged to see:

BIRT reports consist of four main parts: data, data transforms, business logic and presentation.

  • Data – Databases, web services, Java objects all can supply data to your BIRT report. BIRT provides JDBC, XML, Web Services, and Flat File support, as well as support for using code to get at other sources of data. BIRT’s use of the Open Data Access (ODA) framework allows anyone to build new UI and runtime support for any kind of tabular data. Further, a single report can include data from any number of data sources. BIRT also supplies a feature that allows disparate data sources to be combined using inner and outer joins.
  • Data Transforms – Reports present data sorted, summarized, filtered and grouped to fit the user’s needs. While databases can do some of this work, BIRT must do it for “simple” data sources such as flat files or Java objects. BIRT allows sophisticated operations such as grouping on sums, percentages of overall totals and more.
  • Business Logic – Real-world data is seldom structured exactly as you’d like for a report. Many reports require business-specific logic to convert raw data into information useful for the user. If the logic is just for the report, you can script it using BIRT’s JavaScript support. If your application already contains the logic, you can call into your existing Java code.
  • Presentation – Once the data is ready, you have a wide range of options for presenting it to the user. Tables, charts, text and more. A single data set can appear in multiple ways, and a single report can present data from multiple data sets.
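The four parts are easier to keep straight with a toy pipeline in front of you. The sketch below is plain Python and does not use BIRT or its APIs. The two data sources, the `join`, `summarize`, `flag` and `render` functions, and the “review” rule are all invented for illustration. It shows two disparate sources combined with an inner or left outer join, a grouping with percentages of the overall total, a report-specific rule, and a simple text presentation.

```python
from collections import defaultdict

# Data: two hypothetical sources, e.g. a flat file of sales and a table of regions.
sales = [
    {"region_id": 1, "amount": 120.0},
    {"region_id": 1, "amount": 80.0},
    {"region_id": 2, "amount": 50.0},
    {"region_id": 3, "amount": 25.0},  # no matching row in regions
]
regions = [{"region_id": 1, "name": "North"}, {"region_id": 2, "name": "South"}]


def join(left, right, key, how="inner"):
    """Join two lists of dicts on key; how='left' also keeps unmatched left rows."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    out = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            out.extend({**row, **match} for match in matches)
        elif how == "left":
            out.append(dict(row))
    return out


def summarize(rows):
    """Data transform: group by region name, sum amounts, add percent of overall total."""
    totals = defaultdict(float)
    for row in rows:
        totals[row.get("name", "(unknown)")] += row["amount"]
    grand = sum(totals.values())
    return [
        {"region": name, "total": value, "pct": 100 * value / grand}
        for name, value in sorted(totals.items())
    ]


def flag(row):
    """Business logic: a made-up rule marking rows that lack a known region."""
    return {**row, "flag": "review" if row["region"] == "(unknown)" else ""}


def render(rows):
    """Presentation: a fixed-width text table."""
    lines = [f"{'Region':<10}{'Total':>8}{'Share':>8}  Note"]
    for r in rows:
        lines.append(f"{r['region']:<10}{r['total']:>8.2f}{r['pct']:>7.1f}%  {r['flag']}".rstrip())
    return "\n".join(lines)


joined = join(sales, regions, "region_id", how="left")
summary = [flag(r) for r in summarize(joined)]
report = render(summary)
print(report)
```

Switching `how="left"` to `how="inner"` silently drops the unmatched sale, which is exactly the kind of choice worth documenting before someone asks about it six months later.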

I was clued into BIRT by Actuate, so you might want to pay them a visit as well.

Anytime you are manipulating data, for analysis or reporting, you are working with subjects.

Topic maps are a natural for planning or documenting your transformations or reports.

Or let me put it this way: Do you really want to hunt down what you think you did six months ago for the last report? And then spend a day or two in frantic activity correcting what you mis-remember? There are other options. Your choice.
