Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 29, 2013

FLOPS Fall Flat for Intelligence Agency

Filed under: HPC,Intelligence,RFI-RFP,Semantics — Patrick Durusau @ 9:39 am

FLOPS Fall Flat for Intelligence Agency by Nicole Hemsoth.

From the post:

The Intelligence Advanced Research Projects Activity (IARPA) is putting out some RFI feelers in hopes of pushing new boundaries with an HPC program. However, at the core of their evaluation process is an overt dismissal of current popular benchmarks, including floating[-point] operations per second (FLOPS).

To uncover some missing pieces for their growing computational needs, IARPA is soliciting for “responses that illuminate the breadth of technologies” under the HPC umbrella, particularly the tech that “isn’t already well-represented in today’s HPC benchmarks.”

The RFI points to the general value of benchmarks (Linpack, for instance) as necessary metrics to push research and development, but argues that HPC benchmarks have “constrained the technology and architecture options for HPC system designers.” More specifically, in this case, floating point benchmarks are not quite as valuable to the agency as data-intensive system measurements, particularly as they relate to some of the graph and other so-called big data problems the agency is hoping to tackle using HPC systems.

Responses are due by Apr 05, 2013 4:00 pm Eastern.

Not that I expect most of you to respond to this RFI but I mention it as a step in the right direction for the processing of semantics.

Semantics are not native to vector fields and so every encoding of semantics in a vector field is a mapping.

Likewise, every extraction of semantics from a vector field is the reverse of that mapping process.

The impact on interpretation of mapping semantics into a vector field, and unmapping them again, is unclear.

As mapping and unmapping decisions are interpretative, it seems reasonable to conclude there is some impact. How much isn’t known.
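To make the point concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the terms, the two-dimensional vectors, and the `encode`/`decode` helpers are assumptions of mine, not anything from the RFI or from a real system. It only shows that a choice made while encoding (here, giving two senses of "bank" the same vector) cannot be undone when decoding.

```python
# Toy encoding table. The terms and vectors are invented for illustration;
# whoever built the table made interpretative choices about what to keep apart.
ENCODING = {
    "bank (river)": (1.0, 0.0),
    "bank (finance)": (1.0, 0.0),  # the encoder chose not to distinguish the senses
    "shore": (0.9, 0.1),
    "loan": (0.0, 1.0),
}


def encode(term):
    """Map a term to its vector (the 'mapping' step)."""
    return ENCODING[term]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode(vector):
    """Return every term whose vector is nearest to `vector` (the 'unmapping' step)."""
    best = min(squared_distance(vector, v) for v in ENCODING.values())
    return sorted(
        term for term, v in ENCODING.items()
        if squared_distance(vector, v) == best
    )


if __name__ == "__main__":
    # The round trip cannot tell the two senses of "bank" apart.
    print(decode(encode("bank (river)")))
    print(decode(encode("loan")))
```

The arithmetic on the vectors is fast and exact. What got lost was lost before any arithmetic happened.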

Vector fields are easy for high FLOPS systems to process but do you want a fast inaccurate answer or one that bears some resemblance to reality as experienced by others?

Graph databases, to name one alternative, are the current rage, at least according to graph database vendors.

But saying “graph database” isn’t the same as usefully capturing semantics with a graph database.

Or processing semantics once captured.

What we need is an alternative to FLOPS that represents effective processing of semantics.

Suggestions?

February 24, 2012

Social Media & the FBI

Filed under: FBI,RFI-RFP — Patrick Durusau @ 5:02 pm

I pointed to the FBI RFI on Social Media mining innocently enough, before the privacy advocates got into full voice.

Your privacy isn’t in any danger from this proposal from the FBI.

Yes, it talks about mining social media but it also says its objectives are:

  • Provide a user defined operations pictures (UDOP) that are flexible to support a myriad of functional FBI missions. Examples include but are not limited to: Reconnaissance & Surveillance, NSSE Planning, NSSE Operations, SIOC Operations, Counter Intelligence, Terrorism, Cybercrime, etc.
  • To improve the FBI SIOC’s open source intelligence collection capabilities by establishing a robust open source platform that has the flexibility to change search parameters and geo-locate the search based on breaking events or emerging threats.
  • Improve and accelerate the speed by which the FBI SIOC is alerted, vetted and notified of breaking events and emerging threats to more effectively notify the appropriate FO. LEGAT or OGA. (push vs. pull)
  • Provide FBI Executive Management with enhanced strategic, operational and tactical information for improved decision making
  • Empower the FBI SIOC with rapid self-service application to quickly adjust open source “search” parameters to a breaking event, crisis, and emerging threats.

Do you wonder what they mean by “open source?” Or do they intend to contract for “open source” in the Apache sense for do-it-yourself spyware?

The “…include but are not limited to: Reconnaissance & Surveillance, NSSE Planning, NSSE Operations, SIOC Operations, Counter Intelligence, Terrorism, Cybercrime, etc.” reminds me of the > 700,000 lines of code from the Virtual Case File project at the FBI.

The objective that makes me feel safe is: “Provide FBI Executive Management with enhanced strategic, operational and tactical information for improved decision making”

Does that help you realize this set of “objectives” was written by some FBI executive leafing through Wired magazine and just jotting down words and phrases?

I am sure there are some cutting edge applications that could be developed for the FBI, ones that would further its legitimate mission(s).

But unless and until the requirements for those applications are developed by, for and with the FBI personnel actively performing those missions, prior to seeking input from vendors, this is just another $170 Million rat-hole.

To be very clear, requirements should be developed by parties who have no interest in the final contract or services.

February 20, 2012

Social Media Application (FBI RFI)

Filed under: Data Mining,RFI-RFP,Social Media — Patrick Durusau @ 8:35 pm

Social Media Application (FBI RFI)

Current Due Date: 11:00 AM, March 13, 2012

You have to read the Social Media Application.pdf document to prepare a response.

Be aware that as of 20 February 2012, that document has a blank page every other page. I suspect it is the complete document but have written to confirm and to request a corrected document be posted.

“Out-Hoover Hoover: FBI wants massive data-mining capability for social media” does mention:

Nowhere in this detailed RFI, however, does the FBI ask industry to comment on the privacy implications of such massive data collection and storage of social media sites. Nor does the FBI say how it would define the “bad actors” who would be subjected [to] this type of scrutiny.

I take that to mean that the FBI is not seeking your comments on privacy implications or possible definitions of “bad actors.”

I won’t be able to prepare an official response because I don’t meet the contractor suitability requirements, which include a cost estimate for an offsite server as a solution to the requirements.

I will be going over the requirements and publishing my response here as though I meet the contractor suitability requirements. Could be an interesting exercise.

January 15, 2012

RFI: Public Access to Digital Data Resulting From Federally Funded Scientific Research

Filed under: Government Data,Marketing,RFI-RFP,Topic Maps — Patrick Durusau @ 9:14 pm

RFI: Public Access to Digital Data Resulting From Federally Funded Scientific Research

Summary:

In accordance with Section 103(b)(6) of the America COMPETES Reauthorization Act of 2010 (ACRA; Pub. L. 111-358), this Request for Information (RFI) offers the opportunity for interested individuals and organizations to provide recommendations on approaches for ensuring long-term stewardship and encouraging broad public access to unclassified digital data that result from federally funded scientific research. The public input provided through this Notice will inform deliberations of the National Science and Technology Council’s Interagency Working Group on Digital Data.

I responded to the questions on: Standards for Interoperability, Re-Use and Re-Purposing

(10) What digital data standards would enable interoperability, reuse, and repurposing of digital scientific data? For example, MIAME (minimum information about a microarray experiment; see Brazma et al., 2001, Nature Genetics 29, 371) is an example of a community-driven data standards effort.

(11) What are other examples of standards development processes that were successful in producing effective standards and what characteristics of the process made these efforts successful?

(12) How could Federal agencies promote effective coordination on digital data standards with other nations and international communities?

(13) What policies, practices, and standards are needed to support linking between publications and associated data?

The deadline was 12 January 2012 so what I have written below is my final submission.

I am tracking the Federal Register for other opportunities to comment, particularly those that bring topic maps to the attention of agencies and other applicants.

Please comment on this response so I can sharpen the language for the next opportunity. Examples would be very helpful, from different fields. For example, if it is a police type RFI, examples of use of topic maps in law enforcement would be very useful.

In the future I will try to rough out responses (with no references) early so I can ask for your assistance in refining the response.

BTW, it was a good thing I asked about the response format (the RFI didn’t say) b/c I was about to send in five (5) separate formats, OOo, MS Word, PDF, RTF, text. Suspect that would have annoyed them. 😉 Oh, they wanted plain email format. Just remember to ask!

Patrick Durusau
patrick@durusau.net

Patrick Durusau (consultant)

Covington, Georgia 30014

Comments on questions (10) – (13), under “Standards for Interoperability, Re-Use and Re-Purposing.”

(10) What digital data standards would enable interoperability, reuse, and repurposing of digital scientific data?

The goals of interoperability, reuse, and repurposing of digital scientific data are not usually addressed by a single standard on digital data.

For example, in astronomy, the FITS (http://en.wikipedia.org/wiki/FITS) format is routinely used to ensure digital data interoperability. In some absolute sense, if the data is in a proper FITS format, it can be “read” by FITS conforming software.

But being in FITS format is no guarantee of reuse or repurposing. Many projects adopt “local” extensions to FITS and their FITS files can be reused or repurposed, if and only if the local extensions are understood. (Local FITS Conventions (http://fits.gsfc.nasa.gov/fits_local_conventions.html), FITS Keyword Dictionaries (http://fits.gsfc.nasa.gov/fits_dictionary.html))

That is not to fault projects for having “local” conventions but to illustrate that scientific research can require customization of digital data standards and reuse and repurposing will depend upon documentation of those extensions.

Reuse and repurposing would be enhanced by the use of a mapping standard, such as ISO/IEC 13250, Topic Maps (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=38068). Briefly stated, topic maps enable the creation of mapping/navigational structures over digital (and analog) scientific data, furthering the goals of reuse and repurposing.

To return to the “local” conventions for FITS, it isn’t hard to imagine future solar research missions that develop different “local” conventions from the SDAC FITS Keyword Conventions (http://www.lmsal.com/solarsoft/ssw_standards.html). Interoperable to be sure because of the conformant FITS format, but reuse and repurposing become problematic with files from both data sets.

Topic maps enable experts to map the “local” conventions of the projects, one to the other, without any prior limitation on the basis for that mapping. It is important that experts be able to use their “present day” reasons to map data sets together, not just reasons from the dusty past.

Some data may go unmapped. Or should we say that not all data will be found equally useful? Mapping can and will make it easier to reuse and repurpose data but that is not without cost. The participants in a field should be allowed to decide whether mappings to legacy data are needed.
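As a rough illustration of mapping one project’s “local” conventions to another’s, consider the Python sketch below. It is not the Topic Maps data model and it uses no FITS library. The keyword names, the subject labels, and the `crosswalk` and `translate_header` helpers are all hypothetical, chosen only to show experts tying keywords to shared subjects and to show that some keywords will go unmapped.

```python
# Hypothetical "local" keyword conventions for two projects. Each keyword is
# tied by an expert to the subject it represents. None of these keyword names
# or subject labels are taken from a real FITS keyword dictionary.
PROJECT_A = {
    "EXPTIME": "exposure-duration",
    "OBS_STRT": "observation-start",
}
PROJECT_B = {
    "EXPOSURE": "exposure-duration",
    "T_BEGIN": "observation-start",
    "FILTNAME": "filter-name",
}


def crosswalk(source, target):
    """Map each source keyword to the target keywords sharing its subject."""
    by_subject = {}
    for keyword, subject in target.items():
        by_subject.setdefault(subject, []).append(keyword)
    return {
        keyword: by_subject.get(subject, [])
        for keyword, subject in source.items()
    }


def translate_header(header, mapping):
    """Rewrite header keywords using the mapping; keep unmapped ones apart."""
    translated, unmapped = {}, {}
    for keyword, value in header.items():
        targets = mapping.get(keyword, [])
        if targets:
            translated[targets[0]] = value
        else:
            unmapped[keyword] = value
    return translated, unmapped


if __name__ == "__main__":
    b_to_a = crosswalk(PROJECT_B, PROJECT_A)
    header_b = {"EXPOSURE": 30.0, "FILTNAME": "H-alpha"}
    print(translate_header(header_b, b_to_a))
```

The mapping here runs through shared subjects rather than through a fixed rule, so the basis for it stays in the hands of the experts who assign the subjects. Keywords with no counterpart are reported rather than forced into a match.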

Some Babylonian astronomical texts (http://en.wikipedia.org/wiki/Babylonian_astronomy) have survived but they haven’t been translated into modern astronomical digital format. The point being that no rule for mapping between data sets will fit all occasions.

When mapping is appropriate, topic maps offer the capacity to reuse data across shifting practices of nomenclature and styles. Twenty years ago asking about “Dublin Core” would have evoked a puzzled look. Asking about a current feature in “Dublin Core” twenty years from now is likely to evoke the same response.

Planning on change and mapping it when useful is a better response than pretending change stops with the current generation.

(11) What are other examples of standards development processes that were successful in producing effective standards and what characteristics of the process made these efforts successful?

The work of the IAU (International Astronomical Union (http://www.iau.org/)) and its maintenance of the FITS standard mentioned above is an example of a successful data standard effort.

Not formally part of the standards process but the most important factor was the people involved. They were dedicated to the development of data and placing that data in the hands of others engaged in the same enterprise.

To put a less glowing and perhaps repeatable explanation on their sharing, one could say members of the astronomical community had a mutual interest in sharing data.

Where gathering of data is dependent upon the vagaries of the weather, equipment, observing schedules and the like, data has to be taken from any available source. That being the case, there is an incentive to share data with others in like circumstances.

Funding decisions for research should not only depend on the use of standards that enable sharing but also give heavy weight to active sharing.

(12) How could Federal agencies promote effective coordination on digital data standards with other nations and international communities?

The answer here depends on what is meant by “effective coordination.” It wasn’t all that long ago that the debates were raging about whether ODF (ISO/IEC 26300) and OOXML (ISO/IEC 29500) should both be ISO standards. Despite being (or perhaps because of being) the ODF editor, I thought it would be to the advantage of both proposals to be ISO standards.

Several years later, I stand by that position. Progress in drawing the standards closer together has been slower than I would like, but there are applications that support both, so that is a start.

Different digital standards have and will develop for the same areas of research. Some for reasons that aren’t hard to see, some for historical accidents, others for reasons we may never know. Semantic diversity expressed in the existence of different standards is going to be with us always.

Attempting to force different communities (the source of different standards) together will have unhappy results all the way around. Instead, federal agencies should take the initiative to be the cross-walk as it were between diverse groups working in the same areas. As semantic brokers, who are familiar with two or three or perhaps more perspectives, federal agencies will offer a level of expertise that will be hard to match.

It will be a slow, evolutionary process but contributions based on understanding different perspectives will bring diverse efforts closer together. It won’t be quick or easy but federal agencies are uniquely positioned to bring the long term commitment to develop such expertise.

(13) What policies, practices, and standards are needed to support linking between publications and associated data?

Linking between publications and associated data presumes availability of the associated data. To recall the comments on incentives for sharing, making data available should be a requirement for present funding and a factor to be considered for future funding.

Applications for funding should also be judged on the extent to which they plan on incorporating existing data sets and/or provide reasons why that data should not be reused. Agencies can play an important “awareness” role by developing and maintaining resources that catalog data in given fields.

It isn’t clear that any particular type of linking between publication and associated data should be mandated. The “type” of linking is going to vary based on available technologies.

What is clear is that the publication’s dependency on associated data should be clearly identified. Moreover, the data should be documented such that in the absence of the published article, a researcher in the field could use or reuse the data.

I added categories for RFI-RFP to make it easier to find this sort of analysis.

If you have any RFI-RFP responses that you feel like you can post, please do and send me links.
