Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

April 3, 2012

1940 Census (U.S.A.)

Filed under: Census Data,Government Data — Patrick Durusau @ 4:18 pm

1940 Census (U.S.A.)

From the “about” page:

Census records are the only records that describe the entire population of the United States on a particular day. The 1940 census is no different. The answers given to the census takers tell us, in detail, what the United States looked like on April 1, 1940, and what issues were most relevant to Americans after a decade of economic depression.

The 1940 census reflects economic tumult of the Great Depression and President Franklin D. Roosevelt’s New Deal recovery program of the 1930s. Between 1930 and 1940, the population of the Continental United States increased 7.2% to 131,669,275. The territories of Alaska, Puerto Rico, American Samoa, Guam, Hawaii, the Panama Canal, and the American Virgin Islands comprised 2,477,023 people.

Besides name, age, relationship, and occupation, the 1940 census included questions about internal migration; employment status; participation in the New Deal Civilian Conservation Corps (CCC), Works Progress Administration (WPA), and National Youth Administration (NYA) programs; and years of education.

Great for ancestry and demographic studies. What other data would you use with this census information?

March 17, 2012

Every open spending data site in the US ranked and listed

Filed under: Government Data,Transparency — Patrick Durusau @ 8:19 pm

Every open spending data site in the US ranked and listed

Lisa Evans (Guardian, UK) writes:

The Follow the Money 2012 report has this week revealed the good news that more US states are being open about their public spending by publishing their transactions on their websites. It has also exposed the states of Arkansas, Idaho, Iowa, Montana and Wyoming that are keeping their finances behind a password protected wall or are just not publishing at all.

A network of US Public Interest Research Groups (US PIRGs), which produced the report, revealed that 46 states now “allow residents to access checkbook-level information about government expenditures online”.

The checkbook means a digital copy of who receives state money, how much, and for what purpose. Perhaps to make sense of this ‘checkbook’ concept it’s useful to compare US and UK public finance transparency.

A lot of data to be sure and far more than was available as little as ten (10) years ago.

It is “checkbook-level” type information but that is only a starting point for transparency.

Citizens can spot double billing/payments or “odd” billing/payments but that isn’t transparency. Or rather it is transparency of a sort but not its full potential.

For example, suppose your local government spends over $300,000 a year of its IT budget on GIS services, and the monthly billings and payments are all correct and proper. What you are missing is that local developers, who have long-standing relationships with elected officials, benefit from the GIS services for planning new developments. The public doesn’t benefit from the construction of new developments, which strain existing infrastructure for the benefit of the very few.

To develop that level of transparency you would need electronic records of campaign support, government phone records, property records, visitor logs, and other data. And quite possibly a topic map to make sense of it all. Interesting to think about.

March 6, 2012

MPC – Minnesota Population Center

Filed under: Census Data,Government Data — Patrick Durusau @ 8:09 pm

MPC – Minnesota Population Center

I mentioned the Integrated Public Use Microdata Series (IPUMS-USA) data set last year, which describes itself as:

IPUMS-USA is a project dedicated to collecting and distributing United States census data. Its goals are to:

  • Collect and preserve data and documentation
  • Harmonize data
  • Disseminate the data absolutely free!

Use it for GOOD — never for EVIL

There are international data sets and more U.S. data that may be of interest.

February 28, 2012

OECD Homepage

Filed under: Government Data,Statistics — Patrick Durusau @ 8:41 pm

OECD Homepage

More about how I got to this site in a moment, but it is a wealth of statistical information.

From the about page:

The mission of the Organisation for Economic Co-operation and Development (OECD) is to promote policies that will improve the economic and social well-being of people around the world.

The OECD provides a forum in which governments can work together to share experiences and seek solutions to common problems. We work with governments to understand what drives economic, social and environmental change. We measure productivity and global flows of trade and investment. We analyse and compare data to predict future trends. We set international standards on a wide range of things, from agriculture and tax to the safety of chemicals.

We look, too, at issues that directly affect the lives of ordinary people, like how much they pay in taxes and social security, and how much leisure time they can take. We compare how different countries’ school systems are readying their young people for modern life, and how different countries’ pension systems will look after their citizens in old age.

Drawing on facts and real-life experience, we recommend policies designed to make the lives of ordinary people better. We work with business, through the Business and Industry Advisory Committee to the OECD, and with labour, through the Trade Union Advisory Committee. We have active contacts as well with other civil society organisations. The common thread of our work is a shared commitment to market economies backed by democratic institutions and focused on the wellbeing of all citizens. Along the way, we also set out to make life harder for the terrorists, tax dodgers, crooked businessmen and others whose actions undermine a fair and open society.

I got to the site by following a link to OECD.StatExtracts, a beta page reported in Christophe Lalanne’s A bag of tweets / Feb 2012.

I am sure comments (helpful ones in particular) would be appreciated on the beta pages.

My personal suspicion is that eventually very little data will be transferred in bulk; most large data sets will admit both pre-programmed and ad hoc processing requests. That is already quite common in astronomy (both optical and radio).

February 22, 2012

Look But Don’t Touch

Filed under: Data,Geographic Data,Government Data,Transparency — Patrick Durusau @ 4:48 pm

I would describe the Atlanta GIS Data Catalog as a Look But Don’t Touch system. A contrast to the efforts of DC at transparency.

From the webpage:

GIS Data Catalog

Atlanta GIS creates and maintains many GIS data sets (also known as “layers” because of the way they are layered one on top of another to create a map) and collects others from external sources, mostly other government agencies. Each layer represents some class of geographic feature. The features represented can be physical, such as roads, buildings and streams, or they can be conceptual, such as neighborhood boundaries, property lines and the locations of crimes.

The GIS Data Catalog is an on-line compilation of information on GIS layers used by the City. The catalog allows you to quickly locate GIS data by searching by keyword. You can also view metadata for each data layer in the catalog. All data in the catalog represent the best and most current GIS data maintained or used by the city. The city’s GIS metadata is maintained in conformance with a standard defined by the Federal Geographic Data Committee (FGDC).

The data layers themselves are not available for download from the catalog. Data can be requested by contacting the originating department or agency. More specific contact information is available within the metadata for many data layers. (emphasis added)

I am sure most agencies would supply the data on request, but why require the request?

To add a request-processing position to the agency payroll? To create procedures for processing requests, hold meetings on granting them, run an appeals process for rejected requests, and keep records of all of the foregoing, plus more?

That doesn’t sound like transparent government or effective use of tax dollars to me.

District of Columbia – Data Catalog

Filed under: Data,Government Data,Open Data,Transparency — Patrick Durusau @ 4:48 pm

District of Columbia – Data Catalog

This is an example of a city moving towards transparency.

A large number of data sets to access (485 as of today), with live feeds to some data streams.

Eurostat

Filed under: Data,Dataset,Government Data,Statistics — Patrick Durusau @ 4:48 pm

Eurostat

From the “about” page:

Eurostat’s mission: to be the leading provider of high quality statistics on Europe.

Eurostat is the statistical office of the European Union situated in Luxembourg. Its task is to provide the European Union with statistics at European level that enable comparisons between countries and regions.

This is a key task. Democratic societies do not function properly without a solid basis of reliable and objective statistics. On one hand, decision-makers at EU level, in Member States, in local government and in business need statistics to make those decisions. On the other hand, the public and media need statistics for an accurate picture of contemporary society and to evaluate the performance of politicians and others. Of course, national statistics are still important for national purposes in Member States whereas EU statistics are essential for decisions and evaluation at European level.

Statistics can answer many questions. Is society heading in the direction promised by politicians? Is unemployment up or down? Are there more CO2 emissions compared to ten years ago? How many women go to work? How is your country’s economy performing compared to other EU Member States?

International statistics are a way of getting to know your neighbours in Member States and countries outside the EU. They are an important, objective and down-to-earth way of measuring how we all live.

I have seen Eurostat mentioned, usually negatively, by data aggregation services. I visited Eurostat today and found it quite useful.

For the non-data professional, there are graphs and other visualizations of popular data.

For the data professional, there are bulk downloads of data and other technical information.

I am sure there is room for improvement, but specific feedback is required to make that happen. (It has been my experience that positive, specific feedback works best. Find something nice to say and then suggest a change to improve the outcome.)

February 13, 2012

Statistics on the length and linguistic complexity of bills

Filed under: Government Data,Legal Informatics — Patrick Durusau @ 8:20 pm

Statistics on the length and linguistic complexity of bills

From the post:

Where would you go to find out what the longest bill of the 112th Congress was by number of sections (H.R. 1473)? How about by number of unique words (H.R. 3671)? What about by Flesch-Kincaid reading level (S. 475)?

The reading level scores make me doubt that members of Congress have written any of the legislation.
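For a sense of how such scores are produced, here is a minimal sketch of the Flesch-Kincaid grade-level formula in Python. The syllable counter is a crude heuristic (real tools use pronunciation dictionaries) and the sample sentence is invented:

    import re

    def count_syllables(word):
        # Rough heuristic: each run of consecutive vowels counts as one syllable.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text) or ["x"]
        syllables = sum(count_syllables(w) for w in words)
        # Grade = 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
        return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

    print(round(flesch_kincaid_grade(
        "The Congress finds that a uniform national standard is required."), 1))

Long, convoluted sentences push the first term up; polysyllabic legal vocabulary pushes the second.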

February 11, 2012

Turning government data into private sector products is complicated business

Filed under: Government Data,Marketing — Patrick Durusau @ 7:51 pm

Turning government data into private sector products is complicated business by Joseph Marks.

From the post:

The government launched its massive data set trove Data.gov in 2009 with a clear mission: to put information the government was gathering anyway into the hands of private sector and nonprofit Web and mobile app developers.

Once that data was out, the White House imagined, developers would set about turning it into useful products–optimizing Census Bureau statistics for marketers; Commerce Department data for exporters; and Housing and Urban Development Department information for building contractors, mortgage brokers and insurance adjusters.

When necessary, the government also would be able to prime the pump with agency-sponsored code-a-thons and app development competitions sponsored through Challenge.gov, a White House initiative that paid out $38 million to prize-winning developers during its first year, which ended in September.

But turning government data into private sector products has proved more complicated in practice.

Good article about some niche uses of data that have succeeded. Like anything else, you can only repackage and re-sell data that is of interest to some customer.

Question: Is anyone taking published agency data and re-selling it to the agencies releasing the data? Perhaps combined with data from other agencies? With the push on to cut costs, that might be an interesting approach.

Weave open-source data visualization offers power, flexibility

Filed under: Government Data,Weave — Patrick Durusau @ 7:41 pm

Weave open-source data visualization offers power, flexibility by Sharon Machlis.

From the post:

When two Boston-area organizations rolled out an interactive data visualization website last month, it represented one of the largest public uses yet for the open-source project Weave — and more are on the way.

Three years in development so far and still in beta, Weave is designed so government agencies, non-profits and corporate users can offer the public an easy-to-use platform for examining information. Want to see the relationship between low household incomes and student reading scores in eastern Mass.? How housing and transportation costs compare with income? Or maybe how obesity rates have changed over time? Load some data to generate a table, scatter plot and map.

In addition to viewing data, mousing over various entries lets you highlight items on multiple visualizations at once: map, map legend, bar chart and scatter plot, for example. Users can also add visualization elements or change data sets, as well as right-click to look up related information on the Web.

This story about Weave highlights different data sets than the last one I reported. Is this a “where there’s smoke there’s fire” type of situation? That is to say, does public access to and manipulation of data have the potential to make a real difference?

If so, in what way? Will open access to data result in closure of secret courts? Or secret indictments and evidence? The evidence that has come to light via diplomatic cables, for example, is embarrassing for incompetent or crude individuals. Hardly the stuff of “national security.” (Sorry, don’t know how to embed a drum roll in the page, maybe in HTML5 I can.)

February 5, 2012

…House Legislative Data and Transparency Conference

Filed under: Government Data,Legal Informatics,Transparency — Patrick Durusau @ 8:05 pm

Video and Other Resources Available for House Legislative Data and Transparency Conference

From the post:

Video is now available for the House Legislative Data and Transparency Conference, held 2 February 2012, in Washington, DC. The conference was hosted by the U.S. House of Representatives’ Committee on House Administration.

Click here for slides from some of the presentations (scroll down).

The Twitter hashtag for the conference was #ldtc.

Presentations concerned metadata and dissemination standards and practices for U.S. federal legislative data, including open government data standards, XML markup, integrating multimedia resources into legislative data, and standards for evaluating the transparency of U.S. federal legislative data.

Interesting source of information on legislative data.

February 1, 2012

One year on: 10 times bigger, masses more data… and a new API

Filed under: Corporate Data,Dataset,Government Data — Patrick Durusau @ 4:35 pm

One year on: 10 times bigger, masses more data… and a new API

From the post:

Was it just a year ago that we launched OpenCorporates, after just a couple months’ coding? When we opened up over 3 million companies and allowed searching across multiple jurisdictions (admittedly there were just three of them to start off with)?

Who would have thought that 12 months later we would have become 10 times bigger, with over 30 million companies and over 35 jurisdictions, and lots of other data too. So we could use this as an example to talk about some of the many milestones in that period, about all the extra data we’ve added, about our commitment to open data, and the principles behind it.

We’re not going to do that however, instead we’d rather talk about the new API we’ve just launched, allowing full access to all the info, and importantly allowing searches via the API too. In fact, we’ve now got a full website devoted to the api, http://api.opencorporates.com, and on it you’ll find all the documentation, example API calls, versioning information, error messages, etc.

Congratulations to OpenCorporates on a stellar year!

The collection of dots to connect has gotten dramatically larger!
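If you want to poke at the new API yourself, here is a minimal sketch in Python. The v0.2 path and the results/companies/company layout of the response are my reading of the API documentation, so verify them at http://api.opencorporates.com before relying on this:

    import json
    import urllib.request

    # Search companies by name across all jurisdictions.
    url = "http://api.opencorporates.com/v0.2/companies/search?q=acme&format=json"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    # Assumed response layout: results -> companies -> [{"company": {...}}, ...]
    for item in data["results"]["companies"]:
        company = item["company"]
        print(company["name"], company["jurisdiction_code"])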

January 29, 2012

Statistics Finland is making further utilisation of statistical data easier

Filed under: Census Data,Government Data — Patrick Durusau @ 9:08 pm

Statistics Finland is making further utilisation of statistical data easier

From the post:

Statistics Finland has confirmed new Terms of Use for the utilisation of already published statistical data. In them, Statistics Finland grants a universal, irrevocable right to the use of the data published in its www.stat.fi website service and in related free statistical databases. The right extends to use for both commercial and non-commercial purposes. The aim is to make further utilisation of the data easier and thereby increase the exploitation and effectiveness of statistics in society.

At the same time, an open interface has been built to the StatFin database. The StatFin database is a general database built with PC-Axis tools that is free-of-charge and contains a wide array of statistical data on a variety of areas in society. It contains data from some 200 sets of statistics, thousands of tables and hundreds of millions of individual data cells. The contents of the StatFin database have been systematically widened in the past few years and its expansion with various information contents and regional divisions will be continued even further.

Curious if the free commercial re-use of government collected data (paid for by taxpayers) favors established re-sellers of data or startups that will combine existing data in interesting ways. Thoughts?

First seen at Christophe Lalanne’s Bag of Tweets for January 2012.
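What might a query against the StatFin open interface look like? A sketch in Python; the host, the table path, and the JSON query body are assumptions on my part (drawn from PX-Web API conventions), so check the documentation before relying on them:

    import json
    import urllib.request

    # Hypothetical table path; StatFin exposes tables under a PX-Web style API.
    url = "http://pxnet2.stat.fi/PXWeb/api/v1/en/StatFin/some/table.px"
    query = {"query": [], "response": {"format": "json"}}  # empty query = whole table

    req = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))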

January 24, 2012

USAspending.gov

Filed under: Data Source,Government Data — Patrick Durusau @ 3:40 pm

USAspending.gov

This website is required by the “Federal Funding Accountability and Transparency Act (Transparency Act).”

The FAQ describes its purpose as:

To provide the public with information about how their tax dollars are spent. Citizens have a right and a need to understand where tax dollars are spent. Collecting data about the various types of contracts, grants, loans, and other types of spending in our government will provide a broader picture of the Federal spending processes, and will help to meet the need of greater transparency. The ability to look at contracts, grants, loans, and other types of spending across many agencies, in greater detail, is a key ingredient to building public trust in government and credibility in the professionals who use these agreements.

An amazing amount of data which can be searched or browsed in a number of ways.

It is missing one ingredient that would change it from an amazing information resource to a game-changing information resource: you.

The site can only report information known to the federal government and covered by the Transparency Act.

For example, it can’t report on family or personal relationships between various parties to contracts, or even offer good (or bad) information on performance on contracts or methods used by contractors.

However, a topic map (links into this site are stable) could combine this information with other information quite easily.

I ran across this site in Analyzing US Government Contract Awards in R by Vik Paruchuri. A very good article that scratches the surface of mining this content.
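To make the “combine this information with other information” point concrete, here is a toy sketch of the kind of merge a topic map would formalize: joining contract awards to campaign contributions on a crudely normalized contractor name. All of the data is invented:

    # Toy data standing in for two separately published data sets.
    awards = [
        {"contractor": "ACME Corp.", "agency": "DOD", "amount": 1_200_000},
        {"contractor": "Acme Corporation", "agency": "GSA", "amount": 300_000},
    ]
    contributions = [
        {"donor": "ACME CORPORATION", "recipient": "Rep. Smith", "amount": 5_000},
    ]

    def normalize(name):
        # Crude identity resolution: lowercase, drop trailing periods and suffixes.
        name = name.lower().rstrip(".")
        for suffix in (" corporation", " corp", " inc", " llc"):
            name = name.removesuffix(suffix)
        return name.strip()

    by_contractor = {}
    for award in awards:
        by_contractor.setdefault(normalize(award["contractor"]), []).append(award)

    for c in contributions:
        for award in by_contractor.get(normalize(c["donor"]), []):
            print(f"{c['donor']} gave to {c['recipient']} and won "
                  f"${award['amount']:,} from {award['agency']}")

Real contractor-name matching is much messier, which is exactly why stable subject identity matters.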

January 16, 2012

Legislation Identity Issues

Filed under: Government,Government Data,Transparency — Patrick Durusau @ 2:41 pm

After posting House Launches Transparency Portal I started to think about all the identity issues that such a resource raises. None of them are new, but with greater access to the stuff of legislation, those issues come to the fore.

The easy ones are going to be identifying the bills themselves, what parts of the U.S. Code they modify, their legislative history (in terms of amendments), and so on. Current legislation can be tracked as well.

Legislation identifies the subject matter to which it applies, what the rules on the subject are to become, and a host of other details.

But more than that, legislation, indirectly, identifies who will benefit from the legislation and who will bear the costs of it. Not identified in the sense that we think of social security numbers, addresses or geographic location, but just as certainly identification.

For example, what if a bill in Congress says that it applies to all cities with more than two million inhabitants (New York, Los Angeles, Chicago, Houston – largest to smallest)? Sounds fair on the face of it, but only four cities in four different states are going to benefit from it.

Another set of identity issues will be who wrote the legislation. Oh, err, members of Congress are “credited” with writing bills but it is my understanding that is a polite fiction. Bills are written by specialists in legislative writing. Some work for the government, some for lobbyists, some for other interest groups, etc.

It would make a very interesting subject identity project to use authorship techniques to try to identify when Covington & Burling LLP, Arnold & Porter LLP, Monsanto, or the hand of the ACLU can be detected in legislation.
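As a sketch of how such an authorship test might begin, here is a minimal function-word profile comparison using cosine similarity in Python. The function-word list and the sample texts are placeholders; serious stylometry uses established feature sets and far more text:

    import math
    import re
    from collections import Counter

    FUNCTION_WORDS = ["the", "of", "and", "to", "shall", "such", "any", "under", "pursuant"]

    def profile(text):
        # Relative frequency of each function word in the text.
        counts = Counter(re.findall(r"[a-z']+", text.lower()))
        total = sum(counts.values()) or 1
        return [counts[w] / total for w in FUNCTION_WORDS]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    bill = "Such rules as the Secretary shall prescribe pursuant to this section ..."
    known = {
        "Drafting shop A": "The parties shall, under any such agreement, ...",
        "Drafting shop B": "To the extent of any conflict pursuant to ...",
    }

    for author, sample in known.items():
        print(author, round(cosine(profile(bill), profile(sample)), 3))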

Whether you identify the “actual” author of a bill or not, there is also the question of who paid for the legislation.

All of these “identity” issues and others have always existed with regard to legislation, regulations, executive orders, etc., but making bills available electronically may change how those issues are approached.

Not a plan of action, but just imagine, say, a number of people interested enough in a particular bill to loosely organize and produce an annotated version that ties it to existing laws, its probable sources, and who it benefits. Other people, perhaps specialists in campaign finances or even local politics for an area, could further the analysis started by others.

I have been told that political blogging works that way, unlike the conventional news services that hoard information and therefore only offer partial coverage of any event.

Whatever semantic technology is used to produce the annotations, RDF, linked data, or topic maps (my favorite), you are still going to face complex identity issues with legislation.

Suggestions on possible approaches or interfaces?

January 15, 2012

House Launches Transparency Portal

Filed under: Government,Government Data,Transparency — Patrick Durusau @ 9:19 pm

House Launches Transparency Portal by Daniel Schuman.

From the post:

Making good on part of the House of Representative’s commitment to increase congressional transparency, today the House Clerk’s office launched http://docs.house.gov/, a one stop website where the public can access all House bills, amendments, resolutions for floor consideration, and conference reports in XML, as well as information on floor proceedings and more. Information will ultimately be published online in real time and archived for perpetuity.

The Clerk is hosting the site, and the information will primarily come from the leadership, the Committee on House Administration, the Rules Committee, and the Clerk’s office. The project has been driven by House Republican leaders as part of a push for transparency. Important milestones include the adoption of the new House Rules in January 2011 that gave the Committee on House Administration the power to establish standards for publishing documents online, an April 2011 letter from the Speaker and Majority Leader to the Clerk calling for better public access to House information, a Committee on House Administration hearing in June 2011 on modernizing information delivery in the House, a December 2011 public meeting on public access to congressional information, and finally the late December adoption of online publication standards.

Some immediate steps to take:

  • Contact the House Clerk’s office to express your appreciation for their efforts.
  • If you are a US citizen, contact your representatives to express your support for this effort and that you look forward to more transparency.
  • Write to your local TV/radio/newspaper to point out this important resource and express your interest. (Keep it really non-technical. Transparency = Good.)
  • Write to your local school board/school, etc., to suggest they could use this as a classroom resource. (Offer to help as well.)
  • Make use of the data and credit your source.
  • Urge others to do the foregoing steps.

I have doubts about the transparency efforts but also think we should give credit where credit is due. A lot of people have worked very hard to make this much transparency possible, so let’s make the best use of it we can.

RFI: Public Access to Digital Data Resulting From Federally Funded Scientific Research

Filed under: Government Data,Marketing,RFI-RFP,Topic Maps — Patrick Durusau @ 9:14 pm

RFI: Public Access to Digital Data Resulting From Federally Funded Scientific Research

Summary:

In accordance with Section 103(b)(6) of the America COMPETES Reauthorization Act of 2010 (ACRA; Pub. L. 111-358), this Request for Information (RFI) offers the opportunity for interested individuals and organizations to provide recommendations on approaches for ensuring long-term stewardship and encouraging broad public access to unclassified digital data that result from federally funded scientific research. The public input provided through this Notice will inform deliberations of the National Science and Technology Council’s Interagency Working Group on Digital Data.

I responded to the questions on: Standards for Interoperability, Re-Use and Re-Purposing

(10) What digital data standards would enable interoperability, reuse, and repurposing of digital scientific data? For example, MIAME (minimum information about a microarray experiment; see Brazma et al., 2001, Nature Genetics 29, 371) is an example of a community-driven data standards effort.

(11) What are other examples of standards development processes that were successful in producing effective standards and what characteristics of the process made these efforts successful?

(12) How could Federal agencies promote effective coordination on digital data standards with other nations and international communities?

(13) What policies, practices, and standards are needed to support linking between publications and associated data?

The deadline was 12 January 2012 so what I have written below is my final submission.

I am tracking the Federal Register for other opportunities to comment, particularly those that bring topic maps to the attention of agencies and other applicants.

Please comment on this response so I can sharpen the language for the next opportunity. Examples from different fields would be very helpful. For example, if it is a police-type RFI, examples of the use of topic maps in law enforcement would be very useful.

In the future I will try to rough out responses (with no references) early so I can ask for your assistance in refining the response.

BTW, it was a good thing I asked about the response format (the RFI didn’t say) b/c I was about to send it in five (5) separate formats: OOo, MS Word, PDF, RTF, and text. Suspect that would have annoyed them. 😉 Oh, they wanted plain email format. Just remember to ask!

Patrick Durusau
patrick@durusau.net

Patrick Durusau (consultant)

Covington, Georgia 30014

Comments on questions (10) – (13), under “Standards for Interoperability, Re-Use and Re-Purposing.”

(10) What digital data standards would enable interoperability, reuse, and repurposing of digital scientific data?

The goals of interoperability, reuse, and repurposing of digital scientific data are not usually addressed by a single standard on digital data.

For example, in astronomy, the FITS (http://en.wikipedia.org/wiki/FITS) format is routinely used to ensure digital data interoperability. In some absolute sense, if the data is in a proper FITS format, it can be “read” by FITS conforming software.

But being in FITS format is no guarantee of reuse or repurposing. Many projects adopt “local” extensions to FITS and their FITS files can be reused or repurposed, if and only if the local extensions are understood. (Local FITS Conventions (http://fits.gsfc.nasa.gov/fits_local_conventions.html), FITS Keyword Dictionaries (http://fits.gsfc.nasa.gov/fits_dictionary.html))

That is not to fault projects for having “local” conventions but to illustrate that scientific research can require customization of digital data standards and reuse and repurposing will depend upon documentation of those extensions.

Reuse and repurposing would be enhanced by the use of a mapping standard, such as ISO/IEC 13250, Topic Maps (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=38068). Briefly stated, topic maps enable the creation of mapping/navigational structures over digital (and analog) scientific data, furthering the goals of reuse and repurposing.

To return to the “local” conventions for FITS, it isn’t hard to imagine future solar research missions that develop different “local” conventions from the SDAC FITS Keyword Conventions (http://www.lmsal.com/solarsoft/ssw_standards.html). Interoperable to be sure because of the conformant FITS format, but reuse and repurposing become problematic with files from both data sets.

Topic maps enable experts to map the “local” conventions of the projects, one to the other, without any prior limitation on the basis for that mapping. It is important that experts be able to use their “present day” reasons to map data sets together, not just reasons from the dusty past.
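A minimal sketch of that kind of mapping in Python, using two invented “local” keyword conventions (in practice the headers would come from a FITS library such as astropy.io.fits):

    # Two hypothetical local conventions naming the same underlying subjects.
    # Real code would read these headers with, e.g., fits.open(path)[0].header
    mission_a = {"DATE_OBS": "2011-03-07T14:00:00", "WAVELNTH": 171}
    mission_b = {"T_OBS": "2011-03-07T14:00:00", "WAVE_LEN": 171}

    # Topic-map-style mapping: each subject lists the local name used for it.
    subjects = {
        "observation-time": {"mission_a": "DATE_OBS", "mission_b": "T_OBS"},
        "wavelength":       {"mission_a": "WAVELNTH", "mission_b": "WAVE_LEN"},
    }

    def lookup(subject, source, header):
        # Resolve a subject to a value regardless of the local convention in use.
        return header[subjects[subject][source]]

    print(lookup("wavelength", "mission_a", mission_a))
    print(lookup("wavelength", "mission_b", mission_b))

The point is that the mapping layer, not the data files, carries the knowledge of which local names denote the same subject.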

Some data may go unmapped. Or should we say that not all data will be found equally useful? Mapping can and will make it easier to reuse and repurpose data but that is not without cost. The participants in a field should be allowed to make the decision if mappings to legacy data are needed.

Some Babylonian astronomical texts (http://en.wikipedia.org/wiki/Babylonian_astronomy) have survived but they haven’t been translated into modern astronomical digital format. The point being that no rule for mapping between data sets will fit all occasions.

When mapping is appropriate, topic maps offer the capacity to reuse data across shifting practices of nomenclature and styles. Twenty years ago asking about “Dublin Core” would have evoked a puzzled look. Asking about a current feature in “Dublin Core” twenty years from now, is likely to have the same response.

Planning on change, and mapping it when useful, is a better response than pretending change stops with the current generation.

(11) What are other examples of standards development processes that were successful in producing effective standards and what characteristics of the process made these efforts successful?

The work of the IAU (International Astronomical Union (http://www.iau.org/)) and its maintenance of the FITS standard mentioned above is an example of a successful data standard effort.

Not formally part of the standards process but the most important factor was the people involved. They were dedicated to the development of data and placing that data in the hands of others engaged in the same enterprise.

To put a less glowing and perhaps repeatable explanation on their sharing, one could say members of the astronomical community had a mutual interest in sharing data.

Where gathering of data is dependent upon the vagaries of the weather, equipment, observing schedules and the like, data has to be taken from any available source. That being the case, there is an incentive to share data with others in like circumstances.

Funding decisions for research should depend not only on the use of standards that enable sharing but should also give heavy consideration to active sharing.

(12) How could Federal agencies promote effective coordination on digital data standards with other nations and international communities?

The answer here depends on what is meant by “effective coordination.” It wasn’t all that long ago that debates were raging about whether ODF (ISO/IEC 26300) and OOXML (ISO/IEC 29500) should both be ISO standards. Despite being the ODF editor (or perhaps because of it), I thought it would be to the advantage of both proposals to be ISO standards.

Several years later, I stand by that position. Progress has been slower than I would like at seeing the standards draw closer together but there are applications that support both so that is a start.

Different digital standards have and will develop for the same areas of research. Some for reasons that aren’t hard to see, some for historical accidents, others for reasons we may never know. Semantic diversity expressed in the existence of different standards is going to be with us always.

Attempting to force different communities (the source of different standards) together will have unhappy results all the way around. Instead, federal agencies should take the initiative to be the cross-walk, as it were, between diverse groups working in the same areas. As semantic brokers familiar with two, three, or perhaps more perspectives, federal agencies will offer a level of expertise that will be hard to match.

It will be a slow, evolutionary process, but contributions based on understanding different perspectives will bring diverse efforts closer together. Federal agencies are uniquely positioned to make the long-term commitment needed to develop such expertise.

(13) What policies, practices, and standards are needed to support linking between publications and associated data?

Linking between publications and associated data presumes availability of the associated data. To recall the comments on incentives for sharing, making data available should be a requirement for present funding and a factor to be considered for future funding.

Applications for funding should also be judged on the extent to which they plan on incorporating existing data sets and/or provide reasons why that data should not be reused. Agencies can play an important “awareness” role by developing and maintaining resources that catalog data in given fields.

It isn’t clear that any particular type of linking between publication and associated data should be mandated. The “type” of linking is going to vary based on available technologies.

What is clear is that the publication and its dependency on associated data should be clearly identified. Moreover, the data should be documented such that in the absence of the published article, a researcher in the field could use or reuse the data.

I added categories for RFI-RFP to make it easier to find this sort of analysis.

If you have any RFI-RFP responses that you feel like you can post, please do and send me links.

January 11, 2012

Monthly Twitter activity for all members of the U.S. Congress

Filed under: Data Source,Government Data,Tweets — Patrick Durusau @ 8:04 pm

Monthly Twitter activity for all members of the U.S. Congress by Drew Conway.

From the post:

Many months ago I blogged about the research that John Myles White and I are conducting on using Twitter data to estimate an individual’s political ideology. As I mentioned then, we are using the Twitter activity of members of the U.S. Congress to build a training data set for our model. A large part of the effort for this project has gone into designing a system to systematically collect the Twitter data on the members of the U.S. Congress.

Today I am pleased to announce that we have worked out most of the bugs, and now have a reliable data set upon which to build. Better still, we are ready to share. Unlike our old system, the data now lives on a live CouchDB database, and can be queried for specific research tasks. We have combined all of the data available from Twitter’s search API with the information on each member from Sunlight Foundation’s Congressional API.

Looks like an interesting data set to match up to the ages of addresses, doesn’t it?
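For the curious, a sketch of pulling documents from such a database over CouchDB’s plain HTTP API. The host, database name, and document fields here are placeholders:

    import json
    import urllib.request

    # CouchDB serves databases over HTTP; _all_docs lists documents.
    url = "http://example.org:5984/congress_tweets/_all_docs?include_docs=true&limit=5"
    with urllib.request.urlopen(url) as resp:
        rows = json.load(resp)["rows"]

    for row in rows:
        doc = row["doc"]
        # Field names are guesses at how tweet documents might be stored.
        print(doc.get("screen_name"), doc.get("text"))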

January 3, 2012

List of cities/states with open data – help me find more!

Filed under: Data,Government Data — Patrick Durusau @ 5:13 pm

List of cities/states with open data – help me find more!

A plea from “Simply Statistics” to increase its listing of cities with open data.

Mostly American and Canadian, with a few others, Berlin for example, suggested in comments.

I haven’t looked (yet), but since European libraries led the charge in many ways for greater access to their collections (my recollection; yours may differ), I would expect to find European cities and authorities also ahead in the race to publish public data.

Pointers from European readers? (Or I can look them up later this week, just not today.)

OpenData

Filed under: Dataset,Government Data — Patrick Durusau @ 5:06 pm

OpenData by Socrata

Another very large public data set collection.

Socrata developed the City of Chicago portal, which I mentioned at: Accessing Chicago, Cook and Illinois Open Data via PHP.

December 26, 2011

A Christmas Miracle

Filed under: Dataset,Government Data — Patrick Durusau @ 8:22 pm

A Christmas Miracle

From the post:

Data files on 407 banks, between the dates of 2007 to 2009, on the daily borrowing with the US Federal Reserve bank. The data sets are available from Bloomberg at this address.

This is an unprecedented look into the day-to-day transactions of banks with the Fed during one of the worst and most unusual times in US financial history. A time of weekend deals, large banks being summoned to sign contracts, and all around chaos. For the economist, technocrat, and R enthusiast this is the opportunity of a lifetime to examine and analyze financial data normally held in the strictest of confidentiality. A good comparison would be taking all of the auto companies and getting their daily production, sales, and cost data for two years and sending it out to the world. That has never happened.

Not to get too excited, what were released were daily totals, not the raw data itself.

Being a naturally curious person, when someone releases massaged data when the raw data would have been easier, I have to wonder what would I see if I had the raw data? Or perhaps in a topic maps context, what subjects could I link up with the raw data that I can’t with the massaged data?
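Even massaged daily totals support some analysis. A sketch, assuming a CSV laid out as date, bank, total (the layout is invented):

    import csv
    from collections import defaultdict

    # Hypothetical layout: date,bank,total -- one row per bank per day.
    peak = defaultdict(float)
    with open("fed_borrowing_daily_totals.csv", newline="") as f:
        for row in csv.DictReader(f):
            amount = float(row["total"])
            if amount > peak[row["bank"]]:
                peak[row["bank"]] = amount

    for bank, amount in sorted(peak.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{bank}: peak daily borrowing ${amount:,.0f}")

With the raw transaction data you could ask much finer-grained questions: who, when, and against what collateral.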

December 25, 2011

LittleSis

Filed under: Government Data,Politics — Patrick Durusau @ 6:07 pm

LittleSis* is a free database of who-knows-who at the heights of business and government. (*opposite of Big Brother).

Quick Summary: LittleSis is tracking “21,390 organizations, 64,453 people, and 339,769 connections between them”

From the “about” page:

LittleSis is a free database detailing the connections between powerful people and organizations.

We bring transparency to influential social networks by tracking the key relationships of politicians, business leaders, lobbyists, financiers, and their affiliated institutions. We help answer questions such as:

  • Who do the wealthiest Americans donate their money to?
  • Where did White House officials work before they were appointed?
  • Which lobbyists are married to politicians, and who do they lobby for?

All of this information is public, but scattered. We bring it together in one place. Our data derives from government filings, news articles, and other reputable sources. Some data sets are updated automatically; the rest is filled in by our user community.

Their blog is known as: Eyes on the Ties.

Just in case you are interested in politics. Looks like the sort of effort that would benefit from using a topic map.
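A toy sketch of the kind of connection-finding such a database enables, using Python’s networkx; all names and relationships here are invented:

    import networkx as nx

    # A tiny slice of who-knows-who data.
    g = nx.Graph()
    g.add_edge("Jane Doe", "MegaBank", relationship="board member")
    g.add_edge("Jane Doe", "Sen. Roe", relationship="campaign bundler")
    g.add_edge("MegaBank", "Acme Lobbying", relationship="client")

    # Who connects a politician to a lobbying firm? Walk the network.
    print(nx.shortest_path(g, "Sen. Roe", "Acme Lobbying"))

A topic map adds what a bare graph lacks: a principled way to say when “Jane Doe” in one filing is the same subject as “J. Doe” in another.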

December 22, 2011

MyFCC Platform Enables Government Data Mashups

Filed under: Government Data,Mashups — Patrick Durusau @ 7:40 pm

MyFCC Platform Enables Government Data Mashups by Kin Lane.

From the post:

The FCC just launched a new tool that allows any user to custom-build a dashboard from a variety of FCC released data, tools and services, built on the FCC API. The tool, called MyFCC, lets you create a customized FCC online experience for quick access to the tools and information you feel are most important. MyFCC makes it possible to easily create, save and manage a customized page, choosing from a menu of 22 “widgets” such as latest headlines and official documents, the daily digest, FCC forms and online filings.

Once you have built your customized MyFCC page, you can share your work using popular social network platforms, or embed it on any other website. The platform allows each widget to be shared independently, or the entire dashboard to be embedded in another site.

Modulo my usual comments about subject identification and reuse of identifications, this is at least a step in the right direction.
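As a taste of the FCC API underneath, a sketch calling what I understand to be its census block lookup; the endpoint and response keys are my reading of the documentation, so verify before use:

    import json
    import urllib.request

    # Assumed endpoint: FCC census block lookup by coordinates.
    url = ("http://data.fcc.gov/api/block/find"
           "?format=json&latitude=38.8977&longitude=-77.0365")
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    # Assumed response keys: Block/FIPS, State/name, County/name.
    print(data["Block"]["FIPS"], data["State"]["name"], data["County"]["name"])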

December 6, 2011

White House to open source Data.gov as open government data platform

Filed under: eGov,Government Data,Open Source — Patrick Durusau @ 8:10 pm

White House to open source Data.gov as open government data platform by Alex Howard.

From the post:

As 2011 comes to an end, there are 28 international open data platforms in the open government community. By the end of 2012, code from new “Data.gov-in-a-box” may help many more countries to stand up their own platforms. A partnership between the United States and India on open government has borne fruit: progress on making the open data platform Data.gov open source.

In a post this morning at the WhiteHouse.gov blog, federal CIO Steven VanRoekel (@StevenVDC) and federal CTO Aneesh Chopra (@AneeshChopra) explained more about how Data.gov is going global:

As part of a joint effort by the United States and India to build an open government platform, the U.S. team has deposited open source code — an important benchmark in developing the Open Government Platform that will enable governments around the world to stand up their own open government data sites.

The development is evidence that the U.S. and India are indeed still collaborating on open government together, despite India’s withdrawal from the historic Open Government Partnership (OGP) that launched in September. Chopra and VanRoekel explicitly connected the move to open source Data.gov to the U.S. involvement in the Open Government Partnership today. While we’ll need to see more code and adoption to draw substantive conclusions on the outcomes of this part of the plan, this is clearly progress.

The U.S. National Action Plan on Open Government, which represents the U.S. commitment to the OGP, included some details about this initiative two months ago, building upon a State Department fact sheet that was released in July. Back in August, representatives from India’s National Informatics Center visited the United States for a week-long session of knowledge sharing with the U.S. Data.gov team, which is housed within the General Services Administration.

“The secretary of state and president have both spent time in India over the past 18 months,” said VanRoekel in an interview today. “There was a lot of dialogue about the power of open data to shine light upon what’s happening in the world.”

The project, which was described then as “Data.gov-in-a-box,” will include components of the Data.gov open data platform and the India.gov.in document portal. Now, the product is being called the “Open Government Platform” — not exactly creative, but quite descriptive and evocative of open government platforms that have been launched to date. The first collection of open source code, which describes a data management system, is now up on GitHub.

During the August meetings, “we agreed upon a set of things we would do around creating excellence around an open data platform,” said VanRoekel. “We owned the first deliverable: a dataset management tool. That’s the foundation of an open source data platform. It handles workflow, security and the check-in of data — all of the work that goes around getting the data into the state it needs to be in before it goes online. India owns the next phase: the presentation layer.”

If the initiative bears fruit in 2012, as planned, the international open government data movement will have a new tool to apply toward open data platforms. That could be particularly relevant to countries in the developing world, given the limited resources available to many governments.

What’s next for open government data in the United States has yet to be written. “The evolution of data.gov should be one that does things to connect to web services or an API key manager,” said VanRoekel. “We need to track usage. We’re going to double down on the things that are proving useful.”

Interests that already hold indexes of government documents should find numerous opportunities for mapping into agency data as open government platforms spread.

November 27, 2011

DOD looks to semantics for better data-sharing, cost savings

Filed under: Federation,Funding,Government Data — Patrick Durusau @ 8:50 pm

DOD looks to semantics for better data-sharing, cost savings by Amber Currin.

From Federal Computer Week:

In its ongoing quest to catalyze cost efficiencies and improve information-sharing, the Defense Department is increasingly looking to IT to solve problems of all sizes. The latest bid involves high-tech search capabilities, interoperable data and a futuristic, data-rich internet known as semantic web.

In a new RFI, the Defense Information Systems Agency and Deputy Chief Management Office are looking to strengthen interoperability and data-sharing for a vast array of requirements through an enterprise information web (EIW). Their envisioned EIW is built on semantic web, which will allow better enterprise-wide collection, analysis and reporting of data necessary for managing personnel information and business systems, as well as protecting troops on the ground with crucial intelligence.

“At its heart, semantic web is about making it possible to integrate and share information at a web scale in a simple way that traditional databases don’t allow,” said James Hendler, senior constellation professor of the Tetherless World Research Constellation at Rensselaer Polytechnic Institute.

One way semantic web helps is by standardizing information to enable databases to better communicate with each other – something that could be particularly helpful for DOD’s diverse systems and lexicons.

“The information necessary for decision-making is often contained in multiple source systems managed by the military services, components and/or defense agencies. In order to provide an enterprise view or answer questions that involve multiple services or components, each organization receives data requests then must interpret the question and collect, combine and present the requested information,” the RFI reads.

Oh, and:

“DOD historically spends more than $6 billion annually developing and maintaining a portfolio of more than 2,000 business systems and web services. Many of these systems, and the underlying processes they support, are poorly integrated. They often deliver redundant capabilities that optimize a single business process with little consideration to the overall business enterprise,” DOD Deputy Chief Management Officer Beth McGrath said in an April 4 memo. “It is imperative, especially in today’s limited budget environment, to optimize our business processes and the systems that support them to reduce our annual business systems spending.”

Just in case you are interested, the deadline for responses is 19 December 2011. A direct link to the RFI.

I may actually respond. Would there be any interest in my posting my response to the RFI to get reader input?

That way I could revise it week by week until the deadline.

Might be a nice way to educate other contenders and the DoD about topic maps in general.

Comments?

BTW, if you are interested in technology and the U.S. federal government, try reading Federal Computer Week on a regular basis. At least you will know what issues are “up in the air” and the vocabulary being used to talk about them.

November 6, 2011

Rdatamarket Tutorial

Filed under: Data Mining,Government Data,R — Patrick Durusau @ 5:44 pm

From the Revolutions blog:

The good folks at DataMarket have posted a new tutorial on using the rdatamarket package (covered here in August) to easily download public data sets into R for analysis.

The tutorial describes how to install the rdatamarket package, how to extract metadata for data sets, and how to download the data themselves into R. The tutorial also illustrates a feature of the package I wasn’t previously aware of: you can use dimension filtering to extract just the portion of the dataset you need: for example, to read just the population data for specific countries from the entire UN World Population dataset.

DataMarket Blog: Using DataMarket from within R

October 31, 2011

Statement of Disbursements

Filed under: Government Data — Patrick Durusau @ 7:32 pm

Statement of Disbursements – United States House of Representatives.

From the Introduction:

The Statement of Disbursements (SOD) is a quarterly public report of all receipts and expenditures for U.S. House of Representatives Members, Committees, Leadership, Officers and Offices. The House has been required by law to publish the SOD since 1964.

The Chief Administrative Officer of the House publishes the SOD within 60 days of the end of each calendar year quarter (January–March, April–June, July–September and October–December).

Since 2009 the SOD has been published online to increase governmental transparency and accountability.

As a result of a new House financial system, all SODs from the fourth quarter of 2010 onward will display new transaction codes while maintaining the same data transparency as before. These codes (AP for Accounts Payable; AR for Accounts Receivable and GL for General Ledger) will replace all previously used SOD transaction codes.

Later in the Introduction it is noted:

Because of the procedural complexity inherent in balancing hundreds of Congressional budgets, the SOD document is not easy to read.

Well, that’s certainly true.

What would you need to make this a meaningful document?

Who are these salaried employees and how are they related to campaign contributors?

We pay for phones and cellphones, so where are the calling records to and from those phones?

Where are the calendars of the House members so expenses for meetings with particular lobbyists or others can be matched to the expense record?

As it is, we have a document that shows bakery expenses and cab fares mixed in with larger and smaller expenses.

Still, I suppose it is better than nothing at all. But only just.

What other public data would you match up with these expense documents to uncover patterns of behavior?

Accessing Chicago, Cook and Illinois Open Data via PHP

Filed under: Government Data — Patrick Durusau @ 7:31 pm

Accessing Chicago, Cook and Illinois Open Data via PHP by Paul Weinstein.

From the post:

In Accessing the CTA’s API with PHP I outlined a class file I created [footnote omitted] for querying the Chicago Transit Authority’s three web-based application programming interfaces (APIs). However, that isn’t the only open data development I’ve been working on recently, I’ve also been working on a class file for accessing the City of Chicago’s Open Data Portal.

The City of Chicago’s Open Data Portal is driven by a web application developed by Socrata. Socrata’s platform provides a number of web-based API methods for retrieving published datasets [footnote omitted].

class.windy.php provides the definition for a PHP object that wraps around Socrata’s API providing PHP written apps access in turn to the city’s open data.

The general trend seems to be more access to more government data and I suspect the 2012 elections will only speed up that trend.

The question is: How valuable is the government data that is posted? Or to put it more bluntly: Is the data that is released simply noise to cover up more interesting data?

How would you use a topic map to try to answer that question? Can you think of a way to use topic maps to uncover “missing” associations?
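For those who prefer another language to PHP, the Socrata API is plain HTTP returning JSON, so a minimal sketch in Python works just as well; the dataset id below is a placeholder for one of the portal’s 4x4 resource codes:

    import json
    import urllib.request

    # SODA-style resource endpoint; replace xxxx-xxxx with a real dataset id.
    url = "http://data.cityofchicago.org/resource/xxxx-xxxx.json?$limit=5"
    with urllib.request.urlopen(url) as resp:
        rows = json.load(resp)

    for row in rows:
        print(row)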

October 27, 2011

Data.gov

Filed under: Data,Data Mining,Government Data — Patrick Durusau @ 4:46 pm

Data.gov

A truly remarkable range of resources from the U.S. Federal Government, that is made all the more interesting by Data.gov Next Generation:

Data.gov starts an exciting new chapter in its evolution to make government data more accessible and usable than ever before. The data catalog website that broke new ground just two years ago is once again redefining the Open Data experience. Learn more about Data.gov’s transformation into a cloud-based Open Data platform for citizens, developers and government agencies in this 4-minute introductory video.

Developers should take a look at: http://dev.socrata.com/.

October 17, 2011

CENDI: Federal STI Managers Group

Filed under: Government Data,Information Retrieval,Librarian/Expert Searchers,Library — Patrick Durusau @ 6:44 pm

CENDI: Federal STI Managers Group

From the webpage:

Welcome to the CENDI web site

CENDI’s vision is to provide its member federal STI agencies a cooperative enterprise where capabilities are shared and challenges are faced together so that the sum of accomplishments is greater than each individual agency can achieve on its own.

CENDI’s mission is to help improve the productivity of federal science- and technology-based programs through effective scientific, technical, and related information-support systems. In fulfilling its mission, CENDI agencies play an important role in addressing science- and technology-based national priorities and strengthening U.S. competitiveness.

CENDI is an interagency working group of senior scientific and technical information (STI) managers from 14 U.S. federal agencies:

  • Defense Technical Information Center (Department of Defense)
  • Office of Research and Development & Office of Environmental Information (Environmental Protection Agency)
  • Government Printing Office
  • Library of Congress
  • NASA Scientific and Technical Information Program
  • National Agricultural Library (Department of Agriculture)
  • National Archives and Records Administration
  • National Library of Education (Department of Education)
  • National Library of Medicine (Department of Health and Human Services)
  • National Science Foundation
  • National Technical Information Service (Department of Commerce)
  • National Transportation Library (Department of Transportation)
  • Office of Scientific and Technical Information (Department of Energy)
  • USGS/Core Science Systems (Department of Interior)

These programs represent over 97% of the federal research and development budget.

The CENDI web site is hosted by the Defense Technical Information Center (DTIC), and is maintained by the CENDI secretariat. (emphasis added)

Yeah, I thought the 97% figure would catch your attention. 😉 Not sure how it compares with spending on IT and information systems in law enforcement and the spook agencies.

Topic Maps Class Project: Select one of the fourteen members and prepare a report for the class on their primary web interface. What did you like/dislike about the interface? How would you integrate the information you found there with your “home” library site (for students already employed elsewhere) or with the GSLIS site?

BTW, I think you will find that these agencies and their personnel have been thinking deeply about information integration for decades. It is an extremely difficult problem that has no fixed or easy solution.
