Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

October 17, 2012

FEMA Acronyms, Abbreviations and Terms

Filed under: Government,Government Data,Vocabularies — Patrick Durusau @ 9:00 am

FEMA Acronyms, Abbreviations and Terms (PDF)

From the webpage:

The FAAT List is a handy reference for the myriad of acronyms and abbreviations used within the federal government, emergency management and first response communities. This year’s new edition, which continues to reflect the evolving U.S. Department of Homeland Security, contains an approximately 50 percent increase in the number of entries and definitions bringing the total to over 6,200 acronyms, abbreviations and terms. Some items listed are obsolete, but they are included because they may still appear in publications and correspondence. Obsolete items can be found at the end of this document.

This may be handy for reading FEMA or related government documents.

Hasn’t been updated since 2009.

If you know of a more recent resource, please give a shout.

October 12, 2012

Leon Panetta Plays Chicken Little

Filed under: Government,Government Data,Security,Telecommunications — Patrick Durusau @ 4:53 pm

If you haven’t seen DOD: Hackers Breached U.S. Critical Infrastructure Control Systems, or similar coverage of Leon Panetta’s portrayal of Chicken Little (aka “Henny Penny”), you may find this interesting.

The InformationWeek Government article says:

Warning of more destructive attacks that could cause loss of life if successful, Panetta urged Congress to pass comprehensive legislation in the vein of the Cybersecurity Act of 2012, a bill co-sponsored by Sens. Joe Lieberman, I-Conn., Susan Collins, R-Maine, Jay Rockefeller, D-W.Va., and Dianne Feinstein, D-Calif., that failed to pass in its first attempt earlier this year by losing a cloture vote in the Senate.

“Congress must act and it must act now,” he said. “This bill is victim to legislative and political gridlock like so much else in Washington. That frankly is unacceptable and it should be unacceptable not just to me, but to you and to anyone concerned with safeguarding our national security.”

Specifically, Panetta called for legislation that would make it easier for companies to share “specific threat information without the prospect of lawsuits” but while still respecting civil liberties. He also said that there must be “baseline standards” co-developed by the public and private sector to ensure the cybersecurity of critical infrastructure IT systems. The Cybersecurity Act of 2012 contained provisions that would arguably fit the bill on both of those accounts.

While Panetta said that “there is no substitute” for legislation, he noted that the Obama administration has been working on an executive order on cybersecurity as an end-around on Congress. “We need to move as far as we can” even in the face of Congressional inaction, he said. “We have no choice because the threat that we face is already here.”

I particularly liked the lines:

“…That frankly is unacceptable and it should be unacceptable not just to me, but to you and to anyone concerned with safeguarding our national security.”

“We have no choice because the threat that we face is already here.”

Leon is old enough to remember (too old perhaps?) the Cold War when we had the Russians, the Chinese and others to defend ourselves against. Without the Cybersecurity Act of 2012.

Oh, you don’t know what the Cybersecurity Act of 2012 says do you?

The part Leon is lusting after would make private entities exempt from:

[Sec 701]….chapter 119, 121, or 206 of title 18, United States Code, the Foreign Intelligence Surveillance Act of 1978 (50 U.S.C. 1801 et seq.), and the Communications Act of 1934 (47 U.S.C. 151 et seq.), ..

I’m sorry, that still doesn’t help does it?

Try this:

[Title 18, United States Code] CHAPTER 119—WIRE AND ELECTRONIC COMMUNICATIONS INTERCEPTION AND INTERCEPTION OF ORAL COMMUNICATIONS (§§ 2510–2522)

[Title 18, United States Code] CHAPTER 121—STORED WIRE AND ELECTRONIC COMMUNICATIONS AND TRANSACTIONAL RECORDS ACCESS (§§ 2701–2712)

[Title 18, United States Code] CHAPTER 206—PEN REGISTERS AND TRAP AND TRACE DEVICES (§§ 3121–3127)

[Title 47, United States Code, start here and following] CHAPTER 5—WIRE OR RADIO COMMUNICATION (§§ 151–621)

[Title 50, United States Code, start here and following] CHAPTER 36—FOREIGN INTELLIGENCE SURVEILLANCE (§§ 1801–1885c)

Just reading the section titles should give you the idea:

The Cybersecurity Act of 2012 exempts all private entities from criminal and civil penalties for monitoring, capturing and reporting any communication by anyone. Well, except for whatever the government is doing, that stays secret.

During the Cold War, facing nuclear armageddon, we had the FBI, CIA and others, subject to the laws you read above, to protect us from our enemies. And we did just fine.

Now we are facing a group of ragamuffins and Leon wants to re-invent the Stasi. Put us all to spying and reporting on each other. Free of civil and criminal liability.

A topic map could connect half-truths, lies and the bed wetters who support this sort of legislation together. (They aren’t going to go away.)

Interested?

PS: A personal note for Leon Panetta:

Leon, before you repeat any more idle latrine gossip, talk to some of the more competent career security people at the Pentagon. They will tell you about things like the separation of secure from unsecured networks, the ban on recordable magnetic media (including Lady Gaga CDs) on secure networks, and a host of other routine security measures already in place.

Computer security didn’t just become an issue since 9/11. Every sane installation has been aware of computer security issues for decades.

Two kinds of people are frantic about computer security now:

  1. Decision makers who don’t understand computer security.
  2. People who want to sell the government computer security services.

Our military computer experts can fashion plans within the constitution and legal system to deal with what is a routine security issue.

You just have to ask them.

October 9, 2012

Code for America: open data and hacking the government

Filed under: Government,Government Data,Open Data,Open Government,Splunk — Patrick Durusau @ 12:50 pm

Code for America: open data and hacking the government by Rachel Perkins.

From the post:

Last week, I attended the Code for America Summit here in San Francisco. I attended as a representative of Splunk>4Good (we sponsored the event via a nice outdoor patio lounge area and gave away some of our (in)famous tshirts and a few ponies). Since this wasn’t your typical “conference”, and I’m not so great at schmoozing, i was a little nervous–what would Christy Wilson, Clint Sharp, and I do there? As it turned out, there were so many amazing takeaways and so much potential for awesomeness that my nervousness was totally unfounded.

So what is Code for America?

Code for America is a program that sends technologists (who take a year off and apply to their Fellowship program) to cities throughout the US to work with advocates in city government. When they arrive, they spend a few weeks touring the city and its outskirts, meeting residents, getting to know the area and its issues, and brainstorming about how the city can harness its public data to improve things. Then they begin to hack.
Some of these partnerships have come up with amazing tools–for example,

  • Opencounter Santa Cruz mashes up several public datasets to provide tactical and strategic information for persons looking to start a small business: what forms and permits you’ll need, zoning maps with overlays of information about other businesses in the area, and then partners with http://codeforamerica.github.com/sitemybiz/ to help you find commercial space for rent that matches your zoning requirements.
  • Another Code for America Fellow created blightstatus.org, which uses public data in New Orleans to inform residents about the status and plans for blighted properties in their area.
  • Other apps from other cities do cool things like help city maintenance workers prioritize repairs of broken streetlights based on other public data like crime reports in the area, time of day the light was broken, and number of other broken lights in the vicinity, or get the citizenry involved with civic data, government, and each other by setting up a Stack Exchange type of site to ask and answer common questions.

Whatever your view of data sharing by the government (too little, too much, just right), Rachel points to good things that can come from open data.

Splunk has a “corporate responsibility” program: Splunk>4Good.

Check it out!

BTW, do you have a topic maps “corporate responsibility” program?

October 7, 2012

New Congressional Data Available for Free Bulk Download: Bill Data 1973- , Members 1789-

Filed under: Government,Government Data,Law - Sources,Legal Informatics — Patrick Durusau @ 4:28 am

New Congressional Data Available for Free Bulk Download: Bill Data 1973- , Members 1789-

News via Legal Informatics.

Of interest if you like U.S. history and/or recent events.

What other data would you combine with the data you find here?

October 6, 2012

Federal Government Big Data Potential and Realities

Filed under: BigData,Government,Government Data — Patrick Durusau @ 3:01 pm

Federal Government Big Data Potential and Realities (Information Week)

From the post:

Big data has enormous potential in the government sector, though little in the way of uptake and strategy at this point, according to a new report from tech industry advocacy non-profit TechAmerica Foundation.

Leaders of TechAmerica’s Federal Big Data Commission on Wednesday unveiled “Demystifying Big Data: A Practical Guide to Transforming the Business of Government.” The 39-page report provides big data basics like definitions and IT options, as well as potentials for deeper data value and government policy talks. Rife in strategy and pointers more than hard numbers on the impact of existing government data initiatives, the report pointed to big data’s “potential to transform government and society itself” by way of cues from successful data-driven private sector enterprises.

“Unfortunately, in the federal government, daily practice frequently undermines official policies that encourage sharing of information both within and among agencies and with citizens. Furthermore, decision-making by leaders in Congress and the Administration often is accomplished without the benefit of key information and without using the power of Big Data to model possible futures, make predictions, and fundamentally connect the ever increasing myriad of dots and data available,” the report’s authors wrote.

…(a while later)

The report recommended a five-step path to moving ahead with big data initiatives:

  1. Define the big data business opportunity.
  2. Assess existing and needed technical capabilities.
  3. Select a deployment pattern based on the velocity, volume and variety of data involved.
  4. Deploy the program “with an eye toward flexibility and expansion.”
  5. Review program against ROI, government policy and user needs.

Demystifying Big Data: A Practical Guide to Transforming the Business of Government (report, PDF)

TechAmerica Foundation Big Data Commission (homepage)

The report is well worth your time but I would be cautious about the assumption that all data problems are “big data” problems.

My pre-big data strategy steps would be:

  1. Define the agency mission.
  2. Define the tasks necessary to accomplish #1.
  3. Define the role of data processing, any data processing, in meeting the tasks specified in #2.
  4. Evaluate the relevance of “big data” to the data processing defined in #3. (this is the equivalent of #1 in the commission report)

Unspecified notions about an agency’s mission, tasks to accomplish it, relevance of data processing to those tasks and finally, the relevance of “big data,” will result in disappointing and dysfunctional “Big Data” projects.

“Big data,” its potential, the needs of government, and its citizens, however urgent, are not reasons to abandon traditional precepts of project management.

Deciding on a solution, read “big data techniques,” before you understand and agree upon the problem to be solved, is a classic mistake.

Let’s not make it, again.

Follow The Data – FEC Campaign Data Challenge

Filed under: Cypher,FEC,Government,Government Data,Graphs,Neo4j — Patrick Durusau @ 5:53 am

Follow The Data – FEC Campaign Data Challenge by Andreas Kollegger.

Take the challenge and you may win a pass to Graph Connect, November 5 & 6 in San Francisco. (Closes 11 October 2012.)

In politics, people are often advised to “follow the money” to understand the forces influencing decisions. As engineers, we know we can do that and more by following the data.

Inspired by some innovative work by Dave Fauth, a Washington DC data analyst, we arranged a workshop to use FEC Campaign data that had been imported into Neo4j.

….

With the data imported, and a basic understanding of the domain model, we then challenged people to write Cypher queries to answer the following questions:

  1. All presidential candidates for 2012
  2. Most mythical presidential candidate
  3. Top 10 Presidential candidates according to number of campaign committees
  4. Find President Barack Obama
  5. Lookup Obama by his candidate ID
  6. Find Presidential Candidate Mitt Romney
  7. Lookup Romney by his candidate ID
  8. Find the shortest path of funding between Obama and Romney
  9. List the 10 top individual contributions to Obama
  10. List the 10 top individual contributions to Romney

Pointers to the data and hints await in Andreas’ post. A rough sketch of what one of those queries might look like in practice follows.
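Not part of the challenge itself, but here is a minimal Python sketch of question 1 run against a local Neo4j server over its REST Cypher endpoint. The index name ("candidates"), property names and property values are assumptions about how the workshop imported the FEC data, so treat it as an outline of the shape of an answer, not a solution.

```python
import json
import requests

CYPHER_URL = "http://localhost:7474/db/data/cypher"  # default local Neo4j REST endpoint

# Legacy (pre-2.0) Cypher: pull candidates out of a node index and filter.
# The index name and the office/election_year/name properties are guesses
# at the workshop's data model.
query = """
START c = node:candidates('*:*')
WHERE c.office = 'P' AND c.election_year = 2012
RETURN c.name
ORDER BY c.name
"""

resp = requests.post(
    CYPHER_URL,
    data=json.dumps({"query": query, "params": {}}),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()

# The endpoint answers with {"columns": [...], "data": [[value, ...], ...]}
for row in resp.json()["data"]:
    print(row[0])
```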

September 23, 2012

Congress.gov: New Official Source of U.S. Federal Legislative Information

Filed under: Government,Government Data,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 7:50 pm

Congress.gov: New Official Source of U.S. Federal Legislative Information

Legal Informatics has gathered up links to a number of reviews/comments on the new legislative interface for the U.S. federal government.

You can see the beta version at: Congress.gov.

Personally I like search and popularity being front and center, but that makes me wonder what isn’t available. Like bulk downloads in some reasonable format (can you say XML?).

What do you think about the interface?

September 17, 2012

U.S. Sequestration Report – Out of the Shadows/Into the Light?

Filed under: Government,Government Data,Topic Maps — Patrick Durusau @ 10:25 am

Due to personalities, pettiness and partisan politics too boring to recount, the U.S. budget is about to be automatically cut (sequestered). To accomplish that goal, the OMB Report Pursuant to the Sequestration Transparency Act of 2012 (P. L. 112–155) has been released. I first saw this report in Obama releases sequestration report by Amber Corrin (Federal Computer Weekly).

Can it be that U.S. government spending has stepped out of the shadows and into the light?

The report identifies specific programs and the proposed cuts to each one.

As you can imagine, howls of “dire consequences” are issuing from agencies, grantees, elected officials and of course, government staff.

Some of which are probably true. Some of them.

Does the sequestration report give us an opportunity to determine which claims of “dire consequences” are true and which ones are false?

Let’s take an easy one:

001-05-0127 Sergeant at Arms and Doorkeeper of the Senate

Present: $131 million. Cut: $11 million. Remaining: $120 million.

Can you name (identify) a specific “dire consequence” of reducing the “Sergeant at Arms and Doorkeeper of the Senate” budget by $11 million?

Want to represent the public interest? Ask your elected representatives to say what “dire consequences” they see from specific items in the sequestration report.

Do not accept hand waving generalities of purported “dire consequences.”

To qualify as a possible “dire consequence,” it should at least be identified by name. Such as: “Cut X means we can’t run metal scanners at government buildings.” Or “Cut 001-05-0130 means we can’t afford refreshments for Senate offices. (Yes, it’s really in there.)”

That would enable a meaningful debate over “dire consequences.”

Part of that debate should be around who claims “dire consequences” and what “dire consequences” are being claimed.

Can you capture that without using a topic map?

September 10, 2012

Sunlight Academy (Finding US Government Data)

Filed under: Government,Government Data,Law,Law - Sources — Patrick Durusau @ 4:05 pm

Sunlight Academy

From the website:

Welcome to Sunlight Academy, a collection of interactive tutorials for journalists, activists, researchers and students to learn about tools by the Sunlight Foundation and others to unlock government data.

Be sure to create a profile to access our curriculum, track your progress, watch videos, complete training activities and get updates on new tutorials and tools.

Whether you are an investigative journalist trying to get insight on a complex data set, an activist uncovering the hidden influence behind your issue, or a congressional staffer in need of mastering legislative data, Sunlight Academy guides you through how to make our tools work for you. Let’s get started!

The Sunlight Foundation has created tools to make government data more accessible.

Unlike some governments and software projects, the Sunlight Foundation business model isn’t based on poor or non-existent documentation.

Modules (as of 2012 September 10):

  • Tracking Government
    • Scout: Scout is a legislative and governmental tracking tool from the Sunlight Foundation that alerts you when Congress or your state capitol talks about or takes action on issues you care about. Learn how to search and create alerts on federal and state legislation, regulations and the Congressional Record.
    • Scout (Webinar): Recorded webinar and demo of Scout from July 26, 2012. The session covered basic skills such as search terms and bill queries, as well as advanced functions such as tagging, merging outside RSS feeds and creating curated search collections.
  • Unlocking Data
    • Political Ad Sleuth: Frustrated by political ads inundating your TV? Learn how you can discover who is funding these ads from the public files at your local television station through this tutorial.
    • Unlocking APIs: What are APIs and how do they deliver government data? This tutorial provides an introduction to using APIs and highlights what Sunlight’s APIs have to offer on legislative and congressional data.
  • Lobbying
    • Lobbying Contribution Reports: These reports highlight the millions of dollars that lobbying entities spend every year giving to charities in honor of lawmakers and executive branch officials, technically referred to as “honorary fees.” Find out how to investigate lobbying contribution reports, understand the rules behind them and see what you can do with the findings.
    • Lobbying Registration Tracker: Learn about the Lobbying Registration Tracker, a Sunlight Foundation tool that allows you to track new registrations for federal lobbyists and lobbying firms. This database allows users to view registrations as they’re submitted, browse by issue, registrant or client, and see the trends in issues and registrations over the last 12 months.
    • Lobbying Report Form: Four times a year, groups that lobby Congress and the federal government file reports on their activities. Unlock the important information contained in the quarterly lobbying reports to keep track of who’s influencing whom in Washington. Learn tips on how to read the reports and how they can inform your reporting.
  • Data Analysis
    • Data Visualizations in Google Docs: While Google is often used for internet searches and maps, it can also help with data visualizations via Google Charts. Learn how to use Google Docs to generate interactive charts in this training.
    • Mapping Campaign Finance Data: Campaign finance data can be complex and confusing — for reporters and for readers. But it doesn’t have to be. One way to make sense of it all is through mapping. Learn how to turn campaign finance information into beautiful maps, all through free tools.
    • Pivot Tables: Pivot tables are powerful tools, but it’s not always obvious how to use them. Learn how to create and use pivot tables in Excel to aggregate and summarize data that otherwise would require a database.
  • Research Tools
    • Advanced Google Searches: Google has made search easy and effective, but that doesn’t mean it can’t be better. Learn how to effectively use Google’s Advanced Search operators so you can get what you’re looking for without wasting time on irrelevant results.
    • Follow the Unlimited Money (webinar): Recorded webinar from August 8, 2012. This webinar covered tools to follow the millions of dollars being spent this election year by super PACs and other outside groups.
    • Learning about Data.gov: Data.gov seeks to organize all of the U.S. government’s data, a daunting and unfinished task. In this module, learn about the powers and limitations of Data.gov, and what other resources to use to fill in Data.gov’s gaps.

Researching Current Federal Legislation and Regulations:…

Filed under: Government,Government Data,Law,Law - Sources,Legal Informatics — Patrick Durusau @ 3:30 pm

Researching Current Federal Legislation and Regulations: A Guide to Resources for Congressional Staff

Description quoted at Full Text Reports:

This report is designed to introduce congressional staff to selected governmental and nongovernmental sources that are useful in tracking and obtaining information on federal legislation and regulations. It includes governmental sources such as the Legislative Information System (LIS), THOMAS, the Government Printing Office’s Federal Digital System (FDsys), and U.S. Senate and House websites. Nongovernmental or commercial sources include resources such as HeinOnline and the Congressional Quarterly (CQ) websites. It also highlights classes offered by the Congressional Research Service (CRS) and the Library of Congress Law Library.

This report will be updated as new information is available.

Direct link to PDF: Researching Current Federal Legislation and Regulations: A Guide to Resources for Congressional Staff

A very useful starting point for research on U.S. federal legislation and regulations, but only a starting point.

Each listed resource merits a user’s guide. And no two of them are exactly the same.

Suggestions for research/topic map exercises based on this listing of resources?

September 4, 2012

Getting data on your government

Filed under: Data Mining,Government Data,R — Patrick Durusau @ 6:52 pm

Getting data on your government

From the post:

I created an R package a while back to interact with some APIs that serve up data on what our elected representatives are up to, including the New York Times Congress API, and the Sunlight Labs API.

What kinds of things can you do with govdat? Here are a few examples.

How do the two major parties differ in the use of certain words (searches the congressional record using the Sunlight Labs Capitol Words API)?

[text and code omitted]

Let’s get some data on donations to individual elected representatives.

[text and code omitted]

Or we may want to get a bio of a congressperson. Here we get Todd Akin of MO. And some twitter searching too? Indeed.

[text and code omitted]

I waver between thinking mining government data is a good thing and being reminded the government did voluntarily release it. In the latter case, it may be nothing more than a distraction.

August 4, 2012

FBI’s Sentinel Project: 5 Lessons Learned[?]

Filed under: Government,Government Data,Knowledge Management,Project Management — Patrick Durusau @ 1:57 pm

FBI’s Sentinel Project: 5 Lessons Learned [?] by John Foley.

John writes of lessons learned from the Sentinel Project, which replaces the $170 million disaster that was the Virtual Case File system.

Lessons you need to avoid applying to your information management projects, whether you use topic maps or not.

2. Agile development gets things done. The next big shift in strategy was Fulgham’s decision in September 2010 to wrest control of the project from prime contractor Lockheed Martin and use agile development to accelerate software deliverables. The thinking was that a hands-on, incremental approach would be faster because functionality would be developed, and adjustments made, in two-week “sprints.” The FBI missed its target date for finishing that work–September 2011–but it credits the agile methodology with ultimately getting the job done.

Missing a completion date by ten (10) months does not count as a success for most projects. Moreover, note how they define “success”:

this week’s announcement that Sentinel, as of July 1, became available to all FBI employees is a major achievement.

Available to all FBI employees? I would think use by all FBI employees would be the measure of success. Yes?

Can you think of a success measure other than use by employees?

3. Commercial software plays an important role. Sentinel is based in part on commercial software, a fact that’s often overlooked because of all the custom coding and systems integration involved. Under the hood are EMC’s Documentum document management software, Oracle databases, IBM’s WebSphere middleware, Microsoft’s SharePoint, and Entrust’s PKI technology. Critics who say that Sentinel would have gone more smoothly if only it had been based on off-the-shelf software seem unaware that, in fact, it is.

Commercial software? Sounds like a software Frankenstein to me. I wonder if they simply bought software based on the political clout of the vendors and then wired it together? That’s what it sounds like. Do you have access to the system documentation? That could prove to be an interesting read.

I can imagine legacy systems wired together with these components but if you are building a clean system, why the cut-n-paste from different vendors?

4. Agile development is cheaper, too. Sentinel came in under its $451 million budget. The caveat is that the FBI’s original cost estimate for Sentinel was $425 million, but that was before Fulgham and Johnson took over, and they stayed within the budget they were given. The Inspector General might quibble with how the FBI accounts for the total project cost, having pointed out in the past that its tally didn’t reflect the agency’s staff costs. But the FBI wasn’t forced to go to Congress with its hand out. Agile development wasn’t only faster, but also cheaper.

Right, let’s simply lie to the prospective client about the true cost of development for a project. Their staff, who already have full time duties, can just tough it out and give us the review/feedback that we need to build a working system. Right.

This is true for IT projects in general but topic map projects in particular. Clients will have to resource the project properly from the beginning, not just with your time but with the time of their staff and subject matter experts.

A good topic map, read a useful topic map, is going to reflect contributions from the client’s staff. You need to make the case to decision makers that the staff contributions are just as important as their present day to day tasks.

BTW, if agile development were oh so useful, people would be using it. Like C, Java, C++.

Do you see marketing pieces for C, Java, C++?

Successful approaches/languages are used, not advertised.

July 31, 2012

Political Moneyball

Filed under: Government Data,Networks,Politics — Patrick Durusau @ 1:51 pm

Nathan Yau points out the Wall Street Journal’s “Political Moneyball” visualization in Network of political contributions.

You will probably benefit from starting with Nathan’s comments and then navigating the WSJ visualization.

I like the honesty of the Wall Street Journal. They have chosen a side and yet see humor in its excesses.

Nathan mentions the difficulty with unfamiliar names and organizations.

An example of where topic maps could enable knowledgeable users to gather information together for the benefit of subsequent, less knowledgeable users of the map.

Creating the potential for a collaborative, evolutionary information resource that improves with usage.

July 30, 2012

U.S. Census Bureau Offers Public API for Data Apps

Filed under: Census Data,Government Data — Patrick Durusau @ 3:37 pm

U.S. Census Bureau Offers Public API for Data Apps by Nick Kolakowski.

From the post:

For any software developers with an urge to play around with demographic or socio-economic data: the U.S. Census Bureau has launched an API for Web and mobile apps that can slice that statistical information in all sorts of nifty ways.

The API draws data from two sets: the 2010 Census (statistics include population, age, sex, and race) and the 2006-2010 American Community Survey (offers information on education, income, occupation, commuting, and more). In theory, developers could use those datasets to analyze housing prices for a particular neighborhood, or gain insights into a city’s employment cycles.

The APIs include no information that could identify an individual. (emphasis added)

I suppose it should say: “Some assembly required.”
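By way of illustrating the “assembly,” here is a minimal Python sketch that asks the API for total population by state from the 2010 Census. The endpoint path and the variable code (P0010001 for total population in Summary File 1) are my recollection of the API documentation, and the key is a placeholder you obtain by registering with the Census Bureau, so verify both against the current developer pages.

```python
import requests

API_KEY = "YOUR_CENSUS_API_KEY"   # placeholder; request one from the Census Bureau
URL = "http://api.census.gov/data/2010/sf1"   # 2010 Census Summary File 1 (assumed path)

params = {
    "key": API_KEY,
    "get": "P0010001,NAME",   # total population plus a human-readable name
    "for": "state:*",         # one row per state
}

rows = requests.get(URL, params=params).json()
header, data = rows[0], rows[1:]   # the first row is the column header

for population, name, state_fips in data:
    print("%s (FIPS %s): %s residents" % (name, state_fips, population))
```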

Similar resources at Data.gov and Google Public Data Explorer.

I first saw this at: Dashboard Insight.

July 16, 2012

Processing Public Data with R

Filed under: Environment,Government Data,R — Patrick Durusau @ 4:30 pm

Processing Public Data with R

From the post:

I use R aplenty in analysis and thought it might be worthwhile for some to see the typical process a relative newcomer goes through in extracting and analyzing public datasets

In this instance I happen to be looking at Canadian air pollution statistics.

The data I am interested in is available on the Ontario Ministry of Environment’s website. I have downloaded the hourly ozone readings from two weather stations (Grand Bend and Toronto West) for two years (2000 and 2011) which are available in several formats , including my preference, csv. According to the 2010 annual report from the Ministry, the two selected represent the extremes in readings for that year

I firstly set the directory in which the code and the associated datafiles will reside and import the data. I would normally load any R packages I will utilize at the head of the script (if not already in my start up file) but will hold off here until they are put to use.

I had to do a small amount of row deletion in the csv files so that only the readings data was included

A useful look at using R to manipulate public data.
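For comparison, and not as a substitute for the post’s R workflow, here is roughly the same first step in Python with pandas: read the downloaded hourly ozone files, skip the descriptive preamble the author deleted by hand, and stack the stations into one table. The file names and the number of preamble rows are invented for the example.

```python
import pandas as pd

# Hypothetical file names for the two downloaded station files
files = {
    "Grand Bend": "grand_bend_ozone.csv",
    "Toronto West": "toronto_west_ozone.csv",
}

frames = []
for station, path in files.items():
    df = pd.read_csv(path, skiprows=9)   # skip the descriptive preamble rows
    df["station"] = station              # remember where each row came from
    frames.append(df)

ozone = pd.concat(frames, ignore_index=True)

# Quick summary of the hourly readings per station
print(ozone.groupby("station").describe())
```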

Do you know of any articles on using R to output topic maps?

July 11, 2012

Importing public data with SAS instructions into R

Filed under: Data,Government Data,Parsing,Public Data,R — Patrick Durusau @ 2:28 pm

Importing public data with SAS instructions into R by David Smith.

From the post:

Many public agencies release data in a fixed-format ASCII (FWF) format. But with the data all packed together without separators, you need a “data dictionary” defining the column widths (and metadata about the variables) to make sense of them. Unfortunately, many agencies make such information available only as a SAS script, with the column information embedded in a PROC IMPORT statement.

David reports on the SAScii package from Anthony Damico.

You still have to parse the files but it gets you one step closer to having useful information.
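The SAScii package does this directly from the SAS script; purely as an illustration of what the data dictionary buys you, here is a Python sketch with pandas, where the column widths, names and file name come from an imaginary SAS INPUT statement rather than any real agency file.

```python
import pandas as pd

# Column layout transcribed (hypothetically) from a SAS INPUT statement:
#   state $ 1-2   county $ 3-5   year 6-9   cases 10-15
widths = [2, 3, 4, 6]
names = ["state", "county", "year", "cases"]

df = pd.read_fwf(
    "agency_microdata.dat",                # hypothetical fixed-width file
    widths=widths,
    names=names,
    dtype={"state": str, "county": str},   # keep FIPS-style codes as text
)

print(df.head())
```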

July 10, 2012

What is Linked Data

Filed under: Government Data,Linked Data,LOD — Patrick Durusau @ 10:37 am

What is Linked Data by John Goodwin.

From the post:

In the early 1990s there began to emerge a new way of using the internet to link documents together. It was called the World Wide Web. What the Web did that was fundamentally new was that it enabled people to publish documents on the internet and link them such that you could navigate from one document to another.

Part of Sir Tim Berners-Lee’s original vision of the Web was that it should also be used to publish, share and link data. This aspect of Sir Tim’s original vision has gained a lot of momentum over the last few years and has seen the emergence of the Linked Data Web.

The Linked Data Web is not just about connecting datasets, but about linking information at the level of a single statement or fact. The idea behind the Linked Data Web is to use URIs (these are like the URLs you type into your browser when going to a particular website) to identify resources such as people, places and organisations, and to then use web technology to provide some meaningful and useful information when these URIs are looked up. This ‘useful information’ can potentially be returned in a number of different encodings or formats, but the standard way for the linked data web is to use something called RDF (Resource Description Framework).

An introductory overview of the rise and use of linked data.

John is involved in efforts at data.gov.uk to provide open access to governmental data and one form of that delivery will be linked data.

You will be encountering linked data, both as a current and legacy format so it is worth your time to learn it now.
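As a small, concrete illustration of “look up the URI, get useful information back,” here is a sketch using the rdflib Python library (my choice, not something from John’s post). It assumes the server behind the URI, DBpedia in this case, content-negotiates an RDF serialization that rdflib can parse.

```python
from rdflib import Graph

# A URI that identifies a thing (the city of Berlin), not a web page about it
uri = "http://dbpedia.org/resource/Berlin"

g = Graph()
g.parse(uri)   # dereference the URI and parse the RDF that comes back

print("%d statements retrieved" % len(g))

# RDF is just (subject, predicate, object) statements
for s, p, o in list(g)[:10]:
    print("%s %s %s" % (s, p, o))
```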

I first saw this at semanticweb.com.

June 24, 2012

Closing In On A Million Open Government Data Sets

Filed under: Dataset,Geographic Data,Government,Government Data,Open Data — Patrick Durusau @ 7:57 pm

Closing In On A Million Open Government Data Sets by Jennifer Zaino.

From the post:

A million data sets. That’s the number of government data sets out there on the web that we have closed in on.

“The question is, when you have that many, how do you search for them, find them, coordinate activity between governments, bring in NGOs,” says James A. Hendler, Tetherless World Senior Constellation Professor, Department of Computer Science and Cognitive Science Department at Rensselaer Polytechnic Institute, a principal investigator of its Linking Open Government Data project, and Internet web expert for data.gov. He also is connected with many other governments’ open data projects. “Semantic web tools organize and link the metadata about these things, making them searchable, explorable and extensible.”

To be more specific, Hendler at SemTech a couple of weeks ago said there are 851,000 open government data sets across 153 catalogues from 30-something countries, with the three biggest representatives, in terms of numbers, at the moment being the U.S., the U.K., and France. Last week, the one million threshold was crossed.

About 410,000 of these data sets are from the U.S. (federal, state, city, county, tribal included), including quite a large number of geo-data sets. The U.S. government’s goal is to put “lots and lots and lots of stuff out there” and let people figure out what they want to do with it, he notes.

My question about data that “..[is] searchable, explorable and extensible,” is whether anyone wants to search, explore or extend it.

Simply piling up data to say you have a large pile of data doesn’t sound very useful.

I would rather have a smaller pile of data that included contract/testing transparency on anti-terrorism IT projects, for example. If the systems aren’t working, then disclosing them isn’t going to make them work any less well.

Not that anyone need fear transparency or failure to perform. The TSA has failed to perform for more than a decade now, failed to catch a single terrorist and it remains funded. Even when it starts groping children, passengers are so frightened that even that outrage passes without serious opposition.

Still, it would be easier to get people excited about mining government data if the data weren’t so random or marginal.

June 2, 2012

A Competent CTO Can Say No

Filed under: Government,Government Data,Health care — Patrick Durusau @ 9:05 pm

Todd Park, CTO of the United States, should be saying no.

Todd has mandated six months for progress on:

  1. MyGov: Reimagine the relationship between the federal government and its citizens through an online footprint developed not just for the people, but also by the people.
  2. Open Data Initiatives: Stimulate a rising tide of innovation and entrepreneurship that utilizes government data to create tools that help Americans in numerous ways – e.g., apps and services that help people find the right health care provider, identify the college that provides the best value for their money, save money on electricity bills through smarter shopping, or keep their families safe by knowing which products have been recalled.
  3. Blue Button for America: Develop apps and create awareness of tools that help individuals get access to their personal health records — current medications and drug allergies, claims and treatment data, and lab reports – that can improve their health and healthcare.
  4. RFP-EZ: Build a platform that makes it easier for small high-growth businesses to navigate the federal government, and enables agencies to quickly source low-cost, high-impact information technology solutions.
  5. The 20% Campaign: Create a system that enables US government programs to seamlessly move from making cash payments to support foreign policy, development assistance, government operations or commercial activities to using electronic payments such as mobile devices, smart cards and other methods.

This is a classic “death march” pattern.

Having failed to make progress on any of these fronts in forty-two months, President Obama wants to mandate progress in six months.

Progress cannot be mandated and a competent CTO would say no. To the President and anyone who asks.

Progress is possible but only with proper scoping and requirements development.

Don’t further incompetence.

Take the pledge:

I refuse to apply for or if appointed to serve as a Presidential Innovation Fellow “…to deliver significant results in six months.” /s/ Patrick Durusau, Covington, Georgia, 2 June 2012.

(Details: US CTO seeks to scale agile thinking and open data across federal government)

TechAmerica Foundation Big Data Commission

Filed under: BigData,Government,Government Data — Patrick Durusau @ 9:04 pm

TechAmerica Foundation Big Data Commission

From the post:

Big Data Commission Launch

Data in the world is doubling every 18 months. Across government everyone is talking about the concept of Big Data, and how this new technology will transform the way Washington does business. But looking past the excitement, questions abound. What is Big Data, really? How is it defined? What capabilities are required to succeed? How do you use Big Data to make intelligent decisions? How will agencies effectively govern and secure huge volumes of information, while protecting privacy and civil liberties? And perhaps most importantly, what value will it really deliver to the US Government and the citizenry we serve?

To help answer these questions, and provide guidance to our Government’s senior policy and decision makers, TechAmerica is pleased to announce the formation of the Big Data Commission.

The Commission:

The Commission will be chaired by senior executives from IBM and SAP with vice chairs from Amazon and Wyle and will assemble 25-30 industry leaders, academia, along with a government advisory board with the objective of providing guidance on how Government Agencies should be leveraging Big Data to address their most critical business imperatives, and how Big Data can drive U.S. innovation and competitiveness.

Unlike Todd Park (soon to be former CTO of the United States) in A Competent CTO Can Say No, TechAmerica doesn’t promise “significant results” in six months.

Until the business imperatives of government agencies are understood, it isn’t possible for anyone, however well-intentioned or skilled, to give them useful advice.

Can’t say how well the commission will do at that task, to say nothing of determining what advice to give, but at least it isn’t starting with an arbitrary, election driven deadline.

New open data platform launches

Filed under: Government,Government Data,Graphics,Visualization — Patrick Durusau @ 6:25 pm

New open data platform launches

Kim Rees (of Flowing Data) writes:

Open data is everywhere. However, open data initiatives often manifest as mere CSV dumps on a forlorn web page. Junar, Lunfardo (Argentina slang) for “to know” or “to view,” seeks to help government and organizations take the guesswork out of developing their own software for such efforts.

If you are looking to explore options for making data available, this is worth a stop.

It won’t make you an expert at data visualization, any more than a copy of Excel™ will make you a business analyst. But having the right tools for a job never hurts.

May 23, 2012

White House launches new digital government strategy

Filed under: Government Data — Patrick Durusau @ 2:41 pm

White House launches new digital government strategy by Alex Howard.

From the post:

There’s a long history of people who have tried to transform the United States federal government through better use of information technology and data. It extends back to the early days of Alexander Hamilton’s ledgers of financial transaction, continues through information transmitted through telegraph, radio, telephone, and comes up to the introduction of the Internet, which has been driving dreams of better e-government for decades.

Vivek Kundra, the first U.S. chief information officer, and Aneesh Chopra, the nation’s first chief technology officer, were chosen by President Barack Obama to try to bring the federal government’s IT infrastructure and process into the 21st century, closing the IT gap that had opened between the private sector and public sector.

Today, President Obama issued a presidential memorandum on building a 21st century digital government.

In this memorandum, the president directs each major federal agency in the United States to make two key services that American citizens depend upon available on mobile devices within the next 12 months and to make “applicable” government information open and machine-readable by default. President Obama directed federal agencies to do two specific things: comply with the elements of the strategy by May 23, 2013 and to create a “/developer” page on every major federal agency’s website.

Thought you might find some good marketing quotes for your products or services in the article or the presidential memorandum.

I do have to wince when I read:

For far too long, the American people have been forced to navigate a labyrinth of information across different Government programs in order to find the services they need.

Obviously it has been a while since President Obama has called a tech support line. My experiences recently have been good but then also very few. There is probably a relationship there.

There is going to be a lot of IT churn if not actual change so dust off your various proposals and watch for agency calls for assistance.

Don’t forget to offer topic map based solutions for agencies that want to find data once and not time after time.

May 11, 2012

Read’em and Weep

Filed under: Government,Government Data,Intelligence — Patrick Durusau @ 2:14 pm

I read Progress Made and Challenges Remaining in Sharing Terrorism-Related Information today.

My summary: We are less than five years away from some unknown level of functioning for an Information Sharing Environment (ISE) that facilitates the sharing of terrorism-related information.

Less than 20 years after 9/11, we will have some capacity to share information that may enable the potential disruption of terrorist plots.

The patience of terrorists and their organizations is appreciated. (I added that part. The report doesn’t say that.)

The official summary.

A breakdown in information sharing was a major factor contributing to the failure to prevent the September 11, 2001, terrorist attacks. Since then, federal, state, and local governments have taken steps to improve sharing. This statement focuses on government efforts to (1) establish the Information Sharing Environment (ISE), a government-wide approach that facilitates the sharing of terrorism-related information; (2) support fusion centers, where states collaborate with federal agencies to improve sharing; (3) provide other support to state and local agencies to enhance sharing; and (4) strengthen use of the terrorist watchlist. GAO’s comments are based on products issued from September 2010 through July 2011 and selected updates in September 2011. For the updates, GAO reviewed reports on the status of Department of Homeland Security (DHS) efforts to support fusion centers, and interviewed DHS officials regarding these efforts. This statement also includes preliminary observations based on GAO’s ongoing watchlist work. For this work, GAO is analyzing the guidance used by agencies to nominate individuals to the watchlist and agency procedures for screening individuals against the list, and is interviewing relevant officials from law enforcement and intelligence agencies, among other things..

The government continues to make progress in sharing terrorism-related information among its many security partners, but does not yet have a fully-functioning ISE in place. In prior reports, GAO recommended that agencies take steps to develop an overall plan or roadmap to guide ISE implementation and establish measures to help gauge progress. These measures would help determine what information sharing capabilities have been accomplished and are left to develop, as well as what difference these capabilities have made to improve sharing and homeland security. Accomplishing these steps, as well as ensuring agencies have the necessary resources and leadership commitment, should help strengthen sharing and address issues GAO has identified that make information sharing a high-risk area. Federal agencies are helping fusion centers build analytical and operational capabilities, but have more work to complete to help these centers sustain their operations and measure their homeland security value. For example, DHS has provided resources, including personnel and grant funding, to develop a national network of centers. However, centers are concerned about their ability to sustain and expand their operations over the long term, negatively impacting their ability to function as part of the network. Federal agencies have provided guidance to centers and plan to conduct annual assessments of centers’ capabilities and develop performance metrics by the end of 2011 to determine centers’ value to the ISE. DHS and the Department of Justice are providing technical assistance and training to help centers develop privacy and civil liberties policies and protections, but continuous assessment and monitoring policy implementation will be important to help ensure the policies provide effective protections. In response to its mission to share information with state and local partners, DHS’s Office of Intelligence and Analysis (I&A) has taken steps to identify these partner’s information needs, develop related intelligence products, and obtain more feedback on its products. I&A also provides a number of services to its state and local partners that were generally well received by the state and local officials we contacted. However, I&A has not yet defined how it plans to meet its state and local mission by identifying and documenting the specific programs and activities that are most important for executing this mission. The office also has not developed performance measures that would allow I&A to demonstrate the expected outcomes and effectiveness of state and local programs and activities. In December 2010, GAO recommended that I&A address these issues, which could help it make resource decisions and provide accountability over its efforts. GAO’s preliminary observations indicate that federal agencies have made progress in implementing corrective actions to address problems in watchlist-related processes that were exposed by the December 25, 2009, attempted airline bombing. These actions are intended to address problems in the way agencies share and use information to nominate individuals to the watchlist, and use the list to prevent persons of concern from boarding planes to the United States or entering the country, among other things. These actions can also have impacts on agency resources and the public, such as traveler delays and other inconvenience. GAO plans to report the results of this work later this year. 
GAO is not making new recommendations, but has made recommendations in prior reports to federal agencies to enhance information sharing. The agencies generally agreed and are making progress, but full implementation of these recommendations is needed.

Full Report: Progress Made and Challenges Remaining in Sharing Terrorism-Related Information

Let me share with you the other GAO reports cited in this report:

Do you see semantic mapping opportunities in all those reports?

May 9, 2012

Data.gov launches developer community

Filed under: Dataset,Government Data — Patrick Durusau @ 2:15 pm

Data.gov launches developer community

Federal Computer Week reports:

Data.gov has launched a new community for software developers to share ideas, collaborate or compete on projects and request new datasets.

Developer.data.gov joins a growing list of communities and portals tapping into Data.gov’s datasets, including those for health, energy, education, law, oceans and the Semantic Web.

The developer site is set up to offer access to federal agency datasets, source code, applications and ongoing developer challenges, along with blogs and forums where developers can discuss projects and share ideas.

Source: FCW (http://s.tt/1azwt)

Depending upon your developer skills, this could be a good place to hone them.

Not to mention having a wealth of free data sets at hand.

April 24, 2012

Let’s Party Like It’s 1994!

Filed under: BigData,Government,Government Data — Patrick Durusau @ 7:13 pm

Just coincidence that I read Fast algorithms for mining association rules (in Mining Basket Data) the same day I read: Big Data Lessons for Big Government by Julie Ginches.

The points Julie pulls out from a study by DataXu could have easily been from 1994.

The dates and names have changed, the challenges have not.

  • Employees need new skills, new technologies, and new ways to combine information from multiple sources so they can make sense of all the data pouring in so they can add more value and be effective. This new way of working directly applies to and will benefit both private industry and government.
  • Organizations need departmental specialists to work with IT to create systems that are better at collecting, managing, and analyzing data. If the government is going to succeed with big data, it will need to find better ways to communicate and collaborate across organizations, with tools that can be used by technical and non-technical staff in order to make discoveries and quickly act.
  • Enterprise businesses need a single, cross-channel platform to manage their data flows. The same is likely to hold true for government agencies that have typically been hamstrung in their data analysis because information is spread across multiple different, disconnected silos and multiple public and private organizations.
  • Seventy-five percent indicate that data has the potential to dramatically improve their business; however, 58 percent report that their organizations don’t have the quantitative skills and technology needed to analyze the data. More than 70 percent report they can’t effectively leverage the full value of their customer data….
  • 90% indicate that digital marketing can reduce customer acquisition costs through increased efficiency, but 46% report that they lack the information they need to communicate the benefits of big data to management….

If we are recycling old problems, that means solutions to those problems failed.

If we use the same solutions for the same problems this time, what result would you expect? (Careful, you only get one answer.)

Look for Let’s Party Like It’s 1994 II, to read about the one commonality of Julie’s five points. The one that topic maps can address, effectively.

April 22, 2012

The wrong way: Worst best practices in ‘big data’ analytics programs

Filed under: Analytics,BigData,Government,Government Data — Patrick Durusau @ 7:07 pm

The wrong way: Worst best practices in ‘big data’ analytics programs

Rick Sherman writes:

“Big data” analytics is hot. Read any IT publication or website and you’ll see business intelligence (BI) vendors and their systems integration partners pitching products and services to help organizations implement and manage big data analytics systems. The ads and the big data analytics press releases and case studies that vendors are rushing out might make you think it’s easy — that all you need for a successful deployment is a particular technology.

If only it were that simple. While BI vendors are happy to tell you about their customers who are successfully leveraging big data for analytics uses, they’re not so quick to discuss those who have failed. There are many potential reasons why big data analytics projects fall short of their goals and expectations. You can find lots of advice on big data analytics best practices; below are some worst practices for big data analytics programs so you know what to avoid.

Rick gives seven reasons why “big data” analytics projects fail:

  1. “If we build, it they will come.”
  2. Assuming that the software will have all the answers.
  3. Not understanding that you need to think differently.
  4. Forgetting all the lessons of the past.
  5. Not having the requisite business and analytical expertise.
  6. Treating the project like it’s a science experiment.
  7. Promising and trying to do too much.

Seven reasons that should be raised when the NSA Money Trap project fails.

Because no one has taken responsibility for those seven issues.

Or asked the contractors: What about your failed “big data” analytics projects?

Simple enough question.

Do you ask that question?

Open Government Data

Filed under: Data,Government Data,Open Data — Patrick Durusau @ 7:06 pm

Open Government Data by Joshua Tauberer.

From the website:

This book is the culmination of several years of thinking about the principles behind the open government data movement in the United States. In the pages within, I frame the movement as the application of Big Data to civics. Topics include principles, uses for transparency and civic engagement, a brief legal history, data quality, civic hacking, and paradoxes in transparency.

Joshua’s book can be ordered in hard copy or as an ebook, or viewed online for free.

You may find this title useful in discussions of open government data.

April 20, 2012

Standardizing Federal Transparency

Filed under: Government Data,Identity,Transparency — Patrick Durusau @ 6:24 pm

Standardizing Federal Transparency

From the post:

A new federal data transparency coalition is pushing for standardization of government documents and support for legislation on public records disclosures, taxpayer spending and business identification codes.

The Data Transparency Coalition announced its official launch Monday, vowing nonpartisan work with Congress and the Executive Branch on ventures toward digital publishing of government documents in standardized and integrated formats. As part of that effort, the coalition expressed its support of legislative proposals such as: the Digital Accountability and Transparency Act, which would open public spending records published on a single digital format; the Public Information Online Act, which pushes for all records to be released digitally in a machine-readable format; and the Legal Entity Identifier proposal, creating a standard ID code for companies.

The 14 founding members include vendors Microsoft, Teradata, MarkLogic, Rivet Software, Level One Technologies and Synteractive, as well as the Maryland Association of CPAs, financial advisory BrightScope, and data mining and pattern discovery consultancy Elder Research. The coalition board of advisors includes former U.S. Deputy CTO Beth Noveck, data and information services investment firm partner Eric Gillespie and former Recovery Accountability and Transparency Board Chairman Earl E. Devaney.

Data Transparency Coalition Executive Director Hudson Hollister, a former counsel for the House of Representatives and U.S. Securities and Exchange Commission, noted that when the federal government does electronically publish public documents it “often fails to adopt consistent machine-readable identifiers or uniform markup languages.”

Sounds like an opportunity for both the markup and semantic identity communities, topic maps in particular.

The reasoning: not only will there need to be mappings between vocabularies and entities, but also between “uniform markup languages” as they evolve and develop.

With Perfect Timing, UK Audit Office Review Warns Open Government Enthusiasts

Filed under: Government Data,Open Data — Patrick Durusau @ 6:24 pm

With Perfect Timing, UK Audit Office Review Warns Open Government Enthusiasts

Andrea Di Maio writes:

Right in the middle of the Open Government Partnership conference, which I mentioned in my post yesterday, the UK National Audit Office (NAO) published its cross-government review on Implementing Transparency.

The report, while recognizing the importance and the potential for open data initiatives, highlights a few areas of concern that should be taken quite seriously by the OGP conference attendees, most of which are making open data more a self-fulfilling prophecy than an actual tool for government transparency and transformation.

The areas of concern highlighted in the review are an insufficient attention to assess costs, risks and benefits of transparency, the variation in completeness of information and the mixed progress. While the two latter can improve with greater maturity, it is the first that requires the most attention.

Better late than never.

I have yet to hear a discouraging word in the U.S. about the rush to openness by the Obama administration.

Not that I object to “openness,” but I would like to see meaningful “openness.”

Take campaign finance for example. Treating all contributions over fifty dollars ($50) the same is hiding influence buying in the chaff of reporting.

What matters is any contribution of over, say, $100,000 to a candidate. That would make the real supporters (purchasers really) of a particular office stand out.

The Obama White House uses the same hiding-in-the-chaff tactic to say it is disclosing White House visitors, who are mixed into the weekly visitor log for the White House. Girl and Boy Scout troop visits don’t count the same as personal audiences with the President.

Government contract data should be limited to contracts over $500,000 and include individual owner and corporate names plus the names of their usual government contract officers. Might need to bump the $500,000 up but could try it for a year.

If we bring up the house lights we have to search everyone. Why not a flashlight on the masher in the back row?

April 15, 2012

Announcing Fech 1.0

Filed under: Data Mining,Government Data,News — Patrick Durusau @ 7:15 pm

Announcing Fech 1.0 by Derek Willis.

From the post:

Fech now retrieves a whole lot more campaign finance data.

We’re excited to announce the 1.0 release of Fech, our Ruby library for parsing Federal Election Commission electronic campaign filings. Fech 1.0 now covers all of the current form types that candidates and committees submit. Originally developed to parse presidential committee filings, Fech now can be used for almost any kind of report (Senate candidates file on paper, so Fech can’t help there). The updated documentation, made with Github Pages, has a full listing of the supported formats.

Now it’s possible to use Fech to parse the pre-election filings of candidates receiving contributions of $1,000 or more — one way to see the late money in politics — or to dig through political party and political action committee reports to see how committees spend their funds. At The Times, Fech now plays a much greater role in powering our Campaign Finance API and in interactives that make use of F.E.C. data.

The additions to Fech include the ability to compare two filings and examine the differences between them. Since the F.E.C. requires that amendments replace the entire original filing, the comparison feature is especially useful for seeing what has changed between an original filing and an amendment to it. Another feature allows users to pass in a specific quote character (or parse a filing’s data without one at all) in order to avoid errors parsing comma-separated values that occasionally appear in filings.

Kudos to the New York Times for the development of software, and Fech in particular, that gives the average person access to “public” information. Without meaningful access, it can hardly qualify as “public,” can it?

Something the U.S. Senate should keep in mind as it remains mired in 19th century pomp and privilege. Diplomats, the other remaining class of privilege, should keep it in mind as well. Transparency is coming.


Update: Fech 1.1 Released.
