Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 25, 2014

Open-Data Data Dumps

Filed under: Open Data,Open Government — Patrick Durusau @ 3:04 pm

Government’s open-data portal at risk of becoming a data dump by Jj Worrall.

From the post:

The Government’s new open-data portal is not yet where it would like it to be, Minister Brendan Howlin said in a Department of Public Expenditure and Reform meeting room earlier this week.

In case expectations are too high, the word “pilot” is in italics when you visit the site in question – data.gov.ie.

Meanwhile the words “start” and “beginning” pepper the conversation with the Minister and a variety of data experts from the Insight Centre in NUI Galway who have helped create the site.

Data.gov.ie allows those in the Government, as well as interested businesses and citizens, to examine data from a variety of public bodies, opening opportunities for Government efficiencies and commercial possibilities along the way.

The main problem is that there is not much of it, and a lot of what is there can’t be utilised in a particularly useful fashion.

As director of the US Open Data Institute Waldo Jaquith told The Irish Times, with “almost no data” available in a format that’s genuinely usable by app developers, businesses or interested parties, for the moment data.gov.ie represents “a haphazard collection of data”.

It is important to realize that governments and their staffs have very little experience with being open and/or sharing data. Among the reasons for reluctance to post open data are:

  1. Less power over you since requests for data cannot be delayed or denied
  2. Less power in general because others will have the data
  3. Less power to confer on others by exclusive access to the data
  4. Less security since data may show poor results or performance
  5. Less security since data may show favoritism or fraud
  6. Less prestige as the source of answers on the data

Not an exhaustive list, but it is a reminder that changing attitudes about open data is probably beyond your reach.

What you can do with a site such as data.gov.ie, is to find a dataset of interest to you and make concrete suggestions for improvements.

There are a number of government staffers whom my list of reasons not to share data doesn't capture. Side with them and facilitate their work.

For example:

Met Éireann Climate Products. A polite note to Evelyn.O’Connor@per.gov.ie should point out that an order form and price list don’t really constitute “open data” in the sense citizens and developers expect. The “resource” should be taken off the listing and made available elsewhere, under “Data products to order” for example.

or,

Weather Buoy Network Real Time Data, where, if you dig long enough, you will find that you can download CSV-formatted data by blindly guessing at buoy names. A map of buoy locations would greatly assist at that point, not to mention an RSS feed for buoy data as it is received. Downloading a file tells me I am not getting “Real Time Data.” Yes?

Not major improvements, but they would improve those two entries at any rate.

It will take time, but ultimately the staff who favor sharing will prevail. You can hasten that day’s arrival or you can delay it. Your choice.

I first saw this in a tweet by Deirdre Lee.

July 15, 2014

Free Companies House data to boost UK economy

Filed under: Government,Government Data,Open Data — Patrick Durusau @ 4:57 pm

Free Companies House data to boost UK economy

From the post:

Companies House is to make all of its digital data available free of charge. This will make the UK the first country to establish a truly open register of business information.

As a result, it will be easier for businesses and members of the public to research and scrutinise the activities and ownership of companies and connected individuals. Last year (2013/14), customers searching the Companies House website spent £8.7 million accessing company information on the register.

This is a considerable step forward in improving corporate transparency; a key strand of the G8 declaration at the Lough Erne summit in 2013.

It will also open up opportunities for entrepreneurs to come up with innovative ways of using the information.

This change will come into effect from the second quarter of 2015 (April – June).

In a sidebar, Business Secretary Vince Cable said in part:

Companies House is making the UK a more transparent, efficient and effective place to do business.

I’m not sure about “efficient,” but providing incentives for lawyers and others to track down insider trading and other business-as-usual practices, and arming them with open data, would be a step in the right direction.

I first saw this in a tweet by Hadley Beeman.

July 8, 2014

Crowdscraping – You Game?

Filed under: Corporate Data,Crowd Sourcing,Open Data,Web Scrapers — Patrick Durusau @ 1:12 pm

Launching #FlashHacks: a crowdscraping movement to release 10 million data points in 10 days. Are you in? by Hera.

From the post:

The success story that is OpenCorporates is very much a team effort – not just the tiny OpenCorporates core team, but the whole open data community, who from the beginning have been helping us in so many ways, from writing scrapers for company registers, to alerting us when new data is available, to helping with language or data questions.

But one of the most common questions has been, “How can I get data into OpenCorporates“. Given that OpenCorporates‘ goal is not just every company in the world but also all the public data that relates to those companies, that’s something we’ve wanted to allow, as we would not achieve that alone, and it’s something that will make OpenCorporates not just the biggest open database of company data in the world, but the biggest database of company data, open or proprietary.

To launch this new era in corporate data, we are launching a #FlashHacks campaign.

Flash What? #FlashHacks.

We are inviting all Ruby and Python botwriters to help us crowdscrape 10 million data points into OpenCorporates in 10 days.

How you can join the crowdscraping movement

  • Join missions.opencorporates.com and sign up!
  • Have a look at the datasets we have listed on the Campaign page as inspiration. You can either write bots for these or even choose your own!
  • Sign up to a mission! Send a tweet pledge to say you have taken on a mission.
  • Write the bot and submit on the platform.
  • Tweet your success with the #FlashHacks tag! Don’t forget to upload the FlashHack design as your twitter cover photo and facebook cover photo to get more people involved.

Join us on our Google Group, share problems and solutions, and help build the open corporate data community.

If you are interested in covering this story, you can view the press release here.

Also of interest: Ruby and Python coders – can you help us?

To join this crowdscrape, sign up at: missions.opencorporates.com.
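If you have never written a scraper bot, the core of one is smaller than you might expect. Below is a minimal sketch in Python (the register URL, table selector, and output fields are hypothetical, and this is a generic pattern rather than the bot format missions.opencorporates.com expects):

import json

import requests
from bs4 import BeautifulSoup

# Hypothetical register page that lists companies in an HTML table.
REGISTER_URL = "https://example.gov/company-register"

def scrape_register(url=REGISTER_URL):
    """Fetch the register page and yield one dict per company row."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    for row in soup.select("table.register tr")[1:]:  # skip the header row
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) >= 3:
            yield {"company_number": cells[0], "name": cells[1], "status": cells[2]}

if __name__ == "__main__":
    # One JSON object per line is a convenient shape for bulk loading.
    for record in scrape_register():
        print(json.dumps(record))

Most of the effort in a real bot goes into the selectors and the data cleaning, not the plumbing.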

Tweet, email, post, etc.

Could be the start of a new social activity, the episodic crowdscrape.

Are crowdscrapes an answer to massive data dumps from corporate interests?

I first saw this in a tweet by Martin Tisne.

July 7, 2014

Access vs. Understanding

Filed under: Open Data,Public Data,Statistics — Patrick Durusau @ 4:09 pm

In Do doctors understand test results? William Kremer covers Risk Savvy: How to Make Good Decisions, a recent book on understanding risk statistics by Gerd Gigerenzer.

By the time you finish Kremer’s article, you will have little doubt that doctors don’t know the correct risk statistics for very common medical issues (breast cancer screening, for example), and that even when supplied with the correct figures, they are incapable of interpreting them correctly.

And the public?

Unsurprisingly, patients’ misconceptions about health risks are even further off the mark than doctors’. Gigerenzer and his colleagues asked over 10,000 men and women across Europe about the benefits of PSA screening and breast cancer screening respectively. Most overestimated the benefits, with respondents in the UK doing particularly badly – 99% of British men and 96% of British women overestimated the benefit of the tests. (Russians did the best, though Gigerenzer speculates that this is not because they get more good information, but because they get less misleading information.)

What does that suggest to you about the presentation and interpretation of data, whether encoded with a topic map or not?

To me it says that beyond testing an interface for usability and meeting the needs of users, we need to start testing users’ understanding of the data presented by interfaces. Delivery of great information that leaves a user misinformed (unless that is intentional) doesn’t seem all that helpful.

I am looking forward to reading Risk Savvy: How to Make Good Decisions. I don’t know that I will make “better” decisions but I will know when I am ignoring the facts. 😉

I first saw this in a tweet by Alastair Kerr.

July 2, 2014

OpenPrism

Filed under: Open Data,Open Government — Patrick Durusau @ 2:50 pm

Searching Data Tables

From the webpage:

There are loads of open data portals. There’s even a portal about data portals. And each of these portals has loads of datasets.

OpenPrism is my most recent attempt at understanding what is going on in all of these portals. Read on if you want to see why I made it, or just go to the site and start playing with it.

Naive search method

One difficulty in discovering open data is the search paradigm.

Open data portals approach searching data as if data were normal prose; your search terms are some keywords, a category, &c., and your results are dataset titles and descriptions.

OpenPrism is one small attempt at making it easier to search. Rather than going to all of the different portals and making a separate search for each portal, you type your search in one search bar, and you get results from a bunch of different Socrata, CKAN and Junar portals.
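To get a sense of what that single search bar has to do behind the scenes, here is a minimal sketch in Python of a federated query against a few CKAN portals. It assumes CKAN’s standard package_search action API; the portal list is only illustrative, and OpenPrism’s own source is the authoritative version of the idea:

import requests

# Illustrative CKAN portals; any CKAN site exposes the same action API.
PORTALS = [
    "https://demo.ckan.org",
    "https://catalog.data.gov",
]

def search_portal(base_url, query, rows=5):
    """Run a package_search query against one CKAN portal."""
    response = requests.get(
        f"{base_url}/api/3/action/package_search",
        params={"q": query, "rows": rows},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["result"]["results"]

def federated_search(query):
    """Collect (portal, dataset title) pairs across all portals."""
    hits = []
    for portal in PORTALS:
        try:
            for dataset in search_portal(portal, query):
                hits.append((portal, dataset["title"]))
        except requests.RequestException:
            continue  # one portal being down shouldn't sink the whole search
    return hits

if __name__ == "__main__":
    for portal, title in federated_search("water quality"):
        print(portal, "->", title)

Socrata and Junar expose different APIs, which is exactly the kind of per-portal plumbing OpenPrism hides.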

Certainly more efficient than searching each data portal separately, but searching data portals is highly problematic in any event.

Or at least more problematic than using one of the standard web search engines, which rely upon the choices of millions of users to fine-tune their results, and even then they are often a mixed bag.

Different data portals, and I suspect most datasets within a single portal, do not share common schemas or metadata. Which means a search that is successful in one data portal may return no results in another.

Not that I am about to advocate a “universal” schema for all data portals. 😉

A good first step would be enabling each data silo to have searchable mappings for data columns as suggested by users. Not machine implemented, just simple prose. Users researching particular areas are likely to encounter the same data sets, and recording their mappings could well assist other users.

Relying on user-suggested mappings would also direct improvements toward the data sets that get used the most, the ones users actually care about combining. As opposed to having IT guess which data mappings should have priority.
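A user-suggested mapping need not be fancy. Here is a minimal sketch (the dataset and column names are hypothetical) of what such a record might look like, just simple prose attached to pairs of columns:

# A hypothetical user-contributed mapping between two portal datasets.
column_mapping = {
    "datasets": ["cityA/street-lighting", "cityB/lamp-inventory"],
    "suggested_by": "a user who combined both for an energy-use analysis",
    "mappings": [
        {
            "cityA/street-lighting": "lamp_wattage",
            "cityB/lamp-inventory": "power_w",
            "note": "both record the rated wattage of the fixture, in watts",
        },
        {
            "cityA/street-lighting": "install_date",
            "cityB/lamp-inventory": "commissioned",
            "note": "dates; cityA uses ISO 8601, cityB uses DD/MM/YYYY",
        },
    ],
}

for m in column_mapping["mappings"]:
    print(m["note"])

Even that much, searchable alongside the datasets, would save the next researcher the work of rediscovering the correspondence.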

Sound like a plan?

See the source at GitHub.

I first saw this in a tweet by Felienne Hermans.

July 1, 2014

Piketty in R markdown

Filed under: Ecoinformatics,Open Data,R — Patrick Durusau @ 11:56 am

Piketty in R markdown – we need some help from the crowd by Jeff Leek.

From the post:

Thomas Piketty’s book Capital in the 21st Century was a surprise best seller and the subject of intense scrutiny. A few weeks ago the Financial Times claimed that the analysis was riddled with errors, leading to a firestorm of discussion. A few days ago the London School of Economics posted a similar call to make the data open and machine readable, saying:

None of this data is explicitly open for everyone to reuse, clearly licenced and in machine-readable formats.

A few friends of Simply Stats had started on a project to translate his work from the excel files where the original analysis resides into R. The people that helped were Alyssa Frazee, Aaron Fisher, Bruce Swihart, Abhinav Nellore, Hector Corrada Bravo, John Muschelli, and me. We haven’t finished translating all chapters, so we are asking anyone who is interested to help contribute to translating the book’s technical appendices into R markdown documents. If you are interested, please send pull requests to the gh-pages branch of this Github repo.

Hmmm, debate to be conducted based on known data sets?

That sounds like a radical departure from most public debates, to say nothing of debates in politics.

Dangerous because the general public may come to expect news reports, government budgets, documents, etc. to be accompanied by machine readable data files.

Even more dangerous if data files are compared to other data files, for consistency, etc.

No time to start like the present. Think about helping with the Piketty materials.

You may be helping to start a trend.

June 6, 2014

UK Houses of Parliament launches Open Data portal

Filed under: Government,Government Data,Open Data — Patrick Durusau @ 6:05 pm

UK Houses of Parliament launches Open Data portal

From the webpage:

Datasets related to the UK Houses of Parliament are now available via data.parliament.uk – the institution’s new dedicated Open Data portal.

Site developers are currently seeking feedback on the portal ahead of the next release, details of how to get in touch can be found by clicking here.

From the alpha release of the portal:

Welcome to the first release of data.parliament.uk – the home of Open Data from the UK Houses of Parliament. This is an alpha release and contains a limited set of features and data. We are seeking feedback from users about the platform and the data on it so please contact us.

I would have to agree that the portal presently contains “limited data.” 😉

What would be helpful, for non-U.K. data miners as well as ones in the U.K., is some sense of what data is available.

A PDF file listing the data currently maintained on the UK Houses of Parliament, their members, records of proceedings, transcripts, etc. would be a good starting point.

Pointers anyone?

June 2, 2014

openFDA

Filed under: Government,Government Data,Medical Informatics,Open Access,Open Data — Patrick Durusau @ 4:30 pm

openFDA

Not all the news out of government is bad.

Consider openFDA which is putting

More than 3 million adverse drug event reports at your fingertips.

From the “about” page:

OpenFDA is an exciting new initiative in the Food and Drug Administration’s Office of Informatics and Technology Innovation spearheaded by FDA’s Chief Health Informatics Officer. OpenFDA offers easy access to FDA public data and highlight projects using these data in both the public and private sector to further regulatory or scientific missions, educate the public, and save lives.

What does it do?

OpenFDA provides API and raw download access to a number of high-value structured datasets. The platform is currently in public beta with one featured dataset, FDA’s publically available drug adverse event reports.

In the future, openFDA will provide a platform for public challenges issued by the FDA and a place for the community to interact with each other and FDA domain experts with the goal of spurring innovation around FDA data.

We’re currently focused on working on datasets in the following areas:

  • Adverse Events: FDA’s publically available drug adverse event reports, a database that contains millions of adverse event and medication error reports submitted to FDA covering all regulated drugs.
  • Recalls (coming soon): Enforcement Report and Product Recalls Data, containing information gathered from public notices about certain recalls of FDA-regulated products
  • Documentation (coming soon): Structured Product Labeling Data, containing detailed product label information on many FDA-regulated product

We’ll be releasing a number of updates and additional datasets throughout the upcoming months.
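If you want to poke at the beta dataset yourself, the adverse event reports are queryable over a public web API. A minimal sketch in Python, assuming the api.fda.gov drug event endpoint and query syntax as documented by openFDA (no API key is required for light use):

import requests

OPENFDA_EVENTS = "https://api.fda.gov/drug/event.json"

def adverse_event_total(search_expr):
    """Return how many adverse event reports match an openFDA search expression."""
    response = requests.get(
        OPENFDA_EVENTS,
        params={"search": search_expr, "limit": 1},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["meta"]["results"]["total"]

if __name__ == "__main__":
    # Reports whose reaction term is "headache" (field name per the openFDA docs).
    print(adverse_event_total("patient.reaction.reactionmeddrapt:headache"))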

OK, I’m Twitter follower #522 @openFDA.

What’s your @openFDA number?

A good experience, i.e., people making good use of released data, asking for more data, etc., is what will drive more open data. Make every useful government data project count.

May 20, 2014

Follow the Money (OpenTED)

Filed under: EU,Open Data,Open Government — Patrick Durusau @ 10:42 am

Opening Up EU Procurement Data by Friedrich Lindenberg.

From the post:

What is the next European dataset that investigative journalists should look at? Back in 2012 at the DataHarvest conference, Brigitte, investigative superstar from FarmSubsidy and co-host of the conference, had a clear answer: let’s open up TED (Tenders Electronic Daily). TED is the EU’s shared procurement mechanism, and is at the heart of the EU contracting process. Opening it up would shine a light on the key questions of who receives public money, and what they receive it for.

Her suggestion triggered a two-year project, OpenTED, which, as of last week, has finally matured into a useful resource for journalists and researchers. While gaps remain, we hope it will now start to be used by journalists, NGOs, analysts and citizens to get information on everything from large scale trends to local municipal developments.

(image omitted)

OpenTED

TED collects tender notices for large public projects so that companies from all EU countries can bid on those contracts. For journalists, there are many exciting questions such a database would be able to answer: What major projects are being announced? Who is winning the contracts for these projects, and is that decision made prudently and impartially? Who are the biggest suppliers in a particular country or industry?

A data dictionary for the project remains unfinished and there are plenty of other opportunities to contribute to this project.

The phrase “large public project” means projects with budgets in excess of €200,000. If experience in the United States holds true for the EU, there can be a lot of FGC (Fraud, Greed, Corruption) in contracts under €200,000.

If you are looking for volunteer opportunities, the data needs to be used and explored, a data dictionary remains unfinished, current code can be improved and I assume documentation would be appreciated.

Certainly the type of project that merits widespread public support.

I find the project interesting because once you connect the players based on this data set, folding in other sets of connections, such as school, social, club, agency, and employer ties, will improve the value of the original data set. Topic maps are, of course, my preferred method for the folding.

I first saw this in a tweet by ePSIplatform.

April 19, 2014

New trends in sharing data science work

Filed under: Conferences,GraphLab,Graphs,Open Data — Patrick Durusau @ 6:42 pm

New trends in sharing data science work

Danny Bickson writes:

I got the following venturebeat article from my colleague Carlos Guestrin.

It seems there is an interesting trend of allowing data scientists to share their work: Imagine if a company’s three highly valued data scientists can happily work together without duplicating each other’s efforts and can easily call up the ingredients and results of each other’s previous work.

That day has come. As the data scientist arms race continues, data scientists might want to join forces. Crazy idea, right? Two San Francisco startups — Domino Data Lab and Sense — have emerged recently with software to let data scientists collaborate on multiple projects. In a way, it’s like code storehouse GitHub for the data science world. A Montreal startup named Plot.ly has been talking about the same themes, but it brings a more social twist. Another startup, Mode Analytics, is building software for data analysts to ask questions of data without duplicating previous efforts. And at least one more mature software vendor, Alpine Data Labs, has been adding features to help many colleagues in a company apply algorithms to code on one central hub.

If you aren’t already registered for GraphLab Conference 2014, notice that Alpine Data Labs, Domino Data Labs, Mode Analytics, Plot.ly, and Sense will all be at the GraphLab Conference.

Go ahead, register for the GraphLab conference. At the very worst you will learn something. If you socialize a little bit, you will meet some of the brightest graph people on the planet.

Plus, when the history of “sharing” in data science is written, you will have attended one of the early conferences on sharing code for data science. After years of hoarding data (where you now see open data) and beginning to see code sharing, data science is developing a different model.

And you were there to cheer them on!

April 9, 2014

IRS Data?

Filed under: Government,Government Data,Open Access,Open Data — Patrick Durusau @ 7:45 pm

New, Improved IRS Data Available on OpenSecrets.org by Robert Maguire.

From the post:

Among the more than 160,000 comments the IRS received recently on its proposed rule dealing with candidate-related political activity by 501(c)(4) organizations, the Center for Responsive Politics was the only organization to point to deficiencies in a critical data set the IRS makes available to the public.

This month, the IRS released the newest version of that data, known as 990 extracts, which have been improved considerably. Now, the data is searchable and browseable on OpenSecrets.org.

“Abysmal” IRS data

Back in February, CRP had some tough words for the IRS concerning the information. In the closing pages of our comment on the agency’s proposed guidelines for candidate-related political activity, we wrote that “the data the IRS provides to the public — and the manner in which it provides it — is abysmal.”

While I am glad to see better access to 501(c) 990 data, in a very real sense this isn’t “IRS data,” is it?

This is data that the government collected under penalty of law from tax entities in the United States.

Granted, it was sent in “voluntarily,” but there is a lot of data that entities and individuals send to local, state and federal government “voluntarily.” Not all of it is data that most of us would want handed out because other people are curious.

As I said, I like better access to 990 data but we need to distinguish between:

  1. Government sharing data it collected from citizens or other entities, and
  2. Government sharing data about government meetings, discussions, contacts with citizens/contractors, policy making, processes and the like.

If I’m not seriously mistaken, most of the open data from government involves a great deal of #1 and very little of #2.

Is that your impression as well?

One quick example. The United States Congress, with some reluctance, seems poised to deliver near real-time information on legislative proposals before Congress. Which is a good thing.

But there has been no discussion of tracking the final editing of bills to trace the insertion or deletion of language, by whom and with whose agreement. Which is a bad thing.

It makes no difference how public the process is up to final edits, if the final version is voted upon before changes can be found and charged to those responsible.

March 30, 2014

The Theoretical Astrophysical Observatory:…

Filed under: Astroinformatics,Funding,Government,Open Data — Patrick Durusau @ 7:05 pm

The Theoretical Astrophysical Observatory: Cloud-Based Mock Galaxy Catalogues by Maksym Bernyk, et al.

Abstract:

We introduce the Theoretical Astrophysical Observatory (TAO), an online virtual laboratory that houses mock observations of galaxy survey data. Such mocks have become an integral part of the modern analysis pipeline. However, building them requires an expert knowledge of galaxy modelling and simulation techniques, significant investment in software development, and access to high performance computing. These requirements make it difficult for a small research team or individual to quickly build a mock catalogue suited to their needs. To address this TAO offers access to multiple cosmological simulations and semi-analytic galaxy formation models from an intuitive and clean web interface. Results can be funnelled through science modules and sent to a dedicated supercomputer for further processing and manipulation. These modules include the ability to (1) construct custom observer light-cones from the simulation data cubes; (2) generate the stellar emission from star formation histories, apply dust extinction, and compute absolute and/or apparent magnitudes; and (3) produce mock images of the sky. All of TAO’s features can be accessed without any programming requirements. The modular nature of TAO opens it up for further expansion in the future.

The website: Theoretical Astrophysical Observatory.

While disciplines in the sciences and the humanities play access games with data and publications, the astronomy community continues to shame both of them.

Funders, both government and private should take a common approach: Open and unfettered access to data or no funding.

It’s just that simple.

If grantees object, they can try to function without funding.

March 22, 2014

Opening data: Have you checked your pipes?

Filed under: Data Mining,ETL,Open Access,Open Data — Patrick Durusau @ 7:44 pm

Opening data: Have you checked your pipes? by Bob Lannon.

From the post:

Code for America alum Dave Guarino had a post recently entitled “ETL for America”. In it, he highlights something that open data practitioners face with every new project: the problem of Extracting data from old databases, Transforming it to suit a new application or analysis and Loading it into the new datastore that will support that new application or analysis. Almost every technical project (and every idea for one) has this process as an initial cost. This cost is so pervasive that it’s rarely discussed by anyone except for the wretched “data plumber” (Dave’s term) who has no choice but to figure out how to move the important resources from one place to another.

Why aren’t we talking about it?

The up-front costs of ETL don’t come up very often in the open data and civic hacking community. At hackathons, in funding pitches, and in our definitions of success, we tend to focus on outputs (apps, APIs, visualizations) and treat the data preparation as a collateral task, unavoidable and necessary but not worth “getting into the weeds” about. Quoting Dave:

The fact that I can go months hearing about “open data” without a single mention of ETL is a problem. ETL is the pipes of your house: it’s how you open data.

It’s difficult to point to evidence that this is really the case, but I personally share Dave’s experience. To me, it’s still the elephant in the room during the proceedings of any given hackathon or open data challenge. I worry that the open data community is somehow under the false impression that, eventually in the sunny future, data will be released in a more clean way and that this cost will decrease over time.

It won’t. Open data might get cleaner, but no data source can evolve to the point where it serves all possible needs. Regardless of how easy it is to read, the data published by government probably wasn’t prepared with your new app idea in mind.

Data transformation will always be necessary, and it’s worth considering apart from the development of the next cool interface. It’s a permanent cost of developing new things in our space, so why aren’t we putting more resources toward addressing it as a problem in its own right? Why not spend some quality time (and money) focused on data preparation itself, and then let a thousand apps bloom?

If you only take away this line:

Open data might get cleaner, but no data source can evolve to the point where it serves all possible needs. (emphasis added)

from Bob’s entire post, reading it has still been time well spent.

Your “clean data” will at times be my “dirty data” and vice versa.

Documenting the semantics we “see” in data, the semantics that drive our transformations into “clean” data, stands a chance of helping the next person in line who wants to use that data.

Think of it as an accumulation of experience with a data set and the results obtained from it.
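A minimal sketch of what documenting-as-you-transform could look like in practice: a transform step that writes the cleaned table and, alongside it, a plain record of the column mappings it applied (the file and column names are hypothetical):

import csv
import json

# Hypothetical raw export and the column renames this transform applies.
COLUMN_MAP = {"Facility Nm": "facility_name", "INSP_DT": "inspection_date"}

def transform(src="raw_export.csv", dst="clean.csv", log="clean.provenance.json"):
    """Rename columns per COLUMN_MAP, write the cleaned CSV, and log the mapping."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        out_fields = [COLUMN_MAP.get(name, name) for name in reader.fieldnames]
        writer = csv.DictWriter(fout, fieldnames=out_fields)
        writer.writeheader()
        for row in reader:
            writer.writerow({COLUMN_MAP.get(k, k): v for k, v in row.items()})
    # The provenance file is the "documented semantics" the next user inherits.
    with open(log, "w") as f:
        json.dump({"source": src, "column_map": COLUMN_MAP}, f, indent=2)

if __name__ == "__main__":
    transform()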

Or you can just “wing it” with every data set you encounter, and so shall we all.

Your call.

I first saw this in a tweet by Dave Guarino.

March 18, 2014

UK statistics and open data…

Filed under: Government,Government Data,Open Data,Open Government — Patrick Durusau @ 7:52 pm

UK statistics and open data: MPs’ inquiry report published by Owen Boswarva.

From the post:

This morning the Public Administration Select Committee (PASC), a cross-party group of MPs chaired by Bernard Jenkin, published its report on Statistics and Open Data.

This report is the product of an inquiry launched in July 2013. Witnesses gave oral evidence in three sessions; you can read the transcripts and written evidence as well.

Useful if you are looking for rhetoric and examples of use of government data.

Ironic that just last week the news broke that Google has given British security the power to censor “unsavory” (but legal) content from YouTube. UK gov wants to censor legal but “unsavoury” YouTube content by Lisa Vaas.

Lisa writes:

Last week, the Financial Times revealed that Google has given British security the power to quickly yank terrorist content offline.

The UK government doesn’t want to stop there, though – what it really wants is the power to pull “unsavoury” content, regardless of whether it’s actually illegal – in other words, it wants censorship power.

The news outlet quoted UK’s security and immigration minister, James Brokenshire, who said that the government must do more to deal with material “that may not be illegal but certainly is unsavoury and may not be the sort of material that people would want to see or receive.”

I’m not sure why the UK government wants to block content that people don’t want to see or receive. They simply won’t look at it. Yes?

But, intellectual coherence has never been a strong point of most governments and the UK in particular of late.

Is this more evidence for my contention that “open data” for government means only the data government wants you to have?

March 17, 2014

Peyote and the International Plant Names Index

Filed under: Agriculture,Data,Names,Open Access,Open Data,Science — Patrick Durusau @ 1:30 pm

International Plant Names Index

What a great resource to find as we near Spring!

From the webpage:

The International Plant Names Index (IPNI) is a database of the names and associated basic bibliographical details of seed plants, ferns and lycophytes. Its goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names. The data are freely available and are gradually being standardized and checked. IPNI will be a dynamic resource, depending on direct contributions by all members of the botanical community.

I entered the first plant name that came to mind: Peyote.

No “hits.” ?

Wikipedia gives Peyote’s binomial name as: Lophophora williamsii (think synonym).*

Searching on Lophophora williamsii, I got three (3) “hits.”

Had I bothered to read the FAQ before searching:

10. Can I use IPNI to search by common (vernacular) name?

No. IPNI does not include vernacular names of plants as these are rarely formally published. If you are looking for information about a plant for which you only have a common name you may find the following resources useful. (Please note that these links are to external sites which are not maintained by IPNI)

I understand the need to specialize in one form of names, but “formally published” means that without a useful synonym list, the general public bears an additional burden in accessing publicly funded research results.

Even with a synonym list there is an additional burden, because you have to look up terms in the list, then read the text with that understanding, and then go back to the synonym list again.

What would dramatically increase public access to publicly funded research would be a specialized synonym list for publications, one that transposes the jargon in articles into selected sets of synonyms. It would not be as precise or grammatical as the original, but it would allow the reading public to get a sense of even very technical research.
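A specialized synonym list does not have to be elaborate to be useful. A minimal sketch of a vernacular-to-binomial lookup sitting in front of a formal-names index (only the peyote entry comes from the example above; the others are common illustrations):

# Illustrative vernacular-to-binomial synonym list.
SYNONYMS = {
    "peyote": "Lophophora williamsii",
    "maize": "Zea mays",
    "bracken": "Pteridium aquilinum",
}

def to_search_term(name):
    """Map a common name to the binomial a formal index like IPNI expects."""
    return SYNONYMS.get(name.strip().lower(), name)

print(to_search_term("Peyote"))    # -> Lophophora williamsii
print(to_search_term("Zea mays"))  # already a binomial, passed through unchanged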

That could be a way to hitch topic maps to the access-to-publicly-funded-data bandwagon.

Thoughts?

I first saw this in a tweet by Bill Baker.

* A couple of other fun facts from Wikipedia on Peyote: 1. Its conservation status is listed as “apparently secure,” and 2. Wikipedia has photos of Peyote “in the wild.” I suppose saying “Peyote growing in a pot” would raise too many questions.

March 15, 2014

Publishing biodiversity data directly from GitHub to GBIF

Filed under: Biodiversity,Data Repositories,Open Access,Open Data — Patrick Durusau @ 9:01 pm

Publishing biodiversity data directly from GitHub to GBIF by Roderic D. M. Page.

From the post:

Today I managed to publish some data from a GitHub repository directly to GBIF. Within a few minutes (and with Tim Robertson on hand via Skype to debug a few glitches) the data was automatically indexed by GBIF and its maps updated. You can see the data I uploaded here.

In case you don’t know about GBIF (I didn’t):

The Global Biodiversity Information Facility (GBIF) is an international open data infrastructure, funded by governments.

It allows anyone, anywhere to access data about all types of life on Earth, shared across national boundaries via the Internet.

By encouraging and helping institutions to publish data according to common standards, GBIF enables research not possible before, and informs better decisions to conserve and sustainably use the biological resources of the planet.

GBIF operates through a network of nodes, coordinating the biodiversity information facilities of Participant countries and organizations, collaborating with each other and the Secretariat to share skills, experiences and technical capacity.

GBIF’s vision: “A world in which biodiversity information is freely and universally available for science, society and a sustainable future.”

Roderic summarizes his post saying:

what I’m doing here is putting data on GitHub and having GBIF harvest that data directly from GitHub. This means I can edit the data, rebuild the Darwin Core Archive file, push it to GitHub, and GBIF will reindex it and update the data on the GBIF portal.

The process isn’t perfect but unlike disciplines where data sharing is the exception rather than the rule, the biodiversity community is trying to improve its sharing of data.

Not every attempt at improvement will succeed, but lessons are learned from every attempt.

Kudos to the biodiversity community for a model that other communities should follow!
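For anyone who wants to see what lands on the GBIF side, the indexed occurrence records are also retrievable programmatically. A minimal sketch, assuming GBIF’s public occurrence search API at api.gbif.org/v1:

import requests

GBIF_OCCURRENCES = "https://api.gbif.org/v1/occurrence/search"

def occurrences(scientific_name, limit=5):
    """Fetch a handful of occurrence records for a scientific name from GBIF."""
    response = requests.get(
        GBIF_OCCURRENCES,
        params={"scientificName": scientific_name, "limit": limit},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("results", [])

if __name__ == "__main__":
    for record in occurrences("Puma concolor"):
        print(record.get("country"), record.get("eventDate"))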

March 4, 2014

Beyond Transparency

Filed under: Open Data,Open Government,Transparency — Patrick Durusau @ 1:53 pm

Beyond Transparency, edited by Brett Goldstein and Lauren Dyson.

From the webpage:

The rise of open data in the public sector has sparked innovation, driven efficiency, and fueled economic development. And in the vein of high-profile federal initiatives like Data.gov and the White House’s Open Government Initiative, more and more local governments are making their foray into the field with Chief Data Officers, open data policies, and open data catalogs.

While still emerging, we are seeing evidence of the transformative potential of open data in shaping the future of our civic life. It’s at the local level that government most directly impacts the lives of residents—providing clean parks, fighting crime, or issuing permits to open a new business. This is where there is the biggest opportunity to use open data to reimagine the relationship between citizens and government.

Beyond Transparency is a cross-disciplinary survey of the open data landscape, in which practitioners share their own stories of what they’ve accomplished with open civic data. It seeks to move beyond the rhetoric of transparency for transparency’s sake and towards action and problem solving. Through these stories, we examine what is needed to build an ecosystem in which open data can become the raw materials to drive more effective decision-making and efficient service delivery, spur economic activity, and empower citizens to take an active role in improving their own communities.

Let me list the titles for two (2) parts out of five (5):

  • PART 1 Opening Government Data
    • Open Data and Open Discourse at Boston Public Schools, Joel Mahoney
    • Open Data in Chicago: Game On, Brett Goldstein
    • Building a Smarter Chicago, Dan X O’Neil
    • Lessons from the London Datastore, Emer Coleman
    • Asheville’s Open Data Journey: Pragmatics, Policy, and Participation, Jonathan Feldman
  • PART 2 Building on Open Data
    • From Entrepreneurs to Civic Entrepreneurs, Ryan Alfred and Mike Alfred
    • Hacking FOIA: Using FOIA Requests to Drive Government Innovation, Jeffrey D. Rubenstein
    • A Journalist’s Take on Open Data, Elliott Ramos
    • Oakland and the Search for the Open City, Steve Spiker
    • Pioneering Open Data Standards: The GTFS Story, Bibiana McHugh

Steve Spiker captures my concerns about the efficacy of “open data” in his opening sentence:

At the center of the Bay Area lies an urban city struggling with the woes of many old, great cities in the USA, particularly those in the rust belt: disinvestment, white flight, struggling schools, high crime, massive foreclosures, political and government corruption, and scandals. (Oakland and the Search for the Open City)

It may well be that I agree with “open data,” in part because I have no real data to share. So any sharing of data is going to benefit me and whatever agenda I want to pursue.

People who are pursuing their own agendas without open data have nothing to gain from an open playing field and more than a little to lose. Particularly if they are on the corrupt side of public affairs.

All the more reason to pursue open data in my view but with the understanding that every line of data access benefits some and penalizes others.

Take the long-standing tradition of not publishing who meets with the President of the United States, justified on the basis that the President needs open and frank advice from people who feel free to speak openly.

That’s one explanation. Another explanation is that being clubby with media moguls would look inconvenient while the U.S. trade delegation is pushing a pro-media position, to the detriment of us all.

When open data is used to take down members of Congress, the White House, heads and staffs of agencies, it will truly have arrived.

Until then, open data is just whistling as it walks past a graveyard in the dark.

I first saw this in a tweet by ladyson.

PLOS’ Bold Data Policy

Filed under: Data,Open Access,Open Data,Public Data — Patrick Durusau @ 11:32 am

PLOS’ Bold Data Policy by David Crotty.

From the post:

If you pay any attention at all to scholarly publishing, you’re likely aware of the current uproar over PLOS’ recent announcement requiring all article authors to make their data publicly available. This is a bold move, and a forward-looking policy from PLOS. It may, for many reasons, have come too early to be effective, but ultimately, that may not be the point.

Perhaps the biggest practical problem with PLOS’ policy is that it puts an additional time and effort burden on already time-short, over-burdened researchers. I think I say this in nearly every post I write for the Scholarly Kitchen, but will repeat it again here: Time is a researcher’s most precious commodity. Researchers will almost always follow the path of least resistance, and not do anything that takes them away from their research if it can be avoided.

When depositing NIH-funded papers in PubMed Central was voluntary, only 3.8% of eligible papers were deposited, not because people didn’t want to improve access to their results, but because it wasn’t required and took time and effort away from experiments. Even now, with PubMed Central deposit mandatory, only 20% of what’s deposited comes from authors. The majority of papers come from journals depositing on behalf of authors (something else for which no one seems to give publishers any credit, Kent, one more for your list). Without publishers automating the process on the author’s behalf, compliance would likely be vastly lower. Lightening the burden of the researcher in this manner has become a competitive advantage for the journals that offer this service.

While recognizing the goal of researchers to do more experiments, isn’t this reminiscent of the lack of documentation for networks and software?

That creators of networks and software want to get on with the work they enjoy, documentation not being part of that work.

The problem with the semantics of research data, much as with network and software semantics, is that there is no one else to ask about those semantics. If researchers don’t document the semantics as they perform experiments, then they will have to spend time at publication gathering that information together.

I sense an opportunity here for software to assist researchers in capturing semantics as they perform experiments, so that production of semantically annotated data at the end of an experiment can be largely a clerical task, subject to review by the actual researchers.

The minimal semantics that need to be captured will vary for different types of research. That is all the more reason to research and document those semantics before anyone writes a complex monolith of semantics into which existing semantics must be shoehorned.

The reasoning being that if we don’t know the semantics of data, it is more cost-effective to pipe it to /dev/null.

I first saw this in a tweet by ChemConnector.

February 26, 2014

Data Access for the Open Access Literature: PLOS’s Data Policy

Filed under: Data,Open Access,Open Data,Public Data — Patrick Durusau @ 5:44 pm

Data Access for the Open Access Literature: PLOS’s Data Policy by Theo Bloom.

From the post:

Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances. In line with Open Access to research articles themselves, PLOS strongly believes that to best foster scientific progress, the underlying data should be made freely available for researchers to use, wherever this is legal and ethical. Data availability allows replication, reanalysis, new analysis, interpretation, or inclusion into meta-analyses, and facilitates reproducibility of research, all providing a better ‘bang for the buck’ out of scientific research, much of which is funded from public or nonprofit sources. Ultimately, all of these considerations aside, our viewpoint is quite simple: ensuring access to the underlying data should be an intrinsic part of the scientific publishing process.

PLOS journals have requested data be available since their inception, but we believe that providing more specific instructions for authors regarding appropriate data deposition options, and providing more information in the published article as to how to access data, is important for readers and users of the research we publish. As a result, PLOS is now releasing a revised Data Policy that will come into effect on March 1, 2014, in which authors will be required to include a data availability statement in all research articles published by PLOS journals; the policy can be found below. This policy was developed after extensive consultation with PLOS in-house professional and external Academic Editors and Editors in Chief, who are practicing scientists from a variety of disciplines.

We now welcome input from the larger community of authors, researchers, patients, and others, and invite you to comment before March. We encourage you to contact us collectively at data@plos.org; feedback via Twitter and other sources will also be monitored. You may also contact individual PLOS journals directly.

That is a large step towards verifiable research and was taken by PLOS in December of 2013.

That has been supplemented with details that do not change the December announcement in: PLOS’ New Data Policy: Public Access to Data by Liz Silva, which reads in part:

A flurry of interest has arisen around the revised PLOS data policy that we announced in December and which will come into effect for research papers submitted next month. We are gratified to see a huge swell of support for the ideas behind the policy, but we note some concerns about how it will be implemented and how it will affect those preparing articles for publication in PLOS journals. We’d therefore like to clarify a few points that have arisen and once again encourage those with concerns to check the details of the policy or our FAQs, and to contact us with concerns if we have not covered them.

I think the bottom line is: Don’t Panic, Ask.

There are always going to be unanticipated details or concerns but as time goes by and customs develop for how to solve those issues, the questions will become fewer and fewer.

Over time, and not that much time, our history of arrangements other than open access is going to puzzle present and future generations of researchers.

February 25, 2014

Project Open Data

Filed under: Government,Open Data,Open Government — Patrick Durusau @ 4:21 pm

Project Open Data

From the webpage:

Data is a valuable national resource and a strategic asset to the U.S. Government, its partners, and the public. Managing this data as an asset and making it available, discoverable, and usable – in a word, open – not only strengthens our democracy and promotes efficiency and effectiveness in government, but also has the potential to create economic opportunity and improve citizens’ quality of life.

For example, when the U.S. Government released weather and GPS data to the public, it fueled an industry that today is valued at tens of billions of dollars per year. Now, weather and mapping tools are ubiquitous and help everyday Americans navigate their lives.

The ultimate value of data can often not be predicted. That’s why the U.S. Government released a policy that instructs agencies to manage their data, and information more generally, as an asset from the start and, wherever possible, release it to the public in a way that makes it open, discoverable, and usable.

The White House developed Project Open Data – this collection of code, tools, and case studies – to help agencies adopt the Open Data Policy and unlock the potential of government data. Project Open Data will evolve over time as a community resource to facilitate broader adoption of open data practices in government. Anyone – government employees, contractors, developers, the general public – can view and contribute. Learn more about Project Open Data Governance and dive right in and help to build a better world through the power of open data.
….

An impressive list of tools and materials for federal (United States) agencies seeking to release data.

And as Ryan Swanstrom says in his post:

Best of all, the entire project is available on GitHub and contributions are welcomed.

Thoughts on possible contributions?

R Markdown:… [Open Analysis, successor to Open Data?]

Filed under: Government,Government Data,Open Data,Open Government — Patrick Durusau @ 11:53 am

R Markdown: Integrating A Reproducible Analysis Tool into Introductory Statistics by Ben Baumer, et al.

Abstract:

Nolan and Temple Lang argue that “the ability to express statistical computations is an essential skill.” A key related capacity is the ability to conduct and present data analysis in a way that another person can understand and replicate. The copy-and-paste workflow that is an artifact of antiquated user-interface design makes reproducibility of statistical analysis more difficult, especially as data become increasingly complex and statistical methods become increasingly sophisticated. R Markdown is a new technology that makes creating fully-reproducible statistical analysis simple and painless. It provides a solution suitable not only for cutting edge research, but also for use in an introductory statistics course. We present evidence that R Markdown can be used effectively in introductory statistics courses, and discuss its role in the rapidly-changing world of statistical computation. (emphasis in original)

The authors’ third point for R Markdown I would have made the first:

Third, the separation of computing from presentation is not necessarily honest… More subtly and less perniciously, the copy-and-paste paradigm enables, and in many cases even encourages, selective reporting. That is, the tabular output from R is admittedly not of presentation quality. Thus the student may be tempted or even encouraged to prettify tabular output before submitting. But while one is fiddling with margins and headers, it is all too tempting to remove rows or columns that do not suit the student’s purpose. Since the commands used to generate the table are not present, the reader is none the wiser.

Although I have to admit that reproducibility has a lot going for it.

Can you imagine reproducible analysis from the OMB? Complete with machine readable data sets? Or for any other agency reports. Or for that matter, for all publications by registered lobbyists. That could be real interesting.

Open Analysis (OA) as a natural successor to Open Data.

That works for me.

You?

PS: More resources:

Create Dynamic R Statistical Reports Using R Markdown

R Markdown

Using R Markdown with RStudio

Writing papers using R Markdown

If journals started requiring R Markdown as a condition for publication, some aspects of research would become more transparent.

Some will say that authors will resist.

Assume Science or Nature has accepted your article on the condition that you use R Markdown.

Honestly, are you really going to say no?

I first saw this in a tweet by Scott Chamberlain.

February 22, 2014

OpenRFPs:…

Filed under: Government,Government Data,Open Data,Open Government — Patrick Durusau @ 8:45 pm

OpenRFPs: Open RFP Data for All 50 States by Clay Johnson.

From the post:

Tomorrow at CodeAcross we’ll be launching our first community-based project, OpenRFPs. The goal is to liberate the data inside of every state RFP listing website in the country. We hope you’ll find your own state’s RFP site, and contribute a parser.

The Department of Better Technology’s goal is to improve the way government works by making it easier for small, innovative businesses to provide great technology to government. But those businesses can barely make it through the front door when the RFPs themselves are stored in archaic systems, with sloppy user interfaces and disparate data formats, or locked behind paywalls.

I have posted a comment on the announcement suggesting they use UBL. But in any event, mapping the semantics of RFPs to enable wider participation would make an interesting project.

I first saw this in a tweet by Tim O’Reilly.

February 21, 2014

Fiscal Year 2015 Budget (US) Open Government?

Filed under: Government,Government Data,Open Data,Open Government — Patrick Durusau @ 11:58 am

Fiscal Year 2015 Budget

From the description:

Each year, the Office of Management and Budget (OMB) prepares the President’s proposed Federal Government budget for the upcoming Federal fiscal year, which includes the Administration’s budget priorities and proposed funding.

For Fiscal Year (FY) 2015– which runs from October 1, 2014, through September 30, 2015– OMB has produced the FY 2015 Federal Budget in four print volumes plus an all-in-one CD-ROM:

  1. the main “Budget” document with the Budget Message of the President, information on the President’s priorities and budget overviews by agency, and summary tables;
  2. “Analytical Perspectives” that contains analyses that are designed to highlight specified subject areas;
  3. “Historical Tables” that provides data on budget receipts, outlays, surpluses or deficits, Federal debt over a time period
  4. an “Appendix” with detailed information on individual Federal agency programs and appropriation accounts that constitute the budget.
  5. A CD-ROM version of the Budget is also available which contains all the FY 2015 budget documents in PDF format along with some additional supporting material in spreadsheet format.

You will also want a “Green Book”; the 2014 version carried this description:

Each February when the President releases his proposed Federal Budget for the following year, Treasury releases the General Explanations of the Administration’s Revenue Proposals. Known as the “Green Book” (or Greenbook), the document provides a concise explanation of each of the Administration’s Fiscal Year 2014 tax proposals for raising revenue for the Government. This annual document clearly recaps each proposed change, reviewing the provisions in the Current Law, outlining the Administration’s Reasons for Change to the law, and explaining the Proposal for the new law. Ideal for anyone wanting a clear summary of the Administration’s policies and proposed tax law changes.

Did I mention that the four volumes for the budget in print with CD-ROM are $250? And last year the Green Book was $75?

For $325.00, you can have a print copy and PDF of the Budget plus a print copy of the Green Book.

Questions:

  1. Would machine readable versions of the Budget + Green Book make it easier to explore and compare the information within?
  2. Are PDFs and print volumes what President Obama considers to be “open government?”
  3. Who has the advantage in policy debates, the OMB and Treasury with machine readable versions of these documents or the average citizen who has the PDFs and print?
  4. Do you think OMB and Treasury didn’t get the memo? Open Data Policy-Managing Information as an Asset

Public policy debates cannot be fairly conducted without meaningful access to data on public policy issues.

February 12, 2014

…Open GIS Mapping Data To The Public

Filed under: Geographic Data,GIS,Maps,Open Data — Patrick Durusau @ 9:13 pm

Esri Allows Federal Agencies To Open GIS Mapping Data To The Public by Alexander Howard.

From the post:

A debate in the technology world that’s been simmering for years, about whether mapping vendor Esri will allow public geographic information systems (GIS) to access government customers’ data, finally has an answer: The mapping software giant will take an unprecedented step, enabling thousands of government customers around the U.S. to make their data on the ArcGIS platform open to the public with a click of a mouse.

“Everyone starting to deploy ArcGIS can now deploy an open data site,” Andrew Turner, chief technology officer of Esri’s Research and Development Center in D.C., said in an interview. “We’re in a unique position here. Users can just turn it on the day it becomes public.”

Government agencies can use the new feature to turn geospatial information systems data in Esri’s format into migratable, discoverable, and accessible open formats, including CSVs, KML and GeoJSON. Esri will demonstrate the ArcGIS feature in ArcGIS at the Federal Users Conference in Washington, D.C. According to Turner, the new feature will go live in March 2014.
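Those output formats are straightforward to consume once published. As one example, a minimal sketch in Python (the file names are hypothetical) that flattens GeoJSON point features into CSV rows:

import csv
import json

def geojson_points_to_csv(src="sites.geojson", dst="sites.csv"):
    """Write one CSV row per point feature: longitude, latitude, then the properties."""
    with open(src) as f:
        features = json.load(f)["features"]
    # Collect every property name so the CSV header covers all features.
    prop_names = sorted({key for feat in features for key in feat["properties"]})
    with open(dst, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["longitude", "latitude", *prop_names])
        for feat in features:
            lon, lat = feat["geometry"]["coordinates"][:2]
            writer.writerow([lon, lat, *[feat["properties"].get(p, "") for p in prop_names]])

if __name__ == "__main__":
    geojson_points_to_csv()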

I’m not convinced that GIS data alone is going to make government more transparent but it is a giant step in the right direction.

To have even partial transparency in government, you would need not only GIS data but also that data correlated with property sales and purchases going back decades, along with the legal ownership of property traced past shell corporations and holding companies, to say nothing of the social, political and professional relationships of those who benefited from various decisions. For a start.

Still, the public may be a better starting place to demand transparency with this type of data.

February 9, 2014

Medical research—still a scandal

Filed under: Medical Informatics,Open Access,Open Data,Research Methods — Patrick Durusau @ 5:45 pm

Medical research—still a scandal by Richard Smith.

From the post:

Twenty years ago this week the statistician Doug Altman published an editorial in the BMJ arguing that much medical research was of poor quality and misleading. In his editorial entitled, “The Scandal of Poor Medical Research,” Altman wrote that much research was “seriously flawed through the use of inappropriate designs, unrepresentative samples, small samples, incorrect methods of analysis, and faulty interpretation.” Twenty years later I fear that things are not better but worse.

Most editorials like most of everything, including people, disappear into obscurity very fast, but Altman’s editorial is one that has lasted. I was the editor of the BMJ when we published the editorial, and I have cited Altman’s editorial many times, including recently. The editorial was published in the dawn of evidence based medicine as an increasing number of people realised how much of medical practice lacked evidence of effectiveness and how much research was poor. Altman’s editorial with its concise argument and blunt, provocative title crystallised the scandal.

Why, asked Altman, is so much research poor? Because “researchers feel compelled for career reasons to carry out research that they are ill equipped to perform, and nobody stops them.” In other words, too much medical research was conducted by amateurs who were required to do some research in order to progress in their medical careers.

Ethics committees, who had to approve research, were ill equipped to detect scientific flaws, and the flaws were eventually detected by statisticians, like Altman, working as firefighters. Quality assurance should be built in at the beginning of research not the end, particularly as many journals lacked statistical skills and simply went ahead and published misleading research.

If you are thinking things are better today, consider a further comment from Richard:

The Lancet has this month published an important collection of articles on waste in medical research. The collection has grown from an article by Iain Chalmers and Paul Glasziou in which they argued that 85% of expenditure on medical research ($240 billion in 2010) is wasted. In a very powerful talk at last year’s peer review congress John Ioannidis showed that almost none of thousands of research reports linking foods to conditions are correct and how around only 1% of thousands of studies linking genes with diseases are reporting linkages that are real. His famous paper “Why most published research findings are false” continues to be the most cited paper of PLoS Medicine.

Not that I think open access would be a panacea for poor research quality but at least it would provide the opportunity for discovery.

All this talk about medical research reminds me of DARPA’s Big Mechanism program. Assuming the research data on pathways is no better or worse than the data mapping genes to diseases, DARPA will be spending $42 million to mine data with 1% accuracy.

A better use of those “Big Mechanism” dollars would be to test solutions to produce better medical research for mining.

1% sounds like low-grade ore to me.
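For anyone curious where figures like that come from, here is a minimal sketch of the positive predictive value formula from Ioannidis’ “Why most published research findings are false,” with illustrative inputs for the gene-disease case; the pre-study odds, power and alpha below are assumptions chosen for the example, not measured values.

```python
def positive_predictive_value(pre_study_odds, power=0.8, alpha=0.05):
    """Ioannidis (2005) PPV for a single, unbiased study.

    pre_study_odds (R): ratio of true to false candidate relationships
    in the field before the study is run.
    """
    r = pre_study_odds
    return (power * r) / (r - power * r + alpha)

# Illustrative only: if 1 in 1,000 candidate gene-disease links is real,
# even a well-powered study produces "positive" findings that are true
# about 1.6% of the time.
print(f"{positive_predictive_value(1 / 1000):.1%}")
```

Low pre-study odds, not sloppy statistics alone, are what drive the yield down to low-grade ore.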

February 1, 2014

Academic Torrents!

Filed under: Data,Open Access,Open Data — Patrick Durusau @ 4:02 pm

Academic Torrents!

From the homepage:

Currently making 1.67TB of research data available.

Sharing data is hard. Emails have size limits, and setting up servers is too much work. We’ve designed a distributed system for sharing enormous datasets – for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds. Contact us at joecohen@cs.umb.edu.
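Under the hood this is plain BitTorrent, so any client will do. A minimal sketch, assuming aria2 is installed locally; the dataset URL below is an illustrative placeholder, not a real identifier.

```python
import subprocess
import urllib.request

# Placeholder URL: substitute the .torrent link for the dataset you want.
torrent_url = "https://academictorrents.com/download/<dataset-hash>.torrent"
local_torrent = "dataset.torrent"

urllib.request.urlretrieve(torrent_url, local_torrent)

# Hand the torrent to a locally installed client (aria2 here); any
# BitTorrent client would work. --seed-time=0 stops after the download,
# though seeding back is the polite thing to do on a community tracker.
subprocess.run(["aria2c", "--dir=data", "--seed-time=0", local_torrent], check=True)
```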

Some data sets you have probably already seen but perhaps several you have not! Like the crater data set for Mars!

Enjoy!

I first saw this in a tweet by Tony Ojeda.

January 31, 2014

Scientific Data

Filed under: Open Data,Open Science — Patrick Durusau @ 1:03 pm

Scientific Data

From the homepage:

Scientific Data is a new open-access, online-only publication for descriptions of scientifically valuable datasets. It introduces a new type of content called the Data Descriptor designed to make your data more discoverable, interpretable and reusable. Scientific Data is currently calling for submissions, and will launch in May 2014.

The Data Descriptors are described in more detail in “Metadata associated with Data Descriptor articles to be released under CC0 waiver,” which includes this overview:

Box 1. Overview of information in Data Descriptor metadata

Metadata files will be released in the ISA-Tab format, and potentially in other formats in the future, such as Linked Data. An example metadata file is available here, associated with one of our sample Data Descriptors. The information in these files is designed to be a machine-readable supplement to the main Data Descriptor article.

  • Article citation information: Manuscript title, Author list, DOI, publication date, etc
  • Subject terms: according to NPG’s own subject categorization system
  • Annotation of the experimental design and main technologies used: Annotation terms will be derived from community-based ontologies wherever possible. Fields are derived from the ISA framework and include: Design Type, Measurement Type, Factors, Technology Type, and Technology Platform.
  • Information about external data records: Names of the data repositories, data record accession or DOIs, and links to the externally-stored data records
  • Structured tables that provide a detailed accounting of the experimental samples and data-producing assays, including characteristics of samples or subjects of the study, such as species name and tissue type, described using standard terminologies.

For more information on the value of this structured content and how it relates to the narrative article-like content see this earlier blog post by our Honorary Academic Editor, Susanna-Assunta Sansone.
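Because ISA-Tab is tab-delimited text, the machine-readable supplement should be usable with nothing beyond a standard library. A minimal sketch, assuming a hypothetical study sample file with typical ISA-Tab column headings; the file name and column names are illustrative.

```python
import csv

# Hypothetical ISA-Tab study file; real Data Descriptor metadata will have
# its own file and column names.
with open("s_study_samples.txt", newline="") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        # Pull out a couple of sample characteristics for a quick inventory.
        print(row.get("Sample Name"), row.get("Characteristics[organism]"))
```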

Nature is taking the lead in this effort, which should bring a sense of security to generations of researchers: security in knowing that Nature takes the rights of authors seriously and that the results will be professional grade.

I am slightly concerned that there is no obvious mechanism for maintaining “annotation terms” from community-based ontologies, or other terms, as terminology changes over time. Change in the vocabulary of any discipline is too familiar to require citation. As those terms change, so will access to valuable historical resources.

Looking at the Advisory Panel, it is heavily weighted in favor of medical and biological sciences. Is there an existing publication that performs a similar function for data sets from physics, astronomy, botany, etc.?

I first saw this in a tweet by ChemConnector.

Open Science Leaps Forward! (Johnson & Johnson)

Filed under: Bioinformatics,Biomedical,Data,Medical Informatics,Open Data,Open Science — Patrick Durusau @ 11:15 am

In Stunning Win For Open Science, Johnson & Johnson Decides To Release Its Clinical Trial Data To Researchers by Matthew Herper.

From the post:

Drug companies tend to be secretive, to say the least, about studies of their medicines. For years, negative trials would not even be published. Except for the U.S. Food and Drug Administration, nobody got to look at the raw information behind those studies. The medical data behind important drugs, devices, and other products was kept shrouded.

Today, Johnson & Johnson is taking a major step toward changing that, not only for drugs like the blood thinner Xarelto or prostate cancer pill Zytiga but also for the artificial hips and knees made for its orthopedics division or even consumer products. “You want to know about Listerine trials? They’ll have it,” says Harlan Krumholz of Yale University, who is overseeing the group that will release the data to researchers.

….

Here’s how the process will work: J&J has enlisted The Yale School of Medicine’s Open Data Access Project (YODA) to review requests from physicians to obtain data from J&J products. Initially, this will only include products from the drug division, but it will expand to include devices and consumer products. If YODA approves a request, raw, anonymized data will be provided to the physician. That includes not just the results of a study, but the results collected for each patient who volunteered for it with identifying information removed. That will allow researchers to re-analyze or combine that data in ways that would not have been previously possible.

….

Scientists can make a request for data on J&J drugs by going to www.clinicaltrialstudytransparency.com.

The ability to “…re-analyze or combine that data in ways that would not have been previously possible…” is the public benefit of Johnson & Johnson’s sharing of data.
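As a concrete, if simplified, illustration of what “combine that data” can mean, here is a minimal fixed-effect, inverse-variance meta-analysis over per-trial summaries. The effect estimates and standard errors below are invented for the example and do not come from any J&J or YODA dataset.

```python
import math

# Invented log odds ratios and standard errors for two hypothetical trials.
trials = [
    {"name": "trial_a", "effect": -0.25, "se": 0.10},
    {"name": "trial_b", "effect": -0.10, "se": 0.08},
]

# Fixed-effect, inverse-variance weighting: a standard way to pool
# per-trial estimates once the underlying data are shared.
weights = [1 / t["se"] ** 2 for t in trials]
pooled = sum(w * t["effect"] for w, t in zip(weights, trials)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f})")
```

With patient-level data the re-analysis can go much further, but even summary-level pooling was hard to do when trial results were kept shrouded.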

With any luck, this will be the start of a general trend among drug companies.

Mappings of the semantics of such data sets should be contributed back to the Yale School of Medicine’s Open Data Access Project (YODA), to further enhance re-use of these data sets.

January 20, 2014

OpenAIRE Legal Study has been published

Filed under: Law,Licensing,Open Access,Open Data,Open Source — Patrick Durusau @ 2:14 pm

OpenAIRE Legal Study has been published

From the post:

Guibault, Lucie; Wiebe, Andreas (Eds) (2013) Safe to be Open: Study on the protection of research data and recommendation for access and usage. The full-text of the book is available (PDF, ca. 2 MB ) under the CC BY 4.0 license. Published by University of Göttingen Press (Copies can be ordered from the publisher’s website)

Any e-infrastructure which primarily relies on harvesting external data sources (e.g. repositories) needs to be fully aware of any legal implications for re-use of this knowledge, and further application by 3rd parties. OpenAIRE’s legal study will put forward recommendations as to applicable licenses that appropriately address scientific data in the context of OpenAIRE.

CAUTION: Safe to be Open is an EU-centric publication and, while very useful in copyright discussions elsewhere, should not be relied upon as legal advice. (That’s not an opinion about relying on it in the EU. Ask local counsel for that advice.)

I say that having witnessed too many licensing discussions that were uninformed by legal counsel. Entertaining to be sure but if I have a copyright question, I will be posing it to counsel who is being paid to be correct.

At least until ignorance of the law becomes an affirmative shield against liability for copyright infringement. 😉

To be sure, I recommend reading Safe to be Open as a means of becoming informed about the contours of access and usage of research data in the EU, and possibly as a model for solutions in legal systems that lag behind the EU in that regard.

Personally I favor Attribution CC BY because the other CC licenses presume the licensed material was created without unacknowledged/uncompensated contributions from others.

Think of all the people who taught you to read, write, program and all the people whose work you have read, been influenced by, etc. Hopefully you can add to the sum of communal knowledge but it is unfair to claim ownership of the whole of communal knowledge simply because you contributed a small part. (That’s not legal advice either, just my personal opinion.)

Without all the instrument makers, composers, singers, organists, etc. that came before him, Mozart would not be the same Mozart that we remember. Just as gifted, but without a context to display his gifts.

Patent and copyright need to be recognized as “thumbs on the scale” against development of services and knowledge. That’s where I would start a discussion of copyright and patents.

January 12, 2014

Transparency and Bank Failures

Filed under: Finance Services,Open Data,Transparency — Patrick Durusau @ 11:40 am

The Relation Between Bank Resolutions and Information Environment: Evidence from the Auctions for Failed Banks by João Granja.

Abstract:

This study examines the impact of disclosure requirements on the resolution costs of failed banks. Consistent with the hypothesis that disclosure requirements mitigate information asymmetries in the auctions for failed banks, I find that, when failed banks are subject to more comprehensive disclosure requirements, regulators incur lower costs of closing a bank and retain a lower portion of the failed bank’s assets, while bidders that are geographically more distant are more likely to participate in the bidding for the failed bank. The paper provides new insights into the relation between disclosure and the reorganization of a banking system when the regulators’ preferred plan of action is to promote the acquisition of undercapitalized banks by healthy ones. The results suggest that disclosure regulation policy influences the cost of resolution of a bank and, as a result, could be an important factor in the definition of the optimal resolution strategy during a banking crisis event.

A reminder that transparency needs to be broader than open data in science and government.

In the case of bank failures, transparency lowers the cost of such failures for the public.

Some interests profit from less transparency in bank failures and other interests (like the public) profit from greater transparency.

If bank failure doesn’t sound like a current problem, consider Map of Banks Failed since 2008. (Select from Failed Banks Map (under Quick Links) to display the maps.) U.S. only. Do you know of a similar map for other countries?

Speaking of transparency, it would be interesting to track the formal, financial and social relationships of those acquiring failed bank assets.

You know, the ones that are selling for less than fair market value due to a lack of transparency.
