Archive for the ‘Ecoinformatics’ Category

Earthdata Search – Smells Like A Topic Map?*

Sunday, February 28th, 2016

Earthdata Search

From the webpage:

Search NASA Earth Science data by keyword and filter by time or space.

After choosing tour:

Keyword Search

Here you can enter search terms to find relevant data. Search terms can be science terms, instrument names, or even collection IDs. Let’s start by searching for Snow Cover NRT to find near real-time snow cover data. Type Snow Cover NRT in the keywords box and press Enter.

That returns a screen in three sections, left to right: Browse Collections; 21 Matching Collections (Add collections to your project to compare and retrieve their data); and a world map (navigate by grabbing the view).

Under Browse Collections:

In addition to searching for keywords, you can narrow your search through this list of terms. Click Platform to expand the list of platforms (still in a tour box).

Next step:

Now click Terra to select the Terra satellite.

Comment: Wondering how I will know which “platform” or “instrument” to select? There may be more/better documentation but I haven’t seen it yet.
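
For readers who would rather script the tour's keyword-plus-platform search than click through it, here is a minimal sketch that only builds a query URL. The endpoint and the "keyword" and "platform" parameter names are my assumptions about the CMR search API, not something taken from the tour, and build_search_url is a name I made up. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: the public CMR collection search endpoint and its
# "keyword" and "platform" query parameters. Check the CMR API
# documentation before relying on either.
CMR_COLLECTIONS = "https://cmr.earthdata.nasa.gov/search/collections.json"


def build_search_url(keyword, platform=None):
    """Return a collection search URL for a keyword and optional platform."""
    params = {"keyword": keyword}
    if platform:
        params["platform"] = platform
    # urlencode escapes spaces and other special characters.
    return CMR_COLLECTIONS + "?" + urlencode(params)


# The search from the tour: "Snow Cover NRT", narrowed to Terra.
url = build_search_url("Snow Cover NRT", platform="Terra")
print(url)
```

Building the URL separately from fetching it makes it easy to inspect the query before running it.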

The data follows the Unified Metadata Model (UMM):

NASA’s Common Metadata Repository (CMR) is a high-performance, high-quality repository for earth science metadata records that is designed to handle metadata at the Concept level. Collections and Granules are common metadata concepts in the Earth Observation (EO) world, but this can be extended out to Visualizations, Parameters, Documentation, Services, and more. The CMR metadata records are supplied by a diverse array of data providers, using a variety of supported metadata standards, including:


Initially, designers of the CMR considered standardizing all CMR metadata to a single, interoperable metadata format – ISO 19115. However, NASA decided to continue supporting multiple metadata standards in the CMR — in response to concerns expressed by the data provider community over the expense involved in converting existing metadata systems to systems capable of generating ISO 19115. In order to continue supporting multiple metadata standards, NASA designed a method to easily translate from one supported standard to another and constructed a model to support the process. Thus, the Unified Metadata Model (UMM) for EOSDIS metadata was born as part of the EOSDIS Metadata Architecture Studies (MAS I and II) conducted between 2012 and 2013.

What is the UMM?

The UMM is an extensible metadata model which provides a ‘Rosetta stone’ or cross-walk for mapping between CMR-supported metadata standards. Rather than create mappings from each CMR-supported metadata standard to each other, each standard is mapped centrally to the UMM model, thus reducing the number of translations required from n x (n-1) to 2n.
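
The counting argument and the hub idea can be sketched in a few lines. This is only an illustration of mapping through a central model. The format names and field names below are hypothetical, not actual CMR standards or UMM fields.

```python
def pairwise_translations(n):
    """Directed translators needed when every standard maps to every other."""
    return n * (n - 1)


def hub_translations(n):
    """Translators needed when each standard maps to and from a central model."""
    return 2 * n


# Hypothetical crosswalks: native field name -> hub field name.
TO_HUB = {
    "format_a": {"a_title": "title", "a_start": "start_date"},
    "format_b": {"b_name": "title", "b_begin": "start_date"},
}


def translate(record, source, target):
    """Translate a record from source to target by way of the hub model."""
    # Step 1: source fields -> hub fields.
    hub = {TO_HUB[source][field]: value for field, value in record.items()}
    # Step 2: hub fields -> target fields, using the inverted target crosswalk.
    from_hub = {hub_field: field for field, hub_field in TO_HUB[target].items()}
    return {from_hub[hub_field]: value for hub_field, value in hub.items()}


for n in (3, 5, 10):
    print(n, pairwise_translations(n), hub_translations(n))

print(translate({"a_title": "Snow Cover", "a_start": "2016-02-28"},
                "format_a", "format_b"))
```

Note that the two counts are equal at n = 3, so the hub only saves work once there are four or more standards, and the saving grows quickly after that.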

Here is the mapping graphic:


Granting that profiles don’t make the basis for mappings explicit, the mappings have the same impact after mapping as a topic map would after merging.

The site could use better documentation for the interface and data, at least in the view of this non-expert in the area.

Thoughts on documentation for the interface or making the mapping more robust via use of a topic map?

I first saw this in a tweet by Kirk Borne.

*Smells Like A Topic Map – Sorry, culture bound reference to a routine on the first Cheech & Chong album. No explanation would do it justice.

Economic Crime and Criminal Graphics

Friday, February 26th, 2016

Adjusting the Lens on Economic Crime (Global Economic Crime Survey 2016)

From the foreword:

In business, the promise of opportunity is often tempered with the reality of risk.

This formula holds true not only for those working to build and sustain a business, but also for those looking to victimise one.

The story told in our 2016 Global Economic Crime Survey is one with which we are all too familiar: economic crime continues to forge new paths into business, regulatory compliance adds stress and burden to responsible businesses, and an increasingly complicated threat landscape challenges the balance between resources and growth. The moral of this story is not new, but is one that may have been forgotten in our haste to succeed in today’s fast-paced global marketplace.

Our report challenges you to adjust your lens on economic crime and refocus your path towards opportunity around strategic preparation.

This work needs to be embedded in your day-to-day decision-making, and supported by strong corporate ethics. Preparing your company for sustained success in today’s world is no longer an exercise in mapping out plans that live out their days in dusty binders on a director’s shelf. Preparation today is a living, breathing exercise; one that must be constantly tweaked, practiced and tended to, so that it is ready when threats become realities.

Understanding the vision of your company and strategically mapping out a plan for both growth as well as a plan for defence – one that is based on your unique threat landscape and profile – will be the difference between realizing your opportunity or allowing those who want to victimise you to capitalise on theirs.

It wasn’t entirely clear to me what was meant by “economic crime,” aside from possibly a different method of making a profit than the complaining enterprise. It’s all capitalism. Crime is just capitalism that doesn’t follow a particular set of local rules.

I am bolstered in that belief by Fig. 2 from the paper:


I have always puzzled over bribery & corruption, for example. Why piece-work corruption is any worse than structural corruption (the sort preferred in the United States) has never been clear to me.

It isn’t clear how useful you will find the report, especially given graphics like the one found at Fig. 3:


I have puzzled over it and the accompanying text for some time.

Does the 49% for financial services represent its percentage of the 36% global crime rate? Seems unlikely because government/state owned follows at 44% and retail & consumer at 43%, which already puts us at 136%, without including the other categories.
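
A quick check of that arithmetic. The three figures are the ones quoted above from Fig. 3, and the variable names are mine. If the sector figures were shares of a single whole, they could not add up to more than 100.

```python
# Sector figures as quoted from Fig. 3 (percent).
sector_rates = {
    "financial services": 49,
    "government/state owned": 44,
    "retail & consumer": 43,
}

total = sum(sector_rates.values())

# Shares of one whole can never exceed 100 in total.
shares_of_one_whole = total <= 100

print(total, shares_of_one_whole)
```

Three sectors alone break the 100% ceiling, so whatever the figures measure, it is not a slice of one overall rate.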

Is it that 49% of financial services are economic crimes? That’s possible but I would hardly expect them to claim that title.

Sometimes, when graphics make no sense, they literally make no sense.

I think you can safely skip this paper.

Core Econ: a free economics textbook

Wednesday, November 5th, 2014

Core Econ: a free economics textbook by Cathy O’Neil.

From the post:

Today I want to tell you guys about Core Econ, a free (although you do have to register) textbook my buddy Suresh Naidu is using this semester to teach out of and is also contributing to, along with a bunch of other economists.

(image omitted)

It’s super cool, and I wish a class like that had been available when I was an undergrad. In fact I took an economics course at UC Berkeley and it was a bad experience – I couldn’t figure out why anyone would think that people behaved according to arbitrary mathematical rules. There was no discussion of whether the assumptions were valid, no data to back it up. I decided that anybody who kept going had to be either religious or willing to say anything for money.

Not much has changed, and that means that Econ 101 is a terrible gateway for the subject, letting in people who are mostly kind of weird. This is a shame because, later on in graduate level economics, there really is no reason to use toy models of society without argument and without data; the sky’s the limit when you get through the bullshit at the beginning. The goal of the Core Econ project is to give students a taste for the good stuff early; the subtitle on the webpage is teaching economics as if the last three decades happened.

Skepticism of government economic forecasts and data requires knowledge of the lingo and assumptions of economics. This introduction won’t get you to that level but it is a good starting place.


EcoData Retriever

Sunday, October 5th, 2014

EcoData Retriever

From the webpage:

Most ecological datasets do not adhere to any agreed-upon standards in format, data structure or method of access. As a result acquiring and utilizing available datasets can be a time consuming and error prone process. The EcoData Retriever automates the tasks of finding, downloading, and cleaning up ecological data files, and then stores them in a local database. The automation of this process reduces the time for a user to get most large datasets up and running by hours, and in some cases days. Small datasets can be downloaded and installed in seconds and large datasets in minutes. The program also cleans up known issues with the datasets and automatically restructures them into standard formats before inserting the data into your choice of database management systems (Microsoft Access, MySQL, PostgreSQL, and SQLite, on Windows, Mac and Linux).
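
To make the "clean up, restructure, store in a local database" step concrete, here is a minimal sketch using only the Python standard library. It is not the EcoData Retriever's code or API. The sample data, the missing-value markers, and the function names are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical messy input: inconsistent header spacing, "NA" for missing.
RAW = """Species Name, Individuals ,Site
Quercus agrifolia,12,A
Pinus ponderosa,NA,B
"""


def clean_header(name):
    """Normalize a column name: trim, lowercase, spaces to underscores."""
    return name.strip().lower().replace(" ", "_")


def load_csv(conn, table, text, missing=("", "NA")):
    """Clean CSV text and store it in a new table; return the column names."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header = [clean_header(h) for h in rows[0]]
    columns = ", ".join(f'"{h}" TEXT' for h in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    marks = ", ".join("?" for _ in header)
    for row in rows[1:]:
        # Trim whitespace and turn missing-value markers into NULL.
        values = [None if v.strip() in missing else v.strip() for v in row]
        conn.execute(f'INSERT INTO "{table}" VALUES ({marks})', values)
    conn.commit()
    return header


conn = sqlite3.connect(":memory:")
header = load_csv(conn, "plants", RAW)
print(header)
print(conn.execute('SELECT * FROM "plants"').fetchall())
```

The real work in a tool like this is knowing each dataset's particular quirks. The sketch only shows where such fixes would plug in.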

When faced with:

…datasets [that] do not adhere to any agreed-upon standards in format, data structure or method of access

you can:

  • Complain to fellow cube dwellers
  • Complain about data producers
  • Complain to the data producers
  • Create a solution to clean up and reformat the data as open source

Your choice?

I first saw this in a tweet by Dan McGlinn.

Piketty in R markdown

Tuesday, July 1st, 2014

Piketty in R markdown – we need some help from the crowd by Jeff Leek.

From the post:

Thomas Piketty’s book Capital in the 21st Century was a surprise best seller and the subject of intense scrutiny. A few weeks ago the Financial Times claimed that the analysis was riddled with errors, leading to a firestorm of discussion. A few days ago the London School of Economics posted a similar call to make the data open and machine readable, saying:

None of this data is explicitly open for everyone to reuse, clearly licenced and in machine-readable formats.

A few friends of Simply Stats had started on a project to translate his work from the Excel files where the original analysis resides into R. The people that helped were Alyssa Frazee, Aaron Fisher, Bruce Swihart, Abhinav Nellore, Hector Corrada Bravo, John Muschelli, and me. We haven’t finished translating all chapters, so we are asking anyone who is interested to help contribute to translating the book’s technical appendices into R markdown documents. If you are interested, please send pull requests to the gh-pages branch of this GitHub repo.

Hmmm, debate to be conducted based on known data sets?

That sounds like a radical departure from most public debates, to say nothing of debates in politics.

Dangerous because the general public may come to expect news reports, government budgets, documents, etc. to be accompanied by machine readable data files.

Even more dangerous if data files are compared to other data files, for consistency, etc.

No time to start like the present. Think about helping with the Piketty materials.

You may be helping to start a trend.

Berkeley Ecoinformatics Engine

Tuesday, January 21st, 2014

Berkeley Ecoinformatics Engine – An open API serving UC Berkeley’s Natural History Data

From the News page:

We are thrilled to release an early version of the Berkeley Ecoinformatics Engine API! We have a lot of data and tools that we’ll be pushing out in future releases so keep an eye out as we are just getting started.

To introduce eco-minded developers to this new resource, we are serving up two key data sets that will be available for this weekend’s EcoHackSF:

For this hackathon, we are encouraging participants to help us document our changing environment. Here’s the abstract:

Wieslander Vegetation Mapping Project – Data from the 1920s needs an update

During the 1920’s and 30’s Albert Everett Wieslander and his team at USGS compiled an amazing and comprehensive dataset known as the Wieslander Vegetation Mapping Project. The data collected includes landscape photos, species inventories, plot maps, and vegetation maps covering most of California. Several teams have been digitizing this valuable historic data over the last ten years, and much of it is now complete. We will be hosting all of the finalized data in our Berkeley Ecoinformatics Engine.

Our task for the EcoHack community will be to develop a web/mobile application that will allow people to view and find the hundreds of now-geotagged landscape photos, and reshoot the same scene today. These before and after images will provide scientists and enthusiasts with an invaluable view of how these landscapes have changed over the last century.

Though this site is focused on the development of the EcoEngine, this project is a part of a larger effort to address the challenge of identifying the interactions and feedbacks between different species and their environment. It will promote the type of multi-disciplinary building that will lead to breakthroughs in our understanding of the biotic input and response to global change. The EcoEngine will serve to unite previously disconnected perspectives from paleo-ecologists, population biologists, and ecologists and make possible the testing of predictive models of global change, a critical advance in making the science more rigorous. Visit to learn more.

Hot damn! Another project trying to reach across domain boundaries and vocabularies to address really big problems.

Maybe the original topic maps effort was just a little too early.