Archive for the ‘Health care’ Category

The PokitDok HealthGraph

Tuesday, January 20th, 2015

The PokitDok HealthGraph by Denise Gosnell, PhD and Alec Macrae.

From the post:


While the front-end team has been busy putting together version 3 of our healthcare marketplace, the data science team has been hard at work on several things that will soon turn into new products. Today, I’d like to give you a sneak peek at one of these projects, one that we think will profoundly change the way you think about health data. We call it the PokitDok HealthGraph. Let’s ring in the New Year with some data science!

Everyone’s been talking about Graph Theory, but what is it, exactly?

And we aren’t talking about bar graphs and pie charts.

Social networks have brought the world of graph theory to the forefront of conversation. Even though graph theory has been around since Euler solved the infamous Konigsberg bridge problem in the 1700’s, we can thank the current age of social networking for giving graph theory a modern revival.

At the very least, graph theory is the art of connecting the dots, kind of like those sweet pictures you drew as a kid. A bit more formally, graph theory studies relationships between people, places and/or things. Take any ol’ social network – Facebook, for example, uses a graph database to help people find friends and interests. In graph theory, we represent this type of information with nodes (dots) and edges (lines) where the nodes are people, places and/or things and the lines represent their relationship.

To make a long story short: healthcare is about you and connecting you with quality care. When data scientists think of connecting things together, graphs are most often the direction we go.

At PokitDok, we like to look at your healthcare needs as a social network, aka: your personal HealthGraph. The HealthGraph is a network of doctors, other patients, insurance providers, common ailments and all of the potential connections between them.
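The nodes-and-edges idea in the quoted passage can be sketched in a few lines of Python. This is only an illustration of the general technique, not PokitDok's implementation: the `HealthGraph` class, the node names, and the relationship labels below are all hypothetical.

```python
from collections import defaultdict


class HealthGraph:
    """Tiny undirected graph: nodes are people, places or things."""

    def __init__(self):
        self.kinds = {}                  # node name -> kind of node
        self.edges = defaultdict(dict)   # node name -> {neighbor: relationship}

    def add_node(self, name, kind):
        self.kinds[name] = kind

    def add_edge(self, a, b, relationship):
        # Store the edge in both directions so either end can be queried.
        self.edges[a][b] = relationship
        self.edges[b][a] = relationship

    def neighbors(self, name, kind=None):
        """Neighbors of a node, optionally filtered by kind."""
        return sorted(n for n in self.edges[name]
                      if kind is None or self.kinds[n] == kind)


def doctors_for(graph, patient):
    """Doctors linked to one of the patient's ailments and one of their insurers."""
    ailments = set(graph.neighbors(patient, "ailment"))
    insurers = set(graph.neighbors(patient, "insurer"))
    matches = []
    for name, kind in graph.kinds.items():
        if kind != "doctor":
            continue
        linked = set(graph.edges[name])
        if linked & ailments and linked & insurers:
            matches.append(name)
    return sorted(matches)


# Hypothetical nodes (dots) and edges (lines).
graph = HealthGraph()
graph.add_node("Pat", "patient")
graph.add_node("Dr. Lee", "doctor")
graph.add_node("Dr. Ray", "doctor")
graph.add_node("Acme Health", "insurer")
graph.add_node("asthma", "ailment")
graph.add_edge("Pat", "Acme Health", "insured_by")
graph.add_edge("Pat", "asthma", "has")
graph.add_edge("Dr. Lee", "asthma", "treats")
graph.add_edge("Dr. Lee", "Acme Health", "accepts")
graph.add_edge("Dr. Ray", "asthma", "treats")

print(doctors_for(graph, "Pat"))
```

Connecting a patient to care becomes a question about paths in the graph: here, which doctors share both an ailment and an insurer with the patient.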

Hard to say in advance but it looks like Denise and Alec are close to the sweet spot on graph explanations for lay people. Having subject matter that is important to users helps. And using familiar names for the nodes of the graph works as well.

Worth following this series of posts to see if they continue along this path.

Project Tycho:… [125 Years of Disease Records]

Tuesday, December 3rd, 2013

Project Tycho: Data for Health

From the webpage:

After four years of data digitization and processing, the Project Tycho™ Web site provides open access to newly digitized and integrated data from the entire 125 years history of United States weekly nationally notifiable disease surveillance data since 1888. These data can now be used by scientists, decision makers, investors, and the general public for any purpose. The Project Tycho™ aim is to advance the availability and use of public health data for science and decision making in public health, leading to better programs and more efficient control of diseases.

Three levels of data have been made available: Level 1 data include data that have been standardized for specific analyses, Level 2 data include standardized data that can be used immediately for analysis, and Level 3 data are raw data that cannot be used for analysis without extensive data management. See the video tutorial.

An interesting factoid concerning disease reporting in the United States, circa 1917: influenza, in 1917, was not a reportable disease. See The Great Influenza by John Barry.

I am curious about the Level 3 data.

Mostly in terms of how much “data management” would be needed to make it useful?

Could be a window into the data management required to unify medical records in the United States.

Or simply a way to practice your data management skills.

Modern Healthcare Architectures Built with Hadoop

Monday, December 2nd, 2013

Modern Healthcare Architectures Built with Hadoop by Justin Sears.

From the post:

We have heard plenty in the news lately about healthcare challenges and the difficult choices faced by hospital administrators, technology and pharmaceutical providers, researchers, and clinicians. At the same time, consumers are experiencing increased costs without a corresponding increase in health security or in the reliability of clinical outcomes.

One key obstacle in the healthcare market is data liquidity (for patients, practitioners and payers) and some are using Apache Hadoop to overcome this challenge, as part of a modern data architecture. This post describes some healthcare use cases, a healthcare reference architecture and how Hadoop can ease the pain caused by poor data liquidity.

As you would guess, I like the phrase data liquidity. 😉

And Justin lays out the areas where we are going to find “poor data liquidity.”

Source data comes from:

  • Legacy Electronic Medical Records (EMRs)
  • Transcriptions
  • PACS
  • Medication Administration
  • Financial
  • Laboratory (e.g. SunQuest, Cerner)
  • RTLS (for locating medical equipment & patient throughput)
  • Bio Repository
  • Device Integration (e.g. iSirona)
  • Home Devices (e.g. scales and heart monitors)
  • Clinical Trials
  • Genomics (e.g. 23andMe, Cancer Genomics Hub)
  • Radiology (e.g. RadNet)
  • Quantified Self Sensors (e.g. Fitbit, SmartSleep)
  • Social Media Streams (e.g. FourSquare, Twitter)

But then I don’t see what part of the Hadoop architecture addresses the problem of “poor data liquidity.”

Do you?

I thought I had found it when Charles Boicey (in the UCIH case study) says:

“Hadoop is the only technology that allows healthcare to store data in its native form. If Hadoop didn’t exist we would still have to make decisions about what can come into our data warehouse or the electronic medical record (and what cannot). Now we can bring everything into Hadoop, regardless of data format or speed of ingest. If I find a new data source, I can start storing it the day that I learn about it. We leave no data behind.”

But that’s not “data liquidity,” not in any meaningful sense of the word. Dumping your data to paper would be just as effective and probably less costly.

To be useful, “data liquidity” must have a sense of being integrated with data from diverse sources: it should present the clinician, researcher, health care facility, etc. with all the data about a patient, not just some of it.
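A rough sketch of the difference between storing data and integrating it, in Python. Everything here is made up for illustration: the source names, the record fields, and above all the `SAME_PATIENT` identity map, which is the part that storage alone does not give you.

```python
from collections import defaultdict

# Hypothetical records, each kept in its source's native form.
SOURCES = {
    "emr": [{"mrn": "E-17", "allergy": "penicillin"}],
    "lab": [{"patient_no": "L-903", "test": "HbA1c", "value": 6.1}],
    "home_device": [{"user": "scale-42", "weight_kg": 81.5}],
}

# Each source names its patient identifier differently (assumed field names).
ID_FIELD = {"emr": "mrn", "lab": "patient_no", "home_device": "user"}

# The integration work: someone has to assert which identifiers
# refer to the same patient. Hypothetical mapping.
SAME_PATIENT = {
    ("emr", "E-17"): "patient-1",
    ("lab", "L-903"): "patient-1",
}


def integrate(sources, id_field, same_patient):
    """Group records by patient; report records nobody has mapped yet."""
    merged = defaultdict(list)
    unlinked = []
    for source, records in sources.items():
        for record in records:
            key = (source, record[id_field[source]])
            patient = same_patient.get(key)
            if patient is None:
                unlinked.append(key)
            else:
                merged[patient].append((source, record))
    return dict(merged), unlinked


merged, unlinked = integrate(SOURCES, ID_FIELD, SAME_PATIENT)
print(merged["patient-1"])
print("stored but not integrated:", unlinked)
```

The home device reading is stored just as safely as the rest, but without a mapping it never reaches the view of the patient.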

I also checked the McKinsey & Company report “The ‘Big Data’ Revolution in Healthcare.” I didn’t expect them to miss the data integration question and they didn’t.

The second exhibit in the McKinsey and Company report (the full report):

[Figure: big data integration]

The part in red reads:

Integration of data pools required for major opportunities.

I take that to mean that in order to have meaningful healthcare reform, integration of health care data pools is the first step.

Do you disagree?

And if that’s true, that we need integration of health care data pools first, do you think Hadoop can accomplish that auto-magically?

I don’t either.

Casualty Count for Obamacare (0)

Wednesday, November 20th, 2013

5 lessons IT leaders can learn from Obamacare rollout mistakes by Teena Hammond.

Teena reports on five lessons to be learned from the rollout:

  1. If you’re going to launch a new website, decide whether to use in-house talent or outsource. If you opt to outsource, hire a good contractor.
  2. Follow the right steps to hire the best vendor for the project, and properly manage the relationship.
  3. Have one person in charge of the project with absolute veto power.
  4. Do not gloss over any problems along the way. Be open and honest about the progress of the project. And test the site.
  5. Be ready for success or failure. Hope for the best but prepare for the worst and have guidelines to manage any potential failure.

There is a sixth lesson that emerges from Vaughn Bullard, CEO and founder of Build.Automate Inc., who is quoted in part saying:

“The contractor telling the government that it was ready despite the obvious major flaws in the system is just baffling to me. If I had an employee that did something similar, I would have terminated their employment. It’s pretty simple.”

What it comes down to in the end, Bullard said, is that, “Quality and integrity count in all things.”

To avoid repeated failures in the future (sixth lesson), terminate those responsible for the current failure.

All contractors and their staffs. Track the staffs in order to avoid the same staff moving to other contractors.

Terminate all appointed or hired staff who are responsible for the contract and/or management of the project.

Track former staff employment by contractors and refuse contracts wherever they are employed.

You may have noticed that the reported casualty count for the Obamacare failure has been zero.

What incentive exists for the next group of contract/project managers and/or contractors for “quality and integrity?”

That would be the same as the casualty count, zero.

PS: Before you protest the termination and ban of failures as cruel, consider its advantages as a wealth redistribution program.

The government may not get better service but it will provide opportunities for fraud and poor quality work from new participants.

Not to mention there are IT service providers who exhibit quality and integrity. Absent traditional mis-management, the government could happen upon one of those.

The tip for semantic technologies is to under-promise and over-deliver. Always.

Healthcare.gov website ‘didn’t have a chance in hell’

Tuesday, October 22nd, 2013

Healthcare.gov website ‘didn’t have a chance in hell’ by Patrick Thibodeau.

From the post:

A majority of large IT projects fail to meet deadlines, are over budget and don’t make their users happy. Such is the case with Healthcare.gov.

The U.S. is now racing to fix Healthcare.gov, the Affordable Care Act (ACA) website that launched Oct 1, by bringing in new expertise to fix it. Healthcare.gov’s problems include site availability due to excessive loads, incorrect data recording among other things.

President Barack Obama said Monday that there is “no excuse” for the problems at the site.

But his IT advisors shouldn’t be surprised — the success rate for large, multi-million dollar commercial and government IT projects is very low.

The Standish Group, which has a database of some 50,000 development projects, looked at the outcomes of multimillion dollar development projects and ran the numbers for Computerworld.

Of 3,555 projects from 2003 to 2012 that had labor costs of at least $10 million, only 6.4% were successful. The Standish data showed that 52% of the large projects were “challenged,” meaning they were over budget, behind schedule or didn’t meet user expectations. The remaining 41.4% were failures — they were either abandoned or started anew from scratch.

“They didn’t have a chance in hell,” said Jim Johnson, founder and chairman of Standish, of Healthcare.gov. “There was no way they were going to get this right – they only had a 6% chance,” he said.
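For a sense of scale, the quoted percentages can be turned into approximate project counts. A quick Python check, assuming each percentage applies to the full 3,555 projects (the article gives only the percentages, so the counts are estimates):

```python
# Figures quoted from the Computerworld article above.
TOTAL_PROJECTS = 3555
SHARES = {"successful": 6.4, "challenged": 52.0, "failed": 41.4}


def estimated_counts(total, shares):
    """Convert percentage shares into rounded project counts."""
    return {label: round(total * pct / 100) for label, pct in shares.items()}


counts = estimated_counts(TOTAL_PROJECTS, SHARES)
for label, n in counts.items():
    print(f"{label:>10}: about {n} projects")

# The quoted shares do not quite add up to 100 percent.
print("shares sum to", round(sum(SHARES.values()), 1), "percent")
```

That works out to roughly 228 successes out of 3,555 large projects over ten years.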

There is one reason that wasn’t offered for the failure.

Let me illustrate that reason.

In the Computerworld article I quoted above, the article mentions the FBI tanking the $170 million virtual case initiative.

Contractor: SAIC.

Just last month I saw this notice:

XXXXXXXXX, San Diego, Calif., was awarded a $35,883,761 cost-plus-incentive-fee contract for software engineering, hardware, integration, technical support, and training requirements of the Integrated Strategic Planning and Analysis Network targeting function, including the areas of National Target Base production and National Desired Ground Zero List development. Work is being performed at Offut Air Force Base, Neb., with an expected completion date of Sept. 30 2018. This contract was a competitive acquisition and two offers were received. No funds have been obligated at time of award. The 55th Contracting Squadron at Offut Air Force Base, Neb., is the contracting activity. (FA4600-13-D-0001) [From Defense News and Career Advice, September 19, 2013.]

Can you guess who the contractor is in that $35 million award?

If you guessed SAIC, you would be correct!

Where is the incentive to do a competent job on any contract?

If you fail on a government contract, you get to keep the money.

Not to mention that you are still in line for more multi-million dollar contracts.

I’m not on that gravy train but I don’t think that is what bothers me.

Doing poor quality work, in software projects or anywhere else, diminishes all the practitioners in a particular profession.

The first step towards a solution is for government and industry to stop repeating business with software firms that fail.

If smaller firms can’t match the paperwork/supervision required by layers of your project management, that’s a clue you need to do internal house cleaning.

Remember the quote about what is defined by doing the same thing over and over and expecting a different result?

Full Healthcare Interoperability “…may take some creative thinking.”

Tuesday, June 4th, 2013

Completing drive toward healthcare interoperability will be challenge by Ed Burns.

From the post:

The industry has made progress toward healthcare interoperability in the last couple years, but getting over the final hump may take some creative thinking. There are still no easy answers for how to build fully interoperable nationwide networks.

At the Massachusetts Institute of Technology CIO Symposium, held May 22 in Cambridge, Ma., Beth Israel Deaconess Medical Center CIO John Halamka, M.D., said significant progress has been made.

In particular, he pointed to the growing role of the Clinical Document Architecture (CDA) standard. Under the 2014 Certification Standards, EHR software must be able to produce transition of care documents in this form.

But not every vendor has reached the point where it fully supports this standard, and it is not the universal default for clinician data entry. Additionally, Halamka pointed out that information in health records tends to be incomplete. Often the worker responsible for entering important demographic data and other information into the record is the least-trained person on the staff, which can increase the risk of errors and produce bad data.

There are ways around the lack of vendor support for healthcare data interoperability. Halamka said most states’ information exchanges can function as middleware. As an example, he talked about how Beth Israel is able to exchange information with Atrius Health, a group of community-based hospitals in Eastern Massachusetts, across the state’s HIE even though the two networks are on different systems.

“You can get around what the vendor is able to do with middleware,” Halamka said.

But while these incremental changes have improved data interoperability, supporting full interconnectedness across all vendor systems and provider networks could take some new solutions.

Actually “full” healthcare interoperability isn’t even a possibility.

What we can do is decide how much interoperability is worth in particular situations and do the amount required.

Everyone in the healthcare industry has one or more reasons for the formats and semantics they use now.

Changing those formats and semantics requires not only changing the software but training the people who use the software and the data it produces.

Not to mention the small task of deciding on what basis interoperability will be built.

As you would expect, I think a topic map as middleware solution, one that ties diverse systems together in a re-usable way, is the best option.

Convincing the IT system innocents that write healthcare policy that demanding interoperability isn’t an effective strategy would be a first step.

What would you suggest as a second step?

Medicare Provider Charge Data

Thursday, May 30th, 2013

Medicare Provider Charge Data

From the webpage:

As part of the Obama administration’s work to make our health care system more affordable and accountable, data are being released that show significant variation across the country and within communities in what hospitals charge for common inpatient services.

The data provided here include hospital-specific charges for the more than 3,000 U.S. hospitals that receive Medicare Inpatient Prospective Payment System (IPPS) payments for the top 100 most frequently billed discharges, paid under Medicare based on a rate per discharge using the Medicare Severity Diagnosis Related Group (MS-DRG) for Fiscal Year (FY) 2011. These DRGs represent almost 7 million discharges or 60 percent of total Medicare IPPS discharges.

Hospitals determine what they will charge for items and services provided to patients and these charges are the amount the hospital bills for an item or service. The Total Payment amount includes the MS-DRG amount, bill total per diem, beneficiary primary payer claim payment amount, beneficiary Part A coinsurance amount, beneficiary deductible amount, beneficiary blood deductible amount and DRG outlier amount.

For these DRGs, average charges and average Medicare payments are calculated at the individual hospital level. Users will be able to make comparisons between the amount charged by individual hospitals within local markets, and nationwide, for services that might be furnished in connection with a particular inpatient stay.

Data are being made available in Microsoft Excel (.xlsx) format and comma separated values (.csv) format.

Inpatient Charge Data, FY2011, Microsoft Excel version
Inpatient Charge Data, FY2011, Comma Separated Values (CSV) version
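The comparison described above, charges by individual hospitals within a local market, might be sketched like this in Python. The column names (`drg`, `hospital`, `city`, `avg_charge`) and the sample rows are invented for illustration; the real CSV file uses its own headers and would need to be read from disk instead of from the inline string.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for the real file. Not actual Medicare data.
SAMPLE_CSV = """drg,hospital,city,avg_charge
DRG-A,Hospital 1,Springfield,10000
DRG-A,Hospital 2,Springfield,25000
DRG-A,Hospital 3,Shelbyville,12000
DRG-B,Hospital 1,Springfield,40000
DRG-B,Hospital 2,Springfield,30000
"""


def charge_spread(rows):
    """Per (DRG, city): lowest charge, highest charge, and their ratio.

    Markets with a single hospital are skipped, since there is
    nothing to compare.
    """
    by_market = defaultdict(list)
    for row in rows:
        by_market[(row["drg"], row["city"])].append(float(row["avg_charge"]))
    return {
        market: (min(charges), max(charges), max(charges) / min(charges))
        for market, charges in by_market.items()
        if len(charges) > 1
    }


spread = charge_spread(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for (drg, city), (low, high, ratio) in sorted(spread.items()):
    print(f"{drg} in {city}: {low:.0f} to {high:.0f} ({ratio:.1f}x)")
```

To run it against the downloaded file, pass `csv.DictReader(open(path, newline=""))` and adjust the column names to match the actual headers.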

A nice start towards a useful data set.

Next step would be tying identifiable physicians with ordered medical procedures and tests.

The only times I have arrived at a hospital by ambulance, I never thought to ask for a comparison of their prices with other local hospitals. Nor did I see any signs advertising discounts on particular procedures.

Have you?

Let’s not pretend medical care is a consumer market, where “consumers” are penalized for not being good shoppers.

I first saw this at Nathan Yau’s Medicare provider charge data released.

SURAAK – When Search Is Not Enough [A “google” of search results, new metric]

Wednesday, March 13th, 2013

SURAAK – When Search Is Not Enough (video)

A new way to do research. SURAAK is a web application that uses natural language processing techniques to analyze big data of published healthcare articles in the area of geriatrics and senior care. See how SURAAK uses text causality to find and analyze word relationships in this and other areas of interest.

SURAAK = Semantic Understanding Research in the Automatic Acquisition of Knowledge.

NLP based system that extracts “causal” sentences.

Differences from Google (according to the video)

  • Extracts text from PDFs
  • Links concepts together building relationships found in extracted text
  • Links articles together based on shared concepts

Search demo was better than using Google but that’s not hard to do.

The “notes” that are extracted from texts are sentences.

I am uneasy about the use of sentences in isolation from the surrounding text as a “note.”

It’s clearly “doable,” but whether it is a good idea remains to be seen, particularly since users are rating sentences/notes in isolation from the text in which they occur.

BTW, funded with tax dollars from the National Institutes of Health and the National Institute on Aging, to the tune of $844K.

I am still trying to track down the resulting software.

I take this as an illustration that anything over a “google” of search results (a new metric), is of interest and fundable.

Big Data and Healthcare Infographic

Saturday, February 2nd, 2013

Big Data and Healthcare Infographic by Shar Steed.

From the post:

Big Data could revolutionize healthcare by replacing up to 80% of what doctors do while still maintaining over 91% accuracy. Please take a look at the infographic below to learn more.

An interesting graphic, even if I don’t buy the line that computers are better than doctors at:

Integrating and balancing considerations of patient symptoms, history, demeanor, environmental factors, and population management guidelines.

Noting that in the next graphic block, the 91% accuracy rate using a “diagnostic knowledge system” doesn’t say what sort of “clinical trials” were used.

Makes a difference if we are talking brain surgery or differential diagnosis versus seeing patients in an out-patient clinic.

Still, an interesting graphic.

Curious where you see semantic integration issues, large or small in this graphic?

Health Design Challenge [$50K in Prizes – Deadline 30th Nov 2012]

Thursday, October 25th, 2012

Health Design Challenge

More details at the site but:

ONC & VA invite you to rethink how the medical record is presented. We believe designers can use their talents to make health information patient-centered and improve the patient experience.

Being able to access your health information on demand can be lifesaving in an emergency situation, can help prevent medication errors, and can improve care coordination so everyone who is caring for you is on the same page. However, too often health information is presented in an unwieldy and unintelligible way that makes it hard for patients, their caregivers, and their physicians to use. There is an opportunity for talented designers to reshape the way health records are presented to create a better patient experience.


The purpose of this effort is to improve the design of the medical record so it is more usable by and meaningful to patients, their families, and others who take care of them. This is an opportunity to take the plain-text Blue Button file and enrich it with visuals and a better layout. Innovators will be invited to submit their best designs for a medical record that can be printed and viewed digitally.

This effort will focus on the content defined by a format called the Continuity of Care Document (CCD). A CCD is a common template used to describe a patient’s health history and can be output by electronic medical record (EMR) software. Submitted designs should use the sections and fields found in a CCD. See for CCD sections and fields.

Entrants will submit a design that:

  • Improves the visual layout and style of the information from the medical record
  • Makes it easier for a patient to manage his/her health
  • Enables a medical professional to digest information more efficiently
  • Aids a caregiver such as a family member or friend in his/her duties and responsibilities with respect to the patient

Entrants should be conscious of how the wide variety of personas will affect their design. Our healthcare system takes care of the following types of individuals:

  • An underserved inner-city parent with lower health literacy
  • A senior citizen that has a hard time reading
  • A young adult who is engaged with technology and mobile devices
  • An adult whose first language is not English
  • A patient with breast cancer receiving care from multiple providers
  • A busy mom managing her kids’ health and helping her aging parents

This is an opportunity for talented individuals to touch the lives of Americans across the country through design. The most innovative designs will be showcased in an online gallery and in a physical exhibit at the Annual ONC Meeting in Washington DC.

should be enough to capture your interest.

Winners will be announced December 12, 2012.

Only the design is required, no working code.

Still, a topic map frame of mind may give you more options than other approaches.

Harmonization of Reported Medical Events in Europe

Friday, September 7th, 2012

Harmonization process for the identification of medical events in eight European healthcare databases: the experience from the EU-ADR project by Paul Avillach, et al. (J Am Med Inform Assoc doi:10.1136/amiajnl-2012-000933)


Objective Data from electronic healthcare records (EHR) can be used to monitor drug safety, but in order to compare and pool data from different EHR databases, the extraction of potential adverse events must be harmonized. In this paper, we describe the procedure used for harmonizing the extraction from eight European EHR databases of five events of interest deemed to be important in pharmacovigilance: acute myocardial infarction (AMI); acute renal failure (ARF); anaphylactic shock (AS); bullous eruption (BE); and rhabdomyolysis (RHABD).

Design The participating databases comprise general practitioners’ medical records and claims for hospitalization and other healthcare services. Clinical information is collected using four different disease terminologies and free text in two different languages. The Unified Medical Language System was used to identify concepts and corresponding codes in each terminology. A common database model was used to share and pool data and verify the semantic basis of the event extraction queries. Feedback from the database holders was obtained at various stages to refine the extraction queries.


Conclusions The iterative harmonization process enabled a more homogeneous identification of events across differently structured databases using different coding based algorithms. This workflow can facilitate transparent and reproducible event extractions and understanding of differences between databases.

Not to be overly critical but the one thing left out of the abstract was some hint about the “…procedure used for harmonizing the extraction…” which interests me.

The workflow diagram from figure 2 is worth transposing into HTML markup:

  • Event definition
    • Choice of the event
    • Event Definition Form (EDF) containing the medical definition and diagnostic criteria for the event
  • Concepts selection and projection into the terminologies
    • Search for Unified Medical Language System (UMLS) concepts corresponding to the medical definition as reported in the EDF
    • Projection of UMLS concepts into the different terminologies used in the participating databases
    • Publication on the project’s forum of the first list of UMLS concepts and corresponding codes and terms for each terminology
  • Revision of concepts and related terms
    • Feedback from database holders about the list of concepts with corresponding codes and related terms that they have previously used to identify the event of interest
    • Report on literature review on search criteria being used in previous observational studies that explored the event of interest
    • Text mining in database to identify potentially missing codes through the identification of terms associated with the event in databases
    • Conference call for finalizing the list of concepts
    • Search for new UMLS concepts from the proposed terms
    • Final list of UMLS concepts and related codes posted on the forum
  • Translation of concepts and coding algorithms into queries
    • Queries in each database were built using:
      1. the common data model;
      2. the concept projection into different terminologies; and
      3. the chosen algorithms for event definition
    • Query Analysis
      • Database holders extract data on the event of interest using codes and free text from pre-defined concepts and with database-specific refinement strategies
      • Database holders calculate incidence rates and comparisons are made among databases
      • Database holders compare search queries via the forum
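The "projection of concepts into the terminologies" and "translation into queries" steps can be sketched in Python. This is a simplified reading of the workflow, not the EU-ADR project's code: the concept identifier, terminology names, codes, and database names below are placeholders, not real UMLS or terminology values.

```python
# Placeholder projection: concept -> terminology -> codes (all hypothetical).
PROJECTION = {
    "CONCEPT-AMI": {
        "TERM_A": ["A-410"],
        "TERM_B": ["B-I21", "B-I22"],
    },
}

# Each participating database records events in one terminology (hypothetical).
DATABASES = {"db_one": "TERM_A", "db_two": "TERM_B", "db_three": "TERM_C"}


def build_queries(event_concepts, projection, databases):
    """Return per-database code lists plus the concepts each database lacks.

    The gaps are what would go back to the database holders for
    feedback in the revision step.
    """
    queries, gaps = {}, {}
    for database, terminology in databases.items():
        codes, missing = [], []
        for concept in event_concepts:
            mapped = projection.get(concept, {}).get(terminology, [])
            if mapped:
                codes.extend(mapped)
            else:
                missing.append(concept)
        queries[database] = sorted(set(codes))
        if missing:
            gaps[database] = missing
    return queries, gaps


queries, gaps = build_queries(["CONCEPT-AMI"], PROJECTION, DATABASES)
print(queries)
print("needs review:", gaps)
```

The same projection table, if published, is what would let an outside database using one of those terminologies join the comparison.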

At least for non-members, the EU-ADR website does not appear to offer access to the UMLS concepts and related codes mapping. That mapping could be used to increase accessibility to any database using those codes.

UC Irvine Medical Center: Improving Quality of Care with Apache Hadoop

Tuesday, August 21st, 2012

UC Irvine Medical Center: Improving Quality of Care with Apache Hadoop by Charles Boicey.

From the post:

With a single observation in early 2011, the Hadoop strategy at UC Irvine Medical Center started. While using Twitter, Facebook, LinkedIn and Yahoo we came to the conclusion that healthcare data although domain specific is structurally not much different than a tweet, Facebook posting or LinkedIn profile and that the environment powering these applications should be able to do the same with healthcare data.

In healthcare, data shares many of the same qualities as that found in the large web properties. Each has a seemingly infinite volume of data to ingest and it is all types and formats across structured, unstructured, video and audio. We also noticed the near zero latency in which data was not only ingested but also rendered back to users was important. Intelligence was also apparent in that algorithms were employed to make suggestion such as people you may know.

We started to draw parallels to the challenges we were having with the typical characteristic of Big Data, volume, velocity and variety.

The start of a series on Hadoop in health care.

I am more interested in the variety question than volume or velocity but for practical applications, all three are necessary considerations.

From further within the post:

We saw this project as vehicle for demonstrating the value of Applied Clinical Informatics and promoting the translational effects of rapidly moving from “code side to bedside”. (emphasis added)

Just so you know to add the string “Applied Clinical Informatics” to your literature searches in this area.

The wheel will be re-invented often enough without your help.

A Competent CTO Can Say No

Saturday, June 2nd, 2012

Todd Park, CTO of the United States, should be saying no.

Todd has mandated six months for progress on:

  1. MyGov
     Reimagine the relationship between the federal government and its citizens through an online footprint developed not just for the people, but also by the people.

  2. Open Data Initiatives
     Stimulate a rising tide of innovation and entrepreneurship that utilizes government data to create tools that help Americans in numerous ways – e.g., apps and services that help people find the right health care provider, identify the college that provides the best value for their money, save money on electricity bills through smarter shopping, or keep their families safe by knowing which products have been recalled.

  3. Blue Button for America
     Develop apps and create awareness of tools that help individuals get access to their personal health records — current medications and drug allergies, claims and treatment data, and lab reports – that can improve their health and healthcare.

  4. RFP-EZ
     Build a platform that makes it easier for small high-growth businesses to navigate the federal government, and enables agencies to quickly source low-cost, high-impact information technology solutions.

  5. The 20% Campaign
     Create a system that enables US government programs to seamlessly move from making cash payments to support foreign policy, development assistance, government operations or commercial activities to using electronic payments such as mobile devices, smart cards and other methods.

This is a classic “death march” pattern.

Having failed to make progress on any of these fronts in forty-two months, President Obama wants to mandate progress in six months.

Progress cannot be mandated and a competent CTO would say no. To the President and anyone who asks.

Progress is possible but only with proper scoping and requirements development.

Don’t further incompetence.

Take the pledge:

I refuse to apply for or if appointed to serve as a Presidential Innovation Fellow “…to deliver significant results in six months.” /s/ Patrick Durusau, Covington, Georgia, 2 June 2012.

(Details: US CTO seeks to scale agile thinking and open data across federal government)

Health Care Cost Institute

Tuesday, May 22nd, 2012

Health Care Cost Institute

I can’t give you a clean URL but on Monday (21 May 2012), the Washington Post ran a story on the Health Care Cost Institute, which had the following quotes:

This morning a new nonprofit called the Health Care Cost Institute will roll out a database of 5 billion health insurance claims (all stripped of the individual health plan’s identity, to address privacy concerns).

This is the first study to use the HCCI data, although more are in the works. Gaynor has been inundated with about 130 requests from health policy researchers to use the database. While his team sifts through those, three approved studies are already tackling big health policy questions.

“There is immense interest in gaining access,” says HCCI executive director David Newman. “We’re having trouble keeping up with that.” (emphasis added)

Sorry, that went by a little fast. The data has already been scrubbed, so why make the Health Care Cost Institute a choke point on the data?

Spin it up to one or more clouds that support free public storage for data sets of public interest.

The problem of sorting through access requests is solved.

Just maybe researchers will want to address other questions, ones that aren’t necessarily about costs. And/or combine this data with other data. Like data on local pollution. (Although you would need historical data to make that work.)

Mapping this data set to other data sets could only magnify its importance.

Many thanks are owed to the Health Care Cost Institute for securing the data set.

But our thanks should not include electing the HCCI as censor of uses of this data set.

Fixing Healthcare with Big Data

Thursday, April 5th, 2012

I was reminded of “Unrealistic Expectations,” no, sorry, that was “Great Expectations” (Dickens) when I read: Fixing Healthcare with Big Data

Robert Gelber writes:

Yesterday, Roger Foster, the Senior Director for DRC’s technologies group, discussed the immense expenses of the U.S. healthcare system. The 2.6 trillion dollar market is ripe for new efficiencies reducing overall costs and improving public health. He believes these enhancements can be achieved with the help of big data.

Foster set forth a six-part approach aimed at reducing costs and improving patient outcomes using big data.

It would have been more correct to say Foster is “selling big data as the basis for these enhancements.”

Consider his six part plan:

Unwarranted use

Many healthcare providers focus on a fee-for-service model, which promotes recurring medical visits at higher rates. Instead, big data analytics could help generate a model that implements a performance-based payment method.

Did you notice the use of “could” in the first approach? The current service model developed in the absence of “big data.” But Foster would have us create big data analytics of healthcare in hopes a new model will suddenly appear. Not real likely.

Fraud waste & abuse

Criminal organizations defraud Centers for Medicare and Medicaid Services (CMS) by charging for services never rendered. Using big data analytics, these individuals could be tracked much faster through the employment of outlier algorithms.

Here I think “criminal organizations” is a synonym for dishonest doctors and hospitals. Hardly takes big data analytics and outlier algorithms to know one doctor cannot read several hundred x-rays day after day.
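To make that concrete, here is a minimal sketch of how little machinery a basic outlier check needs. It uses a modified z-score based on the median absolute deviation, which is one common technique and not necessarily what Foster or CMS has in mind. The provider names, the counts, and the 3.5 threshold are all assumptions invented for illustration.

```python
from statistics import median

def flag_outliers(daily_counts, threshold=3.5):
    """Return providers whose count sits far above the median (modified z-score)."""
    values = list(daily_counts.values())
    med = median(values)
    # Median absolute deviation: a spread measure that one extreme value cannot distort.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be called an outlier
    flagged = []
    for provider, count in daily_counts.items():
        score = 0.6745 * (count - med) / mad
        if score > threshold:  # only unusually HIGH counts are of interest here
            flagged.append(provider)
    return flagged

# Hypothetical data: x-rays billed per day, per provider (invented numbers).
xrays_per_day = {"dr_a": 38, "dr_b": 42, "dr_c": 35, "dr_d": 40, "dr_e": 410}
print(flag_outliers(xrays_per_day))
```

With these invented numbers only the provider billing 410 x-rays a day is flagged, and the whole check runs on a laptop with the standard library.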

Administrative costs

The departments of Veterans Affairs (VA), Military Health System (MHS) and others suffer high costs due to administrative inefficiencies in billing and medical records management. By updating billing systems and employing big data records management, facilities could spend less time working on bookkeeping and more time providing accurate information to doctors and physician assistants.

Really? Perhaps Foster could consult the VA Data Repository and get back to us on that.

Provider inefficiencies

A wide implementation of clinical decision systems could reduce errors and increase congruency among various healthcare providers. Such systems could also predict risks based on population data.

Does that sound like more, not less, IT overhead to you?

Lack of coordinated care

The process of sharing medical data across institutions has become cumbersome resulting in redundant information and added costs. Improved sharing of information would open systems up to predictive modeling and also allow patients to view their history. This would allow the patient to have greater control in their treatment.

How much “added costs” versus the cost of predictive modeling? Sounds like we are going to create “big data” and then manage it to save money. That went by a little fast for me.

Preventable conditions

Through the use of big data, healthcare providers can track the change in behavior of patients after treatment. Using this data, medical professionals can better educate patients of the resulting effects from their behavior.

Is it a comfort that we are viewed as ignorant rather than making poor choices? What if despite tracking and education we don’t change our behavior? Sounds like someone is getting ready to insist that we change.

Foster is confident that big data will answer pressing issues in the healthcare as long as solutions are deployed properly.

That last sentence sums up my problem with Foster’s position. “Big data” is the answer, so long as you understand the problem correctly.

“Big data” has a lot of promise, but we need to understand the problems at hand before choosing solutions.

Let’s avoid “Unrealistic Expectations.”

Distributed Terminology System 4.0 – Apelon – != a Topic Map?

Friday, March 23rd, 2012

APELON INTRODUCES DISTRIBUTED TERMINOLOGY SYSTEM 4.0 – Latest Version of Leading Open Source Terminology Management Software Provides Enhanced Interoperability and Integration Capabilities

From the post:

Apelon, Inc., an international provider of terminology and data interoperability solutions, is pleased to announce a major new release (4.0) of its Distributed Terminology System (DTS), the healthcare industry’s leading open source terminology management platform. Based on extensive user feedback from deployments around the world, the new release features significant usability enhancements, new methods for tracking terminology changes over time, and greater integration with Java Enterprise Edition (JEE) and Service Oriented Architecture (SOA) infrastructures. The product will be unveiled this month at the Healthcare Information and Management Systems Society (HIMSS) 2012 Conference and Exhibition in Las Vegas, February 21 – 23, 2012.

Apelon’s DTS is a comprehensive open-source solution for the acquisition, management and practical deployment of standardized healthcare terminologies. Integration of data standards is a critical element for healthcare organizations to realize care improvement. The product supports data standardization and interoperability in Electronic Health Records systems, Healthcare Information Exchanges, and Clinical Decision Systems.

With version 4.0, DTS users easily manage the complete terminology lifecycle. The system provides the ability to transparently view, query, and browse across terminology versions. This facilitates the management of rapidly evolving standards such as SNOMED CT, ICD-10-CM, LOINC and RxNorm, and supports their use for longitudinal electronic health records. Local vocabularies, subsets and cross-maps can be versioned and queried in the same way, meaning that DTS users can tailor and adapt standards to their particular needs. Users also benefit from usability enhancements to DTS applications such as the DTS 4.0 Editor and DTS Browser, including internationalization capabilities for non-English-speaking environments.

To simplify integration into existing enterprise systems, DTS 4.0 is built on the JEE platform, supporting a complete set of web service APIs, in addition to the existing Java and .NET interfaces. Continuing the company’s commitment to open standards, DTS version 4.0 also supports HL7 Common Terminology Services 2 (CTS2).

According to Stephen Coady, Apelon president and CEO, the increasing use of reference terminologies in healthcare has precipitated the need for enhanced functionality in terminology management tools. “DTS 4.0 evidences our long-term commitment to making open source tools that allow organizations worldwide to improve care using reference terminologies. The new version is simpler to use, and will help even more institutions interoperate and integrate the latest decision support technologies into their daily work.”

DTS establishes a single common resource for an organization’s terminology assets that can be deployed across the spectrum of health delivery systems. Apelon made DTS open source in early 2007, providing the industry with significant cost, integration and adoption advantages compared to proprietary solutions. Since then the software has been downloaded by more than 3,500 informaticists and healthcare organizations worldwide.

You can grab a copy of the software (not the 4.0, yet) at Sourceforge: Apelon-DTS.

I just grabbed a copy so it will be several days before I have substantive comments on the 3.5.2 version of DTS at Sourceforge.

Part of what I will be investigating is how DTS differs from a topic map solution. Which one is appropriate for you will depend on your requirements.

New mapping tools bring public health surveillance to the masses

Sunday, February 12th, 2012

New mapping tools bring public health surveillance to the masses by Kim Krisberg.

From the post:

Many of us probably look into cyberspace and are overwhelmed with its unwieldy amounts of never-ending information. John Brownstein, on the other hand, sees points on a map.

Brownstein is the co-founder of HealthMap, a team of researchers, epidemiologists and software developers at Children’s Hospital Boston who use online sources to track disease outbreaks and deliver real-time surveillance on emerging public health threats. But instead of depending wholly on traditional methods of public health data collection and official reports to create maps, HealthMap enlists help from, well, just about everybody.

“We recognized that collecting data in more traditional ways can sometimes be difficult and the flow of information can take a while,” said Brownstein, also an assistant professor of pediatrics at Harvard Medical School. “So, the question was how to collect data outside the health care structure to serve public health and the general public.”

HealthMap, which debuted in 2006, scours the Internet for relevant information, aggregating data from online news services, eyewitness reports, professional discussion rooms and official sources. The result? The possibility to map disease trends in places where no public health or health care infrastructures even exist, Brownstein told me. And because HealthMap works non-stop, continually monitoring, sorting and visualizing online information, the system can also serve as an early warning system for disease outbreaks.

You need to read this post and then visit HealthMap.

Collating information from diverse sources is a mainstay of epidemiology.

Topic maps are an effort to bring the benefits of collating information from diverse sources to other fields.
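A minimal sketch of what collating looks like in code, assuming reports arrive from several sources that name the same disease differently. This is not HealthMap's actual pipeline. The name table, the report fields, and the sample reports are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical table: name variants mapped to one subject identifier.
SUBJECT_IDS = {"flu": "influenza", "influenza": "influenza", "grippe": "influenza"}

def collate(reports):
    """Group report sources by (subject, place), whatever name each source used."""
    merged = defaultdict(list)
    for report in reports:
        name = report["disease"].lower()
        # Unknown names fall back to themselves, so no report is dropped.
        subject = SUBJECT_IDS.get(name, name)
        merged[(subject, report["place"])].append(report["source"])
    return dict(merged)

# Invented sample reports from three kinds of sources.
reports = [
    {"source": "news", "disease": "Flu", "place": "Boston"},
    {"source": "official", "disease": "Influenza", "place": "Boston"},
    {"source": "eyewitness", "disease": "grippe", "place": "Lyon"},
]
print(collate(reports))
```

The two Boston reports end up under one key even though their sources used different names. That merge on subject identity, done here in the crudest possible way, is the behavior topic maps aim to provide.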

(I first saw this on Beyond Search.)

Mondeca helps to bring Electronic Patient Record to reality

Monday, December 26th, 2011

Mondeca helps to bring Electronic Patient Record to reality

This has been out for a while but I just saw it today.

From the post:

Data interoperability is one of the key issues in assembling unified Electronic Patient Records, both within and across healthcare providers. ASIP Santé, the French national healthcare agency responsible for implementing nation-wide healthcare management systems, has been charged to ensure such interoperability for the French national healthcare.

The task is a daunting one since most healthcare providers use their own custom terminologies and medical codes. This is due to a number of issues with standard terminologies: 1) standard terminologies take too long to be updated with the latest terms; 2) significant internal data, systems, and expertise rely on the usage of legacy custom terminologies; and 3) a part of the business domain is not covered by a standard terminology.

The only way forward was to align the local custom terminologies and codes with the standard ones. This way local data can be automatically converted into the standard representation, which will in turn allow to integrate it with the data coming from other healthcare providers.

I assume the alignment of local custom terminologies is an ongoing process, so that as the local terminologies change, re-alignment occurs as well?
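The conversion step described in the quoted post can be sketched as a simple crosswalk lookup. This is a minimal illustration under stated assumptions, not Mondeca's or ASIP Santé's method: every code below is invented, and a real alignment would be far richer than a flat dictionary. The sketch also shows why re-alignment matters, since a term added locally after the last alignment has no mapping.

```python
# Hypothetical crosswalk: local code -> standard code. All codes are invented.
CROSSWALK = {
    "LOCAL-HTN": "STD-0001",
    "LOCAL-DM2": "STD-0002",
}

def to_standard(records, crosswalk):
    """Split records into converted ones and ones whose local code has no mapping."""
    converted, unmapped = [], []
    for record in records:
        standard = crosswalk.get(record["code"])
        if standard is None:
            unmapped.append(record)  # needs a human to extend the alignment
        else:
            converted.append({**record, "code": standard})
    return converted, unmapped

records = [
    {"patient": "p1", "code": "LOCAL-HTN"},
    {"patient": "p2", "code": "LOCAL-NEW"},  # added locally after the last alignment
]
converted, unmapped = to_standard(records, CROSSWALK)
print(converted)
print(unmapped)
```

The unmapped list is the signal that the alignment has fallen behind the local terminology.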

Kudos to Mondeca, which played an active role in the early days of XTM. I suspect that experience has influenced (for the good) their approach to this project.

Network Modeling and Analysis in Health Informatics and Bioinformatics (NetMAHIB)

Friday, October 28th, 2011

Network Modeling and Analysis in Health Informatics and Bioinformatics (NetMAHIB) Editor-in-Chief: Reda Alhajj, University of Calgary.

From Springer, a new journal of health informatics and bioinformatics.

From the announcement:

NetMAHIB publishes original research articles and reviews reporting how graph theory, statistics, linear algebra and machine learning techniques can be effectively used for modelling and knowledge discovery in health informatics and bioinformatics. It aims at creating a synergy between these disciplines by providing a forum for disseminating the latest developments and research findings; hence results can be shared with readers across institutions, governments, researchers, students, and the industry. The journal emphasizes fundamental contributions on new methodologies, discoveries and techniques that have general applicability and which form the basis for network based modelling and knowledge discovery in health informatics and bioinformatics.

The NetMAHIB journal is proud to have an outstanding group of editors who widely and rigorously cover the multidisciplinary scope of the journal. They are known to be research leaders in the field of Health Informatics and Bioinformatics. Further, the NetMAHIB journal is characterized by providing thorough constructive reviews by experts in the field and by the reduced turn-around time which allows research results to be disseminated and shared on a timely basis. The target of the editors is to complete the first round of the refereeing process within about 8 to 10 weeks of submission. Accepted papers go to the online first list and are immediately made available for access by the research community.

CITRIS – Center for Information Technology Research in the Interest of Society

Wednesday, September 21st, 2011

CITRIS – Center for Information Technology Research in the Interest of Society

The mission statement:

The Center for Information Technology Research in the Interest of Society (CITRIS) creates information technology solutions for many of our most pressing social, environmental, and health care problems.

CITRIS was created to “shorten the pipeline” between world-class laboratory research and the creation of start-ups, larger companies, and whole industries. CITRIS facilitates partnerships and collaborations among more than 300 faculty members and thousands of students from numerous departments at four University of California campuses (Berkeley, Davis, Merced, and Santa Cruz) with industrial researchers from over 60 corporations. Together the groups are thinking about information technology in ways it’s never been thought of before.

CITRIS works to find solutions to many of the concerns that face all of us today, from monitoring the environment and finding viable, sustainable energy alternatives to simplifying health care delivery and developing secure systems for electronic medical records and remote diagnosis, all of which will ultimately boost economic productivity. CITRIS represents a bold and exciting vision that leverages one of the top university systems in the world with highly successful corporate partners and government resources.

I mentioned CITRIS as an aside (News: Summarization and Visualization) yesterday but then decided it needed more attention.

Its grants are limited to the four University of California campuses mentioned above. Shades of EU funding restrictions. Location has a hand in the selection process.

Still, the projects funded by CITRIS could likely profit from the use of topic maps and as they say, a rising tide lifts all boats.

Health Data Sources

Wednesday, March 2nd, 2011

The Flowing Data blog mentioned two government sources of health data that appeared recently:



Health Indicators Warehouse

From the Flowing Data comments, it appears both have some shortcomings, but it is a start.

Healthcare Terminologies and Classification: Essential Keys to Interoperability

Tuesday, November 2nd, 2010

Healthcare Terminologies and Classification: Essential Keys to Interoperability, published by the American Medical Informatics Association and the American Health Information Management Association, is a bit dated (2007) but is still a good overview of the area.


  1. What are the major initiatives on interoperability of healthcare terminologies today?
  2. What are the primary resources (web/print) for one of those initiatives?
  3. Prepare a one page abstract for each of five articles on one of these initiatives.

National Center for Biomedical Ontology

Friday, October 22nd, 2010

National Center for Biomedical Ontology

I feel like a kid in a candy store at this site.

I suppose that comes from being an academic researcher at heart.

Reports on specific resources to follow.

Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data

Sunday, September 19th, 2010

Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data

Destined to be a deeply influential resource.

Read the paper, use the Chem2Bio2RDF application for a week, then answer these questions:

  1. Choose three (3) subjects that are identified in this framework.
  2. For each subject, how is it identified in this framework?
  3. For each subject, have you seen it in another framework or system?
  4. For each subject seen in another framework/system, how was it identified there?

Extra credit: What one thing would you change about any of the identifications in this system? Why?

1st ACM International Health Informatics Symposium – November 11-12, 2010

Wednesday, September 15th, 2010

1st ACM International Health Informatics Symposium – November 11-12, 2010.

Interesting presentations:

  • The Effect of Different Context Representations on Word Sense Discrimination in Biomedical Texts
  • An evaluation of feature sets and sampling techniques for de-identification of medical records
  • Federated Querying Architecture for Clinical & Translational Health IT
  • Contextualizing consumer health information searching: an analysis of questions in a social Q&A community

Will watch for the call for papers for next year. Would be nice to have a topic map paper or two on the program.