Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

December 31, 2011

Webdam Project: Foundations of Web Data Management

Filed under: Data,Data Management,Web Applications,XML — Patrick Durusau @ 7:28 pm

Webdam Project: Foundations of Web Data Management

From the homepage:

The goal of the Webdam project is to develop a formal model for Web data management. This model will open new horizons for the development of the Web in a well-principled way, enhancing its functionality, performance, and reliability. Specifically, the goal is to develop a universally accepted formal framework for describing complex and flexible interacting Web applications featuring notably data exchange, sharing, integration, querying and updating. We also propose to develop formal foundations that will enable peers to concurrently reason about global data management activities, cooperate in solving specific tasks and support services with desired quality of service. Although the proposal addresses fundamental issues, its goal is to serve as the basis for future software development for Web data management.

Books from the project:

  • Foundations of Databases, Serge Abiteboul, Richard Hull, Victor Vianu, open access online edition
  • Web Data Management and Distribution, Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart, open access online edition
  • Modeling, Querying and Mining Uncertain XML Data, Evgeny Kharlamov and Pierre Senellart, in A. Tagarelli, editor, XML Data Mining: Models, Methods, and Applications, IGI Global, 2011, open access online edition

I discovered this project via a link to “Web Data Management and Distribution” in Christophe Lalanne’s A bag of tweets / Dec 2011, which pointed to the PDF file, some 400 pages long. I went looking for the HTML page carrying the link and found this project along with these titles.

There are a number of other publications associated with the project that you may find useful. “Modeling, Querying and Mining Uncertain XML Data” is only a chapter out of a larger publication by IGI Global. About what one expects from IGI Global. Cambridge University Press published the title just preceding this chapter and allows download of the entire book for personal use.

I think there is a lot to be learned from this project, even if it has not resulted in a universal framework for web applications that exchange data. I don’t think we are in any danger of universal frameworks on or off the web. And we are better for it.

November 24, 2011

Weather forecast and good development practices

Filed under: Data,Data Management — Patrick Durusau @ 3:54 pm

Weather forecast and good development practices by Paolo Sonego.

From the post:

Inspired by this tutorial, I thought that it would be nice to have access to weather forecasts directly from the R command line, for example in a personalized start-up message such as the one below:

Weather summary for Trieste, Friuli-Venezia Giulia:
The weather in Trieste is clear. The temperature is currently 14°C (57°F). Humidity: 63%.

Fortunately, thanks to Duncan Temple Lang’s always useful XML package (see here for a tutorial about XML programming under R), it is straightforward to write a few lines of R code to invoke the Google Weather API for the location of interest, retrieve the XML file, parse it using the XPath paradigm and get the required information:
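The code itself is in Paolo’s post, but a minimal sketch of the approach looks something like this (the Google Weather API has since been retired, so treat the endpoint and element names below as historical assumptions rather than a live recipe):

    library(XML)  # Duncan Temple Lang's XML package

    weather_message <- function(city = "Trieste") {
      # The API returned XML along these lines:
      #   <current_conditions>
      #     <condition data="Clear"/> <temp_c data="14"/>
      #     <temp_f data="57"/> <humidity data="Humidity: 63%"/>
      #   </current_conditions>
      url  <- paste0("http://www.google.com/ig/api?weather=", URLencode(city))
      doc  <- xmlParse(url)  # fetch and parse the XML response
      grab <- function(xpath) xpathSApply(doc, xpath, xmlGetAttr, "data")
      sprintf("The weather in %s is %s. The temperature is currently %s°C (%s°F). %s",
              city,
              tolower(grab("//current_conditions/condition")),
              grab("//current_conditions/temp_c"),
              grab("//current_conditions/temp_f"),
              grab("//current_conditions/humidity"))
    }

    cat(weather_message("Trieste"), "\n")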

You may need weather information for your topic map, but more importantly, this shows how useful it is when small routines or libraries are written for common data sets. There is little reason to have multiple libraries for, say, census data, unless the data is substantially different.

November 9, 2011

Connecting the Dots: An Introduction

Filed under: Business Intelligence,Data Management,Data Models — Patrick Durusau @ 7:43 pm

Connecting the Dots: An Introduction

A new series of posts by Rick Sherman who writes:

In the real world the situations I discuss or encounter in enterprise BI, data warehousing and MDM implementations lead me to the conclusion that many enterprises simply do not connect the dots. These implementations potentially involve various disciplines such as data modeling, business and data requirements gathering, data profiling, data integration, data architecture, technical architecture, BI design, data governance, master data management (MDM) and predictive analytics. Although many BI project teams have experience in each of these disciplines they’re not applying the knowledge from one discipline to another.

The result is knowledge silos, where the best practices and experience from one discipline are not applied in the other disciplines.

The impact is a loss in productivity for all, higher long-term costs and poorly constructed solutions. This often results in solutions that are difficult to change as the business changes, don’t scale as data volumes or numbers of users increase, or are costly to maintain and operate.

Imagine that, knowledge silos in the practice of eliminating knowledge silos.

I suspect that reflects the reality that each of us is a model of a knowledge silo. There are areas we like better than others, areas we know better than others, areas where we simply don’t have the time to learn. But when asked for an answer to our part of a project, we have to have some answer, so we give the one we know. Hard to imagine us doing otherwise.

We can try to offset that natural tendency by reading broadly, looking for new areas or opportunities to learn new techniques, or at least have team members or consultants who make a practice out of surveying knowledge techniques broadly.

Rick promises to show how data modeling is disconnected from the other BI disciplines in the next Connecting the Dots post.

November 6, 2011

Munnecke, Health Records and VistA (NoSQL 35 years old?)

Filed under: Data Management,Data Structures,Medical Informatics,MUMPS — Patrick Durusau @ 5:42 pm

Tom Munnecke is the inventor of Veterans Health Information Systems and Technology Architecture (VISTA), which is the core for half of the operational electronic health records in existence today.

From the VISTA monograph:

In 1996, the Chief Information Office introduced VISTA, which is the Veterans Health Information Systems and Technology Architecture. It is a rich, automated environment that supports day-to-day operations at local Department of Veterans Affairs (VA) health care facilities.

VISTA is built on a client-server architecture, which ties together workstations and personal computers with graphical user interfaces at Veterans Health Administration (VHA) facilities, as well as software developed by local medical facility staff. VISTA also includes the links that allow commercial off-the-shelf software and products to be used with existing and future technologies. The Decision Support System (DSS) and other national databases that might be derived from locally generated data lie outside the scope of VISTA.

When development began on the Decentralized Hospital Computer Program (DHCP) in the early 1980s, information systems were in their infancy in VA medical facilities and emphasized primarily hospital-based activities. DHCP grew rapidly and is used by many private and public health care facilities throughout the United States and the world. Although DHCP represented the total automation activity at most VA medical centers in 1985, DHCP is now only one part of the overall information resources at the local facility level. VISTA incorporates all of the benefits of DHCP as well as including the rich array of other information resources that are becoming vital to the day-to-day operations at VA medical facilities. It represents the culmination of DHCP’s evolution and metamorphosis into a new, open system, client-server based environment that takes full advantage of commercial solutions, including those provided by Internet technologies.

Yeah, you caught the alternative expansion of DHCP. Surprised me the first time I saw it.

A couple of other posts/resources on Munnecke to consider:

Some of my original notes on the design of VistA and Rehashing MUMPS/Data Dictionary vs. Relational Model.

From the MUMPS/Data Dictionary post:

This is another never-ending story, now going on 35 years. It seems that there are these Mongolian hordes of people coming over the horizon, saying the same thing about treating medical informatics as just another transaction processing system. They know banking, insurance, or retail, so therefore they must understand medical informatics as well.

I looked very seriously at the relational model, and rejected it because I thought it was too rigid for the expression of medical informatics information. I made a “grand tour” of the leading medical informatics sites to look at what was working for them. I read and spoke extensively with Chris Date http://en.wikipedia.org/wiki/Christopher_J._Date , Stanford CS prof Gio Wiederhold http://infolab.stanford.edu/people/gio.html (who was later to become the major professor of PhD dropout Sergey Brin), and Wharton professor Richard Hackathorn. I presented papers at national conventions AFIPS and SCAMC, gave colloquia at Stanford, Harvard Medical School, Linköping University in Sweden, Frankfurt University in Germany, and Chiba University in Japan.

So a successful, widespread and mainstream NoSQL has been around for 35 years? 😉

October 3, 2011

DataCleaner

Filed under: Data Analysis,Data Governance,Data Management,DataCleaner,Software — Patrick Durusau @ 7:08 pm

DataCleaner

From the website:

DataCleaner is an Open Source application for analyzing, profiling, transforming and cleansing data. These activities help you administer and monitor your data quality. High quality data is key to making data useful and applicable to any modern business.

DataCleaner is the free alternative to software for master data management (MDM) methodologies, data warehousing (DW) projects, statistical research, preparation for extract-transform-load (ETL) activities and more.

Err, “…cleansing data”? Did someone just call topic maps’ name? 😉

If it is important to eliminate duplicate data, then everyone using the duplicated data needs their updates and relationships re-pointed to the surviving copy. Unless, that is, the duplicated data was the result of poor design or was simply wasting drive space.
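To make that concrete, here is a toy sketch in R (an illustration of the bookkeeping only, not DataCleaner’s actual API): when duplicate records are merged, every table that referenced a discarded record has to be re-pointed at the survivor.

    # Toy data: records 1 and 3 are duplicates under a normalized match key.
    customers <- data.frame(id   = c(1, 2, 3),
                            name = c("ACME Corp", "Widget Ltd", "Acme Corp."),
                            key  = c("acme", "widget", "acme"),
                            stringsAsFactors = FALSE)

    # Keep the lowest id per match key as the surviving record.
    survivor <- tapply(customers$id, customers$key, min)
    id_map   <- data.frame(old_id = customers$id,
                           new_id = as.vector(survivor[customers$key]))

    # Re-point a referencing table at the surviving ids.
    orders <- data.frame(order_id = 101:103, customer_id = c(3, 2, 1))
    orders$customer_id <- id_map$new_id[match(orders$customer_id, id_map$old_id)]
    # orders$customer_id is now c(1, 2, 1): nothing references the dropped record.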

This looks like an interesting project and certainly one where topic maps are clearly relevant as one possible output.

September 6, 2011

SmartData Collective

Filed under: Business Intelligence,Data Management — Patrick Durusau @ 7:16 pm

SmartData Collective

From the about page:

SmartData Collective, an online community moderated by Social Media Today, provides enterprise leaders access to the latest trends in Business Intelligence and Data Management. Our innovative model serves as a platform for recognized, global experts to share their insights through peer contributions, custom content publishing and alignment with industry leaders. SmartData Collective is a key resource for executives who need to make informed data management decisions.

Maybe a bit more mainstream than what you are accustomed to, but think of it as a cross-cultural experience. 😉

Seriously, effective promotion of topic maps means pitching them as solving problems as seen by others, not ourselves.

