Archive for the ‘Data Management’ Category

Information Management – Gartner 2013 “Predictions”

Wednesday, April 3rd, 2013

I hesitate to call Gartner reports “predictions.”

The public ones I have seen are c-suite summaries of information already known to the semi-informed.

Are Gartner “predictions” about what c-suite types may become informed about in the coming year?

That qualifies for the dictionary sense of “prediction.”

More importantly, what c-suite types may become informed about are clues on how to promote topic maps.

If you don’t have access to the real Gartner reports, Andy Price has summarized information management predictions in: IT trends: Gartner’s 2013 predictions for information management.

The ones primarily relevant to topic maps are:

  • Big data
  • Semantic technologies
  • The logical data warehouse
  • NoSQL DBMSs
  • Information stewardship applications
  • Information valuation/infonomics

One possible way to capitalize on these “predictions” would be to create a word cloud from the articles reporting on these “predictions.”

Every article will use slightly different language, and the most popular terms are the ones to use for marketing.

The reasoning is that those terms will be repeated often enough to resonate with potential customers.

Capturing the business needs answered by those terms would be a separate step.
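Underneath, a word cloud is just a ranked list of term frequencies. The sketch below counts terms across a few article texts. The sample strings and the short stop-word list are hypothetical placeholders, not text from the actual articles.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real run would want a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "are", "will", "on", "with"}

def term_frequencies(articles):
    """Count lowercase word tokens across all articles, skipping stop words."""
    counts = Counter()
    for text in articles:
        # Letters plus internal hyphens, so "c-suite" stays one token.
        tokens = re.findall(r"[a-z][a-z\-]*", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

def top_terms(articles, n=5):
    """Return the n most frequent terms: the candidates for marketing copy."""
    return term_frequencies(articles).most_common(n)

if __name__ == "__main__":
    # Placeholder snippets standing in for articles about the "predictions".
    sample = [
        "Big data and semantic technologies will reshape information management.",
        "The logical data warehouse joins big data with NoSQL stores.",
        "Information stewardship and data governance matter for big data.",
    ]
    for term, count in top_terms(sample, 3):
        print(term, count)
```

Feeding the top terms into any word cloud tool gives the visual version; the ranked list is what matters for choosing vocabulary.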

Project Falcon…

Wednesday, April 3rd, 2013

Project Falcon: Tackling Hadoop Data Lifecycle Management via Community Driven Open Source by Venkatesh Seetharam.

From the post:

Today we are excited to see another example of the power of community at work as we highlight the newly approved Apache Software Foundation incubator project named Falcon. This incubation project was initiated by the team at InMobi together with engineers from Hortonworks. Falcon is useful to anyone building apps on Hadoop as it simplifies data management through the introduction of a data lifecycle management framework.

All About Falcon and Data Lifecycle Management

Falcon is a data lifecycle management framework for Apache Hadoop that enables users to configure, manage and orchestrate data motion, disaster recovery, and data retention workflows in support of business continuity and data governance use cases.

Falcon workflow
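To make "data retention workflow" a little more concrete, here is a minimal sketch of the decision a retention policy automates: given dated data partitions and a retention window, pick the partitions that fall outside it. This is not Falcon's configuration format or API; the partition dates and the 7-day window are hypothetical.

```python
from datetime import date, timedelta

def expired_partitions(partition_dates, today, retention_days):
    """Return partitions older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)

if __name__ == "__main__":
    # Hypothetical daily partitions for a single data feed.
    today = date(2013, 4, 3)
    partitions = [today - timedelta(days=n) for n in range(10)]
    for d in expired_partitions(partitions, today, retention_days=7):
        print("would delete partition", d.isoformat())
```

A framework like Falcon adds the scheduling, replication and auditing around a rule of this kind, which is the part worth not having to build yourself.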

I am certain a topic map based workflow solution could be created.

However, using a solution being promoted by others removes one thing from the topic map “to do” list.

Not to mention giving topic maps an introduction to other communities.

Research Data Symposium – Columbia

Saturday, March 9th, 2013

Research Data Symposium – Columbia.

Posters from the Research Data Symposium, held at Columbia University, February 27, 2013.

Subject to the limitations of the poster genre but useful as a quick overview of current projects and directions.

Data Governance needs Searchers, not Planners

Wednesday, March 6th, 2013

Data Governance needs Searchers, not Planners by Jim Harris.

From the post:

In his book Everything Is Obvious: How Common Sense Fails Us, Duncan Watts explained that “plans fail, not because planners ignore common sense, but rather because they rely on their own common sense to reason about the behavior of people who are different from them.”

As development economist William Easterly explained, “A Planner thinks he already knows the answer; A Searcher admits he doesn’t know the answers in advance. A Planner believes outsiders know enough to impose solutions; A Searcher believes only insiders have enough knowledge to find solutions, and that most solutions must be homegrown.”

I made a similar point in my post Data Governance and the Adjacent Possible. Change management efforts are resisted when they impose new methods by emphasizing bad business and technical processes, as well as bad data-related employee behaviors, while ignoring unheralded processes and employees whose existing methods are preventing other problems from happening.

If you don’t remember any line from any post you read here or elsewhere, remember this one:

“…they rely on their own common sense to reason about the behavior of people who are different from them.”

Whenever you encounter a situation where that description fits, you will find failed projects, waste and bad morale.

Why Most BI Programs Under-Deliver Value

Sunday, February 10th, 2013

Why Most BI Programs Under-Deliver Value by Steve Dine.

From the post:

Business intelligence initiatives have been undertaken by organizations across the globe for more than 25 years, yet according to industry experts between 60 and 65 percent of BI projects and programs fail to deliver on the requirements of their customers.

This impact of this failure reaches far beyond the project investment, from unrealized revenue to increased operating costs. While the exact reasons for failure are often debated, most agree that a lack of business involvement, long delivery cycles and poor data quality lead the list. After all this time, why do organizations continue to struggle with delivering successful BI? The answer lies in the fact that they do a poor job at defining value to the customer and how that value will be delivered given the resource constraints and political complexities in nearly all organizations.

BI is widely considered an umbrella term for data integration, data warehousing, performance management, reporting and analytics. For the vast majority of BI projects, the road to value definition starts with a program or project charter, which is a document that defines the high level requirements and capital justification for the endeavor. In most cases, the capital justification centers on cost savings rather than value generation. This is due to the level of effort required to gather and integrate data across disparate source systems and user developed data stores.

As organizations mature, the number of applications that collect and store data increase. These systems usually contain few common unique identifiers to help identify related records and are often referred to as data silos. They also can capture overlapping data attributes for common organizational entities, such as product and customer. In addition, the data models of these systems are usually highly normalized, which can make them challenging to understand and difficult for data extraction. These factors make cost savings, in the form of reduced labor for data collection, easy targets. Unfortunately, most organizations don’t eliminate employees when a BI solution is implemented; they simply work on different, hopefully more value added, activities. From the start, the road to value is based on a flawed assumption and is destined to under deliver on its proposition.
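The silo problem described in that passage, related records with no common unique identifier, can be illustrated with a small sketch: two hypothetical customer tables keyed differently, matched on a normalized name plus postal code. The field names, records and matching rule are all assumptions for illustration; real record linkage needs far more care than this.

```python
import re

def match_key(name, postal_code):
    """Build a comparison key from attributes both systems happen to capture."""
    normalized = re.sub(r"[^a-z0-9]", "", name.lower())
    return (normalized, postal_code.replace(" ", "").upper())

def link_records(crm_rows, billing_rows):
    """Pair CRM ids with billing account numbers whose normalized attributes agree."""
    index = {}
    for row in billing_rows:
        index.setdefault(match_key(row["cust_name"], row["zip"]), []).append(row["acct_no"])
    links = []
    for row in crm_rows:
        for acct in index.get(match_key(row["name"], row["postcode"]), []):
            links.append((row["crm_id"], acct))
    return links

if __name__ == "__main__":
    crm = [{"crm_id": "C-1", "name": "Acme Corp.", "postcode": "30301"}]
    billing = [{"acct_no": 9001, "cust_name": "ACME CORP", "zip": "30301"},
               {"acct_no": 9002, "cust_name": "Apex Ltd", "zip": "30301"}]
    print(link_records(crm, billing))  # [('C-1', 9001)]
```

Every such matching rule is a judgment about when two records identify the same subject, and that judgment is exactly the kind of labor the charter tends to count as "cost savings."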

This post merits a close read, several times.

In particular I like the focus on delivery of value to the customer.

Err, that would be the person paying you to do the work.

Steve promises a follow-up on “lean BI” that focuses on delivering more value than it costs to deliver.

I am inherently suspicious of “lean” or “agile” approaches. I sat on a committee that was assured by three programmers they had improved upon IBM’s programming methodology but declined to share the details.

Their requirements document for a content management system, to be constructed on top of subversion, was a paragraph in an email.

Fortunately the committee prevailed upon management to tank the project. The programmers persist, management being unable or unwilling to correct past mistakes.

I am sure there are many agile/lean programming projects that deliver well documented, high quality results.

But I don’t start with the assumption that agile/lean or other methodology projects are well documented.

That is a question of fact. One that can be answered.

Refusal to answer due to time or resource constraints is a very bad sign.

I first saw this in a top ten tweets list from KDNuggets.

Mule ESB 3.3.1

Thursday, September 13th, 2012

Mule ESB 3.3.1 by Ramiro Rinaudo.

I got the “memo” on 4 September 2012 but it got lost in my inbox. Sorry.

From the post:

Mule ESB 3.3.1 represents a significant amount of effort on the back of Mule ESB 3.3 and our happiness with the result is multiplied by the number of products that are part of this release. We are releasing new versions with multiple enhancements and bug fixes to all of the major stack components in our Enterprise Edition. This includes:

Are You An IT Hostage?

Monday, August 13th, 2012

As I promised last week in From Overload to Impact: An Industry Scorecard on Big Data Business Challenges [Oracle Report], here is the key finding that is missing from Oracle’s summary:

Executives’ Biggest Data Management Gripes:*

#1 Don’t have the right systems in place to gather the information we need (38%)

#2 Can’t give our business managers access to the information they need; need to rely on IT (36%)

Ask your business managers: Do they feel like IT hostages?

You are likely to be surprised at the answers you get.

IT’s vocabulary acts as an information clog.

A clog that impedes the flow of information in your organization.

Information that can improve the speed and quality of business decision making.

The critical point is: Information clogs are bad for business.

Do you want to borrow my plunger?

From Overload to Impact: An Industry Scorecard on Big Data Business Challenges [Oracle Report]

Friday, August 10th, 2012

From Overload to Impact: An Industry Scorecard on Big Data Business Challenges [Oracle Report]

Summary:

IT powers today’s enterprises, which is particularly true for the world’s most data-intensive industries. Organizations in these highly specialized industries increasingly require focused IT solutions, including those developed specifically for their industry, to meet their most pressing business challenges, manage and extract insight from ever-growing data volumes, improve customer service, and, most importantly, capitalize on new business opportunities.

The need for better data management is all too acute, but how are enterprises doing? Oracle surveyed 333 C-level executives from U.S. and Canadian enterprises spanning 11 industries to determine the pain points they face regarding managing the deluge of data coming into their organizations and how well they are able to use information to drive profit and growth.

Key Findings:

  • 94% of C-level executives say their organization is collecting and managing more business information today than two years ago, by an average of 86% more
  • 29% of executives give their organization a “D” or “F” in preparedness to manage the data deluge
  • 93% of executives believe their organization is losing revenue – on average, 14% annually – as a result of not being able to fully leverage the information they collect
  • Nearly all surveyed (97%) say their organization must make a change to improve information optimization over the next two years
  • Industry-specific applications are an important part of the mix; 77% of organizations surveyed use them today to run their enterprise—and they are looking for more tailored options

What key finding did they miss?

They cover it in the forty-two (42) page report but it doesn’t appear here.

Care to guess what it is?

Forgotten key finding post coming Monday, 13 August 2012. Watch for it!

I first saw this at Beyond Search.

Data citation initiatives and issues

Monday, June 25th, 2012

Data citation initiatives and issues by Matthew S. Mayernik (Bulletin of the American Society for Information Science and Technology Volume 38, Issue 5, pages 23–28, June/July 2012)

Abstract:

The importance of formally citing scientific research data has been recognized for decades but is only recently gaining momentum. Several federal government agencies urge data citation by researchers, DataCite and its digital object identifier registration services promote the practice of citing data, international citation guidelines are in development and a panel at the 2012 ASIS&T Research Data Access and Preservation Summit focused on data citation. Despite strong reasons to support data citation, the lack of individual user incentives and a pervasive cultural inertia in research communities slow progress toward broad acceptance. But the growing demand for data transparency and linked data along with pressure from a variety of stakeholders combine to fuel effective data citation. Efforts promoting data citation must come from recognized institutions, appreciate the special characteristics of data sets and initially emphasize simplicity and manageability.

This is an important and eye-opening article on the state of data citations and issues related to it.

I found it surprising, in part because citation of data in radio and optical astronomy has long been commonplace, and in part because for decades now the astronomical community has placed a high value on public archiving of research data as it is acquired, in both raw and processed formats.

As pointed out in this paper, without public archiving, there can be no effective form of data citation. Sad to say, the majority of data never makes it to public archives.
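A rough illustration of why the archive has to come first: a data citation is only as durable as the persistent identifier it points to. The sketch below assembles a citation string from core properties that DataCite's metadata schema requires (identifier, creator, title, publisher, publication year). The element order and punctuation are my assumption, loosely modeled on common data citation practice, and the dataset is hypothetical (10.5072 is DataCite's test prefix).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DatasetRecord:
    creators: List[str]
    year: int
    title: str
    publisher: str   # the archive holding the data
    doi: str         # persistent identifier assigned on deposit

def format_citation(rec: DatasetRecord) -> str:
    """Creator(s) (Year): Title. Publisher. https://doi.org/DOI"""
    if not rec.doi:
        # No archived, identified copy means nothing stable to cite.
        raise ValueError("dataset has no persistent identifier; archive it first")
    authors = "; ".join(rec.creators)
    return f"{authors} ({rec.year}): {rec.title}. {rec.publisher}. https://doi.org/{rec.doi}"

if __name__ == "__main__":
    rec = DatasetRecord(["Doe, J.", "Roe, R."], 2012,
                        "Hypothetical Survey Observations",
                        "Example Data Archive", "10.5072/example")
    print(format_citation(rec))
```

The ValueError is the point of the exercise: without a deposit there is no identifier, and without an identifier there is nothing to cite.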

Given the reliance on private and public sources of funding for research, public archiving and access should be guaranteed as a condition of funding. Researchers would be free to continue to not make their data publicly accessible, should they choose to fund their own work.

If that sounds harsh, consider the well-deserved amazement at the antics over access to the Dead Sea Scrolls.

If the only way for your opinion/analysis to prevail is to deny others access to the underlying data, that is all the commentary the community needs on your work.

Cascading 2.0

Thursday, June 7th, 2012

Cascading 2.0

From the post:

We are happy to announce that Cascading 2.0 is now publicly available for download.

http://www.cascading.org/downloads/

This release includes a number of new features. Specifically:

  • Apache 2.0 Licensing
  • Support for Hadoop 1.0.2
  • Local and Hadoop planner modes, where local runs in memory without Hadoop dependencies
  • HashJoin pipe for “map side joins”
  • Merge pipe for “map side merges”
  • Simple Checkpointing for capturing intermediate data as a file
  • Improved Tap and Scheme APIs

We have also created a new top-level project on GitHub for all community sponsored Cascading projects:

https://github.com/Cascading

From the documentation:

What is Cascading?

Cascading is a data processing API and processing query planner used for defining, sharing, and executing data-processing workflows on a single computing node or distributed computing cluster. On a single node, Cascading’s “local mode” can be used to efficiently test code and process local files before being deployed on a cluster. On a distributed computing cluster using Apache Hadoop platform, Cascading adds an abstraction layer over the Hadoop API, greatly simplifying Hadoop application development, job creation, and job scheduling.

Cascading homepage.

Don’t miss the extensions to Cascading: Cascading Extensions. Any summary would be unfair. Take a look for yourself. Coverage of any of these you would like to point out?
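For readers new to the idea behind HashJoin's “map side joins”: the smaller input is loaded into an in-memory hash table and the larger input is streamed past it, so the join needs no shuffle or reduce step. The sketch below shows only that technique in plain Python; it is not Cascading's API, and the sample rows are hypothetical.

```python
def hash_join(large, small, large_key, small_key):
    """Inner join: build a hash table on the small side, stream the large side."""
    table = {}
    for row in small:                       # build phase: must fit in memory
        table.setdefault(row[small_key], []).append(row)
    for row in large:                       # probe phase: one pass, no sorting
        for match in table.get(row[large_key], []):
            yield {**row, **match}

if __name__ == "__main__":
    clicks = [{"user": 1, "url": "/a"}, {"user": 2, "url": "/b"}, {"user": 1, "url": "/c"}]
    users = [{"user": 1, "name": "ann"}, {"user": 3, "name": "cy"}]
    for joined in hash_join(clicks, users, "user", "user"):
        print(joined)
```

The trade-off is memory: this only works when one side is small enough to hold in each task, which is why a planner offers it as an option alongside the usual reduce-side join.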

I first spotted Cascading 2.0 at Alex Popescu’s myNoSQL.

Data Management is Based on Philosophy, Not Science

Tuesday, May 1st, 2012

Data Management is Based on Philosophy, Not Science by Malcolm Chisholm.

From the post:

There’s a joke running around on Twitter that the definition of a data scientist is “a data analyst who lives in California.” I’m sure the good natured folks of the Golden State will not object to me bringing this up to make a point. The point is: Thinking purely in terms of marketing, which is a better title — data scientist or data philosopher?

My instincts tell me there is no contest. The term data scientist conjures up an image of a tense, driven individual, surrounded by complex technology in a laboratory somewhere, wrestling valuable secrets out of the strange substance called data. By contrast, the term data philosopher brings to mind a pipe-smoking elderly gentleman sitting in a winged chair in some dusty recess of academia where he occasionally engages in meaningless word games with like-minded individuals.

These stereotypes are obviously crude, but they are probably what would come into the minds of most executive managers. Yet how true are they? I submit that there is a strong case that data management is much more like applied philosophy than it is like applied science.

Applied philosophy. I like that!

You know where I am going to come out on this issue so I won’t belabor it.

Enjoy reading Malcolm’s post!

When It Comes to Data Quality Delivery, the Soft Stuff is the Hard Stuff (Part 1 of 6)

Saturday, March 10th, 2012

When It Comes to Data Quality Delivery, the Soft Stuff is the Hard Stuff (Part 1 of 6) by Richard Trapp.

From the post:

I regularly receive questions regarding the types of skills data quality analysts should have in order to be effective. In my experience, regardless of scope, high performing data quality analysts need to possess a well-rounded, balanced skill set – one that marries technical “know how” and aptitude with a solid business understanding and acumen. But, far too often, it seems that undue importance is placed on what I call the data quality “hard skills”, which include; a firm grasp of database concepts, hands on data analysis experience using standard analytical tool sets, expertise with commercial data quality technologies, knowledge of data management best practices and an understanding of the software development life cycle.

Read Richard’s post to get the listing of “soft skills” and evaluate yourself.

I am going to track this series and will post updates here.

Being successful with “big data,” semantic integration, or whatever the next buzzwords are, will require a mix of hard and soft skills.

Success has always required both hard and soft skills, but it doesn’t hurt to repeat the lesson.

Selling Data Mining to Management

Sunday, February 19th, 2012

Selling Data Mining to Management by Sandro Saitta.

From the post:

Preparing data and building data mining models are two very well documented steps of analytics projects. However, whatever interesting your results are, they are useless if no action is taken. Thus, the step from analytics to action is a crucial one in any analytics project. Imagine you have the best data and found the best model of all time. You need to industrialize the data mining solution to make your company benefits from them. Often, you will first need to sell your project to the management.

Sandro references three very good articles on pitching data management/mining/analytics to management.

I would rephrase Sandro’s opening line to read: “Preparing data [for a topic map] and building [a topic map] are two very well documented steps of [topic map projects]. However, whatever interesting your results are, [there is no revenue if no one buys the map].”

OK, maybe I am being generous on the preparing data and building a topic map points but you can see where the argument is going.

And there are successful topic map merchants with active clients, just not enough of either one.

These papers may be the push in the right direction to get more of them.

First Look — Talend

Saturday, January 7th, 2012

First Look — Talend

From the post:

Talend has been around for about 6 years and the original focus was on “democratizing” data integration – making it cheaper, easier, quicker and less maintenance-heavy. They originally wanted to build an open source alternative for data integration. In particular they wanted to make sure that there was a product that worked for smaller companies and smaller projects, not just for large data warehouse efforts.

Talend has 400 employees in 8 countries and 2,500 paying customers for their Enterprise product. Talend uses an “open core” philosophy where the core product is open source and the enterprise version wraps around this as a paid product. They have expanded from pure data integration into a broader platform with data quality and MDM and a year ago they acquired an open source ESB vendor and earlier this year released a Talend branded version of this ESB.

I have the Talend software but need to spend some time working through the tutorials, etc.

I want to review it from the perspective of subject identity and the re-use of subject identification.

It may help me to simply start posting as I work through the software rather than waiting to create an edited review of the whole, which I could always fashion from the pieces if it looked useful.

Watch for the start of my review of Talend this next week.

What the Sumerians can teach us about data

Tuesday, January 3rd, 2012

What the Sumerians can teach us about data

Pete Warden writes:

I spent this afternoon wandering the British Museum’s Mesopotamian collection, and I was struck by what the humanities graduates in charge of the displays missed. The way they told the story, the Sumerian’s biggest contribution to the world was written language, but I think their greatest achievement was the invention of data.

Writing grew out of pictograms that were used to tally up objects or animals. Historians and other people who write for a living treat that as a primitive transitional use, a boring stepping-stone to the final goal of transcribing speech and transmitting stories. As a data guy, I’m fascinated by the power that being able to capture and transfer descriptions of the world must have given the Sumerians. Why did they invent data, and what can we learn from them?

Although Pete uses the term “Sumerians” to cover a very wide span of peoples, languages and history, I think his comment:

Gathering data is not a neutral act, it will alter the power balance, usually in favor of the people collecting the information.

is right on the mark.

There is an aspect of data management that we can learn from the Ancient Near East (not just the Sumerians):

Preservation of access.

It isn’t enough to simply preserve data. You can ask NASA about preservation of data. (Houston, We Erased The Apollo 11 Tapes)

Particularly with this attitude:

“We’re all saddened that they’re not there. We all wish we had 20-20 hindsight,” says Dick Nafzger, a TV specialist at NASA’s Goddard Space Flight Center in Maryland, who helped lead the search team.

“I don’t think anyone in the NASA organization did anything wrong,” Nafzger says. “I think it slipped through the cracks, and nobody’s happy about it.”

Didn’t do anything wrong?

You do know the leading cause for firing of sysadmins is failure to maintain proper backups? I would hold everyone standing near a crack responsible. That would not bring the missing tapes back, but it would make future generations more careful.

Considering that was only a few decades ago, how do we read ancient texts for which we have no key in English?

The ancients preserved access to their data by way of trilingual inscriptions: inscriptions in three different languages, all saying the same thing. If you know only one of the languages, you can work towards understanding the other two.

A couple of examples:

Van Fortress, with an inscription of Xerxes the Great.

Behistun Inscription, with an inscription in Old Persian, Elamite, and Babylonian.

BTW, the final image in Pete’s post is much later than the Sumerians and is one of the first cuneiform artifacts to be found. (Taylor’s Prism) It describes King Sennacherib’s military victories and dates from about 691 B.C. It is written in Neo-Assyrian cuneiform script. That script is used in primers and introductions to Akkadian.

Dare I guess how many mappings you have of your ontologies or database schemas? I suppose the first question should be whether they are documented at all. Then follow up with a question about mapping to other ontologies or schemas, such as an industry standard schema or set of terms.

If that sounds costly, consider the cost of migration/integration without documentation/mapping. Topic maps can help with the mapping aspects of such a project.
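In the spirit of a trilingual inscription, a documented mapping lets a record written in one local vocabulary be read in another. A minimal sketch, assuming two hypothetical local schemas mapped to a shared set of "standard" terms: a function translates records and reports the fields that have no mapping yet. All field and term names are invented for illustration.

```python
# Hypothetical mappings from two local schemas to shared standard terms.
MAPPINGS = {
    "sales_db": {"cust_nm": "customer_name", "cust_zip": "postal_code"},
    "support_db": {"client": "customer_name", "postcode": "postal_code"},
}

def to_standard(record, source):
    """Rename fields to standard terms; return the record and any unmapped fields."""
    mapping = MAPPINGS[source]
    translated, unmapped = {}, []
    for field, value in record.items():
        if field in mapping:
            translated[mapping[field]] = value
        else:
            unmapped.append(field)
    return translated, unmapped

if __name__ == "__main__":
    rec, missing = to_standard({"client": "Acme", "postcode": "30301", "tier": "gold"}, "support_db")
    print(rec)      # {'customer_name': 'Acme', 'postal_code': '30301'}
    print(missing)  # ['tier']
```

The unmapped list is the useful by-product: it shows where the documentation stops, before a migration discovers it for you.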

Webdam Project: Foundations of Web Data Management

Saturday, December 31st, 2011

Webdam Project: Foundations of Web Data Management

From the homepage:

The goal of the Webdam project is to develop a formal model for Web data management. This model will open new horizons for the development of the Web in a well-principled way, enhancing its functionality, performance, and reliability. Specifically, the goal is to develop a universally accepted formal framework for describing complex and flexible interacting Web applications featuring notably data exchange, sharing, integration, querying and updating. We also propose to develop formal foundations that will enable peers to concurrently reason about global data management activities, cooperate in solving specific tasks and support services with desired quality of service. Although the proposal addresses fundamental issues, its goal is to serve as the basis for future software development for Web data management.

Books from the project:

  • Foundations of Databases, Serge Abiteboul, Rick Hull, Victor Vianu, open access online edition
  • Web Data Management and Distribution, Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart, open access online edition
  • Modeling, Querying and Mining Uncertain XML Data, Evgeny Kharlamov and Pierre Senellart. In A. Tagarelli, editor, XML Data Mining: Models, Methods, and Applications. IGI Global, 2011. open access online edition

I discovered this project via a link to “Web Data Management and Distribution” in Christophe Lalanne’s A bag of tweets / Dec 2011, that pointed to the PDF file, some 400 pages. I went looking for the HTML page with the link and discovered this project along with these titles.

There are a number of other publications associated with the project that you may find useful. “Modeling, Querying and Mining Uncertain XML Data” is only a chapter out of a larger publication by IGI Global. About what one expects from IGI Global. Cambridge University Press published the title just preceding this chapter and allows download of the entire book for personal use.

I think there is a lot to be learned from this project, even if it has not resulted in a universal framework for web applications that exchange data. I don’t think we are in any danger of universal frameworks on or off the web. And we are better for it.

Weather forecast and good development practices

Thursday, November 24th, 2011

Weather forecast and good development practices by Paolo Sonego.

From the post:

Inspired by this tutorial, I thought that it would be nice to have the possibility to have access to weather forecast directly from the R command line, for example for a personalized start-up message such as the one below:

Weather summary for Trieste, Friuli-Venezia Giulia:
The weather in Trieste is clear. The temperature is currently 14°C (57°F). Humidity: 63%.

Fortunately, thanks to the always useful Duncan Temple Lang’s XML package (see here for a tutorial about XML programming under R), it is straightforward to write few lines of R code to invoke the google weather api for the location of interest, retrieve the XML file, parse it using the XPath paradigm and get the required informations:
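Paolo's R code is not reproduced here. As a stand-in, this is the same retrieve, parse, extract pattern in Python, using the standard library's ElementTree and its limited XPath support against an inline XML document. The element and attribute names are a hypothetical layout, not the format of the Google API the post used; in practice the XML would come from an HTTP request rather than a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical response layout; a real weather service defines its own schema.
SAMPLE_XML = """
<weather>
  <location city="Trieste" region="Friuli-Venezia Giulia"/>
  <current condition="Clear" temp_c="14" humidity="63"/>
</weather>
"""

def summarize(xml_text):
    """Extract fields with ElementTree's XPath subset and build a summary line."""
    root = ET.fromstring(xml_text.strip())
    loc = root.find("./location")
    cur = root.find(".//current")
    temp_c = int(cur.get("temp_c"))
    temp_f = round(temp_c * 9 / 5 + 32)
    return (f"Weather summary for {loc.get('city')}, {loc.get('region')}: "
            f"{cur.get('condition').lower()}, {temp_c}°C ({temp_f}°F), "
            f"humidity {cur.get('humidity')}%.")

if __name__ == "__main__":
    print(summarize(SAMPLE_XML))
```

Wrapped as a small function like this, the routine can be reused by anyone who needs the same data, which is the point of the comment that follows.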

You may need weather information for your topic map, but more importantly, it will be useful if small routines or libraries are written for common data sets. There is little reason for multiple libraries for, say, census data, unless the data is substantially different.

Connecting the Dots: An Introduction

Wednesday, November 9th, 2011

Connecting the Dots: An Introduction

A new series of posts by Rick Sherman who writes:

In the real world the situations I discuss or encounter in enterprise BI, data warehousing and MDM implementations lead me to the conclusion that many enterprises simply do not connect the dots. These implementations potentially involve various disciplines such as data modeling, business and data requirements gathering, data profiling, data integration, data architecture, technical architecture, BI design, data governance, master data management (MDM) and predictive analytics. Although many BI project teams have experience in each of these disciplines they’re not applying the knowledge from one discipline to another.

The result is knowledge silos where the best practices and experience from one discipline is not applied in the other disciplines.

The impact is a loss in productivity for all, higher long-term costs and poorly constructed solutions. This often results in solutions that are difficult to change as the business changes, don’t scale as the data volumes or numbers of uses increase, or is costly to maintain and operate.

Imagine that, knowledge silos in the practice of eliminating knowledge silos.

I suspect that reflects the reality that each of us is a model of a knowledge silo. There are areas we like better than others, areas we know better than others, areas where we simply don’t have the time to learn. But when asked for an answer to our part of a project, we have to have some answer, so we give the one we know. Hard to imagine us doing otherwise.

We can try to offset that natural tendency by reading broadly, looking for new areas or opportunities to learn new techniques, or at least have team members or consultants who make a practice out of surveying knowledge techniques broadly.

Rick promises to show how data modeling is disconnected from the other BI disciplines in the next Connecting the Dots post.

Munnecke, Health Records and VistA (NoSQL 35 years old?)

Sunday, November 6th, 2011

Tom Munnecke is the inventor of Veterans Health Information Systems and Technology Architecture (VISTA), which is the core for half of the operational electronic health records in existence today.

From the VISTA monograph:

In 1996, the Chief Information Office introduced VISTA, which is the Veterans Health Information Systems and Technology Architecture. It is a rich, automated environment that supports day-to-day operations at local Department of Veterans Affairs (VA) health care facilities.

VISTA is built on a client-server architecture, which ties together workstations and personal computers with graphical user interfaces at Veterans Health Administration (VHA) facilities, as well as software developed by local medical facility staff. VISTA also includes the links that allow commercial off-the-shelf software and products to be used with existing and future technologies. The Decision Support System (DSS) and other national databases that might be derived from locally generated data lie outside the scope of VISTA.

When development began on the Decentralized Hospital Computer Program (DHCP) in the early 1980s, information systems were in their infancy in VA medical facilities and emphasized primarily hospital-based activities. DHCP grew rapidly and is used by many private and public health care facilities throughout the United States and the world. Although DHCP represented the total automation activity at most VA medical centers in 1985, DHCP is now only one part of the overall information resources at the local facility level. VISTA incorporates all of the benefits of DHCP as well as including the rich array of other information resources that are becoming vital to the day-to-day operations at VA medical facilities. It represents the culmination of DHCP’s evolution and metamorphosis into a new, open system, client-server based environment that takes full advantage of commercial solutions, including those provided by Internet technologies.

Yeah, you caught the alternative expansion of DHCP. Surprised me the first time I saw it.

A couple of other posts/resources on Munnecke to consider:

Some of my original notes on the design of VistA and Rehashing MUMPS/Data Dictionary vs. Relational Model.

From the MUMPS/Data Dictionary post:

This is another never-ending story, now going 35 years. It seems that there are these Mongolean hordes of people coming over the horizon, saying the same thing about treating medical informatics as just another transaction processing system. They know banking, insurance, or retail, so therefore they must understand medical informatics as well.

I looked very seriously at the relational model, and rejected it because I thought it was too rigid for the expression of medical informatics information. I made a “grand tour” of the leading medical informatics sites to look at what was working for them. I read and spoke extensively with Chris Date http://en.wikipedia.org/wiki/Christopher_J._Date , Stanford CS prof Gio Wiederhold http://infolab.stanford.edu/people/gio.html (who was later to become the major professor of PhD dropout Sergy Brin), and Wharton professor Richard Hackathorn. I presented papers at national conventions AFIPS and SCAMC, gave colloquia at Stanford, Harvard Medical School, Linkoping University in Sweden, Frankfurt University in Germany, and Chiba University in Japan.

So successful, widespread and mainstream NoSQL has been around for 35 years? ;-)

DataCleaner

Monday, October 3rd, 2011

DataCleaner

From the website:

DataCleaner is an Open Source application for analyzing, profiling, transforming and cleansing data. These activities help you administer and monitor your data quality. High quality data is key to making data useful and applicable to any modern business.

DataCleaner is the free alternative to software for master data management (MDM) methodologies, data warehousing (DW) projects, statistical research, preparation for extract-transform-load (ETL) activities and more.

Err, “…cleansing data.”? Did someone just call topic maps name? ;-)

If it is important to eliminate duplicate data, then everyone using the duplicated data needs updates and relationships to what survives, unless the duplicated data was simply the result of poor design or wasted drive space.
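A minimal sketch of that point: when duplicates are merged, keep a map from every retired record id to the surviving one, so anything still pointing at an old id can be updated. The records, the case-insensitive email test for "duplicate," and the keep-the-lowest-id rule are hypothetical choices for illustration, not how DataCleaner works.

```python
def deduplicate(records, key):
    """Merge records sharing key(record); return survivors and an old-id -> survivor-id map."""
    survivors = {}   # duplicate key -> surviving record
    redirects = {}   # retired id -> surviving id
    for rec in sorted(records, key=lambda r: r["id"]):
        k = key(rec)
        if k in survivors:
            redirects[rec["id"]] = survivors[k]["id"]
        else:
            survivors[k] = rec
    return list(survivors.values()), redirects

def repoint(references, redirects):
    """Update references that still name a retired id."""
    return [redirects.get(ref, ref) for ref in references]

if __name__ == "__main__":
    people = [
        {"id": 3, "email": "Ann@Example.org"},
        {"id": 1, "email": "ann@example.org"},
        {"id": 2, "email": "bob@example.org"},
    ]
    kept, redirects = deduplicate(people, key=lambda r: r["email"].lower())
    print([r["id"] for r in kept])        # [1, 2]
    print(redirects)                      # {3: 1}
    print(repoint([3, 2, 1], redirects))  # [1, 2, 1]
```

The redirect map is the part a plain cleansing pass tends to throw away, and it is the part the users of the old records actually need.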

This looks like an interesting project and certainly one where topic maps are clearly relevant as one possible output.

SmartData Collective

Tuesday, September 6th, 2011

SmartData Collective

From the about page:

SmartData Collective, an online community moderated by Social Media Today, provides enterprise leaders access to the latest trends in Business Intelligence and Data Management. Our innovative model serves as a platform for recognized, global experts to share their insights through peer contributions, custom content publishing and alignment with industry leaders. SmartData Collective is a key resource for executives who need to make informed data management decisions.

Maybe a bit more mainstream than what you are accustomed to but think of it as a cross-cultural experience. ;-)

Seriously, effective promotion of topic maps means pitching them as solving problems as seen by others, not ourselves.