Archive for the ‘Master Data Management’ Category

Ad for Topic Maps

Monday, February 17th, 2014

Imagine my surprise at finding an op-ed piece in Information Management flogging topic maps!

Karen Heath writes in: Is it Really Possible to Achieve a Single Version of Truth?:

There is a pervasive belief that a single version of truth–eliminating data siloes by consolidating all enterprise data in a consistent, non-redundant form – remains the technology-equivalent to the Holy Grail. And, the advent of big data is making it even harder to realize. However, even though SVOT is difficult or impossible to achieve today, beginning the journey is still a worthwhile business goal.

The road to SVOT is paved with very good intentions. SVOT has provided the major justification over the past 20 years for building enterprise data warehouses, and billions of dollars have been spent on relational databases, ETL tools and BI technologies. Millions of resource hours have been expended in construction and maintenance of these platforms, yet no organization is able to achieve SVOT on a sustained basis. Why? Because new data sources, either sanctioned or rogue, are continually being introduced, and existing data is subject to decay of quality over time. As much as 25 percent of customer demographic data, including name, address, contact info, and marital status changes every year. Also, today’s data is more dispersed and distributed and even “bigger” (volume, variety, velocity) than it has ever been.

Karen does a brief overview of why so many SVOT projects have failed (think lack of imagination and insight for starters) but then concludes:

As soon as MDM and DG are recognized as having equal standing with other programs in terms of funding and staffing, real progress can be made toward realization of a sustained SVOT. It takes enlightened management and a committed workforce to understand that successful MDM and DG programs are typically multi-year endeavors that require a significant commitment of people, processes and technology. MDM and DG are not something that organizations should undertake with a big-bang approach, assuming that there is a simple end to a single project. SVOT is no longer dependent on all data being consolidated into a single physical platform. With effective DG, a federated architecture and robust semantic layer can support a multi-layer, multi-location, multi-product organization that provides its business users the sustained SVOT. That is the reward. (emphasis added)

In case you aren’t “in the know,” DG – data governance, MDM – master data management, SVOT – single version of truth.

The bolded line about the “robust semantic layer” is obviously something topic maps can do quite well. But that’s not where I saw the topic map ad.

I saw the topic map ad being highlighted by:

As soon as MDM and DG are recognized as having equal standing with other programs in terms of funding and staffing

Because that’s never going to happen.

And why should it? GM, for example, has legendary data management issues, but its primary business, MDM and DG people to one side, is making and financing automobiles. GM could divert enormous resources to obtain an across-the-board SVOT, but why would it?

Rather than an across-the-board SVOT, GM is going to want something more selective: an MVOT (My Version Of Truth) application, applied where it returns the greatest ROI for the investment.

With topic maps as “a federated architecture and robust semantic layer [to] support a multi-layer, multi-location, multi-product organization,” then accounting can have its MVOT, production its MVOT, shipping its MVOT, management its MVOT, regulators their MVOT.

Given the choice between a Single Version Of Truth and your My Version Of Truth, which one would you choose?

That’s what I thought.

PS: Topic maps can also present a SVOT, just in case its advocates come around.

Master Indexing and the Unified View

Monday, March 25th, 2013

Master Indexing and the Unified View by David Loshin.

From the post:

1) Identity resolution – The master data environment catalogs the set of representations that each unique entity exhibits in the original source systems. Applying probabilistic aggregation and/or deterministic rules allows the system to determine that the data in two or more records refers to the same entity, even if the original contexts are different.

2) Data quality improvement – Linking records that share data about the same real-world entity enables the application of business rules to improve the quality characteristics of one or more of the linked records. This doesn’t specifically mean that a single “golden copy” record must be created to replace all instances of the entity’s data. Instead, depending on the scenario and quality requirements, the accessibility of the different sources and the ability to apply those business rules at the data user’s discretion will provide a consolidated view that best meets the data user’s requirements at the time the data is requested.

3) Inverted mapping – Because the scope of data linkage performed by the master index spans the breadth of both the original sources and the collection of data consumers, it holds a unique position to act as a map for a standardized canonical representation of a specific entity to the original source records that have been linked via the identity resolution processes.

In essence this allows you to use a master data index to support federated access to original source data while supporting the application of data quality rules upon delivery of the data.

It’s been a long day but does David’s output have all the attributes of a topic map?

  1. Identity resolution – Two or more representatives of the same subject
  2. Data quality improvement – Consolidated view of the data based on a subject and presented to the user
  3. Inverted mapping – Navigation based on a specific entity into original source records
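The three attributes line up neatly enough to sketch. Here is a toy master index, in Python, showing all three; the record layouts, matching rule, and delivery-time quality rule are invented for illustration and assume nothing about David's actual implementation:

```python
# A toy master index illustrating the three attributes above.
# All field names and rules are hypothetical.

def identity_key(record):
    # 1) Identity resolution: a deterministic rule deciding when two
    #    source records refer to the same real-world entity.
    return (record["name"].strip().lower(), record["dob"])

def build_master_index(sources):
    # 3) Inverted mapping: canonical entity key -> original source records.
    index = {}
    for source_name, records in sources.items():
        for rec in records:
            index.setdefault(identity_key(rec), []).append((source_name, rec))
    return index

def consolidated_view(linked):
    # 2) Data quality improvement: apply a business rule at delivery time
    #    (here: prefer the most recently updated address) rather than
    #    overwriting the sources with a single "golden copy".
    _, best = max(linked, key=lambda pair: pair[1]["updated"])
    return {"name": best["name"], "address": best["address"],
            "sources": [name for name, _ in linked]}

sources = {
    "crm":     [{"name": "Ada Lovelace", "dob": "1815-12-10",
                 "address": "12 St James Sq", "updated": 2012}],
    "billing": [{"name": "ada lovelace ", "dob": "1815-12-10",
                 "address": "1 Ockham Park", "updated": 2013}],
}

index = build_master_index(sources)
for key, linked in index.items():
    print(key, "->", consolidated_view(linked))
```

Note that the original source records are never altered; the consolidated view is computed when the data is requested, which is exactly the federated-access point made above.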


The Ironies of MDM [Master Data Management/Multi-Database Mining]

Saturday, November 24th, 2012

A survey on mining multiple data sources by T. Ramkumar, S. Hariharan and S. Selvamuthukumaran.


From the abstract:

Advancements in computer and communication technologies demand new perceptions of distributed computing environments and development of distributed data sources for storing voluminous amounts of data. In such circumstances, mining multiple data sources for extracting useful patterns of significance is being considered as a challenging task within the data mining community. The domain, multi-database mining (MDM), is regarded as a promising research area as evidenced by numerous research attempts in the recent past. Methods exist for discovering knowledge from multiple data sources; they fall into two wide categories, namely (1) mono-database mining and (2) local pattern analysis. The main intent of the survey is to explain the idea behind those approaches and consolidate the research contributions along with their significance and limitations.

I can’t reach the full article, yet, but it sounds like one that merits attention.

I was struck by the irony that MDM, which some data types would expand as “Master Data Management,” is read here to mean “Multi-Database Mining.”

To be sure, “Master Data Management” can be useful, but be mindful that non-managed data lurks just outside your door.

MDM: It’s Not about One Version of the Truth

Wednesday, October 31st, 2012

MDM: It’s Not about One Version of the Truth by Michele Goetz.

From the post:

Here is why I am not a fan of the “single source of truth” mantra. A person is not one-dimensional; they can be a parent, a friend, a colleague and each has different motivations and requirements depending on the environment. A product is as much about the physical aspect as it is the pricing, message, and sales channel it is sold through. Or, it is also faceted by the fact that it is put together from various products and parts from partners. In no way is a master entity unique or has a consistency depending on what is important about the entity in a given situation. What MDM provides are definitions and instructions on the right data to use in the right engagement. Context is a key value of MDM.

When organizations have implemented MDM to create a golden record and single source of truth, domain models are extremely rigid and defined only within a single engagement model for a process or reporting. The challenge is the master entity is global in nature when it should have been localized. This model does not allow enough points of relationship to create the dimensions needed to extend beyond the initial scope. If you want to now extend, you need to rebuild your MDM model. This is essentially starting over or you ignore and build a layer of redundancy and introduce more complexity and management.

The line:

The challenge is the master entity is global in nature when it should have been localized.

stopped me cold.

What if I said:

“The challenge is a subject proxy is global in nature when it should have been localized.”

Would your reaction be the same?

Shouldn’t subject identity always be local?

Or perhaps better, have you ever experienced a subject identification that wasn’t local?

We may talk about a universal notion of subject but even so we are using a localized definition of universal subject.

If a subject proxy is a container for local identifications, thought to be identifications of the same subject, need we be concerned if it doesn’t claim to be a universal representative for some subject? Or is it sufficient that it is a faithful representative of one or more identifications, thought by some collector to identify the same subject?

I am leaning towards the latter because it jettisons the doubtful baggage of universality.

That is, a subject may have more than one collection of local identifications (such collections being subject proxies), none of which is the universal representative for that subject.

Even if we think another collection represents the same subject, merging those collections is a question of your requirements.
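One way to picture this: a subject proxy as a plain container of local identifications, where merging two proxies is an explicit, requirements-driven decision rather than a claim of universality. A minimal sketch; the class, authorities, and identifiers are all invented for illustration:

```python
# A subject proxy as a bag of local identifications. Merging is a
# choice the collector makes, not a law. Everything here is illustrative.

class SubjectProxy:
    def __init__(self, identifications):
        # Each identification is (authority, local identifier).
        self.identifications = set(identifications)

    def merge(self, other):
        # The result is still just a collection of local identifications,
        # faithful to its sources, not a universal representative.
        return SubjectProxy(self.identifications | other.identifications)

accounting = SubjectProxy({("erp", "CUST-0042"), ("tax", "US-99-1234")})
shipping = SubjectProxy({("wms", "ship-to-7781")})

# A collector who thinks these identify the same subject may merge them;
# another collector, with other requirements, may not.
merged = accounting.merge(shipping)
print(sorted(merged.identifications))
```

Neither `accounting` nor `shipping` is wrong for declining the merge; each remains a faithful representative of the identifications it collected.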

You may not want to collect Twitter comments in Hindi about Glee.

Your topic map, your requirements, your call.

PS: You need to read Michele’s original post to discover what could entice management to fund an MDM project. Interoperability of data isn’t it.

30 MDM Customer Use Cases (Master Data Management in action)

Friday, March 16th, 2012

30 MDM Customer Use Cases (Master Data Management in action)

Jakki Geiger writes:

Master Data Management (MDM) has been used by companies for more than eight years to address the challenge of fragmented and inconsistent data across systems. Over the years we’ve compiled quite a cadre of use cases across industries and strategic initiatives. I thought this outline of the 30 most common MDM initiatives may be of interest to those of you who are just getting started on your MDM journey.

Although these organizations span different industries, face varied business problems and started with diverse domains, you’ll notice that revenue, compliance and operational efficiency are the most common drivers of MDM initiatives. The impetus is to improve the foundational data that’s used for analysis and daily operations. (Click on the chart to make it larger.)

Curious what you make of the “use cases” in the charts?

They are all good goals but I am not sure I would call them “use cases.”

Take HealthCare under Marketing, which reads:

To improve the customer experience and marketing effectiveness with a better understanding of members, their household relationships and plan/policy information.

Is that a use case? For master data management?

The Wikipedia entry on master data management says in part:

At a basic level, MDM seeks to ensure that an organization does not use multiple (potentially inconsistent) versions of the same master data in different parts of its operations, which can occur in large organizations. A common example of poor MDM is the scenario of a bank at which a customer has taken out a mortgage and the bank begins to send mortgage solicitations to that customer, ignoring the fact that the person already has a mortgage account relationship with the bank. This happens because the customer information used by the marketing section within the bank lacks integration with the customer information used by the customer services section of the bank. Thus the two groups remain unaware that an existing customer is also considered a sales lead. The process of record linkage is used to associate different records that correspond to the same entity, in this case the same person.

Other problems include (for example) issues with the quality of data, consistent classification and identification of data, and data-reconciliation issues.
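The bank scenario in the Wikipedia quote is a record linkage problem: associate the marketing list with the customer services list before soliciting. A minimal sketch, with invented names, fields, and normalization rules:

```python
# Record linkage for the bank example: drop marketing leads that match
# an existing mortgage customer. The normalization rules are illustrative;
# real linkage would use probabilistic matching over many fields.

def normalize(record):
    return (record["name"].lower().replace(".", "").strip(),
            record["postcode"].replace(" ", "").upper())

mortgage_customers = [
    {"name": "J. Smith", "postcode": "sw1a 1aa"},
]
marketing_leads = [
    {"name": "j smith", "postcode": "SW1A1AA"},    # same person, different format
    {"name": "A. Jones", "postcode": "EC1A 1BB"},  # genuinely new lead
]

existing = {normalize(c) for c in mortgage_customers}
new_leads = [lead for lead in marketing_leads
             if normalize(lead) not in existing]
print(new_leads)
```

Without the linkage step, J. Smith gets a mortgage solicitation for the mortgage he already has, which is the Wikipedia example in miniature.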

Can you find any “use cases” in the Informatica post?

BTW, topic maps avoid “inconsistent” data without forcing you to reconcile and update all your data records. (Inquire.)

Structure, Semantics and Master Data Models

Friday, March 9th, 2012

Structure, Semantics and Master Data Models by David Loshin.

From the post:

Looking back at some of my Informatica Perspectives posts over the past year or so, I reflected on some common themes about data management and data governance, especially in the context of master data management and particularly, master data models. As both the tools and the practices around MDM mature, we have seen some disillusionment in attempts to deploy an MDM solution, with our customers noting that they continue to hit bumps in the road in the technical implementation associated with both master data consolidation and then with publication of shared master data.

Almost every issue we see can be characterized into one of three buckets:

What do you think about David’s three buckets? Close? Far away?

David continued this line of postings:

Master Data Model Alternatives – Part 2 March 12, 2012.

Master Data Consolidation Versus Master Data Sharing: Modeling Matters! – Part 3 March 19, 2012.

Considerations for Multi-Domain Master Data Modeling – Part 4 March 26, 2012.

MDM Goes Beyond the Data Warehouse

Wednesday, December 28th, 2011

MDM Goes Beyond the Data Warehouse

Rich Sherman writes:

Enterprises are awash with data from customers, suppliers, employees and their operational systems. Most enterprises have data warehousing (DW) or business intelligence (BI) programs, which sometimes have been operating for many years. The DW/BI programs frequently do not provide the consistent information needed by the business because of multiple and often inconsistent lists of customers, prospects, employees, suppliers and products. Master data management (MDM) is the initiative that is needed to address the problem of inconsistent lists or dimensions.

The reality is that for many years, whether people realized it or not, the DW has served as the default MDM repository. This happened because the EDW had to reconcile and produce a master list of data for every data subject area that the business needs for performing enterprise analytics. Years before the term MDM was coined, MDM was referred to as reference data management. But DW programs have fallen short of providing effective MDM solutions for several reasons.

Interesting take on the problems faced in master data management projects. (Yes, I added index entries for MDM and “master data management.” People might look under one and not the other.)

It occurs to me that the transition toward a master data list includes understanding the data systems that will eventually migrate to the master system. Topic maps could play a useful role in creating the mapping to the master system, as well as in finding commonalities among the other systems to be migrated.

Documenting the master system with a topic map would give such a project a leg up, as they say, on its eventual migration to some other system.

And there are always alien data systems, different from the internal MDM system (assuming that comes to pass), which could also be mapped into the master system using topic maps. I say “assuming that comes to pass” because “reference data management,” if implemented, would have already solved the problems that MDM faces today.

IT services are not regarded as a project with a defined end point. After all, users expect IT services every day. And such services are necessary for any enterprise to conduct business.

Perhaps data integration should move from a “project” orientation to a “process” orientation, so that continued investment and management of the integration process is ongoing and not episodic. That would create a base for in-house expertise at data integration and a continual gathering of information and expertise to anticipate data integration issues, instead of trying to solve them in hindsight.