Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 7, 2012

Reverse Data Management

Filed under: Reverse Data Management — Patrick Durusau @ 7:19 pm


I encountered “reverse data management” today.

Being unfamiliar I searched for more information and encountered this page, which reads in part as follows:

Forward and Reverse Data Transformations

Database research mainly focuses on forward-moving data flows: source data is subjected to transformations and evolves through queries, aggregations, and view definitions to form a new target instance, possibly with a different schema. This forward paradigm underpins most data management tasks today, such as querying, data integration, data mining, etc. We contrast this forward processing with Reverse Data Management (RDM), where the action needs to be performed on the input data, on behalf of desired outcomes in the output data. Some data management tasks already fall under this paradigm, for example updates through views, data generation, data cleaning and repair. RDM is, by necessity, conceptually more difficult to define, and computationally harder to achieve. Today, however, as increasingly more of the available data is derived from other data, there is an increased need to be able to modify the input in order to achieve a desired effect on the output, motivating a systematic study of RDM.

[graphic omitted]

In general, RDM problems are harder to formulate and implement, because of the simple fact that the inverse of a function is not always a function. Given a desired output (or change to the output), there are multiple inputs (or none at all) that can produce it. This is a prevailing difficulty in all RDM problems. This project aims to develop a unified framework for Reverse Data Management problems, which will bring together several subfields of database research. RDM problems can be classified along two dimensions, as shown in the table below. On the “target” dimension, problems are divided into those that have explicit and those that have implicit specifications. The former means that the desired target effect is given as a tuple-level data instance; this is the case in causality and view updates. The latter means that the target effect is described indirectly, through statistics and constraints; examples include how-to queries and data generation. On the “source” dimension, problems are divided in those that use a reference source, and those that do not. For example, view updates and how-to queries fall under the former category, while data generation under the latter.
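The asymmetry the passage describes, that a forward transformation is a function while its inverse usually is not, can be sketched in a few lines of Python. This is my own toy illustration, not anything from the project: the forward direction is a simple aggregation, and the reverse direction enumerates the many candidate inputs (or none) that would yield a desired output.

```python
def forward(source):
    """Forward data flow: a simple aggregation query over the source."""
    return sum(source)

def reverse(target, size):
    """Reverse direction: enumerate the non-negative integer sources of a
    given size whose aggregation would produce the desired target output.
    Unlike forward(), this yields many answers, or none at all."""
    if size == 1:
        return [[target]] if target >= 0 else []
    candidates = []
    for first in range(target + 1):
        for rest in reverse(target - first, size - 1):
            candidates.append([first] + rest)
    return candidates

# One input maps to exactly one output...
print(forward([2, 3]))   # 5
# ...but one desired output maps to many possible inputs:
print(reverse(5, 2))     # [[0, 5], [1, 4], [2, 3], [3, 2], [4, 1], [5, 0]]
# ...and some desired outputs have no valid input at all:
print(reverse(-1, 2))    # []
```

Even in this trivial setting the reverse problem is combinatorial where the forward problem was a single pass, which is the "prevailing difficulty" the authors point to.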

I am encouraged by the view that changes in inputs can cause changes in outputs.

It sounds trivial to say.

It is one step down the slippery path toward admitting that outputs aren't determinate, sitting somehow "out there," objective and inter-subjective.

Outputs depend upon what we make of the inputs.

If I don’t like the outcome, I just need a new set of inputs. Or a new reading of the old ones.

And I need to be mindful that is always the case, whatever I think of the current outputs.

If the edges of database research are exploring RDM issues, that should be a warning to the intelligence community that appears to think value can be derived from crunching enough data.

Perhaps, but be mindful that data crunching produces outcomes based on inputs. If the inputs change, so may the outputs. Something to think about.

Particularly if your integration solution is “lite” on enabling you to probe (alter?) the subject identities as inputs that are shaping your outputs.

Make no mistake: whether we acknowledge it or not, every datum, every data structure, every operation, every input and every output represents choices. Choices that can be made explicit and accounted for, or not.

RDM looks like a coming “hot” field of research that addresses some of those issues.
