Archive for the ‘Conceptualizations’ Category

Big Data and the Coming Conceptual Model Revolution

Sunday, April 22nd, 2012

In “Big Data and the Coming Conceptual Model Revolution,” Malcolm Chisholm writes:

Conceptual models must capture all business concepts and all relevant relationships. If instances of things are also part of the business reality, they must be captured too. Unfortunately, there is no standard methodology and notation to do this. Conceptual models that communicate business reality effectively require some degree of artistic imagination. They are products of analysis, not of design. (emphasis added)

That’s the trick, isn’t it? Developing a good conceptual model.

You can have system requirements for multiple terabytes of data storage, gigabytes of bandwidth, messages and processes galore, but if you don’t have a good conceptual model, it’s just so much hardware junk.

Are you planning your system based on hardware or software capabilities?

Or are you developing a conceptual model you want to implement in hardware and software?

Which one do you think will come closer to meeting your needs?

General Purpose Computer-Assisted Clustering and Conceptualization

Friday, January 6th, 2012

General Purpose Computer-Assisted Clustering and Conceptualization by Justin Grimmer and Gary King.

Abstract:

We develop a computer-assisted method for the discovery of insightful conceptualizations, in the form of clusterings (i.e., partitions) of input objects. Each of the numerous fully automated methods of cluster analysis proposed in statistics, computer science, and biology optimize a different objective function. Almost all are well defined, but how to determine before the fact which one, if any, will partition a given set of objects in an “insightful” or “useful” way for a given user is unknown and difficult, if not logically impossible. We develop a metric space of partitions from all existing cluster analysis methods applied to a given data set (along with millions of other solutions we add based on combinations of existing clusterings), and enable a user to explore and interact with it, and quickly reveal or prompt useful or insightful conceptualizations. In addition, although uncommon in unsupervised learning problems, we offer and implement evaluation designs that make our computer-assisted approach vulnerable to being proven suboptimal in specific data types. We demonstrate that our approach facilitates more efficient and insightful discovery of useful information than either expert human coders or many existing fully automated methods.

Despite my misgivings about metric spaces for semantics, the central theme, that clustering (dare I say merging?) cannot be determined in advance of some user viewing the data, makes sense to me. Not every user will want, or perhaps even need, to do interactive clustering, but I think this theme represents a substantial advance in this area.
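To make the abstract's “metric space of partitions” a little more concrete, here is a minimal sketch in Python. It treats each clustering as a list of labels over the same objects and measures how far apart two clusterings are. The distance used, variation of information, is my assumption for illustration; the abstract does not say which metric the authors use. The function names and the three toy clusterings are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def variation_of_information(a, b):
    """Distance between two partitions given as equal-length label lists.

    Computes H(A|B) + H(B|A). This metric is an assumption made for
    illustration, not something stated in the quoted abstract.
    """
    if len(a) != len(b):
        raise ValueError("both partitions must label the same objects")
    n = len(a)
    count_a = Counter(a)            # cluster sizes in partition a
    count_b = Counter(b)            # cluster sizes in partition b
    count_ab = Counter(zip(a, b))   # sizes of the pairwise overlaps
    vi = 0.0
    for (x, y), n_xy in count_ab.items():
        p_xy = n_xy / n
        # log(p(x,y)/p(x)) + log(p(x,y)/p(y)), with the 1/n factors cancelled
        vi -= p_xy * (log(n_xy / count_a[x]) + log(n_xy / count_b[y]))
    return vi


def pairwise_distances(partitions):
    """Distance for every pair of named partitions in a dict."""
    return {
        (p, q): variation_of_information(partitions[p], partitions[q])
        for p, q in combinations(partitions, 2)
    }


# Hypothetical output of three clustering methods run on six objects.
clusterings = {
    "method_a": [0, 0, 0, 1, 1, 1],
    "method_b": [1, 1, 1, 0, 0, 0],  # same grouping as method_a, relabeled
    "method_c": [0, 0, 1, 1, 2, 2],
}
distances = pairwise_distances(clusterings)
```

Here method_a and method_b sit at distance zero because they group the objects identically under different labels, while method_c lies some positive distance from both. A space like this is only the raw material: the point of the paper, as I read the abstract, is that a user then explores it to find the clustering that is useful to them.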

The publication appeared in the Proceedings of the National Academy of Sciences of the United States of America, and the authors are from Stanford and Harvard, respectively. Institutions that value technical and scientific brilliance.