Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 26, 2014

Cloudera Navigator Demo

Filed under: Cloudera,Hadoop — Patrick Durusau @ 4:22 pm

Not long (9:50) but useful demo of Cloudera Navigator.

There was a surprise or two.

The first was the suggestion that if multiple columns use different names for zip code (the equivalent of postal codes), you should normalize all of those columns to one name.

Understandable, but what if a column has a name that is non-intuitive to the user, such as CEP?

It appears that “searching” is on surface tokens and we all know the perils of that type of searching. More robust searching would allow for searching for any variant name of postal code, for example, and return the columns that shared the property of being a postal code, without regard to the column name.
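As a rough illustration of searching by property rather than by surface token, here is a minimal Python sketch. It is not Cloudera Navigator's API; the registry `SUBJECT_NAMES`, the helpers `subject_of` and `find_columns`, and the sample table and column names are all hypothetical, chosen only to show the idea.

```python
# Hypothetical registry: each subject lists the column names known to denote it.
# Nothing here comes from Cloudera Navigator; the names are illustrative only.
SUBJECT_NAMES = {
    "postal_code": {"zip", "zipcode", "zip_code", "postal_code", "postcode", "cep", "plz"},
}


def _key(name):
    """Fold case and separators so 'Zip Code' and 'zip_code' compare equal."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def subject_of(column_name):
    """Return the subject a column name denotes, or None if it is unknown."""
    key = _key(column_name)
    for subject, names in SUBJECT_NAMES.items():
        if key in names:
            return subject
    return None


def find_columns(query, tables):
    """Return (table, column) pairs whose column shares the query's subject."""
    subject = subject_of(query)
    if subject is None:
        return []
    return [
        (table, column)
        for table, columns in tables.items()
        for column in columns
        if subject_of(column) == subject
    ]


# Made-up tables with three different names for the same property.
tables = {
    "customers_us": ["name", "ZIP"],
    "clientes_br": ["nome", "CEP"],
    "kunden_de": ["name", "PLZ", "ort"],
}

print(find_columns("postal code", tables))
```

A search for "postal code", "zip" or "CEP" returns the same three columns, and no column has to be renamed for that to work. The hard part, of course, is building and maintaining the registry, which this sketch simply assumes.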

The second surprise was that “normalization” as described sets the stage for repeating normalization with each data import. That sounds subject to human error as more and more data sets are imported.
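One way to picture the alternative to renaming by hand on every import is sketched below in Python. It is an assumption-laden illustration, not anything shown in the demo: `KNOWN_SUBJECTS` and `annotate_import` are hypothetical names, and the mapping is invented for the example.

```python
# Hypothetical mapping from known column names to the subject they denote.
KNOWN_SUBJECTS = {
    "zip": "postal_code",
    "cep": "postal_code",
    "postcode": "postal_code",
}


def annotate_import(columns, known=KNOWN_SUBJECTS):
    """Label recognized columns with a subject and list the rest for review.

    Columns keep their original names; only an annotation is attached.
    """
    annotated = {}
    unmapped = []
    for column in columns:
        subject = known.get(column.strip().lower())
        if subject is None:
            unmapped.append(column)
        else:
            annotated[column] = subject
    return annotated, unmapped


annotated, unmapped = annotate_import(["nome", "CEP", "cidade"])
print(annotated)  # columns recognized from the mapping
print(unmapped)   # columns a person still has to look at
```

The mapping is recorded once and reused on every import, so human attention goes only to the columns it does not recognize rather than to redoing the same renaming each time.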

The interface itself appears easy to use, assuming you are satisfied with opaque tokens for which you have to guess the semantics. You could be right but then on the other hand, you could be wrong.
