Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 13, 2013

Data deduplication tactics with HDFS and MapReduce [Contractor Plagiarism?]

Filed under: Deduplication,HDFS,MapReduce,Plagiarism — Patrick Durusau @ 11:29 am

Data deduplication tactics with HDFS and MapReduce

From the post:

As the amount of data continues to grow exponentially, there has been increased focus on stored data reduction methods. Data compression, single instance store and data deduplication are among the common techniques employed for stored data reduction.

Deduplication often refers to elimination of redundant subfiles (also known as chunks, blocks, or extents). Unlike compression, data is not changed and eliminates storage capacity for identical data. Data deduplication offers significant advantage in terms of reduction in storage, network bandwidth and promises increased scalability.

From a simplistic use case perspective, we can see application in removing duplicates in Call Detail Record (CDR) for a Telecom carrier. Similarly, we may apply the technique to optimize on network traffic carrying the same data packets.

Covers five (5) tactics:

  1. Using HDFS and MapReduce only
  2. Using HDFS and HBase
  3. Using HDFS, MapReduce and a Storage Controller
  4. Using Streaming, HDFS and MapReduce
  5. Using MapReduce with Blocking techniques

In these times of “Great Sequestration,” how much you are spending on duplicated contractor documentation?

You do get electronic forms of documentation. Yes?

Not that difficult to document prior contractor self-plagiarism. Teasing out what you “mistakenly” paid for it may be harder.

Question: Would you rather find out now and correct or have someone else find out?

PS: For the ambitious in government employment. You might want to consider how discovery of contractor self-plagiarism reflects on your initiative and dedication to “good” government.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress