Cognitive systems speculate on big data by Ravi Arimilli.
From the post:
Our brains don’t need to tell our lungs to breathe or our hearts to pump blood. Unfortunately, computers require instructions for everything they do. But what if machines could analyze big data and determine what to do, based on the content of the data, without specific instructions? Patent #8,387,065 establishes a way for computer systems to analyze data in a whole new way, using “speculative” population count (popcount) operations.
Popcount technology has been around for several years. It uses algorithms to pare down the number of traditional instructions a system has to run through to solve a problem. For example, if a problem takes 10,000 instructions to be solved using standard computing, popcount techniques can reduce the number of instructions by more than half.
This is how IBM Watson played Jeopardy! It did not need to be given instructions to look for every possible bit of data to answer a question. Its Power 7-based system used popcount operations to make assumptions about the domain of data in question, to come up with a real time answer.
Reading Patent #8,387,065, you will find this statement:
An actual method or mechanism by which the popcount is calculated is not described herein because the invention applies to any one of the various popcount algorithms that may be executed by CPU to determine a popcount. (This appears under DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT; the patent has no section or paragraph numbers to cite.)
IBM patented a process to house a sampling method without ever describing the sampling method. As Ben Stein would say: “wow.”
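For readers who have not met the term, a population count is simply the number of 1 bits in a word. The patent does not say which algorithm it has in mind, so the following is only a minimal sketch of two textbook approaches, not anything taken from the patent. The function names are mine, and both assume a non-negative integer.

```python
def popcount_naive(word: int) -> int:
    """Count set bits by testing one bit per loop iteration."""
    count = 0
    while word:
        count += word & 1   # add the lowest bit (0 or 1)
        word >>= 1          # shift the next bit into position
    return count


def popcount_clear_lowest(word: int) -> int:
    """Count set bits by clearing the lowest set bit each iteration.

    The loop runs once per set bit rather than once per bit position.
    """
    count = 0
    while word:
        word &= word - 1    # clears the lowest set bit
        count += 1
    return count


print(popcount_naive(0b1011))         # prints 3
print(popcount_clear_lowest(0b1011))  # prints 3
```

Either function stands in for "any one of the various popcount algorithms" the patent mentions; on real hardware this is usually a single instruction.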
When I think of IBM patents, I think of eWeek’s IBM Patent: 100 Years of High-Tech Innovations top ten (10) list:
- U.S. Patent #998,631: Perforating Machine
- U.S. Patent #3,387,286: Field-Effect Transistor Memory
- U.S. Patent #4,343,993: Scanning Tunneling Microscope
- U.S. Patent #4,528,626: IBM PC/AT
- U.S. Patent #4,784,135: Ultraviolet Surgery
- U.S. Patent #5,319,542: Electronic Catalogue
- U.S. Patent #5,424,054: Nanotubes
- U.S. Patent #6,496,814: Mining for Maintenance
- U.S. Patent #7,006,793: Safe Use of Electronic Devices in Vehicles
- U.S. Patent #7,684,673: DVR Management
Sampling methods, just like naive Bayes classifiers, work if and only if certain assumptions are met. Naive Bayes classifiers assume that all features are conditionally independent given the class. Sampling methods, on the other hand, assume that a data set is uniform, meaning that a sample is an accurate reflection of the entire data set.
Uniformity is a chancy assumption because, to confirm that it holds, you have to process the very data that sampling was supposed to let you avoid.
There are methods to reduce the risks of sampling, but it isn’t possible to tell from IBM’s “patent” which of them, if any, is being used.
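To make the risk concrete, here is a small sketch that estimates the total popcount of a list of words by extrapolating from a sample. It is an illustration of the uniformity assumption only, not the method in the patent. The deliberately naive "first N words" sample, the function names, and the two toy data sets are all my own assumptions.

```python
def popcount(word: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(word).count("1")


def estimate_total_popcount(words, sample_size):
    """Extrapolate from the first sample_size words to the whole list."""
    sample = words[:sample_size]
    mean_bits = sum(popcount(w) for w in sample) / len(sample)
    return mean_bits * len(words)


# Uniform data: every word has exactly 4 set bits.
uniform = [0b10101010] * 1000
# Skewed data: all set bits are clustered in the last 100 words.
skewed = [0b00000000] * 900 + [0b11111111] * 100

results = {}
for name, words in (("uniform", uniform), ("skewed", skewed)):
    exact = sum(popcount(w) for w in words)        # processes all the data
    guess = estimate_total_popcount(words, 100)    # processes 10% of it
    results[name] = (exact, guess)
    print(name, "exact:", exact, "estimate:", guess)
```

On the uniform list the estimate matches the exact total of 4000. On the skewed list the estimate is 0.0 against an exact total of 800, and the only way to discover that is to compute the exact figure, which is the work the sample was meant to save.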