Archive for the ‘Predictive Analytics’ Category

UC Irvine Extension Announces Winter Predictive Analytics and Info System Courses

Wednesday, December 12th, 2012

UC Irvine Extension Announces Winter Predictive Analytics and Info System Courses

From the post:

Predictive Analytics Certificate Program:

This program is designed for professionals who are using or wish to use Predictive Analytics to optimize business performance at a variety of levels. UC Irvine Extension is offering the following webinar and two courses during winter quarter:

Predictive Analytics Special Topic Webinar: Text Analytics & Text Mining (Jan. 15, 11:30 a.m. to 12:30 p.m., PST) - This free webinar will provide participants with the introductory concepts of text analytics and text mining that are used to recognize how stored, unstructured data represents an extremely valuable source of business information.

Course: Effective Data Preparation (Jan. 7 to Feb. 24) - This online course will address how to extract stored data elements, transform their formats, and derive new relationships among them, in order to produce a dataset suitable for analytical modeling. Course instructor Dr. Robert Nisbet, chief scientist at Smogfarm, which studies crowd psychology, will provide attendees with the skills to produce a fully processed data set compatible for building powerful predictive models.

Course: Text Analytics & Text Mining (Jan. 28 to March 24) - This new online course instructed by Dr. Gary Miner, author of Handbook of Statistical Analysis & Data Mining Applications and Practical Text Mining, will focus on basic concepts of textual information including tokenization and part-of-speech tagging. The course will expose participants to practical techniques for text extraction and text mining, document clustering and classification, information retrieval, and the enhancement of structured data.

Just so you know, the webinar is free but Effective Data Preparation and Text Analytics & Text Mining are $695.00 each.

I am always made curious when an FAQ omits the most obvious questions or tucks the information away in non-prominent places.

I suspect well worth the price but why not be up front with the charges?

BigML creates a marketplace for Predictive Models

Friday, October 26th, 2012

BigML creates a marketplace for Predictive Models by Ajay Ohri.

From the post:

BigML has created a marketplace for selling Datasets and Models. This is a first (?) as the closest market for Predictive Analytics till now was Rapid Miner’s marketplace for extensions (at http://rapidupdate.de:8180/UpdateServer/faces/index.xhtml)

From http://blog.bigml.com/2012/10/25/worlds-first-predictive-marketplace/

SELL YOUR DATA

You can make your Dataset public. Mind you: the Datasets we are talking about are BigML’s fancy histograms. This means that other BigML users can look at your Dataset details and create new models based on this Dataset. But they can not see individual records or columns or use it beyond the statistical summaries of the Dataset. Your Source will remain private, so there is no possibility of anyone accessing the raw data.

SELL YOUR MODEL

Now, once you have created a great model, you can share it with the rest of the world. For free or at any price you set. Predictions are paid for in BigML Prediction Credits. The minimum price is ‘Free’ and the maximum price indicated is 100 credits.

Having a public, digital marketplace for data and data analysis has been proposed by many and attempted by more than just a few.

Data is bought and sold today, but not by the digital equivalent of small shop keepers. The shop keepers who changed the face of Europe.

Data is bought and sold today by the digital equivalent of the great feudal lords. Complete with castles (read silos).

Will BigML give rise to a new mercantile class?

Or just as importantly, will you be a member of it or bound to the estate of a feudal lord?

Fall Lineup: Protest Monitoring, Bin Laden Letters Analysis, … [Defensive Big Data (DBD)]

Friday, August 24th, 2012

Protest Monitoring, Bin Laden Letters Analysis, and Building Custom Applications

OK, not “Fall Lineup” in the TV sense. ;-)

Webinars from Recorded Future in September, 2012.

All start at 11 AM EST.

These webinars should help you learn how data mining looks for clues or how to not leave clues.

Is the term Defensive Big Data (DBD) in common usage?

Think of using Mahout to analyze email traffic so that you can reshape your emails to resemble messages that are routinely ignored.

Predictive Models: Build once, Run Anywhere

Tuesday, August 21st, 2012

Predictive Models: Build once, Run Anywhere

From the post:

We have released a new version of our open source Python bindings. This new version aims at showing how the BigML API can be used to build predictive models capable of generating predictions locally or remotely. You can get full access to the code at Github and read the full documentation at Read the Docs.

The complete list of updates includes (drum roll, please):

Development Mode

We recently introduced a free sandbox to help developers play with BigML on smaller datasets without being concerned about credits. In the new Python bindings you can use BigML in development mode, and all datasets and models smaller than 1MB can be created for free:

from bigml.api import BigML

api = BigML(dev_mode=True)

A “sandbox” for your machine learning experiments!

Day Nine of a Predictive Coding Narrative: A scary search…

Wednesday, August 8th, 2012

Day Nine of a Predictive Coding Narrative: A scary search for false-negatives, a comparison of my CAR with the Griswold’s, and a moral dilemma by Ralph Losey.

From the post:

In this sixth installment I continue my description, this time covering day nine of the project. Here I do a quality control review of a random sample to evaluate my decision in day eight to close the search.

Ninth Day of Review (4 Hours)

I began by generating a random sample of 1,065 documents from the entire null set (95% +/- 3%) of all documents not reviewed. I was going to review this sample as a quality control test of the adequacy of my search and review project. I would personally review all of them to see if any were False Negatives, in other words, relevant documents, and if relevant, whether any were especially significant or Highly Relevant.

I was looking to see if there were any documents left on the table that should have been produced. Remember that I had already personally reviewed all of the documents that the computer had predicted were likely to be relevant (51% probability). I considered the upcoming random sample review of the excluded documents to be a good way to check the accuracy of reliance on the computer’s predictions of relevance.

I know it is not the only way, and there are other quality control measures that could be followed, but this one makes the most sense to me. Readers are invited to leave comments on the adequacy of this method and other methods that could be employed instead. I have yet to see a good discussion of this issue, so maybe we can have one here.

I can appreciate Ralph’s apprehension at a hindsight review of decisions already made. In legal proceedings, decisions are made and they move forward. Some judgements/mistakes can be corrected, others are simply case history.
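For readers wondering where a figure like 1,065 comes from, here is a minimal sketch of the standard sample-size formula for estimating a proportion, with an optional finite population correction. It is not Ralph's calculator, and the size of his null set is not given in the excerpt, so the function name and the example population below are assumptions for illustration only.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size(confidence: float, margin: float,
                population: Optional[int] = None,
                proportion: float = 0.5) -> int:
    """Documents to sample for a given confidence level and margin of error.

    proportion=0.5 is the conservative (worst case) assumption.
    """
    # Two-sided z-score, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size when the population is effectively unlimited.
    n = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Finite population correction: smaller collections need fewer samples.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # 95% confidence, +/- 3% margin, no population correction.
    print(sample_size(0.95, 0.03))
    # Same settings with a hypothetical null set of 600,000 documents.
    print(sample_size(0.95, 0.03, population=600_000))
```

Both calls land within a few documents of the 1,065 quoted above; the small differences come from rounding and from whatever population figure the original tool used.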

Days Seven and Eight of a Predictive Coding Narrative [Re-Use of Analysis?]

Wednesday, August 8th, 2012

Days Seven and Eight of a Predictive Coding Narrative: Where I have another hybrid mind-meld and discover that the computer does not know God by Ralph Losey.

From the post:

In this fifth installment I will continue my description, this time covering days seven and eight of the project. As the title indicates, progress continues and I have another hybrid mind-meld moment. I also discover that the computer does not recognize the significance of references to God in an email. This makes sense logically, but is unexpected and kind of funny when encountered in a document review.

Ralph discovered new terms to use for training as the analysis of the documents progressed.

While Ralph captures those for his use, my question would be how to capture what he learned for re-use?

As in re-use by other parties, perhaps in other litigation.

Thinking of reducing the cost of discovery by sharing analysis of data sets, rather than every discovery process starting at ground zero.

Big Data Machine Learning: Patterns for Predictive Analytics

Tuesday, July 31st, 2012

Big Data Machine Learning: Patterns for Predictive Analytics by Ricky Ho.

A DZone “refcard” and as you might expect, a bit “slim” to cover predictive analytics. Still, printed in full color it would make a nice handout on predictive analytics for a general audience.

What would you add to make a “refcard” on a particular method?

Or for that matter, what would you include to make a “refcard” on popular government resources? Can you name all the fields on the campaign disclosure files? Thought not.

Predictive analytics might not have predicted the Aurora shooter

Saturday, July 28th, 2012

Predictive analytics might not have predicted the Aurora shooter by Robert L. Mitchell.

From the post:

Could aggressive data mining by law enforcement prevent heinous crimes, such as the recent mass murder in Aurora, CO., by catching killers before they can act?

The Aurora shooter certainly left a long trail of transactions. In the two months leading up to the crime he bought more than 6,000 rounds of ammunition, several guns, head-to-toe ballistic protective gear and accelerants and other chemicals used to build homemade explosives. These purchases were made from both online ecommerce sites and brick and mortar stores, and more than 50 packages were sent to his apartment, according to news reports.

Robert injects a note of sanity into recent discussions about data mining and the Aurora shooting by quoting Dean Abbott of Abbott Analytics as saying:

Much as we’d like to think we can solve the problem with technology, it turns out that there is no magic bullet. “Something like this could be valuable,” Abbott says. “I just don’t think it’s obvious that it would be fruitful.”

That would make a good movie script but not much else. (Oh, wait, there is such a movie, Minority Report.)

Predictive analytics are useful in the aggregate, but we already knew that from the Foundation Trilogy (or you could ask your local sociologist).

Days Five and Six of a Predictive Coding Narrative

Friday, July 27th, 2012

Days Five and Six of a Predictive Coding Narrative: Deep into the weeds and a computer mind-meld moment by Ralph Losey.

From the post:

This is my fourth in a series of narrative descriptions of an academic search project of 699,082 Enron emails and attachments. It started as a predictive coding training exercise that I created for Jackson Lewis attorneys. The goal was to find evidence concerning involuntary employee terminations. The first day of search is described in Day One of a Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. The second day is described in Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane. The third and fourth days are described in Days Three and Four of a Predictive Coding Narrative: Where I find that the computer is free to disagree.

In this fourth installment I continue to describe what I did in days five and six of the project. In this narrative I go deep into the weeds and describe the details of multimodal search. Near the end of day six I have an affirming hybrid multimodal mind-meld moment, which I try to describe. I conclude by sharing some helpful advice I received from Joseph White, one of Kroll Ontrack’s (KO) experts on predictive coding and KO’s Inview software. Before I launch into the narrative, a brief word about vendor experts. Don’t worry, it is not going to be a commercial for my favorite vendors; more like a warning based on hard experience.

You will learn a lot about predictive analytics and e-discovery from this series of posts, but these are the most important paragraphs I have read thus far:

When talking to the experts, be sure that you understand what they say to you, and never just nod in agreement when you do not really get it. I have been learning and working with new computer software of all kinds for over thirty years, and am not at all afraid to say that I do not understand or follow something.

Often you cannot follow because the explanation is so poor. For instance, often the words I hear from vendor tech experts are too filled with company specific jargon. If what you are being told makes no sense to you, then say so. Keep asking questions until it does. Do not be afraid of looking foolish. You need to be able to explain this. Repeat back to them what you do understand in your own words until they agree that you have got it right. Do not just be a parrot. Take the time to understand. The vendor experts will respect you for the questions, and so will your clients. It is a great way to learn, especially when it is coupled with hands-on experience.

Insisting that experts explain until you understand what is being said will help you avoid costly mistakes and make you more sympathetic to a client’s questions when you are the expert.

The technology and software for predictive coding will change beyond recognition in a few short years.

Demanding and giving explanations that “explain” is a skill that will last a lifetime.

Days Three and Four of a Predictive Coding Narrative: Where I find that the computer is free to disagree

Friday, July 27th, 2012

Days Three and Four of a Predictive Coding Narrative: Where I find that the computer is free to disagree by Ralph Losey.

From the post:

This is the third in a series of detailed descriptions of a legal search project. The project was an academic training exercise for Jackson Lewis e-discovery liaisons conducted in May and June 2012. I searched a set of 699,082 Enron emails and attachments for possible evidence pertaining to involuntary employee terminations. The first day of search is described in Day One of a Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. The second day is described in Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane.

The description of day-two was short, but it was preceded by a long explanation of my review plan and search philosophy, along with a rant in favor of humanity and against over-dependence on computer intelligence. Here I will just stick to the facts of what I did in days three and four of my search using Kroll Ontrack’s (KO) Inview software.

Interesting description of where Ralph and the computer disagree on relevant/irrelevant judgement on documents.

Unless I just missed it, Ralph is only told by the software what rating a document was given, not why the software arrived at that rating. Yes?

If you knew what terms drove a particular rating, it would be interesting to “comment out” those terms in a document to see the impact on its relevance rating.
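A minimal sketch of that "comment out" experiment, assuming a toy linear scorer stands in for the review software. Nothing here reflects how Inview actually rates documents; the term weights, the bias and the function names are all hypothetical.

```python
import math

# Hypothetical term weights; a real classifier would learn these from training.
WEIGHTS = {"termination": 2.0, "severance": 1.5, "lunch": -1.0}
BIAS = -1.0


def relevance(tokens):
    """Toy relevance rating: logistic function of summed term weights."""
    score = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1 / (1 + math.exp(-score))


def term_impacts(text):
    """Remove each distinct term in turn and record the drop in rating."""
    tokens = text.lower().split()
    base = relevance(tokens)
    impacts = {}
    for term in set(tokens):
        remaining = [t for t in tokens if t != term]
        # Positive impact: the term was pushing the rating up.
        impacts[term] = base - relevance(remaining)
    return base, impacts


if __name__ == "__main__":
    base, impacts = term_impacts("severance after termination")
    print(f"base rating: {base:.3f}")
    for term, delta in sorted(impacts.items(), key=lambda kv: -kv[1]):
        print(f"{term:12s} {delta:+.3f}")
```

With a black-box rating you could run the same loop by re-submitting the edited document, at the cost of one rating request per term.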

Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane

Friday, July 13th, 2012

Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane by Ralph Losey.

From the post:

Day One of the search project ended when I completed review of the initial 1,507 machine-selected documents and initiated the machine learning. I mentioned in the Day One narrative that I would explain why the sample size was that high. I will begin with that explanation and then, with the help of William Webber, go deeper into math and statistical sampling than ever before. I will also give you the big picture of my review plan and search philosophy: it’s hybrid and multimodal. Some search experts disagree with my philosophy. They think I do not go far enough to fully embrace machine coding. They are wrong. I will explain why and rant on in defense of humanity. Only then will I conclude with the Day Two narrative.

More than you are probably going to want to know about sample sizes and their calculation but persevere until you get to the defense of humanity stuff. It is all quite good.

If I had to add a comment on the defense of humanity rant, it would be that machines have a flat view of documents and not the richly textured one of a human reader. While it is true that machines can rapidly compare documents without tiring, they will miss an executive referring to a secretary as his “cupcake.” A reference that would jump out at a human reader. Same text, different result.

Perhaps because in one case the text is being scanned for tokens and in the other case it is being read.

Day One of a Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron

Friday, July 13th, 2012

Day One of a Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron by Ralph Losey.

The start of a series of posts on predictive coding and searching of the Enron emails by a lawyer. A legal perspective is important enough that I will be posting a note about each post in this series as they occur.

A couple of preliminary notes:

I am sure this is the first time that Ralph has used predictive coding with the Enron emails. On the other hand, I would not accept “…this is the first time for X…” sort of claims from any vendor or service organization. ;-)

You can see other examples of processing the Enron emails at:

And that is just a “lite” scan. There are numerous other projects that use the Enron email collection.

I wonder if that is because we are naturally nosey?

From the post:

This is the first in a series of narrative descriptions of a legal search project using predictive coding. Follow along while I search for evidence of involuntary employee terminations in a haystack of 699,082 Enron emails and attachments.

Joys and Risks of Being First

To the best of my knowledge, this writing project is another first. I do not think anyone has ever previously written a blow-by-blow, detailed description of a large legal search and review project of any kind, much less a predictive coding project. Experts on predictive coding speak only from a mile high perspective; never from the trenches (you can speculate why). That has been my practice here, until now, and also my practice when speaking about predictive coding on panels or in various types of conferences, workshops, and classes.

There are many good reasons for this, including the main one that lawyers cannot talk about their client’s business or information. That is why in order to do this I had to run an academic project and search and review the Enron data. Many people could do the same. In fact, each year the TREC Legal Track participants do similar search projects of Enron data. But still, no one has taken the time to describe the details of their search, not even the spacey TRECkies (sorry Jason).

A search project like this takes an enormous amount of time. In fact, to my knowledge (Maura, please correct me if I’m wrong), no Legal Track TRECkies have ever recorded and reported the time that they put into the project, although there are rumors. In my narrative I will report the amount of time that I put into the project on a day-by-day basis, and also, sometimes, on a per task basis. I am a lawyer. I live by the clock and have done so for thirty-two years. Time is important to me, even non-money time like this. There is also a not-insignificant amount of time it takes to write up a narrative like this. I did not attempt to record that.

There is one final reason this has never been attempted before, and it is not trivial: the risks involved. Any narrator who publicly describes their search efforts assumes the risk of criticism from Monday morning quarterbacks about how the sausage was made. I get that. I think I can handle the inevitable criticism. A quote that Jason R. Baron turned me on to a couple of years ago helps, the famous line from Theodore Roosevelt in his Man in the Arena speech at the Sorbonne:

It is not the critic who counts: not the man who points out how the strong man stumbles or where the doer of deeds could have done better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood, who strives valiantly, who errs and comes up short again and again, because there is no effort without error or shortcoming, but who knows the great enthusiasms, the great devotions, who spends himself for a worthy cause; who, at the best, knows, in the end, the triumph of high achievement, and who, at the worst, if he fails, at least he fails while daring greatly, so that his place shall never be with those cold and timid souls who knew neither victory nor defeat.

I know this narrative is no high achievement, but we all do what we can, and this seems within my marginal capacities.

BigMl 0.3.1 Release

Friday, July 6th, 2012

BigMl 0.3.1 Release

From the webpage:

An open source binding to BigML.io, the public BigML API

Downloads

BigML makes machine learning easy by taking care of the details required to add data-driven decisions and predictive power to your company. Unlike other machine learning services, BigML creates beautiful predictive models that can be easily understood and interacted with.

These BigML Python bindings allow you to interact with BigML.io, the API for BigML. You can use it to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models, and predictions).

There’s that phrase again, predictive models.

Don’t people read patent literature anymore? ;-) I don’t care for absurdist fiction so I tend to avoid it. People claiming invention for having a patent lawyer write common art up in legal prose. Good for patent lawyers, bad for researchers and true inventors.

Predictive Analytics World

Saturday, June 30th, 2012

Predictive Analytics World

I mention a patent on “predictive coding” and now a five (5) day conference on predictive analytics?

The power of blogging? Or self-delusion. Your call. ;-)

Seriously, if you are interested in predictive analytics, this looks like a good opportunity to learn more.

It has all the earmarks of a “vendor” conference so I predict you will be spending money but the contacts and basic information should be worth your while.

Suggestions for other predictive analytics resources that aren’t vendor posturing and are useful as a general introduction?

Reasoning that if it is information, then you should be using a topic map to either trail blaze or navigate it.

Predictive Coding Patented, E-Discovery World Gets Jealous

Wednesday, June 27th, 2012

Predictive Coding Patented, E-Discovery World Gets Jealous by Christopher Danzig

From the post:

The normally tepid e-discovery world felt a little extra heat of competition yesterday. Recommind, one of the larger e-discovery vendors, announced Wednesday that it was issued a patent on predictive coding (which Gabe Acevedo, writing in these pages, named the Big Legal Technology Buzzword of 2011).

In a nutshell, predictive coding is a relatively new technology that allows large chunks of document review to be automated, a.k.a. done mostly by computers, with less need for human management.

Some of Recommind’s competitors were not happy about the news. See how they responded (grumpily), and check out what Recommind’s General Counsel had to say about what this means for everyone who uses e-discovery products….

Predictive coding has received a lot of coverage recently as a new way to save buckets of money during document review (a seriously expensive endeavor, for anyone who just returned to Earth).

I am always curious why a patent or even patent number will be cited but no link to the patent given?

In case you are curious, it is patent 7,933,859, as a hyperlink.

The abstract reads:

Systems and methods for analyzing documents are provided herein. A plurality of documents and user input are received via a computing device. The user input includes hard coding of a subset of the plurality of documents, based on an identified subject or category. Instructions stored in memory are executed by a processor to generate an initial control set, analyze the initial control set to determine at least one seed set parameter, automatically code a first portion of the plurality of documents based on the initial control set and the seed set parameter associated with the identified subject or category, analyze the first portion of the plurality of documents by applying an adaptive identification cycle, and retrieve a second portion of the plurality of documents based on a result of the application of the adaptive identification cycle test on the first portion of the plurality of documents.

If that sounds familiar to you, you are not alone.

Predictive coding, developed over the last forty years, is an excellent feed into a topic map. As a matter of fact, it isn’t hard to imagine a topic map seeding and being augmented by a predictive coding process.

I also mention it as a caution that the IP in this area, as in many others, is beset by the ordinary being approved as innovation.

A topic map would be ideal to trace claims, prior art and to attach analysis to a patent. I saw several patents assigned to Recommind and some pending applications. When I have a moment I will post a listing with links to those documents.

I first saw this at Beyond Search.

Predictive Analytics: Evaluate Model Performance

Sunday, June 24th, 2012

Predictive Analytics: Evaluate Model Performance by Ricky Ho.

Ricky finishes his multi-part series on models for machine learning with the one question left hanging:

OK, so which model should I use?

In previous posts, we have discussed various machine learning techniques including linear regression with regularization, SVM, Neural network, Nearest neighbor, Naive Bayes, Decision Tree and Ensemble models. How do we pick which model is the best? Or even whether the model we pick is better than a random guess? In this post, we will cover how we evaluate the performance of the model and what we can do next to improve the performance.

Best guess with no model

First of all, we need to understand the goal of our evaluation. Are we trying to pick the best model? Are we trying to quantify the improvement of each model? Regardless of our goal, I found it is always useful to think about what the baseline should be. Usually the baseline is your best guess if you don’t have a model.

For classification problem, one approach is to do a random guess (with uniform probability) but a better approach is to guess the output class that has the largest proportion in the training samples. For regression problem, the best guess will be the mean of output of training samples.

Ricky walks you through the steps and code to make an evaluation of each model.

It is always better to have evidence that your choices were better than a coin flip.
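The two "no model" baselines Ricky describes can be sketched in a few lines. His post works in R; this is a Python stand-in, and the function names and the made-up labels are assumptions for illustration, not his code.

```python
from collections import Counter
from statistics import mean


def majority_class_baseline(train_labels):
    """Classification baseline: always guess the most common training class."""
    return Counter(train_labels).most_common(1)[0][0]


def mean_baseline(train_targets):
    """Regression baseline: always guess the mean of the training outputs."""
    return mean(train_targets)


def accuracy(predictions, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up labels, purely for illustration.
    train = ["ham", "ham", "ham", "spam"]
    test = ["ham", "spam", "ham", "ham"]
    guess = majority_class_baseline(train)
    print(guess, accuracy([guess] * len(test), test))
```

Any model worth keeping should beat the accuracy (or error) of these guesses on held-out data.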

Although, I am mindful of the wealth advice story in “Thinking, Fast and Slow” by Daniel Kahneman, where he was given eight years of investment outcomes for 25 wealth advisers. The results indicated there was no correlation between “skill” and the outcomes. Luck and not skill was being rewarded with bonuses.

The results were ignored by both management and advisers as inconsistent with their “…personal impressions from experience.” (pp. 215-216)

Do you think the same can be said of search results? Just curious.

Predictive Analytics: Generalized Linear Regression [part 3]

Sunday, June 3rd, 2012

Predictive Analytics: Generalized Linear Regression by Ricky Ho.

From the post:

In the previous 2 posts, we have covered how to visualize input data to explore strong signals as well as how to prepare input data to a form that is suitable for learning. In this and subsequent posts, I’ll go through various machine learning techniques to build our predictive model.

  1. Linear regression
  2. Logistic regression
  3. Linear and Logistic regression with regularization
  4. Neural network
  5. Support Vector Machine
  6. Naive Bayes
  7. Nearest Neighbor
  8. Decision Tree
  9. Random Forest
  10. Gradient Boosted Trees

There are two general types of problems that we are interested in in this discussion: Classification is about predicting a category (a value that is discrete and finite, with no ordering implied), while Regression is about predicting a numeric quantity (a value that is continuous and infinite, with ordering).

For classification problem, we use the “iris” data set and predict its “species” from its “width” and “length” measures of sepals and petals. Here is how we setup our training and testing data.

Ricky walks you through linear regression, logistic regression and linear and logistic regression with regularization.
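The setup code itself is not reproduced in the excerpt (and Ricky's version is in R). As a rough Python sketch of the idea, here is a shuffled train/test split using only the standard library. The 80/20 split, the seed and the function name are assumptions; the row indices stand in for the 150 rows of the iris data set.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = rows[:]         # leave the caller's list untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Stand-in for the 150 iris rows; real rows would hold the measurements.
    rows = list(range(150))
    train, test = train_test_split(rows)
    print(len(train), len(test))
```

Shuffling before splitting matters for iris in particular, since the file is usually ordered by species.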

Predictive Analytics: Data Preparation [part 2]

Friday, May 18th, 2012

Predictive Analytics: Data Preparation by Ricky Ho.

From the post:

As a continuation of my last post on predictive analytics, in this post I will focus on describing how to prepare data for training the predictive model. I will cover how to perform the necessary sampling to ensure the training data is representative and fits into the machine processing capacity. Then we validate the input data and perform the necessary cleanup of format errors, fill in missing values and finally transform the collected data into our defined set of input features.

Different machine learning models have their own requirements for input and output data types. Therefore, we may need to perform additional transformations to fit the model requirements.

Part 2 of Ricky’s posts on predictive analytics.
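Two of the steps named in the excerpt, sampling and filling in missing values, can be sketched as follows. This is a Python illustration rather than Ricky's R code; the record layout, the mean-fill strategy and the function names are assumptions.

```python
import random
from statistics import mean


def sample_rows(rows, k, seed=0):
    """Draw a simple random sample of at most k rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def fill_missing(rows, column):
    """Replace None in a numeric column with the mean of the known values."""
    known = [r[column] for r in rows if r.get(column) is not None]
    fill = mean(known)
    # Build new dicts so the original records are not modified.
    return [{**r, column: fill if r.get(column) is None else r[column]}
            for r in rows]


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    rows = [{"age": 30.0}, {"age": None}, {"age": 50.0}]
    print(fill_missing(rows, "age"))
    print(sample_rows(rows, 2))
```

Mean-fill is only one option; dropping the row or filling with the median are common alternatives when the column is skewed.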

Predictive Analytics: Overview and Data visualization

Friday, May 18th, 2012

Predictive Analytics: Overview and Data visualization by Ricky Ho.

From the post:

I plan to start a series of blog posts on predictive analytics as there is an increasing demand for applying machine learning techniques to analyze large amounts of raw data. This set of techniques is very useful to me and I think it should be useful to other people as well. I will also go through some coding examples in R. R is a statistical programming language that is very useful for performing predictive analytic tasks. In case you are not familiar with R, here is a very useful link to get some familiarity in R.

Predictive Analytics is a set of specialized data processing techniques focused on solving the problem of predicting future outcomes based on analyzing previously collected data. The processing cycle typically involves two phases of processing:

  1. Training phase: Learn a model from training data
  2. Predicting phase: Deploy the model to production and use that to predict the unknown or future outcome

The whole lifecycle of training involves the following steps.

Ricky has already posted part 2 but I am going to create separate entries for them. Mostly to make sure I don’t miss any of his posts.

Enjoy!

Data mining opens the door to predictive neuroscience (Google Hazing Rituals)

Tuesday, April 17th, 2012

Data mining opens the door to predictive neuroscience

From the post:

Ecole Polytechnique Fédérale de Lausanne (EPFL) researchers have discovered rules that relate the genes that a neuron switches on and off to the shape of that neuron, its electrical properties, and its location in the brain.

The discovery, using state-of-the-art computational tools, increases the likelihood that it will be possible to predict much of the fundamental structure and function of the brain without having to measure every aspect of it.

That in turn makes modeling the brain in silico — the goal of the proposed Human Brain Project — a more realistic, less Herculean, prospect.

The fulcrum of predictive analytics is finding the “basis” for prediction and within what margin of error.

Curious how that would work in an employment situation?

Rather than Google’s intellectual hazing rituals, project the answers to a thirty-minute questionnaire given to Google hires against their evaluations at six-month intervals. Give prospective hires the same questionnaire and then make “up” or “down” decisions on hiring. Likely to be as accurate as the current rituals.
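
A rough sketch of how the questionnaire idea could be checked, assuming a single numeric questionnaire score and a good/poor evaluation for each past hire. All of the data and the names `fit_threshold` and `error_rate` are invented for illustration; nothing here reflects how Google actually hires.

```python
def fit_threshold(scores, outcomes):
    """Pick the score cutoff that classifies the most past hires correctly."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(scores)):
        # A hire is predicted "good" when the score meets the cutoff.
        correct = sum((s >= cutoff) == o for s, o in zip(scores, outcomes))
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def error_rate(cutoff, scores, outcomes):
    """Fraction of hires whose predicted outcome disagrees with the evaluation."""
    wrong = sum((s >= cutoff) != o for s, o in zip(scores, outcomes))
    return wrong / len(scores)


# Hypothetical past hires: questionnaire score and whether evaluations were good.
train_scores = [35, 42, 50, 58, 63, 71, 80, 88]
train_good = [False, False, False, True, False, True, True, True]
cutoff = fit_threshold(train_scores, train_good)

# Hires held back from fitting, used only to measure the error.
test_scores = [40, 55, 66, 90]
test_good = [False, True, True, True]
holdout_error = error_rate(cutoff, test_scores, test_good)

# "Up" or "down" for a prospective hire with a hypothetical score of 75.
decision = "up" if 75 >= cutoff else "down"
print(cutoff, holdout_error, decision)
```

The held-out error is the number to set beside whatever error rate the current rituals have.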

Updated Google Prediction API

Saturday, March 17th, 2012

Updated Google Prediction API

From the post:

Although we can’t reliably compare its future-predicting abilities to a crystal ball, the Google Prediction API unlocks a powerful mechanism to use machine learning in your applications.

The Prediction API allows developers to train their own predictive models, taking advantage of Google’s world-class machine learning algorithms. It can be used for all sorts of classification and recommendation problems from spam detection to message routing decisions. In the latest release, the Prediction API has added more detailed debugging information on trained models and a new App Engine sample, which illustrates how to use the Google Prediction API for the Java and Python runtimes.

To help App Engine developers get started with the prediction API, we’ve published an article and walkthrough detailing how to create and manage predictive models in App Engine apps with simple authentication using OAuth2 and service accounts. Check out the walkthrough and let us know what you think on the group. Happy coding!

OK, so what do I do when I leave my crystal ball at home?

Oh, that is why this is on the “cloud” I suppose. ;-)

Are you using the Google Prediction API? Would appreciate hearing from satisfied/unsatisfied users. Certainly the sort of thing that could be important in authoring/curating a topic map.

Target, Pregnancy and Predictive Analytics (parts 1 and 2)

Thursday, March 1st, 2012

Dean Abbott wrote a pair of posts on a New York Times article about Target predicting whether customers are pregnant.

Target, Pregnancy and Predictive Analytics (part 1)

Target, Pregnancy and Predictive Analytics (part 2)

Read both. I truly liked his conclusion that models give us the patterns in data, but it is up to us to “recognize” the patterns as significant.

BTW, I do wonder what the difference is between the New York Times snooping for secrets to sell newspapers and Target snooping to sell products. If you know, please give a shout!

Nice article on predictive analytics in insurance

Sunday, January 8th, 2012

Nice article on predictive analytics in insurance

James Taylor writes:

Patrick Sugent wrote a nice article on A Predictive Analytics Arsenal in claims magazine recently. The article is worth a read and, if this is a topic that interests you, check out our white paper on next generation claims systems or the series of blog posts on decision management in insurance that I wrote after I did a webinar with Deb Smallwood (an insurance industry expert quoted in the article).

The article is nice but I thought the white paper was better. Particularly this passage:

Next generation claims systems with Decision Management focus on the decisions in the claims process. These decisions are managed as reusable assets and made widely available to all channels, processes and systems via Decision Services. A decision-centric approach enables claims feedback and experience to be integrated into the whole product life cycle and brings the company’s know-how and expertise to bear at every step in the claims process.

At the heart of this new mindset is an approach for replacing decision points with Decision Services and improving business performance by identifying the key decisions that drive value in the business and improving on those decisions by leveraging a company’s expertise, data and existing systems.

Insurers are adopting Decision Management to build next generation claims systems that improve claims processes.

In topic map lingo, “next generation claims systems” are going to treat decisions as subjects that can be identified and re-used to improve the process.

Decisions are made every day in claims processing, but current systems don’t identify them as subjects, and so re-use simply isn’t possible.

True enough, the proposal in the white paper does not allow for merging of decisions identified by others, but that doesn’t look like a requirement in their case. They need to be able to identify decisions they make and feed them back into their systems.
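
To make “decisions as subjects” concrete, here is a small sketch in which each decision has an identifier, can be called from any channel, and records its outcomes for feedback. The `Decision` class, the `register` helper, the identifier `claims/fast-track` and the 1000 cutoff are all hypothetical; the white paper does not specify an implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Decision:
    """A decision treated as an identified, reusable subject."""
    identifier: str
    rule: Callable[[dict], str]
    history: List[Tuple[dict, str]] = field(default_factory=list)

    def decide(self, claim: dict) -> str:
        outcome = self.rule(claim)
        # Keep every outcome so it can be fed back into later analysis.
        self.history.append((claim, outcome))
        return outcome


registry: Dict[str, Decision] = {}


def register(decision: Decision) -> Decision:
    """Make a decision available under its identifier, refusing duplicates."""
    if decision.identifier in registry:
        raise ValueError(f"decision already registered: {decision.identifier}")
    registry[decision.identifier] = decision
    return decision


# Hypothetical rule: small claims are fast-tracked, the rest get manual review.
register(Decision(
    "claims/fast-track",
    lambda claim: "fast-track" if claim["amount"] < 1000 else "manual-review",
))

# Two different channels re-use the same identified decision.
web_result = registry["claims/fast-track"].decide({"amount": 250})
phone_result = registry["claims/fast-track"].decide({"amount": 4000})
print(web_result, phone_result, len(registry["claims/fast-track"].history))
```

Because the rule lives behind an identifier rather than inside each calling system, it can be changed in one place, which is exactly where the question of who gets to change it comes in.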

The other thing I liked about the white paper was the recognition that hard coding decision rules by IT is a bad idea. (full stop) You can take that one to the bank.

Of course, remember what James says about changes:

Most policies and regulations are written up as requirements and then hard-coded after waiting in the IT queue, making changes slow and costly.

But he omits that hard-coding empowers IT because any changes have to come to IT for implementation.

Making changes possible by someone other than IT will empower that someone else and diminish IT.

Who knows what, and when they get to know it, is a question of power.

Topic maps and other means of documentation/disclosure have the potential to shift balances of power in an organization.

May as well say that up front so we can start identifying the players, who will cooperate, who will resist. And experimenting with what might work as incentives to promote cooperation. Which can be measured just like you measure other processes in a business.