Archive for the ‘Domain Expertise’ Category

Domain Modeling: Choose your tools

Sunday, June 28th, 2015

Kirk Borne posted to Twitter:

Great analogy by @wcukierski at #GEOINT2015 on #DataScience Domain Modeling > bulldozers: toy model versus the real thing.

[Image: a real bulldozer]

[Image: a toy bulldozer]

Does your tool adapt to the data? (The real bulldozer above.)

Or, do you adapt your data to the tool? (The toy bulldozer above.)

No, I’m not going there. That is like a “best editor” flame war. You have to decide that question for yourself and your project.

Good luck!

Is Machine Learning v Domain expertise the wrong question?

Friday, April 6th, 2012

Is Machine Learning v Domain expertise the wrong question?

James Taylor writes:

KDNuggets had an interesting poll this week in which readers expressed themselves as Skeptical of Machine Learning replacing Domain Expertise. This struck me not because I disagree but because I think it is in some ways the wrong question:

  • Any given decision is made based on a combination of information, know-how and pre-cursor decisions.
  • The know-how can be based on policy, regulation, expertise, best practices or analytic insight (such as machine learning).
  • Some decisions are heavily influenced by policy and regulation (deciding if a claim is complete and valid for instance) while others are more heavily influenced by the kind of machine learning insight common in analytics (deciding if the claim is fraudulent might be largely driven by a Neural Network that determines how “normal” the claim seems to be).
  • Some decisions are driven primarily by the results of pre-cursor or dependent decisions.
  • All require access to some set of information.

I think the stronger point, the one that James closes with, is that decision management needs machine learning and domain expertise together.

And we find our choices of approaches justified by the results, “as we see them.” What more could you ask for?
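To make James’s fraud example concrete: below is a minimal sketch, in Python, of scoring how “normal” a claim looks by the reconstruction error of a small neural network trained on ordinary claims. It is my illustration, not James’s system or anyone’s production scorer; the feature names, the numbers, and the use of scikit-learn’s MLPRegressor as a tiny autoencoder are all assumptions made for the example.

    # Sketch only (not James Taylor's system): score how "normal" a claim
    # looks via the reconstruction error of a small neural network trained
    # to reproduce ordinary claims. Features and values are invented.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Hypothetical claim features: amount, days to report, prior claim count
    ordinary_claims = rng.normal(loc=[1200.0, 5.0, 1.0],
                                 scale=[300.0, 2.0, 1.0],
                                 size=(500, 3))

    scaler = StandardScaler()
    X = scaler.fit_transform(ordinary_claims)

    # Train the network to reproduce its input; claims it reconstructs
    # poorly are the ones that look least "normal".
    autoencoder = MLPRegressor(hidden_layer_sizes=(2,), max_iter=5000,
                               random_state=0)
    autoencoder.fit(X, X)

    def abnormality(claim):
        x = scaler.transform(np.atleast_2d(claim))
        return float(np.mean((autoencoder.predict(x) - x) ** 2))

    print(abnormality([1250.0, 4.0, 1.0]))    # typical claim: low score
    print(abnormality([90000.0, 60.0, 9.0]))  # unusual claim: far higher score

Domain expertise comes back in exactly where the score stops: someone still has to decide whether a high score means fraud or just a rare but legitimate claim.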

The unreasonable necessity of subject experts

Monday, March 26th, 2012

The unreasonable necessity of subject experts – Experts make the leap from correct results to understood results by Mike Loukides.

From the post:

One of the highlights of the 2012 Strata California conference was the Oxford-style debate on the proposition “In data science, domain expertise is more important than machine learning skill.” If you weren’t there, Mike Driscoll’s summary is an excellent overview (full video of the debate is available here). To make the story short, the “cons” won; the audience was won over to the side that machine learning is more important. That’s not surprising, given that we’ve all experienced the unreasonable effectiveness of data. From the audience, Claudia Perlich pointed out that she won data mining competitions on breast cancer, movie reviews, and customer behavior without any prior knowledge. And Pete Warden (@petewarden) made the point that, when faced with the problem of finding “good” pictures on Facebook, he ran a data mining contest at Kaggle.

A good impromptu debate necessarily raises as many questions as it answers. Here’s the question that I was left with. The debate focused on whether domain expertise was necessary to ask the right questions, but a recent Guardian article, “The End of Theory,” asked a different but related question: Do we need theory (read: domain expertise) to understand the results, the output of our data analysis? The debate focused on a priori questions, but maybe the real value of domain expertise is a posteriori: after-the-fact reflection on the results and whether they make sense. Asking the right question is certainly important, but so is knowing whether you’ve gotten the right answer and knowing what that answer means. Neither problem is trivial, and in the real world, they’re often closely coupled. Often, the only way to know you’ve put garbage in is that you’ve gotten garbage out.

By the same token, data analysis frequently produces results that make too much sense. It yields data that merely reflects the biases of the organization doing the work. Bad sampling techniques, overfitting, cherry picking datasets, overly aggressive data cleaning, and other errors in data handling can all lead to results that are either too expected or unexpected. “Stupid Data Miner Tricks” is a hilarious send-up of the problems of data mining: It shows how to “predict” the value of the S&P index over a 10-year period based on butter production in Bangladesh, cheese production in the U.S., and the world sheep population.
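The “Stupid Data Miner Tricks” effect is easy to reproduce for yourself. The sketch below (mine, not from Mike’s post) generates thousands of series of pure noise and picks the one that best “predicts” an equally random target; the 10,000-series count and the correlation it finds are illustrative assumptions, not a claim about the S&P or butter data.

    # Sketch of the "Stupid Data Miner Tricks" point: comb through enough
    # unrelated series and one of them will "predict" your target by chance.
    # Everything here is random noise; no real market or butter data is used.
    import numpy as np

    rng = np.random.default_rng(42)
    target = rng.normal(size=120)                # stand-in for ten years of monthly returns
    candidates = rng.normal(size=(10_000, 120))  # 10,000 series unrelated to it

    # Correlation of each candidate series with the target
    corrs = np.array([np.corrcoef(series, target)[0, 1] for series in candidates])
    best = int(np.argmax(np.abs(corrs)))

    print(f"Best |correlation| found in pure noise: {abs(corrs[best]):.2f}")
    # With this many candidates, the winner typically lands near 0.4,
    # which looks impressive if you forget how it was selected.

Domain knowledge is what tells you the butter series never belonged in the model in the first place, no matter how good the fit looks.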

An interesting post and debate. Both worth the time to read/watch.

I am not surprised the “cons” won, that is, that machine learning was judged more important than subject expertise, but not for the reasons Mike gives.

True enough, data is said to be “unreasonably” effective, but when judged against what?

When asked, 90% of all drivers think they are better than the average driver. If I remember how averages work, there is something wrong with that result. 😉

The trick, according to Daniel Kahneman, is that drivers create an imaginary average driver and then say they are better than that.

I wonder what “average” the effectiveness of data is being measured against?