Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

June 28, 2015

Domain Modeling: Choose your tools

Filed under: Data Models,Domain Change,Domain Driven Design,Domain Expertise — Patrick Durusau @ 4:20 pm

Kirk Borne posted to Twitter:

Great analogy by @wcukierski at #GEOINT2015 on #DataScience Domain Modeling > bulldozers: toy model versus the real thing.

[Images: a real bulldozer and a toy bulldozer.]

Does your tool adapt to the data? (The real bulldozer above.)

Or, do you adapt your data to the tool? (The toy bulldozer above.)

No, I’m not going there. That is like a “best editor” flame war. You have to decide that question for yourself and your project.

Good luck!

April 6, 2012

Is Machine Learning v Domain expertise the wrong question?

Filed under: Domain Expertise,Machine Learning — Patrick Durusau @ 6:48 pm

Is Machine Learning v Domain expertise the wrong question?

James Taylor writes:

KDNuggets had an interesting poll this week in which readers expressed themselves as Skeptical of Machine Learning replacing Domain Expertise. This struck me not because I disagree but because I think it is in some ways the wrong question:

  • Any given decision is made based on a combination of information, know-how and pre-cursor decisions.
  • The know-how can be based on policy, regulation, expertise, best practices or analytic insight (such as machine learning).
  • Some decisions are heavily influenced by policy and regulation (deciding if a claim is complete and valid for instance) while others are more heavily influenced by the kind of machine learning insight common in analytics (deciding if the claim is fraudulent might be largely driven by a Neural Network that determines how “normal” the claim seems to be).
  • Some decisions are driven primarily by the results of pre-cursor or dependent decisions.
  • All require access to some set of information.

I think the stronger point, the one James closes with, is that decision management needs machine learning and domain expertise together.

And we find our choices of approaches justified by the results, “as we see them.” What more could you ask for?

March 26, 2012

The unreasonable necessity of subject experts

Filed under: Data Mining,Domain Expertise,Subject Experts — Patrick Durusau @ 6:40 pm

The unreasonable necessity of subject experts – Experts make the leap from correct results to understood results by Mike Loukides.

From the post:

One of the highlights of the 2012 Strata California conference was the Oxford-style debate on the proposition “In data science, domain expertise is more important than machine learning skill.” If you weren’t there, Mike Driscoll’s summary is an excellent overview (full video of the debate is available here). To make the story short, the “cons” won; the audience was won over to the side that machine learning is more important. That’s not surprising, given that we’ve all experienced the unreasonable effectiveness of data. From the audience, Claudia Perlich pointed out that she won data mining competitions on breast cancer, movie reviews, and customer behavior without any prior knowledge. And Pete Warden (@petewarden) made the point that, when faced with the problem of finding “good” pictures on Facebook, he ran a data mining contest at Kaggle.

A good impromptu debate necessarily raises as many questions as it answers. Here’s the question that I was left with. The debate focused on whether domain expertise was necessary to ask the right questions, but a recent Guardian article, “The End of Theory,” asked a different but related question: Do we need theory (read: domain expertise) to understand the results, the output of our data analysis? The debate focused on a priori questions, but maybe the real value of domain expertise is a posteriori: after-the-fact reflection on the results and whether they make sense. Asking the right question is certainly important, but so is knowing whether you’ve gotten the right answer and knowing what that answer means. Neither problem is trivial, and in the real world, they’re often closely coupled. Often, the only way to know you’ve put garbage in is that you’ve gotten garbage out.

By the same token, data analysis frequently produces results that make too much sense. It yields data that merely reflects the biases of the organization doing the work. Bad sampling techniques, overfitting, cherry picking datasets, overly aggressive data cleaning, and other errors in data handling can all lead to results that are either too expected or unexpected. “Stupid Data Miner Tricks” is a hilarious send-up of the problems of data mining: It shows how to “predict” the value of the S&P index over a 10-year period based on butter production in Bangladesh, cheese production in the U.S., and the world sheep population.
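The “Stupid Data Miner Tricks” point is easy to reproduce: with enough unrelated series to choose from, something will correlate with your target purely by chance. A minimal sketch (my own illustration, not from the post; the synthetic data and names are invented) in Python:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "index-like" target: 10 yearly observations of a random walk.
target = np.cumsum(rng.normal(size=10))

# 1,000 unrelated candidate "predictors" (butter, cheese, sheep, ...),
# each just an independent random walk with no relation to the target.
candidates = np.cumsum(rng.normal(size=(1000, 10)), axis=1)

# Cherry-pick the candidate most correlated with the target.
correlations = [abs(np.corrcoef(c, target)[0, 1]) for c in candidates]
best = int(np.argmax(correlations))

print(f"Best spurious predictor: #{best}, |r| = {correlations[best]:.2f}")
# With only 10 observations and 1,000 candidates, a high |r| is common,
# yet the "relationship" predicts nothing out of sample.
```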

An interesting post and debate. Both worth the time to read/watch.

I am not surprised the “cons” won, arguing that machine learning is more important than subject expertise, but not for the reasons Mike gives.

True enough, data is said to be “unreasonably” effective, but when judged against what?

When asked, 90% of all drivers think they are better than average drivers. If I remember averages, there is something wrong with that result. 😉

The trick, according to Daniel Kahneman, is that drivers create an imaginary average driver and then say they are better than that.

I wonder what “average” data is being evaluated against?
