Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 11, 2016

4330 Data Scientists and No Data Science Renee

Filed under: Data Science,Web Scrapers — Patrick Durusau @ 4:22 pm

After I posted 1880 Big Data Influencers in CSV File, I got a tweet from Data Science Renee pointing out that her name wasn’t in the list.

Renee does a lot more on “data science” and not so much on “big data,” which sounded like a plausible explanation.

Even if “plausible,” I wanted to know if there was some issue with my scraping of Right Relevance.

Knowing that Renee’s influence score for “data science” is 81, I set the query to scrape the list between 65 and 98, just to account for any oddities in being listed.

The search returned 1832 entries. Search for Renee, nada, no got. Here’s the 1832-data-science-list.

In an effort to scrape all the listings, which should be 10,375 influencers, I set the page delay up to Ted Cruz reading speed. Ten entries every 72,000 milliseconds. 😉
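For a sense of what that delay costs, here is a rough back-of-the-envelope sketch in Python. It assumes one 72,000 ms pause per page of ten entries and nothing else (no retries, no load time); the function name is mine, not anything from Web Scraper.

```python
import math

TOTAL_INFLUENCERS = 10_375   # expected size of the full listing (from the post)
ENTRIES_PER_PAGE = 10        # entries retrieved per page (from the post)
DELAY_MS = 72_000            # page delay in milliseconds (from the post)

def scrape_duration_hours(total, per_page, delay_ms):
    """Return (pages, hours), assuming exactly one delay per page."""
    pages = math.ceil(total / per_page)
    hours = pages * delay_ms / 1000 / 3600
    return pages, hours

pages, hours = scrape_duration_hours(TOTAL_INFLUENCERS, ENTRIES_PER_PAGE, DELAY_MS)
print(f"{pages} pages, about {hours:.1f} hours")
```

Under that assumption, a complete run would take the better part of a day.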

That resulted in 4330-data-science-list.

No joy, no Renee!
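If you want to repeat the check on a scraped CSV, something like the following would do. It is only a sketch: the post does not show the export's column layout, so the code searches every field, and the sample text is a stand-in built from names mentioned in this post, not the real file.

```python
import csv
import io

def rows_mentioning(csv_text, needle):
    """Return rows where any field contains needle, ignoring case."""
    needle = needle.lower()
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader
            if any(needle in field.lower() for field in row)]

# Stand-in sample, NOT the real export; its columns are an assumption.
sample = "name\nHilary Mason\nNathan Yau\nRandy Olson\n"

print(rows_mentioning(sample, "renee"))   # no matching rows in this sample
print(rows_mentioning(sample, "mason"))
```

For a real file, read its contents into a string (or adapt the function to take an open file) and search for "renee" the same way.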

It isn’t clear to me why my scraping fails before recovering the entire data set. But in any reasonable sort order, a listing of roughly 10K data scientists should have Renee in the first 100 entries, and certainly in the first 1,000 or the first 4K.

Something is clearly amiss with the data but what?

Check me on the first ten entries for data science as the search term but I find:

  • Hilary Mason
  • Kirk Borne – no data science
  • Nathan Yau
  • Gregory Piatetsky – no data science
  • Randy Olson
  • Jeff Hammerbacher – no data science
  • Chris Dixon @cdixon – no data science
  • dj patil @dpatil
  • Doug Laney – no data science
  • Big Data Science – no data science

The notation, “no data science,” means that entry does not have a label for data science. Odd considering that my search was specifically for influencers in “data science.” The same result obtains if you choose one of the labels instead of searching. (I tried.)

Clearly all of these people could be listed for “data science,” but if I am searching for that specific category, why is that label missing from six of the first ten “hits”?
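To check my count, here is the tally of those ten entries as a small Python sketch. The True/False flags simply record what I saw on the page (True means the entry carried a “data science” label); nothing here comes from the site's data directly.

```python
# First ten results as listed above.
# True = entry showed a "data science" label, False = it did not.
first_ten = [
    ("Hilary Mason", True),
    ("Kirk Borne", False),
    ("Nathan Yau", True),
    ("Gregory Piatetsky", False),
    ("Randy Olson", True),
    ("Jeff Hammerbacher", False),
    ("Chris Dixon", False),
    ("dj patil", True),
    ("Doug Laney", False),
    ("Big Data Science", False),
]

missing = [name for name, has_label in first_ten if not has_label]
print(f"{len(missing)} of {len(first_ten)} lack the label")
```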

As for Data Science Renee, I can help you with that to a degree. Follow @BecomingDataSci, or @DataSciGuide, @DataSciLearning & @NewDataSciJobs. Visit her website: http://t.co/zv9NrlxdHO. Podcasts, interviews, posts, just a hive of activity.

On the mysteries of Right Relevance and its data I’m not sure what to say. I posted feedback a week ago mentioning the issue with scraping and ordering, but haven’t heard back.

The site has a very clever idea but looking in from the outside with a sample size of 1, I’m not impressed with its delivery on that idea.

Or are there issues with Web Scraper that I don’t know about?

If you have contacts with Right Relevance could you gently ping them for me? Thanks!
