600 websites about R by Laetitia Van Cauwenberge.
From the post:
Anyone interested in categorizing them? It could be an interesting data science project: scraping these websites, extracting keywords, and categorizing them with a simple indexation or tagging algorithm. For instance, some of these blogs focus on stats, or Bayesian stats, or R libraries, or R training, or visualization, or anything else. This indexation technique was used here to classify 2,500 data science websites. For web crawling tutorials, click here or here.
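To make the suggested project concrete, here is a minimal sketch of the scrape-and-extract-keywords step in R, assuming the rvest package is installed. The URL is a hypothetical placeholder standing in for one of the 600 listed sites, and the stop word list is a toy stand-in for a real one:

```r
library(rvest)

url  <- "https://example-r-blog.com"   # hypothetical placeholder URL
page <- read_html(url)                 # fetch and parse the page

# Pull visible paragraph text and tokenize into lowercase words
text  <- html_text2(html_elements(page, "p"))
words <- unlist(strsplit(tolower(paste(text, collapse = " ")), "[^a-z]+"))

# Drop short tokens and a few common stop words, then count frequencies
stopwords <- c("the", "and", "for", "with", "that", "this", "are", "you")
words     <- words[nchar(words) > 2 & !(words %in% stopwords)]
keywords  <- sort(table(words), decreasing = TRUE)

head(keywords, 10)  # top candidate tags for a simple indexation scheme
```

Run over all 600 sites, the top keywords per site would give the raw material for the tagging algorithm the post has in mind.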
BTW, Laetitia lists, with links, all 600 R sites.
How many of those R sites will you visit?
Or will you scan the list for your site or your favorite R site?
For that matter, how much duplicated content are you going to find at those R sites?
All of them have some unique content, but neither an index nor a classification will help you find it.
Thinking of this as a potential data science experiment, we have a list of 600 sites with content related to R.
What would be your next step toward avoiding duplicate content?
By what criteria would you judge “success” in avoiding duplicate content?
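As one possible answer to both questions, a standard technique is to compare pairs of pages by the Jaccard similarity of their word 5-grams (shingles). This is only a sketch in base R, not a claim about how Laetitia or anyone else would do it; the two page texts are toy stand-ins for content scraped from two of the R sites:

```r
# Break a text into overlapping w-word shingles
shingles <- function(text, w = 5) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[nchar(words) > 0]
  if (length(words) < w) return(character(0))
  vapply(seq_len(length(words) - w + 1),
         function(i) paste(words[i:(i + w - 1)], collapse = " "),
         character(1))
}

# Jaccard similarity: shared shingles over all shingles
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

page_a <- "dplyr makes data manipulation in R fast and readable for analysts"
page_b <- "dplyr makes data manipulation in R fast and expressive for analysts"

jaccard(shingles(page_a), shingles(page_b))
```

One workable success criterion: flag a pair of pages as duplicated when the score exceeds a threshold, say 0.8, then spot-check the flagged pairs by hand to see whether the threshold matches your sense of "duplicate."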