Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 4, 2011

Corpus of Erotica Stories

Filed under: Erotica,Text Corpus — Patrick Durusau @ 8:18 pm

Corpus of Erotica Stories from InfoChimps.

From the webpage:

Excellent resource for working with natural language processing and machine learning. This corpus consists of 4771 raw text erotica stories collected from www.textfiles.com/sex/EROTICA. A logical flow from the encouragement of writing on BBSes, people have been writing some form of erotica or sexual narrative for others for quite some time. With the advent of Fidonet and later Usenet, these stories achieved wider and wider distribution. Unfortunately, the nature of erotica is that it is often uncredited, undated, and hard to fix in time. As a result, you might be looking at stories much older or much newer than you might think.

Well, you have been looking for an interesting text for NLP and machine learning. Here’s your chance.
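As a rough starting point, here is a minimal sketch of reading such a corpus and counting word frequencies. It assumes the stories have been unpacked as plain files under a single local directory; the directory layout and the names `iter_stories` and `word_counts` are illustrative and not part of the InfoChimps package. The tokenizer is deliberately crude (lowercase runs of letters and apostrophes).

```python
import re
from collections import Counter
from pathlib import Path

# Crude tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def iter_stories(corpus_dir):
    """Yield (filename, text) for every regular file under corpus_dir."""
    for path in sorted(Path(corpus_dir).rglob("*")):
        if path.is_file():
            # Assumption: BBS-era files are not reliably UTF-8.
            # latin-1 maps every byte to a character, so decoding never fails.
            yield path.name, path.read_text(encoding="latin-1")


def word_counts(corpus_dir):
    """Return a Counter of lowercased tokens across all stories."""
    counts = Counter()
    for _name, text in iter_stories(corpus_dir):
        counts.update(TOKEN_RE.findall(text.lower()))
    return counts
```

Something like `word_counts("path/to/stories").most_common(20)` would then list the most frequent tokens, which is a quick sanity check before trying anything more ambitious.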

The subjects just abound.

One imagines the same could be done by capturing an appropriate Twitter stream and writing it to a file.
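The file-writing half of that idea might look like the sketch below. It takes any iterable of strings and makes no attempt to talk to Twitter; how the messages are obtained is left open, and the name `append_stream` is hypothetical.

```python
def append_stream(messages, path):
    """Append each message to path as one line; return how many were written."""
    written = 0
    with open(path, "a", encoding="utf-8") as out:
        for message in messages:
            # Collapse runs of whitespace (including newlines) so that
            # each message occupies exactly one line in the file.
            out.write(" ".join(message.split()) + "\n")
            written += 1
    return written
```

Because the file is opened in append mode, repeated runs keep growing the same corpus file rather than overwriting it.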
