There’s More Than One Kind Of Reddit Comment?

‘Sarcasm detection on Reddit comments’

Contest ends: 15th of February, 2016.

From the webpage:

Sentiment analysis is a fairly well-developed field, but on the Internet, people often don’t say exactly what they mean. One of the toughest modes of communication for both people and machines to identify is sarcasm. Sarcastic statements often sound positive if interpreted literally, but through context and other cues the speaker indicates that they mean the opposite of what they say. In English, sarcasm is primarily communicated through verbal cues, meaning that it is difficult, even for native speakers, to determine it in text.

Sarcasm detection is a subtask of opinion mining. It aims at correctly identifying the user opinions expressed in the written text. Sarcasm detection plays a critical role in sentiment analysis by correctly identifying sarcastic sentences which can incorrectly flip the polarity of the sentence otherwise. Understanding sarcasm, which is often a difficult task even for humans, is a challenging task for machines. Common approaches for sarcasm detection are based on machine learning classifiers trained on simple lexical or dictionary based features. To date, some research in sarcasm detection has been done on collections of tweets from Twitter, and reviews on For this task, we are interested in looking at a more conversational medium—comments on Reddit—in order to develop an algorithm that can use the context of the surrounding text to help determine whether a specific comment is sarcastic or not.

The premise of this competition is there is more than one kind of comment on Reddit, aside from sarcasm.

A surprising assumption I know but there you have it.

I wonder if participants will have to separate sarcastic + sexist, sarcastic + misogynistic, sarcastic + racist, sarcastic + abusive, into separate categories or will all sarcastic comments be classified as sarcasm?

I suppose the default case would be to assume all Reddit comments are some form of sarcasm and see how accurate that model proves to be when judged against the results of the competition.

Training data for sarcasm? Pointers anyone?

Comments are closed.