Archive for the ‘Discourse’ Category

Shallow Discourse Parsing

Monday, January 5th, 2015

Shallow Discourse Parsing

From the webpage:

A participant system is given a piece of newswire text as input and returns discourse relations in the form of a discourse connective (explicit or implicit) taking two arguments (which can be clauses, sentences, or multi-sentence segments). Specifically, the participant system needs to i) locate both explicit (e.g., “because”, “however”, “and”) and implicit discourse connectives (often signaled by periods) in the text, ii) identify the spans of text that serve as the two arguments for each discourse connective, and iii) predict the sense of the discourse connectives (e.g., “Cause”, “Condition”, “Contrast”). Understanding such discourse relations is clearly an important part of natural language understanding that benefits a wide range of natural language applications.

Important Dates

  • January 26, 2015: Registration begins; training set and scorer released.
  • March 1, 2015: Registration deadline.
  • April 20, 2015: Test set available.
  • April 24, 2015: Systems collected.
  • May 1, 2015: System results due to participants.
  • May 8, 2015: System papers due.
  • May 18, 2015: Reviews due.
  • May 21, 2015: Notification of acceptance.
  • May 28, 2015: Camera-ready version of system papers due.
  • July 30-31, 2015: CoNLL conference (Beijing, China).
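
To make the task description concrete, here is a minimal Python sketch of what a system's output could look like: one record per relation, holding the connective, its two arguments, and a sense. Everything in it is an assumption for illustration, not part of the shared task: the DiscourseRelation class, the three-entry CONNECTIVE_SENSES lexicon, and the naive rule that treats the text before an explicit connective as the first argument and the text after it as the second. Implicit relations and multi-sentence arguments are not handled at all.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Tiny illustrative lexicon (hypothetical, not the task's sense inventory).
# The sense labels are the examples named in the task description.
CONNECTIVE_SENSES = {"because": "Cause", "however": "Contrast", "if": "Condition"}


@dataclass
class DiscourseRelation:
    connective: Optional[str]  # None would mark an implicit relation
    arg1: str                  # text span serving as the first argument
    arg2: str                  # text span serving as the second argument
    sense: str                 # e.g. "Cause", "Condition", "Contrast"
    explicit: bool


def find_explicit_relations(text: str) -> List[DiscourseRelation]:
    """Naive baseline: look for a known connective inside each sentence."""
    relations = []
    # Split into sentences at whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for connective, sense in CONNECTIVE_SENSES.items():
            match = re.search(rf"\b{connective}\b", sentence, flags=re.IGNORECASE)
            # Only handle connectives that sit between two clauses.
            if match and match.start() > 0:
                arg1 = sentence[:match.start()].strip(" ,")
                arg2 = sentence[match.end():].strip(" ,.")
                relations.append(
                    DiscourseRelation(connective, arg1, arg2, sense, explicit=True)
                )
    return relations


if __name__ == "__main__":
    sample = "The game was cancelled because it rained. Everyone went home."
    for relation in find_explicit_relations(sample):
        print(relation)
```

A real participant system would replace each of these shortcuts with learned components; the sketch only shows the shape of the three subtasks (locate the connective, identify the argument spans, predict the sense).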

You have to admire the ambiguity of the title.

Does it mean the parsing of shallow discourse (my first bet) or the shallow parsing of discourse (unlikely, in my view)?

What do you think?

With the recent advances in deep learning, I am curious whether the Turing test could be passed by training an algorithm on sitcom dialogue from the last two or three years.

Would you use regular TV viewers as part of the test or use people who rarely watch TV? Could make a difference in the outcome of the test.

I first saw this in a tweet by Jason Baldridge.

Exploiting Discourse Analysis…

Wednesday, October 16th, 2013

Exploiting Discourse Analysis for Article-Wide Temporal Classification by Jun-Ping Ng, Min-Yen Kan, Ziheng Lin, Wei Feng, Bin Chen, Jian Su, Chew-Lim Tan.


In this paper we classify the temporal relations between pairs of events on an article-wide basis. This is in contrast to much of the existing literature which focuses on just event pairs which are found within the same or adjacent sentences. To achieve this, we leverage on discourse analysis as we believe that it provides more useful semantic information than typical lexico-syntactic features. We propose the use of several discourse analysis frameworks, including 1) Rhetorical Structure Theory (RST), 2) PDTB-styled discourse relations, and 3) topical text segmentation. We explain how features derived from these frameworks can be effectively used with support vector machines (SVM) paired with convolution kernels. Experiments show that our proposal is effective in improving on the state-of-the-art significantly by as much as 16% in terms of F1, even if we only adopt less-than-perfect automatic discourse analyzers and parsers. Making use of more accurate discourse analysis can further boost gains to 35%.

Cutting edge of discourse analysis, which should be interesting if you are automatically populating topic maps based upon textual analysis.

It won’t be perfect, but even human editors are not perfect. (Or so rumor has it.)

A robust topic map system should accept, track and, if approved, apply user-submitted corrections and changes.

Cliff Bleszinski’s Game Developer Flashcards

Tuesday, August 21st, 2012

Cliff Bleszinski’s Game Developer Flashcards by Cliff Bleszinski.

From the post:

As of this summer, I’ll have been making games for 20 years professionally. I’ve led the design on character mascot platform games, first-person shooters, single-player campaigns, multiplayer experiences, and much more. I’ve worked with some of the most amazing programmers, artists, animators, writers, and producers around. Throughout this time period, I’ve noticed patterns in how we, as creative professionals, tend to communicate.

I’ve learned that while developers are incredibly intelligent, they can sometimes be a bit insecure about how smart they are compared to their peers. I’ve seen developer message boards tear apart billion-dollar franchises, indie darlings, and everything in between by overanalyzing and nitpicking. We always want to prove that we thought of an idea before anyone else, or we will cite a case in which an idea has been attempted, succeeded, failed, or been played out.

In short, this article identifies communication techniques that are often used in discussions, arguments, and debates among game developers in order to “win” said conversations.

Written in a “game development” context but I think you can recognize some of these patterns in standards work, ontology development and other areas as well.

I did not transpose/translate it into standards lingo, reasoning that it would be easier to see the mote in someone else’s eye than the plank in our own. 😉

Only partially in jest.

Listening to others is hard; listening to ourselves (for patterns like these) is even harder.

I first saw this at: Nat Torkington’s Four short links: 21 August 2012.

Algorithm estimates who’s in control

Wednesday, January 4th, 2012

Algorithm estimates who’s in control

Jon Kleinberg, whose work influenced Google’s PageRank, is working on ranking something else. Kleinberg et al. developed an algorithm that ranks people, based on how they speak to each other.

This, coming on the heels of Big Brother’s Name is…, has to have you wondering if you even want Internet access at all. 😉

Just imagine: power analysis (who has it, who doesn’t) of email discussion lists, wiki edits, email archives, transcripts.

This has the potential (along with other clever analysis) to identify and populate topic maps with some very interesting subjects.

I first saw this at FlowingData.

Detecting Structure in Scholarly Discourse

Saturday, December 3rd, 2011

Detecting Structure in Scholarly Discourse (DSSD2012)

Important Dates:

March 11, 2012 Submission Deadline
April 15, 2012 Notification of acceptance
April 30, 2012 Camera-ready papers due
July 12 or 13, 2012 Workshop

From the Call for Papers:

The detection of discourse structure in scientific documents is important for a number of tasks, including biocuration efforts, text summarization, error correction, information extraction and the creation of enriched formats for scientific publishing. Currently, many parallel efforts exist to detect a range of discourse elements at different levels of granularity and for different purposes. Discourse elements detected include the statement of facts, claims and hypotheses, the identification of methods and protocols, and the differentiation between new and existing work. In medical texts, efforts are underway to automatically identify prescription and treatment guidelines, patient characteristics, and to annotate research data. Ambitious long-term goals include the modeling of argumentation and rhetorical structure and more recently narrative structure, by recognizing ‘motifs’ inspired by folktale analysis.

A rich variety of feature classes is used to identify discourse elements, including verb tense/mood/voice, semantic verb class, speculative language or negation, various classes of stance markers, text-structural components, or the location of references. These features are motivated by linguistic inquiry into the detection of subjectivity, opinion, entailment, inference, but also author stance and author disagreement, motif and focus.

Several workshops have been focused on the detection of some of these features in scientific text, such as speculation and negation in the 2010 workshop on Negation and Speculation in Natural Language Processing and the BioNLP’09 Shared Task, and hedging in the CoNLL-2010 Shared Task Learning to detect hedges and their scope in natural language text. Other efforts that have included a clear focus on scientific discourse annotation include STIL2011 and Force11, the Future of Research Communications and e-Science. There have been several efforts to produce large-scale corpora in this field, such as BioScope, where negation and speculation information were annotated, and the GENIA Event corpus.

The goal of the 2012 workshop Detecting Structure in Scholarly Discourse is to discuss and compare the techniques and principles applied in these various approaches, to consider ways in which they can complement each other, and to initiate collaborations to develop standards for annotating appropriate levels of discourse, with enhanced accuracy and usefulness.

This conference is being held in conjunction with ACL 2012.