OK, the real title is: Cross-Language Opinion Lexicon Extraction Using Mutual-Reinforcement Label Propagation by Zheng Lin, Songbo Tan, Yue Liu, Xueqi Cheng, Xueke Xu. (Lin Z, Tan S, Liu Y, Cheng X, Xu X (2013) Cross-Language Opinion Lexicon Extraction Using Mutual-Reinforcement Label Propagation. PLoS ONE 8(11): e79294. doi:10.1371/journal.pone.0079294)
Abstract:
There is a growing interest in automatically building opinion lexicons from sources such as product reviews. Most of these methods depend on abundant external resources such as WordNet, which limits their applicability. Unsupervised or semi-supervised learning provides an alternative solution to multilingual opinion lexicon extraction. However, the datasets are imbalanced across languages. For some languages, high-quality corpora are scarce or hard to obtain, which limits research progress. To solve the above problems, we explore a mutual-reinforcement label propagation framework. First, for each language, a label propagation algorithm is applied to a word relation graph, and then a bilingual dictionary is used as a bridge to transfer information between the two languages. A key advantage of this model is its ability to make the two languages learn from and boost each other. The experimental results show that the proposed approach outperforms the baseline significantly.
I have always wondered when someone would notice the WordNet database is limited to the English language. 😉
The authors are seeking to develop “…a language-independent approach for resource-poor language,” saying:
Our approach differs from existing approaches in the following three points: First, it does not depend on rich external resources and it is language-independent. Second, our method is domain-specific since the polarity of an opinion word is domain-aware. We aim to extract a domain-dependent opinion lexicon (i.e. an opinion lexicon per domain) instead of a universal opinion lexicon. Third, and most importantly, our approach can mine an opinion lexicon for a target language by leveraging data and knowledge available in another language…
Our approach propagates information back and forth between the source language and the target language, which is called mutual-reinforcement label propagation. The mutual-reinforcement label propagation model follows a two-stage framework. At the first stage, for each language, a label propagation algorithm is applied to a large word relation graph to produce a polarity estimate for any given word. This stage solves the problem of external resource dependency, and it can be easily transferred to almost any language because all we need are unlabeled data and a couple of seed words. At the second stage, a bilingual dictionary is introduced as a bridge between the source and target languages to start a bootstrapping process. Initially, information about the source language can be utilized to improve the polarity assignment in the target language. In turn, the updated information from the target language can be utilized to improve the polarity assignment in the source language as well.
…
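To make the quoted two-stage framework concrete, here is a minimal Python sketch, not the authors' implementation. Everything in it is assumed for illustration: the tiny word graphs, the seed words, the pinyin placeholders, and the one-pair `bridge` dictionary stand in for the corpus-derived graphs and real bilingual dictionary the paper uses, and the paper's exact update rule and stopping criteria are not reproduced.

```python
# Toy sketch of mutual-reinforcement label propagation (assumed details,
# not the paper's code): stage 1 propagates polarity over each language's
# word graph; stage 2 bootstraps scores across a bilingual dictionary.
import numpy as np

def label_propagation(W, labels, seed_idx, alpha=0.5, iters=50):
    """Propagate polarity over a word relation graph W.

    labels  : initial polarity vector (+1 positive seed, -1 negative, 0 unknown)
    seed_idx: indices whose labels are clamped after every sweep
    alpha   : trust in neighbor scores vs. a node's current score
    """
    W = W / W.sum(axis=1, keepdims=True)      # row-normalize edge weights
    f = labels.astype(float).copy()
    for _ in range(iters):
        f = alpha * (W @ f) + (1 - alpha) * f  # blend in neighbor scores
        f[seed_idx] = labels[seed_idx]         # clamp the seed words
    return f

# Toy monolingual graphs (symmetric co-occurrence weights, assumed).
en_words = ["good", "great", "bad", "awful"]
zh_words = ["hao", "huai"]                     # pinyin placeholders
W_en = np.array([[0, 3, 0, 0],
                 [3, 0, 0, 0],
                 [0, 0, 0, 2],
                 [0, 0, 2, 0]], dtype=float) + np.eye(4)
W_zh = np.array([[0, 1],
                 [1, 0]], dtype=float) + np.eye(2)

en_labels = np.array([1, 0, -1, 0])            # seeds: good=+1, bad=-1
zh_labels = np.array([0, 0])                   # no target-language seeds yet
bridge = {0: 0, 1: 2}                          # hao<->good, huai<->bad

for _ in range(3):                             # bootstrapping rounds
    f_en = label_propagation(W_en, en_labels, seed_idx=[0, 2])
    for zh_i, en_i in bridge.items():          # source -> target transfer
        zh_labels[zh_i] = np.sign(f_en[en_i])
    f_zh = label_propagation(W_zh, zh_labels, seed_idx=list(bridge))
    for zh_i, en_i in bridge.items():          # target -> source feedback
        en_labels[en_i] = np.sign(f_zh[zh_i])

print(dict(zip(en_words, np.sign(f_en))))      # {'good': 1.0, ..., 'awful': -1.0}
print(dict(zip(zh_words, np.sign(f_zh))))
```

Clamping the seeds after every sweep is the standard semi-supervised label propagation move; the outer loop, where each language's freshly propagated polarities become the other language's seeds, is the mutual-reinforcement idea the authors describe.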
Two points of particular interest:
- The authors focus on creating domain-specific lexicons and don’t attempt to boil the ocean. Useful semantic results will arrive sooner if you avoid attempts at universal solutions.
- English speakers are a large market, but the target of this exercise is the world’s #1 language by native speakers, Mandarin Chinese.
Taking the numbers for English speakers at face value, approximately 0.8 billion out of a world population of 7.125 billion, that still leaves roughly 6.3 billion potential customers.
You’ve heard what they say: a billion potential customers here and a billion potential customers there, and pretty soon you are talking about a real market opportunity. (The original quote is commonly misattributed to Sen. Everett Dirksen.)