Mining Text Data, edited by Charu Aggarwal and ChengXiang Zhai, Springer, February 2012, approximately 500 pages.
From the publisher’s description:
Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned.
Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath in topics across social networks & data mining. Each chapter contains a comprehensive survey including the key research content on the topic, and the future directions of research in the field. There is a special focus on Text Embedded with Heterogeneous and Multimedia Data which makes the mining process much more challenging. A number of methods have been designed such as transfer learning and cross-lingual mining for such cases.
Mining Text Data simplifies the content, so that advanced-level students, practitioners and researchers in computer science can benefit from this book. Academic and corporate libraries, as well as ACM, IEEE, and Management Science focused on information security, electronic commerce, databases, data mining, machine learning, and statistics are the primary buyers for this reference book.
Although they are not at the publisher’s site, you can see the Table of Contents, chapter 4, A SURVEY OF TEXT CLUSTERING ALGORITHMS, and chapter 6, A SURVEY OF TEXT CLASSIFICATION ALGORITHMS, at: www.charuaggarwal.net/text-content.pdf.
The two chapters you can download from Aggarwal’s website will give you a good idea of what to expect from the text.
While an excellent survey work, with chapters written by experts in various sub-fields, it also suffers from the survey work format.
For example, the bibliographies of the two sample chapters overlap. That is not surprising given the closely related subject matter, but as a reader I would be interested in discovering which works are cited in both chapters. Given the end-of-chapter bibliography format, that is possible only by repetitive manual inspection.
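If the two bibliographies were available as plain lists of entries, the cross-chapter comparison could be automated rather than done by eye. The following is only a minimal sketch under that assumption: the function names `normalize` and `shared_citations` are hypothetical, the entries are placeholders rather than real references from the book, and the matching is a crude normalization that would miss citations formatted very differently.

```python
import re


def normalize(entry: str) -> str:
    """Lowercase an entry and collapse punctuation and whitespace so that
    trivially different renderings of the same citation compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", entry.lower()).strip()


def shared_citations(bib_a, bib_b):
    """Return the entries of bib_a whose normalized form also occurs in bib_b."""
    seen_in_b = {normalize(entry) for entry in bib_b}
    return [entry for entry in bib_a if normalize(entry) in seen_in_b]


# Placeholder entries for illustration only, not taken from the book.
chapter_4 = [
    "Author A. Example Title One. 2001.",
    "Author B. Example Title Two. 2005.",
]
chapter_6 = [
    "Author B.  Example title two, 2005",
    "Author C. Example Title Three. 2009.",
]

overlap = shared_citations(chapter_4, chapter_6)
print(overlap)
```

A publisher with the citation data in structured form could do far better than string matching, which is rather the point: the information is there, but the print format hides it.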
Although I rail against examples in standards, expanding the survey reference work format to include more details and examples would only increase its usefulness and possibly its life as a valued reference.
That raises the question of having a print format for survey works at all. The research landscape is changing quickly, and a shelf life of 2 to 3 years, if that long, seems a bit brief for the going rate for print editions. Printing chapters on demand, as smaller and more timely works, is a value-add proposition that Springer is in a unique position to bring to its customers.