Encyclopedia of Database Systems is a massive reference work on database systems, running to some 3752 pages and 8 inches (20.3 centimeters) thick.
My first impression was favorable, particularly since entries included synonyms, historical background material, cross-references and recommended reading: all things that I appreciate in a reference work.
The entry for record linkage was disappointing in several respects.
It focuses on statistical disclosure control (SDC), a current use of record linkage, but hardly the range of record linkage uses. For a more accurate account of record linkage see William Winkler’s Overview of Record Linkage and Current Research Directions.
Only two synonyms were given for record linkage, Record Matching and Re-identification. No mention of entity heterogeneity, list washing, entity reconciliation, co-reference resolution, etc.
The “synonyms” under Record Matching (the main article for record linkage) point back to the article Record Matching. Multiple terms that point to the main entry are useful. But to have the main entry point to terms that only point back to it wastes a reader’s time.
There was a quality control problem in terms of currency of cited research. For William Winkler, one of the leading researchers on record linkage, the most recent citation under Record Matching dates from 1999. That omits Record Linkage References (Winkler, 2008), Overview of Record Linkage for Name Matching (Winkler, 2008), and Overview of Record Linkage and Current Research Directions (Winkler, 2006).
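A check for this kind of circular synonym is easy to automate. What follows is only a minimal sketch: the function name, the `redirects` mapping and the sample terms are all hypothetical, made up for illustration and not taken from the encyclopedia's data.

```python
def self_referential_synonyms(main_entry, listed_synonyms, redirects):
    """Return the listed synonyms whose own entry merely redirects to main_entry."""
    return [term for term in listed_synonyms if redirects.get(term) == main_entry]


# Hypothetical data, for illustration only (not the encyclopedia's actual listing).
# redirects maps a term to the main entry it points to; terms with their own
# content, or with no entry at all, are simply absent from the mapping.
redirects = {
    "Record Linkage": "Record Matching",
    "Re-identification": "Record Matching",
}
listed = ["Record Linkage", "Re-identification", "Entity Reconciliation"]

circular = self_referential_synonyms("Record Matching", listed, redirects)
print(circular)
```

Any term the function returns sends the reader straight back to the page they started from, so an editor could either drop it from the main entry's synonym list or give it content of its own.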
My question becomes: What is missing from entries where I lack the familiarity to notice the loss?
Resources that are online should have hyperlinks. Under record linkage, Winkler’s 1999 The state of record linkage and current research problems is listed but without any link to the online version. Most of the cited resources are available either from commercial publishers (like the publisher of this tome) or freely online. Hyperlinks would be a value-add to readers.
The 10,696 bibliographic entries are scattered across 3752 pages. In addition to listing the bibliographic entries with each entry (as hyperlinks when possible), there should be a comprehensive bibliography for the work. Such hyperlinks could be the basis for a cited-by value-add feature.
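To make the cited-by idea concrete, here is a rough sketch of how per-entry reference lists could be merged into one bibliography and inverted into a cited-by index. The data layout, the function names and the sample entries are assumptions of mine for illustration; I have no knowledge of how the publisher actually stores its references.

```python
from collections import defaultdict


def build_cited_by(entry_citations):
    """Invert {entry: [cited works]} into {cited work: sorted list of citing entries}."""
    cited_by = defaultdict(set)
    for entry, works in entry_citations.items():
        for work in works:
            cited_by[work].add(entry)
    return {work: sorted(entries) for work, entries in sorted(cited_by.items())}


def comprehensive_bibliography(entry_citations):
    """Return every cited work exactly once, in sorted order."""
    return sorted({work for works in entry_citations.values() for work in works})


# Hypothetical sample data, for illustration only.
entry_citations = {
    "Entry A": ["Work X (1999)", "Work Y (2006)"],
    "Entry B": ["Work X (1999)"],
}

bibliography = comprehensive_bibliography(entry_citations)
cited_by = build_cited_by(entry_citations)
print(bibliography)
print(cited_by)
```

With real data the keys would be the encyclopedia's entry titles and the values their reference lists, and each work in the merged bibliography could then link back to every entry that cites it.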
With clever use of the subject listings and more complete synonym lists, another value-add would be to provide readers with a dynamic “latest” research on each subject listing.
This review was of the electronic version, which was delivered as a series of separate PDF files. That quite naturally means that hyperlinks to entries that occur in different sections do not work. It defeats part of the utility of having an electronic version, at least in my view.
To their credit, Springer has made the subject listing for this work available in XML. Perhaps some enterprising graduate student will use that as a basis for a “latest” research listing.
I will be doing a more systematic review but stumbled across the entry for the W3C. The synonym for W3C is not World Wide Web consortium. Note the lowercase “consortium.” Rather, World Wide Web Consortium. And “Recommended Reading” for that entry, “W3C. Available at: http://www.w3.org” reinforces my point about quality control on references.
This is a very expensive work but I have no objection to commercial publishing, even expensive commercial publishing. I do have an expectation that I will find quality, innovation and value-add as the result of commercial publishing. So far, that expectation has been disappointed in this case.
PS: Every time an author’s name appears either for an entry or a cited work, there should be a hyperlink to the author’s entry in DBLP. That gives a reader access to a constantly updated bibliography of the author’s publications. Another value-add.