Lars Marius Garshol walks through finding duplicate records in a data set.
As Lars notes, there are commercial products for this same task, but I think this is still a useful exercise.
It isn’t hard to imagine creating test data sets with a variety of conditions to underscore lessons about detecting duplicate records.
I suspect such training data may already be available.
I will have to see what I can find and post about it.
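As a rough illustration of what such a test data set might look like, here is a minimal Python sketch. It is not taken from Lars's walkthrough: the record layout (name, city), the two "conditions" (case/whitespace noise and a swapped-character typo), the helper names, and the 0.85 threshold are all assumptions made for the example, and the matching uses the standard library's `difflib.SequenceMatcher` rather than any particular deduplication engine.

```python
import difflib
import random


def make_variants(record, rng):
    """Build two hypothetical duplicate variants of a (name, city) record."""
    name, city = record
    # Condition 1: case and whitespace noise.
    noisy = (f"  {name.upper()} ", city.lower())
    # Condition 2: a typo made by swapping two adjacent characters.
    i = rng.randrange(len(name) - 1)
    typo = (name[:i] + name[i + 1] + name[i] + name[i + 2:], city)
    return [noisy, typo]


def normalize(record):
    """Join the fields, lowercase them, and collapse runs of whitespace."""
    return " ".join(" ".join(record).lower().split())


def similarity(a, b):
    """Similarity ratio between two normalized records, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, threshold=0.85):
    """Return index pairs whose similarity meets the (assumed) threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Two made-up originals, each followed later by its generated variants.
rng = random.Random(0)
originals = [("Acme Widgets", "Oslo"), ("Borealis Shipping", "Bergen")]
records = list(originals)
for rec in originals:
    records.extend(make_variants(rec, rng))

for i, j in find_duplicates(records):
    print(records[i], "<->", records[j])
```

Comparing every pair is quadratic in the number of records, which is fine for a teaching data set but would need some form of blocking or indexing for anything large.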
PS: Lars is the primary editor of the TMDM and is working on TMCL and several other parts of the topic maps standard.