Hilary Mason (live, data scientist) writes about Google confusing her with Hilary Mason (deceased, actress) in “Et tu, Google?”
To be fair, Hilary Mason (live, data scientist) notes that Bing has made the same mistake in the past.
Hilary Mason (live, data scientist) goes on to say:
I know that entity disambiguation is a hard problem. I’ve worked on it, though never with the kind of resources that I imagine Google can bring to it. And yet, this is absurd!
Is entity disambiguation a hard problem?
Or is entity disambiguation a hard problem after the act of authorship?
Authors (in general) know what entities they meant.
The hard part is inferring what entity they meant when they forgot to disambiguate between possible entities.
Rather than focusing on mining low-grade ore (content where entities are not disambiguated), wouldn’t a better solution be authoring with automatic entity disambiguation?
We have auto-correction in word processing software now. Why not auto-entity software that tags entities in content?
Such software would present the author of content with disambiguated entities to accept, reject or change.
That won’t solve the problem of prior content with undisambiguated entities, but it can keep the problem from worsening.
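To make the authoring-time idea concrete, here is a minimal Python sketch of what such a tagger might look like. Everything in it is an assumption for illustration: the `Entity` record, the toy `CATALOG`, the made-up identifiers, and the `choose` callback standing in for the author's accept/reject step. A real tool would presumably query an external knowledge base and prompt the author interactively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str    # hypothetical identifier, not from any real registry
    name: str
    description: str


# Toy in-memory catalog (assumption); stands in for a real knowledge base.
CATALOG = [
    Entity("person:hilary-mason-1", "Hilary Mason", "data scientist"),
    Entity("person:hilary-mason-2", "Hilary Mason", "actress"),
]


def find_candidates(text, catalog=CATALOG):
    """Return {name: [entities]} for every catalog name mentioned in text."""
    found = {}
    for entity in catalog:
        if entity.name in text:
            found.setdefault(entity.name, []).append(entity)
    return found


def tag_entities(text, choose, catalog=CATALOG):
    """Annotate each mentioned name with the entity the author accepts.

    choose(name, candidates) plays the role of the author: it returns the
    accepted Entity, or None to reject all suggestions for that name.
    For simplicity, every occurrence of a name gets the same tag.
    """
    for name, candidates in find_candidates(text, catalog).items():
        chosen = choose(name, candidates)
        if chosen is not None:
            text = text.replace(name, f"{name} [{chosen.entity_id}]")
    return text


def pick_data_scientist(name, candidates):
    """Stand-in for the author's choice: accept the data scientist, if any."""
    for candidate in candidates:
        if candidate.description == "data scientist":
            return candidate
    return None


if __name__ == "__main__":
    draft = "Hilary Mason writes about entity disambiguation."
    print(tag_entities(draft, pick_data_scientist))
```

The point of the sketch is the division of labor: the software only proposes candidates, and the person who knows which entity was meant makes the final call at the moment of writing.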