Jack Park points to The Forth Paradigm: Data-Intensive Scientific Discovery as a book that merits our attention.
Indeed it does! Lurking just beneath the surface of data-intensive research are questions of semantics. Diverse semantics. How does data-intensive research occur in a multi-semantic world?
Paul Ginsparg (Cornell University), in Text in a Data-centric World, has the usual genuflection towards “linked data” without stopping to consider the cost of evaluating every URI to decide if it is an identifier or a resource. Nor why adding one more name to the welter of names we have now (that is the semantic diversity problem) is going to make our lives any better?
Ginsparg writes:
Such an articulated semantic structure [linked data] facilitates simpler algorithms acting on World Wide Web text and data and is more feasible in the near term than building a layer of complex artificial intelligence to interpret free-form human ideas using some probabilistic approach.
Solving the “perfect language” problem, which has never been solved, is more feasible than “…building a layer of complex artificial intelligence to interpret free-form human ideas using some probabilistic approach” to solve it for us?
Perhaps so but one wonders why that is a useful observation?
On the “perfect language” problem, see The Search for the Perfect Language by Umberto Eco.