Edd Dumbill sees Big Data as answering the semantic questions posed by the Semantic Web:
Conventionally, semantic web systems generate metadata and identified entities explicitly, ie. by hand or as the output of database values. But as anybody who’s tried to get users to do it will tell you, generating metadata is hard. This is part of why the full semantic web dream isn’t yet realized. Analytical approaches take a different approach: surfacing and classifying the metadata from analysis of the actual content and data itself. (Freely exposing metadata is also controversial and risky, as open data advocates will attest.)
Once big data techniques have been successfully applied, you have identified entities and the connections between them. If you want to join that information up to the rest of the web, or to concepts outside of your system, you need a language in which to do that. You need to organize, exchange and reason about those entities. It’s this framework that has been steadily built up over the last 15 years with the semantic web project.
To give an already widespread example: many data scientists use Wikipedia to help with entity resolution and disambiguation, using Wikipedia URLs to identify entities. This is a classic use of the most fundamental of semantic web technologies: the URI.
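To make that concrete: the sketch below (mine, not Edd's) shows what it looks like to identify an entity with a Wikipedia URL and attach a few facts to it as RDF triples. It assumes Python with the rdflib library; the entity, the example.org namespace, and the relations are purely illustrative.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# A Wikipedia URL serves as the globally shared identifier for an entity
# surfaced by some upstream entity-resolution step.
entity = URIRef("https://en.wikipedia.org/wiki/Tim_Berners-Lee")

# A made-up namespace for relations discovered inside our own system.
EX = Namespace("http://example.org/relations/")

g = Graph()
g.bind("ex", EX)

# Assert the entity and a couple of illustrative connections as triples.
g.add((entity, RDF.type, EX.Person))
g.add((entity, RDFS.label, Literal("Tim Berners-Lee")))
g.add((entity, EX.associatedWith,
       URIRef("https://en.wikipedia.org/wiki/World_Wide_Web")))

# Turtle is one of the standard serializations other systems can exchange
# and reason over.
print(g.serialize(format="turtle"))
```

The library is beside the point; what matters is that the subject is a URI anyone else can also use, so these triples can be joined to data produced outside my system.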
I am not sure where Edd gets: “Once big data techniques have been successfully applied, you have identified entities and the connections between them.” Really? Or is that something hoped for in the future? A general solution to entity extraction and relationship discovery remains a research topic, not a routine by-product of applying big data techniques.
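To show why I hesitate, here is a deliberately naive sketch of the kind of "entity extraction" that is easy to bolt onto a big data pipeline: treat runs of capitalized words as entities and treat co-occurrence in a sentence as a connection. The sentence and names are made up; the point is how quickly this breaks down.

```python
import re
from itertools import combinations

def naive_entities(sentence):
    """Treat runs of capitalized words as 'entities' -- a common shortcut."""
    return re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", sentence)

def naive_relations(sentence):
    """Call any two 'entities' appearing in the same sentence 'connected'."""
    return list(combinations(naive_entities(sentence), 2))

text = "After the merger, Apple hired Jordan from Amazon to run its Paris office."
print(naive_entities(text))
# ['After', 'Apple', 'Jordan', 'Amazon', 'Paris'] -- 'After' is not an entity,
# and nothing tells us whether 'Jordan' is a person, a country, or a river.
print(naive_relations(text))
# Ten undifferentiated pairs, none labeled 'hired', 'works for', 'located in', ...
```

Real systems do far better than this, of course, but "better" still means trained models, curated gazetteers, and human review, not something you get for free at scale.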
If anything, Big Data will worsen the semantic poverty of the Semantic Web, and in doing so will drive the search for tools and approaches to address that poverty.