Integrating Structured and Unstructured Data by David Loshin.
It’s a checklist report, but David offers useful commentary on the following seven points:
- Document clearly defined business use cases.
- Employ collaborative tools for the analysis, use, and management of semantic metadata.
- Use pattern-based analysis tools for unstructured text.
- Build upon methods to derive meaning from content, context, and concept.
- Leverage commodity components for performance and scalability.
- Manage the data life cycle.
- Develop a flexible data architecture.
It’s not going to save you planning time, but it may keep you from overlooking important issues.
My only quibble is that David doesn’t call out data structures as needing defined and preserved semantics.
Defining semantics for data is a no-brainer, but the containers of data, dare I say “Hadoop silos,” need to have their semantics defined as well.
Data or data containers without defined and preserved semantics are much more costly in the long run, both in lost opportunity costs and in after-the-fact integration costs.
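To make the point about containers concrete, here is a minimal sketch of recording semantics for a data container alongside its fields. Nothing here comes from David’s report: the class names (`FieldSemantics`, `ContainerSemantics`), the `undefined_fields` helper, and the `orders_raw` example are all hypothetical, and a real system would more likely keep this in a shared metadata catalog than in code.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSemantics:
    """Hypothetical record of what a single field means."""
    name: str
    definition: str = ""  # empty string means no definition was recorded


@dataclass
class ContainerSemantics:
    """Hypothetical record of what a container (table, file, directory) means."""
    name: str
    definition: str
    fields: list[FieldSemantics] = field(default_factory=list)

    def undefined_fields(self) -> list[str]:
        # Return the names of fields that lack a recorded definition.
        return [f.name for f in self.fields if not f.definition.strip()]


# Illustrative data only; the names are invented for this example.
orders = ContainerSemantics(
    name="orders_raw",
    definition="One row per order line as received from the web store",
    fields=[
        FieldSemantics("order_id", "Identifier assigned by the web store"),
        FieldSemantics("amt"),  # no definition recorded
    ],
)

print(orders.undefined_fields())
```

A check like this could run whenever a new container is created, so that gaps in semantics surface before integration rather than after.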