From the website:
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop’s HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
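To make the "deliver data from applications to HDFS" part concrete, here is a minimal sketch of what the application side of a flow can look like. It assumes Flume 1.x (Flume NG) and its Java client SDK, and that a Flume agent with an Avro source is already listening on the (hypothetical) host and port shown; the agent's channel and HDFS sink would carry the event the rest of the way.

```java
// Minimal sketch: hand a log event to a Flume agent over its client SDK.
// Host, port, and message body are placeholders; the agent is assumed to
// run an Avro source on that host/port with a sink writing to HDFS.
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

import java.nio.charset.StandardCharsets;

public class FlumeLogSender {
  public static void main(String[] args) {
    // Connect to the agent's Avro source (hypothetical host and port).
    RpcClient client =
        RpcClientFactory.getDefaultInstance("flume-agent.example.org", 41414);
    try {
      // Wrap a log line in a Flume event and send it; the agent's channel
      // and sink take it the rest of the way to HDFS.
      Event event = EventBuilder.withBody("app started", StandardCharsets.UTF_8);
      client.append(event);
    } catch (EventDeliveryException e) {
      // Delivery failed; a real client would retry or reconnect here.
      e.printStackTrace();
    } finally {
      client.close();
    }
  }
}
```

In Flume 1.x the tunable reliability and failover that the description mentions are configured on the agent side (choice of channel, sink groups, and so on), not in client code like this.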
There are a number of resources available for Flume.
Will “data flows” as the dominant means of accessing data be a consequence of an environment where a “local copy” of data is no longer meaningful, or an enabler of such an environment? Or both?
I think topic maps would do well to develop models for streaming, and perhaps for probabilistic merging or even probabilistic creation of topics/associations from data streams. Static structures only give the appearance of certainty.
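Purely as a toy illustration of what probabilistic merging from a stream might look like (nothing here comes from the Topic Maps standards or any existing API, and all class and method names are hypothetical), consider treating a merge as a scored decision rather than a yes/no identity test:

```java
// Toy sketch (hypothetical, not any Topic Maps API): merge two topic
// candidates from a stream only when an estimated probability that they
// denote the same subject clears a threshold.
import java.util.HashSet;
import java.util.Set;

public class StreamingTopicMerger {

  /** A bare-bones stand-in for a topic: identifiers plus names. */
  static class Topic {
    final Set<String> identifiers = new HashSet<>();
    final Set<String> names = new HashSet<>();
  }

  /** Crude estimate that two topics represent the same subject. */
  static double sameSubjectProbability(Topic a, Topic b) {
    // A shared subject identifier: treat as near-certain.
    for (String id : a.identifiers) {
      if (b.identifiers.contains(id)) {
        return 0.99;
      }
    }
    // Otherwise fall back to name overlap (Jaccard similarity).
    Set<String> intersection = new HashSet<>(a.names);
    intersection.retainAll(b.names);
    Set<String> union = new HashSet<>(a.names);
    union.addAll(b.names);
    return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
  }

  /** Merge b into a when the estimated probability clears the threshold. */
  static boolean maybeMerge(Topic a, Topic b, double threshold) {
    if (sameSubjectProbability(a, b) < threshold) {
      return false;
    }
    a.identifiers.addAll(b.identifiers);
    a.names.addAll(b.names);
    return true;
  }

  public static void main(String[] args) {
    Topic seen = new Topic();
    seen.identifiers.add("http://example.org/id/flume");
    seen.names.add("Apache Flume");

    Topic incoming = new Topic();
    incoming.names.add("Apache Flume");
    incoming.names.add("Flume NG");

    // Merge only if we are sufficiently confident they are the same subject.
    System.out.println("merged: " + maybeMerge(seen, incoming, 0.4));
  }
}
```

The particular scoring rule does not matter. The point is that in a streaming setting, whether two topics represent the same subject becomes a judgment made on the evidence received so far, open to revision, rather than a fixed fact recorded once in a static structure.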