How to Track Your Data: Rule-Based Data Provenance Tracing Algorithms by Zhang, Qing Olive; Ko, Ryan K L; Kirchberg, Markus; Suen, Chun-Hui; Jagadpramana, Peter; Lee, Bu Sung.
Abstract:
As cloud computing and virtualization technologies become mainstream, the need to be able to track data has grown in importance. Having the ability to track data from its creation to its current state or its end state will enable the full transparency and accountability in cloud computing environments. In this paper, we showcase a novel technique for tracking end-to-end data provenance, a meta-data describing the derivation history of data. This breakthrough is crucial as it enhances trust and security for complex computer systems and communication networks. By analyzing and utilizing provenance, it is possible to detect various data leakage threats and alert data administrators and owners; thereby addressing the increasing needs of trust and security for customers’ data. We also present our rule-based data provenance tracing algorithms, which trace data provenance to detect actual operations that have been performed on files, especially those under the threat of leaking customers’ data. We implemented the cloud data provenance algorithms into an existing software with a rule correlation engine, show the performance of the algorithms in detecting various data leakage threats, and discuss technically its capabilities and limitations.
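The abstract does not spell out the tracing rules themselves, so what follows is only a rough illustration. It is a minimal Python sketch of rule-based provenance tracing that assumes a hypothetical log of file operations (`Event`) and one hypothetical leak rule. The rule raises an alert when data reaches an external destination (a `usb:` path or an `http(s)` URL) and its traced lineage starts in a sensitive directory. The schema, prefixes, and rule are illustrative assumptions, not the authors' algorithms or their rule correlation engine.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical policy: where customer data lives and what counts as "leaving".
SENSITIVE_PREFIX = "/data/customers/"
EXTERNAL_PREFIXES = ("usb:", "http://", "https://")


@dataclass(frozen=True)
class Event:
    """One provenance record (hypothetical schema): data moved from src to dst."""
    time: int
    user: str
    op: str                    # e.g. "copy", "rename", "upload"
    src: str                   # where the data came from
    dst: Optional[str] = None  # where it went, if anywhere


def trace_lineage(events: List[Event], target: str) -> List[Event]:
    """Walk backwards in time from `target`, collecting every event that fed data into it."""
    wanted = {target}
    lineage = []
    for ev in sorted(events, key=lambda e: e.time, reverse=True):
        if ev.dst in wanted:
            lineage.append(ev)
            wanted.add(ev.src)  # now also trace whatever fed this source
    return list(reversed(lineage))  # oldest first


def leak_alerts(events: List[Event]) -> List[Tuple[str, str, List[str]]]:
    """Rule: alert when data reaching an external destination originated in a sensitive file."""
    alerts = []
    for ev in events:
        if ev.dst and ev.dst.startswith(EXTERNAL_PREFIXES):
            chain = trace_lineage(events, ev.dst)
            if any(e.src.startswith(SENSITIVE_PREFIX) for e in chain):
                alerts.append((ev.user, ev.dst, [e.op for e in chain]))
    return alerts


EVENTS = [
    Event(1, "alice", "copy", "/data/customers/list.csv", "/home/alice/tmp.csv"),
    Event(2, "alice", "rename", "/home/alice/tmp.csv", "/home/alice/notes.txt"),
    Event(3, "alice", "upload", "/home/alice/notes.txt", "https://paste.example.com/x"),
    Event(4, "bob", "copy", "/home/bob/report.txt", "usb:/E/report.txt"),
]

if __name__ == "__main__":
    print(leak_alerts(EVENTS))
    # [('alice', 'https://paste.example.com/x', ['copy', 'rename', 'upload'])]
```

Tracing the lineage backwards is what lets the rule catch the rename in the middle. The upload on its own touches only `notes.txt`, which looks harmless until its derivation history is followed back to the customer list. Bob's copy to removable media goes out too, but its lineage never touches the sensitive directory, so it raises no alert.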
Interesting work, but data provenance isn't solely a cloud computing or virtualization issue.
Consider the ongoing complaints in Washington, D.C., about who leaked what to whom, and why.
All posturing aside, that is a data provenance and subject identity issue.
The sort of thing where a topic map application could excel.
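Topic maps address the "who" half of that question through merging. In the Topic Maps Data Model, topics that share a subject identifier represent the same subject and are merged. The minimal Python sketch below implements only that one rule, assuming hypothetical records in which each topic is a dict of `names` and `ids` (the URN-style identifiers are made up). It shows how a person called "anonymous source" in one report and "Staffer A" in another can resolve to a single subject. It is not a topic map engine: real merging also considers subject locators and item identifiers and carries over occurrences and association roles.

```python
from typing import Dict, List, Set


def merge_topics(topics: List[Dict[str, List[str]]]) -> List[Dict[str, Set[str]]]:
    """Merge topics that share any subject identifier, transitively."""
    merged: List[Dict[str, Set[str]]] = []
    for t in topics:
        names, ids = set(t["names"]), set(t["ids"])
        keep = []
        # Existing groups are pairwise disjoint, so one pass per new topic suffices.
        for m in merged:
            if m["ids"] & ids:  # shared identifier: same subject, absorb it
                names |= m["names"]
                ids |= m["ids"]
            else:
                keep.append(m)
        keep.append({"names": names, "ids": ids})
        merged = keep
    return merged


# Hypothetical records naming people in different leak reports.
TOPICS = [
    {"names": ["Staffer A"], "ids": ["urn:x-example:person:42"]},
    {"names": ["anonymous source"], "ids": ["urn:x-example:source:7"]},
    {"names": ["J. Doe"], "ids": ["urn:x-example:person:42", "urn:x-example:source:7"]},
    {"names": ["R. Roe"], "ids": ["urn:x-example:person:99"]},
]

if __name__ == "__main__":
    for topic in merge_topics(TOPICS):
        print(sorted(topic["names"]))
```

The third record is the link: because it carries both identifiers, it pulls "Staffer A" and "anonymous source" into one subject, while "R. Roe" stays separate.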