Provenance (also known as pedigree or lineage) refers to the complete history of a document. In the scientific community, provenance refers to the information that describes data in sufficient detail to facilitate reproduction and enable validation of results. In the archival community, provenance refers to the chain of ownership and the transformations a document has undergone. However, in most computer systems today, provenance is an afterthought, implemented as an auxiliary indexing structure parallel to the actual data.
Provenance, however, is merely a particular type of meta-data. The operating system should be responsible for the collection of provenance and the storage system should be responsible for its management. We define a new class of storage system, called a provenance-aware storage system (PASS), that supports the automatic collection and maintenance of provenance. A PASS collects provenance as new objects are created in the system and maintains that provenance just as it maintains conventional file system meta-data. A PASS, in addition to collecting and maintaining provenance, also supports queries upon the provenance.
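As a rough illustration of the three responsibilities described above (collection, maintenance, and query), the following toy Python model records which inputs each new object was derived from and answers an ancestry query. It is only a sketch under stated assumptions: the class `ToyProvenanceStore`, its methods, and the file names are hypothetical and are not part of the PASS implementation, which collects provenance inside the operating system rather than in application code.

```python
from collections import defaultdict


class ToyProvenanceStore:
    """Hypothetical in-memory model: maps each object to its direct inputs."""

    def __init__(self):
        self.parents = defaultdict(set)

    def record_write(self, output, inputs):
        """Collect provenance at creation time: `output` was derived from `inputs`."""
        self.parents[output].update(inputs)

    def ancestors(self, obj):
        """Query: return every object that `obj` transitively depends on."""
        seen = set()
        stack = [obj]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative file names only.
store = ToyProvenanceStore()
store.record_write("clean.csv", ["raw.csv", "clean.py"])
store.record_write("plot.png", ["clean.csv", "plot.py"])
print(sorted(store.ancestors("plot.png")))
# → ['clean.csv', 'clean.py', 'plot.py', 'raw.csv']
```

The point of the sketch is that the ancestry of `plot.png` can be reconstructed without any cooperation from the tools that produced it, provided every write was recorded when it happened.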
We have implemented two PASS prototypes. The latest prototype (v2) has been implemented on Linux 2.6.23.17. The new features in v2 are:
- In addition to the provenance collected by the system, applications can record application-specific provenance.
- The v2 architecture is network-enabled, and we can collect provenance across NFS.
- We have designed and built a recovery scheme, which we call Write Ahead Provenance.
- We have designed a new query language for provenance, called PQL (see our IPAW '08 paper and poster).
- We are working on a provenance security model (we plan to publish a paper on this topic soon!).
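The name Write Ahead Provenance suggests an analogy with write-ahead logging. The sketch below assumes, and this is an assumption rather than a description of the v2 prototype, that a provenance record is forced to stable storage before the data it describes is written. The function names, the JSON record layout, and the log file are all hypothetical.

```python
import json
import os
import tempfile


def write_with_provenance(data_path, data, log_path, record):
    """Make the provenance record durable before writing the data it describes."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())  # assumed ordering: provenance reaches disk first
    with open(data_path, "wb") as out:
        out.write(data)
        out.flush()
        os.fsync(out.fileno())


def logged_targets(log_path):
    """Return the set of target paths named in the provenance log."""
    with open(log_path, encoding="utf-8") as log:
        return {json.loads(line)["target"] for line in log if line.strip()}


# Illustrative usage in a scratch directory.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "provenance.log")
data_path = os.path.join(workdir, "result.txt")
write_with_provenance(
    data_path,
    b"42\n",
    log_path,
    {"target": data_path, "inputs": ["input.txt"], "process": "analyze"},
)
```

Under this assumed ordering, a crash can leave a provenance record without its data but not data without provenance, so a recovery pass could scan the log and flag any target whose data never arrived.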
The v1 prototype was implemented on Linux 2.4.29. The v1 prototype recorded relevant system activity and stored it persistently in an in-kernel database.
If you are interested in running a version of our system, please send us email.
News
Version 0.4.1 of the PQL query engine has been released.
Thank you to IBM and NetApp, who have made the workshops possible.
Publications
- Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer.
Provenance for the Cloud. (PDF)
8th USENIX Conference on File and Storage Technologies (FAST '10), February 2010.
- Kiran-Kumar Muniswamy-Reddy and Margo Seltzer.
Provenance as First-Class Cloud Data. (PDF)
3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware (LADIS '09), October 2009.
- Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A. Holland, Peter Macko, Diana Maclean, Daniel Margo, Margo Seltzer, and Robin Smogor.
Layering in Provenance Systems. (PDF)
In Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, June 2009.
- Dan Margo and Margo Seltzer.
The Case for Browser Provenance. (PDF)
1st Workshop on the Theory and Practice of Provenance (TaPP '09), February 2009.
- Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer.
Making a Cloud Provenance-Aware. (PDF)
1st Workshop on the Theory and Practice of Provenance (TaPP '09), February 2009.
- Kiran-Kumar Muniswamy-Reddy and David A. Holland.
Causality Based Versioning. (PDF)
7th USENIX Conference on File and Storage Technologies (FAST '09), February 2009.
Selected as a top paper and forwarded to the November 2009 issue of ACM Transactions on Storage (TOS).
- Kiran-Kumar Muniswamy-Reddy, Joseph Barillari, Uri Braun, David A. Holland, Diana Maclean, Margo Seltzer, and Stephen D. Holland.
Layering in Provenance-Aware Storage Systems. (PDF)
Harvard University Computer Science Technical Report TR-04-08.
- Uri Braun, Avraham Shinnar, and Margo Seltzer.
Securing Provenance. (PDF, HTML)
In Proceedings of the 3rd USENIX Workshop on Hot Topics in Security (HotSec), San Jose, CA, July 2008.
- David A. Holland, Uri Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo Seltzer.
Choosing a Data Model and Query Language for Provenance. (PDF)
In Proceedings of the 2nd International Provenance and Annotation Workshop, Salt Lake City, UT, June 2008.
- David A. Holland, Margo I. Seltzer, Uri Braun, and Kiran-Kumar Muniswamy-Reddy.
PASSing the Provenance Challenge. (PDF)
In Concurrency and Computation: Practice and Experience, 20:531-540, 2008.
- Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Margo Seltzer.
Provenance-Aware Storage Systems. (PDF)
In Proceedings of the 2006 USENIX Annual Technical Conference, Boston, MA, June 2006.
- Uri Braun, Simson Garfinkel, David A. Holland, Kiran-Kumar Muniswamy-Reddy, and Margo Seltzer.
Issues in Automatic Provenance Collection. (PDF)
In Proceedings of the 2006 International Provenance and Annotation Workshop, Chicago, IL, May 2006.
- Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie.
Provenance-Aware Storage Systems. (PDF)
Harvard University Computer Science Technical Report TR-18-05, July 2005.
- Jonathan Ledlie, Chaki Ng, David A. Holland, Kiran-Kumar Muniswamy-Reddy, Uri Braun, and Margo Seltzer.
Provenance-Aware Sensor Data Storage. (PDF, HTML)
In Proceedings of NetDB 2005, Tokyo, Japan, April 2005.
Talks
- Slides (PPS) from the IPAW Provenance Challenge workshop, September 2006
- Talk slides (PPS) from the 2006 International Provenance and Annotation Workshop, May 2006
- Project Overview: April 2005 (PDF, PPS)
Posters
- David A. Holland, Uri Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo Seltzer.
Choosing a Data Model and Query Language for Provenance. (PDF)
2nd International Provenance and Annotation Workshop (IPAW '08), June 2008.
- Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie.
Provenance-Aware Storage Systems. (PDF)
Harvard Industrial Partnership (HIP) 2005, October 2005.
Current Members
- Prof. Margo Seltzer
- Elaine Angelino
- Uri Braun
- David Holland
- Peter Macko
- Daniel Margo
- Nicholas Murphy
- Robin Smogor
Alumni
- Jonathan Ledlie
- Diana Maclean
- Kiran-Kumar Muniswamy-Reddy
- Chaki Ng
Sponsors
- Source of Support: NSF
CSR-PDOS (CNS-0614784)
Proposal Number: 0614784
Title: CSR---PDOS: Support for Atomic Sequences of File System Operations
Funded amount: $561,727
Period: 09/01/06 -- 08/31/09
Location: SUNY at Stony Brook
SubContract: Harvard University
Months: Cal:0; Acad:0; Sumr:1(?)