Recent Publications

Athanassoulis, Manos, Michael S. Kester, Lukas M. Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. “Designing Access Methods: The RUM Conjecture.” International Conference on Extending Database Technology (EDBT),. Bordeaux, France.Abstract
The database research community has been building methods to store, access, and update data for more than four decades. Throughout the evolution of the structures and techniques used to access data, access methods adapt to the ever changing hardware and workload requirements. Today, even small changes in the workload or the hardware lead to a redesign of access methods. The need for new designs has been increasing as data generation and workload diversification grow exponentially, and hardware advances introduce increased complexity. New workload requirements are introduced by the emergence of new applications, and data is managed by large systems composed of more and more complex and heterogeneous hardware. As a result, it is increasingly important to develop application-aware and hardware-aware access methods. The fundamental challenges that every researcher, systems architect, or designer faces when designing a new access method are how to minimize, i) read times (R), ii) update cost (U), and iii) memory (or storage) overhead (M). In this paper, we conjecture that when optimizing the read-update-memory overheads, optimizing in any two areas negatively impacts the third. We present a simple model of the RUM overheads, and we articulate the RUM Conjecture. We show how the RUM Conjecture manifests in state-of-the-art access methods, and we envision a trend toward RUM-aware access methods for future data systems.
Netravali, Ravi, Ameesh Goyal, James Mickens, and Hari Balakrishnan, ed. 2016. “Polaris: Faster Page Loads Using Fine-grained Dependency Tracking.” NSDI 2016. Santa Clara, CA. Publisher's VersionAbstract

To load a web page, a browser must fetch and evaluate objects like HTML files and JavaScript source code. Evaluating an object can result in additional objects being fetched and evaluated. Thus, loading a web page requires a browser to resolve a dependency graph; this partial ordering constrains the sequence in which a browser can process individual objects. Unfortunately, many edges in a page’s dependency graph are unobservable by today’s browsers. To avoid violating these hidden dependencies, browsers make conservative assumptions about which objects to process next, leaving the network and CPU underutilized.

We provide two contributions. First, using a new measurement platform called Scout that tracks fine-grained data flows across the JavaScript heap and the DOM, we show that prior, coarse-grained dependency analyzers miss crucial edges: across a test corpus of 200 pages, prior approaches miss 30% of edges at the median, and 118% at the 95th percentile. Second, we quantify the benefits of exposing these new edges to web browsers. We introduce Polaris, a dynamic client-side scheduler that is written in JavaScript and runs on unmodified browsers; using a fully automatic compiler, servers can translate normal pages into ones that load themselves with Polaris. Polaris uses fine-grained dependency graphs to dynamically determine which objects to load, and when. Since Polaris’ graphs have no missing edges, Polaris can aggressively fetch objects in a way that minimizes network round trips. Experiments in a variety of network conditions show that Polaris decreases page load times by 34% at the median, and 59% at the 95th percentile.

Karras, P., A. Nikitin, M. Saad, R. Bhatt, D. Antyukhov, and S. Idreos. 2016. “Adaptive Indexing over Encrypted Numeric Data.” SIGMOD International Conference on Management of Data. San Francisco, CA: ACM.Abstract
Today, outsourcing query processing tasks to remote cloud servers becomes a viable option; such outsourcing calls for encrypting data stored at the server so as to render it secure against eavesdropping adversaries and/or an honest-but-curious server itself. At the same time, to be efficiently managed, outsourced data should be indexed, and even adaptively so, as a side-effect of query pro- cessing. Computationally heavy encryption schemes render such outsourcing unattractive; an alternative, Order-Preserving Encryption Scheme (OPES), intentionally preserves and reveals the order in the data, hence is unattractive from the security viewpoint. In this paper, we propose and analyze a scheme for lightweight and indexable encryption, based on linear-algebra operations. Our scheme provides higher security than OPES and allows for range and point queries to be efficiently evaluated over encrypted numeric data, with decryption performed at the client side. We implement a prototype that performs incremental, query-triggered adaptive indexing over encrypted numeric data based on this scheme, without leaking order information in advance, and without prohibitive overhead, as our extensive experimental study demonstrates.
Netravali, Ravi, Ameesh Goyal, James Mickens, and Hari Balakrishnan, ed. 2016. “Polaris: Faster Page Loads Using Fine-grained Dependency Tracking.” NSDI 2016. Santa Clara, CA. Publisher's VersionAbstract

To load a web page, a browser must fetch and evaluate objects like HTML files and JavaScript source code. Evaluating an object can result in additional objects being fetched and evaluated. Thus, loading a web page requires a browser to resolve a dependency graph; this partial ordering constrains the sequence in which a browser can process individual objects. Unfortunately, many edges in a page’s dependency graph are unobservable by today’s browsers. To avoid violating these hidden dependencies, browsers make conservative assumptions about which objects to process next, leaving the network and CPU underutilized.

We provide two contributions. First, using a new measurement platform called Scout that tracks fine-grained data flows across the JavaScript heap and the DOM, we show that prior, coarse-grained dependency analyzers miss crucial edges: across a test corpus of 200 pages, prior approaches miss 30% of edges at the median, and 118% at the 95th percentile. Second, we quantify the benefits of exposing these new edges to web browsers. We introduce Polaris, a dynamic client-side scheduler that is written in JavaScript and runs on unmodified browsers; using a fully automatic compiler, servers can translate normal pages into ones that load themselves with Polaris. Polaris uses fine-grained dependency graphs to dynamically determine which objects to load, and when. Since Polaris’ graphs have no missing edges, Polaris can aggressively fetch objects in a way that minimizes network round trips. Experiments in a variety of network conditions show that Polaris decreases page load times by 34% at the median, and 59% at the 95th percentile.

Is achieving security a hopeless quest? Seltzer, Margo, Mark Miller, David Mazières, and Yuanyuan Zhou. 2015. “Is achieving security a hopeless quest?.” SOSP ’15 SOSP History Day 2015, no. 13. New York, NY: ACM.
Margo, Daniel W., and Margo I. Seltzer. 2015. “Proceedings of the 41st International Conference on Very Large Data Bases.” Proceedings of the VLDB Endowment. Kohala Coast, Hawaii: The VLDB Endowment, 8, 1478-1489.Abstract
We present Scalable Host-tree Embeddings for Efficient Partitioning (Sheep), a distributed graph partitioning algorithm capable of handling graphs that far exceed main memory. Sheep produces high quality edge partitions an order of magnitude faster than both state of the art offline (eg, METIS) and streaming partitioners (eg, Fennel). Sheep’s partitions are independent of the input graph distribution, which means that graph elements can be assigned to processing nodes arbitrarily without affecting the partition quality.
Macko, Peter, Virendra Marathe, Daniel Margo, and Margo Seltzer. 2015. “LLAMA: Efficient graph analytics using Large Multiversioned Arrays.” 2015 IEEE 31st International Conference on Data Engineering (ICDE). Seoul, South Korea: IEEE.Abstract
We present LLAMA, a graph storage and analysis system that supports mutability and out-of-memory execution. LLAMA performs comparably to immutable main-memory analysis systems for graphs that fit in memory and significantly outperforms existing out-of-memory analysis systems for graphs that exceed main memory. LLAMA bases its implementation on the compressed sparse row (CSR) representation, which is a read-only representation commonly used for graph analytics. We augment this representation to support mutability and persistence using a novel implementation of multi-versioned array snapshots, making it ideal for applications that receive a steady stream of new data, but need to perform whole-graph analysis on consistent views of the data. We compare LLAMA to state-of-the-art systems on representative graph analysis workloads, showing that LLAMA scales well both out-of-memory and across parallel cores. Our evaluation shows that LLAMA’s mutability introduces modest overheads of 3-18% relative to immutable CSR for in-memory execution and that it outperforms state-of-the-art out-of-memory systems in most cases, with a best case improvement of 5x on breadth-first-search.
Eldridge, Schuyler, Amos Waterland, Margo Seltzer, Jonathan Appavoo, and Ajay Joshi. 2015. “Towards general-purpose neural network computing.” 2015 International Conference on Parallel Architecture and Compilation (PACT), 99–112. IEEE, 99–112.
Waterland, Amos, Elaine Angelino, Ryan P Adams, Jonathan Appavoo, and Margo Seltzer. 2014. “ASC: Automatically scalable computation.” ACM SIGPLAN Notices, 49: 575–590. ACM, 49, 4, 575–590.
Seltzer, Margo, John Arrasjid, Carolyn Rowland, Brian Noble, David Blank-Edelman, Sasha Fedorova, Niels Provos, Dan Wallach, Anne Dickison, and Casey Henderson. 2014. “USENIX Member Benefits”.
Carata, Lucian, Sherif Akoush, Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, Margo Seltzer, and Andy Hopper. 2014. “A primer on provenance.” Communications of the ACM 57 (5). ACM: 52–60.
More