Publications

2016
Whitehill, Jacob, and Margo Seltzer. 2016. “A Crowdsourcing Approach to Collecting 399 Tutorial Videos on Logarithms.” arXiv preprint arXiv:1606.09610.
Yang, Hongyu, Cynthia Rudin, and Margo Seltzer. 2016. “Scalable Bayesian Rule Lists.” arXiv preprint arXiv:1602.08610.
Athanassoulis, Manos, Michael S. Kester, Lukas M. Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. “Designing Access Methods: The RUM Conjecture.” International Conference on Extending Database Technology (EDBT),. Bordeaux, France. Abstract
The database research community has been building methods to store, access, and update data for more than four decades. Throughout the evolution of the structures and techniques used to access data, access methods adapt to the ever changing hardware and workload requirements. Today, even small changes in the workload or the hardware lead to a redesign of access methods. The need for new designs has been increasing as data generation and workload diversification grow exponentially, and hardware advances introduce increased complexity. New workload requirements are introduced by the emergence of new applications, and data is managed by large systems composed of more and more complex and heterogeneous hardware. As a result, it is increasingly important to develop application-aware and hardware-aware access methods. The fundamental challenges that every researcher, systems architect, or designer faces when designing a new access method are how to minimize, i) read times (R), ii) update cost (U), and iii) memory (or storage) overhead (M). In this paper, we conjecture that when optimizing the read-update-memory overheads, optimizing in any two areas negatively impacts the third. We present a simple model of the RUM overheads, and we articulate the RUM Conjecture. We show how the RUM Conjecture manifests in state-of-the-art access methods, and we envision a trend toward RUM-aware access methods for future data systems.
Netravali, Ravi, Ameesh Goyal, James Mickens, and Hari Balakrishnan, ed. 2016. “Polaris: Faster Page Loads Using Fine-grained Dependency Tracking.” NSDI 2016. Santa Clara, CA. Publisher's Version Abstract

To load a web page, a browser must fetch and evaluate objects like HTML files and JavaScript source code. Evaluating an object can result in additional objects being fetched and evaluated. Thus, loading a web page requires a browser to resolve a dependency graph; this partial ordering constrains the sequence in which a browser can process individual objects. Unfortunately, many edges in a page’s dependency graph are unobservable by today’s browsers. To avoid violating these hidden dependencies, browsers make conservative assumptions about which objects to process next, leaving the network and CPU underutilized.

We provide two contributions. First, using a new measurement platform called Scout that tracks fine-grained data flows across the JavaScript heap and the DOM, we show that prior, coarse-grained dependency analyzers miss crucial edges: across a test corpus of 200 pages, prior approaches miss 30% of edges at the median, and 118% at the 95th percentile. Second, we quantify the benefits of exposing these new edges to web browsers. We introduce Polaris, a dynamic client-side scheduler that is written in JavaScript and runs on unmodified browsers; using a fully automatic compiler, servers can translate normal pages into ones that load themselves with Polaris. Polaris uses fine-grained dependency graphs to dynamically determine which objects to load, and when. Since Polaris’ graphs have no missing edges, Polaris can aggressively fetch objects in a way that minimizes network round trips. Experiments in a variety of network conditions show that Polaris decreases page load times by 34% at the median, and 59% at the 95th percentile.

Karras, P., A. Nikitin, M. Saad, R. Bhatt, D. Antyukhov, and S. Idreos. 2016. “Adaptive Indexing over Encrypted Numeric Data.” SIGMOD International Conference on Management of Data. San Francisco, CA: ACM. Abstract
Today, outsourcing query processing tasks to remote cloud servers becomes a viable option; such outsourcing calls for encrypting data stored at the server so as to render it secure against eavesdropping adversaries and/or an honest-but-curious server itself. At the same time, to be efficiently managed, outsourced data should be indexed, and even adaptively so, as a side-effect of query pro- cessing. Computationally heavy encryption schemes render such outsourcing unattractive; an alternative, Order-Preserving Encryption Scheme (OPES), intentionally preserves and reveals the order in the data, hence is unattractive from the security viewpoint. In this paper, we propose and analyze a scheme for lightweight and indexable encryption, based on linear-algebra operations. Our scheme provides higher security than OPES and allows for range and point queries to be efficiently evaluated over encrypted numeric data, with decryption performed at the client side. We implement a prototype that performs incremental, query-triggered adaptive indexing over encrypted numeric data based on this scheme, without leaking order information in advance, and without prohibitive overhead, as our extensive experimental study demonstrates.
Netravali, Ravi, Ameesh Goyal, James Mickens, and Hari Balakrishnan, ed. 2016. “Polaris: Faster Page Loads Using Fine-grained Dependency Tracking.” NSDI 2016. Santa Clara, CA. Publisher's Version Abstract

To load a web page, a browser must fetch and evaluate objects like HTML files and JavaScript source code. Evaluating an object can result in additional objects being fetched and evaluated. Thus, loading a web page requires a browser to resolve a dependency graph; this partial ordering constrains the sequence in which a browser can process individual objects. Unfortunately, many edges in a page’s dependency graph are unobservable by today’s browsers. To avoid violating these hidden dependencies, browsers make conservative assumptions about which objects to process next, leaving the network and CPU underutilized.

We provide two contributions. First, using a new measurement platform called Scout that tracks fine-grained data flows across the JavaScript heap and the DOM, we show that prior, coarse-grained dependency analyzers miss crucial edges: across a test corpus of 200 pages, prior approaches miss 30% of edges at the median, and 118% at the 95th percentile. Second, we quantify the benefits of exposing these new edges to web browsers. We introduce Polaris, a dynamic client-side scheduler that is written in JavaScript and runs on unmodified browsers; using a fully automatic compiler, servers can translate normal pages into ones that load themselves with Polaris. Polaris uses fine-grained dependency graphs to dynamically determine which objects to load, and when. Since Polaris’ graphs have no missing edges, Polaris can aggressively fetch objects in a way that minimizes network round trips. Experiments in a variety of network conditions show that Polaris decreases page load times by 34% at the median, and 59% at the 95th percentile.

2015
Seltzer, Margo. 2015. “Automatically Scalable Computation.” Proceedings of the 29th ACM on International Conference on Supercomputing, 283–283. ACM.
Margo, Daniel, and Margo Seltzer. 2015. “A scalable distributed graph partitioner.” Proceedings of the VLDB Endowment 8 (12). VLDB Endowment: 1478–1489.
Eldridge, Schuyler, Amos Waterland, Margo Seltzer, Jonathan Appavoo, and Ajay Joshi. 2015. “Towards general-purpose neural network computing.” 2015 International Conference on Parallel Architecture and Compilation (PACT), 99–112. IEEE.
Balakrishnan, Nikilesh, Thomas Bytheway, Lucian Carata, Oliver RA Chick, James Snee, Sherif Akoush, Ripduman Sohan, Margo Seltzer, and Andy Hopper. 2015. “Recent advances in computer architecture: the opportunities and challenges for provenance.” 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 15).
Macko, Peter, Virendra Marathe, Daniel Margo, and Margo Seltzer. 2015. “LLAMA: Efficient graph analytics using Large Multiversioned Arrays.” 2015 IEEE 31st International Conference on Data Engineering (ICDE). Seoul, South Korea: IEEE. Abstract
We present LLAMA, a graph storage and analysis system that supports mutability and out-of-memory execution. LLAMA performs comparably to immutable main-memory analysis systems for graphs that fit in memory and significantly outperforms existing out-of-memory analysis systems for graphs that exceed main memory. LLAMA bases its implementation on the compressed sparse row (CSR) representation, which is a read-only representation commonly used for graph analytics. We augment this representation to support mutability and persistence using a novel implementation of multi-versioned array snapshots, making it ideal for applications that receive a steady stream of new data, but need to perform whole-graph analysis on consistent views of the data. We compare LLAMA to state-of-the-art systems on representative graph analysis workloads, showing that LLAMA scales well both out-of-memory and across parallel cores. Our evaluation shows that LLAMA’s mutability introduces modest overheads of 3-18% relative to immutable CSR for in-memory execution and that it outperforms state-of-the-art out-of-memory systems in most cases, with a best case improvement of 5x on breadth-first-search.
Margo, Daniel W., and Margo I. Seltzer. 2015. “A Scalable Distributed Graph Partitioner .” Proceedings of the VLDB Endowment. Kohala Coast, Hawaii: The VLDB Endowment, 8, 1478-1489. Abstract

We present Scalable Host-tree Embeddings for Efficient Partitioning (Sheep), a distributed graph partitioning algorithm capable of handling graphs that far exceed main memory. Sheep produces high quality edge partitions an order of magnitude faster than both state of the art offline (eg, METIS) and streaming partitioners (eg, Fennel). Sheep’s partitions are independent of the input graph distribution, which means that graph elements can be assigned to processing nodes arbitrarily without affecting the partition quality.

2014
Waterland, Amos, Elaine Angelino, Ryan P Adams, Jonathan Appavoo, and Margo Seltzer. 2014. “ASC: Automatically scalable computation.” ACM SIGPLAN Notices, 49: 575–590. ACM.
Rao, Malvika, David C Parkes, Margo Seltzer, and David F Bacon. 2014. A Framework for Incentivizing Deep Fixes. Vol. 14. WIT-EC, 14.
Carata, Lucian, Sherif Akoush, Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, Margo Seltzer, and Andy Hopper. 2014. “A primer on provenance.” Communications of the ACM 57 (5). ACM: 52–60.
Angelino, Elaine, Eddie Kohler, Amos Waterland, Margo Seltzer, and Ryan P Adams. 2014. “Accelerating MCMC via parallel predictive prefetching.” arXiv preprint arXiv:1403.7265.
Arce, Iván, Kathleen Clark-Fisher, Neil Daswani, Jim DelGrosso, Danny Dhillon, Christoph Kern, Tadayoshi Kohno, et al. 2014. “Avoiding the top 10 software security design flaws.” IEEE Computer Society.
Appavoo, Jonathan, Amos Waterland, Katherine Zhao, Schuyler Eldridge, Ajay Joshi, Margo Seltzer, and Steve Homer. 2014. “Programmable smart machines: A hybrid neuromorphic approach to general purpose computation.” Proc. Neuromorphic Architectures (NeuroArch) Workshop at 41th International Symposium on Computer Architecture (ISCA-41).
2013
Waterland, Amos, Elaine Angelino, Ekin D Cubuk, Efthimios Kaxiras, Ryan P Adams, Jonathan Appavoo, and Margo Seltzer. 2013. “Computational caches.” Proceedings of the 6th International Systems and Storage Conference, 8. ACM.
Rasssen, Jeremy A, Peter M Wahl, Elaine Angelino, Margo I Seltzer, Marc D Rosenman, and Sebastian Schneeweiss. 2013. “Automated Use of Electronic Health Record Text Data To Improve Validity in Pharmacoepidemiology Studies.” PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 22: 376–376. WILEY-BLACKWELL 111 RIVER ST, HOBOKEN 07030-5774, NJ USA.

Pages