Graphs of Objects, Attributes, and Tables
GOAT (Graphs of Objects, Attributes, and Tables) is an umbrella for several related graph data projects at Harvard University:
- LLAMA, a graph storage library, based on the thesis work of Peter Macko
- Sheep, a graph partitioning/reordering library, based on the thesis work of Daniel Margo.
- and goatdb, a graph DBMS using libgoat.
Our advisor is Margo Seltzer.
LLAMA (Linked-node analytics using LArge Multiversioned Arrays) is a high performance graph storage library written in C++. It provides a writable storage format for graphs that vastly exceed the size of normal memory (i.e. billions of objects), as well as object attributes such as edge weights. Graph analysis algorithms implemented on LLAMA are highly competitive with the performance of other, less featureful systems.
LLAMA is based on the Large Multiversioned Array data structure developed by Peter Macko, which is a form of log-structured CSR (compressed sparse row matrix). It is especially well-suited for incremental graph applications, such as graphs generated and analyzed over time.
LLAMA's source code is available at: github.com/goatdb/llama
- Peter Macko, Virendra Marathe, Daniel Margo, and Margo Seltzer. LLAMA: Efficient Graph Analytics Using Large Multiversioned Arrays. 31st IEEE International Conference on Data Engineering (ICDE '15), Seoul, Korea, April 2015.
- Peter Macko. LLAMA: A Persistent, Mutable Representation for Graphs. Ph.D. Dissertation. Harvard University, Cambridge, MA, December 2014. [pdf]
Sheep (Scalable Host-tree Embeddings for Efficient Partitioning) is a scalable and efficient graph partitioning and reordering library written in C++. It improves the performance of graph-structured systems, such as LLAMA, by restructuring the graph to minimize communication costs. Sheep is a distributed algorithm that scales to extremely large graphs; it also operates efficiently on graphs that exceed main memory.
Sheep reduces the input graph to a much smaller tree structure. By partitioning this tree, Sheep approximately partitions the original graph. For a common class of graphs (the scale-free networks), Sheep outperforms other partitioners (such as METIS) by several orders of magnitude.
Sheep's source code is available at: github.com/dmargo/sheep
- Daniel Margo and Margo Seltzer. A Scalable Distributed Graph Partitioner. 41st International Conference on Very Large Data Bases (VLDB '15), Kohola Coast, Hawai'i, September 2015. [poster]
This material is based upon work supported by the National Science Foundation under Grant No. CSR-1302334, Collaborative Research with Erez Zadok and Ari Kaufman of SUNY Stony Brook and Geoffrey Keunning of Harvey Mudd College, "Workload-Aware Storage Architectures for Optimal Performance and Energy Efficiency." Parts of this work were also funded by Oracle Corporation through its External Research Organization and student internships.