What is TrailDB?
TrailDB is an efficient tool for storing and querying series of events, in other words, things that happen over time. A series of events could be generated by a user of a mobile application, or by a trading algorithm making trades in the market. Events like these are discrete and often structured, such as JSON objects. This makes TrailDB different from typical time-series databases, which handle continuous data that can be aggregated easily, such as CPU utilization.
TrailDB shines at compressing large amounts of discrete event data into a small amount of space while keeping the data efficient to query. For instance, a TrailDB could include every action ever taken by every user on your website, or granular event logs from thousands of servers, all compressed into a single, neat file that can be small enough to analyze on a laptop.
TrailDBs are immutable files that your application can create and access using the TrailDB library. The library is implemented in C, which makes it extremely performant. Multiple language bindings are provided, including Go, Python, D, Haskell, and R.
TrailDB is designed to be a core building block for systems that need to store and process a large number of discrete events, organized by a primary key. It is complementary to existing relational databases and key-value stores.
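The idea of discrete events organized by a primary key can be illustrated with a minimal, pure-Python sketch. It does not use the TrailDB library or its file format; the sample events and the names `RAW_EVENTS`, `build_series`, and `SERIES` are hypothetical and exist only for illustration. The sketch groups events by key and orders each group by timestamp, which is the logical shape of the data described above.

```python
from collections import defaultdict

# Hypothetical raw events: (primary_key, unix_timestamp, fields).
# They arrive in arbitrary order, as they might from several servers.
RAW_EVENTS = [
    ("user-b", 1700000300, {"action": "open"}),
    ("user-a", 1700000200, {"action": "save"}),
    ("user-a", 1700000100, {"action": "open"}),
    ("user-b", 1700000400, {"action": "close"}),
]


def build_series(events):
    """Group events by primary key and sort each group by timestamp."""
    series = defaultdict(list)
    for key, timestamp, fields in events:
        series[key].append((timestamp, fields))
    for events_for_key in series.values():
        events_for_key.sort(key=lambda event: event[0])
    return dict(series)


SERIES = build_series(RAW_EVENTS)

if __name__ == "__main__":
    # Print each key with its actions in chronological order.
    for key in sorted(SERIES):
        print(key, [fields["action"] for _, fields in SERIES[key]])
```

A real TrailDB stores this structure in a compressed, immutable file rather than in an in-memory dictionary; the sketch only shows how the events are organized.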
It is easy to store and query series of events using existing databases. What makes TrailDB different is immutability: immutable data enables deeper compression, scalability, and architectural decisions that would not be feasible with existing databases. This is especially true in cloud environments, where object stores like Amazon S3 are a perfect match for compressed, immutable files.
A typical data pipeline using TrailDB consists of producers that encode new TrailDBs at regular intervals, e.g. daily, and push them to S3. Once in S3, TrailDBs can be easily processed using an arbitrary number of consumers in parallel, without any centralized bottlenecks. This straightforward architecture is easy to integrate in a larger production setup with minimal operational overhead.
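The shape of such a pipeline can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not TrailDB code: a temporary local directory stands in for S3, plain JSON files stand in for TrailDBs, and the names `produce_daily_file`, `count_actions`, `run_pipeline`, `DAILY_BATCHES`, and `TOTALS` are hypothetical. The producer writes one file per day and never touches it again; each consumer reads a single file on its own, and the partial results are merged at the end.

```python
import json
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def produce_daily_file(directory, day, events):
    """Producer: write one day's events to a new file that is never modified."""
    path = Path(directory) / f"events-{day}.json"
    path.write_text(json.dumps(events))
    return path


def count_actions(path):
    """Consumer: read a single file and aggregate it independently of the others."""
    events = json.loads(Path(path).read_text())
    return Counter(event["action"] for event in events)


def run_pipeline(daily_batches):
    """Write one file per day, process all files in parallel, then merge."""
    with tempfile.TemporaryDirectory() as directory:  # stand-in for an S3 bucket
        paths = [
            produce_daily_file(directory, day, events)
            for day, events in sorted(daily_batches.items())
        ]
        with ThreadPoolExecutor() as pool:
            # list() forces all consumers to finish before the directory is removed.
            partial_counts = list(pool.map(count_actions, paths))
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


# Hypothetical sample data: two daily batches of events.
DAILY_BATCHES = {
    "2024-01-01": [{"user": "a", "action": "open"}, {"user": "a", "action": "save"}],
    "2024-01-02": [{"user": "b", "action": "open"}],
}

TOTALS = run_pipeline(DAILY_BATCHES)

if __name__ == "__main__":
    print(dict(TOTALS))
```

Because the files are never modified after they are written, the consumers need no locking or coordination, which is why the number of consumers can grow without a central bottleneck.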
Developer productivity is another main motivation for TrailDB. Individual files are easy to manipulate using standard filesystem tools. The portable C library has only a few, readily available dependencies, which makes it simple to deploy. The API is clean and minimal by design. From a DevOps point of view, it is convenient to be able to observe slices of a large production system just by downloading a file.
Because TrailDB is used as a core building block in large production systems, maintainability, reliability, and robustness are of central importance. Test coverage is nearly 100%. TrailDB takes backwards compatibility very seriously: you should always be able to read older TrailDBs using the newest version of the library.