Bullet ...

How is this useful

How Bullet is used is largely determined by the data source it consumes. The kind of data you put Bullet on shapes the queries you run and the use-cases you can serve. Because Bullet is a look-forward query system with no persistence, you cannot repeat a query on the same data: each time you submit a query, it operates on the data that arrives after that submission. If this usage pattern is what you need and you are looking for a lightweight system that can tap into your streaming data, then Bullet is for you!

Example: How Bullet is used at Yahoo

Bullet is used in production internally at Yahoo, where it sits on a subset of raw user engagement events from Yahoo sites and apps. This lets Yahoo developers automatically validate their instrumentation code end-to-end in their Continuous Delivery pipelines. Validating instrumentation is critical since it powers nearly all decisions and products, including machine learning, corporate KPIs, analytics, personalization and targeting.

This instance of Bullet also powers other use-cases, such as letting analysts validate assumptions about data, letting product managers verify launches instantly, debugging issues and outages, or simply exploring and playing around with the data.

Blog post

Here is a link to our blog post condensing most of this information if you want to take a look.

Quick Start

See Quick Start to set up Bullet on a local Storm topology. We will generate some synthetic streaming data that you can then query with Bullet.

Setting up Bullet on your streaming data

To set up Bullet on a real data stream, you need:

  1. The Bullet backend set up on a stream processing framework. Currently, we support Bullet on Storm:
    1. Plug in your source of data. See Getting your data into Bullet for details
    2. Consume your data stream
  2. The Web Service set up to convey queries to the backend and return results from it
  3. The optional UI set up to talk to your Web Service. You can skip the UI if all your access is programmatic

Schema in the UI

The UI also needs an endpoint that provides your data schema to help with query building. The Web Service you set up provides a simple file based schema endpoint that you can point the UI to if that is sufficient for your needs.

Querying in Bullet

Bullet queries allow you to filter, project and aggregate data. They let you fetch raw data (the individual data records) as well as aggregated data.

Termination conditions

A Bullet query terminates and returns whatever has been collected so far when:

  1. A maximum duration is reached. In other words, a query runs for a defined time window
  2. A maximum number of records is reached (only applicable for queries that are fetching raw data records and not aggregating).
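The two termination conditions can be illustrated with a minimal Python sketch. This is not Bullet's implementation: `run_raw_query`, its parameters and the injectable `clock` are hypothetical names chosen for the example, and the time check here only happens when a record arrives.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def run_raw_query(
    stream: Iterable[Record],
    max_duration_s: float,
    max_records: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
) -> List[Record]:
    """Collect records until the time window closes or enough records arrive."""
    deadline = clock() + max_duration_s
    collected: List[Record] = []
    for record in stream:
        # Condition 1: the query's time window has elapsed.
        if clock() >= deadline:
            break
        collected.append(record)
        # Condition 2: enough raw records were collected (raw queries only).
        if max_records is not None and len(collected) >= max_records:
            break
    # Whatever has been collected so far is the result.
    return collected
```

An aggregating query would leave `max_records` unset and rely on the duration alone.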


Filters

Bullet supports two kinds of filters:

| Filter Type | Meaning |
|---|---|
| Logical filter | Allows you to combine filter clauses (Logical or Relational) with logical operations like AND, OR and NOT |
| Relational filter | Allows you to use comparison operations like equals, not equals, greater than, less than, regex like, etc. on fields |
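To make the nesting of the two filter kinds concrete, here is a small Python sketch of how such clauses could be evaluated against one record. The dictionary layout (`operation`, `clauses`, `field`, `value`), the operator names and the `matches` function are assumptions made for illustration only; see the Bullet query documentation for the actual query format.

```python
import operator
import re
from typing import Any, Dict

Record = Dict[str, Any]

# Relational operations compare one field against one value.
RELATIONAL = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "RLIKE": lambda field_value, pattern: re.search(pattern, str(field_value)) is not None,
}


def matches(record: Record, clause: Dict[str, Any]) -> bool:
    """Evaluate a logical or relational clause against a single record."""
    op = clause["operation"]
    # Logical filters combine other clauses, which may themselves be logical.
    if op == "AND":
        return all(matches(record, c) for c in clause["clauses"])
    if op == "OR":
        return any(matches(record, c) for c in clause["clauses"])
    if op == "NOT":
        return not matches(record, clause["clauses"][0])
    # Relational filters compare a field with a value.
    value = record.get(clause["field"])
    if value is None:
        # Design choice for this sketch: a missing field never matches.
        return False
    return RELATIONAL[op](value, clause["value"])
```

For example, an AND clause wrapping an equality check on `country` and an OR of two further relational clauses matches only records that satisfy the equality and at least one branch of the OR.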


Projections

Projections allow you to pull out only the fields needed and rename them when you are querying for raw data records.
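As a rough illustration of what a projection does to a raw record, consider the following Python sketch. The `project` function and the mapping of source field name to new name are hypothetical and are not Bullet's API.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def project(record: Record, fields: Dict[str, str]) -> Record:
    """Keep only the requested fields, renaming each source field to its new name.

    A field that is missing from the record is projected as None.
    """
    return {new_name: record.get(old_name) for old_name, new_name in fields.items()}
```

Calling `project({"user_id": "u1", "page": "/home", "ts": 17}, {"user_id": "user", "ts": "timestamp"})` drops `page` and returns a record with the keys `user` and `timestamp`.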


Aggregations

Aggregations allow you to perform some operation on the collected records.

The current aggregation types that are supported are:

| Aggregation | Meaning |
|---|---|
| GROUP | The resulting output would be a record containing the result of an operation for each unique value combination in your specified fields |
| COUNT DISTINCT | Computes the number of distinct elements in the fields. (May be approximate) |
| LIMIT or RAW | The resulting output would be at most the number of records specified in size |
| DISTRIBUTION | Computes distributions of the elements in the field. E.g. find the median value or various percentiles of a field, or get frequency or cumulative frequency distributions |
| TOP K | Returns the top K most frequently appearing values in the column |

Currently we support GROUP aggregations with the following operations:

| Operation | Meaning |
|---|---|
| COUNT | Computes the number of the elements in the group |
| SUM | Computes the sum of the non-null values in the provided field for all elements in the group |
| MIN | Returns the minimum of the non-null values in the provided field for all the elements in the group |
| MAX | Returns the maximum of the non-null values in the provided field for all the elements in the group |
| AVG | Computes the average of the non-null values in the provided field for all the elements in the group |
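The semantics of the GROUP operations, in particular that COUNT counts every element while SUM, MIN, MAX and AVG only consider non-null values, can be sketched in Python as follows. This is an exact, in-memory illustration; `group_aggregate` and its parameters are hypothetical names and the code does not reflect how Bullet computes groups internally.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def group_aggregate(
    records: Iterable[Record], group_fields: List[str], value_field: str
) -> Dict[Tuple[Any, ...], Dict[str, Any]]:
    """Compute COUNT, SUM, MIN, MAX and AVG of value_field per group."""
    counts: Dict[Tuple[Any, ...], int] = defaultdict(int)
    values: Dict[Tuple[Any, ...], List[Any]] = defaultdict(list)
    for record in records:
        # One group per unique combination of values in the group fields.
        key = tuple(record.get(field) for field in group_fields)
        counts[key] += 1  # COUNT includes records whose value is null
        value = record.get(value_field)
        if value is not None:
            values[key].append(value)  # the other operations skip nulls

    result = {}
    for key, count in counts.items():
        non_null = values[key]
        result[key] = {
            "COUNT": count,
            "SUM": sum(non_null) if non_null else None,
            "MIN": min(non_null) if non_null else None,
            "MAX": max(non_null) if non_null else None,
            "AVG": sum(non_null) / len(non_null) if non_null else None,
        }
    return result
```

Keeping every value in a list is only acceptable for a toy example; a streaming system would keep running totals per group instead.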


Results

The Bullet Web Service returns your query result as well as associated metadata information in a structured JSON format. The UI can display the results in different formats.

Approximate computation

It is often intractable to perform aggregations on an unbounded stream of data and still support arbitrary queries. However, it is possible if an exact answer is not required and the approximate answer's error is exactly quantifiable. There are stochastic algorithms and data structures that let us do this. We use Data Sketches to perform aggregations such as counting uniques, and will be using Sketches to implement some future aggregations.

Sketches let us be exact in our computation up to configured thresholds and approximate after. The error is very controllable and quantifiable. All Bullet queries that use Sketches return the error bounds with Standard Deviations as part of the results so you can quantify the error exactly. Using Sketches lets us address otherwise hard-to-solve problems in sub-linear space. We use Sketches to compute COUNT DISTINCT, GROUP, DISTRIBUTION and TOP K queries.
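The idea of being exact up to a threshold and approximate after it can be illustrated with a toy K-Minimum-Values (KMV) distinct-count sketch in Python. This is a swapped-in textbook technique written for illustration, not the Data Sketches library and not Bullet's code. The class name, `k` parameter and method names are all assumptions.

```python
import hashlib
import heapq
import math
from typing import Any, List, Set


class KMVSketch:
    """Toy distinct counter that keeps only the k smallest hash values."""

    def __init__(self, k: int = 1024) -> None:
        self.k = k
        self._heap: List[float] = []   # negated hashes: a max-heap of the k smallest
        self._members: Set[float] = set()
        self._dropped = False          # True once any distinct hash was discarded

    @staticmethod
    def _hash(item: Any) -> float:
        # Map the item to a pseudo-uniform value in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item: Any) -> None:
        h = self._hash(item)
        if h in self._members:
            return  # duplicate of a retained item
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with the smaller new one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)
            self._dropped = True
        else:
            self._dropped = True  # too large to retain

    def estimate(self) -> float:
        if not self._dropped:
            return float(len(self._heap))  # exact mode: nothing was discarded
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest

    def relative_standard_error(self) -> float:
        # Zero in exact mode, roughly 1 / sqrt(k - 2) once approximating.
        return 0.0 if not self._dropped else 1.0 / math.sqrt(self.k - 2)
```

With `k=256`, a stream of 100 distinct users is counted exactly, while a stream of 20,000 distinct users yields an estimate with a relative standard error of roughly 6%. Memory stays bounded by `k` either way, which is the trade-off a larger or smaller Sketch size controls.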

We also use Sketches as a way to control high cardinality grouping (group by a natural key column or related) and rely on the Sketching data structure to drop excess groups. It is up to you, when setting up Bullet, to choose Sketch sizes large or small enough to satisfy the queries that will be performed on that instance of Bullet.



High Level Architecture

The Bullet backend can be split into three main sub-systems:

  1. Request Processor - receives queries, adds metadata and sends them to the rest of the system
  2. Data Processor - reads data from an input stream, converts it to a unified data format and matches it against queries
  3. Combiner - combines results for different queries, performs final aggregations and returns results
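The division of labor between the three sub-systems can be sketched as three small Python functions. This is a single-process illustration only: the function names, the query dictionary layout and the lowercase-keys "unified format" are assumptions made for the example, and the real backend runs these roles as distributed components on the stream processing framework.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, Any]
Query = Dict[str, Any]


def request_processor(query_id: str, predicate: Callable[[Record], bool]) -> Query:
    """Receive a query and attach metadata before passing it on."""
    return {"id": query_id, "predicate": predicate, "metadata": {"origin": "example"}}


def data_processor(
    raw_events: Iterable[Dict[str, Any]], queries: List[Query]
) -> Iterator[Tuple[str, Record]]:
    """Read raw events, convert them to one format and match them against queries."""
    for raw in raw_events:
        # Stand-in for a unified data format: normalize field names to lowercase.
        record = {str(name).lower(): value for name, value in raw.items()}
        for query in queries:
            if query["predicate"](record):
                yield query["id"], record


def combiner(matched: Iterable[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    """Combine the matched records per query into final results."""
    results: Dict[str, List[Record]] = defaultdict(list)
    for query_id, record in matched:
        results[query_id].append(record)
    return dict(results)
```

Chaining them as `combiner(data_processor(events, queries))` returns one list of matching records per query id.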

Web Service and UI

The remaining two pieces are the standard ones in a full-stack application:

The Bullet Web Service is built using Jersey and the UI is built in Ember.

The Web Service can be deployed with your favorite servlet container like Jetty. The UI is a client-side application that can be served using Node.js.

In the case of Bullet on Storm, the Web Service and UI talk to the backend using Storm DRPC.

End-to-End Architecture on Storm

[Figure: Overall Storm Architecture]

Want to know more?

In practice, the backend is implemented using the basic components that the stream processing framework provides. See Storm Architecture for details.

Past Releases and Source

See the Releases section where the various Bullet releases and repository links are collected in one place.