This document describes the Heron codebase and is intended primarily for developers who want to contribute to Heron. The Heron codebase lives on GitHub.

If you’re looking for documentation about developing topologies for a Heron cluster, see Building Topologies instead.

Languages

The primary programming languages for Heron are C++, Java, and Python.

  • C++ 11 is used for most of Heron’s core components, including the Topology Master and Stream Manager.

  • Java 8 is used primarily for Heron’s topology API and Heron Instance. It is currently the only language in which topologies can be written. Instructions can be found in Building Topologies, while documentation for the Java API can be found here. Note that Heron topologies do not require Java 8 and can be written in Java 7 or later.

  • Python 2 (specifically 2.7) is used primarily for Heron’s CLI and for UI components such as Heron UI and the Heron Tracker.

Main Tools

  • Build tool — Heron uses Bazel as its build tool. Information on setting up and using Bazel for Heron can be found in Compiling Heron.

  • Inter-component communication — Heron uses Protocol Buffers for communication between components. Most .proto definition files can be found in heron/proto.

  • Cluster coordination — Heron relies heavily on ZooKeeper for cluster coordination in distributed deployments, whether on Aurora or on a custom scheduler that you build. More information on ZooKeeper components in the codebase can be found in the State Management section below.

Common Utilities

The heron/common directory contains a variety of utilities for each of Heron’s languages, including useful constants, file utilities, networking interfaces, and more.

Cluster Scheduling

Heron supports two cluster schedulers out of the box: Aurora and a local scheduler. The Java code for each of those schedulers can be found in heron/schedulers, while the underlying scheduler API can be found here.

Info on custom schedulers can be found in Implementing a Custom Scheduler; info on the currently available schedulers can be found in Deploying Heron on Aurora and Local Deployment.

State Management

The parts of Heron’s codebase related to ZooKeeper are mostly contained in heron/state. There are ZooKeeper-facing interfaces for C++, Java, and Python that are used in a variety of Heron components.

Topology Components

Topology Master

The C++ code for Heron’s Topology Master can be found in heron/tmaster.

Stream Manager

The C++ code for Heron’s Stream Manager can be found in heron/stmgr.

Heron Instance

The Java code for Heron instances can be found in heron/instance.

Metrics Manager

The Java code for Heron’s Metrics Manager can be found in heron/metricsmgr.

If you’d like to implement your own custom metrics handler (known as a metrics sink), see Implementing a Custom Metrics Sink.

Developer APIs

Topology API

Heron’s API for writing topologies is written in Java. The code for this API can be found in heron/api.

Documentation for writing topologies can be found in Building Topologies, while API documentation can be found here.

Simulator

Heron enables you to run topologies in the Simulator for debugging purposes.

The Java API for the Simulator can be found in heron/simulator.

Example Topologies

Heron’s codebase includes a wide variety of example topologies built using Heron’s topology API for Java. Those examples can be found in heron/examples.

User Interface Components

Heron CLI

Heron has a tool called heron that both provides a CLI for managing topologies and performs much of the heavy lifting behind assembling physical topologies in your cluster. The Python code for heron can be found in heron/cli.

Sample configurations for different Heron schedulers

Heron Tracker

The Python code for the Heron Tracker can be found in heron/tracker.

The Tracker is a web server written in Python. It relies on the Tornado framework. You can add new HTTP routes to the Tracker in main.py and corresponding handlers in the handlers directory.
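As a rough illustration of the route-plus-handler split described above, here is a minimal Tornado sketch. It is not taken from the Tracker source: the handler name ExampleStatusHandler, the /example/status path, the build_status helper, and the default port are all hypothetical, and the code assumes a Tornado release that matches your Python interpreter. In the Tracker's layout, the handler class would go in the handlers directory and the route table in main.py.

```python
import tornado.ioloop
import tornado.web


def build_status():
    """Return the JSON-serializable payload served by the example route."""
    return {"status": "ok"}


# Hypothetical handler; in the layout described above, a class like this
# would live in the handlers directory.
class ExampleStatusHandler(tornado.web.RequestHandler):
    def get(self):
        # Tornado serializes a dict passed to write() as JSON.
        self.write(build_status())


def make_app():
    # Hypothetical route table; in the layout described above, the mapping
    # from URL patterns to handler classes would live in main.py.
    return tornado.web.Application([
        (r"/example/status", ExampleStatusHandler),
    ])


def main(port=8888):
    # The port is an assumption made for this sketch.
    app = make_app()
    app.listen(port)
    # Blocks until the event loop is stopped.
    tornado.ioloop.IOLoop.current().start()
```

Calling main() starts the server; a GET request to /example/status on the chosen port should then return the JSON payload. Keeping the payload in a plain function such as build_status makes it testable without starting a server.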

Heron UI

The Python code for the Heron UI can be found in heron/ui.

Like Heron Tracker, Heron UI is a web server written in Python that relies on the Tornado framework. You can add new HTTP routes to Heron UI in main.py and corresponding handlers in the handlers directory.

Heron Shell

The Python code for the Heron Shell can be found in heron/shell. The HTTP handlers and web server are defined in main.py while the HTML, JavaScript, CSS, and images for the web UI can be found in the assets directory.

Tests

Heron’s tests are spread throughout the codebase. For more info, see Testing Heron.