This document contains information about the Heron codebase intended primarily for developers who want to contribute to Heron. The Heron codebase lives on GitHub.
If you’re looking for documentation about developing topologies for a Heron cluster, see Building Topologies instead.
Languages
The primary programming languages for Heron are C++, Java, and Python.
C++ 11 is used for most of Heron’s core components, including the Topology Master and Stream Manager.
Java 8 is used primarily for Heron’s topology API and Heron Instance. It is currently the only language in which topologies can be written. Instructions can be found in Building Topologies, while documentation for the Java API can be found here. Please note that Heron topologies do not require Java 8 and can be written in Java 7 or later.
Python 2 (specifically 2.7) is used primarily for Heron’s CLI and UI components such as Heron UI and the Heron Tracker.
Main Tools
Build tool: Heron uses Bazel as its build tool. Information on setting up and using Bazel for Heron can be found in Compiling Heron.
Inter-component communication: Heron uses Protocol Buffers for communication between components. Most .proto definition files can be found in heron/proto.
Cluster coordination: Heron relies heavily on ZooKeeper for cluster coordination in distributed deployments, be it for Aurora or for a custom scheduler that you build. More information on ZooKeeper components in the codebase can be found in the State Management section below.
Common Utilities
The heron/common directory contains a variety of utilities for each of Heron’s languages, including useful constants, file utilities, networking interfaces, and more.
Cluster Scheduling
Heron supports two cluster schedulers out of the box: Aurora and a local scheduler. The Java code for each of those schedulers can be found in heron/schedulers, while the underlying scheduler API can be found here.
Info on custom schedulers can be found in Implementing a Custom Scheduler; info on the currently available schedulers can be found in Deploying Heron on Aurora and Local Deployment.
State Management
The parts of Heron’s codebase related to ZooKeeper are mostly contained in heron/state. There are ZooKeeper-facing interfaces for C++, Java, and Python that are used in a variety of Heron components.
Topology Components
Topology Master
The C++ code for Heron’s Topology Master can be found in heron/tmaster.
Stream Manager
The C++ code for Heron’s Stream Manager can be found in heron/stmgr.
Heron Instance
The Java code for Heron instances can be found in heron/instance.
Metrics Manager
The Java code for Heron’s Metrics Manager can be found in heron/metricsmgr.
If you’d like to implement your own custom metrics handler (known as a metrics sink), see Implementing a Custom Metrics Sink.
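The component directories named in this section can be collected in a small lookup table. The sketch below is only an illustration: the table name and the helper function are hypothetical and not part of Heron, and the check assumes you pass it the root of a local checkout laid out as described above.

```python
import os

# Component name -> (implementation language, directory), as listed in this section.
# The table itself is an illustrative summary, not something shipped with Heron.
TOPOLOGY_COMPONENTS = {
    "Topology Master": ("C++", "heron/tmaster"),
    "Stream Manager": ("C++", "heron/stmgr"),
    "Heron Instance": ("Java", "heron/instance"),
    "Metrics Manager": ("Java", "heron/metricsmgr"),
}


def missing_component_dirs(repo_root):
    """Return the names of components whose directory is absent under repo_root."""
    missing = []
    for name, (_language, rel_path) in sorted(TOPOLOGY_COMPONENTS.items()):
        # Paths in the table use "/" separators; split so os.path.join is portable.
        full_path = os.path.join(repo_root, *rel_path.split("/"))
        if not os.path.isdir(full_path):
            missing.append(name)
    return missing
```

Calling missing_component_dirs with the path to your checkout returns an empty list when all four directories are present.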
Developer APIs
Topology API
Heron’s API for writing topologies is written in Java. The code for this API can be found in heron/api.
Documentation for writing topologies can be found in Building Topologies, while API documentation can be found here.
Simulator
Heron lets you run topologies in the Simulator for debugging purposes.
The Java API for the Simulator can be found in heron/simulator.
Example Topologies
Heron’s codebase includes a wide variety of example topologies built using Heron’s topology API for Java. Those examples can be found in heron/examples.
User Interface Components
Heron CLI
Heron has a tool called heron that both provides a CLI for managing topologies and performs much of the heavy lifting behind assembling physical topologies in your cluster.
The Python code for heron can be found in heron/cli.
Sample configurations for different Heron schedulers:
- Local scheduler config can be found in heron/config/src/yaml/conf/local.
- Aurora scheduler config can be found in heron/config/src/yaml/conf/aurora.
Heron Tracker
The Python code for the Heron Tracker can be found in heron/tracker.
The Tracker is a web server written in Python. It relies on the Tornado framework. You can add new HTTP routes to the Tracker in main.py and corresponding handlers in the handlers directory.
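As a rough illustration of the route-plus-handler pattern in Tornado, here is a minimal standalone sketch. It is not taken from the Tracker source: the /ping route, PingHandler, build_ping_payload, and make_app are hypothetical names chosen for this example, and the Tracker's actual main.py may organize its routes differently.

```python
import tornado.web


def build_ping_payload():
    """Body returned by the hypothetical /ping route."""
    return {"status": "ok"}


class PingHandler(tornado.web.RequestHandler):
    """Hypothetical handler; in the Tracker it would live in the handlers directory."""

    def get(self):
        # Tornado serializes a dict passed to write() as JSON.
        self.write(build_ping_payload())


def make_app():
    """Map URL patterns to handler classes, as a main.py-style module would."""
    return tornado.web.Application([
        (r"/ping", PingHandler),
    ])
```

To serve the application, call listen on the object returned by make_app with a port of your choice and then start tornado.ioloop.IOLoop.current().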
Heron UI
The Python code for the Heron UI can be found in heron/ui.
Like Heron Tracker, Heron UI is a web server written in Python that relies on the Tornado framework. You can add new HTTP routes to Heron UI in main.py and corresponding handlers in the handlers directory.
Heron Shell
The Python code for the Heron Shell can be found in heron/shell. The HTTP handlers and web server are defined in main.py, while the HTML, JavaScript, CSS, and images for the web UI can be found in the assets directory.
Tests
A wide variety of tests for Heron are scattered throughout the codebase. For more info, see Testing Heron.