Configuring a Cluster
To set up a Heron cluster, you need to configure a few files. Each file configures a component of the Heron streaming framework.
scheduler.yaml: This file specifies the required classes for the launcher, the scheduler, and the runtime management of the topology. Any other scheduler-specific parameters also go into this file.
statemgr.yaml: This file contains the classes and the configuration for the state manager. The state manager maintains the running state of the topology as the logical plan, physical plan, scheduler state, and execution state.
uploader.yaml: This file specifies the classes and configuration for the uploader, which uploads the topology jars to storage. Once the containers are scheduled, they download these jars from storage in order to run.
heron_internals.yaml: This file contains parameters that control how Heron behaves. Tuning these parameters requires advanced knowledge of the Heron architecture and its components. To start, the best option is to copy the file provided with the sample configuration. Once you are familiar with the system, you can tune these parameters to achieve high-throughput or low-latency topologies.
metrics_sinks.yaml: This file specifies where the run-time system and topology metrics will be routed. By default, the file sink and tmaster sink need to be present. In addition, scribe sink and graphite sink are also supported.
packing.yaml: This file specifies the classes for the packing algorithm, which defaults to Round Robin if not specified.
client.yaml: This file controls the behavior of the heron client. This file is optional.
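As a quick sanity check on the list above, the following minimal Python sketch reports which of the listed files are absent from a configuration directory. It is an illustration, not part of Heron: the function name missing_config_files and the idea of keeping all files in one directory per cluster are assumptions, and every file other than client.yaml is treated as required because only client.yaml is described as optional.

```python
from pathlib import Path

# File names are taken from the list above. Treating everything except
# client.yaml as required is an assumption made for this sketch.
REQUIRED_FILES = (
    "scheduler.yaml",
    "statemgr.yaml",
    "uploader.yaml",
    "heron_internals.yaml",
    "metrics_sinks.yaml",
    "packing.yaml",
)
OPTIONAL_FILES = ("client.yaml",)


def missing_config_files(conf_dir):
    """Return the required file names that are not present in conf_dir.

    The result preserves the order of REQUIRED_FILES; an empty list means
    every required file was found. This helper is hypothetical and only
    checks for file presence, not for valid contents.
    """
    conf_path = Path(conf_dir)
    return [name for name in REQUIRED_FILES if not (conf_path / name).is_file()]
```

Calling missing_config_files with the path to a cluster's configuration directory returns an empty list when all six files are in place, so the result can be used directly in a pre-deployment check.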
Assembling the Configuration
All configuration files are assembled together to form the cluster configuration. For example, a cluster named devcluster that uses Aurora as the scheduler, ZooKeeper as the state manager, and HDFS as the uploader will have the following set of configurations.