Apache Flume - Configuration
After installing Flume, we configure it through its configuration file, which is a Java properties file made up of key-value pairs. Configuring an agent means assigning values to the keys in this file.
In the Flume configuration file, we need to −
- Name the components of the current agent.
- Describe/Configure the source.
- Describe/Configure the sink.
- Describe/Configure the channel.
- Bind the source and the sink to the channel.
A Flume configuration can define multiple agents. Each agent is identified by a unique name, and every property that belongs to an agent is prefixed with that name.
Naming the Components
First of all, you need to name/list the components such as sources, sinks, and the channels of the agent, as shown below.
    agent_name.sources = source_name
    agent_name.sinks = sink_name
    agent_name.channels = channel_name
Flume supports various sources, sinks, and channels. They are listed in the table given below.
You can use any of them. For example, if you are transferring Twitter data from a Twitter source through a memory channel to an HDFS sink, and the agent name is TwitterAgent, then the components are listed as follows.
    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS
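Each of these keys takes a space-separated list, so an agent with more than one component of a kind names them all on the same line. A minimal sketch, assuming a hypothetical second sink called Logger is added to the agent (the name is illustrative and not part of the original example):

```properties
# Hypothetical variant of TwitterAgent with two sinks.
# "Logger" is an assumed name used only for illustration.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS Logger
```

Every name listed this way then needs its own block of properties, described in the sections that follow.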
After listing the components of the agent, you have to describe the source(s), sink(s), and channel(s) by providing values to their properties.
Describing the Source
Each source will have a separate list of properties. The property named “type” is common to every source, and it is used to specify the type of the source we are using.
Along with the property “type”, you must provide values for all the required properties of the particular source, as shown below.
    agent_name.sources.source_name.type = value
    agent_name.sources.source_name.property2 = value
    agent_name.sources.source_name.property3 = value
For example, for the Twitter source, the following are the properties that must be given values.
    # type takes the source's type name; for the Twitter source bundled with Flume
    # this is its fully qualified class name
    TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
    TwitterAgent.sources.Twitter.consumerKey =
    TwitterAgent.sources.Twitter.consumerSecret =
    TwitterAgent.sources.Twitter.accessToken =
    TwitterAgent.sources.Twitter.accessTokenSecret =
Describing the Sink
Just like the source, each sink has its own list of properties. The property named “type” is common to every sink, and it specifies the type of sink being used. Along with “type”, you must provide values for all the required properties of the particular sink, as shown below.
    agent_name.sinks.sink_name.type = value
    agent_name.sinks.sink_name.property2 = value
    agent_name.sinks.sink_name.property3 = value
For example, for the HDFS sink, the following are the properties that must be given values.
    # hdfs is the type name of the HDFS sink
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = <HDFS directory path in which to store the data>
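Beyond the type and hdfs.path, the HDFS sink accepts optional properties that control the file format and when files are rolled. A minimal sketch, assuming a hypothetical NameNode address and target directory; the values are illustrative rather than recommendations:

```properties
TwitterAgent.sinks.HDFS.type = hdfs
# Assumed NameNode address and directory; replace with your own.
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets
# Write events as plain text instead of the default SequenceFile format.
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
# Roll to a new file every 600 seconds; 0 disables size- and count-based rolling.
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
```

Leaving the roll settings at their defaults tends to produce many small files, which is why time-based rolling alone is sketched here.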
Describing the Channel
Flume provides various channels to transfer data between sources and sinks. Therefore, along with the sources and the sinks, you must describe the channel used in the agent.
To describe each channel, you need to set the required properties, as shown below.
    agent_name.channels.channel_name.type = value
    agent_name.channels.channel_name.property2 = value
    agent_name.channels.channel_name.property3 = value
For example, for the memory channel, the type is the property that must be given a value.
    # memory is the type name of the memory channel
    TwitterAgent.channels.MemChannel.type = memory
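The memory channel also accepts optional sizing properties. A minimal sketch with illustrative values, which are assumptions chosen for the example and not tuned settings:

```properties
TwitterAgent.channels.MemChannel.type = memory
# Maximum number of events the channel can hold at once.
TwitterAgent.channels.MemChannel.capacity = 10000
# Maximum number of events taken from a source or given to a sink per transaction.
TwitterAgent.channels.MemChannel.transactionCapacity = 100
```

Because a memory channel keeps events in RAM, any events still in the channel are lost if the agent process stops.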
Binding the Source and the Sink to the Channel
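Since a channel connects the sources and the sinks, both must be bound to it, as shown below. A source can write to several channels, so its key is “channels” (plural); a sink reads from exactly one channel, so its key is “channel” (singular).
    agent_name.sources.source_name.channels = channel_name
    agent_name.sinks.sink_name.channel = channel_name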
The following example shows how to bind the source and the sink to a channel. Here, we consider the Twitter source, the memory channel, and the HDFS sink. Note that the sink uses the singular key “channel”.
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sinks.HDFS.channel = MemChannel
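Putting the steps together, a complete configuration file for this agent might look like the sketch below. Several values are assumptions that do not come from the original text: the source type is the fully qualified class name of the Twitter source bundled with Flume, the credentials are placeholders to fill in, and the HDFS path is a hypothetical location.

```properties
# 1. Name the components of the agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# 2. Describe the source (credential values are placeholders).
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <your consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <your consumer secret>
TwitterAgent.sources.Twitter.accessToken = <your access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your access token secret>

# 3. Describe the sink (the path is an assumed example location).
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets

# 4. Describe the channel.
TwitterAgent.channels.MemChannel.type = memory

# 5. Bind the source and the sink to the channel.
#    Sources use "channels" (plural); sinks use "channel" (singular).
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
```

Every key starts with the agent name, so a file like this can hold several agents side by side, and the name given when starting Flume selects which one runs.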
Starting a Flume Agent
After configuration, we have to start the Flume agent. It is done as follows −
$ bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
agent − Command to start the Flume agent
--conf, -c <conf> − Use the configuration files in the <conf> directory
--conf-file, -f <file> − Specifies the path of the agent's configuration file
--name, -n <name> − Name of the agent to start (TwitterAgent in this example)
-Dproperty=value − Sets a Java system property value