Apache Flink - Libraries



In this chapter, we will learn about the different libraries of Apache Flink.

Complex Event Processing (CEP)

FlinkCEP is an API in Apache Flink, which analyses event patterns on continuous streaming data. These events are near real time, which have high throughput and low latency. This API is used mostly on Sensor data, which come in real-time and are very complex to process.

CEP analyses the pattern of the input stream and gives the result very soon. It has the ability to provide real-time notifications and alerts in case the event pattern is complex. FlinkCEP can connect to different kind of input sources and analyse patterns in them.

This how a sample architecture with CEP looks like −

architecture with CEP

Sensor data will be coming in from different sources, Kafka will act as a distributed messaging framework, which will distribute the streams to Apache Flink, and FlinkCEP will analyse the complex event patterns.

You can write programs in Apache Flink for complex event processing using Pattern API. It allows you to decide the event patterns to detect from the continuous stream data. Below are some of the most commonly used CEP patterns −

Begin

It is used to define the starting state. The following program shows how it is defined in a Flink program −

Pattern<Event, ?> next = start.next("next");

Where

It is used to define a filter condition in the current state.

patternState.where(new FilterFunction <Event>() {  
   @Override 
      public boolean filter(Event value) throws Exception { 
   } 
});

Next

It is used to append a new pattern state and the matching event needed to pass the previous pattern.

Pattern<Event, ?> next = start.next("next");

FollowedBy

It is used to append a new pattern state but here other events can occur b/w two matching events.

Pattern<Event, ?> followedBy = start.followedBy("next");

Gelly

Apache Flink's Graph API is Gelly. Gelly is used to perform graph analysis on Flink applications using a set of methods and utilities. You can analyse huge graphs using Apache Flink API in a distributed fashion with Gelly. There are other graph libraries also like Apache Giraph for the same purpose, but since Gelly is used on top of Apache Flink, it uses single API. This is very helpful from development and operation point of view.

Let us run an example using Apache Flink API − Gelly.

Firstly, you need to copy 2 Gelly jar files from opt directory of Apache Flink to its lib directory. Then run flink-gelly-examples jar.

cp opt/flink-gelly* lib/ 
./bin/flink run examples/gelly/flink-gelly-examples_*.jar 
Gelly

Let us now run the PageRank example.

PageRank computes a per-vertex score, which is the sum of PageRank scores transmitted over in-edges. Each vertex's score is divided evenly among out-edges. High-scoring vertices are linked to by other high-scoring vertices.

The result contains the vertex ID and the PageRank score.

usage: flink run examples/flink-gelly-examples_<version>.jar --algorithm PageRank [algorithm options] --input <input> [input options] --output <output> [output options] 

./bin/flink run examples/gelly/flink-gelly-examples_*.jar --algorithm PageRank --input CycleGraph --vertex_count 2 --output Print 
PageRank score
Advertisements