Apache Drill - Architecture



The previous chapters covered the fundamentals of Apache Drill. This chapter explains its architecture in detail. The following diagram illustrates the Apache Drill core module.

Apache Drill Core Module

The core module shown above consists of the following components.

  • Drillbit − Apache Drill runs a daemon service called the Drillbit. It is responsible for accepting requests from the client, processing queries, and returning results to the client. There is no master-slave relationship among Drillbits; any Drillbit can accept a query.

  • SQL Parser − The SQL parser parses all incoming queries. It is built on the open source framework Apache Calcite.

  • Logical Plan − A logical plan describes the abstract data flow of a query. Once a query is parsed into a logical plan, the Drill optimizer determines the most efficient execution plan using a variety of rule-based and cost-based techniques, translating the logical plan into a physical plan.

  • Optimizer − Apache Drill applies rule-based and cost-based optimizations, along with optimization rules exposed by the storage engine, to rewrite and split the query. The output of the optimizer is a distributed physical query plan. Optimization in Drill is pluggable, so you can provide optimization rules at various stages of query execution.

  • Physical Plan − The physical plan is also called the execution plan. It represents the most efficient way the optimizer found to execute the query across the nodes in the cluster. The physical plan is a DAG (directed acyclic graph) of physical operators, and each parent-child relationship describes how data flows through the graph.

  • Storage Engine Interface − The storage plugin interfaces in Drill represent the abstractions that Drill uses to interact with data sources. The plugins are extensible, so you can write new plugins for additional data sources.
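
To make the "DAG of physical operators" idea concrete, here is a minimal Python sketch. It is not Drill code: the `Operator` class, the `execution_order` helper, and the operator names are hypothetical and chosen only for illustration. It models each operator as a node whose children produce the data it consumes, and walks the graph so that every producer appears before its consumer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    """Hypothetical stand-in for a physical operator in a plan DAG."""
    name: str
    children: List["Operator"] = field(default_factory=list)


def execution_order(root: Operator) -> List[str]:
    """Return operator names so that each child (data producer)
    is listed before its parent (data consumer)."""
    order: List[str] = []
    seen = set()

    def visit(op: Operator) -> None:
        if id(op) in seen:  # a DAG node may be shared by several parents
            return
        seen.add(id(op))
        for child in op.children:
            visit(child)
        order.append(op.name)

    visit(root)
    return order


# Illustrative plan: filter one input, join it with another, return rows.
scan_orders = Operator("scan(orders)")
scan_customers = Operator("scan(customers)")
filter_orders = Operator("filter(amount > 100)", [scan_orders])
join = Operator("join(customer_id)", [filter_orders, scan_customers])
root = Operator("return_to_client", [join])

print(execution_order(root))
```

The post-order walk is one simple way to read the parent-child relationships: data flows from the scans at the leaves up to the root operator that hands rows back to the client.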

Query Execution Diagram

The following image shows a Drillbit query execution diagram −

Query Execution

The above diagram involves the following steps −

  • The Drill client issues a query. Any Drillbit in the cluster can accept queries from clients.

  • That Drillbit then parses the query, optimizes it, and generates a distributed query plan for fast and efficient execution.

  • The Drillbit that accepts the initial query becomes the Foreman (driving Drillbit) for the request.

  • The foreman gets a list of available Drillbit nodes in the cluster from ZooKeeper and schedules the execution of query fragments on individual nodes according to the execution plan.

  • The individual nodes finish their execution and return data to the foreman.

  • Finally, the foreman returns the results to the client.
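
The steps above can be sketched as a small simulation. This is a toy model under stated assumptions, not Drill's implementation: `list_drillbits` stands in for the ZooKeeper lookup, `plan_fragments` uses a simple round-robin split, and `run_fragment` pretends each node computes a partial sum for a query such as "sum all values greater than 10". All function and node names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def list_drillbits() -> List[str]:
    """Stand-in for asking ZooKeeper which Drillbits are available."""
    return ["drillbit-1", "drillbit-2", "drillbit-3"]


def plan_fragments(rows: List[int], nodes: List[str]) -> Dict[str, List[int]]:
    """Assign rows to nodes round-robin (an assumed, simplistic plan)."""
    assignment: Dict[str, List[int]] = {node: [] for node in nodes}
    for i, row in enumerate(rows):
        assignment[nodes[i % len(nodes)]].append(row)
    return assignment


def run_fragment(node: str, rows: List[int]) -> int:
    """Work done on one node: a partial sum of values greater than 10."""
    return sum(r for r in rows if r > 10)


def foreman_execute(rows: List[int]) -> int:
    """The foreman schedules fragments, collects partial results,
    and merges them into the final answer for the client."""
    nodes = list_drillbits()
    assignment = plan_fragments(rows, nodes)
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(
            pool.map(lambda item: run_fragment(*item), assignment.items())
        )
    return sum(partials)


print(foreman_execute([5, 12, 20, 7, 30, 11]))  # prints 73
```

The thread pool only imitates parallel execution on separate nodes; in a real cluster the fragments run on different machines and the assignment follows the distributed physical plan rather than a round-robin split.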
