Apache Drill - Custom Function



Apache Drill has an option to create custom functions. These custom functions are reusable SQL functions that you develop in Java to encapsulate the code that processes column values during a query.

Custom functions can perform calculations and transformations that the built-in SQL operators and functions do not provide. Custom functions are called from within a SQL statement, like a regular function, and return a single value. Apache Drill also supports custom aggregate functions, although that support is still evolving. Let us see how to create a simple custom function in this section.

IsPass Custom Function

Apache Drill provides a simple interface, “DrillSimpleFunc”, which we have to implement to create a new custom function. The “DrillSimpleFunc” interface has two methods, “setup” and “eval”. The “setup” method initializes any necessary variables. The “eval” method holds the actual custom function logic. Annotations on the class and its fields set the function name and mark the input and output variables.

Apache Drill provides a set of holder datatypes for input and output variables, such as BitHolder, VarCharHolder, BigIntHolder, IntHolder, etc. We use these datatypes to pass information between Drill and the custom function. Now, let us create a new application using Maven with “com.tutorialspoint.drill.function” as the package name and “is-pass” as the library name.

mvn archetype:generate -DgroupId=com.tutorialspoint.drill.function -DartifactId=is-pass \
   -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

Here,

  • -DgroupId − package name

  • -DartifactId − library (artifact) name

Then remove the App.java file, create a new Java file and name it “IsPassFunc.java”. This Java file will hold our custom function logic. The custom function checks whether a particular student has secured a pass in a particular subject by comparing the student's mark with the cutoff mark. The student mark is the first input and it changes with each record.

The second input is the cutoff mark, which is a constant and does not change between records. The custom function implements the “DrillSimpleFunc” interface and simply checks whether the given input is greater than or equal to the cutoff. If it is, the function returns true, otherwise false.

The code is as follows −

Coding: IsPassFunc.java

package com.tutorialspoint.drill.function;

import org.apache.drill.exec.expr.DrillSimpleFunc;

import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;

import org.apache.drill.exec.expr.holders.BigIntHolder;
import org.apache.drill.exec.expr.holders.BitHolder;

// name of the function to be used in drill
@FunctionTemplate(
   name = "ispass",
   scope = FunctionTemplate.FunctionScope.SIMPLE,
   nulls = FunctionTemplate.NullHandling.NULL_IF_NULL
)

public class IsPassFunc implements DrillSimpleFunc {

   // input - student mark
   @Param
   BigIntHolder input;
   
   // input - cutoff mark, constant value
   @Param(constant = true)
   BigIntHolder inputCutOff;
   
   // output - true / false
   @Output
   BitHolder out;
   
   public void setup() { }
   
   // main logic of the function. checks mark with cutoff and returns true / false.
   
   public void eval() {
   
      int mark = (int) input.value;
      int cutOffMark = (int) inputCutOff.value;
      if (mark >= cutOffMark) {
         out.value = 1;
      } else {
         out.value = 0;
      }
   }
}
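To sanity-check the comparison without a running Drill cluster, the logic of “eval” can be mirrored in plain Java. The following is only a sketch and not part of the Drill API: the class name IsPassSketch and the helper isPass are hypothetical, and plain long and int values stand in for BigIntHolder.value and BitHolder.value.

```java
// Hypothetical stand-alone mirror of IsPassFunc.eval(); it does not use any Drill classes.
final class IsPassSketch {

    // Returns 1 when the mark reaches the cutoff and 0 otherwise,
    // matching the values the custom function writes to out.value.
    static int isPass(long mark, long cutOff) {
        int m = (int) mark;     // same narrowing cast as in eval()
        int c = (int) cutOff;
        return (m >= c) ? 1 : 0;
    }

    public static void main(String[] args) {
        long cutOff = 35;
        long[] marks = {34, 35, 36};
        for (long mark : marks) {
            // prints "34 -> 0", "35 -> 1", "36 -> 1"
            System.out.println(mark + " -> " + isPass(mark, cutOff));
        }
    }
}
```

A mark equal to the cutoff counts as a pass here because the comparison uses >=, just as in “eval”.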

Now, create a resource file at is-pass/src/main/resources/drill-module.conf and place the following code into it.

drill {
   classpath.scanning {
      base.classes : ${?drill.classpath.scanning.base.classes} [
         com.tutorialspoint.drill.function.IsPassFunc
      ],
      packages : ${?drill.classpath.scanning.packages} [
         com.tutorialspoint.drill.function
      ]
   }
}

Apache Drill uses this configuration file to find the custom function class in the jar file. A jar file can contain any number of custom functions, and each of them should be properly configured here.

Finally, add the following configuration to “pom.xml” so that Maven compiles the custom function properly.

pom.xml

Change the following settings in the “pom.xml” file.

<dependencies>
   <dependency>
      <groupId>org.apache.drill.exec</groupId>
      <artifactId>drill-java-exec</artifactId>
      <version>1.1.0</version>
   </dependency>
</dependencies>

<build>
   <plugins>
      <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-source-plugin</artifactId>
         <version>2.4</version>
         <executions>
            <execution>
               <id>attach-sources</id>
               <phase>package</phase>
               <goals>
                  <goal>jar-no-fork</goal>
               </goals>
            </execution>
         </executions>
      </plugin>
      
      <plugin>
         <artifactId>maven-compiler-plugin</artifactId>
         <version>3.0</version>
         <configuration>
            <verbose>true</verbose>
            <compilerVersion>1.7</compilerVersion>
            <source>1.7</source>
            <target>1.7</target>
         </configuration>
      </plugin>
   </plugins>
</build>

After making all the changes, create a package using the following command.

mvn clean package

Maven will create the necessary jars, is-pass-1.0.jar and is-pass-1.0-sources.jar, in the "target" folder (the exact file names follow the version set in “pom.xml”). Now, copy both jar files and place them in /path/to/apache-drill/jars/3rdparty on all the Drill nodes.
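Since Drill relies on drill-module.conf to find the function, it can be worth confirming that the file actually ended up in the jar before distributing it. The following is a minimal sketch in plain Java and not part of Drill: the class name JarCheck and the helper hasEntry are hypothetical, and the jar path is passed on the command line.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper that looks for expected entries inside a jar file.
final class JarCheck {

    // Returns true if the jar at jarPath contains an entry with exactly this name.
    static boolean hasEntry(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarCheck <path-to-jar>");
            return;
        }
        String jar = args[0];
        // Maven copies src/main/resources to the root of the jar.
        System.out.println("drill-module.conf present: "
                + hasEntry(jar, "drill-module.conf"));
        System.out.println("IsPassFunc.class present: "
                + hasEntry(jar, "com/tutorialspoint/drill/function/IsPassFunc.class"));
    }
}
```

If either check reports false for the main jar, the project layout or the package name is the likely place to look before copying anything to the Drill nodes.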

After the jar files are placed properly on all the drillbits, restart all the drillbits, open a new Drill shell and then execute the query shown below.

Query

select name, ispass(mark1, 35) as is_pass from dfs.`/Users/../Workspace/drill_sample/student_list.json` limit 3;

Result

name    is_pass
Adam    true
Amit    true
Bob     true

Apache Drill custom functions are simple to create and provide great extension capabilities to the Drill query language.
