Apache Pig - Diagnostic Operators


The LOAD statement simply loads the data into the specified relation in Apache Pig; it does not display anything. To verify that the LOAD statement worked as intended, use the diagnostic operators. Pig Latin provides four diagnostic operators:

  • Dump operator
  • Describe operator
  • Explain operator
  • Illustrate operator
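
As a rough orientation, the four operators listed above can be applied to the same relation as sketched here. This is a minimal sketch, assuming the student_data.txt path and schema used later in this chapter; the comments summarize what each operator generally does and are not output from an actual run.

```pig
-- Load the sample file (path and schema taken from the LOAD example in this chapter).
student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt'
   USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

DUMP student;        -- runs the statements and prints the tuples of the relation
DESCRIBE student;    -- prints the schema of the relation
EXPLAIN student;     -- prints the logical, physical, and MapReduce execution plans
ILLUSTRATE student;  -- shows a step-by-step run on a small sample of the data
```

The same statements can be typed one at a time at the grunt> prompt.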

In this chapter, we will discuss the Dump operator of Pig Latin.

Dump Operator

The Dump operator runs the Pig Latin statements and displays the results on the screen. It is generally used for debugging purposes.


Given below is the syntax of the Dump operator.

grunt> Dump Relation_Name;


Assume we have a comma-delimited file student_data.txt in HDFS, with one student record per line.


We have read it into a relation named student using the LOAD operator, as shown below.

grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt'
   USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, phone:chararray,
   city:chararray);

Now, let us print the contents of the relation using the Dump operator as shown below.

grunt> Dump student;

When you execute the above Pig Latin statement, Pig starts a MapReduce job to read the data from HDFS and produces the following output.

2015-10-01 15:05:27,642 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-10-01 15:05:27,652 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion  PigVersion  UserId    StartedAt             FinishedAt       Features             
2.6.0          0.15.0      Hadoop  2015-10-01 15:03:11  2015-10-01 05:27     UNKNOWN
Job Stats (time in seconds):
JobId           job_14459_0004
Maps                 1  
Reduces              0  
MaxMapTime          n/a    
MinMapTime          n/a
AvgMapTime          n/a 
MedianMapTime       n/a
MaxReduceTime        0
MinReduceTime        0  
AvgReduceTime        0
MedianReducetime     0
Alias             student 
Feature           MAP_ONLY        
Outputs           hdfs://localhost:9000/tmp/temp580182027/tmp757878456,

Input(s): Successfully read 0 records from: "hdfs://localhost:9000/pig_data/
Output(s): Successfully stored 0 records in: "hdfs://localhost:9000/tmp/temp580182027/

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: job_1443519499159_0004
2015-10-01 15:06:28,403 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2015-10-01 15:06:28,441 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-10-01 15:06:28,485 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-10-01 15:06:28,485 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
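
Because Dump prints every tuple of a relation, it can be convenient during debugging to dump only a few rows. The following is a minimal sketch, assuming the same student_data.txt file and schema as above; the alias first_rows and the row count of 5 are illustrative choices and not part of the original example.

```pig
-- Load the sample file (same path and schema as in the example above).
student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt'
   USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Keep at most 5 tuples; first_rows is a hypothetical alias chosen for this sketch.
first_rows = LIMIT student 5;

-- Print only the limited relation instead of the full one.
DUMP first_rows;
```

Without a preceding ORDER statement, LIMIT does not guarantee which tuples are returned, so this is suited to a quick look at the data rather than to reproducible output.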

