Apache Pig - CONCAT()



The CONCAT() function of Pig Latin is used to concatenate two or more expressions of the same type.

Syntax

grunt> CONCAT (expression, expression, [...expression])

Example

Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.

student_details.txt

001,Rajiv,Reddy,21,9848022337,Hyderabad,89
002,siddarth,Battacharya,22,9848022338,Kolkata,78
003,Rajesh,Khanna,22,9848022339,Delhi,90
004,Preethi,Agarwal,21,9848022330,Pune,93
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar,75
006,Archana,Mishra,23,9848022335,Chennai,87
007,Komal,Nayak,24,9848022334,trivendram,83
008,Bharathi,Nambiayar,24,9848022333,Chennai,72

And we have loaded this file into Pig with the relation name student_details as shown below.

grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray, gpa:int);

Concatenating Two Strings

We can use the CONCAT() function to concatenate two or more expressions. First of all, verify the contents of the student_details relation using the Dump operator as shown below.

grunt> Dump student_details;
 
(1,Rajiv,Reddy,21,9848022337,Hyderabad,89)
(2,siddarth,Battacharya,22,9848022338,Kolkata,78)
(3,Rajesh,Khanna,22,9848022339,Delhi,90)
(4,Preethi,Agarwal,21,9848022330,Pune,93)
(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar,75)
(6,Archana,Mishra,23,9848022335,Chennai,87)
(7,Komal,Nayak,24,9848022334,trivendram,83)
(8,Bharathi,Nambiayar,24,9848022333,Chennai,72)

Then verify the schema using the Describe operator as shown below.

grunt> Describe student_details;
  
student_details: {id: int, firstname: chararray, lastname: chararray, age: int,
   phone: chararray, city: chararray, gpa: int}

In the above schema, you can observe that the name of the student is represented by two chararray fields, namely firstname and lastname. Let us concatenate these two values using the CONCAT() function.

grunt> student_name_concat = foreach student_details Generate CONCAT (firstname, lastname);

Verification

Verify the relation student_name_concat using the DUMP operator as shown below.

grunt> Dump student_name_concat;

Output

It will produce the following output, displaying the contents of the relation student_name_concat.

(RajivReddy) 
(siddarthBattacharya) 
(RajeshKhanna) 
(PreethiAgarwal) 
(TrupthiMohanthy) 
(ArchanaMishra) 
(KomalNayak) 
(BharathiNambiayar) 
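Since CONCAT() expects expressions of the same type, a field that is not a chararray (such as age, which is an int in our schema) would need an explicit cast before it is concatenated with a chararray field. The following is a minimal sketch of that idea; the relation name student_name_age is hypothetical and used only for illustration.

```pig
-- Load the sample data with the same schema used in this chapter.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray, gpa:int);

-- age is an int, so cast it to chararray so that both arguments have the same type.
-- student_name_age is a hypothetical relation name.
student_name_age = FOREACH student_details GENERATE CONCAT(firstname, (chararray)age);

DUMP student_name_age;
```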

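CONCAT() has no separate delimiter argument. However, because it accepts more than two expressions, we can pass a string literal between the two fields to act as a separator, using an expression of the following form.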

grunt> CONCAT(firstname, '_',lastname);

Now, let us concatenate the first name and last name of the student records in the student_details relation by placing '_' between them as shown below.

grunt> student_name_concat = foreach student_details GENERATE CONCAT(firstname, '_',lastname); 

Verification

Verify the relation student_name_concat using the DUMP operator as shown below.

grunt> Dump student_name_concat;

Output

It will produce the following output, displaying the contents of the relation student_name_concat.

(Rajiv_Reddy) 
(siddarth_Battacharya) 
(Rajesh_Khanna) 
(Preethi_Agarwal) 
(Trupthi_Mohanthy) 
(Archana_Mishra) 
(Komal_Nayak) 
(Bharathi_Nambiayar)
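Two practical points are worth keeping in mind when using CONCAT(): the generated field has no name unless you assign one with AS, and if any of the expressions is null, the result of CONCAT() is null. The sketch below shows one possible way to handle both, by naming the result and replacing null values with an empty string. The relation names student_fullname and student_fullname_safe and the field name fullname are hypothetical and chosen only for this example.

```pig
-- Load the sample data with the same schema used in this chapter.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray, gpa:int);

-- Keep the id and give the concatenated value a field name (fullname is a hypothetical name).
student_fullname = FOREACH student_details GENERATE id, CONCAT(firstname, '_', lastname) AS fullname;

-- Replace a null firstname or lastname with an empty string so that
-- one missing value does not turn the whole result into null.
student_fullname_safe = FOREACH student_details GENERATE id,
   CONCAT((firstname IS NULL ? '' : firstname), '_', (lastname IS NULL ? '' : lastname)) AS fullname;

DUMP student_fullname_safe;
```

With the sample file above, none of the name fields is null, so both relations should contain the same values; the second form only matters for records with missing names.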