Apache Pig - CONCAT()
The CONCAT() function of Pig Latin is used to concatenate two or more expressions of the same type.
Syntax
CONCAT(expression, expression, [...expression])
Example
Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad,89
002,siddarth,Battacharya,22,9848022338,Kolkata,78
003,Rajesh,Khanna,22,9848022339,Delhi,90
004,Preethi,Agarwal,21,9848022330,Pune,93
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar,75
006,Archana,Mishra,23,9848022335,Chennai,87
007,Komal,Nayak,24,9848022334,trivendram,83
008,Bharathi,Nambiayar,24,9848022333,Chennai,72
And we have loaded this file into Pig with the relation name student_details as shown below.
grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray, gpa:int);
Concatenating Two Strings
We can use the CONCAT() function to concatenate two or more expressions. First of all, verify the contents of the student_details relation using the Dump operator as shown below.
grunt> Dump student_details;

(1,Rajiv,Reddy,21,9848022337,Hyderabad,89)
(2,siddarth,Battacharya,22,9848022338,Kolkata,78)
(3,Rajesh,Khanna,22,9848022339,Delhi,90)
(4,Preethi,Agarwal,21,9848022330,Pune,93)
(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar,75)
(6,Archana,Mishra,23,9848022335,Chennai,87)
(7,Komal,Nayak,24,9848022334,trivendram,83)
(8,Bharathi,Nambiayar,24,9848022333,Chennai,72)
Then, verify the schema using the Describe operator as shown below.
grunt> Describe student_details;

student_details: {id: int, firstname: chararray, lastname: chararray, age: int, phone: chararray, city: chararray, gpa: int}
In the above schema, you can observe that the student's name is stored in two chararray fields, namely firstname and lastname. Let us concatenate these two fields using the CONCAT() function.
grunt> student_name_concat = FOREACH student_details GENERATE CONCAT(firstname, lastname);
Verification
Verify the relation student_name_concat using the DUMP operator as shown below.
grunt> Dump student_name_concat;
Output
It will produce the following output, displaying the contents of the relation student_name_concat.
(RajivReddy)
(siddarthBattacharya)
(RajeshKhanna)
(PreethiAgarwal)
(TrupthiMohanthy)
(ArchanaMishra)
(KomalNayak)
(BharathiNambiayar)
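The output above contains only the concatenated value, in an unnamed field, with no space between the two names. The following is a minimal sketch, assuming the same student_details.txt file and HDFS path used earlier, that keeps the id field, passes a space as a middle argument, and names the result with AS. The relation name student_fullnames and the field alias fullname are hypothetical names chosen for illustration. The statements can be typed one by one at the grunt> prompt or saved in a script file.

```pig
-- Load the sample file (same path and schema as in the example above)
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, age:int,
        phone:chararray, city:chararray, gpa:int);

-- Keep the id and build a space-separated full name;
-- AS gives the concatenated field a name (fullname is a hypothetical alias)
student_fullnames = FOREACH student_details
    GENERATE id, CONCAT(firstname, ' ', lastname) AS fullname;

DESCRIBE student_fullnames;
DUMP student_fullnames;
```

Dumping student_fullnames should print one tuple per student, starting with (1,Rajiv Reddy) and (2,siddarth Battacharya). If your Pig version's CONCAT() accepts only two arguments, nesting the calls, as in CONCAT(firstname, CONCAT(' ', lastname)), should give the same result.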
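CONCAT() has no separate delimiter parameter, but since it accepts more than two expressions, we can place a separator between the two fields by passing a string literal as the middle argument, as shown below.

CONCAT(firstname, '_', lastname)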
Now, let us concatenate the first name and last name of each student record in the student_details relation, placing an underscore ('_') between them, as shown below.
grunt> student_name_concat = FOREACH student_details GENERATE CONCAT(firstname, '_', lastname);
Verification
Verify the relation student_name_concat using the DUMP operator as shown below.
grunt> Dump student_name_concat;
Output
It will produce the following output, displaying the contents of the relation student_name_concat.
(Rajiv_Reddy)
(siddarth_Battacharya)
(Rajesh_Khanna)
(Preethi_Agarwal)
(Trupthi_Mohanthy)
(Archana_Mishra)
(Komal_Nayak)
(Bharathi_Nambiayar)
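As noted at the start of this section, CONCAT() concatenates expressions of the same type. A field such as id, which is declared as int in the schema above, can therefore be cast explicitly to chararray before it is combined with the name fields. The following is a minimal sketch, assuming the same student_details.txt file and HDFS path; the relation name student_keys and the alias student_key are hypothetical.

```pig
-- Load the sample file (same path and schema as in the example above)
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, age:int,
        phone:chararray, city:chararray, gpa:int);

-- id is an int, so cast it to chararray; every CONCAT argument is then a chararray
-- (student_keys and student_key are hypothetical names)
student_keys = FOREACH student_details
    GENERATE CONCAT((chararray)id, '-', firstname) AS student_key;

DUMP student_keys;
```

Because id was loaded as an int, the leading zeros in the file are not kept (just as in the Dump output of student_details), so the first tuple should be (1-Rajiv) rather than (001-Rajiv). To keep the original text, declare id as chararray in the LOAD schema instead of casting it.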