Apache Pig - BagToString()



The Pig Latin BagToString() function concatenates the elements of a bag into a single string. You can optionally pass a delimiter to place between the values. If none is given, an underscore (_) is used.

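Bags are unordered, so the order of the values in the resulting string is not guaranteed. To control it, sort the bag with an ORDER BY operator nested inside a FOREACH before passing it to BagToString().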
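As a sketch, assuming the dateofbirth.txt file and schema used in the example later in this chapter, the script below sorts the bag with a nested ORDER BY before concatenating it. The relation names sorted_dob and dob_sorted_string are illustrative, not part of the original example.

```pig
-- Load the sample dates of birth (same file and schema as the example in this chapter)
dob = LOAD 'hdfs://localhost:9000/pig_data/dateofbirth.txt' USING PigStorage(',')
   AS (day:int, month:int, year:int);

-- Collect every tuple into a single bag
group_dob = GROUP dob ALL;

-- Inside the nested FOREACH, ORDER sorts the bag by year, then month, then day;
-- BagToString() then joins the sorted values with the default '_' delimiter
dob_sorted_string = FOREACH group_dob {
   sorted_dob = ORDER dob BY year, month, day;
   GENERATE BagToString(sorted_dob);
};

DUMP dob_sorted_string;
```

Because the bag is sorted first, this should print (2_6_1980_26_9_1989_23_11_1989_22_3_1990_1_3_1998) regardless of how the tuples happened to be stored.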

Syntax

Given below is the syntax of the BagToString() function.

BagToString(vals:bag [, delimiter:chararray])
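For illustration, the two call forms look like this inside a FOREACH. The input file data.txt and the relation names words, grouped, joined_default and joined_comma are hypothetical placeholders chosen only for this sketch.

```pig
-- Hypothetical input: one word per line in data.txt
words = LOAD 'data.txt' AS (word:chararray);

-- GROUP ALL puts every tuple into one bag, whose field is named after the relation (words)
grouped = GROUP words ALL;

-- Without a delimiter argument, the elements are joined with '_'
joined_default = FOREACH grouped GENERATE BagToString(words);

-- With a delimiter argument, the elements are joined with ', '
joined_comma = FOREACH grouped GENERATE BagToString(words, ', ');
```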

Example

Assume that we have a file named dateofbirth.txt in the HDFS directory /pig_data/, as shown below. Each line holds a date of birth in day,month,year format.

dateofbirth.txt

22,3,1990
23,11,1989
1,3,1998
2,6,1980
26,9,1989

We have loaded this file into Pig under the relation name dob, as shown below.

grunt> dob = LOAD 'hdfs://localhost:9000/pig_data/dateofbirth.txt' USING PigStorage(',')
   as (day:int, month:int, year:int);

Converting Bag to String

Using the BagToString() function, we can convert the data in a bag to a string. First we need a bag, so let us group the dob relation: grouping with GROUP ALL collects all the tuples of the relation into a single bag.

Group the relation dob using the GROUP ALL operator, and store the result in the relation named group_dob as shown below.

grunt> group_dob = Group dob All;

Dumping group_dob produces the following relation.

grunt> Dump group_dob; 
 
(all,{(26,9,1989),(2,6,1980),(1,3,1998),(23,11,1989),(22,3,1990)})
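GROUP ALL names the bag field after the grouped relation, which is why the bag is referred to as dob in the next step. One way to confirm this is the DESCRIBE operator. The schema shown in the comment is approximate and may be formatted slightly differently depending on the Pig version.

```pig
-- Print the schema of the grouped relation
DESCRIBE group_dob;
-- Expected to show something like:
-- group_dob: {group: chararray,dob: {(day: int,month: int,year: int)}}
```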

Here, the relation holds a single bag that contains every date of birth as a tuple. Now, let’s convert this bag to a string using the BagToString() function.

grunt> dob_string = foreach group_dob Generate BagToString(dob);
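To concatenate only one field of each tuple, project that field out of the bag first. A minimal sketch, assuming the group_dob relation above; the relation name years_string is hypothetical.

```pig
-- dob.year projects the year field out of every tuple in the bag,
-- so only the years are concatenated, separated by ';'
years_string = FOREACH group_dob GENERATE BagToString(dob.year, ';');
DUMP years_string;
```

With the bag order shown in the dump above, this would print (1989;1980;1998;1989;1990), but as with any unsorted bag the order is not guaranteed.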

Verification

Verify the relation dob_string using the DUMP operator as shown below.

grunt> Dump dob_string;

Output

It will produce the following output, displaying the contents of the relation dob_string.

(26_9_1989_2_6_1980_1_3_1998_23_11_1989_22_3_1990)
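The output above uses the default underscore delimiter, and the fields inside each tuple are joined with the same delimiter as the tuples themselves. Passing a second argument changes it. A sketch using '-' (the relation name dob_dashed is illustrative):

```pig
-- Same grouped relation as above, but join the values with '-' instead of '_'
dob_dashed = FOREACH group_dob GENERATE BagToString(dob, '-');
DUMP dob_dashed;
```

Assuming the same bag order, this would print (26-9-1989-2-6-1980-1-3-1998-23-11-1989-22-3-1990).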