Apache Pig - BagToString()

The Pig Latin BagToString() function concatenates the elements of a bag into a single string. Optionally, a delimiter can be placed between the values; if none is given, an underscore (_) is used.

Bags are generally unordered, so the order of the concatenated values is not guaranteed. The elements can be arranged beforehand by using the ORDER BY operator.


Given below is the syntax of the BagToString() function.

BagToString(vals:bag [, delimiter:chararray])


Assume that we have a file named dateofbirth.txt in the HDFS directory /pig_data/. This file contains dates of birth, one per line, as comma-separated day, month, and year values.



We have loaded this file into Pig under the relation name dob as shown below.

grunt> dob = LOAD 'hdfs://localhost:9000/pig_data/dateofbirth.txt' USING PigStorage(',')
   as (day:int, month:int, year:int);

Converting Bag to String

Using the BagToString() function, we can convert the data in a bag to a string. Let us group the dob relation. Grouping with GROUP ALL produces a single bag containing all the tuples of the relation.

Group the relation dob using the GROUP ALL operator, and store the result in the relation named group_dob as shown below.

grunt> group_dob = Group dob All;

It will produce a relation as shown below.

grunt> Dump group_dob; 

Here, we can observe a bag containing all the dates of birth as its tuples. Now, let us convert the bag to a string using the BagToString() function.

grunt> dob_string = foreach group_dob Generate BagToString(dob);
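The optional second argument from the syntax above sets the delimiter explicitly. The following is a minimal sketch, assuming the group_dob relation defined earlier; the relation name dob_csv is hypothetical and used only for illustration.

```pig
-- Same conversion as above, but joining the values with a comma
-- instead of the default underscore. dob_csv is a hypothetical name.
grunt> dob_csv = FOREACH group_dob GENERATE BagToString(dob, ',');
```

Any chararray can serve as the delimiter, so the choice depends on how the resulting string will be parsed later.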


Verify the relation dob_string using the DUMP operator as shown below.

grunt> Dump dob_string;


It will produce a single tuple holding one string, in which the values from the bag are joined by the default delimiter, an underscore (_).
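Because bags are unordered, the order of the values in the resulting string may vary between runs. One common way to fix the order is to sort the bag inside a nested FOREACH before calling BagToString(). This is a sketch under the assumption that the group_dob relation and the field names day, month, and year from the LOAD statement are in place; the names sorted and dob_sorted are hypothetical.

```pig
-- Sort the tuples inside the bag, then concatenate them.
-- sorted and dob_sorted are hypothetical names for illustration.
grunt> dob_sorted = FOREACH group_dob {
          sorted = ORDER dob BY year, month, day;
          GENERATE BagToString(sorted, '_');
       };
grunt> DUMP dob_sorted;
```

Sorting by year, then month, then day lists the dates chronologically; any other key order can be used in the ORDER clause.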
