The STRSPLIT() function splits a string around matches of a given regular expression and returns the resulting substrings as a tuple.
The syntax of STRSPLIT() is given below. The function accepts the string to be split, a regular expression that acts as the delimiter, and an integer limit. It scans the string and splits it wherever the regular expression matches. A positive limit caps the number of substrings: the expression is applied at most limit - 1 times, and the last substring holds the rest of the input. If the limit is zero or negative, the number of substrings is not capped; with zero, trailing empty strings are also removed.
grunt> STRSPLIT(string, regex, limit)
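The effect of the limit argument is easiest to see on a small input. Pig runs on the JVM, and the limit rules documented for STRSPLIT() match those of Java's String.split(regex, limit), so the following is a minimal Java sketch of that behavior, not Pig's own code. The class name SplitLimitDemo, the helper split(), and the sample string are assumptions made for illustration.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: delegates to String.split(regex, limit),
    // which is assumed here to follow the same limit rules as STRSPLIT().
    public static String[] split(String source, String regex, int limit) {
        return source.split(regex, limit);
    }

    public static void main(String[] args) {
        String s = "a_b_c__"; // sample input, not taken from emp.txt

        // limit = 2: the regex is applied at most once; the remainder stays whole.
        System.out.println(Arrays.toString(split(s, "_", 2)));   // [a, b_c__]

        // limit = 0: no cap on the pieces; trailing empty strings are dropped.
        System.out.println(Arrays.toString(split(s, "_", 0)));   // [a, b, c]

        // limit = -1: no cap on the pieces; trailing empty strings are kept.
        System.out.println(Arrays.toString(split(s, "_", -1)));  // [a, b, c, , ]
    }
}
```

With a limit of 2, a value containing several delimiters still produces only two pieces, which is why a limit of 2 suits a "first name, rest of name" split.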
Assume that there is a file named emp.txt in the HDFS directory /pig_data/ as shown below. This file contains the employee details such as id, name, age, and city.
001,Robin_Smith,22,newyork
002,BOB_Wilson,23,Kolkata
003,Maya_Reddy,23,Tokyo
004,Sara_Jain,25,London
005,David_Miller,23,Bhuwaneshwar
006,Maggy_Moore,22,Chennai
007,Robert_Scott,22,newyork
008,Syam_Ketavarapu,23,Kolkata
009,Mary_Carter,25,Tokyo
010,Saran_Naidu,25,London
011,Stacy_Green,25,Bhuwaneshwar
012,Kelly_Moore,22,Chennai
We have loaded this file into Pig as a relation named emp_data, as shown below.
grunt> emp_data = LOAD 'hdfs://localhost:9000/pig_data/emp.txt' USING PigStorage(',') as (id:int, name:chararray, age:int, city:chararray);
Following is an example of the STRSPLIT() function. If you look at the emp.txt file, you will see that, in the name column, the first name and surname of each employee are separated by the delimiter '_'.
In this example, we split the first name and surname of each employee using the STRSPLIT() function.
grunt> strsplit_data = FOREACH emp_data GENERATE (id,name), STRSPLIT (name,'_',2);
The result of the statement will be stored in the relation named strsplit_data. Verify the content of the relation strsplit_data, using the Dump operator as shown below.
grunt> Dump strsplit_data;

((1,Robin_Smith),(Robin,Smith))
((2,BOB_Wilson),(BOB,Wilson))
((3,Maya_Reddy),(Maya,Reddy))
((4,Sara_Jain),(Sara,Jain))
((5,David_Miller),(David,Miller))
((6,Maggy_Moore),(Maggy,Moore))
((7,Robert_Scott),(Robert,Scott))
((8,Syam_Ketavarapu),(Syam,Ketavarapu))
((9,Mary_Carter),(Mary,Carter))
((10,Saran_Naidu),(Saran,Naidu))
((11,Stacy_Green),(Stacy,Green))
((12,Kelly_Moore),(Kelly,Moore))
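The delimiter '_' used in this example has no special meaning in a regular expression, so it behaves like a plain character. Because the second argument is a regular expression, a delimiter such as '.' would need to be escaped or quoted. The following Java sketch illustrates the pitfall with String.split(), on the assumption that STRSPLIT() treats its regex argument the same way. The class name SplitRegexDemo, the helper splitLiteral(), and the value "Robin.Smith" are hypothetical and do not appear in emp.txt.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitRegexDemo {
    // Hypothetical helper: treats the delimiter as literal text by quoting it
    // before it is used as a regular expression.
    public static String[] splitLiteral(String source, String delimiter, int limit) {
        return source.split(Pattern.quote(delimiter), limit);
    }

    public static void main(String[] args) {
        String name = "Robin.Smith"; // hypothetical value, not in emp.txt

        // An unescaped "." matches every character, so only empty strings
        // remain, and with limit 0 they are all dropped.
        System.out.println(name.split(".", 0).length);                    // 0

        // Escaping the dot makes it match a literal '.' only.
        System.out.println(Arrays.toString(name.split("\\.", 2)));        // [Robin, Smith]

        // Quoting the delimiter gives the same result without manual escaping.
        System.out.println(Arrays.toString(splitLiteral(name, ".", 2)));  // [Robin, Smith]
    }
}
```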