Apache Pig - SUBSTRING()
This function returns a substring from the given string.
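Given below is the syntax of the SUBSTRING() function. It accepts three parameters: the string to extract from (typically a chararray column), the start index, and the stop index. Indexes are zero-based. The substring begins at startIndex and ends just before stopIndex, so the character at stopIndex itself is not included.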
grunt> SUBSTRING(string, startIndex, stopIndex)
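The index rule can be sketched outside Pig. The snippet below is a rough Python model, not Pig code: `pig_substring` is a hypothetical helper name, and it assumes only that stopIndex is exclusive, as Pig's built-in function reference describes. It does not attempt to model Pig's handling of null or out-of-range arguments.

```python
def pig_substring(s, start_index, stop_index):
    """Hypothetical stand-in for SUBSTRING(string, startIndex, stopIndex).

    Assumption: the range is half-open, so the character at
    stop_index is excluded. A Python slice has the same shape.
    """
    return s[start_index:stop_index]


# A stop index of 2 yields two characters (indexes 0 and 1);
# a stop index of 3 yields three (indexes 0, 1 and 2).
print(pig_substring("Robin", 0, 2))
print(pig_substring("Robin", 0, 3))
```

Under this model, the number of characters returned is stopIndex minus startIndex whenever both indexes fall inside the string.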
Assume that there is a file named emp.txt in the HDFS directory /pig_data/ as shown below. This file contains employee details such as id, name, age, and city.
001,Robin,22,newyork
002,Stacy,25,Bhuwaneshwar
003,Kelly,22,Chennai
And, we have loaded this file into Pig with a relation named emp_data as shown below.
grunt> emp_data = LOAD 'hdfs://localhost:9000/pig_data/emp.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, city:chararray);
Following is an example of the SUBSTRING() function. This example fetches the first three characters (indexes 0 through 2) of each employee name. Because the stop index is exclusive, the call passes 3 as the stop index.
grunt> substring_data = FOREACH emp_data GENERATE (id,name), SUBSTRING(name, 0, 3);
The above statement fetches the required substrings from the names of the employees. The result of the statement will be stored in the relation named substring_data.
Verify the content of the relation substring_data, using the Dump operator as shown below.
grunt> Dump substring_data;
((1,Robin),Rob)
((2,Stacy),Sta)
((3,Kelly),Kel)
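To see where each part of the dumped tuples comes from, the walkthrough can be approximated in plain Python. This is only an illustrative sketch: `load_emp` is a hypothetical helper, the rows are copied from emp.txt, and the slice takes the first three characters of each name, matching the dumped output. It also shows why the id 001 appears as 1: the field is declared as int in the schema.

```python
# Rows copied from the sample emp.txt file.
EMP_LINES = [
    "001,Robin,22,newyork",
    "002,Stacy,25,Bhuwaneshwar",
    "003,Kelly,22,Chennai",
]


def load_emp(lines):
    """Hypothetical loader: split on commas and cast id and age to int,
    loosely mirroring the (id:int, name:chararray, age:int, city:chararray) schema."""
    records = []
    for line in lines:
        emp_id, name, age, city = line.split(",")
        records.append((int(emp_id), name, int(age), city))
    return records


# Rough equivalent of: GENERATE (id,name), <first three characters of name>
substring_data = [
    ((emp_id, name), name[0:3])
    for emp_id, name, _age, _city in load_emp(EMP_LINES)
]

for record in substring_data:
    print(record)
```

Each record pairs the (id, name) tuple with the extracted prefix, which is the same nesting that the Dump output shows.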