Apache Pig - INDEXOF()
The INDEXOF() function accepts a string, a character to search for, and a start index (integer). It returns the zero-based index of the first occurrence of the character in the string, searching forward from the start index.
Given below is the syntax of the INDEXOF() function.
INDEXOF(string, 'character', startIndex)
Assume that there is a file named emp.txt in the HDFS directory /pig_data/, as shown below. Each record in this file holds an employee's id, name, age, and city.
001,Robin,22,newyork
002,BOB,23,Kolkata
003,Maya,23,Tokyo
004,Sara,25,London
005,David,23,Bhuwaneshwar
006,Maggy,22,Chennai
007,Robert,22,newyork
008,Syam,23,Kolkata
009,Mary,25,Tokyo
010,Saran,25,London
011,Stacy,25,Bhuwaneshwar
012,Kelly,22,Chennai
We have loaded this file into Pig as a relation named emp_data, as shown below.
grunt> emp_data = LOAD 'hdfs://localhost:9000/pig_data/emp.txt' USING PigStorage(',') as (id:int, name:chararray, age:int, city:chararray);
Given below is an example of the INDEXOF() function. In this example, we find the index of the first occurrence of the lowercase letter 'r' in the name of every employee.
grunt> indexof_data = FOREACH emp_data GENERATE (id,name), INDEXOF(name, 'r',0);
The above statement scans the name of each employee, starting at index 0, and returns the index at which the letter 'r' occurs for the first time. If the name doesn't contain the letter 'r', it returns -1. The search is case-sensitive, so an uppercase 'R' is not counted as a match.
The result of the statement is stored in the relation named indexof_data. Verify the content of this relation using the Dump operator, as shown below.
grunt> Dump indexof_data;
((1,Robin),-1)
((2,BOB),-1)
((3,Maya),-1)
((4,Sara),2)
((5,David),-1)
((6,Maggy),-1)
((7,Robert),4)
((8,Syam),-1)
((9,Mary),2)
((10,Saran),2)
((11,Stacy),-1)
((12,Kelly),-1)
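To see why names such as Robin and Robert give these values, the search rule can be modeled outside Pig. The following Python sketch is only an illustration of the behavior described on this page, not Pig code: the helper name index_of is hypothetical, and the handling of a negative start index is an assumption rather than something this page states.

```python
# Hypothetical helper that models INDEXOF(string, 'character', startIndex).
# It is not part of Pig; it only mirrors the behavior described above.
def index_of(text, search, start_index=0):
    # Assumption: a negative start index is treated as 0.
    start = max(start_index, 0)
    # str.find returns -1 when the search string is absent, like INDEXOF.
    return text.find(search, start)


names = ["Robin", "BOB", "Maya", "Sara", "David", "Maggy",
         "Robert", "Syam", "Mary", "Saran", "Stacy", "Kelly"]

# Same search as the Pig statement: lowercase 'r', starting at index 0.
results = [(name, index_of(name, "r", 0)) for name in names]
for name, position in results:
    print(name, position)

# The match is case-sensitive: "Robin" contains only an uppercase 'R'.
print(index_of("Robin", "r"))          # -1
print(index_of("Robin".lower(), "r"))  # 0

# A non-zero start index skips earlier matches: the 'a' at index 1 is ignored.
print(index_of("Sara", "a", 2))        # 3
```

In this model, Robert yields 4 because the leading 'R' is uppercase and the first lowercase 'r' is the fifth character. Lowercasing the name before searching is one way to make the match case-insensitive.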