Apache Pig - SIZE()
The SIZE() function of Pig Latin computes the number of elements in a value of any Pig data type. What counts as an "element" depends on the type of the value.
Syntax
Given below is the syntax of the SIZE() function.
SIZE(expression)
The return value depends on the data type of the expression, as shown in the table below.
Data type | Return value |
---|---|
int, long, float, double | For all these types, SIZE() returns 1. |
chararray | For a chararray, SIZE() returns the number of characters. |
bytearray | For a bytearray, SIZE() returns the number of bytes. |
tuple | For a tuple, SIZE() returns the number of fields in the tuple. |
bag | For a bag, SIZE() returns the number of tuples in the bag. |
map | For a map, SIZE() returns the number of key/value pairs in the map. |
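The rules in the table can be summarized as a small Python model. This is an illustrative sketch, not Pig's implementation: the helper name `pig_size` is hypothetical, and it assumes Pig types map to Python types as follows: int/long to `int`, float/double to `float`, chararray to `str`, bytearray to `bytes`, tuple to `tuple`, bag to a `list` of tuples, and map to `dict`.

```python
def pig_size(value):
    """Hypothetical model of the SIZE() rules in the table above."""
    # int, long, float, double: SIZE() returns 1 (bool is excluded because
    # Python treats it as a subclass of int)
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return 1
    # chararray (str): number of characters; bytearray (bytes): number of bytes
    if isinstance(value, (str, bytes, bytearray)):
        return len(value)
    # tuple: number of fields; bag (modeled as a list of tuples): number of tuples
    if isinstance(value, (tuple, list)):
        return len(value)
    # map (modeled as a dict): number of key/value pairs
    if isinstance(value, dict):
        return len(value)
    raise TypeError(f"unsupported type in this model: {type(value).__name__}")


print(pig_size(250), pig_size("John"), pig_size([(1,), (2,), (3,)]))
```

Running the sketch prints `1 4 3`: an int counts as one element, the chararray "John" has four characters, and the modeled bag holds three tuples.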
Example
Assume that we have a file named employee.txt in the HDFS directory /pig_data/ as shown below.
employee.txt
1,John,2007-01-24,250
2,Ram,2007-05-27,220
3,Jack,2007-05-06,170
3,Jack,2007-04-06,100
4,Jill,2007-04-06,220
5,Zara,2007-06-06,300
5,Zara,2007-02-06,350
We load this file into Pig under the relation name employee_data as shown below.
grunt> employee_data = LOAD 'hdfs://localhost:9000/pig_data/employee.txt' USING PigStorage(',') as (id:int, name:chararray, workdate:chararray, daily_typing_pages:int);
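To make the effect of `PigStorage(',')` plus the `as (...)` schema concrete, here is a rough Python model of the LOAD step: each line is split on commas and the id and daily_typing_pages fields are cast to integers. The function name `load_employee_data` is hypothetical, the file contents are inlined rather than read from HDFS, and unlike Pig this sketch does no error handling for malformed lines.

```python
# Contents of employee.txt, inlined for the sketch
RAW = """1,John,2007-01-24,250
2,Ram,2007-05-27,220
3,Jack,2007-05-06,170
3,Jack,2007-04-06,100
4,Jill,2007-04-06,220
5,Zara,2007-06-06,300
5,Zara,2007-02-06,350"""


def load_employee_data(text):
    """Hypothetical model of LOAD ... USING PigStorage(',') with the schema
    (id:int, name:chararray, workdate:chararray, daily_typing_pages:int)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        id_, name, workdate, pages = line.split(",")  # PigStorage(',') delimiter
        rows.append((int(id_), name, workdate, int(pages)))  # apply schema casts
    return rows


employee_data = load_employee_data(RAW)
print(employee_data[0])
```

The first tuple printed is `(1, 'John', '2007-01-24', 250)`, matching the first line of the file with the int fields cast.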
Calculating the Size of a Column Value
To calculate the size of the value in a particular column, we can use the SIZE() function. Let’s calculate the size of the name column as shown below.
grunt> size = FOREACH employee_data GENERATE SIZE(name);
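The FOREACH ... GENERATE SIZE(name) statement produces one single-field tuple per input record. A minimal Python model of that step, assuming the rows have already been loaded as tuples of (id, name, workdate, daily_typing_pages); the function name `size_of_names` is hypothetical:

```python
# Rows of employee_data, already loaded and typed
EMPLOYEE_DATA = [
    (1, "John", "2007-01-24", 250),
    (2, "Ram", "2007-05-27", 220),
    (3, "Jack", "2007-05-06", 170),
    (3, "Jack", "2007-04-06", 100),
    (4, "Jill", "2007-04-06", 220),
    (5, "Zara", "2007-06-06", 300),
    (5, "Zara", "2007-02-06", 350),
]


def size_of_names(rows):
    """Hypothetical model of: FOREACH employee_data GENERATE SIZE(name);
    For a chararray, SIZE() is the number of characters, i.e. len()."""
    return [(len(name),) for (_id, name, _workdate, _pages) in rows]


size = size_of_names(EMPLOYEE_DATA)
print(size)
```

This prints `[(4,), (3,), (4,), (4,), (4,), (4,), (4,)]`, one tuple per employee record.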
Verification
Verify the relation size using the DUMP operator as shown below.
grunt> Dump size;
Output
This produces the following output, displaying the contents of the relation size. Because the name column is of type chararray, SIZE() returns the number of characters in each employee's name.
(4)
(3)
(4)
(4)
(4)
(4)
(4)