- Apache Pig Tutorial
- Apache Pig - Home
- Apache Pig Introduction
- Apache Pig - Overview
- Apache Pig - Architecture
- Apache Pig Environment
- Apache Pig - Installation
- Apache Pig - Execution
- Apache Pig - Grunt Shell
- Pig Latin
- Pig Latin - Basics
- Load & Store Operators
- Apache Pig - Reading Data
- Apache Pig - Storing Data
- Diagnostic Operators
- Apache Pig - Diagnostic Operator
- Apache Pig - Describe Operator
- Apache Pig - Explain Operator
- Apache Pig - Illustrate Operator
- Grouping & Joining
- Apache Pig - Group Operator
- Apache Pig - Cogroup Operator
- Apache Pig - Join Operator
- Apache Pig - Cross Operator
- Combining & Splitting
- Apache Pig - Union Operator
- Apache Pig - Split Operator
- Pig Latin Built-In Functions
- Apache Pig - Eval Functions
- Load & Store Functions
- Apache Pig - Bag & Tuple Functions
- Apache Pig - String Functions
- Apache Pig - date-time Functions
- Apache Pig - Math Functions
- Other Modes Of Execution
- Apache Pig - User-Defined Functions
- Apache Pig - Running Scripts
- Apache Pig Useful Resources
- Apache Pig - Quick Guide
- Apache Pig - Useful Resources
- Apache Pig - Discussion
Apache Pig - SUM()
You can use the SUM() function of Pig Latin to get the total of the numeric values of a column in a single-column bag. While computing the total, the SUM() function ignores the NULL values.
Note −
To get the global sum value, we need to perform a Group All operation, and calculate the sum value using the SUM() function.
To get the sum value of a group, we need to group it using the Group By operator and proceed with the sum function.
Syntax
Given below is the syntax of the SUM() function.
grunt> SUM(expression)
Example
Assume that we have a file named employee.txt in the HDFS directory /pig_data/ as shown below.
employee.txt
1,John,2007-01-24,250 2,Ram,2007-05-27,220 3,Jack,2007-05-06,170 3,Jack,2007-04-06,100 4,Jill,2007-04-06,220 5,Zara,2007-06-06,300 5,Zara,2007-02-06,350
And we have loaded this file into Pig with the relation name employee_data as shown below.
grunt> employee_data = LOAD 'hdfs://localhost:9000/pig_data/ employee.txt' USING PigStorage(',') as (id:int, name:chararray, workdate:chararray, daily_typing_pages:int);
Calculating the Sum of All GPA
To demonstrate the SUM() function, let’s try to calculate the total number of pages typed daily of all the employees. We can use the Apache Pig’s built-in function SUM() (case sensitive) to calculate the sum of the numerical values. Let us group the relation employee_data using the Group All operator, and store the result in the relation named employee_group as shown below.
grunt> employee_group = Group employee_data all;
It will produce a relation as shown below.
grunt> Dump employee_group; (all,{(5,Zara,2007-02-06,350), (5,Zara,2007-06-06,300), (4,Jill,2007-0406,220), (3,Jack,2007-04-06,100), (3,Jack,2007-05-06,170), (2,Ram,2007-0527,220), (1,John,2007-01-24,250)})
Let us now calculate the global sum of the pages typed daily.
grunt> student_workpages_sum = foreach employee_group Generate (employee_data.name,employee_data.daily_typing_pages),SUM(employee_data.daily_typing_pages);
Verification
Verify the relation student_workpages_sum using the DUMP operator as shown below.
grunt> Dump student_workpages_sum;
Output
It will produce the following output, displaying the contents of the relation student_workpages_sum as follows.
(({ (Zara), (Zara), (Jill) ,(Jack) , (Jack) , (Ram) , (John) }, { (350) , (300) , (220) ,(100) , (170) , (220) , (250) }),1610)