Apache Pig - RANDOM()



The RANDOM() function returns a pseudorandom number of type double that is greater than or equal to 0.0 and less than 1.0.

Syntax

grunt> RANDOM()
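
Because the result always lies in the range from 0.0 (inclusive) to 1.0 (exclusive), it can be scaled to another range. The following is a minimal sketch rather than part of the original example. It assumes the relation math_data with a field named data, as loaded in the example on this page. The aliases scaled_data and rand_int are made up for illustration.

```pig
-- Sketch only: scaled_data and rand_int are hypothetical names.
-- RANDOM() * 10 lies in [0.0, 10.0); the (int) cast truncates it to 0..9,
-- and adding 1 shifts the result to the range 1..10.
scaled_data = FOREACH math_data GENERATE data, (int)(RANDOM() * 10) + 1 AS rand_int;
DUMP scaled_data;
```

Each record should receive an integer between 1 and 10. The values change on every run.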

Example

Assume that there is a file named math.txt in the HDFS directory /pig_data/. This file contains integer and floating-point values, one per line, as shown below.

math.txt

5 
16 
9 
2.5 
5.9 
3.1 

We have loaded this file into Pig as a relation named math_data, as shown below.

grunt> math_data = LOAD 'hdfs://localhost:9000/pig_data/math.txt' USING PigStorage(',')
   as (data:float);

Let us now generate a random value for each record of math_data using the RANDOM() function as shown below. RANDOM() takes no arguments, so the generated values do not depend on the contents of the data field.

grunt> random_data = foreach math_data generate (data), RANDOM();

The above statement stores the result in the relation named random_data. Verify the contents of the relation using the Dump operator as shown below. Because the values are pseudorandom, the numbers you see will differ from the sample output.

grunt> Dump random_data;
  
(5.0,0.6842057767279982) 
(16.0,0.9725172591786139) 
(9.0,0.4159326414649489) 
(2.5,0.30962777780713147) 
(5.9,0.705213727551145) 
(3.1,0.24247708413861724)
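
A common use of RANDOM() is to draw a random subset of a relation. The sketch below is an illustration under assumptions, not part of the original example. It reuses the math_data relation loaded above, and the alias sampled_data and the threshold 0.5 are arbitrary choices.

```pig
-- Sketch only: sampled_data is a hypothetical alias.
-- RANDOM() is evaluated once per record, so each record is kept
-- independently with a probability of about 0.5.
sampled_data = FILTER math_data BY RANDOM() < 0.5;
DUMP sampled_data;
```

The number of records kept varies from run to run, and with only six input records it can be anywhere from zero to six. Pig Latin also provides a SAMPLE operator (for example, SAMPLE math_data 0.5) for the same kind of probabilistic sampling.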