What are the different data types in Apache Pig?


Apache Hadoop provides distributed storage through its file system (HDFS), but to process that data we need an SQL-like language that can transform data or perform complex conversions according to our requirements. Apache Pig provides this kind of data manipulation. Pig offers a high-level scripting language, Pig Latin, that resembles SQL and runs on Hadoop. Pig works with structured and unstructured data, and its scripts are translated into a series of MapReduce jobs that run on the Hadoop cluster.

We must know about Pig data types before we can understand the operators in Pig. Any data loaded into Pig has a specific structure and schema, and Pig's data types describe that structure to form a data model.

To interpret the structure, data is mapped to a schema that defines the data model. Pig can handle many kinds of data because its SQL-like model works well with both flat, single-value structures and nested, hierarchical data structures. It comes with a finite set of data types, which can be classified into two categories −

  • Primitive

  • Complex

Primitive Data Types

These are also called simple data types. The primitive data types are as follows −

  • int − A signed 32-bit integer, similar to Integer in Java.

  • long − A signed 64-bit integer, similar to Long in Java.

  • float − A 32-bit floating-point number, similar to Float in Java.

  • double − A 64-bit floating-point number, similar to Double in Java.

  • chararray − A character array (string) in Unicode UTF-8 format, corresponding to String in Java.

  • bytearray − An array of bytes (a blob). When the type of a field is not specified, Pig treats it as bytearray by default.

  • boolean − A value that is either true or false.
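As a rough illustration of how these types are declared, the following Pig Latin sketch loads a file with an explicit schema. The file name students.txt, the relation names, and the field names are hypothetical, and the boolean field assumes Pig 0.10 or later.

```pig
-- Hypothetical tab-delimited input file; all field names are illustrative only.
students = LOAD 'students.txt' USING PigStorage('\t')
    AS (id:int, phone:long, gpa:float, balance:double,
        name:chararray, photo:bytearray, active:boolean);

-- Print the schema Pig has associated with the relation.
DESCRIBE students;

-- With no AS clause, fields carry no declared type and are treated as bytearray.
raw = LOAD 'students.txt' USING PigStorage('\t');

-- Cast a positional field to a primitive type when one is needed.
ids = FOREACH raw GENERATE (int)$0 AS id;
```

Declaring the schema at load time keeps later expressions free of casts, while the untyped load shows the bytearray default described above.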

Complex Data Types

Complex data types are built from other data types and can hold structured, nested data. The complex data types are as follows −

Data Type | Definition | Code | Example
Tuple | An ordered set of fields, enclosed in parentheses. | (field[, field...]) | (1,2)
Bag | A collection of tuples, enclosed in curly braces. | {tuple[, tuple...]} | {(1,2), (3,4)}
Map | A set of key-value pairs, enclosed in square brackets. | [key#value] | ['keyname'#'valuename']

  • Key − The element used to look up a value. Each key must be unique and must be of type chararray.

  • Value − The data associated with a key; it can be of any type. A map is written with square brackets, with # separating each key from its value and commas separating multiple key-value pairs.

  • Null Values − A null means that a value is missing or unknown, and data of any type can be null. Pig handles nulls much as SQL does. Nulls appear when data is missing or when an error occurs during processing. Null can also be used as a constant expression in place of an expression of any type.
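The complex types and null handling described above can be sketched in Pig Latin as follows. This is a minimal example under stated assumptions: the file records.txt, the relation names, and the field names are all hypothetical.

```pig
-- Hypothetical input; each line holds a tuple, a bag, and a map separated by tabs,
-- for example: (1,2)  {(1,2),(3,4)}  [keyname#valuename]
records = LOAD 'records.txt'
    AS (pair:tuple(a:int, b:int),
        points:bag{pt:tuple(x:int, y:int)},
        props:map[]);

-- Project a tuple field with '.' and look up a map value with '#'.
-- A key that is not present in the map yields null.
fields = FOREACH records GENERATE pair.a AS a, props#'keyname' AS v;

-- Keep only the rows where the lookup found a value, much like SQL's IS NOT NULL.
found = FILTER fields BY v IS NOT NULL;
```

Filtering with IS NOT NULL before further processing is one way to keep missing map keys from propagating nulls into later steps.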

Note − Pig allows complex data types to be nested. For example, you can place a tuple inside a tuple, a bag, or a map.
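A nested schema might look like the following sketch. The file nested.txt and every relation and field name here are assumptions made for illustration.

```pig
-- Hypothetical input file and field names, for illustration only.
-- 'person' is a tuple that contains another tuple; 'items' is a bag
-- whose tuples each contain a tuple.
nested = LOAD 'nested.txt'
    AS (person:tuple(id:int, loc:tuple(x:int, y:int)),
        items:bag{item:tuple(label:chararray, pos:tuple(x:int, y:int))});

-- Reach into the inner tuple by chaining the '.' operator.
xs = FOREACH nested GENERATE person.loc.x;

-- FLATTEN un-nests the bag, producing one output row per tuple in it.
flat = FOREACH nested GENERATE person.id, FLATTEN(items);
```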

Conclusion

Apache Pig is part of the Hadoop ecosystem. It offers an SQL-like language, and its data types resemble those used in SQL and are represented by java.lang classes. Because it can cope with complex data, Pig is used for tasks that involve processing both structured and unstructured data. Yahoo reportedly runs about 40% of its jobs, including search, on Pig, which extracts the data, performs operations on it, and writes the results to HDFS.

Updated on: 25-Aug-2022
