Hive - Partitioning
Hive organizes tables into partitions. It is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partition, it is easy to query a portion of the data.
Tables or partitions are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying. Bucketing works based on the value of hash function of some column of a table.
For example, a table named Tab1 contains employee data such as id, name, dept, and yoj (i.e., year of joining). Suppose you need to retrieve the details of all employees who joined in 2012. A query searches the whole table for the required information. However, if you partition the employee data with the year and store it in a separate file, it reduces the query processing time. The following example shows how to partition a file and its data:
The following file contains employeedata table.
id, name, dept, yoj 1, gopal, TP, 2012 2, kiran, HR, 2012 3, kaleel,SC, 2013 4, Prasanth, SC, 2013
The above data is partitioned into two files using year.
1, gopal, TP, 2012 2, kiran, HR, 2012
3, kaleel,SC, 2013 4, Prasanth, SC, 2013
Adding a Partition
We can add partitions to a table by altering the table. Let us assume we have a table called employee with fields such as Id, Name, Salary, Designation, Dept, and yoj.
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...; partition_spec: : (p_column = p_col_value, p_column = p_col_value, ...)
The following query is used to add a partition to the employee table.
hive> ALTER TABLE employee > ADD PARTITION (year=’2012’) > location '/2012/part2012';
Renaming a Partition
The syntax of this command is as follows.
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
The following query is used to rename a partition:
hive> ALTER TABLE employee PARTITION (year=’1203’) > RENAME TO PARTITION (Yoj=’1203’);
Dropping a Partition
The following syntax is used to drop a partition:
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...;
The following query is used to drop a partition:
hive> ALTER TABLE employee DROP [IF EXISTS] > PARTITION (year=’1203’);