Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Overview of Dynamic Partition in Hive
Apache Hive is a data warehousing system built on Hadoop for analytics and MapReduce jobs. Partitioning divides large datasets into smaller parts for faster queries. Dynamic partitioning automatically determines partition values from the data being inserted, unlike static partitioning where values are manually specified.
Static vs Dynamic Partitioning
| Feature | Static Partitioning | Dynamic Partitioning |
|---|---|---|
| Partition values | Manually specified per insert | Automatically derived from data |
| Best for | Few known partitions | Many or unknown partitions |
| WHERE clause | Required | Not required |
| Flexibility | Low | High |
Enabling Dynamic Partitioning
SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict;
Complete Example
Create a source table, load data, then dynamically partition it ?
-- Step 1: Create source table
CREATE TABLE sales_raw (
id INT,
product STRING,
amount DOUBLE,
sale_date STRING,
country STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- Step 2: Load data
LOAD DATA LOCAL INPATH '/home/data/sales.csv' INTO TABLE sales_raw;
-- Step 3: Create partitioned table
CREATE TABLE sales_partitioned (
id INT,
product STRING,
amount DOUBLE,
sale_date STRING
)
PARTITIONED BY (country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- Step 4: Insert with dynamic partitioning
INSERT INTO TABLE sales_partitioned PARTITION (country)
SELECT id, product, amount, sale_date, country
FROM sales_raw;
Hive automatically creates partition directories based on distinct country values in the source data.
Managing Partitions
-- View partitions SHOW PARTITIONS sales_partitioned; -- Query specific partition (partition pruning) SELECT * FROM sales_partitioned WHERE country = 'India'; -- Drop a partition ALTER TABLE sales_partitioned DROP PARTITION (country = 'India');
Conclusion
Dynamic partitioning in Hive automatically creates partitions from data values during INSERT, eliminating the need to manually specify each partition. It is ideal for large datasets with many distinct partition values, improving query performance through partition pruning while reducing manual effort.
