Bucketing with Sorting

#1

We can also ensure that data is sorted with in the bucket in the bucketed table.

  • Number of files in the bucketed tables will be multiples of number of buckets.
  • Using hash mod algorithm on top of bucket key, data will land into appropriate file.
  • However data is not sorted with in bucket.
  • We can sort the data with in the bucket by using SORTED BY while creating bucketed tables
  • Let us get create table statement of orders_buck and recreate with SORTED BY clause.

Here is the example of creating bucketed table using sorting and then inserting data into it.

USE training_retail;
CREATE TABLE orders_buck (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) 
CLUSTERED BY (order_id)
SORTED BY (order_id)
INTO 8 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

INSERT INTO orders_buck
SELECT * FROM orders;
  • Use dfs -tail command to confirm data is sorted as expected.

Practice hive on state of the art Big Data cluster - https://labs.itversity.com
You can sign up for our courses on Udemy using $10 coupons - Udemy Coupons - Big Data Courses


0 Likes