Exercise - Partitioning using NYSE Data

Let us work through this task as an exercise to understand partitioning.

  • Create Stage Table nyse_eod_stage and load from source…
  • Duration: 20 Minutes
  • Use data from /data/nyse_all/nyse_data
  • Use database YOUR_OS_USER_NAME_nyse
  • Create partitioned table nyse_eod_part
  • Field Names: stockticker, tradedate, openprice, highprice, lowprice, closeprice, volume
  • Determine correct data types based on the values
  • Create it as a managed table with “,” as the delimiter.
  • Partition Field should be tradeyear and of type INT (one partition for corresponding year)
  • Insert data into partitioned table using dynamic partition mode.
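
The steps above can be sketched in HiveQL roughly as follows. This is one possible outline under stated assumptions, not a reference solution. The data types are borrowed from the Databricks variant of this exercise. It assumes tradedate holds values in yyyyMMdd form, so the first four digits give the year. It also assumes /data/nyse_all/nyse_data is on the local file system of the node where the query runs.

```sql
-- Sketch only: replace YOUR_OS_USER_NAME with your own OS user name.
CREATE DATABASE IF NOT EXISTS YOUR_OS_USER_NAME_nyse;
USE YOUR_OS_USER_NAME_nyse;

-- Stage table. Column types are an assumption; verify them against the data.
CREATE TABLE nyse_eod_stage (
  stockticker STRING,
  tradedate   INT,
  openprice   FLOAT,
  highprice   FLOAT,
  lowprice    FLOAT,
  closeprice  FLOAT,
  volume      BIGINT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Assumes a local path; drop the LOCAL keyword if the files are in HDFS.
LOAD DATA LOCAL INPATH '/data/nyse_all/nyse_data' INTO TABLE nyse_eod_stage;

-- Managed, partitioned table. The partition column is declared separately
-- from the regular columns.
CREATE TABLE nyse_eod_part (
  stockticker STRING,
  tradedate   INT,
  openprice   FLOAT,
  highprice   FLOAT,
  lowprice    FLOAT,
  closeprice  FLOAT,
  volume      BIGINT
) PARTITIONED BY (tradeyear INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Let Hive create partitions from the data instead of naming each one.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column must be the last column in the SELECT.
-- Assumes tradedate looks like 20170103, so characters 1-4 are the year.
INSERT INTO TABLE nyse_eod_part PARTITION (tradeyear)
SELECT stockticker, tradedate, openprice, highprice, lowprice, closeprice, volume,
       CAST(substr(CAST(tradedate AS STRING), 1, 4) AS INT) AS tradeyear
FROM nyse_eod_stage;
```

Running SHOW PARTITIONS nyse_eod_part; afterwards should list one tradeyear=... entry per year present in the data.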

Exercise for Databricks Platform

  • Duration: 30 Minutes
  • Task: Develop the script to partition NYSE Data.
  • Create database YOUR_OS_USER_NAME_nyse
  • Create stage table nyse_eod_stage
    • Type: External
    • Delimiter: ,
    • Field Names: stockticker STRING, tradedate INT, openprice FLOAT, highprice FLOAT, lowprice FLOAT, closeprice FLOAT, volume BIGINT
    • Location: s3://itversity-databricks-data/nyse_all/nyse_data/
  • Create partitioned table nyse_eod_part
    • Field Names: stockticker, tradedate, openprice, highprice, lowprice, closeprice, volume
    • Create Managed table with “,” as delimiter.
    • Partition Field should be tradeyear and of type INT (one partition for corresponding year)
  • Insert data into partitioned table using dynamic partition mode.
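
A rough Spark SQL outline of the Databricks variant is shown here as a sketch under assumptions, not a definitive script. It assumes the workspace supports Hive-format tables (ROW FORMAT DELIMITED) and can read the S3 location. It also assumes tradedate is stored in yyyyMMdd form.

```sql
-- Sketch only: replace YOUR_OS_USER_NAME with your own user name.
CREATE DATABASE IF NOT EXISTS YOUR_OS_USER_NAME_nyse;
USE YOUR_OS_USER_NAME_nyse;

-- External stage table: the files stay at LOCATION, and dropping the table
-- does not delete them.
CREATE EXTERNAL TABLE nyse_eod_stage (
  stockticker STRING,
  tradedate   INT,
  openprice   FLOAT,
  highprice   FLOAT,
  lowprice    FLOAT,
  closeprice  FLOAT,
  volume      BIGINT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://itversity-databricks-data/nyse_all/nyse_data/';

-- Managed table partitioned by year.
CREATE TABLE nyse_eod_part (
  stockticker STRING,
  tradedate   INT,
  openprice   FLOAT,
  highprice   FLOAT,
  lowprice    FLOAT,
  closeprice  FLOAT,
  volume      BIGINT
) PARTITIONED BY (tradeyear INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Needed for Hive-format tables when every partition value is dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Derive tradeyear from the first four digits of tradedate (assumed yyyyMMdd).
INSERT INTO TABLE nyse_eod_part PARTITION (tradeyear)
SELECT stockticker, tradedate, openprice, highprice, lowprice, closeprice, volume,
       CAST(substr(CAST(tradedate AS STRING), 1, 4) AS INT) AS tradeyear
FROM nyse_eod_stage;

-- Sanity checks: partition list, and row counts that should match the stage table.
SHOW PARTITIONS nyse_eod_part;
SELECT tradeyear, count(1) AS row_count
FROM nyse_eod_part
GROUP BY tradeyear
ORDER BY tradeyear;
```

Comparing the sum of row_count with SELECT count(1) FROM nyse_eod_stage is a quick way to confirm that no rows were lost during the insert.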
