SparkSQL: creating external Hive tables from ORC and Parquet files



Please help me out with the problem below.

You have been given a MySQL DB with the following details, and the
following product.csv file:
1001,PEN,Pen Red,5000,1.23
1002,PEN,Pen Blue,8000,1.25
1003,PEN,Pen Black,2000,1.25
1004,PEC,Pencil 2B,10000,0.48
1005,PEC,Pencil 2H,8000,0.49
1006,PEC,Pencil HB,0,9999.99
Now accomplish the following activities:

  1. Create a Hive ORC table using SparkSQL.
  2. Load this data into the Hive table.
  3. Create a Hive Parquet table using SparkSQL and load data into it.

launch pyspark --master yarn --conf spark.ui.port=1288

from pyspark.sql import Row
# (the line reading the CSV was lost in this post -- replace the path below)
productsDF = sc.textFile("/path/to/product.csv").map(lambda p: Row(
    productID=int(p.split(",")[0]),
    productCode=p.split(",")[1],
    name=p.split(",")[2],
    quantity=int(p.split(",")[3]),
    price=float(p.split(",")[4]))).toDF()
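As a side note, the per-line parsing can be factored into a small pure-Python helper that splits each line once instead of five times (the field names are the ones from the snippet above; the helper name `parse_product` is my own):

```python
def parse_product(line):
    """Split one product.csv line into typed fields (one split, not five)."""
    pid, code, name, qty, price = line.split(",")
    return dict(productID=int(pid), productCode=code, name=name,
                quantity=int(qty), price=float(price))

print(parse_product("1001,PEN,Pen Red,5000,1.23"))
# -> {'productID': 1001, 'productCode': 'PEN', 'name': 'Pen Red', 'quantity': 5000, 'price': 1.23}
```

The map above then becomes `sc.textFile(...).map(lambda p: Row(**parse_product(p))).toDF()`, with the same behaviour as before.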

launch hive
CREATE EXTERNAL TABLE products_81_orc (
productid int,
code string,
name string,
quantity int,
price float) STORED AS orc
LOCATION '/user/mudassir_s2000/product_81_orc';

Output from Hive:
hive (mudassirsk_retail_db_txt)> select * from products_81_orc;
NULL 1.23 PEN 1001 5000.0
NULL 1.25 PEN 1002 8000.0
NULL 1.25 PEN 1003 2000.0
NULL 0.48 PEC 1004 10000.0
NULL 0.49 PEC 1005 8000.0
NULL 9999.99 PEC 1006 0.0
Time taken: 0.21 seconds, Fetched: 6 row(s)

==> ISSUE: the table output does not match the file structure, even though I created the table to mirror the file's columns.
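One plausible explanation (an assumption on my part, not confirmed by the post): pyspark's `Row(**kwargs)` sorts its keyword arguments alphabetically when building the schema (on Python 2 / older Spark), while Hive reads ORC columns by position. A minimal pure-Python check of that hypothesis:

```python
# Field names from the DataFrame snippet above, in CSV order.
csv_order = ["productID", "productCode", "name", "quantity", "price"]

# Row(**kwargs) builds its schema from the sorted keyword names, so the
# ORC files would be written in this order instead:
orc_order = sorted(csv_order)
print(orc_order)
# -> ['name', 'price', 'productCode', 'productID', 'quantity']
#
# Hive then fills (productid, code, name, quantity, price) positionally:
# productid <- name (a string, hence NULL), code <- price, name <-
# productCode, quantity <- productID, price <- quantity -- exactly the
# shifted rows shown in the output above.
```

If that is indeed the cause, selecting the columns explicitly before writing (e.g. `productsDF.select("productID", "productCode", "name", "quantity", "price")`) pins the on-disk order.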
launch hive
CREATE EXTERNAL TABLE product_parquet_table (
productid int,
code string,
name string,
quantity int,
price float) STORED AS parquet
LOCATION '/user/mudassir_s2000/product_parquet_table';

hive (mudassirsk_retail_db_txt)> select * from product_parquet_table;

==> ISSUE: no data is showing up in this external table.
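Two things seem worth checking here (both are assumptions on my part, since the post shows no write step for this location): first, whether any files were actually written under /user/mudassir_s2000/product_parquet_table — an empty SELECT on an external table usually just means an empty directory; second, whether the DataFrame field names match the Hive columns, because Hive resolves Parquet columns by name (case-insensitively, by default), not by position. A quick pure-Python check of the name mismatch:

```python
# DataFrame fields from the snippet above vs. the Hive table's columns.
df_fields = {"productID", "productCode", "name", "quantity", "price"}
hive_cols = {"productid", "code", "name", "quantity", "price"}

# Hive matches Parquet columns by case-insensitive name, so lower-case
# the DataFrame side before comparing.
unmatched = hive_cols - {f.lower() for f in df_fields}
print(unmatched)
# -> {'code'}  (the DataFrame calls it productCode, so Hive would read NULLs there)
```

So even once files are written (e.g. with `df.write.parquet(...)` on Spark 1.4+), the `code` column would come back NULL unless the DataFrame column is renamed first, e.g. with `withColumnRenamed("productCode", "code")`.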

Please help me out with these issues.