Creating external tables from a parquet file

Hello Friends,
What is the exact syntax to create an external table from a parquet file?
This is what I have done, and I get NULL values.

My location '/user/abinashparida/parquet' contains all the parquet files and directories, as I did a sqoop import with as-parquetfile into this directory and am using the same directory as the LOCATION here.

hive> CREATE EXTERNAL TABLE orders(
> id int,order_date string,cust_id int,status string)
> STORED AS PARQUET
> LOCATION '/user/abinashparida/parquet'
> ;
OK
Time taken: 0.111 seconds
hive> show tables;
OK
orders
products
Time taken: 0.018 seconds, Fetched: 2 row(s)
hive> select * from orders limit 10;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
NULL NULL NULL NULL
Time taken: 0.251 seconds, Fetched: 10 row(s)

@vinodh_venugopalan: I think you also had this same issue and were able to resolve it. Can you please look into this?

thank you.

Hi,
Is the data present in '/user/abinashparida/parquet' in parquet format?
If not, you cannot directly create it as a parquet table.

Create a normal table and load the data into it.
Then create a parquet table and INSERT OVERWRITE into it from the normal table, something like the sketch below.
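A rough sketch, assuming the source data is comma-delimited text (the text directory path and the table names here are just placeholders, adjust them to your data):

    -- load the raw delimited text into a normal (text) table first
    CREATE TABLE orders_txt(
      id INT, order_date STRING, cust_id INT, status STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    LOAD DATA INPATH '/user/abinashparida/orders_text' INTO TABLE orders_txt;

    -- then copy it into a parquet table
    CREATE TABLE orders_pq(
      id INT, order_date STRING, cust_id INT, status STRING)
    STORED AS PARQUET;

    INSERT OVERWRITE TABLE orders_pq SELECT * FROM orders_txt;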

@itversity: Hello Durga sir, can you please look into this issue?

Hi,
This is the data in the folder:
hadoop fs -ls /user/abinashparida/parquet
Found 2 items
drwxr-xr-x - abinashparida hdfs 0 2017-01-12 07:09 /user/abinashparida/parquet/.metadata
-rw-r--r-- 3 abinashparida hdfs 488078 2017-01-12 07:10 /user/abinashparida/parquet/394d3db0-ea1d-4619-abc3-f507f82bc122.parquet

If the data is in parquet format, then the issue could be with delimiters.
Check the delimiter in the file and use "FIELDS TERMINATED BY" in the Hive CREATE TABLE query, along the lines of the sketch below.
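For example (a sketch only; note that FIELDS TERMINATED BY applies to text-format tables, so this would only help if the files turn out to be delimited text rather than real parquet):

    -- sketch: only meaningful if the files are delimited text, not parquet
    CREATE EXTERNAL TABLE orders_text(
      id INT, order_date STRING, cust_id INT, status STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/user/abinashparida/parquet';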

I was able to load the data successfully into a parquet-formatted external table.
hive> CREATE EXTERNAL TABLE orders_parquet(
> id int,order_date string,cust_id int,status string)
> STORED AS PARQUET
> LOCATION '/user/maryhelenarose/parquet';
OK
Time taken: 0.284 seconds

I can execute SELECT COUNT(*) FROM orders_parquet and get the correct number of rows back as
"Total MapReduce CPU Time Spent: 5 seconds 880 msec
OK
68883
Time taken: 21.295 seconds, Fetched: 1 row(s) "

But when I ask for the first few rows, it says:

hive> select * from orders_parquet limit 10;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable

Can someone please help me identify the error here?


Already tried that, but it did not work.

Column names in Hive should match the column names in the parquet file:

    CREATE EXTERNAL TABLE ORDERS (
    ORDER_ID INT, 
    ORDER_DATE BIGINT,
    order_customer_id INT,
    order_status STRING)
    STORED AS PARQUET
    LOCATION  '/user/abinashparida/parquet';
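If you are not sure what column names are inside the parquet file, you can print the schema embedded in the file itself (assuming parquet-tools is available on the cluster; the jar name below is a placeholder for whatever version is installed):

    # print the schema stored in the parquet file; the Hive DDL must use these names
    hadoop jar parquet-tools.jar schema \
      /user/abinashparida/parquet/394d3db0-ea1d-4619-abc3-f507f82bc122.parquet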

Yes, the query below is working:
CREATE EXTERNAL TABLE ORDERS (
ORDER_ID INT,
ORDER_DATE BIGINT,
order_customer_id INT,
order_status STRING)
STORED AS PARQUET
LOCATION '/user/abinashparida/parquet';

If you have done the sqoop import as a parquet file, then I guess you should use FIELDS TERMINATED BY ','.

Yes, I already tried using delimiters. It still gives the error.

If you have time, can you try it once: importing as a parquet file and then creating the external table on that directory, roughly like the sketch below?
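For reference, this is roughly the kind of import I mean (the connection string, database, and credentials below are placeholders; the relevant parts are --as-parquetfile and the target directory used as the Hive LOCATION):

    # sketch: sqoop import writing parquet files into the directory used as LOCATION
    sqoop import \
      --connect jdbc:mysql://<db-host>/retail_db \
      --username <user> --password <password> \
      --table orders \
      --target-dir /user/abinashparida/parquet \
      --as-parquetfile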

How did you proceed with this scenario?

@itversity: Hello sir, can you please let us know what exactly the issue here is?
Thanks in advance.

@Avinash_Parida and @Helena_Rose, what happens when you select column names instead of select *?

Please try and post here…

Same thing, since the data is getting stored as NULL values instead of the actual values.

@Helena_Rose: The count matches because it imports the data into the table, but as NULLs, so the number of records in the parquet file is equal to the number of NULLs.
The problem is why it is storing the data as NULL instead of the actual values.
select id from orders limit 10;
OK
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
Time taken: 0.198 seconds, Fetched: 10 row(s)
hive>

@itversity: Sir, please look into this.

Yes, it is showing NULL values when I select based on column names:

hive> select id from orders_parquet limit 5;
OK
NULL
NULL
NULL
NULL
NULL
Time taken: 0.205 seconds, Fetched: 5 row(s)
hive>

@Helena_Rose: I think this syntax does not work in Hortonworks, and the lab we are using is Hortonworks.
It should work in a Cloudera distribution.

I have created an external parquet table using an avro schema.

The script is below:

CREATE TABLE orders_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:////user/maryhelenarose/sqoop/orders-avro'
TBLPROPERTIES ('avro.schema.url'='hdfs://nn01.itversity.com:8020/user/maryhelenarose/orders.avsc');

CREATE EXTERNAL TABLE orders_parquet LIKE orders_avro
STORED AS PARQUET
LOCATION 'hdfs://nn01.itversity.com:8020/user/maryhelenarose/parquet';

hive> select * from orders_parquet limit 10;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
1 1374724800000 11599 CLOSED
2 1374724800000 256 PENDING_PAYMENT
3 1374724800000 12111 COMPLETE
4 1374724800000 8827 CLOSED
5 1374724800000 11318 COMPLETE
6 1374724800000 7130 COMPLETE
7 1374724800000 4530 COMPLETE
8 1374724800000 2911 PROCESSING
9 1374724800000 5657 PENDING_PAYMENT
10 1374724800000 5648 PENDING_PAYMENT
Time taken: 0.689 seconds, Fetched: 10 row(s)
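In case it helps, the orders.avsc referenced in TBLPROPERTIES above can be extracted from one of the avro data files (assuming the same data was also imported as avro and avro-tools is available; the part-file name and the jar name are placeholders):

    # pull one avro data file locally, extract its schema, and push the .avsc to HDFS
    hadoop fs -get /user/maryhelenarose/sqoop/orders-avro/part-m-00000.avro .
    java -jar avro-tools.jar getschema part-m-00000.avro > orders.avsc
    hadoop fs -put orders.avsc /user/maryhelenarose/orders.avsc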
