HIVE AVRO Partitioning

I was trying this but i don’t understand how this worked in the Video :

hive (retail_stage_avro)>
                        > CREATE TABLE orders_part_avro(
                        > order_id int,
                        > order_date bigint,
                        > order_customer_id int,
                        > order_status string
                        > )
                        > PARTITIONED BY (order_month string)
                        > STORED AS AVRO
                        > LOCATION 'hdfs:///user/hive/warehouse/retail_stage_avro.db/orders_part_avro'
                        > TBLPROPERTIES ('avro.schema.url'='file:///home/cloudera/retail_stage_avro_schema/orders_part_avro.avsc');
Time taken: 0.154 seconds
hive (retail_stage_avro)>
                        > alter table orders_part_avro ADD PARTITION (orders_month='2014-01');
FAILED: SemanticException Partition spec {orders_month=2014-01} contains non-partition columns

Solution given in stackoverflow was with the EXTERNAL table But that is also not working (

Can anyone explain what is the behavior

In your table creation you are using avsc file from home dir. Copy that file to hdfs using put command and then use that hdfs location for the tblproperties in avro.schema.url.

what should we put in TBLPROPERTIES (‘avro.schema.url’=‘file:///home/cloudera/retail_stage_avro_schema/orders_part_avro.avsc’) file??

Can you be specific, which things go in to file and how we should put it

Is it mandatory to keep the schema file in hdfs …???

if we dont know the schema , we can use avro.schema.url to specify the schema of the loaded data.

So that is what i am trying to do … But that Error is something else And also keeping the schema(.avsc) files in Local FS is not an issue Bcoz i created normal (NON-Partitioned) tables with the same script above without the partition & it worked fine.

@jack_AKA_karthik - Follow this link to get the schema details from avro file and push it back to HDFS.

1 Like

I think the issue is table is PARTITIONED BY (order_month string)

where as alter statetement contains table orders_part_avro ADD PARTITION (orders_month=‘2014-01’), It shoul be order_month='2014-01

@vishwanath - How would that make a difference…

I agree with Vishwanath. The column names do not match in PARTITIONED BY and ADD PARTITION. It must be order_month and not orders_month

1 Like

I do not have much idea on, How Hive works internally. But according to my understanding…We defined a table,which says the sub folders(partitions) will have name order_month = some string, But, when we try to create a partition we are providing something which is not that.So, it is giving "semantic error."We get semantic erros in such cases where we are give wrong column names or something like that.It is not syntax error.

thanks @gnanaprakasam , which service we have to install in my cluster to do this avro schema extraction.
And does the same work with parquet also?

@jack_AKA_karthik - See if this helps

For parquet, you can sqoop import & Load/Insert to have data in Hive tables.

1 Like

PARTITIONED BY (order_month string)

(orders_month=‘2014-01’) . i think this is a typo