Getting warning messages while storing a text file as a Parquet table


#1

Hi,

I am trying to save a text file as a Parquet table, using the code below:

res.write.format("parquet").saveAsTable("cca.orders")

When I query the table in Hive, I am getting the warnings below, followed by the result.
Can you please let me know if I am missing anything while saving?

WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:583)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:513)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:120)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:83)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:71)
at org.apache.h…
1 2013-07-25 00:00:00.0 11599 CLOSED
2 2013-07-25 00:00:00.0 256 PENDING_PAYMENT
3 2013-07-25 00:00:00.0 12111 COMPLETE
4 2013-07-25 00:00:00.0 8827 CLOSED
5 2013-07-25 00:00:00.0 11318 COMPLETE
6 2013-07-25 00:00:00.0 7130 COMPLETE
7 2013-07-25 00:00:00.0 4530 COMPLETE
8 2013-07-25 00:00:00.0 2911 PROCESSING
9 2013-07-25 00:00:00.0 5657 PENDING_PAYMENT
10 2013-07-25 00:00:00.0 5648 PENDING_PAYMENT
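
For context, the warning comes from the Parquet reader on the Hive side: it cannot parse the created_by string "parquet-mr version 1.6.0", and statistics written by parquet-mr 1.6.0 can be unreliable (see PARQUET-251), so the reader ignores them and logs the message. The rows themselves are still read correctly, as the result above shows. A minimal sanity check from spark-shell (a sketch, assuming the same Hive-enabled sqlContext that wrote the table) would be:

// Read the saved table back and confirm the data survived the round trip;
// the CorruptStatistics warning does not affect the returned rows.
val check = sqlContext.table("cca.orders")
check.printSchema()   // expect order_id, order_date, order_customer_id, order_status
check.show(10)        // should match the ten rows returned by Hive above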


#2

@harikalyan Can you share the complete code once for better understanding?


#3

Hi,

Please find the code below:

val orders = sc.textFile("/public/retail_db/orders")
val ordersmap = orders.map(x=>x.split(",")).map(x=>(x(0).toInt,x(1),x(2).toInt,x(3)))
val ordersdf = ordersmap.toDF("order_id","order_date","order_customer_id","order_status")
sqlContext.setConf("spark.sql.parquet.compression.codec","snappy")
ordersdf.write.format("parquet").saveAsTable("cca.orders")


#4

@harikalyan Issue resolved. Try now and let us know.


#5

Hi annapurna,

I am still getting the same warnings.


#6

@harikalyan

The orders table is not saved as a Parquet file.
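
One way to verify how the table was actually stored (a sketch, assuming a Hive-enabled sqlContext, i.e. a HiveContext, in spark-shell) is to look at its storage details:

// Print the table metadata; the InputFormat/OutputFormat and SerDe lines
// show whether the table is Parquet-backed.
sqlContext.sql("DESCRIBE FORMATTED cca.orders").collect().foreach(println)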


#7

When I query the table in Hive, I am getting the above warnings followed by the result set.
I wanted to check if I missed anything while saving the file as a Parquet table.


#8

I executed the same code which you tried and it's working fine.

val orders = sc.textFile("/public/retail_db/orders")
val ordersmap = orders.map(x=>x.split(",")).map(x=>(x(0).toInt,x(1),x(2).toInt,x(3)))
val ordersdf =ordersmap.toDF("order_id","order_date","order_customer_id","order_status")
sqlContext.setConf("spark.sql.parquet.compression.codec","snappy")
ordersdf.write.format("parquet").saveAsTable("cca.orders_parquet_snappy")

Output in Hive: