Sqoop: --split-by question


#1

Hi all,
With the Sqoop script below, why is the data spread so unevenly across the 4 output files, with one file containing no data at all?
Any idea what the issue is, and why --split-by is not splitting the data evenly between these 4 files?
Thanks

######################################################
sqoop import \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --target-dir /user/cloudera/retail_db/orders_2013/orders \
  --query "select * from orders where \$CONDITIONS AND order_date like '2013-%'" \
  --split-by order_id
######################################################

740460 2018-01-06 15:30 /user/cloudera/retail_db/orders_2013/orders/part-m-00000
379604 2018-01-06 15:30 /user/cloudera/retail_db/orders_2013/orders/part-m-00001
0 2018-01-06 15:30 /user/cloudera/retail_db/orders_2013/orders/part-m-00002
209109 2018-01-06 15:30 /user/cloudera/retail_db/orders_2013/orders/part-m-00003

Thanks
Venkat


#2

FYI

I tried the same import for 2014, and there the data was split across the 4 files fairly evenly.

sqoop import \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --target-dir /user/cloudera/retail_db/orders_2014/orders \
  --query "select * from orders where \$CONDITIONS AND order_date like '2014-%'" \
  --split-by order_id

469901 2018-01-06 16:04 /user/cloudera/retail_db/orders_2014/orders/part-m-00000
469960 2018-01-06 16:04 /user/cloudera/retail_db/orders_2014/orders/part-m-00001
453321 2018-01-06 16:04 /user/cloudera/retail_db/orders_2014/orders/part-m-00002
277589 2018-01-06 16:05 /user/cloudera/retail_db/orders_2014/orders/part-m-00003

Thanks
Venkat
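
A possible explanation, as a rough sketch rather than a confirmed diagnosis: as far as I understand Sqoop, for an integer --split-by column it takes the MIN and MAX of that column over the query result and cuts that range into equal-width slices, one per mapper (4 by default). Each mapper then imports only the rows whose order_id falls in its slice, so the files come out even only when the matching order_id values are spread evenly over the range. An empty part-m-00002 would be consistent with no 2013 row having an order_id in the third slice. The script below only mimics that arithmetic; min_id and max_id are hypothetical values, not read from retail_db, and split_ranges is an illustrative helper, not a Sqoop command.

```shell
#!/bin/sh
# Sketch of equal-width range splitting on an integer --split-by column.
# min_id and max_id are HYPOTHETICAL; Sqoop would get the real values
# from a MIN()/MAX() boundary query over the import query.
min_id=1
max_id=1000
num_mappers=4   # Sqoop's default number of map tasks

# split_ranges MIN MAX N: print the order_id range each mapper would read.
split_ranges() {
  min=$1; max=$2; n=$3
  size=$(( (max - min) / n ))   # width of each slice (integer division)
  lo=$min
  i=0
  while [ "$i" -lt "$n" ]; do
    if [ "$i" -eq $(( n - 1 )) ]; then
      # Last slice runs up to and including the maximum id.
      hi=$max
      printf 'part-m-%05d: order_id >= %d AND order_id <= %d\n' "$i" "$lo" "$hi"
    else
      hi=$(( lo + size ))
      printf 'part-m-%05d: order_id >= %d AND order_id < %d\n' "$i" "$lo" "$hi"
    fi
    lo=$hi
    i=$(( i + 1 ))
  done
}

split_ranges "$min_id" "$max_id" "$num_mappers"
```

To check this against the real table, one could compare MIN(order_id) and MAX(order_id) for the 2013 rows with how many rows fall into each quarter of that range. Sqoop also has --boundary-query and --num-mappers options that affect how the range is cut, though whether they help here depends on how the ids are actually distributed.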