Sqoop --null-string and --non-null-string

#1

Hi guys,

In MYSQL I have a table departments_null_nonnull which has 9 rows and last is null for both department_id and department_name.

I ran the following command :

sqoop import
–connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db \
–username retail_dba
—-password hadoop
—-table departments_null_nonnull
—-hive-import
—-hive-home /apps/hive/warehouse
—-hive-table sqoop_import_hive.departments_null_nonnull
—-create-hive-table
-m -4
—-null-string nvl
—-null-non-string -1
—-split-by department_id

In hive prompt when I did select * from departments_null_nonnull I did not see nvl and -1. In sqoop import too it said fetching 8 records.

But in the above command when I changed -m from 4 to 1 and then executing select * from departments_null_nonnull in hive I was able to see the expected result both -1 and nvl.

I did not understand the importance of changing -m from 4 to 1.

Can anybody explain me.

Thanks in Advance :slight_smile:

0 Likes

#2

If there is null value in split by column, they will be ignored if number of mappers is greater than 1. Ideally you should use indexed column which does not have null values in the split-by clause.

1 Like

#3

Now I understood.

Thank you very much.

0 Likes