Problem 20 from Simulator - Sqoop error

sqoop

#1

problem 20

Instructions
Connect to the MySQL database on the itversity labs using sqoop and import data with employer_name, case_status and count. Make sure data is sorted by employer_name in ascending order and by count in descending order

Data Description
A MySQL instance is running on a remote node ms.itversity.com in the instance. You will find a table that contains 3002373 rows of h1b data

MySQL database information:

Installation on the node ms.itversity.com
Database name is h1b_db
Username: h1b_user
Password: itversity
Table name h1b_data
Output Requirements

Place the h1b related data in files in HDFS directory
/user/jkmasurkar/problem20/solution/
Replace whoami with your OS user name
Use text file format and tab (\t) as delimiter
Hint: You can use Spark with JDBC or Sqoop import with query
You might not get such hints in actual exam
Output should contain employer name, case status and count
End of Problem

My solution was

_sqoop import _
_–connect jdbc:mysql://ms.itversity.com:3306/h1b_db _
_–username h1b_user _
_–password itversity _
_–target-dir “/user/jkmasurkar/problem20/solution22/” _
_–query “select employer_name,case_status, count(*) as count from h1b_db.h1b_data where $CONDITIONS group by employer_name,case_status order by employer_name, count desc” _
_–split-by employer_name _
–fields-terminated-by “\t”

I am getting error shown in attached file

Please help


#2

Looks like split by accepts only numeric fields


#3

Finally I used below query and it worked but I had to take ID col which was not in problem statement.

sqoop import
–connect jdbc:mysql://ms.itversity.com:3306/h1b_db
–username h1b_user
–password itversity
–target-dir “/user/jkmasurkar/problem20/solution22/”
–query “select ID,employer_name, case_status, count(*) as cnt from h1b_db.h1b_data where $CONDITIONS group by ID,employer_name,case_status order by employer_name, cnt desc”
–split-by ID
–fields-terminated-by “\t”


#4

Finally got the solution

sqoop import
-Dorg.apache.sqoop.splitter.allow_text_splitter=true
–connect jdbc:mysql://ms.itversity.com:3306/h1b_db
–username h1b_user
–password itversity
–target-dir “/user/jkmasurkar/problem20/solution23/”
–query “select employer_name, case_status, count(*) as cnt from h1b_db.h1b_data where $CONDITIONS group by employer_name,case_status order by employer_name, cnt desc”
–split-by case_status
–fields-terminated-by “\t”


#5