Data validation during Sqoop import



Can we do any data validation while importing data using Sqoop?

Scenario: Let's say I am importing 1 million records from Oracle to a file in HDFS using Sqoop, and during the import 5 records get corrupted. How can we identify those corrupted records in HDFS?

Thank you


I see this as a common issue with Sqoop. During a Sqoop import we can drop newlines and a couple of other characters (for example with --hive-drop-import-delims), but we can't really clean the field delimiter and some other bad characters. I have researched this extensively and did not find any direct option for it in sqoop import. To address these kinds of issues we have to use a custom SQL query (a free-form --query import) rather than SELECT * FROM table. Most of the time, we have to identify which column contains the bad data and clean it in the custom SQL that the Sqoop import runs.
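A minimal sketch of that approach in Python, assuming an Oracle source, a comma field delimiter and hypothetical table/column names (EMPLOYEES, ID, NAME, NOTES). build_query (a hypothetical helper) wraps only the columns known to hold bad data in nested Oracle REPLACE calls, and find_malformed_rows (also hypothetical) is one possible post-import check: it flags lines of the imported text file whose field count differs from the column count, which is what an unescaped delimiter or newline inside a value usually produces.

```python
from typing import Iterable, List, Tuple

# Assumption: comma-delimited text output. These are the characters that
# break such a file: the delimiter itself, line feed and carriage return.
BAD_CHARS = {",": "CHR(44)", "\n": "CHR(10)", "\r": "CHR(13)"}


def clean_expr(column: str, replacement: str = " ") -> str:
    """Wrap an Oracle column in nested REPLACE calls that swap each bad
    character for `replacement`, keeping the original column name as alias."""
    expr = column
    for chr_call in BAD_CHARS.values():
        expr = f"REPLACE({expr}, {chr_call}, '{replacement}')"
    return f"{expr} AS {column}"


def build_query(table: str, columns: List[str], dirty: Iterable[str]) -> str:
    """Build a free-form query for `sqoop import --query`. Only columns in
    `dirty` are cleaned. Sqoop requires the $CONDITIONS token in the WHERE
    clause so it can split the query across mappers."""
    dirty_set = set(dirty)
    select_list = ", ".join(
        clean_expr(c) if c in dirty_set else c for c in columns
    )
    return f"SELECT {select_list} FROM {table} WHERE $CONDITIONS"


def find_malformed_rows(lines: Iterable[str], expected_fields: int,
                        delimiter: str = ",") -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line whose field count differs
    from `expected_fields`."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")
        if len(record.split(delimiter)) != expected_fields:
            bad.append((lineno, record))
    return bad


if __name__ == "__main__":
    # Hypothetical table: only NOTES is known to contain bad characters.
    print(build_query("EMPLOYEES", ["ID", "NAME", "NOTES"], ["NOTES"]))

    # Simulated import output: line 2 has an embedded comma, and the value
    # on line 3 contained a newline, so it spilled onto line 4.
    sample = ["1,Alice,ok\n", "2,Bob,has, comma\n", "3,Carol,split\n", "line\n"]
    print(find_malformed_rows(sample, expected_fields=3))
    # prints [(2, '2,Bob,has, comma'), (4, 'line')]
```

The generated string would be passed to sqoop import with --query, together with --target-dir (required for free-form queries) and --split-by unless the import runs with a single mapper. Note that the field-count check only catches corruption that changes the number of fields; a value that is wrong but still a single field would need a column-level check instead. The lines to check could come from something like hdfs dfs -cat on the part files in the target directory.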