Silly question: how to automatically remove commas from numbers during a Sqoop import


#1

My original raw CSV file contains a column of integers that are formatted like strings:

ID, Account, Amount
1,234,"56,789"
2,345,"6,789"

Writing numbers with a comma as the thousands separator is common in English. However, because this is a CSV file, an unquoted comma would split the value into two columns, so each such value is wrapped in a pair of double quotes.
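To make the quoting behaviour concrete, here is a minimal Python sketch (not Sqoop itself, just an illustration) that parses the sample rows above. The names SAMPLE and parse_rows are made up for this example, and skipinitialspace is only there because the header line has spaces after its commas.

```python
import csv
import io

# Sample data copied from the post; SAMPLE is a hypothetical name.
SAMPLE = 'ID, Account, Amount\n1,234,"56,789"\n2,345,"6,789"\n'


def parse_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    # skipinitialspace drops the blank after each comma in the header line.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return list(reader)


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

The quoted Amount stays in one field, but it is still a string with the comma inside it.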

After a normal Sqoop import, the Amount column is treated as a String, with values such as “56,789”.

This makes the later cast from string to integer fail: you can cast the string “56789” to the integer 56789, but you can’t cast the string “56,789” to an integer.
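The cast problem can be reproduced outside Sqoop. Below is a rough Python sketch of it, along with the strip-the-comma-then-cast idea I am hoping to get automatically; parse_amount is a hypothetical helper, and this assumes the comma is always a thousands separator and never a decimal mark.

```python
def parse_amount(text):
    """Remove thousands separators, then convert to int.

    Assumes every comma is a thousands separator (hypothetical helper).
    """
    return int(text.replace(",", ""))


# A direct cast of the comma-formatted string is rejected.
try:
    int("56,789")
    direct_cast_ok = True
except ValueError:
    direct_cast_ok = False

print(direct_cast_ok)
print(parse_amount("56,789"))
print(parse_amount("6,789"))
```

What I am looking for is a way to get the equivalent of parse_amount applied during or right after the import.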

Can anyone share their thoughts?

Any clue is greatly appreciated.

Thank you very much.