How do we clean incoming data in Apache Spark projects?

apache-spark

#1

I recently encountered an interview question: how do you clean incoming data before processing in your organization? I would like to know what the industry standards are.
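Not an official standard, but a common baseline answer covers four steps: deduplicate, drop or quarantine rows with missing required fields, normalize string columns, and validate/cast types. Below is a minimal sketch of those steps in plain Python (so it runs without a cluster); the comments note the real Spark DataFrame operation each step corresponds to. The records and column names (`id`, `name`, `age`) are hypothetical.

```python
# Hypothetical raw input; in Spark this would be a DataFrame read from a source.
records = [
    {"id": 1, "name": " alice ", "age": "34"},
    {"id": 1, "name": " alice ", "age": "34"},   # exact duplicate row
    {"id": 2, "name": None, "age": "27"},        # missing required field
    {"id": 3, "name": "carol", "age": "abc"},    # malformed numeric value
    {"id": 4, "name": "dave ", "age": "41"},
]

# 1. Deduplicate -- Spark: df.dropDuplicates()
seen, deduped = set(), []
for r in records:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 2. Drop rows missing required fields -- Spark: df.na.drop(subset=["name"])
complete = [r for r in deduped if r["name"] is not None]

# 3. Normalize strings -- Spark: trim(lower(col("name")))
for r in complete:
    r["name"] = r["name"].strip().lower()

# 4. Validate and cast types, filtering bad rows --
#    Spark: col("age").cast("int") then filter on isNotNull()
clean = []
for r in complete:
    try:
        r["age"] = int(r["age"])
        clean.append(r)
    except ValueError:
        pass  # in a real pipeline, route malformed rows to a quarantine sink

print(clean)
# keeps only the fully valid rows (ids 1 and 4 here)
```

In practice the bad rows are usually written to a separate "rejects" table rather than silently dropped, so data-quality issues upstream can be tracked.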
