How do you clean incoming data in Apache Spark projects?



I recently encountered an interview question: how do you clean incoming data before processing in your organization? I would like to know what the industry standards are.
