How do we clean incoming data in Apache Spark projects?



I recently encountered an interview question: how do you clean incoming data before processing in your organization? I would like to know what the industry standards are.

