I have to validate the data from a CSV file and a Hive table in PySpark. After loading the data from both sources into DataFrames, how should I validate each column? Please help me with this.
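One possible starting point, sketched here under the assumption that "validate" means checking that the two sources agree: first compare the schemas, then join on a key and count mismatches per column. The helper names `schema_diff` and `column_mismatch_counts`, the path, the table name, and the key column `id` are all hypothetical. The second helper assumes Spark 2.3 or later (for `eqNullSafe`), that both DataFrames use the same column names, and that the key uniquely identifies a row.

```python
def schema_diff(dtypes_a, dtypes_b):
    """Compare two `DataFrame.dtypes` lists of (column_name, type_string) pairs.

    Names are lower-cased before comparing, since Hive column names are
    case-insensitive while CSV headers may use mixed case.
    """
    a = {name.lower(): dtype for name, dtype in dtypes_a}
    b = {name.lower(): dtype for name, dtype in dtypes_b}
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "type_mismatch": sorted(c for c in set(a) & set(b) if a[c] != b[c]),
    }


def column_mismatch_counts(df_a, df_b, key_cols):
    """Join on key_cols and count, per shared non-key column, rows whose values differ."""
    # Imported lazily so schema_diff can be used without a Spark installation.
    from pyspark.sql import functions as F

    keys = set(key_cols)
    shared = [c for c in df_a.columns if c in df_b.columns and c not in keys]
    if not shared:
        return {}
    joined = df_a.alias("a").join(df_b.alias("b"), on=list(key_cols), how="inner")
    # eqNullSafe treats NULL vs NULL as equal, unlike the plain == operator.
    exprs = [
        F.sum((~F.col("a." + c).eqNullSafe(F.col("b." + c))).cast("int")).alias(c)
        for c in shared
    ]
    return joined.agg(*exprs).first().asDict()


# Hypothetical usage (path, table name, and key column are placeholders):
#   csv_df = spark.read.csv("/path/to/file.csv", header=True, inferSchema=True)
#   hive_df = spark.table("my_db.my_table")
#   print(schema_diff(csv_df.dtypes, hive_df.dtypes))
#   print(column_mismatch_counts(csv_df, hive_df, ["id"]))
```

Note that an inner join only compares rows present in both sources, so rows missing from one side need a separate check (for example comparing `count()` results or using an anti join). Without `inferSchema=True` every CSV column is read as a string, so the schema comparison would report type mismatches against typed Hive columns.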