I want to compare two DataFrames in PySpark



I have to validate the data in a CSV file against a Hive table in PySpark. After loading the data from both sources into PySpark, how should I validate each column of the DataFrames? Please help me with this.




First, create one DataFrame from the CSV file and another from the Hive table. Then use the join API to join the two DataFrames on a common key column, and compare the corresponding columns in the joined result as your requirement dictates.