I want to compare two DataFrames in PySpark

pyspark

#1

I have to validate the data from a CSV file against a Hive table in PySpark. After loading the data from both sources into PySpark, how should I validate each column of the DataFrame? Please help me with this.




#2

@anurag_guleria

First, create one DataFrame from the CSV file and another from the Hive table. Then use the join API to combine the two DataFrames into a single variable according to your requirement, so that the columns can be compared side by side.