The concept is simple: extract data from a database (two different tables) and perform a series of left outer joins. That is all.
The issue is that one of the tables is 300 GB, about 3 billion rows.
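For reference, a minimal sketch of the kind of job I'm running is below. The JDBC connection details, table names, and join key are placeholders, not my real code:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("join-job")
  .getOrCreate()

// Read the two tables over JDBC (URL, credentials, and table names are placeholders).
val bigTable = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")
  .option("dbtable", "big_table")      // ~300 GB, ~3 billion rows
  .option("user", "user")
  .option("password", "password")
  .load()

val otherTable = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")
  .option("dbtable", "other_table")
  .option("user", "user")
  .option("password", "password")
  .load()

// Left outer join on a shared key column ("id" is a placeholder).
val joined = bigTable.join(otherTable, Seq("id"), "left_outer")

joined.write.parquet("/output/path")
```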
After performing these joins, I get the error below, and I'm not sure what is actually happening.
org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 389 (run at TrustedRunnableCommand.scala:29) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: Too large frame: 2733174092