Spark Code Failure: org.apache.spark.shuffle.FetchFailedException: Too large frame


#1

The task is simple: extract data from a database (two different tables) and perform a series of left outer joins. That is all.
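
Roughly, the job looks like this (a minimal sketch only; the JDBC URL, table names, and join key are hypothetical placeholders, not the actual code):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ExtractAndJoin").getOrCreate()

// Hypothetical JDBC reads of the two source tables.
val dim = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db") // placeholder
  .option("dbtable", "schema.dim_table")           // placeholder
  .load()

val fact = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db") // placeholder
  .option("dbtable", "schema.fact_table")          // the 300 GB table
  .load()

// Series of left outer joins on a shared key.
val joined = fact.join(dim, Seq("id"), "left_outer")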

The issue is that one of the tables is 300 GB, with 3 billion rows.

After performing these joins, I get the following error. I am not sure what is actually happening.

org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 389 (run at TrustedRunnableCommand.scala:29) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: Too large frame: 2733174092

Please help.

Thanks


#2

Can you try adjusting this parameter? The "Too large frame" error means a single shuffle block exceeded Netty's 2 GB frame limit (the reported frame of 2733174092 bytes is above Integer.MAX_VALUE, 2147483647). Increasing the number of shuffle partitions spreads the same data over more, smaller blocks:

sqlContext.setConf("spark.sql.shuffle.partitions", "1000")
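
If raising the partition count alone does not clear it, a related knob helps with oversized shuffle blocks. A minimal sketch, assuming Spark 2.2+ and a SparkSession named spark (the values shown are starting points to tune, not known-good settings for this job):

// More partitions => smaller shuffle blocks; for a ~300 GB shuffle,
// aim well under 2 GB per partition (e.g. 2000 partitions ~ 150 MB each).
spark.conf.set("spark.sql.shuffle.partitions", "2000")

// Spark 2.2+: fetch remote blocks larger than this to disk instead of
// memory, which avoids the Netty frame-size ceiling altogether.
spark.conf.set("spark.maxRemoteBlockSizeFetchToMem", "512m")

If one block is still over 2 GB even with many partitions, the join key is probably skewed, and the hot keys may need separate handling (e.g. salting the key before the join).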