Unable to connect to lab from IntelliJ

apache-spark

#1

Dear ItVersity Lab Team,

I am trying to access our lab from IntelliJ to run a Spark application locally, but it throws an error.

Below are the steps I followed in IntelliJ.

Step 1: Copy the hdfs-site.xml, core-site.xml, mapred-site.xml, yarn-site.xml, and hive-site.xml files to my src/main/resources path.
Step 2: Copy the Spark jars from local to an HDFS path and pass that path as spark.yarn.jars in the code.

Code

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object HortonConn {
  def main(args: Array[String]): Unit = {
    // Silence Spark's verbose INFO logging
    Logger.getLogger("org").setLevel(Level.ERROR)

    val spark = SparkSession.builder()
      .appName("Hortonworks Lab Cluster connection")
      .master("yarn")
      .config("spark.yarn.jars", "hdfs://nn01.itversity.com:8020/user/anuvenkatesheee/sparkJars/jars/*.jar")
      .getOrCreate()

    println("connection successful")
  }
}

Error

Using Spark’s default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/06 00:12:11 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Unable to load YARN support
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:390)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:385)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:385)
at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:410)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2330)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:107)

Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
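
From the stack trace, the immediate cause is the ClassNotFoundException for org.apache.spark.deploy.yarn.YarnSparkHadoopUtil, so I suspect the spark-yarn module is missing from my IDE classpath. If that is the cause, adding it as a dependency in build.sbt should clear this particular error; a sketch of what I mean (the Spark version here is only my guess at what the cluster runs):

// build.sbt sketch -- the version is a guess; it should match the cluster's Spark
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % "2.3.0",
  // spark-yarn contains org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
  "org.apache.spark" %% "spark-yarn" % "2.3.0"
)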

Please advise me on how to resolve this issue.
Please also confirm whether I am allowed to copy these XML files to my local path. If I did anything wrong, my apologies.

Thanks
Regards
Venkatesh


#2

These types of issues are not supported. It is not correct to connect your IDE to the lab directly like this.

You might be able to access it if your PC also runs CentOS and can act as a client to the cluster. However, it is not recommended.

Here is the application life cycle:

  • Develop using the IDE after setting up some test data sets (see the sketch after this list)
  • Unit test to make sure code is working
  • Build jar file using sbt or maven
  • Ship the jar file to the cluster
  • Run on the cluster using spark-submit
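
A minimal sketch of that develop-locally pattern, assuming a hypothetical DailyRevenue object name and test data; the idea is to fall back to a local master in the IDE and let spark-submit supply the real master on the cluster:

// Submit on the cluster's gateway node with something like:
//   spark-submit --class DailyRevenue --master yarn daily-revenue.jar
import org.apache.spark.sql.SparkSession

object DailyRevenue {
  def main(args: Array[String]): Unit = {
    val builder = SparkSession.builder().appName("Daily Revenue")
    // spark-submit sets the spark.master property; in the IDE it is absent,
    // so default to a local master for development and unit testing.
    val spark =
      if (sys.props.contains("spark.master")) builder.getOrCreate()
      else builder.master("local[*]").getOrCreate()

    // Small test data set instead of cluster HDFS paths
    val df = spark.range(1, 11).toDF("id")
    println(s"test row count = ${df.count()}")

    spark.stop()
  }
}

This way the same jar runs unchanged in the IDE and on the cluster, and nothing in the code points at the lab directly.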

#3

Thanks for the reply.
I was a little curious and tried to execute my code directly from my local machine. Nothing personal, just curiosity.
Sorry for what I did.


#4

No need to be sorry, and thank you for understanding.