Kafka and Streaming Issue


#1

Ref: CCA175 (Udemy)

Link: https://www.udemy.com/cca-175-spark-and-hadoop-developer-certification-scala/learn/v4/t/lecture/8776126?start=0 (Video #147)

I completed all the steps, and I found two ERROR messages in the log file. The command I ran:
spark-submit
--class KafkaStreamingDepartmentCount
--master yarn
--conf spark.ui.port=12345
--jars "/usr/hdp/2.5.0.0-1245/kafka/libs/kafka_2.10-0.8.2.1.jar,
/usr/hdp/2.5.0.0-1245/kafka/libs/spark-streaming-kafka_2.10-1.6.2.jar,
/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar"
retail_2.10-1.0.jar yarn-client
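For comparison: if the command above was typed with literal line breaks inside the quoted --jars value (rather than shell backslash continuations), the newline and leading spaces become part of the jar paths. Written as one logical command with explicit continuations, the --jars list stays a single comma-separated argument. This is a sketch assuming the same local paths:

```
spark-submit \
  --class KafkaStreamingDepartmentCount \
  --master yarn \
  --conf spark.ui.port=12345 \
  --jars "/usr/hdp/2.5.0.0-1245/kafka/libs/kafka_2.10-0.8.2.1.jar,/usr/hdp/2.5.0.0-1245/kafka/libs/spark-streaming-kafka_2.10-1.6.2.jar,/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar" \
  retail_2.10-1.0.jar yarn-client
```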

ERROR:
[vanampudi@gw01 ~]$ spark-submit --class KafkaStreamingDepartmentCount --master yarn --conf spark.ui.port=12345 --jars "/usr/hdp/2.5.0.0-1245/kafka/libs/kafka_2.10-0.8.2.1.jar,
/usr/hdp/2.5.0.0-1245/kafka/libs/spark-streaming-kafka_2.10-1.6.2.jar,
/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar" retail_2.10-1.0.jar yarn-client

Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default

Warning: Local jar /home/vanampudi/
/usr/hdp/2.5.0.0-1245/kafka/libs/spark-streaming-kafka_2.10-1.6.2.jar does not exist, skipping.
Warning: Local jar /home/vanampudi/
/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar does not exist, skipping.

17/12/17 11:48:31 ERROR SparkContext: Jar not found at file:/home/vanampudi/%0A%20%20%20%20%20%20%20%20%20%20/usr/hdp/2.5.0.0-1245/kafka/libs/spark-streaming-kafka_2.10-1.6.2.jar
17/12/17 11:48:31 ERROR SparkContext: Jar not found at file:/home/vanampudi/%0A%20%20/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar

But the jar files are present under:
/usr/hdp/2.5.0.0-1245/kafka/libs:
-rw-r--r-- 1 root root 284923 Mar 20 2017 spark-streaming-kafka_2.10-1.6.2.jar
-rw-r--r-- 1 root root 3991269 Mar 20 2017 kafka_2.10-0.8.2.1.jar

What I did:
I copied these two jar files to my home directory and ran spark-shell again, but no luck. Maybe this is not the correct approach.
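A note on the failing paths: the %0A and %20 sequences in the "Jar not found" URLs are URL-encoded newline and space characters, which suggests the --jars value contained literal line breaks, so Spark resolved each wrapped fragment relative to the home directory. A small Python sketch (just decoding one of the URLs copied from the log) illustrates this:

```python
from urllib.parse import unquote

# One of the failing jar URLs copied from the log output.
bad = ("file:/home/vanampudi/%0A%20%20"
       "/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar")

decoded = unquote(bad)
print(repr(decoded))
# %0A decodes to a literal newline and each %20 to a space, so Spark
# received "/home/vanampudi/" + "\n  /usr/hdp/..." as a single path.
```
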

************ THE COMPLETE LOG FILE: (copied here as I cannot attach) *******************

[vanampudi@gw01 ~]$ spark-submit --class KafkaStreamingDepartmentCount --master yarn --conf spark.ui.port=12345 --jars "/usr/hdp/2.5.0.0-1245/kafka/libs/kafka_2.10-0.8.2.1.jar,
/usr/hdp/2.5.0.0-1245/kafka/libs/spark-streaming-kafka_2.10-1.6.2.jar,
/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar" retail_2.10-1.0.jar yarn-client
Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default
Warning: Local jar /home/vanampudi/
/usr/hdp/2.5.0.0-1245/kafka/libs/spark-streaming-kafka_2.10-1.6.2.jar does not exist, skipping.
Warning: Local jar /home/vanampudi/
/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar does not exist, skipping.
17/12/17 12:02:31 INFO SparkContext: Running Spark version 1.6.2
17/12/17 12:02:31 INFO SecurityManager: Changing view acls to: vanampudi
17/12/17 12:02:31 INFO SecurityManager: Changing modify acls to: vanampudi
17/12/17 12:02:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vanampudi); users with modify permissions: Set(vanampudi)
17/12/17 12:02:34 INFO Utils: Successfully started service 'sparkDriver' on port 50466.
17/12/17 12:02:35 INFO Slf4jLogger: Slf4jLogger started
17/12/17 12:02:35 INFO Remoting: Starting remoting
17/12/17 12:02:35 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.16.1.100:46118]
17/12/17 12:02:35 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 46118.
17/12/17 12:02:35 INFO SparkEnv: Registering MapOutputTracker
17/12/17 12:02:35 INFO SparkEnv: Registering BlockManagerMaster
17/12/17 12:02:35 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-4b164305-3281-4c67-9423-e523dea089e8
17/12/17 12:02:35 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
17/12/17 12:02:35 INFO SparkEnv: Registering OutputCommitCoordinator
17/12/17 12:02:36 INFO Server: jetty-8.y.z-SNAPSHOT
17/12/17 12:02:36 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:12345
17/12/17 12:02:36 INFO Utils: Successfully started service 'SparkUI' on port 12345.
17/12/17 12:02:36 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.16.1.100:12345
17/12/17 12:02:36 INFO HttpFileServer: HTTP File server directory is /tmp/spark-1d073fd7-f082-47f8-a2b6-2ed6ab1934c8/httpd-7bad44f3-7b47-41b1-8084-aeaf1e745f99
17/12/17 12:02:36 INFO HttpServer: Starting HTTP Server
17/12/17 12:02:36 INFO Server: jetty-8.y.z-SNAPSHOT
17/12/17 12:02:36 INFO AbstractConnector: Started SocketConnector@0.0.0.0:55017
17/12/17 12:02:36 INFO Utils: Successfully started service 'HTTP file server' on port 55017.
17/12/17 12:02:36 INFO SparkContext: Added JAR file:/usr/hdp/2.5.0.0-1245/kafka/libs/kafka_2.10-0.8.2.1.jar at http://172.16.1.100:55017/jars/kafka_2.10-0.8.2.1.jar with timestamp 1513530156512
17/12/17 12:02:36 ERROR SparkContext: Jar not found at file:/home/vanampudi/%0A%20%20%20%20%20%20%20%20%20%20/usr/hdp/2.5.0.0-1245/kafka/libs/spark-streaming-kafka_2.10-1.6.2.jar
17/12/17 12:02:36 ERROR SparkContext: Jar not found at file:/home/vanampudi/%0A%20%20/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar
17/12/17 12:02:36 INFO SparkContext: Added JAR file:/home/vanampudi/retail_2.10-1.0.jar at http://172.16.1.100:55017/jars/retail_2.10-1.0.jar with timestamp 1513530156514
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
17/12/17 12:02:37 INFO TimelineClientImpl: Timeline service address: http://rm01.itversity.com:8188/ws/v1/timeline/
17/12/17 12:02:37 INFO RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
17/12/17 12:02:37 INFO AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
17/12/17 12:02:38 INFO Client: Requesting a new application from cluster with 5 NodeManagers
17/12/17 12:02:38 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
17/12/17 12:02:38 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/12/17 12:02:38 INFO Client: Setting up container launch context for our AM
17/12/17 12:02:38 INFO Client: Setting up the launch environment for our AM container
17/12/17 12:02:38 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://nn01.itversity.com:8020/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar
17/12/17 12:02:38 INFO Client: Preparing resources for our AM container
17/12/17 12:02:38 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://nn01.itversity.com:8020/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar
17/12/17 12:02:38 INFO Client: Source and destination file systems are the same. Not copying hdfs://nn01.itversity.com:8020/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar
17/12/17 12:02:38 INFO Client: Uploading resource file:/tmp/spark-1d073fd7-f082-47f8-a2b6-2ed6ab1934c8/__spark_conf__8603136198120763709.zip -> hdfs://nn01.itversity.com:8020/user/vanampudi/.sparkStaging/application_1507687444776_25894/__spark_conf__8603136198120763709.zip
17/12/17 12:02:39 INFO SecurityManager: Changing view acls to: vanampudi
17/12/17 12:02:39 INFO SecurityManager: Changing modify acls to: vanampudi
17/12/17 12:02:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vanampudi); users with modify permissions: Set(vanampudi)
17/12/17 12:02:39 INFO Client: Submitting application 25894 to ResourceManager
17/12/17 12:02:39 INFO YarnClientImpl: Submitted application application_1507687444776_25894
17/12/17 12:02:39 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1507687444776_25894 and attemptId None
17/12/17 12:02:40 INFO Client: Application report for application_1507687444776_25894 (state: ACCEPTED)
17/12/17 12:02:40 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1513530159229
final status: UNDEFINED
tracking URL: http://rm01.itversity.com:8088/proxy/application_1507687444776_25894/
user: vanampudi
17/12/17 12:02:41 INFO Client: Application report for application_1507687444776_25894 (state: ACCEPTED)
17/12/17 12:02:42 INFO Client: Application report for application_1507687444776_25894 (state: ACCEPTED)
17/12/17 12:02:43 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
17/12/17 12:02:43 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> rm01.itversity.com, PROXY_URI_BASES -> http://rm01.itversity.com:8088/proxy/application_1507687444776_25894), /proxy/application_1507687444776_25894
17/12/17 12:02:43 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
17/12/17 12:02:43 INFO Client: Application report for application_1507687444776_25894 (state: ACCEPTED)
17/12/17 12:02:44 INFO Client: Application report for application_1507687444776_25894 (state: RUNNING)
17/12/17 12:02:44 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.16.1.102
ApplicationMaster RPC port: 0
queue: default
start time: 1513530159229
final status: UNDEFINED
tracking URL: http://rm01.itversity.com:8088/proxy/application_1507687444776_25894/
user: vanampudi
17/12/17 12:02:44 INFO YarnClientSchedulerBackend: Application application_1507687444776_25894 has started running.
17/12/17 12:02:44 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39295.
17/12/17 12:02:44 INFO NettyBlockTransferService: Server created on 39295
17/12/17 12:02:44 INFO BlockManagerMaster: Trying to register BlockManager
17/12/17 12:02:44 INFO BlockManagerMasterEndpoint: Registering block manager 172.16.1.100:39295 with 511.1 MB RAM, BlockManagerId(driver, 172.16.1.100, 39295)
17/12/17 12:02:44 INFO BlockManagerMaster: Registered BlockManager
17/12/17 12:02:44 INFO EventLoggingListener: Logging events to hdfs:///spark-history/application_1507687444776_25894
17/12/17 12:02:48 INFO YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (wn01.itversity.com:56686) with ID 1
17/12/17 12:02:48 INFO BlockManagerMasterEndpoint: Registering block manager wn01.itversity.com:41307 with 511.1 MB RAM, BlockManagerId(1, wn01.itversity.com, 41307)
17/12/17 12:02:48 INFO YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (wn03.itversity.com:50391) with ID 2
17/12/17 12:02:48 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/12/17 12:02:48 INFO BlockManagerMasterEndpoint: Registering block manager wn03.itversity.com:34445 with 511.1 MB RAM, BlockManagerId(2, wn03.itversity.com, 34445)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
at KafkaStreamingDepartmentCount$.main(KafkaStreamingDepartmentCount.scala:12)
at KafkaStreamingDepartmentCount.main(KafkaStreamingDepartmentCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtils$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
… 11 more
17/12/17 12:02:48 INFO SparkContext: Invoking stop() from shutdown hook
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
17/12/17 12:02:48 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
17/12/17 12:02:48 INFO SparkUI: Stopped Spark web UI at http://172.16.1.100:12345
17/12/17 12:02:48 INFO YarnClientSchedulerBackend: Interrupting monitor thread
17/12/17 12:02:48 INFO YarnClientSchedulerBackend: Shutting down all executors
17/12/17 12:02:48 INFO YarnClientSchedulerBackend: Asking each executor to shut down
17/12/17 12:02:48 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
17/12/17 12:02:48 INFO YarnClientSchedulerBackend: Stopped
17/12/17 12:02:48 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/12/17 12:02:48 INFO MemoryStore: MemoryStore cleared
17/12/17 12:02:48 INFO BlockManager: BlockManager stopped
17/12/17 12:02:48 INFO BlockManagerMaster: BlockManagerMaster stopped
17/12/17 12:02:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/12/17 12:02:48 INFO SparkContext: Successfully stopped SparkContext
17/12/17 12:02:48 INFO ShutdownHookManager: Shutdown hook called
17/12/17 12:02:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-1d073fd7-f082-47f8-a2b6-2ed6ab1934c8
17/12/17 12:02:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/12/17 12:02:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-1d073fd7-f082-47f8-a2b6-2ed6ab1934c8/httpd-7bad44f3-7b47-41b1-8084-aeaf1e745f99