Flume Streaming JAR loading errors

flume

#1

Hello All,
I am running the Flume script copied from the resources section, in the lab environment, and I am getting the following errors:


spark-submit \
--class FlumeStreamingDepartmentCount \
--master yarn \
--conf spark.ui.port=12345 \
--jars "/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume-sink_2.10-1.6.2.jar,
/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume_2.10-1.6.2.jar,
/usr/hdp/2.5.0.0-1245/flume/lib/commons-lang3-3.5.jar,
/usr/hdp/2.5.0.0-1245/flume/lib/flume-ng-sdk-1.5.2.2.5.0.0-1245.jar" \
sparkstreamingdemo_2.10-1.0.jar yarn-client gw03.itversity.com 8123

Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default
Warning: Local jar /home/omkarprabhu/flumeStreamingDemo/
/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume_2.10-1.6.2.jar does not exist, skipping.
Warning: Local jar /home/omkarprabhu/flumeStreamingDemo/
/usr/hdp/2.5.0.0-1245/flume/lib/commons-lang3-3.5.jar does not exist, skipping.
Warning: Local jar /home/omkarprabhu/flumeStreamingDemo/
/usr/hdp/2.5.0.0-1245/flume/lib/flume-ng-sdk-1.5.2.2.5.0.0-1245.jar does not exist, skipping.
18/04/20 00:41:06 INFO SparkContext: Running Spark version 1.6.2
18/04/20 00:41:06 INFO SecurityManager: Changing view acls to: omkarprabhu
18/04/20 00:41:06 INFO SecurityManager: Changing modify acls to: omkarprabhu
18/04/20 00:41:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(omkarprabhu); users with modify permissions: Set(omkarprabhu)
18/04/20 00:41:09 INFO Utils: Successfully started service 'sparkDriver' on port 35794.
18/04/20 00:41:09 INFO Slf4jLogger: Slf4jLogger started
18/04/20 00:41:09 INFO Remoting: Starting remoting
18/04/20 00:41:09 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.16.1.113:41126]
18/04/20 00:41:09 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 41126.
18/04/20 00:41:09 INFO SparkEnv: Registering MapOutputTracker
18/04/20 00:41:09 INFO SparkEnv: Registering BlockManagerMaster
18/04/20 00:41:09 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-71ec8aca-d721-41e3-8776-ed01310f2179
18/04/20 00:41:09 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
18/04/20 00:41:09 INFO SparkEnv: Registering OutputCommitCoordinator
18/04/20 00:41:10 INFO Server: jetty-8.y.z-SNAPSHOT
18/04/20 00:41:10 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:12345
18/04/20 00:41:10 INFO Utils: Successfully started service 'SparkUI' on port 12345.
18/04/20 00:41:10 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.16.1.113:12345
18/04/20 00:41:10 INFO HttpFileServer: HTTP File server directory is /tmp/spark-c9d6a9cd-ce81-4b8e-98ca-b10850263837/httpd-3bfb52fa-5f62-4670-8e47-f3aa809a612b
18/04/20 00:41:10 INFO HttpServer: Starting HTTP Server
18/04/20 00:41:10 INFO Server: jetty-8.y.z-SNAPSHOT
18/04/20 00:41:10 INFO AbstractConnector: Started SocketConnector@0.0.0.0:43423
18/04/20 00:41:10 INFO Utils: Successfully started service 'HTTP file server' on port 43423.
18/04/20 00:41:10 INFO SparkContext: Added JAR file:/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume-sink_2.10-1.6.2.jar at http://172.16.1.113:43423/jars/spark-streaming-flume-sink_2.10-1.6.2.jar with timestamp 1524199270157
18/04/20 00:41:10 ERROR SparkContext: Jar not found at file:/home/omkarprabhu/flumeStreamingDemo/%0A/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume_2.10-1.6.2.jar
18/04/20 00:41:10 ERROR SparkContext: Jar not found at file:/home/omkarprabhu/flumeStreamingDemo/%0A/usr/hdp/2.5.0.0-1245/flume/lib/commons-lang3-3.5.jar
18/04/20 00:41:10 ERROR SparkContext: Jar not found at file:/home/omkarprabhu/flumeStreamingDemo/%0A/usr/hdp/2.5.0.0-1245/flume/lib/flume-ng-sdk-1.5.2.2.5.0.0-1245.jar
18/04/20 00:41:10 INFO SparkContext: Added JAR file:/home/omkarprabhu/flumeStreamingDemo/sparkstreamingdemo_2.10-1.0.jar at http://172.16.1.113:43423/jars/sparkstreamingdemo_2.10-1.0.jar with timestamp 1524199270159
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
18/04/20 00:41:10 INFO TimelineClientImpl: Timeline service address: http://rm01.itversity.com:8188/ws/v1/timeline/
18/04/20 00:41:10 INFO RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
18/04/20 00:41:11 INFO AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
18/04/20 00:41:11 INFO Client: Requesting a new application from cluster with 7 NodeManagers
18/04/20 00:41:11 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
18/04/20 00:41:11 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/04/20 00:41:11 INFO Client: Setting up container launch context for our AM
18/04/20 00:41:11 INFO Client: Setting up the launch environment for our AM container
18/04/20 00:41:11 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://nn01.itversity.com:8020/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar
18/04/20 00:41:11 INFO Client: Preparing resources for our AM container
18/04/20 00:41:11 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://nn01.itversity.com:8020/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar
18/04/20 00:41:11 INFO Client: Source and destination file systems are the same. Not copying hdfs://nn01.itversity.com:8020/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar
18/04/20 00:41:12 INFO Client: Uploading resource file:/tmp/spark-c9d6a9cd-ce81-4b8e-98ca-b10850263837/__spark_conf__5842475546911729097.zip -> hdfs://nn01.itversity.com:8020/user/omkarprabhu/.sparkStaging/application_1520592249193_51186/__spark_conf__5842475546911729097.zip
18/04/20 00:41:12 INFO SecurityManager: Changing view acls to: omkarprabhu
18/04/20 00:41:12 INFO SecurityManager: Changing modify acls to: omkarprabhu
18/04/20 00:41:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(omkarprabhu); users with modify permissions: Set(omkarprabhu)
18/04/20 00:41:12 INFO Client: Submitting application 51186 to ResourceManager
18/04/20 00:41:12 INFO YarnClientImpl: Submitted application application_1520592249193_51186
18/04/20 00:41:12 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1520592249193_51186 and attemptId None
18/04/20 00:41:13 INFO Client: Application report for application_1520592249193_51186 (state: ACCEPTED)
18/04/20 00:41:13 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1524199270245
final status: UNDEFINED
tracking URL: http://rm01.itversity.com:8088/proxy/application_1520592249193_51186/
user: omkarprabhu
18/04/20 00:41:14 INFO Client: Application report for application_1520592249193_51186 (state: ACCEPTED)
18/04/20 00:41:15 INFO Client: Application report for application_1520592249193_51186 (state: ACCEPTED)
18/04/20 00:41:15 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
18/04/20 00:41:15 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> rm01.itversity.com, PROXY_URI_BASES -> http://rm01.itversity.com:8088/proxy/application_1520592249193_51186), /proxy/application_1520592249193_51186
18/04/20 00:41:15 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/04/20 00:41:16 INFO Client: Application report for application_1520592249193_51186 (state: RUNNING)
18/04/20 00:41:16 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.16.1.102
ApplicationMaster RPC port: 0
queue: default
start time: 1524199270245
final status: UNDEFINED
tracking URL: http://rm01.itversity.com:8088/proxy/application_1520592249193_51186/
user: omkarprabhu
18/04/20 00:41:16 INFO YarnClientSchedulerBackend: Application application_1520592249193_51186 has started running.
18/04/20 00:41:16 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38116.
18/04/20 00:41:16 INFO NettyBlockTransferService: Server created on 38116
18/04/20 00:41:16 INFO BlockManagerMaster: Trying to register BlockManager
18/04/20 00:41:16 INFO BlockManagerMasterEndpoint: Registering block manager 172.16.1.113:38116 with 511.1 MB RAM, BlockManagerId(driver, 172.16.1.113, 38116)
18/04/20 00:41:16 INFO BlockManagerMaster: Registered BlockManager
18/04/20 00:41:16 INFO EventLoggingListener: Logging events to hdfs:///spark-history/application_1520592249193_51186
18/04/20 00:41:20 INFO YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (wn02.itversity.com:56645) with ID 2
18/04/20 00:41:20 INFO BlockManagerMasterEndpoint: Registering block manager wn02.itversity.com:37271 with 511.1 MB RAM, BlockManagerId(2, wn02.itversity.com, 37271)
18/04/20 00:41:21 INFO YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (wn05.itversity.com:39647) with ID 1
18/04/20 00:41:21 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/04/20 00:41:21 INFO BlockManagerMasterEndpoint: Registering block manager wn05.itversity.com:42070 with 511.1 MB RAM, BlockManagerId(1, wn05.itversity.com, 42070)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/flume/FlumeUtils$
at FlumeStreamingDepartmentCount$.main(flumestreaming.scala:30)
at FlumeStreamingDepartmentCount.main(flumestreaming.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.flume.FlumeUtils$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
18/04/20 00:41:21 INFO SparkContext: Invoking stop() from shutdown hook
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
18/04/20 00:41:21 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
18/04/20 00:41:21 INFO SparkUI: Stopped Spark web UI at http://172.16.1.113:12345
18/04/20 00:41:21 INFO YarnClientSchedulerBackend: Interrupting monitor thread
18/04/20 00:41:21 INFO YarnClientSchedulerBackend: Shutting down all executors
18/04/20 00:41:21 INFO YarnClientSchedulerBackend: Asking each executor to shut down
18/04/20 00:41:21 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
18/04/20 00:41:21 INFO YarnClientSchedulerBackend: Stopped
18/04/20 00:41:21 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/04/20 00:41:21 INFO MemoryStore: MemoryStore cleared
18/04/20 00:41:21 INFO BlockManager: BlockManager stopped
18/04/20 00:41:21 INFO BlockManagerMaster: BlockManagerMaster stopped
18/04/20 00:41:21 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/04/20 00:41:21 INFO SparkContext: Successfully stopped SparkContext
18/04/20 00:41:21 INFO ShutdownHookManager: Shutdown hook called
18/04/20 00:41:21 INFO ShutdownHookManager: Deleting directory /tmp/spark-c9d6a9cd-ce81-4b8e-98ca-b10850263837/httpd-3bfb52fa-5f62-4670-8e47-f3aa809a612b
18/04/20 00:41:21 INFO ShutdownHookManager: Deleting directory /tmp/spark-c9d6a9cd-ce81-4b8e-98ca-b10850263837
[omkarprabhu@gw03 flumeStreamingDemo]$


My queries:

  1. Why is spark-submit looking for the JARs under my home directory path?
  2. Why is FlumeUtils not found?

#2

The command below works fine. You should not have spaces or newline characters between the JAR files passed to --jars; the value must be a single comma-separated string. In your command the quoted --jars value contains embedded newlines, so spark-submit keeps each newline as part of the path. A path beginning with a newline is not absolute, so it is resolved relative to your working directory, which is why the warnings show the /home/omkarprabhu/flumeStreamingDemo/ prefix and the errors show the newline URL-encoded as %0A. And because spark-streaming-flume_2.10-1.6.2.jar was skipped, FlumeUtils was never on the classpath, which is what caused the NoClassDefFoundError.

spark-submit \
--class FlumeStreamingDepartmentCount \
--master yarn --conf spark.ui.port=12345 \
--jars "/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume-sink_2.10-1.6.2.jar,/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume_2.10-1.6.2.jar,/usr/hdp/2.5.0.0-1245/flume/lib/commons-lang3-3.5.jar,/usr/hdp/2.5.0.0-1245/flume/lib/flume-ng-sdk-1.5.2.2.5.0.0-1245.jar" \
sparkstreamingdemo_2.10-1.0.jar yarn-client gw03.itversity.com 8123
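
If you want to keep the long --jars list readable, you can build it in a shell variable so each path sits on its own line without a newline ever ending up inside the quoted value. A minimal sketch (the JARS variable name is just illustrative; the paths are the same ones used above):

# Build the comma-separated list one entry per line;
# string concatenation keeps the value itself free of newlines.
JARS="/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume-sink_2.10-1.6.2.jar"
JARS="$JARS,/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume_2.10-1.6.2.jar"
JARS="$JARS,/usr/hdp/2.5.0.0-1245/flume/lib/commons-lang3-3.5.jar"
JARS="$JARS,/usr/hdp/2.5.0.0-1245/flume/lib/flume-ng-sdk-1.5.2.2.5.0.0-1245.jar"

spark-submit \
--class FlumeStreamingDepartmentCount \
--master yarn --conf spark.ui.port=12345 \
--jars "$JARS" \
sparkstreamingdemo_2.10-1.0.jar yarn-client gw03.itversity.com 8123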

Having access to the lab made it possible for us to troubleshoot the issue and come up with the solution. For accelerated learning, please visit our labs.
