Flume help required

flume

#1

I am trying to run the file flume_twitteranalysis.conf on the Cloudera VM 5.12.0.

Source file flume_twitteranalysis.conf:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = ULyDsn0RtWyT1zUU8sBcgV6IA
TwitterAgent.sources.Twitter.consumerSecret = u2OKRKJqq38YiyP5oOijZcntAe1UfA1h9w12kr2WJqC1Ajpv2s
TwitterAgent.sources.Twitter.accessToken = 986649262170824707-MxnyXa7wm4fuv81X43RZGJdQRwwvfXn
TwitterAgent.sources.Twitter.accessTokenSecret = cWaMv7SJsAUv7ZPLga62L6LHIFV1aaDek7z9YEeWVnE3W
TwitterAgent.sources.Twitter.keywords = BigData, Spark, Hadoop

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 1000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 1000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

Command used to start the agent:

flume-ng agent --name TwitterAgent --conf-file /home/cloudera/flume_twitteranalysis.conf -Dflume.root.logger=INFO,console

Here is the error:

18/04/19 09:20:14 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
18/04/19 09:20:14 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/cloudera/flume_twitteranalysis.conf
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Processing:HDFS
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Processing:HDFS
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Processing:HDFS
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: TwitterAgent
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Processing:HDFS
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Processing:HDFS
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Processing:HDFS
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Processing:HDFS
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Processing:HDFS
18/04/19 09:20:14 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
18/04/19 09:20:14 INFO node.AbstractConfigurationProvider: Creating channels
18/04/19 09:20:14 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
18/04/19 09:20:14 INFO node.AbstractConfigurationProvider: Created channel MemChannel
18/04/19 09:20:14 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
18/04/19 09:20:14 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
18/04/19 09:20:14 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [Twitter, HDFS]
18/04/19 09:20:14 INFO node.Application: Starting new configuration:{ sourceRunners:{Twitter=EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@a842913 counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
18/04/19 09:20:14 INFO node.Application: Starting Channel MemChannel
18/04/19 09:20:14 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
18/04/19 09:20:14 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
18/04/19 09:20:14 INFO node.Application: Starting Sink HDFS
18/04/19 09:20:14 INFO node.Application: Starting Source Twitter
18/04/19 09:20:14 ERROR lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StreamListener;)V
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:140)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
18/04/19 09:20:14 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
18/04/19 09:20:14 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
18/04/19 09:20:14 WARN lifecycle.LifecycleSupervisor: Component EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:STOP} } stopped, since it could not besuccessfully started due to missing dependencies


#2

I had the dependency jar in the following location:
/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar
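A NoSuchMethodError like the one in the log usually means the class found at runtime is not the version the calling code was compiled against, which can happen when more than one jar on the classpath provides it. As a rough check, the sketch below lists every jar in a directory that contains a given class. The helper name `jars_containing` is hypothetical, and the directory is simply the lib folder mentioned above; adjust it for your installation.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir, class_name):
    """Return paths of jars directly under lib_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip broken symlinks and files that are not valid archives
            continue
    return matches


if __name__ == "__main__":
    # Assumed location, taken from the post above
    for path in jars_containing("/usr/lib/flume-ng/lib", "twitter4j.TwitterStream"):
        print(path)
```

If more than one jar is printed, or the single jar is a different twitter4j release than the source expects, that mismatch is a likely cause of the error.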


#3

Here is what is working for me.

I have made these 2 changes:

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://nn01.itversity.com:8020/user/training/tweets/

You have given the HDFS path as hdfs://quickstart.cloudera:8020/tweets/
Try changing it to hdfs://quickstart.cloudera:8020/user/cloudera/tweets/

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = ULyDsn0RtWyT1zUU8sBcgV6IA
TwitterAgent.sources.Twitter.consumerSecret = u2OKRKJqq38YiyP5oOijZcntAe1UfA1h9w12kr2WJqC1Ajpv2s
TwitterAgent.sources.Twitter.accessToken = 986649262170824707-MxnyXa7wm4fuv81X43RZGJdQRwwvfXn
TwitterAgent.sources.Twitter.accessTokenSecret = cWaMv7SJsAUv7ZPLga62L6LHIFV1aaDek7z9YEeWVnE3W
TwitterAgent.sources.Twitter.keywords = BigData, Spark, Hadoop

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://nn01.itversity.com:8020/user/training/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 1000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 1000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

The above configuration is tested on our state-of-the-art Big Data cluster.
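To see exactly which properties differ between the original file and the working one, the two can be compared key by key. This is a minimal sketch that assumes plain `key = value` lines; `parse_properties` and `changed_keys` are hypothetical helper names, and only three properties from each file are reproduced for brevity.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def changed_keys(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Excerpts from the two configurations posted in this thread
original = """
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
"""
working = """
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://nn01.itversity.com:8020/user/training/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
"""

diff = changed_keys(parse_properties(original), parse_properties(working))
for key, (before, after) in diff.items():
    print(f"{key}: {before} -> {after}")
```

In practice you would read the two .conf files from disk instead of using inline strings.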



#4

Sir, can you please post the output of this?


#5

Here is the screenshot: