Exercise 26 - Data ingestion using flume to HDFS

#1

Problem Statement

  • Read the data from the logs generated by the gen_logs application (validate using tail_logs.sh) and save it in HDFS
  • Use exec as the source, file as the channel and HDFS as the sink
  • Use appropriate parameters so that the HDFS directories are named based on the date
  • Reference material - http://www.itversity.com/topic/ingest-streaming-data-using-flume/
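The date-based directory names come from escape sequences in hdfs.path, which the HDFS sink expands using the event timestamp (the agent's local time when hdfs.useLocalTimeStamp = true). The Python sketch below illustrates the idea. It assumes that %y, %m, %d, %H, %M and %S behave like the strftime directives of the same name and ignores every other escape Flume supports; `expand_escapes` is a hypothetical helper, not part of Flume.

```python
from datetime import datetime

# Assumption: only these escapes are handled, each treated like the
# strftime directive of the same name. Flume supports more than this.
SUPPORTED = {"y", "m", "d", "H", "M", "S"}


def expand_escapes(template: str, ts: datetime) -> str:
    """Expand %y, %m, %d, %H, %M and %S in an hdfs.path-style template."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in SUPPORTED:
            out.append(ts.strftime("%" + template[i + 1]))
            i += 2  # consume the '%' and the escape letter
        else:
            out.append(ch)  # copy ordinary characters unchanged
            i += 1
    return "".join(out)


if __name__ == "__main__":
    ts = datetime(2016, 12, 23, 1, 34, 5)
    print(expand_escapes("/user/farhanmisarwala/flume/%y-%m-%d", ts))
    # prints /user/farhanmisarwala/flume/16-12-23
```

So a path ending in %y-%m-%d yields one directory per day, while %y-%m-%d/%H%M/%S yields one directory per second.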

Please provide the following

  • agent configuration file
  • command to start flume agent
  • output of hadoop fs -ls command

#2

#agent configuration file

###flume.conf

# Describe/configure r1
# agent name a1
a1.sources = r1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/gen_logs/logs/access.log

a1.channels = c1

# Use a channel which buffers events to a file
# -- The component type name, needs to be FILE.
a1.channels.c1.type = FILE 

# capacity: max number of events the channel can hold; transactionCapacity: max events per transaction
a1.channels.c1.capacity = 20000
a1.channels.c1.transactionCapacity = 1000

# Amount of time (in millis) between checkpoints
a1.channels.c1.checkpointInterval = 3000

# Max size (in bytes) of a single log file 
a1.channels.c1.maxFileSize = 2146435071

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /user/farhanmisarwala/flume/%y-%m-%d
a1.sinks.k1.hdfs.filePrefix = flume-%y-%m-%d
a1.sinks.k1.hdfs.rollSize = 1048576
a1.sinks.k1.hdfs.rollCount = 100
a1.sinks.k1.hdfs.rollInterval = 120
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.idleTimeout = 10
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks = k1
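With rollSize, rollCount and rollInterval all set, the sink closes the current file as soon as the first enabled threshold is reached (a value of 0 disables that trigger). A minimal Python model of that decision follows; `RollPolicy` and `should_roll` are hypothetical names, and the real sink also closes files after idleTimeout seconds of inactivity, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class RollPolicy:
    """Simplified model of hdfs.rollSize / hdfs.rollCount / hdfs.rollInterval.

    A value of 0 disables that trigger. Illustration only, not Flume's code.
    """
    roll_size: int = 1048576   # bytes (hdfs.rollSize)
    roll_count: int = 100      # events (hdfs.rollCount)
    roll_interval: int = 120   # seconds (hdfs.rollInterval)

    def should_roll(self, bytes_written: int, events_written: int,
                    seconds_open: float) -> bool:
        # The open file is closed as soon as ANY enabled threshold is reached.
        if self.roll_size > 0 and bytes_written >= self.roll_size:
            return True
        if self.roll_count > 0 and events_written >= self.roll_count:
            return True
        if self.roll_interval > 0 and seconds_open >= self.roll_interval:
            return True
        return False


if __name__ == "__main__":
    policy = RollPolicy()
    # About 200 bytes per log line: rollCount is reached long before rollSize.
    print(policy.should_roll(bytes_written=19982, events_written=100, seconds_open=30))
    # prints True
```

With access log lines of roughly 200 bytes, rollCount = 100 is usually hit first, which fits the 19982-byte file in the listing.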

#command to start flume agent

flume-ng agent --name a1 --conf /home/farhanmisarwala/flume/conf --conf-file /home/farhanmisarwala/flume/conf/flume.conf

#output of hadoop fs -ls command

[farhanmisarwala@gw01 ~]$ hadoop fs -ls -R /user/farhanmisarwala/flume
drwx------   - farhanmisarwala hdfs          0 2016-12-23 01:34 /user/farhanmisarwala/flume/16-12-23
-rw-r--r--   3 farhanmisarwala hdfs      19982 2016-12-23 01:34 /user/farhanmisarwala/flume/16-12-23/flume-16-12-23.1482474806624
-rw-r--r--   3 farhanmisarwala hdfs        205 2016-12-23 01:34 /user/farhanmisarwala/flume/16-12-23/flume-16-12-23.1482474806625.tmp
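The file ending in `.tmp` is still open: the HDFS sink writes under a name carrying hdfs.inUseSuffix (`.tmp` by default) and renames the file when it rolls, so a downstream job should skip those files. The Python sketch below does that for `hadoop fs -ls` output. It assumes the permission string is the first column and the path the last whitespace-separated column (paths containing spaces are not handled); `completed_files` is a hypothetical helper.

```python
from typing import Iterable, List

IN_USE_SUFFIX = ".tmp"  # default value of hdfs.inUseSuffix


def completed_files(ls_lines: Iterable[str]) -> List[str]:
    """Return paths of regular files that Flume has finished writing."""
    paths = []
    for line in ls_lines:
        fields = line.split()
        if not fields or not fields[0].startswith("-"):
            continue  # skip directories, "Found N items" and blank lines
        path = fields[-1]
        if not path.endswith(IN_USE_SUFFIX):
            paths.append(path)
    return paths
```

For example, a script could pipe `hadoop fs -ls -R /user/farhanmisarwala/flume` into its standard input and pass `sys.stdin` to `completed_files`.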

#3

flm.conf:

# Name the components on this agent

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/gen_logs/logs/access.log

# Describe the sink

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.path = /user/saswat232/iventfromflume

# Use a channel which buffers events in a file

a1.channels.c1.type = file
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
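HDFS sink settings such as fileType, path and rollSize are only read under the `hdfs.` prefix (for example a1.sinks.k1.hdfs.fileType). A key like a1.sinks.k1.fileType is not picked up, so the sink keeps its default file type, SequenceFile. The hypothetical Python check below flags such keys. Its parser only understands `key = value` lines and `#` comments (java.util.Properties, which Flume uses to load the file, also accepts whitespace as a separator), and HDFS_ONLY is an illustrative subset, not Flume's full list.

```python
import re
from typing import Dict, List

# Illustrative subset of HDFS sink settings that need the "hdfs." prefix.
HDFS_ONLY = {"path", "filePrefix", "fileType", "rollSize", "rollCount",
             "rollInterval", "idleTimeout", "useLocalTimeStamp"}

# <agent>.sinks.<sink>.<setting> with no further dots in <setting>
KEY_RE = re.compile(r"^(\w+)\.sinks\.(\w+)\.(\w+)$")


def parse_properties(text: str) -> Dict[str, str]:
    """Tiny key=value parser; ignores blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def misplaced_hdfs_keys(text: str) -> List[str]:
    """Flag HDFS sink settings written without the "hdfs." prefix."""
    props = parse_properties(text)
    flagged = []
    for key in props:
        m = KEY_RE.match(key)
        if not m:
            continue
        agent, sink, setting = m.groups()
        if props.get(f"{agent}.sinks.{sink}.type") == "hdfs" and setting in HDFS_ONLY:
            flagged.append(key)
    return flagged
```

Running it over a config before starting the agent gives an early hint that a setting will be ignored.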

Command to start flume agent:

flume-ng agent --name a1 --conf /home/saswat232/conf --conf-file /home/saswat232/conf/flm.conf

Output of hadoop fs -ls command:

-rw-r--r-- 3 saswat232 hdfs 1233 2016-12-23 01:48 /user/saswat232/iventfromflume/FlumeData.1482475708699
-rw-r--r-- 3 saswat232 hdfs 1385 2016-12-23 01:48 /user/saswat232/iventfromflume/FlumeData.1482475708700
-rw-r--r-- 3 saswat232 hdfs 1458 2016-12-23 01:48 /user/saswat232/iventfromflume/FlumeData.1482475708701
-rw-r--r-- 3 saswat232 hdfs 1322 2016-12-23 01:48 /user/saswat232/iventfromflume/FlumeData.1482475708702
-rw-r--r-- 3 saswat232 hdfs 1399 2016-12-23 01:48 /user/saswat232/iventfromflume/FlumeData.1482475708703
-rw-r--r-- 3 saswat232 hdfs 1362 2016-12-23 01:48 /user/saswat232/iventfromflume/FlumeData.1482475708704


#4

Agent Configuration File

# Describe/configure r1
# agent name a1

a1.sources = r1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/gen_logs/logs/access.log

a1.channels = c1

# Use a channel which buffers events to a file
# -- The component type name, needs to be FILE.
a1.channels.c1.type = FILE 

# capacity: max number of events the channel can hold; transactionCapacity: max events per transaction
a1.channels.c1.capacity = 20000
a1.channels.c1.transactionCapacity = 1000

# Amount of time (in millis) between checkpoints
a1.channels.c1.checkpointInterval = 3000

# Max size (in bytes) of a single log file 
a1.channels.c1.maxFileSize = 2146435071

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /user/jasonbourne/flume/%y-%m-%d
a1.sinks.k1.hdfs.filePrefix = flume-%y-%m-%d
a1.sinks.k1.hdfs.rollSize = 1048576
a1.sinks.k1.hdfs.rollCount = 100
a1.sinks.k1.hdfs.rollInterval = 120
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.idleTimeout = 10
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks = k1
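capacity is the maximum number of events the file channel can hold, while transactionCapacity is the maximum number of events a single put or take transaction may carry. As we understand the Flume user guide, transactionCapacity must not exceed capacity, and the batches written by the source and read by the sink must each fit in one transaction. The Python sketch below checks those rules; `capacity_problems` is hypothetical, and its defaults assume the exec source's batchSize (20) and the HDFS sink's hdfs.batchSize (100).

```python
from typing import List


def capacity_problems(capacity: int, transaction_capacity: int,
                      source_batch: int = 20, sink_batch: int = 100) -> List[str]:
    """Return sizing problems for a channel (an empty list means none found)."""
    problems = []
    if transaction_capacity > capacity:
        problems.append("transactionCapacity exceeds capacity")
    if source_batch > transaction_capacity:
        problems.append("source batch larger than transactionCapacity")
    if sink_batch > transaction_capacity:
        problems.append("sink batch larger than transactionCapacity")
    return problems


if __name__ == "__main__":
    print(capacity_problems(20000, 1000))  # values used in this config
    # prints []
```

Both 20000/1000 here and 1000/100 in the previous post pass these checks with the default batch sizes.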

Command to start flume agent

flume-ng agent \
--name a1 \
--conf /home/jasonbourne/flume/conf \
--conf-file /home/jasonbourne/flume/conf/flume.conf

Output of hadoop fs -ls command

[jasonbourne@gw01 ~]$ hadoop fs -ls /user/jasonbourne/flume/16-12-23
Found 1 items
-rw-r--r--   3 jasonbourne hdfs      14544 2016-12-23 04:33 /user/jasonbourne/flume/16-12-23/flume-16-12-23.1482485576284

#5
agent1.sources=s1
agent1.channels=c1
agent1.sinks=sink1

agent1.sources.s1.type=exec
agent1.sources.s1.command=tail_logs.sh

agent1.channels.c1.type=file

agent1.channels.c1.capacity=100000
# dataDir is a directory on the agent's local filesystem, not in HDFS
agent1.channels.c1.dataDir=/user/infosnehasish/flume/data

agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=/user/infosnehasish/flume/output/events/%y-%m-%d/%H%M/%S
agent1.sinks.sink1.hdfs.useLocalTimeStamp=true

agent1.sinks.sink1.channel=c1
agent1.sources.s1.channels=c1

#6

agent1.sources=s1
agent1.channels=c1
agent1.sinks=sink1

agent1.sources.s1.type=exec
agent1.sources.s1.command=tail_logs.sh

agent1.channels.c1.type=file
#agent1.channels.c1.checkpointDir=/user/nagellarajashyam/flume/checkpoint
agent1.channels.c1.capacity=100000
agent1.channels.c1.dataDir=/user/parulshine92/flume/data

agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=/user/parulshine92/flume/output/events/%y-%m-%d/%H%M/%S
agent1.sinks.sink1.hdfs.useLocalTimeStamp=true

agent1.sinks.sink1.channel=c1
agent1.sources.s1.channels=c1


flume-ng agent --name agent1 --conf /home/parulshine92/flume/conf --conf-file flume/conf/exec.conf

drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:25 /user/parulshine92/flume/output/events/16-12-23
[parulshine92@gw01 ~]$ hdfs dfs -ls /user/parulshine92/flume/output/events/16-12-23/
Found 20 items
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:06 /user/parulshine92/flume/output/events/16-12-23/0606
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:07 /user/parulshine92/flume/output/events/16-12-23/0607
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:08 /user/parulshine92/flume/output/events/16-12-23/0608
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:09 /user/parulshine92/flume/output/events/16-12-23/0609
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:10 /user/parulshine92/flume/output/events/16-12-23/0610
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:11 /user/parulshine92/flume/output/events/16-12-23/0611
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:12 /user/parulshine92/flume/output/events/16-12-23/0612
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:14 /user/parulshine92/flume/output/events/16-12-23/0613
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:14 /user/parulshine92/flume/output/events/16-12-23/0614
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:15 /user/parulshine92/flume/output/events/16-12-23/0615
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:16 /user/parulshine92/flume/output/events/16-12-23/0616
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:17 /user/parulshine92/flume/output/events/16-12-23/0617
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:18 /user/parulshine92/flume/output/events/16-12-23/0618
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:19 /user/parulshine92/flume/output/events/16-12-23/0619
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:20 /user/parulshine92/flume/output/events/16-12-23/0620
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:21 /user/parulshine92/flume/output/events/16-12-23/0621
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:22 /user/parulshine92/flume/output/events/16-12-23/0622
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:23 /user/parulshine92/flume/output/events/16-12-23/0623
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:24 /user/parulshine92/flume/output/events/16-12-23/0624
drwxr-xr-x - parulshine92 hdfs 0 2016-12-23 06:25 /user/parulshine92/flume/output/events/16-12-23/0625


#7
agent1.sources=s1
agent1.channels=c1
agent1.sinks=sink1

agent1.sources.s1.type=exec
agent1.sources.s1.command=tail_logs.sh

agent1.channels.c1.type=file
#agent1.channels.c1.checkpointDir=/user/nagellarajashyam/flume/checkpoint
agent1.channels.c1.capacity=100000
agent1.channels.c1.dataDir=/user/nagellarajashyam/flume/data

agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=/user/nagellarajashyam/flume/output/events/%y-%m-%d/%H%M/%S
agent1.sinks.sink1.hdfs.useLocalTimeStamp=true

agent1.sinks.sink1.channel=c1
agent1.sources.s1.channels=c1

flume-ng agent --name agent1 --conf /user/nagellarajashyam/flume --conf-file /user/nagellarajashyam/conf/exec.conf

Output files:

[nagellarajashyam@gw01 flume]$ hdfs dfs -ls -R /user/nagellarajashyam/flume/output
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:03 /user/nagellarajashyam/flume/output/events
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:05 /user/nagellarajashyam/flume/output/events/16-12-23
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:03 /user/nagellarajashyam/flume/output/events/16-12-23/0203
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/55
-rw-r--r--   3 nagellarajashyam hdfs        326 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/55/FlumeData.1482476635534
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/56
-rw-r--r--   3 nagellarajashyam hdfs       1355 2016-12-23 02:03 /user/nagellarajashyam/flume/output/events/16-12-23/0203/56/FlumeData.1482476636425
-rw-r--r--   3 nagellarajashyam hdfs       1428 2016-12-23 02:03 /user/nagellarajashyam/flume/output/events/16-12-23/0203/56/FlumeData.1482476636426
-rw-r--r--   3 nagellarajashyam hdfs        315 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/56/FlumeData.1482476636427
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/57
-rw-r--r--   3 nagellarajashyam hdfs       1221 2016-12-23 02:03 /user/nagellarajashyam/flume/output/events/16-12-23/0203/57/FlumeData.1482476637335
-rw-r--r--   3 nagellarajashyam hdfs        920 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/57/FlumeData.1482476637336
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/58
-rw-r--r--   3 nagellarajashyam hdfs       1444 2016-12-23 02:03 /user/nagellarajashyam/flume/output/events/16-12-23/0203/58/FlumeData.1482476638831
-rw-r--r--   3 nagellarajashyam hdfs        335 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/58/FlumeData.1482476638832
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/59
-rw-r--r--   3 nagellarajashyam hdfs       1394 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0203/59/FlumeData.1482476639144
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/02
-rw-r--r--   3 nagellarajashyam hdfs        766 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/02/FlumeData.1482476642359
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/05
-rw-r--r--   3 nagellarajashyam hdfs        761 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/05/FlumeData.1482476645567
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/08
-rw-r--r--   3 nagellarajashyam hdfs        779 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/08/FlumeData.1482476648776
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/11
-rw-r--r--   3 nagellarajashyam hdfs        344 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/11/FlumeData.1482476651993
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/12
-rw-r--r--   3 nagellarajashyam hdfs        537 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/12/FlumeData.1482476652119
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/15
-rw-r--r--   3 nagellarajashyam hdfs        730 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/15/FlumeData.1482476655361
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/18
-rw-r--r--   3 nagellarajashyam hdfs        718 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/18/FlumeData.1482476658759
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/19
-rw-r--r--   3 nagellarajashyam hdfs        333 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/19/FlumeData.1482476659994
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/20
-rw-r--r--   3 nagellarajashyam hdfs        552 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/20/FlumeData.1482476660110
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/23
-rw-r--r--   3 nagellarajashyam hdfs        743 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/23/FlumeData.1482476663311
drwx------   - nagellarajashyam hdfs          0 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/26
-rw-r--r--   3 nagellarajashyam hdfs        757 2016-12-23 02:04 /user/nagellarajashyam/flume/output/events/16-12-23/0204/26/FlumeData.1482476666528
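With %H%M/%S in hdfs.path, every second that receives events gets its own directory, as the listing above shows. The HDFS sink offers hdfs.round, hdfs.roundValue and hdfs.roundUnit to round the timestamp down before the escape sequences are expanded (for example hdfs.round = true, hdfs.roundValue = 10, hdfs.roundUnit = minute). The Python sketch below models that rounding; `round_down` is a hypothetical helper written for illustration, not Flume's implementation.

```python
from datetime import datetime


def round_down(ts: datetime, value: int, unit: str) -> datetime:
    """Round ts down to a multiple of `value` seconds, minutes or hours."""
    if unit == "second":
        return ts.replace(second=ts.second - ts.second % value, microsecond=0)
    if unit == "minute":
        return ts.replace(minute=ts.minute - ts.minute % value,
                          second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(hour=ts.hour - ts.hour % value,
                          minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


if __name__ == "__main__":
    # Two timestamps from the listing, bucketed into 10-minute directories.
    for ts in (datetime(2016, 12, 23, 2, 3, 55), datetime(2016, 12, 23, 2, 4, 26)):
        print(round_down(ts, 10, "minute").strftime("%y-%m-%d/%H%M"))
    # prints 16-12-23/0200 twice
```

Combined with a path such as %y-%m-%d/%H%M, this groups several minutes of events into one directory instead of one directory per second.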