Apache Flume Moving Tomcat Logs to HDFS


Now we will see how you can move Apache Tomcat logs into HDFS.

Step 1 – Change the directory to /usr/local/hadoop/sbin

$ cd /usr/local/hadoop/sbin

Step 2 – Start all Hadoop daemons.

$ ./start-all.sh

Step 3 – Check that the daemons are running with jps (the Java Virtual Machine Process Status Tool). Note that jps only reports JVMs for which it has access permissions.

$ jps
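
Reading the jps output by eye works, but the check can also be scripted. Below is a minimal sketch, not part of the original setup. The function name missing_daemons is hypothetical, and the daemon list is an assumption for a single-node Hadoop 2.x install with HDFS and YARN, so adjust it to your cluster.

```shell
# Reads `jps` output on stdin and prints the name of every expected daemon
# that is NOT listed. Prints nothing when all of them are running.
# The daemon list below is an assumption (single-node Hadoop 2.x).
missing_daemons() {
    input=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # "SecondaryNameNode" is not mistaken for "NameNode".
        if ! printf '%s\n' "$input" | awk '{print $2}' | grep -qx "$daemon"; then
            echo "$daemon"
        fi
    done
}
```

Run it as: jps | missing_daemons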

Step 4 – Create a /flumedata folder in HDFS.

$ hdfs dfs -mkdir hdfs://localhost:9000/flumedata

Step 5 – Change the directory to the Tomcat bin folder ($CATALINA_HOME/bin, which is /usr/local/tomcat/bin in my case).

$ cd $CATALINA_HOME/bin

Step 6 – Start the Tomcat web server.

$ ./startup.sh

Step 7 – Check that Tomcat is running. Open a browser and enter the following URL.

http://127.0.0.1:8080

Step 8 – Change the directory to the Flume home folder ($FLUME_HOME, which is /usr/local/flume in my case).

$ cd $FLUME_HOME

Step 9 – Configuration File

Given below is an example of the configuration file. Copy this content and save it as flume.conf. In my case, the flume.conf file is in the /usr/local/flume/conf/ folder.

Don't forget to change this line to use your own Tomcat log file name:

agent.sources.tail-source.command = tail -F /usr/local/tomcat/logs/access_log.2015-12-26.txt
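
Because the log file name contains a date, the path to use changes every day. The following is a minimal sketch of a helper that builds it, assuming the access_log.<date>.txt naming shown in this tutorial. The function name access_log_path and its defaults are hypothetical, and your Tomcat may be configured with a different prefix or suffix.

```shell
# Prints the path of a dated Tomcat access log, following the naming used in
# this tutorial: <log dir>/access_log.<YYYY-MM-DD>.txt
#   $1 = log directory (default: /usr/local/tomcat/logs, an assumption)
#   $2 = date as YYYY-MM-DD (default: today)
access_log_path() {
    log_dir=${1:-/usr/local/tomcat/logs}
    day=${2:-$(date +%Y-%m-%d)}
    printf '%s/access_log.%s.txt\n' "$log_dir" "$day"
}
```

For example, ls -l "$(access_log_path)" shows whether today's log file exists before you put its name into the configuration.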

flume.conf

# Name the components of this agent
agent.sources = tail-source
agent.channels = memoryChannel
agent.sinks = hdfs-sink

# Source: follow the Tomcat access log with tail -F
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /usr/local/tomcat/logs/access_log.2015-12-26.txt
#agent.sources.tail-source.batchSize = 10
agent.sources.tail-source.channels = memoryChannel


# Channel: buffer events in memory between the source and the sink
agent.channels.memoryChannel.type = memory
#agent.channels.memoryChannel.capacity = 100000
#agent.channels.memoryChannel.transactionCapacity = 10000
#agent.channels.memoryChannel.keep-alive=2


# Sink: write the events to HDFS as plain text files
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = memoryChannel
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:9000/flumedata/
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
#agent.sinks.hdfs-sink.hdfs.filePrefix=access_%y-%m-%d-%H-%M
agent.sinks.hdfs-sink.hdfs.fileSuffix=.txt

# Optional batching and file roll settings
#agent.sinks.hdfs-sink.hdfs.batchSize = 10
#agent.sinks.hdfs-sink.hdfs.rollSize = 0
#agent.sinks.hdfs-sink.hdfs.rollCount = 10
#agent.sinks.hdfs-sink.hdfs.rollInterval = 30
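
A configuration copied from a web page can pick up typographic dashes or quotes that look like plain ASCII but are different characters. In a command such as the exec source's tail, a typographic dash in front of F is not read as an option. The sketch below is one way to spot such characters before starting the agent. The function name find_non_ascii is hypothetical.

```shell
# Prints every line of the given file that contains a byte outside printable
# ASCII and whitespace, prefixed with its line number.
# Exit status: 0 if such lines were found, 1 if the file is clean.
find_non_ascii() {
    # LC_ALL=C makes grep work byte by byte, so every byte of a multi-byte
    # UTF-8 character counts as non-printable.
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}
```

Run it as: find_non_ascii conf/flume.conf
If it prints nothing, the file contains only plain ASCII.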

Step 10 – Execution

$ bin/flume-ng agent -c ./conf -f conf/flume.conf --name agent -Dflume.root.logger=INFO,console
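
Once the agent is running, you can list the target folder with hdfs dfs -ls to see whether files are arriving. As a rough sketch, the listing can also be summarised with a small helper. The function name count_flume_files is hypothetical. It assumes the .txt file suffix from the configuration above and the HDFS sink's default .tmp suffix for files that are still being written.

```shell
# Reads `hdfs dfs -ls` output on stdin and prints "<finished> <in_progress>":
# the number of closed .txt files and of files still open (.tmp suffix).
count_flume_files() {
    awk '
        $NF ~ /\.tmp$/ { open_files++; next }   # still being written by Flume
        $NF ~ /\.txt$/ { finished++ }           # rolled and closed
        END { printf "%d %d\n", finished, open_files }
    '
}
```

Run it as: hdfs dfs -ls hdfs://localhost:9000/flumedata | count_flume_files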
