Apache Flume: Collecting Twitter Data
We will create a Twitter application and fetch tweets through it using the experimental Twitter source provided by Apache Flume. A memory channel buffers the tweets, and an HDFS sink writes them to HDFS.
Step 1 – Create an application in Twitter with your Twitter account. Browse to the Twitter application management page to create the application.
a) Sign in to your Twitter account. You will see the Twitter Application Management window, where you can create, delete, and manage Twitter Apps.
b) Click on the Create New App button. You will be redirected to an application form, where you fill in your details to create the App. When filling in the website address, give the complete URL, for example, http://example.com.
c) Fill in the details and accept the Developer Agreement. When finished, click on the Create your Twitter application button at the bottom of the page. If everything goes fine, an App will be created.
d) Under the Keys and Access Tokens tab, at the bottom of the page, there is a button named Create my access token. Click on it to generate the access token.
e) Finally, click on the Test OAuth button at the top right of the page. This leads to a page that displays your Consumer key, Consumer secret, Access token, and Access token secret. Copy these details; you will need them to configure the agent in Flume.
Step 2 – Change the directory to /usr/local/hadoop/sbin
Step 3 – Start all Hadoop daemons.
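Steps 2 and 3 can be sketched as the commands below. This is a minimal sketch that assumes a Hadoop 2 or later layout, where start-dfs.sh and start-yarn.sh live in the sbin directory. The directory check is only there so the snippet prints a message instead of failing when Hadoop is installed elsewhere.

```shell
HADOOP_SBIN=/usr/local/hadoop/sbin   # path taken from Step 2

if [ -d "$HADOOP_SBIN" ]; then
    cd "$HADOOP_SBIN"
    ./start-dfs.sh    # starts the HDFS daemons (NameNode, DataNode, SecondaryNameNode)
    ./start-yarn.sh   # starts the YARN daemons (ResourceManager, NodeManager)
else
    echo "Directory not found: $HADOOP_SBIN" >&2
fi
```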
Step 4 – Verify that the daemons are running with jps (the Java Virtual Machine Process Status Tool). Note that jps reports only on JVMs for which it has access permissions.
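A minimal sketch of the check, assuming the JDK's bin directory is on your PATH. On a single-node setup you would typically expect entries such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, but the exact list depends on your installation.

```shell
# jps ships with the JDK; it lists the Java processes visible to the current user.
if command -v jps >/dev/null 2>&1; then
    jps
else
    echo "jps not found on PATH; check the JDK installation" >&2
fi
```

If a daemon is missing from the list, run jps as the user that started Hadoop, since processes owned by other users may not be reported.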
Step 5 – Create a /user/hduser/twitter_data folder in HDFS.
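The folder can be created with the HDFS file system shell. The sketch below assumes the hdfs command is on your PATH and that HDFS is already running. The -p flag creates missing parent directories and does not fail if the folder already exists.

```shell
TWITTER_DIR=/user/hduser/twitter_data   # target folder from Step 5

if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p "$TWITTER_DIR"   # create the folder and any missing parents
    hdfs dfs -ls /user/hduser           # confirm that the folder is listed
else
    echo "hdfs command not found; add Hadoop's bin directory to PATH" >&2
fi
```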
Step 6 – Copy the required Twitter jar files into the /usr/local/flume/lib/ folder. You can download these jar files from the internet.
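The exact jar names are not listed here, so the sketch below rests on assumptions. Flume's Twitter source is built on the twitter4j library, so the jars are assumed to match twitter4j-*.jar, and $HOME/Downloads is a hypothetical download location. Adjust both to match the files you actually downloaded.

```shell
JAR_SRC="$HOME/Downloads"        # hypothetical location of the downloaded jars
FLUME_LIB=/usr/local/flume/lib   # target folder from Step 6

copied=0
for jar in "$JAR_SRC"/twitter4j-*.jar; do   # assumed file name pattern
    [ -e "$jar" ] || continue               # skip when the pattern matches nothing
    cp "$jar" "$FLUME_LIB"/ && copied=$((copied + 1))
done
echo "Copied $copied jar file(s) to $FLUME_LIB"
```

If the copy fails with a permission error, the Flume folder is probably owned by another user, and the command has to be run with the appropriate privileges.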
Step 7 – Edit the flume-env.sh file.
Step 8 – Add the Flume library path to the flume-env.sh file, then save and close it.
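A minimal sketch of the line to add, assuming flume-env.sh sits in /usr/local/flume/conf (if it does not exist yet, it can be created by copying flume-env.sh.template from the same folder). FLUME_CLASSPATH is the variable the flume-ng launcher reads for extra classpath entries.

```shell
# Line to add to flume-env.sh (assumed location: /usr/local/flume/conf).
# The wildcard puts every jar in Flume's lib folder on the agent's classpath.
FLUME_CLASSPATH="/usr/local/flume/lib/*"
```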
Step 9 – Configuration File
The agent is configured through a file named twitter.conf, saved in the conf folder of Flume.
Don't forget to replace consumerKey, consumerSecret, accessToken, and accessTokenSecret with your own Twitter OAuth credentials.
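One possible twitter.conf is sketched below, written out with a shell here-document. Several names in it are assumptions rather than requirements: the agent name TwitterAgent, the component names Twitter, MemChannel, and HDFS, the NameNode address hdfs://localhost:9000, and the capacity and roll values. The YOUR_... entries are placeholders for the four values copied in Step 1.

```shell
# Run this from Flume's conf folder (assumed to be /usr/local/flume/conf).
# The quoted 'EOF' keeps the shell from expanding anything inside the file body.
cat > twitter.conf <<'EOF'
# Name the components of the agent
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Experimental Twitter source shipped with Flume
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET

# Memory channel that buffers the tweets
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# HDFS sink (the NameNode host and port are assumptions; check fs.defaultFS)
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/hduser/twitter_data
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

# Bind the source and the sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
EOF
```

The sink's batchSize is kept no larger than the channel's transactionCapacity, because the sink takes up to batchSize events from the channel in a single transaction.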
Step 10 – Change the directory to /usr/local/flume
Step 11 – Execution
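Steps 10 and 11 can be sketched as follows. The agent name TwitterAgent is an assumption and must match the prefix used in your twitter.conf. The logger setting is optional and simply prints the agent's log to the console.

```shell
FLUME_HOME=/usr/local/flume   # path taken from Step 10
AGENT_NAME=TwitterAgent       # assumed name; must match the prefix in twitter.conf

if [ -x "$FLUME_HOME/bin/flume-ng" ]; then
    cd "$FLUME_HOME"
    # --conf points at the folder holding flume-env.sh; --conf-file at the agent config
    bin/flume-ng agent --conf ./conf/ --conf-file conf/twitter.conf \
        --name "$AGENT_NAME" -Dflume.root.logger=INFO,console
else
    echo "flume-ng not found under $FLUME_HOME/bin" >&2
fi
```

The agent keeps running until you stop it with Ctrl+C. While it runs, the files it writes should appear under /user/hduser/twitter_data, which you can check with hdfs dfs -ls.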