Running Hadoop in Standalone Mode

Environment: CentOS 5 on VMware, running on a MacBook Pro

SSH setup

% ssh-keygen -t dsa
% cat ~/.ssh/ >> ~/.ssh/authorized_keys
% chmod 600 ~/.ssh/authorized_keys
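The chmod matters because sshd, with its default StrictModes setting, may ignore an authorized_keys file that other users can write to. A minimal sketch of a permission check follows; the helper name has_mode_600 is hypothetical, it assumes GNU stat (as on CentOS), and it runs against a scratch file rather than the real ~/.ssh/authorized_keys.

```shell
# Return 0 if the file's permission bits are exactly 600, non-zero otherwise.
# Assumes GNU stat (-c); BSD/macOS stat uses a different flag.
has_mode_600() {
  [ "$(stat -c %a "$1")" = "600" ]
}

# Demonstrate on a scratch file instead of the real authorized_keys.
scratch=$(mktemp)
chmod 644 "$scratch"
has_mode_600 "$scratch" && before=ok || before=too-open
chmod 600 "$scratch"
has_mode_600 "$scratch" && after=ok || after=too-open
echo "before: $before, after: $after"
rm -f "$scratch"
```

Running the same function against ~/.ssh/authorized_keys after the steps above should report success.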

JDK6 Install

% chmod +x jdk-6u23-linux-i586.bin
% ./jdk-6u23-linux-i586.bin
% sudo cp -r jdk1.6.0_23 /usr/local/jdk1.6.0_23
% sudo ln -s /usr/local/jdk1.6.0_23 /usr/local/jdk
% export PATH=$PATH:/usr/local/jdk/bin
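Installing into a versioned directory and pointing a stable symlink at it means a later upgrade only has to repoint the link, while PATH and JAVA_HOME stay the same. The following sketch shows that in a scratch directory; the second version name (jdk1.6.0_24) is hypothetical and only stands in for a newer JDK.

```shell
# Work in a scratch directory so nothing under /usr/local is touched.
root=$(mktemp -d)
mkdir "$root/jdk1.6.0_23" "$root/jdk1.6.0_24"   # second version is hypothetical

# Point the stable name at the first version, as in the steps above.
ln -s "$root/jdk1.6.0_23" "$root/jdk"
first=$(readlink "$root/jdk")

# Upgrade: repoint the link. -f replaces the existing link, and -n keeps ln
# from following the old link into the directory it points to.
ln -sfn "$root/jdk1.6.0_24" "$root/jdk"
second=$(readlink "$root/jdk")

echo "${first##*/} -> ${second##*/}"
rm -rf "$root"
```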

Hadoop Install

% wget
% tar -zxvf hadoop-0.21.0.tar.gz
% sudo cp -r hadoop-0.21.0 /usr/local/hadoop-0.21.0
% sudo ln -s /usr/local/hadoop-0.21.0 /usr/local/hadoop
% export PATH=$PATH:/usr/local/hadoop/bin
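The export lines in these notes only affect the current shell, and repeating them keeps appending the same directory. One possible way to make them safe to re-run (for example from ~/.bashrc) is sketched below; path_append is a hypothetical helper, not part of Hadoop or the JDK.

```shell
# Append a directory to PATH only if it is not already present.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already on PATH, do nothing
    *) PATH="$PATH:$1" ;;
  esac
}

path_append /usr/local/jdk/bin
path_append /usr/local/hadoop/bin
path_append /usr/local/hadoop/bin   # second call is a no-op
export PATH
```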

Hadoop Setup

Set JAVA_HOME in /usr/local/hadoop/conf/hadoop-env.sh:

export JAVA_HOME=/usr/local/jdk

Standalone Mode

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
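Only the XML prologue of the configuration file appears here. Since the later steps format HDFS and start daemons, a single-node conf/core-site.xml would normally name the default filesystem. The sketch below writes such a file into a scratch directory. The property name fs.default.name and the address hdfs://localhost:9000 are assumptions based on the usual single-node setup, not values taken from this post.

```shell
# Write to a scratch directory; copy into /usr/local/hadoop/conf only after review.
conf_dir=$(mktemp -d)

# fs.default.name and hdfs://localhost:9000 are assumed values, not from this post.
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

cat "$conf_dir/core-site.xml"
```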

Format HDFS

% /usr/local/hadoop/bin/hadoop namenode -format

Start and Stop Hadoop

% /usr/local/hadoop/bin/
% /usr/local/hadoop/bin/

Start MapReduce

Copy access_log.txt from the local disk to HDFS

% hadoop fs -copyFromLocal access_log.txt /yamakk/log_input.txt
% hadoop fs -ls /yamakk

Sort URLs by number of accesses

% hadoop jar /usr/local/hadoop/hadoop-mapred-examples-0.21.0.jar grep /yamakk/log_input.txt /yamakk/log_out "GET (\\S+)" 1

Check the result in /yamakk/log_out/part-r-00000 on HDFS.
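To sanity-check the job's output on a small input, roughly the same counts can be computed locally with standard tools. The sketch below extracts the token after "GET ", counts each distinct value, and prints count and URL separated by a tab, highest count first, which is the layout the grep example is expected to produce. The three log lines are made-up sample data, and count_urls is a hypothetical helper.

```shell
# Build a tiny sample log (made-up data, for illustration only).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
127.0.0.1 - - [01/Jan/2011:00:00:01 +0900] "GET /index.html HTTP/1.1" 200 1024
127.0.0.1 - - [01/Jan/2011:00:00:02 +0900] "GET /about.html HTTP/1.1" 200 512
127.0.0.1 - - [01/Jan/2011:00:00:03 +0900] "GET /index.html HTTP/1.1" 200 1024
EOF

# Extract every "GET <path>" match, keep the path, count each distinct path,
# then sort by count in descending order and print "count<TAB>path".
count_urls() {
  grep -oE 'GET [^[:space:]]+' "$1" \
    | cut -d' ' -f2 \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{print $1 "\t" $2}'
}

result=$(count_urls "$sample_log")
printf '%s\n' "$result"
rm -f "$sample_log"
```

For a real comparison, run count_urls on the local access_log.txt and compare the top lines with the contents of part-r-00000.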

