Hadoop on cluster was setup using the instructions found in Cloudera Document.
Both dev1 and dev2 machines have
1. NameNode Directory is found at /etc/hadoop/hdfs/namenode
2. DataNode Directory is found at /etc/hadoop/hdfs/datanode
3. Configuration is found at /etc/hadoop/conf
dev2 machines can run multiple datanodes
1. DataNode2 and Datanode3 Directories are at /etc/hadoop/hdfs/datanode2 and /etc/hadoop/hdfs/datanode3 respectively.
2. Configurations are found at /etc/hadoop/conf2 and /etc/hadoop/conf3 respectively.
sudo service hadoop-0.20-jobtracker start"
sudo service hadoop-0.20-tasktracker start"
sudo service hadoop-0.20-namenode start"
sudo service hadoop-0.20-datanode start" and services for other 2 are hadoop-0.20-datanode2 and hadoop-0.20-datanode3
Install scribe-service-hdfs-orig package. This installs the scribe binary in location /usr/bin.
1. We have a wrapper script which sets the hadoop library class path present in /usr/bin/scribe.sh which should be run in background
2. Logs everything to /tmp/scribe.log.
3. The configuration is present in /etc/scribe.conf.
4. We also have scribe_ctrl.py script present in /usr/bin for stopping, status, reloading etc. for controlling scribe.
NOTE : PLEASE START SCRIBE AS "conduit" USER
Zookeeper is setup using the Cloudera Document. No changes are made from default configuration.
Start "sudo service hadoop-zookeeper-server start"
Stop "sudo service hadoop-zookeeper-server stop"
Conduit is setup using the generated deb package using dpkg command.
Use the default configuration files present in /etc/conduit/conf to /usr/local/conduit/conf.
All the configurations are attached here. Remove Symlink /usr/local/conduit and symlink to new /usr/local/conduit-newversion.
All the available config properties conduit.cfg.
Sample configuation is available at conduit.xml for configuring conduit.xml file.
Starting conduit using the following command
cd /var/log/conduit
. /usr/local/conduit/bin/conduit.sh start /usr/local/conduit/conf/conduit.cfg
For Stopping
/usr/local/conduit/bin/conduit.sh stop /usr/local/conduit/conf/conduit.cfg
Logs are written to the cwd where conduit is started in the above case /var/log/conduit
Click here for the mailing script
1. greps for any exception in logs if found gets the stacktrace from next few statements, caches all the necessary lines
2. checkpoints the file so that same exceptions are not sent again.
3. constructs email using email library in python and adds the caches lines and attaches conduit.log to the email to user@
To change the open file handle count to 10240, do the following