HBase Multiple Node Setup Guide



A not-so-quick guide to setting up HDFS-based HBase on multiple Ubuntu boxes


First, follow the awesome guide here to set up Hadoop on two Ubuntu nodes. Once that is done, follow the guide here to set up single-node HBase on both the master and the slave.

INFO: We will set up a separate ZooKeeper of our own instead of the one HBase provides, because it is always a good idea to keep components separated. ZooKeeper is a fail-fast process, so this also helps if you later want to run it under a process supervisor.

If you have already set up HBase on a single node as described in the previous guide, you will have an HQuorumPeer process running, which is the internal ZooKeeper managed by HBase. Let's get rid of it and use a new ZooKeeper, which you can download from here. With the tarball in /usr/local, use the following commands:

$ cd /usr/local
$ tar zxf zookeeper-x.x.x.tar.gz
$ mv zookeeper-x.x.x zookeeper
$ chown -R hduser:hduser zookeeper

Add an entry for ZooKeeper to your .bashrc file: export ZK_HOME=/usr/local/zookeeper, and add it to $PATH with export PATH=$PATH:$ZK_HOME/bin (keep any entries you already have on that line). To configure ZooKeeper, go to $ZK_HOME/conf. You may or may not find zoo.cfg there; if not, do a cp zoo_sample.cfg zoo.cfg. You don't really need to edit it, but you may want to change dataDir to something like dataDir=/app/zookeeper. This should be enough to run our ZooKeeper on port 2181 (the default).

Now let's get rid of the ZooKeeper that HBase starts. This is simple: in $HBASE_HOME/conf/hbase-env.sh set export HBASE_MANAGES_ZK=false
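The ZooKeeper and HBase config steps above can be sketched as two small bash functions. This is only a rough sketch: the function names configure_zookeeper and disable_hbase_zookeeper are made up for illustration, sed -i is used in its GNU form (as found on Ubuntu), and /app/zookeeper is just the example dataDir from this guide.

```shell
# Sketch only: the function names are hypothetical helpers, not part of
# ZooKeeper or HBase.

# Create zoo.cfg from the sample if it is missing, then set dataDir.
#   $1 = ZooKeeper home, $2 = data directory (defaults to /app/zookeeper)
configure_zookeeper() {
    local zk_home="$1" data_dir="${2:-/app/zookeeper}"
    local cfg="$zk_home/conf/zoo.cfg"
    [ -f "$cfg" ] || cp "$zk_home/conf/zoo_sample.cfg" "$cfg"
    if grep -q '^dataDir=' "$cfg"; then
        # Rewrite the existing dataDir line in place.
        sed -i "s|^dataDir=.*|dataDir=$data_dir|" "$cfg"
    else
        echo "dataDir=$data_dir" >> "$cfg"
    fi
}

# Tell HBase not to start its own ZooKeeper (the HQuorumPeer process).
#   $1 = HBase home
disable_hbase_zookeeper() {
    local env_file="$1/conf/hbase-env.sh"
    # Drop any existing setting (commented out or not), then append ours.
    sed -i '/HBASE_MANAGES_ZK=/d' "$env_file"
    echo 'export HBASE_MANAGES_ZK=false' >> "$env_file"
}
```

With the environment variables from .bashrc loaded, you would call configure_zookeeper "$ZK_HOME" and disable_hbase_zookeeper "$HBASE_HOME" on each machine where they apply. Make sure the data directory exists and is writable by the user that runs ZooKeeper.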

Notice the following analogy between HDFS and HBase.

HDFS                    HBase
NameNode                HMaster
DataNode                HRegionServer
SecondaryNameNode       None

You will notice that setting up a multi-node HBase cluster is very similar to setting up a Hadoop cluster. We define the HRegionServers (slaves) in the file $HBASE_HOME/conf/regionservers, and this has to be done on the HMaster (master) node. On every slave, hbase-site.xml just needs to refer correctly to the master machine's IP address.

Ideally, this should be enough to get your HBase cluster running. Here is a quick review for the two-node setup.
On the master machine: the regionservers file should look like this:

master
slave

On the slave machine: if you followed the previous tutorial, just change the machine name to slave.
Then add the following two properties to hbase-site.xml on both machines; they tell HBase about the ZooKeeper we just set up.

<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>
        Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.
    </description>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>master</value>
    <description>
        Comma separated list of servers in the ZooKeeper Quorum.
    </description>
</property>
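Before starting HBase it can help to confirm that the quorum host and port written in hbase-site.xml are actually reachable. Below is a minimal bash sketch of that check. The helper names site_value and check_quorum are hypothetical, the XML lookup assumes the one-tag-per-line layout shown above (it is not a real XML parser), and nc (netcat) is assumed to be installed.

```shell
# Sketch only: site_value and check_quorum are illustrative helper names.

# Print the <value> that follows <name>$2</name> in the config file $1.
# Assumes <name> and <value> sit on their own lines, as in this guide.
site_value() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            gsub(/.*<value>|<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Try a TCP connection to every quorum host on the configured client port.
#   $1 = path to hbase-site.xml
check_quorum() {
    local site="$1" port host
    port=$(site_value "$site" hbase.zookeeper.property.clientPort)
    port="${port:-2181}"   # fall back to the ZooKeeper default
    for host in $(site_value "$site" hbase.zookeeper.quorum | tr ',' ' '); do
        if nc -z -w 2 "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
        fi
    done
}
```

Run check_quorum "$HBASE_HOME/conf/hbase-site.xml" on both machines once ZooKeeper has been started. A "NOT reachable" line on the slave usually points at a name resolution or firewall problem rather than at HBase itself.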

That's it! Now give it a spin. First make sure nothing is running by checking jps, or stop everything using stop-all.sh. Then run the following commands on the master machine, which should start everything on the slave as well.

$ start-dfs.sh     #starts the HDFS
$ start-mapred.sh     #starts the mapred
$ zkServer.sh start     #starts our own zookeeper
$ start-hbase.sh     #starts hbase cluster
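If you start the cluster often, the four commands above can be wrapped so that a failure stops the sequence instead of starting HBase on top of a broken HDFS or ZooKeeper. This is a small bash sketch, not part of Hadoop or HBase: start_cluster and the DRY_RUN variable are names invented here.

```shell
# Sketch only: start_cluster and DRY_RUN are illustrative names.

# Run the start commands in the order used in this guide:
# HDFS, MapReduce, our own ZooKeeper, then HBase.
# With DRY_RUN=1 the commands are printed instead of executed.
start_cluster() {
    local cmd
    for cmd in "start-dfs.sh" "start-mapred.sh" "zkServer.sh start" "start-hbase.sh"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # $cmd is left unquoted on purpose so "zkServer.sh start"
            # splits into a command and its argument.
            $cmd || { echo "failed: $cmd" >&2; return 1; }
        fi
    done
}
```

Try DRY_RUN=1 start_cluster first to see what would run, then start_cluster on the master.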

Test the setup using jps. On the master:

$ jps
23143   Jps
22985   HRegionServer
22817   HMaster
22767   QuorumPeerMain
5750   SecondaryNameNode
5399   NameNode
5838   JobTracker
5567   DataNode
6006   TaskTracker


On Slave:

$ jps
5613   Jps
5797   HRegionServer
3243   DataNode
8274   TaskTracker
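Comparing jps listings by eye gets tedious, so here is a small bash sketch that reports which expected daemons are missing. missing_daemons is a made-up helper name; it only looks at the second column of whatever jps prints.

```shell
# Sketch only: missing_daemons is an illustrative helper name.

# Read jps output on stdin and print every expected daemon name ($@)
# that is not in it. Prints nothing when all of them are running.
missing_daemons() {
    local running name
    running=$(awk '{ print $2 }')   # second column of jps is the class name
    for name in "$@"; do
        echo "$running" | grep -qxF "$name" || echo "$name"
    done
}
```

On the master you might run jps | missing_daemons HMaster HRegionServer NameNode DataNode JobTracker TaskTracker, and on the slave jps | missing_daemons HRegionServer DataNode TaskTracker. Add the ZooKeeper process name exactly as it appears in your own jps output.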

Thanks for reading! Hope it helps. Leave comments for any issues.

CAVEAT: One of the most common problems with HBase comes from the addresses that machines register with ZooKeeper. Make sure master and slave resolve to network addresses that the other machines can reach, and you will save yourself a lot of frustration :)
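A quick way to catch this is to check what each name resolves to on each machine. The bash sketch below flags names that resolve to a loopback address, which other machines cannot reach. On Ubuntu this often happens when /etc/hosts maps the machine's own hostname to 127.0.1.1. The helper names is_loopback and check_hosts are invented for this example, and getent is assumed to be available.

```shell
# Sketch only: is_loopback and check_hosts are illustrative helper names.

# Succeed when $1 is an IPv4 loopback address (127.x.x.x) or IPv6 ::1.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Resolve each hostname with the local resolver and flag loopback results.
check_hosts() {
    local host addr
    for host in "$@"; do
        addr=$(getent hosts "$host" | awk '{ print $1; exit }')
        if [ -z "$addr" ]; then
            echo "$host: does not resolve"
        elif is_loopback "$addr"; then
            echo "$host: resolves to loopback $addr"
        else
            echo "$host: $addr"
        fi
    done
}
```

Run check_hosts master slave on both machines. Every line should show the same non-loopback address for a given name on both boxes.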


