A not-so-quick guide to setting up HDFS-based HBase on multiple Ubuntu boxes
First, let us follow the awesome guide here to set up Hadoop on two Ubuntu nodes. Once done with that, follow the guide here to set up single-mode HBase on both the master and the slave.
INFO:
We will be setting up a separate ZooKeeper of our own instead of the one
which HBase provides, because it's always a good idea to keep components
separated. It may also be useful later if you keep ZooKeeper
under supervision, as it's a fail-fast process.
If you have already set up HBase on a single node as described in the previous guide, you should have an HQuorumPeer process running, which is the internal ZooKeeper provided by HBase. Let's get rid of it and use a new ZooKeeper by downloading it from here. Use the following commands:
$ cd /usr/local
$ tar zxf zookeeper-x.x.x.tar.gz
$ mv zookeeper-x.x.x zookeeper
$ chown -R hduser:hduser zookeeper
Set up an entry in your .bashrc file for ZooKeeper: export ZK_HOME=/usr/local/zookeeper, and add it to $PATH with export PATH=$PATH:[your old entries]:$ZK_HOME/bin. To configure ZooKeeper, go to $ZK_HOME/conf. You may or may not find zoo.cfg there; if not, do a cp zoo_sample.cfg zoo.cfg. You don't really need to edit it, but you may want to change dataDir to something like dataDir=/app/zookeeper. This should be enough to run our ZooKeeper on port 2181 (the default). Now let's get rid of the one which HBase starts. This is simple: in $HBASE_HOME/conf/hbase-env.sh, set export HBASE_MANAGES_ZK=false.
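The zoo.cfg step can be sketched as a small shell function. This is only a rough sketch: the function name prepare_zoo_cfg is made up for illustration, and it assumes the conf directory contains the zoo_sample.cfg file that ships with ZooKeeper.

```shell
# Hypothetical helper (not part of ZooKeeper): create zoo.cfg from the
# sample if it is missing, then point dataDir at the given directory.
prepare_zoo_cfg() {
    conf_dir="$1"   # e.g. $ZK_HOME/conf
    data_dir="$2"   # e.g. /app/zookeeper

    # Copy the sample only if zoo.cfg does not exist yet.
    if [ ! -f "$conf_dir/zoo.cfg" ]; then
        cp "$conf_dir/zoo_sample.cfg" "$conf_dir/zoo.cfg" || return 1
    fi

    # Drop any existing dataDir line and append the new one,
    # leaving every other setting (clientPort etc.) untouched.
    tmp_file="$conf_dir/zoo.cfg.tmp"
    sed '/^dataDir=/d' "$conf_dir/zoo.cfg" > "$tmp_file" || return 1
    echo "dataDir=$data_dir" >> "$tmp_file"
    mv "$tmp_file" "$conf_dir/zoo.cfg"
}
```

With the paths used in this guide, the call would be prepare_zoo_cfg "$ZK_HOME/conf" /app/zookeeper. The data directory itself still has to be writable by the user that runs ZooKeeper.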
Notice the following analogy between HDFS and HBase:
HDFS               | HBase
NameNode           | HMaster
DataNode           | HRegionServer
SecondaryNameNode  | None
You will notice that setting up a multi-node HBase cluster is very similar to setting up a Hadoop cluster. We define the HRegionServers (slaves) in the file $HBASE_HOME/conf/regionservers, and this has to be done on the HMaster (master) node. On every slave, hbase-site.xml just needs to refer to the master machine's address correctly.
Ideally this should be enough to get your HBase cluster running. Here is a quick review for the two-node setup:
On Master Machine: The regionservers file should look like
master
slave
On Slave Machine: If you followed the previous tutorial, just change the machine name to slave.
Then add the following two properties to hbase-site.xml on both machines. They tell HBase the details of the ZooKeeper we just set up.
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>
Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
<description>
Comma separated list of servers in the ZooKeeper Quorum.
</description>
</property>
That's it! Now give it a spin. First make sure nothing is running by checking jps, or stop everything using stop-all.sh. Then run the following commands on the master machine, which should start everything on the slave as well.
$ start-dfs.sh #starts the HDFS
$ start-mapred.sh #starts the mapred
$ zkServer.sh start #starts our own zookeeper
$ start-hbase.sh #starts hbase cluster
Test the setup using jps. On Master:
$ jps
23143 Jps
22985 HRegionServer
22817 HMaster
22767 QuorumPeer
5750 SecondaryNameNode
5399 NameNode
5838 JobTracker
5567 DataNode
6006 TaskTracker
On Slave:
$ jps
5613 Jps
5797 HRegionServer
3243 DataNode
8274 TaskTracker
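If you would rather not eyeball the jps listing every time, the check can be scripted. Here is a minimal sketch, assuming jps prints one "pid name" pair per line as in the listings above. The function name missing_daemons is made up for illustration.

```shell
# Hypothetical helper: print the expected daemons that are NOT present
# in a jps listing. $1 is the text printed by jps, the remaining
# arguments are the process names to look for.
missing_daemons() {
    jps_output="$1"
    shift
    for name in "$@"; do
        # jps prints "<pid> <name>", so compare against the whole second field.
        if ! printf '%s\n' "$jps_output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$name"
        fi
    done
}
```

On the slave you could run missing_daemons "$(jps)" HRegionServer DataNode TaskTracker, and an empty result means all three daemons are up. Pass the process names exactly as your own jps prints them.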
Thanks for reading! Hope it helps. Leave comments for any issues.
CAVEAT: One of the most common problems with HBase comes from the address
that machines register with ZooKeeper. Make sure master and slave
resolve to network addresses which the other machines can reach, and you
will save yourself a lot of frustration :)
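A quick way to check this on each machine is to see what the host names resolve to. On Ubuntu, /etc/hosts often maps the machine's own host name to 127.0.1.1, which other machines cannot reach. The sketch below assumes getent is available (it is on a stock Ubuntu install). The function names is_loopback and check_hosts are made up for illustration.

```shell
# Hypothetical helper: succeed if the address is a loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: resolve each host name and report the ones that
# do not resolve or that resolve to a loopback address.
check_hosts() {
    status=0
    for host in "$@"; do
        # getent prints "<address> <name>..."; keep the first address only.
        addr=$(getent hosts "$host" | awk '{ print $1; exit }')
        if [ -z "$addr" ]; then
            echo "$host: does not resolve"
            status=1
        elif is_loopback "$addr"; then
            echo "$host: resolves to loopback address $addr"
            status=1
        else
            echo "$host: $addr"
        fi
    done
    return $status
}
```

Run check_hosts master slave on both machines. Any line that mentions a loopback address points at an /etc/hosts entry worth fixing.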