Hadoop – HBase Cluster with Docker on AWS


1. Problems of Hadoop 1
– data streaming
– map and reduce processes are separated
– the JobTracker manages all jobs alone (too busy)
  . cannot manage resources (CPU, memory) effectively
– SPOF weakness (if the NameNode dies, the whole system dies)

2. Solutions in Hadoop 2
– JobTracker is too busy => YARN
– NameNode availability => ZooKeeper

3. HBase – Thrift – HappyBase
– HDFS and MapReduce are batch services, not real-time services
– HBase supports real-time big data services on top of Hadoop
– Thrift and HappyBase let you use HBase from other systems (like an RDB)

4. Docker
– Docker is a container-based system that needs no guest OS
– It lets you use about 95% of the original H/W capacity
– It supports docker build and images, which makes installing Hadoop systems faster

5. Problems to solve
– AWS : all instances need to use the same security group
  the ports used for communication need to be opened (e.g. 22, 50070)
– Docker : the container needs to use SSH port 22
  Docker on CentOS allows only 10G of storage per container

6. AWS setting
[Link : using EC2 service]

– create a security group
– add every instance to the same security group
– add inbound rules
– open all ICMP rules

Ports to open : 50070, 6060, 6061, 8032, 50090, 50010, 50075, 50020, 50030, 50060, 9090, 9091, 22
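
The inbound rules can also be added from the command line instead of the console. The sketch below only prints the AWS CLI commands so you can review them first. The security group ID is a hypothetical placeholder, and restricting traffic to members of the same group (--source-group) is my assumption about how the nodes should reach each other. SSH from your own workstation would still need a separate rule with --cidr.

```shell
#!/bin/sh
# Hypothetical security group ID; replace it with your own.
SG_ID="sg-0123456789abcdef0"
# Port list copied from the inbound rules above.
PORTS="50070 6060 6061 8032 50090 50010 50075 50020 50030 50060 9090 9091 22"

# Print one authorize-security-group-ingress command per port (dry run only).
print_ingress_commands() {
    for port in $PORTS; do
        echo "aws ec2 authorize-security-group-ingress" \
             "--group-id $SG_ID --protocol tcp --port $port" \
             "--source-group $SG_ID"
    done
}

print_ingress_commands
```

Once the printed commands look right, pipe the output to sh to apply them.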

– change the SSH port of the AWS host
Change the host's SSH port from 22 to something else so that the Docker container can use port 22 (with -p 22:22). After this change you have to specify the new port every time you access the AWS host with ssh (for example ssh -p 3022), and the new port has to be open in the security group.

sudo vi /etc/ssh/sshd_config
----------------------------------------
# find the "Port 22" line (it may be commented out) and change it
Port 3022
----------------------------------------
sudo service sshd restart
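
If you prefer to script the edit instead of using vi, something like the following should work. It is a minimal sketch that assumes GNU sed and a stock sshd_config with either a "#Port 22" or a "Port 22" line. set_ssh_port is a hypothetical helper name, and the demo runs on a throwaway file rather than the live config.

```shell
#!/bin/sh
# set_ssh_port FILE PORT
# Rewrite the Port line (commented out or not) in an sshd_config-style file.
# Assumes GNU sed, where -i takes no suffix argument.
set_ssh_port() {
    sed -i -E "s/^#?[[:space:]]*Port[[:space:]]+[0-9]+.*/Port $2/" "$1"
}

# Demonstrate on a temporary file, not on /etc/ssh/sshd_config.
demo=$(mktemp)
printf '#Port 22\nPermitRootLogin no\n' > "$demo"
set_ssh_port "$demo" 3022
cat "$demo"
rm -f "$demo"
```

On the real host, run the same sed expression with sudo against /etc/ssh/sshd_config, restart sshd, and reconnect with ssh -p 3022. Keep the current session open until the new port is confirmed to work.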

7. Docker setting
[Link : install Docker]

[Link : use Docker]

(1) Docker Build (option 1)
– download : [dockerfiles]
– unzip : unzip Dockerfiles
– rename : cp Dockerfiles-Hbase Dockerfile
– copy conf : cp -r hadoop-2.7.2/* .
– build : docker build --tag=tmddno1/datastore:v1 ./

(2) Download Docker Image (option 2)

docker pull tmddno1/datastore:v1

 (3) Create Daemon Container

docker run --net=host -d <imageid>

(4) Exec Container with bash

docker exec -it <container id> bash

 (5) change sshd_config 

vi /etc/ssh/sshd_config
-----------------------------------
# change the settings below

PasswordAuthentication yes
PermitRootLogin yes
----------------------------------

/etc/init.d/ssh restart

 8. SSH Setting

[AWS pem file share]

– download WinSCP [Download]
– upload your AWS pem file to the master node using WinSCP

[add the AWS pem file to ssh-agent]

eval "$(ssh-agent -s)"
chmod 644 authorized_keys
chmod 400 <pem_keyname>
ssh-add <pem_keyname>

[ /etc/hosts]

192.168.1.109 hadoop-master 
192.168.1.145 hadoop-slave-1
192.168.56.1 hadoop-slave-2

[ssh – rsa key share]

$ ssh-keygen -t rsa 
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1 
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2 
$ chmod 0600 ~/.ssh/authorized_keys
$ exit

9. Run Hadoop

[Slaves]

vi /hadoop-2.7.2/etc/hadoop/slaves
---------------------------------------
# list the worker hosts defined in /etc/hosts
hadoop-slave-1
hadoop-slave-2
---------------------------------------
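
A name in the slaves file that is missing from /etc/hosts is an easy mistake to make, so a quick check can help. The following is a minimal sketch. missing_hosts is a hypothetical helper, and it assumes the plain "IP name [alias...]" layout of the hosts file shown above.

```shell
#!/bin/sh
# missing_hosts HOSTS_FILE SLAVES_FILE
# Print every name in SLAVES_FILE that has no entry in HOSTS_FILE.
# Blank lines and lines starting with '#' in SLAVES_FILE are skipped.
missing_hosts() {
    hosts_file=$1
    slaves_file=$2
    while read -r name rest || [ -n "$name" ]; do
        case $name in ''|'#'*) continue ;; esac
        # Look for the name as a whole field after the IP column.
        awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$hosts_file" || echo "$name"
    done < "$slaves_file"
}
```

On the master node you would call it as missing_hosts /etc/hosts /hadoop-2.7.2/etc/hadoop/slaves. If it prints nothing, every worker name has a hosts entry.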

[modify core-site.xml]

<configuration>
  <property>
   <name>fs.defaultFS</name>
   <value>hdfs://hadoop-master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/tmp</value>
  </property>
</configuration>

 [start-all.sh]

hdfs namenode -format    # first run only; formatting erases existing HDFS metadata
start-all.sh             # = start-dfs.sh + start-yarn.sh
stop-all.sh              # = stop-dfs.sh + stop-yarn.sh

10. ZooKeeper

[zookeeper/conf/zoo.cfg]

server.1=master:2888:3888
server.2=slave2:2888:3888

[create the myid file in the ZooKeeper dataDir]

cd /root/zookeeper
echo 1 > myid    # on the master
echo 2 > myid    # on slave 2

[start ZooKeeper on every server]

/zookeeper/bin/zkServer.sh start

11. HBase Setting

[hbase-env.sh]

export HBASE_MANAGES_ZK=false

[hbase-site.xml]

<configuration>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
</property>
<property>
  <name>hbase.master</name>
  <value>master:6000</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/root/zookeeper</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave2,slave3</value>
</property>
</configuration>

[regionservers]

# hbase/conf/regionservers

master
slave1
slave2
slave3

[start hbase]

start-hbase.sh

12. HBase Thrift

– start HBase : start-hbase.sh
– start Thrift : hbase thrift start -p <port> --infoport <port>

13. Check that the web UIs are running correctly

YARN : http://localhost:8088
Hadoop : http://localhost:50070
HBase : http://localhost:9095
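
Instead of opening each page in a browser, the same check can be scripted. This is a minimal sketch that assumes curl is installed. ui_urls and check_uis are hypothetical helper names, and the ports are simply the ones listed above.

```shell
#!/bin/sh
# ui_urls HOST: print "name url" pairs for the web UIs listed above.
ui_urls() {
    printf 'YARN http://%s:8088\n' "$1"
    printf 'Hadoop http://%s:50070\n' "$1"
    printf 'HBase http://%s:9095\n' "$1"
}

# check_uis HOST: report OK or DOWN for each UI. Requires curl.
check_uis() {
    ui_urls "$1" | while read -r name url; do
        # -f makes curl fail on HTTP errors; --max-time avoids hanging.
        if curl -fs -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            echo "$name OK"
        else
            echo "$name DOWN"
        fi
    done
}
```

Run check_uis localhost on the master, or pass a hostname such as hadoop-master to check from another machine. The ports must be reachable from wherever you run it.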

14. HBase Shell Test

hbase shell

15. Install HappyBase (on the client server)
– I will explain HappyBase in a later post

sudo yum install gcc
pip install happybase

 
