Hadoop – Hbase Cluster with Docker on AWS
1. Problems of Hadoop 1
– batch-oriented data streaming only
– the map process and the reduce process are separated into fixed slots
– the JobTracker manages all jobs alone (too busy)
  . it cannot manage resources (CPU, memory) effectively
– SPOF weakness (if the NameNode dies, the whole system dies)
2. Solutions of Hadoop 2
– the JobTracker is too busy => YARN
– NameNode availability => ZooKeeper
3. Hbase – Thrift – Happybase
– HDFS & MapReduce are not real-time services; they are batch services
– HBase supports real-time big data services on top of Hadoop
– Thrift and Happybase let you use it from other systems (like an RDB)
4. Docker
– Docker is a container-based system that needs no guest OS
– it allows containers to use about 95% of the original H/W capacity
– it supports docker build & images, which makes installing the Hadoop stack faster
5. Problems to solve
– AWS : the nodes need to use the same security group
        ports must be opened so they can communicate (ex : 22, 50070, etc.)
– Docker : the containers need to use ssh port 22
           Docker on CentOS allows a container only 10G of storage by default
6. AWS setting
[Link : using EC2 service]
– make a security group
– add the instances with the same security group
– add inbound rules
– open all ICMP rules
– open TCP ports : 22, 50070, 6060, 6061, 8032, 50090, 50010, 50075, 50020, 50030, 50060, 9090, 9091
– change the ssh port of AWS
  Change the AWS ssh port from 22 to something else so that Docker can use port 22 with -p 22:22.
  After this change, you must specify the new port every time you access AWS with ssh.
vi /etc/ssh/sshd_config
----------------------------------------
# find "Port 22" and change it to : Port 3022
----------------------------------------
sudo service sshd restart
# from now on, connect with : ssh -p 3022 -i <pem_keyname> <user>@<host>
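The security-group steps above can also be scripted with the AWS CLI. A minimal sketch, assuming the AWS CLI is installed and configured; the group name `hadoop-cluster` is a hypothetical placeholder, and `DRY_RUN` only prints the commands (set it to empty to actually run them):

```shell
#!/bin/sh
# Hypothetical security group name -- replace with your own.
SG_NAME="hadoop-cluster"
# DRY_RUN="echo" prints each command instead of executing it.
DRY_RUN="echo"

# TCP ports used by the Hadoop / HBase / Thrift daemons plus ssh (list above).
PORTS="22 50070 6060 6061 8032 50090 50010 50075 50020 50030 50060 9090 9091"

for PORT in $PORTS; do
    # 0.0.0.0/0 opens the port publicly; restrict the CIDR for real clusters.
    $DRY_RUN aws ec2 authorize-security-group-ingress \
        --group-name "$SG_NAME" --protocol tcp --port "$PORT" --cidr 0.0.0.0/0
done

# allow all ICMP so the nodes can ping each other
$DRY_RUN aws ec2 authorize-security-group-ingress \
    --group-name "$SG_NAME" --protocol icmp --port -1 --cidr 0.0.0.0/0
```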
7. Docker setting
[Link : install Docker]
[Link : use Docker]
(1) Docker Build (option 1)
– downloads : [dockerfiles]
– unzip : unzip Dockerfiles
– change name : cp Dockerfiles-Hbase Dockerfile
– copy conf : cp -r hadoop-2.7.2/* .
– build : # docker build --tag=tmddno1/datastore:v1 ./
(2) Download Docker Image (option 2)
docker pull tmddno1/datastore:v1
(3) Create Daemon Container
docker run --net=host -d <imageid>
(4) Exec Container with bash
docker exec -it <container id> bash
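Steps (3) and (4) can be chained by capturing the container id that `docker run -d` prints on stdout; a small sketch (the image tag follows the pull step above, and the block skips itself if docker is absent):

```shell
#!/bin/sh
# Image tag from the docker pull / docker build steps above.
IMAGE="tmddno1/datastore:v1"

if command -v docker >/dev/null 2>&1; then
    # docker run -d prints the new container id on stdout
    CID=$(docker run --net=host -d "$IMAGE")
    # interactive shell inside the running container
    docker exec -it "$CID" bash
else
    echo "docker not installed; would run: docker run --net=host -d $IMAGE"
fi
```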
(5) change sshd_config
vi /etc/ssh/sshd_config
-----------------------------------
# change the lines below
PasswordAuthentication yes
PermitRootLogin yes
-----------------------------------
/etc/init.d/ssh restart
8. SSH Setting
[AWS pem file share]
– WinSCP Download [Download]
– upload your aws pem file on master node with using WinSCP
[AWS pem file add on ssh-agent]
eval "$(ssh-agent -s)"
chmod 644 authorized_keys
chmod 400 <pem_keyname>
ssh-add <pem_keyname>
[ /etc/hosts]
192.168.1.109   hadoop-master
192.168.1.145   hadoop-slave-1
192.168.56.1    hadoop-slave-2
[ssh – rsa key share]
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
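A quick way to confirm the key exchange worked is a non-interactive ssh to each host from /etc/hosts; `BatchMode` makes ssh fail instead of prompting for a password when key auth is broken:

```shell
#!/bin/sh
# Host names follow the /etc/hosts block above.
for HOST in hadoop-master hadoop-slave-1 hadoop-slave-2; do
    # BatchMode=yes: fail instead of asking for a password
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$HOST" hostname >/dev/null 2>&1; then
        echo "$HOST: ok"
    else
        echo "$HOST: ssh key auth FAILED"
    fi
done
```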
9. Run Hadoop
[Slaves]
vi /hadoop-2.7.2/etc/hadoop/slaves
---------------------------------------
# set hosts defined in /etc/hosts
hadoop-slave-1
hadoop-slave-2
[modify core-site.xml]
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/tmp</value>
  </property>
</configuration>
[start-all.sh]
hadoop namenode -format
start-all.sh   (start-dfs.sh + start-yarn.sh)
stop-all.sh    (stop-dfs.sh + stop-yarn.sh)
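After start-all.sh you can sanity-check the cluster with `jps`; a small sketch of what to expect (the daemon lists are the usual Hadoop 2 defaults, and the block only prints them if `jps` is not on the PATH):

```shell
#!/bin/sh
# Typical daemons per role after start-all.sh on a Hadoop 2 cluster.
EXPECTED_MASTER="NameNode SecondaryNameNode ResourceManager"
EXPECTED_SLAVE="DataNode NodeManager"

if command -v jps >/dev/null 2>&1; then
    # lists the running JVM processes on this node
    jps
else
    echo "jps not on PATH; expected on master : $EXPECTED_MASTER"
    echo "                expected on slaves : $EXPECTED_SLAVE"
fi
```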
10. ZooKeeper
[zookeeper/conf/zoo.cfg]
dataDir=/root/zookeeper
server.1=master:2888:3888
server.2=slave2:2888:3888
[make the dummy myid file]
echo 1 > /root/zookeeper/myid   <== on the master
echo 2 > /root/zookeeper/myid   <== on slave 2
[start zookeeper on every server]
/zookeeper/bin/zkServer.sh start
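Once started on every server, `zkServer.sh status` shows whether the ensemble actually formed: one node should report leader and the others follower. A small check, assuming the install path used above:

```shell
#!/bin/sh
# Path matches the start command above.
ZK_BIN=/zookeeper/bin/zkServer.sh

if [ -x "$ZK_BIN" ]; then
    # prints "Mode: leader" or "Mode: follower" on a healthy ensemble
    "$ZK_BIN" status
else
    echo "zookeeper not found at $ZK_BIN"
fi
```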
11. Hbase Setting
[hbase-env.sh]
export HBASE_MANAGES_ZK=false
[hbase-site.xml]
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master:6000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/root/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave2,slave3</value>
  </property>
</configuration>
[regionservers]
# hbase/conf/regionservers
master
slave1
slave2
slave3
[start hbase]
start-hbase.sh
12. Hbase Thrift
- hbase start : start-hbase.sh
- thrift start : hbase thrift start -p <port> --infoport <port>
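A concrete example using ports 9090 and 9091, which were opened in the security group earlier; the block only prints the command if `hbase` is not on the PATH:

```shell
#!/bin/sh
# Thrift service port / info UI port -- matching the security-group ports above.
THRIFT_PORT=9090
INFO_PORT=9091

if command -v hbase >/dev/null 2>&1; then
    # start the thrift gateway in the background, then check it is listening
    hbase thrift start -p "$THRIFT_PORT" --infoport "$INFO_PORT" &
    sleep 5
    netstat -tln | grep ":$THRIFT_PORT"
else
    echo "hbase not on PATH; would run: hbase thrift start -p $THRIFT_PORT --infoport $INFO_PORT"
fi
```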
13. Check Site server running correctly
Yarn   : http://localhost:8088
Hadoop : http://localhost:50070
Hbase  : http://localhost:9095
14. Hbase Shell Test
hbase shell
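The shell can also be scripted for a quick smoke test by piping commands into it; the table and column family names below are made up for this test:

```shell
#!/bin/sh
# Hypothetical table name, created and dropped again by the test.
TABLE=smoke_test

if command -v hbase >/dev/null 2>&1; then
    # heredoc feeds the commands to the shell non-interactively
    hbase shell <<EOF
status
create '$TABLE', 'cf'
put '$TABLE', 'row1', 'cf:greeting', 'hello'
scan '$TABLE'
disable '$TABLE'
drop '$TABLE'
EOF
else
    echo "hbase not on PATH; skipping shell smoke test"
fi
```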
15. Install happybase (on the client server)
– I will explain happybase later
sudo yum install gcc
pip install happybase