Category: Hadoop

Hadoop – Hbase Cluster with Docker on AWS

1. Problems of Hadoop1
– data streaming
– the map process and the reduce process are separated
– the job tracker manages all jobs alone (too busy), so it cannot manage resources (CPU, memory) effectively
– SPOF weakness (if the name node dies, the whole system dies)
2. Solution of Hadoop2
– job tracker is […]


Hadoop MapReduce – word count (improve)

About MapReduce Code
1. Ordering with MapReduce
(A) Binary Search
We are going to write a MapReduce program that returns the top N keywords, ranked by number of occurrences. A convenient sorting class called PriorityQueue is available, and by calling peek you can get the keyword on the […]
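The top-N idea in this excerpt can be sketched in plain Java, outside of any Hadoop job. This is only an illustration under stated assumptions: it uses `java.util.PriorityQueue` (which provides `peek`), and the class name `TopKeywords`, the method `topN`, and the sample counts are hypothetical and not taken from the original post, whose code is truncated here. The sketch keeps a min-heap of at most N entries, so `peek` always shows the weakest current candidate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not the post's code: picks the n most frequent keywords.
class TopKeywords {

    // Returns the n keywords with the highest counts, most frequent first.
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap ordered by count: peek() is the smallest of the kept entries.
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (heap.size() < n) {
                heap.add(entry);
            } else if (heap.peek().getValue() < entry.getValue()) {
                // The new keyword beats the weakest kept one, so replace it.
                heap.poll();
                heap.add(entry);
            }
        }
        // Draining the heap yields ascending counts; reverse for a ranking.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        // Sample word counts (made up for illustration).
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hadoop", 5);
        counts.put("map", 3);
        counts.put("reduce", 2);
        counts.put("docker", 1);
        System.out.println(topN(counts, 2)); // prints [hadoop, map]
    }
}
```

In a real job, logic like this would typically sit in a reducer that has already received the per-keyword totals. Keeping only N entries in the heap means memory use stays bounded no matter how many distinct keywords there are.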


Hadoop Map Reduce – word count

Build & Run Example Code
1. Install Maven – install the Maven build tool with apt-get
sudo apt-get install maven
2. Get the test source code with wget
wget https://s3.amazonaws.com/hadoopkr/source.tar.gz
3. Build the source with mvn
cd /home/<user>/source/<where pom.xml>
mvn compile
5. Upload a local file to Hadoop
hadoop fs -copyFromLocal README.txt /
6. Execute on Hadoop […]


Install Hadoop on Docker

Get Ubuntu Docker
– docker pull ubuntu
Start Container
docker run -i -p 22 -p 8000:80 -v /data:/data -t <ubuntu> /bin/bash
Install JDK
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-7-jre
.bashrc
export JAVA_HOME=/usr/lib/jvm/…
export CLASSPATH=$JAVA_HOME/lib/*:.
export PATH=$PATH:$JAVA_HOME/bin
Hadoop 1.2.1 install
Download Hadoop and unpack
root@4aa2cda88fcc:/home/kim# wget http://apache.mirror.cdnetworks.com/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
root@4aa2cda88fcc:/home/kim# mv ./hadoop-1.2.1.tar.gz /home/user
root@4aa2cda88fcc:/home/kim# […]
