Installing Spark and R on Ubuntu

A. Installing Spark
(1) Go to http://spark.apache.org/downloads.html
(2) Unless you already have a Hadoop cluster in place, choose the package pre-built for Hadoop
(3) Download Spark
(4) Extract the archive: tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz (see the sketch below)
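As a concrete sketch of steps (1)-(4), assuming the Spark 1.6.1 package pre-built for Hadoop 2.6 and the /home/kim/spark directory used later in this guide (the Apache archive URL is just one download option):

$ mkdir -p /home/kim/spark && cd /home/kim/spark
$ wget http://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
$ tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz
$ ls spark-1.6.1-bin-hadoop2.6     # bin/, sbin/, R/, examples/, ...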

B. Running Spark

[Command mode]
(1) spark-1.6.1-bin-hadoop2.6/bin$ ./pyspark
(2) Spark monitoring: the startup log prints the Spark UI address
16/06/01 22:03:46 INFO SparkUI: Started SparkUI at http://192.168.0.3:4040
[Screenshot: Spark UI at http://192.168.0.3:4040]
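A quick sanity check from another terminal (using the UI address printed in your own log; 4040 is the default driver UI port):

$ curl -s http://192.168.0.3:4040 | head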

[Master Node]
spark-1.6.1-bin-hadoop2.6/sbin$ ./start-master.sh

Note: the master web UI shown below listens on port 8080 by default (the port is incremented by 1 if 8080 is already in use).

[Screenshot: Spark master web UI on port 8080]
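After starting the master, the web UI can also be checked from the shell (assuming the default port and that curl is installed):

$ curl -s http://localhost:8080 | head     # the master UI; it binds to 8081 if 8080 was taken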

[Slave Node]
root@kim:/home/kim/spark/spark-1.6.1-bin-hadoop2.6/bin# ./spark-class org.apache.spark.deploy.worker.Worker spark://kim:7077

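Equivalently, the helper script shipped in sbin/ can start the worker instead of invoking the Worker class by hand (a sketch assuming the same install path; the argument is the master URL shown on the master web UI):

$ cd /home/kim/spark/spark-1.6.1-bin-hadoop2.6
$ ./sbin/start-slave.sh spark://kim:7077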

[Verify on the master node that the slave has registered]

[Screenshot: master web UI listing the registered worker]
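To check from a terminal instead of the browser, one option (an extra convenience check, not part of the original steps) is to grep the master UI page for registered workers:

$ curl -s http://kim:8080/ | grep -i worker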

C. Installing R
(1) Set a root password: sudo passwd root
(2) Switch to the superuser: su
(3) See https://www.rstudio.com/products/rstudio/download-server-2/ for reference
$ sudo apt-get install r-base
$ sudo apt-get install gdebi-core
$ wget https://download2.rstudio.org/rstudio-server-0.99.902-amd64.deb
$ sudo gdebi rstudio-server-0.99.902-amd64.deb
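
Before opening the browser, it can help to confirm that the service came up; rstudio-server ships these management commands:

$ sudo rstudio-server verify-installation
$ sudo rstudio-server status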

D. Running R
(1) Open http://<server IP>:8787 in a browser
(2) Log in with a Linux user account

E. Running R against a Spark Cluster

(1) Set SPARK_HOME

root@kim:/home/kim/spark/spark-1.6.1-bin-hadoop2.6# export SPARK_HOME=/home/kim/spark/spark-1.6.1-bin-hadoop2.6
root@kim:/home/kim/spark/spark-1.6.1-bin-hadoop2.6# echo "$SPARK_HOME"
/home/kim/spark/spark-1.6.1-bin-hadoop2.6
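
The export above only lasts for the current shell. To make it persistent for interactive shells, append it to ~/.bashrc; note that RStudio Server sessions typically do not read ~/.bashrc, which is why the R code in the next step also sets SPARK_HOME via Sys.setenv as a fallback.

$ echo 'export SPARK_HOME=/home/kim/spark/spark-1.6.1-bin-hadoop2.6' >> ~/.bashrc
$ source ~/.bashrc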

(2) Load the Spark library in R

if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/kim/spark/spark-1.6.1-bin-hadoop2.6")
}

Sys.getenv("SPARK_HOME")

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
##sc <- sparkR.init(master="spark://192.168.0.3:7077")

Attaching package: 'SparkR'

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var

The following objects are masked from ‘package:base’:

    colnames, colnames<-, intersect, rank, rbind, sample, subset, summary, table, transform

(3) Create a local context

sc <- sparkR.init(master="local[*]",appName='test', sparkEnvir=list(spark.executor.memory='2g'))

(4) Create a remote (cluster) context

[Screenshot: remote context creation; the same call appears in the full test code below, using master="spark://kim:7077"]

 

[Full test code]

if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/kim/spark/spark-1.6.1-bin-hadoop2.6")
}

Sys.getenv("SPARK_HOME")

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# Connect to the standalone master and pull in the spark-csv package
sc <- sparkR.init(master="spark://kim:7077", appName='test', sparkEnvir=list(spark.executor.memory='500m'),
                  sparkPackages="com.databricks:spark-csv_2.11:1.0.3")

sqlContext <- sparkRSQL.init(sc)

# Create a Spark DataFrame from R's built-in 'faithful' dataset
df <- createDataFrame(sqlContext, faithful)
head(df)

# Read one of the example JSON files shipped with Spark
people <- read.df(sqlContext, "/home/kim/spark/spark-1.6.1-bin-hadoop2.6/examples/src/main/resources/people.json", "json")
head(people)

sparkR.stop()

--------- Results ---------

  age    name
1  NA Michael
2  30    Andy
3  19  Justin

----------------------------


 
