A. Installing Spark
(1) Go to http://spark.apache.org/downloads.html
(2) Unless you already have an existing Hadoop environment, select a "Pre-built for Hadoop" package type
(3) Download Spark
(4) Extract the archive: tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz
B. Running Spark
[Command-line mode]
(1) spark-1.6.1-bin-hadoop2.6/bin$ ./pyspark
(2) Spark monitoring
16/06/01 22:03:46 INFO SparkUI: Started SparkUI at http://192.168.0.3:4040
[Master Node]
./sbin/start-master.sh  (run from the Spark install directory)
※ The default port for the master Web UI page below is 8080 (if 8080 is already in use, the next port, +1, is used).
[Slave Node]
root@kim:/home/kim/spark/spark-1.6.1-bin-hadoop2.6/bin# ./spark-class org.apache.spark.deploy.worker.Worker spark://kim:7077
[Confirm on the Master Node that the slave is registered: the worker should appear in the master Web UI]
C. Installing R
(1) Set the root password: sudo passwd root
(2) Log in as superuser: su
(3) See https://www.rstudio.com/products/rstudio/download-server-2/
$ sudo apt-get install r-base
$ sudo apt-get install gdebi-core
$ wget https://download2.rstudio.org/rstudio-server-0.99.902-amd64.deb
$ sudo gdebi rstudio-server-0.99.902-amd64.deb
D. Running R
(1) Open http://<server IP>:8787 in a web browser
(2) Log in with a Linux user account
E. Running R on the Spark Cluster
(1) Set SPARK_HOME
root@kim:/home/kim/spark/spark-1.6.1-bin-hadoop2.6# export SPARK_HOME=/home/kim/spark/spark-1.6.1-bin-hadoop2.6
root@kim:/home/kim/spark/spark-1.6.1-bin-hadoop2.6# echo "$SPARK_HOME"
/home/kim/spark/spark-1.6.1-bin-hadoop2.6
(2) Load the Spark library in R
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/kim/spark/spark-1.6.1-bin-hadoop2.6")
}
Sys.getenv("SPARK_HOME")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
## sc <- sparkR.init(master="spark://192.168.0.3:7077")

Attaching package: 'SparkR'

The following objects are masked from 'package:stats':
    cov, filter, lag, na.omit, predict, sd, var

The following objects are masked from 'package:base':
    colnames, colnames<-, intersect, rank, rbind, sample, subset, summary, table, transform
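The "masked" messages above are expected: SparkR exports functions whose names collide with several base/stats functions. The originals stay available through an explicit namespace prefix, as in this small illustration (not part of the original walk-through):
base::sample(1:10, 3)     # base R sample(), even though SparkR masks sample
stats::sd(c(1, 2, 3, 4))  # stats sd(), even though SparkR masks sd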
(3) Create a local context
sc <- sparkR.init(master="local[*]",appName='test', sparkEnvir=list(spark.executor.memory='2g'))
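A minimal sanity check with this local context, using standard SparkR 1.6 calls and R's built-in faithful data set (a sketch, not part of the original steps):
sqlContext <- sparkRSQL.init(sc)             # SQL context on top of the SparkContext
df <- createDataFrame(sqlContext, faithful)  # distribute a local data.frame to Spark
head(df)                                     # first rows of the distributed DataFrame
count(df)                                    # row count, computed by Spark
sparkR.stop()                                # stop the local context before creating the remote one in (4)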
(4) Create a remote context (see the full test code below)
[Full test code]
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/kim/spark/spark-1.6.1-bin-hadoop2.6")
}
Sys.getenv("SPARK_HOME")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sc <- sparkR.init(master="spark://kim:7077", appName='test', sparkEnvir=list(spark.executor.memory='500m'),
                  sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, faithful)
head(df)
people <- read.df(sqlContext, "/home/kim/spark/spark-1.6.1-bin-hadoop2.6/examples/src/main/resources/people.json", "json")
head(people)
sparkR.stop()
--------- Results ---------
  age    name
1  NA Michael
2  30    Andy
3  19  Justin
----------------------------
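Beyond head(), the same people DataFrame supports the usual SparkR 1.6 operations. A small sketch (run it before the sparkR.stop() call; these lines are an addition, not part of the original test):
adults <- filter(people, people$age > 20)      # keep rows with age > 20
head(select(adults, adults$name, adults$age))  # project the name and age columns
count(people)                                  # total row count, computed on the cluster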