Prediction(2) R Running through a Spark/Hadoop Cluster

wangwengwu · Posted on 2016-12-10 10:20:49

1. How We Load the Config in R
install.packages("yaml", repos="http://cran.rstudio.com/")

library("yaml")
config = yaml.load_file("config.yaml")

config$spark$home

This code runs in RStudio, and it can also be run directly from the shell:
> Rscript scripts/WordCount.R
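The post does not show config.yaml itself. A minimal sketch of what it might look like, written to a temporary file and loaded with the same yaml.load_file call, is given below. The key names follow config$spark$home and config$spark$server as used in the scripts; the values (in particular the master URL) are assumptions for illustration only.

```r
library("yaml")

# Hypothetical config content: the keys mirror what the scripts read,
# the values are placeholders and must be adapted to the real cluster.
sample_yaml <- c(
  'spark:',
  '  home: "/home/carl/install/spark-1.4.1-bin-hadoop2.6"',
  '  server: "spark://localhost:7077"'
)

# Write the sample to a temporary config.yaml so the example is self-contained.
config_path <- file.path(tempdir(), "config.yaml")
writeLines(sample_yaml, config_path)

# Load it the same way the scripts do and read the nested values.
config <- yaml.load_file(config_path)
print(config$spark$home)
print(config$spark$server)
```

yaml.load_file returns a nested list, which is why the nested keys can be reached with the `$` operator.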

2. Prepare Hadoop Data
Create the Directory
> hadoop fs -mkdir -p /user/carl/sparkR

Upload the file
> cd /home/carl/install/spark-1.4.1-bin-hadoop2.6/examples/src/main/resources

> hadoop fs -put ./people.json /user/carl/sparkR/

3. This R Script Runs Well on the Hadoop Cluster
#install.packages("yaml", repos="http://cran.rstudio.com/")

library("yaml")
config = yaml.load_file("config.yaml")

spark_home <- config$spark$home
spark_r_location <- paste0(spark_home,"/R/lib")
spark_server <- config$spark$server

library("SparkR", lib.loc = spark_r_location)

sc <- sparkR.init(master = spark_server, appName = "SparkR_Wordcount",
                  sparkHome = spark_home)
sqlContext <- sparkRSQL.init(sc)

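# Relative path: when the default file system is HDFS, it resolves under the
# user's HDFS home directory (/user/carl when the job runs as user carl).
path <- file.path("sparkR/people.json")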

peopleDF <- jsonFile(sqlContext, path)

printSchema(peopleDF)
head(peopleDF)

This runs well both in RStudio and via Rscript.
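If config.yaml lacks one of the expected keys, the script above only fails later inside sparkR.init with a less obvious message. One possible way to catch this earlier is a small check before Spark is started. The helper name spark_settings and the placeholder values below are hypothetical and not part of the original script; this is a sketch, not the author's method.

```r
# Hypothetical helper: validates the parsed config and derives the values
# the SparkR script needs (Spark home, master URL, SparkR library path).
spark_settings <- function(config) {
  required <- c("home", "server")
  missing_keys <- setdiff(required, names(config$spark))
  if (length(missing_keys) > 0) {
    stop("config.yaml is missing spark keys: ",
         paste(missing_keys, collapse = ", "))
  }
  list(
    home   = config$spark$home,
    server = config$spark$server,
    r_lib  = paste0(config$spark$home, "/R/lib")
  )
}

# Example with an in-memory config; both values are placeholders.
example_config <- list(spark = list(home = "/opt/spark", server = "local[2]"))
settings <- spark_settings(example_config)
print(settings$r_lib)
```

In the real script, the list returned by yaml.load_file("config.yaml") would be passed in place of example_config, and settings$r_lib would be used as the lib.loc argument when loading SparkR.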

Tips
1. Error Message:
trying to use CRAN without setting a mirror

Solution:
install.packages("yaml", repos="http://cran.rstudio.com/")

Passing the repos argument explicitly fixes the problem.
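An alternative to repeating the repos argument on every call is to set the mirror once per session through options(), and to install a package only when it is missing. The sketch below assumes the same mirror URL as above; the helper name ensure_package is hypothetical.

```r
# Set the CRAN mirror once for the session so that later install.packages()
# calls do not need an explicit repos argument.
options(repos = c(CRAN = "http://cran.rstudio.com/"))

# Hypothetical helper: install a package only if it is not already available,
# then report (invisibly) whether it can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

Calling ensure_package("yaml") before library("yaml") at the top of a script would then avoid reinstalling the package on every run.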

References:
http://www.mayin.org/ajayshah/KB/R/

http://stackoverflow.com/questions/5272846/how-to-get-parameters-from-config-file-in-r-script

wordcount example
https://github.com/amplab-extras/SparkR-pkg/blob/master/examples/wordcount.R
