Setting up a Spark and Spark SQL development environment in IntelliJ IDEA

Posted by jiaxp on 2017-2-28 10:53:20

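　　There are two main camps of Scala IDEs: Scala IDE (Eclipse) and IntelliJ IDEA. I used to set up my Spark development environment in Scala IDE, but its compilation was just too slow, so I had to move to IntelliJ. Below I walk through, with screenshots, the main steps for setting up a Spark and Spark SQL development environment in IntelliJ.
　　0. This assumes IntelliJ IDEA for Scala and the latest Spark source code are already in place.
　　1. Generate the IntelliJ project files (required)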
　　sbt/sbt -Phive -Phive-0.13.1 -Phadoop-2.3 gen-idea
　　... followed by a long wait and several retries (those with a fast VPN can count themselves lucky).
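Since this step tends to fail on flaky downloads and need re-running, a small retry wrapper can save some babysitting. This is only a sketch: the `retry` function, its argument order, and the attempt count and delay in the example are my own assumptions, not part of the Spark build.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the command exactly as given; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example, run from the root of the Spark source tree
# (5 attempts and a 30 second pause are arbitrary choices):
# retry 5 30 sbt/sbt -Phive -Phive-0.13.1 -Phadoop-2.3 gen-idea
```

sbt keeps the artifacts it has already downloaded, so each retry should normally have less left to fetch than the one before.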
　　2. Open IntelliJ and import the Spark project (required)
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/360bc68f-cd09-435b-8aa0-1e5bdb84d4cb/bfaf50ceda9a12b7435c56b6bd81179f/deep/0/Select-File-or-Directory-to-Import-和-IntelliJ-IDEA-和-hcheng@chenghaodeMacBook-Pro----git-spark---zsh---118-6.png
　　3. Open Project Structure and configure the relevant modules (required)
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/5a821cd1-7b62-4e5c-92d3-ccccb23c94b2/b7b4b6a5e216746ddc45b63ff0ac58b0/deep/0/File-和-Menubar.png
　　4. Remove modules you do not need (optional)
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/1033a0c7-caf3-4c8a-bb56-72e10afacf95/c7755d49d1d62d63c886589a30695035/deep/0/Project-Structure.png
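　　(Scala code generally takes a long time to compile, so just remove the modules you rarely use. This does not delete the source code from disk, only the entries in the project files.)
　　5. Replace the mesos jar in the "CORE" module (required)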
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/89b78611-f290-443a-bd25-2e4a597ab686/268f24ec11ac4f49319450c3ea586a6c/deep/0/Project-Structure.png
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/7b894e83-70b4-4b66-b394-e50ddde95369/67b8f7127296169608b43ed671706aae/deep/0/Attach-Files-or-Directories-和-Project-Structure-和-spark------git-spark-.png
　　6. Mark the "resources" folder in the "CORE" module as an actual "Resource" folder (required)
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/23cafcba-ae9e-4732-ad94-a9b47129c3c2/a3fa4eb4c2968ad1e2e085e7a4f66182/deep/0/Project-Structure.png
　　Otherwise Jetty reports a startup failure, with a pile of files under ui/static not found.
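One way to sanity-check this step after a build is to look for the ui/static directory in the module's output directory, since IntelliJ copies folders marked as resources there. A rough sketch: the `check_ui_static` name is hypothetical, and the output directory you pass in is whatever "Output Path" the "CORE" module uses on your machine.

```shell
#!/bin/sh
# check_ui_static OUTPUT_DIR
# Hypothetical helper: prints the first ui/static directory found under
# OUTPUT_DIR, or fails if the compiled output contains none.
check_ui_static() {
    output_dir=$1
    found=$(find "$output_dir" -type d -path '*/ui/static' 2>/dev/null | head -n 1)
    if [ -z "$found" ]; then
        echo "no ui/static directory under $output_dir" >&2
        return 1
    fi
    echo "$found"
}

# Example (the path is a placeholder for your own "Output Path"):
# check_ui_static /path/to/core/output
```

If the check fails after a full rebuild, the "resources" folder is probably still marked as a plain source folder.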
　　7. Change the output path for class files in IntelliJ (optional)
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/a5d17e30-cfe7-4d1e-9b05-d0b7eba24404/aef9251928750602c57d5ab21e8db74b/deep/0/Project-Structure.png
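　　I usually change the "Output Path" to point at a ramdisk, for two reasons: 1) it avoids sharing an output path with sbt, which would lead to long recompiles whenever you switch between sbt and the IDE to run unit tests; 2) a ramdisk cuts down on disk reads and writes, and every bit saved helps.
　　The screenshot above changes only the "CORE" module; change the paths for the other modules and for "Test Output Path" in the same way.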
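As a rough illustration of the ramdisk layout, the sketch below creates one classes and one test-classes directory per module and prints the values to paste into Project Structure. The `make_output_dirs` helper, the `classes`/`test-classes` directory names, the mount points, and the 2 GiB size are all my own assumptions; the commands for creating the ramdisk itself appear only as comments because they are platform-specific.

```shell
#!/bin/sh
# Creating the ramdisk (typical commands, adjust size and mount point):
#   Linux:  sudo mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk
#   macOS:  diskutil erasevolume HFS+ RamDisk $(hdiutil attach -nomount ram://4194304)
#           (ram:// takes a count of 512-byte sectors; 4194304 sectors is 2 GiB)

# make_output_dirs BASE MODULE...
# Hypothetical helper: creates BASE/MODULE/classes and BASE/MODULE/test-classes
# and prints the paths to enter in IntelliJ for each module.
make_output_dirs() {
    base=$1
    shift
    for module in "$@"; do
        mkdir -p "$base/$module/classes" "$base/$module/test-classes" || return 1
        echo "$module: Output Path = $base/$module/classes"
        echo "$module: Test Output Path = $base/$module/test-classes"
    done
}

# Example (module names depend on which modules you kept in step 4):
# make_output_dirs /mnt/ramdisk/spark core sql hive
```

A ramdisk is emptied on reboot, so the directories need to be recreated (and the project rebuilt) after each restart.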
　　8. Add the conf folder to the hive module as a "resource" (optional)
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/4e9bc3ab-105e-42a7-9f5a-8d2d60b69232/6729b0332f0987f0bc71cf5baf472cea/deep/0/Select-content-root-directory-和-Project-Structure.png
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/ae8e2c5d-d991-4a90-a87b-3016d2e5d170/43d34b34f2c85139a01610bec3834993/deep/0/Project-Structure.png
　　I usually also put a hive-site.xml file under conf, with commonly used settings already filled in.
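For reference, a minimal hive-site.xml could be generated along these lines. This is a sketch, not the file I actually use: the `write_hive_site` helper and the directory layout are hypothetical, and the two properties shown (a local warehouse directory and an embedded Derby metastore) are just one plausible set of "commonly used settings" for local development.

```shell
#!/bin/sh
# write_hive_site CONF_DIR DATA_DIR
# Hypothetical helper: writes a minimal CONF_DIR/hive-site.xml that keeps the
# warehouse and an embedded Derby metastore under DATA_DIR.
# Refuses to overwrite an existing file.
write_hive_site() {
    conf_dir=$1
    data_dir=$2
    target="$conf_dir/hive-site.xml"
    if [ -e "$target" ]; then
        echo "$target already exists, leaving it untouched" >&2
        return 1
    fi
    mkdir -p "$conf_dir" || return 1
    cat > "$target" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>$data_dir/warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=$data_dir/metastore_db;create=true</value>
  </property>
</configuration>
EOF
}

# Example (both paths are placeholders):
# write_hive_site conf /tmp/spark-dev
```

Pinning both locations to absolute paths keeps the metastore from being recreated in whatever working directory the IDE happens to launch from.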
　　9. Open SparkSQLCLIDriver.scala and configure the run environment (optional)
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/4f80e1ae-f678-4e86-a746-b10188e44a32/fabc7521522b4540476c74c7abb555c5/deep/0/Menubar.png
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/b0263b3f-84e6-4b89-a796-46f3777af8c2/7a38149d8b7abfd813079219a8abaf6a/deep/0/Run-Debug-Configurations.png
　　10. Run and test! (optional)
http://onexin.iyunv.com/source/plugin/onexin_bigdata/https://app.yinxiang.com/shard/s1/sh/97aa7b2f-dd86-4e9f-a5ba-9acb9cbc9628/f651d0a976bc236b684418e094f9d4df/deep/0/Menubar.png
　　That is the complete process I use to set up my development environment. Some details may not be fully covered, so experiment a bit!


