[Experience Sharing] Configuring IPython Notebook to Run Python Spark Programs

Posted on 2019-1-30 10:54:06

1.1 Install Anaconda
  The official Anaconda website is https://www.anaconda.com; download the version that matches your platform.

1.1.1 Download Anaconda

$ cd /opt/local/src/
$ wget -c https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
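To confirm that the installer downloaded intact before running it, you can compare its SHA-256 hash with the one listed on the Anaconda archive page. A minimal sketch using only the Python 3 standard library (any system Python 3 will do); the function names are illustrative, and the expected hash itself is not reproduced in this post:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """True if the file's digest equals the expected hex string (whitespace and case ignored)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, sha256_of("Anaconda3-5.2.0-Linux-x86_64.sh") returns the digest to compare by eye, while verify_download(path, expected) gives a True/False answer.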
1.1.2 Install Anaconda

# -b: run in batch (non-interactive) mode; -p: installation directory
$ bash Anaconda3-5.2.0-Linux-x86_64.sh -p /opt/local/anaconda -b
1.1.3 Configure the Anaconda environment variables


  • Set the environment variables (appended to ~/.bashrc)

$ tail -n 8 ~/.bashrc
# Anaconda3
export ANACONDA_PATH=/opt/local/anaconda
export PATH=$ANACONDA_PATH/bin:$PATH
# PySpark
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
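Both PYSPARK_* variables exported above are derived from ANACONDA_PATH. The sketch below uses hypothetical helpers, not part of Anaconda or Spark, that rebuild the two values for a given prefix and report any that do not point at an executable file, which is a common reason for pyspark failing to start:

```python
import os


def pyspark_env(anaconda_path):
    """Return the PYSPARK_* values the ~/.bashrc lines above export for this prefix."""
    bin_dir = os.path.join(anaconda_path, "bin")
    return {
        "ANACONDA_PATH": anaconda_path,
        "PYSPARK_DRIVER_PYTHON": os.path.join(bin_dir, "ipython"),
        "PYSPARK_PYTHON": os.path.join(bin_dir, "python"),
    }


def missing_executables(env):
    """List the PYSPARK_* variables whose target is not an executable regular file."""
    return sorted(
        name
        for name in ("PYSPARK_DRIVER_PYTHON", "PYSPARK_PYTHON")
        if not (os.path.isfile(env[name]) and os.access(env[name], os.X_OK))
    )
```

After installation, missing_executables(pyspark_env("/opt/local/anaconda")) should return an empty list.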

  • Apply the environment variables

$ source ~/.bashrc

  • Verify

$ python --version
Python 3.6.5 :: Anaconda, Inc.
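python --version shows which interpreter comes first on PATH. To run the same check from a script, here is a minimal sketch assuming the layout above (python under $ANACONDA_PATH/bin); the function name is hypothetical:

```python
import os
import shutil


def python_on_path_is_anaconda(anaconda_path, path=None):
    """True if the first `python` found on PATH (or on `path`) sits in <anaconda_path>/bin."""
    found = shutil.which("python", path=path)
    if found is None:
        return False
    # Compare directories rather than files, so the bin/python -> python3.x symlink does not matter.
    expected_dir = os.path.realpath(os.path.join(anaconda_path, "bin"))
    return os.path.realpath(os.path.dirname(found)) == expected_dir
```

After source ~/.bashrc, python_on_path_is_anaconda("/opt/local/anaconda") should return True.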
1.2 Using PySpark in IPython Notebook

1.2.1 Create a working directory

$ mkdir ~/ipynotebook
$ cd ~/ipynotebook
1.2.2 Run PySpark in IPython Notebook (local mode)


  • Start IPython Notebook

$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 14:21:56.030 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
[I 14:21:56.030 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
[I 14:21:56.037 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
[I 14:21:56.037 NotebookApp] 0 active kernels
[I 14:21:56.037 NotebookApp] The Jupyter Notebook is running at:
[I 14:21:56.037 NotebookApp] http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
[I 14:21:56.037 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:21:56.040 NotebookApp]
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d&token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
[I 14:21:56.683 NotebookApp] Accepting one-time-token-authenticated connection from 127.0.0.1
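  The page http://localhost:8888 opens automatically in the default browser. The two WARNING lines at the top only mean that the ipython notebook subcommand is deprecated in favor of jupyter notebook; the notebook still starts normally.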
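If no browser opens (for example on a headless server), the login URL has to be copied from the output. Here is a small sketch that extracts the URL and token from startup output formatted like the log above and also collapses the duplicated token=...&token=... in the Copy/paste line. The regular expression is an assumption based on this log format only, and the function name is hypothetical:

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches the first http(s) URL carrying a ?token= query, as printed by the Notebook server above.
_TOKEN_URL = re.compile(r"https?://\S+\?token=\S+")


def notebook_login(log_text):
    """Return (clean_url, token) for the first token URL in `log_text`, or None if absent."""
    match = _TOKEN_URL.search(log_text)
    if match is None:
        return None
    parsed = urlparse(match.group(0))
    # parse_qs gathers repeated keys into a list; keep the first token.
    token = parse_qs(parsed.query)["token"][0]
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path}?token={token}", token
```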


  • Write a program in IPython Notebook
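The notebook screenshot from the original post did not survive. As an illustration only, here is the classic word count. In a notebook started through pyspark, the shell has already created a SparkContext named sc, which the function accepts as an optional argument. Without sc it falls back to plain Python, which is handy for checking Spark's result on small inputs:

```python
from collections import Counter
from operator import add


def word_count(lines, sc=None):
    """Count words in `lines`, using Spark's RDD API if a SparkContext is passed, else plain Python."""
    if sc is not None:
        counts = (
            sc.parallelize(lines)
            .flatMap(lambda line: line.split())  # one record per word
            .map(lambda word: (word, 1))
            .reduceByKey(add)  # sum the 1s per word across partitions
            .collect()
        )
    else:
        counts = Counter(word for line in lines for word in line.split()).items()
    return sorted(counts)
```

In a notebook cell, run word_count(["to be or", "not to be"], sc). The same call without sc should return the same sorted list of (word, count) pairs.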


1.2.3 Run PySpark on Hadoop YARN from IPython Notebook


  • Start IPython Notebook

$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 14:50:48.149 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
[I 14:50:48.149 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
[I 14:50:48.157 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
[I 14:50:48.157 NotebookApp] 0 active kernels
[I 14:50:48.157 NotebookApp] The Jupyter Notebook is running at:
[I 14:50:48.157 NotebookApp] http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45
[I 14:50:48.157 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:50:48.161 NotebookApp]
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45&token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45
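The command above uses MASTER=yarn-client, the Spark 1.x spelling. Spark 2.0 and later deprecate it in favor of --master yarn with an explicit --deploy-mode, and interactive shells such as this notebook only support client mode. A hypothetical helper that translates the old form into the equivalent flags:

```python
def normalize_master(master):
    """Map legacy 'yarn-client' / 'yarn-cluster' masters to (master, deploy_mode)."""
    legacy = {"yarn-client": ("yarn", "client"), "yarn-cluster": ("yarn", "cluster")}
    return legacy.get(master, (master, None))


def master_args(master):
    """Build pyspark/spark-submit flags equivalent to MASTER=<master>."""
    resolved, mode = normalize_master(master)
    args = ["--master", resolved]
    if mode is not None:
        args += ["--deploy-mode", mode]
    return args
```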


  • Write a program in IPython Notebook



  • Check the application in YARN

$ yarn application -list
18/06/24 14:53:06 INFO client.RMProxy: Connecting to ResourceManager at node/192.168.20.10:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
application_1529805293111_0001          PySparkShell                   SPARK        hadoop     default             RUNNING           UNDEFINED              10%                    http://node:4040
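To pick the notebook's application out of this listing from a script, here is a rough sketch that splits each row on whitespace. It assumes, as in the output above, that no column (in particular the application name) contains spaces. The function name is hypothetical:

```python
def running_apps(listing):
    """Return (application_id, name, tracking_url) for RUNNING rows of `yarn application -list` output."""
    apps = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with an application id; the log line and the header row do not.
        if fields and fields[0].startswith("application_") and "RUNNING" in fields:
            apps.append((fields[0], fields[1], fields[-1]))
    return apps
```

The tracking URL (http://node:4040 here) is the application's Spark web UI.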
1.2.4 Run PySpark on Spark Standalone from IPython Notebook


  • Start the Spark Standalone master and workers

$ /opt/local/spark/sbin/start-master.sh
$ /opt/local/spark/sbin/start-slaves.sh
$ jps
13249 Jps
13027 Master
13188 Worker
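Seeing Master and Worker in the jps output confirms that the standalone daemons are up. Here is the same check as a minimal sketch, with the jps output passed in as text (for example, captured with subprocess). The function name is hypothetical:

```python
def missing_daemons(jps_output, required=("Master", "Worker")):
    """Return the required JVM main-class names that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # each jps line is "<pid> <MainClass>"
            running.add(parts[1])
    return [name for name in required if name not in running]
```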

  • Start IPython Notebook

$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 15:11:59.211 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
[I 15:11:59.212 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
[I 15:11:59.230 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
[I 15:11:59.230 NotebookApp] 0 active kernels
[I 15:11:59.230 NotebookApp] The Jupyter Notebook is running at:
[I 15:11:59.230 NotebookApp] http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
[I 15:11:59.230 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 15:11:59.233 NotebookApp]
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea&token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
[I 15:12:02.594 NotebookApp] Accepting one-time-token-authenticated connection from 127.0.0.1
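If pyspark cannot connect to spark://node:7077, a quick first test is whether the master's port accepts TCP connections at all. Here is a minimal standard-library sketch. It falls back to 7077, the standalone master's default port, when the URL does not specify one. The function names are hypothetical:

```python
import socket
from urllib.parse import urlparse


def parse_master_url(url):
    """Split a standalone master URL such as spark://node:7077 into (host, port)."""
    parsed = urlparse(url)
    if parsed.scheme != "spark" or parsed.hostname is None:
        raise ValueError(f"not a spark:// master URL: {url!r}")
    return parsed.hostname, parsed.port or 7077


def master_reachable(url, timeout=2.0):
    """True if a TCP connection to the master's host:port succeeds within `timeout` seconds."""
    host, port = parse_master_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```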

  • Write a program in IPython Notebook



  • View the Spark Standalone Web UI (served by the master on port 8080 by default)


1.3 Summary
  Before starting IPython Notebook, first change into its working directory, for example ~/ipynotebook (choose whatever suits your setup).

1.3.1 Start IPython Notebook in local mode

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
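#### or, with an explicit master (local runs a single worker thread, whereas pyspark without --master defaults to local[*])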
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local

1.3.2 Start IPython Notebook on Hadoop YARN

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
#### or (the form recommended since Spark 2.0, which deprecates the yarn-client master)
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client

1.3.3 Start IPython Notebook on Spark Standalone

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m
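The three launch variants differ only in their master flags and, for YARN, in HADOOP_CONF_DIR. Here is a sketch that builds the command and environment for each variant and starts it from the working directory. The paths and host names are the ones used in this post and will differ on other machines, and the helper names are hypothetical:

```python
import os
import subprocess


def notebook_launch(master_args, hadoop_conf_dir=None, extra_args=(), base_env=None):
    """Build (argv, env) for running `pyspark` with IPython Notebook as the driver front end."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_DRIVER_PYTHON"] = "ipython"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    if hadoop_conf_dir is not None:
        env["HADOOP_CONF_DIR"] = hadoop_conf_dir  # lets Spark find the YARN configuration
    return ["pyspark", *master_args, *extra_args], env


# The three modes from the summary above, using this post's paths and host names.
MODES = {
    "local": dict(master_args=["--master", "local"]),
    "yarn": dict(
        master_args=["--master", "yarn", "--deploy-mode", "client"],
        hadoop_conf_dir="/opt/local/hadoop/etc/hadoop",
    ),
    "standalone": dict(
        master_args=["--master", "spark://node:7077"],
        extra_args=["--num-executors", "1", "--total-executor-cores", "1",
                    "--executor-memory", "512m"],
    ),
}


def launch(mode, workdir="~/ipynotebook"):
    """Start the notebook for `mode` from `workdir`; blocks until the server exits."""
    argv, env = notebook_launch(**MODES[mode])
    return subprocess.call(argv, env=env, cwd=os.path.expanduser(workdir))
```

For example, launch("yarn") starts the YARN variant and returns once the notebook server is stopped with Control-C.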



Original post: https://www.yunweiku.com/thread-669527-1-1.html