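2.1 Method 1: desktop installation
The GPU build of MXNet requires an NVIDIA GPU. If you update the NVIDIA driver without first disabling the graphics driver that ships with Ubuntu, you can hit problems such as an "X server is running" error, repeated prompts to reboot, or an installation that succeeds but leaves the system unable to connect to the driver.
On desktop Ubuntu there is a much simpler way. Open "Software & Updates" and go to "Additional Drivers"; the system automatically detects the official NVIDIA driver, so you only need to select it, install it, and reboot.
After installation, check the driver information: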
user@gpu:~$ nvidia-smi
Sat Sep 22 17:50:29 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:03:00.0  On |                  N/A |
|  0%   44C    P8    14W / 300W |    249MiB / 11170MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1184      G   /usr/lib/xorg/Xorg                           126MiB |
|    0      1773      G   compiz                                       120MiB |
+-----------------------------------------------------------------------------+
user@gpu:~$
The driver version must be >= 384.81.
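This version requirement can be checked from a script. The sketch below compares the installed driver version against the 384.81 minimum using `sort -V` (version sort from GNU coreutils). The helper name `version_ge` is made up for illustration, and the `nvidia-smi` query only runs when `nvidia-smi` is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 >= version $2.
version_ge() {
    # sort -V orders version strings numerically per component;
    # if the smaller of the two is $2, then $1 >= $2.
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

MIN_DRIVER="384.81"

if command -v nvidia-smi >/dev/null 2>&1; then
    # Take the driver version reported for the first GPU.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
    if version_ge "$installed" "$MIN_DRIVER"; then
        echo "driver $installed is new enough (>= $MIN_DRIVER)"
    else
        echo "driver $installed is too old (need >= $MIN_DRIVER)"
    fi
else
    echo "nvidia-smi not found; is the driver installed?"
fi
```

A plain string or floating-point comparison would rank 384.130 below 384.81, which is why a version-aware sort is used here.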
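2.2 Method 2: server installation
2.2.1 Download the driver
Download: official site
Select your product model, operating system, and language.
My selection:
Type                Model
Product Type        GeForce
Product Series      GeForce 10 Series
Product Family      GeForce GTX 1080 Ti
Operating System    Linux 64-bit
Language            English
The file I downloaded: NVIDIA-Linux-x86_64-390.87.run
2.2.2 Installation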
user@gpu:~$ mkdir ~/driver
user@gpu:~$ cd ~/driver
user@gpu:driver$ sudo chmod +x NVIDIA-Linux-x86_64-390.87.run
user@gpu:driver$ sudo sh NVIDIA-Linux-x86_64-390.87.run
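The first step of the installer shows the license terms; accept them. Then follow the prompts. Partway through, a warning says the 32-bit files cannot be installed; ignore it and continue, then follow the remaining prompts step by step.
After the installation completes, reboot: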
ERROR: The Nouveau kernel driver is currently in use by your system.
This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.
Please consult the NVIDIA driver README and
your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
2.3.1 Remove all nvidia packages
In this step we remove all nvidia-related packages.
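The removal command itself is missing from this step. As an assumption rather than the author's exact command, a common form on Ubuntu is `sudo apt-get remove --purge '^nvidia-.*'`. The sketch below only lists what would be affected: the hypothetical helper `installed_nvidia_packages` filters `dpkg -l` output for installed packages whose name starts with `nvidia`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dpkg -l` output on stdin and print the names
# of installed packages (status "ii") whose name starts with "nvidia".
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^nvidia/ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    echo "Installed nvidia packages:"
    dpkg -l | installed_nvidia_packages
fi

# Assumed removal command (not from the original text); review the list first:
#   sudo apt-get remove --purge '^nvidia-.*'
```

Listing first makes it easy to see whether a driver installed through "Additional Drivers" is still present before purging anything.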
user@gpu:~$ sudo vim /etc/modprobe.d/blacklist.conf
# Add the following lines:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
2.3.3 Update initramfs
Enter the following command to disable kernel mode setting for nouveau:
user@gpu:~$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
Then rebuild the initramfs, and finally reboot:
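The commands for this step are not shown. On Ubuntu the usual pair, given here as an assumption, is `sudo update-initramfs -u` followed by `sudo reboot`. After rebooting, a check along these lines confirms that nouveau is no longer loaded; `nouveau_loaded` is a hypothetical helper that scans `lsmod` output.

```shell
#!/usr/bin/env bash
# Assumed commands for this step (not from the original text):
#   sudo update-initramfs -u
#   sudo reboot

# Hypothetical helper: read `lsmod` output on stdin and succeed
# if a module named exactly "nouveau" is listed.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded; the blacklist has not taken effect"
    else
        echo "nouveau is not loaded"
    fi
fi
```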
Installing the NVIDIA display driver...
It appears that an X server is running. Please exit X before installation.
If you're sure that X is not running, but are getting this error, please delete any X lock files in /tmp.
3.1.1 Check the current runlevel
user@gpu:~$ runlevel
N 5
3.1.2 Change the runlevel to 3
Switching between command-line mode and graphical mode:
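The switching commands are missing here. On systemd-based Ubuntu releases, runlevel 3 corresponds to `multi-user.target` (text mode) and runlevel 5 to `graphical.target`, so one option, offered as an assumption rather than the author's method, is `systemctl set-default`. The helper `target_for_runlevel` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a SysV runlevel to the equivalent systemd target.
target_for_runlevel() {
    case "$1" in
        3) echo "multi-user.target" ;;   # text mode, no X server
        5) echo "graphical.target" ;;    # graphical desktop
        *) echo "unsupported runlevel: $1" >&2; return 1 ;;
    esac
}

# Print the commands rather than running them, since they change boot behavior.
echo "To boot into text mode:  sudo systemctl set-default $(target_for_runlevel 3)"
echo "To restore the desktop:  sudo systemctl set-default $(target_for_runlevel 5)"
echo "Reboot after either command for the change to take effect."
```

Booting into text mode keeps the X server from starting, which is what the "X server is running" error asks for.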
Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
3.3 Install NVIDIA CUDA 9.0 (driver and toolkit)
user@gpu:/data/tools$ sudo sh cuda_9.0.176_384.81_linux.run
......
# Press the space bar to page through the license agreement
......
Do you accept the previously read EULA?
accept/decline/quit: accept # accept the agreement
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: y # install the NVIDIA accelerated graphics driver
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: n # do not install the OpenGL libraries
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: # keep the default: do not run nvidia-xconfig
Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y # install the CUDA 9.0 Toolkit
Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]: # CUDA install location
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y # create the symbolic link
Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y # install the CUDA samples
Enter CUDA Samples Location
[ default is /home/user ]: # CUDA samples location
Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Installing the CUDA Samples in /home/user ...
Copying samples to /home/user/NVIDIA_CUDA-9.0_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-9.0
Samples: Installed in /home/user
Please make sure that # reminder to add environment variables
- PATH includes /usr/local/cuda-9.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall # how to uninstall
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.
Logfile is /tmp/cuda_install_14141.log
user@gpu:/data/tools/tensorflow-gpu$
3.4 Add environment variables
user@gpu:~$ vim ~/.bashrc # append the following at the end
# cuda
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
user@gpu:~$ source ~/.bashrc
3.5 Verification
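Before checking `nvcc`, it may help to see what the `${VAR:+:${VAR}}` form in the two export lines does: it appends `:` plus the old value only when the variable is already set and non-empty, so an empty `LD_LIBRARY_PATH` does not end up with a trailing colon. The helper names below are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend directory $1 to the colon-separated list $2.
prepend_path() {
    # ${2:+:$2} expands to ":$2" only when $2 is set and non-empty.
    printf '%s\n' "$1${2:+:$2}"
}

# Hypothetical helper: succeed if directory $1 is an entry of list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

prepend_path /usr/local/cuda-9.0/lib64 ""           # prints /usr/local/cuda-9.0/lib64
prepend_path /usr/local/cuda-9.0/bin /usr/bin:/bin  # prints /usr/local/cuda-9.0/bin:/usr/bin:/bin

if path_contains /usr/local/cuda-9.0/bin "$PATH"; then
    echo "CUDA bin directory is on PATH"
else
    echo "CUDA bin directory is not on PATH; re-check ~/.bashrc"
fi
```

A trailing colon in `LD_LIBRARY_PATH` is treated as an empty entry, which the dynamic linker reads as the current directory, so avoiding it is more than cosmetic.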
user@gpu:/data/tools/tensorflow-gpu$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
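Cuda compilation tools, release 9.0, V9.0.176
4 Install NVIDIA cuDNN 7 (7.3.0.29-1)
cuDNN provides GPU acceleration for deep learning.
Before installing cuDNN, make sure CUDA and the NVIDIA driver are installed correctly.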
user@gpu:/data/tools$ ll
total 1952872
drwxr-xr-x 3 user user 269 Sep 14 13:25 ./
drwxr-xr-x 3 user user 19 Sep 14 10:21 ../
-rw-rw-r-- 1 user user 1643293725 Sep 22 16:35 cuda_9.0.176_384.81_linux.run
-rw-rw-r-- 1 user user 125687148 Sep 22 16:33 libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb
-rw-rw-r-- 1 user user 115870862 Sep 22 16:33 libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb
-rw-rw-r-- 1 user user 4913038 Sep 22 16:33 libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb
4.2 Install cuDNN
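The install commands do not appear in this section. With the three `.deb` files in the listing, the usual order is the runtime library first, then the developer package, then the docs, each installed with `sudo dpkg -i`; this ordering is stated here as an assumption based on the package dependencies. The sketch prints the commands as a dry run, and `cudnn_packages` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Version string taken from the file names in /data/tools.
CUDNN_VERSION="7.3.0.29-1+cuda9.0"

# Hypothetical helper: print the .deb file names in install order
# (runtime, then dev, then doc).
cudnn_packages() {
    local pkg
    for pkg in libcudnn7 libcudnn7-dev libcudnn7-doc; do
        printf '%s_%s_amd64.deb\n' "$pkg" "$CUDNN_VERSION"
    done
}

# Dry run: show the commands instead of executing them.
cudnn_packages | while read -r deb; do
    echo "sudo dpkg -i /data/tools/$deb"
done
```

Installing the runtime package first matters because the dev package depends on it.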