Ubuntu Deep Learning Server Setup

Ubuntu Configuration

NVIDIA Driver

  1. Remove the existing driver

    sudo apt --purge autoremove 'nvidia*'
    # or, if the driver was installed with NVIDIA's .run installer:
    sudo /usr/bin/nvidia-uninstall
  2. Install the driver

    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt update
    sudo apt upgrade
    ubuntu-drivers list
    sudo apt install nvidia-driver-VERSION_NUMBER_HERE

Reboot your computer so that the new driver is loaded.
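
After the reboot, it can help to confirm that the driver actually responds before moving on to CUDA. The following is a minimal sketch, assuming nvidia-smi was installed together with the driver package; driver_status is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether the NVIDIA driver responds.
driver_status() {
    # nvidia-smi exits with a non-zero status when it cannot talk to the driver
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "driver loaded"
    else
        echo "driver not loaded"
    fi
}

driver_status
```

If the result is "driver not loaded", running nvidia-smi directly shows the underlying error message.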

CUDA + cuDNN

  1. Download the CUDA and cuDNN versions that match your driver version

    chmod 755 cuda_%version%_linux.run
    sudo sh cuda_%version%_linux.run
  2. After installing CUDA, configure the environment variables

    export CUDA_HOME=/usr/local/cuda
    export PATH=$PATH:$CUDA_HOME/bin
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
  3. Check the CUDA version

    nvcc --version
  4. Copy the cuDNN files into /usr/local/cuda/

  • cuDNN 7.6.x (the commands below use the 7.6.5 file names; adjust them to match your download)

    sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
    sudo cp cuda/lib64/libcudnn.so.7.6.5 /usr/local/cuda/lib64/
    sudo cp cuda/lib64/libcudnn_static.a /usr/local/cuda/lib64/
    sudo chmod a+r /usr/local/cuda/include/cudnn.h
    sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
    sudo ln -s /usr/local/cuda/lib64/libcudnn.so.7.6.5 /usr/local/cuda/lib64/libcudnn.so.7
    sudo ln -s /usr/local/cuda/lib64/libcudnn.so.7 /usr/local/cuda/lib64/libcudnn.so
  • cuDNN 8.0.5

    sudo cp cuda/include/cudnn* /usr/local/cuda/include/
    sudo cp cuda/lib64/libcudnn.so.8.0.5 /usr/local/cuda/lib64/
    sudo cp cuda/lib64/libcudnn_adv_infer.so.8.0.5 /usr/local/cuda/lib64/
    sudo cp cuda/lib64/libcudnn_adv_train.so.8.0.5 /usr/local/cuda/lib64/
    sudo cp cuda/lib64/libcudnn_cnn_infer.so.8.0.5 /usr/local/cuda/lib64/
    sudo cp cuda/lib64/libcudnn_cnn_train.so.8.0.5 /usr/local/cuda/lib64/
    sudo cp cuda/lib64/libcudnn_ops_infer.so.8.0.5 /usr/local/cuda/lib64/
    sudo cp cuda/lib64/libcudnn_ops_train.so.8.0.5 /usr/local/cuda/lib64/
    sudo cp cuda/lib64/libcudnn_static.a /usr/local/cuda/lib64/
    sudo chmod a+r /usr/local/cuda/include/cudnn*
    sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
    sudo ln -s /usr/local/cuda/lib64/libcudnn.so.8.0.5 /usr/local/cuda/lib64/libcudnn.so.8
    sudo ln -s /usr/local/cuda/lib64/libcudnn.so.8 /usr/local/cuda/lib64/libcudnn.so
    sudo ln -s /usr/local/cuda/lib64/libcudnn_adv_infer.so.8.0.5 /usr/local/cuda/lib64/libcudnn_adv_infer.so.8
    sudo ln -s /usr/local/cuda/lib64/libcudnn_adv_infer.so.8 /usr/local/cuda/lib64/libcudnn_adv_infer.so
    sudo ln -s /usr/local/cuda/lib64/libcudnn_adv_train.so.8.0.5 /usr/local/cuda/lib64/libcudnn_adv_train.so.8
    sudo ln -s /usr/local/cuda/lib64/libcudnn_adv_train.so.8 /usr/local/cuda/lib64/libcudnn_adv_train.so
    sudo ln -s /usr/local/cuda/lib64/libcudnn_cnn_infer.so.8.0.5 /usr/local/cuda/lib64/libcudnn_cnn_infer.so.8
    sudo ln -s /usr/local/cuda/lib64/libcudnn_cnn_infer.so.8 /usr/local/cuda/lib64/libcudnn_cnn_infer.so
    sudo ln -s /usr/local/cuda/lib64/libcudnn_cnn_train.so.8.0.5 /usr/local/cuda/lib64/libcudnn_cnn_train.so.8
    sudo ln -s /usr/local/cuda/lib64/libcudnn_cnn_train.so.8 /usr/local/cuda/lib64/libcudnn_cnn_train.so
    sudo ln -s /usr/local/cuda/lib64/libcudnn_ops_infer.so.8.0.5 /usr/local/cuda/lib64/libcudnn_ops_infer.so.8
    sudo ln -s /usr/local/cuda/lib64/libcudnn_ops_infer.so.8 /usr/local/cuda/lib64/libcudnn_ops_infer.so
    sudo ln -s /usr/local/cuda/lib64/libcudnn_ops_train.so.8.0.5 /usr/local/cuda/lib64/libcudnn_ops_train.so.8
    sudo ln -s /usr/local/cuda/lib64/libcudnn_ops_train.so.8 /usr/local/cuda/lib64/libcudnn_ops_train.so
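  5. Verify the cuDNN installation

    # cuDNN 8 keeps the version macros in cudnn_version.h; for cuDNN 7, grep cudnn.h instead
    grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn_version.h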
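
The cuDNN 8.0.5 copy and symlink commands in step 4 repeat one pattern for seven libraries, so they can also be written as a loop. The sketch below is one way to do that, assuming the same file layout as in those commands. The name install_cudnn_libs and its three arguments are hypothetical, introduced only for illustration. Unlike the commands in step 4, it creates relative symlinks and uses ln -sf so that it can be rerun. It does not handle the headers or libcudnn_static.a.

```shell
#!/bin/sh
# Hypothetical helper: copy the cuDNN shared libraries and create version symlinks.
#   $1: source directory (for example cuda/lib64)
#   $2: destination directory (for example /usr/local/cuda/lib64)
#   $3: full cuDNN version (for example 8.0.5)
install_cudnn_libs() {
    src="$1"; dst="$2"; ver="$3"
    major="${ver%%.*}"    # 8.0.5 -> 8
    for name in libcudnn libcudnn_adv_infer libcudnn_adv_train \
                libcudnn_cnn_infer libcudnn_cnn_train \
                libcudnn_ops_infer libcudnn_ops_train; do
        cp "$src/$name.so.$ver" "$dst/" || return 1
        # name.so.8 -> name.so.8.0.5, then name.so -> name.so.8
        ln -sf "$name.so.$ver" "$dst/$name.so.$major"
        ln -sf "$name.so.$major" "$dst/$name.so"
    done
}

# Usage (needs root privileges to write under /usr/local/cuda):
#   install_cudnn_libs cuda/lib64 /usr/local/cuda/lib64 8.0.5
```

Because the destination is owned by root, the function would typically live in a script that is run with sudo.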

Vim

  1. Upgrade Vim to 8.0 or later

  2. Use Sans Mono Nerd Font as the terminal font. Download: https://drive.google.com/file/d/1aWA6edSaMiG6cGV0tyQnt012Ub-CoUCO/view?usp

  3. Install

    git clone https://github.com/chxuan/vimplus.git ~/.vimplus
    cd ~/.vimplus
    ./install.sh  # run without sudo

  4. Update

    cd ~/.vimplus
    ./update.sh
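
Step 1 requires Vim 8.0 or later, which can be checked before running the installer. The following is a minimal sketch, assuming the usual "VIM - Vi IMproved X.Y" banner on the first line of vim --version; vim_is_8_or_later is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given version banner reports Vim 8 or later.
vim_is_8_or_later() {
    # $1: the first line of `vim --version`
    major="$(printf '%s\n' "$1" | sed -n 's/^VIM - Vi IMproved \([0-9][0-9]*\)\..*/\1/p')"
    [ -n "$major" ] && [ "$major" -ge 8 ]
}

if command -v vim >/dev/null 2>&1; then
    if vim_is_8_or_later "$(vim --version | head -n 1)"; then
        echo "vim is new enough"
    else
        echo "vim needs an upgrade"
    fi
fi
```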

Tmux

  1. Install

    sudo apt-get install tmux
  2. Fix for X connections failing inside tmux

    set-option -g update-environment "SSH_ASKPASS SSH_AUTH_SOCK SSH_AGENT_PID SSH_CONNECTION WINDOWID XAUTHORITY"

Put the line above in ~/.tmux.conf, then reload the configuration:

tmux source-file ~/.tmux.conf
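
If the setup is scripted, the set-option line should be added to ~/.tmux.conf only once, even when the script runs repeatedly. The following is a minimal sketch of one way to do that; add_line_once is a hypothetical helper name, not a tmux feature.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file unless the exact line is already there.
add_line_once() {
    file="$1"; line="$2"
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string rather than a regex
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   add_line_once ~/.tmux.conf 'set-option -g update-environment "SSH_ASKPASS SSH_AUTH_SOCK SSH_AGENT_PID SSH_CONNECTION WINDOWID XAUTHORITY"'
#   tmux source-file ~/.tmux.conf
```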

Anaconda3

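Download the Linux installer from the official website.

Open a terminal.

  1. Verify the installer's checksum, replacing /path/filename with the path to the installer

    sha256sum /path/filename
  2. Install

    bash /path/filename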
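  3. When the installer shows the license and asks for confirmation, answer yes

  4. Update the environment variables

vim ~/.bashrc

Press "i" to enter insert mode and add the following as the last line:

export PATH=~/anaconda3/bin:$PATH

Then reload the shell configuration:

source ~/.bashrc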

  5. Setup is complete. Launch Anaconda Navigator from the command line

    anaconda-navigator
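
The sha256sum command in the checksum step only prints the digest. It still has to be compared with the value published for the installer. The following is a minimal sketch of automating that comparison; verify_sha256 is a hypothetical helper name, and the expected digest is something you supply yourself.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file's SHA-256 digest matches the expected value.
verify_sha256() {
    # $1: file to check, $2: expected hex digest
    actual="$(sha256sum "$1" | awk '{print $1}')"
    [ "$actual" = "$2" ]
}

# Usage (EXPECTED holds the digest published for your installer):
#   verify_sha256 /path/filename "$EXPECTED" && bash /path/filename
```

Chaining the installer after the check with && means a corrupted download is never executed.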

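Anaconda Environment Management

Disconnect from any VPN first!

  1. Create a new environment (with the Python version of your choice)

    conda create -n pytorch python=3.7
  2. Activate the environment

    # conda 4.4 and later prefer: conda activate pytorch
    source activate pytorch
  3. Make the environment available to Jupyter Notebook

    conda install ipykernel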
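
Installing ipykernel makes the kernel package available, but the environment usually also has to be registered as a kernel before it appears in the Jupyter Notebook kernel list. The following is a minimal sketch, assuming the environment is already activated; register_kernel and the PYTHON override variable are hypothetical names introduced here.

```shell
#!/bin/sh
# Hypothetical helper: register the active environment as a Jupyter kernel.
register_kernel() {
    # $1: environment name, used as both the kernel name and the display name
    # PYTHON can be set to choose a specific interpreter (assumption; defaults to python)
    "${PYTHON:-python}" -m ipykernel install --user --name "$1" --display-name "Python ($1)"
}

# Usage, after activating the environment:
#   register_kernel pytorch
#   jupyter kernelspec list
```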

Switching to Mirrors in China

  1. Upgrade pip to a version newer than 10.0

    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pip -U
  2. Set the Tsinghua mirror as the default index

    pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
  3. Anaconda mirror

    conda config --add channels 'https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/'
    conda config --set show_channel_urls yes

PyTorch

  1. Download locations for each PyTorch and torchvision version: here

TensorRT

  1. Download the TensorRT build that matches your CUDA version, cuDNN version, and Linux distribution
  2. Activate the conda virtual environment
  3. Extract the archive

    tar xzvf TensorRT-${version}.${os}.${arch}-gnu.${cuda}.${cudnn}.tar.gz
  4. Add the environment variables

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<TensorRT-${version}/lib>
    export LIBRARY_PATH=$LIBRARY_PATH:<TensorRT-${version}/lib>
  5. Install

    # inside a conda environment, run pip without sudo so the wheels land in that environment
    cd TensorRT-${version}/python
    pip install tensorrt-*-cp3x-none-linux_x86_64.whl
    cd ../graphsurgeon
    pip install graphsurgeon-0.4.4-py2.py3-none-any.whl
    cd ../onnx_graphsurgeon
    pip install onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
  6. Reload the shell environment

    source ~/.bashrc
  7. TensorRT bug notes


Assertion failed: !_importer_ctx.network()->hasImplicitBatchDimension() && "This version of the ONNX parser only supports TensorRT INetworkDefinitions with an explicit batch dimension. Please ensure the network was created using the EXPLICIT_BATCH NetworkDefinitionCreationFlag."

Set the EXPLICIT_BATCH flag when building the TensorRT engine:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)  # trt.Builder requires a logger
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(EXPLICIT_BATCH)
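
Before debugging engine builds, it is worth confirming that the wheel from step 5 is importable in the active environment. The following is a minimal sketch, assuming python on the PATH is the environment's interpreter; tensorrt_version and the PYTHON override variable are hypothetical names introduced here.

```shell
#!/bin/sh
# Hypothetical helper: print the installed TensorRT version, or fail if the import does not work.
tensorrt_version() {
    # PYTHON can be set to choose a specific interpreter (assumption; defaults to python)
    "${PYTHON:-python}" -c 'import tensorrt; print(tensorrt.__version__)' 2>/dev/null
}

if version="$(tensorrt_version)"; then
    echo "TensorRT $version is importable"
else
    echo "TensorRT is not importable; check LD_LIBRARY_PATH and the active environment"
fi
```

An import failure at this point usually points back to steps 2 and 4 rather than to the network definition.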


Docker

Install automatically with the official script:

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh --mirror Aliyun
sudo systemctl enable docker
sudo systemctl start docker
# add the current user to the docker group so docker runs without sudo
sudo groupadd -f docker
sudo usermod -aG docker $USER
newgrp docker

Log out of the current terminal and log back in, then run the following test:

docker run hello-world
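
If the test fails with a permission error, the new group membership has probably not reached the current login session yet. The following is a minimal sketch for checking that; session_has_group is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the current login session carries the given group.
session_has_group() {
    # id -nG without a user name lists the groups of the current process,
    # which only change after a new login (or newgrp)
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if session_has_group docker; then
    echo "docker group is active in this session"
else
    echo "docker group is not active yet; log out and back in"
fi
```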

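CMake

Download the CMake source archive from the official website, extract it, and run the following inside the extracted directory:

sudo apt-get install libssl-dev
./bootstrap --prefix=/usr
make
sudo make install
cmake --version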

OpenCV

  1. Install

    git clone https://github.com/opencv/opencv.git
    cd opencv
    mkdir release
    cd release
    cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
    make -j"$(nproc)"  # build as a regular user; only the install step needs root
    sudo make install
    sudo /bin/bash -c 'echo "/usr/local/lib" > /etc/ld.so.conf.d/opencv.conf'
    sudo ldconfig
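
After ldconfig has run, the OpenCV libraries should appear in the dynamic linker cache. The following is a minimal sketch for confirming that, assuming the build produced libopencv_core as it does by default; opencv_in_linker_cache is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if libopencv_core is listed in the dynamic linker cache.
opencv_in_linker_cache() {
    # ldconfig -p prints the libraries currently in the cache
    ldconfig -p 2>/dev/null | grep -q 'libopencv_core'
}

if opencv_in_linker_cache; then
    echo "OpenCV libraries are visible to the linker"
else
    echo "OpenCV libraries are not in the linker cache"
fi
```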