Tutorial: Selecting NVIDIA GPUs and Limiting GPU Memory Usage in TensorFlow and Keras

This post explains how to choose which GPU cards a TensorFlow or Keras program uses, and how much of their memory it takes.

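When TensorFlow or Keras runs on NVIDIA GPUs, by default it claims every GPU in the machine, and on each card it reserves nearly all of the memory up front, no matter how much the program actually needs. The settings below let several programs or users share the GPUs of one machine.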

Selecting GPU cards

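To use only specific GPU cards, set the CUDA_VISIBLE_DEVICES environment variable. Its value lists the GPU indices (starting from 0) that CUDA programs are allowed to use. Note that CUDA numbers the cards fastest-first by default, which can differ from the order shown by nvidia-smi; setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the two agree. For example, to let CUDA programs use only the first GPU: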

# Let CUDA programs use only the first GPU
export CUDA_VISIBLE_DEVICES=0
python my_script.py

While my_script.py runs, it will then use only the first GPU in the machine. To allow several GPUs, separate the indices with commas:

# Let CUDA programs use the first and third GPUs
export CUDA_VISIBLE_DEVICES=0,2
python my_script.py
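Inside the process, CUDA renumbers the visible cards from 0 in the order they are listed: with CUDA_VISIBLE_DEVICES=0,2 the program sees two GPUs, and its device 1 (for example /GPU:1 in TensorFlow) is physical card 2. The small pure-Python sketch below makes that mapping explicit. `parse_visible_devices` and `physical_gpu_for` are hypothetical helpers, and they only handle plain integer indices, not the GPU UUIDs that CUDA also accepts.

```python
import os


def parse_visible_devices(value):
    """Parse a CUDA_VISIBLE_DEVICES string such as "0,2" into a list of ints.

    Sketch only: assumes plain integer indices (GPU UUIDs are not handled).
    """
    return [int(part) for part in value.split(",") if part.strip()]


def physical_gpu_for(logical_index, value=None):
    """Return the physical card number behind an in-process (logical) GPU index."""
    if value is None:
        value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    visible = parse_visible_devices(value)
    return visible[logical_index]


# With CUDA_VISIBLE_DEVICES=0,2 the process sees two GPUs:
print(physical_gpu_for(0, "0,2"))  # 0
print(physical_gpu_for(1, "0,2"))  # 2
```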

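If you always use the same GPUs, you can put the CUDA_VISIBLE_DEVICES setting in ~/.bashrc so that it is applied automatically whenever you log in to Linux. To spread the workload by running different programs on different GPUs, set CUDA_VISIBLE_DEVICES directly on each command line: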

# Use the first GPU
CUDA_VISIBLE_DEVICES=0 python my_script1.py

# Use the second and third GPUs
CUDA_VISIBLE_DEVICES=1,2 python my_script2.py
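If the jobs are started from a Python driver script instead of the shell, the same per-command assignment can be made by handing each child process a modified copy of the environment. This is a minimal sketch: `run_on_gpus` is a hypothetical helper, and the child used here is just `python -c` printing the variable, so it runs on a machine without any GPU.

```python
import os
import subprocess
import sys


def run_on_gpus(gpu_ids, args):
    """Run `args` as a child process that can only see the given GPU cards."""
    env = os.environ.copy()  # keep PATH and friends, override only the GPU list
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    result = subprocess.run(args, env=env, capture_output=True, text=True, check=True)
    return result.stdout


# Child that simply reports which GPUs it was given
probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
print(run_on_gpus([0], probe).strip())     # 0
print(run_on_gpus([1, 2], probe).strip())  # 1,2
```

For a real job the arguments would be something like `[sys.executable, "my_script1.py"]`. `subprocess.run` waits for each child to finish; to run the jobs concurrently, start them with `subprocess.Popen` and the same `env` instead.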

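You can also change CUDA_VISIBLE_DEVICES from inside the Python program. It works the same way, but keeps the GPU selection logic in the script itself. The assignment only takes effect if it runs before TensorFlow initializes CUDA, so put it before import tensorflow: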

import os

# Use the first and third GPUs
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"
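Because CUDA reads the variable only once, when it initializes, setting it too late is silently ignored. The sketch below wraps the assignment in a hypothetical `select_gpus` helper that warns when TensorFlow is already imported. That check is a conservative heuristic (importing TensorFlow does not necessarily initialize CUDA yet), not an exact test.

```python
import os
import sys
import warnings


def select_gpus(*gpu_ids):
    """Restrict this process to the given GPUs; call before CUDA starts."""
    if "tensorflow" in sys.modules:
        # Heuristic: TensorFlow is already loaded, so CUDA may already be
        # initialized and the new value could be ignored.
        warnings.warn("set CUDA_VISIBLE_DEVICES before importing tensorflow")
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


select_gpus(0, 2)
# import tensorflow as tf  # import only after the GPUs have been chosen
```

Calling `select_gpus()` with no arguments sets an empty string, which hides every GPU and forces CPU-only execution.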

Limiting GPU memory usage

In TensorFlow 1.x, tf.GPUOptions controls how much GPU memory the program takes:

import tensorflow as tf
W = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2], name='W')
x = tf.constant([1.3, 2.4], shape=[2, 1], name='x')
y = tf.matmul(W, x)

# Use at most 30% of the memory on each visible GPU
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
print(sess.run(y))
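The fraction is relative to each card's total memory, so 0.3 on an 11 GiB card (an assumed example size) caps the process at roughly 3.3 GiB. `tf.GPUOptions` and `tf.Session` belong to the TensorFlow 1.x API; in TensorFlow 2.x (2.4 or later assumed) the closest native equivalent is a logical device with a fixed `memory_limit` in megabytes. In the sketch below, `fraction_to_mb` and `limit_gpu_memory` are hypothetical helpers, and TensorFlow is imported inside the function so the arithmetic can be tried without it.

```python
def fraction_to_mb(total_mb, fraction):
    """Convert a memory fraction (like per_process_gpu_memory_fraction) to MB."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


def limit_gpu_memory(limit_mb):
    """TF 2.x sketch: cap each visible GPU at a fixed number of megabytes.

    Must run before TensorFlow initializes the GPUs; afterwards it raises RuntimeError.
    """
    import tensorflow as tf  # lazy import so fraction_to_mb works without TF

    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.set_logical_device_configuration(
            gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mb)]
        )


# 30% of an assumed 11 GiB (11264 MB) card
print(fraction_to_mb(11264, 0.3))  # 3379
```

Usage would be `limit_gpu_memory(fraction_to_mb(11264, 0.3))` at the top of the script. Unlike a fraction, a fixed limit in MB is the same on every card, even if the cards differ in size.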

In Keras programs that use the TensorFlow backend, the GPU memory usage can be set like this:

import tensorflow as tf

# Use at most 30% of the memory on each visible GPU
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

# Make Keras use this TensorFlow session
tf.keras.backend.set_session(sess)

# Build the model with Keras
# ...
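In TensorFlow 2.x, `tf.GPUOptions`, `tf.Session`, and `tf.keras.backend.set_session` are no longer available under those names; as a reader points out in the comments, the `tf.compat.v1` versions can be used instead. The sketch below assumes TensorFlow 2.x. `gpu_options_kwargs` and `configure_keras_session` are hypothetical helpers, and whether the session's GPU options take effect can depend on the TensorFlow version and on eager execution, so it is worth checking the result with nvidia-smi.

```python
def gpu_options_kwargs(memory_fraction=None, allow_growth=False):
    """Build keyword arguments for tf.compat.v1.GPUOptions (hypothetical helper)."""
    kwargs = {}
    if memory_fraction is not None:
        if not 0.0 < memory_fraction <= 1.0:
            raise ValueError("memory_fraction must be in (0, 1]")
        kwargs["per_process_gpu_memory_fraction"] = memory_fraction
    if allow_growth:
        kwargs["allow_growth"] = True
    return kwargs


def configure_keras_session(memory_fraction=None, allow_growth=False):
    """TF 2.x sketch: give Keras a TF1-style session with the chosen GPU options."""
    import tensorflow as tf  # lazy import so gpu_options_kwargs works without TF

    gpu_options = tf.compat.v1.GPUOptions(
        **gpu_options_kwargs(memory_fraction, allow_growth)
    )
    config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
    sess = tf.compat.v1.Session(config=config)
    tf.compat.v1.keras.backend.set_session(sess)  # TF 2.x location of set_session
    return sess


print(gpu_options_kwargs(memory_fraction=0.3))
# {'per_process_gpu_memory_fraction': 0.3}
```

Call `configure_keras_session(memory_fraction=0.3)` (or `allow_growth=True`) once, before building the model.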

Letting GPU memory usage grow automatically

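A fixed memory limit guarantees that the program never takes too much GPU memory, but if the program really does need more, it fails with a ResourceExhaustedError. A compromise is to let memory usage grow on demand: the program takes only as much memory as it needs and leaves the rest for others. Note that TensorFlow does not give memory back once it has allocated it, so the usage only ever grows: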

import tensorflow as tf
W = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2], name='W')
x = tf.constant([1.3, 2.4], shape=[2, 1], name='x')
y = tf.matmul(W, x)

# Let GPU memory usage grow automatically
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
print(sess.run(y))
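The TensorFlow 2.x counterpart of `allow_growth` is `tf.config.experimental.set_memory_growth`; TensorFlow also honors the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which needs no session object and therefore covers Keras code as well. In this sketch, `enable_memory_growth` is a hypothetical helper, TensorFlow 2.1 or later is assumed for the native branch, and either branch must run before TensorFlow starts using the GPUs.

```python
import os


def enable_memory_growth(use_env_var=False):
    """Sketch: make TensorFlow 2.x allocate GPU memory on demand.

    use_env_var=True only sets TF_FORCE_GPU_ALLOW_GROWTH, which TensorFlow reads
    when it creates its GPU allocator, so it must be set before the GPU is used.
    """
    if use_env_var:
        os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
        return
    import tensorflow as tf  # lazy import; TF 2.x API below

    for gpu in tf.config.list_physical_devices("GPU"):
        # Raises RuntimeError if the GPUs have already been initialized
        tf.config.experimental.set_memory_growth(gpu, True)


enable_memory_growth(use_env_var=True)
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])  # true
```

The variable can equally be set in the shell, for example `TF_FORCE_GPU_ALLOW_GROWTH=true python my_script.py`.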

Keras works the same way:

import tensorflow as tf

# Let GPU memory usage grow automatically
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

# Make Keras use this session
tf.keras.backend.set_session(sess)

# Build the model with Keras
# ...

References: Kevin Chan's blog, csdn, StackOverflow

2 comments

Tony

Hello, I tried the allow_growth approach from your tutorial in Keras and got the following log:
Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.19GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

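Does this log mean that no error occurred? Could it cause other problems? When I run fit_generator without setting allow_growth, all the memory is allocated by default and this message does not appear.
Thanks in advance for your help!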

2019/05/30
HarryPotter

Thank you so much, now I don't have to keep an eye on VRAM usage all the time.

Only one thing needs to change, because TF 2.0 removed some APIs; to keep the memory control working, point the call at the right function:
tf.keras.backend.set_session(sess)
change it to
tf.compat.v1.keras.backend.set_session(sess)
and the TF 1.x-style memory usage controls then work on TF 2.x.

2020/02/19

G. T. Wang
