Visualizing latent data#

This page shows how to visualize the latent data introduced in Section 06: Latent Diffusion and Stable Diffusion, Lecture 18: Components of Stable Diffusion.

Setup#

Install the diffusers library, which makes diffusion models easy to use.

pip install diffusers==0.16.1

Loading the VAE#

Load the VAE that converts an input image into latent data. Here we use runwayml/stable-diffusion-v1-5, the same model loaded as Stable Diffusion in earlier lectures.

import torch
from diffusers import AutoencoderKL

model_id = "runwayml/stable-diffusion-v1-5"
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")

Note: if `accelerate` is not installed, loading prints a warning that the model cannot be initialized with low CPU memory usage and that `low_cpu_mem_usage=False` is used instead. Installing it with pip install accelerate is strongly recommended for faster and less memory-intensive model loading.

Converting an input image to latent data#

Using the loaded VAE, define a pil_to_latents function that converts an input image into latent data as follows.

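import torchvision.transforms as transforms
from PIL.Image import Image as PilImage

def pil_to_latents(image: PilImage, scale_factor: float = 0.18215) -> torch.Tensor:
    # Convert the PIL image to a float tensor of shape (C, H, W) with values in [0, 1]
    init_image = transforms.ToTensor()(image)
    # Add a batch dimension: (1, C, H, W)
    init_image = init_image.unsqueeze(dim=0)
    # Rescale from [0, 1] to [-1, 1] before encoding
    init_image = init_image * 2.0 - 1.0
    # Encode, draw a sample from the latent distribution, and apply the scaling factor
    init_latents = vae.encode(init_image).latent_dist.sample() * scale_factor
    return init_latents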
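
The tensor preprocessing inside pil_to_latents can be checked without loading the VAE. The following is a minimal sketch that reproduces those steps with NumPy only. The helper name `preprocess_like_pil_to_latents` and the 512x512 dummy image are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np


def preprocess_like_pil_to_latents(pixels: np.ndarray) -> np.ndarray:
    """Hypothetical helper: mimic the preprocessing steps of pil_to_latents.

    `pixels` is an (H, W, C) uint8 array, e.g. from np.asarray(pil_image).
    """
    # Same effect as ToTensor(): uint8 [0, 255] -> float [0, 1], channels first
    x = pixels.astype(np.float32) / 255.0
    x = np.transpose(x, (2, 0, 1))  # (H, W, C) -> (C, H, W)
    # Same effect as unsqueeze(dim=0): add a batch dimension -> (1, C, H, W)
    x = x[np.newaxis, ...]
    # Rescale [0, 1] -> [-1, 1]
    return x * 2.0 - 1.0


# Stand-in for an RGB image (assumed size): all black except one white pixel
dummy = np.zeros((512, 512, 3), dtype=np.uint8)
dummy[0, 0] = 255

batch = preprocess_like_pil_to_latents(dummy)
print(batch.shape, batch.min(), batch.max())
```

Black pixels map to -1 and white pixels to 1, so the batch passed to vae.encode is centered on zero.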

Download the image used for the visualization. Here we use the image of Image Generation AI-kun (画像生成 AI くん), which appears many times in the course.

from diffusers.utils import load_image

url = "https://github.com/shunk031/coloso-python-diffusion-models/blob/main/assets/image_generation_ai_kun.png?raw=true"

image = load_image(url)
image
[Figure: the loaded input image]

Use the pil_to_latents function defined above to convert the loaded image into latent data.

latent_img = pil_to_latents(image)
print(f"Dimension of this latent representation: {latent_img.shape}")
Dimension of this latent representation: torch.Size([1, 4, 64, 64])
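
The printed shape can be reasoned about with simple arithmetic. The sketch below assumes the Stable Diffusion v1.5 VAE reduces each spatial side by a factor of 8 and produces 4 latent channels, and that the input image is 512x512 RGB, which is consistent with the 64x64 output above but is not stated in the notebook. The function name `expected_latent_shape` is hypothetical.

```python
def expected_latent_shape(
    height: int,
    width: int,
    batch_size: int = 1,
    latent_channels: int = 4,    # assumption: SD v1.5 VAE latent channels
    downsample_factor: int = 8,  # assumption: SD v1.5 VAE spatial reduction
) -> tuple:
    """Hypothetical helper: predict the latent shape for a given image size."""
    # For simplicity this sketch only handles sizes divisible by the factor
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("height and width must be multiples of the factor")
    return (
        batch_size,
        latent_channels,
        height // downsample_factor,
        width // downsample_factor,
    )


shape = expected_latent_shape(512, 512)
print(shape)

# Ratio between the number of values in the RGB image and in the latent
image_values = 3 * 512 * 512
latent_values = shape[1] * shape[2] * shape[3]
compression_ratio = image_values // latent_values
print(compression_ratio)
```

Under these assumptions the latent holds far fewer values than the image, which is what makes running diffusion in latent space cheaper than in pixel space.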

Visualize the resulting latent data as follows.

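import matplotlib.pyplot as plt

# One subplot per latent channel (the latent has 4 channels)
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for c in range(4):
    # Detach from the autograd graph and move to the CPU before plotting
    axes[c].imshow(latent_img[0][c].detach().cpu(), cmap='Greys')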
[Figure: the four latent channels shown side by side as grayscale images]
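
By default, imshow scales the colormap of each panel to that panel's own minimum and maximum, so brightness is not directly comparable between channels. One possible variation is to use a single color range for all four panels. The sketch below computes such a range with NumPy. The helper name `shared_color_limits` is hypothetical, and random values stand in for the real latent data.

```python
import numpy as np


def shared_color_limits(latents: np.ndarray) -> tuple:
    """Hypothetical helper: one (vmin, vmax) pair covering every channel."""
    return float(latents.min()), float(latents.max())


# Stand-in for latent_img[0]: random values with the same (4, 64, 64) shape
rng = np.random.default_rng(0)
fake_latents = rng.normal(size=(4, 64, 64)).astype(np.float32)

vmin, vmax = shared_color_limits(fake_latents)

# Each channel has its own range; the shared limits enclose all of them
for c, channel in enumerate(fake_latents):
    print(c, float(channel.min()), float(channel.max()))
print("shared:", vmin, vmax)
```

With the real data, the array would be latent_img[0].detach().cpu().numpy(), and the limits would be passed to each call as imshow(..., cmap='Greys', vmin=vmin, vmax=vmax).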