[MLOps] How to Use WandB (Weights & Biases)

What is WandB?

WandB is a tool for easily tracking and visualizing deep learning experiments. The name wandb is short for Weights and Biases, two terms that come up constantly in deep learning.

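What it can do

Save the parameters used during training
Explore, compare, and visualize each experiment you have run
Analyze how the training environment's system resources are being used
Collaborate with other people
Reproduce past experiment results
Tune hyperparameters
Store every experiment record permanently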

Installation

First, go to the wandb homepage (https://wandb.ai/site) and sign up.

Then install the wandb library from a Linux shell.

pip install wandb

To log in, enter the following command in the terminal.

wandb login

wandb: You can find your API key in your browser here: https://app.wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:

**Note: the key is not displayed when you paste it here, but it has been entered, so just press Enter right after pasting.

Copy the API key shown at that link (https://wandb.ai/authorize) and enter it at the prompt to complete the initial setup.

Example (Train)

#%%
from tqdm import tqdm
import wandb
import argparse
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
#%%
"""Initialize wandb"""
project = "wandb_practice" # put your WANDB project name
entity = "wotjd1410" # put your WANDB username

run = wandb.init(
    project=project, 
    # entity=entity, 
    tags=["exercise"], # put tags of this python project
)
#%%
# Parse the settings that will be passed to wandb.
def get_args(debug):
    parser = argparse.ArgumentParser('parameters')
    
    
    parser.add_argument('--epochs', default=20, type=int,
                        help='Number epochs to train model.')
    parser.add_argument("--batch_size", type=int, default=16, 
                        help="Batch size")
    parser.add_argument("--lr", type=float, default=5e-5, 
                        help="learning rate")
    if debug:
        return parser.parse_args(args=[])
    else:    
        return parser.parse_args()
#%%
def main():
    #%%
    config = vars(get_args(debug=True)) # default configuration; debug=True ignores command-line arguments
    wandb.config.update(config)
    #%%
    """1. Generate synthetic data (y = 3*x + 2 + noise)"""
    np.random.seed(42)
    x = np.random.rand(1000, 1)
    y = 3 * x + 2 + np.random.randn(1000, 1) * 0.1

    # Convert the data to PyTorch tensors.
    x_tensor = torch.FloatTensor(x)
    y_tensor = torch.FloatTensor(y)

    # Build the dataset and the data loader.
    dataset = TensorDataset(x_tensor, y_tensor)
    dataloader = DataLoader(dataset, batch_size=config["batch_size"], shuffle=True)
    #%%
    """2. Define a simple linear regression model (1-D input -> 1-D output)"""
    model = nn.Sequential(
        nn.Linear(1, 1)
    )

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print('Current device is', device)

    model.to(device)

    # Set up the loss function and the optimizer
    criterion = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=config["lr"])
    #%%
    """3. Training loop: train the model for the configured number of epochs"""
    for epoch in tqdm(range(config["epochs"]), desc="training..."):
        epoch_loss = 0
        for batch_x, batch_y in dataloader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)
            optimizer.zero_grad()             # Reset gradients from the previous step
            preds = model(batch_x)            # Compute the model's predictions
            loss = criterion(preds, batch_y)  # Compute the loss
            loss.backward()                   # Backpropagate
            optimizer.step()                  # Update the model parameters
            
            epoch_loss += loss.item() * batch_x.size(0)
        
        epoch_loss /= len(dataset)
        
        # Log metrics to wandb: the epoch number and the loss value
        wandb.log({"epoch": epoch + 1, "loss": epoch_loss})
        print(f"Epoch {epoch + 1}, Loss: {epoch_loss}")
    #%%
    # Finish the wandb run (the config was already sent at the start of main)
    wandb.finish()
#%%
if __name__ == "__main__":
    main()

wandb.init()

wandb.init() starts tracking and logging the current run in wandb.

Import wandb first, then initialize it near the start of your code. It feels similar to a class's __init__.

Every parameter of init can be omitted. Setting a run name is not required either, so name runs however you like.

**See the wandb documentation (https://docs.wandb.ai/ref/python/init) for details.

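wandb.config

Passes the settings you want to record (in the example above, the hyperparameter values) to wandb. Note that wandb.config is an object rather than a function; values are added with wandb.config.update().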

wandb.log()

Sends data in dictionary form to wandb.

Now let's run it.

Once the script runs, the run page on wandb shows a chart of how the loss value passed through wandb.log changes over the epochs.
