
[AlexNet] Paper Review and Implementation (with Code Walkthrough)

by 이 정규 2023. 2. 8.

AlexNet


Description:

  • Key features of AlexNet:
  1. AlexNet Architecture
    • Convolutional layers = 5
    • Fully connected layers = 3
  2. First network to be trained across two GPUs in parallel
  3. Activation function: ReLU applied to every convolutional and fully-connected layer (shows a lower error than tanh)
  4. Max-pooling layers: earlier networks typically used average pooling
    • kernel size = 3x3
    • stride = 2
    • number of max-pooling layers = 3
  5. Dropout: prevents over-fitting
    • applied to the 1st and 2nd fully-connected layers
    • dropout rate = 0.5
  6. LRN (local response normalization):
    • ReLU passes very large activations through unchanged, so one strong pixel can dominate its neighbors
    • to prevent this, activations at the same spatial position are normalized across neighboring channels (see the formula after this list)
    • applied after ReLU in the 1st and 2nd convolutional layers (k = 2, n = 5, α = 1e-4, β = 0.75)
    • an LRN layer increases memory usage and computational cost (a larger model footprint)
    • helps regularization (reduces error)
    • a classic demonstration of the underlying lateral inhibition:
      • the Hermann Grid
      • white lines cross between black squares; when you fixate on one black square, gray dots appear at the intersections in your peripheral vision
      • conversely, when you fixate on a white line, the gray dots disappear
      • the surrounding white regions trigger lateral inhibition, so the white at the intersections appears dimmed
  7. Weight and bias initialization (convolutional and fully-connected layers):
    • all layer weights initialized from a zero-mean Gaussian distribution with standard deviation (STD) 0.01
    • neuron bias: constant 1 for the 2nd, 4th, and 5th convolutional layers and for the fully-connected layers; constant 0 for the remaining layers
  8. Hyperparameters:
    • optimizer: changed from SGD to Adam for this implementation
    • momentum: 0.9 (the paper's SGD setting; unused with Adam here)
    • weight decay: 5e-4 (the paper's setting; not applied in this implementation)
    • batch size: 128
    • learning rate: changed from 0.01 to 0.0001
    • epochs: 90 -> 10
    • learning-rate schedule: when the validation error stops improving at the current lr, divide the lr by 10 (see the scheduler sketch in the Loss and Optimizer section)
  9. Image preprocessing:
    • input image shape = 227x227x3 (the paper states 224x224, but 227 is what makes the layer arithmetic work out)
    • resize = 256x256
    • center crop = 227
    • per-channel mean subtraction of RGB
  10. Dataset:
    • paper: ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012
    • this implementation: CIFAR10
  11. System environment:
    • Google Colab Pro
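
For reference, the LRN operation from the paper, where $a^i_{x,y}$ is the ReLU activation of kernel $i$ at position $(x, y)$, $N$ is the number of kernels in the layer, and the sum runs over $n$ neighboring kernel maps at the same spatial position:

$$b^i_{x,y} = a^i_{x,y} \Big/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a^j_{x,y} \right)^2 \right)^{\beta}$$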


Load Modules

# Utils
import numpy as np
import matplotlib.pyplot as plt

# Torch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary
from torchvision import transforms
import torchvision
from torchvision.utils import make_grid

# TensorBoard (torch ships its own SummaryWriter; the tensorboardX import would only shadow it)
from torch.utils.tensorboard import SummaryWriter

Set Hyperparameters

batch_size = 128
num_epochs = 10
learning_rate = 0.0001

Load Data - CIFAR10

train_set = torchvision.datasets.CIFAR10(root='./cifar10', train=True, download=True, transform=transforms.ToTensor())
test_set = torchvision.datasets.CIFAR10(root='./cifar10', train=False, download=True, transform=transforms.ToTensor())

Visualization

dataiter = iter(train_set)
image, label = next(dataiter)  # the dataset yields one (image, label) sample at a time

img   = make_grid(image, padding=0)  # make_grid also accepts a single 3D tensor
npimg = img.numpy()
plt.figure(figsize=(5, 7))
plt.imshow(np.transpose(npimg, (1, 2, 0)))  # CHW -> HWC for matplotlib
plt.show()
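
To check which class the sampled image belongs to, torchvision's CIFAR10 exposes the class names as a list:

print(train_set.classes[label])  # map the integer label to its class name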

Mean subtraction of RGB per channel

train_meanRGB = [np.mean(x.numpy(), axis=(1,2)) for x, _ in train_set]
train_stdRGB = [np.std(x.numpy(), axis=(1,2)) for x, _ in train_set]

train_meanR = np.mean([m[0] for m in train_meanRGB])
train_meanG = np.mean([m[1] for m in train_meanRGB])
train_meanB = np.mean([m[2] for m in train_meanRGB])
train_stdR = np.mean([s[0] for s in train_stdRGB])
train_stdG = np.mean([s[1] for s in train_stdRGB])
train_stdB = np.mean([s[2] for s in train_stdRGB])

print(train_meanR, train_meanG, train_meanB)
print(train_stdR, train_stdG, train_stdB)
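
Iterating the dataset sample by sample is slow. As a sketch of a faster alternative, the same per-channel statistics can be computed in one vectorized pass over the raw uint8 array that torchvision's CIFAR10 keeps in train_set.data (note: the std here is taken over all pixels at once, so it differs slightly from averaging per-image stds as above):

data = train_set.data / 255.0     # N x 32 x 32 x 3, scaled to [0, 1] like ToTensor
print(data.mean(axis=(0, 1, 2)))  # per-channel mean (R, G, B)
print(data.std(axis=(0, 1, 2)))   # per-channel std over all pixels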

Define the image transformation for data

train_transformer = transforms.Compose([transforms.Resize((256,256)),
                                        transforms.CenterCrop(227),
                                        transforms.ToTensor(),
                                        transforms.Normalize(mean=[train_meanR, train_meanG, train_meanB], std=[train_stdR, train_stdG, train_stdB])])

train_set.transform = train_transformer
test_set.transform = train_transformer

Define DataLoader

trainloader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=2)  # no need to shuffle for evaluation
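
A quick sanity check that the transform pipeline produces batches of the expected shape:

images, labels = next(iter(trainloader))
print(images.shape)  # expected: torch.Size([128, 3, 227, 227])
print(labels.shape)  # expected: torch.Size([128])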

Create AlexNet Model

class AlexNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),

            nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),

            nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )

        self.fc = nn.Sequential(
            nn.Linear(in_features=256 * 6 * 6, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),

            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),

            nn.Linear(in_features=4096, out_features=10),
            nn.LogSoftmax(dim=1)
        )

        self.apply(self._init_weight_bias)

    def _init_weight_bias(self, module):
        classname = module.__class__.__name__
        if classname == 'Conv2d':
            nn.init.normal_(module.weight, mean=0.0, std=0.01)
            nn.init.constant_(module.bias, 0)
            # bias = 1 for the 2nd, 4th, and 5th conv layers (in_channels 96 or 384), as in the paper
            if module.in_channels == 96 or module.in_channels == 384:
                nn.init.constant_(module.bias, 1)
        elif classname == 'Linear':
            # the paper initializes fully-connected biases to 1 as well
            nn.init.normal_(module.weight, mean=0.0, std=0.01)
            nn.init.constant_(module.bias, 1)

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x
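
The in_features of the first linear layer follows from the spatial arithmetic: 227 -> conv1 (11x11, stride 4) -> 55 -> max-pool -> 27 -> conv2 (pad 2) -> 27 -> max-pool -> 13 -> conv3/4/5 (pad 1) -> 13 -> max-pool -> 6, leaving a 256 x 6 x 6 feature map. A quick check:

with torch.no_grad():
    print(AlexNet().conv(torch.zeros(1, 3, 227, 227)).shape)  # torch.Size([1, 256, 6, 6])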

Set Device and Model

use_cuda = torch.cuda.is_available()
print("use_cuda : ", use_cuda)

device = torch.device("cuda:0" if use_cuda else "cpu")

model = AlexNet().to(device)

# Forward a dummy 227x227 RGB batch to sanity-check the model, then print a layer summary
X = torch.randn(size=(1, 3, 227, 227)).to(device)
print(model(X))
print(summary(model, (3, 227, 227)))

Loss and Optimizer

criterion = F.nll_loss  # NLL loss pairs with the model's LogSoftmax output
optimizer = torch.optim.Adam(params=model.parameters(), lr=learning_rate)
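
Item 8 of the description calls for dividing the learning rate by 10 once the validation error stops improving. This implementation does not include it, but a minimal sketch using PyTorch's built-in ReduceLROnPlateau scheduler would look like this (the patience value is an assumption, not from the paper):

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)

# then, at the end of each epoch in the training loop:
# scheduler.step(test_loss)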

Training Loop

writer = SummaryWriter("./alexnet/tensorboard") 

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # the DataLoader already collates CIFAR10 targets into a LongTensor
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 30 == 0:
            print(f"{batch_idx * len(data)}/{len(train_loader.dataset)}  loss: {loss.item():.4f}")

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum per-sample losses so dividing by the dataset size below gives the mean loss
            test_loss += criterion(output, target, reduction='sum').item()
            pred = output.argmax(1)
            correct += float((pred == target).sum())

    test_loss /= len(test_loader.dataset)
    correct /= len(test_loader.dataset)

    return test_loss, correct

Per-Epoch Activity

from tqdm import tqdm

for epoch in tqdm(range(1, num_epochs + 1)):
    train(model, device, trainloader, optimizer, epoch)
    test_loss, test_accuracy = test(model, device, testloader)
    writer.add_scalar("Test Loss", test_loss, epoch)
    writer.add_scalar("Test Accuracy", test_accuracy, epoch)
    print(f"Processing Result = Epoch : {epoch}   Loss : {test_loss}   Accuracy : {test_accuracy}")

writer.close()  # close once, after all epochs have been logged

Result

print(f"Result of AlexNet = Epoch:{epoch}   Loss:{test_loss}   Accuracy:{test_accuracy}")

Visualization

%load_ext tensorboard
%tensorboard --logdir=./alexnet/tensorboard --port=6006

Visualize the accuracy and loss curves with TensorBoard.
