Example for image sequence classifier #60

Open
selcukyazarklu opened this issue Jan 2, 2024 · 3 comments

Comments

@selcukyazarklu

Hi,

Could you supply more detailed steps for image sequence classification?

I have 200x200 images with 3 channels, for about 49 classes.

Regards.

@MalekWahidi

MalekWahidi commented Jan 2, 2024

To classify image sequences with dimensions (200,200,3) for about 49 classes, you can adapt the Atari behavior cloning example from the documentation with some modifications:

  • Use a similar approach to the ConvBlock class in the documentation but modify the first convolutional layer to accept 3-channel inputs. Ensure that the output of this block matches the input requirements of the first recurrent layer (input to NCP model).
  • As your task involves handling sequences of images, make sure your data loader provides batches of sequences. You'll need to reshape your inputs to the dimensions the model expects, e.g. (batch, sequence_length, channels, height, width).
  • Train your model using a similar approach as described in the documentation. Use a suitable loss function for multi-class classification (Categorical Cross-Entropy Loss) and an optimizer like Adam.
  • If your task is to classify the entire sequence as a whole (e.g., video classification where the entire sequence determines the class), you might want to set return_sequences=False. Otherwise, if your task requires making decisions or predictions at each time step (e.g., frame-by-frame classification in a video), you should set return_sequences=True to get each time step's output from the NCP model.
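The two return_sequences settings differ mainly in the tensor shapes that reach the loss. The following is a minimal sketch that uses random logits in place of real model outputs; the batch size of 4 and sequence length of 10 are arbitrary assumptions. It shows how nn.CrossEntropyLoss can be applied to one prediction per sequence versus one prediction per frame, with labels given as integer class indices.

```python
import torch
import torch.nn as nn

batch, seq_len, n_classes = 4, 10, 49  # batch and seq_len are illustrative
criterion = nn.CrossEntropyLoss()

# Whole-sequence classification (return_sequences=False):
# one logit vector and one integer label per sequence.
seq_logits = torch.randn(batch, n_classes)
seq_labels = torch.randint(0, n_classes, (batch,))
seq_loss = criterion(seq_logits, seq_labels)

# Per-frame classification (return_sequences=True):
# one logit vector and one integer label per time step. Flatten batch and
# time so the loss sees (batch * seq_len, n_classes) and (batch * seq_len,).
frame_logits = torch.randn(batch, seq_len, n_classes)
frame_labels = torch.randint(0, n_classes, (batch, seq_len))
frame_loss = criterion(
    frame_logits.reshape(-1, n_classes),
    frame_labels.reshape(-1),
)

print(seq_loss.shape, frame_loss.shape)  # both are scalar tensors
```

In the per-frame case the dataset would also need to supply a label for every frame rather than one per sequence.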

Basic PyTorch template code:

Import libraries

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

from ncps.torch import CfC
from ncps.wirings import AutoNCP

Define the Convolutional Block

class ConvBlock(nn.Module):
    def __init__(self):
        super(ConvBlock, self).__init__()
        # Adjust these layers to match your 200x200x3 input
        self.conv1 = nn.Conv2d(3, 64, 5, padding=2, stride=2)
        self.conv2 = nn.Conv2d(64, 128, 5, padding=2, stride=2)
        self.bn2 = nn.BatchNorm2d(128)
        self.conv3 = nn.Conv2d(128, 256, 5, padding=2, stride=2)
        self.bn3 = nn.BatchNorm2d(256)
        # Add more layers as needed

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        # Global average pooling
        x = x.mean((-1, -2))  
        return x
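As a quick sanity check of the block above, the spatial size after each stride-2 convolution can be worked out with the standard Conv2d output-size formula, and the pooled feature size confirmed with a dummy batch. This is only a sketch: the helper conv_out_size is a hypothetical name introduced here, and the ConvBlock layers are repeated so that the check runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_out_size(size, kernel=5, padding=2, stride=2):
    # Standard Conv2d output-size formula (dilation 1).
    return (size + 2 * padding - kernel) // stride + 1


class ConvBlock(nn.Module):
    # Same layers as the template, repeated so this check is self-contained.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 5, padding=2, stride=2)
        self.conv2 = nn.Conv2d(64, 128, 5, padding=2, stride=2)
        self.bn2 = nn.BatchNorm2d(128)
        self.conv3 = nn.Conv2d(128, 256, 5, padding=2, stride=2)
        self.bn3 = nn.BatchNorm2d(256)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        return x.mean((-1, -2))  # global average pooling over H and W


# Spatial size after each of the three convolutions, starting from 200.
sizes = [200]
for _ in range(3):
    sizes.append(conv_out_size(sizes[-1]))
print(sizes)  # [200, 100, 50, 25]

# 2 sequences of 5 frames each, with batch and time already merged.
frames = torch.randn(2 * 5, 3, 200, 200)
features = ConvBlock()(frames)
print(features.shape)  # torch.Size([10, 256])
```

The 256 features per frame are what the recurrent layer receives, so n_features in the combined model should be 256 unless the block is changed.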

Define the combined Convolutional and CfC model

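class ConvCfC(nn.Module):
    def __init__(self, n_classes, n_features=256, n_neurons=128):
        super().__init__()
        self.conv_block = ConvBlock()
        # n_features must equal the ConvBlock output size (256 after pooling).
        # n_neurons must be comfortably larger than n_classes; 128 is a placeholder.
        wiring = AutoNCP(n_neurons, n_classes)  # n_classes output (motor) neurons
        # With a wiring, the CfC output already has n_classes entries, so the
        # unused fully connected layer from the first draft is dropped.
        self.rnn = CfC(n_features, wiring, batch_first=True, return_sequences=False)

    def forward(self, x, hx=None):
        # x: (batch, seq_len, channels, height, width)
        batch_size, seq_len = x.size(0), x.size(1)
        # Merge batch and sequence dimensions so the CNN sees a batch of frames
        x = x.reshape(batch_size * seq_len, *x.shape[2:])
        x = self.conv_block(x)
        # Restore the sequence dimension: (batch, seq_len, n_features)
        x = x.view(batch_size, seq_len, -1)
        # hx is the recurrent state; None lets the layer use its default initial state
        x, hx = self.rnn(x, hx)
        return x, hx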

Define your custom dataset

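class MyDataset(Dataset):
    # Each item should be (sequence, label): a float tensor of shape
    # (seq_len, 3, 200, 200) and an integer class index in [0, 49).
    def __init__(self):
        # Load file paths, labels, etc. here
        pass

    def __len__(self):
        # Return the number of sequences in the dataset
        raise NotImplementedError

    def __getitem__(self, idx):
        # Return the (sequence, label) pair at index idx
        raise NotImplementedError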
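To try the pipeline before real data is wired in, a stand-in dataset of random tensors can be used. The sketch below is purely illustrative: RandomSequenceDataset, its sample count, and its sequence length are hypothetical choices, not part of the library or the original template. It shows the item shapes the DataLoader is expected to batch.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomSequenceDataset(Dataset):
    """Hypothetical stand-in: random frames with random class labels."""

    def __init__(self, n_samples=8, seq_len=5, n_classes=49):
        # (n_samples, seq_len, channels, height, width)
        self.frames = torch.randn(n_samples, seq_len, 3, 200, 200)
        # One integer class index per sequence.
        self.labels = torch.randint(0, n_classes, (n_samples,))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # One item is a full sequence plus a single integer label.
        return self.frames[idx], self.labels[idx]


loader = DataLoader(RandomSequenceDataset(), batch_size=4, shuffle=True)
inputs, labels = next(iter(loader))
print(inputs.shape)  # torch.Size([4, 5, 3, 200, 200])
print(labels.shape)  # torch.Size([4])
```

The default collate function stacks the per-item tensors, which yields the (batch, sequence_length, channels, height, width) layout without any extra reshaping, provided every sequence has the same length.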

Instantiate model, criterion, optimizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
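# n_features matches the 256-channel ConvBlock output; n_neurons=128 is a placeholder
model = ConvCfC(n_classes=49, n_features=256, n_neurons=128).to(device)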
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Load your dataset

train_ds = MyDataset()  # Implement this
trainloader = DataLoader(train_ds, batch_size=32, shuffle=True)

Training loop

num_epochs = 10  # placeholder; tune for your dataset
for epoch in range(num_epochs):
    model.train()
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs, hx = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    print(f'Epoch {epoch+1}, Loss: {loss.item()}')

Keep in mind that this is just a quick draft of the general layout and a lot should be modified based on best practices and trial-and-error.

@by90

by90 commented Apr 8, 2024

outputs, hx = model(inputs) ... I found there isn't any example about hx. How do we use the hidden state? Here, you haven't passed hx in, as in outputs, hx = model(inputs, hx), and the returned hx isn't used.
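One common use of a returned hidden state is to carry it between consecutive chunks of one long sequence, so the recurrent layer resumes where it left off instead of restarting from its initial state. The sketch below is an illustration under stated assumptions, not the library's documented recipe: nn.GRU stands in for the CfC layer so that the example depends only on PyTorch, and all sizes are arbitrary. The template's self.rnn(x, hx) call follows the same pass-in, get-back pattern.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in recurrent layer (assumption: used here instead of CfC).
rnn = nn.GRU(input_size=256, hidden_size=32, batch_first=True)
rnn.eval()

features = torch.randn(1, 20, 256)  # one long sequence of 20 feature vectors

with torch.no_grad():
    # Process the whole sequence in one call, starting from the default state.
    full_out, full_hx = rnn(features)

    # Process the same sequence in two chunks of 10 steps,
    # handing hx from one call to the next.
    hx = None
    chunk_outs = []
    for chunk in features.split(10, dim=1):
        out, hx = rnn(chunk, hx)
        chunk_outs.append(out)
    chunked_out = torch.cat(chunk_outs, dim=1)

# Carrying hx makes chunked processing match the single-pass result.
matches = torch.allclose(full_out, chunked_out, atol=1e-5)
print(matches)
```

When every batch holds independent, complete sequences, as in the training loop above, leaving hx at None and ignoring the returned value is reasonable. If the state is carried across training steps, it is usually detached first (hx = hx.detach()) so that gradients do not flow back through earlier batches.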

@noorchauhan

I am unable to understand what exactly your end goal is regarding your recent statement, @by90. Can you elaborate on what exactly you are trying to achieve?
