Auto3DSeg not working for TCIA DICOM dataset #1664
Replies: 1 comment
---
Hi @LucianoDeben, thanks for your interest here.
MONAI does provide components to work with DICOM images: monai.data.ImageReader can select a suitable reader for different image formats, including DICOM, NIfTI, PNG, etc.
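For example (a minimal sketch; the folder path below is just a placeholder), monai.transforms.LoadImage dispatches to a suitable ImageReader automatically, or you can name a reader explicitly:

```python
from monai.transforms import LoadImage

# Let MONAI pick a suitable reader based on the input (NIfTI, PNG, DICOM, ...)
auto_loader = LoadImage(image_only=True)

# Or request a specific reader, e.g. pydicom for DICOM series / SEG objects
dicom_loader = LoadImage(reader="PydicomReader", image_only=True)

img = dicom_loader("path/to/dicom_folder")  # placeholder path
print(img.shape)
```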
Hi @finalelement and @dongyang0122, could you please help take a look at these questions? Thanks in advance.
---
Hi all,
I am trying to train a model using Auto3DSeg's AutoRunner() on the TCIA HCC-TACE dataset found here. It consists of 105 subjects, each with multiple CT volumes (pre- and post-operative) and corresponding annotated segmentation masks. I am using the TciaDataset API call to load the dataset directly into the workspace, where each image volume consists of multiple DICOM slices and the segmentation is a single DICOM file with the SEG SOP class (I believe). The problem seems to lie in the fact that the segmentation labels are one-hot encoded, as the shapes of a single loaded batch image and label are the following:
(image, label) = torch.Size([1, 1, 512, 512, 87]) torch.Size([1, 4, 512, 512, 87])
Where 4 classes are segmented:
label_dict = {'Liver': 0, 'Tumor': 1, 'vessels': 2, 'aorta': 3}
On running the AutoRunner() I experience the following error:
```
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "c:\Users\20191678\AppData\Local\anaconda3\envs\ITP\Lib\site-packages\monai\transforms\transform.py", line 141, in apply_transform
    return _apply_transform(transform, data, unpack_items, lazy, overrides, log_stats)
  File "c:\Users\20191678\AppData\Local\anaconda3\envs\ITP\Lib\site-packages\monai\transforms\transform.py", line 98, in _apply_transform
    return transform(data, lazy=lazy) if isinstance(transform, LazyTrait) else transform(data)
  File "c:\Users\20191678\AppData\Local\anaconda3\envs\ITP\Lib\site-packages\monai\apps\auto3dseg\transforms.py", line 82, in __call__
    raise ValueError(
ValueError: The label shape torch.Size([512, 512, 348]) is different from the source shape torch.Size([512, 512, 87]) ..\data\..\data\HCC-TACE-Seg\HCC_077\300\seg.
```
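For reference, the mismatched dimension is consistent with 4 × 87 = 348, i.e. the four one-hot class volumes appear to be stacked along the slice axis when the SEG object is loaded without a label_dict. A quick check on my side (a sketch; the path is the one from the error above) would be:

```python
from monai.transforms import LoadImage

# Load the SEG object directly, bypassing the custom transform chain, to see
# what Auto3DSeg's own loader gets for the label.
raw_seg = LoadImage(reader="PydicomReader", image_only=True)(
    "../data/HCC-TACE-Seg/HCC_077/300/seg"
)
print(raw_seg.shape)  # presumably (512, 512, 348), i.e. 4 classes x 87 slices
```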
Multiple questions arise from this.
I will add the notebook code below. If there is any need to clone my code repo to reproduce, let me know.
NOTE: I used the dependencies described in the MONAI repo itself.
```python
# Import libraries
from monai.data import DataLoader
from monai.transforms import (EnsureChannelFirstd,
Compose, LoadImaged, ResampleToMatchd, MapTransform)
from monai.apps import TciaDataset
from monai.apps.auto3dseg import AutoRunner
from monai.bundle import ConfigParser
from monai.config import print_config
import json
print_config()
# Specify the collection and segmentation type
collection, seg_type = "HCC-TACE-Seg", "SEG"
# Create a dictionary to map the labels in the segmentation to the labels in the image
label_dict = {'Liver': 0,
'Tumor': 1,
'vessels': 2,
'aorta': 3}
class UndoOneHotEncoding(MapTransform):
    def __init__(self, keys):
        super().__init__(keys)
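    def __call__(self, data):
        # NOTE: the body of this method was not included in the original post;
        # this is an assumed sketch that collapses the one-hot channels with an
        # argmax over the channel dimension, matching the shapes printed below.
        d = dict(data)
        for key in self.keys:
            d[key] = d[key].argmax(dim=0, keepdim=True)
        return d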
# Create a composed transform that loads the image and segmentation, resamples the image
# to match the segmentation, and undoes the one-hot encoding of the segmentation
transform = Compose(
[
LoadImaged(reader="PydicomReader", keys=["image", "seg"], label_dict=label_dict),
EnsureChannelFirstd(keys=["image", "seg"]),
ResampleToMatchd(keys="image", key_dst="seg"),
UndoOneHotEncoding(keys="seg"),
]
)
# Create a dataset for the training with a validation split
train_dataset = TciaDataset(
root_dir="../data",
collection=collection,
section="training",
transform=transform,
download=True,
download_len=2,
seg_type=seg_type,
progress=True,
cache_rate=0.0,
val_frac=0.0,
)
# Create a dataloader
train_loader = DataLoader(train_dataset, batch_size=1, num_workers=0)
# Sample a batch of data from the dataloader
batch = next(iter(train_loader))
# Separate the image and segmentation from the batch
image, seg = batch["image"], batch["seg"]
print(image.shape, seg.shape, seg.unique())
# Output: torch.Size([1, 1, 512, 512, 87]) torch.Size([1, 1, 512, 512, 87]) metatensor([0, 1, 2, 3])
# Add a fold key to all the training data
train_dataset.datalist = [{**item, 'fold': 0} for item in train_dataset.datalist]
Change "seg" to "label" in the datalist
for item in train_dataset.datalist:
item["label"] = item.pop("seg")
Concatenate the training and test datalists
data_list = {"training": train_dataset.datalist}
datalist_file = "../auto3dseg_datalist.json"
with open(datalist_file, "w") as f:
json.dump(data_list, f)
Create input configuration .yaml file
input_config = {
"name": "HCC-TACE-Seg",
"task": "segmentation",
"modality": "CT",
"datalist": "../auto3dseg_datalist.json",
"dataroot": "../data",
}
config_yaml = "./auto3dseg_config.yaml"
ConfigParser.export_config_file(input_config, config_yaml)
runner = AutoRunner(work_dir="../data/auto3dseg", input=input_config)
runner.run()
```
The datalist has the following structure:
{ "training": [ { "image": "..\\data\\HCC-TACE-Seg\\HCC_077\\300\\image", "fold": 0, "label": "..\\data\\HCC-TACE-Seg\\HCC_077\\300\\seg" }, { "image": "..\\data\\HCC-TACE-Seg\\HCC_017\\300\\image", "fold": 0, "label": "..\\data\\HCC-TACE-Seg\\HCC_017\\300\\seg" } ] }
Can I get some guidance on how to make this work with this custom dataset?