Extract region with uint16 (bioformats) #1742

alevangel · 2024-12-10T14:54:33Z

I'm encountering an issue when attempting to extract an image with a 16-bit color space using Bio-Formats source. I'm unsure if this is expected behavior.

Specifically, if I convert the image to 8-bit, the resulting values are between 0 and 15, which seems unusually small.

Questions

Is there a intensity scaling issue? I noticed the following in the metadata: '0028,1040 Pixel Intensity Relationship': 'LOG', and I'm not sure if this might be a factor.
I'm requesting an output image with JPEG encoding, so I would generally expect it to be 8-bit. Is there something I'm missing?

To reproduce

Env

python 3.10
large_image 1.30.4
large_image_source_bioformats (commit a5b46d1)
python-bioformats 4.1.0

Steps

Here the input dicom file: 1-1 2.zip

Use the following Python code to attempt reading the region (e.g. the whole slide in this example):

import large_image
import matplotlib.pyplot as plt
import numpy as np

level = 3
size = (4096, 3328)  # (w,h)
location = (0, 0)  # (y,x)

slide = large_image.open(filename)
mag = slide.getMagnificationForLevel(level)
tile_image, _ = self.slide.getRegion(
            region=dict(left=location[1],
                        top=location[0],
                        right=location[1]+size[1],
                        bottom=location[0]+size[0],
                        units='pixels'),
            tile_size=dict(width=size[1], height=size[0]),
            tile_offset=dict(x=0, y=0),
            scale=dict(magnification=mag['magnification']),
            format=large_image.constants.TILE_FORMAT_NUMPY,
            encoding="JPEG",
        )

# Convert to 8 bit
if tile_image.dtype == np.uint32:
    tile_image = np.floor_divide(tile_image, 2 ** (32-8)).astype(np.uint8)
elif tile_image.dtype == np.uint16:
    tile_image = np.floor_divide(tile_image, 2 ** (16-8)).astype(np.uint8)

# Show
plt.imshow(nparray)
plt.show()

The text was updated successfully, but these errors were encountered:

alevangel · 2024-12-18T18:05:09Z

Hi @manthey,

I've been working on improving the handling of DICOM images where Bits Stored is less than Bits Allocated, and I've developed a solution that addresses the issues I was encountering, particularly with images appearing too dark due to incorrect scaling.

While I understand that a definitive solution should ideally be implemented within the Bioformats module itself, I've implemented a workaround by adding a new method, _scalePixelValues, to the base TileSource class

large_image/large_image/tilesource/base.py

Line 36 in 0e6db1f

class TileSource(IPyLeafletMixin):

Here's the code for the _scalePixelValues method:

def _scalePixelValues(self, tile: np.ndarray) -> np.ndarray:
        """
        Scales pixel values of a tile based on DICOM metadata.

        This method applies necessary transformations to pixel values, including:
        1. Rescale Slope and Intercept.
        2. Scaling to utilize the full range of the dtype when Bits Stored < Bits Allocated.
        3. Handling of bit shifting when High Bit is not aligned with Bits Stored.

        Args:
            tile: The input tile as a NumPy array.

        Returns:
            The scaled tile as a NumPy array.
        """

        # Scale based on Rescale Intercept and Slope
        if all(key in self._metadata['seriesMetadata'] for key in
               ('0028,1053 Rescale Slope', '0028,1052 Rescale Intercept')):
            rescale_slope = float(self._metadata['seriesMetadata']['0028,1053 Rescale Slope'])
            rescale_intercept = float(self._metadata['seriesMetadata']['0028,1052 Rescale Intercept'])
            tile = tile * rescale_slope + rescale_intercept
            tile = tile.astype(self._dtype)

        # Scaling pixel values to use the full range of the dtype
        bits_stored_keys = [
            'Bits Stored',
            '0028,0101 Bits Stored',
        ]
        high_bits_keys = [
            'High Bit',
            '0028,0102 High Bit',
        ]

        if self._metadata.get('bitsPerPixel'):
            bits_allocated = self._metadata['bitsPerPixel']
        elif self._metadata['seriesMetadata'].get('0028,0100 Bits Allocated'):
            bits_allocated = int(self._metadata['seriesMetadata']['0028,0100 Bits Allocated'])
        else:
            # Infer the bits per pixel from the dtype
            bits_allocated = tile.dtype.itemsize * 8

        for key in bits_stored_keys:
            if key in self._metadata['seriesMetadata']:
                bits_stored = int(self._metadata['seriesMetadata'][key])

                # Handling bit shift
                for high_bit_key in high_bits_keys:
                    if high_bit_key in self._metadata['seriesMetadata']:
                        high_bit = int(self._metadata['seriesMetadata'][high_bit_key])
                        if high_bit < bits_stored - 1:
                            shift = (bits_stored - 1) - high_bit
                            tile = tile << shift
                            mask = (1 << bits_stored) - 1
                            tile = tile & mask

                if bits_stored < bits_allocated:
                    scale_factor = 2 ** (bits_allocated - bits_stored)
                    tile = (tile.astype(np.int32) * scale_factor).astype(self._dtype)

                    # Clamping
                    if self._dtype == np.uint16:
                        tile = np.clip(tile, 0, 65535)
                    elif self._dtype == np.int16:
                        tile = np.clip(tile, -32768, 32767)

        return tile

I then call this method within the getRegion method of the TileSource class, right after obtaining the image region:

if hasattr(self, '_metadata'):
            image = self._scalePixelValues(cast(np.ndarray, image))

I initially attempted to implement this logic within the getTile function of the Bioformats module. However, I encountered issues due to the recursive nature of getTile when fetching regions that span multiple tiles. Applying the scaling within getTile resulted in multiple scaling operations, leading to incorrect pixel values.

By placing the scaling logic in a separate method and calling it from getRegion, the scaling is applied only once per region request, ensuring the correct pixel values are returned.

I believe this approach provides a reasonable workaround for handling DICOM images with these specific characteristics, although a more permanent solution within the Bioformats library would be ideal.

I'm sharing this code in the hope that it might be helpful to others facing similar issues and to solicit feedback or suggestions for improvement.

Thanks for your work on this project!

dgutman · 2024-12-18T18:22:31Z

The issue related to ... interesting .. behavior related to scaling of 16 bit uint's when the "real" data is perhaps 12 bit is probably not just limited to Bioformats based tile readers. There are some methods using the "style" parameter that deals with this, but at a high level, being able to detect / deal with this sort of issue is worth some thought. I'll have to review your suggestion above in greater detail, but at least conceptually... it would be great to not have 100's or 1000's of completely black tiles when we ingest poorly / incorrectly scaled / bits stored!=bitsAllocated issues. The main thing to think through is how expensive this is to detect... and when/where to detect it. From what I have stumbled across, it's very common when people are converting whatever image came off their scanner, and try and convert it to OME XML and/or DICOM the coder ( grad student?, :-) ) will have the image scaling not configured properly, making it very time consuming to try and autoScale the images... or even to figure out if there's any content in the uploaded images to make it worth trying to scale them! Thanks for the code / suggestion as well.

…

On Wed, Dec 18, 2024 at 1:05 PM Alessandro Evangelista < ***@***.***> wrote: Hi @manthey <https://github.com/manthey>, I've been working on improving the handling of DICOM images where Bits Stored is less than Bits Allocated, and I've developed a solution that addresses the issues I was encountering, particularly with images appearing too dark due to incorrect scaling. While I understand that a definitive solution should ideally be implemented within the Bioformats module itself, I've implemented a workaround by adding a new method, _scalePixelValues, to the base TileSource class https://github.com/girder/large_image/blob/0e6db1f9ebd472a4fe3f1e6d81d14ba93b4ee4b6/large_image/tilesource/base.py#L36 Here's the code for the _scalePixelValues method: def _scalePixelValues(self, tile: np.ndarray) -> np.ndarray: """ Scales pixel values of a tile based on DICOM metadata. This method applies necessary transformations to pixel values, including: 1. Rescale Slope and Intercept. 2. Scaling to utilize the full range of the dtype when Bits Stored < Bits Allocated. 3. Handling of bit shifting when High Bit is not aligned with Bits Stored. Args: tile: The input tile as a NumPy array. Returns: The scaled tile as a NumPy array. """ # Scale based on Rescale Intercept and Slope if all(key in self._metadata['seriesMetadata'] for key in ('0028,1053 Rescale Slope', '0028,1052 Rescale Intercept')): rescale_slope = float(self._metadata['seriesMetadata']['0028,1053 Rescale Slope']) rescale_intercept = float(self._metadata['seriesMetadata']['0028,1052 Rescale Intercept']) tile = tile * rescale_slope + rescale_intercept tile = tile.astype(self._dtype) # Scaling pixel values to use the full range of the dtype bits_stored_keys = [ 'Bits Stored', '0028,0101 Bits Stored', ] high_bits_keys = [ 'High Bit', '0028,0102 High Bit', ] if self._metadata.get('bitsPerPixel'): bits_allocated = self._metadata['bitsPerPixel'] elif self._metadata['seriesMetadata'].get('0028,0100 Bits Allocated'): bits_allocated = int(self._metadata['seriesMetadata']['0028,0100 Bits Allocated']) else: # Infer the bits per pixel from the dtype bits_allocated = tile.dtype.itemsize * 8 for key in bits_stored_keys: if key in self._metadata['seriesMetadata']: bits_stored = int(self._metadata['seriesMetadata'][key]) # Handling bit shift for high_bit_key in high_bits_keys: if high_bit_key in self._metadata['seriesMetadata']: high_bit = int(self._metadata['seriesMetadata'][high_bit_key]) if high_bit < bits_stored - 1: shift = (bits_stored - 1) - high_bit tile = tile << shift mask = (1 << bits_stored) - 1 tile = tile & mask if bits_stored < bits_allocated: scale_factor = 2 ** (bits_allocated - bits_stored) tile = (tile.astype(np.int32) * scale_factor).astype(self._dtype) # Clamping if self._dtype == np.uint16: tile = np.clip(tile, 0, 65535) elif self._dtype == np.int16: tile = np.clip(tile, -32768, 32767) return tile I then call this method within the getRegion method of the TileSource class, right after obtaining the image region: if hasattr(self, '_metadata'): image = self._scalePixelValues(cast(np.ndarray, image)) I initially attempted to implement this logic within the getTile function of the Bioformats module. However, I encountered issues due to the recursive nature of getTile when fetching regions that span multiple tiles. Applying the scaling within getTile resulted in multiple scaling operations, leading to incorrect pixel values. By placing the scaling logic in a separate method and calling it from getRegion, the scaling is applied only once per region request, ensuring the correct pixel values are returned. I believe this approach provides a reasonable workaround for handling DICOM images with these specific characteristics, although a more permanent solution within the Bioformats library would be ideal. I'm sharing this code in the hope that it might be helpful to others facing similar issues and to solicit feedback or suggestions for improvement. Thanks for your work on this project! — Reply to this email directly, view it on GitHub <#1742 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFODTQNAEHFDT42NZTJOY32GG2O3AVCNFSM6AAAAABTLLJR32VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNJRHE3DENBUGU> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

-- David A Gutman, M.D. Ph.D. Associate Professor of Pathology Emory University School of Medicine

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extract region with uint16 (bioformats) #1742

Extract region with uint16 (bioformats) #1742

alevangel commented Dec 10, 2024 •

edited

Loading

alevangel commented Dec 18, 2024

dgutman commented Dec 18, 2024 via email

Extract region with uint16 (bioformats) #1742

Extract region with uint16 (bioformats) #1742

Comments

alevangel commented Dec 10, 2024 • edited Loading

Questions

To reproduce

Env

Steps

alevangel commented Dec 18, 2024

dgutman commented Dec 18, 2024 via email

alevangel commented Dec 10, 2024 •

edited

Loading