defer_size has no performance impact with many tags #1957
-
I can partially reproduce this, but I think this does not only depend on the image size, but also on the number of contained tags. Given that each tag in the dataset must be traversed regardless of whether its value will be read, the overhead of many tags can become significant. Here is an example.
An enhanced MR dataset with a size of 8 MB:
A Breast Tomo image with a size of 8 MB:
Admittedly, this can only partly explain it. Here is a larger Tomo with a size of 950 MB:
which is a bit baffling, given that it is faster to read (with
EDIT: Rounded the numbers for readability.
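A minimal sketch (not the original measurements) of how one could relate read time to tag count with pydicom; the file names are placeholders for the datasets mentioned above:

```python
import time
import pydicom

def count_elements(ds):
    """Count data elements recursively, including those nested in sequence items."""
    total = len(ds)
    for elem in ds:
        if elem.VR == "SQ":
            for item in elem.value:
                total += count_elements(item)
    return total

# Placeholder paths - substitute the datasets being compared.
for path in ["enhanced_mr_8mb.dcm", "breast_tomo_950mb.dcm"]:
    start = time.perf_counter()
    ds = pydicom.dcmread(path)
    elapsed = time.perf_counter() - start
    print(f"{path}: {count_elements(ds)} elements, read in {elapsed:.3f}s")
```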
-
It seems you are right - I spent a while debugging and profiling (runtime and memory), and I could not find any operation in the code that allocates or reads more than 1 MB of data, let alone 50. On the other hand, my 1 GB file has only 214 tags, while my 50 MB one has 20725.
I think this is not an awful result for me - I am reading the offset and length of some pixel data before passing these parameters to another library that actually reads the data, and I was worried that I might be doubling efforts at runtime. To check, I was inclined to modify my two example files by switching the pixel data between them, or something like this. In fact, I used a hex editor to convert my 50 MB file to a 700 KB file by setting the length of the (final) pixel data tag to 10 and cropping the value to 10 bytes, setting them to 01-02-03.... Using dcmread, I could validate that I was getting the expected data for "PixelData", while at the same time seeing the same timing behavior as before. So the long read times are not a result of reading 50 MB of pixel data but, as you suggested, of reading 20k tags.
I'll leave this open for someone to decide whether they want to investigate why parsing 20k tags takes ~0.1s and whether that is expected. Feel free to close if so.
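For reference, a minimal sketch of the offset/length workaround mentioned here, using pydicom's private internals (_dict and RawDataElement.value_tell, which may change between versions); it assumes uncompressed, defined-length pixel data in a file on disk, and the file name is a placeholder:

```python
import pydicom
from pydicom.tag import Tag

PIXEL_DATA = Tag(0x7FE0, 0x0010)

# defer_size=0 leaves values longer than 0 bytes unread; deferred elements stay
# in the dataset as RawDataElement entries with value=None and a recorded file offset.
ds = pydicom.dcmread("example.dcm", defer_size=0)  # placeholder path

# Private internals: _dict maps tags to (Raw)DataElement objects without triggering
# the deferred read that attribute access (ds.PixelData) would perform.
raw = ds._dict[PIXEL_DATA]
print("pixel data value offset:", raw.value_tell)
print("pixel data value length:", raw.length)
```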
-
That is certainly true, but there is nothing to be done about this - changes in the medical domain are slow (for good reasons), and it is unlikely that a new format without this problem will appear any time soon. Given that most applications are interested in the image data, and the time to read it usually trumps the tag traversal time, this is probably not a real problem for most use cases.
I'm afraid there isn't much that can be done, but I would be happy to be proven wrong. I'll convert this into a discussion instead of closing it.
-
Describe the bug
I previously reported #1873, which asks for a public API to read an element's offset/length but not its value. This now works with a private API (_dict). However, when a DICOM file has pixel data, even that workaround does not seem to work.
I have a 50 MB file with pixel data, which is slower to read than a 1 GB file without pixel data (with defer_size=0). In addition, I do not see a difference between defer_size=0 and defer_size=None for the 50 MB file, while I do see it for the 1 GB file.
Steps To Reproduce
Output:
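A minimal sketch of the kind of comparison described (not the original script or its output); the file names are placeholders for the 50 MB and 1 GB files:

```python
import time
import pydicom

def timed_read(path, defer_size):
    """Return the wall-clock time of a single dcmread call."""
    start = time.perf_counter()
    pydicom.dcmread(path, defer_size=defer_size)
    return time.perf_counter() - start

# Placeholder file names for the two cases described above.
for path in ["with_pixel_data_50mb.dcm", "without_pixel_data_1gb.dcm"]:
    for defer_size in (None, 0):  # None reads all values, 0 defers any value longer than 0 bytes
        print(f"{path} defer_size={defer_size}: {timed_read(path, defer_size):.3f}s")
```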
Not sure if this should be marked as a duplicate of #1873, but I felt reporting it separately could make sense, since #1873 is about changing the API, while this one is only about changing the implementation (I suspect "PixelData" is accessed somewhere, so that defer_size is not effective).