Fastest way to get an element #1651
Replies: 3 comments 1 reply
-
If you want to find your item in all nested sequences, you have no choice but to iterate over all items of the dataset to find sequences - I see no way around this. The problem with `walk()` is that it converts every raw data element into a `DataElement`, which is the expensive part; iterating the raw elements and only converting the ones you actually need avoids most of that cost:

```python
from pydicom import Dataset
from pydicom.tag import BaseTag


def handle_tag(ds: Dataset, tag: BaseTag):
    for raw_element in ds.elements():
        if raw_element.tag == tag:
            element = ds[raw_element.tag]  # converts the raw element to a DataElement
            # do anonymization
        elif raw_element.VR == "SQ":
            element = ds[raw_element.tag]
            sequence = element.value
            for dataset in sequence:
                handle_tag(dataset, tag)
```
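As a quick illustration (not from the original reply; the file name and tag are placeholders), this could be called on a dataset read without the pixel data, e.g. for PatientName:

```python
from pydicom import dcmread
from pydicom.tag import Tag

ds = dcmread("file.dcm", stop_before_pixels=True)  # skip the (large) pixel data
handle_tag(ds, Tag(0x0010, 0x0010))  # anonymize PatientName wherever it occurs, nested or not
```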
-
Firstly, if you haven't already, I would strongly recommend you profile your code to learn what the speed bottlenecks actually are. We can speculate, but most likely only one or two specific things will make a substantial difference, and the profiling would point to those. Since you mention big files, I suspect that reading time is dominated by disk I/O, not so much by the access to specific tags. If so, and it is image files you are mainly dealing with, then are you using `stop_before_pixels=True` when reading?
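As a rough sketch of such profiling (not from the original reply; `anonymise_file` stands in for whatever your entry point is):

```python
import cProfile
import pstats

# profile a single anonymisation run and dump the stats to a file
cProfile.run("anonymise_file('large_file.dcm')", "anon.prof")

# show the 20 call paths with the largest cumulative time
pstats.Stats("anon.prof").sort_stats("cumulative").print_stats(20)
```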
@mrbean-bremen's answer seems quite good - once in memory, the access can be quite fast, except for the decoding to native Python types, which @mrbean-bremen noted and worked around by iterating the raw elements.

The only other thought I have is that if you know which tags and sequences might appear, then you can just go directly to them, rather than iterating all data elements:

```python
tags_to_change = [0x100010, 0x100020, ...]

for tag in tags_to_change:
    try:
        ds[tag].value = ...  # change as needed
    except KeyError:
        pass
```

Then the above would have to be repeated with any known sequences and the tags that could be in them.
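A rough sketch of what that repetition for known sequences might look like (the tags below are only placeholders, e.g. OtherPatientIDsSequence containing PatientID):

```python
# sequence tag -> tags to change inside each item of that sequence (placeholder values)
sequences_to_change = {0x00101002: [0x00100020]}

for seq_tag, item_tags in sequences_to_change.items():
    try:
        items = ds[seq_tag].value
    except KeyError:
        continue  # this sequence is not present in the dataset
    for item in items:
        for tag in item_tags:
            try:
                item[tag].value = ...  # change as needed
            except KeyError:
                pass
```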
-
For some reason I lost track of your response. The dataset is loaded into memory without the pixel data. We do not process pixel data, so for large files (like fMRI) it is absolutely worth it to use stop_before_pixels to read the files. Since we do not process every attribute in the dataset, we had to design a way to only access the tags in the dataset that require processing. Now the only performance issue we have is large datasets that require a lot of processing. Code snippets are:

```python
def create_mapped_dataset(self, dataset):
    logger.debug('Converting dicom to json')
    try:
        json_file = dataset.to_json()
    except Exception as e:
        logger.error(f'Unable to convert object to JSON file {e}')
        return False
    elements_dict = json.loads(json_file)
    logger.debug('Looping over JSON Dict to create anonimisation mapping')
    self.anonimisation_mapping = self.loop_over_json(dicom_dict=elements_dict)
    return True
```
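For readers unfamiliar with the DICOM JSON model that `to_json()` produces: the resulting dict looks roughly like the sketch below (keys are 8-character hex tag strings, values carry `"vr"` and usually `"Value"`; the actual values here are made up). This is the structure `loop_over_json` relies on.

```python
elements_dict = {
    "00100010": {"vr": "PN", "Value": [{"Alphabetic": "Doe^Jane"}]},
    "00100020": {"vr": "LO", "Value": ["12345"]},
    # for a sequence (VR "SQ"), "Value" is a list of items, each again a tag -> element dict
    "00081110": {"vr": "SQ", "Value": [
        {"00081155": {"vr": "UI", "Value": ["1.2.3.4.5"]}},
    ]},
}
```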
and then:

```python
def loop_over_json(self, dicom_dict):
    return_set = {}
    for element in dicom_dict:
        try:
            vr = dicom_dict[element]["vr"]
        except KeyError:
            vr = 'N/A'
        if vr == 'SQ':
            sequence_return = [self.loop_over_json(dicom_dict=item)
                               for item in dicom_dict[element]["Value"]]
            try:
                if sequence_return[0]:
                    return_set[element] = sequence_return
            except IndexError as e:
                logger.debug(f'An index error occurred: {e}')
        else:
            if str(string_to_tag(element)) in self.processing_list:
                return_set[element] = dicom_dict[element]
    if return_set:
        return return_set
```

In the second one, the `self.processing_list` value contains all tags we would like to access and process.
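To illustrate (values made up), the mapping returned by `loop_over_json` keeps only the matching plain tags and, for sequences, a list with one nested mapping per item; the `isinstance(..., list)` check in the next snippet uses exactly that distinction to recurse into the real dataset's sequences:

```python
anonimisation_mapping = {
    "00100010": {"vr": "PN", "Value": [{"Alphabetic": "Doe^Jane"}]},  # plain tag, kept as-is
    "00081110": [                                                     # sequence -> list of per-item mappings
        {"00081155": {"vr": "UI", "Value": ["1.2.3.4.5"]}},
    ],
}
```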
```python
def iterate_mapped_sequences(self, dataset, mapped_dataset):
    logger.debug('Modifying existing elements with mapped sequences')
    # initiate the class in 'Processing Functions.py' so we can easily execute
    # the functions to modify the listed elements
    modify_elements = Process()
    # loop over each attribute in each dataset
    pydicom.config.convert_wrong_length_to_UN = True
    logger.debug(f'Working with mapped_dataset: {mapped_dataset}')
    for mapped_tag in mapped_dataset:
        logger.debug(f'Accessing mapped tag {mapped_tag}')
        # process the elements that are part of the mapped_dataset
        if isinstance(mapped_dataset[mapped_tag], list):
            logger.debug(f'{mapped_tag} is a sequence.')
            for item in dataset[string_to_tag(mapped_tag)].value:
                self.iterate_mapped_sequences(dataset=item,
                                              mapped_dataset=mapped_dataset[mapped_tag][0])
```
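For completeness, a rough sketch of how these pieces might be wired together (the `Anonymiser` wrapper class and the file names are hypothetical, not taken from the code above):

```python
import pydicom

# read without pixel data, since only header attributes are processed
ds = pydicom.dcmread("large_file.dcm", stop_before_pixels=True)

anon = Anonymiser()  # hypothetical class holding the methods shown above
if anon.create_mapped_dataset(ds):
    anon.iterate_mapped_sequences(dataset=ds, mapped_dataset=anon.anonimisation_mapping)

# note: pixel data was skipped on read, so it would need to be handled separately before saving
ds.save_as("anonymised.dcm")
```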
Any suggestions on how to loop over large files that have many tags in sequences we have to access?
-
We have created a script that will perform anonymisation of the content.
We use the function walk() to walk over the full dataset and test if the value requires modification.
There are extremely big files with hundreds of tags that can take more than 30 minutes to walk over.
Is there a faster way to get all attributes of a certain tag - especially the nested ones - without knowing where the tags can be and which/how many sequences there are?
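For context, the walk()-based approach described here looks roughly like this (needs_modification and modify are stand-ins for the script's own logic, not real functions):

```python
import pydicom


def anonymise_callback(dataset, data_element):
    # called once for every data element, including elements nested in sequences
    if needs_modification(data_element):                   # stand-in: does this value require modification?
        data_element.value = modify(data_element.value)    # stand-in: apply the modification


ds = pydicom.dcmread("big_file.dcm")
ds.walk(anonymise_callback)  # walk() recurses into sequence items by default
```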