Segmentation fault with TF 2.14 image when providing automata to RETURNN #68

Open
vieting opened this issue Nov 9, 2023 · 17 comments

vieting commented Nov 9, 2023

Due to an issue with a training that uses FastBaumWelchLoss, which might be related to TensorFlow, I wanted to run the same setup with a newer TensorFlow version. I tried Bene's image and the RASR build from #64, but I get a segmentation fault.

From the log, I'm not sure what is going wrong. I see configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory), but this seems to be normal and also appears in the logs of working examples. I also see the following warnings multiple times, but I'm not sure whether they are critical.

2023-11-09 09:24:43.512116: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-09 09:24:43.512276: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-09 09:24:43.512344: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

Can anyone help me find out what is causing the segmentation fault?


vieting commented Nov 9, 2023

The full stdout+stderr of the RETURNN training is below.


RETURNN starting up, version 1.20231108.140626+git.9fe93590, date/time 2023-11-09-09-23-45 (UTC+0100), pid 75081, cwd ..., Python /usr/bin/python3
RETURNN command line options: ['returnn.tf214.config']
Hostname: cn-227
2023-11-09 09:23:49.835166: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-09 09:23:49.835231: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-09 09:23:49.840281: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-09 09:23:51.276416: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow: 2.14.0 (v3.14.0-rc1-21-g4dacf3f368e) (<not-under-git> in /usr/local/lib/python3.11/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '0'.
Collecting TensorFlow device list...
2023-11-09 09:24:14.771153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 9619 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5
Local devices available to TensorFlow:
  1/2: name: "/device:CPU:0"
       device_type: "CPU"
       memory_limit: 268435456
       locality {
       }
       incarnation: 8195166591384647575
       xla_global_id: -1
  2/2: name: "/device:GPU:0"
       device_type: "GPU"
       memory_limit: 10086907904
       locality {
         bus_id: 1
         links {
         }
       }
       incarnation: 2686889896139600267
       physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5"
       xla_global_id: 416903419
Using gpu device 0: NVIDIA GeForce RTX 2080 Ti
Hostname 'cn-227', GPU 0, GPU-dev-name 'NVIDIA GeForce RTX 2080 Ti', GPU-memory 9.4GB
Train data:
  input: 1 x 1
  output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
  OggZipDataset, sequences: 249229, frames: unknown
Dev data:
  OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2023-11-09 09:24:25.919194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9619 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
WARNING:tensorflow:From .../returnn/returnn/tf/util/basic.py:1723: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
2023-11-09 09:24:26.292053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9619 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
WARNING:tensorflow:From .../returnn/returnn/tf/sprint.py:54: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.

Network layer topology:
  extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
  used data keys: ['data', 'seq_tag']
  layers:
    layer conv 'conv_1' #: 32
    layer pool 'conv_1_pool' #: 32
    layer conv 'conv_2' #: 64
    layer conv 'conv_3' #: 64
    layer merge_dims 'conv_merged' #: 24000
    layer split_dims 'conv_source' #: 1
    layer source 'data' #: 1
    layer copy 'encoder' #: 512
    layer subnetwork 'features' #: 750
    layer conv 'features/conv_h' #: 150
    layer eval 'features/conv_h_act' #: 150
    layer variable 'features/conv_h_filter' #: 150
    layer split_dims 'features/conv_h_split' #: 1
    layer conv 'features/conv_l' #: 5
    layer layer_norm 'features/conv_l_act' #: 750
    layer eval 'features/conv_l_act_no_norm' #: 750
    layer merge_dims 'features/conv_l_merge' #: 750
    layer copy 'features/output' #: 750
    layer linear 'input_linear' #: 512
    layer softmax 'output' #: 88
    layer copy 'specaug' #: 750
net params #: 12409788
net trainable params: [<tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
2023-11-09 09:24:29.390553: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-09-08-23-44
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' 
shape=() dtype=float32>].
2023-11-09 09:24:35.424660: W tensorflow/c/c_api.cc:305] Operation '{name:'global_step' id:357 op device:{requested: '/device:CPU:0', assigned: ''} def:{{{node global_step}} = VarHandleOp[_class=["loc:@global_step"], _has_manual_control_dependencies=true, allowed_devices=[], container="", dtype=DT_INT64, shape=[], shared_name="global_step", _device="/device:CPU:0"]()}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
SprintSubprocessInstance: exec ['/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=.../returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--model-automaton.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', 
'--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=yes', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 75679
/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z'
2023-11-09 09:24:39.785865: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-09 09:24:39.786014: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-09 09:24:39.786077: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
RETURNN SprintControl[pid 75679] Python module load
RETURNN SprintControl[pid 75679] init: name='Sprint.PythonControl', sprint_unit='NnTrainer.pythonControl', version_number=5, callback=<built-in method callback of PyCapsule object at 0x7f1248f66e80>, ref=<capsule object "Sprint.PythonControl.Internal" at 0x7f1248f66e80>, config={'c2p_fd': '37', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, kwargs={}
RETURNN SprintControl[pid 75679] PythonControl create {'c2p_fd': 37, 'p2c_fd': 38, 'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f1248f66e80>, 'config': {'c2p_fd': '37', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f1248f66e80>}
RETURNN SprintControl[pid 75679] PythonControl init {'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f1248f66e80>, 'config': {'c2p_fd': '37', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f1248f66e80>}
RETURNN SprintControl[pid 75679] init for Sprint.PythonControl {'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f1248f66e80>, 'config': {'c2p_fd': '37', 'p2c_fd': '38', 'minPythonControlVersion': '4'}}
RETURNN SprintControl[pid 75679] PythonControl run_control_loop: <built-in method callback of PyCapsule object at 0x7f1248f66e80>, {}
RETURNN SprintControl[pid 75679] PythonControl run_control_loop control: '<version>RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)\n</version>'
SprintSubprocessInstance: exec ['/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=.../returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:38,p2c_fd:40,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--model-automaton.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', 
'--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=yes', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 75699
/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z'
2023-11-09 09:24:43.512116: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-09 09:24:43.512276: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-09 09:24:43.512344: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
RETURNN SprintControl[pid 75699] Python module load
RETURNN SprintControl[pid 75699] init: name='Sprint.PythonControl', sprint_unit='NnTrainer.pythonControl', version_number=5, callback=<built-in method callback of PyCapsule object at 0x7fc241a2ae80>, ref=<capsule object "Sprint.PythonControl.Internal" at 0x7fc241a2ae80>, config={'c2p_fd': '38', 'p2c_fd': '40', 'minPythonControlVersion': '4'}, kwargs={}
RETURNN SprintControl[pid 75699] PythonControl create {'c2p_fd': 38, 'p2c_fd': 40, 'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7fc241a2ae80>, 'config': {'c2p_fd': '38', 'p2c_fd': '40', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7fc241a2ae80>}
RETURNN SprintControl[pid 75699] PythonControl init {'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7fc241a2ae80>, 'config': {'c2p_fd': '38', 'p2c_fd': '40', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7fc241a2ae80>}
RETURNN SprintControl[pid 75699] init for Sprint.PythonControl {'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7fc241a2ae80>, 'config': {'c2p_fd': '38', 'p2c_fd': '40', 'minPythonControlVersion': '4'}}
RETURNN SprintControl[pid 75699] PythonControl run_control_loop: <built-in method callback of PyCapsule object at 0x7fc241a2ae80>, {}
RETURNN SprintControl[pid 75699] PythonControl run_control_loop control: '<version>RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)\n</version>'
2023-11-09 09:25:01.405775: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
Fatal Python error: Segmentation fault

Current thread 0x00007fc2462c4380 (most recent call first):
  File ".../returnn/returnn/sprint/control.py", line 499 in _handle_cmd_export_allophone_state_fsa_by_segment_name
  File ".../returnn/returnn/sprint/control.py", line 509 in _handle_cmd
  File ".../returnn/returnn/sprint/control.py", line 524 in handle_next
  File ".../returnn/returnn/sprint/control.py", line 550 in run_control_loop

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector (total: 37)
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
<?xml version="1.0" encoding="UTF-8"?>
<sprint>


  PROGRAM DEFECTIVE (TERMINATED BY SIGNAL):
  Segmentation fault

  Creating stack trace (innermost first):
  #2  /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fc2485f8520]
  #3  /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7fc24864c9fc]
  #4  /lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7fc2485f8476]
  #5  /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fc2485f8520]
  #6  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl13TrimAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a) [0x55edd8e4640a]
  #7  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl14CacheAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a2) [0x55edd8e55c72]
  #8  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fb257) [0x55edd8dd7257]
  #9  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fe9ac) [0x55edd8dda9ac]
  #10  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Am15TransitionModel5applyEN4Core3RefIKN3Fsa9AutomatonEEEib+0x274) [0x55edd8dd3194]
  #11  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Am24ClassicTransducerBuilder20applyTransitionModelEN4Core3RefIKN3Fsa9AutomatonEEE+0x387) [0x55edd8dc2df7]
  #12  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x123) [0x55edd8be4e43]
  #13  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x53) [0x55edd8be5183]
  #14  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder15buildTransducerEN4Core3RefIKN3Fsa9AutomatonEEE+0x8f) [0x55edd8be7cbf]
  #15  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder15buildTransducerERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x66) [0x55edd8be2516]
  #16  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder5buildERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2e) [0x55edd8be2d5e]
  #17  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Nn25AllophoneStateFsaExporter23exportFsaForOrthographyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x54) [0x55edd8abb054]
  #18  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal32exportAllophoneStateFsaBySegNameEP7_objectS3_+0x133) [0x55edd8aa0833]
  #19  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal8callbackEP7_objectS3_+0x25d) [0x55edd8aa0e6d]
  #20  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1cd073) [0x7fc27c978073]
  #21  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_MakeTpCall+0x87) [0x7fc27c928ff7]
  #22  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x477a) [0x7fc27c8b696a]
  #23  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7fc27ca16f9a]
  #24  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7fc27c92c058]
  #25  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7fc27c8b729e]
  #26  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7fc27ca16f9a]
  #27  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7fc27c92c058]
  #28  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7fc27c8b729e]
  #29  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7fc27ca16f9a]
  #30  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1810d8) [0x7fc27c92c0d8]
  #31  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_Call+0x128) [0x7fc27c92bb88]
  #32  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Python8PyCallKwEP7_objectPKcS3_z+0xe6) [0x55edd8cee876]
  #33  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl16run_control_loopEv+0x5f) [0x55edd8a94fbf]
  #34  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer13pythonControlEv+0x167) [0x55edd8841317]
  #35  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer4mainERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EE+0x303) [0x55edd881ae13]
  #36  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application3runERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x23) [0x55edd8880413]
  #37  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application4mainEiPPc+0x577) [0x55edd881c577]
  #38  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(main+0x3d) [0x55edd881a52d]
  #39  /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fc2485dfd90]
  #40  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fc2485dfe40]
  #41  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_start+0x25) [0x55edd883f7a5]

Exception in py_wrap_get_sprint_automata_for_batch:
EXCEPTION
Traceback (most recent call last):
  File ".../returnn/returnn/tf/sprint.py", line 45, in get_sprint_automata_for_batch_op.<locals>.py_wrap_get_sprint_automata_for_batch
    line: return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
    locals:
      py_get_sprint_automata_for_batch = <global> <function py_get_sprint_automata_for_batch at 0x7f8a197611c0>
      sprint_opts = <local> {'sprintExecPath': '/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', 'sprintConfigStr': '--*.configuration.channel=output-channel --model-automaton.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*...
      tags = <not found>
      py_tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
                               b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
                               b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
                               b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
                               b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
                               b'switchboard-1/sw02...
  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    line: edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
    locals:
      edges = <not found>
      weights = <not found>
      start_end_states = <not found>
      sprint_instance_pool = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7f8a19ea6510>
      sprint_instance_pool.get_automata_for_batch = <local> <bound method SprintInstancePool.get_automata_for_batch of <returnn.sprint.error_signals.SprintInstancePool object at 0x7f8a19ea6510>>
      tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
                            b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
                            b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
                            b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
                            b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
                            b'switchboard-1/sw02...
  File ".../returnn/returnn/sprint/error_signals.py", line 528, in SprintInstancePool.get_automata_for_batch
    line: r = instance._read()
    locals:
      r = <local> ('ok', 9, 22, array([ 1,  2,  3,  4,  5,  6,  7,  0,  1,  2,  3,  4,  5,  6,  0,  2,  4,
                          6,  7,  5,  6,  4,  1,  2,  3,  4,  5,  6,  7,  1,  2,  3,  4,  5,
                          6,  7,  2,  4,  6,  8,  8,  8,  8,  8,  0,  6,  0, 22,  0, 48,  0,
                          0,  6,  0, 22,  0, 48,  0,  6, 22, 48, 48,  0, 48,...
      instance = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7f8a1469b710>
      instance._read = <local> <bound method SprintSubprocessInstance._read of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7f8a1469b710>>
  File ".../returnn/returnn/sprint/error_signals.py", line 226, in SprintSubprocessInstance._read
    line: return util.read_pickled_object(p)
    locals:
      util = <global> <module 'returnn.util.basic' from '.../returnn/returnn/util/basic.py'>
      util.read_pickled_object = <global> <function read_pickled_object at 0x7f8b51e56b60>
      p = <local> <_io.FileIO name=37 mode='rb' closefd=True>
  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    line: size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
    locals:
      size_raw = <not found>
      read_bytes_to_new_buffer = <global> <function read_bytes_to_new_buffer at 0x7f8b51e56ac0>
      p = <local> <_io.FileIO name=37 mode='rb' closefd=True>
      getvalue = <not found>
  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    line: raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
    locals:
      EOFError = <builtin> <class 'EOFError'>
      size = <local> 4
      read_size = <local> 0
EOFError: expected to read 4 bytes but got EOF after 0 bytes
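For context: an EOF after 0 bytes on the read end of the pipe usually just means the RASR subprocess died (here: the segfault in the backtrace above) before writing anything back, so the EOFError in RETURNN is a symptom, not the root cause. A minimal, hypothetical illustration of that failure mode (not RETURNN code):

```python
import subprocess
import sys

# Child process that dies immediately without writing anything to stdout,
# standing in for the crashed nn-trainer subprocess.
child = subprocess.Popen(
    [sys.executable, "-c", "import os; os._exit(1)"],
    stdout=subprocess.PIPE,
)
# The parent expects a 4-byte size prefix, but the first read hits EOF:
data = child.stdout.read(4)
child.wait()
print(len(data))  # 0 bytes read, i.e. "EOF after 0 bytes"
```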
2023-11-09 09:25:23.574595: W tensorflow/core/framework/op_kernel.cc:1827] UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes
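From the traceback, `read_pickled_object` seems to use a length-prefixed framing on the pipe: a 4-byte size header followed by a pickled payload, with a strict read that raises EOFError on a short read. A rough sketch of that protocol (function names, byte order, and signedness here are illustrative assumptions, not RETURNN's actual implementation):

```python
import io
import pickle
import struct


def read_bytes_strict(f, size):
    """Read exactly `size` bytes or raise EOFError, mirroring the
    check shown in the traceback above."""
    buf = f.read(size)
    if len(buf) != size:
        raise EOFError(
            "expected to read %i bytes but got EOF after %i bytes" % (size, len(buf))
        )
    return buf


def write_framed(f, obj):
    """Write a pickled object with a 4-byte size prefix."""
    payload = pickle.dumps(obj)
    f.write(struct.pack("<i", len(payload)))
    f.write(payload)


def read_framed(f):
    """Read the 4-byte size prefix, then the pickled payload."""
    size = struct.unpack("<i", read_bytes_strict(f, 4))[0]
    return pickle.loads(read_bytes_strict(f, size))


buf = io.BytesIO()
write_framed(buf, ("ok", [1, 2, 3]))
buf.seek(0)
print(read_framed(buf))  # round-trip succeeds
```

With this framing, a subprocess that crashes before replying leaves the pipe empty, so the very first 4-byte read fails with "EOF after 0 bytes", which is exactly the error surfaced here.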


2023-11-09 09:25:24.557823: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4669204044388377120
2023-11-09 09:25:24.558049: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4611900397994247129
2023-11-09 09:25:24.558141: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 14394728958513161507
2023-11-09 09:25:24.701611: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 11246935140361182411
2023-11-09 09:25:24.701719: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 3527483492372743068
2023-11-09 09:25:24.701829: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 455321662105441778
2023-11-09 09:25:24.702001: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4997316685218163964
2023-11-09 09:25:24.702105: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 11970666840078253952
TensorFlow exception: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "..././returnn/rnn.py", line 11, in <module>
    File ".../returnn/returnn/__main__.py", line 634, in main
    File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
    File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
    File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
    File ".../returnn/returnn/tf/updater.py", line 172, in __init__
    File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
    File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
    File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "..././returnn/rnn.py", line 11, in <module>
    File ".../returnn/returnn/__main__.py", line 634, in main
    File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
    File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
    File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
    File ".../returnn/returnn/tf/updater.py", line 172, in __init__
    File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
    File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
    File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "..././returnn/rnn.py", line 11, in <module>
  File ".../returnn/returnn/__main__.py", line 634, in main
  File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
  File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
  File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
  File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
  File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
  File ".../returnn/returnn/tf/updater.py", line 172, in __init__
  File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
  File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
  File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
  File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
  File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
  File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
  File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
  File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
  File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def

Exception UnknownError() in step 0. (pid 75081)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1402, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f8a0bb7fc40>
      args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
                             [-0.09610788],
                             [-0.05115783],
                             ...,
                             [ 0.        ],
                             [ 0.        ],
                             [ 0.        ]],

                            [[-0.00226238],
                             [-0.01049833],
                             [-0.00...
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1385, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
                                          target_list, run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1478, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                                  fetch_list, target_list,
                                                  run_metadata)
    locals:
      tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f8b1d3596e0>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f8a1ab23cb0>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.


During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File ".../returnn/returnn/tf/engine.py", line 744, in Runner.run
    line: fetches_results = sess.run(
              fetches_dict, feed_dict=feed_dict, options=run_options
          )  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
    locals:
      fetches_results = <not found>
      sess = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options = <not found>
      run_options = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 972, in BaseSession.run
    line: result = self._run(None, fetches, feed_dict, options_ptr,
                             run_metadata_ptr)
    locals:
      result = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options_ptr = <local> None
      run_metadata_ptr = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1215, in BaseSession._run
    line: results = self._do_run(handle, final_targets, final_fetches,
                                 feed_dict_tensor, options, run_metadata)
    locals:
      results = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      handle = <local> None
      final_targets = <local> [<tf.Operation 'optim_and_step_incr' type=NoOp>]
      final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
      feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
                                         [-0.09610788],
                                         [-0.05115783],
                                         ...,
                                         [ 0.        ],
                                         [ 0.        ],
                                         [ 0.        ]],

                                        [[-0.00226238],
                                         [-0.01049...
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1395, in BaseSession._do_run
    line: return self._do_call(_run_fn, feeds, fetches, targets, options,
                               run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      _run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f8a0bb7fc40>
      feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
                              [-0.09610788],
                              [-0.05115783],
                              ...,
                              [ 0.        ],
                              [ 0.        ],
                              [ 0.        ]],

                             [[-0.00226238],
                              [-0.01049833],
                              [-0.001...
      fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1421, in BaseSession._do_call
    line: raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    locals:
      type = <builtin> <class 'type'>
      e = <not found>
      node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
                         op: "PyFunc"
                         input: "extern_data/placeholders/seq_tag/seq_tag"
                         attr {
                           key: "token"
                           value {
                             s: "pyfunc_0"
                           }
                         }
                         attr {
                           key: "Tout"
                           value {
                             list {
                               type: DT_INT32
                               type: DT_FLOAT
                               type: DT_INT...
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n    File "..././returnn/rnn.py", line 11, in <module>\n    File "/work/asr4/vieting/tmp/20231108_tf2..., len = 8772
UnknownError: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "..././returnn/rnn.py", line 11, in <module>
    File ".../returnn/returnn/__main__.py", line 634, in main
    File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
    File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
    File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
    File ".../returnn/returnn/tf/updater.py", line 172, in __init__
    File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
    File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
    File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "..././returnn/rnn.py", line 11, in <module>
    File ".../returnn/returnn/__main__.py", line 634, in main
    File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
    File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
    File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
    File ".../returnn/returnn/tf/updater.py", line 172, in __init__
    File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
    File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
    File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "..././returnn/rnn.py", line 11, in <module>
  File ".../returnn/returnn/__main__.py", line 634, in main
  File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
  File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
  File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
  File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
  File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
  File ".../returnn/returnn/tf/updater.py", line 172, in __init__
  File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
  File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
  File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
  File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
  File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
  File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
  File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
  File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
  File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def



During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File ".../returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
    line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
              debug_fetch,
              target_op=op,
              fetch_helper_tensors=list(op.inputs),
              stop_at_ts=stop_at_ts,
              verbose_stream=file,
          )
    locals:
      debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
      fetch_helpers = <not found>
      op_copied = <not found>
      FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
      FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
      target_op = <not found>
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      fetch_helper_tensors = <not found>
      list = <builtin> <class 'list'>
      op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
      stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
      verbose_stream = <not found>
      file = <local> <returnn.log.Stream object at 0x7f8b5360de50>
  File ".../returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
    line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
    locals:
      target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
      pformat = <local> <function pformat at 0x7f8b56591e40>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]

Step meta information:
{'seq_idx': [0,
             1,
             2,
             3,
             4,
             5,
             6,
             7,
             8,
             9,
             10,
             11,
             12,
             13,
             14,
             15,
             16,
             17,
             18,
             19,
             20,
             21,
             22,
             23,
             24,
             25,
             26,
             27,
             28,
             29,
             30,
             31,
             32,
             33,
             34,
             35,
             36,
             37,
             38],
 'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
             'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
             'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
             'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
             'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
             'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
             'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
             'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
             'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
             'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
             'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
             'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
             'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
             'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
             'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
             'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
             'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
             'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
             'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
             'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
             'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
             'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
             'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
             'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
             'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
             'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
             'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
             'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
             'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
             'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
             'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
             'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
             'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
             'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
             'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
             'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
             'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
             'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
             'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760  6246  6372  6861  7296  7499  7534  7622  7824  8031  8295  8431
  8690  8675  8667  8886  9084  9199  9163  9156  9274  9262  9540  9668
  9678  9719  9711  9902  9989 10010 10020 10073 10006 10102 10131 10112
 10130 10178 10208])
  <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
  <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1402, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f8a0bb7fc40>
      args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
                             [-0.09610788],
                             [-0.05115783],
                             ...,
                             [ 0.        ],
                             [ 0.        ],
                             [ 0.        ]],

                            [[-0.00226238],
                             [-0.01049833],
                             [-0.00...
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1385, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
                                          target_list, run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1478, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                                  fetch_list, target_list,
                                                  run_metadata)
    locals:
      tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f8b1d3596e0>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f8a1ab23cb0>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File ".../returnn/returnn/tf/engine.py", line 744, in Runner.run
    line: fetches_results = sess.run(
              fetches_dict, feed_dict=feed_dict, options=run_options
          )  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
    locals:
      fetches_results = <not found>
      sess = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options = <not found>
      run_options = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 972, in BaseSession.run
    line: result = self._run(None, fetches, feed_dict, options_ptr,
                             run_metadata_ptr)
    locals:
      result = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options_ptr = <local> None
      run_metadata_ptr = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1215, in BaseSession._run
    line: results = self._do_run(handle, final_targets, final_fetches,
                                 feed_dict_tensor, options, run_metadata)
    locals:
      results = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      handle = <local> None
      final_targets = <local> [<tf.Operation 'optim_and_step_incr' type=NoOp>]
      final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
      feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
                                         [-0.09610788],
                                         [-0.05115783],
                                         ...,
                                         [ 0.        ],
                                         [ 0.        ],
                                         [ 0.        ]],

                                        [[-0.00226238],
                                         [-0.01049...
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1395, in BaseSession._do_run
    line: return self._do_call(_run_fn, feeds, fetches, targets, options,
                               run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f8ad3148090>
      self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f8ad3148090>>
      _run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f8a0bb7fc40>
      feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a1a9e96b0>: array([[[-0.05505638],
                              [-0.09610788],
                              [-0.05115783],
                              ...,
                              [ 0.        ],
                              [ 0.        ],
                              [ 0.        ]],

                             [[-0.00226238],
                              [-0.01049833],
                              [-0.001...
      fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19573130>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a19571230>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f8a196b8770>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f8a1a5a51f0>]
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1421, in BaseSession._do_call
    line: raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    locals:
      type = <builtin> <class 'type'>
      e = <not found>
      node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
                         op: "PyFunc"
                         input: "extern_data/placeholders/seq_tag/seq_tag"
                         attr {
                           key: "token"
                           value {
                             s: "pyfunc_0"
                           }
                         }
                         attr {
                           key: "Tout"
                           value {
                             list {
                               type: DT_INT32
                               type: DT_FLOAT
                               type: DT_INT...
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n    File "..././returnn/rnn.py", line 11, in <module>\n    File "/work/asr4/vieting/tmp/20231108_tf2..., len = 8772
UnknownError: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "..././returnn/rnn.py", line 11, in <module>
    File ".../returnn/returnn/__main__.py", line 634, in main
    File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
    File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
    File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
    File ".../returnn/returnn/tf/updater.py", line 172, in __init__
    File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
    File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
    File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "..././returnn/rnn.py", line 11, in <module>
    File ".../returnn/returnn/__main__.py", line 634, in main
    File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
    File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
    File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
    File ".../returnn/returnn/tf/updater.py", line 172, in __init__
    File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
    File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
    File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File ".../returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "..././returnn/rnn.py", line 11, in <module>
  File ".../returnn/returnn/__main__.py", line 634, in main
  File ".../returnn/returnn/__main__.py", line 439, in execute_main_task
  File ".../returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
  File ".../returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
  File ".../returnn/returnn/tf/engine.py", line 1429, in _init_network
  File ".../returnn/returnn/tf/engine.py", line 1491, in create_network
  File ".../returnn/returnn/tf/updater.py", line 172, in __init__
  File ".../returnn/returnn/tf/network.py", line 1552, in get_objective
  File ".../returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
  File ".../returnn/returnn/tf/network.py", line 1529, in _construct_objective
  File ".../returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
  File ".../returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
  File ".../returnn/returnn/tf/network.py", line 4080, in _prepare
  File ".../returnn/returnn/tf/layers/basic.py", line 13165, in get_value
  File ".../returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
  File ".../returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def

Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 75081)
SprintSubprocessInstance: interrupt child proc 75679



RASR writes the following log for the nn-trainer:


<?xml version="1.0" encoding="UTF-8"?>
<sprint>
  <system-information>
    <name>cn-227</name>
    <type>x86_64</type>
    <operating-system>Linux</operating-system>
    <build-date>Nov  9 2023</build-date>
    <local-time>2023-11-09 09:24:43.534</local-time>
  </system-information>
  <version>
    RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)
  </version>
  <configuration>
    <source type="command line">--*.python-control-enabled=true --*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn --*.pymod-name=returnn.sprint.control --*.pymod-config=c2p_fd:38,p2c_fd:40,minPythonControlVersion:4 --*.configuration.channel=output-channel --model-automaton.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-channel --*.version.channel=output-channel --*.log.channel=output-channel --*.warning.channel=output-channel, stderr --*.error.channel=output-channel, stderr --*.statistics.channel=output-channel --*.progress.channel=output-channel --*.dot.channel=nil --*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz --*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1 --*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml --*.model-combination.acoustic-model.state-tying.type=lookup --*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank --*.model-combination.acoustic-model.allophones.add-from-lexicon=no --*.model-combination.acoustic-model.allophones.add-all=yes --*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank --*.model-combination.acoustic-model.hmm.states-per-phone=1 --*.model-combination.acoustic-model.hmm.state-repetitions=1 --*.model-combination.acoustic-model.hmm.across-word-model=yes --*.model-combination.acoustic-model.hmm.early-recombination=no --*.model-combination.acoustic-model.tdp.scale=1.0 --*.model-combination.acoustic-model.tdp.*.loop=0.0 
--*.model-combination.acoustic-model.tdp.*.forward=0.0 --*.model-combination.acoustic-model.tdp.*.skip=infinity --*.model-combination.acoustic-model.tdp.*.exit=0.0 --*.model-combination.acoustic-model.tdp.silence.loop=0.0 --*.model-combination.acoustic-model.tdp.silence.forward=0.0 --*.model-combination.acoustic-model.tdp.silence.skip=infinity --*.model-combination.acoustic-model.tdp.silence.exit=0.0 --*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity --*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity --*.model-combination.acoustic-model.phonology.history-length=0 --*.model-combination.acoustic-model.phonology.future-length=0 --*.transducer-builder-filter-out-invalid-allophones=yes --*.fix-allophone-context-at-word-boundaries=yes --*.allophone-state-graph-builder.topology=ctc --*.allow-for-silence-repetitions=no --action=python-control --python-control-loop-type=python-control-loop --extract-features=no --*.encoding=UTF-8 --*.output-channel.file=$(LOGFILE) --*.output-channel.compressed=no --*.output-channel.append=no --*.output-channel.unbuffered=yes --*.LOGFILE=nn-trainer.loss.log --*.TASK=1</source>
    <source type="command line">--*.python-control-enabled=true --*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn --*.pymod-name=returnn.sprint.control --*.pymod-config=c2p_fd:38,p2c_fd:40,minPythonControlVersion:4 --*.configuration.channel=output-channel --model-automaton.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-channel --*.version.channel=output-channel --*.log.channel=output-channel --*.warning.channel=output-channel, stderr --*.error.channel=output-channel, stderr --*.statistics.channel=output-channel --*.progress.channel=output-channel --*.dot.channel=nil --*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz --*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1 --*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml --*.model-combination.acoustic-model.state-tying.type=lookup --*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank --*.model-combination.acoustic-model.allophones.add-from-lexicon=no --*.model-combination.acoustic-model.allophones.add-all=yes --*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank --*.model-combination.acoustic-model.hmm.states-per-phone=1 --*.model-combination.acoustic-model.hmm.state-repetitions=1 --*.model-combination.acoustic-model.hmm.across-word-model=yes --*.model-combination.acoustic-model.hmm.early-recombination=no --*.model-combination.acoustic-model.tdp.scale=1.0 --*.model-combination.acoustic-model.tdp.*.loop=0.0 
--*.model-combination.acoustic-model.tdp.*.forward=0.0 --*.model-combination.acoustic-model.tdp.*.skip=infinity --*.model-combination.acoustic-model.tdp.*.exit=0.0 --*.model-combination.acoustic-model.tdp.silence.loop=0.0 --*.model-combination.acoustic-model.tdp.silence.forward=0.0 --*.model-combination.acoustic-model.tdp.silence.skip=infinity --*.model-combination.acoustic-model.tdp.silence.exit=0.0 --*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity --*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity --*.model-combination.acoustic-model.phonology.history-length=0 --*.model-combination.acoustic-model.phonology.future-length=0 --*.transducer-builder-filter-out-invalid-allophones=yes --*.fix-allophone-context-at-word-boundaries=yes --*.allophone-state-graph-builder.topology=ctc --*.allow-for-silence-repetitions=no --action=python-control --python-control-loop-type=python-control-loop --extract-features=no --*.encoding=UTF-8 --*.output-channel.file=$(LOGFILE) --*.output-channel.compressed=no --*.output-channel.append=no --*.output-channel.unbuffered=yes --*.LOGFILE=nn-trainer.loss.log --*.TASK=1</source>
    <resources>
      *.home = /u/vieting
      APPTAINER_APPNAME = 
      APPTAINER_BIND = /work/asr4,/work/common,/work/tools22,/u/hilmes,/work/asr3,/usr/local/cache-manager
      APPTAINER_COMMAND = exec
      APPTAINER_CONTAINER = /work/asr4/hilmes/dev/rasr/apptainer/2023-11-08_tensorflow-2.14_v1/image2.sif
      APPTAINER_ENVIRONMENT = /.singularity.d/env/91-environment.sh
      APPTAINER_NAME = image2.sif
      CLICOLOR = 1
      CUDA_VERSION = 11.8.0
      CUDA_VISIBLE_DEVICES = 0
      DBUS_SESSION_BUS_ADDRESS = unix:path=/run/user/2699/bus
      DEBIAN_FRONTEND = noninteractive
      GPU_DEVICE_ORDINAL = 0
      GREPCOLOR = 32
      GREP_COLOR = 32
      HOME = /u/vieting
      KEYTIMEOUT = 1
      LANG = C.UTF-8
      LC_ADDRESS = de_DE.UTF-8
      LC_IDENTIFICATION = de_DE.UTF-8
      LC_MEASUREMENT = de_DE.UTF-8
      LC_MONETARY = de_DE.UTF-8
      LC_NAME = de_DE.UTF-8
      LC_NUMERIC = de_DE.UTF-8
      LC_PAPER = de_DE.UTF-8
      LC_TELEPHONE = de_DE.UTF-8
      LC_TIME = de_DE.UTF-8
      LD_LIBRARY_PATH = /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
      LOGNAME = vieting
      LSCOLORS = ExFxCxDxBxehedabagacad
      LS_COLORS = *.swp=-1;44;37:*,v=5;34;93:*.vim=35:no=0:ex=1;31:fi=0:di=1;36:ln=33:or=5;35:mi=1;40:pi=93:so=33:bd=44;37:cd=44;37:*.jpg=1:*.jpeg=1:*.JPG=1:*.gif=1:*.png=1:*.jpeg=1:*.ppm=1:*.pgm=1:*.pbm=1:*.c=1;33:*.C=1;33:*.h=1;33:*.cc=1;33:*.awk=1;33:*.pl=1;33:*.py=1;33:*.m=1;33:*.rb=1;33:*.gz=0;33:*.tar=0;33:*.zip=0;33:*.lha=0;33:*.lzh=0;33:*.arj=0;33:*.bz2=0;33:*.tgz=0;33:*.taz=33:*.dmg=0;33:*.html=36:*.htm=36:*.doc=36:*.txt=1;36:*.o=1;36:*.a=1;36
      MKL_NUM_THREADS = 1
      MOTD_SHOWN = pam
      NVARCH = x86_64
      NVIDIA_DRIVER_CAPABILITIES = compute,utility
      NVIDIA_REQUIRE_CUDA = cuda&gt;=11.8 brand=tesla,driver&gt;=450,driver&lt;451 brand=tesla,driver&gt;=470,driver&lt;471 brand=unknown,driver&gt;=470,driver&lt;471 brand=nvidia,driver&gt;=470,driver&lt;471 brand=nvidiartx,driver&gt;=470,driver&lt;471 brand=geforce,driver&gt;=470,driver&lt;471 brand=geforcertx,driver&gt;=470,driver&lt;471 brand=quadro,driver&gt;=470,driver&lt;471 brand=quadrortx,driver&gt;=470,driver&lt;471 brand=titan,driver&gt;=470,driver&lt;471 brand=titanrtx,driver&gt;=470,driver&lt;471 brand=tesla,driver&gt;=510,driver&lt;511 brand=unknown,driver&gt;=510,driver&lt;511 brand=nvidia,driver&gt;=510,driver&lt;511 brand=nvidiartx,driver&gt;=510,driver&lt;511 brand=geforce,driver&gt;=510,driver&lt;511 brand=geforcertx,driver&gt;=510,driver&lt;511 brand=quadro,driver&gt;=510,driver&lt;511 brand=quadrortx,driver&gt;=510,driver&lt;511 brand=titan,driver&gt;=510,driver&lt;511 brand=titanrtx,driver&gt;=510,driver&lt;511 brand=tesla,driver&gt;=515,driver&lt;516 brand=unknown,driver&gt;=515,driver&lt;516 brand=nvidia,driver&gt;=515,driver&lt;516 brand=nvidiartx,driver&gt;=515,driver&lt;516 brand=geforce,driver&gt;=515,driver&lt;516 brand=geforcertx,driver&gt;=515,driver&lt;516 brand=quadro,driver&gt;=515,driver&lt;516 brand=quadrortx,driver&gt;=515,driver&lt;516 brand=titan,driver&gt;=515,driver&lt;516 brand=titanrtx,driver&gt;=515,driver&lt;516
      NVIDIA_VISIBLE_DEVICES = all
      NV_CUDA_COMPAT_PACKAGE = cuda-compat-11-8
      NV_CUDA_CUDART_VERSION = 11.8.89-1
      OLDPWD = /work/asr4/vieting/tmp
      OMP_NUM_THREADS = 1
      PATH = /usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
      PROMPT_COMMAND = ${PROMPT_COMMAND%%; PROMPT_COMMAND=*}"; PS1="Apptainer&gt; 
      PS1 = Apptainer&gt; 
      PWD = /work/asr4/vieting/tmp/20231108_tf213_sprint_op
      ROCR_VISIBLE_DEVICES = 0
      SHELL = zsh
      SHLVL = 4
      SINFO_FORMAT = %30N %12P %5D %14T %10c %10m %16G
      SINGULARITY_BIND = /work/asr4,/work/common,/work/tools22,/u/hilmes,/work/asr3,/usr/local/cache-manager
      SINGULARITY_CONTAINER = /work/asr4/hilmes/dev/rasr/apptainer/2023-11-08_tensorflow-2.14_v1/image2.sif
      SINGULARITY_ENVIRONMENT = /.singularity.d/env/91-environment.sh
      SINGULARITY_NAME = image2.sif
      SLURMD_NODENAME = cn-227
      SLURM_CLUSTER_NAME = cluster
      SLURM_CONF = /var/spool/slurm/slurmd/conf-cache/slurm.conf
      SLURM_CPUS_ON_NODE = 2
      SLURM_CPUS_PER_TASK = 2
      SLURM_DISTRIBUTION = cyclic
      SLURM_GPUS_ON_NODE = 1
      SLURM_GTIDS = 0
      SLURM_JOBID = 3024215
      SLURM_JOB_ACCOUNT = hlt
      SLURM_JOB_CPUS_PER_NODE = 2
      SLURM_JOB_GID = 2000
      SLURM_JOB_ID = 3024215
      SLURM_JOB_NAME = bash
      SLURM_JOB_NODELIST = cn-227
      SLURM_JOB_NUM_NODES = 1
      SLURM_JOB_PARTITION = gpu_11gb
      SLURM_JOB_QOS = normal
      SLURM_JOB_UID = 2699
      SLURM_JOB_USER = vieting
      SLURM_LAUNCH_NODE_IPADDR = 10.6.4.4
      SLURM_LOCALID = 0
      SLURM_NNODES = 1
      SLURM_NODEID = 0
      SLURM_NODELIST = cn-227
      SLURM_NPROCS = 1
      SLURM_NTASKS = 1
      SLURM_PRIO_PROCESS = 0
      SLURM_PROCID = 0
      SLURM_PTY_PORT = 39487
      SLURM_PTY_WIN_COL = 203
      SLURM_PTY_WIN_ROW = 58
      SLURM_SRUN_COMM_HOST = 10.6.4.4
      SLURM_SRUN_COMM_PORT = 46737
      SLURM_STEPID = 0
      SLURM_STEP_GPUS = 0
      SLURM_STEP_ID = 0
      SLURM_STEP_LAUNCHER_PORT = 46737
      SLURM_STEP_NODELIST = cn-227
      SLURM_STEP_NUM_NODES = 1
      SLURM_STEP_NUM_TASKS = 1
      SLURM_STEP_TASKS_PER_NODE = 1
      SLURM_SUBMIT_DIR = /work/asr4/vieting/tmp/20231108_tf213_sprint_op
      SLURM_SUBMIT_HOST = cn-04
      SLURM_TASKS_PER_NODE = 1
      SLURM_TASK_PID = 71862
      SLURM_TOPOLOGY_ADDR = cn-227
      SLURM_TOPOLOGY_ADDR_PATTERN = node
      SLURM_UMASK = 0022
      SLURM_WORKING_CLUSTER = cluster:mn-04:6817:9472:109
      SQUEUE_FORMAT = %.18i %.9P %.64j %.16u %8Q %.2t %19V %.10M %16R
      SRUN_DEBUG = 3
      SSH_CLIENT = 137.226.223.15 50634 22
      SSH_CONNECTION = 137.226.223.15 50634 137.226.116.49 22
      SSH_TTY = /dev/pts/5
      TERM = screen-256color
      TERM_PROGRAM = tmux
      TERM_PROGRAM_VERSION = 3.2a
      TF2_BEHAVIOR = 1
      THEANO_FLAGS = compiledir_format=compiledir_%(platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s--sprint-sub,device=cpu,force_device=True
      TMPDIR = /var/tmp
      TMUX = /tmp/tmux-2699/default,164990,4
      TMUX_PANE = %55
      TMUX_PLUGIN_MANAGER_PATH = /u/vieting/.tmux/plugins/
      TPU_ML_PLATFORM = Tensorflow
      USER = vieting
      USER_PATH = /usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/u/vieting/.fzf/bin:/u/vieting/.local/share/JetBrains/Toolbox/scripts:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
      XDG_DATA_DIRS = /usr/local/share:/usr/share:/var/lib/snapd/desktop
      XDG_RUNTIME_DIR = /run/user/2699
      XDG_SESSION_CLASS = user
      XDG_SESSION_ID = 15273
      XDG_SESSION_TYPE = tty
      ZLS_COLORS = *.swp=-1;44;37:*,v=5;34;93:*.vim=35:no=0:ex=1;31:fi=0:di=1;36:ln=33:or=5;35:mi=1;40:pi=93:so=33:bd=44;37:cd=44;37:*.jpg=1:*.jpeg=1:*.JPG=1:*.gif=1:*.png=1:*.ppm=1:*.pgm=1:*.pbm=1:*.c=1;33:*.C=1;33:*.h=1;33:*.cc=1;33:*.awk=1;33:*.pl=1;33:*.py=1;33:*.m=1;33:*.rb=1;33:*.gz=0;33:*.tar=0;33:*.zip=0;33:*.lha=0;33:*.lzh=0;33:*.arj=0;33:*.bz2=0;33:*.tgz=0;33:*.taz=33:*.dmg=0;33:*.html=36:*.htm=36:*.doc=36:*.txt=1;36:*.o=1;36:*.a=1;36:(-default-)*.swp=-1;44;37:(-default-)*,v=5;34;93:(-default-)*.vim=35:(-default-)no=0:(-default-)ex=1;31:(-default-)fi=0:(-default-)di=1;36:(-default-)ln=33:(-default-)or=5;35:(-default-)mi=1;40:(-default-)pi=93:(-default-)so=33:(-default-)bd=44;37:(-default-)cd=44;37:(-default-)*.jpg=1:(-default-)*.jpeg=1:(-default-)*.JPG=1:(-default-)*.gif=1:(-default-)*.png=1:(-default-)*.ppm=1:(-default-)*.pgm=1:(-default-)*.pbm=1:(-default-)*.c=1;33:(-default-)*.C=1;33:(-default-)*.h=1;33:(-default-)*.cc=1;33:(-default-)*.awk=1;33:(-default-)*.pl=1;33:(-default-)*.py=1;33:(-default-)*.m=1;33:(-default-)*.rb=1;33:(-default-)*.gz=0;33:(-default-)*.tar=0;33:(-default-)*.zip=0;33:(-default-)*.lha=0;33:(-default-)*.lzh=0;33:(-default-)*.arj=0;33:(-default-)*.bz2=0;33:(-default-)*.tgz=0;33:(-default-)*.taz=33:(-default-)*.dmg=0;33:(-default-)*.html=36:(-default-)*.htm=36:(-default-)*.doc=36:(-default-)*.txt=1;36:(-default-)*.o=1;36:(-default-)*.a=1;36
      _ = /usr/bin/apptainer
      neural-network-trainer.*.LOGFILE = nn-trainer.loss.log
      neural-network-trainer.*.TASK = 1
      neural-network-trainer.*.allophone-state-graph-builder.topology = ctc
      neural-network-trainer.*.allow-for-silence-repetitions = no
      neural-network-trainer.*.configuration.channel = output-channel
      neural-network-trainer.*.corpus.file = /u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz
      neural-network-trainer.*.corpus.segments.file = /u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1
      neural-network-trainer.*.dot.channel = nil
      neural-network-trainer.*.encoding = UTF-8
      neural-network-trainer.*.error.channel = output-channel,
      neural-network-trainer.*.fix-allophone-context-at-word-boundaries = yes
      neural-network-trainer.*.log.channel = output-channel
      neural-network-trainer.*.model-combination.acoustic-model.allophones.add-all = yes
      neural-network-trainer.*.model-combination.acoustic-model.allophones.add-from-file = /u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank
      neural-network-trainer.*.model-combination.acoustic-model.allophones.add-from-lexicon = no
      neural-network-trainer.*.model-combination.acoustic-model.hmm.across-word-model = yes
      neural-network-trainer.*.model-combination.acoustic-model.hmm.early-recombination = no
      neural-network-trainer.*.model-combination.acoustic-model.hmm.state-repetitions = 1
      neural-network-trainer.*.model-combination.acoustic-model.hmm.states-per-phone = 1
      neural-network-trainer.*.model-combination.acoustic-model.phonology.future-length = 0
      neural-network-trainer.*.model-combination.acoustic-model.phonology.history-length = 0
      neural-network-trainer.*.model-combination.acoustic-model.state-tying.file = /u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank
      neural-network-trainer.*.model-combination.acoustic-model.state-tying.type = lookup
      neural-network-trainer.*.model-combination.acoustic-model.tdp.*.exit = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.*.forward = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.*.loop = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.*.skip = infinity
      neural-network-trainer.*.model-combination.acoustic-model.tdp.entry-m1.loop = infinity
      neural-network-trainer.*.model-combination.acoustic-model.tdp.entry-m2.loop = infinity
      neural-network-trainer.*.model-combination.acoustic-model.tdp.scale = 1.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.exit = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.forward = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.loop = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.skip = infinity
      neural-network-trainer.*.model-combination.lexicon.file = /u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml
      neural-network-trainer.*.output-channel.append = no
      neural-network-trainer.*.output-channel.compressed = no
      neural-network-trainer.*.output-channel.file = $(LOGFILE)
      neural-network-trainer.*.output-channel.unbuffered = yes
      neural-network-trainer.*.progress.channel = output-channel
      neural-network-trainer.*.pymod-config = c2p_fd:38,p2c_fd:40,minPythonControlVersion:4
      neural-network-trainer.*.pymod-name = returnn.sprint.control
      neural-network-trainer.*.pymod-path = /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn
      neural-network-trainer.*.python-control-enabled = true
      neural-network-trainer.*.real-time-factor.channel = output-channel
      neural-network-trainer.*.statistics.channel = output-channel
      neural-network-trainer.*.system-info.channel = output-channel
      neural-network-trainer.*.time.channel = output-channel
      neural-network-trainer.*.transducer-builder-filter-out-invalid-allophones = yes
      neural-network-trainer.*.version.channel = output-channel
      neural-network-trainer.*.warning.channel = output-channel,
      neural-network-trainer.action = python-control
      neural-network-trainer.extract-features = no
      neural-network-trainer.model-automaton.channel = output-channel
      neural-network-trainer.python-control-loop-type = python-control-loop
    </resources>
    <selection>neural-network-trainer</selection>
  </configuration>
  <information component="neural-network-trainer">
    use 0 as seed for random number generator
  </information>
  <information component="neural-network-trainer">
    using single precision
  </information>
  <information component="neural-network-trainer">
    action: python-control
  </information>
  <information component="neural-network-trainer">
    PythonControl: run_control_loop
  </information>
  <information component="neural-network-trainer.corpus">
    Use a segment whitelist with 249529 entries, keep only listed segments.
  </information>
  <corpus name="switchboard-1" full-name="switchboard-1">
      [...]
  </corpus>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
    reading lexicon from file "/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml" ...
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
    dependency value: d5c175f07244eeb9a36f094fcd17677a
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
    statistics:
    number of phonemes:                    46
    number of lemmas:                      30250
    number of lemma pronunciations:        30858
    number of distinct pronunciations:     28085
    number of distinct syntactic tokens:   30245
    number of distinct evaluation tokens:  30243
    average number of phonemes per pronunciation: 6.35642
  </information>
  <information component="neural-network-trainer">
    Load classic acoustic model.
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model.allophones">
    184 allophones after adding allophones from file "/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank"
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model.allophones">
    184 allophones after adding all allophones
  </information>
  <information component="neural-network-trainer">
    create CTC topology graph builder
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    blank allophone id 179
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    lemma-pronuncation-to-lemma transducer
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!acyclic !cached !linear !sorted-by-arc !sorted-by-input sorted-by-output storage</properties>
      <semiring>tropical</semiring>
      <input-labels>30858</input-labels>
      <output-labels>30250</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>0</max-state-id>
      <states>1</states>
      <arcs>30858</arcs>
      <final-states>1</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>0</output-epsilon-arcs>
      <memory>493792</memory>
    </fsa-info>
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    phoneme-to-lemma-pronuncation transducer
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!acyclic !cached !linear !sorted-by-arc !sorted-by-input sorted-by-output storage</properties>
      <semiring>tropical</semiring>
      <input-labels>47</input-labels>
      <output-labels>30858</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>48718</max-state-id>
      <states>48719</states>
      <arcs>79576</arcs>
      <final-states>1</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>48718</output-epsilon-arcs>
      <memory>4391232</memory>
    </fsa-info>
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model">
    184 distinct allophones found
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model">
    <statistics type="state-model-transducer">
      <number-of-distinct-allophones>184</number-of-distinct-allophones>
      <number-of-states boundary="word-end" coarticulated="false">
        1
      </number-of-states>
      <number-of-states boundary="word-start" coarticulated="false">
        1
      </number-of-states>
      <number-of-states boundary="word-end" coarticulated="false">
        1
      </number-of-states>
      <number-of-states boundary="word-start" coarticulated="false">
        1
      </number-of-states>
      <number-of-states boundary="intra-word" coarticulated="false">
        1
      </number-of-states>
    </statistics>
  </information>


The opts for the FastBaumWelchLoss are below. Except for the RASR path and the unbuffered output, they are identical to working setups with TF 2.8 that do not have the issue described in rwth-i6/returnn#1450, so there should not be a specific problem with this configuration.


    "output": {                                                                                                             
        "class": "softmax",                                                                                                 
        "from": "encoder",                                                                                                  
        "loss": "fast_bw",                                                                                                  
        "loss_opts": {                                                                                                      
            "sprint_opts": {                                                                                                
                "sprintExecPath": "/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard",
                "sprintConfigStr": "--*.configuration.channel=output-channel --model-automaton.channel=output-channel "  
                "--*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel "                    
                "--*.time.channel=output-channel --*.version.channel=output-channel "                                    
                "--*.log.channel=output-channel --*.warning.channel=output-channel, stderr "                             
                "--*.error.channel=output-channel, stderr --*.statistics.channel=output-channel "                        
                "--*.progress.channel=output-channel --*.dot.channel=nil "                                               
                "--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz "
                "--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1 "
                "--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml "
                "--*.model-combination.acoustic-model.state-tying.type=lookup "                                          
                "--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank "
                "--*.model-combination.acoustic-model.allophones.add-from-lexicon=no "                                   
                "--*.model-combination.acoustic-model.allophones.add-all=yes "                                           
                "--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank "
                "--*.model-combination.acoustic-model.hmm.states-per-phone=1 "                                           
                "--*.model-combination.acoustic-model.hmm.state-repetitions=1 "                                          
                "--*.model-combination.acoustic-model.hmm.across-word-model=yes "                                        
                "--*.model-combination.acoustic-model.hmm.early-recombination=no "                                       
                "--*.model-combination.acoustic-model.tdp.scale=1.0 "                                                    
                "--*.model-combination.acoustic-model.tdp.*.loop=0.0 "                                                   
                "--*.model-combination.acoustic-model.tdp.*.forward=0.0 "                                                
                "--*.model-combination.acoustic-model.tdp.*.skip=infinity "                                              
                "--*.model-combination.acoustic-model.tdp.*.exit=0.0 "                                                   
                "--*.model-combination.acoustic-model.tdp.silence.loop=0.0 "                                             
                "--*.model-combination.acoustic-model.tdp.silence.forward=0.0 "                                          
                "--*.model-combination.acoustic-model.tdp.silence.skip=infinity "                                        
                "--*.model-combination.acoustic-model.tdp.silence.exit=0.0 "                                             
                "--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity "                                       
                "--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity "                                       
                "--*.model-combination.acoustic-model.phonology.history-length=0 "                                       
                "--*.model-combination.acoustic-model.phonology.future-length=0 "                                        
                "--*.transducer-builder-filter-out-invalid-allophones=yes "                                              
                "--*.fix-allophone-context-at-word-boundaries=yes "                                                      
                "--*.allophone-state-graph-builder.topology=ctc "                                                        
                "--*.allow-for-silence-repetitions=no --action=python-control "                                          
                "--python-control-loop-type=python-control-loop --extract-features=no "                                  
                "--*.encoding=UTF-8 --*.output-channel.file=$(LOGFILE) "                                                 
                "--*.output-channel.compressed=no --*.output-channel.append=no "                                         
                "--*.output-channel.unbuffered=yes --*.LOGFILE=nn-trainer.loss.log --*.TASK=1",                          
                "minPythonControlVersion": 4,                                                                            
                "numInstances": 2,                                                                                       
                "usePythonSegmentOrder": False,                                                                          
            },                                                                                                           
            "tdp_scale": 0.0,                                                                                            
        },                                                                                                               
        "target": None,                                                                                                  
        "n_out": 88,                                                                                                     
    }, 
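As a side note on how this config is assembled: the multi-line `sprintConfigStr` above relies on Python's implicit concatenation of adjacent string literals, and the resulting single string is then split on whitespace into the individual RASR arguments. A minimal illustration of that mechanism, with toy argument values rather than the real config:

```python
# Adjacent string literals (with no comma between them) are joined by the
# Python parser into one single string, exactly as in sprintConfigStr above.
config_str = (
    "--*.configuration.channel=output-channel "
    "--*.dot.channel=nil "
    "--action=python-control"
)

# Splitting on whitespace yields the flat argument list for the subprocess.
args = config_str.split()
print(args)
```

Note that a value containing a space (such as `output-channel, stderr` above) gets split into two tokens by this mechanism, which matches the trailing-comma values visible in the resources dump of the log.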

@albertz (Member) commented Nov 9, 2023

The relevant stack trace (demangled):

  PROGRAM DEFECTIVE (TERMINATED BY SIGNAL):
  Segmentation fault

  Creating stack trace (innermost first):
  #2  /lib/x86_64-linux-gnu/libc.so.6( 0x42520) [0x7fc2485f8520]
  #3  /lib/x86_64-linux-gnu/libc.so.6(pthread_kill 0x12c) [0x7fc24864c9fc]
  #4  /lib/x86_64-linux-gnu/libc.so.6(raise 0x16) [0x7fc2485f8476]
  #5  /lib/x86_64-linux-gnu/libc.so.6( 0x42520) [0x7fc2485f8520]
  #6  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Ftl::TrimAutomaton<Fsa::Automaton>::getState(unsigned int) const 0x3a) [0x55edd8e4640a]
  #7  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Ftl::CacheAutomaton<Fsa::Automaton>::getState(unsigned int) const 0x3a2) [0x55edd8e55c72]
  #8  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard( 0x9fb257) [0x55edd8dd7257]
  #9  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard( 0x9fe9ac) [0x55edd8dda9ac]
  #10  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Am::TransitionModel::apply(Core::Ref<Fsa::Automaton const>, int, bool) const 0x274) [0x55edd8dd3194]
  #11  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Am::ClassicTransducerBuilder::applyTransitionModel(Core::Ref<Fsa::Automaton const>) 0x387) [0x55edd8dc2df7]
  #12  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::AllophoneStateGraphBuilder::addLoopTransition(Core::Ref<Fsa::Automaton const>) 0x123) [0x55edd8be4e43]
  #13  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::CTCTopologyGraphBuilder::addLoopTransition(Core::Ref<Fsa::Automaton const>) 0x53) [0x55edd8be5183]
  #14  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::CTCTopologyGraphBuilder::buildTransducer(Core::Ref<Fsa::Automaton const>) 0x8f) [0x55edd8be7cbf]
  #15  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::AllophoneStateGraphBuilder::buildTransducer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) 0x66) [0x55edd8be2516]
  #16  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::AllophoneStateGraphBuilder::build(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) 0x2e) [0x55edd8be2d5e]
  #17  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::AllophoneStateFsaExporter::exportFsaForOrthography(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const 0x54) [0x55edd8abb054]
  #18  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::PythonControl::Internal::exportAllophoneStateFsaBySegName(_object*, _object*) 0x133) [0x55edd8aa0833]
  #19  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::PythonControl::Internal::callback(_object*, _object*) 0x25d) [0x55edd8aa0e6d]
  #20  /lib/x86_64-linux-gnu/libpython3.11.so.1.0( 0x1cd073) [0x7fc27c978073]
  #21  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_MakeTpCall 0x87) [0x7fc27c928ff7]
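For comparing such traces across runs, the frame lines can be pulled apart into (frame number, binary, symbol+offset, address) tuples with a small throwaway parser. This is a hypothetical helper, not part of RASR or RETURNN, and it assumes the frame format shown above (`#N  binary(symbol offset) [address]`); the binary path in the example is made up:

```python
import re

# Matches frame lines of the form:
#   #6  /path/to/binary(Ns::Type::method(unsigned int) const 0x3a) [0x55edd8e4640a]
# The binary path is matched lazily up to the first "(", and the symbol part
# greedily up to the last ")" before the bracketed address, so symbols that
# themselves contain parentheses are captured whole.
FRAME_RE = re.compile(r"#(\d+)\s+(\S+?)\((.*)\)\s+\[(0x[0-9a-f]+)\]")

def parse_frames(trace: str):
    """Return a list of (frame_no, binary, symbol_and_offset, address) tuples."""
    return [m.groups() for m in FRAME_RE.finditer(trace)]

trace = """
#6  /opt/nn-trainer(Ftl::TrimAutomaton<Fsa::Automaton>::getState(unsigned int) const 0x3a) [0x55edd8e4640a]
#20  /lib/x86_64-linux-gnu/libpython3.11.so.1.0( 0x1cd073) [0x7fc27c978073]
"""
for frame in parse_frames(trace):
    print(frame)
```

Frames without a symbol (stripped binaries, as in frame #20) still parse; only the raw offset remains in the symbol field.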

@vieting (Contributor, Author) commented Nov 9, 2023

@SimBe195 pointed me to use --*.model-automaton.channel=output-channel, which results in the following nn-trainer log output:


<?xml version="1.0" encoding="UTF-8"?>
<sprint>
  <system-information>
    <name>cn-259</name>
    <type>x86_64</type>
    <operating-system>Linux</operating-system>
    <build-date>Nov  9 2023</build-date>
    <local-time>2023-11-09 10:09:31.450</local-time>
  </system-information>
  <version>
    RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)
  </version>
  <configuration>
    <source type="command line">--*.python-control-enabled=true --*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn --*.pymod-name=returnn.sprint.control --*.pymod-config=c2p_fd:36,p2c_fd:38,minPythonControlVersion:4 --*.configuration.channel=output-channel --*.model-automaton.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-channel --*.version.channel=output-channel --*.log.channel=output-channel --*.warning.channel=output-channel, stderr --*.error.channel=output-channel, stderr --*.statistics.channel=output-channel --*.progress.channel=output-channel --*.dot.channel=nil --*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz --*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1 --*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml --*.model-combination.acoustic-model.state-tying.type=lookup --*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank --*.model-combination.acoustic-model.allophones.add-from-lexicon=no --*.model-combination.acoustic-model.allophones.add-all=yes --*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank --*.model-combination.acoustic-model.hmm.states-per-phone=1 --*.model-combination.acoustic-model.hmm.state-repetitions=1 --*.model-combination.acoustic-model.hmm.across-word-model=yes --*.model-combination.acoustic-model.hmm.early-recombination=no --*.model-combination.acoustic-model.tdp.scale=1.0 --*.model-combination.acoustic-model.tdp.*.loop=0.0 
--*.model-combination.acoustic-model.tdp.*.forward=0.0 --*.model-combination.acoustic-model.tdp.*.skip=infinity --*.model-combination.acoustic-model.tdp.*.exit=0.0 --*.model-combination.acoustic-model.tdp.silence.loop=0.0 --*.model-combination.acoustic-model.tdp.silence.forward=0.0 --*.model-combination.acoustic-model.tdp.silence.skip=infinity --*.model-combination.acoustic-model.tdp.silence.exit=0.0 --*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity --*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity --*.model-combination.acoustic-model.phonology.history-length=0 --*.model-combination.acoustic-model.phonology.future-length=0 --*.transducer-builder-filter-out-invalid-allophones=yes --*.fix-allophone-context-at-word-boundaries=yes --*.allophone-state-graph-builder.topology=ctc --*.allow-for-silence-repetitions=no --action=python-control --python-control-loop-type=python-control-loop --extract-features=no --*.encoding=UTF-8 --*.output-channel.file=$(LOGFILE) --*.output-channel.compressed=no --*.output-channel.append=no --*.output-channel.unbuffered=yes --*.LOGFILE=nn-trainer.loss.log --*.TASK=1</source>
    <source type="command line">--*.python-control-enabled=true --*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn --*.pymod-name=returnn.sprint.control --*.pymod-config=c2p_fd:36,p2c_fd:38,minPythonControlVersion:4 --*.configuration.channel=output-channel --*.model-automaton.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-channel --*.version.channel=output-channel --*.log.channel=output-channel --*.warning.channel=output-channel, stderr --*.error.channel=output-channel, stderr --*.statistics.channel=output-channel --*.progress.channel=output-channel --*.dot.channel=nil --*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz --*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1 --*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml --*.model-combination.acoustic-model.state-tying.type=lookup --*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank --*.model-combination.acoustic-model.allophones.add-from-lexicon=no --*.model-combination.acoustic-model.allophones.add-all=yes --*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank --*.model-combination.acoustic-model.hmm.states-per-phone=1 --*.model-combination.acoustic-model.hmm.state-repetitions=1 --*.model-combination.acoustic-model.hmm.across-word-model=yes --*.model-combination.acoustic-model.hmm.early-recombination=no --*.model-combination.acoustic-model.tdp.scale=1.0 --*.model-combination.acoustic-model.tdp.*.loop=0.0 
--*.model-combination.acoustic-model.tdp.*.forward=0.0 --*.model-combination.acoustic-model.tdp.*.skip=infinity --*.model-combination.acoustic-model.tdp.*.exit=0.0 --*.model-combination.acoustic-model.tdp.silence.loop=0.0 --*.model-combination.acoustic-model.tdp.silence.forward=0.0 --*.model-combination.acoustic-model.tdp.silence.skip=infinity --*.model-combination.acoustic-model.tdp.silence.exit=0.0 --*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity --*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity --*.model-combination.acoustic-model.phonology.history-length=0 --*.model-combination.acoustic-model.phonology.future-length=0 --*.transducer-builder-filter-out-invalid-allophones=yes --*.fix-allophone-context-at-word-boundaries=yes --*.allophone-state-graph-builder.topology=ctc --*.allow-for-silence-repetitions=no --action=python-control --python-control-loop-type=python-control-loop --extract-features=no --*.encoding=UTF-8 --*.output-channel.file=$(LOGFILE) --*.output-channel.compressed=no --*.output-channel.append=no --*.output-channel.unbuffered=yes --*.LOGFILE=nn-trainer.loss.log --*.TASK=1</source>
    <resources>
      *.home = /u/vieting
      APPTAINER_APPNAME = 
      APPTAINER_BIND = /work/asr4,/work/common,/work/tools22,/u/hilmes,/work/asr3,/usr/local/cache-manager
      APPTAINER_COMMAND = exec
      APPTAINER_CONTAINER = /work/asr4/hilmes/dev/rasr/apptainer/2023-11-08_tensorflow-2.14_v1/image2.sif
      APPTAINER_ENVIRONMENT = /.singularity.d/env/91-environment.sh
      APPTAINER_NAME = image2.sif
      CLICOLOR = 1
      CUDA_VERSION = 11.8.0
      CUDA_VISIBLE_DEVICES = 2
      DBUS_SESSION_BUS_ADDRESS = unix:path=/run/user/2699/bus
      DEBIAN_FRONTEND = noninteractive
      GPU_DEVICE_ORDINAL = 2
      GREPCOLOR = 32
      GREP_COLOR = 32
      HOME = /u/vieting
      KEYTIMEOUT = 1
      LANG = C.UTF-8
      LC_ADDRESS = de_DE.UTF-8
      LC_IDENTIFICATION = de_DE.UTF-8
      LC_MEASUREMENT = de_DE.UTF-8
      LC_MONETARY = de_DE.UTF-8
      LC_NAME = de_DE.UTF-8
      LC_NUMERIC = de_DE.UTF-8
      LC_PAPER = de_DE.UTF-8
      LC_TELEPHONE = de_DE.UTF-8
      LC_TIME = de_DE.UTF-8
      LD_LIBRARY_PATH = /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
      LOGNAME = vieting
      LSCOLORS = ExFxCxDxBxehedabagacad
      LS_COLORS = *.swp=-1;44;37:*,v=5;34;93:*.vim=35:no=0:ex=1;31:fi=0:di=1;36:ln=33:or=5;35:mi=1;40:pi=93:so=33:bd=44;37:cd=44;37:*.jpg=1:*.jpeg=1:*.JPG=1:*.gif=1:*.png=1:*.jpeg=1:*.ppm=1:*.pgm=1:*.pbm=1:*.c=1;33:*.C=1;33:*.h=1;33:*.cc=1;33:*.awk=1;33:*.pl=1;33:*.py=1;33:*.m=1;33:*.rb=1;33:*.gz=0;33:*.tar=0;33:*.zip=0;33:*.lha=0;33:*.lzh=0;33:*.arj=0;33:*.bz2=0;33:*.tgz=0;33:*.taz=33:*.dmg=0;33:*.html=36:*.htm=36:*.doc=36:*.txt=1;36:*.o=1;36:*.a=1;36
      MKL_NUM_THREADS = 1
      MOTD_SHOWN = pam
      NVARCH = x86_64
      NVIDIA_DRIVER_CAPABILITIES = compute,utility
      NVIDIA_REQUIRE_CUDA = cuda&gt;=11.8 brand=tesla,driver&gt;=450,driver&lt;451 brand=tesla,driver&gt;=470,driver&lt;471 brand=unknown,driver&gt;=470,driver&lt;471 brand=nvidia,driver&gt;=470,driver&lt;471 brand=nvidiartx,driver&gt;=470,driver&lt;471 brand=geforce,driver&gt;=470,driver&lt;471 brand=geforcertx,driver&gt;=470,driver&lt;471 brand=quadro,driver&gt;=470,driver&lt;471 brand=quadrortx,driver&gt;=470,driver&lt;471 brand=titan,driver&gt;=470,driver&lt;471 brand=titanrtx,driver&gt;=470,driver&lt;471 brand=tesla,driver&gt;=510,driver&lt;511 brand=unknown,driver&gt;=510,driver&lt;511 brand=nvidia,driver&gt;=510,driver&lt;511 brand=nvidiartx,driver&gt;=510,driver&lt;511 brand=geforce,driver&gt;=510,driver&lt;511 brand=geforcertx,driver&gt;=510,driver&lt;511 brand=quadro,driver&gt;=510,driver&lt;511 brand=quadrortx,driver&gt;=510,driver&lt;511 brand=titan,driver&gt;=510,driver&lt;511 brand=titanrtx,driver&gt;=510,driver&lt;511 brand=tesla,driver&gt;=515,driver&lt;516 brand=unknown,driver&gt;=515,driver&lt;516 brand=nvidia,driver&gt;=515,driver&lt;516 brand=nvidiartx,driver&gt;=515,driver&lt;516 brand=geforce,driver&gt;=515,driver&lt;516 brand=geforcertx,driver&gt;=515,driver&lt;516 brand=quadro,driver&gt;=515,driver&lt;516 brand=quadrortx,driver&gt;=515,driver&lt;516 brand=titan,driver&gt;=515,driver&lt;516 brand=titanrtx,driver&gt;=515,driver&lt;516
      NVIDIA_VISIBLE_DEVICES = all
      NV_CUDA_COMPAT_PACKAGE = cuda-compat-11-8
      NV_CUDA_CUDART_VERSION = 11.8.89-1
      OLDPWD = /work/asr4/vieting/tmp
      OMP_NUM_THREADS = 1
      PATH = /usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
      PROMPT_COMMAND = ${PROMPT_COMMAND%%; PROMPT_COMMAND=*}"; PS1="Apptainer&gt; 
      PS1 = Apptainer&gt; 
      PWD = /work/asr4/vieting/tmp/20231108_tf213_sprint_op
      ROCR_VISIBLE_DEVICES = 2
      SHELL = zsh
      SHLVL = 4
      SINFO_FORMAT = %30N %12P %5D %14T %10c %10m %16G
      SINGULARITY_BIND = /work/asr4,/work/common,/work/tools22,/u/hilmes,/work/asr3,/usr/local/cache-manager
      SINGULARITY_CONTAINER = /work/asr4/hilmes/dev/rasr/apptainer/2023-11-08_tensorflow-2.14_v1/image2.sif
      SINGULARITY_ENVIRONMENT = /.singularity.d/env/91-environment.sh
      SINGULARITY_NAME = image2.sif
      SLURMD_NODENAME = cn-259
      SLURM_CLUSTER_NAME = cluster
      SLURM_CONF = /var/spool/slurm/slurmd/conf-cache/slurm.conf
      SLURM_CPUS_ON_NODE = 2
      SLURM_CPUS_PER_TASK = 2
      SLURM_DISTRIBUTION = cyclic
      SLURM_GPUS_ON_NODE = 1
      SLURM_GTIDS = 0
      SLURM_JOBID = 3024332
      SLURM_JOB_ACCOUNT = hlt
      SLURM_JOB_CPUS_PER_NODE = 2
      SLURM_JOB_GID = 2000
      SLURM_JOB_ID = 3024332
      SLURM_JOB_NAME = bash
      SLURM_JOB_NODELIST = cn-259
      SLURM_JOB_NUM_NODES = 1
      SLURM_JOB_PARTITION = gpu_11gb
      SLURM_JOB_QOS = normal
      SLURM_JOB_UID = 2699
      SLURM_JOB_USER = vieting
      SLURM_LAUNCH_NODE_IPADDR = 10.6.4.4
      SLURM_LOCALID = 0
      SLURM_NNODES = 1
      SLURM_NODEID = 0
      SLURM_NODELIST = cn-259
      SLURM_NPROCS = 1
      SLURM_NTASKS = 1
      SLURM_PRIO_PROCESS = 0
      SLURM_PROCID = 0
      SLURM_PTY_PORT = 36351
      SLURM_PTY_WIN_COL = 203
      SLURM_PTY_WIN_ROW = 58
      SLURM_SRUN_COMM_HOST = 10.6.4.4
      SLURM_SRUN_COMM_PORT = 38771
      SLURM_STEPID = 0
      SLURM_STEP_GPUS = 2
      SLURM_STEP_ID = 0
      SLURM_STEP_LAUNCHER_PORT = 38771
      SLURM_STEP_NODELIST = cn-259
      SLURM_STEP_NUM_NODES = 1
      SLURM_STEP_NUM_TASKS = 1
      SLURM_STEP_TASKS_PER_NODE = 1
      SLURM_SUBMIT_DIR = /work/asr4/vieting/tmp/20231108_tf213_sprint_op
      SLURM_SUBMIT_HOST = cn-04
      SLURM_TASKS_PER_NODE = 1
      SLURM_TASK_PID = 3285529
      SLURM_TOPOLOGY_ADDR = cn-259
      SLURM_TOPOLOGY_ADDR_PATTERN = node
      SLURM_UMASK = 0022
      SLURM_WORKING_CLUSTER = cluster:mn-04:6817:9472:109
      SQUEUE_FORMAT = %.18i %.9P %.64j %.16u %8Q %.2t %19V %.10M %16R
      SRUN_DEBUG = 3
      SSH_CLIENT = 137.226.223.15 50634 22
      SSH_CONNECTION = 137.226.223.15 50634 137.226.116.49 22
      SSH_TTY = /dev/pts/5
      TERM = screen-256color
      TERM_PROGRAM = tmux
      TERM_PROGRAM_VERSION = 3.2a
      TF2_BEHAVIOR = 1
      THEANO_FLAGS = compiledir_format=compiledir_%(platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s--sprint-sub,device=cpu,force_device=True
      TMPDIR = /var/tmp
      TMUX = /tmp/tmux-2699/default,164990,4
      TMUX_PANE = %55
      TMUX_PLUGIN_MANAGER_PATH = /u/vieting/.tmux/plugins/
      TPU_ML_PLATFORM = Tensorflow
      USER = vieting
      USER_PATH = /usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/cuda-10.1/bin:/u/vieting/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/u/vieting/.fzf/bin:/u/vieting/.local/share/JetBrains/Toolbox/scripts:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
      XDG_DATA_DIRS = /usr/local/share:/usr/share:/var/lib/snapd/desktop
      XDG_RUNTIME_DIR = /run/user/2699
      XDG_SESSION_CLASS = user
      XDG_SESSION_ID = 15273
      XDG_SESSION_TYPE = tty
      ZLS_COLORS = *.swp=-1;44;37:*,v=5;34;93:*.vim=35:no=0:ex=1;31:fi=0:di=1;36:ln=33:or=5;35:mi=1;40:pi=93:so=33:bd=44;37:cd=44;37:*.jpg=1:*.jpeg=1:*.JPG=1:*.gif=1:*.png=1:*.ppm=1:*.pgm=1:*.pbm=1:*.c=1;33:*.C=1;33:*.h=1;33:*.cc=1;33:*.awk=1;33:*.pl=1;33:*.py=1;33:*.m=1;33:*.rb=1;33:*.gz=0;33:*.tar=0;33:*.zip=0;33:*.lha=0;33:*.lzh=0;33:*.arj=0;33:*.bz2=0;33:*.tgz=0;33:*.taz=33:*.dmg=0;33:*.html=36:*.htm=36:*.doc=36:*.txt=1;36:*.o=1;36:*.a=1;36:(-default-)*.swp=-1;44;37:(-default-)*,v=5;34;93:(-default-)*.vim=35:(-default-)no=0:(-default-)ex=1;31:(-default-)fi=0:(-default-)di=1;36:(-default-)ln=33:(-default-)or=5;35:(-default-)mi=1;40:(-default-)pi=93:(-default-)so=33:(-default-)bd=44;37:(-default-)cd=44;37:(-default-)*.jpg=1:(-default-)*.jpeg=1:(-default-)*.JPG=1:(-default-)*.gif=1:(-default-)*.png=1:(-default-)*.ppm=1:(-default-)*.pgm=1:(-default-)*.pbm=1:(-default-)*.c=1;33:(-default-)*.C=1;33:(-default-)*.h=1;33:(-default-)*.cc=1;33:(-default-)*.awk=1;33:(-default-)*.pl=1;33:(-default-)*.py=1;33:(-default-)*.m=1;33:(-default-)*.rb=1;33:(-default-)*.gz=0;33:(-default-)*.tar=0;33:(-default-)*.zip=0;33:(-default-)*.lha=0;33:(-default-)*.lzh=0;33:(-default-)*.arj=0;33:(-default-)*.bz2=0;33:(-default-)*.tgz=0;33:(-default-)*.taz=33:(-default-)*.dmg=0;33:(-default-)*.html=36:(-default-)*.htm=36:(-default-)*.doc=36:(-default-)*.txt=1;36:(-default-)*.o=1;36:(-default-)*.a=1;36
      _ = /usr/bin/apptainer
      neural-network-trainer.*.LOGFILE = nn-trainer.loss.log
      neural-network-trainer.*.TASK = 1
      neural-network-trainer.*.allophone-state-graph-builder.topology = ctc
      neural-network-trainer.*.allow-for-silence-repetitions = no
      neural-network-trainer.*.configuration.channel = output-channel
      neural-network-trainer.*.corpus.file = /u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz
      neural-network-trainer.*.corpus.segments.file = /u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1
      neural-network-trainer.*.dot.channel = nil
      neural-network-trainer.*.encoding = UTF-8
      neural-network-trainer.*.error.channel = output-channel,
      neural-network-trainer.*.fix-allophone-context-at-word-boundaries = yes
      neural-network-trainer.*.log.channel = output-channel
      neural-network-trainer.*.model-automaton.channel = output-channel
      neural-network-trainer.*.model-combination.acoustic-model.allophones.add-all = yes
      neural-network-trainer.*.model-combination.acoustic-model.allophones.add-from-file = /u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank
      neural-network-trainer.*.model-combination.acoustic-model.allophones.add-from-lexicon = no
      neural-network-trainer.*.model-combination.acoustic-model.hmm.across-word-model = yes
      neural-network-trainer.*.model-combination.acoustic-model.hmm.early-recombination = no
      neural-network-trainer.*.model-combination.acoustic-model.hmm.state-repetitions = 1
      neural-network-trainer.*.model-combination.acoustic-model.hmm.states-per-phone = 1
      neural-network-trainer.*.model-combination.acoustic-model.phonology.future-length = 0
      neural-network-trainer.*.model-combination.acoustic-model.phonology.history-length = 0
      neural-network-trainer.*.model-combination.acoustic-model.state-tying.file = /u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank
      neural-network-trainer.*.model-combination.acoustic-model.state-tying.type = lookup
      neural-network-trainer.*.model-combination.acoustic-model.tdp.*.exit = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.*.forward = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.*.loop = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.*.skip = infinity
      neural-network-trainer.*.model-combination.acoustic-model.tdp.entry-m1.loop = infinity
      neural-network-trainer.*.model-combination.acoustic-model.tdp.entry-m2.loop = infinity
      neural-network-trainer.*.model-combination.acoustic-model.tdp.scale = 1.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.exit = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.forward = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.loop = 0.0
      neural-network-trainer.*.model-combination.acoustic-model.tdp.silence.skip = infinity
      neural-network-trainer.*.model-combination.lexicon.file = /u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml
      neural-network-trainer.*.output-channel.append = no
      neural-network-trainer.*.output-channel.compressed = no
      neural-network-trainer.*.output-channel.file = $(LOGFILE)
      neural-network-trainer.*.output-channel.unbuffered = yes
      neural-network-trainer.*.progress.channel = output-channel
      neural-network-trainer.*.pymod-config = c2p_fd:36,p2c_fd:38,minPythonControlVersion:4
      neural-network-trainer.*.pymod-name = returnn.sprint.control
      neural-network-trainer.*.pymod-path = /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn
      neural-network-trainer.*.python-control-enabled = true
      neural-network-trainer.*.real-time-factor.channel = output-channel
      neural-network-trainer.*.statistics.channel = output-channel
      neural-network-trainer.*.system-info.channel = output-channel
      neural-network-trainer.*.time.channel = output-channel
      neural-network-trainer.*.transducer-builder-filter-out-invalid-allophones = yes
      neural-network-trainer.*.version.channel = output-channel
      neural-network-trainer.*.warning.channel = output-channel,
      neural-network-trainer.action = python-control
      neural-network-trainer.extract-features = no
      neural-network-trainer.python-control-loop-type = python-control-loop
    </resources>
    <selection>neural-network-trainer</selection>
  </configuration>
  <information component="neural-network-trainer">
    use 0 as seed for random number generator
  </information>
  <information component="neural-network-trainer">
    using single precision
  </information>
  <information component="neural-network-trainer">
    action: python-control
  </information>
  <information component="neural-network-trainer">
    PythonControl: run_control_loop
  </information>
  <information component="neural-network-trainer.corpus">
    Use a segment whitelist with 249529 entries, keep only listed segments.
  </information>
  <corpus name="switchboard-1" full-name="switchboard-1">
      [...]
  </corpus>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
    reading lexicon from file "/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml" ...
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
    dependency value: d5c175f07244eeb9a36f094fcd17677a
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.lexicon">
    statistics:
    number of phonemes:                    46
    number of lemmas:                      30250
    number of lemma pronunciations:        30858
    number of distinct pronunciations:     28085
    number of distinct syntactic tokens:   30245
    number of distinct evaluation tokens:  30243
    average number of phonemes per pronunciation: 6.35642
  </information>
  <information component="neural-network-trainer">
    Load classic acoustic model.
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model.allophones">
    184 allophones after adding allophones from file "/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank"
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model.allophones">
    184 allophones after adding all allophones
  </information>
  <information component="neural-network-trainer">
    create CTC topology graph builder
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    blank allophone id 179
  </information>
  <fsa-info>
    <type>acceptor</type>
    <describe>static</describe>
    <properties>!cached linear storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30250</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>144</memory>
  </fsa-info>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    lemma-pronuncation-to-lemma transducer
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!acyclic !cached !linear !sorted-by-arc !sorted-by-input sorted-by-output storage</properties>
      <semiring>tropical</semiring>
      <input-labels>30858</input-labels>
      <output-labels>30250</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>0</max-state-id>
      <states>1</states>
      <arcs>30858</arcs>
      <final-states>1</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>0</output-epsilon-arcs>
      <memory>493792</memory>
    </fsa-info>
  </information>
  <fsa-info>
    <type>acceptor</type>
    <describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30858</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>494179</memory>
  </fsa-info>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    phoneme-to-lemma-pronuncation transducer
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!acyclic !cached !linear !sorted-by-arc !sorted-by-input sorted-by-output storage</properties>
      <semiring>tropical</semiring>
      <input-labels>47</input-labels>
      <output-labels>30858</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>48718</max-state-id>
      <states>48719</states>
      <arcs>79576</arcs>
      <final-states>1</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>48718</output-epsilon-arcs>
      <memory>4391232</memory>
    </fsa-info>
  </information>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>47</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>4</max-state-id>
    <states>5</states>
    <arcs>4</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>3</output-epsilon-arcs>
    <memory>4885673</memory>
  </fsa-info>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model">
    184 distinct allophones found
  </information>
  <information component="neural-network-trainer.alignment-fsa-exporter.model-combination.acoustic-model">
    <statistics type="state-model-transducer">
      <number-of-distinct-allophones>184</number-of-distinct-allophones>
      <number-of-states boundary="word-end" coarticulated="false">
        1
      </number-of-states>
      <number-of-states boundary="word-start" coarticulated="false">
        1
      </number-of-states>
      <number-of-states boundary="word-end" coarticulated="false">
        1
      </number-of-states>
      <number-of-states boundary="word-start" coarticulated="false">
        1
      </number-of-states>
      <number-of-states boundary="intra-word" coarticulated="false">
        1
      </number-of-states>
    </statistics>
  </information>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>7</max-state-id>
    <states>5</states>
    <arcs>4</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>3</output-epsilon-arcs>
    <memory>4889690</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>static</describe>
    <properties>!cached storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>4</max-state-id>
    <states>5</states>
    <arcs>7</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>6</output-epsilon-arcs>
    <memory>432</memory>
  </fsa-info>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!cached storage</properties>
      <semiring>tropical</semiring>
      <input-labels>0</input-labels>
      <output-labels>30858</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>8</max-state-id>
      <states>9</states>
      <arcs>18</arcs>
      <final-states>2</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>16</output-epsilon-arcs>
      <memory>864</memory>
    </fsa-info>
  </information>
  <fsa-info>
    <type>acceptor</type>
    <describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
    <properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>8</max-state-id>
    <states>8</states>
    <arcs>17</arcs>
    <final-states>2</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>1764</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>static</describe>
    <properties>!cached linear storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30250</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>144</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30858</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>494179</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>47</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>2</max-state-id>
    <states>3</states>
    <arcs>2</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>1</output-epsilon-arcs>
    <memory>4885649</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>3</max-state-id>
    <states>3</states>
    <arcs>2</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>1</output-epsilon-arcs>
    <memory>4889421</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>static</describe>
    <properties>!cached storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>2</max-state-id>
    <states>3</states>
    <arcs>3</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>2</output-epsilon-arcs>
    <memory>240</memory>
  </fsa-info>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!cached storage</properties>
      <semiring>tropical</semiring>
      <input-labels>0</input-labels>
      <output-labels>30858</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>4</max-state-id>
      <states>5</states>
      <arcs>8</arcs>
      <final-states>2</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>6</output-epsilon-arcs>
      <memory>448</memory>
    </fsa-info>
  </information>
  <fsa-info>
    <type>acceptor</type>
    <describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
    <properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>4</max-state-id>
    <states>4</states>
    <arcs>7</arcs>
    <final-states>2</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>896</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>static</describe>
    <properties>!cached linear storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30250</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>144</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30858</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>494179</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>47</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>4</max-state-id>
    <states>5</states>
    <arcs>4</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>3</output-epsilon-arcs>
    <memory>4885673</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>7</max-state-id>
    <states>5</states>
    <arcs>4</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>3</output-epsilon-arcs>
    <memory>4889690</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>static</describe>
    <properties>!cached storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>4</max-state-id>
    <states>5</states>
    <arcs>7</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>6</output-epsilon-arcs>
    <memory>432</memory>
  </fsa-info>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!cached storage</properties>
      <semiring>tropical</semiring>
      <input-labels>0</input-labels>
      <output-labels>30858</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>8</max-state-id>
      <states>9</states>
      <arcs>18</arcs>
      <final-states>2</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>16</output-epsilon-arcs>
      <memory>864</memory>
    </fsa-info>
  </information>
  <fsa-info>
    <type>acceptor</type>
    <describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
    <properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>8</max-state-id>
    <states>8</states>
    <arcs>17</arcs>
    <final-states>2</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>1764</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>static</describe>
    <properties>!cached linear storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30250</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>144</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30858</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>494179</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>47</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>2</max-state-id>
    <states>3</states>
    <arcs>2</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>1</output-epsilon-arcs>
    <memory>4885649</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>3</max-state-id>
    <states>3</states>
    <arcs>2</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>1</output-epsilon-arcs>
    <memory>4889421</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>static</describe>
    <properties>!cached storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>2</max-state-id>
    <states>3</states>
    <arcs>3</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>2</output-epsilon-arcs>
    <memory>240</memory>
  </fsa-info>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!cached storage</properties>
      <semiring>tropical</semiring>
      <input-labels>0</input-labels>
      <output-labels>30858</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>4</max-state-id>
      <states>5</states>
      <arcs>8</arcs>
      <final-states>2</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>6</output-epsilon-arcs>
      <memory>448</memory>
    </fsa-info>
  </information>
  <fsa-info>
    <type>acceptor</type>
    <describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
    <properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>4</max-state-id>
    <states>4</states>
    <arcs>7</arcs>
    <final-states>2</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>896</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>static</describe>
    <properties>!cached linear storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30250</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>144</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30858</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>1</max-state-id>
    <states>2</states>
    <arcs>1</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>494179</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>47</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>4</max-state-id>
    <states>5</states>
    <arcs>4</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>3</output-epsilon-arcs>
    <memory>4885673</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>7</max-state-id>
    <states>5</states>
    <arcs>4</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>3</output-epsilon-arcs>
    <memory>4889690</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>static</describe>
    <properties>!cached storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>4</max-state-id>
    <states>5</states>
    <arcs>7</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>6</output-epsilon-arcs>
    <memory>432</memory>
  </fsa-info>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!cached storage</properties>
      <semiring>tropical</semiring>
      <input-labels>0</input-labels>
      <output-labels>30858</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>8</max-state-id>
      <states>9</states>
      <arcs>18</arcs>
      <final-states>2</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>16</output-epsilon-arcs>
      <memory>864</memory>
    </fsa-info>
  </information>
  <fsa-info>
    <type>acceptor</type>
    <describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
    <properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>8</max-state-id>
    <states>8</states>
    <arcs>17</arcs>
    <final-states>2</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>1764</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>static</describe>
    <properties>!cached linear storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30250</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>3</max-state-id>
    <states>4</states>
    <arcs>3</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>304</memory>
  </fsa-info>
  <fsa-info>
    <type>acceptor</type>
    <describe>projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000))))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>30858</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>3</max-state-id>
    <states>4</states>
    <arcs>3</arcs>
    <final-states>1</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>494541</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>47</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>12</max-state-id>
    <states>13</states>
    <arcs>12</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>9</output-epsilon-arcs>
    <memory>4886310</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>trim(composeMatching(static,cache(sort(trim(composeMatching(static,cache(sort(projectOutput(trim(composeMatching(cache(sort(static,SortTypeByOutput),10000),cache(invert(static),10000)))),SortTypeByInput),10000))),SortTypeByInput),10000)))</describe>
    <properties>!cached !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <states>0</states>
    <arcs>0</arcs>
    <final-states>0</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>0</output-epsilon-arcs>
    <memory>4890327</memory>
  </fsa-info>
  <fsa-info>
    <type>transducer</type>
    <describe>static</describe>
    <properties>!cached storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <output-labels>30858</output-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>4</max-state-id>
    <states>5</states>
    <arcs>7</arcs>
    <final-states>1</final-states>
    <io-epsilon-arcs>0</io-epsilon-arcs>
    <input-epsilon-arcs>0</input-epsilon-arcs>
    <output-epsilon-arcs>6</output-epsilon-arcs>
    <memory>432</memory>
  </fsa-info>
  <information component="neural-network-trainer.alignment-fsa-exporter.allophone-state-graph-builder">
    <fsa-info>
      <type>transducer</type>
      <describe>static</describe>
      <properties>!cached storage</properties>
      <semiring>tropical</semiring>
      <input-labels>0</input-labels>
      <output-labels>30858</output-labels>
      <initial-state-id>0</initial-state-id>
      <max-state-id>8</max-state-id>
      <states>9</states>
      <arcs>18</arcs>
      <final-states>2</final-states>
      <io-epsilon-arcs>0</io-epsilon-arcs>
      <input-epsilon-arcs>0</input-epsilon-arcs>
      <output-epsilon-arcs>16</output-epsilon-arcs>
      <memory>864</memory>
    </fsa-info>
  </information>
  <fsa-info>
    <type>acceptor</type>
    <describe>removeEpsilons(cache(sort(replaceDisambiguationSymbols(projectInput(static), *EPS*),SortTypeByInputAndTarget),10000))</describe>
    <properties>!cached !sorted-by-arc sorted-by-input !sorted-by-output !storage</properties>
    <semiring>tropical</semiring>
    <input-labels>0</input-labels>
    <initial-state-id>0</initial-state-id>
    <max-state-id>8</max-state-id>
    <states>8</states>
    <arcs>17</arcs>
    <final-states>2</final-states>
    <epsilon-arcs>0</epsilon-arcs>
    <memory>1764</memory>
  </fsa-info>


@albertz
Member

albertz commented Nov 9, 2023

It would be helpful to have a RASR compiled with debugging information and then run this in GDB, so that you don't just get the crash but can inspect it in GDB and see a more detailed stack trace with line numbers. Specifically interesting is maybe Am::TransitionModel::apply.

@Marvin84
Contributor

Marvin84 commented Nov 9, 2023

Specifically interesting is maybe Am::TransitionModel::apply.

Isn't the traceback showing Ftl::TrimAutomaton<Fsa::Automaton>::getState(unsigned int) as the last function call?

@albertz
Member

albertz commented Nov 9, 2023

Specifically interesting is maybe Am::TransitionModel::apply.

Isn't the traceback showing Ftl::TrimAutomaton<Fsa::Automaton>::getState(unsigned int) as the last function call?

Yes, but my assumption is that the this pointer here is already invalid, and that this causes an invalid memory access in getState. The question is why the this pointer is invalid, and I assume the code in Am::TransitionModel::apply might give a better hint about that.

@Marvin84
Contributor

Marvin84 commented Nov 9, 2023

Am::TransitionModel::apply might give a better hint about that.

Given a flat automaton resulting from the HCLG composition, the apply call adds the valid transitions following the desired topology, with the respective scores on the arcs.
Last year, for the correction of the FSA bug, we refactored this in order to distinguish between two different classes here, one with the bug fix and one legacy. However, this was for the classic HMM topology. For the CTC topology the code initially was not integrable and manually overwrote the HMM automaton in the export-FSA function right before passing it to RETURNN. After the integration, AFAIK they all now go through ClassicTransducerBuilder. However, it is not clear to me why this apply function is even called in the CTC case. I see specific calls to CTC-related transitions in the stack above. @SimBe195 might know more.

@albertz
Member

albertz commented Nov 9, 2023

TrimAutomaton has this getState:

    virtual _ConstStateRef getState(Fsa::StateId s) const {
        if (accAndCoacc_[s]) {
            _ConstStateRef _s = Precursor::fsa_->getState(s);
            _State*        sp = new _State(_s->id(), _s->tags(), _s->weight_);
            for (typename _State::const_iterator a = _s->begin(); a != _s->end(); ++a)
                if (accAndCoacc_[a->target()])
                    *sp->newArc() = *a;
            sp->minimize();
            return _ConstStateRef(sp);
        }
        return _ConstStateRef();
    }

So maybe my previous assumption was wrong, and this is valid, but the state id s here is invalid (e.g. -1 or so, or too high).

It would really help to run this in a debugger with debugging symbols, so that we can just better understand what's wrong here, without needing to guess blindly around.

@vieting
Contributor Author

vieting commented Nov 9, 2023

It seems that the problem is not universal but segment-related. With two segments (orth "um-hum" and "uh-huh"), the training runs, but there are others for which it crashes (examples I saw: "that's right" and "that is great").

@SimBe195
Collaborator

SimBe195 commented Nov 9, 2023

Judging by the .dot files that @vieting generated, the phon.dot graph still looks okay, but the allophon.dot graph is empty, which most likely means that the Fsa::trim here trimmed away every single node. This could happen e.g. when there is no reachable final state. I would remove the Fsa::trim call and inspect the resulting allophon.dot in order to see how the graph is malformed.

@vieting
Contributor Author

vieting commented Nov 9, 2023

Judging by the .dot files that @vieting generated the phon.dot graph still looks okay but the allophon.dot graph is empty which most likely means that the Fsa::trim here trimmed away every single node. This could happen e.g. when there is no reachable final state. I would remove the Fsa::trim call and inspect the resulting allophon.dot in order to see how the graph is malformed.

This is how the allophon.dot looks when removing the trim:

digraph "fsa" {
ranksep = 1.0;
rankdir = LR;
center = 1;
orientation = Portrait
node [fontname="Helvetica"]
edge [fontname="Helvetica"]
n0 [label="0",shape=circle,style=bold]
n0 -> n1 [label="s{#+#}@i.0:so   /s ow/"]
n0 -> n2 [label="s{#+#}@i@f.0:so   /s ow/"]
n2 [label="2",shape=circle]
n1 [label="1",shape=circle]
n1 -> n3 [label="ow{#+#}.0:*EPS*"]
n1 -> n4 [label="ow{#+#}@f.0:*EPS*"]
n4 [label="4",shape=circle]
n4 -> n5 [label="#0:*EPS*"]
n5 [label="5",shape=circle]
n3 [label="3",shape=circle]
}

Plot

The transcription of the segment is "so the", but the "the" seems to be lost here.

@SimBe195
Collaborator

SimBe195 commented Nov 9, 2023

Ooh, this might be caused by this issue here #50 for which I have the fix in my RASR versions but it hasn't been merged into master yet!

@albertz
Member

albertz commented Nov 9, 2023

But we should also avoid that it crashes in that case. At least it should raise a C++ exception, or use our criticalError(), require or whatever, at the place where we get the wrong access (still not sure where that is, e.g. in getState with an invalid s, or already before). There should be additional checks. Ideally, getState should also check in any case whether s is valid, using require or so.
(require is only checked if SPRINT_RELEASE_BUILD is disabled. Do you have that?)

@SimBe195
Collaborator

SimBe195 commented Nov 9, 2023

But we should also avoid that it crashes in that case. At least it should raise a C++ exception, or use our criticalError(), require or whatever.

Yes I agree. We could simply put in another check of the form

if (model->initialStateId() == Fsa::InvalidStateId)
    criticalError("...");

like it's also done for other intermediate automatons in the GraphBuilder already.
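A self-contained sketch of that fail-fast pattern (ToyModel and checkNonEmpty are placeholders; the real code would call criticalError on the GraphBuilder with the segment name in the message):

```cpp
#include <limits>
#include <stdexcept>
#include <string>

// Stand-in for Fsa::InvalidStateId, which an automaton with no
// states reports as its initial state id.
constexpr unsigned InvalidStateId = std::numeric_limits<unsigned>::max();

struct ToyModel {
    unsigned initial;
    unsigned initialStateId() const { return initial; }
};

// Fail fast with a descriptive error instead of letting an empty
// automaton propagate into getState and segfault much later.
void checkNonEmpty(const ToyModel& model, const std::string& segment) {
    if (model.initialStateId() == InvalidStateId)
        throw std::runtime_error("empty automaton after trim for segment: " + segment);
}
```

With such a check, the bad segments would have produced a clear error naming the segment instead of a bare segmentation fault.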

@vieting
Contributor Author

vieting commented Nov 9, 2023

Ooh, this might be caused by this issue here #50 for which I have the fix in my RASR versions but it hasn't been merged into master yet!

I can confirm that the suggested fix in #50 solves my issue as well 🎉

@vieting
Contributor Author

vieting commented Nov 9, 2023

The RASR version that I had with tf2.8 was taken from @SimBe195, so it already had the fix.

@vieting
Contributor Author

vieting commented Nov 9, 2023

I guess we should leave this issue open until #50 is merged?
