Segmentation fault with TF 2.14 image when providing automata to RETURNN #68
The full stdout+stderr of the RETURNN training is below.
RASR writes the following log for the nn-trainer:
The opts for the
The relevant stack trace (demangled):
@SimBe195 pointed me to use
It would help to have RASR compiled with debugging information and to run this in GDB, so that instead of just getting the crash you can inspect it and see a more detailed stack trace with line numbers. Specifically interesting is maybe
Isn't the traceback showing
Yes, but my assumption is that the
Given a flat automaton resulting from the HCLG composition,
```cpp
virtual _ConstStateRef getState(Fsa::StateId s) const {
    // Only states that are both accessible and coaccessible survive the filter.
    if (accAndCoacc_[s]) {
        _ConstStateRef _s = Precursor::fsa_->getState(s);
        // Copy the state without its arcs ...
        _State* sp = new _State(_s->id(), _s->tags(), _s->weight_);
        // ... then re-add only the arcs whose target state also survives.
        for (typename _State::const_iterator a = _s->begin(); a != _s->end(); ++a)
            if (accAndCoacc_[a->target()])
                *sp->newArc() = *a;
        sp->minimize();
        return _ConstStateRef(sp);
    }
    // Pruned states are returned as an empty (default-constructed) reference.
    return _ConstStateRef();
}
```
So maybe my previous assumption was wrong, and
It would really help to run this in a debugger with debugging symbols, so that we can better understand what's wrong here without guessing blindly.
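To make the pruning behaviour concrete, here is a minimal, self-contained C++ sketch of trimming an automaton to its accessible-and-coaccessible states. The types `Arc`, `State`, `Automaton` and the functions below are hypothetical simplifications for illustration, not RASR's Fsa API. Like the `getState()` filter quoted above, `trimmedState()` returns an empty (null) result for a pruned state. If one word's arc is missing, no path connects the initial state to a final state, so even the initial state gets pruned. This is one plausible way a caller that dereferences the result without checking could crash.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified automaton types for illustration only (not RASR's Fsa API).
struct Arc {
    std::size_t target;
    int label;
};

struct State {
    std::vector<Arc> arcs;
    bool isFinal = false;
};

struct Automaton {
    std::size_t initial = 0;
    std::vector<State> states;
};

// Forward search: states reachable from the initial state (accessible).
std::vector<bool> accessible(const Automaton& a) {
    std::vector<bool> seen(a.states.size(), false);
    if (a.initial >= a.states.size()) return seen;
    std::vector<std::size_t> stack{a.initial};
    seen[a.initial] = true;
    while (!stack.empty()) {
        const std::size_t s = stack.back();
        stack.pop_back();
        for (const Arc& arc : a.states[s].arcs) {
            assert(arc.target < a.states.size());
            if (!seen[arc.target]) {
                seen[arc.target] = true;
                stack.push_back(arc.target);
            }
        }
    }
    return seen;
}

// Backward search over reversed arcs: states from which a final state is reachable (coaccessible).
std::vector<bool> coaccessible(const Automaton& a) {
    const std::size_t n = a.states.size();
    std::vector<std::vector<std::size_t>> predecessors(n);
    for (std::size_t s = 0; s < n; ++s)
        for (const Arc& arc : a.states[s].arcs) {
            assert(arc.target < n);
            predecessors[arc.target].push_back(s);
        }
    std::vector<bool> seen(n, false);
    std::vector<std::size_t> stack;
    for (std::size_t s = 0; s < n; ++s)
        if (a.states[s].isFinal) {
            seen[s] = true;
            stack.push_back(s);
        }
    while (!stack.empty()) {
        const std::size_t s = stack.back();
        stack.pop_back();
        for (std::size_t p : predecessors[s])
            if (!seen[p]) {
                seen[p] = true;
                stack.push_back(p);
            }
    }
    return seen;
}

// A state survives trimming only if it is both accessible and coaccessible.
std::vector<bool> accessibleAndCoaccessible(const Automaton& a) {
    std::vector<bool> acc = accessible(a);
    const std::vector<bool> coacc = coaccessible(a);
    for (std::size_t s = 0; s < acc.size(); ++s) acc[s] = acc[s] && coacc[s];
    return acc;
}

// Analogue of the getState() filter: nullptr for pruned states, otherwise a copy
// that keeps only the arcs whose target also survives.
std::unique_ptr<State> trimmedState(const Automaton& a, const std::vector<bool>& keep, std::size_t s) {
    if (s >= keep.size() || !keep[s]) return nullptr;
    auto copy = std::make_unique<State>();
    copy->isFinal = a.states[s].isFinal;
    for (const Arc& arc : a.states[s].arcs)
        if (keep[arc.target]) copy->arcs.push_back(arc);
    return copy;
}
```

For a chain 0 → 1 → 2 with state 2 final, every state survives. If the arc 1 → 2 is missing (as if the second word had been dropped), state 0 is accessible but not coaccessible and state 2 is coaccessible but not accessible, so `trimmedState(a, keep, a.initial)` returns `nullptr`. Whether RASR's crash follows exactly this path would still need to be confirmed in a debugger.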
It seems that the problem is not universal but segment-dependent. With two segments (orths "um-hum" and "uh-huh"), the training runs, but for others it crashes (examples I saw: "that's right" and "that is great").
Judging by the .dot files that @vieting generated, the
This is how the
The transcription of the segment is "so the", but the "the" seems to be lost here.
Ooh, this might be caused by issue #50, for which I have a fix in my RASR versions, but it hasn't been merged into master yet!
But we should also make sure it doesn't crash in that case. At the very least it should raise a C++ exception, or use our
Yes, I agree. We could simply add another check of the form
as is already done for other intermediate automata in the GraphBuilder.
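The exact check proposed is not reproduced here. As a purely hypothetical illustration of the idea (fail loudly with a C++ exception instead of segfaulting), a guard on a trimmed intermediate automaton could look like the following. `TrimmedAutomaton`, `kInvalidState` and `requireNonEmpty` are made-up names, not part of RASR's GraphBuilder.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical summary of an intermediate automaton after trimming (not a RASR type).
struct TrimmedAutomaton {
    static constexpr std::size_t kInvalidState = static_cast<std::size_t>(-1);
    std::size_t initial = kInvalidState;  // stays kInvalidState if every state was pruned
    std::size_t numStates = 0;
};

// Hypothetical guard: throw a descriptive C++ exception instead of letting a later
// step dereference a missing initial state and segfault.
void requireNonEmpty(const TrimmedAutomaton& fsa, const std::string& name) {
    if (fsa.initial == TrimmedAutomaton::kInvalidState || fsa.numStates == 0) {
        throw std::runtime_error("automaton '" + name +
                                 "' is empty after trimming (no path from the initial to a final state)");
    }
}
```

Throwing leaves the caller free to abort with a clear message or to skip the offending segment, rather than the process dying with a signal.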
The RASR version I used with TF 2.8 came from @SimBe195, so it already included the fix.
I guess we should leave this issue open until #50 is merged?
Due to an issue I have with a training that uses FastBaumWelchLoss, which might be related to TensorFlow, I wanted to try running the same setup with a newer TensorFlow version. I tried running it with Bene's image and the RASR from #64, but I get a segmentation fault. From the log, I'm not sure what is going wrong. I see
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
but this seems to be normal and also appears in the logs of other, working examples. I also see these warnings multiple times, but I'm not sure whether they are critical.
Can anyone help find out what is causing the segmentation fault?