Export recording of prompted performance as subtitles or closed captions #22
Comments
I really like this suggestion! If anyone else wants to export subtitles, please leave your comments and suggestions here.
Implementation-wise, the greatest challenge is keeping timestamps in sync. It is not possible to precisely estimate timestamps when the user uses a small font with a relatively wide prompter that allows many words per line. This doesn't seem to be standard practice in studios anywhere, but people who record from webcams find it very convenient. I find that a speech-to-text (STT) conversion would be necessary to achieve precisely estimated timestamps. The only advantage of using a teleprompter program to create subtitles would be that you could use the original text for the subtitles instead of the text produced by the STT conversion. Wouldn't it be easier if you could give the edited video and its teleprompter file(s) to an intermediary or video editor program that uses STT to match the voice with the text and generate a preliminary SRT file? The SRT could then be refined using an editor like Subtitle Edit or Aegisub before upload.
I do not expect precise synchronisation. Speech-to-text conversion is nice for informal language. On Dutch television, when it is used (for instance at a live press conference), we first get a long warning that the subtitles are not correct. But for professional videos you cannot use it, and speech-to-text conversion in the Dutch language is even worse. (Laughably wrong; it sometimes does not recognise the Dutch word for "not", so you get the opposite meaning.)

At the moment we use Google. It gives 100% precise synchronisation, and is far better than the synchronisation we expect from the subtitle export. However, it often breaks a sentence in the middle, and it does not allow for multi-line subtitles. But 100% synchronisation is not what we want. People are not robots. Google will never get it right, because you want the subtitle to present a full sentence. So you show the full sentence in the subtitle 0.2 seconds before the sentence is started, including the words that have not yet been spoken, and you keep it on screen for 0.5 seconds after the sentence has ended. (There are two kinds of people: people who read faster and people who read slower; you must accommodate them both.) So we must edit it anyhow, and check very precisely... and that is a lot of work, because we need to recombine the subtitles. With an exported file, we would only have to move it a little, and with typically 100 subtitles per file, that is not much work.

What I forgot in my requirement is the ability to distinguish the start of a new line in the script (a hard new line) from a new line caused by the subtitle length being exceeded (a soft new line). In the subtitle, this has the effect of a new subtitle, or a new line within a subtitle.
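The padding rule described above (show the cue 0.2 s before speech starts, hold it 0.5 s after it ends) is mechanical enough to sketch in code. Below is a minimal illustration in Python using only the standard library; the function names and the SRT-style `HH:MM:SS,mmm` timestamps are my own assumptions, not anything QPrompt implements:

```python
from datetime import timedelta

def parse_srt_time(ts: str) -> timedelta:
    """Parse an SRT timestamp like '00:01:02,500' into a timedelta."""
    hms, millis = ts.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=int(millis))

def format_srt_time(t: timedelta) -> str:
    """Format a timedelta back into SRT's 'HH:MM:SS,mmm' notation."""
    total_ms = int(t.total_seconds() * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"

def pad_cue(start: str, end: str,
            lead: float = 0.2, tail: float = 0.5) -> tuple[str, str]:
    """Show the cue slightly before speech starts, hold it slightly after."""
    new_start = max(parse_srt_time(start) - timedelta(seconds=lead),
                    timedelta(0))  # clamp so we never go before 00:00:00,000
    new_end = parse_srt_time(end) + timedelta(seconds=tail)
    return format_srt_time(new_start), format_srt_time(new_end)
```

For example, a cue spoken from `00:00:10,000` to `00:00:12,000` would be displayed from `00:00:09,800` to `00:00:12,500`.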
The problem with Google's model is that it can't accurately determine the start or end of a sentence, so instead it compromises on punctuation marks and the subtitles end up becoming a stream of words; at least that's my understanding of what is done for videos on YouTube. Having the original text as a reference is a great advantage here. Nevertheless, QPrompt's ability to time the start and end of sentences is bounded by the imprecision of how many words fit in a line, and how fast that line is scrolled past. This is why something like STT is needed for better synchronization. Having useful subtitles exported solely from settings would be a frustrating experience because of the amount of trial and error users would have to do to match the recording.

Precise text could be achieved by copying the teleprompter text to the subtitle file after fuzzily matching it against an STT conversion. The fuzzy match would be done using characters or phonemes instead of words for better accuracy (using a dictionary against segmented words, this could later be made to work with logographic writing systems as well). The program would then use punctuation marks from the original text to determine the start and end positions of individual subtitles, and match those positions with the more precise time codes from the STT conversion. Add or subtract the 0.2 and 0.5 second paddings as applicable and you'd get very readable subtitles. The text would match what was used for the teleprompter, and synchronization would be usable and much closer to Google's.

The big question is whether this should be a separate program or a feature in QPrompt. I see this more as a separate program that QPrompt can communicate with. It would be used standalone to match text with edited recordings, or it could be connected to QPrompt to be used live. QPrompt would then indicate which lines to use as reference and update any changes to the script using a protocol like MOS. The program would then perform a live conversion, word by word, to generate closed captions from the teleprompter text. You'd get full-sentence subtitles for edited recordings and precise closed captions for live streams.
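To make the fuzzy-matching idea concrete, here is one possible sketch in Python using the standard library's `difflib` for character-level matching. Everything here is illustrative: the function name, the `(word, start, end)` tuple shape for STT output, and the greedy left-to-right alignment are my assumptions, not a description of any real STT engine's API:

```python
import difflib

def align_script_to_stt(script_sentences, stt_words):
    """
    Fuzzy-match each script sentence against the STT word stream and
    return (original_sentence, start_seconds, end_seconds) triples,
    keeping the script's text but the STT's timestamps.

    stt_words: list of (word, start_seconds, end_seconds) tuples.
    """
    # Join the STT words into one string, remembering which word each
    # character came from so matches can be mapped back to timestamps.
    stt_text = ""
    char_to_word = []
    for index, (word, _, _) in enumerate(stt_words):
        char_to_word.extend([index] * (len(word) + 1))
        stt_text += word + " "

    cues = []
    cursor = 0  # scan left to right so sentences stay in script order
    for sentence in script_sentences:
        matcher = difflib.SequenceMatcher(None, sentence.lower(),
                                          stt_text[cursor:].lower())
        match = matcher.find_longest_match(0, len(sentence),
                                           0, len(stt_text) - cursor)
        if match.size == 0:
            continue  # sentence not found in the STT output; skip it
        first = char_to_word[cursor + match.b]
        last = char_to_word[min(cursor + match.b + match.size - 1,
                                len(char_to_word) - 1)]
        cues.append((sentence, stt_words[first][1], stt_words[last][2]))
        cursor += match.b + match.size
    return cues
```

A real implementation would match on phonemes rather than raw characters, as suggested above, and would need to tolerate skipped or ad-libbed passages; this only shows the shape of the idea.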
Thank you. Editing subtitles is a lot of work, and it is easy to make mistakes.
A better solution than generating subtitles from the teleprompter is to use machine learning models such as Whisper to generate them in post. The issues you've mentioned are not very likely to happen with a well-trained model, but they are still difficult, and in some cases impossible, to avoid if implementing this through the teleprompter. For this reason I'm closing this issue as won't implement. The very similar #271 is still in the plans though, as that takes care of a slightly different need: creating captions for live productions.
Is your feature request related to a problem or a limitation? Please describe...
Export to a subtitle file.
This is a file that can be uploaded to YouTube or burned into a video. It shows text at the right moment.
In Europe, especially in small countries, nearly all professional videos are subtitled. It is easier for older people and people with hearing problems, and it has the added advantage that you can watch videos in the office without annoying your colleagues. For business, government, and professional use, it is considered rather impolite to publish a video that is not subtitled.
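For readers unfamiliar with the format: an SRT file is plain text, consisting of a running cue number, a start and end timestamp pair, and one or two lines of text per cue, separated by blank lines. The timings and wording below are made up purely for illustration:

```
1
00:00:01,000 --> 00:00:04,200
First line of the subtitle
Second line of the same cue

2
00:00:04,700 --> 00:00:08,000
The next sentence appears here.
```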
Describe the solution you'd prefer
First I want to describe the workflow.
Our workflow, rather standard in the industry for the production of professional (sales, demonstration, instruction, education) videos, is as follows.
QPrompt is ideal because we can include not only text but also directives in another font or color.
The problem is in step 4, because we do not have timing data. But QPrompt has that data somewhere.
SRT is a standard file extension; the open-source standard editor for it is Subtitle Edit.
Describe alternatives you've considered
We now export to txt, use Subtitle Edit to convert it to SRT, and upload it together with the teleprompter video to Google, which sets the timestamps. This does not always go well.
And after that, we download the subtitle file and continue with step 5.
...
Provide use examples