Handling timeouts #93

Open
krystof-k opened this issue Feb 1, 2024 · 7 comments

Comments

@krystof-k

krystof-k commented Feb 1, 2024

Hey there, I need to implement a timeout for a long-running Tesseract command.

I came up with two options for how to do it:

  1. Add a timeout option to RTesseract.new and reimplement Command#run using Open3.popen3 instead of Open3.capture3, catching the timeout there (if set).
  2. Add an async option to RTesseract.new and implement run_async and results methods, also using Open3.popen3, which would return the PID so that the timeout (killing the process) can be handled in the client code.

What do you think? Should I try to open a PR? Thanks!
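
A minimal sketch of what option 1 could look like, independent of the gem's internals. The names `run_with_timeout` and `CommandTimeout` are hypothetical and not part of RTesseract; the sketch only shows the Open3.popen3 pattern of waiting on the child with a deadline and killing it when the deadline passes. It sends TERM only, so a child that ignores TERM would still need an escalation to KILL.

```ruby
require "open3"

# Hypothetical error class for this sketch; not part of RTesseract.
class CommandTimeout < StandardError; end

# Runs argv as a child process and returns [stdout, stderr, status].
# If `timeout` (seconds) is given and elapses first, the child is sent TERM
# and CommandTimeout is raised. A nil timeout waits indefinitely.
def run_with_timeout(*argv, timeout: nil)
  Open3.popen3(*argv) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    # Drain both pipes in threads so the child cannot block on a full pipe.
    readers = [stdout, stderr].map { |io| Thread.new { io.read } }

    # Thread#join returns nil when the limit elapses before the child exits.
    unless wait_thr.join(timeout)
      begin
        Process.kill("TERM", wait_thr.pid)
      rescue Errno::ESRCH
        # The child exited between the join and the kill; nothing to do.
      end
      wait_thr.join          # reap the child
      readers.each(&:kill)   # stop the readers before the pipes are closed
      raise CommandTimeout, "#{argv.first} exceeded #{timeout}s"
    end

    out, err = readers.map(&:value)
    [out, err, wait_thr.value]
  end
end
```

Because `wait_thr.pid` is available as soon as the process starts, the same pattern could back option 2: return the PID and the wait thread to the caller and leave the deadline handling to client code.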

@krystof-k
Author

Just posting a workaround until this moves forward, in case anyone finds it useful.

Simply create a shell script wrapper around the tesseract command:

#!/usr/bin/env sh

timeout 10s tesseract "$@"

And then use it when calling RTesseract:

RTesseract.new("image.jpeg", command: "./tesseract_with_timeout.sh").to_s

Unfortunately, you cannot reliably tell whether it timed out or crashed; the best you can do is inspect the error message:

begin
  RTesseract.new("image.jpeg", command: "./tesseract_with_timeout.sh").to_s
rescue RTesseract::Error => e
  raise e unless e.message.include?("Terminated")
  raise "Tesseract probably timed out"
end

So it would still be much better to handle it directly in the gem, as proposed above.
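
For comparison, when the command is invoked directly rather than through the gem, the exit status gives a more dependable signal than the error text: GNU coreutils `timeout` exits with 124 when it had to stop the command (with the default signal and without `--preserve-status`). The sketch below assumes GNU `timeout` is on the PATH; `classify_status` and `run_limited` are hypothetical helper names, and whether RTesseract::Error exposes the exit status is not something this thread confirms.

```ruby
require "open3"

# Exit status GNU coreutils `timeout` uses when the time limit was hit.
TIMEOUT_EXIT_STATUS = 124

# Maps a Process::Status to :ok, :timed_out, or :failed.
def classify_status(status)
  return :ok if status.success?
  return :timed_out if status.exitstatus == TIMEOUT_EXIT_STATUS

  :failed
end

# Hypothetical direct invocation that bypasses RTesseract:
# runs `timeout <seconds> <argv...>` and classifies the result.
def run_limited(seconds, *argv)
  stdout, stderr, status = Open3.capture3("timeout", seconds.to_s, *argv)
  [classify_status(status), stdout, stderr]
end
```

A call such as `run_limited(10, "tesseract", "image.jpeg", "stdout")` would then return `:timed_out` as its first element when the limit is hit, and `:failed` for any other non-zero exit.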

@danielfriis

Hey @krystof-k. Why do you need a timeout? The reason I'm asking is that I'm running an RTesseract command in a job, and the job eventually stalls while using over 100% CPU. I was wondering if you were experiencing something similar?

@krystof-k
Author

Hey @danielfriis, well, maybe yes. I needed to avoid long-running jobs because of limited resources, since they would block execution of the next jobs in the queue. I haven't dug deep enough yet to tell whether it just took a long time or hung completely.

@danielfriis

@krystof-k Sounds like the same issue here. I'll let you know if I learn more.

@danielfriis

@krystof-k When I use your timeout script, I get this error: tesseract_with_timeout.sh: line 3: timeout: command not found. Do you know why?

@krystof-k
Author

What system are you running it on? On macOS there is no built-in timeout command, so you would need to install coreutils (brew install coreutils).
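
If the same code has to run on machines with and without GNU coreutils, one option is to look the command up before using it. This is only a sketch: `find_timeout_command` is a hypothetical helper, and the `gtimeout` candidate reflects the g-prefixed name under which Homebrew's coreutils exposes the GNU tools.

```ruby
# Returns the first candidate command name found as an executable file
# on the PATH, or nil if none of them is installed.
def find_timeout_command(candidates = %w[timeout gtimeout])
  dirs = ENV.fetch("PATH", "").split(File::PATH_SEPARATOR)
  candidates.find do |name|
    dirs.any? do |dir|
      path = File.join(dir, name)
      File.file?(path) && File.executable?(path)
    end
  end
end
```

A nil result means no timeout command is available, so the caller can fail with a clear message instead of the "command not found" error from the shell wrapper.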

@danielfriis

FYI. I found that the timeout script caused tesseract to close prematurely when I was looping through about 30 pages.
