Task displays as 'Running' without any progress #5895

Open
AenBleidd opened this issue Nov 12, 2024 · 6 comments
Comments

@AenBleidd
Member

Discussed in #5894

Originally posted by homersimpsons November 11, 2024
Describe the bug
Sometimes a task displays as "Running" but makes no progress (no elapsed time counter, no percentage, no process in Task Manager).

Steps To Reproduce

  1. Participate in GPUGRID
  2. Wait for an ATMML task
  3. After the task has been running for some time, pause it
  4. Wait a bit and resume it
  5. At this point the task should be stuck (but this does not always happen)

Suspending and resuming it again does not change anything. I have to stop the BOINC Manager (and the daemon) and restart them; the task then starts right away.

Expected behavior
The task should start correctly

Screenshots
A screenshot of this task row. It was stuck like this for 30 minutes while other tasks were processing fine.
Image

System Information

  • OS: Windows 11
  • BOINC Version: 8.0.2

Additional context
I reported this on GPUGRID a month ago (https://www.gpugrid.net/forum_thread.php?id=5487).
Maybe GPU priority applications have something to do with the issue.

I tried to run a simulation with https://boinc.berkeley.edu/sim_web.php?action=show_scenario&name=212 but when I submit it I get an error page saying:

command failed (139): ./sim --duration 600 --delta 60 --rec_half_life 864000 --infile_prefix scenarios/212/ --outfile_prefix scenarios/212/simulations/3/

Maybe I should also open an issue for this. The simulation seems to be created, but it is empty.

@homersimpsons

homersimpsons commented Nov 13, 2024

I hit this again today. The order of events, as I remember it, was:

  1. Get a new task
  2. Suspend it
  3. Run a GPU priority application
  4. Close the GPU priority application (after ~2h)
  5. Resume the task
  6. BUG: the task shows "Running", but the elapsed time remains blank and no process starts for 2 minutes
  7. WORKAROUND: Close and restart the BOINC Manager
  8. The task starts directly

NOTE: To me this is a P: Minor, since the workaround is rather easy. The "biggest" risk is someone not noticing and losing time up to the deadline.

@davidpanderson
Contributor

I don't understand the above.
What does 'GPU priority application' mean in 3) and 4),
and what do 'Run' and 'Close' mean?

Does the task in 1) need to be a GPU app?

In 6), are you talking about the same task as in 1)?

@homersimpsons

> What does 'GPU priority application' mean in 3) and 4),

I have the UI in French (I do not know how to switch it to English). It is this entry:
Image

> and what do 'Run' and 'Close' mean?

"Run" means starting the application defined in the above settings (in my case a game), which of course stops any GPU computation. "Close" means exiting that application. At that point the GPU computations should start again, but they do not.

> Does the task in 1) need to be a GPU app?

I think so, but it may also be reproducible with a CPU application when a "CPU priority application" is defined. I am not 100% sure about the reproduction: I know these are the steps I take, but there may be other steps leading to the same result.

> In 6), are you talking about the same task as in 1)?

Yes, the same BOINC task. In my case it is most often an "ATMML" (GPUGRID) one, but I just reproduced the issue tonight with an "ACEMD 3" (GPUGRID) task.

I also run Einstein@Home on the GPU, but I set its resource share to 0 so that any other available GPU task runs instead.

@CharlieFenton
Contributor

I think he is saying that he has set a game in the Exclusive Applications dialog to suspend the GPU when the game is running, but BOINC does not resume a GPU task when the game is exited.

@AenBleidd
Member Author

I still think this is an issue with the project's application, which doesn't resume after being suspended and only starts working again after a complete restart of the application, which is basically what happens when the BOINC client is restarted.

@homersimpsons

> I think he is saying that he has set a game in the Exclusive Applications dialog to suspend the GPU when the game is running, but BOINC does not resume a GPU task when the game is exited.

Yes, although I usually suspend and resume the task manually. But maybe the real cause is just that the task does not restart correctly after an exclusive application exits.

> I still think this is an issue with the project's application, which doesn't resume after being suspended and only starts working again after a complete restart of the application, which is basically what happens when the BOINC client is restarted.

Maybe. I do not have any technical details. Is there any log I could provide that would help here? For the record, the GPUGRID applications do not checkpoint, so they just restart their computation from the beginning if they have been suspended.

@AenBleidd removed this from Planning on Nov 19, 2024
Projects
Status: Backlog