Jobs may get stuck in Enqueued state after app crash/restart #49

Open
f1nzer opened this issue Jun 22, 2022 · 3 comments

@f1nzer

f1nzer commented Jun 22, 2022

In an unstable environment, where an application may crash or restart because of some external issue, some jobs can hang and never be moved to the Processing state.

In my case, 6 jobs are in the Enqueued state, but I can't see them in the dashboard (only the count is displayed).

image

It looks like an item of type DocumentTypes.Queue was fetched by the JobQueue class and then the application crashed, or something similar happened.
Here is the data from Cosmos DB related to one of the affected jobs:

SELECT * FROM doc WHERE doc.job_id = 'e713eaed-5529-4dac-bcda-6452879ed1eb' or doc.id = 'e713eaed-5529-4dac-bcda-6452879ed1eb'
[
    {
        "data": {
            "type": "#, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null",
            "method": "RunAsync",
            "parameterTypes": "[\"#, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null\"]",
            "arguments": "#"
        },
        "arguments": "#",
        "parameters": [
            {
                "name": "CurrentCulture",
                "value": "\"\""
            },
            {
                "name": "CurrentUICulture",
                "value": "\"\""
            }
        ],
        "created_on": 1655817700,
        "type": 2,
        "id": "e713eaed-5529-4dac-bcda-6452879ed1eb",
        "_rid": "qcwkAMDjJzg14gMAAAAAAA==",
        "_self": "dbs/qcwkAA==/colls/qcwkAMDjJzg=/docs/qcwkAMDjJzg14gMAAAAAAA==/",
        "_etag": "\"fa003a2b-0000-0d00-0000-62b1c5e40000\"",
        "_attachments": "attachments/",
        "state_id": "75103e38-6bd9-42a2-b7ae-254a121728b5",
        "state_name": "Enqueued",
        "_ts": 1655817700
    },
    {
        "job_id": "e713eaed-5529-4dac-bcda-6452879ed1eb",
        "name": "Enqueued",
        "created_on": 1655817700,
        "data": {
            "EnqueuedAt": "2022-06-21T13:21:40.1279900Z",
            "Queue": "default"
        },
        "type": 8,
        "id": "75103e38-6bd9-42a2-b7ae-254a121728b5",
        "_rid": "qcwkAMDjJzg44gMAAAAAAA==",
        "_self": "dbs/qcwkAA==/colls/qcwkAMDjJzg=/docs/qcwkAMDjJzg44gMAAAAAAA==/",
        "_etag": "\"fa00292b-0000-0d00-0000-62b1c5e40000\"",
        "_attachments": "attachments/",
        "_ts": 1655817700
    }
]
@imranmomin
Owner

When a job is dequeued, the fetched_at field of its queue document is updated to the current UTC time. The document is removed only when the job completes and the RemoveFromQueue method is invoked.

My guess is that after the job completed, it most likely failed while the other housekeeping tasks were being called, i.e. updating state, counters, and so on.

If you can provide logs, we can look into it further.

@f1nzer
Author

f1nzer commented Jun 22, 2022

Unfortunately, there are no Hangfire-related warnings or errors in the logs.

I have enabled additional logging to catch such problems in future.

@f1nzer
Author

f1nzer commented Jun 23, 2022

> My guess is that after the job completed, it most likely failed while the other housekeeping tasks were being called, i.e. updating state, counters, and so on.

Most likely the job was stored (along with its state), but the Queue entity was never created. This is probably because the enqueue was scheduled as a command in CosmosDbWriteOnlyTransaction, and the app crashed before that command was executed (committed).

public override void AddToQueue(string queue, string jobId)
{
    if (string.IsNullOrEmpty(queue)) throw new ArgumentNullException(nameof(queue));
    if (string.IsNullOrEmpty(jobId)) throw new ArgumentNullException(nameof(jobId));

    QueueCommand(() =>
    {
        IPersistentJobQueueProvider provider = connection.QueueProviders.GetProvider(queue);
        IPersistentJobQueue persistentQueue = provider.GetJobQueue();
        persistentQueue.Enqueue(queue, jobId);
    });
}
#endregion
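
To make the suspected failure mode concrete, here is a minimal in-memory sketch. It is not the library's code: SketchTransaction, CrashDemo, and the stopAfter parameter are hypothetical names, and it assumes (unconfirmed) that commit runs the deferred commands one by one with no atomicity across them. Under that assumption, a crash between two commands leaves the state written and the queue entry missing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a write-only transaction; NOT the library's class.
public sealed class SketchTransaction
{
    private readonly List<Action> commands = new List<Action>();

    // Defers the action; nothing is written until Commit runs.
    public void QueueCommand(Action command) => commands.Add(command);

    // Assumption: commands run sequentially, with no atomicity across them.
    // stopAfter simulates the process dying after that many commands.
    public void Commit(int stopAfter = int.MaxValue)
    {
        int executed = 0;
        foreach (Action command in commands)
        {
            if (executed == stopAfter) return; // simulated crash
            command();
            executed++;
        }
    }
}

public static class CrashDemo
{
    // Returns the names of the "documents" written before the simulated crash.
    public static List<string> Run(int stopAfter)
    {
        var written = new List<string>();
        var tx = new SketchTransaction();
        tx.QueueCommand(() => written.Add("state:Enqueued"));
        tx.QueueCommand(() => written.Add("queue:default"));
        tx.Commit(stopAfter);
        return written;
    }
}
```

Calling CrashDemo.Run(1) leaves only the state entry behind, which matches the two documents shown earlier: a job with an Enqueued state and no queue document.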

I think the only thing I can do here (at least in my unstable environment) is to check for those "hung" jobs on app startup and manually create Queue entities for them, but the job documents contain no queue name to do that with.
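
One possible shape for such a startup check, as a sketch only. In the query result above, the job document has no queue name, but the Enqueued state document does carry one in data.Queue ("default"), so the queue could be recovered through the job's state_id. The record types and HungJobFinder below are hypothetical in-memory stand-ins, not the library's types, and loading the documents from Cosmos DB is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory shapes mirroring fields seen in the documents above.
public sealed record JobDoc(string Id, string StateId, string StateName);
public sealed record StateDoc(string Id, string JobId, string Name,
                              IReadOnlyDictionary<string, string> Data);
public sealed record QueueDoc(string JobId, string Queue);

public static class HungJobFinder
{
    // Returns (jobId, queue) pairs for jobs whose state is Enqueued
    // but for which no queue document exists.
    public static List<(string JobId, string Queue)> Find(
        IEnumerable<JobDoc> jobs,
        IEnumerable<StateDoc> states,
        IEnumerable<QueueDoc> queueItems)
    {
        var queuedJobIds = new HashSet<string>(queueItems.Select(q => q.JobId));
        var statesById = states.ToDictionary(s => s.Id);
        var result = new List<(string JobId, string Queue)>();

        foreach (JobDoc job in jobs)
        {
            if (job.StateName != "Enqueued" || queuedJobIds.Contains(job.Id))
                continue;

            // The queue name is read from the state document's data, not the job.
            if (statesById.TryGetValue(job.StateId, out var state) &&
                state.Data.TryGetValue("Queue", out var queue))
            {
                result.Add((job.Id, queue));
            }
        }
        return result;
    }
}
```

Each returned pair could then be passed to the queue's Enqueue(queue, jobId) call seen in the snippet above. A check like this would need to run before any workers start (or ignore recently created jobs), otherwise a job that is about to get its queue document could be enqueued twice.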
