Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spans not exported for requests without IO operations in Cloudflare Workers #179

Open
gbmarc1 opened this issue Nov 12, 2024 · 3 comments · May be fixed by #182
Open

Spans not exported for requests without IO operations in Cloudflare Workers #179

gbmarc1 opened this issue Nov 12, 2024 · 3 comments · May be fixed by #182

Comments

@gbmarc1
Copy link

gbmarc1 commented Nov 12, 2024

Description

When using the otel-cf-workers library to instrument a Cloudflare Worker, spans are not being exported for requests that don't involve any IO operations outside of the worker.

Context

My worker is instrumented using this library. It works like a dream when we have IOs outside of the worker in the request. However, when there is no IO, the spans are not exported.

I did add a dummy KV cache GET/PUT to test that theory and it works (most of the time).
I noticed that when I could not see spans with 401s. A 401 does not make any call outside of the worker; but, successful requests which involve a network call had spans associated to them.

I may be wrong on the cause of this.

Current Behavior

  • Spans are successfully exported when there are IO operations outside of the worker in the request.
  • Spans are not exported when there are no IO operations.
  • Adding a dummy KV cache GET/PUT operation seems to resolve the issue most of the time.
  • Spans are not visible for 401 errors, which typically don't involve external calls.

Expected Behavior

Spans should be exported consistently for all requests, regardless of whether they involve IO operations or not.

Possible Cause ( I may strongly be wrong about this)

The issue may be related to how the worker's clock is updated (See details here). The scheduler.wait function in the library might be waiting indefinitely when there's no IO operation, as the clock is not being updated.

@gbmarc1
Copy link
Author

gbmarc1 commented Nov 12, 2024

Doing this very ugly patch seems to do the trick

const app = {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const startTime = performance.now();
     // Rest of code (Calling or not IOs)
    const endTime = performance.now();

    // This means that there hasn't been any IO, so we need to make a dummy call
    if (endTime - startTime === 0) {
      ctx.waitUntil(
        Promise.resolve().then(async () => {
          await incrementTime(env, 10);
        }),
      );
    }
  }
};

async function incrementTime(env: Env, delay: number): Promise<void> {
  const startTime = performance.now();
  const id = crypto.randomUUID();
  const maxTries = 50;
  let tries = 0;
  while (performance.now() - startTime < delay && tries < maxTries) {
    tries++;
    try {
      await env.CACHE.put(`dummy-call-to-trigger-spans ${id}`, 'dummy-value');
      await env.CACHE.delete(`dummy-call-to-trigger-spans ${id}`);
    } catch {
      // noop
    }
  }
}

@evanderkoogh
Copy link
Owner

That is a very ugly trick.. And absolutely a bug.
I am currently working on a major internals refactor to better handle this kind of flow and will make sure that this case is tested :)

@evanderkoogh
Copy link
Owner

Hey @gbmarc1, Apologies for the delay, but I just wanted to let you know that this is fixed in the core-logic-refactor branch.. (https://github.com/evanderkoogh/otel-cf-workers/tree/core-logic-refactor).

Please do not try this as everything but HTTP is broken in there, but here is what it looks like in Honeycomb..
Uploading Screenshot 2024-11-25 at 3.49.51 PM.png…

@evanderkoogh evanderkoogh linked a pull request Dec 2, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants