Is context-async-hooks suitable for production use? #2812

boutell · 2022-03-02T19:00:17Z

This only affects the JavaScript OpenTelemetry library
This may affect other libraries, but I would like to get opinions here first

Hello,

I'm wondering if context-async-hooks is considered suitable for production use. I see that it enables the automatic instrumentation of traces for mongodb and other libraries that don't receive the express "req" directly, which is absolutely amazing, but async_hooks is still marked "1 - experimental" in the node.js documentation (after several years, yes).

Thanks!

boutell · 2022-03-02T20:37:31Z

To be a little more helpful, I got this response on a related question posed on the nodejs repo:

nodejs/diagnostics#194 (comment)

It looks like AsyncLocalStorage is actually mostly declared stable, along with AsyncResource, but Async_Hooks itself is not and there's discussion of long-term removal, maybe, although that can't really be done without breaking at least the way AsyncLocalStorage gets imported.

Does AsyncResource provide everything that OpenTelemetry's family of tracers actually needs? I think it might, but I don't entirely understand what's depending on what.

Flarna · 2022-03-02T21:59:30Z

AsyncLocalStoreage is built on top of async hooks. From Otel point of view one works as good as the other.

The resons why the one is stable the the other is experimental is that async hooks API "leaks" node internal details. This results in an API surface which may change because of internal changes - so it would break the semver contract if moved to stable.

Also node.js domain module uses async hooks internally.

In short async_hooks are the base for stable Node.js APIs so they can't be removed completly. But they could be made internal to node.js.

AsyncResource is a different story and for a different use case. Otel uses AsyncLocalStore to propage transactional context. User libraries which do stuff that breaks this as of now (e.g. resource reuse/pooling,..) should use AsyncResource to avoid this breakage.

I think every APM tool for node.js - commercial or free - use AsyncHooks and/or AsyncLocalStorage since years. As millions applications in production use such tools the question regarding production readyness is most likely answered :o).

But to be clear: This "magic" functionality doesn't come for free. Using Otel and/or using AsyncHooks comes with CPU/Memory overhead (exact value depends on your app so please don't ask for numbers). Tracing is not for free.

Otel itself can work without AsyncLocalStorage/AsyncHooks. It's the users responsiblitly to pass the current span around in their app. Clearly non of the autoinstrumentations will be useable then but they are not mandatory.

boutell · 2022-03-02T22:44:56Z

Thanks! I appreciate that you don't have numbers on the overhead. I would be curious if you would consider it a major impact in the context of a web application. I assume if it was really ugly it would somewhat defeat the purpose of OpenTelemetry as it would mostly be measuring its own impact.

…

On Wed, Mar 2, 2022 at 4:59 PM Gerhard Stöbich ***@***.***> wrote: AsyncLocalStoreage is built on top of async hooks. From Otel point of view one works as good as the other. The resons why the one is stable the the other is experimental is that async hooks API "leaks" node internal details. This results in an API surface which may change because of internal changes - so it would break the semver contract if moved to stable. Also node.js domain module uses async hooks internally. In short async_hooks are the base for stable Node.js APIs so they can't be removed completly. But they could be made internal to node.js. AsyncResource is a different story and for a different use case. Otel uses AsyncLocalStore to propage transactional context. User libraries which do stuff that breaks this as of now (e.g. resource reuse/pooling,..) should use AsyncResource to avoid this breakage. I think every APM tool for node.js - commercial or free - use AsyncHooks and/or AsyncLocalStorage since years. As millions applications in production use such tools the question regarding production readyness is most likely answered :o). But to be clear: This "magic" functionality doesn't come for free. Using Otel and/or using AsyncHooks comes with CPU/Memory overhead (exact value depends on your app so please don't ask for numbers). Tracing is not for free. Otel itself can work without AsyncLocalStorage/AsyncHooks. It's the users responsiblitly to pass the current span around in their app. Clearly non of the autoinstrumentations will be useable then but they are not mandatory. — Reply to this email directly, view it on GitHub <#2812 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAH27OQ6UEPAAZ6WQUGHDDU57QE5ANCNFSM5PYKLRFA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>. You are receiving this because you authored the thread.Message ID: ***@***.***>

-- THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

Qard · 2022-03-03T07:45:53Z

In a promise-heavy application, it's not uncommon to see 20-30% cpu overhead when turning on async_hooks or AsyncLocalStorage. Possibly more in extreme cases. It's a lot less than the 200%+ it used to be, but still noticeable.

boutell · 2022-03-03T12:51:34Z

Ah, I see. Very important to know. I will definitely avoid bringing it into production code situations in non-optional ways then. Thanks for the advice. When just using the ***@***.***/api` module for Open Telemetry compatibility that's not an issue, correct? It's only an issue if the SDK is actually in play?

…

On Thu, Mar 3, 2022 at 2:46 AM Stephen Belanger ***@***.***> wrote: In a promise-heavy application, it's not uncommon to see 20-30% cpu overhead when turning on async_hooks or AsyncLocalStorage. Possibly more in extreme cases. It's a lot less than the 200%+ it used to be, but still noticeable. — Reply to this email directly, view it on GitHub <#2812 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAH27IQ3B6IVPSTT45JVOLU6BU35ANCNFSM5PYKLRFA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>. You are receiving this because you authored the thread.Message ID: ***@***.***>

-- THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

dyladan · 2022-03-03T14:25:28Z

Ah, I see. Very important to know. I will definitely avoid bringing it into production code situations in non-optional ways then. Thanks for the advice. When just using the @./api` module for Open Telemetry compatibility that's not an issue, correct? It's only an issue if the SDK is actually in play?
…
On Thu, Mar 3, 2022 at 2:46 AM Stephen Belanger @.> wrote: In a promise-heavy application, it's not uncommon to see 20-30% cpu overhead when turning on async_hooks or AsyncLocalStorage. Possibly more in extreme cases. It's a lot less than the 200%+ it used to be, but still noticeable. — Reply to this email directly, view it on GitHub <#2812 (comment)>, or unsubscribe <github.com/notifications/unsubscribe-auth/AAAH27IQ3B6IVPSTT45JVOLU6BU35ANCNFSM5PYKLRFA> . Triage notifications on the go with GitHub Mobile for iOS <apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>. You are receiving this because you authored the thread.Message ID: @.***>
-- THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

The API on its own does not enable a context manager and therefore does not enable async_hooks or AsyncLocalStorage. You can also manually propagate context without enabling a context manager if you never register the SDK as a global instance, but then as @Flarna mentioned you will not be able to use any auto-instrumentations.

boutell · 2022-03-03T14:51:17Z

Great! I figured but this is helpful to have confirmation on. Appreciate the info.

In ApostropheCMS, which I'm working on adding OpenTelemetry support to, we potentially could do it the other way and still get a decent percentage of the information, but the auto instrumentation is just far too useful and our clients should be fine with enabling OpenTelemetry SDK in environments where they need the data and don't mind the performance hit until they are done gathering it.

I'm closing my ticket because my question has been more than answered. Maybe this deserves a note in documentation?

gmreads · 2023-05-04T15:30:23Z

User libraries which do stuff that breaks this as of now (e.g. resource reuse/pooling,..) should use AsyncResource to avoid this breakage.

@Flarna Can you please expand a bit more on this ? I've been trying to resolve a related bug for quite some time.

Basically the span created during first request is getting attached even in subsequent requests
span = trace.getSpan(context.active()) // always gives first span

Using sails-mysql adapter with, express and http instrumentation.
I suspect this might be due to connection pooling ?
Working on building a minimal reproducible example, will file an issue after pinpointing.

Flarna · 2023-05-04T19:20:14Z

@gmreads I hacked a small reproducer which includes the creation of AsyncResource to fix the issue:

import * as http from "http";
import { trace } from "@opentelemetry/api";
import { AsyncResource } from "async_hooks";

const port = 8600;
const tracer = trace.getTracer("sender");

interface Msg {
  msg: string;
  res?: AsyncResource;
}

const msgQueue: Msg[] = [];

function sendMessage(msg: string): void {
  msgQueue.push({
    msg,
    // if this is removed all three doSendMessage are in the same trace instead
    res: new AsyncResource("my.sendMessage")
  });
  if (msgQueue.length === 1) {
    doSendMessage();
  }
}

function doSendMessage(): void {
  if (msgQueue.length === 0) return;

  const doIt = () => {
    const span = tracer.startSpan("doSendMessage");
    // simulate that this takes a while
    setTimeout(() => {
      msgQueue.shift();
      span.end();
      process.nextTick(doSendMessage);
    }, 1000);
  }

  const msg = msgQueue[0];
  if (msg.res != null) {
    msg.res.runInAsyncScope(doIt);
  } else {
    doIt();
  }
}

let cnt = 0;
http.createServer(function onRequest(req, res) {
  sendMessage(`msg ${cnt++}`);
  res.end();
}).listen(port);

http.get({ port });
http.get({ port });
http.get({ port });

Dhruv-Garg79 · 2024-10-28T12:09:46Z

The API on its own does not enable a context manager and therefore does not enable async_hooks or AsyncLocalStorage. You can also manually propagate context without enabling a context manager if you never register the SDK as a global instance

@dyladan I am passing context manually and not using auto-instrumentation, but I can still see that async_hooks are being used from tracer.js from memory dump of production service.

const tracer = databaseTraceProvider.getTracer('db');
const span = tracer.startSpan(
	'db.query',
	{
		kind: SpanKind.CLIENT,
		attributes: {
			[SEMATTRS_DB_STATEMENT]: query,
		},
	},
	trace.setSpan(context.active(), parentSpan.trace),
);

I am passing context manually like this, can you tell if there is some other better way? I don't want to incur any penalty for async hooks, I am ready to write some extra custom code.

dyladan · 2024-10-28T12:21:21Z

Most likely the context manager is being enabled when you register the SDK. It is automatically enabled when you register the tracing sdk with the global API.

Flarna · 2024-10-28T12:21:31Z

If you don't want to use a ContextManager but pass around context instances manually I see no point in using context.active().

You should either use ROOT_CONTEXT if you start something fresh or the context object already present from your caller instead of asking ContextManager.
Clearly manually passing context tends to add an additional argument to quite some functions.

Dhruv-Garg79 · 2024-10-29T06:49:17Z

can you show some example of how to do that?
I am passing context manually only as of now, but couldn't find anything apart from context.active().

Most likely the context manager is being enabled when you register the SDK. It is automatically enabled when you register the tracing sdk with the global API.

I am not registering SDK, just batchExporter

Flarna · 2024-10-29T08:47:59Z

I think something like this:

const myRequestTracer = api.trace.getTracer("request-instrumentation");
const myWorkTracer = api.trace.getTracer("work-instrumentation");
function myRequestHandler(req: any) {
	const inCtx = api.propagation.extract(api.ROOT_CONTEXT, req);
	const reqSpan = myRequestTracer.startSpan("some-request", { kind: api.SpanKind.SERVER }, inCtx);
	const reqCtx = api.trace.setSpan(inCtx, reqSpan);
	doSomeWork(reqCtx);
	reqSpan.end();
}

function doSomeWork(ctx: api.Context) {
	const workSpan = myWorkTracer.startSpan("some-work", {}, ctx);
	const workCtx = api.trace.setSpan(ctx, workSpan);
	doSomeOtherWork(workCtx);
	workSpan.end();
}

function doSomeOtherWork(ctx: api.Context) {
	const workSpan = myWorkTracer.startSpan("other-work", {}, ctx);
	const workCtx = api.trace.setSpan(ctx, workSpan);
	//... call other functions, pass workCtx to allow them to pick it up
	workSpan.end();
}

==> never use context.active() as it uses context manager
==> always pass a context to startSpan to avoid the default implementation jumps in which uses context.active()

Clearly this results in quite some work and impacts the whole implementation.

And please note that all auto instrumentations rely one ContextManager to get the current active context. Therefore do not use them if you don't want to use a ContextManager. If you decide for the manual path its a application wide decision.

Dhruv-Garg79 · 2024-11-26T14:10:40Z

Hey @Flarna, Thank you very much. api.propagation.extract worked for me.

can you also suggest how to mark a trace as a parent of another trace, if we only have the trace_id of child?

Flarna · 2024-11-26T14:14:06Z

traces are separate entities. The concept of a trace is a parent of another trace does not exist in opentelemetry to my knowledge.
Spans have parent/child relationship and this forms a trace.

Dhruv-Garg79 · 2024-11-26T14:15:59Z

My mistake, I meant spans itself.

My use case looks like this - I am getting trace_id of span over the network, and I want this span to become child of new span I am creating.

Flarna · 2024-11-26T14:22:16Z

You have to put the received traceId/spanId into a SpanContext and the SpanContext into a Context and this Context should be given as argument to startSpan.
e.g. here is a sample.

Dhruv-Garg79 · 2024-11-27T07:29:12Z

you mean something like this, right?

		const reqSpan = trace.wrapSpanContext({
			traceId: req.traceId,
			spanId: '',
			traceFlags: 1,
			isRemote: true
		});

		const span = tracer.startSpan(
			'reqSpan',
			{
				startTime: req.startTime,
				kind: SpanKind.INTERNAL,
			},
			trace.setSpan(propagation.extract(ROOT_CONTEXT, reqSpan), reqSpan),
		);

		span.setStatus({ code: SpanStatusCode.OK });
		span.end(req.endTime);

Also, I will need both traceId and spanId to create a span context, right?

Flarna · 2024-11-27T10:32:25Z

Yes, you need traceId and spanId to identify a span.
But the way how you use propagation API is wrong you should extract from a carrier not from a spancontext.
Maybe look here

By the way, why do you hijack this long closed issue for this new topic?

Dhruv-Garg79 · 2024-11-27T12:23:55Z

No particular reason. I can delete these if you want.

Flarna · 2024-11-27T12:28:06Z

no need to delete, but please create a fresh issue if more help is needed.

boutell closed this as completed Mar 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is context-async-hooks suitable for production use? #2812

Is context-async-hooks suitable for production use? #2812

boutell commented Mar 2, 2022

boutell commented Mar 2, 2022

Flarna commented Mar 2, 2022

boutell commented Mar 2, 2022 via email

Qard commented Mar 3, 2022

boutell commented Mar 3, 2022 via email

dyladan commented Mar 3, 2022

boutell commented Mar 3, 2022

gmreads commented May 4, 2023 •

edited

Loading

Flarna commented May 4, 2023

Dhruv-Garg79 commented Oct 28, 2024

dyladan commented Oct 28, 2024

Flarna commented Oct 28, 2024

Dhruv-Garg79 commented Oct 29, 2024

Flarna commented Oct 29, 2024

Dhruv-Garg79 commented Nov 26, 2024 •

edited

Loading

Flarna commented Nov 26, 2024

Dhruv-Garg79 commented Nov 26, 2024

Flarna commented Nov 26, 2024

Dhruv-Garg79 commented Nov 27, 2024 •

edited

Loading

Flarna commented Nov 27, 2024

Dhruv-Garg79 commented Nov 27, 2024

Flarna commented Nov 27, 2024

Is context-async-hooks suitable for production use? #2812

Is context-async-hooks suitable for production use? #2812

Comments

boutell commented Mar 2, 2022

boutell commented Mar 2, 2022

Flarna commented Mar 2, 2022

boutell commented Mar 2, 2022 via email

Qard commented Mar 3, 2022

boutell commented Mar 3, 2022 via email

dyladan commented Mar 3, 2022

boutell commented Mar 3, 2022

gmreads commented May 4, 2023 • edited Loading

Flarna commented May 4, 2023

Dhruv-Garg79 commented Oct 28, 2024

dyladan commented Oct 28, 2024

Flarna commented Oct 28, 2024

Dhruv-Garg79 commented Oct 29, 2024

Flarna commented Oct 29, 2024

Dhruv-Garg79 commented Nov 26, 2024 • edited Loading

Flarna commented Nov 26, 2024

Dhruv-Garg79 commented Nov 26, 2024

Flarna commented Nov 26, 2024

Dhruv-Garg79 commented Nov 27, 2024 • edited Loading

Flarna commented Nov 27, 2024

Dhruv-Garg79 commented Nov 27, 2024

Flarna commented Nov 27, 2024

gmreads commented May 4, 2023 •

edited

Loading

Dhruv-Garg79 commented Nov 26, 2024 •

edited

Loading

Dhruv-Garg79 commented Nov 27, 2024 •

edited

Loading