-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto-create tracing spans for log groups #2309
base: main
Are you sure you want to change the base?
Conversation
internal/job/shell/shell.go
Outdated
if bytes.HasPrefix(p, []byte("~~~ ")) || bytes.HasPrefix(p, []byte("--- ")) || bytes.HasPrefix(p, []byte("+++ ")) { | ||
s.Close() | ||
operationName := bytes.SplitN(p[4:], []byte("\n"), 1)[0] | ||
s.span, s.ctx = opentracing.StartSpanFromContext(s.ctx, string(operationName)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I'm not clear about - can we assume Write calls are for each line? Or will we get subsets of a line and/or multiple lines inside the given byte array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In testing, it seems at least there will be multiple lines. I'm not sure if it will be called mid-line. It'll be simpler to assume not, since we won't have to maintain a buffer of writes.
bcbb7b6
to
4ee6553
Compare
|
||
func (s *spanMakerWriter) FinishIfActive() { | ||
if s.span != nil { | ||
s.span.Finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing missing is being able to mark a span as an error. I think we'd need to pass in the return of p.WaitResult here to know the process' exit code.
This will make it easy for CI users to trace and analyze their CI jobs without requiring additional work or scripting on their part. They can just look at tracing information on their tracing provider.
9e398d0
to
37d3ee9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great idea @goodspark, be we foresee some issues with the implementation.
We also would really appreciate in implementation for opentelemetry. It should not be that much more difficult.
@@ -532,6 +535,32 @@ func round(d time.Duration) time.Duration { | |||
} | |||
} | |||
|
|||
// spanMakerWriter is an io.Writer shim that captures logs and automatically creates trace spans for the log group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is getting pretty big. Can you put spanMakerWriter
and its methods in a new file?
And I think its name should reflect that it creates spans from log groups. spansFromLogGroupWriter
perhaps?
s.span, _ = opentracing.StartSpanFromContext(s.ctx, operationName) | ||
} | ||
return s.w.Write(p) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the complete line is not guaranteed to be contained in a single Write
. The +++
, etc could occur in the middle of p
. Even worse, it could be split across consecutive calls to Write
.
I think what we need here is a Writer
that has a state machine that will capture what's input between (for example) +++
and then next \n
and creates a new span as appropriate. It will also have to have an upper bound on how much it captures to prevent an adversary from crafting an input that will cause the agent to run out of memory. Of course, a limit is also useful because the tracing backend will have a limit on how long a span's name can be.
This will make it easy for CI users to trace and analyze their CI jobs without requiring additional work or scripting on their part. They can just look at tracing information on their tracing provider.
Example with Datadog tracing:
Associated CI job: