Make Journald input resilient to Journald errors #39355

Open
belimawr opened this issue May 1, 2024 · 2 comments
Labels
bug Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

belimawr commented May 1, 2024

Currently, if there is any error reading the next message from Journald, the input stops working and never recovers, which effectively stops ingestion.

This happens because any error from reading a new message or publishing an event is returned by the input's Run method:

    for {
        entry, err := parser.Next()
        if err != nil {
            return err
        }
        event := entry.ToEvent()
        if err := publisher.Publish(event, event.Private); err != nil {
            return err
        }
    }

Run is called in a goroutine that only logs the error and then exits, so nothing restarts the input:

    go func() {
        defer r.wg.Done()
        log.Infof("Input '%s' starting", name)
        err := r.input.Run(
            v2.Context{
                ID:          r.id,
                Agent:       *r.agent,
                Logger:      log,
                Cancelation: r.sig,
            },
            r.connector,
        )
        if err != nil && !errors.Is(err, context.Canceled) {
            log.Errorf("Input '%s' failed with: %+v", name, err)
        } else {
            log.Infof("Input '%s' stopped (goroutine)", name)
        }
    }()

We need to make the Journald input more resilient to errors we get when calling the host's journald via github.com/coreos/go-systemd/v22/sdjournal.

@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@belimawr

Even after the merge of #40061 and the migration to journalctl, this issue is still relevant: if journalctl crashes, the input finishes and ingestion of journal messages stops.
