
Logging should be enabled by default #1996

Closed
dankrad opened this issue Nov 28, 2020 · 15 comments

@dankrad

dankrad commented Nov 28, 2020

Logging currently has to be manually enabled using the --logfile command line parameter. I cannot imagine that anyone would run a beacon node without generating logs in case something needs debugging, so this should really be enabled by default in some sensible location.

@JustinDrake
Contributor

JustinDrake commented Nov 28, 2020

It may make sense for the default configurations to even log to disk with a cap (e.g. 100MB of the most recent logs).

@AgeManning
Member

This is a good idea.

The screen output is currently tied to the log level. Ideally we'd output INFO-level logs to the terminal by default and write debug-level logs to disk.

A lot of our users run in Docker containers or systemd services that save the logs, but by default these only capture INFO-level output, which is not sufficient to debug issues.

We should find a way to save debug-level logs to disk separately.
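A minimal sketch of what that could look like with slog's existing building blocks (a `Duplicate` drain plus two `LevelFilter`s); the crate setup and file path here are illustrative, not Lighthouse's actual configuration:

```rust
// Sketch only: duplicate every log record to two drains at different levels,
// INFO to the terminal and DEBUG to a file.
use slog::{o, Drain, Duplicate, Level, LevelFilter, Logger};
use std::fs::OpenOptions;

fn build_logger() -> Logger {
    // Terminal drain, filtered to INFO.
    let decorator = slog_term::TermDecorator::new().build();
    let term = slog_term::FullFormat::new(decorator).build().fuse();
    let term = LevelFilter::new(term, Level::Info).fuse();

    // File drain, filtered to DEBUG (path is a placeholder).
    let file = OpenOptions::new()
        .create(true)
        .append(true)
        .open("beacon.log")
        .expect("unable to open log file");
    let file_drain = slog_term::FullFormat::new(slog_term::PlainDecorator::new(file))
        .build()
        .fuse();
    let file_drain = LevelFilter::new(file_drain, Level::Debug).fuse();

    // Fan every record out to both drains behind a single async worker.
    let drain = slog_async::Async::new(Duplicate::new(term, file_drain).fuse())
        .build()
        .fuse();
    Logger::root(drain, o!())
}
```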

@hukkin
Contributor

hukkin commented Nov 28, 2020

Lighthouse-docker might still want to keep this feature disabled. At least I prefer to configure the logging driver (and disk cap, etc.) in Docker.

@jamesmorgan

I've just been looking for logs to diagnose a problem but couldn't find them, and then found this issue. I think having a sensible logging config defined and enabled by default is definitely required as well. It's hard to know what's happened when issues arise if I can't look back and easily see the problem.

macladson self-assigned this Oct 19, 2021
@macladson
Member

macladson commented Oct 20, 2021

Hi all, I've been spending some time working on this issue and am interested to hear some feedback.
Currently I can log to a file at debug level and to stdout at info level simultaneously (I'm currently working on making this the default).

I then started looking into how I would go about adding log rotation (cycling through a variable number of log files, where each file is limited to a certain size), and found a few things:

  1. slog doesn't support log rotation (at least not that I could tell), although there has been some recent movement on the topic, so it's possible it will be added sometime in the future.
  2. sloggers does support log rotation, but does not support outputting to multiple log streams in a single thread. (This means that to get the desired functionality, logging will run over two different async threads rather than one.)
  3. tracing supports log rotation, but using it would involve a complete rewrite of our logging code (although I suspect the in-code macros won't require many changes to be compatible, so it might not be too bad).

So the main options I see are:

  1. Forgo log rotation for now
  2. Continue using slog and try to implement some form of log rotation myself (not sure this is even feasible)
  3. Continue using slog and use sloggers to implement file logging in a separate thread (but with log rotation)
  4. Switch to tracing.

Something interesting to note about option 4 is that the authors of slog actually recommend tracing for most users anyway, and it seems more suited to async use cases. It is also maintained by the same group that maintains the tokio runtime, which we already use extensively throughout Lighthouse.

It could be an interesting switch to make for the long term as it is becoming the more popular logging library (better docs, more active dev, etc).

I haven't looked into tracing much more than this though, so I'm not 100% sure it'll fully fit our requirements.

Would be interested to hear others' thoughts on the topic.

@AgeManning
Member

AgeManning commented Oct 20, 2021

If we are ignoring the overhead of the switch, I'm in favour of tracing. It has some nice features for debugging async code, and I've already switched to it for the discv5 protocol. It has the added benefit that it integrates nicely with some of our sub-dependencies, such as libp2p.

It also integrates much more nicely with the traditional log crate, whereas slog doesn't really do a good job of that.

IIRC we went with slog initially because of its async logging, but I'm not convinced that's a huge benefit to us.

If someone wanted to take on the task of upgrading all our logs, I'd be for that. Otherwise I'd probably suggest option 1 and forgo the log rotation.
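On the log-crate integration point, a minimal sketch (assuming the tracing, tracing-subscriber, tracing-log and log crates; this is not Lighthouse code) of routing log records from dependencies into a tracing subscriber:

```rust
// Sketch only: bridge the traditional `log` crate into `tracing` so that
// records from log-based dependencies reach the same subscriber.
fn main() {
    // Forward `log` records into the tracing ecosystem.
    tracing_log::LogTracer::init().expect("failed to install log bridge");

    // Install a plain formatting subscriber as the global default.
    let subscriber = tracing_subscriber::FmtSubscriber::builder().finish();
    tracing::subscriber::set_global_default(subscriber).expect("failed to set subscriber");

    log::info!("a record emitted via the `log` crate");
    tracing::info!("a native tracing event");
}
```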

@macladson
Member

Sounds good then! Unless anyone has any objections, I would be happy to take it on and switch everything over to tracing.

@paulhauner
Member

paulhauner commented Oct 20, 2021

I'm open to switching to tracing; it looks quite nice. It didn't exist when we chose slog for Lighthouse: LH added slog in July 2018 and tracing had its first commit in Sept 2018.

I'm also fine with losing the async component. I can't think of anywhere we're emitting logs in fast-running loops, so waiting for stdout shouldn't hurt us. Another reason we chose slog was that it could do structured logs and emit them as rich JSON. I did a bit of this in the early days with AWS, but we didn't end up going down that path. I think tracing would support JSON output, and I don't think it matters even if it doesn't.

Before we commit to switching, it would be good to know if we can maintain the same (or at least similar-enough) log formatting. I know we generally don't treat the logs as a stable API, but it would be nice to know how much we're going to break it before we're too far down the track.

Also, tracing-flame looks really cool!
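On the JSON point, a small sketch (this assumes tracing-subscriber's `json` feature and is not a claim about what Lighthouse would ship) of structured JSON output in tracing:

```rust
// Sketch only: emit tracing events as JSON objects with structured fields.
fn main() {
    tracing_subscriber::fmt()
        .json() // requires tracing-subscriber's `json` feature
        .init();

    // Structured fields become keys in the JSON output.
    tracing::info!(peers = 56, head_slot = 1_234_567u64, "Synced");
}
```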

@paulhauner
Member

paulhauner commented Oct 20, 2021

It's worth noting that we use custom slog formatting: https://github.com/sigp/lighthouse/blob/stable/common/logging/src/lib.rs

@macladson
Member

At first glance, it appears that tracing doesn't include the crit log level: https://docs.rs/tracing/0.1.29/tracing/struct.Level.html#implementations

We may need to decide whether the crits we have in Lighthouse are fine as errors, or whether we should work out something different.
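For illustration only (not what was ultimately adopted), one possible workaround would be to keep such events at tracing's ERROR level and attach a marker field that alerting rules could match on:

```rust
// Sketch only: tracing's levels stop at ERROR, so a "critical" event could be
// flagged with a structured field instead of a dedicated level.
fn main() {
    tracing_subscriber::fmt().init();

    // `crit = true` is just a field name chosen for this sketch, not a tracing level.
    tracing::error!(crit = true, "beacon chain encountered an unrecoverable error");
}
```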

@paulhauner
Member

> At first glance, it appears that tracing doesn't include the crit log level: https://docs.rs/tracing/0.1.29/tracing/struct.Level.html#implementations

Ouch! In our production setup, ERRO sends an email and CRIT triggers a phone call (bypassing do-not-disturb). It has been working very nicely so far. One option would be to push all current ERRO into WARN and then reserve ERRO for what we used to have as CRIT.

That seems reasonable enough to me. I can't imagine it would break anyone's log parsing systems; we'd just have to make it very clear to them that they'll never see another CRIT and that ERRO is a big deal now.

I'd probably need to sit on this one overnight at least.

@michaelsproul
Member

I'm leaning towards saying we should stick with slog and adopt sloggers to get background file logging (option 3).

That way we can retain the CRIT level, and we don't have to rewrite a bunch of stuff to use tracing. It looks like the tracing macros work quite differently from slog: they don't take a Logger-like argument in the first position, and don't support the key-value syntax. It wouldn't be impossible to shim, but it seems like a lot of work for a small gain. On the other hand, tracing looks really cool and maybe we'll end up wanting more features from it as time goes on (e.g. tracing-distributed)?
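To make the difference concrete, a side-by-side sketch (illustrative values, not Lighthouse code) of the two macro styles:

```rust
// Sketch only: slog takes an explicit Logger and `key => value` pairs after a
// semicolon, while tracing uses the globally installed subscriber and
// `key = value` fields.
use slog::{info, o, Drain};

fn main() {
    // slog style: logger passed explicitly in the first position.
    let decorator = slog_term::TermDecorator::new().build();
    let drain = slog_term::FullFormat::new(decorator).build().fuse();
    let drain = slog_async::Async::new(drain).build().fuse();
    let log = slog::Logger::root(drain, o!());
    info!(log, "Synced"; "slot" => 1234, "peers" => 56);

    // tracing style: no logger argument, fields use `=`.
    tracing_subscriber::fmt().init();
    tracing::info!(slot = 1234, peers = 56, "Synced");
}
```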

Other thoughts that spring to mind:

  • We should probably impose a hard limit on the size of the log file. When we say log rotation I'm assuming we mean some kind of chunk-and-compress scheme? Making sure that we can also delete above a certain threshold is a must IMO (e.g. I have 30GB of debug logs since genesis).
  • We should carefully consider the interaction with systemd/journald, as that's how most users run. Should we switch to logging DEBG on stderr when journald (or a pipe) is detected? Or should we continue debug logging to the separate file and require users to configure it themselves? The latter option seems a bit risky, as the duplicated logs could end up using a lot more space.

@macladson
Member

> When we say log rotation I'm assuming we mean some kind of chunk-and-compress scheme? Making sure that we can also delete above a certain threshold is a must IMO (e.g. I have 30GB of debug logs since genesis).

Some notes on this:
As far as log rotation goes, sloggers seems to be more powerful than tracing.
It can:

  • rotate the log file once it reaches a given size
  • keep a configurable number of old log files
  • compress old log files

By contrast, tracing rotates a log file every minute/hour/day with (as far as I can see) no option to compress or remove old log files.
https://docs.rs/tracing-appender/0.1.1/tracing_appender/rolling/struct.RollingFileAppender.html
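A small sketch of the sloggers rotation API as I understand it (the builder methods and values here are illustrative, not Lighthouse's eventual configuration):

```rust
// Sketch only: a background file logger built with sloggers, rotating by size,
// keeping a fixed number of old files and compressing them.
use sloggers::file::FileLoggerBuilder;
use sloggers::types::Severity;
use sloggers::Build;

fn build_file_logger() -> slog::Logger {
    FileLoggerBuilder::new("beacon.log") // path is a placeholder
        .level(Severity::Debug)
        .rotate_size(200 * 1024 * 1024) // rotate once a file reaches ~200MB
        .rotate_keep(5)                 // keep at most 5 rotated files
        .rotate_compress(true)          // gzip rotated files
        .build()
        .expect("failed to build file logger")
}
```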

> We should carefully consider the interaction with systemd/journald, as that's how most users run.

If we do go with tracing, tracing-journald might be an option for this.

@macladson
Member

Following up on my previous comment, I've decided to go with sloggers (see #2762). The main reasons are as follows:

  • The CRIT log level is used extensively throughout Lighthouse, and losing it would require a significant reworking of the codebase.
  • Log macros in tracing are different enough from slog's that it would be non-trivial to switch (although there could be some possibility of a shim to convert between the macros).
  • The file rotation in sloggers is significantly more powerful than other solutions that exist and offers the greatest flexibility in how users want to manage their logs.

bors bot pushed a commit that referenced this issue Nov 30, 2021
## Issue Addressed

Closes #1996 

## Proposed Changes

Run a second `Logger` via `sloggers` which logs to a file in the background with:
- separate `debug-level` for background and terminal logging
- the ability to limit log size
- rotation through a customizable number of log files
- an option to compress old log files (`.gz` format)

Add the following new CLI flags:
- `--logfile-debug-level`: The debug level of the log files
- `--logfile-max-size`: The maximum size of each log file
- `--logfile-max-number`: The number of old log files to store
- `--logfile-compress`: Whether to compress old log files

By default background logging uses the `debug` log level and saves logfiles to:
- Beacon Node:  `$HOME/.lighthouse/$network/beacon/logs/beacon.log`
- Validator Client:  `$HOME/.lighthouse/$network/validators/logs/validator.log`

Or, when using the `--datadir` flag:
`$datadir/beacon/logs/beacon.log` and `$datadir/validators/logs/validator.log`

Once rotated, old logs are stored like so: `beacon.log.1`, `beacon.log.2` etc. 
> Note: `beacon.log.1` is always newer than `beacon.log.2`.

## Additional Info

Currently the default value of `--logfile-max-size` is 200 (MB) and `--logfile-max-number` is 5.
This means that the maximum storage space that the logs will take up by default is 1.2GB. 
(200MB x 5 from old log files + <200MB for the current logfile being written to)
Happy to adjust these default values to whatever people think is appropriate. 

It's also worth noting that when logging to a file, we lose our custom `slog` formatting. This means the logfile logs look like this:
```
Oct 27 16:02:50.305 INFO Lighthouse started, version: Lighthouse/v2.0.1-8edd9d4+, module: lighthouse:413
Oct 27 16:02:50.305 INFO Configured for network, name: prater, module: lighthouse:414
```
@macladson
Member

Resolved in #2762
