
Logging should be enabled by default #1996

Closed
dankrad opened this issue Nov 28, 2020 · 15 comments

@dankrad

dankrad commented Nov 28, 2020

Logging currently has to be manually enabled using the --logfile command line parameter. I cannot imagine that anyone would run a beacon node without generating logs in case something needs debugging, so this should really be enabled by default in some sensible location.

@JustinDrake
Contributor

JustinDrake commented Nov 28, 2020

It may make sense for the default configurations to even log to disk with a cap (e.g. 100MB of the most recent logs).

@AgeManning
Member

This is a good idea.

The screen output is currently tied to the log level. Ideally we'd output INFO-level logs to the terminal by default and write debug-level logs to disk.

A lot of our users run in Docker containers or systemd services that save the logs, but by default these only capture INFO-level output, which is not sufficient to debug issues.

We should find a way to save debug-level logs to disk separately.
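A minimal sketch of what that could look like with slog's existing building blocks (a `Duplicate` drain plus two `LevelFilter`s); the crate setup and file path here are illustrative, not Lighthouse's actual configuration:

```rust
// Sketch only: duplicate every log record to two drains at different levels,
// INFO to the terminal and DEBUG to a file.
use slog::{o, Drain, Duplicate, Level, LevelFilter, Logger};
use std::fs::OpenOptions;

fn build_logger() -> Logger {
    // Terminal drain, filtered to INFO.
    let decorator = slog_term::TermDecorator::new().build();
    let term = slog_term::FullFormat::new(decorator).build().fuse();
    let term = LevelFilter::new(term, Level::Info).fuse();

    // File drain, filtered to DEBUG (path is a placeholder).
    let file = OpenOptions::new()
        .create(true)
        .append(true)
        .open("beacon.log")
        .expect("unable to open log file");
    let file_drain = slog_term::FullFormat::new(slog_term::PlainDecorator::new(file))
        .build()
        .fuse();
    let file_drain = LevelFilter::new(file_drain, Level::Debug).fuse();

    // Fan every record out to both drains behind a single async worker.
    let drain = slog_async::Async::new(Duplicate::new(term, file_drain).fuse())
        .build()
        .fuse();
    Logger::root(drain, o!())
}
```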

@hukkin
Contributor

hukkin commented Nov 28, 2020

Lighthouse-docker might still want to keep this feature disabled. At least I prefer to configure the logging driver (and disk cap, etc.) in Docker.

@jamesmorgan

I've just been looking for logs to diagnose a problem but couldn't find them, and then found this issue. I think having a sensible logging config defined and enabled by default is definitely required as well. It's hard to know what's happened when issues arise if I can't look back and easily see the problem.

macladson self-assigned this Oct 19, 2021
@macladson
Member

macladson commented Oct 20, 2021

Hi all, I've been spending some time working on this issue and am interested to hear some feedback.
Currently I can log to a file at debug level and to stdout at info level simultaneously (I'm currently working on making this the default).

I then started looking into how I would go about adding log rotation (cycling through a variable number of log files, where each file is limited to a certain size), and found a few things:

  1. slog doesn't support log rotation (at least not that I could tell), although there has been some recent movement on the topic, so it's possible it will be added sometime in the future.
  2. sloggers does support log rotation, but does not support outputting to multiple log streams in a single thread. (This means that to get the desired functionality, logging will run over two different async threads rather than one.)
  3. tracing supports log rotation, but using it would involve a complete rewrite of our logging code (although I suspect the in-code macros won't require many changes to be compatible, so it might not be too bad).

So the main options I see are:

  1. Forgo log rotation for now
  2. Continue using slog and try to implement some form of log rotation myself (not sure this is even feasible)
  3. Continue using slog and use sloggers to implement file logging in a separate thread (but with log rotation)
  4. Switch to tracing.

Something interesting to note about option 4 is that the authors of slog actually recommend tracing for most users anyway, and it seems more suited to async use cases. It is also maintained by the same group that maintains the tokio runtime, which we already use extensively throughout Lighthouse.

It could be an interesting switch to make for the long term as it is becoming the more popular logging library (better docs, more active dev, etc).

I haven't looked into tracing much more than this though, so I'm not 100% sure it'll fully fit our requirements.

Would be interested to hear others' thoughts on the topic.

@AgeManning
Member

AgeManning commented Oct 20, 2021

If we are ignoring the overhead of the switch, I'm in favour of tracing. It has some nice features for debugging async code, and I've already switched to it for the discv5 protocol. It has the added benefit that it integrates nicely with some of our sub-dependencies, such as libp2p.

It also integrates much more nicely with the traditional log crate, whereas slog doesn't really do a good job of that.

IIRC we went with slog initially because of its async logging, but I'm not convinced that's a huge benefit to us.

If someone wanted to take on the task of upgrading all our logs, I'd be for that. Otherwise I'd probably suggest option 1 and forgo the log rotation.
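On the log-crate integration point, a minimal sketch (assuming the tracing, tracing-subscriber, tracing-log and log crates; this is not Lighthouse code) of routing log records from dependencies into a tracing subscriber:

```rust
// Sketch only: bridge the traditional `log` crate into `tracing` so that
// records from log-based dependencies reach the same subscriber.
fn main() {
    // Forward `log` records into the tracing ecosystem.
    tracing_log::LogTracer::init().expect("failed to install log bridge");

    // Install a plain formatting subscriber as the global default.
    let subscriber = tracing_subscriber::FmtSubscriber::builder().finish();
    tracing::subscriber::set_global_default(subscriber).expect("failed to set subscriber");

    log::info!("a record emitted via the `log` crate");
    tracing::info!("a native tracing event");
}
```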

@macladson
Member

Sounds good then! Unless anyone has any objections, I would be happy to take it on and switch everything over to tracing.

@paulhauner
Member

paulhauner commented Oct 20, 2021

I'm open to switching to tracing; it looks quite nice. It didn't exist when we chose slog for Lighthouse: LH added slog in July 2018 and tracing had its first commit in Sept 2018.

I'm also fine with losing the async component. I can't think of anywhere we're emitting logs in fast-running loops, so waiting for stdout shouldn't hurt us. Another reason we chose slog was that it could do structured logs and emit them as rich JSON. I did a bit of this in the early days with AWS, but we didn't end up going down that path. I think tracing would support JSON output, and I don't think it matters even if it doesn't.

Before we commit to switching, it would be good to know if we can maintain the same (or at least similar-enough) log formatting. I know we generally don't treat the logs as a stable API, but it would be nice to know how much we're going to break it before we're too far down the track.

Also, tracing-flame looks really cool!
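On the JSON point, a small sketch (this assumes tracing-subscriber's `json` feature and is not a claim about what Lighthouse would ship) of structured JSON output in tracing:

```rust
// Sketch only: emit tracing events as JSON objects with structured fields.
fn main() {
    tracing_subscriber::fmt()
        .json() // requires tracing-subscriber's `json` feature
        .init();

    // Structured fields become keys in the JSON output.
    tracing::info!(peers = 56, head_slot = 1_234_567u64, "Synced");
}
```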

@paulhauner
Member

paulhauner commented Oct 20, 2021

It's worth noting that we use custom slog formatting: https://github.com/sigp/lighthouse/blob/stable/common/logging/src/lib.rs

@macladson
Member

At first glance, it appears that tracing doesn't include the crit log level: https://docs.rs/tracing/0.1.29/tracing/struct.Level.html#implementations

We may need to decide whether the crits we have in Lighthouse are fine as errors, or whether we should work out something different.
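For illustration only (not what was ultimately adopted), one possible workaround would be to keep such events at tracing's ERROR level and attach a marker field that alerting rules could match on:

```rust
// Sketch only: tracing's levels stop at ERROR, so a "critical" event could be
// flagged with a structured field instead of a dedicated level.
fn main() {
    tracing_subscriber::fmt().init();

    // `crit = true` is just a field name chosen for this sketch, not a tracing level.
    tracing::error!(crit = true, "beacon chain encountered an unrecoverable error");
}
```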

@paulhauner
Member

> At first glance, it appears that tracing doesn't include the crit log level: https://docs.rs/tracing/0.1.29/tracing/struct.Level.html#implementations

Ouch! In our production setup, ERRO sends an email and CRIT triggers a phone call (bypassing do-not-disturb). It has been working very nicely so far. One option would be to push all current ERRO into WARN and then reserve ERRO for what we used to have as CRIT.

That seems reasonable enough to me. I can't imagine it would break anyone's log parsing systems; we'd just have to make it very clear to them that they'll never see another CRIT and that ERRO is a big deal now.

I'd probably need to sit on this one overnight at least.

@michaelsproul
Member

I'm leaning towards saying we should stick with slog and adopt sloggers to get background file logging (option 3).

That way we can retain the CRIT level, and we don't have to rewrite a bunch of stuff to use tracing. It looks like the tracing macros work quite differently from slog: they don't take a Logger-like argument in the first position, and don't support the key-value syntax. It wouldn't be impossible to shim, but it seems like a lot of work for a small gain. On the other hand, tracing looks really cool and maybe we'll end up wanting more features from it as time goes on (e.g. tracing-distributed)?
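To make the difference concrete, a side-by-side sketch (illustrative values, not Lighthouse code) of the two macro styles:

```rust
// Sketch only: slog takes an explicit Logger and `key => value` pairs after a
// semicolon, while tracing uses the globally installed subscriber and
// `key = value` fields.
use slog::{info, o, Drain};

fn main() {
    // slog style: logger passed explicitly in the first position.
    let decorator = slog_term::TermDecorator::new().build();
    let drain = slog_term::FullFormat::new(decorator).build().fuse();
    let drain = slog_async::Async::new(drain).build().fuse();
    let log = slog::Logger::root(drain, o!());
    info!(log, "Synced"; "slot" => 1234, "peers" => 56);

    // tracing style: no logger argument, fields use `=`.
    tracing_subscriber::fmt().init();
    tracing::info!(slot = 1234, peers = 56, "Synced");
}
```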

Other thoughts that spring to mind:

  • We should probably impose a hard limit on the size of the log file. When we say log rotation I'm assuming we mean some kind of chunk-and-compress scheme? Making sure that we can also delete above a certain threshold is a must IMO (e.g. I have 30GB of debug logs since genesis).
  • We should carefully consider the interaction with systemd/journald, as that's how most users run. Should we switch to logging DEBG on stderr when journald (or a pipe) is detected? Or should we continue debug logging to the separate file and require users to configure it themselves? The latter option seems a bit risky, as the duplicated logs could end up using a lot more space.

@macladson
Member

> When we say log rotation I'm assuming we mean some kind of chunk-and-compress scheme? Making sure that we can also delete above a certain threshold is a must IMO (e.g. I have 30GB of debug logs since genesis).

Some notes on this:
As far as log rotation goes, sloggers seems to be more powerful than tracing.
It can:

  • rotate the log file once it reaches a given size
  • keep a configurable number of old log files
  • compress old log files

By contrast, tracing rotates a log file every minute/hour/day with (as far as I can see) no option to compress or remove old log files.
https://docs.rs/tracing-appender/0.1.1/tracing_appender/rolling/struct.RollingFileAppender.html
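A small sketch of the sloggers rotation API as I understand it (the builder methods and values here are illustrative, not Lighthouse's eventual configuration):

```rust
// Sketch only: a background file logger built with sloggers, rotating by size,
// keeping a fixed number of old files and compressing them.
use sloggers::file::FileLoggerBuilder;
use sloggers::types::Severity;
use sloggers::Build;

fn build_file_logger() -> slog::Logger {
    FileLoggerBuilder::new("beacon.log") // path is a placeholder
        .level(Severity::Debug)
        .rotate_size(200 * 1024 * 1024) // rotate once a file reaches ~200MB
        .rotate_keep(5)                 // keep at most 5 rotated files
        .rotate_compress(true)          // gzip rotated files
        .build()
        .expect("failed to build file logger")
}
```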

> We should carefully consider the interaction with systemd/journald, as that's how most users run.

If we do go with tracing, tracing-journald might be an option for this.

@macladson
Member

Following up on my previous comment, I've decided to go with sloggers (see #2762). The main reasons are as follows:

  • The CRIT log level is used extensively throughout Lighthouse, and losing it would require a significant reworking of the codebase.
  • Log macros in tracing are different enough from slog's that it would be non-trivial to switch (although there could be some possibility of a shim to convert between the macros).
  • The file rotation in sloggers is significantly more powerful than other solutions that exist and offers the greatest flexibility in how users want to manage their logs.

bors bot pushed a commit that referenced this issue Nov 30, 2021
## Issue Addressed

Closes #1996 

## Proposed Changes

Run a second `Logger` via `sloggers` which logs to a file in the background with:
- separate `debug-level` for background and terminal logging
- the ability to limit log size
- rotation through a customizable number of log files
- an option to compress old log files (`.gz` format)

Add the following new CLI flags:
- `--logfile-debug-level`: The debug level of the log files
- `--logfile-max-size`: The maximum size of each log file
- `--logfile-max-number`: The number of old log files to store
- `--logfile-compress`: Whether to compress old log files

By default background logging uses the `debug` log level and saves logfiles to:
- Beacon Node:  `$HOME/.lighthouse/$network/beacon/logs/beacon.log`
- Validator Client:  `$HOME/.lighthouse/$network/validators/logs/validator.log`

Or, when using the `--datadir` flag:
`$datadir/beacon/logs/beacon.log` and `$datadir/validators/logs/validator.log`

Once rotated, old logs are stored like so: `beacon.log.1`, `beacon.log.2` etc. 
> Note: `beacon.log.1` is always newer than `beacon.log.2`.

## Additional Info

Currently the default value of `--logfile-max-size` is 200 (MB) and `--logfile-max-number` is 5.
This means that the maximum storage space that the logs will take up by default is 1.2GB. 
(200MB x 5 from old log files + <200MB for the current logfile being written to)
Happy to adjust these default values to whatever people think is appropriate. 

It's also worth noting that when logging to a file, we lose our custom `slog` formatting. This means the logfile logs look like this:
```
Oct 27 16:02:50.305 INFO Lighthouse started, version: Lighthouse/v2.0.1-8edd9d4+, module: lighthouse:413
Oct 27 16:02:50.305 INFO Configured for network, name: prater, module: lighthouse:414
```
@macladson
Member

Resolved in #2762
