Missing metrics when a receiver is shutdown #7845

Open
thibautmery opened this issue Oct 19, 2024 · 1 comment

@thibautmery

Thanos, Prometheus and Golang version used:

Thanos version: v0.35.1

Thanos is deployed with docker-compose.

Architecture:
Each Prometheus instance remote-writes to the Thanos receiver LBs. Those LBs forward the requests to multiple Thanos receivers.

The replication factor is set to 1, and there are always 3 identical Thanos receivers.

Object Storage Provider:

What happened:

When I cleanly shut down one Thanos receiver, I lose some metrics when I query them with Thanos Query:
image

What you expected to happen:

I expect each metric to be replicated to at least one other Thanos receiver, so that when I query all the Thanos receivers, even if one is missing, I should still be able to get all the metrics.

The situation resolves itself when the receiver that was shut down comes back.

Anything else we need to know:

This behavior has been present for the last few versions of Thanos.

Thank you :)


dosubot bot commented Oct 19, 2024

Hello @thibautmery! I'm here to help you with any bugs, questions, or contributions you may have.

I found some relevant information regarding your issue with missing metrics when a Thanos receiver is shut down. When the replication factor is set to 1, Thanos does not replicate the data across multiple receivers. This means that if a receiver is shut down, any data that was only stored on that receiver will be temporarily unavailable until the receiver is back online. This is why you're experiencing missing metrics when one of the receivers is down.

To ensure data availability and redundancy, it's recommended to set the replication factor to at least 2 or 3. This way, each time-series is stored on multiple receivers, allowing you to query the data even if one receiver is unavailable [1][2].
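The effect of the replication factor on availability can be illustrated with a small toy model. This is only a sketch under stated assumptions: the node names, the `replicas_for` and `queryable` helpers, and the simple modulo placement are hypothetical and are not Thanos's actual hashring implementation. In Thanos itself the setting is controlled by the `--receive.replication-factor` flag on the receive component.

```python
import hashlib


def replicas_for(series: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Pick `replication_factor` distinct nodes for a series.

    Toy placement (assumption): hash the series name, take it modulo the
    node count, then walk forward around the ring.
    """
    digest = hashlib.sha256(series.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


def queryable(series: str, nodes: list[str], replication_factor: int, down: set[str]) -> bool:
    """A series stays queryable if at least one node holding it is still up."""
    return any(n not in down for n in replicas_for(series, nodes, replication_factor))


# Hypothetical setup mirroring the issue: 3 receivers, one of them shut down.
nodes = ["receiver-0", "receiver-1", "receiver-2"]
series = [f"metric_{i}" for i in range(100)]
down = {"receiver-1"}

for rf in (1, 3):
    visible = sum(queryable(s, nodes, rf, down) for s in series)
    print(f"replication_factor={rf}: {visible}/{len(series)} series still queryable")
```

In this model, a replication factor of 1 places each series on exactly one receiver, so every series assigned to the stopped receiver disappears from query results until it returns. With a replication factor of 3 on three receivers, every series has a copy on a node that is still up.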

Additionally, I found a similar open issue discussing problems with Thanos receivers when one is down, which might provide further insights: Thanos receiver issue when 1 receiver is down [3].

