
Sidecar issue since v0.37.0 #7947

Open · rgarcia89 opened this issue Nov 29, 2024 · 10 comments
@rgarcia89 (Contributor)

Thanos, Prometheus and Golang version used:
Thanos: v0.36.1 / v0.37.0
Prometheus: v2.51.2

What happened:
After upgrading the Thanos sidecar container from v0.36.1 to v0.37.0, the Thanos Querier (v0.37.0) is no longer able to properly connect to the sidecar. The screenshot below shows two clusters running v0.36.1 and four clusters running v0.37.0.

[screenshot: Thanos Query stores page for two v0.36.1 clusters and four v0.37.0 clusters]

As observed, the Min Time data is missing for the sidecars running on the latest version. Interestingly, when using Thanos Query, I can still see data newer than two hours. This suggests that the sidecar is being used since this data has not yet been uploaded to the object storage. Similarly, I can also view data from within the last two hours via Grafana. However, label_values queries no longer work when selecting this time range.

[screenshot: label_values query for the last two hours returning no results]
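For anyone trying to reproduce the label_values part, here is a minimal sketch using the Prometheus Go API client against the Thanos Query endpoint, restricted to the last hour of data. The endpoint address and the label name are placeholders, not the exact setup from this report:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Placeholder address for the Thanos Query HTTP endpoint.
	client, err := api.NewClient(api.Config{Address: "http://thanos-query:9090"})
	if err != nil {
		log.Fatalf("creating client: %v", err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Query only the last hour, i.e. data that should still be served by
	// the sidecars rather than by object storage.
	end := time.Now()
	start := end.Add(-time.Hour)

	// Placeholder label name; any label used in Grafana variables works.
	values, warnings, err := promAPI.LabelValues(ctx, "cluster", nil, start, end)
	if err != nil {
		log.Fatalf("label_values query failed: %v", err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println("values:", values)
}
```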

@dosubot (bot) commented Nov 29, 2024

Hello, @rgarcia89! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!

I found some similar threads that might be relevant to your issue:

  1. Open Issue: Thanos Side Endpoints Not Showing on Querier - This issue discusses problems with the Querier not displaying sidecar endpoints. No solution was found in the comments [1].

  2. Closed Issue: Querier cannot speak to external sidecar - This issue was resolved by implementing a proxy solution using Envoy to allow the Querier to communicate with the external sidecar through a proxy setup [2].

Additionally, there is a known breaking change in Thanos v0.37.0: the removal of the Store gRPC Info function, which might affect connectivity if your setup relies on this functionality [3].

These threads and changes might provide some insights or potential solutions to your connectivity issue with the Thanos sidecar after upgrading to v0.37.0.

To continue talking to Dosu, mention @dosu.

@MichaHoffmann (Contributor)

Maybe this could be related to #7820?

@MichaHoffmann (Contributor)

Do you see anything suspicious in the sidecar logs?

@rgarcia89 (Contributor, Author)

Nope, nothing suspicious in there.

ts=2024-11-29T15:18:09.581373607Z caller=main.go:77 level=debug msg="maxprocs: Leaving GOMAXPROCS=2: CPU quota undefined"
ts=2024-11-29T15:18:09.584061062Z caller=options.go:29 level=info protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
ts=2024-11-29T15:18:09.5846933Z caller=factory.go:54 level=info msg="loading bucket configuration"
ts=2024-11-29T15:18:09.584853638Z caller=azure.go:150 level=debug msg="creating new Azure bucket connection" component=sidecar
ts=2024-11-29T15:18:09.648072464Z caller=sidecar.go:432 level=info msg="starting sidecar"
ts=2024-11-29T15:18:09.648222143Z caller=intrumentation.go:75 level=info msg="changing probe status" status=healthy
ts=2024-11-29T15:18:09.648276193Z caller=http.go:73 level=info service=http/server component=sidecar msg="listening for requests and metrics" address=:10902
ts=2024-11-29T15:18:09.648720274Z caller=reloader.go:274 level=info component=reloader msg="nothing to be watched"
ts=2024-11-29T15:18:09.648857256Z caller=tls_config.go:348 level=info service=http/server component=sidecar msg="Listening on" address=[::]:10902
ts=2024-11-29T15:18:09.648932726Z caller=tls_config.go:351 level=info service=http/server component=sidecar msg="TLS is disabled." http2=false address=[::]:10902
ts=2024-11-29T15:18:09.656684874Z caller=sidecar.go:444 level=warn msg="failed to get Prometheus flags. Is Prometheus running? Retrying" err="got non-200 response code: 503, response: Service Unavailable"
ts=2024-11-29T15:18:11.660515648Z caller=sidecar.go:444 level=warn msg="failed to get Prometheus flags. Is Prometheus running? Retrying" err="got non-200 response code: 503, response: Service Unavailable"
ts=2024-11-29T15:18:13.654530085Z caller=sidecar.go:200 level=info msg="successfully validated prometheus flags"
ts=2024-11-29T15:18:13.655620235Z caller=promclient.go:663 level=debug msg="build version" url=http://localhost:9090/api/v1/status/buildinfo
ts=2024-11-29T15:18:13.658524014Z caller=sidecar.go:223 level=info msg="successfully loaded prometheus version"
ts=2024-11-29T15:18:13.659194342Z caller=promclient.go:699 level=debug msg="lowest timestamp" url=http://localhost:9090/metrics
ts=2024-11-29T15:18:13.7098399Z caller=sidecar.go:254 level=info msg="successfully loaded prometheus external labels" external_labels="{cluster=\"rnd\", prometheus=\"monitoring/rnd\", prometheus_replica=\"prometheus-rnd-0\", stage=\"lab\"}"
ts=2024-11-29T15:18:13.714241929Z caller=intrumentation.go:56 level=info msg="changing probe status" status=ready
ts=2024-11-29T15:18:13.714966097Z caller=promclient.go:699 level=debug msg="lowest timestamp" url=http://localhost:9090/metrics
ts=2024-11-29T15:18:13.715796374Z caller=grpc.go:167 level=info service=gRPC/server component=sidecar msg="listening for serving gRPC" address=:10901
ts=2024-11-29T15:18:43.71504741Z caller=promclient.go:699 level=debug msg="lowest timestamp" url=http://localhost:9090/metrics
ts=2024-11-29T15:19:13.71577582Z caller=promclient.go:699 level=debug msg="lowest timestamp" url=http://localhost:9090/metrics

@MichaHoffmann (Contributor)

So it looks like we were able to get the lowest timestamp from Prometheus. Did Prometheus cut a block already?
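For reference, the debug lines above show the sidecar reading the lowest timestamp from Prometheus's own /metrics endpoint. A rough, illustrative way to inspect that value by hand is to scrape and parse the endpoint yourself; the metric name used below (prometheus_tsdb_lowest_timestamp_seconds) is an assumption about what is consulted, not taken from the sidecar code:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

func main() {
	// Same endpoint the sidecar scrapes in the debug logs above.
	resp, err := http.Get("http://localhost:9090/metrics")
	if err != nil {
		log.Fatalf("scraping metrics: %v", err)
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		log.Fatalf("parsing metrics: %v", err)
	}

	// Assumed metric: the TSDB lowest timestamp exposed by Prometheus.
	mf, ok := families["prometheus_tsdb_lowest_timestamp_seconds"]
	if !ok || len(mf.GetMetric()) == 0 {
		log.Fatal("lowest timestamp metric not found")
	}
	fmt.Println("lowest timestamp (unix seconds):", mf.GetMetric()[0].GetGauge().GetValue())
}
```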

@rgarcia89 (Contributor, Author)

Not since I restarted it. We have configured a block size of 2h.

[screenshot: Prometheus TSDB status since the restart]
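One way to double-check whether a block has been cut since the restart is Prometheus's TSDB status API. A small sketch that prints the head's time range follows; the address is a placeholder and only the documented headStats fields of /api/v1/status/tsdb are decoded:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

// Minimal view of the /api/v1/status/tsdb response; only headStats is decoded.
type tsdbStatus struct {
	Data struct {
		HeadStats struct {
			NumSeries int64 `json:"numSeries"`
			MinTime   int64 `json:"minTime"`
			MaxTime   int64 `json:"maxTime"`
		} `json:"headStats"`
	} `json:"data"`
}

func main() {
	resp, err := http.Get("http://localhost:9090/api/v1/status/tsdb")
	if err != nil {
		log.Fatalf("querying tsdb status: %v", err)
	}
	defer resp.Body.Close()

	var status tsdbStatus
	if err := json.NewDecoder(resp.Body).Decode(&status); err != nil {
		log.Fatalf("decoding response: %v", err)
	}

	// minTime/maxTime are unix milliseconds; when Prometheus cuts a block,
	// the head is truncated and minTime jumps forward.
	fmt.Println("head min time:", time.UnixMilli(status.Data.HeadStats.MinTime).UTC())
	fmt.Println("head max time:", time.UnixMilli(status.Data.HeadStats.MaxTime).UTC())
	fmt.Println("head series:  ", status.Data.HeadStats.NumSeries)
}
```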

@MichaHoffmann (Contributor)

I think the problem should solve itself once Prometheus cuts a block for the first time.

@rgarcia89 (Contributor, Author)

I will let you know. Still, this is not happening with v0.36.1

@MichaHoffmann (Contributor)

> I will let you know. Still, this is not happening with v0.36.1

Yeah; we likely need to fall back to the shipper timestamp if we cannot consult the metrics ~ that's a bug; but it would still be cool to know if it recovers after the first block is cut!
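A minimal sketch of that fallback idea, with hypothetical interface and method names (not the actual sidecar types):

```go
package sidecar

import (
	"context"
	"errors"
)

// metricsSource is a hypothetical stand-in for the promclient lookup that
// scrapes the lowest timestamp from Prometheus's /metrics endpoint.
type metricsSource interface {
	LowestTimestamp(ctx context.Context) (int64, error)
}

// shipperSource is a hypothetical stand-in for the shipper's view of
// already-cut blocks.
type shipperSource interface {
	OldestBlockMinTime() (ts int64, ok bool)
}

// minTimestamp prefers the metrics-derived value and falls back to the
// shipper when the metric cannot be consulted, so the querier still gets a
// usable MinTime when the metric lookup yields nothing.
func minTimestamp(ctx context.Context, m metricsSource, s shipperSource) (int64, error) {
	if ts, err := m.LowestTimestamp(ctx); err == nil && ts != 0 {
		return ts, nil
	}
	if ts, ok := s.OldestBlockMinTime(); ok {
		return ts, nil
	}
	return 0, errors.New("no timestamp source available")
}
```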

@rgarcia89 (Contributor, Author)

No difference, even after Prometheus cut a block.
