thanos tools bucket rewrite fails with a segmentation violation on Thanos v0.36.0 and later #7844

Open
chris-barbour-as opened this issue Oct 19, 2024 · 4 comments · Fixed by prometheus/prometheus#15383

@chris-barbour-as

Thanos, Prometheus and Golang version used:

[email protected]
[email protected]

Also tested with:
thanos@main (v0.35.2-0.20241017120053-731e4607d34a according to go.mod)
[email protected]

Confirmed issue does not affect:
[email protected]

Object Storage Provider:

FILESYSTEM

What happened:

Starting with v0.36.0, thanos tools bucket rewrite fails with a segmentation violation while attempting to upload the new block back to the store.

What you expected to happen:

thanos tools bucket rewrite should not fail with a segmentation violation.

How to reproduce it (as minimally and precisely as possible):

The attached Go code reproduces the error.

My specific command invocation is as follows:

thanos tools bucket rewrite --objstore.config-file=objstore-local.yaml --rewrite.to-relabel-config-file=relabel-config.yaml --no-dry-run --delete-blocks --id "01HCGZPEM6EEHJ75218N914HNM" --tmp.dir="/home/appuser/data/thanos-rewrite"

objstore-local.yaml:

type: FILESYSTEM
config:
  directory: "/home/appuser/data"
prefix: ""

relabel-config.yaml:

- action: replace
  target_label: "foo"
  replacement: "bar"

However, the attached code will reproduce the issue by simply running:

go get

go run main.go

Full logs to relevant components:

Logs

Start time: Sat Oct 19 01:43:15 UTC 2024
ts=2024-10-19T01:43:15.39052425Z caller=factory.go:53 level=info msg="loading bucket configuration"
ts=2024-10-19T01:43:15.406953575Z caller=fetcher.go:623 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=16.053695ms duration_ms=16 cached=1077 returned=1077 partial=0
ts=2024-10-19T01:43:15.414714245Z caller=main.go:174 level=info msg=exiting
ts=2024-10-19T01:43:15.44098453Z caller=factory.go:53 level=info msg="loading bucket configuration"
ts=2024-10-19T01:43:17.14659156Z caller=tools_bucket.go:1226 level=info msg="downloading block" source=01HCGZPEM6EEHJ75218N914HNM
ts=2024-10-19T01:46:02.077523638Z caller=tools_bucket.go:1263 level=info msg="changelog will be available" file=/home/appuser/data/thanos-rewrite/01JAH77XYXDS8NBHFBJFKCS3V1/change.log
ts=2024-10-19T01:46:02.134607708Z caller=tools_bucket.go:1278 level=info msg="starting rewrite for block" source=01HCGZPEM6EEHJ75218N914HNM new=01JAH77XYXDS8NBHFBJFKCS3V1 toDelete= toRelabel="- action: replace\n  target_label: \"foo\"\n  replacement: \"bar\"\n- action: replace\n"
ts=2024-10-19T01:49:15.763983386Z caller=compactor.go:42 level=info msg="processed 10.00% of 3419617 series"
ts=2024-10-19T01:49:47.464508668Z caller=compactor.go:42 level=info msg="processed 20.00% of 3419617 series"
ts=2024-10-19T01:50:48.513697195Z caller=compactor.go:42 level=info msg="processed 30.00% of 3419617 series"
ts=2024-10-19T01:51:20.873113385Z caller=compactor.go:42 level=info msg="processed 40.00% of 3419617 series"
ts=2024-10-19T01:51:25.882289646Z caller=compactor.go:42 level=info msg="processed 50.00% of 3419617 series"
ts=2024-10-19T01:51:37.924141467Z caller=compactor.go:42 level=info msg="processed 60.00% of 3419617 series"
ts=2024-10-19T01:51:43.005992941Z caller=compactor.go:42 level=info msg="processed 70.00% of 3419617 series"
ts=2024-10-19T01:55:21.447415738Z caller=compactor.go:42 level=info msg="processed 80.00% of 3419617 series"
ts=2024-10-19T01:56:18.949121034Z caller=compactor.go:42 level=info msg="processed 90.00% of 3419617 series"
ts=2024-10-19T01:57:32.258928562Z caller=compactor.go:42 level=info msg="processed 100.00% of 3419617 series"
ts=2024-10-19T01:57:32.26977951Z caller=tools_bucket.go:1288 level=info msg="wrote new block after modifications; flushing" source=01HCGZPEM6EEHJ75218N914HNM new=01JAH77XYXDS8NBHFBJFKCS3V1
ts=2024-10-19T01:58:03.092458808Z caller=tools_bucket.go:1297 level=info msg="uploading new block" source=01HCGZPEM6EEHJ75218N914HNM new=01JAH77XYXDS8NBHFBJFKCS3V1
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x92573b]

goroutine 102 [running]:
github.com/grafana/regexp.(*Regexp).UnmarshalText(0x0, {0xc129d88463?, 0xc0f46c6f08?, 0x2907820?})
	/go/pkg/mod/github.com/grafana/[email protected]/regexp.go:1302 +0x3b
encoding/json.(*decodeState).literalStore(0xc0005b4b68, {0xc129d88462, 0xc, 0x199e}, {0x2907820?, 0xc0f46c6f08?, 0x27ddc00?}, 0x0)
	/usr/local/go/src/encoding/json/decode.go:877 +0x5f3
encoding/json.(*decodeState).value(0xc0005b4b68, {0x2907820?, 0xc0f46c6f08?, 0x5?})
	/usr/local/go/src/encoding/json/decode.go:388 +0x115
encoding/json.(*decodeState).object(0xc0005b4b68, {0x24d0120?, 0xc0001de298?, 0x23212c0?})
	/usr/local/go/src/encoding/json/decode.go:755 +0xd08
encoding/json.(*decodeState).value(0xc0005b4b68, {0x24d0120?, 0xc0001de298?, 0x1?})
	/usr/local/go/src/encoding/json/decode.go:374 +0x3e
encoding/json.(*decodeState).array(0xc0005b4b68, {0x23212c0?, 0xc0f3ef2710?, 0x313c?})
	/usr/local/go/src/encoding/json/decode.go:555 +0x50f
encoding/json.(*decodeState).value(0xc0005b4b68, {0x23212c0?, 0xc0f3ef2710?, 0x10?})
	/usr/local/go/src/encoding/json/decode.go:364 +0x74
encoding/json.(*decodeState).object(0xc0005b4b68, {0x2629080?, 0xc0f3ef26e0?, 0x2325400?})
	/usr/local/go/src/encoding/json/decode.go:755 +0xd08
encoding/json.(*decodeState).value(0xc0005b4b68, {0x2629080?, 0xc0f3ef26e0?, 0x1?})
	/usr/local/go/src/encoding/json/decode.go:374 +0x3e
encoding/json.(*decodeState).array(0xc0005b4b68, {0x2325400?, 0xc0005b4b00?, 0x38c3?})
	/usr/local/go/src/encoding/json/decode.go:555 +0x50f
encoding/json.(*decodeState).value(0xc0005b4b68, {0x2325400?, 0xc0005b4b00?, 0x8?})
	/usr/local/go/src/encoding/json/decode.go:364 +0x74
encoding/json.(*decodeState).object(0xc0005b4b68, {0x282b140?, 0xc0005b4aa8?, 0x39c0?})
	/usr/local/go/src/encoding/json/decode.go:755 +0xd08
encoding/json.(*decodeState).value(0xc0005b4b68, {0x282b140?, 0xc0005b4aa8?, 0x6?})
	/usr/local/go/src/encoding/json/decode.go:374 +0x3e
encoding/json.(*decodeState).object(0xc0005b4b68, {0x252fa40?, 0xc0005b4a00?, 0xc0005f8ee8?})
	/usr/local/go/src/encoding/json/decode.go:755 +0xd08
encoding/json.(*decodeState).value(0xc0005b4b68, {0x252fa40?, 0xc0005b4a00?, 0xc0005f8f38?})
	/usr/local/go/src/encoding/json/decode.go:374 +0x3e
encoding/json.(*decodeState).unmarshal(0xc0005b4b68, {0x252fa40?, 0xc0005b4a00?})
	/usr/local/go/src/encoding/json/decode.go:181 +0x133
encoding/json.(*Decoder).Decode(0xc0005b4b40, {0x252fa40, 0xc0005b4a00})
	/usr/local/go/src/encoding/json/stream.go:73 +0x179
github.com/thanos-io/thanos/pkg/block/metadata.Read({0x38c90e8?, 0xc0001de288})
	/app/pkg/block/metadata/meta.go:260 +0x10c
github.com/thanos-io/thanos/pkg/block/metadata.ReadFromDir({0xc11dc14680, 0x3c})
	/app/pkg/block/metadata/meta.go:252 +0x8e
github.com/thanos-io/thanos/pkg/block.upload({0x38db3b0, 0xc0000ce550}, {0x38b9f40, 0xc0001c4440}, {0x38f2518?, 0xc000622820?}, {0xc11dc14680, 0x3c}, {0x0, 0x0}, ...)
	/app/pkg/block/block.go:126 +0x125
github.com/thanos-io/thanos/pkg/block.Upload(...)
	/app/pkg/block/block.go:98
main.registerBucketRewrite.func1.1()
	/app/cmd/thanos/tools_bucket.go:1303 +0x1cf6
github.com/oklog/run.(*Group).Run.func1({0xc000a5eb60?, 0xc000622ad0?})
	/go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x29
created by github.com/oklog/run.(*Group).Run in goroutine 1
	/go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0x67

Anything else we need to know:

The issue is specifically related to the rewrite information (the relabels_applied entries) that thanos tools bucket rewrite adds to the updated block's meta.json file.

The following JSON can be used to reproduce the problem:

{
  "thanos": {
    "rewrites": [
      {
        "relabels_applied": [
          {
            "SourceLabels": null,
            "Separator": ";",
            "Regex": "foo",
            "Modulus": 0,
            "TargetLabel": "bar",
            "Replacement": "baz",
            "Action": "replace"
          }
        ]
      }
    ]
  }
}

Removing the "Regex": "foo" entry makes the problem go away. See the attached code.

thanos-rewrites-segv.tar.gz
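
For readers without the attachment, the failure mode can be sketched with the standard library alone. This is not the attached reproducer: Pattern, Regexp, Config and decode are hypothetical stand-ins (Pattern plays the role of the regexp type seen in the stack trace, Regexp the wrapper that embeds a pointer to it). It only illustrates how encoding/json can end up calling a promoted UnmarshalText on a nil embedded pointer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Pattern is a hypothetical stand-in for the regexp type in the stack trace.
// Its UnmarshalText writes through the receiver, so a nil receiver panics.
type Pattern struct {
	re *regexp.Regexp
}

func (p *Pattern) UnmarshalText(text []byte) error {
	re, err := regexp.Compile(string(text))
	if err != nil {
		return err
	}
	p.re = re // nil pointer dereference when p == nil
	return nil
}

// Regexp is a hypothetical wrapper that embeds the pointer. UnmarshalText is
// promoted to *Regexp even though the embedded pointer starts out nil.
type Regexp struct {
	*Pattern
}

// Config is a trimmed-down, hypothetical relabel entry.
type Config struct {
	Regex       Regexp
	TargetLabel string
}

// decode reports whether decoding panicked instead of returning an error.
func decode(data string) (panicked bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var cfg Config
	err = json.Unmarshal([]byte(data), &cfg)
	return false, err
}

func main() {
	panicked, err := decode(`{"Regex":"foo","TargetLabel":"bar"}`)
	fmt.Println("with Regex:   ", panicked, err)

	panicked, err = decode(`{"TargetLabel":"bar"}`)
	fmt.Println("without Regex:", panicked, err)
}
```

In this sketch only the document that contains a "Regex" string reaches the promoted method, which matches the observation that removing the entry avoids the crash.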


dosubot bot commented Oct 19, 2024

Hey, @chris-barbour-as! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to solve this issue.

The segmentation violation you're encountering with the thanos tools bucket rewrite command in Thanos v0.36.0 and later seems to be related to the handling of the Regex field in the meta.json file. The error occurs when the Regex field is present, and removing it resolves the issue. This suggests that there might be a problem with how the Regex field is being processed or unmarshaled, possibly due to a nil pointer dereference in the github.com/grafana/regexp package.

To address this issue, you might consider the following steps:

  1. Check for Updates: Ensure that you are using the latest version of Thanos, as the issue might have been fixed in a newer release.

  2. Review Code Changes: Look into the changes made in the Thanos codebase around the handling of the Regex field or the github.com/grafana/regexp package to identify any potential bugs introduced in v0.36.0.

  3. Modify meta.json: As a temporary workaround, you can try modifying the meta.json to exclude the Regex field if it's not critical for your use case.

  4. Debugging: If you have the capability, you could debug the code to pinpoint the exact location and cause of the nil pointer dereference.

If the issue persists, it might be beneficial to report it to the Thanos maintainers with the details you've provided, including the specific JSON structure that triggers the error.


@chris-barbour-as
Author

Possibly this is related to #7338.

@yeya24
Contributor

yeya24 commented Nov 11, 2024

@chris-barbour-as The issue here is that relabel.Regexp doesn't implement a JSON encoder/decoder the way it does for YAML, so the regexp pointer isn't handled correctly when meta.json is decoded.

We can either change Prometheus or switch to jsoniter in Thanos so that we can add our own encoding/decoding functions for relabel.Regexp.

@yeya24
Contributor

yeya24 commented Nov 11, 2024

I created prometheus/prometheus#15383 which should fix this issue.
