Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev-cmd/bottle: use system GNU tar if possible #18604

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cho-m
Copy link
Member

@cho-m cho-m commented Oct 21, 2024

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

I switched to non-reproducible args for building gnu-tar bottle in #18138 due to problems bottling gnu-tar with gnu-tar.

Looking into this again to restore reproducibility. Incomplete PR as back-to-back bottles are still producing a diff on Linux.


The main issue for gnu-tar is running with placeholders, particularly for the interpreter.

On Linux, there are 2 options:

  1. Use system GNU tar (changes in this PR). Potential limitations are if someone on old Linux tries bottling gnu-tar (unlikely) and maybe the version number difference.
  2. Directly run interpreter, e.g.
    #{HOMEBREW_PREFIX}/lib/ld.so \
      --library-path #{HOMEBREW_PREFIX}/lib:... \
      /path/to/tar ...
    

On macOS, gnu-tar has no dependencies and doesn't have interpreter handling like Linux so can just bottle gnu-tar with gnu-tar:

brew bottle --json --only-json-tab gnu-tar -v
/opt/homebrew/Library/Homebrew/vendor/bundle/ruby/3.3.0/bin/bundle clean
==> Determining gnu-tar bottle rebuild...
==> Bottling gnu-tar--1.35.arm64_sequoia.bottle.2.tar.gz...
==> Installing `gnu-tar` for bottling...
/opt/homebrew/bin/brew install --formula gnu-tar
Warning: gnu-tar 1.35 is already installed and up-to-date.
To reinstall 1.35, run:
  brew reinstall gnu-tar
/opt/homebrew/opt/gnu-tar/bin/gtar --create --numeric-owner --mtime=2023-07-18 02:55:23 --sort=name --owner=0 --group=0 --numeric-owner --format=pax --pax-option=globexthdr.name=/GlobalHead.%n,exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime --file /Users/cho-m/Desktop/gnu-tar--1.35.arm64_sequoia.bottle.2.tar gnu-tar/1.35
./gnu-tar--1.35.arm64_sequoia.bottle.2.tar.gz
  bottle do
    rebuild 2
    sha256 cellar: :any_skip_relocation, arm64_sequoia: "74c997076b93549ba3b8a5bd59d768a853b929d33317cc59ba7ec448824eff6f"
  end
Writing gnu-tar--1.35.arm64_sequoia.bottle.json

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me 👍🏻

@Bo98
Copy link
Member

Bo98 commented Oct 21, 2024

I believe the reason for using Homebrew gnu-tar was because there was output differences in a recent version and the system tar being older caused issues. 71888db.

Makes sense to use system tar when bottling gnu-tar itself though.

In the longer term, a Ruby implementation (e.g. minitar) is something on my "to investigate" list.

@MikeMcQuaid
Copy link
Member

Makes sense to use system tar when bottling gnu-tar itself though.

Agreed 👍🏻

@cho-m
Copy link
Member Author

cho-m commented Oct 21, 2024

Makes sense to use system tar when bottling gnu-tar itself though.

Sounds reasonable. I was seeing a diff when trying to bottle meson.


In the longer term, a Ruby implementation (e.g. minitar) is something on my "to investigate" list.

GNU tar does make it easy to potentially change the compression format. Although it wasn't something worked on (#13621), there may be value in allowing different compression formats per-formula

Mainly useful if noticeable improvement and if dependency is already met.

For example, I was able to use zstd to create llvm bottle in similar compression time as gzip that had a >10% size reduction and decompression is 1.4x faster (though llvms total install time is impacted by relocation overhead).

Summary
  tar -xf llvm--19.1.2.arm64_sequoia.bottle.1.tar.zst ran
    1.42 ± 0.21 times faster than tar -xf llvm--19.1.2.arm64_sequoia.bottle.tar.gz

577M llvm--19.1.2.arm64_sequoia.bottle.1.tar.lz4
289M llvm--19.1.2.arm64_sequoia.bottle.1.tar.xz
430M llvm--19.1.2.arm64_sequoia.bottle.1.tar.zst
490M llvm--19.1.2.arm64_sequoia.bottle.tar.gz
brew reinstall --quiet --display-times  ./llvm--19.1.2.arm64_sequoia.bottle.1.tar.zst
...
==> Summary
🍺  /opt/homebrew/Cellar/llvm/19.1.2: 8,041 files, 1.9GB
==> Running `brew cleanup llvm`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
==> Installation times
llvm                      6.580 sbrew reinstall --quiet --display-times  llvm
...
🍺  /opt/homebrew/Cellar/llvm/19.1.2: 8,041 files, 1.9GB
==> Running `brew cleanup llvm`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
==> Installation times
llvm                      7.522 s

@MikeMcQuaid
Copy link
Member

GNU tar does make it easy to potentially change the compression format. Although it wasn't something worked on (#13621), there may be value in allowing different compression formats per-formula

Mainly useful if noticeable improvement and if dependency is already met.

Would only want to do this if we can do so for all supported macOS versions (and typical Ubuntu runners) with no extra Homebrew or system dependencies.

@Bo98
Copy link
Member

Bo98 commented Oct 21, 2024

The compression algorithm is really a completely different thing. tar handles it for ease of use but under the hood it doesn't actually implement any compression code and simply defers to another command so I wouldn't really conflate our choice of tar for future compression support. tar does save having to remember what those external commands are but you still need to e.g. handle making sure they're installed, handle the different media types in the OCI manifest, etc.

Using Ruby Gzip has also been a good improvement as we don't need to worry about the system anymore (particularly GNU Gzip's different implementation) so that does set a fairly high quality bar, whereas meanwhile we've had to adjust this tar code repeatedly.

@cho-m
Copy link
Member Author

cho-m commented Oct 21, 2024

Would only want to do this if we can do so for all supported macOS versions (and typical Ubuntu runners) with no extra Homebrew or system dependencies.

Probably won't be possible for all formulae unless Apple decides to ship a copy.

If only handling a subset of formulae, then possible if zstd is in dependency tree already, e.g. anything that depends on libtiff.

For performance, we may need to look into relocation handling first.


Using Ruby Gzip has also been a good improvement as we don't need to worry about the system anymore (particularly GNU Gzip's different implementation) so that does set a fairly high quality bar, whereas meanwhile we've had to adjust this tar code repeatedly.

Letting tar handle both phases does allow some small optimizations for intermediary files (EDIT: maybe Ruby tar could also do this). But avoiding implementation details (particularly dealing with oldest macOS/Linux support) does help.

On side note, using equivalent of gzip --no-name would probably have been ideal for reproducible builds (I think this is default for other compression like zstd, xz, etc) but probably too late for that since our existing bottles have name/timestamp.

@MikeMcQuaid
Copy link
Member

Probably won't be possible for all formulae unless Apple decides to ship a copy.

Yes, I consider this the requirement here.

@Bo98
Copy link
Member

Bo98 commented Oct 22, 2024

One problem with zstd is it is not very reproducible. It changes between versions and, in certain cases, between CPUs.

I know we've broken reproducibility a lot in the last year, though I think it's still something we aim to keep fairly stable.

On macOS, we do have options like LZFSE but I suspect it's not too big of an impact.

@cho-m
Copy link
Member Author

cho-m commented Oct 22, 2024

One problem with zstd is it is not very reproducible. It changes between versions and, in certain cases, between CPUs.

Curious what Ubuntu, Fedora and Arch Linux do here given they use zstd, though haven't checked their stance on reproducibility. The ones focused on reproducibility like Debian and NixOS currently use xz.

Though either way not possible right now in Homebrew/core. macOS tar can only decompress .gz, .bz2, or .xz without extra dependencies and the others are all worse at compression/decompression speed.


On macOS, we do have options like LZFSE but I suspect it's not too big of an impact.

Seems like it would require more effort/code to use. And attempting all bottles would require more work. I guess it may have better decompression speed though doesn't seem worth it.


Anyway, will probably get back to PR soon.

Still looking at best place to handle logic, i.e. whether worth moving to Linux-specific code or keeping the GNU tar check and just adding a formula.name == "gnu-tar".

@Bo98
Copy link
Member

Bo98 commented Oct 22, 2024

Curious what Ubuntu, Fedora and Arch Linux do here given they use zstd, though haven't checked their stance on reproducibility. The ones focused on reproducibility like Debian and NixOS currently use xz.

Given the branched approach to many distros, they might not care too much about reproducibility between branches but do within a branch. I don't think zstd patch releases ever get backported in Ubuntu - only security update commits. Generally speaking it's usually reproducible with a fixed version (the CPU thing is seen as a bug and won't affect everyone).

For Homebrew, we're rolling release so we don't really have that distinction so we break as and when it makes sense to but ultimately try to have some reproducibility.

I do agree zstd is ideal, though any global dependency on a formula always has been awful (see attestations). If we go down that route it would probably have to ship with Portable Ruby like zlib already is, though that would still require code to handle system Ruby scenarios.

@carlocab
Copy link
Member

I do agree zstd is ideal, though any global dependency on a formula always has been awful (see attestations). If we go down that route it would probably have to ship with Portable Ruby like zlib already is, though that would still require code to handle system Ruby scenarios.

We could write a pure Ruby library for Zstd. Though by the time we've done that we might lose most of the benefits of using Zstd.

@Bo98
Copy link
Member

Bo98 commented Oct 23, 2024

I guess in theory you only need a decompressor which is an easier port given most magic is done on the compressor side.

Comment on lines +389 to +390
tar = if system_gnu_tar?
"tar"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplest way to update this PR is probably following with additional input parameter for formula:

Suggested change
tar = if system_gnu_tar?
"tar"
tar = if formula.name == "gnu-tar" && system_gnu_tar?
"tar"

Though this won't handle case if a user runs brew bottle gnu-tar on an old LTS Linux. Seems like a very unlikely situation.


A more complete option would be to add OS-specific logic to only check this on Linux-only as macOS doesn't have the same problem in relation to rewritten ld.so.


Either way, if we do want to consider pure Ruby alternative, then these changes would be a stopgap measure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me 👍🏻

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale No recent activity label Nov 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale No recent activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants