Merge FreeBSD 2024-09-13 #2262
Merged
This yields substantial performance improvements when we only write out some small % of entries at a time, as it will cause entries that will go into "nearby" ZAP leaf nodes to be grouped closer together in the AVL, and so touch fewer blocks. Without this, the distribution is an even spread, so we touch a lot more ZAP leaf nodes for any given number of entries. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Allan Jude <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15895
All objects stored in the MOS get copies=3. For a large dedup table, this requires significant extra IO and disk space, when it's not really necessary - the dedup table itself isn't needed to read or write data, only to keep data usage down. Losing the dedup table does not render the pool unusable, it just messes up the accounting somewhat. This adds a dmu_ddt_copies tuneable. When set to 0, the existing behaviour is used. When set higher, dedup table blocks (ZAP and log) will have this many copies rather than the usual 3, while indirect blocks will have one more again. This is a tuneable for now mostly for testing. Losing a dedup table can cause blocks to be leaked, and we currently have no facilities to repair that. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Allan Jude <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15895
Adds a log/journal to dedup. At the end of a txg, instead of being written directly to the ZAP, each entry is added to an in-memory tree and appended to an on-disk object. The on-disk object is only read at import, to reload the in-memory tree. Lookups first go to the log tree before going to the ZAP, so recently-used entries will remain close by in memory. This vastly reduces overhead from dedup IO, as it will not have to do so many read/update/write cycles on ZAP leaf nodes. A flushing facility is added at end of txg, to push logged entries out to the ZAP. There are actually two separate "logs" (in-memory tree and on-disk object), one active (receiving updated entries) and one flushing (writing out to disk). These are swapped (i.e. flushing begins) based on memory used by the in-memory log trees and time since we last flushed something. The flushing facility monitors the amount of entries coming in and being flushed out, and calibrates itself to try to flush enough each txg to keep up with the ingest rate without competing too much with other IO. Multiple tuneables are provided to control the flushing facility. All the histograms and stats are updated to accommodate the log as a separate entry store. zdb gains knowledge of how to count them and dump them. Documentation included! Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Allan Jude <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15895
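The lookup order described in this commit (log tree first, then the ZAP) might be sketched roughly as below. This is an illustration only: `struct entry_store`, `store_has()` and `ddt_lookup_sketch()` are hypothetical names, and flat arrays stand in for the in-memory tree and the on-disk ZAP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Where a lookup was serviced from (hypothetical names). */
enum lookup_hit { HIT_NONE, HIT_LOG, HIT_ZAP };

/* A flat array stands in for the log tree or the ZAP in this sketch. */
struct entry_store {
	const uint64_t *keys;
	size_t n;
};

/* Linear scan; returns 1 if the key is present in the store. */
static int
store_has(const struct entry_store *s, uint64_t key)
{
	for (size_t i = 0; i < s->n; i++)
		if (s->keys[i] == key)
			return (1);
	return (0);
}

/* Check the in-memory log first; fall back to the backing ZAP. */
static enum lookup_hit
ddt_lookup_sketch(const struct entry_store *log,
    const struct entry_store *zap, uint64_t key)
{
	assert(log != NULL && zap != NULL);
	if (store_has(log, key))
		return (HIT_LOG);
	if (store_has(zap, key))
		return (HIT_ZAP);
	return (HIT_NONE);
}
```

An entry present in both stores is reported as a log hit, which reflects the point of the change: recently used entries are found without touching ZAP leaf nodes.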
The dedup log does not have a stable cursor, so it's not possible to persist our current scan location within it across pool reloads. Because of this, when walking (scanning), we can't treat it like just another source of dedup entries. Instead, when a scan is wanted, we switch to an aggressive flushing mode, pushing out entries older than the scan start txg as fast as we can, before starting the scan proper. Entries after the scan start txg will be handled via other methods; the DDT ZAPs and logs will be written as normal, and blocks not seen yet will be offered to the scan machinery as normal. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Allan Jude <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15895
Adds per-DDT stats counting lookups and where they were serviced from (either log or backing zap), number of log entries in memory, and flow rates. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15895
Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #15895
`l2arc_mfuonly` was added to avoid wasting L2 ARC on read-once MRU data and metadata. However it can be useful to cache as much metadata as possible while, at the same time, restricting data cache to MFU buffers only. This patch allows for such behavior by setting `l2arc_mfuonly` to 2 (or higher). The list of possible values is the following: 0: cache both MRU and MFU for both data and metadata; 1: cache only MFU for both data and metadata; 2: cache both MRU and MFU for metadata, but only MFU for data. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gionatan Danti <[email protected]> Closes #16343 Closes #16402
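The three settings listed above can be read as a small decision function. The sketch below is not the OpenZFS code; the enums and `l2arc_eligible_sketch()` are hypothetical names used only to restate the table of values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classifications for this sketch. */
enum buf_kind { BUF_DATA, BUF_METADATA };
enum arc_list { LIST_MRU, LIST_MFU };

/* Returns true if a buffer may be cached in L2ARC under the setting. */
static bool
l2arc_eligible_sketch(int mfuonly, enum buf_kind kind, enum arc_list list)
{
	assert(mfuonly >= 0);
	if (list == LIST_MFU)
		return (true);	/* MFU buffers are cached under every value */
	if (mfuonly == 0)
		return (true);	/* 0: MRU allowed for data and metadata */
	if (mfuonly == 1)
		return (false);	/* 1: MFU only, for data and metadata */
	/* 2 or higher: MRU allowed for metadata only */
	return (kind == BUF_METADATA);
}
```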
Skip the ro check for snapshots, since they are always ro regardless of whether the ro flag is passed to mount. This allows multi-mounting snapshots without requiring the ro flag to be specified. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #16299
For spl-taskq to use the kstats infrastructure, it has to be available first. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Syneto Closes #16171
This exposes a variety of per-taskq stats under /proc/spl/kstat/taskq, one file per taskq, named for the taskq name.instance. These include a small amount of info about the taskq config, the current state of the threads and queues, and various counters for thread and queue activity since the taskq was created. To assist with decrementing queue size counters, the list an entry is on is encoded in spare bits in the entry flags. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Syneto Closes #16171
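Encoding "which list an entry is on" in spare flag bits might look like the sketch below. The bit positions, the list names, and the helper functions are assumptions made for illustration; they are not taken from the SPL sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: two spare bits (8 and 9) hold the list id. */
#define	ENT_LIST_SHIFT	8
#define	ENT_LIST_MASK	(0x3u << ENT_LIST_SHIFT)

enum ent_list { ENT_NONE = 0, ENT_PEND = 1, ENT_PRIO = 2, ENT_DELAY = 3 };

/* Store the list id in the spare bits, leaving other flags intact. */
static uint32_t
ent_set_list(uint32_t flags, enum ent_list l)
{
	assert((unsigned)l <= 3);
	return ((flags & ~ENT_LIST_MASK) | ((uint32_t)l << ENT_LIST_SHIFT));
}

/* Recover the list id so the matching queue-size counter can be adjusted. */
static enum ent_list
ent_get_list(uint32_t flags)
{
	return ((enum ent_list)((flags & ENT_LIST_MASK) >> ENT_LIST_SHIFT));
}
```

With the list id carried in the entry itself, code that removes an entry can decrement the right counter without being told which queue it came from.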
This adds /proc/spl/kstat/taskq/summary, which attempts to show a useful subset of stats for all taskqs in the system. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Syneto Closes #16171
These had minimal useful information for the admin, didn't work properly in some places, and knew far too much about taskq internals. With the new stats available, these should never be needed anymore. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Syneto Closes #16171
In kernels 6.8 and later, the zvol block device is allocated with qlimits passed during initialization. However, the zvol driver does not set `max_hw_discard_sectors`, which is necessary to properly initialize `max_discard_sectors`. This causes the `zvol_misc_trim` test to fail on 6.8+ kernels when invoking the `blkdiscard` command. Setting `max_hw_discard_sectors` in the `HAVE_BLK_ALLOC_DISK_2ARG` case resolves the issue. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #16462
Rob Norris suggested that we could clean up redundant limits in the non-blk-mq case. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes #16462
zfs_arc_shrinker_limit (default: 10000) avoids ARC collapse due to excessive memory reclaim. However, when the kernel is in direct reclaim mode (i.e. low on memory), limiting ARC reclaim increases OOM risk. This is especially true on systems without (or with inadequate) swap. This patch ignores zfs_arc_shrinker_limit when the kernel is in direct reclaim mode, avoiding most OOMs. It also restores the ability of "echo 3 > /proc/sys/vm/drop_caches" to correctly drop (almost) all ARC. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Adam Moss <[email protected]> Signed-off-by: Gionatan Danti <[email protected]> Closes #16313
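The behaviour described here amounts to applying the limit only outside direct reclaim. A minimal sketch follows, assuming the amounts are page counts and that the caller already knows whether it is in direct reclaim; `arc_shrink_target_sketch()` and its parameters are hypothetical, and the sketch assumes a nonzero limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * How much of the evictable ARC to offer to the shrinker.
 * In direct reclaim the limit is ignored; otherwise it caps the amount.
 */
static uint64_t
arc_shrink_target_sketch(uint64_t evictable, uint64_t shrinker_limit,
    bool direct_reclaim)
{
	assert(shrinker_limit > 0);	/* sketch-only assumption */
	if (direct_reclaim)
		return (evictable);
	return (evictable < shrinker_limit ? evictable : shrinker_limit);
}
```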
Nothing ever checks it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16253
Makes it harder to use memory debuggers like valgrind directly, because they can't see canary overruns. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16253
The Linux abd_os.c serves double-duty as the userspace scatter abd implementation, by carrying an emulation of kernel scatterlists. This commit lifts common and userspace-specific parts out into a separate abd_os.c for libzpool. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16253
Removing the platform #ifdefs from shared headers in favour of per-platform headers. Makes abd_t much leaner, among other things. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16253
This is intended to be a simple userspace scatter abd based on struct iovec. It's not very sophisticated as-is, but sets a base for something much more interesting. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #16253
Update the META file to reflect compatibility with the 6.10 kernel. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #16466
This allows a simple "wrapping" ABD for an existing linear buffer to be allocated on the stack, avoiding an allocation. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
This will make future refactoring easier. There are two we can't change for the moment, because zio_compress_data does hole detection & collapsing which zio_decompress_data does not actually know how to handle. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
This updates zstream to use the zio_compress calls rather than using its own dispatch. Since that was fairly entangled, some refactoring is included. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
Nothing uses it anymore! Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
This is mostly to make searching easier. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
This commit changes the provider compress and decompress API to take ABD pointers instead of buffer pointers for both data source and destination. It then updates all providers to match. This doesn't actually change the providers to do chunked compression, just changes the API to allow such an update in the future. Helper macros are added to easily adapt the ABD functions to their buffer-based implementations. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
This commit changes the frontend zio_compress_data and zio_decompress_data APIs to take ABD pointers instead of buffer pointers. All callers are updated to match. Any that already have an appropriate ABD nearby now use it directly, while for the rest we create one. Internally, the ABDs are passed through to the provider directly. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
Some callers (e.g. `do_corrective_recv()`) pass in a dest buffer much smaller than the wanted 87.5% of the source buffer, because the incoming abd is larger than the source data and they "know" what the decompressed size will be. However, `abd_borrow_buf()` rightly asserts if we try to borrow more than is available, so these callers fail. Previously when all we had was a dest buffer, we didn't know how big it was, so we couldn't do anything. Now we have a dest abd, with a size, so we can clamp dest size to the abd size. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
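The clamp described above can be illustrated with plain sizes. This is a sketch, not the zio code: `dest_len_sketch()` is a hypothetical helper, and the 87.5% figure is computed as the source length minus one eighth.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Destination length to borrow: 87.5% of the source length,
 * clamped so it never exceeds what the destination ABD holds.
 */
static size_t
dest_len_sketch(size_t src_len, size_t dest_abd_size)
{
	size_t wanted = src_len - (src_len >> 3);	/* 87.5% of source */

	assert(dest_abd_size > 0);	/* sketch-only assumption */
	return (wanted < dest_abd_size ? wanted : dest_abd_size);
}
```

For a 4096-byte source the wanted size is 3584 bytes; a caller that supplies a 1024-byte destination gets 1024 instead, so the borrow never exceeds the ABD.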
The STALE state means the L2T entry is valid in hardware but needs to be refreshed (ARP/NDP) in software. But stop/suspend wipes the hardware L2T and STALE entries need to be updated just like VALID entries to match actual hardware state. Fixes: c1c5248 cxgbe/t4_tom: Implement uld_stop and uld_restart for ULD_TOM. MFC after: 1 week Sponsored by: Chelsio Communications
The destination queue for tracing filters is destroyed during stop or suspend and the software state needs to reflect this. A new destination queue will be set up when the adapter resumes operation. MFC after: 1 week Sponsored by: Chelsio Communications
Follow the path of what is done with bsnmp: build the modules along with the main binary. This allows the modules to be built at a point where all needed libraries are already built and available in the linker path, instead of having to declare all the libraries which a flua module will be linked to in _prebuild_libs. Discussed with: markj Reviewed by: markj, jrtc27, kevans, imp Accepted by: kevans, imp Differential Revision: https://reviews.freebsd.org/D46610
When we install the tunneling function we had the ovpn lock, and then took the UDP lock. During normal data flow we are called with the UDP lock held and then take the ovpn lock. This naturally produces a lock order reversal warning. Avoid this by releasing the ovpn lock before installing the tunnel function. This is safe, in that installing the tunnel function does not fail (other than with EBUSY, which would mean another thread has already installed the function). On cleanup the problem is more difficult, in that we cannot reasonably release the ovpn lock before we can remove the tunneling function callback. Solve this by delaying the removal of the tunnel callback until the ovpn_softc is cleaned up. It's still safe for ovpn_udp_input() to be called when all peers are removed. That will only increment counters (which are still allocated), discover there are no peers and then pass the message on to userspace, if any userspace users of the socket remain. We ensure that the socket object remains valid by holding a reference, which we release when we remove the ovpn_softc. This removes the need for per-peer reference counting on the socket, so remove that. Reviewed by: zlei Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D46616
Add a central table of modes and loop over it rather than spelling out 10 essentially identical strcmp if statements. Use the table to generate usage as well, reducing the number of ifdefs. Disallow multiple -m options. Previously multiple were allowed, but only the last one was used and there was no indication this happened. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D46426
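The table-driven approach might look roughly like the sketch below. The mode names (`alpha`, `beta`, `gamma`), their ids, and both helper functions are placeholders invented for illustration; the real table and its entries are not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One row per mode; names and ids here are placeholders. */
struct mode_entry {
	const char *name;
	int id;
};

static const struct mode_entry modes[] = {
	{ "alpha", 1 },
	{ "beta", 2 },
	{ "gamma", 3 },
};
#define	NMODES	(sizeof(modes) / sizeof(modes[0]))

/* Replace a chain of strcmp() tests with one loop over the table. */
static int
lookup_mode(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < NMODES; i++)
		if (strcmp(name, modes[i].name) == 0)
			return (modes[i].id);
	return (-1);	/* unknown mode */
}

/* Build the "a|b|c" part of a usage string from the same table. */
static size_t
mode_usage(char *buf, size_t len)
{
	size_t off = 0;

	if (len == 0)
		return (0);
	buf[0] = '\0';
	for (size_t i = 0; i < NMODES; i++) {
		int n = snprintf(buf + off, len - off, "%s%s",
		    i == 0 ? "" : "|", modes[i].name);
		if (n < 0 || (size_t)n >= len - off)
			break;	/* output truncated; stop here */
		off += (size_t)n;
	}
	return (off);
}
```

Because lookup and usage text come from one table, adding a mode is a one-line change and the two cannot drift apart.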
Fixes: 211bdd6 Add kcmp(2) userspace bits
When moving the freebsd.sys.linker sources the installation path was lost. Fixes: 7899f91
Rather than attempt to install the tunnel callback every time we add a peer only do so the first time. Reviewed by: zlei Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D46651
If the ZFS key is set up in prompt mode, use zfs to prompt to load the key during boot to unlock it. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D36081
PR: 281460 MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate")
The function ng_ipfw_input() used to enjoy implicit 32->16 bits truncation of its second argument. Make it explicit to recover from the breakage. PR: 281082 Reported by: Ruben van Staveren <[email protected]> Tested by: Ruben van Staveren <[email protected]> MFC after: 3 days Fixes: 20e1f20
Ignoring page_pools with the few needed adjustments and ignoring 7622 mt7615 seems to build as well. Add it so once we can connect it to the build people can start testing and debugging. (The actual work was done on a newer version of the mt76 drivers but it seems the to-build-changes equally apply here already). Requested by: Radu-Cristian Fotescu (freebsd-wireless, 2024-07-31) Sponsored by: The FreeBSD Foundation MFC after: 3 days
Add pci_err() as a wrapper to dev_err() as needed by an updated driver. Sponsored by: The FreeBSD Foundation MFC after: 3 days Reviewed by: emaste Differential Revision: https://reviews.freebsd.org/D46660
Add more fields required by updated wireless drivers to mhi.h. Sponsored by: The FreeBSD Foundation MFC after: 3 days
Add new enums to netdevice.h (including one which is referenced but no value of it is used in a driver so we have to add a "dummy" value to avoid an empty enum). Sponsored by: The FreeBSD Foundation MFC after: 3 days
Add changes required for later mt76 drivers. Sponsored by: The FreeBSD Foundation MFC after: 3 days
Upstream new defines, enum values, etc. for coming driver updates which are non-conflicting with the current state. The only notable change is the rename of the enum ieee80211_ap_reg_power but the enum name had not been used so far by any driver in the tree (only in mac80211.h) but an updated version of ath11k does use it so we need to correct our initial naming. Sponsored by: The FreeBSD Foundation MFC after: 3 days
Combine two loops, each iterating over the same array of pages to initialize them, into a single loop. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D46609
Rearrange the IMX clock control module driver so it is more straightforward to support clock trees from other SOCs in the family. Move the existing imx8mq_ccm driver to a more generic imx_ccm (based on rk_cru) and update the previous driver to subclass imx_ccm. Reviewed by: manu Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46641
Add a clock tree for the imx8mp SOC. This provides clocks sufficient for several subsystems to work, including USB and SD/MMC. Reviewed by: manu Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46642
Define a pctrie iterator type. A pctrie iterator is a wrapper around a pctrie that remembers a position in the trie where the last search left off, and where a new search can resume. When the next search is for an item very near in the trie to where the last search left off, iter-based search is faster because instead of starting from the root, the search usually only has to back up one or two steps up the root-to-last-search path to find the branch that leads to the new search target. Every kind of lookup (plain, lookup_ge, lookup_le) that can begin with the trie root can begin with an iterator instead. An iterator can also do a relative search ("look for the item 4 greater than the last item I found") because it remembers where that last search ended. It can also search within limits ("look for the item bigger than this one, but it has to be less than 100"), which can save time when the next item is beyond the limits and that is known before we actually know what that item is. An iterator can also be used to remove an item that has already been found, without having to search for it again. Iterators are vulnerable to unsynchronized data changes. If the iterator is created with a lock held, and that lock is released and acquired again, there's no guarantee that the iterator path remains valid. Reviewed by: markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D45627
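The idea of a search that resumes from where the previous one ended can be shown on a much simpler structure. The sketch below is an analogue over a sorted array, not a pctrie and not the API added by this commit; `struct sorted_iter`, `iter_lookup_ge()`, and the convention that a limit of 0 means "no limit" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A cursor over a sorted array that remembers the last search position. */
struct sorted_iter {
	const uint64_t *keys;	/* ascending, unique */
	size_t n;		/* number of keys */
	size_t pos;		/* index where the last search ended */
	uint64_t limit;		/* results must be < limit; 0 means no limit */
};

/*
 * Find the first key >= want, starting from the remembered position
 * rather than from the beginning. Returns NULL if there is no such
 * key or if it lies at or beyond the limit.
 */
static const uint64_t *
iter_lookup_ge(struct sorted_iter *it, uint64_t want)
{
	size_t i;

	assert(it != NULL);
	i = it->pos;
	/* Back up only as far as needed if the target is behind us. */
	while (i > 0 && it->keys[i - 1] >= want)
		i--;
	/* Advance until the first key that is not below the target. */
	while (i < it->n && it->keys[i] < want)
		i++;
	if (i == it->n || (it->limit != 0 && it->keys[i] >= it->limit))
		return (NULL);
	it->pos = i;	/* the next search resumes here */
	return (&it->keys[i]);
}
```

Searches for nearby keys touch only a few elements around `pos`. As with the iterator described above, the cursor is only meaningful while the underlying data is unchanged.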
Reviewed by: sjg Approved by: kp MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D46644
This is a straight up typo! Differential Revision: https://reviews.freebsd.org/D46504
…ction frames) * add the MMIC element ID * add a comment showing the source of this table from the 802.11-2016 specification. Differential Revision: https://reviews.freebsd.org/D46505
While this approach works for trapping reads of an uninitialized pointer, it means that any attempt to store to the variable triggers a KASAN report, which is not what we want. Simply remove the kasan_mark() call. KMSAN will catch these kinds of bugs automatically anyway. Reported by: [email protected] MFC after: 1 week
Fixes: 47112d3 ("kassert: Remove KASAN marking from DEBUG_POISON_POINTER")
ps3 has been broken since we moved to clang/elfv2. Fix this by updating the hypercall glue to the new ABI. Signed-off-by: Chattrapat Sangmanee <[email protected]> Reviewed by: jhibbits MFC after: 1 week Pull Request: freebsd/freebsd-src#1413
PR for CI