OpenZFS/src 645b833include/sys spa_impl.h zio.h, man/man4 zfs.4

Improve write issue taskqs utilization

- Reduce the number of allocators on small systems to one per 4
CPU cores, keeping the maximum at 4 on 16+ core systems. Small systems
should not have the lock contention that multiple allocators are
supposed to solve, while having several metaslabs open and modified
each TXG is not free.
- Reduce the number of write issue taskqs to one per 16 CPU
cores and an integer fraction of the number of allocators. On mid-sized
systems, where multiple allocators already make sense, too many write
issue taskqs may reduce write speed on single-file workloads, since
a single file is handled by only one taskq to reduce fragmentation.
On large systems, which can actually benefit from the better IOPS of
many taskqs, the bottleneck is less important, since in the worst case
there will be at least 16 cores to handle it.
- Distribute dnodes between allocators (and taskqs) in a round-robin
fashion instead of relying on sync taskqs to be balanced. The latter
is not guaranteed and may depend on scheduling.
- Remove io_wr_iss_tq from struct zio. io_allocator is enough.
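
The sizing rules in the list above can be sketched roughly as follows. This is only an illustration of the arithmetic described in the message, not the code from spa.c; every function and macro name here is hypothetical, and the real implementation is driven by tunables that this sketch ignores.

```c
/* Hypothetical constants mirroring the numbers quoted in the message. */
#define SKETCH_CPUS_PER_ALLOCATOR 4	/* one allocator per 4 CPU cores */
#define SKETCH_MAX_ALLOCATORS     4	/* cap reached on 16+ core systems */
#define SKETCH_CPUS_PER_WRITE_TQ  16	/* one write issue taskq per 16 cores */

/* Allocator count: ncpus / 4, clamped to the range [1, 4]. */
static int
sketch_alloc_count(int ncpus)
{
	int n = ncpus / SKETCH_CPUS_PER_ALLOCATOR;

	if (n < 1)
		n = 1;
	if (n > SKETCH_MAX_ALLOCATORS)
		n = SKETCH_MAX_ALLOCATORS;
	return (n);
}

/*
 * Write issue taskq count: ncpus / 16, at least 1, at most the allocator
 * count, then lowered until it divides the allocator count evenly
 * (the "integer fraction" mentioned above).
 */
static int
sketch_wr_iss_tq_count(int ncpus, int allocators)
{
	int n = ncpus / SKETCH_CPUS_PER_WRITE_TQ;

	if (n < 1)
		n = 1;
	if (n > allocators)
		n = allocators;
	while (allocators % n != 0)
		n--;
	return (n);
}

/* Round-robin: each new dnode takes the next allocator in turn. */
static int
sketch_next_allocator(unsigned *counter, int allocators)
{
	return ((int)((*counter)++ % (unsigned)allocators));
}
```

Under these assumptions a 48-core system would get 4 allocators and 2 write issue taskqs, since 3 does not divide 4 evenly.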

    [4 lines not shown]
DeltaFile
+52-29module/zfs/spa.c
+15-10man/man4/zfs.4
+19-3module/zfs/spa_misc.c
+8-1include/sys/spa_impl.h
+0-3include/sys/zio.h
+2-0include/sys/spa.h
+96-462 files not shown
+98-478 files

OpenZFS/src 8fd3a5dmodule/zfs dmu_objset.c

Slightly improve dnode hash

As I understand it, the dnode hash includes 8 bits of the objset
pointer, starting at bit 6, just to be less predictable.  But since
objset_t is more than 1KB in size, its allocations are likely aligned
to 2KB, which means the 11 lower bits provide no entropy. Just take
the 8 bits starting from bit 11.
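
A small sketch of why the bit selection matters, assuming pointers that are 2KB-aligned as the message suggests. The helper names and the sample base address are made up for the illustration; how the selected bits are mixed into the real hash is not shown.

```c
#include <stdint.h>

/* Old choice: 8 bits of the pointer starting at bit 6 (bits 6..13). */
static uint8_t
sketch_ptr_bits_old(uintptr_t osv)
{
	return ((uint8_t)((osv >> 6) & 0xff));
}

/* New choice: 8 bits of the pointer starting at bit 11 (bits 11..18). */
static uint8_t
sketch_ptr_bits_new(uintptr_t osv)
{
	return ((uint8_t)((osv >> 11) & 0xff));
}

/*
 * Count how many distinct values a selector yields over 256 consecutive
 * 2KB-aligned addresses (an assumed allocation pattern).
 */
static int
sketch_distinct(uint8_t (*sel)(uintptr_t))
{
	int seen[256] = { 0 };
	int n = 0;

	for (uintptr_t i = 0; i < 256; i++) {
		uint8_t v = sel((uintptr_t)0x10000000 + i * 2048);

		if (!seen[v]) {
			seen[v] = 1;
			n++;
		}
	}
	return (n);
}
```

With these assumed addresses the old selector only ever sees bits 11..13 change, so it yields 8 distinct values, while the new selector yields all 256.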

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16131 
DeltaFile
+3-3module/zfs/dmu_objset.c
+3-31 files

OpenZFS/src 051460bconfig user-libunwind.m4 user.m4, lib/libspl assert.c Makefile.am

libspl/assert: use libunwind for backtrace when available

libunwind seems to do a better job of resolving symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(e.g. musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
DeltaFile
+44-0config/user-libunwind.m4
+32-1lib/libspl/assert.c
+2-2lib/libspl/Makefile.am
+1-0config/user.m4
+79-34 files

OpenZFS/src 2152c40config user-backtrace.m4 user.m4, lib/libspl assert.c Makefile.am

libspl/assert: dump backtrace in assert

Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
DeltaFile
+20-0lib/libspl/assert.c
+14-0config/user-backtrace.m4
+2-0lib/libspl/Makefile.am
+1-0config/user.m4
+37-04 files

OpenZFS/src dec697alib/libspl assert.c

libspl/assert: add lock around assertion output

If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
DeltaFile
+6-0lib/libspl/assert.c
+6-01 files

OpenZFS/src 3948002config user.m4, lib/libspl assert.c

libspl/assert: show process/task details in assert output

Makes it much easier to see what thing complained.

Getting the thread id, program name and thread name varies wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
DeltaFile
+34-2lib/libspl/assert.c
+1-1config/user.m4
+35-32 files

OpenZFS/src 4429ad9include/sys zfs_context.h, lib/libzpool kernel.c taskq.c

libzpool: set thread names

Arrange for the thread/task name to be set when new threads are created.
This makes them visible in the process table etc.

pthread_setname_np() is generally available in glibc, musl and FreeBSD,
so no test is required.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <robn at despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
DeltaFile
+4-4include/sys/zfs_context.h
+4-1lib/libzpool/kernel.c
+2-2lib/libzpool/taskq.c
+10-73 files

OpenZFS/src 7ac00d3config find_system_library.m4

find_system_library: fix var cleanup when library not found

The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no

    [11 lines not shown]
DeltaFile
+2-2config/find_system_library.m4
+2-21 files

OpenZFS/src a6edc0amodule/zfs zio.c

zio: try to execute TYPE_NULL ZIOs on the current task

Many TYPE_NULL ZIOs are used to provide a sync point for child ZIOs, and
do not do any actual work themselves. However, they are still dispatched
to a dedicated, single-thread taskq, which leads to their execution
being entirely task switch and dequeue overhead for no actual reason.

This commit changes it so that when selecting a parent ZIO to execute,
if the parent is TYPE_NULL and has no done function (that is, no
additional work), it is executed on the same thread. This reduces task
switches and frees up CPU cores for other work.
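
The selection rule described above might look something like this. The types are pared-down, hypothetical stand-ins rather than the real zio structures, and the sketch only shows the decision, not the dispatch itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real zio types. */
typedef enum {
	SKETCH_ZIO_TYPE_NULL,
	SKETCH_ZIO_TYPE_READ,
	SKETCH_ZIO_TYPE_WRITE
} sketch_zio_type_t;

typedef struct sketch_zio {
	sketch_zio_type_t type;
	void (*done)(struct sketch_zio *);	/* NULL means no extra work */
} sketch_zio_t;

/* Example callback so a "has a done function" case can be built. */
static void
sketch_noop_done(sketch_zio_t *zio)
{
	(void) zio;
}

/*
 * A parent that is TYPE_NULL and has no done callback only marks a sync
 * point, so it can run on the current thread instead of being dispatched
 * to a taskq.
 */
static bool
sketch_run_parent_inline(const sketch_zio_t *pio)
{
	return (pio->type == SKETCH_ZIO_TYPE_NULL && pio->done == NULL);
}
```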

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris at klarasystems.com>
Closes #16134 
DeltaFile
+6-4module/zfs/zio.c
+6-41 files

OpenZFS/src c3f2f1ainclude/sys uberblock_impl.h, module/zfs spa.c vdev.c

vdev probe to slow disk can stall mmp write checker

Simplify vdev probes in the zio_vdev_io_done context to
avoid holding the spa config lock for a long duration.

Also allow zpool clear if there is no evidence that another host
is using the pool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Olaf Faaland <faaland1 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Don Brady <don.brady at klarasystems.com>
Closes #15839 
DeltaFile
+84-18module/zfs/spa.c
+97-0tests/zfs-tests/tests/functional/mmp/mmp_write_slow_disk.ksh
+13-9module/zfs/vdev.c
+8-8include/sys/uberblock_impl.h
+9-0module/zfs/txg.c
+6-3module/zfs/zfs_ioctl.c
+217-3810 files not shown
+242-5216 files

OpenZFS/src b28461bcmd arcstat.in

Fix arcstats for FreeBSD after zfetch support

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
Closes #16141 
DeltaFile
+8-2cmd/arcstat.in
+8-21 files

OpenZFS/src db499e6lib/libzfs libzfs_dataset.c

Overflowing refreservation is bad

Someone came to me and pointed out that you could pretty
readily cause the refreservation calculation to exceed
2**64, given the 2**17 multiplier in it, and produce
refreservations wildly less than the actual volsize in cases where
it should have failed.
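
One common way to catch this kind of wraparound is to check the multiplication before doing it. The following is a generic sketch of that technique with a hypothetical helper name; it is not taken from the change to libzfs_dataset.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Multiply a by b, reporting overflow instead of silently wrapping
 * modulo 2**64. Returns true and stores the product on success.
 */
static bool
sketch_mul_u64(uint64_t a, uint64_t b, uint64_t *out)
{
	if (b != 0 && a > UINT64_MAX / b)
		return (false);
	*out = a * b;
	return (true);
}
```

For example, 2**47 multiplied by 2**17 is exactly 2**64, which this helper rejects, whereas an unchecked multiply would wrap to 0.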

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain at gmail.com>
Closes #15996 
DeltaFile
+14-1lib/libzfs/libzfs_dataset.c
+14-11 files

OpenZFS/src 4840f02cmd/zpool/os/linux zpool_vdev_os.c, lib/libuutil uu_list.c

GCC: Fixes for gcc 14 on Fedora 40

- Work around dangling pointer in uu_list.c (#16124)
- Fix calloc() transposed arguments in zpool_vdev_os.c
- Make some temp variables unsigned to prevent triggering a
  '-Werror=alloc-size-larger-than' error.
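
For the calloc() item, the argument order in question looks like this. The record type and function are hypothetical and only illustrate the call shape; GCC 14 reports the swapped form through -Wcalloc-transposed-args.

```c
#include <stdlib.h>

/* Hypothetical record type used only for this illustration. */
typedef struct sketch_item {
	int id;
	char name[32];
} sketch_item_t;

/*
 * calloc() takes the element count first and the element size second.
 * Writing calloc(sizeof (sketch_item_t), n) requests the same number of
 * bytes, but GCC 14 warns about the transposed arguments.
 */
static sketch_item_t *
sketch_alloc_items(size_t n)
{
	return (calloc(n, sizeof (sketch_item_t)));
}
```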

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Tony Hutter <hutter2 at llnl.gov>
Closes #16124
Closes #16125 
DeltaFile
+10-4lib/libuutil/uu_list.c
+3-2module/zfs/vdev_raidz.c
+1-1cmd/zpool/os/linux/zpool_vdev_os.c
+14-73 files

OpenZFS/src 21bc066module/os/freebsd/zfs zvol_os.c, module/os/linux/zfs zvol_os.c

Fix updating the zvol_htable when renaming a zvol

When renaming a zvol, insert it into zvol_htable using the new name, not
the old name.  Otherwise some operations won't work.  For example,
"zfs set volsize" while the zvol is open.

Sponsored by:   Axcient
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk at axcient.com>
Signed-off-by:  Alan Somers <asomers at FreeBSD.org>
Closes #16127
Closes #16128 
DeltaFile
+1-1module/os/freebsd/zfs/zvol_os.c
+1-1module/os/linux/zfs/zvol_os.c
+2-22 files

OpenZFS/src 317b31econfig ax_python_devel.m4 always-pyzfs.m4, contrib/debian control

Python 3.12 deprecated python3-distutils

As of python-3.12 the distutils package has been deprecated.
The latest ax_python_devel.m4 macro from the autoconf archive
has been updated accordingly so let's pull in the new version.

We can also drop the changes made to our customized version
to continue if the development version is not installed since
this functionality has been included upstream.

Reviewed-by: Rich Ercolani <rincebrain at gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Closes #16126
Closes #16129
DeltaFile
+229-112config/ax_python_devel.m4
+5-4config/always-pyzfs.m4
+1-1contrib/debian/control
+235-1173 files

OpenZFS/src 5044c4eman/man4 zfs.4, module/zfs zap.c

Fast Dedup: ZAP Shrinking

This allows ZAPs to shrink. When there are two empty sibling leafs,
one of them is collapsed and its storage space is reused.
This improves performance on directories that at one time contained
a large number of files, but where many or all of those files have
since been deleted.

This applies to all other types of ZAPs as well.

Sponsored-by: iXsystems, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Alexander Motin <mav at FreeBSD.org>
Signed-off-by: Alexander Stetsenko <alex.stetsenko at klarasystems.com>
Closes #15888 
DeltaFile
+328-8module/zfs/zap.c
+81-0tests/zfs-tests/tests/functional/zap_shrink/zap_shrink_001_pos.ksh
+35-0tests/zfs-tests/tests/functional/zap_shrink/setup.ksh
+34-0tests/zfs-tests/tests/functional/zap_shrink/cleanup.ksh
+3-4man/man4/zfs.4
+4-0tests/runfiles/common.run
+485-121 files not shown
+488-127 files

OpenZFS/src 67d1399man/man4 zfs.4, module/zfs spa.c

Make more taskq parameters writable

There is no reason for these module parameters to be read-only.
When modified, they simply apply on the next pool import/creation,
which is useful for testing different values.

Reviewed-by: Rich Ercolani <rincebrain at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16118 
DeltaFile
+7-2man/man4/zfs.4
+4-4module/zfs/spa.c
+11-62 files

OpenZFS/src 1f940demodule/zfs arc.c

L2ARC: Cleanup buffer re-compression

When compressed ARC is disabled, we may have to re-compress when
writing into L2ARC.  If doing so we can't fit it into the original
physical size, we should just fail immediately, since even if it
may still fit into allocation size, its checksum will never match.

While there, refactor the code similar to other compression places
without using abd_return_buf_copy().

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16038 
DeltaFile
+20-39module/zfs/arc.c
+20-391 files

OpenZFS/src 87d81d1rpm/redhat zfs-kmod.spec.in

zfs-kmod: fix empty rpm requires/conflicts

Fix an error in zfs-kmod.spec that causes kmod-zfs packages not to
include the correct RPM requires/conflicts relationships.  With this
change applied, RPM correctly no longer allows kmod-zfs & zfs-dkms
packages to be installed together.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Todd Seidelmann <18294602+seidelma at users.noreply.github.com>
Closes #16121 
DeltaFile
+1-1rpm/redhat/zfs-kmod.spec.in
+1-11 files

OpenZFS/src 4036b8dmodule/zfs dbuf.c

Refactor dbuf_read() for safer decryption

In dbuf_read_verify_dnode_crypt():
 - We don't need the original dbuf locked there. Instead take a lock
on the dnode dbuf, which is what is actually manipulated.
 - Block decryption for a dnode dbuf if it is currently being
written.  The ARC hash lock does not protect anonymous buffers, so
arc_untransform() is unsafe when used on buffers being written,
which may happen in the case of encrypted dnode buffers, since they
are not copied by dbuf_dirty()/dbuf_hold_copy().

In dbuf_read():
 - If the buffer is in flight, recheck its compression/encryption
status after it is cached, since it may need arc_untransform().

Tested-by: Rich Ercolani <rincebrain at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16104 
DeltaFile
+108-114module/zfs/dbuf.c
+108-1141 files

OpenZFS/src c346068cmd/zfs zfs_main.c, man/man8 zfs-set.8

zfs get: add '-t fs' and '-t vol' options

Make `zfs get` accept `fs` for `filesystem` and `vol` for `volume`.
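
A minimal sketch of this kind of alias matching is shown below. The enum values and function name are hypothetical and are not the parser in zfs_main.c.

```c
#include <string.h>

/* Hypothetical type codes for the illustration, not the real zfs_type_t. */
enum {
	SKETCH_TYPE_NONE = 0,
	SKETCH_TYPE_FILESYSTEM = 1,
	SKETCH_TYPE_VOLUME = 2
};

/* Map a -t argument to a type code, accepting the short aliases too. */
static int
sketch_parse_type(const char *s)
{
	if (strcmp(s, "filesystem") == 0 || strcmp(s, "fs") == 0)
		return (SKETCH_TYPE_FILESYSTEM);
	if (strcmp(s, "volume") == 0 || strcmp(s, "vol") == 0)
		return (SKETCH_TYPE_VOLUME);
	return (SKETCH_TYPE_NONE);
}
```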

Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Ryan <errornointernet at envs.net>
Closes #16117 
DeltaFile
+16-6cmd/zfs/zfs_main.c
+10-1man/man8/zfs-set.8
+26-72 files

OpenZFS/src 7e52795cmd ztest.c

ztest: use ASSERT3P to compare pointers

With a sufficiently modern gcc (I saw this with gcc13), gcc complains
when casting pointers to an integer of a different type (even a larger
one).  ASSERT3U does this on 32-bit systems by casting a 32-bit
pointer to uint64_t, so use ASSERT3P, which uses uintptr_t.
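
The difference can be illustrated with a stand-in macro that compares pointers through uintptr_t, which is pointer-sized on both 32-bit and 64-bit targets. The macro and function below are hypothetical and are not the OpenZFS definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in: compare two pointers as pointer-sized integers,
 * so no cast to a differently sized integer type is involved.
 */
#define	SKETCH_ASSERT3P(l, op, r) \
	assert((uintptr_t)(l) op (uintptr_t)(r))

/* Check that a range's start does not lie past its end. */
static int
sketch_range_ok(const char *start, const char *end)
{
	SKETCH_ASSERT3P(start, <=, end);
	return (1);
}
```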

Fixes: 5caeef02fa53 RAID-Z expansion feature

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis at sri.com>
Closes #16115 
DeltaFile
+1-1cmd/ztest.c
+1-11 files

OpenZFS/src cdae59etests/zfs-tests/tests/functional/user_namespace user_namespace_004.ksh

ZTS: user_namespace_004.ksh avoid error in cleanup if unsupported

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi at google.com>
Closes #16114 
DeltaFile
+2-2tests/zfs-tests/tests/functional/user_namespace/user_namespace_004.ksh
+2-21 files

OpenZFS/src 9b43d7bcmd/zpool zpool_main.c

Add newline to two zpool messages

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi at google.com>
Closes #16113 
DeltaFile
+2-2cmd/zpool/zpool_main.c
+2-21 files

OpenZFS/src c183d16cmd/zinject zinject.c, cmd/zpool zpool_main.c

Parallel pool import

This commit allows spa_load() to drop the spa_namespace_lock so
that imports can happen concurrently. Prior to dropping the
spa_namespace_lock, the import logic will set the spa_load_thread
value to track the thread which is doing the import.

Consumers of spa_lookup() retain the same behavior by blocking
when either a thread is holding the spa_namespace_lock or the
spa_load_thread value is set. This will ensure that critical
concurrent operations cannot take place while a pool is being
imported.

The zpool command is also enhanced to provide multi-threaded support
when invoking zpool import -a.

Lastly, zinject provides a mechanism to insert artificial delays
when importing a pool and new zfs tests are added to verify parallel
import functionality.

    [4 lines not shown]
DeltaFile
+165-0tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_parallel_admin.ksh
+129-9module/zfs/zio_inject.c
+137-0tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_parallel_pos.ksh
+130-0tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_parallel_neg.ksh
+106-9cmd/zinject/zinject.c
+55-17cmd/zpool/zpool_main.c
+722-3513 files not shown
+818-7219 files

OpenZFS/src f4f1561module/os/linux/zfs abd_os.c

abd_iter_page: rework to handle multipage scatterlists

Previously, abd_iter_page() would assume that every scatterlist would
contain a single page (compound or not), because that's all we ever
create in abd_alloc_chunks(). However, scatterlists can contain multiple
pages of arbitrary provenance, and if we get one of those, we'd get all
the math wrong.

This reworks things to handle multiple pages in a scatterlist, by
properly finding the right page within it for the given offset, and
understanding better where the end of the page is and not crossing it.
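
The page-finding arithmetic described above can be sketched as follows. The page size, struct, and function names are assumptions for the example, and this is not the code from abd_os.c.

```c
#include <stddef.h>

#define	SKETCH_PAGE_SIZE 4096u	/* assumed page size for the example */

/* Result of locating a byte position inside a multi-page entry. */
typedef struct sketch_page_pos {
	size_t page_index;	/* which page of the entry holds the byte */
	size_t page_off;	/* offset of the byte within that page */
	size_t len;		/* bytes usable without crossing the page end */
} sketch_page_pos_t;

/*
 * sg_off:    where the entry's data starts within its first page
 * pos:       byte position within the entry's data
 * remaining: bytes of data left in the entry from pos onward
 */
static sketch_page_pos_t
sketch_locate(size_t sg_off, size_t pos, size_t remaining)
{
	sketch_page_pos_t r;
	size_t start = sg_off + pos;

	r.page_index = start / SKETCH_PAGE_SIZE;
	r.page_off = start % SKETCH_PAGE_SIZE;
	r.len = SKETCH_PAGE_SIZE - r.page_off;
	if (r.len > remaining)
		r.len = remaining;
	return (r);
}
```

The returned length is capped both by the end of the page and by the data remaining, so a caller never reads across a page boundary.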

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reported-by: Brian Atkinson <batkinson at lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Brian Atkinson <batkinson at lanl.gov>
Signed-off-by: Rob Norris <rob.norris at klarasystems.com>
Closes #16108 
DeltaFile
+74-46module/os/linux/zfs/abd_os.c
+74-461 files

OpenZFS/src 9f83eecmodule/zfs zio.c

Handle FLUSH errors as "expected"

Before #16061 zio_vdev_io_done() was not used for FLUSH requests.
Adding it triggers a reprobe each TXG for vdevs that do not support
them.  Since those errors are often expected, they are normally
handled by individual vdev drivers and should be ignored here.

Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Rob Norris <rob.norris at klarasystems.com>
Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16110 
DeltaFile
+2-1module/zfs/zio.c
+2-11 files

OpenZFS/src 26d49fetests/zfs-tests/tests/functional/quota quota.kshlib quota_001_pos.ksh

tests/quota: consistently clear quota property between tests

When run in isolation, quota_005_pos would fail in cleanup because it
would attempt to restore the previous quota, which was 0, and so get an
error (because you can't set quota to '0', you have to use 'none').

It worked as part of the quota tag set because the previous tests did
not clean up their quota, so there was always a non-zero quota to return
to.

This adds a simple quota reset function, and has all quota tests run it
at cleanup. For the ones that weren't cleaning up, they now do, and for
quota_005_pos, which was trying to do the right thing, it now just
resets it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>

    [2 lines not shown]
DeltaFile
+7-0tests/zfs-tests/tests/functional/quota/quota.kshlib
+2-0tests/zfs-tests/tests/functional/quota/quota_001_pos.ksh
+2-0tests/zfs-tests/tests/functional/quota/quota_002_pos.ksh
+2-0tests/zfs-tests/tests/functional/quota/quota_003_pos.ksh
+2-0tests/zfs-tests/tests/functional/quota/quota_004_pos.ksh
+1-1tests/zfs-tests/tests/functional/quota/quota_005_pos.ksh
+16-11 files not shown
+17-27 files

OpenZFS/src f75574ctests/zfs-tests/tests/functional/quota quota_005_pos.ksh

tests/quota_005_pos: use a long int for doubling the quota size

When run in isolation, quota_005_pos would see an empty ~300G dataset.
Doubling its space overflows an int32, which meant it was trying to then
set the quota to a negative value, and would fail.

When run as part of the quota tests, the filesystem appears to have
stuff in it, and so a lower available space, which doesn't overflow, and
so succeeds.

The bare minimum fix seems to be to use an int64 for the available space,
so it can be comfortably doubled. Here it is.

(Also a typo fix and a tiny bit of cleanup).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: George Melikov <mail at gmelikov.ru>

    [2 lines not shown]
DeltaFile
+3-4tests/zfs-tests/tests/functional/quota/quota_005_pos.ksh
+3-41 files

OpenZFS/src 72e4996include/os/linux/kernel/linux blkdev_compat.h

bdev_discard_supported: understand discard_granularity=0

Kernel documentation for the discard_granularity property says:

    A discard_granularity of 0 means that the device does not support
    discard functionality.

Some older kernels had drivers (notably loop, but also some USB-SATA
adapters) that would set the QUEUE_FLAG_DISCARD capability flag, but
have discard_granularity=0. Since 5.10 (torvalds/linux at b35fd7422c2f) the
discard entry point blkdev_issue_discard() has had a check for this,
which would immediately reject the call with EOPNOTSUPP, and throw a
scary diagnostic message into the log. See #16068.

Since 6.8, the block layer sets a non-zero default for
discard_granularity (torvalds/linux at 3c407dc723bb), and a future kernel
will remove the check entirely[1].

As such, there's no good reason for us to enable discard when

    [10 lines not shown]
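
Going by the title and the quoted kernel documentation, the check amounts to treating a zero granularity as "no discard" even when the capability flag is set. The sketch below uses a hypothetical struct in place of the kernel's request queue and is not the code from blkdev_compat.h.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the queue properties mentioned in the text. */
typedef struct sketch_queue {
	bool discard_flag;			/* driver advertises discard */
	unsigned int discard_granularity;	/* 0 means "not supported" */
} sketch_queue_t;

/* Discard is usable only if the flag is set and the granularity is non-zero. */
static bool
sketch_discard_supported(const sketch_queue_t *q)
{
	return (q->discard_flag && q->discard_granularity > 0);
}
```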
DeltaFile
+4-2include/os/linux/kernel/linux/blkdev_compat.h
+4-21 files