Changelog for kernel 5.15.154

ACPI: CPPC: Use access_width over bit_width for system memory accesses [+ + +]

Author: Jarred White <[email protected]>
Date:   Fri Mar 1 11:25:59 2024 -0800

    ACPI: CPPC: Use access_width over bit_width for system memory accesses
    
    commit 2f4a4d63a193be6fd530d180bb13c3592052904c upstream.
    
    To align with ACPI 6.3+, since bit_width can be any 8-bit value, it
    cannot be depended on to be always on a clean 8b boundary. This was
    uncovered on the Cobalt 100 platform.
    
    SError Interrupt on CPU26, code 0xbe000011 -- SError
     CPU: 26 PID: 1510 Comm: systemd-udevd Not tainted 5.15.2.1-13 #1
     Hardware name: MICROSOFT CORPORATION, BIOS MICROSOFT CORPORATION
     pstate: 62400009 (nZCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
     pc : cppc_get_perf_caps+0xec/0x410
     lr : cppc_get_perf_caps+0xe8/0x410
     sp : ffff8000155ab730
     x29: ffff8000155ab730 x28: ffff0080139d0038 x27: ffff0080139d0078
     x26: 0000000000000000 x25: ffff0080139d0058 x24: 00000000ffffffff
     x23: ffff0080139d0298 x22: ffff0080139d0278 x21: 0000000000000000
     x20: ffff00802b251910 x19: ffff0080139d0000 x18: ffffffffffffffff
     x17: 0000000000000000 x16: ffffdc7e111bad04 x15: ffff00802b251008
     x14: ffffffffffffffff x13: ffff013f1fd63300 x12: 0000000000000006
     x11: ffffdc7e128f4420 x10: 0000000000000000 x9 : ffffdc7e111badec
     x8 : ffff00802b251980 x7 : 0000000000000000 x6 : ffff0080139d0028
     x5 : 0000000000000000 x4 : ffff0080139d0018 x3 : 00000000ffffffff
     x2 : 0000000000000008 x1 : ffff8000155ab7a0 x0 : 0000000000000000
     Kernel panic - not syncing: Asynchronous SError Interrupt
     CPU: 26 PID: 1510 Comm: systemd-udevd Not tainted 5.15.2.1-13 #1
     Hardware name: MICROSOFT CORPORATION, BIOS MICROSOFT CORPORATION
     Call trace:
      dump_backtrace+0x0/0x1e0
      show_stack+0x24/0x30
      dump_stack_lvl+0x8c/0xb8
      dump_stack+0x18/0x34
      panic+0x16c/0x384
      add_taint+0x0/0xc0
      arm64_serror_panic+0x7c/0x90
      arm64_is_fatal_ras_serror+0x34/0xa4
      do_serror+0x50/0x6c
      el1h_64_error_handler+0x40/0x74
      el1h_64_error+0x7c/0x80
      cppc_get_perf_caps+0xec/0x410
      cppc_cpufreq_cpu_init+0x74/0x400 [cppc_cpufreq]
      cpufreq_online+0x2dc/0xa30
      cpufreq_add_dev+0xc0/0xd4
      subsys_interface_register+0x134/0x14c
      cpufreq_register_driver+0x1b0/0x354
      cppc_cpufreq_init+0x1a8/0x1000 [cppc_cpufreq]
      do_one_initcall+0x50/0x250
      do_init_module+0x60/0x27c
      load_module+0x2300/0x2570
      __do_sys_finit_module+0xa8/0x114
      __arm64_sys_finit_module+0x2c/0x3c
      invoke_syscall+0x78/0x100
      el0_svc_common.constprop.0+0x180/0x1a0
      do_el0_svc+0x84/0xa0
      el0_svc+0x2c/0xc0
      el0t_64_sync_handler+0xa4/0x12c
      el0t_64_sync+0x1a4/0x1a8
    
    Instead, use access_width to determine the size and use the offset and
    width to shift and mask the bits to read/write out. Make sure to add a
    check for system memory since pcc redefines the access_width to
    subspace id.
    
    If access_width is not set, then fall back to using bit_width.
    
    Signed-off-by: Jarred White <[email protected]>
    Reviewed-by: Easwar Hariharan <[email protected]>
    Cc: 5.15+ <[email protected]> # 5.15+
    [ rjw: Subject and changelog edits, comment adjustments ]
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    [ eahariha: Backport to v5.15 by dropping SystemIO bits as
      commit a2c8f92bea5f is not present ]
    Signed-off-by: Easwar Hariharan <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
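
The shift-and-mask approach described in the commit above can be sketched in user-space C as follows. This is a minimal illustration, not the kernel's cppc_acpi.c code: the helper names (access_size_bits, field_mask, extract_field, insert_field) are hypothetical, and the sketch assumes the ACPI Generic Address Structure encoding of access_width (1 = byte, 2 = word, 3 = dword, 4 = qword, 0 = not set) and a bit_offset below 64.

```c
#include <stdint.h>

/*
 * Illustrative only: hypothetical helpers, not the kernel implementation.
 * Returns the access size in bits. When access_width is not set (0 or out
 * of range), fall back to bit_width as the commit message describes.
 */
static unsigned int access_size_bits(uint8_t access_width, uint8_t bit_width)
{
    if (access_width >= 1 && access_width <= 4)
        return 8u << (access_width - 1);
    return bit_width;
}

/* Mask with the low bit_width bits set; handles bit_width == 64 safely. */
static uint64_t field_mask(uint8_t bit_width)
{
    return bit_width >= 64 ? UINT64_MAX : (UINT64_C(1) << bit_width) - 1;
}

/* Extract a field from a value that was read at the full access size. */
static uint64_t extract_field(uint64_t raw, uint8_t bit_offset, uint8_t bit_width)
{
    return (raw >> bit_offset) & field_mask(bit_width);
}

/* Merge a new field value into a previously read value before writing back. */
static uint64_t insert_field(uint64_t old, uint64_t val,
                             uint8_t bit_offset, uint8_t bit_width)
{
    uint64_t mask = field_mask(bit_width) << bit_offset;

    return (old & ~mask) | ((val << bit_offset) & mask);
}
```

With these helpers, a register described with access_width = 3, bit_offset = 4 and bit_width = 8 would be read as a 32-bit value and then reduced with extract_field(raw, 4, 8), so the field never has to sit on a byte boundary.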

ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields() [+ + +]

Author: Nikita Kiryushin <[email protected]>
Date:   Fri Mar 22 21:07:53 2024 +0300

    ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
    
    [ Upstream commit 40e2710860e57411ab57a1529c5a2748abbe8a19 ]
    
    ACPICA commit 9061cd9aa131205657c811a52a9f8325a040c6c9
    
    Errors in acpi_evaluate_object() can leave the buffer in an incorrect state.
    
    This can lead to accessing data in a previously ACPI_FREEd buffer and
    to a second ACPI_FREE of the same buffer later.
    
    Handle errors in acpi_evaluate_object the same way it is done earlier
    with acpi_ns_handle_to_pathname.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Link: https://github.com/acpica/acpica/commit/9061cd9a
    Fixes: 5fd033288a86 ("ACPICA: debugger: add command to dump all fields of particular subtype")
    Signed-off-by: Nikita Kiryushin <[email protected]>
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ahci: asm1064: asm1166: don't limit reported ports [+ + +]

Author: Conrad Kostecki <[email protected]>
Date:   Wed Mar 13 22:46:50 2024 +0100

    ahci: asm1064: asm1166: don't limit reported ports
    
    [ Upstream commit 6cd8adc3e18960f6e59d797285ed34ef473cc896 ]
    
    Previously, patches were added to limit the reported count of SATA
    ports for asm1064 and asm1166 SATA controllers, as those controllers
    report more ports than they physically have.
    
    While it is allowed to report more ports than physically present in CAP.NP,
    it is not allowed to report more ports than physically present in the PI
    (Ports Implemented) register, which is what these HBAs do.
    (This is an AHCI spec violation.)
    
    Unfortunately, it seems that the PMP implementation in these ASMedia HBAs
    is also violating the AHCI and SATA-IO PMP specification.
    
    What these HBAs do is that they do not report that they support PMP
    (CAP.SPM (Supports Port Multiplier) is not set).
    
    Instead, they have decided to add extra "virtual" ports in the PI register
    that are used if a port multiplier is connected to any of the physical
    ports of the HBA.
    
    Enumerating the devices behind the PMP as specified in the AHCI and
    SATA-IO specifications, by using PMP READ and PMP WRITE commands to the
    physical ports of the HBA, is not possible; you have to use the "virtual"
    ports.
    
    This is of course bad, because this gives us no way to detect the device
    and vendor ID of the PMP actually connected to the HBA, which means that
    we can not apply the proper PMP quirks for the PMP that is connected to
    the HBA.
    
    Limiting the port map will thus stop these controllers from working with
    SATA Port Multipliers.
    
    This patch reverts both patches for asm1064 and asm1166, so old behavior
    is restored and SATA PMP will work again, but it will also reintroduce the
    (minutes long) extra boot time for the ASMedia controllers that do not
    have a PMP connected (either on the PCIe card itself, or an external PMP).
    
    However, a longer boot time for some is the lesser evil compared to some
    other users not being able to detect their drives at all.
    
    Fixes: 0077a504e1a4 ("ahci: asm1166: correct count of reported ports")
    Fixes: 9815e3961754 ("ahci: asm1064: correct count of reported ports")
    Cc: [email protected]
    Reported-by: Matt <[email protected]>
    Signed-off-by: Conrad Kostecki <[email protected]>
    Reviewed-by: Hans de Goede <[email protected]>
    [cassel: rewrote commit message]
    Signed-off-by: Niklas Cassel <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ahci: asm1064: correct count of reported ports [+ + +]

Author: Andrey Jr. Melnikov <[email protected]>
Date:   Wed Feb 14 17:57:57 2024 +0100

    ahci: asm1064: correct count of reported ports
    
    [ Upstream commit 9815e39617541ef52d0dfac4be274ad378c6dc09 ]
    
    The ASM1064 SATA host controller always wrongly reports that it has
    24 ports. In reality, it has only four ports.
    
    before:
    ahci 0000:04:00.0: SSS flag set, parallel bus scan disabled
    ahci 0000:04:00.0: AHCI 0001.0301 32 slots 24 ports 6 Gbps 0xffff0f impl SATA mode
    ahci 0000:04:00.0: flags: 64bit ncq sntf stag pm led only pio sxs deso sadm sds apst
    
    after:
    ahci 0000:04:00.0: ASM1064 has only four ports
    ahci 0000:04:00.0: forcing port_map 0xffff0f -> 0xf
    ahci 0000:04:00.0: SSS flag set, parallel bus scan disabled
    ahci 0000:04:00.0: AHCI 0001.0301 32 slots 24 ports 6 Gbps 0xf impl SATA mode
    ahci 0000:04:00.0: flags: 64bit ncq sntf stag pm led only pio sxs deso sadm sds apst
    
    Signed-off-by: "Andrey Jr. Melnikov" <[email protected]>
    Signed-off-by: Niklas Cassel <[email protected]>
    Stable-dep-of: 6cd8adc3e189 ("ahci: asm1064: asm1166: don't limit reported ports")
    Signed-off-by: Sasha Levin <[email protected]>

ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform [+ + +]

Author: Kailang Yang <[email protected]>
Date:   Fri Mar 1 15:29:50 2024 +0800

    ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
    
    commit d397b6e56151099cf3b1f7bfccb204a6a8591720 upstream.
    
    The headset mic does not show up after resume.
    This patch fixes the issue.
    
    Fixes: d7f32791a9fc ("ALSA: hda/realtek - Add headset Mic support for Lenovo ALC897 platform")
    Cc: <[email protected]>
    Signed-off-by: Kailang Yang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone [+ + +]

Author: I Gede Agastya Darma Laksana <[email protected]>
Date:   Tue Apr 2 00:46:02 2024 +0700

    ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
    
    commit 1576f263ee2147dc395531476881058609ad3d38 upstream.
    
    This patch addresses an issue with the Panasonic CF-SZ6's existing quirk,
    specifically its headset microphone functionality. Previously, the quirk
    used ALC269_FIXUP_HEADSET_MODE, which does not support the CF-SZ6's design
    of a single 3.5mm jack for both mic and audio output effectively. The
    device uses pin 0x19 for the headset mic without jack detection.
    
    Following verification on the CF-SZ6 and discussions with the original
    patch author, I determined that the update to
    ALC269_FIXUP_ASPIRE_HEADSET_MIC is the appropriate solution. This change
    is custom-designed for the CF-SZ6's unique hardware setup, which includes
    a single 3.5mm jack for both mic and audio output, connecting the headset
    microphone to pin 0x19 without the use of jack detection.
    
    Fixes: 0fca97a29b83 ("ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk")
    Signed-off-by: I Gede Agastya Darma Laksana <[email protected]>
    Cc: <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs [+ + +]

Author: Duoming Zhou <[email protected]>
Date:   Tue Mar 26 17:42:38 2024 +0800

    ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
    
    commit 051e0840ffa8ab25554d6b14b62c9ab9e4901457 upstream.
    
    The dreamcastcard->timer could schedule the spu_dma_work and the
    spu_dma_work could also arm the dreamcastcard->timer.
    
    When the snd_pcm_substream is closing, the aica_channel will be
    deallocated. But it could still be dereferenced in the worker
    thread. The reason is that del_timer() will return directly
    regardless of whether the timer handler is running or not and
    the worker could be rescheduled in the timer handler. As a result,
    the UAF bug will happen. The racy situation is shown below:
    
          (Thread 1)                 |      (Thread 2)
    snd_aicapcm_pcm_close()          |
     ...                             |  run_spu_dma() //worker
                                     |    mod_timer()
      flush_work()                   |
      del_timer()                    |  aica_period_elapsed() //timer
      kfree(dreamcastcard->channel)  |    schedule_work()
                                     |  run_spu_dma() //worker
      ...                            |    dreamcastcard->channel-> //USE
    
    In order to mitigate this bug and other possible corner cases,
    call mod_timer() conditionally in run_spu_dma(), then implement
    PCM sync_stop op to cancel both the timer and worker. The sync_stop
    op will be called from PCM core appropriately when needed.
    
    Fixes: 198de43d758c ("[ALSA] Add ALSA support for the SEGA Dreamcast PCM device")
    Suggested-by: Takashi Iwai <[email protected]>
    Signed-off-by: Duoming Zhou <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

arch: Introduce CONFIG_FUNCTION_ALIGNMENT [+ + +]

Author: Peter Zijlstra <[email protected]>
Date:   Wed Mar 13 07:42:51 2024 -0300

    arch: Introduce CONFIG_FUNCTION_ALIGNMENT
    
    commit d49a0626216b95cd4bf696f6acf55f39a16ab0bb upstream.
    
    Generic function-alignment infrastructure.
    
    Architectures can select FUNCTION_ALIGNMENT_xxB symbols; the
    FUNCTION_ALIGNMENT symbol is then set to the largest such selected
    size, 0 otherwise.
    
    From this the -falign-functions compiler argument and __ALIGN macro
    are set.
    
    This incorporates the DEBUG_FORCE_FUNCTION_ALIGN_64B knob and future
    alignment requirements for x86_64 (later in this series) into a single
    place.
    
    NOTE: also removes the 0x90 filler byte from the generic __ALIGN
          primitive, that value makes no sense outside of x86.
    
    NOTE: .balign 0 reverts to a no-op.
    
    Requested-by: Linus Torvalds <[email protected]>
    Change-Id: I053b3c408d56988381feb8c8bdb5e27ea221755f
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [cascardo: adjust context at arch/x86/Kconfig]
    Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken [+ + +]

Author: Johan Hovold <[email protected]>
Date:   Wed Mar 20 08:55:52 2024 +0100

    arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
    
    commit e12e28009e584c8f8363439f6a928ec86278a106 upstream.
    
    Several Qualcomm Bluetooth controllers lack persistent storage for the
    device address and instead one can be provided by the boot firmware
    using the 'local-bd-address' devicetree property.
    
    The Bluetooth bindings clearly state that the address should be
    specified in little-endian order, but due to a long-standing bug in the
    Qualcomm driver which reversed the address, some boot firmware has been
    providing the address in big-endian order instead.
    
    The boot firmware in SC7180 Trogdor Chromebooks is known to be affected
    so mark the 'local-bd-address' property as broken to maintain backwards
    compatibility with older firmware when fixing the underlying driver bug.
    
    Note that ChromeOS always updates the kernel and devicetree in lockstep
    so that there is no need to handle backwards compatibility with older
    devicetrees.
    
    Fixes: 7ec3e67307f8 ("arm64: dts: qcom: sc7180-trogdor: add initial trogdor and lazor dt")
    Cc: [email protected]      # 5.10
    Cc: Rob Clark <[email protected]>
    Reviewed-by: Douglas Anderson <[email protected]>
    Signed-off-by: Johan Hovold <[email protected]>
    Acked-by: Bjorn Andersson <[email protected]>
    Reviewed-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Luiz Augusto von Dentz <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

arm: dts: marvell: Fix maxium->maxim typo in brownstone dts [+ + +]

Author: Duje Mihanović <[email protected]>
Date:   Thu Jan 25 19:39:32 2024 +0100

    arm: dts: marvell: Fix maxium->maxim typo in brownstone dts
    
    [ Upstream commit 831e0cd4f9ee15a4f02ae10b67e7fdc10eb2b4fc ]
    
    Fix an obvious spelling error in the PMIC compatible in the MMP2
    Brownstone DTS file.
    
    Fixes: 58f1193e6210 ("mfd: max8925: Add dts")
    Cc: <[email protected]>
    Signed-off-by: Duje Mihanović <[email protected]>
    Reported-by: Krzysztof Kozlowski <[email protected]>
    Closes: https://lore.kernel.org/linux-devicetree/[email protected]/
    Reviewed-by: Andrew Lunn <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [krzysztof: Just 10 years to take a patch, not bad! Rephrased commit
     msg]
    Signed-off-by: Krzysztof Kozlowski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw [+ + +]

Author: Stephen Lee <[email protected]>
Date:   Mon Mar 25 18:01:31 2024 -0700

    ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
    
    [ Upstream commit fc563aa900659a850e2ada4af26b9d7a3de6c591 ]
    
    In snd_soc_info_volsw(), mask is generated by figuring out the index of
    the most significant bit set in max and converting the index to a
    bitmask through bit shift 1. Unintended wraparound occurs when max is an
    integer value with msb bit set. Since the bit shift value 1 is treated
    as an integer type, the left shift operation will wraparound and set
    mask to 0 instead of all 1's. In order to fix this, we type cast 1 as
    `1ULL` to prevent the wraparound.
    
    Fixes: 7077148fb50a ("ASoC: core: Split ops out of soc-core.c")
    Signed-off-by: Stephen Lee <[email protected]>
    Link: https://msgid.link/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
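
The mask wraparound described in the commit above can be illustrated with a small user-space sketch. The names fls32 and volsw_mask are hypothetical stand-ins (fls32 models the kernel's fls(), which returns the 1-based index of the most significant set bit), and the sketch assumes a 32-bit max value; it is not the code from soc-ops.c.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's fls(): 1-based index of the most
 * significant set bit, or 0 when no bit is set.
 */
static unsigned int fls32(uint32_t x)
{
    unsigned int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Build a mask covering every bit up to the MSB of max.
 * With a plain int constant, "1 << 32" would be evaluated in 32-bit
 * arithmetic (undefined behavior in C) when max has its top bit set.
 * Using 1ULL makes the shift happen in 64 bits, so the subtraction
 * yields all ones in the low 32 bits.
 */
static uint32_t volsw_mask(uint32_t max)
{
    return (uint32_t)((1ULL << fls32(max)) - 1);
}
```

For example, volsw_mask(5) covers three bits, while volsw_mask(0x80000000) covers all 32 bits instead of collapsing to 0.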

ASoC: rt5682-sdw: fix locking sequence [+ + +]

Author: Pierre-Louis Bossart <[email protected]>
Date:   Mon Mar 25 17:18:12 2024 -0500

    ASoC: rt5682-sdw: fix locking sequence
    
    [ Upstream commit 310a5caa4e861616a27a83c3e8bda17d65026fa8 ]
    
    The disable_irq_lock protects the 'disable_irq' value, we need to lock
    before testing it.
    
    Fixes: 02fb23d72720 ("ASoC: rt5682-sdw: fix for JD event handling in ClockStop Mode0")
    Signed-off-by: Pierre-Louis Bossart <[email protected]>
    Reviewed-by: Bard Liao <[email protected]>
    Reviewed-by: Chao Song <[email protected]>
    Link: https://msgid.link/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: rt711-sdca: fix locking sequence [+ + +]

Author: Pierre-Louis Bossart <[email protected]>
Date:   Mon Mar 25 17:18:13 2024 -0500

    ASoC: rt711-sdca: fix locking sequence
    
    [ Upstream commit ee287771644394d071e6a331951ee8079b64f9a7 ]
    
    The disable_irq_lock protects the 'disable_irq' value, we need to lock
    before testing it.
    
    Fixes: 23adeb7056ac ("ASoC: rt711-sdca: fix for JD event handling in ClockStop Mode0")
    Signed-off-by: Pierre-Louis Bossart <[email protected]>
    Reviewed-by: Bard Liao <[email protected]>
    Reviewed-by: Chao Song <[email protected]>
    Link: https://msgid.link/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: rt711-sdw: fix locking sequence [+ + +]

Author: Pierre-Louis Bossart <[email protected]>
Date:   Mon Mar 25 17:18:14 2024 -0500

    ASoC: rt711-sdw: fix locking sequence
    
    [ Upstream commit aae86cfd8790bcc7693a5a0894df58de5cb5128c ]
    
    The disable_irq_lock protects the 'disable_irq' value, we need to lock
    before testing it.
    
    Fixes: b69de265bd0e ("ASoC: rt711: fix for JD event handling in ClockStop Mode0")
    Signed-off-by: Pierre-Louis Bossart <[email protected]>
    Reviewed-by: Bard Liao <[email protected]>
    Reviewed-by: Chao Song <[email protected]>
    Link: https://msgid.link/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ata: sata_mv: Fix PCI device ID table declaration compilation warning [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Wed Apr 3 10:06:48 2024 +0200

    ata: sata_mv: Fix PCI device ID table declaration compilation warning
    
    [ Upstream commit 3137b83a90646917c90951d66489db466b4ae106 ]
    
    Building with W=1 shows a warning for an unused variable when CONFIG_PCI
    is disabled:
    
    drivers/ata/sata_mv.c:790:35: error: unused variable 'mv_pci_tbl' [-Werror,-Wunused-const-variable]
    static const struct pci_device_id mv_pci_tbl[] = {
    
    Move the table into the same block that contains the pci_driver
    definition.
    
    Fixes: 7bb3c5290ca0 ("sata_mv: Remove PCI dependency")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Damien Le Moal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Tue Mar 26 15:53:37 2024 +0100

    ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
    
    [ Upstream commit 52f80bb181a9a1530ade30bc18991900bbb9697f ]
    
    gcc warns about a memcpy() with overlapping pointers because of an
    incorrect size calculation:
    
    In file included from include/linux/string.h:369,
                     from drivers/ata/sata_sx4.c:66:
    In function 'memcpy_fromio',
        inlined from 'pdc20621_get_from_dimm.constprop' at drivers/ata/sata_sx4.c:962:2:
    include/linux/fortify-string.h:97:33: error: '__builtin_memcpy' accessing 4294934464 bytes at offsets 0 and [16, 16400] overlaps 6442385281 bytes at offset -2147450817 [-Werror=restrict]
       97 | #define __underlying_memcpy     __builtin_memcpy
          |                                 ^
    include/linux/fortify-string.h:620:9: note: in expansion of macro '__underlying_memcpy'
      620 |         __underlying_##op(p, q, __fortify_size);                        \
          |         ^~~~~~~~~~~~~
    include/linux/fortify-string.h:665:26: note: in expansion of macro '__fortify_memcpy_chk'
      665 | #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
          |                          ^~~~~~~~~~~~~~~~~~~~
    include/asm-generic/io.h:1184:9: note: in expansion of macro 'memcpy'
     1184 |         memcpy(buffer, __io_virt(addr), size);
          |         ^~~~~~
    
    The problem here is the overflow of an unsigned 32-bit number to a
    negative that gets converted into a signed 'long', keeping a large
    positive number.
    
    Replace the complex calculation with a more readable min() variant
    that avoids the warning.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Damien Le Moal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
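
The size-calculation problem described in the commit above can be reproduced in a small sketch. The function names dist_old and dist_new are hypothetical, the parameters are assumed to be unsigned 32-bit values, and int64_t is used to model a 64-bit 'long'; this is modeled on the commit's description rather than copied from sata_sx4.c.

```c
#include <stdint.h>

/*
 * Old-style check, modeled on the description: the 32-bit unsigned
 * difference wraps around when offset + size exceeds window_size, and
 * widening it to a 64-bit signed type keeps it as a large positive
 * number. The ">= 0" test therefore always passes on 64-bit.
 */
static uint32_t dist_old(uint32_t window_size, uint32_t offset, uint32_t size)
{
    return ((int64_t)(window_size - (offset + size)) >= 0)
        ? size
        : (uint32_t)(window_size - offset);
}

/* min()-style variant: copy no more than what is left in the window. */
static uint32_t dist_new(uint32_t window_size, uint32_t offset, uint32_t size)
{
    uint32_t room = window_size - offset;

    return size < room ? size : room;
}
```

With window_size = 0x8000, offset = 0x7000 and size = 0x2000, the old-style check returns the full 0x2000 even though only 0x1000 bytes remain in the window, while the min()-style variant returns 0x1000.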

block: Clear zone limits for a non-zoned stacked queue [+ + +]

Author: Damien Le Moal <[email protected]>
Date:   Thu Feb 22 22:17:23 2024 +0900

    block: Clear zone limits for a non-zoned stacked queue
    
    [ Upstream commit c8f6f88d25929ad2f290b428efcae3b526f3eab0 ]
    
    Device mapper may create a non-zoned mapped device out of a zoned device
    (e.g., the dm-zoned target). In such a case, some queue limits such as
    max_zone_append_sectors and zone_write_granularity end up being non-zero
    values for a block device that is not zoned. Avoid this by clearing
    these limits in blk_stack_limits() when the stacked zoned limit is
    false.
    
    Fixes: 3093a479727b ("block: inherit the zoned characteristics in blk_stack_limits")
    Cc: [email protected]
    Signed-off-by: Damien Le Moal <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
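
The clearing step described in the commit above amounts to a small guard. The sketch below uses a hypothetical struct limits_sketch holding only the three fields named in the commit message and a hypothetical helper clear_unzoned_limits(); the field types are assumptions, and the real struct queue_limits and blk_stack_limits() contain much more.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the stacked queue limits. */
struct limits_sketch {
    bool zoned;
    uint32_t max_zone_append_sectors;
    uint32_t zone_write_granularity;
};

/*
 * After stacking, a device that is not zoned must not advertise
 * zone-related limits inherited from a zoned bottom device.
 */
static void clear_unzoned_limits(struct limits_sketch *t)
{
    if (!t->zoned) {
        t->max_zone_append_sectors = 0;
        t->zone_write_granularity = 0;
    }
}
```

A zoned stacked device keeps its limits untouched; only the non-zoned case is reset to zero.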

Bluetooth: Fix TOCTOU in HCI debugfs implementation [+ + +]

Author: Bastien Nocera <[email protected]>
Date:   Wed Mar 27 15:24:56 2024 +0100

    Bluetooth: Fix TOCTOU in HCI debugfs implementation
    
    commit 7835fcfd132eb88b87e8eb901f88436f63ab60f7 upstream.
    
    struct hci_dev members conn_info_max_age, conn_info_min_age,
    le_conn_max_interval, le_conn_min_interval, le_adv_max_interval,
    and le_adv_min_interval can be modified from the HCI core code, as well
    as through debugfs.
    
    The debugfs implementation, which is only available to privileged users,
    checks boundaries, making sure that the minimum value being set does not
    exceed the maximum value that already exists, and vice versa.
    
    However, as both minimum and maximum values can be changed concurrently
    to us modifying them, we need to make sure that the value we check is
    the value we end up using.
    
    For example, with ->conn_info_max_age set to 10, conn_info_min_age_set()
    gets called from vfs handlers to set conn_info_min_age to 8.
    
    In conn_info_min_age_set(), this goes through:
            if (val == 0 || val > hdev->conn_info_max_age)
                    return -EINVAL;
    
    Concurrently, conn_info_max_age_set() gets called to set
    conn_info_max_age to 7:
            if (val == 0 || val > hdev->conn_info_max_age)
                    return -EINVAL;
    That check will also pass because we used the old value (10) for
    conn_info_max_age.
    
    After those checks that both passed, the struct hci_dev access
    is mutex-locked, disabling concurrent access, but that does not matter
    because the invalid value checks both passed, and we'll end up with
    conn_info_min_age = 8 and conn_info_max_age = 7
    
    To fix this problem, we need to lock the structure access before so the
    check and assignment are not interrupted.
    
    This fix was originally devised by the BassCheck[1] team, and
    considered the problem to be an atomicity one. This isn't the case as
    there aren't any concerns about the variable changing while we check it,
    but rather after we check it parallel to another change.
    
    This patch fixes CVE-2024-24858 and CVE-2024-24857.
    
    [1] https://sites.google.com/view/basscheck/
    
    Co-developed-by: Gui-Dong Han <[email protected]>
    Signed-off-by: Gui-Dong Han <[email protected]>
    Link: https://lore.kernel.org/linux-bluetooth/[email protected]/
    Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24858
    Link: https://lore.kernel.org/linux-bluetooth/[email protected]/
    Link: https://lore.kernel.org/linux-bluetooth/[email protected]/
    Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24857
    Fixes: 31ad169148df ("Bluetooth: Add conn info lifetime parameters to debugfs")
    Fixes: 729a1051da6f ("Bluetooth: Expose default LE advertising interval via debugfs")
    Fixes: 71c3b60ec6d2 ("Bluetooth: Move BR/EDR debugfs file creation into hci_debugfs.c")
    Signed-off-by: Bastien Nocera <[email protected]>
    Signed-off-by: Luiz Augusto von Dentz <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
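
The fix described in the commit above, taking the lock before the check so that check and assignment form one critical section, can be sketched in user space with a POSIX mutex. The struct age_limits and the functions set_min_age() and set_max_age() are hypothetical stand-ins for the debugfs setters; the kernel code uses its own hci_dev locking rather than pthreads.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the two related hci_dev members. */
struct age_limits {
    pthread_mutex_t lock;
    uint64_t min_age;
    uint64_t max_age;
};

/* The bounds check and the store happen under the same lock. */
static int set_min_age(struct age_limits *l, uint64_t val)
{
    int ret = 0;

    pthread_mutex_lock(&l->lock);
    if (val == 0 || val > l->max_age)
        ret = -EINVAL;
    else
        l->min_age = val;
    pthread_mutex_unlock(&l->lock);
    return ret;
}

static int set_max_age(struct age_limits *l, uint64_t val)
{
    int ret = 0;

    pthread_mutex_lock(&l->lock);
    if (val == 0 || val < l->min_age)
        ret = -EINVAL;
    else
        l->max_age = val;
    pthread_mutex_unlock(&l->lock);
    return ret;
}
```

Because each setter validates against the value it sees while holding the lock, the interleaving from the commit message (minimum set to 8, then maximum set to 7) ends with the second call rejected instead of leaving min_age above max_age.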

Bluetooth: hci_event: set the conn encrypted before conn establishes [+ + +]

Author: Hui Wang <[email protected]>
Date:   Wed Mar 27 12:30:30 2024 +0800

    Bluetooth: hci_event: set the conn encrypted before conn establishes
    
    commit c569242cd49287d53b73a94233db40097d838535 upstream.
    
    We have a BT headset (Lenovo Thinkplus XT99), the pairing and
    connecting has no problem, once this headset is paired, bluez will
    remember this device and will auto re-connect it whenever the device
    is powered on. The auto re-connecting works well with Windows and
    Android, but with Linux, it always fails. Through debugging, we found
    at the rfcomm connection stage, the bluetooth stack reports
    "Connection refused - security block (0x0003)".
    
    For this device, the re-connecting negotiation process is different
    from other BT headsets, it sends the Link_KEY_REQUEST command before
    the CONNECT_REQUEST completes, and it doesn't send ENCRYPT_CHANGE
    command during the negotiation. When the device sends the "connect
    complete" to hci, the ev->encr_mode is 1.
    
    So here in the conn_complete_evt(), if ev->encr_mode is 1, link type
    is ACL and HCI_CONN_ENCRYPT is not set, we set HCI_CONN_ENCRYPT to
    this conn, and update conn->enc_key_size accordingly.
    
    After this change, this BT headset could re-connect with Linux
    successfully. This is the btmon log after applying the patch, after
    receiving the "Connect Complete" with "Encryption: Enabled", will send
    the command to read encryption key size:
    > HCI Event: Connect Request (0x04) plen 10
            Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
            Class: 0x240404
              Major class: Audio/Video (headset, speaker, stereo, video, vcr)
              Minor class: Wearable Headset Device
              Rendering (Printing, Speaker)
              Audio (Speaker, Microphone, Headset)
            Link type: ACL (0x01)
    ...
    > HCI Event: Link Key Request (0x17) plen 6
            Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
    < HCI Command: Link Key Request Reply (0x01|0x000b) plen 22
            Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
            Link key: ${32-hex-digits-key}
    ...
    > HCI Event: Connect Complete (0x03) plen 11
            Status: Success (0x00)
            Handle: 256
            Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
            Link type: ACL (0x01)
            Encryption: Enabled (0x01)
    < HCI Command: Read Encryption Key... (0x05|0x0008) plen 2
            Handle: 256
    < ACL Data TX: Handle 256 flags 0x00 dlen 10
          L2CAP: Information Request (0x0a) ident 1 len 2
            Type: Extended features supported (0x0002)
    > HCI Event: Command Complete (0x0e) plen 7
          Read Encryption Key Size (0x05|0x0008) ncmd 1
            Status: Success (0x00)
            Handle: 256
            Key size: 16
    
    Cc: [email protected]
    Link: https://github.com/bluez/bluez/issues/704
    Reviewed-by: Paul Menzel <[email protected]>
    Reviewed-by: Luiz Augusto von Dentz <[email protected]>
    Signed-off-by: Hui Wang <[email protected]>
    Signed-off-by: Luiz Augusto von Dentz <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

bounds: support non-power-of-two CONFIG_NR_CPUS [+ + +]

Author: Matthew Wilcox (Oracle) <[email protected]>
Date:   Tue Oct 10 15:55:49 2023 +0100

    bounds: support non-power-of-two CONFIG_NR_CPUS
    
    [ Upstream commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a ]
    
    ilog2() rounds down, so for example when PowerPC 85xx sets CONFIG_NR_CPUS
    to 24, we will only allocate 4 bits to store the number of CPUs instead of
    5.  Use bits_per() instead, which rounds up.  Found by code inspection.
    The effect of this would probably be a misaccounting when doing NUMA
    balancing, so to a user, it would only be a performance penalty.  The
    effects may be more wide-spread; it's hard to tell.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
    Fixes: 90572890d202 ("mm: numa: Change page last {nid,pid} into {cpu,pid}")
    Reviewed-by: Rik van Riel <[email protected]>
    Acked-by: Mel Gorman <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
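
The arithmetic behind the commit above can be checked with a short sketch. The functions ilog2_u32() and bits_per_u32() are hypothetical user-space stand-ins modeled on the kernel's ilog2() and bits_per() helpers for non-zero 32-bit inputs; they are not the kernel macros themselves.

```c
#include <stdint.h>

/* Floor of log2(n) for n > 0, i.e. it rounds down. */
static unsigned int ilog2_u32(uint32_t n)
{
    unsigned int log = 0;

    while (n >>= 1)
        log++;
    return log;
}

/* Number of bits needed to represent n itself (at least 1). */
static unsigned int bits_per_u32(uint32_t n)
{
    return n < 2 ? 1 : ilog2_u32(n) + 1;
}
```

For CONFIG_NR_CPUS = 24, ilog2_u32(24) gives 4, which cannot hold CPU number 23 (binary 10111 needs five bits), while bits_per_u32(24) gives 5.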

bpf, sockmap: Prevent lock inversion deadlock in map delete elem [+ + +]

Author: Jakub Sitnicki <[email protected]>
Date:   Tue Apr 2 12:46:21 2024 +0200

    bpf, sockmap: Prevent lock inversion deadlock in map delete elem
    
    commit ff91059932401894e6c86341915615c5eb0eca48 upstream.
    
    syzkaller started using corpuses where a BPF tracing program deletes
    elements from a sockmap/sockhash map. Because BPF tracing programs can be
    invoked from any interrupt context, locks taken during a map_delete_elem
    operation must be hardirq-safe. Otherwise a deadlock due to lock inversion
    is possible, as reported by lockdep:
    
           CPU0                    CPU1
           ----                    ----
      lock(&htab->buckets[i].lock);
                                   local_irq_disable();
                                   lock(&host->lock);
                                   lock(&htab->buckets[i].lock);
      <Interrupt>
        lock(&host->lock);
    
    Locks in sockmap are hardirq-unsafe by design. We expect elements to be
    deleted from sockmap/sockhash only in task (normal) context with interrupts
    enabled, or in softirq context.
    
    Detect when a map_delete_elem operation is invoked from a context which
    is _not_ hardirq-unsafe, that is, one where interrupts are disabled, and
    bail out with an error.
    
    Note that map updates are not affected by this issue. BPF verifier does not
    allow updating sockmap/sockhash from a BPF tracing program today.
    
    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Reported-by: xingwei lee <[email protected]>
    Reported-by: yue sun <[email protected]>
    Reported-by: [email protected]
    Reported-by: [email protected]
    Signed-off-by: Jakub Sitnicki <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Tested-by: [email protected]
    Acked-by: John Fastabend <[email protected]>
    Closes: https://syzkaller.appspot.com/bug?extid=d4066896495db380182e
    Closes: https://syzkaller.appspot.com/bug?extid=bc922f476bd65abbd466
    Link: https://lore.kernel.org/bpf/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

bpf: Protect against int overflow for stack access size [+ + +]

Author: Andrei Matei <[email protected]>
Date:   Tue Mar 26 22:42:45 2024 -0400

    bpf: Protect against int overflow for stack access size
    
    [ Upstream commit ecc6a2101840177e57c925c102d2d29f260d37c8 ]
    
    This patch re-introduces protection against the size of access to stack
    memory being negative; the access size can appear negative as a result
    of overflowing its signed int representation. This should not actually
    happen, as there are other protections along the way, but we should
    protect against it anyway. One code path was missing such protections
    (fixed in the previous patch in the series), causing out-of-bounds array
    accesses in check_stack_range_initialized(). This patch causes the
    verification of a program with such a non-sensical access size to fail.
    
    This check used to exist in a more indirect way, but was inadvertently
    removed in a833a17aeac7.
    
    Fixes: a833a17aeac7 ("bpf: Fix verification of indirect var-off stack access")
    Reported-by: [email protected]
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/bpf/CAADnVQLORV5PT0iTAhRER+iLBTkByCYNBYyvBSgjN1T31K+gOw@mail.gmail.com/
    Acked-by: Andrii Nakryiko <[email protected]>
    Signed-off-by: Andrei Matei <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
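
A minimal userspace sketch of the kind of guard this entry describes: reject an access size that appears negative before doing any bounds arithmetic. The function name, the 512-byte limit macro and the `-EACCES` return value are assumptions made for illustration; they are not taken from the verifier source.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define SKETCH_MAX_STACK 512 /* assumed stack limit, for illustration only */

/*
 * Hypothetical check: off is the (negative) offset from the frame pointer,
 * access_size the number of bytes accessed starting at off.
 */
static int sketch_check_stack_access(int off, int access_size)
{
	long long end;

	/* A size that overflowed its signed int representation shows up negative. */
	if (access_size < 0)
		return -EACCES;
	if (off >= 0 || off < -SKETCH_MAX_STACK)
		return -EACCES;
	/* Sum in a wider type so the bounds check itself cannot overflow. */
	end = (long long)off + access_size;
	if (end > 0)
		return -EACCES;
	return 0;
}
```

Checking the sign first means later code never indexes with a nonsensical size, even if an earlier code path let one through.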

btrfs: fix off-by-one chunk length calculation at contains_pending_extent() [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Thu Feb 29 10:37:04 2024 +0000

    btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
    
    [ Upstream commit ae6bd7f9b46a29af52ebfac25d395757e2031d0d ]
    
    At contains_pending_extent() the value of the end offset of a chunk we
    found in the device's allocation state io tree is inclusive, so when
    we calculate the length we pass to the in_range() macro, we must sum
    1 to the expression "physical_end - physical_offset".
    
    In practice the wrong calculation should be harmless as chunk sizes
    are never 1 byte and we should never have 1 byte ranges of unallocated
    space. Nevertheless fix the wrong calculation.
    
    Reported-by: Alex Lyakas <[email protected]>
    Link: https://lore.kernel.org/linux-btrfs/CAOcd+r30e-f4R-5x-S7sV22RJPe7+pgwherA6xqN2_qe7o4XTg@mail.gmail.com/
    Fixes: 1c11b63eff2a ("btrfs: replace pending/pinned chunks lists with io tree")
    CC: [email protected] # 6.1+
    Reviewed-by: Josef Bacik <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
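
The off-by-one in this entry comes from mixing an inclusive end offset with a half-open range test. The sketch below shows the arithmetic in userspace; `sketch_in_range()` and `sketch_inclusive_len()` are hypothetical helpers written for this illustration, with the range test assumed to have the usual half-open semantics (`first <= b < first + len`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open range test: true if first <= b < first + len. */
static bool sketch_in_range(uint64_t b, uint64_t first, uint64_t len)
{
	return b >= first && b < first + len;
}

/* Length of a range whose end offset is inclusive. */
static uint64_t sketch_inclusive_len(uint64_t start, uint64_t end)
{
	return end - start + 1;
}
```

For a chunk covering offsets 100 through 199 inclusive, `end - start` is 99 and the last byte would be reported as outside the range; adding 1 gives the correct length of 100.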

btrfs: zoned: use zone aware sb location for scrub [+ + +]

Author: Johannes Thumshirn <[email protected]>
Date:   Mon Feb 26 16:39:13 2024 +0100

    btrfs: zoned: use zone aware sb location for scrub
    
    commit 74098a989b9c3370f768140b7783a7aaec2759b3 upstream.
    
    At the moment scrub_supers() doesn't grab the super block's location via
    the zoned device aware btrfs_sb_log_location() but via btrfs_sb_offset().
    
    This leads to checksum errors on 'scrub' as we're not accessing the
    correct location of the super block.
    
    So use btrfs_sb_log_location() for getting the super block's location on
    scrub.
    
    Reported-by: WA AM <[email protected]>
    Link: http://lore.kernel.org/linux-btrfs/CANU2Z0EvUzfYxczLgGUiREoMndE9WdQnbaawV5Fv5gNXptPUKw@mail.gmail.com
    CC: [email protected] # 5.15+
    Reviewed-by: Qu Wenruo <[email protected]>
    Reviewed-by: Naohiro Aota <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

clk: qcom: gcc-ipq6018: fix terminating of frequency table arrays [+ + +]

Author: Gabor Juhos <[email protected]>
Date:   Thu Feb 29 19:07:47 2024 +0100

    clk: qcom: gcc-ipq6018: fix terminating of frequency table arrays
    
    [ Upstream commit cdbc6e2d8108bc47895e5a901cfcaf799b00ca8d ]
    
    The frequency table arrays are supposed to be terminated with an
    empty element. Add such entry to the end of the arrays where it
    is missing in order to avoid possible out-of-bound access when
    the table is traversed by functions like qcom_find_freq() or
    qcom_find_freq_floor().
    
    Only compile tested.
    
    Fixes: d9db07f088af ("clk: qcom: Add ipq6018 Global Clock Controller support")
    Signed-off-by: Gabor Juhos <[email protected]>
    Reviewed-by: Stephen Boyd <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
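
Why the terminating empty element matters can be seen in a small sketch. The struct layout, the lookup function and the table contents below are hypothetical and only loosely modeled on a lookup that walks a table until it meets a zero frequency; they are not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified table entry; a zero freq marks the end of the table. */
struct sketch_freq_tbl {
	unsigned long freq;
	unsigned int src;
};

/* Return the first entry with freq >= rate, else the last valid entry, else NULL. */
static const struct sketch_freq_tbl *
sketch_find_freq(const struct sketch_freq_tbl *f, unsigned long rate)
{
	const struct sketch_freq_tbl *best = NULL;

	if (!f)
		return NULL;
	/* The loop relies on the terminating empty element to stop. */
	for (; f->freq; f++) {
		best = f;
		if (rate <= f->freq)
			break;
	}
	return best;
}

static const struct sketch_freq_tbl sketch_table[] = {
	{ 19200000, 0 },
	{ 100000000, 1 },
	{ 200000000, 1 },
	{ 0 }	/* terminator: without it the loop would run past the array */
};
```

A request above the highest entry walks the whole table, so it is exactly the case where a missing terminator leads to an out-of-bounds read.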

clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays [+ + +]

Author: Gabor Juhos <[email protected]>
Date:   Thu Feb 29 19:07:48 2024 +0100

    clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays
    
    [ Upstream commit 1040ef5ed95d6fd2628bad387d78a61633e09429 ]
    
    The frequency table arrays are supposed to be terminated with an
    empty element. Add such entry to the end of the arrays where it
    is missing in order to avoid possible out-of-bound access when
    the table is traversed by functions like qcom_find_freq() or
    qcom_find_freq_floor().
    
    Only compile tested.
    
    Fixes: 9607f6224b39 ("clk: qcom: ipq8074: add PCIE, USB and SDCC clocks")
    Signed-off-by: Gabor Juhos <[email protected]>
    Reviewed-by: Stephen Boyd <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

clk: qcom: gcc-sdm845: Add soft dependency on rpmhpd [+ + +]

Author: Amit Pundir <[email protected]>
Date:   Tue Jan 23 11:58:14 2024 +0530

    clk: qcom: gcc-sdm845: Add soft dependency on rpmhpd
    
    [ Upstream commit 1d9054e3a4fd36e2949e616f7360bdb81bcc1921 ]
    
    With the addition of the RPMh power domain to the GCC node in the
    device tree, we noticed a significant delay in getting the
    UFS driver probed on AOSP, which further led to mount failures
    because Android does not support rootwait. So add a soft
    dependency on the RPMh power domain, which tells modprobe to
    load the rpmhpd module before gcc-sdm845.
    
    Cc: [email protected] # v5.4+
    Fixes: 4b6ea15c0a11 ("arm64: dts: qcom: sdm845: Add missing RPMh power domain to GCC")
    Suggested-by: Manivannan Sadhasivam <[email protected]>
    Signed-off-by: Amit Pundir <[email protected]>
    Reviewed-by: Manivannan Sadhasivam <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays [+ + +]

Author: Gabor Juhos <[email protected]>
Date:   Thu Feb 29 19:07:51 2024 +0100

    clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays
    
    [ Upstream commit a903cfd38d8dee7e754fb89fd1bebed99e28003d ]
    
    The frequency table arrays are supposed to be terminated with an
    empty element. Add such entry to the end of the arrays where it
    is missing in order to avoid possible out-of-bound access when
    the table is traversed by functions like qcom_find_freq() or
    qcom_find_freq_floor().
    
    Only compile tested.
    
    Fixes: 2b46cd23a5a2 ("clk: qcom: Add APQ8084 Multimedia Clock Controller (MMCC) support")
    Signed-off-by: Gabor Juhos <[email protected]>
    Reviewed-by: Stephen Boyd <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays [+ + +]

Author: Gabor Juhos <[email protected]>
Date:   Thu Feb 29 19:07:52 2024 +0100

    clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays
    
    [ Upstream commit e2c02a85bf53ae86d79b5fccf0a75ac0b78e0c96 ]
    
    The frequency table arrays are supposed to be terminated with an
    empty element. Add such entry to the end of the arrays where it
    is missing in order to avoid possible out-of-bound access when
    the table is traversed by functions like qcom_find_freq() or
    qcom_find_freq_floor().
    
    Only compile tested.
    
    Fixes: d8b212014e69 ("clk: qcom: Add support for MSM8974's multimedia clock controller (MMCC)")
    Signed-off-by: Gabor Juhos <[email protected]>
    Reviewed-by: Stephen Boyd <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

clocksource/drivers/arm_global_timer: Fix maximum prescaler value [+ + +]

Author: Martin Blumenstingl <[email protected]>
Date:   Sun Feb 18 18:41:37 2024 +0100

    clocksource/drivers/arm_global_timer: Fix maximum prescaler value
    
    [ Upstream commit b34b9547cee41575a4fddf390f615570759dc999 ]
    
    The prescaler in the "Global Timer Control Register bit assignments" is
    documented to use bits [15:8], which means that the maximum prescaler
    register value is 0xff.
    
    Fixes: 171b45a4a70e ("clocksource/drivers/arm_global_timer: Implement rate compensation whenever source clock changes")
    Signed-off-by: Martin Blumenstingl <[email protected]>
    Signed-off-by: Daniel Lezcano <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>
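
The limit follows directly from the field width: bits [15:8] span eight bits, so the largest value that fits is 0xff. The sketch below works this out with hypothetical macro and function names chosen for illustration; they are not the driver's identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PRESCALER_SHIFT 8
/* Bits [15:8]: an eight-bit field. */
#define SKETCH_PRESCALER_MASK  (0xffu << SKETCH_PRESCALER_SHIFT)
#define SKETCH_PRESCALER_MAX   (SKETCH_PRESCALER_MASK >> SKETCH_PRESCALER_SHIFT)

/* Nonzero if psv fits in the prescaler field. */
static int sketch_prescaler_fits(uint32_t psv)
{
	return psv <= SKETCH_PRESCALER_MAX;
}

/* Return a control register image with the prescaler field set to psv. */
static uint32_t sketch_with_prescaler(uint32_t ctrl, uint32_t psv)
{
	assert(sketch_prescaler_fits(psv));
	return (ctrl & ~SKETCH_PRESCALER_MASK) | (psv << SKETCH_PRESCALER_SHIFT);
}
```

A value of 0x100 would spill into bit 16, outside the documented field, which is why the maximum has to be capped at 0xff.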

cpufreq: brcmstb-avs-cpufreq: fix up "add check for cpufreq_cpu_get's return value" [+ + +]

Author: Greg Kroah-Hartman <[email protected]>
Date:   Wed Mar 27 15:21:45 2024 +0100

    cpufreq: brcmstb-avs-cpufreq: fix up "add check for cpufreq_cpu_get's return value"
    
    In commit d951cf510fb0 ("cpufreq: brcmstb-avs-cpufreq: add check for
    cpufreq_cpu_get's return value"), build warnings occur because a
    variable is created after some logic, resulting in:
    
    drivers/cpufreq/brcmstb-avs-cpufreq.c: In function 'brcm_avs_cpufreq_get':
    drivers/cpufreq/brcmstb-avs-cpufreq.c:486:9: error: ISO C90 forbids mixed
    declarations and code [-Werror=declaration-after-statement]
      486 |         struct private_data *priv = policy->driver_data;
          |         ^~~~~~
    cc1: all warnings being treated as errors
    make[2]: *** [scripts/Makefile.build:289:
    drivers/cpufreq/brcmstb-avs-cpufreq.o] Error 1
    make[1]: *** [scripts/Makefile.build:552: drivers/cpufreq] Error 2
    make[1]: *** Waiting for unfinished jobs....
    make: *** [Makefile:1907: drivers] Error 2
    
    Fix this up.
    
    Link: https://lore.kernel.org/r/[email protected]
    Link: https://lore.kernel.org/stable/20240327015023.GC7502@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/T/#m15bff0fe96986ef780e848b4fff362bf8ea03f08
    Reported-by: Harshit Mogalapalli <[email protected]>
    Reported-by: Linux Kernel Functional Testing <[email protected]>
    Fixes: d951cf510fb0 ("cpufreq: brcmstb-avs-cpufreq: add check for cpufreq_cpu_get's return value")
    Cc: Anastasia Belova <[email protected]>
    Cc: Viresh Kumar <[email protected]>
    Cc: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

cpufreq: dt: always allocate zeroed cpumask [+ + +]

Author: Marek Szyprowski <[email protected]>
Date:   Thu Mar 14 13:54:57 2024 +0100

    cpufreq: dt: always allocate zeroed cpumask
    
    [ Upstream commit d2399501c2c081eac703ca9597ceb83c7875a537 ]
    
    Commit 0499a78369ad ("ARM64: Dynamically allocate cpumasks and increase
    supported CPUs to 512") changed the handling of cpumasks on ARM 64bit,
    which resulted in strange issues and warnings during cpufreq-dt
    initialization on some big.LITTLE platforms.
    
    This was caused by mixing OPPs between big and LITTLE cores, because
    OPP-sharing information between big and LITTLE cores is computed on
    cpumask, which in turn was not zeroed on allocation. Fix this by
    switching to zalloc_cpumask_var() call.
    
    Fixes: dc279ac6e5b4 ("cpufreq: dt: Refactor initialization to handle probe deferral properly")
    CC: [email protected] # v5.10+
    Signed-off-by: Marek Szyprowski <[email protected]>
    Reviewed-by: Christoph Lameter (Ampere) <[email protected]>
    Reviewed-by: Dhruva Gole <[email protected]>
    Signed-off-by: Viresh Kumar <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
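
The effect of a zeroed allocation can be shown with a userspace analogy. Everything below is a hypothetical sketch: a bitmask allocated with `calloc()` plays the role of a zero-initialised cpumask, and the helper names are invented for the illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LONGS(nbits) \
	(((nbits) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Zeroed allocation: every bit starts out clear (calloc instead of malloc). */
static unsigned long *sketch_zalloc_mask(size_t nbits)
{
	return calloc(SKETCH_LONGS(nbits), sizeof(unsigned long));
}

static void sketch_set_bit(unsigned long *mask, size_t bit)
{
	mask[bit / SKETCH_BITS_PER_LONG] |= 1UL << (bit % SKETCH_BITS_PER_LONG);
}

/* Count the set bits among the first nbits. */
static size_t sketch_weight(const unsigned long *mask, size_t nbits)
{
	size_t i, n = 0;

	for (i = 0; i < nbits; i++)
		if (mask[i / SKETCH_BITS_PER_LONG] &
		    (1UL << (i % SKETCH_BITS_PER_LONG)))
			n++;
	return n;
}

/* Mark CPUs 4 and 5 as sharing and report how many CPUs the mask holds. */
static size_t sketch_sharing_cpus(size_t nr_cpus)
{
	unsigned long *mask = sketch_zalloc_mask(nr_cpus);
	size_t n;

	assert(mask && nr_cpus > 5);
	sketch_set_bit(mask, 4);
	sketch_set_bit(mask, 5);
	n = sketch_weight(mask, nr_cpus);
	free(mask);
	return n;
}
```

With a zeroed mask the count is exactly the two CPUs that were set; with an uninitialised allocation, stale bits could make unrelated CPUs appear to share the same settings.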

crypto: qat - fix double free during reset [+ + +]

Author: Svyatoslav Pankratov <[email protected]>
Date:   Mon Oct 9 13:27:19 2023 +0100

    crypto: qat - fix double free during reset
    
    [ Upstream commit 01aed663e6c421aeafc9c330bda630976b50a764 ]
    
    There is no need to free the reset_data structure if the recovery is
    unsuccessful and the reset is synchronous. The function
    adf_dev_aer_schedule_reset() handles the cleanup properly. Only
    asynchronous resets require such structure to be freed inside the reset
    worker.
    
    Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
    Signed-off-by: Svyatoslav Pankratov <[email protected]>
    Signed-off-by: Giovanni Cabiddu <[email protected]>
    Signed-off-by: Herbert Xu <[email protected]>
    Stable-dep-of: 7d42e097607c ("crypto: qat - resolve race condition during AER recovery")
    Signed-off-by: Sasha Levin <[email protected]>

crypto: qat - resolve race condition during AER recovery [+ + +]

Author: Damian Muszynski <[email protected]>
Date:   Fri Feb 9 13:43:42 2024 +0100

    crypto: qat - resolve race condition during AER recovery
    
    [ Upstream commit 7d42e097607c4d246d99225bf2b195b6167a210c ]
    
    During the PCI AER system's error recovery process, the kernel driver
    may encounter a race condition with freeing the reset_data structure's
    memory. If the device restart takes more than 10 seconds, the function
    scheduling that restart exits due to a timeout, and the reset_data
    structure is freed. However, this data structure is used for
    completion notification after the restart is completed, which leads
    to a UAF bug.
    
    This results in a KFENCE bug notice.
    
      BUG: KFENCE: use-after-free read in adf_device_reset_worker+0x38/0xa0 [intel_qat]
      Use-after-free read at 0x00000000bc56fddf (in kfence-#142):
      adf_device_reset_worker+0x38/0xa0 [intel_qat]
      process_one_work+0x173/0x340
    
    To resolve this race condition, the memory associated with the container
    of the work_struct is freed in the worker if the timeout expired,
    otherwise in the function that schedules the worker.
    The timeout can be detected by checking whether the caller is
    still waiting for completion, using the completion_done() function.
    
    Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
    Cc: <[email protected]>
    Signed-off-by: Damian Muszynski <[email protected]>
    Reviewed-by: Giovanni Cabiddu <[email protected]>
    Signed-off-by: Herbert Xu <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

dm integrity: fix out-of-range warning [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Thu Mar 28 15:30:39 2024 +0100

    dm integrity: fix out-of-range warning
    
    [ Upstream commit 8e91c2342351e0f5ef6c0a704384a7f6fc70c3b2 ]
    
    Depending on the value of CONFIG_HZ, clang complains about a pointless
    comparison:
    
    drivers/md/dm-integrity.c:4085:12: error: result of comparison of
                            constant 42949672950 with expression of type
                            'unsigned int' is always false
                            [-Werror,-Wtautological-constant-out-of-range-compare]
                            if (val >= (uint64_t)UINT_MAX * 1000 / HZ) {
    
    As the check remains useful for other configurations, shut up the
    warning by adding a second type cast to uint64_t.
    
    Fixes: 468dfca38b1a ("dm integrity: add a bitmap mode")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Reviewed-by: Mikulas Patocka <[email protected]>
    Reviewed-by: Justin Stitt <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
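
The comparison from this entry can be reproduced in userspace. In the sketch below the function name is hypothetical and `hz` is passed as a parameter instead of being the build-time `HZ` constant; the expression itself follows the one quoted in the warning, with the second cast to `uint64_t` applied to the left-hand side.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* True if val is at or above the limit derived from hz ticks per second. */
static bool sketch_msec_too_large(unsigned int val, unsigned int hz)
{
	/* Both operands are 64-bit, so neither side is narrower than the limit. */
	return (uint64_t)val >= (uint64_t)UINT_MAX * 1000 / hz;
}
```

Assuming a 32-bit `unsigned int`, the limit for `hz` of 100 is 42949672950, which no `unsigned int` can reach, so the check is always false there. For `hz` of 1000 the limit equals `UINT_MAX` and the check is still meaningful.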

dm snapshot: fix lockup in dm_exception_table_exit [+ + +]

Author: Mikulas Patocka <[email protected]>
Date:   Wed Mar 20 18:43:11 2024 +0100

    dm snapshot: fix lockup in dm_exception_table_exit
    
    [ Upstream commit 6e7132ed3c07bd8a6ce3db4bb307ef2852b322dc ]
    
    A lockup was reported when exiting a snapshot with many exceptions.
    Fix this by adding "cond_resched" to the loop that frees the exceptions.
    
    Reported-by: John Pittman <[email protected]>
    Cc: [email protected]
    Signed-off-by: Mikulas Patocka <[email protected]>
    Signed-off-by: Mike Snitzer <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

dm-raid: fix lockdep warning in "pers->hot_add_disk" [+ + +]

Author: Yu Kuai <[email protected]>
Date:   Tue Mar 5 15:23:06 2024 +0800

    dm-raid: fix lockdep warning in "pers->hot_add_disk"
    
    [ Upstream commit 95009ae904b1e9dca8db6f649f2d7c18a6e42c75 ]
    
    The lockdep assert is added by commit a448af25becf ("md/raid10: remove
    rcu protection to access rdev from conf") in print_conf(). And I didn't
    notice that dm-raid is calling "pers->hot_add_disk" without holding
    'reconfig_mutex'.
    
    "pers->hot_add_disk" reads and writes many fields that are protected by
    'reconfig_mutex', and raid_resume() already grabs the lock in another
    context. Hence fix this problem by protecting "pers->hot_add_disk"
    with the lock.
    
    Fixes: 9092c02d9435 ("DM RAID: Add ability to restore transiently failed devices on resume")
    Fixes: a448af25becf ("md/raid10: remove rcu protection to access rdev from conf")
    Cc: [email protected] # v6.7+
    Signed-off-by: Yu Kuai <[email protected]>
    Signed-off-by: Xiao Ni <[email protected]>
    Acked-by: Mike Snitzer <[email protected]>
    Signed-off-by: Song Liu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

dma-iommu: add iommu_dma_opt_mapping_size() [+ + +]

Author: John Garry <[email protected]>
Date:   Thu Jul 14 19:15:25 2022 +0800

    dma-iommu: add iommu_dma_opt_mapping_size()
    
    [ Upstream commit 6d9870b7e5def2450e21316515b9efc0529204dd ]
    
    Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
    allows the drivers to know the optimal mapping limit and thus limit the
    requested IOVA lengths.
    
    This value is based on the IOVA rcache range limit, as IOVAs allocated
    above this limit must always be newly allocated, which may be quite slow.
    
    Signed-off-by: John Garry <[email protected]>
    Reviewed-by: Damien Le Moal <[email protected]>
    Acked-by: Robin Murphy <[email protected]>
    Acked-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Stable-dep-of: afc5aa46ed56 ("iommu/dma: Force swiotlb_max_mapping_size on an untrusted device")
    Signed-off-by: Sasha Levin <[email protected]>

dma-mapping: add dma_opt_mapping_size() [+ + +]

Author: John Garry <[email protected]>
Date:   Thu Jul 14 19:15:24 2022 +0800

    dma-mapping: add dma_opt_mapping_size()
    
    [ Upstream commit a229cc14f3395311b899e5e582b71efa8dd01df0 ]
    
    Streaming DMA mapping involving an IOMMU may be much slower for larger
    total mapping size. This is because every IOMMU DMA mapping requires an
    IOVA to be allocated and freed. IOVA sizes above a certain limit are not
    cached, which can have a big impact on DMA mapping performance.
    
    Provide an API for device drivers to know this "optimal" limit, such that
    they may try to produce mappings which don't exceed it.
    
    Signed-off-by: John Garry <[email protected]>
    Reviewed-by: Damien Le Moal <[email protected]>
    Acked-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Stable-dep-of: afc5aa46ed56 ("iommu/dma: Force swiotlb_max_mapping_size on an untrusted device")
    Signed-off-by: Sasha Levin <[email protected]>

dnotify: use fsnotify group lock helpers [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:21 2022 +0300

    dnotify: use fsnotify group lock helpers
    
    [ Upstream commit aabb45fdcb31f00f1e7cae2bce83e83474a87c03 ]
    
    Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette")
    nfsd would close open files in direct reclaim context.  There is no
    guarantee that other memory shrinkers don't do the same and no
    guarantee that future shrinkers won't do that.
    
    For example, if overlayfs implemented an inode cache, or if fscache
    kept open files to cached objects, inode shrinkers could end up closing
    open files to the underlying fs.
    
    Direct reclaim from dnotify mark allocation context may try to close
    open files that have dnotify marks of the same group and hit a deadlock
    on mark_mutex.
    
    Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim
    from allocations under dnotify group lock and use the safe group lock
    helpers.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]/
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

docs: Document the FAN_FS_ERROR event [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:46 2021 -0300

    docs: Document the FAN_FS_ERROR event
    
    [ Upstream commit c0baf9ac0b05d53dfe0436661dbdc5e43c01c5e0 ]
    
    Document the FAN_FS_ERROR event for user administrators and user space
    developers.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

Documentation/hw-vuln: Add documentation for RFDS [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Tue Mar 12 14:11:25 2024 -0700

    Documentation/hw-vuln: Add documentation for RFDS
    
    commit 4e42765d1be01111df0c0275bbaf1db1acef346e upstream.
    
    Add the documentation for transient execution vulnerability Register
    File Data Sampling (RFDS) that affects Intel Atom CPUs.
    
      [ pawan: s/ATOM_GRACEMONT/ALDERLAKE_N/ ]
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Reviewed-by: Thomas Gleixner <[email protected]>
    Acked-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Documentation/hw-vuln: Update spectre doc [+ + +]

Author: Lin Yujun <[email protected]>
Date:   Tue Aug 30 20:36:14 2022 +0800

    Documentation/hw-vuln: Update spectre doc
    
    commit 06cb31cc761823ef444ba4e1df11347342a6e745 upstream.
    
    commit 7c693f54c873691 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
    
    added the "ibrs" option to
    Documentation/admin-guide/kernel-parameters.txt but omitted it from
    Documentation/admin-guide/hw-vuln/spectre.rst. Add it.
    
    Signed-off-by: Lin Yujun <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jonathan Corbet <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Documentation: Add missing documentation for EXPORT_OP flags [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Aug 25 15:04:23 2023 -0400

    Documentation: Add missing documentation for EXPORT_OP flags
    
    [ Upstream commit b38a6023da6a12b561f0421c6a5a1f7624a1529c ]
    
    The commits that introduced these flags neglected to update the
    Documentation/filesystems/nfs/exporting.rst file.
    
    Signed-off-by: Chuck Lever <[email protected]>

driver core: Introduce device_link_wait_removal() [+ + +]

Author: Herve Codina <[email protected]>
Date:   Mon Mar 25 16:21:25 2024 +0100

    driver core: Introduce device_link_wait_removal()
    
    commit 0462c56c290a99a7f03e817ae5b843116dfb575c upstream.
    
    The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
    introduces a workqueue to release the consumer and supplier devices used
    in the devlink.
    In the queued job, devices are released and, in turn, when all the
    references to these devices are dropped, the release function of the
    device itself is called.
    
    Nothing provides synchronisation with this workqueue to ensure that
    all ongoing release operations have finished, so that other
    operations can be started safely.
    
    For instance, in the following sequence:
      1) of_platform_depopulate()
      2) of_overlay_remove()
    
    During step 1, devices are released and related devlinks are removed
    (jobs pushed in the workqueue).
    During step 2, OF nodes are destroyed but, without any
    synchronisation with devlink removal jobs, of_overlay_remove() can raise
    warnings related to missing of_node_put():
      ERROR: memory leak, expected refcount 1 instead of 2
    
    Indeed, the missing of_node_put() call is going to be done, too late,
    from the workqueue job execution.
    
    Introduce device_link_wait_removal() to offer a way to synchronize
    operations waiting for the end of devlink removals (i.e. end of
    workqueue jobs).
    Also, as a flushing operation is done on the workqueue, the workqueue
    used is moved from a system-wide workqueue to a local one.
    
    Cc: [email protected]
    Signed-off-by: Herve Codina <[email protected]>
    Tested-by: Luca Ceresoli <[email protected]>
    Reviewed-by: Nuno Sa <[email protected]>
    Reviewed-by: Saravana Kannan <[email protected]>
    Acked-by: Greg Kroah-Hartman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Rob Herring <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory [+ + +]

Author: Michael Kelley <[email protected]>
Date:   Wed Feb 28 16:45:33 2024 -0800

    Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
    
    commit b8209544296edbd1af186e2ea9c648642c37b18c upstream.
    
    The VMBUS_RING_SIZE macro adds space for a ring buffer header to the
    requested ring buffer size.  The header size is always 1 page, and so
    its size varies based on the PAGE_SIZE for which the kernel is built.
    If the requested ring buffer size is a large power-of-2 size and the header
    size is small, the resulting size is inefficient in its use of memory.
    For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in
    a 516 Kbyte allocation, which is rounded up to 1 Mbyte by the memory
    allocator, and wastes 508 Kbytes of memory.
    
    In such situations, the exact size of the ring buffer isn't that important,
    and it's OK to allocate the 4 Kbyte header at the beginning of the 512
    Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory
    allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted.
    
    Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer
    sizes.  "Large" is somewhat arbitrarily defined as 8 times the size of
    the ring buffer header (which is of size PAGE_SIZE).  For example, for
    4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first
    4 Kbytes as the ring buffer header.  For 64 Kbyte PAGE_SIZE, ring buffers
    of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer
    header.  In both cases, smaller sizes add space for the header so
    the ring size isn't reduced too much by using part of the space for
    the header.  For example, with a 64 Kbyte page size, we don't want
    a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half
    of the space for the header.  In such a case, the memory allocation
    is less efficient, but it's the best that can be done.
    
    While the new algorithm slightly changes the amount of space allocated
    for ring buffers by drivers that use VMBUS_RING_SIZE, the devices aren't
    known to be sensitive to small changes in ring buffer size, so there
    shouldn't be any effect.
    
    Fixes: c1135c7fd0e9 ("Drivers: hv: vmbus: Introduce types of GPADL")
    Fixes: 6941f67ad37d ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes")
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218502
    Cc: [email protected]
    Signed-off-by: Michael Kelley <[email protected]>
    Reviewed-by: Saurabh Sengar <[email protected]>
    Reviewed-by: Dexuan Cui <[email protected]>
    Tested-by: Souradeep Chakrabarti <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Wei Liu <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
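
The sizing rule in this entry can be worked through numerically. The sketch below is a userspace model under stated assumptions: the header is taken to be exactly one page, "large" means at least 8 times the header size, and the function names are hypothetical. It is not the kernel's VMBUS_RING_SIZE macro.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of page (page must be a power of two). */
static size_t sketch_page_align(size_t x, size_t page)
{
	return (x + page - 1) & ~(page - 1);
}

/* Allocation size for a requested ring size; the header occupies one page. */
static size_t sketch_ring_alloc_size(size_t requested, size_t page)
{
	size_t header = page;
	/* "Large" requests absorb the header; smaller ones get it added on top. */
	size_t extra = requested >= 8 * header ? 0 : header;

	return sketch_page_align(requested + extra, page);
}
```

With a 4 Kbyte page, a 512 Kbyte request stays at 512 Kbytes instead of growing to 516 Kbytes, while a 16 Kbyte request still becomes 20 Kbytes. With a 64 Kbyte page, a 128 Kbyte request becomes 192 Kbytes, so the ring itself is not halved.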

drivers: net: convert to boolean for the mac_managed_pm flag [+ + +]

Author: Denis Kirjanov <[email protected]>
Date:   Thu Oct 27 21:45:02 2022 +0300

    drivers: net: convert to boolean for the mac_managed_pm flag
    
    [ Upstream commit eca485d22165695587bed02d8b9d0f7f44246c4a ]
    
    Signed-off-by: Dennis Kirjanov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Stable-dep-of: cbc17e7802f5 ("net: fec: Set mac_managed_pm during probe")
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: Fix noise issue on HDMI AV mute [+ + +]

Author: Leo Ma <[email protected]>
Date:   Fri Jul 28 08:35:07 2023 -0400

    drm/amd/display: Fix noise issue on HDMI AV mute
    
    [ Upstream commit 69e3be6893a7e668660b05a966bead82bbddb01d ]
    
    [Why]
    When mode switching is triggered, momentary noise is visible on
    some HDMI TVs or displays.
    
    [How]
    Wait for 2 frames to make sure we have enough time to send out AV mute
    and sink receives a full frame.
    
    Cc: Mario Limonciello <[email protected]>
    Cc: Alex Deucher <[email protected]>
    Cc: [email protected]
    Reviewed-by: Wenjing Liu <[email protected]>
    Acked-by: Wayne Lin <[email protected]>
    Signed-off-by: Leo Ma <[email protected]>
    Tested-by: Daniel Wheeler <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: Preserve original aspect ratio in create stream [+ + +]

Author: Tom Chung <[email protected]>
Date:   Tue Jan 30 15:34:08 2024 +0800

    drm/amd/display: Preserve original aspect ratio in create stream
    
    [ Upstream commit 79f3e38f60e5b2416ba99804d83d22e69ae592a3 ]
    
    [Why]
    The original picture aspect ratio in the mode struct may be
    overwritten with wrong aspect ratio data in create_stream_for_sink().
    This produces a different VIC output and causes the HDMI compliance
    test to fail.
    
    [How]
    Preserve the original picture aspect ratio data while creating the
    stream.
    
    Cc: Mario Limonciello <[email protected]>
    Cc: Alex Deucher <[email protected]>
    Cc: [email protected]
    Reviewed-by: Aurabindo Pillai <[email protected]>
    Signed-off-by: Tom Chung <[email protected]>
    Tested-by: Daniel Wheeler <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: Return the correct HDCP error code [+ + +]

Author: Rodrigo Siqueira <[email protected]>
Date:   Wed Feb 14 13:29:51 2024 -0700

    drm/amd/display: Return the correct HDCP error code
    
    [ Upstream commit e64b3f55e458ce7e2087a0051f47edabf74545e7 ]
    
    [WHY & HOW]
    If the display is null when creating an HDCP session, return a proper
    error code.
    
    Cc: Mario Limonciello <[email protected]>
    Cc: Alex Deucher <[email protected]>
    Cc: [email protected]
    Acked-by: Alex Hung <[email protected]>
    Signed-off-by: Rodrigo Siqueira <[email protected]>
    Tested-by: Daniel Wheeler <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag [+ + +]

Author: Philip Yang <[email protected]>
Date:   Mon Mar 11 18:07:34 2024 -0400

    drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag
    
    [ Upstream commit 6c6064cbe58b43533e3451ad6a8ba9736c109ac3 ]
    
    Otherwise, after the GTT bo is released, the GTT and gart space is freed
    but amdgpu_ttm_backend_unbind will not clear the gart page table entry,
    leaving a valid mapping entry pointing to the stale system page. If the
    GPU then accesses the gart address by mistake, it reads an undefined
    value instead of page faulting, which makes the real issue harder to
    debug and reproduce.
    
    Cc: [email protected]
    Signed-off-by: Philip Yang <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: Use drm_mode_copy() [+ + +]

Author: Ville Syrjälä <[email protected]>
Date:   Fri Feb 18 12:03:45 2022 +0200

    drm/amdgpu: Use drm_mode_copy()
    
    [ Upstream commit 426c89aa203bcec9d9cf6eea36735eafa1b1f099 ]
    
    struct drm_display_mode embeds a list head, so overwriting
    the full struct with another one will corrupt the list
    (if the destination mode is on a list). Use drm_mode_copy()
    instead which explicitly preserves the list head of
    the destination mode.
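
    The difference between a plain struct assignment and a copy that
    preserves the list head can be modeled in Python. This is only a
    sketch: the class and function names are hypothetical stand-ins, and
    the real struct drm_display_mode has many more fields.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ListHead:
    next: object = None
    prev: object = None


@dataclass
class DisplayMode:
    hdisplay: int = 0
    vdisplay: int = 0
    head: ListHead = field(default_factory=ListHead)


def struct_assign(dst: DisplayMode, src: DisplayMode) -> None:
    """Model of `*dst = *src`: every field, including the list head, is overwritten."""
    for f in fields(dst):
        setattr(dst, f.name, getattr(src, f.name))


def mode_copy(dst: DisplayMode, src: DisplayMode) -> None:
    """Model of the behavior described for drm_mode_copy(): keep dst's list head."""
    saved = dst.head
    struct_assign(dst, src)
    dst.head = saved
```

    After mode_copy(), the destination carries the source's timings but
    is still linked through its own list head. After struct_assign(), it
    shares the source's links, which is the corruption described above.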
    
    Even if we know the destination mode is not on any list
    using drm_mode_copy() seems decent as it sets a good
    example. Bad examples of not using it might eventually
    get copied into code where preserving the list head
    actually matters.
    
    Obviously one case not covered here is when the mode
    itself is embedded in a larger structure and the whole
    structure is copied. But if we are careful when copying
    into modes embedded in structures I think we can be a
    little more reassured that bogus list heads haven't been
    propagated in.
    
    @is_mode_copy@
    @@
    drm_mode_copy(...)
    {
    ...
    }
    
    @depends on !is_mode_copy@
    struct drm_display_mode *mode;
    expression E, S;
    @@
    (
    - *mode = E
    + drm_mode_copy(mode, &E)
    |
    - memcpy(mode, E, S)
    + drm_mode_copy(mode, E)
    )
    
    @depends on !is_mode_copy@
    struct drm_display_mode mode;
    expression E, S;
    @@
    (
    - mode = E
    + drm_mode_copy(&mode, &E)
    |
    - memcpy(&mode, E, S)
    + drm_mode_copy(&mode, E)
    )
    
    @@
    struct drm_display_mode *mode;
    @@
    - &*mode
    + mode
    
    Cc: Alex Deucher <[email protected]>
    Cc: Harry Wentland <[email protected]>
    Cc: Leo Li <[email protected]>
    Cc: Rodrigo Siqueira <[email protected]>
    Cc: [email protected]
    Reviewed-by: Harry Wentland <[email protected]>
    Signed-off-by: Ville Syrjälä <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Stable-dep-of: 79f3e38f60e5 ("drm/amd/display: Preserve original aspect ratio in create stream")
    Signed-off-by: Sasha Levin <[email protected]>

drm/etnaviv: Restore some id values [+ + +]

Author: Christian Gmeiner <[email protected]>
Date:   Fri Mar 1 14:28:11 2024 +0100

    drm/etnaviv: Restore some id values
    
    [ Upstream commit b735ee173f84d5d0d0733c53946a83c12d770d05 ]
    
    The hwdb selection logic has a feature that allows it to mark some fields
    as 'don't care'. If we match with such a field we memcpy(..)
    the current etnaviv_chip_identity into ident.
    
    This step can overwrite some id values read from the GPU with the
    'don't care' value.
    
    Fix this issue by restoring the affected values after the memcpy(..).
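
    A minimal Python model of the save-and-restore step follows. The field
    names, the DONT_CARE value, and the dictionary layout are assumptions
    made for illustration; they are not taken from the driver.

```python
DONT_CARE = 0xFFFFFFFF  # assumed stand-in for the 'don't care' marker

# Hypothetical id fields that are compared during the lookup.
ID_FIELDS = ("model", "revision", "product_id")


def hwdb_lookup(ident: dict, hwdb: list) -> bool:
    """Fill ident from the first matching hwdb entry, keeping the ids read from the GPU."""
    saved = {key: ident[key] for key in ID_FIELDS}
    for entry in hwdb:
        if all(entry[k] in (DONT_CARE, ident[k]) for k in ID_FIELDS):
            ident.update(entry)   # models the memcpy(..) of the whole entry
            ident.update(saved)   # restore ids that a 'don't care' may have overwritten
            return True
    return False
```

    Without the second update(), an entry that marks product_id as
    'don't care' would leave that marker in ident instead of the value
    read from the hardware.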
    
    Because user space needs to know when this feature works as expected,
    increment the minor version too.
    
    Fixes: 4078a1186dd3 ("drm/etnaviv: update hwdb selection logic")
    Cc: [email protected]
    Signed-off-by: Christian Gmeiner <[email protected]>
    Reviewed-by: Tomeu Vizoso <[email protected]>
    Signed-off-by: Lucas Stach <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/exynos: do not return negative values from .get_modes() [+ + +]

Author: Jani Nikula <[email protected]>
Date:   Fri Mar 8 18:03:41 2024 +0200

    drm/exynos: do not return negative values from .get_modes()
    
    [ Upstream commit 13d5b040363c7ec0ac29c2de9cf661a24a8aa531 ]
    
    The .get_modes() hooks aren't supposed to return negative error
    codes. Return 0 for no modes, whatever the reason.
    
    Cc: Inki Dae <[email protected]>
    Cc: Seung-Woo Kim <[email protected]>
    Cc: Kyungmin Park <[email protected]>
    Cc: [email protected]
    Acked-by: Thomas Zimmermann <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/d8665f620d9c252aa7d5a4811ff6b16e773903a2.1709913674.git.jani.nikula@intel.com
    Signed-off-by: Jani Nikula <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915/gt: Reset queue_priority_hint on parking [+ + +]

Author: Chris Wilson <[email protected]>
Date:   Mon Mar 18 14:58:47 2024 +0100

    drm/i915/gt: Reset queue_priority_hint on parking
    
    commit 4a3859ea5240365d21f6053ee219bb240d520895 upstream.
    
    Originally, with strict in order execution, we could complete execution
    only when the queue was empty. Preempt-to-busy allows replacement of an
    active request that may complete before the preemption is processed by
    HW. If that happens, the request is retired from the queue, but the
    queue_priority_hint remains set, preventing direct submission until
    after the next CS interrupt is processed.
    
    This preempt-to-busy race can be triggered by the heartbeat, which will
    also act as the power-management barrier and upon completion allow us to
    idle the HW. We may process the completion of the heartbeat, and begin
    parking the engine before the CS event that restores the
    queue_priority_hint, causing us to fail the assertion that it is MIN.
    
    <3>[  166.210729] __engine_park:283 GEM_BUG_ON(engine->sched_engine->queue_priority_hint != (-((int)(~0U >> 1)) - 1))
    <0>[  166.210781] Dumping ftrace buffer:
    <0>[  166.210795] ---------------------------------
    ...
    <0>[  167.302811] drm_fdin-1097      2..s1. 165741070us : trace_ports: 0000:00:02.0 rcs0: promote { ccid:20 1217:2 prio 0 }
    <0>[  167.302861] drm_fdin-1097      2d.s2. 165741072us : execlists_submission_tasklet: 0000:00:02.0 rcs0: preempting last=1217:2, prio=0, hint=2147483646
    <0>[  167.302928] drm_fdin-1097      2d.s2. 165741072us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 1217:2, current 0
    <0>[  167.302992] drm_fdin-1097      2d.s2. 165741073us : __i915_request_submit: 0000:00:02.0 rcs0: fence 3:4660, current 4659
    <0>[  167.303044] drm_fdin-1097      2d.s1. 165741076us : execlists_submission_tasklet: 0000:00:02.0 rcs0: context:3 schedule-in, ccid:40
    <0>[  167.303095] drm_fdin-1097      2d.s1. 165741077us : trace_ports: 0000:00:02.0 rcs0: submit { ccid:40 3:4660* prio 2147483646 }
    <0>[  167.303159] kworker/-89       11..... 165741139us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence c90:2, current 2
    <0>[  167.303208] kworker/-89       11..... 165741148us : __intel_context_do_unpin: 0000:00:02.0 rcs0: context:c90 unpin
    <0>[  167.303272] kworker/-89       11..... 165741159us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence 1217:2, current 2
    <0>[  167.303321] kworker/-89       11..... 165741166us : __intel_context_do_unpin: 0000:00:02.0 rcs0: context:1217 unpin
    <0>[  167.303384] kworker/-89       11..... 165741170us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence 3:4660, current 4660
    <0>[  167.303434] kworker/-89       11d..1. 165741172us : __intel_context_retire: 0000:00:02.0 rcs0: context:1216 retire runtime: { total:56028ns, avg:56028ns }
    <0>[  167.303484] kworker/-89       11..... 165741198us : __engine_park: 0000:00:02.0 rcs0: parked
    <0>[  167.303534]   <idle>-0         5d.H3. 165741207us : execlists_irq_handler: 0000:00:02.0 rcs0: semaphore yield: 00000040
    <0>[  167.303583] kworker/-89       11..... 165741397us : __intel_context_retire: 0000:00:02.0 rcs0: context:1217 retire runtime: { total:325575ns, avg:0ns }
    <0>[  167.303756] kworker/-89       11..... 165741777us : __intel_context_retire: 0000:00:02.0 rcs0: context:c90 retire runtime: { total:0ns, avg:0ns }
    <0>[  167.303806] kworker/-89       11..... 165742017us : __engine_park: __engine_park:283 GEM_BUG_ON(engine->sched_engine->queue_priority_hint != (-((int)(~0U >> 1)) - 1))
    <0>[  167.303811] ---------------------------------
    <4>[  167.304722] ------------[ cut here ]------------
    <2>[  167.304725] kernel BUG at drivers/gpu/drm/i915/gt/intel_engine_pm.c:283!
    <4>[  167.304731] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    <4>[  167.304734] CPU: 11 PID: 89 Comm: kworker/11:1 Tainted: G        W          6.8.0-rc2-CI_DRM_14193-gc655e0fd2804+ #1
    <4>[  167.304736] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
    <4>[  167.304738] Workqueue: i915-unordered retire_work_handler [i915]
    <4>[  167.304839] RIP: 0010:__engine_park+0x3fd/0x680 [i915]
    <4>[  167.304937] Code: 00 48 c7 c2 b0 e5 86 a0 48 8d 3d 00 00 00 00 e8 79 48 d4 e0 bf 01 00 00 00 e8 ef 0a d4 e0 31 f6 bf 09 00 00 00 e8 03 49 c0 e0 <0f> 0b 0f 0b be 01 00 00 00 e8 f5 61 fd ff 31 c0 e9 34 fd ff ff 48
    <4>[  167.304940] RSP: 0018:ffffc9000059fce0 EFLAGS: 00010246
    <4>[  167.304942] RAX: 0000000000000200 RBX: 0000000000000000 RCX: 0000000000000006
    <4>[  167.304944] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
    <4>[  167.304946] RBP: ffff8881330ca1b0 R08: 0000000000000001 R09: 0000000000000001
    <4>[  167.304947] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8881330ca000
    <4>[  167.304948] R13: ffff888110f02aa0 R14: ffff88812d1d0205 R15: ffff88811277d4f0
    <4>[  167.304950] FS:  0000000000000000(0000) GS:ffff88844f780000(0000) knlGS:0000000000000000
    <4>[  167.304952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    <4>[  167.304953] CR2: 00007fc362200c40 CR3: 000000013306e003 CR4: 0000000000770ef0
    <4>[  167.304955] PKRU: 55555554
    <4>[  167.304957] Call Trace:
    <4>[  167.304958]  <TASK>
    <4>[  167.305573]  ____intel_wakeref_put_last+0x1d/0x80 [i915]
    <4>[  167.305685]  i915_request_retire.part.0+0x34f/0x600 [i915]
    <4>[  167.305800]  retire_requests+0x51/0x80 [i915]
    <4>[  167.305892]  intel_gt_retire_requests_timeout+0x27f/0x700 [i915]
    <4>[  167.305985]  process_scheduled_works+0x2db/0x530
    <4>[  167.305990]  worker_thread+0x18c/0x350
    <4>[  167.305993]  kthread+0xfe/0x130
    <4>[  167.305997]  ret_from_fork+0x2c/0x50
    <4>[  167.306001]  ret_from_fork_asm+0x1b/0x30
    <4>[  167.306004]  </TASK>
    
    It is necessary for the queue_priority_hint to be lower than the next
    request submission upon waking up, as we rely on the hint to decide when
    to kick the tasklet to submit that first request.
    
    Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
    Closes: https://gitlab.freedesktop.org/drm/intel/issues/10154
    Signed-off-by: Chris Wilson <[email protected]>
    Signed-off-by: Janusz Krzysztofik <[email protected]>
    Cc: Mika Kuoppala <[email protected]>
    Cc: <[email protected]> # v5.4+
    Reviewed-by: Rodrigo Vivi <[email protected]>
    Reviewed-by: Andi Shyti <[email protected]>
    Signed-off-by: Andi Shyti <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit 98850e96cf811dc2d0a7d0af491caff9f5d49c1e)
    Signed-off-by: Rodrigo Vivi <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/i915: Check before removing mm notifier [+ + +]

Author: Nirmoy Das <[email protected]>
Date:   Mon Feb 19 13:50:47 2024 +0100

    drm/i915: Check before removing mm notifier
    
    commit 01bb1ae35006e473138c90711bad1a6b614a1823 upstream.
    
    Error in mmu_interval_notifier_insert() can leave a NULL
    notifier.mm pointer. Catch that and return early.
    
    Fixes: ed29c2691188 ("drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.")
    Cc: <[email protected]> # v5.13+
    [tursulin: Added Fixes and cc stable.]
    Cc: Andi Shyti <[email protected]>
    Cc: Shawn Lee <[email protected]>
    Signed-off-by: Nirmoy Das <[email protected]>
    Reviewed-by: Rodrigo Vivi <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Tvrtko Ursulin <[email protected]>
    (cherry picked from commit db7bbd13f08774cde0332c705f042e327fe21e73)
    Signed-off-by: Joonas Lahtinen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/imx/ipuv3: do not return negative values from .get_modes() [+ + +]

Author: Jani Nikula <[email protected]>
Date:   Fri Mar 8 18:03:43 2024 +0200

    drm/imx/ipuv3: do not return negative values from .get_modes()
    
    [ Upstream commit c2da9ada64962fcd2e6395ed9987b9874ea032d3 ]
    
    The .get_modes() hooks aren't supposed to return negative error
    codes. Return 0 for no modes, whatever the reason.
    
    Cc: Philipp Zabel <[email protected]>
    Cc: [email protected]
    Acked-by: Philipp Zabel <[email protected]>
    Acked-by: Thomas Zimmermann <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/311f6eec96d47949b16a670529f4d89fcd97aefa.1709913674.git.jani.nikula@intel.com
    Signed-off-by: Jani Nikula <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/panel: do not return negative error codes from drm_panel_get_modes() [+ + +]

Author: Jani Nikula <[email protected]>
Date:   Fri Mar 8 18:03:40 2024 +0200

    drm/panel: do not return negative error codes from drm_panel_get_modes()
    
    [ Upstream commit fc4e97726530241d96dd7db72eb65979217422c9 ]
    
    None of the callers of drm_panel_get_modes() expect it to return
    negative error codes. Either they propagate the return value in their
    struct drm_connector_helper_funcs .get_modes() hook (which is also not
    supposed to return negative codes), or add it to other counts leading to
    bogus values.
    
    On the other hand, many of the struct drm_panel_funcs .get_modes() hooks
    do return negative error codes, so handle them gracefully instead of
    propagating further.
    
    Return 0 for no modes, whatever the reason.
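
    The effect on callers that add up mode counts can be sketched in
    Python. The function names are hypothetical; the hooks stand in for
    the struct drm_panel_funcs .get_modes() implementations.

```python
import errno


def panel_get_modes(hook) -> int:
    """Call a panel hook and never pass a negative error code on to the caller."""
    num = hook()
    return num if num > 0 else 0


def count_modes(hooks) -> int:
    """Model of a caller that adds the result to other mode counts."""
    return sum(panel_get_modes(hook) for hook in hooks)
```

    With the clamp in place, a hook that fails with -ENOMEM contributes
    0 modes instead of subtracting from the total.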
    
    Cc: Neil Armstrong <[email protected]>
    Cc: Jessica Zhang <[email protected]>
    Cc: Sam Ravnborg <[email protected]>
    Cc: [email protected]
    Reviewed-by: Neil Armstrong <[email protected]>
    Reviewed-by: Jessica Zhang <[email protected]>
    Acked-by: Thomas Zimmermann <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/79f559b72d8c493940417304e222a4b04dfa19c4.1709913674.git.jani.nikula@intel.com
    Signed-off-by: Jani Nikula <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/vc4: hdmi: do not return negative values from .get_modes() [+ + +]

Author: Jani Nikula <[email protected]>
Date:   Fri Mar 8 18:03:44 2024 +0200

    drm/vc4: hdmi: do not return negative values from .get_modes()
    
    [ Upstream commit abf493988e380f25242c1023275c68bd3579c9ce ]
    
    The .get_modes() hooks aren't supposed to return negative error
    codes. Return 0 for no modes, whatever the reason.
    
    Cc: Maxime Ripard <[email protected]>
    Cc: [email protected]
    Acked-by: Maxime Ripard <[email protected]>
    Acked-by: Thomas Zimmermann <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/dcda6d4003e2c6192987916b35c7304732800e08.1709913674.git.jani.nikula@intel.com
    Signed-off-by: Jani Nikula <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/vmwgfx: Fix possible null pointer derefence with invalid contexts [+ + +]

Author: Zack Rusin <[email protected]>
Date:   Wed Jan 10 15:03:05 2024 -0500

    drm/vmwgfx: Fix possible null pointer derefence with invalid contexts
    
    [ Upstream commit 517621b7060096e48e42f545fa6646fc00252eac ]
    
    vmw_context_cotable can return either an error or a null pointer and its
    usage sometimes went unchecked. Subsequent code would then try to access
    either a null pointer or an error value.
    
    The invalid dereferences were only possible with malformed userspace
    apps which never properly initialized the rendering contexts.
    
    Check the results of vmw_context_cotable to fix the invalid derefs.
    
    Thanks:
    ziming zhang(@ezrak1e) from Ant Group Light-Year Security Lab
    who was the first person to discover it.
    Niels De Graef who reported it and helped to track down the poc.
    
    Fixes: 9c079b8ce8bf ("drm/vmwgfx: Adapt execbuf to the new validation api")
    Cc: <[email protected]> # v4.20+
    Reported-by: Niels De Graef  <[email protected]>
    Signed-off-by: Zack Rusin <[email protected]>
    Cc: Martin Krastev <[email protected]>
    Cc: Maaz Mombasawala <[email protected]>
    Cc: Ian Forbes <[email protected]>
    Cc: Broadcom internal kernel review list <[email protected]>
    Cc: [email protected]
    Reviewed-by: Maaz Mombasawala <[email protected]>
    Reviewed-by: Martin Krastev <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

efivarfs: Request at most 512 bytes for variable names [+ + +]

Author: Tim Schumacher <[email protected]>
Date:   Fri Jan 26 17:25:23 2024 +0100

    efivarfs: Request at most 512 bytes for variable names
    
    commit f45812cc23fb74bef62d4eb8a69fe7218f4b9f2a upstream.
    
    Work around a quirk in a few old (2011-ish) UEFI implementations, where
    a call to `GetNextVariableName` with a buffer size larger than 512 bytes
    will always return EFI_INVALID_PARAMETER.
    
    There is some lore around EFI variable names being up to 1024 bytes in
    size, but this has no basis in the UEFI specification, and the upper
    bounds are typically platform specific, and apply to the entire variable
    (name plus payload).
    
    Given that Linux does not permit creating files with names longer than
    NAME_MAX (255) bytes, 512 bytes (== 256 UTF-16 characters) is a
    reasonable limit.
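
    The size arithmetic can be checked with a short Python sketch. It
    assumes names made of characters that take one UTF-16 code unit each
    and a terminating NUL code unit; the helper name is hypothetical.

```python
NAME_MAX = 255   # longest file name Linux permits, in bytes
UTF16_UNIT = 2   # bytes per UTF-16 code unit


def name_buffer_bytes(name: str) -> int:
    """Bytes needed to hold name as NUL-terminated UTF-16."""
    return len(name.encode("utf-16-le")) + UTF16_UNIT
```

    Under these assumptions, a name of NAME_MAX ASCII characters needs
    exactly 512 bytes, so a 512-byte request covers every name that could
    become a file name.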
    
    Cc: <[email protected]> # 6.1+
    Signed-off-by: Tim Schumacher <[email protected]>
    Signed-off-by: Ard Biesheuvel <[email protected]>
    [[email protected]: adjusted diff for changed context and code move]
    Signed-off-by: Tim Schumacher <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

entry: Respect changes to system call number by trace_sys_enter() [+ + +]

Author: André Rösti <[email protected]>
Date:   Mon Mar 11 21:17:04 2024 +0000

    entry: Respect changes to system call number by trace_sys_enter()
    
    [ Upstream commit fb13b11d53875e28e7fbf0c26b288e4ea676aa9f ]
    
    When a probe is registered at the trace_sys_enter() tracepoint, and that
    probe changes the system call number, the old system call still gets
    executed.  This worked correctly until commit b6ec41346103 ("core/entry:
    Report syscall correctly for trace and audit"), which removed the
    re-evaluation of the syscall number after the trace point.
    
    Restore the original semantics by re-evaluating the system call number
    after trace_sys_enter().
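
    The restored semantics can be modeled in Python. This is a sketch
    only: regs["nr"] is a hypothetical stand-in for the register holding
    the system call number, and the probes model callbacks attached to
    the tracepoint.

```python
def syscall_enter(regs: dict, probes: list, reevaluate: bool) -> int:
    """Return the system call number that would be dispatched."""
    syscall = regs["nr"]
    if probes:
        for probe in probes:
            probe(regs)           # a probe may rewrite regs["nr"]
        if reevaluate:
            syscall = regs["nr"]  # pick up the change made by the probe
    return syscall


def redirect_to_39(regs: dict) -> None:
    """Example probe that changes the system call number."""
    regs["nr"] = 39
```

    With reevaluate set to False the model reproduces the regression: the
    probe's change is ignored and the old number is dispatched. The
    re-read only happens when at least one probe is registered.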
    
    The performance impact of this re-evaluation is minimal because it only
    takes place when a trace point is active, and compared to the actual trace
    point overhead the read from a cache hot variable is negligible.
    
    Fixes: b6ec41346103 ("core/entry: Report syscall correctly for trace and audit")
    Signed-off-by: André Rösti <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

erspan: make sure erspan_base_hdr is present in skb->head [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Thu Mar 28 11:22:48 2024 +0000

    erspan: make sure erspan_base_hdr is present in skb->head
    
    commit 17af420545a750f763025149fa7b833a4fc8b8f0 upstream.
    
    syzbot reported a problem in ip6erspan_rcv() [1]
    
    Issue is that ip6erspan_rcv() (and erspan_rcv()) no longer make
    sure erspan_base_hdr is present in skb linear part (skb->head)
    before getting @ver field from it.
    
    Add the missing pskb_may_pull() calls.
    
    v2: Reload iph pointer in erspan_rcv() after pskb_may_pull()
        because skb->head might have changed.
    
    [1]
    
     BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
     BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2756 [inline]
     BUG: KMSAN: uninit-value in ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
     BUG: KMSAN: uninit-value in gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
      pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
      pskb_may_pull include/linux/skbuff.h:2756 [inline]
      ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
      gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
      ip6_protocol_deliver_rcu+0x1d4c/0x2ca0 net/ipv6/ip6_input.c:438
      ip6_input_finish net/ipv6/ip6_input.c:483 [inline]
      NF_HOOK include/linux/netfilter.h:314 [inline]
      ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492
      ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586
      dst_input include/net/dst.h:460 [inline]
      ip6_rcv_finish+0x955/0x970 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:314 [inline]
      ipv6_rcv+0xde/0x390 net/ipv6/ip6_input.c:310
      __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
      __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5652
      netif_receive_skb_internal net/core/dev.c:5738 [inline]
      netif_receive_skb+0x58/0x660 net/core/dev.c:5798
      tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1549
      tun_get_user+0x5566/0x69e0 drivers/net/tun.c:2002
      tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
      call_write_iter include/linux/fs.h:2108 [inline]
      new_sync_write fs/read_write.c:497 [inline]
      vfs_write+0xb63/0x1520 fs/read_write.c:590
      ksys_write+0x20f/0x4c0 fs/read_write.c:643
      __do_sys_write fs/read_write.c:655 [inline]
      __se_sys_write fs/read_write.c:652 [inline]
      __x64_sys_write+0x93/0xe0 fs/read_write.c:652
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Uninit was created at:
      slab_post_alloc_hook mm/slub.c:3804 [inline]
      slab_alloc_node mm/slub.c:3845 [inline]
      kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
      __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
      alloc_skb include/linux/skbuff.h:1318 [inline]
      alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
      sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
      tun_alloc_skb drivers/net/tun.c:1525 [inline]
      tun_get_user+0x209a/0x69e0 drivers/net/tun.c:1846
      tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
      call_write_iter include/linux/fs.h:2108 [inline]
      new_sync_write fs/read_write.c:497 [inline]
      vfs_write+0xb63/0x1520 fs/read_write.c:590
      ksys_write+0x20f/0x4c0 fs/read_write.c:643
      __do_sys_write fs/read_write.c:655 [inline]
      __se_sys_write fs/read_write.c:652 [inline]
      __x64_sys_write+0x93/0xe0 fs/read_write.c:652
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    CPU: 1 PID: 5045 Comm: syz-executor114 Not tainted 6.9.0-rc1-syzkaller-00021-g962490525cff #0
    
    Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup")
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/netdev/[email protected]/
    Signed-off-by: Eric Dumazet <[email protected]>
    Cc: Lorenzo Bianconi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack() [+ + +]

Author: Max Filippov <[email protected]>
Date:   Wed Mar 20 11:26:07 2024 -0700

    exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
    
    commit 2aea94ac14d1e0a8ae9e34febebe208213ba72f7 upstream.
    
    In a NOMMU kernel, the value of linux_binprm::p is the offset inside the
    temporary program arguments array maintained in separate pages in
    linux_binprm::page. linux_binprm::exec, being a copy of linux_binprm::p,
    must therefore be adjusted when that array is copied to the user stack.
    Without that adjustment, the value passed by the NOMMU kernel to the ELF
    program in the AT_EXECFN entry of the aux array doesn't make any sense
    and may break programs that try to access memory pointed to by that
    entry.
    
    Adjust linux_binprm::exec before the successful return from the
    transfer_args_to_stack().
    
    Cc: <[email protected]>
    Fixes: b6a2fea39318 ("mm: variable length argument support")
    Fixes: 5edc2a5123a7 ("binfmt_elf_fdpic: wire up AT_EXECFD, AT_EXECFN, AT_SECURE")
    Signed-off-by: Max Filippov <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Kees Cook <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

exit: Implement kthread_exit [+ + +]

Author: Eric W. Biederman <[email protected]>
Date:   Mon Nov 22 10:27:36 2021 -0600

    exit: Implement kthread_exit
    
    [ Upstream commit bbda86e988d4c124e4cfa816291cbd583ae8bfb1 ]
    
    The way the per task_struct exit_code is used by kernel threads is not
    quite compatible with how it is used by userspace applications.  The
    low byte of the userspace exit_code value encodes the exit signal,
    while kthreads just use the value as an int holding an ordinary kernel
    function exit status like -EPERM.
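
    The mismatch can be illustrated in Python. The sketch assumes the
    conventional wait-status layout (signal number in the low 7 bits,
    exit status in bits 8 to 15); the helper name is hypothetical.

```python
import errno


def decode_user_exit_code(code: int):
    """Split a userspace-style exit_code into (exit status, signal number)."""
    return (code >> 8) & 0xFF, code & 0x7F
```

    Decoding 3 << 8 gives exit status 3 with no signal, and decoding 9
    gives signal 9. Decoding a kthread-style value such as -errno.EPERM
    gives exit status 255 and signal 127, which has no sensible meaning.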
    
    Add kthread_exit to clearly separate the two kinds of uses.
    
    Signed-off-by: "Eric W. Biederman" <[email protected]>
    Stable-dep-of: ca3574bd653a ("exit: Rename module_put_and_exit to module_put_and_kthread_exit")
    Signed-off-by: Chuck Lever <[email protected]>

exit: Rename module_put_and_exit to module_put_and_kthread_exit [+ + +]

Author: Eric W. Biederman <[email protected]>
Date:   Fri Dec 3 11:00:19 2021 -0600

    exit: Rename module_put_and_exit to module_put_and_kthread_exit
    
    [ Upstream commit ca3574bd653aba234a4b31955f2778947403be16 ]
    
    Update module_put_and_exit to call kthread_exit instead of do_exit.
    
    Change the name to reflect this change in functionality.  All of the
    users of module_put_and_exit are causing the current kthread to exit
    so this change makes it clear what is happening.  There is no
    functional change.
    
    Signed-off-by: "Eric W. Biederman" <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

exportfs: use pr_debug for unreachable debug statements [+ + +]

Author: David Disseldorp <[email protected]>
Date:   Fri Oct 21 14:24:14 2022 +0200

    exportfs: use pr_debug for unreachable debug statements
    
    [ Upstream commit 427505ffeaa464f683faba945a88d3e3248f6979 ]
    
    expfs.c has a bunch of dprintk statements which are unusable due to:
     #define dprintk(fmt, args...) do{}while(0)
    Use pr_debug so that they can be enabled dynamically.
    Also make some minor changes to the debug statements to fix some
    incorrect types, and remove __func__ which can be handled by dynamic
    debug separately.
    
    Signed-off-by: David Disseldorp <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

ext4: correct best extent lstart adjustment logic [+ + +]

Author: Baokun Li <[email protected]>
Date:   Thu Feb 1 22:18:45 2024 +0800

    ext4: correct best extent lstart adjustment logic
    
    [ Upstream commit 4fbf8bc733d14bceb16dda46a3f5e19c6a9621c5 ]
    
    When yangerkun reviewed commit 93cdf49f6eca ("ext4: Fix best extent lstart
    adjustment logic in ext4_mb_new_inode_pa()"), it was found that the best
    extent did not completely cover the original request after adjusting the
    best extent lstart in ext4_mb_new_inode_pa() as follows:
    
      original request: 2/10(8)
      normalized request: 0/64(64)
      best extent: 0/9(9)
    
    When we check if best ex can be kept at start of goal, ac_o_ex.fe_logical
    is 2 less than the adjusted best extent logical end 9, so we think the
    adjustment is done. But obviously 0/9(9) doesn't cover 2/10(8), so we
    should determine here if the original request logical end is less than or
    equal to the adjusted best extent logical end.
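
    The two comparisons can be sketched as follows. This is an
    illustration only: struct sketch_extent and the helper names are
    hypothetical, and extents are read as start/end(len) with an
    exclusive end, as in the example above.

```c
#include <stdbool.h>

/* Hypothetical extent: logical start block and length in blocks. */
struct sketch_extent {
    unsigned int start;
    unsigned int len;
};

/* Exclusive logical end of an extent. */
static unsigned int sketch_end(struct sketch_extent e)
{
    return e.start + e.len;
}

/* Old test as described above: only the request start is compared. */
static bool sketch_old_check(struct sketch_extent req,
                             struct sketch_extent best)
{
    return req.start < sketch_end(best);
}

/* Corrected test: the request end must not exceed the best extent end. */
static bool sketch_new_check(struct sketch_extent req,
                             struct sketch_extent best)
{
    return sketch_end(req) <= sketch_end(best);
}
```

    For the request 2/10(8) and the best extent 0/9(9), the old test
    accepts the adjustment while the corrected test rejects it.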
    
    In addition, add a comment stating when adjusted best_ex will not cover
    the original request, and remove the duplicate assertion because adjusting
    lstart makes no change to b_ex.fe_len.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
    Cc:  <[email protected]>
    Signed-off-by: yangerkun <[email protected]>
    Signed-off-by: Baokun Li <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Reviewed-by: Ojaswin Mujoo <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ext4: fix corruption during on-line resize [+ + +]

Author: Maximilian Heyne <[email protected]>
Date:   Thu Feb 15 15:50:09 2024 +0000

    ext4: fix corruption during on-line resize
    
    [ Upstream commit a6b3bfe176e8a5b05ec4447404e412c2a3fc92cc ]
    
    We observed a corruption during on-line resize of a file system that is
    larger than 16 TiB with 4k block size. With more than 2^32 blocks,
    resize_inode is turned off by default by mke2fs. The issue can be
    reproduced on a smaller file system for convenience by explicitly
    turning off resize_inode. An on-line resize across an 8 GiB boundary (the
    size of a meta block group in this setup) then leads to a corruption:
    
      dev=/dev/<some_dev> # should be >= 16 GiB
      mkdir -p /corruption
      /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15))
      mount -t ext4 $dev /corruption
    
      dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15))
      sha1sum /corruption/test
      # 79d2658b39dcfd77274e435b0934028adafaab11  /corruption/test
    
      /sbin/resize2fs $dev $((2*2**21))
      # drop page cache to force reload the block from disk
      echo 1 > /proc/sys/vm/drop_caches
    
      sha1sum /corruption/test
      # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3  /corruption/test
    
    2^21 = 2^15*2^6 equals 8 GiB whereof 2^15 is the number of blocks per
    block group and 2^6 are the number of block groups that make a meta
    block group.
    
    The last checksum might be different depending on how the file is laid
    out across the physical blocks. The actual corruption occurs at physical
    block 63*2^15 = 2064384 which would be the location of the backup of the
    meta block group's block descriptor. During the on-line resize the file
    system will be converted to meta_bg starting at s_first_meta_bg which is
    2 in the example - meaning all block groups after 16 GiB. However, in
    ext4_flex_group_add we might add block groups that are not part of the
    first meta block group yet. In the reproducer we achieved this by
    subtracting the size of a whole block group from the point where the
    meta block group would start. This must be considered when updating the
    backup block group descriptors to follow the non-meta_bg layout. The fix
    is to add a test whether the group to add is already part of the meta
    block group or not.
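
    The numbers used above can be checked with a small sketch. The
    constants are taken from the reproducer (4 KiB blocks, 2^15 blocks per
    group, 2^6 groups per meta block group); the macro and function names
    are hypothetical.

```c
#include <stdint.h>

/* Values from the reproducer described above. */
#define SKETCH_BLOCK_SIZE        4096ULL
#define SKETCH_BLOCKS_PER_GROUP  (1ULL << 15)
#define SKETCH_GROUPS_PER_METABG (1ULL << 6)

/* Bytes covered by one meta block group. */
static uint64_t sketch_meta_bg_bytes(void)
{
    return SKETCH_BLOCK_SIZE * SKETCH_BLOCKS_PER_GROUP *
           SKETCH_GROUPS_PER_METABG;
}

/* First physical block of a given block group. */
static uint64_t sketch_group_first_block(uint64_t group)
{
    return group * SKETCH_BLOCKS_PER_GROUP;
}
```

    sketch_meta_bg_bytes() yields 8 GiB, and sketch_group_first_block(63)
    yields 2064384, the block where the corruption was observed.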
    
    Fixes: 01f795f9e0d67 ("ext4: add online resizing support for meta_bg and 64-bit file systems")
    Cc:  <[email protected]>
    Signed-off-by: Maximilian Heyne <[email protected]>
    Tested-by: Srivathsa Dara <[email protected]>
    Reviewed-by: Srivathsa Dara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ext4: fix error code saved on super block during file system abort [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Tue Oct 26 14:33:02 2021 -0300

    ext4: fix error code saved on super block during file system abort
    
    [ Upstream commit 124e7c61deb27d758df5ec0521c36cf08d417f7a ]
    
    ext4_abort will eventually call ext4_errno_to_code, which translates the
    errno to an EXT4_ERR specific error.  This means that ext4_abort expects
    an errno.  By using EXT4_ERR_ here, it gets misinterpreted (as an errno),
    and ends up saving EXT4_ERR_EBUSY on the superblock during an abort,
    which makes no sense.
    
    ESHUTDOWN will get properly translated to EXT4_ERR_SHUTDOWN, so use that
    instead.
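
    A simplified sketch of why the wrong kind of value is misread. The
    enum below is hypothetical: its names follow the EXT4_ERR_* scheme,
    but the numeric values are assumptions made for this illustration, and
    sketch_errno_to_code() is only a stand-in for ext4_errno_to_code().

```c
#include <errno.h>

/* Hypothetical on-disk codes; numeric values are assumptions. */
enum sketch_err_code {
    SKETCH_ERR_UNKNOWN   = 1,
    SKETCH_ERR_EBUSY     = 13,
    SKETCH_ERR_ESHUTDOWN = 16,
};

/* Stand-in translator: the argument is always interpreted as an errno. */
static int sketch_errno_to_code(int err)
{
    switch (err) {
    case EBUSY:
        return SKETCH_ERR_EBUSY;
    case ESHUTDOWN:
        return SKETCH_ERR_ESHUTDOWN;
    default:
        return SKETCH_ERR_UNKNOWN;
    }
}
```

    Passing ESHUTDOWN gives the shutdown code. Passing the already
    translated code instead makes the translator treat it as an errno; on
    a system where EBUSY is 16, the assumed shutdown code 16 would come
    back as the EBUSY code, which matches the symptom described above.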
    
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

ext4: Send notifications on error [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:44 2021 -0300

    ext4: Send notifications on error
    
    [ Upstream commit 9a089b21f79b47eed240d4da7ea0d049de7c9b4d ]
    
    Send a FS_ERROR message via fsnotify to a userspace monitoring tool
    whenever an ext4 error condition is triggered.  This follows the existing
    error conditions in ext4, so it is hooked to the ext4_error* functions.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Acked-by: Theodore Ts'o <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Add helpers to decide whether to report FID/DFID [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:38 2021 -0300

    fanotify: Add helpers to decide whether to report FID/DFID
    
    [ Upstream commit 4bd5a5c8e6e5cd964e9738e6ef87f6c2cb453edf ]
    
    Now that there is an event that reports FID records even for a zeroed
    file handle, wrap the logic that decides whether to issue the records
    into helper functions.  This shouldn't have any impact on the code, but
    simplifies further patches.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Allow file handle encoding for unhashed events [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:29 2021 -0300

    fanotify: Allow file handle encoding for unhashed events
    
    [ Upstream commit 74fe4734897a2da2ae2a665a5e622cd490d36eaf ]
    
    Allow passing a NULL hash to fanotify_encode_fh and avoid calculating
    the hash if not needed.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Jan Kara <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Allow users to request FAN_FS_ERROR events [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:43 2021 -0300

    fanotify: Allow users to request FAN_FS_ERROR events
    
    [ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
    
    Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing
    user space to request the monitoring of FAN_FS_ERROR events.
    
    These events are limited to filesystem marks, so check it is the
    case in the syscall handler.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: cleanups for fanotify_mark() input validations [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Wed Jun 29 17:42:09 2022 +0300

    fanotify: cleanups for fanotify_mark() input validations
    
    [ Upstream commit 8afd7215aa97f8868d033f6e1d01a276ab2d29c0 ]
    
    Create helper fanotify_may_update_existing_mark() for checking for
    conflicts between existing mark flags and fanotify_mark() flags.
    
    Use variable mark_cmd to make the checks for mark command bits
    cleaner.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: create helper fanotify_mark_user_flags() [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:23 2022 +0300

    fanotify: create helper fanotify_mark_user_flags()
    
    [ Upstream commit 4adce25ccfff215939ee465b8c0aa70526d5c352 ]
    
    To translate from fsnotify mark flags to user visible flags.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: do not allow setting dirent events in mask of non-dir [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Sat May 7 11:00:28 2022 +0300

    fanotify: do not allow setting dirent events in mask of non-dir
    
    [ Upstream commit ceaf69f8eadcafb323392be88e7a5248c415d423 ]
    
    Dirent events (create/delete/move) are only reported on watched
    directory inodes, but in fanotify as well as in legacy inotify, it was
    always allowed to set them on non-dir inode, which does not result in
    any meaningful outcome.
    
    Until kernel v5.17, dirent events in fanotify also differed from events
    "on child" (e.g. FAN_OPEN) in the information provided in the event.
    For example, FAN_OPEN could be set in the mask of a non-dir or the mask
    of its parent and event would report the fid of the child regardless of
    the marked object.
    By contrast, FAN_DELETE is not reported if the child is marked and the
    child fid was not reported in the events.
    
    Since kernel v5.17, with fanotify group flag FAN_REPORT_TARGET_FID, the
    fid of the child is reported with dirent events, like events "on child",
    which may create confusion for users expecting the same behavior as
    events "on child" when setting events in the mask on a child.
    
    The desired semantics of setting dirent events in the mask of a child
    are not clear, so for now, deny this action for a group initialized
    with flag FAN_REPORT_TARGET_FID and for the new event FAN_RENAME.
    We may relax this restriction in the future if we decide on the
    semantics and implement them.
    
    Fixes: d61fd650e9d2 ("fanotify: introduce group flag FAN_REPORT_TARGET_FID")
    Fixes: 8cc3b1ccd930 ("fanotify: wire up FAN_RENAME event")
    Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Emit generic error info for error event [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:42 2021 -0300

    fanotify: Emit generic error info for error event
    
    [ Upstream commit 130a3c742107acff985541c28360c8b40203559c ]
    
    The error info is a record sent to users on FAN_FS_ERROR events
    documenting the type of error.  It also carries an error count,
    documenting how many errors were observed since the last reporting.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: enable "evictable" inode marks [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:27 2022 +0300

    fanotify: enable "evictable" inode marks
    
    [ Upstream commit 5f9d3bd520261fd7a850818c71809fd580e0f30c ]
    
    Now that the direct reclaim path is handled we can enable evictable
    inode marks.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Encode empty file handle when no inode is provided [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:30 2021 -0300

    fanotify: Encode empty file handle when no inode is provided
    
    [ Upstream commit 272531ac619b374ab474e989eb387162fded553f ]
    
    Instead of failing, encode an invalid file handle in fanotify_encode_fh
    if no inode is provided.  This bogus file handle will be reported by
    FAN_FS_ERROR for non-inode errors.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: factor out helper fanotify_mark_update_flags() [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:24 2022 +0300

    fanotify: factor out helper fanotify_mark_update_flags()
    
    [ Upstream commit 8998d110835e3781ccd3f1ae061a590b4aaba911 ]
    
    Handle FAN_MARK_IGNORED_SURV_MODIFY flag change in a helper that
    is called after updating the mark mask.
    
    Replace the added and removed return values and help variables with
    bool recalc return values and help variable, which makes the code a
    bit easier to follow.
    
    Rename flags argument to fan_flags to emphasize the difference from
    mark->flags.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: fix incorrect fmode_t casts [+ + +]

Author: Vasily Averin <[email protected]>
Date:   Sun May 22 15:08:02 2022 +0300

    fanotify: fix incorrect fmode_t casts
    
    [ Upstream commit dccd855771b37820b6d976a99729c88259549f85 ]
    
    Fixes sparse warnings:
    fs/notify/fanotify/fanotify_user.c:267:63: sparse:
     warning: restricted fmode_t degrades to integer
    fs/notify/fanotify/fanotify_user.c:1351:28: sparse:
     warning: restricted fmode_t degrades to integer
    
    FMODE_NONOTIFY has bitwise fmode_t type and requires __force attribute
    for any casts.
    
    Signed-off-by: Vasily Averin <[email protected]>
    Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Fold event size calculation to its own function [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:20 2021 -0300

    fanotify: Fold event size calculation to its own function
    
    [ Upstream commit b9928e80dda84b349ba8de01780b9bef2fc36ffa ]
    
    Every time this function is invoked, it is immediately added to
    FAN_EVENT_METADATA_LEN, since there is no need to just calculate the
    length of info records. This minor clean up folds the rest of the
    calculation into the function, which now operates in terms of events,
    returning the size of the entire event, including metadata.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: implement "evictable" inode marks [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:25 2022 +0300

    fanotify: implement "evictable" inode marks
    
    [ Upstream commit 7d5e005d982527e4029b0139823d179986e34cdc ]
    
    When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not
    pin the marked inode to inode cache, so when inode is evicted from cache
    due to memory pressure, the mark will be lost.
    
    When an inode mark with flag FAN_MARK_EVICTABLE is updated without using
    this flag, the marked inode is pinned to inode cache.
    
    When an inode mark is updated with flag FAN_MARK_EVICTABLE but an
    existing mark already has the inode pinned, the mark update fails with
    error EEXIST.
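
    The three rules above can be modelled as a small state machine. This
    is a sketch only: struct sketch_mark and sketch_mark_update() are
    hypothetical and track nothing but whether the mark exists and whether
    it pins the inode.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an inode mark. */
struct sketch_mark {
    bool exists;  /* a mark has been created */
    bool pinned;  /* the mark pins the inode in the inode cache */
};

/* Create or update a mark; `evictable` models FAN_MARK_EVICTABLE. */
static int sketch_mark_update(struct sketch_mark *m, bool evictable)
{
    if (!m->exists) {
        /* New mark: evictable marks do not pin the inode. */
        m->exists = true;
        m->pinned = !evictable;
        return 0;
    }
    /* An already pinned inode cannot become evictable. */
    if (evictable && m->pinned)
        return -EEXIST;
    /* Updating without the flag pins the inode. */
    if (!evictable)
        m->pinned = true;
    return 0;
}
```

    Creating an evictable mark and later updating it without the flag
    leaves the inode pinned; a further update with the flag then returns
    -EEXIST.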
    
    Evictable inode marks can be used to set up inode marks with ignored mask
    to suppress events from uninteresting files or directories in a lazy
    manner, upon receiving the first event, without having to iterate all
    the uninteresting files or directories beforehand.
    
    The evictable inode mark feature allows performing this lazy marks setup
    without exhausting the system memory with pinned inodes.
    
    This change does not enable the feature yet.
    
    Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: introduce FAN_MARK_IGNORE [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Wed Jun 29 17:42:10 2022 +0300

    fanotify: introduce FAN_MARK_IGNORE
    
    [ Upstream commit e252f2ed1c8c6c3884ab5dd34e003ed21f1fe6e0 ]
    
    This flag is a new way to configure ignore mask which allows adding and
    removing the event flags FAN_ONDIR and FAN_EVENT_ON_CHILD in ignore mask.
    
    The legacy FAN_MARK_IGNORED_MASK flag would always ignore events on
    directories and would ignore events on children depending on whether
    the FAN_EVENT_ON_CHILD flag was set in the (non ignored) mask.
    
    FAN_MARK_IGNORE can be used to ignore events on children without setting
    FAN_EVENT_ON_CHILD in the mark's mask and will not ignore events on
    directories unconditionally, only when FAN_ONDIR is set in ignore mask.
    
    The new behavior is non-downgradable.  After calling fanotify_mark() with
    FAN_MARK_IGNORE once, calling fanotify_mark() with FAN_MARK_IGNORED_MASK
    on the same object will return EEXIST error.
    
    Setting the event flags with FAN_MARK_IGNORE on a non-dir inode mark
    has no meaning and will return ENOTDIR error.
    
    The meaning of FAN_MARK_IGNORED_SURV_MODIFY is preserved with the new
    FAN_MARK_IGNORE flag, but with a few semantic differences:
    
    1. FAN_MARK_IGNORED_SURV_MODIFY is required for filesystem and mount
       marks and on an inode mark on a directory. Omitting this flag
       will return EINVAL or EISDIR error.
    
    2. An ignore mask on a non-directory inode that survives modify could
       never be downgraded to an ignore mask that does not survive modify.
       With new FAN_MARK_IGNORE semantics we make that rule explicit -
       trying to update a surviving ignore mask without the flag
       FAN_MARK_IGNORED_SURV_MODIFY will return EEXIST error.
    
    The convenience macro FAN_MARK_IGNORE_SURV is added for
    (FAN_MARK_IGNORE | FAN_MARK_IGNORED_SURV_MODIFY), because the
    common case should use short constant names.
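
    Rule 1 above can be sketched as a flag check. The flag bits and names
    below are hypothetical stand-ins, not the UAPI values, and the
    assignment of EINVAL to mount and filesystem marks and EISDIR to
    directory inode marks is an assumption, since the text only says
    "EINVAL or EISDIR".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits; not the UAPI values. */
#define SKETCH_IGNORE       0x1u  /* models FAN_MARK_IGNORE */
#define SKETCH_SURV_MODIFY  0x2u  /* models FAN_MARK_IGNORED_SURV_MODIFY */
#define SKETCH_IGNORE_SURV  (SKETCH_IGNORE | SKETCH_SURV_MODIFY)

enum sketch_mark_type { SKETCH_INODE, SKETCH_MOUNT, SKETCH_FILESYSTEM };

/* Returns 0 if the combination is allowed, else a negative errno. */
static int sketch_check_ignore(unsigned int flags,
                               enum sketch_mark_type type, bool is_dir)
{
    /* Nothing to check without the new flag, or when it survives modify. */
    if (!(flags & SKETCH_IGNORE) || (flags & SKETCH_SURV_MODIFY))
        return 0;
    /* Assumed mapping: mount and filesystem marks give EINVAL. */
    if (type != SKETCH_INODE)
        return -EINVAL;
    /* Assumed mapping: an inode mark on a directory gives EISDIR. */
    return is_dir ? -EISDIR : 0;
}
```

    Using the combined SKETCH_IGNORE_SURV value, which models the
    convenience macro, passes the check for every mark type.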
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: introduce group flag FAN_REPORT_TARGET_FID [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:29 2021 +0200

    fanotify: introduce group flag FAN_REPORT_TARGET_FID
    
    [ Upstream commit d61fd650e9d206a71fda789f02a1ced4b19944c4 ]
    
    FAN_REPORT_FID is ambiguous in that it reports the fid of the child for
    some events and the fid of the parent for create/delete/move events.
    
    The new FAN_REPORT_TARGET_FID flag is an implicit request to report
    the fid of the target object of the operation (a.k.a the child inode)
    also in create/delete/move events in addition to the fid of the parent
    and the name of the child.
    
    To reduce the test matrix for uninteresting use cases, the new
    FAN_REPORT_TARGET_FID flag requires both FAN_REPORT_NAME and
    FAN_REPORT_FID.  The convenience macro FAN_REPORT_DFID_NAME_TARGET
    combines FAN_REPORT_TARGET_FID with all the required flags.
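
    The dependency between the flags can be sketched as a validity check.
    The bit values and names below are hypothetical and model only the
    requirement stated above; any further dependencies among the real
    flags are left out.

```c
#include <stdbool.h>

/* Hypothetical bits; not the UAPI values. */
#define SKETCH_REPORT_FID         0x1u
#define SKETCH_REPORT_NAME        0x2u
#define SKETCH_REPORT_TARGET_FID  0x4u
/* Models a convenience macro bundling the target flag with its needs. */
#define SKETCH_REPORT_NAME_TARGET \
    (SKETCH_REPORT_FID | SKETCH_REPORT_NAME | SKETCH_REPORT_TARGET_FID)

/* The target flag is valid only together with both FID and NAME. */
static bool sketch_init_flags_valid(unsigned int flags)
{
    const unsigned int required = SKETCH_REPORT_FID | SKETCH_REPORT_NAME;

    if (!(flags & SKETCH_REPORT_TARGET_FID))
        return true;
    return (flags & required) == required;
}
```

    Requesting the target flag with only one of the two required flags is
    rejected, while the combined value is accepted.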
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Pre-allocate pool of error events [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:34 2021 -0300

    fanotify: Pre-allocate pool of error events
    
    [ Upstream commit 734a1a5eccc5f7473002b0669f788e135f1f64aa ]
    
    Pre-allocate slots for file system errors to have greater chances of
    succeeding, since error events can happen in GFP_NOFS context.  This
    patch introduces a group-wide mempool of error events, shared by all
    FAN_FS_ERROR marks in this group.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: prepare for setting event flags in ignore mask [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Wed Jun 29 17:42:08 2022 +0300

    fanotify: prepare for setting event flags in ignore mask
    
    [ Upstream commit 31a371e419c885e0f137ce70395356ba8639dc52 ]
    
    Setting flags FAN_ONDIR and FAN_EVENT_ON_CHILD in ignore mask has no effect.
    The FAN_EVENT_ON_CHILD flag in mask implicitly applies to ignore mask and
    ignore mask is always implicitly applied to events on directories.
    
    Define a mark flag that replaces this legacy behavior with logic of
    applying the ignore mask according to event flags in ignore mask.
    
    Implement the new logic to prepare for supporting an ignore mask that
    ignores events on children and ignore mask that does not ignore events
    on directories.
    
    To emphasize the change in terminology, also rename ignored_mask mark
    member to ignore_mask and use accessors to get only the effective
    ignored events or the ignored events and flags.
    
    This change in terminology finally aligns with the "ignore mask"
    language in man pages and in most of the comments.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: record either old name new name or both for FAN_RENAME [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:35 2021 +0200

    fanotify: record either old name new name or both for FAN_RENAME
    
    [ Upstream commit 2bfbcccde6e7a787feabad4645f628f963fe0663 ]
    
    We do not want to report the dirfid+name of a directory whose
    inode/sb are not watched, because watcher may not have permissions
    to see the directory content.
    
    Use an internal iter_info to indicate to fanotify_alloc_event()
    which marks of this group are watching FAN_RENAME, so it can decide
    if we need to record only the old parent+name, new parent+name or both.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    [JK: Modified code to pass around only mask of mark types matching
    generated event]
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: record old and new parent and name in FAN_RENAME event [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:34 2021 +0200

    fanotify: record old and new parent and name in FAN_RENAME event
    
    [ Upstream commit 3982534ba5ce45e890b2f5ef5e7372c1accd14c7 ]
    
    In the special case of FAN_RENAME event, we record both the old
    and new parent and name.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: refine the validation checks on non-dir inode mask [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Jun 27 20:47:19 2022 +0300

    fanotify: refine the validation checks on non-dir inode mask
    
    [ Upstream commit 8698e3bab4dd7968666e84e111d0bfd17c040e77 ]
    
    Commit ceaf69f8eadc ("fanotify: do not allow setting dirent events in
    mask of non-dir") added restrictions about setting dirent events in the
    mask of a non-dir inode mark, which does not make any sense.
    
    For backward compatibility, these restrictions were added only to new
    (v5.17+) APIs.
    
    It also does not make any sense to set the flags FAN_EVENT_ON_CHILD or
    FAN_ONDIR in the mask of a non-dir inode.  Add these flags to the
    dir-only restriction of the new APIs as well.
    
    Move the check of the dir-only flags for new APIs into the helper
    fanotify_events_supported(), which is only called for FAN_MARK_ADD,
    because there is no need to error on an attempt to remove the dir-only
    flags from non-dir inode.
    
    Fixes: ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir")
    Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    [ cel: adjusted to apply on v5.15.y ]
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Remove obsoleted fanotify_event_has_path() [+ + +]

Author: Gaosheng Cui <[email protected]>
Date:   Mon Sep 26 10:30:18 2022 +0800

    fanotify: Remove obsoleted fanotify_event_has_path()
    
    [ Upstream commit 7a80bf902d2bc722b4477442ee772e8574603185 ]
    
    All uses of fanotify_event_has_path() have
    been removed since commit 9c61f3b560f5 ("fanotify: break up
    fanotify_alloc_event()"), now it is useless, so remove it.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Gaosheng Cui <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>

fanotify: Report fid info for file related file system errors [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:41 2021 -0300

    fanotify: Report fid info for file related file system errors
    
    [ Upstream commit 936d6a38be39177495af38497bf8da1c6128fa1b ]
    
    Plumb the pieces to add a FID report to error records.  Since all error
    event memory must be pre-allocated, we pre-allocate the maximum file
    handle size possible, such that it should always fit.
    
    For errors that don't expose a file handle, report it with an invalid
    FID. Internally we use zero-length FILEID_ROOT file handle for passing
    the information (which we report as zero-length FILEID_INVALID file
    handle to userspace) so we update the handle reporting code to deal with
    this case correctly.
    
    Link: https://lore.kernel.org/r/[email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    [Folded two patches into 2 to make series bisectable]
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: report old and/or new parent+name in FAN_RENAME event [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:36 2021 +0200

    fanotify: report old and/or new parent+name in FAN_RENAME event
    
    [ Upstream commit 7326e382c21e9c23c89c88369afdc90b82a14da8 ]
    
    In the special case of FAN_RENAME event, we report old or new or both
    old and new parent+name.
    
    A single info record will be reported if either the old or new dir
    is watched and two records will be reported if both old and new dir
    (or their filesystem) are watched.
    
    The old and new parent+name are reported using new info record types
    FAN_EVENT_INFO_TYPE_{OLD,NEW}_DFID_NAME, so if a single info record
    is reported, it is clear to the application which dir entry the
    fid+name info refers to.
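
    The record selection can be sketched as follows. The enum and function
    names are hypothetical; the two enum values model the
    FAN_EVENT_INFO_TYPE_{OLD,NEW}_DFID_NAME record types.

```c
#include <stdbool.h>

/* Hypothetical record types modelling the OLD and NEW info records. */
enum sketch_info_type { SKETCH_OLD_DFID_NAME, SKETCH_NEW_DFID_NAME };

/*
 * Fills `out` (room for two entries) with the records to report and
 * returns how many were written.
 */
static int sketch_rename_records(bool old_watched, bool new_watched,
                                 enum sketch_info_type out[2])
{
    int n = 0;

    if (old_watched)
        out[n++] = SKETCH_OLD_DFID_NAME;
    if (new_watched)
        out[n++] = SKETCH_NEW_DFID_NAME;
    return n;
}
```

    Because each record carries its own type, a single record is
    unambiguous even when only one of the two directories is watched.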
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Require fid_mode for any non-fd event [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:31 2021 -0300

    fanotify: Require fid_mode for any non-fd event
    
    [ Upstream commit 4fe595cf1c80e7a5af4d00c4da29def64aff57a2 ]
    
    Like inode events, FAN_FS_ERROR will require fid mode.  Therefore,
    convert the verification during fanotify_mark(2) to require fid for any
    non-fd event.  This means fid_mode will not only be required for inode
    events, but for any event that doesn't provide a descriptor.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Reserve UAPI bits for FAN_FS_ERROR [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:33 2021 -0300

    fanotify: Reserve UAPI bits for FAN_FS_ERROR
    
    [ Upstream commit 8d11a4f43ef4679be0908026907a7613b33d7127 ]
    
    FAN_FS_ERROR allows reporting of event type FS_ERROR to userspace, which
    is a mechanism to report file system wide problems via fanotify.  This
    commit preallocates userspace visible bits to match the FS_ERROR event.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Jan Kara <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Split fsid check from other fid mode checks [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:21 2021 -0300

    fanotify: Split fsid check from other fid mode checks
    
    [ Upstream commit 8299212cbdb01a5867e230e961f82e5c02a6de34 ]
    
    FAN_FS_ERROR will require fsid, but not necessarily require the
    filesystem to expose a file handle.  Split those checks into different
    functions, so they can be used separately when setting up an event.
    
    While there, update a comment about tmpfs having 0 fsid, which is no
    longer true.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Support enqueueing of error events [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:35 2021 -0300

    fanotify: Support enqueueing of error events
    
    [ Upstream commit 83e9acbe13dc1b767f91b5c1350f7a65689b26f6 ]
    
    Once an error event is triggered, enqueue it in the notification group,
    similarly to what is done for other events.  FAN_FS_ERROR is not
    handled specially, since the memory is now handled by a preallocated
    mempool.
    
    For now, make the event unhashed.  A future patch implements merging of
    this kind of event.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Jan Kara <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Support merging of error events [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:36 2021 -0300

    fanotify: Support merging of error events
    
    [ Upstream commit 8a6ae64132fd27a944faed7bc38484827609eb76 ]
    
    Error events (FAN_FS_ERROR) against the same file system can be merged
    by simply iterating the error count.  The hash is taken from the fsid,
    without considering the FH.  This means that only the first error object
    is reported.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
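
    The merge rule described above can be sketched in plain C. This is a
    userspace illustration, not the kernel implementation: the demo_* types
    and function are hypothetical, and the only behavior taken from the
    commit message is that events are matched on fsid alone, the count is
    incremented, and the first reported error is kept.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a filesystem id. */
struct demo_fsid { int val[2]; };

/* Hypothetical queued error event: first error seen plus a running count. */
struct demo_error_event {
    struct demo_fsid fsid;
    int error;
    unsigned int err_count;
};

/*
 * Try to merge `incoming` into `queued`. Events match on fsid only; the
 * file handle is deliberately not compared. On a match, only the count
 * changes, so the first error stays the one that is reported.
 */
static bool demo_merge_error(struct demo_error_event *queued,
                             const struct demo_error_event *incoming)
{
    if (queued->fsid.val[0] != incoming->fsid.val[0] ||
        queued->fsid.val[1] != incoming->fsid.val[1])
        return false;

    queued->err_count++;
    return true;
}
```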

fanotify: Support null inode event in fanotify_dfid_inode [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:28 2021 -0300

    fanotify: Support null inode event in fanotify_dfid_inode
    
    [ Upstream commit 12f47bf0f0990933d95d021d13d31bda010648fd ]
    
    FAN_FS_ERROR doesn't support DFID, but this function is still called for
    every event.  The problem is that it is not capable of handling null
    inodes, which now can happen in case of superblock error events.  For
    this case, just returning dir will be enough.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: support secondary dir fh and name in fanotify_info [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:33 2021 +0200

    fanotify: support secondary dir fh and name in fanotify_info
    
    [ Upstream commit 3cf984e950c1c3f41d407ed31db33beb996be132 ]
    
    Allow storing a secondary dir fh and name tuple in fanotify_info.
    This will be used to store the new parent and name information in
    FAN_RENAME event.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: use fsnotify group lock helpers [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:26 2022 +0300

    fanotify: use fsnotify group lock helpers
    
    [ Upstream commit e79719a2ca5c61912c0493bc1367db52759cf6fd ]
    
    Direct reclaim from fanotify mark allocation context may try to evict
    inodes with evictable marks of the same group and hit this deadlock:
    
    [<0>] fsnotify_destroy_mark+0x1f/0x3a
    [<0>] fsnotify_destroy_marks+0x71/0xd9
    [<0>] __destroy_inode+0x24/0x7e
    [<0>] destroy_inode+0x2c/0x67
    [<0>] dispose_list+0x49/0x68
    [<0>] prune_icache_sb+0x5b/0x79
    [<0>] super_cache_scan+0x11c/0x16f
    [<0>] shrink_slab.constprop.0+0x23e/0x40f
    [<0>] shrink_node+0x218/0x3e7
    [<0>] do_try_to_free_pages+0x12a/0x2d2
    [<0>] try_to_free_pages+0x166/0x242
    [<0>] __alloc_pages_slowpath.constprop.0+0x30c/0x903
    [<0>] __alloc_pages+0xeb/0x1c7
    [<0>] cache_grow_begin+0x6f/0x31e
    [<0>] fallback_alloc+0xe0/0x12d
    [<0>] ____cache_alloc_node+0x15a/0x17e
    [<0>] kmem_cache_alloc_trace+0xa1/0x143
    [<0>] fanotify_add_mark+0xd5/0x2b2
    [<0>] do_fanotify_mark+0x566/0x5eb
    [<0>] __x64_sys_fanotify_mark+0x21/0x24
    [<0>] do_syscall_64+0x6d/0x80
    [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim
    from allocations under fanotify group lock and use the safe group lock
    helpers.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]/
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: use helpers to parcel fanotify_info buffer [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:32 2021 +0200

    fanotify: use helpers to parcel fanotify_info buffer
    
    [ Upstream commit 1a9515ac9e55e68d733bab81bd408463ab1e25b1 ]
    
    fanotify_info buffer is parceled into variable sized records, so the
    records must be written in order: dir_fh, file_fh, name.
    
    Use helpers to assert that order and make fanotify_alloc_name_event()
    a bit more generic to allow empty dir_fh record and to allow expanding
    to more records (i.e. name2) soon.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
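
    A minimal userspace sketch of why write order matters for such a buffer.
    The demo_info layout and the demo_* helpers are hypothetical and do not
    mirror the real fanotify_info definition; they only show that each
    record's offset is derived from the lengths of the records before it, so
    an earlier record cannot be written once a later one exists.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header plus packed buffer: dir_fh, then file_fh, then name. */
struct demo_info {
    uint8_t dir_fh_len;
    uint8_t file_fh_len;
    uint8_t name_len;
    uint8_t buf[64];
};

/* Each record starts where the previous records end. */
static size_t demo_file_fh_offset(const struct demo_info *info)
{
    return info->dir_fh_len;
}

static size_t demo_name_offset(const struct demo_info *info)
{
    return (size_t)info->dir_fh_len + info->file_fh_len;
}

static void demo_set_dir_fh(struct demo_info *info, const void *fh, uint8_t len)
{
    /* Must be first: later records would otherwise be overwritten. */
    assert(info->file_fh_len == 0 && info->name_len == 0);
    assert(len <= sizeof(info->buf));
    memcpy(info->buf, fh, len);
    info->dir_fh_len = len;
}

static void demo_set_file_fh(struct demo_info *info, const void *fh, uint8_t len)
{
    /* Must precede the name; dir_fh may be empty. */
    assert(info->name_len == 0);
    assert(demo_file_fh_offset(info) + len <= sizeof(info->buf));
    memcpy(info->buf + demo_file_fh_offset(info), fh, len);
    info->file_fh_len = len;
}

static void demo_set_name(struct demo_info *info, const char *name)
{
    size_t len = strlen(name);

    /* The length must fit the u8 header field; the NUL is stored too. */
    assert(len <= UINT8_MAX);
    assert(demo_name_offset(info) + len + 1 <= sizeof(info->buf));
    memcpy(info->buf + demo_name_offset(info), name, len + 1);
    info->name_len = (uint8_t)len;
}
```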

fanotify: use macros to get the offset to fanotify_info buffer [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:31 2021 +0200

    fanotify: use macros to get the offset to fanotify_info buffer
    
    [ Upstream commit 2d9374f095136206a02eb0b6cd9ef94632c1e9f7 ]
    
    The fanotify_info buffer contains up to two file handles and a name.
    Use macros to simplify the code that accesses the different items within
    the buffer.
    
    Add assertions to verify that stored fh len and name len do not overflow
    the u8 stored value in fanotify_info header.
    
    Remove the unused fanotify_info_len() helper.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: WARN_ON against too large file handles [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:40 2021 -0300

    fanotify: WARN_ON against too large file handles
    
    [ Upstream commit 572c28f27a269f88e2d8d7b6b1507f114d637337 ]
    
    struct fanotify_error_event, at least, is preallocated and isn't able to
    handle arbitrarily large file handles.  Future-proof the code by
    complaining loudly if a handle larger than MAX_HANDLE_SZ is ever found.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: wire up FAN_RENAME event [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:37 2021 +0200

    fanotify: wire up FAN_RENAME event
    
    [ Upstream commit 8cc3b1ccd930fe6971e1527f0c4f1bdc8cb56026 ]
    
    FAN_RENAME is the successor of FAN_MOVED_FROM and FAN_MOVED_TO
    and can be used to get the old and new parent+name information in
    a single event.
    
    FAN_MOVED_FROM and FAN_MOVED_TO are still supported for backward
    compatibility, but it makes little sense to use them together with
    FAN_RENAME in the same group.
    
    FAN_RENAME uses special info type records to report the old and
    new parent+name, so reporting only old and new parent id is less
    useful and was not implemented.
    Therefore, FAN_RENAME requires a group with flag FAN_REPORT_NAME.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fanotify: Wrap object_fh inline space in a creator macro [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:37 2021 -0300

    fanotify: Wrap object_fh inline space in a creator macro
    
    [ Upstream commit 2c5069433a3adc01ff9c5673567961bb7f138074 ]
    
    fanotify_error_event would duplicate this sequence of declarations that
    already exist elsewhere with a slightly different size.  Create a helper
    macro to avoid code duplication.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Jan Kara <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fat: fix uninitialized field in nostale filehandles [+ + +]

Author: Jan Kara <[email protected]>
Date:   Mon Feb 5 13:26:26 2024 +0100

    fat: fix uninitialized field in nostale filehandles
    
    [ Upstream commit fde2497d2bc3a063d8af88b258dbadc86bd7b57c ]
    
    When fat_encode_fh_nostale() encodes file handle without a parent it
    stores only first 10 bytes of the file handle. However the length of the
    file handle must be a multiple of 4 so the file handle is actually 12
    bytes long and the last two bytes remain uninitialized. This is not
    great as we potentially leak uninitialized information with the handle
    to userspace. Properly initialize the full handle length.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Reported-by: [email protected]
    Fixes: ea3983ace6b7 ("fat: restructure export_operations")
    Signed-off-by: Jan Kara <[email protected]>
    Acked-by: OGAWA Hirofumi <[email protected]>
    Cc: Amir Goldstein <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
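
    The arithmetic behind the leak (10 payload bytes rounded up to three
    4-byte words, leaving 2 padding bytes) can be sketched as follows. The
    demo_* names are hypothetical and this is not the FAT code; it only
    illustrates zeroing the full reported length before copying the payload.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handle: 10 meaningful bytes, length reported in 4-byte words. */
#define DEMO_FH_PAYLOAD_BYTES 10u
#define DEMO_FH_WORDS ((DEMO_FH_PAYLOAD_BYTES + 3u) / 4u) /* rounds up to 3 */

/* Encode the payload into buf and return the handle length in words. */
static unsigned int demo_encode_fh(uint32_t *buf,
                                   const uint8_t payload[DEMO_FH_PAYLOAD_BYTES])
{
    /* Zero all 12 bytes first so the 2 padding bytes never carry stale data. */
    memset(buf, 0, DEMO_FH_WORDS * sizeof(uint32_t));
    memcpy(buf, payload, DEMO_FH_PAYLOAD_BYTES);
    return DEMO_FH_WORDS;
}
```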

filelock: add a new locks_inode_context accessor function [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Nov 16 09:02:30 2022 -0500

    filelock: add a new locks_inode_context accessor function
    
    [ Upstream commit 401a8b8fd5acd51582b15238d72a8d0edd580e9f ]
    
    There are a number of places in the kernel that are accessing the
    inode->i_flctx field without smp_load_acquire. This is required to
    ensure that the caller doesn't see a partially-initialized structure.
    
    Add a new accessor function for it to make this clear and convert all of
    the relevant accesses in locks.c to use it. Also, convert
    locks_free_lock_context to use the helper as well instead of just doing
    a "bare" assignment.
    
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Stable-dep-of: 77c67530e1f9 ("nfsd: use locks_inode_context helper")
    Signed-off-by: Chuck Lever <[email protected]>
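
    A userspace analogue of the accessor idea, using C11 atomics in place of
    the kernel's smp_load_acquire(). All demo_* names are hypothetical, and
    the release-store publisher is an assumption added so the acquire load
    has something to pair with; it is not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock context and inode holding a pointer to it. */
struct demo_lock_context { int nr_locks; };
struct demo_inode { _Atomic(struct demo_lock_context *) flctx; };

/*
 * Single accessor for the context pointer. The acquire load pairs with the
 * release store below, so a reader that sees a non-NULL pointer also sees
 * the fully initialized structure behind it.
 */
static struct demo_lock_context *demo_locks_inode_context(struct demo_inode *inode)
{
    return atomic_load_explicit(&inode->flctx, memory_order_acquire);
}

/* Initialize the context first, then publish the pointer with release order. */
static void demo_publish_context(struct demo_inode *inode,
                                 struct demo_lock_context *ctx)
{
    ctx->nr_locks = 0;
    atomic_store_explicit(&inode->flctx, ctx, memory_order_release);
}
```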

fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion [+ + +]

Author: Bart Van Assche <[email protected]>
Date:   Mon Mar 4 15:57:15 2024 -0800

    fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
    
    commit 961ebd120565cb60cebe21cb634fbc456022db4a upstream.
    
    The first kiocb_set_cancel_fn() argument may point at a struct kiocb
    that is not embedded inside struct aio_kiocb. With the current code,
    depending on the compiler, the req->ki_ctx read happens either before
    the IOCB_AIO_RW test or after that test. Move the req->ki_ctx read such
    that it is guaranteed that the IOCB_AIO_RW test happens first.
    
    Reported-by: Eric Biggers <[email protected]>
    Cc: Benjamin LaHaise <[email protected]>
    Cc: Eric Biggers <[email protected]>
    Cc: Christoph Hellwig <[email protected]>
    Cc: Avi Kivity <[email protected]>
    Cc: Sandeep Dhavale <[email protected]>
    Cc: Jens Axboe <[email protected]>
    Cc: Greg Kroah-Hartman <[email protected]>
    Cc: Kent Overstreet <[email protected]>
    Cc: [email protected]
    Fixes: b820de741ae4 ("fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio")
    Signed-off-by: Bart Van Assche <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Jens Axboe <[email protected]>
    Reviewed-by: Eric Biggers <[email protected]>
    Signed-off-by: Christian Brauner <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
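
    The ordering requirement can be illustrated with a small userspace
    sketch. The demo_* structures, flag, and macro are hypothetical stand-ins
    rather than the aio code; the point shown is that the containing
    structure is only dereferenced after the flag test has confirmed that the
    inner object really is embedded in it.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FLAG_AIO_RW 0x1u /* hypothetical marker for "embedded" iocbs */

struct demo_kiocb { unsigned int flags; };
struct demo_aio_kiocb { struct demo_kiocb rw; int ctx_id; };

/* Userspace equivalent of container_of(), for illustration only. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Returns the context id, or -1 if iocb is not part of a demo_aio_kiocb. */
static int demo_ctx_of(struct demo_kiocb *iocb)
{
    /* Test the flag first: without it, the outer struct may not exist. */
    if (!(iocb->flags & DEMO_FLAG_AIO_RW))
        return -1;

    /* Only now is it valid to convert and read the outer structure. */
    struct demo_aio_kiocb *req =
        demo_container_of(iocb, struct demo_aio_kiocb, rw);
    return req->ctx_id;
}
```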

fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Mon May 2 14:19:25 2022 -0700

    fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict
    
    [ Upstream commit 2443da2259e97688f93d64d17ab69b15f466078a ]
    
    Add 2 new callbacks, lm_lock_expirable and lm_expire_lock, to
    lock_manager_operations to allow the lock manager to take appropriate
    action to resolve the lock conflict if possible.
    
    A new field, lm_mod_owner, is also added to lock_manager_operations.
    The lm_mod_owner is used by the fs/lock code to make sure the lock
    manager module such as nfsd, is not freed while lock conflict is being
    resolved.
    
    lm_lock_expirable returns true if the lock conflict can be resolved and
    false otherwise. This callback must be called with the flc_lock held, so
    it cannot block.
    
    lm_expire_lock is called to resolve the lock conflict if the returned
    value from lm_lock_expirable is true. This callback is called without
    the flc_lock held since it's allowed to block. Upon returning from
    this callback, the lock conflict should be resolved and the caller is
    expected to restart the conflict check from the beginning of the list.
    
    A lock manager, such as the NFSv4 courteous server, uses this callback to
    resolve the conflict by destroying the lock owner, or the NFSv4 courtesy
    client (a client that has expired but is allowed to maintain its state)
    that owns the lock.
    
    Reviewed-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
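
    The calling convention for the two callbacks can be sketched as a
    check-expire-restart loop. Everything here is a hypothetical userspace
    model: a counter stands in for flc_lock, every active entry is treated as
    conflicting, and the demo_* functions are not the kernel callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_lock { bool active; bool expirable; };

static int demo_lock_depth; /* stands in for flc_lock: 1 while "held" */

/* Models lm_lock_expirable: called with the lock held, must not block. */
static bool demo_lock_expirable(const struct demo_lock *fl)
{
    assert(demo_lock_depth == 1);
    return fl->expirable;
}

/* Models lm_expire_lock: called without the lock held, may block. */
static void demo_expire_lock(struct demo_lock *fl)
{
    assert(demo_lock_depth == 0);
    fl->active = false;
}

/* Returns true if a conflict remains that could not be resolved. */
static bool demo_has_conflict(struct demo_lock *locks, size_t n)
{
    size_t i;

retry:
    demo_lock_depth = 1;                  /* take the lock */
    for (i = 0; i < n; i++) {
        if (!locks[i].active)
            continue;
        if (demo_lock_expirable(&locks[i])) {
            demo_lock_depth = 0;          /* drop the lock before expiring */
            demo_expire_lock(&locks[i]);
            goto retry;                   /* restart from the list head */
        }
        demo_lock_depth = 0;
        return true;
    }
    demo_lock_depth = 0;                  /* release the lock */
    return false;
}
```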

fs/lock: add helper locks_owner_has_blockers to check for blockers [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Mon May 2 14:19:24 2022 -0700

    fs/lock: add helper locks_owner_has_blockers to check for blockers
    
    [ Upstream commit 591502c5cb325b1c6ec59ab161927d606b918aa0 ]
    
    Add helper locks_owner_has_blockers to check if there are any blockers
    for a given lockowner.
    
    Reviewed-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock. [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Sat Feb 12 10:12:52 2022 -0800

    fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
    
    [ Upstream commit 9d6647762b9c6b555bc83d97d7c93be6057a990f ]
    
    Update lock usage of lock_manager_operations' functions to reflect
    the changes in commit 6109c85037e5 ("locks: add a dedicated spinlock
    to protect i_flctx lists").
    
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fs/notify: constify path [+ + +]

Author: Al Viro <[email protected]>
Date:   Thu Aug 4 12:57:38 2022 -0400

    fs/notify: constify path
    
    [ Upstream commit d5bf88895f24686641c39420ee6df716dc1d95d8 ]
    
    Reviewed-by: Matthew Bobrowski <[email protected]>
    Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
    Signed-off-by: Al Viro <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() [+ + +]

Author: Jann Horn <[email protected]>
Date:   Fri Nov 24 16:08:22 2023 +0100

    fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
    
    [ Upstream commit 055ca83559912f2cfd91c9441427bac4caf3c74e ]
    
    When you try to splice between a normal pipe and a notification pipe,
    get_pipe_info(..., true) fails, so splice() falls back to treating the
    notification pipe like a normal pipe - so we end up in
    iter_file_splice_write(), which first locks the input pipe, then calls
    vfs_iter_write(), which locks the output pipe.
    
    Lockdep complains about that, because we're taking a pipe lock while
    already holding another pipe lock.
    
    I think this probably (?) can't actually lead to deadlocks, since you'd
    need another way to nest locking a normal pipe into locking a
    watch_queue pipe, but the lockdep annotations don't make that clear.
    
    Bail out earlier in pipe_write() for notification pipes, before taking
    the pipe lock.
    
    Reported-and-tested-by: <[email protected]>
    Closes: https://syzkaller.appspot.com/bug?extid=011e4ea1da6692cf881c
    Fixes: c73be61cede5 ("pipe: Add general notification queue support")
    Signed-off-by: Jann Horn <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Christian Brauner <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

fs: inotify: Fix typo in inotify comment [+ + +]

Author: Oliver Ford <[email protected]>
Date:   Wed May 18 15:59:59 2022 +0100

    fs: inotify: Fix typo in inotify comment
    
    Correct spelling in comment.
    
    Signed-off-by: Oliver Ford <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: Add helper to detect overflow_event [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:23 2021 -0300

    fsnotify: Add helper to detect overflow_event
    
    [ Upstream commit 808967a0a4d2f4ce6a2005c5692fffbecaf018c1 ]
    
    Similarly to fanotify_is_perm_event and friends, provide a helper
    predicate to say whether a mask is of an overflow event.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Amir Goldstein <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: Add wrapper around fsnotify_add_event [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:24 2021 -0300

    fsnotify: Add wrapper around fsnotify_add_event
    
    [ Upstream commit 1ad03c3a326a86e259389592117252c851873395 ]
    
    fsnotify_add_event is growing in number of parameters, which in most
    cases are just passed a NULL pointer.  So, split out a new
    fsnotify_insert_event function to clean things up for users who don't
    need an insert hook.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Amir Goldstein <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: allow adding an inode mark without pinning inode [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:22 2022 +0300

    fsnotify: allow adding an inode mark without pinning inode
    
    [ Upstream commit c3638b5b13740fa31762d414bbce8b7a694e582a ]
    
    fsnotify_add_mark() and variants implicitly take a reference on inode
    when attaching a mark to an inode.
    
    Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF.
    
    Instead of taking the inode reference when attaching connector to inode
    and dropping the inode reference when detaching connector from inode,
    take the inode reference on attach of the first mark that wants to hold
    an inode reference and drop the inode reference on detach of the last
    mark that wants to hold an inode reference.
    
    Backends can "upgrade" an existing mark to take an inode reference, but
    cannot "downgrade" a mark with inode reference to release the reference.
    
    This leaves the choice to the backend whether or not to pin the inode
    when adding an inode mark.
    
    This is intended to be used when adding a mark with ignored mask that is
    used for optimization in cases where group can afford getting unneeded
    events and reinstate the mark with ignored mask when inode is accessed
    again after being evicted.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
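
    The "first mark that wants a reference takes it, last one drops it" rule
    can be modelled with a counter. This is a hypothetical userspace sketch:
    the demo_* types and the explicit counter are assumptions made for
    illustration and are not claimed to match how the kernel tracks this.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_inode { int refcount; };

/* Hypothetical connector: counts attached marks that want the inode pinned. */
struct demo_connector {
    struct demo_inode *inode;
    unsigned int marks_wanting_iref;
};

/* Attach a mark; no_iref models a mark that opted out of pinning the inode. */
static void demo_attach_mark(struct demo_connector *conn, bool no_iref)
{
    if (no_iref)
        return;
    /* Only the first mark that wants a reference actually takes one. */
    if (conn->marks_wanting_iref++ == 0)
        conn->inode->refcount++;
}

/* Detach a mark that was attached with the same no_iref setting. */
static void demo_detach_mark(struct demo_connector *conn, bool no_iref)
{
    if (no_iref)
        return;
    assert(conn->marks_wanting_iref > 0);
    /* Only the last mark that wanted a reference drops it. */
    if (--conn->marks_wanting_iref == 0)
        conn->inode->refcount--;
}
```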

fsnotify: clarify contract for create event hooks [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Oct 25 16:27:18 2021 -0300

    fsnotify: clarify contract for create event hooks
    
    [ Upstream commit dabe729dddca550446e9cc118c96d1f91703345b ]
    
    Clarify argument names and contract for fsnotify_create() and
    fsnotify_mkdir() to reflect the anomaly of kernfs, which leaves dentries
    negative after mkdir/create.
    
    Remove the WARN_ON(!inode) in audit code that was added by the Fixes
    commit under the wrong assumption that dentries cannot be negative after
    mkdir/create.
    
    Fixes: aa93bdc5500c ("fsnotify: use helpers to access data by data_type")
    Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Jan Kara <[email protected]>
    Reported-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: clarify object type argument [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:27 2021 +0200

    fsnotify: clarify object type argument
    
    [ Upstream commit ad69cd9972e79aba103ba5365de0acd35770c265 ]
    
    In preparation for separating object type from iterator type, rename
    some 'type' arguments in functions to 'obj_type' and remove the unused
    interface to clear marks by object type mask.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: consistent behavior for parent not watching children [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Wed May 11 22:02:13 2022 +0300

    fsnotify: consistent behavior for parent not watching children
    
    [ Upstream commit e730558adffb88a52e562db089e969ee9510184a ]
    
    The logic for handling events on child in groups that have a mark on
    the parent inode, but without FS_EVENT_ON_CHILD flag in the mask is
    duplicated in several places and inconsistent.
    
    Move the logic into the preparation of mark type iterator, so that the
    parent mark type will be excluded from all mark type iterations in that
    case.
    
    This results in several subtle changes of behavior, hopefully all
    desired changes of behavior, for example:
    
    - Group A has a mount mark with FS_MODIFY in mask
    - Group A has a mark with ignore mask that does not survive FS_MODIFY
      and does not watch children on directory D.
    - Group B has a mark with FS_MODIFY in mask that does watch children
      on directory D.
    - FS_MODIFY event on file D/foo should not clear the ignore mask of
      group A, but before this change it does
    
    And if group A ignore mask was set to survive FS_MODIFY:
    - FS_MODIFY event on file D/foo should be reported to group A on account
      of the mount mark, but before this change it is wrongly ignored
    
    Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir")
    Reported-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: create helpers for group mark_mutex lock [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:17 2022 +0300

    fsnotify: create helpers for group mark_mutex lock
    
    [ Upstream commit 43b245a788e2d8f1bb742668a9bdace02fcb3e96 ]
    
    Create helpers to take and release the group mark_mutex lock.
    
    Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines
    if the mark_mutex lock is fs reclaim safe or not.  If not safe, the
    lock helpers take the lock and disable direct fs reclaim.
    
    In that case we annotate the mutex with a different lockdep class to
    express to lockdep that an allocation of mark of an fs reclaim safe group
    may take the group lock of another "NOFS" group to evict inodes.
    
    For now, converted only the callers in common code and no backend
    defines the NOFS flag.  It is intended to be set by fanotify for
    evictable marks support.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]/
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: Don't insert unmergeable events in hashtable [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:19 2021 -0300

    fsnotify: Don't insert unmergeable events in hashtable
    
    [ Upstream commit cc53b55f697fe5aa98bdbfdfe67c6401da242155 ]
    
    Some events, like the overflow event, are not mergeable, so they are not
    hashed.  But, when failing inside fsnotify_add_event for lack of space,
    fsnotify_add_event() still calls the insert hook, which adds the
    overflow event to the merge list.  Add a check to prevent any kind of
    unmergeable event from being inserted in the hashtable.
    
    Fixes: 94e00d28a680 ("fsnotify: use hash table for faster events merge")
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: Fix comment typo [+ + +]

Author: Xin Gao <[email protected]>
Date:   Sat Jul 23 03:46:39 2022 +0800

    fsnotify: Fix comment typo
    
    [ Upstream commit feee1ce45a5666bbdb08c5bb2f5f394047b1915b ]
    
    The double `if' is duplicated in line 104, remove one.
    
    Signed-off-by: Xin Gao <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: fix merge with parent's ignored mask [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Wed Feb 23 17:14:37 2022 +0200

    fsnotify: fix merge with parent's ignored mask
    
    [ Upstream commit 4f0b903ded728c505850daf2914bfc08841f0ae6 ]
    
    fsnotify_parent() does not consider the parent's mark at all unless
    the parent inode shows interest in events on children and in the
    specific event.
    
    So unless parent added an event to both its mark mask and ignored mask,
    the event will not be ignored.
    
    Fix this by declaring the interest of an object in an event when the
    event is in either a mark mask or ignored mask.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: generate FS_RENAME event with rich information [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:30 2021 +0200

    fsnotify: generate FS_RENAME event with rich information
    
    [ Upstream commit e54183fa7047c15819bc155f4c58501d9a9a3489 ]
    
    The dnotify FS_DN_RENAME event is used to request notification about
    a move within the same parent directory and was always coupled with
    the FS_MOVED_FROM event.
    
    Rename the FS_DN_RENAME event flag to FS_RENAME, decouple it from
    FS_MOVED_FROM and report it with the moved dentry instead of the moved
    inode, so it has the information about both old and new parent and name.
    
    Generate the FS_RENAME event regardless of same parent dir and apply
    the "same parent" rule in the generic fsnotify_handle_event() helper
    that is used to call backends with ->handle_inode_event() method
    (i.e. dnotify).  The ->handle_inode_event() method is not rich enough to
    report both old and new parent and name anyway.
    
    The enriched event is reported to fanotify over the ->handle_event()
    method with the old and new dir inode marks in marks array slots for
    ITER_TYPE_INODE and a new iter type slot ITER_TYPE_INODE2.
    
    The enriched event will be used for reporting old and new parent+name to
    fanotify groups with FAN_RENAME events.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: introduce mark type iterator [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Wed May 11 22:02:12 2022 +0300

    fsnotify: introduce mark type iterator
    
    [ Upstream commit 14362a2541797cf9df0e86fb12dcd7950baf566e ]
    
    fsnotify_foreach_iter_mark_type() is used to reduce boilerplate code
    of iterating all marks of a specific group interested in an event
    by consulting the iterator report_mask.
    
    Use an open coded version of that iterator in fsnotify_iter_next()
    that collects all marks of the current iteration group without
    consulting the iterator report_mask.
    
    At the moment, the two iterator variants are the same, but this
    decoupling will allow us to exclude some of the group's marks from
    reporting the event, for example for an event on a child when the inode
    mark on the parent did not request to watch events on children.
    
    Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir")
    Reported-by: Jan Kara <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: invalidate dcache before IN_DELETE event [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Thu Jan 20 23:53:04 2022 +0200

    fsnotify: invalidate dcache before IN_DELETE event
    
    [ Upstream commit a37d9a17f099072fe4d3a9048b0321978707a918 ]
    
    Apparently, there are some applications that use the IN_DELETE event as
    an invalidation mechanism and expect that if they try to open a file
    with the name reported with the delete event, it will not contain the
    content of the deleted file.
    
    Commit 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
    d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
    will have access to a positive dentry.
    
    This allowed a race where opening the deleted file via cached dentry
    is now possible after receiving the IN_DELETE event.
    
    To fix the regression, create a new hook fsnotify_delete() that takes
    the unlinked inode as an argument and use a helper d_delete_notify() to
    pin the inode, so we can pass it to fsnotify_delete() after d_delete().
    
    Backporting hint: this regression is from v5.3. Although patch will
    apply with only trivial conflicts to v5.4 and v5.10, it won't build,
    because fsnotify_delete() implementation is different in each of those
    versions (see fsnotify_link()).
    
    A follow-up patch will fix the fsnotify_unlink/rmdir() calls in pseudo
    filesystems that do not need to call d_delete().
    
    Link: https://lore.kernel.org/r/[email protected]
    Reported-by: Ivan Delalande <[email protected]>
    Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
    Fixes: 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
    Cc: [email protected] # v5.3+
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    [ cel: adjusted to apply on v5.15.y ]
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: make allow_dups a property of the group [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:16 2022 +0300

    fsnotify: make allow_dups a property of the group
    
    [ Upstream commit f3010343d9e119da35ee864b3a28993bb5c78ed7 ]
    
    Instead of passing allow_dups to fsnotify_add_mark() as an argument,
    define the group flag FSNOTIFY_GROUP_DUPS to express the allow_dups
    behavior and set this behavior at group creation time for all calls
    of fsnotify_add_mark().
    
    Rename the allow_dups argument to a generic add_flags argument for
    future use.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Jan Kara <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
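
Moving the duplicate policy from a per-call argument to a per-group flag can be modeled in userspace C as below. This is a minimal sketch: the struct layout, the fixed-size table and the names GROUP_USER, GROUP_DUPS, group_init and group_add_mark are assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values; not the kernel's. */
#define GROUP_USER 0x01
#define GROUP_DUPS 0x02

#define MAX_MARKS 8

/* Hypothetical, stripped-down stand-in for a notification group. */
struct group {
    int flags;                      /* fixed at creation time */
    const void *objects[MAX_MARKS]; /* objects that already carry a mark */
    size_t nr_marks;
};

static void group_init(struct group *g, int flags)
{
    assert(g != NULL);
    g->flags = flags;
    g->nr_marks = 0;
}

/*
 * Add a mark for an object. Whether a second mark on the same object is
 * allowed is decided by the group's flags, not by the caller.
 */
static int group_add_mark(struct group *g, const void *object)
{
    if (!(g->flags & GROUP_DUPS)) {
        for (size_t i = 0; i < g->nr_marks; i++)
            if (g->objects[i] == object)
                return -EEXIST;
    }
    if (g->nr_marks == MAX_MARKS)
        return -ENOSPC;
    g->objects[g->nr_marks++] = object;
    return 0;
}
```

A group created without GROUP_DUPS rejects the second mark on the same object, while a group created with it accepts both, without any call site having to pass the policy along.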

fsnotify: optimize FS_MODIFY events with no ignored masks [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Wed Feb 23 17:14:38 2022 +0200

    fsnotify: optimize FS_MODIFY events with no ignored masks
    
    [ Upstream commit 04e317ba72d07901b03399b3d1525e83424df5b3 ]
    
    fsnotify() treats FS_MODIFY events specially - it does not skip them
    even if the FS_MODIFY event does not appear in the object's fsnotify
    mask.  This is because send_to_group() checks if FS_MODIFY needs to
    clear the ignored mask of marks.
    
    The common case is that an object does not have any mark with ignored
    mask and in particular, that it does not have a mark with ignored mask
    and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag.
    
    Set FS_MODIFY in object's fsnotify mask during fsnotify_recalc_mask()
    if object has a mark with an ignored mask and without the
    FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag and remove the special
    treatment of FS_MODIFY in fsnotify(), so that FS_MODIFY events could
    be optimized in the common case.
    
    Call fsnotify_recalc_mask() from fanotify after adding or removing an
    ignored mask from a mark without FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY
    or when adding the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag to a mark
    with ignored mask (the flag cannot be removed by fanotify uapi).
    
    Performance results for doing 10000000 write(2)s to tmpfs:
    
                                    vanilla         patched
    without notification mark       25.486+-1.054   24.965+-0.244
    with notification mark          30.111+-0.139   26.891+-1.355
    
    So we can see the overhead of notification subsystem has been
    drastically reduced.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Jan Kara <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
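
The mask recalculation described in this commit can be modeled in a few lines of userspace C. This is only a sketch: the struct layout, the helper names (recalc_object_mask, object_wants_event) and the constant values are assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the kernel defines its own constants. */
#define EV_ACCESS                0x1u
#define EV_MODIFY                0x2u
#define MARK_IGNORED_SURV_MODIFY 0x1u

/* Hypothetical, stripped-down stand-in for a notification mark. */
struct mark {
    uint32_t mask;          /* events the mark wants reported */
    uint32_t ignored_mask;  /* events the mark wants suppressed */
    uint32_t flags;
};

/*
 * Recompute the object's interest mask. EV_MODIFY is added only when some
 * mark has an ignored mask that a modification must clear, i.e. the mark
 * lacks MARK_IGNORED_SURV_MODIFY.
 */
static uint32_t recalc_object_mask(const struct mark *marks, size_t n)
{
    uint32_t mask = 0;

    assert(marks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        mask |= marks[i].mask;
        if (marks[i].ignored_mask &&
            !(marks[i].flags & MARK_IGNORED_SURV_MODIFY))
            mask |= EV_MODIFY;
    }
    return mask;
}

/* With the special case gone, every event is filtered the same way. */
static int object_wants_event(uint32_t object_mask, uint32_t event)
{
    return (object_mask & event) != 0;
}
```

In the common case (no mark with an ignored mask), a modify event is rejected by the cheap mask test and never reaches the per-group code.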

fsnotify: pass data_type to fsnotify_name() [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Oct 25 16:27:16 2021 -0300

    fsnotify: pass data_type to fsnotify_name()
    
    [ Upstream commit 9baf93d68bcc3d0a6042283b82603c076e25e4f5 ]
    
    Align the arguments of fsnotify_name() to those of fsnotify().
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: pass dentry instead of inode data [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Oct 25 16:27:17 2021 -0300

    fsnotify: pass dentry instead of inode data
    
    [ Upstream commit fd5a3ff49a19aa69e2bc1e26e98037c2d778e61a ]
    
    Define a new data type to pass for event - FSNOTIFY_EVENT_DENTRY.
    Use it to pass the dentry instead of its ->d_inode where available.
    
    This is needed in preparation for the refactor to retrieve the super
    block from the data field.  In some cases (e.g. mkdir in kernfs), the
    data inode comes from a negative dentry, such that no super block
    information would be available. By receiving the dentry itself, instead
    of the inode, fsnotify can derive the super block even in these cases.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    [Expand explanation in commit message]
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: pass flags argument to fsnotify_alloc_group() [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:15 2022 +0300

    fsnotify: pass flags argument to fsnotify_alloc_group()
    
    [ Upstream commit 867a448d587e7fa845bceaf4ee1c632448f2a9fa ]
    
    Add flags argument to fsnotify_alloc_group(), define and use the flag
    FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper
    fsnotify_alloc_user_group() to indicate user allocation.
    
    Although the flag FSNOTIFY_GROUP_USER is currently not used after group
    allocation, we store the flags argument in the group struct for future
    use of other group flags.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: Pass group argument to free_event [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:27 2021 -0300

    fsnotify: Pass group argument to free_event
    
    [ Upstream commit 330ae77d2a5b0af32c0f29e139bf28ec8591de59 ]
    
    For group-wide mempool backed events, like FS_ERROR, the free_event
    callback will need to reference the group's mempool to free the memory.
    Wire that argument into the current callers.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Jan Kara <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: Protect fsnotify_handle_inode_event from no-inode events [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:26 2021 -0300

    fsnotify: Protect fsnotify_handle_inode_event from no-inode events
    
    [ Upstream commit 24dca90590509a7a6cbe0650100c90c5b8a3468a ]
    
    FAN_FS_ERROR allows events without inodes - i.e. for file system-wide
    errors.  Even though fsnotify_handle_inode_event is not currently used
    by fanotify, this patch protects other backends from cases where neither
    inode nor dir is provided.  Also document the constraints of the
    interface (inode and dir cannot both be NULL).
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Amir Goldstein <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: remove redundant parameter judgment [+ + +]

Author: Bang Li <[email protected]>
Date:   Fri Mar 11 23:12:40 2022 +0800

    fsnotify: remove redundant parameter judgment
    
    [ Upstream commit f92ca72b0263d601807bbd23ed25cbe6f4da89f4 ]
    
    iput() already checks whether its argument is NULL, so there is no
    need to repeat the check here.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Bang Li <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: remove unused declaration [+ + +]

Author: Gaosheng Cui <[email protected]>
Date:   Fri Sep 9 11:38:28 2022 +0800

    fsnotify: remove unused declaration
    
    [ Upstream commit f847c74d6e89f10926db58649a05b99237258691 ]
    
    fsnotify_alloc_event_holder() and fsnotify_destroy_event_holder()
    have been removed since commit 7053aee26a35 ("fsnotify: do not share
    events between notification groups"), so remove their declarations.
    
    Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
    Signed-off-by: Gaosheng Cui <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: Retrieve super block from the data field [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:25 2021 -0300

    fsnotify: Retrieve super block from the data field
    
    [ Upstream commit 29335033c574a15334015d8c4e36862cff3d3384 ]
    
    Some file system events (i.e. FS_ERROR) might not be associated with an
    inode or directory.  For these, we can retrieve the super block from the
    data field.  But, since the super_block is available in the data field
    on every event type, simplify the code to always retrieve it from there,
    through a new helper.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Jan Kara <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: separate mark iterator type from object type enum [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Mon Nov 29 22:15:28 2021 +0200

    fsnotify: separate mark iterator type from object type enum
    
    [ Upstream commit 1c9007d62bea6fd164285314f7553f73e5308863 ]
    
    They are two different types that use the same enum, which is confusing.
    
    Use the object type to indicate the type of object the mark is attached
    to and the iter type to indicate the type of watch.
    
    A group can have two different watches of the same object type (parent
    and child watches) that match the same event.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fsnotify: Support FS_ERROR event type [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:32 2021 -0300

    fsnotify: Support FS_ERROR event type
    
    [ Upstream commit 9daa811073fa19c08e8aad3b90f9235fed161acf ]
    
    Expose a new type of fsnotify event for filesystems to report errors for
    userspace monitoring tools.  fanotify will send this type of
    notification for FAN_FS_ERROR events.  This also introduces a helper for
    generating the new event.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

fuse: don't unhash root [+ + +]

Author: Miklos Szeredi <[email protected]>
Date:   Wed Feb 28 16:50:49 2024 +0100

    fuse: don't unhash root
    
    [ Upstream commit b1fe686a765e6c0d71811d825b5a1585a202b777 ]
    
    The root inode is assumed to be always hashed.  Do not unhash the root
    inode even if it is marked BAD.
    
    Fixes: 5d069dbe8aaf ("fuse: fix bad inode")
    Cc: <[email protected]> # v5.11
    Signed-off-by: Miklos Szeredi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

fuse: fix root lookup with nonzero generation [+ + +]

Author: Miklos Szeredi <[email protected]>
Date:   Wed Feb 28 16:50:49 2024 +0100

    fuse: fix root lookup with nonzero generation
    
    [ Upstream commit 68ca1b49e430f6534d0774a94147a823e3b8b26e ]
    
    The root inode has a fixed nodeid and generation (1, 0).
    
    Prior to commit 15db16837a35 ("fuse: fix illegal access to inode with
    reused nodeid"), the generation number on lookup was ignored.  After this
    commit, a lookup with the wrong generation number resulted in the inode
    being unhashed.  This is correct for non-root inodes, but replacing the
    root inode is wrong and results in weird behavior.
    
    Fix by reverting to the old behavior of ignoring the generation for the
    root inode, but issuing a warning in dmesg.
    
    Reported-by: Antonio SJ Musumeci <[email protected]>
    Closes: https://lore.kernel.org/all/CAOQ4uxhek5ytdN8Yz2tNEOg5ea4NkBb4nk0FGPjPk_9nz-VG3g@mail.gmail.com/
    Fixes: 15db16837a35 ("fuse: fix illegal access to inode with reused nodeid")
    Cc: <[email protected]> # v5.14
    Signed-off-by: Miklos Szeredi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
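
The decision this commit describes can be sketched in userspace C as follows. The function name check_generation, the enum lookup_action and the use of stderr in place of dmesg are assumptions for illustration; only the root's (nodeid 1, generation 0) pair comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ROOT_NODEID 1u  /* the commit message gives (1, 0) for the root */

enum lookup_action { KEEP_INODE, REPLACE_INODE };

/*
 * Decide what a lookup reply means for an inode that is already cached.
 * A generation mismatch replaces a non-root inode, but the root inode is
 * always kept and the mismatch is only reported.
 */
static enum lookup_action check_generation(uint64_t nodeid,
                                           uint64_t cached_generation,
                                           uint64_t reply_generation)
{
    if (nodeid == ROOT_NODEID) {
        if (reply_generation != 0)
            fprintf(stderr,
                    "warning: root lookup returned generation %llu; ignoring it\n",
                    (unsigned long long)reply_generation);
        return KEEP_INODE;
    }
    return reply_generation == cached_generation ? KEEP_INODE
                                                 : REPLACE_INODE;
}
```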

gro: fix ownership transfer [+ + +]

Author: Antoine Tenart <[email protected]>
Date:   Tue Mar 26 12:33:59 2024 +0100

    gro: fix ownership transfer
    
    commit ed4cccef64c1d0d5b91e69f7a8a6697c3a865486 upstream.
    
    If packets are GROed with fraglist they might be segmented later on and
    continue their journey in the stack. In skb_segment_list those skbs can
    be reused as-is. This is an issue as their destructor was removed in
    skb_gro_receive_list but not the reference to their socket, and then
    they can't be orphaned. Fix this by also removing the reference to the
    socket.
    
    For example, the following could be observed:
    
      kernel BUG at include/linux/skbuff.h:3131!  (skb_orphan)
      RIP: 0010:ip6_rcv_core+0x11bc/0x19a0
      Call Trace:
       ipv6_list_rcv+0x250/0x3f0
       __netif_receive_skb_list_core+0x49d/0x8f0
       netif_receive_skb_list_internal+0x634/0xd40
       napi_complete_done+0x1d2/0x7d0
       gro_cell_poll+0x118/0x1f0
    
    A similar construction is found in skb_gro_receive; apply the same
    change there.
    
    Fixes: 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference")
    Signed-off-by: Antoine Tenart <[email protected]>
    Reviewed-by: Willem de Bruijn <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
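
The invariant behind this fix can be shown with a tiny userspace model. The struct fields mirror the two fields the commit talks about (destructor and socket reference); the helper names skb_can_be_orphaned and gro_take_ownership are hypothetical and exist only in this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct sock { int refs; };

/* Hypothetical, stripped-down stand-in for a socket buffer. */
struct skb {
    struct sock *sk;
    void (*destructor)(struct skb *skb);
};

/*
 * Orphaning relies on the destructor to release the socket reference, so
 * a buffer that still points at a socket but has no destructor is in an
 * inconsistent state (the condition reported by the BUG in the trace).
 */
static int skb_can_be_orphaned(const struct skb *skb)
{
    assert(skb != NULL);
    return skb->destructor != NULL || skb->sk == NULL;
}

/* Ownership handover as the fix describes it: clear both fields together. */
static void gro_take_ownership(struct skb *skb)
{
    assert(skb != NULL);
    skb->destructor = NULL;
    skb->sk = NULL;
}
```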

hexagon: vmlinux.lds.S: handle attributes section [+ + +]

Author: Nathan Chancellor <[email protected]>
Date:   Tue Mar 19 17:37:46 2024 -0700

    hexagon: vmlinux.lds.S: handle attributes section
    
    commit 549aa9678a0b3981d4821bf244579d9937650562 upstream.
    
    After the linked LLVM change, the build fails with
    CONFIG_LD_ORPHAN_WARN_LEVEL="error", which happens with allmodconfig:
    
      ld.lld: error: vmlinux.a(init/main.o):(.hexagon.attributes) is being placed in '.hexagon.attributes'
    
    Handle the attributes section in a similar manner as arm and riscv by
    adding it after the primary ELF_DETAILS grouping in vmlinux.lds.S, which
    fixes the error.
    
    Link: https://lkml.kernel.org/r/20240319-hexagon-handle-attributes-section-vmlinux-lds-s-v1-1-59855dab8872@kernel.org
    Fixes: 113616ec5b64 ("hexagon: select ARCH_WANT_LD_ORPHAN_WARN")
    Link: https://github.com/llvm/llvm-project/commit/31f4b329c8234fab9afa59494d7f8bdaeaefeaad
    Signed-off-by: Nathan Chancellor <[email protected]>
    Reviewed-by: Brian Cain <[email protected]>
    Cc: Bill Wendling <[email protected]>
    Cc: Justin Stitt <[email protected]>
    Cc: Nick Desaulniers <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

HID: uhid: Use READ_ONCE()/WRITE_ONCE() for ->running [+ + +]

Author: Jann Horn <[email protected]>
Date:   Fri Jan 14 14:33:31 2022 +0100

    HID: uhid: Use READ_ONCE()/WRITE_ONCE() for ->running
    
    [ Upstream commit c8e7ff41f819b0c31c66c5196933c26c18f7681f ]
    
    The flag uhid->running can be set to false by uhid_device_add_worker()
    without holding the uhid->devlock. Mark all reads/writes of the flag
    that might race by using READ_ONCE()/WRITE_ONCE(), for clarity and
    correctness.
    
    Signed-off-by: Jann Horn <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

hwmon: (amc6821) add of_match table [+ + +]

Author: Josua Mayer <[email protected]>
Date:   Thu Mar 7 12:06:58 2024 +0100

    hwmon: (amc6821) add of_match table
    
    [ Upstream commit 3f003fda98a7a8d5f399057d92e6ed56b468657c ]
    
    Add an of_match table for the "ti,amc6821" compatible string.
    This fixes automatic driver loading by userspace when using device tree
    and the driver is built as a module, as major Linux distributions do.
    
    While devices probe just fine with the i2c_device_id table, userspace
    can't match the "ti,amc6821" compatible string from the device tree with
    the plain "amc6821" device id. As a result, the kernel module cannot be
    loaded.
    
    Cc: [email protected]
    Signed-off-by: Josua Mayer <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [groeck: Cleaned up patch description]
    Signed-off-by: Guenter Roeck <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i2c: i801: Avoid potential double call to gpiod_remove_lookup_table [+ + +]

Author: Heiner Kallweit <[email protected]>
Date:   Mon Mar 4 21:31:06 2024 +0100

    i2c: i801: Avoid potential double call to gpiod_remove_lookup_table
    
    commit ceb013b2d9a2946035de5e1827624edc85ae9484 upstream.
    
    If registering the platform device fails, the lookup table is
    removed in the error path. On module removal we would try to
    remove the lookup table again. Fix this by setting priv->lookup
    only if registering the platform device was successful.
    In addition free the memory allocated for the lookup table in
    the error path.
    
    Fixes: d308dfbf62ef ("i2c: mux/i801: Switch to use descriptor passing")
    Cc: [email protected]
    Reviewed-by: Andi Shyti <[email protected]>
    Reviewed-by: Linus Walleij <[email protected]>
    Signed-off-by: Heiner Kallweit <[email protected]>
    Signed-off-by: Andi Shyti <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
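
The error-path pattern described here (publish the pointer only after the last step that can fail) is sketched below in userspace C. All names (struct lookup_table, struct priv, setup, teardown, remove_calls) are hypothetical, and the register_fails parameter merely simulates a failing platform-device registration.

```c
#include <assert.h>
#include <stdlib.h>

struct lookup_table { int registered; };
struct priv { struct lookup_table *lookup; };

static int remove_calls; /* counts removals so a double call is visible */

static void table_add(struct lookup_table *t)
{
    t->registered = 1;
}

static void table_remove(struct lookup_table *t)
{
    t->registered = 0;
    remove_calls++;
}

static int setup(struct priv *p, int register_fails)
{
    struct lookup_table *t = calloc(1, sizeof(*t));

    if (!t)
        return -1;
    table_add(t);
    if (register_fails) {
        /* Error path: undo the add, free the table, publish nothing. */
        table_remove(t);
        free(t);
        return -1;
    }
    p->lookup = t; /* set only after every step succeeded */
    return 0;
}

static void teardown(struct priv *p)
{
    assert(p != NULL);
    if (p->lookup) { /* NULL after a failed setup: nothing to remove */
        table_remove(p->lookup);
        free(p->lookup);
        p->lookup = NULL;
    }
}
```

Because teardown() keys off the published pointer, a failed setup() followed by teardown() removes the table exactly once.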

i40e: Enforce software interrupt during busy-poll exit [+ + +]

Author: Ivan Vecera <[email protected]>
Date:   Sat Mar 16 12:38:29 2024 +0100

    i40e: Enforce software interrupt during busy-poll exit
    
    [ Upstream commit ea558de7238bb12c3435c47f0631e9d17bf4a09f ]
    
    As with the ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
    during poll exit") followed by commit 23be7075b318 ("ice: fix software
    generating extra interrupts"), I'm seeing a similar issue with the
    i40e driver.
    
    In certain situations, when busy polling is enabled together with
    adaptive coalescing, the driver occasionally misses that there are
    outstanding descriptors to clean when exiting busy poll.
    
    Try to catch the remaining work by triggering a software interrupt
    when exiting busy poll. No extra interrupts will be generated when
    busy polling is not used.
    
    The issue was found when running the sockperf ping-pong TCP test with
    adaptive coalescing and busy poll enabled (the busy_poll and busy_read
    sysctl knobs set to 50); it results in huge latency spikes of more
    than 100000us.
    
    The fix is inspired by the ice driver and does the following:
    1) During napi poll exit in case of busy-poll (napi_complete_done()
       returns false), records in the q_vector that we were in busy
       loop.
    2) Extends i40e_buildreg_itr() to be able to add an enforced software
       interrupt into the built value.
    3) In i40e_update_enable_itr(), enforces a software interrupt trigger
       if we are exiting busy poll to catch any pending clean-ups.
    4) Reuses the unused 3rd ITR (interrupt throttle) index and sets it to
       20K interrupts per second to limit the number of these sw interrupts.
    
    Test results
    ============
    Prior:
    [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
    sockperf: == version #3.10-no.git ==
    sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
    
    [ 0] IP = 10.9.9.1        PORT = 11111 # TCP
    sockperf: Warmup stage (sending a few dummy messages)...
    sockperf: Starting test...
    sockperf: Test end (interrupted by timer)
    sockperf: Test ended
    sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562
    sockperf: ========= Printing statistics for Server No: 0
    sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473
    sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725])
    sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
    sockperf: Summary: Latency is 24.571 usec
    sockperf: Total 2429473 observations; each percentile contains 24294.73 observations
    sockperf: ---> <MAX> observation = 103294.331
    sockperf: ---> percentile 99.999 =   45.633
    sockperf: ---> percentile 99.990 =   37.013
    sockperf: ---> percentile 99.900 =   35.910
    sockperf: ---> percentile 99.000 =   33.390
    sockperf: ---> percentile 90.000 =   28.626
    sockperf: ---> percentile 75.000 =   27.741
    sockperf: ---> percentile 50.000 =   26.743
    sockperf: ---> percentile 25.000 =   25.614
    sockperf: ---> <MIN> observation =   12.220
    
    After:
    [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
    sockperf: == version #3.10-no.git ==
    sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
    
    [ 0] IP = 10.9.9.1        PORT = 11111 # TCP
    sockperf: Warmup stage (sending a few dummy messages)...
    sockperf: Starting test...
    sockperf: Test end (interrupted by timer)
    sockperf: Test ended
    sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054
    sockperf: ========= Printing statistics for Server No: 0
    sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186
    sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975])
    sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
    sockperf: Summary: Latency is 24.965 usec
    sockperf: Total 2391186 observations; each percentile contains 23911.86 observations
    sockperf: ---> <MAX> observation =  195.841
    sockperf: ---> percentile 99.999 =   45.026
    sockperf: ---> percentile 99.990 =   39.009
    sockperf: ---> percentile 99.900 =   35.922
    sockperf: ---> percentile 99.000 =   33.482
    sockperf: ---> percentile 90.000 =   28.902
    sockperf: ---> percentile 75.000 =   27.821
    sockperf: ---> percentile 50.000 =   26.860
    sockperf: ---> percentile 25.000 =   25.685
    sockperf: ---> <MIN> observation =   12.277
    
    Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit")
    Reported-by: Hugo Ferreira <[email protected]>
    Reviewed-by: Michal Schmidt <[email protected]>
    Signed-off-by: Ivan Vecera <[email protected]>
    Reviewed-by: Jesse Brandeburg <[email protected]>
    Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i40e: fix i40e_count_filters() to count only active/new filters [+ + +]

Author: Aleksandr Loktionov <[email protected]>
Date:   Wed Mar 13 10:44:00 2024 +0100

    i40e: fix i40e_count_filters() to count only active/new filters
    
    commit eb58c598ce45b7e787568fe27016260417c3d807 upstream.
    
    The bug usually affects untrusted VFs: because they are limited to
    18 MACs, it hits them hard, preventing them from creating all of
    their MAC filters. It does not reproduce reliably; it happens when a
    VF user creates MAC filters while other MACVLAN operations are
    running in parallel. The consequence is that the VF can't receive the
    desired traffic.
    
    Fix counter to be bumped only for new or active filters.
    
    Fixes: 621650cabee5 ("i40e: Refactoring VF MAC filters counting to make more reliable")
    Signed-off-by: Aleksandr Loktionov <[email protected]>
    Reviewed-by: Arkadiusz Kubalewski <[email protected]>
    Reviewed-by: Paul Menzel <[email protected]>
    Tested-by: Rafal Romanowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
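
A minimal sketch of the corrected counting rule, assuming a filter can be in one of five states. The enum and the function count_filters are illustrative stand-ins, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative filter states; the names are assumptions of this sketch. */
enum filter_state {
    FILTER_INVALID,
    FILTER_NEW,
    FILTER_ACTIVE,
    FILTER_FAILED,
    FILTER_REMOVE,
};

/*
 * Count only filters that are new or active. Filters that failed or are
 * queued for removal must not eat into the per-VF MAC limit.
 */
static int count_filters(const enum filter_state *states, size_t n)
{
    int cnt = 0;

    assert(states != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (states[i] == FILTER_NEW || states[i] == FILTER_ACTIVE)
            cnt++;
    return cnt;
}
```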

i40e: fix vf may be used uninitialized in this function warning [+ + +]

Author: Aleksandr Loktionov <[email protected]>
Date:   Wed Mar 13 10:56:39 2024 +0100

    i40e: fix vf may be used uninitialized in this function warning
    
    commit f37c4eac99c258111d414d31b740437e1925b8e8 upstream.
    
    Fix the regression introduced by commit 52424f974bc5, which causes
    servers to hang in very hard-to-reproduce conditions involving reset
    races. Using two sources for the information is the root cause.
    Before the fix, bumping v in this function did not mean bumping the vf
    pointer, but the code used these variables interchangeably, so a stale
    vf could point to a different, unintended VF.
    
    Remove the redundant "v" variable and iterate via a single VF pointer
    across the whole function instead, to guarantee VF pointer validity.
    
    Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF")
    Signed-off-by: Aleksandr Loktionov <[email protected]>
    Reviewed-by: Arkadiusz Kubalewski <[email protected]>
    Reviewed-by: Przemek Kitszel <[email protected]>
    Reviewed-by: Paul Menzel <[email protected]>
    Tested-by: Rafal Romanowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
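
The single-iterator idea can be illustrated with a short userspace sketch. struct vf, its busy field and reset_idle_vfs are hypothetical; the point is only that the loop header is the one place where the cursor advances, so a skipped element can never leave a second variable stale.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for a VF descriptor. */
struct vf {
    int id;
    int busy;
    int reset_done;
};

/* Reset every idle VF using one pointer as the only loop cursor. */
static int reset_idle_vfs(struct vf *vfs, size_t n)
{
    int done = 0;

    assert(vfs != NULL || n == 0);
    for (struct vf *vf = vfs; vf < vfs + n; ++vf) {
        if (vf->busy)
            continue; /* the loop header still advances vf */
        vf->reset_done = 1;
        done++;
    }
    return done;
}
```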

i40e: Remove _t suffix from enum type names [+ + +]

Author: Ivan Vecera <[email protected]>
Date:   Mon Nov 13 15:10:24 2023 -0800

    i40e: Remove _t suffix from enum type names
    
    [ Upstream commit addca9175e5f74cf29e8ad918c38c09b8663b5b8 ]
    
    Enum type names should not be suffixed with '_t'. The suffix is
    appropriate only with 'typedef enum name name_t', so that a plain
    'name_t var' is written instead of 'enum name_t var'.
    
    Signed-off-by: Ivan Vecera <[email protected]>
    Reviewed-by: Jacob Keller <[email protected]>
    Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Stable-dep-of: ea558de7238b ("i40e: Enforce software interrupt during busy-poll exit")
    Signed-off-by: Sasha Levin <[email protected]>

i40e: Store the irq number in i40e_q_vector [+ + +]

Author: Joe Damato <[email protected]>
Date:   Fri Oct 7 14:38:40 2022 -0700

    i40e: Store the irq number in i40e_q_vector
    
    [ Upstream commit 6b85a4f39ff7177b2428d4deab1151a31754e391 ]
    
    Make it easy to figure out the IRQ number for a particular i40e_q_vector by
    storing the assigned IRQ in the structure itself.
    
    Signed-off-by: Joe Damato <[email protected]>
    Acked-by: Jesse Brandeburg <[email protected]>
    Acked-by: Sridhar Samudrala <[email protected]>
    Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <[email protected]>
    Stable-dep-of: ea558de7238b ("i40e: Enforce software interrupt during busy-poll exit")
    Signed-off-by: Sasha Levin <[email protected]>

init: open /initrd.image with O_LARGEFILE [+ + +]

Author: John Sperbeck <[email protected]>
Date:   Sun Mar 17 15:15:22 2024 -0700

    init: open /initrd.image with O_LARGEFILE
    
    commit 4624b346cf67400ef46a31771011fb798dd2f999 upstream.
    
    If initrd data is larger than 2 GiB, we'll eventually fail to write to
    the /initrd.image file when we hit that limit, unless O_LARGEFILE is set.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: John Sperbeck <[email protected]>
    Cc: Jens Axboe <[email protected]>
    Cc: Nick Desaulniers <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
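
The limit mentioned in this commit can be made concrete with a small model. This is a simplified sketch: writable_bytes is a hypothetical helper, the largefile parameter stands in for O_LARGEFILE being set on the file, and filesystem-specific size limits are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Largest offset reachable without large-file support: 2^31 - 1 bytes. */
#define MAX_NON_LFS ((UINT64_C(1) << 31) - 1)

/*
 * How many of `count` bytes may be written at offset `pos`.
 * Without large-file support the write is clamped at MAX_NON_LFS and is
 * rejected entirely once the offset has reached that limit.
 */
static uint64_t writable_bytes(uint64_t pos, uint64_t count, int largefile)
{
    if (largefile)
        return count;
    if (pos >= MAX_NON_LFS)
        return 0;
    return count < MAX_NON_LFS - pos ? count : MAX_NON_LFS - pos;
}
```

With the flag set, the same write at the 2 GiB boundary goes through unchanged, which is why a large initrd image can only be written out completely when the file is opened with O_LARGEFILE.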

inotify: Don't force FS_IN_IGNORED [+ + +]

Author: Gabriel Krisman Bertazi <[email protected]>
Date:   Mon Oct 25 16:27:22 2021 -0300

    inotify: Don't force FS_IN_IGNORED
    
    [ Upstream commit e0462f91d24756916fded4313d508e0fc52f39c9 ]
    
    According to Amir:
    
    "FS_IN_IGNORED is completely internal to inotify and there is no need
    to set it in i_fsnotify_mask at all, so if we remove the bit from the
    output of inotify_arg_to_mask() no functionality will change and we will
    be able to overload the event bit for FS_ERROR."
    
    This is done in preparation to overload FS_ERROR with the notification
    mechanism in fanotify.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Amir Goldstein <[email protected]>
    Reviewed-by: Amir Goldstein <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

inotify: move control flags from mask to mark flags [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:13 2022 +0300

    inotify: move control flags from mask to mark flags
    
    [ Upstream commit 38035c04f5865c4ef9597d6beed6a7178f90f64a ]
    
    The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not
    relevant to object interest mask, so move them to the mark flags.
    
    This frees up some bits in the object interest mask.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

inotify: use fsnotify group lock helpers [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:18 2022 +0300

    inotify: use fsnotify group lock helpers
    
    [ Upstream commit 642054b87058019be36033f73c3e48ffff1915aa ]
    
    inotify inode marks pin the inode so there is no need to set the
    FSNOTIFY_GROUP_NOFS flag.
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]/
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

io_uring: ensure '0' is returned on file registration success [+ + +]

Author: Jens Axboe <[email protected]>
Date:   Tue Apr 2 08:28:04 2024 -0600

    io_uring: ensure '0' is returned on file registration success
    
    A previous backport mistakenly removed code that cleared 'ret' to zero,
    as the SCM logging was performed. Fix up the return value so we don't
    return an errant error on fixed file registration.
    
    Fixes: d909d381c315 ("io_uring: drop any code related to SCM_RIGHTS")
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iommu/dma: Force swiotlb_max_mapping_size on an untrusted device [+ + +]

Author: Nicolin Chen <[email protected]>
Date:   Fri Mar 8 15:28:28 2024 +0000

    iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
    
    [ Upstream commit afc5aa46ed560f01ceda897c053c6a40c77ce5c4 ]
    
    The swiotlb does not support a mapping size > swiotlb_max_mapping_size().
    On the other hand, with a 64KB PAGE_SIZE configuration, an NVMe device was
    observed mapping sizes between 300KB and 512KB, which inevitably fail the
    swiotlb mapping even though the default swiotlb pool has many free slots:
        systemd[1]: Started Journal Service.
     => nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
        note: journal-offline[392] exited with irqs disabled
        note: journal-offline[392] exited with preempt_count 1
    
    Call trace:
    [    3.099918]  swiotlb_tbl_map_single+0x214/0x240
    [    3.099921]  iommu_dma_map_page+0x218/0x328
    [    3.099928]  dma_map_page_attrs+0x2e8/0x3a0
    [    3.101985]  nvme_prep_rq.part.0+0x408/0x878 [nvme]
    [    3.102308]  nvme_queue_rqs+0xc0/0x300 [nvme]
    [    3.102313]  blk_mq_flush_plug_list.part.0+0x57c/0x600
    [    3.102321]  blk_add_rq_to_plug+0x180/0x2a0
    [    3.102323]  blk_mq_submit_bio+0x4c8/0x6b8
    [    3.103463]  __submit_bio+0x44/0x220
    [    3.103468]  submit_bio_noacct_nocheck+0x2b8/0x360
    [    3.103470]  submit_bio_noacct+0x180/0x6c8
    [    3.103471]  submit_bio+0x34/0x130
    [    3.103473]  ext4_bio_write_folio+0x5a4/0x8c8
    [    3.104766]  mpage_submit_folio+0xa0/0x100
    [    3.104769]  mpage_map_and_submit_buffers+0x1a4/0x400
    [    3.104771]  ext4_do_writepages+0x6a0/0xd78
    [    3.105615]  ext4_writepages+0x80/0x118
    [    3.105616]  do_writepages+0x90/0x1e8
    [    3.105619]  filemap_fdatawrite_wbc+0x94/0xe0
    [    3.105622]  __filemap_fdatawrite_range+0x68/0xb8
    [    3.106656]  file_write_and_wait_range+0x84/0x120
    [    3.106658]  ext4_sync_file+0x7c/0x4c0
    [    3.106660]  vfs_fsync_range+0x3c/0xa8
    [    3.106663]  do_fsync+0x44/0xc0
    
    Since untrusted devices might go down the swiotlb pathway with dma-iommu,
    these devices should not map a size larger than swiotlb_max_mapping_size.
    
    To fix this bug, add iommu_dma_max_mapping_size() for untrusted devices to
    take into account swiotlb_max_mapping_size() vs. iova_rcache_range() from
    the iommu_dma_opt_mapping_size().
    
    Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
    Link: https://lore.kernel.org/r/ee51a3a5c32cf885b18f6416171802669f4a718a.1707851466.git.nicolinc@nvidia.com
    Signed-off-by: Nicolin Chen <[email protected]>
    [will: Drop redundant is_swiotlb_active(dev) check]
    Signed-off-by: Will Deacon <[email protected]>
    Reviewed-by: Michael Kelley <[email protected]>
    Acked-by: Robin Murphy <[email protected]>
    Tested-by: Nicolin Chen <[email protected]>
    Tested-by: Michael Kelley <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
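
The size check behind the iommu/dma commit above can be sketched in userspace C. This is a simplified model, assuming the usual swiotlb geometry of 2 KiB slots with at most 128 contiguous slots per mapping; all names ending in _model are hypothetical, and the real swiotlb_max_mapping_size() may apply further alignment adjustments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed swiotlb geometry: 2 KiB slots, at most 128 contiguous slots. */
#define IO_TLB_SHIFT_MODEL   11
#define IO_TLB_SEGSIZE_MODEL 128

/* Largest single bounce-buffer mapping under the assumed geometry. */
static size_t swiotlb_max_mapping_size_model(void)
{
	return ((size_t)1 << IO_TLB_SHIFT_MODEL) * IO_TLB_SEGSIZE_MODEL;
}

/* Untrusted devices may be bounced through swiotlb, so they get its limit;
 * other devices are left unlimited in this model. */
static size_t dma_max_mapping_size_model(bool untrusted)
{
	return untrusted ? swiotlb_max_mapping_size_model() : SIZE_MAX;
}
```

With these assumptions the limit for an untrusted device is 256 KiB, so the 327680-byte request in the log above exceeds it no matter how many slots are free.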

ipv6: Fix infinite recursion in fib6_dump_done(). [+ + +]

Author: Kuniyuki Iwashima <[email protected]>
Date:   Mon Apr 1 14:10:04 2024 -0700

    ipv6: Fix infinite recursion in fib6_dump_done().
    
    commit d21d40605bca7bd5fc23ef03d4c1ca1f48bc2cae upstream.
    
    syzkaller reported infinite recursive calls of fib6_dump_done() during
    netlink socket destruction.  [1]
    
    From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then
    the response was generated.  The following recvmmsg() resumed the dump
    for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due
    to the fault injection.  [0]
    
      12:01:34 executing program 3:
      r0 = socket$nl_route(0x10, 0x3, 0x0)
      sendmsg$nl_route(r0, ... snip ...)
      recvmmsg(r0, ... snip ...) (fail_nth: 8)
    
    Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call
    of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3].  syzkaller stopped
    receiving the response halfway through, and finally netlink_sock_destruct()
    called nlk_sk(sk)->cb.done().
    
    fib6_dump_done() calls fib6_dump_end() and then nlk_sk(sk)->cb.done() if it
    is still not NULL.  fib6_dump_end() restores nlk_sk(sk)->cb.done from
    nlk_sk(sk)->cb.args[3], but that slot now holds fib6_dump_done() itself
    rather than NULL, so the function calls itself recursively until it hits
    the stack guard page.
    
    To avoid the issue, let's set the destructor after kzalloc().
    
    [0]:
    FAULT_INJECTION: forcing a failure.
    name failslab, interval 1, probability 0, space 0, times 0
    CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl (lib/dump_stack.c:117)
     should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
     should_failslab (mm/slub.c:3733)
     kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992)
     inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662)
     rtnl_dump_all (net/core/rtnetlink.c:4029)
     netlink_dump (net/netlink/af_netlink.c:2269)
     netlink_recvmsg (net/netlink/af_netlink.c:1988)
     ____sys_recvmsg (net/socket.c:1046 net/socket.c:2801)
     ___sys_recvmsg (net/socket.c:2846)
     do_recvmmsg (net/socket.c:2943)
     __x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034)
    
    [1]:
    BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb)
    stack guard page: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
    Workqueue: events netlink_sock_destruct_work
    RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570)
    Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff
    RSP: 0018:ffffc9000d980000 EFLAGS: 00010293
    RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3
    RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358
    RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000
    R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68
    FS:  0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0
    PKRU: 55555554
    Call Trace:
     <#DF>
     </#DF>
     <TASK>
     fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
     fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
     ...
     fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
     fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
     netlink_sock_destruct (net/netlink/af_netlink.c:401)
     __sk_destruct (net/core/sock.c:2177 (discriminator 2))
     sk_destruct (net/core/sock.c:2224)
     __sk_free (net/core/sock.c:2235)
     sk_free (net/core/sock.c:2246)
     process_one_work (kernel/workqueue.c:3259)
     worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
     kthread (kernel/kthread.c:388)
     ret_from_fork (arch/x86/kernel/process.c:153)
     ret_from_fork_asm (arch/x86/entry/entry_64.S:256)
    Modules linked in:
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: syzkaller <[email protected]>
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Reviewed-by: David Ahern <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
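
The recursion described in the fib6_dump_done() commit above can be reproduced with a small userspace model. This is a sketch under stated assumptions, not the kernel code: struct cb_model, its fields, and all function names are hypothetical stand-ins for the netlink callback, cb->args[2], and cb->args[3], and a depth guard replaces the stack guard page.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 64 /* guard so the model stops instead of overflowing the stack */

struct cb_model {
	int (*done)(struct cb_model *cb);       /* stands in for cb->done */
	void *walker;                           /* stands in for cb->args[2] */
	int (*saved_done)(struct cb_model *cb); /* stands in for cb->args[3] */
	int depth;                              /* number of destructor calls */
};

/* Model of fib6_dump_done(): restore the saved destructor, then chain to it. */
static int dump_done_model(struct cb_model *cb)
{
	if (++cb->depth >= MAX_DEPTH)
		return -1; /* the real kernel hits the stack guard page here */
	cb->walker = NULL;
	cb->done = cb->saved_done;
	return cb->done ? cb->done(cb) : 0;
}

/* One "new dump" attempt; alloc_ok simulates whether kzalloc() succeeded. */
static int dump_start_model(struct cb_model *cb, bool alloc_ok, bool fixed)
{
	static int walker_storage;

	if (cb->walker)
		return 0; /* dump already in progress */
	if (fixed && !alloc_ok)
		return -1; /* fixed order: fail before touching cb->done */
	cb->saved_done = cb->done;
	cb->done = dump_done_model;
	if (!alloc_ok)
		return -1; /* old order: cb->done is already overwritten */
	cb->walker = &walker_storage;
	return 0;
}

/* Fail once, resume, then destroy; returns how often the destructor ran. */
static int destructor_calls(bool fixed)
{
	struct cb_model cb = {0};

	dump_start_model(&cb, false, fixed); /* first call: allocation fails */
	dump_start_model(&cb, true, fixed);  /* resumed dump: allocation succeeds */
	if (cb.done)
		cb.done(&cb);                /* socket destruction */
	return cb.depth;
}
```

In this model the old ordering runs the destructor until the depth guard trips, while the ordering that allocates first runs it exactly once.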

iwlwifi: mvm: rfi: use kmemdup() to replace kzalloc + memcpy [+ + +]

Author: Bixuan Cui <[email protected]>
Date:   Wed Oct 27 14:58:40 2021 +0800

    iwlwifi: mvm: rfi: use kmemdup() to replace kzalloc + memcpy
    
    [ Upstream commit 08186e2501eec554cde8bae53b1d1de4d54abdf4 ]
    
    Fix memdup.cocci warning:
    ./drivers/net/wireless/intel/iwlwifi/mvm/rfi.c:110:8-15: WARNING
    opportunity for kmemdup
    
    Signed-off-by: Bixuan Cui <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Luca Coelho <[email protected]>
    Stable-dep-of: 06a093807eb7 ("wifi: iwlwifi: mvm: rfi: fix potential response leaks")
    Signed-off-by: Sasha Levin <[email protected]>

ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa() [+ + +]

Author: Przemek Kitszel <[email protected]>
Date:   Tue Mar 5 17:02:02 2024 +0100

    ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()
    
    [ Upstream commit aec806fb4afba5fe80b09e29351379a4292baa43 ]
    
    Change kzalloc() flags used in ixgbe_ipsec_vf_add_sa() to GFP_ATOMIC, to
    avoid sleeping in IRQ context.
    
    Dan Carpenter, with the help of Smatch, has found the following issue:
    The patch eda0333ac293: "ixgbe: add VF IPsec management" from Aug 13,
    2018 (linux-next), leads to the following Smatch static checker
    warning: drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c:917 ixgbe_ipsec_vf_add_sa()
            warn: sleeping in IRQ context
    
    The call tree that Smatch is worried about is:
    ixgbe_msix_other() <- IRQ handler
    -> ixgbe_msg_task()
       -> ixgbe_rcv_msg_from_vf()
          -> ixgbe_ipsec_vf_add_sa()
    
    Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
    Reported-by: Dan Carpenter <[email protected]>
    Link: https://lore.kernel.org/intel-wired-lan/[email protected]
    Reviewed-by: Michal Kubiak <[email protected]>
    Signed-off-by: Przemek Kitszel <[email protected]>
    Reviewed-by: Shannon Nelson <[email protected]>
    Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

kasan/test: avoid gcc warning for intentional overflow [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Mon Feb 12 12:15:52 2024 +0100

    kasan/test: avoid gcc warning for intentional overflow
    
    [ Upstream commit e10aea105e9ed14b62a11844fec6aaa87c6935a3 ]
    
    The out-of-bounds test allocates an object that is three bytes too short
    in order to validate the bounds checking.  Starting with gcc-14, this
    causes a compile-time warning as gcc has grown smart enough to understand
    the sizeof() logic:
    
    mm/kasan/kasan_test.c: In function 'kmalloc_oob_16':
    mm/kasan/kasan_test.c:443:14: error: allocation of insufficient size '13' for type 'struct <anonymous>' with size '16' [-Werror=alloc-size]
      443 |         ptr1 = kmalloc(sizeof(*ptr1) - 3, GFP_KERNEL);
          |              ^
    
    Hide the actual computation behind a RELOC_HIDE() that ensures
    the compiler misses the intentional bug.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 3f15801cdc23 ("lib: add kasan test module")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Reviewed-by: Andrey Konovalov <[email protected]>
    Cc: Alexander Potapenko <[email protected]>
    Cc: Andrey Ryabinin <[email protected]>
    Cc: Arnd Bergmann <[email protected]>
    Cc: Dmitry Vyukov <[email protected]>
    Cc: Marco Elver <[email protected]>
    Cc: Vincenzo Frascino <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
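
The general idea of hiding a value from the compiler's static analysis, as the kasan/test commit above does with RELOC_HIDE(), can be sketched in userspace. The macro below is not the kernel macro: HIDE_VALUE is a hypothetical name, it relies on GCC/Clang statement expressions and inline asm, and it hides the size rather than reproducing exactly what the patch hides.

```c
#include <stddef.h>

/*
 * Pass a value through an empty asm statement so the compiler can no longer
 * treat it as a known constant. GCC/Clang extension; HIDE_VALUE is an
 * illustrative name, not a kernel symbol.
 */
#define HIDE_VALUE(x) ({ __typeof__(x) v_ = (x); __asm__("" : "+r"(v_)); v_; })

struct sixteen_bytes { char data[16]; };

/* Computes a deliberately short allocation size (three bytes too small)
 * in a form the compiler cannot fold into a compile-time constant. */
static size_t short_alloc_size(void)
{
	size_t size = sizeof(struct sixteen_bytes) - 3;

	return HIDE_VALUE(size);
}
```

The value is unchanged at run time; only the compiler's knowledge of it is removed, which is what keeps an intentional test bug from being diagnosed at build time.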

kasan: test: add memcpy test that avoids out-of-bounds write [+ + +]

Author: Peter Collingbourne <[email protected]>
Date:   Fri Nov 5 13:35:56 2021 -0700

    kasan: test: add memcpy test that avoids out-of-bounds write
    
    [ Upstream commit 758cabae312d3aded781aacc6d0c946b299c52df ]
    
    With HW tag-based KASAN, error checks are performed implicitly by the
    load and store instructions in the memcpy implementation.  A failed
    check results in tag checks being disabled and execution will keep
    going.  As a result, under HW tag-based KASAN, prior to commit
    1b0668be62cf ("kasan: test: disable kmalloc_memmove_invalid_size for
    HW_TAGS"), this memcpy would end up corrupting memory until it hits an
    inaccessible page and causes a kernel panic.
    
    This is a pre-existing issue that was revealed by commit 285133040e6c
    ("arm64: Import latest memcpy()/memmove() implementation") which changed
    the memcpy implementation from using signed comparisons (incorrectly,
    resulting in the memcpy being terminated early for negative sizes) to
    using unsigned comparisons.
    
    It is unclear how this could be handled by memcpy itself in a reasonable
    way.  One possibility would be to add an exception handler that would
    force memcpy to return if a tag check fault is detected -- this would
    make the behavior roughly similar to generic and SW tag-based KASAN.
    However, this wouldn't solve the problem for asynchronous mode and also
    makes memcpy behavior inconsistent with manually copying data.
    
    This test was added as a part of a series that taught KASAN to detect
    negative sizes in memory operations, see commit 8cceeff48f23 ("kasan:
    detect negative size in memory operation function").  Therefore we
    should keep testing for negative sizes with generic and SW tag-based
    KASAN.  But there is some value in testing small memcpy overflows, so
    let's add another test with memcpy that does not destabilize the kernel
    by performing out-of-bounds writes, and run it in all modes.
    
    Link: https://linux-review.googlesource.com/id/I048d1e6a9aff766c4a53f989fb0c83de68923882
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Peter Collingbourne <[email protected]>
    Reviewed-by: Andrey Konovalov <[email protected]>
    Acked-by: Marco Elver <[email protected]>
    Cc: Robin Murphy <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: Mark Rutland <[email protected]>
    Cc: Evgenii Stepanov <[email protected]>
    Cc: Alexander Potapenko <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Stable-dep-of: e10aea105e9e ("kasan/test: avoid gcc warning for intentional overflow")
    Signed-off-by: Sasha Levin <[email protected]>

kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1 [+ + +]

Author: Nathan Chancellor <[email protected]>
Date:   Tue Mar 5 15:12:47 2024 -0700

    kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
    
    [ Upstream commit 75b5ab134bb5f657ef7979a59106dce0657e8d87 ]
    
    Clang enables -Wenum-enum-conversion and -Wenum-compare-conditional
    under -Wenum-conversion. A recent change in Clang strengthened these
    warnings and they appear frequently in common builds, primarily due to
    several instances in common headers but there are quite a few drivers
    that have individual instances as well.
    
      include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
        508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
            |                            ~~~~~~~~~~~~~~~~~~~~~ ^
        509 |                            item];
            |                            ~~~~
    
      drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c:955:24: warning: conditional expression between different enumeration types ('enum iwl_mac_beacon_flags' and 'enum iwl_mac_beacon_flags_v1') [-Wenum-compare-conditional]
        955 |                 flags |= is_new_rate ? IWL_MAC_BEACON_CCK
            |                                      ^ ~~~~~~~~~~~~~~~~~~
        956 |                           : IWL_MAC_BEACON_CCK_V1;
            |                             ~~~~~~~~~~~~~~~~~~~~~
      drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c:1120:21: warning: conditional expression between different enumeration types ('enum iwl_mac_beacon_flags' and 'enum iwl_mac_beacon_flags_v1') [-Wenum-compare-conditional]
       1120 |                                                0) > 10 ?
            |                                                        ^
       1121 |                         IWL_MAC_BEACON_FILS :
            |                         ~~~~~~~~~~~~~~~~~~~
       1122 |                         IWL_MAC_BEACON_FILS_V1;
            |                         ~~~~~~~~~~~~~~~~~~~~~~
    
    Doing arithmetic between or returning two different types of enums could
    be a bug, so each instance of the warning needs to be evaluated.
    Unfortunately, as mentioned above, there are many instances of this
    warning in many different configurations, which can break the build when
    CONFIG_WERROR is enabled.
    
    To avoid introducing new instances of the warnings while cleaning up the
    disruption for the majority of users, disable these warnings for the
    default build while leaving them on for W=1 builds.
    
    Cc: [email protected]
    Closes: https://github.com/ClangBuiltLinux/linux/issues/2002
    Link: https://github.com/llvm/llvm-project/commit/8c2ae42b3e1c6aa7c18f873edcebff7c0b45a37e
    Acked-by: Yonghong Song <[email protected]>
    Signed-off-by: Nathan Chancellor <[email protected]>
    Acked-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Masahiro Yamada <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
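
For readers unfamiliar with the pattern the kbuild commit above refers to, the following sketch shows arithmetic that mixes two enumeration types, modelled on the vmstat.h example in the warning output. The enum and function names are hypothetical; the casts are one common way to state that integer arithmetic is intended, not necessarily how any particular kernel instance was resolved.

```c
enum zone_stat_model { ZONE_A_MODEL, ZONE_B_MODEL, NR_ZONE_ITEMS_MODEL };
enum numa_stat_model { NUMA_HIT_MODEL, NUMA_MISS_MODEL };

/*
 * Writing `NR_ZONE_ITEMS_MODEL + item` directly mixes two enum types and is
 * the kind of expression -Wenum-enum-conversion reports. Casting both
 * operands makes the intended plain integer arithmetic explicit.
 */
static int text_index_explicit(enum numa_stat_model item)
{
	return (int)NR_ZONE_ITEMS_MODEL + (int)item;
}
```

The computed index is the same either way; only the diagnostic differs.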

ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_info [+ + +]

Author: Marios Makassikis <[email protected]>
Date:   Thu Feb 22 10:58:21 2024 +0100

    ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_info
    
    [ Upstream commit 34cd86b6632718b7df3999d96f51e63de41c5e4f ]
    
    Use vfs_getattr() to retrieve stat information, rather than make
    assumptions about how a filesystem fills inode structs.
    
    Cc: [email protected]
    Signed-off-by: Marios Makassikis <[email protected]>
    Acked-by: Namjae Jeon <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

KVM/VMX: Move VERW closer to VMentry for MDS mitigation [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Tue Mar 12 14:11:14 2024 -0700

    KVM/VMX: Move VERW closer to VMentry for MDS mitigation
    
    commit 43fb862de8f628c5db5e96831c915b9aebf62d33 upstream.
    
    During VMentry VERW is executed to mitigate MDS. After VERW, any memory
    access like register push onto stack may put host data in MDS affected
    CPU buffers. A guest can then use MDS to sample host data.
    
    Although the likelihood of secrets surviving in registers at the current
    VERW callsite is low, it can't be ruled out. Harden the MDS mitigation
    by moving the VERW mitigation late in the VMentry path.
    
    Note that VERW for MMIO Stale Data mitigation is unchanged because of
    the complexity of per-guest conditional VERW which is not easy to handle
    that late in asm with no GPRs available. If the CPU is also affected by
    MDS, VERW is unconditionally executed late in asm regardless of guest
    having MMIO access.
    
      [ pawan: conflict resolved in backport ]
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Acked-by: Sean Christopherson <[email protected]>
    Link: https://lore.kernel.org/all/20240213-delay-verw-v8-6-a6216d83edb7%40linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Tue Mar 12 14:11:08 2024 -0700

    KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
    
    commit 706a189dcf74d3b3f955e9384785e726ed6c7c80 upstream.
    
    Use EFLAGS.CF instead of EFLAGS.ZF to track whether to use VMRESUME versus
    VMLAUNCH.  Freeing up EFLAGS.ZF will allow doing VERW, which clobbers ZF,
    for MDS mitigations as late as possible without needing to duplicate VERW
    for both paths.
    
      [ pawan: resolved merge conflict in __vmx_vcpu_run in backport. ]
    
    Signed-off-by: Sean Christopherson <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Reviewed-by: Nikolay Borisov <[email protected]>
    Link: https://lore.kernel.org/all/20240213-delay-verw-v8-5-a6216d83edb7%40linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Tue Mar 12 14:11:36 2024 -0700

    KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
    
    commit 2a0180129d726a4b953232175857d442651b55a0 upstream.
    
    Mitigation for RFDS requires RFDS_CLEAR capability which is enumerated
    by MSR_IA32_ARCH_CAPABILITIES bit 27. If the host has it set, export it
    to guests so that they can deploy the mitigation.
    
    RFDS_NO indicates that the system is not vulnerable to RFDS; export it
    to guests so that they don't deploy the mitigation unnecessarily. When
    the host is not affected by X86_BUG_RFDS, but has RFDS_NO=0, synthesize
    RFDS_NO to the guest.
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Reviewed-by: Thomas Gleixner <[email protected]>
    Acked-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: Always flush async #PF workqueue when vCPU is being destroyed [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Tue Jan 9 17:15:30 2024 -0800

    KVM: Always flush async #PF workqueue when vCPU is being destroyed
    
    [ Upstream commit 3d75b8aa5c29058a512db29da7cbee8052724157 ]
    
    Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
    completion queue, e.g. when a VM and all its vCPUs are being destroyed.
    KVM must ensure that none of its workqueue callbacks is running when the
    last reference to the KVM _module_ is put.  Gifting a reference to the
    associated VM prevents the workqueue callback from dereferencing freed
    vCPU/VM memory, but does not prevent the KVM module from being unloaded
    before the callback completes.
    
    Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
    async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
    result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
    finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:
    
     WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
     Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
     CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
     Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
     Workqueue: events async_pf_execute [kvm]
     RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
     Call Trace:
      <TASK>
      async_pf_execute+0x198/0x260 [kvm]
      process_one_work+0x145/0x2d0
      worker_thread+0x27e/0x3a0
      kthread+0xba/0xe0
      ret_from_fork+0x2d/0x50
      ret_from_fork_asm+0x11/0x20
      </TASK>
     ---[ end trace 0000000000000000 ]---
     INFO: task kworker/8:1:251 blocked for more than 120 seconds.
           Tainted: G        W          6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
     Workqueue: events async_pf_execute [kvm]
     Call Trace:
      <TASK>
      __schedule+0x33f/0xa40
      schedule+0x53/0xc0
      schedule_timeout+0x12a/0x140
      __wait_for_common+0x8d/0x1d0
      __flush_work.isra.0+0x19f/0x2c0
      kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
      kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
      kvm_put_kvm+0x1c1/0x320 [kvm]
      async_pf_execute+0x198/0x260 [kvm]
      process_one_work+0x145/0x2d0
      worker_thread+0x27e/0x3a0
      kthread+0xba/0xe0
      ret_from_fork+0x2d/0x50
      ret_from_fork_asm+0x11/0x20
      </TASK>
    
    If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
    then there's no need to gift async_pf_execute() a reference because all
    invocations of async_pf_execute() will be forced to complete before the
    vCPU and its VM are destroyed/freed.  And that in turn fixes the module
    unloading bug as __fput() won't do module_put() on the last vCPU reference
    until the vCPU has been freed, e.g. if closing the vCPU file also puts the
    last reference to the KVM module.
    
    Note that kvm_check_async_pf_completion() may also take the work item off
    the completion queue and so also needs to flush the work queue, as the
    work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
    on the workqueue could theoretically delay a vCPU due to waiting for the
    work to complete, but that's a very, very small chance, and likely a very
    small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
    new request, i.e. will effectively delay entering the guest, so the
    remaining work is really just:
    
            trace_kvm_async_pf_completed(addr, cr2_or_gpa);
    
            __kvm_vcpu_wake_up(vcpu);
    
            mmput(mm);
    
    and mmput() can't drop the last reference to the page tables if the vCPU is
    still alive, i.e. the vCPU won't get stuck tearing down page tables.
    
    Add a helper to do the flushing, specifically to deal with "wakeup all"
    work items, as they aren't actually work items, i.e. are never placed in a
    workqueue.  Trying to flush a bogus workqueue entry rightly makes
    __flush_work() complain (kudos to whoever added that sanity check).
    
    Note, commit 5f6de5cbebee ("KVM: Prevent module exit until all VMs are
    freed") *tried* to fix the module refcounting issue by having VMs grab a
    reference to the module, but that only made the bug slightly harder to hit
    as it gave async_pf_execute() a bit more time to complete before the KVM
    module could be unloaded.
    
    Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
    Cc: [email protected]
    Cc: David Matlack <[email protected]>
    Reviewed-by: Xu Yilun <[email protected]>
    Reviewed-by: Vitaly Kuznetsov <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sean Christopherson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

KVM: arm64: Limit stage2_apply_range() batch size to largest block [+ + +]

Author: Oliver Upton <[email protected]>
Date:   Fri Oct 7 23:41:51 2022 +0000

    KVM: arm64: Limit stage2_apply_range() batch size to largest block
    
    commit 5994bc9e05c2f8811f233aa434e391cd2783f0f5 upstream.
    
    Presently stage2_apply_range() works on a batch of memory addressed by a
    stage 2 root table entry for the VM. Depending on the IPA limit of the
    VM and PAGE_SIZE of the host, this could address a massive range of
    memory. Some examples:
    
      4 level, 4K paging -> 512 GB batch size
    
      3 level, 64K paging -> 4 TB batch size
    
    Unsurprisingly, working on such a large range of memory can lead to soft
    lockups. When running dirty_log_perf_test:
    
      ./dirty_log_perf_test -m -2 -s anonymous_thp -b 4G -v 48
    
      watchdog: BUG: soft lockup - CPU#0 stuck for 45s! [dirty_log_perf_:16703]
      Modules linked in: vfat fat cdc_ether usbnet mii xhci_pci xhci_hcd sha3_generic gq(O)
      CPU: 0 PID: 16703 Comm: dirty_log_perf_ Tainted: G           O       6.0.0-smp-DEV #1
      pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : dcache_clean_inval_poc+0x24/0x38
      lr : clean_dcache_guest_page+0x28/0x4c
      sp : ffff800021763990
      pmr_save: 000000e0
      x29: ffff800021763990 x28: 0000000000000005 x27: 0000000000000de0
      x26: 0000000000000001 x25: 00400830b13bc77f x24: ffffad4f91ead9c0
      x23: 0000000000000000 x22: ffff8000082ad9c8 x21: 0000fffafa7bc000
      x20: ffffad4f9066ce50 x19: 0000000000000003 x18: ffffad4f92402000
      x17: 000000000000011b x16: 000000000000011b x15: 0000000000000124
      x14: ffff07ff8301d280 x13: 0000000000000000 x12: 00000000ffffffff
      x11: 0000000000010001 x10: fffffc0000000000 x9 : ffffad4f9069e580
      x8 : 000000000000000c x7 : 0000000000000000 x6 : 000000000000003f
      x5 : ffff07ffa2076980 x4 : 0000000000000001 x3 : 000000000000003f
      x2 : 0000000000000040 x1 : ffff0830313bd000 x0 : ffff0830313bcc40
      Call trace:
       dcache_clean_inval_poc+0x24/0x38
       stage2_unmap_walker+0x138/0x1ec
       __kvm_pgtable_walk+0x130/0x1d4
       __kvm_pgtable_walk+0x170/0x1d4
       __kvm_pgtable_walk+0x170/0x1d4
       __kvm_pgtable_walk+0x170/0x1d4
       kvm_pgtable_stage2_unmap+0xc4/0xf8
       kvm_arch_flush_shadow_memslot+0xa4/0x10c
       kvm_set_memslot+0xb8/0x454
       __kvm_set_memory_region+0x194/0x244
       kvm_vm_ioctl_set_memory_region+0x58/0x7c
       kvm_vm_ioctl+0x49c/0x560
       __arm64_sys_ioctl+0x9c/0xd4
       invoke_syscall+0x4c/0x124
       el0_svc_common+0xc8/0x194
       do_el0_svc+0x38/0xc0
       el0_svc+0x2c/0xa4
       el0t_64_sync_handler+0x84/0xf0
       el0t_64_sync+0x1a0/0x1a4
    
    Use the largest supported block mapping for the configured page size as
    the batch granularity. In so doing the walker is guaranteed to visit a
    leaf only once.
    
    Signed-off-by: Oliver Upton <[email protected]>
    Signed-off-by: Marc Zyngier <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Krister Johansen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: arm64: Work out supported block level at compile time [+ + +]

Author: Oliver Upton <[email protected]>
Date:   Fri Oct 7 23:41:50 2022 +0000

    KVM: arm64: Work out supported block level at compile time
    
    commit 3b5c082bbfa20d9a57924edd655bbe63fe98ab06 upstream.
    
    Work out the minimum page table level where KVM supports block mappings
    at compile time. While at it, rewrite the comment around supported block
    mappings to directly describe what KVM supports instead of phrasing in
    terms of what it does not.
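
    A minimal sketch of the compile-time idea, not the upstream code: the
    page size is fixed at build time, so the lowest level that supports
    block mappings can be a preprocessor constant. The names
    SKETCH_PAGE_SHIFT, SKETCH_MIN_BLOCK_LEVEL and SKETCH_MAX_LEVELS, and
    the level values chosen (1 for 4K pages, 2 otherwise), are assumptions
    made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build-time page size selector: 12 = 4K, 14 = 16K, 16 = 64K. */
#ifndef SKETCH_PAGE_SHIFT
#define SKETCH_PAGE_SHIFT 12
#endif

/* Assumed values: blocks start at level 1 for 4K pages, level 2 otherwise. */
#if SKETCH_PAGE_SHIFT == 12
#define SKETCH_MIN_BLOCK_LEVEL 1u
#else
#define SKETCH_MIN_BLOCK_LEVEL 2u
#endif

/* Assumed four-level table; the last level maps pages, not blocks. */
#define SKETCH_MAX_LEVELS 4u

static bool sketch_block_mapping_supported(unsigned int level)
{
	assert(level < SKETCH_MAX_LEVELS);
	return level >= SKETCH_MIN_BLOCK_LEVEL &&
	       level < SKETCH_MAX_LEVELS - 1;
}
```

    Because the threshold is a constant, the compiler can fold the
    comparison and no page size has to be inspected at run time.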
    
    Signed-off-by: Oliver Upton <[email protected]>
    Signed-off-by: Marc Zyngier <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Krister Johansen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Fri Feb 16 17:34:30 2024 -0800

    KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
    
    commit 5ef1d8c1ddbf696e47b226e11888eaf8d9e8e807 upstream.
    
    Do the cache flush of converted pages in svm_register_enc_region() before
    dropping kvm->lock to fix use-after-free issues where region and/or its
    array of pages could be freed by a different task, e.g. if userspace has
    __unregister_enc_region_locked() already queued up for the region.
    
    Note, the "obvious" alternative of using local variables doesn't fully
    resolve the bug, as region->pages is also dynamically allocated.  I.e. the
    region structure itself would be fine, but region->pages could be freed.
    
    Flushing multiple pages under kvm->lock is unfortunate, but the entire
    flow is a rare slow path, and the manual flush is only needed on CPUs that
    lack coherency for encrypted memory.
    
    Fixes: 19a23da53932 ("Fix unsynchronized access to sev members through svm_register_enc_region")
    Reported-by: Gabe Kirkpatrick <[email protected]>
    Cc: Josh Eads <[email protected]>
    Cc: Peter Gonda <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sean Christopherson <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: x86: Add BHI_NO [+ + +]

Author: Daniel Sneddon <[email protected]>
Date:   Wed Mar 13 09:49:17 2024 -0700

    KVM: x86: Add BHI_NO
    
    commit ed2e8d49b54d677f3123668a21a57822d679651f upstream.
    
    Intel processors that aren't vulnerable to BHI will set
    MSR_IA32_ARCH_CAPABILITIES[BHI_NO] = 1. Guests may use this BHI_NO bit to
    determine if they need to implement BHI mitigations or not.  Allow this bit
    to be passed to the guests.
    
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Alexandre Chartre <[email protected]>
    Reviewed-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: x86: Advertise CPUID.(EAX=7,ECX=2):EDX[5:0] to userspace [+ + +]

Author: Jim Mattson <[email protected]>
Date:   Mon Oct 23 17:16:35 2023 -0700

    KVM: x86: Advertise CPUID.(EAX=7,ECX=2):EDX[5:0] to userspace
    
    commit eefe5e6682099445f77f2d97d4c525f9ac9d9b07 upstream.
    
    The low five bits {INTEL_PSFD, IPRED_CTRL, RRSBA_CTRL, DDPD_U, BHI_CTRL}
    advertise the availability of specific bits in IA32_SPEC_CTRL. Since KVM
    dynamically determines the legal IA32_SPEC_CTRL bits for the underlying
    hardware, the hard work has already been done. Just let userspace know
    that a guest can use these IA32_SPEC_CTRL bits.
    
    The sixth bit (MCDT_NO) states that the processor does not exhibit MXCSR
    Configuration Dependent Timing (MCDT) behavior. This is an inherent
    property of the physical processor that is inherited by the virtual
    CPU. Pass that information on to userspace.
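
    As a rough illustration of what "advertise EDX[5:0]" amounts to, the
    sketch below masks a host EDX value down to the six bits named above.
    The SKETCH_* names and the helper are hypothetical; the bit positions
    simply follow the order in which the bits are listed in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions follow the order given in the commit message. */
enum {
	SKETCH_INTEL_PSFD = 1u << 0,
	SKETCH_IPRED_CTRL = 1u << 1,
	SKETCH_RRSBA_CTRL = 1u << 2,
	SKETCH_DDPD_U     = 1u << 3,
	SKETCH_BHI_CTRL   = 1u << 4,
	SKETCH_MCDT_NO    = 1u << 5,
};

#define SKETCH_7_2_EDX_MASK                                        \
	(SKETCH_INTEL_PSFD | SKETCH_IPRED_CTRL | SKETCH_RRSBA_CTRL | \
	 SKETCH_DDPD_U | SKETCH_BHI_CTRL | SKETCH_MCDT_NO)

static_assert(SKETCH_7_2_EDX_MASK == 0x3fu, "mask must cover EDX[5:0]");

/* Hypothetical helper: keep only the bits reported to userspace. */
static uint32_t sketch_guest_cpuid_7_2_edx(uint32_t host_edx)
{
	return host_edx & SKETCH_7_2_EDX_MASK;
}
```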
    
    Signed-off-by: Jim Mattson <[email protected]>
    Reviewed-by: Chao Gao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sean Christopherson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: x86: Bail to userspace if emulation of atomic user access faults [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Wed Feb 2 00:49:45 2022 +0000

    KVM: x86: Bail to userspace if emulation of atomic user access faults
    
    commit 5d6c7de6446e9ab3fb41d6f7d82770e50998f3de upstream.
    
    Exit to userspace when emulating an atomic guest access if the CMPXCHG on
    the userspace address faults.  Emulating the access as a write and thus
    likely treating it as emulated MMIO is wrong, as KVM has already
    confirmed there is a valid, writable memslot.
    
    Signed-off-by: Sean Christopherson <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: x86: Mark target gfn of emulated atomic instruction as dirty [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Wed Feb 14 17:00:03 2024 -0800

    KVM: x86: Mark target gfn of emulated atomic instruction as dirty
    
    commit 910c57dfa4d113aae6571c2a8b9ae8c430975902 upstream.
    
    When emulating an atomic access on behalf of the guest, mark the target
    gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault.  This
    fixes a bug where KVM effectively corrupts guest memory during live
    migration by writing to guest memory without informing userspace that the
    page is dirty.
    
    Marking the page dirty got unintentionally dropped when KVM's emulated
    CMPXCHG was converted to do a user access.  Before that, KVM explicitly
    mapped the guest page into kernel memory, and marked the page dirty during
    the unmap phase.
    
    Mark the page dirty even if the CMPXCHG fails, as the old data is written
    back on failure, i.e. the page is still written.  The value written is
    guaranteed to be the same because the operation is atomic, but KVM's ABI
    is that all writes are dirty logged regardless of the value written.  And
    more importantly, that's what KVM did before the buggy commit.
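
    The rule described above (dirty-log the page whenever the CMPXCHG was
    attempted and did not fault, whether or not the exchange succeeded)
    can be sketched in user space with C11 atomics. This is an
    illustration only: struct sketch_guest, the one-bit-per-page bitmap and
    sketch_emulate_cmpxchg() are hypothetical and do not mirror KVM's
    data structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WORDS_PER_PAGE 512u	/* 4K page of 64-bit words */
#define SKETCH_NR_PAGES       4u

struct sketch_guest {
	_Atomic uint64_t mem[SKETCH_NR_PAGES * SKETCH_WORDS_PER_PAGE];
	uint64_t dirty_bitmap;		/* one bit per guest page */
};

/* Returns true if the exchange happened; logs the page dirty either way. */
static bool sketch_emulate_cmpxchg(struct sketch_guest *g, size_t word,
				   uint64_t expected, uint64_t new_val)
{
	bool ok;

	assert(word < SKETCH_NR_PAGES * SKETCH_WORDS_PER_PAGE);
	ok = atomic_compare_exchange_strong(&g->mem[word], &expected, new_val);

	/* The access was attempted, so the page is reported as written. */
	g->dirty_bitmap |= 1ull << (word / SKETCH_WORDS_PER_PAGE);
	return ok;
}
```

    Setting the dirty bit unconditionally keeps the sketch in line with
    the stated ABI that all writes are dirty logged regardless of the
    value written.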
    
    Huge kudos to the folks on the Cc list (and many others), who did all the
    actual work of triaging and debugging.
    
    Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
    Cc: [email protected]
    Cc: David Matlack <[email protected]>
    Cc: Pasha Tatashin <[email protected]>
    Cc: Michael Krebs <[email protected]>
    base-commit: 6769ea8da8a93ed4630f1ce64df6aafcaabfce64
    Reviewed-by: Jim Mattson <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sean Christopherson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: x86: Update KVM-only leaf handling to allow for 100% KVM-only leafs [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Fri Nov 25 20:58:39 2022 +0800

    KVM: x86: Update KVM-only leaf handling to allow for 100% KVM-only leafs
    
    commit 047c7229906152fb85c23dc18fd25a00cd7cb4de upstream.
    
    Rename kvm_cpu_cap_init_scattered() to kvm_cpu_cap_init_kvm_defined() in
    anticipation of adding KVM-only CPUID leafs that aren't recognized by the
    kernel and thus not scattered, i.e. for leafs that are 100% KVM-defined.
    
    Adjust/add comments to kvm_only_cpuid_leafs and KVM_X86_FEATURE to
    document how to create new kvm_only_cpuid_leafs entries for scattered
    features as well as features that are entirely unknown to the kernel.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: x86: Use a switch statement and macros in __feature_translate() [+ + +]

Author: Jim Mattson <[email protected]>
Date:   Mon Oct 23 17:16:36 2023 -0700

    KVM: x86: Use a switch statement and macros in __feature_translate()
    
    commit 80c883db87d9ffe2d685e91ba07a087b1c246c78 upstream.
    
    Use a switch statement with macro-generated case statements to handle
    translating feature flags in order to reduce the probability of runtime
    errors due to copy+paste goofs, to make compile-time errors easier to
    debug, and to make the code more readable.
    
    E.g. the compiler won't directly generate an error for duplicate if
    statements
    
            if (x86_feature == X86_FEATURE_SGX1)
                    return KVM_X86_FEATURE_SGX1;
            else if (x86_feature == X86_FEATURE_SGX2)
                    return KVM_X86_FEATURE_SGX1;
    
    and so instead reverse_cpuid_check() will fail due to the untranslated
    entry pointing at a Linux-defined leaf, which provides practically no
    hint as to what is broken
    
      arch/x86/kvm/reverse_cpuid.h:108:2: error: call to __compiletime_assert_450 declared with 'error' attribute:
                                          BUILD_BUG_ON failed: x86_leaf == CPUID_LNX_4
              BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
              ^
    whereas duplicate case statements very explicitly point at the offending
    code:
    
      arch/x86/kvm/reverse_cpuid.h:125:2: error: duplicate case value '361'
              KVM_X86_TRANSLATE_FEATURE(SGX2);
              ^
      arch/x86/kvm/reverse_cpuid.h:124:2: error: duplicate case value '360'
              KVM_X86_TRANSLATE_FEATURE(SGX1);
              ^
    
    And without macros, the opposite type of copy+paste goof doesn't generate
    any error at compile-time, e.g. this yields no complaints:
    
            case X86_FEATURE_SGX1:
                    return KVM_X86_FEATURE_SGX1;
            case X86_FEATURE_SGX2:
                    return KVM_X86_FEATURE_SGX1;
    
    Note, __feature_translate() is forcibly inlined and the feature is known
    at compile-time, so the code generation between an if-elif sequence and a
    switch statement should be identical.
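
    A standalone sketch of the macro-generated case pattern follows. The
    SKETCH_* enums stand in for the kernel's feature constants and their
    numeric values are made up; only the shape of the macro is the point.
    Pasting the same macro line twice produces a duplicate case value,
    which the compiler rejects at the offending line.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel-defined and KVM-defined features. */
enum { SKETCH_FEATURE_SGX1 = 360, SKETCH_FEATURE_SGX2 = 361 };
enum { SKETCH_KVM_FEATURE_SGX1 = 1000, SKETCH_KVM_FEATURE_SGX2 = 1001 };

static_assert(SKETCH_FEATURE_SGX1 != SKETCH_FEATURE_SGX2,
	      "source features must be distinct");

/* One macro use emits a matched case label and return value. */
#define SKETCH_TRANSLATE_FEATURE(f) \
	case SKETCH_FEATURE_##f: return SKETCH_KVM_FEATURE_##f

static inline unsigned int sketch_feature_translate(unsigned int feature)
{
	switch (feature) {
	SKETCH_TRANSLATE_FEATURE(SGX1);
	SKETCH_TRANSLATE_FEATURE(SGX2);
	default:
		return feature;	/* untranslated features pass through */
	}
}
```

    Since the macro derives both names from a single token, the
    mismatched-pair goof shown above cannot be written by accident.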
    
    Signed-off-by: Jim Mattson <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [sean: use a macro, rewrite changelog]
    Signed-off-by: Sean Christopherson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

landlock: Warn once if a Landlock action is requested while disabled [+ + +]

Author: Mickaël Salaün <[email protected]>
Date:   Tue Feb 27 12:05:50 2024 +0100

    landlock: Warn once if a Landlock action is requested while disabled
    
    [ Upstream commit 782191c74875cc33b50263e21d76080b1411884d ]
    
    Because sandboxing can be used as an opportunistic security measure,
    user space may not log unsupported features.  Let the system
    administrator know if an application tries to use Landlock but fails
    because it isn't enabled at boot time.  This may be caused by a boot
    loader configuration with an outdated "lsm" kernel command-line
    parameter.
    
    Cc: [email protected]
    Fixes: 265885daf3e5 ("landlock: Add syscall implementations")
    Reviewed-by: Kees Cook <[email protected]>
    Reviewed-by: Günther Noack <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mickaël Salaün <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

Linux: Linux 5.15.154 [+ + +]

Author: Greg Kroah-Hartman <[email protected]>
Date:   Wed Apr 10 16:19:44 2024 +0200

    Linux 5.15.154
    
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: SeongJae Park <[email protected]>
    Tested-by: Kelsey Steele <[email protected]>
    Tested-by: Ron Economos <[email protected]>
    Tested-by: Linux Kernel Functional Testing <[email protected]>
    Tested-by: Shuah Khan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: Florian Fainelli <[email protected]>
    Tested-by: kernelci.org bot <[email protected]>
    Tested-by: Harshit Mogalapalli <[email protected]>
    Tested-by: Linux Kernel Functional Testing <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

lockd: drop inappropriate svc_get() from locked_get() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Sat Jun 3 07:14:14 2023 +1000

    lockd: drop inappropriate svc_get() from locked_get()
    
    [ Upstream commit 665e89ab7c5af1f2d260834c861a74b01a30f95f ]
    
    The below-mentioned patch was intended to simplify refcounting on the
    svc_serv used by lockd.  The goal was to only ever have a single
    reference from the single thread.  To that end we dropped a call to
    lockd_start_svc() (except when creating the thread) which would take a
    reference, and dropped the svc_put(serv) that would drop that reference.
    
    Unfortunately we didn't also remove the svc_get() from
    lockd_create_svc() in the case where the svc_serv already existed.
    So after the patch:
     - on the first call the svc_serv was allocated and the one reference
       was given to the thread, so there are no extra references
     - on subsequent calls svc_get() was called so there is now an extra
       reference.
    This is clearly not consistent.
    
    The inconsistency is also clear in the current code: lockd_get()
    takes *two* references, one on nlmsvc_serv and one by incrementing
    nlmsvc_users.  This clearly does not match lockd_put().
    
    So: drop that svc_get() from lockd_get() (which used to be in
    lockd_create_svc()).
    
    Reported-by: Ido Schimmel <[email protected]>
    Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u
    Fixes: b73a2972041b ("lockd: move lockd_start_svc() call into lockd_create_svc()")
    Signed-off-by: NeilBrown <[email protected]>
    Tested-by: Ido Schimmel <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: ensure we use the correct file descriptor when unlocking [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Nov 11 14:36:37 2022 -0500

    lockd: ensure we use the correct file descriptor when unlocking
    
    [ Upstream commit 69efce009f7df888e1fede3cb2913690eb829f52 ]
    
    Shared locks are set on O_RDONLY descriptors and exclusive locks are set
    on O_WRONLY ones. nlmsvc_unlock however calls vfs_lock_file twice, once
    for each descriptor, but it doesn't reset fl_file. Ensure that it does.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: fix file selection in nlmsvc_cancel_blocked [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Nov 11 14:36:38 2022 -0500

    lockd: fix file selection in nlmsvc_cancel_blocked
    
    [ Upstream commit 9f27783b4dd235ef3c8dbf69fc6322777450323c ]
    
    We currently do a lock_to_openmode call based on the arguments from the
    NLM_UNLOCK call, but that will always set the fl_type of the lock to
    F_UNLCK, and the O_RDONLY descriptor is always chosen.
    
    Fix it to use the file_lock from the block instead.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: introduce lockd_put() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    lockd: introduce lockd_put()
    
    [ Upstream commit 865b674069e05e5779fcf8cf7a166d2acb7e930b ]
    
    There is some cleanup that is duplicated in lockd_down() and the failure
    path of lockd_up().
    Factor these out into a new lockd_put() and call it from both places.
    
    lockd_put() does *not* take the mutex - that must be held by the caller.
    It decrements nlmsvc_users and if that reaches zero, it cleans up.
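
    The get/put pairing described here can be modelled in user space as
    below. It is a sketch under stated assumptions: a pthread mutex stands
    in for the lockd mutex, a boolean stands in for the running service,
    and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int sketch_users;	/* plays the role of nlmsvc_users */
static bool sketch_service_running;

/* Caller must hold sketch_mutex. The first user starts the service. */
static void sketch_get_locked(void)
{
	if (sketch_users++ == 0)
		sketch_service_running = true;
}

/* Caller must hold sketch_mutex. The last user tears the service down. */
static void sketch_put_locked(void)
{
	assert(sketch_users > 0);
	if (--sketch_users)
		return;
	sketch_service_running = false;
}

static void sketch_up(void)
{
	pthread_mutex_lock(&sketch_mutex);
	sketch_get_locked();
	pthread_mutex_unlock(&sketch_mutex);
}

static void sketch_down(void)
{
	pthread_mutex_lock(&sketch_mutex);
	sketch_put_locked();
	pthread_mutex_unlock(&sketch_mutex);
}
```

    Keeping the locking in the callers lets the same put helper serve
    both the normal teardown and an error path that already holds the
    mutex.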
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: introduce nlmsvc_serv [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    lockd: introduce nlmsvc_serv
    
    [ Upstream commit 2840fe864c91a0fe822169b1fbfddbcac9aeac43 ]
    
    lockd has two globals - nlmsvc_task and nlmsvc_rqst - but mostly it
    wants the 'struct svc_serv', and when it doesn't want it exactly it can
    get to what it wants from the serv.
    
    This patch is a first step to removing nlmsvc_task and nlmsvc_rqst.  It
    introduces nlmsvc_serv to store the 'struct svc_serv*'.  This is set as
    soon as the serv is created, and cleared only when it is destroyed.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: introduce safe async lock op [+ + +]

Author: Alexander Aring <[email protected]>
Date:   Tue Sep 12 17:53:18 2023 -0400

    lockd: introduce safe async lock op
    
    [ Upstream commit 2dd10de8e6bcbacf85ad758b904543c294820c63 ]
    
    This patch mostly reverts commit 40595cdc93ed ("nfs: block notification
    on fs with its own ->lock") and introduces an EXPORT_OP_ASYNC_LOCK
    export flag to signal that the "own ->lock" implementation supports
    async lock requests. The only main user is DLM, which is used by the
    GFS2 and OCFS2 filesystems. Those provide their own lock()
    implementation and return FILE_LOCK_DEFERRED. Since commit 40595cdc93ed
    ("nfs: block notification on fs with its own ->lock"), the DLM
    implementation was never updated. This patch is meant to prepare for DLM
    to set the EXPORT_OP_ASYNC_LOCK export flag and to update the DLM
    plock implementation accordingly.
    
    Acked-by: Jeff Layton <[email protected]>
    Signed-off-by: Alexander Aring <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: move from strlcpy with unused retval to strscpy [+ + +]

Author: Wolfram Sang <[email protected]>
Date:   Thu Aug 18 23:01:16 2022 +0200

    lockd: move from strlcpy with unused retval to strscpy
    
    [ Upstream commit 97f8e62572555f8ad578d7b1739ba64d5d2cac0f ]
    
    Follow the advice of the below link and prefer 'strscpy' in this
    subsystem. Conversion is 1:1 because the return value is not used.
    Generated by a coccinelle script.
    
    Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
    Signed-off-by: Wolfram Sang <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: move lockd_start_svc() call into lockd_create_svc() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    lockd: move lockd_start_svc() call into lockd_create_svc()
    
    [ Upstream commit b73a2972041bee70eb0cbbb25fa77828c63c916b ]
    
    lockd_start_svc() only needs to be called once, just after the svc is
    created.  If the start fails, the svc is discarded too.
    
    It thus makes sense to call lockd_start_svc() from lockd_create_svc().
    This allows us to remove the test against nlmsvc_rqst at the start of
    lockd_start_svc() - it must always be NULL.
    
    lockd_up() only held an extra reference on the svc until a thread was
    created - then it dropped it.  The thread - and thus the extra reference
    - will remain until kthread_stop() is called.
    Now that the thread is created in lockd_create_svc(), the extra
    reference can be dropped there.  So the 'serv' variable is no longer
    needed in lockd_up().
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: move svc_exit_thread() into the thread [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    lockd: move svc_exit_thread() into the thread
    
    [ Upstream commit 6a4e2527a63620a820c4ebf3596b57176da26fb3 ]
    
    The normal place to call svc_exit_thread() is from the thread itself
    just before it exits.
    Do this for lockd.
    
    This means that nlmsvc_rqst is not used outside of lockd_start_svc(),
    so it can be made local to that function, and renamed to 'rqst'.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: rename lockd_create_svc() to lockd_get() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    lockd: rename lockd_create_svc() to lockd_get()
    
    [ Upstream commit ecd3ad68d2c6d3ae178a63a2d9a02c392904fd36 ]
    
    lockd_create_svc() already does an svc_get() if the service already
    exists, so it is more like a "get" than a "create".
    
    So:
     - Move the increment of nlmsvc_users into the function as well
     - rename to lockd_get().
    
    It is now the inverse of lockd_put().
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: set missing fl_flags field when retrieving args [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Nov 11 14:36:36 2022 -0500

    lockd: set missing fl_flags field when retrieving args
    
    [ Upstream commit 75c7940d2a86d3f1b60a0a265478cb8fc887b970 ]
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: simplify management of network status notifiers [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    lockd: simplify management of network status notifiers
    
    [ Upstream commit 5a8a7ff57421b7de3ae72019938ffb5daaee36e7 ]
    
    Now that the network status notifiers use nlmsvc_serv rather than
    nlmsvc_rqst the management can be simplified.
    
    Notifier unregistration synchronises with any pending notifications so,
    provided we unregister before nlm_serv is freed, no further interlock
    is required.
    
    So we move the unregister call to just before the thread is killed
    (which destroys the service) and just before the service is destroyed in
    the failure-path of lockd_up().
    
    Then nlm_ntf_refcnt and nlm_ntf_wq can be removed.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: use locks_inode_context helper [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Nov 16 09:19:43 2022 -0500

    lockd: use locks_inode_context helper
    
    [ Upstream commit 98b41ffe0afdfeaa1439a5d6bd2db4a94277e31b ]
    
    lockd currently doesn't access i_flctx safely. This requires a
    smp_load_acquire, as the pointer is set via cmpxchg (a release
    operation).
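
    The release/acquire pairing mentioned above can be sketched with C11
    atomics. This is a user-space analogy rather than the kernel helper:
    the struct and function names are hypothetical, with
    atomic_load_explicit() standing in for smp_load_acquire() and a
    compare-exchange with release ordering standing in for the kernel's
    cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_lock_context {
	int nr_locks;
};

struct sketch_inode {
	_Atomic(struct sketch_lock_context *) flctx;
};

/* Reader: the acquire load orders later reads of *ctx after the load. */
static struct sketch_lock_context *
sketch_locks_inode_context(struct sketch_inode *inode)
{
	return atomic_load_explicit(&inode->flctx, memory_order_acquire);
}

/* Writer: publish ctx once; returns nonzero if this call installed it. */
static int sketch_install_context(struct sketch_inode *inode,
				  struct sketch_lock_context *ctx)
{
	struct sketch_lock_context *expected = NULL;

	assert(ctx != NULL);
	return atomic_compare_exchange_strong_explicit(&inode->flctx,
						       &expected, ctx,
						       memory_order_release,
						       memory_order_relaxed);
}
```

    A plain load of the pointer would not guarantee that the reader sees
    the fields initialised before the context was published.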
    
    Cc: Trond Myklebust <[email protected]>
    Cc: Anna Schumaker <[email protected]>
    Cc: Chuck Lever <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

lockd: use svc_set_num_threads() for thread start and stop [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    lockd: use svc_set_num_threads() for thread start and stop
    
    [ Upstream commit 6b044fbaab02292fedb17565dbb3f2528083b169 ]
    
    svc_set_num_threads() does everything that lockd_start_svc() does, except
    set sv_maxconn.  It also (when passed 0) finds the threads and
    stops them with kthread_stop().
    
    So move the setting for sv_maxconn, and use svc_set_num_threads().
    
    We now don't need nlmsvc_task.
    
    Now that we use svc_set_num_threads() it makes sense to set svo_module.
    This requests that the thread exits with module_put_and_exit().
    Also fix the documentation for svo_module to make this explicit.
    
    svc_prepare_thread is now only used where it is defined, so it can be
    made static.
    
    Signed-off-by: NeilBrown <[email protected]>
    [ cel: upstream, module_put_and_exit was replaced via a merge commit ]
    Signed-off-by: Chuck Lever <[email protected]>

locking/rwsem: Disable preemption while trying for rwsem lock [+ + +]

Author: Gokul krishna Krishnakumar <[email protected]>
Date:   Thu Sep 8 23:54:27 2022 +0530

    locking/rwsem: Disable preemption while trying for rwsem lock
    
    commit 48dfb5d2560d36fb16c7d430c229d1604ea7d185 upstream.
    
    Make the region inside the rwsem_write_trylock non preemptible.
    
    We observed an RT task hogging the CPU while trying to acquire an rwsem
    lock that had been acquired by a kworker task, but before the rwsem
    owner was set.
    
    Here is the scenario:
    1. A CFS task (affined to a particular CPU) takes the rwsem lock.
    
    2. The CFS task gets preempted by an RT task before setting the owner.
    
    3. The RT task (FIFO) tries to acquire the lock, but keeps spinning
    until RT throttling happens, because the lock is held by the CFS task.
    
    This patch attempts to fix the above issue by disabling preemption
    until the owner is set for the lock. While at it, also fix the issues
    at the places where rwsem_{set,clear}_owner() are called.
    
    This also adds a lockdep annotation of preemption disable in
    rwsem_{set,clear}_owner(), at Peter Z.'s suggestion.
    
    Signed-off-by: Gokul krishna Krishnakumar <[email protected]>
    Signed-off-by: Mukesh Ojha <[email protected]>
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Reviewed-by: Waiman Long <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Cc: Aaro Koskinen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mac802154: fix llsec key resources release in mac802154_llsec_key_del [+ + +]

Author: Fedor Pchelkin <[email protected]>
Date:   Wed Feb 28 19:38:39 2024 +0300

    mac802154: fix llsec key resources release in mac802154_llsec_key_del
    
    [ Upstream commit e8a1e58345cf40b7b272e08ac7b32328b2543e40 ]
    
    mac802154_llsec_key_del() can free resources of a key directly without
    following the RCU rule of waiting for the end of a grace period. This
    may lead to a use-after-free if llsec_lookup_key() is traversing the
    list of keys in parallel with a key deletion:
    
    refcount_t: addition on 0; use-after-free.
    WARNING: CPU: 4 PID: 16000 at lib/refcount.c:25 refcount_warn_saturate+0x162/0x2a0
    Modules linked in:
    CPU: 4 PID: 16000 Comm: wpan-ping Not tainted 6.7.0 #19
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
    RIP: 0010:refcount_warn_saturate+0x162/0x2a0
    Call Trace:
     <TASK>
     llsec_lookup_key.isra.0+0x890/0x9e0
     mac802154_llsec_encrypt+0x30c/0x9c0
     ieee802154_subif_start_xmit+0x24/0x1e0
     dev_hard_start_xmit+0x13e/0x690
     sch_direct_xmit+0x2ae/0xbc0
     __dev_queue_xmit+0x11dd/0x3c20
     dgram_sendmsg+0x90b/0xd60
     __sys_sendto+0x466/0x4c0
     __x64_sys_sendto+0xe0/0x1c0
     do_syscall_64+0x45/0xf0
     entry_SYSCALL_64_after_hwframe+0x6e/0x76
    
    Also, ieee802154_llsec_key_entry structures are not freed by
    mac802154_llsec_key_del():
    
    unreferenced object 0xffff8880613b6980 (size 64):
      comm "iwpan", pid 2176, jiffies 4294761134 (age 60.475s)
      hex dump (first 32 bytes):
        78 0d 8f 18 80 88 ff ff 22 01 00 00 00 00 ad de  x.......".......
        00 00 00 00 00 00 00 00 03 00 cd ab 00 00 00 00  ................
      backtrace:
        [<ffffffff81dcfa62>] __kmem_cache_alloc_node+0x1e2/0x2d0
        [<ffffffff81c43865>] kmalloc_trace+0x25/0xc0
        [<ffffffff88968b09>] mac802154_llsec_key_add+0xac9/0xcf0
        [<ffffffff8896e41a>] ieee802154_add_llsec_key+0x5a/0x80
        [<ffffffff8892adc6>] nl802154_add_llsec_key+0x426/0x5b0
        [<ffffffff86ff293e>] genl_family_rcv_msg_doit+0x1fe/0x2f0
        [<ffffffff86ff46d1>] genl_rcv_msg+0x531/0x7d0
        [<ffffffff86fee7a9>] netlink_rcv_skb+0x169/0x440
        [<ffffffff86ff1d88>] genl_rcv+0x28/0x40
        [<ffffffff86fec15c>] netlink_unicast+0x53c/0x820
        [<ffffffff86fecd8b>] netlink_sendmsg+0x93b/0xe60
        [<ffffffff86b91b35>] ____sys_sendmsg+0xac5/0xca0
        [<ffffffff86b9c3dd>] ___sys_sendmsg+0x11d/0x1c0
        [<ffffffff86b9c65a>] __sys_sendmsg+0xfa/0x1d0
        [<ffffffff88eadbf5>] do_syscall_64+0x45/0xf0
        [<ffffffff890000ea>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
    
    Handle the proper resource release in the RCU callback function
    mac802154_llsec_key_del_rcu().
    
    Note that if llsec_lookup_key() finds a key, it gets a refcount via
    llsec_key_get() and locally copies key id from key_entry (which is a
    list element). So it's safe to call llsec_key_put() and free the list
    entry after the RCU grace period elapses.
    
    Found by Linux Verification Center (linuxtesting.org).
    
    Fixes: 5d637d5aabd8 ("mac802154: add llsec structures and mutators")
    Cc: [email protected]
    Signed-off-by: Fedor Pchelkin <[email protected]>
    Acked-by: Alexander Aring <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Stefan Schmidt <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

media: staging: ipu3-imgu: Set fields before media_entity_pads_init() [+ + +]

Author: Hidenori Kobayashi <[email protected]>
Date:   Tue Jan 9 17:09:09 2024 +0900

    media: staging: ipu3-imgu: Set fields before media_entity_pads_init()
    
    [ Upstream commit 87318b7092670d4086bfec115a0280a60c51c2dd ]
    
    The imgu driver fails to probe with the following message because it
    does not set the pad's flags before calling media_entity_pads_init().
    
    [   14.596315] ipu3-imgu 0000:00:05.0: failed initialize subdev media entity (-22)
    [   14.596322] ipu3-imgu 0000:00:05.0: failed to register subdev0 ret (-22)
    [   14.596327] ipu3-imgu 0000:00:05.0: failed to register pipes (-22)
    [   14.596331] ipu3-imgu 0000:00:05.0: failed to create V4L2 devices (-22)
    
    Fix the initialization order so that the driver probe succeeds. The ops
    initialization is also moved together for readability.
    
    Fixes: a0ca1627b450 ("media: staging/intel-ipu3: Add v4l2 driver based on media framework")
    Cc: <[email protected]> # 6.7
    Cc: Dan Carpenter <[email protected]>
    Signed-off-by: Hidenori Kobayashi <[email protected]>
    Signed-off-by: Sakari Ailus <[email protected]>
    Signed-off-by: Hans Verkuil <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

media: xc4000: Fix atomicity violation in xc4000_get_frequency [+ + +]

Author: Gui-Dong Han <[email protected]>
Date:   Fri Dec 22 13:50:30 2023 +0800

    media: xc4000: Fix atomicity violation in xc4000_get_frequency
    
    [ Upstream commit 36d503ad547d1c75758a6fcdbec2806f1b6aeb41 ]
    
    In xc4000_get_frequency():
            *freq = priv->freq_hz + priv->freq_offset;
    The code accesses priv->freq_hz and priv->freq_offset without holding any
    lock.
    
    In xc4000_set_params():
            // Code that updates priv->freq_hz and priv->freq_offset
            ...
    
    xc4000_get_frequency() and xc4000_set_params() may execute concurrently,
    risking inconsistent reads of priv->freq_hz and priv->freq_offset. Because
    these related fields may be updated while they are being read, the
    frequency calculation can be incorrect, which is an atomicity violation.
    
    This possible bug is found by an experimental static analysis tool
    developed by our team, BassCheck[1]. This tool analyzes the locking APIs
    to extract function pairs that can be concurrently executed, and then
    analyzes the instructions in the paired functions to identify possible
    concurrency bugs including data races and atomicity violations. The above
    possible bug is reported when our tool analyzes the source code of
    Linux 6.2.
    
    To address this issue, it is proposed to add a mutex lock pair in
    xc4000_get_frequency() to ensure atomicity. With this patch applied, our
    tool no longer reports the possible bug, with the kernel configuration
    allyesconfig for x86_64. Due to the lack of associated hardware, we cannot
    test the patch in runtime testing, and just verify it according to the
    code logic.
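
A minimal userspace sketch of the locking idea described above, assuming a pthread mutex in place of the kernel mutex. The struct and function names (tuner_priv, tuner_set_params, tuner_get_frequency) are hypothetical stand-ins, not the driver's actual code.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's private state. */
struct tuner_priv {
	pthread_mutex_t lock;
	uint32_t freq_hz;
	uint32_t freq_offset;
};

/* Writer: updates both fields as one unit while holding the lock. */
static void tuner_set_params(struct tuner_priv *priv, uint32_t hz,
			     uint32_t offset)
{
	pthread_mutex_lock(&priv->lock);
	priv->freq_hz = hz;
	priv->freq_offset = offset;
	pthread_mutex_unlock(&priv->lock);
}

/* Reader: takes the same lock so it never sees a half-updated pair. */
static uint32_t tuner_get_frequency(struct tuner_priv *priv)
{
	uint32_t freq;

	pthread_mutex_lock(&priv->lock);
	freq = priv->freq_hz + priv->freq_offset;
	pthread_mutex_unlock(&priv->lock);
	return freq;
}
```

Without the lock in the reader, a concurrent writer could update freq_hz between the two loads, so the sum would mix an old value with a new one.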
    
    [1] https://sites.google.com/view/basscheck/
    
    Fixes: 4c07e32884ab ("[media] xc4000: Fix get_frequency()")
    Cc: [email protected]
    Reported-by: BassCheck <[email protected]>
    Signed-off-by: Gui-Dong Han <[email protected]>
    Signed-off-by: Hans Verkuil <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mei: me: add arrow lake point H DID [+ + +]

Author: Alexander Usyskin <[email protected]>
Date:   Sun Feb 11 12:39:12 2024 +0200

    mei: me: add arrow lake point H DID
    
    commit 8436f25802ec028ac7254990893f3e01926d9b79 upstream.
    
    Add Arrow Lake H device id.
    
    Cc: [email protected]
    Signed-off-by: Alexander Usyskin <[email protected]>
    Signed-off-by: Tomas Winkler <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mei: me: add arrow lake point S DID [+ + +]

Author: Alexander Usyskin <[email protected]>
Date:   Sun Feb 11 12:39:11 2024 +0200

    mei: me: add arrow lake point S DID
    
    commit 7a9b9012043e126f6d6f4683e67409312d1b707b upstream.
    
    Add Arrow Lake S device id.
    
    Cc: [email protected]
    Signed-off-by: Alexander Usyskin <[email protected]>
    Signed-off-by: Tomas Winkler <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

memtest: use {READ,WRITE}_ONCE in memory scanning [+ + +]

Author: Qiang Zhang <[email protected]>
Date:   Tue Mar 12 16:04:23 2024 +0800

    memtest: use {READ,WRITE}_ONCE in memory scanning
    
    [ Upstream commit 82634d7e24271698e50a3ec811e5f50de790a65f ]
    
    memtest failed to find bad memory when compiled with clang.  So use
    {WRITE,READ}_ONCE to access memory to avoid compiler over optimization.
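
As a rough illustration of why this helps, the sketch below uses volatile accesses as userspace stand-ins for the kernel macros. The macro names WRITE_ONCE_U64/READ_ONCE_U64 and the function scan_region are assumptions made for this example, not the memtest code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel macros: a volatile access forces the
 * compiler to emit the load or store instead of optimising it away. */
#define WRITE_ONCE_U64(p, v) (*(volatile uint64_t *)(p) = (v))
#define READ_ONCE_U64(p)     (*(volatile uint64_t *)(p))

/* Fill the region with a pattern, read it back, and count mismatches. */
static size_t scan_region(uint64_t *start, size_t count, uint64_t pattern)
{
	size_t i, bad = 0;

	for (i = 0; i < count; i++)
		WRITE_ONCE_U64(&start[i], pattern);

	for (i = 0; i < count; i++)
		if (READ_ONCE_U64(&start[i]) != pattern)
			bad++;

	return bad;
}
```

With plain accesses, a compiler may conclude that the read-back can never differ from the value just written and drop the comparison entirely, which would hide bad memory.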
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Qiang Zhang <[email protected]>
    Cc: Bill Wendling <[email protected]>
    Cc: Justin Stitt <[email protected]>
    Cc: Nathan Chancellor <[email protected]>
    Cc: Nick Desaulniers <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

minmax: add umin(a, b) and umax(a, b) [+ + +]

Author: David Laight <[email protected]>
Date:   Mon Sep 18 08:16:30 2023 +0000

    minmax: add umin(a, b) and umax(a, b)
    
    [ Upstream commit 80fcac55385ccb710d33a20dc1caaef29bd5a921 ]
    
    Patch series "minmax: Relax type checks in min() and max()", v4.
    
    The min() (etc) functions in minmax.h require that the arguments have
    exactly the same types.
    
    However when the type check fails, rather than look at the types and fix
    the type of a variable/constant, everyone seems to jump on min_t().  In
    reality min_t() ought to be rare - when something unusual is being done,
    not normality.
    
    The original min() (added in 2.4.9) replaced several inline functions and
    included the type - so matched the implicit casting of the function call.
    This was renamed min_t() in 2.4.10 and the current min() added.  There is
    no actual indication that the conversion of negative values to large
    unsigned values has ever been an actual problem.
    
    A quick grep shows 5734 min() and 4597 min_t().  Having the casts on
    almost half of the calls shows that something is clearly wrong.
    
    If the wrong type is picked (and it is far too easy to pick the type of
    the result instead of the larger input) then significant bits can get
    discarded.
    
    Pretty much the worst example is in the derived clamp_val(), consider:
            unsigned char x = 200u;
            y = clamp_val(x, 10u, 300u);
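
To make the problem concrete, here is a small sketch with simplified stand-ins for the kernel macros (no type checking, arguments evaluated more than once). It assumes an 8-bit unsigned char and that clamp_val() casts everything to the type of its first argument; clamp_uchar_example is a hypothetical name.

```c
/* Simplified stand-ins for the kernel macros. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define max_t(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))
#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)

static unsigned int clamp_uchar_example(void)
{
	unsigned char x = 200u;

	/* The bounds are cast to the type of x, so the upper bound 300
	 * becomes (unsigned char)300 == 44 and the result is 44, not 200. */
	return clamp_t(unsigned char, x, 10u, 300u);
}
```

The value 200 lies inside the intended range [10, 300], yet the truncated upper bound pulls the result down to 44.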
    
    I also suspect that many of the min_t(u16, ...) are actually wrong.  For
    example copy_data() in printk_ringbuffer.c contains:
    
            data_size = min_t(u16, buf_size, len);
    
    Here buf_size is 'unsigned int' and len 'u16', pass a 64k buffer (can you
    prove that doesn't happen?) and no data is returned.  Apparently it did -
    and has since been fixed.
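
The truncation can be reproduced in a few lines. This is only a sketch: min_t here is a simplified stand-in for the kernel macro and copy_size is a hypothetical helper, not the printk_ringbuffer code.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel macro. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Casting buf_size to a 16-bit type discards its upper bits. */
static unsigned int copy_size(unsigned int buf_size, uint16_t len)
{
	return min_t(uint16_t, buf_size, len);
}
```

With buf_size = 65536 the cast yields 0, so copy_size(65536, 100) returns 0 even though 100 bytes would fit; copy_size(512, 100) returns 100 as expected.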
    
    The only reason that most of the min_t() are 'fine' is that pretty much
    all the values in the kernel are between 0 and INT_MAX.
    
    Patch 1 adds umin(), this uses integer promotions to convert both
    arguments to 'unsigned long long'.  It can be used to compare a signed
    type that is known to contain a non-negative value with an unsigned type.
    The compiler typically optimises it all away.  Added first so that it can
    be referred to in patch 2.
    
    Patch 2 replaces the 'same type' check with a 'same signedness' one.  This
    makes min(unsigned_int_var, sizeof()) be ok.  The error message is also
    improved and will contain the expanded form of both arguments (useful for
    seeing how constants are defined).
    
    Patch 3 just fixes some whitespace.
    
    Patch 4 allows comparisons of 'unsigned char' and 'unsigned short' to
    signed types.  The integer promotion rules convert them both to 'signed
    int' prior to the comparison so they can never cause a negative value to
    be converted to a large positive one.
    
    Patch 5 (rewritten for v4) allows comparisons of unsigned values against
    non-negative constant integer expressions.  This makes
    min(unsigned_int_var, 4) be ok.
    
    The only common case that is still errored is the comparison of signed
    values against unsigned constant integer expressions below __INT_MAX__.
    Typically min(int_val, sizeof (foo)), the real fix for this is casting the
    constant: min(int_var, (int)sizeof (foo)).
    
    With all the patches applied pretty much all the min_t() could be replaced
    by min(), and most of the rest by umin().  However they all need careful
    inspection due to code like:
    
            sz = min_t(unsigned char, sz - 1, LIM - 1) + 1;
    
    which converts 0 to LIM.
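
A short sketch of why such code cannot be converted mechanically, assuming a signed sz and LIM = 16 purely for illustration. The macro min_t is a simplified stand-in and both function names are hypothetical.

```c
#define LIM 16

/* Simplified stand-in for the kernel macro. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Existing form: sz - 1 == -1 is cast to 255, so 0 is mapped to LIM. */
static int size_with_min_t(int sz)
{
	return min_t(unsigned char, sz - 1, LIM - 1) + 1;
}

/* Naive replacement with a plain signed comparison: 0 stays 0. */
static int size_with_plain_min(int sz)
{
	return ((sz - 1) < (LIM - 1) ? (sz - 1) : (LIM - 1)) + 1;
}
```

For sz = 0 the first function returns 16 and the second returns 0, so swapping min_t() for min() here would silently change behaviour; for sz = 5 both return 5.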
    
    This patch (of 6):
    
    umin() and umax() can be used when min()/max() errors a signed v unsigned
    compare when the signed value is known to be non-negative.
    
    Unlike min_t(some_unsigned_type, a, b) umin() will never mask off high
    bits if an inappropriate type is selected.
    
    The '+ 0u + 0ul + 0ull' may look strange.
    The '+ 0u' is needed for 'signed int' on 64bit systems.
    The '+ 0ul' is needed for 'signed long' on 32bit systems.
    The '+ 0ull' is needed for 'signed long long'.
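
The following sketch shows the promotion chain in isolation. It is a simplified approximation, not the kernel's definition: the macros evaluate their arguments more than once and skip the kernel's type checks, and TO_ULL and both example functions are names invented for this illustration.

```c
/* Each addition widens the operand one step, ending at unsigned long long. */
#define TO_ULL(x) ((x) + 0u + 0ul + 0ull)
#define umin(a, b) (TO_ULL(a) < TO_ULL(b) ? TO_ULL(a) : TO_ULL(b))

/* Simplified stand-in for the kernel macro, for contrast. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* len is signed but known to be non-negative. */
static unsigned long long umin_example(int len, unsigned long limit)
{
	return umin(len, limit);
}

/* Picking too narrow a type masks off the high bits of both operands. */
static unsigned int min_t_example(int len, unsigned long limit)
{
	return min_t(unsigned char, len, limit);
}
```

With len = 300 and limit = 1000, umin_example() returns 300, while min_t_example() compares 44 against 232 and returns 44.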
    
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: David Laight <[email protected]>
    Cc: Andy Shevchenko <[email protected]>
    Cc: Christoph Hellwig <[email protected]>
    Cc: Jason A. Donenfeld <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Matthew Wilcox (Oracle) <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Stable-dep-of: 51b30ecb73b4 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
    Signed-off-by: Sasha Levin <[email protected]>

mlxbf_gige: call request_irq() after NAPI initialized [+ + +]

Author: David Thompson <[email protected]>
Date:   Mon Mar 25 14:36:27 2024 -0400

    mlxbf_gige: call request_irq() after NAPI initialized
    
    [ Upstream commit f7442a634ac06b953fc1f7418f307b25acd4cfbc ]
    
    The mlxbf_gige driver encounters a NULL pointer exception in
    mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
    the exception is as follows:
    a) enable kdump
    b) trigger kdump via "echo c > /proc/sysrq-trigger"
    c) kdump kernel executes
    d) kdump kernel loads mlxbf_gige module
    e) the mlxbf_gige module runs its open() as the
       "oob_net0" interface is brought up
    f) mlxbf_gige module will experience an exception
       during its open(), something like:
    
         Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
         Mem abort info:
           ESR = 0x0000000086000004
           EC = 0x21: IABT (current EL), IL = 32 bits
           SET = 0, FnV = 0
           EA = 0, S1PTW = 0
           FSC = 0x04: level 0 translation fault
         user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
         [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
         Internal error: Oops: 0000000086000004 [#1] SMP
         CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield #37-Ubuntu
         Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
         pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
         pc : 0x0
         lr : __napi_poll+0x40/0x230
         sp : ffff800008003e00
         x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
         x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
         x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
         x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
         x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
         x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
         x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
         x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
         x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
         x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
         Call trace:
          0x0
          net_rx_action+0x178/0x360
          __do_softirq+0x15c/0x428
          __irq_exit_rcu+0xac/0xec
          irq_exit+0x18/0x2c
          handle_domain_irq+0x6c/0xa0
          gic_handle_irq+0xec/0x1b0
          call_on_irq_stack+0x20/0x2c
          do_interrupt_handler+0x5c/0x70
          el1_interrupt+0x30/0x50
          el1h_64_irq_handler+0x18/0x2c
          el1h_64_irq+0x7c/0x80
          __setup_irq+0x4c0/0x950
          request_threaded_irq+0xf4/0x1bc
          mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
          mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
          __dev_open+0x100/0x220
          __dev_change_flags+0x16c/0x1f0
          dev_change_flags+0x2c/0x70
          do_setlink+0x220/0xa40
          __rtnl_newlink+0x56c/0x8a0
          rtnl_newlink+0x58/0x84
          rtnetlink_rcv_msg+0x138/0x3c4
          netlink_rcv_skb+0x64/0x130
          rtnetlink_rcv+0x20/0x30
          netlink_unicast+0x2ec/0x360
          netlink_sendmsg+0x278/0x490
          __sock_sendmsg+0x5c/0x6c
          ____sys_sendmsg+0x290/0x2d4
          ___sys_sendmsg+0x84/0xd0
          __sys_sendmsg+0x70/0xd0
          __arm64_sys_sendmsg+0x2c/0x40
          invoke_syscall+0x78/0x100
          el0_svc_common.constprop.0+0x54/0x184
          do_el0_svc+0x30/0xac
          el0_svc+0x48/0x160
          el0t_64_sync_handler+0xa4/0x12c
          el0t_64_sync+0x1a4/0x1a8
         Code: bad PC value
         ---[ end trace 7d1c3f3bf9d81885 ]---
         Kernel panic - not syncing: Oops: Fatal exception in interrupt
         Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
         PHYS_OFFSET: 0x80000000
         CPU features: 0x0,000005c1,a3332a5a
         Memory Limit: none
         ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
    
    The exception happens because there is a pending RX interrupt before the
    call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
    immediately after this request_irq() completes. The RX IRQ handler runs
    "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
    and "napi_enable()", both of which happen later in the open() logic.
    
    The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
    to request_irq() execute.
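
The ordering problem can be modelled in userspace. Everything in this sketch is hypothetical (fake_dev, fake_request_irq and the two open variants): the poll pointer plays the role of the NAPI state, and a return value of -1 marks the point where the real kernel would jump through a NULL pointer.

```c
#include <stddef.h>

struct fake_dev {
	int (*poll)(struct fake_dev *dev);	/* set up by "NAPI init" */
	int irq_pending;			/* an RX interrupt is already latched */
	int packets_polled;
};

static int fake_poll(struct fake_dev *dev)
{
	dev->packets_polled++;
	return 0;
}

/* Models request_irq(): a pending interrupt fires as soon as the handler
 * is installed, and the handler immediately schedules polling. */
static int fake_request_irq(struct fake_dev *dev)
{
	if (!dev->irq_pending)
		return 0;
	dev->irq_pending = 0;
	if (!dev->poll)
		return -1;	/* would be a NULL call in the kernel */
	return dev->poll(dev);
}

/* Problematic order: interrupt requested before polling is set up. */
static int open_irq_first(struct fake_dev *dev)
{
	int ret = fake_request_irq(dev);

	dev->poll = fake_poll;
	return ret;
}

/* Fixed order: polling is fully set up before the interrupt is requested. */
static int open_napi_first(struct fake_dev *dev)
{
	dev->poll = fake_poll;
	return fake_request_irq(dev);
}
```

With an interrupt already pending, open_irq_first() hits the NULL callback while open_napi_first() services it normally.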
    
    Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
    Signed-off-by: David Thompson <[email protected]>
    Reviewed-by: Asmaa Mnebhi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mlxbf_gige: stop interface during shutdown [+ + +]

Author: David Thompson <[email protected]>
Date:   Mon Mar 25 17:09:29 2024 -0400

    mlxbf_gige: stop interface during shutdown
    
    commit 09ba28e1cd3cf715daab1fca6e1623e22fd754a6 upstream.
    
    The mlxbf_gige driver intermittently encounters a NULL pointer
    exception while the system is shutting down via "reboot" command.
    The mlxbf_gige driver will experience an exception right after executing
    its shutdown() method.  One example of this exception is:
    
    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
    Mem abort info:
      ESR = 0x0000000096000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
    [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G S         OE     5.15.0-bf.6.gef6992a #1
    Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.0.2.12669 Apr 21 2023
    pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
    lr : mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
    sp : ffff8000080d3c10
    x29: ffff8000080d3c10 x28: ffffcce72cbb7000 x27: ffff8000080d3d58
    x26: ffff0000814e7340 x25: ffff331cd1a05000 x24: ffffcce72c4ea008
    x23: ffff0000814e4b40 x22: ffff0000814e4d10 x21: ffff0000814e4128
    x20: 0000000000000000 x19: ffff0000814e4a80 x18: ffffffffffffffff
    x17: 000000000000001c x16: ffffcce72b4553f4 x15: ffff80008805b8a7
    x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
    x11: 7f7f7f7f7f7f7f7f x10: c2ac898b17576267 x9 : ffffcce720fa5404
    x8 : ffff000080812138 x7 : 0000000000002e9a x6 : 0000000000000080
    x5 : ffff00008de3b000 x4 : 0000000000000000 x3 : 0000000000000001
    x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
    Call trace:
     mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
     mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
     __napi_poll+0x40/0x1c8
     net_rx_action+0x314/0x3a0
     __do_softirq+0x128/0x334
     run_ksoftirqd+0x54/0x6c
     smpboot_thread_fn+0x14c/0x190
     kthread+0x10c/0x110
     ret_from_fork+0x10/0x20
    Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002)
    ---[ end trace 7cc3941aa0d8e6a4 ]---
    Kernel panic - not syncing: Oops: Fatal exception in interrupt
    Kernel Offset: 0x4ce722520000 from 0xffff800008000000
    PHYS_OFFSET: 0x80000000
    CPU features: 0x000005c1,a3330e5a
    Memory Limit: none
    ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
    
    During system shutdown, the mlxbf_gige driver's shutdown() is always executed.
    However, the driver's stop() method will only execute if networking interface
    configuration logic within the Linux distribution has been set up to do so.
    
    If shutdown() executes but stop() does not execute, NAPI remains enabled
    and this can lead to an exception if NAPI is scheduled while the hardware
    interface has only been partially deinitialized.
    
    The networking interface managed by the mlxbf_gige driver must be properly
    stopped during system shutdown so that IFF_UP is cleared, the hardware
    interface is put into a clean state, and NAPI is fully deinitialized.
    
    Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
    Signed-off-by: David Thompson <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mlxbf_gige: stop PHY during open() error paths [+ + +]

Author: David Thompson <[email protected]>
Date:   Wed Mar 20 15:31:17 2024 -0400

    mlxbf_gige: stop PHY during open() error paths
    
    [ Upstream commit d6c30c5a168f8586b8bcc0d8e42e2456eb05209b ]
    
    The mlxbf_gige_open() routine starts the PHY as part of normal
    initialization.  The mlxbf_gige_open() routine must stop the
    PHY during its error paths.
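
The usual shape of such a fix is goto-based unwinding. The sketch below is a hypothetical userspace model (fake_port, fake_later_step and the label name are assumptions), showing only the pattern rather than the driver's actual open() routine.

```c
struct fake_port {
	int phy_running;
	int fail_later;		/* force a failure after the PHY was started */
};

static void fake_phy_start(struct fake_port *p) { p->phy_running = 1; }
static void fake_phy_stop(struct fake_port *p)  { p->phy_running = 0; }

/* Stands in for any initialization step that runs after the PHY start. */
static int fake_later_step(struct fake_port *p)
{
	return p->fail_later ? -12 : 0;
}

static int fake_open(struct fake_port *p)
{
	int err;

	fake_phy_start(p);

	err = fake_later_step(p);
	if (err)
		goto phy_deinit;

	return 0;

phy_deinit:
	/* Undo the PHY start so a failed open leaves nothing running. */
	fake_phy_stop(p);
	return err;
}
```

After a failed fake_open() the PHY is stopped again; after a successful one it remains running.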
    
    Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
    Signed-off-by: David Thompson <[email protected]>
    Reviewed-by: Asmaa Mnebhi <[email protected]>
    Reviewed-by: Andrew Lunn <[email protected]>
    Reviewed-by: Jiri Pirko <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations [+ + +]

Author: Vlastimil Babka <[email protected]>
Date:   Wed Feb 21 12:43:58 2024 +0100

    mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
    
    commit 803de9000f334b771afacb6ff3e78622916668b0 upstream.
    
    Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
    __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
    can happen in a suspend/resume context where a GFP_KERNEL allocation can
    have __GFP_IO masked out via gfp_allowed_mask.
    
    Quoting Sven:
    
    1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
       with __GFP_RETRY_MAYFAIL set.
    
    2. page alloc's __alloc_pages_slowpath tries to get a page from the
       freelist. This fails because there is nothing free of that costly
       order.
    
    3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
       which bails out because a zone is ready to be compacted; it pretends
       to have made a single page of progress.
    
    4. page alloc tries to compact, but this always bails out early because
       __GFP_IO is not set (it's not passed by the snd allocator, and even
       if it were, we are suspending so the __GFP_IO flag would be cleared
       anyway).
    
    5. page alloc believes reclaim progress was made (because of the
       pretense in item 3) and so it checks whether it should retry
       compaction. The compaction retry logic thinks it should try again,
       because:
        a) reclaim is needed because of the early bail-out in item 4
        b) a zonelist is suitable for compaction
    
    6. goto 2. indefinite stall.
    
    (end quote)
    
    The immediate root cause is confusing the COMPACT_SKIPPED returned from
    __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
    indicating a lack of order-0 pages, and in step 5 evaluating that in
    should_compact_retry() as a reason to retry, before incrementing and
    limiting the number of retries.  There are however other places that
    wrongly assume that compaction can happen while we lack __GFP_IO.
    
    To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
    evaluation and switch the open-coded test in try_to_compact_pages() to use
    it.
    
    Also use the new helper in:
    - compaction_ready(), which will make reclaim not bail out in step 3, so
      there's at least one attempt to actually reclaim, even if chances are
      small for a costly order
    - in_reclaim_compaction() which will make should_continue_reclaim()
      return false and we don't over-reclaim unnecessarily
    - in __alloc_pages_slowpath() to set a local variable can_compact,
      which is then used to avoid retrying reclaim/compaction for costly
      allocations (step 5) if we can't compact and also to skip the early
      compaction attempt that we do in some cases
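
A heavily simplified model of the helper and of the can_compact decision described above. The flag values, FAKE_COSTLY_ORDER, and the function should_retry are assumptions made for illustration; the real slow path has many more conditions.

```c
#include <stdbool.h>

/* Illustrative flag values only; the kernel's real gfp bits differ. */
#define FAKE_GFP_IO		0x1u
#define FAKE_GFP_RETRY_MAYFAIL	0x2u
#define FAKE_COSTLY_ORDER	3u

/* One place that answers "may this allocation compact at all?". */
static bool compaction_allowed(unsigned int gfp_mask)
{
	return (gfp_mask & FAKE_GFP_IO) != 0;
}

/* Decide whether the slow path should loop again. */
static bool should_retry(unsigned int gfp_mask, unsigned int order,
			 int retries, int max_retries)
{
	bool can_compact = compaction_allowed(gfp_mask);

	/* Costly orders depend on compaction; without it, give up. */
	if (order > FAKE_COSTLY_ORDER && !can_compact)
		return false;

	return retries < max_retries;
}
```

In this model a costly request without the I/O flag stops retrying immediately instead of looping on progress that compaction can never deliver.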
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
    Signed-off-by: Vlastimil Babka <[email protected]>
    Reported-by: Sven van Ashbrook <[email protected]>
    Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%[email protected]/
    Tested-by: Karthikeyan Ramasubramanian <[email protected]>
    Cc: Brian Geffon <[email protected]>
    Cc: Curtis Malainey <[email protected]>
    Cc: Jaroslav Kysela <[email protected]>
    Cc: Mel Gorman <[email protected]>
    Cc: Michal Hocko <[email protected]>
    Cc: Takashi Iwai <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Vlastimil Babka <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mm/migrate: set swap entry values of THP tail pages properly. [+ + +]

Author: Zi Yan <[email protected]>
Date:   Wed Mar 6 10:51:57 2024 -0500

    mm/migrate: set swap entry values of THP tail pages properly.
    
    The tail pages in a THP can have swap entry information stored in their
    private field. When migrating to a new page, all tail pages of the new
    page need to update ->private to avoid future data corruption.
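
In rough terms, the update amounts to carrying the per-subpage value across for every subpage, not only the head. The sketch below models this with a hypothetical fake_page type and helper name; it is not the kernel's migration code.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the field of interest. */
struct fake_page {
	unsigned long private_data;	/* models page->private */
};

/* Copy the swap entry value of every subpage, head and tails alike. */
static void copy_swap_private(struct fake_page *newpage,
			      const struct fake_page *oldpage,
			      size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++)
		newpage[i].private_data = oldpage[i].private_data;
}
```

Copying only index 0 would leave the tail entries of the new page stale, which is the situation the patch is meant to avoid.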
    
    This fix is stable-only, since after commit 07e09c483cbe ("mm/huge_memory:
    work on folio->swap instead of page->private when splitting folio"),
    subpages of a swapcached THP no longer require this maintenance.
    
    Adding THPs to the swapcache was introduced in commit
    38d8b4e6bdc87 ("mm, THP, swap: delay splitting THP during swap out"),
    where each subpage of a THP added to the swapcache had its own swapcache
    entry and required the ->private field to point to the correct swapcache
    entry. Later, when THP migration functionality was implemented in commit
    616b8371539a6 ("mm: thp: enable thp migration in generic path"),
    it initially did not handle the subpages of swapcached THPs, failing to
    update their ->private fields or replace the subpage pointers in the
    swapcache. Subsequently, commit e71769ae5260 ("mm: enable thp migration
    for shmem thp") addressed the swapcache update aspect. This patch fixes
    the update of subpage ->private fields.
    
    Closes: https://lore.kernel.org/linux-mm/[email protected]/
    Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
    Signed-off-by: Zi Yan <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mm/secretmem: fix GUP-fast succeeding on secretmem folios [+ + +]

Author: David Hildenbrand <[email protected]>
Date:   Tue Mar 26 15:32:08 2024 +0100

    mm/secretmem: fix GUP-fast succeeding on secretmem folios
    
    commit 65291dcfcf8936e1b23cfd7718fdfde7cfaf7706 upstream.
    
    folio_is_secretmem() currently relies on secretmem folios being LRU
    folios, to save some cycles.
    
    However, folios might reside in a folio batch without the LRU flag set, or
    temporarily have their LRU flag cleared.  Consequently, the LRU flag is
    unreliable for this purpose.
    
    In particular, this is the case when secretmem_fault() allocates a fresh
    page and calls filemap_add_folio()->folio_add_lru().  The folio might be
    added to the per-cpu folio batch and won't get the LRU flag set until the
    batch was drained using e.g., lru_add_drain().
    
    Consequently, folio_is_secretmem() might not detect secretmem folios and
    GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel
    when we would later try reading/writing to the folio, because the folio
    has been unmapped from the directmap.
    
    Fix it by removing that unreliable check.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
    Signed-off-by: David Hildenbrand <[email protected]>
    Reported-by: xingwei lee <[email protected]>
    Reported-by: yue sun <[email protected]>
    Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/
    Debugged-by: Miklos Szeredi <[email protected]>
    Tested-by: Miklos Szeredi <[email protected]>
    Reviewed-by: Mike Rapoport (IBM) <[email protected]>
    Cc: Lorenzo Stoakes <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: David Hildenbrand <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mm: swap: fix race between free_swap_and_cache() and swapoff() [+ + +]

Author: Ryan Roberts <[email protected]>
Date:   Wed Mar 6 14:03:56 2024 +0000

    mm: swap: fix race between free_swap_and_cache() and swapoff()
    
    [ Upstream commit 82b1c07a0af603e3c47b906c8e991dc96f01688e ]
    
    There was previously a theoretical window where swapoff() could run and
    tear down a swap_info_struct while a call to free_swap_and_cache() was
    running in another thread.  This could cause, amongst other bad
    possibilities, swap_page_trans_huge_swapped() (called by
    free_swap_and_cache()) to access the freed memory for swap_map.
    
    This is a theoretical problem and I haven't been able to provoke it from a
    test case.  But there has been agreement based on code review that this is
    possible (see link below).
    
    Fix it by using get_swap_device()/put_swap_device(), which will stall
    swapoff().  There was an extra check in _swap_info_get() to confirm that
    the swap entry was not free.  This isn't present in get_swap_device()
    because it doesn't make sense in general due to the race between getting
    the reference and swapoff.  So I've added an equivalent check directly in
    free_swap_and_cache().
    
    Details of how to provoke one possible issue (thanks to David Hildenbrand
    for deriving this):
    
    --8<-----
    
    __swap_entry_free() might be the last user and result in
    "count == SWAP_HAS_CACHE".
    
    swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.
    
    So the question is: could someone reclaim the folio and turn
    si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().
    
    Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
    still referenced by swap entries.
    
    Process 1 still references subpage 0 via swap entry.
    Process 2 still references subpage 1 via swap entry.
    
    Process 1 quits. Calls free_swap_and_cache().
    -> count == SWAP_HAS_CACHE
    [then, preempted in the hypervisor etc.]
    
    Process 2 quits. Calls free_swap_and_cache().
    -> count == SWAP_HAS_CACHE
    
    Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
    __try_to_reclaim_swap().
    
    __try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
    put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
    swap_entry_free()->swap_range_free()->
    ...
    WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
    
    What stops swapoff from succeeding after process 2 reclaimed the swap cache
    but before process 1 finished its call to swap_page_trans_huge_swapped()?
    
    --8<-----
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
    Closes: https://lore.kernel.org/linux-mm/[email protected]/
    Signed-off-by: Ryan Roberts <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: "Huang, Ying" <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mmc: core: Avoid negative index with array access [+ + +]

Author: Mikko Rapeli <[email protected]>
Date:   Wed Mar 13 15:37:44 2024 +0200

    mmc: core: Avoid negative index with array access
    
    commit cf55a7acd1ed38afe43bba1c8a0935b51d1dc014 upstream.
    
    Commit 4d0c8d0aef63 ("mmc: core: Use mrq.sbc in close-ended ffu") assigns
    prev_idata = idatas[i - 1], but doesn't check that the iterator i is
    greater than zero. Let's fix this by adding a check.
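
The guard can be sketched as follows. The type fake_idata and the helper prev_entry are hypothetical; the point is only that index 0 has no predecessor and must not be used to compute i - 1.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-command ioctl data. */
struct fake_idata {
	int opcode;
};

/* Return the previous entry, or NULL for the first element. */
static struct fake_idata *prev_entry(struct fake_idata **idatas, int i)
{
	if (i <= 0)
		return NULL;

	return idatas[i - 1];
}
```

A caller then has to handle the NULL result for the first command rather than reading memory in front of the array.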
    
    Fixes: 4d0c8d0aef63 ("mmc: core: Use mrq.sbc in close-ended ffu")
    Link: https://lore.kernel.org/all/[email protected]/
    Cc: [email protected]
    Signed-off-by: Mikko Rapeli <[email protected]>
    Reviewed-by: Avri Altman <[email protected]>
    Tested-by: Francesco Dolcini <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Ulf Hansson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mmc: core: Fix switch on gp3 partition [+ + +]

Author: Dominique Martinet <[email protected]>
Date:   Wed Mar 6 10:44:38 2024 +0900

    mmc: core: Fix switch on gp3 partition
    
    [ Upstream commit 4af59a8df5ea930038cd3355e822f5eedf4accc1 ]
    
    Commit e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB
    partitions.") added a mask check for 'part_type', but the mask used was
    wrong, leading to the code intended for rpmb also being executed for GP3.
    
    On some MMCs (but not all) this would make gp3 partition inaccessible:
    armadillo:~# head -c 1 < /dev/mmcblk2gp3
    head: standard input: I/O error
    armadillo:~# dmesg -c
    [  422.976583] mmc2: running CQE recovery
    [  423.058182] mmc2: running CQE recovery
    [  423.137607] mmc2: running CQE recovery
    [  423.137802] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
    [  423.237125] mmc2: running CQE recovery
    [  423.318206] mmc2: running CQE recovery
    [  423.397680] mmc2: running CQE recovery
    [  423.397837] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  423.408287] Buffer I/O error on dev mmcblk2gp3, logical block 0, async page read
    
    the part_type values of interest here are defined as follows:
    main  0
    boot0 1
    boot1 2
    rpmb  3
    gp0   4
    gp1   5
    gp2   6
    gp3   7
    
    so mask with EXT_CSD_PART_CONFIG_ACC_MASK (7) to correctly identify rpmb
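
The arithmetic behind this can be checked with a small sketch. The constants mirror the table above; the "buggy" variant is an assumption about one way a wrong mask produces this symptom (using the rpmb value itself as the mask), and both function names are hypothetical.

```c
#include <stdbool.h>

#define PART_ACC_MASK 0x7u	/* mirrors EXT_CSD_PART_CONFIG_ACC_MASK (7) */
#define PART_ACC_RPMB 0x3u	/* rpmb, from the table above */

/* Assumed buggy form: masking with 3 also matches gp3, since 7 & 3 == 3. */
static bool is_rpmb_buggy(unsigned int part_type)
{
	return (part_type & PART_ACC_RPMB) == PART_ACC_RPMB;
}

/* Fixed form: mask all three access bits, then compare with rpmb. */
static bool is_rpmb_fixed(unsigned int part_type)
{
	return (part_type & PART_ACC_MASK) == PART_ACC_RPMB;
}
```

For part_type 7 (gp3) the first check wrongly reports rpmb and the second does not; for part_type 3 both agree.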
    
    Fixes: e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB partitions.")
    Cc: [email protected]
    Cc: Jorge Ramirez-Ortiz <[email protected]>
    Signed-off-by: Dominique Martinet <[email protected]>
    Reviewed-by: Linus Walleij <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Ulf Hansson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mmc: core: Initialize mmc_blk_ioc_data [+ + +]

Author: Mikko Rapeli <[email protected]>
Date:   Wed Mar 13 15:37:43 2024 +0200

    mmc: core: Initialize mmc_blk_ioc_data
    
    commit 0cdfe5b0bf295c0dee97436a8ed13336933a0211 upstream.
    
    Commit 4d0c8d0aef63 ("mmc: core: Use mrq.sbc in close-ended ffu") adds
    flags uint to struct mmc_blk_ioc_data, but it does not get initialized for
    RPMB ioctls, which now fail.
    
    Let's fix this by always initializing the struct and flags to zero.
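
As a userspace analogy, zero-initializing at allocation time guarantees that a newly added field starts from a known value on every code path. The struct fake_ioc_data and alloc_ioc_data below are hypothetical, with calloc() standing in for a zeroing kernel allocation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the ioctl data structure. */
struct fake_ioc_data {
	void *buf;
	unsigned int flags;	/* field added later; must start at zero */
};

static struct fake_ioc_data *alloc_ioc_data(void)
{
	/* calloc() zero-fills the whole struct, so no caller can
	 * observe an uninitialized 'flags' value. */
	return calloc(1, sizeof(struct fake_ioc_data));
}
```

With a non-zeroing allocator, any path that forgets to set flags would read whatever bytes happened to be in the allocation.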
    
    Fixes: 4d0c8d0aef63 ("mmc: core: Use mrq.sbc in close-ended ffu")
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218587
    Link: https://lore.kernel.org/all/[email protected]/
    Cc: [email protected]
    Signed-off-by: Mikko Rapeli <[email protected]>
    Reviewed-by: Avri Altman <[email protected]>
    Acked-by: Adrian Hunter <[email protected]>
    Tested-by: Francesco Dolcini <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Ulf Hansson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mmc: tmio: avoid concurrent runs of mmc_request_done() [+ + +]

Author: Wolfram Sang <[email protected]>
Date:   Tue Mar 5 11:42:56 2024 +0100

    mmc: tmio: avoid concurrent runs of mmc_request_done()
    
    [ Upstream commit e8d1b41e69d72c62865bebe8f441163ec00b3d44 ]
    
    With the to-be-fixed commit, the reset_work handler cleared 'host->mrq'
    outside of the spinlock protected critical section. That leaves a small
    race window during execution of 'tmio_mmc_reset()' where the done_work
    handler could grab a pointer to the now invalid 'host->mrq'. Both would
    use it to call mmc_request_done() causing problems (see link below).
    
    However, 'host->mrq' cannot simply be cleared earlier inside the
    critical section. That would allow new mrqs to come in asynchronously
    while the actual reset of the controller still needs to be done. So,
    like 'tmio_mmc_set_ios()', an ERR_PTR is used to prevent new mrqs from
    coming in while still avoiding concurrency between work handlers.
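
    The sentinel idea can be sketched in user space as follows. This is a
    single-threaded model with the locking left out; the struct layout and
    all function names are hypothetical, and err_ptr()/is_err_or_null() are
    simplified stand-ins for the kernel's ERR_PTR()/IS_ERR_OR_NULL().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True for NULL and for pointers in the top MAX_ERRNO addresses. */
static int is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct request { int id; };
struct host { struct request *mrq; }; /* hypothetical, much reduced */

/* A new request is accepted only while the slot is really empty. */
static int host_submit(struct host *h, struct request *rq)
{
	if (h->mrq)
		return -EBUSY; /* busy, or blocked by the sentinel */
	h->mrq = rq;
	return 0;
}

/* Reset path: take the request and leave the sentinel in its place. */
static struct request *host_reset_begin(struct host *h)
{
	struct request *rq = h->mrq;

	if (is_err_or_null(rq))
		return NULL;
	h->mrq = err_ptr(-EBUSY);
	return rq;
}

/* Reset finished: open the slot for new requests again. */
static void host_reset_end(struct host *h)
{
	h->mrq = NULL;
}

/* Completion path: does nothing if there is no valid request to finish. */
static struct request *host_done(struct host *h)
{
	struct request *rq = h->mrq;

	if (is_err_or_null(rq))
		return NULL;
	h->mrq = NULL;
	return rq;
}
```

    While the sentinel is in place, host_submit() refuses new requests and
    host_done() finds nothing to complete, so only the reset path finishes
    the request it took.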
    
    Reported-by: Dirk Behme <[email protected]>
    Closes: https://lore.kernel.org/all/[email protected]/
    Fixes: df3ef2d3c92c ("mmc: protect the tmio_mmc driver against a theoretical race")
    Signed-off-by: Wolfram Sang <[email protected]>
    Tested-by: Dirk Behme <[email protected]>
    Reviewed-by: Dirk Behme <[email protected]>
    Cc: [email protected] # 3.0+
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Ulf Hansson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mptcp: don't account accept() of non-MPC client as fallback to TCP [+ + +]

Author: Davide Caratti <[email protected]>
Date:   Fri Mar 29 13:08:52 2024 +0100

    mptcp: don't account accept() of non-MPC client as fallback to TCP
    
    commit 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 upstream.
    
    Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
    accept non-MPC connections. As reported by Christoph, this is "surprising"
    because the counter might become greater than MPTcpExtMPCapableSYNRX.
    
    MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
    incremented when a connection was seen using MPTCP options, then a
    fallback to TCP has been done. Let's do that by incrementing it when
    the subflow context of an inbound MPC connection attempt is dropped.
    Also, update mptcp_connect.sh kselftest, to ensure that the
    above MIB does not increment in case a pure TCP client connects to a
    MPTCP server.
    
    Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
    Cc: [email protected]
    Reported-by: Christoph Paasch <[email protected]>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
    Signed-off-by: Davide Caratti <[email protected]>
    Reviewed-by: Mat Martineau <[email protected]>
    Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mtd: rawnand: meson: fix scrambling mode value in command macro [+ + +]

Author: Arseniy Krasnov <[email protected]>
Date:   Sun Feb 11 00:45:51 2024 +0300

    mtd: rawnand: meson: fix scrambling mode value in command macro
    
    [ Upstream commit ef6f463599e16924cdd02ce5056ab52879dc008c ]
    
    Scrambling mode is enabled by value (1 << 19). NFC_CMD_SCRAMBLER_ENABLE
    is already (1 << 19), so there is no need to shift it again in CMDRWGEN
    macro.
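
    A reduced model of the double shift, assuming 32-bit command words. The
    two macros below are hypothetical and model only the scrambler term and
    a small length field; they are not the driver's CMDRWGEN definition.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the value the commit message gives for NFC_CMD_SCRAMBLER_ENABLE. */
#define SCRAMBLER_ENABLE (UINT32_C(1) << 19)

/* Hypothetical builder that shifts the already shifted flag a second time. */
#define CMD_SHIFTED_AGAIN(ran, len) \
	((uint32_t)(((uint32_t)(ran) << 19) | ((uint32_t)(len) & 0x3f)))

/* Hypothetical builder that uses the flag value as it is. */
#define CMD_FIXED(ran, len) \
	((uint32_t)((uint32_t)(ran) | ((uint32_t)(len) & 0x3f)))

/* Report whether bit 19 is set in a command word. */
static int scrambler_bit_set(uint32_t cmd)
{
	return (cmd & SCRAMBLER_ENABLE) != 0;
}
```

    In this model the second shift pushes the flag past bit 31, so the
    resulting 32-bit word no longer carries the scrambler bit.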
    
    Signed-off-by: Arseniy Krasnov <[email protected]>
    Cc: <[email protected]>
    Fixes: 8fae856c5350 ("mtd: rawnand: meson: add support for Amlogic NAND flash controller")
    Signed-off-by: Miquel Raynal <[email protected]>
    Link: https://lore.kernel.org/linux-mtd/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

net/rds: fix possible cp null dereference [+ + +]

Author: Mahmoud Adam <[email protected]>
Date:   Tue Mar 26 16:31:33 2024 +0100

    net/rds: fix possible cp null dereference
    
    commit 62fc3357e079a07a22465b9b6ef71bb6ea75ee4b upstream.
    
    cp might be NULL, in which case accessing cp->cp_conn would be a NULL
    pointer dereference.
    
    [Simon Horman adds:]
    
    Analysis:
    
    * cp is a parameter of __rds_rdma_map and is not reassigned.
    
    * The following call-sites pass a NULL cp argument to __rds_rdma_map()
    
      - rds_get_mr()
      - rds_get_mr_for_dest()
    
    * Prior to the code above, the following assumes that cp may be NULL
      (which is indicative, but could itself be unnecessary)
    
            trans_private = rs->rs_transport->get_mr(
                    sg, nents, rs, &mr->r_key, cp ? cp->cp_conn : NULL,
                    args->vec.addr, args->vec.bytes,
                    need_odp ? ODP_ZEROBASED : ODP_NOT_NEEDED);
    
    * The code modified by this patch is guarded by IS_ERR(trans_private),
      where trans_private is assigned as per the previous point in this analysis.
    
      The only implementation of get_mr that I could locate is rds_ib_get_mr()
      which can return an ERR_PTR if the conn (4th) argument is NULL.
    
    * ret is set to PTR_ERR(trans_private).
      rds_ib_get_mr can return ERR_PTR(-ENODEV) if the conn (4th) argument is NULL.
      Thus ret may be -ENODEV in which case the code in question will execute.
    
    Conclusion:
    * cp may be NULL at the point where this patch adds a check;
      this patch does seem to address a possible bug
    
    Fixes: c055fc00c07b ("net/rds: fix WARNING in rds_conn_connect_if_down")
    Cc: [email protected] # v4.19+
    Signed-off-by: Mahmoud Adam <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/sched: act_skbmod: prevent kernel-infoleak [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Wed Apr 3 13:09:08 2024 +0000

    net/sched: act_skbmod: prevent kernel-infoleak
    
    commit d313eb8b77557a6d5855f42d2234bd592c7b50dd upstream.
    
    syzbot found that tcf_skbmod_dump() was copying four bytes
    from kernel stack to user space [1].
    
    The issue here is that 'struct tc_skbmod' has a four-byte hole.
    
    We need to clear the structure before filling fields.
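
    The clear-then-fill pattern can be sketched as below. The struct is a
    hypothetical stand-in rather than the layout of 'struct tc_skbmod'; on
    common 64-bit ABIs the compiler leaves a four-byte gap between its two
    members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 'flags' is usually aligned to 8, leaving a gap. */
struct opt_example {
	uint32_t index;
	uint64_t flags;
};

/* Clear the whole object first, then fill in the named fields. */
static void fill_opt(struct opt_example *opt, uint32_t index, uint64_t flags)
{
	memset(opt, 0, sizeof(*opt));
	opt->index = index;
	opt->flags = flags;
}

/* Count non-zero bytes in the gap between the two members. */
static size_t dirty_padding_bytes(const struct opt_example *opt)
{
	const unsigned char *p = (const unsigned char *)opt;
	size_t start = offsetof(struct opt_example, index) + sizeof(opt->index);
	size_t end = offsetof(struct opt_example, flags);
	size_t i, dirty = 0;

	for (i = start; i < end; i++)
		if (p[i])
			dirty++;
	return dirty;
}
```

    Without the memset(), the gap would keep whatever bytes were on the
    stack, and copying the whole struct out would expose them.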
    
    [1]
    BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
     BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline]
     BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline]
     BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
     BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline]
     BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
      instrument_copy_to_user include/linux/instrumented.h:114 [inline]
      copy_to_user_iter lib/iov_iter.c:24 [inline]
      iterate_ubuf include/linux/iov_iter.h:29 [inline]
      iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
      iterate_and_advance include/linux/iov_iter.h:271 [inline]
      _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
      copy_to_iter include/linux/uio.h:196 [inline]
      simple_copy_to_iter net/core/datagram.c:532 [inline]
      __skb_datagram_iter+0x185/0x1000 net/core/datagram.c:420
      skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
      skb_copy_datagram_msg include/linux/skbuff.h:4050 [inline]
      netlink_recvmsg+0x432/0x1610 net/netlink/af_netlink.c:1962
      sock_recvmsg_nosec net/socket.c:1046 [inline]
      sock_recvmsg+0x2c4/0x340 net/socket.c:1068
      __sys_recvfrom+0x35a/0x5f0 net/socket.c:2242
      __do_sys_recvfrom net/socket.c:2260 [inline]
      __se_sys_recvfrom net/socket.c:2256 [inline]
      __x64_sys_recvfrom+0x126/0x1d0 net/socket.c:2256
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Uninit was stored to memory at:
      pskb_expand_head+0x30f/0x19d0 net/core/skbuff.c:2253
      netlink_trim+0x2c2/0x330 net/netlink/af_netlink.c:1317
      netlink_unicast+0x9f/0x1260 net/netlink/af_netlink.c:1351
      nlmsg_unicast include/net/netlink.h:1144 [inline]
      nlmsg_notify+0x21d/0x2f0 net/netlink/af_netlink.c:2610
      rtnetlink_send+0x73/0x90 net/core/rtnetlink.c:741
      rtnetlink_maybe_send include/linux/rtnetlink.h:17 [inline]
      tcf_add_notify net/sched/act_api.c:2048 [inline]
      tcf_action_add net/sched/act_api.c:2071 [inline]
      tc_ctl_action+0x146e/0x19d0 net/sched/act_api.c:2119
      rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
      netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
      rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
      netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
      netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
      netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
      __sys_sendmsg net/socket.c:2667 [inline]
      __do_sys_sendmsg net/socket.c:2676 [inline]
      __se_sys_sendmsg net/socket.c:2674 [inline]
      __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Uninit was stored to memory at:
      __nla_put lib/nlattr.c:1041 [inline]
      nla_put+0x1c6/0x230 lib/nlattr.c:1099
      tcf_skbmod_dump+0x23f/0xc20 net/sched/act_skbmod.c:256
      tcf_action_dump_old net/sched/act_api.c:1191 [inline]
      tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
      tcf_action_dump+0x1fd/0x460 net/sched/act_api.c:1251
      tca_get_fill+0x519/0x7a0 net/sched/act_api.c:1628
      tcf_add_notify_msg net/sched/act_api.c:2023 [inline]
      tcf_add_notify net/sched/act_api.c:2042 [inline]
      tcf_action_add net/sched/act_api.c:2071 [inline]
      tc_ctl_action+0x1365/0x19d0 net/sched/act_api.c:2119
      rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
      netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
      rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
      netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
      netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
      netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
      __sys_sendmsg net/socket.c:2667 [inline]
      __do_sys_sendmsg net/socket.c:2676 [inline]
      __se_sys_sendmsg net/socket.c:2674 [inline]
      __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Local variable opt created at:
      tcf_skbmod_dump+0x9d/0xc20 net/sched/act_skbmod.c:244
      tcf_action_dump_old net/sched/act_api.c:1191 [inline]
      tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
    
    Bytes 188-191 of 248 are uninitialized
    Memory access of size 248 starts at ffff888117697680
    Data copied to user address 00007ffe56d855f0
    
    Fixes: 86da71b57383 ("net_sched: Introduce skbmod action")
    Signed-off-by: Eric Dumazet <[email protected]>
    Acked-by: Jamal Hadi Salim <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: fec: Set mac_managed_pm during probe [+ + +]

Author: Wei Fang <[email protected]>
Date:   Thu Mar 28 15:59:29 2024 +0000

    net: fec: Set mac_managed_pm during probe
    
    [ Upstream commit cbc17e7802f5de37c7c262204baadfad3f7f99e5 ]
    
    Setting mac_managed_pm during interface up is too late.
    
    In situations where the link is not brought up yet and the system
    suspends, the regular PHY power management will run. Since the FEC
    ETHEREN control bit is cleared (automatically) on suspend, the
    controller is off on resume. When the regular PHY power management
    resume path runs in this context, it will write to the MII_DATA register
    but nothing will be transmitted on the MDIO bus.
    
    This can be observed by the following log:
    
        fec 5b040000.ethernet eth0: MDIO read timeout
        Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
        Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110
    
    The data written will however remain in the MII_DATA register.
    
    When the link later is set to administrative up it will trigger a call to
    fec_restart() which will restore the MII_SPEED register. This triggers the
    quirk explained in f166f890c8f0 ("net: ethernet: fec: Replace interrupt
    driven MDIO with polled IO") causing an extra MII_EVENT.
    
    This extra event desynchronizes all the MDIO register reads, causing them
    to complete too early. As a result, all reads return 0 because
    fec_enet_mdio_wait() returns too early.
    
    When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads cause
    the PHY to be initialized incorrectly and the PHY will not transmit any
    ethernet signal in this state. It cannot be brought out of this state
    without a power cycle of the PHY.
    
    Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
    Closes: https://lore.kernel.org/netdev/[email protected]/
    Signed-off-by: Wei Fang <[email protected]>
    [jernberg: commit message]
    Signed-off-by: John Ernberg <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: hns3: tracing: fix hclgevf trace event strings [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Wed Mar 13 09:34:54 2024 -0400

    net: hns3: tracing: fix hclgevf trace event strings
    
    [ Upstream commit 3f9952e8d80cca2da3b47ecd5ad9ec16cfd1a649 ]
    
    The __string() and __assign_str() helper macros of the TRACE_EVENT() macro
    are going through some optimizations where only the source string of
    __string() will be used and the __assign_str() source will be ignored and
    later removed.
    
    To make sure that there are no issues, a new check is added between the
    __string() src argument and the __assign_str() src argument that does a
    strcmp() to make sure they are the same string.
    
    The hclgevf trace events have:
    
      __assign_str(devname, &hdev->nic.kinfo.netdev->name);
    
    Which triggers the warning:
    
    hclgevf_trace.h:34:39: error: passing argument 1 of ‘strcmp’ from incompatible pointer type [-Werror=incompatible-pointer-types]
       34 |                 __assign_str(devname, &hdev->nic.kinfo.netdev->name);
     [..]
    arch/x86/include/asm/string_64.h:75:24: note: expected ‘const char *’ but argument is of type ‘char (*)[16]’
       75 | int strcmp(const char *cs, const char *ct);
          |            ~~~~~~~~~~~~^~
    
    Because __assign_str() now has:
    
            WARN_ON_ONCE(__builtin_constant_p(src) ?                \
                         strcmp((src), __data_offsets.dst##_ptr_) : \
                         (src) != __data_offsets.dst##_ptr_);       \
    
    The problem is the '&' on hdev->nic.kinfo.netdev->name. That's because
    that name is:
    
            char                    name[IFNAMSIZ]
    
    Where passing an address '&' of a char array is not compatible with strcmp().
    
    The '&' is not necessary, remove it.
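
    A user-space sketch of the type difference. The struct and helper names
    are hypothetical and NAME_LEN stands in for IFNAMSIZ.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 16 /* stands in for IFNAMSIZ */

struct netdev_example { /* hypothetical, reduced */
	char name[NAME_LEN];
};

/* 'dev->name' decays to a pointer to char, which is what strcmp() expects. */
static int name_matches(const struct netdev_example *dev, const char *want)
{
	return strcmp(dev->name, want) == 0;
}

/*
 * '&dev->name' is a pointer to the whole NAME_LEN-char array: it holds the
 * same address as 'dev->name' but has a different type, so it can only be
 * compared here after converting both sides to 'const void *'.
 */
static int same_address(const struct netdev_example *dev)
{
	return (const void *)&dev->name == (const void *)dev->name;
}
```

    Passing the array itself therefore names the same bytes and satisfies
    the prototype of strcmp().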
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: netdev <[email protected]>
    Cc: Yisen Zhuang <[email protected]>
    Cc: Salil Mehta <[email protected]>
    Cc: "David S. Miller" <[email protected]>
    Cc: Eric Dumazet <[email protected]>
    Cc: Jakub Kicinski <[email protected]>
    Cc: Yufeng Mo <[email protected]>
    Cc: Huazhong Tan <[email protected]>
    Cc: [email protected]
    Acked-by: Paolo Abeni <[email protected]>
    Reviewed-by: Jijie Shao <[email protected]>
    Fixes: d8355240cf8fb ("net: hns3: add trace event support for PF/VF mailbox")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: ll_temac: platform_get_resource replaced by wrong function [+ + +]

Author: Claus Hansen Ries <[email protected]>
Date:   Thu Mar 21 13:08:59 2024 +0000

    net: ll_temac: platform_get_resource replaced by wrong function
    
    commit 3a38a829c8bc27d78552c28e582eb1d885d07d11 upstream.
    
    The function platform_get_resource was replaced with
    devm_platform_ioremap_resource_byname and is called using 0 as name.
    
    This eventually ends up in platform_get_resource_byname in the call
    stack, where it causes a null pointer dereference in strcmp.
    
            if (type == resource_type(r) && !strcmp(r->name, name))
    
    It should have been replaced with devm_platform_ioremap_resource.
    
    Fixes: bd69058f50d5 ("net: ll_temac: Use devm_platform_ioremap_resource_byname()")
    Signed-off-by: Claus Hansen Ries <[email protected]>
    Cc: [email protected]
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: ravb: Add R-Car Gen4 support [+ + +]

Author: Geert Uytterhoeven <[email protected]>
Date:   Fri Sep 9 12:10:11 2022 +0200

    net: ravb: Add R-Car Gen4 support
    
    commit 949f252a8594a860007e7035a0cb1c19a4e218b0 upstream.
    
    Add support for the Renesas Ethernet AVB (EtherAVB-IF) blocks on R-Car
    Gen4 SoCs (e.g. R-Car V4H) by matching on a family-specific compatible
    value.
    
    These are treated the same as EtherAVB on R-Car Gen3.
    
    Signed-off-by: Geert Uytterhoeven <[email protected]>
    Reviewed-by: Sergey Shtylyov <[email protected]>
    Link: https://lore.kernel.org/r/2ee968890feba777e627d781128b074b2c43cddb.1662718171.git.geert+renesas@glider.be
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: ravb: Always process TX descriptor ring [+ + +]

Author: Paul Barker <[email protected]>
Date:   Tue Apr 2 15:53:04 2024 +0100

    net: ravb: Always process TX descriptor ring
    
    [ Upstream commit 596a4254915f94c927217fe09c33a6828f33fb25 ]
    
    The TX queue should be serviced each time the poll function is called,
    even if the full RX work budget has been consumed. This prevents
    starvation of the TX queue when RX bandwidth usage is high.
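
    A toy model of the two poll shapes. The ring struct, the helpers and the
    early-return form of the first poll function are assumptions made for
    illustration; this is not the driver's code.

```c
#include <assert.h>

/* Hypothetical ring state: packets waiting on RX, TX descriptors to reclaim. */
struct ring_example {
	int rx_pending;
	int tx_done_pending;
};

/* Consume up to 'budget' RX packets and return how many were handled. */
static int rx_process(struct ring_example *r, int budget)
{
	int n = r->rx_pending < budget ? r->rx_pending : budget;

	r->rx_pending -= n;
	return n;
}

/* Reclaim all completed TX descriptors. */
static void tx_reclaim(struct ring_example *r)
{
	r->tx_done_pending = 0;
}

/* Assumed shape: returns as soon as the RX budget is used up. */
static int poll_rx_first(struct ring_example *r, int budget)
{
	int work = rx_process(r, budget);

	if (work == budget)
		return budget; /* TX is not serviced on this call */
	tx_reclaim(r);
	return work;
}

/* Shape described above: TX is serviced on every call. */
static int poll_always_tx(struct ring_example *r, int budget)
{
	int work = rx_process(r, budget);

	tx_reclaim(r);
	return work;
}
```

    Under sustained RX load the first shape can use up its budget on every
    call and never reach the TX reclaim, while the second reclaims TX
    descriptors regardless of how much RX work was done.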
    
    Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
    Signed-off-by: Paul Barker <[email protected]>
    Reviewed-by: Sergey Shtylyov <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: stmmac: fix rx queue priority assignment [+ + +]

Author: Piotr Wejman <[email protected]>
Date:   Mon Apr 1 21:22:39 2024 +0200

    net: stmmac: fix rx queue priority assignment
    
    commit b3da86d432b7cd65b025a11f68613e333d2483db upstream.
    
    The driver should ensure that the same priority is not mapped to multiple
    rx queues. From DesignWare Cores Ethernet Quality-of-Service
    Databook, section 17.1.29 MAC_RxQ_Ctrl2:
    "[...]The software must ensure that the content of this field is
    mutually exclusive to the PSRQ fields for other queues, that is,
    the same priority is not mapped to multiple Rx queues[...]"
    
    Previously rx_queue_priority() function was:
    - clearing all priorities from a queue
    - adding new priorities to that queue
    After this patch it will:
    - first assign new priorities to a queue
    - then remove those priorities from all other queues
    - keep other priorities previously assigned to that queue
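
    The three steps listed for the patched behaviour can be modelled with
    plain bit masks. The array below stands in for the per-queue PSRQ
    fields; the queue count, struct and helper names are hypothetical and no
    register access is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_RX_QUEUES 4 /* hypothetical queue count */

/* One 8-bit priority mask per RX queue. */
struct rxq_prio {
	uint8_t psrq[NUM_RX_QUEUES];
};

/*
 * Add the 'prio' bits to 'queue', remove them from every other queue, and
 * keep whatever priorities 'queue' already had.
 */
static void rx_queue_priority(struct rxq_prio *p, uint8_t prio,
			      unsigned int queue)
{
	unsigned int q;

	if (queue >= NUM_RX_QUEUES)
		return;
	p->psrq[queue] |= prio;
	for (q = 0; q < NUM_RX_QUEUES; q++)
		if (q != queue)
			p->psrq[q] &= (uint8_t)~prio;
}

/* Check that no priority bit is mapped to more than one queue. */
static int priorities_exclusive(const struct rxq_prio *p)
{
	unsigned int a, b;

	for (a = 0; a < NUM_RX_QUEUES; a++)
		for (b = a + 1; b < NUM_RX_QUEUES; b++)
			if (p->psrq[a] & p->psrq[b])
				return 0;
	return 1;
}
```

    Assigning priorities 1 and 2 to queue 1 after queue 0 already holds
    priorities 0 and 1 moves priority 1 over to queue 1, which keeps the
    masks mutually exclusive as the databook requires.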
    
    Fixes: a8f5102af2a7 ("net: stmmac: TX and RX queue priority configuration")
    Fixes: 2142754f8b9c ("net: stmmac: Add MAC related callbacks for XGMAC2")
    Signed-off-by: Piotr Wejman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: usb: asix: suspend embedded PHY if external is used [+ + +]

Author: Oleksij Rempel <[email protected]>
Date:   Fri Mar 11 09:50:14 2022 +0100

    net: usb: asix: suspend embedded PHY if external is used
    
    [ Upstream commit 4d17d43de9d186150b3289ce99d7a79fcff202f9 ]
    
    If an external PHY is used, we need to take care of the embedded PHY.
    Since there is no method to disable this PHY from the MAC side while
    keeping the RMII reference clock, we need to suspend it.
    
    This patch will reduce electrical noise (PHY is continuing to send FLPs)
    and power consumption by 0.22 W.
    
    Signed-off-by: Oleksij Rempel <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Stable-dep-of: cbc17e7802f5 ("net: fec: Set mac_managed_pm during probe")
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: nf_tables: disallow anonymous set with timeout flag [+ + +]

Author: Pablo Neira Ayuso <[email protected]>
Date:   Fri Mar 1 00:11:10 2024 +0100

    netfilter: nf_tables: disallow anonymous set with timeout flag
    
    commit 16603605b667b70da974bea8216c93e7db043bf1 upstream.
    
    Anonymous sets are never used with timeout from userspace, reject this.
    Exception to this rule is NFT_SET_EVAL to ensure legacy meters still work.
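
    The rule and its exception fit in a single mask comparison. In the
    sketch below the flag names echo the NFT_SET_* flags, but the bit
    values, the function name and the error code are assumptions made for
    illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative flag bits; the values are assumptions. */
#define SET_ANONYMOUS 0x01u
#define SET_TIMEOUT   0x10u
#define SET_EVAL      0x20u

/*
 * Reject an anonymous set with a timeout unless the eval flag is also set,
 * which is the exception kept for legacy meters.
 */
static int check_set_flags(uint32_t flags)
{
	const uint32_t mask = SET_ANONYMOUS | SET_TIMEOUT | SET_EVAL;

	if ((flags & mask) == (SET_ANONYMOUS | SET_TIMEOUT))
		return -EOPNOTSUPP;
	return 0;
}
```

    Including the eval bit in the mask but not in the compared value is what
    lets the anonymous, timeout and eval combination through.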
    
    Cc: [email protected]
    Fixes: 761da2935d6e ("netfilter: nf_tables: add set timeout API support")
    Reported-by: lonial con <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get() [+ + +]

Author: Ziyang Xuan <[email protected]>
Date:   Wed Apr 3 15:22:04 2024 +0800

    netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
    
    commit 24225011d81b471acc0e1e315b7d9905459a6304 upstream.
    
    nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
    run concurrently with __nft_flowtable_type_get() within
    nf_tables_newflowtable(). There is no protection when iterating over
    the nf_tables_flowtables list in __nft_flowtable_type_get(). Therefore,
    there is a potential data-race on nf_tables_flowtables list entries.
    
    Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
    in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
    nft_flowtable_type_get() to protect the entire type query process.
    
    Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
    Signed-off-by: Ziyang Xuan <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: nf_tables: flush pending destroy work before exit_net release [+ + +]

Author: Pablo Neira Ayuso <[email protected]>
Date:   Tue Apr 2 18:04:36 2024 +0200

    netfilter: nf_tables: flush pending destroy work before exit_net release
    
    commit 24cea9677025e0de419989ecb692acd4bb34cac2 upstream.
    
    Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy
    work before netlink notifier"), this addresses a race between exit_net
    and the destroy workqueue.
    
    The trace below shows an element to be released via destroy workqueue
    while exit_net path (triggered via module removal) has already released
    the set that is used in such transaction.
    
    [ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
    [ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465
    [ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359
    [ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
    [ 1360.547984] Call Trace:
    [ 1360.547991]  <TASK>
    [ 1360.547998]  dump_stack_lvl+0x53/0x70
    [ 1360.548014]  print_report+0xc4/0x610
    [ 1360.548026]  ? __virt_addr_valid+0xba/0x160
    [ 1360.548040]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
    [ 1360.548054]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
    [ 1360.548176]  kasan_report+0xae/0xe0
    [ 1360.548189]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
    [ 1360.548312]  nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
    [ 1360.548447]  ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables]
    [ 1360.548577]  ? _raw_spin_unlock_irq+0x18/0x30
    [ 1360.548591]  process_one_work+0x2f1/0x670
    [ 1360.548610]  worker_thread+0x4d3/0x760
    [ 1360.548627]  ? __pfx_worker_thread+0x10/0x10
    [ 1360.548640]  kthread+0x16b/0x1b0
    [ 1360.548653]  ? __pfx_kthread+0x10/0x10
    [ 1360.548665]  ret_from_fork+0x2f/0x50
    [ 1360.548679]  ? __pfx_kthread+0x10/0x10
    [ 1360.548690]  ret_from_fork_asm+0x1a/0x30
    [ 1360.548707]  </TASK>
    
    [ 1360.548719] Allocated by task 192061:
    [ 1360.548726]  kasan_save_stack+0x20/0x40
    [ 1360.548739]  kasan_save_track+0x14/0x30
    [ 1360.548750]  __kasan_kmalloc+0x8f/0xa0
    [ 1360.548760]  __kmalloc_node+0x1f1/0x450
    [ 1360.548771]  nf_tables_newset+0x10c7/0x1b50 [nf_tables]
    [ 1360.548883]  nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink]
    [ 1360.548909]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
    [ 1360.548927]  netlink_unicast+0x367/0x4f0
    [ 1360.548935]  netlink_sendmsg+0x34b/0x610
    [ 1360.548944]  ____sys_sendmsg+0x4d4/0x510
    [ 1360.548953]  ___sys_sendmsg+0xc9/0x120
    [ 1360.548961]  __sys_sendmsg+0xbe/0x140
    [ 1360.548971]  do_syscall_64+0x55/0x120
    [ 1360.548982]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
    
    [ 1360.548994] Freed by task 192222:
    [ 1360.548999]  kasan_save_stack+0x20/0x40
    [ 1360.549009]  kasan_save_track+0x14/0x30
    [ 1360.549019]  kasan_save_free_info+0x3b/0x60
    [ 1360.549028]  poison_slab_object+0x100/0x180
    [ 1360.549036]  __kasan_slab_free+0x14/0x30
    [ 1360.549042]  kfree+0xb6/0x260
    [ 1360.549049]  __nft_release_table+0x473/0x6a0 [nf_tables]
    [ 1360.549131]  nf_tables_exit_net+0x170/0x240 [nf_tables]
    [ 1360.549221]  ops_exit_list+0x50/0xa0
    [ 1360.549229]  free_exit_list+0x101/0x140
    [ 1360.549236]  unregister_pernet_operations+0x107/0x160
    [ 1360.549245]  unregister_pernet_subsys+0x1c/0x30
    [ 1360.549254]  nf_tables_module_exit+0x43/0x80 [nf_tables]
    [ 1360.549345]  __do_sys_delete_module+0x253/0x370
    [ 1360.549352]  do_syscall_64+0x55/0x120
    [ 1360.549360]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
    
    (gdb) list *__nft_release_table+0x473
    0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354).
    11349           list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) {
    11350                   list_del(&flowtable->list);
    11351                   nft_use_dec(&table->use);
    11352                   nf_tables_flowtable_destroy(flowtable);
    11353           }
    11354           list_for_each_entry_safe(set, ns, &table->sets, list) {
    11355                   list_del(&set->list);
    11356                   nft_use_dec(&table->use);
    11357                   if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
    11358                           nft_map_deactivate(&ctx, set);
    (gdb)
    
    [ 1360.549372] Last potentially related work creation:
    [ 1360.549376]  kasan_save_stack+0x20/0x40
    [ 1360.549384]  __kasan_record_aux_stack+0x9b/0xb0
    [ 1360.549392]  __queue_work+0x3fb/0x780
    [ 1360.549399]  queue_work_on+0x4f/0x60
    [ 1360.549407]  nft_rhash_remove+0x33b/0x340 [nf_tables]
    [ 1360.549516]  nf_tables_commit+0x1c6a/0x2620 [nf_tables]
    [ 1360.549625]  nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink]
    [ 1360.549647]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
    [ 1360.549671]  netlink_unicast+0x367/0x4f0
    [ 1360.549680]  netlink_sendmsg+0x34b/0x610
    [ 1360.549690]  ____sys_sendmsg+0x4d4/0x510
    [ 1360.549697]  ___sys_sendmsg+0xc9/0x120
    [ 1360.549706]  __sys_sendmsg+0xbe/0x140
    [ 1360.549715]  do_syscall_64+0x55/0x120
    [ 1360.549725]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
    
    Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release")
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout [+ + +]

Author: Pablo Neira Ayuso <[email protected]>
Date:   Mon Mar 4 14:22:12 2024 +0100

    netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
    
    commit 552705a3650bbf46a22b1adedc1b04181490fc36 upstream.
    
    While the rhashtable set gc runs asynchronously, a race allows it to
    collect elements from anonymous sets with timeouts while it is being
    released from the commit path.
    
    Mingi Cho originally reported this issue in a different path in 6.1.x
    with a pipapo set with low timeouts which is not possible upstream since
    7395dfacfff6 ("netfilter: nf_tables: use timestamp to check for set
    element timeout").
    
    Fix this by setting the dead flag for anonymous sets to skip async gc
    in this case.
    
    According to 08e4c8c5919f ("netfilter: nf_tables: mark newset as dead on
    transaction abort"), Florian plans to accelerate abort path by releasing
    objects via workqueue, therefore, this sets the dead flag for abort
    path too.
    
    Cc: [email protected]
    Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
    Reported-by: Mingi Cho <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: nf_tables: reject constant set with timeout [+ + +]

Author: Pablo Neira Ayuso <[email protected]>
Date:   Fri Mar 1 01:04:11 2024 +0100

    netfilter: nf_tables: reject constant set with timeout
    
    commit 5f4fc4bd5cddb4770ab120ce44f02695c4505562 upstream.
    
    This set combination is weird: it allows for elements to be
    added/deleted, but once bound to the rule it cannot be updated anymore.
    Eventually, all elements expire, leading to an empty set which cannot
    be updated anymore. Reject this flags combination.
    
    Cc: [email protected]
    Fixes: 761da2935d6e ("netfilter: nf_tables: add set timeout API support")
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: nf_tables: reject new basechain after table flag update [+ + +]

Author: Pablo Neira Ayuso <[email protected]>
Date:   Mon Apr 1 00:33:02 2024 +0200

    netfilter: nf_tables: reject new basechain after table flag update
    
    commit 994209ddf4f430946f6247616b2e33d179243769 upstream.
    
    When the dormant flag is toggled, hooks are disabled in the commit phase by
    iterating over current chains in table (existing and new).
    
    The following configuration allows for an inconsistent state:
    
      add table x
      add chain x y { type filter hook input priority 0; }
      add table x { flags dormant; }
      add chain x w { type filter hook input priority 1; }
    
    which triggers the following warning when trying to unregister chain w
    which is already unregistered.
    
    [  127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260
    [...]
    [  127.322519] Call Trace:
    [  127.322521]  <TASK>
    [  127.322524]  ? __warn+0x9f/0x1a0
    [  127.322531]  ? __nf_unregister_net_hook+0x21a/0x260
    [  127.322537]  ? report_bug+0x1b1/0x1e0
    [  127.322545]  ? handle_bug+0x3c/0x70
    [  127.322552]  ? exc_invalid_op+0x17/0x40
    [  127.322556]  ? asm_exc_invalid_op+0x1a/0x20
    [  127.322563]  ? kasan_save_free_info+0x3b/0x60
    [  127.322570]  ? __nf_unregister_net_hook+0x6a/0x260
    [  127.322577]  ? __nf_unregister_net_hook+0x21a/0x260
    [  127.322583]  ? __nf_unregister_net_hook+0x6a/0x260
    [  127.322590]  ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables]
    [  127.322655]  nft_table_disable+0x75/0xf0 [nf_tables]
    [  127.322717]  nf_tables_commit+0x2571/0x2620 [nf_tables]
    
    Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: validate user input for expected length [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Thu Apr 4 12:20:51 2024 +0000

    netfilter: validate user input for expected length
    
    commit 0c83842df40f86e529db6842231154772c20edcc upstream.
    
    I got multiple syzbot reports showing old bugs exposed
    by BPF after commit 20f2505fb436 ("bpf: Try to avoid kzalloc
    in cgroup/{s,g}etsockopt")
    
    setsockopt() @optlen argument should be taken into account
    before copying data.
    
     BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
     BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
     BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
     BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
    Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
    
    CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
      print_address_description mm/kasan/report.c:377 [inline]
      print_report+0x169/0x550 mm/kasan/report.c:488
      kasan_report+0x143/0x180 mm/kasan/report.c:601
      kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
      __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
      copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
      copy_from_sockptr include/linux/sockptr.h:55 [inline]
      do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
      do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
      nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
      do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
      __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
      __do_sys_setsockopt net/socket.c:2343 [inline]
      __se_sys_setsockopt net/socket.c:2340 [inline]
      __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x72/0x7a
    RIP: 0033:0x7fd22067dde9
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
    RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
    RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
    R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
     </TASK>
    
    Allocated by task 7238:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
      __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
      kasan_kmalloc include/linux/kasan.h:211 [inline]
      __do_kmalloc_node mm/slub.c:4069 [inline]
      __kmalloc_noprof+0x200/0x410 mm/slub.c:4082
      kmalloc_noprof include/linux/slab.h:664 [inline]
      __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
      do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
      __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
      __do_sys_setsockopt net/socket.c:2343 [inline]
      __se_sys_setsockopt net/socket.c:2340 [inline]
      __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x72/0x7a
    
    The buggy address belongs to the object at ffff88802cd73da0
     which belongs to the cache kmalloc-8 of size 8
    The buggy address is located 0 bytes inside of
     allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
    
    The buggy address belongs to the physical page:
    page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
    flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
    page_type: 0xffffefff(slab)
    raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
    raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000
    page dumped because: kasan: bad access detected
    page_owner tracks the page as allocated
    page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828
      set_page_owner include/linux/page_owner.h:32 [inline]
      post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490
      prep_new_page mm/page_alloc.c:1498 [inline]
      get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454
      __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712
      __alloc_pages_node_noprof include/linux/gfp.h:244 [inline]
      alloc_pages_node_noprof include/linux/gfp.h:271 [inline]
      alloc_slab_page+0x5f/0x120 mm/slub.c:2249
      allocate_slab+0x5a/0x2e0 mm/slub.c:2412
      new_slab mm/slub.c:2465 [inline]
      ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615
      __slab_alloc+0x58/0xa0 mm/slub.c:3705
      __slab_alloc_node mm/slub.c:3758 [inline]
      slab_alloc_node mm/slub.c:3936 [inline]
      __do_kmalloc_node mm/slub.c:4068 [inline]
      kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089
      kstrdup+0x3a/0x80 mm/util.c:62
      device_rename+0xb5/0x1b0 drivers/base/core.c:4558
      dev_change_name+0x275/0x860 net/core/dev.c:1232
      do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864
      __rtnl_newlink net/core/rtnetlink.c:3680 [inline]
      rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727
      rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594
      netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
      netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
      netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
    page last free pid 5146 tgid 5146 stack trace:
      reset_page_owner include/linux/page_owner.h:25 [inline]
      free_pages_prepare mm/page_alloc.c:1110 [inline]
      free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617
      discard_slab mm/slub.c:2511 [inline]
      __put_partials+0xeb/0x130 mm/slub.c:2980
      put_cpu_partial+0x17c/0x250 mm/slub.c:3055
      __slab_free+0x2ea/0x3d0 mm/slub.c:4254
      qlink_free mm/kasan/quarantine.c:163 [inline]
      qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179
      kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
      __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
      kasan_slab_alloc include/linux/kasan.h:201 [inline]
      slab_post_alloc_hook mm/slub.c:3888 [inline]
      slab_alloc_node mm/slub.c:3948 [inline]
      __do_kmalloc_node mm/slub.c:4068 [inline]
      __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076
      kmalloc_node_noprof include/linux/slab.h:681 [inline]
      kvmalloc_node_noprof+0x72/0x190 mm/util.c:634
      bucket_table_alloc lib/rhashtable.c:186 [inline]
      rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367
      rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427
      process_one_work kernel/workqueue.c:3218 [inline]
      process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
      worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
      kthread+0x2f0/0x390 kernel/kthread.c:388
      ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
      ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
    
    Memory state around the buggy address:
     ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc
     ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
    >ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
                                   ^
     ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc
     ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Reviewed-by: Pablo Neira Ayuso <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
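
The rule stated in this commit (take @optlen into account before copying) can be sketched in userspace C as follows. The struct ex_request layout and the helper ex_copy_request() are hypothetical stand-ins; the real fix operates on kernel sockptr helpers that are not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size request header, standing in for a real one. */
struct ex_request {
	char name[32];
	unsigned int num_entries;
	unsigned int size;
};

/* Copy a request from a caller buffer of optlen bytes, or fail with -EINVAL. */
static int ex_copy_request(struct ex_request *dst, const void *optval,
			   size_t optlen)
{
	/* Reject short buffers before reading from them. */
	if (optlen < sizeof(*dst))
		return -EINVAL;

	memcpy(dst, optval, sizeof(*dst));
	return 0;
}
```

In the KASAN report above, the caller supplied a 1-byte buffer while 96 bytes were read; a length check of this shape would reject that request before any copy takes place.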

nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet [+ + +]

Author: Ryosuke Yasuoka <[email protected]>
Date:   Wed Mar 20 09:54:10 2024 +0900

    nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
    
    [ Upstream commit d24b03535e5eb82e025219c2f632b485409c898f ]
    
    syzbot reported the following uninit-value access issue [1][2]:
    
    nci_rx_work() parses and processes received packet. When the payload
    length is zero, each message type handler reads uninitialized payload
    and KMSAN detects this issue. The receipt of a packet with a zero-size
    payload is considered unexpected, and therefore, such packets should be
    silently discarded.
    
    This patch resolves this issue by checking the payload size before
    calling each message type handler.
    
    Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
    Reported-and-tested-by: [email protected]
    Reported-and-tested-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=7ea9413ea6749baf5574 [1]
    Closes: https://syzkaller.appspot.com/bug?extid=29b5ca705d2e0f4a44d2 [2]
    Signed-off-by: Ryosuke Yasuoka <[email protected]>
    Reviewed-by: Jeremy Cline <[email protected]>
    Reviewed-by: Krzysztof Kozlowski <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
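
A rough userspace sketch of the check described above: look at the payload length in the packet header and skip the handlers when it is zero. The 3-byte header size, the position of the length byte, and the helper name ex_nci_should_dispatch() are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NCI_HDR_SIZE 3	/* assumed header size in bytes */

/* Returns true if the packet carries a payload and may go to a handler. */
static bool ex_nci_should_dispatch(const uint8_t *pkt, size_t len)
{
	/* Too short to even contain a header: drop. */
	if (len < EX_NCI_HDR_SIZE)
		return false;

	/* The third header byte is taken to be the payload length. */
	return pkt[2] != 0;
}
```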

nfs: block notification on fs with its own ->lock [+ + +]

Author: J. Bruce Fields <[email protected]>
Date:   Thu Dec 16 12:20:13 2021 -0500

    nfs: block notification on fs with its own ->lock
    
    [ Upstream commit 40595cdc93edf4110c0f0c0b06f8d82008f23929 ]
    
    NFSv4.1 supports an optional lock notification feature which notifies
    the client when a lock comes available.  (Normally NFSv4 clients just
    poll for locks if necessary.)  To make that work, we need to request a
    blocking lock from the filesystem.
    
    We turned that off for NFS in commit f657f8eef3ff ("nfs: don't atempt
    blocking locks on nfs reexports") [sic] because it actually blocks the
    nfsd thread while waiting for the lock.
    
    Thanks to Vasily Averin for pointing out that NFS isn't the only
    filesystem with that problem.
    
    Any filesystem that leaves ->lock NULL will use posix_lock_file(), which
    does the right thing.  Simplest is just to assume that any filesystem
    that defines its own ->lock is not safe to request a blocking lock from.
    
    So, this patch mostly reverts commit f657f8eef3ff ("nfs: don't atempt
    blocking locks on nfs reexports") [sic] and commit b840be2f00c0 ("lockd:
    don't attempt blocking locks on nfs reexports"), and instead uses a
    check of ->lock (Vasily's suggestion) to decide whether to support
    blocking lock notifications on a given filesystem.  Also add a little
    documentation.
    
    Perhaps someday we could add back an export flag to allow
    filesystems with "good" ->lock methods to support blocking lock
    notifications.
    
    Reported-by: Vasily Averin <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    [ cel: Description rewritten to address checkpatch nits ]
    [ cel: Fixed warning when SUNRPC debugging is disabled ]
    [ cel: Fixed NULL check ]
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Vasily Averin <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
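
The decision rule described in this commit (a filesystem that defines its own ->lock is assumed unsafe for blocking lock requests) can be sketched as below. The struct ex_file_operations type and both function names are hypothetical; they only mirror the shape of the check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down operations table with only a lock method. */
struct ex_file_operations {
	int (*lock)(int cmd);
};

/* Example of a filesystem-specific lock method (illustrative only). */
static int ex_custom_lock(int cmd)
{
	(void)cmd;
	return 0;
}

/* Blocking lock notifications are offered only when ->lock is left NULL. */
static bool ex_supports_blocking_lock(const struct ex_file_operations *fop)
{
	return fop->lock == NULL;
}
```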

nfs: fix UAF in direct writes [+ + +]

Author: Josef Bacik <[email protected]>
Date:   Fri Mar 1 11:49:57 2024 -0500

    nfs: fix UAF in direct writes
    
    [ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]
    
    In production we have been hitting the following warning consistently:
    
    ------------[ cut here ]------------
    refcount_t: underflow; use-after-free.
    WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0
    Workqueue: nfsiod nfs_direct_write_schedule_work [nfs]
    RIP: 0010:refcount_warn_saturate+0x9c/0xe0
    PKRU: 55555554
    Call Trace:
     <TASK>
     ? __warn+0x9f/0x130
     ? refcount_warn_saturate+0x9c/0xe0
     ? report_bug+0xcc/0x150
     ? handle_bug+0x3d/0x70
     ? exc_invalid_op+0x16/0x40
     ? asm_exc_invalid_op+0x16/0x20
     ? refcount_warn_saturate+0x9c/0xe0
     nfs_direct_write_schedule_work+0x237/0x250 [nfs]
     process_one_work+0x12f/0x4a0
     worker_thread+0x14e/0x3b0
     ? ZSTD_getCParams_internal+0x220/0x220
     kthread+0xdc/0x120
     ? __btf_name_valid+0xa0/0xa0
     ret_from_fork+0x1f/0x30
    
    This is because we're completing the nfs_direct_request twice in a row.
    
    The source of this is when we have our commit requests to submit, we
    process them and send them off, and then in the completion path for the
    commit requests we have
    
    if (nfs_commit_end(cinfo.mds))
            nfs_direct_write_complete(dreq);
    
    However since we're submitting asynchronous requests we sometimes have
    one that completes before we submit the next one, so we end up calling
    complete on the nfs_direct_request twice.
    
    The only other place we use nfs_generic_commit_list() is in
    __nfs_commit_inode, which wraps this call in a
    
    nfs_commit_begin();
    nfs_commit_end();
    
    Which is a common pattern for this style of completion handling, one
    that is also repeated in the direct code with get_dreq()/put_dreq()
    calls around where we process events as well as in the completion paths.
    
    Fix this by using the same pattern for the commit requests.
    
    Before, with my 200-node rocksdb stress test running, this warning would
    pop up roughly every 10 minutes.  With my patch, the stress test has been
    running for several hours without the warning appearing.
    
    Signed-off-by: Josef Bacik <[email protected]>
    Cc: [email protected]
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
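
The begin/end pattern this commit relies on can be illustrated with a single-threaded simulation. All names here (struct ex_tracker, ex_begin(), ex_end(), ex_submit_all()) are hypothetical, and the plain int counter is a simplification; it shows only why holding one extra reference during submission prevents a second completion.

```c
#include <stdbool.h>

/* Hypothetical tracker: outstanding requests and how often we "completed". */
struct ex_tracker {
	int outstanding;
	int completions;
};

static void ex_begin(struct ex_tracker *t)
{
	t->outstanding++;
}

/* Dropping the last outstanding reference triggers completion. */
static void ex_end(struct ex_tracker *t)
{
	if (--t->outstanding == 0)
		t->completions++;
}

/*
 * Submit nreq requests, each finishing before the next one is submitted.
 * With guard set, the submitter holds its own reference for the duration.
 */
static int ex_submit_all(struct ex_tracker *t, int nreq, bool guard)
{
	int i;

	if (guard)
		ex_begin(t);
	for (i = 0; i < nreq; i++) {
		ex_begin(t);
		ex_end(t);	/* request completes immediately */
	}
	if (guard)
		ex_end(t);

	return t->completions;
}
```

Without the guard, two fast requests produce two completions, which corresponds to the double completion described above; with the guard, completion happens once, after the submitter drops its reference.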

NFS: Move generic FS show macros to global header [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 22 16:16:56 2021 -0400

    NFS: Move generic FS show macros to global header
    
    [ Upstream commit 9d2d48bbbdabf7b2f029369c4f926d133c1d47ad ]
    
    Refactor: Surface useful show_ macros for use by other trace
    subsystems.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFS: Move NFS protocol display macros to global header [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 22 16:17:03 2021 -0400

    NFS: Move NFS protocol display macros to global header
    
    [ Upstream commit 8791545eda52e8f3bc48e3cd902e38bf4ba4c9de ]
    
    Refactor: surface useful show_ macros so they can be shared between
    the client and server trace code.
    
    Additional clean up:
    - Housekeeping: ensure the correct #include files are pulled in
      and add proper TRACE_DEFINE_ENUM where they are missing
    - Use a consistent naming scheme for the helpers
    - Store values to be displayed symbolically as unsigned long, as
      that is the type that the __print_yada() functions take
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFS: Remove unnecessary TRACE_DEFINE_ENUM()s [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Oct 4 10:09:57 2021 -0400

    NFS: Remove unnecessary TRACE_DEFINE_ENUM()s
    
    [ Upstream commit 8e09650f5ec68858f4b8b67cdef9e2ece9b208f3 ]
    
    Clean up: TRACE_DEFINE_ENUM is unnecessary because the target
    symbols are all C macros, not enums.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFS: restore module put when manager exits. [+ + +]

Author: NeilBrown <[email protected]>
Date:   Thu Jun 23 14:47:34 2022 +1000

    NFS: restore module put when manager exits.
    
    [ Upstream commit 080abad71e99d2becf38c978572982130b927a28 ]
    
    Commit f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") removed
    calls to module_put_and_kthread_exit() from threads that acted as SUNRPC
    servers and had a related svc_serv_ops structure.  This was correct.
    
    It ALSO removed the module_put_and_kthread_exit() call from
    nfs4_run_state_manager() which is NOT a SUNRPC service.
    
    Consequently every time the NFSv4 state manager runs the module count
    increments and won't be decremented.  So the nfsv4 module cannot be
    unloaded.
    
    So restore the module_put_and_kthread_exit() call.
    
    Fixes: f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module")
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Anna Schumaker <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFS: switch the callback service back to non-pooled. [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    NFS: switch the callback service back to non-pooled.
    
    [ Upstream commit 23a1a573c61ccb5e7829c1f5472d3e025293a031 ]
    
    Now that thread management is consistent there is no need for
    nfs-callback to use svc_create_pooled() as introduced in Commit
    df807fffaabd ("NFSv4.x/callback: Create the callback service through
    svc_create_pooled").  So switch back to svc_create().
    
    If service pools were configured, but the number of threads was left at
    '1', nfs callback may not work reliably when svc_create_pooled() is used.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD enforce filehandle check for source file in COPY [+ + +]

Author: Olga Kornievskaia <[email protected]>
Date:   Fri Aug 19 15:16:36 2022 -0400

    NFSD enforce filehandle check for source file in COPY
    
    [ Upstream commit 754035ff79a14886e68c0c9f6fa80adb21f12b53 ]
    
    If the passed in filehandle for the source file in the COPY operation
    is not a regular file, the server MUST return NFS4ERR_WRONG_TYPE.
    
    Signed-off-by: Olga Kornievskaia <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    [ cel: adjusted to apply to v5.15.y ]
    Signed-off-by: Chuck Lever <[email protected]>

nfsd4: add refcount for nfsd4_blocked_lock [+ + +]

Author: Vasily Averin <[email protected]>
Date:   Fri Dec 17 09:49:39 2021 +0300

    nfsd4: add refcount for nfsd4_blocked_lock
    
    [ Upstream commit 47446d74f1707049067fee038507cdffda805631 ]
    
    nbl allocated in nfsd4_lock() can be released in several ways:
    directly in nfsd4_lock(), via nfs4_laundromat(), via another nfs
    command RELEASE_LOCKOWNER or via nfsd4_callback.
    This structure should be refcounted to be used and released correctly
    in all these cases.
    
    Refcount is initialized to 1 during allocation and is incremented
    when nbl is added into nbl_list/nbl_lru lists.
    
    Usually nbl is linked into both lists together, so only one refcount
    is used for both lists.
    
    However, nfsd4_lock() should keep in mind that nbl can be present
    in only one of the lists. This can happen if nbl was already handled
    by nfs4_laundromat/nfsd4_callback/etc.
    
    Refcount is decremented if vfs_lock_file() returns FILE_LOCK_DEFERRED,
    because nbl can be handled already by nfs4_laundromat/nfsd4_callback/etc.
    
    Refcount is not changed in find_blocked_lock() because it reuses the
    count released when nbl is removed from the lists.
    
    Signed-off-by: Vasily Averin <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd4: remove obselete comment [+ + +]

Author: J. Bruce Fields <[email protected]>
Date:   Tue Oct 26 12:56:55 2021 -0400

    nfsd4: remove obselete comment
    
    [ Upstream commit 80479eb862102f9513e93fcf726c78cc0be2e3b2 ]
    
    Mandatory locking has been removed.  And the rest of this comment is
    redundant with the code.
    
    Reported-by: Jeff Layton <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add a mechanism to wait for a DELEGRETURN [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 8 18:14:00 2022 -0400

    NFSD: Add a mechanism to wait for a DELEGRETURN
    
    [ Upstream commit c035362eb935fe9381d9d1cc453bc2a37460e24c ]
    
    Subsequent patches will use this mechanism to wake up an operation
    that is waiting for a client to return a delegation.
    
    The new tracepoint records whether the wait timed out or was
    properly awoken by the expected DELEGRETURN:
    
                nfsd-1155  [002] 83799.493199: nfsd_delegret_wakeup: xid=0x14b7d6ef fh_hash=0xf6826792 (timed out)
    
    Suggested-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add a nfsd4_file_hash_remove() helper [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:47:34 2022 -0400

    NFSD: Add a nfsd4_file_hash_remove() helper
    
    [ Upstream commit 3341678f2fd6106055cead09e513fad6950a0d19 ]
    
    Refactor to relocate hash deletion operation to a helper function
    that is close to most other nfs4_file data structure operations.
    
    The "noinline" annotation will become useful in a moment when the
    hlist_del_rcu() is replaced with a more complex rhash remove
    operation. It also guarantees that hash remove operations can be
    traced with "-p function -l remove_nfs4_file_locked".
    
    This also simplifies the organization of forward declarations: the
    to-be-added rhashtable and its param structure will be defined
    /after/ put_nfs4_file().
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() [+ + +]

Author: Trond Myklebust <[email protected]>
Date:   Sat Dec 18 20:38:00 2021 -0500

    nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
    
    [ Upstream commit a2f4c3fa4db94ba44d32a72201927cfd132a8e82 ]
    
    Since a clone error commit can cause the boot verifier to change,
    we should trace those errors.
    
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    [ cel: Addressed a checkpatch.pl splat in fs/nfsd/vfs.h ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add an nfsd4_encode_nfstime4() helper [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Jun 12 10:13:39 2023 -0400

    NFSD: Add an nfsd4_encode_nfstime4() helper
    
    [ Upstream commit 262176798b18b12fd8ab84c94cfece0a6a652476 ]
    
    Clean up: de-duplicate some common code.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Acked-by: Tom Talpey <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add an nfsd4_read::rd_eof field [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 22 16:08:57 2022 -0400

    NFSD: Add an nfsd4_read::rd_eof field
    
    [ Upstream commit 24c7fb85498eda1d4c6b42cc4886328429814990 ]
    
    Refactor: Make the EOF result available in the entire NFSv4 READ
    path.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add an nfsd_file_fsync tracepoint [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Nov 3 16:22:48 2022 -0400

    NFSD: Add an nfsd_file_fsync tracepoint
    
    [ Upstream commit d7064eaf688cfe454c50db9f59298463d80d403c ]
    
    Add a tracepoint to capture the number of filecache-triggered fsync
    calls and which files needed it. Also, record when an fsync triggers
    a write verifier reset.
    
    Examples:
    
    <...>-97    [007]   262.505611: nfsd_file_free:       inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400
    <...>-97    [007]   262.505612: nfsd_file_fsync:      inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 ret=0
    <...>-97    [007]   262.505623: nfsd_file_free:       inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00
    <...>-97    [007]   262.505624: nfsd_file_fsync:      inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 ret=0
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:46:51 2022 -0400

    NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
    
    [ Upstream commit 4d1ea8455716ca070e3cd85767e6f6a562a58b1b ]
    
    NFSv4 operations manage the lifetime of nfsd_file items they use by
    means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be
    garbage collected.
    
    Introduce a mechanism to enable garbage collection for nfsd_file
    items used only by NFSv2/3 callers.
    
    Note that the change in nfsd_file_put() ensures that both CLOSE and
    DELEGRETURN will actually close out and free an nfsd_file on last
    reference of a non-garbage-collected file.
    
    Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394
    Suggested-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Tested-by: Jeff Layton <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: add CB_RECALL_ANY tracepoints [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Wed Nov 16 19:44:48 2022 -0800

    NFSD: add CB_RECALL_ANY tracepoints
    
    [ Upstream commit 638593be55c0b37a1930038460a9918215d5c24b ]
    
    Add tracepoints to trace start and end of CB_RECALL_ANY operation.
    
    Signed-off-by: Dai Ngo <[email protected]>
    [ cel: added show_rca_mask() macro ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: add courteous server support for thread with only delegation [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Mon May 2 14:19:21 2022 -0700

    NFSD: add courteous server support for thread with only delegation
    
    [ Upstream commit 66af25799940b26efd41ea6e648f75c41a48a2c2 ]
    
    This patch provides courteous server support for delegation only.
    Only an expired client with a delegation but no conflict and no open
    or lock state is allowed to be in COURTESY state.
    
    A delegation conflict with a COURTESY/EXPIRABLE client is resolved by
    setting the client to EXPIRABLE, queuing work for the laundromat, and
    returning a delay to the caller. The conflict is resolved when the
    laundromat runs and expires the EXPIRABLE client while the NFS client
    retries the OPEN request. A local thread request that hits a conflict
    retries in _break_lease.
    
    A client in COURTESY or EXPIRABLE state is allowed to reconnect and
    continue to have access to its state. Access to the nfs4_client by
    the reconnecting thread and the laundromat is serialized via the
    client_lock.
    
    Reviewed-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: add delegation reaper to react to low memory condition [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Wed Nov 16 19:44:47 2022 -0800

    NFSD: add delegation reaper to react to low memory condition
    
    [ Upstream commit 44df6f439a1790a5f602e3842879efa88f346672 ]
    
    The delegation reaper is called from the nfsd memory shrinker's
    'count' callback. It scans the client list and sends the
    courtesy CB_RECALL_ANY to the clients that hold delegations.
    
    To avoid flooding the clients with CB_RECALL_ANY requests, the
    delegation reaper sends only one CB_RECALL_ANY request to each
    client per 5 seconds.
    
    Signed-off-by: Dai Ngo <[email protected]>
    [ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ]
    Signed-off-by: Chuck Lever <[email protected]>
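
The per-client rate limit described above (at most one CB_RECALL_ANY per client every 5 seconds) can be sketched as follows. The struct ex_client fields and the helper ex_may_send_recall() are hypothetical, and time is passed in as plain seconds to keep the example self-contained.

```c
#include <stdbool.h>

#define EX_RECALL_INTERVAL_SEC 5

/* Hypothetical per-client state for the rate limit. */
struct ex_client {
	bool recalled_before;
	long last_recall_sec;
};

/* Returns true, and records the time, if a recall may be sent now. */
static bool ex_may_send_recall(struct ex_client *clp, long now_sec)
{
	if (clp->recalled_before &&
	    now_sec - clp->last_recall_sec < EX_RECALL_INTERVAL_SEC)
		return false;

	clp->recalled_before = true;
	clp->last_recall_sec = now_sec;
	return true;
}
```

A caller scanning the client list would invoke the helper once per client and send the request only when it returns true.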

NFSD: Add documenting comment for nfsd4_release_lockowner() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sun May 22 12:34:38 2022 -0400

    NFSD: Add documenting comment for nfsd4_release_lockowner()
    
    [ Upstream commit 043862b09cc00273e35e6c3a6389957953a34207 ]
    
    And return explicit nfserr values that match what is documented in the
    new comment / API contract.
    
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Add errno mapping for EREMOTEIO [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Sat Dec 18 20:37:55 2021 -0500

    nfsd: Add errno mapping for EREMOTEIO
    
    [ Upstream commit a2694e51f60c5a18c7e43d1a9feaa46d7f153e65 ]
    
    The NFS client can occasionally return EREMOTEIO when signalling issues
    with the server.  ...map to NFSERR_IO.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Lance Shelton <[email protected]>
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
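
The mapping added by this commit can be pictured as one more case in an errno translation function. The sketch below is a userspace illustration with a hypothetical helper ex_map_errno() and locally defined status values; the fallback to an I/O error for unknown codes is an assumption. EREMOTEIO is a Linux-specific errno, so the example assumes a Linux <errno.h>.

```c
#include <errno.h>

/* Status values defined locally for illustration. */
#define EX_NFS_OK	0
#define EX_NFSERR_NOENT	2
#define EX_NFSERR_IO	5
#define EX_NFSERR_ACCES	13

/* Translate a negative errno into an NFS status code. */
static int ex_map_errno(int err)
{
	switch (err) {
	case 0:
		return EX_NFS_OK;
	case -ENOENT:
		return EX_NFSERR_NOENT;
	case -EACCES:
		return EX_NFSERR_ACCES;
	case -EIO:
	case -EREMOTEIO:	/* the mapping this commit describes */
		return EX_NFSERR_IO;
	default:
		return EX_NFSERR_IO;	/* assumed fallback for unknown errors */
	}
}
```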

NFSD: Add nfsd4_send_cb_offload() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:41:12 2022 -0400

    NFSD: Add nfsd4_send_cb_offload()
    
    [ Upstream commit e72f9bc006c08841c46d27747a4debc747a8fe13 ]
    
    Refactor for legibility.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add nfsd_file_lru_dispose_list() helper [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:24:18 2022 -0400

    NFSD: Add nfsd_file_lru_dispose_list() helper
    
    [ Upstream commit 0bac5a264d9a923f5b01f3521e1519a8d0358342 ]
    
    Refactor the invariant part of nfsd_file_lru_walk_list() into a
    separate helper function.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: add posix ACLs to struct nfsd_attrs [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: add posix ACLs to struct nfsd_attrs
    
    [ Upstream commit c0cbe70742f4a70893cd6e5f6b10b6e89b6db95b ]
    
    pacl and dpacl pointers are added to struct nfsd_attrs, which requires
    that we have an nfsd_attrs_free() function to free them.
    Those nfsv4 functions that can set ACLs now set up these pointers
    based on the passed in NFSv4 ACL.
    
    nfsd_setattr() sets the acls as appropriate.
    
    Errors are handled as with security labels.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: add security label to struct nfsd_attrs [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: add security label to struct nfsd_attrs
    
    [ Upstream commit d6a97d3f589a3a46a16183e03f3774daee251317 ]
    
    nfsd_setattr() now sets a security label if provided, and nfsv4 provides
    it in the 'open' and 'create' paths and the 'setattr' path.
    If setting the label failed (including because the kernel doesn't
    support labels), an error field in 'struct nfsd_attrs' is set, and the
    caller can respond.  The open/create callers clear
    FATTR4_WORD2_SECURITY_LABEL in the returned attr set in this case.
    The setattr caller returns the error.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: add shrinker to reap courtesy clients on low memory condition [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Wed Sep 14 08:54:26 2022 -0700

    NFSD: add shrinker to reap courtesy clients on low memory condition
    
    [ Upstream commit 7746b32f467b3813fb61faaab3258de35806a7ac ]
    
    Add courtesy_client_reaper to react to low memory condition triggered
    by the system memory shrinker.
    
    The delayed_work for the courtesy_client_reaper is scheduled on
    the shrinker's count callback using the laundry_wq.
    
    The shrinker's scan callback is not used for expiring the courtesy
    clients due to potential deadlocks.
    
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: add some comments to nfsd_file_do_acquire [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Thu Jan 5 07:15:12 2023 -0500

    nfsd: add some comments to nfsd_file_do_acquire
    
    [ Upstream commit b680cb9b737331aad271feebbedafb865504e234 ]
    
    David Howells mentioned that he found this bit of code confusing, so
    sprinkle in some comments to clarify.
    
    Reported-by: David Howells <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: add support for lock conflict to courteous server [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Mon May 2 14:19:26 2022 -0700

    NFSD: add support for lock conflict to courteous server
    
    [ Upstream commit 27431affb0dbc259ac6ffe6071243a576c8f38f1 ]
    
    This patch allows an expired client with lock state to be in the
    COURTESY state. A lock conflict with a COURTESY client is resolved by
    the fs/lock code using the lm_lock_expirable and lm_expire_lock
    callbacks in struct lock_manager_operations.
    
    If the conflicting client is in the COURTESY state, set it to
    EXPIRABLE and schedule the laundromat to run immediately to expire the
    client. The lm_expire_lock callback waits for the laundromat to flush
    its work queue before returning to the caller.
    
    Reviewed-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
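
    The conflict handling described above can be modelled as a small
    state transition. This is a simplified sketch, not the kernel
    implementation: the ACTIVE state name, the Client class and the
    synchronous run_laundromat() helper are assumptions, and the real
    laundromat runs from a work queue rather than inline.

```python
from enum import Enum


class ClientState(Enum):
    ACTIVE = "active"        # assumed name for a client with a valid lease
    COURTESY = "courtesy"
    EXPIRABLE = "expirable"


class Client:
    def __init__(self, name, state):
        self.name = name
        self.state = state


def run_laundromat(clients):
    """Drop every client marked EXPIRABLE; return the surviving clients."""
    return [c for c in clients if c.state is not ClientState.EXPIRABLE]


def resolve_lock_conflict(holder, clients):
    """Return (resolved, clients) for a conflicting lock held by `holder`."""
    if holder.state is not ClientState.COURTESY:
        return False, clients          # a live client keeps its lock
    holder.state = ClientState.EXPIRABLE
    return True, run_laundromat(clients)
```

    Only a COURTESY holder is expired; a conflict with any other client
    is left for the normal lock machinery to report.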

NFSD: add support for sending CB_RECALL_ANY [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Wed Nov 16 19:44:46 2022 -0800

    NFSD: add support for sending CB_RECALL_ANY
    
    [ Upstream commit 3959066b697b5dfbb7141124ae9665337d4bc638 ]
    
    Add XDR encode and decode function for CB_RECALL_ANY.
    
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: add support for share reservation conflict to courteous server [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Mon May 2 14:19:22 2022 -0700

    NFSD: add support for share reservation conflict to courteous server
    
    [ Upstream commit 3d69427151806656abf129342028f3f4e5e1fee0 ]
    
    This patch allows an expired client with open state to be in the
    COURTESY state. A share/access conflict with a COURTESY client is
    resolved by setting the COURTESY client to the EXPIRABLE state,
    scheduling the laundromat to run, and returning nfserr_jukebox to the
    requesting client.
    
    Reviewed-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Add support for the birth time attribute [+ + +]

Author: Ondrej Valousek <[email protected]>
Date:   Tue Jan 11 13:08:42 2022 +0100

    nfsd: Add support for the birth time attribute
    
    [ Upstream commit e377a3e698fb56cb63f6bddbebe7da76dc37e316 ]
    
    For filesystems that support the "btime" timestamp (most modern
    filesystems do), we share it via the kernel nfsd. Btime support for
    the NFS client has already been added by Trond recently.
    
    Suggested-by: Bruce Fields <[email protected]>
    Signed-off-by: Ondrej Valousek <[email protected]>
    [ cel: addressed some whitespace/checkpatch nits ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Add tracepoints to report NFSv4 callback completions [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 8 18:13:54 2022 -0400

    NFSD: Add tracepoints to report NFSv4 callback completions
    
    [ Upstream commit 1035d65446a018ca2dd179e29a2fcd6d29057781 ]
    
    Wireshark has always been lousy about dissecting NFSv4 callbacks,
    especially NFSv4.0 backchannel requests. Add tracepoints so we
    can surgically capture these events in the trace log.
    
    Tracepoints are time-stamped and ordered so that we can now observe
    the timing relationship between a CB_RECALL Reply and the client's
    DELEGRETURN Call. Example:
    
                nfsd-1153  [002]   211.986391: nfsd_cb_recall:       addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001
    
                nfsd-1153  [002]   212.095634: nfsd_compound:        xid=0x0000002c opcnt=2
                nfsd-1153  [002]   212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0
                nfsd-1153  [002]   212.095658: nfsd_file_put:        hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00
                nfsd-1153  [002]   212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
       kworker/u25:8-148   [002]   212.096713: nfsd_cb_recall_done:  client 62ea82e4:fee7492a stateid 00000003:00000001 status=0
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
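
    As an illustration of the timing relationship mentioned above, the
    following sketch parses three of the trace lines from the example and
    computes the intervals between the CB_RECALL, the DELEGRETURN and the
    callback completion. The regular expression and helper names are
    assumptions made for this example; they are not part of any tracing
    tool.

```python
import re

# Matches lines of the form "task-pid [cpu] timestamp: event: detail".
TRACE_LINE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):"
    r"\s+(?P<event>\w+):\s+(?P<detail>.*)$"
)

SAMPLE = """\
            nfsd-1153  [002]   211.986391: nfsd_cb_recall:       addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001
            nfsd-1153  [002]   212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
   kworker/u25:8-148   [002]   212.096713: nfsd_cb_recall_done:  client 62ea82e4:fee7492a stateid 00000003:00000001 status=0
"""


def parse_trace(text):
    """Return a list of (timestamp, event, detail) tuples."""
    events = []
    for line in text.splitlines():
        m = TRACE_LINE.match(line)
        if m:
            events.append((float(m["ts"]), m["event"], m["detail"]))
    return events


def first_timestamp(events, event, needle=""):
    """Timestamp of the first `event` whose detail contains `needle`."""
    for ts, name, detail in events:
        if name == event and needle in detail:
            return ts
    return None


events = parse_trace(SAMPLE)
recall = first_timestamp(events, "nfsd_cb_recall")
delegreturn = first_timestamp(events, "nfsd_compound_status", "OP_DELEGRETURN")
done = first_timestamp(events, "nfsd_cb_recall_done")
recall_to_delegreturn = delegreturn - recall   # seconds
recall_to_done = done - recall                 # seconds
```

    In this sample the client's DELEGRETURN is processed roughly a
    millisecond before the server records the CB_RECALL completion.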

nfsd: allow disabling NFSv2 at compile time [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Tue Oct 18 07:47:56 2022 -0400

    nfsd: allow disabling NFSv2 at compile time
    
    [ Upstream commit 2f3a4b2ac2f28b9be78ad21f401f31e263845214 ]
    
    rpc.nfsd stopped supporting NFSv2 a year ago. Take the next logical
    step toward deprecating it and allow NFSv2 support to be compiled out.
    
    Add a new CONFIG_NFSD_V2 option that can be turned off and rework the
    CONFIG_NFSD_V?_ACL option dependencies. Add a description that
    discourages enabling it.
    
    Also, change the description of CONFIG_NFSD to state that the always-on
    version is now 3 instead of 2.
    
    Finally, add an #ifdef around "case 2:" in __write_versions. When NFSv2
    is disabled at compile time, this should make the kernel ignore attempts
    to disable it at runtime, but still error out when trying to enable it.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Reviewed-by: Tom Talpey <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
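
    The runtime behaviour described in the last paragraph can be modelled
    as follows. This is a hedged userspace sketch, not the kernel's
    __write_versions(): the function name, the v2_built flag (standing in
    for CONFIG_NFSD_V2) and the handling of only major versions are
    assumptions made for illustration.

```python
import errno


def write_version_token(enabled, token, v2_built=False):
    """Apply one '+N' or '-N' token to the set of enabled major versions.

    Returns 0 on success or a negative errno, mirroring kernel style.
    """
    sign, num = token[0], int(token[1:])
    if sign not in ("+", "-"):
        return -errno.EINVAL
    known = {3, 4} | ({2} if v2_built else set())
    if num in known:
        if sign == "+":
            enabled.add(num)
        else:
            enabled.discard(num)
        return 0
    # Version not built in: a request to disable it is silently ignored,
    # while a request to enable it is an error.
    return -errno.EINVAL if sign == "+" else 0
```

    With v2_built left at False, "-2" succeeds without changing anything
    and "+2" fails with -EINVAL, which matches the behaviour the commit
    message describes.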

nfsd: allow nfsd_file_get to sanely handle a NULL pointer [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Jan 6 10:33:47 2023 -0500

    nfsd: allow nfsd_file_get to sanely handle a NULL pointer
    
    [ Upstream commit 70f62231cdfd52357836733dd31db787e0412ab2 ]
    
    ...and remove some now-useless NULL pointer checks in its callers.
    
    Suggested-by: NeilBrown <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: allow reaping files still under writeback [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Feb 15 06:53:54 2023 -0500

    nfsd: allow reaping files still under writeback
    
    [ Upstream commit dcb779fcd4ed5984ad15991d574943d12a8693d1 ]
    
    On most filesystems, there is no reason to delay reaping an nfsd_file
    just because its underlying inode is still under writeback. nfsd just
    relies on client activity or the local flusher threads to do writeback.
    
    The main exception is NFS, which flushes all of its dirty data on last
    close. Add a new EXPORT_OP_FLUSH_ON_CLOSE flag to allow filesystems to
    signal that they do this, and only skip closing files under writeback on
    such filesystems.
    
    Also, remove a redundant NULL file pointer check in
    nfsd_file_check_writeback, and clean up nfs's export op flag
    definitions.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Acked-by: Anna Schumaker <[email protected]>
    [ cel: adjusted to apply to v5.15.y ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: always drop directory lock in nfsd_unlink() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: always drop directory lock in nfsd_unlink()
    
    [ Upstream commit b677c0c63a135a916493c064906582e9f3ed4802 ]
    
    Some error paths in nfsd_unlink() allow it to exit without unlocking the
    directory.  This is not a problem in practice as the directory will be
    unlocked with an fh_put(), but it is untidy and potentially confusing.
    
    This allows us to remove all the fh_unlock() calls that are immediately
    after nfsd_unlink() calls.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Mar 28 10:16:42 2022 -0400

    NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
    
    [ Upstream commit 14ee45b70dd0d9ae76fb066cd8c0652d657353f6 ]
    
    Clean up: The "out" label already invokes fh_drop_write().
    
    Note that fh_drop_write() is already careful not to invoke
    mnt_drop_write() if either it has already been done or there is
    nothing to drop. Therefore no change in behavior is expected.
    
    Signed-off-by: Chuck Lever <[email protected]>
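
    The reason a second call is harmless can be shown with a minimal
    model of an idempotent release. The class and attribute names below
    are hypothetical stand-ins for the file handle and its "want write"
    state; the counter exists only so the example can observe how often
    the underlying release is reached.

```python
class FileHandle:
    """Hypothetical model of a handle holding optional write access."""

    def __init__(self):
        self.want_write = False
        self.drops = 0  # number of times the underlying release ran

    def want_write_access(self):
        self.want_write = True

    def drop_write(self):
        # Nothing to drop, or already dropped: do nothing.
        if not self.want_write:
            return
        self.want_write = False
        self.drops += 1  # stands in for the real release of write access
```

    Calling drop_write() twice after one want_write_access() reaches the
    underlying release only once, so removing the duplicate call is a
    pure clean-up.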

NFSD: Avoid clashing function prototypes [+ + +]

Author: Kees Cook <[email protected]>
Date:   Fri Dec 2 12:48:59 2022 -0800

    NFSD: Avoid clashing function prototypes
    
    [ Upstream commit e78e274eb22d966258a3845acc71d3c5b8ee2ea8 ]
    
    When built with Control Flow Integrity, function prototypes between
    caller and function declaration must match. These mismatches are visible
    at compile time with the new -Wcast-function-type-strict in Clang[1].
    
    There were 97 warnings produced by NFS. For example:
    
    fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict]
            [OP_ACCESS]             = (nfsd4_dec)nfsd4_decode_access,
                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    The enc/dec callbacks were defined as passing "void *" as the second
    argument, but were being implicitly cast to a new type. Replace the
    argument with union nfsd4_op_u, and perform explicit member selection
    in the function body. There are no resulting binary differences.
    
    Changes were made mechanically using the following Coccinelle script,
    with minor by-hand fixes for members that didn't already match their
    existing argument name:
    
    @find@
    identifier func;
    type T, opsT;
    identifier ops, N;
    @@
    
     opsT ops[] = {
            [N] = (T) func,
     };
    
    @already_void@
    identifier find.func;
    identifier name;
    @@
    
     func(...,
    -void
    +union nfsd4_op_u
     *name)
     {
            ...
     }
    
    @proto depends on !already_void@
    identifier find.func;
    type T;
    identifier name;
    position p;
    @@
    
     func@p(...,
            T name
     ) {
            ...
       }
    
    @script:python get_member@
    type_name << proto.T;
    member;
    @@
    
    coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0])
    
    @convert@
    identifier find.func;
    type proto.T;
    identifier proto.name;
    position proto.p;
    identifier get_member.member;
    @@
    
     func@p(...,
    -       T name
    +       union nfsd4_op_u *u
     ) {
    +       T name = &u->member;
            ...
       }
    
    @cast@
    identifier find.func;
    type T, opsT;
    identifier ops, N;
    @@
    
     opsT ops[] = {
            [N] =
    -       (T)
            func,
     };
    
    Cc: Chuck Lever <[email protected]>
    Cc: Jeff Layton <[email protected]>
    Cc: Gustavo A. R. Silva <[email protected]>
    Cc: [email protected]
    Signed-off-by: Kees Cook <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Avoid some useless tests [+ + +]

Author: Christophe JAILLET <[email protected]>
Date:   Thu Sep 1 07:27:11 2022 +0200

    nfsd: Avoid some useless tests
    
    [ Upstream commit d44899b8bb0b919f923186c616a84f0e70e04772 ]
    
    memdup_user() can't return NULL, so there is no point in checking for it.
    
    Simplify some tests accordingly.
    
    Suggested-by: Dan Carpenter <[email protected]>
    Signed-off-by: Christophe JAILLET <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: call nfsd_last_thread() before final nfsd_put() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Fri Dec 15 11:56:31 2023 +1100

    nfsd: call nfsd_last_thread() before final nfsd_put()
    
    [ Upstream commit 2a501f55cd641eb4d3c16a2eab0d678693fac663 ]
    
    If write_ports_addfd or write_ports_addxprt fail, they call nfsd_put()
    without calling nfsd_last_thread().  This leaves nn->nfsd_serv pointing
    to a structure that has been freed.
    
    So remove 'static' from nfsd_last_thread() and call it when the
    nfsd_serv is about to be destroyed.
    
    Fixes: ec52361df99b ("SUNRPC: stop using ->sv_nrthreads as a refcount")
    Signed-off-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: call op_release, even when op_func returns an error [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Mon Mar 27 06:21:37 2023 -0400

    nfsd: call op_release, even when op_func returns an error
    
    [ Upstream commit 15a8b55dbb1ba154d82627547c5761cac884d810 ]
    
    For ops with "trivial" replies, nfsd4_encode_operation will shortcut
    most of the encoding work and skip to just marshalling up the status.
    One of the things it skips is calling op_release. This could cause a
    memory leak in the layoutget codepath if there is an error at an
    inopportune time.
    
    Have the compound processing engine always call op_release, even when
    op_func sets an error in op->status. With this change, we also need
    nfsd4_block_get_device_info_scsi to set the gd_device pointer to NULL
    on error to avoid a double free.
    
    Reported-by: Zhi Li <[email protected]>
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181403
    Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops")
    Signed-off-by: Jeff Layton <[email protected]>
    [ cel: adjusted to apply to v5.15.y ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Cap rsize_bop result based on send buffer size [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 1 15:29:55 2022 -0400

    NFSD: Cap rsize_bop result based on send buffer size
    
    [ Upstream commit 76ce4dcec0dc08a032db916841ddc4e3998be317 ]
    
    Since before the git era, NFSD has conserved the number of pages
    held by each nfsd thread by combining the RPC receive and send
    buffers into a single array of pages. This works because there are
    no cases where an operation needs a large RPC Call message and a
    large RPC Reply at the same time.
    
    Once an RPC Call has been received, svc_process() updates
    svc_rqst::rq_res to describe the part of rq_pages that can be
    used for constructing the Reply. This means that the send buffer
    (rq_res) shrinks when the received RPC record containing the RPC
    Call is large.
    
    Add an NFSv4 helper that computes the size of the send buffer. It
    replaces svc_max_payload() in spots where svc_max_payload() returns
    a value that might be larger than the remaining send buffer space.
    Callers who need to know the transport's actual maximum payload size
    will continue to use svc_max_payload().
    
    Signed-off-by: Chuck Lever <[email protected]>
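
    The arithmetic behind the cap can be sketched as below. This is an
    assumption-laden illustration rather than the helper the patch adds:
    the function names, the 4096-byte page size and the page counts used
    in the usage note are all made up for the example.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def send_buffer_size(total_pages, call_bytes):
    """Bytes left for the Reply once the received Call occupies its pages."""
    call_pages = -(-call_bytes // PAGE_SIZE)  # ceiling division
    return max(total_pages - call_pages, 0) * PAGE_SIZE


def capped_payload(max_payload, total_pages, call_bytes):
    """Never promise more Reply payload than the send buffer can hold."""
    return min(max_payload, send_buffer_size(total_pages, call_bytes))
```

    With 260 shared pages and a 1 MiB transport maximum, a 200-byte Call
    leaves the full 1 MiB available, while a 1,000,000-byte Call leaves
    only 15 pages (61440 bytes) for the Reply.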

NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning. [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
    
    [ Upstream commit 927bfc5600cd6333c9ef9f090f19e66b7d4c8ee1 ]
    
    nfsd_create() usually returns with the directory still locked.
    nfsd_symlink() usually returns with it unlocked.  This is clumsy.
    
    Until recently nfsd_create() needed to keep the directory locked until
    ACLs and security label had been set.  These are now set inside
    nfsd_create() (in nfsd_setattr()) so this need is gone.
    
    So change nfsd_create() and nfsd_symlink() to always unlock, and remove
    any fh_unlock() calls that follow calls to these functions.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up _lm_ operation names [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Feb 16 11:26:06 2022 -0500

    NFSD: Clean up _lm_ operation names
    
    [ Upstream commit 35aff0678f99b0623bb72d50112de9e163a19559 ]
    
    The common practice is to name function instances the same as the
    method names, but with a uniquifying prefix. Commit aef9583b234a
    ("NFSD: Get reference of lockowner when coping file_lock") missed
    this -- the new function names should both have been of the form
    "nfsd4_lm_*".
    
    Before more lock manager operations are added in NFSD, rename these
    two functions for consistency.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up find_or_add_file() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:47:41 2022 -0400

    NFSD: Clean up find_or_add_file()
    
    [ Upstream commit 9270fc514ba7d415636b23bcb937573a1ce54f6a ]
    
    Remove the call to find_file_locked() in insert_nfs4_file(). Tracing
    shows that over 99% of these calls return NULL. Thus it is not worth
    the expense of the extra bucket list traversal. insert_file() already
    deals correctly with the case where the item is already in the hash
    bucket.
    
    Since nfsd4_file_hash_insert() is now just a wrapper around
    insert_file(), move the meat of insert_file() into
    nfsd4_file_hash_insert() and get rid of it.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: clean up mounted_on_fileid handling [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Thu Sep 8 12:31:07 2022 -0400

    nfsd: clean up mounted_on_fileid handling
    
    [ Upstream commit 6106d9119b6599fa23dc556b429d887b4c2d9f62 ]
    
    We only need the inode number for this, not a full rack of attributes.
    Rename this function, make it take a pointer to a u64 instead of
    struct kstat, and change it to just request STATX_INO.
    
    Signed-off-by: Jeff Layton <[email protected]>
    [ cel: renamed get_mounted_on_ino() ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up nfs4_preprocess_stateid_op() call sites [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:46:57 2022 -0400

    NFSD: Clean up nfs4_preprocess_stateid_op() call sites
    
    [ Upstream commit eeff73f7c1c583f79a401284f46c619294859310 ]
    
    Remove the lame-duck dprintk()s around nfs4_preprocess_stateid_op()
    call sites.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Tested-by: Jeff Layton <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up nfs4svc_encode_compoundres() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 12 17:23:19 2022 -0400

    NFSD: Clean up nfs4svc_encode_compoundres()
    
    [ Upstream commit 9993a66317fc9951322483a9edbfae95a640b210 ]
    
    In today's Linux NFS server implementation, the NFS dispatcher
    initializes each XDR result stream, and the NFSv4 .pc_func and
    .pc_encode methods all use xdr_stream-based encoding. This keeps
    rq_res.len automatically updated. There is no longer a need for
    the WARN_ON_ONCE() check in nfs4svc_encode_compoundres().
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up nfsd3_proc_create() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Mar 25 14:47:54 2022 -0400

    NFSD: Clean up nfsd3_proc_create()
    
    [ Upstream commit e61568599c9ad638fdaba150fee07d7065e31851 ]
    
    As near as I can tell, mode bit masking and setting S_IFREG is
    already done by do_nfsd_create() and vfs_create(). The NFSv4 path
    (do_open_lookup), for example, does not bother with this special
    processing.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up nfsd4_encode_readlink() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 22 16:09:23 2022 -0400

    NFSD: Clean up nfsd4_encode_readlink()
    
    [ Upstream commit 99b002a1fa00d90e66357315757e7277447ce973 ]
    
    Similar changes to nfsd4_encode_readv(), all bundled into a single
    patch.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up nfsd4_init_file() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:47:28 2022 -0400

    NFSD: Clean up nfsd4_init_file()
    
    [ Upstream commit 81a21fa3e7fdecb3c5b97014f0fc5a17d5806cae ]
    
    Name this function more consistently. I'm going to use nfsd4_file_
    and nfsd4_file_hash_ for these helpers.
    
    Change the @fh parameter to be a const pointer for better type safety.
    
    Finally, move the hash insertion operation to the caller. This is
    typical for most other "init_object" type helpers, and it is where
    most of the other nfs4_file hash table operations are located.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Clean up nfsd_file_put() [+ + +]

Author: Trond Myklebust <[email protected]>
Date:   Thu Mar 31 09:54:02 2022 -0400

    nfsd: Clean up nfsd_file_put()
    
    [ Upstream commit 999397926ab3f78c7d1235cc4ca6e3c89d2769bf ]
    
    Make it a little less racy, by removing the refcount_read() test. Then
    remove the redundant 'is_hashed' variable.
    
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up nfsd_open_verified() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sun Mar 27 16:46:47 2022 -0400

    NFSD: Clean up nfsd_open_verified()
    
    [ Upstream commit f4d84c52643ae1d63a8e73e2585464470e7944d1 ]
    
    Its only caller always passes S_IFREG as the @type parameter. As an
    additional clean-up, add a kerneldoc comment.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up nfsd_splice_actor() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Apr 7 16:48:24 2022 -0400

    NFSD: Clean up nfsd_splice_actor()
    
    [ Upstream commit 91e23b1c39820bfed642119ff6b6ef9f43cf09ce ]
    
    nfsd_splice_actor() checks that the page being spliced does not
    match the previous element in the svc_rqst::rq_pages array. We
    believe this is to prevent a double put_page() in cases where the
    READ payload is partially contained in the xdr_buf's head buffer.
    
    However, the NFSD READ proc functions no longer place any part of
    the READ payload in the head buffer, in order to properly support
    NFS/RDMA READ with Write chunks. Therefore, simplify the logic in
    nfsd_splice_actor() to remove this unnecessary check.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up nfsd_vfs_write() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Dec 28 14:19:41 2021 -0500

    NFSD: Clean up nfsd_vfs_write()
    
    [ Upstream commit 33388b3aefefd4d83764dab8038cb54068161a44 ]
    
    The RWF_SYNC and !RWF_SYNC arms are now exactly alike except that
    the RWF_SYNC arm resets the boot verifier twice in a row. Fix that
    redundancy and de-duplicate the code.
    
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: clean up potential nfsd_file refcount leaks in COPY codepath [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Tue Jan 17 14:38:31 2023 -0500

    nfsd: clean up potential nfsd_file refcount leaks in COPY codepath
    
    [ Upstream commit 6ba434cb1a8d403ea9aad1b667c3ea3ad8b3191f ]
    
    There are two different flavors of the nfsd4_copy struct. One is
    embedded in the compound and is used directly in synchronous copies. The
    other is dynamically allocated, refcounted and tracked in the client
    structure. For the embedded one, the cleanup just involves releasing any
    nfsd_files held on its behalf. For the async one, the cleanup is a bit
    more involved, and we need to dequeue it from lists, unhash it, etc.
    
    There is at least one potential refcount leak in this code now. If the
    kthread_create call fails, then both the src and dst nfsd_files in the
    original nfsd4_copy object are leaked.
    
    The cleanup in this codepath is also sort of weird. In the async copy
    case, we'll have up to four nfsd_file references (src and dst for both
    flavors of copy structure). They are both put at the end of
    nfsd4_do_async_copy, even though the ones held on behalf of the embedded
    one outlive that structure.
    
    Change it so that we always clean up the nfsd_file refs held by the
    embedded copy structure before nfsd4_copy returns. Rework
    cleanup_async_copy to handle both inter and intra copies. Eliminate
    nfsd4_cleanup_intra_ssc since it now becomes a no-op.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up SPLICE_OK in nfsd4_encode_read() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 22 16:08:51 2022 -0400

    NFSD: Clean up SPLICE_OK in nfsd4_encode_read()
    
    [ Upstream commit c738b218a2e5a753a336b4b7fee6720b902c7ace ]
    
    Do the test_bit() once -- this reduces the number of locked-bus
    operations and makes the function a little easier to read.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up the nfsd_net::nfssvc_boot field [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Dec 29 14:43:16 2021 -0500

    NFSD: Clean up the nfsd_net::nfssvc_boot field
    
    [ Upstream commit 91d2e9b56cf5c80f9efc530d494968369a8a0e0d ]
    
    There are two boot-time fields in struct nfsd_net: one called
    boot_time and one called nfssvc_boot. The latter is used only to
    form write verifiers, but its documenting comment declares:
    
            /* Time of server startup */
    
    Since commit 27c438f53e79 ("nfsd: Support the server resetting the
    boot verifier"), this field can be reset at any time; it's no
    longer tied to server restart. So that comment is stale.
    
    Also, according to pahole, struct timespec64 is 16 bytes long on
    x86_64. The nfssvc_boot field is used only to form a write verifier,
    which is 8 bytes long.
    
    Let's clarify this situation by manufacturing an 8-byte verifier
    in nfs_reset_boot_verifier() and storing only that in struct
    nfsd_net.
    
    We're grabbing 128 bits of time, so compress all of those into a
    64-bit verifier instead of throwing out the high-order bits.
    In the future, the siphash_key can be re-used for other hashed
    objects per-nfsd_net.
    
    Signed-off-by: Chuck Lever <[email protected]>
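
    The idea of folding all 128 bits of the timestamp into an 8-byte
    verifier, instead of truncating, can be sketched in userspace. The
    commit text mentions a siphash key; Python's standard library does
    not expose SipHash, so this example substitutes keyed BLAKE2b with an
    8-byte digest purely as a stand-in. The function name and the packing
    layout are likewise assumptions.

```python
import hashlib
import struct


def make_write_verifier(tv_sec, tv_nsec, key):
    """Hash a 128-bit (seconds, nanoseconds) pair down to 8 bytes.

    Keyed BLAKE2b is used here only as a stand-in for the kernel's
    keyed hash; it is not what the kernel uses.
    """
    packed = struct.pack("<qq", tv_sec, tv_nsec)  # two 64-bit fields
    return hashlib.blake2b(packed, digest_size=8, key=key).digest()
```

    Because every input bit feeds the hash, two timestamps that differ
    only in their high-order bits still yield different verifiers in
    practice, which plain truncation to the low-order bits would not
    guarantee.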

NFSD: Clean up unused code after rhashtable conversion [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:26:36 2022 -0400

    NFSD: Clean up unused code after rhashtable conversion
    
    [ Upstream commit 0ec8e9d1539a7b8109a554028bbce441052f847e ]
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up WRITE arg decoders [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 12 17:23:07 2022 -0400

    NFSD: Clean up WRITE arg decoders
    
    [ Upstream commit d4da5baa533215b14625458e645056baf646bb2e ]
    
    xdr_stream_subsegment() already returns a boolean value.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Combine XDR error tracepoints [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Oct 21 12:11:45 2021 -0400

    NFSD: Combine XDR error tracepoints
    
    [ Upstream commit 70e94d757b3e1f46486d573729d84c8955c81dce ]
    
    Clean up: The garbage_args and cant_encode tracepoints report the
    same information as each other, so combine them into a single
    tracepoint class to reduce code duplication and slightly reduce the
    size of trace.o.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Convert filecache to rhltable [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Nov 24 15:09:04 2022 -0500

    NFSD: Convert filecache to rhltable
    
    [ Upstream commit c4c649ab413ba6a785b25f0edbb12f617c87db2a ]
    
    While we were converting the nfs4_file hashtable to use the kernel's
    resizable hashtable data structure, Neil Brown observed that the
    list variant (rhltable) would be better for managing nfsd_file items
    as well. The nfsd_file hash table will contain multiple entries for
    the same inode -- these should be kept together on a list. And, it
    could be possible for exotic or malicious client behavior to cause
    the hash table to resize itself on every insertion.
    
    A nice simplification is that rhltable_lookup() can return a list
    that contains only nfsd_file items that match a given inode, which
    enables us to eliminate specialized hash table helper functions and
    use the default functions provided by the rhashtable implementation.
    
    Since we are now storing nfsd_file items for the same inode on a
    single list, that effectively reduces the number of hash entries
    that have to be tracked in the hash table. The minimum bucket count
    is therefore lowered.
    
    Light testing with fstests generic/531 shows no regressions.
    
    Suggested-by: Neil Brown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Convert the filecache to use rhashtable [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:26:30 2022 -0400

    NFSD: Convert the filecache to use rhashtable
    
    [ Upstream commit ce502f81ba884c1fe45dc0ebddbcaaa4ec0fc5fb ]
    
    Enable the filecache hash table to start small, then grow with the
    workload. Smaller server deployments benefit because there should
    be lower memory utilization. Larger server deployments should see
    improved scaling with the number of open files.
    
    Suggested-by: Jeff Layton <[email protected]>
    Suggested-by: Dave Chinner <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: copy the whole verifier in nfsd_copy_write_verifier [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Feb 14 10:07:59 2023 -0500

    NFSD: copy the whole verifier in nfsd_copy_write_verifier
    
    [ Upstream commit 90d2175572470ba7f55da8447c72ddd4942923c4 ]
    
    Currently, we're only memcpy'ing the first __be32. Ensure we copy into
    both words.
    
    Fixes: 91d2e9b56cf5 ("NFSD: Clean up the nfsd_net::nfssvc_boot field")
    Reported-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
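
The difference between copying one word and copying the whole verifier can be sketched in plain C. This is an illustration only, not the kernel code: `verifier_t` and both function names are hypothetical, and the buggy variant shows just one plausible way of ending up with a single `__be32` copied.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an 8-byte write verifier (two 32-bit words). */
typedef struct {
    uint32_t data[2];
} verifier_t;

/* Buggy shape: sizeof one element copies only the first word. */
void copy_verifier_first_word_only(verifier_t *dst, const uint32_t src[2])
{
    assert(dst != NULL && src != NULL);
    memcpy(dst->data, src, sizeof(dst->data[0]));
}

/* Fixed shape: sizeof the whole array copies both words. */
void copy_verifier_whole(verifier_t *dst, const uint32_t src[2])
{
    assert(dst != NULL && src != NULL);
    memcpy(dst->data, src, sizeof(dst->data));
}
```

With the first variant the second word of the destination keeps whatever value it had before the call, so two verifiers that differ only in that word would compare as equal.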

NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id) [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Dec 28 12:41:32 2021 -0500

    NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id)
    
    [ Upstream commit fb7622c2dbd1aa41133a8c73e1137b833c074519 ]
    
    Since this pointer is used repeatedly, move it to a stack variable.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: De-duplicate nfsd4_decode_bitmap4() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Dec 13 10:20:45 2021 -0500

    NFSD: De-duplicate nfsd4_decode_bitmap4()
    
    [ Upstream commit cd2e999c7c394ae916d8be741418b3c6c1dddea8 ]
    
    Clean up. Trond points out that xdr_stream_decode_uint32_array()
    does the same thing as nfsd4_decode_bitmap4().
    
    Suggested-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Decode NFSv4 birth time attribute [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sun Jul 10 14:46:04 2022 -0400

    NFSD: Decode NFSv4 birth time attribute
    
    [ Upstream commit 5b2f3e0777da2a5dd62824bbe2fdab1d12caaf8f ]
    
    NFSD has advertised support for the NFSv4 time_create attribute
    since commit e377a3e698fb ("nfsd: Add support for the birth time
    attribute").
    
    Igor Mammedov reports that Mac OS clients attempt to set the NFSv4
    birth time attribute via OPEN(CREATE) and SETATTR if the server
    indicates that it supports it, but since the above commit was
    merged, those attempts now fail.
    
    Table 5 in RFC 8881 lists the time_create attribute as one that can
    be both set and retrieved, but the above commit did not add server
    support for clients to provide a time_create attribute. IMO that's
    a bug in our implementation of the NFSv4 protocol, which this commit
    addresses.
    
    Whether NFSD silently ignores the new birth time or actually sets it
    is another matter. I haven't found another filesystem service in the
    Linux kernel that enables users or clients to modify a file's birth
    time attribute.
    
    This commit reflects my (perhaps incorrect) understanding of whether
    Linux users can set a file's birth time. NFSD will now recognize a
    time_create attribute but it ignores its value. It clears the
    time_create bit in the returned attribute bitmask to indicate that
    the value was not used.
    
    Reported-by: Igor Mammedov <[email protected]>
    Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute")
    Tested-by: Igor Mammedov <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Demote a WARN to a pr_warn() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:23:45 2022 -0400

    NFSD: Demote a WARN to a pr_warn()
    
    [ Upstream commit ca3f9acb6d3faf78da2b63324f7c737dbddf7f69 ]
    
    The call trace doesn't add much value, but it sure is noisy.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Deprecate NFS_OFFSET_MAX [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Jan 25 15:57:45 2022 -0500

    NFSD: Deprecate NFS_OFFSET_MAX
    
    [ Upstream commit c306d737691ef84305d4ed0d302c63db2932f0bb ]
    
    NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there
    was a kernel-wide OFFSET_MAX value. As a clean up, replace the last
    few uses of it with its generic equivalent, and get rid of it.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: discard fh_locked flag and fh_lock/fh_unlock [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: discard fh_locked flag and fh_lock/fh_unlock
    
    [ Upstream commit dd8dd403d7b223cc77ee89d8d09caf045e90e648 ]
    
    As all inode locking is now fully balanced, fh_put() does not need to
    call fh_unlock().
    fh_lock() and fh_unlock() are no longer used, so discard them.
    These are the only real users of ->fh_locked, so discard that too.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't call locks_release_private() twice concurrently [+ + +]

Author: NeilBrown <[email protected]>
Date:   Wed Jan 31 11:17:40 2024 +1100

    nfsd: don't call locks_release_private() twice concurrently
    
    [ Upstream commit 05eda6e75773592760285e10ac86c56d683be17f ]
    
    It is possible for free_blocked_lock() to be called twice concurrently,
    once from nfsd4_lock() and once from nfsd4_release_lockowner() calling
    remove_blocked_locks().  This is why a kref was added.
    
    It is perfectly safe for locks_delete_block() and kref_put() to be
    called in parallel as they use locking or atomicity respectively as
    protection.  However locks_release_private() has no locking.  It is
    safe for it to be called twice sequentially, but not concurrently.
    
    This patch moves that call from free_blocked_lock() where it could race
    with itself, to free_nbl() where it cannot.  This will slightly delay
    the freeing of private info or release of the owner - but not by much.
    It is arguably more natural for this freeing to happen in free_nbl()
    where the structure itself is freed.
    
    This bug was found by code inspection - it has not been seen in practice.
    
    Fixes: 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock")
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't destroy global nfs4_file table in per-net shutdown [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Sat Feb 11 07:50:08 2023 -0500

    nfsd: don't destroy global nfs4_file table in per-net shutdown
    
    [ Upstream commit 4102db175b5d884d133270fdbd0e59111ce688fc ]
    
    The nfs4_file table is global, so shutting it down when a containerized
    nfsd is shut down is wrong and can lead to double-frees. Tear down the
    nfs4_file_rhltable in nfs4_state_shutdown instead of
    nfs4_state_shutdown_net.
    
    Fixes: d47b295e8d76 ("NFSD: Use rhashtable for managing nfs4_file objects")
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017
    Reported-by: JianHong Yin <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't free files unconditionally in __nfsd_file_cache_purge [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Jan 20 14:52:14 2023 -0500

    nfsd: don't free files unconditionally in __nfsd_file_cache_purge
    
    [ Upstream commit 4bdbba54e9b1c769da8ded9abd209d765715e1d6 ]
    
    nfsd_file_cache_purge is called when the server is shutting down, in
    which case, tearing things down is generally fine, but it also gets
    called when the exports cache is flushed.
    
    Instead of walking the cache and freeing everything unconditionally,
    handle it the same as when we have a notification of conflicting access.
    
    Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
    Reported-by: Ruben Vestergaard <[email protected]>
    Reported-by: Torkil Svensgaard <[email protected]>
    Reported-by: Shachar Kagan <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Tested-by: Shachar Kagan <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't fsync nfsd_files on last close [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Tue Feb 7 12:02:46 2023 -0500

    nfsd: don't fsync nfsd_files on last close
    
    [ Upstream commit 4c475eee02375ade6e864f1db16976ba0d96a0a2 ]
    
    Most of the time, NFSv4 clients issue a COMMIT before the final CLOSE of
    an open stateid, so with NFSv4, the fsync in the nfsd_file_free path is
    usually a no-op and doesn't block.
    
    We have a customer running knfsd over very slow storage (XFS over Ceph
    RBD). They were using the "async" export option because performance was
    more important than data integrity for this application. That export
    option turns NFSv4 COMMIT calls into no-ops. Due to the fsync in this
    codepath however, their final CLOSE calls would still stall (since a
    CLOSE effectively became a COMMIT).
    
    I think this fsync is not strictly necessary. We only use that result to
    reset the write verifier. Instead of fsync'ing all of the data when we
    free an nfsd_file, we can just check for writeback errors when one is
    acquired and when it is freed.
    
    If the client never comes back, then it'll never see the error anyway
    and there is no point in resetting it. If an error occurs after the
    nfsd_file is removed from the cache but before the inode is evicted,
    then it will reset the write verifier on the next nfsd_file_acquire,
    (since there will be an unseen error).
    
    The only exception here is if something else opens and fsyncs the file
    during that window. Given that local applications work with this
    limitation today, I don't see that as an issue.
    
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2166658
    Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
    Reported-and-tested-by: Pierguido Lambri <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't hand out delegation on setuid files being opened for write [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Jan 27 07:09:33 2023 -0500

    nfsd: don't hand out delegation on setuid files being opened for write
    
    [ Upstream commit 826b67e6376c2a788e3a62c4860dcd79500a27d5 ]
    
    We had a bug report that xfstest generic/355 was failing on NFSv4.0.
    This test sets various combinations of setuid/setgid modes and tests
    whether DIO writes will cause them to be stripped.
    
    What I found was that the server did properly strip those bits, but
    the client didn't notice because it held a delegation that was not
    recalled. The recall didn't occur because the client itself was the
    one generating the activity and we avoid recalls in that case.
    
    Clearing setuid bits is an "implicit" activity. The client didn't
    specifically request that we do that, so we need the server to issue a
    CB_RECALL, or avoid the situation entirely by not issuing a delegation.
    
    The easiest fix here is to simply not give out a delegation if the file
    is being opened for write, and the mode has the setuid and/or setgid bit
    set. Note that there is a potential race between the mode and lease
    being set, so we test for this condition both before and after setting
    the lease.
    
    This patch fixes generic/355, generic/683 and generic/684 for me. (Note
    that 355 fails only on v4.0, and 683 and 684 require NFSv4.2 to run and
    fail).
    
    Reported-by: Boyang Xue <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't kill nfsd_files because of lease break error [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Thu Jan 5 07:15:11 2023 -0500

    nfsd: don't kill nfsd_files because of lease break error
    
    [ Upstream commit c6593366c0bf222be9c7561354dfb921c611745e ]
    
    An error from break_lease is non-fatal, so we needn't destroy the
    nfsd_file in that case. Just put the reference like we normally would
    and return the error.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't open-code clear_and_wake_up_bit [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Thu Jan 5 07:15:09 2023 -0500

    nfsd: don't open-code clear_and_wake_up_bit
    
    [ Upstream commit b8bea9f6cdd7236c7c2238d022145e9b2f8aac22 ]
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't replace page in rq_pages if it's a continuation of last page [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Mar 17 13:13:08 2023 -0400

    nfsd: don't replace page in rq_pages if it's a continuation of last page
    
    [ Upstream commit 27c934dd8832dd40fd34776f916dc201e18b319b ]
    
    The splice read calls nfsd_splice_actor to put the pages containing file
    data into the svc_rqst->rq_pages array. It's possible however to get a
    splice result that only has a partial page at the end, if (e.g.) the
    filesystem hands back a short read that doesn't cover the whole page.
    
    nfsd_splice_actor will plop the partial page into its rq_pages array and
    return. Then later, when nfsd_splice_actor is called again, the
    remainder of the page may end up being filled out. At this point,
    nfsd_splice_actor will put the page into the array _again_ corrupting
    the reply. If this is done enough times, rq_next_page will overrun the
    array and corrupt the trailing fields -- the rq_respages and
    rq_next_page pointers themselves.
    
    If we've already added the page to the array in the last pass, don't add
    it to the array a second time when dealing with a splice continuation.
    This was originally handled properly in nfsd_splice_actor, but commit
    91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") removed the check
    for it.
    
    Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()")
    Cc: Al Viro <[email protected]>
    Reported-by: Dario Lesca <[email protected]>
    Tested-by: David Critch <[email protected]>
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
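
The continuation check can be illustrated with a bounded array that refuses to record the same page twice in a row. A simplified sketch: `struct page_array`, `MAX_PAGES` and `add_page()` are hypothetical and stand in for the rq_pages bookkeeping, which is more involved in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 8

/* Hypothetical stand-in for the per-request page array. */
struct page_array {
    const void *pages[MAX_PAGES];
    size_t count;
};

/*
 * Append a page unless it is the same page that the previous call added
 * (a continuation of a partially filled page).
 * Returns 0 on success, -1 if the array is full.
 */
int add_page(struct page_array *pa, const void *page)
{
    assert(pa != NULL);
    if (pa->count > 0 && pa->pages[pa->count - 1] == page)
        return 0;               /* continuation: already recorded */
    if (pa->count == MAX_PAGES)
        return -1;              /* never run past the end of the array */
    pa->pages[pa->count++] = page;
    return 0;
}
```

Without the first test, every continuation would consume another slot, which is how repeated short reads could eventually overrun the array.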

nfsd: don't take fi_lock in nfsd_break_deleg_cb() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Feb 5 13:22:39 2024 +1100

    nfsd: don't take fi_lock in nfsd_break_deleg_cb()
    
    [ Upstream commit 5ea9a7c5fe4149f165f0e3b624fe08df02b6c301 ]
    
    A recent change to check_for_locks() changed it to take ->flc_lock while
    holding ->fi_lock.  This creates a lock inversion (reported by lockdep)
    because there is a case where ->fi_lock is taken while holding
    ->flc_lock.
    
    ->flc_lock is held across ->fl_lmops callbacks, and
    nfsd_break_deleg_cb() is one of those and does take ->fi_lock.  However
    it doesn't need to.
    
    Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each
    delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list
    and so needed the lock.  Since then it doesn't walk the list and doesn't
    need the lock.
    
    Two actions are performed under the lock.  One is to call
    nfsd_break_one_deleg which calls nfsd4_run_cb().  These don't act on
    the nfs4_file at all, so don't need the lock.
    
    The other is to set ->fi_had_conflict which is in the nfs4_file.
    This field is only ever set here (except when initialised to false)
    so there is no possible problem with multiple threads racing when
    setting it.
    
    The field is tested twice in nfs4_set_delegation().  The first test does
    not hold a lock and is documented as an opportunistic optimisation, so
    it doesn't impose any need to hold ->fi_lock while setting
    ->fi_had_conflict.
    
    The second test in nfs4_set_delegation() *is* made under ->fi_lock, so
    removing the locking when ->fi_had_conflict is set could make a change.
    The change could only be interesting if ->fi_had_conflict tested as
    false even though nfsd_break_one_deleg() ran before ->fi_lock was
    unlocked.  i.e. while hash_delegation_locked() was running.
    As hash_delegation_locked() doesn't interact in any way with nfsd4_run_cb()
    there can be no importance to this interaction.
    
    So this patch removes the locking from nfsd_break_one_deleg() and moves
    the final test on ->fi_had_conflict out of the locked region to make it
    clear that locking isn't important to the test.  It is still tested
    *after* vfs_setlease() has succeeded.  This might be significant and as
    vfs_setlease() takes ->flc_lock, and nfsd_break_one_deleg() is called
    under ->flc_lock this "after" is a true ordering provided by a spinlock.
    
    Fixes: edcf9725150e ("nfsd: fix RELEASE_LOCKOWNER")
    Signed-off-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't take/put an extra reference when putting a file [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Jan 18 12:31:37 2023 -0500

    nfsd: don't take/put an extra reference when putting a file
    
    [ Upstream commit b2ff1bd71db2a1b193a6dde0845adcd69cbcf75e ]
    
    The last thing that filp_close does is an fput, so don't bother taking
    and putting the extra reference.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: drop fh argument from alloc_init_deleg [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: drop fh argument from alloc_init_deleg
    
    [ Upstream commit bbf936edd543e7220f60f9cbd6933b916550396d ]
    
    Currently, we pass the fh of the opened file down through several
    functions so that alloc_init_deleg can pass it to delegation_blocked.
    The filehandle of the open file is available in the nfs4_file however,
    so there's no need to pass it in a separate argument.
    
    Drop the argument from alloc_init_deleg, nfs4_open_delegation and
    nfs4_set_delegation.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: drop fname and flen args from nfsd_create_locked() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Sep 6 10:42:19 2022 +1000

    NFSD: drop fname and flen args from nfsd_create_locked()
    
    [ Upstream commit 9558f9304ca1903090fa5d995a3269a8e82804b4 ]
    
    nfsd_create_locked() does not use the "fname" and "flen" arguments, so
    drop them from declaration and all callers.
    
    Signed-off-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: drop support for ancient filehandles [+ + +]

Author: NeilBrown <[email protected]>
Date:   Thu Sep 2 11:15:29 2021 +1000

    NFSD: drop support for ancient filehandles
    
    [ Upstream commit c645a883df34ee10b884ec921e850def54b7f461 ]
    
    Filehandles not in the "new" or "version 1" format have not been handed
    out for new mounts since Linux 2.4 which was released 20 years ago.
    I think it is safe to say that no such file handles are still in use,
    and that we can drop support for them.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: drop the nfsd_put helper [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Jan 3 08:36:52 2024 -0500

    nfsd: drop the nfsd_put helper
    
    [ Upstream commit 64e6304169f1e1f078e7f0798033f80a7fb0ea46 ]
    
    It's not safe to call nfsd_put once nfsd_last_thread has been called, as
    that function will zero out the nn->nfsd_serv pointer.
    
    Drop the nfsd_put helper altogether and open-code the svc_put in its
    callers instead. That allows us to not be reliant on the value of that
    pointer when handling an error.
    
    Fixes: 2a501f55cd64 ("nfsd: call nfsd_last_thread() before final nfsd_put()")
    Reported-by: Zhi Li <[email protected]>
    Cc: NeilBrown <[email protected]>
    Signed-off-by: Jeffrey Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: enhance inter-server copy cleanup [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Sun Dec 18 16:55:53 2022 -0800

    NFSD: enhance inter-server copy cleanup
    
    [ Upstream commit df24ac7a2e3a9d0bc68f1756a880e50bfe4b4522 ]
    
    Currently nfsd4_setup_inter_ssc returns the vfsmount of the source
    server's export when the mount completes. After the copy is done
    nfsd4_cleanup_inter_ssc is called with the vfsmount of the source
    server and it searches nfsd_ssc_mount_list for a matching entry
    to do the clean up.
    
    The problems with this approach are (1) the need to search the
    nfsd_ssc_mount_list and (2) the code has to handle the case where
    the matching entry is not found which looks ugly.
    
    The enhancement is that, instead of returning the vfsmount,
    nfsd4_setup_inter_ssc returns the nfsd4_ssc_umount_item that has the
    vfsmount embedded in it. nfsd4_cleanup_inter_ssc is then passed the
    nfsd4_ssc_umount_item directly, so no searching is needed and there
    is no need to handle the 'not found' case.
    
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    [ cel: adjusted whitespace and variable/function names ]
    Reviewed-by: Olga Kornievskaia <[email protected]>

NFSD: Ensure nf_inode is never dereferenced [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:27:09 2022 -0400

    NFSD: Ensure nf_inode is never dereferenced
    
    [ Upstream commit 427f5f83a3191cbf024c5aea6e5b601cdf88d895 ]
    
    The documenting comment for struct nfsd_file states:
    
    /*
     * A representation of a file that has been opened by knfsd. These are hashed
     * in the hashtable by inode pointer value. Note that this object doesn't
     * hold a reference to the inode by itself, so the nf_inode pointer should
     * never be dereferenced, only used for comparison.
     */
    
    Replace the two existing dereferences to make the comment always
    true.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: ensure we always call fh_verify_error tracepoint [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Oct 12 14:42:54 2022 -0400

    nfsd: ensure we always call fh_verify_error tracepoint
    
    [ Upstream commit 93c128e709aec23b10f3a2f78a824080d4085318 ]
    
    This is a conditional tracepoint. Call it every time, not just when
    nfsd_permission fails.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: extra checks when freeing delegation stateids [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Mon Sep 26 14:41:02 2022 -0400

    nfsd: extra checks when freeing delegation stateids
    
    [ Upstream commit 895ddf5ed4c54ea9e3533606d7a8b4e4f27f95ef ]
    
    We've had some reports of problems in the refcounting for delegation
    stateids that we've yet to track down. Add some extra checks to ensure
    that we've removed the object from various lists before freeing it.
    
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2127067
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Finish converting the NFSv3 GETACL result encoder [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sun Oct 16 11:47:08 2022 -0400

    NFSD: Finish converting the NFSv3 GETACL result encoder
    
    [ Upstream commit 841fd0a3cb490eae5dfd262eccb8c8b11d57f8b8 ]
    
    For some reason, the NFSv2 GETACL result encoder was fully converted
    to use the new nfs_stream_encode_acl(), but the NFSv3 equivalent was
    not similarly converted.
    
    Fixes: 20798dfe249a ("NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream")
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Fix a regression in nfsd_setattr() [+ + +]

Author: Trond Myklebust <[email protected]>
Date:   Thu Feb 15 20:24:50 2024 -0500

    nfsd: Fix a regression in nfsd_setattr()
    
    [ Upstream commit 6412e44c40aaf8f1d7320b2099c5bdd6cb9126ac ]
    
    Commit bb4d53d66e4b ("NFSD: use (un)lock_inode instead of
    fh_(un)lock for file operations") broke the NFSv3 pre/post op
    attributes behaviour when doing a SETATTR rpc call by stripping out
    the calls to fh_fill_pre_attrs() and fh_fill_post_attrs().
    
    Fixes: bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations")
    Signed-off-by: Trond Myklebust <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Message-ID: <[email protected]>
    [ cel: adjusted to apply to v5.15.y ]
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Fix a write performance regression [+ + +]

Author: Trond Myklebust <[email protected]>
Date:   Thu Mar 31 09:54:01 2022 -0400

    nfsd: Fix a write performance regression
    
    [ Upstream commit 6b8a94332ee4f7d9a8ae0cbac7609f79c212f06c ]
    
    The call to filemap_flush() in nfsd_file_put() is there to ensure that
    we clear out any writes belonging to a NFSv3 client relatively quickly
    and avoid situations where the file can't be evicted by the garbage
    collector. It also ensures that we detect write errors quickly.
    
    The problem is this causes a regression in performance for some
    workloads.
    
    So try to improve matters by deferring writeback until we're ready to
    close the file, and need to detect errors so that we can force the
    client to resend.
    
    Tested-by: Jan Kara <[email protected]>
    Fixes: b6669305d35a ("nfsd: Reduce the number of calls to nfsd_file_gc()")
    Signed-off-by: Trond Myklebust <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: fix comments about spinlock handling with delegations [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Mon Sep 26 12:38:45 2022 -0400

    nfsd: fix comments about spinlock handling with delegations
    
    [ Upstream commit 25fbe1fca14142beae6c882f7906510363d42bff ]
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Feb 3 13:18:34 2023 -0500

    nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
    
    [ Upstream commit dcd779dc46540e174a6ac8d52fbed23593407317 ]
    
    The nested if statements here make no sense, as you can never reach
    the "else" branch in the nested statement. Fix the error handling for
    when there is a courtesy client that holds a conflicting deny mode.
    
    Fixes: 3d6942715180 ("NFSD: add support for share reservation conflict to courteous server")
    Reported-by: 張智諺 <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Reviewed-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Fix creation time serialization order [+ + +]

Author: Tavian Barnes <[email protected]>
Date:   Fri Jun 23 17:09:06 2023 -0400

    nfsd: Fix creation time serialization order
    
    [ Upstream commit d7dbed457c2ef83709a2a2723a2d58de43623449 ]
    
    In nfsd4_encode_fattr(), TIME_CREATE was being written out after all
    other times.  However, they should be written out in an order that
    matches the bit flags in bmval1, which in this case are
    
        #define FATTR4_WORD1_TIME_ACCESS        (1UL << 15)
        #define FATTR4_WORD1_TIME_CREATE        (1UL << 18)
        #define FATTR4_WORD1_TIME_DELTA         (1UL << 19)
        #define FATTR4_WORD1_TIME_METADATA      (1UL << 20)
        #define FATTR4_WORD1_TIME_MODIFY        (1UL << 21)
    
    so TIME_CREATE should come second.
    
    I noticed this on a FreeBSD NFSv4.2 client, which supports creation
    times.  On this client, file times were weirdly permuted.  With this
    patch applied on the server, times looked normal on the client.
    
    Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute")
    Link: https://unix.stackexchange.com/q/749605/56202
    Signed-off-by: Tavian Barnes <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
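
The ordering rule can be sketched as a loop that walks the bitmask from the lowest bit upward and emits one value per set bit, so the encoder and any client decoding in bit order agree on the sequence. The helper name `encode_order()` is hypothetical; the bit definitions are copied from the commit message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions copied from the commit message above. */
#define FATTR4_WORD1_TIME_ACCESS   (1UL << 15)
#define FATTR4_WORD1_TIME_CREATE   (1UL << 18)
#define FATTR4_WORD1_TIME_DELTA    (1UL << 19)
#define FATTR4_WORD1_TIME_METADATA (1UL << 20)
#define FATTR4_WORD1_TIME_MODIFY   (1UL << 21)

/*
 * Hypothetical helper: write the bit index of every attribute set in
 * bmval1 into out[], lowest bit first, and return how many were written.
 */
size_t encode_order(uint32_t bmval1, unsigned int *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    for (unsigned int bit = 0; bit < 32 && n < max; bit++) {
        if (bmval1 & (UINT32_C(1) << bit))
            out[n++] = bit;
    }
    return n;
}
```

For a request carrying TIME_ACCESS, TIME_CREATE and TIME_MODIFY, this yields bits 15, 18 and 21 in that order, which places the creation time second, as the commit message states.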

nfsd: fix double fget() bug in __write_ports_addfd() [+ + +]

Author: Dan Carpenter <[email protected]>
Date:   Mon May 29 14:35:55 2023 +0300

    nfsd: fix double fget() bug in __write_ports_addfd()
    
    [ Upstream commit c034203b6a9dae6751ef4371c18cb77983e30c28 ]
    
    The bug here is that you cannot rely on getting the same socket
    from multiple calls to fget() because userspace can influence
    that.  This is a kind of double fetch bug.
    
    The fix is to delete the svc_alien_sock() function and instead do
    the checking inside the svc_addsock() function.
    
    Fixes: 3064639423c4 ("nfsd: check passed socket's net matches NFSd superblock's one")
    Signed-off-by: Dan Carpenter <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
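
The general remedy for a double-fetch bug is to look the object up once, then validate and use that same object. The sketch below models this in user-space C; `struct sock_obj`, `struct fd_table`, `fd_lookup()` and `add_sock()` are hypothetical and do not mirror the svc_addsock() signature.

```c
#include <assert.h>
#include <stddef.h>

#define NFDS 4

/* Hypothetical stand-ins: a socket tagged with its network namespace,
 * and a descriptor table that userspace could change at any time. */
struct sock_obj { int net_id; };
struct fd_table { struct sock_obj *slots[NFDS]; };

/* One lookup of the descriptor; returns NULL for a bad or empty slot. */
struct sock_obj *fd_lookup(const struct fd_table *t, int fd)
{
    assert(t != NULL);
    if (fd < 0 || fd >= NFDS)
        return NULL;
    return t->slots[fd];
}

/*
 * Fetch once, then validate and hand back that same object. A second
 * fd_lookup() for the "use" step could observe a different socket.
 * Returns 0 on success, -1 for a bad descriptor, -2 for a namespace mismatch.
 */
int add_sock(const struct fd_table *t, int fd, int expected_net,
             struct sock_obj **out)
{
    struct sock_obj *so = fd_lookup(t, fd);

    if (so == NULL)
        return -1;
    if (so->net_id != expected_net)
        return -2;
    *out = so;
    return 0;
}
```

Because the check and the use share one pointer, swapping the descriptor after the lookup cannot make the caller operate on an unchecked socket.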

nfsd: fix handling of cached open files in nfsd4_open codepath [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Thu Jan 5 14:55:56 2023 -0500

    nfsd: fix handling of cached open files in nfsd4_open codepath
    
    [ Upstream commit 0b3a551fa58b4da941efeb209b3770868e2eddd7 ]
    
    Commit fb70bf124b05 ("NFSD: Instantiate a struct file when creating a
    regular NFSv4 file") added the ability to cache an open fd over a
    compound. There are a couple of problems with the way this currently
    works:
    
    It's racy, as a newly-created nfsd_file can end up with its PENDING bit
    cleared while the nf is hashed, and the nf_file pointer is still zeroed
    out. Other tasks can find it in this state and they expect to see a
    valid nf_file, and can oops if nf_file is NULL.
    
    Also, there is no guarantee that we'll end up creating a new nfsd_file
    if one is already in the hash. If an extant entry is in the hash with a
    valid nf_file, nfs4_get_vfs_file will clobber its nf_file pointer with
    the value of op_file and the old nf_file will leak.
    
    Fix both issues by making a new nfsd_file_acquire_opened variant that
    takes an optional file pointer. If one is present when this is called,
    we'll take a new reference to it instead of trying to open the file. If
    the nfsd_file already has a valid nf_file, we'll just ignore the
    optional file and pass the nfsd_file back as-is.
    
    Also rework the tracepoints a bit to allow for an "opened" variant and
    don't try to avoid counting acquisitions in the case where we already
    have a cached open file.
    
    Fixes: fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file")
    Cc: Trond Myklebust <[email protected]>
    Reported-by: Stanislav Saner <[email protected]>
    Reported-and-Tested-by: Ruben Vestergaard <[email protected]>
    Reported-and-Tested-by: Torkil Svensgaard <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Fix inconsistent indenting [+ + +]

Author: Jiapeng Chong <[email protected]>
Date:   Thu Dec 2 16:35:42 2021 +0800

    NFSD: Fix inconsistent indenting
    
    [ Upstream commit 1e37d0e5bda45881eea1bec4b812def72c7d4aea ]
    
    Eliminate the following smatch warning:
    
    fs/nfsd/nfs4xdr.c:4766 nfsd4_encode_read_plus_hole() warn: inconsistent
    indenting.
    
    Reported-by: Abaci Robot <[email protected]>
    Signed-off-by: Jiapeng Chong <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: fix leaked reference count of nfsd4_ssc_umount_item [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Mon Jan 23 21:34:13 2023 -0800

    NFSD: fix leaked reference count of nfsd4_ssc_umount_item
    
    [ Upstream commit 34e8f9ec4c9ac235f917747b23a200a5e0ec857b ]
    
    The reference count of nfsd4_ssc_umount_item is not decremented
    on error conditions. This prevents the laundromat from unmounting
    the vfsmount of the source file.
    
    This patch decrements the reference count of nfsd4_ssc_umount_item
    on error.
    
    Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
    Signed-off-by: Dai Ngo <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Fix licensing header in filecache.c [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Oct 31 09:53:26 2022 -0400

    NFSD: Fix licensing header in filecache.c
    
    [ Upstream commit 3f054211b29c0fa06dfdcab402c795fd7e906be1 ]
    
    Add a missing SPDX header.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: fix net-namespace logic in __nfsd_file_cache_purge [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Mon Oct 31 11:49:21 2022 -0400

    nfsd: fix net-namespace logic in __nfsd_file_cache_purge
    
    [ Upstream commit d3aefd2b29ff5ffdeb5c06a7d3191a027a18cdb8 ]
    
    If the namespace doesn't match the one in "net", then we'll continue,
    but that doesn't cause another rhashtable_walk_next call, so it will
    loop infinitely.
    
    Fixes: ce502f81ba88 ("NFSD: Convert the filecache to use rhashtable")
    Reported-by: Petr Vorel <[email protected]>
    Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
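
The loop hazard described above (a `continue` that skips the step advancing the iterator) can be shown with a small user-space sketch. The `walker` type and `walk_next()` below are hypothetical stand-ins for an rhashtable walk, not the kernel API. Advancing the iterator in the loop condition means no `continue` can bypass it.

```c
#include <stddef.h>

/* Hypothetical cache entry tagged with a namespace id. */
struct entry {
	int net_id;
	int purged;
};

/* Hypothetical iterator over a flat array of entries. */
struct walker {
	struct entry *items;
	size_t count;
	size_t pos;
};

/* Returns the next entry, or NULL when the walk is finished. */
static struct entry *walk_next(struct walker *w)
{
	return w->pos < w->count ? &w->items[w->pos++] : NULL;
}

/* Purge entries that belong to net_id; returns how many were purged. */
static size_t purge_for_net(struct walker *w, int net_id)
{
	size_t purged = 0;
	struct entry *e;

	/* The advance happens here, so 'continue' always makes progress. */
	while ((e = walk_next(w)) != NULL) {
		if (e->net_id != net_id)
			continue;
		e->purged = 1;
		purged++;
	}
	return purged;
}
```

If the advance were instead the last statement of the loop body, the first entry from a different namespace would be revisited forever.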

NFSD: Fix nfsd_clid_class use of __string_len() macro [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Thu Feb 22 12:28:28 2024 -0500

    NFSD: Fix nfsd_clid_class use of __string_len() macro
    
    [ Upstream commit 9388a2aa453321bcf1ad2603959debea9e6ab6d4 ]
    
    I'm working on restructuring the __string* macros so that it doesn't need
    to recalculate the string twice. That is, it will save it off when
    processing __string() and the __assign_str() will not need to do the work
    again as it currently does.
    
    Currently __string_len(item, src, len) doesn't actually use "src", but my
    changes will require src to be correct as that is where the __assign_str()
    will get its value from.
    
    The event class nfsd_clid_class has:
    
      __string_len(name, name, clp->cl_name.len)
    
    But the second "name" does not exist and causes my changes to fail to
    build. That second parameter should be: clp->cl_name.data.
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: Neil Brown <[email protected]>
    Cc: Olga Kornievskaia <[email protected]>
    Cc: Dai Ngo <[email protected]>
    Cc: Tom Talpey <[email protected]>
    Cc: [email protected]
    Fixes: d27b74a8675ca ("NFSD: Use new __string_len C macros for nfsd_clid_class")
    Acked-by: Chuck Lever <[email protected]>
    Acked-by: Jeff Layton <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

nfsd: fix nfsd_file_unhash_and_dispose [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Sep 30 16:56:02 2022 -0400

    nfsd: fix nfsd_file_unhash_and_dispose
    
    [ Upstream commit 8d0d254b15cc5b7d46d85fb7ab8ecede9575e672 ]
    
    nfsd_file_unhash_and_dispose() is called for two reasons:
    
    We're either shutting down and purging the filecache, or we've gotten a
    notification about a file delete, so we want to go ahead and unhash it
    so that it'll get cleaned up when we close.
    
    We're either walking the hashtable or doing a lookup in it and we
    don't take a reference in either case. What we want to do in both cases
    is to try and unhash the object and put it on the dispose list if that
    was successful. If it's no longer hashed, then we don't want to touch
    it, with the assumption being that something else is already cleaning
    up the sentinel reference.
    
    Instead of trying to selectively decrement the refcount in this
    function, just unhash it, and if that was successful, move it to the
    dispose list. Then, the disposal routine will just clean that up as
    usual.
    
    Also, just make this a void function, drop the WARN_ON_ONCE, and the
    comments about deadlocking since the nature of the purported deadlock
    is no longer clear.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Fix null-ptr-deref in nfsd_fill_super() [+ + +]

Author: Zhang Xiaoxu <[email protected]>
Date:   Sat May 21 12:08:45 2022 +0800

    nfsd: Fix null-ptr-deref in nfsd_fill_super()
    
    [ Upstream commit 6f6f84aa215f7b6665ccbb937db50860f9ec2989 ]
    
    KASAN reports a null-ptr-deref as follows:
    
      BUG: KASAN: null-ptr-deref in nfsd_fill_super+0xc6/0xe0 [nfsd]
      Write of size 8 at addr 000000000000005d by task a.out/852
    
      CPU: 7 PID: 852 Comm: a.out Not tainted 5.18.0-rc7-dirty #66
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       kasan_report+0xab/0x120
       ? nfsd_mkdir+0x71/0x1c0 [nfsd]
       ? nfsd_fill_super+0xc6/0xe0 [nfsd]
       nfsd_fill_super+0xc6/0xe0 [nfsd]
       ? nfsd_mkdir+0x1c0/0x1c0 [nfsd]
       get_tree_keyed+0x8e/0x100
       vfs_get_tree+0x41/0xf0
       __do_sys_fsconfig+0x590/0x670
       ? fscontext_read+0x180/0x180
       ? anon_inode_getfd+0x4f/0x70
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    This can be reproduced by concurrent operations:
            1. fsopen(nfsd)/fsconfig
            2. insmod/rmmod nfsd
    
    Since the nfsd file system is registered before nfsd_net is allocated,
    a caller may get the file_system_type and use nfsd_net before it is
    allocated, which leads to the null-ptr-deref.
    
    So init_nfsd() should call register_filesystem() last.
    
    Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first")
    Signed-off-by: Zhang Xiaoxu <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: fix possible oops when nfsd/pool_stats is closed. [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Sep 12 11:25:00 2023 +1000

    NFSD: fix possible oops when nfsd/pool_stats is closed.
    
    [ Upstream commit 88956eabfdea7d01d550535af120d4ef265b1d02 ]
    
    If /proc/fs/nfsd/pool_stats is open when the last nfsd thread exits, then
    when the file is closed a NULL pointer is dereferenced.
    This is because nfsd_pool_stats_release() assumes that the
    pointer to the svc_serv cannot become NULL while a reference is held.
    
    This used to be the case but a recent patch split nfsd_last_thread() out
    from nfsd_put(), and clearing the pointer is done in nfsd_last_thread().
    
    This is easily reproduced by running
       rpc.nfsd 8 ; ( rpc.nfsd 0;true) < /proc/fs/nfsd/pool_stats
    
    Fortunately nfsd_pool_stats_release() has easy access to the svc_serv
    pointer, and so can call svc_put() on it directly.
    
    Fixes: 9f28a971ee9f ("nfsd: separate nfsd_last_thread() from nfsd_put()")
    Signed-off-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Fix potential use-after-free in nfsd_file_put() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue May 31 19:49:01 2022 -0400

    NFSD: Fix potential use-after-free in nfsd_file_put()
    
    [ Upstream commit b6c71c66b0ad8f2b59d9bc08c7a5079b110bec01 ]
    
    nfsd_file_put_noref() can free @nf, so don't dereference @nf
    immediately upon return from nfsd_file_put_noref().
    
    Suggested-by: Trond Myklebust <[email protected]>
    Fixes: 999397926ab3 ("nfsd: Clean up nfsd_file_put()")
    Signed-off-by: Chuck Lever <[email protected]>
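
The rule stated in this commit (do not touch an object after a call that may free it) can be illustrated with a generic user-space sketch. The `obj` type and both functions are hypothetical and do not reproduce the actual nfsd_file_put() change. The pattern is to read whatever is needed while the reference is still held.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object. */
struct obj {
	int refcount;
	bool hashed;
};

/* Drops one reference and frees the object when the count hits zero. */
static void obj_put_noref(struct obj *o)
{
	if (--o->refcount == 0)
		free(o);
}

/* Returns the 'hashed' state, sampled before the reference is dropped. */
static bool obj_put(struct obj *o)
{
	bool was_hashed = o->hashed;	/* snapshot while still referenced */

	obj_put_noref(o);		/* 'o' may be gone after this call */
	return was_hashed;		/* never dereference 'o' here */
}
```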

NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Wed Apr 19 10:53:18 2023 -0700

    NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop
    
    [ Upstream commit 147abcacee33781e75588869e944ddb07528a897 ]
    
    The following request sequence to the same file causes the NFS client and
    server to get into an infinite loop with COMMIT and NFS4ERR_DELAY:
    
    OPEN
    REMOVE
    WRITE
    COMMIT
    
    Problem reported by recall11, recall12, recall14, recall20, recall22,
    recall40, recall42, recall48, recall50 of nfstest suite.
    
    This patch restores the handling of the race between nfsd_file_do_acquire
    and unlink to what it was before the regression.
    
    Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
    Signed-off-by: Dai Ngo <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: fix problems with cleanup on errors in nfsd4_copy [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Tue Jan 31 11:12:29 2023 -0800

    NFSD: fix problems with cleanup on errors in nfsd4_copy
    
    [ Upstream commit 81e722978ad21072470b73d8f6a50ad62c7d5b7d ]
    
    When nfsd4_copy fails to allocate memory for async_copy->cp_src, or
    nfs4_init_copy_state fails, it calls cleanup_async_copy to clean up
    the async_copy, which causes a page fault since async_copy is not yet
    initialized.
    
    This patch rearranges the order of initializing the fields in
    async_copy and adds checks in cleanup_async_copy to skip uninitialized
    fields.
    
    Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
    Fixes: 87689df69491 ("NFSD: Shrink size of struct nfsd4_copy")
    Signed-off-by: Dai Ngo <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Fix reads with a non-zero offset that don't end on a page boundary [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Nov 23 14:14:32 2022 -0500

    NFSD: Fix reads with a non-zero offset that don't end on a page boundary
    
    [ Upstream commit ac8db824ead0de2e9111337c401409d010fba2f0 ]
    
    This was found when virtual machines with nfs-mounted qcow2 disks
    failed to boot properly.
    
    Reported-by: Anders Blomdell <[email protected]>
    Suggested-by: Al Viro <[email protected]>
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132
    Fixes: bfbfb6182ad1 ("nfsd_splice_actor(): handle compound pages")
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: fix regression with setting ACLs. [+ + +]

Author: NeilBrown <[email protected]>
Date:   Thu Sep 8 12:08:40 2022 +1000

    NFSD: fix regression with setting ACLs.
    
    [ Upstream commit 00801cd92d91e94aa04d687f9bb9a9104e7c3d46 ]
    
    A recent patch moved ACL setting into nfsd_setattr().
    Unfortunately it didn't work as nfsd_setattr() aborts early if
    iap->ia_valid is 0.
    
    Remove this test, and instead avoid calling notify_change() when
    ia_valid is 0.
    
    This means that nfsd_setattr() will now *always* lock the inode.
    Previously it didn't if only an ATTR_MODE change was requested on a
    symlink (see Commit 15b7a1b86d66 ("[PATCH] knfsd: fix setattr-on-symlink
    error return")). I don't think this change really matters.
    
    Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
    Signed-off-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
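
The control-flow change described above can be sketched as follows. This is a simplified user-space model with hypothetical types (`attr_request`, `attr_result`), not the real nfsd_setattr(). It shows why an early return on `ia_valid == 0` would drop an ACL-only request, and how guarding only the notify step avoids that.

```c
#include <stdbool.h>

/* Hypothetical request: a bitmask of plain attributes plus an optional ACL. */
struct attr_request {
	unsigned int ia_valid;
	bool has_acl;
};

/* Hypothetical record of which steps ran. */
struct attr_result {
	int notify_calls;
	int acl_calls;
};

static struct attr_result apply_attrs(const struct attr_request *req)
{
	struct attr_result res = { 0, 0 };

	/* No early return here: an ACL-only request must still proceed. */
	if (req->ia_valid != 0)
		res.notify_calls++;	/* stands in for notify_change() */
	if (req->has_acl)
		res.acl_calls++;	/* stands in for setting the ACL */
	return res;
}
```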

nfsd: fix RELEASE_LOCKOWNER [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Jan 22 14:58:16 2024 +1100

    nfsd: fix RELEASE_LOCKOWNER
    
    [ Upstream commit edcf9725150e42beeca42d085149f4c88fa97afd ]
    
    The test on so_count in nfsd4_release_lockowner() is nonsense and
    harmful.  Revert to using check_for_locks(), changing that to not sleep.
    
    First: harmful.
    As is documented in the kdoc comment for nfsd4_release_lockowner(), the
    test on so_count can transiently return a false positive resulting in a
    return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
    clearly a protocol violation and with the Linux NFS client it can cause
    incorrect behaviour.
    
    If RELEASE_LOCKOWNER is sent while some other thread is still
    processing a LOCK request which failed because, at the time that request
    was received, the given owner held a conflicting lock, then the nfsd
    thread processing that LOCK request can hold a reference (conflock) to
    the lock owner that causes nfsd4_release_lockowner() to return an
    incorrect error.
    
    The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
    never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
    it knows that the error is impossible.  It assumes the lock owner was in
    fact released so it feels free to use the same lock owner identifier in
    some later locking request.
    
    When it does reuse a lock owner identifier for which a previous RELEASE
    failed, it will naturally use a lock_seqid of zero.  However the server,
    which didn't release the lock owner, will expect a larger lock_seqid and
    so will respond with NFS4ERR_BAD_SEQID.
    
    So clearly it is harmful to allow a false positive, which testing
    so_count allows.
    
    The test is nonsense because ... well... it doesn't mean anything.
    
    so_count is the sum of three different counts.
    1/ the set of states listed on so_stateids
    2/ the set of active vfs locks owned by any of those states
    3/ various transient counts such as for conflicting locks.
    
    When it is tested against '2' it is clear that one of these is the
    transient reference obtained by find_lockowner_str_locked().  It is not
    clear what the other one is expected to be.
    
    In practice, the count is often 2 because there is precisely one state
    on so_stateids.  If there were more, this would fail.
    
    In my testing I see two circumstances when RELEASE_LOCKOWNER is called.
    In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
    all the lock states being removed, and so the lockowner being discarded
    (it is removed when there are no more references which usually happens
    when the lock state is discarded).  When nfsd4_release_lockowner() finds
    that the lock owner doesn't exist, it returns success.
    
    The other case shows an so_count of '2' and precisely one state listed
    in so_stateids.  It appears that the Linux client uses a separate lock
    owner for each file resulting in one lock state per lock owner, so this
    test on '2' is safe.  For another client it might not be safe.
    
    So this patch changes check_for_locks() to use the (newish)
    find_any_file_locked() so that it doesn't take a reference on the
    nfs4_file and so never calls nfsd_file_put(), and so never sleeps.  With
    this check it is safe to restore the use of check_for_locks() rather
    than testing so_count against the mysterious '2'.
    
    Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()")
    Signed-off-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Cc: [email protected] # v6.2+
    Signed-off-by: Chuck Lever <[email protected]>
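
The argument against testing so_count can be made concrete with a toy model. The `lockowner` struct and both predicates below are hypothetical simplifications, not the nfsd data structures. The model assumes the counter mixes long-lived and transient references, so comparing it with a magic value reports "locks held" whenever a transient reference happens to exist, whereas inspecting the locks directly does not.

```c
#include <stdbool.h>

/* Hypothetical lock owner: a mixed reference count plus the real lock count. */
struct lockowner {
	int so_count;	/* states + locks + transient references */
	int held_locks;	/* locks actually held */
};

/* Old approach in this model: anything other than the magic '2' means "held". */
static bool locks_held_by_refcount(const struct lockowner *lo)
{
	return lo->so_count != 2;
}

/* Restored approach in this model: look at the locks themselves. */
static bool locks_held_by_inspection(const struct lockowner *lo)
{
	return lo->held_locks > 0;
}
```

With one extra transient reference (`so_count == 3`, `held_locks == 0`) the first predicate returns true, which corresponds to the false NFS4ERR_LOCKS_HELD result the commit describes, while the second returns false.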

NFSD: Fix space and spelling mistake [+ + +]

Author: Zhang Jiaming <[email protected]>
Date:   Thu Jun 23 16:20:05 2022 +0800

    NFSD: Fix space and spelling mistake
    
    [ Upstream commit f532c9ff103897be0e2a787c0876683c3dc39ed3 ]
    
    Add a blank space after ','.
    Change 'succesful' to 'successful'.
    
    Signed-off-by: Zhang Jiaming <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Fix strncpy() fortify warning [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:03 2022 -0400

    NFSD: Fix strncpy() fortify warning
    
    [ Upstream commit 5304877936c0a67e1a01464d113bae4c81eacdb6 ]
    
    In function ‘strncpy’,
        inlined from ‘nfsd4_ssc_setup_dul’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1392:3,
        inlined from ‘nfsd4_interssc_connect’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1489:11:
    /home/cel/src/linux/manet/include/linux/fortify-string.h:52:33: warning: ‘__builtin_strncpy’ specified bound 63 equals destination size [-Wstringop-truncation]
       52 | #define __underlying_strncpy    __builtin_strncpy
          |                                 ^
    /home/cel/src/linux/manet/include/linux/fortify-string.h:89:16: note: in expansion of macro ‘__underlying_strncpy’
       89 |         return __underlying_strncpy(p, q, size);
          |                ^~~~~~~~~~~~~~~~~~~~
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Fix the filecache LRU shrinker [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:25:24 2022 -0400

    NFSD: Fix the filecache LRU shrinker
    
    [ Upstream commit edead3a55804739b2e4af0f35e9c7326264e7b22 ]
    
    Without LRU item rotation, the shrinker visits only a few items on
    the end of the LRU list, and those would always be long-term OPEN
    files for NFSv4 workloads. That makes the filecache shrinker
    completely ineffective.
    
    Adopt the same strategy as the inode LRU by using LRU_ROTATE.
    
    Suggested-by: Dave Chinner <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
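
The effect of LRU rotation can be sketched with a small array-backed list. Everything here (`lru`, `lru_item`, `lru_scan()`) is a hypothetical user-space model, not the kernel's list_lru API. The scan examines a bounded number of items from the head; items still in use are moved to the tail rather than left in place, so the next scan reaches different items.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache item. */
struct lru_item {
	int id;
	bool in_use;
};

#define LRU_CAP 8

/* Hypothetical LRU: items[0] is the end the shrinker scans first. */
struct lru {
	struct lru_item items[LRU_CAP];
	size_t len;
};

/* Removes and returns items[0], shifting the rest down by one. */
static struct lru_item lru_pop_head(struct lru *l)
{
	struct lru_item head = l->items[0];

	for (size_t i = 1; i < l->len; i++)
		l->items[i - 1] = l->items[i];
	l->len--;
	return head;
}

/* Scans up to nr_to_scan items; returns how many were reclaimed. */
static size_t lru_scan(struct lru *l, size_t nr_to_scan)
{
	size_t freed = 0;

	while (nr_to_scan-- > 0 && l->len > 0) {
		struct lru_item it = lru_pop_head(l);

		if (it.in_use)
			l->items[l->len++] = it;	/* rotate to the tail */
		else
			freed++;			/* reclaim */
	}
	return freed;
}
```

Without the rotation step, two busy items at the head would be the only ones a scan of size two ever sees, which matches the ineffective shrinker the commit describes.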

NFSD: Fix trace_nfsd_fh_verify_err() crasher [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sat Nov 12 15:06:07 2022 -0500

    NFSD: Fix trace_nfsd_fh_verify_err() crasher
    
    [ Upstream commit 5a01c805441bdc86e7af206d8a03735cc9394ffb ]
    
    Now that the nfsd_fh_verify_err() tracepoint is always called on
    error, it needs to handle cases where the filehandle is not yet
    fully formed.
    
    Fixes: 93c128e709ae ("nfsd: ensure we always call fh_verify_error tracepoint")
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: fix up the filecache laundrette scheduling [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Nov 2 14:44:50 2022 -0400

    nfsd: fix up the filecache laundrette scheduling
    
    [ Upstream commit 22ae4c114f77b55a4c5036e8f70409a0799a08f8 ]
    
    We don't really care whether there are hashed entries when it comes to
    scheduling the laundrette. They might all be non-gc entries, after all.
    We only want to schedule it if there are entries on the LRU.
    
    Switch to using list_lru_count, and move the check into
    nfsd_file_gc_worker. The other callsite in nfsd_file_put doesn't need to
    count entries, since it only schedules the laundrette after adding an
    entry to the LRU.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Sat Nov 5 09:49:26 2022 -0400

    nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint
    
    [ Upstream commit bdd6b5624c62d0acd350d07564f1c82fe649235f ]
    
    When we fail to insert into the hashtable with a non-retryable error,
    we'll free the object and then goto out_status. If the tracepoint is
    enabled, it'll end up accessing the freed object when it tries to
    grab the fields out of it.
    
    Set nf to NULL after freeing it to avoid the issue.
    
    Fixes: 243a5263014a ("nfsd: rework hashtable handling in nfsd_do_file_acquire")
    Reported-by: kernel test robot <[email protected]>
    Reported-by: Dan Carpenter <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
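
The fix described here (clear the pointer after freeing so a shared exit path cannot read freed memory) follows a common pattern, sketched below in user space. `item`, `trace_item()` and `acquire_item()` are hypothetical names, and the real tracepoint code differs.

```c
#include <stdlib.h>

/* Hypothetical cached object. */
struct item {
	int id;
};

/* Tracepoint stand-in: tolerates NULL instead of dereferencing it. */
static int trace_item(const struct item *it)
{
	return it ? it->id : -1;
}

/* Returns the id seen by the trace step, or -1 if no object was available. */
static int acquire_item(int id, int insert_fails)
{
	struct item *it = malloc(sizeof(*it));
	int traced;

	if (!it)
		return -1;
	it->id = id;

	if (insert_fails) {
		free(it);
		it = NULL;	/* otherwise trace_item() would read freed memory */
	}

	traced = trace_item(it);	/* shared exit path */
	free(it);			/* free(NULL) is a no-op */
	return traced;
}
```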

nfsd: fix using the correct variable for sizeof() [+ + +]

Author: Jakob Koschel <[email protected]>
Date:   Sat Mar 19 21:27:04 2022 +0100

    nfsd: fix using the correct variable for sizeof()
    
    [ Upstream commit 4fc5f5346592cdc91689455d83885b0af65d71b8 ]
    
    While the original code is valid, it is not the obvious choice for the
    sizeof() call and in preparation to limit the scope of the list iterator
    variable the sizeof should be changed to the size of the destination.
    
    Signed-off-by: Jakob Koschel <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
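
The sizeof() preference mentioned in this commit can be illustrated generically. The types below are hypothetical and are not the structures touched by the patch. Sizing the copy from the destination keeps the bound tied to the buffer being written, independent of whichever variable is used as the source.

```c
#include <string.h>

/* Hypothetical fixed-size identifier. */
struct stateid {
	unsigned char bytes[12];
};

/* Hypothetical list entry that embeds an identifier. */
struct list_entry {
	struct stateid stid;
	int other;
};

/* Copy the identifier out of an entry, bounded by the destination's size. */
static void copy_stateid(struct stateid *dst, const struct list_entry *src)
{
	memcpy(dst, &src->stid, sizeof(*dst));
}
```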

NFSD: Fix whitespace [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Mar 21 16:41:32 2022 -0400

    NFSD: Fix whitespace
    
    [ Upstream commit 26320d7e317c37404c811603d50d811132aef78c ]
    
    Clean up: Pull case arms back one tab stop to conform to every other
    switch statement in fs/nfsd/nfs4proc.c.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Flesh out a documenting comment for filecache.c [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Nov 1 13:30:46 2022 -0400

    NFSD: Flesh out a documenting comment for filecache.c
    
    [ Upstream commit b3276c1f5b268ff56622e9e125b792b4c3dc03ac ]
    
    Record what we've learned recently about the NFSD filecache in a
    documenting comment so our future selves don't forget what all this
    is for.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: handle errors better in write_ports_addfd() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    NFSD: handle errors better in write_ports_addfd()
    
    [ Upstream commit 89b24336f03a8ba560e96b0c47a8434a7fa48e3c ]
    
    If write_ports_add() fails, we shouldn't destroy the serv, unless we had
    only just created it.  So if there are any permanent sockets already
    attached, leave the serv in place.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: hold a lighter-weight client reference over CB_RECALL_ANY [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Apr 5 13:56:18 2024 -0400

    nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
    
    [ Upstream commit 10396f4df8b75ff6ab0aa2cd74296565466f2c8d ]
    
    Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
    client. While a callback job is technically an RPC that counter is
    really more for client-driven RPCs, and this has the effect of
    preventing the client from being unhashed until the callback completes.
    
    If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
    can end up in a situation where the callback can't complete on the (now
    dead) callback channel, but the new client can't connect because the old
    client can't be unhashed. This usually manifests as a NFS4ERR_DELAY
    return on the CREATE_SESSION operation.
    
    The job is only holding a reference to the client so it can clear a flag
    after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a
    reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of
    reference when dealing with the nfsdfs info files, but it should work
    appropriately here to ensure that the nfs4_client doesn't disappear.
    
    Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
    Reported-by: Vladimir Benes <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

NFSD: Hook up the filecache stat file [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:24:58 2022 -0400

    NFSD: Hook up the filecache stat file
    
    [ Upstream commit 2e6c6e4c4375bfd3defa5b1ff3604d9f33d1c936 ]
    
    There has always been the capability of exporting filecache metrics
    via /proc, but it was never hooked up. Let's surface these metrics
    to enable better observability of the filecache.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: ignore requests to disable unsupported versions [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Tue Oct 18 07:47:54 2022 -0400

    nfsd: ignore requests to disable unsupported versions
    
    [ Upstream commit 8e823bafff2308753d430566256c83d8085952da ]
    
    The kernel currently errors out if you attempt to enable or disable a
    version that it doesn't recognize. Change it to ignore attempts to
    disable an unrecognized version. If we don't support it, then there is
    no harm in doing so.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Reviewed-by: Tom Talpey <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
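
The behaviour described above can be modelled in a few lines. This is a sketch under assumptions: `MIN_VERS`, `MAX_VERS` and `set_version()` are hypothetical, and the real parsing of the versions file is more involved. Enabling an unrecognized version remains an error, while disabling one is silently accepted because the result is the same either way.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical range of versions this build recognizes. */
#define MIN_VERS 2u
#define MAX_VERS 4u

/* Updates the enabled-version bitmask; returns 0 or a negative errno. */
static int set_version(unsigned int *mask, unsigned int vers, bool enable)
{
	if (vers < MIN_VERS || vers > MAX_VERS)
		return enable ? -EINVAL : 0;	/* ignore "disable unknown" */

	if (enable)
		*mask |= 1u << vers;
	else
		*mask &= ~(1u << vers);
	return 0;
}
```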

nfsd: improve stateid access bitmask documentation [+ + +]

Author: J. Bruce Fields <[email protected]>
Date:   Tue Dec 7 17:32:21 2021 -0500

    nfsd: improve stateid access bitmask documentation
    
    [ Upstream commit 3dcd1d8aab00c5d3a0a3725253c86440b1a0f5a7 ]
    
    The use of the bitmaps is confusing.  Add a cross-reference to make it
    easier to find the existing comment.  Add an updated reference with URL
    to make it quicker to look up.  And a bit more editorializing about the
    value of this.
    
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Sep 2 18:18:16 2022 -0400

    NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND
    
    [ Upstream commit 80e591ce636f3ae6855a0ca26963da1fdd6d4508 ]
    
    When attempting an NFSv4 mount, a Solaris NFSv4 client builds a
    single large COMPOUND that chains a series of LOOKUPs to get to the
    pseudo filesystem root directory that is to be mounted. The Linux
    NFS server's current maximum of 16 operations per NFSv4 COMPOUND is
    not large enough to ensure that this works for paths that are more
    than a few components deep.
    
    Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most
    NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not
    trigger any re-allocation of the operation array on the server),
    increasing this maximum should result in little to no impact.
    
    The ops array can get large now, so allocate it via vmalloc() to
    help ensure memory fragmentation won't cause an allocation failure.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Initialize pointer ni with NULL and not plain integer 0 [+ + +]

Author: Colin Ian King <[email protected]>
Date:   Sat Sep 25 23:58:41 2021 +0100

    NFSD: Initialize pointer ni with NULL and not plain integer 0
    
    [ Upstream commit 8e70bf27fd20cc17e87150327a640e546bfbee64 ]
    
    Pointer ni is being initialized with plain integer zero. Fix
    this by initializing with NULL.
    
    Signed-off-by: Colin Ian King <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Instantiate a struct file when creating a regular NFSv4 file [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Mar 30 10:30:54 2022 -0400

    NFSD: Instantiate a struct file when creating a regular NFSv4 file
    
    [ Upstream commit fb70bf124b051d4ded4ce57511dfec6d3ebf2b43 ]
    
    There have been reports of races that cause NFSv4 OPEN(CREATE) to
    return an error even though the requested file was created. NFSv4
    does not provide a status code for this case.
    
    To mitigate some of these problems, reorganize the NFSv4
    OPEN(CREATE) logic to allocate resources before the file is actually
    created, and open the new file while the parent directory is still
    locked.
    
    Two new APIs are added:
    
    + Add an API that works like nfsd_file_acquire() but does not open
    the underlying file. The OPEN(CREATE) path can use this API when it
    already has an open file.
    
    + Add an API that is kin to dentry_open(). NFSD needs to create a
    file and grab an open "struct file *" atomically. The
    alloc_empty_file() has to be done before the inode create. If it
    fails (for example, because the NFS server has exceeded its
    max_files limit), we avoid creating the file and can still return
    an error to the NFS client.
    
    BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382
    Signed-off-by: Chuck Lever <[email protected]>
    Tested-by: JianHong Yin <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Instrument fh_verify() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Jun 21 10:06:23 2022 -0400

    NFSD: Instrument fh_verify()
    
    [ Upstream commit 051382885552e12541cc0ebf82092be374a9ed2a ]
    
    Capture file handles and how they map to local inodes. In particular,
    NFSv4 PUTFH uses fh_verify() so we can now observe which file handles
    are the target of OPEN, LOOKUP, RENAME, and so on.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: introduce struct nfsd_attrs [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: introduce struct nfsd_attrs
    
    [ Upstream commit 7fe2a71dda349a1afa75781f0cc7975be9784d15 ]
    
    The attributes that nfsd might want to set on a file include 'struct
    iattr' as well as an ACL and security label.
    The latter two are passed around quite separately from the first, in
    part because they are only needed for NFSv4.  This leads to some
    clumsiness in the code, such as the attributes NOT being set in
    nfsd_create_setattr().
    
    We need to keep the directory locked until all attributes are set to
    ensure the file is never visible without all its attributes.  This need
    combined with the inconsistent handling of attributes leads to more
    clumsiness.
    
    As a first step towards tidying this up, introduce 'struct nfsd_attrs'.
    This is passed (by reference) to vfs.c functions that work with
    attributes, and is assembled by the various nfs*proc functions which
    call them.  As yet only iattr is included, but future patches will
    expand this.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: keep track of the number of courtesy clients in the system [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Wed Sep 14 08:54:25 2022 -0700

    NFSD: keep track of the number of courtesy clients in the system
    
    [ Upstream commit 3a4ea23d86a317c4b68b9a69d51f7e84e1e04357 ]
    
    Add counter nfs4_courtesy_client_count to nfsd_net to keep track
    of the number of courtesy clients in the system.
    
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: keep track of the number of v4 clients in the system [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Fri Jul 15 16:54:52 2022 -0700

    NFSD: keep track of the number of v4 clients in the system
    
    [ Upstream commit 0926c39515aa065a296e97dfc8790026f1e53f86 ]
    
    Add counter nfs4_client_count to keep track of the total number
    of v4 clients, including courtesy clients, in the system.
    
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Leave open files out of the filecache LRU [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:25:17 2022 -0400

    NFSD: Leave open files out of the filecache LRU
    
    [ Upstream commit 4a0e73e635e3f36b616ad5c943e3d23debe4632f ]
    
    There have been reports of problems when running fstests generic/531
    against Linux NFS servers with NFSv4. The NFS server that hosts the
    test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
    Analysis shows that:
    
    fs/nfsd/filecache.c
                     ret = list_lru_walk(&nfsd_file_lru,
                                     nfsd_file_lru_cb,
                                     &head, LONG_MAX);
    
    causes nfsd_file_gc() to walk the entire length of the filecache LRU
    list every time it is called (which is quite frequently). The walk
    holds a spinlock the entire time that prevents other nfsd threads
    from accessing the filecache.
    
    What's more, for NFSv4 workloads, none of the items that are visited
    during this walk may be evicted, since they are all files that are
    held OPEN by NFS clients.
    
    Address this by ensuring that open files are not kept on the LRU
    list.
    
    Reported-by: Frank van der Linden <[email protected]>
    Reported-by: Wang Yugui <[email protected]>
    Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
    Suggested-by: Trond Myklebust <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: limit the number of v4 clients to 1024 per 1GB of system memory [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Fri Jul 15 16:54:53 2022 -0700

    NFSD: limit the number of v4 clients to 1024 per 1GB of system memory
    
    [ Upstream commit 4271c2c0887562318a0afef97d32d8a71cbe0743 ]
    
    Currently there is no limit on how many v4 clients the system
    supports. On systems with little memory, a very large number of
    clients can create memory shortage conditions that keep the
    system from functioning properly.
    
    This patch enforces a limit of 1024 NFSv4 clients, including courtesy
    clients, per 1GB of system memory.  When the number of clients
    reaches the limit, requests that create new clients are returned
    with NFS4ERR_DELAY and the laundromat is kick-started to trim old
    clients. Due to the overhead of the upcall to remove the client
    record, the maximum number of clients the laundromat removes on
    each run is limited to 128. This is done to ensure the laundromat
    can still process the other tasks in a timely manner.
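
    The arithmetic behind these two limits can be sketched as follows.
    The helper names are hypothetical, "1GB" is treated as 2^30 bytes,
    and the floor of one unit for small systems is an assumption; none
    of this is taken from the patch itself.

```c
#include <stdint.h>

#define CLIENTS_PER_GB    1024u  /* limit stated in the commit message */
#define MAX_REAP_PER_RUN  128u   /* per-run laundromat cap stated above */

/* Hypothetical helper: client limit for a given memory size in bytes. */
static uint64_t max_v4_clients(uint64_t mem_bytes)
{
    uint64_t gb = mem_bytes >> 30;  /* whole units of 2^30 bytes */

    /* Assumption: systems below one unit still get one unit's worth. */
    if (gb == 0)
        gb = 1;
    return gb * CLIENTS_PER_GB;
}

/* Hypothetical helper: how many clients one laundromat run may remove. */
static uint64_t reap_this_run(uint64_t reapable)
{
    return reapable < MAX_REAP_PER_RUN ? reapable : MAX_REAP_PER_RUN;
}
```

    Under these assumptions a 4GB server would accept up to 4096
    clients, and a backlog of 500 reapable clients would take four
    laundromat runs to clear.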
    
    Since there is now a limit on the number of clients, the 24-hr
    idle time limit for courtesy clients is no longer needed and has
    been removed.
    
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: make a copy of struct iattr before calling notify_change [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed May 17 12:26:44 2023 -0400

    nfsd: make a copy of struct iattr before calling notify_change
    
    [ Upstream commit d53d70084d27f56bcdf5074328f2c9ec861be596 ]
    
    notify_change can modify the iattr structure. In particular it can
    end up setting ATTR_MODE when ATTR_KILL_SUID is already set, causing
    a BUG() if the same iattr is passed to notify_change more than once.
    
    Make a copy of the struct iattr before calling notify_change.
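
    The hazard and the fix can be illustrated with a small user-space
    sketch. The flag values, struct attrs and apply_attrs() are
    hypothetical stand-ins; apply_attrs() only mimics a callee that
    modifies its argument and rejects the combination it produced.

```c
#define DEMO_ATTR_MODE       0x1u  /* hypothetical flag values */
#define DEMO_ATTR_KILL_SUID  0x2u

struct attrs {
    unsigned int valid;
};

/*
 * Stand-in for a callee that may modify its argument. It returns -1
 * when both flags are already set on entry, modelling the fatal case.
 */
static int apply_attrs(struct attrs *a)
{
    if ((a->valid & DEMO_ATTR_KILL_SUID) && (a->valid & DEMO_ATTR_MODE))
        return -1;
    if (a->valid & DEMO_ATTR_KILL_SUID)
        a->valid |= DEMO_ATTR_MODE;  /* side effect on the caller's data */
    return 0;
}

/* Call the callee twice, each time on a fresh copy of the original. */
static int apply_twice_with_copy(const struct attrs *orig)
{
    for (int i = 0; i < 2; i++) {
        struct attrs tmp = *orig;  /* the copy absorbs any modification */
        int err = apply_attrs(&tmp);

        if (err)
            return err;
    }
    return 0;
}
```

    Passing the same structure twice without the copy fails on the
    second call, while the copying variant leaves the original intact.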
    
    Reported-by: Zhi Li <[email protected]>
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2207969
    Tested-by: Zhi Li <[email protected]>
    Fixes: 34b91dda7124 ("NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY")
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Make it possible to use svc_set_num_threads_sync [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    NFSD: Make it possible to use svc_set_num_threads_sync
    
    [ Upstream commit 3409e4f1e8f239f0ed81be0b068ecf4e73e2e826 ]
    
    nfsd cannot currently use svc_set_num_threads_sync.  It instead
    uses svc_set_num_threads which does *not* wait for threads to all
    exit, and has a separate mechanism (nfsd_shutdown_complete) to wait
    for completion.
    
    The reason that nfsd is unlike other services is that nfsd threads can
    exit separately from svc_set_num_threads being called - they die on
    receipt of SIGKILL.  Also, when the last thread exits, the service must
    be shut down (sockets closed).
    
    For this, the nfsd_mutex needs to be taken, and as that mutex needs to
    be held while svc_set_num_threads is called, the one cannot wait for
    the other.
    
    This patch changes the nfsd thread so that it can drop the ref on the
    service without blocking on nfsd_mutex, so that svc_set_num_threads_sync
    can be used:
     - if it can drop a non-last reference, it does that.  This does not
       trigger shutdown and does not require a mutex.  This will likely
       happen for all but the last thread signalled, and for all threads
       being shut down by nfsd_shutdown_threads()
     - if it can get the mutex without blocking (trylock), it does that
       and then drops the reference.  This will likely happen for the
       last thread killed by SIGKILL
     - Otherwise there might be an unrelated task holding the mutex,
       possibly in another network namespace, or nfsd_shutdown_threads()
       might be just about to get a reference on the service, after which
       we can drop ours safely.
       We cannot conveniently get wakeup notifications on these events,
       and we are unlikely to need to, so we sleep briefly and check again.
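
    The three cases above can be sketched with C11 atomics. This is a
    user-space illustration, not the kernel code: serv_refs, serv_lock
    and try_put_service() are hypothetical names, and an atomic_flag
    stands in for the mutex trylock.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int serv_refs = 2;                  /* hypothetical refcount */
static atomic_flag serv_lock = ATOMIC_FLAG_INIT;  /* stand-in for the mutex */
static bool service_shut_down;

/* Drop one reference unless it is the last one. */
static bool put_unless_last(void)
{
    int old = atomic_load(&serv_refs);

    while (old > 1) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&serv_refs, &old, old - 1))
            return true;
    }
    return false;
}

/*
 * One attempt to drop our reference. Returns false when the caller
 * should sleep briefly and try again.
 */
static bool try_put_service(void)
{
    if (put_unless_last())              /* case 1: not the last reference */
        return true;
    if (!atomic_flag_test_and_set(&serv_lock)) {
        /* case 2: lock obtained without blocking */
        if (atomic_fetch_sub(&serv_refs, 1) == 1)
            service_shut_down = true;   /* last reference: shut down */
        atomic_flag_clear(&serv_lock);
        return true;
    }
    return false;                       /* case 3: lock busy, retry later */
}
```

    A caller would loop on try_put_service() with a short sleep
    between attempts, which mirrors the "sleep briefly and check again"
    step.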
    
    With this we can discard nfsd_shutdown_complete and
    nfsd_complete_shutdown(), and switch to svc_set_num_threads_sync.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Make nfs4_put_copy() static [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:35 2022 -0400

    NFSD: Make nfs4_put_copy() static
    
    [ Upstream commit 8ea6e2c90bb0eb74a595a12e23a1dff9abbc760a ]
    
    Clean up: All call sites are in fs/nfsd/nfs4proc.c.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 8 18:14:25 2022 -0400

    NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY
    
    [ Upstream commit 5f5f8b6d655fd947e899b1771c2f7cb581a06764 ]
    
    nfsd_unlink() can kick off a CB_RECALL (via
    vfs_unlink() -> leases_conflict()) if a delegation is present.
    Before returning NFS4ERR_DELAY, give the client holding that
    delegation a chance to return it and then retry the nfsd_unlink()
    again, once.
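
    The wait-then-retry-once pattern can be sketched as follows. All
    names and status codes here are hypothetical; the stand-in wait
    function simply pretends the client returned its delegation.

```c
#include <stdbool.h>

enum { OP_OK = 0, OP_DELAY = 1 };  /* hypothetical status codes */

static int attempts;
static bool delegation_present = true;

/* Stand-in for the unlink step: conflicts while a delegation exists. */
static int try_unlink(void)
{
    attempts++;
    return delegation_present ? OP_DELAY : OP_OK;
}

/*
 * Stand-in for waiting on the recall. Here the client always returns
 * the delegation in time; a real wait could time out and return false.
 */
static bool wait_for_delegation_return(void)
{
    delegation_present = false;
    return true;
}

/* Try once, wait for the delegation to come back, then retry once. */
static int remove_with_one_retry(void)
{
    int status = try_unlink();

    if (status != OP_DELAY)
        return status;
    if (!wait_for_delegation_return())
        return OP_DELAY;
    return try_unlink();  /* single retry; a second conflict is final */
}
```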
    
    Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
    Tested-by: Igor Mammedov <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 8 18:14:19 2022 -0400

    NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
    
    [ Upstream commit 68c522afd0b1936b48a03a4c8b81261e7597c62d ]
    
    nfsd_rename() can kick off a CB_RECALL (via
    vfs_rename() -> leases_conflict()) if a delegation is present.
    Before returning NFS4ERR_DELAY, give the client holding that
    delegation a chance to return it and then retry the nfsd_rename()
    again, once.
    
    This version of the patch handles renaming an existing file,
    but does not deal with renaming onto an existing file. That
    case will still always trigger an NFS4ERR_DELAY.
    
    Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
    Tested-by: Igor Mammedov <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: make nfsd4_run_cb a bool return function [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Mon Sep 26 14:41:01 2022 -0400

    nfsd: make nfsd4_run_cb a bool return function
    
    [ Upstream commit b95239ca4954a0d48b19c09ce7e8f31b453b4216 ]
    
    queue_work can return false and not queue anything, if the work is
    already queued. If that happens in the case of a CB_RECALL, we'll have
    taken an extra reference to the stid that will never be put. Ensure we
    throw a warning in that case.
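
    A minimal sketch of why the boolean result matters. struct work,
    struct stid, queue_once() and recall() are hypothetical stand-ins,
    and the warnings counter takes the place of a kernel warning.

```c
#include <stdbool.h>

struct work {
    bool pending;
};

struct stid {
    int refcount;
};

static int warnings;

/* Stand-in for a queueing call that refuses already-queued work. */
static bool queue_once(struct work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/*
 * Take a reference on behalf of the queued callback. If nothing was
 * queued, no callback will ever put that reference, so flag it.
 */
static bool recall(struct stid *s, struct work *w)
{
    s->refcount++;
    if (!queue_once(w)) {
        warnings++;  /* the extra reference is now unbalanced */
        return false;
    }
    return true;
}
```

    Calling recall() twice on the same work item takes two references
    but queues only once, which is exactly the situation the warning
    is meant to expose.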
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 8 18:14:13 2022 -0400

    NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
    
    [ Upstream commit 34b91dda7124fc3259e4b2ae53e0c933dedfec01 ]
    
    nfsd_setattr() can kick off a CB_RECALL (via
    notify_change() -> break_lease()) if a delegation is present. Before
    returning NFS4ERR_DELAY, give the client holding that delegation a
    chance to return it and then retry the nfsd_setattr() again, once.
    
    Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
    Tested-by: Igor Mammedov <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: make nfsd_stats.th_cnt atomic_t [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    nfsd: make nfsd_stats.th_cnt atomic_t
    
    [ Upstream commit 9b6c8c9bebccd5fb785c306b948c08874a88874d ]
    
    This allows us to move the updates for th_cnt out of the mutex.
    This is a step towards reducing mutex coverage in nfsd().
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: map EBADF [+ + +]

Author: Peng Tao <[email protected]>
Date:   Sat Dec 18 20:37:54 2021 -0500

    nfsd: map EBADF
    
    [ Upstream commit b3d0db706c77d02055910fcfe2f6eb5155ff9d5e ]
    
    Now that we have an open file cache, it is possible that another
    client deletes the file and the DP will not know about it. Then IO
    to the MDS would fail with BADSTATEID and knfsd would start state
    recovery, which should fail as well, and then nfs read/write will
    fail with EBADF. This triggers a WARN() in nfserrno().
    
    -----------[ cut here ]------------
    WARNING: CPU: 0 PID: 13529 at fs/nfsd/nfsproc.c:758 nfserrno+0x58/0x70 [nfsd]()
    nfsd: non-standard errno: -9
    modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_connt
    pata_acpi floppy
    CPU: 0 PID: 13529 Comm: nfsd Tainted: G        W       4.1.5-00307-g6e6579b #7
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
     0000000000000000 00000000464e6c9c ffff88079085fba8 ffffffff81789936
     0000000000000000 ffff88079085fc00 ffff88079085fbe8 ffffffff810a08ea
     ffff88079085fbe8 ffff88080f45c900 ffff88080f627d50 ffff880790c46a48
     Call Trace:
     [<ffffffff81789936>] dump_stack+0x45/0x57
     [<ffffffff810a08ea>] warn_slowpath_common+0x8a/0xc0
     [<ffffffff810a0975>] warn_slowpath_fmt+0x55/0x70
     [<ffffffff81252908>] ? splice_direct_to_actor+0x148/0x230
     [<ffffffffa02fb8c0>] ? fsid_source+0x60/0x60 [nfsd]
     [<ffffffffa02f9918>] nfserrno+0x58/0x70 [nfsd]
     [<ffffffffa02fba57>] nfsd_finish_read+0x97/0xb0 [nfsd]
     [<ffffffffa02fc7a6>] nfsd_splice_read+0x76/0xa0 [nfsd]
     [<ffffffffa02fcca1>] nfsd_read+0xc1/0xd0 [nfsd]
     [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
     [<ffffffffa03073da>] nfsd3_proc_read+0xba/0x150 [nfsd]
     [<ffffffffa02f7a03>] nfsd_dispatch+0xc3/0x210 [nfsd]
     [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
     [<ffffffffa0232913>] svc_process_common+0x453/0x6f0 [sunrpc]
     [<ffffffffa0232cc3>] svc_process+0x113/0x1b0 [sunrpc]
     [<ffffffffa02f740f>] nfsd+0xff/0x170 [nfsd]
     [<ffffffffa02f7310>] ? nfsd_destroy+0x80/0x80 [nfsd]
     [<ffffffff810bf3a8>] kthread+0xd8/0xf0
     [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0
     [<ffffffff817912a2>] ret_from_fork+0x42/0x70
     [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0
    
    Signed-off-by: Peng Tao <[email protected]>
    Signed-off-by: Lance Shelton <[email protected]>
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Modernize nfsd4_release_lockowner() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sun May 22 12:07:18 2022 -0400

    NFSD: Modernize nfsd4_release_lockowner()
    
    [ Upstream commit bd8fdb6e545f950f4654a9a10d7e819ad48146e5 ]
    
    Refactor: Use existing helpers that other lock operations use. This
    change removes several automatic variables, so re-organize the
    variable declarations for readability.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Move copy offload callback arguments into a separate structure [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:41:18 2022 -0400

    NFSD: Move copy offload callback arguments into a separate structure
    
    [ Upstream commit a11ada99ce93a79393dc6683d22f7915748c8f6b ]
    
    Refactor so that CB_OFFLOAD arguments can be passed without
    allocating a whole struct nfsd4_copy object. On my system (x86_64)
    this removes another 96 bytes from struct nfsd4_copy.
    
    [ cel: adjusted to apply to v5.15.y ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsd [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Mon May 2 14:19:23 2022 -0700

    NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsd
    
    [ Upstream commit d76cc46b37e123e8d245cc3490978dbda56f979d ]
    
    This patch moves create/destroy of laundry_wq from nfs4_state_start
    and nfs4_state_shutdown_net to init_nfsd and exit_nfsd to prevent
    the laundromat from being freed while a thread is processing a
    conflicting lock.
    
    Reviewed-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Move documenting comment for nfsd4_process_open2() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Mar 23 13:55:37 2022 -0400

    NFSD: Move documenting comment for nfsd4_process_open2()
    
    [ Upstream commit 7e2ce0cc15a509b859199235a2bad9cece00f67a ]
    
    Clean up nfsd4_open() by converting a large comment at the only
    call site for nfsd4_process_open2() to a kerneldoc comment in
    front of that function.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: move filehandle format declarations out of "uapi". [+ + +]

Author: NeilBrown <[email protected]>
Date:   Thu Sep 2 11:14:47 2021 +1000

    NFSD: move filehandle format declarations out of "uapi".
    
    [ Upstream commit ef5825e3cf0d0af657f5fb4dd86d750ed42fee0a ]
    
    A small part of the declarations concerning the filehandle format is
    currently in the "uapi" include directory:
       include/uapi/linux/nfsd/nfsfh.h
    
    There is a lot more to the filehandle format, including "enum fid_type"
    and "enum nfsd_fsid" which are not exported via "uapi".
    
    This small part of the filehandle definition is of minimal use outside
    of the kernel, and I can find no evidence that any other code is using
    it. Certainly nfs-utils and wireshark (the most likely candidates) do not
    use these declarations.
    
    So move it out of "uapi" by copying the content from
      include/uapi/linux/nfsd/nfsfh.h
    into
      fs/nfsd/nfsfh.h
    
    A few unnecessary "#include" directives are not copied, and neither is
    the #define of fh_auth, which is annotated as being for userspace only.
    
    The copyright claims in the uapi file are identical to those in the nfsd
    file, so there is no need to copy those.
    
    The "__u32" style integer types are only needed in "uapi".  In
    kernel-only code we can use the more familiar "u32" style.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Move fill_pre_wcc() and fill_post_wcc() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Dec 24 14:36:49 2021 -0500

    NFSD: Move fill_pre_wcc() and fill_post_wcc()
    
    [ Upstream commit fcb5e3fa012351f3b96024c07bc44834c2478213 ]
    
    These functions are related to file handle processing and have
    nothing to do with XDR encoding or decoding. Also they are no longer
    NFSv3-specific. As a clean-up, move their definitions to a more
    appropriate location. WCC is also an NFSv3-specific term, so rename
    them as general-purpose helpers.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: move from strlcpy with unused retval to strscpy [+ + +]

Author: Wolfram Sang <[email protected]>
Date:   Thu Aug 18 23:01:14 2022 +0200

    NFSD: move from strlcpy with unused retval to strscpy
    
    [ Upstream commit 72f78ae00a8e5d7abe13abac8305a300f6afd74b ]
    
    Follow the advice of the below link and prefer 'strscpy' in this
    subsystem. Conversion is 1:1 because the return value is not used.
    Generated by a coccinelle script.
    
    Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
    Signed-off-by: Wolfram Sang <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Move nfsd_file_trace_alloc() tracepoint [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:26:49 2022 -0400

    NFSD: Move nfsd_file_trace_alloc() tracepoint
    
    [ Upstream commit b40a2839470cd62ed68c4a32d72a18ee8975b1ac ]
    
    Avoid recording the allocation of an nfsd_file item that is
    immediately released because a matching item was already
    inserted in the hash.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: move nfserrno() to vfs.c [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Tue Oct 18 07:47:55 2022 -0400

    nfsd: move nfserrno() to vfs.c
    
    [ Upstream commit cb12fae1c34b1fa7eaae92c5aadc72d86d7fae19 ]
    
    nfserrno() is common to all nfs versions, but nfsproc.c is specifically
    for NFSv2. Move it to vfs.c, and the prototype to vfs.h.
    
    While we're in here, remove the #ifdef EDQUOT check in this function.
    It's apparently a holdover from the initial merge of the nfsd code in
    1997. No other place in the kernel checks that that symbol is defined
    before using it, so I think we can dispense with it here.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Move svc_serv_ops::svo_function into struct svc_serv [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Feb 16 12:16:27 2022 -0500

    NFSD: Move svc_serv_ops::svo_function into struct svc_serv
    
    [ Upstream commit 37902c6313090235c847af89c5515591261ee338 ]
    
    Hoist svo_function back into svc_serv and remove struct
    svc_serv_ops, since the struct is now devoid of fields.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: narrow nfsd_mutex protection in nfsd thread [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    NFSD: narrow nfsd_mutex protection in nfsd thread
    
    [ Upstream commit 9d3792aefdcda71d20c2b1ecc589c17ae71eb523 ]
    
    There is nothing happening in the start of nfsd() that requires
    protection by the mutex, so don't take it until shutting down the thread
    - which does still require protection - but only for nfsd_put().
    
    Signed-off-by: NeilBrown <[email protected]>
    [ cel: address merge conflict with fd2468fa1301 ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Never call nfsd_file_gc() in foreground paths [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:25:30 2022 -0400

    NFSD: Never call nfsd_file_gc() in foreground paths
    
    [ Upstream commit 6df19411367a5fb4ef61854cbd1af269c077f917 ]
    
    The checks in nfsd_file_acquire() and nfsd_file_put() that directly
    invoke filecache garbage collection are intended to keep cache
    occupancy between a low- and high-watermark. The reason to limit the
    capacity of the filecache is to keep filecache lookups reasonably
    fast.
    
    However, invoking garbage collection at those points has some
    undesirable negative impacts. Files that are held open by NFSv4
    clients often push the occupancy of the filecache over these
    watermarks. At that point:
    
    - Every call to nfsd_file_acquire() and nfsd_file_put() results in
      an LRU walk. This has the same effect on lookup latency as long
      chains in the hash table.
    - Garbage collection will then run on every nfsd thread, causing a
      lot of unnecessary lock contention.
    - Limiting cache capacity pushes out files used only by NFSv3
      clients, which are the type of files the filecache is supposed to
      help.
    
    To address those negative impacts, remove the direct calls to the
    garbage collector. Subsequent patches will address maintaining
    lookup efficiency as cache capacity increases.
    
    Suggested-by: Wang Yugui <[email protected]>
    Suggested-by: Dave Chinner <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: nfsd_file_hash_remove can compute hashval [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:26:03 2022 -0400

    NFSD: nfsd_file_hash_remove can compute hashval
    
    [ Upstream commit cb7ec76e73ff6640241c8f1f2f35c81d4005a2d6 ]
    
    Remove an unnecessary use of nf_hashval.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: NFSD_FILE_KEY_INODE only needs to find GC'ed entries [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Jan 6 10:39:00 2023 -0500

    nfsd: NFSD_FILE_KEY_INODE only needs to find GC'ed entries
    
    [ Upstream commit 6c31e4c98853a4ba47355ea151b36a77c42b7734 ]
    
    Since v4 files are expected to be long-lived, there's little value in
    closing them out of the cache when there is conflicting access.
    
    Change the comparator to also match the gc value in the key. Change both
    of the current users of that key to set the gc value in the key to
    "true".
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: nfsd_file_put() can sleep [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed May 11 13:02:21 2022 -0400

    NFSD: nfsd_file_put() can sleep
    
    [ Upstream commit 08af54b3e5729bc1d56ad3190af811301bdc37a1 ]
    
    Now that there are no more callers of nfsd_file_put() that might
    hold a spin lock, ensure the lockdep infrastructure can catch
    newly introduced calls to nfsd_file_put() made while a spinlock
    is held.
    
    Link: https://lore.kernel.org/linux-nfs/[email protected]/T/#mf1855552570cf9a9c80d1e49d91438cd9085aada
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:25:50 2022 -0400

    NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
    
    [ Upstream commit 8755326399f471ec3b31e2ab8c5074c0d28a0fb5 ]
    
    Remove an unnecessary usage of nf_hashval.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: nfserrno(-ENOMEM) is nfserr_jukebox [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:09 2022 -0400

    NFSD: nfserrno(-ENOMEM) is nfserr_jukebox
    
    [ Upstream commit bb4d842722b84a2731257054b6405f2d866fc5f3 ]
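
    The mapping named in the subject line can be pictured as one more
    row in an errno-to-status table. The sketch below is a user-space
    illustration: map_errno() and the table contents are hypothetical,
    the numeric values follow the NFSv3 status codes, and the fallback
    to an I/O error for unknown values is an assumption.

```c
#include <errno.h>
#include <stddef.h>

enum { ST_OK = 0, ST_NOENT = 2, ST_IO = 5, ST_JUKEBOX = 10008 };

static int unmapped_warnings;

/* Translate a negative errno into a protocol status code. */
static int map_errno(int err)
{
    static const struct {
        int status;
        int err;
    } table[] = {
        { ST_OK, 0 },
        { ST_NOENT, -ENOENT },
        { ST_IO, -EIO },
        { ST_JUKEBOX, -ENOMEM },  /* the mapping this commit is about */
    };

    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].err == err)
            return table[i].status;
    }
    unmapped_warnings++;  /* stand-in for a "non-standard errno" warning */
    return ST_IO;
}
```

    A jukebox status tells the client to retry later, which suits a
    temporary memory shortage better than a hard I/O error.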
    
    Suggested-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: NFSv4 CLOSE should release an nfsd_file immediately [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:27:02 2022 -0400

    NFSD: NFSv4 CLOSE should release an nfsd_file immediately
    
    [ Upstream commit 5e138c4a750dc140d881dab4a8804b094bbc08d2 ]
    
    The last close of a file should enable other accessors to open and
    use that file immediately. Leaving the file open in the filecache
    prevents other users from accessing that file until the filecache
    garbage-collects the file -- sometimes that takes several seconds.
    
    Reported-by: Wang Yugui <[email protected]>
    Link: https://bugzilla.linux-nfs.org/show_bug.cgi?387
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: No longer record nf_hashval in the trace log [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:25:37 2022 -0400

    NFSD: No longer record nf_hashval in the trace log
    
    [ Upstream commit 54f7df7094b329ca35d9f9808692bb16c48b13e9 ]
    
    I'm about to replace nfsd_file_hashtbl with an rhashtable. The
    individual hash values will no longer be visible or relevant, so
    remove them from the tracepoints.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: only call fh_unlock() once in nfsd_link() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: only call fh_unlock() once in nfsd_link()
    
    [ Upstream commit e18bcb33bc5b69bccc2b532075aa00bb49cc01c5 ]
    
    On non-error paths, nfsd_link() calls fh_unlock() twice.  This is safe
    because fh_unlock() records that the unlock has been done and doesn't
    repeat it.
    However it makes the code a little confusing and interferes with changes
    that are planned for directory locking.
    
    So rearrange the code to ensure fh_unlock() is called exactly once if
    fh_lock() was called.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: only fill out return pointer on success in nfsd4_lookup_stateid [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Mon Sep 26 12:38:44 2022 -0400

    nfsd: only fill out return pointer on success in nfsd4_lookup_stateid
    
    [ Upstream commit 4d01416ab41540bb13ec4a39ac4e6c4aa5934bc9 ]
    
    In the case of a revoked delegation, we still fill out the pointer even
    when returning an error, which is bad form. Only overwrite the pointer
    on success.
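
    The convention can be shown with a short sketch. struct state and
    lookup_state() are hypothetical; the point is only that the output
    pointer is written on the success path and nowhere else.

```c
#include <stddef.h>

struct state {
    int revoked;
};

/*
 * Look up an object and hand it back through *out. On any error the
 * caller's pointer is left exactly as it was.
 */
static int lookup_state(struct state *found, struct state **out)
{
    if (found == NULL)
        return -1;
    if (found->revoked)
        return -2;  /* error: *out is deliberately not touched */
    *out = found;
    return 0;
}
```

    A caller that initialises its pointer to NULL can then rely on it
    staying NULL whenever the lookup fails.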
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Optimize DRC bucket pruning [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 20 15:25:21 2021 -0400

    NFSD: Optimize DRC bucket pruning
    
    [ Upstream commit 8847ecc9274a14114385d1cb4030326baa0766eb ]
    
    DRC bucket pruning is done by nfsd_cache_lookup(), which is part of
    every NFSv2 and NFSv3 dispatch (ie, it's done while the client is
    waiting).
    
    I added a trace_printk() in prune_bucket() to see just how long
    it takes to prune. Here are two ends of the spectrum:
    
     prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining
     prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining
    ...
     prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining
    
    Pruning latency is noticeable on fast transports with fast storage.
    By noticeable, I mean that the latency measured here in the worst
    case is the same order of magnitude as the round trip time for
    cached server operations.
    
    We could do something like moving expired entries to an expired list
    and then free them later instead of freeing them right in
    prune_bucket(). But simply limiting the number of entries that can
    be pruned by a lookup is simple and retains more entries in the
    cache, making the DRC somewhat more effective.
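
    Capping the work done per lookup can be sketched as below. This is
    an illustration only: struct entry, the array-based bucket and the
    cap of 3 are assumptions, since the message does not state the
    value the patch uses.

```c
#include <stddef.h>

#define MAX_PRUNE_PER_LOOKUP 3  /* hypothetical cap */

struct entry {
    long expires;
    int freed;
};

/*
 * Prune expired entries from a bucket sorted oldest first, but stop
 * after a fixed number so a single lookup never pays for a long scan.
 * Returns the number of entries freed by this call.
 */
static size_t prune_bucket(struct entry *e, size_t n, long now)
{
    size_t freed = 0;

    for (size_t i = 0; i < n; i++) {
        if (e[i].freed)
            continue;
        if (e[i].expires > now)
            break;  /* everything after this entry is newer */
        e[i].freed = 1;
        if (++freed == MAX_PRUNE_PER_LOOKUP)
            break;  /* leave the rest for a later lookup */
    }
    return freed;
}
```

    With four expired entries and a cap of three, the first call frees
    three and the next call frees the remaining one, so the cost is
    spread across lookups.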
    
    Comparison with a 70/30 fio 8KB 12 thread direct I/O test:
    
    Before:
    
      write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets
    
    WRITE:
            1848726 ops (30%)
            avg bytes sent per op: 8340 avg bytes received per op: 136
            backlog wait: 0.635158  RTT: 0.128525   total execute time: 0.827242 (milliseconds)
    
    After:
    
      write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets
    
    WRITE:
            1891144 ops (30%)
            avg bytes sent per op: 8340 avg bytes received per op: 136
            backlog wait: 0.616114  RTT: 0.126842   total execute time: 0.805348 (milliseconds)
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Optimize nfsd4_encode_fattr() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 22 16:08:45 2022 -0400

    NFSD: Optimize nfsd4_encode_fattr()
    
    [ Upstream commit ab04de60ae1cc64ae16b77feae795311b97720c7 ]
    
    write_bytes_to_xdr_buf() is a generic way to place a variable-length
    data item in an already-reserved spot in the encoding buffer.
    
    However, it is costly. In nfsd4_encode_fattr(), it is unnecessary
    because the data item is fixed in size and the buffer destination
    address is always word-aligned.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
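
A rough userspace sketch of the difference between the generic byte-copy path and a direct store into a reserved, word-aligned slot. The helper names `write_bytes_generic()` and `write_word_direct()` are hypothetical stand-ins, not the kernel functions, and `htonl()` is used here in place of the kernel's `cpu_to_be32()`.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a generic "copy N bytes to an offset" helper. */
static void write_bytes_generic(uint8_t *buf, size_t offset,
				const void *data, size_t len)
{
	memcpy(buf + offset, data, len);
}

/*
 * When the item is exactly one 32-bit XDR word and the reserved slot is
 * word-aligned, a single big-endian store does the same job.
 */
static void write_word_direct(uint32_t *slot, uint32_t value)
{
	*slot = htonl(value);
}
```

Both helpers leave the same bytes in the buffer; the direct form just avoids the bookkeeping a variable-length copy needs.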

NFSD: Optimize nfsd4_encode_operation() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 22 16:08:38 2022 -0400

    NFSD: Optimize nfsd4_encode_operation()
    
    [ Upstream commit 095a764b7afb06c9499b798c04eaa3cbf70ebe2d ]
    
    write_bytes_to_xdr_buf() is a generic way to place a variable-length
    data item in an already-reserved spot in the encoding buffer.
    However, it is costly, and here, it is unnecessary because the
    data item is fixed in size, the buffer destination address is
    always word-aligned, and the destination location is already in
    @p.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Optimize nfsd4_encode_readv() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 22 16:09:04 2022 -0400

    NFSD: Optimize nfsd4_encode_readv()
    
    [ Upstream commit 28d5bc468efe74b790e052f758ce083a5015c665 ]
    
    write_bytes_to_xdr_buf() is pretty expensive to use for inserting
    an XDR data item that is always 1 XDR_UNIT at an address that is
    always XDR word-aligned.
    
    Since both the readv and splice read paths encode EOF and maxcount
    values, move both to a common code path.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Pack struct nfsd4_compoundres [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 12 17:23:36 2022 -0400

    NFSD: Pack struct nfsd4_compoundres
    
    [ Upstream commit 9f553e61bd36c1048543ac2f6945103dd2f742be ]
    
    Remove a couple of 4-byte holes on platforms with 64-bit pointers.
    
    Signed-off-by: Chuck Lever <[email protected]>
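
To illustrate what a "4-byte hole" looks like, here is a small sketch with two made-up structures. These are not the fields of struct nfsd4_compoundres; they only show how placing 4-byte members next to each other removes padding on platforms with 64-bit pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: each 4-byte field is followed by a pointer. */
struct holes {
	uint32_t a;	/* 4 bytes, then 4 bytes of padding on LP64 */
	void *p;
	uint32_t b;	/* 4 bytes, then 4 bytes of padding on LP64 */
	void *q;
};

/* Same members, with the two 4-byte fields placed next to each other. */
struct packed_by_order {
	void *p;
	void *q;
	uint32_t a;
	uint32_t b;
};
```

On a typical LP64 ABI the first structure occupies 32 bytes and the second 24; a tool such as pahole reports the same information for real kernel structures.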

NFSD: Pass the target nfsd_file to nfsd_commit() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:46:38 2022 -0400

    NFSD: Pass the target nfsd_file to nfsd_commit()
    
    [ Upstream commit c252849082ff525af18b4f253b3c9ece94e951ed ]
    
    In a moment I'm going to introduce separate nfsd_file types, one of
    which is garbage-collected; the other, not. The garbage-collected
    variety is to be used by NFSv2 and v3, and the non-garbage-collected
    variety is to be used by NFSv4.
    
    nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want
    nfsd_commit() to find and use the correct variety of cached
    nfsd_file object for the NFS version that is in use.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Tested-by: Jeff Layton <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Propagate some error code returned by memdup_user() [+ + +]

Author: Christophe JAILLET <[email protected]>
Date:   Thu Sep 1 07:27:19 2022 +0200

    nfsd: Propagate some error code returned by memdup_user()
    
    [ Upstream commit 30a30fcc3fc1ad4c5d017c9fcb75dc8f59e7bdad ]
    
    Propagate the error code returned by memdup_user() instead of a hard coded
    -EFAULT.
    
    Suggested-by: Dan Carpenter <[email protected]>
    Signed-off-by: Christophe JAILLET <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
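
The error-propagation pattern can be sketched in userspace as below. The `err_ptr()`, `ptr_err()` and `is_err()` helpers imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() under assumed semantics, and `dup_buffer()` is a hypothetical stand-in for memdup_user(); none of this is the actual nfsd code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace imitations of the kernel's ERR_PTR helpers (assumed semantics). */
#define MAX_ERRNO 4095

static void *err_ptr(long err)
{
	return (void *)err;
}

static long ptr_err(const void *p)
{
	return (long)p;
}

static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for memdup_user(): it can fail with different codes. */
static void *dup_buffer(const void *src, size_t len)
{
	void *p;

	if (!src)
		return err_ptr(-EFAULT);
	p = malloc(len);
	if (!p)
		return err_ptr(-ENOMEM);
	memcpy(p, src, len);
	return p;
}

/* Propagate whatever the helper reported instead of a hard-coded -EFAULT. */
static long consume(const void *src, size_t len)
{
	void *copy = dup_buffer(src, len);

	if (is_err(copy))
		return ptr_err(copy);
	free(copy);
	return 0;
}
```

With this shape, an allocation failure surfaces as -ENOMEM rather than being misreported as a bad user address.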

NFSD: Protect against filesystem freezing [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Mar 6 10:43:47 2023 -0500

    NFSD: Protect against filesystem freezing
    
    [ Upstream commit fd9a2e1d513823e840960cb3bc26d8b7749d4ac2 ]
    
    Flole observes this WARNING on occasion:
    
    [1210423.486503] WARNING: CPU: 8 PID: 1524732 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0
    
    Reported-by: <[email protected]>
    Suggested-by: Jan Kara <[email protected]>
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=217123
    Fixes: 73da852e3831 ("nfsd: use vfs_iter_read/write")
    Reviewed-by: Jeff Layton <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: put the export reference in nfsd4_verify_deleg_dentry [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Tue Nov 8 11:23:11 2022 -0500

    nfsd: put the export reference in nfsd4_verify_deleg_dentry
    
    [ Upstream commit 50256e4793a5e5ab77703c82a47344ad2e774a59 ]
    
    nfsd_lookup_dentry returns an export reference in addition to the dentry
    ref. Ensure that we put it too.
    
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2138866
    Fixes: 876c553cb410 ("NFSD: verify the opened dentry after setting a delegation")
    Reported-by: Yongcheng Yang <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Record number of flush calls [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:24:45 2022 -0400

    NFSD: Record number of flush calls
    
    [ Upstream commit df2aff524faceaf743b7c5ab0f4fb86cb511f782 ]
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 12 17:22:44 2022 -0400

    NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing
    
    [ Upstream commit 3fdc546462348b8a497c72bc894e0cde9f10fc40 ]
    
    Have SunRPC clear everything except for the iops array. Then have
    each NFSv4 XDR decoder clear its own argument before decoding.
    
    Now individual operations may have a large argument struct while not
    penalizing the vast majority of operations with a small struct.
    
    And, clearing the argument structure occurs as the argument fields
    are initialized, enabling the CPU to do write combining on that
    memory. In some cases, clearing is not even necessary because all
    of the fields in the argument structure are initialized by the
    decoder.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: reduce locking in nfsd_lookup() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: reduce locking in nfsd_lookup()
    
    [ Upstream commit 19d008b46941b8c668402170522e0f7a9258409c ]
    
    nfsd_lookup() takes an exclusive lock on the parent inode, but no
    callers want the lock and it may not be needed at all if the
    result is in the dcache.
    
    Change nfsd_lookup_dentry() to not take the lock, and call
    lookup_one_len_locked() which takes lock only if needed.
    
    nfsd4_open() currently expects the lock to still be held, but that isn't
    necessary as nfsd_validate_delegated_dentry() provides required
    guarantees without the lock.
    
    NOTE: NFSv4 requires directory changeinfo for OPEN even when a create
      wasn't requested and no change happened.  Now that nfsd_lookup()
      doesn't use fh_lock(), we need to explicitly fill the attributes
      when no create happens.  A new fh_fill_both_attrs() is provided
      for that task.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor __nfsd_file_close_inode() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:25:57 2022 -0400

    NFSD: Refactor __nfsd_file_close_inode()
    
    [ Upstream commit a845511007a63467fee575353c706806c21218b1 ]
    
    The code that computes the hashval is the same in both callers.
    
    To prevent them from going stale, reframe the documenting comments
    to remove descriptions of the underlying hash table structure, which
    is about to be replaced.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor common code out of dirlist helpers [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 12 17:22:56 2022 -0400

    NFSD: Refactor common code out of dirlist helpers
    
    [ Upstream commit 98124f5bd6c76699d514fbe491dd95265369cc99 ]
    
    The dust has settled a bit and it's become obvious what code is
    totally common between nfsd_init_dirlist_pages() and
    nfsd3_init_dirlist_pages(). Move that common code to SUNRPC.
    
    The new helper brackets the existing xdr_init_decode_pages() API.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor find_file() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:47:47 2022 -0400

    NFSD: Refactor find_file()
    
    [ Upstream commit 15424748001a9b5ea62b3e6ad45f0a8b27f01df9 ]
    
    find_file() is now the only caller of find_file_locked(), so just
    fold these two together.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2) [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:47 2022 -0400

    NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
    
    [ Upstream commit 24d796ea383b8a4c8234e06d1b14bbcd371192ea ]
    
    The @src parameter is sometimes a pointer to a struct nfsd_file and
    sometimes a pointer to struct file hiding in a phony struct
    nfsd_file. Refactor nfsd4_cleanup_inter_ssc() so the @src parameter
    is always an explicit struct file.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2) [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:53 2022 -0400

    NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
    
    [ Upstream commit 478ed7b10d875da2743d1a22822b9f8a82df8f12 ]
    
    Move the nfsd4_cleanup_*() call sites out of nfsd4_do_copy(). A
    subsequent patch will modify one of the new call sites to avoid
    the need to manufacture the phony struct nfsd_file.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor nfsd4_do_copy() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:59 2022 -0400

    NFSD: Refactor nfsd4_do_copy()
    
    [ Upstream commit 3b7bf5933cada732783554edf0dc61283551c6cf ]
    
    Refactor: Now that nfsd4_do_copy() no longer calls the cleanup
    helpers, plumb the use of struct file pointers all the way down to
    _nfsd_copy_file_range().
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor nfsd_create_setattr() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Mar 28 16:10:17 2022 -0400

    NFSD: Refactor nfsd_create_setattr()
    
    [ Upstream commit 5f46e950c395b9c14c282b53ba78c5fd46d6c256 ]
    
    I'd like to move do_nfsd_create() out of vfs.c. Therefore
    nfsd_create_setattr() needs to be made publicly visible.
    
    Note that both call sites in vfs.c commit both the new object and
    its parent directory, so just combine those common metadata commits
    into nfsd_create_setattr().
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor nfsd_file_gc() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:24:25 2022 -0400

    NFSD: Refactor nfsd_file_gc()
    
    [ Upstream commit 3bc6d3470fe412f818f9bff6b71d1be3a76af8f3 ]
    
    Refactor nfsd_file_gc() to use the new list_lru helper.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor nfsd_file_lru_scan() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:24:31 2022 -0400

    NFSD: Refactor nfsd_file_lru_scan()
    
    [ Upstream commit 39f1d1ff8148902c5692ffb0e1c4479416ab44a7 ]
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor nfsd_setattr() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 8 18:14:07 2022 -0400

    NFSD: Refactor nfsd_setattr()
    
    [ Upstream commit c0aa1913db57219e91a0a8832363cbafb3a9cf8f ]
    
    Move code that will be retried (in a subsequent patch) into a helper
    function.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor NFSv3 CREATE [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Mar 28 13:29:23 2022 -0400

    NFSD: Refactor NFSv3 CREATE
    
    [ Upstream commit df9606abddfb01090d5ece7dcc2441d848f690f0 ]
    
    The NFSv3 CREATE and NFSv4 OPEN(CREATE) use cases are about to
    diverge such that it makes sense to split do_nfsd_create() into one
    version for NFSv3 and one for NFSv4.
    
    As a first step, copy do_nfsd_create() to nfs3proc.c and remove
    NFSv4-specific logic.
    
    One immediate legibility benefit is that the logic for handling
    NFSv3 createhow is now quite straightforward. NFSv4 createhow
    has some subtleties that IMO do not belong in generic code.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Refactor NFSv4 OPEN(CREATE) [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Mar 28 14:47:34 2022 -0400

    NFSD: Refactor NFSv4 OPEN(CREATE)
    
    [ Upstream commit 254454a5aa4a9f696d6bae080c08d5863e650f49 ]
    
    Copy do_nfsd_create() to nfs4proc.c and remove NFSv3-specific logic.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Wed Nov 16 19:44:45 2022 -0800

    NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
    
    [ Upstream commit a1049eb47f20b9eabf9afb218578fff16b4baca6 ]
    
    Refactor courtesy_client_reaper into a generic low-memory
    shrinker so it can be used for other purposes.
    
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: refactoring v4 specific code to a helper in nfs4state.c [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Fri Jul 15 16:54:51 2022 -0700

    NFSD: refactoring v4 specific code to a helper in nfs4state.c
    
    [ Upstream commit 6867137ebcf4155fe25f2ecf7c29b9fb90a76d1d ]
    
    This patch moves the v4-specific code from nfsd_init_net() to the
    nfsd4_init_leases_net() helper in nfs4state.c.
    
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Wed Jan 11 12:17:09 2023 -0800

    NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time
    
    [ Upstream commit f385f7d244134246f984975ed34cd75f77de479f ]
    
    Currently the nfsd-client shrinker is registered and unregistered at
    the time the nfsd module is loaded and unloaded. The problem with this
    is the shrinker is being registered before all of the relevant fields
    in nfsd_net are initialized when nfsd is started. This can lead to an
    oops when memory is low and the shrinker is called while nfsd is not
    running.
    
    This patch moves the register/unregister of the nfsd-client shrinker
    from module load/unload time to nfsd startup/shutdown time.
    
    Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
    Reported-by: Mike Galbraith <[email protected]>
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove "inline" directives on op_rsize_bop helpers [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 12 17:23:25 2022 -0400

    NFSD: Remove "inline" directives on op_rsize_bop helpers
    
    [ Upstream commit 6604148cf961b57fc735e4204f8996536da9253c ]
    
    These helpers are always invoked indirectly, so the compiler can't
    inline these anyway. While we're updating the synopses of these
    helpers, defensively convert their parameters to const pointers.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove be32_to_cpu() from DRC hash function [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 30 19:10:03 2021 -0400

    NFSD: Remove be32_to_cpu() from DRC hash function
    
    [ Upstream commit 7578b2f628db27281d3165af0aa862311883a858 ]
    
    Commit 7142b98d9fd7 ("nfsd: Clean up drc cache in preparation for
    global spinlock elimination"), billed as a clean-up, added
    be32_to_cpu() to the DRC hash function without explanation. That
    commit removed two comments that state that byte-swapping in the
    hash function is unnecessary without explaining whether there was
    a need for that change.
    
    On some Intel CPUs, the swab32 instruction is known to cause a CPU
    pipeline stall. be32_to_cpu() does not add extra randomness, since
    the hash multiplication is done /before/ shifting to the high-order
    bits of the result.
    
    As a micro-optimization, remove the unnecessary transform from the
    DRC hash function.
    
    Signed-off-by: Chuck Lever <[email protected]>
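
A small sketch of a multiplicative hash in the style of the kernel's hash_32(), showing that the multiply happens first and only the high-order bits of the product are kept. The function names are illustrative; `drc_hash_sketch()` is not the real DRC hash function, and `bits` is assumed to be in the range 1..32.

```c
#include <stdint.h>

/* Multiplier used by the kernel's 32-bit multiplicative hash. */
#define GOLDEN_RATIO_32 0x61C88647u

/* Multiply first, then keep only the top 'bits' bits (1 <= bits <= 32). */
static uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

/*
 * Hypothetical bucket selection: the wire-order (big-endian) value is fed
 * to the hash as-is, with no byte swap beforehand.
 */
static uint32_t drc_hash_sketch(uint32_t xid_be, unsigned int bits)
{
	return hash_32_sketch(xid_be, bits);
}
```

Because every input bit already influences the high-order bits of the product, swapping the input bytes first only permutes which inputs land in which bucket.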

NFSD: Remove CONFIG_NFSD_V3 [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sun Feb 6 12:25:47 2022 -0500

    NFSD: Remove CONFIG_NFSD_V3
    
    [ Upstream commit 5f9a62ff7d2808c7b56c0ec90f3b7eae5872afe6 ]
    
    Eventually support for NFSv2 in the Linux NFS server is to be
    deprecated and then removed.
    
    However, NFSv2 is the "always supported" version that is available
    as soon as CONFIG_NFSD is set.  Before NFSv2 support can be removed,
    we need to choose a different "always supported" version.
    
    This patch removes CONFIG_NFSD_V3 so that NFSv3 is always supported,
    as NFSv2 is today. When NFSv2 support is removed, NFSv3 will become
    the only "always supported" NFS version.
    
    The defconfigs still need to be updated to remove CONFIG_NFSD_V3=y.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove do_nfsd_create() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Mar 28 15:36:58 2022 -0400

    NFSD: Remove do_nfsd_create()
    
    [ Upstream commit 1c388f27759c5d9271d4fca081f7ee138986eb7d ]
    
    Now that its two callers have their own version-specific instance of
    this function, do_nfsd_create() is no longer used.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove dprintk call sites from tail of nfsd4_open() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Mar 30 14:28:51 2022 -0400

    NFSD: Remove dprintk call sites from tail of nfsd4_open()
    
    [ Upstream commit f67a16b147045815b6aaafeef8663e5faeb6d569 ]
    
    Clean up: These relics are not likely to benefit server
    administrators.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove kmalloc from nfsd4_do_async_copy() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:41:06 2022 -0400

    NFSD: Remove kmalloc from nfsd4_do_async_copy()
    
    [ Upstream commit ad1e46c9b07b13659635ee5405f83ad0df143116 ]
    
    Instead of manufacturing a phony struct nfsd_file, pass the
    struct file returned by nfs42_ssc_open() directly to
    nfsd4_do_copy().
    
    [ cel: adjusted to apply to v5.15.y ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove lockdep assertion from unhash_and_release_locked() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:25:44 2022 -0400

    NFSD: Remove lockdep assertion from unhash_and_release_locked()
    
    [ Upstream commit f53cef15dddec7203df702cdc62e554190385450 ]
    
    IIUC, holding the hash bucket lock is needed only in
    nfsd_file_unhash, and there is already a lockdep assertion there.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: remove nfsd4_prepare_cb_recall() declaration [+ + +]

Author: Gaosheng Cui <[email protected]>
Date:   Fri Sep 9 14:59:10 2022 +0800

    nfsd: remove nfsd4_prepare_cb_recall() declaration
    
    [ Upstream commit 18224dc58d960c65446971930d0487fc72d00598 ]
    
    nfsd4_prepare_cb_recall() has been removed since
    commit 0162ac2b978e ("nfsd: introduce nfsd4_callback_ops"),
    so remove its leftover declaration.
    
    Signed-off-by: Gaosheng Cui <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove nfsd_file::nf_hashval [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:26:10 2022 -0400

    NFSD: Remove nfsd_file::nf_hashval
    
    [ Upstream commit f0743c2b25c65debd4f599a7c861428cd9de5906 ]
    
    The value in this field can always be computed from nf_inode, thus
    it is no longer used.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove NFSD_PROC_ARGS_* macros [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Oct 20 14:53:30 2021 -0400

    NFSD: Remove NFSD_PROC_ARGS_* macros
    
    [ Upstream commit c1a3f2ce66c80cd9f2a4376fa35a5c8d05441c73 ]
    
    Clean up.
    
    The PROC_ARGS macros were added when I thought that NFSD tracepoints
    would be reporting endpoint information. However, tracepoints in the
    RPC server now report transport endpoint information, so in general
    there's no need for the upper layers to do that any more, and these
    macros can be retired.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove redundant assignment to variable host_err [+ + +]

Author: Colin Ian King <[email protected]>
Date:   Mon Oct 10 21:24:23 2022 +0100

    NFSD: Remove redundant assignment to variable host_err
    
    [ Upstream commit 69eed23baf877bbb1f14d7f4df54f89807c9ee2a ]
    
    Variable host_err is assigned a value that is never read; it is
    re-assigned a value in every execution path of the following
    switch statement. The assignment is redundant and can be removed.
    
    Cleans up clang-scan warning:
    warning: Value stored to 'host_err' is never read [deadcode.DeadStores]
    
    Signed-off-by: Colin Ian King <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: remove redundant assignment to variable len [+ + +]

Author: Colin Ian King <[email protected]>
Date:   Tue Jun 28 22:25:25 2022 +0100

    nfsd: remove redundant assignment to variable len
    
    [ Upstream commit 842e00ac3aa3b4a4f7f750c8ab54f8578fc875d3 ]
    
    Variable len is assigned the value zero, which is never read
    because len is re-assigned later. The assignment is redundant
    and can be removed.
    
    Cleans up clang scan-build warning:
    fs/nfsd/nfsctl.c:636:2: warning: Value stored to 'len' is never read
    
    Signed-off-by: Colin Ian King <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: remove redundant variable status [+ + +]

Author: Jinpeng Cui <[email protected]>
Date:   Wed Aug 31 14:20:02 2022 +0000

    NFSD: remove redundant variable status
    
    [ Upstream commit 4ab3442ca384a02abf8b1f2b3449a6c547851873 ]
    
    Return the value directly from fh_verify(), do_open_permission(),
    and exp_pseudoroot() instead of going through the redundant
    variable status.
    
    Reported-by: Zeal Robot <[email protected]>
    Signed-off-by: Jinpeng Cui <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove svc_serv_ops::svo_module [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Feb 16 12:31:09 2022 -0500

    NFSD: Remove svc_serv_ops::svo_module
    
    [ Upstream commit f49169c97fceb21ad6a0aaf671c50b0f520f15a5 ]
    
    struct svc_serv_ops is about to be removed.
    
    Neil Brown says:
    > I suspect svo_module can go as well - I don't think the thread is
    > ever the thing that primarily keeps a module active.
    
    A random sample of kthread_create() callers shows sunrpc is the only
    one that manages module reference count in this way.
    
    Suggested-by: Neil Brown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: remove the pages_flushed statistic from filecache [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Nov 2 14:44:47 2022 -0400

    nfsd: remove the pages_flushed statistic from filecache
    
    [ Upstream commit 1f696e230ea5198e393368b319eb55651828d687 ]
    
    We're counting mapping->nrpages, but not all of those are necessarily
    dirty. We don't really have a simple way to count just the dirty pages,
    so just remove this stat since it's not accurate.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove unused nfsd4_compoundargs::cachetype field [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 12 17:23:30 2022 -0400

    NFSD: Remove unused nfsd4_compoundargs::cachetype field
    
    [ Upstream commit 77e378cf2a595d8e39cddf28a31efe6afd9394a0 ]
    
    This field was added by commit 1091006c5eb1 ("nfsd: turn on reply
    cache for NFSv4") but was never put to use.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Rename boot verifier functions [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Dec 30 10:22:05 2021 -0500

    NFSD: Rename boot verifier functions
    
    [ Upstream commit 3988a57885eeac05ef89f0ab4d7e47b52fbcf630 ]
    
    Clean up: These functions handle what the specs call a write
    verifier, which in the Linux NFS server implementation is now
    divorced from the server's boot instance.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Rename the fields in copy_stateid_t [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 22 13:10:35 2022 -0400

    NFSD: Rename the fields in copy_stateid_t
    
    [ Upstream commit 781fde1a2ba2391f31142f46f964cf1148ca1791 ]
    
    Code maintenance: The name of the copy_stateid_t::sc_count field
    collides with the sc_count field in struct nfs4_stid, making the
    latter difficult to grep for when auditing stateid reference
    counting.
    
    No behavior change expected.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Reorder the fields in struct nfsd4_op [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:28 2022 -0400

    NFSD: Reorder the fields in struct nfsd4_op
    
    [ Upstream commit d314309425ad5dc1b6facdb2d456580fb5fa5e3a ]
    
    Pack the fields to reduce the size of struct nfsd4_op, which is used
    as an array in struct nfsd4_compoundargs.
    
    sizeof(struct nfsd4_op):
    Before: /* size: 672, cachelines: 11, members: 5 */
    After:  /* size: 640, cachelines: 10, members: 5 */
    
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: reorganize filecache.c [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Nov 2 14:44:48 2022 -0400

    nfsd: reorganize filecache.c
    
    [ Upstream commit 8214118589881b2d390284410c5ff275e7a5e03c ]
    
    In a coming patch, we're going to rework how the filecache refcounting
    works. Move some code around in the function to reduce the churn in the
    later patches, and rename some of the functions with (hopefully) clearer
    names: nfsd_file_flush becomes nfsd_file_fsync, and
    nfsd_file_unhash_and_dispose is renamed to nfsd_file_unhash_and_queue.
    
    Also, the nfsd_file_put_final tracepoint is renamed to nfsd_file_free,
    to better match the name of the function from which it's called.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Replace boolean fields in struct nfsd4_copy [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:41 2022 -0400

    NFSD: Replace boolean fields in struct nfsd4_copy
    
    [ Upstream commit 1913cdf56cb5bfbc8170873728d13598cbecda23 ]
    
    Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy()
    with an atomic bitop.
    
    [ cel: adjusted to apply to v5.15.y ]
    Signed-off-by: Chuck Lever <[email protected]>
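
The "atomic bitop" replacement can be approximated in userspace with C11 atomics. This is a sketch only: `struct copy_state`, `COPY_F_STOPPED` and `test_and_set_flag()` are hypothetical names, and `atomic_fetch_or()` stands in for the kernel's test_and_set_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { COPY_F_STOPPED = 0 };	/* hypothetical bit number */

/* One flags word replaces several separate boolean fields. */
struct copy_state {
	atomic_ulong flags;
};

/* Atomically set a bit and report whether it was already set. */
static bool test_and_set_flag(struct copy_state *c, unsigned int bit)
{
	unsigned long mask = 1UL << bit;

	return atomic_fetch_or(&c->flags, mask) & mask;
}
```

The first caller sees `false` and owns the transition; any later caller sees `true`, which is the check-and-set behavior a lock-protected boolean would otherwise provide.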

NFSD: replace delayed_work with work_struct for nfsd_client_shrinker [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Wed Jan 11 16:06:51 2023 -0800

    NFSD: replace delayed_work with work_struct for nfsd_client_shrinker
    
    [ Upstream commit 7c24fa225081f31bc6da6a355c1ba801889ab29a ]
    
    Since nfsd4_state_shrinker_count always calls mod_delayed_work with
    0 delay, we can replace delayed_work with work_struct to save some
    space and overhead.
    
    Also add the call to cancel_work after unregistering the shrinker
    in nfs4_state_shutdown_net.
    
    Signed-off-by: Dai Ngo <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Replace dprintk() call site in fh_verify() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 8 18:13:42 2022 -0400

    NFSD: Replace dprintk() call site in fh_verify()
    
    [ Upstream commit 948755efc951de75c87d4fa916d9d36b58299295 ]
    
    Record permission errors in the trace log. Note that the new trace
    event is conditional, so it will only record non-zero return values
    from nfsd_permission().
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Replace the "init once" mechanism [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:26:16 2022 -0400

    NFSD: Replace the "init once" mechanism
    
    [ Upstream commit c7b824c3d06c85e054caf86e227255112c5e3c38 ]
    
    In a moment, the nfsd_file_hashtbl global will be replaced with an
    rhashtable. Replace the one or two spots that need to check if the
    hash table is available. We can easily reuse the SHUTDOWN flag for
    this purpose.
    
    Document that this mechanism relies on callers to hold the
    nfsd_mutex to prevent init, shutdown, and purging from running
    concurrently.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Report average age of filecache items [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:24:12 2022 -0400

    NFSD: Report average age of filecache items
    
    [ Upstream commit 904940e94a887701db24401e3ed6928a1d4e329f ]
    
    This is a measure of how long items stay in the filecache, to help
    assess how efficient the cache is.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Report count of calls to nfsd_file_acquire() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:23:59 2022 -0400

    NFSD: Report count of calls to nfsd_file_acquire()
    
    [ Upstream commit 29d4bdbbb910f33d6058d2c51278f00f656df325 ]
    
    Count the number of successful acquisitions that did not create a
    file (ie, acquisitions that do not result in a compulsory cache
    miss). This count can be compared directly with the reported hit
    count to compute a hit ratio.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
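
One plausible way to turn the two counters into a hit ratio is sketched below. The function name and parameters are assumptions for illustration; the commit itself only says the acquisition count can be compared with the reported hit count.

```c
/*
 * Hypothetical helper: fraction of counted acquisitions that were cache hits.
 * Returns 0.0 when nothing has been acquired yet, to avoid dividing by zero.
 */
static double hit_ratio(unsigned long hits, unsigned long acquisitions)
{
	if (acquisitions == 0)
		return 0.0;
	return (double)hits / (double)acquisitions;
}
```

For example, 75 hits out of 100 counted acquisitions gives a ratio of 0.75.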

NFSD: Report count of freed filecache items [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:24:05 2022 -0400

    NFSD: Report count of freed filecache items
    
    [ Upstream commit d63293272abb51c02457f1017dfd61c3270d9ae3 ]
    
    Surface the count of freed nfsd_file items.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Report filecache LRU size [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:23:52 2022 -0400

    NFSD: Report filecache LRU size
    
    [ Upstream commit 0fd244c115f0321fc5e34ad2291f2a572508e3f7 ]
    
    Surface the NFSD filecache's LRU list length to help field
    troubleshooters monitor filecache issues.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Report the number of items evicted by the LRU walk [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:24:38 2022 -0400

    NFSD: Report the number of items evicted by the LRU walk
    
    [ Upstream commit 94660cc19c75083af046b0f8362e3d3bc2eba21d ]
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Retry once in nfsd_open on an -EOPENSTALE return [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Sat Dec 18 20:37:56 2021 -0500

    nfsd: Retry once in nfsd_open on an -EOPENSTALE return
    
    [ Upstream commit 12bcbd40fd931472c7fc9cf3bfe66799ece93ed8 ]
    
    If we get back -EOPENSTALE from an NFSv4 open, then we either got some
    unhandled error or the inode we got back was not the same as the one
    associated with the dentry.
    
    We really have no recourse in that situation other than to retry the
    open, and if it fails to just return nfserr_stale back to the client.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Lance Shelton <[email protected]>
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
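    
    The retry-once behaviour described above can be sketched in userspace
    C as follows. This is an illustration only: open_retry_once(), the
    open_fn callback type, and fake_open() are hypothetical names, and
    EOPENSTALE is redefined locally because it is a kernel-internal errno.

```c
#define EOPENSTALE 518	/* kernel-internal errno; defined here for the sketch */

/* Hypothetical callback type standing in for the real open path. */
typedef int (*open_fn)(void *ctx);

/*
 * Call open_once(); if it reports -EOPENSTALE, retry exactly one time.
 * A second -EOPENSTALE is handed back to the caller, which would then
 * report a stale filehandle to the client.
 */
static int open_retry_once(open_fn open_once, void *ctx)
{
	int retried = 0;
	int err;

retry:
	err = open_once(ctx);
	if (err == -EOPENSTALE && !retried) {
		retried = 1;
		goto retry;
	}
	return err;
}

/* Fake opener: fails with -EOPENSTALE *ctx times, then succeeds. */
static int fake_open(void *ctx)
{
	int *stale_left = ctx;

	if (*stale_left > 0) {
		(*stale_left)--;
		return -EOPENSTALE;
	}
	return 0;
}
```

    With fake_open() set to fail once, the wrapper succeeds on the
    retry; set to fail twice, the second -EOPENSTALE reaches the caller.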

nfsd: return error if nfs4_setacl fails [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Mon Nov 7 06:58:41 2022 -0500

    nfsd: return error if nfs4_setacl fails
    
    [ Upstream commit 01d53a88c08951f88f2a42f1f1e6568928e0590e ]
    
    With the addition of POSIX ACLs to struct nfsd_attrs, we no longer
    return an error if setting the ACL fails. Ensure we return the na_aclerr
    error on SETATTR if there is one.
    
    Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
    Cc: Neil Brown <[email protected]>
    Reported-by: Yongcheng Yang <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately" [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:46:44 2022 -0400

    NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
    
    [ Upstream commit dcf3f80965ca787c70def402cdf1553c93c75529 ]
    
    This reverts commit 5e138c4a750dc140d881dab4a8804b094bbc08d2.
    
    That commit attempted to make files available to other users as soon
    as all NFSv4 clients were done with them, rather than waiting until
    the filecache LRU had garbage collected them.
    
    It gets the reference counting wrong, for one thing.
    
    But it also misses that DELEGRETURN should release a file in the
    same fashion. In fact, any nfsd_file_put() on a file held open
    by an NFSv4 client potentially needs to release the file
    immediately...
    
    Clear the way for implementing that idea.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: rework hashtable handling in nfsd_do_file_acquire [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Tue Oct 4 15:41:10 2022 -0400

    nfsd: rework hashtable handling in nfsd_do_file_acquire
    
    [ Upstream commit 243a5263014a30436c93ed3f1f864c1da845455e ]
    
    nfsd_file is RCU-freed, so we need to hold the rcu_read_lock long enough
    to get a reference after finding it in the hash. Take the
    rcu_read_lock() and call rhashtable_lookup directly.
    
    Switch to using rhashtable_lookup_insert_key as well, and use the usual
    retry mechanism if we hit an -EEXIST. Rename the "retry" bool to
    open_retry, and eliminate the insert_err goto target.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: rework refcounting in filecache [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Sun Dec 11 06:19:33 2022 -0500

    nfsd: rework refcounting in filecache
    
    [ Upstream commit ac3a2585f018f10039b4a856dcb122da88c1c1c9 ]
    
    The filecache refcounting is a bit non-standard for something searchable
    by RCU, in that we maintain a sentinel reference while it's hashed. This
    in turn requires that we have to do things differently in the "put"
    depending on whether it's hashed, which we believe to have led to races.
    
    There are other problems in here too. nfsd_file_close_inode_sync can end
    up freeing an nfsd_file while there are still outstanding references to
    it, and there are a number of subtle ToC/ToU races.
    
    Rework the code so that the refcount is what drives the lifecycle. When
    the refcount goes to zero, then unhash and rcu free the object. A task
    searching for a nfsd_file is allowed to bump its refcount, but only if
    it's not already 0. Ensure that we don't make any other changes to it
    until a reference is held.
    
    With this change, the LRU carries a reference. Take special care to deal
    with it when removing an entry from the list, and ensure that we only
    repurpose the nf_lru list_head when the refcount is 0 to ensure
    exclusive access to it.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
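    
    The rule that a searcher may bump the refcount only if it is not
    already zero can be sketched with C11 atomics. This is a userspace
    analogy, not the filecache code: struct cached_obj, obj_tryget(), and
    obj_put() are hypothetical names, and the kernel uses its own
    refcount primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cached object; only the refcount matters. */
struct cached_obj {
	atomic_uint ref;
};

/* Take a reference only if the object is still live (refcount != 0). */
static bool obj_tryget(struct cached_obj *obj)
{
	unsigned int old = atomic_load(&obj->ref);

	while (old != 0) {
		/* On failure, `old` is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&obj->ref, &old, old + 1))
			return true;
	}
	return false;	/* already at zero; the object is being torn down */
}

/* Drop a reference; returns true when the caller released the last one. */
static bool obj_put(struct cached_obj *obj)
{
	return atomic_fetch_sub(&obj->ref, 1) == 1;
}
```

    In this sketch, the caller that sees obj_put() return true is the
    one responsible for unhashing and freeing the object.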

NFSD: Save location of NFSv4 COMPOUND status [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Oct 13 10:40:59 2021 -0400

    NFSD: Save location of NFSv4 COMPOUND status
    
    [ Upstream commit 3b0ebb255fdc49a3d340846deebf045ef58ec744 ]
    
    Refactor: Currently nfs4svc_encode_compoundres() relies on the NFS
    dispatcher to pass in the buffer location of the COMPOUND status.
    Instead, save that buffer location in struct nfsd4_compoundres.
    
    The compound tag follows immediately after.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: separate nfsd_last_thread() from nfsd_put() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Jul 31 16:48:32 2023 +1000

    nfsd: separate nfsd_last_thread() from nfsd_put()
    
    [ Upstream commit 9f28a971ee9fdf1bf8ce8c88b103f483be610277 ]
    
    Now that the last nfsd thread is stopped by an explicit act of calling
    svc_set_num_threads() with a count of zero, we only have a limited
    number of places that can happen, and don't need to call
    nfsd_last_thread() in nfsd_put().
    
    So separate that out and call it at the two places where the number of
    threads is set to zero.
    
    Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all()
    into nfsd_last_thread(), as they are really part of the same action.
    
    nfsd_put() is now a thin wrapper around svc_put(), so make it a static
    inline.
    
    nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of
    places we have to use svc_put() instead.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Separate tracepoints for acquire and create [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:26:43 2022 -0400

    NFSD: Separate tracepoints for acquire and create
    
    [ Upstream commit be0230069fcbf7d332d010b57c1d0cfd623a84d6 ]
    
    These tracepoints collect different information: the create case does
    not open a file, so there's no nf_file available.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: set attributes when creating symlinks [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: set attributes when creating symlinks
    
    [ Upstream commit 93adc1e391a761441d783828b93979b38093d011 ]
    
    The NFS protocol includes attributes when creating symlinks.
    Linux does store attributes for symlinks and allows them to be set,
    though they are not used for permission checking.
    
    NFSD currently doesn't set standard (struct iattr) attributes when
    creating symlinks, but for NFSv4 it does set ACLs and security labels.
    This is inconsistent.
    
    To improve consistency, pass the provided attributes into nfsd_symlink()
    and call nfsd_create_setattr() to set them.
    
    NOTE: this results in a behaviour change for all NFS versions when the
    client sends non-default attributes with a SYMLINK request. With the
    Linux client, the only attributes are:
            attr.ia_mode = S_IFLNK | S_IRWXUGO;
            attr.ia_valid = ATTR_MODE;
    so the final outcome will be unchanged. Other clients might send
    different attributes, and if they did they probably expect them to be
    honoured.
    
    We ignore any error from nfsd_create_setattr().  It isn't really clear
    what should be done if a file is successfully created, but the
    attributes cannot be set.  NFS doesn't allow partial success to be
    reported.  Reporting failure is probably more misleading than reporting
    success, so the status is ignored.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Set up an rhashtable for the filecache [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:26:23 2022 -0400

    NFSD: Set up an rhashtable for the filecache
    
    [ Upstream commit fc22945ecc2a0a028f3683115f98a922d506c284 ]
    
    Add code to initialize and tear down an rhashtable. The rhashtable
    is not used yet.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Show state of courtesy client in client info [+ + +]

Author: Dai Ngo <[email protected]>
Date:   Mon May 2 14:19:27 2022 -0700

    NFSD: Show state of courtesy client in client info
    
    [ Upstream commit e9488d5ae13c0a72223c507e2508dc2ac66cad4f ]
    
    Update client_info_show to show state of courtesy client
    and seconds since last renew.
    
    Reviewed-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Dai Ngo <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Shrink size of struct nfsd4_copy [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:22 2022 -0400

    NFSD: Shrink size of struct nfsd4_copy
    
    [ Upstream commit 87689df694916c40e8e6c179ab1c8710f65cb6c6 ]
    
    struct nfsd4_copy is part of struct nfsd4_op, which resides in an
    8-element array.
    
    sizeof(struct nfsd4_op):
    Before: /* size: 1696, cachelines: 27, members: 5 */
    After:  /* size: 672, cachelines: 11, members: 5 */
    
    Signed-off-by: Chuck Lever <[email protected]>
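    
    As a worked example of what the size change means for the 8-element
    array mentioned above, the saving can be computed directly from the
    quoted figures. The helper array_bytes_saved() is hypothetical and
    exists only to show the arithmetic.

```c
#include <stddef.h>

/* Bytes saved across an array of `nelems` elements when each one shrinks. */
static size_t array_bytes_saved(size_t before, size_t after, size_t nelems)
{
	return (before - after) * nelems;
}
```

    With the figures quoted above, array_bytes_saved(1696, 672, 8) is
    (1696 - 672) * 8 = 8192 bytes.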

NFSD: Shrink size of struct nfsd4_copy_notify [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jul 27 14:40:16 2022 -0400

    NFSD: Shrink size of struct nfsd4_copy_notify
    
    [ Upstream commit 09426ef2a64ee189ca1e3298f1e874842dbf35ea ]
    
    struct nfsd4_copy_notify is part of struct nfsd4_op, which resides
    in an 8-element array.
    
    sizeof(struct nfsd4_op):
    Before: /* size: 2208, cachelines: 35, members: 5 */
    After:  /* size: 1696, cachelines: 27, members: 5 */
    
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: silence extraneous printk on nfsd.ko insertion [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Jul 20 08:39:23 2022 -0400

    nfsd: silence extraneous printk on nfsd.ko insertion
    
    [ Upstream commit 3a5940bfa17fb9964bf9688b4356ca643a8f5e2d ]
    
    This printk pops every time nfsd.ko gets plugged in. Most kmods don't do
    that and this one is not very informative. Olaf's email address seems to
    be defunct at this point anyway. Just drop it.
    
    Cc: Olaf Kirch <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: Simplify code around svc_exit_thread() call in nfsd() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Jul 31 16:48:31 2023 +1000

    nfsd: Simplify code around svc_exit_thread() call in nfsd()
    
    [ Upstream commit 18e4cf915543257eae2925671934937163f5639b ]
    
    Previously a thread could exit asynchronously (due to a signal) so some
    care was needed to hold nfsd_mutex over the last svc_put() call.  Now a
    thread can only exit when svc_set_num_threads() is called, and this is
    always called under nfsd_mutex.  So no care is needed.
    
    Not only is the mutex held when a thread exits now, but the svc refcount
    is elevated, so the svc_put() in svc_exit_thread() will never be a final
    put, so the mutex isn't even needed at this point in the code.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: simplify locking for network notifier. [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    NFSD: simplify locking for network notifier.
    
    [ Upstream commit d057cfec4940ce6eeffa22b4a71dec203b06cd55 ]
    
    nfsd currently maintains an open-coded read/write semaphore (refcount
    and wait queue) for each network namespace to ensure the nfs service
    isn't shut down while the notifier is running.
    
    This is excessive.  As there is unlikely to be contention between
    notifiers and they run without sleeping, a single spinlock is sufficient
    to avoid problems.
    
    Signed-off-by: NeilBrown <[email protected]>
    [ cel: ensure nfsd_notifier_lock is static ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: simplify per-net file cache management [+ + +]

Author: NeilBrown <[email protected]>
Date:   Wed Dec 1 10:58:14 2021 +1100

    NFSD: simplify per-net file cache management
    
    [ Upstream commit 1463b38e7cf34d4cc60f41daff459ad807b2e408 ]
    
    We currently have a 'laundrette' for closing cached files - a different
    work-item for each network-namespace.
    
    These 'laundrettes' (aka struct nfsd_fcache_disposal) are currently on a
    list, and are freed using rcu.
    
    The list is not necessary as we have a per-namespace structure (struct
    nfsd_net) which can hold a link to the nfsd_fcache_disposal.
    The use of kfree_rcu is also unnecessary as the cache is cleaned of all
    files associated with a given namespace, and no new files can be added,
    before the nfsd_fcache_disposal is freed.
    
    So add a '->fcache_disposal' link to nfsd_net, and discard the list
    management and rcu usage.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Simplify READ_PLUS [+ + +]

Author: Anna Schumaker <[email protected]>
Date:   Tue Sep 13 14:01:51 2022 -0400

    NFSD: Simplify READ_PLUS
    
    [ Upstream commit eeadcb75794516839078c28b3730132aeb700ce6 ]
    
    Chuck had suggested reverting READ_PLUS so it returns a single DATA
    segment covering the requested read range. This prepares the server for
    a future "sparse read" function so support can easily be added without
    needing to rip out the old READ_PLUS code at the same time.
    
    Signed-off-by: Anna Schumaker <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Simplify starting_len [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 22 16:09:10 2022 -0400

    NFSD: Simplify starting_len
    
    [ Upstream commit 071ae99feadfc55979f89287d6ad2c6a315cb46d ]
    
    Clean-up: Now that nfsd4_encode_readv() does not have to encode the
    EOF or rd_length values, it no longer needs to subtract 8 from
    @starting_len.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: simplify struct nfsfh [+ + +]

Author: NeilBrown <[email protected]>
Date:   Thu Sep 2 11:16:32 2021 +1000

    NFSD: simplify struct nfsfh
    
    [ Upstream commit d8b26071e65e80a348602b939e333242f989221b ]
    
    Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a
    struct) and are accessed using macros like:
    
     #define fh_FOO fh_base.fh_new.fb_FOO
    
    This patch makes the union and struct anonymous, so that "fh_FOO" can be
    a name directly within 'struct knfsd_fh' and the #defines aren't needed.
    
    The file handle as a whole is sometimes accessed as "fh_base" or
    "fh_base.fh_pad", neither of which are particularly helpful names.
    As the struct holding the filehandle is now anonymous, we
    cannot use the name of that, so we union it with 'fh_raw' and use that
    where the raw filehandle is needed.  fh_raw also ensures the structure is
    large enough for the largest possible filehandle.
    
    fh_raw is a 'char' array, removing any need to cast it for memcpy etc.
    
    SVCFH_fmt() is simplified using the "%ph" printk format.  This
    changes the appearance of filehandles in dprintk() debugging, making
    them a little more precise.
    
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
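    
    The anonymous union/struct layout can be sketched as follows. This is
    a simplified, hypothetical structure (struct demo_fh, FH_MAXSIZE, and
    demo_fh_copy() are illustrative names, not the kernel definitions);
    it only shows how fields become directly addressable and how the
    fh_raw char array removes the need for casts.

```c
#include <string.h>

#define FH_MAXSIZE 128	/* assumed maximum filehandle size for the sketch */

/* Hypothetical, simplified layout; not the kernel's struct knfsd_fh. */
struct demo_fh {
	unsigned int fh_size;
	union {
		char fh_raw[FH_MAXSIZE];	/* whole handle as raw bytes */
		struct {
			unsigned char fh_version;
			unsigned char fh_auth_type;
			unsigned char fh_fsid_type;
			unsigned char fh_fileid_type;
		};
	};
};

/* Copy a handle without casts: fh_raw is already a char array. */
static void demo_fh_copy(struct demo_fh *dst, const struct demo_fh *src)
{
	dst->fh_size = src->fh_size;
	memcpy(dst->fh_raw, src->fh_raw, src->fh_size);
}
```

    Because both the union and the inner struct are anonymous, a field
    is written as fh->fh_version with no intermediate member names and
    no #define, and the union is sized by fh_raw.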

nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparator [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Jan 6 10:39:01 2023 -0500

    nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparator
    
    [ Upstream commit d69b8dbfd0866abc5ec84652cc1c10fc3d4d91ef ]
    
    test_bit returns bool, so we can just compare the result of that to the
    key->gc value without the "!!".
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
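    
    A small userspace sketch of the comparison, assuming a bit-test
    helper that returns bool. demo_test_bit(), DEMO_FILE_GC, and
    gc_matches() are hypothetical stand-ins rather than the kernel's
    definitions.

```c
#include <stdbool.h>

/* Simplified stand-in for a bit test that returns bool. */
static bool demo_test_bit(unsigned int nr, const unsigned long *word)
{
	return (*word >> nr) & 1UL;
}

#define DEMO_FILE_GC 5	/* hypothetical flag bit number */

/* Compare a flag bit against a bool key field; no "!!" is needed. */
static bool gc_matches(const unsigned long *flags, bool key_gc)
{
	return demo_test_bit(DEMO_FILE_GC, flags) == key_gc;
}
```

    Since both sides of the comparison are already bool, normalising
    the left-hand side with "!!" would be redundant.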

nfsd: simplify the delayed disposal list code [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Fri Apr 14 17:31:44 2023 -0400

    nfsd: simplify the delayed disposal list code
    
    [ Upstream commit 92e4a6733f922f0fef1d0995f7b2d0eaff86c7ea ]
    
    When queueing a dispose list to the appropriate "freeme" lists, it
    pointlessly queues the objects one at a time to an intermediate list.
    
    Remove a few helpers and just open code a list_move to make it more
    clear and efficient. Better document the resulting functions with
    kerneldoc comments.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Skip extra computation for RC_NOCACHE case [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Sep 28 11:39:02 2021 -0400

    NFSD: Skip extra computation for RC_NOCACHE case
    
    [ Upstream commit 0f29ce32fbc56cfdb304eec8a4deb920ccfd89c3 ]
    
    Force the compiler to skip unneeded initialization for cases that
    don't need those values. For example, NFSv4 COMPOUND operations are
    RC_NOCACHE.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Streamline the rare "found" case [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Sep 28 11:40:59 2021 -0400

    NFSD: Streamline the rare "found" case
    
    [ Upstream commit add1511c38166cf1036765f8c4aa939f0275a799 ]
    
    Move a rarely called function call site out of the hot path.
    
    This is an exceptionally small improvement because the compiler
    inlines most of the functions that nfsd_cache_lookup() calls.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Trace boot verifier resets [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Dec 28 14:27:56 2021 -0500

    NFSD: Trace boot verifier resets
    
    [ Upstream commit 75acacb6583df0b9328dc701d8eeea05af49b8b5 ]
    
    According to commit bbf2f098838a ("nfsd: Reset the boot verifier on
    all write I/O errors"), the Linux NFS server forces all clients to
    resend pending unstable writes if any server-side write or commit
    operation encounters an error (say, ENOSPC). This is a rare and
    quite exceptional event that could require administrative recovery
    action, so it should be made trace-able. Example trace event:
    
    nfsd-938   [002]  7174.945558: nfsd_writeverf_reset: boot_time=        61cc920d xid=0xdcd62036 error=-28 new verifier=0x08aecc6142515904
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Trace delegation revocations [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:47:09 2022 -0400

    NFSD: Trace delegation revocations
    
    [ Upstream commit a1c74569bbde91299f24535abf711be5c84df9de ]
    
    Delegation revocation is an exceptional event that is not otherwise
    visible externally (e.g., no network traffic is emitted). Generate a
    trace record when it occurs so that revocation can be observed or
    other activity can be triggered. Example:
    
    nfsd-1104  [005]  1912.002544: nfsd_stid_revoke:        client 633c9343:4e82788d stateid 00000003:00000001 ref=2 type=DELEG
    
    Trace infrastructure is provided for subsequent additional tracing
    related to nfs4_stid activity.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Tested-by: Jeff Layton <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Trace filecache LRU activity [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:25:11 2022 -0400

    NFSD: Trace filecache LRU activity
    
    [ Upstream commit c46203acddd9b9200dbc53d0603c97355fd3a03b ]
    
    Observe the operation of garbage collection and the lifetime of
    filecache items.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Trace filecache opens [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sun Mar 27 16:42:20 2022 -0400

    NFSD: Trace filecache opens
    
    [ Upstream commit 0122e882119ddbd9efa6edfeeac3f5c704a7aeea ]
    
    Instrument calls to nfsd_open_verified() to get a sense of the
    filecache hit rate.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Trace NFSv4 COMPOUND tags [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 8 18:13:48 2022 -0400

    NFSD: Trace NFSv4 COMPOUND tags
    
    [ Upstream commit de29cf7e6cbbe236c3a51999c188fcd467762899 ]
    
    The Linux NFSv4 client implementation does not use COMPOUND tags,
    but the Solaris and MacOS implementations do, and so does pynfs.
    Record these eye-catchers in the server's trace buffer to annotate
    client requests while troubleshooting.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Trace stateids returned via DELEGRETURN [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:47:03 2022 -0400

    NFSD: Trace stateids returned via DELEGRETURN
    
    [ Upstream commit 20eee313ff4b8a7e71ae9560f5c4ba27cd763005 ]
    
    Handing out a delegation stateid is recorded with the
    nfsd_deleg_read tracepoint, but there isn't a matching tracepoint
    for recording when the stateid is returned.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: unregister shrinker when nfsd_init_net() fails [+ + +]

Author: Tetsuo Handa <[email protected]>
Date:   Mon Oct 10 14:59:02 2022 +0900

    NFSD: unregister shrinker when nfsd_init_net() fails
    
    [ Upstream commit bd86c69dae65de30f6d47249418ba7889809e31a ]
    
    syzbot is reporting UAF read at register_shrinker_prepared() [1], for
    commit 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on
    low memory condition") missed that nfsd4_leases_net_shutdown() from
    nfsd_exit_net() is called only when nfsd_init_net() succeeded.
    If nfsd_init_net() fails due to nfsd_reply_cache_init() failure,
    register_shrinker() from nfsd4_init_leases_net() has to be undone
    before nfsd_init_net() returns.
    
    Link: https://syzkaller.appspot.com/bug?extid=ff796f04613b4c84ad89 [1]
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Tetsuo Handa <[email protected]>
    Fixes: 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition")
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
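    
    The unwinding pattern behind this fix can be sketched in plain C.
    All names below (demo_register_shrinker(), demo_init_net(), and so
    on) are hypothetical; the point is only that a step completed before
    a later failure must be undone inside the init routine itself,
    because the exit hook runs only after a successful init.

```c
/* Hypothetical flag standing in for real registration state. */
static int shrinker_registered;

static int demo_register_shrinker(void)
{
	shrinker_registered = 1;
	return 0;
}

static void demo_unregister_shrinker(void)
{
	shrinker_registered = 0;
}

/*
 * Two-step init. If the second step fails, the first is undone before
 * returning, since no exit hook will be called for a failed init.
 */
static int demo_init_net(int reply_cache_err)
{
	int err;

	err = demo_register_shrinker();
	if (err)
		return err;

	err = reply_cache_err;	/* stands in for a later init step's result */
	if (err)
		goto out_unregister;

	return 0;

out_unregister:
	demo_unregister_shrinker();
	return err;
}
```

    Without the out_unregister path, a failed init would leave the
    registration behind, pointing at memory that is about to be freed.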

nfsd: Unregister the cld notifier when laundry_wq create failed [+ + +]

Author: Zhang Xiaoxu <[email protected]>
Date:   Sat May 21 12:08:44 2022 +0800

    nfsd: Unregister the cld notifier when laundry_wq create failed
    
    [ Upstream commit 62fdb65edb6c43306c774939001f3a00974832aa ]
    
    If laundry_wq create failed, the cld notifier should be unregistered.
    
    Signed-off-by: Zhang Xiaoxu <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: update comment over __nfsd_file_cache_purge [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Thu Jan 26 12:21:16 2023 -0500

    nfsd: update comment over __nfsd_file_cache_purge
    
    [ Upstream commit 972cc0e0924598cb293b919d39c848dc038b2c28 ]
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: update create verifier comment [+ + +]

Author: J. Bruce Fields <[email protected]>
Date:   Mon Feb 19 11:44:28 2024 -0500

    nfsd: update create verifier comment
    
    [ Upstream commit 2336d696862186fd4a6ddd1ea0cb243b3e32847c ]
    
    I don't know if that Solaris behavior matters any more or if it's still
    possible to look up that bug ID.  The XFS behavior's definitely
    still relevant, though; any but the most recent XFS filesystems will
    lose the top bits.
    
    Reported-by: Frank S. Filz <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Update file_hashtbl() helpers [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:47:22 2022 -0400

    NFSD: Update file_hashtbl() helpers
    
    [ Upstream commit 3fe828caddd81e68e9d29353c6e9285a658ca056 ]
    
    Enable callers to use const pointers for type safety.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: use (un)lock_inode instead of fh_(un)lock for file operations [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
    
    [ Upstream commit bb4d53d66e4b8c8b8e5634802262e53851a2d2db ]
    
    When locking a file to access ACLs and xattrs etc, use explicit locking
    with inode_lock() instead of fh_lock().  This means that the calls to
    fh_fill_pre/post_attr() are also explicit which improves readability and
    allows us to place them only where they are needed.  Only the xattr
    calls need pre/post information.
    
    When locking a file we don't need I_MUTEX_PARENT as the file is not a
    parent of anything, so we can use inode_lock() directly rather than the
    inode_lock_nested() call that fh_lock() uses.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Use const pointers as parameters to fh_ helpers [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:47:16 2022 -0400

    NFSD: Use const pointers as parameters to fh_ helpers
    
    [ Upstream commit b48f8056c034f28dd54668399f1d22be421b0bef ]
    
    Enable callers to use const pointers where they are able to.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Tested-by: Jeff Layton <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops [+ + +]

Author: ChenXiaoSong <[email protected]>
Date:   Fri Sep 23 00:31:52 2022 +0800

    nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops
    
    [ Upstream commit 0cfb0c4228a5c8e2ed2b58f8309b660b187cef02 ]
    
    Use DEFINE_PROC_SHOW_ATTRIBUTE helper macro to simplify the code.
    
    Signed-off-by: ChenXiaoSong <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops [+ + +]

Author: ChenXiaoSong <[email protected]>
Date:   Fri Sep 23 00:31:54 2022 +0800

    nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops
    
    [ Upstream commit 1d7f6b302b75ff7acb9eb3cab0c631b10cfa7542 ]
    
    Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
    
    inode is converted from seq_file->file instead of seq_file->private in
    client_info_show().
    
    Signed-off-by: ChenXiaoSong <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops [+ + +]

Author: ChenXiaoSong <[email protected]>
Date:   Fri Sep 23 00:31:53 2022 +0800

    nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops
    
    [ Upstream commit 9beeaab8e05d353d709103cafa1941714b4d5d94 ]
    
    Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
    
    Signed-off-by: ChenXiaoSong <[email protected]>
    [ cel: reduce line length ]
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops [+ + +]

Author: ChenXiaoSong <[email protected]>
Date:   Fri Sep 23 00:31:56 2022 +0800

    nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops
    
    [ Upstream commit 1342f9dd3fc219089deeb2620f6790f19b4129b1 ]
    
    Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
    
    Signed-off-by: ChenXiaoSong <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops [+ + +]

Author: ChenXiaoSong <[email protected]>
Date:   Fri Sep 23 00:31:55 2022 +0800

    nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
    
    [ Upstream commit 64776611a06322b99386f8dfe3b3ba1aa0347a38 ]
    
    Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
    
    nfsd_net is converted from seq_file->file instead of seq_file->private in
    nfsd_reply_cache_stats_show().
    
    Signed-off-by: ChenXiaoSong <[email protected]>
    [ cel: reduce line length ]
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: use explicit lock/unlock for directory ops [+ + +]

Author: NeilBrown <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: use explicit lock/unlock for directory ops
    
    [ Upstream commit debf16f0c671cb8db154a9ebcd6014cfff683b80 ]
    
    When creating or unlinking a name in a directory use explicit
    inode_lock_nested() instead of fh_lock(), and explicit calls to
    fh_fill_pre_attrs() and fh_fill_post_attrs().  This is already done
    for renames, with lock_rename() as the explicit locking.
    
    Also move the 'fill' calls closer to the operation that might change the
    attributes.  This way they are avoided on some error paths.
    
    For the v2-only code in nfsproc.c, the fill calls are not replaced as
    they aren't needed.
    
    Making the locking explicit will simplify proposed future changes to
    locking for directories.  It also makes it easily visible exactly where
    pre/post attributes are used - not all callers of fh_lock() actually
    need the pre/post attributes.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: use fsnotify group lock helpers [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Fri Apr 22 15:03:20 2022 +0300

    nfsd: use fsnotify group lock helpers
    
    [ Upstream commit b8962a9d8cc2d8c93362e2f684091c79f702f6f3 ]
    
    Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette")
    nfsd would close open files in direct reclaim context and that could
    cause a deadlock when fsnotify mark allocation went into direct reclaim
    and nfsd shrinker tried to free existing fsnotify marks.
    
    To avoid issues like this in future code, set the FSNOTIFY_GROUP_NOFS
    flag on nfsd fsnotify group to prevent going into direct reclaim from
    fsnotify_add_inode_mark().
    
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]/
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Jan Kara <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd: use locks_inode_context helper [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Wed Nov 16 09:36:07 2022 -0500

    nfsd: use locks_inode_context helper
    
    [ Upstream commit 77c67530e1f95ac25c7075635f32f04367380894 ]
    
    nfsd currently doesn't access i_flctx safely everywhere. This requires a
    smp_load_acquire, as the pointer is set via cmpxchg (a release
    operation).
    
    Acked-by: Chuck Lever <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Use only RQ_DROPME to signal the need to drop a reply [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sat Nov 26 15:55:30 2022 -0500

    NFSD: Use only RQ_DROPME to signal the need to drop a reply
    
    [ Upstream commit 9315564747cb6a570e99196b3a4880fb817635fd ]
    
    Clean up: NFSv2 has the only two usages of rpc_drop_reply in the
    NFSD code base. Since NFSv2 is going away at some point, replace
    these in order to simplify the "drop this reply?" check in
    nfsd_dispatch().
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Use rhashtable for managing nfs4_file objects [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Oct 28 10:47:53 2022 -0400

    NFSD: Use rhashtable for managing nfs4_file objects
    
    [ Upstream commit d47b295e8d76a4d69f0e2ea0cd8a79c9d3488280 ]
    
    fh_match() is costly, especially when filehandles are large (as is
    the case for NFSv4). It needs to be used sparingly when searching
    data structures. Unfortunately, with common workloads, I see
    multiple thousands of objects stored in file_hashtbl[], which has
    just 256 buckets, making its bucket hash chains quite lengthy.
    
    Walking long hash chains with the state_lock held blocks other
    activity that needs that lock. Sizable hash chains are a common
    occurrence once the server has handed out some delegations, for
    example -- IIUC, each delegated file is held open on the server by
    an nfs4_file object.
    
    To help mitigate the cost of searching with fh_match(), replace the
    nfs4_file hash table with an rhashtable, which can dynamically
    resize its bucket array to minimize hash chain length.
    
    The result of this modification is an improvement in the latency of
    NFSv4 operations, and the reduction of nfsd CPU utilization due to
    eliminating the cost of multiple calls to fh_match() and reducing
    the CPU cache misses incurred while walking long hash chains in the
    nfs4_file hash table.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Use set_bit(RQ_DROPME) [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sat Jan 7 10:15:35 2023 -0500

    NFSD: Use set_bit(RQ_DROPME)
    
    [ Upstream commit 5304930dbae82d259bcf7e5611db7c81e7a42eff ]
    
    The premise that "Once an svc thread is scheduled and executing an
    RPC, no other processes will touch svc_rqst::rq_flags" is false.
    svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
    threads when determining which thread to wake up next.
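    
    The difference between a plain and an atomic read-modify-write of a
    shared flags word can be sketched in userspace with C11 atomics. This
    is only an analogy for the kernel bit helpers, not their
    implementation, and the bit numbers below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit positions; the kernel's real values may differ. */
enum { RQ_BUSY_BIT = 0, RQ_DROPME_BIT = 1 };

/* Plain read-modify-write: only safe if no one else touches *flags. */
static void set_bit_nonatomic_sketch(int nr, unsigned long *flags)
{
	*flags |= 1UL << nr;
}

/* Atomic read-modify-write: safe even if other threads update the word. */
static void set_bit_sketch(int nr, atomic_ulong *flags)
{
	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * 8));
	atomic_fetch_or(flags, 1UL << nr);
}

static bool test_bit_sketch(int nr, atomic_ulong *flags)
{
	return (atomic_load(flags) >> nr) & 1UL;
}
```

    With the atomic variant, setting the "drop me" bit cannot lose a
    concurrent update to another bit in the same word.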
    
    Fixes: 9315564747cb ("NFSD: Use only RQ_DROPME to signal the need to drop a reply")
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Use struct_size() helper in alloc_session() [+ + +]

Author: Xiu Jianfeng <[email protected]>
Date:   Fri Nov 11 17:18:35 2022 +0800

    NFSD: Use struct_size() helper in alloc_session()
    
    [ Upstream commit 85a0d0c9a58002ef7d1bf5e3ea630f4fbd42a4f0 ]
    
    Use struct_size() helper to simplify the code, no functional changes.
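    
    As a rough userspace illustration of what such a helper computes, the
    sketch below sizes a structure that ends in a flexible array. The
    types and function names are hypothetical, and the explicit overflow
    check is a simplified stand-in for the saturating behaviour of the
    kernel macro.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a session with a trailing slot table. */
struct slot { uint32_t seqid; };
struct session {
	unsigned int numslots;
	struct slot *slots[];	/* flexible array member */
};

/* Header plus n trailing elements; saturates to SIZE_MAX on overflow. */
static size_t session_size(size_t n)
{
	size_t elem = sizeof(struct slot *);
	size_t base = offsetof(struct session, slots);

	if (n > (SIZE_MAX - base) / elem)
		return SIZE_MAX;	/* makes the allocation fail cleanly */
	return base + n * elem;
}

static struct session *alloc_session_sketch(size_t n)
{
	struct session *s;

	assert(n <= UINT_MAX);	/* numslots is an unsigned int here */
	s = calloc(1, session_size(n));
	if (s)
		s->numslots = (unsigned int)n;
	return s;
}
```

    The point of the helper is that the size arithmetic lives in one
    place instead of being open-coded at each allocation site.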
    
    Signed-off-by: Xiu Jianfeng <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 12 17:23:02 2022 -0400

    NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks
    
    [ Upstream commit c3d2a04f05c590303c125a176e6e43df4a436fdb ]
    
    Replace the check for buffer over/underflow with a helper that is
    commonly used for this purpose. The helper also sets xdr->nwords
    correctly after successfully linearizing the symlink argument into
    the stream's scratch buffer.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Use xdr_pad_size() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 22 16:09:16 2022 -0400

    NFSD: Use xdr_pad_size()
    
    [ Upstream commit 5e64d85c7d0c59cfcd61d899720b8ccfe895d743 ]
    
    Clean up: Use a helper instead of open-coding the calculation of
    the XDR pad size.
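    
    For reference, XDR pads opaque data to a multiple of four bytes. A
    minimal userspace sketch of that calculation follows; the "_sketch"
    names are hypothetical and only mirror the idea of the kernel helper.

```c
#include <assert.h>
#include <stddef.h>

#define XDR_UNIT_SKETCH 4u	/* XDR encodes data in 4-byte units */

/* Round n up to the next multiple of the XDR unit. */
static size_t xdr_align_size_sketch(size_t n)
{
	const size_t mask = XDR_UNIT_SKETCH - 1;

	return (n + mask) & ~mask;
}

/* Number of zero bytes that must follow n bytes of opaque data. */
static size_t xdr_pad_size_sketch(size_t n)
{
	size_t pad = xdr_align_size_sketch(n) - n;

	assert(pad < XDR_UNIT_SKETCH);
	return pad;
}
```

    For example, a 5-byte opaque needs 3 pad bytes, and a 4-byte one
    needs none.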
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: verify the opened dentry after setting a delegation [+ + +]

Author: Jeff Layton <[email protected]>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: verify the opened dentry after setting a delegation
    
    [ Upstream commit 876c553cb41026cb6ad3cef970a35e5f69c42a25 ]
    
    Between opening a file and setting a delegation on it, someone could
    rename or unlink the dentry. If this happens, we do not want to grant a
    delegation on the open.
    
    On a CLAIM_NULL open, we're opening by filename, and we may (in the
    non-create case) or may not (in the create case) be holding i_rwsem
    when attempting to set a delegation.  The latter case allows a
    race.
    
    After getting a lease, redo the lookup of the file being opened and
    validate that the resulting dentry matches the one in the open file
    description.
    
    To properly redo the lookup we need an rqst pointer to pass to
    nfsd_lookup_dentry(), so make sure that is available.
    
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: WARN when freeing an item still linked via nf_lru [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:25:04 2022 -0400

    NFSD: WARN when freeing an item still linked via nf_lru
    
    [ Upstream commit 668ed92e651d3c25f9b6e8cb7ceca54d00daa96d ]
    
    Add a guardrail to prevent freeing memory that is still on a list.
    This includes either a dispose list or the LRU list.
    
    This is the sign of a bug, but this class of bugs can be detected
    so that they don't endanger system stability, especially while
    debugging.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Write verifier might go backwards [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Dec 30 10:26:18 2021 -0500

    NFSD: Write verifier might go backwards
    
    [ Upstream commit cdc556600c0133575487cc69fb3128440b3c3e92 ]
    
    When vfs_iter_write() starts to fail because a file system is full,
    a bunch of writes can fail at once with ENOSPC. These writes
    repeatedly invoke nfsd_reset_boot_verifier() in quick succession.
    
    Ensure that the time it grabs doesn't go backwards due to an ntp
    adjustment going on at the same time.
    
    Signed-off-by: Chuck Lever <[email protected]>

NFSD: Zero counters when the filecache is re-initialized [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jul 8 14:24:51 2022 -0400

    NFSD: Zero counters when the filecache is re-initialized
    
    [ Upstream commit 8b330f78040cbe16cf8029df70391b2a491f17e2 ]
    
    If nfsd_file_cache_init() is called after a shutdown, be sure the
    stat counters are reset.
    
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

NFSD:fix boolreturn.cocci warning [+ + +]

Author: Changcheng Deng <[email protected]>
Date:   Tue Oct 19 04:14:22 2021 +0000

    NFSD:fix boolreturn.cocci warning
    
    [ Upstream commit 291cd656da04163f4bba67953c1f2f823e0d1231 ]
    
    ./fs/nfsd/nfssvc.c: 1072: 8-9: :WARNING return of 0/1 in function
    'nfssvc_decode_voidarg' with return type bool
    
    Return statements in functions returning bool should use true/false
    instead of 1/0.
    
    Reported-by: Zeal Robot <[email protected]>
    Signed-off-by: Changcheng Deng <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nfsd_splice_actor(): handle compound pages [+ + +]

Author: Al Viro <[email protected]>
Date:   Sat Sep 10 22:14:02 2022 +0100

    nfsd_splice_actor(): handle compound pages
    
    [ Upstream commit bfbfb6182ad1d7d184b16f25165faad879147f79 ]
    
    pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE
    worth of data).  Theoretically it had been possible since way back, but
    nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change.
    Fortunately, the only thing that changes for compound pages is that we
    need to stuff each relevant subpage in and convert the offset into offset
    in the first subpage.
    
    Acked-by: Chuck Lever <[email protected]>
    Tested-by: Benjamin Coddington <[email protected]>
    Fixes: f0f6b614f83d "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE"
    Signed-off-by: Al Viro <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nilfs2: fix failure to detect DAT corruption in btree and direct mappings [+ + +]

Author: Ryusuke Konishi <[email protected]>
Date:   Wed Mar 13 19:58:26 2024 +0900

    nilfs2: fix failure to detect DAT corruption in btree and direct mappings
    
    [ Upstream commit f2f26b4a84a0ef41791bd2d70861c8eac748f4ba ]
    
    Patch series "nilfs2: fix kernel bug at submit_bh_wbc()".
    
    This resolves a kernel BUG reported by syzbot.  Since there are two
    flaws involved, I've made each one a separate patch.
    
    The first patch alone resolves the syzbot-reported bug, but I think
    both fixes should be sent to stable, so I've tagged them as such.
    
    This patch (of 2):
    
    Syzbot has reported a kernel bug in submit_bh_wbc() when writing file data
    to a nilfs2 file system whose metadata is corrupted.
    
    There are two flaws involved in this issue.
    
    The first flaw is that when nilfs_get_block() locates a data block using
    btree or direct mapping, if the disk address translation routine
    nilfs_dat_translate() fails with internal code -ENOENT due to DAT metadata
    corruption, it can be passed back to nilfs_get_block().  This causes
    nilfs_get_block() to misidentify an existing block as non-existent,
    causing both data block lookup and insertion to fail inconsistently.
    
    The second flaw is that nilfs_get_block() returns a successful status in
    this inconsistent state.  This causes the caller __block_write_begin_int()
    or others to request a read even though the buffer is not mapped,
    resulting in a BUG_ON check for the BH_Mapped flag in submit_bh_wbc()
    failing.
    
    This fixes the first issue by changing the return value to code -EINVAL
    when a conversion using DAT fails with code -ENOENT, avoiding the
    conflicting condition that leads to the kernel bug described above.  Here,
    code -EINVAL indicates that metadata corruption was detected during the
    block lookup, which will be properly handled as a file system error and
    converted to -EIO when passing through the nilfs2 bmap layer.
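    
    The error remapping described above can be sketched as follows. This
    is a userspace illustration under stated assumptions: translate_fn,
    the two example translators and lookup_block_sketch() are
    hypothetical and do not reflect the actual nilfs2 function
    signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical translation step: 0 on success, negative errno on failure. */
typedef int (*translate_fn)(unsigned long vblocknr, unsigned long *blocknr);

/* Example translators used only to exercise the sketch. */
static int translate_ok(unsigned long vblocknr, unsigned long *blocknr)
{
	*blocknr = vblocknr + 100;
	return 0;
}

static int translate_missing(unsigned long vblocknr, unsigned long *blocknr)
{
	(void)vblocknr;
	(void)blocknr;
	return -ENOENT;		/* simulates a missing translation entry */
}

/*
 * The mapping says the block exists, so a translation failure with
 * -ENOENT must not be reported as "no such block"; report it as
 * invalid metadata instead.
 */
static int lookup_block_sketch(translate_fn translate,
			       unsigned long vblocknr, unsigned long *blocknr)
{
	int ret;

	assert(translate != NULL && blocknr != NULL);
	ret = translate(vblocknr, blocknr);
	if (ret == -ENOENT)
		ret = -EINVAL;
	return ret;
}
```

    Keeping -ENOENT reserved for "the block really is not mapped" is what
    lets the caller tell a hole apart from corruption.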
    
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: c3a7abf06ce7 ("nilfs2: support contiguous lookup of blocks")
    Signed-off-by: Ryusuke Konishi <[email protected]>
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=cfed5b56649bddf80d6e
    Tested-by: Ryusuke Konishi <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

nilfs2: prevent kernel bug at submit_bh_wbc() [+ + +]

Author: Ryusuke Konishi <[email protected]>
Date:   Wed Mar 13 19:58:27 2024 +0900

    nilfs2: prevent kernel bug at submit_bh_wbc()
    
    [ Upstream commit 269cdf353b5bdd15f1a079671b0f889113865f20 ]
    
    Fix a bug where nilfs_get_block() returns a successful status when
    searching and inserting the specified block both fail inconsistently.  If
    this inconsistent behavior is not due to a previously fixed bug, then an
    unexpected race is occurring, so return a temporary error -EAGAIN instead.
    
    This prevents callers such as __block_write_begin_int() from requesting a
    read into a buffer that is not mapped, which would cause the BUG_ON check
    for the BH_Mapped flag in submit_bh_wbc() to fail.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl")
    Signed-off-by: Ryusuke Konishi <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

NLM: Defend against file_lock changes after vfs_test_lock() [+ + +]

Author: Benjamin Coddington <[email protected]>
Date:   Mon Jun 13 09:40:06 2022 -0400

    NLM: Defend against file_lock changes after vfs_test_lock()
    
    [ Upstream commit 184cefbe62627730c30282df12bcff9aae4816ea ]
    
    Instead of trusting that struct file_lock returns completely unchanged
    after vfs_test_lock() when there's no conflicting lock, stash away our
    nlm_lockowner reference so we can properly release it for all cases.
    
    This defends against another file_lock implementation overwriting fl_owner
    when the return type is F_UNLCK.
    
    Reported-by: Roberto Bergantinos Corpas <[email protected]>
    Tested-by: Roberto Bergantinos Corpas <[email protected]>
    Signed-off-by: Benjamin Coddington <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

nvme: fix miss command type check [+ + +]

Author: min15.li <[email protected]>
Date:   Fri May 26 17:06:56 2023 +0000

    nvme: fix miss command type check
    
    commit 31a5978243d24d77be4bacca56c78a0fbc43b00d upstream.
    
    nvme_passthru_end() checks only the command opcode, not the command
    type (I/O or admin). A Dataset Management command, whose opcode is the
    same as that of the admin Set Features command, is therefore mistaken
    for Set Features: the kernel updates the controller's keep-alive
    interval and calls nvme_keep_alive_work().
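    
    A minimal sketch of the kind of check involved: because the admin and
    I/O command sets reuse opcode values, an opcode only identifies a
    command together with its command set. The enum and function names
    are hypothetical; the opcode value 0x09 is taken from the NVMe
    specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Both commands use opcode 0x09, in different command sets. */
#define ADMIN_SET_FEATURES 0x09
#define IO_DATASET_MGMT    0x09

enum cmd_type { CMD_ADMIN, CMD_IO };	/* hypothetical tag for this sketch */

/* Opcode alone is ambiguous; the command type must match as well. */
static bool is_set_features(enum cmd_type type, uint8_t opcode)
{
	assert(type == CMD_ADMIN || type == CMD_IO);
	return type == CMD_ADMIN && opcode == ADMIN_SET_FEATURES;
}
```

    With this check, an I/O command carrying opcode 0x09 no longer
    triggers the Set Features handling.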
    
    Signed-off-by: min15.li <[email protected]>
    Reviewed-by: Kanchan Joshi <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
    Signed-off-by: Tokunori Ikegami <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

nvmem: meson-efuse: fix function pointer type mismatch [+ + +]

Author: Jerome Brunet <[email protected]>
Date:   Sat Feb 24 11:40:23 2024 +0000

    nvmem: meson-efuse: fix function pointer type mismatch
    
    [ Upstream commit cbd38332c140829ab752ba4e727f98be5c257f18 ]
    
    clang-16 warns about casting functions to incompatible types, as is done
    here to call clk_disable_unprepare:
    
    drivers/nvmem/meson-efuse.c:78:12: error: cast from 'void (*)(struct clk *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
       78 |                                        (void(*)(void *))clk_disable_unprepare,
    
    The pattern of getting, enabling and setting a disable callback for a
    clock can be replaced with devm_clk_get_enabled(), which also fixes
    this warning.
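    
    For context, the warning is about calling a function through a
    pointer of a different type. The commit avoids the cast altogether by
    using devm_clk_get_enabled(); the userspace sketch below shows a
    different, generic remedy (a wrapper with exactly the callback's
    type) purely as an illustration. All names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct clk_sketch { int enabled; };	/* hypothetical stand-in for a clock */

static void clk_disable_sketch(struct clk_sketch *clk)
{
	clk->enabled = 0;
}

/* Cleanup callbacks in this sketch take a void *. */
typedef void (*action_fn)(void *data);

/* Wrapper with the callback's exact type: no function-pointer cast. */
static void clk_disable_action(void *data)
{
	assert(data != NULL);
	clk_disable_sketch(data);
}

static void run_action(action_fn fn, void *data)
{
	fn(data);
}
```

    Casting clk_disable_sketch to action_fn would compile, but calling it
    through the mismatched type is what the compiler warns about.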
    
    Fixes: 611fbca1c861 ("nvmem: meson-efuse: add peripheral clock")
    Cc: [email protected]
    Reported-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Jerome Brunet <[email protected]>
    Reviewed-by: Martin Blumenstingl <[email protected]>
    Acked-by: Arnd Bergmann <[email protected]>
    Reviewed-by: Justin Stitt <[email protected]>
    Signed-off-by: Srinivas Kandagatla <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

octeontx2-af: Fix issue with loading coalesced KPU profiles [+ + +]

Author: Hariprasad Kelam <[email protected]>
Date:   Tue Mar 26 17:51:49 2024 +0530

    octeontx2-af: Fix issue with loading coalesced KPU profiles
    
    commit 0ba80d96585662299d4ea4624043759ce9015421 upstream.
    
    The current implementation for loading coalesced KPU profiles has
    a limitation.  The "offset" field, which is used to locate profiles
    within the profile is restricted to a u16.
    
    This restricts the number of profiles that can be loaded. This patch
    addresses this limitation by increasing the size of the "offset" field.
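    
    The effect of a 16-bit offset can be illustrated with a small
    userspace sketch. The function names and entry sizes are made up for
    the example, and the 32-bit width of the wider variant is an
    assumption, not a statement about the actual field type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running offset kept in 16 bits: silently wraps once it exceeds 65535. */
static uint32_t end_offset_u16(const uint32_t *sizes, size_t n)
{
	uint16_t offset = 0;

	assert(sizes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		offset = (uint16_t)(offset + sizes[i]);
	return offset;
}

/* Same walk with a wider offset (width chosen for illustration). */
static uint32_t end_offset_u32(const uint32_t *sizes, size_t n)
{
	uint32_t offset = 0;

	assert(sizes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		offset += sizes[i];
	return offset;
}
```

    With two entries of 40000 and 30000 bytes, the 16-bit variant yields
    4464 instead of 70000, so any entry past the 64 KiB mark would be
    looked up at the wrong place.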
    
    Fixes: 11c730bfbf5b ("octeontx2-af: support for coalescing KPU profiles")
    Signed-off-by: Hariprasad Kelam <[email protected]>
    Reviewed-by: Kalesh AP <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Octeontx2-af: fix pause frame configuration in GMP mode [+ + +]

Author: Hariprasad Kelam <[email protected]>
Date:   Tue Mar 26 10:57:20 2024 +0530

    Octeontx2-af: fix pause frame configuration in GMP mode
    
    [ Upstream commit 40d4b4807cadd83fb3f46cc8cd67a945b5b25461 ]
    
    The Octeontx2 MAC block (CGX) has separate data paths (SMU and GMP) for
    different speeds, allowing for efficient data transfer.
    
    The previous patch, which added pause frame configuration, has a bug
    that prevents the pause frame feature from working in GMP mode.
    
    This patch fixes the issue by configuring the appropriate registers.
    
    Fixes: f7e086e754fe ("octeontx2-af: Pause frame configuration at cgx")
    Signed-off-by: Hariprasad Kelam <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

octeontx2-pf: check negative error code in otx2_open() [+ + +]

Author: Su Hui <[email protected]>
Date:   Thu Mar 28 10:06:21 2024 +0800

    octeontx2-pf: check negative error code in otx2_open()
    
    commit e709acbd84fb6ef32736331b0147f027a3ef4c20 upstream.
    
    otx2_rxtx_enable() returns a negative error code such as -EIO, so
    check for -EIO rather than EIO to fix this problem.
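    
    A minimal userspace sketch of the sign mistake follows. The helper
    names are hypothetical; the only assumption carried over from the
    kernel is the convention of returning 0 on success and a negative
    errno value on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper following the 0 / negative-errno convention. */
static int rxtx_enable_sketch(bool link_ok)
{
	return link_ok ? 0 : -EIO;
}

/* Buggy comparison: a negative return value never equals positive EIO. */
static bool failed_with_eio_buggy(int err)
{
	return err == EIO;
}

/* Fixed comparison: match the negative code that is actually returned. */
static bool failed_with_eio_fixed(int err)
{
	assert(err <= 0);	/* convention: 0 or negative errno */
	return err == -EIO;
}
```

    The buggy form compiles without complaint, which is why this class
    of error tends to survive until the failure path is actually hit.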
    
    Fixes: c926252205c4 ("octeontx2-pf: Disable packet I/O for graceful exit")
    Signed-off-by: Su Hui <[email protected]>
    Reviewed-by: Subbaraya Sundeep <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Reviewed-by: Kalesh AP <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

of: dynamic: Synchronize of_changeset_destroy() with the devlink removals [+ + +]

Author: Herve Codina <[email protected]>
Date:   Mon Mar 25 16:21:26 2024 +0100

    of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
    
    commit 8917e7385346bd6584890ed362985c219fe6ae84 upstream.
    
    In the following sequence:
      1) of_platform_depopulate()
      2) of_overlay_remove()
    
    During the step 1, devices are destroyed and devlinks are removed.
    During the step 2, OF nodes are destroyed but
    __of_changeset_entry_destroy() can raise warnings related to missing
    of_node_put():
      ERROR: memory leak, expected refcount 1 instead of 2 ...
    
    Indeed, during the devlink removals performed at step 1, the removal
    itself releasing the device (and the attached of_node) is done by a job
    queued in a workqueue and so, it is done asynchronously with respect to
    function calls.
    When the warning appears, of_node_put() is still called, but too late,
    from the workqueue job.
    
    In order to be sure that any ongoing devlink removals are done before
    the of_node destruction, synchronize the of_changeset_destroy() with the
    devlink removals.
    
    Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
    Cc: [email protected]
    Signed-off-by: Herve Codina <[email protected]>
    Reviewed-by: Saravana Kannan <[email protected]>
    Tested-by: Luca Ceresoli <[email protected]>
    Reviewed-by: Nuno Sa <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Rob Herring <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached [+ + +]

Author: Jann Horn <[email protected]>
Date:   Thu Oct 6 20:33:01 2022 +0200

    openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
    
    [ Upstream commit 28148a17c988b614534f457da86893f83664ad43 ]
    
    Since commit 8782fb61cc848 ("mm: pagewalk: Fix race between unmap and page
    walker"), walk_page_range() on kernel ranges won't work anymore,
    walk_page_range_novma() must be used instead.
    
    Note: I don't have an openrisc development setup, so this is completely
    untested.
    
    Fixes: 8782fb61cc848 ("mm: pagewalk: Fix race between unmap and page walker")
    Signed-off-by: Jann Horn <[email protected]>
    Signed-off-by: Stafford Horne <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

NFSD: De-duplicate hash bucket indexing [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 30 19:19:57 2021 -0400

    NFSD: De-duplicate hash bucket indexing
    
    [ Upstream commit 378a6109dd142a678f629b740f558365150f60f9 ]
    
    Clean up: The details of finding the right hash bucket are exactly
    the same in both nfsd_cache_lookup() and nfsd_cache_update().
    
    Signed-off-by: Chuck Lever <[email protected]>

parisc: Avoid clobbering the C/B bits in the PSW with tophys and tovirt macros [+ + +]

Author: John David Anglin <[email protected]>
Date:   Fri Feb 23 16:40:51 2024 +0100

    parisc: Avoid clobbering the C/B bits in the PSW with tophys and tovirt macros
    
    [ Upstream commit 4603fbaa76b5e703b38ac8cc718102834eb6e330 ]
    
    Use add,l to avoid clobbering the C/B bits in the PSW.
    
    Signed-off-by: John David Anglin <[email protected]>
    Signed-off-by: Helge Deller <[email protected]>
    Cc: [email protected] # v5.10+
    Signed-off-by: Sasha Levin <[email protected]>

parisc: Fix csum_ipv6_magic on 32-bit systems [+ + +]

Author: Guenter Roeck <[email protected]>
Date:   Sat Feb 10 11:15:56 2024 -0800

    parisc: Fix csum_ipv6_magic on 32-bit systems
    
    [ Upstream commit 4408ba75e4ba80c91fde7e10bccccf388f5c09be ]
    
    Calculating the IPv6 checksum on 32-bit systems missed overflows when
    adding the proto+len fields into the checksum. This results in the
    following unit test failure.
    
        # test_csum_ipv6_magic: ASSERTION FAILED at lib/checksum_kunit.c:506
        Expected ( u64)csum_result == ( u64)expected, but
            ( u64)csum_result == 46722 (0xb682)
            ( u64)expected == 46721 (0xb681)
        not ok 5 test_csum_ipv6_magic
    
    This is probably rarely seen in the real world because proto+len are
    usually small values which will rarely result in overflows when calculating
    the checksum. However, the unit test code uses large values for the length
    field, causing the test to fail.
    
    Fix the problem by adding the missing carry into the final checksum.
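    
    The role of the carry can be shown with a portable C sketch of
    ones'-complement addition. This is not the parisc assembler; the
    function names are hypothetical and the operand values were picked
    only to force an overflow out of bit 31.

```c
#include <assert.h>
#include <stdint.h>

/* Ones'-complement add: a carry out of bit 31 is added back in. */
static uint32_t csum_add32(uint32_t sum, uint32_t addend)
{
	uint32_t res = sum + addend;

	return res + (res < addend);	/* end-around carry */
}

/* Same add with the carry dropped (models the missed overflow). */
static uint32_t csum_add32_no_carry(uint32_t sum, uint32_t addend)
{
	return sum + addend;
}

/* Fold a 32-bit partial sum to 16 bits and complement it. */
static uint16_t csum_fold32(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	assert(sum <= 0xffff);
	return (uint16_t)~sum;
}
```

    For 0xffff0000 + 0x00020000 the folded result is 0xfffd with the
    carry and 0xfffe without it: an off-by-one of the same kind as in
    the unit test failure quoted above.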
    
    Cc: Palmer Dabbelt <[email protected]>
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: [email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Tested-by: Charlie Jenkins <[email protected]>
    Reviewed-by: Charlie Jenkins <[email protected]>
    Signed-off-by: Helge Deller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

parisc: Fix csum_ipv6_magic on 64-bit systems [+ + +]

Author: Guenter Roeck <[email protected]>
Date:   Tue Feb 13 15:46:31 2024 -0800

    parisc: Fix csum_ipv6_magic on 64-bit systems
    
    [ Upstream commit 4b75b12d70506e31fc02356bbca60f8d5ca012d0 ]
    
    hppa 64-bit systems calculate the IPv6 checksum using 64-bit add
    operations. The last add folds protocol and length fields into the 64-bit
    result. While unlikely, this operation can overflow. The overflow can be
    triggered with a code sequence such as the following.
    
            /* try to trigger massive overflows */
            memset(tmp_buf, 0xff, sizeof(struct in6_addr));
            csum_result = csum_ipv6_magic((struct in6_addr *)tmp_buf,
                                          (struct in6_addr *)tmp_buf,
                                          0xffff, 0xff, 0xffffffff);
    
    Fix the problem by adding any overflows from the final add operation into
    the calculated checksum. Fortunately, we can do this without additional
    cost by replacing the add operation used to fold the checksum into 32 bit
    with "add,dc" to add in the missing carry.
    
    Cc: Palmer Dabbelt <[email protected]>
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: [email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Reviewed-by: Charlie Jenkins <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Signed-off-by: Helge Deller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

parisc: Fix ip_fast_csum [+ + +]

Author: Guenter Roeck <[email protected]>
Date:   Sat Feb 10 09:55:26 2024 -0800

    parisc: Fix ip_fast_csum
    
    [ Upstream commit a2abae8f0b638c31bb9799d9dd847306e0d005bd ]
    
    IP checksum unit tests report the following error when run on hppa/hppa64.
    
        # test_ip_fast_csum: ASSERTION FAILED at lib/checksum_kunit.c:463
        Expected ( u64)csum_result == ( u64)expected, but
            ( u64)csum_result == 33754 (0x83da)
            ( u64)expected == 10946 (0x2ac2)
        not ok 4 test_ip_fast_csum
    
    0x83da is the expected result if the IP header length is 20 bytes. 0x2ac2
    is the expected result if the IP header length is 24 bytes. The test fails
    with an IP header length of 24 bytes. It appears that ip_fast_csum()
    always returns the checksum for a 20-byte header, no matter how long
    the header actually is.
    
    Code analysis shows a suspicious assembler sequence in ip_fast_csum().
    
     "      addc            %0, %3, %0\n"
     "1:    ldws,ma         4(%1), %3\n"
     "      addib,<         0, %2, 1b\n"    <---
    
    While my understanding of HPPA assembler is limited, it does not seem
    to make much sense to subtract 0 from a register and to expect the result
    to ever be negative. Subtracting 1 from the length parameter makes more
    sense. On top of that, the operation should be repeated if and only if
    the result is still > 0, so change the suspicious instruction to
     "      addib,>         -1, %2, 1b\n"
    
    The IP checksum unit test passes after this change.
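    
    As a cross-check of the intended behaviour, here is a portable C
    sketch of an IPv4 header checksum that covers the whole header,
    however long it is. The function name is hypothetical, and unlike
    the kernel routine it returns the checksum as a host-order value
    built from big-endian 16-bit words.

```c
#include <assert.h>
#include <stdint.h>

/* Checksum over ihl 32-bit words of an IPv4 header (ihl >= 5). */
static uint16_t ip_csum_sketch(const uint8_t *hdr, unsigned int ihl)
{
	uint32_t sum = 0;

	assert(ihl >= 5);	/* an IPv4 header is at least 20 bytes */

	/* One iteration per 16-bit word, for the *whole* header. */
	for (unsigned int i = 0; i < ihl * 4; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold carries back in, then complement. */
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

    Passing 5 instead of 6 for a 24-byte header reproduces the symptom
    described above: the option words are left out and a different
    checksum comes back.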
    
    Cc: Palmer Dabbelt <[email protected]>
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: [email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Tested-by: Charlie Jenkins <[email protected]>
    Reviewed-by: Charlie Jenkins <[email protected]>
    Signed-off-by: Helge Deller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds [+ + +]

Author: Guenter Roeck <[email protected]>
Date:   Tue Feb 27 12:33:51 2024 -0800

    parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds
    
    [ Upstream commit 0568b6f0d863643db2edcc7be31165740c89fa82 ]
    
    IPv6 checksum tests with unaligned addresses on 64-bit builds result
    in unexpected failures.
    
    Expected expected == csum_result, but
        expected == 46591 (0xb5ff)
        csum_result == 46381 (0xb52d)
    with alignment offset 1
    
    Oddly enough, the problem disappeared after adding test code into
    the beginning of csum_ipv6_magic().
    
    As it turns out, the 'sum' parameter of csum_ipv6_magic() is declared as
    __wsum, which is a 32-bit variable. However, it is treated as 64-bit
    variable in the 64-bit assembler code. Tests showed that the upper 32 bit
    of the register used to pass the variable are _not_ cleared when entering
    the function. This can result in checksum calculation errors.
    
    Clearing the upper 32 bit of 'sum' as first operation in the assembler
    code fixes the problem.
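    
    The effect of stale upper bits can be modelled in portable C. In the
    sketch below, 'reg' stands for a 64-bit register that carries a
    32-bit argument; the function names and values are hypothetical and
    this is not the parisc assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 64-bit accumulator down to 16 bits (ones'-complement). */
static uint16_t fold64(uint64_t acc)
{
	acc = (acc & 0xffffffffu) + (acc >> 32);
	acc = (acc & 0xffffffffu) + (acc >> 32);
	acc = (acc & 0xffff) + (acc >> 16);
	acc = (acc & 0xffff) + (acc >> 16);
	assert(acc <= 0xffff);
	return (uint16_t)acc;
}

/*
 * Add a 32-bit 'sum' argument, passed in a 64-bit register, to a
 * partial checksum. With clear_upper set, bits 32..63 of the register
 * are discarded first, which is what a 32-bit argument requires.
 */
static uint16_t csum_with_sum(uint64_t reg, uint32_t partial, int clear_upper)
{
	uint64_t sum = clear_upper ? (uint32_t)reg : reg;

	return fold64(sum + partial);
}
```

    If the upper half of the register happens to hold a stray 1, the
    unmasked variant folds it into the result and the checksum comes out
    one too high.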
    
    Acked-by: Helge Deller <[email protected]>
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: [email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Signed-off-by: Helge Deller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI/AER: Block runtime suspend when handling errors [+ + +]

Author: Stanislaw Gruszka <[email protected]>
Date:   Mon Feb 12 13:01:35 2024 +0100

    PCI/AER: Block runtime suspend when handling errors
    
    [ Upstream commit 002bf2fbc00e5c4b95fb167287e2ae7d1973281e ]
    
    PM runtime can be done simultaneously with AER error handling.  Avoid that
    by using pm_runtime_get_sync() before and pm_runtime_put() after reset in
    pcie_do_recovery() for all recovering devices.
    
    pm_runtime_get_sync() will increase dev->power.usage_count counter to
    prevent any possible future request to runtime suspend a device.  It will
    also resume a device, if it was previously in D3hot state.
    
    I tested with igc device by doing simultaneous aer_inject and rpm
    suspend/resume via /sys/bus/pci/devices/PCI_ID/power/control and can
    reproduce:
    
      igc 0000:02:00.0: not ready 65535ms after bus reset; giving up
      pcieport 0000:00:1c.2: AER: Root Port link has been reset (-25)
      pcieport 0000:00:1c.2: AER: subordinate device reset failed
      pcieport 0000:00:1c.2: AER: device recovery failed
      igc 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
    
    The problem disappears when this patch is applied.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Stanislaw Gruszka <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
    Acked-by: Rafael J. Wysocki <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI/ASPM: Make Intel DG2 L1 acceptable latency unlimited [+ + +]

Author: Mika Westerberg <[email protected]>
Date:   Tue Apr 5 12:38:10 2022 +0300

    PCI/ASPM: Make Intel DG2 L1 acceptable latency unlimited
    
    [ Upstream commit 03038d84ace72678a9944524508f218a00377dc0 ]
    
    Intel DG2 discrete graphics PCIe endpoints advertise L1 acceptable exit
    latency to be < 1us even though they can actually tolerate unlimited exit
    latencies just fine. Quirk the L1 acceptable exit latency for these
    endpoints to be unlimited so ASPM L1 can be enabled.
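    
    As a rough userspace illustration (not the upstream patch): the Endpoint
    L1 Acceptable Latency field occupies bits 11:9 of the Device Capabilities
    register, and the encoding 7 means "no limit". The mask below uses the
    same bits as the kernel's PCI_EXP_DEVCAP_L1; the helper names and the
    sample register value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEVCAP_L1_MASK   0x00000e00u  /* same bits as PCI_EXP_DEVCAP_L1 */
#define DEVCAP_L1_SHIFT  9
#define L1_LAT_NO_LIMIT  7u           /* encoding 111b: no limit */

/* Hypothetical helper: extract the L1 acceptable latency encoding. */
static uint32_t devcap_get_l1_latency(uint32_t devcap)
{
    return (devcap & DEVCAP_L1_MASK) >> DEVCAP_L1_SHIFT;
}

/* Hypothetical helper: return devcap with the L1 field replaced by enc. */
static uint32_t devcap_set_l1_latency(uint32_t devcap, uint32_t enc)
{
    assert(enc <= L1_LAT_NO_LIMIT);
    return (devcap & ~DEVCAP_L1_MASK) | (enc << DEVCAP_L1_SHIFT);
}
```

    Calling devcap_set_l1_latency(devcap, L1_LAT_NO_LIMIT) leaves every other
    bit of the register untouched and changes only the advertised latency.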
    
    [bhelgaas: use FIELD_GET/FIELD_PREP, wordsmith comment & commit log]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mika Westerberg <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Reviewed-by: Rodrigo Vivi <[email protected]>
    Stable-dep-of: 627c6db20703 ("PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports")
    Signed-off-by: Sasha Levin <[email protected]>

PCI/DPC: Quirk PIO log size for certain Intel Root Ports [+ + +]

Author: Mika Westerberg <[email protected]>
Date:   Tue Aug 16 13:20:42 2022 +0300

    PCI/DPC: Quirk PIO log size for certain Intel Root Ports
    
    [ Upstream commit 5459c0b7046752e519a646e1c2404852bb628459 ]
    
    Some Root Ports on Intel Tiger Lake and Alder Lake systems support the RP
    Extensions for DPC and the RP PIO Log registers but incorrectly advertise
    an RP PIO Log Size of zero.  This means the kernel complains that:
    
      DPC: RP PIO log size 0 is invalid
    
    and if DPC is triggered, the DPC driver will not dump the RP PIO Log
    registers when it should.
    
    This is caused by a BIOS bug and should be fixed in the BIOS for future CPUs.
    
    Add a quirk to set the correct RP PIO Log size for the affected Root Ports.
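    
    The interaction between the advertised size and a quirk can be sketched
    as follows. The assumptions here are that the log size sits in bits 11:8
    of the DPC Capability register, that sizes outside 4..9 are treated as
    invalid, and that a quirk-provided size takes precedence. The function
    name is hypothetical and this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

#define DPC_RP_PIO_LOG_SIZE_MASK 0x0f00u /* assumed: bits 11:8 of DPC Capability */

/* Hypothetical helper: log size to use, or 0 if nothing valid is available. */
static unsigned int dpc_rp_log_size(uint16_t dpc_cap, unsigned int quirk_size)
{
    unsigned int size;

    /* A quirk may only supply a size that would itself be valid. */
    assert(quirk_size == 0 || (quirk_size >= 4 && quirk_size <= 9));

    /* A quirk-provided size takes precedence over the register. */
    if (quirk_size)
        return quirk_size;

    size = (dpc_cap & DPC_RP_PIO_LOG_SIZE_MASK) >> 8;
    if (size < 4 || size > 9)
        return 0;   /* the case that logs "RP PIO log size N is invalid" */
    return size;
}
```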
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=209943
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mika Westerberg <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
    Stable-dep-of: 627c6db20703 ("PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports")
    Signed-off-by: Sasha Levin <[email protected]>

PCI/DPC: Quirk PIO log size for Intel Ice Lake Root Ports [+ + +]

Author: Mika Westerberg <[email protected]>
Date:   Thu May 11 15:19:05 2023 +0300

    PCI/DPC: Quirk PIO log size for Intel Ice Lake Root Ports
    
    commit 3b8803494a0612acdeee714cb72aa142b1e05ce5 upstream.
    
    Commit 5459c0b70467 ("PCI/DPC: Quirk PIO log size for certain Intel Root
    Ports") added quirks for Tiger and Alder Lake Root Ports but missed that
    the same issue exists also in the previous generation, Ice Lake.
    
    Apply the quirk for Ice Lake Root Ports as well.  This prevents kernel
    complaints like:
    
      DPC: RP PIO log size 0 is invalid
    
    and also enables the DPC driver to dump the RP PIO Log registers when DPC
    is triggered.
    
    [bhelgaas: add dmesg warning and RP PIO Log dump info]
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=209943
    Link: https://lore.kernel.org/r/[email protected]
    Reported-by: Mark Blakeney <[email protected]>
    Signed-off-by: Mika Westerberg <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports [+ + +]

Author: Paul Menzel <[email protected]>
Date:   Tue Mar 5 12:30:56 2024 +0100

    PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports
    
    [ Upstream commit 627c6db20703b5d18d928464f411d0d4ec327508 ]
    
    Commit 5459c0b70467 ("PCI/DPC: Quirk PIO log size for certain Intel Root
    Ports") and commit 3b8803494a06 ("PCI/DPC: Quirk PIO log size for Intel Ice
    Lake Root Ports") add quirks for Ice, Tiger and Alder Lake Root Ports.
    System firmware for Raptor Lake still has the bug, so Linux logs the
    warning below on several Raptor Lake systems like Dell Precision 3581 with
    Intel Raptor Lake processor (0W18NX) system firmware/BIOS version 1.10.1.
    
      pci 0000:00:07.0: [8086:a76e] type 01 class 0x060400
      pci 0000:00:07.0: DPC: RP PIO log size 0 is invalid
      pci 0000:00:07.1: [8086:a73f] type 01 class 0x060400
      pci 0000:00:07.1: DPC: RP PIO log size 0 is invalid
    
    Apply the quirk for Raptor Lake Root Ports as well.
    
    This also enables the DPC driver to dump the RP PIO Log registers when DPC
    is triggered.
    
    Link: https://lore.kernel.org/r/[email protected]
    Reported-by: Niels van Aert <[email protected]>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218560
    Signed-off-by: Paul Menzel <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Cc: <[email protected]>
    Cc: Mika Westerberg <[email protected]>
    Cc: Niels van Aert <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI/PM: Drain runtime-idle callbacks before driver removal [+ + +]

Author: Rafael J. Wysocki <[email protected]>
Date:   Tue Mar 5 11:45:38 2024 +0100

    PCI/PM: Drain runtime-idle callbacks before driver removal
    
    [ Upstream commit 9d5286d4e7f68beab450deddbb6a32edd5ecf4bf ]
    
    A race condition between the .runtime_idle() callback and the .remove()
    callback in the rtsx_pcr PCI driver leads to a kernel crash due to an
    unhandled page fault [1].
    
    The problem is that rtsx_pci_runtime_idle() is not expected to be running
    after pm_runtime_get_sync() has been called, but the latter doesn't really
    guarantee that.  It only guarantees that the suspend and resume callbacks
    will not be running when it returns.
    
    However, if a .runtime_idle() callback is already running when
    pm_runtime_get_sync() is called, the latter will notice that the runtime PM
    status of the device is RPM_ACTIVE and it will return right away without
    waiting for the former to complete.  In fact, it cannot wait for
    .runtime_idle() to complete because it may be called from that callback (it
    arguably does not make much sense to do that, but it is not strictly
    prohibited).
    
    Thus in general, whoever is providing a .runtime_idle() callback needs
    to protect it from running in parallel with whatever code runs after
    pm_runtime_get_sync().  [Note that .runtime_idle() will not start after
    pm_runtime_get_sync() has returned, but it may continue running then if it
    has started earlier.]
    
    One way to address that race condition is to call pm_runtime_barrier()
    after pm_runtime_get_sync() (not before it, because a nonzero value of the
    runtime PM usage counter is necessary to prevent runtime PM callbacks from
    being invoked) to wait for the .runtime_idle() callback to complete should
    it be running at that point.  A suitable place for doing that is in
    pci_device_remove() which calls pm_runtime_get_sync() before removing the
    driver, so it may as well call pm_runtime_barrier() subsequently, which
    will prevent the race in question from occurring, not just in the rtsx_pcr
    driver, but in any PCI drivers providing .runtime_idle() callbacks.
    
    Link: https://lore.kernel.org/lkml/[email protected]/ # [1]
    Link: https://lore.kernel.org/r/5761426.DvuYhMxLoT@kreacher
    Reported-by: Kai-Heng Feng <[email protected]>
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Tested-by: Ricky Wu <[email protected]>
    Acked-by: Kai-Heng Feng <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: Drop pci_device_remove() test of pci_dev->driver [+ + +]

Author: Uwe Kleine-König <[email protected]>
Date:   Mon Oct 4 14:59:25 2021 +0200

    PCI: Drop pci_device_remove() test of pci_dev->driver
    
    [ Upstream commit 097d9d414433315122f759ee6c2d8a7417a8ff0f ]
    
    When the driver core calls pci_device_remove(), there is a driver bound
    to the device, so pci_dev->driver is never NULL.
    
    Remove the unnecessary test of pci_dev->driver.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Uwe Kleine-König <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Stable-dep-of: 9d5286d4e7f6 ("PCI/PM: Drain runtime-idle callbacks before driver removal")
    Signed-off-by: Sasha Levin <[email protected]>

PCI: dwc: endpoint: Fix advertised resizable BAR size [+ + +]

Author: Niklas Cassel <[email protected]>
Date:   Thu Mar 7 12:15:20 2024 +0100

    PCI: dwc: endpoint: Fix advertised resizable BAR size
    
    [ Upstream commit 72e34b8593e08a0ee759b7a038e0b178418ea6f8 ]
    
    The commit message in commit fc9a77040b04 ("PCI: designware-ep: Configure
    Resizable BAR cap to advertise the smallest size") claims that it modifies
    the Resizable BAR capability to only advertise support for 1 MB size BARs.
    
    However, the commit writes all zeroes to PCI_REBAR_CAP (the register which
    contains the possible sizes that a BAR can be resized to).
    
    According to the spec, it is illegal to not have a bit set in
    PCI_REBAR_CAP, and 1 MB is the smallest size allowed.
    
    Set bit 4 in PCI_REBAR_CAP, so that we actually advertise support for a
    1 MB BAR size.
    
    Before:
            Capabilities: [2e8 v1] Physical Resizable BAR
                    BAR 0: current size: 1MB
                    BAR 1: current size: 1MB
                    BAR 2: current size: 1MB
                    BAR 3: current size: 1MB
                    BAR 4: current size: 1MB
                    BAR 5: current size: 1MB
    After:
            Capabilities: [2e8 v1] Physical Resizable BAR
                    BAR 0: current size: 1MB, supported: 1MB
                    BAR 1: current size: 1MB, supported: 1MB
                    BAR 2: current size: 1MB, supported: 1MB
                    BAR 3: current size: 1MB, supported: 1MB
                    BAR 4: current size: 1MB, supported: 1MB
                    BAR 5: current size: 1MB, supported: 1MB
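    
    The bit layout behind "bit 4 means 1 MB" can be sketched in a few lines.
    This assumes the size bits start at bit 4 and each higher bit doubles the
    size; the helper names are hypothetical and this is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit (4 + n) of the Resizable BAR capability register advertises 2^n MB. */
#define REBAR_CAP_SIZE_SHIFT 4

/* Hypothetical helper: capability bit advertising a BAR of 2^log2_mb MB. */
static uint32_t rebar_cap_bit(unsigned int log2_mb)
{
    assert(log2_mb + REBAR_CAP_SIZE_SHIFT < 32);
    return UINT32_C(1) << (log2_mb + REBAR_CAP_SIZE_SHIFT);
}

/* Hypothetical helper: nonzero if the register advertises at least one size. */
static int rebar_cap_is_legal(uint32_t cap)
{
    return (cap >> REBAR_CAP_SIZE_SHIFT) != 0;
}
```

    Under these assumptions, an all-zero register fails rebar_cap_is_legal(),
    while rebar_cap_bit(0), that is 0x10, advertises exactly the 1 MB size.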
    
    Fixes: fc9a77040b04 ("PCI: designware-ep: Configure Resizable BAR cap to advertise the smallest size")
    Link: https://lore.kernel.org/linux-pci/[email protected]
    Signed-off-by: Niklas Cassel <[email protected]>
    Signed-off-by: Krzysztof Wilczyński <[email protected]>
    Reviewed-by: Manivannan Sadhasivam <[email protected]>
    Cc: <[email protected]> # 5.2
    Signed-off-by: Sasha Levin <[email protected]>

PCI: Work around Intel I210 ROM BAR overlap defect [+ + +]

Author: Bjorn Helgaas <[email protected]>
Date:   Tue Dec 21 10:45:07 2021 -0600

    PCI: Work around Intel I210 ROM BAR overlap defect
    
    [ Upstream commit 500b55b05d0a21c4adddf4c3b29ee6f32b502046 ]
    
    Per PCIe r5, sec 7.5.1.2.4, a device must not claim accesses to its
    Expansion ROM unless both the Memory Space Enable and the Expansion ROM
    Enable bit are set.  But apparently some Intel I210 NICs don't work
    correctly if the ROM BAR overlaps another BAR, even if the Expansion ROM is
    disabled.
    
    Michael reported that on a Kontron SMARC-sAL28 ARM64 system with U-Boot
    v2021.01-rc3, the ROM BAR overlaps BAR 3, and networking doesn't work at
    all:
    
      BAR 0: 0x40000000 (32-bit, non-prefetchable) [size=1M]
      BAR 3: 0x40200000 (32-bit, non-prefetchable) [size=16K]
      ROM:   0x40200000 (disabled) [size=1M]
    
      NETDEV WATCHDOG: enP2p1s0 (igb): transmit queue 0 timed out
      Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier (DT)
      igb 0002:01:00.0 enP2p1s0: Reset adapter
    
    Previously, pci_std_update_resource() wrote the assigned ROM address to the
    BAR only when the ROM was enabled.  This meant that the I210 ROM BAR could
    be left with an address assigned by firmware, which might overlap with
    other BARs.
    
    Quirk these I210 devices so pci_std_update_resource() always writes the
    assigned address to the ROM BAR, whether or not the ROM is enabled.
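    
    The overlap in the report above can be checked with simple interval
    arithmetic. The helper below is a hypothetical illustration that treats
    each BAR as a half-open address range; it is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nonzero if [a_start, a_start + a_size) and [b_start, b_start + b_size)
 * share at least one address.
 */
static int bar_ranges_overlap(uint64_t a_start, uint64_t a_size,
                              uint64_t b_start, uint64_t b_size)
{
    assert(a_size > 0 && b_size > 0);
    return a_start < b_start + b_size && b_start < a_start + a_size;
}
```

    With the reported values, BAR 3 (0x40200000, 16K) overlaps the ROM BAR
    (0x40200000, 1M), while BAR 0 (0x40000000, 1M) does not.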
    
    Link: https://lore.kernel.org/r/20211223163754.GA1267351@bhelgaas
    Link: https://lore.kernel.org/r/[email protected]
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=211105
    Reported-by: Michael Walle <[email protected]>
    Tested-by: Michael Walle <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Stable-dep-of: 627c6db20703 ("PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports")
    Signed-off-by: Sasha Levin <[email protected]>

pci_iounmap(): Fix MMIO mapping leak [+ + +]

Author: Philipp Stanner <[email protected]>
Date:   Wed Jan 31 10:00:20 2024 +0100

    pci_iounmap(): Fix MMIO mapping leak
    
    [ Upstream commit 7626913652cc786c238e2dd7d8740b17d41b2637 ]
    
    The #ifdef ARCH_HAS_GENERIC_IOPORT_MAP accidentally also guards iounmap(),
    which means MMIO mappings are leaked.
    
    Move the guard so we call iounmap() for MMIO mappings.
    
    Fixes: 316e8d79a095 ("pci_iounmap'2: Electric Boogaloo: try to make sense of it all")
    Link: https://lore.kernel.org/r/[email protected]
    Reported-by: Danilo Krummrich <[email protected]>
    Suggested-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Philipp Stanner <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Reviewed-by: Arnd Bergmann <[email protected]>
    Cc: <[email protected]> # v5.15+
    Signed-off-by: Sasha Levin <[email protected]>

perf/core: Fix reentry problem in perf_output_read_group() [+ + +]

Author: Yang Jihong <[email protected]>
Date:   Fri Sep 2 16:29:18 2022 +0800

    perf/core: Fix reentry problem in perf_output_read_group()
    
    commit 6b959ba22d34ca793ffdb15b5715457c78e38b1a upstream.
    
    perf_output_read_group() may respond to an IPI request from another core
    and invoke __perf_install_in_context(). As a result, the hwc configuration
    is modified, causing inconsistency and unexpected consequences.
    
    Interrupts are not disabled when perf_output_read_group() reads a PMU
    counter, so an IPI request from another core may be handled in the middle
    of the read. The PMU configuration is then modified and an error occurs
    when reading the PMU counter:
    
                         CPU0                                         CPU1
                                                          __se_sys_perf_event_open
                                                            perf_install_in_context
      perf_output_read_group                                  smp_call_function_single
        for_each_sibling_event(sub, leader) {                   generic_exec_single
          if ((sub != event) &&                                   remote_function
              (sub->state == PERF_EVENT_STATE_ACTIVE))                    |
      <enter IPI handler: __perf_install_in_context>   <----RAISE IPI-----+
      __perf_install_in_context
        ctx_resched
          event_sched_out
            armpmu_del
              ...
              hwc->idx = -1; // event->hwc.idx is set to -1
      ...
      <exit IPI>
                  sub->pmu->read(sub);
                    armpmu_read
                      armv8pmu_read_counter
                        armv8pmu_read_hw_counter
                          int idx = event->hw.idx; // idx = -1
                          u64 val = armv8pmu_read_evcntr(idx);
                            u32 counter = ARMV8_IDX_TO_COUNTER(idx); // invalid counter = 30
                            read_pmevcntrn(counter) // undefined instruction
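    
    The "invalid counter = 30" in the trace above follows from the index
    arithmetic. A small sketch, assuming ARMV8_IDX_COUNTER0 is 1 and
    ARMV8_PMU_COUNTER_MASK is 0x1f in this kernel version; the constants and
    function names below are stand-ins, not the kernel's definitions.

```c
#include <assert.h>

#define IDX_COUNTER0  1     /* assumed value of ARMV8_IDX_COUNTER0 */
#define COUNTER_MASK  0x1f  /* assumed value of ARMV8_PMU_COUNTER_MASK */

/* Same shape as ARMV8_IDX_TO_COUNTER(): map an event index to a counter. */
static unsigned int idx_to_counter(int idx)
{
    return (unsigned int)(idx - IDX_COUNTER0) & COUNTER_MASK;
}

/* Hypothetical checked variant: refuse an index left over from sched-out. */
static unsigned int idx_to_counter_checked(int idx)
{
    assert(idx >= IDX_COUNTER0);  /* idx == -1 means the event was removed */
    return idx_to_counter(idx);
}
```

    With idx == -1 the subtraction yields -2, and masking with 0x1f gives 30,
    a counter number that need not exist on the hardware.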
    
    Signed-off-by: Yang Jihong <[email protected]>
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

phy: tegra: xusb: Add API to retrieve the port number of phy [+ + +]

Author: Wayne Chang <[email protected]>
Date:   Thu Mar 7 11:03:27 2024 +0800

    phy: tegra: xusb: Add API to retrieve the port number of phy
    
    [ Upstream commit d843f031d9e90462253015bc0bd9e3852d206bf2 ]
    
    This patch introduces a new API, tegra_xusb_padctl_get_port_number,
    to the Tegra XUSB Pad Controller driver. This API is used to identify
    the USB port that is associated with a given PHY.
    
    The function takes a PHY pointer for either a USB2 PHY or USB3 PHY as input
    and returns the corresponding port number. If the PHY pointer is invalid,
    it returns -ENODEV.
    
    Cc: [email protected]
    Signed-off-by: Wayne Chang <[email protected]>
    Reviewed-by: Jon Hunter <[email protected]>
    Tested-by: Jon Hunter <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PM: sleep: wakeirq: fix wake irq warning in system suspend [+ + +]

Author: Qingliang Li <[email protected]>
Date:   Fri Mar 1 17:26:57 2024 +0800

    PM: sleep: wakeirq: fix wake irq warning in system suspend
    
    [ Upstream commit e7a7681c859643f3f2476b2a28a494877fd89442 ]
    
    When a driver uses pm_runtime_force_suspend() as the system suspend callback
    function and registers the wake irq with reverse enable ordering, the wake
    irq will be re-enabled when entering system suspend, triggering an
    'Unbalanced enable for IRQ xxx' warning. In this scenario, the call
    sequence during system suspend is as follows:
      suspend_devices_and_enter()
        -> dpm_suspend_start()
          -> dpm_run_callback()
            -> pm_runtime_force_suspend()
              -> dev_pm_enable_wake_irq_check()
              -> dev_pm_enable_wake_irq_complete()
    
        -> suspend_enter()
          -> dpm_suspend_noirq()
            -> device_wakeup_arm_wake_irqs()
              -> dev_pm_arm_wake_irq()
    
    To fix this issue, complete the setting of WAKE_IRQ_DEDICATED_ENABLED flag
    in dev_pm_enable_wake_irq_complete() to avoid redundant irq enablement.
    
    Fixes: 8527beb12087 ("PM: sleep: wakeirq: fix wake irq arming")
    Reviewed-by: Dhruva Gole <[email protected]>
    Signed-off-by: Qingliang Li <[email protected]>
    Reviewed-by: Johan Hovold <[email protected]>
    Cc: 5.16+ <[email protected]> # 5.16+
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PM: suspend: Set mem_sleep_current during kernel command line setup [+ + +]

Author: Maulik Shah <[email protected]>
Date:   Thu Feb 29 12:14:59 2024 +0530

    PM: suspend: Set mem_sleep_current during kernel command line setup
    
    [ Upstream commit 9bc4ffd32ef8943f5c5a42c9637cfd04771d021b ]
    
    psci_init_system_suspend() invokes suspend_set_ops() very early during
    bootup, even before the kernel command line for mem_sleep_default is set up.
    This leads to kernel command line mem_sleep_default=s2idle not working
    as mem_sleep_current gets changed to deep via suspend_set_ops() and never
    changes back to s2idle.
    
    Set mem_sleep_current along with mem_sleep_default during kernel command
    line setup as default suspend mode.
    
    Fixes: faf7ec4a92c0 ("drivers: firmware: psci: add system suspend support")
    CC: [email protected] # 5.4+
    Signed-off-by: Maulik Shah <[email protected]>
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

powerpc/fsl: Fix mfpmr build errors with newer binutils [+ + +]

Author: Michael Ellerman <[email protected]>
Date:   Thu Feb 29 23:25:19 2024 +1100

    powerpc/fsl: Fix mfpmr build errors with newer binutils
    
    [ Upstream commit 5f491356b7149564ab22323ccce79c8d595bfd0c ]
    
    Binutils 2.38 complains about the use of mfpmr when building
    ppc6xx_defconfig:
    
        CC      arch/powerpc/kernel/pmc.o
      {standard input}: Assembler messages:
      {standard input}:45: Error: unrecognized opcode: `mfpmr'
      {standard input}:56: Error: unrecognized opcode: `mtpmr'
    
    This is because by default the kernel is built with -mcpu=powerpc, and
    the mt/mfpmr instructions are not defined.
    
    It can be avoided by enabling CONFIG_E300C3_CPU, but just adding that to
    the defconfig will leave open the possibility of randconfig failures.
    
    So add machine directives around the mt/mfpmr instructions to tell
    binutils how to assemble them.
    
    Cc: [email protected]
    Reported-by: Jan-Benedict Glaw <[email protected]>
    Signed-off-by: Michael Ellerman <[email protected]>
    Link: https://msgid.link/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

powerpc: xor_vmx: Add '-mhard-float' to CFLAGS [+ + +]

Author: Nathan Chancellor <[email protected]>
Date:   Sat Jan 27 11:07:43 2024 -0700

    powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
    
    commit 35f20786c481d5ced9283ff42de5c69b65e5ed13 upstream.
    
    arch/powerpc/lib/xor_vmx.o is built with '-msoft-float' (from the main
    powerpc Makefile) and '-maltivec' (from its CFLAGS), which causes an
    error when building with clang after a recent change in main:
    
      error: option '-msoft-float' cannot be specified with '-maltivec'
      make[6]: *** [scripts/Makefile.build:243: arch/powerpc/lib/xor_vmx.o] Error 1
    
    Explicitly add '-mhard-float' before '-maltivec' in xor_vmx.o's CFLAGS
    to override the previous inclusion of '-msoft-float' (as the last option
    wins), which matches how other areas of the kernel use '-maltivec', such
    as AMDGPU.
    
    Cc: [email protected]
    Closes: https://github.com/ClangBuiltLinux/linux/issues/1986
    Link: https://github.com/llvm/llvm-project/commit/4792f912b232141ecba4cbae538873be3c28556c
    Signed-off-by: Nathan Chancellor <[email protected]>
    Signed-off-by: Michael Ellerman <[email protected]>
    Link: https://msgid.link/20240127-ppc-xor_vmx-drop-msoft-float-v1-1-f24140e81376@kernel.org
    [nathan: Fixed conflicts due to lack of 04e85bbf71c9 in older trees]
    Signed-off-by: Nathan Chancellor <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

printk/console: Split out code that enables default console [+ + +]

Author: Petr Mladek <[email protected]>
Date:   Mon Nov 22 14:26:45 2021 +0100

    printk/console: Split out code that enables default console
    
    [ Upstream commit ed758b30d541e9bf713cd58612a4414e57dc6d73 ]
    
    Put the code enabling a console by default into a separate function
    called try_enable_default_console().
    
    Rename try_enable_new_console() to try_enable_preferred_console() to
    make the purpose of the different variants more clear.
    
    It is a code refactoring without any functional change.
    
    Signed-off-by: Petr Mladek <[email protected]>
    Reviewed-by: Sergey Senozhatsky <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Stable-dep-of: 801410b26a0e ("serial: Lock console when calling into driver before registration")
    Signed-off-by: Sasha Levin <[email protected]>

printk: Update @console_may_schedule in console_trylock_spinning() [+ + +]

Author: John Ogness <[email protected]>
Date:   Mon Feb 26 13:07:24 2024 +0106

    printk: Update @console_may_schedule in console_trylock_spinning()
    
    [ Upstream commit 8076972468584d4a21dab9aa50e388b3ea9ad8c7 ]
    
    console_trylock_spinning() may take over the console lock from a
    schedulable context. Update @console_may_schedule to make sure it
    reflects a trylock acquire.
    
    Reported-by: Mukesh Ojha <[email protected]>
    Closes: https://lore.kernel.org/lkml/[email protected]
    Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")
    Signed-off-by: John Ogness <[email protected]>
    Reviewed-by: Mukesh Ojha <[email protected]>
    Reviewed-by: Petr Mladek <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Petr Mladek <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d [+ + +]

Author: Heiner Kallweit <[email protected]>
Date:   Sat Mar 30 12:49:02 2024 +0100

    r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
    
    commit 5d872c9f46bd2ea3524af3c2420a364a13667135 upstream.
    
    On some boards with this chip version the BIOS is buggy and fails
    to reset the PHY page selector. This results in the PHY ID read
    accessing registers on a different page, returning a more or
    less random value. Fix this by resetting the page selector first.
    
    Fixes: f1e911d5d0df ("r8169: add basic phylib support")
    Cc: [email protected]
    Signed-off-by: Heiner Kallweit <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Revert "SUNRPC: Use RMW bitops in single-threaded hot paths" [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Jan 6 12:43:37 2023 -0500

    Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"
    
    [ Upstream commit 7827c81f0248e3c2f40d438b020f3d222f002171 ]
    
    The premise that "Once an svc thread is scheduled and executing an
    RPC, no other processes will touch svc_rqst::rq_flags" is false.
    svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
    threads when determining which thread to wake up next.
    
    Found via KCSAN.
    
    Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths")
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

Revert "usb: phy: generic: Get the vbus supply" [+ + +]

Author: Alexander Stein <[email protected]>
Date:   Thu Mar 14 10:26:27 2024 +0100

    Revert "usb: phy: generic: Get the vbus supply"
    
    [ Upstream commit fdada0db0b2ae2addef4ccafe50937874dbeeebe ]
    
    This reverts commit 75fd6485cccef269ac9eb3b71cf56753341195ef.
    This patch was applied twice by accident, causing probe failures.
    Revert the accident.
    
    Signed-off-by: Alexander Stein <[email protected]>
    Fixes: 75fd6485ccce ("usb: phy: generic: Get the vbus supply")
    Cc: stable <[email protected]>
    Reviewed-by: Sean Anderson <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped." [+ + +]

Author: Ingo Molnar <[email protected]>
Date:   Mon Mar 25 11:47:51 2024 +0100

    Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
    
    commit c567f2948f57bdc03ed03403ae0234085f376b7d upstream.
    
    This reverts commit d794734c9bbfe22f86686dc2909c25f5ffe1a572.
    
    While the original change tries to fix a bug, it also unintentionally broke
    existing systems, see the regressions reported at:
    
      https://lore.kernel.org/all/[email protected]/
    
    Since d794734c9bbf was also marked for -stable, let's back it out before
    causing more damage.
    
    Note that due to another upstream change the revert was not 100% automatic:
    
      0a845e0f6348 mm/treewide: replace pud_large() with pud_leaf()
    
    Signed-off-by: Ingo Molnar <[email protected]>
    Cc: <[email protected]>
    Cc: Russ Anderson <[email protected]>
    Cc: Steve Wahl <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]/
    Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
    Signed-off-by: Steve Wahl <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ring-buffer: Do not set shortest_full when full target is hit [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Tue Mar 12 11:56:41 2024 -0400

    ring-buffer: Do not set shortest_full when full target is hit
    
    [ Upstream commit 761d9473e27f0c8782895013a3e7b52a37c8bcfc ]
    
    The rb_watermark_hit() checks if the amount of data in the ring buffer is
    above the percentage level passed in by the "full" variable. If it is, it
    returns true.
    
    But it also sets the "shortest_full" field of the cpu_buffer that informs
    writers that it needs to call the irq_work if the amount of data on the
    ring buffer is above the requested amount.
    
    The rb_watermark_hit() always sets the shortest_full even if the amount in
    the ring buffer is what it wants. As it is not going to wait, because it
    has what it wants, there's no reason to set shortest_full.
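    
    The intended behavior can be sketched with a simplified model. The struct
    and function below are hypothetical stand-ins for the per-CPU buffer and
    rb_watermark_hit(); locking and the real fill-level computation are left
    out, and the strict comparison is an assumption taken from the wording
    "above the percentage level".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU buffer state discussed above. */
struct cpu_buffer_model {
    int percent_used;   /* how full the buffer is, 0..100 */
    int shortest_full;  /* smallest watermark a waiter asked for; 0 = unset */
};

/* Returns true if the buffer is already filled above the requested level. */
static bool watermark_hit(struct cpu_buffer_model *b, int full)
{
    assert(full >= 0 && full <= 100);

    /* The waiter has what it wants and will not sleep: leave shortest_full. */
    if (b->percent_used > full)
        return true;

    /* The waiter will sleep: record its watermark if it is the smallest. */
    if (!b->shortest_full || b->shortest_full > full)
        b->shortest_full = full;
    return false;
}
```

    In this model, shortest_full changes only on the path where the caller is
    actually going to wait.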
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Cc: Mathieu Desnoyers <[email protected]>
    Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
    Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ring-buffer: Fix full_waiters_pending in poll [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Tue Mar 12 09:19:20 2024 -0400

    ring-buffer: Fix full_waiters_pending in poll
    
    [ Upstream commit 8145f1c35fa648da662078efab299c4467b85ad5 ]
    
    If a reader of the ring buffer is doing a poll, and waiting for the ring
    buffer to hit a specific watermark, there could be a case where it gets
    into an infinite ping-pong loop.
    
    The poll code has:
    
      rbwork->full_waiters_pending = true;
      if (!cpu_buffer->shortest_full ||
          cpu_buffer->shortest_full > full)
             cpu_buffer->shortest_full = full;
    
    The writer will see full_waiters_pending and check if the ring buffer is
    filled over the percentage of the shortest_full value. If it is, it calls
    an irq_work to wake up all the waiters.
    
    But the code could get into a circular loop:
    
            CPU 0                                   CPU 1
            -----                                   -----
     [ Poll ]
       [ shortest_full = 0 ]
       rbwork->full_waiters_pending = true;
                                              if (rbwork->full_waiters_pending &&
                                                  [ buffer percent ] > shortest_full) {
                                                     rbwork->wakeup_full = true;
                                                     [ queue_irqwork ]
    
       cpu_buffer->shortest_full = full;
    
                                              [ IRQ work ]
                                              if (rbwork->wakeup_full) {
                                                    cpu_buffer->shortest_full = 0;
                                                    wakeup poll waiters;
      [woken]
       if ([ buffer percent ] > full)
          break;
       rbwork->full_waiters_pending = true;
                                              if (rbwork->full_waiters_pending &&
                                                  [ buffer percent ] > shortest_full) {
                                                     rbwork->wakeup_full = true;
                                                     [ queue_irqwork ]
    
       cpu_buffer->shortest_full = full;
    
                                              [ IRQ work ]
                                              if (rbwork->wakeup_full) {
                                                    cpu_buffer->shortest_full = 0;
                                                    wakeup poll waiters;
      [woken]
    
     [ Wash, rinse, repeat! ]
    
    In the poll, shortest_full needs to be set before full_waiters_pending,
    because once that flag is set, the writer compares against the current
    shortest_full (which is still stale) to decide whether to call the
    irq_work, which in turn resets shortest_full (expecting the readers to
    update it).
    
    Also move the setting of full_waiters_pending after the check if the ring
    buffer has the required percentage filled. There's no reason to tell the
    writer to wake up waiters if there are no waiters.
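
    The corrected ordering can be sketched as a single-threaded model. This
    is not the kernel code: struct rb_model, poll_model() and writer_model()
    are hypothetical names, and folding the irq_work into the writer is a
    simplification made only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-CPU ring buffer state. */
struct rb_model {
    int  percent_full;          /* current fill level, 0..100 */
    int  shortest_full;         /* smallest watermark any waiter asked for */
    bool full_waiters_pending;  /* tells the writer that someone is waiting */
};

/* Reader side. Returns true when the caller does not need to sleep. */
static bool poll_model(struct rb_model *rb, int full)
{
    /* 1. Watermark already met: do not involve the writer at all. */
    if (rb->percent_full > full)
        return true;

    /* 2. Publish the watermark first ... */
    if (!rb->shortest_full || rb->shortest_full > full)
        rb->shortest_full = full;

    /* 3. ... and only then advertise that a waiter exists. */
    rb->full_waiters_pending = true;
    return false;
}

/* Writer side. Returns true when it would queue the wakeup work. */
static bool writer_model(struct rb_model *rb)
{
    if (!rb->full_waiters_pending ||
        rb->percent_full <= rb->shortest_full)
        return false;

    rb->full_waiters_pending = false;
    rb->shortest_full = 0;      /* done by the irq_work in the kernel */
    return true;
}
```

    With this order the writer never sees full_waiters_pending set while
    shortest_full still holds the value left behind by the previous wakeup.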
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Cc: Mark Rutland <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
    Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ring-buffer: Fix resetting of shortest_full [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Fri Mar 8 15:24:04 2024 -0500

    ring-buffer: Fix resetting of shortest_full
    
    [ Upstream commit 68282dd930ea38b068ce2c109d12405f40df3f93 ]
    
    The "shortest_full" variable is used to keep track of the waiter that is
    waiting for the smallest amount on the ring buffer before being woken up.
    When a task waits on the ring buffer, it passes in a "full" value that is
    a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to
    100% full buffer.
    
    As all waiters are on the same wait queue, the wake up happens for the
    waiter with the smallest percentage.
    
    The problem is that the shortest_full on the cpu_buffer that stores the
    smallest amount doesn't get reset when all the waiters are woken up. It
    does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace).
    
    This means that tasks may be woken up more often than they want to
    be. Instead, have the shortest_full field get reset just before waking up
    all the tasks. If the tasks wait again, they will update the shortest_full
    before sleeping.
    
    Also add locking around setting of shortest_full in the poll logic, and
    change "work" to "rbwork" to match the variable name for rb_irq_work
    structures that are used in other places.
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Mark Rutland <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: linke li <[email protected]>
    Cc: Rabin Vincent <[email protected]>
    Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Stable-dep-of: 8145f1c35fa6 ("ring-buffer: Fix full_waiters_pending in poll")
    Signed-off-by: Sasha Levin <[email protected]>

ring-buffer: Fix waking up ring buffer readers [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Fri Mar 8 15:24:03 2024 -0500

    ring-buffer: Fix waking up ring buffer readers
    
    [ Upstream commit b3594573681b53316ec0365332681a30463edfd6 ]
    
    A task can wait on a ring buffer for when it fills up to a specific
    watermark. The writer will check the minimum watermark that waiters are
    waiting for and if the ring buffer is past that, it will wake up all the
    waiters.
    
    The waiters are in a wait loop, and will first check if a signal is
    pending and then check if the ring buffer is at the desired level where it
    should break out of the loop.
    
    If a file that uses a ring buffer closes, and there are threads waiting on
    the ring buffer, it needs to wake up those threads. To do this, a
    "wait_index" was used.
    
    Before entering the wait loop, the waiter will read the wait_index. On
    wakeup, it will check if the wait_index is different than when it entered
    the loop, and will exit the loop if it is. The waker will only need to
    update the wait_index before waking up the waiters.
    
    This had a couple of bugs. One trivial one and one broken by design.
    
    The trivial bug was that the waiter checked the wait_index after the
    schedule() call. It had to be checked between the prepare_to_wait() and
    the schedule() which it was not.
    
    The main bug is that the first check to set the default wait_index will
    always be outside the prepare_to_wait() and the schedule(). That's because
    the ring_buffer_wait() doesn't have enough context to know if it should
    break out of the loop.
    
    The loop itself is not needed, because all the callers of
    ring_buffer_wait() also have their own loop, as the callers have a better
    sense of what the context is to decide whether to break out of the loop
    or not.
    
    Just have the ring_buffer_wait() block once, and if it gets woken up, exit
    the function and let the callers decide what to do next.
    
    Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNSRZfg@mail.gmail.com/
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Mark Rutland <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: linke li <[email protected]>
    Cc: Rabin Vincent <[email protected]>
    Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Stable-dep-of: 761d9473e27f ("ring-buffer: Do not set shortest_full when full target is hit")
    Signed-off-by: Sasha Levin <[email protected]>

ring-buffer: Use wait_event_interruptible() in ring_buffer_wait() [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Tue Mar 12 08:15:07 2024 -0400

    ring-buffer: Use wait_event_interruptible() in ring_buffer_wait()
    
    [ Upstream commit 7af9ded0c2caac0a95f33df5cb04706b0f502588 ]
    
    Convert ring_buffer_wait() over to wait_event_interruptible(). The default
    condition is to execute the wait loop inside __wait_event() just once.
    
    This does not change the ring_buffer_wait() prototype yet, but
    restructures the code so that it can take a "cond" and "data" parameter
    and will call wait_event_interruptible() with a helper function as the
    condition.
    
    The helper function (rb_wait_cond) takes the cond function and data
    parameters. It will first check if the buffer hit the watermark defined by
    the "full" parameter and then call the passed in condition parameter. If
    either is true, it returns true.
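
    A rough sketch of such a helper, including the fall-through case where
    neither check passes. The names wait_model, wait_cond_fn,
    wait_cond_model() and flag_is_set() are hypothetical, and the watermark
    comparison is a placeholder rather than the kernel's actual test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: returns true when the waiter should stop. */
typedef bool (*wait_cond_fn)(void *data);

/* Hypothetical stand-in for the state the real helper inspects. */
struct wait_model {
    int  percent_full;     /* current fill level, 0..100 */
    bool waiters_pending;  /* set when a waiter has to keep waiting */
};

static bool wait_cond_model(struct wait_model *w, int full,
                            wait_cond_fn cond, void *data)
{
    /* First check: has the buffer reached the requested watermark? */
    if (w->percent_full >= full)
        return true;

    /* Second check: the condition passed in by the caller. */
    if (cond && cond(data))
        return true;

    /* Neither holds: note the pending waiter and keep waiting. */
    w->waiters_pending = true;
    return false;
}

/* Example condition: stop waiting once a flag has been raised. */
static bool flag_is_set(void *data)
{
    return *(bool *)data;
}
```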
    
    If rb_wait_cond() does not return true, it will set the appropriate
    "waiters_pending" flag and return false.
    
    Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQmi1waeS2O1v6L4c_Um5A@mail.gmail.com/
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Mark Rutland <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: linke li <[email protected]>
    Cc: Rabin Vincent <[email protected]>
    Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

riscv: Fix spurious errors from __get/put_kernel_nofault [+ + +]

Author: Samuel Holland <[email protected]>
Date:   Mon Mar 11 19:19:13 2024 -0700

    riscv: Fix spurious errors from __get/put_kernel_nofault
    
    commit d080a08b06b6266cc3e0e86c5acfd80db937cb6b upstream.
    
    These macros did not initialize __kr_err, so they could fail even if
    the access did not fault.
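
    The failure mode can be illustrated with a small model in which the
    error variable is only written on the fault path, so it has to start
    out as zero. access_model() and get_nofault_model() are hypothetical
    names; the real macros use inline assembly and exception fixups.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the fixup path: it writes *err only when a fault occurs. */
static void access_model(bool faults, long *err)
{
    if (faults)
        *err = -EFAULT;
}

static long get_nofault_model(bool faults)
{
    long err = 0;   /* without this, a successful access returns garbage */

    access_model(faults, &err);
    return err;
}
```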
    
    Cc: [email protected]
    Fixes: d464118cdc41 ("riscv: implement __get_kernel_nofault and __put_user_nofault")
    Signed-off-by: Samuel Holland <[email protected]>
    Reviewed-by: Alexandre Ghiti <[email protected]>
    Reviewed-by: Charlie Jenkins <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Palmer Dabbelt <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

riscv: process: Fix kernel gp leakage [+ + +]

Author: Stefan O'Rear <[email protected]>
Date:   Wed Mar 27 02:12:58 2024 -0400

    riscv: process: Fix kernel gp leakage
    
    commit d14fa1fcf69db9d070e75f1c4425211fa619dfc8 upstream.
    
    childregs represents the registers which are active for the new thread
    in user context. For a kernel thread, childregs->gp is never used since
    the kernel gp is not touched by switch_to. For a user mode helper, the
    gp value can be observed in user space after execve or possibly by other
    means.
    
    [From the email thread]
    
    The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
    for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
    when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
    PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
    
    childregs is the *user* context during syscall execution and it is observable
    from userspace in at least five ways:
    
    1. kernel_execve does not currently clear integer registers, so the starting
       register state for PID 1 and other user processes started by the kernel has
       sp = user stack, gp = kernel __global_pointer$, all other integer registers
       zeroed by the memset in the patch comment.
    
       This is a bug in its own right, but I'm unwilling to bet that it is the only
       way to exploit the issue addressed by this patch.
    
    2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
       before it execs, but ptrace requires SIGSTOP to be delivered which can only
       happen at user/kernel boundaries.
    
    3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
       user_mode_helpers before the exec completes, but gp is not one of the
       registers it returns.
    
    4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
       addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
       are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
       LOCKDOWN_PERF. I have not attempted to write exploit code.
    
    5. Much of the tracing infrastructure allows access to user registers. I have
       not attempted to determine which forms of tracing allow access to user
       registers without already allowing access to kernel registers.
    
    Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
    Cc: [email protected]
    Signed-off-by: Stefan O'Rear <[email protected]>
    Reviewed-by: Alexandre Ghiti <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Palmer Dabbelt <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

s390/entry: align system call table on 8 bytes [+ + +]

Author: Sumanth Korikkar <[email protected]>
Date:   Tue Mar 26 18:12:13 2024 +0100

    s390/entry: align system call table on 8 bytes
    
    commit 378ca2d2ad410a1cd5690d06b46c5e2297f4c8c0 upstream.
    
    Align the system call table on 8 bytes. With a sys_call_table entry size
    of 8 bytes, that eliminates the possibility of a system call pointer
    crossing a cache line boundary.
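
    The arithmetic behind this can be checked with a short sketch. The
    256-byte line size used in the usage note is an assumption of this
    example, not something stated by the commit, and crosses_line() and
    any_entry_crosses() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the object [addr, addr + size) spans two cache lines. */
static bool crosses_line(uintptr_t addr, uintptr_t size, uintptr_t line)
{
    return (addr / line) != ((addr + size - 1) / line);
}

/* Check every 8-byte entry of a table that starts at 'base'. */
static bool any_entry_crosses(uintptr_t base, unsigned int entries,
                              uintptr_t line)
{
    for (unsigned int i = 0; i < entries; i++) {
        if (crosses_line(base + 8u * i, 8, line))
            return true;
    }
    return false;
}
```

    For a line size that is a multiple of 8, any_entry_crosses(0x1000, 512,
    256) is false, while a table that is only 4-byte aligned, such as
    any_entry_crosses(0x1004, 512, 256), has entries that straddle a line.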
    
    Cc: [email protected]
    Suggested-by: Ulrich Weigand <[email protected]>
    Reviewed-by: Alexander Gordeev <[email protected]>
    Signed-off-by: Sumanth Korikkar <[email protected]>
    Signed-off-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

s390/qeth: handle deferred cc1 [+ + +]

Author: Alexandra Winter <[email protected]>
Date:   Thu Mar 21 12:53:37 2024 +0100

    s390/qeth: handle deferred cc1
    
    [ Upstream commit afb373ff3f54c9d909efc7f810dc80a9742807b2 ]
    
    The IO subsystem expects a driver to retry a ccw_device_start, when the
    subsequent interrupt response block (irb) contains a deferred
    condition code 1.
    
    Symptoms before this commit:
    On the read channel we always trigger the next read anyhow, so no
    different behaviour here.
    On the write channel we may experience timeout errors, because the
    expected reply will never be received without the retry.
    Other callers of qeth_send_control_data() may wrongly assume that the ccw
    was successful, which may cause problems later.
    
    Note that since
    commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
    and
    commit 5ef1dc40ffa6 ("s390/cio: fix invalid -EBUSY on ccw_device_start")
    deferred CC1s are much more likely to occur. See the commit message of the
    latter for more background information.
    
    Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
    Signed-off-by: Alexandra Winter <[email protected]>
    Co-developed-by: Thorsten Winkler <[email protected]>
    Signed-off-by: Thorsten Winkler <[email protected]>
    Reviewed-by: Peter Oberparleiter <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

s390/zcrypt: fix reference counting on zcrypt card objects [+ + +]

Author: Harald Freudenberger <[email protected]>
Date:   Thu Feb 29 15:20:09 2024 +0100

    s390/zcrypt: fix reference counting on zcrypt card objects
    
    [ Upstream commit 50ed48c80fecbe17218afed4f8bed005c802976c ]
    
    Tests with hot-plugging crypto cards on KVM guests with debug
    kernel build revealed a use-after-free for the load field of
    the struct zcrypt_card. The reason was an incorrect reference
    handling of the zcrypt card object which could lead to a free
    of the zcrypt card object while it was still in use.
    
    This is an example of the slab message:
    
        kernel: 0x00000000885a7512-0x00000000885a7513 @offset=1298. First byte 0x68 instead of 0x6b
        kernel: Allocated in zcrypt_card_alloc+0x36/0x70 [zcrypt] age=18046 cpu=3 pid=43
        kernel:  kmalloc_trace+0x3f2/0x470
        kernel:  zcrypt_card_alloc+0x36/0x70 [zcrypt]
        kernel:  zcrypt_cex4_card_probe+0x26/0x380 [zcrypt_cex4]
        kernel:  ap_device_probe+0x15c/0x290
        kernel:  really_probe+0xd2/0x468
        kernel:  driver_probe_device+0x40/0xf0
        kernel:  __device_attach_driver+0xc0/0x140
        kernel:  bus_for_each_drv+0x8c/0xd0
        kernel:  __device_attach+0x114/0x198
        kernel:  bus_probe_device+0xb4/0xc8
        kernel:  device_add+0x4d2/0x6e0
        kernel:  ap_scan_adapter+0x3d0/0x7c0
        kernel:  ap_scan_bus+0x5a/0x3b0
        kernel:  ap_scan_bus_wq_callback+0x40/0x60
        kernel:  process_one_work+0x26e/0x620
        kernel:  worker_thread+0x21c/0x440
        kernel: Freed in zcrypt_card_put+0x54/0x80 [zcrypt] age=9024 cpu=3 pid=43
        kernel:  kfree+0x37e/0x418
        kernel:  zcrypt_card_put+0x54/0x80 [zcrypt]
        kernel:  ap_device_remove+0x4c/0xe0
        kernel:  device_release_driver_internal+0x1c4/0x270
        kernel:  bus_remove_device+0x100/0x188
        kernel:  device_del+0x164/0x3c0
        kernel:  device_unregister+0x30/0x90
        kernel:  ap_scan_adapter+0xc8/0x7c0
        kernel:  ap_scan_bus+0x5a/0x3b0
        kernel:  ap_scan_bus_wq_callback+0x40/0x60
        kernel:  process_one_work+0x26e/0x620
        kernel:  worker_thread+0x21c/0x440
        kernel:  kthread+0x150/0x168
        kernel:  __ret_from_fork+0x3c/0x58
        kernel:  ret_from_fork+0xa/0x30
        kernel: Slab 0x00000372022169c0 objects=20 used=18 fp=0x00000000885a7c88 flags=0x3ffff00000000a00(workingset|slab|node=0|zone=1|lastcpupid=0x1ffff)
        kernel: Object 0x00000000885a74b8 @offset=1208 fp=0x00000000885a7c88
        kernel: Redzone  00000000885a74b0: bb bb bb bb bb bb bb bb                          ........
        kernel: Object   00000000885a74b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        kernel: Object   00000000885a74c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        kernel: Object   00000000885a74d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        kernel: Object   00000000885a74e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        kernel: Object   00000000885a74f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        kernel: Object   00000000885a7508: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 68 4b 6b 6b 6b a5  kkkkkkkkkkhKkkk.
        kernel: Redzone  00000000885a7518: bb bb bb bb bb bb bb bb                          ........
        kernel: Padding  00000000885a756c: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a              ZZZZZZZZZZZZ
        kernel: CPU: 0 PID: 387 Comm: systemd-udevd Not tainted 6.8.0-HF #2
        kernel: Hardware name: IBM 3931 A01 704 (KVM/Linux)
        kernel: Call Trace:
        kernel:  [<00000000ca5ab5b8>] dump_stack_lvl+0x90/0x120
        kernel:  [<00000000c99d78bc>] check_bytes_and_report+0x114/0x140
        kernel:  [<00000000c99d53cc>] check_object+0x334/0x3f8
        kernel:  [<00000000c99d820c>] alloc_debug_processing+0xc4/0x1f8
        kernel:  [<00000000c99d852e>] get_partial_node.part.0+0x1ee/0x3e0
        kernel:  [<00000000c99d94ec>] ___slab_alloc+0xaf4/0x13c8
        kernel:  [<00000000c99d9e38>] __slab_alloc.constprop.0+0x78/0xb8
        kernel:  [<00000000c99dc8dc>] __kmalloc+0x434/0x590
        kernel:  [<00000000c9b4c0ce>] ext4_htree_store_dirent+0x4e/0x1c0
        kernel:  [<00000000c9b908a2>] htree_dirblock_to_tree+0x17a/0x3f0
        kernel:  [<00000000c9b919dc>] ext4_htree_fill_tree+0x134/0x400
        kernel:  [<00000000c9b4b3d0>] ext4_dx_readdir+0x160/0x2f0
        kernel:  [<00000000c9b4bedc>] ext4_readdir+0x5f4/0x760
        kernel:  [<00000000c9a7efc4>] iterate_dir+0xb4/0x280
        kernel:  [<00000000c9a7f1ea>] __do_sys_getdents64+0x5a/0x120
        kernel:  [<00000000ca5d6946>] __do_syscall+0x256/0x310
        kernel:  [<00000000ca5eea10>] system_call+0x70/0x98
        kernel: INFO: lockdep is turned off.
        kernel: FIX kmalloc-96: Restoring Poison 0x00000000885a7512-0x00000000885a7513=0x6b
        kernel: FIX kmalloc-96: Marking all objects used
    
    The fix is simple: Before use of the queue, not only the queue object
    but also the card object needs to increase its reference count
    with a call to zcrypt_card_get(). Similarly, after use of the queue,
    not only the queue's but also the card object's reference count is
    decreased with zcrypt_card_put().
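
    As a minimal sketch of that pairing, using plain integer counters
    instead of the kernel's reference counting primitives: card_model,
    queue_model and the *_model() helpers are hypothetical and only mirror
    the get/put symmetry described above.

```c
#include <stdbool.h>

struct card_model  { int refcount; };
struct queue_model { int refcount; struct card_model *card; };

/* Each put returns true when the last reference is gone (object freed). */
static bool card_put_model(struct card_model *c)
{
    return --c->refcount == 0;
}

static bool queue_put_model(struct queue_model *q)
{
    return --q->refcount == 0;
}

/* Before using a queue: pin both the card and the queue. */
static void pick_queue_model(struct queue_model *q)
{
    q->card->refcount++;
    q->refcount++;
}

/* After using a queue: unpin both. Returns true if the card would be freed. */
static bool drop_queue_model(struct queue_model *q)
{
    struct card_model *c = q->card;

    queue_put_model(q);
    return card_put_model(c);
}
```

    In this model a hot-unplug that drops the card's initial reference
    while a request is in flight no longer frees the card; the free only
    happens when the request calls drop_queue_model().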
    
    Signed-off-by: Harald Freudenberger <[email protected]>
    Reviewed-by: Holger Dengler <[email protected]>
    Cc: [email protected]
    Signed-off-by: Heiko Carstens <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

scsi: core: Fix unremoved procfs host directory regression [+ + +]

Author: Guilherme G. Piccoli <[email protected]>
Date:   Wed Mar 13 08:21:20 2024 -0300

    scsi: core: Fix unremoved procfs host directory regression
    
    commit f23a4d6e07570826fe95023ca1aa96a011fa9f84 upstream.
    
    Commit fc663711b944 ("scsi: core: Remove the /proc/scsi/${proc_name}
    directory earlier") fixed a bug related to modules loading/unloading, by
    adding a call to scsi_proc_hostdir_rm() on scsi_remove_host(). But that led
    to a potential duplicate call to the hostdir_rm() routine, since it's also
    called from scsi_host_dev_release(). That triggered a regression report,
    which was then fixed by commit be03df3d4bfe ("scsi: core: Fix a procfs host
    directory removal regression"). The fix just dropped the hostdir_rm() call
    from dev_release().
    
    But it happens that this proc directory is created on scsi_host_alloc(),
    and that function "pairs" with scsi_host_dev_release(), while
    scsi_remove_host() pairs with scsi_add_host(). In other words, it seems the
    reason for removing the proc directory on dev_release() was meant to cover
    cases in which a SCSI host structure was allocated, but the call to
    scsi_add_host() didn't happen. And that pattern happens to exist in some
    error paths, for example.
    
    Syzkaller causes that by using USB raw gadget device, error'ing on
    usb-storage driver, at usb_stor_probe2(). By checking that path, we can see
    that the BadDevice label leads to a scsi_host_put() after a SCSI host
    allocation, but there's no call to scsi_add_host() in such path. That leads
    to messages like this in dmesg (and a leak of the SCSI host proc
    structure):
    
    usb-storage 4-1:87.51: USB Mass Storage device detected
    proc_dir_entry 'scsi/usb-storage' already registered
    WARNING: CPU: 1 PID: 3519 at fs/proc/generic.c:377 proc_register+0x347/0x4e0 fs/proc/generic.c:376
    
    The proper fix seems to still call scsi_proc_hostdir_rm() on dev_release(),
    but guard that with the state check for SHOST_CREATED; there is even a
    comment in scsi_host_dev_release() detailing that: such a conditional is
    meant for cases where the SCSI host was allocated but there were no calls to
    {add,remove}_host(), like the usb-storage case.
    
    This is what we propose here and with that, the error path of usb-storage
    does not trigger the warning anymore.
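
    The pairing logic can be modelled in a few lines. This is only a sketch
    under the assumptions spelled out in the text above; host_model, the
    MODEL_* states and the host_*_model() helpers are hypothetical and a
    plain counter stands in for the procfs directory.

```c
enum host_state_model { MODEL_CREATED, MODEL_RUNNING, MODEL_DEL };

struct host_model {
    enum host_state_model state;
    int *proc_dir_users;    /* stands in for /proc/scsi/<proc_name> */
};

/* Allocation creates the proc directory (pairs with release). */
static void host_alloc_model(struct host_model *h, int *proc_dir_users)
{
    h->state = MODEL_CREATED;
    h->proc_dir_users = proc_dir_users;
    (*proc_dir_users)++;
}

/* Add/remove pair: remove takes the proc directory down early. */
static void host_add_model(struct host_model *h)
{
    h->state = MODEL_RUNNING;
}

static void host_remove_model(struct host_model *h)
{
    (*h->proc_dir_users)--;
    h->state = MODEL_DEL;
}

/* Release only cleans up when add/remove never ran. */
static void host_release_model(struct host_model *h)
{
    if (h->state == MODEL_CREATED)
        (*h->proc_dir_users)--;
}
```

    Both the error path (alloc, then release) and the normal path (alloc,
    add, remove, release) end with the counter back at zero, with no
    double removal.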
    
    Reported-by: [email protected]
    Fixes: be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression")
    Cc: [email protected]
    Cc: Bart Van Assche <[email protected]>
    Cc: John Garry <[email protected]>
    Cc: Shin'ichiro Kawasaki <[email protected]>
    Signed-off-by: Guilherme G. Piccoli <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Bart Van Assche <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: lpfc: Correct size for wqe for memset() [+ + +]

Author: Muhammad Usama Anjum <[email protected]>
Date:   Mon Mar 4 14:06:48 2024 +0500

    scsi: lpfc: Correct size for wqe for memset()
    
    commit 28d41991182c210ec1654f8af2e140ef4cc73f20 upstream.
    
    The wqe is of type lpfc_wqe128. It should be memset with the same type.
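
    A common way to keep the size tied to the object is to take sizeof from
    the pointer itself. The sketch below uses a hypothetical wqe128_model
    type with an assumed 128-byte layout; it is not the driver's structure.

```c
#include <string.h>

/* Hypothetical stand-in; the 128-byte size is an assumption. */
struct wqe128_model { unsigned char bytes[128]; };

static void clear_wqe_model(struct wqe128_model *wqe)
{
    /* sizeof(*wqe) follows the pointer's type, so it cannot drift. */
    memset(wqe, 0, sizeof(*wqe));
}
```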
    
    Fixes: 6c621a2229b0 ("scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context")
    Signed-off-by: Muhammad Usama Anjum <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
    Reviewed-by: Justin Tee <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: mylex: Fix sysfs buffer lengths [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Tue Mar 26 23:38:06 2024 +0100

    scsi: mylex: Fix sysfs buffer lengths
    
    [ Upstream commit 1197c5b2099f716b3de327437fb50900a0b936c9 ]
    
    The myrb and myrs drivers use an odd way of implementing their sysfs files,
    calling snprintf() with a fixed length of 32 bytes to print into a page
    sized buffer. One of the strings is actually longer than 32 bytes, which
    clang can warn about:
    
    drivers/scsi/myrb.c:1906:10: error: 'snprintf' will always be truncated; specified size is 32, but format string expands to at least 34 [-Werror,-Wformat-truncation]
    drivers/scsi/myrs.c:1089:10: error: 'snprintf' will always be truncated; specified size is 32, but format string expands to at least 34 [-Werror,-Wformat-truncation]
    
    These could all be plain sprintf() without a length as the buffer is always
    long enough. On the other hand, sysfs files should not be overly long
    either, so just double the length to make sure the longest strings don't
    get truncated here.
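
    The truncation itself is easy to detect at run time, because snprintf()
    returns the length the full output would have had. would_truncate() is
    a hypothetical helper written for this illustration, and the "%s\n"
    format is an assumption, not the drivers' exact format string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True if printing "text\n" with the given size limit cuts the output. */
static bool would_truncate(size_t limit, const char *text)
{
    char buf[128];
    size_t size = limit < sizeof(buf) ? limit : sizeof(buf);
    int n = snprintf(buf, size, "%s\n", text);

    /* n excludes the terminating NUL, so n >= limit means truncation. */
    return n < 0 || (size_t)n >= limit;
}
```

    A 33-character string plus the newline needs 34 bytes of output, so it
    is truncated with a limit of 32 and fits once the limit is doubled to
    64.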
    
    Fixes: 77266186397c ("scsi: myrs: Add Mylex RAID controller (SCSI interface)")
    Fixes: 081ff398c56c ("scsi: myrb: Add Mylex RAID controller (block interface)")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Hannes Reinecke <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

scsi: qla2xxx: Change debug message during driver unload [+ + +]

Author: Saurav Kashyap <[email protected]>
Date:   Tue Feb 27 22:11:25 2024 +0530

    scsi: qla2xxx: Change debug message during driver unload
    
    commit b5a30840727a3e41d12a336d19f6c0716b299161 upstream.
    
    Upon driver unload, purge_mbox flag is set and the heartbeat monitor thread
    detects this flag and does not send the mailbox command down to FW with a
    debug message "Error detected: purge[1] eeh[0] cmd=0x0, Exiting".  As this
    is not a real error, change the debug message.
    
    Cc: [email protected]
    Signed-off-by: Saurav Kashyap <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: qla2xxx: Delay I/O Abort on PCI error [+ + +]

Author: Quinn Tran <[email protected]>
Date:   Tue Feb 27 22:11:26 2024 +0530

    scsi: qla2xxx: Delay I/O Abort on PCI error
    
    commit 591c1fdf2016d118b8fbde427b796fac13f3f070 upstream.
    
    Currently when a PCI error is detected, I/O is aborted manually through the
    ABORT IOCB mechanism which is not guaranteed to succeed.
    
    Instead, wait for the OS or system to notify driver to wind down I/O
    through the pci_error_handlers api.  Set eeh_busy flag to pause all traffic
    and wait for I/O to drain.
    
    Cc: [email protected]
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: qla2xxx: Fix command flush on cable pull [+ + +]

Author: Quinn Tran <[email protected]>
Date:   Tue Feb 27 22:11:22 2024 +0530

    scsi: qla2xxx: Fix command flush on cable pull
    
    commit a27d4d0e7de305def8a5098a614053be208d1aa1 upstream.
    
    The system crashes because a command failed to flush back to the SCSI layer.
    
     BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
     PGD 0 P4D 0
     Oops: 0000 [#1] SMP NOPTI
     CPU: 27 PID: 793455 Comm: kworker/u130:6 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-372.9.1.el8.x86_64 #1
     Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021
     Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
     RIP: 0010:__wake_up_common+0x4c/0x190
     Code: 24 10 4d 85 c9 74 0a 41 f6 01 04 0f 85 9d 00 00 00 48 8b 43 08 48 83 c3 08 4c 8d 48 e8 49 8d 41 18 48 39 c3 0f 84 f0 00 00 00 <49> 8b 41 18 89 54 24 08 31 ed 4c 8d 70 e8 45 8b 29 41 f6 c5 04 75
     RSP: 0018:ffff95f3e0cb7cd0 EFLAGS: 00010086
     RAX: 0000000000000000 RBX: ffff8b08d3b26328 RCX: 0000000000000000
     RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8b08d3b26320
     RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffe8
     R10: 0000000000000000 R11: ffff95f3e0cb7a60 R12: ffff95f3e0cb7d20
     R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
     FS:  0000000000000000(0000) GS:ffff8b2fdf6c0000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 0000000000000000 CR3: 0000002f1e410002 CR4: 00000000007706e0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     PKRU: 55555554
     Call Trace:
      __wake_up_common_lock+0x7c/0xc0
      qla_nvme_ls_req+0x355/0x4c0 [qla2xxx]
     qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae1407ca000 from port 21:32:00:02:ac:07:ee:b8 loop_id 0x02 s_id 01:02:00 logout 1 keep 0 els_logo 0
     ? __nvme_fc_send_ls_req+0x260/0x380 [nvme_fc]
     qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:00:02:ac:07:ee:b8 state transitioned from ONLINE to LOST - portid=010200.
      ? nvme_fc_send_ls_req.constprop.42+0x1a/0x45 [nvme_fc]
     qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320002ac07eeb8. rport ffff8ae598122000 roles 1
     ? nvme_fc_connect_ctrl_work.cold.63+0x1e3/0xa7d [nvme_fc]
     qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae14801e000 from port 21:32:01:02:ad:f7:ee:b8 loop_id 0x04 s_id 01:02:01 logout 1 keep 0 els_logo 0
      ? __switch_to+0x10c/0x450
     ? process_one_work+0x1a7/0x360
     qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:01:02:ad:f7:ee:b8 state transitioned from ONLINE to LOST - portid=010201.
      ? worker_thread+0x1ce/0x390
      ? create_worker+0x1a0/0x1a0
     qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320102adf7eeb8. rport ffff8ae3b2312800 roles 70
      ? kthread+0x10a/0x120
     qla2xxx [0000:12:00.1]-2112:3: qla_nvme_unregister_remote_port: unregister remoteport on ffff8ae14801e000 21320102adf7eeb8
      ? set_kthread_struct+0x40/0x40
     qla2xxx [0000:12:00.1]-2110:3: remoteport_delete of ffff8ae14801e000 21320102adf7eeb8 completed.
      ? ret_from_fork+0x1f/0x40
     qla2xxx [0000:12:00.1]-f086:3: qlt_free_session_done: waiting for sess ffff8ae14801e000 logout
    
    The system was under memory stress, where the driver was not able to
    allocate an SRB to carry out error recovery of the cable pull. The failure
    to flush causes the upper layer to start modifying scsi_cmnd. When the
    system frees up some memory, the subsequent cable pull triggers another
    command flush. At this point the driver accesses a null pointer when
    attempting to DMA unmap the SGL.
    
    Add a check to make sure commands are flushed back on session tear down to
    prevent the null pointer access.
    
    Cc: [email protected]
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: qla2xxx: Fix double free of fcport [+ + +]

Author: Saurav Kashyap <[email protected]>
Date:   Tue Feb 27 22:11:24 2024 +0530

    scsi: qla2xxx: Fix double free of fcport
    
    commit 82f522ae0d97119a43da53e0f729275691b9c525 upstream.
    
    The server was crashing after LOGO because fcport was getting freed twice.
    
     -----------[ cut here ]-----------
     kernel BUG at mm/slub.c:371!
     invalid opcode: 0000 1 SMP PTI
     CPU: 35 PID: 4610 Comm: bash Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.3.1.el8.x86_64 #1
     Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021
     RIP: 0010:set_freepointer.part.57+0x0/0x10
     RSP: 0018:ffffb07107027d90 EFLAGS: 00010246
     RAX: ffff9cb7e3150000 RBX: ffff9cb7e332b9c0 RCX: ffff9cb7e3150400
     RDX: 0000000000001f37 RSI: 0000000000000000 RDI: ffff9cb7c0005500
     RBP: fffff693448c5400 R08: 0000000080000000 R09: 0000000000000009
     R10: 0000000000000000 R11: 0000000000132af0 R12: ffff9cb7c0005500
     R13: ffff9cb7e3150000 R14: ffffffffc06990e0 R15: ffff9cb7ea85ea58
     FS: 00007ff6b79c2740(0000) GS:ffff9cb8f7ec0000(0000) knlGS:0000000000000000
     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 000055b426b7d700 CR3: 0000000169c18002 CR4: 00000000007706e0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     PKRU: 55555554
     Call Trace:
     kfree+0x238/0x250
     qla2x00_els_dcmd_sp_free+0x20/0x230 [qla2xxx]
     ? qla24xx_els_dcmd_iocb+0x607/0x690 [qla2xxx]
     qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx]
     ? qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx]
     ? kernfs_fop_write+0x11e/0x1a0
    
    Remove one of the free calls and add a check for a valid fcport. Also use
    qla2x00_free_fcport() instead of kfree().
    
    Cc: [email protected]
    Signed-off-by: Saurav Kashyap <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: qla2xxx: Fix N2N stuck connection [+ + +]

Author: Quinn Tran <[email protected]>
Date:   Tue Feb 27 22:11:18 2024 +0530

    scsi: qla2xxx: Fix N2N stuck connection
    
    commit 881eb861ca3877300570db10abbf11494e48548d upstream.
    
    Disk failed to rediscover after chip reset error injection. The chip reset
    happens at the time when a PLOGI is being sent. This causes a flag to be
    left on which blocks the retry. Clear the blocking flag.
    
    Cc: [email protected]
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: qla2xxx: NVME|FCP prefer flag not being honored [+ + +]

Author: Quinn Tran <[email protected]>
Date:   Tue Feb 27 22:11:21 2024 +0530

    scsi: qla2xxx: NVME|FCP prefer flag not being honored
    
    commit 69aecdd410106dc3a8f543a4f7ec6379b995b8d0 upstream.
    
    Changing the [FCP|NVME] prefer flag in flash has no effect on the driver.
    For a device that supports both FCP and NVMe over the same connection, the
    driver continues to connect to the device using the previous successful
    login mode.
    
    On completion of a flash update, the adapter will be reset. The driver will
    then reset the prefer flag based on the setting from flash.
    
    Cc: [email protected]
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: qla2xxx: Prevent command send on chip reset [+ + +]

Author: Quinn Tran <[email protected]>
Date:   Tue Feb 27 22:11:17 2024 +0530

    scsi: qla2xxx: Prevent command send on chip reset
    
    commit 4895009c4bb72f71f2e682f1e7d2c2d96e482087 upstream.
    
    Currently IOCBs are allowed to push through while a chip reset could be in
    progress. During chip reset the outstanding_cmds array is cleared twice:
    once when the commands on the array are returned as failed, and again when
    the array is initialized to zero. If a command is inserted into the array
    between these two steps, the command will be lost.  Check for chip reset
    before sending an IOCB.
    
    Cc: [email protected]
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
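
The race described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_hba`, `toy_queue_cmd()` and the `chip_reset_active` flag are hypothetical names invented for illustration and do not correspond to the actual qla2xxx data structures. It shows the general idea of refusing to insert a command into the outstanding array while a reset is in progress.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical model of an adapter with an outstanding-command array. */
struct toy_hba {
    bool chip_reset_active;        /* set for the whole duration of a reset */
    void *outstanding[TOY_SLOTS];  /* NULL means the slot is free */
};

/*
 * Queue a command. Returns the slot index on success, -1 if a chip reset
 * is in progress (the caller should retry later), -2 if no slot is free.
 */
int toy_queue_cmd(struct toy_hba *ha, void *cmd)
{
    /* Check for reset first, so nothing lands between the two clears. */
    if (ha->chip_reset_active)
        return -1;

    for (int i = 0; i < TOY_SLOTS; i++) {
        if (ha->outstanding[i] == NULL) {
            ha->outstanding[i] = cmd;
            return i;
        }
    }
    return -2;
}
```

With the check in place, a command submitted during a reset is rejected up front instead of being silently wiped when the array is zeroed.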

scsi: qla2xxx: Split FCE|EFT trace control [+ + +]

Author: Quinn Tran <[email protected]>
Date:   Tue Feb 27 22:11:19 2024 +0530

    scsi: qla2xxx: Split FCE|EFT trace control
    
    commit 76a192e1a566e15365704b9f8fb3b70825f85064 upstream.
    
    The current code allocates the FCE|EFT trace buffers and enables the
    features in a single step.
    
    Split this into separate steps in preparation for a follow-on patch that
    lets the user enable or disable the FCE trace feature.
    
    Cc: [email protected]
    Reported-by: kernel test robot <[email protected]>
    Signed-off-by: Quinn Tran <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: qla2xxx: Update manufacturer detail [+ + +]

Author: Bikash Hazarika <[email protected]>
Date:   Tue Feb 27 22:11:20 2024 +0530

    scsi: qla2xxx: Update manufacturer detail
    
    [ Upstream commit 688fa069fda6fce24d243cddfe0c7024428acb74 ]
    
    Update manufacturer detail from "Marvell Semiconductor, Inc." to
    "Marvell".
    
    Cc: [email protected]
    Signed-off-by: Bikash Hazarika <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

scsi: qla2xxx: Update manufacturer details [+ + +]

Author: Bikash Hazarika <[email protected]>
Date:   Tue Jul 12 22:20:44 2022 -0700

    scsi: qla2xxx: Update manufacturer details
    
    [ Upstream commit 1ccad27716ecad1fd58c35e579bedb81fa5e1ad5 ]
    
    Update manufacturer details to indicate Marvell Semiconductors.
    
    Link: https://lore.kernel.org/r/[email protected]
    Cc: [email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Bikash Hazarika <[email protected]>
    Signed-off-by: Nilesh Javali <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Stable-dep-of: 688fa069fda6 ("scsi: qla2xxx: Update manufacturer detail")
    Signed-off-by: Sasha Levin <[email protected]>

scsi: usb: Call scsi_done() directly [+ + +]

Author: Bart Van Assche <[email protected]>
Date:   Thu Oct 7 13:46:10 2021 -0700

    scsi: usb: Call scsi_done() directly
    
    [ Upstream commit 46c97948e9b5bc8b67fd72741a2fe723ac1d14d7 ]
    
    Conditional statements are faster than indirect calls. Hence call
    scsi_done() directly.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Bart Van Assche <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Stable-dep-of: cd5432c71235 ("USB: UAS: return ENODEV when submit urbs fail with device not attached")
    Signed-off-by: Sasha Levin <[email protected]>

scsi: usb: Stop using the SCSI pointer [+ + +]

Author: Bart Van Assche <[email protected]>
Date:   Fri Feb 18 11:51:13 2022 -0800

    scsi: usb: Stop using the SCSI pointer
    
    [ Upstream commit 5dfcf1ad933fe877cb44e9fb7a661dfc22190101 ]
    
    Set scsi_host_template.cmd_size instead of using the SCSI pointer for
    storing driver-private data. Change the type of the argument of
    uas_add_work() from struct uas_cmd_info * into struct scsi_cmnd * because
    it is easier to convert a SCSI command pointer into a uas_cmd_info pointer
    than the other way around.
    
    This patch prepares for removal of the SCSI pointer from struct scsi_cmnd.
    
    Link: https://lore.kernel.org/r/[email protected]
    Cc: [email protected]
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: Hannes Reinecke <[email protected]>
    Reviewed-by: Himanshu Madhani <[email protected]>
    Acked-by: Greg Kroah-Hartman <[email protected]>
    Acked-by: Oliver Neukum <[email protected]>
    Signed-off-by: Bart Van Assche <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Stable-dep-of: cd5432c71235 ("USB: UAS: return ENODEV when submit urbs fail with device not attached")
    Signed-off-by: Sasha Levin <[email protected]>

selftests/mqueue: Set timeout to 180 seconds [+ + +]

Author: SeongJae Park <[email protected]>
Date:   Mon Feb 19 16:08:02 2024 -0800

    selftests/mqueue: Set timeout to 180 seconds
    
    [ Upstream commit 85506aca2eb4ea41223c91c5fe25125953c19b13 ]
    
    While mq_perf_tests runs with the default kselftest timeout limit, which
    is 45 seconds, the test takes about 60 seconds to complete on i3.metal
    AWS instances.  Hence, the test always times out.  Increase the timeout
    to 180 seconds.
    
    Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test")
    Cc: <[email protected]> # 5.4.x
    Signed-off-by: SeongJae Park <[email protected]>
    Reviewed-by: Kees Cook <[email protected]>
    Signed-off-by: Shuah Khan <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

selftests: mptcp: diag: return KSFT_FAIL not test_cnt [+ + +]

Author: Geliang Tang <[email protected]>
Date:   Fri Mar 1 18:11:22 2024 +0100

    selftests: mptcp: diag: return KSFT_FAIL not test_cnt
    
    commit 45bcc0346561daa3f59e19a753cc7f3e08e8dff1 upstream.
    
    The test counter 'test_cnt' should not be returned from diag.sh. For
    example, if only the 4th test fails, the script does 'exit 4', which is
    'exit ${KSFT_SKIP}', and the whole test is marked as skipped instead of
    failed.
    
    So we should do ret=${KSFT_FAIL} instead.
    
    Fixes: df62f2ec3df6 ("selftests/mptcp: add diag interface tests")
    Cc: [email protected]
    Fixes: 42fb6cddec3b ("selftests: mptcp: more stable diag tests")
    Signed-off-by: Geliang Tang <[email protected]>
    Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
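
The exit-code collision above can be shown with a short model. This is a sketch, not the diag.sh logic itself: the KSFT_* values match the kselftest convention (pass 0, fail 1, skip 4), while `exit_status_old()` and `exit_status_fixed()` are hypothetical helpers that only mimic the two behaviours.

```c
/* kselftest exit codes relevant here. */
enum { KSFT_PASS = 0, KSFT_FAIL = 1, KSFT_SKIP = 4 };

/*
 * Old behaviour (modelled): the script exits with the number of the
 * failing test, or 0 when nothing failed.
 */
int exit_status_old(int failed_test_cnt)
{
    return failed_test_cnt;
}

/* Fixed behaviour (modelled): any failure maps to KSFT_FAIL. */
int exit_status_fixed(int failed_test_cnt)
{
    return failed_test_cnt ? KSFT_FAIL : KSFT_PASS;
}
```

In this model a failure in test 4 yields `exit_status_old(4) == KSFT_SKIP`, which a kselftest runner reads as "skipped", whereas `exit_status_fixed(4)` yields `KSFT_FAIL`.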

selftests: net: gro fwd: update vxlan GRO test expectations [+ + +]

Author: Antoine Tenart <[email protected]>
Date:   Tue Mar 26 12:34:02 2024 +0100

    selftests: net: gro fwd: update vxlan GRO test expectations
    
    commit 0fb101be97ca27850c5ecdbd1269423ce4d1f607 upstream.
    
    UDP tunnel packets can't be aggregated by GRO between their endpoints, as
    this causes various issues. The UDP GRO fwd vxlan tests were relying on
    this and their expectations have to be fixed.
    
    We keep both vxlan tests and now expect no GRO to happen. The vxlan UDP
    GRO bench test was removed as it no longer provides any valuable
    information.
    
    Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests")
    Signed-off-by: Antoine Tenart <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

selftests: reuseaddr_conflict: add missing new line at the end of the output [+ + +]

Author: Jakub Kicinski <[email protected]>
Date:   Fri Mar 29 09:05:59 2024 -0700

    selftests: reuseaddr_conflict: add missing new line at the end of the output
    
    commit 31974122cfdeaf56abc18d8ab740d580d9833e90 upstream.
    
    The netdev CI runs in a VM and captures serial, so stdout and
    stderr get combined. Because there's a missing new line in
    stderr the test ends up corrupting KTAP:
    
      # Successok 1 selftests: net: reuseaddr_conflict
    
    which should have been:
    
      # Success
      ok 1 selftests: net: reuseaddr_conflict
    
    Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test")
    Reviewed-by: Muhammad Usama Anjum <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: Lock console when calling into driver before registration [+ + +]

Author: Peter Collingbourne <[email protected]>
Date:   Mon Mar 4 13:43:49 2024 -0800

    serial: Lock console when calling into driver before registration
    
    [ Upstream commit 801410b26a0e8b8a16f7915b2b55c9528b69ca87 ]
    
    During the handoff from earlycon to the real console driver, we have
    two separate drivers operating on the same device concurrently. In the
    case of the 8250 driver these concurrent accesses cause problems due
    to the driver's use of banked registers, controlled by LCR.DLAB. It is
    possible for the setup(), config_port(), pm() and set_mctrl() callbacks
    to set DLAB, which can cause the earlycon code that intends to access
    TX to instead access DLL, leading to missed output and corruption on
    the serial line due to unintended modifications to the baud rate.
    
    In particular, for setup() we have:
    
    univ8250_console_setup()
    -> serial8250_console_setup()
    -> uart_set_options()
    -> serial8250_set_termios()
    -> serial8250_do_set_termios()
    -> serial8250_do_set_divisor()
    
    For config_port() we have:
    
    serial8250_config_port()
    -> autoconfig()
    
    For pm() we have:
    
    serial8250_pm()
    -> serial8250_do_pm()
    -> serial8250_set_sleep()
    
    For set_mctrl() we have (for some devices):
    
    serial8250_set_mctrl()
    -> omap8250_set_mctrl()
    -> __omap8250_set_mctrl()
    
    To avoid such problems, let's make it so that the console is locked
    during pre-registration calls to these callbacks, which will prevent
    the earlycon driver from running concurrently.
    
    Remove the partial solution to this problem in the 8250 driver
    that locked the console only during autoconfig_irq(), as this would
    result in a deadlock with the new approach. The console continues
    to be locked during autoconfig_irq() because it can only be called
    through uart_configure_port().
    
    Although this patch introduces more locking than strictly necessary
    (and in particular it also locks during the call to rs485_config()
    which is not affected by this issue as far as I can tell), it follows
    the principle that it is the responsibility of the generic console
    code to manage the earlycon handoff by ensuring that earlycon and real
    console driver code cannot run concurrently, and not the individual
    drivers.
    
    Signed-off-by: Peter Collingbourne <[email protected]>
    Reviewed-by: John Ogness <[email protected]>
    Link: https://linux-review.googlesource.com/id/I7cf8124dcebf8618e6b2ee543fa5b25532de55d8
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
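
The banked-register hazard described above can be sketched with a toy model of register offset 0 on an 8250-style UART. The bit value 0x80 for DLAB matches the 8250's LCR layout; everything else (`struct toy_uart`, `toy_write_reg0()`) is a hypothetical illustration and not driver code.

```c
#include <stdint.h>

#define TOY_LCR_DLAB 0x80  /* divisor latch access bit in LCR */

/* Hypothetical model of the state behind register offset 0. */
struct toy_uart {
    uint8_t lcr;           /* line control register */
    uint8_t dll;           /* divisor latch, low byte (baud rate) */
    uint8_t last_tx;       /* last byte written to the TX holding register */
    unsigned int tx_count; /* number of bytes actually transmitted */
};

/* A write to offset 0 hits TX or DLL depending on LCR.DLAB. */
void toy_write_reg0(struct toy_uart *u, uint8_t val)
{
    if (u->lcr & TOY_LCR_DLAB) {
        u->dll = val;      /* baud rate divisor is silently modified */
    } else {
        u->last_tx = val;
        u->tx_count++;
    }
}
```

If one driver sets DLAB while another writes a character to offset 0, the character is lost and the divisor changes. That is the effect the console lock prevents by keeping the two drivers from interleaving.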

serial: sc16is7xx: convert from _raw_ to _noinc_ regmap functions for FIFO [+ + +]

Author: Hugo Villeneuve <[email protected]>
Date:   Mon Dec 11 12:13:52 2023 -0500

    serial: sc16is7xx: convert from _raw_ to _noinc_ regmap functions for FIFO
    
    commit dbf4ab821804df071c8b566d9813083125e6d97b upstream.
    
    The SC16IS7XX IC supports a burst mode to access the FIFOs where the
    initial register address is sent ($00), followed by all the FIFO data
    without having to resend the register address each time. In this mode, the
    IC doesn't increment the register address for each R/W byte.
    
    The regmap_raw_read() and regmap_raw_write() are functions which can
    perform IO over multiple registers. They are currently used to read/write
    from/to the FIFO, and although they operate correctly in this burst mode on
    the SPI bus, they would corrupt the regmap cache if it was not disabled
    manually. The reason is that when the R/W size is more than 1 byte, these
    functions assume that the register address is incremented and handle the
    cache accordingly.
    
    Convert FIFO R/W functions to use the regmap _noinc_ versions in order to
    remove the manual cache control which was a workaround when using the
    _raw_ versions. FIFO registers are properly declared as volatile so
    cache will not be used/updated for FIFO accesses.
    
    Fixes: dfeae619d781 ("serial: sc16is7xx")
    Cc:  <[email protected]>
    Signed-off-by: Hugo Villeneuve <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Cc: Hugo Villeneuve <[email protected]>
    Signed-off-by: GONG, Ruiqi <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

slimbus: core: Remove usage of the deprecated ida_simple_xx() API [+ + +]

Author: Christophe JAILLET <[email protected]>
Date:   Sat Feb 24 11:41:37 2024 +0000

    slimbus: core: Remove usage of the deprecated ida_simple_xx() API
    
    [ Upstream commit 89ffa4cccec54467446f141a79b9e36893079fb8 ]
    
    ida_alloc() and ida_free() should be preferred to the deprecated
    ida_simple_get() and ida_simple_remove().
    
    Note that the upper limit of ida_simple_get() is exclusive, but the one of
    ida_alloc_range() is inclusive. So this change allows one more device.
    Previously address 0xFE was never used.
    
    Fixes: 46a2bb5a7f7e ("slimbus: core: Add slim controllers support")
    Cc: [email protected]
    Signed-off-by: Christophe JAILLET <[email protected]>
    Signed-off-by: Srinivas Kandagatla <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
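
The exclusive-versus-inclusive bound can be checked with a toy allocator. This is a userspace sketch only: `struct toy_ida`, `toy_alloc_range()` and `toy_simple_get()` are hypothetical stand-ins that imitate the bound semantics of ida_alloc_range() and ida_simple_get(), not the kernel IDA implementation (for example, the "end == 0 means no limit" case is not modelled).

```c
#include <stdbool.h>

#define TOY_IDS 256

/* Hypothetical fixed-size ID allocator. */
struct toy_ida {
    bool used[TOY_IDS];
};

/* Inclusive upper bound: returns the lowest free ID in [min, max], or -1. */
int toy_alloc_range(struct toy_ida *ida, unsigned int min, unsigned int max)
{
    for (unsigned int id = min; id <= max && id < TOY_IDS; id++) {
        if (!ida->used[id]) {
            ida->used[id] = true;
            return (int)id;
        }
    }
    return -1;
}

/* Exclusive upper bound: returns the lowest free ID in [start, end), or -1. */
int toy_simple_get(struct toy_ida *ida, unsigned int start, unsigned int end)
{
    if (end <= start)
        return -1;
    return toy_alloc_range(ida, start, end - 1);
}
```

Passing the same number 0xFE as the bound to both helpers gives a highest ID of 0xFD for the exclusive form and 0xFE for the inclusive form, which is the "one more device" mentioned in the commit message.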

smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity() [+ + +]

Author: Roberto Sassu <[email protected]>
Date:   Thu Nov 16 10:01:22 2023 +0100

    smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
    
    [ Upstream commit ac02f007d64eb2769d0bde742aac4d7a5fc6e8a5 ]
    
    If the SMACK64TRANSMUTE xattr is provided, and the inode is a directory,
    update the in-memory inode flags by setting SMK_INODE_TRANSMUTE.
    
    Cc: [email protected]
    Fixes: 5c6d1125f8db ("Smack: Transmute labels on specified directories") # v2.6.38.x
    Signed-off-by: Roberto Sassu <[email protected]>
    Signed-off-by: Casey Schaufler <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr() [+ + +]

Author: Roberto Sassu <[email protected]>
Date:   Thu Nov 16 10:01:21 2023 +0100

    smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
    
    [ Upstream commit 9c82169208dde516510aaba6bbd8b13976690c5d ]
    
    Since the SMACK64TRANSMUTE xattr makes sense only for directories, enforce
    this restriction in smack_inode_setxattr().
    
    Cc: [email protected]
    Fixes: 5c6d1125f8db ("Smack: Transmute labels on specified directories") # v2.6.38.x
    Signed-off-by: Roberto Sassu <[email protected]>
    Signed-off-by: Casey Schaufler <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: qbman: Add CGR update function [+ + +]

Author: Sean Anderson <[email protected]>
Date:   Fri Sep 2 17:57:35 2022 -0400

    soc: fsl: qbman: Add CGR update function
    
    [ Upstream commit 914f8b228ede709274b8c80514b352248ec9da00 ]
    
    This adds a function to update a CGR with new parameters. qman_create_cgr
    can almost be used for this (with flags=0), but it's not suitable because
    it also registers the callback function. The _safe variant was modeled off
    of qman_cgr_delete_safe. However, we handle multiple arguments and a return
    value.
    
    Signed-off-by: Sean Anderson <[email protected]>
    Acked-by: Camelia Groza <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Stable-dep-of: fbec4e7fed89 ("soc: fsl: qbman: Use raw spinlock for cgr_lock")
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: qbman: Add helper for sanity checking cgr ops [+ + +]

Author: Sean Anderson <[email protected]>
Date:   Fri Sep 2 17:57:34 2022 -0400

    soc: fsl: qbman: Add helper for sanity checking cgr ops
    
    [ Upstream commit d0e17a4653cebc2c8a20251c837dd1fcec5014d9 ]
    
    This breaks out/combines get_affine_portal and the cgr sanity check in
    preparation for the next commit. No functional change intended.
    
    Signed-off-by: Sean Anderson <[email protected]>
    Acked-by: Camelia Groza <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Stable-dep-of: fbec4e7fed89 ("soc: fsl: qbman: Use raw spinlock for cgr_lock")
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: qbman: Always disable interrupts when taking cgr_lock [+ + +]

Author: Sean Anderson <[email protected]>
Date:   Mon Mar 11 12:38:29 2024 -0400

    soc: fsl: qbman: Always disable interrupts when taking cgr_lock
    
    [ Upstream commit 584c2a9184a33a40fceee838f856de3cffa19be3 ]
    
    smp_call_function_single disables IRQs when executing the callback. To
    prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere.
    This is already done by qman_update_cgr and qman_delete_cgr; fix the
    other lockers.
    
    Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()")
    CC: [email protected]
    Signed-off-by: Sean Anderson <[email protected]>
    Reviewed-by: Camelia Groza <[email protected]>
    Tested-by: Vladimir Oltean <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: qbman: Use raw spinlock for cgr_lock [+ + +]

Author: Sean Anderson <[email protected]>
Date:   Mon Mar 11 12:38:30 2024 -0400

    soc: fsl: qbman: Use raw spinlock for cgr_lock
    
    [ Upstream commit fbec4e7fed89b579f2483041fabf9650fb0dd6bc ]
    
    smp_call_function always runs its callback in hard IRQ context, even on
    PREEMPT_RT, where spinlocks can sleep. So we need to use a raw spinlock
    for cgr_lock to ensure we aren't waiting on a sleeping task.
    
    Although this bug has existed for a while, it was not apparent until
    commit ef2a8d5478b9 ("net: dpaa: Adjust queue depth on rate change")
    which invokes smp_call_function_single via qman_update_cgr_safe every
    time a link goes up or down.
    
    Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()")
    CC: [email protected]
    Reported-by: Vladimir Oltean <[email protected]>
    Closes: https://lore.kernel.org/all/20230323153935.nofnjucqjqnz34ej@skbuf/
    Reported-by: Steffen Trumtrar <[email protected]>
    Closes: https://lore.kernel.org/linux-arm-kernel/[email protected]/
    Signed-off-by: Sean Anderson <[email protected]>
    Reviewed-by: Camelia Groza <[email protected]>
    Tested-by: Vladimir Oltean <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

sparc64: NMI watchdog: fix return value of __setup handler [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Sat Feb 10 21:28:02 2024 -0800

    sparc64: NMI watchdog: fix return value of __setup handler
    
    [ Upstream commit 3ed7c61e49d65dacb96db798c0ab6fcd55a1f20f ]
    
    __setup() handlers should return 1 to obsolete_checksetup() in
    init/main.c to indicate that the boot option has been handled.
    A return of 0 causes the boot option/value to be listed as an Unknown
    kernel parameter and added to init's (limited) argument or environment
    strings. Also, error return codes don't mean anything to
    obsolete_checksetup() -- only non-zero (usually 1) or zero.
    So return 1 from setup_nmi_watchdog().
    
    Fixes: e5553a6d0442 ("sparc64: Implement NMI watchdog on capable cpus.")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Link: lore.kernel.org/r/[email protected]
    Cc: "David S. Miller" <[email protected]>
    Cc: [email protected]
    Cc: Sam Ravnborg <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: [email protected]
    Cc: Arnd Bergmann <[email protected]>
    Cc: Andreas Larsson <[email protected]>
    Signed-off-by: Andreas Larsson <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

sparc: vDSO: fix return value of __setup handler [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Sat Feb 10 21:28:08 2024 -0800

    sparc: vDSO: fix return value of __setup handler
    
    [ Upstream commit 5378f00c935bebb846b1fdb0e79cb76c137c56b5 ]
    
    __setup() handlers should return 1 to obsolete_checksetup() in
    init/main.c to indicate that the boot option has been handled.
    A return of 0 causes the boot option/value to be listed as an Unknown
    kernel parameter and added to init's (limited) argument or environment
    strings. Also, error return codes don't mean anything to
    obsolete_checksetup() -- only non-zero (usually 1) or zero.
    So return 1 from vdso_setup().
    
    Fixes: 9a08862a5d2e ("vDSO for sparc")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Link: lore.kernel.org/r/[email protected]
    Cc: "David S. Miller" <[email protected]>
    Cc: [email protected]
    Cc: Dan Carpenter <[email protected]>
    Cc: Nick Alcock <[email protected]>
    Cc: Sam Ravnborg <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: [email protected]
    Cc: Arnd Bergmann <[email protected]>
    Cc: Andreas Larsson <[email protected]>
    Signed-off-by: Andreas Larsson <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

speakup: Fix 8bit characters from direct synth [+ + +]

Author: Samuel Thibault <[email protected]>
Date:   Sun Feb 4 16:57:36 2024 +0100

    speakup: Fix 8bit characters from direct synth
    
    [ Upstream commit b6c8dafc9d86eb77e502bb018ec4105e8d2fbf78 ]
    
    When userland echoes 8bit characters to /dev/synth with e.g.
    
    echo -e '\xe9' > /dev/synth
    
    synth_write would get characters beyond 0x7f, and thus negative when
    char is signed.  When given to synth_buffer_add which takes a u16, this
    would sign-extend and produce a U+ffxy character rather than U+xy.
    Users thus get garbled text instead of accents in their output.
    
    Let's fix this by making sure that we read unsigned characters.
    
    Signed-off-by: Samuel Thibault <[email protected]>
    Fixes: 89fc2ae80bb1 ("speakup: extend synth buffer to 16bit unicode characters")
    Cc: [email protected]
    Link: https://lore.kernel.org/r/20240204155736.2oh4ot7tiaa2wpbh@begin
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
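
The sign-extension effect described above can be reproduced in a few lines of plain C. This is a minimal sketch: `widen_as_signed()` and `widen_as_unsigned()` are hypothetical helpers, not speakup functions, and they only show what happens when a byte above 0x7f is widened to 16 bits through a signed or an unsigned type.

```c
#include <stdint.h>

/*
 * Widen a byte that was read through a signed char type.
 * 0xe9 read as signed is -23, which becomes 0xffe9 as a 16-bit value.
 */
uint16_t widen_as_signed(int8_t c)
{
    return (uint16_t)c;
}

/* Widen a byte that was read through an unsigned char type: 0xe9 stays 0x00e9. */
uint16_t widen_as_unsigned(uint8_t c)
{
    return (uint16_t)c;
}
```

Reading the input as unsigned, as the fix does, corresponds to the second helper, so the byte 0xe9 reaches the buffer as U+00e9 rather than U+ffe9.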

staging: vc04_services: changen strncpy() to strscpy_pad() [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Wed Mar 13 17:36:56 2024 +0100

    staging: vc04_services: changen strncpy() to strscpy_pad()
    
    commit ef25725b7f8aaffd7756974d3246ec44fae0a5cf upstream.
    
    gcc-14 warns about this strncpy() that results in a non-terminated
    string for an overflow:
    
    In file included from include/linux/string.h:369,
                     from drivers/staging/vc04_services/vchiq-mmal/mmal-vchiq.c:20:
    In function 'strncpy',
        inlined from 'create_component' at drivers/staging/vc04_services/vchiq-mmal/mmal-vchiq.c:940:2:
    include/linux/fortify-string.h:108:33: error: '__builtin_strncpy' specified bound 128 equals destination size [-Werror=stringop-truncation]
    
    Change it to strscpy_pad(), which produces a properly terminated and
    zero-padded string.
    
    Signed-off-by: Arnd Bergmann <[email protected]>
    Reviewed-by: Dan Carpenter <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
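
The difference between the two copy routines can be demonstrated in userspace. The kernel's strscpy_pad() is not available outside the kernel, so this sketch uses a hypothetical stand-in, `copy_terminated_padded()`, which imitates its documented behaviour (always NUL-terminate, zero-fill the tail) and returns -1 on truncation where the kernel helper returns -E2BIG.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for strscpy_pad().
 * Copies at most size - 1 bytes, always NUL-terminates, and zero-fills
 * the rest of dst. Returns the copied length, or -1 if src was truncated.
 */
long copy_terminated_padded(char *dst, const char *src, size_t size)
{
    size_t len = 0;

    if (size == 0)
        return -1;

    /* Find how many bytes fit while leaving room for the terminator. */
    while (len < size - 1 && src[len] != '\0')
        len++;

    memcpy(dst, src, len);
    memset(dst + len, 0, size - len);

    return src[len] == '\0' ? (long)len : -1;
}
```

By contrast, strncpy(dst, src, sizeof(dst)) with a source of 128 bytes or more fills all 128 bytes of dst and leaves no terminator, which is the case gcc-14 warns about.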

staging: vc04_services: fix information leak in create_component() [+ + +]

Author: Dan Carpenter <[email protected]>
Date:   Wed Mar 13 21:07:43 2024 +0300

    staging: vc04_services: fix information leak in create_component()
    
    commit f37e76abd614b68987abc8e5c22d986013349771 upstream.
    
    The m.u.component_create.pid field is for debugging, and in the mainline
    kernel it's not used for anything.  However, it still needs to be set to
    something to prevent disclosing uninitialized stack data.  Set it to
    zero.
    
    Fixes: 7b3ad5abf027 ("staging: Import the BCM2835 MMAL-based V4L2 camera driver.")
    Cc: stable <[email protected]>
    Signed-off-by: Dan Carpenter <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

SUNRPC/NFSD: clean up get/put functions. [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    SUNRPC/NFSD: clean up get/put functions.
    
    [ Upstream commit 8c62d12740a1450d2e8456d5747f440e10db281a ]
    
    svc_destroy() is poorly named - it doesn't necessarily destroy the svc,
    it might just reduce the ref count.
    nfsd_destroy() is poorly named for the same reason.
    
    This patch:
     - removes the refcount functionality from svc_destroy(), moving it to
       a new svc_put().  Almost all previous callers of svc_destroy() now
       call svc_put().
     - renames nfsd_destroy() to nfsd_put() and improves the code, using
       the new svc_destroy() rather than svc_put()
     - removes a few comments that explain the importance of balanced
       get/put calls.  This should be obvious.
    
    The only non-trivial part of this is that svc_destroy() would call
    svc_sock_update() on a non-final decrement.  It can no longer do that,
    and svc_put() isn't really a good place for it.  This call is now made
    from svc_exit_thread(), which seems like a good place.  This makes the
    call *before* sv_nrthreads is decremented rather than after.  This
    is not particularly important as the call just sets a flag which
    causes sv_nrthreads to be checked later.  A subsequent patch will
    improve the ordering.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: always treat sv_nrpools==1 as "not pooled" [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    SUNRPC: always treat sv_nrpools==1 as "not pooled"
    
    [ Upstream commit 93aa619eb0b42eec2f3a9b4d9db41f5095390aec ]
    
    Currently 'pooled' services hold a reference on the pool_map, and
    'unpooled' services do not.
    svc_destroy() uses the presence of ->svo_function (via
    svc_serv_is_pooled()) to determine if the reference should be dropped.
    There is no direct correlation between being pooled and the use of
    svo_function, though in practice, lockd is the only non-pooled service,
    and the only one not to use svo_function.
    
    This is untidy and would cause problems if we changed lockd to use
    svc_set_num_threads(), which requires the use of ->svo_function.
    
    So change the test for "is the service pooled" to "is sv_nrpools > 1".
    
    This means that when svc_pool_map_get() returns 1, it must NOT take a
    reference to the pool.
    
    We discard svc_serv_is_pooled(), and test sv_nrpools directly.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Change return value type of .pc_decode [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Oct 12 11:57:28 2021 -0400

    SUNRPC: Change return value type of .pc_decode
    
    [ Upstream commit c44b31c263798ec34614dd394c31ef1a2e7e716e ]
    
    Returning an undecorated integer is an age-old trope, but it's
    not clear (even to previous experts in this code) that the only
    valid return values are 1 and 0. These functions do not return
    a negative errno, rpc_stat value, or a positive length.
    
    Document there are only two valid return values by having
    .pc_decode return only true or false.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Change return value type of .pc_encode [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Oct 13 10:41:13 2021 -0400

    SUNRPC: Change return value type of .pc_encode
    
    [ Upstream commit 130e2054d4a652a2bd79fb1557ddcd19c053cb37 ]
    
    Returning an undecorated integer is an age-old trope, but it's
    not clear (even to previous experts in this code) that the only
    valid return values are 1 and 0. These functions do not return
    a negative errno, rpc_stat value, or a positive length.
    
    Document there are only two valid return values by having
    .pc_encode return only true or false.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: change svc_get() to return the svc. [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    SUNRPC: change svc_get() to return the svc.
    
    [ Upstream commit df5e49c880ea0776806b8a9f8ab95e035272cf6f ]
    
    It is common for 'get' functions to return the object that was 'got',
    and there are a couple of places where users of svc_get() would be a
    little simpler if svc_get() did that.
    
    Make it so.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: discard svo_setup and rename svc_set_num_threads_sync() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()
    
    [ Upstream commit 3ebdbe5203a874614819700d3f470724cb803709 ]
    
    The ->svo_setup callback serves no purpose.  It is always called from
    within the same module that chooses which callback is needed.  So
    discard it and call the relevant function directly.
    
    Now that svc_set_num_threads() is no longer used remove it and rename
    svc_set_num_threads_sync() to remove the "_sync" suffix.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Jan 25 17:57:23 2022 -0500

    SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
    
    [ Upstream commit c0219c499799c1e92bd570c15a47e6257a27bb15 ]
    
    Neil says:
    "These functions were separated in commit 0971374e2818 ("SUNRPC:
    Reduce contention in svc_xprt_enqueue()") so that the XPT_BUSY check
    happened before taking any spinlocks.
    
    We have since moved or removed the spinlocks so the extra test is
    fairly pointless."
    
    I've made this a separate patch in case the XPT_BUSY change has
    unexpected consequences and needs to be reverted.
    
    Suggested-by: Neil Brown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: move the pool_map definitions (back) into svc.c [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    SUNRPC: move the pool_map definitions (back) into svc.c
    
    [ Upstream commit cf0e124e0a489944d08fcc3c694d2b234d2cc658 ]
    
    These definitions are not used outside of svc.c, and there is no
    evidence that they ever have been.  So move them into svc.c
    and make the declarations 'static'.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Parametrize how much of argsize should be zeroed [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Sep 12 17:22:38 2022 -0400

    SUNRPC: Parametrize how much of argsize should be zeroed
    
    [ Upstream commit 103cc1fafee48adb91fca0e19deb869fd23e46ab ]
    
    Currently, SUNRPC clears the whole of .pc_argsize before processing
    each incoming RPC transaction. Add an extra parameter to struct
    svc_procedure to enable upper layers to reduce the amount of each
    operation's argument structure that is zeroed by SUNRPC.
    
    The size of struct nfsd4_compoundargs, in particular, is a lot to
    clear on each incoming RPC Call. A subsequent patch will cut this
    down to something closer to what NFSv2 and NFSv3 use.
    
    This patch should cause no behavior changes.
    
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Remove svc_shutdown_net() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jan 26 11:30:55 2022 -0500

    SUNRPC: Remove svc_shutdown_net()
    
    [ Upstream commit c7d7ec8f043e53ad16e30f5ebb8b9df415ec0f2b ]
    
    Clean up: svc_shutdown_net() now does nothing but call
    svc_close_net(). Replace all external call sites.
    
    svc_close_net() is renamed to be the inverse of svc_xprt_create().
    
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Remove svo_shutdown method [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Jan 25 13:49:29 2022 -0500

    SUNRPC: Remove svo_shutdown method
    
    [ Upstream commit 87cdd8641c8a1ec6afd2468265e20840a57fd888 ]
    
    Clean up. Neil observed that "any code that calls svc_shutdown_net()
    knows what the shutdown function should be, and so can call it
    directly."
    
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Remove the .svo_enqueue_xprt method [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Jan 25 10:17:59 2022 -0500

    SUNRPC: Remove the .svo_enqueue_xprt method
    
    [ Upstream commit a9ff2e99e9fa501ec965da03c18a5422b37a2f44 ]
    
    We have never been able to track down and address the underlying
    cause of the performance issues with workqueue-based service
    support. svo_enqueue_xprt is called multiple times per RPC, so
    it adds instruction path length, but always ends up at the same
    function: svc_xprt_do_enqueue(). We do not anticipate needing
    this flexibility for dynamic nfsd thread management support.
    
    As a micro-optimization, remove .svo_enqueue_xprt because
    Spectre/Meltdown makes virtual function calls more costly.
    
    This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc:
    turn enqueueing a svc_xprt into a svc_serv operation").
    
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Rename svc_close_xprt() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Jan 31 13:34:29 2022 -0500

    SUNRPC: Rename svc_close_xprt()
    
    [ Upstream commit 4355d767a21b9445958fc11bce9a9701f76529d3 ]
    
    Clean up: Use the "svc_xprt_<task>" function naming convention as
    is used for other external APIs.
    
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Rename svc_create_xprt() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Jan 26 11:42:08 2022 -0500

    SUNRPC: Rename svc_create_xprt()
    
    [ Upstream commit 352ad31448fecc78a2e9b78da64eea5d63b8d0ce ]
    
    Clean up: Use the "svc_xprt_<task>" function naming convention as
    is used for other external APIs.
    
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Replace the "__be32 *p" parameter to .pc_decode [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Oct 12 11:57:22 2021 -0400

    SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
    
    [ Upstream commit 16c663642c7ec03cd4cee5fec520bb69e97babe4 ]
    
    The passed-in value of the "__be32 *p" parameter is now unused in
    every server-side XDR decoder, and can be removed.
    
    Note also that there is a line in each decoder that sets up a local
    pointer to a struct xdr_stream. Passing that pointer from the
    dispatcher instead saves one line per decoder function.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Replace the "__be32 *p" parameter to .pc_encode [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Wed Oct 13 10:41:06 2021 -0400

    SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
    
    [ Upstream commit fda494411485aff91768842c532f90fb8eb54943 ]
    
    The passed-in value of the "__be32 *p" parameter is now unused in
    every server-side XDR encoder, and can be removed.
    
    Note also that there is a line in each encoder that sets up a local
    pointer to a struct xdr_stream. Passing that pointer from the
    dispatcher instead saves one line per encoder function.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: stop using ->sv_nrthreads as a refcount [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    SUNRPC: stop using ->sv_nrthreads as a refcount
    
    [ Upstream commit ec52361df99b490f6af412b046df9799b92c1050 ]
    
    The use of sv_nrthreads as a general refcount results in clumsy code, as
    is seen by various comments needed to explain the situation.
    
    This patch introduces a 'struct kref' and uses that for reference
    counting, leaving sv_nrthreads to be a pure count of threads.  The kref
    is managed particularly in svc_get() and svc_put(), and also nfsd_put().
    
    svc_destroy() now takes a pointer to the embedded kref, rather than to
    the serv.
    
    nfsd allows the svc_serv to exist with ->sv_nrthreads being zero.  This
    happens when a transport is created before the first thread is started.
    To support this, a 'keep_active' flag is introduced which holds a ref on
    the svc_serv.  This is set when any listening socket is successfully
    added (unless there are running threads), and cleared when the number of
    threads is set.  So when the last thread exits, the svc_serv will be
    destroyed.
    The use of 'keep_active' replaces previous code which checked if there
    were any permanent sockets.
    
    We no longer clear ->rq_server when nfsd() exits.  This was done
    to prevent svc_exit_thread() from calling svc_destroy().
    Instead we take an extra reference to the svc_serv to prevent
    svc_destroy() from being called.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
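
The split between a lifetime reference count and a pure thread count can be sketched in userspace C. This is only an illustration under stated assumptions: `struct serv_sketch`, `serv_get()` and `serv_put()` are hypothetical names, a plain integer stands in for the atomic `struct kref`, and setting `destroyed` stands in for svc_destroy().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a service object; not the kernel's svc_serv. */
struct serv_sketch {
	unsigned int refs;      /* lifetime references (the role of the kref) */
	unsigned int nrthreads; /* pure count of running threads */
	bool destroyed;         /* set where the real code would free the object */
};

/* Take a reference and return the object, as "get" functions commonly do. */
static struct serv_sketch *serv_get(struct serv_sketch *s)
{
	s->refs++;
	return s;
}

/* Drop a reference; the object goes away only when the last one is dropped. */
static void serv_put(struct serv_sketch *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0)
		s->destroyed = true;
}
```

In this sketch the object can stay alive with `nrthreads == 0` as long as something still holds a reference, which is the situation the 'keep_active' flag described above is meant to cover.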

SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sat Oct 16 18:02:24 2021 -0400

    SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field
    
    [ Upstream commit b4776a341ec05e809d21e98db5ed49dbdc81d5d8 ]
    
    For certain special cases, RPC-related tracepoints record a -1 as
    the task ID or the client ID. It's ugly for a trace event to display
    4 billion in these cases.
    
    To help keep SUNRPC tracepoints consistent, create a macro that
    defines the print format specifiers for tk_pid and cl_clid. At some
    point in the future we might try tk_pid with a wider range of values
    than 0..64K so this makes it easier to make that change.
    
    RPC tracepoints now look like this:
    
    <...>-1276  [009]   149.720358: rpc_clnt_new:         client=00000005 peer=[192.168.2.55]:20049 program=nfs server=klimt.ib
    
    <...>-1342  [004]   149.921234: rpc_xdr_recvfrom:     task:0000001a@00000005 head=[0xff1242d9ab6dc01c,144] page=0 tail=[(nil),0] len=144
    <...>-1342  [004]   149.921235: xprt_release_cong:    task:0000001a@00000005 snd_task:ffffffff cong=256 cwnd=16384
    <...>-1342  [004]   149.921235: xprt_put_cong:        task:0000001a@00000005 snd_task:ffffffff cong=0 cwnd=16384
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
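
The effect of a shared fixed-width format can be sketched in userspace C. The macro `TASK_FMT_SKETCH` and the helper `format_task()` are hypothetical names chosen for illustration, not the kernel's definitions; the layout imitates the `task:0000001a@00000005` fields in the sample output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical shared format: two zero-padded, 8-digit hexadecimal fields. */
#define TASK_FMT_SKETCH "task:%08x@%08x"

/* Format a task ID / client ID pair into buf using the fixed-width layout. */
static int format_task(char *buf, size_t len,
		       unsigned int tk_pid, unsigned int cl_clid)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, TASK_FMT_SKETCH, tk_pid, cl_clid);
}
```

With this layout a recorded -1 appears as `ffffffff` and keeps the columns aligned, instead of a ten-digit decimal number.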

SUNRPC: Use RMW bitops in single-threaded hot paths [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Apr 29 10:06:21 2022 -0400

    SUNRPC: Use RMW bitops in single-threaded hot paths
    
    [ Upstream commit 28df0988815f63e2af5e6718193c9f68681ad7ff ]
    
    I noticed CPU pipeline stalls while using perf.
    
    Once an svc thread is scheduled and executing an RPC, no other
    processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are
    not needed outside the svc thread scheduler.
    
    Signed-off-by: Chuck Lever <[email protected]>
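
The distinction the commit relies on can be sketched in userspace C11. The two helpers below are hypothetical and only analogous to the kernel's set_bit() (atomic) and __set_bit() (non-atomic); they are not the kernel implementations.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomic variant: safe against concurrent writers, at the cost of a locked RMW. */
static void set_flag_atomic(atomic_ulong *flags, unsigned int nr)
{
	atomic_fetch_or(flags, 1UL << nr);
}

/* Plain variant: valid only while a single thread owns the flags word. */
static void set_flag_plain(unsigned long *flags, unsigned int nr)
{
	assert(nr < 8 * sizeof(unsigned long));
	*flags |= 1UL << nr;
}
```

Both helpers produce the same result when only one thread touches the word; the plain variant simply avoids the bus-locked instruction.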

SUNRPC: use sv_lock to protect updates to sv_nrthreads. [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Nov 29 15:51:25 2021 +1100

    SUNRPC: use sv_lock to protect updates to sv_nrthreads.
    
    [ Upstream commit 2a36395fac3b72771f87c3ee4387e3a96d85a7cc ]
    
    Using sv_lock means we don't need to hold the service mutex over these
    updates.
    
    In particular, svc_exit_thread() no longer requires synchronisation, so
    threads can exit asynchronously.
    
    Note that we could use an atomic_t, but as there are many more read
    sites than writes, that would add unnecessary noise to the code.
    Some reads are already racy, and there is no need for them to not be.
    
    Signed-off-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

swap: comments get_swap_device() with usage rule [+ + +]

Author: Huang Ying <[email protected]>
Date:   Mon May 29 14:13:55 2023 +0800

    swap: comments get_swap_device() with usage rule
    
    [ Upstream commit a95722a047724ef75567381976a36f0e44230bd9 ]
    
    The general rule to use a swap entry is as follows.
    
    When we get a swap entry and nothing else prevents swapoff (for
    example, the folio in the swap cache being locked or the page table
    lock being held), the swap entry may become invalid because of swapoff.
    In that case, we need to enclose all swap related functions with
    get_swap_device() and put_swap_device(), unless the swap functions
    call get/put_swap_device() by themselves.
    
    Add the rule as comments of get_swap_device().
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: "Huang, Ying" <[email protected]>
    Reviewed-by: David Hildenbrand <[email protected]>
    Reviewed-by: Yosry Ahmed <[email protected]>
    Reviewed-by: Chris Li (Google) <[email protected]>
    Cc: Hugh Dickins <[email protected]>
    Cc: Johannes Weiner <[email protected]>
    Cc: Matthew Wilcox <[email protected]>
    Cc: Michal Hocko <[email protected]>
    Cc: Minchan Kim <[email protected]>
    Cc: Tim Chen <[email protected]>
    Cc: Yang Shi <[email protected]>
    Cc: Yu Zhao <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Stable-dep-of: 82b1c07a0af6 ("mm: swap: fix race between free_swap_and_cache() and swapoff()")
    Signed-off-by: Sasha Levin <[email protected]>

swiotlb: Fix alignment checks when both allocation and DMA masks are present [+ + +]

Author: Will Deacon <[email protected]>
Date:   Fri Mar 8 15:28:27 2024 +0000

    swiotlb: Fix alignment checks when both allocation and DMA masks are present
    
    [ Upstream commit 51b30ecb73b481d5fac6ccf2ecb4a309c9ee3310 ]
    
    Nicolin reports that swiotlb buffer allocations fail for an NVME device
    behind an IOMMU using 64KiB pages. This is because we end up with a
    minimum allocation alignment of 64KiB (for the IOMMU to map the buffer
    safely) but a minimum DMA alignment mask corresponding to a 4KiB NVME
    page (i.e. preserving the 4KiB page offset from the original allocation).
    If the original address is not 4KiB-aligned, the allocation will fail
    because swiotlb_search_pool_area() erroneously compares these unmasked
    bits with the 64KiB-aligned candidate allocation.
    
    Tweak swiotlb_search_pool_area() so that the DMA alignment mask is
    reduced based on the required alignment of the allocation.
    
    Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
    Link: https://lore.kernel.org/r/[email protected]
    Reported-by: Nicolin Chen <[email protected]>
    Signed-off-by: Will Deacon <[email protected]>
    Reviewed-by: Michael Kelley <[email protected]>
    Tested-by: Nicolin Chen <[email protected]>
    Tested-by: Michael Kelley <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
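
The failure and the mask reduction can be worked through with concrete numbers. The sketch below is an assumption about the shape of the check, not the kernel's code: `offset_matches()` and `reduce_dma_mask()` are hypothetical helpers, and the addresses are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does the candidate address keep the low bits of orig_addr selected by mask? */
static bool offset_matches(uint64_t candidate, uint64_t orig_addr, uint64_t mask)
{
	return (candidate & mask) == (orig_addr & mask);
}

/* Drop from the DMA mask every bit that the allocation alignment already covers. */
static uint64_t reduce_dma_mask(uint64_t dma_align_mask, uint64_t alloc_align_mask)
{
	/* Alignment masks are expected to have the form 2^n - 1. */
	assert((alloc_align_mask & (alloc_align_mask + 1)) == 0);
	return dma_align_mask & ~alloc_align_mask;
}
```

With a 4KiB DMA mask (0xfff), a 64KiB allocation mask (0xffff) and an original address ending in 0x678, a 64KiB-aligned candidate never matches under the unreduced mask. The reduced mask is 0, so the same candidate is accepted.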

tcp: properly terminate timers for kernel sockets [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Fri Mar 22 13:57:32 2024 +0000

    tcp: properly terminate timers for kernel sockets
    
    [ Upstream commit 151c9c724d05d5b0dd8acd3e11cb69ef1f2dbada ]
    
    We had various syzbot reports about tcp timers firing after
    the corresponding netns has been dismantled.
    
    Fortunately Josef Bacik could trigger the issue more often,
    and could test a patch I wrote two years ago.
    
    When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
    to 'stop' the timers.
    
    inet_csk_clear_xmit_timers() can be called from any context,
    including when socket lock is held.
    This is the reason it uses sk_stop_timer(), aka del_timer().
    This means that ongoing timers might finish much later.
    
    For user sockets, this is fine because each running timer
    holds a reference on the socket, and the user socket holds
    a reference on the netns.
    
    For kernel sockets, we risk that the netns is freed before
    timer can complete, because kernel sockets do not hold
    reference on the netns.
    
    This patch adds an inet_csk_clear_xmit_timers_sync() function
    that uses sk_stop_timer_sync() to make sure all timers
    are terminated before the kernel socket is released.
    Modules using kernel sockets close them in their netns exit()
    handler.
    
    Also add a sock_not_owned_by_me() helper to get LOCKDEP
    support: inet_csk_clear_xmit_timers_sync() must not be called
    while socket lock is held.
    
    It is very possible we can revert in the future commit
    3a58f13a881e ("net: rds: acquire refcount on TCP sockets")
    which attempted to solve the issue in rds only.
    (net/smc/af_smc.c and net/mptcp/subflow.c have similar code)
    
    We probably can remove the check_net() tests from
    tcp_out_of_resources() and __tcp_close() in the future.
    
    Reported-by: Josef Bacik <[email protected]>
    Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/
    Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
    Fixes: 8a68173691f0 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
    Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
    Signed-off-by: Eric Dumazet <[email protected]>
    Tested-by: Josef Bacik <[email protected]>
    Cc: Tetsuo Handa <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tee: optee: Fix kernel panic caused by incorrect error handling [+ + +]

Author: Sumit Garg <[email protected]>
Date:   Fri Mar 1 20:07:31 2024 +0530

    tee: optee: Fix kernel panic caused by incorrect error handling
    
    commit 95915ba4b987cf2b222b0f251280228a1ff977ac upstream.
    
    The error path while failing to register devices on the TEE bus has a
    bug leading to kernel panic as follows:
    
    [   15.398930] Unable to handle kernel paging request at virtual address ffff07ed00626d7c
    [   15.406913] Mem abort info:
    [   15.409722]   ESR = 0x0000000096000005
    [   15.413490]   EC = 0x25: DABT (current EL), IL = 32 bits
    [   15.418814]   SET = 0, FnV = 0
    [   15.421878]   EA = 0, S1PTW = 0
    [   15.425031]   FSC = 0x05: level 1 translation fault
    [   15.429922] Data abort info:
    [   15.432813]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
    [   15.438310]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
    [   15.443372]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
    [   15.448697] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000d9e3e000
    [   15.455413] [ffff07ed00626d7c] pgd=1800000bffdf9003, p4d=1800000bffdf9003, pud=0000000000000000
    [   15.464146] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
    
    Commit 7269cba53d90 ("tee: optee: Fix supplicant based device enumeration")
    led to the introduction of this bug. So fix it appropriately.
    
    Reported-by: Mikko Rapeli <[email protected]>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218542
    Fixes: 7269cba53d90 ("tee: optee: Fix supplicant based device enumeration")
    Cc: [email protected]
    Signed-off-by: Sumit Garg <[email protected]>
    Signed-off-by: Jens Wiklander <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

thermal: devfreq_cooling: Fix perf state when calculate dfc res_util [+ + +]

Author: Ye Zhang <[email protected]>
Date:   Thu Mar 21 18:21:00 2024 +0800

    thermal: devfreq_cooling: Fix perf state when calculate dfc res_util
    
    commit a26de34b3c77ae3a969654d94be49e433c947e3b upstream.
    
    The issue occurs when the devfreq cooling device uses the EM power model
    and the get_real_power() callback is provided by the driver.
    
    The EM power table is sorted ascending, so it can't be indexed by the
    cooling device state. Convert the cooling state to a performance state
    with dfc->max_state - dfc->capped_state.
    
    Fixes: 615510fe13bd ("thermal: devfreq_cooling: remove old power model and use EM")
    Cc: 5.11+ <[email protected]> # 5.11+
    Signed-off-by: Ye Zhang <[email protected]>
    Reviewed-by: Dhruva Gole <[email protected]>
    Reviewed-by: Lukasz Luba <[email protected]>
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Lukasz Luba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
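
The index conversion can be illustrated with a small table. This is a sketch under assumptions: `struct perf_state_sketch`, `cooling_to_perf_idx()` and the table values used with them are hypothetical. It relies only on the table being sorted ascending and on cooling state 0 meaning no throttling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical power-table entry; tables of these are sorted ascending. */
struct perf_state_sketch {
	unsigned long freq_khz;
	unsigned long power_mw;
};

/*
 * Cooling state 0 means no throttling (highest performance), so the
 * matching index in an ascending table is max_state - cooling_state.
 */
static size_t cooling_to_perf_idx(size_t max_state, size_t cooling_state)
{
	assert(cooling_state <= max_state);
	return max_state - cooling_state;
}
```

For a four-entry table, max_state is 3: cooling state 0 selects the last (fastest) entry and cooling state 3 selects the first (slowest) one.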

timers: Rename del_timer_sync() to timer_delete_sync() [+ + +]

Author: Thomas Gleixner <[email protected]>
Date:   Wed Nov 23 21:18:44 2022 +0100

    timers: Rename del_timer_sync() to timer_delete_sync()
    
    [ Upstream commit 9b13df3fb64ee95e2397585404e442afee2c7d4f ]
    
    The timer related functions do not have a strict timer_ prefixed namespace
    which is really annoying.
    
    Rename del_timer_sync() to timer_delete_sync() and provide del_timer_sync()
    as a wrapper. Document that del_timer_sync() is not for new code.
    
    Signed-off-by: Thomas Gleixner <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Reviewed-by: Steven Rostedt (Google) <[email protected]>
    Reviewed-by: Jacob Keller <[email protected]>
    Reviewed-by: Anna-Maria Behnsen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Stable-dep-of: 0f7352557a35 ("wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach")
    Signed-off-by: Sasha Levin <[email protected]>

timers: Update kernel-doc for various functions [+ + +]

Author: Thomas Gleixner <[email protected]>
Date:   Wed Nov 23 21:18:40 2022 +0100

    timers: Update kernel-doc for various functions
    
    [ Upstream commit 14f043f1340bf30bc60af127bff39f55889fef26 ]
    
    The kernel-doc of timer related functions is partially incomprehensible
    word salad. Rewrite it to make it useful.
    
    Signed-off-by: Thomas Gleixner <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Reviewed-by: Jacob Keller <[email protected]>
    Reviewed-by: Anna-Maria Behnsen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Stable-dep-of: 0f7352557a35 ("wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach")
    Signed-off-by: Sasha Levin <[email protected]>

timers: Use del_timer_sync() even on UP [+ + +]

Author: Thomas Gleixner <[email protected]>
Date:   Wed Nov 23 21:18:42 2022 +0100

    timers: Use del_timer_sync() even on UP
    
    [ Upstream commit 168f6b6ffbeec0b9333f3582e4cf637300858db5 ]
    
    del_timer_sync() is assumed to be pointless on uniprocessor systems and can
    be mapped to del_timer() because in theory del_timer() can never be invoked
    while the timer callback function is executed.
    
    This is not entirely true because del_timer() can be invoked from interrupt
    context and therefore hit in the middle of a running timer callback.
    
    Contrary to that del_timer_sync() is not allowed to be invoked from
    interrupt context unless the affected timer is marked with TIMER_IRQSAFE.
    del_timer_sync() has proper checks in place to detect such a situation.
    
    Give up on the UP optimization and make del_timer_sync() unconditionally
    available.
    
    Co-developed-by: Steven Rostedt <[email protected]>
    Signed-off-by: Steven Rostedt <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Reviewed-by: Jacob Keller <[email protected]>
    Reviewed-by: Anna-Maria Behnsen <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]
    Link: https://lore.kernel.org/all/[email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Stable-dep-of: 0f7352557a35 ("wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach")
    Signed-off-by: Sasha Levin <[email protected]>

trace: Relocate event helper files [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Mon Nov 14 08:57:43 2022 -0500

    trace: Relocate event helper files
    
    [ Upstream commit 247c01ff5f8d66e62a404c91733be52fecb8b7f6 ]
    
    Steven Rostedt says:
    > The include/trace/events/ directory should only hold files that
    > are to create events, not headers that hold helper functions.
    >
    > Can you please move them out of include/trace/events/ as that
    > directory is "special" in the creation of events.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Acked-by: Leon Romanovsky <[email protected]>
    Acked-by: Steven Rostedt (Google) <[email protected]>
    Acked-by: Anna Schumaker <[email protected]>
    Stable-dep-of: 638593be55c0 ("NFSD: add CB_RECALL_ANY tracepoints")
    Signed-off-by: Chuck Lever <[email protected]>

tracing: Use .flush() call to wake up readers [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Fri Mar 8 15:24:05 2024 -0500

    tracing: Use .flush() call to wake up readers
    
    commit e5d7c1916562f0e856eb3d6f569629fcd535fed2 upstream.
    
    The .release() function does not get called until all readers of a file
    descriptor are finished.
    
    If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
    and another thread closes the file descriptor, it will not wake up the
    other thread as ring_buffer_wake_waiters() is called by .release(), and
    that will not get called until the .read() is finished.
    
    The issue originally showed up in trace-cmd, but the readers are actually
    other processes with their own file descriptors. So calling close() would wake
    up the other tasks because they are blocked on another descriptor than the
    one that was closed(). But there's other wake ups that solve that issue.
    
    When a thread is blocked on a read, it can still hang even when another
    thread closed its descriptor.
    
    This is what the .flush() callback is for. Have the .flush() wake up the
    readers.
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Mark Rutland <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: linke li <[email protected]>
    Cc: Rabin Vincent <[email protected]>
    Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled [+ + +]

Author: Sherry Sun <[email protected]>
Date:   Tue Mar 5 09:57:06 2024 +0800

    tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
    
    commit 74cb7e0355fae9641f825afa389d3fba3b617714 upstream.
    
    If the remote uart device is not connected or not enabled after booting
    up, the CTS line is high by default. At this time, if we enable the flow
    control when opening the device (for example, using the "stty -F /dev/ttyLP4
    crtscts" command), there will be a pending idle preamble (first writing 0
    and then writing 1 to UARTCTRL_TE will queue an idle preamble) that
    cannot be sent out, so the uart port fails to close (waiting for
    TX empty) and the user space stty has to wait for a long time or
    forever.
    
    This is an LPUART IP bug (idle preamble has higher priority than CTS).
    Add a workaround that enables TX CTS after enabling UARTCTRL_TE,
    so that the idle preamble does not get stuck while CTS is deasserted.
    
    Fixes: 380c966c093e ("tty: serial: fsl_lpuart: add 32-bit register interface support")
    Cc: stable <[email protected]>
    Signed-off-by: Sherry Sun <[email protected]>
    Reviewed-by: Alexander Sverdlin <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: serial: imx: Fix broken RS485 [+ + +]

Author: Rickard x Andersson <[email protected]>
Date:   Wed Feb 21 12:53:04 2024 +0100

    tty: serial: imx: Fix broken RS485
    
    commit 672448ccf9b6a676f96f9352cbf91f4d35f4084a upstream.
    
    When about to transmit the function imx_uart_start_tx is called and in
    some RS485 configurations this function will call imx_uart_stop_rx. The
    problem is that imx_uart_stop_rx will enable loopback in order to
    release the RS485 bus, but when loopback is enabled transmitted data
    will just be looped to RX.
    
    This patch fixes the above problem by not enabling loopback when about
    to transmit.
    
    This driver now works well when used for RS485 half duplex master
    configurations.
    
    Fixes: 79d0224f6bf2 ("tty: serial: imx: Handle RS485 DE signal active high")
    Cc: stable <[email protected]>
    Signed-off-by: Rickard x Andersson <[email protected]>
    Tested-by: Christoph Niedermaier <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Christoph Niedermaier <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ubi: Check for too small LEB size in VTBL code [+ + +]

Author: Richard Weinberger <[email protected]>
Date:   Wed Jan 24 07:37:02 2024 +0100

    ubi: Check for too small LEB size in VTBL code
    
    [ Upstream commit 68a24aba7c593eafa8fd00f2f76407b9b32b47a9 ]
    
    If the LEB size is smaller than a volume table record we cannot
    have volumes.
    In this case abort attaching.
    
    Cc: Chenyuan Yang <[email protected]>
    Cc: [email protected]
    Fixes: 801c135ce73d ("UBI: Unsorted Block Images")
    Reported-by: Chenyuan Yang <[email protected]>
    Closes: https://lore.kernel.org/linux-mtd/[email protected]/
    Signed-off-by: Richard Weinberger <[email protected]>
    Reviewed-by: Zhihao Cheng <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
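
The check described above amounts to a size comparison before any volume table is processed. The helper below is a hypothetical sketch: its name, the record size passed in, and the choice of -EINVAL as the error are assumptions, not taken from the UBI code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Refuse to attach when a single volume table record cannot fit in one LEB,
 * because no volume could be described in that case.
 */
static int check_leb_size_sketch(size_t leb_size, size_t vtbl_record_size)
{
	assert(vtbl_record_size > 0);
	if (leb_size < vtbl_record_size)
		return -EINVAL; /* assumed error code for "abort attaching" */
	return 0;
}
```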

ubi: correct the calculation of fastmap size [+ + +]

Author: Zhang Yi <[email protected]>
Date:   Tue Feb 20 10:49:03 2024 +0800

    ubi: correct the calculation of fastmap size
    
    [ Upstream commit 7f174ae4f39e8475adcc09d26c5a43394689ad6c ]
    
    The calculation of the fastmap size in ubi_calc_fm_size() is incorrect
    because it misses each user volume's ubi_fm_eba structure and the
    internal UBI volume info. Correct the calculation.
    
    Cc: [email protected]
    Signed-off-by: Zhang Yi <[email protected]>
    Reviewed-by: Zhihao Cheng <[email protected]>
    Signed-off-by: Richard Weinberger <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ubifs: Set page uptodate in the correct place [+ + +]

Author: Matthew Wilcox (Oracle) <[email protected]>
Date:   Wed Jan 24 17:52:44 2024 +0000

    ubifs: Set page uptodate in the correct place
    
    [ Upstream commit 723012cab779eee8228376754e22c6594229bf8f ]
    
    Page cache reads are lockless, so setting the freshly allocated page
    uptodate before we've overwritten it with the data it's supposed to have
    in it will allow a simultaneous reader to see old data.  Move the call
    to SetPageUptodate into ubifs_write_end(), which is after we copied the
    new data into the page.
    
    Fixes: 1e51764a3c2a ("UBIFS: add new flash file system")
    Cc: [email protected]
    Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
    Reviewed-by: Zhihao Cheng <[email protected]>
    Signed-off-by: Richard Weinberger <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
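
The ordering rule behind the commit above (write the data first, then mark it valid for lockless readers) can be sketched with C11 atomics. All names here are hypothetical and the structure only stands in for a page cache page; it is not the UBIFS implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 64

/* Stand-in for a page: readers trust data only when uptodate is true. */
struct sketch_page {
    char data[SKETCH_PAGE_SIZE];
    atomic_bool uptodate;
};

/* Writer: copy the new data first, then publish the flag (release). */
void sketch_write_end(struct sketch_page *page, const char *src, size_t len)
{
    if (len > SKETCH_PAGE_SIZE)
        len = SKETCH_PAGE_SIZE;
    memcpy(page->data, src, len);
    atomic_store_explicit(&page->uptodate, true, memory_order_release);
}

/* Lockless reader: refuses to expose contents that were never published. */
bool sketch_read(struct sketch_page *page, char *dst, size_t len)
{
    if (!atomic_load_explicit(&page->uptodate, memory_order_acquire))
        return false;
    if (len > SKETCH_PAGE_SIZE)
        len = SKETCH_PAGE_SIZE;
    memcpy(dst, page->data, len);
    return true;
}
```

Setting the flag before the copy would let a concurrent reader pass the check and read whatever the buffer held before, which is the bug the commit describes.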

udp: do not accept non-tunnel GSO skbs landing in a tunnel [+ + +]

Author: Antoine Tenart <[email protected]>
Date:   Tue Mar 26 12:33:58 2024 +0100

    udp: do not accept non-tunnel GSO skbs landing in a tunnel
    
    commit 3d010c8031e39f5fa1e8b13ada77e0321091011f upstream.
    
    When rx-udp-gro-forwarding is enabled UDP packets might be GROed when
    being forwarded. If such packets might land in a tunnel this can cause
    various issues and udp_gro_receive makes sure this isn't the case by
    looking for a matching socket. This is performed in
    udp4/6_gro_lookup_skb but only in the current netns. This is an issue
    with tunneled packets when the endpoint is in another netns. In such
    cases the packets will be GROed at the UDP level, which leads to various
    issues later on. The same thing can happen with rx-gro-list.
    
    We saw this with geneve packets being GROed at the UDP level. In such
    case gso_size is set; later the packet goes through the geneve rx path,
    the geneve header is pulled, the offsets are adjusted and frag_list skbs
    are not adjusted with regard to geneve. When those skbs hit
    skb_fragment, it will misbehave. Different outcomes are possible
    depending on what the GROed skbs look like; from corrupted packets to
    kernel crashes.
    
    One example is a BUG_ON[1] triggered in skb_segment while processing the
    frag_list. Because gso_size is wrong (geneve header was pulled)
    skb_segment thinks there is "geneve header size" of data in frag_list,
    although it's in fact the next packet. The BUG_ON itself has nothing to
    do with the issue. This is only one of the potential issues.
    
    Looking up a matching socket in udp_gro_receive is fragile: the
    lookup could be extended to all netns (leaving performance aside)
    but nothing prevents those packets from being modified in between and we
    could still not find a matching socket. It's OK to keep the current
    logic there as it should cover most cases but we also need to make sure
    we handle tunnel packets being GROed too early.
    
    This is done by extending the checks in udp_unexpected_gso: GSO packets
    lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must
    be segmented.
    
    [1] kernel BUG at net/core/skbuff.c:4408!
        RIP: 0010:skb_segment+0xd2a/0xf70
        __udp_gso_segment+0xaa/0x560
    
    Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
    Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
    Signed-off-by: Antoine Tenart <[email protected]>
    Reviewed-by: Willem de Bruijn <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
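
The extra check described in the commit above can be sketched as follows. This covers only the added condition, not the other checks in udp_unexpected_gso; the struct, the function name and the bit values are assumptions made for the illustration.

```c
#include <stdbool.h>

/* Illustrative bit values for this sketch only. */
#define SKETCH_GSO_UDP_TUNNEL      (1u << 0)
#define SKETCH_GSO_UDP_TUNNEL_CSUM (1u << 1)
#define SKETCH_GSO_UDP_L4          (1u << 2)

/* Minimal stand-in for the skb fields the check looks at. */
struct sketch_skb {
    bool is_gso;
    unsigned int gso_type;
};

/*
 * Returns true when a GSO packet that lands on a tunnel socket lacks
 * both tunnel bits, meaning it was aggregated at the plain UDP level
 * and has to be segmented again before tunnel processing.
 */
bool sketch_must_segment(const struct sketch_skb *skb, bool sk_is_tunnel)
{
    if (!skb->is_gso)
        return false;
    if (sk_is_tunnel &&
        !(skb->gso_type &
          (SKETCH_GSO_UDP_TUNNEL | SKETCH_GSO_UDP_TUNNEL_CSUM)))
        return true;
    return false;
}
```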

udp: do not transition UDP GRO fraglist partial checksums to unnecessary [+ + +]

Author: Antoine Tenart <[email protected]>
Date:   Tue Mar 26 12:34:00 2024 +0100

    udp: do not transition UDP GRO fraglist partial checksums to unnecessary
    
    commit f0b8c30345565344df2e33a8417a27503589247d upstream.
    
    UDP GRO validates checksums and in udp4/6_gro_complete fraglist packets
    are converted to CHECKSUM_UNNECESSARY to avoid later checks. However
    this is an issue for CHECKSUM_PARTIAL packets as they can be looped in
    an egress path and then their partial checksums are not fixed.
    
    Different issues can be observed, from invalid checksum on packets to
    traces like:
    
      gen01: hw csum failure
      skb len=3008 headroom=160 headlen=1376 tailroom=0
      mac=(106,14) net=(120,40) trans=160
      shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
      csum(0xffff232e ip_summed=2 complete_sw=0 valid=0 level=0)
      hash(0x77e3d716 sw=1 l4=1) proto=0x86dd pkttype=0 iif=12
      ...
    
    Fix this by only converting CHECKSUM_NONE packets to
    CHECKSUM_UNNECESSARY by reusing __skb_incr_checksum_unnecessary. All
    other checksum types are kept as-is, including CHECKSUM_COMPLETE as
    fraglist packets being segmented back would have their skb->csum valid.
    
    Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
    Signed-off-by: Antoine Tenart <[email protected]>
    Reviewed-by: Willem de Bruijn <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
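
The state transition described in the commit above can be sketched as a small function. The enum and function names are hypothetical, the numeric values are assumed to mirror the kernel's CHECKSUM_* constants, and the csum_level handling of the real helper is left out.

```c
/* Assumed to mirror the kernel's CHECKSUM_* values. */
enum sketch_ip_summed {
    SKETCH_CHECKSUM_NONE = 0,
    SKETCH_CHECKSUM_UNNECESSARY = 1,
    SKETCH_CHECKSUM_COMPLETE = 2,
    SKETCH_CHECKSUM_PARTIAL = 3
};

/*
 * After GRO has validated the checksum, only a packet that carried no
 * checksum information is upgraded; PARTIAL and COMPLETE keep their
 * state so that a later egress path can still fix up or reuse it.
 */
enum sketch_ip_summed sketch_gro_complete_csum(enum sketch_ip_summed state)
{
    if (state == SKETCH_CHECKSUM_NONE)
        return SKETCH_CHECKSUM_UNNECESSARY;
    return state;
}
```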

udp: prevent local UDP tunnel packets from being GROed [+ + +]

Author: Antoine Tenart <[email protected]>
Date:   Tue Mar 26 12:34:01 2024 +0100

    udp: prevent local UDP tunnel packets from being GROed
    
    commit 64235eabc4b5b18c507c08a1f16cdac6c5661220 upstream.
    
    GRO has a fundamental issue with UDP tunnel packets as it can't detect
    those in a foolproof way and GRO could happen before they reach the
    tunnel endpoint. Previous commits have fixed issues when UDP tunnel
    packets come from a remote host, but if those packets are issued locally
    they could run into checksum issues.
    
    If the inner packet has a partial checksum the information will be lost
    in the GRO logic, either in udp4/6_gro_complete or in
    udp_gro_complete_segment and packets will have an invalid checksum when
    leaving the host.
    
    Prevent local UDP tunnel packets from ever being GROed at the outer UDP
    level.
    
    Due to skb->encapsulation being wrongly used in some drivers this is
    actually only preventing UDP tunnel packets with a partial checksum to
    be GROed (see iptunnel_handle_offloads) but those were also the packets
    triggering issues so in practice this should be sufficient.
    
    Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
    Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
    Suggested-by: Paolo Abeni <[email protected]>
    Signed-off-by: Antoine Tenart <[email protected]>
    Reviewed-by: Willem de Bruijn <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: cdc-wdm: close race between read and workqueue [+ + +]

Author: Oliver Neukum <[email protected]>
Date:   Thu Mar 14 12:50:48 2024 +0100

    usb: cdc-wdm: close race between read and workqueue
    
    commit 339f83612f3a569b194680768b22bf113c26a29d upstream.
    
    wdm_read() cannot race with itself. However, in
    service_outstanding_interrupt() it can race with the
    workqueue, which can be triggered by error handling.
    
    Hence we need to make sure that the WDM_RESPONDING
    flag is not only set but also tested.
    
    Fixes: afba937e540c9 ("USB: CDC WDM driver")
    Cc: stable <[email protected]>
    Signed-off-by: Oliver Neukum <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
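
The difference between setting a flag and testing it while setting it can be sketched with a C11 atomic flag. The struct and function names are hypothetical; the flag only stands in for the WDM_RESPONDING bit and the counter for a submitted response URB.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_wdm {
    atomic_flag responding; /* stands in for the WDM_RESPONDING bit */
    int submitted;          /* counts simulated response submissions */
};

/*
 * Atomically test and set the flag. Only the caller that finds it
 * clear submits; a racing caller sees it set and backs off.
 */
bool sketch_submit_response(struct sketch_wdm *dev)
{
    if (atomic_flag_test_and_set(&dev->responding))
        return false;
    dev->submitted++;
    return true;
}

/* Completion path: clear the flag so that the next response can start. */
void sketch_response_done(struct sketch_wdm *dev)
{
    atomic_flag_clear(&dev->responding);
}
```

With a plain "set" both the read path and the workqueue could submit at the same time; with test-and-set at most one of them proceeds.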

USB: core: Add hub_get() and hub_put() routines [+ + +]

Author: Alan Stern <[email protected]>
Date:   Fri Mar 15 13:04:50 2024 -0400

    USB: core: Add hub_get() and hub_put() routines
    
    commit ee113b860aa169e9a4d2c167c95d0f1961c6e1b8 upstream.
    
    Create hub_get() and hub_put() routines to encapsulate the kref_get()
    and kref_put() calls in hub.c.  The new routines will be used by the
    next patch in this series.
    
    Signed-off-by: Alan Stern <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Cc: stable <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
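
The idea of wrapping reference counting in a get/put pair can be sketched in user space as below. This is an illustration with hypothetical names built on C11 atomics, not the kref-based code in hub.c; the released field stands in for the release callback.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_hub {
    atomic_int refcount;
    bool released; /* set where a release callback would free the hub */
};

/* A new object starts with one reference held by its creator. */
void sketch_hub_init(struct sketch_hub *hub)
{
    atomic_init(&hub->refcount, 1);
    hub->released = false;
}

/* Take an additional reference. */
void sketch_hub_get(struct sketch_hub *hub)
{
    atomic_fetch_add(&hub->refcount, 1);
}

/* Drop a reference; the last put triggers the release. */
void sketch_hub_put(struct sketch_hub *hub)
{
    if (atomic_fetch_sub(&hub->refcount, 1) == 1)
        hub->released = true;
}
```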

USB: core: Fix deadlock in usb_deauthorize_interface() [+ + +]

Author: Alan Stern <[email protected]>
Date:   Tue Mar 12 11:48:23 2024 -0400

    USB: core: Fix deadlock in usb_deauthorize_interface()
    
    commit 80ba43e9f799cbdd83842fc27db667289b3150f5 upstream.
    
    Among the attribute file callback routines in
    drivers/usb/core/sysfs.c, the interface_authorized_store() function is
    the only one which acquires a device lock on an ancestor device: It
    calls usb_deauthorize_interface(), which locks the interface's parent
    USB device.
    
    This will lead to deadlock if another process already owns that lock
    and tries to remove the interface, whether through a configuration
    change or because the device has been disconnected.  As part of the
    removal procedure, device_del() waits for all ongoing sysfs attribute
    callbacks to complete.  But usb_deauthorize_interface() can't complete
    until the device lock has been released, and the lock won't be
    released until the removal has finished.
    
    The mechanism provided by sysfs to prevent this kind of deadlock is
    to use the sysfs_break_active_protection() function, which tells sysfs
    not to wait for the attribute callback.
    
    Reported-and-tested by: Yue Sun <[email protected]>
    Reported by: xingwei lee <[email protected]>
    
    Signed-off-by: Alan Stern <[email protected]>
    Link: https://lore.kernel.org/linux-usb/CAEkJfYO6jRVC8Tfrd_R=cjO0hguhrV31fDPrLrNOOHocDkPoAA@mail.gmail.com/#r
    Fixes: 310d2b4124c0 ("usb: interface authorization: SysFS part of USB interface authorization")
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: dwc2: gadget: Fix exiting from clock gating [+ + +]

Author: Minas Harutyunyan <[email protected]>
Date:   Wed Mar 13 09:22:01 2024 +0000

    usb: dwc2: gadget: Fix exiting from clock gating
    
    commit 31f42da31417bec88158f3cf62d19db836217f1e upstream.
    
    Exit clock gating mode on the USB Reset Detect interrupt if the core
    is in clock gating mode.
    Add a new condition to check whether the core is in clock gating mode.
    
    Fixes: 9b4965d77e11 ("usb: dwc2: Add exit clock gating from session request interrupt")
    Fixes: 5d240efddc7f ("usb: dwc2: Add exit clock gating from wakeup interrupt")
    Fixes: 16c729f90bdf ("usb: dwc2: Allow exit clock gating in urb enqueue")
    Fixes: 401411bbc4e6 ("usb: dwc2: Add exit clock gating before removing driver")
    CC: [email protected]
    Signed-off-by: Minas Harutyunyan <[email protected]>
    Link: https://lore.kernel.org/r/cbcc2ccd37e89e339130797ed68ae4597db773ac.1708938774.git.Minas.Harutyunyan@synopsys.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: dwc2: gadget: LPM flow fix [+ + +]

Author: Minas Harutyunyan <[email protected]>
Date:   Wed Mar 13 09:22:13 2024 +0000

    usb: dwc2: gadget: LPM flow fix
    
    commit 5d69a3b54e5a630c90d82a4c2bdce3d53dc78710 upstream.
    
    Added functionality to exit from the L1 state by device initiation
    using remote wakeup signaling when the function driver queues a
    request while the core is in the L1 state.
    
    Fixes: 273d576c4d41 ("usb: dwc2: gadget: Add functionality to exit from LPM L1 state")
    Fixes: 88b02f2cb1e1 ("usb: dwc2: Add core state checking")
    CC: [email protected]
    Signed-off-by: Minas Harutyunyan <[email protected]>
    Link: https://lore.kernel.org/r/b4d9de5382375dddbf7ef6049d9a82066ad87d5d.1710166393.git.Minas.Harutyunyan@synopsys.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: dwc2: host: Fix hibernation flow [+ + +]

Author: Minas Harutyunyan <[email protected]>
Date:   Wed Mar 13 09:21:11 2024 +0000

    usb: dwc2: host: Fix hibernation flow
    
    commit 3c7b9856a82227db01a20171d2e24c7ce305d59b upstream.
    
    Added backup/restore of the registers HFLBADDR, HCCHARi, HCSPLTi,
    HCTSIZi, HCDMAi and HCDMABi.
    
    Fixes: 58e52ff6a6c3 ("usb: dwc2: Move register save and restore functions")
    Fixes: d17ee77b3044 ("usb: dwc2: add controller hibernation support")
    CC: [email protected]
    Signed-off-by: Minas Harutyunyan <[email protected]>
    Link: https://lore.kernel.org/r/c2d10ee6098b9b009a8e94191e046004747d3bdd.1708945444.git.Minas.Harutyunyan@synopsys.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: dwc2: host: Fix ISOC flow in DDMA mode [+ + +]

Author: Minas Harutyunyan <[email protected]>
Date:   Wed Mar 13 09:21:32 2024 +0000

    usb: dwc2: host: Fix ISOC flow in DDMA mode
    
    commit b258e42688501cadb1a6dd658d6f015df9f32d8f upstream.
    
    Fixed the ISOC completion flow in DDMA mode. Added the actual length
    value of the isoc descriptor and the update of the urb's start_frame
    value.
    Fixed the initialization flow of the ISOC DMA descriptors.
    
    Fixes: 56f5b1cff22a ("staging: Core files for the DWC2 driver")
    Fixes: 20f2eb9c4cf8 ("staging: dwc2: add microframe scheduler from downstream Pi kernel")
    Fixes: c17b337c1ea4 ("usb: dwc2: host: program descriptor for next frame")
    Fixes: dc4c76e7b22c ("staging: HCD descriptor DMA support for the DWC2 driver")
    Fixes: 762d3a1a9cd7 ("usb: dwc2: host: process all completed urbs")
    CC: [email protected]
    Signed-off-by: Minas Harutyunyan <[email protected]>
    Link: https://lore.kernel.org/r/a8b1e1711cc6cabfb45d92ede12e35445c66f06c.1708944698.git.Minas.Harutyunyan@synopsys.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: dwc2: host: Fix remote wakeup from hibernation [+ + +]

Author: Minas Harutyunyan <[email protected]>
Date:   Wed Mar 13 09:21:21 2024 +0000

    usb: dwc2: host: Fix remote wakeup from hibernation
    
    commit bae2bc73a59c200db53b6c15fb26bb758e2c6108 upstream.
    
    Starting from core v4.30a, the order of programming
    GPWRDN_PMUACTV to 0 has changed when exiting from hibernation on
    remote wakeup signaling from the device.
    
    Fixes: c5c403dc4336 ("usb: dwc2: Add host/device hibernation functions")
    CC: [email protected]
    Signed-off-by: Minas Harutyunyan <[email protected]>
    Link: https://lore.kernel.org/r/99385ec55ce73445b6fbd0f471c9bd40eb1c9b9e.1708939799.git.Minas.Harutyunyan@synopsys.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: ncm: Fix handling of zero block length packets [+ + +]

Author: Krishna Kurapati <[email protected]>
Date:   Wed Feb 28 17:24:41 2024 +0530

    usb: gadget: ncm: Fix handling of zero block length packets
    
    commit f90ce1e04cbcc76639d6cba0fdbd820cd80b3c70 upstream.
    
    While connecting to a Linux host with CDC_NCM_NTB_DEF_SIZE_TX
    set to 65536, it has been observed that we receive short packets,
    which sometimes arrive at intervals of 5-10 seconds and have a block
    length of zero but still contain 1-2 valid datagrams.
    
    According to the NCM spec:
    
    "If wBlockLength = 0x0000, the block is terminated by a
    short packet. In this case, the USB transfer must still
    be shorter than dwNtbInMaxSize or dwNtbOutMaxSize. If
    exactly dwNtbInMaxSize or dwNtbOutMaxSize bytes are sent,
    and the size is a multiple of wMaxPacketSize for the
    given pipe, then no ZLP shall be sent.
    
    wBlockLength= 0x0000 must be used with extreme care, because
    of the possibility that the host and device may get out of
    sync, and because of test issues.
    
    wBlockLength = 0x0000 allows the sender to reduce latency by
    starting to send a very large NTB, and then shortening it when
    the sender discovers that there's not sufficient data to justify
    sending a large NTB"
    
    However, there is a potential issue with the current implementation,
    as it checks for the occurrence of multiple NTBs in a single
    giveback by verifying if the leftover bytes to be processed are zero
    or not. If the block length reads zero, we would process the same
    NTB infinitely because the leftover bytes are never zero and it leads
    to a crash. Fix this by bailing out if block length reads zero.
    
    Cc: [email protected]
    Fixes: 427694cfaafa ("usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call")
    Signed-off-by: Krishna Kurapati <[email protected]>
    Reviewed-by: Maciej Żenczykowski <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
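
The loop problem described in the commit above can be sketched with a simplified parser. The layout here is an assumption made for the illustration (each block starts with a 2-byte little-endian length that covers the whole block); it is not the real NTB header format, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Counts the blocks packed into one transfer. A length field of zero
 * means the block runs to the end of the transfer, so it is counted
 * once and parsing stops. Without that bail-out the cursor would never
 * advance and the loop would process the same block forever.
 * Returns -1 when a length field is inconsistent with the buffer.
 */
int sketch_count_blocks(const uint8_t *buf, size_t len)
{
    size_t remaining = len;
    int blocks = 0;

    while (remaining >= 2) {
        size_t block_len = (size_t)buf[0] | ((size_t)buf[1] << 8);

        if (block_len == 0) {
            blocks++;   /* last block, terminated by a short packet */
            break;      /* bail out instead of looping */
        }
        if (block_len < 2 || block_len > remaining)
            return -1;

        blocks++;
        buf += block_len;
        remaining -= block_len;
    }
    return blocks;
}
```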

usb: gadget: tegra-xudc: Fix USB3 PHY retrieval logic [+ + +]

Author: Wayne Chang <[email protected]>
Date:   Thu Mar 7 11:03:28 2024 +0800

    usb: gadget: tegra-xudc: Fix USB3 PHY retrieval logic
    
    [ Upstream commit 84fa943d93c31ee978355e6c6c69592dae3c9f59 ]
    
    This commit resolves an issue in the tegra-xudc USB gadget driver that
    incorrectly fetched USB3 PHY instances. The problem stemmed from the
    assumption of a one-to-one correspondence between USB2 and USB3 PHY
    names and their association with physical USB ports in the device tree.
    
    Previously, the driver associated USB3 PHY names directly with the USB3
    instance number, leading to mismatches when mapping the physical USB
    ports. For instance, if using USB3-1 PHY, the driver expects the
    corresponding PHY name as 'usb3-1'. However, the physical USB ports in
    the device tree were designated as USB2-0 and USB3-0 as we only have
    one device controller, causing a misalignment.
    
    This commit rectifies the issue by adjusting the PHY naming logic.
    Now, the driver correctly correlates the USB2 and USB3 PHY instances,
    allowing the USB2-0 and USB3-1 PHYs to form a physical USB port pair
    while accurately reflecting their configuration in the device tree by
    naming them USB2-0 and USB3-0, respectively.
    
    The change ensures that the PHY and PHY names align appropriately,
    resolving the mismatch between physical USB ports and their associated
    names in the device tree.
    
    Fixes: b4e19931c98a ("usb: gadget: tegra-xudc: Support multiple device modes")
    Cc: [email protected]
    Signed-off-by: Wayne Chang <[email protected]>
    Reviewed-by: Jon Hunter <[email protected]>
    Tested-by: Jon Hunter <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

usb: port: Don't try to peer unused USB ports based on location [+ + +]

Author: Mathias Nyman <[email protected]>
Date:   Fri Feb 23 01:33:43 2024 +0200

    usb: port: Don't try to peer unused USB ports based on location
    
    commit 69c63350e573367f9c8594162288cffa8a26d0d1 upstream.
    
    Unused USB ports may have bogus location data in ACPI PLD tables.
    This causes port peering failures as the locations of these unused
    USB2 and USB3 ports may match.
    
    Due to these failures the driver prints a
    "usb: port power management may be unreliable" warning, and
    unnecessarily blocks port power off during runtime suspend.
    
    This was debugged on a couple of DELL systems where the unused ports
    all returned zeroes in their location data.
    Similar bug reports exist for other systems.
    
    Don't try to peer or match ports that have connect type set to
    USB_PORT_NOT_USED.
    
    Fixes: 3bfd659baec8 ("usb: find internal hub tier mismatch via acpi")
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218465
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218486
    Tested-by: Paul Menzel <[email protected]>
    Link: https://lore.kernel.org/linux-usb/[email protected]
    Cc: [email protected] # v3.16+
    Signed-off-by: Mathias Nyman <[email protected]>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218490
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
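
The matching rule described in the commit above can be sketched as follows. The enum, struct and function are hypothetical stand-ins for the port data, and comparing a single location value is a simplification of the real peering logic.

```c
#include <stdbool.h>

/* Hypothetical connect types; only NOT_USED matters for this sketch. */
enum sketch_connect_type {
    SKETCH_PORT_CONNECT_TYPE_UNKNOWN,
    SKETCH_PORT_HOT_PLUG,
    SKETCH_PORT_HARD_WIRED,
    SKETCH_PORT_NOT_USED
};

struct sketch_port {
    enum sketch_connect_type connect_type;
    unsigned int location; /* location value taken from firmware tables */
};

/*
 * Two ports are peer candidates when their locations match, unless
 * either of them is marked unused, in which case its location data is
 * not trusted at all.
 */
bool sketch_ports_match(const struct sketch_port *a,
                        const struct sketch_port *b)
{
    if (a->connect_type == SKETCH_PORT_NOT_USED ||
        b->connect_type == SKETCH_PORT_NOT_USED)
        return false;
    return a->location == b->location;
}
```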

USB: serial: add device ID for VeriFone adapter [+ + +]

Author: Cameron Williams <[email protected]>
Date:   Tue Feb 13 21:53:29 2024 +0000

    USB: serial: add device ID for VeriFone adapter
    
    [ Upstream commit cda704809797a8a86284f9df3eef5e62ec8a3175 ]
    
    Add device ID for a (probably fake) CP2102 UART device.
    
    lsusb -v output:
    
    Device Descriptor:
      bLength                18
      bDescriptorType         1
      bcdUSB               1.10
      bDeviceClass            0 [unknown]
      bDeviceSubClass         0 [unknown]
      bDeviceProtocol         0
      bMaxPacketSize0        64
      idVendor           0x11ca VeriFone Inc
      idProduct          0x0212 Verifone USB to Printer
      bcdDevice            1.00
      iManufacturer           1 Silicon Labs
      iProduct                2 Verifone USB to Printer
      iSerial                 3 0001
      bNumConfigurations      1
      Configuration Descriptor:
        bLength                 9
        bDescriptorType         2
        wTotalLength       0x0020
        bNumInterfaces          1
        bConfigurationValue     1
        iConfiguration          0
        bmAttributes         0x80
          (Bus Powered)
        MaxPower              100mA
        Interface Descriptor:
          bLength                 9
          bDescriptorType         4
          bInterfaceNumber        0
          bAlternateSetting       0
          bNumEndpoints           2
          bInterfaceClass       255 Vendor Specific Class
          bInterfaceSubClass      0 [unknown]
          bInterfaceProtocol      0
          iInterface              2 Verifone USB to Printer
          Endpoint Descriptor:
            bLength                 7
            bDescriptorType         5
            bEndpointAddress     0x81  EP 1 IN
            bmAttributes            2
              Transfer Type            Bulk
              Synch Type               None
              Usage Type               Data
            wMaxPacketSize     0x0040  1x 64 bytes
            bInterval               0
          Endpoint Descriptor:
            bLength                 7
            bDescriptorType         5
            bEndpointAddress     0x01  EP 1 OUT
            bmAttributes            2
              Transfer Type            Bulk
              Synch Type               None
              Usage Type               Data
            wMaxPacketSize     0x0040  1x 64 bytes
            bInterval               0
    Device Status:     0x0000
      (Bus Powered)
    
    Signed-off-by: Cameron Williams <[email protected]>
    Cc: [email protected]
    Signed-off-by: Johan Hovold <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

USB: serial: cp210x: add ID for MGP Instruments PDS100 [+ + +]

Author: Christian Häggström <[email protected]>
Date:   Wed Feb 14 11:47:29 2024 +0100

    USB: serial: cp210x: add ID for MGP Instruments PDS100
    
    [ Upstream commit a0d9d868491a362d421521499d98308c8e3a0398 ]
    
    The radiation meter has the text MGP Instruments PDS-100G or PDS-100GN
    produced by Mirion Technologies. Tested by forcing the driver
    association with
    
      echo 10c4 863c > /sys/bus/usb-serial/drivers/cp210x/new_id
    
    and then setting the serial port in 115200 8N1 mode. The device
    announces ID_USB_VENDOR_ENC=Silicon\x20Labs and ID_USB_MODEL_ENC=PDS100
    
    Signed-off-by: Christian Häggström <[email protected]>
    Cc: [email protected]
    Signed-off-by: Johan Hovold <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M [+ + +]

Author: Toru Katagiri <[email protected]>
Date:   Tue Mar 5 08:46:14 2024 +0900

    USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M
    
    [ Upstream commit b1a8da9ff1395c4879b4bd41e55733d944f3d613 ]
    
    TDK NC0110013M and MM0110113M have custom USB IDs for CP210x,
    so we need to add them to the driver.
    
    Signed-off-by: Toru Katagiri <[email protected]>
    Cc: [email protected]
    Signed-off-by: Johan Hovold <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB [+ + +]

Author: Daniel Vogelbacher <[email protected]>
Date:   Sun Feb 11 15:42:46 2024 +0100

    USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB
    
    [ Upstream commit 3fb7bc4f3a98c48981318b87cf553c5f115fd5ca ]
    
    The GMC IR-USB adapter cable utilizes a FTDI FT232R chip.
    
    Add VID/PID for this adapter so it can be used as serial device via
    ftdi_sio.
    
    Signed-off-by: Daniel Vogelbacher <[email protected]>
    Cc: [email protected]
    Signed-off-by: Johan Hovold <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

USB: serial: option: add MeiG Smart SLM320 product [+ + +]

Author: Aurélien Jacobs <[email protected]>
Date:   Wed Jan 31 18:49:17 2024 +0100

    USB: serial: option: add MeiG Smart SLM320 product
    
    [ Upstream commit 46809c51565b83881aede6cdf3b0d25254966a41 ]
    
    Update the USB serial option driver to support MeiG Smart SLM320.
    
    ID 2dee:4d41 UNISOC UNISOC-8910
    
    T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 9 Spd=480 MxCh= 0
    D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
    P: Vendor=2dee ProdID=4d41 Rev=00.00
    S: Manufacturer=UNISOC
    S: Product=UNISOC-8910
    C: #Ifs= 8 Cfg#= 1 Atr=e0 MxPwr=400mA
    I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I: If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I: If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I: If#= 7 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=08(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    
    Successfully tested a PPP LTE connection using If#= 0.
    The purpose of the other serial interfaces is unclear.
    
    Signed-off-by: Aurélien Jacobs <[email protected]>
    Cc: [email protected]
    Signed-off-by: Johan Hovold <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

usb: typec: ucsi: Ack unsupported commands [+ + +]

Author: Christian A. Ehrhardt <[email protected]>
Date:   Wed Mar 20 08:39:24 2024 +0100

    usb: typec: ucsi: Ack unsupported commands
    
    commit 6b5c85ddeea77d18c4b69e3bda60e9374a20c304 upstream.
    
    If a command completes the OPM must send an ack. This applies
    to unsupported commands, too.
    
    Send the required ACK for unsupported commands.
    
    Signed-off-by: Christian A. Ehrhardt <[email protected]>
    Cc: stable <[email protected]>
    Reviewed-by: Heikki Krogerus <[email protected]>
    Tested-by: Neil Armstrong <[email protected]> # on SM8550-QRD
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: typec: ucsi: Clean up UCSI_CABLE_PROP macros [+ + +]

Author: Jameson Thies <[email protected]>
Date:   Tue Mar 5 02:58:01 2024 +0000

    usb: typec: ucsi: Clean up UCSI_CABLE_PROP macros
    
    [ Upstream commit 4d0a5a9915793377c0fe1a8d78de6bcd92cea963 ]
    
    Clean up UCSI_CABLE_PROP macros by fixing a bitmask shifting error for
    plug type and updating the modal support macro for consistent naming.
    
    Fixes: 3cf657f07918 ("usb: typec: ucsi: Remove all bit-fields")
    Cc: [email protected]
    Reviewed-by: Benson Leung <[email protected]>
    Reviewed-by: Prashant Malani <[email protected]>
    Reviewed-by: Dmitry Baryshkov <[email protected]>
    Signed-off-by: Jameson Thies <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset [+ + +]

Author: Christian A. Ehrhardt <[email protected]>
Date:   Wed Mar 20 08:39:26 2024 +0100

    usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
    
    commit 3de4f996a0b5412aa451729008130a488f71563e upstream.
    
    Check the UCSI_CCI_RESET_COMPLETE flag before starting
    another reset. Use a UCSI_SET_NOTIFICATION_ENABLE command to clear
    the flag if it is set.
    
    Signed-off-by: Christian A. Ehrhardt <[email protected]>
    Cc: stable <[email protected]>
    Reviewed-by: Heikki Krogerus <[email protected]>
    Tested-by: Neil Armstrong <[email protected]> # on SM8550-QRD
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: UAS: return ENODEV when submit urbs fail with device not attached [+ + +]

Author: Weitao Wang <[email protected]>
Date:   Thu Mar 7 02:08:14 2024 +0800

    USB: UAS: return ENODEV when submit urbs fail with device not attached
    
    [ Upstream commit cd5432c712351a3d5f82512908f5febfca946ca6 ]
    
    When entering hibernation with a udisk attached to the system, if the
    udisk is gone or fails to resume in the thaw phase of hibernation, its
    state will be set to NOTATTACHED. At this point, usb_hub_wq is already
    frozen and cannot handle the disconnect event. Next, in the poweroff
    phase of hibernation, a SYNCHRONIZE_CACHE SCSI command will be sent to
    this udisk when powering off the SCSI device, which causes
    uas_submit_urbs to be called to submit URBs for the sense/data/cmd
    pipes. However, submitting these URBs fails because the device is in
    the NOTATTACHED state. uas_submit_urbs then returns
    SCSI_MLQUEUE_DEVICE_BUSY to the caller. That makes the SCSI layer go
    into an ugly loop, and the system fails to enter hibernation.
    
    On the other hand, when we specially check for -ENODEV in function
    uas_queuecommand_lck, returning DID_ERROR to the SCSI layer causes the
    device poweroff to fail and the system to shut down instead of entering
    hibernation.
    
    To fix this issue, let uas_submit_urbs return the original generic error
    when submitting a URB fails. At the same time, we need to translate
    -ENODEV to DID_NOT_CONNECT for the SCSI layer.
    
    Suggested-by: Oliver Neukum <[email protected]>
    Cc: [email protected]
    Signed-off-by: Weitao Wang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
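
    The error translation described above can be sketched in userspace C.
    This is a simplified illustration, not the patch itself: struct
    sketch_cmd and sketch_queue_result() are hypothetical names, and the
    three constants are copied from the kernel's SCSI headers so that the
    snippet builds on its own.

```c
#include <errno.h>

/* Values mirrored from the kernel's SCSI headers so the sketch builds in userspace. */
#define DID_NO_CONNECT           0x01
#define DID_ERROR                0x07
#define SCSI_MLQUEUE_DEVICE_BUSY 0x1056

/* Hypothetical stand-in for a SCSI command: a result word and a completion flag. */
struct sketch_cmd {
	unsigned int result;	/* the host byte lives in bits 16..23 */
	int completed;		/* set once the command has been finished */
};

/*
 * Hypothetical helper: decide what the queuecommand path hands back to the
 * SCSI layer for a given URB submission error.
 *
 * Returns 0 when the command was accepted or completed, or
 * SCSI_MLQUEUE_DEVICE_BUSY to ask the midlayer to try again later.
 */
static int sketch_queue_result(struct sketch_cmd *cmd, int submit_err)
{
	if (submit_err == 0)
		return 0;

	if (submit_err == -ENODEV) {
		/* The device is gone: finish the command instead of requeueing it. */
		cmd->result = DID_NO_CONNECT << 16;
		cmd->completed = 1;
		return 0;
	}

	/* Any other failure is treated as transient in this sketch. */
	return SCSI_MLQUEUE_DEVICE_BUSY;
}
```

    Completing the command with a "no connect" host byte is what lets the
    midlayer stop retrying, whereas a busy return value would be requeued.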

usb: udc: remove warning when queue disabled ep [+ + +]

Author: yuan linyu <[email protected]>
Date:   Fri Mar 15 10:01:44 2024 +0800

    usb: udc: remove warning when queue disabled ep
    
    commit 2a587a035214fa1b5ef598aea0b81848c5b72e5e upstream.
    
    It is possible to trigger the warning below from the mass storage function:
    
    WARNING: CPU: 6 PID: 3839 at drivers/usb/gadget/udc/core.c:294 usb_ep_queue+0x7c/0x104
    pc : usb_ep_queue+0x7c/0x104
    lr : fsg_main_thread+0x494/0x1b3c
    
    The root cause is that the mass storage function tries to queue a request
    from its main thread while another thread may already have disabled the
    endpoint as part of disabling the function.
    
    As there is no functional failure in the driver, change WARN_ON_ONCE() in
    usb_ep_queue() to pr_debug() rather than spend effort on fixing the warning.
    
    Suggested-by: Alan Stern <[email protected]>
    Cc: [email protected]
    Signed-off-by: yuan linyu <[email protected]>
    Reviewed-by: Alan Stern <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command [+ + +]

Author: Alan Stern <[email protected]>
Date:   Thu Feb 29 14:30:06 2024 -0500

    USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
    
    commit 014bcf41d946b36a8f0b8e9b5d9529efbb822f49 upstream.
    
    The isd200 sub-driver in usb-storage uses the HEADS and SECTORS values
    in the ATA ID information to calculate cylinder and head values when
    creating a CDB for READ or WRITE commands.  The calculation involves
    division and modulus operations, which will cause a crash if either of
    these values is 0.  While this never happens with a genuine device, it
    could happen with a flawed or subversive emulation, as reported by the
    syzbot fuzzer.
    
    Protect against this possibility by refusing to bind to the device if
    either the ATA_ID_HEADS or ATA_ID_SECTORS value in the device's ID
    information is 0.  This requires isd200_Initialization() to return a
    negative error code when initialization fails; currently it always
    returns 0 (even when there is an error).
    
    Signed-off-by: Alan Stern <[email protected]>
    Reported-and-tested-by: [email protected]
    Link: https://lore.kernel.org/linux-usb/[email protected]/
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: [email protected]
    Reviewed-by: PrasannaKumar Muralidharan <[email protected]>
    Reviewed-by: Martin K. Petersen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
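
    To show why a zero HEADS or SECTORS value is fatal, here is a minimal
    sketch of an LBA-to-CHS conversion with the guard in front of the
    arithmetic. struct chs and lba_to_chs() are hypothetical names. The
    actual patch rejects such a device at bind time; the sketch places the
    check next to the division only to keep the example self-contained.

```c
#include <stdint.h>

/* Hypothetical result type for the sketch. */
struct chs {
	uint16_t cylinder;
	uint8_t head;
	uint8_t sector;
};

/*
 * Convert an LBA to cylinder/head/sector using the geometry reported by
 * the device. Returns 0 on success, or -1 if the geometry would lead to a
 * division or modulus by zero.
 */
static int lba_to_chs(uint32_t lba, uint16_t heads, uint16_t sectors,
		      struct chs *out)
{
	if (heads == 0 || sectors == 0)
		return -1;

	out->sector = (uint8_t)((lba % sectors) + 1);	/* sectors are 1-based */
	out->head = (uint8_t)((lba / sectors) % heads);
	out->cylinder = (uint16_t)(lba / ((uint32_t)sectors * heads));
	return 0;
}
```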

usb: xhci: Add error handling in xhci_map_urb_for_dma [+ + +]

Author: Prashanth K <[email protected]>
Date:   Thu Feb 29 16:14:38 2024 +0200

    usb: xhci: Add error handling in xhci_map_urb_for_dma
    
    [ Upstream commit be95cc6d71dfd0cba66e3621c65413321b398052 ]
    
    Currently xhci_map_urb_for_dma() creates a temporary buffer and copies
    the SG list to the new linear buffer. But if kzalloc_node() fails,
    the following sg_pcopy_to_buffer() can lead to a crash since it
    tries to memcpy to a NULL pointer.
    
    So return -ENOMEM if kzalloc_node() returns a NULL pointer.
    
    Cc: [email protected] # 5.11
    Fixes: 2017a1e58472 ("usb: xhci: Use temporary buffer to consolidate SG")
    Signed-off-by: Prashanth K <[email protected]>
    Signed-off-by: Mathias Nyman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
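
    The pattern of checking the allocation before copying scattered
    segments into one linear buffer can be sketched as follows. This is a
    userspace analogy under stated assumptions: struct seg and
    linearize_segments() are hypothetical, and calloc() stands in for
    kzalloc_node().

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one scatter-gather entry. */
struct seg {
	const void *data;
	size_t len;
};

/*
 * Copy all segments into one freshly allocated linear buffer.
 * Returns 0 and fills *out / *out_len on success, -ENOMEM if the
 * allocation fails. The caller frees *out.
 */
static int linearize_segments(const struct seg *segs, size_t nsegs,
			      void **out, size_t *out_len)
{
	size_t total = 0, off = 0;
	unsigned char *buf;

	for (size_t i = 0; i < nsegs; i++)
		total += segs[i].len;

	buf = calloc(1, total ? total : 1);
	if (!buf)
		return -ENOMEM;	/* bail out before any copy touches NULL */

	for (size_t i = 0; i < nsegs; i++) {
		memcpy(buf + off, segs[i].data, segs[i].len);
		off += segs[i].len;
	}

	*out = buf;
	*out_len = total;
	return 0;
}
```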

vboxsf: Avoid an spurious warning if load_nls_xxx() fails [+ + +]

Author: Christophe JAILLET <[email protected]>
Date:   Wed Nov 1 11:49:48 2023 +0100

    vboxsf: Avoid an spurious warning if load_nls_xxx() fails
    
    commit de3f64b738af57e2732b91a0774facc675b75b54 upstream.
    
    If a load_nls_xxx() function fails a few lines above, 'sbi->bdi_id' is
    still 0.
    So, in the error handling path, we will call ida_simple_remove(..., 0)
    on an ID that is not allocated yet.
    
    In order to prevent a spurious "ida_free called for id=0 which is not
    allocated." message, tweak the error handling path and add a new label.
    
    Fixes: 0fd169576648 ("fs: Add VirtualBox guest shared folder (vboxsf) support")
    Signed-off-by: Christophe JAILLET <[email protected]>
    Link: https://lore.kernel.org/r/d09eaaa4e2e08206c58a1a27ca9b3e81dc168773.1698835730.git.christophe.jaillet@wanadoo.fr
    Reviewed-by: Hans de Goede <[email protected]>
    Signed-off-by: Hans de Goede <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
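
    The idea of adding a label so that an early failure skips cleanup of a
    resource that was never acquired can be sketched generically. The
    struct, function and label names below are hypothetical and do not
    reproduce the vboxsf code.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping so the sketch can show which steps ran. */
struct setup_trace {
	bool nls_loaded;
	bool id_allocated;
	bool id_released;
	bool nls_released;
};

/*
 * Acquire two resources in order and unwind in reverse order on failure.
 * An early failure jumps past the release of the ID, which has not been
 * allocated at that point.
 */
static int setup(struct setup_trace *t, bool nls_fails, bool later_step_fails)
{
	int err;

	if (nls_fails) {
		err = -1;
		goto fail_unload;	/* no ID yet, so do not release one */
	}
	t->nls_loaded = true;

	t->id_allocated = true;

	if (later_step_fails) {
		err = -1;
		goto fail_release_id;
	}
	return 0;

fail_release_id:
	t->id_released = true;
fail_unload:
	if (t->nls_loaded)
		t->nls_released = true;
	return err;
}
```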

vfio/fsl-mc: Block calling interrupt handler without trigger [+ + +]

Author: Alex Williamson <[email protected]>
Date:   Fri Mar 29 16:59:42 2024 -0600

    vfio/fsl-mc: Block calling interrupt handler without trigger
    
    [ Upstream commit 7447d911af699a15f8d050dfcb7c680a86f87012 ]
    
    The eventfd_ctx trigger pointer of the vfio_fsl_mc_irq object is
    initially NULL and may become NULL if the user sets the trigger
    eventfd to -1.  The interrupt handler itself is guaranteed that
    trigger is always valid between request_irq() and free_irq(), but
    the loopback testing mechanisms to invoke the handler function
    need to test the trigger.  The triggering and setting ioctl paths
    both make use of igate and are therefore mutually exclusive.
    
    The vfio-fsl-mc driver does not make use of irqfds, nor does it
    support any sort of masking operations, therefore unlike vfio-pci
    and vfio-platform, the flow can remain essentially unchanged.
    
    Cc: Diana Craciun <[email protected]>
    Cc:  <[email protected]>
    Fixes: cc0ee20bd969 ("vfio/fsl-mc: trigger an interrupt via eventfd")
    Reviewed-by: Kevin Tian <[email protected]>
    Reviewed-by: Eric Auger <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alex Williamson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

vfio/pci: Create persistent INTx handler [+ + +]

Author: Alex Williamson <[email protected]>
Date:   Fri Mar 29 16:59:40 2024 -0600

    vfio/pci: Create persistent INTx handler
    
    [ Upstream commit 18c198c96a815c962adc2b9b77909eec0be7df4d ]
    
    A vulnerability exists where the eventfd for INTx signaling can be
    deconfigured, which unregisters the IRQ handler but still allows
    eventfds to be signaled with a NULL context through the SET_IRQS ioctl
    or through unmask irqfd if the device interrupt is pending.
    
    Ideally this could be solved with some additional locking; the igate
    mutex serializes the ioctl and config space accesses, and the interrupt
    handler is unregistered relative to the trigger, but the irqfd path
    runs asynchronous to those.  The igate mutex cannot be acquired from the
    atomic context of the eventfd wake function.  Disabling the irqfd
    relative to the eventfd registration is potentially incompatible with
    existing userspace.
    
    As a result, the solution implemented here moves configuration of the
    INTx interrupt handler to track the lifetime of the INTx context object
    and irq_type configuration, rather than registration of a particular
    trigger eventfd.  Synchronization is added between the ioctl path and
    eventfd_signal() wrapper such that the eventfd trigger can be
    dynamically updated relative to in-flight interrupts or irqfd callbacks.
    
    Cc:  <[email protected]>
    Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
    Reported-by: Reinette Chatre <[email protected]>
    Reviewed-by: Kevin Tian <[email protected]>
    Reviewed-by: Reinette Chatre <[email protected]>
    Reviewed-by: Eric Auger <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alex Williamson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

vfio/pci: Disable auto-enable of exclusive INTx IRQ [+ + +]

Author: Alex Williamson <[email protected]>
Date:   Fri Mar 29 16:59:37 2024 -0600

    vfio/pci: Disable auto-enable of exclusive INTx IRQ
    
    [ Upstream commit fe9a7082684eb059b925c535682e68c34d487d43 ]
    
    Currently for devices requiring masking at the irqchip for INTx, ie.
    devices without DisINTx support, the IRQ is enabled in request_irq()
    and subsequently disabled as necessary to align with the masked status
    flag.  This presents a window where the interrupt could fire between
    these events, resulting in the IRQ incrementing the disable depth twice.
    This would be unrecoverable for a user since the masked flag prevents
    nested enables through vfio.
    
    Instead, invert the logic using IRQF_NO_AUTOEN such that exclusive INTx
    is never auto-enabled, then unmask as required.
    
    Cc:  <[email protected]>
    Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
    Reviewed-by: Kevin Tian <[email protected]>
    Reviewed-by: Eric Auger <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alex Williamson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

vfio/pci: Lock external INTx masking ops [+ + +]

Author: Alex Williamson <[email protected]>
Date:   Fri Mar 29 16:59:38 2024 -0600

    vfio/pci: Lock external INTx masking ops
    
    [ Upstream commit 810cd4bb53456d0503cc4e7934e063835152c1b7 ]
    
    Mask operations through config space changes to DisINTx may race INTx
    configuration changes via ioctl.  Create wrappers that add locking for
    paths outside of the core interrupt code.
    
    In particular, irq_type is updated holding igate, therefore testing
    is_intx() requires holding igate.  For example clearing DisINTx from
    config space can otherwise race changes of the interrupt configuration.
    
    This aligns interfaces which may trigger the INTx eventfd into two
    camps, one side serialized by igate and the other only enabled while
    INTx is configured.  A subsequent patch introduces synchronization for
    the latter flows.
    
    Cc:  <[email protected]>
    Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
    Reported-by: Reinette Chatre <[email protected]>
    Reviewed-by: Kevin Tian <[email protected]>
    Reviewed-by: Reinette Chatre <[email protected]>
    Reviewed-by: Eric Auger <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alex Williamson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

vfio/platform: Create persistent IRQ handlers [+ + +]

Author: Alex Williamson <[email protected]>
Date:   Fri Mar 29 16:59:41 2024 -0600

    vfio/platform: Create persistent IRQ handlers
    
    [ Upstream commit 675daf435e9f8e5a5eab140a9864dfad6668b375 ]
    
    The vfio-platform SET_IRQS ioctl currently allows loopback triggering of
    an interrupt before a signaling eventfd has been configured by the user,
    which thereby allows a NULL pointer dereference.
    
    Rather than register the IRQ relative to a valid trigger, register all
    IRQs in a disabled state in the device open path.  This allows mask
    operations on the IRQ to nest within the overall enable state governed
    by a valid eventfd signal.  This decouples @masked, protected by the
    @locked spinlock from @trigger, protected via the @igate mutex.
    
    In doing so, it's guaranteed that changes to @trigger cannot race the
    IRQ handlers because the IRQ handler is synchronously disabled before
    modifying the trigger, and loopback triggering of the IRQ via ioctl is
    safe due to serialization with trigger changes via igate.
    
    For compatibility, request_irq() failures are maintained to be local to
    the SET_IRQS ioctl rather than a fatal error in the open device path.
    This allows, for example, a userspace driver with polling mode support
    to continue to work regardless of moving the request_irq() call site.
    This necessarily blocks all SET_IRQS access to the failed index.
    
    Cc: Eric Auger <[email protected]>
    Cc:  <[email protected]>
    Fixes: 57f972e2b341 ("vfio/platform: trigger an interrupt via eventfd")
    Reviewed-by: Kevin Tian <[email protected]>
    Reviewed-by: Eric Auger <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alex Williamson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

vfio/platform: Disable virqfds on cleanup [+ + +]

Author: Alex Williamson <[email protected]>
Date:   Fri Mar 8 16:05:26 2024 -0700

    vfio/platform: Disable virqfds on cleanup
    
    [ Upstream commit fcdc0d3d40bc26c105acf8467f7d9018970944ae ]
    
    irqfds for mask and unmask that are not specifically disabled by the
    user are leaked.  Remove any irqfds during cleanup.
    
    Cc: Eric Auger <[email protected]>
    Cc:  <[email protected]>
    Fixes: a7fa7c77cf15 ("vfio/platform: implement IRQ masking/unmasking via an eventfd")
    Reviewed-by: Kevin Tian <[email protected]>
    Reviewed-by: Eric Auger <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alex Williamson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

vfio: Introduce interface to flush virqfd inject workqueue [+ + +]

Author: Alex Williamson <[email protected]>
Date:   Fri Mar 29 16:59:39 2024 -0600

    vfio: Introduce interface to flush virqfd inject workqueue
    
    [ Upstream commit b620ecbd17a03cacd06f014a5d3f3a11285ce053 ]
    
    In order to synchronize changes that can affect the thread callback,
    introduce an interface to force a flush of the inject workqueue.  The
    irqfd pointer is only valid under spinlock, but the workqueue cannot
    be flushed under spinlock.  Therefore the flush work for the irqfd is
    queued under spinlock.  The vfio_irqfd_cleanup_wq workqueue is re-used
    for queuing this work such that flushing the workqueue is also ordered
    relative to shutdown.
    
    Reviewed-by: Kevin Tian <[email protected]>
    Reviewed-by: Reinette Chatre <[email protected]>
    Reviewed-by: Eric Auger <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alex Williamson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

vt: fix unicode buffer corruption when deleting characters [+ + +]

Author: Nicolas Pitre <[email protected]>
Date:   Thu Feb 29 17:15:27 2024 -0500

    vt: fix unicode buffer corruption when deleting characters
    
    commit 1581dafaf0d34bc9c428a794a22110d7046d186d upstream.
    
    This is the same issue that was fixed for the VGA text buffer in commit
    39cdb68c64d8 ("vt: fix memory overlapping when deleting chars in the
    buffer"). The cure is also the same, i.e. replace memcpy() with memmove()
    due to the overlapping buffers.
    
    Signed-off-by: Nicolas Pitre <[email protected]>
    Fixes: 81732c3b2fed ("tty vt: Fix line garbage in virtual console on command line edition")
    Cc: stable <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
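
    A minimal sketch of deleting characters from one line of a buffer of
    code points shows why memmove() is needed: the tail is shifted left
    within the same array, so source and destination overlap.
    delete_chars() is a hypothetical userspace helper, not the kernel
    function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Delete `nr` characters at column `x` from a line of `cols` code points,
 * shifting the tail left and blanking the freed cells at the end.
 */
static void delete_chars(uint32_t *line, size_t cols, size_t x, size_t nr)
{
	if (x >= cols)
		return;
	if (nr > cols - x)
		nr = cols - x;

	/* Source and destination overlap, so memcpy() would be undefined here. */
	memmove(&line[x], &line[x + nr], (cols - x - nr) * sizeof(*line));

	for (size_t i = cols - nr; i < cols; i++)
		line[i] = ' ';
}
```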

wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach [+ + +]

Author: Zheng Wang <[email protected]>
Date:   Sun Jan 7 08:25:04 2024 +0100

    wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach
    
    [ Upstream commit 0f7352557a35ab7888bc7831411ec8a3cbe20d78 ]
    
    This is the candidate patch for CVE-2023-47233:
    https://nvd.nist.gov/vuln/detail/CVE-2023-47233
    
    In the brcm80211 driver, the following call chain initializes a
    timeout worker:
    
    ->brcmf_usb_probe
      ->brcmf_usb_probe_cb
        ->brcmf_attach
          ->brcmf_bus_started
            ->brcmf_cfg80211_attach
              ->wl_init_priv
                ->brcmf_init_escan
                  ->INIT_WORK(&cfg->escan_timeout_work,
                      brcmf_cfg80211_escan_timeout_worker);
    
    If the USB device is disconnected by hotplug, brcmf_usb_disconnect is
    called to clean up. The call chain is:
    
    brcmf_usb_disconnect
      ->brcmf_usb_disconnect_cb
        ->brcmf_detach
          ->brcmf_cfg80211_detach
            ->kfree(cfg);
    
    The timeout worker may still be running at this point, which causes
    a use-after-free bug on cfg in brcmf_cfg80211_escan_timeout_worker.
    
    Fix it by deleting the timer and canceling the worker in
    brcmf_cfg80211_detach.
    
    Fixes: e756af5b30b0 ("brcmfmac: add e-scan support.")
    Signed-off-by: Zheng Wang <[email protected]>
    Cc: [email protected]
    [[email protected]: keep timer delete as is and cancel work just before free]
    Signed-off-by: Arend van Spriel <[email protected]>
    Signed-off-by: Kalle Valo <[email protected]>
    Link: https://msgid.link/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

wifi: iwlwifi: mvm: rfi: fix potential response leaks [+ + +]

Author: Johannes Berg <[email protected]>
Date:   Tue Mar 19 10:10:17 2024 +0200

    wifi: iwlwifi: mvm: rfi: fix potential response leaks
    
    [ Upstream commit 06a093807eb7b5c5b29b6cff49f8174a4e702341 ]
    
    If the rx payload length check fails, or if kmemdup() fails,
    we still need to free the command response. Fix that.
    
    Fixes: 21254908cbe9 ("iwlwifi: mvm: add RFI-M support")
    Co-authored-by: Anjaneyulu <[email protected]>
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Miri Korenblit <[email protected]>
    Link: https://msgid.link/20240319100755.db2fa0196aa7.I116293b132502ac68a65527330fa37799694b79c@changeid
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes [+ + +]

Author: Felix Fietkau <[email protected]>
Date:   Sat Mar 16 08:43:36 2024 +0100

    wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
    
    commit 4f2bdb3c5e3189297e156b3ff84b140423d64685 upstream.
    
    When moving a station out of a VLAN and deleting the VLAN afterwards, the
    fast_rx entry still holds a pointer to the VLAN's netdev, which can cause
    use-after-free bugs. Fix this by immediately calling ieee80211_check_fast_rx
    after the VLAN change.
    
    Cc: [email protected]
    Reported-by: [email protected]
    Signed-off-by: Felix Fietkau <[email protected]>
    Link: https://msgid.link/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

wireguard: netlink: access device through ctx instead of peer [+ + +]

Author: Jason A. Donenfeld <[email protected]>
Date:   Thu Mar 14 16:49:10 2024 -0600

    wireguard: netlink: access device through ctx instead of peer
    
    [ Upstream commit 71cbd32e3db82ea4a74e3ef9aeeaa6971969c86f ]
    
    The previous commit fixed a bug that led to a NULL peer->device being
    dereferenced. It's actually easier and faster performance-wise to
    instead get the device from ctx->wg. This semantically makes more sense
    too, since ctx->wg->peer_allowedips.seq is compared with
    ctx->allowedips_seq, basing them both in ctx. This also acts as a
    defence in depth provision against freed peers.
    
    Cc: [email protected]
    Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
    Signed-off-by: Jason A. Donenfeld <[email protected]>
    Reviewed-by: Jiri Pirko <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

wireguard: netlink: check for dangling peer via is_dead instead of empty list [+ + +]

Author: Jason A. Donenfeld <[email protected]>
Date:   Thu Mar 14 16:49:09 2024 -0600

    wireguard: netlink: check for dangling peer via is_dead instead of empty list
    
    [ Upstream commit 55b6c738673871c9b0edae05d0c97995c1ff08c4 ]
    
    If all peers are removed via wg_peer_remove_all(), rather than setting
    peer_list to empty, the peer is added to a temporary list with a head on
    the stack of wg_peer_remove_all(). If a netlink dump is resumed and the
    cursored peer is one that has been removed via wg_peer_remove_all(), it
    will iterate from that peer and then attempt to dump freed peers.
    
    Fix this by instead checking peer->is_dead, which was explicitly created
    for this purpose. Also move up the device_update_lock lockdep assertion,
    since reading is_dead relies on that.
    
    It can be reproduced by a small script like:
    
        echo "Setting config..."
        ip link add dev wg0 type wireguard
        wg setconf wg0 /big-config
        (
                while true; do
                        echo "Showing config..."
                        wg showconf wg0 > /dev/null
                done
        ) &
        sleep 4
        wg setconf wg0 <(printf "[Peer]\nPublicKey=$(wg genkey)\n")
    
    Resulting in:
    
        BUG: KASAN: slab-use-after-free in __lock_acquire+0x182a/0x1b20
        Read of size 8 at addr ffff88811956ec70 by task wg/59
        CPU: 2 PID: 59 Comm: wg Not tainted 6.8.0-rc2-debug+ #5
        Call Trace:
         <TASK>
         dump_stack_lvl+0x47/0x70
         print_address_description.constprop.0+0x2c/0x380
         print_report+0xab/0x250
         kasan_report+0xba/0xf0
         __lock_acquire+0x182a/0x1b20
         lock_acquire+0x191/0x4b0
         down_read+0x80/0x440
         get_peer+0x140/0xcb0
         wg_get_device_dump+0x471/0x1130
    
    Cc: [email protected]
    Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
    Reported-by: Lillian Berry <[email protected]>
    Signed-off-by: Jason A. Donenfeld <[email protected]>
    Reviewed-by: Jiri Pirko <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

x86/alternatives: Introduce int3_emulate_jcc() [+ + +]

Author: Peter Zijlstra <[email protected]>
Date:   Wed Mar 13 07:42:53 2024 -0300

    x86/alternatives: Introduce int3_emulate_jcc()
    
    commit db7adcfd1cec4e95155e37bc066fddab302c6340 upstream.
    
    Move the kprobe Jcc emulation into int3_emulate_jcc() so it can be
    used by more code -- specifically static_call() will need this.
    
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
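
    Emulating a conditional jump means evaluating the x86 condition code
    against the saved flags and then choosing between the fall-through
    address and the branch target. The following userspace sketch shows
    that evaluation. The FLAG_* macros, jcc_taken() and emulate_jcc() are
    hypothetical names; the flag bit positions and the condition-code
    table follow the x86 architecture definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS bit positions as defined by the x86 architecture. */
#define FLAG_CF (1UL << 0)
#define FLAG_PF (1UL << 2)
#define FLAG_ZF (1UL << 6)
#define FLAG_SF (1UL << 7)
#define FLAG_OF (1UL << 11)

/*
 * Evaluate condition code `cc` (the low nibble of a Jcc opcode, 0x0..0xf)
 * against `flags`. Odd codes are the negation of the preceding even code.
 */
static bool jcc_taken(unsigned long flags, uint8_t cc)
{
	static const unsigned long cc_mask[6] = {
		FLAG_OF,		/* 0x0 JO,  0x1 JNO */
		FLAG_CF,		/* 0x2 JB,  0x3 JAE */
		FLAG_ZF,		/* 0x4 JE,  0x5 JNE */
		FLAG_CF | FLAG_ZF,	/* 0x6 JBE, 0x7 JA  */
		FLAG_SF,		/* 0x8 JS,  0x9 JNS */
		FLAG_PF,		/* 0xa JP,  0xb JNP */
	};
	bool invert = cc & 1;
	bool match;

	if (cc < 0xc) {
		match = (flags & cc_mask[cc >> 1]) != 0;
	} else {
		/* 0xc JL, 0xd JGE: signed comparisons test SF != OF. */
		match = ((flags & FLAG_SF) != 0) != ((flags & FLAG_OF) != 0);
		/* 0xe JLE, 0xf JG: additionally consider ZF. */
		if (cc >= 0xe)
			match = match || (flags & FLAG_ZF) != 0;
	}
	return match != invert;
}

/* Return the address execution continues at after the emulated Jcc. */
static unsigned long emulate_jcc(unsigned long next_ip, unsigned long flags,
				 uint8_t cc, long disp)
{
	return jcc_taken(flags, cc) ? next_ip + disp : next_ip;
}
```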

x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions [+ + +]

Author: Peter Zijlstra <[email protected]>
Date:   Wed Mar 13 07:42:54 2024 -0300

    x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
    
    commit ac0ee0a9560c97fa5fe1409e450c2425d4ebd17a upstream.
    
    In order to re-write Jcc.d32 instructions text_poke_bp() needs to be
    taught about them.
    
    The biggest hurdle is that the whole machinery is currently made for 5
    byte instructions and extending this would grow struct text_poke_loc
    which is currently a nice 16 bytes and used in an array.
    
    However, since text_poke_loc contains a full copy of the (s32)
    displacement, it is possible to map the Jcc.d32 2 byte opcodes to
    Jcc.d8 1 byte opcode for the int3 emulation.
    
    This then leaves the replacement bytes; fudge that by only storing the
    last 5 bytes and adding the rule that 'length == 6' instruction will
    be prefixed with a 0x0f byte.
    
    Change-Id: Ie3f72c6b92f865d287c8940e5a87e59d41cfaa27
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [cascardo: there is no emit_call_track_retpoline]
    Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
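
    The opcode mapping mentioned above relies on the x86 encoding: Jcc.d32
    is 0x0f followed by 0x80..0x8f, while Jcc.d8 is the single byte
    0x70..0x7f with the same condition in the low nibble. A small sketch,
    with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* A Jcc.d32 is encoded as 0x0f 0x80..0x8f followed by a 32-bit displacement. */
static bool is_jcc32(const uint8_t *insn)
{
	return insn[0] == 0x0f && (insn[1] & 0xf0) == 0x80;
}

/*
 * Map the second opcode byte of a Jcc.d32 (0x80..0x8f) onto the matching
 * one-byte Jcc.d8 opcode (0x70..0x7f). The condition nibble is unchanged.
 */
static uint8_t jcc32_to_jcc8_opcode(const uint8_t *insn)
{
	return (uint8_t)(insn[1] - 0x10);
}
```

    Because the condition survives in the low nibble, the one-byte opcode
    plus the stored displacement is enough to emulate the original branch.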

x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix [+ + +]

Author: H. Peter Anvin (Intel) <[email protected]>
Date:   Tue Mar 12 14:10:40 2024 -0700

    x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
    
    commit f87bc8dc7a7c438c70f97b4e51c76a183313272e upstream.
    
    Add a macro _ASM_RIP() to add a (%rip) suffix on 64 bits only. This is
    useful for immediate memory references where one doesn't want gcc
    to possibly use a register indirection as it may in the case of an "m"
    constraint.
    
      [ pawan: resolved merge conflict for __ASM_REGPFX ]
    
    Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
    Signed-off-by: Borislav Petkov <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/asm: Differentiate between code and function alignment [+ + +]

Author: Thomas Gleixner <[email protected]>
Date:   Wed Mar 13 07:42:52 2024 -0300

    x86/asm: Differentiate between code and function alignment
    
    commit 8eb5d34e77c63fde8af21c691bcf6e3cd87f7829 upstream.
    
    Create SYM_F_ALIGN to differentiate alignment requirements between
    SYM_CODE and SYM_FUNC.
    
    This distinction is useful later when adding padding in front of
    functions; IOW this allows following the compiler's
    patchable-function-entry option.
    
    [peterz: Changelog]
    
    Change-Id: I4f9bc0507e5c3fdb3e0839806989efc305e0a758
    Signed-off-by: Thomas Gleixner <[email protected]>
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [cascardo: adjust for missing commit c4691712b546 ("x86/linkage: Add ENDBR to SYM_FUNC_START*()")]
    Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bhi: Add BHI mitigation knob [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Mon Mar 11 08:57:05 2024 -0700

    x86/bhi: Add BHI mitigation knob
    
    commit ec9404e40e8f36421a2b66ecb76dc2209fe7f3ef upstream.
    
    Branch history clearing software sequences and hardware control
    BHI_DIS_S were defined to mitigate Branch History Injection (BHI).
    
    Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation:
    
     auto - Deploy the hardware mitigation BHI_DIS_S, if available.
     on   - Deploy the hardware mitigation BHI_DIS_S, if available,
            otherwise deploy the software sequence at syscall entry and
            VMexit.
     off  - Turn off BHI mitigation.
    
    The default is auto mode which does not deploy the software sequence
    mitigation.  This is because of the hardening done in the syscall
    dispatch path, which is the likely target of BHI.
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Alexandre Chartre <[email protected]>
    Reviewed-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
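
    The three modes listed in this commit message can be summarised as a
    small decision function. This is a sketch of the described policy
    only: the enum, struct and function names are hypothetical, and
    treating an unrecognised argument as "auto" is an assumption of the
    sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum bhi_mode { BHI_OFF, BHI_ON, BHI_AUTO };

/* Which mitigations end up deployed for a given mode and CPU. */
struct bhi_plan {
	bool hw_bhi_dis_s;	/* hardware control BHI_DIS_S */
	bool sw_seq_syscall;	/* software sequence at syscall entry */
	bool sw_seq_vmexit;	/* software sequence at VMexit */
};

/* Parse the value of spectre_bhi=; anything unrecognised falls back to auto. */
static enum bhi_mode parse_bhi_mode(const char *arg)
{
	if (arg && strcmp(arg, "on") == 0)
		return BHI_ON;
	if (arg && strcmp(arg, "off") == 0)
		return BHI_OFF;
	return BHI_AUTO;
}

/* Apply the policy described in this commit message. */
static struct bhi_plan select_bhi_plan(enum bhi_mode mode, bool has_bhi_dis_s)
{
	struct bhi_plan plan = { false, false, false };

	if (mode == BHI_OFF)
		return plan;

	if (has_bhi_dis_s) {
		/* Both "on" and "auto" prefer the hardware control. */
		plan.hw_bhi_dis_s = true;
		return plan;
	}

	if (mode == BHI_ON) {
		/* Only "on" falls back to the software sequence. */
		plan.sw_seq_syscall = true;
		plan.sw_seq_vmexit = true;
	}
	return plan;
}
```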

x86/bhi: Add support for clearing branch history at syscall entry [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Mon Mar 11 08:56:58 2024 -0700

    x86/bhi: Add support for clearing branch history at syscall entry
    
    commit 7390db8aea0d64e9deb28b8e1ce716f5020c7ee5 upstream.
    
    Branch History Injection (BHI) attacks may allow a malicious application to
    influence indirect branch prediction in kernel by poisoning the branch
    history. eIBRS isolates indirect branch targets in ring0.  The BHB can
    still influence the choice of indirect branch predictor entry, and although
    branch predictor entries are isolated between modes when eIBRS is enabled,
    the BHB itself is not isolated between modes.
    
    Alder Lake and newer processors support a hardware control BHI_DIS_S to
    mitigate BHI.  For older processors Intel has released a software sequence
    to clear the branch history on parts that don't support BHI_DIS_S. Add
    support to execute the software sequence at syscall entry and VMexit to
    overwrite the branch history.
    
    For now, branch history is not cleared at interrupt entry, as malicious
    applications are not believed to have sufficient control over the
    registers, since previous register state is cleared at interrupt
    entry. Researchers continue to poke at this area and it may become
    necessary to clear at interrupt entry as well in the future.
    
    This mitigation is only defined here. It is enabled later.
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Co-developed-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Alexandre Chartre <[email protected]>
    Reviewed-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bhi: Define SPEC_CTRL_BHI_DIS_S [+ + +]

Author: Daniel Sneddon <[email protected]>
Date:   Wed Mar 13 09:47:57 2024 -0700

    x86/bhi: Define SPEC_CTRL_BHI_DIS_S
    
    commit 0f4a837615ff925ba62648d280a861adf1582df7 upstream.
    
    Newer processors support a hardware control BHI_DIS_S to mitigate
    Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel
    from userspace BHI attacks without having to manually overwrite the
    branch history.
    
    Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL.
    Mitigation is enabled later.
    
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Alexandre Chartre <[email protected]>
    Reviewed-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bhi: Enumerate Branch History Injection (BHI) bug [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Mon Mar 11 08:57:03 2024 -0700

    x86/bhi: Enumerate Branch History Injection (BHI) bug
    
    commit be482ff9500999f56093738f9219bbabc729d163 upstream.
    
    Mitigation for BHI is selected based on the bug enumeration. Add bits
    needed to enumerate BHI bug.
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Alexandre Chartre <[email protected]>
    Reviewed-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bhi: Mitigate KVM by default [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Mon Mar 11 08:57:09 2024 -0700

    x86/bhi: Mitigate KVM by default
    
    commit 95a6ccbdc7199a14b71ad8901cb788ba7fb5167b upstream.
    
    BHI mitigation mode spectre_bhi=auto does not deploy the software
    mitigation by default. In a cloud environment, it is likely that
    userspace is trusted but the guests are not. Deploying a system-wide
    mitigation in such cases is not desirable.
    
    Update the auto mode to unconditionally mitigate against malicious
    guests. Deploy the software sequence at VMexit in auto mode also, when
    hardware mitigation is not available. Unlike the force =on mode,
    software sequence is not deployed at syscalls in auto mode.
    
    Suggested-by: Alexandre Chartre <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Alexandre Chartre <[email protected]>
    Reviewed-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
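
The mode selection described in this commit can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum, struct and function names are hypothetical and are not the kernel's identifiers, and the early return when a hardware mitigation is present is an assumption drawn from the "when hardware mitigation is not available" wording above.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the kernel's identifiers. */
enum bhi_mode { BHI_MODE_AUTO, BHI_MODE_ON };

struct bhi_plan {
	bool clear_at_vmexit;   /* run the software sequence on VM exit */
	bool clear_at_syscall;  /* run the software sequence on syscall entry */
};

static struct bhi_plan bhi_plan_for(enum bhi_mode mode, bool hw_mitigation)
{
	struct bhi_plan plan = { false, false };

	/* Assumption: with a hardware mitigation, no software sequence runs. */
	if (hw_mitigation)
		return plan;

	/* Both modes protect against guests; only =on also covers syscalls. */
	plan.clear_at_vmexit = true;
	plan.clear_at_syscall = (mode == BHI_MODE_ON);
	return plan;
}
```

In this sketch, auto mode without hardware support yields a plan that clears at VM exit only, which matches the "userspace trusted, guests untrusted" cloud scenario the commit describes.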

x86/bugs: Add asm helpers for executing VERW [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Tue Mar 12 14:10:46 2024 -0700

    x86/bugs: Add asm helpers for executing VERW
    
    commit baf8361e54550a48a7087b603313ad013cc13386 upstream.
    
    MDS mitigation requires clearing the CPU buffers before returning to
    user. This needs to be done late in the exit-to-user path. Current
    location of VERW leaves a possibility of kernel data ending up in CPU
    buffers for memory accesses done after VERW such as:
    
      1. Kernel data accessed by an NMI between VERW and return-to-user can
         remain in CPU buffers since NMI returning to kernel does not
         execute VERW to clear CPU buffers.
      2. Alyssa reported that after VERW is executed,
         CONFIG_GCC_PLUGIN_STACKLEAK=y scrubs the stack used by a system
         call. Memory accesses during stack scrubbing can move kernel stack
         contents into CPU buffers.
      3. When caller saved registers are restored after a return from
         function executing VERW, the kernel stack accesses can remain in
         CPU buffers (since they occur after VERW).
    
    To fix this VERW needs to be moved very late in exit-to-user path.
    
    In preparation for moving VERW to entry/exit asm code, create macros
    that can be used in asm. Also make VERW patching depend on a new feature
    flag X86_FEATURE_CLEAR_CPU_BUF.
    
      [pawan: - Runtime patch jmp instead of verw in macro CLEAR_CPU_BUFFERS
                due to lack of relative addressing support for relocations
                in kernels < v6.5.
              - Add UNWIND_HINT_EMPTY to avoid warning:
                arch/x86/entry/entry.o: warning: objtool: mds_verw_sel+0x0: unreachable instruction]
    
    Reported-by: Alyssa Milburn <[email protected]>
    Suggested-by: Andrew Cooper <[email protected]>
    Suggested-by: Peter Zijlstra <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Link: https://lore.kernel.org/all/20240213-delay-verw-v8-1-a6216d83edb7%40linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file [+ + +]

Author: Josh Poimboeuf <[email protected]>
Date:   Fri Apr 5 11:14:13 2024 -0700

    x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
    
    commit 0cd01ac5dcb1e18eb18df0f0d05b5de76522a437 upstream.
    
    Change the format of the 'spectre_v2' vulnerabilities sysfs file
    slightly by converting the commas to semicolons, so that mitigations for
    future variants can be grouped together and separated by commas.
    
    Signed-off-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
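
For tools that parse the file, the format change means semicolons now delimit the top-level fields, while commas are free to group values inside one field. A minimal sketch of a field counter follows; the function name is hypothetical and the sample strings in the usage note are illustrative, not copied from a real system.

```c
#include <stddef.h>

/*
 * Count the top-level fields of a semicolon-separated status line.
 * Commas are deliberately ignored, so they can group values inside
 * a single field without changing the field count.
 */
static size_t count_fields(const char *line)
{
	size_t n;

	if (*line == '\0')
		return 0;

	n = 1;
	for (const char *p = line; *p; p++) {
		if (*p == ';')
			n++;
	}
	return n;
}
```

With this, a line such as "Mitigation: Retpolines; IBPB: conditional; STIBP: disabled" counts as three fields, and adding a comma-separated group inside any one of them leaves the count unchanged.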

x86/bugs: Fix the SRSO mitigation on Zen3/4 [+ + +]

Author: Borislav Petkov (AMD) <[email protected]>
Date:   Fri Apr 5 16:04:32 2024 +0200

    x86/bugs: Fix the SRSO mitigation on Zen3/4
    
    Commit 4535e1a4174c4111d92c5a9a21e542d232e0fcaa upstream.
    
    The original version of the mitigation would patch in the calls to the
    untraining routines directly.  That is, the alternative() in UNTRAIN_RET
    will patch in the CALL to srso_alias_untrain_ret() directly.
    
    However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
    mess") meant well in trying to clean up the situation, due to micro-
    architectural reasons, the untraining routine srso_alias_untrain_ret()
    must be the target of a CALL instruction and not of a JMP instruction as
    it is done now.
    
    Reshuffle the alternative macros to accomplish that.
    
    Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Reviewed-by: Ingo Molnar <[email protected]>
    Cc: [email protected]
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Tue Mar 12 14:11:02 2024 -0700

    x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
    
    commit 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 upstream.
    
    The VERW mitigation at exit-to-user is enabled via a static branch
    mds_user_clear. This static branch is never toggled after boot, and can
    be safely replaced with an ALTERNATIVE() which is convenient to use in
    asm.
    
    Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user
    path. Also remove the now redundant VERW in exc_nmi() and
    arch_exit_to_user_mode().
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bugs: Use sysfs_emit() [+ + +]

Author: Borislav Petkov <[email protected]>
Date:   Tue Aug 9 17:32:02 2022 +0200

    x86/bugs: Use sysfs_emit()
    
    commit 1d30800c0c0ae1d086ffad2bdf0ba4403370f132 upstream.
    
    Those mitigations are very talkative; use the printing helper which pays
    attention to the buffer size.
    
    Signed-off-by: Borislav Petkov <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/CPU/AMD: Update the Zenbleed microcode revisions [+ + +]

Author: Borislav Petkov (AMD) <[email protected]>
Date:   Fri Mar 15 22:42:27 2024 +0100

    x86/CPU/AMD: Update the Zenbleed microcode revisions
    
    [ Upstream commit 5c84b051bd4e777cf37aaff983277e58c99618d5 ]
    
    Update them to the correct revision numbers.
    
    Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix")
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled [+ + +]

Author: Kim Phillips <[email protected]>
Date:   Thu Jul 20 14:47:27 2023 -0500

    x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
    
    commit fd470a8beed88440b160d690344fbae05a0b9b1b upstream.
    
    Unlike Intel's Enhanced IBRS feature, AMD's Automatic IBRS does not
    provide protection to processes running at CPL3/user mode, see section
    "Extended Feature Enable Register (EFER)" in the APM v2 at
    https://bugzilla.kernel.org/attachment.cgi?id=304652
    
    Explicitly enable STIBP to protect against cross-thread CPL3
    branch target injections on systems with Automatic IBRS enabled.
    
    Also update the relevant documentation.
    
    Fixes: e7862eda309e ("x86/cpu: Support AMD Automatic IBRS")
    Reported-by: Tom Lendacky <[email protected]>
    Signed-off-by: Kim Phillips <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/cpu: Support AMD Automatic IBRS [+ + +]

Author: Kim Phillips <[email protected]>
Date:   Tue Jan 24 10:33:18 2023 -0600

    x86/cpu: Support AMD Automatic IBRS
    
    commit e7862eda309ecfccc36bb5558d937ed3ace07f3f upstream.
    
    The AMD Zen4 core supports a new feature called Automatic IBRS.
    
    It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS,
    h/w manages its IBRS mitigation resources automatically across CPL transitions.
    
    The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by
    setting MSR C000_0080 (EFER) bit 21.
    
    Enable Automatic IBRS by default if the CPU feature is present.  It typically
    provides greater performance over the incumbent generic retpolines mitigation.
    
    Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum.  AMD Automatic IBRS and
    Intel Enhanced IBRS have similar enablement.  Add NO_EIBRS_PBRSB to
    cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS.
    
    The kernel command line option spectre_v2=eibrs is used to select AMD Automatic
    IBRS, if available.
    
    Signed-off-by: Kim Phillips <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Acked-by: Sean Christopherson <[email protected]>
    Acked-by: Dave Hansen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
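
The two bit positions named in this commit can be illustrated with plain bit arithmetic. The sketch below works on register values passed in by the caller rather than executing CPUID or accessing the MSR, and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define AUTOIBRS_CPUID_BIT 8   /* CPUID Fn8000_0021 EAX bit 8 */
#define AUTOIBRS_EFER_BIT  21  /* MSR C000_0080 (EFER) bit 21 */

/* True if the given CPUID Fn8000_0021 EAX value advertises Automatic IBRS. */
static bool cpu_has_auto_ibrs(uint32_t fn80000021_eax)
{
	return (fn80000021_eax >> AUTOIBRS_CPUID_BIT) & 1;
}

/* Return the EFER value with the Automatic IBRS enable bit set. */
static uint64_t efer_with_auto_ibrs(uint64_t efer)
{
	return efer | (1ULL << AUTOIBRS_EFER_BIT);
}
```

Because the feature is "set-and-forget", the enable bit is written once and nothing needs to be toggled on later CPL transitions.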

x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Thu Apr 4 17:16:14 2024 -0700

    x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word
    
    commit 8cb4a9a82b21623dbb4b3051dd30d98356cf95bc upstream.
    
    Add CPUID_LNX_5 to track cpufeatures' word 21, and add the appropriate
    compile-time assert in KVM to prevent direct lookups on the features in
    CPUID_LNX_5.  KVM uses X86_FEATURE_* flags to manage guest CPUID, and so
    must translate features that are scattered by Linux from the Linux-defined
    bit to the hardware-defined bit, i.e. should never try to directly access
    scattered features in guest CPUID.
    
    Opportunistically add NR_CPUID_WORDS to enum cpuid_leafs, along with a
    compile-time assert in KVM's CPUID infrastructure to ensure that future
    additions update cpuid_leafs along with NCAPINTS.
    
    No functional change intended.
    
    Fixes: 7f274e609f3d ("x86/cpufeatures: Add new word for scattered features")
    Cc: Sandipan Das <[email protected]>
    Signed-off-by: Sean Christopherson <[email protected]>
    Acked-by: Dave Hansen <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
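
The "keep the enum and the word count in sync" check can be sketched with a toy example. Everything prefixed TOY_ is hypothetical: the real enum cpuid_leafs is much longer, and only the pattern (a trailing count entry plus a compile-time assert) is the point here. The word * 32 + bit encoding is how feature numbers map onto words.

```c
/* Toy list of three words; the real enum cpuid_leafs is much longer. */
enum toy_cpuid_leafs {
	TOY_CPUID_1_EDX = 0,
	TOY_CPUID_LNX_1,
	TOY_CPUID_LNX_5,
	TOY_NR_CPUID_WORDS,	/* must stay last */
};

#define TOY_NCAPINTS 3

/* Fails the build if a word is added in one place but not the other. */
_Static_assert(TOY_NR_CPUID_WORDS == TOY_NCAPINTS,
	       "enum and word count out of sync");

/* A feature number encodes word and bit as word * 32 + bit. */
#define TOY_FEATURE(word, bit) ((word) * 32 + (bit))

static int feature_word(int feature) { return feature / 32; }
static int feature_bit(int feature)  { return feature % 32; }
```

A feature defined in word 21, for example, gets a number in the range 672..703, and feature_word() recovers 21 from it.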

x86/cpufeatures: Add new word for scattered features [+ + +]

Author: Sandipan Das <[email protected]>
Date:   Mon Mar 25 13:01:44 2024 +0530

    x86/cpufeatures: Add new word for scattered features
    
    commit 7f274e609f3d5f45c22b1dd59053f6764458b492 upstream.
    
    Add a new word for scattered features because all free bits among the
    existing Linux-defined auxiliary flags have been exhausted.
    
    Signed-off-by: Sandipan Das <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Link: https://lore.kernel.org/r/8380d2a0da469a1f0ad75b8954a79fb689599ff6.1711091584.git.sandipan.das@amd.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/entry_32: Add VERW just before userspace transition [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Tue Mar 12 14:10:57 2024 -0700

    x86/entry_32: Add VERW just before userspace transition
    
    commit a0e2dab44d22b913b4c228c8b52b2a104434b0b3 upstream.
    
    As done for entry_64, add support for executing VERW late in exit to
    user path for 32-bit mode.
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Link: https://lore.kernel.org/all/20240213-delay-verw-v8-3-a6216d83edb7%40linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/entry_64: Add VERW just before userspace transition [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Tue Mar 12 14:10:51 2024 -0700

    x86/entry_64: Add VERW just before userspace transition
    
    commit 3c7501722e6b31a6e56edd23cea5e77dbb9ffd1a upstream.
    
    Mitigation for MDS is to use VERW instruction to clear any secrets in
    CPU Buffers. Any memory accesses after VERW execution can still remain
    in CPU buffers. It is safer to execute VERW late in return to user path
    to minimize the window in which kernel data can end up in CPU buffers.
    There are not many kernel secrets to be had after SWITCH_TO_USER_CR3.
    
    Add support for deploying VERW mitigation after user register state is
    restored. This helps minimize the chances of kernel data ending up into
    CPU buffers after executing VERW.
    
    Note that the mitigation at the new location is not yet enabled.
    
      Corner case not handled
      =======================
      Interrupts returning to kernel don't clear CPU buffers since the
      exit-to-user path is expected to do that anyway. But, there could be
      a case when an NMI is generated in kernel after the exit-to-user path
      has cleared the buffers. This case is not handled, and an NMI returning
      to kernel doesn't clear CPU buffers because:
    
      1. It is rare to get an NMI after VERW, but before returning to user.
      2. For an unprivileged user, there is no known way to make that NMI
         less rare or target it.
      3. It would take a large number of these precisely-timed NMIs to mount
         an actual attack.  There's presumably not enough bandwidth.
      4. The NMI in question occurs after a VERW, i.e. when user state is
         restored and most interesting data is already scrubbed. What's left
         is only the data that NMI touches, and that may or may not be of
         any interest.
    
      [ pawan: resolved conflict for hunk swapgs_restore_regs_and_return_to_usermode ]
    
    Suggested-by: Dave Hansen <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Link: https://lore.kernel.org/all/20240213-delay-verw-v8-2-a6216d83edb7%40linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/mce: Make sure to grab mce_sysfs_mutex in set_bank() [+ + +]

Author: Borislav Petkov (AMD) <[email protected]>
Date:   Wed Mar 13 14:48:27 2024 +0100

    x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
    
    commit 3ddf944b32f88741c303f0b21459dbb3872b8bc5 upstream.
    
    Modifying a MCA bank's MCA_CTL bits which control which error types to
    be reported is done over
    
      /sys/devices/system/machinecheck/
      ├── machinecheck0
      │   ├── bank0
      │   ├── bank1
      │   ├── bank10
      │   ├── bank11
      ...
    
    sysfs nodes by writing the new bit mask of events to enable.
    
    When the write is accepted, the kernel deletes all current timers and
    reinits all banks.
    
    Doing that in parallel can lead to initializing a timer which is already
    armed and in the timer wheel, i.e., in use already:
    
      ODEBUG: init active (active state 0) object: ffff888063a28000 object
      type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
      WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514
      debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514
    
    Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code
    does.
    
    Reported by: Yue Sun <[email protected]>
    Reported by: xingwei lee <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Cc: <[email protected]>
    Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Tue Mar 12 14:11:19 2024 -0700

    x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
    
    commit e95df4ec0c0c9791941f112db699fae794b9862a upstream.
    
    Currently MMIO Stale Data mitigation for CPUs not affected by MDS/TAA is
    to only deploy VERW at VMentry by enabling mmio_stale_data_clear static
    branch. No mitigation is needed for kernel->user transitions. If such
    CPUs are also affected by RFDS, its mitigation may set
    X86_FEATURE_CLEAR_CPU_BUF to deploy VERW at kernel->user and VMentry.
    This could result in duplicate VERW at VMentry.
    
    Fix this by disabling mmio_stale_data_clear static branch when
    X86_FEATURE_CLEAR_CPU_BUF is enabled.
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Reviewed-by: Dave Hansen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/pm: Work around false positive kmemleak report in msr_build_context() [+ + +]

Author: Anton Altaparmakov <[email protected]>
Date:   Thu Mar 14 14:26:56 2024 +0000

    x86/pm: Work around false positive kmemleak report in msr_build_context()
    
    [ Upstream commit e3f269ed0accbb22aa8f25d2daffa23c3fccd407 ]
    
    Since:
    
      7ee18d677989 ("x86/power: Make restore_processor_context() sane")
    
    kmemleak reports this issue:
    
      unreferenced object 0xf68241e0 (size 32):
        comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s)
        hex dump (first 32 bytes):
          00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00  ....)...........
          00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc  .B..............
        backtrace:
          [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260
          [<ea65e13b>] __kmalloc+0x54/0x160
          [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100
          [<46635aff>] pm_check_save_msr+0x63/0x80
          [<6b6bb938>] do_one_initcall+0x41/0x1f0
          [<3f3add60>] kernel_init_freeable+0x199/0x1e8
          [<3b538fde>] kernel_init+0x1a/0x110
          [<938ae2b2>] ret_from_fork+0x1c/0x28
    
    Which is a false positive.
    
    Reproducer:
    
      - Run rsync of whole kernel tree (multiple times if needed).
      - start a kmemleak scan
      - Note this is just an example: a lot of our internal tests hit these.
    
    The root cause is similar to the fix in:
    
      b0b592cf0836 x86/pm: Fix false positive kmemleak report in msr_build_context()
    
    i.e. the alignment within the packed struct saved_context,
    which has everything unaligned as there is only "u16 gs;" at the start of
    the struct, where in the past there were four u16 there, thus aligning
    everything afterwards.  The issue is that kmemleak only
    searches for pointers that are aligned (see how pointers are scanned in
    kmemleak.c), so when the struct members are not aligned it doesn't see
    them.
    
    Testing:
    
    We run a lot of tests with our CI, and after applying this fix we do not
    see any kmemleak issues any more whilst without it we see hundreds of
    the above report. From a single, simple test run consisting of 416 individual test
    cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this,
    which is quite a lot. With this fix applied we get zero kmemleak related failures.
    
    Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane")
    Signed-off-by: Anton Altaparmakov <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Acked-by: "Rafael J. Wysocki" <[email protected]>
    Cc: [email protected]
    Cc: Linus Torvalds <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>
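
The alignment effect described in this commit can be reproduced with two toy packed structs. The layouts are hypothetical and far smaller than the real struct saved_context; the sketch assumes GCC or Clang for __attribute__((packed)).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; the real struct saved_context has many more fields. */
struct ctx_one_u16 {
	uint16_t gs;
	void *saved_msrs;	/* lands at offset 2: not pointer-aligned */
} __attribute__((packed));

struct ctx_four_u16 {
	uint16_t es, ds, fs, gs;
	void *saved_msrs;	/* lands at offset 8: pointer-aligned */
} __attribute__((packed));

/* An aligned-only scanner would see a pointer only at such offsets. */
static int is_pointer_aligned(size_t offset)
{
	return offset % sizeof(void *) == 0;
}
```

With a single leading u16 the pointer member sits at offset 2, so a scanner that only reads pointer-aligned words never sees it and reports the object it points to as unreferenced.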

x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO [+ + +]

Author: Borislav Petkov (AMD) <[email protected]>
Date:   Fri Apr 5 16:05:09 2024 +0200

    x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO
    
    Commit 0e110732473e14d6520e49d75d2c88ef7d46fe67 upstream.
    
    The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO
    case is there only for the alternative in CALL_UNTRAIN_RET to have
    a symbol to resolve.
    
    However, testing with kernels which don't have CONFIG_MITIGATION_SRSO
    enabled, leads to the warning in patch_return() to fire:
    
      missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e
      WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826
    
    Put in a plain "ret" there so that gcc doesn't put a return thunk
    in its place, which is special and gets checked.
    
    In addition:
    
      ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined!
      make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Chyba 1
      make[1]: *** [/usr/src/linux-6.8.3/Makefile:1873: modpost] Chyba 2
      make: *** [Makefile:240: __sub-make] Chyba 2
    
    since !SRSO builds would use the dummy return thunk as reported by
    [email protected], https://bugzilla.kernel.org/show_bug.cgi?id=218679.
    
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-lkp/[email protected]
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]/
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/rfds: Mitigate Register File Data Sampling (RFDS) [+ + +]

Author: Pawan Gupta <[email protected]>
Date:   Tue Mar 12 14:11:30 2024 -0700

    x86/rfds: Mitigate Register File Data Sampling (RFDS)
    
    commit 8076fcde016c9c0e0660543e67bff86cb48a7c9c upstream.
    
    RFDS is a CPU vulnerability that may allow userspace to infer kernel
    stale data previously used in floating point registers, vector registers
    and integer registers. RFDS only affects certain Intel Atom processors.
    
    Intel released a microcode update that uses VERW instruction to clear
    the affected CPU buffers. Unlike MDS, none of the affected cores support
    SMT.
    
    Add RFDS bug infrastructure and enable the VERW based mitigation by
    default, that clears the affected buffers just before exiting to
    userspace. Also add sysfs reporting and cmdline parameter
    "reg_file_data_sampling" to control the mitigation.
    
    For details see:
    Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
    
      [ pawan: - Resolved conflicts in sysfs reporting.
               - s/ATOM_GRACEMONT/ALDERLAKE_N/ATOM_GRACEMONT is called
                 ALDERLAKE_N in 6.6. ]
    
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Dave Hansen <[email protected]>
    Reviewed-by: Thomas Gleixner <[email protected]>
    Acked-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/static_call: Add support for Jcc tail-calls [+ + +]

Author: Peter Zijlstra <[email protected]>
Date:   Wed Mar 13 07:42:55 2024 -0300

    x86/static_call: Add support for Jcc tail-calls
    
    commit 923510c88d2b7d947c4217835fd9ca6bd65cc56c upstream.
    
    Clang likes to create conditional tail calls like:
    
      0000000000000350 <amd_pmu_add_event>:
      350:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1) 351: R_X86_64_NONE      __fentry__-0x4
      355:       48 83 bf 20 01 00 00 00         cmpq   $0x0,0x120(%rdi)
      35d:       0f 85 00 00 00 00       jne    363 <amd_pmu_add_event+0x13>     35f: R_X86_64_PLT32     __SCT__amd_pmu_branch_add-0x4
      363:       e9 00 00 00 00          jmp    368 <amd_pmu_add_event+0x18>     364: R_X86_64_PLT32     __x86_return_thunk-0x4
    
    Where 0x35d is a static call site that's turned into a conditional
    tail-call using the Jcc class of instructions.
    
    Teach the in-line static call text patching about this.
    
    Notably, since there is no conditional-ret, in that case patch the Jcc
    to point at an empty stub function that does the ret -- or the return
    thunk when needed.
    
    Reported-by: "Erhard F." <[email protected]>
    Change-Id: I99c8fc3f721e5d1c74f06710b38d4bac5230303a
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [cascardo: __static_call_validate didn't have the bool tramp argument]
    Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
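
Recognizing the instruction at 0x35d in the dump above comes down to matching the two-byte Jcc rel32 opcode range (0f 80 through 0f 8f). A minimal sketch of such a check follows; the function name is hypothetical and this is not presented as the kernel's own helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if insn starts with a two-byte Jcc rel32 opcode (0f 80 .. 0f 8f). */
static bool is_jcc_rel32(const uint8_t *insn)
{
	return insn[0] == 0x0f && (insn[1] & 0xf0) == 0x80;
}
```

Applied to the bytes in the dump, "0f 85 00 00 00 00" (jne) matches, while "e9 00 00 00 00" (jmp) and the "0f 1f 44 00 00" nop do not.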

x86/syscall: Don't force use of indirect calls for system calls [+ + +]

Author: Linus Torvalds <[email protected]>
Date:   Wed Apr 3 16:36:44 2024 -0700

    x86/syscall: Don't force use of indirect calls for system calls
    
    commit 1e3ad78334a69b36e107232e337f9d693dcc9df2 upstream.
    
    Make <asm/syscall.h> build a switch statement instead, and the compiler can
    either decide to generate an indirect jump, or - more likely these days due
    to mitigations - just a series of conditional branches.
    
    Yes, the conditional branches also have branch prediction, but the branch
    prediction is much more controlled, in that it just causes speculatively
    running the wrong system call (harmless), rather than speculatively running
    possibly wrong random less controlled code gadgets.
    
    This doesn't mitigate other indirect calls, but the system call indirection
    is the first and most easily triggered case.
    
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
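
The difference between the two dispatch styles can be sketched in user-space C. The handlers and numbers below are hypothetical and only illustrate the shape of the code; what the compiler actually emits for either form depends on its options and enabled mitigations.

```c
/* Hypothetical handlers and numbers, for illustration only. */
static long sys_a(long arg) { return arg + 1; }
static long sys_b(long arg) { return arg * 2; }

typedef long (*handler_t)(long);
static const handler_t table[] = { sys_a, sys_b };

/* Table dispatch: typically an indirect call through a function pointer. */
static long dispatch_table(unsigned int nr, long arg)
{
	if (nr >= sizeof(table) / sizeof(table[0]))
		return -1;
	return table[nr](arg);
}

/* Switch dispatch: the compiler may emit direct calls behind compare-and-branch. */
static long dispatch_switch(unsigned int nr, long arg)
{
	switch (nr) {
	case 0:
		return sys_a(arg);
	case 1:
		return sys_b(arg);
	default:
		return -1;
	}
}
```

Both functions return the same results; the switch form simply leaves the choice of an indirect jump or a chain of conditional branches to the compiler.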

x86: set SPECTRE_BHI_ON as default [+ + +]

Author: Greg Kroah-Hartman <[email protected]>
Date:   Tue Apr 9 19:32:41 2024 +0200

    x86: set SPECTRE_BHI_ON as default
    
    commit 2bb69f5fc72183e1c62547d900f560d0e9334925 upstream.
    
    Part of a merge commit from Linus that adjusted the default setting of
    SPECTRE_BHI_ON.
    
    Cc: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xen-netfront: Add missing skb_mark_for_recycle [+ + +]

Author: Jesper Dangaard Brouer <[email protected]>
Date:   Wed Mar 27 13:14:56 2024 +0100

    xen-netfront: Add missing skb_mark_for_recycle
    
    commit 037965402a010898d34f4e35327d22c0a95cd51f upstream.
    
    Notice that skb_mark_for_recycle() was introduced later than the commit in
    the Fixes tag, in commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on
    SKB recycling").
    
    It is believed that the commit named in the Fixes tag was missing a call to
    page_pool_release_page() between v5.9 and v5.14, after which it should have
    used skb_mark_for_recycle(). Since v6.6 the call page_pool_release_page() was
    removed (in commit 535b9c61bdef ("net: page_pool: hide page_pool_release_page()"))
    and remaining callers were converted (in commit 6bfef2ec0172 ("Merge branch
    'net-page_pool-remove-page_pool_release_page'")).
    
    This leak became visible in v6.8 via commit dba1b8a7ab68 ("mm/page_pool: catch
    page_pool memory leaks").
    
    Cc: [email protected]
    Fixes: 6c5aa6fc4def ("xen networking: add basic XDP support for xen-netfront")
    Reported-by: Leonidas Spyropoulos <[email protected]>
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=218654
    Reported-by: Arthur Borsboom <[email protected]>
    Signed-off-by: Jesper Dangaard Brouer <[email protected]>
    Link: https://lore.kernel.org/r/171154167446.2671062.9127105384591237363.stgit@firesoul
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xen/events: close evtchn after mapping cleanup [+ + +]

Author: Maximilian Heyne <[email protected]>
Date:   Wed Jan 24 16:31:28 2024 +0000

    xen/events: close evtchn after mapping cleanup
    
    commit fa765c4b4aed2d64266b694520ecb025c862c5a9 upstream.
    
    shutdown_pirq and startup_pirq do not take the
    irq_mapping_update_lock because they can't due to lock inversion. Both
    are called with the irq_desc->lock held. The lock order,
    however, is first irq_mapping_update_lock and then irq_desc->lock.
    
    This opens multiple races:
    - shutdown_pirq can be interrupted by a function that allocates an event
      channel:
    
      CPU0                        CPU1
      shutdown_pirq {
        xen_evtchn_close(e)
                                  __startup_pirq {
                                    EVTCHNOP_bind_pirq
                                      -> returns just freed evtchn e
                                    set_evtchn_to_irq(e, irq)
                                  }
        xen_irq_info_cleanup() {
          set_evtchn_to_irq(e, -1)
        }
      }
    
      Assume that event channel e refers to the same event channel
      number in both places.
      After this race the evtchn_to_irq mapping for e is invalid (-1).
    
    - __startup_pirq races with __unbind_from_irq in a similar way. Because
      __startup_pirq doesn't take irq_mapping_update_lock it can grab the
      evtchn that __unbind_from_irq is currently freeing and cleaning up. In
      this case even though the event channel is allocated, its mapping can
      be unset in evtchn_to_irq.
    
    The fix is to first clean up the mappings and then close the event
    channel. In this way, when an event channel gets allocated, its
    potential previous evtchn_to_irq mappings are guaranteed to be unset already.
    This is also the reverse order of the allocation, where first the event
    channel is allocated and then the mappings are set up.
    
    On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
    [un]bind interfaces"), we hit a BUG like the following during probing of NVMe
    devices. The issue is that during nvme_setup_io_queues, pci_free_irq
    is called for every device which results in a call to shutdown_pirq.
    With many nvme devices it's therefore likely to hit this race during
    boot because there will be multiple calls to shutdown_pirq and
    startup_pirq potentially running in parallel.
    
      ------------[ cut here ]------------
      blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
      kernel BUG at drivers/xen/events/events_base.c:499!
      invalid opcode: 0000 [#1] SMP PTI
      CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2.x86_64 #1
      Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
      Workqueue: nvme-reset-wq nvme_reset_work
      RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
      Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
      RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
      RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff
      RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00
      R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed
      R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002
      FS:  0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ? show_trace_log_lvl+0x1c1/0x2d9
       ? show_trace_log_lvl+0x1c1/0x2d9
       ? set_affinity_irq+0xdc/0x1c0
       ? __die_body.cold+0x8/0xd
       ? die+0x2b/0x50
       ? do_trap+0x90/0x110
       ? bind_evtchn_to_cpu+0xdf/0xf0
       ? do_error_trap+0x65/0x80
       ? bind_evtchn_to_cpu+0xdf/0xf0
       ? exc_invalid_op+0x4e/0x70
       ? bind_evtchn_to_cpu+0xdf/0xf0
       ? asm_exc_invalid_op+0x12/0x20
       ? bind_evtchn_to_cpu+0xdf/0xf0
       ? bind_evtchn_to_cpu+0xc5/0xf0
       set_affinity_irq+0xdc/0x1c0
       irq_do_set_affinity+0x1d7/0x1f0
       irq_setup_affinity+0xd6/0x1a0
       irq_startup+0x8a/0xf0
       __setup_irq+0x639/0x6d0
       ? nvme_suspend+0x150/0x150
       request_threaded_irq+0x10c/0x180
       ? nvme_suspend+0x150/0x150
       pci_request_irq+0xa8/0xf0
       ? __blk_mq_free_request+0x74/0xa0
       queue_request_irq+0x6f/0x80
       nvme_create_queue+0x1af/0x200
       nvme_create_io_queues+0xbd/0xf0
       nvme_setup_io_queues+0x246/0x320
       ? nvme_irq_check+0x30/0x30
       nvme_reset_work+0x1c8/0x400
       process_one_work+0x1b0/0x350
       worker_thread+0x49/0x310
       ? process_one_work+0x350/0x350
       kthread+0x11b/0x140
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      Modules linked in:
      ---[ end trace a11715de1eee1873 ]---
    
    Fixes: d46a78b05c0e ("xen: implement pirq type event channels")
    Cc: [email protected]
    Co-debugged-by: Andrew Panyakin <[email protected]>
    Signed-off-by: Maximilian Heyne <[email protected]>
    Reviewed-by: Juergen Gross <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Juergen Gross <[email protected]>
    [apanyaki: backport to v5.15-stable]
    Signed-off-by: Andrew Paniakin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfrm: Avoid clang fortify warning in copy_to_user_tmpl() [+ + +]

Author: Nathan Chancellor <[email protected]>
Date:   Wed Feb 21 14:46:21 2024 -0700

    xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
    
    commit 1a807e46aa93ebad1dfbed4f82dc3bf779423a6e upstream.
    
    After a couple of recent changes in LLVM, there is a warning (or error with
    CONFIG_WERROR=y or W=e) from the compile time fortify source routines,
    specifically the memset() in copy_to_user_tmpl().
    
      In file included from net/xfrm/xfrm_user.c:14:
      ...
      include/linux/fortify-string.h:438:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
        438 |                         __write_overflow_field(p_size_field, size);
            |                         ^
      1 error generated.
    
    While ->xfrm_nr has been validated against XFRM_MAX_DEPTH when its value
    is first assigned in copy_templates() by calling validate_tmpl() first
    (so there should not be any issue in practice), LLVM/clang cannot really
    deduce that across the boundaries of these functions. Without that
    knowledge, it cannot assume that the loop stops before i is greater than
    XFRM_MAX_DEPTH, which would indeed result in a stack buffer overflow in the
    memset().
    
    To make the bounds of ->xfrm_nr clear to the compiler and add additional
    defense in case copy_to_user_tmpl() is ever used in a path where
    ->xfrm_nr has not been properly validated against XFRM_MAX_DEPTH first,
    add an explicit bound check and early return, which clears up the
    warning.
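    
    The pattern described above can be sketched in standalone C. This is a
    minimal illustration, not the kernel code: MAX_DEPTH, struct tmpl,
    struct policy and copy_tmpl() are hypothetical stand-ins for
    XFRM_MAX_DEPTH, the template structures and copy_to_user_tmpl(), and
    the -ENOBUFS return value is assumed for the example. The point is
    that the count is checked in the same function that runs the loop, so
    the compiler can see the loop index never goes past the local array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 6 /* hypothetical stand-in for XFRM_MAX_DEPTH */

struct tmpl {
	int id;
	int mode;
};

struct policy {
	unsigned char nr;            /* stand-in for ->xfrm_nr */
	struct tmpl vec[MAX_DEPTH];
};

/*
 * Copy p->nr templates into out and report the count via *copied.
 * Returns 0 on success (including nr == 0) and -ENOBUFS when nr exceeds
 * MAX_DEPTH (error code assumed for this sketch).
 */
static int copy_tmpl(const struct policy *p, struct tmpl *out, size_t *copied)
{
	struct tmpl vec[MAX_DEPTH];
	int i;

	*copied = 0;

	if (p->nr == 0)
		return 0;

	/* Explicit bound check: makes the loop limit visible locally. */
	if (p->nr > MAX_DEPTH)
		return -ENOBUFS;

	for (i = 0; i < p->nr; i++) {
		/* Without the check above, this memset could write past vec[]. */
		memset(&vec[i], 0, sizeof(vec[i]));
		vec[i].id = p->vec[i].id;
		vec[i].mode = p->vec[i].mode;
	}

	memcpy(out, vec, sizeof(vec[0]) * p->nr);
	*copied = p->nr;
	return 0;
}
```

    A caller passes an output array of at least MAX_DEPTH entries and
    treats a negative return value as "count out of range", which also
    serves as the additional defense mentioned above.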
    
    Cc: [email protected]
    Link: https://github.com/ClangBuiltLinux/linux/issues/1985
    Signed-off-by: Nathan Chancellor <[email protected]>
    Reviewed-by: Kees Cook <[email protected]>
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Changelog for Linux 5.15.154