Changelog in Linux kernel 6.12.66

alpha: don't reference obsolete termio struct for TC* constants [+ + +]

Author: Sam James <[email protected]>
Date:   Fri Dec 5 08:14:57 2025 +0000

    alpha: don't reference obsolete termio struct for TC* constants
    
    [ Upstream commit 9aeed9041929812a10a6d693af050846942a1d16 ]
    
    Similar in nature to ab107276607af90b13a5994997e19b7b9731e251. glibc-2.42
    drops the legacy termio struct, but the ioctls.h header still defines some
    TC* constants in terms of termio (via sizeof). Hardcode the values instead.
    
    This fixes building Python for example, which falls over like:
      ./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio'
    
    Link: https://bugs.gentoo.org/961769
    Link: https://bugs.gentoo.org/962600
    Signed-off-by: Sam James <[email protected]>
    Reviewed-by: Magnus Lindholm <[email protected]>
    Link: https://lore.kernel.org/r/6ebd3451908785cad53b50ca6bc46cfe9d6bc03c.1764922497.git.sam@gentoo.org
    Signed-off-by: Magnus Lindholm <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ALSA: ac97: fix a double free in snd_ac97_controller_register() [+ + +]

Author: Haoxiang Li <[email protected]>
Date:   Mon Jan 12 12:09:40 2026 -0500

    ALSA: ac97: fix a double free in snd_ac97_controller_register()
    
    [ Upstream commit 830988b6cf197e6dcffdfe2008c5738e6c6c3c0f ]
    
    If ac97_add_adapter() fails, put_device() is the correct way to drop
    the device reference. kfree() is not required.
    Add kfree() if idr_alloc() fails and in ac97_adapter_release() to do
    the cleanup.
    
    Found by code review.
    
    Fixes: 74426fbff66e ("ALSA: ac97: add an ac97 bus")
    Cc: [email protected]
    Signed-off-by: Haoxiang Li <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ALSA: ac97bus: Use guard() for mutex locks [+ + +]

Author: Takashi Iwai <[email protected]>
Date:   Mon Jan 12 12:09:39 2026 -0500

    ALSA: ac97bus: Use guard() for mutex locks
    
    [ Upstream commit c07824a14d99c10edd4ec4c389d219af336ecf20 ]
    
    Replace the manual mutex lock/unlock pairs with guard() for code
    simplification.
    
    Only code refactoring, and no behavior change.
    
    Signed-off-by: Takashi Iwai <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Stable-dep-of: 830988b6cf19 ("ALSA: ac97: fix a double free in snd_ac97_controller_register()")
    Signed-off-by: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ALSA: hda/realtek: enable woofer speakers on Medion NM14LNL [+ + +]

Author: Kai Vehmanen <[email protected]>
Date:   Fri Dec 12 19:46:58 2025 +0200

    ALSA: hda/realtek: enable woofer speakers on Medion NM14LNL
    
    [ Upstream commit e64826e5e367ad45539ab245b92f009ee165025c ]
    
    The ALC233 codec on these Medion NM14LNL (SPRCHRGD 14 S2) systems
    requires a quirk to enable all speakers.
    
    Tested-by: davplsm <[email protected]>
    Link: https://github.com/thesofproject/linux/issues/5611
    Signed-off-by: Kai Vehmanen <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ALSA: hda: intel-dsp-config: Prefer legacy driver as fallback [+ + +]

Author: Takashi Iwai <[email protected]>
Date:   Wed Dec 10 14:15:51 2025 +0100

    ALSA: hda: intel-dsp-config: Prefer legacy driver as fallback
    
    commit 161a0c617ab172bbcda7ce61803addeb2124dbff upstream.
    
    When config table entries don't match with the device to be probed,
    currently we fall back to SND_INTEL_DSP_DRIVER_ANY, which means to
    allow any drivers to bind with it.
    
    This was set so with the assumption (or hope) that all controller
    drivers should cover the devices generally, but in practice, this
    caused a problem as reported recently.  Namely, when a specific
    kconfig for SOF isn't set for the modern Intel chips like Alderlake,
    a wrong driver (AVS) got probed and failed.  This is because we have
    entries like:
    
     #if IS_ENABLED(CONFIG_SND_SOC_SOF_ALDERLAKE)
     /* Alder Lake / Raptor Lake */
            {
                    .flags = FLAG_SOF | FLAG_SOF_ONLY_IF_DMIC_OR_SOUNDWIRE,
                    .device = PCI_DEVICE_ID_INTEL_HDA_ADL_S,
            },
     ....
     #endif
    
    so this entry is effective only when CONFIG_SND_SOC_SOF_ALDERLAKE is
    set.  If not set, there is no matching entry, hence it returns
    SND_INTEL_DSP_DRIVER_ANY as fallback.  OTOH, if the kconfig is set, it
    explicitly falls back to SND_INTEL_DSP_DRIVER_LEGACY when no DMIC or
    SoundWire is found -- that was the working scenario.  That being said,
    the current setup may be broken for modern Intel chips that are
    supposed to work with either SOF or legacy driver when the
    corresponding kconfig were missing.
    
    For addressing the problem above, this patch changes the fallback
    driver to the legacy driver, i.e. return SND_INTEL_DSP_DRIVER_LEGACY
    type as much as possible.  When CONFIG_SND_HDA_INTEL is also disabled,
    the fallback is set to SND_INTEL_DSP_DRIVER_ANY type, just to be sure.
    
    Reported-by: Askar Safin <[email protected]>
    Closes: https://lore.kernel.org/all/[email protected]/
    Tested-by: Askar Safin <[email protected]>
    Reviewed-by: Peter Ujfalusi <[email protected]>
    Signed-off-by: Takashi Iwai <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Askar Safin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ALSA: usb-audio: Update for native DSD support quirks [+ + +]

Author: Jussi Laako <[email protected]>
Date:   Thu Dec 11 17:22:21 2025 +0200

    ALSA: usb-audio: Update for native DSD support quirks
    
    [ Upstream commit da3a7efff64ec0d63af4499eea3a46a2e13b5797 ]
    
    Maintenance patch for native DSD support.
    
    Add set of missing device and vendor quirks; TEAC, Esoteric, Luxman and
    Musical Fidelity.
    
    Signed-off-by: Jussi Laako <[email protected]>
    Signed-off-by: Takashi Iwai <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: add off-on-delay-us for usdhc2 regulator [+ + +]

Author: Haibo Chen <[email protected]>
Date:   Wed Nov 19 11:22:40 2025 +0800

    arm64: dts: add off-on-delay-us for usdhc2 regulator
    
    [ Upstream commit ca643894a37a25713029b36cfe7d1bae515cac08 ]
    
    For SD card, according to the spec requirement, for sd card power reset
    operation, it need sd card supply voltage to be lower than 0.5v and keep
    over 1ms, otherwise, next time power back the sd card supply voltage to
    3.3v, sd card can't support SD3.0 mode again.
    
    To match such requirement on imx8qm-mek board, add 4.8ms delay between
    sd power off and power on.
    
    Fixes: 307fd14d4b14 ("arm64: dts: imx: add imx8qm mek support")
    Reviewed-by: Frank Li <[email protected]>
    Signed-off-by: Haibo Chen <[email protected]>
    Signed-off-by: Shawn Guo <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: imx8mp: Fix LAN8740Ai PHY reference clock on DH electronics i.MX8M Plus DHCOM [+ + +]

Author: Marek Vasut <[email protected]>
Date:   Tue Dec 2 14:41:51 2025 +0100

    arm64: dts: imx8mp: Fix LAN8740Ai PHY reference clock on DH electronics i.MX8M Plus DHCOM
    
    [ Upstream commit c63749a7ddc59ac6ec0b05abfa0a21af9f2c1d38 ]
    
    Add missing 'clocks' property to LAN8740Ai PHY node, to allow the PHY driver
    to manage LAN8740Ai CLKIN reference clock supply. This fixes sporadic link
    bouncing caused by interruptions on the PHY reference clock, by letting the
    PHY driver manage the reference clock and assure there are no interruptions.
    
    This follows the matching PHY driver recommendation described in commit
    bedd8d78aba3 ("net: phy: smsc: LAN8710/20: add phy refclk in support")
    
    Fixes: 8d6712695bc8 ("arm64: dts: imx8mp: Add support for DH electronics i.MX8M Plus DHCOM and PDK2")
    Signed-off-by: Marek Vasut <[email protected]>
    Tested-by: Christoph Niedermaier <[email protected]>
    Signed-off-by: Shawn Guo <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: imx8qm-ss-dma: correct the dma channels of lpuart [+ + +]

Author: Sherry Sun <[email protected]>
Date:   Wed Dec 3 09:59:56 2025 +0800

    arm64: dts: imx8qm-ss-dma: correct the dma channels of lpuart
    
    [ Upstream commit a988caeed9d918452aa0a68de2c6e94d86aa43ba ]
    
    The commit 616effc0272b5 ("arm64: dts: imx8: Fix lpuart DMA channel
    order") swap uart rx and tx channel at common imx8-ss-dma.dtsi. But miss
    update imx8qm-ss-dma.dtsi.
    
    The commit 5a8e9b022e569 ("arm64: dts: imx8qm-ss-dma: Pass lpuart
    dma-names") just simple add dma-names as binding doc requirement.
    
    Correct lpuart0 - lpuart3 dma rx and tx channels, and use defines for
    the FSL_EDMA_RX flag.
    
    Fixes: 5a8e9b022e56 ("arm64: dts: imx8qm-ss-dma: Pass lpuart dma-names")
    Signed-off-by: Sherry Sun <[email protected]>
    Reviewed-by: Frank Li <[email protected]>
    Reviewed-by: Alexander Stein <[email protected]>
    Signed-off-by: Shawn Guo <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: mba8mx: Fix Ethernet PHY IRQ support [+ + +]

Author: Alexander Stein <[email protected]>
Date:   Tue Dec 16 14:15:28 2025 +0100

    arm64: dts: mba8mx: Fix Ethernet PHY IRQ support
    
    [ Upstream commit 89e87d0dc87eb3654c9ae01afc4a18c1c6d1e523 ]
    
    Ethernet PHY interrupt mode is level triggered. Adjust the mode
    accordingly.
    
    Signed-off-by: Alexander Stein <[email protected]>
    Reviewed-by: Andrew Lunn <[email protected]>
    Fixes: 70cf622bb16e ("arm64: dts: mba8mx: Add Ethernet PHY IRQ support")
    Signed-off-by: Shawn Guo <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: ti: k3-am62-lp-sk-nand: Rename pinctrls to fix schema warnings [+ + +]

Author: Wadim Egorov <[email protected]>
Date:   Thu Nov 27 13:27:33 2025 +0100

    arm64: dts: ti: k3-am62-lp-sk-nand: Rename pinctrls to fix schema warnings
    
    [ Upstream commit cf5e8adebe77917a4cc95e43e461cdbd857591ce ]
    
    Rename pinctrl nodes to comply with naming conventions required by
    pinctrl-single schema.
    
    Fixes: e569152274fec ("arm64: dts: ti: am62-lp-sk: Add overlay for NAND expansion card")
    Signed-off-by: Wadim Egorov <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Nishanth Menon <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: Fix cleared E0POE bit after cpu_suspend()/resume() [+ + +]

Author: Yeoreum Yun <[email protected]>
Date:   Wed Jan 7 16:21:15 2026 +0000

    arm64: Fix cleared E0POE bit after cpu_suspend()/resume()
    
    commit bdf3f4176092df5281877cacf42f843063b4784d upstream.
    
    TCR2_ELx.E0POE is set during smp_init().
    However, this bit is not reprogrammed when the CPU enters suspension and
    later resumes via cpu_resume(), as __cpu_setup() does not re-enable E0POE
    and there is no save/restore logic for the TCR2_ELx system register.
    
    As a result, the E0POE feature no longer works after cpu_resume().
    
    To address this, save and restore TCR2_EL1 in the cpu_suspend()/cpu_resume()
    path, rather than adding related logic to __cpu_setup(), taking into account
    possible future extensions of the TCR2_ELx feature.
    
    Fixes: bf83dae90fbc ("arm64: enable the Permission Overlay Extension for EL0")
    Cc: <[email protected]> # 6.12.x
    Signed-off-by: Yeoreum Yun <[email protected]>
    Reviewed-by: Anshuman Khandual <[email protected]>
    Reviewed-by: Kevin Brodsky <[email protected]>
    Signed-off-by: Catalin Marinas <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels [+ + +]

Author: Sebastian Andrzej Siewior <[email protected]>
Date:   Tue Nov 11 16:54:37 2025 +0100

    ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels
    
    [ Upstream commit fedadc4137234c3d00c4785eeed3e747fe9036ae ]
    
    gup_pgd_range() is invoked with disabled interrupts and invokes
    __kmap_local_page_prot() via pte_offset_map(), gup_p4d_range().
    With HIGHPTE enabled, __kmap_local_page_prot() invokes kmap_high_get()
    which uses a spinlock_t via lock_kmap_any(). This leads to an
    sleeping-while-atomic error on PREEMPT_RT because spinlock_t becomes a
    sleeping lock and must not be acquired in atomic context.
    
    The loop in map_new_virtual() uses wait_queue_head_t for wake up which
    also is using a spinlock_t.
    
    Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT
    to allow the use of get_user_pages_fast().
    
    [arnd: rework patch to turn off HIGHPTE instead of HAVE_PAST_GUP]
    
    Co-developed-by: Arnd Bergmann <[email protected]>
    
    Acked-by: Linus Walleij <[email protected]>
    Reviewed-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
    Signed-off-by: Russell King (Oracle) <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ARM: dts: imx6q-ba16: fix RTC interrupt level [+ + +]

Author: Ian Ray <[email protected]>
Date:   Mon Dec 1 11:56:05 2025 +0200

    ARM: dts: imx6q-ba16: fix RTC interrupt level
    
    [ Upstream commit e6a4eedd49ce27c16a80506c66a04707e0ee0116 ]
    
    RTC interrupt level should be set to "LOW". This was revealed by the
    introduction of commit:
    
      f181987ef477 ("rtc: m41t80: use IRQ flags obtained from fwnode")
    
    which changed the way IRQ type is obtained.
    
    Fixes: 56c27310c1b4 ("ARM: dts: imx: Add Advantech BA-16 Qseven module")
    Signed-off-by: Ian Ray <[email protected]>
    Signed-off-by: Shawn Guo <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arp: do not assume dev_hard_header() does not change skb->head [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Wed Jan 7 21:22:50 2026 +0000

    arp: do not assume dev_hard_header() does not change skb->head
    
    [ Upstream commit c92510f5e3f82ba11c95991824a41e59a9c5ed81 ]
    
    arp_create() is the only dev_hard_header() caller
    making assumption about skb->head being unchanged.
    
    A recent commit broke this assumption.
    
    Initialize @arp pointer after dev_hard_header() call.
    
    Fixes: db5b4e39c4e6 ("ip6_gre: make ip6gre_header() robust")
    Reported-by: [email protected]
    Signed-off-by: Eric Dumazet <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: amd: yc: Add quirk for Honor MagicBook X16 2025 [+ + +]

Author: Andrew Elantsev <[email protected]>
Date:   Wed Dec 10 23:38:00 2025 +0300

    ASoC: amd: yc: Add quirk for Honor MagicBook X16 2025
    
    [ Upstream commit e2cb8ef0372665854fca6fa7b30b20dd35acffeb ]
    
    Add a DMI quirk for the Honor MagicBook X16 2025 laptop
    fixing the issue where the internal microphone was
    not detected.
    
    Signed-off-by: Andrew Elantsev <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: fsl_sai: Add missing registers to cache default [+ + +]

Author: Alexander Stein <[email protected]>
Date:   Tue Dec 16 11:22:45 2025 +0100

    ASoC: fsl_sai: Add missing registers to cache default
    
    [ Upstream commit 90ed688792a6b7012b3e8a2f858bc3fe7454d0eb ]
    
    Drivers does cache sync during runtime resume, setting all writable
    registers. Not all writable registers are set in cache default, resulting
    in the erorr message:
      fsl-sai 30c30000.sai: using zero-initialized flat cache, this may cause
      unexpected behavior
    
    Fix this by adding missing writable register defaults.
    
    Signed-off-by: Alexander Stein <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: rockchip: Fix Wvoid-pointer-to-enum-cast warning (again) [+ + +]

Author: Krzysztof Kozlowski <[email protected]>
Date:   Wed Dec 3 15:16:45 2025 +0100

    ASoC: rockchip: Fix Wvoid-pointer-to-enum-cast warning (again)
    
    [ Upstream commit 57d508b5f718730f74b11e0dc9609ac7976802d1 ]
    
    'version' is an enum, thus cast of pointer on 64-bit compile test with
    clang W=1 causes:
    
      rockchip_pdm.c:583:17: error: cast to smaller integer type 'enum rk_pdm_version' from 'const void *' [-Werror,-Wvoid-pointer-to-enum-cast]
    
    This was already fixed in commit 49a4a8d12612 ("ASoC: rockchip: Fix
    Wvoid-pointer-to-enum-cast warning") but then got bad in
    commit 9958d85968ed ("ASoC: Use device_get_match_data()").
    
    Discussion on LKML also pointed out that 'uintptr_t' is not the correct
    type and either 'kernel_ulong_t' or 'unsigned long' should be used,
    with several arguments towards the latter [1].
    
    Link: https://lore.kernel.org/r/CAMuHMdX7t=mabqFE5O-Cii3REMuyaePHmqX+j_mqyrn6XXzsoA@mail.gmail.com/ [1]
    Signed-off-by: Krzysztof Kozlowski <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ata: libata-core: Disable LPM on ST2000DM008-2FR102 [+ + +]

Author: Niklas Cassel <[email protected]>
Date:   Tue Dec 9 05:24:00 2025 +0100

    ata: libata-core: Disable LPM on ST2000DM008-2FR102
    
    [ Upstream commit ba624ba88d9f5c3e2ace9bb6697dbeb05b2dbc44 ]
    
    According to a user report, the ST2000DM008-2FR102 has problems with LPM.
    
    Reported-by: Emerson Pinter <[email protected]>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220693
    Signed-off-by: Niklas Cassel <[email protected]>
    Signed-off-by: Damien Le Moal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

atm: Fix dma_free_coherent() size [+ + +]

Author: Thomas Fourier <[email protected]>
Date:   Wed Jan 7 10:01:36 2026 +0100

    atm: Fix dma_free_coherent() size
    
    commit 4d984b0574ff708e66152763fbfdef24ea40933f upstream.
    
    The size of the buffer is not the same when alloc'd with
    dma_alloc_coherent() in he_init_tpdrq() and freed.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: <[email protected]>
    Signed-off-by: Thomas Fourier <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

bnxt_en: Fix potential data corruption with HW GRO/LRO [+ + +]

Author: Srijit Bose <[email protected]>
Date:   Wed Dec 31 00:36:25 2025 -0800

    bnxt_en: Fix potential data corruption with HW GRO/LRO
    
    [ Upstream commit ffeafa65b2b26df2f5b5a6118d3174f17bd12ec5 ]
    
    Fix the max number of bits passed to find_first_zero_bit() in
    bnxt_alloc_agg_idx().  We were incorrectly passing the number of
    long words.  find_first_zero_bit() may fail to find a zero bit and
    cause a wrong ID to be used.  If the wrong ID is already in use, this
    can cause data corruption.  Sometimes an error like this can also be
    seen:
    
    bnxt_en 0000:83:00.0 enp131s0np0: TPA end agg_buf 2 != expected agg_bufs 1
    
    Fix it by passing the correct number of bits MAX_TPA_P5.  Use
    DECLARE_BITMAP() to more cleanly define the bitmap.  Add a sanity
    check to warn if a bit cannot be found and reset the ring [MChan].
    
    Fixes: ec4d8e7cf024 ("bnxt_en: Add TPA ID mapping logic for 57500 chips.")
    Reviewed-by: Ray Jui <[email protected]>
    Signed-off-by: Srijit Bose <[email protected]>
    Signed-off-by: Michael Chan <[email protected]>
    Reviewed-by: Vadim Fedorenko <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

bpf, test_run: Subtract size of xdp_frame from allowed metadata size [+ + +]

Author: Toke Høiland-Jørgensen <[email protected]>
Date:   Mon Jan 5 12:47:45 2026 +0100

    bpf, test_run: Subtract size of xdp_frame from allowed metadata size
    
    [ Upstream commit e558cca217790286e799a8baacd1610bda31b261 ]
    
    The xdp_frame structure takes up part of the XDP frame headroom,
    limiting the size of the metadata. However, in bpf_test_run, we don't
    take this into account, which makes it possible for userspace to supply
    a metadata size that is too large (taking up the entire headroom).
    
    If userspace supplies such a large metadata size in live packet mode,
    the xdp_update_frame_from_buff() call in xdp_test_run_init_page() call
    will fail, after which packet transmission proceeds with an
    uninitialised frame structure, leading to the usual Bad Stuff.
    
    The commit in the Fixes tag fixed a related bug where the second check
    in xdp_update_frame_from_buff() could fail, but did not add any
    additional constraints on the metadata size. Complete the fix by adding
    an additional check on the metadata size. Reorder the checks slightly to
    make the logic clearer and add a comment.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: b6f1f780b393 ("bpf, test_run: Fix packet size check for live packet mode")
    Reported-by: Yinhao Hu <[email protected]>
    Reported-by: Kaiyan Mei <[email protected]>
    Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
    Reviewed-by: Amery Hung <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K [+ + +]

Author: Yonghong Song <[email protected]>
Date:   Wed Jun 11 20:50:32 2025 -0700

    bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K
    
    [ Upstream commit 4fc012daf9c074772421c904357abf586336b1ca ]
    
    The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on
    arm64 with 64KB page:
       xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL
    
    In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on
    when constructing frags, with 64K page size, the frag data_len could
    be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail().
    
    To fix the failure, the xdp->frame_sz is set to be PAGE_SIZE so kernel
    can test different page size properly. With the kernel change, the user
    space and bpf prog needs adjustment. Currently, the MAX_SKB_FRAGS default
    value is 17, so for 4K page, the maximum packet size will be less than 68K.
    To test 64K page, a bigger maximum packet size than 68K is desired. So two
    different functions are implemented for subtest xdp_adjust_frags_tail_grow.
    Depending on different page size, different data input/output sizes are used
    to adapt with different page size.
    
    Signed-off-by: Yonghong Song <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Stable-dep-of: e558cca21779 ("bpf, test_run: Subtract size of xdp_frame from allowed metadata size")
    Signed-off-by: Sasha Levin <[email protected]>

bpf: Fix reference count leak in bpf_prog_test_run_xdp() [+ + +]

Author: Tetsuo Handa <[email protected]>
Date:   Thu Jan 8 21:36:48 2026 +0900

    bpf: Fix reference count leak in bpf_prog_test_run_xdp()
    
    [ Upstream commit ec69daabe45256f98ac86c651b8ad1b2574489a7 ]
    
    syzbot is reporting
    
      unregister_netdevice: waiting for sit0 to become free. Usage count = 2
    
    problem. A debug printk() patch found that a refcount is obtained at
    xdp_convert_md_to_buff() from bpf_prog_test_run_xdp().
    
    According to commit ec94670fcb3b ("bpf: Support specifying ingress via
    xdp_md context in BPF_PROG_TEST_RUN"), the refcount obtained by
    xdp_convert_md_to_buff() will be released by xdp_convert_buff_to_md().
    
    Therefore, we can consider that the error handling path introduced by
    commit 1c1949982524 ("bpf: introduce frags support to
    bpf_prog_test_run_xdp()") forgot to call xdp_convert_buff_to_md().
    
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
    Fixes: 1c1949982524 ("bpf: introduce frags support to bpf_prog_test_run_xdp()")
    Signed-off-by: Tetsuo Handa <[email protected]>
    Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

bpf: Make variables in bpf_prog_test_run_xdp less confusing [+ + +]

Author: Amery Hung <[email protected]>
Date:   Mon Sep 22 16:33:53 2025 -0700

    bpf: Make variables in bpf_prog_test_run_xdp less confusing
    
    [ Upstream commit 7eb83bff02ad5e82e8c456c58717ef181c220870 ]
    
    Change the variable naming in bpf_prog_test_run_xdp() to make the
    overall logic less confusing. As different modes were added to the
    function over the time, some variables got overloaded, making
    it hard to understand and changing the code becomes error-prone.
    
    Replace "size" with "linear_sz" where it refers to the size of metadata
    and data. If "size" refers to input data size, use test.data_size_in
    directly.
    
    Replace "max_data_sz" with "max_linear_sz" to better reflect the fact
    that it is the maximum size of metadata and data (i.e., linear_sz). Also,
    xdp_rxq.frags_size is always PAGE_SIZE, so just set it directly instead
    of subtracting headroom and tailroom and adding them back.
    
    Signed-off-by: Amery Hung <[email protected]>
    Signed-off-by: Martin KaFai Lau <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Stable-dep-of: e558cca21779 ("bpf, test_run: Subtract size of xdp_frame from allowed metadata size")
    Signed-off-by: Sasha Levin <[email protected]>

bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN [+ + +]

Author: Amery Hung <[email protected]>
Date:   Mon Sep 22 16:33:54 2025 -0700

    bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN
    
    [ Upstream commit fe9544ed1a2e9217b2c5285c3a4ac0dc5a38bd7b ]
    
    To test bpf_xdp_pull_data(), an xdp packet containing fragments as well
    as free linear data area after xdp->data_end needs to be created.
    However, bpf_prog_test_run_xdp() always fills the linear area with
    data_in before creating fragments, leaving no space to pull data. This
    patch will allow users to specify the linear data size through
    ctx->data_end.
    
    Currently, ctx_in->data_end must match data_size_in and will not be the
    final ctx->data_end seen by xdp programs. This is because ctx->data_end
    is populated according to the xdp_buff passed to test_run. The linear
    data area available in an xdp_buff, max_linear_sz, is alawys filled up
    before copying data_in into fragments.
    
    This patch will allow users to specify the size of data that goes into
    the linear area. When ctx_in->data_end is different from data_size_in,
    only ctx_in->data_end bytes of data will be put into the linear area when
    creating the xdp_buff.
    
    While ctx_in->data_end will be allowed to be different from data_size_in,
    it cannot be larger than the data_size_in as there will be no data to
    copy from user space. If it is larger than the maximum linear data area
    size, the layout suggested by the user will not be honored. Data beyond
    max_linear_sz bytes will still be copied into fragments.
    
    Finally, since it is possible for a NIC to produce a xdp_buff with empty
    linear data area, allow it when calling bpf_test_init() from
    bpf_prog_test_run_xdp() so that we can test XDP kfuncs with such
    xdp_buff. This is done by moving lower-bound check to callers as most of
    them already do except bpf_prog_test_run_skb(). The change also fixes a
    bug that allows passing an xdp_buff with data < ETH_HLEN. This can
    happen when ctx is used and metadata is at least ETH_HLEN.
    
    Signed-off-by: Amery Hung <[email protected]>
    Signed-off-by: Martin KaFai Lau <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Stable-dep-of: e558cca21779 ("bpf, test_run: Subtract size of xdp_frame from allowed metadata size")
    Signed-off-by: Sasha Levin <[email protected]>

bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path [+ + +]

Author: Shardul Bankar <[email protected]>
Date:   Tue Oct 14 17:30:37 2025 +0530

    bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path
    
    commit 7f9ee5fc97e14682e36fe22ae2654c07e4998b82 upstream.
    
    Fix a memory leak in bpf_prog_test_run_xdp() where the context buffer
    allocated by bpf_ctx_init() is not freed when the function returns early
    due to a data size check.
    
    On the failing path:
      ctx = bpf_ctx_init(...);
      if (kattr->test.data_size_in - meta_sz < ETH_HLEN)
          return -EINVAL;
    
    The early return bypasses the cleanup label that kfree()s ctx, leading to a
    leak detectable by kmemleak under fuzzing. Change the return to jump to the
    existing free_ctx label.
    
    Fixes: fe9544ed1a2e ("bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN")
    Reported-by: BPF Runtime Fuzzer (BRF)
    Signed-off-by: Shardul Bankar <[email protected]>
    Signed-off-by: Martin KaFai Lau <[email protected]>
    Acked-by: Jiri Olsa <[email protected]>
    Acked-by: Daniel Borkmann <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress [+ + +]

Author: Alexandre Knecht <[email protected]>
Date:   Sun Dec 28 03:00:57 2025 +0100

    bridge: fix C-VLAN preservation in 802.1ad vlan_tunnel egress
    
    [ Upstream commit 3128df6be147768fe536986fbb85db1d37806a9f ]
    
    When using an 802.1ad bridge with vlan_tunnel, the C-VLAN tag is
    incorrectly stripped from frames during egress processing.
    
    br_handle_egress_vlan_tunnel() uses skb_vlan_pop() to remove the S-VLAN
    from hwaccel before VXLAN encapsulation. However, skb_vlan_pop() also
    moves any "next" VLAN from the payload into hwaccel:
    
        /* move next vlan tag to hw accel tag */
        __skb_vlan_pop(skb, &vlan_tci);
        __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
    
    For QinQ frames where the C-VLAN sits in the payload, this moves it to
    hwaccel where it gets lost during VXLAN encapsulation.
    
    Fix by calling __vlan_hwaccel_clear_tag() directly, which clears only
    the hwaccel S-VLAN and leaves the payload untouched.
    
    This path is only taken when vlan_tunnel is enabled and tunnel_info
    is configured, so 802.1Q bridges are unaffected.
    
    Tested with 802.1ad bridge + VXLAN vlan_tunnel, verified C-VLAN
    preserved in VXLAN payload via tcpdump.
    
    Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
    Signed-off-by: Alexandre Knecht <[email protected]>
    Reviewed-by: Ido Schimmel <[email protected]>
    Acked-by: Nikolay Aleksandrov <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

btrfs: add extra error messages for delalloc range related errors [+ + +]

Author: Qu Wenruo <[email protected]>
Date:   Mon Jan 12 09:55:51 2026 -0500

    btrfs: add extra error messages for delalloc range related errors
    
    [ Upstream commit 975a6a8855f45729a0fbfe2a8f2df2d3faef2a97 ]
    
    All the error handling bugs I hit so far are all -ENOSPC from either:
    
    - cow_file_range()
    - run_delalloc_nocow()
    - submit_uncompressed_range()
    
    Previously when those functions failed, there was no error message at
    all, making the debugging much harder.
    
    So here we introduce extra error messages for:
    
    - cow_file_range()
    - run_delalloc_nocow()
    - submit_uncompressed_range()
    - writepage_delalloc() when btrfs_run_delalloc_range() failed
    - extent_writepage() when extent_writepage_io() failed
    
    One example of the new debug error messages is the following one:
    
      run fstests generic/750 at 2024-12-08 12:41:41
      BTRFS: device fsid 461b25f5-e240-4543-8deb-e7c2bd01a6d3 devid 1 transid 8 /dev/mapper/test-scratch1 (253:4) scanned by mount (2436600)
      BTRFS info (device dm-4): first mount of filesystem 461b25f5-e240-4543-8deb-e7c2bd01a6d3
      BTRFS info (device dm-4): using crc32c (crc32c-arm64) checksum algorithm
      BTRFS info (device dm-4): forcing free space tree for sector size 4096 with page size 65536
      BTRFS info (device dm-4): using free-space-tree
      BTRFS warning (device dm-4): read-write for sector size 4096 with page size 65536 is experimental
      BTRFS info (device dm-4): checking UUID tree
      BTRFS error (device dm-4): cow_file_range failed, root=363 inode=412 start=503808 len=98304: -28
      BTRFS error (device dm-4): run_delalloc_nocow failed, root=363 inode=412 start=503808 len=98304: -28
      BTRFS error (device dm-4): failed to run delalloc range, root=363 ino=412 folio=458752 submit_bitmap=11-15 start=503808 len=98304: -28
    
    Which shows an error from cow_file_range() which is called inside a
    nocow write attempt, along with the extra bitmap from
    writepage_delalloc().
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Stable-dep-of: e9e3b22ddfa7 ("btrfs: fix beyond-EOF write handling")
    Signed-off-by: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: always detect conflicting inodes when logging inode refs [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Thu Dec 11 15:06:26 2025 +0000

    btrfs: always detect conflicting inodes when logging inode refs
    
    commit 7ba0b6461bc4edb3005ea6e00cdae189bcf908a5 upstream.
    
    After rename exchanging (either with the rename exchange operation or
    regular renames in multiple non-atomic steps) two inodes and at least
    one of them is a directory, we can end up with a log tree that contains
    only of the inodes and after a power failure that can result in an attempt
    to delete the other inode when it should not because it was not deleted
    before the power failure. In some case that delete attempt fails when
    the target inode is a directory that contains a subvolume inside it, since
    the log replay code is not prepared to deal with directory entries that
    point to root items (only inode items).
    
    1) We have directories "dir1" (inode A) and "dir2" (inode B) under the
       same parent directory;
    
    2) We have a file (inode C) under directory "dir1" (inode A);
    
    3) We have a subvolume inside directory "dir2" (inode B);
    
    4) All these inodes were persisted in a past transaction and we are
       currently at transaction N;
    
    5) We rename the file (inode C), so at btrfs_log_new_name() we update
       inode C's last_unlink_trans to N;
    
    6) We get a rename exchange for "dir1" (inode A) and "dir2" (inode B),
       so after the exchange "dir1" is inode B and "dir2" is inode A.
       During the rename exchange we call btrfs_log_new_name() for inodes
       A and B, but because they are directories, we don't update their
       last_unlink_trans to N;
    
    7) An fsync against the file (inode C) is done, and because its inode
       has a last_unlink_trans with a value of N we log its parent directory
       (inode A) (through btrfs_log_all_parents(), called from
       btrfs_log_inode_parent()).
    
    8) So we end up with inode B not logged, which now has the old name
       of inode A. At copy_inode_items_to_log(), when logging inode A, we
       did not check if we had any conflicting inode to log because inode
       A has a generation lower than the current transaction (created in
       a past transaction);
    
    9) After a power failure, when replaying the log tree, since we find that
       inode A has a new name that conflicts with the name of inode B in the
       fs tree, we attempt to delete inode B... this is wrong since that
       directory was never deleted before the power failure, and because there
       is a subvolume inside that directory, attempting to delete it will fail
       since replay_dir_deletes() and btrfs_unlink_inode() are not prepared
       to deal with dir items that point to roots instead of inodes.
    
       When that happens the mount fails and we get a stack trace like the
       following:
    
       [87.2314] BTRFS info (device dm-0): start tree-log replay
       [87.2318] BTRFS critical (device dm-0): failed to delete reference to subvol, root 5 inode 256 parent 259
       [87.2332] ------------[ cut here ]------------
       [87.2338] BTRFS: Transaction aborted (error -2)
       [87.2346] WARNING: CPU: 1 PID: 638968 at fs/btrfs/inode.c:4345 __btrfs_unlink_inode+0x416/0x440 [btrfs]
       [87.2368] Modules linked in: btrfs loop dm_thin_pool (...)
       [87.2470] CPU: 1 UID: 0 PID: 638968 Comm: mount Tainted: G        W           6.18.0-rc7-btrfs-next-218+ #2 PREEMPT(full)
       [87.2489] Tainted: [W]=WARN
       [87.2494] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
       [87.2514] RIP: 0010:__btrfs_unlink_inode+0x416/0x440 [btrfs]
       [87.2538] Code: c0 89 04 24 (...)
       [87.2568] RSP: 0018:ffffc0e741f4b9b8 EFLAGS: 00010286
       [87.2574] RAX: 0000000000000000 RBX: ffff9d3ec8a6cf60 RCX: 0000000000000000
       [87.2582] RDX: 0000000000000002 RSI: ffffffff84ab45a1 RDI: 00000000ffffffff
       [87.2591] RBP: ffff9d3ec8a6ef20 R08: 0000000000000000 R09: ffffc0e741f4b840
       [87.2599] R10: ffff9d45dc1fffa8 R11: 0000000000000003 R12: ffff9d3ee26d77e0
       [87.2608] R13: ffffc0e741f4ba98 R14: ffff9d4458040800 R15: ffff9d44b6b7ca10
       [87.2618] FS:  00007f7b9603a840(0000) GS:ffff9d4658982000(0000) knlGS:0000000000000000
       [87.2629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [87.2637] CR2: 00007ffc9ec33b98 CR3: 000000011273e003 CR4: 0000000000370ef0
       [87.2648] Call Trace:
       [87.2651]  <TASK>
       [87.2654]  btrfs_unlink_inode+0x15/0x40 [btrfs]
       [87.2661]  unlink_inode_for_log_replay+0x27/0xf0 [btrfs]
       [87.2669]  check_item_in_log+0x1ea/0x2c0 [btrfs]
       [87.2676]  replay_dir_deletes+0x16b/0x380 [btrfs]
       [87.2684]  fixup_inode_link_count+0x34b/0x370 [btrfs]
       [87.2696]  fixup_inode_link_counts+0x41/0x160 [btrfs]
       [87.2703]  btrfs_recover_log_trees+0x1ff/0x7c0 [btrfs]
       [87.2711]  ? __pfx_replay_one_buffer+0x10/0x10 [btrfs]
       [87.2719]  open_ctree+0x10bb/0x15f0 [btrfs]
       [87.2726]  btrfs_get_tree.cold+0xb/0x16c [btrfs]
       [87.2734]  ? fscontext_read+0x15c/0x180
       [87.2740]  ? rw_verify_area+0x50/0x180
       [87.2746]  vfs_get_tree+0x25/0xd0
       [87.2750]  vfs_cmd_create+0x59/0xe0
       [87.2755]  __do_sys_fsconfig+0x4f6/0x6b0
       [87.2760]  do_syscall_64+0x50/0x1220
       [87.2764]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
       [87.2770] RIP: 0033:0x7f7b9625f4aa
       [87.2775] Code: 73 01 c3 48 (...)
       [87.2803] RSP: 002b:00007ffc9ec35b08 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
       [87.2817] RAX: ffffffffffffffda RBX: 0000558bfa91ac20 RCX: 00007f7b9625f4aa
       [87.2829] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
       [87.2842] RBP: 0000558bfa91b120 R08: 0000000000000000 R09: 0000000000000000
       [87.2854] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
       [87.2864] R13: 00007f7b963f1580 R14: 00007f7b963f326c R15: 00007f7b963d8a23
       [87.2877]  </TASK>
       [87.2882] ---[ end trace 0000000000000000 ]---
       [87.2891] BTRFS: error (device dm-0 state A) in __btrfs_unlink_inode:4345: errno=-2 No such entry
       [87.2904] BTRFS: error (device dm-0 state EAO) in do_abort_log_replay:191: errno=-2 No such entry
       [87.2915] BTRFS critical (device dm-0 state EAO): log tree (for root 5) leaf currently being processed (slot 7 key (258 12 257)):
       [87.2929] BTRFS info (device dm-0 state EAO): leaf 30736384 gen 10 total ptrs 7 free space 15712 owner 18446744073709551610
       [87.2929] BTRFS info (device dm-0 state EAO): refs 3 lock_owner 0 current 638968
       [87.2929]      item 0 key (257 INODE_ITEM 0) itemoff 16123 itemsize 160
       [87.2929]              inode generation 9 transid 10 size 0 nbytes 0
       [87.2929]              block group 0 mode 40755 links 1 uid 0 gid 0
       [87.2929]              rdev 0 sequence 7 flags 0x0
       [87.2929]              atime 1765464494.678070921
       [87.2929]              ctime 1765464494.686606513
       [87.2929]              mtime 1765464494.686606513
       [87.2929]              otime 1765464494.678070921
       [87.2929]      item 1 key (257 INODE_REF 256) itemoff 16109 itemsize 14
       [87.2929]              index 4 name_len 4
       [87.2929]      item 2 key (257 DIR_LOG_INDEX 2) itemoff 16101 itemsize 8
       [87.2929]              dir log end 2
       [87.2929]      item 3 key (257 DIR_LOG_INDEX 3) itemoff 16093 itemsize 8
       [87.2929]              dir log end 18446744073709551615
       [87.2930]      item 4 key (257 DIR_INDEX 3) itemoff 16060 itemsize 33
       [87.2930]              location key (258 1 0) type 1
       [87.2930]              transid 10 data_len 0 name_len 3
       [87.2930]      item 5 key (258 INODE_ITEM 0) itemoff 15900 itemsize 160
       [87.2930]              inode generation 9 transid 10 size 0 nbytes 0
       [87.2930]              block group 0 mode 100644 links 1 uid 0 gid 0
       [87.2930]              rdev 0 sequence 2 flags 0x0
       [87.2930]              atime 1765464494.678456467
       [87.2930]              ctime 1765464494.686606513
       [87.2930]              mtime 1765464494.678456467
       [87.2930]              otime 1765464494.678456467
       [87.2930]      item 6 key (258 INODE_REF 257) itemoff 15887 itemsize 13
       [87.2930]              index 3 name_len 3
       [87.2930] BTRFS critical (device dm-0 state EAO): log replay failed in unlink_inode_for_log_replay:1045 for root 5, stage 3, with error -2: failed to unlink inode 256 parent dir 259 name subvol root 5
       [87.2963] BTRFS: error (device dm-0 state EAO) in btrfs_recover_log_trees:7743: errno=-2 No such entry
       [87.2981] BTRFS: error (device dm-0 state EAO) in btrfs_replay_log:2083: errno=-2 No such entry (Failed to recover log tr
    
    So fix this by changing copy_inode_items_to_log() to always detect if
    there are conflicting inodes for the ref/extref of the inode being logged
    even if the inode was created in a past transaction.
    
    A test case for fstests will follow soon.
    
    CC: [email protected] # 6.1+
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: fix beyond-EOF write handling [+ + +]

Author: Qu Wenruo <[email protected]>
Date:   Mon Jan 12 09:55:55 2026 -0500

    btrfs: fix beyond-EOF write handling
    
    [ Upstream commit e9e3b22ddfa760762b696ac6417c8d6edd182e49 ]
    
    [BUG]
    For the following write sequence with 64K page size and 4K fs block size,
    it will lead to file extent items to be inserted without any data
    checksum:
    
      mkfs.btrfs -s 4k -f $dev > /dev/null
      mount $dev $mnt
      xfs_io -f -c "pwrite 0 16k" -c "pwrite 32k 4k" -c pwrite "60k 64K" \
                -c "truncate 16k" $mnt/foobar
      umount $mnt
    
    This will result the following 2 file extent items to be inserted (extra
    trace point added to insert_ordered_extent_file_extent()):
    
      btrfs_finish_one_ordered: root=5 ino=257 file_off=61440 num_bytes=4096 csum_bytes=0
      btrfs_finish_one_ordered: root=5 ino=257 file_off=0 num_bytes=16384 csum_bytes=16384
    
    Note for file offset 60K, we're inserting a file extent without any
    data checksum.
    
    Also note that range [32K, 36K) didn't reach
    insert_ordered_extent_file_extent(), which is the correct behavior as
    that OE is fully truncated, should not result any file extent.
    
    Although file extent at 60K will be later dropped by btrfs_truncate(),
    if the transaction got committed after file extent inserted but before
    the file extent dropping, we will have a small window where we have a
    file extent beyond EOF and without any data checksum.
    
    That will cause "btrfs check" to report error.
    
    [CAUSE]
    The sequence happens like this:
    
    - Buffered write dirtied the page cache and updated isize
    
      Now the inode size is 64K, with the following page cache layout:
    
      0             16K             32K              48K           64K
      |/////////////|               |//|                        |//|
    
    - Truncate the inode to 16K
      Which will trigger writeback through:
    
      btrfs_setsize()
      |- truncate_setsize()
      |  Now the inode size is set to 16K
      |
      |- btrfs_truncate()
         |- btrfs_wait_ordered_range() for [16K, u64(-1)]
            |- btrfs_fdatawrite_range() for [16K, u64(-1)}
               |- extent_writepage() for folio 0
                  |- writepage_delalloc()
                  |  Generated OE for [0, 16K), [32K, 36K] and [60K, 64K)
                  |
                  |- extent_writepage_io()
    
      Then inside extent_writepage_io(), the dirty fs blocks are handled
      differently:
    
      - Submit write for range [0, 16K)
        As they are still inside the inode size (16K).
    
      - Mark OE [32K, 36K) as truncated
        Since we only call btrfs_lookup_first_ordered_range() once, which
        returned the first OE after file offset 16K.
    
      - Mark all OEs inside range [16K, 64K) as finished
        Which will mark OE ranges [32K, 36K) and [60K, 64K) as finished.
    
        For OE [32K, 36K) since it's already marked as truncated, and its
        truncated length is 0, no file extent will be inserted.
    
        For OE [60K, 64K) it has never been submitted thus has no data
        checksum, and we insert the file extent as usual.
        This is the root cause of file extent at 60K to be inserted without
        any data checksum.
    
      - Clear dirty flags for range [16K, 64K)
        It is the function btrfs_folio_clear_dirty() which searches and clears
        any dirty blocks inside that range.
    
    [FIX]
    The bug itself was introduced a long time ago, way before subpage and
    large folio support.
    
    At that time, fs block size must match page size, thus the range
    [cur, end) is just one fs block.
    
    But later with subpage and large folios, the same range [cur, end)
    can have multiple blocks and ordered extents.
    
    Later commit 18de34daa7c6 ("btrfs: truncate ordered extent when skipping
    writeback past i_size") was fixing a bug related to subpage/large
    folios, but it's still utilizing the old range [cur, end), meaning only
    the first OE will be marked as truncated.
    
    The proper fix here is to make EOF handling block-by-block, not trying
    to handle the whole range to @end.
    
    By this we always locate and truncate the OE for every dirty block.
    
    CC: [email protected] # 5.15+
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: fix error handling of submit_uncompressed_range() [+ + +]

Author: Qu Wenruo <[email protected]>
Date:   Mon Jan 12 09:55:49 2026 -0500

    btrfs: fix error handling of submit_uncompressed_range()
    
    [ Upstream commit a7858d5c36cae52eaf3048490b05c0b19086073b ]
    
    [BUG]
    If we failed to compress the range, or cannot reserve a large enough
    data extent (e.g. too fragmented free space), we will fall back to
    submit_uncompressed_range().
    
    But inside submit_uncompressed_range(), run_delalloc_cow() can also fail
    due to -ENOSPC or any other error.
    
    In that case there are 3 bugs in the error handling:
    
    1) Double freeing for the same ordered extent
       This can lead to crash due to ordered extent double accounting
    
    2) Start/end writeback without updating the subpage writeback bitmap
    
    3) Unlock the folio without clear the subpage lock bitmap
    
    Both bugs 2) and 3) will crash the kernel if the btrfs block size is
    smaller than folio size, as the next time the folio gets writeback/lock
    updates, subpage will find the bitmap already have the range set,
    triggering an ASSERT().
    
    [CAUSE]
    Bug 1) happens in the following call chain:
    
      submit_uncompressed_range()
      |- run_delalloc_cow()
      |  |- cow_file_range()
      |     |- btrfs_reserve_extent()
      |        Failed with -ENOSPC or whatever error
      |
      |- btrfs_clean_up_ordered_extents()
      |  |- btrfs_mark_ordered_io_finished()
      |     Which cleans all the ordered extents in the async_extent range.
      |
      |- btrfs_mark_ordered_io_finished()
         Which cleans the folio range.
    
    The finished ordered extents may not be immediately removed from the
    ordered io tree, as they are removed inside a work queue.
    
    So the second btrfs_mark_ordered_io_finished() may find the finished but
    not-yet-removed ordered extents, and double free them.
    
    Furthermore, the second btrfs_mark_ordered_io_finished() is not subpage
    compatible, as it uses fixed folio_pos() with PAGE_SIZE, which can cover
    other ordered extents.
    
    Bugs 2) and 3) are more straightforward, btrfs just calls folio_unlock(),
    folio_start_writeback() and folio_end_writeback(), other than the helpers
    which handle subpage cases.
    
    [FIX]
    For bug 1) since the first btrfs_cleanup_ordered_extents() call is
    handling the whole range, we should not do the second
    btrfs_mark_ordered_io_finished() call.
    
    And for the first btrfs_cleanup_ordered_extents(), we no longer need to
    pass the @locked_page parameter, as we are already in the async extent
    context, thus will never rely on the error handling inside
    btrfs_run_delalloc_range().
    
    So just let the btrfs_clean_up_ordered_extents() handle every folio
    equally.
    
    For bug 2) we should not even call
    folio_start_writeback()/folio_end_writeback() anymore.
    As the error handling protocol, cow_file_range() should clear
    dirty flag and start/finish the writeback for the whole range passed in.
    
    For bug 3) just change the folio_unlock() to btrfs_folio_end_lock()
    helper.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Stable-dep-of: e9e3b22ddfa7 ("btrfs: fix beyond-EOF write handling")
    Signed-off-by: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: fix NULL dereference on root when tracing inode eviction [+ + +]

Author: Miquel Sabaté Solà <[email protected]>
Date:   Tue Oct 21 11:11:25 2025 +0200

    btrfs: fix NULL dereference on root when tracing inode eviction
    
    [ Upstream commit f157dd661339fc6f5f2b574fe2429c43bd309534 ]
    
    When evicting an inode the first thing we do is to setup tracing for it,
    which implies fetching the root's id. But in btrfs_evict_inode() the
    root might be NULL, as implied in the next check that we do in
    btrfs_evict_inode().
    
    Hence, we either should set the ->root_objectid to 0 in case the root is
    NULL, or we move tracing setup after checking that the root is not
    NULL. Setting the rootid to 0 at least gives us the possibility to trace
    this call even in the case when the root is NULL, so that's the solution
    taken here.
    
    Fixes: 1abe9b8a138c ("Btrfs: add initial tracepoint support for btrfs")
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=d991fea1b4b23b1f6bf8
    Signed-off-by: Miquel Sabaté Solà <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

btrfs: fix qgroup_snapshot_quick_inherit() squota bug [+ + +]

Author: Boris Burkov <[email protected]>
Date:   Mon Dec 1 12:47:14 2025 -0800

    btrfs: fix qgroup_snapshot_quick_inherit() squota bug
    
    [ Upstream commit 7ee19a59a75e3d5b9ec00499b86af8e2a46fbe86 ]
    
    qgroup_snapshot_quick_inherit() detects conditions where the snapshot
    destination would land in the same parent qgroup as the snapshot source
    subvolume. In this case we can avoid costly qgroup calculations and just
    add the nodesize of the new snapshot to the parent.
    
    However, in the case of squotas this is actually a double count, and
    also an undercount for deeper qgroup nestings.
    
    The following annotated script shows the issue:
    
      btrfs quota enable --simple "$mnt"
    
      # Create 2-level qgroup hierarchy
      btrfs qgroup create 2/100 "$mnt"  # Q2 (level 2)
      btrfs qgroup create 1/100 "$mnt"  # Q1 (level 1)
      btrfs qgroup assign 1/100 2/100 "$mnt"
    
      # Create base subvolume
      btrfs subvolume create "$mnt/base" >/dev/null
      base_id=$(btrfs subvolume show "$mnt/base" | grep 'Subvolume ID:' | awk '{print $3}')
    
      # Create intermediate snapshot and add to Q1
      btrfs subvolume snapshot "$mnt/base" "$mnt/intermediate" >/dev/null
      inter_id=$(btrfs subvolume show "$mnt/intermediate" | grep 'Subvolume ID:' | awk '{print $3}')
      btrfs qgroup assign "0/$inter_id" 1/100 "$mnt"
    
      # Create working snapshot with --inherit (auto-adds to Q1)
      # src=intermediate (in only Q1)
      # dst=snap (inheriting only into Q1)
      # This double counts the 16k nodesize of the snapshot in Q1, and
      # undercounts it in Q2.
      btrfs subvolume snapshot -i 1/100 "$mnt/intermediate" "$mnt/snap" >/dev/null
      snap_id=$(btrfs subvolume show "$mnt/snap" | grep 'Subvolume ID:' | awk '{print $3}')
    
      # Fully complete snapshot creation
      sync
    
      # Delete working snapshot
      # Q1 and Q2 will lose the full snap usage
      btrfs subvolume delete "$mnt/snap" >/dev/null
    
      # Delete intermediate and remove from Q1
      # Q1 and Q2 will lose the full intermediate usage
      btrfs qgroup remove "0/$inter_id" 1/100 "$mnt"
      btrfs subvolume delete "$mnt/intermediate" >/dev/null
    
      # Q1 should be at 0, but still has 16k. Q2 is "correct" at 0 (for now...)
    
      # Trigger cleaner, wait for deletions
      mount -o remount,sync=1 "$mnt"
      btrfs subvolume sync "$mnt" "$snap_id"
      btrfs subvolume sync "$mnt" "$inter_id"
    
      # Remove Q1 from Q2
      # Frees 16k more from Q2, underflowing it to 16EiB
      btrfs qgroup remove 1/100 2/100 "$mnt"
    
      # And show the bad state:
      btrfs qgroup show -pc "$mnt"
    
            Qgroupid    Referenced    Exclusive Parent   Child   Path
            --------    ----------    --------- ------   -----   ----
            0/5           16.00KiB     16.00KiB -        -       <toplevel>
            0/256         16.00KiB     16.00KiB -        -       base
            1/100         16.00KiB     16.00KiB -        -       <0 member qgroups>
            2/100         16.00EiB     16.00EiB -        -       <0 member qgroups>
    
    Fix this by simply not doing this quick inheritance with squotas.
    
    I suspect that it is also wrong in normal qgroups to not recurse up the
    qgroup tree in the quick inherit case, though other consistency checks
    will likely fix it anyway.
    
    Fixes: b20fe56cd285 ("btrfs: qgroup: allow quick inherit if snapshot is created and added to the same parent")
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Boris Burkov <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

btrfs: only enforce free space tree if v1 cache is required for bs < ps cases [+ + +]

Author: Qu Wenruo <[email protected]>
Date:   Thu Dec 18 15:15:28 2025 +1030

    btrfs: only enforce free space tree if v1 cache is required for bs < ps cases
    
    [ Upstream commit 30bcf4e824aa37d305502f52e1527c7b1eabef3d ]
    
    [BUG]
    Since the introduction of btrfs bs < ps support, v1 cache was never on
    the plan due to its hard coded PAGE_SIZE usage, and the future plan to
    properly deprecate it.
    
    However for bs < ps cases, even if 'nospace_cache,clear_cache' mount
    option is specified, it's never respected and free space tree is always
    enabled:
    
      mkfs.btrfs -f -O ^bgt,fst $dev
      mount $dev $mnt -o clear_cache,nospace_cache
      umount $mnt
      btrfs ins dump-super $dev
      ...
      compat_ro_flags               0x3
                            ( FREE_SPACE_TREE |
                              FREE_SPACE_TREE_VALID )
      ...
    
    This means a different behavior compared to bs >= ps cases.
    
    [CAUSE]
    The forcing usage of v2 space cache is done inside
    btrfs_set_free_space_cache_settings(), however it never checks if we're
    even using space cache but always enabling v2 cache.
    
    [FIX]
    Instead unconditionally enable v2 cache, only forcing v2 cache if the
    old v1 cache is required.
    
    Now v2 space cache can be properly disabled on bs < ps cases:
    
      mkfs.btrfs -f -O ^bgt,fst $dev
      mount $dev $mnt -o clear_cache,nospace_cache
      umount $mnt
      btrfs ins dump-super $dev
      ...
      compat_ro_flags               0x0
      ...
    
    Fixes: 9f73f1aef98b ("btrfs: force v2 space cache usage for subpage mount")
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

btrfs: qgroup: update all parent qgroups when doing quick inherit [+ + +]

Author: Qu Wenruo <[email protected]>
Date:   Thu Dec 4 14:38:23 2025 +1030

    btrfs: qgroup: update all parent qgroups when doing quick inherit
    
    [ Upstream commit 68d4b3fa18d72b7f649e83012e7e08f1881f6b75 ]
    
    [BUG]
    There is a bug that if a subvolume has multi-level parent qgroups, and
    is able to do a quick inherit, only the direct parent qgroup got
    updated:
    
      mkfs.btrfs  -f -O quota $dev
      mount $dev $mnt
      btrfs subv create $mnt/subv1
      btrfs qgroup create 1/100 $mnt
      btrfs qgroup create 2/100 $mnt
      btrfs qgroup assign 1/100 2/100 $mnt
      btrfs qgroup assign 0/256 1/100 $mnt
      btrfs qgroup show -p --sync $mnt
    
      Qgroupid    Referenced    Exclusive Parent     Path
      --------    ----------    --------- ------     ----
      0/5           16.00KiB     16.00KiB -          <toplevel>
      0/256         16.00KiB     16.00KiB 1/100      subv1
      1/100         16.00KiB     16.00KiB 2/100      2/100<1 member qgroup>
      2/100         16.00KiB     16.00KiB -          <0 member qgroups>
    
      btrfs subv snap -i 1/100 $mnt/subv1 $mnt/snap1
      btrfs qgroup show -p --sync $mnt
    
      Qgroupid    Referenced    Exclusive Parent     Path
      --------    ----------    --------- ------     ----
      0/5           16.00KiB     16.00KiB -          <toplevel>
      0/256         16.00KiB     16.00KiB 1/100      subv1
      0/257         16.00KiB     16.00KiB 1/100      snap1
      1/100         32.00KiB     32.00KiB 2/100      2/100<1 member qgroup>
      2/100         16.00KiB     16.00KiB -          <0 member qgroups>
      # Note that 2/100 is not updated, and qgroup numbers are inconsistent
    
      umount $mnt
    
    [CAUSE]
    If the snapshot source subvolume belongs to a parent qgroup, and the new
    snapshot target is also added to the new same parent qgroup, we allow a
    quick update without marking qgroup inconsistent.
    
    But that quick update only update the parent qgroup, without checking if
    there is any more parent qgroups.
    
    [FIX]
    Iterate through all parent qgroups during the quick inherit.
    
    Reported-by: Boris Burkov <[email protected]>
    Fixes: b20fe56cd285 ("btrfs: qgroup: allow quick inherit if snapshot is created and added to the same parent")
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

btrfs: remove btrfs_fs_info::sectors_per_page [+ + +]

Author: Qu Wenruo <[email protected]>
Date:   Mon Jan 12 09:55:52 2026 -0500

    btrfs: remove btrfs_fs_info::sectors_per_page
    
    [ Upstream commit 619611e87fcca1fdaa67c2bf6b030863ab90216e ]
    
    For the future large folio support, our filemap can have folios with
    different sizes, thus we can no longer rely on a fixed blocks_per_page
    value.
    
    To prepare for that future, here we do:
    
    - Remove btrfs_fs_info::sectors_per_page
    
    - Introduce a helper, btrfs_blocks_per_folio()
      Which uses the folio size to calculate the number of blocks for each
      folio.
    
    - Migrate the existing btrfs_fs_info::sectors_per_page to use that
      helper
      There are some exceptions:
    
      * Metadata nodesize < page size support
        In the future, even if we support large folios, we will only
        allocate a folio that matches our nodesize.
        Thus we won't have a folio covering multiple metadata unless
        nodesize < page size.
    
      * Existing subpage bitmap dump
        We use a single unsigned long to store the bitmap.
        That means until we change the bitmap dumping code, our upper limit
        for folio size will only be 256K (4K block size, 64 bit unsigned
        long).
    
      * btrfs_is_subpage() check
        This will be migrated into a future patch.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Stable-dep-of: e9e3b22ddfa7 ("btrfs: fix beyond-EOF write handling")
    Signed-off-by: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: subpage: dump the involved bitmap when ASSERT() failed [+ + +]

Author: Qu Wenruo <[email protected]>
Date:   Mon Jan 12 09:55:50 2026 -0500

    btrfs: subpage: dump the involved bitmap when ASSERT() failed
    
    [ Upstream commit 61d730731b47eeee42ad11fc71e145d269acab8d ]
    
    For btrfs_folio_assert_not_dirty() and btrfs_folio_set_lock(), we call
    bitmap_test_range_all_zero() to ensure the involved range has no
    dirty/lock bit already set.
    
    However with my recent enhanced delalloc range error handling, I was
    hitting the ASSERT() inside btrfs_folio_set_lock(), and it turns out
    that some error handling path is not properly updating the folio flags.
    
    So add some extra dumping for the ASSERTs to dump the involved bitmap
    to help debug.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Stable-dep-of: e9e3b22ddfa7 ("btrfs: fix beyond-EOF write handling")
    Signed-off-by: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: tracepoints: use btrfs_root_id() to get the id of a root [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Thu Apr 3 16:23:41 2025 +0100

    btrfs: tracepoints: use btrfs_root_id() to get the id of a root
    
    [ Upstream commit 0f987c099d22c3b8c7d94fd13f957792e46f79c9 ]
    
    Instead of open coding btrfs_root_id() to get the ID of a root, use the
    helper in the trace points, which also makes the code less verbose.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Stable-dep-of: f157dd661339 ("btrfs: fix NULL dereference on root when tracing inode eviction")
    Signed-off-by: Sasha Levin <[email protected]>

btrfs: truncate ordered extent when skipping writeback past i_size [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Mon Jan 12 09:55:53 2026 -0500

    btrfs: truncate ordered extent when skipping writeback past i_size
    
    [ Upstream commit 18de34daa7c62c830be533aace6b7c271e8e95cf ]
    
    While running test case btrfs/192 from fstests with support for large
    folios (needs CONFIG_BTRFS_EXPERIMENTAL=y) I ended up getting very sporadic
    btrfs check failures reporting that csum items were missing. Looking into
    the issue it turned out that btrfs check searches for csum items of a file
    extent item with a range that spans beyond the i_size of a file and we
    don't have any, because the kernel's writeback code skips submitting bios
    for ranges beyond eof. It's not expected however to find a file extent item
    that crosses the rounded up (by the sector size) i_size value, but there is
    a short time window where we can end up with a transaction commit leaving
    this small inconsistency between the i_size and the last file extent item.
    
    Example btrfs check output when this happens:
    
      $ btrfs check /dev/sdc
      Opening filesystem to check...
      Checking filesystem on /dev/sdc
      UUID: 69642c61-5efb-4367-aa31-cdfd4067f713
      [1/8] checking log skipped (none written)
      [2/8] checking root items
      [3/8] checking extents
      [4/8] checking free space tree
      [5/8] checking fs roots
      root 5 inode 332 errors 1000, some csum missing
      ERROR: errors found in fs roots
      (...)
    
    Looking at a tree dump of the fs tree (root 5) for inode 332 we have:
    
       $ btrfs inspect-internal dump-tree -t 5 /dev/sdc
       (...)
            item 28 key (332 INODE_ITEM 0) itemoff 2006 itemsize 160
                    generation 17 transid 19 size 610969 nbytes 86016
                    block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
                    sequence 11 flags 0x0(none)
                    atime 1759851068.391327881 (2025-10-07 16:31:08)
                    ctime 1759851068.410098267 (2025-10-07 16:31:08)
                    mtime 1759851068.410098267 (2025-10-07 16:31:08)
                    otime 1759851068.391327881 (2025-10-07 16:31:08)
            item 29 key (332 INODE_REF 340) itemoff 1993 itemsize 13
                    index 2 namelen 3 name: f1f
            item 30 key (332 EXTENT_DATA 589824) itemoff 1940 itemsize 53
                    generation 19 type 1 (regular)
                    extent data disk byte 21745664 nr 65536
                    extent data offset 0 nr 65536 ram 65536
                    extent compression 0 (none)
       (...)
    
    We can see that the file extent item for file offset 589824 has a length of
    64K and its number of bytes is 64K. Looking at the inode item we see that
    its i_size is 610969 bytes which falls within the range of that file extent
    item [589824, 655360[.
    
    Looking into the csum tree:
    
      $ btrfs inspect-internal dump-tree /dev/sdc
      (...)
            item 15 key (EXTENT_CSUM EXTENT_CSUM 21565440) itemoff 991 itemsize 200
                    range start 21565440 end 21770240 length 204800
               item 16 key (EXTENT_CSUM EXTENT_CSUM 1104576512) itemoff 983 itemsize 8
                    range start 1104576512 end 1104584704 length 8192
      (..)
    
    We see that the csum item number 15 covers the first 24K of the file extent
    item - it ends at offset 21770240 and the extent's disk_bytenr is 21745664,
    so we have:
    
       21770240 - 21745664 = 24K
    
    We see that the next csum item (number 16) is completely outside the range,
    so the remaining 40K of the extent doesn't have csum items in the tree.
    
    If we round up the i_size to the sector size, we get:
    
       round_up(610969, 4096) = 614400
    
    If we subtract from that the file offset for the extent item we get:
    
       614400 - 589824 = 24K
    
    So the missing 40K corresponds to the end of the file extent item's range
    minus the rounded up i_size:
    
       655360 - 614400 = 40K
    
    Normally we don't expect a file extent item to span over the rounded up
    i_size of an inode, since when truncating, doing hole punching and other
    operations that trim a file extent item, the number of bytes is adjusted.
    
    There is however a short time window where the kernel can end up,
    temporarily,persisting an inode with an i_size that falls in the middle of
    the last file extent item and the file extent item was not yet trimmed (its
    number of bytes reduced so that it doesn't cross i_size rounded up by the
    sector size).
    
    The steps (in the kernel) that lead to such scenario are the following:
    
     1) We have inode I as an empty file, no allocated extents, i_size is 0;
    
     2) A buffered write is done for file range [589824, 655360[ (length of
        64K) and the i_size is updated to 655360. Note that we got a single
        large folio for the range (64K);
    
     3) A truncate operation starts that reduces the inode's i_size down to
        610969 bytes. The truncate sets the inode's new i_size at
        btrfs_setsize() by calling truncate_setsize() and before calling
        btrfs_truncate();
    
     4) At btrfs_truncate() we trigger writeback for the range starting at
        610304 (which is the new i_size rounded down to the sector size) and
        ending at (u64)-1;
    
     5) During the writeback, at extent_write_cache_pages(), we get from the
        call to filemap_get_folios_tag(), the 64K folio that starts at file
        offset 589824 since it contains the start offset of the writeback
        range (610304);
    
     6) At writepage_delalloc() we find the whole range of the folio is dirty
        and therefore we run delalloc for that 64K range ([589824, 655360[),
        reserving a 64K extent, creating an ordered extent, etc;
    
     7) At extent_writepage_io() we submit IO only for subrange [589824, 614400[
        because the inode's i_size is 610969 bytes (rounded up by sector size
        is 614400). There, in the while loop we intentionally skip IO beyond
        i_size to avoid any unnecessay work and just call
        btrfs_mark_ordered_io_finished() for the range [614400, 655360[ (which
        has a 40K length);
    
     8) Once the IO finishes we finish the ordered extent by ending up at
        btrfs_finish_one_ordered(), join transaction N, insert a file extent
        item in the inode's subvolume tree for file offset 589824 with a number
        of bytes of 64K, and update the inode's delayed inode item or directly
        the inode item with a call to btrfs_update_inode_fallback(), which
        results in storing the new i_size of 610969 bytes;
    
     9) Transaction N is committed either by the transaction kthread or some
        other task committed it (in response to a sync or fsync for example).
    
        At this point we have inode I persisted with an i_size of 610969 bytes
        and file extent item that starts at file offset 589824 and has a number
        of bytes of 64K, ending at an offset of 655360 which is beyond the
        i_size rounded up to the sector size (614400).
    
        --> So after a crash or power failure here, the btrfs check program
            reports that error about missing checksum items for this inode, as
            it tries to lookup for checksums covering the whole range of the
            extent;
    
    10) Only after transaction N is committed that at btrfs_truncate() the
        call to btrfs_start_transaction() starts a new transaction, N + 1,
        instead of joining transaction N. And it's with transaction N + 1 that
        it calls btrfs_truncate_inode_items() which updates the file extent
        item at file offset 589824 to reduce its number of bytes from 64K down
        to 24K, so that the file extent item's range ends at the i_size
        rounded up to the sector size (614400 bytes).
    
    Fix this by truncating the ordered extent at extent_writepage_io() when we
    skip writeback because the current offset in the folio is beyond i_size.
    This ensures we don't ever persist a file extent item with a number of
    bytes beyond the rounded up (by sector size) value of the i_size.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Stable-dep-of: e9e3b22ddfa7 ("btrfs: fix beyond-EOF write handling")
    Signed-off-by: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: use variable for end offset in extent_writepage_io() [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Mon Jan 12 09:55:54 2026 -0500

    btrfs: use variable for end offset in extent_writepage_io()
    
    [ Upstream commit 46a23908598f4b8e61483f04ea9f471b2affc58a ]
    
    Instead of repeating the expression "start + len" multiple times, store it
    in a variable and use it where needed.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Stable-dep-of: e9e3b22ddfa7 ("btrfs: fix beyond-EOF write handling")
    Signed-off-by: Sasha Levin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

can: j1939: make j1939_session_activate() fail if device is no longer registered [+ + +]

Author: Tetsuo Handa <[email protected]>
Date:   Tue Nov 25 22:39:59 2025 +0900

    can: j1939: make j1939_session_activate() fail if device is no longer registered
    
    [ Upstream commit 5d5602236f5db19e8b337a2cd87a90ace5ea776d ]
    
    syzbot is still reporting
    
      unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
    
    even after commit 93a27b5891b8 ("can: j1939: add missing calls in
    NETDEV_UNREGISTER notification handler") was added. A debug printk() patch
    found that j1939_session_activate() can succeed even after
    j1939_cancel_active_session() from j1939_netdev_notify(NETDEV_UNREGISTER)
    has completed.
    
    Since j1939_cancel_active_session() is processed with the session list lock
    held, checking ndev->reg_state in j1939_session_activate() with the session
    list lock held can reliably close the race window.
    
    Reported-by: syzbot <[email protected]>
    Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
    Signed-off-by: Tetsuo Handa <[email protected]>
    Acked-by: Oleksij Rempel <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Marc Kleine-Budde <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

counter: 104-quad-8: Fix incorrect return value in IRQ handler [+ + +]

Author: Haotian Zhang <[email protected]>
Date:   Mon Dec 15 10:01:14 2025 +0800

    counter: 104-quad-8: Fix incorrect return value in IRQ handler
    
    commit 9517d76dd160208b7a432301ce7bec8fc1ddc305 upstream.
    
    quad8_irq_handler() should return irqreturn_t enum values, but it
    directly returns negative errno codes from regmap operations on error.
    
    Return IRQ_NONE if the interrupt status cannot be read. If clearing the
    interrupt fails, return IRQ_HANDLED to prevent the kernel from disabling
    the IRQ line due to a spurious interrupt storm. Also, log these regmap
    failures with dev_WARN_ONCE.
    
    Fixes: 98ffe0252911 ("counter: 104-quad-8: Migrate to the regmap API")
    Suggested-by: Andy Shevchenko <[email protected]>
    Signed-off-by: Haotian Zhang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Cc: [email protected]
    Signed-off-by: William Breathitt Gray <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

counter: interrupt-cnt: Drop IRQF_NO_THREAD flag [+ + +]

Author: Alexander Sverdlin <[email protected]>
Date:   Tue Nov 18 09:35:48 2025 +0100

    counter: interrupt-cnt: Drop IRQF_NO_THREAD flag
    
    commit 23f9485510c338476b9735d516c1d4aacb810d46 upstream.
    
    An IRQ handler can either be IRQF_NO_THREAD or acquire spinlock_t, as
    CONFIG_PROVE_RAW_LOCK_NESTING warns:
    =============================
    [ BUG: Invalid wait context ]
    6.18.0-rc1+git... #1
    -----------------------------
    some-user-space-process/1251 is trying to lock:
    (&counter->events_list_lock){....}-{3:3}, at: counter_push_event [counter]
    other info that might help us debug this:
    context-{2:2}
    no locks held by some-user-space-process/....
    stack backtrace:
    CPU: 0 UID: 0 PID: 1251 Comm: some-user-space-process 6.18.0-rc1+git... #1 PREEMPT
    Call trace:
     show_stack (C)
     dump_stack_lvl
     dump_stack
     __lock_acquire
     lock_acquire
     _raw_spin_lock_irqsave
     counter_push_event [counter]
     interrupt_cnt_isr [interrupt_cnt]
     __handle_irq_event_percpu
     handle_irq_event
     handle_simple_irq
     handle_irq_desc
     generic_handle_domain_irq
     gpio_irq_handler
     handle_irq_desc
     generic_handle_domain_irq
     gic_handle_irq
     call_on_irq_stack
     do_interrupt_handler
     el0_interrupt
     __el0_irq_handler_common
     el0t_64_irq_handler
     el0t_64_irq
    
    ... and Sebastian correctly points out. Remove IRQF_NO_THREAD as an
    alternative to switching to raw_spinlock_t, because the latter would limit
    all potential nested locks to raw_spinlock_t only.
    
    Cc: Sebastian Andrzej Siewior <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/all/[email protected]/
    Fixes: a55ebd47f21f ("counter: add IRQ or GPIO based counter")
    Signed-off-by: Alexander Sverdlin <[email protected]>
    Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
    Reviewed-by: Oleksij Rempel <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: William Breathitt Gray <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

crypto: qat - fix duplicate restarting msg during AER error [+ + +]

Author: Harshita Bhilwaria <[email protected]>
Date:   Wed Dec 17 11:16:06 2025 +0530

    crypto: qat - fix duplicate restarting msg during AER error
    
    [ Upstream commit 961ac9d97be72267255f1ed841aabf6694b17454 ]
    
    The restarting message from PF to VF is sent twice during AER error
    handling: once from adf_error_detected() and again from
    adf_disable_sriov().
    This causes userspace subservices to shutdown unexpectedly when they
    receive a duplicate restarting message after already being restarted.
    
    Avoid calling adf_pf2vf_notify_restarting() and
    adf_pf2vf_wait_for_restarting_complete() from adf_error_detected() so
    that the restarting msg is sent only once from PF to VF.
    
    Fixes: 9567d3dc760931 ("crypto: qat - improve aer error reset handling")
    Signed-off-by: Harshita Bhilwaria <[email protected]>
    Reviewed-by: Giovanni Cabiddu <[email protected]>
    Reviewed-by: Ahsan Atta <[email protected]>
    Reviewed-by: Ravikumar PM <[email protected]>
    Reviewed-by: Srikanth Thokala <[email protected]>
    Signed-off-by: Herbert Xu <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

csky: fix csky_cmpxchg_fixup not working [+ + +]

Author: Yang Li <[email protected]>
Date:   Wed Oct 16 17:56:26 2024 +0800

    csky: fix csky_cmpxchg_fixup not working
    
    [ Upstream commit 809ef03d6d21d5fea016bbf6babeec462e37e68c ]
    
    In the csky_cmpxchg_fixup function, it is incorrect to use the global
    variable csky_cmpxchg_stw to determine the address where the exception
    occurred.The global variable csky_cmpxchg_stw stores the opcode at the
    time of the exception, while &csky_cmpxchg_stw shows the address where
    the exception occurred.
    
    Signed-off-by: Yang Li <[email protected]>
    Signed-off-by: Guo Ren <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

dm-snapshot: fix 'scheduling while atomic' on real-time kernels [+ + +]

Author: Mikulas Patocka <[email protected]>
Date:   Mon Dec 1 22:13:10 2025 +0100

    dm-snapshot: fix 'scheduling while atomic' on real-time kernels
    
    [ Upstream commit 8581b19eb2c5ccf06c195d3b5468c3c9d17a5020 ]
    
    There is reported 'scheduling while atomic' bug when using dm-snapshot on
    real-time kernels. The reason for the bug is that the hlist_bl code does
    preempt_disable() when taking the lock and the kernel attempts to take
    other spinlocks while holding the hlist_bl lock.
    
    Fix this by converting a hlist_bl spinlock into a regular spinlock.
    
    Signed-off-by: Mikulas Patocka <[email protected]>
    Reported-by: Jiping Ma <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: Apply e4479aecf658 to dml [+ + +]

Author: Nathan Chancellor <[email protected]>
Date:   Sat Dec 13 15:16:43 2025 +0900

    drm/amd/display: Apply e4479aecf658 to dml
    
    [ Upstream commit 70740454377f1ba3ff32f5df4acd965db99d055b ]
    
    After an innocuous optimization change in clang-22, allmodconfig (which
    enables CONFIG_KASAN and CONFIG_WERROR) breaks with:
    
      drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:1724:6: error: stack frame size (3144) exceeds limit (3072) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
       1724 | void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
            |      ^
    
    With clang-21, this function was already pretty close to the existing
    limit of 3072 bytes.
    
      drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:1724:6: error: stack frame size (2904) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
       1724 | void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
            |      ^
    
    A similar situation occurred in dml2, which was resolved by
    commit e4479aecf658 ("drm/amd/display: Increase sanitizer frame larger
    than limit when compile testing with clang") by increasing the limit for
    clang when compile testing with certain sanitizer enabled, so that
    allmodconfig (an easy testing target) continues to work.
    
    Apply that same change to the dml folder to clear up the warning for
    allmodconfig, unbreaking the build.
    
    Closes: https://github.com/ClangBuiltLinux/linux/issues/2135
    Signed-off-by: Nathan Chancellor <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit 25314b453cf812150e9951a32007a32bba85707e)
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: Fix DP no audio issue [+ + +]

Author: Charlene Liu <[email protected]>
Date:   Fri Nov 28 19:38:31 2025 -0500

    drm/amd/display: Fix DP no audio issue
    
    [ Upstream commit 3886b198bd6e49c801fe9552fcfbfc387a49fbbc ]
    
    [why]
    need to enable APG_CLOCK_ENABLE enable first
    also need to wake up az from D3 before access az block
    
    Reviewed-by: Swapnil Patel <[email protected]>
    Signed-off-by: Charlene Liu <[email protected]>
    Signed-off-by: Chenyu Chen <[email protected]>
    Tested-by: Daniel Wheeler <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit bf5e396957acafd46003318965500914d5f4edfa)
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: Respect user's CONFIG_FRAME_WARN more for dml files [+ + +]

Author: Nathan Chancellor <[email protected]>
Date:   Fri Jan 31 15:31:19 2025 -0700

    drm/amd/display: Respect user's CONFIG_FRAME_WARN more for dml files
    
    [ Upstream commit 820ccf8cb2b145ab9fc12651f7f80339614fa46c ]
    
    Currently, there are several files in drm/amd/display that aim to have a
    higher -Wframe-larger-than value to avoid instances of that warning with
    a lower value from the user's configuration. However, with the way that
    it is currently implemented, it does not respect the user's request via
    CONFIG_FRAME_WARN for a higher stack frame limit, which can cause pain
    when new instances of the warning appear and break the build due to
    CONFIG_WERROR.
    
    Adjust the logic to switch from a hard coded -Wframe-larger-than value
    to only using the value as a minimum clamp and deferring to the
    requested value from CONFIG_FRAME_WARN if it is higher.
    
    Suggested-by: Harry Wentland <[email protected]>
    Reported-by: Greg Kroah-Hartman <[email protected]>
    Closes: https://lore.kernel.org/2025013003-audience-opposing-7f95@gregkh/
    Signed-off-by: Nathan Chancellor <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Stable-dep-of: 70740454377f ("drm/amd/display: Apply e4479aecf658 to dml")
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: shrink struct members [+ + +]

Author: Rosen Penev <[email protected]>
Date:   Sat Nov 8 09:40:47 2025 -0800

    drm/amd/display: shrink struct members
    
    [ Upstream commit 7329417fc9ac128729c3a092b006c8f1fd0d04a6 ]
    
    On a 32-bit ARM system, the audio_decoder struct ends up being too large
    for dp_retrain_link_dp_test.
    
    link_dp_cts.c:157:1: error: the frame size of 1328 bytes is larger than
    1280 bytes [-Werror=frame-larger-than=]
    
    This is mitigated by shrinking the members of the struct and avoids
    having to deal with dynamic allocation.
    
    feed_back_divider is assigned but otherwise unused. Remove both.
    
    pixel_repetition looks like it should be a bool since it's only ever
    assigned to 1. But there are checks for 2 and 4. Reduce to uint8_t.
    
    Remove ss_percentage_divider. Unused.
    
    Shrink refresh_rate as it gets assigned to at most a 3 digit integer
    value.
    
    Signed-off-by: Rosen Penev <[email protected]>
    Reviewed-by: Alex Hung <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit 3849efdc7888d537f09c3dcfaea4b3cd377a102e)
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: Fix query for VPE block_type and ip_count [+ + +]

Author: Alan Liu <[email protected]>
Date:   Mon Dec 22 12:26:35 2025 +0800

    drm/amdgpu: Fix query for VPE block_type and ip_count
    
    commit 72d7f4573660287f1b66c30319efecd6fcde92ee upstream.
    
    [Why]
    Query for VPE block_type and ip_count is missing.
    
    [How]
    Add VPE case in ip_block_type and hw_ip_count query.
    
    Reviewed-by: Lang Yu <[email protected]>
    Signed-off-by: Alan Liu <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit a6ea0a430aca5932b9c75d8e38deeb45665dd2ae)
    Cc: [email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/amdkfd: Fix improper NULL termination of queue restore SMI event string [+ + +]

Author: Brian Kocoloski <[email protected]>
Date:   Thu Nov 20 13:57:19 2025 -0500

    drm/amdkfd: Fix improper NULL termination of queue restore SMI event string
    
    [ Upstream commit 969faea4e9d01787c58bab4d945f7ad82dad222d ]
    
    Pass character "0" rather than NULL terminator to properly format
    queue restoration SMI events. Currently, the NULL terminator precedes
    the newline character that is intended to delineate separate events
    in the SMI event buffer, which can break userspace parsers.
    
    Signed-off-by: Brian Kocoloski <[email protected]>
    Reviewed-by: Philip Yang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit 6e7143e5e6e21f9d5572e0390f7089e6d53edf3c)
    Signed-off-by: Sasha Levin <[email protected]>

drm/pl111: Fix error handling in pl111_amba_probe [+ + +]

Author: Miaoqian Lin <[email protected]>
Date:   Thu Dec 11 16:33:44 2025 +0400

    drm/pl111: Fix error handling in pl111_amba_probe
    
    commit 0ddd3bb4b14c9102c0267b3fd916c81fe5ab89c1 upstream.
    
    Jump to the existing dev_put label when devm_request_irq() fails
    so drm_dev_put() and of_reserved_mem_device_release() run
    instead of returning early and leaking resources.
    
    Found via static analysis and code review.
    
    Fixes: bed41005e617 ("drm/pl111: Initial drm/kms driver for pl111")
    Cc: [email protected]
    Signed-off-by: Miaoqian Lin <[email protected]>
    Reviewed-by: Javier Martinez Canillas <[email protected]>
    Signed-off-by: Linus Walleij <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/radeon: Remove __counted_by from ClockInfoArray.clockInfo[] [+ + +]

Author: Alex Deucher <[email protected]>
Date:   Mon Jun 30 10:47:09 2025 -0400

    drm/radeon: Remove __counted_by from ClockInfoArray.clockInfo[]
    
    commit 19158c7332468bc28572bdca428e89c7954ee1b1 upstream.
    
    clockInfo[] is a generic uchar pointer to variable sized structures
    which vary from ASIC to ASIC.
    
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4374
    Reviewed-by: Lijo Lazar <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit dc135aa73561b5acc74eadf776e48530996529a3)
    Cc: [email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/xe: Ensure GT is in C0 during resumes [+ + +]

Author: Xin Wang <[email protected]>
Date:   Tue Aug 26 17:06:33 2025 -0700

    drm/xe: Ensure GT is in C0 during resumes
    
    commit 95d0883ac8105717f59c2dcdc0d8b9150f13aa12 upstream.
    
    This patch ensures the gt will be awake for the entire duration
    of the resume sequences until GuCRC takes over and GT-C6 gets
    re-enabled.
    
    Before suspending GT-C6 is kept enabled, but upon resume, GuCRC
    is not yet alive to properly control the exits and some cases of
    instability and corruption related to GT-C6 can be observed.
    
    Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
    
    Suggested-by: Rodrigo Vivi <[email protected]>
    Signed-off-by: Xin Wang <[email protected]>
    Reviewed-by: Rodrigo Vivi <[email protected]>
    Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Rodrigo Vivi <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/xe: make xe_gt_idle_disable_c6() handle the forcewake internally [+ + +]

Author: Xin Wang <[email protected]>
Date:   Tue Aug 26 17:06:32 2025 -0700

    drm/xe: make xe_gt_idle_disable_c6() handle the forcewake internally
    
    commit 1313351e71181a4818afeb8dfe202e4162091ef6 upstream.
    
    Move forcewake_get() into xe_gt_idle_enable_c6() to streamline the
    code and make it easier to use.
    
    Suggested-by: Rodrigo Vivi <[email protected]>
    Signed-off-by: Xin Wang <[email protected]>
    Reviewed-by: Rodrigo Vivi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Rodrigo Vivi <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

erofs: don't bother with s_stack_depth increasing for now [+ + +]

Author: Gao Xiang <[email protected]>
Date:   Thu Jan 8 10:38:31 2026 +0800

    erofs: don't bother with s_stack_depth increasing for now
    
    [ Upstream commit 072a7c7cdbea4f91df854ee2bb216256cd619f2a ]
    
    Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
    for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
    stack overflow when stacking an unlimited number of EROFS on top of
    each other.
    
    This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
    (and such setups are already used in production for quite a long time).
    
    One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
    from 2 to 3, but proving that this is safe in general is a high bar.
    
    After a long discussion on GitHub issues [1] about possible solutions,
    one conclusion is that there is no need to support nesting file-backed
    EROFS mounts on stacked filesystems, because there is always the option
    to use loopback devices as a fallback.
    
    As a quick fix for the composefs regression for this cycle, instead of
    bumping `s_stack_depth` for file backed EROFS mounts, we disallow
    nesting file-backed EROFS over EROFS and over filesystems with
    `s_stack_depth` > 0.
    
    This works for all known file-backed mount use cases (composefs,
    containerd, and Android APEX for some Android vendors), and the fix is
    self-contained.
    
    Essentially, we are allowing one extra unaccounted fs stacking level of
    EROFS below stacking filesystems, but EROFS can only be used in the read
    path (i.e. overlayfs lower layers), which typically has much lower stack
    usage than the write path.
    
    We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
    stack usage analysis or using alternative approaches, such as splitting
    the `s_stack_depth` limitation according to different combinations of
    stacking.
    
    Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
    Reported-and-tested-by: Dusty Mabe <[email protected]>
    Reported-by: Timothée Ravier <[email protected]>
    Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
    Reported-by: "Alekséi Naidénov" <[email protected]>
    Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
    Acked-by: Amir Goldstein <[email protected]>
    Acked-by: Alexander Larsson <[email protected]>
    Reviewed-and-tested-by: Sheng Yong <[email protected]>
    Reviewed-by: Zhiguo Niu <[email protected]>
    Reviewed-by: Chao Yu <[email protected]>
    Cc: Christian Brauner <[email protected]>
    Cc: Miklos Szeredi <[email protected]>
    Signed-off-by: Gao Xiang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

erofs: fix file-backed mounts no longer working on EROFS partitions [+ + +]

Author: Gao Xiang <[email protected]>
Date:   Sat Jan 10 19:47:03 2026 +0800

    erofs: fix file-backed mounts no longer working on EROFS partitions
    
    [ Upstream commit 7893cc12251f6f19e7689a4cf3ba803bddbd8437 ]
    
    Sheng Yong reported [1] that Android APEX images didn't work with commit
    072a7c7cdbea ("erofs: don't bother with s_stack_depth increasing for
    now") because "EROFS-formatted APEX file images can be stored within an
    EROFS-formatted Android system partition."
    
    In response, I sent a quick fat-fingered [PATCH v3] to address the
    report.  Unfortunately, the updated condition was incorrect:
    
             if (erofs_is_fileio_mode(sbi)) {
    -            sb->s_stack_depth =
    -                file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
    -            if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
    -                erofs_err(sb, "maximum fs stacking depth exceeded");
    +            inode = file_inode(sbi->dif0.file);
    +            if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
    +                inode->i_sb->s_stack_depth) {
    
    The condition `!sb->s_bdev` is always true for all file-backed EROFS
    mounts, making the check effectively a no-op.
    
    The real fix tested and confirmed by Sheng Yong [2] at that time was
    [PATCH v3 RESEND], which correctly ensures the following EROFS^2 setup
    works:
        EROFS (on a block device) + EROFS (file-backed mount)
    
    But sadly I screwed it up again by upstreaming the outdated [PATCH v3].
    
    This patch applies the same logic as the delta between the upstream
    [PATCH v3] and the real fix [PATCH v3 RESEND].
    
    Reported-by: Sheng Yong <[email protected]>
    Closes: https://lore.kernel.org/r/[email protected] [1]
    Fixes: 072a7c7cdbea ("erofs: don't bother with s_stack_depth increasing for now")
    Link: https://lore.kernel.org/r/[email protected] [2]
    Signed-off-by: Gao Xiang <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

gpio: pca953x: Add support for level-triggered interrupts [+ + +]

Author: Potin Lai <[email protected]>
Date:   Wed Apr 9 23:37:30 2025 +0800

    gpio: pca953x: Add support for level-triggered interrupts
    
    [ Upstream commit 417b0f8d08f878615de9481c6e8827fbc8b57ed2 ]
    
    Adds support for level-triggered interrupts in the PCA953x GPIO
    expander driver. Previously, the driver only supported edge-triggered
    interrupts, which could lead to missed events in scenarios where an
    interrupt condition persists until it is explicitly cleared.
    
    By enabling level-triggered interrupts, the driver can now detect and
    respond to sustained interrupt conditions more reliably.
    
    Signed-off-by: Potin Lai <[email protected]>
    Link: https://lore.kernel.org/r/20250409-gpio-pca953x-level-triggered-irq-v3-1-7f184d814934@gmail.com
    Signed-off-by: Bartosz Golaszewski <[email protected]>
    Stable-dep-of: 014a17deb412 ("gpio: pca953x: handle short interrupt pulses on PCAL devices")
    Signed-off-by: Sasha Levin <[email protected]>

gpio: pca953x: handle short interrupt pulses on PCAL devices [+ + +]

Author: Ernest Van Hoecke <[email protected]>
Date:   Wed Dec 17 16:30:25 2025 +0100

    gpio: pca953x: handle short interrupt pulses on PCAL devices
    
    [ Upstream commit 014a17deb41201449f76df2b20c857a9c3294a7c ]
    
    GPIO drivers with latch input support may miss short pulses on input
    pins even when input latching is enabled. The generic interrupt logic in
    the pca953x driver reports interrupts by comparing the current input
    value against the previously sampled one and only signals an event when
    a level change is observed between two reads.
    
    For short pulses, the first edge is captured when the input register is
    read, but if the signal returns to its previous level before the read,
    the second edge is not observed. As a result, successive pulses can
    produce identical input values at read time and no level change is
    detected, causing interrupts to be missed. Below timing diagram shows
    this situation where the top signal is the input pin level and the
    bottom signal indicates the latched value.
    
    ─────┐     ┌──*───────────────┐     ┌──*─────────────────┐     ┌──*───
         │     │  .               │     │  .                 │     │  .
         │     │  │               │     │  │                 │     │  │
         └──*──┘  │               └──*──┘  │                 └──*──┘  │
    Input   │     │                  │     │                    │     │
            ▼     │                  ▼     │                    ▼     │
           IRQ    │                 IRQ    │                   IRQ    │
                  .                        .                          .
    ─────┐        .┌──────────────┐        .┌────────────────┐        .┌──
         │         │              │         │                │         │
         │         │              │         │                │         │
         └────────*┘              └────────*┘                └────────*┘
    Latched       │                        │                          │
                  ▼                        ▼                          ▼
                READ 0                   READ 0                     READ 0
                                       NO CHANGE                  NO CHANGE
    
    PCAL variants provide an interrupt status register that records which
    pins triggered an interrupt, but the status and input registers cannot
    be read atomically. The interrupt status is only cleared when the input
    port is read, and the input value must also be read to determine the
    triggering edge. If another interrupt occurs on a different line after
    the status register has been read but before the input register is
    sampled, that event will not be reflected in the earlier status
    snapshot, so relying solely on the interrupt status register is also
    insufficient.
    
    Support for input latching and interrupt status handling was previously
    added by [1], but the interrupt status-based logic was reverted by [2]
    due to these issues. This patch addresses the original problem by
    combining both sources of information. Events indicated by the interrupt
    status register are merged with events detected through the existing
    level-change logic. As a result:
    
    * short pulses, whose second edges are invisible, are detected via the
      interrupt status register, and
    * interrupts that occur between the status and input reads are still
      caught by the generic level-change logic.
    
    This significantly improves robustness on devices that signal interrupts
    as short pulses, while avoiding the issues that led to the earlier
    reversion. In practice, even if only the first edge of a pulse is
    observable, the interrupt is reliably detected.
    
    This fixes missed interrupts from an Ilitek touch controller with its
    interrupt line connected to a PCAL6416A, where active-low pulses are
    approximately 200 us long.
    
    [1] commit 44896beae605 ("gpio: pca953x: add PCAL9535 interrupt support for Galileo Gen2")
    [2] commit d6179f6c6204 ("gpio: pca953x: Improve interrupt support")
    
    Fixes: d6179f6c6204 ("gpio: pca953x: Improve interrupt support")
    Signed-off-by: Ernest Van Hoecke <[email protected]>
    Reviewed-by: Andy Shevchenko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Bartosz Golaszewski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

gpio: rockchip: mark the GPIO controller as sleeping [+ + +]

Author: Bartosz Golaszewski <[email protected]>
Date:   Tue Jan 6 10:00:11 2026 +0100

    gpio: rockchip: mark the GPIO controller as sleeping
    
    commit 20cf2aed89ac6d78a0122e31c875228e15247194 upstream.
    
    The GPIO controller is configured as non-sleeping but it uses generic
    pinctrl helpers which use a mutex for synchronization.
    
    This can cause the following lockdep splat with shared GPIOs enabled on
    boards which have multiple devices using the same GPIO:
    
    BUG: sleeping function called from invalid context at
    kernel/locking/mutex.c:591
    in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 12, name:
    kworker/u16:0
    preempt_count: 1, expected: 0
    RCU nest depth: 0, expected: 0
    6 locks held by kworker/u16:0/12:
      #0: ffff0001f0018d48 ((wq_completion)events_unbound#2){+.+.}-{0:0},
    at: process_one_work+0x18c/0x604
      #1: ffff8000842dbdf0 (deferred_probe_work){+.+.}-{0:0}, at:
    process_one_work+0x1b4/0x604
      #2: ffff0001f18498f8 (&dev->mutex){....}-{4:4}, at:
    __device_attach+0x38/0x1b0
      #3: ffff0001f75f1e90 (&gdev->srcu){.+.?}-{0:0}, at:
    gpiod_direction_output_raw_commit+0x0/0x360
      #4: ffff0001f46e3db8 (&shared_desc->spinlock){....}-{3:3}, at:
    gpio_shared_proxy_direction_output+0xd0/0x144 [gpio_shared_proxy]
      #5: ffff0001f180ee90 (&gdev->srcu){.+.?}-{0:0}, at:
    gpiod_direction_output_raw_commit+0x0/0x360
    irq event stamp: 81450
    hardirqs last  enabled at (81449): [<ffff8000813acba4>]
    _raw_spin_unlock_irqrestore+0x74/0x78
    hardirqs last disabled at (81450): [<ffff8000813abfb8>]
    _raw_spin_lock_irqsave+0x84/0x88
    softirqs last  enabled at (79616): [<ffff8000811455fc>]
    __alloc_skb+0x17c/0x1e8
    softirqs last disabled at (79614): [<ffff8000811455fc>]
    __alloc_skb+0x17c/0x1e8
    CPU: 2 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted
    6.19.0-rc4-next-20260105+ #11975 PREEMPT
    Hardware name: Hardkernel ODROID-M1 (DT)
    Workqueue: events_unbound deferred_probe_work_func
    Call trace:
      show_stack+0x18/0x24 (C)
      dump_stack_lvl+0x90/0xd0
      dump_stack+0x18/0x24
      __might_resched+0x144/0x248
      __might_sleep+0x48/0x98
      __mutex_lock+0x5c/0x894
      mutex_lock_nested+0x24/0x30
      pinctrl_get_device_gpio_range+0x44/0x128
      pinctrl_gpio_direction+0x3c/0xe0
      pinctrl_gpio_direction_output+0x14/0x20
      rockchip_gpio_direction_output+0xb8/0x19c
      gpiochip_direction_output+0x38/0x94
      gpiod_direction_output_raw_commit+0x1d8/0x360
      gpiod_direction_output_nonotify+0x7c/0x230
      gpiod_direction_output+0x34/0xf8
      gpio_shared_proxy_direction_output+0xec/0x144 [gpio_shared_proxy]
      gpiochip_direction_output+0x38/0x94
      gpiod_direction_output_raw_commit+0x1d8/0x360
      gpiod_direction_output_nonotify+0x7c/0x230
      gpiod_configure_flags+0xbc/0x480
      gpiod_find_and_request+0x1a0/0x574
      gpiod_get_index+0x58/0x84
      devm_gpiod_get_index+0x20/0xb4
      devm_gpiod_get_optional+0x18/0x30
      rockchip_pcie_probe+0x98/0x380
      platform_probe+0x5c/0xac
      really_probe+0xbc/0x298
    
    Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio")
    Cc: [email protected]
    Reported-by: Marek Szyprowski <[email protected]>
    Closes: https://lore.kernel.org/all/[email protected]/
    Acked-by: Heiko Stuebner <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Bartosz Golaszewski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

HID: quirks: work around VID/PID conflict for appledisplay [+ + +]

Author: René Rebe <[email protected]>
Date:   Fri Nov 28 13:46:41 2025 +0100

    HID: quirks: work around VID/PID conflict for appledisplay
    
    [ Upstream commit c7fabe4ad9219866c203164a214c474c95b36bf2 ]
    
    For years I wondered why the Apple Cinema Display driver would not
    just work for me. Turns out the hidraw driver instantly takes it
    over. Fix by adding appledisplay VID/PIDs to hid_have_special_driver.
    
    Fixes: 069e8a65cd79 ("Driver for Apple Cinema Display")
    Signed-off-by: René Rebe <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

idpf: cap maximum Rx buffer size [+ + +]

Author: Joshua Hay <[email protected]>
Date:   Mon Nov 3 13:20:36 2025 -0800

    idpf: cap maximum Rx buffer size
    
    [ Upstream commit 086efe0a1ecc36cffe46640ce12649a4cd3ff171 ]
    
    The HW only supports a maximum Rx buffer size of 16K-128. On systems
    using large pages, the libeth logic can configure the buffer size to be
    larger than this. The upper bound is PAGE_SIZE while the lower bound is
    MTU rounded up to the nearest power of 2. For example, ARM systems with
    a 64K page size and an mtu of 9000 will set the Rx buffer size to 16K,
    which will cause the config Rx queues message to fail.
    
    Initialize the bufq/fill queue buf_len field to the maximum supported
    size. This will trigger the libeth logic to cap the maximum Rx buffer
    size by reducing the upper bound.
    
    Fixes: 74d1412ac8f37 ("idpf: use libeth Rx buffer management for payload buffer")
    Signed-off-by: Joshua Hay <[email protected]>
    Acked-by: Alexander Lobakin <[email protected]>
    Reviewed-by: Madhu Chittim <[email protected]>
    Reviewed-by: Jacob Keller <[email protected]>
    Reviewed-by: Aleksandr Loktionov <[email protected]>
    Reviewed-by: David Decotigny <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

idpf: fix memory leak in idpf_vport_rel() [+ + +]

Author: Emil Tantilov <[email protected]>
Date:   Thu Nov 20 16:12:16 2025 -0800

    idpf: fix memory leak in idpf_vport_rel()
    
    [ Upstream commit f6242b354605faff263ca45882b148200915a3f6 ]
    
    Free vport->rx_ptype_lkup in idpf_vport_rel() to avoid leaking memory
    during a reset. Reported by kmemleak:
    
    unreferenced object 0xff450acac838a000 (size 4096):
      comm "kworker/u258:5", pid 7732, jiffies 4296830044
      hex dump (first 32 bytes):
        00 00 00 00 00 10 00 00 00 10 00 00 00 00 00 00  ................
        00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00  ................
      backtrace (crc 3da81902):
        __kmalloc_cache_noprof+0x469/0x7a0
        idpf_send_get_rx_ptype_msg+0x90/0x570 [idpf]
        idpf_init_task+0x1ec/0x8d0 [idpf]
        process_one_work+0x226/0x6d0
        worker_thread+0x19e/0x340
        kthread+0x10f/0x250
        ret_from_fork+0x251/0x2b0
        ret_from_fork_asm+0x1a/0x30
    
    Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
    Signed-off-by: Emil Tantilov <[email protected]>
    Reviewed-by: Aleksandr Loktionov <[email protected]>
    Reviewed-by: Madhu Chittim <[email protected]>
    Tested-by: Samuel Salin <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

idpf: keep the netdev when a reset fails [+ + +]

Author: Emil Tantilov <[email protected]>
Date:   Thu Nov 20 16:12:14 2025 -0800

    idpf: keep the netdev when a reset fails
    
    [ Upstream commit 083029bd8b445595222a3cd14076b880781c1765 ]
    
    During a successful reset the driver would re-allocate vport resources
    while keeping the netdevs intact. However, in case of an error in the
    init task, the netdev of the failing vport will be unregistered,
    effectively removing the network interface:
    
    [  121.211076] idpf 0000:83:00.0: enabling device (0100 -> 0102)
    [  121.221976] idpf 0000:83:00.0: Device HW Reset initiated
    [  124.161229] idpf 0000:83:00.0 ens801f0: renamed from eth0
    [  124.163364] idpf 0000:83:00.0 ens801f0d1: renamed from eth1
    [  125.934656] idpf 0000:83:00.0 ens801f0d2: renamed from eth2
    [  128.218429] idpf 0000:83:00.0 ens801f0d3: renamed from eth3
    
    ip -br a
    ens801f0         UP
    ens801f0d1       UP
    ens801f0d2       UP
    ens801f0d3       UP
    echo 1 > /sys/class/net/ens801f0/device/reset
    
    [  145.885537] idpf 0000:83:00.0: resetting
    [  145.990280] idpf 0000:83:00.0: reset done
    [  146.284766] idpf 0000:83:00.0: HW reset detected
    [  146.296610] idpf 0000:83:00.0: Device HW Reset initiated
    [  211.556719] idpf 0000:83:00.0: Transaction timed-out (op:526 cookie:7700 vc_op:526 salt:77 timeout:60000ms)
    [  272.996705] idpf 0000:83:00.0: Transaction timed-out (op:502 cookie:7800 vc_op:502 salt:78 timeout:60000ms)
    
    ip -br a
    ens801f0d1       DOWN
    ens801f0d2       DOWN
    ens801f0d3       DOWN
    
    Re-shuffle the logic in the error path of the init task to make sure the
    netdevs remain intact. This will allow the driver to attempt recovery via
    subsequent resets, provided the FW is still functional.
    
    The main change is to make sure that idpf_decfg_netdev() is not called
    should the init task fail during a reset. The error handling is
    consolidated under unwind_vports, as the removed labels had the same
    cleanup logic split depending on the point of failure.
    
    Fixes: ce1b75d0635c ("idpf: add ptypes and MAC filter support")
    Signed-off-by: Emil Tantilov <[email protected]>
    Reviewed-by: Aleksandr Loktionov <[email protected]>
    Tested-by: Samuel Salin <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

inet: ping: Fix icmp out counting [+ + +]

Author: yuan.gao <[email protected]>
Date:   Wed Dec 24 14:31:45 2025 +0800

    inet: ping: Fix icmp out counting
    
    [ Upstream commit 4c0856c225b39b1def6c9a6bc56faca79550da13 ]
    
    When the ping program uses an IPPROTO_ICMP socket to send ICMP_ECHO
    messages, ICMP_MIB_OUTMSGS is counted twice.
    
        ping_v4_sendmsg
          ping_v4_push_pending_frames
            ip_push_pending_frames
              ip_finish_skb
                __ip_make_skb
                  icmp_out_count(net, icmp_type); // first count
          icmp_out_count(sock_net(sk), user_icmph.type); // second count
    
    However, when the ping program uses an IPPROTO_RAW socket,
    ICMP_MIB_OUTMSGS is counted correctly only once.
    
    Therefore, the first count should be removed.
    
    Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
    Signed-off-by: yuan.gao <[email protected]>
    Reviewed-by: Ido Schimmel <[email protected]>
    Tested-by: Ido Schimmel <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

lib/crypto: aes: Fix missing MMU protection for AES S-box [+ + +]

Author: Eric Biggers <[email protected]>
Date:   Tue Jan 6 21:20:23 2026 -0800

    lib/crypto: aes: Fix missing MMU protection for AES S-box
    
    commit 74d74bb78aeccc9edc10db216d6be121cf7ec176 upstream.
    
    __cacheline_aligned puts the data in the ".data..cacheline_aligned"
    section, which isn't marked read-only i.e. it doesn't receive MMU
    protection.  Replace it with ____cacheline_aligned which does the right
    thing and just aligns the data while keeping it in ".rodata".
    
    Fixes: b5e0b032b6c3 ("crypto: aes - add generic time invariant AES cipher")
    Cc: [email protected]
    Reported-by: Qingfang Deng <[email protected]>
    Closes: https://lore.kernel.org/r/[email protected]/
    Acked-by: Ard Biesheuvel <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Eric Biggers <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

libceph: make calc_target() set t->paused, not just clear it [+ + +]

Author: Ilya Dryomov <[email protected]>
Date:   Mon Jan 5 19:23:19 2026 +0100

    libceph: make calc_target() set t->paused, not just clear it
    
    commit c0fe2994f9a9d0a2ec9e42441ea5ba74b6a16176 upstream.
    
    Currently calc_target() clears t->paused if the request shouldn't be
    paused anymore, but doesn't ever set t->paused even though it's able to
    determine when the request should be paused.  Setting t->paused is left
    to __submit_request() which is fine for regular requests but doesn't
    work for linger requests -- since __submit_request() doesn't operate
    on linger requests, there is nowhere for lreq->t.paused to be set.
    One consequence of this is that watches don't get reestablished on
    paused -> unpaused transitions in cases where requests have been paused
    long enough for the (paused) unwatch request to time out and for the
    subsequent (re)watch request to enter the paused state.  On top of the
    watch not getting reestablished, rbd_reregister_watch() gets stuck with
    rbd_dev->watch_mutex held:
    
      rbd_register_watch
        __rbd_register_watch
          ceph_osdc_watch
            linger_reg_commit_wait
    
    It's waiting for lreq->reg_commit_wait to be completed, but for that to
    happen the respective request needs to end up on need_resend_linger list
    and be kicked when requests are unpaused.  There is no chance for that
    if the request in question is never marked paused in the first place.
    
    The fact that rbd_dev->watch_mutex remains taken out forever then
    prevents the image from getting unmapped -- "rbd unmap" would inevitably
    hang in D state on an attempt to grab the mutex.
    
    Cc: [email protected]
    Reported-by: Raphael Zimmer <[email protected]>
    Signed-off-by: Ilya Dryomov <[email protected]>
    Reviewed-by: Viacheslav Dubeyko <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

libceph: make free_choose_arg_map() resilient to partial allocation [+ + +]

Author: Tuo Li <[email protected]>
Date:   Sun Dec 21 02:11:49 2025 +0800

    libceph: make free_choose_arg_map() resilient to partial allocation
    
    commit e3fe30e57649c551757a02e1cad073c47e1e075e upstream.
    
    free_choose_arg_map() may dereference a NULL pointer if its caller fails
    after a partial allocation.
    
    For example, in decode_choose_args(), if allocation of arg_map->args
    fails, execution jumps to the fail label and free_choose_arg_map() is
    called. Since arg_map->size is updated to a non-zero value before memory
    allocation, free_choose_arg_map() will iterate over arg_map->args and
    dereference a NULL pointer.
    
    To prevent this potential NULL pointer dereference and make
    free_choose_arg_map() more resilient, add checks for pointers before
    iterating.
    
    Cc: [email protected]
    Co-authored-by: Ilya Dryomov <[email protected]>
    Signed-off-by: Tuo Li <[email protected]>
    Reviewed-by: Viacheslav Dubeyko <[email protected]>
    Signed-off-by: Ilya Dryomov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

libceph: prevent potential out-of-bounds reads in handle_auth_done() [+ + +]

Author: ziming zhang <[email protected]>
Date:   Thu Dec 11 16:52:58 2025 +0800

    libceph: prevent potential out-of-bounds reads in handle_auth_done()
    
    commit 818156caffbf55cb4d368f9c3cac64e458fb49c9 upstream.
    
    Perform an explicit bounds check on payload_len to avoid a possible
    out-of-bounds access in the callout.
    
    [ idryomov: changelog ]
    
    Cc: [email protected]
    Signed-off-by: ziming zhang <[email protected]>
    Reviewed-by: Ilya Dryomov <[email protected]>
    Signed-off-by: Ilya Dryomov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

libceph: replace overzealous BUG_ON in osdmap_apply_incremental() [+ + +]

Author: Ilya Dryomov <[email protected]>
Date:   Mon Dec 15 11:53:31 2025 +0100

    libceph: replace overzealous BUG_ON in osdmap_apply_incremental()
    
    commit e00c3f71b5cf75681dbd74ee3f982a99cb690c2b upstream.
    
    If the osdmap is (maliciously) corrupted such that the incremental
    osdmap epoch is different from what is expected, there is no need to
    BUG.  Instead, just declare the incremental osdmap to be invalid.
    
    Cc: [email protected]
    Reported-by: ziming zhang <[email protected]>
    Signed-off-by: Ilya Dryomov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

libceph: reset sparse-read state in osd_fault() [+ + +]

Author: Sam Edwards <[email protected]>
Date:   Tue Dec 30 20:05:06 2025 -0800

    libceph: reset sparse-read state in osd_fault()
    
    commit 11194b416ef95012c2cfe5f546d71af07b639e93 upstream.
    
    When a fault occurs, the connection is abandoned, reestablished, and any
    pending operations are retried. The OSD client tracks the progress of a
    sparse-read reply using a separate state machine, largely independent of
    the messenger's state.
    
    If a connection is lost mid-payload or the sparse-read state machine
    returns an error, the sparse-read state is not reset. The OSD client
    will then interpret the beginning of a new reply as the continuation of
    the old one. If this makes the sparse-read machinery enter a failure
    state, it may never recover, producing loops like:
    
      libceph:  [0] got 0 extents
      libceph: data len 142248331 != extent len 0
      libceph: osd0 (1)...:6801 socket error on read
      libceph: data len 142248331 != extent len 0
      libceph: osd0 (1)...:6801 socket error on read
    
    Therefore, reset the sparse-read state in osd_fault(), ensuring retries
    start from a clean state.
    
    Cc: [email protected]
    Fixes: f628d7999727 ("libceph: add sparse read support to OSD client")
    Signed-off-by: Sam Edwards <[email protected]>
    Reviewed-by: Ilya Dryomov <[email protected]>
    Signed-off-by: Ilya Dryomov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

libceph: return the handler error from mon_handle_auth_done() [+ + +]

Author: Ilya Dryomov <[email protected]>
Date:   Mon Dec 29 15:14:48 2025 +0100

    libceph: return the handler error from mon_handle_auth_done()
    
    commit e84b48d31b5008932c0a0902982809fbaa1d3b70 upstream.
    
    Currently any error from ceph_auth_handle_reply_done() is propagated
    via finish_auth() but isn't returned from mon_handle_auth_done().  This
    results in higher layers learning that (despite the monitor considering
    us to be successfully authenticated) something went wrong in the
    authentication phase and reacting accordingly, but msgr2 still trying
    to proceed with establishing the session in the background.  In the
    case of secure mode this can trigger a WARN in setup_crypto() and later
    lead to a NULL pointer dereference inside of prepare_auth_signature().
    
    Cc: [email protected]
    Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
    Signed-off-by: Ilya Dryomov <[email protected]>
    Reviewed-by: Viacheslav Dubeyko <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Linux: Linux 6.12.66 [+ + +]

Author: Greg Kroah-Hartman <[email protected]>
Date:   Sat Jan 17 16:31:30 2026 +0100

    Linux 6.12.66
    
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: Brett A C Sheffield <[email protected]>
    Tested-by: Slade Watkins <[email protected]>
    Tested-by: Francesco Dolcini <[email protected]>
    Tested-by: Shuah Khan <[email protected]>
    Tested-by: Florian Fainelli <[email protected]>
    Tested-by: Salvatore Bonaccorso <[email protected]>
    Tested-by: Ron Economos <[email protected]>
    Tested-by: Jon Hunter <[email protected]>
    Tested-by: Peter Schneider <[email protected]>
    Tested-by: Harshit Mogalapalli <[email protected]>
    Tested-by: Hardik Garg <[email protected]>
    Tested-by: Mark Brown <[email protected]>
    Tested-by: Brett Mastbergen <[email protected]>
    Tested-by: Miguel Ojeda <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mei: me: add nova lake point S DID [+ + +]

Author: Alexander Usyskin <[email protected]>
Date:   Mon Dec 15 12:59:15 2025 +0200

    mei: me: add nova lake point S DID
    
    commit 420f423defcf6d0af2263d38da870ca4a20c0990 upstream.
    
    Add Nova Lake S device id.
    
    Cc: stable <[email protected]>
    Co-developed-by: Tomas Winkler <[email protected]>
    Signed-off-by: Tomas Winkler <[email protected]>
    Signed-off-by: Alexander Usyskin <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/mlx5e: Don't print error message due to invalid module [+ + +]

Author: Gal Pressman <[email protected]>
Date:   Thu Dec 25 15:27:16 2025 +0200

    net/mlx5e: Don't print error message due to invalid module
    
    [ Upstream commit 144297e2a24e3e54aee1180ec21120ea38822b97 ]
    
    Dumping module EEPROM on newer modules is supported through the netlink
    interface only.
    
    Querying with old userspace ethtool (or other tools, such as 'lshw')
    which still uses the ioctl interface results in an error message that
    could flood dmesg (in addition to the expected error return value).
    The original message was added under the assumption that the driver
    should be able to handle all module types, but now that such flows are
    easily triggered from userspace, it doesn't serve its purpose.
    
    Change the log level of the print in mlx5_query_module_eeprom() to
    debug.
    
    Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM")
    Signed-off-by: Gal Pressman <[email protected]>
    Reviewed-by: Tariq Toukan <[email protected]>
    Signed-off-by: Mark Bloch <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset [+ + +]

Author: Xiang Mei <[email protected]>
Date:   Mon Jan 5 20:41:00 2026 -0700

    net/sched: sch_qfq: Fix NULL deref when deactivating inactive aggregate in qfq_reset
    
    [ Upstream commit c1d73b1480235731e35c81df70b08f4714a7d095 ]
    
    `qfq_class->leaf_qdisc->q.qlen > 0` does not imply that the class
    itself is active.
    
    Two qfq_class objects may point to the same leaf_qdisc. This happens
    when:
    
    1. one QFQ qdisc is attached to the dev as the root qdisc, and
    
    2. another QFQ qdisc is temporarily referenced (e.g., via qdisc_get()
    / qdisc_put()) and is pending to be destroyed, as in function
    tc_new_tfilter.
    
    When packets are enqueued through the root QFQ qdisc, the shared
    leaf_qdisc->q.qlen increases. At the same time, the second QFQ
    qdisc triggers qdisc_put and qdisc_destroy: the qdisc enters
    qfq_reset() with its own q->q.qlen == 0, but its class's leaf
    qdisc->q.qlen > 0. Therefore, the qfq_reset would wrongly deactivate
    an inactive aggregate and trigger a null-deref in qfq_deactivate_agg:
    
    [    0.903172] BUG: kernel NULL pointer dereference, address: 0000000000000000
    [    0.903571] #PF: supervisor write access in kernel mode
    [    0.903860] #PF: error_code(0x0002) - not-present page
    [    0.904177] PGD 10299b067 P4D 10299b067 PUD 10299c067 PMD 0
    [    0.904502] Oops: Oops: 0002 [#1] SMP NOPTI
    [    0.904737] CPU: 0 UID: 0 PID: 135 Comm: exploit Not tainted 6.19.0-rc3+ #2 NONE
    [    0.905157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
    [    0.905754] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2))
    [    0.906046] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0
    
    Code starting with the faulting instruction
    ===========================================
       0:   0f 84 4d 01 00 00       je     0x153
       6:   48 89 70 18             mov    %rsi,0x18(%rax)
       a:   8b 4b 10                mov    0x10(%rbx),%ecx
       d:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
      14:   48 8b 78 08             mov    0x8(%rax),%rdi
      18:   48 d3 e2                shl    %cl,%rdx
      1b:   48 21 f2                and    %rsi,%rdx
      1e:   48 2b 13                sub    (%rbx),%rdx
      21:   48 8b 30                mov    (%rax),%rsi
      24:   48 d3 ea                shr    %cl,%rdx
      27:   8b 4b 18                mov    0x18(%rbx),%ecx
            ...
    [    0.907095] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246
    [    0.907368] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000
    [    0.907723] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    [    0.908100] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000
    [    0.908451] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000
    [    0.908804] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880
    [    0.909179] FS:  000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000
    [    0.909572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    0.909857] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0
    [    0.910247] PKRU: 55555554
    [    0.910391] Call Trace:
    [    0.910527]  <TASK>
    [    0.910638]  qfq_reset_qdisc (net/sched/sch_qfq.c:357 net/sched/sch_qfq.c:1485)
    [    0.910826]  qdisc_reset (include/linux/skbuff.h:2195 include/linux/skbuff.h:2501 include/linux/skbuff.h:3424 include/linux/skbuff.h:3430 net/sched/sch_generic.c:1036)
    [    0.911040]  __qdisc_destroy (net/sched/sch_generic.c:1076)
    [    0.911236]  tc_new_tfilter (net/sched/cls_api.c:2447)
    [    0.911447]  rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
    [    0.911663]  ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6861)
    [    0.911894]  netlink_rcv_skb (net/netlink/af_netlink.c:2550)
    [    0.912100]  netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
    [    0.912296]  ? __alloc_skb (net/core/skbuff.c:706)
    [    0.912484]  netlink_sendmsg (net/netlink/af_netlink.c:1894)
    [    0.912682]  sock_write_iter (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1) net/socket.c:1195 (discriminator 1))
    [    0.912880]  vfs_write (fs/read_write.c:593 fs/read_write.c:686)
    [    0.913077]  ksys_write (fs/read_write.c:738)
    [    0.913252]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
    [    0.913438]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
    [    0.913687] RIP: 0033:0x424c34
    [    0.913844] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 2d 44 09 00 00 74 13 b8 01 00 00 00 0f 05 9
    
    Code starting with the faulting instruction
    ===========================================
       0:   89 02                   mov    %eax,(%rdx)
       2:   48 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%rax
       9:   eb bd                   jmp    0xffffffffffffffc8
       b:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
      12:   00 00 00
      15:   90                      nop
      16:   f3 0f 1e fa             endbr64
      1a:   80 3d 2d 44 09 00 00    cmpb   $0x0,0x9442d(%rip)        # 0x9444e
      21:   74 13                   je     0x36
      23:   b8 01 00 00 00          mov    $0x1,%eax
      28:   0f 05                   syscall
      2a:   09                      .byte 0x9
    [    0.914807] RSP: 002b:00007ffea1938b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
    [    0.915197] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000424c34
    [    0.915556] RDX: 000000000000003c RSI: 000000002af378c0 RDI: 0000000000000003
    [    0.915912] RBP: 00007ffea1938bc0 R08: 00000000004b8820 R09: 0000000000000000
    [    0.916297] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffea1938d28
    [    0.916652] R13: 00007ffea1938d38 R14: 00000000004b3828 R15: 0000000000000001
    [    0.917039]  </TASK>
    [    0.917158] Modules linked in:
    [    0.917316] CR2: 0000000000000000
    [    0.917484] ---[ end trace 0000000000000000 ]---
    [    0.917717] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2))
    [    0.917978] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0
    
    Code starting with the faulting instruction
    ===========================================
       0:   0f 84 4d 01 00 00       je     0x153
       6:   48 89 70 18             mov    %rsi,0x18(%rax)
       a:   8b 4b 10                mov    0x10(%rbx),%ecx
       d:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
      14:   48 8b 78 08             mov    0x8(%rax),%rdi
      18:   48 d3 e2                shl    %cl,%rdx
      1b:   48 21 f2                and    %rsi,%rdx
      1e:   48 2b 13                sub    (%rbx),%rdx
      21:   48 8b 30                mov    (%rax),%rsi
      24:   48 d3 ea                shr    %cl,%rdx
      27:   8b 4b 18                mov    0x18(%rbx),%ecx
            ...
    [    0.918902] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246
    [    0.919198] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000
    [    0.919559] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    [    0.919908] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000
    [    0.920289] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000
    [    0.920648] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880
    [    0.921014] FS:  000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000
    [    0.921424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    0.921710] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0
    [    0.922097] PKRU: 55555554
    [    0.922240] Kernel panic - not syncing: Fatal exception
    [    0.922590] Kernel Offset: disabled
    
    Fixes: 0545a3037773 ("pkt_sched: QFQ - quick fair queue scheduler")
    Signed-off-by: Xiang Mei <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: 3com: 3c59x: fix possible null dereference in vortex_probe1() [+ + +]

Author: Thomas Fourier <[email protected]>
Date:   Tue Jan 6 10:47:21 2026 +0100

    net: 3com: 3c59x: fix possible null dereference in vortex_probe1()
    
    commit a4e305ed60f7c41bbf9aabc16dd75267194e0de3 upstream.
    
    pdev can be null and free_ring: can be called in 1297 with a null
    pdev.
    
    Fixes: 55c82617c3e8 ("3c59x: convert to generic DMA API")
    Cc: <[email protected]>
    Signed-off-by: Thomas Fourier <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: enetc: fix build warning when PAGE_SIZE is greater than 128K [+ + +]

Author: Wei Fang <[email protected]>
Date:   Wed Jan 7 17:12:04 2026 +0800

    net: enetc: fix build warning when PAGE_SIZE is greater than 128K
    
    [ Upstream commit 4b5bdabb5449b652122e43f507f73789041d4abe ]
    
    The max buffer size of ENETC RX BD is 0xFFFF bytes, so if the PAGE_SIZE
    is greater than 128K, ENETC_RXB_DMA_SIZE and ENETC_RXB_DMA_SIZE_XDP will
    be greater than 0xFFFF, thus causing a build warning.
    
    This will not cause any practical issues because ENETC is currently only
    used on the ARM64 platform, and the max PAGE_SIZE is 64K. So this patch
    is only for fixing the build warning that occurs when compiling ENETC
    drivers for other platforms.
    
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Fixes: e59bc32df2e9 ("net: enetc: correct the value of ENETC_RXB_TRUESIZE")
    Signed-off-by: Wei Fang <[email protected]>
    Reviewed-by: Frank Li <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: fix memory leak in skb_segment_list for GRO packets [+ + +]

Author: Mohammad Heib <[email protected]>
Date:   Sun Jan 4 23:31:01 2026 +0200

    net: fix memory leak in skb_segment_list for GRO packets
    
    [ Upstream commit 238e03d0466239410b72294b79494e43d4fabe77 ]
    
    When skb_segment_list() is called during packet forwarding, it handles
    packets that were aggregated by the GRO engine.
    
    Historically, the segmentation logic in skb_segment_list assumes that
    individual segments are split from a parent SKB and may need to carry
    their own socket memory accounting. Accordingly, the code transfers
    truesize from the parent to the newly created segments.
    
    Prior to commit ed4cccef64c1 ("gro: fix ownership transfer"), this
    truesize subtraction in skb_segment_list() was valid because fragments
    still carry a reference to the original socket.
    
    However, commit ed4cccef64c1 ("gro: fix ownership transfer") changed
    this behavior by ensuring that fraglist entries are explicitly
    orphaned (skb->sk = NULL) to prevent illegal orphaning later in the
    stack. This change meant that the entire socket memory charge remained
    with the head SKB, but the corresponding accounting logic in
    skb_segment_list() was never updated.
    
    As a result, the current code unconditionally adds each fragment's
    truesize to delta_truesize and subtracts it from the parent SKB. Since
    the fragments are no longer charged to the socket, this subtraction
    results in an effective under-count of memory when the head is freed.
    This causes sk_wmem_alloc to remain non-zero, preventing socket
    destruction and leading to a persistent memory leak.
    
    The leak can be observed via KMEMLEAK when tearing down the networking
    environment:
    
    unreferenced object 0xffff8881e6eb9100 (size 2048):
      comm "ping", pid 6720, jiffies 4295492526
      backtrace:
        kmem_cache_alloc_noprof+0x5c6/0x800
        sk_prot_alloc+0x5b/0x220
        sk_alloc+0x35/0xa00
        inet6_create.part.0+0x303/0x10d0
        __sock_create+0x248/0x640
        __sys_socket+0x11b/0x1d0
    
    Since skb_segment_list() is exclusively used for SKB_GSO_FRAGLIST
    packets constructed by GRO, the truesize adjustment is removed.
    
    The call to skb_release_head_state() must be preserved. As documented in
    commit cf673ed0e057 ("net: fix fraglist segmentation reference count
    leak"), it is still required to correctly drop references to SKB
    extensions that may be overwritten during __copy_skb_header().
    
    Fixes: ed4cccef64c1 ("gro: fix ownership transfer")
    Signed-off-by: Mohammad Heib <[email protected]>
    Reviewed-by: Willem de Bruijn <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: marvell: prestera: fix NULL dereference on devlink_alloc() failure [+ + +]

Author: Alok Tiwari <[email protected]>
Date:   Mon Dec 29 21:21:18 2025 -0800

    net: marvell: prestera: fix NULL dereference on devlink_alloc() failure
    
    [ Upstream commit a428e0da1248c353557970848994f35fd3f005e2 ]
    
    devlink_alloc() may return NULL on allocation failure, but
    prestera_devlink_alloc() unconditionally calls devlink_priv() on
    the returned pointer.
    
    This leads to a NULL pointer dereference if devlink allocation fails.
    Add a check for a NULL devlink pointer and return NULL early to avoid
    the crash.
    
    Fixes: 34dd1710f5a3 ("net: marvell: prestera: Add basic devlink support")
    Signed-off-by: Alok Tiwari <[email protected]>
    Acked-by: Elad Nachman <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: mscc: ocelot: Fix crash when adding interface under a lag [+ + +]

Author: Jerry Wu <[email protected]>
Date:   Thu Dec 25 20:36:17 2025 +0000

    net: mscc: ocelot: Fix crash when adding interface under a lag
    
    [ Upstream commit 34f3ff52cb9fa7dbf04f5c734fcc4cb6ed5d1a95 ]
    
    Commit 15faa1f67ab4 ("lan966x: Fix crash when adding interface under a lag")
    fixed a similar issue in the lan966x driver caused by a NULL pointer dereference.
    The ocelot_set_aggr_pgids() function in the ocelot driver has similar logic
    and is susceptible to the same crash.
    
    This issue specifically affects the ocelot_vsc7514.c frontend, which leaves
    unused ports as NULL pointers. The felix_vsc9959.c frontend is unaffected as
    it uses the DSA framework which registers all ports.
    
    Fix this by checking if the port pointer is valid before accessing it.
    
    Fixes: 528d3f190c98 ("net: mscc: ocelot: drop the use of the "lags" array")
    Signed-off-by: Jerry Wu <[email protected]>
    Reviewed-by: Vladimir Oltean <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: netdevsim: fix inconsistent carrier state after link/unlink [+ + +]

Author: Yohei Kojima <[email protected]>
Date:   Tue Jan 6 00:17:32 2026 +0900

    net: netdevsim: fix inconsistent carrier state after link/unlink
    
    [ Upstream commit d83dddffe1904e4a576d11a541878850a8e64cd2 ]
    
    This patch fixes the edge case behavior on ifup/ifdown and
    linking/unlinking two netdevsim interfaces:
    
    1. unlink two interfaces netdevsim1 and netdevsim2
    2. ifdown netdevsim1
    3. ifup netdevsim1
    4. link two interfaces netdevsim1 and netdevsim2
    5. (Now two interfaces are linked in terms of netdevsim peer, but
        carrier state of the two interfaces remains DOWN.)
    
    This inconsistent behavior is caused by the current implementation,
    which only cares about the "link, then ifup" order, not "ifup, then
    link" order. This patch fixes the inconsistency by calling
    netif_carrier_on() when two netdevsim interfaces are linked.
    
    This patch fixes buggy behavior on NetworkManager-based systems which
    causes the netdevsim test to fail with the following error:
    
      # timeout set to 600
      # selftests: drivers/net/netdevsim: peer.sh
      # 2025/12/25 00:54:03 socat[9115] W address is opened in read-write mode but only supports read-only
      # 2025/12/25 00:56:17 socat[9115] W connect(7, AF=2 192.168.1.1:1234, 16): Connection timed out
      # 2025/12/25 00:56:17 socat[9115] E TCP:192.168.1.1:1234: Connection timed out
      # expected 3 bytes, got 0
      # 2025/12/25 00:56:17 socat[9109] W exiting on signal 15
      not ok 13 selftests: drivers/net/netdevsim: peer.sh # exit=1
    
    This patch also solves timeout on TCP Fast Open (TFO) test in
    NetworkManager-based systems because it also depends on netdevsim's
    carrier consistency.
    
    Fixes: 1a8fed52f7be ("netdevsim: set the carrier when the device goes up")
    Signed-off-by: Yohei Kojima <[email protected]>
    Reviewed-by: Breno Leitao <[email protected]>
    Link: https://patch.msgid.link/602c9e1ba5bb2ee1997bb38b1d866c9c3b807ae9.1767624906.git.yk@y-koj.net
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: sfp: extend Potron XGSPON quirk to cover additional EEPROM variant [+ + +]

Author: Marcus Hughes <[email protected]>
Date:   Sun Dec 7 21:03:55 2025 +0000

    net: sfp: extend Potron XGSPON quirk to cover additional EEPROM variant
    
    [ Upstream commit 71cfa7c893a05d09e7dc14713b27a8309fd4a2db ]
    
    Some Potron SFP+ XGSPON ONU sticks are shipped with different EEPROM
    vendor ID and vendor name strings, but are otherwise functionally
    identical to the existing "Potron SFP+ XGSPON ONU Stick" handled by
    sfp_quirk_potron().
    
    These modules, including units distributed under the "Better Internet"
    branding, use the same UART pin assignment and require the same
    TX_FAULT/LOS behaviour and boot delay. Re-use the existing Potron
    quirk for this EEPROM variant.
    
    Signed-off-by: Marcus Hughes <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: sock: fix hardened usercopy panic in sock_recv_errqueue [+ + +]

Author: Weiming Shi <[email protected]>
Date:   Wed Dec 24 04:35:35 2025 +0800

    net: sock: fix hardened usercopy panic in sock_recv_errqueue
    
    [ Upstream commit 2a71a1a8d0ed718b1c7a9ac61f07e5755c47ae20 ]
    
    skbuff_fclone_cache was created without defining a usercopy region,
    [1] unlike skbuff_head_cache which properly whitelists the cb[] field.
    [2] This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is
    enabled and the kernel attempts to copy sk_buff.cb data to userspace
    via sock_recv_errqueue() -> put_cmsg().
    
    The crash occurs when: 1. TCP allocates an skb using alloc_skb_fclone()
       (from skbuff_fclone_cache) [1]
    2. The skb is cloned via skb_clone() using the pre-allocated fclone
    [3] 3. The cloned skb is queued to sk_error_queue for timestamp
    reporting 4. Userspace reads the error queue via recvmsg(MSG_ERRQUEUE)
    5. sock_recv_errqueue() calls put_cmsg() to copy serr->ee from skb->cb
    [4] 6. __check_heap_object() fails because skbuff_fclone_cache has no
       usercopy whitelist [5]
    
    When cloned skbs allocated from skbuff_fclone_cache are used in the
    socket error queue, accessing the sock_exterr_skb structure in skb->cb
    via put_cmsg() triggers a usercopy hardening violation:
    
    [    5.379589] usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_fclone_cache' (offset 296, size 16)!
    [    5.382796] kernel BUG at mm/usercopy.c:102!
    [    5.383923] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
    [    5.384903] CPU: 1 UID: 0 PID: 138 Comm: poc_put_cmsg Not tainted 6.12.57 #7
    [    5.384903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
    [    5.384903] RIP: 0010:usercopy_abort+0x6c/0x80
    [    5.384903] Code: 1a 86 51 48 c7 c2 40 15 1a 86 41 52 48 c7 c7 c0 15 1a 86 48 0f 45 d6 48 c7 c6 80 15 1a 86 48 89 c1 49 0f 45 f3 e8 84 27 88 ff <0f> 0b 490
    [    5.384903] RSP: 0018:ffffc900006f77a8 EFLAGS: 00010246
    [    5.384903] RAX: 000000000000006f RBX: ffff88800f0ad2a8 RCX: 1ffffffff0f72e74
    [    5.384903] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff87b973a0
    [    5.384903] RBP: 0000000000000010 R08: 0000000000000000 R09: fffffbfff0f72e74
    [    5.384903] R10: 0000000000000003 R11: 79706f6372657375 R12: 0000000000000001
    [    5.384903] R13: ffff88800f0ad2b8 R14: ffffea00003c2b40 R15: ffffea00003c2b00
    [    5.384903] FS:  0000000011bc4380(0000) GS:ffff8880bf100000(0000) knlGS:0000000000000000
    [    5.384903] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    5.384903] CR2: 000056aa3b8e5fe4 CR3: 000000000ea26004 CR4: 0000000000770ef0
    [    5.384903] PKRU: 55555554
    [    5.384903] Call Trace:
    [    5.384903]  <TASK>
    [    5.384903]  __check_heap_object+0x9a/0xd0
    [    5.384903]  __check_object_size+0x46c/0x690
    [    5.384903]  put_cmsg+0x129/0x5e0
    [    5.384903]  sock_recv_errqueue+0x22f/0x380
    [    5.384903]  tls_sw_recvmsg+0x7ed/0x1960
    [    5.384903]  ? srso_alias_return_thunk+0x5/0xfbef5
    [    5.384903]  ? schedule+0x6d/0x270
    [    5.384903]  ? srso_alias_return_thunk+0x5/0xfbef5
    [    5.384903]  ? mutex_unlock+0x81/0xd0
    [    5.384903]  ? __pfx_mutex_unlock+0x10/0x10
    [    5.384903]  ? __pfx_tls_sw_recvmsg+0x10/0x10
    [    5.384903]  ? _raw_spin_lock_irqsave+0x8f/0xf0
    [    5.384903]  ? _raw_read_unlock_irqrestore+0x20/0x40
    [    5.384903]  ? srso_alias_return_thunk+0x5/0xfbef5
    
    The crash offset 296 corresponds to skb2->cb within skbuff_fclones:
      - sizeof(struct sk_buff) = 232 - offsetof(struct sk_buff, cb) = 40 -
      offset of skb2.cb in fclones = 232 + 40 = 272 - crash offset 296 =
      272 + 24 (inside sock_exterr_skb.ee)
    
    This patch uses a local stack variable as a bounce buffer to avoid the hardened usercopy check failure.
    
    [1] https://elixir.bootlin.com/linux/v6.12.62/source/net/ipv4/tcp.c#L885
    [2] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5104
    [3] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5566
    [4] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5491
    [5] https://elixir.bootlin.com/linux/v6.12.62/source/mm/slub.c#L5719
    
    Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
    Reported-by: Xiang Mei <[email protected]>
    Signed-off-by: Weiming Shi <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: usb: pegasus: fix memory leak in update_eth_regs_async() [+ + +]

Author: Petko Manolov <[email protected]>
Date:   Tue Jan 6 10:48:21 2026 +0200

    net: usb: pegasus: fix memory leak in update_eth_regs_async()
    
    [ Upstream commit afa27621a28af317523e0836dad430bec551eb54 ]
    
    When asynchronously writing to the device registers and if usb_submit_urb()
    fail, the code fail to release allocated to this point resources.
    
    Fixes: 323b34963d11 ("drivers: net: usb: pegasus: fix control urb submission")
    Signed-off-by: Petko Manolov <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: wwan: iosm: Fix memory leak in ipc_mux_deinit() [+ + +]

Author: Zilin Guan <[email protected]>
Date:   Tue Dec 30 07:18:53 2025 +0000

    net: wwan: iosm: Fix memory leak in ipc_mux_deinit()
    
    [ Upstream commit 92e6e0a87f6860a4710f9494f8c704d498ae60f8 ]
    
    Commit 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
    allocated memory for pp_qlt in ipc_mux_init() but did not free it in
    ipc_mux_deinit(). This results in a memory leak when the driver is
    unloaded.
    
    Free the allocated memory in ipc_mux_deinit() to fix the leak.
    
    Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
    Co-developed-by: Jianhao Xu <[email protected]>
    Signed-off-by: Jianhao Xu <[email protected]>
    Signed-off-by: Zilin Guan <[email protected]>
    Reviewed-by: Loic Poulain <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netdev: preserve NETIF_F_ALL_FOR_ALL across TSO updates [+ + +]

Author: Di Zhu <[email protected]>
Date:   Wed Dec 24 09:22:24 2025 +0800

    netdev: preserve NETIF_F_ALL_FOR_ALL across TSO updates
    
    [ Upstream commit 02d1e1a3f9239cdb3ecf2c6d365fb959d1bf39df ]
    
    Directly increment the TSO features incurs a side effect: it will also
    directly clear the flags in NETIF_F_ALL_FOR_ALL on the master device,
    which can cause issues such as the inability to enable the nocache copy
    feature on the bonding driver.
    
    The fix is to include NETIF_F_ALL_FOR_ALL in the update mask, thereby
    preventing it from being cleared.
    
    Fixes: b0ce3508b25e ("bonding: allow TSO being set on bonding master")
    Signed-off-by: Di Zhu <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: nf_conncount: update last_gc only when GC has been performed [+ + +]

Author: Fernando Fernandez Mancera <[email protected]>
Date:   Wed Dec 17 15:46:40 2025 +0100

    netfilter: nf_conncount: update last_gc only when GC has been performed
    
    [ Upstream commit 7811ba452402d58628e68faedf38745b3d485e3c ]
    
    Currently last_gc is being updated everytime a new connection is
    tracked, that means that it is updated even if a GC wasn't performed.
    With a sufficiently high packet rate, it is possible to always bypass
    the GC, causing the list to grow infinitely.
    
    Update the last_gc value only when a GC has been actually performed.
    
    Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
    Signed-off-by: Fernando Fernandez Mancera <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: nf_tables: avoid chain re-validation if possible [+ + +]

Author: Florian Westphal <[email protected]>
Date:   Sun Jul 7 01:18:25 2024 +0200

    netfilter: nf_tables: avoid chain re-validation if possible
    
    [ Upstream commit 8e1a1bc4f5a42747c08130b8242ebebd1210b32f ]
    
    Hamza Mahfooz reports cpu soft lock-ups in
    nft_chain_validate():
    
     watchdog: BUG: soft lockup - CPU#1 stuck for 27s! [iptables-nft-re:37547]
    [..]
     RIP: 0010:nft_chain_validate+0xcb/0x110 [nf_tables]
    [..]
      nft_immediate_validate+0x36/0x50 [nf_tables]
      nft_chain_validate+0xc9/0x110 [nf_tables]
      nft_immediate_validate+0x36/0x50 [nf_tables]
      nft_chain_validate+0xc9/0x110 [nf_tables]
      nft_immediate_validate+0x36/0x50 [nf_tables]
      nft_chain_validate+0xc9/0x110 [nf_tables]
      nft_immediate_validate+0x36/0x50 [nf_tables]
      nft_chain_validate+0xc9/0x110 [nf_tables]
      nft_immediate_validate+0x36/0x50 [nf_tables]
      nft_chain_validate+0xc9/0x110 [nf_tables]
      nft_immediate_validate+0x36/0x50 [nf_tables]
      nft_chain_validate+0xc9/0x110 [nf_tables]
      nft_table_validate+0x6b/0xb0 [nf_tables]
      nf_tables_validate+0x8b/0xa0 [nf_tables]
      nf_tables_commit+0x1df/0x1eb0 [nf_tables]
    [..]
    
    Currently nf_tables will traverse the entire table (chain graph), starting
    from the entry points (base chains), exploring all possible paths
    (chain jumps).  But there are cases where we could avoid revalidation.
    
    Consider:
    1  input -> j2 -> j3
    2  input -> j2 -> j3
    3  input -> j1 -> j2 -> j3
    
    Then the second rule does not need to revalidate j2, and, by extension j3,
    because this was already checked during validation of the first rule.
    We need to validate it only for rule 3.
    
    This is needed because chain loop detection also ensures we do not exceed
    the jump stack: Just because we know that j2 is cycle free, its last jump
    might now exceed the allowed stack size.  We also need to update all
    reachable chains with the new largest observed call depth.
    
    Care has to be taken to revalidate even if the chain depth won't be an
    issue: chain validation also ensures that expressions are not called from
    invalid base chains.  For example, the masquerade expression can only be
    called from NAT postrouting base chains.
    
    Therefore we also need to keep record of the base chain context (type,
    hooknum) and revalidate if the chain becomes reachable from a different
    hook location.
    
    Reported-by: Hamza Mahfooz <[email protected]>
    Closes: https://lore.kernel.org/netfilter-devel/20251118221735.GA5477@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
    Tested-by: Hamza Mahfooz <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: nf_tables: fix memory leak in nf_tables_newrule() [+ + +]

Author: Zilin Guan <[email protected]>
Date:   Wed Dec 24 12:48:26 2025 +0000

    netfilter: nf_tables: fix memory leak in nf_tables_newrule()
    
    [ Upstream commit d077e8119ddbb4fca67540f1a52453631a47f221 ]
    
    In nf_tables_newrule(), if nft_use_inc() fails, the function jumps to
    the err_release_rule label without freeing the allocated flow, leading
    to a memory leak.
    
    Fix this by adding a new label err_destroy_flow and jumping to it when
    nft_use_inc() fails. This ensures that the flow is properly released
    in this error case.
    
    Fixes: 1689f25924ada ("netfilter: nf_tables: report use refcount overflow")
    Signed-off-by: Zilin Guan <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: nft_set_pipapo: fix range overlap detection [+ + +]

Author: Florian Westphal <[email protected]>
Date:   Thu Dec 4 12:20:35 2025 +0100

    netfilter: nft_set_pipapo: fix range overlap detection
    
    [ Upstream commit 7711f4bb4b360d9c0ff84db1c0ec91e385625047 ]
    
    set->klen has to be used, not sizeof().  The latter only compares a
    single register but a full check of the entire key is needed.
    
    Example:
    table ip t {
            map s {
                    typeof iifname . ip saddr : verdict
                    flags interval
            }
    }
    
    nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected
    nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected
    nft add element t s '{ "lo" . 10.0.0.0/8 : drop }' # bug: no error
    
    The 3rd 'add element' should be rejected via -ENOTEMPTY, not -EEXIST,
    so userspace / nft can report an error to the user.
    
    The latter is only correct for the 2nd case (re-add of existing element).
    
    As-is, userspace is told that the command was successful, but no elements were
    added.
    
    After this patch, 3rd command gives:
    Error: Could not process rule: File exists
    add element t s { "lo" . 127.0.0.0/8 . "lo"  : drop }
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
    
    Fixes: 0eb4b5ee33f2 ("netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion")
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: nft_synproxy: avoid possible data-race on update operation [+ + +]

Author: Fernando Fernandez Mancera <[email protected]>
Date:   Wed Dec 17 21:21:59 2025 +0100

    netfilter: nft_synproxy: avoid possible data-race on update operation
    
    [ Upstream commit 36a3200575642846a96436d503d46544533bb943 ]
    
    During nft_synproxy eval we are reading nf_synproxy_info struct which
    can be modified on update operation concurrently. As nf_synproxy_info
    struct fits in 32 bits, use READ_ONCE/WRITE_ONCE annotations.
    
    Fixes: ee394f96ad75 ("netfilter: nft_synproxy: add synproxy stateful object support")
    Signed-off-by: Fernando Fernandez Mancera <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

NFS: Fix up the automount fs_context to use the correct cred [+ + +]

Author: Trond Myklebust <[email protected]>
Date:   Fri Nov 28 18:56:46 2025 -0500

    NFS: Fix up the automount fs_context to use the correct cred
    
    [ Upstream commit a2a8fc27dd668e7562b5326b5ed2f1604cb1e2e9 ]
    
    When automounting, the fs_context should be fixed up to use the cred
    from the parent filesystem, since the operation is just extending the
    namespace. Authorisation to enter that namespace will already have been
    provided by the preceding lookup.
    
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

nfsd: check that server is running in unlock_filesystem [+ + +]

Author: Olga Kornievskaia <[email protected]>
Date:   Mon Dec 15 14:10:36 2025 -0500

    nfsd: check that server is running in unlock_filesystem
    
    commit d0424066fcd294977f310964bed6f2a487fa4515 upstream.
    
    If we are trying to unlock the filesystem via an administrative
    interface and nfsd isn't running, it crashes the server. This
    happens currently because nfsd4_revoke_states() access state
    structures (eg., conf_id_hashtbl) that has been freed as a part
    of the server shutdown.
    
    [   59.465072] Call trace:
    [   59.465308]  nfsd4_revoke_states+0x1b4/0x898 [nfsd] (P)
    [   59.465830]  write_unlock_fs+0x258/0x440 [nfsd]
    [   59.466278]  nfsctl_transaction_write+0xb0/0x120 [nfsd]
    [   59.466780]  vfs_write+0x1f0/0x938
    [   59.467088]  ksys_write+0xfc/0x1f8
    [   59.467395]  __arm64_sys_write+0x74/0xb8
    [   59.467746]  invoke_syscall.constprop.0+0xdc/0x1e8
    [   59.468177]  do_el0_svc+0x154/0x1d8
    [   59.468489]  el0_svc+0x40/0xe0
    [   59.468767]  el0t_64_sync_handler+0xa0/0xe8
    [   59.469138]  el0t_64_sync+0x1ac/0x1b0
    
    Ensure this can't happen by taking the nfsd_mutex and checking that
    the server is still up, and then holding the mutex across the call to
    nfsd4_revoke_states().
    
    Reviewed-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Fixes: 1ac3629bf0125 ("nfsd: prepare for supporting admin-revocation of state")
    Cc: [email protected]
    Signed-off-by: Olga Kornievskaia <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

NFSD: Fix permission check for read access to executable-only files [+ + +]

Author: Scott Mayhew <[email protected]>
Date:   Thu Dec 11 07:34:34 2025 -0500

    NFSD: Fix permission check for read access to executable-only files
    
    commit e901c7fce59e72d9f3c92733c379849c4034ac50 upstream.
    
    Commit abc02e5602f7 ("NFSD: Support write delegations in LAYOUTGET")
    added NFSD_MAY_OWNER_OVERRIDE to the access flags passed from
    nfsd4_layoutget() to fh_verify().  This causes LAYOUTGET to fail for
    executable-only files, and causes xfstests generic/126 to fail on
    pNFS SCSI.
    
    To allow read access to executable-only files, what we really want is:
    1. The "permissions" portion of the access flags (the lower 6 bits)
       must be exactly NFSD_MAY_READ
    2. The "hints" portion of the access flags (the upper 26 bits) can
       contain any combination of NFSD_MAY_OWNER_OVERRIDE and
       NFSD_MAY_READ_IF_EXEC
    
    Fixes: abc02e5602f7 ("NFSD: Support write delegations in LAYOUTGET")
    Cc: [email protected] # v6.6+
    Signed-off-by: Scott Mayhew <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

NFSD: net ref data still needs to be freed even if net hasn't startup [+ + +]

Author: Edward Adam Davis <[email protected]>
Date:   Tue Dec 16 18:27:37 2025 +0800

    NFSD: net ref data still needs to be freed even if net hasn't startup
    
    commit 0b88bfa42e5468baff71909c2f324a495318532b upstream.
    
    When the NFSD instance doesn't to startup, the net ref data memory is
    not properly reclaimed, which triggers the memory leak issue reported
    by syzbot [1].
    
    To avoid the problem reported in [1], the net ref data memory reclamation
    action is moved outside of nfsd_net_up when the net is shutdown.
    
    [1]
    unreferenced object 0xffff88812a39dfc0 (size 64):
      backtrace (crc a2262fc6):
        percpu_ref_init+0x94/0x1e0 lib/percpu-refcount.c:76
        nfsd_create_serv+0xbe/0x260 fs/nfsd/nfssvc.c:605
        nfsd_nl_listener_set_doit+0x62/0xb00 fs/nfsd/nfsctl.c:1882
        genl_family_rcv_msg_doit+0x11e/0x190 net/netlink/genetlink.c:1115
        genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
        genl_rcv_msg+0x2fd/0x440 net/netlink/genetlink.c:1210
    
    BUG: memory leak
    
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=6ee3b889bdeada0a6226
    Fixes: 39972494e318 ("nfsd: update percpu_ref to manage references on nfsd_net")
    Cc: [email protected]
    Signed-off-by: Edward Adam Davis <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

nfsd: provide locking for v4_end_grace [+ + +]

Author: NeilBrown <[email protected]>
Date:   Sat Dec 13 13:41:59 2025 -0500

    nfsd: provide locking for v4_end_grace
    
    commit 2857bd59feb63fcf40fe4baf55401baea6b4feb4 upstream.
    
    Writing to v4_end_grace can race with server shutdown and result in
    memory being accessed after it was freed - reclaim_str_hashtbl in
    particularly.
    
    We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
    held while client_tracking_op->init() is called and that can wait for
    an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
    deadlock.
    
    nfsd4_end_grace() is also called by the landromat work queue and this
    doesn't require locking as server shutdown will stop the work and wait
    for it before freeing anything that nfsd4_end_grace() might access.
    
    However, we must be sure that writing to v4_end_grace doesn't restart
    the work item after shutdown has already waited for it.  For this we
    add a new flag protected with nn->client_lock.  It is set only while it
    is safe to make client tracking calls, and v4_end_grace only schedules
    work while the flag is set with the spinlock held.
    
    So this patch adds a nfsd_net field "client_tracking_active" which is
    set as described.  Another field "grace_end_forced", is set when
    v4_end_grace is written.  After this is set, and providing
    client_tracking_active is set, the laundromat is scheduled.
    This "grace_end_forced" field bypasses other checks for whether the
    grace period has finished.
    
    This resolves a race which can result in use-after-free.
    
    Reported-by: Li Lingfeng <[email protected]>
    Closes: https://lore.kernel.org/linux-nfs/[email protected]/T/#t
    Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
    Cc: [email protected]
    Signed-off-by: NeilBrown <[email protected]>
    Tested-by: Li Lingfeng <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

NFSD: Remove NFSERR_EAGAIN [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Dec 9 19:28:49 2025 -0500

    NFSD: Remove NFSERR_EAGAIN
    
    commit c6c209ceb87f64a6ceebe61761951dcbbf4a0baa upstream.
    
    I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881.
    None of these RFCs have an NFS status code that match the numeric
    value "11".
    
    Based on the meaning of the EAGAIN errno, I presume the use of this
    status in NFSD means NFS4ERR_DELAY. So replace the one usage of
    nfserr_eagain, and remove it from NFSD's NFS status conversion
    tables.
    
    As far as I can tell, NFSERR_EAGAIN has existed since the pre-git
    era, but was not actually used by any code until commit f4e44b393389
    ("NFSD: delay unmount source's export after inter-server copy
    completed."), at which time it become possible for NFSD to return
    a status code of 11 (which is not valid NFS protocol).
    
    Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
    Cc: [email protected]
    Reviewed-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

nfsd: use correct loop termination in nfsd4_revoke_states() [+ + +]

Author: NeilBrown <[email protected]>
Date:   Mon Dec 15 08:07:28 2025 +1100

    nfsd: use correct loop termination in nfsd4_revoke_states()
    
    commit fb321998de7639f1954430674475e469fb529d9c upstream.
    
    The loop in nfsd4_revoke_states() stops one too early because
    the end value given is CLIENT_HASH_MASK where it should be
    CLIENT_HASH_SIZE.
    
    This means that an admin request to drop all locks for a filesystem will
    miss locks held by clients which hash to the maximum possible hash value.
    
    Fixes: 1ac3629bf012 ("nfsd: prepare for supporting admin-revocation of state")
    Cc: [email protected]
    Signed-off-by: NeilBrown <[email protected]>
    Reviewed-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

NFSv4: ensure the open stateid seqid doesn't go backwards [+ + +]

Author: Scott Mayhew <[email protected]>
Date:   Mon Nov 3 10:44:15 2025 -0500

    NFSv4: ensure the open stateid seqid doesn't go backwards
    
    [ Upstream commit 2e47c3cc64b44b0b06cd68c2801db92ff143f2b2 ]
    
    We have observed an NFSv4 client receiving a LOCK reply with a status of
    NFS4ERR_OLD_STATEID and subsequently retrying the LOCK request with an
    earlier seqid value in the stateid.  As this was for a new lockowner,
    that would imply that nfs_set_open_stateid_locked() had updated the open
    stateid seqid with an earlier value.
    
    Looking at nfs_set_open_stateid_locked(), if the incoming seqid is out
    of sequence, the task will sleep on the state->waitq for up to 5
    seconds.  If the task waits for the full 5 seconds, then after finishing
    the wait it'll update the open stateid seqid with whatever value the
    incoming seqid has.  If there are multiple waiters in this scenario,
    then the last one to perform said update may not be the one with the
    highest seqid.
    
    Add a check to ensure that the seqid can only be incremented, and add a
    tracepoint to indicate when old seqids are skipped.
    
    Signed-off-by: Scott Mayhew <[email protected]>
    Reviewed-by: Benjamin Coddington <[email protected]>
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

pinctrl: qcom: lpass-lpi: mark the GPIO controller as sleeping [+ + +]

Author: Bartosz Golaszewski <[email protected]>
Date:   Wed Nov 26 13:22:19 2025 +0100

    pinctrl: qcom: lpass-lpi: mark the GPIO controller as sleeping
    
    commit ebc18e9854e5a2b62a041fb57b216a903af45b85 upstream.
    
    The gpio_chip settings in this driver say the controller can't sleep
    but it actually uses a mutex for synchronization. This triggers the
    following BUG():
    
    [    9.233659] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
    [    9.233665] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 554, name: (udev-worker)
    [    9.233669] preempt_count: 1, expected: 0
    [    9.233673] RCU nest depth: 0, expected: 0
    [    9.233688] Tainted: [W]=WARN
    [    9.233690] Hardware name: Dell Inc. Latitude 7455/0FK7MX, BIOS 2.10.1 05/20/2025
    [    9.233694] Call trace:
    [    9.233696]  show_stack+0x24/0x38 (C)
    [    9.233709]  dump_stack_lvl+0x40/0x88
    [    9.233716]  dump_stack+0x18/0x24
    [    9.233722]  __might_resched+0x148/0x160
    [    9.233731]  __might_sleep+0x38/0x98
    [    9.233736]  mutex_lock+0x30/0xd8
    [    9.233749]  lpi_config_set+0x2e8/0x3c8 [pinctrl_lpass_lpi]
    [    9.233757]  lpi_gpio_direction_output+0x58/0x90 [pinctrl_lpass_lpi]
    [    9.233761]  gpiod_direction_output_raw_commit+0x110/0x428
    [    9.233772]  gpiod_direction_output_nonotify+0x234/0x358
    [    9.233779]  gpiod_direction_output+0x38/0xd0
    [    9.233786]  gpio_shared_proxy_direction_output+0xb8/0x2a8 [gpio_shared_proxy]
    [    9.233792]  gpiod_direction_output_raw_commit+0x110/0x428
    [    9.233799]  gpiod_direction_output_nonotify+0x234/0x358
    [    9.233806]  gpiod_configure_flags+0x2c0/0x580
    [    9.233812]  gpiod_find_and_request+0x358/0x4f8
    [    9.233819]  gpiod_get_index+0x7c/0x98
    [    9.233826]  devm_gpiod_get+0x34/0xb0
    [    9.233829]  reset_gpio_probe+0x58/0x128 [reset_gpio]
    [    9.233836]  auxiliary_bus_probe+0xb0/0xf0
    [    9.233845]  really_probe+0x14c/0x450
    [    9.233853]  __driver_probe_device+0xb0/0x188
    [    9.233858]  driver_probe_device+0x4c/0x250
    [    9.233863]  __driver_attach+0xf8/0x2a0
    [    9.233868]  bus_for_each_dev+0xf8/0x158
    [    9.233872]  driver_attach+0x30/0x48
    [    9.233876]  bus_add_driver+0x158/0x2b8
    [    9.233880]  driver_register+0x74/0x118
    [    9.233886]  __auxiliary_driver_register+0x94/0xe8
    [    9.233893]  init_module+0x34/0xfd0 [reset_gpio]
    [    9.233898]  do_one_initcall+0xec/0x300
    [    9.233903]  do_init_module+0x64/0x260
    [    9.233910]  load_module+0x16c4/0x1900
    [    9.233915]  __arm64_sys_finit_module+0x24c/0x378
    [    9.233919]  invoke_syscall+0x4c/0xe8
    [    9.233925]  el0_svc_common+0x8c/0xf0
    [    9.233929]  do_el0_svc+0x28/0x40
    [    9.233934]  el0_svc+0x38/0x100
    [    9.233938]  el0t_64_sync_handler+0x84/0x130
    [    9.233943]  el0t_64_sync+0x17c/0x180
    
    Mark the controller as sleeping.
    
    Fixes: 6e261d1090d6 ("pinctrl: qcom: Add sm8250 lpass lpi pinctrl driver")
    Cc: [email protected]
    Reported-by: Val Packett <[email protected]>
    Closes: https://lore.kernel.org/all/[email protected]/
    Signed-off-by: Bartosz Golaszewski <[email protected]>
    Reviewed-by: Dmitry Baryshkov <[email protected]>
    Reviewed-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Linus Walleij <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

powercap: fix race condition in register_control_type() [+ + +]

Author: Sumeet Pawnikar <[email protected]>
Date:   Sat Dec 6 00:32:16 2025 +0530

    powercap: fix race condition in register_control_type()
    
    [ Upstream commit 7bda1910c4bccd4b8d4726620bb3d6bbfb62286e ]
    
    The device becomes visible to userspace via device_register()
    even before it fully initialized by idr_init(). If userspace
    or another thread tries to register a zone immediately after
    device_register(), the control_type_valid() will fail because
    the control_type is not yet in the list. The IDR is not yet
    initialized, so this race condition causes zone registration
    failure.
    
    Move idr_init() and list addition before device_register()
    fix the race condition.
    
    Signed-off-by: Sumeet Pawnikar <[email protected]>
    [ rjw: Subject adjustment, empty line added ]
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

powercap: fix sscanf() error return value handling [+ + +]

Author: Sumeet Pawnikar <[email protected]>
Date:   Sun Dec 7 20:45:48 2025 +0530

    powercap: fix sscanf() error return value handling
    
    [ Upstream commit efc4c35b741af973de90f6826bf35d3b3ac36bf1 ]
    
    Fix inconsistent error handling for sscanf() return value check.
    
    Implicit boolean conversion is used instead of explicit return
    value checks. The code checks if (!sscanf(...)) which is incorrect
    because:
     1. sscanf returns the number of successfully parsed items
     2. On success, it returns 1 (one item passed)
     3. On failure, it returns 0 or EOF
     4. The check 'if (!sscanf(...))' is wrong because it treats
        success (1) as failure
    
    All occurrences of sscanf() now uses explicit return value check.
    With this behavior it returns '-EINVAL' when parsing fails (returns
    0 or EOF), and continues when parsing succeeds (returns 1).
    
    Signed-off-by: Sumeet Pawnikar <[email protected]>
    [ rjw: Subject and changelog edits ]
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

riscv: pgtable: Cleanup useless VA_USER_XXX definitions [+ + +]

Author: Guo Ren (Alibaba DAMO Academy) <[email protected]>
Date:   Sun Nov 30 19:58:50 2025 -0500

    riscv: pgtable: Cleanup useless VA_USER_XXX definitions
    
    [ Upstream commit 5e5be092ffadcab0093464ccd9e30f0c5cce16b9 ]
    
    These marcos are not used after commit b5b4287accd7 ("riscv: mm: Use
    hint address in mmap if available"). Cleanup VA_USER_XXX definitions
    in asm/pgtable.h.
    
    Fixes: b5b4287accd7 ("riscv: mm: Use hint address in mmap if available")
    Signed-off-by: Guo Ren (Alibaba DAMO Academy) <[email protected]>
    Reviewed-by: Jinjie Ruan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paul Walmsley <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset [+ + +]

Author: Wen Xiong <[email protected]>
Date:   Tue Oct 28 09:24:26 2025 -0500

    scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
    
    [ Upstream commit 6ac3484fb13b2fc7f31cfc7f56093e7d0ce646a5 ]
    
    A dynamic remove/add storage adapter test hits EEH on PowerPC:
    
      EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160
      EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650
      EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr]
      EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr]
      EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr]
      EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440
      EEH: [c00000000017e558] process_one_work+0x298/0x590
      EEH: [c00000000017eef8] worker_thread+0xa8/0x620
      EEH: [c00000000018be34] kthread+0x124/0x130
      EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
    
    A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by
    irqbalance daemon. If we disable irqbalance daemon, we won't see the
    issue.
    
    With debug enabled in ipr driver:
    
      [   44.103071] ipr: Entering __ipr_remove
      [   44.103083] ipr: Entering ipr_initiate_ioa_bringdown
      [   44.103091] ipr: Entering ipr_reset_shutdown_ioa
      [   44.103099] ipr: Leaving ipr_reset_shutdown_ioa
      [   44.103105] ipr: Leaving ipr_initiate_ioa_bringdown
      [   44.149918] ipr: Entering ipr_reset_ucode_download
      [   44.149935] ipr: Entering ipr_reset_alert
      [   44.150032] ipr: Entering ipr_reset_start_timer
      [   44.150038] ipr: Leaving ipr_reset_alert
      [   44.244343] scsi 1:2:3:0: alua: Detached
      [   44.254300] ipr: Entering ipr_reset_start_bist
      [   44.254320] ipr: Entering ipr_reset_start_timer
      [   44.254325] ipr: Leaving ipr_reset_start_bist
      [   44.364329] scsi 1:2:4:0: alua: Detached
      [   45.134341] scsi 1:2:5:0: alua: Detached
      [   45.860949] ipr: Entering ipr_reset_shutdown_ioa
      [   45.860962] ipr: Leaving ipr_reset_shutdown_ioa
      [   45.860966] ipr: Entering ipr_reset_alert
      [   45.861028] ipr: Entering ipr_reset_start_timer
      [   45.861035] ipr: Leaving ipr_reset_alert
      [   45.964302] ipr: Entering ipr_reset_start_bist
      [   45.964309] ipr: Entering ipr_reset_start_timer
      [   45.964313] ipr: Leaving ipr_reset_start_bist
      [   46.264301] ipr: Entering ipr_reset_bist_done
      [   46.264309] ipr: Leaving ipr_reset_bist_done
    
    During adapter reset, ipr device driver blocks config space access but
    can't block MMIO access for MSI-X entries.  There is very small window:
    irqbalance daemon kicks in during adapter reset before ipr driver calls
    pci_restore_state(pdev) to restore MSI-X table.
    
    irqbalance daemon reads back all 0 for that MSI-X vector in
    __pci_read_msi_msg().
    
    irqbalance daemon:
    
      msi_domain_set_affinity()
      ->irq_chip_set_affinity_patent()
      ->xive_irq_set_affinity()
      ->irq_chip_compose_msi_msg()
        ->pseries_msi_compose_msg()
        ->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state
      ->irq_chip_write_msi_msg()
        -> pci_write_msg_msi(): write 0 to the msix vector entry
    
    When ipr driver calls pci_restore_state(pdev) in
    ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared
    by irqbalance daemon in pci_write_msg_msix().
    
      pci_restore_state()
      ->__pci_restore_msix_state()
    
    Below is the MSI-X table for ipr adapter after irqbalance daemon kicked
    in during adapter reset:
    
      Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0
      Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0
      Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0
      Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0
      Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0
      Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0
      Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0
      Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0
      Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0
      ---------> Hit EEH since msix vector of index=8 are 0
      Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0
      Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0
      Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0
      Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0
      Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0
      Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0
      Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0
    
      [   46.264312] ipr: Entering ipr_reset_restore_cfg_space
      [   46.267439] ipr: Entering ipr_fail_all_ops
      [   46.267447] ipr: Leaving ipr_fail_all_ops
      [   46.267451] ipr: Leaving ipr_reset_restore_cfg_space
      [   46.267454] ipr: Entering ipr_ioa_bringdown_done
      [   46.267458] ipr: Leaving ipr_ioa_bringdown_done
      [   46.267467] ipr: Entering ipr_worker_thread
      [   46.267470] ipr: Leaving ipr_worker_thread
    
    IRQ balancing is not required during adapter reset.
    
    Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable
    it after calling pci_restore_state(). The irqbalance daemon is disabled
    for this short period of time (~2s).
    
    Co-developed-by: Kyle Mahlkuch <[email protected]>
    Signed-off-by: Kyle Mahlkuch <[email protected]>
    Signed-off-by: Wen Xiong <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" [+ + +]

Author: Xingui Yang <[email protected]>
Date:   Tue Dec 2 14:56:27 2025 +0800

    scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
    
    [ Upstream commit 278712d20bc8ec29d1ad6ef9bdae9000ef2c220c ]
    
    This reverts commit ab2068a6fb84751836a84c26ca72b3beb349619d.
    
    When probing the exp-attached sata device, libsas/libata will issue a
    hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a
    broadcast event will be received after the disk probe fails, and this
    commit causes the probe will be re-executed on the disk, and a faulty
    disk may get into an indefinite loop of probe.
    
    Therefore, revert this commit, although it can fix some temporary issues
    with disk probe failure.
    
    Signed-off-by: Xingui Yang <[email protected]>
    Reviewed-by: Jason Yan <[email protected]>
    Reviewed-by: John Garry <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

scsi: sg: Fix occasional bogus elapsed time that exceeds timeout [+ + +]

Author: Michal Rábek <[email protected]>
Date:   Fri Dec 12 17:08:23 2025 +0100

    scsi: sg: Fix occasional bogus elapsed time that exceeds timeout
    
    [ Upstream commit 0e1677654259a2f3ccf728de1edde922a3c4ba57 ]
    
    A race condition was found in sg_proc_debug_helper(). It was observed on
    a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring
    /proc/scsi/sg/debug every second. A very large elapsed time would
    sometimes appear. This is caused by two race conditions.
    
    We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64
    architecture. A patched kernel was built, and the race condition could
    not be observed anymore after the application of this patch. A
    reproducer C program utilising the scsi_debug module was also built by
    Changhui Zhong and can be viewed here:
    
    https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c
    
    The first race happens between the reading of hp->duration in
    sg_proc_debug_helper() and request completion in sg_rq_end_io().  The
    hp->duration member variable may hold either of two types of
    information:
    
     #1 - The start time of the request. This value is present while
          the request is not yet finished.
    
     #2 - The total execution time of the request (end_time - start_time).
    
    If sg_proc_debug_helper() executes *after* the value of hp->duration was
    changed from #1 to #2, but *before* srp->done is set to 1 in
    sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the
    elapsed time (value type #2) is subtracted from a timestamp, which
    cannot yield a valid elapsed time (which is a type #2 value as well).
    
    To fix this issue, the value of hp->duration must change under the
    protection of the sfp->rq_list_lock in sg_rq_end_io().  Since
    sg_proc_debug_helper() takes this read lock, the change to srp->done and
    srp->header.duration will happen atomically from the perspective of
    sg_proc_debug_helper() and the race condition is thus eliminated.
    
    The second race condition happens between sg_proc_debug_helper() and
    sg_new_write(). Even though hp->duration is set to the current time
    stamp in sg_add_request() under the write lock's protection, it gets
    overwritten by a call to get_sg_io_hdr(), which calls copy_from_user()
    to copy struct sg_io_hdr from userspace into kernel space. hp->duration
    is set to the start time again in sg_common_write(). If
    sg_proc_debug_helper() is called between these two calls, an arbitrary
    value set by userspace (usually zero) is used to compute the elapsed
    time.
    
    To fix this issue, hp->duration must be set to the current timestamp
    again after get_sg_io_hdr() returns successfully. A small race window
    still exists between get_sg_io_hdr() and setting hp->duration, but this
    window is only a few instructions wide and does not result in observable
    issues in practice, as confirmed by testing.
    
    Additionally, we fix the format specifier from %d to %u for printing
    unsigned int values in sg_proc_debug_helper().
    
    Signed-off-by: Michal Rábek <[email protected]>
    Suggested-by: Tomas Henzl <[email protected]>
    Tested-by: Changhui Zhong <[email protected]>
    Reviewed-by: Ewan D. Milne <[email protected]>
    Reviewed-by: John Meneghini <[email protected]>
    Reviewed-by: Tomas Henzl <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

scsi: ufs: core: Fix EH failure after W-LUN resume error [+ + +]

Author: Brian Kao <[email protected]>
Date:   Wed Nov 12 06:32:02 2025 +0000

    scsi: ufs: core: Fix EH failure after W-LUN resume error
    
    [ Upstream commit b4bb6daf4ac4d4560044ecdd81e93aa2f6acbb06 ]
    
    When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
    including the scsi_target, may be runtime suspended. Subsequently, the
    error handler in ufshcd_recover_pm_error() fails to set the W-LUN device
    back to active because the parent target is not active.  This results in
    the following errors:
    
      google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
      ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
      ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
      ...
      ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
    
    Address this by:
    
     1. Ensuring the W-LUN's parent scsi_target is runtime resumed before
        attempting to set the W-LUN to active within
        ufshcd_recover_pm_error().
    
     2. Explicitly checking for power.runtime_error on the HBA and W-LUN
        devices before calling pm_runtime_set_active() to clear the error
        state.
    
     3. Adding pm_runtime_get_sync(hba->dev) in
        ufshcd_err_handling_prepare() to ensure the HBA itself is active
        during error recovery, even if a child device resume failed.
    
    These changes ensure the device power states are managed correctly
    during error recovery.
    
    Signed-off-by: Brian Kao <[email protected]>
    Tested-by: Brian Kao <[email protected]>
    Reviewed-by: Bart Van Assche <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

smb/client: fix NT_STATUS_DEVICE_DOOR_OPEN value [+ + +]

Author: ChenXiaoSong <[email protected]>
Date:   Sun Dec 7 09:17:57 2025 +0800

    smb/client: fix NT_STATUS_DEVICE_DOOR_OPEN value
    
    [ Upstream commit b2b50fca34da5ec231008edba798ddf92986bd7f ]
    
    This was reported by the KUnit tests in the later patches.
    
    See MS-ERREF 2.3.1 STATUS_DEVICE_DOOR_OPEN. Keep it consistent with the
    value in the documentation.
    
    Signed-off-by: ChenXiaoSong <[email protected]>
    Acked-by: Paulo Alcantara (Red Hat) <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

smb/client: fix NT_STATUS_NO_DATA_DETECTED value [+ + +]

Author: ChenXiaoSong <[email protected]>
Date:   Sun Dec 7 09:13:06 2025 +0800

    smb/client: fix NT_STATUS_NO_DATA_DETECTED value
    
    [ Upstream commit a1237c203f1757480dc2f3b930608ee00072d3cc ]
    
    This was reported by the KUnit tests in the later patches.
    
    See MS-ERREF 2.3.1 STATUS_NO_DATA_DETECTED. Keep it consistent with the
    value in the documentation.
    
    Signed-off-by: ChenXiaoSong <[email protected]>
    Acked-by: Paulo Alcantara (Red Hat) <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

smb/client: fix NT_STATUS_UNABLE_TO_FREE_VM value [+ + +]

Author: ChenXiaoSong <[email protected]>
Date:   Sun Dec 7 09:22:53 2025 +0800

    smb/client: fix NT_STATUS_UNABLE_TO_FREE_VM value
    
    [ Upstream commit 9f99caa8950a76f560a90074e3a4b93cfa8b3d84 ]
    
    This was reported by the KUnit tests in the later patches.
    
    See MS-ERREF 2.3.1 STATUS_UNABLE_TO_FREE_VM. Keep it consistent with the
    value in the documentation.
    
    Signed-off-by: ChenXiaoSong <[email protected]>
    Acked-by: Paulo Alcantara (Red Hat) <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

spi: cadence-quadspi: Prevent lost complete() call during indirect read [+ + +]

Author: Mateusz Litwin <[email protected]>
Date:   Thu Dec 18 22:33:04 2025 +0100

    spi: cadence-quadspi: Prevent lost complete() call during indirect read
    
    [ Upstream commit d67396c9d697041b385d70ff2fd59cb07ae167e8 ]
    
    A race condition exists between the read loop and IRQ `complete()` call.
    An interrupt could call the complete() between the inner loop and
    reinit_completion(), potentially losing the completion event and causing
    an unnecessary timeout. Moving reinit_completion() before the loop
    prevents this. A premature signal will only result in a spurious wakeup
    and another wait cycle, which is preferable to waiting for a timeout.
    
    Signed-off-by: Mateusz Litwin <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

spi: mt65xx: Use IRQF_ONESHOT with threaded IRQ [+ + +]

Author: Fei Shao <[email protected]>
Date:   Wed Dec 17 18:10:47 2025 +0800

    spi: mt65xx: Use IRQF_ONESHOT with threaded IRQ
    
    [ Upstream commit 8c04b77f87e6e321ae6acd28ce1de5553916153f ]
    
    This driver is migrated to use threaded IRQ since commit 5972eb05ca32
    ("spi: spi-mt65xx: Use threaded interrupt for non-SPIMEM transfer"), and
    we almost always want to disable the interrupt line to avoid excess
    interrupts while the threaded handler is processing SPI transfer.
    Use IRQF_ONESHOT for that purpose.
    
    In practice, we see MediaTek devices show SPI transfer timeout errors
    when communicating with ChromeOS EC in certain scenarios, and with
    IRQF_ONESHOT, the issue goes away.
    
    Signed-off-by: Fei Shao <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tls: Use __sk_dst_get() and dst_dev_rcu() in get_netdev_for_sock(). [+ + +]

Author: Kuniyuki Iwashima <[email protected]>
Date:   Tue Sep 16 21:47:23 2025 +0000

    tls: Use __sk_dst_get() and dst_dev_rcu() in get_netdev_for_sock().
    
    commit c65f27b9c3be2269918e1cbad6d8884741f835c5 upstream.
    
    get_netdev_for_sock() is called during setsockopt(),
    so not under RCU.
    
    Using sk_dst_get(sk)->dev could trigger UAF.
    
    Let's use __sk_dst_get() and dst_dev_rcu().
    
    Note that the only ->ndo_sk_get_lower_dev() user is
    bond_sk_get_lower_dev(), which uses RCU.
    
    Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Reviewed-by: Sabrina Dubroca <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    [ Keerthana: Backport to v6.12.y ]
    Signed-off-by: Keerthana K <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tpm2-sessions: Fix out of range indexing in name_size [+ + +]

Author: Jarkko Sakkinen <[email protected]>
Date:   Sun Nov 30 21:07:12 2025 +0200

    tpm2-sessions: Fix out of range indexing in name_size
    
    commit 6e9722e9a7bfe1bbad649937c811076acf86e1fd upstream.
    
    'name_size' does not have any range checks, and it just directly indexes
    with TPM_ALG_ID, which could lead into memory corruption at worst.
    
    Address the issue by only processing known values and returning -EINVAL for
    unrecognized values.
    
    Make also 'tpm_buf_append_name' and 'tpm_buf_fill_hmac_session' fallible so
    that errors are detected before causing any spurious TPM traffic.
    
    End also the authorization session on failure in both of the functions, as
    the session state would be then by definition corrupted.
    
    Cc: [email protected] # v6.10+
    Fixes: 1085b8276bb4 ("tpm: Add the rest of the session HMAC API")
    Reviewed-by: Jonathan McDowell <[email protected]>
    Signed-off-by: Jarkko Sakkinen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

vsock: Make accept()ed sockets use custom setsockopt() [+ + +]

Author: Michal Luczaj <[email protected]>
Date:   Mon Dec 29 20:43:10 2025 +0100

    vsock: Make accept()ed sockets use custom setsockopt()
    
    [ Upstream commit ce5e612dd411de096aa041b9e9325ba1bec5f9f4 ]
    
    SO_ZEROCOPY handling in vsock_connectible_setsockopt() does not get called
    on accept()ed sockets due to a missing flag. Flip it.
    
    Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
    Signed-off-by: Michal Luczaj <[email protected]>
    Link: https://patch.msgid.link/20251229-vsock-child-sock-custom-sockopt-v2-1-64778d6c4f88@rbox.co
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

wifi: avoid kernel-infoleak from struct iw_point [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Thu Jan 8 10:19:27 2026 +0000

    wifi: avoid kernel-infoleak from struct iw_point
    
    commit 21cbf883d073abbfe09e3924466aa5e0449e7261 upstream.
    
    struct iw_point has a 32bit hole on 64bit arches.
    
    struct iw_point {
      void __user   *pointer;       /* Pointer to the data  (in user space) */
      __u16         length;         /* number of fields or size in bytes */
      __u16         flags;          /* Optional params */
    };
    
    Make sure to zero the structure to avoid disclosing 32bits of kernel data
    to user space.
    
    Fixes: 87de87d5e47f ("wext: Dispatch and handle compat ioctls entirely in net/wireless/wext.c")
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/netdev/[email protected]/T/#u
    Signed-off-by: Eric Dumazet <[email protected]>
    Cc: [email protected]
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

wifi: mac80211: restore non-chanctx injection behaviour [+ + +]

Author: Johannes Berg <[email protected]>
Date:   Tue Dec 16 11:52:42 2025 +0100

    wifi: mac80211: restore non-chanctx injection behaviour
    
    commit d594cc6f2c588810888df70c83a9654b6bc7942d upstream.
    
    During the transition to use channel contexts throughout, the
    ability to do injection while in monitor mode concurrent with
    another interface was lost, since the (virtual) monitor won't
    have a chanctx assigned in this scenario.
    
    It's harder to fix drivers that actually transitioned to using
    channel contexts themselves, such as mt76, but it's easy to do
    those that are (still) just using the emulation. Do that.
    
    Cc: [email protected]
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=218763
    Reported-and-tested-by: Oscar Alfonso Diaz <[email protected]>
    Fixes: 0a44dfc07074 ("wifi: mac80211: simplify non-chanctx drivers")
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>