Changelog for kernel 6.1.87

af_unix: Clear stale u->oob_skb.

Author: Kuniyuki Iwashima <[email protected]>
Date:   Fri Apr 5 15:10:57 2024 -0700

    af_unix: Clear stale u->oob_skb.
    
    [ Upstream commit b46f4eaa4f0ec38909fb0072eea3aeddb32f954e ]
    
    syzkaller started to report deadlock of unix_gc_lock after commit
    4090fa373f0e ("af_unix: Replace garbage collection algorithm."), but
    it just uncovers the bug that has been there since commit 314001f0bf92
    ("af_unix: Add OOB support").
    
    The repro basically does the following.
    
      from socket import *
      from array import array
    
      c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
      c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
      c2.recv(1)  # blocked as no normal data in recv queue
    
      c2.close()  # done async and unblock recv()
      c1.close()  # done async and trigger GC
    
    A socket sends its file descriptor to itself as OOB data and tries to
    receive normal data, but finally recv() fails due to async close().
    
    The problem here is wrong handling of OOB skb in manage_oob().  When
    recvmsg() is called without MSG_OOB, manage_oob() is called to check
    if the peeked skb is OOB skb.  In such a case, manage_oob() pops it
    out of the receive queue but does not clear unix_sock(sk)->oob_skb.
    This is wrong in terms of uAPI.
    
    Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
    The 'o' is handled as OOB data.  When recv() is called twice without
    MSG_OOB, the OOB data should be lost.
    
      >>> from socket import *
      >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
      >>> c1.send(b'hello', MSG_OOB)  # 'o' is OOB data
      5
      >>> c1.send(b'world')
      5
      >>> c2.recv(5)  # OOB data is not received
      b'hell'
      >>> c2.recv(5)  # OOB data is skipped
      b'world'
      >>> c2.recv(5, MSG_OOB)  # This should return an error
      b'o'
    
    In the same situation, TCP actually returns -EINVAL for the last
    recv().
    
    Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always sets
    EPOLLPRI even though the data has already been passed over by a previous recv().
    
    To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
    it from recv queue.
    
    The old GC did not trigger the deadlock because it relied on the
    receive queue to detect the loop.
    
    When it is triggered, the socket with OOB data is marked as GC candidate
    because file refcount == inflight count (1).  However, after traversing
    all inflight sockets, the socket still has a positive inflight count (1),
    thus the socket is excluded from candidates.  Then, the old GC loses the
    chance to garbage-collect the socket.
    
    With the old GC, the repro continues to create true garbage that will
    never be freed nor detected by kmemleak as it's linked to the global
    inflight list.  That's why we couldn't even notice the issue.
    
    Fixes: 314001f0bf92 ("af_unix: Add OOB support")
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
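
The uAPI behaviour described in the commit above can be illustrated with a toy user-space model. This is only a sketch: `ToySock`, `clear_on_dequeue`, and `replay` are hypothetical names invented for illustration, not kernel code, and the model returns `b""` on an empty queue where a real socket would block.

```python
import errno
from collections import deque


class ToySock:
    """Toy model of an AF_UNIX stream receive queue with one OOB marker."""

    def __init__(self, clear_on_dequeue):
        self.queue = deque()      # queued "skbs" as dicts
        self.oob_skb = None       # models unix_sk(sk)->oob_skb
        self.clear_on_dequeue = clear_on_dequeue  # True models the fixed kernel

    def send(self, data, oob=False):
        if oob:
            # Only the last byte is out-of-band; the rest is normal data.
            if len(data) > 1:
                self.queue.append({"data": data[:-1], "oob": False})
            skb = {"data": data[-1:], "oob": True}
            self.queue.append(skb)
            self.oob_skb = skb
        else:
            self.queue.append({"data": data, "oob": False})

    def recv(self):
        """Read without MSG_OOB: an OOB skb at the head is dropped, not returned."""
        while self.queue:
            skb = self.queue.popleft()
            if not skb["oob"]:
                return skb["data"]
            if self.clear_on_dequeue:
                self.oob_skb = None   # the step the buggy path was missing
        return b""

    def recv_oob(self):
        """Read with MSG_OOB: fails with EINVAL when no OOB data is pending."""
        if self.oob_skb is None:
            raise OSError(errno.EINVAL, "no OOB data pending")
        skb, self.oob_skb = self.oob_skb, None
        return skb["data"]

    def poll_pri(self):
        """Models whether poll() would report EPOLLPRI."""
        return self.oob_skb is not None


def replay(clear_on_dequeue):
    """Replay the hello/world sequence and report what the last read does."""
    s = ToySock(clear_on_dequeue)
    s.send(b"hello", oob=True)
    s.send(b"world")
    first, second = s.recv(), s.recv()
    pri = s.poll_pri()
    try:
        last = s.recv_oob()
    except OSError as exc:
        last = exc.errno
    return first, second, pri, last
```

In this model, `replay(False)` reproduces the stale `b'o'` read and a permanently set priority flag, while `replay(True)` ends with `EINVAL`, matching the TCP behaviour the commit message refers to.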

af_unix: Do not use atomic ops for unix_sk(sk)->inflight.

Author: Kuniyuki Iwashima <[email protected]>
Date:   Tue Jan 23 09:08:53 2024 -0800

    af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
    
    [ Upstream commit 97af84a6bba2ab2b9c704c08e67de3b5ea551bb2 ]
    
    When touching unix_sk(sk)->inflight, we are always under
    spin_lock(&unix_gc_lock).
    
    Let's convert unix_sk(sk)->inflight to the normal unsigned long.
    
    Signed-off-by: Kuniyuki Iwashima <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Stable-dep-of: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
    Signed-off-by: Sasha Levin <[email protected]>

af_unix: Fix garbage collector racing against connect()

Author: Michal Luczaj <[email protected]>
Date:   Tue Apr 9 22:09:39 2024 +0200

    af_unix: Fix garbage collector racing against connect()
    
    [ Upstream commit 47d8ac011fe1c9251070e1bd64cb10b48193ec51 ]
    
    The garbage collector does not take into account the risk of an embryo
    getting enqueued during garbage collection. If such an embryo has a peer
    that carries SCM_RIGHTS, two consecutive passes of scan_children() may see
    a different set of children, leading to an incorrectly elevated inflight
    count and then a dangling pointer within the gc_inflight_list.
    
    sockets are AF_UNIX/SOCK_STREAM
    S is an unconnected socket
    L is a listening in-flight socket bound to addr, not in fdtable
    V's fd will be passed via sendmsg(), gets inflight count bumped
    
    connect(S, addr)        sendmsg(S, [V]); close(V)       __unix_gc()
    ----------------        -------------------------       -----------
    
    NS = unix_create1()
    skb1 = sock_wmalloc(NS)
    L = unix_find_other(addr)
    unix_state_lock(L)
    unix_peer(S) = NS
                            // V count=1 inflight=0
    
                            NS = unix_peer(S)
                            skb2 = sock_alloc()
                            skb_queue_tail(NS, skb2[V])
    
                            // V became in-flight
                            // V count=2 inflight=1
    
                            close(V)
    
                            // V count=1 inflight=1
                            // GC candidate condition met
    
                                                    for u in gc_inflight_list:
                                                      if (total_refs == inflight_refs)
                                                        add u to gc_candidates
    
                                                    // gc_candidates={L, V}
    
                                                    for u in gc_candidates:
                                                      scan_children(u, dec_inflight)
    
                                                    // embryo (skb1) was not
                                                    // reachable from L yet, so V's
                                                    // inflight remains unchanged
    __skb_queue_tail(L, skb1)
    unix_state_unlock(L)
                                                    for u in gc_candidates:
                                                      if (u.inflight)
                                                        scan_children(u, inc_inflight_move_tail)
    
                                                    // V count=1 inflight=2 (!)
    
    If there is a GC-candidate listening socket, lock/unlock its state. This
    makes GC wait until the end of any ongoing connect() to that socket. After
    flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
    there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
    this point, unix_inflight() can not happen because unix_gc_lock is already
    taken. Inflight graph remains unaffected.
    
    Fixes: 1fd05ba5a2f2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
    Signed-off-by: Michal Luczaj <[email protected]>
    Reviewed-by: Kuniyuki Iwashima <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
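
The counting error in the timeline above can be replayed with a small user-space model. This is a deliberately simplified sketch: `Sock`, `children`, and `simulate` are hypothetical names, L is assumed to keep an inflight count of 1 held by a non-candidate, and the real `scan_children()` list handling is reduced to two plain loops.

```python
class Sock:
    """Toy socket: an inflight counter plus what is reachable from it."""

    def __init__(self, name, inflight=0):
        self.name = name
        self.inflight = inflight
        self.embryos = []   # embryo sockets queued on a listening socket
        self.rights = []    # sockets whose fds sit in this socket's queue


def children(u):
    """Sockets reachable from u: its own queue plus its embryos' queues."""
    found = list(u.rights)
    for embryo in u.embryos:
        found.extend(embryo.rights)
    return found


def simulate(embryo_enqueued):
    """Return V's inflight count after both GC passes.

    embryo_enqueued is "between_passes" (the race) or "before_scan"
    (GC waited for the ongoing connect() to finish, as the fix arranges).
    """
    L = Sock("L", inflight=1)   # listening, in-flight (assumed held elsewhere)
    V = Sock("V", inflight=1)   # fd passed via sendmsg(), then closed
    NS = Sock("NS")             # embryo created by connect()
    NS.rights.append(V)         # skb2[V] queued on NS
    candidates = [L, V]

    if embryo_enqueued == "before_scan":
        L.embryos.append(NS)

    # Pass 1: decrement inflight of every candidate reachable from a candidate.
    for u in candidates:
        for c in children(u):
            if c in candidates:
                c.inflight -= 1

    if embryo_enqueued == "between_passes":
        L.embryos.append(NS)    # __skb_queue_tail(L, skb1) lands here

    # Pass 2: restore counts for children of candidates that are still inflight.
    for u in candidates:
        if u.inflight:
            for c in children(u):
                if c in candidates:
                    c.inflight += 1

    return V.inflight
```

Under these assumptions `simulate("between_passes")` leaves V with an inflight count of 2 although only one reference is in flight, while `simulate("before_scan")` restores it to 1.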

arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order

Author: Frank Li <[email protected]>
Date:   Fri Mar 22 12:47:05 2024 -0400

    arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
    
    [ Upstream commit c6ddd6e7b166532a0816825442ff60f70aed9647 ]
    
    The actual clock shows the wrong frequency:
    
       echo on >/sys/devices/platform/bus\@5b000000/5b010000.mmc/power/control
       cat /sys/kernel/debug/mmc0/ios
    
       clock:          200000000 Hz
       actual clock:   166000000 Hz
                       ^^^^^^^^^
       .....
    
    According to
    
    sdhc0_lpcg: clock-controller@5b200000 {
                    compatible = "fsl,imx8qxp-lpcg";
                    reg = <0x5b200000 0x10000>;
                    #clock-cells = <1>;
                    clocks = <&clk IMX_SC_R_SDHC_0 IMX_SC_PM_CLK_PER>,
                             <&conn_ipg_clk>, <&conn_axi_clk>;
                    clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>,
                                    <IMX_LPCG_CLK_5>;
                    clock-output-names = "sdhc0_lpcg_per_clk",
                                         "sdhc0_lpcg_ipg_clk",
                                         "sdhc0_lpcg_ahb_clk";
                    power-domains = <&pd IMX_SC_R_SDHC_0>;
            }
    
    "per_clk" should be IMX_LPCG_CLK_0 instead of IMX_LPCG_CLK_5.
    
    After correcting the clock order:
    
       echo on >/sys/devices/platform/bus\@5b000000/5b010000.mmc/power/control
       cat /sys/kernel/debug/mmc0/ios
    
       clock:          200000000 Hz
       actual clock:   198000000 Hz
                       ^^^^^^^^
       ...
    
    Fixes: 16c4ea7501b1 ("arm64: dts: imx8: switch to new lpcg clock binding")
    Signed-off-by: Frank Li <[email protected]>
    Signed-off-by: Shawn Guo <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ata: libata-scsi: Fix ata_scsi_dev_rescan() error path

Author: Damien Le Moal <[email protected]>
Date:   Fri Apr 12 08:41:15 2024 +0900

    ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
    
    commit 79336504781e7fee5ddaf046dcc186c8dfdf60b1 upstream.
    
    Commit 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
    incorrectly handles failures of scsi_resume_device() in
    ata_scsi_dev_rescan(), leading to a double call to
    spin_unlock_irqrestore() to unlock a device port. Fix this by redefining
    the goto labels used in case of errors and only unlock the port
    scsi_scan_mutex when scsi_resume_device() fails.
    
    Bug found with the Smatch static checker warning:
    
            drivers/ata/libata-scsi.c:4774 ata_scsi_dev_rescan()
            error: double unlocked 'ap->lock' (orig line 4757)
    
    Reported-by: Dan Carpenter <[email protected]>
    Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
    Cc: [email protected]
    Signed-off-by: Damien Le Moal <[email protected]>
    Reviewed-by: Niklas Cassel <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

batman-adv: Avoid infinite loop trying to resize local TT

Author: Sven Eckelmann <[email protected]>
Date:   Mon Feb 12 13:58:33 2024 +0100

    batman-adv: Avoid infinite loop trying to resize local TT
    
    commit b1f532a3b1e6d2e5559c7ace49322922637a28aa upstream.
    
    If the MTU of one of the attached interfaces becomes too small to transmit
    the local translation table then it must be resized to fit inside all
    fragments (when enabled) or a single packet.
    
    But if the MTU becomes too low to transmit even the header + the VLAN
    specific part then the resizing of the local TT will never succeed. This
    can for example happen when the usable space is 110 bytes and 11 VLANs are
    on top of batman-adv. In this case, at least 116 bytes would be needed.
    There will just be an endless spam of
    
       batman_adv: batadv0: Forced to purge local tt entries to fit new maximum fragment MTU (110)
    
    in the log but the function will never finish. Problem here is that the
    timeout will be halved all the time and will then stagnate at 0 and
    therefore never be able to reduce the table even more.
    
    There are other scenarios possible with a similar result. The number of
    BATADV_TT_CLIENT_NOPURGE entries in the local TT can for example be too
    high to fit inside a packet. Such a scenario can therefore happen also with
    only a single VLAN + 7 non-purgeable addresses - requiring at least 120
    bytes.
    
    While this should be handled proactively when:
    
    * interface with too low MTU is added
    * VLAN is added
    * non-purgeable local mac is added
    * MTU of an attached interface is reduced
    * fragmentation setting gets disabled (which most likely requires dropping
      attached interfaces)
    
    not all of these scenarios can be prevented because batman-adv is only
    consuming events without the possibility to prevent these actions
    (non-purgeable MAC address added, MTU of an attached interface is reduced).
    It is therefore necessary to also make sure that the code is able to
    handle situations in which an incompatible system configuration is
    already present.
    
    Cc: [email protected]
    Fixes: a19d3d85e1b8 ("batman-adv: limit local translation table max size")
    Reported-by: [email protected]
    Signed-off-by: Sven Eckelmann <[email protected]>
    Signed-off-by: Simon Wunderlich <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
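
The stagnating-timeout problem described above can be sketched in user space. This is not the actual patch: `resize_local_tt`, `fixed_overhead`, and `entry_size` are hypothetical names and values chosen for illustration, and stopping once the timeout reaches zero is just one plausible way to bound such a loop.

```python
def resize_local_tt(entry_ages, fixed_overhead, entry_size, limit, timeout):
    """Purge entries older than `timeout`, halving the timeout each round.

    fixed_overhead models the header plus VLAN-specific part that can
    never be purged. Returns (fits, rounds). The loop stops once the
    timeout reaches 0, because further rounds cannot purge anything new.
    """
    entries = list(entry_ages)
    rounds = 0
    while timeout > 0:
        size = fixed_overhead + entry_size * len(entries)
        if size <= limit:
            return True, rounds
        # Keep only entries that are not older than the current timeout.
        entries = [age for age in entries if age <= timeout]
        timeout //= 2
        rounds += 1
    size = fixed_overhead + entry_size * len(entries)
    return size <= limit, rounds
```

With the numbers from the commit message (116 bytes of unpurgeable overhead against 110 usable bytes), the call reports failure after a bounded number of rounds instead of spinning forever; when the overhead itself fits, the loop ends as soon as enough entries have been purged.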

Bluetooth: Fix memory leak in hci_req_sync_complete()

Author: Dmitry Antipov <[email protected]>
Date:   Tue Apr 2 14:32:05 2024 +0300

    Bluetooth: Fix memory leak in hci_req_sync_complete()
    
    commit 45d355a926ab40f3ae7bc0b0a00cb0e3e8a5a810 upstream.
    
    In 'hci_req_sync_complete()', always free the previous sync
    request state before assigning reference to a new one.
    
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=39ec16ff6cc18b1d066d
    Cc: [email protected]
    Fixes: f60cb30579d3 ("Bluetooth: Convert hci_req_sync family of function to new request API")
    Signed-off-by: Dmitry Antipov <[email protected]>
    Signed-off-by: Luiz Augusto von Dentz <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Bluetooth: L2CAP: Fix not validating setsockopt user input

Author: Luiz Augusto von Dentz <[email protected]>
Date:   Fri Apr 5 15:50:47 2024 -0400

    Bluetooth: L2CAP: Fix not validating setsockopt user input
    
    [ Upstream commit 4f3951242ace5efc7131932e2e01e6ac6baed846 ]
    
    Check user input length before copying data.
    
    Fixes: 33575df7be67 ("Bluetooth: move l2cap_sock_setsockopt() to l2cap_sock.c")
    Fixes: 3ee7b7cd8390 ("Bluetooth: Add BT_MODE socket option")
    Signed-off-by: Eric Dumazet <[email protected]>
    Signed-off-by: Luiz Augusto von Dentz <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

Bluetooth: SCO: Fix not validating setsockopt user input

Author: Luiz Augusto von Dentz <[email protected]>
Date:   Fri Apr 5 15:41:52 2024 -0400

    Bluetooth: SCO: Fix not validating setsockopt user input
    
    [ Upstream commit 51eda36d33e43201e7a4fd35232e069b2c850b01 ]
    
    syzbot reported sco_sock_setsockopt() is copying data without
    checking user input length.
    
    BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset
    include/linux/sockptr.h:49 [inline]
    BUG: KASAN: slab-out-of-bounds in copy_from_sockptr
    include/linux/sockptr.h:55 [inline]
    BUG: KASAN: slab-out-of-bounds in sco_sock_setsockopt+0xc0b/0xf90
    net/bluetooth/sco.c:893
    Read of size 4 at addr ffff88805f7b15a3 by task syz-executor.5/12578
    
    Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option")
    Fixes: b96e9c671b05 ("Bluetooth: Add BT_DEFER_SETUP option to sco socket")
    Fixes: 00398e1d5183 ("Bluetooth: Add support for BT_PKT_STATUS CMSG data for SCO connections")
    Fixes: f6873401a608 ("Bluetooth: Allow setting of codec for HFP offload use case")
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Signed-off-by: Luiz Augusto von Dentz <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
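
The general pattern behind this fix, checking the caller-supplied length before copying a fixed-size option value, can be sketched in user space as follows. `parse_u32_opt` is a hypothetical helper for illustration only; it is not the kernel function used by the patch.

```python
import errno
import struct

U32 = struct.Struct("=I")  # a 4-byte unsigned option value, native byte order


def parse_u32_opt(optval: bytes) -> int:
    """Decode a 4-byte option, rejecting input shorter than the target type."""
    if len(optval) < U32.size:
        # Without this check a 4-byte read would run past the caller's buffer.
        raise OSError(errno.EINVAL, "option value too short")
    return U32.unpack_from(optval)[0]
```

A caller passing fewer than four bytes gets `EINVAL` instead of a read beyond the supplied buffer, which is the class of access the KASAN report above flags.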

bnxt_en: Reset PTP tx_avail after possible firmware reset

Author: Pavan Chebbi <[email protected]>
Date:   Fri Apr 5 16:55:13 2024 -0700

    bnxt_en: Reset PTP tx_avail after possible firmware reset
    
    [ Upstream commit faa12ca245585379d612736a4b5e98e88481ea59 ]
    
    It is possible that during error recovery and firmware reset,
    there is a pending TX PTP packet waiting for the timestamp.
    We need to reset this condition so that after recovery, the
    tx_avail count for PTP is reset back to the initial value.
    Otherwise, we may not accept any PTP TX timestamps after
    recovery.
    
    Fixes: 118612d519d8 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods")
    Reviewed-by: Kalesh AP <[email protected]>
    Signed-off-by: Pavan Chebbi <[email protected]>
    Signed-off-by: Michael Chan <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans

Author: Boris Burkov <[email protected]>
Date:   Thu Mar 21 10:18:39 2024 -0700

    btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans
    
    commit 211de93367304ab395357f8cb12568a4d1e20701 upstream.
    
    The transaction is only able to free PERTRANS reservations for a root
    once that root has been recorded with the TRANS tag on the roots radix
    tree. Therefore, until we are sure that this root will get tagged, it
    isn't safe to convert. Generally, this is not an issue as *some*
    transaction will likely tag the root before long and this reservation
    will get freed in that transaction, but technically it could stick
    around until unmount and result in a warning about leaked metadata
    reservation space.
    
    This path is most exercised by running the generic/269 fstest with
    CONFIG_BTRFS_DEBUG.
    
    Fixes: a6496849671a ("btrfs: fix start transaction qgroup rsv double free")
    CC: [email protected] # 6.6+
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Boris Burkov <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: qgroup: correctly model root qgroup rsv in convert

Author: Boris Burkov <[email protected]>
Date:   Tue Mar 19 10:54:22 2024 -0700

    btrfs: qgroup: correctly model root qgroup rsv in convert
    
    commit 141fb8cd206ace23c02cd2791c6da52c1d77d42a upstream.
    
    We use add_root_meta_rsv and sub_root_meta_rsv to track prealloc and
    pertrans reservations for subvolumes when quotas are enabled. The
    convert function does not properly increment pertrans after decrementing
    prealloc, so the count is not accurate.
    
    Note: we check that the fs is not read-only to mirror the logic in
    qgroup_convert_meta, which checks that before adding to the pertrans rsv.
    
    Fixes: 8287475a2055 ("btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space")
    CC: [email protected] # 6.1+
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Boris Burkov <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: record delayed inode root in transaction

Author: Boris Burkov <[email protected]>
Date:   Thu Mar 21 10:14:24 2024 -0700

    btrfs: record delayed inode root in transaction
    
    commit 71537e35c324ea6fbd68377a4f26bb93a831ae35 upstream.
    
    When running delayed inode updates, we do not record the inode's root in
    the transaction, but we do allocate PREALLOC and thus converted PERTRANS
    space for it. To be sure we free that PERTRANS meta rsv, we must ensure
    that we record the root in the transaction.
    
    Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item")
    CC: [email protected] # 6.1+
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Boris Burkov <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/amd/display: fix disable otg wa logic in DCN316

Author: Fudongwang <[email protected]>
Date:   Tue Mar 26 16:03:16 2024 +0800

    drm/amd/display: fix disable otg wa logic in DCN316
    
    commit cf79814cb0bf5749b9f0db53ca231aa540c02768 upstream.
    
    [Why]
    Wrong logic causes screen corruption.
    
    [How]
    Port logic from DCN35/314.
    
    Cc: [email protected]
    Reviewed-by: Nicholas Kazlauskas <[email protected]>
    Acked-by: Hamza Mahfooz <[email protected]>
    Signed-off-by: Fudongwang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11

Author: Tim Huang <[email protected]>
Date:   Wed Mar 27 13:10:37 2024 +0800

    drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11
    
    commit 31729e8c21ecfd671458e02b6511eb68c2225113 upstream.
    
    While doing multiple S4 stress tests, GC/RLC/PMFW get into
    an invalid state, resulting in hard hangs.
    
    Adding a GFX reset as workaround just before sending the
    MP1_UNLOAD message avoids this failure.
    
    Signed-off-by: Tim Huang <[email protected]>
    Acked-by: Alex Deucher <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: Mario Limonciello <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/amdgpu: always force full reset for SOC21

Author: Alex Deucher <[email protected]>
Date:   Sat Mar 23 20:46:53 2024 -0400

    drm/amdgpu: always force full reset for SOC21
    
    commit 65ff8092e4802f96d87d3d7cde146961f5228265 upstream.
    
    There are cases where soft reset seems to succeed, but
    does not, so always use mode1/2 for now.
    
    Reviewed-by: Harish Kasiviswanathan <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/amdgpu: Reset dGPU if suspend got aborted

Author: Lijo Lazar <[email protected]>
Date:   Wed Feb 14 17:55:54 2024 +0530

    drm/amdgpu: Reset dGPU if suspend got aborted
    
    commit 8b2be55f4d6c1099d7f629b0ed7535a5be788c83 upstream.
    
    For SOC21 ASICs, there is an issue in re-enabling PM features if a
    suspend got aborted. In such cases, reset the device during resume
    phase. This is a workaround till a proper solution is finalized.
    
    Signed-off-by: Lijo Lazar <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
    Reviewed-by: Yang Wang <[email protected]>
    Reviewed-by: Hawking Zhang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/amdkfd: Reset GPU on queue preemption failure

Author: Harish Kasiviswanathan <[email protected]>
Date:   Tue Mar 26 15:32:46 2024 -0400

    drm/amdkfd: Reset GPU on queue preemption failure
    
    commit 8bdfb4ea95ca738d33ef71376c21eba20130f2eb upstream.
    
    Currently, with F32 HWS, a GPU reset is triggered only when unmap queue fails.
    
    However, if a compute queue doesn't respond to a preemption request in time,
    unmap will return without any error. In this case, only the preemption error
    is logged and a reset is not triggered. Call GPU reset in this case also.
    
    Reviewed-by: Alex Deucher <[email protected]>
    Signed-off-by: Harish Kasiviswanathan <[email protected]>
    Reviewed-by: Mukul Joshi <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/ast: Fix soft lockup

Author: Jammy Huang <[email protected]>
Date:   Wed Apr 3 17:02:46 2024 +0800

    drm/ast: Fix soft lockup
    
    commit bc004f5038220b1891ef4107134ccae44be55109 upstream.
    
    There is a while-loop in ast_dp_set_on_off() that could lead to an
    infinite loop. This is because the register, VGACRI-Dx, checked in
    this API is a scratch register actually controlled by an MCU, named
    DPMCU, in the BMC.
    
    These scratch registers are protected by scu-lock. If scu-lock is not
    off, DPMCU cannot update these registers and the host will then have a
    soft lockup because the status is never updated.
    
    DPMCU is used to control DP and related registers to handshake with
    the host's VGA driver. Even the most time-consuming task, DP's link
    training, takes less than 100ms. 200ms should be enough.
    
    Signed-off-by: Jammy Huang <[email protected]>
    Fixes: 594e9c04b586 ("drm/ast: Create the driver for ASPEED proprietory Display-Port")
    Reviewed-by: Jocelyn Falempe <[email protected]>
    Reviewed-by: Thomas Zimmermann <[email protected]>
    Signed-off-by: Thomas Zimmermann <[email protected]>
    Cc: KuoHsiang Chou <[email protected]>
    Cc: Thomas Zimmermann <[email protected]>
    Cc: Dave Airlie <[email protected]>
    Cc: Jocelyn Falempe <[email protected]>
    Cc: [email protected]
    Cc: <[email protected]> # v5.19+
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
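
The idea of bounding a status poll with a deadline, as the 200ms budget above suggests, can be sketched generically. `wait_for_status` and its parameters are hypothetical names for illustration; the driver itself is C and its actual loop may differ.

```python
import time


def wait_for_status(read_status, expected, timeout_s=0.2, poll_interval_s=0.001,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll read_status() until it returns `expected` or the deadline passes.

    Returns True on success and False on timeout, so a status source that
    never updates cannot stall the caller forever.
    """
    deadline = clock() + timeout_s
    while True:
        if read_status() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The `clock` and `sleep` parameters exist only so the sketch can be exercised with a fake clock; a caller would normally rely on the defaults.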

drm/client: Fully protect modes[] with dev->mode_config.mutex

Author: Ville Syrjälä <[email protected]>
Date:   Thu Apr 4 23:33:25 2024 +0300

    drm/client: Fully protect modes[] with dev->mode_config.mutex
    
    commit 3eadd887dbac1df8f25f701e5d404d1b90fd0fea upstream.
    
    The modes[] array contains pointers to modes on the connectors'
    mode lists, which are protected by dev->mode_config.mutex.
    Thus we need to extend the same protection to modes[], or by the
    time we use it the elements may already be pointing to
    freed/reused memory.
    
    Cc: [email protected]
    Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10583
    Signed-off-by: Ville Syrjälä <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Reviewed-by: Dmitry Baryshkov <[email protected]>
    Reviewed-by: Jani Nikula <[email protected]>
    Reviewed-by: Thomas Zimmermann <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/i915/cdclk: Fix CDCLK programming order when pipes are active

Author: Ville Syrjälä <[email protected]>
Date:   Tue Apr 2 18:50:03 2024 +0300

    drm/i915/cdclk: Fix CDCLK programming order when pipes are active
    
    commit 7b1f6b5aaec0f849e19c3e99d4eea75876853cdd upstream.
    
    Currently we always reprogram CDCLK from the
    intel_set_cdclk_pre_plane_update() when using squash/crawl.
    The code only works correctly for the cd2x update or full
    modeset cases, and it was simply never updated to deal with
    squash/crawl.
    
    If the CDCLK frequency is increasing we must reprogram it
    before we do anything else that might depend on the new
    higher frequency, and conversely we must not decrease
    the frequency until everything that might still depend
    on the old higher frequency has been dealt with.
    
    Since cdclk_state->pipe is only relevant when doing a cd2x
    update we can't use it to determine the correct sequence
    during squash/crawl. To that end introduce cdclk_state->disable_pipes
    which simply indicates that we must perform the update
    while the pipes are disabled (i.e. during
    intel_set_cdclk_pre_plane_update()). Otherwise we use the
    same old vs. new CDCLK frequency comparison as for cd2x
    updates.
    
    The only remaining problem case is when the voltage_level
    needs to increase due to a DDI port, but the CDCLK frequency
    is decreasing (and not all pipes are being disabled). The
    current approach will not bump the voltage level up until
    after the port has already been enabled, which is too late.
    But we'll take care of that case separately.
    
    v2: Don't break the "must disable pipes case"
    v3: Keep the on stack 'pipe' for future use
    
    Cc: [email protected]
    Fixes: d62686ba3b54 ("drm/i915/adl_p: CDCLK crawl support for ADL")
    Reviewed-by: Uma Shankar <[email protected]>
    Reviewed-by: Gustavo Sousa <[email protected]>
    Signed-off-by: Ville Syrjälä <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit 3aecee90ac12a351905f12dda7643d5b0676d6ca)
    Signed-off-by: Rodrigo Vivi <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
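
The ordering rule in this commit message can be condensed into a small decision function. This is a sketch under stated assumptions: `cdclk_update_phase` is a hypothetical name, the real driver is C, and treating an unchanged frequency as a "pre" update is an assumption made here rather than something the message states.

```python
def cdclk_update_phase(old_khz, new_khz, disable_pipes):
    """Decide when to reprogram CDCLK relative to the plane update.

    Returns "pre" when the clock must change before anything that may
    depend on the new, higher frequency, and "post" when it must only
    drop after everything relying on the old, higher frequency is done.
    """
    if disable_pipes:
        # The update has to happen while the pipes are disabled.
        return "pre"
    if new_khz >= old_khz:
        # Increasing (or, by assumption, unchanged): program it first.
        return "pre"
    # Decreasing with pipes active: defer until after the plane update.
    return "post"
```

For example, a crawl from 307200 kHz up to 556800 kHz would be classified as "pre", and the reverse transition as "post" unless the pipes have to be disabled.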

drm/i915/vrr: Disable VRR when using bigjoiner

Author: Ville Syrjälä <[email protected]>
Date:   Fri Apr 5 00:34:29 2024 +0300

    drm/i915/vrr: Disable VRR when using bigjoiner
    
    commit dcd8992e47f13afb5c11a61e8d9c141c35e23751 upstream.
    
    All joined pipes share the same transcoder/timing generator.
    Currently we just do the commits per-pipe, which doesn't really
    work if we need to switch between non-VRR and VRR timing
    generators on the fly, or even when sending the push to the
    transcoder. For now just disable VRR when bigjoiner is needed.
    
    Cc: [email protected]
    Tested-by: Vidya Srinivas <[email protected]>
    Reviewed-by: Vandita Kulkarni <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Ville Syrjälä <[email protected]>
    (cherry picked from commit f9d5e51db65652dbd8a2102fd7619440e3599fd2)
    Signed-off-by: Rodrigo Vivi <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/i915: Disable port sync when bigjoiner is used

Author: Ville Syrjälä <[email protected]>
Date:   Fri Apr 5 00:34:27 2024 +0300

    drm/i915: Disable port sync when bigjoiner is used
    
    commit 0653d501409eeb9f1deb7e4c12e4d0d2c9f1cba1 upstream.
    
    The current modeset sequence can't handle port sync and bigjoiner
    at the same time. Refuse port sync when bigjoiner is needed,
    at least until we fix the modeset sequence.
    
    v2: Add a FIXME (Vandita)
    
    Cc: [email protected]
    Tested-by: Vidya Srinivas <[email protected]>
    Reviewed-by: Vandita Kulkarni <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Ville Syrjälä <[email protected]>
    (cherry picked from commit b37e1347b991459c38c56ec2476087854a4f720b)
    Signed-off-by: Rodrigo Vivi <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

geneve: fix header validation in geneve[6]_xmit_skb

Author: Eric Dumazet <[email protected]>
Date:   Fri Apr 5 10:30:34 2024 +0000

    geneve: fix header validation in geneve[6]_xmit_skb
    
    [ Upstream commit d8a6213d70accb403b82924a1c229e733433a5ef ]
    
    syzbot is able to trigger an uninit-value in geneve_xmit() [1]
    
    Problem: While most IP tunnel helpers (like ip_tunnel_get_dsfield())
    use skb_protocol(skb, true), pskb_inet_may_pull() is only using
    skb->protocol.
    
    If anything other than ETH_P_IPV6 or ETH_P_IP is found in skb->protocol,
    pskb_inet_may_pull() does nothing at all.
    
    If a vlan tag was provided by the caller (af_packet in the syzbot case),
    the network header might not point to the correct location, and skb
    linear part could be smaller than expected.
    
    Add skb_vlan_inet_prepare() to perform a complete mac validation.
    
    Use this in geneve for the moment, I suspect we need to adopt this
    more broadly.
    
    v4 - Jakub reported v3 broke l2_tos_ttl_inherit.sh selftest
       - Only call __vlan_get_protocol() for vlan types.
    Link: https://lore.kernel.org/netdev/[email protected]/
    
    v2,v3 - Addressed Sabrina comments on v1 and v2
    Link: https://lore.kernel.org/netdev/Zg1l9L2BNoZWZDZG@hog/
    
    [1]
    
    BUG: KMSAN: uninit-value in geneve_xmit_skb drivers/net/geneve.c:910 [inline]
     BUG: KMSAN: uninit-value in geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030
      geneve_xmit_skb drivers/net/geneve.c:910 [inline]
      geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030
      __netdev_start_xmit include/linux/netdevice.h:4903 [inline]
      netdev_start_xmit include/linux/netdevice.h:4917 [inline]
      xmit_one net/core/dev.c:3531 [inline]
      dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547
      __dev_queue_xmit+0x348d/0x52c0 net/core/dev.c:4335
      dev_queue_xmit include/linux/netdevice.h:3091 [inline]
      packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
      packet_snd net/packet/af_packet.c:3081 [inline]
      packet_sendmsg+0x8bb0/0x9ef0 net/packet/af_packet.c:3113
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      __sys_sendto+0x685/0x830 net/socket.c:2191
      __do_sys_sendto net/socket.c:2203 [inline]
      __se_sys_sendto net/socket.c:2199 [inline]
      __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Uninit was created at:
      slab_post_alloc_hook mm/slub.c:3804 [inline]
      slab_alloc_node mm/slub.c:3845 [inline]
      kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
      __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
      alloc_skb include/linux/skbuff.h:1318 [inline]
      alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
      sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
      packet_alloc_skb net/packet/af_packet.c:2930 [inline]
      packet_snd net/packet/af_packet.c:3024 [inline]
      packet_sendmsg+0x722d/0x9ef0 net/packet/af_packet.c:3113
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      __sys_sendto+0x685/0x830 net/socket.c:2191
      __do_sys_sendto net/socket.c:2203 [inline]
      __se_sys_sendto net/socket.c:2199 [inline]
      __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    CPU: 0 PID: 5033 Comm: syz-executor346 Not tainted 6.9.0-rc1-syzkaller-00005-g928a87efa423 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
    
    Fixes: d13f048dd40e ("net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb")
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/netdev/[email protected]/
    Signed-off-by: Eric Dumazet <[email protected]>
    Cc: Phillip Potter <[email protected]>
    Cc: Sabrina Dubroca <[email protected]>
    Reviewed-by: Sabrina Dubroca <[email protected]>
    Reviewed-by: Phillip Potter <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

io_uring/net: restore msg_control on sendzc retry [+ + +]

Author: Pavel Begunkov <[email protected]>
Date:   Mon Apr 8 18:11:09 2024 +0100

    io_uring/net: restore msg_control on sendzc retry
    
    commit 4fe82aedeb8a8cb09bfa60f55ab57b5c10a74ac4 upstream.
    
    cac9e4418f4cb ("io_uring/net: save msghdr->msg_control for retries")
    reinstates msg_control before every __sys_sendmsg_sock(), since the
    function can overwrite the value in msghdr. We need to do the same for
    zerocopy sendmsg.
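
    The save-and-restore pattern can be modelled in a few lines. This is a
    toy model only: MsgHdr, send_once() and send_with_retries() are
    hypothetical stand-ins, not io_uring code, and the callee is simply
    assumed to clobber the field on every call.

```python
from dataclasses import dataclass


@dataclass
class MsgHdr:
    msg_control: object = None


def send_once(msg, attempts_left):
    """Hypothetical stand-in for the send path: it consumes msg_control
    and reports whether the caller has to retry."""
    seen = msg.msg_control
    msg.msg_control = None            # the callee may overwrite the field
    return seen, attempts_left > 0    # (what was sent, need_retry)


def send_with_retries(msg, retries, restore):
    saved = msg.msg_control           # remember the caller's value once
    sent = []
    for left in range(retries, -1, -1):
        if restore:
            msg.msg_control = saved   # reinstate before every attempt
        seen, again = send_once(msg, left)
        sent.append(seen)
        if not again:
            break
    return sent


print(send_with_retries(MsgHdr("cmsg"), 2, restore=False))
print(send_with_retries(MsgHdr("cmsg"), 2, restore=True))
```

    Without the restore step, only the first attempt sees the caller's
    control data; every retry runs with whatever the previous attempt left
    behind.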
    
    Cc: [email protected]
    Fixes: 493108d95f146 ("io_uring/net: zerocopy sendmsg")
    Link: https://github.com/axboe/liburing/issues/1067
    Signed-off-by: Pavel Begunkov <[email protected]>
    Link: https://lore.kernel.org/r/cc1d5d9df0576fa66ddad4420d240a98a020b267.1712596179.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iommu/vt-d: Allocate local memory for page request queue [+ + +]

Author: Jacob Pan <[email protected]>
Date:   Thu Apr 11 11:07:43 2024 +0800

    iommu/vt-d: Allocate local memory for page request queue
    
    [ Upstream commit a34f3e20ddff02c4f12df2c0635367394e64c63d ]
    
    The page request queue is per IOMMU, its allocation should be made
    NUMA-aware for performance reasons.
    
    Fixes: a222a7f0bb6c ("iommu/vt-d: Implement page request handling")
    Signed-off-by: Jacob Pan <[email protected]>
    Reviewed-by: Kevin Tian <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Signed-off-by: Joerg Roedel <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ipv4/route: avoid unused-but-set-variable warning [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Mon Apr 8 09:42:03 2024 +0200

    ipv4/route: avoid unused-but-set-variable warning
    
    [ Upstream commit cf1b7201df59fb936f40f4a807433fe3f2ce310a ]
    
    The log_martians variable is only used in an #ifdef, causing a 'make W=1'
    warning with gcc:
    
    net/ipv4/route.c: In function 'ip_rt_send_redirect':
    net/ipv4/route.c:880:13: error: variable 'log_martians' set but not used [-Werror=unused-but-set-variable]
    
    Change the #ifdef to an equivalent IS_ENABLED() to let the compiler
    see where the variable is used.
    
    Fixes: 30038fc61adf ("net: ip_rt_send_redirect() optimization")
    Reviewed-by: David Ahern <[email protected]>
    Signed-off-by: Arnd Bergmann <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ipv6: fib: hide unused 'pn' variable [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Mon Apr 8 09:42:02 2024 +0200

    ipv6: fib: hide unused 'pn' variable
    
    [ Upstream commit 74043489fcb5e5ca4074133582b5b8011b67f9e7 ]
    
    When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing
    a 'make W=1' warning:
    
    net/ipv6/ip6_fib.c: In function 'fib6_add':
    net/ipv6/ip6_fib.c:1388:32: error: variable 'pn' set but not used [-Werror=unused-but-set-variable]
    
    Add another #ifdef around the variable declaration, matching the other
    uses in this file.
    
    Fixes: 66729e18df08 ("[IPV6] ROUTE: Make sure we have fn->leaf when adding a node on subtree.")
    Link: https://lore.kernel.org/netdev/[email protected]/
    Reviewed-by: David Ahern <[email protected]>
    Signed-off-by: Arnd Bergmann <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr [+ + +]

Author: Jiri Benc <[email protected]>
Date:   Mon Apr 8 16:18:21 2024 +0200

    ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
    
    [ Upstream commit 7633c4da919ad51164acbf1aa322cc1a3ead6129 ]
    
    Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
    still means hlist_for_each_entry_rcu can return an item that got removed
    from the list. The memory itself of such item is not freed thanks to RCU
    but nothing guarantees the actual content of the memory is sane.
    
    In particular, the reference count can be zero. This can happen if
    ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
    from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
    references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
    timing, this can happen:
    
    1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
    
    2. Then, the whole ipv6_del_addr is executed for the given entry. The
       reference count drops to zero and kfree_rcu is scheduled.
    
    3. ipv6_get_ifaddr continues and tries to increment the reference count
       (in6_ifa_hold).
    
    4. The rcu is unlocked and the entry is freed.
    
    5. The freed entry is returned.
    
    Prevent incrementing the reference count in such a case. The name
    in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
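
    The idea behind a "hold safe" helper is to take a reference only if the
    count has not already dropped to zero. A minimal single-threaded
    sketch, assuming a toy Refcount class (the kernel's refcount_t is
    atomic, this model is not) and hypothetical names Addr and lookup():

```python
class Refcount:
    """Toy counter; the real refcount_t is atomic, this one is not."""

    def __init__(self, value=1):
        self.value = value

    def put(self):
        self.value -= 1
        return self.value == 0      # True when the last reference is gone

    def inc_not_zero(self):
        if self.value == 0:
            return False            # object is already on its way out
        self.value += 1
        return True


class Addr:
    def __init__(self, name):
        self.name = name
        self.refcnt = Refcount(1)   # reference held by the hash list


def lookup(table, name):
    """Return the entry with a new reference, or None if it is dying."""
    entry = table.get(name)
    if entry is not None and entry.refcnt.inc_not_zero():
        return entry
    return None


table = {"fe80::1": Addr("fe80::1")}
live = lookup(table, "fe80::1")      # refcount goes 1 -> 2

stale = Addr("fe80::2")
stale.refcnt.put()                   # a concurrent delete dropped the last ref
table["fe80::2"] = stale             # a reader may still reach it under RCU
dying = lookup(table, "fe80::2")     # refused instead of resurrected

print(live.refcnt.value, dying)
```

    With an unconditional increment, the second lookup would hand out an
    entry whose count went from zero back to one, which is the
    "addition on 0" case reported in the warning below.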
    
    [   41.506330] refcount_t: addition on 0; use-after-free.
    [   41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130
    [   41.507413] Modules linked in: veth bridge stp llc
    [   41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14
    [   41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
    [   41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130
    [   41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff
    [   41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282
    [   41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000
    [   41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900
    [   41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff
    [   41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000
    [   41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48
    [   41.514086] FS:  00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000
    [   41.514726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0
    [   41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [   41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [   41.516799] Call Trace:
    [   41.517037]  <TASK>
    [   41.517249]  ? __warn+0x7b/0x120
    [   41.517535]  ? refcount_warn_saturate+0xa5/0x130
    [   41.517923]  ? report_bug+0x164/0x190
    [   41.518240]  ? handle_bug+0x3d/0x70
    [   41.518541]  ? exc_invalid_op+0x17/0x70
    [   41.520972]  ? asm_exc_invalid_op+0x1a/0x20
    [   41.521325]  ? refcount_warn_saturate+0xa5/0x130
    [   41.521708]  ipv6_get_ifaddr+0xda/0xe0
    [   41.522035]  inet6_rtm_getaddr+0x342/0x3f0
    [   41.522376]  ? __pfx_inet6_rtm_getaddr+0x10/0x10
    [   41.522758]  rtnetlink_rcv_msg+0x334/0x3d0
    [   41.523102]  ? netlink_unicast+0x30f/0x390
    [   41.523445]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
    [   41.523832]  netlink_rcv_skb+0x53/0x100
    [   41.524157]  netlink_unicast+0x23b/0x390
    [   41.524484]  netlink_sendmsg+0x1f2/0x440
    [   41.524826]  __sys_sendto+0x1d8/0x1f0
    [   41.525145]  __x64_sys_sendto+0x1f/0x30
    [   41.525467]  do_syscall_64+0xa5/0x1b0
    [   41.525794]  entry_SYSCALL_64_after_hwframe+0x72/0x7a
    [   41.526213] RIP: 0033:0x7fbc4cfcea9a
    [   41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
    [   41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    [   41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a
    [   41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005
    [   41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c
    [   41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    [   41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b
    [   41.531573]  </TASK>
    
    Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU")
    Reviewed-by: Eric Dumazet <[email protected]>
    Reviewed-by: David Ahern <[email protected]>
    Signed-off-by: Jiri Benc <[email protected]>
    Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.com
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

irqflags: Explicitly ignore lockdep_hrtimer_exit() argument [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Mon Apr 8 09:46:01 2024 +0200

    irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
    
    commit c1d11fc2c8320871b40730991071dd0a0b405bc8 upstream.
    
    When building with 'make W=1' but CONFIG_TRACE_IRQFLAGS=n, the
    unused argument to lockdep_hrtimer_exit() causes a warning:
    
    kernel/time/hrtimer.c:1655:14: error: variable 'expires_in_hardirq' set but not used [-Werror=unused-but-set-variable]
    
    This is intentional behavior, so add a cast to void to shut up the warning.
    
    Fixes: 73d20564e0dc ("hrtimer: Don't dereference the hrtimer pointer after the callback")
    Reported-by: kernel test robot <[email protected]>
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

kprobes: Fix possible use-after-free issue on kprobe registration [+ + +]

Author: Zheng Yejian <[email protected]>
Date:   Wed Apr 10 09:58:02 2024 +0800

    kprobes: Fix possible use-after-free issue on kprobe registration
    
    commit 325f3fb551f8cd672dbbfc4cf58b14f9ee3fc9e8 upstream.
    
    When unloading a module, its state changes MODULE_STATE_LIVE ->
    MODULE_STATE_GOING -> MODULE_STATE_UNFORMED. Each change takes
    some time. `is_module_text_address()` and `__module_text_address()`
    work with MODULE_STATE_LIVE and MODULE_STATE_GOING.
    If we use `is_module_text_address()` and `__module_text_address()`
    separately, there is a chance that the first one succeeds but the
    next one fails because module->state becomes MODULE_STATE_UNFORMED
    between those operations.
    
    In `check_kprobe_address_safe()`, if the second `__module_text_address()`
    fails, that is ignored because a kernel text address was expected.
    But it may have failed simply because module->state has been changed
    to MODULE_STATE_UNFORMED. In this case, arm_kprobe() will try to modify
    a nonexistent module text address (use-after-free).
    
    To fix this problem, we should not use `is_module_text_address()`
    and `__module_text_address()` separately, but call
    `__module_text_address()` only once and then do `try_module_get(module)`,
    which only succeeds in MODULE_STATE_LIVE.
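
    The difference between the two approaches can be shown with a small
    model. Everything here is a simplified assumption: Module,
    module_text_address(), check_racy() and check_once() are hypothetical
    Python stand-ins, and the `between` callback only simulates an unload
    progressing between two lookups.

```python
LIVE, GOING, UNFORMED = "live", "going", "unformed"


class Module:
    def __init__(self, start, end):
        self.start, self.end, self.state = start, end, LIVE
        self.refs = 0

    def try_get(self):
        if self.state != LIVE:
            return False             # only live modules can be pinned
        self.refs += 1
        return True


def module_text_address(modules, addr):
    """Find the module owning addr; unformed modules are invisible."""
    for mod in modules:
        if mod.state != UNFORMED and mod.start <= addr < mod.end:
            return mod
    return None


def check_racy(modules, addr, between=lambda: None):
    if module_text_address(modules, addr) is None:   # first lookup
        return "not-module"
    between()                                        # unload progresses here
    mod = module_text_address(modules, addr)         # second lookup
    if mod is None:
        return "not-module"      # wrong: the address did belong to a module
    return "module" if mod.try_get() else "rejected"


def check_once(modules, addr):
    mod = module_text_address(modules, addr)         # single lookup
    if mod is None:
        return "not-module"
    return "module" if mod.try_get() else "rejected"


victim = Module(0x1000, 0x2000)


def unload():
    victim.state = UNFORMED


going = Module(0x1000, 0x2000)
going.state = GOING

print(check_racy([victim], 0x1800, between=unload))
print(check_once([going], 0x1800))
```

    In the racy variant the two lookups disagree and the address is
    misclassified; in the single-lookup variant the result is either a
    pinned live module or an explicit rejection.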
    
    Link: https://lore.kernel.org/all/[email protected]/
    
    Fixes: 28f6c37a2910 ("kprobes: Forbid probing on trampoline and BPF code areas")
    Cc: [email protected]
    Signed-off-by: Zheng Yejian <[email protected]>
    Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Linux: Linux 6.1.87 [+ + +]

Author: Greg Kroah-Hartman <[email protected]>
Date:   Wed Apr 17 11:18:29 2024 +0200

    Linux 6.1.87
    
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: Florian Fainelli <[email protected]>
    Tested-by: Pavel Machek (CIP) <[email protected]>
    Tested-by: Kelsey Steele <[email protected]>
    Tested-by: Mark Brown <[email protected]>
    Tested-by: Ron Economos <[email protected]>
    Tested-by: Yann Sionneau <[email protected]>
    Tested-by: Jon Hunter <[email protected]>
    Tested-by: Mateusz Jończyk <[email protected]>
    Tested-by: Linux Kernel Functional Testing <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

media: cec: core: remove length check of Timer Status [+ + +]

Author: Nini Song <[email protected]>
Date:   Thu Jan 25 21:28:45 2024 +0800

    media: cec: core: remove length check of Timer Status
    
    commit ce5d241c3ad4568c12842168288993234345c0eb upstream.
    
    The valid_la is used to check the length requirements,
    including the special cases of Timer Status. If the length is
    shorter than 5, meaning that no Duration Available is returned,
    the message is forced to be invalid.
    
    However, the spec describes Duration Available as a parameter
    that may be returned in these cases, i.e. returning it is
    optional. The wording in the spec leaves the choice open.
    
    Remove the special length check of Timer Status to match the
    spec, which does not make the parameter compulsory.
    
    Signed-off-by: Nini Song <[email protected]>
    Signed-off-by: Hans Verkuil <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/mlx5: Properly link new fs rules into the tree [+ + +]

Author: Cosmin Ratiu <[email protected]>
Date:   Tue Apr 9 22:08:12 2024 +0300

    net/mlx5: Properly link new fs rules into the tree
    
    [ Upstream commit 7c6782ad4911cbee874e85630226ed389ff2e453 ]
    
    Previously, add_rule_fg would only add newly created rules from the
    handle into the tree when they had a refcount of 1. On the other hand,
    create_flow_handle tries hard to find and reference already existing
    identical rules instead of creating new ones.
    
    These two behaviors can result in a situation where create_flow_handle
    1) creates a new rule and references it, then
    2) in a subsequent step during the same handle creation references it
       again,
    resulting in a rule with a refcount of 2 that is not linked into the
    tree, will have a NULL parent and root and will result in a crash when
    the flow group is deleted because del_sw_hw_rule, invoked on rule
    deletion, assumes node->parent is != NULL.
    
    This happened in the wild, due to another bug related to incorrect
    handling of duplicate pkt_reformat ids, which led to the code in
    create_flow_handle incorrectly referencing a just-added rule in the same
    flow handle, resulting in the problem described above. Full details are
    at [1].
    
    This patch changes add_rule_fg to add new rules without parents into
    the tree, properly initializing them and avoiding the crash. This makes
    it more consistent with how rules are added to an FTE in
    create_flow_handle.
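
    A stripped-down model of the two linking tests, assuming hypothetical
    names (Rule, create_handle(), link_by_refcount(), link_by_parent()) and
    ignoring everything else the driver does:

```python
class Rule:
    def __init__(self, dest):
        self.dest = dest
        self.refcount = 0
        self.parent = None


def create_handle(dests, existing):
    """Collect one rule per destination, reusing rules for repeated dests."""
    handle = []
    for dest in dests:
        rule = existing.get(dest)
        if rule is None:
            rule = existing[dest] = Rule(dest)
        rule.refcount += 1
        handle.append(rule)
    return handle


def link_by_refcount(handle, fte):
    for rule in handle:
        if rule.refcount == 1:          # old test: "refcount 1 means new"
            rule.parent = fte


def link_by_parent(handle, fte):
    for rule in handle:
        if rule.parent is None:         # new test: "no parent means unlinked"
            rule.parent = fte


# The same destination appears twice within one handle creation.
old = create_handle(["port1", "port1"], {})
link_by_refcount(old, "fte0")

new = create_handle(["port1", "port1"], {})
link_by_parent(new, "fte0")

print(old[0].refcount, old[0].parent)
print(new[0].refcount, new[0].parent)
```

    With the refcount test, the doubly referenced rule is never linked and
    keeps a missing parent; with the parent test it is linked on first
    sight regardless of how many references it has collected.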
    
    Fixes: 74491de93712 ("net/mlx5: Add multi dest support")
    Link: https://lore.kernel.org/netdev/[email protected]/T/#u [1]
    Signed-off-by: Cosmin Ratiu <[email protected]>
    Reviewed-by: Tariq Toukan <[email protected]>
    Reviewed-by: Mark Bloch <[email protected]>
    Signed-off-by: Saeed Mahameed <[email protected]>
    Signed-off-by: Tariq Toukan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net/mlx5e: Fix mlx5e_priv_init() cleanup flow [+ + +]

Author: Carolina Jubran <[email protected]>
Date:   Tue Apr 9 22:08:15 2024 +0300

    net/mlx5e: Fix mlx5e_priv_init() cleanup flow
    
    [ Upstream commit ecb829459a841198e142f72fadab56424ae96519 ]
    
    When mlx5e_priv_init() fails, the cleanup flow calls mlx5e_selq_cleanup(),
    which calls mlx5e_selq_apply(), which in turn asserts via lockdep_is_held()
    that `priv->state_lock` is held.
    
    Acquire the state_lock in mlx5e_selq_cleanup().
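
    The shape of the fix can be sketched as follows. This is an assumption-
    laden model: Priv, selq_apply() and the two cleanup functions are
    hypothetical, and threading.Lock.locked() only reports that somebody
    holds the lock, which is weaker than a lockdep ownership check but
    sufficient for a single-threaded illustration.

```python
import threading


class Priv:
    def __init__(self):
        self.state_lock = threading.Lock()
        self.selq = "active"


def selq_apply(priv, value):
    # Stand-in for the lockdep check: the caller must hold state_lock.
    if not priv.state_lock.locked():
        raise RuntimeError("state_lock not held")
    priv.selq = value


def selq_cleanup_unlocked(priv):
    selq_apply(priv, None)              # trips the check


def selq_cleanup(priv):
    with priv.state_lock:               # take the lock around the update
        selq_apply(priv, None)


priv = Priv()
selq_cleanup(priv)
print(priv.selq, priv.state_lock.locked())
```

    Calling selq_cleanup_unlocked() on a fresh Priv raises RuntimeError,
    the analogue of the "suspicious RCU usage" splat in the log below.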
    
    Kernel log:
    =============================
    WARNING: suspicious RCU usage
    6.8.0-rc3_net_next_841a9b5 #1 Not tainted
    -----------------------------
    drivers/net/ethernet/mellanox/mlx5/core/en/selq.c:124 suspicious rcu_dereference_protected() usage!
    
    other info that might help us debug this:
    
    rcu_scheduler_active = 2, debug_locks = 1
    2 locks held by systemd-modules/293:
     #0: ffffffffa05067b0 (devices_rwsem){++++}-{3:3}, at: ib_register_client+0x109/0x1b0 [ib_core]
     #1: ffff8881096c65c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x104/0x1c0 [ib_core]
    
    stack backtrace:
    CPU: 4 PID: 293 Comm: systemd-modules Not tainted 6.8.0-rc3_net_next_841a9b5 #1
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl+0x8a/0xa0
     lockdep_rcu_suspicious+0x154/0x1a0
     mlx5e_selq_apply+0x94/0xa0 [mlx5_core]
     mlx5e_selq_cleanup+0x3a/0x60 [mlx5_core]
     mlx5e_priv_init+0x2be/0x2f0 [mlx5_core]
     mlx5_rdma_setup_rn+0x7c/0x1a0 [mlx5_core]
     rdma_init_netdev+0x4e/0x80 [ib_core]
     ? mlx5_rdma_netdev_free+0x70/0x70 [mlx5_core]
     ipoib_intf_init+0x64/0x550 [ib_ipoib]
     ipoib_intf_alloc+0x4e/0xc0 [ib_ipoib]
     ipoib_add_one+0xb0/0x360 [ib_ipoib]
     add_client_context+0x112/0x1c0 [ib_core]
     ib_register_client+0x166/0x1b0 [ib_core]
     ? 0xffffffffa0573000
     ipoib_init_module+0xeb/0x1a0 [ib_ipoib]
     do_one_initcall+0x61/0x250
     do_init_module+0x8a/0x270
     init_module_from_file+0x8b/0xd0
     idempotent_init_module+0x17d/0x230
     __x64_sys_finit_module+0x61/0xb0
     do_syscall_64+0x71/0x140
     entry_SYSCALL_64_after_hwframe+0x46/0x4e
     </TASK>
    
    Fixes: 8bf30be75069 ("net/mlx5e: Introduce select queue parameters")
    Signed-off-by: Carolina Jubran <[email protected]>
    Reviewed-by: Tariq Toukan <[email protected]>
    Reviewed-by: Dragos Tatulea <[email protected]>
    Signed-off-by: Saeed Mahameed <[email protected]>
    Signed-off-by: Tariq Toukan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net/mlx5e: HTB, Fix inconsistencies with QoS SQs number [+ + +]

Author: Carolina Jubran <[email protected]>
Date:   Tue Apr 9 22:08:16 2024 +0300

    net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
    
    [ Upstream commit 2f436f1869771d46e1a9f85738d5a1a7c5653a4e ]
    
    When creating a new HTB class while the interface is down,
    the variable that tracks the number of QoS SQs (htb_max_qos_sqs)
    may not be consistent with the number of HTB classes.
    
    Previously, we compared these two values to ensure that
    the node_qid is lower than the number of QoS SQs, and we
    allocated stats for that SQ when they are equal.
    
    Change the check to compare the node_qid with the current
    number of leaf nodes and fix the checking conditions to
    ensure allocation of stats_list and stats for each node.
    
    Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
    Signed-off-by: Carolina Jubran <[email protected]>
    Reviewed-by: Tariq Toukan <[email protected]>
    Reviewed-by: Dragos Tatulea <[email protected]>
    Signed-off-by: Saeed Mahameed <[email protected]>
    Signed-off-by: Tariq Toukan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: dsa: mt7530: trap link-local frames regardless of ST Port State [+ + +]

Author: Arınç ÜNAL <[email protected]>
Date:   Tue Apr 9 18:01:14 2024 +0300

    net: dsa: mt7530: trap link-local frames regardless of ST Port State
    
    [ Upstream commit 17c560113231ddc20088553c7b499b289b664311 ]
    
    In Clause 5 of IEEE Std 802-2014, two sublayers of the data link layer
    (DLL) of the Open Systems Interconnection basic reference model (OSI/RM)
    are described; the medium access control (MAC) and logical link control
    (LLC) sublayers. The MAC sublayer is the one facing the physical layer.
    
    In 8.2 of IEEE Std 802.1Q-2022, the Bridge architecture is described. A
    Bridge component comprises a MAC Relay Entity for interconnecting the Ports
    of the Bridge, at least two Ports, and higher layer entities with at least
    a Spanning Tree Protocol Entity included.
    
    Each Bridge Port also functions as an end station and shall provide the MAC
    Service to an LLC Entity. Each instance of the MAC Service is provided to a
    distinct LLC Entity that supports protocol identification, multiplexing,
    and demultiplexing, for protocol data unit (PDU) transmission and reception
    by one or more higher layer entities.
    
    It is described in 8.13.9 of IEEE Std 802.1Q-2022 that in a Bridge, the LLC
    Entity associated with each Bridge Port is modeled as being directly
    connected to the attached Local Area Network (LAN).
    
    On the switch with CPU port architecture, CPU port functions as Management
    Port, and the Management Port functionality is provided by software which
    functions as an end station. Software is connected to an IEEE 802 LAN that
    is wholly contained within the system that incorporates the Bridge.
    Software provides access to the LLC Entity associated with each Bridge Port
    by the value of the source port field on the special tag on the frame
    received by software.
    
    We call frames that carry control information to determine the active
    topology and current extent of each Virtual Local Area Network (VLAN),
    i.e., spanning tree or Shortest Path Bridging (SPB) and Multiple VLAN
    Registration Protocol Data Units (MVRPDUs), and frames from other link
    constrained protocols, such as Extensible Authentication Protocol over LAN
    (EAPOL) and Link Layer Discovery Protocol (LLDP), link-local frames. They
    are not forwarded by a Bridge. Permanently configured entries in the
    filtering database (FDB) ensure that such frames are discarded by the
    Forwarding Process. In 8.6.3 of IEEE Std 802.1Q-2022, this is described in
    detail:
    
    Each of the reserved MAC addresses specified in Table 8-1
    (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]) shall be
    permanently configured in the FDB in C-VLAN components and ERs.
    
    Each of the reserved MAC addresses specified in Table 8-2
    (01-80-C2-00-00-[01,02,03,04,05,06,07,08,09,0A,0E]) shall be permanently
    configured in the FDB in S-VLAN components.
    
    Each of the reserved MAC addresses specified in Table 8-3
    (01-80-C2-00-00-[01,02,04,0E]) shall be permanently configured in the FDB
    in TPMR components.
    
    The FDB entries for reserved MAC addresses shall specify filtering for all
    Bridge Ports and all VIDs. Management shall not provide the capability to
    modify or remove entries for reserved MAC addresses.
    
    The addresses in Table 8-1, Table 8-2, and Table 8-3 determine the scope of
    propagation of PDUs within a Bridged Network, as follows:
    
      The Nearest Bridge group address (01-80-C2-00-00-0E) is an address that
      no conformant Two-Port MAC Relay (TPMR) component, Service VLAN (S-VLAN)
      component, Customer VLAN (C-VLAN) component, or MAC Bridge can forward.
      PDUs transmitted using this destination address, or any other addresses
      that appear in Table 8-1, Table 8-2, and Table 8-3
      (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]), can
      therefore travel no further than those stations that can be reached via a
      single individual LAN from the originating station.
    
      The Nearest non-TPMR Bridge group address (01-80-C2-00-00-03), is an
      address that no conformant S-VLAN component, C-VLAN component, or MAC
      Bridge can forward; however, this address is relayed by a TPMR component.
      PDUs using this destination address, or any of the other addresses that
      appear in both Table 8-1 and Table 8-2 but not in Table 8-3
      (01-80-C2-00-00-[00,03,05,06,07,08,09,0A,0B,0C,0D,0F]), will be relayed
      by any TPMRs but will propagate no further than the nearest S-VLAN
      component, C-VLAN component, or MAC Bridge.
    
      The Nearest Customer Bridge group address (01-80-C2-00-00-00) is an
      address that no conformant C-VLAN component, MAC Bridge can forward;
      however, it is relayed by TPMR components and S-VLAN components. PDUs
      using this destination address, or any of the other addresses that appear
      in Table 8-1 but not in either Table 8-2 or Table 8-3
      (01-80-C2-00-00-[00,0B,0C,0D,0F]), will be relayed by TPMR components and
      S-VLAN components but will propagate no further than the nearest C-VLAN
      component or MAC Bridge.
    
    Because the LLC Entity associated with each Bridge Port is provided via CPU
    port, we must not filter these frames but forward them to CPU port.
    
    In a Bridge, the transmission Port is mainly decided by ingress and egress
    rules, FDB, and spanning tree Port State functions of the Forwarding
    Process. For link-local frames, only CPU port should be designated as
    destination port in the FDB, and the other functions of the Forwarding
    Process must not interfere with the decision of the transmission Port. We
    call this process trapping frames to CPU port.
    
    Therefore, on the switch with CPU port architecture, link-local frames must
    be trapped to CPU port, and certain link-local frames received by a Port of
    a Bridge comprising a TPMR component or an S-VLAN component must be
    excluded from it.
    
    A Bridge of the switch with CPU port architecture cannot comprise a
    Two-Port MAC Relay (TPMR) component as a TPMR component supports only a
    subset of the functionality of a MAC Bridge. A Bridge comprising two Ports
    (Management Port doesn't count) of this architecture will either function
    as a standard MAC Bridge or a standard VLAN Bridge.
    
    Therefore, a Bridge of this architecture can only comprise S-VLAN
    components, C-VLAN components, or MAC Bridge components. Since there's no
    TPMR component, we don't need to relay PDUs using the destination addresses
    specified in the Nearest non-TPMR section, and the portion of the
    Nearest Customer Bridge section where they must be relayed by TPMR
    components.
    
    One option to trap link-local frames to CPU port is to add static FDB
    entries with CPU port designated as destination port. However, because
    Independent VLAN Learning (IVL) is being used on every VID, each entry only
    applies to a single VLAN Identifier (VID). For a Bridge comprising a MAC
    Bridge component or a C-VLAN component, there would have to be 16 times
    4096 entries. This switch intellectual property can only hold a maximum of
    2048 entries. Using this option, there also isn't a mechanism to prevent
    link-local frames from being discarded when the spanning tree Port State of
    the reception Port is discarding.
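
    The entry count quoted above works out as follows; the constant names
    are only labels for the figures given in the text.

```python
RESERVED_ADDRESSES = 16      # 01-80-C2-00-00-00 .. 01-80-C2-00-00-0F
VIDS = 4096                  # one FDB entry per VID under IVL
FDB_CAPACITY = 2048          # maximum entries quoted above

entries_needed = RESERVED_ADDRESSES * VIDS
shortfall_factor = entries_needed // FDB_CAPACITY

print(entries_needed, shortfall_factor)
```

    The static-entry approach would need 65536 entries, 32 times what the
    table can hold.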
    
    The remaining option is to utilise the BPC, RGAC1, RGAC2, RGAC3, and RGAC4
    registers. Whilst this applies to every VID, it doesn't contain all of the
    reserved MAC addresses without affecting the remaining Standard Group MAC
    Addresses. The REV_UN frame tag utilised using the RGAC4 register covers
    the remaining 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] destination
    addresses. It also includes the 01-80-C2-00-00-22 to 01-80-C2-00-00-FF
    destination addresses which may be relayed by MAC Bridges or VLAN Bridges.
    The latter option provides better but not complete conformance.
    
    This switch intellectual property also does not provide a mechanism to trap
    link-local frames with specific destination addresses to CPU port by
    Bridge, to conform to the filtering rules for the distinct Bridge
    components.
    
    Therefore, regardless of the type of the Bridge component, link-local
    frames with these destination addresses will be trapped to CPU port:
    
    01-80-C2-00-00-[00,01,02,03,0E]
    
    In a Bridge comprising a MAC Bridge component or a C-VLAN component:
    
      Link-local frames with these destination addresses won't be trapped to
      CPU port, which does not conform to IEEE Std 802.1Q-2022:
    
      01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F]
    
    In a Bridge comprising an S-VLAN component:
    
      Link-local frames with these destination addresses will be trapped to CPU
      port, which does not conform to IEEE Std 802.1Q-2022:
    
      01-80-C2-00-00-00
    
      Link-local frames with these destination addresses won't be trapped to
      CPU port, which does not conform to IEEE Std 802.1Q-2022:
    
      01-80-C2-00-00-[04,05,06,07,08,09,0A]
    
    Currently on this switch intellectual property, if the spanning tree Port
    State of the reception Port is discarding, link-local frames will be
    discarded.
    
    To trap link-local frames regardless of the spanning tree Port State, make
    the switch regard them as Bridge Protocol Data Units (BPDUs). This switch
    intellectual property only lets the frames regarded as BPDUs bypass the
    spanning tree Port State function of the Forwarding Process.
    
    With this change, the only remaining interference is the ingress rules.
    When the reception Port has no PVID assigned on software, VLAN-untagged
    frames won't be allowed in. There doesn't seem to be a mechanism on the
    switch intellectual property to have link-local frames bypass this function
    of the Forwarding Process.
    
    Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
    Reviewed-by: Daniel Golle <[email protected]>
    Signed-off-by: Arınç ÜNAL <[email protected]>
    Link: https://lore.kernel.org/r/20240409-b4-for-net-mt7530-fix-link-local-when-stp-discarding-v2-1-07b1150164ac@arinc9.com
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
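
To make the trap rule in the commit above concrete, here is a minimal userspace sketch that classifies a destination MAC address against the set the commit says is trapped to the CPU port regardless of Bridge component type, 01-80-C2-00-00-[00,01,02,03,0E]. The helper name toy_is_trapped_link_local() is hypothetical and is not part of the mt7530 driver; the real switch performs this match in hardware through its registers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* First five octets shared by the reserved 01-80-C2-00-00-xx block. */
static const uint8_t toy_link_local_prefix[5] = { 0x01, 0x80, 0xC2, 0x00, 0x00 };

/*
 * Hypothetical helper (not driver code): returns true if dst is one of
 * 01-80-C2-00-00-[00,01,02,03,0E], the addresses the commit lists as trapped.
 */
bool toy_is_trapped_link_local(const uint8_t dst[6])
{
    if (memcmp(dst, toy_link_local_prefix, sizeof(toy_link_local_prefix)) != 0)
        return false;

    switch (dst[5]) {
    case 0x00:
    case 0x01:
    case 0x02:
    case 0x03:
    case 0x0E:
        return true;
    default:
        return false;
    }
}
```

With this sketch, 01-80-C2-00-00-0E is reported as trapped, while 01-80-C2-00-00-04 is not, matching the non-conformance the commit describes for MAC Bridge and C-VLAN components.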

net: ena: Fix incorrect descriptor free behavior

Author: David Arinzon <[email protected]>
Date:   Wed Apr 10 09:13:57 2024 +0000

    net: ena: Fix incorrect descriptor free behavior
    
    [ Upstream commit bf02d9fe00632d22fa91d34749c7aacf397b6cde ]
    
    ENA has two types of TX queues:
    - queues which only process TX packets arriving from the network stack
    - queues which only process TX packets forwarded to it by XDP_REDIRECT
      or XDP_TX instructions
    
    The ena_free_tx_bufs() cycles through all descriptors in a TX queue
    and unmaps + frees every descriptor that hasn't been acknowledged yet
    by the device (uncompleted TX transactions).
    The function assumes that the processed TX queue is necessarily from
    the first category listed above and ends up using napi_consume_skb()
    for descriptors belonging to an XDP specific queue.
    
    This patch solves a bug in which, in case of a VF reset, the
    descriptors aren't freed correctly, leading to crashes.
    
    Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
    Signed-off-by: Shay Agroskin <[email protected]>
    Signed-off-by: David Arinzon <[email protected]>
    Reviewed-by: Shannon Nelson <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: ena: Fix potential sign extension issue

Author: David Arinzon <[email protected]>
Date:   Wed Apr 10 09:13:55 2024 +0000

    net: ena: Fix potential sign extension issue
    
    [ Upstream commit 713a85195aad25d8a26786a37b674e3e5ec09e3c ]
    
    Small unsigned types are promoted to larger signed types in
    the case of multiplication, the result of which may overflow.
    In case the result of such a multiplication has its MSB
    turned on, it will be sign extended with '1's.
    This changes the multiplication result.
    
    Code example of the phenomenon:
    -------------------------------
    u16 x, y;
    size_t z1, z2;
    
    x = y = 0xffff;
    printk("x=%x y=%x\n",x,y);
    
    z1 = x*y;
    z2 = (size_t)x*y;
    
    printk("z1=%lx z2=%lx\n", z1, z2);
    
    Output:
    -------
    x=ffff y=ffff
    z1=fffffffffffe0001 z2=fffe0001
    
    The expected result of ffff*ffff is fffe0001, and without the
    explicit casting to avoid the unwanted sign extension we got
    fffffffffffe0001.
    
    This commit adds an explicit casting to avoid the sign extension
    issue.
    
    Fixes: 689b2bdaaa14 ("net: ena: add functions for handling Low Latency Queues in ena_com")
    Signed-off-by: Arthur Kiyanovski <[email protected]>
    Signed-off-by: David Arinzon <[email protected]>
    Reviewed-by: Shannon Nelson <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
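
The effect described in the commit above can be approximated in userspace. In ISO C, evaluating x*y on two 16-bit values of 0xffff overflows int, which is undefined behaviour outside the kernel's build flags, so this sketch does not evaluate that expression. Instead it shows the two halves separately: the widened multiplication used by the fix, and what happens to a 32-bit pattern with its MSB set once it passes through a signed type. The function names are hypothetical, uint64_t stands in for size_t on a 64-bit build, and the unsigned-to-signed conversion is implementation-defined (it wraps on GCC and Clang).

```c
#include <stdint.h>

/* Multiplies in 64 bits, as the explicit cast in the fix does. */
uint64_t toy_mul_widened(uint16_t x, uint16_t y)
{
    return (uint64_t)x * y;
}

/*
 * Widens a 32-bit pattern after it has been held in a signed 32-bit
 * type, which is where the unwanted sign extension comes from.
 */
uint64_t toy_widen_via_int32(uint32_t bits)
{
    int32_t as_signed = (int32_t)bits; /* implementation-defined conversion */

    return (uint64_t)(int64_t)as_signed;
}
```

Passing 0xfffe0001 through toy_widen_via_int32() yields the sign-extended value from the commit message, while toy_mul_widened(0xffff, 0xffff) yields the expected fffe0001.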

net: ena: Wrong missing IO completions check order

Author: David Arinzon <[email protected]>
Date:   Wed Apr 10 09:13:56 2024 +0000

    net: ena: Wrong missing IO completions check order
    
    [ Upstream commit f7e417180665234fdb7af2ebe33d89aaa434d16f ]
    
    Missing IO completions check is called every second (HZ jiffies).
    This commit fixes several issues with this check:
    
    1. Duplicate queues check:
       Max of 4 queues are scanned on each check due to monitor budget.
       Once reaching the budget, this check exits under the assumption that
       the next check will continue to scan the remainder of the queues,
       but in practice, the next check first rescans the last already scanned
       queue, which is unnecessary and may cause the full queue scan to
       last a couple of seconds longer.
       The fix is to start every check with the next queue to scan.
       For example, on 8 IO queues:
       Bug: [0,1,2,3], [3,4,5,6], [6,7]
       Fix: [0,1,2,3], [4,5,6,7]
    
    2. Unbalanced queues check:
       In case the number of active IO queues is not a multiple of budget,
       there will be checks which don't utilize the full budget
       because the full scan exits when reaching the last queue id.
       The fix is to run every TX completion check with exact queue budget
       regardless of the queue id.
       For example, on 7 IO queues:
       Bug: [0,1,2,3], [4,5,6], [0,1,2,3]
       Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4]
       The budget may be lowered in case the number of IO queues is less
       than the budget (4) to make sure there are no duplicate queues on
       the same check.
       For example, on 3 IO queues:
       Bug: [0,1,2,0], [1,2,0,1]
       Fix: [0,1,2], [0,1,2]
    
    Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
    Signed-off-by: Amit Bernstein <[email protected]>
    Signed-off-by: David Arinzon <[email protected]>
    Reviewed-by: Shannon Nelson <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
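
The fixed scan order from the commit above can be modelled with a small round-robin helper. This is only a sketch of the behaviour the commit message describes, not the ENA driver code: toy_scan_queues() and its parameters are hypothetical names. It resumes at the next unscanned queue, always spends the full budget, and lowers the budget when there are fewer queues than the budget.

```c
#include <stddef.h>

/*
 * Hypothetical model of one check. Writes the visited queue ids to out[]
 * (which must hold at least `budget` entries) and returns how many were
 * written. *next is advanced so the following check resumes where this
 * one stopped.
 */
size_t toy_scan_queues(unsigned int num_queues, unsigned int budget,
                       unsigned int *next, unsigned int *out)
{
    size_t n = 0;

    if (num_queues == 0)
        return 0;
    if (budget > num_queues)
        budget = num_queues; /* never visit a queue twice in one check */

    while (n < budget) {
        out[n++] = *next;
        *next = (*next + 1) % num_queues; /* wrap around past the last id */
    }
    return n;
}
```

Starting from queue 0 with a budget of 4, three successive calls on 7 queues visit [0,1,2,3], [4,5,6,0] and [1,2,3,4], and calls on 3 queues visit [0,1,2] each time, as in the commit's examples.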

net: ks8851: Handle softirqs at the end of IRQ thread to fix hang

Author: Marek Vasut <[email protected]>
Date:   Fri Apr 5 22:30:40 2024 +0200

    net: ks8851: Handle softirqs at the end of IRQ thread to fix hang
    
    [ Upstream commit be0384bf599cf1eb8d337517feeb732d71f75a6f ]
    
    The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
    any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
    implementation is guarded by local_bh_disable() and local_bh_enable().
    The local_bh_enable() may call do_softirq() to run softirqs in case
    any are pending. One of the softirqs is net_rx_action, which ultimately
    reaches the driver .start_xmit callback. If that happens, the system
    hangs. The entire call chain is below:
    
    ks8851_start_xmit_par from netdev_start_xmit
    netdev_start_xmit from dev_hard_start_xmit
    dev_hard_start_xmit from sch_direct_xmit
    sch_direct_xmit from __dev_queue_xmit
    __dev_queue_xmit from __neigh_update
    __neigh_update from neigh_update
    neigh_update from arp_process.constprop.0
    arp_process.constprop.0 from __netif_receive_skb_one_core
    __netif_receive_skb_one_core from process_backlog
    process_backlog from __napi_poll.constprop.0
    __napi_poll.constprop.0 from net_rx_action
    net_rx_action from __do_softirq
    __do_softirq from call_with_stack
    call_with_stack from do_softirq
    do_softirq from __local_bh_enable_ip
    __local_bh_enable_ip from netif_rx
    netif_rx from ks8851_irq
    ks8851_irq from irq_thread_fn
    irq_thread_fn from irq_thread
    irq_thread from kthread
    kthread from ret_from_fork
    
    The hang happens because ks8851_irq() first takes a spinlock
    (ks8851_lock_par() in ks8851_par.c calls
    spin_lock_irqsave(&ksp->lock, ...)) and, with that spinlock held, calls
    netif_rx(). Once the execution reaches ks8851_start_xmit_par(), it calls
    ks8851_lock_par() again, which attempts to claim the already held
    spinlock, and the hang happens.
    
    Move the do_softirq() call outside of the spinlock protected section
    of ks8851_irq() by disabling BHs around the entire spinlock protected
    section of ks8851_irq() handler. Place local_bh_enable() outside of
    the spinlock protected section, so that it can trigger do_softirq()
    without the ks8851_par.c ks8851_lock_par() spinlock being held, and
    safely call ks8851_start_xmit_par() without attempting to lock the
    already locked spinlock.
    
    Since ks8851_irq() is now protected by local_bh_disable()/local_bh_enable(),
    replace netif_rx() with __netif_rx(), which does not duplicate the
    local_bh_disable()/local_bh_enable() calls.
    
    Fixes: 797047f875b5 ("net: ks8851: Implement Parallel bus operations")
    Signed-off-by: Marek Vasut <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
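
The ordering problem in the commit above can be illustrated with a single-threaded toy model. Nothing here is kernel code: the counters and every toy_* function are hypothetical stand-ins for the BH-disable depth, the driver spinlock, netif_rx(), __netif_rx() and the softirq that ends in the transmit path. The model only records whether the "softirq" would run while the "lock" is held.

```c
#include <stdbool.h>

static int toy_bh_depth;          /* models the BH-disable nesting count */
static bool toy_lock_held;        /* models the driver spinlock being held */
static bool toy_softirq_pending;  /* models a raised RX softirq */
static bool toy_would_deadlock;   /* set if the xmit path runs under the lock */

/* Models the softirq whose call chain ends in the driver taking the lock. */
static void toy_run_softirq(void)
{
    toy_softirq_pending = false;
    if (toy_lock_held)
        toy_would_deadlock = true;
}

static void toy_bh_disable(void)
{
    toy_bh_depth++;
}

/* Pending softirqs run when the outermost BH-disabled section ends. */
static void toy_bh_enable(void)
{
    if (--toy_bh_depth == 0 && toy_softirq_pending)
        toy_run_softirq();
}

/* Models netif_rx(): wraps the enqueue in its own BH-disabled section. */
static void toy_netif_rx(void)
{
    toy_bh_disable();
    toy_softirq_pending = true;
    toy_bh_enable();
}

/* Models __netif_rx(): relies on the caller having BHs disabled. */
static void toy_netif_rx_nobh(void)
{
    toy_softirq_pending = true;
}

static void toy_reset(void)
{
    toy_bh_depth = 0;
    toy_lock_held = false;
    toy_softirq_pending = false;
    toy_would_deadlock = false;
}

/* Old ordering: softirq runs inside the locked section. */
bool toy_irq_before_fix(void)
{
    toy_reset();
    toy_lock_held = true;
    toy_netif_rx();
    toy_lock_held = false;
    return toy_would_deadlock;
}

/* New ordering: BHs disabled around the lock, softirq runs after unlock. */
bool toy_irq_after_fix(void)
{
    toy_reset();
    toy_bh_disable();
    toy_lock_held = true;
    toy_netif_rx_nobh();
    toy_lock_held = false;
    toy_bh_enable();
    return toy_would_deadlock;
}
```

In this model toy_irq_before_fix() reports the self-deadlock condition and toy_irq_after_fix() does not, because the pending work is deferred until after the lock is released.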

net: ks8851: Inline ks8851_rx_skb()

Author: Marek Vasut <[email protected]>
Date:   Fri Apr 5 22:30:39 2024 +0200

    net: ks8851: Inline ks8851_rx_skb()
    
    [ Upstream commit f96f700449b6d190e06272f1cf732ae8e45b73df ]
    
    Both ks8851_rx_skb_par() and ks8851_rx_skb_spi() call netif_rx(skb),
    inline the netif_rx(skb) call directly into ks8851_common.c and drop
    the .rx_skb callback and ks8851_rx_skb() wrapper. This removes one
    indirect call from the driver, no functional change otherwise.
    
    Signed-off-by: Marek Vasut <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Stable-dep-of: be0384bf599c ("net: ks8851: Handle softirqs at the end of IRQ thread to fix hang")
    Signed-off-by: Sasha Levin <[email protected]>

net: openvswitch: fix unwanted error log on timeout policy probing

Author: Ilya Maximets <[email protected]>
Date:   Wed Apr 3 22:38:01 2024 +0200

    net: openvswitch: fix unwanted error log on timeout policy probing
    
    [ Upstream commit 4539f91f2a801c0c028c252bffae56030cfb2cae ]
    
    On startup, ovs-vswitchd probes different datapath features including
    support for timeout policies.  While probing, it tries to execute
    certain operations with OVS_PACKET_ATTR_PROBE or OVS_FLOW_ATTR_PROBE
    attributes set.  These attributes tell the openvswitch module to not
    log any errors when they occur as it is expected that some of the
    probes will fail.
    
    For some reason, setting the timeout policy ignores the PROBE attribute
    and logs a failure anyway.  This is causing the following kernel log
    on each re-start of ovs-vswitchd:
    
      kernel: Failed to associated timeout policy `ovs_test_tp'
    
    Fix that by using the same logging macro that all other messages are
    using.  The message will still be printed at info level when needed
    and will be rate limited, but with a net rate limiter instead of
    generic printk one.
    
    The nf_ct_set_timeout() itself will still print some info messages,
    but at least this change makes logging in openvswitch module more
    consistent.
    
    Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
    Signed-off-by: Ilya Maximets <[email protected]>
    Acked-by: Eelco Chaudron <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: sparx5: fix wrong config being used when reconfiguring PCS

Author: Daniel Machon <[email protected]>
Date:   Tue Apr 9 12:41:59 2024 +0200

    net: sparx5: fix wrong config being used when reconfiguring PCS
    
    [ Upstream commit 33623113a48ea906f1955cbf71094f6aa4462e8f ]
    
    The wrong port config is being used if the PCS is reconfigured. Fix this
    by correctly using the new config instead of the old one.
    
    Fixes: 946e7fd5053a ("net: sparx5: add port module support")
    Signed-off-by: Daniel Machon <[email protected]>
    Reviewed-by: Jacob Keller <[email protected]>
    Link: https://lore.kernel.org/r/20240409-link-mode-reconfiguration-fix-v2-1-db6a507f3627@microchip.com
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: complete validation of user input

Author: Eric Dumazet <[email protected]>
Date:   Tue Apr 9 12:07:41 2024 +0000

    netfilter: complete validation of user input
    
    [ Upstream commit 65acf6e0501ac8880a4f73980d01b5d27648b956 ]
    
    In my recent commit, I missed that do_replace() handlers
    use copy_from_sockptr() (which I fixed), followed
    by unsafe copy_from_sockptr_offset() calls.
    
    In all functions, we can perform the @optlen validation
    before even calling xt_alloc_table_info() with the following
    check:
    
    if ((u64)optlen < (u64)tmp.size + sizeof(tmp))
            return -EINVAL;
    
    Fixes: 0c83842df40f ("netfilter: validate user input for expected length")
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Reviewed-by: Pablo Neira Ayuso <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
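
The reason the check in the commit above widens both sides to 64 bits can be shown with a userspace sketch. The struct and function names below are hypothetical stand-ins, not the netfilter structures: toy_replace_hdr plays the role of the fixed-size header (tmp) whose size field gives the length of the data that follows. A second function does the same sum in 32 bits for contrast, to show how a huge size value can wrap and slip past the check.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed-size header copied from user space. */
struct toy_replace_hdr {
    char name[32];
    unsigned int size; /* length of the variable-sized data that follows */
};

/* Mirrors the shape of the check quoted in the commit message. */
int toy_check_optlen(unsigned int optlen, const struct toy_replace_hdr *hdr)
{
    if ((uint64_t)optlen < (uint64_t)hdr->size + sizeof(*hdr))
        return -EINVAL;
    return 0;
}

/* For contrast only: the same sum in 32 bits can wrap around. */
int toy_check_optlen_32bit(unsigned int optlen,
                           const struct toy_replace_hdr *hdr)
{
    unsigned int need = hdr->size + (unsigned int)sizeof(*hdr);

    if (optlen < need)
        return -EINVAL;
    return 0;
}
```

With size set to 0xffffffff and an optlen of 100, the 32-bit variant accepts the request because the sum wraps to a small number, while the 64-bit variant rejects it.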

nouveau: fix function cast warning

Author: Arnd Bergmann <[email protected]>
Date:   Thu Apr 4 18:02:25 2024 +0200

    nouveau: fix function cast warning
    
    [ Upstream commit 185fdb4697cc9684a02f2fab0530ecdd0c2f15d4 ]
    
    Calling a function through an incompatible pointer type breaks
    kcfi, so clang warns about the assignment:
    
    drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c:73:10: error: cast from 'void (*)(const void *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
       73 |         .fini = (void(*)(void *))kfree,
    
    Avoid this with a trivial wrapper.
    
    Fixes: c39f472e9f14 ("drm/nouveau: remove symlinks, move core/ to nvkm/ (no code changes)")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Danilo Krummrich <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>
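
The "trivial wrapper" approach mentioned in the commit above can be sketched in userspace as follows. All names are hypothetical: toy_release() stands in for a kfree()-like function that takes a const-qualified pointer, and toy_source_ops stands in for a callback table whose hook takes a plain void pointer. The wrapper has exactly the callback's type, so no function pointer cast is needed.

```c
static int toy_release_calls;

/* Stand-in for a kfree()-like function taking a const-qualified pointer. */
static void toy_release(const void *ptr)
{
    (void)ptr;
    toy_release_calls++;
}

/* Callback table whose hook takes a non-const pointer. */
struct toy_source_ops {
    void (*fini)(void *data);
};

/* Trivial wrapper with the callback's exact signature. */
static void toy_fini(void *data)
{
    toy_release(data);
}

static const struct toy_source_ops toy_ops = {
    .fini = toy_fini, /* no cast, so the indirect call type matches */
};

/* Invokes the hook and reports how many times the release ran. */
int toy_call_fini(void *data)
{
    toy_release_calls = 0;
    toy_ops.fini(data);
    return toy_release_calls;
}
```

The design choice is to pay one extra direct call in exchange for an indirect call whose target type matches the pointer type, which is what control-flow integrity checking expects.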

octeontx2-af: Fix NIX SQ mode and BP config

Author: Geetha sowjanya <[email protected]>
Date:   Mon Apr 8 12:06:43 2024 +0530

    octeontx2-af: Fix NIX SQ mode and BP config
    
    [ Upstream commit faf23006185e777db18912685922c5ddb2df383f ]
    
    NIX SQ mode and link backpressure configuration is required for
    all platforms. But in the current driver this code is wrongly placed
    under a platform-specific check. This patch fixes the issue by
    moving the code out of the platform check.
    
    Fixes: 5d9b976d4480 ("octeontx2-af: Support fixed transmit scheduler topology")
    Signed-off-by: Geetha sowjanya <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

perf/x86: Fix out of range data

Author: Namhyung Kim <[email protected]>
Date:   Tue Mar 5 22:10:03 2024 -0800

    perf/x86: Fix out of range data
    
    commit dec8ced871e17eea46f097542dd074d022be4bd1 upstream.
    
    On x86 each struct cpu_hw_events maintains a table for counter assignment,
    but x86_pmu_del() failed to update it for the deleted event.  This can make
    perf_clear_dirty_counters() reset a counter that is still in use if it's
    called before event scheduling or enabling.  The counter would then return
    out of range data, which doesn't make sense.
    
    The following code can reproduce the problem.
    
      $ cat repro.c
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <linux/perf_event.h>
      #include <sys/ioctl.h>
      #include <sys/mman.h>
      #include <sys/syscall.h>
    
      struct perf_event_attr attr = {
            .type = PERF_TYPE_HARDWARE,
            .config = PERF_COUNT_HW_CPU_CYCLES,
            .disabled = 1,
      };
    
      void *worker(void *arg)
      {
            int cpu = (long)arg;
            int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
            int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
            void *p;
    
            do {
                    ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0);
                    p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
                    ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);
    
                    ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0);
                    munmap(p, 4096);
                    ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0);
            } while (1);
    
            return NULL;
      }
    
      int main(void)
      {
            int i;
            int n = sysconf(_SC_NPROCESSORS_ONLN);
            pthread_t *th = calloc(n, sizeof(*th));
    
            for (i = 0; i < n; i++)
                    pthread_create(&th[i], NULL, worker, (void *)(long)i);
            for (i = 0; i < n; i++)
                    pthread_join(th[i], NULL);
    
            free(th);
            return 0;
      }
    
    And you can see the out of range data using perf stat like this.
    Probably it'd be easier to see on a large machine.
    
      $ gcc -o repro repro.c -pthread
      $ ./repro &
      $ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }'
           1.001028462 CPU6   196,719,295,683,763      cycles                           # 194290.996 GHz                       (71.54%)
           1.001028462 CPU3   396,077,485,787,730      branch-misses                    # 15804359784.80% of all branches      (71.07%)
           1.001028462 CPU17  197,608,350,727,877      branch-misses                    # 14594186554.56% of all branches      (71.22%)
           2.020064073 CPU4   198,372,472,612,140      cycles                           # 194681.113 GHz                       (70.95%)
           2.020064073 CPU6   199,419,277,896,696      cycles                           # 195720.007 GHz                       (70.57%)
           2.020064073 CPU20  198,147,174,025,639      cycles                           # 194474.654 GHz                       (71.03%)
           2.020064073 CPU20  198,421,240,580,145      stalled-cycles-frontend          #  100.14% frontend cycles idle        (70.93%)
           3.037443155 CPU4   197,382,689,923,416      cycles                           # 194043.065 GHz                       (71.30%)
           3.037443155 CPU20  196,324,797,879,414      cycles                           # 193003.773 GHz                       (71.69%)
           3.037443155 CPU5   197,679,956,608,205      stalled-cycles-backend           # 1315606428.66% backend cycles idle   (71.19%)
           3.037443155 CPU5   198,571,860,474,851      instructions                     # 13215422.58  insn per cycle
    
    x86_pmu_del() should move the contents of cpuc->assign as well.
    
    Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task")
    Signed-off-by: Namhyung Kim <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Reviewed-by: Kan Liang <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
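
The idea behind the fix in the commit above, keeping the assignment table in step with the event list when an entry is deleted, can be sketched with two parallel arrays. This is a simplified userspace model with hypothetical names; the real struct cpu_hw_events holds event pointers and further per-event state.

```c
#define TOY_MAX_EVENTS 8

/* Simplified model: event ids and the counter index assigned to each. */
struct toy_cpu_events {
    int n_events;
    int event_list[TOY_MAX_EVENTS];
    int assign[TOY_MAX_EVENTS];
};

/*
 * Removes the event at index i and shifts the remaining entries down.
 * Both arrays are shifted together; leaving assign[] untouched would
 * leave later events paired with a stale counter index.
 */
void toy_del_event(struct toy_cpu_events *c, int i)
{
    if (i < 0 || i >= c->n_events)
        return;

    while (++i < c->n_events) {
        c->event_list[i - 1] = c->event_list[i];
        c->assign[i - 1] = c->assign[i];
    }
    c->n_events--;
}
```

After deleting index 0 from events {10, 20, 30} with counters {2, 0, 1}, event 20 is still paired with counter 0 and event 30 with counter 1.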

PM: s2idle: Make sure CPUs will wakeup directly on resume

Author: Anna-Maria Behnsen <[email protected]>
Date:   Mon Apr 8 09:02:23 2024 +0200

    PM: s2idle: Make sure CPUs will wakeup directly on resume
    
    commit 3c89a068bfd0698a5478f4cf39493595ef757d5e upstream.
    
    s2idle works like a regular suspend with freezing processes and freezing
    devices. All CPUs except the control CPU go into idle. Once this is
    completed the control CPU kicks all other CPUs out of idle, so that they
    reenter the idle loop and then enter s2idle state. The control CPU then
    issues an swait() on the suspend state and therefore enters the idle loop
    as well.
    
    Due to being kicked out of idle, the other CPUs leave their NOHZ states,
    which means the tick is active and the corresponding hrtimer is programmed
    to the next jiffy.
    
    On entering s2idle the CPUs shut down their local clockevent device to
    prevent wakeups. The last CPU which enters s2idle shuts down its local
    clockevent and freezes timekeeping.
    
    On resume, one of the CPUs receives the wakeup interrupt, unfreezes
    timekeeping and its local clockevent and starts the resume process. At that
    point all other CPUs are still in s2idle with their clockevents switched
    off. They only resume when they are kicked by another CPU or after resuming
    devices and then receiving a device interrupt.
    
    That means there is no guarantee that all CPUs will wake up directly on
    resume. As a consequence there is no guarantee that timers which are queued
    on those CPUs and should expire directly after resume, are handled. Also
    timer list timers which are remotely queued to one of those CPUs after
    resume will not result in a reprogramming IPI as the tick is
    active. Queueing a hrtimer will also not result in a reprogramming IPI
    because the first hrtimer event is already in the past.
    
    The recent introduction of the timer pull model (7ee988770326 ("timers:
    Implement the hierarchical pull model")) amplifies this problem, if the
    current migrator is one of the non woken up CPUs. When a non pinned timer
    list timer is queued and the queuing CPU goes idle, it relies on the still
    suspended migrator CPU to expire the timer which will happen by chance.
    
    The problem exists since commit 8d89835b0467 ("PM: suspend: Do not pause
    cpuidle in the suspend-to-idle path"). There the cpuidle_pause() call which
    in turn invoked a wakeup for all idle CPUs was moved to a later point in
    the resume process. This might not be reached or reached very late because
    it waits on a timer of a still suspended CPU.
    
    Address this by kicking all CPUs out of idle after the control CPU returns
    from swait() so that they resume their timers and restore consistent system
    state.
    
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218641
    Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
    Signed-off-by: Anna-Maria Behnsen <[email protected]>
    Reviewed-by: Thomas Gleixner <[email protected]>
    Tested-by: Mario Limonciello <[email protected]>
    Cc: 5.16+ <[email protected]> # 5.16+
    Acked-by: Peter Zijlstra (Intel) <[email protected]>
    Reviewed-by: Ulf Hansson <[email protected]>
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Revert "drm/qxl: simplify qxl_fence_wait"

Author: Alex Constantino <[email protected]>
Date:   Thu Apr 4 19:14:48 2024 +0100

    Revert "drm/qxl: simplify qxl_fence_wait"
    
    [ Upstream commit 07ed11afb68d94eadd4ffc082b97c2331307c5ea ]
    
    This reverts commit 5a838e5d5825c85556011478abde708251cc0776.
    
    Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
    result in a '[TTM] Buffer eviction failed' exception whenever it reached a
    timeout.
    Due to a dependency on DMA_FENCE_WARN this also restores some code deleted
    by commit d72277b6c37d ("dma-buf: nuke DMA_FENCE_TRACE macros v2").
    
    Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
    Link: https://lore.kernel.org/regressions/[email protected]/
    Reported-by: Timo Lindfors <[email protected]>
    Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
    Signed-off-by: Alex Constantino <[email protected]>
    Signed-off-by: Maxime Ripard <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

ring-buffer: Only update pages_touched when a new page is touched

Author: Steven Rostedt (Google) <[email protected]>
Date:   Tue Apr 9 15:13:09 2024 -0400

    ring-buffer: Only update pages_touched when a new page is touched
    
    commit ffe3986fece696cf65e0ef99e74c75f848be8e30 upstream.
    
    The "buffer_percent" logic, which the ring buffer splice code uses to wake
    up waiting tasks only once the buffer is filled to the percentage set in
    the "buffer_percent" file, depends on three variables that determine the
    amount of data that is in the ring buffer:
    
     1) pages_read - incremented whenever a new sub-buffer is consumed
     2) pages_lost - incremented every time a writer overwrites a sub-buffer
     3) pages_touched - incremented when a write goes to a new sub-buffer
    
    The percentage is the calculation of:
    
      (pages_touched - (pages_lost + pages_read)) / nr_pages
    
    Basically, the amount of data is the total number of sub-bufs that have been
    touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
    divided by the total count to give the buffer percentage. When the
    percentage is greater than the value in the "buffer_percent" file, it
    wakes up splice readers waiting for that amount.
    
    It was observed that over time, the amount read from the splice was
    constantly decreasing the longer the trace was running. That is, if one
    asked for 60%, it would read over 60% when it first starts tracing, but
    then it would be woken up at under 60% and would slowly decrease the
    amount of data read after being woken up, where the amount becomes much
    less than the buffer percent.
    
    This was due to incorrect accounting of the pages_touched increment. This
    value is incremented whenever a writer transfers to a new sub-buffer. But
    the place where it was incremented was incorrect. If a writer overflowed
    the current sub-buffer it would go to the next one. If it gets preempted
    by an interrupt at that time, and the interrupt performs a trace, it too
    will end up going to the next sub-buffer. But only one should increment
    the counter. Unfortunately, that was not the case.
    
    Change the cmpxchg() that does the real switch of the tail-page into a
    try_cmpxchg(), and on success, perform the increment of pages_touched. This
    increments the counter only once when a writer moves to a new sub-buffer,
    rather than twice when a writer and its preempting writer race to move to
    the same new sub-buffer.
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Cc: Mathieu Desnoyers <[email protected]>
    Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
    Acked-by: Masami Hiramatsu (Google) <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
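
The two ideas in the commit above, the fill percentage and incrementing pages_touched only when the compare-and-exchange succeeds, can be sketched with C11 atomics. This is a simplified model under stated assumptions: struct toy_rb and both functions are hypothetical, the tail page is represented by a plain index, and atomic_compare_exchange_strong() stands in for the kernel's try_cmpxchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model of the per-CPU buffer counters. */
struct toy_rb {
    atomic_ulong tail_page;       /* index of the current tail sub-buffer */
    unsigned long pages_touched;
    unsigned long pages_lost;
    unsigned long pages_read;
    unsigned long nr_pages;
};

/*
 * Tries to move the tail from old_tail to next. Only the writer whose
 * exchange succeeds counts the new sub-buffer as touched.
 */
bool toy_move_tail(struct toy_rb *rb, unsigned long old_tail,
                   unsigned long next)
{
    unsigned long expected = old_tail;

    if (atomic_compare_exchange_strong(&rb->tail_page, &expected, next)) {
        rb->pages_touched++;
        return true;
    }
    return false; /* another writer already moved the tail */
}

/* (pages_touched - (pages_lost + pages_read)) / nr_pages, as a percentage. */
unsigned long toy_percent_full(const struct toy_rb *rb)
{
    unsigned long dirty;

    if (rb->nr_pages == 0)
        return 0;
    dirty = rb->pages_touched - (rb->pages_lost + rb->pages_read);
    return dirty * 100 / rb->nr_pages;
}
```

If two writers both try to move the tail from the same old position, only the first call returns true and pages_touched grows by one, which keeps the percentage from drifting upward over time.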

scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()

Author: Xiang Chen <[email protected]>
Date:   Tue Apr 2 11:55:13 2024 +0800

    scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
    
    [ Upstream commit 0098c55e0881f0b32591f2110410d5c8b7f9bd5a ]
    
    We found that the second parameter of function ata_wait_after_reset() is
    incorrectly used. We call smp_ata_check_ready_type() to poll the device
    type until the 30s timeout, so the correct deadline should be (jiffies +
    30000).
    
    Fixes: 3c2673a09cf1 ("scsi: hisi_sas: Fix SATA devices missing issue during I_T nexus reset")
    Co-developed-by: xiabing <[email protected]>
    Signed-off-by: xiabing <[email protected]>
    Co-developed-by: Yihang Li <[email protected]>
    Signed-off-by: Yihang Li <[email protected]>
    Signed-off-by: Xiang Chen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()

Author: Dan Carpenter <[email protected]>
Date:   Tue Apr 2 12:56:54 2024 +0300

    scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
    
    [ Upstream commit 4406e4176f47177f5e51b4cc7e6a7a2ff3dbfbbd ]
    
    The app_reply->elem[] array is allocated earlier in this function and it
    has app_req.num_ports elements.  Thus this > comparison needs to be >= to
    prevent memory corruption.
    
    Fixes: 7878f22a2e03 ("scsi: qla2xxx: edif: Add getfcinfo and statistic bsgs")
    Signed-off-by: Dan Carpenter <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Himanshu Madhani <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
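
The off-by-one in the commit above comes down to which comparison guards the write into an array of num_ports elements. The following is a minimal sketch with hypothetical names (toy_fill_stats() and its parameters are not from the qla2xxx driver); it copies values into elem[] and stops before index num_ports.

```c
#include <stddef.h>

/*
 * Copies up to num_ports values from src[] into elem[], which has
 * exactly num_ports elements. Returns the number of entries written.
 */
size_t toy_fill_stats(int *elem, size_t num_ports,
                      const int *src, size_t src_count)
{
    size_t pcnt = 0;

    for (size_t i = 0; i < src_count; i++) {
        /* With '>' instead of '>=', pcnt == num_ports would still write. */
        if (pcnt >= num_ports)
            break;
        elem[pcnt++] = src[i];
    }
    return pcnt;
}
```

With four source values and num_ports set to 2, only elem[0] and elem[1] are written and the slot after them is left untouched.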

selftests: timers: Fix abs() warning in posix_timers test

Author: John Stultz <[email protected]>
Date:   Wed Apr 10 16:26:30 2024 -0700

    selftests: timers: Fix abs() warning in posix_timers test
    
    commit ed366de8ec89d4f960d66c85fc37d9de22f7bf6d upstream.
    
    Building with clang results in the following warning:
    
      posix_timers.c:69:6: warning: absolute value function 'abs' given an
          argument of type 'long long' but has parameter of type 'int' which may
          cause truncation of value [-Wabsolute-value]
            if (abs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) {
                ^
    So switch to using llabs() instead.
    
    Fixes: 0bc4b0cf1570 ("selftests: add basic posix timers selftests")
    Signed-off-by: John Stultz <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
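
The tolerance check quoted in the warning above can be written with llabs() as a small standalone function. The function name and the TOY_ constants are assumptions for illustration (a 2 second delay and microsecond units); only the shape of the comparison is taken from the warning text.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TOY_DELAY 2                   /* seconds; assumed for illustration */
#define TOY_USECS_PER_SEC 1000000LL

/*
 * Returns true if the measured interval differs from the expected delay
 * by more than half a second. llabs() takes long long, so the argument
 * is not narrowed to int as it would be with abs().
 */
bool toy_timer_out_of_tolerance(long long diff_usec)
{
    return llabs(diff_usec - TOY_DELAY * TOY_USECS_PER_SEC) >
           TOY_USECS_PER_SEC / 2;
}
```

A difference such as 2^32 microseconds beyond the expected delay shows why the type matters: narrowed to a 32-bit int it would typically become 0 and pass the check, whereas llabs() reports it as out of tolerance.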

smb3: fix Open files on server counter going negative

Author: Steve French <[email protected]>
Date:   Sat Apr 6 23:16:08 2024 -0500

    smb3: fix Open files on server counter going negative
    
    commit 28e0947651ce6a2200b9a7eceb93282e97d7e51a upstream.
    
    We were decrementing the count of open files on server twice
    for the case where we were closing cached directories.
    
    Fixes: 8e843bf38f7b ("cifs: return a single-use cfid if we did not get a lease")
    Cc: [email protected]
    Acked-by: Bharath SM <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tracing: hide unused ftrace_event_id_fops

Author: Arnd Bergmann <[email protected]>
Date:   Wed Apr 3 10:06:24 2024 +0200

    tracing: hide unused ftrace_event_id_fops
    
    [ Upstream commit 5281ec83454d70d98b71f1836fb16512566c01cd ]
    
    When CONFIG_PERF_EVENTS is disabled, a 'make W=1' build produces a warning
    about the unused ftrace_event_id_fops variable:
    
    kernel/trace/trace_events.c:2155:37: error: 'ftrace_event_id_fops' defined but not used [-Werror=unused-const-variable=]
     2155 | static const struct file_operations ftrace_event_id_fops = {
    
    Hide this in the same #ifdef as the reference to it.
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Oleg Nesterov <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Zheng Yejian <[email protected]>
    Cc: Kees Cook <[email protected]>
    Cc: Ajay Kaher <[email protected]>
    Cc: Jinjie Ruan <[email protected]>
    Cc: Clément Léger <[email protected]>
    Cc: Dan Carpenter <[email protected]>
    Cc: "Tzvetomir Stoyanov (VMware)" <[email protected]>
    Fixes: 620a30e97feb ("tracing: Don't pass file_operations array to event_create_dir()")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file [+ + +]

Author: Petr Tesarik <[email protected]>
Date:   Thu Apr 4 09:57:40 2024 +0200

    u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file
    
    [ Upstream commit 38a15d0a50e0a43778561a5861403851f0b0194c ]
    
    Fix bogus lockdep warnings if multiple u64_stats_sync variables are
    initialized in the same file.
    
    With CONFIG_LOCKDEP, seqcount_init() is a macro which declares:
    
            static struct lock_class_key __key;
    
    Since u64_stats_init() is a function (albeit an inline one), all calls
    within the same file end up using the same instance, effectively treating
    them all as a single lock-class.
    
    Fixes: 9464ca650008 ("net: make u64_stats_init() a function")
    Closes: https://lore.kernel.org/netdev/[email protected]/
    Signed-off-by: Petr Tesarik <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
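
The difference between a function and a macro described above can be reproduced in userspace. This is only a sketch of the mechanism: struct tracked, tracked_init_fn() and tracked_init_macro() are hypothetical names, and a plain static char stands in for struct lock_class_key.

```c
struct tracked {
	const void *class_key;	/* stands in for the lockdep class */
};

/*
 * Function form: the static object exists once, so every caller in the
 * file ends up with the same key.
 */
static inline void tracked_init_fn(struct tracked *t)
{
	static char key;

	t->class_key = &key;
}

/*
 * Macro form: each expansion declares its own static object, so every
 * call site gets a distinct key.
 */
#define tracked_init_macro(t)			\
	do {					\
		static char key_;		\
		(t)->class_key = &key_;		\
	} while (0)
```

Two objects initialized through tracked_init_fn() share one key, while two expansions of tracked_init_macro() yield two different keys, which is the behavior lockdep needs to tell the locks apart.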

vhost: Add smp_rmb() in vhost_enable_notify() [+ + +]

Author: Gavin Shan <[email protected]>
Date:   Thu Mar 28 10:21:48 2024 +1000

    vhost: Add smp_rmb() in vhost_enable_notify()
    
    commit df9ace7647d4123209395bb9967e998d5758c645 upstream.
    
    A smp_rmb() is missing in vhost_enable_notify(), as suggested by
    Will. Without it, there is no guarantee that the available ring entries
    pushed by the guest are observed by vhost in time, leading to stale
    available ring entries fetched by vhost in vhost_get_vq_desc(), as
    reported by Yihuang Yu on NVIDIA's Grace Hopper (ARM64) platform.
    
      /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
      -accel kvm -machine virt,gic-version=host -cpu host          \
      -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
      -m 4096M,slots=16,maxmem=64G                                 \
      -object memory-backend-ram,id=mem0,size=4096M                \
       :                                                           \
      -netdev tap,id=vnet0,vhost=true                              \
      -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
       :
      guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
      virtio_net virtio0: output.0:id 100 is not a head!
    
    Add the missing smp_rmb() in vhost_enable_notify(). When it returns true,
    it means there are still pending tx buffers. Since it might read the
    indices, it can still bypass the smp_rmb() in vhost_get_vq_desc(). Note
    that it should be safe until vq->avail_idx is changed by commit
    d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()").
    
    Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
    Cc: <[email protected]> # v5.18+
    Reported-by: Yihuang Yu <[email protected]>
    Suggested-by: Will Deacon <[email protected]>
    Signed-off-by: Gavin Shan <[email protected]>
    Acked-by: Jason Wang <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Michael S. Tsirkin <[email protected]>
    Reviewed-by: Stefano Garzarella <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

vhost: Add smp_rmb() in vhost_vq_avail_empty() [+ + +]

Author: Gavin Shan <[email protected]>
Date:   Thu Mar 28 10:21:47 2024 +1000

    vhost: Add smp_rmb() in vhost_vq_avail_empty()
    
    commit 22e1992cf7b034db5325660e98c41ca5afa5f519 upstream.
    
    A smp_rmb() is missing in vhost_vq_avail_empty(), as spotted by
    Will. Without it, there is no guarantee that the available ring entries
    pushed by the guest are observed by vhost in time, leading to stale
    available ring entries fetched by vhost in vhost_get_vq_desc(), as
    reported by Yihuang Yu on NVIDIA's Grace Hopper (ARM64) platform.
    
      /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
      -accel kvm -machine virt,gic-version=host -cpu host          \
      -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
      -m 4096M,slots=16,maxmem=64G                                 \
      -object memory-backend-ram,id=mem0,size=4096M                \
       :                                                           \
      -netdev tap,id=vnet0,vhost=true                              \
      -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
       :
      guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
      virtio_net virtio0: output.0:id 100 is not a head!
    
    Add the missing smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch()
    returns true, it means there are still pending tx buffers. Since it might
    read the indices, it can still bypass the smp_rmb() in vhost_get_vq_desc().
    Note that it should be safe until vq->avail_idx is changed by commit
    275bf960ac697 ("vhost: better detection of available buffers").
    
    Fixes: 275bf960ac69 ("vhost: better detection of available buffers")
    Cc: <[email protected]> # v4.11+
    Reported-by: Yihuang Yu <[email protected]>
    Suggested-by: Will Deacon <[email protected]>
    Signed-off-by: Gavin Shan <[email protected]>
    Acked-by: Jason Wang <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Michael S. Tsirkin <[email protected]>
    Reviewed-by: Stefano Garzarella <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
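
The ordering requirement behind the two vhost fixes above can be sketched with C11 atomics. This is a userspace analogy under stated assumptions, not the vhost code: struct avail_ring, ring_push() and ring_pop() are hypothetical, and an acquire fence plays the role that smp_rmb() plays in the kernel (it is at least as strong for ordering the two loads).

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u

struct avail_ring {
	_Atomic uint16_t idx;		/* published by the producer last */
	uint16_t ring[RING_SIZE];	/* entries written before idx */
};

/* Producer side: write the entry first, then publish the new index. */
static void ring_push(struct avail_ring *r, uint16_t head)
{
	uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

	r->ring[idx % RING_SIZE] = head;
	atomic_thread_fence(memory_order_release);	/* write barrier */
	atomic_store_explicit(&r->idx, (uint16_t)(idx + 1),
			      memory_order_relaxed);
}

/*
 * Consumer side: returns 1 and stores the next entry in *head if one is
 * available, 0 otherwise.
 */
static int ring_pop(struct avail_ring *r, uint16_t *last_seen, uint16_t *head)
{
	uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

	if (idx == *last_seen)
		return 0;

	/*
	 * Read barrier: the index load must be ordered before the entry
	 * load, otherwise a stale ring entry could be returned.
	 */
	atomic_thread_fence(memory_order_acquire);
	*head = r->ring[*last_seen % RING_SIZE];
	(*last_seen)++;
	return 1;
}
```

Any code path that observes a new index and then lets the entry be read needs the read barrier in between; skipping it on one path is the kind of gap the two commits close.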

x86/apic: Force native_apic_mem_read() to use the MOV instruction [+ + +]

Author: Adam Dunlap <[email protected]>
Date:   Mon Mar 18 16:09:27 2024 -0700

    x86/apic: Force native_apic_mem_read() to use the MOV instruction
    
    commit 5ce344beaca688f4cdea07045e0b8f03dc537e74 upstream.
    
    When done from a virtual machine, instructions that touch APIC memory
    must be emulated. By convention, MMIO accesses are typically performed
    via io.h helpers such as readl() or writeq() to simplify instruction
    emulation/decoding (ex: in KVM hosts and SEV guests) [0].
    
    Currently, native_apic_mem_read() does not follow this convention,
    allowing the compiler to emit instructions other than the MOV
    instruction generated by readl(). In particular, when the kernel is
    compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler
    would emit a TESTL instruction which is not supported by the SEV-ES
    emulator, causing a boot failure in that environment. It is likely the
    same problem would happen in a TDX guest as that uses the same
    instruction emulator as SEV-ES.
    
    To make sure all emulators can emulate APIC memory reads via MOV, use
    the readl() function in native_apic_mem_read(). It is expected that any
    emulator would support MOV in any addressing mode as it is the most
    generic and is what is usually emitted currently.
    
    The TESTL instruction is emitted when native_apic_mem_read() is inlined
    into apic_mem_wait_icr_idle(). The emulator comes from
    insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to
    extend insn_decode_mmio() to support more instructions since, in theory,
    the compiler could choose to output nearly any instruction for such
    reads which would bloat the emulator beyond reason.
    
      [0] https://lore.kernel.org/all/[email protected]/
    
      [ bp: Massage commit message, fix typos. ]
    
    Signed-off-by: Adam Dunlap <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Reviewed-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Ard Biesheuvel <[email protected]>
    Tested-by: Kevin Loughlin <[email protected]>
    Cc: <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
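
As a rough illustration of why a readl()-style helper pins the instruction choice, the following userspace sketch forces a single MOV on x86 and falls back to a plain volatile load elsewhere. The helper name mmio_read32() is hypothetical and this is not the kernel's implementation of readl().

```c
#include <stdint.h>

/*
 * Read a 32-bit value with exactly one load. On x86 the inline assembly
 * fixes the instruction to MOV, so the compiler cannot fold the access
 * into another instruction (for example a TEST with a memory operand).
 */
static inline uint32_t mmio_read32(const volatile void *addr)
{
	uint32_t val;

#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("movl %1, %0"
			 : "=r"(val)
			 : "m"(*(const volatile uint32_t *)addr)
			 : "memory");
#else
	/* Fallback for other architectures: an ordinary volatile load. */
	val = *(const volatile uint32_t *)addr;
#endif
	return val;
}
```

A plain volatile dereference only guarantees that the access happens; it does not restrict which instruction performs it, which is the gap the commit describes.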

x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES [+ + +]

Author: Josh Poimboeuf <[email protected]>
Date:   Wed Apr 10 22:40:46 2024 -0700

    x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
    
    commit cb2db5bb04d7f778fbc1a1ea2507aab436f1bff3 upstream.
    
    There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and
    over.  It's even read in the BHI sysfs function which is a big no-no.
    Just read it once and cache it.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Signed-off-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Reviewed-by: Nikolay Borisov <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Sean Christopherson <[email protected]>
    Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bugs: Clarify that syscall hardening isn't a BHI mitigation [+ + +]

Author: Josh Poimboeuf <[email protected]>
Date:   Wed Apr 10 22:40:48 2024 -0700

    x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
    
    commit 5f882f3b0a8bf0788d5a0ee44b1191de5319bb8a upstream.
    
    While syscall hardening helps prevent some BHI attacks, there's still
    other low-hanging fruit remaining.  Don't classify it as a mitigation
    and make it clear that the system may still be vulnerable if it doesn't
    have a HW or SW mitigation enabled.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Signed-off-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Sean Christopherson <[email protected]>
    Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bugs: Fix BHI documentation [+ + +]

Author: Josh Poimboeuf <[email protected]>
Date:   Wed Apr 10 22:40:45 2024 -0700

    x86/bugs: Fix BHI documentation
    
    commit dfe648903f42296866d79f10d03f8c85c9dfba30 upstream.
    
    Fix up some inaccuracies in the BHI documentation.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Signed-off-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Reviewed-by: Nikolay Borisov <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Sean Christopherson <[email protected]>
    Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bugs: Fix BHI handling of RRSBA [+ + +]

Author: Josh Poimboeuf <[email protected]>
Date:   Wed Apr 10 22:40:47 2024 -0700

    x86/bugs: Fix BHI handling of RRSBA
    
    commit 1cea8a280dfd1016148a3820676f2f03e3f5b898 upstream.
    
    The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been
    disabled by the Spectre v2 mitigation (or can otherwise be disabled by
    the BHI mitigation itself if needed).  In that case retpolines are fine.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Signed-off-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Sean Christopherson <[email protected]>
    Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bugs: Fix return type of spectre_bhi_state() [+ + +]

Author: Daniel Sneddon <[email protected]>
Date:   Tue Apr 9 16:08:05 2024 -0700

    x86/bugs: Fix return type of spectre_bhi_state()
    
    commit 04f4230e2f86a4e961ea5466eda3db8c1762004d upstream.
    
    The definition of spectre_bhi_state() incorrectly returns a const char
    * const. This causes a compiler warning when building with W=1:
    
     warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
     2812 | static const char * const spectre_bhi_state(void)
    
    Remove the const qualifier from the pointer.
    
    Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
    Reported-by: Sean Christopherson <[email protected]>
    Signed-off-by: Daniel Sneddon <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
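
A minimal sketch of the corrected declaration style follows. The function name example_state() and the returned strings are hypothetical and chosen only for illustration.

```c
/*
 * The pointed-to characters are const. The returned pointer itself is
 * not qualified: a function returns its value by copy, so a top-level
 * const on the return type is ignored and only triggers
 * -Wignored-qualifiers.
 */
static const char *example_state(int mitigated)
{
	return mitigated ? "mitigated" : "vulnerable";
}
```

Callers can still store the result in a const char * const variable if they want the pointer itself to be immutable.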

x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto [+ + +]

Author: Josh Poimboeuf <[email protected]>
Date:   Wed Apr 10 22:40:50 2024 -0700

    x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
    
    commit 36d4fe147c870f6d3f6602befd7ef44393a1c87a upstream.
    
    Unlike most other mitigations' "auto" options, spectre_bhi=auto only
    mitigates newer systems, which is confusing and not particularly useful.
    
    Remove it.
    
    Signed-off-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Reviewed-by: Nikolay Borisov <[email protected]>
    Cc: Sean Christopherson <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr' [+ + +]

Author: Ingo Molnar <[email protected]>
Date:   Thu Apr 11 09:25:36 2024 +0200

    x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'
    
    commit d0485730d2189ffe5d986d4e9e191f1e4d5ffd24 upstream.
    
    So we are using the 'ia32_cap' value in a number of places,
    which got its name from the MSR_IA32_ARCH_CAPABILITIES MSR.
    
    But there's very little 'IA32' about it - this isn't 32-bit only
    code, nor does it originate from there, it's just a historic
    quirk that many Intel MSR names are prefixed with IA32_.
    
    This is already clear from the helper method around the MSR:
    x86_read_arch_cap_msr(), which doesn't have the IA32 prefix.
    
    So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with
    its role and with the naming of the helper function.
    
    Signed-off-by: Ingo Molnar <[email protected]>
    Cc: Josh Poimboeuf <[email protected]>
    Cc: Nikolay Borisov <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Sean Christopherson <[email protected]>
    Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI [+ + +]

Author: Josh Poimboeuf <[email protected]>
Date:   Wed Apr 10 22:40:51 2024 -0700

    x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI
    
    commit 4f511739c54b549061993b53fc0380f48dfca23b upstream.
    
    For consistency with the other CONFIG_MITIGATION_* options, replace the
    CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
    CONFIG_MITIGATION_SPECTRE_BHI option.
    
    [ mingo: Fix ]
    
    Signed-off-by: Josh Poimboeuf <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Cc: Sean Christopherson <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Nikolay Borisov <[email protected]>
    Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Tue Apr 9 10:51:05 2024 -0700

    x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
    
    commit f337a6a21e2fd67eadea471e93d05dd37baaa9be upstream.
    
    Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built
    with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly
    states that disabling SPECULATION_MITIGATIONS is supposed to turn off all
    mitigations by default.
    
      │ If you say N, all mitigations will be disabled. You really
      │ should know what you are doing to say so.
    
    As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in
    some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n.
    
    Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs")
    Signed-off-by: Sean Christopherson <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Reviewed-by: Daniel Sneddon <[email protected]>
    Cc: [email protected]
    Cc: Linus Torvalds <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
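
The idea of deriving the default from the build configuration can be sketched in userspace as follows. An #ifdef stands in for the kernel's Kconfig machinery here, and DEFAULT_MITIGATIONS and mitigations_off() are names assumed for this sketch.

```c
enum cpu_mitigations {
	CPU_MITIGATIONS_OFF,
	CPU_MITIGATIONS_AUTO,
	CPU_MITIGATIONS_AUTO_NOSMT,
};

/* Pick the default at build time instead of always starting at AUTO. */
#ifdef CONFIG_SPECULATION_MITIGATIONS
#define DEFAULT_MITIGATIONS CPU_MITIGATIONS_AUTO
#else
#define DEFAULT_MITIGATIONS CPU_MITIGATIONS_OFF
#endif

static enum cpu_mitigations cpu_mitigations = DEFAULT_MITIGATIONS;

/* Returns 1 when mitigations are off by default or by later override. */
static int mitigations_off(void)
{
	return cpu_mitigations == CPU_MITIGATIONS_OFF;
}
```

Built without CONFIG_SPECULATION_MITIGATIONS defined, the sketch starts in the OFF state, matching what the help text promises.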

xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Thu Apr 4 20:27:38 2024 +0000

    xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
    
    [ Upstream commit 237f3cf13b20db183d3706d997eedc3c49eacd44 ]
    
    syzbot reported an illegal copy in xsk_setsockopt() [1]
    
    Make sure to validate setsockopt() @optlen parameter.
    
    [1]
    
     BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
     BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
     BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
    Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
    
    CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
      print_address_description mm/kasan/report.c:377 [inline]
      print_report+0x169/0x550 mm/kasan/report.c:488
      kasan_report+0x143/0x180 mm/kasan/report.c:601
      copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
      copy_from_sockptr include/linux/sockptr.h:55 [inline]
      xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
      do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
      __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
      __do_sys_setsockopt net/socket.c:2343 [inline]
      __se_sys_setsockopt net/socket.c:2340 [inline]
      __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    RIP: 0033:0x7fb40587de69
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69
    RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006
    RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000
    R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000
    R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08
     </TASK>
    
    Allocated by task 7549:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
      __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
      kasan_kmalloc include/linux/kasan.h:211 [inline]
      __do_kmalloc_node mm/slub.c:3966 [inline]
      __kmalloc+0x233/0x4a0 mm/slub.c:3979
      kmalloc include/linux/slab.h:632 [inline]
      __cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869
      do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
      __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
      __do_sys_setsockopt net/socket.c:2343 [inline]
      __se_sys_setsockopt net/socket.c:2340 [inline]
      __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    The buggy address belongs to the object at ffff888028c6cde0
     which belongs to the cache kmalloc-8 of size 8
    The buggy address is located 1 bytes to the right of
     allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
    
    The buggy address belongs to the physical page:
    page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c
    anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff)
    page_type: 0xffffffff()
    raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001
    raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    page_owner tracks the page as allocated
    page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223
      set_page_owner include/linux/page_owner.h:31 [inline]
      post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
      prep_new_page mm/page_alloc.c:1540 [inline]
      get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
      __alloc_pages+0x256/0x680 mm/page_alloc.c:4569
      __alloc_pages_node include/linux/gfp.h:238 [inline]
      alloc_pages_node include/linux/gfp.h:261 [inline]
      alloc_slab_page+0x5f/0x160 mm/slub.c:2175
      allocate_slab mm/slub.c:2338 [inline]
      new_slab+0x84/0x2f0 mm/slub.c:2391
      ___slab_alloc+0xc73/0x1260 mm/slub.c:3525
      __slab_alloc mm/slub.c:3610 [inline]
      __slab_alloc_node mm/slub.c:3663 [inline]
      slab_alloc_node mm/slub.c:3835 [inline]
      __do_kmalloc_node mm/slub.c:3965 [inline]
      __kmalloc_node+0x2db/0x4e0 mm/slub.c:3973
      kmalloc_node include/linux/slab.h:648 [inline]
      __vmalloc_area_node mm/vmalloc.c:3197 [inline]
      __vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392
      __vmalloc_node mm/vmalloc.c:3457 [inline]
      vzalloc+0x79/0x90 mm/vmalloc.c:3530
      bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162
      bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895
      __sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631
      __do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
      __se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
      __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    page last free pid 6650 tgid 6647 stack trace:
      reset_page_owner include/linux/page_owner.h:24 [inline]
      free_pages_prepare mm/page_alloc.c:1140 [inline]
      free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346
      free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532
      release_pages+0x2117/0x2400 mm/swap.c:1042
      tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
      tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
      tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300
      tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392
      exit_mmap+0x4b6/0xd40 mm/mmap.c:3300
      __mmput+0x115/0x3c0 kernel/fork.c:1345
      exit_mm+0x220/0x310 kernel/exit.c:569
      do_exit+0x99e/0x27e0 kernel/exit.c:865
      do_group_exit+0x207/0x2c0 kernel/exit.c:1027
      get_signal+0x176e/0x1850 kernel/signal.c:2907
      arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310
      exit_to_user_mode_loop kernel/entry/common.c:105 [inline]
      exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
      __syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline]
      syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212
      do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Memory state around the buggy address:
     ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
     ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
    >ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
                                                           ^
     ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
     ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
    
    Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap")
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Cc: "Björn Töpel" <[email protected]>
    Cc: Magnus Karlsson <[email protected]>
    Cc: Maciej Fijalkowski <[email protected]>
    Cc: Jonathan Lemon <[email protected]>
    Acked-by: Daniel Borkmann <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
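
The KASAN report above shows a 4-byte read from a 2-byte buffer. A minimal userspace sketch of the kind of length check that prevents this is shown below; copy_int_option() is a hypothetical helper, not the code in net/xdp/xsk.c.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an int-sized option value only if the caller supplied at least
 * that many bytes. Returns 0 on success or -EINVAL if the buffer is
 * missing or too short, leaving *dst untouched in that case.
 */
static int copy_int_option(int *dst, const void *optval, size_t optlen)
{
	if (optval == NULL || optlen < sizeof(*dst))
		return -EINVAL;

	memcpy(dst, optval, sizeof(*dst));
	return 0;
}
```

Checking the length before the copy turns the out-of-bounds read into an ordinary error return for the caller.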

List of changes in Linux 6.1.87