Author: Pedro Falcato <[email protected]> Date: Wed Aug 7 10:47:25 2024 +0100 9p: Avoid creating multiple slab caches with the same name [ Upstream commit 79efebae4afc2221fa814c3cae001bede66ab259 ] In the spirit of [1], avoid creating multiple slab caches with the same name. Instead, add the dev_name into the mix. [1]: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Pedro Falcato <[email protected]> Reported-by: [email protected] Message-ID: <[email protected]> Signed-off-by: Dominique Martinet <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Linus Torvalds <[email protected]> Date: Mon Oct 21 11:57:38 2024 -0700 9p: fix slab cache name creation for real commit a360f311f57a36e96d88fa8086b749159714dcd2 upstream. This was attempted by using the dev_name in the slab cache name, but as Omar Sandoval pointed out, that can be an arbitrary string, eg something like "/dev/root". Which in turn trips verify_dirent_name(), which fails if a filename contains a slash. So just make it use a sequence counter, and make it an atomic_t to avoid any possible races or locking issues. Reported-and-tested-by: Omar Sandoval <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Fixes: 79efebae4afc ("9p: Avoid creating multiple slab caches with the same name") Acked-by: Vlastimil Babka <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Thorsten Leemhuis <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Dominique Martinet <[email protected]> Date: Thu May 23 20:31:38 2024 +0900 9p: v9fs_fid_find: also lookup by inode if not found dentry [ Upstream commit 38d222b3163f7b7d737e5d999ffc890a12870e36 ] It's possible for v9fs_fid_find "find by dentry" branch to not turn up anything despite having an entry set (because e.g. uid doesn't match), in which case the calling code will generally make an extra lookup to the server. In this case we might have had better luck looking by inode, so fall back to look up by inode if we have one and the lookup by dentry failed. Message-Id: <[email protected]> Reviewed-by: Christian Schoenebeck <[email protected]> Signed-off-by: Dominique Martinet <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christian Heusel <[email protected]> Date: Thu Oct 10 15:32:11 2024 +0200 ASoC: amd: yc: Add quirk for ASUS Vivobook S15 M3502RA [ Upstream commit 182fff3a2aafe4e7f3717a0be9df2fe2ed1a77de ] As reported the builtin microphone doesn't work on the ASUS Vivobook model S15 OLED M3502RA. Therefore add a quirk for it to make it work. Link: https://bugzilla.kernel.org/show_bug.cgi?id=219345 Signed-off-by: Christian Heusel <[email protected]> Link: https://patch.msgid.link/20241010-bugzilla-219345-asus-vivobook-v1-1-3bb24834e2c3@heusel.eu Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ilya Dudikov <[email protected]> Date: Wed Oct 16 10:40:37 2024 +0700 ASoC: amd: yc: Fix non-functional mic on ASUS E1404FA [ Upstream commit b0867999e3282378a0b26a7ad200233044d31eca ] ASUS Vivobook E1404FA needs a quirks-table entry for the internal microphone to function properly. Signed-off-by: Ilya Dudikov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Shengjiu Wang <[email protected]> Date: Mon Oct 14 13:38:33 2024 +0800 ASoC: fsl_micfil: Add sample rate constraint [ Upstream commit b9a8ecf81066e01e8a3de35517481bc5aa0439e5 ] On some platforms, for example i.MX93, there is only one audio PLL source, so some sample rate can't be supported. If the PLL source is used for 8kHz series rates, then 11kHz series rates can't be supported. So add constraints according to the frequency of available clock sources, then alsa-lib will help to convert the unsupported rate for the driver. Signed-off-by: Shengjiu Wang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jack Yu <[email protected]> Date: Mon Oct 21 06:15:44 2024 +0000 ASoC: rt722-sdca: increase clk_stop_timeout to fix clock stop issue [ Upstream commit 038fa6ddf5d22694f61ff7a7a53c8887c6b08c45 ] clk_stop_timeout should be increased to 900ms to fix clock stop issue. Signed-off-by: Jack Yu <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: SurajSonawane2415 <[email protected]> Date: Mon Oct 7 16:44:16 2024 +0530 block: Fix elevator_get_default() checking for NULL q->tag_set [ Upstream commit b402328a24ee7193a8ab84277c0c90ae16768126 ] elevator_get_default() and elv_support_iosched() both check for whether or not q->tag_set is non-NULL, however it's not possible for them to be NULL. This messes up some static checkers, as the checking of tag_set isn't consistent. Remove the checks, which both simplifies the logic and avoids checker errors. Signed-off-by: SurajSonawane2415 <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Zijian Zhang <[email protected]> Date: Wed Nov 6 00:37:42 2024 +0000 bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx [ Upstream commit 44d0469f79bd3d0b3433732877358df7dc6b17b1 ] As the introduction of the support for vsock and unix sockets in sockmap, tls_sw_has_ctx_tx/rx cannot presume the socket passed in must be IS_ICSK. vsock and af_unix sockets have vsock_sock and unix_sock instead of inet_connection_sock. For these sockets, tls_get_ctx may return an invalid pointer and cause page fault in function tls_sw_ctx_rx. BUG: unable to handle page fault for address: 0000000000040030 Workqueue: vsock-loopback vsock_loopback_work RIP: 0010:sk_psock_strp_data_ready+0x23/0x60 Call Trace: ? __die+0x81/0xc3 ? no_context+0x194/0x350 ? do_page_fault+0x30/0x110 ? async_page_fault+0x3e/0x50 ? sk_psock_strp_data_ready+0x23/0x60 virtio_transport_recv_pkt+0x750/0x800 ? update_load_avg+0x7e/0x620 vsock_loopback_work+0xd0/0x100 process_one_work+0x1a7/0x360 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x112/0x130 ? __kthread_cancel_work+0x40/0x40 ret_from_fork+0x1f/0x40 v2: - Add IS_ICSK check v3: - Update the commits in Fixes Fixes: 634f1a7110b4 ("vsock: support sockmap") Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Signed-off-by: Zijian Zhang <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Reviewed-by: Cong Wang <[email protected]> Acked-by: Stefano Garzarella <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hou Tao <[email protected]> Date: Thu Oct 24 09:35:58 2024 +0800 bpf: Check validity of link->type in bpf_link_show_fdinfo() [ Upstream commit 8421d4c8762bd022cb491f2f0f7019ef51b4f0a7 ] If a newly-added link type doesn't invoke BPF_LINK_TYPE(), accessing bpf_link_type_strs[link->type] may result in an out-of-bounds access. To spot such missed invocations early in the future, checking the validity of link->type in bpf_link_show_fdinfo() and emitting a warning when such invocations are missed. Signed-off-by: Hou Tao <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Jiawei Ye <[email protected]> Date: Fri Nov 8 08:18:52 2024 +0000 bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6 [ Upstream commit fb86c42a2a5d44e849ddfbc98b8d2f4f40d36ee3 ] In the bpf_out_neigh_v6 function, rcu_read_lock() is used to begin an RCU read-side critical section. However, when unlocking, one branch incorrectly uses a different RCU unlock flavour rcu_read_unlock_bh() instead of rcu_read_unlock(). This mismatch in RCU locking flavours can lead to unexpected behavior and potential concurrency issues. This possible bug was identified using a static analysis tool developed by myself, specifically designed to detect RCU-related issues. This patch corrects the mismatched unlock flavour by replacing the incorrect rcu_read_unlock_bh() with the appropriate rcu_read_unlock(), ensuring that the RCU critical section is properly exited. This change prevents potential synchronization issues and aligns with proper RCU usage patterns. Fixes: 09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh") Signed-off-by: Jiawei Ye <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Rik van Riel <[email protected]> Date: Tue Oct 8 17:07:35 2024 -0400 bpf: use kvzmalloc to allocate BPF verifier environment [ Upstream commit 434247637c66e1be2bc71a9987d4c3f0d8672387 ] The kzmalloc call in bpf_check can fail when memory is very fragmented, which in turn can lead to an OOM kill. Use kvzmalloc to fall back to vmalloc when memory is too fragmented to allocate an order 3 sized bpf verifier environment. Admittedly this is not a very common case, and only happens on systems where memory has already been squeezed close to the limit, but this does not seem like much of a hot path, and it's a simple enough fix. Signed-off-by: Rik van Riel <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Herbert Xu <[email protected]> Date: Sun Oct 6 09:18:37 2024 +0800 crypto: api - Fix liveliness check in crypto_alg_tested [ Upstream commit b81e286ba154a4e0f01a94d99179a97f4ba3e396 ] As algorithm testing is carried out without holding the main crypto lock, it is always possible for the algorithm to go away during the test. So before crypto_alg_tested updates the status of the tested alg, it checks whether it's still on the list of all algorithms. This is inaccurate because it may be off the main list but still on the list of algorithms to be removed. Updating the algorithm status is safe per se as the larval still holds a reference to it. However, killing spawns of other algorithms that are of lower priority is clearly a deficiency as it adds unnecessary churn. Fix the test by checking whether the algorithm is dead. Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Herbert Xu <[email protected]> Date: Wed Oct 9 16:38:48 2024 +0800 crypto: marvell/cesa - Disable hash algorithms [ Upstream commit e845d2399a00f866f287e0cefbd4fc7d8ef0d2f7 ] Disable cesa hash algorithms by lowering the priority because they appear to be broken when invoked in parallel. This allows them to still be tested for debugging purposes. Reported-by: Klaus Kudielka <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Philip Yang <[email protected]> Date: Fri Oct 4 16:28:07 2024 -0400 drm/amdkfd: Accounting pdd vram_usage for svm [ Upstream commit 68d26c10ef503175df3142db6fcd75dd94860592 ] Process device data pdd->vram_usage is read by rocm-smi via sysfs, this is currently missing the svm_bo usage accounting, so "rocm-smi --showpids" per process VRAM usage report is incorrect. Add pdd->vram_usage accounting when svm_bo allocation and release, change to atomic64_t type because it is updated outside process mutex now. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 98c0b0efcc11f2a5ddf3ce33af1e48eedf808b04) Signed-off-by: Sasha Levin <[email protected]>
Author: Ian Forbes <[email protected]> Date: Thu Aug 8 15:06:34 2024 -0500 drm/vmwgfx: Limit display layout ioctl array size to VMWGFX_NUM_DISPLAY_UNITS [ Upstream commit 28a5dfd4f615539fb22fb6d5c219c199c14e6eb6 ] Currently the array size is only limited by the largest kmalloc size which is incorrect. This change will also return a more specific error message than ENOMEM to userspace. Signed-off-by: Ian Forbes <[email protected]> Reviewed-by: Zack Rusin <[email protected]> Reviewed-by: Martin Krastev <[email protected]> Signed-off-by: Zack Rusin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Alessandro Zanni <[email protected]> Date: Thu Oct 17 14:05:51 2024 +0200 fs: Fix uninitialized value issue in from_kuid and from_kgid [ Upstream commit 15f34347481648a567db67fb473c23befb796af5 ] ocfs2_setattr() uses attr->ia_mode, attr->ia_uid and attr->ia_gid in a trace point even though ATTR_MODE, ATTR_UID and ATTR_GID aren't set. Initialize all fields of newattrs to avoid uninitialized variables, by checking if ATTR_MODE, ATTR_UID, ATTR_GID are initialized, otherwise 0. Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=6c55f725d1bdc8c52058 Signed-off-by: Alessandro Zanni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hans de Goede <[email protected]> Date: Thu Oct 10 11:45:12 2024 +0200 HID: lenovo: Add support for Thinkpad X1 Tablet Gen 3 keyboard [ Upstream commit 51268879eb2bfc563a91cdce69362d9dbf707e7e ] The Thinkpad X1 Tablet Gen 3 keyboard has the same Lenovo specific quirks as the original Thinkpad X1 Tablet keyboard. Add the PID for the "Thinkpad X1 Tablet Gen 3 keyboard" to the hid-lenovo driver to fix the FnLock, Mute and media buttons not working. Suggested-by: Izhar Firdaus <[email protected]> Closes https://bugzilla.redhat.com/show_bug.cgi?id=2315395 Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: WangYuli <[email protected]> Date: Mon Oct 7 12:08:03 2024 +0800 HID: multitouch: Add quirk for HONOR MagicBook Art 14 touchpad [ Upstream commit 7a5ab8071114344f62a8b1e64ed3452a77257d76 ] The behavior of HONOR MagicBook Art 14 touchpad is not consistent after reboots, as sometimes it reports itself as a touchpad, and sometimes as a mouse. Similarly to GLO-GXXX it is possible to call MT_QUIRK_FORCE_GET_FEATURE as a workaround to force set feature in mt_set_input_mode() for such special touchpad device. [[email protected]: reword changelog a little bit] Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/1040 Signed-off-by: Wentao Guan <[email protected]> Signed-off-by: WangYuli <[email protected]> Reviewed-by: Benjamin Tissoires <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Kenneth Albanowski <[email protected]> Date: Fri Oct 4 10:24:29 2024 -0700 HID: multitouch: Add quirk for Logitech Bolt receiver w/ Casa touchpad [ Upstream commit 526748b925185e95f1415900ee13c2469d4b64cc ] The Logitech Casa Touchpad does not reliably send touch release signals when communicating through the Logitech Bolt wireless-to-USB receiver. Adjusting the device class to add MT_QUIRK_NOT_SEEN_MEANS_UP to make sure that no touches become stuck, MT_QUIRK_FORCE_MULTI_INPUT is not needed, but harmless. Linux does not have information on which devices are connected to the Bolt receiver, so we have to enable this for the entire device. Signed-off-by: Kenneth Albanowski <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Stefan Blum <[email protected]> Date: Sun Oct 6 10:12:23 2024 +0200 HID: multitouch: Add support for B2402FVA track point [ Upstream commit 1a5cbb526ec4b885177d06a8bc04f38da7dbb1d9 ] By default the track point does not work on the Asus Expertbook B2402FVA. From libinput record i got the ID of the track point device: evdev: # Name: ASUE1201:00 04F3:32AE # ID: bus 0x18 vendor 0x4f3 product 0x32ae version 0x100 I found that the track point is functional, when i set the MT_CLS_WIN_8_FORCE_MULTI_INPUT_NSMU class for the reported device. Signed-off-by: Stefan Blum <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hagar Hemdan <[email protected]> Date: Tue Jun 4 13:05:27 2024 +0000 io_uring: fix possible deadlock in io_register_iowq_max_workers() commit 73254a297c2dd094abec7c9efee32455ae875bdf upstream. The io_register_iowq_max_workers() function calls io_put_sq_data(), which acquires the sqd->lock without releasing the uring_lock. Similar to the commit 009ad9f0c6ee ("io_uring: drop ctx->uring_lock before acquiring sqd->lock"), this can lead to a potential deadlock situation. To resolve this issue, the uring_lock is released before calling io_put_sq_data(), and then it is re-acquired after the function call. This change ensures that the locks are acquired in the correct order, preventing the possibility of a deadlock. Suggested-by: Maximilian Heyne <[email protected]> Signed-off-by: Hagar Hemdan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Robin Murphy <[email protected]> Date: Tue Oct 8 16:21:17 2024 +0100 iommu/arm-smmu: Clarify MMU-500 CPRE workaround [ Upstream commit 0dfe314cdd0d378f96bb9c6bdc05c8120f48606d ] CPRE workarounds are implicated in at least 5 MMU-500 errata, some of which remain unfixed. The comment and warning message have proven to be unhelpfully misleading about this scope, so reword them to get the point across with less risk of going out of date or confusing users. Signed-off-by: Robin Murphy <[email protected]> Link: https://lore.kernel.org/r/dfa82171b5248ad7cf1f25592101a6eec36b8c9a.1728400877.git.robin.murphy@arm.com Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sergey Matsievskiy <[email protected]> Date: Wed Sep 25 21:44:15 2024 +0300 irqchip/ocelot: Fix trigger register address [ Upstream commit 9e9c4666abb5bb444dac37e2d7eb5250c8d52a45 ] Controllers, supported by this driver, have two sets of registers: * (main) interrupt registers control peripheral interrupt sources. * device interrupt registers configure per-device (network interface) interrupts and act as an extra stage before the main interrupt registers. In the driver unmask code, device trigger registers are used in the mask calculation of the main interrupt sticky register, mixing two kinds of registers. Use the main interrupt trigger register instead. Signed-off-by: Sergey Matsievskiy <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Greg Kroah-Hartman <[email protected]> Date: Sun Nov 17 15:08:59 2024 +0100 Linux 6.6.62 Link: https://lore.kernel.org/r/[email protected] Tested-by: Takeshi Ogasawara <[email protected]> Tested-by: Peter Schneider <[email protected]> Tested-by: Harshit Mogalapalli <[email protected]> Tested-by: Jon Hunter <[email protected]> Tested-by: SeongJae Park <[email protected]> Tested-by: Florian Fainelli <[email protected]> Tested-by: Mark Brown <[email protected]> Tested-by: Ron Economos <[email protected]> Tested-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Hardik Garg <[email protected]> Tested-by: Shuah Khan <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yanteng Si <[email protected]> Date: Mon Oct 21 22:11:18 2024 +0800 LoongArch: Use "Exception return address" to comment ERA [ Upstream commit b69269c870ece1bc7d2e3e39ca76f4602f2cb0dd ] The information contained in the comment for LOONGARCH_CSR_ERA is even less informative than the macro itself, which can cause confusion for junior developers. Let's use the full English term. Signed-off-by: Yanteng Si <[email protected]> Signed-off-by: Huacai Chen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ryan Roberts <[email protected]> Date: Fri Dec 1 16:10:45 2023 +0000 mm/readahead: do not allow order-1 folio commit ec056cef76a525706601b32048f174f9bea72c7c upstream. The THP machinery does not support order-1 folios because it requires meta data spanning the first 3 `struct page`s. So order-2 is the smallest large folio that we can safely create. There was a theoretical bug whereby if ra->size was 2 or 3 pages (due to the device-specific bdi->ra_pages being set that way), we could end up with order = 1. Fix this by unconditionally checking if the preferred order is 1 and if so, set it to 0. Previously this was done in a few specific places, but with this refactoring it is done just once, unconditionally, at the end of the calculation. This is a theoretical bug found during review of the code; I have no evidence to suggest this manifests in the real world (I expect all device-specific ra_pages values are much bigger than 3). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ryan Roberts <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Hugh Dickins <[email protected]> Date: Sun Oct 27 13:02:13 2024 -0700 mm/thp: fix deferred split unqueue naming and locking commit f8f931bba0f92052cf842b7e30917b1afcc77d5a upstream. Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. Before fixing locking: rename misleading folio_undo_large_rmappable(), which does not undo large_rmappable, to folio_unqueue_deferred_split(), which is what it does. But that and its out-of-line __callee are mm internals of very limited usability: add comment and WARN_ON_ONCEs to check usage; and return a bool to say if a deferred split was unqueued, which can then be used in WARN_ON_ONCEs around safety checks (sparing callers the arcane conditionals in __folio_unqueue_deferred_split()). Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all of whose callers now call it beforehand (and if any forget then bad_page() will tell) - except for its caller put_pages_list(), which itself no longer has any callers (and will be deleted separately). Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0 without checking and unqueueing a THP folio from deferred split list; which is unfortunate, since the split_queue_lock depends on the memcg (when memcg is enabled); so swapout has been unqueueing such THPs later, when freeing the folio, using the pgdat's lock instead: potentially corrupting the memcg's list. __remove_mapping() has frozen refcount to 0 here, so no problem with calling folio_unqueue_deferred_split() before resetting memcg_data. That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware"): which included a check on swapcache before adding to deferred queue, but no check on deferred queue before adding THP to swapcache. That worked fine with the usual sequence of events in reclaim (though there were a couple of rare ways in which a THP on deferred queue could have been swapped out), but 6.12 commit dafff3f4c850 ("mm: split underused THPs") avoids splitting underused THPs in reclaim, which makes swapcache THPs on deferred queue commonplace. Keep the check on swapcache before adding to deferred queue? Yes: it is no longer essential, but preserves the existing behaviour, and is likely to be a worthwhile optimization (vmstat showed much more traffic on the queue under swapping load if the check was removed); update its comment. Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing folio->memcg_data without checking and unqueueing a THP folio from the deferred list, sometimes corrupting "from" memcg's list, like swapout. Refcount is non-zero here, so folio_unqueue_deferred_split() can only be used in a WARN_ON_ONCE to validate the fix, which must be done earlier: mem_cgroup_move_charge_pte_range() first try to split the THP (splitting of course unqueues), or skip it if that fails. Not ideal, but moving charge has been requested, and khugepaged should repair the THP later: nobody wants new custom unqueueing code just for this deprecated case. The 87eaceb3faa5 commit did have the code to move from one deferred list to another (but was not conscious of its unsafety while refcount non-0); but that was removed by 5.6 commit fac0516b5534 ("mm: thp: don't need care deferred split queue in memcg charge move path"), which argued that the existence of a PMD mapping guarantees that the THP cannot be on a deferred list. As above, false in rare cases, and now commonly false. Backport to 6.11 should be straightforward. Earlier backports must take care that other _deferred_list fixes and dependencies are included. There is not a strong case for backports, but they can fix cornercases. Link: https://lkml.kernel.org/r/[email protected] Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") Fixes: dafff3f4c850 ("mm: split underused THPs") Signed-off-by: Hugh Dickins <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Barry Song <[email protected]> Cc: Chris Li <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Usama Arif <[email protected]> Cc: Wei Yang <[email protected]> Cc: Zi Yan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> [ Upstream commit itself does not apply cleanly, because there are fewer calls to folio_undo_large_rmappable() in this tree (in particular, folio migration does not migrate memcg charge), and mm/memcontrol-v1.c has not been split out of mm/memcontrol.c. ] Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Hugh Dickins <[email protected]> Date: Tue Oct 3 02:25:33 2023 -0700 mm: add page_rmappable_folio() wrapper commit 23e4883248f0472d806c8b3422ba6257e67bf1a5 upstream. folio_prep_large_rmappable() is being used repeatedly along with a conversion from page to folio, a check non-NULL, a check order > 1: wrap it all up into struct folio *page_rmappable_folio(struct page *). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tejun heo <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Matthew Wilcox (Oracle) <[email protected]> Date: Thu Mar 21 14:24:39 2024 +0000 mm: always initialise folio->_deferred_list commit b7b098cf00a2b65d5654a86dc8edf82f125289c1 upstream. Patch series "Various significant MM patches". These patches all interact in annoying ways which make it tricky to send them out in any way other than a big batch, even though there's not really an overarching theme to connect them. The big effects of this patch series are: - folio_test_hugetlb() becomes reliable, even when called without a page reference - We free up PG_slab, and we could always use more page flags - We no longer need to check PageSlab before calling page_mapcount() This patch (of 9): For compound pages which are at least order-2 (and hence have a deferred_list), initialise it and then we can check at free that the page is not part of a deferred list. We recently found this useful to rule out a source of corruption. [[email protected]: always initialise folio->_deferred_list] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Peter Xu <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Muchun Song <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> [ Include three small changes from the upstream commit, for backport safety: replace list_del() by list_del_init() in split_huge_page_to_list(), like c010d47f107f ("mm: thp: split huge page to any lower order pages"); replace list_del() by list_del_init() in folio_undo_large_rmappable(), like 9bcef5973e31 ("mm: memcg: fix split queue list crash when large folio migration"); keep __free_pages() instead of folio_put() in __update_and_free_hugetlb_folio(). ] Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Qun-Wei Lin <[email protected]> Date: Fri Oct 25 16:58:11 2024 +0800 mm: krealloc: Fix MTE false alarm in __do_krealloc commit 704573851b51808b45dae2d62059d1d8189138a2 upstream. This patch addresses an issue introduced by commit 1a83a716ec233 ("mm: krealloc: consider spare memory for __GFP_ZERO") which causes MTE (Memory Tagging Extension) to falsely report a slab-out-of-bounds error. The problem occurs when zeroing out spare memory in __do_krealloc. The original code only considered software-based KASAN and did not account for MTE. It does not reset the KASAN tag before calling memset, leading to a mismatch between the pointer tag and the memory tag, resulting in a false positive. Example of the error: ================================================================== swapper/0: BUG: KASAN: slab-out-of-bounds in __memset+0x84/0x188 swapper/0: Write at addr f4ffff8005f0fdf0 by task swapper/0/1 swapper/0: Pointer tag: [f4], memory tag: [fe] swapper/0: swapper/0: CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12. swapper/0: Hardware name: MT6991(ENG) (DT) swapper/0: Call trace: swapper/0: dump_backtrace+0xfc/0x17c swapper/0: show_stack+0x18/0x28 swapper/0: dump_stack_lvl+0x40/0xa0 swapper/0: print_report+0x1b8/0x71c swapper/0: kasan_report+0xec/0x14c swapper/0: __do_kernel_fault+0x60/0x29c swapper/0: do_bad_area+0x30/0xdc swapper/0: do_tag_check_fault+0x20/0x34 swapper/0: do_mem_abort+0x58/0x104 swapper/0: el1_abort+0x3c/0x5c swapper/0: el1h_64_sync_handler+0x80/0xcc swapper/0: el1h_64_sync+0x68/0x6c swapper/0: __memset+0x84/0x188 swapper/0: btf_populate_kfunc_set+0x280/0x3d8 swapper/0: __register_btf_kfunc_id_set+0x43c/0x468 swapper/0: register_btf_kfunc_id_set+0x48/0x60 swapper/0: register_nf_nat_bpf+0x1c/0x40 swapper/0: nf_nat_init+0xc0/0x128 swapper/0: do_one_initcall+0x184/0x464 swapper/0: do_initcall_level+0xdc/0x1b0 swapper/0: do_initcalls+0x70/0xc0 swapper/0: do_basic_setup+0x1c/0x28 swapper/0: kernel_init_freeable+0x144/0x1b8 swapper/0: kernel_init+0x20/0x1a8 swapper/0: ret_from_fork+0x10/0x20 ================================================================== Fixes: 1a83a716ec233 ("mm: krealloc: consider spare memory for __GFP_ZERO") Signed-off-by: Qun-Wei Lin <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Kefeng Wang <[email protected]> Date: Tue May 21 21:03:15 2024 +0800 mm: refactor folio_undo_large_rmappable() commit 593a10dabe08dcf93259fce2badd8dc2528859a8 upstream. Folios of order <= 1 are not in deferred list, the check of order is added into folio_undo_large_rmappable() from commit 8897277acfef ("mm: support order-1 folios in the page cache"), but there is a repeated check for small folio (order 0) during each call of the folio_undo_large_rmappable(), so only keep folio_order() check inside the function. In addition, move all the checks into header file to save a function call for non-large-rmappable or empty deferred_list folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Vishal Moola (Oracle) <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Lance Yang <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> [ Upstream commit itself does not apply cleanly, because there are fewer calls to folio_undo_large_rmappable() in this tree. ] Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Matthew Wilcox (Oracle) <[email protected]> Date: Mon Feb 26 15:55:28 2024 -0500 mm: support order-1 folios in the page cache commit 8897277acfef7f70fdecc054073bea2542fc7a1b upstream. Folios of order 1 have no space to store the deferred list. This is not a problem for the page cache as file-backed folios are never placed on the deferred list. All we need to do is prevent the core MM from touching the deferred list for order 1 folios and remove the code which prevented us from allocating order 1 folios. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Zi Yan <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Michal Koutny <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Linus Walleij <[email protected]> Date: Sat Oct 12 22:35:23 2024 +0200 net: phy: mdio-bcm-unimac: Add BCM6846 support [ Upstream commit 906b77ca91c7e9833b4e47bedb6bec76be71d497 ] Add Unimac mdio compatible string for the special BCM6846 variant. This variant has a few extra registers compared to other versions. Suggested-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/linux-devicetree/[email protected]/ Signed-off-by: Linus Walleij <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Reinhard Speyerer <[email protected]> Date: Fri Oct 18 22:52:55 2024 +0200 net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition [ Upstream commit 64761c980cbf71fb7a532a8c7299907ea972a88c ] Add Fibocom FG132 0x0112 composition: T: Bus=03 Lev=02 Prnt=06 Port=01 Cnt=02 Dev#= 10 Spd=12 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2cb7 ProdID=0112 Rev= 5.15 S: Manufacturer=Fibocom Wireless Inc. S: Product=Fibocom Module S: SerialNumber=xxxxxxxx C:* #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms Signed-off-by: Reinhard Speyerer <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Nilay Shroff <[email protected]> Date: Wed Oct 16 08:33:14 2024 +0530 nvme-loop: flush off pending I/O while shutting down loop controller [ Upstream commit c199fac88fe7c749f88a0653e9f621b9f5a71cf1 ] While shutting down loop controller, we first quiesce the admin/IO queue, delete the admin/IO tag-set and then at last destroy the admin/IO queue. However it's quite possible that during the window between quiescing and destroying of the admin/IO queue, some admin/IO request might sneak in and if that happens then we could potentially encounter a hung task because shutdown operation can't forward progress until any pending I/O is flushed off. This commit helps ensure that before destroying the admin/IO queue, we unquiesce the admin/IO queue so that any outstanding requests, which are added after the admin/IO queue is quiesced, are now flushed to its completion. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Nilay Shroff <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Keith Busch <[email protected]> Date: Tue Oct 15 07:30:17 2024 -0700 nvme-multipath: defer partition scanning [ Upstream commit 1f021341eef41e77a633186e9be5223de2ce5d48 ] We need to suppress the partition scan from occuring within the controller's scan_work context. If a path error occurs here, the IO will wait until a path becomes available or all paths are torn down, but that action also occurs within scan_work, so it would deadlock. Defer the partion scan to a different context that does not block scan_work. Reported-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Breno Leitao <[email protected]> Date: Mon Nov 4 04:24:40 2024 -0800 nvme/host: Fix RCU list traversal to use SRCU primitive [ Upstream commit 6d1c69945ce63a9fba22a4abf646cf960d878782 ] The code currently uses list_for_each_entry_rcu() while holding an SRCU lock, triggering false positive warnings with CONFIG_PROVE_RCU=y enabled: drivers/nvme/host/core.c:3770 RCU-list traversed in non-reader section!! While the list is properly protected by SRCU lock, the code uses the wrong list traversal primitive. Replace list_for_each_entry_rcu() with list_for_each_entry_srcu() to correctly indicate SRCU-based protection and eliminate the false warning. Fixes: be647e2c76b2 ("nvme: use srcu for iterating namespace list") Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Greg Joyce <[email protected]> Date: Mon Oct 7 14:33:24 2024 -0500 nvme: disable CC.CRIME (NVME_CC_CRIME) [ Upstream commit 0ce96a6708f34280a536263ee5c67e20c433dcce ] Disable NVME_CC_CRIME so that CSTS.RDY indicates that the media is ready and able to handle commands without returning NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY. Signed-off-by: Greg Joyce <[email protected]> Reviewed-by: Nilay Shroff <[email protected]> Tested-by: Nilay Shroff <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Nilay Shroff <[email protected]> Date: Wed Oct 16 08:33:15 2024 +0530 nvme: make keep-alive synchronous operation [ Upstream commit d06923670b5a5f609603d4a9fee4dec02d38de9c ] The nvme keep-alive operation, which executes at a periodic interval, could potentially sneak in while shutting down a fabric controller. This may lead to a race between the fabric controller admin queue destroy code path (invoked while shutting down controller) and hw/hctx queue dispatcher called from the nvme keep-alive async request queuing operation. This race could lead to the kernel crash shown below: Call Trace: autoremove_wake_function+0x0/0xbc (unreliable) __blk_mq_sched_dispatch_requests+0x114/0x24c blk_mq_sched_dispatch_requests+0x44/0x84 blk_mq_run_hw_queue+0x140/0x220 nvme_keep_alive_work+0xc8/0x19c [nvme_core] process_one_work+0x200/0x4e0 worker_thread+0x340/0x504 kthread+0x138/0x140 start_kernel_thread+0x14/0x18 While shutting down fabric controller, if nvme keep-alive request sneaks in then it would be flushed off. The nvme_keep_alive_end_io function is then invoked to handle the end of the keep-alive operation which decrements the admin->q_usage_counter and assuming this is the last/only request in the admin queue then the admin->q_usage_counter becomes zero. If that happens then blk-mq destroy queue operation (blk_mq_destroy_ queue()) which could be potentially running simultaneously on another cpu (as this is the controller shutdown code path) would forward progress and deletes the admin queue. So, now from this point onward we are not supposed to access the admin queue resources. However the issue here's that the nvme keep-alive thread running hw/hctx queue dispatch operation hasn't yet finished its work and so it could still potentially access the admin queue resource while the admin queue had been already deleted and that causes the above crash. This fix helps avoid the observed crash by implementing keep-alive as a synchronous operation so that we decrement admin->q_usage_counter only after keep-alive command finished its execution and returns the command status back up to its caller (blk_execute_rq()). This would ensure that fabric shutdown code path doesn't destroy the fabric admin queue until keep-alive request finished execution and also keep-alive thread is not running hw/hctx queue dispatch operation. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Nilay Shroff <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hannes Reinecke <[email protected]> Date: Wed Oct 2 13:51:41 2024 +0900 nvme: tcp: avoid race between queue_lock lock and destroy [ Upstream commit 782373ba27660ba7d330208cf5509ece6feb4545 ] Commit 76d54bf20cdc ("nvme-tcp: don't access released socket during error recovery") added a mutex_lock() call for the queue->queue_lock in nvme_tcp_get_address(). However, the mutex_lock() races with mutex_destroy() in nvme_tcp_free_queue(), and causes the WARN below. DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 3 PID: 34077 at kernel/locking/mutex.c:587 __mutex_lock+0xcf0/0x1220 Modules linked in: nvmet_tcp nvmet nvme_tcp nvme_fabrics iw_cm ib_cm ib_core pktcdvd nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr sunrpc ppdev 9pnet_virtio 9pnet pcspkr netfs parport_pc parport e1000 i2c_piix4 i2c_smbus loop fuse nfnetlink zram bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper xfs drm sym53c8xx floppy nvme scsi_transport_spi nvme_core nvme_auth serio_raw ata_generic pata_acpi dm_multipath qemu_fw_cfg [last unloaded: ib_uverbs] CPU: 3 UID: 0 PID: 34077 Comm: udisksd Not tainted 6.11.0-rc7 #319 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:__mutex_lock+0xcf0/0x1220 Code: 08 84 d2 0f 85 c8 04 00 00 8b 15 ef b6 c8 01 85 d2 0f 85 78 f4 ff ff 48 c7 c6 20 93 ee af 48 c7 c7 60 91 ee af e8 f0 a7 6d fd <0f> 0b e9 5e f4 ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 RSP: 0018:ffff88811305f760 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff88812c652058 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001 RBP: ffff88811305f8b0 R08: 0000000000000001 R09: ffffed1075c36341 R10: ffff8883ae1b1a0b R11: 0000000000010498 R12: 0000000000000000 R13: 0000000000000000 R14: dffffc0000000000 R15: ffff88812c652058 FS: 00007f9713ae4980(0000) GS:ffff8883ae180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fcd78483c7c CR3: 0000000122c38000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn.cold+0x5b/0x1af ? __mutex_lock+0xcf0/0x1220 ? report_bug+0x1ec/0x390 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x13/0x40 ? asm_exc_invalid_op+0x16/0x20 ? __mutex_lock+0xcf0/0x1220 ? nvme_tcp_get_address+0xc2/0x1e0 [nvme_tcp] ? __pfx___mutex_lock+0x10/0x10 ? __lock_acquire+0xd6a/0x59e0 ? nvme_tcp_get_address+0xc2/0x1e0 [nvme_tcp] nvme_tcp_get_address+0xc2/0x1e0 [nvme_tcp] ? __pfx_nvme_tcp_get_address+0x10/0x10 [nvme_tcp] nvme_sysfs_show_address+0x81/0xc0 [nvme_core] dev_attr_show+0x42/0x80 ? __asan_memset+0x1f/0x40 sysfs_kf_seq_show+0x1f0/0x370 seq_read_iter+0x2cb/0x1130 ? rw_verify_area+0x3b1/0x590 ? __mutex_lock+0x433/0x1220 vfs_read+0x6a6/0xa20 ? lockdep_hardirqs_on+0x78/0x100 ? __pfx_vfs_read+0x10/0x10 ksys_read+0xf7/0x1d0 ? __pfx_ksys_read+0x10/0x10 ? __x64_sys_openat+0x105/0x1d0 do_syscall_64+0x93/0x180 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 ? __pfx_ksys_read+0x10/0x10 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 ? do_syscall_64+0x9f/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f9713f55cfa Code: 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 e8 74 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 2e 44 89 c7 48 89 45 f8 e8 42 75 f8 ff 48 8b RSP: 002b:00007ffd7f512e70 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 000055c38f316859 RCX: 00007f9713f55cfa RDX: 0000000000000fff RSI: 00007ffd7f512eb0 RDI: 0000000000000011 RBP: 00007ffd7f512e90 R08: 0000000000000000 R09: 00000000ffffffff R10: 0000000000000000 R11: 0000000000000246 R12: 000055c38f317148 R13: 0000000000000000 R14: 00007f96f4004f30 R15: 000055c3b6b623c0 </TASK> The WARN is observed when the blktests test case nvme/014 is repeated with tcp transport. It is rare, and 200 times repeat is required to recreate in some test environments. To avoid the WARN, check the NVME_TCP_Q_LIVE flag before locking queue->queue_lock. The flag is cleared long time before the lock gets destroyed. Signed-off-by: Hannes Reinecke <[email protected]> Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Michael Ellerman <[email protected]> Date: Fri Sep 20 19:35:20 2024 +1000 powerpc/powernv: Free name on error in opal_event_init() [ Upstream commit cf8989d20d64ad702a6210c11a0347ebf3852aa7 ] In opal_event_init() if request_irq() fails name is not freed, leading to a memory leak. The code only runs at boot time, there's no way for a user to trigger it, so there's no security impact. Fix the leak by freeing name in the error path. Reported-by: 2639161967 <[email protected]> Closes: https://lore.kernel.org/linuxppc-dev/[email protected] Signed-off-by: Michael Ellerman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Showrya M N <[email protected]> Date: Mon Oct 7 18:28:36 2024 +0530 RDMA/siw: Add sendpage_ok() check to disable MSG_SPLICE_PAGES [ Upstream commit 4e1e3dd88a4cedd5ccc1a3fc3d71e03b70a7a791 ] While running ISER over SIW, the initiator machine encounters a warning from skb_splice_from_iter() indicating that a slab page is being used in send_page. To address this, it is better to add a sendpage_ok() check within the driver itself, and if it returns 0, then MSG_SPLICE_PAGES flag should be disabled before entering the network stack. A similar issue has been discussed for NVMe in this thread: https://lore.kernel.org/all/[email protected]/ WARNING: CPU: 0 PID: 5342 at net/core/skbuff.c:7140 skb_splice_from_iter+0x173/0x320 Call Trace: tcp_sendmsg_locked+0x368/0xe40 siw_tx_hdt+0x695/0xa40 [siw] siw_qp_sq_process+0x102/0xb00 [siw] siw_sq_resume+0x39/0x110 [siw] siw_run_sq+0x74/0x160 [siw] kthread+0xd2/0x100 ret_from_fork+0x34/0x40 ret_from_fork_asm+0x1a/0x30 Link: https://patch.msgid.link/r/[email protected] Signed-off-by: Showrya M N <[email protected]> Signed-off-by: Potnuri Bharat Teja <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Cyan Yang <[email protected]> Date: Fri Sep 20 00:01:26 2024 +0800 RISCV: KVM: use raw_spinlock for critical section in imsic [ Upstream commit 3ec4350d4efb5ccb6bd0e11d9cf7f2be4f47297d ] For the external interrupt updating procedure in imsic, there was a spinlock to protect it already. But since it should not be preempted in any cases, we should turn to use raw_spinlock to prevent any preemption in case PREEMPT_RT was enabled. Signed-off-by: Cyan Yang <[email protected]> Reviewed-by: Yong-Xuan Wang <[email protected]> Reviewed-by: Anup Patel <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Eduard Zingerman <[email protected]> Date: Tue Sep 24 14:08:44 2024 -0700 selftests/bpf: Verify that sync_linked_regs preserves subreg_def [ Upstream commit a41b3828ec056a631ad22413d4560017fed5c3bd ] This test was added because of a bug in verifier.c:sync_linked_regs(), upon range propagation it destroyed subreg_def marks for registers. The test is written in a way to return an upper half of a register that is affected by range propagation and must have it's subreg_def preserved. This gives a return value of 0 and leads to undefined return value if subreg_def mark is not preserved. Signed-off-by: Eduard Zingerman <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Kuniyuki Iwashima <[email protected]> Date: Sat Nov 2 14:24:38 2024 -0700 smb: client: Fix use-after-free of network namespace. [ Upstream commit ef7134c7fc48e1441b398e55a862232868a6f0a7 ] Recently, we got a customer report that CIFS triggers oops while reconnecting to a server. [0] The workload runs on Kubernetes, and some pods mount CIFS servers in non-root network namespaces. The problem rarely happened, but it was always while the pod was dying. The root cause is wrong reference counting for network namespace. CIFS uses kernel sockets, which do not hold refcnt of the netns that the socket belongs to. That means CIFS must ensure the socket is always freed before its netns; otherwise, use-after-free happens. The repro steps are roughly: 1. mount CIFS in a non-root netns 2. drop packets from the netns 3. destroy the netns 4. unmount CIFS We can reproduce the issue quickly with the script [1] below and see the splat [2] if CONFIG_NET_NS_REFCNT_TRACKER is enabled. When the socket is TCP, it is hard to guarantee the netns lifetime without holding refcnt due to async timers. Let's hold netns refcnt for each socket as done for SMC in commit 9744d2bf1976 ("smc: Fix use-after-free in tcp_write_timer_handler()."). Note that we need to move put_net() from cifs_put_tcp_session() to clean_demultiplex_info(); otherwise, __sock_create() still could touch a freed netns while cifsd tries to reconnect from cifs_demultiplex_thread(). Also, maybe_get_net() cannot be put just before __sock_create() because the code is not under RCU and there is a small chance that the same address happened to be reallocated to another netns. [0]: CIFS: VFS: \\XXXXXXXXXXX has not responded in 15 seconds. Reconnecting... CIFS: Serverclose failed 4 times, giving up Unable to handle kernel paging request at virtual address 14de99e461f84a07 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [14de99e461f84a07] address between user and kernel address ranges Internal error: Oops: 0000000096000004 [#1] SMP Modules linked in: cls_bpf sch_ingress nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver tcp_diag inet_diag veth xt_state xt_connmark nf_conntrack_netlink xt_nat xt_statistic xt_MASQUERADE xt_mark xt_addrtype ipt_REJECT nf_reject_ipv4 nft_chain_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat nf_tables nfnetlink overlay nls_ascii nls_cp437 sunrpc vfat fat aes_ce_blk aes_ce_cipher ghash_ce sm4_ce_cipher sm4 sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 sha1_ce ena button sch_fq_codel loop fuse configfs dmi_sysfs sha2_ce sha256_arm64 dm_mirror dm_region_hash dm_log dm_mod dax efivarfs CPU: 5 PID: 2690970 Comm: cifsd Not tainted 6.1.103-109.184.amzn2023.aarch64 #1 Hardware name: Amazon EC2 r7g.4xlarge/, BIOS 1.0 11/1/2018 pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : fib_rules_lookup+0x44/0x238 lr : __fib_lookup+0x64/0xbc sp : ffff8000265db790 x29: ffff8000265db790 x28: 0000000000000000 x27: 000000000000bd01 x26: 0000000000000000 x25: ffff000b4baf8000 x24: ffff00047b5e4580 x23: ffff8000265db7e0 x22: 0000000000000000 x21: ffff00047b5e4500 x20: ffff0010e3f694f8 x19: 14de99e461f849f7 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 3f92800abd010002 x11: 0000000000000001 x10: ffff0010e3f69420 x9 : ffff800008a6f294 x8 : 0000000000000000 x7 : 0000000000000006 x6 : 0000000000000000 x5 : 0000000000000001 x4 : ffff001924354280 x3 : ffff8000265db7e0 x2 : 0000000000000000 x1 : ffff0010e3f694f8 x0 : ffff00047b5e4500 Call trace: fib_rules_lookup+0x44/0x238 __fib_lookup+0x64/0xbc ip_route_output_key_hash_rcu+0x2c4/0x398 ip_route_output_key_hash+0x60/0x8c tcp_v4_connect+0x290/0x488 __inet_stream_connect+0x108/0x3d0 inet_stream_connect+0x50/0x78 kernel_connect+0x6c/0xac generic_ip_connect+0x10c/0x6c8 [cifs] __reconnect_target_unlocked+0xa0/0x214 [cifs] reconnect_dfs_server+0x144/0x460 [cifs] cifs_reconnect+0x88/0x148 [cifs] cifs_readv_from_socket+0x230/0x430 [cifs] cifs_read_from_socket+0x74/0xa8 [cifs] cifs_demultiplex_thread+0xf8/0x704 [cifs] kthread+0xd0/0xd4 Code: aa0003f8 f8480f13 eb18027f 540006c0 (b9401264) [1]: CIFS_CRED="/root/cred.cifs" CIFS_USER="Administrator" CIFS_PASS="Password" CIFS_IP="X.X.X.X" CIFS_PATH="//${CIFS_IP}/Users/Administrator/Desktop/CIFS_TEST" CIFS_MNT="/mnt/smb" DEV="enp0s3" cat <<EOF > ${CIFS_CRED} username=${CIFS_USER} password=${CIFS_PASS} domain=EXAMPLE.COM EOF unshare -n bash -c " mkdir -p ${CIFS_MNT} ip netns attach root 1 ip link add eth0 type veth peer veth0 netns root ip link set eth0 up ip -n root link set veth0 up ip addr add 192.168.0.2/24 dev eth0 ip -n root addr add 192.168.0.1/24 dev veth0 ip route add default via 192.168.0.1 dev eth0 ip netns exec root sysctl net.ipv4.ip_forward=1 ip netns exec root iptables -t nat -A POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE mount -t cifs ${CIFS_PATH} ${CIFS_MNT} -o vers=3.0,sec=ntlmssp,credentials=${CIFS_CRED},rsize=65536,wsize=65536,cache=none,echo_interval=1 touch ${CIFS_MNT}/a.txt ip netns exec root iptables -t nat -D POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE " umount ${CIFS_MNT} [2]: ref_tracker: net notrefcnt@000000004bbc008d has 1/1 users at sk_alloc (./include/net/net_namespace.h:339 net/core/sock.c:2227) inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252) __sock_create (net/socket.c:1576) generic_ip_connect (fs/smb/client/connect.c:3075) cifs_get_tcp_session.part.0 (fs/smb/client/connect.c:3160 fs/smb/client/connect.c:1798) cifs_mount_get_session (fs/smb/client/trace.h:959 fs/smb/client/connect.c:3366) dfs_mount_share (fs/smb/client/dfs.c:63 fs/smb/client/dfs.c:285) cifs_mount (fs/smb/client/connect.c:3622) cifs_smb3_do_mount (fs/smb/client/cifsfs.c:949) smb3_get_tree (fs/smb/client/fs_context.c:784 fs/smb/client/fs_context.c:802 fs/smb/client/fs_context.c:794) vfs_get_tree (fs/super.c:1800) path_mount (fs/namespace.c:3508 fs/namespace.c:3834) __x64_sys_mount (fs/namespace.c:3848 fs/namespace.c:4057 fs/namespace.c:4034 fs/namespace.c:4034) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Signed-off-by: Kuniyuki Iwashima <[email protected]> Acked-by: Tom Talpey <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Julian Vetter <[email protected]> Date: Thu Oct 10 14:46:01 2024 +0200 sound: Make CONFIG_SND depend on INDIRECT_IOMEM instead of UML [ Upstream commit ad6639f143a0b42d7fb110ad14f5949f7c218890 ] When building for the UM arch and neither INDIRECT_IOMEM=y, nor HAS_IOMEM=y is selected, it will fall back to the implementations from asm-generic/io.h for IO memcpy. But these fall-back functions just do a memcpy. So, instead of depending on UML, add dependency on 'HAS_IOMEM || INDIRECT_IOMEM'. Reviewed-by: Yann Sionneau <[email protected]> Signed-off-by: Julian Vetter <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Yuan Can <[email protected]> Date: Thu Oct 17 09:38:12 2024 +0800 vDPA/ifcvf: Fix pci_read_config_byte() return code handling [ Upstream commit 7f8825b2a78ac392d3fbb3a2e65e56d9e39d75e9 ] ifcvf_init_hw() uses pci_read_config_byte() that returns PCIBIOS_* codes. The error handling, however, assumes the codes are normal errnos because it checks for < 0. Convert the error check to plain non-zero check. Fixes: 5a2414bc454e ("virtio: Intel IFC VF driver for VDPA") Signed-off-by: Yuan Can <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]> Acked-by: Zhu Lingshan <[email protected]> Signed-off-by: Sasha Levin <[email protected]>