Author: Shirish S <[email protected]> Date: Fri Mar 11 20:30:17 2022 +0530 amd/display: set backlight only if required commit 4052287a75eb3fc0f487fcc5f768a38bede455c8 upstream. [Why] comparing pwm bl values (coverted) with user brightness(converted) levels in commit_tail leads to continuous setting of backlight via dmub as they don't to match. This leads overdrive in queuing of commands to DMCU that sometimes lead to depending on load on DMCU fw: "[drm:dc_dmub_srv_wait_idle] *ERROR* Error waiting for DMUB idle: status=3" [How] Store last successfully set backlight value and compare with it instead of pwm reads which is not what we should compare with. Signed-off-by: Shirish S <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Phil Auld <[email protected]> Date: Thu Mar 31 11:39:26 2022 -0400 arch/arm64: Fix topology initialization for core scheduling [ Upstream commit 5524cbb1bfcdff0cad0aaa9f94e6092002a07259 ] Arm64 systems rely on store_cpu_topology() to call update_siblings_masks() to transfer the toplogy to the various cpu masks. This needs to be done before the call to notify_cpu_starting() which tells the scheduler about each cpu found, otherwise the core scheduling data structures are setup in a way that does not match the actual topology. With smt_mask not setup correctly we bail on `cpumask_weight(smt_mask) == 1` for !leaders in: notify_cpu_starting() cpuhp_invoke_callback_range() sched_cpu_starting() sched_core_cpu_starting() which leads to rq->core not being correctly set for !leader-rq's. Without this change stress-ng (which enables core scheduling in its prctl tests in newer versions -- i.e. with PR_SCHED_CORE support) causes a warning and then a crash (trimmed for legibility): [ 1853.805168] ------------[ cut here ]------------ [ 1853.809784] task_rq(b)->core != rq->core [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4 ... [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 ... [ 1854.231256] Call trace: [ 1854.233689] pick_next_task+0x3dc/0x81c [ 1854.237512] __schedule+0x10c/0x4cc [ 1854.240988] schedule_idle+0x34/0x54 Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock") Signed-off-by: Phil Auld <[email protected]> Reviewed-by: Dietmar Eggemann <[email protected]> Tested-by: Dietmar Eggemann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Chanho Park <[email protected]> Date: Thu Apr 7 18:11:28 2022 +0900 arm64: Add part number for Arm Cortex-A78AE commit 83bea32ac7ed37bbda58733de61fc9369513f9f9 upstream. Add the MIDR part number info for the Arm Cortex-A78AE[1] and add it to spectre-BHB affected list[2]. [1]: https://developer.arm.com/Processors/Cortex-A78AE [2]: https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB Cc: Catalin Marinas <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Cc: James Morse <[email protected]> Signed-off-by: Chanho Park <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Guo Ren <[email protected]> Date: Thu Apr 7 15:33:20 2022 +0800 arm64: patch_text: Fixup last cpu should be master commit 31a099dbd91e69fcab55eef4be15ed7a8c984918 upstream. These patch_text implementations are using stop_machine_cpuslocked infrastructure with atomic cpu_count. The original idea: When the master CPU patch_text, the others should wait for it. But current implementation is using the first CPU as master, which couldn't guarantee the remaining CPUs are waiting. This patch changes the last CPU as the master to solve the potential risk. Fixes: ae16480785de ("arm64: introduce interfaces to hotpatch kernel and module code") Signed-off-by: Guo Ren <[email protected]> Signed-off-by: Guo Ren <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Christian Lamparter <[email protected]> Date: Sat Mar 19 21:11:02 2022 +0100 ata: sata_dwc_460ex: Fix crash due to OOB write commit 7aa8104a554713b685db729e66511b93d989dd6a upstream. the driver uses libata's "tag" values from in various arrays. Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32, the value of the SATA_DWC_QCMD_MAX needs to account for that. Otherwise ATA_TAG_INTERNAL usage cause similar crashes like this as reported by Tice Rex on the OpenWrt Forum and reproduced (with symbols) here: | BUG: Kernel NULL pointer dereference at 0x00000000 | Faulting instruction address: 0xc03ed4b8 | Oops: Kernel access of bad area, sig: 11 [#1] | BE PAGE_SIZE=4K PowerPC 44x Platform | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0 | NIP: c03ed4b8 LR: c03d27e8 CTR: c03ed36c | REGS: cfa59950 TRAP: 0300 Not tainted (5.4.163) | MSR: 00021000 <CE,ME> CR: 42000222 XER: 00000000 | DEAR: 00000000 ESR: 00000000 | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...] | [..] | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254 | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc | Call Trace: | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable) | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524 | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0 | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204 | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130 | [...] This is because sata_dwc_dma_xfer_complete() NULLs the dma_pending's next neighbour "chan" (a *dma_chan struct) in this '32' case right here (line ~735): > hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE; Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes the NULL'd hsdevp->chan to the dmaengine_slave_config() which then causes the crash. With this patch, SATA_DWC_QCMD_MAX is now set to ATA_MAX_QUEUE + 1. This avoids the OOB. But please note, there was a worthwhile discussion on what ATA_TAG_INTERNAL and ATA_MAX_QUEUE is. And why there should not be a "fake" 33 command-long queue size. Ideally, the dw driver should account for the ATA_TAG_INTERNAL. In Damien Le Moal's words: "... having looked at the driver, it is a bigger change than just faking a 33rd "tag" that is in fact not a command tag at all." Fixes: 28361c403683c ("libata: add extra internal command") Cc: [email protected] # 4.18+ BugLink: https://github.com/openwrt/openwrt/issues/9505 Signed-off-by: Christian Lamparter <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Venkateswara Naralasetty <[email protected]> Date: Wed Jan 19 14:49:33 2022 +0530 ath11k: fix kernel panic during unload/load ath11k modules [ Upstream commit 22b59cb965f79ee1accf83172441c9ca0ecb632a ] Call netif_napi_del() from ath11k_ahb_free_ext_irq() to fix the following kernel panic when unload/load ath11k modules for few iterations. [ 971.201365] Unable to handle kernel paging request at virtual address 6d97a208 [ 971.204227] pgd = 594c2919 [ 971.211478] [6d97a208] *pgd=00000000 [ 971.214120] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 971.412024] CPU: 2 PID: 4435 Comm: insmod Not tainted 5.4.89 #0 [ 971.434256] Hardware name: Generic DT based system [ 971.440165] PC is at napi_by_id+0x10/0x40 [ 971.445019] LR is at netif_napi_add+0x160/0x1dc [ 971.743127] (napi_by_id) from [<807d89a0>] (netif_napi_add+0x160/0x1dc) [ 971.751295] (netif_napi_add) from [<7f1209ac>] (ath11k_ahb_config_irq+0xf8/0x414 [ath11k_ahb]) [ 971.759164] (ath11k_ahb_config_irq [ath11k_ahb]) from [<7f12135c>] (ath11k_ahb_probe+0x40c/0x51c [ath11k_ahb]) [ 971.768567] (ath11k_ahb_probe [ath11k_ahb]) from [<80666864>] (platform_drv_probe+0x48/0x94) [ 971.779670] (platform_drv_probe) from [<80664718>] (really_probe+0x1c8/0x450) [ 971.789389] (really_probe) from [<80664cc4>] (driver_probe_device+0x15c/0x1b8) [ 971.797547] (driver_probe_device) from [<80664f60>] (device_driver_attach+0x44/0x60) [ 971.805795] (device_driver_attach) from [<806650a0>] (__driver_attach+0x124/0x140) [ 971.814822] (__driver_attach) from [<80662adc>] (bus_for_each_dev+0x58/0xa4) [ 971.823328] (bus_for_each_dev) from [<80663a2c>] (bus_add_driver+0xf0/0x1e8) [ 971.831662] (bus_add_driver) from [<806658a4>] (driver_register+0xa8/0xf0) [ 971.839822] (driver_register) from [<8030269c>] (do_one_initcall+0x78/0x1ac) [ 971.847638] (do_one_initcall) from [<80392524>] (do_init_module+0x54/0x200) [ 971.855968] (do_init_module) from [<803945b0>] (load_module+0x1e30/0x1ffc) [ 971.864126] (load_module) from [<803948b0>] (sys_init_module+0x134/0x17c) [ 971.871852] (sys_init_module) from [<80301000>] (ret_fast_syscall+0x0/0x50) Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.6.0.1-00760-QCAHKSWPL_SILICONZ-1 Signed-off-by: Venkateswara Naralasetty <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Kalle Valo <[email protected]> Date: Thu Jan 27 11:01:17 2022 +0200 ath11k: mhi: use mhi_sync_power_up() [ Upstream commit 3df6d74aedfdca919cca475d15dfdbc8b05c9e5d ] If amss.bin was missing ath11k would crash during 'rmmod ath11k_pci'. The reason for that was that we were using mhi_async_power_up() which does not check any errors. But mhi_sync_power_up() on the other hand does check for errors so let's use that to fix the crash. I was not able to find a reason why an async version was used. ath11k_mhi_start() (which enables state ATH11K_MHI_POWER_ON) is called from ath11k_hif_power_up(), which can sleep. So sync version should be safe to use here. [ 145.569731] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI [ 145.569789] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 145.569843] CPU: 2 PID: 1628 Comm: rmmod Kdump: loaded Tainted: G W 5.16.0-wt-ath+ #567 [ 145.569898] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021 [ 145.569956] RIP: 0010:ath11k_hal_srng_access_begin+0xb5/0x2b0 [ath11k] [ 145.570028] Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 ec 01 00 00 48 8b ab a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 <0f> b6 14 02 48 89 e8 83 e0 07 83 c0 03 45 85 ed 75 48 38 d0 7c 08 [ 145.570089] RSP: 0018:ffffc900025d7ac0 EFLAGS: 00010246 [ 145.570144] RAX: dffffc0000000000 RBX: ffff88814fca2dd8 RCX: 1ffffffff50cb455 [ 145.570196] RDX: 0000000000000000 RSI: ffff88814fca2dd8 RDI: ffff88814fca2e80 [ 145.570252] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffa8659497 [ 145.570329] R10: fffffbfff50cb292 R11: 0000000000000001 R12: ffff88814fca0000 [ 145.570410] R13: 0000000000000000 R14: ffff88814fca2798 R15: ffff88814fca2dd8 [ 145.570465] FS: 00007fa399988540(0000) GS:ffff888233e00000(0000) knlGS:0000000000000000 [ 145.570519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 145.570571] CR2: 00007fa399b51421 CR3: 0000000137898002 CR4: 00000000003706e0 [ 145.570623] Call Trace: [ 145.570675] <TASK> [ 145.570727] ? ath11k_ce_tx_process_cb+0x34b/0x860 [ath11k] [ 145.570797] ath11k_ce_tx_process_cb+0x356/0x860 [ath11k] [ 145.570864] ? tasklet_init+0x150/0x150 [ 145.570919] ? ath11k_ce_alloc_pipes+0x280/0x280 [ath11k] [ 145.570986] ? tasklet_clear_sched+0x42/0xe0 [ 145.571042] ? tasklet_kill+0xe9/0x1b0 [ 145.571095] ? tasklet_clear_sched+0xe0/0xe0 [ 145.571148] ? irq_has_action+0x120/0x120 [ 145.571202] ath11k_ce_cleanup_pipes+0x45a/0x580 [ath11k] [ 145.571270] ? ath11k_pci_stop+0x10e/0x170 [ath11k_pci] [ 145.571345] ath11k_core_stop+0x8a/0xc0 [ath11k] [ 145.571434] ath11k_core_deinit+0x9e/0x150 [ath11k] [ 145.571499] ath11k_pci_remove+0xd2/0x260 [ath11k_pci] [ 145.571553] pci_device_remove+0x9a/0x1c0 [ 145.571605] __device_release_driver+0x332/0x660 [ 145.571659] driver_detach+0x1e7/0x2c0 [ 145.571712] bus_remove_driver+0xe2/0x2d0 [ 145.571772] pci_unregister_driver+0x21/0x250 [ 145.571826] __do_sys_delete_module+0x30a/0x4b0 [ 145.571879] ? free_module+0xac0/0xac0 [ 145.571933] ? lockdep_hardirqs_on_prepare.part.0+0x18c/0x370 [ 145.571986] ? syscall_enter_from_user_mode+0x1d/0x50 [ 145.572039] ? lockdep_hardirqs_on+0x79/0x100 [ 145.572097] do_syscall_64+0x3b/0x90 [ 145.572153] entry_SYSCALL_64_after_hwframe+0x44/0xae Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2 Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Kalle Valo <[email protected]> Date: Thu Jan 27 11:01:16 2022 +0200 ath11k: pci: fix crash on suspend if board file is not found [ Upstream commit b4f4c56459a5c744f7f066b9fc2b54ea995030c5 ] Mario reported that the kernel was crashing on suspend if ath11k was not able to find a board file: [ 473.693286] PM: Suspending system (s2idle) [ 473.693291] printk: Suspending console(s) (use no_console_suspend to debug) [ 474.407787] BUG: unable to handle page fault for address: 0000000000002070 [ 474.407791] #PF: supervisor read access in kernel mode [ 474.407794] #PF: error_code(0x0000) - not-present page [ 474.407798] PGD 0 P4D 0 [ 474.407801] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 474.407805] CPU: 2 PID: 2350 Comm: kworker/u32:14 Tainted: G W 5.16.0 #248 [...] [ 474.407868] Call Trace: [ 474.407870] <TASK> [ 474.407874] ? _raw_spin_lock_irqsave+0x2a/0x60 [ 474.407882] ? lock_timer_base+0x72/0xa0 [ 474.407889] ? _raw_spin_unlock_irqrestore+0x29/0x3d [ 474.407892] ? try_to_del_timer_sync+0x54/0x80 [ 474.407896] ath11k_dp_rx_pktlog_stop+0x49/0xc0 [ath11k] [ 474.407912] ath11k_core_suspend+0x34/0x130 [ath11k] [ 474.407923] ath11k_pci_pm_suspend+0x1b/0x50 [ath11k_pci] [ 474.407928] pci_pm_suspend+0x7e/0x170 [ 474.407935] ? pci_pm_freeze+0xc0/0xc0 [ 474.407939] dpm_run_callback+0x4e/0x150 [ 474.407947] __device_suspend+0x148/0x4c0 [ 474.407951] async_suspend+0x20/0x90 dmesg-efi-164255130401001: Oops#1 Part1 [ 474.407955] async_run_entry_fn+0x33/0x120 [ 474.407959] process_one_work+0x220/0x3f0 [ 474.407966] worker_thread+0x4a/0x3d0 [ 474.407971] kthread+0x17a/0x1a0 [ 474.407975] ? process_one_work+0x3f0/0x3f0 [ 474.407979] ? set_kthread_struct+0x40/0x40 [ 474.407983] ret_from_fork+0x22/0x30 [ 474.407991] </TASK> The issue here is that board file loading happens after ath11k_pci_probe() succesfully returns (ath11k initialisation happends asynchronously) and the suspend handler is still enabled, of course failing as ath11k is not properly initialised. Fix this by checking ATH11K_FLAG_QMI_FAIL during both suspend and resume. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2 Reported-by: Mario Limonciello <[email protected]> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215504 Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Zekun Shen <[email protected]> Date: Sun Dec 26 22:12:13 2021 -0500 ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111 [ Upstream commit 564d4eceb97eaf381dd6ef6470b06377bb50c95a ] The bug was found during fuzzing. Stacktrace locates it in ath5k_eeprom_convert_pcal_info_5111. When none of the curve is selected in the loop, idx can go up to AR5K_EEPROM_N_PD_CURVES. The line makes pd out of bound. pd = &chinfo[pier].pd_curves[idx]; There are many OOB writes using pd later in the code. So I added a sanity check for idx. Checks for other loops involving AR5K_EEPROM_N_PD_CURVES are not needed as the loop index is not used outside the loops. The patch is NOT tested with real device. The following is the fuzzing report BUG: KASAN: slab-out-of-bounds in ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] Write of size 1 at addr ffff8880174a4d60 by task modprobe/214 CPU: 0 PID: 214 Comm: modprobe Not tainted 5.6.0 #1 Call Trace: dump_stack+0x76/0xa0 print_address_description.constprop.0+0x16/0x200 ? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] ? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] __kasan_report.cold+0x37/0x7c ? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] kasan_report+0xe/0x20 ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] ? apic_timer_interrupt+0xa/0x20 ? ath5k_eeprom_init_11a_pcal_freq+0xbc0/0xbc0 [ath5k] ? ath5k_pci_eeprom_read+0x228/0x3c0 [ath5k] ath5k_eeprom_init+0x2513/0x6290 [ath5k] ? ath5k_eeprom_init_11a_pcal_freq+0xbc0/0xbc0 [ath5k] ? usleep_range+0xb8/0x100 ? apic_timer_interrupt+0xa/0x20 ? ath5k_eeprom_read_pcal_info_2413+0x2f20/0x2f20 [ath5k] ath5k_hw_init+0xb60/0x1970 [ath5k] ath5k_init_ah+0x6fe/0x2530 [ath5k] ? kasprintf+0xa6/0xe0 ? ath5k_stop+0x140/0x140 [ath5k] ? _dev_notice+0xf6/0xf6 ? apic_timer_interrupt+0xa/0x20 ath5k_pci_probe.cold+0x29a/0x3d6 [ath5k] ? ath5k_pci_eeprom_read+0x3c0/0x3c0 [ath5k] ? mutex_lock+0x89/0xd0 ? ath5k_pci_eeprom_read+0x3c0/0x3c0 [ath5k] local_pci_probe+0xd3/0x160 pci_device_probe+0x23f/0x3e0 ? pci_device_remove+0x280/0x280 ? pci_device_remove+0x280/0x280 really_probe+0x209/0x5d0 Reported-by: Brendan Dolan-Gavitt <[email protected]> Signed-off-by: Zekun Shen <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Luiz Augusto von Dentz <[email protected]> Date: Thu Mar 3 13:11:57 2022 -0800 Bluetooth: Fix not checking for valid hdev on bt_dev_{info,warn,err,dbg} [ Upstream commit 9b392e0e0b6d026da5a62bb79a08f32e27af858e ] This fixes attemting to print hdev->name directly which causes them to print an error: kernel: read_version:367: (efault): sock 000000006a3008f2 Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Luiz Augusto von Dentz <[email protected]> Date: Fri Mar 11 13:19:33 2022 -0800 Bluetooth: Fix use after free in hci_send_acl [ Upstream commit f63d24baff787e13b723d86fe036f84bdbc35045 ] This fixes the following trace caused by receiving HCI_EV_DISCONN_PHY_LINK_COMPLETE which does call hci_conn_del without first checking if conn->type is in fact AMP_LINK and in case it is do properly cleanup upper layers with hci_disconn_cfm: ================================================================== BUG: KASAN: use-after-free in hci_send_acl+0xaba/0xc50 Read of size 8 at addr ffff88800e404818 by task bluetoothd/142 CPU: 0 PID: 142 Comm: bluetoothd Not tainted 5.17.0-rc5-00006-gda4022eeac1a #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x150 kasan_report.cold+0x7f/0x11b hci_send_acl+0xaba/0xc50 l2cap_do_send+0x23f/0x3d0 l2cap_chan_send+0xc06/0x2cc0 l2cap_sock_sendmsg+0x201/0x2b0 sock_sendmsg+0xdc/0x110 sock_write_iter+0x20f/0x370 do_iter_readv_writev+0x343/0x690 do_iter_write+0x132/0x640 vfs_writev+0x198/0x570 do_writev+0x202/0x280 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RSP: 002b:00007ffce8a099b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 14 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RDX: 0000000000000001 RSI: 00007ffce8a099e0 RDI: 0000000000000015 RAX: ffffffffffffffda RBX: 00007ffce8a099e0 RCX: 00007f788fc3cf77 R10: 00007ffce8af7080 R11: 0000000000000246 R12: 000055e4ccf75580 RBP: 0000000000000015 R08: 0000000000000002 R09: 0000000000000001 </TASK> R13: 000055e4ccf754a0 R14: 000055e4ccf75cd0 R15: 000055e4ccf4a6b0 Allocated by task 45: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 hci_chan_create+0x9a/0x2f0 l2cap_conn_add.part.0+0x1a/0xdc0 l2cap_connect_cfm+0x236/0x1000 le_conn_complete_evt+0x15a7/0x1db0 hci_le_conn_complete_evt+0x226/0x2c0 hci_le_meta_evt+0x247/0x450 hci_event_packet+0x61b/0xe90 hci_rx_work+0x4d5/0xc50 process_one_work+0x8fb/0x15a0 worker_thread+0x576/0x1240 kthread+0x29d/0x340 ret_from_fork+0x1f/0x30 Freed by task 45: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xfb/0x130 kfree+0xac/0x350 hci_conn_cleanup+0x101/0x6a0 hci_conn_del+0x27e/0x6c0 hci_disconn_phylink_complete_evt+0xe0/0x120 hci_event_packet+0x812/0xe90 hci_rx_work+0x4d5/0xc50 process_one_work+0x8fb/0x15a0 worker_thread+0x576/0x1240 kthread+0x29d/0x340 ret_from_fork+0x1f/0x30 The buggy address belongs to the object at ffff88800c0f0500 The buggy address is located 24 bytes inside of which belongs to the cache kmalloc-128 of size 128 The buggy address belongs to the page: 128-byte region [ffff88800c0f0500, ffff88800c0f0580) flags: 0x100000000000200(slab|node=0|zone=1) page:00000000fe45cd86 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xc0f0 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 raw: 0100000000000200 ffffea00003a2c80 dead000000000004 ffff8880078418c0 page dumped because: kasan: bad access detected ffff88800c0f0400: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc Memory state around the buggy address: >ffff88800c0f0500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88800c0f0480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88800c0f0580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ================================================================== ffff88800c0f0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Reported-by: Sönke Huster <[email protected]> Tested-by: Sönke Huster <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Minghao Chi (CGEL ZTE) <[email protected]> Date: Fri Feb 25 07:41:52 2022 +0000 Bluetooth: use memset avoid memory leaks [ Upstream commit d3715b2333e9a21692ba16ef8645eda584a9515d ] Use memset to initialize structs to prevent memory leaks in l2cap_ecred_connect Reported-by: Zeal Robot <[email protected]> Signed-off-by: Minghao Chi (CGEL ZTE) <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Michael Chan <[email protected]> Date: Sat Mar 5 03:54:39 2022 -0500 bnxt_en: Eliminate unintended link toggle during FW reset [ Upstream commit 7c492a2530c1f05441da541307c2534230dfd59b ] If the flow control settings have been changed, a subsequent FW reset may cause the ethernet link to toggle unnecessarily. This link toggle will increase the down time by a few seconds. The problem is caused by bnxt_update_phy_setting() detecting a false mismatch in the flow control settings between the stored software settings and the current FW settings after the FW reset. This mismatch is caused by the AUTONEG bit added to link_info->req_flow_ctrl in an inconsistent way in bnxt_set_pauseparam() in autoneg mode. The AUTONEG bit should not be added to link_info->req_flow_ctrl. Reviewed-by: Colin Winegarden <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ray Jui <[email protected]> Date: Fri Apr 1 20:21:12 2022 -0400 bnxt_en: Prevent XDP redirect from running when stopping TX queue [ Upstream commit 27d4073f8d9af0340362554414f4961643a4f4de ] Add checks in the XDP redirect callback to prevent XDP from running when the TX ring is undergoing shutdown. Also remove redundant checks in the XDP redirect callback to validate the txr and the flag that indicates the ring supports XDP. The modulo arithmetic on 'tx_nr_rings_xdp' already guarantees the derived TX ring is an XDP ring. txr is also guaranteed to be valid after checking BNXT_STATE_OPEN and within RCU grace period. Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support") Reviewed-by: Vladimir Olovyannikov <[email protected]> Signed-off-by: Ray Jui <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Andy Gospodarek <[email protected]> Date: Fri Apr 1 20:21:11 2022 -0400 bnxt_en: reserve space inside receive page for skb_shared_info [ Upstream commit facc173cf700e55b2ad249ecbd3a7537f7315691 ] Insufficient space was being reserved in the page used for packet reception, so the interface MTU could be set too large to still have room for the contents of the packet when doing XDP redirect. This resulted in the following message when redirecting a packet between 3520 and 3822 bytes with an MTU of 3822: [311815.561880] XDP_WARN: xdp_update_frame_from_buff(line:200): Driver BUG: missing reserved tailroom Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support") Reviewed-by: Somnath Kotur <[email protected]> Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Andy Gospodarek <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Pavan Chebbi <[email protected]> Date: Fri Apr 1 20:21:10 2022 -0400 bnxt_en: Synchronize tx when xdp redirects happen on same ring [ Upstream commit 4f81def272de17dc4bbd89ac38f49b2676c9b3d2 ] If there are more CPUs than the number of TX XDP rings, multiple XDP redirects can select the same TX ring based on the CPU on which XDP redirect is called. Add locking when needed and use static key to decide whether to take the lock. Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support") Signed-off-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jakub Sitnicki <[email protected]> Date: Sun Jan 30 12:55:17 2022 +0100 bpf: Make dst_port field in struct bpf_sock 16-bit wide [ Upstream commit 4421a582718ab81608d8486734c18083b822390d ] Menglong Dong reports that the documentation for the dst_port field in struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the field is a zero-padded 16-bit integer in network byte order. The value appears to the BPF user as if laid out in memory as so: offsetof(struct bpf_sock, dst_port) + 0 <port MSB> + 8 <port LSB> +16 0x00 +24 0x00 32-, 16-, and 8-bit wide loads from the field are all allowed, but only if the offset into the field is 0. 32-bit wide loads from dst_port are especially confusing. The loaded value, after converting to host byte order with bpf_ntohl(dst_port), contains the port number in the upper 16-bits. Remove the confusion by splitting the field into two 16-bit fields. For backward compatibility, allow 32-bit wide loads from offsetof(struct bpf_sock, dst_port). While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port. Reported-by: Menglong Dong <[email protected]> Signed-off-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jakub Sitnicki <[email protected]> Date: Wed Feb 9 19:43:32 2022 +0100 bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide commit 9a69e2b385f443f244a7e8b8bcafe5ccfb0866b4 upstream. remote_port is another case of a BPF context field documented as a 32-bit value in network byte order for which the BPF context access converter generates a load of a zero-padded 16-bit integer in network byte order. First such case was dst_port in bpf_sock which got addressed in commit 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide"). Loading 4-bytes from the remote_port offset and converting the value with bpf_ntohl() leads to surprising results, as the expected value is shifted by 16 bits. Reduce the confusion by splitting the field in two - a 16-bit field holding a big-endian integer, and a 16-bit zero-padding anonymous field that follows it. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Jakub Sitnicki <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Maxim Mikityanskiy <[email protected]> Date: Wed Apr 6 15:41:12 2022 +0300 bpf: Support dual-stack sockets in bpf_tcp_check_syncookie [ Upstream commit 2e8702cc0cfa1080f29fd64003c00a3e24ac38de ] bpf_tcp_gen_syncookie looks at the IP version in the IP header and validates the address family of the socket. It supports IPv4 packets in AF_INET6 dual-stack sockets. On the other hand, bpf_tcp_check_syncookie looks only at the address family of the socket, ignoring the real IP version in headers, and validates only the packet size. This implementation has some drawbacks: 1. Packets are not validated properly, allowing a BPF program to trick bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4 socket. 2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end up receiving a SYNACK with the cookie, but the following ACK gets dropped. This patch fixes these issues by changing the checks in bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP version from the header is taken into account, and it is validated properly with address family. Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie") Signed-off-by: Maxim Mikityanskiy <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Acked-by: Arthur Fabre <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Ethan Lien <[email protected]> Date: Mon Mar 7 18:00:04 2022 +0800 btrfs: fix qgroup reserve overflow the qgroup limit commit b642b52d0b50f4d398cb4293f64992d0eed2e2ce upstream. We use extent_changeset->bytes_changed in qgroup_reserve_data() to record how many bytes we set for EXTENT_QGROUP_RESERVED state. Currently the bytes_changed is set as "unsigned int", and it will overflow if we try to fallocate a range larger than 4GiB. The result is we reserve less bytes and eventually break the qgroup limit. Unlike regular buffered/direct write, which we use one changeset for each ordered extent, which can never be larger than 256M. For fallocate, we use one changeset for the whole range, thus it no longer respects the 256M per extent limit, and caused the problem. The following example test script reproduces the problem: $ cat qgroup-overflow.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount $DEV $MNT # Set qgroup limit to 2GiB. btrfs quota enable $MNT btrfs qgroup limit 2G $MNT # Try to fallocate a 3GiB file. This should fail. echo echo "Try to fallocate a 3GiB file..." fallocate -l 3G $MNT/3G.file # Try to fallocate a 5GiB file. echo echo "Try to fallocate a 5GiB file..." fallocate -l 5G $MNT/5G.file # See we break the qgroup limit. echo sync btrfs qgroup show -r $MNT umount $MNT When running the test: $ ./qgroup-overflow.sh (...) Try to fallocate a 3GiB file... fallocate: fallocate failed: Disk quota exceeded Try to fallocate a 5GiB file... qgroupid        rfer        excl    max_rfer --------        ----        ----    -------- 0/5          5.00GiB     5.00GiB     2.00GiB Since we have no control of how bytes_changed is used, it's better to set it to u64. CC: [email protected] # 4.14+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Ethan Lien <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Kaiwen Hu <[email protected]> Date: Wed Mar 23 15:10:32 2022 +0800 btrfs: prevent subvol with swapfile from being deleted commit 60021bd754c6ca0addc6817994f20290a321d8d6 upstream. A subvolume with an active swapfile must not be deleted otherwise it would not be possible to deactivate it. After the subvolume is deleted, we cannot swapoff the swapfile in this deleted subvolume because the path is unreachable. The swapfile is still active and holding references, the filesystem cannot be unmounted. The test looks like this: mkfs.btrfs -f $dev > /dev/null mount $dev $mnt btrfs sub create $mnt/subvol touch $mnt/subvol/swapfile chmod 600 $mnt/subvol/swapfile chattr +C $mnt/subvol/swapfile dd if=/dev/zero of=$mnt/subvol/swapfile bs=1K count=4096 mkswap $mnt/subvol/swapfile swapon $mnt/subvol/swapfile btrfs sub delete $mnt/subvol swapoff $mnt/subvol/swapfile # failed: No such file or directory swapoff --all unmount $mnt # target is busy. To prevent above issue, we simply check that whether the subvolume contains any active swapfile, and stop the deleting process. This behavior is like snapshot ioctl dealing with a swapfile. CC: [email protected] # 5.4+ Reviewed-by: Robbie Ko <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Kaiwen Hu <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Vincent Mailhol <[email protected]> Date: Sun Mar 6 19:13:02 2022 +0900 can: etas_es58x: es58x_fd_rx_event_msg(): initialize rx_event_msg before calling es58x_check_msg_len() [ Upstream commit 7a8cd7c0ee823a1cc893ab3feaa23e4b602bfb9a ] Function es58x_fd_rx_event() invokes the es58x_check_msg_len() macro: | ret = es58x_check_msg_len(es58x_dev->dev, *rx_event_msg, msg_len); While doing so, it dereferences an uninitialized variable: *rx_event_msg. This is actually harmless because es58x_check_msg_len() only uses preprocessor macros (sizeof() and __stringify()) on *rx_event_msg. c.f. [1]. Nonetheless, this pattern is confusing so the lines are reordered to make sure that rx_event_msg is correctly initialized. This patch also fixes a false positive warning reported by cppcheck: | cppcheck possible warnings: (new ones prefixed by >>, may not be real problems) | | In file included from drivers/net/can/usb/etas_es58x/es58x_fd.c: | >> drivers/net/can/usb/etas_es58x/es58x_fd.c:174:8: warning: Uninitialized variable: rx_event_msg [uninitvar] | ret = es58x_check_msg_len(es58x_dev->dev, *rx_event_msg, msg_len); | ^ [1] https://elixir.bootlin.com/linux/v5.16/source/drivers/net/can/usb/etas_es58x/es58x_core.h#L467 Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Vincent Mailhol <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Oliver Hartkopp <[email protected]> Date: Wed Mar 9 13:04:13 2022 +0100 can: isotp: set default value for N_As to 50 micro seconds [ Upstream commit 530e0d46c61314c59ecfdb8d3bcb87edbc0f85d3 ] The N_As value describes the time a CAN frame needs on the wire when transmitted by the CAN controller. Even very short CAN FD frames need arround 100 usecs (bitrate 1Mbit/s, data bitrate 8Mbit/s). Having N_As to be zero (the former default) leads to 'no CAN frame separation' when STmin is set to zero by the receiving node. This 'burst mode' should not be enabled by default as it could potentially dump a high number of CAN frames into the netdev queue from the soft hrtimer context. This does not affect the system stability but is just not nice and cooperative. With this N_As/frame_txtime value the 'burst mode' is disabled by default. As user space applications usually do not set the frame_txtime element of struct can_isotp_options the new in-kernel default is very likely overwritten with zero when the sockopt() CAN_ISOTP_OPTS is invoked. To make sure that a N_As value of zero is only set intentional the value '0' is now interpreted as 'do not change the current value'. When a frame_txtime of zero is required for testing purposes this CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime. Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Oliver Hartkopp <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Xiubo Li <[email protected]> Date: Wed Mar 2 14:51:53 2022 +0800 ceph: fix inode reference leakage in ceph_get_snapdir() [ Upstream commit 322794d3355c33adcc4feace0045d85a8e4ed813 ] The ceph_get_inode() will search for or insert a new inode into the hash for the given vino, and return a reference to it. If new is non-NULL, its reference is consumed. We should release the reference when in error handing cases. Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Xiubo Li <[email protected]> Date: Sat Mar 5 19:52:59 2022 +0800 ceph: fix memory leak in ceph_readdir when note_last_dentry returns error [ Upstream commit f639d9867eea647005dc824e0e24f39ffc50d4e4 ] Reset the last_readdir at the same time, and add a comment explaining why we don't free last_readdir when dir_emit returns false. Signed-off-by: Xiubo Li <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Avraham Stern <[email protected]> Date: Wed Feb 2 10:49:37 2022 +0200 cfg80211: don't add non transmitted BSS to 6GHz scanned channels [ Upstream commit 5666ee154f4696c011dfa8544aaf5591b6b87515 ] When adding 6GHz channels to scan request based on reported co-located APs, don't add channels that have only APs with "non-transmitted" BSSes if they only match the wildcard SSID since they will be found by probing the "transmitted" BSS. Signed-off-by: Avraham Stern <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.f6ddf099f934.I231e55885d3644f292d00dfe0f42653269f2559e@changeid Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Maxime Ripard <[email protected]> Date: Fri Feb 25 15:35:25 2022 +0100 clk: Enforce that disjoints limits are invalid [ Upstream commit 10c46f2ea914202482d19cf80dcc9c321c9ff59b ] If we were to have two users of the same clock, doing something like: clk_set_rate_range(user1, 1000, 2000); clk_set_rate_range(user2, 3000, 4000); The second call would fail with -EINVAL, preventing from getting in a situation where we end up with impossible limits. However, this is never explicitly checked against and enforced, and works by relying on an undocumented behaviour of clk_set_rate(). Indeed, on the first clk_set_rate_range will make sure the current clock rate is within the new range, so it will be between 1000 and 2000Hz. On the second clk_set_rate_range(), it will consider (rightfully), that our current clock is outside of the 3000-4000Hz range, and will call clk_core_set_rate_nolock() to set it to 3000Hz. clk_core_set_rate_nolock() will then call clk_calc_new_rates() that will eventually check that our rate 3000Hz rate is outside the min 3000Hz max 2000Hz range, will bail out, the error will propagate and we'll eventually return -EINVAL. This solely relies on the fact that clk_calc_new_rates(), and in particular clk_core_determine_round_nolock(), won't modify the new rate allowing the error to be reported. That assumption won't be true for all drivers, and most importantly we'll break that assumption in a later patch. It can also be argued that we shouldn't even reach the point where we're calling clk_core_set_rate_nolock(). Let's make an explicit check for disjoints range before we're doing anything. Signed-off-by: Maxime Ripard <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sascha Hauer <[email protected]> Date: Wed Jan 26 15:55:46 2022 +0100 clk: rockchip: drop CLK_SET_RATE_PARENT from dclk_vop* on rk3568 [ Upstream commit ff3187eabb5ce478d15b6ed62eb286756adefac3 ] The pixel clocks dclk_vop[012] can be clocked from hpll, vpll, gpll or cpll. gpll and cpll also drive many other clocks, so changing the dclk_vop[012] clocks could change these other clocks as well. Drop CLK_SET_RATE_PARENT to fix that. With this change the VOP2 driver can only adjust the pixel clocks with the divider between the PLL and the dclk_vop[012] which means the user may have to adjust the PLL clock to a suitable rate using the assigned-clock-rate device tree property. Signed-off-by: Sascha Hauer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Adam Wujek <[email protected]> Date: Fri Dec 3 14:12:07 2021 +0000 clk: si5341: fix reported clk_rate when output divider is 2 [ Upstream commit 2a8b539433e111c4de364237627ef219d2f6350a ] SI5341_OUT_CFG_RDIV_FORCE2 shall be checked first to distinguish whether a divider for a given output is set to 2 (SI5341_OUT_CFG_RDIV_FORCE2 is set) or the output is disabled (SI5341_OUT_CFG_RDIV_FORCE2 not set, SI5341_OUT_R_REG is set 0). Before the change, divider set to 2 (SI5341_OUT_R_REG set to 0) was interpreted as output is disabled. Signed-off-by: Adam Wujek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Robert Hancock <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Tony Lindgren <[email protected]> Date: Fri Feb 4 09:14:43 2022 +0200 clk: ti: Preserve node in ti_dt_clocks_register() [ Upstream commit 80864594ff2ad002e2755daf97d46ff0c86faf1f ] In preparation for making use of the clock-output-names, we want to keep node around in ti_dt_clocks_register(). This change should not needed as a fix currently. Signed-off-by: Tony Lindgren <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Pierre Gondois <[email protected]> Date: Tue Feb 8 09:01:09 2022 +0100 cpufreq: CPPC: Fix performance/frequency conversion [ Upstream commit ec1c7ad47664f964c1101fe555b6fde0cb124b38 ] CPUfreq governors request CPU frequencies using information on current CPU usage. The CPPC driver converts them to performance requests. Frequency targets are computed as: target_freq = (util / cpu_capacity) * max_freq target_freq is then clamped between [policy->min, policy->max]. The CPPC driver converts performance values to frequencies (and vice-versa) using cppc_cpufreq_perf_to_khz() and cppc_cpufreq_khz_to_perf(). These functions both use two different factors depending on the range of the input value. For cppc_cpufreq_khz_to_perf(): - (NOMINAL_PERF / NOMINAL_FREQ) or - (LOWEST_PERF / LOWEST_FREQ) and for cppc_cpufreq_perf_to_khz(): - (NOMINAL_FREQ / NOMINAL_PERF) or - ((NOMINAL_PERF - LOWEST_FREQ) / (NOMINAL_PERF - LOWEST_PERF)) This means: 1- the functions are not inverse for some values: (perf_to_khz(khz_to_perf(x)) != x) 2- cppc_cpufreq_perf_to_khz(LOWEST_PERF) can sometimes give a different value from LOWEST_FREQ due to integer approximation 3- it is implied that performance and frequency are proportional (NOMINAL_FREQ / NOMINAL_PERF) == (LOWEST_PERF / LOWEST_FREQ) This patch changes the conversion functions to an affine function. This fixes the 3 points above. Suggested-by: Lukasz Luba <[email protected]> Suggested-by: Morten Rasmussen <[email protected]> Signed-off-by: Pierre Gondois <[email protected]> Signed-off-by: Viresh Kumar <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jordy Zomer <[email protected]> Date: Sat Jan 29 15:58:39 2022 +0100 dm ioctl: prevent potential spectre v1 gadget [ Upstream commit cd9c88da171a62c4b0f1c70e50c75845969fbc18 ] It appears like cmd could be a Spectre v1 gadget as it's supplied by a user and used as an array index. Prevent the contents of kernel memory from being leaked to userspace via speculative execution by using array_index_nospec. Signed-off-by: Jordy Zomer <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Mike Snitzer <[email protected]> Date: Tue Feb 22 13:28:12 2022 -0500 dm: requeue IO if mapping table not yet available [ Upstream commit fa247089de9936a46e290d4724cb5f0b845600f5 ] Update both bio-based and request-based DM to requeue IO if the mapping table not available. This race of IO being submitted before the DM device ready is so narrow, yet possible for initial table load given that the DM device's request_queue is created prior, that it best to requeue IO to handle this unlikely case. Reported-by: Zhang Yi <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Vinod Koul <[email protected]> Date: Thu Mar 10 10:13:20 2022 +0530 dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error" commit d143f939a95696d38ff800ada14402fa50ebbd6c upstream. This reverts commit 455896c53d5b ("dmaengine: shdma: Fix runtime PM imbalance on error") as the patch wrongly reduced the count on error and did not bail out. So drop the count by reverting the patch . Signed-off-by: Vinod Koul <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Miaoqian Lin <[email protected]> Date: Mon Apr 4 12:53:36 2022 +0000 dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe [ Upstream commit 2b04bd4f03bba021959ca339314f6739710f0954 ] This node pointer is returned by of_find_compatible_node() with refcount incremented. Calling of_node_put() to aovid the refcount leak. Fixes: d346c9e86d86 ("dpaa2-ptp: reuse ptp_qoriq driver") Signed-off-by: Miaoqian Lin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Lv Yunlong <[email protected]> Date: Wed Apr 6 21:04:43 2022 +0200 drbd: Fix five use after free bugs in get_initial_state [ Upstream commit aadb22ba2f656581b2f733deb3a467c48cc618f6 ] In get_initial_state, it calls notify_initial_state_done(skb,..) if cb->args[5]==1. If genlmsg_put() failed in notify_initial_state_done(), the skb will be freed by nlmsg_free(skb). Then get_initial_state will goto out and the freed skb will be used by return value skb->len, which is a uaf bug. What's worse, the same problem goes even further: skb can also be freed in the notify_*_state_change -> notify_*_state calls below. Thus 4 additional uaf bugs happened. My patch lets the problem callee functions: notify_initial_state_done and notify_*_state_change return an error code if errors happen. So that the error codes could be propagated and the uaf bugs can be avoid. v2 reports a compilation warning. This v3 fixed this warning and built successfully in my local environment with no additional warnings. v2: https://lore.kernel.org/patchwork/patch/1435218/ Fixes: a29728463b254 ("drbd: Backport the "events2" command") Signed-off-by: Lv Yunlong <[email protected]> Reviewed-by: Christoph Böhmwalder <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Guilherme G. Piccoli <[email protected]> Date: Tue Mar 15 17:35:35 2022 -0300 Drivers: hv: vmbus: Fix potential crash on module unload [ Upstream commit 792f232d57ff28bbd5f9c4abe0466b23d5879dc8 ] The vmbus driver relies on the panic notifier infrastructure to perform some operations when a panic event is detected. Since vmbus can be built as module, it is required that the driver handles both registering and unregistering such panic notifier callback. After commit 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback") though, the panic notifier registration is done unconditionally in the module initialization routine whereas the unregistering procedure is conditionally guarded and executes only if HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE capability is set. This patch fixes that by unconditionally unregistering the panic notifier in the module's exit routine as well. Fixes: 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback") Signed-off-by: Guilherme G. Piccoli <[email protected]> Reviewed-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Andrea Parri (Microsoft) <[email protected]> Date: Mon Mar 28 17:44:57 2022 +0200 Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb() commit eaa03d34535872d29004cb5cf77dc9dec1ba9a25 upstream. Following the recommendation in Documentation/memory-barriers.txt for virtual machine guests. Fixes: 8b6a877c060ed ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels") Signed-off-by: Andrea Parri (Microsoft) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Xin Xiong <[email protected]> Date: Fri Jan 21 15:46:23 2022 -0500 drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj [ Upstream commit dfced44f122c500004a48ecc8db516bb6a295a1b ] This issue takes place in an error path in amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into default case, the function simply returns -EINVAL, forgetting to decrement the reference count of a dma_fence obj, which is bumped earlier by amdgpu_cs_get_fence(). This may result in reference count leaks. Fix it by decreasing the refcount of specific object before returning the error code. Reviewed-by: Christian König <[email protected]> Signed-off-by: Xin Xiong <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Dale Zhao <[email protected]> Date: Tue Dec 28 16:50:28 2021 +0800 drm/amd/display: Add signal type check when verify stream backends same [ Upstream commit 047db281c026de5971cedb5bb486aa29bd16a39d ] [Why] For allow eDP hot-plug feature, the stream signal may change to VIRTUAL when plug-out and back to eDP when plug-in. OS will still setPathMode with same timing for each plugging, but eDP gets no stream update as we don't check signal type changing back as keeping it VIRTUAL. It's also unsafe for future cases that stream signal is switched with same timing. [How] Check stream signal type change include previous HDMI signal case. Reviewed-by: Aric Cyr <[email protected]> Acked-by: Wayne Lin <[email protected]> Signed-off-by: Dale Zhao <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Yongzhi Liu <[email protected]> Date: Fri Jan 21 11:26:13 2022 +0000 drm/amd/display: Fix memory leak [ Upstream commit 5d5c6dba2b43e28845d7d7ed32a36802329a5f52 ] [why] Resource release is needed on the error handling path to prevent memory leak. [how] Fix this by adding kfree on the error handling path. Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Yongzhi Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Nicholas Kazlauskas <[email protected]> Date: Sun Jan 23 13:20:04 2022 -0500 drm/amd/display: Use PSR version selected during set_psr_caps [ Upstream commit b80ddeb29d9df449f875f0b6f5de08d7537c02b8 ] [Why] If the DPCD caps specifies a PSR version newer than PSR_VERSION_1 then we fallback to using PSR_VERSION_1 in amdgpu_dm_set_psr_caps. This gets overriden with the raw DPCD value in amdgpu_dm_link_setup_psr, which can result in DMCUB hanging if we pass in an unsupported PSR version number. [How] Fix the hang by using link->psr_settings.psr_version directly during amdgpu_dm_link_setup_psr. Tested-by: Daniel Wheeler <[email protected]> Reviewed-by: Anthony Koo <[email protected]> Acked-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Benjamin Marty <[email protected]> Date: Wed Mar 23 22:08:26 2022 +0100 drm/amdgpu/display: change pipe policy for DCN 2.1 commit 879791ad8bf3dc5453061cad74776a617b6e3319 upstream. Fixes crash on MST Hub disconnect. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1849 Fixes: ee2698cf79cc ("drm/amd/display: Changed pipe split policy to allow for multi-display pipe split") Signed-off-by: Benjamin Marty <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Alex Deucher <[email protected]> Date: Fri Apr 1 11:08:48 2022 -0400 drm/amdgpu/smu10: fix SoC/fclk units in auto mode commit 2f25d8ce09b7ba5d769c132ba3d4eb84a941d2cb upstream. SMU takes clock limits in Mhz units. socclk and fclk were using 10 khz units in some cases. Switch to Mhz units. Fixes higher than required SoC clocks. Fixes: 97cf32996c46d9 ("drm/amd/pm: Removed fixed clock in auto mode DPM") Reviewed-by: Paul Menzel <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Emily Deng <[email protected]> Date: Mon Mar 21 16:25:24 2022 +0800 drm/amdgpu/vcn: Fix the register setting for vcn1 commit 02fc996d5098f4c3f65bdf6cdb6b28e3f29ba789 upstream. Correct the code error for setting register UVD_GFX10_ADDR_CONFIG. Need to use inst_idx, or it only will set VCN0. Signed-off-by: Emily Deng <[email protected]> Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Alex Deucher <[email protected]> Date: Fri Mar 25 11:53:39 2022 -0400 drm/amdgpu: don't use BACO for reset in S3 commit ebc002e3ee78409c42156e62e4e27ad1d09c5a75 upstream. Seems to cause a reboots or hangs on some systems. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1924 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1953 Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Dan Carpenter <[email protected]> Date: Wed Mar 16 11:41:48 2022 +0300 drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire() [ Upstream commit 1647b54ed55d4d48c7199d439f8834626576cbe9 ] This post-op should be a pre-op so that we do not pass -1 as the bit number to test_bit(). The current code will loop downwards from 63 to -1. After changing to a pre-op, it loops from 63 to 0. Fixes: 71c37505e7ea ("drm/amdgpu/gfx: move more common KIQ code to amdgpu_gfx.c") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Rajneesh Bhardwaj <[email protected]> Date: Thu Feb 3 21:18:21 2022 -0500 drm/amdgpu: Fix recursive locking warning [ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ] Noticed the below warning while running a pytorch workload on vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks. [ +0.000003] WARNING: possible recursive locking detected [ +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted [ +0.000004] -------------------------------------------- [ +0.000002] python/4822 is trying to acquire lock: [ +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000203] but task is already holding lock: [ +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000017] other info that might help us debug this: [ +0.000002] Possible unsafe locking scenario: [ +0.000003] CPU0 [ +0.000002] ---- [ +0.000002] lock(reservation_ww_class_mutex); [ +0.000004] lock(reservation_ww_class_mutex); [ +0.000003] *** DEADLOCK *** [ +0.000002] May be due to missing lock nesting notation [ +0.000003] 7 locks held by python/4822: [ +0.000003] #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at: kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu] [ +0.000232] #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu] [ +0.000241] #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu] [ +0.000236] #3: ffffb2b35606fd28 (reservation_ww_class_acquire){+.+.}-{0:0}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu] [ +0.000235] #4: ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000015] #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at: drm_dev_enter+0x5/0xa0 [drm] [ +0.000038] #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3}, at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu] [ +0.000195] stack backtrace: [ +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted 5.13.0-kfd-rajneesh #1030 [ +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018 [ +0.000003] Call Trace: [ +0.000003] dump_stack+0x6d/0x89 [ +0.000010] __lock_acquire+0xb93/0x1a90 [ +0.000009] lock_acquire+0x25d/0x2d0 [ +0.000005] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060 [ +0.000007] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] ? lock_release+0x13f/0x270 [ +0.000005] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm] [ +0.000010] amdgpu_bo_unref+0x1a/0x30 [amdgpu] [ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu] [ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu] [ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu] [ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu] [ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu] [ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu] [ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu] [ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60 [amdgpu] [ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu] [ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu] [ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu] [ +0.000216] ? lock_release+0x13f/0x270 [ +0.000006] ? __fget_files+0x107/0x1e0 [ +0.000007] __x64_sys_ioctl+0x8b/0xd0 [ +0.000007] do_syscall_64+0x36/0x70 [ +0.000004] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000007] RIP: 0033:0x7fbff90a7317 [ +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 [ +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX: 00007fbff90a7317 [ +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI: 0000000000000004 [ +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09: 00007fbcc402d880 [ +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12: 00000000c0184b18 [ +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15: 00007fbcc402d820 Cc: Christian König <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Lee Jones <[email protected]> Date: Thu Mar 31 13:21:17 2022 +0100 drm/amdkfd: Create file descriptor after client is added to smi_clients list commit e79a2398e1b2d47060474dca291542368183bc0f upstream. This ensures userspace cannot prematurely clean-up the client before it is fully initialised which has been proven to cause issues in the past. Cc: Felix Kuehling <[email protected]> Cc: Alex Deucher <[email protected]> Cc: "Christian König" <[email protected]> Cc: "Pan, Xinhui" <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Lee Jones <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Philip Yang <[email protected]> Date: Mon Jan 24 16:40:44 2022 -0500 drm/amdkfd: Don't take process mutex for svm ioctls [ Upstream commit ac7c48c0cce00d03b3c95fddcccb0a45257e33e3 ] SVM ioctls take proper svms->lock to handle race conditions, don't need take process mutex to serialize ioctls. This also fixes circular locking warning: WARNING: possible circular locking dependency detected Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex); lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex); *** DEADLOCK *** Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Alex Deucher <[email protected]> Date: Fri Feb 18 15:40:12 2022 -0500 drm/amdkfd: make CRAT table missing message informational only [ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ] The driver has a fallback so make the message informational rather than a warning. The driver has a fallback if the Component Resource Association Table (CRAT) is missing, so make this informational now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906 Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Yongzhi Liu <[email protected]> Date: Sun Jan 23 23:20:35 2022 -0800 drm/bridge: Add missing pm_runtime_put_sync [ Upstream commit 46f47807738441e354873546dde0b000106c068a ] pm_runtime_get_sync() will increase the rumtime PM counter even when it returns an error. Thus a pairing decrement is needed to prevent refcount leak. Fix this by replacing this API with pm_runtime_resume_and_get(), which will not change the runtime PM counter on error. Besides, a matching decrement is needed on the error handling path to keep the counter balanced. Signed-off-by: Yongzhi Liu <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Signed-off-by: Robert Foss <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Liu Ying <[email protected]> Date: Fri Jan 28 17:19:44 2022 +0800 drm/imx: dw_hdmi-imx: Fix bailout in error cases of probe [ Upstream commit e8083acc3f8cc2097917018e947fd4c857f60454 ] In dw_hdmi_imx_probe(), if error happens after dw_hdmi_probe() returns successfully, dw_hdmi_remove() should be called where necessary as bailout. Fixes: c805ec7eb210 ("drm/imx: dw_hdmi-imx: move initialization into probe") Cc: Philipp Zabel <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Shawn Guo <[email protected]> Cc: Sascha Hauer <[email protected]> Cc: Pengutronix Kernel Team <[email protected]> Cc: Fabio Estevam <[email protected]> Cc: NXP Linux Team <[email protected]> Signed-off-by: Liu Ying <[email protected]> Signed-off-by: Philipp Zabel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: José Expósito <[email protected]> Date: Sat Jan 8 17:52:30 2022 +0100 drm/imx: Fix memory leak in imx_pd_connector_get_modes [ Upstream commit bce81feb03a20fca7bbdd1c4af16b4e9d5c0e1d3 ] Avoid leaking the display mode variable if of_get_drm_display_mode fails. Fixes: 76ecd9c9fb24 ("drm/imx: parallel-display: check return code from of_get_drm_display_mode()") Addresses-Coverity-ID: 1443943 ("Resource leak") Signed-off-by: José Expósito <[email protected]> Signed-off-by: Philipp Zabel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Jiasheng Jiang <[email protected]> Date: Wed Jan 5 15:47:29 2022 +0800 drm/imx: imx-ldb: Check for null pointer after calling kmemdup [ Upstream commit 8027a9ad9b3568c5eb49c968ad6c97f279d76730 ] As the possible failure of the allocation, kmemdup() may return NULL pointer. Therefore, it should be better to check the return value of kmemdup() and return error if fails. Fixes: dc80d7038883 ("drm/imx-ldb: Add support to drm-bridge") Signed-off-by: Jiasheng Jiang <[email protected]> Signed-off-by: Philipp Zabel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Daniel Thompson <[email protected]> Date: Tue Feb 1 17:47:32 2022 +0000 drm/msm/dsi: Remove spurious IRQF_ONESHOT flag [ Upstream commit 24b176d8827d167ac3b379317f60c0985f6e95aa ] Quoting the header comments, IRQF_ONESHOT is "Used by threaded interrupts which need to keep the irq line disabled until the threaded handler has been run.". When applied to an interrupt that doesn't request a threaded irq then IRQF_ONESHOT has a lesser known (undocumented?) side effect, which it to disable the forced threading of irqs (and for "normal" kernels it is a nop). In this case I can find no evidence that suppressing forced threading is intentional. Had it been intentional then a driver must adopt the raw_spinlock API in order to avoid deadlocks on PREEMPT_RT kernels (and avoid calling any kernel API that uses regular spinlocks). Fix this by removing the spurious additional flag. This change is required for my Snapdragon 7cx Gen2 tablet to boot-to-GUI with PREEMPT_RT enabled. Signed-off-by: Daniel Thompson <[email protected]> Reviewed-by: Dmitry Baryshkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Baryshkov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Karol Herbst <[email protected]> Date: Tue Mar 22 13:48:00 2022 +0100 drm/nouveau/pmu: Add missing callbacks for Tegra devices commit 38d4e5cf5b08798f093374e53c2f4609d5382dd5 upstream. Fixes a crash booting on those platforms with nouveau. Fixes: 4cdd2450bf73 ("drm/nouveau/pmu/gm200-: use alternate falcon reset sequence") Cc: Ben Skeggs <[email protected]> Cc: Karol Herbst <[email protected]> Cc: [email protected] Cc: [email protected] Cc: <[email protected]> # v5.17+ Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Lyude Paul <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Daniel Mack <[email protected]> Date: Thu Mar 17 23:55:37 2022 +0100 drm/panel: ili9341: fix optional regulator handling commit d14eb80e27795b7b20060f7b151cdfe39722a813 upstream. If the optional regulator lookup fails, reset the pointer to NULL. Other functions such as mipi_dbi_poweron_reset_conditional() only do a NULL pointer check and will otherwise dereference the error pointer. Fixes: 5a04227326b04c15 ("drm/panel: Add ilitek ili9341 panel driver") Signed-off-by: Daniel Mack <[email protected]> Cc: [email protected] Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yongzhi Liu <[email protected]> Date: Fri Jan 28 05:41:02 2022 -0800 drm/v3d: fix missing unlock [ Upstream commit e57c1a3bd5e8e0c7181f65ae55581f0236a8f284 ] [why] Unlock is needed on the error handling path to prevent dead lock. v3d_submit_cl_ioctl and v3d_submit_csd_ioctl is missing unlock. [how] Fix this by changing goto target on the error handling path. So changing the goto to target an error handling path that includes drm_gem_unlock reservations. Signed-off-by: Yongzhi Liu <[email protected]> Reviewed-by: Melissa Wen <[email protected]> Signed-off-by: Melissa Wen <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Anisse Astier <[email protected]> Date: Wed Dec 29 23:22:00 2021 +0100 drm: Add orientation quirk for GPD Win Max [ Upstream commit 0b464ca3e0dd3cec65f28bc6d396d82f19080f69 ] Panel is 800x1280, but mounted on a laptop form factor, sideways. Signed-off-by: Anisse Astier <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Thomas Zimmermann <[email protected]> Date: Mon Apr 4 21:44:02 2022 +0200 fbdev: Fix unregistering of framebuffers without device commit 0f525289ff0ddeb380813bd81e0f9bdaaa1c9078 upstream. OF framebuffers do not have an underlying device in the Linux device hierarchy. Do a regular unregister call instead of hot unplugging such a non-existing device. Fixes a NULL dereference. An example error message on ppc64le is shown below. BUG: Kernel NULL pointer dereference on read at 0x00000060 Faulting instruction address: 0xc00000000080dfa4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [...] CPU: 2 PID: 139 Comm: systemd-udevd Not tainted 5.17.0-ae085d7f9365 #1 NIP: c00000000080dfa4 LR: c00000000080df9c CTR: c000000000797430 REGS: c000000004132fe0 TRAP: 0300 Not tainted (5.17.0-ae085d7f9365) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28228282 XER: 20000000 CFAR: c00000000000c80c DAR: 0000000000000060 DSISR: 40000000 IRQMASK: 0 GPR00: c00000000080df9c c000000004133280 c00000000169d200 0000000000000029 GPR04: 00000000ffffefff c000000004132f90 c000000004132f88 0000000000000000 GPR08: c0000000015658f8 c0000000015cd200 c0000000014f57d0 0000000048228283 GPR12: 0000000000000000 c00000003fffe300 0000000020000000 0000000000000000 GPR16: 0000000000000000 0000000113fc4a40 0000000000000005 0000000113fcfb80 GPR20: 000001000f7283b0 0000000000000000 c000000000e4a588 c000000000e4a5b0 GPR24: 0000000000000001 00000000000a0000 c008000000db0168 c0000000021f6ec0 GPR28: c0000000016d65a8 c000000004b36460 0000000000000000 c0000000016d64b0 NIP [c00000000080dfa4] do_remove_conflicting_framebuffers+0x184/0x1d0 [c000000004133280] [c00000000080df9c] do_remove_conflicting_framebuffers+0x17c/0x1d0 (unreliable) [c000000004133350] [c00000000080e4d0] remove_conflicting_framebuffers+0x60/0x150 [c0000000041333a0] [c00000000080e6f4] remove_conflicting_pci_framebuffers+0x134/0x1b0 [c000000004133450] [c008000000e70438] drm_aperture_remove_conflicting_pci_framebuffers+0x90/0x100 [drm] [c000000004133490] [c008000000da0ce4] bochs_pci_probe+0x6c/0xa64 [bochs] [...] [c000000004133db0] [c00000000002aaa0] system_call_exception+0x170/0x2d0 [c000000004133e10] [c00000000000c3cc] system_call_common+0xec/0x250 The bug [1] was introduced by commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal"). Most firmware framebuffers have an underlying platform device, which can be hot-unplugged before loading the native graphics driver. OF framebuffers do not (yet) have that device. Fix the code by unregistering the framebuffer as before without a hot unplug. Tested with 5.17 on qemu ppc64le emulation. Signed-off-by: Thomas Zimmermann <[email protected]> Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") Reported-by: Sudip Mukherjee <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Javier Martinez Canillas <[email protected]> Tested-by: Sudip Mukherjee <[email protected]> Cc: Zack Rusin <[email protected]> Cc: Javier Martinez Canillas <[email protected]> Cc: Hans de Goede <[email protected]> Cc: [email protected] # v5.11+ Cc: Helge Deller <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Zheyu Ma <[email protected]> Cc: Xiyu Yang <[email protected]> Cc: Zhen Lei <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/all/YkHXO6LGHAN0p1pq@debian/ # [1] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Shreeya Patel <[email protected]> Date: Mon Mar 21 19:02:41 2022 +0530 gpio: Restrict usage of GPIO chip irq members before initialization commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320 upstream. GPIO chip irq members are exposed before they could be completely initialized and this leads to race conditions. One such issue was observed for the gc->irq.domain variable which was accessed through the I2C interface in gpiochip_to_irq() before it could be initialized by gpiochip_add_irqchip(). This resulted in Kernel NULL pointer dereference. Following are the logs for reference :- kernel: Call Trace: kernel: gpiod_to_irq+0x53/0x70 kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0 kernel: i2c_acpi_get_irq+0xc0/0xd0 kernel: i2c_device_probe+0x28a/0x2a0 kernel: really_probe+0xf2/0x460 kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0 To avoid such scenarios, restrict usage of GPIO chip irq members before they are completely initialized. Signed-off-by: Shreeya Patel <[email protected]> Cc: [email protected] Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Ohad Sharabi <[email protected]> Date: Mon Jan 3 09:48:27 2022 +0200 habanalabs: fix possible memory leak in MMU DR fini [ Upstream commit eb85eec858c1a5c11d3a0bff403f6440b05b40dc ] This patch fixes what seems to be copy paste error. We will have a memory leak if the host-resident shadow is NULL (which will likely happen as the DR and HR are not dependent). Signed-off-by: Ohad Sharabi <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Max Filippov <[email protected]> Date: Fri Apr 8 13:08:55 2022 -0700 highmem: fix checks in __kmap_local_sched_{in,out} commit 66f133ceab7456c789f70a242991ed1b27ba1c3d upstream. When CONFIG_DEBUG_KMAP_LOCAL is enabled __kmap_local_sched_{in,out} check that even slots in the tsk->kmap_ctrl.pteval are unmapped. The slots are initialized with 0 value, but the check is done with pte_none. 0 pte however does not necessarily mean that pte_none will return true. e.g. on xtensa it returns false, resulting in the following runtime warnings: WARNING: CPU: 0 PID: 101 at mm/highmem.c:627 __kmap_local_sched_out+0x51/0x108 CPU: 0 PID: 101 Comm: touch Not tainted 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_out+0x51/0x108 __schedule+0x71a/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f WARNING: CPU: 0 PID: 101 at mm/highmem.c:664 __kmap_local_sched_in+0x50/0xe0 CPU: 0 PID: 101 Comm: touch Tainted: G W 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_in+0x50/0xe0 finish_task_switch$isra$0+0x1ce/0x2f8 __schedule+0x86e/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f Fix it by replacing !pte_none(pteval) with pte_val(pteval) != 0. Link: https://lkml.kernel.org/r/[email protected] Fixes: 5fbda3ecd14a ("sched: highmem: Store local kmaps in task struct") Signed-off-by: Max Filippov <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Mark Zhang <[email protected]> Date: Mon Apr 4 11:58:05 2022 +0300 IB/cm: Cancel mad on the DREQ event when the state is MRA_REP_RCVD [ Upstream commit 107dd7beba403a363adfeb3ffe3734fe38a05cce ] On the passive side when the disconnectReq event comes, if the current state is MRA_REP_RCVD, it needs to cancel the MAD before entering the DREQ_RCVD and TIMEWAIT states, otherwise the destroy_id may block until this mad will reach timeout. Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/75261c00c1d82128b1d981af9ff46e994186e621.1649062436.git.leonro@nvidia.com Signed-off-by: Mark Zhang <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Niels Dossche <[email protected]> Date: Mon Feb 28 17:53:30 2022 +0100 IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition [ Upstream commit 4d809f69695d4e7d1378b3a072fa9aef23123018 ] The documentation of the function rvt_error_qp says both r_lock and s_lock need to be held when calling that function. It also asserts using lockdep that both of those locks are held. However, the commit I referenced in Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no longer covered by r_lock. This results in the lockdep assertion failing and also possibly in a race condition. Fixes: d757c60eca9b ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Niels Dossche <[email protected]> Acked-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ivan Vecera <[email protected]> Date: Thu Mar 31 09:20:06 2022 -0700 ice: Clear default forwarding VSI during VSI release [ Upstream commit bd8c624c0cd59de0032752ba3001c107bba97f7b ] VSI is set as default forwarding one when promisc mode is set for PF interface, when PF is switched to switchdev mode or when VF driver asks to enable allmulticast or promisc mode for the VF interface (when vf-true-promisc-support priv flag is off). The third case is buggy because in that case VSI associated with VF remains as default one after VF removal. Reproducer: 1. Create VF echo 1 > sys/class/net/ens7f0/device/sriov_numvfs 2. Enable allmulticast or promisc mode on VF ip link set ens7f0v0 allmulticast on ip link set ens7f0v0 promisc on 3. Delete VF echo 0 > sys/class/net/ens7f0/device/sriov_numvfs 4. Try to enable promisc mode on PF ip link set ens7f0 promisc on Although it looks that promisc mode on PF is enabled the opposite is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC handling first checks if any other VSI is set as default forwarding one and if so the function does not do anything. At this point it is not possible to enable promisc mode on PF without re-probe device. To resolve the issue this patch clear default forwarding VSI during ice_vsi_release() when the VSI to be released is the default one. Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Michal Swiatkowski <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Alice Michael <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Anatolii Gerasymenko <[email protected]> Date: Mon Apr 4 11:35:48 2022 -0700 ice: Do not skip not enabled queues in ice_vc_dis_qs_msg [ Upstream commit 05ef6813b234db3196f083b91db3963f040b65bb ] Disable check for queue being enabled in ice_vc_dis_qs_msg, because there could be a case when queues were created, but were not enabled. We still need to delete those queues. Normal workflow for VF looks like: Enable path: VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10) VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6) VIRTCHNL_OP_ENABLE_QUEUES (opcode 8) Disable path: VIRTCHNL_OP_DISABLE_QUEUES (opcode 9) VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11) The issue appears only in stress conditions when VF is enabled and disabled very fast. Eventually there will be a case, when queues are created by VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by VIRTCHNL_OP_ENABLE_QUEUES. In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES, because there is a check whether queues are enabled in ice_vc_dis_qs_msg. When we bring up the VF again, we will see the "Failed to set LAN Tx queue context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This happens because old 16 queues were not deleted and VF requests to create 16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to find a parent node for first newly requested queue (because all nodes are allocated to 16 old queues). Testing Hints: Just enable and disable VF fast enough, so it would be disabled before reaching VIRTCHNL_OP_ENABLE_QUEUES. while true; do ip link set dev ens785f0v0 up sleep 0.065 # adjust delay value for you machine ip link set dev ens785f0v0 down done Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Anatolii Gerasymenko <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Alice Michael <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Anatolii Gerasymenko <[email protected]> Date: Mon Apr 4 11:35:47 2022 -0700 ice: Set txq_teid to ICE_INVAL_TEID on ring creation [ Upstream commit ccfee1822042b87e5135d33cad8ea353e64612d2 ] When VF is freshly created, but not brought up, ring->txq_teid value is by default set to 0. But 0 is a valid TEID. On some platforms the Root Node of Tx scheduler has a TEID = 0. This can cause issues as shown below. The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF). Testing Hints: echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs ip link set dev ens785f0v0 up ip link set dev ens785f0v0 down If we have freshly created VF and quickly turn it on and off, so there would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error: [ 639.531454] disable queue 89 failed 14 [ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR [ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5 The reason for the fail is that we are trying to send AQ command to delete queue 89, which has never been created and receive an "invalid argument" error from firmware. As this queue has never been created, it's teid and ring->txq_teid have default value 0. ice_dis_vsi_txq has a check against non-existent queues: node = ice_sched_find_node_by_teid(pi->root, q_teids[i]); if (!node) continue; But on some platforms the Root Node of Tx scheduler has a teid = 0. Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is pi->root), and we go further to submit an erroneous request to firmware. Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7") Signed-off-by: Anatolii Gerasymenko <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Alice Michael <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Maciej Fijalkowski <[email protected]> Date: Thu Mar 17 19:36:27 2022 +0100 ice: synchronize_rcu() when terminating rings [ Upstream commit f9124c68f05ffdb87a47e3ea6d5fae9dad7cb6eb ] Unfortunately, the ice driver doesn't respect the RCU critical section that XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to paths that destroy resources that might be in use. This was addressed in other AF_XDP ZC enabled drivers, for reference see for example commit b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK") Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Shwetha Nagaraju <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Maciej Fijalkowski <[email protected]> Date: Thu Mar 17 19:36:28 2022 +0100 ice: xsk: fix VSI state check in ice_xsk_wakeup() [ Upstream commit 72b915a2b444e9247c9d424a840e94263db07c27 ] ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on vsi->state in ice_xsk_wakeup(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski <[email protected]> Tested-by: Shwetha Nagaraju <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Randy Dunlap <[email protected]> Date: Wed Mar 23 16:06:14 2022 -0700 init/main.c: return 1 from handled __setup() functions [ Upstream commit f9a40b0890658330c83c95511f9d6b396610defc ] initcall_blacklist() should return 1 to indicate that it handled its cmdline arguments. set_debug_rodata() should return 1 to indicate that it handled its cmdline arguments. Print a warning if the option string is invalid. This prevents these strings from being added to the 'init' program's environment as they are not init arguments/parameters. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Reported-by: Igor Zhbanov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jens Axboe <[email protected]> Date: Tue Mar 29 10:59:20 2022 -0600 io_uring: defer splice/tee file validity check until command issue commit a3e4bc23d5470b2beb7cc42a86b6a3e75b704c15 upstream. In preparation for not using the file at prep time, defer checking if this file refers to a valid io_uring instance until issue time. This also means we can get rid of the cleanup flag for splice and tee. Cc: [email protected] # v5.15+ Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jens Axboe <[email protected]> Date: Wed Mar 30 11:06:02 2022 -0600 io_uring: don't check req->file in io_fsync_prep() commit ec858afda857e361182ceafc3d2ba2b164b8e889 upstream. This is a leftover from the really old days where we weren't able to track and error early if we need a file and it wasn't assigned. Kill the check. Cc: [email protected] # v5.15+ Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Pavel Begunkov <[email protected]> Date: Wed Apr 6 12:43:58 2022 +0100 io_uring: don't touch scm_fp_list after queueing skb [ Upstream commit a07211e3001435fe8591b992464cd8d5e3c98c5a ] It's safer to not touch scm_fp_list after we queued an skb to which it was assigned, there might be races lurking if we screw subtle sync guarantees on the io_uring side. Fixes: 6b06314c47e14 ("io_uring: add file set registration") Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jens Axboe <[email protected]> Date: Fri Apr 8 11:08:58 2022 -0600 io_uring: fix race between timeout flush and removal commit e677edbcabee849bfdd43f1602bccbecf736a646 upstream. io_flush_timeouts() assumes the timeout isn't in progress of triggering or being removed/canceled, so it unconditionally removes it from the timeout list and attempts to cancel it. Leave it on the list and let the normal timeout cancelation take care of it. Cc: [email protected] # 5.5+ Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Eugene Syromiatnikov <[email protected]> Date: Wed Apr 6 13:55:33 2022 +0200 io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF commit 0f5e4b83b37a96e3643951588ed7176b9b187c0a upstream. Similarly to the way it is done im mbind syscall. Cc: [email protected] # 5.14 Fixes: fe76421d1da1dcdb ("io_uring: allow user configurable IO thread CPU affinity") Signed-off-by: Eugene Syromiatnikov <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Pavel Begunkov <[email protected]> Date: Wed Apr 6 12:43:57 2022 +0100 io_uring: nospec index for tags on files update [ Upstream commit 34bb77184123ae401100a4d156584f12fa630e5c ] Don't forget to array_index_nospec() for indexes before updating rsrc tags in __io_sqe_files_update(), just use already safe and precalculated index @i. Fixes: c3bdad0271834 ("io_uring: add generic rsrc update with tags") Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Zhou Guanghui <[email protected]> Date: Wed Jan 19 07:07:54 2022 +0000 iommu/arm-smmu-v3: fix event handling soft lockup [ Upstream commit 30de2b541af98179780054836b48825fcfba4408 ] During event processing, events are read from the event queue one by one until the queue is empty.If the master device continuously requests address access at the same time and the SMMU generates events, the cyclic processing of the event takes a long time and softlockup warnings may be reported. arm-smmu-v3 arm-smmu-v3.34.auto: event 0x0a received: arm-smmu-v3 arm-smmu-v3.34.auto: 0x00007f220000280a arm-smmu-v3 arm-smmu-v3.34.auto: 0x000010000000007e arm-smmu-v3 arm-smmu-v3.34.auto: 0x00000000034e8670 watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [irq/268-arm-smm:247] Call trace: _dev_info+0x7c/0xa0 arm_smmu_evtq_thread+0x1c0/0x230 irq_thread_fn+0x30/0x80 irq_thread+0x128/0x210 kthread+0x134/0x138 ret_from_fork+0x10/0x1c Kernel panic - not syncing: softlockup: hung tasks Fix this by calling cond_resched() after the event information is printed. Signed-off-by: Zhou Guanghui <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Tony Lindgren <[email protected]> Date: Thu Mar 31 09:23:01 2022 +0300 iommu/omap: Fix regression in probe for NULL pointer dereference [ Upstream commit 71ff461c3f41f6465434b9e980c01782763e7ad8 ] Commit 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") started triggering a NULL pointer dereference for some omap variants: __iommu_probe_device from probe_iommu_group+0x2c/0x38 probe_iommu_group from bus_for_each_dev+0x74/0xbc bus_for_each_dev from bus_iommu_probe+0x34/0x2e8 bus_iommu_probe from bus_set_iommu+0x80/0xc8 bus_set_iommu from omap_iommu_init+0x88/0xcc omap_iommu_init from do_one_initcall+0x44/0x24 This is caused by omap iommu probe returning 0 instead of ERR_PTR(-ENODEV) as noted by Jason Gunthorpe <[email protected]>. Looks like the regression already happened with an earlier commit 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs") that changed the function return type and missed converting one place. Cc: Drew Fustini <[email protected]> Cc: Lu Baolu <[email protected]> Cc: Suman Anna <[email protected]> Suggested-by: Jason Gunthorpe <[email protected]> Fixes: 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs") Fixes: 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") Signed-off-by: Tony Lindgren <[email protected]> Tested-by: Drew Fustini <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ido Schimmel <[email protected]> Date: Sat Feb 19 17:45:19 2022 +0200 ipv4: Invalidate neighbour for broadcast address upon address addition [ Upstream commit 0c51e12e218f20b7d976158fdc18019627326f7a ] In case user space sends a packet destined to a broadcast address when a matching broadcast route is not configured, the kernel will create a unicast neighbour entry that will never be resolved [1]. When the broadcast route is configured, the unicast neighbour entry will not be invalidated and continue to linger, resulting in packets being dropped. Solve this by invalidating unresolved neighbour entries for broadcast addresses after routes for these addresses are internally configured by the kernel. This allows the kernel to create a broadcast neighbour entry following the next route lookup. Another possible solution that is more generic but also more complex is to have the ARP code register a listener to the FIB notification chain and invalidate matching neighbour entries upon the addition of broadcast routes. It is also possible to wave off the issue as a user space problem, but it seems a bit excessive to expect user space to be that intimately familiar with the inner workings of the FIB/neighbour kernel code. [1] https://lore.kernel.org/netdev/[email protected]/ Reported-by: Wang Hai <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Tested-by: Wang Hai <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: David Ahern <[email protected]> Date: Mon Apr 4 09:09:08 2022 -0600 ipv6: Fix stats accounting in ip6_pkt_drop [ Upstream commit 1158f79f82d437093aeed87d57df0548bdd68146 ] VRF devices are the loopbacks for VRFs, and a loopback can not be assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should be '||' not '&&'. Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach") Reported-by: Pudak, Filip <[email protected]> Reported-by: Xiao, Jiguang <[email protected]> Signed-off-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Eric Dumazet <[email protected]> Date: Fri Feb 4 12:15:45 2022 -0800 ipv6: make mc_forwarding atomic [ Upstream commit 145c7a793838add5e004e7d49a67654dc7eba147 ] This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6() Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Andre Przywara <[email protected]> Date: Mon Apr 4 12:08:42 2022 +0100 irqchip/gic, gic-v3: Prevent GSI to SGI translations commit 544808f7e21cb9ccdb8f3aa7de594c05b1419061 upstream. At the moment the GIC IRQ domain translation routine happily converts ACPI table GSI numbers below 16 to GIC SGIs (Software Generated Interrupts aka IPIs). On the Devicetree side we explicitly forbid this translation, actually the function will never return HWIRQs below 16 when using a DT based domain translation. We expect SGIs to be handled in the first part of the function, and any further occurrence should be treated as a firmware bug, so add a check and print to report this explicitly and avoid lengthy debug sessions. Fixes: 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard interrupts") Signed-off-by: Andre Przywara <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Marc Zyngier <[email protected]> Date: Tue Mar 15 16:50:32 2022 +0000 irqchip/gic-v3: Fix GICR_CTLR.RWP polling commit 0df6664531a12cdd8fc873f0cac0dcb40243d3e9 upstream. It turns out that our polling of RWP is totally wrong when checking for it in the redistributors, as we test the *distributor* bit index, whereas it is a different bit number in the RDs... Oopsie boo. This is embarassing. Not only because it is wrong, but also because it took *8 years* to notice the blunder... Just fix the damn thing. Fixes: 021f653791ad ("irqchip: gic-v3: Initial support for GICv3") Signed-off-by: Marc Zyngier <[email protected]> Cc: [email protected] Reviewed-by: Andre Przywara <[email protected]> Reviewed-by: Lorenzo Pieralisi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Marc Zyngier <[email protected]> Date: Thu Mar 17 09:49:02 2022 +0000 irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling commit af27e41612ec7e5b4783f589b753a7c31a37aac8 upstream. The way KVM drives GICv4.{0,1} is as follows: - vcpu_load() makes the VPE resident, instructing the RD to start scanning for interrupts - just before entering the guest, we check that the RD has finished scanning and that we can start running the vcpu - on preemption, we deschedule the VPE by making it invalid on the RD However, we are preemptible between the first two steps. If it so happens *and* that the RD was still scanning, we nonetheless write to the GICR_VPENDBASER register while Dirty is set, and bad things happen (we're in UNPRED land). This affects both the 4.0 and 4.1 implementations. Make sure Dirty is cleared before performing the deschedule, meaning that its_clear_vpend_valid() becomes a sort of full VPE residency barrier. Reported-by: Jingyi Wang <[email protected]> Tested-by: Nianyao Tang <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Fixes: 57e3cebd022f ("KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Ilan Peer <[email protected]> Date: Fri Feb 4 12:25:00 2022 +0200 iwlwifi: mvm: Correctly set fragmented EBS [ Upstream commit d8d4dd26b9e0469baf5017f0544d852fd4e3fb6d ] Currently, fragmented EBS was set for a channel only if the 'hb_type' was set to fragmented or balanced scan. However, 'hb_type' is set only in case of CDB, and thus fragmented EBS is never set for a channel for non-CDB devices. Fix it. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20220204122220.a6165ac9b9d5.I654eafa62fd647030ae6d4f07f32c96c3171decb@changeid Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Miri Korenblit <[email protected]> Date: Thu Feb 10 18:22:33 2022 +0200 iwlwifi: mvm: move only to an enabled channel [ Upstream commit e04135c07755d001b5cde61048c69a7cc84bb94b ] During disassociation we're decreasing the phy's ref count. If the ref count becomes 0, we're configuring the phy ctxt to the default channel (the lowest channel which the device can operate on). Currently we're not checking whether the the default channel is enabled or not. Fix it by configuring the phy ctxt to the lowest channel which is enabled. Signed-off-by: Miri Korenblit <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20220210181930.03f281b6a6bc.I5b63d43ec41996d599e6f37ec3f32e878b3e405e@changeid Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Haimin Zhang <[email protected]> Date: Tue Mar 22 21:59:17 2022 +0800 jfs: prevent NULL deref in diFree [ Upstream commit a53046291020ec41e09181396c1e829287b48d47 ] Add validation check for JFS_IP(ipimap)->i_imap to prevent a NULL deref in diFree since diFree uses it without do any validations. When function jfs_mount calls diMount to initialize fileset inode allocation map, it can fail and JFS_IP(ipimap)->i_imap won't be initialized. Then it calls diFreeSpecial to close fileset inode allocation map inode and it will flow into jfs_evict_inode. Function jfs_evict_inode just validates JFS_SBI(inode->i_sb)->ipimap, then calls diFree. diFree use JFS_IP(ipimap)->i_imap directly, then it will cause a NULL deref. Reported-by: TCS Robot <[email protected]> Signed-off-by: Haimin Zhang <[email protected]> Signed-off-by: Dave Kleikamp <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Marco Elver <[email protected]> Date: Fri Nov 5 13:45:28 2021 -0700 kfence: count unexpectedly skipped allocations [ Upstream commit 9a19aeb5665068c3e2727230588684aae2cab7ef ] Maintain a counter to count allocations that are skipped due to being incompatible (oversized, incompatible gfp flags) or no capacity. This is to compute the fraction of allocations that could not be serviced by KFENCE, which we expect to be rare. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Acked-by: Alexander Potapenko <[email protected]> Cc: Aleksandr Nogikh <[email protected]> Cc: Jann Horn <[email protected]> Cc: Taras Madan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Marco Elver <[email protected]> Date: Fri Nov 5 13:45:34 2021 -0700 kfence: limit currently covered allocations when pool nearly full [ Upstream commit 08f6b10630f284755087f58aa393402e15b92977 ] One of KFENCE's main design principles is that with increasing uptime, allocation coverage increases sufficiently to detect previously undetected bugs. We have observed that frequent long-lived allocations of the same source (e.g. pagecache) tend to permanently fill up the KFENCE pool with increasing system uptime, thus breaking the above requirement. The workaround thus far had been increasing the sample interval and/or increasing the KFENCE pool size, but is no reliable solution. To ensure diverse coverage of allocations, limit currently covered allocations of the same source once pool utilization reaches 75% (configurable via `kfence.skip_covered_thresh`) or above. The effect is retaining reasonable allocation coverage when the pool is close to full. A side-effect is that this also limits frequent long-lived allocations of the same source filling up the pool permanently. Uniqueness of an allocation for coverage purposes is based on its (partial) allocation stack trace (the source). A Counting Bloom filter is used to check if an allocation is covered; if the allocation is currently covered, the allocation is skipped by KFENCE. Testing was done using: (a) a synthetic workload that performs frequent long-lived allocations (default config values; sample_interval=1; num_objects=63), and (b) normal desktop workloads on an otherwise idle machine where the problem was first reported after a few days of uptime (default config values). In both test cases the sampled allocation rate no longer drops to zero at any point. In the case of (b) we observe (after 2 days uptime) 15% unique allocations in the pool, 77% pool utilization, with 20% "skipped allocations (covered)". [[email protected]: simplify and just use hash_32(), use more random stack_hash_seed] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix 32 bit] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Acked-by: Alexander Potapenko <[email protected]> Cc: Aleksandr Nogikh <[email protected]> Cc: Jann Horn <[email protected]> Cc: Taras Madan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Marco Elver <[email protected]> Date: Fri Nov 5 13:45:31 2021 -0700 kfence: move saving stack trace of allocations into __kfence_alloc() [ Upstream commit a9ab52bbcb52df49ec4b30e6741e120588989455 ] Move the saving of the stack trace of allocations into __kfence_alloc(), so that the stack entries array can be used outside of kfence_guarded_alloc() and we avoid potentially unwinding the stack multiple times. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Acked-by: Alexander Potapenko <[email protected]> Cc: Aleksandr Nogikh <[email protected]> Cc: Jann Horn <[email protected]> Cc: Taras Madan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Paolo Bonzini <[email protected]> Date: Wed Apr 6 13:13:42 2022 -0400 KVM: avoid NULL pointer dereference in kvm_dirty_ring_push commit 5593473a1e6c743764b08e3b6071cb43b5cfa6c4 upstream. kvm_vcpu_release() will call kvm_dirty_ring_free(), freeing ring->dirty_gfns and setting it to NULL. Afterwards, it calls kvm_arch_vcpu_destroy(). However, if closing the file descriptor races with KVM_RUN in such away that vcpu->arch.st.preempted == 0, the following call stack leads to a NULL pointer dereference in kvm_dirty_run_push(): mark_page_dirty_in_slot+0x192/0x270 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3171 kvm_steal_time_set_preempted arch/x86/kvm/x86.c:4600 [inline] kvm_arch_vcpu_put+0x34e/0x5b0 arch/x86/kvm/x86.c:4618 vcpu_put+0x1b/0x70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:211 vmx_free_vcpu+0xcb/0x130 arch/x86/kvm/vmx/vmx.c:6985 kvm_arch_vcpu_destroy+0x76/0x290 arch/x86/kvm/x86.c:11219 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline] The fix is to release the dirty page ring after kvm_arch_vcpu_destroy has run. Reported-by: Qiuhao Li <[email protected]> Reported-by: Gaoning Pan <[email protected]> Reported-by: Yongkang Jia <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Suravee Suthikulpanit <[email protected]> Date: Thu Feb 10 18:08:51 2022 -0600 KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255 commit 4a204f7895878363ca8211f50ec610408c8c70aa upstream. Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined by the architecture. The number of bits consumed by hardware is model specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM to enumerate the "true" size. So, KVM must allow using all bits, else it risks rejecting completely legal x2APIC IDs on newer CPUs. This means KVM relies on hardware to not assign x2APIC IDs that exceed the "true" width of the field, but presumably hardware is smart enough to tie the width to the max x2APIC ID. KVM also relies on hardware to support at least 8 bits, as the legacy xAPIC ID is writable by software. But, those assumptions are unavoidable due to the lack of any way to enumerate the "true" width. Cc: [email protected] Cc: Maxim Levitsky <[email protected]> Suggested-by: Sean Christopherson <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Fixes: 44a95dae1d22 ("KVM: x86: Detect and Initialize AVIC support") Signed-off-by: Suravee Suthikulpanit <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> [modified due to the conflict caused by the commit 391503528257 ("KVM: x86: SVM: move avic definitions from AMD's spec to svm.h")] Signed-off-by: Suravee Suthikulpanit <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Peter Gonda <[email protected]> Date: Fri Mar 4 08:10:32 2022 -0800 KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode() [ Upstream commit 4a9e7b9ea252842bc8b14d495706ac6317fafd5d ] Include kvm_cache_regs.h to pick up the definition of is_guest_mode(), which is referenced by nested_svm_virtualize_tpr() in svm.h. Remove include from svm_onhpyerv.c which was done only because of lack of include in svm.h. Fixes: 883b0a91f41ab ("KVM: SVM: Move Nested SVM Implementation to nested.c") Cc: Paolo Bonzini <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Peter Gonda <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hou Wenlong <[email protected]> Date: Wed Mar 2 21:15:14 2022 +0800 KVM: x86/emulator: Emulate RDPID only if it is enabled in guest [ Upstream commit a836839cbfe60dc434c5476a7429cf2bae36415d ] When RDTSCP is supported but RDPID is not supported in host, RDPID emulation is available. However, __kvm_get_msr() would only fail when RDTSCP/RDPID both are disabled in guest, so the emulator wouldn't inject a #UD when RDPID is disabled but RDTSCP is enabled in guest. Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID") Signed-off-by: Hou Wenlong <[email protected]> Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Like Xu <[email protected]> Date: Wed Mar 9 16:42:57 2022 +0800 KVM: x86/pmu: Fix and isolate TSX-specific performance event logic [ Upstream commit e644896f5106aa3f6d7e8c7adf2e4dc0fce53555 ] HSW_IN_TX* bits are used in generic code which are not supported on AMD. Worse, these bits overlap with AMD EventSelect[11:8] and hence using HSW_IN_TX* bits unconditionally in generic code is resulting in unintentional pmu behavior on AMD. For example, if EventSelect[11:8] is 0x2, pmc_reprogram_counter() wrongly assumes that HSW_IN_TX_CHECKPOINTED is set and thus forces sampling period to be 0. Also per the SDM, both bits 32 and 33 "may only be set if the processor supports HLE or RTM" and for "IN_TXCP (bit 33): this bit may only be set for IA32_PERFEVTSEL2." Opportunistically eliminate code redundancy, because if the HSW_IN_TX* bit is set in pmc->eventsel, it is already set in attr.config. Reported-by: Ravi Bangoria <[email protected]> Reported-by: Jim Mattson <[email protected]> Fixes: 103af0a98788 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5") Co-developed-by: Ravi Bangoria <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Like Xu <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jim Mattson <[email protected]> Date: Mon Mar 7 17:24:52 2022 -0800 KVM: x86/pmu: Use different raw event masks for AMD and Intel [ Upstream commit 95b065bf5c431c06c68056a03a5853b660640ecc ] The third nybble of AMD's event select overlaps with Intel's IN_TX and IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel platforms that support TSX. Declare a raw_event_mask in the kvm_pmu structure, initialize it in the vendor-specific pmu_refresh() functions, and use that mask for PERF_TYPE_RAW configurations in reprogram_gp_counter(). Fixes: 710c47651431 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW") Signed-off-by: Jim Mattson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jim Mattson <[email protected]> Date: Sat Feb 26 15:41:31 2022 -0800 KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs [ Upstream commit 9b026073db2f1ad0e4d8b61c83316c8497981037 ] AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some reserved bits are cleared, and some are not. Specifically, on Zen3/Milan, bits 19 and 42 are not cleared. When emulating such a WRMSR, KVM should not synthesize a #GP, regardless of which bits are set. However, undocumented bits should not be passed through to the hardware MSR. So, rather than checking for reserved bits and synthesizing a #GP, just clear the reserved bits. This may seem pedantic, but since KVM currently does not support the "Host/Guest Only" bits (41:40), it is necessary to clear these bits rather than synthesizing #GP, because some popular guests (e.g Linux) will set the "Host Only" bit even on CPUs that don't support EFER.SVME, and they don't expect a #GP. For example, root@Ubuntu1804:~# perf stat -e r26 -a sleep 1 Performance counter stats for 'system wide': 0 r26 1.001070977 seconds time elapsed Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30) Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace: Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90 Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: Lotus Fenn <[email protected]> Signed-off-by: Jim Mattson <[email protected]> Reviewed-by: Like Xu <[email protected]> Reviewed-by: David Dunn <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Feng Tang <[email protected]> Date: Wed Mar 23 16:05:50 2022 -0700 lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option [ Upstream commit 1bf18da62106225dbc47aab41efee2aeb99caccd ] 0Day robots reported there is compiling issue for 'csky' ARCH when CONFIG_DEBUG_FORCE_DATA_SECTION_ALIGNED is enabled [1]: All errors (new ones prefixed by >>): {standard input}: Assembler messages: >> {standard input}:2277: Error: pcrel offset for branch to .LS000B too far (0x3c) Which was discussed in [2]. And as there is no solution for csky yet, add some dependency for this config to limit it to several ARCHs which have no compiling issue so far. [1]. https://lore.kernel.org/lkml/[email protected]/ [2]. https://www.spinics.net/lists/linux-kbuild/msg30298.html Link: https://lkml.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Signed-off-by: Feng Tang <[email protected]> Cc: Guo Ren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Johannes Berg <[email protected]> Date: Mon Jan 3 16:40:15 2022 +0100 lib/logic_iomem: correct fallback config references [ Upstream commit 2a6852cb8ff0c8c1363cac648d68489343813212 ] Due to some renaming, we ended up with the "indirect iomem" naming in Kconfig, following INDIRECT_PIO. However, clearly I missed following through on that in the ifdefs, but so far INDIRECT_IOMEM_FALLBACK isn't used by any architecture. Reported-by: Lukas Bulwahn <[email protected]> Fixes: ca2e334232b6 ("lib: add iomem emulation (logic_iomem)") Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Yonghong Song <[email protected]> Date: Fri Feb 4 13:43:55 2022 -0800 libbpf: Fix build issue with llvm-readelf [ Upstream commit 0908a66ad1124c1634c33847ac662106f7f2c198 ] There are cases where clang compiler is packaged in a way readelf is a symbolic link to llvm-readelf. In such cases, llvm-readelf will be used instead of default binutils readelf, and the following error will appear during libbpf build: # Warning: Num of global symbols in # /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/sharedobjs/libbpf-in.o (367) # does NOT match with num of versioned symbols in # /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf.so libbpf.map (383). # Please make sure all LIBBPF_API symbols are versioned in libbpf.map. # --- /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_global_syms.tmp ... # +++ /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_versioned_syms.tmp ... # @@ -324,6 +324,22 @@ # btf__str_by_offset # btf__type_by_id # btf__type_cnt # +LIBBPF_0.0.1 # +LIBBPF_0.0.2 # +LIBBPF_0.0.3 # +LIBBPF_0.0.4 # +LIBBPF_0.0.5 # +LIBBPF_0.0.6 # +LIBBPF_0.0.7 # +LIBBPF_0.0.8 # +LIBBPF_0.0.9 # +LIBBPF_0.1.0 # +LIBBPF_0.2.0 # +LIBBPF_0.3.0 # +LIBBPF_0.4.0 # +LIBBPF_0.5.0 # +LIBBPF_0.6.0 # +LIBBPF_0.7.0 # libbpf_attach_type_by_name # libbpf_find_kernel_btf # libbpf_find_vmlinux_btf_id # make[2]: *** [Makefile:184: check_abi] Error 1 # make[1]: *** [Makefile:140: all] Error 2 The above failure is due to different printouts for some ABS versioned symbols. For example, with the same libbpf.so, $ /bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS 134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0 202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0 ... $ /opt/llvm/bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS 134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0@@LIBBPF_0.5.0 202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0@@LIBBPF_0.6.0 ... The binutils readelf doesn't print out the symbol LIBBPF_* version and llvm-readelf does. Such a difference caused libbpf build failure with llvm-readelf. The proposed fix filters out all ABS symbols as they are not part of the comparison. This works for both binutils readelf and llvm-readelf. Reported-by: Delyan Kratunov <[email protected]> Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Greg Kroah-Hartman <[email protected]> Date: Wed Apr 13 20:59:28 2022 +0200 Linux 5.15.34 Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Tested-by: Shuah Khan <[email protected]> Tested-by: Fox Chen <[email protected]> Tested-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Ron Economos <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Guo Xuenan <[email protected]> Date: Fri Apr 8 13:08:58 2022 -0700 lz4: fix LZ4_decompress_safe_partial read out of bound commit eafc0a02391b7b36617b36c97c4b5d6832cf5e24 upstream. When partialDecoding, it is EOF if we've either filled the output buffer or can't proceed with reading an offset for following match. In some extreme corner cases when compressed data is suitably corrupted, UAF will occur. As reported by KASAN [1], LZ4_decompress_safe_partial may lead to read out of bound problem during decoding. lz4 upstream has fixed it [2] and this issue has been disscussed here [3] before. current decompression routine was ported from lz4 v1.8.3, bumping lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd better fix it first. [1] https://lore.kernel.org/all/[email protected]/ [2] https://github.com/lz4/lz4/commit/c5d6f8a8be3927c0bec91bcc58667a6cfad244ad# [3] https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Reported-by: [email protected] Signed-off-by: Guo Xuenan <[email protected]> Reviewed-by: Nick Terrell <[email protected]> Acked-by: Gao Xiang <[email protected]> Cc: Yann Collet <[email protected]> Cc: Chengyang Fan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Mon Feb 28 01:32:40 2022 +0100 macvtap: advertise link netns via netlink [ Upstream commit a02192151b7dbf855084c38dca380d77c7658353 ] Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is added to rtnetlink messages. This fixes iproute2 which otherwise resolved the link interface to an interface in the wrong namespace. Test commands: ip netns add nst ip link add dummy0 type dummy ip link add link macvtap0 link dummy0 type macvtap ip link set macvtap0 netns nst ip -netns nst link show macvtap0 Before: 10: macvtap0@gre0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff After: 10: macvtap0@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff link-netnsid 0 Reported-by: Leonardo Mörlein <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Matt Johnston <[email protected]> Date: Fri Apr 1 10:48:42 2022 +0800 mctp: Fix check for dev_hard_header() result [ Upstream commit 60be976ac45137657b7b505d7e0d44d0e51accb7 ] dev_hard_header() returns the length of the header, so we need to test for negative errors rather than non-zero. Fixes: 889b7da23abf ("mctp: Add initial routing framework") Signed-off-by: Matt Johnston <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Qinghua Jin <[email protected]> Date: Wed Mar 23 16:06:23 2022 -0700 minix: fix bug when opening a file with O_DIRECT [ Upstream commit 9ce3c0d26c42d279b6c378a03cd6a61d828f19ca ] Testcase: 1. create a minix file system and mount it 2. open a file on the file system with O_RDWR|O_CREAT|O_TRUNC|O_DIRECT 3. open fails with -EINVAL but leaves an empty file behind. All other open() failures don't leave the failed open files behind. It is hard to check the direct_IO op before creating the inode. Just as ext4 and btrfs do, this patch will resolve the issue by allowing to create the file with O_DIRECT but returning error when writing the file. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qinghua Jin <[email protected]> Reported-by: Colin Ian King <[email protected]> Reviewed-by: Jan Kara <[email protected]> Acked-by: Christian Brauner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Alexander Lobakin <[email protected]> Date: Wed Feb 23 01:30:23 2022 +0000 MIPS: fix fortify panic when copying asm exception handlers [ Upstream commit d17b66417308996e7e64b270a3c7f3c1fbd4cfc8 ] With KCFLAGS="-O3", I was able to trigger a fortify-source memcpy() overflow panic on set_vi_srs_handler(). Although O3 level is not supported in the mainline, under some conditions that may've happened with any optimization settings, it's just a matter of inlining luck. The panic itself is correct, more precisely, 50/50 false-positive and not at the same time. From the one side, no real overflow happens. Exception handler defined in asm just gets copied to some reserved places in the memory. But the reason behind is that C code refers to that exception handler declares it as `char`, i.e. something of 1 byte length. It's obvious that the asm function itself is way more than 1 byte, so fortify logics thought we are going to past the symbol declared. The standard way to refer to asm symbols from C code which is not supposed to be called from C is to declare them as `extern const u8[]`. This is fully correct from any point of view, as any code itself is just a bunch of bytes (including 0 as it is for syms like _stext/_etext/etc.), and the exact size is not known at the moment of compilation. Adjust the type of the except_vec_vi_*() and related variables. Make set_handler() take `const` as a second argument to avoid cast-away warnings and give a little more room for optimization. Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Krzysztof Kozlowski <[email protected]> Date: Thu Mar 17 12:52:59 2022 +0100 MIPS: ingenic: correct unit node address [ Upstream commit 8931ddd8d6a55fcefb20f44a38ba42bb746f0b62 ] Unit node addresses should not have leading 0x: Warning (unit_address_format): /nemc@13410000/efuse@d0/eth-mac-addr@0x22: unit name should not have leading "0x" Signed-off-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Paul Cercueil <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hangyu Hua <[email protected]> Date: Mon Feb 28 15:35:37 2022 +0800 mips: ralink: fix a refcount leak in ill_acc_of_setup() [ Upstream commit 4a0a1436053b17e50b7c88858fb0824326641793 ] of_node_put(np) needs to be called when pdev == NULL. Signed-off-by: Hangyu Hua <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Miaohe Lin <[email protected]> Date: Fri Apr 8 13:09:07 2022 -0700 mm/mempolicy: fix mpol_new leak in shared_policy_replace commit 4ad099559b00ac01c3726e5c95dc3108ef47d03e upstream. If mpol_new is allocated but not used in restart loop, mpol_new will be freed via mpol_put before returning to the caller. But refcnt is not initialized yet, so mpol_put could not do the right things and might leak the unused mpol_new. This would happen if mempolicy was updated on the shared shmem file while the sp->lock has been dropped during the memory allocation. This issue could be triggered easily with the below code snippet if there are many processes doing the below work at the same time: shmid = shmget((key_t)5566, 1024 * PAGE_SIZE, 0666|IPC_CREAT); shm = shmat(shmid, 0, 0); loop many times { mbind(shm, 1024 * PAGE_SIZE, MPOL_LOCAL, mask, maxnode, 0); mbind(shm + 128 * PAGE_SIZE, 128 * PAGE_SIZE, MPOL_DEFAULT, mask, maxnode, 0); } Link: https://lkml.kernel.org/r/[email protected] Fixes: 42288fe366c4 ("mm: mempolicy: Convert shared_policy mutex to spinlock") Signed-off-by: Miaohe Lin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Mel Gorman <[email protected]> Cc: <[email protected]> [3.8] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Waiman Long <[email protected]> Date: Fri Apr 8 13:09:01 2022 -0700 mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning commit a431dbbc540532b7465eae4fc8b56a85a9fc7d17 upstream. The gcc 12 compiler reports a "'mem_section' will never be NULL" warning on the following code: static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME if (!mem_section) return NULL; #endif if (!mem_section[SECTION_NR_TO_ROOT(nr)]) return NULL; : It happens with CONFIG_SPARSEMEM_EXTREME off. The mem_section definition is #ifdef CONFIG_SPARSEMEM_EXTREME extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif In the !CONFIG_SPARSEMEM_EXTREME case, mem_section is a static 2-dimensional array and so the check "!mem_section[SECTION_NR_TO_ROOT(nr)]" doesn't make sense. Fix this warning by moving the "!mem_section[SECTION_NR_TO_ROOT(nr)]" check up inside the CONFIG_SPARSEMEM_EXTREME block and adding an explicit NR_SECTION_ROOTS check to make sure that there is no out-of-bound array access. Link: https://lkml.kernel.org/r/[email protected] Fixes: 3e347261a80b ("sparsemem extreme implementation") Signed-off-by: Waiman Long <[email protected]> Reported-by: Justin Forbes <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Rafael Aquini <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Peter Xu <[email protected]> Date: Tue Mar 22 14:42:15 2022 -0700 mm: don't skip swap entry even if zap_details specified commit 5abfd71d936a8aefd9f9ccd299dea7a164a5d455 upstream. Patch series "mm: Rework zap ptes on swap entries", v5. Patch 1 should fix a long standing bug for zap_pte_range() on zap_details usage. The risk is we could have some swap entries skipped while we should have zapped them. Migration entries are not the major concern because file backed memory always zap in the pattern that "first time without page lock, then re-zap with page lock" hence the 2nd zap will always make sure all migration entries are already recovered. However there can be issues with real swap entries got skipped errornoously. There's a reproducer provided in commit message of patch 1 for that. Patch 2-4 are cleanups that are based on patch 1. After the whole patchset applied, we should have a very clean view of zap_pte_range(). Only patch 1 needs to be backported to stable if necessary. This patch (of 4): The "details" pointer shouldn't be the token to decide whether we should skip swap entries. For example, when the callers specified details->zap_mapping==NULL, it means the user wants to zap all the pages (including COWed pages), then we need to look into swap entries because there can be private COWed pages that was swapped out. Skipping some swap entries when details is non-NULL may lead to wrongly leaving some of the swap entries while we should have zapped them. A reproducer of the problem: ===8<=== #define _GNU_SOURCE /* See feature_test_macros(7) */ #include <stdio.h> #include <assert.h> #include <unistd.h> #include <sys/mman.h> #include <sys/types.h> int page_size; int shmem_fd; char *buffer; void main(void) { int ret; char val; page_size = getpagesize(); shmem_fd = memfd_create("test", 0); assert(shmem_fd >= 0); ret = ftruncate(shmem_fd, page_size * 2); assert(ret == 0); buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE, MAP_PRIVATE, shmem_fd, 0); assert(buffer != MAP_FAILED); /* Write private page, swap it out */ buffer[page_size] = 1; madvise(buffer, page_size * 2, MADV_PAGEOUT); /* This should drop private buffer[page_size] already */ ret = ftruncate(shmem_fd, page_size); assert(ret == 0); /* Recover the size */ ret = ftruncate(shmem_fd, page_size * 2); assert(ret == 0); /* Re-read the data, it should be all zero */ val = buffer[page_size]; if (val == 0) printf("Good\n"); else printf("BUG\n"); } ===8<=== We don't need to touch up the pmd path, because pmd never had a issue with swap entries. For example, shmem pmd migration will always be split into pte level, and same to swapping on anonymous. Add another helper should_zap_cows() so that we can also check whether we should zap private mappings when there's no page pointer specified. This patch drops that trick, so we handle swap ptes coherently. Meanwhile we should do the same check upon migration entry, hwpoison entry and genuine swap entries too. To be explicit, we should still remember to keep the private entries if even_cows==false, and always zap them when even_cows==true. The issue seems to exist starting from the initial commit of git. [[email protected]: comment tweaks] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Peter Xu <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Mauricio Faria de Oliveira <[email protected]> Date: Thu Apr 7 16:14:27 2022 -0300 mm: fix race between MADV_FREE reclaim and blkdev direct IO read commit 6c8e2a256915a223f6289f651d6b926cd7135c9e upstream. Problem: ======= Userspace might read the zero-page instead of actual data from a direct IO read on a block device if the buffers have been called madvise(MADV_FREE) on earlier (this is discussed below) due to a race between page reclaim on MADV_FREE and blkdev direct IO read. - Race condition: ============== During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks if the page is not dirty, then discards its rmap PTE(s) (vs. remap back if the page is dirty). However, after try_to_unmap_one() returns to shrink_page_list(), it might keep the page _anyway_ if page_ref_freeze() fails (it expects exactly _one_ page reference, from the isolation for page reclaim). Well, blkdev_direct_IO() gets references for all pages, and on READ operations it only sets them dirty _later_. So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct IO read from block devices, and page reclaim happens during __blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages() returns, but BEFORE the pages are set dirty, the situation happens. The direct IO read eventually completes. Now, when userspace reads the buffers, the PTE is no longer there and the page fault handler do_anonymous_page() services that with the zero-page, NOT the data! A synthetic reproducer is provided. - Page faults: =========== If page reclaim happens BEFORE bio_iov_iter_get_pages() the issue doesn't happen, because that faults-in all pages as writeable, so do_anonymous_page() sets up a new page/rmap/PTE, and that is used by direct IO. The userspace reads don't fault as the PTE is there (thus zero-page is not used/setup). But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE is no longer there; the subsequent page faults can't help: The data-read from the block device probably won't generate faults due to DMA (no MMU) but even in the case it wouldn't use DMA, that happens on different virtual addresses (not user-mapped addresses) because `struct bio_vec` stores `struct page` to figure addresses out (which are different from user-mapped addresses) for the read. Thus userspace reads (to user-mapped addresses) still fault, then do_anonymous_page() gets another `struct page` that would address/ map to other memory than the `struct page` used by `struct bio_vec` for the read. (The original `struct page` is not available, since it wasn't freed, as page_ref_freeze() failed due to more page refs. And even if it were available, its data cannot be trusted anymore.) Solution: ======== One solution is to check for the expected page reference count in try_to_unmap_one(). There should be one reference from the isolation (that is also checked in shrink_page_list() with page_ref_freeze()) plus one or more references from page mapping(s) (put in discard: label). Further references mean that rmap/PTE cannot be unmapped/nuked. (Note: there might be more than one reference from mapping due to fork()/clone() without CLONE_VM, which use the same `struct page` for references, until the copy-on-write page gets copied.) So, additional page references (e.g., from direct IO read) now prevent the rmap/PTE from being unmapped/dropped; similarly to the page is not freed per shrink_page_list()/page_ref_freeze()). - Races and Barriers: ================== The new check in try_to_unmap_one() should be safe in races with bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's done under the PTE lock. The fast path doesn't take the lock, but it checks if the PTE has changed and if so, it drops the reference and leaves the page for the slow path (which does take that lock). The fast path requires synchronization w/ full memory barrier: it writes the page reference count first then it reads the PTE later, while try_to_unmap() writes PTE first then it reads page refcount. And a second barrier is needed, as the page dirty flag should not be read before the page reference count (as in __remove_mapping()). (This can be a load memory barrier only; no writes are involved.) Call stack/comments: - try_to_unmap_one() - page_vma_mapped_walk() - map_pte() # see pte_offset_map_lock(): pte_offset_map() spin_lock() - ptep_get_and_clear() # write PTE - smp_mb() # (new barrier) GUP fast path - page_ref_count() # (new check) read refcount - page_vma_mapped_walk_done() # see pte_unmap_unlock(): pte_unmap() spin_unlock() - bio_iov_iter_get_pages() - __bio_iov_iter_get_pages() - iov_iter_get_pages() - get_user_pages_fast() - internal_get_user_pages_fast() # fast path - lockless_pages_from_mm() - gup_{pgd,p4d,pud,pmd,pte}_range() ptep = pte_offset_map() # not _lock() pte = ptep_get_lockless(ptep) page = pte_page(pte) try_grab_compound_head(page) # inc refcount # (RMW/barrier # on success) if (pte_val(pte) != pte_val(*ptep)) # read PTE put_compound_head(page) # dec refcount # go slow path # slow path - __gup_longterm_unlocked() - get_user_pages_unlocked() - __get_user_pages_locked() - __get_user_pages() - follow_{page,p4d,pud,pmd}_mask() - follow_page_pte() ptep = pte_offset_map_lock() pte = *ptep page = vm_normal_page(pte) try_grab_page(page) # inc refcount pte_unmap_unlock() - Huge Pages: ========== Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE (aka lazyfree) pages are PageAnon() && !PageSwapBacked() (madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn()) thus should reach shrink_page_list() -> split_huge_page_to_list() before try_to_unmap[_one](), so it deals with normal pages only. (And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens, which should not or be rare, the page refcount should be greater than mapcount: the head page is referenced by tail pages. That also prevents checking the head `page` then incorrectly call page_remove_rmap(subpage) for a tail page, that isn't even in the shrink_page_list()'s page_list (an effect of split huge pmd/pmvw), as it might happen today in this unlikely scenario.) MADV_FREE'd buffers: =================== So, back to the "if MADV_FREE pages are used as buffers" note. The case is arguable, and subject to multiple interpretations. The madvise(2) manual page on the MADV_FREE advice value says: 1) 'After a successful MADV_FREE ... data will be lost when the kernel frees the pages.' 2) 'the free operation will be canceled if the caller writes into the page' / 'subsequent writes ... will succeed and then [the] kernel cannot free those dirtied pages' 3) 'If there is no subsequent write, the kernel can free the pages at any time.' Thoughts, questions, considerations... respectively: 1) Since the kernel didn't actually free the page (page_ref_freeze() failed), should the data not have been lost? (on userspace read.) 2) Should writes performed by the direct IO read be able to cancel the free operation? - Should the direct IO read be considered as 'the caller' too, as it's been requested by 'the caller'? - Should the bio technique to dirty pages on return to userspace (bio_check_pages_dirty() is called/used by __blkdev_direct_IO()) be considered in another/special way here? 3) Should an upcoming write from a previously requested direct IO read be considered as a subsequent write, so the kernel should not free the pages? (as it's known at the time of page reclaim.) And lastly: Technically, the last point would seem a reasonable consideration and balance, as the madvise(2) manual page apparently (and fairly) seem to assume that 'writes' are memory access from the userspace process (not explicitly considering writes from the kernel or its corner cases; again, fairly).. plus the kernel fix implementation for the corner case of the largely 'non-atomic write' encompassed by a direct IO read operation, is relatively simple; and it helps. Reproducer: ========== @ test.c (simplified, but works) #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h> int main() { int fd, i; char *buf; fd = open(DEV, O_RDONLY | O_DIRECT); buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) buf[i] = 1; // init to non-zero madvise(buf, BUF_SIZE, MADV_FREE); read(fd, buf, BUF_SIZE); for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) printf("%p: 0x%x\n", &buf[i], buf[i]); return 0; } @ block/fops.c (formerly fs/block_dev.c) +#include <linux/swap.h> ... ... __blkdev_direct_IO[_simple](...) { ... + if (!strcmp(current->comm, "good")) + shrink_all_memory(ULONG_MAX); + ret = bio_iov_iter_get_pages(...); + + if (!strcmp(current->comm, "bad")) + shrink_all_memory(ULONG_MAX); ... } @ shell # NUM_PAGES=4 # PAGE_SIZE=$(getconf PAGE_SIZE) # yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES} # DEV=$(losetup -f --show test.img) # gcc -DDEV=\"$DEV\" \ -DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \ -DPAGE_SIZE=${PAGE_SIZE} \ test.c -o test # od -tx1 $DEV 0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a * 0040000 # mv test good # ./good 0x7f7c10418000: 0x79 0x7f7c10419000: 0x79 0x7f7c1041a000: 0x79 0x7f7c1041b000: 0x79 # mv good bad # ./bad 0x7fa1b8050000: 0x0 0x7fa1b8051000: 0x0 0x7fa1b8052000: 0x0 0x7fa1b8053000: 0x0 Note: the issue is consistent on v5.17-rc3, but it's intermittent with the support of MADV_FREE on v4.5 (60%-70% error; needs swap). [wrap do_direct_IO() in do_blockdev_direct_IO() @ fs/direct-io.c]. - v5.17-rc3: # for i in {1..1000}; do ./good; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79 # mv good bad # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 4000 0x0 # free | grep Swap Swap: 0 0 0 - v4.5: # for i in {1..1000}; do ./good; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79 # mv good bad # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 2702 0x0 1298 0x79 # swapoff -av swapoff /swap # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79 Ceph/TCMalloc: ============= For documentation purposes, the use case driving the analysis/fix is Ceph on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to release unused memory to the system from the mmap'ed page heap (might be committed back/used again; it's not munmap'ed.) - PageHeap::DecommitSpan() -> TCMalloc_SystemRelease() -> madvise() - PageHeap::CommitSpan() -> TCMalloc_SystemCommit() -> do nothing. Note: TCMalloc switched back to MADV_DONTNEED a few commits after the release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue just 'disappeared' on Ceph on later Ubuntu releases but is still present in the kernel, and can be hit by other use cases. The observed issue seems to be the old Ceph bug #22464 [1], where checksum mismatches are observed (and instrumentation with buffer dumps shows zero-pages read from mmap'ed/MADV_FREE'd page ranges). The issue in Ceph was reasonably deemed a kernel bug (comment #50) and mostly worked around with a retry mechanism, but other parts of Ceph could still hit that (rocksdb). Anyway, it's less likely to be hit again as TCMalloc switched out of MADV_FREE by default. (Some kernel versions/reports from the Ceph bug, and relation with the MADV_FREE introduction/changes; TCMalloc versions not checked.) - 4.4 good - 4.5 (madv_free: introduction) - 4.9 bad - 4.10 good? maybe a swapless system - 4.12 (madv_free: no longer free instantly on swapless systems) - 4.13 bad [1] https://tracker.ceph.com/issues/22464 Thanks: ====== Several people contributed to analysis/discussions/tests/reproducers in the first stages when drilling down on ceph/tcmalloc/linux kernel: - Dan Hill - Dan Streetman - Dongdong Tao - Gavin Guo - Gerald Yang - Heitor Alves de Siqueira - Ioanna Alifieraki - Jay Vosburgh - Matthew Ruffell - Ponnuvel Palaniyappan Reviews, suggestions, corrections, comments: - Minchan Kim - Yu Zhao - Huang, Ying - John Hubbard - Christoph Hellwig [[email protected]: v4] Link: https://lkml.kernel.org/r/[email protected]: https://lkml.kernel.org/r/[email protected] Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages") Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Yang Shi <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Dan Hill <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Dongdong Tao <[email protected]> Cc: Gavin Guo <[email protected]> Cc: Gerald Yang <[email protected]> Cc: Heitor Alves de Siqueira <[email protected]> Cc: Ioanna Alifieraki <[email protected]> Cc: Jay Vosburgh <[email protected]> Cc: Matthew Ruffell <[email protected]> Cc: Ponnuvel Palaniyappan <[email protected]> Cc: <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> [mfo: backport: replace folio/test_flag with page/flag equivalents; real Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)") in v4.] Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christian Löhle <[email protected]> Date: Thu Mar 24 14:18:41 2022 +0000 mmc: block: Check for errors after write on SPI commit 5d435933376962b107bd76970912e7e80247dcc7 upstream. Introduce a SEND_STATUS check for writes through SPI to not mark an unsuccessful write as successful. Since SPI SD/MMC does not have states, after a write, the card will just hold the line LOW until it is ready again. The driver marks the write therefore as completed as soon as it reads something other than all zeroes. The driver does not distinguish from a card no longer signalling busy and it being disconnected (and the line being pulled-up by the host). This lead to writes being marked as successful when disconnecting a busy card. Now the card is ensured to be still connected by an additional CMD13, just like non-SPI is ensured to go back to TRAN state. While at it and since we already poll for the post-write status anyway, we might as well check for SPIs error bits (any of them). The disconnecting card problem is reproducable for me after continuous write activity and randomly disconnecting, around every 20-50 tries on SPI DS for some card. Fixes: 7213d175e3b6f ("MMC/SD card driver learns SPI") Cc: [email protected] Signed-off-by: Christian Loehle <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Michael Wu <[email protected]> Date: Thu Mar 31 15:32:23 2022 +0800 mmc: core: Fixup support for writeback-cache for eMMC and SD commit 08ebf903af57cda6d773f3dd1671b64f73b432b8 upstream. During the card initialization process, the mmc core checks whether the eMMC/SD card supports an internal writeback-cache and then enables it inside the card. Unfortunately, this isn't according to what the mmc core reports to the upper block layer. Instead, the writeback-cache support with REQ_FLUSH and REQ_FUA, are being enabled depending on whether the host supports the CMD23 (MMC_CAP_CMD23) and whether an eMMC supports the reliable-write command. This is wrong and it may also sound awkward. In fact, it's a remnant from when both eMMC/SD cards didn't have dedicated commands/support to control the internal writeback-cache. In other words, it was the best we could do at that point in time. To fix the problem, but also without breaking backwards compatibility, let's align the REQ_FLUSH support with whether the writeback-cache became successfully enabled - for both eMMC and SD cards. Cc: [email protected] Fixes: 881d1c25f765 ("mmc: core: Add cache control for eMMC4.5 device") Fixes: 130206a615a9 ("mmc: core: Add support for cache ctrl for SD cards") Depends-on: 97fce126e279 ("mmc: block: Issue a cache flush only when it's enabled") Reviewed-by: Avri Altman <[email protected]> Signed-off-by: Michael Wu <[email protected]> Link: https://lore.kernel.org/r/[email protected] [Ulf: Re-wrote the commit message] Signed-off-by: Ulf Hansson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yann Gautier <[email protected]> Date: Thu Mar 17 12:19:43 2022 +0100 mmc: mmci: stm32: correctly check all elements of sg list commit 0d319dd5a27183b75d984e3dc495248e59f99334 upstream. Use sg and not data->sg when checking sg list elements. Else only the first element alignment is checked. The last element should be checked the same way, for_each_sg already set sg to sg_next(sg). Fixes: 46b723dd867d ("mmc: mmci: add stm32 sdmmc variant") Cc: [email protected] Signed-off-by: Yann Gautier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Wolfram Sang <[email protected]> Date: Mon Apr 4 13:49:02 2022 +0200 mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete commit 03e59b1e2f56245163b14c69e0a830c24b1a3a47 upstream. When HS400 tuning is complete and HS400 is going to be activated, we have to keep the current number of TAPs and should not overwrite them with a hardcoded value. This was probably a copy&paste mistake when upporting HS400 support from the BSP. Fixes: 26eb2607fa28 ("mmc: renesas_sdhi: add eMMC HS400 mode support") Reported-by: Yoshihiro Shimoda <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Reviewed-by: Yoshihiro Shimoda <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Paolo Bonzini <[email protected]> Date: Fri Apr 8 13:09:04 2022 -0700 mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) commit 01e67e04c28170c47700c2c226d732bbfedb1ad0 upstream. If an mremap() syscall with old_size=0 ends up in move_page_tables(), it will call invalidate_range_start()/invalidate_range_end() unnecessarily, i.e. with an empty range. This causes a WARN in KVM's mmu_notifier. In the past, empty ranges have been diagnosed to be off-by-one bugs, hence the WARNing. Given the low (so far) number of unique reports, the benefits of detecting more buggy callers seem to outweigh the cost of having to fix cases such as this one, where userspace is doing something silly. In this particular case, an early return from move_page_tables() is enough to fix the issue. Link: https://lkml.kernel.org/r/[email protected] Reported-by: [email protected] Signed-off-by: Paolo Bonzini <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Lorenzo Bianconi <[email protected]> Date: Tue Feb 1 12:29:55 2022 +0100 mt76: dma: initialize skip_unmap in mt76_dma_rx_fill [ Upstream commit 577298ec55dfc8b9aece54520f0258c3f93a6573 ] Even if it is only a false-positive since skip_buf0/skip_buf1 are only used in mt76_dma_tx_cleanup_idx routine, initialize skip_unmap in mt76_dma_rx_fill in order to fix the following UBSAN report: [ 13.924906] UBSAN: invalid-load in linux-5.15.0/drivers/net/wireless/mediatek/mt76/dma.c:162:13 [ 13.924909] load of value 225 is not a valid value for type '_Bool' [ 13.924912] CPU: 9 PID: 672 Comm: systemd-udevd Not tainted 5.15.0-18-generic #18-Ubuntu [ 13.924914] Hardware name: LENOVO 21A0000CMX/21A0000CMX, BIOS R1MET43W (1.13 ) 11/05/2021 [ 13.924915] Call Trace: [ 13.924917] <TASK> [ 13.924920] show_stack+0x52/0x58 [ 13.924925] dump_stack_lvl+0x4a/0x5f [ 13.924931] dump_stack+0x10/0x12 [ 13.924932] ubsan_epilogue+0x9/0x45 [ 13.924934] __ubsan_handle_load_invalid_value.cold+0x44/0x49 [ 13.924935] ? __iommu_dma_map+0x84/0xf0 [ 13.924939] mt76_dma_add_buf.constprop.0.cold+0x23/0x85 [mt76] [ 13.924949] mt76_dma_rx_fill.isra.0+0x102/0x1f0 [mt76] [ 13.924954] mt76_dma_init+0xc9/0x150 [mt76] [ 13.924959] ? mt7921_dma_enable+0x110/0x110 [mt7921e] [ 13.924966] mt7921_dma_init+0x1e3/0x260 [mt7921e] [ 13.924970] mt7921_register_device+0x29d/0x510 [mt7921e] [ 13.924975] mt7921_pci_probe.part.0+0x17f/0x1b0 [mt7921e] [ 13.924980] mt7921_pci_probe+0x43/0x60 [mt7921e] [ 13.924984] local_pci_probe+0x4b/0x90 [ 13.924987] pci_device_probe+0x115/0x1f0 [ 13.924989] really_probe+0x21e/0x420 [ 13.924992] __driver_probe_device+0x115/0x190 [ 13.924994] driver_probe_device+0x23/0xc0 [ 13.924996] __driver_attach+0xbd/0x1d0 [ 13.924998] ? __device_attach_driver+0x110/0x110 [ 13.924999] bus_for_each_dev+0x7e/0xc0 [ 13.925001] driver_attach+0x1e/0x20 [ 13.925003] bus_add_driver+0x135/0x200 [ 13.925005] driver_register+0x95/0xf0 [ 13.925008] ? 0xffffffffc0766000 [ 13.925010] __pci_register_driver+0x68/0x70 [ 13.925011] mt7921_pci_driver_init+0x23/0x1000 [mt7921e] [ 13.925015] do_one_initcall+0x48/0x1d0 [ 13.925019] ? kmem_cache_alloc_trace+0x19e/0x2e0 [ 13.925022] do_init_module+0x62/0x280 [ 13.925025] load_module+0xac9/0xbb0 [ 13.925027] __do_sys_finit_module+0xbf/0x120 [ 13.925029] __x64_sys_finit_module+0x18/0x20 [ 13.925030] do_syscall_64+0x5c/0xc0 [ 13.925033] ? do_syscall_64+0x69/0xc0 [ 13.925034] ? sysvec_reschedule_ipi+0x78/0xe0 [ 13.925036] ? asm_sysvec_reschedule_ipi+0xa/0x20 [ 13.925039] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 13.925040] RIP: 0033:0x7fbf2b90f94d [ 13.925045] RSP: 002b:00007ffe2ec7e5d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 13.925047] RAX: ffffffffffffffda RBX: 000056106b0634e0 RCX: 00007fbf2b90f94d [ 13.925048] RDX: 0000000000000000 RSI: 00007fbf2baa3441 RDI: 0000000000000013 [ 13.925049] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002 [ 13.925050] R10: 0000000000000013 R11: 0000000000000246 R12: 00007fbf2baa3441 [ 13.925051] R13: 000056106b062620 R14: 000056106b0610c0 R15: 000056106b0640d0 [ 13.925053] </TASK> Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Deren Wu <[email protected]> Date: Fri Mar 4 22:54:05 2022 +0800 mt76: fix monitor mode crash with sdio driver [ Upstream commit 123bc712b1de0805f9d683687e17b1ec2aba0b68 ] mt7921s driver may receive frames with fragment buffers. If there is a CTS packet received in monitor mode, the payload is 10 bytes only and need 6 bytes header padding after RXD buffer. However, only RXD in the first linear buffer, if we pull buffer size RXD-size+6 bytes with skb_pull(), that would trigger "BUG_ON(skb->len < skb->data_len)" in __skb_pull(). To avoid the nonlinear buffer issue, enlarge the RXD size from 128 to 256 to make sure all MCU operation in linear buffer. [ 52.007562] kernel BUG at include/linux/skbuff.h:2313! [ 52.007578] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 52.007987] pc : skb_pull+0x48/0x4c [ 52.008015] lr : mt7921_queue_rx_skb+0x494/0x890 [mt7921_common] [ 52.008361] Call trace: [ 52.008377] skb_pull+0x48/0x4c [ 52.008400] mt76s_net_worker+0x134/0x1b0 [mt76_sdio 35339a92c6eb7d4bbcc806a1d22f56365565135c] [ 52.008431] __mt76_worker_fn+0xe8/0x170 [mt76 ef716597d11a77150bc07e3fdd68eeb0f9b56917] [ 52.008449] kthread+0x148/0x3ac [ 52.008466] ret_from_fork+0x10/0x30 Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Sean Wang <[email protected]> Signed-off-by: Deren Wu <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Yang Li <[email protected]> Date: Mon Feb 14 09:58:21 2022 +0800 mt76: mt7615: Fix assigning negative values to unsigned variable [ Upstream commit 9273ffcc9a11942bd586bb42584337ef3962b692 ] Smatch reports the following: drivers/net/wireless/mediatek/mt76/mt7615/mac.c:1865 mt7615_mac_adjust_sensitivity() warn: assigning (-110) to unsigned variable 'def_th' drivers/net/wireless/mediatek/mt76/mt7615/mac.c:1865 mt7615_mac_adjust_sensitivity() warn: assigning (-98) to unsigned variable 'def_th' Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Johan Almbladh <[email protected]> Date: Fri Feb 4 16:47:30 2022 +0100 mt76: mt7915: fix injected MPDU transmission to not use HW A-MSDU [ Upstream commit 28225a6ef80ebf46c46e5fbd5b1ee231a0b2b5b7 ] Before, the hardware would be allowed to transmit injected 802.11 MPDUs as A-MSDU. This resulted in corrupted frames being transmitted. Now, injected MPDUs are transmitted as-is, without A-MSDU. The fix was verified with frame injection on MT7915 hardware, both with and without the injected frame being encrypted. If the hardware cannot do A-MSDU aggregation on MPDUs, this problem would also be present in the TX path where mac80211 does the 802.11 encapsulation. However, I have not observed any such problem when disabling IEEE80211_HW_SUPPORTS_TX_ENCAP_OFFLOAD to force that mode. Therefore this fix is isolated to injected frames only. The same A-MSDU logic is also present in the mt7921 driver, so it is likely that this fix should be applied there too. I do not have access to mt7921 hardware so I have not been able to test that. Signed-off-by: Johan Almbladh <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ben Greear <[email protected]> Date: Sat Jan 8 11:08:14 2022 -0800 mt76: mt7921: fix crash when startup fails. [ Upstream commit 827e7799c61b978fbc2cc9dac66cb62401b2b3f0 ] If the nic fails to start, it is possible that the reset_work has already been scheduled. Ensure the work item is canceled so we do not have use-after-free crash in case cleanup is called before the work item is executed. This fixes crash on my x86_64 apu2 when mt7921k radio fails to work. Radio still fails, but OS does not crash. Signed-off-by: Ben Greear <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Luis Chamberlain <[email protected]> Date: Mon Sep 27 14:59:58 2021 -0700 nbd: add error handling support for add_disk() [ Upstream commit e1654f413fe08ffbc3292d8d2b8958b2cc5cb5e8 ] We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ye Bin <[email protected]> Date: Tue Nov 2 09:52:37 2021 +0800 nbd: Fix hungtask when nbd_config_put [ Upstream commit e2daec488c57069a4a431d5b752f50294c4bf273 ] I got follow issue: [ 247.381177] INFO: task kworker/u10:0:47 blocked for more than 120 seconds. [ 247.382644] Not tainted 4.19.90-dirty #140 [ 247.383502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 247.385027] Call Trace: [ 247.388384] schedule+0xb8/0x3c0 [ 247.388966] schedule_timeout+0x2b4/0x380 [ 247.392815] wait_for_completion+0x367/0x510 [ 247.397713] flush_workqueue+0x32b/0x1340 [ 247.402700] drain_workqueue+0xda/0x3c0 [ 247.403442] destroy_workqueue+0x7b/0x690 [ 247.405014] nbd_config_put.cold+0x2f9/0x5b6 [ 247.405823] recv_work+0x1fd/0x2b0 [ 247.406485] process_one_work+0x70b/0x1610 [ 247.407262] worker_thread+0x5a9/0x1060 [ 247.408699] kthread+0x35e/0x430 [ 247.410918] ret_from_fork+0x1f/0x30 We can reproduce issue as follows: 1. Inject memory fault in nbd_start_device -1244,10 +1248,18 @@ static int nbd_start_device(struct nbd_device *nbd) nbd_dev_dbg_init(nbd); for (i = 0; i < num_connections; i++) { struct recv_thread_args *args; - - args = kzalloc(sizeof(*args), GFP_KERNEL); + + if (i == 1) { + args = NULL; + printk("%s: inject malloc error\n", __func__); + } + else + args = kzalloc(sizeof(*args), GFP_KERNEL); 2. Inject delay in recv_work -757,6 +760,8 @@ static void recv_work(struct work_struct *work) blk_mq_complete_request(blk_mq_rq_from_pdu(cmd)); } + printk("%s: comm=%s pid=%d\n", __func__, current->comm, current->pid); + mdelay(5 * 1000); nbd_config_put(nbd); atomic_dec(&config->recv_threads); wake_up(&config->recv_wq); 3. Create nbd server nbd-server 8000 /tmp/disk 4. Create nbd client nbd-client localhost 8000 /dev/nbd1 Then will trigger above issue. Reason is when add delay in recv_work, lead to release the last reference of 'nbd->config_refs'. nbd_config_put will call flush_workqueue to make all work finish. Obviously, it will lead to deadloop. To solve this issue, according to Josef's suggestion move 'recv_work' init from start device to nbd_dev_add, then destroy 'recv_work'when nbd device teardown. Signed-off-by: Ye Bin <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ye Bin <[email protected]> Date: Tue Nov 2 09:52:36 2021 +0800 nbd: Fix incorrect error handle when first_minor is illegal in nbd_dev_add [ Upstream commit 69beb62ff0d1723a750eebe1c4d01da573d7cd19 ] If first_minor is illegal will goto out_free_idr label, this will miss cleanup disk. Fixes: b1a811633f73 ("block: nbd: add sanity check for first_minor") Signed-off-by: Ye Bin <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Zhang Wensheng <[email protected]> Date: Thu Mar 10 17:32:24 2022 +0800 nbd: fix possible overflow on 'first_minor' in nbd_dev_add() [ Upstream commit 6d35d04a9e18990040e87d2bbf72689252669d54 ] When 'index' is a big numbers, it may become negative which forced to 'int'. then 'index << part_shift' might overflow to a positive value that is not greater than '0xfffff', then sysfs might complains about duplicate creation. Because of this, move the 'index' judgment to the front will fix it and be better. Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices") Fixes: 940c264984fd ("nbd: fix possible overflow for 'first_minor' in nbd_dev_add()") Signed-off-by: Zhang Wensheng <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Maxim Mikityanskiy <[email protected]> Date: Tue Jan 25 12:52:48 2022 +0200 net/mlx5e: Disable TX queues before registering the netdev [ Upstream commit d08c6e2a4d0308a7922d7ef3b1b3af45d4096aad ] Normally, the queues are disabled when the channels are deactivated, and enabled when the channels are activated. However, on register, the channels are not active, but the queues are enabled by default. This change fixes it, preventing mlx5e_xmit from running when the channels are deactivated in the beginning. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Gal Pressman <[email protected]> Date: Wed Jan 26 16:28:23 2022 +0200 net/mlx5e: Remove overzealous validations in netlink EEPROM query [ Upstream commit 970adfb76095fa719778d70a6b86030d2feb88dd ] Unlike the legacy EEPROM callbacks, when using the netlink EEPROM query (get_module_eeprom_by_page) the driver should not try to validate the query parameters, but just perform the read requested by the userspace. Recent discussion in the mailing list: https://lore.kernel.org/netdev/20220120093051.70845141@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net/ Signed-off-by: Gal Pressman <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Maxim Mikityanskiy <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Dust Li <[email protected]> Date: Tue Mar 1 17:44:00 2022 +0800 net/smc: correct settings of RMB window update limit [ Upstream commit 6bf536eb5c8ca011d1ff57b5c5f7c57ceac06a37 ] rmbe_update_limit is used to limit announcing receive window updating too frequently. RFC7609 request a minimal increase in the window size of 10% of the receive buffer space. But current implementation used: min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2) and SOCK_MIN_SNDBUF / 2 == 2304 Bytes, which is almost always less then 10% of the receive buffer space. This causes the receiver always sending CDC message to update its consumer cursor when it consumes more then 2K of data. And as a result, we may encounter something like "TCP silly window syndrome" when sending 2.5~8K message. This patch fixes this using max(rmbe_size / 10, SOCK_MIN_SNDBUF / 2). With this patch and SMC autocorking enabled, qperf 2K/4K/8K tcp_bw test shows 45%/75%/40% increase in throughput respectively. Signed-off-by: Dust Li <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Dust Li <[email protected]> Date: Tue Mar 1 17:43:59 2022 +0800 net/smc: send directly on setting TCP_NODELAY commit b70a5cc045197aad9c159042621baf3c015f6cc7 upstream. In commit ea785a1a573b("net/smc: Send directly when TCP_CORK is cleared"), we don't use delayed work to implement cork. This patch use the same algorithm, removes the delayed work when setting TCP_NODELAY and send directly in setsockopt(). This also makes the TCP_NODELAY the same as TCP. Cc: Tony Lu <[email protected]> Signed-off-by: Dust Li <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Tony Lu <[email protected]> Date: Mon Jan 31 02:02:55 2022 +0800 net/smc: Send directly when TCP_CORK is cleared [ Upstream commit ea785a1a573b390a150010b3c5b81e1ccd8c98a8 ] According to the man page of TCP_CORK [1], if set, don't send out partial frames. All queued partial frames are sent when option is cleared again. When applications call setsockopt to disable TCP_CORK, this call is protected by lock_sock(), and tries to mod_delayed_work() to 0, in order to send pending data right now. However, the delayed work smc_tx_work is also protected by lock_sock(). There introduces lock contention for sending data. To fix it, send pending data directly which acts like TCP, without lock_sock() protected in the context of setsockopt (already lock_sock()ed), and cancel unnecessary dealyed work, which is protected by lock. [1] https://linux.die.net/man/7/tcp Signed-off-by: Tony Lu <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ziyang Xuan <[email protected]> Date: Thu Mar 31 15:04:28 2022 +0800 net/tls: fix slab-out-of-bounds bug in decrypt_internal [ Upstream commit 9381fe8c849cfbe50245ac01fc077554f6eaa0e2 ] The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in tls_set_sw_offload(). The return value of crypto_aead_ivsize() for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes memory space will trigger slab-out-of-bounds bug as following: ================================================================== BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls] Read of size 16 at addr ffff888114e84e60 by task tls/10911 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db ? decrypt_internal+0x385/0xc40 [tls] kasan_report+0xab/0x120 ? decrypt_internal+0x385/0xc40 [tls] kasan_check_range+0xf9/0x1e0 memcpy+0x20/0x60 decrypt_internal+0x385/0xc40 [tls] ? tls_get_rec+0x2e0/0x2e0 [tls] ? process_rx_list+0x1a5/0x420 [tls] ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls] decrypt_skb_update+0x9d/0x400 [tls] tls_sw_recvmsg+0x3c8/0xb50 [tls] Allocated by task 10911: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 tls_set_sw_offload+0x2eb/0xa20 [tls] tls_setsockopt+0x68c/0x700 [tls] __sys_setsockopt+0xfe/0x1b0 Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size when memcpy() iv value in TLS_1_3_VERSION scenario. Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers") Signed-off-by: Ziyang Xuan <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jakub Kicinski <[email protected]> Date: Wed Mar 9 10:29:13 2022 -0800 net: account alternate interface name memory [ Upstream commit 5d26cff5bdbebdf98ba48217c078ff102536f134 ] George reports that altnames can eat up kernel memory. We should charge that memory appropriately. Reported-by: George Shuklin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Andrew Lunn <[email protected]> Date: Tue Apr 5 02:04:04 2022 +0200 net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address() [ Upstream commit 11f8e7c122ce013fa745029fa8c94c6db69c2e54 ] There is often not a MAC address available in an EEPROM accessible by Linux with Marvell devices. Instead the bootload has the MAC address and directly programs it into the hardware. So don't consider an error from of_get_mac_address() has fatal. However, the check was added for the case where there is a MAC address in an the EEPROM, but the EEPROM has not probed yet, and -EPROBE_DEFER is returned. In that case the error should be returned. So make the check specific to this error code. Cc: Mauri Sandberg <[email protected]> Reported-by: Thomas Walther <[email protected]> Fixes: 42404d8f1c01 ("net: mv643xx_eth: process retval from of_get_mac_address") Signed-off-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Eric Dumazet <[email protected]> Date: Sat Feb 5 09:01:25 2022 -0800 net: initialize init_net earlier [ Upstream commit 9c1be1935fb68b2413796cdc03d019b8cf35ab51 ] While testing a patch that will follow later ("net: add netns refcount tracker to struct nsproxy") I found that devtmpfs_init() was called before init_net was initialized. This is a bug, because devtmpfs_setup() calls ksys_unshare(CLONE_NEWNS); This has the effect of increasing init_net refcount, which will be later overwritten to 1, as part of setup_net(&init_net) We had too many prior patches [1] trying to work around the root cause. Really, make sure init_net is in BSS section, and that net_ns_init() is called earlier at boot time. Note that another patch ("vfs: add netns refcount tracker to struct fs_context") also will need net_ns_init() being called before vfs_caches_init() As a bonus, this patch saves around 4KB in .data section. [1] f8c46cb39079 ("netns: do not call pernet ops for not yet set up init_net namespace") b5082df8019a ("net: Initialise init_net.count to 1") 734b65417b24 ("net: Statically initialize init_net.dev_base_head") v2: fixed a build error reported by kernel build bots (CONFIG_NET=n) Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Nikolay Aleksandrov <[email protected]> Date: Fri Apr 1 10:33:42 2022 +0300 net: ipv4: fix route with nexthop object delete warning [ Upstream commit 6bf92d70e690b7ff12b24f4bfff5e5434d019b82 ] FRR folks have hit a kernel warning[1] while deleting routes[2] which is caused by trying to delete a route pointing to a nexthop id without specifying nhid but matching on an interface. That is, a route is found but we hit a warning while matching it. The warning is from fib_info_nh() in include/net/nexthop.h because we run it on a fib_info with nexthop object. The call chain is: inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a nexthop fib_info and also with fc_oif set thus calling fib_info_nh on the fib_info and triggering the warning). The fix is to not do any matching in that branch if the fi has a nexthop object because those are managed separately. I.e. we should match when deleting without nh spec and should fail when deleting a nexthop route with old-style nh spec because nexthop objects are managed separately, e.g.: $ ip r show 1.2.3.4/32 1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0 $ ip r del 1.2.3.4/32 $ ip r del 1.2.3.4/32 nhid 12 <both should work> $ ip r del 1.2.3.4/32 dev dummy0 <should fail with ESRCH> [1] [ 523.462226] ------------[ cut here ]------------ [ 523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460 [ 523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd [ 523.462274] videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse [ 523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P OE 5.16.18-200.fc35.x86_64 #1 [ 523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020 [ 523.462303] RIP: 0010:fib_nh_match+0x210/0x460 [ 523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00 [ 523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286 [ 523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0 [ 523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380 [ 523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000 [ 523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031 [ 523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0 [ 523.462311] FS: 00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000 [ 523.462313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0 [ 523.462315] Call Trace: [ 523.462316] <TASK> [ 523.462320] fib_table_delete+0x1a9/0x310 [ 523.462323] inet_rtm_delroute+0x93/0x110 [ 523.462325] rtnetlink_rcv_msg+0x133/0x370 [ 523.462327] ? _copy_to_iter+0xb5/0x6f0 [ 523.462330] ? rtnl_calcit.isra.0+0x110/0x110 [ 523.462331] netlink_rcv_skb+0x50/0xf0 [ 523.462334] netlink_unicast+0x211/0x330 [ 523.462336] netlink_sendmsg+0x23f/0x480 [ 523.462338] sock_sendmsg+0x5e/0x60 [ 523.462340] ____sys_sendmsg+0x22c/0x270 [ 523.462341] ? import_iovec+0x17/0x20 [ 523.462343] ? sendmsg_copy_msghdr+0x59/0x90 [ 523.462344] ? __mod_lruvec_page_state+0x85/0x110 [ 523.462348] ___sys_sendmsg+0x81/0xc0 [ 523.462350] ? netlink_seq_start+0x70/0x70 [ 523.462352] ? __dentry_kill+0x13a/0x180 [ 523.462354] ? __fput+0xff/0x250 [ 523.462356] __sys_sendmsg+0x49/0x80 [ 523.462358] do_syscall_64+0x3b/0x90 [ 523.462361] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 523.462364] RIP: 0033:0x7f24552aa337 [ 523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337 [ 523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003 [ 523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001 [ 523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040 [ 523.462373] </TASK> [ 523.462374] ---[ end trace ba537bc16f6bf4ed ]--- [2] https://github.com/FRRouting/frr/issues/6412 Fixes: 4c7e8084fd46 ("ipv4: Plumb support for nexthop object in a fib_info") Signed-off-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jakub Kicinski <[email protected]> Date: Wed Mar 9 10:29:14 2022 -0800 net: limit altnames to 64k total [ Upstream commit 155fb43b70b5fce341347a77d1af2765d1e8fbb8 ] Property list (altname is a link "property") is wrapped in a nlattr. nlattrs length is 16bit so practically speaking the list of properties can't be longer than that, otherwise user space would have to interpret broken netlink messages. Prevent the problem from occurring by checking the length of the property list before adding new entries. Reported-by: George Shuklin <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ilya Maximets <[email protected]> Date: Mon Apr 4 12:41:50 2022 +0200 net: openvswitch: don't send internal clone attribute to the userspace. [ Upstream commit 3f2a3050b4a3e7f32fc0ea3c9b0183090ae00522 ] 'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for performance optimization inside the kernel. It's added by the kernel while parsing user-provided actions and should not be sent during the flow dump as it's not part of the uAPI. The issue doesn't cause any significant problems to the ovs-vswitchd process, because reported actions are not really used in the application lifecycle and only supposed to be shown to a human via ovs-dpctl flow dump. However, the action list is still incorrect and causes the following error if the user wants to look at the datapath flows: # ovs-dpctl add-dp system@ovs-system # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)" # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(bad length 4, expected -1 for: action0(01 00 00 00), ct(commit),0) With the fix: # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(ct(commit),0) Additionally fixed an incorrect attribute name in the comment. Fixes: b233504033db ("openvswitch: kernel datapath clone action") Signed-off-by: Ilya Maximets <[email protected]> Acked-by: Aaron Conole <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ilya Maximets <[email protected]> Date: Mon Apr 4 17:43:45 2022 +0200 net: openvswitch: fix leak of nested actions [ Upstream commit 1f30fb9166d4f15a1aa19449b9da871fe0ed4796 ] While parsing user-provided actions, openvswitch module may dynamically allocate memory and store pointers in the internal copy of the actions. So this memory has to be freed while destroying the actions. Currently there are only two such actions: ct() and set(). However, there are many actions that can hold nested lists of actions and ovs_nla_free_flow_actions() just jumps over them leaking the memory. For example, removal of the flow with the following actions will lead to a leak of the memory allocated by nf_ct_tmpl_alloc(): actions:clone(ct(commit),0) Non-freed set() action may also leak the 'dst' structure for the tunnel info including device references. Under certain conditions with a high rate of flow rotation that may cause significant memory leak problem (2MB per second in reporter's case). The problem is also hard to mitigate, because the user doesn't have direct control over the datapath flows generated by OVS. Fix that by iterating over all the nested actions and freeing everything that needs to be freed recursively. New build time assertion should protect us from this problem if new actions will be added in the future. Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all attributes has to be explicitly checked. sample() and clone() actions are mixing extra attributes into the user-provided action list. That prevents some code generalization too. Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst") Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html Reported-by: Stéphane Graber <[email protected]> Signed-off-by: Ilya Maximets <[email protected]> Acked-by: Aaron Conole <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Michael Walle <[email protected]> Date: Tue Apr 5 14:02:33 2022 +0200 net: phy: mscc-miim: reject clause 45 register accesses [ Upstream commit 8d90991e5bf7fdb9f264f5f579d18969913054b7 ] The driver doesn't support clause 45 register access yet, but doesn't check if the access is a c45 one either. This leads to spurious register reads and writes. Add the check. Fixes: 542671fe4d86 ("net: phy: mscc-miim: Add MDIO driver") Signed-off-by: Michael Walle <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Taehee Yoo <[email protected]> Date: Wed Mar 30 16:37:03 2022 +0000 net: sfc: add missing xdp queue reinitialization [ Upstream commit 059a47f1da93811d37533556d67e72f2261b1127 ] After rx/tx ring buffer size is changed, kernel panic occurs when it acts XDP_TX or XDP_REDIRECT. When tx/rx ring buffer size is changed(ethtool -G), sfc driver reallocates and reinitializes rx and tx queues and their buffer (tx_queue->buffer). But it misses reinitializing xdp queues(efx->xdp_tx_queues). So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized tx_queue->buffer. A new function efx_set_xdp_channels() is separated from efx_set_channels() to handle only xdp queues. Splat looks like: BUG: kernel NULL pointer dereference, address: 000000000000002a #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#4] PREEMPT SMP NOPTI RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 5.17.0+ #55 e8beeee8289528f11357029357cf Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 RSP: 0018:ffff92f121e45c60 EFLAGS: 00010297 RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] RAX: 0000000000000040 RBX: ffff92ea506895c0 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001139b10ce RDI: ffff92ea506895c0 RBP: ffffffffc0358a80 R08: 00000001139b110d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001139b10ce R15: ffff92ea506895c0 FS: 0000000000000000(0000) GS:ffff92f121ec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 CR2: 000000000000002a CR3: 00000003e6810004 CR4: 00000000007706e0 RSP: 0018:ffff92f121e85c60 EFLAGS: 00010297 PKRU: 55555554 RAX: 0000000000000040 RBX: ffff92ea50689700 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001145a90ce RDI: ffff92ea50689700 RBP: ffffffffc0358a80 R08: 00000001145a910d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001145a90ce R15: ffff92ea50689700 FS: 0000000000000000(0000) GS:ffff92f121e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000002a CR3: 00000003e6810005 CR4: 00000000007706e0 PKRU: 55555554 Call Trace: <IRQ> efx_xdp_tx_buffers+0x12b/0x3d0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] __efx_rx_packet+0x5c3/0x930 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_rx_packet+0x28c/0x2e0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_ef10_ev_process+0x5f8/0xf40 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] ? enqueue_task_fair+0x95/0x550 efx_poll+0xc4/0x360 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Taehee Yoo <[email protected]> Date: Tue Apr 5 08:45:44 2022 +0000 net: sfc: fix using uninitialized xdp tx_queue [ Upstream commit fb5833d81e4333294add35d3ac7f7f52a7bf107f ] In some cases, xdp tx_queue can get used before initialization. 1. interface up/down 2. ring buffer size change When CPU cores are lower than maximum number of channels of sfc driver, it creates new channels only for XDP. When an interface is up or ring buffer size is changed, all channels are initialized. But xdp channels are always initialized later. So, the below scenario is possible. Packets are received to rx queue of normal channels and it is acted XDP_TX and tx_queue of xdp channels get used. But these tx_queues are not initialized yet. If so, TX DMA or queue error occurs. In order to avoid this problem. 1. initializes xdp tx_queues earlier than other rx_queue in efx_start_channels(). 2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers(). Splat looks like: sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250 sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL) sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22 (raw=22) arg=789 sfc 0000:08:00.1 enp8s0f1np1: has been disabled Fixes: f28100cb9c96 ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)") Acked-by: Martin Habets <[email protected]> Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Michael Walle <[email protected]> Date: Sat Mar 12 21:50:14 2022 +0100 net: sfp: add 2500base-X quirk for Lantech SFP module [ Upstream commit 00eec9fe4f3b9588b4bfa8ef9dd0aae96407d5d7 ] The Lantech 8330-262D-E module is 2500base-X capable, but it reports the nominal bitrate as 2500MBd instead of 3125MBd. Add a quirk for the module. The following in an EEPROM dump of such a SFP with the serial number redacted: 00: 03 04 07 00 00 00 01 20 40 0c 05 01 19 00 00 00 ???...? @????... 10: 1e 0f 00 00 4c 61 6e 74 65 63 68 20 20 20 20 20 ??..Lantech 20: 20 20 20 20 00 00 00 00 38 33 33 30 2d 32 36 32 ....8330-262 30: 44 2d 45 20 20 20 20 20 56 31 2e 30 03 52 00 cb D-E V1.0?R.? 40: 00 1a 00 00 46 43 XX XX XX XX XX XX XX XX XX XX .?..FCXXXXXXXXXX 50: 20 20 20 20 32 32 30 32 31 34 20 20 68 b0 01 98 220214 h??? 60: 45 58 54 52 45 4d 45 4c 59 20 43 4f 4d 50 41 54 EXTREMELY COMPAT 70: 49 42 4c 45 20 20 20 20 20 20 20 20 20 20 20 20 IBLE Signed-off-by: Michael Walle <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Chen-Yu Tsai <[email protected]> Date: Fri Apr 1 02:48:32 2022 +0800 net: stmmac: Fix unset max_speed difference between DT and non-DT platforms [ Upstream commit c21cabb0fd0b54b8b54235fc1ecfe1195a23bcb2 ] In commit 9cbadf094d9d ("net: stmmac: support max-speed device tree property"), when DT platforms don't set "max-speed", max_speed is set to -1; for non-DT platforms, it stays the default 0. Prior to commit eeef2f6b9f6e ("net: stmmac: Start adding phylink support"), the check for a valid max_speed setting was to check if it was greater than zero. This commit got it right, but subsequent patches just checked for non-zero, which is incorrect for DT platforms. In commit 92c3807b9ac3 ("net: stmmac: convert to phylink_get_linkmodes()") the conversion switched completely to checking for non-zero value as a valid value, which caused 1000base-T to stop getting advertised by default. Instead of trying to fix all the checks, simply leave max_speed alone if DT property parsing fails. Fixes: 9cbadf094d9d ("net: stmmac: support max-speed device tree property") Fixes: 92c3807b9ac3 ("net: stmmac: convert to phylink_get_linkmodes()") Signed-off-by: Chen-Yu Tsai <[email protected]> Acked-by: Russell King (Oracle) <[email protected]> Reviewed-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Florian Westphal <[email protected]> Date: Wed Feb 16 16:43:05 2022 +0100 netfilter: conntrack: revisit gc autotuning [ Upstream commit 2cfadb761d3d0219412fd8150faea60c7e863833 ] as of commit 4608fdfc07e1 ("netfilter: conntrack: collect all entries in one cycle") conntrack gc was changed to run every 2 minutes. On systems where conntrack hash table is set to large value, most evictions happen from gc worker rather than the packet path due to hash table distribution. This causes netlink event overflows when events are collected. This change collects average expiry of scanned entries and reschedules to the average remaining value, within 1 to 60 second interval. To avoid event overflows, reschedule after each bucket and add a limit for both run time and number of evictions per run. If more entries have to be evicted, reschedule and restart 1 jiffy into the future. Reported-by: Karel Rericha <[email protected]> Cc: Shmulik Ladkani <[email protected]> Cc: Eyal Birger <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Wang Yufen <[email protected]> Date: Fri Mar 18 14:35:08 2022 +0800 netlabel: fix out-of-bounds memory accesses [ Upstream commit f22881de730ebd472e15bcc2c0d1d46e36a87b9c ] In calipso_map_cat_ntoh(), in the for loop, if the return value of netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses of bitmap[byte_offset] occurs. The bug was found during fuzzing. The following is the fuzzing report BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0 Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252 CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x21c/0x230 show_stack+0x1c/0x60 dump_stack_lvl+0x64/0x7c print_address_description.constprop.0+0x70/0x2d0 __kasan_report+0x158/0x16c kasan_report+0x74/0x120 __asan_load1+0x80/0xa0 netlbl_bitmap_walk+0x3c/0xd0 calipso_opt_getattr+0x1a8/0x230 calipso_sock_getattr+0x218/0x340 calipso_sock_getattr+0x44/0x60 netlbl_sock_getattr+0x44/0x80 selinux_netlbl_socket_setsockopt+0x138/0x170 selinux_socket_setsockopt+0x4c/0x60 security_socket_setsockopt+0x4c/0x90 __sys_setsockopt+0xbc/0x2b0 __arm64_sys_setsockopt+0x6c/0x84 invoke_syscall+0x64/0x190 el0_svc_common.constprop.0+0x88/0x200 do_el0_svc+0x88/0xa0 el0_svc+0x128/0x1b0 el0t_64_sync_handler+0x9c/0x120 el0t_64_sync+0x16c/0x170 Reported-by: Hulk Robot <[email protected]> Signed-off-by: Wang Yufen <[email protected]> Acked-by: Paul Moore <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Trond Myklebust <[email protected]> Date: Mon Mar 21 13:48:36 2022 -0400 NFS: Avoid writeback threads getting stuck in mempool_alloc() [ Upstream commit 0bae835b63c53f86cdc524f5962e39409585b22c ] In a low memory situation, allow the NFS writeback code to fail without getting stuck in infinite loops in mempool_alloc(). Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Trond Myklebust <[email protected]> Date: Mon Mar 21 12:34:19 2022 -0400 NFS: nfsiod should not block forever in mempool_alloc() [ Upstream commit 515dcdcd48736576c6f5c197814da6f81c60a21e ] The concern is that since nfsiod is sometimes required to kick off a commit, it can get locked up waiting forever in mempool_alloc() instead of failing gracefully and leaving the commit until later. Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY, then fall back to a non-blocking attempt to allocate from the memory pool. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: NeilBrown <[email protected]> Date: Mon Mar 7 10:41:44 2022 +1100 NFS: swap IO handling is slightly different for O_DIRECT IO [ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ] 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: NeilBrown <[email protected]> Date: Mon Mar 7 10:41:44 2022 +1100 NFS: swap-out must always use STABLE writes. [ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ] The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads. swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change. For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Xin Xiong <[email protected]> Date: Tue Jan 25 21:10:45 2022 +0800 NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify() [ Upstream commit b7f114edd54326f730a754547e7cfb197b5bc132 ] [You don't often get email from [email protected]. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] The reference counting issue happens in two error paths in the function _nfs42_proc_copy_notify(). In both error paths, the function simply returns the error code and forgets to balance the refcount of object `ctx`, bumped by get_nfs_open_context() earlier, which may cause refcount leaks. Fix it by balancing refcount of the `ctx` object before the function returns in both error paths. Signed-off-by: Xin Xiong <[email protected]> Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: ChenXiaoSong <[email protected]> Date: Tue Mar 29 19:32:08 2022 +0800 NFSv4: fix open failure with O_ACCMODE flag [ Upstream commit b243874f6f9568b2daf1a00e9222cacdc15e159c ] open() with O_ACCMODE|O_DIRECT flags secondly will fail. Reproducer: 1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/ 2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT) 3. close(fd) 4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT) Server nfsd4_decode_share_access() will fail with error nfserr_bad_xdr when client use incorrect share access mode of 0. Fix this by using NFS4_SHARE_ACCESS_BOTH share access mode in client, just like firstly opening. Fixes: ce4ef7c0a8a05 ("NFS: Split out NFS v4 file operations") Signed-off-by: ChenXiaoSong <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Trond Myklebust <[email protected]> Date: Sat Jan 29 13:32:45 2022 -0500 NFSv4: Protect the state recovery thread against direct reclaim [ Upstream commit 3e17898aca293a24dae757a440a50aa63ca29671 ] If memory allocation triggers a direct reclaim from the state recovery thread, then we can deadlock. Use memalloc_nofs_save/restore to ensure that doesn't happen. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Viresh Kumar <[email protected]> Date: Thu Feb 10 14:37:53 2022 +0530 opp: Expose of-node's name in debugfs [ Upstream commit 021dbecabc93b1610b5db989d52a94e0c6671136 ] It is difficult to find which OPPs are active at the moment, specially if there are multiple OPPs with same frequency available in the device tree (controlled by supported hardware feature). Expose name of the DT node to find out the exact OPP. While at it, also expose level field. Reported-by: Leo Yan <[email protected]> Tested-by: Leo Yan <[email protected]> Signed-off-by: Viresh Kumar <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Helge Deller <[email protected]> Date: Sun Mar 27 15:46:26 2022 +0200 parisc: Fix CPU affinity for Lasi, WAX and Dino chips [ Upstream commit 939fc856676c266c3bc347c1c1661872a3725c0f ] Add the missing logic to allow Lasi, WAX and Dino to set the CPU affinity. This fixes IRQ migration to other CPUs when a CPU is shutdown which currently holds the IRQs for one of those chips. Signed-off-by: Helge Deller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: John David Anglin <[email protected]> Date: Tue Mar 29 18:54:36 2022 +0000 parisc: Fix patch code locking and flushing [ Upstream commit a9fe7fa7d874a536e0540469f314772c054a0323 ] This change fixes the following: 1) The flags variable is not initialized. Always use raw_spin_lock_irqsave and raw_spin_unlock_irqrestore to serialize patching. 2) flush_kernel_vmap_range is primarily intended for DMA flushes. Since __patch_text_multiple is often called with interrupts disabled, it is better to directly call flush_kernel_dcache_range_asm and flush_kernel_icache_range_asm. This avoids an extra call. 3) The final call to flush_icache_range is unnecessary. Signed-off-by: John David Anglin <[email protected]> Signed-off-by: Helge Deller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Pali Rohár <[email protected]> Date: Mon Jan 10 02:49:58 2022 +0100 PCI: aardvark: Fix support for MSI interrupts [ Upstream commit b0b0b8b897f8e12b2368e868bd7cdc5742d5c5a9 ] Aardvark hardware supports Multi-MSI and MSI_FLAG_MULTI_PCI_MSI is already set for the MSI chip. But when allocating MSI interrupt numbers for Multi-MSI, the numbers need to be properly aligned, otherwise endpoint devices send MSI interrupt with incorrect numbers. Fix this issue by using function bitmap_find_free_region() instead of bitmap_find_next_zero_area(). To ensure that aligned MSI interrupt numbers are used by endpoint devices, we cannot use Linux virtual irq numbers (as they are random and not properly aligned). Instead we need to use the aligned hwirq numbers. This change fixes receiving MSI interrupts on Armada 3720 boards and allows using NVMe disks which use Multi-MSI feature with 3 interrupts. Without this NVMe disks freeze booting as linux nvme-core.c is waiting 60s for an interrupt. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Pali Rohár <[email protected]> Signed-off-by: Marek Behún <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hou Zhiqiang <[email protected]> Date: Fri Dec 17 17:47:08 2021 +0800 PCI: endpoint: Fix alignment fault error in copy tests [ Upstream commit 829cc0e2ea2d61fb6c54bc3f8a17f86c56e11864 ] The copy test uses the memcpy() to copy data between IO memory spaces. This can trigger an alignment fault error (pasted the error logs below) because memcpy() may use unaligned accesses on a mapped memory that is just IO, which does not support unaligned memory accesses. Fix it by using the correct memcpy API to copy from/to IO memory. Alignment fault error logs: Unable to handle kernel paging request at virtual address ffff8000101cd3c1 Mem abort info: ESR = 0x96000021 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x21: alignment fault Data abort info: ISV = 0, ISS = 0x00000021 CM = 0, WnR = 0 swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081773000 [ffff8000101cd3c1] pgd=1000000082410003, p4d=1000000082410003, pud=1000000082411003, pmd=1000000082412003, pte=0068004000001f13 Internal error: Oops: 96000021 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 6 Comm: kworker/0:0H Not tainted 5.15.0-rc1-next-20210914-dirty #2 Hardware name: LS1012A RDB Board (DT) Workqueue: kpcitest pci_epf_test_cmd_handler pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __memcpy+0x168/0x230 lr : pci_epf_test_cmd_handler+0x6f0/0xa68 sp : ffff80001003bce0 x29: ffff80001003bce0 x28: ffff800010135000 x27: ffff8000101e5000 x26: ffff8000101cd000 x25: ffff6cda941cf6c8 x24: 0000000000000000 x23: ffff6cda863f2000 x22: ffff6cda9096c800 x21: ffff800010135000 x20: ffff6cda941cf680 x19: ffffaf39fd999000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffffaf39fd2b6000 x14: 0000000000000000 x13: 15f5c8fa2f984d57 x12: 604d132b60275454 x11: 065cee5e5fb428b6 x10: aae662eb17d0cf3e x9 : 1d97c9a1b4ddef37 x8 : 7541b65edebf928c x7 : e71937c4fc595de0 x6 : b8a0e09562430d1c x5 : ffff8000101e5401 x4 : ffff8000101cd401 x3 : ffff8000101e5380 x2 : fffffffffffffff1 x1 : ffff8000101cd3c0 x0 : ffff8000101e5000 Call trace: __memcpy+0x168/0x230 process_one_work+0x1ec/0x370 worker_thread+0x44/0x478 kthread+0x154/0x160 ret_from_fork+0x10/0x20 Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e) ---[ end trace 568c28c7b6336335 ]--- Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hou Zhiqiang <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Reviewed-by: Kishon Vijay Abraham I <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Li Chen <[email protected]> Date: Fri Jan 21 15:48:23 2022 +0800 PCI: endpoint: Fix misused goto label [ Upstream commit bf8d87c076f55b8b4dfdb6bc6c6b6dc0c2ccb487 ] Fix a misused goto label jump since that can result in a memory leak. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Li Chen <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Acked-by: Kishon Vijay Abraham I <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Manivannan Sadhasivam <[email protected]> Date: Thu Feb 10 20:20:03 2022 +0530 PCI: pciehp: Add Qualcomm quirk for Command Completed erratum [ Upstream commit 9f72d4757cbe4d1ed669192f6d23817c9e437c4b ] The Qualcomm PCI bridge device (Device ID 0x0110) found in chipsets such as SM8450 does not set the Command Completed bit unless writes to the Slot Command register change "Control" bits. This results in timeouts like below: pcieport 0001:00:00.0: pciehp: Timeout on hotplug command 0x03c0 (issued 2020 msec ago) Add the device to the Command Completed quirk to mark commands "completed" immediately unless they change the "Control" bits. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Manivannan Sadhasivam <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Arnaldo Carvalho de Melo <[email protected]> Date: Thu Apr 7 11:04:20 2022 -0300 perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13 commit 3a8a0475861a443f02e3a9b57d044fe2a0a99291 upstream. Using -ffat-lto-objects in the python feature test when building with clang-13 results in: clang-13: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument] error: command '/usr/sbin/clang' failed with exit code 1 cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory make[2]: *** [Makefile.perf:639: /tmp/build/perf/python/perf.so] Error 1 Noticed when building on a docker.io/library/archlinux:base container. Cc: Adrian Hunter <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Keeping <[email protected]> Cc: Leo Yan <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Sedat Dilek <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Arnaldo Carvalho de Melo <[email protected]> Date: Fri Apr 8 10:08:07 2022 -0300 perf python: Fix probing for some clang command line options commit dd6e1fe91cdd52774ca642d1da75b58a86356b56 upstream. The clang compiler complains about some options even without a source file being available, while others require one, so use the simple tools/build/feature/test-hello.c file. Then check for the "is not supported" string in its output, in addition to the "unknown argument" already being looked for. This was noticed when building with clang-13 where -ffat-lto-objects isn't supported and since we were looking just for "unknown argument" and not providing a source code to clang, was mistakenly assumed as being available and not being filtered to set of command line options provided to clang, leading to a build failure. Cc: Adrian Hunter <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Keeping <[email protected]> Cc: Leo Yan <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Sedat Dilek <[email protected]> Link: http://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Denis Nikitin <[email protected]> Date: Tue Mar 29 20:11:30 2022 -0700 perf session: Remap buf if there is no space for event [ Upstream commit bc21e74d4775f883ae1f542c1f1dc7205b15d925 ] If a perf event doesn't fit into remaining buffer space return NULL to remap buf and fetch the event again. Keep the logic to error out on inadequate input from fuzzing. This fixes perf failing on ChromeOS (with 32b userspace): $ perf report -v -i perf.data ... prefetch_event: head=0x1fffff8 event->header_size=0x30, mmap_size=0x2000000: fuzzed or compressed perf.data? Error: failed to process sample Fixes: 57fc032ad643ffd0 ("perf session: Avoid infinite loop when seeing invalid header.size") Reviewed-by: James Clark <[email protected]> Signed-off-by: Denis Nikitin <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Budankov <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Adrian Hunter <[email protected]> Date: Fri Apr 8 16:26:25 2022 +0300 perf tools: Fix perf's libperf_print callback [ Upstream commit aeee9dc53ce405d2161f9915f553114e94e5b677 ] eprintf() does not expect va_list as the type of the 4th parameter. Use veprintf() because it does. Signed-off-by: Adrian Hunter <[email protected]> Fixes: 428dab813a56ce94 ("libperf: Merge libperf_set_print() into libperf_init()") Cc: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Namhyung Kim <[email protected]> Date: Mon Mar 28 13:01:12 2022 -0700 perf/core: Inherit event_caps commit e3265a4386428d3d157d9565bb520aabff8b4bf0 upstream. It was reported that some perf event setup can make fork failed on ARM64. It was the case of a group of mixed hw and sw events and it failed in perf_event_init_task() due to armpmu_event_init(). The ARM PMU code checks if all the events in a group belong to the same PMU except for software events. But it didn't set the event_caps of inherited events and no longer identify them as software events. Therefore the test failed in a child process. A simple reproducer is: $ perf stat -e '{cycles,cs,instructions}' perf bench sched messaging # Running 'sched/messaging' benchmark: perf: fork(): Invalid argument The perf stat was fine but the perf bench failed in fork(). Let's inherit the event caps from the parent. Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Kan Liang <[email protected]> Date: Mon Mar 28 08:49:02 2022 -0700 perf/x86/intel: Don't extend the pseudo-encoding to GP counters commit 4a263bf331c512849062805ef1b4ac40301a9829 upstream. The INST_RETIRED.PREC_DIST event (0x0100) doesn't count on SPR. perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0 Performance counter stats for 'CPU(s) 0': 607,246 cpu/event=0xc0,umask=0x0/ 0 cpu/event=0x0,umask=0x1/ The encoding for INST_RETIRED.PREC_DIST is pseudo-encoding, which doesn't work on the generic counters. However, current perf extends its mask to the generic counters. The pseudo event-code for a fixed counter must be 0x00. Check and avoid extending the mask for the fixed counter event which using the pseudo-encoding, e.g., ref-cycles and PREC_DIST event. With the patch, perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0 Performance counter stats for 'CPU(s) 0': 583,184 cpu/event=0xc0,umask=0x0/ 583,048 cpu/event=0x0,umask=0x1/ Fixes: 2de71ee153ef ("perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings") Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Kan Liang <[email protected]> Date: Mon Mar 28 08:49:03 2022 -0700 perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids commit e590928de7547454469693da9bc7ffd562e54b7e upstream. On Sapphire Rapids, the FRONTEND_RETIRED.MS_FLOWS event requires the FRONTEND MSR value 0x8. However, the current FRONTEND MSR mask doesn't support it. Update intel_spr_extra_regs[] to support it. Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: James Clark <[email protected]> Date: Fri Apr 8 15:40:56 2022 +0100 perf: arm-spe: Fix perf report --mem-mode [ Upstream commit ffab487052054162b3b6c9c6005777ec6cfcea05 ] Since commit bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available") "perf mem report" and "perf report --mem-mode" don't allow opening the file unless one of the events has PERF_SAMPLE_DATA_SRC set. SPE doesn't have this set even though synthetic memory data is generated after it is decoded. Fix this issue by setting DATA_SRC on SPE events. This has no effect on the data collected because the SPE driver doesn't do anything with that flag and doesn't generate samples. Fixes: bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available") Signed-off-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: German Gomez <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: [email protected] Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Xiaomeng Tong <[email protected]> Date: Sun Mar 27 13:57:33 2022 +0800 perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator commit 2012a9e279013933885983cbe0a5fe828052563b upstream. The bug is here: return cluster; The list iterator value 'cluster' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found. To fix the bug, return 'cluster' when found, otherwise return NULL. Cc: [email protected] Fixes: 21bdbb7102ed ("perf: add qcom l2 cache perf events driver") Signed-off-by: Xiaomeng Tong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Amjad Ouled-Ameur <[email protected]> Date: Tue Jan 11 10:52:55 2022 +0100 phy: amlogic: meson8b-usb2: fix shared reset control use [ Upstream commit 6f1dedf089ab1a4f03ea7aadc3c4a99885b4b4a0 ] Use reset_control_rearm() call if an error occurs in case phy_meson8b_usb2_power_on() fails after reset() has been called, or in case phy_meson8b_usb2_power_off() is called i.e the resource is no longer used and the reset line may be triggered again by other devices. reset_control_rearm() keeps use of triggered_count sane in the reset framework, use of reset_control_reset() on shared reset line should be balanced with reset_control_rearm(). Signed-off-by: Amjad Ouled-Ameur <[email protected]> Reported-by: Jerome Brunet <[email protected]> Reviewed-by: Martin Blumenstingl <[email protected]> Acked-by: Neil Armstrong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Amjad Ouled-Ameur <[email protected]> Date: Tue Jan 11 10:52:54 2022 +0100 phy: amlogic: meson8b-usb2: Use dev_err_probe() [ Upstream commit 6466ba1898d415b527e1013bd8551a6fdfece94c ] Use the existing dev_err_probe() helper instead of open-coding the same operation. Signed-off-by: Amjad Ouled-Ameur <[email protected]> Reported-by: Martin Blumenstingl <[email protected]> Reviewed-by: Martin Blumenstingl <[email protected]> Acked-by: Neil Armstrong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Amjad Ouled-Ameur <[email protected]> Date: Tue Jan 11 10:52:53 2022 +0100 phy: amlogic: phy-meson-gxl-usb2: fix shared reset controller use [ Upstream commit 2f87727130ce17ffefecd0895eeebf22d5a36f6f ] Use reset_control_rearm() call if an error occurs in case phy_meson_gxl_usb2_init() fails after reset() has been called ; or in case phy_meson_gxl_usb2_exit() is called i.e the resource is no longer used and the reset line may be triggered again by other devices. reset_control_rearm() keeps use of triggered_count sane in the reset framework. Therefore, use of reset_control_reset() on shared reset line should be balanced with reset_control_rearm(). Signed-off-by: Amjad Ouled-Ameur <[email protected]> Reported-by: Jerome Brunet <[email protected]> Reviewed-by: Martin Blumenstingl <[email protected]> Reviewed-by: Philipp Zabel <[email protected]> Acked-by: Neil Armstrong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Evgeny Boger <[email protected]> Date: Wed Jan 12 11:47:27 2022 +0300 power: supply: axp20x_battery: properly report current when discharging [ Upstream commit d4f408cdcd26921c1268cb8dcbe8ffb6faf837f3 ] As stated in [1], negative current values are used for discharging batteries. AXP PMICs internally have two different ADC channels for shunt current measurement: one used during charging and one during discharging. The values reported by these ADCs are unsigned. While the driver properly selects ADC channel to get the data from, it doesn't apply negative sign when reporting discharging current. [1] Documentation/ABI/testing/sysfs-class-power Signed-off-by: Evgeny Boger <[email protected]> Acked-by: Chen-Yu Tsai <[email protected]> Signed-off-by: Sebastian Reichel <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hans de Goede <[email protected]> Date: Tue Feb 8 13:51:47 2022 +0100 power: supply: axp288-charger: Set Vhold to 4.4V [ Upstream commit 5ac121b81b4051e7fc83d5b3456a5e499d5bd147 ] The AXP288's recommended and factory default Vhold value (minimum input voltage below which the input current draw will be reduced) is 4.4V. This lines up with other charger IC's such as the TI bq2419x/bq2429x series which use 4.36V or 4.44V. For some reason some BIOS-es initialize Vhold to 4.6V or even 4.7V which combined with the typical voltage drop over typically low wire gauge micro-USB cables leads to the input-current getting capped below 1A (with a 2A capable dedicated charger) based on Vhold. This leads to slow charging, or even to the device slowly discharging if the device is in heavy use. As the Linux AXP288 drivers use the builtin BC1.2 charger detection and send the input-current-limit according to the detected charger there really is no reason not to use the recommended 4.4V Vhold. Set Vhold to 4.4V to fix the slow charging issue on various devices. There is one exception, the special-case of the HP X2 2-in-1s which combine this BC1.2 capable PMIC with a Type-C port and a 5V/3A factory provided charger with a Type-C plug which does not do BC1.2. These have their input-current-limit hardcoded to 3A (like under Windows) and use a higher Vhold on purpose to limit the current when used with other chargers. To avoid touching Vhold on these HP X2 laptops the code setting Vhold is added to an else branch of the if checking for these models. Note this also fixes the sofar unused VBUS_ISPOUT_VHOLD_SET_MASK define, which was wrong. Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Sebastian Reichel <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christophe Leroy <[email protected]> Date: Sun Mar 27 09:32:26 2022 +0200 powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S commit af41d2866f7d75bbb38d487f6ec7770425d70e45 upstream. Using conditional branches between two files is hasardous, they may get linked too far from each other. arch/powerpc/kvm/book3s_64_entry.o:(.text+0x3ec): relocation truncated to fit: R_PPC64_REL14 (stub) against symbol `system_reset_common' defined in .text section in arch/powerpc/kernel/head_64.o Reorganise the code to use non conditional branches. Fixes: 89d35b239101 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C") Signed-off-by: Christophe Leroy <[email protected]> [mpe: Avoid odd-looking bne ., use named local labels] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/89cf27bf43ee07a0b2879b9e8e2f5cd6386a3645.1648366338.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Michael Ellerman <[email protected]> Date: Fri Mar 4 17:12:22 2022 +1100 powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E [ Upstream commit 1a76e520ee1831a81dabf8a9a58c6453f700026e ] Since the IBM A2 CPU support was removed, see commit fb5a515704d7 ("powerpc: Remove platforms/wsp and associated pieces"), the only 64-bit Book3E CPUs we support are Freescale (NXP) ones. However our Kconfig still allows configurating a kernel that has 64-bit Book3E support, but no Freescale CPU support enabled. Such a kernel would never boot, it doesn't know about any CPUs. It also causes build errors, as reported by lkp, because PPC_BARRIER_NOSPEC is not enabled in such a configuration: powerpc64-linux-ld: arch/powerpc/net/bpf_jit_comp64.o:(.toc+0x0): undefined reference to `powerpc_security_features' To fix this, force PPC_FSL_BOOK3E to be selected whenever we are building a 64-bit Book3E kernel. Reported-by: kernel test robot <[email protected]> Reported-by: Naveen N. Rao <[email protected]> Suggested-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Nicholas Piggin <[email protected]> Date: Fri Feb 4 13:53:48 2022 +1000 powerpc/64s/hash: Make hash faults work in NMI context [ Upstream commit 8b91cee5eadd2021f55e6775f2d50bd56d00c217 ] Hash faults are not resoved in NMI context, instead causing the access to fail. This is done because perf interrupts can get backtraces including walking the user stack, and taking a hash fault on those could deadlock on the HPTE lock if the perf interrupt hits while the same HPTE lock is being held by the hash fault code. The user-access for the stack walking will notice the access failed and deal with that in the perf code. The reason to allow perf interrupts in is to better profile hash faults. The problem with this is any hash fault on a kernel access that happens in NMI context will crash, because kernel accesses must not fail. Hard lockups, system reset, machine checks that access vmalloc space including modules and including stack backtracing and symbol lookup in modules, per-cpu data, etc could all run into this problem. Fix this by disallowing perf interrupts in the hash fault code (the direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs to extend to the preload case). This simplifies the tricky logic in hash faults and perf, at the cost of reduced profiling of hash faults. perf can still latch addresses when interrupts are disabled, it just won't get the stack trace at that point, so it would still find hot spots, just sometimes with confusing stack chains. An alternative could be to allow perf interrupts here but always do the slowpath stack walk if we are in nmi context, but that slows down all perf interrupt stack walking on hash though and it does not remove as much tricky code. Reported-by: Laurent Dufour <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Laurent Dufour <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Hangyu Hua <[email protected]> Date: Wed Mar 2 10:19:59 2022 +0800 powerpc/secvar: fix refcount leak in format_show() [ Upstream commit d601fd24e6964967f115f036a840f4f28488f63f ] Refcount leak will happen when format_show returns failure in multiple cases. Unified management of of_node_put can fix this problem. Signed-off-by: Hangyu Hua <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Christophe Leroy <[email protected]> Date: Fri Dec 24 11:07:33 2021 +0000 powerpc/set_memory: Avoid spinlock recursion in change_page_attr() [ Upstream commit a4c182ecf33584b9b2d1aa9dad073014a504c01f ] Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") included a spin_lock() to change_page_attr() in order to safely perform the three step operations. But then commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against concurrent accesses") modify it to use pte_update() and do the operation safely against concurrent access. In the meantime, Maxime reported some spinlock recursion. [ 15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217 [ 15.357540] lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0 [ 15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523 [ 15.373350] Workqueue: events do_free_init [ 15.377615] Call Trace: [ 15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable) [ 15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4 [ 15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0 [ 15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8 [ 15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94 [ 15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134 [ 15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8 [ 15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c [ 15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8 [ 15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94 [ 15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8 [ 15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8 [ 15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210 [ 15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c Remove the read / modify / write sequence to make the operation atomic and remove the spin_lock() in change_page_attr(). To do the operation atomically, we can't use pte modification helpers anymore. Because all platforms have different combination of bits, it is not easy to use those bits directly. But all have the _PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need it to compare two sets to know which bits are set or cleared. For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you know which bit gets cleared and which bit get set when changing exec permission. Reported-by: Maxime Bizon <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/all/20211212112152.GA27070@sakura/ Link: https://lore.kernel.org/r/43c3c76a1175ae6dc1a3d3b5c3f7ecb48f683eea.1640344012.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <[email protected]>
Author: Maxim Kiselev <[email protected]> Date: Thu Dec 30 18:11:21 2021 +0300 powerpc: dts: t104xrdb: fix phy type for FMAN 4/5 [ Upstream commit 17846485dff91acce1ad47b508b633dffc32e838 ] T1040RDB has two RTL8211E-VB phys which requires setting of internal delays for correct work. Changing the phy-connection-type property to `rgmii-id` will fix this issue. Signed-off-by: Maxim Kiselev <[email protected]> Reviewed-by: Maxim Kochetkov <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Kefeng Wang <[email protected]> Date: Thu Apr 7 00:57:57 2022 +1000 powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit commit ffa0b64e3be58519ae472ea29a1a1ad681e32f48 upstream. mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000. Because of the way __pa() works we have: __pa(0x8000000000000000) == 0, and therefore virt_to_pfn(0x8000000000000000) == 0, and therefore virt_addr_valid(0x8000000000000000) == true Which is wrong, virt_addr_valid() should be false for vmalloc space. In fact all vmalloc addresses that alias with a valid PFN will return true from virt_addr_valid(). That can cause bugs with hardened usercopy as described below by Kefeng Wang: When running ethtool eth0 on 64-bit Book3E, a BUG occurred: usercopy: Kernel memory exposure attempt detected from SLUB object not in SLUB page?! (offset 0, size 1048)! kernel BUG at mm/usercopy.c:99 ... usercopy_abort+0x64/0xa0 (unreliable) __check_heap_object+0x168/0x190 __check_object_size+0x1a0/0x200 dev_ethtool+0x2494/0x2b20 dev_ioctl+0x5d0/0x770 sock_do_ioctl+0xf0/0x1d0 sock_ioctl+0x3ec/0x5a0 __se_sys_ioctl+0xf0/0x160 system_call_exception+0xfc/0x1f0 system_call_common+0xf8/0x200 The code shows below, data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN)); copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN)) The data is alloced by vmalloc(), virt_addr_valid(ptr) will return true on 64-bit Book3E, which leads to the panic. As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addresses") does, make sure the virt addr above PAGE_OFFSET in the virt_addr_valid() for 64-bit, also add upper limit check to make sure the virt is below high_memory. Meanwhile, for 32-bit PAGE_OFFSET is the virtual address of the start of lowmem, high_memory is the upper low virtual address, the check is suitable for 32-bit, this will fix the issue mentioned in commit 602946ec2f90 ("powerpc: Set max_mapnr correctly") too. On 32-bit there is a similar problem with high memory, that was fixed in commit 602946ec2f90 ("powerpc: Set max_mapnr correctly"), but that commit breaks highmem and needs to be reverted. We can't easily fix __pa(), we have code that relies on its current behaviour. So for now add extra checks to virt_addr_valid(). For 64-bit Book3S the extra checks are not necessary, the combination of virt_to_pfn() and pfn_valid() should yield the correct result, but they are harmless. Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> [mpe: Add additional change log detail] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sourabh Jain <[email protected]> Date: Fri Feb 4 14:26:01 2022 +0530 powerpc: Set crashkernel offset to mid of RMA region [ Upstream commit 7c5ed82b800d8615cdda00729e7b62e5899f0b13 ] On large config LPARs (having 192 and more cores), Linux fails to boot due to insufficient memory in the first memblock. It is due to the memory reservation for the crash kernel which starts at 128MB offset of the first memblock. This memory reservation for the crash kernel doesn't leave enough space in the first memblock to accommodate other essential system resources. The crash kernel start address was set to 128MB offset by default to ensure that the crash kernel get some memory below the RMA region which is used to be of size 256MB. But given that the RMA region size can be 512MB or more, setting the crash kernel offset to mid of RMA size will leave enough space for the kernel to allocate memory for other system resources. Since the above crash kernel offset change is only applicable to the LPAR platform, the LPAR feature detection is pushed before the crash kernel reservation. The rest of LPAR specific initialization will still be done during pseries_probe_fw_features as usual. This patch is dependent on changes to paca allocation for boot CPU. It expect boot CPU to discover 1T segment support which is introduced by the patch posted here: https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html Reported-by: Abdul haleem <[email protected]> Signed-off-by: Sourabh Jain <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Yang Guang <[email protected]> Date: Thu Jan 27 08:02:36 2022 +0800 ptp: replace snprintf with sysfs_emit [ Upstream commit e2cf07654efb0fd7bbcb475c6f74be7b5755a8fd ] coccinelle report: ./drivers/ptp/ptp_sysfs.c:17:8-16: WARNING: use scnprintf or sprintf ./drivers/ptp/ptp_sysfs.c:390:8-16: WARNING: use scnprintf or sprintf Use sysfs_emit instead of scnprintf or sprintf makes more sense. Reported-by: Zeal Robot <[email protected]> Signed-off-by: Yang Guang <[email protected]> Signed-off-by: David Yang <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jamie Bainbridge <[email protected]> Date: Wed Apr 6 21:19:19 2022 +1000 qede: confirm skb is allocated before using [ Upstream commit 4e910dbe36508654a896d5735b318c0b88172570 ] qede_build_skb() assumes build_skb() always works and goes straight to skb_reserve(). However, build_skb() can fail under memory pressure. This results in a kernel panic because the skb to reserve is NULL. Add a check in case build_skb() failed to allocate and return NULL. The NULL return is handled correctly in callers to qede_build_skb(). Fixes: 8a8633978b842 ("qede: Add build_skb() support.") Signed-off-by: Jamie Bainbridge <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Douglas Miller <[email protected]> Date: Fri Apr 8 09:35:23 2022 -0400 RDMA/hfi1: Fix use-after-free bug for mm struct commit 2bbac98d0930e8161b1957dc0ec99de39ade1b3c upstream. Under certain conditions, such as MPI_Abort, the hfi1 cleanup code may represent the last reference held on the task mm. hfi1_mmu_rb_unregister() then drops the last reference and the mm is freed before the final use in hfi1_release_user_pages(). A new task may allocate the mm structure while it is still being used, resulting in problems. One manifestation is corruption of the mmap_sem counter leading to a hang in down_write(). Another is corruption of an mm struct that is in use by another task. Fixes: 3d2a9d642512 ("IB/hfi1: Ensure correct mm is used at all times") Link: https://lore.kernel.org/r/[email protected] Cc: <[email protected]> Signed-off-by: Douglas Miller <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Aharon Landau <[email protected]> Date: Mon Apr 4 11:58:04 2022 +0300 RDMA/mlx5: Add a missing update of cache->last_add [ Upstream commit 1d735eeee63a0beb65180ca0224f239cc0c9f804 ] Update cache->last_add when returning an MR to the cache so that the cache work won't remove it. Fixes: b9358bdbc713 ("RDMA/mlx5: Fix locking in MR cache work queue") Link: https://lore.kernel.org/r/c99f076fce4b44829d434936bbcd3b5fc4c95020.1649062436.git.leonro@nvidia.com Signed-off-by: Aharon Landau <[email protected]> Reviewed-by: Shay Drory <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Aharon Landau <[email protected]> Date: Mon Apr 4 11:58:03 2022 +0300 RDMA/mlx5: Don't remove cache MRs when a delay is needed [ Upstream commit 84c2362fb65d69c721fec0974556378cbb36a62b ] Don't remove MRs from the cache if need to delay the removal. Fixes: b9358bdbc713 ("RDMA/mlx5: Fix locking in MR cache work queue") Link: https://lore.kernel.org/r/c3087a90ff362c8796c7eaa2715128743ce36722.1649062436.git.leonro@nvidia.com Signed-off-by: Aharon Landau <[email protected]> Reviewed-by: Shay Drory <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Axel Lin <[email protected]> Date: Sun Apr 3 21:22:35 2022 +0800 regulator: atc260x: Fix missing active_discharge_on setting [ Upstream commit 2316f0fc0ad2aa87a568ceaf3d76be983ee555c3 ] Without active_discharge_on setting, the SWITCH1 discharge enable control is always disabled. Fix it. Fixes: 3b15ccac161a ("regulator: Add regulator driver for ATC260x PMICs") Signed-off-by: Axel Lin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Axel Lin <[email protected]> Date: Mon Apr 4 10:25:14 2022 +0800 regulator: rtq2134: Fix missing active_discharge_on setting [ Upstream commit 17049bf9de55a42ee96fd34520aff8a484677675 ] The active_discharge_on setting was missed, so output discharge resistor is always disabled. Fix it. Fixes: 0555d41497de ("regulator: rtq2134: Add support for Richtek RTQ2134 SubPMIC") Signed-off-by: Axel Lin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Pali Rohár <[email protected]> Date: Fri Mar 18 15:14:41 2022 +0100 Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning" commit 7e2646ed47542123168d43916b84b954532e5386 upstream. This reverts commit bb32e1987bc55ce1db400faf47d85891da3c9b9f. Commit 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization") contains proper fix for the issue described in commit bb32e1987bc5 ("mmc: sdhci-xenon: fix annoying 1.8V regulator warning"). Fixes: 8d876bf472db ("mmc: sdhci-xenon: wait 5ms after set 1.8V signal enable") Cc: [email protected] # 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization") Signed-off-by: Pali Rohár <[email protected]> Reviewed-by: Marek Behún <[email protected]> Reviewed-by: Marcin Wojtas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jens Axboe <[email protected]> Date: Sat Apr 2 11:40:23 2022 -0600 Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()" commit 7198bfc2017644c6b92d2ecef9b8b8e0363bb5fd upstream. This reverts commit 6d35d04a9e18990040e87d2bbf72689252669d54. Both Gabriel and Borislav report that this commit casues a regression with nbd: sysfs: cannot create duplicate filename '/dev/block/43:0' Revert it before 5.18-rc1 and we'll investigage this separately in due time. Link: https://lore.kernel.org/all/[email protected]/ Reported-by: Gabriel L. Somlo <[email protected]> Reported-by: Borislav Petkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: dann frazier <[email protected]> Date: Thu Apr 7 14:16:42 2022 -0600 Revert "net/mlx5: Accept devlink user input after driver initialization complete" This reverts commit 9cc25e8529d567e08da98d11f092b21449763144 which is commit 64ea2d0e7263b67d8efc93fa1baace041ed36d1e upstream. This patch was shown to introduce a regression: # devlink dev param show pci/0000:24:00.0 name flow_steering_mode pci/0000:24:00.0: name flow_steering_mode type driver-specific values: (flow steering mode description is missing beneath "values:") # devlink dev param set pci/0000:24:00.0 name flow_steering_mode value smfs cmode runtime Segmentation fault (core dumped) and also with upstream iproute # ./iproute2/devlink/devlink dev param set pci/0000:24:00.0 name flow_steering_mode value smfs cmode runtime Configuration mode not supported Note: Instead of reverting, we could instead also backport commit cf530217408e ("devlink: Notify users when objects are accessible"). However, that makes changes to core devlink code that I'm not sure are suitable for a stable backport. Cc: Leon Romanovsky <[email protected]> Cc: David S. Miller <[email protected]> Cc: Sasha Levin <[email protected]> Signed-off-by: dann frazier <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: ChenXiaoSong <[email protected]> Date: Tue Mar 29 19:32:07 2022 +0800 Revert "NFSv4: Handle the special Linux file open access mode" [ Upstream commit ab0fc21bc7105b54bafd85bd8b82742f9e68898a ] This reverts commit 44942b4e457beda00981f616402a1a791e8c616e. After secondly opening a file with O_ACCMODE|O_DIRECT flags, nfs4_valid_open_stateid() will dereference NULL nfs4_state when lseek(). Reproducer: 1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/ 2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT) 3. close(fd) 4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT) 5. lseek(fd) Reported-by: Lyu Tao <[email protected]> Signed-off-by: ChenXiaoSong <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jakub Kicinski <[email protected]> Date: Mon Mar 28 14:29:04 2022 -0700 Revert "selftests: net: Add tls config dependency for tls selftests" commit 20695e9a9fd39103d1b0669470ae74030b7aa196 upstream. This reverts commit d9142e1cf3bbdaf21337767114ecab26fe702d47. The test is supposed to run cleanly with TLS is disabled, to test compatibility with TCP behavior. I can't repro the failure [1], the problem should be debugged rather than papered over. Link: https://lore.kernel.org/all/20220325161203.7000698c@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ [1] Fixes: d9142e1cf3bb ("selftests: net: Add tls config dependency for tls selftests") Signed-off-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Michael T. Kloos <[email protected]> Date: Mon Mar 7 20:03:21 2022 -0500 riscv: Fixed misaligned memory access. Fixed pointer comparison. [ Upstream commit 9d1f0ec9f71780e69ceb9d91697600c747d6e02e ] Rewrote the RISC-V memmove() assembly implementation. The previous implementation did not check memory alignment and it compared 2 pointers with a signed comparison. The misaligned memory access would cause the kernel to crash on systems that did not emulate it in firmware and did not support it in hardware. Firmware emulation is slow and may not exist. The RISC-V spec does not guarantee that support for misaligned memory accesses will exist. It should not be depended on. This patch now checks for XLEN granularity of co-alignment between the pointers. Failing that, copying is done by loading from the 2 contiguous and naturally aligned XLEN memory locations containing the overlapping XLEN sized data to be copied. The data is shifted into the correct place and binary or'ed together on each iteration. The result is then stored into the corresponding naturally aligned XLEN sized location in the destination. For unaligned data at the terminations of the regions to be copied or for copies less than (2 * XLEN) in size, byte copy is used. This patch also now uses unsigned comparison for the pointers and migrates to the newer assembler annotations from the now deprecated ones. Signed-off-by: Michael T. Kloos <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Mateusz Jończyk <[email protected]> Date: Fri Dec 10 21:01:25 2021 +0100 rtc: Check return value from mc146818_get_time() [ Upstream commit 0dd8d6cb9eddfe637bcd821bbfd40ebd5a0737b9 ] There are 4 users of mc146818_get_time() and none of them was checking the return value from this function. Change this. Print the appropriate warnings in callers of mc146818_get_time() instead of in the function mc146818_get_time() itself, in order not to add strings to rtc-mc146818-lib.c, which is kind of a library. The callers of alpha_rtc_read_time() and cmos_read_time() may use the contents of (struct rtc_time *) even when the functions return a failure code. Therefore, set the contents of (struct rtc_time *) to 0x00, which looks more sensible then 0xff and aligns with the (possibly stale?) comment in cmos_read_time: /* * If pm_trace abused the RTC for storage, set the timespec to 0, * which tells the caller that this RTC value is unusable. */ For consistency, do this in mc146818_get_time(). Note: hpet_rtc_interrupt() may call mc146818_get_time() many times a second. It is very unlikely, though, that the RTC suddenly stops working and mc146818_get_time() would consistently fail. Only compile-tested on alpha. Signed-off-by: Mateusz Jończyk <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Alessandro Zummo <[email protected]> Cc: Alexandre Belloni <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Mateusz Jończyk <[email protected]> Date: Fri Dec 10 21:01:24 2021 +0100 rtc: mc146818-lib: change return values of mc146818_get_time() [ Upstream commit d35786b3a28dee20b12962ae2dd365892a99ed1a ] No function is checking mc146818_get_time() return values yet, so correct them to make them more customary. Signed-off-by: Mateusz Jończyk <[email protected]> Cc: Alessandro Zummo <[email protected]> Cc: Alexandre Belloni <[email protected]> Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Mateusz Jończyk <[email protected]> Date: Fri Dec 10 21:01:26 2021 +0100 rtc: mc146818-lib: fix RTC presence check [ Upstream commit ea6fa4961aab8f90a8aa03575a98b4bda368d4b6 ] To prevent an infinite loop in mc146818_get_time(), commit 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") added a check for RTC availability. Together with a later fix, it checked if bit 6 in register 0x0d is cleared. This, however, caused a false negative on a motherboard with an AMD SB710 southbridge; according to the specification [1], bit 6 of register 0x0d of this chipset is a scratchbit. This caused a regression in Linux 5.11 - the RTC was determined broken by the kernel and not used by rtc-cmos.c [3]. This problem was also reported in Fedora [4]. As a better alternative, check whether the UIP ("Update-in-progress") bit is set for longer then 10ms. If that is the case, then apparently the RTC is either absent (and all register reads return 0xff) or broken. Also limit the number of loop iterations in mc146818_get_time() to 10 to prevent an infinite loop there. The functions mc146818_get_time() and mc146818_does_rtc_work() will be refactored later in this patch series, in order to fix a separate problem with reading / setting the RTC alarm time. This is done so to avoid a confusion about what is being fixed when. In a previous approach to this problem, I implemented a check whether the RTC_HOURS register contains a value <= 24. This, however, sometimes did not work correctly on my Intel Kaby Lake laptop. According to Intel's documentation [2], "the time and date RAM locations (0-9) are disconnected from the external bus" during the update cycle so reading this register without checking the UIP bit is incorrect. [1] AMD SB700/710/750 Register Reference Guide, page 308, https://developer.amd.com/wordpress/media/2012/10/43009_sb7xx_rrg_pub_1.00.pdf [2] 7th Generation Intel ® Processor Family I/O for U/Y Platforms [...] Datasheet Volume 1 of 2, page 209 Intel's Document Number: 334658-006, https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7th-and-8th-gen-core-family-mobile-u-y-processor-lines-i-o-datasheet-vol-1.pdf [3] Functions in arch/x86/kernel/rtc.c apparently were using it. [4] https://bugzilla.redhat.com/show_bug.cgi?id=1936688 Fixes: 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") Fixes: ebb22a059436 ("rtc: mc146818: Dont test for bit 0-5 in Register D") Signed-off-by: Mateusz Jończyk <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Alessandro Zummo <[email protected]> Cc: Alexandre Belloni <[email protected]> Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Dan Carpenter <[email protected]> Date: Tue Jan 11 10:19:22 2022 +0300 rtc: mc146818-lib: fix signedness bug in mc146818_get_time() commit 7372971c1be5b7d4fdd8ad237798bdc1d1d54162 upstream. The mc146818_get_time() function returns zero on success or negative a error code on failure. It needs to be type int. Fixes: d35786b3a28d ("rtc: mc146818-lib: change return values of mc146818_get_time()") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Mateusz Jończyk <[email protected]> Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/20220111071922.GE11243@kili Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jiasheng Jiang <[email protected]> Date: Thu Mar 3 16:50:30 2022 +0800 rtc: wm8350: Handle error for wm8350_register_irq [ Upstream commit 43f0269b6b89c1eec4ef83c48035608f4dcdd886 ] As the potential failure of the wm8350_register_irq(), it should be better to check it and return error if fails. Also, it need not free 'wm_rtc->rtc' since it will be freed automatically. Fixes: 077eaf5b40ec ("rtc: rtc-wm8350: add support for WM8350 RTC") Signed-off-by: Jiasheng Jiang <[email protected]> Acked-by: Charles Keepax <[email protected]> Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Eric Dumazet <[email protected]> Date: Mon Apr 4 11:34:39 2022 -0700 rxrpc: fix a race in rxrpc_exit_net() [ Upstream commit 1946014ca3b19be9e485e780e862c375c6f98bad ] Current code can lead to the following race: CPU0 CPU1 rxrpc_exit_net() rxrpc_peer_keepalive_worker() if (rxnet->live) rxnet->live = false; del_timer_sync(&rxnet->peer_keepalive_timer); timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay); cancel_work_sync(&rxnet->peer_keepalive_work); rxrpc_exit_net() exits while peer_keepalive_timer is still armed, leading to use-after-free. syzbot report was: ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0 WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0 R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __debug_check_no_obj_freed lib/debugobjects.c:992 [inline] debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023 kfree+0xd6/0x310 mm/slab.c:3809 ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176 ops_free_list net/core/net_namespace.c:174 [inline] cleanup_net+0x591/0xb00 net/core/net_namespace.c:598 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298 </TASK> Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Signed-off-by: Eric Dumazet <[email protected]> Cc: David Howells <[email protected]> Cc: Marc Dionne <[email protected]> Cc: [email protected] Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sebastian Andrzej Siewior <[email protected]> Date: Thu Mar 17 15:51:32 2022 +0100 sched: Teach the forced-newidle balancer about CPU affinity limitation. commit 386ef214c3c6ab111d05e1790e79475363abaa05 upstream. try_steal_cookie() looks at task_struct::cpus_mask to decide if the task could be moved to `this' CPU. It ignores that the task might be in a migration disabled section while not on the CPU. In this case the task must not be moved otherwise per-CPU assumption are broken. Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if the a task can be moved. Fixes: d2dfa17bc7de6 ("sched: Trivial forced-newidle balancer") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Randy Dunlap <[email protected]> Date: Tue Feb 22 16:06:23 2022 -0800 scsi: aha152x: Fix aha152x_setup() __setup handler return value [ Upstream commit cc8294ec4738d25e2bb2d71f7d82a9bf7f4a157b ] __setup() handlers should return 1 if the command line option is handled and 0 if not (or maybe never return 0; doing so just pollutes init's environment with strings that are not init arguments/parameters). Return 1 from aha152x_setup() to indicate that the boot option has been handled. Link: lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Cc: "Juergen E. Fischer" <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Reported-by: Igor Zhbanov <[email protected]> Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Yang Guang <[email protected]> Date: Thu Jan 27 08:03:46 2022 +0800 scsi: bfa: Replace snprintf() with sysfs_emit() [ Upstream commit 2245ea91fd3a04cafbe2f54911432a8657528c3b ] coccinelle report: ./drivers/scsi/bfa/bfad_attr.c:908:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:860:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:888:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:853:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:808:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:728:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:822:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:927:9-17: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:900:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:874:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:714:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:839:8-16: WARNING: use scnprintf or sprintf Use sysfs_emit() instead of scnprintf() or sprintf(). Link: https://lore.kernel.org/r/def83ff75faec64ba592b867a8499b1367bae303.1643181468.git.yang.guang5@zte.com.cn Reported-by: Zeal Robot <[email protected]> Signed-off-by: Yang Guang <[email protected]> Signed-off-by: David Yang <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: John Garry <[email protected]> Date: Wed Mar 16 17:44:30 2022 +0800 scsi: core: Fix sbitmap depth in scsi_realloc_sdev_budget_map() [ Upstream commit eaba83b5b8506bbc9ee7ca2f10aeab3fff3719e7 ] In commit edb854a3680b ("scsi: core: Reallocate device's budget map on queue depth change"), the sbitmap for the device budget map may be reallocated after the slave device depth is configured. When the sbitmap is reallocated we use the result from scsi_device_max_queue_depth() for the sbitmap size, but don't resize to match the actual device queue depth. Fix by resizing the sbitmap after reallocating the budget sbitmap. We do this instead of init'ing the sbitmap to the device queue depth as the user may want to change the queue depth later via sysfs or other. Link: https://lore.kernel.org/r/[email protected] Fixes: edb854a3680b ("scsi: core: Reallocate device's budget map on queue depth change") Tested-by: Damien Le Moal <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: John Garry <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Qi Liu <[email protected]> Date: Thu Feb 24 19:51:26 2022 +0800 scsi: hisi_sas: Free irq vectors in order for v3 HW [ Upstream commit 554fb72ee34f4732c7f694f56c3c6e67790352a0 ] If the driver probe fails to request the channel IRQ or fatal IRQ, the driver will free the IRQ vectors before freeing the IRQs in free_irq(), and this will cause a kernel BUG like this: ------------[ cut here ]------------ kernel BUG at drivers/pci/msi.c:369! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Call trace: free_msi_irqs+0x118/0x13c pci_disable_msi+0xfc/0x120 pci_free_irq_vectors+0x24/0x3c hisi_sas_v3_probe+0x360/0x9d0 [hisi_sas_v3_hw] local_pci_probe+0x44/0xb0 work_for_cpu_fn+0x20/0x34 process_one_work+0x1d0/0x340 worker_thread+0x2e0/0x460 kthread+0x180/0x190 ret_from_fork+0x10/0x20 ---[ end trace b88990335b610c11 ]--- So we use devm_add_action() to control the order in which we free the vectors. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Qi Liu <[email protected]> Signed-off-by: John Garry <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Xiang Chen <[email protected]> Date: Thu Feb 24 19:51:28 2022 +0800 scsi: hisi_sas: Limit users changing debugfs BIST count value [ Upstream commit 286ce4c65fbdf5eb9d4d5f4e4997c4e32bf1b073 ] Add a file operation for "cnt" file under bist directory, so users can only read "cnt" or clear "cnt" to zero, but cannot randomly modify. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Xiang Chen <[email protected]> Signed-off-by: Qi Liu <[email protected]> Signed-off-by: John Garry <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jianglei Nie <[email protected]> Date: Thu Mar 3 09:51:15 2022 +0800 scsi: libfc: Fix use after free in fc_exch_abts_resp() [ Upstream commit 271add11994ba1a334859069367e04d2be2ebdd4 ] fc_exch_release(ep) will decrease the ep's reference count. When the reference count reaches zero, it is freed. But ep is still used in the following code, which will lead to a use after free. Return after the fc_exch_release() call to avoid use after free. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Jianglei Nie <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sreekanth Reddy <[email protected]> Date: Thu Feb 10 15:28:16 2022 +0530 scsi: mpi3mr: Fix memory leaks [ Upstream commit d44b5fefb22e139408ae12b864da1ecb9ad9d1d2 ] Fix memory leaks related to operational reply queue's memory segments which are not getting freed while unloading the driver. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sreekanth Reddy <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sreekanth Reddy <[email protected]> Date: Thu Feb 10 15:28:14 2022 +0530 scsi: mpi3mr: Fix reporting of actual data transfer size [ Upstream commit 9992246127246a27cc7184f05cce6f62ac48f84e ] The driver is missing to set the residual size while completing an I/O. Ensure proper data transfer size is reported to the kernel on I/O completion based on the transfer length reported by the firmware. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sreekanth Reddy <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Damien Le Moal <[email protected]> Date: Tue Mar 22 14:57:02 2022 +0900 scsi: mpt3sas: Fix use after free in _scsih_expander_node_remove() commit 87d663d40801dffc99a5ad3b0188ad3e2b4d1557 upstream. The function mpt3sas_transport_port_remove() called in _scsih_expander_node_remove() frees the port field of the sas_expander structure, leading to the following use-after-free splat from KASAN when the ioc_info() call following that function is executed (e.g. when doing rmmod of the driver module): [ 3479.371167] ================================================================== [ 3479.378496] BUG: KASAN: use-after-free in _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.386936] Read of size 1 at addr ffff8881c037691c by task rmmod/1531 [ 3479.393524] [ 3479.395035] CPU: 18 PID: 1531 Comm: rmmod Not tainted 5.17.0-rc8+ #1436 [ 3479.401712] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.1 06/02/2021 [ 3479.409263] Call Trace: [ 3479.411743] <TASK> [ 3479.413875] dump_stack_lvl+0x45/0x59 [ 3479.417582] print_address_description.constprop.0+0x1f/0x120 [ 3479.423389] ? _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.429469] kasan_report.cold+0x83/0xdf [ 3479.433438] ? _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.439514] _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.445411] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 3479.452032] scsih_remove+0x525/0xc90 [mpt3sas] [ 3479.458212] ? mpt3sas_expander_remove+0x1d0/0x1d0 [mpt3sas] [ 3479.465529] ? down_write+0xde/0x150 [ 3479.470746] ? up_write+0x14d/0x460 [ 3479.475840] ? kernfs_find_ns+0x137/0x310 [ 3479.481438] pci_device_remove+0x65/0x110 [ 3479.487013] __device_release_driver+0x316/0x680 [ 3479.493180] driver_detach+0x1ec/0x2d0 [ 3479.498499] bus_remove_driver+0xe7/0x2d0 [ 3479.504081] pci_unregister_driver+0x26/0x250 [ 3479.510033] _mpt3sas_exit+0x2b/0x6cf [mpt3sas] [ 3479.516144] __x64_sys_delete_module+0x2fd/0x510 [ 3479.522315] ? free_module+0xaa0/0xaa0 [ 3479.527593] ? __cond_resched+0x1c/0x90 [ 3479.532951] ? lockdep_hardirqs_on_prepare+0x273/0x3e0 [ 3479.539607] ? syscall_enter_from_user_mode+0x21/0x70 [ 3479.546161] ? trace_hardirqs_on+0x1c/0x110 [ 3479.551828] do_syscall_64+0x35/0x80 [ 3479.556884] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3479.563402] RIP: 0033:0x7f1fc482483b ... [ 3479.943087] ================================================================== Fix this by introducing the local variable port_id to store the port ID value before executing mpt3sas_transport_port_remove(). This local variable is then used in the call to ioc_info() instead of dereferencing the freed port structure. Link: https://lore.kernel.org/r/[email protected] Fixes: 7d310f241001 ("scsi: mpt3sas: Get device objects using sas_address & portID") Cc: [email protected] Acked-by: Sreekanth Reddy <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yang Guang <[email protected]> Date: Thu Jan 27 08:00:59 2022 +0800 scsi: mvsas: Replace snprintf() with sysfs_emit() [ Upstream commit 0ad3867b0f13e45cfee5a1298bfd40eef096116c ] coccinelle report: ./drivers/scsi/mvsas/mv_init.c:699:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/mvsas/mv_init.c:747:8-16: WARNING: use scnprintf or sprintf Use sysfs_emit() instead of scnprintf() or sprintf(). Link: https://lore.kernel.org/r/c1711f7cf251730a8ceb5bdfc313bf85662b3395.1643182948.git.yang.guang5@zte.com.cn Reported-by: Zeal Robot <[email protected]> Signed-off-by: Yang Guang <[email protected]> Signed-off-by: David Yang <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Damien Le Moal <[email protected]> Date: Sun Feb 20 12:18:01 2022 +0900 scsi: pm8001: Fix memory leak in pm8001_chip_fw_flash_update_req() [ Upstream commit f792a3629f4c4aa4c3703d66b43ce1edcc3ec09a ] In pm8001_chip_fw_flash_update_build(), if pm8001_chip_fw_flash_update_build() fails, the struct fw_control_ex allocated must be freed. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jack Wang <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Damien Le Moal <[email protected]> Date: Sun Feb 20 12:17:57 2022 +0900 scsi: pm8001: Fix pm8001_mpi_task_abort_resp() [ Upstream commit 7e6b7e740addcea450041b5be8e42f0a4ceece0f ] The call to pm8001_ccb_task_free() at the end of pm8001_mpi_task_abort_resp() already frees the ccb tag. So when the device NCQ_ABORT_ALL_FLAG is set, the tag should not be freed again. Also change the hardcoded 0xBFFFFFFF value to ~NCQ_ABORT_ALL_FLAG as it ought to be. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jack Wang <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Damien Le Moal <[email protected]> Date: Sun Feb 20 12:17:44 2022 +0900 scsi: pm8001: Fix pm80xx_pci_mem_copy() interface [ Upstream commit 3762d8f6edcdb03994c919f9487fd6d336c06561 ] The declaration of the local variable destination1 in pm80xx_pci_mem_copy() as a pointer to a u32 results in the sparse warning: warning: incorrect type in assignment (different base types) expected unsigned int [usertype] got restricted __le32 [usertype] Furthermore, the destination" argument of pm80xx_pci_mem_copy() is wrongly declared with the const attribute. Fix both problems by changing the type of the "destination" argument to "__le32 *" and use this argument directly inside the pm80xx_pci_mem_copy() function, thus removing the need for the destination1 local variable. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jack Wang <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Damien Le Moal <[email protected]> Date: Sun Feb 20 12:18:00 2022 +0900 scsi: pm8001: Fix tag leaks on error [ Upstream commit 4c8f04b1905cd4b776d0b720463c091545478ef7 ] In pm8001_chip_set_dev_state_req(), pm8001_chip_fw_flash_update_req(), pm80xx_chip_phy_ctl_req() and pm8001_chip_reg_dev_req() add missing calls to pm8001_tag_free() to free the allocated tag when pm8001_mpi_build_cmd() fails. Similarly, in pm8001_exec_internal_task_abort(), if the chip ->task_abort method fails, the tag allocated for the abort request task must be freed. Add the missing call to pm8001_tag_free(). Link: https://lore.kernel.org/r/[email protected] Reviewed-by: John Garry <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Damien Le Moal <[email protected]> Date: Sun Feb 20 12:17:59 2022 +0900 scsi: pm8001: Fix task leak in pm8001_send_abort_all() [ Upstream commit f90a74892f3acf0cdec5844e90fc8686ca13e7d7 ] In pm8001_send_abort_all(), make sure to free the allocated sas task if pm8001_tag_alloc() or pm8001_mpi_build_cmd() fail. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: John Garry <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Mahesh Rajashekhara <[email protected]> Date: Tue Feb 1 15:48:43 2022 -0600 scsi: smartpqi: Fix kdump issue when controller is locked up [ Upstream commit 3ada501d602abf02353445c03bb3258146445d90 ] Avoid dropping into shell if the controller is in locked up state. Driver issues SIS soft reset to bring back the controller to SIS mode while OS boots into kdump mode. If the controller is in lockup state, SIS soft reset does not work. Since the controller lockup code has not been cleared, driver considers the firmware is no longer up and running. Driver returns back an error code to OS and the kdump fails. Link: https://lore.kernel.org/r/164375212337.440833.11955356190354940369.stgit@brunhilda.pdev.net Reviewed-by: Kevin Barnett <[email protected]> Reviewed-by: Scott Benesh <[email protected]> Reviewed-by: Scott Teel <[email protected]> Signed-off-by: Mahesh Rajashekhara <[email protected]> Signed-off-by: Don Brace <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Kevin Groeneveld <[email protected]> Date: Tue Mar 22 20:22:42 2022 -0400 scsi: sr: Fix typo in CDROM(CLOSETRAY|EJECT) handling [ Upstream commit bc5519c18a32ce855bb51b9f5eceb77a9489d080 ] Commit 2e27f576abc6 ("scsi: scsi_ioctl: Call scsi_cmd_ioctl() from scsi_ioctl()") seems to have a typo as it is checking ret instead of cmd in the if statement checking for CDROMCLOSETRAY and CDROMEJECT. This changes the behaviour of these ioctls as the cdrom_ioctl handling of these is more restrictive than the scsi_ioctl version. Link: https://lore.kernel.org/r/[email protected] Fixes: 2e27f576abc6 ("scsi: scsi_ioctl: Call scsi_cmd_ioctl() from scsi_ioctl()") Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Kevin Groeneveld <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Adrian Hunter <[email protected]> Date: Mon Apr 4 08:50:38 2022 +0300 scsi: ufs: ufs-pci: Add support for Intel MTL commit 4049f7acef3eb37c1ea0df45f3ffc29404f4e708 upstream. Add PCI ID and callbacks to support Intel Meteor Lake (MTL). Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] # v5.15+ Reviewed-by: Avri Altman <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Xiaomeng Tong <[email protected]> Date: Sun Mar 20 23:07:33 2022 +0800 scsi: ufs: ufshpb: Fix a NULL check on list iterator [ Upstream commit bfb7789bcbd901caead43861461bc8f334c90d3b ] The list iterator is always non-NULL so the check 'if (!rgn)' is always false and the dev_err() is never called. Move the check outside the loop and determine if 'victim_rgn' is NULL, to fix this bug. Link: https://lore.kernel.org/r/[email protected] Fixes: 4b5f49079c52 ("scsi: ufs: ufshpb: L2P map management for HPB read") Reviewed-by: Daejun Park <[email protected]> Signed-off-by: Xiaomeng Tong <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christophe JAILLET <[email protected]> Date: Sat Mar 19 08:01:24 2022 +0100 scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one() [ Upstream commit 16ed828b872d12ccba8f07bcc446ae89ba662f9c ] The error handling path of the probe releases a resource that is not freed in the remove function. In some cases, a ioremap() must be undone. Add the missing iounmap() call in the remove function. Link: https://lore.kernel.org/r/247066a3104d25f9a05de8b3270fc3c848763bcc.1647673264.git.christophe.jaillet@wanadoo.fr Fixes: 45804fbb00ee ("[SCSI] 53c700: Amiga Zorro NCR53c710 SCSI") Reviewed-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jamie Bainbridge <[email protected]> Date: Mon Apr 4 09:47:48 2022 +1000 sctp: count singleton chunks in assoc user stats [ Upstream commit e3d37210df5c41c51147a2d5d465de1a4d77be7a ] Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN- COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks" counter available to the assoc owner. These are all control chunks so they should be counted as such. Add counting of singleton chunks so they are properly accounted for. Fixes: 196d67593439 ("sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call") Signed-off-by: Jamie Bainbridge <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Link: https://lore.kernel.org/r/c9ba8785789880cf07923b8a5051e174442ea9ee.1649029663.git.jamie.bainbridge@gmail.com Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jakub Sitnicki <[email protected]> Date: Sat Mar 19 19:33:55 2022 +0100 selftests/bpf: Fix u8 narrow load checks for bpf_sk_lookup remote_port commit 3c69611b8926f8e74fcf76bd97ae0e5dafbeb26a upstream. In commit 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide") ->remote_port field changed from __u32 to __be16. However, narrow load tests which exercise 1-byte sized loads from offsetof(struct bpf_sk_lookup, remote_port) were not adopted to reflect the change. As a result, on little-endian we continue testing loads from addresses: - (__u8 *)&ctx->remote_port + 3 - (__u8 *)&ctx->remote_port + 4 which map to the zero padding following the remote_port field, and don't break the tests because there is no observable change. While on big-endian, we observe breakage because tests expect to see zeros for values loaded from: - (__u8 *)&ctx->remote_port - 1 - (__u8 *)&ctx->remote_port - 2 Above addresses map to ->remote_ip6 field, which precedes ->remote_port, and are populated during the bpf_sk_lookup IPv6 tests. Unsurprisingly, on s390x we observe: #136/38 sk_lookup/narrow access to ctx v4:OK #136/39 sk_lookup/narrow access to ctx v6:FAIL Fix it by removing the checks for 1-byte loads from offsets outside of the ->remote_port field. Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide") Suggested-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Jakub Sitnicki <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Tejun Heo <[email protected]> Date: Thu Jan 6 11:02:29 2022 -1000 selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644 commit b09c2baa56347ae65795350dfcc633dedb1c2970 upstream. 0644 is an odd perm to create a cgroup which is a directory. Use the regular 0755 instead. This is necessary for euid switching test case. Reviewed-by: Michal Koutný <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Ovidiu Panait <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Tejun Heo <[email protected]> Date: Thu Jan 6 11:02:29 2022 -1000 selftests: cgroup: Test open-time cgroup namespace usage for migration checks commit bf35a7879f1dfb0d050fe779168bcf25c7de66f5 upstream. When a task is writing to an fd opened by a different task, the perm check should use the cgroup namespace of the latter task. Add a test for it. Tested-by: Michal Koutný <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Ovidiu Panait <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Tejun Heo <[email protected]> Date: Thu Jan 6 11:02:29 2022 -1000 selftests: cgroup: Test open-time credential usage for migration checks commit 613e040e4dc285367bff0f8f75ea59839bc10947 upstream. When a task is writing to an fd opened by a different task, the perm check should use the credentials of the latter task. Add a test for it. Tested-by: Michal Koutný <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Ovidiu Panait <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Naresh Kamboju <[email protected]> Date: Mon Mar 28 19:16:50 2022 +0530 selftests: net: Add tls config dependency for tls selftests [ Upstream commit d9142e1cf3bbdaf21337767114ecab26fe702d47 ] selftest net tls test cases need TLS=m without this the test hangs. Enabling config TLS solves this problem and runs to complete. - CONFIG_TLS=m Reported-by: Linux Kernel Functional Testing <[email protected]> Signed-off-by: Naresh Kamboju <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jiri Slaby <[email protected]> Date: Tue Mar 8 12:51:53 2022 +0100 serial: samsung_tty: do not unlock port->lock for uart_write_wakeup() [ Upstream commit 988c7c00691008ea1daaa1235680a0da49dab4e8 ] The commit c15c3747ee32 (serial: samsung: fix potential soft lockup during uart write) added an unlock of port->lock before uart_write_wakeup() and a lock after it. It was always problematic to write data from tty_ldisc_ops::write_wakeup and it was even documented that way. We fixed the line disciplines to conform to this recently. So if there is still a missed one, we should fix them instead of this workaround. On the top of that, s3c24xx_serial_tx_dma_complete() in this driver still holds the port->lock while calling uart_write_wakeup(). So revert the wrap added by the commit above. Cc: Thomas Abraham <[email protected]> Cc: Kyungmin Park <[email protected]> Cc: Hyeonkook Kim <[email protected]> Signed-off-by: Jiri Slaby <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Martin Habets <[email protected]> Date: Mon Apr 4 11:48:51 2022 +0100 sfc: Do not free an empty page_ring [ Upstream commit 458f5d92df4807e2a7c803ed928369129996bf96 ] When the page_ring is not used page_ptr_mask is 0. Do not dereference page_ring[0] in this case. Fixes: 2768935a4660 ("sfc: reuse pages to avoid DMA mapping/unmapping costs") Reported-by: Taehee Yoo <[email protected]> Signed-off-by: Martin Habets <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jean-Philippe Brucker <[email protected]> Date: Thu Mar 31 11:24:41 2022 +0100 skbuff: fix coalescing for page_pool fragment recycling [ Upstream commit 1effe8ca4e34c34cdd9318436a4232dcb582ebf4 ] Fix a use-after-free when using page_pool with page fragments. We encountered this problem during normal RX in the hns3 driver: (1) Initially we have three descriptors in the RX queue. The first one allocates PAGE1 through page_pool, and the other two allocate one half of PAGE2 each. Page references look like this: RX_BD1 _______ PAGE1 RX_BD2 _______ PAGE2 RX_BD3 _________/ (2) Handle RX on the first descriptor. Allocate SKB1, eventually added to the receive queue by tcp_queue_rcv(). (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to netif_receive_skb(): netif_receive_skb(SKB2) ip_rcv(SKB2) SKB3 = skb_clone(SKB2) SKB2 and SKB3 share a reference to PAGE2 through skb_shinfo()->dataref. The other ref to PAGE2 is still held by RX_BD3: SKB2 ---+- PAGE2 SKB3 __/ / RX_BD3 _________/ (3b) Now while handling TCP, coalesce SKB3 with SKB1: tcp_v4_rcv(SKB3) tcp_try_coalesce(to=SKB1, from=SKB3) // succeeds kfree_skb_partial(SKB3) skb_release_data(SKB3) // drops one dataref SKB1 _____ PAGE1 \____ SKB2 _____ PAGE2 / RX_BD3 _________/ In skb_try_coalesce(), __skb_frag_ref() takes a page reference to PAGE2, where it should instead have increased the page_pool frag reference, pp_frag_count. Without coalescing, when releasing both SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now when releasing SKB1 and SKB2, two references to PAGE2 will be dropped, resulting in underflow. (3c) Drop SKB2: af_packet_rcv(SKB2) consume_skb(SKB2) skb_release_data(SKB2) // drops second dataref page_pool_return_skb_page(PAGE2) // drops one pp_frag_count SKB1 _____ PAGE1 \____ PAGE2 / RX_BD3 _________/ (4) Userspace calls recvmsg() Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we release the SKB3 page as well: tcp_eat_recv_skb(SKB1) skb_release_data(SKB1) page_pool_return_skb_page(PAGE1) page_pool_return_skb_page(PAGE2) // drops second pp_frag_count (5) PAGE2 is freed, but the third RX descriptor was still using it! In our case this causes IOMMU faults, but it would silently corrupt memory if the IOMMU was disabled. Change the logic that checks whether pp_recycle SKBs can be coalesced. We still reject differing pp_recycle between 'from' and 'to' SKBs, but in order to avoid the situation described above, we also reject coalescing when both 'from' and 'to' are pp_recycled and 'from' is cloned. The new logic allows coalescing a cloned pp_recycle SKB into a page refcounted one, because in this case the release (4) will drop the right reference, the one taken by skb_try_coalesce(). Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool") Suggested-by: Alexander Duyck <[email protected]> Signed-off-by: Jean-Philippe Brucker <[email protected]> Reviewed-by: Yunsheng Lin <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Kamal Dasu <[email protected]> Date: Mon Mar 28 10:24:42 2022 -0400 spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op() [ Upstream commit 2c7d1b281286c46049cd22b43435cecba560edde ] This fixes case where MSPI controller is used to access spi-nor flash and BSPI block is not present. Fixes: 5f195ee7d830 ("spi: bcm-qspi: Implement the spi_mem interface") Signed-off-by: Kamal Dasu <[email protected]> Acked-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Vinod Koul <[email protected]> Date: Wed Apr 6 18:52:38 2022 +0530 spi: core: add dma_map_dev for __spi_unmap_msg() commit 409543cec01a84610029d6440c480c3fdd7214fb upstream. Commit b470e10eb43f ("spi: core: add dma_map_dev for dma device") added dma_map_dev for _spi_map_msg() but missed to add for unmap routine, __spi_unmap_msg(), so add it now. Fixes: b470e10eb43f ("spi: core: add dma_map_dev for dma device") Cc: [email protected] # v5.14+ Signed-off-by: Vinod Koul <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Marco Elver <[email protected]> Date: Fri Nov 5 13:45:25 2021 -0700 stacktrace: move filter_irq_stacks() to kernel/stacktrace.c commit f39f21b3ddc7fc0f87eb6dc75ddc81b5bbfb7672 upstream. filter_irq_stacks() has little to do with the stackdepot implementation, except that it is usually used by users (such as KASAN) of stackdepot to reduce the stack trace. However, filter_irq_stacks() itself is not useful without a stack trace as obtained by stack_trace_save() and friends. Therefore, move filter_irq_stacks() to kernel/stacktrace.c, so that new users of filter_irq_stacks() do not have to start depending on STACKDEPOT only for filter_irq_stacks(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Acked-by: Dmitry Vyukov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Jann Horn <[email protected]> Cc: Aleksandr Nogikh <[email protected]> Cc: Taras Madan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Stefan Wahren <[email protected]> Date: Sun Jan 23 21:02:21 2022 +0100 staging: vchiq_arm: Avoid NULL ptr deref in vchiq_dump_platform_instances [ Upstream commit aa899e686d442c63d50f4d369cc02dbbf0941cb0 ] vchiq_get_state() can return a NULL pointer. So handle this cases and avoid a NULL pointer derefence in vchiq_dump_platform_instances. Reviewed-by: Nicolas Saenz Julienne <[email protected]> Signed-off-by: Stefan Wahren <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Stefan Wahren <[email protected]> Date: Sun Jan 23 21:02:22 2022 +0100 staging: vchiq_core: handle NULL result of find_service_by_handle [ Upstream commit ca225857faf237234d2fffe5d1919467dfadd822 ] In case of an invalid handle the function find_servive_by_handle returns NULL. So take care of this and avoid a NULL pointer dereference. Reviewed-by: Nicolas Saenz Julienne <[email protected]> Signed-off-by: Stefan Wahren <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Xiaoke Wang <[email protected]> Date: Fri Feb 18 21:59:45 2022 +0800 staging: wfx: fix an error handling in wfx_init_common() [ Upstream commit 60f1d3c92dc1ef1026e5b917a329a7fa947da036 ] One error handler of wfx_init_common() return without calling ieee80211_free_hw(hw), which may result in memory leak. And I add one err label to unify the error handler, which is useful for the subsequent changes. Suggested-by: Jérôme Pouiller <[email protected]> Reviewed-by: Dan Carpenter <[email protected]> Reviewed-by: Jérôme Pouiller <[email protected]> Signed-off-by: Xiaoke Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christophe Leroy <[email protected]> Date: Mon Mar 14 12:49:36 2022 +0100 static_call: Don't make __static_call_return0 static commit 8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b upstream. System.map shows that vmlinux contains several instances of __static_call_return0(): c0004fc0 t __static_call_return0 c0011518 t __static_call_return0 c00d8160 t __static_call_return0 arch_static_call_transform() uses the middle one to check whether we are setting a call to __static_call_return0 or not: c0011520 <arch_static_call_transform>: c0011520: 3d 20 c0 01 lis r9,-16383 <== r9 = 0xc001 << 16 c0011524: 39 29 15 18 addi r9,r9,5400 <== r9 += 0x1518 c0011528: 7c 05 48 00 cmpw r5,r9 <== r9 has value 0xc0011518 here So if static_call_update() is called with one of the other instances of __static_call_return0(), arch_static_call_transform() won't recognise it. In order to work properly, global single instance of __static_call_return0() is required. Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Link: https://lkml.kernel.org/r/30821468a0e7d28251954b578e5051dc09300d04.1647258493.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: NeilBrown <[email protected]> Date: Mon Mar 7 10:41:44 2022 +1100 SUNRPC/call_alloc: async tasks mustn't block waiting for memory [ Upstream commit c487216bec83b0c5a8803e5c61433d33ad7b104d ] When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: NeilBrown <[email protected]> Date: Mon Mar 7 10:41:44 2022 +1100 SUNRPC/xprt: async tasks mustn't block waiting for memory [ Upstream commit a721035477fb5fb8abc738fbe410b07c12af3dc5 ] When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY. The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Trond Myklebust <[email protected]> Date: Wed Mar 16 19:10:43 2022 -0400 SUNRPC: Don't call connect() more than once on a TCP socket commit 89f42494f92f448747bd8a7ab1ae8b5d5520577d upstream. Avoid socket state races due to repeated calls to ->connect() using the same socket. If connect() returns 0 due to the connection having completed, but we are in fact in a closing state, then we may leave the XPRT_CONNECTING flag set on the transport. Reported-by: Enrico Scholz <[email protected]> Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Trond Myklebust <[email protected]> Date: Mon Mar 14 21:02:10 2022 -0400 SUNRPC: Fix socket waits for write buffer space [ Upstream commit 7496b59f588dd52886fdbac7633608097543a0a5 ] The socket layer requires that we use the socket lock to protect changes to the sock->sk_write_pending field and others. Reported-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Trond Myklebust <[email protected]> Date: Wed Apr 6 23:18:57 2022 -0400 SUNRPC: Handle ENOMEM in call_transmit_status() [ Upstream commit d3c15033b240767d0287f1c4a529cbbe2d5ded8a ] Both call_transmit() and call_bc_transmit() can now return ENOMEM, so let's make sure that we handle the errors gracefully. Fixes: 0472e4766049 ("SUNRPC: Convert socket page send code to use iov_iter()") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Trond Myklebust <[email protected]> Date: Thu Apr 7 09:50:19 2022 -0400 SUNRPC: Handle low memory situations in call_status() [ Upstream commit 9d82819d5b065348ce623f196bf601028e22ed00 ] We need to handle ENFILE, ENOBUFS, and ENOMEM, because xprt_wake_pending_tasks() can be called with any one of these due to socket creation failures. Fixes: b61d59fffd3e ("SUNRPC:Â xs_tcp_connect_worker{4,6}: merge common code") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Trond Myklebust <[email protected]> Date: Tue Oct 26 18:01:07 2021 -0400 SUNRPC: Prevent immediate close+reconnect commit 3be232f11a3cc9b0ef0795e39fa11bdb8e422a06 upstream. If we have already set up the socket and are waiting for it to connect, then don't immediately close and retry. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: NeilBrown <[email protected]> Date: Mon Mar 7 10:41:44 2022 +1100 SUNRPC: remove scheduling boost for "SWAPPER" tasks. [ Upstream commit a80a8461868905823609be97f91776a26befe839 ] Currently, tasks marked as "swapper" tasks get put to the front of non-priority rpc_queues, and are sorted earlier than non-swapper tasks on the transport's ->xmit_queue. This is pointless as currently *all* tasks for a mount that has swap enabled on *any* file are marked as "swapper" tasks. So the net result is that the non-priority rpc_queues are reverse-ordered (LIFO). This scheduling boost is not necessary to avoid deadlocks, and hurts fairness, so remove it. If there were a need to expedite some requests, the tk_priority mechanism is a more appropriate tool. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Trond Myklebust <[email protected]> Date: Thu Apr 7 14:10:23 2022 -0400 SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec() [ Upstream commit b056fa070814897be32d83b079dbc311375588e7 ] The allocation is done with GFP_KERNEL, but it could still fail in a low memory situation. Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sebastian Andrzej Siewior <[email protected]> Date: Wed Feb 9 19:56:57 2022 +0100 tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH. [ Upstream commit 4f9bf2a2f5aacf988e6d5e56b961ba45c5a25248 ] Commit 9652dc2eb9e40 ("tcp: relax listening_hash operations") removed the need to disable bottom half while acquiring listening_hash.lock. There are still two callers left which disable bottom half before the lock is acquired. On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts as a lock to ensure that resources, that are protected by disabling bottom halves, remain protected. This leads to a circular locking dependency if the lock acquired with disabled bottom halves is also acquired with enabled bottom halves followed by disabling bottom halves. This is the reverse locking order. It has been observed with inet_listen_hashbucket::lock: local_bh_disable() + spin_lock(&ilb->lock): inet_listen() inet_csk_listen_start() sk->sk_prot->hash() := inet_hash() local_bh_disable() __inet_hash() spin_lock(&ilb->lock); acquire(&ilb->lock); Reverse order: spin_lock(&ilb2->lock) + local_bh_disable(): tcp_seq_next() listening_get_next() spin_lock(&ilb2->lock); acquire(&ilb2->lock); tcp4_seq_show() get_tcp4_sock() sock_i_ino() read_lock_bh(&sk->sk_callback_lock); acquire(softirq_ctrl) // <---- whoops acquire(&sk->sk_callback_lock) Drop local_bh_disable() around __inet_hash() which acquires listening_hash->lock. Split inet_unhash() and acquire the listen_hashbucket lock without disabling bottom halves; the inet_ehash lock with disabled bottom halves. Reported-by: Mike Galbraith <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Arnaldo Carvalho de Melo <[email protected]> Date: Tue Apr 5 10:33:21 2022 -0300 tools build: Filter out options and warnings not supported by clang commit 41caff459a5b956b3e23ba9ca759dd0629ad3dda upstream. These make the feature check fail when using clang, so remove them just like is done in tools/perf/Makefile.config to build perf itself. Adding -Wno-compound-token-split-by-macro to tools/perf/Makefile.config when building with clang is also necessary to avoid these warnings turned into errors (-Werror): CC /tmp/build/perf/util/scripting-engines/trace-event-perl.o In file included from util/scripting-engines/trace-event-perl.c:35: In file included from /usr/lib64/perl5/CORE/perl.h:4085: In file included from /usr/lib64/perl5/CORE/hv.h:659: In file included from /usr/lib64/perl5/CORE/hv_func.h:34: In file included from /usr/lib64/perl5/CORE/sbox32_hash.h:4: /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '(' and '{' tokens introducing statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro] ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:80:38: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \ ^~~~~~~~~~ /usr/lib64/perl5/CORE/perl.h:737:29: note: expanded from macro 'STMT_START' # define STMT_START (void)( /* gcc supports "({ STATEMENTS; })" */ ^ /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: '{' token is here ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:80:49: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \ ^ /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '}' and ')' tokens terminating statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro] ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:87:41: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' v ^= (v>>23); \ ^ /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: ')' token is here ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:88:3: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' } STMT_END ^~~~~~~~ /usr/lib64/perl5/CORE/perl.h:738:21: note: expanded from macro 'STMT_END' # define STMT_END ) ^ Please refer to the discussion on the Link: tag below, where Nathan clarifies the situation: <quote> acme> And then get to the problems at the end of this message, which seem acme> similar to the problem described here: acme> acme> From Nathan Chancellor <> acme> Subject [PATCH] mwifiex: Remove unnecessary braces from HostCmd_SET_SEQ_NO_BSS_INFO acme> acme> https://lkml.org/lkml/2020/9/1/135 acme> acme> So perhaps in this case its better to disable that acme> -Werror,-Wcompound-token-split-by-macro when building with clang? Yes, I think that is probably the best solution. As far as I can tell, at least in this file and context, the warning appears harmless, as the "create a GNU C statement expression from two different macros" is very much intentional, based on the presence of PERL_USE_GCC_BRACE_GROUPS. The warning is fixed in upstream Perl by just avoiding creating GNU C statement expressions using STMT_START and STMT_END: https://github.com/Perl/perl5/issues/18780 https://github.com/Perl/perl5/pull/18984 If I am reading the source code correctly, an alternative to disabling the warning would be specifying -DPERL_GCC_BRACE_GROUPS_FORBIDDEN but it seems like that might end up impacting more than just this site, according to the issue discussion above. </quote> Based-on-a-patch-by: Sedat Dilek <[email protected]> Tested-by: Sedat Dilek <[email protected]> # Debian/Selfmade LLVM-14 (x86-64) Cc: Adrian Hunter <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Keeping <[email protected]> Cc: Leo Yan <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Arnaldo Carvalho de Melo <[email protected]> Date: Mon Apr 4 17:28:48 2022 -0300 tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts commit 541f695cbcb6932c22638b06e0cbe1d56177e2e9 upstream. Just like its done for ldopts and for both in tools/perf/Makefile.config. Using `` to initialize PERL_EMBED_CCOPTS somehow precludes using: $(filter-out SOMETHING_TO_FILTER,$(PERL_EMBED_CCOPTS)) And we need to do it to allow for building with versions of clang where some gcc options selected by distros are not available. Tested-by: Sedat Dilek <[email protected]> # Debian/Selfmade LLVM-14 (x86-64) Cc: Adrian Hunter <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Keeping <[email protected]> Cc: Leo Yan <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Harold Huang <[email protected]> Date: Thu Mar 3 10:24:40 2022 +0800 tuntap: add sanity checks about msg_controllen in sendmsg [ Upstream commit 74a335a07a17d131b9263bfdbdcb5e40673ca9ca ] In patch [1], tun_msg_ctl was added to allow pass batched xdp buffers to tun_sendmsg. Although we donot use msg_controllen in this path, we should check msg_controllen to make sure the caller pass a valid msg_ctl. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe8dd45bb7556246c6b76277b1ba4296c91c2505 Reported-by: Eric Dumazet <[email protected]> Suggested-by: Jason Wang <[email protected]> Signed-off-by: Harold Huang <[email protected]> Acked-by: Jason Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Kees Cook <[email protected]> Date: Wed Jan 19 18:10:35 2022 -0800 ubsan: remove CONFIG_UBSAN_OBJECT_SIZE commit 69d0db01e210e07fe915e5da91b54a867cda040f upstream. The object-size sanitizer is redundant to -Warray-bounds, and inappropriately performs its checks at run-time when all information needed for the evaluation is available at compile-time, making it quite difficult to use: https://bugzilla.kernel.org/show_bug.cgi?id=214861 With -Warray-bounds almost enabled globally, it doesn't make sense to keep this around. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Michal Marek <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Cc: Tadeusz Struk <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Benjamin Beichler <[email protected]> Date: Tue Jan 11 20:05:06 2022 +0000 um: fix and optimize xor select template for CONFIG64 and timetravel mode [ Upstream commit e3a33af812c611d99756e2ec61e9d7068d466bdf ] Due to dropped inclusion of asm-generic/xor.h, xor_block_8regs symbol is missing with CONFIG64 and break compilation, as the asm/xor_64.h also did not include it. The patch recreate the logic from arch/x86, which check whether AVX is available and add fallbacks for 32bit and 64bit config of um. A very minor additional "fix" is, the return of the macro parameter instead of NULL, as this is the original intent of the macro, but this does not change the actual behavior. Fixes: c0ecca6604b8 ("um: enable the use of optimized xor routines in UML") Signed-off-by: Benjamin Beichler <[email protected]> Acked-By: Anton Ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Pawel Laszczak <[email protected]> Date: Wed Jan 12 06:32:37 2022 +0100 usb: cdnsp: fix cdnsp_decode_trb function to properly handle ret value [ Upstream commit 03db9289b5ab59437e42a111a34545a7cedb5190 ] Variable ret in function cdnsp_decode_trb is initialized but not used. To fix this compiler warning patch adds checking whether the data buffer has not been overflowed. Reported-by: kernel test robot <[email protected]> Signed-off-by: Pawel Laszczak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: H. Nikolaus Schaller <[email protected]> Date: Tue Mar 8 14:03:37 2022 +0100 usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm [ Upstream commit ac01df343e5a6c6bcead2ed421af1fde30f73e7e ] Usually, the vbus_regulator (smps10 on omap5evm) boots up disabled. Hence calling regulator_disable() indirectly through dwc3_omap_set_mailbox() during probe leads to: [ 10.332764] WARNING: CPU: 0 PID: 1628 at drivers/regulator/core.c:2853 _regulator_disable+0x40/0x164 [ 10.351919] unbalanced disables for smps10_out1 [ 10.361298] Modules linked in: dwc3_omap(+) clk_twl6040 at24 gpio_twl6040 palmas_gpadc palmas_pwrbutton industrialio snd_soc_omap_mcbsp(+) snd_soc_ti_sdma display_connector ti_tpd12s015 drm leds_gpio drm_panel_orientation_quirks ip_tables x_tables ipv6 autofs4 [ 10.387818] CPU: 0 PID: 1628 Comm: systemd-udevd Not tainted 5.17.0-rc1-letux-lpae+ #8139 [ 10.405129] Hardware name: Generic OMAP5 (Flattened Device Tree) [ 10.411455] unwind_backtrace from show_stack+0x10/0x14 [ 10.416970] show_stack from dump_stack_lvl+0x40/0x4c [ 10.422313] dump_stack_lvl from __warn+0xb8/0x170 [ 10.427377] __warn from warn_slowpath_fmt+0x70/0x9c [ 10.432595] warn_slowpath_fmt from _regulator_disable+0x40/0x164 [ 10.439037] _regulator_disable from regulator_disable+0x30/0x64 [ 10.445382] regulator_disable from dwc3_omap_set_mailbox+0x8c/0xf0 [dwc3_omap] [ 10.453116] dwc3_omap_set_mailbox [dwc3_omap] from dwc3_omap_probe+0x2b8/0x394 [dwc3_omap] [ 10.467021] dwc3_omap_probe [dwc3_omap] from platform_probe+0x58/0xa8 [ 10.481762] platform_probe from really_probe+0x168/0x2fc [ 10.481782] really_probe from __driver_probe_device+0xc4/0xd8 [ 10.481782] __driver_probe_device from driver_probe_device+0x24/0xa4 [ 10.503762] driver_probe_device from __driver_attach+0xc4/0xd8 [ 10.510018] __driver_attach from bus_for_each_dev+0x64/0xa0 [ 10.516001] bus_for_each_dev from bus_add_driver+0x148/0x1a4 [ 10.524880] bus_add_driver from driver_register+0xb4/0xf8 [ 10.530678] driver_register from do_one_initcall+0x90/0x1c4 [ 10.536661] do_one_initcall from do_init_module+0x4c/0x200 [ 10.536683] do_init_module from load_module+0x13dc/0x1910 [ 10.551159] load_module from sys_finit_module+0xc8/0xd8 [ 10.561319] sys_finit_module from __sys_trace_return+0x0/0x18 [ 10.561336] Exception stack(0xc344bfa8 to 0xc344bff0) [ 10.561341] bfa0: b6fb5778 b6fab8d8 00000007 b6ecfbb8 00000000 b6ed0398 [ 10.561341] bfc0: b6fb5778 b6fab8d8 855c0500 0000017b 00020000 b6f9a3cc 00000000 b6fb5778 [ 10.595500] bfe0: bede18f8 bede18e8 b6ec9aeb b6dda1c2 [ 10.601345] ---[ end trace 0000000000000000 ]--- Fix this unnecessary warning by checking if the regulator is enabled. Signed-off-by: H. Nikolaus Schaller <[email protected]> Link: https://lore.kernel.org/r/af3b750dc2265d875deaabcf5f80098c9645da45.1646744616.git.hns@goldelico.com Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hans de Goede <[email protected]> Date: Sun Feb 13 14:05:18 2022 +0100 usb: dwc3: pci: Set the swnode from inside dwc3_pci_quirks() [ Upstream commit e285cb403994419e997749c9a52b9370884ae0c8 ] The quirk handling may need to set some different properties which means using a different swnode, move the setting of the swnode to inside dwc3_pci_quirks() so that the quirk handling can choose a different swnode. Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Neal Liu <[email protected]> Date: Tue Feb 8 18:16:57 2022 +0800 usb: ehci: add pci device support for Aspeed platforms [ Upstream commit c3c9cee592828528fd228b01d312c7526c584a42 ] Enable Aspeed quirks in commit 7f2d73788d90 ("usb: ehci: handshake CMD_RUN instead of STS_HALT") to support Aspeed ehci-pci device. Acked-by: Alan Stern <[email protected]> Signed-off-by: Neal Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Wayne Chang <[email protected]> Date: Fri Jan 7 17:04:43 2022 +0800 usb: gadget: tegra-xudc: Do not program SPARAM [ Upstream commit 62fb61580eb48fc890b7bc9fb5fd263367baeca8 ] According to the Tegra Technical Reference Manual, SPARAM is a read-only register and should not be programmed in the driver. The change removes the wrong SPARAM usage. Signed-off-by: Wayne Chang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Wayne Chang <[email protected]> Date: Fri Jan 7 17:13:49 2022 +0800 usb: gadget: tegra-xudc: Fix control endpoint's definitions [ Upstream commit 7bd42fb95eb4f98495ccadf467ad15124208ec49 ] According to the Tegra Technical Reference Manual, the seq_num field of control endpoint is not [31:24] but [31:27]. Bit 24 is reserved and bit 26 is splitxstate. The change fixes the wrong control endpoint's definitions. Signed-off-by: Wayne Chang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Eli Cohen <[email protected]> Date: Thu Sep 9 15:36:35 2021 +0300 vdpa/mlx5: Propagate link status from device to vdpa driver [ Upstream commit edf747affc41a18ccc3a616813d4c2b6d38b46ce ] Add code to register to hardware asynchronous events. Use this mechanism to track link status events coming from the device and update the config struct. After doing link status change, call the vdpa callback to notify of the link status change. Signed-off-by: Eli Cohen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Eli Cohen <[email protected]> Date: Thu Sep 9 15:36:34 2021 +0300 vdpa/mlx5: Rename control VQ workqueue to vdpa wq [ Upstream commit 218bdd20e56cab41a68481bc10c551ae3e0a24fb ] A subesequent patch will use the same workqueue for executing other work not related to control VQ. Rename the workqueue and the work queue entry used to convey information to the workqueue. Signed-off-by: Eli Cohen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Jason Wang <[email protected]> Date: Tue Mar 29 12:21:07 2022 +0800 vdpa: mlx5: prevent cvq work from hogging CPU [ Upstream commit 55ebf0d60e3cc6c9e8593399e185842c00e12f36 ] A userspace triggerable infinite loop could happen in mlx5_cvq_kick_handler() if userspace keeps sending a huge amount of cvq requests. Fixing this by introducing a quota and re-queue the work if we're out of the budget (currently the implicit budget is one) . While at it, using a per device work struct to avoid on demand memory allocation for cvq. Fixes: 5262912ef3cfc ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Jason Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Eli Cohen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Alex Williamson <[email protected]> Date: Mon Jan 24 16:11:37 2022 -0700 vfio/pci: Stub vfio_pci_vga_rw when !CONFIG_VFIO_PCI_VGA [ Upstream commit 6e031ec0e5a2dda53e12e0d2a7e9b15b47a3c502 ] Resolve build errors reported against UML build for undefined ioport_map() and ioport_unmap() functions. Without this config option a device cannot have vfio_pci_core_device.has_vga set, so the existing function would always return -EINVAL anyway. Reported-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/164306582968.3758255.15192949639574660648.stgit@omen Signed-off-by: Alex Williamson <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Randy Dunlap <[email protected]> Date: Wed Mar 16 12:20:03 2022 -0700 virtio_console: eliminate anonymous module_init & module_exit [ Upstream commit fefb8a2a941338d871e2d83fbd65fbfa068857bd ] Eliminate anonymous module_init() and module_exit(), which can lead to confusion or ambiguity when reading System.map, crashes/oops/bugs, or an initcall_debug log. Give each of these init and exit functions unique driver-specific names to eliminate the anonymous names. Example 1: (System.map) ffffffff832fc78c t init ffffffff832fc79e t init ffffffff832fc8f8 t init Example 2: (initcall_debug log) calling init+0x0/0x12 @ 1 initcall init+0x0/0x12 returned 0 after 15 usecs calling init+0x0/0x60 @ 1 initcall init+0x0/0x60 returned 0 after 2 usecs calling init+0x0/0x9a @ 1 initcall init+0x0/0x9a returned 0 after 74 usecs Signed-off-by: Randy Dunlap <[email protected]> Reviewed-by: Amit Shah <[email protected]> Cc: [email protected] Cc: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Eyal Birger <[email protected]> Date: Thu Mar 31 10:26:43 2022 +0300 vrf: fix packet sniffing for traffic originating from ip tunnels [ Upstream commit 012d69fbfcc739f846766c1da56ef8b493b803b5 ] in commit 048939088220 ("vrf: add mac header for tunneled packets when sniffer is attached") an Ethernet header was cooked for traffic originating from tunnel devices. However, the header is added based on whether the mac_header is unset and ignores cases where the device doesn't expose a mac header to upper layers, such as in ip tunnels like ipip and gre. Traffic originating from such devices still appears garbled when capturing on the vrf device. Fix by observing whether the original device exposes a header to upper layers, similar to the logic done in af_packet. In addition, skb->mac_len needs to be adjusted after adding the Ethernet header for the skb_push/pull() surrounding dev_queue_xmit_nit() to work on these packets. Fixes: 048939088220 ("vrf: add mac header for tunneled packets when sniffer is attached") Signed-off-by: Eyal Birger <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Lucas Denefle <[email protected]> Date: Wed Feb 23 11:35:55 2022 +0000 w1: w1_therm: fixes w1_seq for ds28ea00 sensors [ Upstream commit 41a92a89eee819298f805c40187ad8b02bb53426 ] w1_seq was failing due to several devices responding to the CHAIN_DONE at the same time. Now properly selects the current device in the chain with MATCH_ROM. Also acknowledgment was read twice. Signed-off-by: Lucas Denefle <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Peter Zijlstra <[email protected]> Date: Fri Mar 18 21:24:38 2022 +0100 x86,static_call: Fix __static_call_return0 for i386 commit 1cd5f059d956e6f614ba6666ecdbcf95db05d5f5 upstream. Paolo reported that the instruction sequence that is used to replace: call __static_call_return0 namely: 66 66 48 31 c0 data16 data16 xor %rax,%rax decodes to something else on i386, namely: 66 66 48 data16 dec %ax 31 c0 xor %eax,%eax Which is a nonsensical sequence that happens to have the same outcome. *However* an important distinction is that it consists of 2 instructions which is a problem when the thing needs to be overwriten with a regular call instruction again. As such, replace the instruction with something that decodes the same on both i386 and x86_64. Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Reported-by: Paolo Bonzini <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Vincent Mailhol <[email protected]> Date: Thu Mar 24 11:37:42 2022 +0900 x86/bug: Prevent shadowing in __WARN_FLAGS commit 9ce02f0fc68326dd1f87a0a3a4c6ae7fdd39e6f6 upstream. The macro __WARN_FLAGS() uses a local variable named "f". This being a common name, there is a risk of shadowing other variables. For example, GCC would yield: | In file included from ./include/linux/bug.h:5, | from ./include/linux/cpumask.h:14, | from ./arch/x86/include/asm/cpumask.h:5, | from ./arch/x86/include/asm/msr.h:11, | from ./arch/x86/include/asm/processor.h:22, | from ./arch/x86/include/asm/timex.h:5, | from ./include/linux/timex.h:65, | from ./include/linux/time32.h:13, | from ./include/linux/time.h:60, | from ./include/linux/stat.h:19, | from ./include/linux/module.h:13, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h: In function 'rcu_head_after_call_rcu': | ./arch/x86/include/asm/bug.h:80:21: warning: declaration of 'f' shadows a parameter [-Wshadow] | 80 | __auto_type f = BUGFLAG_WARNING|(flags); \ | | ^ | ./include/asm-generic/bug.h:106:17: note: in expansion of macro '__WARN_FLAGS' | 106 | __WARN_FLAGS(BUGFLAG_ONCE | \ | | ^~~~~~~~~~~~ | ./include/linux/rcupdate.h:1007:9: note: in expansion of macro 'WARN_ON_ONCE' | 1007 | WARN_ON_ONCE(func != (rcu_callback_t)~0L); | | ^~~~~~~~~~~~ | In file included from ./include/linux/rbtree.h:24, | from ./include/linux/mm_types.h:11, | from ./include/linux/buildid.h:5, | from ./include/linux/module.h:14, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h:1001:62: note: shadowed declaration is here | 1001 | rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f) | | ~~~~~~~~~~~~~~~^ For reference, sparse also warns about it, c.f. [1]. This patch renames the variable from f to __flags (with two underscore prefixes as suggested in the Linux kernel coding style [2]) in order to prevent collisions. [1] https://lore.kernel.org/all/CAFGhKbyifH1a+nAMCvWM88TK6fpNPdzFtUXPmRGnnQeePV+1sw@mail.gmail.com/ [2] Linux kernel coding style, section 12) Macros, Enums and RTL, paragraph 5) namespace collisions when defining local variables in macros resembling functions https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enums-and-rtl Fixes: bfb1a7c91fb7 ("x86/bug: Merge annotate_reachable() into_BUG_FLAGS() asm") Signed-off-by: Vincent Mailhol <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Nathan Chancellor <[email protected]> Date: Mon Mar 14 12:48:42 2022 -0700 x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy [ Upstream commit aaeed6ecc1253ce1463fa1aca0b70a4ccbc9fa75 ] There are two outstanding issues with CONFIG_X86_X32_ABI and llvm-objcopy, with similar root causes: 1. llvm-objcopy does not properly convert .note.gnu.property when going from x86_64 to x86_x32, resulting in a corrupted section when linking: https://github.com/ClangBuiltLinux/linux/issues/1141 2. llvm-objcopy produces corrupted compressed debug sections when going from x86_64 to x86_x32, also resulting in an error when linking: https://github.com/ClangBuiltLinux/linux/issues/514 After commit 41c5ef31ad71 ("x86/ibt: Base IBT bits"), the .note.gnu.property section is always generated when CONFIG_X86_KERNEL_IBT is enabled, which causes the first issue to become visible with an allmodconfig build: ld.lld: error: arch/x86/entry/vdso/vclock_gettime-x32.o:(.note.gnu.property+0x1c): program property is too short To avoid this error, do not allow CONFIG_X86_X32_ABI to be selected when using llvm-objcopy. If the two issues ever get fixed in llvm-objcopy, this can be turned into a feature check. Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Dave Hansen <[email protected]> Date: Fri Mar 18 06:52:59 2022 -0700 x86/mm/tlb: Revert retpoline avoidance approach commit d39268ad24c0fd0665d0c5cf55a7c1a0ebf94766 upstream. 0day reported a regression on a microbenchmark which is intended to stress the TLB flushing path: https://lore.kernel.org/all/20220317090415.GE735@xsang-OptiPlex-9020/ It pointed at a commit from Nadav which intended to remove retpoline overhead in the TLB flushing path by taking the 'cond'-ition in on_each_cpu_cond_mask(), pre-calculating it, and incorporating it into 'cpumask'. That allowed the code to use a bunch of earlier direct calls instead of later indirect calls that need a retpoline. But, in practice, threads can go idle (and into lazy TLB mode where they don't need to flush their TLB) between the early and late calls. It works in this direction and not in the other because TLB-flushing threads tend to hold mmap_lock for write. Contention on that lock causes threads to _go_ idle right in this early/late window. There was not any performance data in the original commit specific to the retpoline overhead. I did a few tests on a system with retpolines: https://lore.kernel.org/all/[email protected]/ which showed a possible small win. But, that small win pales in comparison with the bigger loss induced on non-retpoline systems. Revert the patch that removed the retpolines. This was not a clean revert, but it was self-contained enough not to be too painful. Fixes: 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()") Reported-by: kernel test robot <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Nadav Amit <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/164874672286.389.7021457716635788197.tip-bot2@tip-bot2 Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Reto Buerki <[email protected]> Date: Thu Apr 7 13:06:47 2022 +0200 x86/msi: Fix msi message data shadow struct commit 59b18a1e65b7a2134814106d0860010e10babe18 upstream. The x86 MSI message data is 32 bits in total and is either in compatibility or remappable format, see Intel Virtualization Technology for Directed I/O, section 5.1.2. Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs") Co-developed-by: Adrian-Ken Rueegsegger <[email protected]> Signed-off-by: Adrian-Ken Rueegsegger <[email protected]> Signed-off-by: Reto Buerki <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Pawan Gupta <[email protected]> Date: Mon Apr 4 17:34:19 2022 -0700 x86/pm: Save the MSR validity status at context setup commit 73924ec4d560257004d5b5116b22a3647661e364 upstream. The mechanism to save/restore MSRs during S3 suspend/resume checks for the MSR validity during suspend, and only restores the MSR if its a valid MSR. This is not optimal, as an invalid MSR will unnecessarily throw an exception for every suspend cycle. The more invalid MSRs, higher the impact will be. Check and save the MSR validity at setup. This ensures that only valid MSRs that are guaranteed to not throw an exception will be attempted during suspend. Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume") Suggested-by: Dave Hansen <[email protected]> Signed-off-by: Pawan Gupta <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Acked-by: Borislav Petkov <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Pawan Gupta <[email protected]> Date: Mon Apr 4 17:35:45 2022 -0700 x86/speculation: Restore speculation related MSRs during S3 resume commit e2a1256b17b16f9b9adf1b6fea56819e7b68e463 upstream. After resuming from suspend-to-RAM, the MSRs that control CPU's speculative execution behavior are not being restored on the boot CPU. These MSRs are used to mitigate speculative execution vulnerabilities. Not restoring them correctly may leave the CPU vulnerable. Secondary CPU's MSRs are correctly being restored at S3 resume by identify_secondary_cpu(). During S3 resume, restore these MSRs for boot CPU when restoring its processor state. Fixes: 772439717dbf ("x86/bugs/intel: Set proper CPU features and setup RDS") Reported-by: Neelima Krishnan <[email protected]> Signed-off-by: Pawan Gupta <[email protected]> Tested-by: Neelima Krishnan <[email protected]> Acked-by: Borislav Petkov <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Peter Zijlstra <[email protected]> Date: Tue Mar 8 16:30:50 2022 +0100 x86: Annotate call_on_stack() [ Upstream commit be0075951fde739f14ee2b659e2fd6e2499c46c0 ] vmlinux.o: warning: objtool: page_fault_oops()+0x13c: unreachable instruction 0000 000000000005b460 <page_fault_oops>: ... 0128 5b588: 49 89 23 mov %rsp,(%r11) 012b 5b58b: 4c 89 dc mov %r11,%rsp 012e 5b58e: 4c 89 f2 mov %r14,%rdx 0131 5b591: 48 89 ee mov %rbp,%rsi 0134 5b594: 4c 89 e7 mov %r12,%rdi 0137 5b597: e8 00 00 00 00 call 5b59c <page_fault_oops+0x13c> 5b598: R_X86_64_PLT32 handle_stack_overflow-0x4 013c 5b59c: 5c pop %rsp vmlinux.o: warning: objtool: sysvec_reboot()+0x6d: unreachable instruction 0000 00000000000033f0 <sysvec_reboot>: ... 005d 344d: 4c 89 dc mov %r11,%rsp 0060 3450: e8 00 00 00 00 call 3455 <sysvec_reboot+0x65> 3451: R_X86_64_PLT32 irq_enter_rcu-0x4 0065 3455: 48 89 ef mov %rbp,%rdi 0068 3458: e8 00 00 00 00 call 345d <sysvec_reboot+0x6d> 3459: R_X86_64_PC32 .text+0x47d0c 006d 345d: e8 00 00 00 00 call 3462 <sysvec_reboot+0x72> 345e: R_X86_64_PLT32 irq_exit_rcu-0x4 0072 3462: 5c pop %rsp Both cases are due to a call_on_stack() calling a __noreturn function. Since that's an inline asm, GCC can't do anything about the instructions after the CALL. Therefore put in an explicit ASM_REACHABLE annotation to make sure objtool and gcc are consistently confused about control flow. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Dongli Zhang <[email protected]> Date: Wed Mar 2 08:40:32 2022 -0800 xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 [ Upstream commit eed05744322da07dd7e419432dcedf3c2e017179 ] The sched_clock() can be used very early since commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition, with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from 0"), kdump kernel in Xen HVM guest may panic at very early stage when accessing &__this_cpu_read(xen_vcpu)->time as in below: setup_arch() -> init_hypervisor_platform() -> x86_init.hyper.init_platform = xen_hvm_guest_init() -> xen_hvm_init_time_ops() -> xen_clocksource_read() -> src = &__this_cpu_read(xen_vcpu)->time; This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info' embedded inside 'shared_info' during early stage until xen_vcpu_setup() is used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address. However, when Xen HVM guest panic on vcpu >= 32, since xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic. This patch calls xen_hvm_init_time_ops() again later in xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is registered when the boot vcpu is >= 32. This issue can be reproduced on purpose via below command at the guest side when kdump/kexec is enabled: "taskset -c 33 echo c > /proc/sysrq-trigger" The bugfix for PVM is not implemented due to the lack of testing environment. [boris: xen_hvm_init_time_ops() returns on errors instead of jumping to end] Cc: Joe Jin <[email protected]> Signed-off-by: Dongli Zhang <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Boris Ostrovsky <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Max Filippov <[email protected]> Date: Thu Mar 17 02:49:41 2022 -0700 xtensa: fix DTC warning unit_address_format [ Upstream commit e85d29ba4b24f68e7a78cb85c55e754362eeb2de ] DTC issues the following warnings when building xtfpga device trees: /soc/flash@00000000/partition@0x0: unit name should not have leading "0x" /soc/flash@00000000/partition@0x6000000: unit name should not have leading "0x" /soc/flash@00000000/partition@0x6800000: unit name should not have leading "0x" /soc/flash@00000000/partition@0x7fe0000: unit name should not have leading "0x" Drop leading 0x from flash partition unit names. Signed-off-by: Max Filippov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>