Changelog

ALSA: intel_hdmi: Fix reference to PCM buffer address [+ + +]

Author: Zhen Ni <[email protected]>
Date:   Wed Mar 2 15:42:41 2022 +0800

    ALSA: intel_hdmi: Fix reference to PCM buffer address
    
    commit 0aa6b294b312d9710804679abd2c0c8ca52cc2bc upstream.
    
    PCM buffers might be allocated dynamically when the buffer
    preallocation failed or a larger buffer is requested, and it's not
    guaranteed that substream->dma_buffer points to the actually used
    buffer.  The driver needs to refer to substream->runtime->dma_addr
    instead for the buffer address.
    
    Signed-off-by: Zhen Ni <[email protected]>
    Cc: <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

arm64: dts: juno: Remove GICv2m dma-range [+ + +]

Author: Robin Murphy <[email protected]>
Date:   Mon Jan 24 17:57:01 2022 +0000

    arm64: dts: juno: Remove GICv2m dma-range
    
    [ Upstream commit 31eeb6b09f4053f32a30ce9fbcdfca31f713028d ]
    
    Although it is painstakingly honest to describe all 3 PCI windows in
    "dma-ranges", it misses the subtle distinction that the window for
    the GICv2m range is normally programmed for Device memory attributes
    rather than Normal Cacheable like the DRAM windows. Since MMU-401 only
    offers stage 2 translation, this means that when the PCI SMMU is
    enabled, accesses through that IPA range unexpectedly lose coherency if
    mapped as cacheable at the SMMU, due to the attribute combining rules.
    Since an extra 256KB is neither here nor there when we still have 10GB
    worth of usable address space, rather than attempting to describe and
    cope with this detail let's just remove the offending range. If the SMMU
    is not used then it makes no difference anyway.
    
    Link: https://lore.kernel.org/r/856c3f7192c6c3ce545ba67462f2ce9c86ed6b0c.1643046936.git.robin.murphy@arm.com
    Fixes: 4ac4d146cb63 ("arm64: dts: juno: Describe PCI dma-ranges")
    Reported-by: Anders Roxell <[email protected]>
    Acked-by: Liviu Dudau <[email protected]>
    Signed-off-by: Robin Murphy <[email protected]>
    Signed-off-by: Sudeep Holla <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output [+ + +]

Author: Brian Norris <[email protected]>
Date:   Fri Jan 14 15:02:07 2022 -0800

    arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
    
    commit b5fbaf7d779f5f02b7f75b080e7707222573be2a upstream.
    
    Commit b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
    switched the platform to SPDIF, but we didn't fix up the device tree.
    
    Drop the pinctrl settings, because the 'spdif_bus' pins are either:
     * unused (on kevin, bob), so the setting is ~harmless
     * used by a different function (on scarlet), which causes probe
       failures (!!)
    
    Fixes: b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
    Signed-off-by: Brian Norris <[email protected]>
    Reviewed-by: Chen-Yu Tsai <[email protected]>
    Link: https://lore.kernel.org/r/20220114150129.v2.1.I46f64b00508d9dff34abe1c3e8d2defdab4ea1e5@changeid
    Signed-off-by: Heiko Stuebner <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL [+ + +]

Author: Masami Hiramatsu <[email protected]>
Date:   Mon Jan 24 17:17:54 2022 +0900

    arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL
    
    [ Upstream commit 1e0924bd09916fab795fc2a21ec1d148f24299fd ]
    
    Mark start_backtrace() as notrace and NOKPROBE_SYMBOL
    because this function is called from ftrace and lockdep to
    get the caller address via return_address(). Since lockdep
    is used in kprobes, it should also be NOKPROBE_SYMBOL.
    
    Fixes: b07f3499661c ("arm64: stacktrace: Move start_backtrace() out of the header")
    Cc: <[email protected]> # 5.13.x
    Signed-off-by: Masami Hiramatsu <[email protected]>
    Reviewed-by: Mark Brown <[email protected]>
    Link: https://lore.kernel.org/r/164301227374.1433152.12808232644267107415.stgit@devnote2
    Signed-off-by: Catalin Marinas <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Wed Feb 23 20:46:35 2022 +0100

    ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
    
    commit 7b83299e5b9385943a857d59e15cba270df20d7e upstream.
    
    early_param() handlers should return 0 on success.
    __setup() handlers should return 1 on success, i.e., the parameter
    has been handled. A return of 0 would cause the "option=value" string
    to be added to init's environment strings, polluting it.
    
    ../arch/arm/mm/mmu.c: In function 'test_early_cachepolicy':
    ../arch/arm/mm/mmu.c:215:1: error: no return statement in function returning non-void [-Werror=return-type]
    ../arch/arm/mm/mmu.c: In function 'test_noalign_setup':
    ../arch/arm/mm/mmu.c:221:1: error: no return statement in function returning non-void [-Werror=return-type]
    
    Fixes: b849a60e0903 ("ARM: make cr_alignment read-only #ifndef CONFIG_CPU_CP15")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Cc: Uwe Kleine-König <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Russell King (Oracle) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ARM: dts: switch timer config to common devkit8000 devicetree [+ + +]

Author: Anthoine Bourgeois <[email protected]>
Date:   Tue Jan 25 20:11:38 2022 +0100

    ARM: dts: switch timer config to common devkit8000 devicetree
    
    [ Upstream commit 64324ef337d0caa5798fa8fa3f6bbfbd3245868a ]
    
    This patch allows the lcd43 and lcd70 flavors to benefit from timer
    evolution.
    
    Fixes: e428e250fde6 ("ARM: dts: Configure system timers for omap3")
    Signed-off-by: Anthoine Bourgeois <[email protected]>
    Signed-off-by: Tony Lindgren <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ARM: dts: Use 32KiHz oscillator on devkit8000 [+ + +]

Author: Anthoine Bourgeois <[email protected]>
Date:   Tue Jan 25 20:11:39 2022 +0100

    ARM: dts: Use 32KiHz oscillator on devkit8000
    
    [ Upstream commit 8840f5460a23759403f1f2860429dcbcc2f04a65 ]
    
    The Devkit8000 board seems to have always used 32k_counter as clocksource.
    Restore this behavior.
    
    If clocksource is back to 32k_counter, timer12 is now the clockevent
    source (as before) and timer2 is no longer needed here.
    
    This commit fixes the same issue observed with commit 23885389dbbb
    ("ARM: dts: Fix timer regression for beagleboard revision c") when sleep
    is blocked until hitting keys over serial console.
    
    Fixes: aba1ad05da08 ("clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support")
    Fixes: e428e250fde6 ("ARM: dts: Configure system timers for omap3")
    Signed-off-by: Anthoine Bourgeois <[email protected]>
    Signed-off-by: Tony Lindgren <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ARM: Fix kgdb breakpoint for Thumb2 [+ + +]

Author: Russell King (Oracle) <[email protected]>
Date:   Wed Feb 16 15:37:38 2022 +0000

    ARM: Fix kgdb breakpoint for Thumb2
    
    commit d920eaa4c4559f59be7b4c2d26fa0a2e1aaa3da9 upstream.
    
    The kgdb code needs to register an undef hook for the Thumb UDF
    instruction that will fault in order to be functional on Thumb2
    platforms.
    
    Reported-by: Johannes Stezenbach <[email protected]>
    Tested-by: Johannes Stezenbach <[email protected]>
    Fixes: 5cbad0ebf45c ("kgdb: support for ARCH=arm")
    Signed-off-by: Russell King (Oracle) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ARM: tegra: Move panels to AUX bus [+ + +]

Author: Thierry Reding <[email protected]>
Date:   Mon Dec 20 11:32:39 2021 +0100

    ARM: tegra: Move panels to AUX bus
    
    [ Upstream commit 8d3b01e0d4bb54368d73d0984466d72c2eeeac74 ]
    
    Move the eDP panel on Venice 2 and Nyan boards into the corresponding
    AUX bus device tree node. This allows us to avoid a nasty circular
    dependency that would otherwise be created between the DPAUX and panel
    nodes via the DDC/I2C phandle.
    
    Fixes: eb481f9ac95c ("ARM: tegra: add Acer Chromebook 13 device tree")
    Fixes: 59fe02cb079f ("ARM: tegra: Add DTS for the nyan-blaze board")
    Fixes: 40e231c770a4 ("ARM: tegra: Enable eDP for Venice2")
    Signed-off-by: Thierry Reding <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: cs4265: Fix the duplicated control name [+ + +]

Author: Fabio Estevam <[email protected]>
Date:   Tue Feb 15 09:05:14 2022 -0300

    ASoC: cs4265: Fix the duplicated control name
    
    commit c5487b9cdea5c1ede38a7ec94db0fc59963c8e86 upstream.
    
    Currently, the following error messages are seen during boot:
    
    asoc-simple-card sound: control 2:0:0:SPDIF Switch:0 is already present
    cs4265 1-004f: ASoC: failed to add widget SPDIF dapm kcontrol SPDIF Switch: -16
    
    Quoting Mark Brown:
    
    "The driver is just plain buggy, it defines both a regular SPDIF Switch
    control and a SND_SOC_DAPM_SWITCH() called SPDIF both of which will
    create an identically named control, it can never have loaded without
    error.  One or both of those has to be renamed or they need to be
    merged into one thing."
    
    Fix the duplicated control name by combining the two SPDIF controls:
    move the register bits onto the DAPM widget and have DAPM control them.
    
    Fixes: f853d6b3ba34 ("ASoC: cs4265: Add a S/PDIF enable switch")
    Signed-off-by: Fabio Estevam <[email protected]>
    Acked-by: Charles Keepax <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min [+ + +]

Author: Marek Vasut <[email protected]>
Date:   Tue Feb 15 14:06:45 2022 +0100

    ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
    
    commit 9bdd10d57a8807dba0003af0325191f3cec0f11c upstream.
    
    While the $val/$val2 values passed in from userspace are always >= 0
    integers, the limits of the control can be signed integers and the $min
    can be non-zero and less than zero. To correctly validate $val/$val2
    against platform_max, add the $min offset to val first.
    
    Fixes: 817f7c9335ec0 ("ASoC: ops: Reject out of bounds values in snd_soc_put_volsw()")
    Signed-off-by: Marek Vasut <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
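
The range check described in this entry can be illustrated with a minimal userspace sketch. It is not the kernel's snd_soc_put_volsw() code: the struct ctl_limits type and the check_volsw_value() function are hypothetical names introduced here, and the sketch assumes that userspace passes an offset from min (always >= 0) while platform_max is expressed in the control's real units.

```c
#include <errno.h>

/* Hypothetical stand-in for the limits of a mixer control. */
struct ctl_limits {
    int min;          /* lowest real value; may be negative */
    int max;          /* highest real value */
    int platform_max; /* optional cap in real units; 0 means "not set" */
};

/* Validate a value sent by userspace as an offset from min (always >= 0). */
int check_volsw_value(const struct ctl_limits *lim, long val)
{
    if (val < 0)
        return -EINVAL;

    /* Shift by +min first so both sides of the comparison use real units. */
    if (lim->platform_max && val + lim->min > lim->platform_max)
        return -EINVAL;

    /* The offset itself must not exceed the width of the range. */
    if (val > (long)lim->max - lim->min)
        return -EINVAL;

    return 0;
}
```

With min = -50, max = 50 and platform_max = 20, an offset of 70 corresponds to the real value 20 and is accepted, whereas comparing the raw offset 70 against 20 would wrongly reject it.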

ASoC: rt5668: do not block workqueue if card is unbound [+ + +]

Author: Kai Vehmanen <[email protected]>
Date:   Mon Feb 7 17:29:59 2022 +0200

    ASoC: rt5668: do not block workqueue if card is unbound
    
    [ Upstream commit a6d78661dc903d90a327892bbc34268f3a5f4b9c ]
    
    The current rt5668_jack_detect_handler() assumes the component
    and card will always show up and implements an infinite usleep
    loop waiting for them to show up.
    
    This does not hold true if a codec interrupt (or other
    event) occurs when the card is unbound. The codec driver's
    remove or shutdown functions cannot cancel the workqueue due
    to the wait loop. As a result, code can either end up blocking
    the workqueue, or hit a kernel oops when the card is freed.
    
    Fix the issue by rescheduling the jack detect handler in
    case the card is not ready. In case the card never shows up,
    the shutdown/remove/suspend calls can now cancel the detect
    task.
    
    Signed-off-by: Kai Vehmanen <[email protected]>
    Reviewed-by: Bard Liao <[email protected]>
    Reviewed-by: Ranjani Sridharan <[email protected]>
    Reviewed-by: Pierre-Louis Bossart <[email protected]>
    Reviewed-by: Péter Ujfalusi <[email protected]>
    Reviewed-by: Shuming Fan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: rt5682: do not block workqueue if card is unbound [+ + +]

Author: Kai Vehmanen <[email protected]>
Date:   Mon Feb 7 17:30:00 2022 +0200

    ASoC: rt5682: do not block workqueue if card is unbound
    
    [ Upstream commit 4c33de0673ced9c7c37b3bbd9bfe0fda72340b2a ]
    
    The current rt5682_jack_detect_handler() assumes the component
    and card will always show up and implements an infinite usleep
    loop waiting for them to show up.
    
    This does not hold true if a codec interrupt (or other
    event) occurs when the card is unbound. The codec driver's
    remove or shutdown functions cannot cancel the workqueue due
    to the wait loop. As a result, code can either end up blocking
    the workqueue, or hit a kernel oops when the card is freed.
    
    Fix the issue by rescheduling the jack detect handler in
    case the card is not ready. In case the card never shows up,
    the shutdown/remove/suspend calls can now cancel the detect
    task.
    
    Signed-off-by: Kai Vehmanen <[email protected]>
    Reviewed-by: Bard Liao <[email protected]>
    Reviewed-by: Ranjani Sridharan <[email protected]>
    Reviewed-by: Pierre-Louis Bossart <[email protected]>
    Reviewed-by: Péter Ujfalusi <[email protected]>
    Reviewed-by: Shuming Fan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ata: pata_hpt37x: fix PCI clock detection [+ + +]

Author: Sergey Shtylyov <[email protected]>
Date:   Sat Feb 19 23:04:29 2022 +0300

    ata: pata_hpt37x: fix PCI clock detection
    
    [ Upstream commit 5f6b0f2d037c8864f20ff15311c695f65eb09db5 ]
    
    The f_CNT register (at the PCI config. address 0x78) is 16-bit, not
    8-bit! The bug was there from the very start... :-(
    
    Signed-off-by: Sergey Shtylyov <[email protected]>
    Fixes: 669a5db411d8 ("[libata] Add a bunch of PATA drivers.")
    Cc: [email protected]
    Signed-off-by: Damien Le Moal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
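
To see why the register width matters, here is a small sketch that reads the same offset from a simulated PCI configuration space as a byte and as a 16-bit word. The cfg_read8() and cfg_read16() helpers are hypothetical and stand in for the kernel's config-space accessors; the only facts taken from the entry are the offset 0x78 and the 16-bit width. PCI configuration space is little-endian, so the high byte lives at offset + 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one byte from a simulated PCI configuration space. */
uint8_t cfg_read8(const uint8_t *cfg, size_t off)
{
    return cfg[off];
}

/* Read a 16-bit little-endian value from the same simulated space. */
uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | ((uint16_t)cfg[off + 1] << 8));
}
```

If the bytes at 0x78 and 0x79 are 0x2c and 0x01, the 16-bit read yields 0x012c while the 8-bit read yields only 0x2c, silently dropping the upper bits of the counter.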

auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature [+ + +]

Author: Andy Shevchenko <[email protected]>
Date:   Wed Feb 23 17:47:16 2022 +0200

    auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature
    
    commit 4424c35ead667ba2e8de7ab8206da66453e6f728 upstream.
    
    It seems that the lcd2s_redefine_char() has never been properly
    tested. The buffer is filled by DEF_CUSTOM_CHAR command followed
    by the character number (from 0 to 7), but immediately after that
    these bytes are rewritten by the decoded hex stream.
    
    Fix the index to fill the buffer after the command and number.
    
    Fixes: 8c9108d014c5 ("auxdisplay: add a driver for lcd2s character display")
    Cc: Lars Poeschel <[email protected]>
    Signed-off-by: Andy Shevchenko <[email protected]>
    Reviewed-by: Geert Uytterhoeven <[email protected]>
    [fixed typo in commit message]
    Signed-off-by: Miguel Ojeda <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
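
The indexing problem described in this entry can be sketched in userspace as follows. This is not the driver's code: DEMO_DEF_CUSTOM_CHAR is a made-up opcode value, and build_redefine_cmd() and hex_val() are hypothetical helpers. The point is only that the decoded bytes start after the two header bytes instead of overwriting them.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_DEF_CUSTOM_CHAR 0x8d /* hypothetical opcode, for illustration only */
#define DEMO_HEADER_LEN 2         /* command byte + character number */

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Fill buf with: command, character number (0 to 7), then the decoded hex
 * bytes. A trailing odd hex digit is ignored. Returns the number of bytes
 * written, or -1 on invalid input or if buf is too small.
 */
int build_redefine_cmd(uint8_t *buf, size_t buflen, uint8_t nr, const char *hex)
{
    size_t i = DEMO_HEADER_LEN; /* payload starts after the header */

    if (buflen < DEMO_HEADER_LEN || nr > 7)
        return -1;

    buf[0] = DEMO_DEF_CUSTOM_CHAR;
    buf[1] = nr;

    while (hex[0] && hex[1]) {
        int hi = hex_val(hex[0]);
        int lo = hex_val(hex[1]);

        if (hi < 0 || lo < 0 || i >= buflen)
            return -1;
        buf[i++] = (uint8_t)((hi << 4) | lo);
        hex += 2;
    }

    return (int)i;
}
```

Starting the payload index at 0 instead of DEMO_HEADER_LEN would reproduce the described bug: the first two decoded bytes would replace the command and the character number.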

auxdisplay: lcd2s: Fix memory leak in ->remove() [+ + +]

Author: Andy Shevchenko <[email protected]>
Date:   Wed Feb 23 17:47:17 2022 +0200

    auxdisplay: lcd2s: Fix memory leak in ->remove()
    
    commit 898c0a15425a5bcaa8d44bd436eae5afd2483796 upstream.
    
    Once allocated the struct lcd2s_data is never freed.
    Fix the memory leak by switching to devm_kzalloc().
    
    Fixes: 8c9108d014c5 ("auxdisplay: add a driver for lcd2s character display")
    Cc: Lars Poeschel <[email protected]>
    Signed-off-by: Andy Shevchenko <[email protected]>
    Signed-off-by: Miguel Ojeda <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

auxdisplay: lcd2s: Use proper API to free the instance of charlcd object [+ + +]

Author: Andy Shevchenko <[email protected]>
Date:   Wed Feb 23 17:47:18 2022 +0200

    auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
    
    commit 9ed331f8a0fb674f4f06edf05a1687bf755af27b upstream.
    
    While it might work, the current approach is fragile in a few ways:
    - whenever members in the structure are shuffled, the pointer will be wrong
    - the resource freeing may include more than covered by kfree()
    
    Fix this by using charlcd_free() call instead of kfree().
    
    Fixes: 8c9108d014c5 ("auxdisplay: add a driver for lcd2s character display")
    Cc: Lars Poeschel <[email protected]>
    Signed-off-by: Andy Shevchenko <[email protected]>
    Signed-off-by: Miguel Ojeda <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

batman-adv: Don't expect inter-netns unique iflink indices [+ + +]

Author: Sven Eckelmann <[email protected]>
Date:   Sun Feb 27 23:23:49 2022 +0100

    batman-adv: Don't expect inter-netns unique iflink indices
    
    commit 6c1f41afc1dbe59d9d3c8bb0d80b749c119aa334 upstream.
    
    The ifindex doesn't have to be unique for multiple network namespaces on
    the same machine.
    
      $ ip netns add test1
      $ ip -net test1 link add dummy1 type dummy
      $ ip netns add test2
      $ ip -net test2 link add dummy2 type dummy
    
      $ ip -net test1 link show dev dummy1
      6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff
      $ ip -net test2 link show dev dummy2
      6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff
    
    But the batman-adv code to walk through the various layers of virtual
    interfaces uses this assumption because dev_get_iflink handles it
    internally and doesn't return the actual netns of the iflink. And
    dev_get_iflink only documents the situation where ifindex == iflink for
    physical devices.
    
    But only checking for dev->netdev_ops->ndo_get_iflink is also not an option
    because ipoib_get_iflink implements it even when it sometimes returns an
    iflink != ifindex and sometimes iflink == ifindex. The caller must
    therefore make sure itself to check both netns and iflink + ifindex for
    equality. Only when they are equal, a "physical" interface was detected
    which should stop the traversal. On the other hand, vxcan_get_iflink can
    also return 0 in case there was currently no valid peer. In this case, it
    is still necessary to stop.
    
    Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
    Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
    Reported-by: Sabrina Dubroca <[email protected]>
    Signed-off-by: Sven Eckelmann <[email protected]>
    Signed-off-by: Simon Wunderlich <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
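
The stop condition described in this entry can be modelled with a short sketch. The struct demo_dev type and demo_stop_traversal() are hypothetical simplifications, not batman-adv code; in particular, real devices do not carry a plain integer namespace ID. The sketch only shows the rule: stop when there is no valid peer (iflink of 0), or when both the namespace and the index match.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a network device. */
struct demo_dev {
    int netns_id;      /* namespace the device lives in */
    int ifindex;       /* index, unique only inside netns_id */
    int iflink;        /* index of the lower device, 0 if there is none */
    int link_netns_id; /* namespace in which iflink has to be looked up */
};

/* True when walking down the stack of virtual devices should stop here. */
bool demo_stop_traversal(const struct demo_dev *d)
{
    if (d->iflink == 0)
        return true; /* no valid peer at the moment */

    /* Equal indices only identify the same device inside the same netns. */
    return d->netns_id == d->link_netns_id && d->ifindex == d->iflink;
}
```

In the dummy1/dummy2 example from the commit message, both devices have index 6 but live in different namespaces, so comparing indices alone would treat a cross-namespace link as a physical interface.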

batman-adv: Request iflink once in batadv-on-batadv check [+ + +]

Author: Sven Eckelmann <[email protected]>
Date:   Mon Feb 28 00:01:24 2022 +0100

    batman-adv: Request iflink once in batadv-on-batadv check
    
    commit 690bb6fb64f5dc7437317153902573ecad67593d upstream.
    
    There is no need to call dev_get_iflink multiple times for the same
    net_device in batadv_is_on_batman_iface. And since some of the
    .ndo_get_iflink callbacks are dynamic (for example via RCUs like in
    vxcan_get_iflink), it could easily happen that the returned values are not
    stable. The pre-checks before __dev_get_by_index are then of course bogus.
    
    Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
    Signed-off-by: Sven Eckelmann <[email protected]>
    Signed-off-by: Simon Wunderlich <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

batman-adv: Request iflink once in batadv_get_real_netdevice [+ + +]

Author: Sven Eckelmann <[email protected]>
Date:   Mon Feb 28 00:01:24 2022 +0100

    batman-adv: Request iflink once in batadv_get_real_netdevice
    
    commit 6116ba09423f7d140f0460be6a1644dceaad00da upstream.
    
    There is no need to call dev_get_iflink multiple times for the same
    net_device in batadv_get_real_netdevice. And since some of the
    ndo_get_iflink callbacks are dynamic (for example via RCUs like in
    vxcan_get_iflink), it could easily happen that the returned values are not
    stable. The pre-checks before __dev_get_by_index are then of course bogus.
    
    Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
    Signed-off-by: Sven Eckelmann <[email protected]>
    Signed-off-by: Simon Wunderlich <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

blktrace: fix use after free for struct blk_trace [+ + +]

Author: Yu Kuai <[email protected]>
Date:   Mon Feb 28 11:43:54 2022 +0800

    blktrace: fix use after free for struct blk_trace
    
    commit 30939293262eb433c960c4532a0d59c4073b2b84 upstream.
    
    When tracing the whole disk, 'dropped' and 'msg' will be created
    under 'q->debugfs_dir' and 'bt->dir' is NULL, thus blk_trace_free()
    won't remove those files. What's worse, the following UAF can be
    triggered because of accessing stale 'dropped' and 'msg':
    
    ==================================================================
    BUG: KASAN: use-after-free in blk_dropped_read+0x89/0x100
    Read of size 4 at addr ffff88816912f3d8 by task blktrace/1188
    
    CPU: 27 PID: 1188 Comm: blktrace Not tainted 5.17.0-rc4-next-20220217+ #469
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4
    Call Trace:
     <TASK>
     dump_stack_lvl+0x34/0x44
     print_address_description.constprop.0.cold+0xab/0x381
     ? blk_dropped_read+0x89/0x100
     ? blk_dropped_read+0x89/0x100
     kasan_report.cold+0x83/0xdf
     ? blk_dropped_read+0x89/0x100
     kasan_check_range+0x140/0x1b0
     blk_dropped_read+0x89/0x100
     ? blk_create_buf_file_callback+0x20/0x20
     ? kmem_cache_free+0xa1/0x500
     ? do_sys_openat2+0x258/0x460
     full_proxy_read+0x8f/0xc0
     vfs_read+0xc6/0x260
     ksys_read+0xb9/0x150
     ? vfs_write+0x3d0/0x3d0
     ? fpregs_assert_state_consistent+0x55/0x60
     ? exit_to_user_mode_prepare+0x39/0x1e0
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7fbc080d92fd
    Code: ce 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 1
    RSP: 002b:00007fbb95ff9cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
    RAX: ffffffffffffffda RBX: 00007fbb95ff9dc0 RCX: 00007fbc080d92fd
    RDX: 0000000000000100 RSI: 00007fbb95ff9cc0 RDI: 0000000000000045
    RBP: 0000000000000045 R08: 0000000000406299 R09: 00000000fffffffd
    R10: 000000000153afa0 R11: 0000000000000293 R12: 00007fbb780008c0
    R13: 00007fbb78000938 R14: 0000000000608b30 R15: 00007fbb780029c8
     </TASK>
    
    Allocated by task 1050:
     kasan_save_stack+0x1e/0x40
     __kasan_kmalloc+0x81/0xa0
     do_blk_trace_setup+0xcb/0x410
     __blk_trace_setup+0xac/0x130
     blk_trace_ioctl+0xe9/0x1c0
     blkdev_ioctl+0xf1/0x390
     __x64_sys_ioctl+0xa5/0xe0
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Freed by task 1050:
     kasan_save_stack+0x1e/0x40
     kasan_set_track+0x21/0x30
     kasan_set_free_info+0x20/0x30
     __kasan_slab_free+0x103/0x180
     kfree+0x9a/0x4c0
     __blk_trace_remove+0x53/0x70
     blk_trace_ioctl+0x199/0x1c0
     blkdev_common_ioctl+0x5e9/0xb30
     blkdev_ioctl+0x1a5/0x390
     __x64_sys_ioctl+0xa5/0xe0
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    The buggy address belongs to the object at ffff88816912f380
     which belongs to the cache kmalloc-96 of size 96
    The buggy address is located 88 bytes inside of
     96-byte region [ffff88816912f380, ffff88816912f3e0)
    The buggy address belongs to the page:
    page:000000009a1b4e7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0f
    flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
    raw: 0017ffffc0000200 ffffea00044f1100 dead000000000002 ffff88810004c780
    raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    
    Memory state around the buggy address:
     ffff88816912f280: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
     ffff88816912f300: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
    >ffff88816912f380: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                                                        ^
     ffff88816912f400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
     ffff88816912f480: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
    ==================================================================
    
    Fixes: c0ea57608b69 ("blktrace: remove debugfs file dentries from struct blk_trace")
    Signed-off-by: Yu Kuai <[email protected]>
    Reviewed-by: Greg Kroah-Hartman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern [+ + +]

Author: Haimin Zhang <[email protected]>
Date:   Wed Feb 16 16:40:38 2022 +0800

    block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
    
    [ Upstream commit cc8f7fe1f5eab010191aa4570f27641876fa1267 ]
    
    Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize
    the buffer of a bio.
    
    Signed-off-by: Haimin Zhang <[email protected]>
    Reviewed-by: Chaitanya Kulkarni <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

block: loop:use kstatfs.f_bsize of backing file to set discard granularity [+ + +]

Author: Ming Lei <[email protected]>
Date:   Wed Jan 26 11:58:30 2022 +0800

    block: loop:use kstatfs.f_bsize of backing file to set discard granularity
    
    [ Upstream commit 06582bc86d7f48d35cd044098ca1e246e8c7c52e ]
    
    If the backing file's filesystem implements ->fallocate(), we assume the
    loop device can support discard and pass sb->s_blocksize as
    discard_granularity. However, some underlying filesystems, such as
    overlayfs, don't set sb->s_blocksize, which causes discard_granularity to
    be set to zero and triggers the warning in __blkdev_issue_discard().
    
    Christoph suggested passing kstatfs.f_bsize as the discard granularity.
    This is fine because kstatfs.f_bsize means 'Optimal transfer block
    size', which still matches the definition of discard granularity.
    
    So fix the issue by setting discard_granularity to kstatfs.f_bsize if it
    is available; otherwise claim that discard isn't supported.
    
    Cc: Christoph Hellwig <[email protected]>
    Cc: Vivek Goyal <[email protected]>
    Reported-by: Pei Zhang <[email protected]>
    Signed-off-by: Ming Lei <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
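
The decision described in this entry can be sketched as a small pure function. This is an illustration under stated assumptions, not the loop driver's code: struct demo_discard_cfg and demo_discard_setup() are hypothetical, the statfs_ok flag stands for "querying the backing filesystem succeeded", and the extra check for a zero f_bsize is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>

/* Hypothetical result of configuring discard for a loop device. */
struct demo_discard_cfg {
    bool supported;
    unsigned long granularity; /* in bytes; 0 when discard is unsupported */
};

/*
 * has_fallocate: the backing filesystem implements ->fallocate().
 * statfs_ok:     querying the backing filesystem's statistics succeeded.
 * f_bsize:       the block size reported by that query.
 */
struct demo_discard_cfg demo_discard_setup(bool has_fallocate, bool statfs_ok,
                                           unsigned long f_bsize)
{
    struct demo_discard_cfg cfg = { false, 0 };

    /* Only advertise discard when a usable, non-zero granularity exists. */
    if (has_fallocate && statfs_ok && f_bsize != 0) {
        cfg.supported = true;
        cfg.granularity = f_bsize;
    }

    return cfg;
}
```

The design choice is to fall back to "discard not supported" rather than to advertise discard with a granularity of zero, which is the state that triggered the warning.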

bnxt_en: Fix occasional ethtool -t loopback test failures [+ + +]

Author: Michael Chan <[email protected]>
Date:   Sun Feb 20 04:05:49 2022 -0500

    bnxt_en: Fix occasional ethtool -t loopback test failures
    
    [ Upstream commit cfcab3b3b61584a02bb523ffa99564eafa761dfe ]
    
    In the current code, we set up the port to PHY or MAC loopback mode
    and then transmit a test broadcast packet for the loopback test.  This
    scheme sometimes fails if the port is shared with management firmware
    that can also send packets.  The driver may receive the management
    firmware's packet and the test will fail when the contents don't
    match the test packet.
    
    Change the test packet to use its own MAC address as the destination
    and set up the port to only receive its own MAC address.  This should
    filter out other packets sent by management firmware.
    
    Fixes: 91725d89b97a ("bnxt_en: Add PHY loopback to ethtool self-test.")
    Reviewed-by: Pavan Chebbi <[email protected]>
    Reviewed-by: Edwin Peer <[email protected]>
    Reviewed-by: Andy Gospodarek <[email protected]>
    Signed-off-by: Michael Chan <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

bpf, sockmap: Do not ignore orig_len parameter [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Wed Mar 2 08:17:22 2022 -0800

    bpf, sockmap: Do not ignore orig_len parameter
    
    commit 60ce37b03917e593d8e5d8bcc7ec820773daf81d upstream.
    
    Currently, sk_psock_verdict_recv() returns skb->len.
    
    This is problematic because tcp_read_sock() might have
    passed orig_len < skb->len, due to the presence of TCP urgent data.
    
    This causes an infinite loop from tcp_read_sock().
    
    Followup patch will make tcp_read_sock() more robust vs bad actors.
    
    Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Acked-by: John Fastabend <[email protected]>
    Acked-by: Jakub Sitnicki <[email protected]>
    Tested-by: Jakub Sitnicki <[email protected]>
    Acked-by: Daniel Borkmann <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

bpf: Fix possible race in inc_misses_counter [+ + +]

Author: He Fengqing <[email protected]>
Date:   Sat Jan 22 10:29:36 2022 +0000

    bpf: Fix possible race in inc_misses_counter
    
    [ Upstream commit 0e3135d3bfa5dfb658145238d2bc723a8e30c3a3 ]
    
    It seems inc_misses_counter() suffers from the same issue fixed in
    commit d979617aa84d ("bpf: Fixes possible race in update_prog_stats()
    for 32bit arches"):
    As it can run while interrupts are enabled, it could
    be re-entered and the u64_stats syncp could be mangled.
    
    Fixes: 9ed9e9ba2337 ("bpf: Count the number of times recursion was prevented")
    Signed-off-by: He Fengqing <[email protected]>
    Acked-by: John Fastabend <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

bpf: Use u64_stats_t in struct bpf_prog_stats [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Tue Oct 26 14:41:33 2021 -0700

    bpf: Use u64_stats_t in struct bpf_prog_stats
    
    [ Upstream commit 61a0abaee2092eee69e44fe60336aa2f5b578938 ]
    
    Commit 316580b69d0a ("u64_stats: provide u64_stats_t type")
    fixed possible load/store tearing on 64bit arches.
    
    For instance the following C code
    
    stats->nsecs += sched_clock() - start;
    
    Could be rightfully implemented like this by a compiler,
    confusing concurrent readers a lot:
    
    stats->nsecs += sched_clock();
    // arbitrary delay
    stats->nsecs -= start;
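
    One way to keep the intermediate value from ever becoming visible is
    to compute the delta in a local variable and publish it with a single
    add. The sketch below is a userspace analogue that assumes C11
    atomics; struct prog_stats_sketch and the stats_* helpers are
    hypothetical names and are not the kernel's u64_stats_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-program statistics. */
struct prog_stats_sketch {
	_Atomic uint64_t nsecs;
};

/*
 * Compute the elapsed time locally, then add it in one atomic step, so a
 * concurrent reader never sees "nsecs + now" before "start" is subtracted.
 */
void stats_add_elapsed(struct prog_stats_sketch *stats, uint64_t start,
		       uint64_t now)
{
	uint64_t delta;

	assert(now >= start);
	delta = now - start;
	atomic_fetch_add_explicit(&stats->nsecs, delta, memory_order_relaxed);
}

/* Single, untorn load of the counter. */
uint64_t stats_read(struct prog_stats_sketch *stats)
{
	return atomic_load_explicit(&stats->nsecs, memory_order_relaxed);
}
```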
    
    Signed-off-by: Eric Dumazet <[email protected]>
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

btrfs: add missing run of delayed items after unlink during log replay [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Mon Feb 28 16:29:28 2022 +0000

    btrfs: add missing run of delayed items after unlink during log replay
    
    commit 4751dc99627e4d1465c5bfa8cb7ab31ed418eff5 upstream.
    
    During log replay, whenever we need to check if a name (dentry) exists in
    a directory we do searches on the subvolume tree for inode references or
    directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY
    keys as well, before kernel 5.17). However, when we unlink a name during
    log replay, through btrfs_unlink_inode(), we may not delete inode
    references and dir index keys from a subvolume tree and instead just add
    the deletions to the delayed inode's delayed items, which will only be
    run when we commit the transaction used for log replay. This means that
    after an unlink operation during log replay, if we attempt to search for
    the same name during log replay, we will not see that the name was already
    deleted, since the deletion is recorded only on the delayed items.
    
    We run delayed items after every unlink operation during log replay,
    except at unlink_old_inode_refs() and at add_inode_ref(). This was due
    to an oversight, as delayed items should be run after every unlink, for
    the reasons stated above.
    
    So fix those two cases.
    
    Fixes: 0d836392cadd5 ("Btrfs: fix mount failure after fsync due to hard link recreation")
    Fixes: 1f250e929a9c9 ("Btrfs: fix log replay failure after unlink and link combination")
    CC: [email protected] # 4.19+
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: do not start relocation until in progress drops are done [+ + +]

Author: Josef Bacik <[email protected]>
Date:   Fri Feb 18 14:56:10 2022 -0500

    btrfs: do not start relocation until in progress drops are done
    
    commit b4be6aefa73c9a6899ef3ba9c5faaa8a66e333ef upstream.
    
    We hit a bug with a recovering relocation on mount for one of our file
    systems in production.  I reproduced this locally by injecting errors
    into snapshot delete with balance running at the same time.  This
    presented as an error while looking up an extent item
    
      WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
      CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
      RIP: 0010:lookup_inline_extent_backref+0x647/0x680
      RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
      RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
      R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
      R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
      FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
      Call Trace:
       <TASK>
       insert_inline_extent_backref+0x46/0xd0
       __btrfs_inc_extent_ref.isra.0+0x5f/0x200
       ? btrfs_merge_delayed_refs+0x164/0x190
       __btrfs_run_delayed_refs+0x561/0xfa0
       ? btrfs_search_slot+0x7b4/0xb30
       ? btrfs_update_root+0x1a9/0x2c0
       btrfs_run_delayed_refs+0x73/0x1f0
       ? btrfs_update_root+0x1a9/0x2c0
       btrfs_commit_transaction+0x50/0xa50
       ? btrfs_update_reloc_root+0x122/0x220
       prepare_to_merge+0x29f/0x320
       relocate_block_group+0x2b8/0x550
       btrfs_relocate_block_group+0x1a6/0x350
       btrfs_relocate_chunk+0x27/0xe0
       btrfs_balance+0x777/0xe60
       balance_kthread+0x35/0x50
       ? btrfs_balance+0xe60/0xe60
       kthread+0x16b/0x190
       ? set_kthread_struct+0x40/0x40
       ret_from_fork+0x22/0x30
       </TASK>
    
    Normally snapshot deletion and relocation are excluded from running at
    the same time by the fs_info->cleaner_mutex.  However if we had a
    pending balance waiting to get the ->cleaner_mutex, and a snapshot
    deletion was running, and then the box crashed, we would come up in a
    state where we have a half deleted snapshot.
    
    Again, in the normal case the snapshot deletion needs to complete before
    relocation can start, but in this case relocation could very well start
    before the snapshot deletion completes, as we simply add the root to the
    dead roots list and wait for the next time the cleaner runs to clean up
    the snapshot.
    
    Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that
    had a pending drop_progress key.  If they do then we know we were in the
    middle of the drop operation and set a flag on the fs_info.  Then
    balance can wait until this flag is cleared to start up again.
    
    If there are DEAD_ROOT's that don't have a drop_progress set then we're
    safe to start balance right away as we'll be properly protected by the
    cleaner_mutex.
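
    The gating described above can be sketched as a small userspace
    model. All names here (struct toy_fs_info, struct toy_dead_root, the
    toy_* helpers) are hypothetical, and the pending-drop counter is an
    assumption made for the sketch; the text only says that a flag is set
    on the fs_info and that balance waits for it to be cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the filesystem-wide state. */
struct toy_fs_info {
	bool unfinished_drops;	/* the "bit on the fs_info" */
	int pending_drops;	/* assumption: number of half-deleted snapshots */
};

/* Hypothetical model of a dead root found at mount time. */
struct toy_dead_root {
	bool has_drop_progress;	/* a drop was interrupted part way through */
};

/* Mount-time scan: only roots with a drop in progress set the flag. */
void toy_note_dead_root(struct toy_fs_info *fs, const struct toy_dead_root *root)
{
	if (root->has_drop_progress) {
		fs->pending_drops++;
		fs->unfinished_drops = true;
	}
}

/* Called when one interrupted drop completes; the last one clears the flag. */
void toy_drop_finished(struct toy_fs_info *fs)
{
	assert(fs->pending_drops > 0);
	if (--fs->pending_drops == 0)
		fs->unfinished_drops = false;
}

/* Balance may only start once no interrupted drop remains. */
bool toy_balance_may_start(const struct toy_fs_info *fs)
{
	return !fs->unfinished_drops;
}
```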
    
    CC: [email protected] # 5.10+
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: do not WARN_ON() if we have PageError set [+ + +]

Author: Josef Bacik <[email protected]>
Date:   Fri Feb 18 10:17:39 2022 -0500

    btrfs: do not WARN_ON() if we have PageError set
    
    commit a50e1fcbc9b85fd4e95b89a75c0884cb032a3e06 upstream.
    
    Whenever we do any extent buffer operations we call
    assert_eb_page_uptodate() to complain loudly if we're operating on a
    non-uptodate page.  Our overnight tests caught this warning earlier this
    week
    
      WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50
      CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G        W         5.17.0-rc3+ #564
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
      Workqueue: btrfs-cache btrfs_work_helper
      RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
      RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246
      RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000
      RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0
      RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000
      R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1
      R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000
      FS:  0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0
      Call Trace:
    
       extent_buffer_test_bit+0x3f/0x70
       free_space_test_bit+0xa6/0xc0
       load_free_space_tree+0x1f6/0x470
       caching_thread+0x454/0x630
       ? rcu_read_lock_sched_held+0x12/0x60
       ? rcu_read_lock_sched_held+0x12/0x60
       ? rcu_read_lock_sched_held+0x12/0x60
       ? lock_release+0x1f0/0x2d0
       btrfs_work_helper+0xf2/0x3e0
       ? lock_release+0x1f0/0x2d0
       ? finish_task_switch.isra.0+0xf9/0x3a0
       process_one_work+0x26d/0x580
       ? process_one_work+0x580/0x580
       worker_thread+0x55/0x3b0
       ? process_one_work+0x580/0x580
       kthread+0xf0/0x120
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x1f/0x30
    
    This was partially fixed by c2e39305299f01 ("btrfs: clear extent buffer
    uptodate when we fail to write it"), however all that fix did was keep
    us from finding extent buffers after a failed writeout.  It didn't keep
    us from continuing to use a buffer that we already had found.
    
    In this case we're searching the commit root to cache the block group,
    so we can start committing the transaction and switch the commit root
    and then start writing.  After the switch we can look up an extent
    buffer that hasn't been written yet and start processing that block
    group.  Then we fail to write that block out and clear Uptodate on the
    page, and then we start spewing these errors.
    
    Normally we're protected by the tree lock to a certain degree here.  If
    we read a block we have that block read locked, and we block the writer
    from locking the block before we submit it for the write.  However this
    isn't necessarily fool proof because the read could happen before we do
    the submit_bio and after we locked and unlocked the extent buffer.
    
    Also in this particular case we have path->skip_locking set, so that
    won't save us here.  We'll simply get a block that was valid when we
    read it, but became invalid while we were using it.
    
    What we really want is to catch the case where we've "read" a block but
    it's not marked Uptodate.  On read we ClearPageError(), so if we're
    !Uptodate and !Error we know we didn't do the right thing for reading
    the page.
    
    Fix this by checking !Uptodate && !Error, this way we will not complain
    if our buffer gets invalidated while we're using it, and we'll maintain
    the spirit of the check which is to make sure we have a fully in-cache
    block while we're messing with it.
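
    The new condition can be written out as a truth table in a few lines.
    This is a sketch only: struct toy_page and toy_should_warn() are
    hypothetical names, and the two booleans stand in for the Uptodate
    and Error page flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page state; the kernel keeps these as page flags. */
struct toy_page {
	bool uptodate;
	bool error;
};

/*
 * Warn only when the page is neither up to date nor marked with an error:
 *   uptodate            -> no warning (normal case)
 *   !uptodate,  error   -> no warning (buffer invalidated by a failed write)
 *   !uptodate, !error   -> warning    (the page was never read properly)
 */
bool toy_should_warn(const struct toy_page *page)
{
	return !page->uptodate && !page->error;
}
```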
    
    CC: [email protected] # 5.4+
    Signed-off-by: Josef Bacik <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Thu Oct 28 16:03:41 2021 +0100

    btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range
    
    commit f0bfa76a11e93d0fe2c896fcb566568c5e8b5d3f upstream.
    
    When doing a direct IO write against a file range that either has
    preallocated extents in that range or has regular extents and the file
    has the NOCOW attribute set, the write fails with -ENOSPC when all of
    the following conditions are met:
    
    1) There are no data block groups with enough free space matching
       the size of the write;
    
    2) There's not enough unallocated space for allocating a new data block
       group;
    
    3) The extents in the target file range are not shared, neither through
       snapshots nor through reflinks.
    
    This is wrong because a NOCOW write can be done in such case, and in fact
    it's possible to do it using a buffered IO write, since when failing to
    allocate data space, the buffered IO path checks if a NOCOW write is
    possible.
    
    The failure in direct IO write path comes from the fact that early on,
    at btrfs_dio_iomap_begin(), we try to allocate data space for the write
    and if that fails we return the error and stop - we never check if we
    can do NOCOW. But later, at btrfs_get_blocks_direct_write(), we check
    if we can do a NOCOW write into the range, or a subset of the range, and
    then release the previously reserved data space.
    
    Fix this by doing the data reservation only if needed, when we must COW,
    at btrfs_get_blocks_direct_write() instead of doing it at
    btrfs_dio_iomap_begin(). This also simplifies the logic a bit and removes
    the inefficiency of doing unnecessary data reservations.
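
    The change in ordering can be sketched with a toy model. Nothing here
    is btrfs code: struct toy_dio_write and both toy_dio_write_*()
    functions are hypothetical, and the data space reservation is reduced
    to a single comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state that matters for one direct IO write. */
struct toy_dio_write {
	uint64_t len;			/* bytes to write */
	uint64_t free_data_space;	/* bytes that could still be reserved */
	bool can_nocow;			/* range is prealloc/NOCOW and not shared */
};

/* Old ordering: reserve first, so we fail before NOCOW is ever considered. */
int toy_dio_write_old(const struct toy_dio_write *w)
{
	if (w->free_data_space < w->len)
		return -ENOSPC;
	return 0;
}

/* New ordering: reserve data space only when the write must COW. */
int toy_dio_write_new(const struct toy_dio_write *w)
{
	if (w->can_nocow)
		return 0;
	if (w->free_data_space < w->len)
		return -ENOSPC;
	return 0;
}
```

    With no free data space and a NOCOW-capable range, the old ordering
    returns -ENOSPC while the new one succeeds.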
    
    The following example test script reproduces the problem:
    
      $ cat dio-nocow-enospc.sh
      #!/bin/bash
    
      DEV=/dev/sdj
      MNT=/mnt/sdj
    
      # Use a small fixed size (1G) filesystem so that it's quick to fill
      # it up.
      # Make sure the mixed block groups feature is not enabled because we
      # later want to not have more space available for allocating data
      # extents but still have enough metadata space free for the file writes.
      mkfs.btrfs -f -b $((1024 * 1024 * 1024)) -O ^mixed-bg $DEV
      mount $DEV $MNT
    
      # Create our test file with the NOCOW attribute set.
      touch $MNT/foobar
      chattr +C $MNT/foobar
    
      # Now fill in all unallocated space with data for our test file.
      # This will allocate a data block group that will be full and leave
      # no (or a very small amount of) unallocated space in the device, so
      # that it will not be possible to allocate a new block group later.
      echo
      echo "Creating test file with initial data..."
      xfs_io -c "pwrite -S 0xab -b 1M 0 900M" $MNT/foobar
    
      # Now try a direct IO write against file range [0, 10M[.
      # This should succeed since this is a NOCOW file and an extent for the
      # range was previously allocated.
      echo
      echo "Trying direct IO write over allocated space..."
      xfs_io -d -c "pwrite -S 0xcd -b 10M 0 10M" $MNT/foobar
    
      umount $MNT
    
    When running the test:
    
      $ ./dio-nocow-enospc.sh
      (...)
    
      Creating test file with initial data...
      wrote 943718400/943718400 bytes at offset 0
      900 MiB, 900 ops; 0:00:01.43 (625.526 MiB/sec and 625.5265 ops/sec)
    
      Trying direct IO write over allocated space...
      pwrite: No space left on device
    
    A test case for fstests will follow, testing both this direct IO write
    scenario as well as the buffered IO write scenario to make it less likely
    to get future regressions on the buffered IO case.
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Anand Jain <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: fix lost prealloc extents beyond eof after full fsync [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Thu Feb 17 12:12:02 2022 +0000

    btrfs: fix lost prealloc extents beyond eof after full fsync
    
    commit d99478874355d3a7b9d86dfb5d7590d5b1754b1f upstream.
    
    When doing a full fsync, if we have prealloc extents beyond (or at) eof,
    and the leaves that contain them were not modified in the current
    transaction, we end up not logging them. This results in losing those
    extents when we replay the log after a power failure, since the inode is
    truncated to the current value of the logged i_size.
    
    Just like for the fast fsync path, we need to always log all prealloc
    extents starting at or beyond i_size. The fast fsync case was fixed in
    commit 471d557afed155 ("Btrfs: fix loss of prealloc extents past i_size
    after fsync log replay") but it missed the full fsync path. The problem
    exists since the very early days, when the log tree was added by
    commit e02119d5a7b439 ("Btrfs: Add a write ahead tree log to optimize
    synchronous operations").
    
    Example reproducer:
    
      $ mkfs.btrfs -f /dev/sdc
      $ mount /dev/sdc /mnt
    
      # Create our test file with many file extent items, so that they span
      # several leaves of metadata, even if the node/page size is 64K. Use
      # direct IO and not fsync/O_SYNC because it's both faster and it avoids
      # clearing the full sync flag from the inode - we want the fsync below
      # to trigger the slow full sync code path.
      $ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo
    
      # Now add two preallocated extents to our file without extending the
      # file's size. One right at i_size, and another further beyond, leaving
      # a gap between the two prealloc extents.
      $ xfs_io -c "falloc -k 16M 1M" /mnt/foo
      $ xfs_io -c "falloc -k 20M 1M" /mnt/foo
    
      # Make sure everything is durably persisted and the transaction is
      # committed. This gives all created extents a generation lower
      # than the generation of the transaction used by the next write and
      # fsync.
      $ sync
    
      # Now overwrite only the first extent, which will result in modifying
      # only the first leaf of metadata for our inode. Then fsync it. This
      # fsync will use the slow code path (inode full sync bit is set) because
      # it's the first fsync since the inode was created/loaded.
      $ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo
    
      # Extent list before power failure.
      $ xfs_io -c "fiemap -v" /mnt/foo
      /mnt/foo:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..7]:          2178048..2178055     8   0x0
         1: [8..16383]:      26632..43007     16376   0x0
         2: [16384..32767]:  2156544..2172927 16384   0x0
         3: [32768..34815]:  2172928..2174975  2048 0x800
         4: [34816..40959]:  hole              6144
         5: [40960..43007]:  2174976..2177023  2048 0x801
    
      <power fail>
    
      # Mount fs again, trigger log replay.
      $ mount /dev/sdc /mnt
    
      # Extent list after power failure and log replay.
      $ xfs_io -c "fiemap -v" /mnt/foo
      /mnt/foo:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..7]:          2178048..2178055     8   0x0
         1: [8..16383]:      26632..43007     16376   0x0
         2: [16384..32767]:  2156544..2172927 16384   0x1
    
      # The prealloc extents at file offsets 16M and 20M are missing.
    
    So fix this by calling btrfs_log_prealloc_extents() when we are doing a
    full fsync, so that we always log all prealloc extents beyond eof.
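
    To read the fiemap listings in the reproducer, note that the ranges
    are consistent with 512-byte units: the 16M and 20M prealloc extents
    show up at blocks 32768 and 40960. The sketch below does that
    conversion; toy_bytes_to_sectors() and struct toy_range are
    hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512u		/* unit assumed for the fiemap listing */
#define TOY_MIB (1024u * 1024u)

/* Inclusive block range, as printed in the FILE-OFFSET column. */
struct toy_range {
	uint64_t first;
	uint64_t last;
	uint64_t total;
};

/* Convert a byte offset and length into an inclusive 512-byte block range. */
struct toy_range toy_bytes_to_sectors(uint64_t offset, uint64_t len)
{
	struct toy_range r;

	assert(len > 0);
	assert(offset % TOY_SECTOR_SIZE == 0 && len % TOY_SECTOR_SIZE == 0);
	r.first = offset / TOY_SECTOR_SIZE;
	r.total = len / TOY_SECTOR_SIZE;
	r.last = r.first + r.total - 1;
	return r;
}
```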
    
    A test case for fstests will follow soon.
    
    CC: [email protected] # 4.19+
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: fix relocation crash due to premature return from btrfs_commit_transaction() [+ + +]

Author: Omar Sandoval <[email protected]>
Date:   Thu Feb 17 15:14:43 2022 -0800

    btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
    
    commit 5fd76bf31ccfecc06e2e6b29f8c809e934085b99 upstream.
    
    We are seeing crashes similar to the following trace:
    
    [38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
    [38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54
    [38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
    [38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
    [38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206
    [38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14
    [38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360
    [38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000
    [38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800
    [38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360
    [38.987146] FS:  00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000
    [38.988662] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0
    [38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [38.992528] Call Trace:
    [38.992854]  <TASK>
    [38.993148]  btrfs_relocate_chunk+0x27/0xe0 [btrfs]
    [38.993941]  btrfs_balance+0x78e/0xea0 [btrfs]
    [38.994801]  ? vsnprintf+0x33c/0x520
    [38.995368]  ? __kmalloc_track_caller+0x351/0x440
    [38.996198]  btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs]
    [38.997084]  btrfs_ioctl+0x11b0/0x2da0 [btrfs]
    [38.997867]  ? mod_objcg_state+0xee/0x340
    [38.998552]  ? seq_release+0x24/0x30
    [38.999184]  ? proc_nr_files+0x30/0x30
    [38.999654]  ? call_rcu+0xc8/0x2f0
    [39.000228]  ? __x64_sys_ioctl+0x84/0xc0
    [39.000872]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
    [39.001973]  __x64_sys_ioctl+0x84/0xc0
    [39.002566]  do_syscall_64+0x3a/0x80
    [39.003011]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [39.003735] RIP: 0033:0x7f11c166959b
    [39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b
    [39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003
    [39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0
    [39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3
    [39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001
    [39.015040]  </TASK>
    [39.015418] ---[ end trace 0000000000000000 ]---
    [43.131559] ------------[ cut here ]------------
    [43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717!
    [43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI
    [43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G        W         5.17.0-rc4 #54
    [43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
    [43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
    [43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
    [43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
    [43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
    [43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
    [43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
    [43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
    [43.145686] FS:  00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000
    [43.146808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0
    [43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [43.150559] Call Trace:
    [43.150904]  <TASK>
    [43.151253]  btrfs_finish_extent_commit+0x88/0x290 [btrfs]
    [43.152127]  btrfs_commit_transaction+0x74f/0xaa0 [btrfs]
    [43.152932]  ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
    [43.153786]  btrfs_ioctl+0x1edc/0x2da0 [btrfs]
    [43.154475]  ? __check_object_size+0x150/0x170
    [43.155170]  ? preempt_count_add+0x49/0xa0
    [43.155753]  ? __x64_sys_ioctl+0x84/0xc0
    [43.156437]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
    [43.157456]  __x64_sys_ioctl+0x84/0xc0
    [43.157980]  do_syscall_64+0x3a/0x80
    [43.158543]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [43.159231] RIP: 0033:0x7f7657f1e59b
    [43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b
    [43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003
    [43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
    [43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    [43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800
    [43.166863]  </TASK>
    [43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common
    [43.169552] ---[ end trace 0000000000000000 ]---
    [43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
    [43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
    [43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
    [43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
    [43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
    [43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
    [43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
    [43.180071] FS:  00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000
    [43.181073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0
    [43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    
    We first hit the WARN_ON(rc->block_group->pinned > 0) in
    btrfs_relocate_block_group() and then the BUG_ON(!cache) in
    unpin_extent_range(). This tells us that we are exiting relocation and
    removing the block group with bytes still pinned for that block group.
    This is supposed to be impossible: the last thing relocate_block_group()
    does is commit the transaction to get rid of pinned extents.
    
    Commit d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when
    waiting for a transaction commit") introduced an optimization so that
    commits from fsync don't have to wait for the previous commit to unpin
    extents. This was only intended to affect fsync, but it inadvertently
    made it possible for any commit to skip waiting for the previous commit
    to unpin. This is because if a call to btrfs_commit_transaction() finds
    that another thread is already committing the transaction, it waits for
    the other thread to complete the commit and then returns. If that other
    thread was in fsync, then it completes the commit without completing the
    previous commit. This makes the following sequence of events possible:
    
    Thread 1____________________|Thread 2 (fsync)_____________________|Thread 3 (balance)___________________
    btrfs_commit_transaction(N) |                                     |
      btrfs_run_delayed_refs    |                                     |
        pin extents             |                                     |
      ...                       |                                     |
      state = UNBLOCKED         |btrfs_sync_file                      |
                                |  btrfs_start_transaction(N + 1)     |relocate_block_group
                                |                                     |  btrfs_join_transaction(N + 1)
                                |  btrfs_commit_transaction(N + 1)    |
      ...                       |  trans->state = COMMIT_START        |
                                |                                     |  btrfs_commit_transaction(N + 1)
                                |                                     |    wait_for_commit(N + 1, COMPLETED)
                                |  wait_for_commit(N, SUPER_COMMITTED)|
      state = SUPER_COMMITTED   |  ...                                |
      btrfs_finish_extent_commit|                                     |
        unpin_extent_range()    |  trans->state = COMPLETED           |
                                |                                     |    return
                                |                                     |
        ...                     |                                     |Thread 1 isn't done, so pinned > 0
                                |                                     |and we WARN
                                |                                     |
                                |                                     |btrfs_remove_block_group
        unpin_extent_range()    |                                     |
          Thread 3 removed the  |                                     |
          block group, so we BUG|                                     |
    
    There are other sequences involving SUPER_COMMITTED transactions that
    can cause a similar outcome.
    
    We could fix this by making relocation explicitly wait for unpinning,
    but there may be other cases that need it. Josef mentioned ENOSPC
    flushing and the free space cache inode as other potential victims.
    Rather than playing whack-a-mole, this fix is conservative and makes all
    commits not in fsync wait for all previous transactions, which is what
    the optimization intended.
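
    The rule the fix restores can be sketched as a choice of which state
    of the previous transaction a commit has to wait for. The enum and
    both toy_* functions are hypothetical and only model the ordering of
    the states named above; they are not the btrfs implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, ordered subset of the transaction states named in the text. */
enum toy_trans_state {
	TOY_COMMIT_START,
	TOY_UNBLOCKED,
	TOY_SUPER_COMMITTED,
	TOY_COMPLETED,
};

/* Only fsync may stop at SUPER_COMMITTED; everyone else waits for COMPLETED. */
enum toy_trans_state toy_prev_commit_wait_state(bool in_fsync)
{
	return in_fsync ? TOY_SUPER_COMMITTED : TOY_COMPLETED;
}

/* True once the previous transaction has reached the state we need. */
bool toy_prev_commit_done_waiting(enum toy_trans_state prev, bool in_fsync)
{
	return prev >= toy_prev_commit_wait_state(in_fsync);
}
```

    In this model a balance thread (in_fsync == false) keeps waiting while
    the previous transaction is only SUPER_COMMITTED, which is the point
    where its extents may still be pinned.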
    
    Fixes: d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
    CC: [email protected] # 5.15+
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Omar Sandoval <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: get rid of warning on transaction commit when using flushoncommit [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Wed Feb 2 15:26:09 2022 +0000

    btrfs: get rid of warning on transaction commit when using flushoncommit
    
    [ Upstream commit a0f0cf8341e34e5d2265bfd3a7ad68342da1e2aa ]
    
    When using the flushoncommit mount option, during almost every transaction
    commit we trigger a warning from __writeback_inodes_sb_nr():
    
      $ cat fs/fs-writeback.c:
      (...)
      static void __writeback_inodes_sb_nr(struct super_block *sb, ...
      {
            (...)
            WARN_ON(!rwsem_is_locked(&sb->s_umount));
            (...)
      }
      (...)
    
    The trace produced in dmesg looks like the following:
    
      [947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3
      [947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif
      [947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186
      [947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3
      [947.502097] Code: 24 10 4c 89 44 24 18 c6 (...)
      [947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246
      [947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000
      [947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50
      [947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000
      [947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488
      [947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460
      [947.553621] FS:  0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000
      [947.560537] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0
      [947.571072] Call Trace:
      [947.572354]  <TASK>
      [947.573266]  btrfs_commit_transaction+0x1f1/0x998
      [947.576785]  ? start_transaction+0x3ab/0x44e
      [947.579867]  ? schedule_timeout+0x8a/0xdd
      [947.582716]  transaction_kthread+0xe9/0x156
      [947.585721]  ? btrfs_cleanup_transaction.isra.0+0x407/0x407
      [947.590104]  kthread+0x131/0x139
      [947.592168]  ? set_kthread_struct+0x32/0x32
      [947.595174]  ret_from_fork+0x22/0x30
      [947.597561]  </TASK>
      [947.598553] ---[ end trace 644721052755541c ]---
    
    This is because we started using writeback_inodes_sb() to flush delalloc
    when committing a transaction (when using -o flushoncommit), in order to
    avoid deadlocks with filesystem freeze operations. This change was made
    by commit ce8ea7cc6eb313 ("btrfs: don't call btrfs_start_delalloc_roots
    in flushoncommit"). After that change we started producing that warning,
    and every now and then a user reports this since the warning happens too
    often, it spams dmesg/syslog, and a user is unsure if this reflects any
    problem that might compromise the filesystem's reliability.
    
    We can not just lock the sb->s_umount semaphore before calling
    writeback_inodes_sb(), because that would at least deadlock with
    filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem()
    is called while we are holding that semaphore in write mode, and that can
    trigger a transaction commit, resulting in a deadlock. It would also
    trigger the same type of deadlock in the unmount path. Possibly, it could
    also introduce some other locking dependencies that lockdep would report.
    
    To fix this, call try_to_writeback_inodes_sb() instead of
    writeback_inodes_sb(), because that will try to read lock sb->s_umount
    and then will only call writeback_inodes_sb() if it was able to lock it.
    This is fine because the cases where it can't read lock sb->s_umount
    are during a filesystem unmount or during a filesystem freeze - in those
    cases sb->s_umount is write locked and sync_filesystem() is called, which
    calls writeback_inodes_sb(). In other words, in all cases where we can't
    take a read lock on sb->s_umount, writeback is already being triggered
    elsewhere.
    
    An alternative would be to call btrfs_start_delalloc_roots() with a
    number of pages different from LONG_MAX, for example matching the number
    of delalloc bytes we currently have, in which case we would end up
    starting all delalloc with filemap_fdatawrite_wbc() and not with an
    async flush via filemap_flush() - that is only possible after the rather
    recent commit e076ab2a2ca70a ("btrfs: shrink delalloc pages instead of
    full inodes"). However, that creates a whole new can of worms due to new
    lock dependencies, which lockdep complains about, for example:
    
    [ 8948.247280] ======================================================
    [ 8948.247823] WARNING: possible circular locking dependency detected
    [ 8948.248353] 5.17.0-rc1-btrfs-next-111 #1 Not tainted
    [ 8948.248786] ------------------------------------------------------
    [ 8948.249320] kworker/u16:18/933570 is trying to acquire lock:
    [ 8948.249812] ffff9b3de1591690 (sb_internal#2){.+.+}-{0:0}, at: find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.250638]
                   but task is already holding lock:
    [ 8948.251140] ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
    [ 8948.252018]
                   which lock already depends on the new lock.
    
    [ 8948.252710]
                   the existing dependency chain (in reverse order) is:
    [ 8948.253343]
                   -> #2 (&root->delalloc_mutex){+.+.}-{3:3}:
    [ 8948.253950]        __mutex_lock+0x90/0x900
    [ 8948.254354]        start_delalloc_inodes+0x78/0x400 [btrfs]
    [ 8948.254859]        btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
    [ 8948.255408]        btrfs_commit_transaction+0x32f/0xc00 [btrfs]
    [ 8948.255942]        btrfs_mksubvol+0x380/0x570 [btrfs]
    [ 8948.256406]        btrfs_mksnapshot+0x81/0xb0 [btrfs]
    [ 8948.256870]        __btrfs_ioctl_snap_create+0x17f/0x190 [btrfs]
    [ 8948.257413]        btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
    [ 8948.257961]        btrfs_ioctl+0x1196/0x3630 [btrfs]
    [ 8948.258418]        __x64_sys_ioctl+0x83/0xb0
    [ 8948.258793]        do_syscall_64+0x3b/0xc0
    [ 8948.259146]        entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 8948.259709]
                   -> #1 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}:
    [ 8948.260330]        __mutex_lock+0x90/0x900
    [ 8948.260692]        btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
    [ 8948.261234]        btrfs_commit_transaction+0x32f/0xc00 [btrfs]
    [ 8948.261766]        btrfs_set_free_space_cache_v1_active+0x38/0x60 [btrfs]
    [ 8948.262379]        btrfs_start_pre_rw_mount+0x119/0x180 [btrfs]
    [ 8948.262909]        open_ctree+0x1511/0x171e [btrfs]
    [ 8948.263359]        btrfs_mount_root.cold+0x12/0xde [btrfs]
    [ 8948.263863]        legacy_get_tree+0x30/0x50
    [ 8948.264242]        vfs_get_tree+0x28/0xc0
    [ 8948.264594]        vfs_kern_mount.part.0+0x71/0xb0
    [ 8948.265017]        btrfs_mount+0x11d/0x3a0 [btrfs]
    [ 8948.265462]        legacy_get_tree+0x30/0x50
    [ 8948.265851]        vfs_get_tree+0x28/0xc0
    [ 8948.266203]        path_mount+0x2d4/0xbe0
    [ 8948.266554]        __x64_sys_mount+0x103/0x140
    [ 8948.266940]        do_syscall_64+0x3b/0xc0
    [ 8948.267300]        entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 8948.267790]
                   -> #0 (sb_internal#2){.+.+}-{0:0}:
    [ 8948.268322]        __lock_acquire+0x12e8/0x2260
    [ 8948.268733]        lock_acquire+0xd7/0x310
    [ 8948.269092]        start_transaction+0x44c/0x6e0 [btrfs]
    [ 8948.269591]        find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.270087]        btrfs_reserve_extent+0x14b/0x280 [btrfs]
    [ 8948.270588]        cow_file_range+0x17e/0x490 [btrfs]
    [ 8948.271051]        btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
    [ 8948.271586]        writepage_delalloc+0xb5/0x170 [btrfs]
    [ 8948.272071]        __extent_writepage+0x156/0x3c0 [btrfs]
    [ 8948.272579]        extent_write_cache_pages+0x263/0x460 [btrfs]
    [ 8948.273113]        extent_writepages+0x76/0x130 [btrfs]
    [ 8948.273573]        do_writepages+0xd2/0x1c0
    [ 8948.273942]        filemap_fdatawrite_wbc+0x68/0x90
    [ 8948.274371]        start_delalloc_inodes+0x17f/0x400 [btrfs]
    [ 8948.274876]        btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
    [ 8948.275417]        flush_space+0x1f2/0x630 [btrfs]
    [ 8948.275863]        btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
    [ 8948.276438]        process_one_work+0x252/0x5a0
    [ 8948.276829]        worker_thread+0x55/0x3b0
    [ 8948.277189]        kthread+0xf2/0x120
    [ 8948.277506]        ret_from_fork+0x22/0x30
    [ 8948.277868]
                   other info that might help us debug this:
    
    [ 8948.278548] Chain exists of:
                     sb_internal#2 --> &fs_info->delalloc_root_mutex --> &root->delalloc_mutex
    
    [ 8948.279601]  Possible unsafe locking scenario:
    
    [ 8948.280102]        CPU0                    CPU1
    [ 8948.280508]        ----                    ----
    [ 8948.280915]   lock(&root->delalloc_mutex);
    [ 8948.281271]                                lock(&fs_info->delalloc_root_mutex);
    [ 8948.281915]                                lock(&root->delalloc_mutex);
    [ 8948.282487]   lock(sb_internal#2);
    [ 8948.282800]
                    *** DEADLOCK ***
    
    [ 8948.283333] 4 locks held by kworker/u16:18/933570:
    [ 8948.283750]  #0: ffff9b3dc00a9d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
    [ 8948.284609]  #1: ffffa90349dafe70 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
    [ 8948.285637]  #2: ffff9b3e14db5040 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}, at: btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
    [ 8948.286674]  #3: ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
    [ 8948.287596]
                  stack backtrace:
    [ 8948.287975] CPU: 3 PID: 933570 Comm: kworker/u16:18 Not tainted 5.17.0-rc1-btrfs-next-111 #1
    [ 8948.288677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    [ 8948.289649] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
    [ 8948.290298] Call Trace:
    [ 8948.290517]  <TASK>
    [ 8948.290700]  dump_stack_lvl+0x59/0x73
    [ 8948.291026]  check_noncircular+0xf3/0x110
    [ 8948.291375]  ? start_transaction+0x228/0x6e0 [btrfs]
    [ 8948.291826]  __lock_acquire+0x12e8/0x2260
    [ 8948.292241]  lock_acquire+0xd7/0x310
    [ 8948.292714]  ? find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.293241]  ? lock_is_held_type+0xea/0x140
    [ 8948.293601]  start_transaction+0x44c/0x6e0 [btrfs]
    [ 8948.294055]  ? find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.294518]  find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.294957]  ? _raw_spin_unlock+0x29/0x40
    [ 8948.295312]  ? btrfs_get_alloc_profile+0x124/0x290 [btrfs]
    [ 8948.295813]  btrfs_reserve_extent+0x14b/0x280 [btrfs]
    [ 8948.296270]  cow_file_range+0x17e/0x490 [btrfs]
    [ 8948.296691]  btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
    [ 8948.297175]  ? find_lock_delalloc_range+0x247/0x270 [btrfs]
    [ 8948.297678]  writepage_delalloc+0xb5/0x170 [btrfs]
    [ 8948.298123]  __extent_writepage+0x156/0x3c0 [btrfs]
    [ 8948.298570]  extent_write_cache_pages+0x263/0x460 [btrfs]
    [ 8948.299061]  extent_writepages+0x76/0x130 [btrfs]
    [ 8948.299495]  do_writepages+0xd2/0x1c0
    [ 8948.299817]  ? sched_clock_cpu+0xd/0x110
    [ 8948.300160]  ? lock_release+0x155/0x4a0
    [ 8948.300494]  filemap_fdatawrite_wbc+0x68/0x90
    [ 8948.300874]  ? do_raw_spin_unlock+0x4b/0xa0
    [ 8948.301243]  start_delalloc_inodes+0x17f/0x400 [btrfs]
    [ 8948.301706]  ? lock_release+0x155/0x4a0
    [ 8948.302055]  btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
    [ 8948.302564]  flush_space+0x1f2/0x630 [btrfs]
    [ 8948.302970]  btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
    [ 8948.303510]  process_one_work+0x252/0x5a0
    [ 8948.303860]  ? process_one_work+0x5a0/0x5a0
    [ 8948.304221]  worker_thread+0x55/0x3b0
    [ 8948.304543]  ? process_one_work+0x5a0/0x5a0
    [ 8948.304904]  kthread+0xf2/0x120
    [ 8948.305184]  ? kthread_complete_and_exit+0x20/0x20
    [ 8948.305598]  ret_from_fork+0x22/0x30
    [ 8948.305921]  </TASK>
    
    It all comes from the fact that btrfs_start_delalloc_roots() takes the
    delalloc_root_mutex, in the transaction commit path we are holding a
    read lock on one of the superblock's freeze semaphores (via
    sb_start_intwrite()), the async reclaim task can also do a call to
    btrfs_start_delalloc_roots(), which ends up triggering writeback with
    calls to filemap_fdatawrite_wbc(), resulting in extent allocation which
    in turn can call btrfs_start_transaction(), which will result in taking
    the freeze semaphore via sb_start_intwrite(), forming a nasty dependency
    on all those locks which can be taken in different orders by different
    code paths.
    
    So just adopt the simple approach of calling try_to_writeback_inodes_sb()
    at btrfs_start_delalloc_flush().
    
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Reviewed-by: Omar Sandoval <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    [ add more link reports ]
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
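
    The try-lock approach described in the commit above can be sketched in
    user space as follows. This is only an illustrative model, not the
    kernel code: struct demo_sb and every demo_* function are hypothetical
    names, and a C11 atomic integer stands in for the sb->s_umount
    semaphore. The point is the control flow: writeback is started only when
    the read lock is obtained without blocking, and skipped otherwise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a superblock; not the kernel's struct super_block. */
struct demo_sb {
    atomic_int umount_state;    /* -1: write locked, >= 0: number of readers */
    atomic_int writeback_calls; /* how many times writeback was started */
};

void demo_sb_init(struct demo_sb *sb)
{
    atomic_init(&sb->umount_state, 0);
    atomic_init(&sb->writeback_calls, 0);
}

/* Take a read lock only if no writer holds the lock; never blocks. */
bool demo_read_trylock(struct demo_sb *sb)
{
    int cur = atomic_load(&sb->umount_state);

    while (cur >= 0) {
        /* On failure, cur is reloaded and the loop condition re-checks it. */
        if (atomic_compare_exchange_weak(&sb->umount_state, &cur, cur + 1))
            return true;
    }
    return false;
}

void demo_read_unlock(struct demo_sb *sb)
{
    int prev = atomic_fetch_sub(&sb->umount_state, 1);

    assert(prev > 0); /* must have been read locked */
}

/* Start writeback only when the read lock was obtained; otherwise skip. */
bool demo_try_to_writeback(struct demo_sb *sb)
{
    if (!demo_read_trylock(sb))
        return false; /* the write-lock holder is expected to flush itself */
    atomic_fetch_add(&sb->writeback_calls, 1);
    demo_read_unlock(sb);
    return true;
}
```

    Skipping is acceptable in this model for the same reason the commit
    gives: whoever holds the lock for writing (freeze or unmount) is
    assumed to trigger writeback on its own.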

btrfs: qgroup: fix deadlock between rescan worker and remove qgroup [+ + +]

Author: Sidong Yang <[email protected]>
Date:   Mon Feb 28 01:43:40 2022 +0000

    btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
    
    commit d4aef1e122d8bbdc15ce3bd0bc813d6b44a7d63a upstream.
    
    Commit e804861bd4e6 ("btrfs: fix deadlock between quota disable and
    qgroup rescan worker") by Kawasaki resolves a deadlock between quota
    disable and the qgroup rescan worker. A similar deadlock remains,
    however, when quota is enabled or disabled while a qgroup is being
    created or removed. It can be reproduced with the simple script below.
    
    for i in {1..100}
    do
        btrfs quota enable /mnt &
        btrfs qgroup create 1/0 /mnt &
        btrfs qgroup destroy 1/0 /mnt &
        btrfs quota disable /mnt &
    done
    
    Here's why the deadlock happens:
    
    1) The quota rescan task is running.
    
    2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock
       mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for
       the quota rescan task to complete.
    
    3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock
       the qgroup_ioctl_lock mutex, because it's being held by task A. At that
       point task B is holding a transaction handle for the current transaction.
    
    4) The quota rescan task calls btrfs_commit_transaction(). This results
       in it waiting for all other tasks to release their handles on the
       transaction, but task B is blocked on the qgroup_ioctl_lock mutex
       while holding a handle on the transaction, and that mutex is being held
       by task A, which is waiting for the quota rescan task to complete,
       resulting in a deadlock between these 3 tasks.
    
    To resolve this issue, the thread disabling quota should unlock
    qgroup_ioctl_lock before waiting for the rescan to complete. Move
    btrfs_qgroup_wait_for_completion() after the unlock of qgroup_ioctl_lock.
    
    Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
    CC: [email protected] # 5.4+
    Reviewed-by: Filipe Manana <[email protected]>
    Reviewed-by: Shin'ichiro Kawasaki <[email protected]>
    Signed-off-by: Sidong Yang <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
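
    The ordering change in the commit above, releasing the mutex before
    waiting for the rescan worker, can be modelled with a small
    single-threaded sketch. All names here (struct demo_fs, demo_lock(),
    demo_wait_for_rescan(), and so on) are hypothetical and only record
    whether the wait happened while the lock was held; the real btrfs code
    is more involved.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_fs {
    bool ioctl_lock_held;     /* models qgroup_ioctl_lock */
    bool quota_enabled;
    bool rescan_running;
    bool waited_while_locked; /* set if the wait happened under the lock */
};

void demo_lock(struct demo_fs *fs)
{
    assert(!fs->ioctl_lock_held);
    fs->ioctl_lock_held = true;
}

void demo_unlock(struct demo_fs *fs)
{
    assert(fs->ioctl_lock_held);
    fs->ioctl_lock_held = false;
}

/* Models waiting for the rescan worker; records whether the lock was held. */
void demo_wait_for_rescan(struct demo_fs *fs)
{
    if (fs->ioctl_lock_held)
        fs->waited_while_locked = true;
    fs->rescan_running = false;
}

void demo_quota_disable(struct demo_fs *fs)
{
    demo_lock(fs);
    fs->quota_enabled = false;
    demo_unlock(fs);          /* drop the lock first ... */
    demo_wait_for_rescan(fs); /* ... then wait, so other lock users can run */
}
```

    Because the wait no longer happens under the lock, a task like task B
    in the description can acquire the mutex, finish, and release its
    transaction handle, which lets the rescan worker's commit proceed.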

can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8 [+ + +]

Author: Vincent Mailhol <[email protected]>
Date:   Sat Feb 12 20:27:13 2022 +0900

    can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8
    
    [ Upstream commit f4896248e9025ff744b4147e6758274a1cb8cbae ]
    
    The driver uses an atomic_t variable: struct
    es58x_device::opened_channel_cnt to keep track of the number of opened
    channels in order to only allocate memory for the URBs when this count
    changes from zero to one.
    
    While the intent was to prevent race conditions, the choice of an
    atomic_t turns out to be a bad idea for several reasons:
    
    - implementation is incorrect and fails to decrement
      opened_channel_cnt when the URB allocation fails as reported in
      [1].
    
    - even if opened_channel_cnt were to be correctly decremented,
      atomic_t is insufficient to cover edge cases: there can be a race
      condition in which 1/ a first process fails to allocate URBs
      memory 2/ a second process enters es58x_open() before the first
      process does its cleanup and decrements opened_channel_cnt. In
      that case, the second process would successfully return despite
      the URBs memory not being allocated.
    
    - actually, any kind of locking mechanism is unnecessary here because
      it is redundant with the network stack big kernel lock
      (a.k.a. rtnl_lock) which is held by all the callers of
      net_device_ops:ndo_open() and net_device_ops:ndo_close(). cf. the
      ASSERT_RTNL() calls in __dev_open() [2] and __dev_close_many()
      [3].
    
    The atomic_t is thus replaced by a simple u8 type and the logic to
    increment and decrement es58x_device:opened_channel_cnt is simplified
    accordingly, fixing the bug reported in [1]. We do not check again with
    ASSERT_RTNL() as this is already done by the callers.
    
    [1] https://lore.kernel.org/linux-can/20220201140351.GA2548@kili/T/#u
    [2] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1463
    [3] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1541
    
    Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
    Link: https://lore.kernel.org/all/[email protected]
    Reported-by: Dan Carpenter <[email protected]>
    Signed-off-by: Vincent Mailhol <[email protected]>
    Signed-off-by: Marc Kleine-Budde <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
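
    The simplified counter logic described above can be sketched as
    follows. This is a minimal user-space model under stated assumptions,
    not the driver code: struct demo_dev, demo_open(), demo_close() and the
    fail_alloc test hook are hypothetical, and the caller is assumed to
    serialize open and close (the role rtnl_lock plays in the driver). The
    counter is only incremented after the allocation succeeded, so a failed
    open leaves nothing to roll back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_dev {
    uint8_t opened_channel_cnt; /* plain counter; callers serialize access */
    bool urbs_allocated;
    bool fail_alloc;            /* test hook to force an allocation failure */
};

int demo_alloc_urbs(struct demo_dev *dev)
{
    if (dev->fail_alloc)
        return -ENOMEM;
    dev->urbs_allocated = true;
    return 0;
}

void demo_free_urbs(struct demo_dev *dev)
{
    dev->urbs_allocated = false;
}

int demo_open(struct demo_dev *dev)
{
    if (dev->opened_channel_cnt == 0) {
        int ret = demo_alloc_urbs(dev);

        if (ret)
            return ret; /* counter was never incremented, nothing to undo */
    }
    dev->opened_channel_cnt++;
    return 0;
}

void demo_close(struct demo_dev *dev)
{
    assert(dev->opened_channel_cnt > 0);
    dev->opened_channel_cnt--;
    if (dev->opened_channel_cnt == 0)
        demo_free_urbs(dev); /* last channel closed */
}
```

    The same shape applies to any "allocate on first open, free on last
    close" resource whose callers already hold a common lock.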

can: gs_usb: change active_channels's type from atomic_t to u8 [+ + +]

Author: Vincent Mailhol <[email protected]>
Date:   Tue Feb 15 08:48:14 2022 +0900

    can: gs_usb: change active_channels's type from atomic_t to u8
    
    commit 035b0fcf02707d3c9c2890dc1484b11aa5335eb1 upstream.
    
    The driver uses an atomic_t variable: gs_usb:active_channels to keep
    track of the number of opened channels in order to only allocate
    memory for the URBs when this count changes from zero to one.
    
    However, the driver does not decrement the counter when an error
    occurs in gs_can_open(). This issue is fixed by changing the type from
    atomic_t to u8 and by simplifying the logic accordingly.
    
    It is safe to use a u8 here because the network stack big kernel lock
    (a.k.a. rtnl_mutex) is held. For details, please refer to [1].
    
    [1] https://lore.kernel.org/linux-can/CAMZ6Rq+sHpiw34ijPsmp7vbUpDtJwvVtdV7CvRZJsLixjAFfrg@mail.gmail.com/T/#t
    
    Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices")
    Link: https://lore.kernel.org/all/[email protected]
    Signed-off-by: Vincent Mailhol <[email protected]>
    Signed-off-by: Marc Kleine-Budde <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

cifs: do not use uninitialized data in the owner/group sid [+ + +]

Author: Ronnie Sahlberg <[email protected]>
Date:   Sat Feb 12 08:16:20 2022 +1000

    cifs: do not use uninitialized data in the owner/group sid
    
    [ Upstream commit 26d3dadebbcbddfaf1d9caad42527a28a0ed28d8 ]
    
    When idsfromsid is used we create a special SID for owner/group.
    This structure must be initialized or else the first 5 bytes
    of the Authority field of the SID will contain uninitialized data
    and thus not be a valid SID.
    
    Signed-off-by: Ronnie Sahlberg <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
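
    The initialization issue described above can be illustrated with a
    short sketch. The struct layout, the 6-byte authority field, and the
    concrete SID values used here are assumptions made for illustration
    (struct demo_sid and demo_build_owner_sid() are hypothetical names, not
    the cifs definitions). Zeroing the whole structure first guarantees that
    the authority bytes which are never written explicitly are zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_AUTHS 6      /* assumed size of the authority field */
#define DEMO_MAX_SUB_AUTHS 15 /* assumed maximum number of sub-authorities */

struct demo_sid {
    uint8_t revision;
    uint8_t num_subauth;
    uint8_t authority[DEMO_NUM_AUTHS];
    uint32_t sub_auth[DEMO_MAX_SUB_AUTHS];
};

void demo_build_owner_sid(struct demo_sid *sid, uint32_t id)
{
    assert(sid != NULL);
    /* Clear everything first so no field keeps stale stack or heap data. */
    memset(sid, 0, sizeof(*sid));
    sid->revision = 1;
    sid->num_subauth = 3;
    /* Only the last authority byte is set; the first five rely on memset. */
    sid->authority[DEMO_NUM_AUTHS - 1] = 5;
    sid->sub_auth[0] = 88; /* illustrative values */
    sid->sub_auth[1] = 1;
    sid->sub_auth[2] = id;
}
```

    Without the memset() call, the first five authority bytes would hold
    whatever the memory contained before, which matches the symptom the
    commit describes.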

cifs: fix confusing unneeded warning message on smb2.1 and earlier [+ + +]

Author: Steve French <[email protected]>
Date:   Wed Feb 16 13:23:53 2022 -0600

    cifs: fix confusing unneeded warning message on smb2.1 and earlier
    
    [ Upstream commit 53923e0fe2098f90f339510aeaa0e1413ae99a16 ]
    
    When mounting with SMB2.1 or earlier, even with nomultichannel, we
    log the confusing warning message:
      "CIFS: VFS: multichannel is not supported on this protocol version, use 3.0 or above"
    
    Fix this so that we don't log this unless they really are trying
    to mount with multichannel.
    
    BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215608
    Reported-by: Kim Scarborough <[email protected]>
    Cc: [email protected] # 5.11+
    Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

cifs: fix double free race when mount fails in cifs_get_root() [+ + +]

Author: Ronnie Sahlberg <[email protected]>
Date:   Fri Feb 11 02:59:15 2022 +1000

    cifs: fix double free race when mount fails in cifs_get_root()
    
    [ Upstream commit 3d6cc9898efdfb062efb74dc18cfc700e082f5d5 ]
    
    When cifs_get_root() fails during cifs_smb3_do_mount() we call
    deactivate_locked_super() which eventually will call delayed_free() which
    will free the context.
    In this situation we should not proceed to enter the out: section in
    cifs_smb3_do_mount() and free the same resources a second time.
    
    [Thu Feb 10 12:59:06 2022] BUG: KASAN: use-after-free in rcu_cblist_dequeue+0x32/0x60
    [Thu Feb 10 12:59:06 2022] Read of size 8 at addr ffff888364f4d110 by task swapper/1/0
    
    [Thu Feb 10 12:59:06 2022] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE     5.17.0-rc3+ #4
    [Thu Feb 10 12:59:06 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
    [Thu Feb 10 12:59:06 2022] Call Trace:
    [Thu Feb 10 12:59:06 2022]  <IRQ>
    [Thu Feb 10 12:59:06 2022]  dump_stack_lvl+0x5d/0x78
    [Thu Feb 10 12:59:06 2022]  print_address_description.constprop.0+0x24/0x150
    [Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
    [Thu Feb 10 12:59:06 2022]  kasan_report.cold+0x7d/0x117
    [Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
    [Thu Feb 10 12:59:06 2022]  __asan_load8+0x86/0xa0
    [Thu Feb 10 12:59:06 2022]  rcu_cblist_dequeue+0x32/0x60
    [Thu Feb 10 12:59:06 2022]  rcu_core+0x547/0xca0
    [Thu Feb 10 12:59:06 2022]  ? call_rcu+0x3c0/0x3c0
    [Thu Feb 10 12:59:06 2022]  ? __this_cpu_preempt_check+0x13/0x20
    [Thu Feb 10 12:59:06 2022]  ? lock_is_held_type+0xea/0x140
    [Thu Feb 10 12:59:06 2022]  rcu_core_si+0xe/0x10
    [Thu Feb 10 12:59:06 2022]  __do_softirq+0x1d4/0x67b
    [Thu Feb 10 12:59:06 2022]  __irq_exit_rcu+0x100/0x150
    [Thu Feb 10 12:59:06 2022]  irq_exit_rcu+0xe/0x30
    [Thu Feb 10 12:59:06 2022]  sysvec_hyperv_stimer0+0x9d/0xc0
    ...
    [Thu Feb 10 12:59:07 2022] Freed by task 58179:
    [Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
    [Thu Feb 10 12:59:07 2022]  kasan_set_track+0x25/0x30
    [Thu Feb 10 12:59:07 2022]  kasan_set_free_info+0x24/0x40
    [Thu Feb 10 12:59:07 2022]  ____kasan_slab_free+0x137/0x170
    [Thu Feb 10 12:59:07 2022]  __kasan_slab_free+0x12/0x20
    [Thu Feb 10 12:59:07 2022]  slab_free_freelist_hook+0xb3/0x1d0
    [Thu Feb 10 12:59:07 2022]  kfree+0xcd/0x520
    [Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0x149/0xbe0 [cifs]
    [Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
    [Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
    [Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
    [Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
    [Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
    [Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    [Thu Feb 10 12:59:07 2022] Last potentially related work creation:
    [Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
    [Thu Feb 10 12:59:07 2022]  __kasan_record_aux_stack+0xb6/0xc0
    [Thu Feb 10 12:59:07 2022]  kasan_record_aux_stack_noalloc+0xb/0x10
    [Thu Feb 10 12:59:07 2022]  call_rcu+0x76/0x3c0
    [Thu Feb 10 12:59:07 2022]  cifs_umount+0xce/0xe0 [cifs]
    [Thu Feb 10 12:59:07 2022]  cifs_kill_sb+0xc8/0xe0 [cifs]
    [Thu Feb 10 12:59:07 2022]  deactivate_locked_super+0x5d/0xd0
    [Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0xab9/0xbe0 [cifs]
    [Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
    [Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
    [Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
    [Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
    [Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
    [Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Reported-by: Shyam Prasad N <[email protected]>
    Reviewed-by: Shyam Prasad N <[email protected]>
    Signed-off-by: Ronnie Sahlberg <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
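
    The error-path rule described above (once the superblock owns the
    context, the shared cleanup label must not free it again) can be
    sketched like this. Everything here is a hypothetical user-space model:
    struct demo_ctx, struct demo_sb, demo_do_mount() and the failure flags
    are illustrative names, and demo_free_calls only exists so the number
    of frees can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_ctx { int dummy; };
struct demo_sb { struct demo_ctx *ctx; };

int demo_free_calls; /* counts frees so a double free would be visible */

void demo_ctx_free(struct demo_ctx *ctx)
{
    demo_free_calls++;
    free(ctx);
}

/* Tearing down the superblock also frees the context it owns. */
void demo_deactivate_super(struct demo_sb *sb)
{
    assert(sb->ctx != NULL);
    demo_ctx_free(sb->ctx);
    sb->ctx = NULL;
}

int demo_do_mount(bool setup_fails, bool get_root_fails)
{
    struct demo_sb sb = { .ctx = NULL };
    struct demo_ctx *ctx = calloc(1, sizeof(*ctx));
    int ret;

    if (!ctx)
        return -ENOMEM;
    if (setup_fails) {
        ret = -EINVAL;
        goto out; /* ctx is still owned by this function */
    }
    sb.ctx = ctx; /* ownership moves to the superblock */
    if (get_root_fails) {
        ret = -EIO;
        goto out_super;
    }
    demo_deactivate_super(&sb); /* demo only: tear down so nothing leaks */
    return 0;

out_super:
    demo_deactivate_super(&sb); /* frees ctx exactly once */
    return ret;                 /* return here; do not fall through to out */
out:
    demo_ctx_free(ctx);
    return ret;
}
```

    If out_super fell through into out, the context would be freed twice
    on the get_root failure path, which is the kind of bug the KASAN report
    above points at.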

cifs: modefromsids must add an ACE for authenticated users [+ + +]

Author: Ronnie Sahlberg <[email protected]>
Date:   Mon Feb 14 08:40:52 2022 +1000

    cifs: modefromsids must add an ACE for authenticated users
    
    [ Upstream commit 0c6f4ebf8835d01866eb686d47578cde80097981 ]
    
    When we create a file with modefromsids we set an ACL that
    has one ACE for the magic modefromsid as well as a second ACE that
    grants full access to all authenticated users.
    
    When we later change the mode on the file, we strip away this, and any
    other, ACE for authenticated users in set_chmod_dacl() and then just add
    back/update the modefromsid ACE.
    This leaves the file with a single ACE that is for the mode and no ACE
    to grant any user any rights to access the file.
    Fix this by always adding back the ACE for authenticated users as well
    as the modefromsid ACE, so that we do not drop the rights to access the
    file.
    
    Signed-off-by: Ronnie Sahlberg <[email protected]>
    Reviewed-by: Shyam Prasad N <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

cifs: protect session channel fields with chan_lock [+ + +]

Author: Shyam Prasad N <[email protected]>
Date:   Mon Jul 19 10:54:46 2021 +0000

    cifs: protect session channel fields with chan_lock
    
    [ Upstream commit 724244cdb3828522109c88e56a0242537aefabe9 ]
    
    Introduce a new spin lock to protect all the channel related
    fields in a cifs_ses struct. This lock should be taken
    whenever dealing with the channel fields, and should be held
    only for very short intervals, during which the holder must not sleep.
    
    Currently, all channel related fields in cifs_ses structure
    are protected by session_mutex. However, this mutex is held for
    long periods (sometimes while waiting for a reply from server).
    This makes the codepath quite tricky to change.
    
    Signed-off-by: Shyam Prasad N <[email protected]>
    Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

dma-buf: cma_heap: Fix mutex locking section [+ + +]

Author: Weizhao Ouyang <[email protected]>
Date:   Tue Jan 4 15:35:45 2022 +0800

    dma-buf: cma_heap: Fix mutex locking section
    
    [ Upstream commit 54329e6f7beea6af56c1230da293acc97d6a6ee7 ]
    
    Fix cma_heap_buffer mutex locking critical section to protect vmap_cnt
    and vaddr.
    
    Fixes: a5d2d29e24be ("dma-buf: heaps: Move heap-helper logic into the cma_heap implementation")
    Signed-off-by: Weizhao Ouyang <[email protected]>
    Acked-by: John Stultz <[email protected]>
    Signed-off-by: Sumit Semwal <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

dmaengine: shdma: Fix runtime PM imbalance on error [+ + +]

Author: Yongzhi Liu <[email protected]>
Date:   Sat Jan 15 21:34:56 2022 -0800

    dmaengine: shdma: Fix runtime PM imbalance on error
    
    [ Upstream commit 455896c53d5b803733ddd84e1bf8a430644439b6 ]
    
    pm_runtime_get_() increments the runtime PM usage counter even
    when it returns an error code, thus a matching decrement is needed on
    the error handling path to keep the counter balanced.
    
    Signed-off-by: Yongzhi Liu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Vinod Koul <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
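
    The counter-balancing rule described above can be shown with a mock.
    The functions below are hypothetical stand-ins (demo_pm_get(),
    demo_pm_put(), demo_start_transfer()), not the kernel runtime PM API;
    they only model a "get" that bumps the usage counter even when it
    reports failure, so the error path needs its own matching "put".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct demo_device {
    int usage_count;   /* models the runtime PM usage counter */
    bool resume_fails; /* test hook to force the get call to fail */
};

/* Increments the counter unconditionally, then reports resume status. */
int demo_pm_get(struct demo_device *dev)
{
    dev->usage_count++;
    return dev->resume_fails ? -EIO : 0;
}

void demo_pm_put(struct demo_device *dev)
{
    assert(dev->usage_count > 0);
    dev->usage_count--;
}

int demo_start_transfer(struct demo_device *dev)
{
    int ret = demo_pm_get(dev);

    if (ret < 0) {
        demo_pm_put(dev); /* balance the increment done by the failed get */
        return ret;
    }
    /* ... the actual work would happen here ... */
    demo_pm_put(dev);
    return 0;
}
```

    In both the success and the failure case the counter returns to its
    starting value, which is the invariant the fix restores.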

drm/amd/display: Fix stream->link_enc unassigned during stream removal [+ + +]

Author: Nicholas Kazlauskas <[email protected]>
Date:   Tue Jan 25 12:04:34 2022 -0500

    drm/amd/display: Fix stream->link_enc unassigned during stream removal
    
    [ Upstream commit 3743e7f6fcb938b7d8b7967e6a9442805e269b3d ]
    
    [Why]
    Found when running igt@kms_atomic.
    
    Userspace attempts to do a TEST_COMMIT with 0 streams, which calls
    dc_remove_stream_from_ctx. This in turn calls link_enc_unassign,
    which ends up setting stream->link = NULL directly, causing the
    global link_enc to be removed and preventing further link activity
    and future link validation from passing.
    
    [How]
    We take care of link_enc unassignment at the start of
    link_enc_cfg_link_encs_assign so this call is no longer necessary.
    
    This prevents global state from being modified while unlocked.
    
    Reviewed-by: Jimmy Kizito <[email protected]>
    Acked-by: Jasdeep Dhillon <[email protected]>
    Signed-off-by: Nicholas Kazlauskas <[email protected]>
    Tested-by: Daniel Wheeler <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: For vblank_disable_immediate, check PSR is really used [+ + +]

Author: Michel Dänzer <[email protected]>
Date:   Tue Feb 15 19:53:37 2022 +0100

    drm/amd/display: For vblank_disable_immediate, check PSR is really used
    
    [ Upstream commit 4d22336f903930eb94588b939c310743a3640276 ]
    
    Even if PSR is allowed for a present GPU, there might be no eDP link
    which supports PSR.
    
    Fixes: 708978487304 ("drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled")
    Reviewed-by: Harry Wentland <[email protected]>
    Signed-off-by: Michel Dänzer <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: move FPU associated DSC code to DML folder [+ + +]

Author: Qingqing Zhuo <[email protected]>
Date:   Tue Aug 31 07:52:24 2021 -0400

    drm/amd/display: move FPU associated DSC code to DML folder
    
    [ Upstream commit d738db6883df3e3c513f9e777c842262693f951b ]
    
    [Why & How]
    As part of the FPU isolation work documented in
    https://patchwork.freedesktop.org/series/93042/, isolate code that uses
    FPU in DSC to DML, where all FPU code should be located.
    
    This change does not refactor any functions; it only moves code around.
    
    Cc: Christian König <[email protected]>
    Cc: Hersen Wu <[email protected]>
    Cc: Anson Jacob <[email protected]>
    Cc: Harry Wentland <[email protected]>
    Reviewed-by: Rodrigo Siqueira <[email protected]>
    Acked-by: Agustin Gutierrez <[email protected]>
    Tested-by: Anson Jacob <[email protected]>
    Tested-by: Daniel Wheeler <[email protected]>
    Signed-off-by: Qingqing Zhuo <[email protected]>
    Acked-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: Reduce dmesg error to a debug print [+ + +]

Author: Leo (Hanghong) Ma <[email protected]>
Date:   Fri Nov 12 10:11:35 2021 -0500

    drm/amd/display: Reduce dmesg error to a debug print
    
    commit 1d925758ba1a5d2716a847903e2fd04efcbd9862 upstream.
    
    [Why & How]
    Dmesg errors are seen on dcn3.1 during the reset test, but they do not
    indicate a real failure, so reduce them to a debug print.
    
    Signed-off-by: Leo (Hanghong) Ma <[email protected]>
    Reviewed-by: Nicholas Kazlauskas <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: Mario Limonciello <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/amd/display: Update watermark values for DCN301 [+ + +]

Author: Agustin Gutierrez <[email protected]>
Date:   Fri Jan 28 17:51:53 2022 -0500

    drm/amd/display: Update watermark values for DCN301
    
    [ Upstream commit 2d8ae25d233767171942a9fba5fd8f4a620996be ]
    
    [Why]
    There is underflow / visual corruption on DCN301 for high
    bandwidth MST DSC configurations such as 2x1440p144 or 2x4k60.
    
    [How]
    Use up-to-date watermark values for DCN301.
    
    Reviewed-by: Zhan Liu <[email protected]>
    Signed-off-by: Agustin Gutierrez <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: Use adjusted DCN301 watermarks [+ + +]

Author: Nikola Cornij <[email protected]>
Date:   Tue Sep 7 22:09:01 2021 -0400

    drm/amd/display: Use adjusted DCN301 watermarks
    
    [ Upstream commit 808643ea56a2f96a42873d5e11c399957d6493aa ]
    
    [why]
    If DCN30 watermark calc is used for DCN301, the calculated values are
    wrong due to the data structure mismatch between DCN30 and DCN301.
    However, using the original DCN301 watermark values causes underflow.
    
    [how]
    - Add DCN21-style watermark calculations
    - Adjust DCN301 watermark values to remove the underflow
    
    Reviewed-by: Zhan Liu <[email protected]>
    Acked-by: Rodrigo Siqueira <[email protected]>
    Signed-off-by: Nikola Cornij <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/pm: correct UMD pstate clocks for Dimgrey Cavefish and Beige Goby [+ + +]

Author: Evan Quan <[email protected]>
Date:   Tue Jan 18 14:07:51 2022 +0800

    drm/amd/pm: correct UMD pstate clocks for Dimgrey Cavefish and Beige Goby
    
    [ Upstream commit 0136f5844b006e2286f873457c3fcba8c45a3735 ]
    
    Correct the UMD pstate profiling clocks for Dimgrey Cavefish and Beige
    Goby.
    
    Signed-off-by: Evan Quan <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled [+ + +]

Author: Nicholas Kazlauskas <[email protected]>
Date:   Tue Nov 30 09:32:33 2021 -0500

    drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled
    
    [ Upstream commit 70897848730470cc477d5d89e6222c0f6a9ac173 ]
    
    [Why]
    PSR currently relies on the kernel's delayed vblank on/off mechanism
    as an implicit buffering mechanism to prevent excessive entry/exit.
    
    Without this delay the user experience is impacted since it can take
    a few frames to enter/exit.
    
    [How]
    Only allow vblank disable immediate for DC when psr is not supported.
    
    Leave a TODO indicating that this support should be extended in the
    future to delay independent of the vblank interrupt.
    
    Fixes: 92020e81ddbeac ("drm/amdgpu/display: set vblank_disable_immediate for DC")
    
    Acked-by: Alex Deucher <[email protected]>
    Reviewed-by: Harry Wentland <[email protected]>
    Signed-off-by: Nicholas Kazlauskas <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: check vm ready by amdgpu_vm->evicting flag [+ + +]

Author: Qiang Yu <[email protected]>
Date:   Mon Feb 21 17:53:56 2022 +0800

    drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
    
    [ Upstream commit c1a66c3bc425ff93774fb2f6eefa67b83170dd7e ]
    
    Workstation application ANSA/META v21.1.4 gets this dmesg error when
    running the CI test suite provided by ANSA/META:
    [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
    
    This is caused by:
    1. create a 256MB buffer in invisible VRAM
    2. CPU maps the buffer and accesses it, which causes a vm_fault and
       tries to move it to visible VRAM
    3. force visible VRAM space and traverse all VRAM bos to check if
       evicting this bo is valuable
    4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
       will set amdgpu_vm->evicting, but later, because the bo is not in
       visible VRAM, it won't really be evicted and so is not added to
       amdgpu_vm->evicted
    5. before the next CS clears amdgpu_vm->evicting, a user VM ops
       ioctl will pass amdgpu_vm_ready() (which checks amdgpu_vm->evicted)
       but fail in amdgpu_vm_bo_update_mapping() (which checks
       amdgpu_vm->evicting) and get this error log
    
    This error won't affect functionality as next CS will finish the
    waiting VM ops. But we'd better clear the error log by checking
    the amdgpu_vm->evicting flag in amdgpu_vm_ready() to stop calling
    amdgpu_vm_bo_update_mapping() later.
    
    Another reason is amdgpu_vm->evicted list holds all BOs (both
    user buffer and page table), but only page table BOs' eviction
    prevent VM ops. amdgpu_vm->evicting flag is set only for page
    table BOs, so we should use evicting flag instead of evicted list
    in amdgpu_vm_ready().
    
    The side effect of this change is: previously blocked VM op (user
    buffer in "evicted" list but no page table in it) gets done
    immediately.
    
    v2: update commit comments.
    
    Acked-by: Paul Menzel <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Qiang Yu <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: filter out radeon PCI device IDs [+ + +]

Author: Alex Deucher <[email protected]>
Date:   Tue Aug 3 17:17:10 2021 -0400

    drm/amdgpu: filter out radeon PCI device IDs
    
    [ Upstream commit bdbeb0dde4258586bb2f481b12da1e83aa4766f3 ]
    
    Once we claim all 0x1002 PCI display class devices, we will
    need to filter out devices owned by radeon.
    
    v2: rename radeon id array to make it more clear that
    the devices are not supported by amdgpu.
        add r128, mach64 pci ids as well
    
    Acked-by: Christian König <[email protected]> (v1)
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: filter out radeon secondary ids as well [+ + +]

Author: Alex Deucher <[email protected]>
Date:   Thu Jan 20 12:17:07 2022 -0500

    drm/amdgpu: filter out radeon secondary ids as well
    
    [ Upstream commit 9e5a14bce2402e84251a10269df0235cd7ce9234 ]
    
    Older radeon boards (r2xx-r5xx) had secondary PCI functions
    which were solely there for supporting multi-head on OSs with
    special requirements.  Add them to the unsupported list
    as well so we don't attempt to bind to them.  The driver
    would fail to bind to them anyway, but this does so
    in a cleaner way that should not confuse the user.
    
    Cc: [email protected]
    Acked-by: Christian König <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: fix suspend/resume hang regression [+ + +]

Author: Qiang Yu <[email protected]>
Date:   Tue Mar 1 14:11:59 2022 +0800

    drm/amdgpu: fix suspend/resume hang regression
    
    [ Upstream commit f1ef17011c765495c876fa75435e59eecfdc1ee4 ]
    
    Regression has been reported that suspend/resume may hang with
    the previous vm ready check commit.
    
    So bring back the evicted list check as a temp fix.
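
    The variants of the readiness check discussed here can be compared in
    a small standalone model. This is only a sketch: struct toy_vm and the
    toy_* function names are hypothetical, locking is omitted, and it
    assumes the temporary fix keeps the evicting-flag check alongside the
    restored evicted-list check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of VM state involved. */
struct toy_vm {
	bool evicting;        /* set while page table BOs are being evicted */
	unsigned int evicted; /* number of BOs on the evicted list */
};

/* Original behaviour: only the evicted list is consulted. */
static bool toy_vm_ready_list(const struct toy_vm *vm)
{
	return vm->evicted == 0;
}

/* Behaviour after the "check vm ready by evicting flag" change. */
static bool toy_vm_ready_flag(const struct toy_vm *vm)
{
	return !vm->evicting;
}

/* Assumed shape of the temporary fix: both conditions must hold. */
static bool toy_vm_ready_both(const struct toy_vm *vm)
{
	return !vm->evicting && vm->evicted == 0;
}
```

    In this model a VM with a non-empty evicted list but a clear evicting
    flag is "ready" only for the flag-only variant, which is the case the
    temporary fix makes not ready again.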
    
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922
    Fixes: c1a66c3bc425 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag")
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Qiang Yu <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: use spin_lock_irqsave to avoid deadlock by local interrupt [+ + +]

Author: Guchun Chen <[email protected]>
Date:   Fri Jan 7 16:31:20 2022 +0800

    drm/amdgpu: use spin_lock_irqsave to avoid deadlock by local interrupt
    
    [ Upstream commit 2096b74b1da5ca418827b54ac4904493bd9de89c ]
    
    This is observed in SRIOV case with virtual KMS as display.
    
    _raw_spin_lock_irqsave+0x37/0x40
    drm_handle_vblank+0x69/0x350 [drm]
    ? try_to_wake_up+0x432/0x5c0
    ? amdgpu_vkms_prepare_fb+0x1c0/0x1c0 [amdgpu]
    drm_crtc_handle_vblank+0x17/0x20 [drm]
    amdgpu_vkms_vblank_simulate+0x4d/0x80 [amdgpu]
    __hrtimer_run_queues+0xfb/0x230
    hrtimer_interrupt+0x109/0x220
    __sysvec_apic_timer_interrupt+0x64/0xe0
    asm_call_irq_on_stack+0x12/0x20
    
    Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)")
    Signed-off-by: Guchun Chen <[email protected]>
    Acked-by: Alex Deucher <[email protected]>
    Tested-by: Kelly Zytaruk <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdkfd: Check for null pointer after calling kmemdup [+ + +]

Author: Jiasheng Jiang <[email protected]>
Date:   Wed Jan 5 17:09:43 2022 +0800

    drm/amdkfd: Check for null pointer after calling kmemdup
    
    [ Upstream commit abfaf0eee97925905e742aa3b0b72e04a918fa9e ]
    
    Because the allocation can fail, kmemdup() may return a NULL
    pointer.
    Therefore, 'props2' should be checked in order to prevent
    a NULL pointer dereference.
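
    The pattern can be sketched in userspace C. This is an analogue, not
    the driver code: toy_memdup() stands in for kmemdup(), struct
    toy_props and toy_clone_props() are hypothetical, and -1 stands in
    for -ENOMEM.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical property block; the real structure is not shown here. */
struct toy_props {
	int node_id;
	int flags;
};

/* Userspace stand-in for kmemdup(): returns NULL if allocation fails. */
static void *toy_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Duplicate src and tweak the copy; returns 0 on success, -1 on failure. */
static int toy_clone_props(const struct toy_props *src, struct toy_props **out)
{
	struct toy_props *props2 = toy_memdup(src, sizeof(*src));

	if (!props2)
		return -1;      /* bail out before any dereference */

	props2->flags |= 1;     /* safe: props2 is known to be non-NULL */
	*out = props2;
	return 0;
}
```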
    
    Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs")
    Signed-off-by: Jiasheng Jiang <[email protected]>
    Reviewed-by: Felix Kuehling <[email protected]>
    Signed-off-by: Felix Kuehling <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/atomic: Check new_crtc_state->active to determine if CRTC needs disable in self refresh mode [+ + +]

Author: Liu Ying <[email protected]>
Date:   Thu Dec 30 12:06:26 2021 +0800

    drm/atomic: Check new_crtc_state->active to determine if CRTC needs disable in self refresh mode
    
    [ Upstream commit 69e630016ef4e4a1745310c446f204dc6243e907 ]
    
    Actual hardware state of CRTC is controlled by the member 'active' in
    struct drm_crtc_state instead of the member 'enable', according to the
    kernel doc of the member 'enable'.  In fact, the drm client modeset
    and atomic helpers are using the member 'active' to do the control.
    
    Referencing the member 'enable' of new_crtc_state, the function
    crtc_needs_disable() may fail to reflect if CRTC needs disable in
    self refresh mode, e.g., when the framebuffer emulation will be blanked
    through the client modeset helper with the next commit, the member
    'enable' of new_crtc_state is still true while the member 'active' is
    false, hence the relevant potential encoder and bridges won't be disabled.
    
    So, let's check new_crtc_state->active to determine if CRTC needs disable
    in self refresh mode instead of new_crtc_state->enable.
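
    The difference can be illustrated with a simplified model of the
    check. The struct and function names below are hypothetical, and the
    shape of the condition is an assumption based on the description
    above, not a copy of the helper.

```c
#include <stdbool.h>

/* Hypothetical subset of the CRTC state fields discussed above. */
struct toy_crtc_state {
	bool enable;              /* CRTC has resources assigned */
	bool active;              /* CRTC should actually be running */
	bool self_refresh_active; /* CRTC is in self refresh mode */
};

/* Old check: looks at new_state->enable when leaving self refresh. */
static bool toy_needs_disable_enable(const struct toy_crtc_state *old_state,
				     const struct toy_crtc_state *new_state)
{
	return old_state->active ||
	       (old_state->self_refresh_active && !new_state->enable) ||
	       new_state->self_refresh_active;
}

/* New check: looks at new_state->active instead. */
static bool toy_needs_disable_active(const struct toy_crtc_state *old_state,
				     const struct toy_crtc_state *new_state)
{
	return old_state->active ||
	       (old_state->self_refresh_active && !new_state->active) ||
	       new_state->self_refresh_active;
}
```

    With an old state in self refresh and a new state that has enable
    set but active clear (the blanking case), only the second variant
    reports that the CRTC needs to be disabled.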
    
    Fixes: 1452c25b0e60 ("drm: Add helpers to kick off self refresh mode in drivers")
    Cc: Sean Paul <[email protected]>
    Cc: Rob Clark <[email protected]>
    Cc: Maarten Lankhorst <[email protected]>
    Cc: Maxime Ripard <[email protected]>
    Cc: Thomas Zimmermann <[email protected]>
    Cc: David Airlie <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
    Signed-off-by: Liu Ying <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm/bridge: ti-sn65dsi86: Properly undo autosuspend [+ + +]

Author: Douglas Anderson <[email protected]>
Date:   Tue Feb 22 14:18:43 2022 -0800

    drm/bridge: ti-sn65dsi86: Properly undo autosuspend
    
    [ Upstream commit 26d3474348293dc752c55fe6d41282199f73714c ]
    
    The PM Runtime docs say:
      Drivers in ->remove() callback should undo the runtime PM changes done
      in ->probe(). Usually this means calling pm_runtime_disable(),
      pm_runtime_dont_use_autosuspend() etc.
    
    We weren't doing that for autosuspend. Let's do it.
    
    Fixes: 9bede63127c6 ("drm/bridge: ti-sn65dsi86: Use pm_runtime autosuspend")
    Signed-off-by: Douglas Anderson <[email protected]>
    Reviewed-by: Linus Walleij <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/20220222141838.1.If784ba19e875e8ded4ec4931601ce6d255845245@changeid
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915/display: Move DRRS code its own file [+ + +]

Author: José Roberto de Souza <[email protected]>
Date:   Fri Aug 27 10:42:52 2021 -0700

    drm/i915/display: Move DRRS code its own file
    
    [ Upstream commit a1b63119ee839c8ff622407aab25c9723943638a ]
    
    intel_dp.c is a 5k lines monster, so moving DRRS out of it to reduce
    some lines from it.
    
    Reviewed-by: Rodrigo Vivi <[email protected]>
    Cc: Jani Nikula <[email protected]>
    Cc: Rodrigo Vivi <[email protected]>
    Signed-off-by: José Roberto de Souza <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915/display: split out dpt out of intel_display.c [+ + +]

Author: Jani Nikula <[email protected]>
Date:   Mon Aug 23 15:25:31 2021 +0300

    drm/i915/display: split out dpt out of intel_display.c
    
    [ Upstream commit dc6d6158a6e8b11a11544a541583296d9323050f ]
    
    Let's try to reduce the size of intel_display.c, not increase it.
    
    Reviewed-by: Rodrigo Vivi <[email protected]>
    Signed-off-by: Jani Nikula <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/934a2a0db05e835f6843befef6082e2034f23b3a.1629721467.git.jani.nikula@intel.com
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915/guc/slpc: Correct the param count for unset param [+ + +]

Author: Vinay Belgaumkar <[email protected]>
Date:   Wed Feb 16 10:15:04 2022 -0800

    drm/i915/guc/slpc: Correct the param count for unset param
    
    [ Upstream commit 1b279f6ad467535c3b8a66b4edefaca2cdd5bdc3 ]
    
    SLPC unset param H2G only needs one parameter - the id of the
    param.
    
    Fixes: 025cb07bebfa ("drm/i915/guc/slpc: Cache platform frequency limits")
    
    Suggested-by: Umesh Nerlige Ramappa <[email protected]>
    Signed-off-by: Vinay Belgaumkar <[email protected]>
    Reviewed-by: Umesh Nerlige Ramappa <[email protected]>
    Signed-off-by: Ramalingam C <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit 9648f1c3739505557d94ff749a4f32192ea81fe3)
    Signed-off-by: Tvrtko Ursulin <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915: Disable DRRS on IVB/HSW port != A [+ + +]

Author: Ville Syrjälä <[email protected]>
Date:   Fri Jan 28 12:37:50 2022 +0200

    drm/i915: Disable DRRS on IVB/HSW port != A
    
    [ Upstream commit ee59792c97176f12c1da31f29fc4c2aab187f06e ]
    
    Currently we allow DRRS on IVB PCH ports, but we're missing a
    few programming steps meaning it is guaranteed to not work.
    And on HSW DRRS is not supported on anything but port A ever
    as only transcoder EDP has the M2/N2 registers (though I'm
    not sure if HSW ever has eDP on any other port).
    
    Starting from BDW all transcoders have the dynamically
    reprogrammable M/N registers so DRRS could work on any
    port.
    
    Stop initializing DRRS on ports where it cannot possibly work.
    
    Cc: [email protected]
    Signed-off-by: Ville Syrjälä <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Reviewed-by: Jani Nikula <[email protected]>
    (cherry picked from commit f0d4ce59f4d48622044933054a0e0cefa91ba15e)
    Signed-off-by: Tvrtko Ursulin <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915: don't call free_mmap_offset when purging [+ + +]

Author: Matthew Auld <[email protected]>
Date:   Thu Jan 6 17:49:07 2022 +0000

    drm/i915: don't call free_mmap_offset when purging
    
    [ Upstream commit 4c2602ba8d74c35d550ed3d518809c697de08d88 ]
    
    The TTM backend is in theory the only user here (also purge should only
    be called once we have dropped the pages), where it is set up at object
    creation and is only removed once the object is destroyed. Also
    resetting the node here might be iffy since the ttm fault handler
    uses the stored fake offset to determine the page offset within the pages
    array.
    
    This also blows up in the dontneed-before-mmap test, since the
    expectation is that the vma_node will live on, until the object is
    destroyed:
    
    <2> [749.062902] kernel BUG at drivers/gpu/drm/i915/gem/i915_gem_ttm.c:943!
    <4> [749.062923] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    <4> [749.062928] CPU: 0 PID: 1643 Comm: gem_madvise Tainted: G     U  W         5.16.0-rc8-CI-CI_DRM_11046+ #1
    <4> [749.062933] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390 Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
    <4> [749.062937] RIP: 0010:i915_ttm_mmap_offset.cold.35+0x5b/0x5d [i915]
    <4> [749.063044] Code: 00 48 c7 c2 a0 23 4e a0 48 c7 c7 26 df 4a a0 e8 95 1d d0 e0 bf 01 00 00 00 e8 8b ec cf e0 31 f6 bf 09 00 00 00 e8 5f 30 c0 e0 <0f> 0b 48 c7 c1 24 4b 56 a0 ba 5b 03 00 00 48 c7 c6 c0 23 4e a0 48
    <4> [749.063052] RSP: 0018:ffffc90002ab7d38 EFLAGS: 00010246
    <4> [749.063056] RAX: 0000000000000240 RBX: ffff88811f2e61c0 RCX: 0000000000000006
    <4> [749.063060] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
    <4> [749.063063] RBP: ffffc90002ab7e58 R08: 0000000000000001 R09: 0000000000000001
    <4> [749.063067] R10: 000000000123d0f8 R11: ffffc90002ab7b20 R12: ffff888112a1a000
    <4> [749.063071] R13: 0000000000000004 R14: ffff88811f2e61c0 R15: ffff888112a1a000
    <4> [749.063074] FS:  00007f6e5fcad500(0000) GS:ffff8884ad600000(0000) knlGS:0000000000000000
    <4> [749.063078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    <4> [749.063081] CR2: 00007efd264e39f0 CR3: 0000000115fd6005 CR4: 00000000003706f0
    <4> [749.063085] Call Trace:
    <4> [749.063087]  <TASK>
    <4> [749.063089]  __assign_mmap_offset+0x41/0x300 [i915]
    <4> [749.063171]  __assign_mmap_offset_handle+0x159/0x270 [i915]
    <4> [749.063248]  ? i915_gem_dumb_mmap_offset+0x70/0x70 [i915]
    <4> [749.063325]  drm_ioctl_kernel+0xae/0x140
    <4> [749.063330]  drm_ioctl+0x201/0x3d0
    <4> [749.063333]  ? i915_gem_dumb_mmap_offset+0x70/0x70 [i915]
    <4> [749.063409]  ? do_user_addr_fault+0x200/0x670
    <4> [749.063415]  __x64_sys_ioctl+0x6d/0xa0
    <4> [749.063419]  do_syscall_64+0x3a/0xb0
    <4> [749.063423]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    <4> [749.063428] RIP: 0033:0x7f6e5f100317
    
    Testcase: igt/gem_madvise/dontneed-before-mmap
    Fixes: cf3e3e86d779 ("drm/i915: Use ttm mmap handling for ttm bo's.")
    Signed-off-by: Matthew Auld <[email protected]>
    Cc: Thomas Hellström <[email protected]>
    Reviewed-by: Thomas Hellström <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit 658a0c632625e1db51837ff754fe18a6a7f2ccf8)
    Signed-off-by: Tvrtko Ursulin <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915: s/JSP2/ICP2/ PCH [+ + +]

Author: Ville Syrjälä <[email protected]>
Date:   Thu Feb 24 15:21:42 2022 +0200

    drm/i915: s/JSP2/ICP2/ PCH
    
    commit 08783aa7693f55619859f4f63f384abf17cb58c5 upstream.
    
    This JSP2 PCH actually seems to be some special Apple
    specific ICP variant rather than a JSP. Make it so. Or at
    least all the references to it seem to be some Apple ICL
    machines. Didn't manage to find these PCI IDs in any
    public chipset docs unfortunately.
    
    The only thing we're losing here with this JSP->ICP change
    is Wa_14011294188, but based on the HSD that isn't actually
    needed on any ICP based design (including JSP), only TGP
    based stuff (including MCC) really need it. The documented
    w/a just never made that distinction because Windows didn't
    want to differentiate between JSP and MCC (not sure how
    they handle hpd/ddc/etc. then though...).
    
    Cc: [email protected]
    Cc: Matt Roper <[email protected]>
    Cc: Vivek Kasireddy <[email protected]>
    Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4226
    Fixes: 943682e3bd19 ("drm/i915: Introduce Jasper Lake PCH")
    Signed-off-by: Ville Syrjälä <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Acked-by: Vivek Kasireddy <[email protected]>
    Tested-by: Tomas Bzatek <[email protected]>
    (cherry picked from commit 53581504a8e216d435f114a4f2596ad0dfd902fc)
    Signed-off-by: Tvrtko Ursulin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/mediatek: mtk_dsi: Reset the dsi0 hardware [+ + +]

Author: Enric Balletbo i Serra <[email protected]>
Date:   Thu Sep 30 10:31:50 2021 +0200

    drm/mediatek: mtk_dsi: Reset the dsi0 hardware
    
    [ Upstream commit 605c83753d97946aab176735020a33ebfb0b4615 ]
    
    Reset the dsi0 HW to its defaults at power on. This prevents the
    bootloader and the kernel from ending up with different settings.
    
    Not all Mediatek boards have the reset consumer configured in their
    board description, and not all of them need it, so the reset is
    optional and the change is compatible with all boards.
    
    Cc: Jitao Shi <[email protected]>
    Suggested-by: Chun-Kuang Hu <[email protected]>
    Signed-off-by: Enric Balletbo i Serra <[email protected]>
    Acked-by: Chun-Kuang Hu <[email protected]>
    Reviewed-by: Matthias Brugger <[email protected]>
    Link: https://lore.kernel.org/r/20210930103105.v4.7.Idbb4727ddf00ba2fe796b630906baff10d994d89@changeid
    Signed-off-by: Matthias Brugger <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/sun4i: dw-hdmi: Fix missing put_device() call in sun8i_hdmi_phy_get [+ + +]

Author: Miaoqian Lin <[email protected]>
Date:   Fri Jan 7 08:36:32 2022 +0000

    drm/sun4i: dw-hdmi: Fix missing put_device() call in sun8i_hdmi_phy_get
    
    [ Upstream commit c71af3dae3e34d2fde0c19623cf7f8483321f0e3 ]
    
    The reference taken by 'of_find_device_by_node()' must be released when
    not needed anymore.
    Add the corresponding 'put_device()' in the error handling path.
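
    The reference-counting rule can be sketched with a toy model. All
    names here are hypothetical: toy_find_device() stands in for a lookup
    that takes a reference, toy_put_device() for dropping it, and -1 for
    the error code. Dropping the reference on the success path as well is
    an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical device with a bare reference counter. */
struct toy_device {
	int refcount;
	void *drvdata;  /* NULL until the driver has finished probing */
};

/* Lookup that takes a reference, like the find helper described above. */
static struct toy_device *toy_find_device(struct toy_device *dev)
{
	dev->refcount++;
	return dev;
}

static void toy_put_device(struct toy_device *dev)
{
	dev->refcount--;
}

/* Returns 0 and stores the PHY data, or -1 if it is not available yet. */
static int toy_phy_get(struct toy_device *candidate, void **phy_out)
{
	struct toy_device *dev = toy_find_device(candidate);

	if (!dev->drvdata) {
		toy_put_device(dev);    /* the release the error path lacked */
		return -1;
	}

	*phy_out = dev->drvdata;
	toy_put_device(dev);            /* assumption: released here too */
	return 0;
}
```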
    
    Fixes: 9bf3797796f5 ("drm/sun4i: dw-hdmi: Make HDMI PHY into a platform device")
    Signed-off-by: Miaoqian Lin <[email protected]>
    Signed-off-by: Maxime Ripard <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm: mxsfb: Fix NULL pointer dereference [+ + +]

Author: Alexander Stein <[email protected]>
Date:   Wed Feb 2 09:17:55 2022 +0100

    drm: mxsfb: Fix NULL pointer dereference
    
    [ Upstream commit 622c9a3a7868e1eeca39c55305ca3ebec4742b64 ]
    
    mxsfb should not ever dereference the NULL pointer which
    drm_atomic_get_new_bridge_state is allowed to return.
    Assume a fixed format instead.
    
    Fixes: b776b0f00f24 ("drm: mxsfb: Use bus_format from the nearest bridge if present")
    Signed-off-by: Alexander Stein <[email protected]>
    Signed-off-by: Marek Vasut <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm: mxsfb: Set fallback bus format when the bridge doesn't provide one [+ + +]

Author: Guido Günther <[email protected]>
Date:   Mon Oct 11 15:41:27 2021 +0200

    drm: mxsfb: Set fallback bus format when the bridge doesn't provide one
    
    [ Upstream commit 1db060509903b29d63fe2e39c14fd0f99c4a447e ]
    
    If a bridge doesn't do any bus format handling, MEDIA_BUS_FMT_FIXED is
    returned. Fall back to a reasonable default (MEDIA_BUS_FMT_RGB888_1X24)
    in that case.
    
    This unbreaks e.g. using mxsfb with the nwl bridge and mipi dsi panels.
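
    The fallback logic can be sketched as a small pure function. This is
    a model, not the driver code: struct toy_bridge_state and
    toy_pick_bus_format() are hypothetical, the TOY_* constants are
    stand-ins for the media bus format codes, and treating a missing
    bridge state like a fixed format is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in values; the real codes come from the media bus format header. */
#define TOY_BUS_FMT_FIXED        0x0001u
#define TOY_BUS_FMT_RGB888_1X24  0x100au

/* Hypothetical bridge state carrying the negotiated input format. */
struct toy_bridge_state {
	uint32_t input_bus_format;
};

/* Choose the bus format, falling back when the bridge gives no answer. */
static uint32_t toy_pick_bus_format(const struct toy_bridge_state *state)
{
	uint32_t fmt = TOY_BUS_FMT_FIXED;

	if (state)                      /* state may legitimately be NULL */
		fmt = state->input_bus_format;

	if (fmt == TOY_BUS_FMT_FIXED)   /* bridge did no format handling */
		fmt = TOY_BUS_FMT_RGB888_1X24;

	return fmt;
}
```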
    
    Reported-by: Martin Kepplinger <[email protected]>
    Signed-off-by: Guido Günther <[email protected]>
    Reviewed-by: Lucas Stach <[email protected]>
    Reviewed-by: Sam Ravnborg <[email protected]>
    Acked-by: Stefan Agner <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/781f0352052cc50c823c199ef5f53c84902d0580.1633959458.git.agx@sigxcpu.org
    Signed-off-by: Sasha Levin <[email protected]>

e1000e: Correct NVM checksum verification flow [+ + +]

Author: Sasha Neftin <[email protected]>
Date:   Thu Feb 3 14:21:49 2022 +0200

    e1000e: Correct NVM checksum verification flow
    
    commit ffd24fa2fcc76ecb2e61e7a4ef8588177bcb42a6 upstream.
    
    Update MAC type check e1000_pch_tgp because for e1000_pch_cnp,
    NVM checksum update is still possible.
    Emit a more detailed warning message.
    
    Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1191663
    Fixes: 4051f68318ca ("e1000e: Do not take care about recovery NVM checksum")
    Reported-by: Thomas Bogendoerfer <[email protected]>
    Signed-off-by: Sasha Neftin <[email protected]>
    Tested-by: Naama Meir <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

e1000e: Fix possible HW unit hang after an s0ix exit [+ + +]

Author: Sasha Neftin <[email protected]>
Date:   Tue Jan 25 19:31:23 2022 +0200

    e1000e: Fix possible HW unit hang after an s0ix exit
    
    [ Upstream commit 1866aa0d0d6492bc2f8d22d0df49abaccf50cddd ]
    
    Disable the OEM bit/Gig Disable/restart AN impact and disable the PHY
    LAN connected device (LCD) reset during power management flows. This
    fixes possible HW unit hangs on the s0ix exit on some corporate ADL
    platforms.
    
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
    Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
    Suggested-by: Dima Ruinskiy <[email protected]>
    Suggested-by: Nir Efrati <[email protected]>
    Signed-off-by: Sasha Neftin <[email protected]>
    Tested-by: Kai-Heng Feng <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

efivars: Respect "block" flag in efivar_entry_set_safe() [+ + +]

Author: Jann Horn <[email protected]>
Date:   Fri Feb 18 19:05:59 2022 +0100

    efivars: Respect "block" flag in efivar_entry_set_safe()
    
    commit 258dd902022cb10c83671176688074879517fd21 upstream.
    
    When the "block" flag is false, the old code would sometimes still call
    check_var_size(), which wrongly tells ->query_variable_store() that it can
    block.
    
    As far as I can tell, this can't really materialize as a bug at the moment,
    because ->query_variable_store only does something on X86 with generic EFI,
    and in that configuration we always take the efivar_entry_set_nonblocking()
    path.
    
    Fixes: ca0e30dcaa53 ("efi: Add nonblocking option to efi_query_variable_store()")
    Signed-off-by: Jann Horn <[email protected]>
    Signed-off-by: Ard Biesheuvel <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ethtool: Fix link extended state for big endian [+ + +]

Author: Moshe Tal <[email protected]>
Date:   Thu Jan 20 11:55:50 2022 +0200

    ethtool: Fix link extended state for big endian
    
    [ Upstream commit e2f08207c558bc0bc8abaa557cdb29bad776ac7b ]
    
    The link extended sub-states are assigned as an enum, which is
    integer sized, but read from a union as u8. This works for small
    values on little endian systems, but on big endian it always gives 0.
    Fix the variable in the union to match the enum size.
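
    The effect can be reproduced on any host by laying out the bytes
    explicitly. The helper names below are hypothetical, and the sketch
    assumes a 32-bit enum whose first byte in memory is what a u8 union
    member would read.

```c
#include <stdint.h>

/* Write a 32-bit value into memory in the requested byte order. */
static void toy_store_u32(uint8_t out[4], uint32_t v, int big_endian)
{
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		out[i] = (uint8_t)(v >> shift);
	}
}

/*
 * Model of reading a 32-bit enum through a u8 union member: the u8
 * aliases the first byte of the enum's storage.
 */
static uint8_t toy_read_as_u8(uint32_t substate, int big_endian)
{
	uint8_t bytes[4];

	toy_store_u32(bytes, substate, big_endian);
	return bytes[0];
}
```

    For a sub-state value of 2, the little endian layout yields 2 while
    the big endian layout yields 0, because the first byte there is the
    most significant one.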
    
    Fixes: ecc31c60240b ("ethtool: Add link extended state")
    Signed-off-by: Moshe Tal <[email protected]>
    Reviewed-by: Ido Schimmel <[email protected]>
    Tested-by: Ido Schimmel <[email protected]>
    Reviewed-by: Gal Pressman <[email protected]>
    Reviewed-by: Amit Cohen <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

exfat: fix i_blocks for files truncated over 4 GiB [+ + +]

Author: Christophe Vu-Brugier <[email protected]>
Date:   Mon Nov 22 22:02:37 2021 +0900

    exfat: fix i_blocks for files truncated over 4 GiB
    
    [ Upstream commit 92fba084b79e6bc7b12fc118209f1922c1a2df56 ]
    
    In exfat_truncate(), the computation of inode->i_blocks is wrong if
    the file is larger than 4 GiB because a 32-bit variable is used as a
    mask. This is fixed and simplified by using round_up().
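
    The arithmetic problem can be shown in isolation. This is a sketch
    with hypothetical helper names; it assumes a 32-bit cluster size
    variable and models round_up() with a local 64-bit helper for
    power-of-two alignments.

```c
#include <stdint.h>

/*
 * Buggy form: ~(cluster_size - 1) is evaluated in 32 bits, so when it
 * is widened to 64 bits the upper half of the mask is zero and any
 * size bits above 4 GiB are cleared.
 */
static uint64_t toy_blocks_mask32(uint64_t size, uint32_t cluster_size,
				  unsigned int blkbits)
{
	return ((size + (cluster_size - 1)) & ~(cluster_size - 1)) >> blkbits;
}

/* 64-bit round up to a power-of-two alignment. */
static uint64_t toy_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Fixed form: the rounding is done entirely in 64 bits. */
static uint64_t toy_blocks_round_up(uint64_t size, uint32_t cluster_size,
				    unsigned int blkbits)
{
	return toy_round_up(size, cluster_size) >> blkbits;
}
```

    For a 5 GiB file with 4096-byte clusters and 512-byte blocks, the
    masked form reports 2097152 blocks (the count for 1 GiB) while the
    rounded form reports 10485760.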
    
    Also fix the same buggy computation in exfat_read_root() and another
    (correct) one in exfat_fill_inode(). The latter was fixed another way
    last month but can be simplified by using round_up() as well. See:
    
      commit 0c336d6e33f4 ("exfat: fix incorrect loading of i_blocks for
                            large files")
    
    Fixes: 98d917047e8b ("exfat: add file operations")
    Cc: [email protected] # v5.7+
    Suggested-by: Matthew Wilcox <[email protected]>
    Reviewed-by: Sungjong Seo <[email protected]>
    Signed-off-by: Christophe Vu-Brugier <[email protected]>
    Signed-off-by: Namjae Jeon <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

exfat: reuse exfat_inode_info variable instead of calling EXFAT_I() [+ + +]

Author: Christophe Vu-Brugier <[email protected]>
Date:   Tue Nov 2 22:23:58 2021 +0100

    exfat: reuse exfat_inode_info variable instead of calling EXFAT_I()
    
    [ Upstream commit 7dee6f57d7f22a89dd214518c778aec448270d4c ]
    
    Also add a local "struct exfat_inode_info *ei" variable to
    exfat_truncate() to simplify the code.
    
    Signed-off-by: Christophe Vu-Brugier <[email protected]>
    Signed-off-by: Namjae Jeon <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ext4: drop ineligible txn start stop APIs [+ + +]

Author: Harshad Shirwadkar <[email protected]>
Date:   Thu Dec 23 12:21:38 2021 -0800

    ext4: drop ineligible txn start stop APIs
    
    [ Upstream commit 7bbbe241ec7ce0def9f71464c878fdbd2b0dcf37 ]
    
    This patch drops ext4_fc_start_ineligible() and
    ext4_fc_stop_ineligible() APIs. Fast commit ineligible transactions
    should simply call ext4_fc_mark_ineligible() after starting the
    transaction.
    
    Signed-off-by: Harshad Shirwadkar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ext4: fast commit may miss file actions [+ + +]

Author: Xin Yin <[email protected]>
Date:   Mon Jan 17 17:36:55 2022 +0800

    ext4: fast commit may miss file actions
    
    [ Upstream commit bdc8a53a6f2f0b1cb5f991440f2100732299eb93 ]
    
    In the following scenario:
    1. jbd starts transaction n
    2. task A gets a new handle for transaction n+1
    3. task A does some actions and adds an inode to the FC_Q_MAIN fc_q
    4. jbd completes transaction n and clears the FC_Q_MAIN fc_q
    5. task A calls fsync
    
    Fast commit will lose the file actions during a full commit.
    
    We should also add updates to the staging queue during a full commit.
    And in ext4_fc_cleanup(), when resetting an inode's fc track range,
    check its i_sync_tid: if it is bigger than the current transaction tid,
    do not reset it, or we will lose the track range.
    
    And EXT4_MF_FC_COMMITTING is not needed anymore, so drop it.
    
    Signed-off-by: Xin Yin <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

ext4: fast commit may not fallback for ineligible commit [+ + +]

Author: Xin Yin <[email protected]>
Date:   Mon Jan 17 17:36:54 2022 +0800

    ext4: fast commit may not fallback for ineligible commit
    
    [ Upstream commit e85c81ba8859a4c839bcd69c5d83b32954133a5b ]
    
    For the following scenario:
    1. jbd starts committing transaction n
    2. task A gets a new handle for transaction n+1
    3. task A does some ineligible actions and marks FC_INELIGIBLE
    4. jbd completes transaction n and clears FC_INELIGIBLE
    5. task A calls fsync
    
    In this case fast commit will not fall back to a full commit and
    transaction n+1 is also not handled by jbd.
    
    Make ext4_fc_mark_ineligible() also record the transaction tid of the
    latest ineligible case. When ext4_fc_cleanup() is called, check the
    current transaction tid: if it is smaller than the latest ineligible
    tid, do not clear EXT4_MF_FC_INELIGIBLE.
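
    The tid comparison can be modelled in a few lines. This is a sketch
    only: struct toy_sb and the toy_* helpers are hypothetical, locking is
    omitted, and the wraparound-safe comparison is an assumption about
    how transaction ids are ordered.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t toy_tid_t;

/* Wraparound-safe "x is at or after y" for transaction ids. */
static bool toy_tid_geq(toy_tid_t x, toy_tid_t y)
{
	return (int32_t)(x - y) >= 0;
}

/* Hypothetical per-filesystem fast commit state. */
struct toy_sb {
	bool fc_ineligible;
	toy_tid_t fc_ineligible_tid; /* latest tid marked ineligible */
};

/* Mark the filesystem ineligible and remember the latest such tid. */
static void toy_mark_ineligible(struct toy_sb *sb, toy_tid_t tid)
{
	if (!sb->fc_ineligible || !toy_tid_geq(sb->fc_ineligible_tid, tid))
		sb->fc_ineligible_tid = tid;
	sb->fc_ineligible = true;
}

/* After a full commit of committed_tid, clear the flag only if covered. */
static void toy_cleanup_full_commit(struct toy_sb *sb, toy_tid_t committed_tid)
{
	if (toy_tid_geq(committed_tid, sb->fc_ineligible_tid))
		sb->fc_ineligible = false;
}
```

    Replaying the scenario with n = 10: marking tid 11 ineligible and
    then cleaning up after transaction 10 leaves the flag set, and only
    the cleanup for transaction 11 clears it.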
    
    Reported-by: kernel test robot <[email protected]>
    Reported-by: Dan Carpenter <[email protected]>
    Reported-by: Ritesh Harjani <[email protected]>
    Suggested-by: Harshad Shirwadkar <[email protected]>
    Signed-off-by: Xin Yin <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

ext4: simplify updating of fast commit stats [+ + +]

Author: Harshad Shirwadkar <[email protected]>
Date:   Thu Dec 23 12:21:39 2021 -0800

    ext4: simplify updating of fast commit stats
    
    [ Upstream commit 0915e464cb274648e1ef1663e1356e53ff400983 ]
    
    Move fast commit stats updating logic to a separate function from
    ext4_fc_commit(). This significantly improves readability of
    ext4_fc_commit().
    
    Signed-off-by: Harshad Shirwadkar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

firmware: arm_scmi: Remove space in MODULE_ALIAS name [+ + +]

Author: Alyssa Ross <[email protected]>
Date:   Fri Feb 11 10:27:04 2022 +0000

    firmware: arm_scmi: Remove space in MODULE_ALIAS name
    
    commit 1ba603f56568c3b4c2542dfba07afa25f21dcff3 upstream.
    
    modprobe can't handle spaces in aliases. Remove the space to fix the issue.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: aa4f886f3893 ("firmware: arm_scmi: add basic driver infrastructure for SCMI")
    Reviewed-by: Cristian Marussi <[email protected]>
    Signed-off-by: Alyssa Ross <[email protected]>
    Signed-off-by: Sudeep Holla <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

gve: Recording rx queue before sending to napi [+ + +]

Author: Tao Liu <[email protected]>
Date:   Mon Feb 7 09:59:01 2022 -0800

    gve: Recording rx queue before sending to napi
    
    [ Upstream commit 084cbb2ec3af2d23be9de65fcc9493e21e265859 ]
    
    This caused a significant performance degradation when using generic XDP
    with multiple queues.
    
    Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
    Signed-off-by: Tao Liu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

hamradio: fix macro redefine warning [+ + +]

Author: Huang Pei <[email protected]>
Date:   Tue Nov 23 19:07:48 2021 +0800

    hamradio: fix macro redefine warning
    
    commit 16517829f2e02f096fb5ea9083d160381127faf3 upstream.
    
    MIPS/IA64 define END as the assembly function-ending marker, which
    conflicts with the END definition in mkiss.c. Just undef it first.
    
    Reported-by: [email protected]
    Signed-off-by: Huang Pei <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Cc: Guenter Roeck <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
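
The undef-before-define pattern mentioned in this entry looks roughly like the sketch below. The first #define is a hypothetical stand-in for an architecture header, and is_frame_end() is an illustrative helper; only the idea of dropping the earlier definition comes from the commit message.

```c
#include <assert.h>

/* Hypothetical stand-in for an arch header that already defines END. */
#define END 0x99

/* Driver side: drop any earlier definition, then define the local value. */
#undef END
#define END 0300   /* octal 0300 == 0xC0, the KISS frame-end byte */

/* Returns nonzero if c is the frame-end byte. */
static int is_frame_end(unsigned char c)
{
    assert(END == 0xC0);
    return c == END;
}
```

Without the #undef, redefining END with a different value would trigger the macro redefinition warning this entry refers to.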

HID: add mapping for KEY_ALL_APPLICATIONS [+ + +]

Author: William Mahon <[email protected]>
Date:   Thu Mar 3 18:26:22 2022 -0800

    HID: add mapping for KEY_ALL_APPLICATIONS
    
    commit 327b89f0acc4c20a06ed59e4d9af7f6d804dc2e2 upstream.
    
    This patch adds a new key definition for KEY_ALL_APPLICATIONS
    and aliases KEY_DASHBOARD to it.
    
    It also maps the 0x0c/0x2a2 usage code to KEY_ALL_APPLICATIONS.
    
    Signed-off-by: William Mahon <[email protected]>
    Acked-by: Benjamin Tissoires <[email protected]>
    Link: https://lore.kernel.org/r/20220303035618.1.I3a7746ad05d270161a18334ae06e3b6db1a1d339@changeid
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

HID: add mapping for KEY_DICTATE [+ + +]

Author: William Mahon <[email protected]>
Date:   Thu Mar 3 18:23:42 2022 -0800

    HID: add mapping for KEY_DICTATE
    
    commit bfa26ba343c727e055223be04e08f2ebdd43c293 upstream.
    
    Numerous keyboards are adding dictate keys, which allow text
    messages to be dictated via a microphone.
    
    This patch adds a new key definition KEY_DICTATE and maps 0x0c/0x0d8
    usage code to this new keycode. Additionally hid-debug is adjusted to
    recognize this new usage code as well.
    
    Signed-off-by: William Mahon <[email protected]>
    Acked-by: Benjamin Tissoires <[email protected]>
    Link: https://lore.kernel.org/r/20220303021501.1.I5dbf50eb1a7a6734ee727bda4a8573358c6d3ec0@changeid
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
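
The usage-to-keycode mappings from this entry and the preceding KEY_ALL_APPLICATIONS entry can be pictured as a small lookup table. This is a simplified sketch, not the hid-input code: the table, CONSUMER_PAGE and lookup_keycode() are hypothetical, and the numeric keycode values are assumptions that should be checked against input-event-codes.h.

```c
#include <assert.h>
#include <stddef.h>

/* Keycode values are assumptions for illustration only. */
#define KEY_ALL_APPLICATIONS 204
#define KEY_DASHBOARD        KEY_ALL_APPLICATIONS  /* alias of the new key */
#define KEY_DICTATE          0x24a

#define CONSUMER_PAGE 0x0c  /* HID consumer usage page */

struct usage_map {
    unsigned page;
    unsigned usage;
    unsigned keycode;
};

static const struct usage_map consumer_map[] = {
    { CONSUMER_PAGE, 0x0d8, KEY_DICTATE },
    { CONSUMER_PAGE, 0x2a2, KEY_ALL_APPLICATIONS },
};

/* Returns the keycode for (page, usage), or 0 if the usage is unmapped. */
static unsigned lookup_keycode(unsigned page, unsigned usage)
{
    size_t n = sizeof(consumer_map) / sizeof(consumer_map[0]);

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (consumer_map[i].page == page && consumer_map[i].usage == usage)
            return consumer_map[i].keycode;
    return 0;
}
```

Because KEY_DASHBOARD is defined as an alias, existing users of the old name resolve to the same keycode as KEY_ALL_APPLICATIONS.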

HID: amd_sfh: Add functionality to clear interrupts [+ + +]

Author: Basavaraj Natikar <[email protected]>
Date:   Tue Feb 8 17:51:11 2022 +0530

    HID: amd_sfh: Add functionality to clear interrupts
    
    [ Upstream commit fb75a3791a8032848c987db29b622878d8fe2b1c ]
    
    Newer AMD platforms with SFH may generate unwarranted interrupts on
    some events. Until these are cleared, the actual MP2 data
    processing may be stalled in some cases.
    
    Add a mechanism to clear the pending interrupts (if any) during the
    driver initialization and sensor command operations.
    
    Signed-off-by: Basavaraj Natikar <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

HID: amd_sfh: Add interrupt handler to process interrupts [+ + +]

Author: Basavaraj Natikar <[email protected]>
Date:   Tue Feb 8 17:51:12 2022 +0530

    HID: amd_sfh: Add interrupt handler to process interrupts
    
    [ Upstream commit 7f016b35ca7623c71b31facdde080e8ce171a697 ]
    
    On newer AMD platforms with SFH, it is observed that random interrupts
    get generated on the SFH hardware and until this is cleared the firmware
    sensor processing is stalled, resulting in no data being received on
    the driver side.
    
    Add routines to handle these interrupts, so that firmware operations are
    not stalled.
    
    Signed-off-by: Basavaraj Natikar <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

HID: amd_sfh: Handle amd_sfh work buffer in PM ops [+ + +]

Author: Basavaraj Natikar <[email protected]>
Date:   Tue Feb 8 17:51:08 2022 +0530

    HID: amd_sfh: Handle amd_sfh work buffer in PM ops
    
    [ Upstream commit 0cf74235f4403b760a37f77271d2ca3424001ff9 ]
    
    Since in the current amd_sfh design the sensor data is periodically
    obtained in the form of poll data, during the suspend/resume cycle,
    scheduling a delayed work adds no value.
    
    So cancel the work on suspend and restart it on resume.
    
    Signed-off-by: Basavaraj Natikar <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list() [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Fri Jan 14 14:08:30 2022 -0800

    hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list()
    
    [ Upstream commit d6aba4c8e20d4d2bf65d589953f6d891c178f3a3 ]
    
    Pass "end - 1" instead of "end" when walking the interval tree in
    hugetlb_vmdelete_list() to fix an inclusive vs.  exclusive bug.  The two
    callers that pass a non-zero "end" treat it as exclusive, whereas the
    interval tree iterator expects an inclusive "last".  E.g.  punching a
    hole in a file that precisely matches the size of a single hugepage,
    with a vma starting right on the boundary, will result in
    unmap_hugepage_range() being called twice, with the second call having
    start==end.
    
    The off-by-one error doesn't cause functional problems as
    __unmap_hugepage_range() turns into a massive nop due to
    short-circuiting its for-loop on "address < end".  But, the mmu_notifier
    invocations to invalidate_range_{start,end}() are passed a bogus zero-sized
    range, which may be unexpected behavior for secondary MMUs.
    
    The bug was exposed by commit ed922739c919 ("KVM: Use interval tree to
    do fast hva lookup in memslots"), currently queued in the KVM tree for
    5.17, which added a WARN to detect ranges with start==end.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 1bfad99ab425 ("hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete")
    Signed-off-by: Sean Christopherson <[email protected]>
    Reported-by: [email protected]
    Reviewed-by: Mike Kravetz <[email protected]>
    Cc: Paolo Bonzini <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
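
The inclusive-versus-exclusive mismatch in this entry can be reproduced with a tiny overlap test. This is a user-space sketch under stated assumptions: struct node, overlaps() and overlaps_exclusive() are hypothetical names that only model an interval lookup taking an inclusive "last"; they are not the kernel's interval tree API.

```c
#include <assert.h>
#include <stdbool.h>

/* A node covering pages [start, last], both ends inclusive. */
struct node {
    unsigned long start;
    unsigned long last;
};

/* Inclusive overlap test of [q_start, q_last] against the node. */
static bool overlaps(const struct node *n,
                     unsigned long q_start, unsigned long q_last)
{
    return n->start <= q_last && q_start <= n->last;
}

/* The caller holds an exclusive, non-zero end; convert it before querying. */
static bool overlaps_exclusive(const struct node *n,
                               unsigned long start, unsigned long end)
{
    assert(end > start);
    return overlaps(n, start, end - 1);
}
```

With a node starting at page 1 and a hole covering [0, 1), passing the exclusive end directly reports a bogus overlap, while passing "end - 1" does not.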

i2c: bcm2835: Avoid clock stretching timeouts [+ + +]

Author: Eric Anholt <[email protected]>
Date:   Fri Feb 23 22:42:31 2018 +0100

    i2c: bcm2835: Avoid clock stretching timeouts
    
    [ Upstream commit 9495b9b31abe525ebd93da58de2c88b9f66d3a0e ]
    
    At power-on the CLKT register contains 0x40, which at our typical 100kHz
    bus rate means 0.64ms. But there is no specified limit to how long devices
    should be able to stretch the clocks, so just disable the timeout. We
    still have a timeout wrapping the entire transfer.
    
    Signed-off-by: Eric Anholt <[email protected]>
    Signed-off-by: Stefan Wahren <[email protected]>
    BugLink: https://github.com/raspberrypi/linux/issues/3064
    Signed-off-by: Wolfram Sang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
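
The 0.64ms figure follows from simple arithmetic, assuming the timeout value counts SCL clock cycles: 0x40 is 64 cycles, and one cycle at 100kHz lasts 10 microseconds. The helper below is a hypothetical illustration of that calculation, not driver code.

```c
#include <assert.h>

/* Timeout in microseconds for tout_cycles SCL cycles at bus_hz. */
static unsigned long clkt_timeout_us(unsigned long tout_cycles,
                                     unsigned long bus_hz)
{
    assert(bus_hz > 0);
    return tout_cycles * 1000000UL / bus_hz;
}
```

For example, clkt_timeout_us(0x40, 100000) gives 640 microseconds, which is a short window for a device that legitimately stretches the clock.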

i2c: cadence: allow COMPILE_TEST [+ + +]

Author: Wolfram Sang <[email protected]>
Date:   Sat Feb 12 20:45:48 2022 +0100

    i2c: cadence: allow COMPILE_TEST
    
    [ Upstream commit 0b0dcb3882c8f08bdeafa03adb4487e104d26050 ]
    
    Driver builds fine with COMPILE_TEST. Enable it for wider test coverage
    and easier maintenance.
    
    Signed-off-by: Wolfram Sang <[email protected]>
    Acked-by: Michal Simek <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i2c: imx: allow COMPILE_TEST [+ + +]

Author: Wolfram Sang <[email protected]>
Date:   Sat Feb 12 20:46:57 2022 +0100

    i2c: imx: allow COMPILE_TEST
    
    [ Upstream commit 2ce4462f2724d1b3cedccea441c6d18bb360629a ]
    
    Driver builds fine with COMPILE_TEST. Enable it for wider test coverage
    and easier maintenance.
    
    Signed-off-by: Wolfram Sang <[email protected]>
    Acked-by: Oleksij Rempel <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i2c: qup: allow COMPILE_TEST [+ + +]

Author: Wolfram Sang <[email protected]>
Date:   Sat Feb 12 20:47:07 2022 +0100

    i2c: qup: allow COMPILE_TEST
    
    [ Upstream commit 5de717974005fcad2502281e9f82e139ca91f4bb ]
    
    Driver builds fine with COMPILE_TEST. Enable it for wider test coverage
    and easier maintenance.
    
    Signed-off-by: Wolfram Sang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i3c/master/mipi-i3c-hci: Fix a potentially infinite loop in 'hci_dat_v1_get_index()' [+ + +]

Author: Christophe JAILLET <[email protected]>
Date:   Wed Nov 17 23:05:23 2021 +0100

    i3c/master/mipi-i3c-hci: Fix a potentially infinite loop in 'hci_dat_v1_get_index()'
    
    [ Upstream commit 3f43926f271287fb1744c9ac9ae1122497f2b0c2 ]
    
    The code in 'hci_dat_v1_get_index()' really looks like a hand coded version
    of 'for_each_set_bit()', except that a +1 is missing when searching for the
    next set bit.
    
    This really looks odd and it seems that it will loop until 'dat_w0_read()'
    returns the expected result.
    
    So use 'for_each_set_bit()' instead. It is less verbose and should be more
    correct.
    
    Fixes: 9ad9a52cce28 ("i3c/master: introduce the mipi-i3c-hci driver")
    Signed-off-by: Christophe JAILLET <[email protected]>
    Acked-by: Nicolas Pitre <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>
    Link: https://lore.kernel.org/r/0cdf3cb10293ead1acd271fdb8a70369c298c082.1637186628.git.christophe.jaillet@wanadoo.fr
    Signed-off-by: Sasha Levin <[email protected]>
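
The role of the missing "+1" can be seen in a hand-written bit walk. This is a simplified single-word sketch: next_set_bit() is a hypothetical stand-in for the kernel's find_next_bit(), and count_visited() is an illustrative loop body.

```c
#include <assert.h>

#define NBITS 32u

/* Index of the first set bit at or after 'from', or NBITS if none. */
static unsigned next_set_bit(unsigned long mask, unsigned from)
{
    assert(from <= NBITS);
    for (unsigned i = from; i < NBITS; i++)
        if (mask & (1UL << i))
            return i;
    return NBITS;
}

/* Visits every set bit exactly once and returns how many were seen. */
static unsigned count_visited(unsigned long mask)
{
    unsigned visited = 0;

    /*
     * The "+ 1" advances past the bit just handled. Searching again
     * from 'i' itself would return 'i' forever.
     */
    for (unsigned i = next_set_bit(mask, 0); i < NBITS;
         i = next_set_bit(mask, i + 1))
        visited++;
    return visited;
}
```

A macro such as for_each_set_bit() hides this stepping, which is why using it removes the class of bug described in this entry.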

i3c: fix incorrect address slot lookup on 64-bit [+ + +]

Author: Jamie Iles <[email protected]>
Date:   Wed Sep 22 17:56:00 2021 +0100

    i3c: fix incorrect address slot lookup on 64-bit
    
    [ Upstream commit f18f98110f2b179792cb70d85cba697320a3790f ]
    
    The address slot bitmap is an array of unsigned longs, which are the
    same size as an int on 32-bit platforms but not on 64-bit.  Loading the
    bitmap into an int could result in the wrong status being returned
    for a slot.
    
    Fixes: 3a379bbcea0a ("i3c: Add core I3C infrastructure")
    Cc: Boris Brezillon <[email protected]>
    Cc: Alexandre Belloni <[email protected]>
    Signed-off-by: Jamie Iles <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>
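
The truncation described in this entry can be modelled in user space. This sketch assumes two status bits per address slot packed into 64-bit words; all names (slot_set(), slot_status_wide(), slot_status_narrow()) are hypothetical, and the narrow variant widens again before shifting so that the example itself has defined behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BITS 2u
#define SLOT_MASK 0x3u
#define WORD_BITS 64u   /* models BITS_PER_LONG on a 64-bit platform */
#define MAX_ADDR  128u

/* Stores a 2-bit status for 'addr' in the packed bitmap. */
static void slot_set(uint64_t *slots, unsigned addr, unsigned status)
{
    unsigned bitpos = addr * SLOT_BITS;
    uint64_t *w = &slots[bitpos / WORD_BITS];

    assert(addr < MAX_ADDR);
    *w &= ~((uint64_t)SLOT_MASK << (bitpos % WORD_BITS));
    *w |= (uint64_t)(status & SLOT_MASK) << (bitpos % WORD_BITS);
}

/* Correct lookup: the whole word is kept before shifting. */
static unsigned slot_status_wide(const uint64_t *slots, unsigned addr)
{
    unsigned bitpos = addr * SLOT_BITS;
    uint64_t word = slots[bitpos / WORD_BITS];

    return (unsigned)(word >> (bitpos % WORD_BITS)) & SLOT_MASK;
}

/* Buggy lookup: the word is narrowed to 32 bits, losing the upper half. */
static unsigned slot_status_narrow(const uint64_t *slots, unsigned addr)
{
    unsigned bitpos = addr * SLOT_BITS;
    uint32_t word = (uint32_t)slots[bitpos / WORD_BITS];

    return (unsigned)((uint64_t)word >> (bitpos % WORD_BITS)) & SLOT_MASK;
}
```

Slots whose bits sit in the lower 32 bits of a word read back correctly either way; slots in the upper half are the ones that report the wrong status.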

i3c: master: dw: check return of dw_i3c_master_get_free_pos() [+ + +]

Author: Tom Rix <[email protected]>
Date:   Sat Jan 8 07:09:48 2022 -0800

    i3c: master: dw: check return of dw_i3c_master_get_free_pos()
    
    [ Upstream commit 13462ba1815db5a96891293a9cfaa2451f7bd623 ]
    
    Clang static analysis reports this problem:
    dw-i3c-master.c:799:9: warning: The result of the left shift is
      undefined because the left operand is negative
                          COMMAND_PORT_DEV_INDEX(pos) |
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    pos can be negative because dw_i3c_master_get_free_pos() can return an
    error.  So check for an error.
    
    Fixes: 1dd728f5d4d4 ("i3c: master: Add driver for Synopsys DesignWare IP")
    Signed-off-by: Tom Rix <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>
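
Checking the return value before shifting, as this entry describes, might look like the sketch below. The helpers get_free_pos() and build_command(), the MAX_DEVS limit and the bit-field position are all hypothetical illustrations rather than the driver's actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_DEVS 8

/* Lowest free slot index in free_mask, or -ENOSPC if none is free. */
static int get_free_pos(uint32_t free_mask)
{
    for (int i = 0; i < MAX_DEVS; i++)
        if (free_mask & (UINT32_C(1) << i))
            return i;
    return -ENOSPC;
}

/*
 * Encodes the slot index into bits 20:16 of a command word (illustrative
 * field position). Returns 0 on success or a negative errno value.
 */
static int build_command(uint32_t free_mask, uint32_t *cmd)
{
    int pos = get_free_pos(free_mask);

    assert(cmd != 0);
    if (pos < 0)
        return pos;   /* never shift a negative value */

    *cmd = ((uint32_t)pos << 16) & UINT32_C(0x001f0000);
    return 0;
}
```

Propagating the error keeps the undefined left shift of a negative operand from ever being reached.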

iavf: Add __IAVF_INIT_FAILED state [+ + +]

Author: Mateusz Palczewski <[email protected]>
Date:   Thu Aug 19 08:47:49 2021 +0000

    iavf: Add __IAVF_INIT_FAILED state
    
    [ Upstream commit 59756ad6948be91d66867ce458083b820c59b8ba ]
    
    This commit adds a new state, __IAVF_INIT_FAILED to the state machine.
    From now on initialization functions report errors not by returning an
    error value, but by changing the state to indicate that something went
    wrong.
    
    Signed-off-by: Jakub Pawlak <[email protected]>
    Signed-off-by: Jan Sokolowski <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Add helper function to go from pci_dev to adapter [+ + +]

Author: Karen Sornek <[email protected]>
Date:   Wed Sep 15 08:41:23 2021 +0200

    iavf: Add helper function to go from pci_dev to adapter
    
    [ Upstream commit 247aa001b72b6c8a89df9d108a2ec6f274a6b64d ]
    
    Add a helper function to go from a pci_dev to the adapter structure.
    This simplifies the code: callers can get the adapter directly and
    assign netdev from it, instead of having to go to the net_device and
    then the adapter.
    
    Signed-off-by: Brett Creeley <[email protected]>
    Signed-off-by: Karen Sornek <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Add trace while removing device [+ + +]

Author: Jedrzej Jagielski <[email protected]>
Date:   Tue Jun 22 15:43:48 2021 +0200

    iavf: Add trace while removing device
    
    [ Upstream commit bdb9e5c7aec73a7b8b5acab37587b6de1203e68d ]
    
    Add a kernel trace reporting that the device was removed.
    Currently there is no such information.
    E.g. when the host admin removes a PCI device from a VM,
    the VM should log information about the event.
    
    This patch adds info log to iavf_remove function.
    
    Signed-off-by: Arkadiusz Kubalewski <[email protected]>
    Signed-off-by: Jedrzej Jagielski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Add waiting so the port is initialized in remove [+ + +]

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:36:56 2022 +0100

    iavf: Add waiting so the port is initialized in remove
    
    [ Upstream commit 974578017fc1fdd06cea8afb9dfa32602e8529ed ]
    
    Races exist when the port is being configured and remove is
    triggered.
    
    unregister_netdev is not and can't be called under the crit_lock
    mutex, since it calls ndo_stop -> iavf_close, which requires
    this lock. Depending on the init state, the netdev could still be
    unregistered, so unregister_netdev never cleans up, even though
    shortly after that the device could become registered.
    
    Make iavf_remove wait until port finishes initialization.
    All critical state changes are atomic (under crit_lock).
    Crashes that come from iavf_reset_interrupt_capability and
    iavf_free_traffic_irqs should now be solved in a graceful
    manner.
    
    Fixes: 605ca7c5c6707 ("iavf: Fix kernel BUG in free_msi_irqs")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Combine init and watchdog state machines [+ + +]

Author: Mateusz Palczewski <[email protected]>
Date:   Thu Aug 19 08:47:58 2021 +0000

    iavf: Combine init and watchdog state machines
    
    [ Upstream commit 898ef1cb1cb24040c3e89263e02c605af70c776a ]
    
    Use a single state machine for driver initialization and for servicing
    the initialized driver. The init state machine implemented in init_task()
    is merged into the watchdog_task(). The init_task() function is
    removed.
    
    Signed-off-by: Jakub Pawlak <[email protected]>
    Signed-off-by: Jan Sokolowski <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: do not override the adapter state in the watchdog task (again) [+ + +]

Author: Stefan Assmann <[email protected]>
Date:   Wed Dec 1 09:14:34 2021 +0100

    iavf: do not override the adapter state in the watchdog task (again)
    
    commit fe523d7c9a8332855376ad5eb1aa301091129ba4 upstream.
    
    The watchdog task incorrectly changes the state to __IAVF_RESETTING,
    instead of letting the reset task take care of that. This was already
    resolved by commit 22c8fd71d3a5 ("iavf: do not override the adapter
    state in the watchdog task") but the problem was reintroduced by the
    recent code refactoring in commit 45eebd62999d ("iavf: Refactor iavf
    state machine tracking").
    
    Fixes: 45eebd62999d ("iavf: Refactor iavf state machine tracking")
    Signed-off-by: Stefan Assmann <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iavf: Fix __IAVF_RESETTING state usage [+ + +]

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:38:55 2022 +0100

    iavf: Fix __IAVF_RESETTING state usage
    
    [ Upstream commit 14756b2ae265d526b8356e86729090b01778fdf6 ]
    
    Setting the __IAVF_RESETTING state in the watchdog task had no
    effect and could lead to slow resets in the driver, as
    the task for the __IAVF_RESETTING state only requeues the watchdog.
    Until now __IAVF_RESETTING was interpreted by the reset task
    as a running state, which could lead to errors in resource
    allocation and disposal.
    
    Make watchdog_task queue the reset task when necessary.
    Do not update the state to __IAVF_RESETTING, so the reset task
    knows exactly what the current state of the adapter is.
    
    Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Fix deadlock in iavf_reset_task [+ + +]

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:38:31 2022 +0100

    iavf: Fix deadlock in iavf_reset_task
    
    commit e85ff9c631e1bf109ce8428848dfc8e8b0041f48 upstream.
    
    A mutex_unlock call on crit_lock is missing in the
    iavf_reset_task call path.
    
    Unlock the crit_lock before returning from reset task.
    
    Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
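
One common way to avoid a missed unlock on an early return is a single exit label. The sketch below is a user-space model under stated assumptions: struct fake_lock and its helpers are hypothetical and merely record whether the lock is held; they are not a real mutex and not the iavf code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock that only tracks whether it is currently held. */
struct fake_lock {
    bool held;
};

static void fake_lock_acquire(struct fake_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void fake_lock_release(struct fake_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Returns 0 on success, -1 on failure; releases the lock on every path. */
static int reset_task(struct fake_lock *l, bool hw_ready)
{
    int ret = 0;

    fake_lock_acquire(l);
    if (!hw_ready) {
        ret = -1;
        goto out;   /* the early exit still passes through the release */
    }
    /* ... the actual reset work would go here ... */
out:
    fake_lock_release(l);
    return ret;
}
```

If the failure branch returned directly instead of jumping to the label, the next caller trying to take the lock would block forever.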

iavf: Fix init state closure on remove [+ + +]

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:37:10 2022 +0100

    iavf: Fix init state closure on remove
    
    [ Upstream commit 3ccd54ef44ebfa0792c5441b6d9c86618f3378d1 ]
    
    While the adapter works through its init states, errors such as
    lack of communication with the PF might occur. If such events
    occur, the driver restores previous states in order to retry
    initialization in a proper way. When the remove task kicks in,
    this situation could lead to races with unregistering the
    netdevice as well as with resource cleanup. With the commit
    that introduced waiting in remove for init to complete,
    this problem turns into an endless wait if init never
    recovers from errors.
    
    Introduce __IAVF_IN_REMOVE_TASK bit to indicate that the
    remove thread has started.
    
    Make __IAVF_COMM_FAILED adapter state respect the
    __IAVF_IN_REMOVE_TASK bit and set the __IAVF_INIT_FAILED
    state and return without any action instead of trying to
    recover.
    
    Make __IAVF_INIT_FAILED adapter state respect the
    __IAVF_IN_REMOVE_TASK bit and return without any further
    actions.
    
    Make the loop in the remove handler break when adapter has
    __IAVF_INIT_FAILED state set.
    
    Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Fix kernel BUG in free_msi_irqs [+ + +]

Author: Przemyslaw Patynowski <[email protected]>
Date:   Fri Oct 22 10:30:14 2021 +0200

    iavf: Fix kernel BUG in free_msi_irqs
    
    [ Upstream commit 605ca7c5c670762e36ccb475cfa089d7ad0698e0 ]
    
    Fix the driver not freeing the VF's traffic irqs prior to calling
    pci_disable_msix in iavf_remove.
    There were 2 possible erroneous states in which iavf_close would
    not be called.
    One erroneous state is fixed by allowing the netdev to register when
    the state is already running. It was possible for the VF adapter to
    enter a state loop from running to resetting, where iavf_open would
    subsequently fail. If the user then unloaded the driver or removed the
    VF pci, iavf_close would not be called, as the netdev was not
    registered, leaving traffic pcis still allocated.
    Fixed this by breaking the loop, allowing the netdev to open the device
    when the adapter state is __IAVF_RUNNING and it is not explicitly downed.
    The other possibility is entering iavf_remove from the __IAVF_RESETTING
    state, where iavf_close would not free irqs, but just return 0.
    Fixed this by checking the last adapter state and then removing irqs.
    
    Kernel panic:
    [ 2773.628585] kernel BUG at drivers/pci/msi.c:375!
    ...
    [ 2773.631567] RIP: 0010:free_msi_irqs+0x180/0x1b0
    ...
    [ 2773.640939] Call Trace:
    [ 2773.641572]  pci_disable_msix+0xf7/0x120
    [ 2773.642224]  iavf_reset_interrupt_capability.part.41+0x15/0x30 [iavf]
    [ 2773.642897]  iavf_remove+0x12e/0x500 [iavf]
    [ 2773.643578]  pci_device_remove+0x3b/0xc0
    [ 2773.644266]  device_release_driver_internal+0x103/0x1f0
    [ 2773.644948]  pci_stop_bus_device+0x69/0x90
    [ 2773.645576]  pci_stop_and_remove_bus_device+0xe/0x20
    [ 2773.646215]  pci_iov_remove_virtfn+0xba/0x120
    [ 2773.646862]  sriov_disable+0x2f/0xe0
    [ 2773.647531]  ice_free_vfs+0x2f8/0x350 [ice]
    [ 2773.648207]  ice_sriov_configure+0x94/0x960 [ice]
    [ 2773.648883]  ? _kstrtoull+0x3b/0x90
    [ 2773.649560]  sriov_numvfs_store+0x10a/0x190
    [ 2773.650249]  kernfs_fop_write+0x116/0x190
    [ 2773.650948]  vfs_write+0xa5/0x1a0
    [ 2773.651651]  ksys_write+0x4f/0xb0
    [ 2773.652358]  do_syscall_64+0x5b/0x1a0
    [ 2773.653075]  entry_SYSCALL_64_after_hwframe+0x65/0xca
    
    Fixes: 22ead37f8af8 ("i40evf: Add longer wait after remove module")
    Signed-off-by: Przemyslaw Patynowski <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS [+ + +]

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:37:50 2022 +0100

    iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS
    
    [ Upstream commit 0579fafd37fb7efe091f0e6c8ccf968864f40f3e ]
    
    iavf_virtchnl_completion is called under crit_lock but when
    the code for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is called,
    this lock is released in order to obtain rtnl_lock to avoid
    ABBA deadlock with unregister_netdev.
    
    Along with the new way iavf_remove behaves, there exist
    many risks related to the lock release and attempts to regrab
    it. The driver faces crashes related to races between
    unregister_netdev and netdev_update_features. Yet another
    risk is that the driver could already obtain the crit_lock
    in order to destroy it and iavf_virtchnl_completion could
    crash or block forever.
    
    Make iavf_virtchnl_completion never relock crit_lock in its
    call paths.
    
    Extract rtnl_lock locking logic to the driver for
    unregister_netdev in order to set the netdev_registered flag
    inside the lock.
    
    Introduce a new flag that will inform adminq_task to perform
    the code from VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS right after
    it finishes processing messages. Guard this code with remove
    flags so it's never called when the driver is in remove state.
    
    Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Fix missing check for running netdev [+ + +]

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:38:43 2022 +0100

    iavf: Fix missing check for running netdev
    
    commit d2c0f45fcceb0995f208c441d9c9a453623f9ccf upstream.
    
    The driver was queueing reset_task regardless of the netdev
    state.
    
    Do not queue the reset task in iavf_change_mtu if netdev
    is not running.
    
    Fixes: fdd4044ffdc8 ("iavf: Remove timer for work triggering, use delaying work instead")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iavf: Fix race in init state [+ + +]

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:38:01 2022 +0100

    iavf: Fix race in init state
    
    [ Upstream commit a472eb5cbaebb5774672c565e024336c039e9128 ]
    
    When iavf_init_version_check sends VIRTCHNL_OP_GET_VF_RESOURCES
    message, the driver will wait for the response after requeueing
    the watchdog task in iavf_init_get_resources call stack. The
    logic is implemented such that iavf_init_get_resources has
    to be called in order to allocate adapter->vf_res. It is polling
    for the AQ response in iavf_get_vf_config function. Expect a
    call trace from kernel when adminq_task worker handles this
    message first. adapter->vf_res will be NULL in
    iavf_virtchnl_completion.
    
    Make the watchdog task not queue the adminq_task if the init
    process is not finished yet.
    
    Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: missing unlocks in iavf_watchdog_task() [+ + +]

Author: Dan Carpenter <[email protected]>
Date:   Wed Nov 10 11:13:50 2021 +0300

    iavf: missing unlocks in iavf_watchdog_task()
    
    commit bc2f39a6252ee40d9bfc2743d4437d420aec5f6e upstream.
    
    This code was re-organized and there are some unlocks missing now.
    
    Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
    Signed-off-by: Dan Carpenter <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iavf: Refactor iavf state machine tracking [+ + +]

Author: Mateusz Palczewski <[email protected]>
Date:   Thu Aug 19 08:47:40 2021 +0000

    iavf: Refactor iavf state machine tracking
    
    [ Upstream commit 45eebd62999d37d13568723524b99d828e0ce22c ]
    
    Replace state changes of iavf state machine
    with a method that also tracks the previous
    state the machine was in.
    
    This change is required for further work with
    refactoring init and watchdog state machines.
    
    Tracking of previous state would help us
    recover iavf after failure has occurred.
    
    Signed-off-by: Jakub Pawlak <[email protected]>
    Signed-off-by: Jan Sokolowski <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
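
A state setter that also remembers the previous state, as this entry describes, can be sketched as follows. The enum values, struct adapter, change_state() and restore_state() are hypothetical names chosen for illustration; they do not reproduce the driver's definitions.

```c
#include <assert.h>

enum adapter_state {
    ST_STARTUP,
    ST_INIT,
    ST_RUNNING,
    ST_RESETTING,
    ST_COMM_FAILED
};

struct adapter {
    enum adapter_state state;
    enum adapter_state last_state;
};

/* Moves to 'next' and records where the machine came from. */
static void change_state(struct adapter *a, enum adapter_state next)
{
    assert(a != 0);
    if (a->state != next) {
        a->last_state = a->state;
        a->state = next;
    }
}

/* Recovery path: go back to the state held before the failure. */
static void restore_state(struct adapter *a)
{
    change_state(a, a->last_state);
}
```

Routing every transition through one helper is what makes the previous state reliable enough to use for recovery after a failure.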

iavf: Rework mutexes for better synchronisation [+ + +]

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:35:49 2022 +0100

    iavf: Rework mutexes for better synchronisation
    
    [ Upstream commit fc2e6b3b132a907378f6af08356b105a4139c4fb ]
    
    The driver used to crash in multiple spots when put to stress testing
    of the init, reset and remove paths.
    
    The user would experience call traces or hangs when creating,
    resetting, removing VFs. Depending on the machines, the call traces
    are happening in random spots, like reset restoring resources racing
    with driver remove.
    
    Make adapter->crit_lock mutex a mandatory lock for guarding the
    operations performed on all workqueues and functions dealing with
    resource allocation and disposal.
    
    Make __IAVF_REMOVE a final state of the driver respected by
    workqueues that shall not requeue, when they fail to obtain the
    crit_lock.
    
    Make the IRQ handler not queue new work for adminq_task
    when the __IAVF_REMOVE state is set.
    
    Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: complete init_done on transport events [+ + +]

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:54 2022 -0800

    ibmvnic: complete init_done on transport events
    
    [ Upstream commit 36491f2df9ad2501e5a4ec25d3d95d72bafd2781 ]
    
    If we get a transport event, set the error and mark the init as
    complete so that the attempt to send crq-init or login fails sooner
    rather than waiting for the timeout.
    
    Fixes: bbd669a868bb ("ibmvnic: Fix completion structure initialization")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: define flush_reset_queue helper [+ + +]

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:53 2022 -0800

    ibmvnic: define flush_reset_queue helper
    
    [ Upstream commit 83da53f7e4bd86dca4b2edc1e2bb324fb3c033a1 ]
    
    Define and use a helper to flush the reset queue.
    
    Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: don't release napi in __ibmvnic_open() [+ + +]

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Mon Feb 7 16:19:18 2022 -0800

    ibmvnic: don't release napi in __ibmvnic_open()
    
    [ Upstream commit 61772b0908c640d0309c40f7d41d062ca4e979fa ]
    
    If __ibmvnic_open() encounters an error such as when setting link state,
    it calls release_resources() which frees the napi structures needlessly.
    Instead, have __ibmvnic_open() only clean up the work it did so far (i.e.
    disable napi and irqs) and leave the rest to the callers.
    
    If the caller of __ibmvnic_open() is ibmvnic_open(), it should release the
    resources immediately. If the caller is do_reset() or do_hard_reset(),
    they will release the resources on the next reset.
    
    This fixes the following crash that occurred when running the drmgr command
    several times to add/remove a vnic interface:
    
            [102056] ibmvnic 30000003 env3: Disabling rx_scrq[6] irq
            [102056] ibmvnic 30000003 env3: Disabling rx_scrq[7] irq
            [102056] ibmvnic 30000003 env3: Replenished 8 pools
            Kernel attempted to read user page (10) - exploit attempt? (uid: 0)
            BUG: Kernel NULL pointer dereference on read at 0x00000010
            Faulting instruction address: 0xc000000000a3c840
            Oops: Kernel access of bad area, sig: 11 [#1]
            LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
            ...
            CPU: 9 PID: 102056 Comm: kworker/9:2 Kdump: loaded Not tainted 5.16.0-rc5-autotest-g6441998e2e37 #1
            Workqueue: events_long __ibmvnic_reset [ibmvnic]
            NIP:  c000000000a3c840 LR: c0080000029b5378 CTR: c000000000a3c820
            REGS: c0000000548e37e0 TRAP: 0300   Not tainted  (5.16.0-rc5-autotest-g6441998e2e37)
            MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28248484  XER: 00000004
            CFAR: c0080000029bdd24 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
            GPR00: c0080000029b55d0 c0000000548e3a80 c0000000028f0200 0000000000000000
            ...
            NIP [c000000000a3c840] napi_enable+0x20/0xc0
            LR [c0080000029b5378] __ibmvnic_open+0xf0/0x430 [ibmvnic]
            Call Trace:
            [c0000000548e3a80] [0000000000000006] 0x6 (unreliable)
            [c0000000548e3ab0] [c0080000029b55d0] __ibmvnic_open+0x348/0x430 [ibmvnic]
            [c0000000548e3b40] [c0080000029bcc28] __ibmvnic_reset+0x500/0xdf0 [ibmvnic]
            [c0000000548e3c60] [c000000000176228] process_one_work+0x288/0x570
            [c0000000548e3d00] [c000000000176588] worker_thread+0x78/0x660
            [c0000000548e3da0] [c0000000001822f0] kthread+0x1c0/0x1d0
            [c0000000548e3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
            Instruction dump:
            7d2948f8 792307e0 4e800020 60000000 3c4c01eb 384239e0 f821ffd1 39430010
            38a0fff6 e92d1100 f9210028 39200000 <e9030010> f9010020 60420000 e9210020
            ---[ end trace 5f8033b08fd27706 ]---
    
    Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
    Reported-by: Abdul Haleem <[email protected]>
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Reviewed-by: Dany Madden <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: free reset-work-item when flushing [+ + +]

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:51 2022 -0800

    ibmvnic: free reset-work-item when flushing
    
    commit 8d0657f39f487d904fca713e0bc39c2707382553 upstream.
    
    Fix a tiny memory leak when flushing the reset work queue.
    
    Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ibmvnic: initialize rc before completing wait [+ + +]

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:52 2022 -0800

    ibmvnic: initialize rc before completing wait
    
    [ Upstream commit 765559b10ce514eb1576595834f23cdc92125fee ]
    
    We should initialize ->init_done_rc before calling complete(). Otherwise
    the waiting thread may see ->init_done_rc as 0 before we have updated it
    and may assume that the CRQ was successful.
    
    Fixes: 6b278c0cb378 ("ibmvnic delay complete()")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
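
    The ordering requirement (store the result first, signal completion
    second) can be illustrated with a small userspace sketch. It assumes a
    C11 atomic flag as a stand-in for the kernel completion; the struct and
    function names are hypothetical and not taken from the driver.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_init_wait {
	int init_done_rc;  /* result the waiter will read */
	atomic_bool done;  /* stands in for the kernel completion */
};

/* Signalling side: publish the result before waking the waiter. */
static void sketch_signal_init(struct sketch_init_wait *w, int rc)
{
	w->init_done_rc = rc;
	atomic_store_explicit(&w->done, true, memory_order_release);
}

/* Waiting side: only read the result once completion has been observed. */
static int sketch_wait_result(struct sketch_init_wait *w)
{
	while (!atomic_load_explicit(&w->done, memory_order_acquire))
		;	/* busy-wait; a real waiter would sleep */
	return w->init_done_rc;
}
```

    If the two statements in sketch_signal_init() were swapped, a waiter
    could observe done == true while init_done_rc still held its old value.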

ibmvnic: register netdev after init of adapter [+ + +]

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:55 2022 -0800

    ibmvnic: register netdev after init of adapter
    
    commit 570425f8c7c18b14fa8a2a58a0adb431968ad118 upstream.
    
    Finish initializing the adapter before registering netdev so state
    is consistent.
    
    Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

igc: igc_read_phy_reg_gpy: drop premature return [+ + +]

Author: Corinna Vinschen <[email protected]>
Date:   Wed Feb 16 14:31:35 2022 +0100

    igc: igc_read_phy_reg_gpy: drop premature return
    
    commit fda2635466cd26ad237e1bc5d3f6a60f97ad09b6 upstream.
    
    igc_read_phy_reg_gpy checks the return value from igc_read_phy_reg_mdic
    and if it's not 0, returns immediately. By doing this, it leaves the HW
    semaphore in the acquired state.
    
    Drop this premature return statement; the function returns immediately
    after releasing the semaphore anyway.
    
    Fixes: 5586838fe9ce ("igc: Add code for PHY support")
    Signed-off-by: Corinna Vinschen <[email protected]>
    Acked-by: Sasha Neftin <[email protected]>
    Tested-by: Naama Meir <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
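
    The pattern behind this fix (never return between acquiring and
    releasing a lock-like resource) can be sketched in plain C. The types
    and helpers below are hypothetical stand-ins for the hardware semaphore
    and the MDIC read; they are not the igc functions.

```c
#include <stdbool.h>

struct sketch_hw {
	bool sem_held;   /* models the HW semaphore */
	bool fail_read;  /* when true, the simulated register read fails */
};

static int sketch_acquire(struct sketch_hw *hw)
{
	if (hw->sem_held)
		return -1;
	hw->sem_held = true;
	return 0;
}

static void sketch_release(struct sketch_hw *hw)
{
	hw->sem_held = false;
}

static int sketch_read_mdic(struct sketch_hw *hw, unsigned short *data)
{
	if (hw->fail_read)
		return -5;
	*data = 0x1234;
	return 0;
}

static int sketch_read_phy_reg(struct sketch_hw *hw, unsigned short *data)
{
	int ret = sketch_acquire(hw);

	if (ret)
		return ret;	/* nothing acquired, nothing to release */

	ret = sketch_read_mdic(hw, data);
	/* No early return here: fall through so the semaphore is released. */
	sketch_release(hw);
	return ret;
}
```

    The error code from the read is still propagated; only the release is
    made unconditional.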

igc: igc_write_phy_reg_gpy: drop premature return [+ + +]

Author: Sasha Neftin <[email protected]>
Date:   Sun Feb 20 09:29:15 2022 +0200

    igc: igc_write_phy_reg_gpy: drop premature return
    
    commit c4208653a327a09da1e9e7b10299709b6d9b17bf upstream.
    
    Similar to "igc_read_phy_reg_gpy: drop premature return" patch.
    igc_write_phy_reg_gpy checks the return value from igc_write_phy_reg_mdic
    and if it's not 0, returns immediately. By doing this, it leaves the HW
    semaphore in the acquired state.
    
    Drop this premature return statement; the function returns immediately
    after releasing the semaphore anyway.
    
    Fixes: 5586838fe9ce ("igc: Add code for PHY support")
    Suggested-by: Dima Ruinskiy <[email protected]>
    Reported-by: Corinna Vinschen <[email protected]>
    Signed-off-by: Sasha Neftin <[email protected]>
    Tested-by: Naama Meir <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Input: clear BTN_RIGHT/MIDDLE on buttonpads [+ + +]

Author: José Expósito <[email protected]>
Date:   Tue Feb 8 09:59:16 2022 -0800

    Input: clear BTN_RIGHT/MIDDLE on buttonpads
    
    [ Upstream commit 37ef4c19b4c659926ce65a7ac709ceaefb211c40 ]
    
    Buttonpads are expected to map the INPUT_PROP_BUTTONPAD property bit
    and the BTN_LEFT key bit.
    
    As explained in the specification, where a device has a button type
    value of 0 (click-pad) or 1 (pressure-pad) there should not be
    discrete buttons:
    https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/touchpad-windows-precision-touchpad-collection#device-capabilities-feature-report
    
    However, some drivers map the BTN_RIGHT and/or BTN_MIDDLE key bits even
    though the device is a buttonpad and therefore does not have those
    buttons.
    
    This behavior has forced userspace applications like libinput to
    implement different workarounds and quirks to detect buttonpads and
    offer to the user the right set of features and configuration options.
    For more information:
    https://gitlab.freedesktop.org/libinput/libinput/-/merge_requests/726
    
    In order to avoid this issue clear the BTN_RIGHT and BTN_MIDDLE key
    bits when the input device is registered if the INPUT_PROP_BUTTONPAD
    property bit is set.
    
    Notice that this change will not affect udev because it does not check
    for buttons. See systemd/src/udev/udev-builtin-input_id.c.
    
    List of known affected hardware:
    
     - Chuwi AeroBook Plus
     - Chuwi Gemibook
     - Framework Laptop
     - GPD Win Max
     - Huawei MateBook 2020
     - Prestigio Smartbook 141 C2
     - Purism Librem 14v1
     - StarLite Mk II   - AMI firmware
     - StarLite Mk II   - Coreboot firmware
     - StarLite Mk III  - AMI firmware
     - StarLite Mk III  - Coreboot firmware
     - StarLabTop Mk IV - AMI firmware
     - StarLabTop Mk IV - Coreboot firmware
     - StarBook Mk V
    
    Acked-by: Peter Hutterer <[email protected]>
    Acked-by: Benjamin Tissoires <[email protected]>
    Acked-by: Jiri Kosina <[email protected]>
    Signed-off-by: José Expósito <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
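
    The registration-time fixup described above amounts to a conditional
    bit clear. The sketch below models it in userspace with plain bitmaps.
    The numeric codes mirror the values in the kernel's input event code
    header, but the SK_-prefixed names, the struct and the helpers are
    hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#define SK_BITS_PER_LONG        (sizeof(unsigned long) * CHAR_BIT)
#define SK_LONGS(n)             (((n) + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)
#define SK_INPUT_PROP_BUTTONPAD 0x02
#define SK_BTN_LEFT             0x110
#define SK_BTN_RIGHT            0x111
#define SK_BTN_MIDDLE           0x112
#define SK_KEY_CNT              0x300

struct sketch_input_dev {
	unsigned long propbit[1];
	unsigned long keybit[SK_LONGS(SK_KEY_CNT)];
};

static void sk_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / SK_BITS_PER_LONG] |= 1UL << (nr % SK_BITS_PER_LONG);
}

static void sk_clear_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / SK_BITS_PER_LONG] &= ~(1UL << (nr % SK_BITS_PER_LONG));
}

static bool sk_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / SK_BITS_PER_LONG] >> (nr % SK_BITS_PER_LONG)) & 1UL;
}

/* Buttonpads have no discrete right/middle buttons: drop those key bits. */
static void sketch_register_fixup(struct sketch_input_dev *dev)
{
	if (sk_test_bit(SK_INPUT_PROP_BUTTONPAD, dev->propbit)) {
		sk_clear_bit(SK_BTN_RIGHT, dev->keybit);
		sk_clear_bit(SK_BTN_MIDDLE, dev->keybit);
	}
}
```

    Devices that do not advertise the buttonpad property keep whatever key
    bits their driver set.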

Input: elan_i2c - fix regulator enable count imbalance after suspend/resume [+ + +]

Author: Hans de Goede <[email protected]>
Date:   Mon Feb 28 23:39:50 2022 -0800

    Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
    
    commit 04b7762e37c95d9b965d16bb0e18dbd1fa2e2861 upstream.
    
    Before these changes elan_suspend() would only disable the regulator
    when device_may_wakeup() returns false; whereas elan_resume() would
    unconditionally enable it, leading to an enable count imbalance when
    device_may_wakeup() returns true.
    
    This triggers the "WARN_ON(regulator->enable_count)" in regulator_put()
    when the elan_i2c driver gets unbound; this happens e.g. with the
    hot-pluggable dock with Elan I2C touchpad for the Asus TF103C 2-in-1.
    
    Fix this by making the regulator_enable() call also be conditional
    on device_may_wakeup() returning false.
    
    Signed-off-by: Hans de Goede <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
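
    The enable count imbalance is easy to see in a toy model where the
    regulator is just a counter. The sketch below is hypothetical userspace
    code, not the elan_i2c driver; it only mirrors the conditions described
    in the message above.

```c
#include <stdbool.h>

struct sketch_touchpad {
	int enable_count;  /* models the regulator's enable count */
	bool may_wakeup;   /* models the result of device_may_wakeup() */
};

/* Suspend disables the regulator only when the device is not a wakeup source. */
static void sketch_suspend(struct sketch_touchpad *tp)
{
	if (!tp->may_wakeup)
		tp->enable_count--;
}

/* Pre-fix behaviour: resume always re-enables, whatever suspend did. */
static void sketch_resume_unbalanced(struct sketch_touchpad *tp)
{
	tp->enable_count++;
}

/* Post-fix behaviour: resume mirrors the condition used in suspend. */
static void sketch_resume_balanced(struct sketch_touchpad *tp)
{
	if (!tp->may_wakeup)
		tp->enable_count++;
}
```

    With may_wakeup set, one suspend/resume cycle through the unbalanced
    path leaves the count one higher than it started, which is the
    condition the WARN_ON reports at unbind time.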

Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power() [+ + +]

Author: Hans de Goede <[email protected]>
Date:   Mon Feb 28 23:39:38 2022 -0800

    Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
    
    commit 81a36d8ce554b82b0a08e2b95d0bd44fcbff339b upstream.
    
    elan_disable_power() is called conditionally on suspend, whereas
    elan_enable_power() is always called on resume. This leads to
    an imbalance in the regulator's enable count.
    
    Move the regulator_[en|dis]able() calls out of elan_[en|dis]able_power()
    in preparation of fixing this.
    
    No functional changes intended.
    
    Signed-off-by: Hans de Goede <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [dtor: consolidate elan_[en|dis]able() into elan_set_power()]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Input: samsung-keypad - properly state IOMEM dependency [+ + +]

Author: David Gow <[email protected]>
Date:   Sun Feb 27 21:00:10 2022 -0800

    Input: samsung-keypad - properly state IOMEM dependency
    
    commit ba115adf61b36b8c167126425a62b0efc23f72c0 upstream.
    
    Make the samsung-keypad driver explicitly depend on CONFIG_HAS_IOMEM, as it
    calls devm_ioremap(). This prevents compile errors in some configs (e.g.,
    allyesconfig/randconfig under UML):
    
    /usr/bin/ld: drivers/input/keyboard/samsung-keypad.o: in function `samsung_keypad_probe':
    samsung-keypad.c:(.text+0xc60): undefined reference to `devm_ioremap'
    
    Signed-off-by: David Gow <[email protected]>
    Acked-by: anton ivanov <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Input: ti_am335x_tsc - fix STEPCONFIG setup for Z2 [+ + +]

Author: Dario Binacchi <[email protected]>
Date:   Sun Dec 12 21:14:48 2021 -0800

    Input: ti_am335x_tsc - fix STEPCONFIG setup for Z2
    
    [ Upstream commit 6bfeb6c21e1bdc11c328b7d996d20f0f73c6b9b0 ]
    
    The Z2 step configuration doesn't erase the SEL_INP_SWC_3_0 bit-field
    before setting the ADC channel. This way its value could be corrupted by
    the ADC channel selected for the Z1 coordinate.
    
    Fixes: 8c896308feae ("input: ti_am335x_adc: use only FIFO0 and clean up a little")
    Signed-off-by: Dario Binacchi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
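
    Clearing a bit-field before writing a new value into it is the core of
    this fix. The sketch below shows the difference between OR-ing a
    channel into a stale field and masking first. The field position (four
    bits starting at bit 19) and all names are assumptions made for the
    example, not values confirmed by the message above.

```c
#include <stdint.h>

/* Assumed layout: 4-bit input-select field at bits 19..22. */
#define SK_INP_SHIFT 19
#define SK_INP_MASK  (UINT32_C(0xF) << SK_INP_SHIFT)
#define SK_INP(ch)   (((uint32_t)(ch) << SK_INP_SHIFT) & SK_INP_MASK)

/* Buggy variant: ORs the new channel on top of whatever was there. */
static uint32_t sketch_set_channel_buggy(uint32_t config, unsigned int ch)
{
	return config | SK_INP(ch);
}

/* Fixed variant: clear the field first, then set the new channel. */
static uint32_t sketch_set_channel(uint32_t config, unsigned int ch)
{
	config &= ~SK_INP_MASK;
	return config | SK_INP(ch);
}
```

    Reusing a configuration word that already selects channel 5 and asking
    for channel 2 yields 7 in the buggy variant (5 | 2) and 2 in the fixed
    one, with all bits outside the field left untouched.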

Input: ti_am335x_tsc - set ADCREFM for X configuration [+ + +]

Author: Dario Binacchi <[email protected]>
Date:   Sun Dec 12 21:14:35 2021 -0800

    Input: ti_am335x_tsc - set ADCREFM for X configuration
    
    [ Upstream commit 73cca71a903202cddc8279fc76b2da4995da5bea ]
    
    As reported by the STEPCONFIG[1-16] register field descriptions of the
    TI reference manual, for the ADC "in single ended, SEL_INM_SWC_3_0 must
    be 1xxx".
    
    Unlike the Y and Z coordinates, this bit has not been set for the step
    configuration registers used to sample the X coordinate.
    
    Fixes: 1b8be32e6914 ("Input: add support for TI Touchscreen controller")
    Signed-off-by: Dario Binacchi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

io_uring: fix no lock protection for ctx->cq_extra [+ + +]

Author: Hao Xu <[email protected]>
Date:   Thu Nov 25 17:21:02 2021 +0800

    io_uring: fix no lock protection for ctx->cq_extra
    
    [ Upstream commit e302f1046f4c209291b07ff7bc4d15ca26891f16 ]
    
    ctx->cq_extra should be protected by completion lock so that the
    req_need_defer() does the right check.
    
    Cc: [email protected]
    Signed-off-by: Hao Xu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iommu/amd: Fix I/O page table memory leak [+ + +]

Author: Suravee Suthikulpanit <[email protected]>
Date:   Thu Feb 10 09:47:45 2022 -0600

    iommu/amd: Fix I/O page table memory leak
    
    [ Upstream commit 6b0b2d9a6a308bcd9300c2d83000a82812c56cea ]
    
    The current logic updates the I/O page table mode for the domain
    before calling the logic to free memory used for the page table.
    This results in IOMMU page table memory leak, and can be observed
    when launching VM w/ pass-through devices.
    
    Fix by freeing the memory used for page table before updating the mode.
    
    Cc: Joerg Roedel <[email protected]>
    Reported-by: Daniel Jordan <[email protected]>
    Tested-by: Daniel Jordan <[email protected]>
    Signed-off-by: Suravee Suthikulpanit <[email protected]>
    Fixes: e42ba0633064 ("iommu/amd: Restructure code for freeing page table")
    Link: https://lore.kernel.org/all/[email protected]/
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Joerg Roedel <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
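
    Why the order of the two steps matters can be shown with a toy model.
    The sketch assumes that the free routine consults the mode to decide
    whether there is anything to free; that assumption, the mode constants
    and every name below are illustrative and not taken from the driver.

```c
#include <stdbool.h>
#include <stdlib.h>

enum { SK_MODE_NONE = 0, SK_MODE_3_LEVEL = 3 };

struct sketch_domain {
	int mode;    /* page table mode; SK_MODE_NONE means "no table" */
	void *root;  /* memory backing the page table */
};

/* Assumption: the free routine trusts the mode to tell it what to free. */
static bool sk_free_pgtable(struct sketch_domain *dom)
{
	if (dom->mode == SK_MODE_NONE)
		return false;	/* believes there is nothing to free */
	free(dom->root);
	dom->root = NULL;
	return true;
}

/* Leaky order: the mode is reset before the table is freed. */
static bool sketch_teardown_leaky(struct sketch_domain *dom)
{
	dom->mode = SK_MODE_NONE;
	return sk_free_pgtable(dom);
}

/* Fixed order: free while the mode still describes the table. */
static bool sketch_teardown(struct sketch_domain *dom)
{
	bool freed = sk_free_pgtable(dom);

	dom->mode = SK_MODE_NONE;
	return freed;
}
```

    In the leaky order the root pointer is still set after teardown, so
    the allocation is never returned.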

iommu/amd: Recover from event log overflow [+ + +]

Author: Lennert Buytenhek <[email protected]>
Date:   Mon Oct 4 13:07:24 2021 +0300

    iommu/amd: Recover from event log overflow
    
    commit 5ce97f4ec5e0f8726a5dda1710727b1ee9badcac upstream.
    
    The AMD IOMMU logs I/O page faults and such to a ring buffer in
    system memory, and this ring buffer can overflow.  The AMD IOMMU
    spec has the following to say about the interrupt status bit that
    signals this overflow condition:
    
            EventOverflow: Event log overflow. RW1C. Reset 0b. 1 = IOMMU
            event log overflow has occurred. This bit is set when a new
            event is to be written to the event log and there is no usable
            entry in the event log, causing the new event information to
            be discarded. An interrupt is generated when EventOverflow = 1b
            and MMIO Offset 0018h[EventIntEn] = 1b. No new event log
            entries are written while this bit is set. Software Note: To
            resume logging, clear EventOverflow (W1C), and write a 1 to
            MMIO Offset 0018h[EventLogEn].
    
    The AMD IOMMU driver doesn't currently implement this recovery
    sequence, meaning that if a ring buffer overflow occurs, logging
    of EVT/PPR/GA events will cease entirely.
    
    This patch implements the spec-mandated reset sequence, with the
    minor tweak that the hardware seems to want to have a 0 written to
    MMIO Offset 0018h[EventLogEn] first, before writing a 1 into this
    field, or the IOMMU won't actually resume logging events.
    
    Signed-off-by: Lennert Buytenhek <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Joerg Roedel <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
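
    The recovery sequence (clear the overflow bit, write 0 to EventLogEn,
    then write 1) can be sketched against a simulated register pair. The
    bit positions, the register model and its "logging resumes only on a
    0 to 1 transition" behaviour are assumptions that encode the
    observation in the message above; none of this is the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_STATUS_EVT_OVERFLOW (UINT32_C(1) << 0) /* assumed bit position */
#define SK_CTRL_EVT_LOG_EN     (UINT32_C(1) << 2) /* assumed bit position */

struct sketch_iommu {
	uint32_t status;
	uint32_t control;
	bool logging;  /* whether the simulated IOMMU is writing events */
};

/* Simulated hardware event: the log overflows and logging stops. */
static void sk_raise_overflow(struct sketch_iommu *io)
{
	io->status |= SK_STATUS_EVT_OVERFLOW;
	io->logging = false;
}

/* Status register write with write-1-to-clear semantics. */
static void sk_write_status(struct sketch_iommu *io, uint32_t val)
{
	io->status &= ~val;
}

/* Control write: in this model logging resumes only on a 0 -> 1 edge. */
static void sk_write_control(struct sketch_iommu *io, uint32_t val)
{
	bool was_on = (io->control & SK_CTRL_EVT_LOG_EN) != 0;
	bool now_on = (val & SK_CTRL_EVT_LOG_EN) != 0;

	if (!now_on)
		io->logging = false;
	else if (!was_on && !(io->status & SK_STATUS_EVT_OVERFLOW))
		io->logging = true;
	io->control = val;
}

/* Recovery: clear overflow, then toggle EventLogEn off and back on. */
static void sketch_restart_event_logging(struct sketch_iommu *io)
{
	sk_write_status(io, SK_STATUS_EVT_OVERFLOW);
	sk_write_control(io, io->control & ~SK_CTRL_EVT_LOG_EN);
	sk_write_control(io, io->control | SK_CTRL_EVT_LOG_EN);
}
```

    In this model, clearing the overflow bit and writing 1 to an enable
    bit that is already 1 changes nothing, which matches the reason given
    for the extra write of 0.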

iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find [+ + +]

Author: Miaoqian Lin <[email protected]>
Date:   Fri Jan 7 08:09:11 2022 +0000

    iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find
    
    commit 9826e393e4a8c3df474e7f9eacd3087266f74005 upstream.
    
    The reference taken by 'of_find_device_by_node()' must be released when
    not needed anymore.
    Add the corresponding 'put_device()' in the error handling path.
    
    Fixes: 765a9d1d02b2 ("iommu/tegra-smmu: Fix mc errors on tegra124-nyan")
    Signed-off-by: Miaoqian Lin <[email protected]>
    Acked-by: Thierry Reding <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Joerg Roedel <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
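
    The reference-count rule behind this fix can be sketched with a plain
    counter. The struct and helpers below are hypothetical userspace
    stand-ins for the lookup and put_device() calls; only the shape of the
    error path follows the description above.

```c
#include <stddef.h>

struct sketch_device {
	int refcount;   /* models the device reference count */
	void *drvdata;  /* driver data; NULL when the driver is not bound */
};

/* Lookup helper: like the real lookup, it returns with a reference held. */
static struct sketch_device *sketch_find_device(struct sketch_device *dev)
{
	dev->refcount++;
	return dev;
}

static void sketch_put_device(struct sketch_device *dev)
{
	dev->refcount--;
}

/* Returns the driver data, dropping the reference on the error path. */
static void *sketch_smmu_find(struct sketch_device *candidate)
{
	struct sketch_device *dev = sketch_find_device(candidate);

	if (dev->drvdata == NULL) {
		sketch_put_device(dev);	/* the put that was missing */
		return NULL;
	}
	return dev->drvdata;
}
```

    On the success path the sketch keeps the reference, since the caller
    goes on to use the returned data.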

iommu/vt-d: Fix double list_add when enabling VMD in scalable mode [+ + +]

Author: Adrian Huang <[email protected]>
Date:   Mon Feb 21 13:33:48 2022 +0800

    iommu/vt-d: Fix double list_add when enabling VMD in scalable mode
    
    commit b00833768e170a31af09268f7ab96aecfcca9623 upstream.
    
    When enabling VMD and IOMMU scalable mode, the following kernel panic
    call trace/kernel log is shown in Eagle Stream platform (Sapphire Rapids
    CPU) during booting:
    
    pci 0000:59:00.5: Adding to iommu group 42
    ...
    vmd 0000:59:00.5: PCI host bridge to bus 10000:80
    pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
    pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
    pci 10000:80:01.0: enabling Extended Tags
    pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
    pci 10000:80:01.0: DMAR: Setup RID2PASID failed
    pci 10000:80:01.0: Failed to add to iommu group 42: -16
    pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
    pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
    pci 10000:80:03.0: enabling Extended Tags
    pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
    ------------[ cut here ]------------
    kernel BUG at lib/list_debug.c:29!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
    Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
    Workqueue: events work_for_cpu_fn
    RIP: 0010:__list_add_valid.cold+0x26/0x3f
    Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f
          0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1
          fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9
          9e e8 8b b1 fe
    RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
    RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
    RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
    RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
    R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
    R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
    FS:  0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
     <TASK>
     intel_pasid_alloc_table+0x9c/0x1d0
     dmar_insert_one_dev_info+0x423/0x540
     ? device_to_iommu+0x12d/0x2f0
     intel_iommu_attach_device+0x116/0x290
     __iommu_attach_device+0x1a/0x90
     iommu_group_add_device+0x190/0x2c0
     __iommu_probe_device+0x13e/0x250
     iommu_probe_device+0x24/0x150
     iommu_bus_notifier+0x69/0x90
     blocking_notifier_call_chain+0x5a/0x80
     device_add+0x3db/0x7b0
     ? arch_memremap_can_ram_remap+0x19/0x50
     ? memremap+0x75/0x140
     pci_device_add+0x193/0x1d0
     pci_scan_single_device+0xb9/0xf0
     pci_scan_slot+0x4c/0x110
     pci_scan_child_bus_extend+0x3a/0x290
     vmd_enable_domain.constprop.0+0x63e/0x820
     vmd_probe+0x163/0x190
     local_pci_probe+0x42/0x80
     work_for_cpu_fn+0x13/0x20
     process_one_work+0x1e2/0x3b0
     worker_thread+0x1c4/0x3a0
     ? rescuer_thread+0x370/0x370
     kthread+0xc7/0xf0
     ? kthread_complete_and_exit+0x20/0x20
     ret_from_fork+0x1f/0x30
     </TASK>
    Modules linked in:
    ---[ end trace 0000000000000000 ]---
    ...
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception ]---
    
    The following 'lspci' output shows devices '10000:80:*' are subdevices of
    the VMD device 0000:59:00.5:
    
      $ lspci
      ...
      0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
      ...
      10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
      10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
      10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
      10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
      10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
      10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
    
    The symptom 'list_add double add' is caused by the following failure
    message:
    
      pci 10000:80:01.0: DMAR: Setup RID2PASID failed
      pci 10000:80:01.0: Failed to add to iommu group 42: -16
      pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
    
    Device 10000:80:01.0 is the subdevice of the VMD device 0000:59:00.5,
    so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD
    device 0000:59:00.5. Here is the call path:
    
      intel_pasid_alloc_table
        pci_for_each_dma_alias
         get_alias_pasid_table
           search_pasid_table
    
    pci_real_dma_dev() in pci_for_each_dma_alias() gets the real dma device
    which is the VMD device 0000:59:00.5. However, pte of the VMD device
    0000:59:00.5 has been configured during this message "pci 0000:59:00.5:
    Adding to iommu group 42". So, the status -EBUSY is returned when
    configuring pasid entry for device 10000:80:01.0.
    
    It then invokes dmar_remove_one_dev_info() to release
    'struct device_domain_info *' from iommu_devinfo_cache. But, the pasid
    table is not released because of the following statement in
    __dmar_remove_one_dev_info():
    
            if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
                    ...
                    intel_pasid_free_table(info->dev);
            }
    
    The subsequent dmar_insert_one_dev_info() operation of device
    10000:80:03.0 allocates 'struct device_domain_info *' from
    iommu_devinfo_cache. The allocated address is the same address that
    is released previously for device 10000:80:01.0. Finally, invoking
    device_attach_pasid_table() causes the issue.
    
    `git bisect` points to the offending commit 474dd1c65064 ("iommu/vt-d:
    Fix clearing real DMA device's scalable-mode context entries"), which
    releases the pasid table if the device is not the subdevice by
    checking the returned status of dev_is_real_dma_subdevice().
    Reverting the offending commit can work around the issue.
    
    The solution is to avoid allocating a pasid table for devices that
    are subdevices of the VMD device.
    
    Fixes: 474dd1c65064 ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries")
    Cc: [email protected] # v5.14+
    Signed-off-by: Adrian Huang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Joerg Roedel <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report() [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Thu Mar 3 09:37:28 2022 -0800

    ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
    
    [ Upstream commit 2d3916f3189172d5c69d33065c3c21119fe539fc ]
    
    While investigating why a synchronize_net() has been added recently
    in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
    might drop skbs in some cases.
    
    Discussion about removing synchronize_net() from ipv6_mc_down()
    will happen in a different thread.
    
    Fixes: f185de28d9ae ("mld: add new workqueues for process mld events")
    Signed-off-by: Eric Dumazet <[email protected]>
    Cc: Taehee Yoo <[email protected]>
    Cc: Cong Wang <[email protected]>
    Cc: David Ahern <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iwlwifi: mvm: check debugfs_dir ptr before use [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Tue Feb 22 19:06:30 2022 -0800

    iwlwifi: mvm: check debugfs_dir ptr before use
    
    commit 5a6248c0a22352f09ea041665d3bd3e18f6f872c upstream.
    
    When "debugfs=off" is used on the kernel command line, iwlwifi's
    mvm module uses an invalid/unchecked debugfs_dir pointer and causes
    a BUG:
    
     BUG: kernel NULL pointer dereference, address: 000000000000004f
     #PF: supervisor read access in kernel mode
     #PF: error_code(0x0000) - not-present page
     PGD 0 P4D 0
     Oops: 0000 [#1] PREEMPT SMP
     CPU: 1 PID: 503 Comm: modprobe Tainted: G        W         5.17.0-rc5 #7
     Hardware name: Dell Inc. Inspiron 15 5510/076F7Y, BIOS 2.4.1 11/05/2021
     RIP: 0010:iwl_mvm_dbgfs_register+0x692/0x700 [iwlmvm]
     Code: 69 a0 be 80 01 00 00 48 c7 c7 50 73 6a a0 e8 95 cf ee e0 48 8b 83 b0 1e 00 00 48 c7 c2 54 73 6a a0 be 64 00 00 00 48 8d 7d 8c <48> 8b 48 50 e8 15 22 07 e1 48 8b 43 28 48 8d 55 8c 48 c7 c7 5f 73
     RSP: 0018:ffffc90000a0ba68 EFLAGS: 00010246
     RAX: ffffffffffffffff RBX: ffff88817d6e3328 RCX: ffff88817d6e3328
     RDX: ffffffffa06a7354 RSI: 0000000000000064 RDI: ffffc90000a0ba6c
     RBP: ffffc90000a0bae0 R08: ffffffff824e4880 R09: ffffffffa069d620
     R10: ffffc90000a0ba00 R11: ffffffffffffffff R12: 0000000000000000
     R13: ffffc90000a0bb28 R14: ffff88817d6e3328 R15: ffff88817d6e3320
     FS:  00007f64dd92d740(0000) GS:ffff88847f640000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 000000000000004f CR3: 000000016fc79001 CR4: 0000000000770ee0
     PKRU: 55555554
     Call Trace:
      <TASK>
      ? iwl_mvm_mac_setup_register+0xbdc/0xda0 [iwlmvm]
      iwl_mvm_start_post_nvm+0x71/0x100 [iwlmvm]
      iwl_op_mode_mvm_start+0xab8/0xb30 [iwlmvm]
      _iwl_op_mode_start+0x6f/0xd0 [iwlwifi]
      iwl_opmode_register+0x6a/0xe0 [iwlwifi]
      ? 0xffffffffa0231000
      iwl_mvm_init+0x35/0x1000 [iwlmvm]
      ? 0xffffffffa0231000
      do_one_initcall+0x5a/0x1b0
      ? kmem_cache_alloc+0x1e5/0x2f0
      ? do_init_module+0x1e/0x220
      do_init_module+0x48/0x220
      load_module+0x2602/0x2bc0
      ? __kernel_read+0x145/0x2e0
      ? kernel_read_file+0x229/0x290
      __do_sys_finit_module+0xc5/0x130
      ? __do_sys_finit_module+0xc5/0x130
      __x64_sys_finit_module+0x13/0x20
      do_syscall_64+0x38/0x90
      entry_SYSCALL_64_after_hwframe+0x44/0xae
     RIP: 0033:0x7f64dda564dd
     Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 29 0f 00 f7 d8 64 89 01 48
     RSP: 002b:00007ffdba393f88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
     RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f64dda564dd
     RDX: 0000000000000000 RSI: 00005575399e2ab2 RDI: 0000000000000001
     RBP: 000055753a91c5e0 R08: 0000000000000000 R09: 0000000000000002
     R10: 0000000000000001 R11: 0000000000000246 R12: 00005575399e2ab2
     R13: 000055753a91ceb0 R14: 0000000000000000 R15: 000055753a923018
      </TASK>
     Modules linked in: btintel(+) btmtk bluetooth vfat snd_hda_codec_hdmi fat snd_hda_codec_realtek snd_hda_codec_generic iwlmvm(+) snd_sof_pci_intel_tgl mac80211 snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence soundwire_bus snd_sof_intel_hda snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core btrfs snd_compress snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec raid6_pq iwlwifi snd_hda_core snd_pcm snd_timer snd soundcore cfg80211 intel_ish_ipc(+) thunderbolt rfkill intel_ishtp ucsi_acpi wmi i2c_hid_acpi i2c_hid evdev
     CR2: 000000000000004f
     ---[ end trace 0000000000000000 ]---
    
    Check the debugfs_dir pointer for an error before using it.
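
    The check above follows the usual kernel error-pointer idiom. A
    minimal user-space sketch of that idiom, assuming simplified stand-ins
    for ERR_PTR()/IS_ERR() (every *_model name below is hypothetical and
    is not the driver's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_MODEL 4095

/* Hypothetical stand-in for a dentry; only the field used here. */
struct dentry_model {
    struct dentry_model *d_parent;
};

/* Encode a negative errno value in a pointer, as ERR_PTR() does. */
static inline void *err_ptr_model(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO_MODEL);
    return (void *)(intptr_t)err;
}

/* True when the pointer lies in the top MAX_ERRNO_MODEL addresses. */
static inline bool is_err_model(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_MODEL;
}

/* Dereference the directory only after ruling out NULL and error values. */
struct dentry_model *parent_or_null(struct dentry_model *dir)
{
    if (dir == NULL || is_err_model(dir))
        return NULL;
    return dir->d_parent;
}
```

    Without the guard, an error-encoded pointer would be dereferenced
    like a valid one, which is the kind of fault the trace above shows.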
    
    Fixes: 8c082a99edb9 ("iwlwifi: mvm: simplify iwl_mvm_dbgfs_register")
    Signed-off-by: Randy Dunlap <[email protected]>
    Cc: Luca Coelho <[email protected]>
    Cc: [email protected]
    Cc: Kalle Valo <[email protected]>
    Cc: Greg Kroah-Hartman <[email protected]>
    Cc: Emmanuel Grumbach <[email protected]>
    Cc: stable <[email protected]>
    Reviewed-by: Greg Kroah-Hartman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [change to make both conditional]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc() [+ + +]

Author: Maciej Fijalkowski <[email protected]>
Date:   Wed Mar 2 09:59:27 2022 -0800

    ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
    
    commit 6c7273a266759d9d36f7c862149f248bcdeddc0f upstream.
    
    Commit c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if
    netif is not OK") addressed the ring transient state when
    MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the
    interface to go through down/up. Maurice reported that when carrier is not
    ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU
    cycles due to the constant NAPI rescheduling as ixgbe_poll() states that
    there is still some work to be done.
    
    To fix this, do not set work_done to false for a !netif_carrier_ok().
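
    The effect can be sketched with a simplified user-space model of the
    transmit loop. This is an illustration under stated assumptions, not
    the driver code: struct ring_model and xmit_zc_model() are
    hypothetical, and the real function also deals with descriptors,
    DMA and the budget in its return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ring_model {
    bool carrier_ok;          /* stands in for netif_carrier_ok() */
    unsigned int unused_desc; /* free TX descriptors */
    unsigned int pending;     /* frames waiting in the XSK TX queue */
};

/* Returns true when no further poll is needed for this ring. */
bool xmit_zc_model(struct ring_model *ring, unsigned int budget)
{
    bool work_done = true;

    assert(ring != NULL);
    while (budget > 0 && ring->pending > 0) {
        if (ring->unused_desc == 0) {
            work_done = false;   /* ring full: ask to be polled again */
            break;
        }
        if (!ring->carrier_ok)
            break;               /* link down: stop without rescheduling */
        ring->unused_desc--;
        ring->pending--;
        budget--;
    }
    return work_done;
}
```

    With the carrier down the model reports the work as done, so the
    poll loop is not rescheduled over and over while nothing can be sent.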
    
    Fixes: c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK")
    Reported-by: Maurice Baijens <[email protected]>
    Tested-by: Maurice Baijens <[email protected]>
    Signed-off-by: Maciej Fijalkowski <[email protected]>
    Tested-by: Sandeep Penigalapati <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

kasan: fix quarantine conflicting with init_on_free [+ + +]

Author: Andrey Konovalov <[email protected]>
Date:   Fri Jan 14 14:05:01 2022 -0800

    kasan: fix quarantine conflicting with init_on_free
    
    [ Upstream commit 26dca996ea7b1ac7008b6b6063fc88b849e3ac3e ]
    
    KASAN's quarantine might save its metadata inside freed objects.  As
    this happens after the memory is zeroed by the slab allocator when
    init_on_free is enabled, the memory coming out of quarantine is not
    properly zeroed.
    
    This causes lib/test_meminit.c tests to fail with Generic KASAN.
    
    Zero the metadata when the object is removed from quarantine.
    
    Link: https://lkml.kernel.org/r/2805da5df4b57138fdacd671f5d227d58950ba54.1640037083.git.andreyknvl@google.com
    Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
    Signed-off-by: Andrey Konovalov <[email protected]>
    Reviewed-by: Marco Elver <[email protected]>
    Cc: Alexander Potapenko <[email protected]>
    Cc: Andrey Konovalov <[email protected]>
    Cc: Dmitry Vyukov <[email protected]>
    Cc: Andrey Ryabinin <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

KVM: arm64: vgic: Read HW interrupt pending state from the HW [+ + +]

Author: Marc Zyngier <[email protected]>
Date:   Thu Feb 3 09:24:45 2022 +0000

    KVM: arm64: vgic: Read HW interrupt pending state from the HW
    
    [ Upstream commit 5bfa685e62e9ba93c303a9a8db646c7228b9b570 ]
    
    It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always
    result in the pending interrupts being accurately reported if they are
    mapped to a HW interrupt. This is particularly visible when acking
    the timer interrupt and reading the GICR_ISPENDR1 register immediately
    after, for example (the interrupt appears as not-pending while it really
    is...).
    
    This is because a HW interrupt has its 'active and pending state' kept
    in the *physical* distributor, and not in the virtual one, as mandated
    by the spec (this is what allows the direct deactivation). The virtual
    distributor only carries the pending and active *states* (note the
    plural, as these are two independent and non-overlapping states).
    
    Fix it by reading the HW state back, either from the timer itself or
    from the distributor if necessary.
    
    Reported-by: Ricardo Koller <[email protected]>
    Tested-by: Ricardo Koller <[email protected]>
    Reviewed-by: Ricardo Koller <[email protected]>
    Signed-off-by: Marc Zyngier <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPU [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Fri Oct 8 19:11:56 2021 -0700

    KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPU
    
    [ Upstream commit 6f390916c4fb359507d9ac4bf1b28a4f8abee5c0 ]
    
    Wrap s390's halt_poll_max_steal with READ_ONCE and snapshot the result of
    kvm_arch_no_poll() in kvm_vcpu_block() to avoid a mostly-theoretical,
    largely benign bug on s390 where the result of kvm_arch_no_poll() could
    change due to userspace modifying halt_poll_max_steal while the vCPU is
    blocking.  The bug is largely benign as it will either cause KVM to skip
    updating halt-polling times (no_poll toggles false=>true) or to update
    halt-polling times with a slightly flawed block_ns.
    
    Note, READ_ONCE is unnecessary in the current code; add it in case the
    arch hook is ever inlined, and to provide a hint that userspace can
    change the param at will.
    
    Fixes: 8b905d28ee17 ("KVM: s390: provide kvm_arch_no_poll function")
    Reviewed-by: Christian Borntraeger <[email protected]>
    Signed-off-by: Sean Christopherson <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

KVM: VMX: Don't unblock vCPU w/ Posted IRQ if IRQs are disabled in guest [+ + +]

Author: Paolo Bonzini <[email protected]>
Date:   Tue Nov 16 09:32:47 2021 -0500

    KVM: VMX: Don't unblock vCPU w/ Posted IRQ if IRQs are disabled in guest
    
    [ Upstream commit 1831fa44df743a7cdffdf1c12c799bf6f3c12b8c ]
    
    Don't configure the wakeup handler when a vCPU is blocking with IRQs
    disabled, in which case any IRQ, posted or otherwise, should not be
    recognized and thus should not wake the vCPU.
    
    Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
    Signed-off-by: Sean Christopherson <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

KVM: VMX: Read Posted Interrupt "control" exactly once per loop iteration [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Fri Oct 8 19:12:19 2021 -0700

    KVM: VMX: Read Posted Interrupt "control" exactly once per loop iteration
    
    [ Upstream commit cfb0e1306a3790eb055ebf7cdb7b0ee8a23e9b6e ]
    
    Use READ_ONCE() when loading the posted interrupt descriptor control
    field to ensure "old" and "new" have the same base value.  If the
    compiler emits separate loads, and loads into "new" before "old", KVM
    could theoretically drop the ON bit if it were set between the loads.
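
    The pattern can be sketched in user space with C11 atomics. This is
    a model under assumptions, not the KVM code: the *_MODEL bit names
    and pi_clear_sn_model() are hypothetical, and atomic_load() plays
    the role of READ_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PI_ON_MODEL (UINT64_C(1) << 0) /* outstanding notification */
#define PI_SN_MODEL (UINT64_C(1) << 1) /* suppress notification */

/* Clear SN while preserving every other bit, including a concurrent ON. */
uint64_t pi_clear_sn_model(_Atomic uint64_t *control)
{
    uint64_t old_val, new_val;

    assert(control != NULL);
    do {
        /* One load per iteration: new_val is derived from old_val only. */
        old_val = atomic_load(control);
        new_val = old_val & ~PI_SN_MODEL;
    } while (!atomic_compare_exchange_weak(control, &old_val, new_val));

    return new_val;
}
```

    Because new_val is computed from the same snapshot that the
    compare-and-exchange checks, a bit set between two separate loads
    cannot be silently dropped.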
    
    Fixes: 28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
    Signed-off-by: Sean Christopherson <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots() [+ + +]

Author: Like Xu <[email protected]>
Date:   Tue Mar 1 20:49:41 2022 +0800

    KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
    
    commit c6c937d673aaa1d603f62f134e1ca9c173eeeed3 upstream.
    
    Just like on the optional mmu_alloc_direct_roots() path, once shadow
    path reaches "r = -EIO" somewhere, the caller needs to know the actual
    state in order to enter error handling and avoid something worse.
    
    Fixes: 4a38162ee9f1 ("KVM: MMU: load PDPTRs outside mmu_lock")
    Signed-off-by: Like Xu <[email protected]>
    Reviewed-by: Sean Christopherson <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: X86: Ensure that dirty PDPTRs are loaded [+ + +]

Author: Lai Jiangshan <[email protected]>
Date:   Mon Nov 8 20:43:53 2021 +0800

    KVM: X86: Ensure that dirty PDPTRs are loaded
    
    [ Upstream commit 2c5653caecc4807b8abfe9c41880ac38417be7bf ]
    
    For VMX with EPT, dirty PDPTRs need to be loaded before the next vmentry
    via vmx_load_mmu_pgd().
    
    But not all paths that call load_pdptrs() will cause vmx_load_mmu_pgd()
    to be invoked.  Normally, kvm_mmu_reset_context() is used to cause
    KVM_REQ_LOAD_MMU_PGD, but sometimes it is skipped:
    
    * commit d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and
    CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs()
    when changing CR0.CD and CR0.NW.
    
    * commit 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current
    PCID on MOV CR3 w/ flush") skips KVM_REQ_LOAD_MMU_PGD after
    load_pdptrs() when rewriting the CR3 with the same value.
    
    * commit a91a7c709600 ("KVM: X86: Don't reset mmu context when
    toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
    load_pdptrs() when changing CR4.PGE.
    
    Fixes: d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
    Fixes: 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
    Fixes: a91a7c709600 ("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
    Signed-off-by: Lai Jiangshan <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

KVM: x86: Exit to userspace if emulation prepared a completion callback [+ + +]

Author: Hou Wenlong <[email protected]>
Date:   Tue Nov 2 17:15:32 2021 +0800

    KVM: x86: Exit to userspace if emulation prepared a completion callback
    
    [ Upstream commit adbfb12d4c4517a8adde23a7fc46538953d56eea ]
    
    em_rdmsr() and em_wrmsr() return X86EMUL_IO_NEEDED if MSR accesses
    required an exit to userspace. However, x86_emulate_insn() doesn't return
    X86EMUL_*, so x86_emulate_instruction() doesn't directly act on
    X86EMUL_IO_NEEDED; instead, it looks for other signals to differentiate
    between PIO, MMIO, etc. causing RDMSR/WRMSR emulation not to
    exit to userspace now.
    
    Nevertheless, if the userspace_msr_exit_test testcase in selftests
    is changed to test RDMSR/WRMSR with a forced emulation prefix,
    the test passes.  What happens is that first userspace exit
    information is filled but the userspace exit does not happen.
    Because x86_emulate_instruction() returns 1, the guest retries
    the instruction---but this time RIP has already been adjusted
    past the forced emulation prefix, so the guest executes RDMSR/WRMSR
    and the userspace exit finally happens.
    
    Since the X86EMUL_IO_NEEDED path has provided a complete_userspace_io
    callback, x86_emulate_instruction() can just return 0 if the
    callback is not NULL. Then RDMSR/WRMSR instruction emulation will
    exit to userspace directly, without the RDMSR/WRMSR vmexit.
    
    Fixes: 1ae099540e8c7 ("KVM: x86: Allow deflecting unknown MSR accesses to user space")
    Signed-off-by: Hou Wenlong <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Message-Id: <56f9df2ee5c05a81155e2be366c9dc1f7adc8817.1635842679.git.houwenlong93@linux.alibaba.com>
    Signed-off-by: Sasha Levin <[email protected]>

KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg [+ + +]

Author: Sean Christopherson <[email protected]>
Date:   Tue Nov 2 17:15:29 2021 +0800

    KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg
    
    [ Upstream commit 5e854864ee4384736f27a986633bae21731a4e4e ]
    
    Truncate the new EIP to a 32-bit value when handling EMULTYPE_SKIP as the
    decode phase does not truncate _eip.  Wrapping the 32-bit boundary is
    legal if and only if CS is a flat code segment, but that check is
    implicitly handled in the form of limit checks in the decode phase.
    
    Opportunistically prepare for a future fix by storing the result of any
    truncation in "eip" instead of "_eip".
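
    The truncation itself is plain modular arithmetic. A minimal sketch,
    assuming a hypothetical helper skip_insn_rip_model() rather than the
    emulator's real context structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Advance the instruction pointer past an instruction of insn_len bytes.
 * Outside 64-bit mode the result is truncated to 32 bits, so stepping
 * over the 4 GiB boundary wraps around to a low address.
 */
uint64_t skip_insn_rip_model(uint64_t rip, unsigned int insn_len,
                             bool long_mode)
{
    uint64_t next;

    assert(insn_len >= 1 && insn_len <= 15); /* x86 instruction length */
    next = rip + insn_len;
    return long_mode ? next : (uint32_t)next;
}
```

    For example, skipping a 3-byte instruction at 0xfffffffe yields 0x1
    in 32-bit mode, while the untruncated sum would be 0x100000001.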
    
    Fixes: 1957aa63be53 ("KVM: VMX: Handle single-step #DB for EMULTYPE_SKIP on EPT misconfig")
    Signed-off-by: Sean Christopherson <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Message-Id: <093eabb1eab2965201c9b018373baf26ff256d85.1635842679.git.houwenlong93@linux.alibaba.com>
    Signed-off-by: Sasha Levin <[email protected]>

Linux: Linux 5.15.27 [+ + +]

Author: Greg Kroah-Hartman <[email protected]>
Date:   Tue Mar 8 19:12:55 2022 +0100

    Linux 5.15.27
    
    Link: https://lore.kernel.org/r/[email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: Florian Fainelli <[email protected]>
    Tested-by: Shuah Khan <[email protected]>
    Tested-by: Fox Chen <[email protected]>
    Tested-by: Linux Kernel Functional Testing <[email protected]>
    Tested-by: Ron Economos <[email protected]>
    Tested-by: Jon Hunter <[email protected]>
    Tested-by: Bagas Sanjaya <[email protected]>
    Tested-by: Sudip Mukherjee <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mac80211: fix EAPoL rekey fail in 802.3 rx path [+ + +]

Author: Deren Wu <[email protected]>
Date:   Sun Feb 13 00:20:15 2022 +0800

    mac80211: fix EAPoL rekey fail in 802.3 rx path
    
    commit 610d086d6df0b15c3732a7b4a5b0f1c3e1b84d4c upstream.
    
    mac80211 sets the NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211
    capability for the upper layer by default. That means EAPoL packets
    should be passed through the nl80211 path only and should not be sent
    to the netdevice directly. Meanwhile, wpa_supplicant does not register
    a socket to listen for EAPoL skbs on the netdevice.
    
    However, there is no control_port_protocol handler in mac80211 for 802.3
    RX packets, so if SUPPORTS_RX_DECAP_OFFLOAD is enabled the mac80211
    driver passes the EAPoL rekey frame up to the netdevice, where
    wpa_supplicant never sees it. As a result, STA rekeying always fails
    if the EAPoL frame goes through the 802.3 path.
    
    To avoid this problem, handle this frame the same way as in the 802.11
    path before putting it into the network stack.
    
    This also addresses a potential security issue in 802.3 RX mode that was
    previously fixed in commit a8c4d76a8dd4 ("mac80211: do not accept/forward
    invalid EAPOL frames").
    
    Cc: [email protected] # 5.12+
    Fixes: 80a915ec4427 ("mac80211: add rx decapsulation offload support")
    Signed-off-by: Deren Wu <[email protected]>
    Link: https://lore.kernel.org/r/6889c9fced5859ebb088564035f84fd0fa792a49.1644680751.git.deren.wu@mediatek.com
    [fix typos, update comment and add note about security issue]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mac80211: fix forwarded mesh frames AC & queue selection [+ + +]

Author: Nicolas Escande <[email protected]>
Date:   Mon Feb 14 18:32:14 2022 +0100

    mac80211: fix forwarded mesh frames AC & queue selection
    
    commit 859ae7018316daa4adbc496012dcbbb458d7e510 upstream.
    
    There are two problems with the current code that have been highlighted
    with the AQL feature that is now enabled by default.
    
    First problem is in ieee80211_rx_h_mesh_fwding(),
    ieee80211_select_queue_80211() is used on received packets to choose
    the sending AC queue of the forwarding packet although this function
    should only be called on TX packet (it uses ieee80211_tx_info).
    This ends with forwarded mesh packets being sent on an unrelated, random
    AC queue. To fix that, the AC queue can be inferred directly from skb->priority
    which has been extracted from QOS info (see ieee80211_parse_qos()).
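
    Inferring the AC from the priority is a table lookup. The sketch
    below is a user-space model: the enum and table are hypothetical
    copies written from memory of mac80211's 802.1D-to-AC mapping and
    should be checked against the tree before being relied upon.

```c
#include <assert.h>

/* Access categories, highest priority first (model of the mac80211 enum). */
enum ac_model {
    AC_VO_MODEL = 0, /* voice */
    AC_VI_MODEL = 1, /* video */
    AC_BE_MODEL = 2, /* best effort */
    AC_BK_MODEL = 3, /* background */
};

/* 802.1D user priority (0..7) to access category. */
static const enum ac_model prio_to_ac_model[8] = {
    AC_BE_MODEL, AC_BK_MODEL, AC_BK_MODEL, AC_BE_MODEL,
    AC_VI_MODEL, AC_VI_MODEL, AC_VO_MODEL, AC_VO_MODEL,
};

/* Pick the AC from the priority already parsed out of the QoS header. */
enum ac_model ac_from_priority_model(unsigned int priority)
{
    assert(priority < 8);
    return prio_to_ac_model[priority];
}
```

    The lookup depends only on the received frame's priority, so it does
    not read any TX-only state from a frame that is still on the RX path.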
    
    Second problem is the value of queue_mapping set on forwarded mesh
    frames via skb_set_queue_mapping() is not the AC of the packet but a
    hardware queue index. This may or may not work depending on AC to HW
    queue mapping which is driver specific.
    
    Both of these issues lead to improper AC selection while forwarding
    mesh packets and, more importantly, to improper airtime accounting
    (which is done on a per-STA, per-AC basis), which caused traffic
    stalls with the introduction of AQL.
    
    Fixes: cf44012810cc ("mac80211: fix unnecessary frame drops in mesh fwding")
    Fixes: d3c1597b8d1b ("mac80211: fix forwarded mesh frame queue mapping")
    Co-developed-by: Remi Pommarel <[email protected]>
    Signed-off-by: Remi Pommarel <[email protected]>
    Signed-off-by: Nicolas Escande <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mac80211: treat some SAE auth steps as final [+ + +]

Author: Johannes Berg <[email protected]>
Date:   Thu Feb 24 10:39:34 2022 +0100

    mac80211: treat some SAE auth steps as final
    
    commit 94d9864cc86f572f881db9b842a78e9d075493ae upstream.
    
    When we get anti-clogging token required (added by the commit
    mentioned below), or the other status codes added by the later
    commit 4e56cde15f7d ("mac80211: Handle special status codes in
    SAE commit") we currently just pretend (towards the internal
    state machine of authentication) that we didn't receive anything.
    
    This has the undesirable consequence of retransmitting the prior
    frame, which is not expected, because the timer is still armed.
    
    If we just disarm the timer at that point, it would result in
    the undesirable side effect of being in this state indefinitely
    if userspace crashes, or so.
    
    So to fix this, reset the timer and set a new auth_data->waiting
    in order to have no more retransmissions, but to have the data
    destroyed when the timer actually fires, which will only happen
    if userspace didn't continue (i.e. crashed or abandoned it.)
    
    Fixes: a4055e74a2ff ("mac80211: Don't destroy auth data in case of anti-clogging")
    Reported-by: Jouni Malinen <[email protected]>
    Link: https://lore.kernel.org/r/20220224103932.75964e1d7932.Ia487f91556f29daae734bf61f8181404642e1eec@changeid
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work [+ + +]

Author: JaeMan Park <[email protected]>
Date:   Thu Jan 13 15:02:35 2022 +0900

    mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
    
    [ Upstream commit cacfddf82baf1470e5741edeecb187260868f195 ]
    
    In mac80211_hwsim, the probe_req frame is created and sent while
    scanning. It is sent with ieee80211_tx_info which is not initialized.
    Uninitialized ieee80211_tx_info can cause problems when using
    mac80211_hwsim with wmediumd. wmediumd checks the tx_rates field of
    ieee80211_tx_info and doesn't relay probe_req frame to other clients
    even if it is a broadcasting message.
    
    Call ieee80211_tx_prepare_skb() to initialize ieee80211_tx_info for
    the probe_req that is created by hw_scan_work in mac80211_hwsim.
    
    Signed-off-by: JaeMan Park <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [fix memory leak]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mac80211_hwsim: report NOACK frames in tx_status [+ + +]

Author: Benjamin Beichler <[email protected]>
Date:   Tue Jan 11 22:13:26 2022 +0000

    mac80211_hwsim: report NOACK frames in tx_status
    
    [ Upstream commit 42a79960ffa50bfe9e0bf5d6280be89bf563a5dd ]
    
    Add IEEE80211_TX_STAT_NOACK_TRANSMITTED to tx_status flags to have proper
    statistics for non-acked frames.
    
    Signed-off-by: Benjamin Beichler <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

MAINTAINERS: adjust file entry for of_net.c after movement [+ + +]

Author: Lukas Bulwahn <[email protected]>
Date:   Sat Oct 16 07:58:15 2021 +0200

    MAINTAINERS: adjust file entry for of_net.c after movement
    
    commit f616447034a120b18f6e612814641e7d8f5d7f0a upstream.
    
    Commit e330fb14590c ("of: net: move of_net under net/") moves of_net.c
    to ./net/core/, but fails to adjust the reference to this file in
    MAINTAINERS.
    
    Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:
    
       warning: no file matches    F:    drivers/of/of_net.c
    
    Adjust the file entry after this file movement.
    
    Signed-off-by: Lukas Bulwahn <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

memfd: fix F_SEAL_WRITE after shmem huge page allocated [+ + +]

Author: Hugh Dickins <[email protected]>
Date:   Fri Mar 4 20:29:01 2022 -0800

    memfd: fix F_SEAL_WRITE after shmem huge page allocated
    
    commit f2b277c4d1c63a85127e8aa2588e9cc3bd21cb99 upstream.
    
    Wangyong reports: after enabling tmpfs filesystem to support transparent
    hugepage with the following command:
    
      echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled
    
    the docker program tries to add F_SEAL_WRITE through the following
    command, but it fails unexpectedly with errno EBUSY:
    
      fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.
    
    That is because memfd_tag_pins() and memfd_wait_for_pins() were never
    updated for shmem huge pages: checking page_mapcount() against
    page_count() is hopeless on THP subpages - they need to check
    total_mapcount() against page_count() on THP heads only.
    
    Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
    (compared != 1): either can be justified, but given the non-atomic
    total_mapcount() calculation, it is better now to be strict.  Bear in
    mind that total_mapcount() itself scans all of the THP subpages, when
    choosing to take an XA_CHECK_SCHED latency break.
    
    Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
    page has been swapped out since memfd_tag_pins(), then its refcount must
    have fallen, and so it can safely be untagged.
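
    For reference, the userspace side of the report can be exercised
    with a few lines of C. This is a minimal sketch assuming Linux with
    glibc's memfd_create() wrapper; seal_write_demo() is a hypothetical
    helper, and it only shows the sealing call path (reproducing the
    EBUSY failure additionally needs shmem huge pages enabled as
    described above).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a sealable memfd of len bytes, add F_SEAL_WRITE, and confirm
 * that a later write() is refused. Returns 0 on success, -errno otherwise.
 */
int seal_write_demo(size_t len)
{
    int ret = 0;
    int fd;

    assert(len > 0);
    fd = memfd_create("seal-demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -errno;

    if (ftruncate(fd, (off_t)len) < 0 ||
        fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0)
        ret = -errno;   /* the reported bug surfaced here as EBUSY */
    else if (write(fd, "x", 1) != -1 || errno != EPERM)
        ret = -EIO;     /* a write-sealed file must reject writes */

    close(fd);
    return ret;
}
```

    A return value of -EBUSY from this helper would correspond to the
    failure in the report.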
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Hugh Dickins <[email protected]>
    Reported-by: Zeal Robot <[email protected]>
    Reported-by: wangyong <[email protected]>
    Cc: Mike Kravetz <[email protected]>
    Cc: Matthew Wilcox (Oracle) <[email protected]>
    Cc: CGEL ZTE <[email protected]>
    Cc: Kirill A. Shutemov <[email protected]>
    Cc: Song Liu <[email protected]>
    Cc: Yang Yang <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

MIPS: fix local_{add,sub}_return on MIPS64 [+ + +]

Author: Huang Pei <[email protected]>
Date:   Wed Dec 15 16:44:57 2021 +0800

    MIPS: fix local_{add,sub}_return on MIPS64
    
    [ Upstream commit 277c8cb3e8ac199f075bf9576ad286687ed17173 ]
    
    Use "daddu/dsubu" for long int on MIPS64 instead of "addu/subu"
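
    Why the 32-bit instructions are wrong for a 64-bit long can be shown
    with a small arithmetic model. The helpers below are hypothetical
    and only mimic the well-defined case of a 32-bit add whose result is
    sign-extended to 64 bits; they are not the kernel's inline assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Model of addu: add the low 32 bits, sign-extend the 32-bit result. */
int64_t addu_model(int64_t a, int64_t b)
{
    uint32_t sum32 = (uint32_t)a + (uint32_t)b;

    /* Portable sign extension from bit 31. */
    return (int64_t)sum32 - ((sum32 & 0x80000000u) ? (INT64_C(1) << 32) : 0);
}

/* Model of daddu: full 64-bit wrapping add. */
int64_t daddu_model(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

/* The two models diverge as soon as the sum leaves the 32-bit range. */
void local_add_selfcheck(void)
{
    assert(addu_model(1, 2) == daddu_model(1, 2));
    assert(addu_model(INT32_MAX, 1) != daddu_model(INT32_MAX, 1));
}
```

    A local_t counter held in a 64-bit long would therefore be corrupted
    once it crosses the 32-bit boundary if the 32-bit form were used.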
    
    Fixes: 7232311ef14c ("local_t: mips extension")
    Signed-off-by: Huang Pei <[email protected]>
    Signed-off-by: Thomas Bogendoerfer <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

MIPS: ralink: mt7621: do memory detection on KSEG1 [+ + +]

Author: Chuanhong Guo <[email protected]>
Date:   Fri Feb 11 08:13:44 2022 +0800

    MIPS: ralink: mt7621: do memory detection on KSEG1
    
    [ Upstream commit cc19db8b312a6c75645645f5cc1b45166b109006 ]
    
    It's reported that current memory detection code occasionally detects
    larger memory under some bootloaders.
    Current memory detection code tests whether address space wraps around
    on KSEG0, which is unreliable because it's cached.
    
    Rewrite memory size detection to perform the same test on KSEG1 instead.
    While at it, this patch also does the following two things:
    1. use a fixed pattern instead of a random function pointer as the magic
       value.
    2. add an additional memory write and a second comparison as part of the
       test to prevent possible smaller memory detection result due to
       leftover values in memory.
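
    The two-step test can be modelled in user space with an array whose
    indices wrap at the installed size. This is a sketch under
    assumptions: struct ram_model, TEST_PATTERN_MODEL and
    wraps_at_model() are hypothetical names, and the array stands in
    for uncached KSEG1 accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TEST_PATTERN_MODEL 0xaa5555aau /* fixed pattern, value assumed */

struct ram_model {
    uint32_t *words;  /* backing storage */
    size_t installed; /* words really present; larger offsets wrap */
};

static uint32_t *ram_at(struct ram_model *ram, size_t word)
{
    return &ram->words[word % ram->installed];
}

/* True if offset `size` aliases offset 0, i.e. the address space wraps. */
bool wraps_at_model(struct ram_model *ram, size_t size)
{
    assert(ram != NULL && ram->installed > 0);

    *ram_at(ram, 0) = TEST_PATTERN_MODEL;
    if (*ram_at(ram, 0) != *ram_at(ram, size))
        return false;

    /* Second write and compare: rules out a leftover matching value. */
    *ram_at(ram, 0) = ~TEST_PATTERN_MODEL;
    return *ram_at(ram, 0) == *ram_at(ram, size);
}
```

    The second comparison is what keeps a stale copy of the pattern at
    the probed offset from being mistaken for a wraparound.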
    
    Fixes: 139c949f7f0a ("MIPS: ralink: mt7621: add memory detection support")
    Reported-by: Rui Salvaterra <[email protected]>
    Signed-off-by: Chuanhong Guo <[email protected]>
    Tested-by: Sergio Paracuellos <[email protected]>
    Tested-by: Rui Salvaterra <[email protected]>
    Signed-off-by: Thomas Bogendoerfer <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

MIPS: ralink: mt7621: use bitwise NOT instead of logical [+ + +]

Author: Ilya Lipnitskiy <[email protected]>
Date:   Mon Feb 28 17:15:07 2022 -0800

    MIPS: ralink: mt7621: use bitwise NOT instead of logical
    
    [ Upstream commit 5d8965704fe5662e2e4a7e4424a2cbe53e182670 ]
    
    It was the intention to reverse the bits, not make them all zero by
    using logical NOT operator.
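
    The difference between the two operators in C, as a minimal sketch
    (the helper names and the sample pattern are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Logical NOT: yields 1 for zero and 0 for any non-zero value. */
uint32_t logical_not_model(uint32_t pattern)
{
    return !pattern;
}

/* Bitwise NOT: inverts every bit of the value. */
uint32_t bitwise_not_model(uint32_t pattern)
{
    return ~pattern;
}

/* For a non-zero pattern only the bitwise form gives a usable inverse. */
void not_operator_selfcheck(void)
{
    assert(logical_not_model(0xaa5555aau) == 0);
    assert(bitwise_not_model(0xaa5555aau) == 0x55aaaa55u);
}
```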
    
    Fixes: cc19db8b312a ("MIPS: ralink: mt7621: do memory detection on KSEG1")
    Suggested-by: Chuanhong Guo <[email protected]>
    Signed-off-by: Ilya Lipnitskiy <[email protected]>
    Reviewed-by: Sergio Paracuellos <[email protected]>
    Signed-off-by: Thomas Bogendoerfer <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mips: setup: fix setnocoherentio() boolean setting [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Mon Feb 21 09:50:29 2022 -0800

    mips: setup: fix setnocoherentio() boolean setting
    
    commit 1e6ae0e46e32749b130f1823da30cea9aa2a59a0 upstream.
    
    Correct a typo/pasto: setnocoherentio() should set
    dma_default_coherent to false, not true.
    
    Fixes: 14ac09a65e19 ("MIPS: refactor the runtime coherent vs noncoherent DMA indicators")
    Signed-off-by: Randy Dunlap <[email protected]>
    Cc: Christoph Hellwig <[email protected]>
    Cc: Thomas Bogendoerfer <[email protected]>
    Cc: [email protected]
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Thomas Bogendoerfer <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls [+ + +]

Author: Daniel Borkmann <[email protected]>
Date:   Fri Mar 4 15:26:32 2022 +0100

    mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
    
    commit 0708a0afe291bdfe1386d74d5ec1f0c27e8b9168 upstream.
    
    syzkaller was recently triggering an oversized kvmalloc() warning via
    xdp_umem_create().
    
    The triggered warning was added back in 7661809d493b ("mm: don't allow
    oversized kvmalloc() calls"). The rationale for the warning for huge
    kvmalloc sizes was as a reaction to a security bug where the size was
    more than UINT_MAX but not everything was prepared to handle unsigned
    long sizes.
    
    Anyway, the AF_XDP related call trace from this syzkaller report was:
    
      kvmalloc include/linux/mm.h:806 [inline]
      kvmalloc_array include/linux/mm.h:824 [inline]
      kvcalloc include/linux/mm.h:829 [inline]
      xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
      xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
      xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
      xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
      __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
      __do_sys_setsockopt net/socket.c:2187 [inline]
      __se_sys_setsockopt net/socket.c:2184 [inline]
      __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Björn mentioned that requests for >2GB allocation can still be valid:
    
      The structure that is being allocated is the page-pinning accounting.
      AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
      still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
      PAGE_SIZE on 64 bit systems). [...]
    
      I could just change from U32_MAX to INT_MAX, but as I stated earlier
      that has a hacky feeling to it. [...] From my perspective, the code
      isn't broken, with the memcg limits in consideration. [...]
    
    Linus says:
    
      [...] Pretty much every time this has come up, the kernel warning has
      shown that yes, the code was broken and there really wasn't a reason
      for doing allocations that big.
    
      Of course, some people would be perfectly fine with the allocation
      failing, they just don't want the warning. I didn't want __GFP_NOWARN
      to shut it up originally because I wanted people to see all those
      cases, but these days I think we can just say "yeah, people can shut
      it up explicitly by saying 'go ahead and fail this allocation, don't
      warn about it'".
    
      So enough time has passed that by now I'd certainly be ok with [it].
    
    Thus allow call-sites to silence such userspace triggered splats if the
    allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
    to kvcalloc() this is already the case, so nothing else needed there.
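
    The resulting behaviour can be sketched as a user-space model. The
    names kvmalloc_model(), GFP_NOWARN_MODEL and warned_model are
    hypothetical; a flag variable stands in for the kernel warning and
    malloc() for the real allocator.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define GFP_NOWARN_MODEL 0x1u /* stands in for __GFP_NOWARN */

static bool warned_model;     /* set where the kernel would warn */

/* Oversized requests always fail; they warn only without the flag. */
void *kvmalloc_model(size_t size, unsigned int flags)
{
    if (size > INT_MAX) {
        if (!(flags & GFP_NOWARN_MODEL))
            warned_model = true;
        return NULL;
    }
    return malloc(size);
}
```

    Callers that pass the flag still get a NULL return for oversized
    requests and must handle it; only the warning is suppressed.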
    
    Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
    Reported-by: [email protected]
    Suggested-by: Linus Torvalds <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Tested-by: [email protected]
    Cc: Björn Töpel <[email protected]>
    Cc: Magnus Karlsson <[email protected]>
    Cc: Willy Tarreau <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Alexei Starovoitov <[email protected]>
    Cc: Andrii Nakryiko <[email protected]>
    Cc: Jakub Kicinski <[email protected]>
    Cc: David S. Miller <[email protected]>
    Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
    Link: https://lore.kernel.org/bpf/[email protected]
    Reviewed-by: Leon Romanovsky <[email protected]>
    Acked-by: Michal Hocko <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
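
The behaviour described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `model_kvmalloc`, `MODEL_NOWARN`, and `model_warn_count` are hypothetical names, with `MODEL_NOWARN` standing in for `__GFP_NOWARN`. An oversized request fails either way; the flag only decides whether a warning is emitted.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag standing in for the kernel's __GFP_NOWARN. */
#define MODEL_NOWARN 0x1u

/* Counts emitted warnings so callers can observe the behaviour. */
static unsigned int model_warn_count;

/*
 * Userspace model of an allocator that refuses requests larger than INT_MAX.
 * The request always fails; the flag only decides whether a warning is logged.
 */
static void *model_kvmalloc(size_t size, unsigned int flags)
{
	if (size > (size_t)INT_MAX) {
		if (!(flags & MODEL_NOWARN)) {
			model_warn_count++;
			fprintf(stderr, "warning: oversized allocation of %zu bytes\n", size);
		}
		return NULL;
	}
	return malloc(size);
}
```

A caller that can tolerate the failure passes `MODEL_NOWARN` and checks for `NULL`; a caller that omits the flag still gets `NULL` but also triggers the warning.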

mm: defer kmemleak object creation of module_alloc() [+ + +]

Author: Kefeng Wang <[email protected]>
Date:   Fri Jan 14 14:04:11 2022 -0800

    mm: defer kmemleak object creation of module_alloc()
    
    [ Upstream commit 60115fa54ad7b913b7cb5844e6b7ffeb842d55f2 ]
    
    Yongqiang reports a kmemleak panic when a module is loaded and unloaded
    (insmod/rmmod) with KASAN enabled (without KASAN_VMALLOC) on x86[1].
    
    When the module area allocates memory, its kmemleak_object is created
    successfully, but the KASAN shadow memory of the module allocation is not
    ready yet, so when kmemleak scans the module's pointer, it panics in the
    KASAN check because no shadow memory exists.
    
      module_alloc
        __vmalloc_node_range
          kmemleak_vmalloc
                                    kmemleak_scan
                                      update_checksum
        kasan_module_alloc
          kmemleak_ignore
    
    Note that there is no problem if KASAN_VMALLOC is enabled, because the
    entire shadow memory of the module area is preallocated.  Thus, the bug
    only exists on architectures that support dynamic allocation of the
    module area per module load; for now, only x86/arm64/s390 are involved.
    
    Add a VM_DEFER_KMEMLEAK flag and defer registering the vmalloc'ed object
    with kmemleak in module_alloc() to fix this issue.
    
    [1] https://lore.kernel.org/all/[email protected]/
    
    [[email protected]: fix build]
      Link: https://lkml.kernel.org/r/[email protected]
    [[email protected]: simplify ifdefs, per Andrey]
      Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 793213a82de4 ("s390/kasan: dynamic shadow mem allocation for modules")
    Fixes: 39d114ddc682 ("arm64: add KASAN support")
    Fixes: bebf56a1b176 ("kasan: enable instrumentation of global variables")
    Signed-off-by: Kefeng Wang <[email protected]>
    Reported-by: Yongqiang Liu <[email protected]>
    Cc: Andrey Konovalov <[email protected]>
    Cc: Andrey Ryabinin <[email protected]>
    Cc: Dmitry Vyukov <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: Heiko Carstens <[email protected]>
    Cc: Vasily Gorbik <[email protected]>
    Cc: Christian Borntraeger <[email protected]>
    Cc: Alexander Gordeev <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Cc: Alexander Potapenko <[email protected]>
    Cc: Kefeng Wang <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mptcp: Correctly set DATA_FIN timeout when number of retransmits is large [+ + +]

Author: Mat Martineau <[email protected]>
Date:   Thu Feb 24 16:52:59 2022 -0800

    mptcp: Correctly set DATA_FIN timeout when number of retransmits is large
    
    commit 877d11f0332cd2160e19e3313e262754c321fa36 upstream.
    
    Syzkaller with UBSAN uncovered a scenario where a large number of
    DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN
    timeout calculation:
    
    ================================================================================
    UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
    shift exponent 32 is too large for 32-bit type 'unsigned int'
    CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted 5.17.0-rc2-00630-g5fbf21c90c60 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    Workqueue: events mptcp_worker
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
     ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
     __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
     mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
     __mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
     mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
     process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
     worker_thread+0x95/0xe10 kernel/workqueue.c:2454
     kthread+0x2f4/0x3b0 kernel/kthread.c:377
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
     </TASK>
    ================================================================================
    
    This change limits the maximum timeout by limiting the size of the
    shift, which keeps all intermediate values in-bounds.
    
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259
    Fixes: 6477dd39e62c ("mptcp: Retransmit DATA_FIN")
    Acked-by: Paolo Abeni <[email protected]>
    Signed-off-by: Mat Martineau <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
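
A minimal userspace sketch of clamping the shift, assuming the timeout is a minimum value shifted left by the retransmit count. `MODEL_RTO_MIN`, `MODEL_RTO_MAX`, and both function names are hypothetical, and the numeric bounds are chosen only for illustration. The cap is derived from the ratio of the two bounds, so the shifted value stays within the 32-bit type no matter how many retransmits occurred.

```c
#include <stdint.h>

/* Hypothetical bounds, chosen for illustration only. */
#define MODEL_RTO_MIN 200u
#define MODEL_RTO_MAX 120000u

/* Floor of log2(v) for v > 0. */
static unsigned int model_ilog2(uint32_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/* Backoff timeout: the shift count is clamped so it can never reach 32. */
static uint32_t model_datafin_timeout(uint32_t retransmits)
{
	uint32_t max_shift = model_ilog2(MODEL_RTO_MAX / MODEL_RTO_MIN);

	if (retransmits > max_shift)
		retransmits = max_shift;
	return MODEL_RTO_MIN << retransmits;
}
```

With these bounds the largest permitted shift is 9, so a retransmit count of 32 (the value in the UBSAN report) yields the same timeout as a count of 9.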

mtd: spi-nor: Fix mtd size for s3an flashes [+ + +]

Author: Tudor Ambarus <[email protected]>
Date:   Tue Dec 7 16:02:41 2021 +0200

    mtd: spi-nor: Fix mtd size for s3an flashes
    
    [ Upstream commit f656b419d41aabafb6b526abc3988dfbf2e5c1ba ]
    
    Before the blamed commit, s3an_nor_scan() was called after the mtd
    size was set from params->size, and it overwrote the mtd size with
    '8 * nor->page_size * nor->info->n_sectors' when XSR_PAGESIZE was
    set. When s3an_post_sfdp_fixups() was introduced, we failed to
    update the mtd size for the s3an flashes. Fix the mtd size by
    updating both nor->params->size (which will update the mtd_info
    size later on) and nor->mtd.size (which is used in
    spi_nor_set_addr_width()).
    
    Fixes: 641edddb4f43 ("mtd: spi-nor: Add s3an_post_sfdp_fixups()")
    Signed-off-by: Tudor Ambarus <[email protected]>
    Reviewed-by: Pratyush Yadav <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>
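
The size formula quoted above can be written out as a tiny helper. This is a sketch only: `model_s3an_mtd_size` is a hypothetical name, and the parameters simply mirror the `page_size` and `n_sectors` fields named in the commit message.

```c
#include <stdint.h>

/*
 * Size in bytes from the formula in the commit message:
 * 8 * page_size * n_sectors. The multiplication is done in 64 bits
 * so that large inputs do not overflow a 32-bit intermediate.
 */
static uint64_t model_s3an_mtd_size(uint32_t page_size, uint32_t n_sectors)
{
	return 8ull * page_size * n_sectors;
}
```

For example, a page size of 256 bytes and 64 sectors (illustrative values, not taken from any particular part) give 131072 bytes.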

net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic [+ + +]

Author: Raed Salem <[email protected]>
Date:   Thu Dec 2 17:43:50 2021 +0200

    net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
    
    [ Upstream commit 5352859b3bfa0ca188b2f1d2c1436fddc781e3b6 ]
    
    IPsec crypto offload always sets the ethernet segment checksum flags with
    the inner L4 header checksum flag enabled for encapsulated IPsec offloaded
    packets, regardless of the encapsulated L4 header type, and even if no
    such header exists in the first place. This breaks non-TCP/UDP traffic.
    
    Set the inner L4 checksum flag only when the encapsulated L4 header
    protocol is TCP/UDP, using the software parser swp_inner_l4_offset field
    as the indication.
    
    Fixes: 5cfb540ef27b ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.")
    Signed-off-by: Raed Salem <[email protected]>
    Reviewed-by: Maor Dickman <[email protected]>
    Signed-off-by: Saeed Mahameed <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net/mlx5e: IPsec: Refactor checksum code in tx data path [+ + +]

Author: Raed Salem <[email protected]>
Date:   Tue Oct 26 10:10:42 2021 +0300

    net/mlx5e: IPsec: Refactor checksum code in tx data path
    
    [ Upstream commit 428ffea0711a11efa0c1c4ee1fac27903ed091be ]
    
    Part of the code that is related solely to IPsec is always compiled into
    the driver, regardless of whether the IPsec functionality is enabled or
    disabled. This adds an unnecessary branch to the Tx data path when IPsec
    is disabled.
    
    Move the IPsec-related code to an IPsec-related file, so that when IPsec
    is disabled, and because of the unlikely macro, the compiler should be
    able to optimize and omit the IPsec checksum code altogether from the Tx
    data path.
    
    Signed-off-by: Raed Salem <[email protected]>
    Reviewed-by: Emeel Hakim <[email protected]>
    Signed-off-by: Saeed Mahameed <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net/smc: fix connection leak [+ + +]

Author: D. Wythe <[email protected]>
Date:   Thu Feb 24 23:26:19 2022 +0800

    net/smc: fix connection leak
    
    commit 9f1c50cf39167ff71dc5953a3234f3f6eeb8fcb5 upstream.
    
    There's a potential leak issue under the following execution sequence:
    
    smc_release                             smc_connect_work
    if (sk->sk_state == SMC_INIT)
                                            send_clc_confirm
            tcp_abort();
                                            ...
                                            sk.sk_state = SMC_ACTIVE
    smc_close_active
    switch(sk->sk_state) {
    ...
    case SMC_ACTIVE:
            smc_close_final()
            // then wait peer closed
    
    Unfortunately, tcp_abort() may discard CLC CONFIRM messages that are
    still in the tcp send buffer, in which case our connection token cannot
    be delivered to the server side, which means that we cannot get a
    passive close message at all. Therefore, it is impossible for the
    connection to be disconnected at all.
    
    This patch takes a very simple approach to avoid this issue: once the
    state has changed to SMC_ACTIVE after tcp_abort(), we actively abort the
    smc connection. Considering that the state was SMC_INIT before
    tcp_abort(), abandoning the complete disconnection process should not
    cause too many problems.
    
    In fact, this problem may exist as long as the CLC CONFIRM message is
    not received by the server. Whether a timer should be added after
    smc_close_final() needs to be discussed in the future. But even so, this
    patch provides a faster release of the connection in the above case, so
    it should also be valuable.
    
    Fixes: 39f41f367b08 ("net/smc: common release code for non-accepted sockets")
    Signed-off-by: D. Wythe <[email protected]>
    Acked-by: Karsten Graul <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server [+ + +]

Author: D. Wythe <[email protected]>
Date:   Wed Mar 2 21:25:12 2022 +0800

    net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
    
    commit 4940a1fdf31c39f0806ac831cde333134862030b upstream.
    
    The cause of SMC_CLC_DECL_ERR_REGRMB on the server is very clear:
    whether a new SMC connection can be accepted depends not only on the
    limit on the number of connections, but also on the available rtoken
    entries. Since the rtoken release is triggered by the peer, while the
    connection count is decreased locally, many things can happen in the
    time between the two.
    
    The only thing that needs to be mentioned is that, now that all
    connection creations are completely protected by the
    smc_server_lgr_pending lock, it's enough to check only the available
    entries in rtokens_used_mask.
    
    Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)")
    Signed-off-by: D. Wythe <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client [+ + +]

Author: D. Wythe <[email protected]>
Date:   Wed Mar 2 21:25:11 2022 +0800

    net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
    
    commit 0537f0a2151375dcf90c1bbfda6a0aaf57164e89 upstream.
    
    The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB on the client
    is the following execution sequence:
    
    Server Conn A:           Server Conn B:                 Client Conn B:
    
    smc_lgr_unregister_conn
                            smc_lgr_register_conn
                            smc_clc_send_accept     ->
                                                            smc_rtoken_add
    smcr_buf_unuse
                    ->              Client Conn A:
                                    smc_rtoken_delete
    
    smc_lgr_unregister_conn() makes the current link available to be assigned
    to a new incoming connection while smcr_buf_unuse() has not executed yet,
    which means that smc_rtoken_add may fail because of insufficient rtoken
    entries. Reversing their execution order avoids this problem.
    
    Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
    Signed-off-by: D. Wythe <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe() [+ + +]

Author: Zheyu Ma <[email protected]>
Date:   Wed Mar 2 20:24:23 2022 +0800

    net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
    
    commit bd6f1fd5d33dfe5d1b4f2502d3694a7cc13f166d upstream.
    
    During driver initialization, the pointer of card info, i.e. the
    variable 'ci' is required. However, the definition of
    'com20020pci_id_table' reveals that this field is empty for some
    devices, which will cause null pointer dereference when initializing
    these devices.
    
    The following log reveals it:
    
    [    3.973806] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
    [    3.973819] RIP: 0010:com20020pci_probe+0x18d/0x13e0 [com20020_pci]
    [    3.975181] Call Trace:
    [    3.976208]  local_pci_probe+0x13f/0x210
    [    3.977248]  pci_device_probe+0x34c/0x6d0
    [    3.977255]  ? pci_uevent+0x470/0x470
    [    3.978265]  really_probe+0x24c/0x8d0
    [    3.978273]  __driver_probe_device+0x1b3/0x280
    [    3.979288]  driver_probe_device+0x50/0x370
    
    Fix this by checking whether the 'ci' is a null pointer first.
    
    Fixes: 8c14f9c70327 ("ARCNET: add com20020 PCI IDs with metadata")
    Signed-off-by: Zheyu Ma <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
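
The check described above can be illustrated with a simplified model. All names here (`model_card_info`, `model_device_id`, `model_probe`) are hypothetical stand-ins, not the driver's real types; the point is only that the probe path rejects a table entry whose card info pointer is NULL before touching it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the card info and ID table entry. */
struct model_card_info {
	int devcount;
};

struct model_device_id {
	unsigned int device;
	const struct model_card_info *ci;	/* may be NULL for some entries */
};

/* Probe model: reject entries without card info before dereferencing it. */
static int model_probe(const struct model_device_id *id)
{
	const struct model_card_info *ci = id->ci;

	if (!ci)
		return -EINVAL;

	/* Safe: ci is known to be non-NULL here. */
	return ci->devcount;
}
```

The error code is an assumption of this sketch; any negative errno would serve to make the probe fail cleanly instead of crashing.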

net: chelsio: cxgb3: check the return value of pci_find_capability() [+ + +]

Author: Jia-Ju Bai <[email protected]>
Date:   Fri Feb 25 04:37:27 2022 -0800

    net: chelsio: cxgb3: check the return value of pci_find_capability()
    
    [ Upstream commit 767b9825ed1765894e569a3d698749d40d83762a ]
    
    The function pci_find_capability() in t3_prep_adapter() can fail, so its
    return value should be checked.
    
    Fixes: 4d22de3e6cc4 ("Add support for the latest 1G/10G Chelsio adapter, T3")
    Reported-by: TOTE Robot <[email protected]>
    Signed-off-by: Jia-Ju Bai <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: dcb: disable softirqs in dcbnl_flush_dev() [+ + +]

Author: Vladimir Oltean <[email protected]>
Date:   Wed Mar 2 21:39:39 2022 +0200

    net: dcb: disable softirqs in dcbnl_flush_dev()
    
    [ Upstream commit 10b6bb62ae1a49ee818fc479cf57b8900176773e ]
    
    Ido Schimmel points out that since commit 52cff74eef5d ("dcbnl : Disable
    software interrupts before taking dcb_lock"), the DCB API can be called
    by drivers from softirq context.
    
    One such in-tree example is the chelsio cxgb4 driver:
    dcb_rpl
    -> cxgb4_dcb_handle_fw_update
       -> dcb_ieee_setapp
    
    If the firmware for this driver happened to send an event which resulted
    in a call to dcb_ieee_setapp() at the exact same time as another
    DCB-enabled interface was unregistering on the same CPU, the softirq
    would deadlock, because the interrupted process was already holding the
    dcb_lock in dcbnl_flush_dev().
    
    Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
    in the rest of the dcbnl code.
    
    Fixes: 91b0383fef06 ("net: dcb: flush lingering app table entries for unregistered devices")
    Reported-by: Ido Schimmel <[email protected]>
    Signed-off-by: Vladimir Oltean <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: dcb: flush lingering app table entries for unregistered devices [+ + +]

Author: Vladimir Oltean <[email protected]>
Date:   Thu Feb 24 18:01:54 2022 +0200

    net: dcb: flush lingering app table entries for unregistered devices
    
    commit 91b0383fef06f20b847fa9e4f0e3054ead0b1a1b upstream.
    
    If I'm not mistaken (and I don't think I am), the way in which the
    dcbnl_ops work is that drivers call dcb_ieee_setapp() and this populates
    the application table with dynamically allocated struct dcb_app_type
    entries that are kept in the module-global dcb_app_list.
    
    However, nobody keeps exact track of these entries, and although
    dcb_ieee_delapp() is supposed to remove them, nobody does so when the
    interface goes away (example: driver unbinds from device). So the
    dcb_app_list will contain lingering entries with an ifindex that no
    longer matches any device in dcb_app_lookup().
    
    Reclaim the lost memory by listening for the NETDEV_UNREGISTER event and
    flushing the app table entries of interfaces that are now gone.
    
    In fact something like this used to be done as part of the initial
    commit (blamed below), but it was done in dcbnl_exit() -> dcb_flushapp(),
    essentially at module_exit time. That became dead code after commit
    7a6b6f515f77 ("DCB: fix kconfig option") which essentially merged
    "tristate config DCB" and "bool config DCBNL" into a single "bool config
    DCB", so net/dcb/dcbnl.c could not be built as a module anymore.
    
    Commit 36b9ad8084bd ("net/dcb: make dcbnl.c explicitly non-modular")
    recognized this and deleted dcbnl_exit() and dcb_flushapp() altogether,
    leaving us with the version we have today.
    
    Since flushing application table entries can and should be done as soon
    as the netdevice disappears, fundamentally the commit that is to blame
    is the one that introduced the design of this API.
    
    Fixes: 9ab933ab2cc8 ("dcbnl: add appliction tlv handlers")
    Signed-off-by: Vladimir Oltean <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
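
The flush-on-unregister idea can be sketched in userspace as removing every list entry that carries the ifindex of the departed interface. The types and functions below (`model_app_entry`, `model_app_add`, `model_flush_dev`) are hypothetical, and the sketch deliberately omits the locking that the real list requires.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an app table entry keyed by interface index. */
struct model_app_entry {
	int ifindex;
	struct model_app_entry *next;
};

/* Prepend a new entry; returns the new list head (or the old one if malloc fails). */
static struct model_app_entry *model_app_add(struct model_app_entry *head, int ifindex)
{
	struct model_app_entry *e = malloc(sizeof(*e));

	if (!e)
		return head;
	e->ifindex = ifindex;
	e->next = head;
	return e;
}

/* Free every entry that belongs to ifindex; returns the number of entries freed. */
static int model_flush_dev(struct model_app_entry **head, int ifindex)
{
	int freed = 0;

	while (*head) {
		struct model_app_entry *e = *head;

		if (e->ifindex == ifindex) {
			*head = e->next;	/* unlink, then release the entry */
			free(e);
			freed++;
		} else {
			head = &e->next;
		}
	}
	return freed;
}
```

In this model, `model_flush_dev()` would be called from whatever handler observes that an interface has gone away, so no entry outlives its device.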

net: dsa: ocelot: seville: utilize of_mdiobus_register [+ + +]

Author: Colin Foster <[email protected]>
Date:   Sun Nov 28 17:57:36 2021 -0800

    net: dsa: ocelot: seville: utilize of_mdiobus_register
    
    [ Upstream commit 5186c4a05b9713138b762a49467a8ab9753cdb36 ]
    
    Switch seville to use of_mdiobus_register(bus, NULL) instead of just
    mdiobus_register. This code is about to be pulled into a separate module
    that can optionally define ports by the device_node.
    
    Signed-off-by: Colin Foster <[email protected]>
    Reviewed-by: Florian Fainelli <[email protected]>
    Reviewed-by: Vladimir Oltean <[email protected]>
    Tested-by: Vladimir Oltean <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: dsa: seville: register the mdiobus under devres [+ + +]

Author: Vladimir Oltean <[email protected]>
Date:   Mon Feb 7 18:15:51 2022 +0200

    net: dsa: seville: register the mdiobus under devres
    
    [ Upstream commit bd488afc3b39e045ba71aab472233f2a78726e7b ]
    
    As explained in commits:
    74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres")
    5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres")
    
    mdiobus_free() will panic when called from devm_mdiobus_free() <-
    devres_release_all() <- __device_release_driver(), and that mdiobus was
    not previously unregistered.
    
    The Seville VSC9959 switch is a platform device, so the initial set of
    constraints that I thought would cause this (I2C or SPI buses which call
    ->remove on ->shutdown) do not apply. But there is one more which
    applies here.
    
    If the DSA master itself is on a bus that calls ->remove from ->shutdown
    (like dpaa2-eth, which is on the fsl-mc bus), there is a device link
    between the switch and the DSA master, and device_links_unbind_consumers()
    will unbind the seville switch driver on shutdown.
    
    So the same treatment must be applied to all DSA switch drivers, which
    is: either use devres for both the mdiobus allocation and registration,
    or don't use devres at all.
    
    The seville driver has a code structure that could accommodate both the
    mdiobus_unregister and mdiobus_free calls, but it has an external
    dependency upon mscc_miim_setup() from mdio-mscc-miim.c, which calls
    devm_mdiobus_alloc_size() on its behalf. So rather than restructuring
    that, and exporting yet one more symbol mscc_miim_teardown(), let's work
    with devres and replace of_mdiobus_register with the devres variant.
    When we use all-devres, we can ensure that devres doesn't free a
    still-registered bus (it either runs both callbacks, or none).
    
    Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
    Signed-off-by: Vladimir Oltean <[email protected]>
    Reviewed-by: Florian Fainelli <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: ethernet: litex: Add the dependency on HAS_IOMEM [+ + +]

Author: Cai Huoqing <[email protected]>
Date:   Tue Feb 8 09:33:08 2022 +0800

    net: ethernet: litex: Add the dependency on HAS_IOMEM
    
    [ Upstream commit 2427f03fb42f9dc14c53108f2c9b5563eb37e770 ]
    
    The LiteX driver uses the devm I/O function API, which
    needs HAS_IOMEM enabled, so add the dependency on HAS_IOMEM.
    
    Fixes: ee7da21ac4c3 ("net: Add driver for LiteX's LiteETH network interface")
    Signed-off-by: Cai Huoqing <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: fix up skbs delta_truesize in UDP GRO frag_list [+ + +]

Author: lena wang <[email protected]>
Date:   Tue Mar 1 19:17:09 2022 +0800

    net: fix up skbs delta_truesize in UDP GRO frag_list
    
    commit 224102de2ff105a2c05695e66a08f4b5b6b2d19c upstream.
    
    The truesize for a UDP GRO packet is the sum of the main skb and the skbs
    in the main skb's frag_list:
    skb_gro_receive_list
            p->truesize += skb->truesize;
    
    The commit 53475c5dd856 ("net: fix use-after-free when UDP GRO with
    shared fraglist") introduced a truesize increase for frag_list skbs.
    When uncloning an skb, it calls pskb_expand_head and the truesize of
    frag_list skbs may increase. This can occur when allocators use
    __netdev_alloc_skb and do not jump into __alloc_skb. That flow does not
    use ksize(len) to calculate truesize, while pskb_expand_head does.
    skb_segment_list
    err = skb_unclone(nskb, GFP_ATOMIC);
    pskb_expand_head
            if (!skb->sk || skb->destructor == sock_edemux)
                    skb->truesize += size - osize;
    
    If we use the increased truesize as delta_truesize, it will be
    larger than before, and even larger than the previous total truesize
    value if skbs in the frag_list are abundant. The main skb truesize will
    become smaller, and even negative, or a huge value for an unsigned int
    parameter. Then the following memory check will drop this abnormal skb.
    
    To avoid this error we should use the original truesize to segment the
    main skb.
    
    Fixes: 53475c5dd856 ("net: fix use-after-free when UDP GRO with shared fraglist")
    Signed-off-by: lena wang <[email protected]>
    Acked-by: Paolo Abeni <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
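
The unsigned wrap described above can be reproduced with a small model. Everything here is hypothetical (`model_skb`, `model_segment`, and the `grow` parameter that stands in for the truesize increase caused by uncloning); it only shows why subtracting the grown sizes can exceed the head's total, while subtracting the sizes recorded before uncloning cannot.

```c
#include <stddef.h>

/* Hypothetical stand-in for an skb; only the accounted size matters here. */
struct model_skb {
	unsigned int truesize;
};

/*
 * Segment a head whose truesize already includes every fragment.
 * grow models the truesize increase that uncloning may cause; when
 * use_original is nonzero, the size recorded before uncloning is
 * subtracted instead of the grown one.
 */
static unsigned int model_segment(unsigned int head_truesize,
				  struct model_skb *frags, size_t n,
				  unsigned int grow, int use_original)
{
	unsigned int delta_truesize = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int original = frags[i].truesize;

		frags[i].truesize += grow;	/* uncloning enlarged the fragment */
		delta_truesize += use_original ? original : frags[i].truesize;
	}
	/* Unsigned arithmetic: wraps to a huge value if delta exceeds the head. */
	return head_truesize - delta_truesize;
}
```

With a head of 3500 that includes three fragments of 1000 each and a growth of 300 per fragment, the original sizes leave 500 for the head, whereas the grown sizes sum to 3900 and the subtraction wraps around.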

net: ipa: add an interconnect dependency [+ + +]

Author: Alex Elder <[email protected]>
Date:   Tue Mar 1 05:34:40 2022 -0600

    net: ipa: add an interconnect dependency
    
    commit 1dba41c9d2e2dc94b543394974f63d55aa195bfe upstream.
    
    In order to function, the IPA driver very clearly requires the
    interconnect framework to be enabled in the kernel configuration.
    State that dependency in the Kconfig file.
    
    This became a problem when CONFIG_COMPILE_TEST support was added.
    Non-Qualcomm platforms won't necessarily enable CONFIG_INTERCONNECT.
    
    Reported-by: kernel test robot <[email protected]>
    Fixes: 38a4066f593c5 ("net: ipa: support COMPILE_TEST")
    Signed-off-by: Alex Elder <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: ipv6: ensure we call ipv6_mc_down() at most once [+ + +]

Author: [email protected] <[email protected]>
Date:   Thu Feb 24 10:06:49 2022 +0100

    net: ipv6: ensure we call ipv6_mc_down() at most once
    
    commit 9995b408f17ff8c7f11bc725c8aa225ba3a63b1c upstream.
    
    There are two reasons for addrconf_notify() to be called with NETDEV_DOWN:
    either the network device is actually going down, or IPv6 was disabled
    on the interface.
    
    If either of them stays down while the other is toggled, we repeatedly
    call the code for NETDEV_DOWN, including ipv6_mc_down(), while never
    calling the corresponding ipv6_mc_up() in between. This will cause a
    new entry in idev->mc_tomb to be allocated for each multicast group
    the interface is subscribed to, which in turn leaks one struct ifmcaddr6
    per nontrivial multicast group the interface is subscribed to.
    
    The following reproducer will leak at least $n objects:
    
    ip addr add ff2e::4242/32 dev eth0 autojoin
    sysctl -w net.ipv6.conf.eth0.disable_ipv6=1
    for i in $(seq 1 $n); do
            ip link set up eth0; ip link set down eth0
    done
    
    Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the
    sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2)
    can also be used to create a nontrivial idev->mc_list, which will then
    leak objects with the right up-down sequence.
    
    Based on both sources for NETDEV_DOWN events the interface IPv6 state
    should be considered:
    
     - not ready if the network interface is not ready OR IPv6 is disabled
       for it
     - ready if the network interface is ready AND IPv6 is enabled for it
    
    The functions ipv6_mc_up() and ipv6_mc_down() should only be run when
    this state changes.
    
    Implement this by remembering when the IPv6 state is ready, and only
    run ipv6_mc_down() if it actually changed from ready to not ready.
    
    The other direction (not ready -> ready) already works correctly, as:
    
     - the interface notification triggered codepath for NETDEV_UP /
       NETDEV_CHANGE returns early if ipv6 is disabled, and
     - the disable_ipv6=0 triggered codepath skips fully initializing the
       interface as long as addrconf_link_ready(dev) returns false
     - calling ipv6_mc_up() repeatedly does not leak anything
    
    Fixes: 3ce62a84d53c ("ipv6: exit early in addrconf_notify() if IPv6 is disabled")
    Signed-off-by: Johannes Nixdorf <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
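
The state tracking described above can be modelled as edge detection on a combined readiness flag. This is a sketch with hypothetical names (`model_idev`, `model_ipv6_ready`, `model_update`); the counter stands in for calls to ipv6_mc_down().

```c
#include <stdbool.h>

/* Hypothetical per-interface state; field names are illustrative. */
struct model_idev {
	bool link_ready;	/* network interface is up */
	bool ipv6_disabled;	/* disable_ipv6 sysctl is set */
	bool was_ready;		/* remembered IPv6 readiness */
	int mc_down_calls;	/* how often the teardown ran */
};

/* IPv6 is ready only if the link is ready AND IPv6 is enabled. */
static bool model_ipv6_ready(const struct model_idev *idev)
{
	return idev->link_ready && !idev->ipv6_disabled;
}

/* Re-evaluate after any change; tear down only on a ready -> not ready edge. */
static void model_update(struct model_idev *idev)
{
	bool ready = model_ipv6_ready(idev);

	if (idev->was_ready && !ready)
		idev->mc_down_calls++;	/* stands in for ipv6_mc_down() */
	idev->was_ready = ready;
}
```

Replaying the reproducer against this model (disable IPv6 once, then toggle the link repeatedly) runs the teardown exactly once, because the combined state never returns to ready in between.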

net: of: fix stub of_net helpers for CONFIG_NET=n [+ + +]

Author: Arnd Bergmann <[email protected]>
Date:   Thu Oct 14 11:00:37 2021 +0200

    net: of: fix stub of_net helpers for CONFIG_NET=n
    
    [ Upstream commit 8b017fbe0bbb98dd71fb4850f6b9cc0e136a26b8 ]
    
    Moving the of_net code from drivers/of/ to net/core means we
    no longer stub out the helpers when networking is disabled,
    which leads to a randconfig build failure with at least one
    ARM platform that calls this from non-networking code:
    
    arm-linux-gnueabi-ld: arch/arm/mach-mvebu/kirkwood.o: in function `kirkwood_dt_eth_fixup':
    kirkwood.c:(.init.text+0x54): undefined reference to `of_get_mac_address'
    
    Restore the way this worked before by changing that #ifdef
    check back to testing for both CONFIG_OF and CONFIG_NET.
    
    Fixes: e330fb14590c ("of: net: move of_net under net/")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
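
The stub pattern referred to above looks roughly like the following. `MODEL_OF`, `MODEL_NET`, and `model_get_mac_address` are hypothetical stand-ins for the real configuration symbols and helper, and the error code is an assumption of this sketch. When either option is missing, callers get an inline stub instead of an undefined reference at link time.

```c
#include <errno.h>

/* MODEL_OF and MODEL_NET are hypothetical stand-ins for CONFIG_OF and CONFIG_NET. */
#if defined(MODEL_OF) && defined(MODEL_NET)
/* The real implementation is provided elsewhere when both options are enabled. */
int model_get_mac_address(unsigned char *mac);
#else
/* Stub: lets callers build and link even when the networking code is absent. */
static inline int model_get_mac_address(unsigned char *mac)
{
	(void)mac;
	return -ENODEV;
}
#endif
```

Testing for only one of the two symbols would select the prototype even when the implementation is not built, which is the kind of link failure shown in the error output above.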

net: sparx5: Fix add vlan when invalid operation [+ + +]

Author: Casper Andersson <[email protected]>
Date:   Fri Feb 25 11:15:16 2022 +0100

    net: sparx5: Fix add vlan when invalid operation
    
    [ Upstream commit b3a34dc362c03215031b268fcc0b988e69490231 ]
    
    Check if operation is valid before changing any
    settings in hardware. Otherwise it results in
    changes being made despite it not being a valid
    operation.
    
    Fixes: 78eab33bb68b ("net: sparx5: add vlan support")
    
    Signed-off-by: Casper Andersson <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: stmmac: enhance XDP ZC driver level switching performance [+ + +]

Author: Ong Boon Leong <[email protected]>
Date:   Thu Nov 11 22:39:49 2021 +0800

    net: stmmac: enhance XDP ZC driver level switching performance
    
    [ Upstream commit ac746c8520d9d056b6963ecca8ff1da9929d02f1 ]
    
    The previous stmmac_xdp_set_prog() implementation uses stmmac_release()
    and stmmac_open(), which tear down the PHY device and cause undesirable
    autonegotiation, which in turn causes a delay whenever AF_XDP ZC is set up.
    
    This patch introduces two new functions that tear down just the DMA
    descriptors, buffers, NAPI processing, and IRQs, and reestablish them
    accordingly: stmmac_xdp_release() and stmmac_xdp_open().
    
    As a result of this enhancement, we get rid of the transient state
    introduced by the link auto-negotiation:
    
    $ ./xdpsock -i eth0 -t -z
    
     sock0@eth0:0 txonly xdp-drv
                       pps            pkts           1.00
    rx                 0              0
    tx                 634444         634560
    
     sock0@eth0:0 txonly xdp-drv
                       pps            pkts           1.00
    rx                 0              0
    tx                 632330         1267072
    
     sock0@eth0:0 txonly xdp-drv
                       pps            pkts           1.00
    rx                 0              0
    tx                 632438         1899584
    
     sock0@eth0:0 txonly xdp-drv
                       pps            pkts           1.00
    rx                 0              0
    tx                 632502         2532160
    
    Reported-by: Kurt Kanzenbach <[email protected]>
    Signed-off-by: Ong Boon Leong <[email protected]>
    Tested-by: Kurt Kanzenbach <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: stmmac: fix return value of __setup handler [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Wed Feb 23 19:35:36 2022 -0800

    net: stmmac: fix return value of __setup handler
    
    commit e01b042e580f1fbf4fd8da467442451da00c7a90 upstream.
    
    __setup() handlers should return 1 on success, i.e., the parameter
    has been handled. A return of 0 causes the "option=value" string to be
    added to init's environment strings, polluting it.
    
    Fixes: 47dd7a540b8a ("net: add support for STMicroelectronics Ethernet controllers.")
    Fixes: f3240e2811f0 ("stmmac: remove warning when compile as built-in (V2)")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Link: lore.kernel.org/r/[email protected]
    Cc: Giuseppe Cavallaro <[email protected]>
    Cc: Alexandre Torgue <[email protected]>
    Cc: Jose Abreu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
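
The return-value convention described in this commit can be modeled in plain C. This is a hedged userspace sketch, not kernel code: `boot_model`, `dispatch_param()` and the two demo handlers are hypothetical names invented for illustration, and the real dispatch logic lives in the kernel's init code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of boot-parameter dispatch; not kernel code. */
#define DEMO_MAX_ENV 8

struct boot_model {
    const char *init_env[DEMO_MAX_ENV]; /* strings that would be handed to init */
    size_t env_count;
};

typedef int (*demo_setup_fn)(const char *value);

/* Pre-fix style: the option is consumed, yet 0 ("not handled") is returned. */
static int demo_setup_returns_0(const char *value)
{
    (void)value;
    return 0;
}

/* Post-fix style: 1 tells the dispatcher the option has been handled. */
static int demo_setup_returns_1(const char *value)
{
    (void)value;
    return 1;
}

/* Call the handler; an unhandled "option=value" string ends up in init's environment. */
static void dispatch_param(struct boot_model *m, const char *param,
                           demo_setup_fn handler)
{
    const char *eq = strchr(param, '=');
    const char *value = eq ? eq + 1 : "";

    if (handler(value))
        return; /* handled: nothing is passed on */
    if (m->env_count < DEMO_MAX_ENV)
        m->init_env[m->env_count++] = param;
}
```

In this model, dispatching "option=value" through `demo_setup_returns_0` adds one entry to `init_env`, while dispatching it through `demo_setup_returns_1` adds none.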

net: stmmac: only enable DMA interrupts when ready [+ + +]

Author: Vincent Whitchurch <[email protected]>
Date:   Thu Feb 24 12:38:29 2022 +0100

    net: stmmac: only enable DMA interrupts when ready
    
    [ Upstream commit 087a7b944c5db409f7c1a68bf4896c56ba54eaff ]
    
    In this driver's ->ndo_open() callback, it enables DMA interrupts,
    starts the DMA channels, then requests interrupts with request_irq(),
    and then finally enables napi.
    
    If RX DMA interrupts are received before napi is enabled, no processing
    is done because napi_schedule_prep() will return false.  If the network
    has a lot of broadcast/multicast traffic, then the RX ring could fill up
    completely before napi is enabled.  When this happens, no further RX
    interrupts will be delivered, and the driver will fail to receive any
    packets.
    
    Fix this by only enabling DMA interrupts after all other initialization
    is complete.
    
    Fixes: 523f11b5d4fd72efb ("net: stmmac: move hardware setup for stmmac_open to new function")
    Reported-by: Lars Persson <[email protected]>
    Signed-off-by: Vincent Whitchurch <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: stmmac: perserve TX and RX coalesce value during XDP setup [+ + +]

Author: Ong Boon Leong <[email protected]>
Date:   Wed Nov 24 19:40:19 2021 +0800

    net: stmmac: perserve TX and RX coalesce value during XDP setup
    
    commit 61da6ac715700bcfeef50d187e15c6cc7c9d079b upstream.
    
    When an XDP program is loaded, it is desirable that the previous TX and
    RX coalesce values are not re-initialized to their default values. This
    prevents unnecessarily re-configuring coalesce values that were working
    fine before.
    
    Fixes: ac746c8520d9 ("net: stmmac: enhance XDP ZC driver level switching performance")
    Signed-off-by: Ong Boon Leong <[email protected]>
    Tested-by: Kurt Kanzenbach <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: sxgbe: fix return value of __setup handler [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Wed Feb 23 19:35:28 2022 -0800

    net: sxgbe: fix return value of __setup handler
    
    commit 50e06ddceeea263f57fe92baa677c638ecd65bb6 upstream.
    
    __setup() handlers should return 1 on success, i.e., the parameter
    has been handled. A return of 0 causes the "option=value" string to be
    added to init's environment strings, polluting it.
    
    Fixes: acc18c147b22 ("net: sxgbe: add EEE(Energy Efficient Ethernet) for Samsung sxgbe")
    Fixes: 1edb9ca69e8a ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Link: lore.kernel.org/r/[email protected]
    Cc: Siva Reddy <[email protected]>
    Cc: Girish K S <[email protected]>
    Cc: Byungho An <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990 [+ + +]

Author: Daniele Palmas <[email protected]>
Date:   Tue Feb 15 12:13:35 2022 +0100

    net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
    
    [ Upstream commit 21e8a96377e6b6debae42164605bf9dcbe5720c5 ]
    
    Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990
    0x1071 composition in order to avoid bind error.
    
    Signed-off-by: Daniele Palmas <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: fix use-after-free in __nf_register_net_hook() [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Sun Feb 27 10:01:41 2022 -0800

    netfilter: fix use-after-free in __nf_register_net_hook()
    
    commit 56763f12b0f02706576a088e85ef856deacc98a0 upstream.
    
    We must not dereference @new_hooks after nf_hook_mutex has been released,
    because other threads might have freed our allocated hooks already.
    
    BUG: KASAN: use-after-free in nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline]
    BUG: KASAN: use-after-free in hooks_validate net/netfilter/core.c:171 [inline]
    BUG: KASAN: use-after-free in __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438
    Read of size 2 at addr ffff88801c1a8000 by task syz-executor237/4430
    
    CPU: 1 PID: 4430 Comm: syz-executor237 Not tainted 5.17.0-rc5-syzkaller-00306-g2293be58d6a1 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
     print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
     __kasan_report mm/kasan/report.c:442 [inline]
     kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
     nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline]
     hooks_validate net/netfilter/core.c:171 [inline]
     __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438
     nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571
     nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587
     nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218
     synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81
     xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038
     check_target net/ipv6/netfilter/ip6_tables.c:530 [inline]
     find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573
     translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735
     do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline]
     do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
     nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
     ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1024
     rawv6_setsockopt+0xd3/0x6a0 net/ipv6/raw.c:1084
     __sys_setsockopt+0x2db/0x610 net/socket.c:2180
     __do_sys_setsockopt net/socket.c:2191 [inline]
     __se_sys_setsockopt net/socket.c:2188 [inline]
     __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7f65a1ace7d9
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 71 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f65a1a7f308 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f65a1ace7d9
    RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000003
    RBP: 00007f65a1b574c8 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000020000000 R11: 0000000000000246 R12: 00007f65a1b55130
    R13: 00007f65a1b574c0 R14: 00007f65a1b24090 R15: 0000000000022000
     </TASK>
    
    The buggy address belongs to the page:
    page:ffffea0000706a00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c1a8
    flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
    raw: 00fff00000000000 ffffea0001c1b108 ffffea000046dd08 0000000000000000
    raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    page_owner tracks the page as freed
    page last allocated via order 2, migratetype Unmovable, gfp_mask 0x52dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO), pid 4430, ts 1061781545818, free_ts 1061791488993
     prep_new_page mm/page_alloc.c:2434 [inline]
     get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165
     __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389
     __alloc_pages_node include/linux/gfp.h:572 [inline]
     alloc_pages_node include/linux/gfp.h:595 [inline]
     kmalloc_large_node+0x62/0x130 mm/slub.c:4438
     __kmalloc_node+0x35a/0x4a0 mm/slub.c:4454
     kmalloc_node include/linux/slab.h:604 [inline]
     kvmalloc_node+0x97/0x100 mm/util.c:580
     kvmalloc include/linux/slab.h:731 [inline]
     kvzalloc include/linux/slab.h:739 [inline]
     allocate_hook_entries_size net/netfilter/core.c:61 [inline]
     nf_hook_entries_grow+0x140/0x780 net/netfilter/core.c:128
     __nf_register_net_hook+0x144/0x820 net/netfilter/core.c:429
     nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571
     nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587
     nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218
     synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81
     xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038
     check_target net/ipv6/netfilter/ip6_tables.c:530 [inline]
     find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573
     translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735
     do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline]
     do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
     nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
    page last free stack trace:
     reset_page_owner include/linux/page_owner.h:24 [inline]
     free_pages_prepare mm/page_alloc.c:1352 [inline]
     free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404
     free_unref_page_prepare mm/page_alloc.c:3325 [inline]
     free_unref_page+0x19/0x690 mm/page_alloc.c:3404
     kvfree+0x42/0x50 mm/util.c:613
     rcu_do_batch kernel/rcu/tree.c:2527 [inline]
     rcu_core+0x7b1/0x1820 kernel/rcu/tree.c:2778
     __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
    
    Memory state around the buggy address:
     ffff88801c1a7f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ffff88801c1a7f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    >ffff88801c1a8000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                       ^
     ffff88801c1a8080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ffff88801c1a8100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    
    Fixes: 2420b79f8c18 ("netfilter: debug: check for sorted array")
    Signed-off-by: Eric Dumazet <[email protected]>
    Reported-by: syzbot <[email protected]>
    Acked-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
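
The rule stated in this commit (do not touch the newly published object once the lock is dropped) can be sketched in userspace C. This is only an analogy under stated assumptions: a pthread mutex stands in for nf_hook_mutex, `free()` stands in for the kernel's deferred freeing, and `hook_table`, `table_lock` and `publish_table()` are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hook entries array. */
struct hook_table {
    int num_entries;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct hook_table *current_table;

/* Publish a new table and return its entry count, or -1 on allocation failure. */
static int publish_table(int num_entries)
{
    struct hook_table *new_table = malloc(sizeof(*new_table));
    struct hook_table *old;
    int published;

    if (!new_table)
        return -1;
    new_table->num_entries = num_entries;

    pthread_mutex_lock(&table_lock);
    old = current_table;
    current_table = new_table;
    /* Read (or validate) everything needed while the lock is still held. */
    published = new_table->num_entries;
    pthread_mutex_unlock(&table_lock);

    /*
     * From here on new_table must not be dereferenced: another thread may
     * already have replaced it and freed it, exactly as this call frees
     * the table it replaced.
     */
    free(old);
    return published;
}
```

The design point is that any validation of the new table belongs inside the critical section; only the locally copied value is used afterwards.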

netfilter: nf_queue: don't assume sk is full socket [+ + +]

Author: Florian Westphal <[email protected]>
Date:   Fri Feb 25 14:02:41 2022 +0100

    netfilter: nf_queue: don't assume sk is full socket
    
    commit 747670fd9a2d1b7774030dba65ca022ba442ce71 upstream.
    
    There is no guarantee that state->sk refers to a full socket.
    
    If refcount transitions to 0, sock_put calls sk_free which then ends up
    with garbage fields.
    
    I'd like to thank Oleksandr Natalenko and Jiri Benc for considerable
    debug work and pointing out state->sk oddities.
    
    Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
    Tested-by: Oleksandr Natalenko <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: nf_queue: fix possible use-after-free [+ + +]

Author: Florian Westphal <[email protected]>
Date:   Mon Feb 28 06:22:22 2022 +0100

    netfilter: nf_queue: fix possible use-after-free
    
    commit c3873070247d9e3c7a6b0cf9bf9b45e8018427b1 upstream.
    
    Eric Dumazet says:
      The sock_hold() side seems suspect, because there is no guarantee
      that sk_refcnt is not already 0.
    
    On failure, we cannot queue the packet and need to indicate an
    error.  The packet will be dropped by the caller.
    
    v2: split skb prefetch hunk into separate change
    
    Fixes: 271b72c7fa82c ("udp: RCU handling for Unicast packets.")
    Reported-by: Eric Dumazet <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
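
The concern quoted in this commit is that an unconditional increment may resurrect a reference count that has already reached 0. One common remedy is an "increment unless zero" operation that reports failure to the caller. The sketch below uses C11 atomics in userspace; `ref_inc_not_zero()` is a hypothetical helper written for illustration and is not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still alive.
 * Returns true when the count was raised, false when it was already 0.
 */
static bool ref_inc_not_zero(atomic_int *ref)
{
    int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false; /* caller must treat the object as gone and report an error */
}
```

A caller that gets `false` back cannot queue the packet and has to signal an error, which matches the behavior the commit message describes.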

netfilter: nf_queue: handle socket prefetch [+ + +]

Author: Florian Westphal <[email protected]>
Date:   Tue Mar 1 00:46:19 2022 +0100

    netfilter: nf_queue: handle socket prefetch
    
    commit 3b836da4081fa585cf6c392f62557496f2cb0efe upstream.
    
    In case someone combines bpf socket assign and nf_queue, then we will
    queue an skb that references a struct sock that did not have its
    reference count incremented.
    
    As we leave rcu protection, there is no guarantee that skb->sk is still
    valid.
    
    For refcount-less skb->sk case, try to increment the reference count
    and then override the destructor.
    
    In case of failure we have two choices: orphan the skb and 'delete'
    preselect or let nf_queue() drop the packet.
    
    Do the latter, it should not happen during normal operation.
    
    Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
    Acked-by: Joe Stringer <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Tue Feb 22 10:13:31 2022 -0800

    netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant
    
    [ Upstream commit ae089831ff28a115908b8d796f667c2dadef1637 ]
    
    While kfree_rcu(ptr) _is_ supported, it has some limitations.
    
    Given that 99.99% of kfree_rcu() users [1] use the legacy
    two parameters variant, and @catchall objects do have an rcu head,
    simply use it.
    
    Choice of kfree_rcu(ptr) variant was probably not intentional.
    
    [1] including calls from net/netfilter/nf_tables_api.c
    
    Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
    Signed-off-by: Eric Dumazet <[email protected]>
    Reviewed-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

nfsd: fix crash on COPY_NOTIFY with special stateid [+ + +]

Author: J. Bruce Fields <[email protected]>
Date:   Wed Jan 5 14:15:03 2022 -0500

    nfsd: fix crash on COPY_NOTIFY with special stateid
    
    [ Upstream commit 074b07d94e0bb6ddce5690a9b7e2373088e8b33a ]
    
    RTM says "If the special ONE stateid is passed to
    nfs4_preprocess_stateid_op(), it returns status=0 but does not set
    *cstid. nfsd4_copy_notify() depends on stid being set if status=0, and
    thus can crash if the client sends the right COPY_NOTIFY RPC."
    
    RFC 7862 says "The cna_src_stateid MUST refer to either open or locking
    states provided earlier by the server.  If it is invalid, then the
    operation MUST fail."
    
    The RFC doesn't specify an error, and the choice doesn't matter much as
    this is clearly illegal client behavior, but bad_stateid seems
    reasonable.
    
    Simplest is just to guarantee that nfs4_preprocess_stateid_op, called
    with non-NULL cstid, errors out if it can't return a stateid.
    
    Reported-by: [email protected]
    Fixes: 624322f1adc5 ("NFSD add COPY_NOTIFY operation")
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Reviewed-by: Olga Kornievskaia <[email protected]>
    Tested-by: Olga Kornievskaia <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
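
The contract chosen in this commit (a call that is asked for an output object must either fill it in or fail) can be illustrated with a small C model. Everything here is hypothetical: `demo_preprocess_stateid()`, `demo_stid` and the status codes are invented for the sketch, with `DEMO_ERR_BAD_STATEID` standing in for the bad_stateid error mentioned above.

```c
#include <stddef.h>

#define DEMO_OK              0
#define DEMO_ERR_BAD_STATEID 1 /* stands in for the bad_stateid error */

struct demo_stid {
    int id;
};

static struct demo_stid demo_open_state = { 42 };

/*
 * is_special models a special stateid that has no backing state object.
 * out may be NULL when the caller does not need the object.
 */
static int demo_preprocess_stateid(int is_special, struct demo_stid **out)
{
    if (is_special) {
        /* The caller wants an object that cannot be supplied: fail loudly. */
        if (out)
            return DEMO_ERR_BAD_STATEID;
        return DEMO_OK;
    }
    if (out)
        *out = &demo_open_state;
    return DEMO_OK;
}
```

With this contract a caller may rely on `*out` being valid whenever the status is `DEMO_OK` and a non-NULL `out` was passed.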

NFSD: Fix verifier returned in stable WRITEs [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Dec 28 12:35:43 2021 -0500

    NFSD: Fix verifier returned in stable WRITEs
    
    [ Upstream commit f11ad7aa653130b71e2e89bed207f387718216d5 ]
    
    RFC 8881 explains the purpose of the write verifier this way:
    
    > The final portion of the result is the field writeverf. This field
    > is the write verifier and is a cookie that the client can use to
    > determine whether a server has changed instance state (e.g., server
    > restart) between a call to WRITE and a subsequent call to either
    > WRITE or COMMIT.
    
    But then it says:
    
    > This cookie MUST be unchanged during a single instance of the
    > NFSv4.1 server and MUST be unique between instances of the NFSv4.1
    > server. If the cookie changes, then the client MUST assume that
    > any data written with an UNSTABLE4 value for committed and an old
    > writeverf in the reply has been lost and will need to be
    > recovered.
    
    RFC 1813 has similar language for NFSv3. NFSv2 does not have a write
    verifier since it doesn't implement the COMMIT procedure.
    
    Since commit 19e0663ff9bc ("nfsd: Ensure sampling of the write
    verifier is atomic with the write"), the Linux NFS server has
    returned a boot-time-based verifier for UNSTABLE WRITEs, but a zero
    verifier for FILE_SYNC and DATA_SYNC WRITEs. FILE_SYNC and DATA_SYNC
    WRITEs are not followed up with a COMMIT, so there's no need for
    clients to compare verifiers for stable writes.
    
    However, by returning a different verifier for stable and unstable
    writes, the above commit puts the Linux NFS server a step farther
    out of compliance with the first MUST above. At least one NFS client
    (FreeBSD) noticed the difference, making this a potential
    regression.
    
    Reported-by: Rick Macklem <[email protected]>
    Link: https://lore.kernel.org/linux-nfs/YQXPR0101MB096857EEACF04A6DF1FC6D9BDD749@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM/T/
    Fixes: 19e0663ff9bc ("nfsd: Ensure sampling of the write verifier is atomic with the write")
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

NFSD: Fix zero-length NFSv3 WRITEs [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Tue Dec 21 11:52:06 2021 -0500

    NFSD: Fix zero-length NFSv3 WRITEs
    
    [ Upstream commit 6a2f774424bfdcc2df3e17de0cefe74a4269cad5 ]
    
    The Linux NFS server currently responds to a zero-length NFSv3 WRITE
    request with NFS3ERR_IO. It responds to a zero-length NFSv4 WRITE
    with NFS4_OK and count of zero.
    
    RFC 1813 says of the WRITE procedure's @count argument:
    
    count
             The number of bytes of data to be written. If count is
             0, the WRITE will succeed and return a count of 0,
             barring errors due to permissions checking.
    
    RFC 8881 has similar language for NFSv4, though NFSv4 removed the
    explicit @count argument because that value is already contained in
    the opaque payload array.
    
    The synthetic client pynfs's WRT4 and WRT15 tests do emit zero-
    length WRITEs to exercise this spec requirement. Commit fdec6114ee1f
    ("nfsd4: zero-length WRITE should succeed") addressed the same
    problem there with the same fix.
    
    But interestingly the Linux NFS client does not appear to emit zero-
    length WRITEs, instead squelching them. I'm not aware of a test that
    can generate such WRITEs for NFSv3, so I wrote a naive C program to
    generate a zero-length WRITE and test this fix.
    
    Fixes: 8154ef2776aa ("NFSD: Clean up legacy NFS WRITE argument XDR decoders")
    Reported-by: Trond Myklebust <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    Cc: [email protected]
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment() [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Thu Sep 30 17:06:21 2021 -0400

    NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
    
    [ Upstream commit dae9a6cab8009e526570e7477ce858dcdfeb256e ]
    
    Refactor.
    
    Now that the NFSv2 and NFSv3 XDR decoders have been converted to
    use xdr_streams, the WRITE decoder functions can use
    xdr_stream_subsegment() to extract the WRITE payload into its own
    xdr_buf, just as the NFSv4 WRITE XDR decoder currently does.
    
    That makes it possible to pass the first kvec, pages array + length,
    page_base, and total payload length via a single function parameter.
    
    The payload's page_base is not yet assigned or used, but will be in
    subsequent patches.
    
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: J. Bruce Fields <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

nl80211: Handle nla_memdup failures in handle_nan_filter [+ + +]

Author: Jiasheng Jiang <[email protected]>
Date:   Tue Mar 1 18:00:20 2022 +0800

    nl80211: Handle nla_memdup failures in handle_nan_filter
    
    [ Upstream commit 6ad27f522cb3b210476daf63ce6ddb6568c0508b ]
    
    As there's potential for failure of the nla_memdup(),
    check the return value.
    
    Fixes: a442b761b24b ("cfg80211: add add_nan_func / del_nan_func")
    Signed-off-by: Jiasheng Jiang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
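
The pattern this commit applies (check the result of a duplicating allocation before using it) looks roughly like the following in userspace C. This is a sketch under assumptions: `memdup_model()` imitates a memdup-style helper with `malloc()`, and `filter_model` and `copy_filter()` are hypothetical names, not nl80211 code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memdup-style helper: returns NULL when allocation fails. */
static void *memdup_model(const void *src, size_t len)
{
    void *p = malloc(len);

    if (p)
        memcpy(p, src, len);
    return p;
}

struct filter_model {
    unsigned char *data;
    size_t len;
};

/* Copy a non-empty filter; returns 0 on success or -ENOMEM on failure. */
static int copy_filter(struct filter_model *dst, const void *src, size_t len)
{
    dst->data = memdup_model(src, len);
    if (!dst->data)
        return -ENOMEM; /* propagate the failure instead of using NULL */
    dst->len = len;
    return 0;
}
```

The caller owns `dst->data` after a successful call and releases it with `free()`.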

ntb: intel: fix port config status offset for SPR [+ + +]

Author: Dave Jiang <[email protected]>
Date:   Thu Jan 27 13:31:12 2022 -0700

    ntb: intel: fix port config status offset for SPR
    
    commit d5081bf5dcfb1cb83fb538708b0ac07a10a79cc4 upstream.
    
    The field offset for port configuration status on SPR has been changed to
    bit 14 from ICX where it resides at bit 12. By chance link status detection
    continued to work on SPR. This is due to bit 12 being a configuration bit
    which is in sync with the status bit. Fix this by checking for an SPR
    device and checking the correct status bit.
    
    Fixes: 26bfe3d0b227 ("ntb: intel: Add Icelake (gen4) support for Intel NTB")
    Tested-by: Jerry Dai <[email protected]>
    Signed-off-by: Dave Jiang <[email protected]>
    Signed-off-by: Jon Mason <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
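
Selecting the status bit by device generation, as described in this commit, can be sketched as follows. The bit positions 12 (ICX) and 14 (SPR) come from the commit message; the macro names, the 32-bit register width and `demo_port_cfg_status()` are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the commit message; names are hypothetical. */
#define DEMO_ICX_STATUS_BIT 12
#define DEMO_SPR_STATUS_BIT 14

/* Extract the port configuration status bit for the given generation. */
static bool demo_port_cfg_status(uint32_t reg, bool is_spr)
{
    unsigned int bit = is_spr ? DEMO_SPR_STATUS_BIT : DEMO_ICX_STATUS_BIT;

    return (reg >> bit) & 1u;
}
```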

ntb_hw_switchtec: Fix bug with more than 32 partitions [+ + +]

Author: Wesley Sheng <[email protected]>
Date:   Thu Dec 23 17:23:30 2021 -0800

    ntb_hw_switchtec: Fix bug with more than 32 partitions
    
    [ Upstream commit 7ff351c86b6b258f387502ab2c9b9d04f82c1c3d ]
    
    Switchtec can support as many as 48 partitions, but ffs & fls take a
    32-bit argument, so for a partition index larger than 31 the current
    code cannot parse the peer partition index correctly. Change to the
    64-bit versions __ffs64 & fls64 to fix this bug.
    
    Fixes: 3df54c870f52 ("ntb_hw_switchtec: Allow using Switchtec NTB in multi-partition setups")
    Signed-off-by: Wesley Sheng <[email protected]>
    Signed-off-by: Kelvin Cao <[email protected]>
    Signed-off-by: Jon Mason <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
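
The truncation problem behind this fix can be reproduced in portable C. The helpers below are hypothetical userspace equivalents written for illustration (they are not the kernel's ffs, __ffs64 or fls64), and `demo_ffs64_index()` returns -1 for a zero argument, which is a choice made for this sketch.

```c
#include <stdint.h>

/* ffs-style: 1-based index of the lowest set bit of a 32-bit value, 0 if none. */
static int demo_ffs32(uint32_t v)
{
    int n = 1;

    if (!v)
        return 0;
    while (!(v & 1u)) {
        v >>= 1;
        n++;
    }
    return n;
}

/* 0-based index of the lowest set bit of a 64-bit value, -1 if none. */
static int demo_ffs64_index(uint64_t v)
{
    int n = 0;

    if (!v)
        return -1;
    while (!(v & 1u)) {
        v >>= 1;
        n++;
    }
    return n;
}

/* fls-style: 1-based index of the highest set bit of a 64-bit value, 0 if none. */
static int demo_fls64(uint64_t v)
{
    int n = 0;

    while (v) {
        v >>= 1;
        n++;
    }
    return n;
}
```

For a partition map with only bit 40 set, truncating to 32 bits leaves nothing for `demo_ffs32()` to find, while the 64-bit helpers locate the bit.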

ntb_hw_switchtec: Fix pff ioread to read into mmio_part_cfg_all [+ + +]

Author: Jeremy Pallotta <[email protected]>
Date:   Thu Dec 23 17:23:29 2021 -0800

    ntb_hw_switchtec: Fix pff ioread to read into mmio_part_cfg_all
    
    [ Upstream commit 32c3d375b0ed84b6acb51ae5ebef35ff0d649d85 ]
    
    Array mmio_part_cfg_all holds the partition configuration of all
    partitions, with partition number as index. Fix this by reading into
    mmio_part_cfg_all for pff.
    
    Fixes: 0ee28f26f378 ("NTB: switchtec_ntb: Add link management")
    Signed-off-by: Jeremy Pallotta <[email protected]>
    Signed-off-by: Kelvin Cao <[email protected]>
    Signed-off-by: Jon Mason <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

octeontx2-af: Add KPU changes to parse NGIO as separate layer [+ + +]

Author: Kiran Kumar K <[email protected]>
Date:   Fri Jan 21 12:04:47 2022 +0530

    octeontx2-af: Add KPU changes to parse NGIO as separate layer
    
    [ Upstream commit 745166fcf01cecc4f5ff3defc6586868349a43f9 ]
    
    With current KPU profile NGIO is being parsed along with CTAG as
    a single layer. Because of this MCAM/ntuple rules installed with
    ethertype as 0x8842 are not being hit. Adding KPU profile changes
    to parse NGIO in separate ltype and CTAG in separate ltype.
    
    Fixes: f9c49be90c05 ("octeontx2-af: Update the default KPU profile and fixes")
    Signed-off-by: Kiran Kumar K <[email protected]>
    Signed-off-by: Subbaraya Sundeep <[email protected]>
    Signed-off-by: Sunil Goutham <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

octeontx2-af: Adjust LA pointer for cpt parse header [+ + +]

Author: Kiran Kumar K <[email protected]>
Date:   Wed Sep 29 11:28:31 2021 +0530

    octeontx2-af: Adjust LA pointer for cpt parse header
    
    [ Upstream commit 85212a127e469c5560daf63a9782755ee4b03619 ]
    
    In case of ltype NPC_LT_LA_CPT_HDR, LA pointer is pointing to the
    start of cpt parse header. Since cpt parse header has variable
    length padding, this will be a problem for DMAC extraction. Adding
    KPU profile changes to adjust the LA pointer to start at ether header
    in case of cpt parse header by
       - Adding ptr advance in pkind 58 to a fixed value 40
       - Adding variable length offset 7 and mask 7 (pad len in
         CPT_PARSE_HDR).
    Also added the missing static declaration for npc_set_var_len_offset_pkind
    function.
    
    Signed-off-by: Kiran Kumar K <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

octeontx2-af: cn10k: RPM hardware timestamp configuration [+ + +]

Author: Hariprasad Kelam <[email protected]>
Date:   Tue Sep 28 17:00:59 2021 +0530

    octeontx2-af: cn10k: RPM hardware timestamp configuration
    
    [ Upstream commit d1489208681dfe432609fdaa49b160219c6e221c ]
    
    The MAC on CN10K supports hardware timestamping such that an additional
    8-byte header is prepended to incoming packets. This patch does the
    necessary configuration to enable hardware timestamping upon receiving
    a request from PF netdev interfaces.
    
    Timestamp configuration is different on MAC (CGX) Octeontx2 silicon
    and MAC (RPM) OcteonTX3 CN10k. Based on silicon variant appropriate
    fn() pointer is called. Refactor MAC specific mbox messages to remove
    unnecessary gaps in mboxids.
    
    Signed-off-by: Hariprasad Kelam <[email protected]>
    Signed-off-by: Sunil Goutham <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

octeontx2-af: cn10k: Use appropriate register for LMAC enable [+ + +]

Author: Geetha sowjanya <[email protected]>
Date:   Fri Jan 21 12:04:42 2022 +0530

    octeontx2-af: cn10k: Use appropriate register for LMAC enable
    
    [ Upstream commit fae80edeafbbba5ef9a0423aa5e5515518626433 ]
    
    CN10K platforms use RPM(0..2)_MTI_MAC100(0..3)_COMMAND_CONFIG
    register for lmac TX/RX enable whereas CN9xxx platforms use
    CGX_CMRX_CONFIG register. This config change was missed when
    adding support for CN10K RPM.
    
    Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support")
    Signed-off-by: Geetha sowjanya <[email protected]>
    Signed-off-by: Subbaraya Sundeep <[email protected]>
    Signed-off-by: Sunil Goutham <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

octeontx2-af: Optimize KPU1 processing for variable-length headers [+ + +]

Author: Kiran Kumar K <[email protected]>
Date:   Fri Sep 24 11:48:51 2021 +0530

    octeontx2-af: Optimize KPU1 processing for variable-length headers
    
    [ Upstream commit edadeb38dc2fa2550801995b748110c3e5e59557 ]
    
    Optimized KPU1 entry processing for variable-length custom L2 headers
    of size 24B, 90B by
            - Moving LA LTYPE parsing for 24B and 90B headers to PKIND.
            - Removing LA flags assignment for 24B and 90B headers.
            - Reserving a PKIND 55 to parse variable length headers.
    
    Also, new mailbox(NPC_SET_PKIND) added to configure PKIND with
    corresponding variable-length offset, mask, and shift count
    (NPC_AF_KPUX_ENTRYX_ACTION0).
    
    Signed-off-by: Kiran Kumar K <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

octeontx2-af: Reset PTP config in FLR handler [+ + +]

Author: Harman Kalra <[email protected]>
Date:   Tue Sep 28 17:00:58 2021 +0530

    octeontx2-af: Reset PTP config in FLR handler
    
    [ Upstream commit e37e08fffc373206ad4e905c05729ea6bbdcb22c ]
    
    Upon receiving a ptp config request from a netdev interface, the
    Octeontx2 MAC block CGX is configured to append a timestamp to every
    incoming packet and the NPC config is updated with the DMAC offset change.
    
    Currently this configuration is not reset in FLR handler. This patch
    resets the same.
    
    Signed-off-by: Harman Kalra <[email protected]>
    Signed-off-by: Hariprasad Kelam <[email protected]>
    Signed-off-by: Sunil Goutham <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

of: net: move of_net under net/ [+ + +]

Author: Jakub Kicinski <[email protected]>
Date:   Wed Oct 6 18:06:54 2021 -0700

    of: net: move of_net under net/
    
    [ Upstream commit e330fb14590c5c80f7195c3d8c9b4bcf79e1a5cd ]
    
    Rob suggests moving of_net.c from under drivers/of/ to somewhere
    in the networking code.
    
    Suggested-by: Rob Herring <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Reviewed-by: Rob Herring <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: aardvark: Fix checking for MEM resource type [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 17:01:47 2021 +0100

    PCI: aardvark: Fix checking for MEM resource type
    
    [ Upstream commit 2070b2ddea89f5b604fac3d27ade5cb6d19a5706 ]
    
    IORESOURCE_MEM_64 is not a resource type but a type flag.
    
    Remove incorrect check for type IORESOURCE_MEM_64.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 64f160e19e92 ("PCI: aardvark: Configure PCIe resources from 'ranges' DT property")
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Marek Behún <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
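
Why a type comparison against IORESOURCE_MEM_64 can never match is easier to see with the bit layout written out. The constants below are assumptions that are believed to mirror the kernel's resource flag definitions; they carry a DEMO_ prefix because this is an illustrative userspace sketch, and the helper names are hypothetical.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's resource flag layout; illustrative only. */
#define DEMO_IORESOURCE_TYPE_BITS 0x00001f00ul /* mask selecting the type */
#define DEMO_IORESOURCE_IO        0x00000100ul
#define DEMO_IORESOURCE_MEM       0x00000200ul
#define DEMO_IORESOURCE_MEM_64    0x00100000ul /* flag outside the type mask */

/* The resource type is whatever survives the type mask. */
static unsigned long demo_resource_type(unsigned long flags)
{
    return flags & DEMO_IORESOURCE_TYPE_BITS;
}

static bool demo_is_mem(unsigned long flags)
{
    return demo_resource_type(flags) == DEMO_IORESOURCE_MEM;
}

/* 64-bit capability is tested as a flag, not compared as a type. */
static bool demo_is_mem_64(unsigned long flags)
{
    return demo_is_mem(flags) && (flags & DEMO_IORESOURCE_MEM_64) != 0;
}
```

Because the 64-bit flag lies outside the type mask, a 64-bit memory resource still has type MEM, and the masked type never equals the flag value.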

PCI: dwc: Do not remap invalid res [+ + +]

Author: Tim Harvey <[email protected]>
Date:   Mon Nov 1 11:02:43 2021 -0700

    PCI: dwc: Do not remap invalid res
    
    [ Upstream commit 6e5ebc96ec651b67131f816d7e3bf286c635e749 ]
    
    On imx6 and perhaps others when pcie probes you get a:
    imx6q-pcie 33800000.pcie: invalid resource
    
    This occurs because the atu is not specified in the DT and as such it
    should not be remapped.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 281f1f99cf3a ("PCI: dwc: Detect number of iATU windows")
    Signed-off-by: Tim Harvey <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Reviewed-by: Rob Herring <[email protected]>
    Acked-by: Richard Zhu <[email protected]>
    Cc: Richard Zhu <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mediatek-gen3: Disable DVFSRC voltage request [+ + +]

Author: Jianjun Wang <[email protected]>
Date:   Fri Oct 15 14:36:02 2021 +0800

    PCI: mediatek-gen3: Disable DVFSRC voltage request
    
    [ Upstream commit ab344fd43f2958726d17d651c0cb692c67dca382 ]
    
    When the DVFSRC (dynamic voltage and frequency scaling resource collector)
    feature is not implemented, the PCIe hardware will assert a voltage request
    signal when exiting the L1 PM Substates to request a specific Vcore
    voltage, but cannot receive the voltage ready signal, which will cause
    the link to fail to exit the L1 PM Substates.
    
    Disable DVFSRC voltage request by default; we need to find a common way to
    enable it in the future.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: d3bf75b579b9 ("PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192")
    Tested-by: Qizhong Cheng <[email protected]>
    Signed-off-by: Jianjun Wang <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Reviewed-by: Tzung-Bi Shih <[email protected]>
    Reviewed-by: Matthias Brugger <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mvebu: Check for errors from pci_bridge_emul_init() call [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 13:45:52 2021 +0100

    PCI: mvebu: Check for errors from pci_bridge_emul_init() call
    
    [ Upstream commit 5d18d702e5c9309f4195653475c7a7fdde4ca71f ]
    
    Function pci_bridge_emul_init() may fail so correctly check for errors.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mvebu: Do not modify PCI IO type bits in conf_write [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 13:45:57 2021 +0100

    PCI: mvebu: Do not modify PCI IO type bits in conf_write
    
    [ Upstream commit 2cf150216e5b5619d7c25180ccf2cc8ac7bebc13 ]
    
    PCI IO type bits are already initialized in mvebu_pci_bridge_emul_init()
    function and only when IO support is enabled. These type bits are read-only
    and pci-bridge-emul.c code already does not allow to modify them from upper
    layers.
    
    When IO support is disabled then all IO registers should be read-only and
    return zeros. Therefore do not modify PCI IO type bits in
    mvebu_pci_bridge_emul_base_conf_write() callback.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mvebu: Fix configuring secondary bus of PCIe Root Port via emulated bridge [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 13:46:01 2021 +0100

    PCI: mvebu: Fix configuring secondary bus of PCIe Root Port via emulated bridge
    
    [ Upstream commit 91a8d79fc797d3486ae978beebdfc55261c7d65b ]
    
    It looks like the mvebu PCIe controller has a fully independent PCIe
    host bridge for each PCIe link, so every PCIe Root Port is isolated not
    only on its own bus but also from the others. But in the past the device
    tree structure was defined to put all PCIe Root Ports (as PCI Bridge
    devices) into one root bus 0, and this bus is emulated by the pci-mvebu.c
    driver.
    
    The reason for this decision was probably an incorrect understanding of
    the PCIe topology of these Armada SoCs, and also a misunderstanding of how
    the PCIe controller generates Type 0 and Type 1 config requests (it is
    completely different from other drivers). This probably incorrect setup
    led to very surprising things, like having the PCIe Root Port (a PCI Bridge
    device, even with an incorrect Device Class set to Memory Controller) and
    the PCIe device behind the Root Port on the same PCI bus, which obviously
    had to be hacked around somehow (as these two devices cannot in reality be
    on the same bus).
    
    Properly set mvebu local bus number and mvebu local device number based on
    PCI Bridge secondary bus number configuration. Also correctly report
    configured secondary bus number in config space. And explain in driver
    comment why this setup is correct.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mvebu: Fix device enumeration regression [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Mon Feb 14 12:02:28 2022 +0100

    PCI: mvebu: Fix device enumeration regression
    
    [ Upstream commit c49ae619905eebd3f54598a84e4cd2bd58ba8fe9 ]
    
    Jan reported that on Turris Omnia (Armada 385), no PCIe devices were
    detected after upgrading from v5.16.1 to v5.16.3 and identified the cause
    as the backport of 91a8d79fc797 ("PCI: mvebu: Fix configuring secondary bus
    of PCIe Root Port via emulated bridge"), which appeared in v5.17-rc1.
    
    91a8d79fc797 was incorrectly applied from mailing list patch [1] to the
    linux git repository [2] probably due to resolving merge conflicts
    incorrectly. Fix it now.
    
    [1] https://lore.kernel.org/r/[email protected]
    [2] https://git.kernel.org/linus/91a8d79fc797
    
    [bhelgaas: commit log]
    BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215540
    Fixes: 91a8d79fc797 ("PCI: mvebu: Fix configuring secondary bus of PCIe Root Port via emulated bridge")
    Link: https://lore.kernel.org/r/[email protected]
    Link: https://lore.kernel.org/r/20220127234917.GA150851@bhelgaas
    Reported-by: Jan Palus <[email protected]>
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mvebu: Fix support for bus mastering and PCI_COMMAND on emulated bridge [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 13:45:56 2021 +0100

    PCI: mvebu: Fix support for bus mastering and PCI_COMMAND on emulated bridge
    
    [ Upstream commit e42b85583719adb87ab88dc7bcd41b38011f7d11 ]
    
    According to the PCI specifications, bits [0:2] of the Command Register
    should be disabled by default on reset. So explicitly disable these bits
    at the very beginning of driver initialization.
    
    Also remove the code which unconditionally enables all 3 bits and let
    kernel code (via the pci_set_master() function) handle bus mastering of
    the PCI Bridge via the emulated PCI_COMMAND on the emulated bridge.
    
    Adjust existing functions mvebu_pcie_handle_iobase_change() and
    mvebu_pcie_handle_membase_change() to handle PCI_IO_BASE and PCI_MEM_BASE
    registers correctly even when bus mastering on emulated bridge is disabled.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
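
As a rough illustration of the Command Register handling described in the commit above, the following Python sketch models the three low bits using the values defined in include/uapi/linux/pci_regs.h. The function names command_after_reset() and set_master() are hypothetical and only mimic the effect on a 16-bit register value; they are not the driver's code.

```python
# Bit values reproduced from include/uapi/linux/pci_regs.h.
PCI_COMMAND_IO = 0x1      # bit 0: respond to I/O space accesses
PCI_COMMAND_MEMORY = 0x2  # bit 1: respond to memory space accesses
PCI_COMMAND_MASTER = 0x4  # bit 2: bus mastering

RESET_CLEARED_BITS = PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER


def command_after_reset(cmd: int) -> int:
    """Hypothetical model: clear bits [0:2], keep all other bits."""
    return cmd & ~RESET_CLEARED_BITS & 0xFFFF


def set_master(cmd: int) -> int:
    """Hypothetical model of the effect of a later pci_set_master() call."""
    return cmd | PCI_COMMAND_MASTER


cmd = command_after_reset(0x0147)  # all three low bits were set
print(hex(cmd))                    # low bits cleared
print(hex(set_master(cmd)))        # bus mastering enabled on request
```

The point of the sketch is the ordering: the bits start cleared, and bus mastering is switched on later by the PCI core rather than unconditionally by the driver.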

PCI: mvebu: Fix support for DEVCAP2, DEVCTL2 and LNKCTL2 registers on emulated bridge [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 13:46:05 2021 +0100

    PCI: mvebu: Fix support for DEVCAP2, DEVCTL2 and LNKCTL2 registers on emulated bridge
    
    [ Upstream commit 4ab34548c55fbbb3898306a47dfaccd4860e1ccb ]
    
    Armada XP and newer hardware supports access to the DEVCAP2, DEVCTL2 and
    LNKCTL2 configuration registers of the PCIe core via PCIE_CAP_PCIEXP. So
    export them via the emulated software root bridge.
    
    Pre-XP hardware does not support these registers and returns zeros.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mvebu: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 13:46:02 2021 +0100

    PCI: mvebu: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge
    
    [ Upstream commit d75404cc08832206f173668bd35391c581fea121 ]
    
    Hardware supports PCIe Hot Reset via PCIE_CTRL_OFF register. Use it for
    implementing PCI_BRIDGE_CTL_BUS_RESET bit of PCI_BRIDGE_CONTROL register on
    emulated bridge.
    
    With this change the function pci_reset_secondary_bus() starts working and
    can reset connected PCIe card.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mvebu: Fix support for PCI_EXP_DEVCTL on emulated bridge [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 13:46:03 2021 +0100

    PCI: mvebu: Fix support for PCI_EXP_DEVCTL on emulated bridge
    
    [ Upstream commit ecae073e393e65ee7be7ebf3fdd5258ab99f1636 ]
    
    The comment in the Armada 370 functional specification is misleading.
    PCI_EXP_DEVCTL_*RE bits are supported and configure the receiving of error
    interrupts.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mvebu: Fix support for PCI_EXP_RTSTA on emulated bridge [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 13:46:04 2021 +0100

    PCI: mvebu: Fix support for PCI_EXP_RTSTA on emulated bridge
    
    [ Upstream commit 838ff44a398ff47fe9b924961d91aee325821220 ]
    
    PME Status bit in Root Status Register (PCIE_RC_RTSTA_OFF) is read-only and
    can be cleared only by writing 0b to the Interrupt Cause RW0C register
    (PCIE_INT_CAUSE_OFF).
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: mvebu: Setup PCIe controller to Root Complex mode [+ + +]

Author: Pali Rohár <[email protected]>
Date:   Thu Nov 25 13:45:59 2021 +0100

    PCI: mvebu: Setup PCIe controller to Root Complex mode
    
    [ Upstream commit df08ac016124bd88b8598ac0599d7b89c0642774 ]
    
    This driver operates only in Root Complex mode, so ensure that hardware is
    properly configured in Root Complex mode.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Pali Rohár <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI: rcar: Check if device is runtime suspended instead of __clk_is_enabled() [+ + +]

Author: Marek Vasut <[email protected]>
Date:   Mon Nov 15 21:46:41 2021 +0100

    PCI: rcar: Check if device is runtime suspended instead of __clk_is_enabled()
    
    [ Upstream commit d2a14b54989e9ccea8401895fdfbc213bd1f56af ]
    
    Replace __clk_is_enabled() with pm_runtime_suspended(),
    as __clk_is_enabled() was checking the wrong bus clock
    and caused the following build error too:
      arm-linux-gnueabi-ld: drivers/pci/controller/pcie-rcar-host.o: in function `rcar_pcie_aarch32_abort_handler':
      pcie-rcar-host.c:(.text+0xdd0): undefined reference to `__clk_is_enabled'
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: a115b1bd3af0 ("PCI: rcar: Add L1 link state fix into data abort hook")
    Signed-off-by: Marek Vasut <[email protected]>
    Signed-off-by: Lorenzo Pieralisi <[email protected]>
    Reviewed-by: Geert Uytterhoeven <[email protected]>
    Acked-by: Randy Dunlap <[email protected]>
    Cc: Arnd Bergmann <[email protected]>
    Cc: Bjorn Helgaas <[email protected]>
    Cc: Geert Uytterhoeven <[email protected]>
    Cc: Lorenzo Pieralisi <[email protected]>
    Cc: Stephen Boyd <[email protected]>
    Cc: Wolfram Sang <[email protected]>
    Cc: Yoshihiro Shimoda <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

pinctrl: sunxi: Use unique lockdep classes for IRQs [+ + +]

Author: Samuel Holland <[email protected]>
Date:   Tue Feb 15 22:00:36 2022 -0600

    pinctrl: sunxi: Use unique lockdep classes for IRQs
    
    commit bac129dbc6560dfeb634c03f0c08b78024e71915 upstream.
    
    This driver, like several others, uses a chained IRQ for each GPIO bank,
    and forwards .irq_set_wake to the GPIO bank's upstream IRQ. As a result,
    a call to irq_set_irq_wake() needs to lock both the upstream and
    downstream irq_desc's. Lockdep considers this to be a possible deadlock
    when the irq_desc's share lockdep classes, which they do by default:
    
     ============================================
     WARNING: possible recursive locking detected
     5.17.0-rc3-00394-gc849047c2473 #1 Not tainted
     --------------------------------------------
     init/307 is trying to acquire lock:
     c2dfe27c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
    
     but task is already holding lock:
     c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
    
     other info that might help us debug this:
      Possible unsafe locking scenario:
    
            CPU0
            ----
       lock(&irq_desc_lock_class);
       lock(&irq_desc_lock_class);
    
      *** DEADLOCK ***
    
      May be due to missing lock nesting notation
    
     4 locks held by init/307:
      #0: c1f29f18 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot+0x90/0x23c
      #1: c20f7760 (&dev->mutex){....}-{3:3}, at: device_shutdown+0xf4/0x224
      #2: c2e804d8 (&dev->mutex){....}-{3:3}, at: device_shutdown+0x104/0x224
      #3: c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
    
     stack backtrace:
     CPU: 0 PID: 307 Comm: init Not tainted 5.17.0-rc3-00394-gc849047c2473 #1
     Hardware name: Allwinner sun8i Family
      unwind_backtrace from show_stack+0x10/0x14
      show_stack from dump_stack_lvl+0x68/0x90
      dump_stack_lvl from __lock_acquire+0x1680/0x31a0
      __lock_acquire from lock_acquire+0x148/0x3dc
      lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
      _raw_spin_lock_irqsave from __irq_get_desc_lock+0x58/0xa0
      __irq_get_desc_lock from irq_set_irq_wake+0x2c/0x19c
      irq_set_irq_wake from irq_set_irq_wake+0x13c/0x19c
        [tail call from sunxi_pinctrl_irq_set_wake]
      irq_set_irq_wake from gpio_keys_suspend+0x80/0x1a4
      gpio_keys_suspend from gpio_keys_shutdown+0x10/0x2c
      gpio_keys_shutdown from device_shutdown+0x180/0x224
      device_shutdown from __do_sys_reboot+0x134/0x23c
      __do_sys_reboot from ret_fast_syscall+0x0/0x1c
    
    However, this can never deadlock because the upstream and downstream
    IRQs are never the same (nor do they even involve the same irqchip).
    
    Silence this erroneous lockdep splat by applying what appears to be the
    usual fix of moving the GPIO IRQs to separate lockdep classes.
    
    Fixes: a59c99d9eaf9 ("pinctrl: sunxi: Forward calls to irq_set_irq_wake")
    Reported-by: Guenter Roeck <[email protected]>
    Signed-off-by: Samuel Holland <[email protected]>
    Reviewed-by: Jernej Skrabec <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Linus Walleij <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

proc: fix documentation and description of pagemap [+ + +]

Author: Yun Zhou <[email protected]>
Date:   Fri Mar 4 20:29:07 2022 -0800

    proc: fix documentation and description of pagemap
    
    commit dd21bfa425c098b95ca86845f8e7d1ec1ddf6e4a upstream.
    
    Since bit 57 was exported for uffd-wp write-protected (commit
    fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"),
    fixing it can reduce some unnecessary confusion.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information")
    Signed-off-by: Yun Zhou <[email protected]>
    Reviewed-by: Peter Xu <[email protected]>
    Cc: Jonathan Corbet <[email protected]>
    Cc: Tiberiu A Georgescu <[email protected]>
    Cc: Florian Schmidt <[email protected]>
    Cc: Ivan Teterevkov <[email protected]>
    Cc: SeongJae Park <[email protected]>
    Cc: Yang Shi <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: Axel Rasmussen <[email protected]>
    Cc: Miaohe Lin <[email protected]>
    Cc: Andrea Arcangeli <[email protected]>
    Cc: Colin Cross <[email protected]>
    Cc: Alistair Popple <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

regulator: core: fix false positive in regulator_late_cleanup() [+ + +]

Author: Oliver Barta <[email protected]>
Date:   Tue Feb 8 09:46:45 2022 +0100

    regulator: core: fix false positive in regulator_late_cleanup()
    
    [ Upstream commit 4e2a354e3775870ca823f1fb29bbbffbe11059a6 ]
    
    The check done by regulator_late_cleanup() to detect whether a regulator
    is on was inconsistent with the check done by _regulator_is_enabled().
    While _regulator_is_enabled() takes the enable GPIO into account,
    regulator_late_cleanup() was not doing that.
    
    This resulted in a false positive, e.g. when a GPIO-controlled fixed
    regulator was used, which was not enabled at boot time, e.g.
    
    reg_disp_1v2: reg_disp_1v2 {
            compatible = "regulator-fixed";
            regulator-name = "display_1v2";
            regulator-min-microvolt = <1200000>;
            regulator-max-microvolt = <1200000>;
            gpio = <&tlmm 148 0>;
            enable-active-high;
    };
    
    Such a regulator doesn't have an is_enabled() operation. Nevertheless
    its state can be determined based on the enable GPIO. The check in
    regulator_late_cleanup() wrongly assumed that the regulator is on and
    tried to disable it.
    
    Signed-off-by: Oliver Barta <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
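
The inconsistency described in the commit above can be sketched in Python. This is a simplified model under stated assumptions: the Regulator class and its field names are hypothetical stand-ins for the parts of struct regulator_dev involved, and the two functions only approximate the logic the commit message attributes to _regulator_is_enabled() and to the pre-fix regulator_late_cleanup().

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Regulator:
    """Hypothetical stand-in for the regulator state used by both checks."""
    has_ena_pin: bool = False
    ena_gpio_state: bool = False
    is_enabled_op: Optional[Callable[[], bool]] = None


def regulator_is_enabled(rdev: Regulator) -> bool:
    """Model of the consistent check: the enable GPIO is taken into account."""
    if rdev.has_ena_pin:
        return rdev.ena_gpio_state
    if rdev.is_enabled_op is None:
        return True  # state unknown, assume the regulator is on
    return rdev.is_enabled_op()


def late_cleanup_check_old(rdev: Regulator) -> bool:
    """Model of the pre-fix check: the enable GPIO is ignored."""
    if rdev.is_enabled_op is not None:
        return rdev.is_enabled_op()
    return True


# GPIO-controlled fixed regulator that was not enabled at boot time.
display_1v2 = Regulator(has_ena_pin=True, ena_gpio_state=False)
print(late_cleanup_check_old(display_1v2))  # wrongly reports "on"
print(regulator_is_enabled(display_1v2))    # reports the GPIO state
```

With the fix, both code paths agree, so the cleanup no longer tries to disable a regulator that is already off.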

Revert "nfsd: skip some unnecessary stats in the v4 case" [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Fri Dec 24 14:22:28 2021 -0500

    Revert "nfsd: skip some unnecessary stats in the v4 case"
    
    [ Upstream commit 58f258f65267542959487dbe8b5641754411843d ]
    
    On the wire, I observed NFSv4 OPEN(CREATE) operations sometimes
    returning a reasonable-looking value in the cinfo.before field and
    zero in the cinfo.after field.
    
    RFC 8881 Section 10.8.1 says:
    > When a client is making changes to a given directory, it needs to
    > determine whether there have been changes made to the directory by
    > other clients.  It does this by using the change attribute as
    > reported before and after the directory operation in the associated
    > change_info4 value returned for the operation.
    
    and
    
    > ... The post-operation change
    > value needs to be saved as the basis for future change_info4
    > comparisons.
    
    A good quality client implementation therefore saves the zero
    cinfo.after value. During a subsequent OPEN operation, it will
    receive a different non-zero value in the cinfo.before field for
    that directory, and it will incorrectly believe the directory has
    changed, triggering an undesirable directory cache invalidation.
    
    There are filesystem types where fs_supports_change_attribute()
    returns false, tmpfs being one. On NFSv4 mounts, this means the
    fh_getattr() call site in fill_pre_wcc() and fill_post_wcc() is
    never invoked. Subsequently, nfsd4_change_attribute() is invoked
    with an uninitialized @stat argument.
    
    In fill_pre_wcc(), @stat contains stale stack garbage, which is
    then placed on the wire. In fill_post_wcc(), ->fh_post_wc is all
    zeroes, so zero is placed on the wire. Both of these values are
    meaningless.
    
    This fix can be applied immediately to stable kernels. Once there
    are more regression tests in this area, this optimization can be
    attempted again.
    
    Fixes: 428a23d2bf0c ("nfsd: skip some unnecessary stats in the v4 case")
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6" [+ + +]

Author: Jiri Bohac <[email protected]>
Date:   Wed Jan 26 16:00:18 2022 +0100

    Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"
    
    commit a6d95c5a628a09be129f25d5663a7e9db8261f51 upstream.
    
    This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a.
    
    Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu
    should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS
    calculation in ipsec transport mode, resulting in complete stalls of TCP
    connections. This happens when the (P)MTU is 1280 or slightly larger.
    
    The desired formula for the MSS is:
    MSS = (MTU - ESP_overhead) - IP header - TCP header
    
    However, the above commit clamps the (MTU - ESP_overhead) to a
    minimum of 1280, turning the formula into
    MSS = max(MTU - ESP_overhead, 1280) - IP header - TCP header
    
    With the (P)MTU near 1280, the calculated MSS is too large and the
    resulting TCP packets never make it to the destination because they
    are over the actual PMTU.
    
    The above commit also causes suboptimal double fragmentation in
    xfrm tunnel mode, as described in
    https://lore.kernel.org/netdev/[email protected]/
    
    The original problem the above commit was trying to fix is now fixed
    by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU
    regression").
    
    Signed-off-by: Jiri Bohac <[email protected]>
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
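
The two MSS formulas in the commit above can be worked through numerically. The Python sketch below is an illustration only: the ESP overhead of 40 bytes is an assumed example value (the real overhead depends on the cipher and mode), the header sizes assume a plain 40-byte IPv6 header and a 20-byte TCP header without options, and the function names are hypothetical.

```python
IPV6_MIN_MTU = 1280  # clamp value used by the reverted commit


def mss_desired(mtu, esp_overhead, ip_hdr=40, tcp_hdr=20):
    """MSS = (MTU - ESP_overhead) - IP header - TCP header."""
    return (mtu - esp_overhead) - ip_hdr - tcp_hdr


def mss_clamped(mtu, esp_overhead, ip_hdr=40, tcp_hdr=20):
    """MSS = max(MTU - ESP_overhead, 1280) - IP header - TCP header."""
    return max(mtu - esp_overhead, IPV6_MIN_MTU) - ip_hdr - tcp_hdr


def wire_size(mss, esp_overhead, ip_hdr=40, tcp_hdr=20):
    """Size of a full-MSS segment once all headers and ESP are added back."""
    return mss + tcp_hdr + ip_hdr + esp_overhead


ESP_OVERHEAD = 40  # assumed example value, not taken from the commit
for name, fn in (("desired", mss_desired), ("clamped", mss_clamped)):
    mss = fn(1280, ESP_OVERHEAD)
    print(name, mss, wire_size(mss, ESP_OVERHEAD))
```

Under these assumptions the desired formula yields packets that exactly fit a 1280-byte path MTU, while the clamped formula yields packets that exceed it by the ESP overhead, matching the stall described in the commit message.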

riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value [+ + +]

Author: Sunil V L <[email protected]>
Date:   Fri Jan 28 10:20:04 2022 +0530

    riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value
    
    commit dcf0c838854c86e1f41fb1934aea906845d69782 upstream.
    
    The get_boot_hartid_from_fdt() function currently returns U32_MAX
    for failure case which is not correct because U32_MAX is a valid
    hartid value. This patch fixes the issue by returning error code.
    
    Cc: <[email protected]>
    Fixes: d7071743db31 ("RISC-V: Add EFI stub support.")
    Signed-off-by: Sunil V L <[email protected]>
    Reviewed-by: Heinrich Schuchardt <[email protected]>
    Signed-off-by: Ard Biesheuvel <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
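
Why a sentinel return value is ambiguous here can be shown with a short Python sketch. It is a simplified model, not the EFI stub code: the device tree is represented by a plain dict, the key "boot-hartid" and both function names are hypothetical, and the kernel fix's actual return type and storage of the hartid are not reproduced.

```python
U32_MAX = 0xFFFFFFFF
EINVAL = 22


def boot_hartid_sentinel(fdt: dict) -> int:
    """Old convention: U32_MAX is returned when the property is missing."""
    return fdt.get("boot-hartid", U32_MAX)


def boot_hartid_status(fdt: dict):
    """New convention: return (status, hartid); status is 0 or -errno."""
    if "boot-hartid" not in fdt:
        return -EINVAL, None
    return 0, fdt["boot-hartid"]


valid_max = {"boot-hartid": U32_MAX}  # a legitimate hartid equal to U32_MAX
missing = {}                          # property absent

# The sentinel cannot tell the two cases apart.
print(boot_hartid_sentinel(valid_max) == boot_hartid_sentinel(missing))
# A separate status keeps every 32-bit hartid usable.
print(boot_hartid_status(valid_max), boot_hartid_status(missing))
```

Separating the status from the value is the general technique; the commit applies it by returning an error code instead of overloading U32_MAX.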

riscv/mm: Add XIP_FIXUP for phys_ram_base [+ + +]

Author: Palmer Dabbelt <[email protected]>
Date:   Fri Feb 4 13:13:37 2022 -0800

    riscv/mm: Add XIP_FIXUP for phys_ram_base
    
    [ Upstream commit 4b1c70aa8ed8249608bb991380cb8ff423edf49e ]
    
    This manifests as a crash early in boot on VexRiscv.
    
    Signed-off-by: Myrtle Shah <[email protected]>
    [Palmer: split commit]
    Fixes: 6d7f91d914bc ("riscv: Get rid of CONFIG_PHYS_RAM_BASE in kernel physical address conversion")
    Cc: [email protected]
    Signed-off-by: Palmer Dabbelt <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

riscv: Fix config KASAN && DEBUG_VIRTUAL [+ + +]

Author: Alexandre Ghiti <[email protected]>
Date:   Fri Feb 25 13:39:51 2022 +0100

    riscv: Fix config KASAN && DEBUG_VIRTUAL
    
    commit c648c4bb7d02ceb53ee40172fdc4433b37cee9c6 upstream.
    
    __virt_to_phys function is called very early in the boot process (ie
    kasan_early_init) so it should not be instrumented by KASAN otherwise it
    bugs.
    
    Fix this by declaring phys_addr.c as non-kasan instrumentable.
    
    Signed-off-by: Alexandre Ghiti <[email protected]>
    Fixes: 8ad8b72721d0 (riscv: Add KASAN support)
    Cc: [email protected]
    Signed-off-by: Palmer Dabbelt <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP [+ + +]

Author: Alexandre Ghiti <[email protected]>
Date:   Fri Feb 25 13:39:49 2022 +0100

    riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
    
    commit a3d328037846d013bb4c7f3777241e190e4c75e1 upstream.
    
    In order to get the pfn of a struct page* when sparsemem is enabled
    without vmemmap, the mem_section structures need to be initialized which
    happens in sparse_init.
    
    But kasan_early_init calls pfn_to_page way before sparse_init is called,
    which then tries to dereference a null mem_section pointer.
    
    Fix this by removing the usage of this function in kasan_early_init.
    
    Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
    Signed-off-by: Alexandre Ghiti <[email protected]>
    Cc: [email protected]
    Signed-off-by: Palmer Dabbelt <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

s390/extable: fix exception table sorting [+ + +]

Author: Heiko Carstens <[email protected]>
Date:   Thu Feb 24 22:03:29 2022 +0100

    s390/extable: fix exception table sorting
    
    commit c194dad21025dfd043210912653baab823bdff67 upstream.
    
    s390 has a swap_ex_entry_fixup function, however it is not being used
    since common code expects a swap_ex_entry_fixup define. If it is not
    defined the default implementation will be used. So fix this by adding
    a proper define.
    However also the implementation of the function must be fixed, since a
    NULL value for handler has a special meaning and must not be adjusted.
    
    Luckily all of this doesn't fix a real bug currently: the main extable
    is correctly sorted during build time, and for runtime sorting there
    is currently no case where the handler field is not NULL.
    
    Fixes: 05a68e892e89 ("s390/kernel: expand exception table logic to allow new handling options")
    Acked-by: Ilya Leoshkevich <[email protected]>
    Reviewed-by: Alexander Gordeev <[email protected]>
    Signed-off-by: Heiko Carstens <[email protected]>
    Signed-off-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

sched/fair: Fix fault in reweight_entity [+ + +]

Author: Tadeusz Struk <[email protected]>
Date:   Thu Feb 3 08:18:46 2022 -0800

    sched/fair: Fix fault in reweight_entity
    
    [ Upstream commit 13765de8148f71fa795e0a6607de37c49ea5915a ]
    
    Syzbot found a GPF in reweight_entity. This has been bisected to
    commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid
    sched_task_group")
    
    There is a race between sched_post_fork() and setpriority(PRIO_PGRP)
    within a thread group that causes a null-ptr-deref in
    reweight_entity() in CFS. The scenario is that the main process spawns
    a number of new threads, which then call setpriority(PRIO_PGRP, 0, -20),
    wait, and exit.  For each of the new threads the copy_process() gets
    invoked, which adds the new task_struct and calls sched_post_fork()
    for it.
    
    In the above scenario there is a possibility that
    setpriority(PRIO_PGRP) and set_one_prio() will be called for a thread
    in the group that is just being created by copy_process(), and for
    which the sched_post_fork() has not been executed yet. This will
    trigger a null pointer dereference in reweight_entity(), as it will
    try to access the run queue pointer, which hasn't been set.
    
    Before the mentioned change the cfs_rq pointer for the task was
    set in sched_fork(), which is called much earlier in copy_process(),
    before the new task is added to the thread_group.  Now it is done in
    sched_post_fork(), which is called after that.  To fix the issue,
    remove the update_load param from the set_load_weight() function
    and call reweight_task() only if the task does not have the
    TASK_NEW flag set.
    
    Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
    Reported-by: [email protected]
    Signed-off-by: Tadeusz Struk <[email protected]>
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Reviewed-by: Dietmar Eggemann <[email protected]>
    Cc: [email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

sched: Fix yet more sched_fork() races [+ + +]

Author: Peter Zijlstra <[email protected]>
Date:   Mon Feb 14 10:16:57 2022 +0100

    sched: Fix yet more sched_fork() races
    
    commit b1e8206582f9d680cff7d04828708c8b6ab32957 upstream.
    
    Where commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an
    invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
    race vs syscalls by not placing the task on the runqueue before it
    gets exposed through the pidhash.
    
    Commit 13765de8148f ("sched/fair: Fix fault in reweight_entity") is
    trying to fix a single instance of this, instead fix the whole class
    of issues, effectively reverting this commit.
    
    Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
    Reported-by: Linus Torvalds <[email protected]>
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Tested-by: Tadeusz Struk <[email protected]>
    Tested-by: Zhang Qiao <[email protected]>
    Tested-by: Dietmar Eggemann <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT [+ + +]

Author: Krzysztof Kozlowski <[email protected]>
Date:   Mon Feb 14 09:36:57 2022 +0100

    selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT
    
    [ Upstream commit 6fec1ab67f8d60704cc7de64abcfd389ab131542 ]
    
    The PREEMPT_RT patchset does not use do_softirq() function thus trying
    to filter for do_softirq fails for such kernel:
    
      echo do_softirq
      ftracetest: 81: echo: echo: I/O error
    
    Choose some other visible function for the test.  The function does not
    have to be actually executed during the test, because it is only testing
    filter API interface.
    
    Signed-off-by: Krzysztof Kozlowski <[email protected]>
    Reviewed-by: Shuah Khan <[email protected]>
    Acked-by: Sebastian Andrzej Siewior <[email protected]>
    Reviewed-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Shuah Khan <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

selftests/seccomp: Fix seccomp failure by adding missing headers [+ + +]

Author: Sherry Yang <[email protected]>
Date:   Thu Feb 10 12:30:49 2022 -0800

    selftests/seccomp: Fix seccomp failure by adding missing headers
    
    [ Upstream commit 21bffcb76ee2fbafc7d5946cef10abc9df5cfff7 ]
    
    seccomp_bpf failed on tests 47 global.user_notification_filter_empty
    and 48 global.user_notification_filter_empty_threaded when it was
    tested on an updated kernel but with old kernel headers, because old
    kernel headers don't have a definition of the macro __NR_clone3, which
    is required for these two tests. Since under selftests/, we can install
    headers once for all tests (the default INSTALL_HDR_PATH is
    usr/include), fix it by adding usr/include to the list of directories
    to be searched. Use "-isystem" to indicate it's a system directory as
    the real kernel headers directories are.
    
    Signed-off-by: Sherry Yang <[email protected]>
    Tested-by: Sherry Yang <[email protected]>
    Reviewed-by: Kees Cook <[email protected]>
    Signed-off-by: Shuah Khan <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting [+ + +]

Author: Waiman Long <[email protected]>
Date:   Fri Jan 14 14:07:58 2022 -0800

    selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting
    
    [ Upstream commit 209376ed2a8431ccb4c40fdcef11194fc1e749b0 ]
    
    The hugetlb cgroup reservation test charge_reserved_hugetlb.sh assumes
    that no cgroup filesystems are mounted before running the test.  That is
    not true in many cases.  As a result, the test fails to run.  Fix that
    by querying the current cgroup mount setting and using the existing
    cgroup setup instead before attempting to freshly mount a cgroup
    filesystem.
    
    A similar change is made for hugetlb_reparenting_test.sh, though it
    still has a problem if cgroup v2 isn't used.
    
    The patched test scripts were run on a CentOS 8 based system to verify
    that they ran properly.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
    Signed-off-by: Waiman Long <[email protected]>
    Acked-by: Mina Almasry <[email protected]>
    Cc: Shuah Khan <[email protected]>
    Cc: Mike Kravetz <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

selftests: mlxsw: resource_scale: Fix return value [+ + +]

Author: Amit Cohen <[email protected]>
Date:   Wed Mar 2 18:14:47 2022 +0200

    selftests: mlxsw: resource_scale: Fix return value
    
    [ Upstream commit 196f9bc050cbc5085b4cbb61cce2efe380bc66d0 ]
    
    The test runs several test cases and is supposed to return an error in
    case at least one of them failed.
    
    Currently, the check of the return value of each test case is in the
    wrong place, which can result in the wrong return value. For example:
    
     # TESTS='tc_police' ./resource_scale.sh
     TEST: 'tc_police' [default] 968                                     [FAIL]
             tc police offload count failed
     Error: mlxsw_spectrum: Failed to allocate policer index.
     We have an error talking to the kernel
     Command failed /tmp/tmp.i7Oc5HwmXY:969
     TEST: 'tc_police' [default] overflow 969                            [ OK ]
     ...
     TEST: 'tc_police' [ipv4_max] overflow 969                           [ OK ]
    
     $ echo $?
     0
    
    Fix this by moving the check to be done after each test case.
    
    Fixes: 059b18e21c63 ("selftests: mlxsw: Return correct error code in resource scale test")
    Signed-off-by: Amit Cohen <[email protected]>
    Reviewed-by: Petr Machata <[email protected]>
    Signed-off-by: Ido Schimmel <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
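
The failure mode described in the resource_scale entry above can be sketched as follows. This is a minimal Python model, not the shell logic of resource_scale.sh: run_suite_buggy, run_suite_fixed and the boolean result lists are hypothetical names used only for illustration. It shows one way a misplaced check can behave: if only the most recent result decides the exit status, a later passing case hides an earlier failure.

```python
def run_suite_buggy(results):
    """Derive the exit status from the last case only (models a misplaced check)."""
    status = 0
    for passed in results:
        # Each iteration overwrites the status, so an earlier failure is lost.
        status = 0 if passed else 1
    return status


def run_suite_fixed(results):
    """Return non-zero if at least one case failed (check after each case)."""
    status = 0
    for passed in results:
        if not passed:
            status = 1  # remember the failure for the final exit status
    return status
```

With results [False, True, True], mirroring the [FAIL] followed by [ OK ] lines in the log, the first function reports success and the second reports failure.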

selftests: mlxsw: tc_police_scale: Make test more robust [+ + +]

Author: Amit Cohen <[email protected]>
Date:   Wed Mar 2 18:14:46 2022 +0200

    selftests: mlxsw: tc_police_scale: Make test more robust
    
    commit dc9752075341e7beb653e37c6f4a3723074dc8bc upstream.
    
    The test adds tc filters and checks how many of them were offloaded by
    grepping for 'in_hw'.
    
    iproute2 commit f4cd4f127047 ("tc: add skip_hw and skip_sw to control
    action offload") added offload indication to tc actions, producing the
    following output:
    
     $ tc filter show dev swp2 ingress
     ...
     filter protocol ipv6 pref 1000 flower chain 0 handle 0x7c0
       eth_type ipv6
       dst_ip 2001:db8:1::7bf
       skip_sw
       in_hw in_hw_count 1
             action order 1:  police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b
             ref 1 bind 1
             not_in_hw
             used_hw_stats immediate
    
    The current grep expression matches on both 'in_hw' and 'not_in_hw',
    resulting in incorrect results.
    
    Fix that by using JSON output instead.
    
    Fixes: 5061e773264b ("selftests: mlxsw: Add scale test for tc-police")
    Signed-off-by: Amit Cohen <[email protected]>
    Reviewed-by: Petr Machata <[email protected]>
    Signed-off-by: Ido Schimmel <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
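
The matching problem in the tc_police_scale entry above can be reproduced outside the test. The sketch below is Python rather than the shell used by the selftest, and the JSON document is a hypothetical shape chosen for illustration; the real output of tc in JSON mode may be structured differently. It contrasts a substring match on the text output with reading a boolean flag from structured output.

```python
import json

# A few lines taken from the text output quoted in the commit message.
TEXT_OUTPUT = """\
  skip_sw
  in_hw in_hw_count 1
        action order 1:  police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b
        not_in_hw
"""

# Hypothetical JSON shape for one filter (an assumption, not real tc output).
JSON_OUTPUT = json.dumps([
    {"options": {"in_hw": True,
                 "actions": [{"kind": "police", "not_in_hw": True}]}}
])


def count_substring_lines(text, needle="in_hw"):
    """Count lines containing the needle, as a plain substring grep would."""
    return sum(1 for line in text.splitlines() if needle in line)


def count_offloaded_filters(json_text):
    """Count filters whose own in_hw flag is true, ignoring action-level flags."""
    filters = json.loads(json_text)
    return sum(1 for f in filters if f.get("options", {}).get("in_hw") is True)
```

The substring count sees two matching lines for a single filter, because "not_in_hw" contains "in_hw"; the structured count sees one filter.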

serial: stm32: prevent TDR register overwrite when sending x_char [+ + +]

Author: Valentin Caron <[email protected]>
Date:   Tue Jan 11 17:44:40 2022 +0100

    serial: stm32: prevent TDR register overwrite when sending x_char
    
    [ Upstream commit d3d079bde07e1b7deaeb57506dc0b86010121d17 ]
    
    When sending x_char in stm32_usart_transmit_chars(), the driver can
    overwrite the value of the TDR register with the value of x_char. If
    this happens, the previous value that was present in the TDR register
    will not be sent through the UART.
    
    This patch checks that the previous value in the TDR register has been
    sent before writing the x_char value into the register.
    
    Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
    Cc: stable <[email protected]>
    Signed-off-by: Valentin Caron <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

signal: In get_signal test for signal_group_exit every time through the loop [+ + +]

Author: Eric W. Biederman <[email protected]>
Date:   Mon Nov 15 11:55:57 2021 -0600

    signal: In get_signal test for signal_group_exit every time through the loop
    
    [ Upstream commit e7f7c99ba911f56bc338845c1cd72954ba591707 ]
    
    Recently while investigating a problem with rr and signals I noticed
    that siglock is dropped in ptrace_signal and get_signal does not jump
    to relock.
    
    Looking further to see if the problem is anywhere else I see that
    do_signal_stop also returns if signal_group_exit is true.  I believe
    that test can now never be true, but it is a bit hard to trace
    through and be certain.
    
    Testing signal_group_exit is not expensive, so move the test for
    signal_group_exit into the for loop inside of get_signal to ensure
    the test is never skipped improperly.
    
    This has been a potential problem since the test for
    signal_group_exit was added.
    
    Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
    Reviewed-by: Kees Cook <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: "Eric W. Biederman" <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: guts: Add a missing memory allocation failure check [+ + +]

Author: Christophe JAILLET <[email protected]>
Date:   Wed Nov 3 21:00:33 2021 +0100

    soc: fsl: guts: Add a missing memory allocation failure check
    
    [ Upstream commit b9abe942cda43a1d46a0fd96efb54f1aa909f757 ]
    
    If 'devm_kstrdup()' fails, we should return -ENOMEM.
    
    While at it, move the 'of_node_put()' call in the error handling path and
    after the 'machine' has been copied.
    Better safe than sorry.
    
    Fixes: a6fc3b698130 ("soc: fsl: add GUTS driver for QorIQ platforms")
    Depends-on: fddacc7ff4dd ("soc: fsl: guts: Revert commit 3c0d64e867ed")
    Suggested-by: Tyrel Datwyler <[email protected]>
    Signed-off-by: Christophe JAILLET <[email protected]>
    Signed-off-by: Li Yang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: guts: Revert commit 3c0d64e867ed [+ + +]

Author: Christophe JAILLET <[email protected]>
Date:   Wed Nov 3 21:00:17 2021 +0100

    soc: fsl: guts: Revert commit 3c0d64e867ed
    
    [ Upstream commit b113737cf12964a20cc3ba1ddabe6229099661c6 ]
    
    This reverts commit 3c0d64e867ed
    ("soc: fsl: guts: reuse machine name from device tree").
    
    A following patch will fix the missing memory allocation failure check
    instead.
    
    Suggested-by: Tyrel Datwyler <[email protected]>
    Signed-off-by: Christophe JAILLET <[email protected]>
    Signed-off-by: Li Yang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: qe: Check of ioremap return value [+ + +]

Author: Jiasheng Jiang <[email protected]>
Date:   Thu Dec 30 09:45:43 2021 +0800

    soc: fsl: qe: Check of ioremap return value
    
    [ Upstream commit a222fd8541394b36b13c89d1698d9530afd59a9c ]
    
    Since ioremap() can fail, par_io could be NULL. Check it and return
    an error so that initialization cannot proceed with a NULL mapping.
    I also notice that none of the callers, such as
    mpc85xx_qe_par_io_init() in `arch/powerpc/platforms/85xx/common.c`,
    check the return value of par_io_init().
    The return value of par_io_init() needs to be checked to handle the
    potential error.
    I will submit another patch to fix that.
    In any case, par_io_init() itself should be fixed.
    
    Fixes: 7aa1aa6ecec2 ("QE: Move QE from arch/powerpc to drivers/soc")
    Signed-off-by: Jiasheng Jiang <[email protected]>
    Signed-off-by: Li Yang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sat Jan 8 16:59:54 2022 -0500

    SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points
    
    [ Upstream commit 16720861675393a35974532b3c837d9fd7bfe08c ]
    
    Avoid potentially hazardous memory copying and the needless use of
    "%pIS" -- in the kernel, an RPC service listener is always bound to
    ANYADDR. Having the network namespace is helpful when recording
    errors, though.
    
    Fixes: a0469f46faab ("SUNRPC: Replace dprintk call sites in TCP state change callouts")
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point [+ + +]

Author: Chuck Lever <[email protected]>
Date:   Sun Jan 9 13:26:51 2022 -0500

    SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point
    
    [ Upstream commit dc6c6fb3d639756a532bcc47d4a9bf9f3965881b ]
    
    While testing, I got an unexpected KASAN splat:
    
    Jan 08 13:50:27 oracle-102.nfsv4.dev kernel: BUG: KASAN: stack-out-of-bounds in trace_event_raw_event_svc_xprt_create_err+0x190/0x210 [sunrpc]
    Jan 08 13:50:27 oracle-102.nfsv4.dev kernel: Read of size 28 at addr ffffc9000008f728 by task mount.nfs/4628
    
    The memcpy() in the TP_fast_assign section of this trace point
    copies the size of the destination buffer in order that the buffer
    won't be overrun.
    
    In other similar trace points, the source buffer for this memcpy is
    a "struct sockaddr_storage" so the actual length of the source
    buffer is always long enough to prevent the memcpy from reading
    uninitialized or unallocated memory.
    
    However, for this trace point, the source buffer can be as small as
    a "struct sockaddr_in". For AF_INET sockaddrs, the memcpy() reads
    memory that follows the source buffer, which is not always valid
    memory.
    
    To avoid copying past the end of the passed-in sockaddr, make the
    source address's length available to the memcpy(). It would be a
    little nicer if the tracing infrastructure was more friendly about
    storing socket addresses that are not AF_INET, but I could not find
    a way to make printk("%pIS") work with a dynamic array.
    
    Reported-by: KASAN
    Fixes: 4b8f380e46e4 ("SUNRPC: Tracepoint to record errors in svc_xpo_create()")
    Signed-off-by: Chuck Lever <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
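
The idea behind the svc_xprt_create_error fix above, bounding the copy by the source length as well as the destination size, can be sketched in Python. This is an illustration only, not the trace point code: record_address is a hypothetical helper, and the destination size of 28 bytes is an assumption based on the "Read of size 28" in the KASAN report and the size of struct sockaddr_in6 on Linux.

```python
SOCKADDR_IN_LEN = 16    # size of struct sockaddr_in on Linux
SOCKADDR_IN6_LEN = 28   # size of struct sockaddr_in6; assumed record size


def record_address(src, dst_len=SOCKADDR_IN6_LEN):
    """Copy an address into a fixed-size, zero-filled record.

    The number of bytes copied is limited by both the source length and the
    destination size, so nothing beyond the end of the source is read.
    """
    dst = bytearray(dst_len)
    n = min(len(src), dst_len)
    dst[:n] = src[:n]
    return bytes(dst), n
```

For a 16-byte AF_INET address, only 16 bytes are copied and the remaining 12 bytes of the record stay zero, instead of being filled with whatever follows the source buffer.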

thermal: core: Fix TZ_GET_TRIP NULL pointer dereference [+ + +]

Author: Nicolas Cavallari <[email protected]>
Date:   Mon Feb 28 12:03:51 2022 +0100

    thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
    
    commit 5838a14832d447990827d85e90afe17e6fb9c175 upstream.
    
    Do not call get_trip_hyst() from thermal_genl_cmd_tz_get_trip() if
    the thermal zone does not define one.
    
    Fixes: 1ce50e7d408e ("thermal: core: genetlink support for events/cmd/sampling")
    Signed-off-by: Nicolas Cavallari <[email protected]>
    Cc: 5.10+ <[email protected]> # 5.10+
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tipc: fix a bit overflow in tipc_crypto_key_rcv() [+ + +]

Author: Hangyu Hua <[email protected]>
Date:   Fri Feb 11 12:55:10 2022 +0800

    tipc: fix a bit overflow in tipc_crypto_key_rcv()
    
    [ Upstream commit 143de8d97d79316590475dc2a84513c63c863ddf ]
    
    msg_data_sz() returns a 32-bit value, but size is 16-bit. This may
    lead to a bit overflow.
    
    Signed-off-by: Hangyu Hua <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
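
The truncation described in the tipc entry above is easy to see with concrete numbers. The sketch below models, in Python, what happens when a 32-bit unsigned value is stored in a 16-bit unsigned variable; store_u16 and fits_u16 are hypothetical helper names, and the example value is not taken from the TIPC code.

```python
def store_u16(value):
    """Model assigning an unsigned value to a 16-bit unsigned variable."""
    return value & 0xFFFF  # only the low 16 bits survive


def fits_u16(value):
    """Return True if the value can be stored in 16 bits without loss."""
    return 0 <= value <= 0xFFFF
```

A reported size of 0x10010 (65552) does not fit in 16 bits and would be stored as 0x10 (16), so any later length check made against the stored value is checking the wrong number.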

tools/resolve_btf_ids: Close ELF file on error [+ + +]

Author: Andrii Nakryiko <[email protected]>
Date:   Tue Nov 23 16:23:13 2021 -0800

    tools/resolve_btf_ids: Close ELF file on error
    
    [ Upstream commit 1144ab9bdf3430e1b5b3f22741e5283841951add ]
    
    Fix one case where we don't do explicit clean up.
    
    Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

tracing/histogram: Fix sorting on old "cpu" value [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Tue Mar 1 22:29:04 2022 -0500

    tracing/histogram: Fix sorting on old "cpu" value
    
    commit 1d1898f65616c4601208963c3376c1d828cbf2c7 upstream.
    
    When trying to add a histogram against an event with the "cpu" field, it
    was impossible due to "cpu" being a keyword to key off of the running CPU.
    So to fix this, it was changed to "common_cpu" to match the other generic
    fields (like "common_pid"). But since some scripts used "cpu" for keying
    off of the CPU (for events that did not have "cpu" as a field, which is
    most of them), a backward compatibility trick was added such that if "cpu"
    was used as a key, and the event did not have "cpu" as a field name, then
    it would fallback and switch over to "common_cpu".
    
    This fix has a couple of subtle bugs. One was that when switching over to
    "common_cpu", it did not change the field name, it just set a flag. But
    the code still found a "cpu" field. The "cpu" field is used for filtering
    and is returned when the event does not have a "cpu" field.
    
    This was found by:
    
      # cd /sys/kernel/tracing
      # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
      # cat events/sched/sched_wakeup/hist
    
    Which showed the histogram unsorted:
    
    { cpu:         19, pid:       1175 } hitcount:          1
    { cpu:          6, pid:        239 } hitcount:          2
    { cpu:         23, pid:       1186 } hitcount:         14
    { cpu:         12, pid:        249 } hitcount:          2
    { cpu:          3, pid:        994 } hitcount:          5
    
    Instead of hard coding the "cpu" checks, take advantage of the fact that
    trace_find_event_field() returns a special field for "cpu" and "CPU" if
    the event does not have "cpu" as a field. This special field has the
    "filter_type" of "FILTER_CPU". Check that to test if the returned field is
    of the CPU type instead of doing the string compare.
    
    Also, fix the sorting bug by testing for the hist_field flag of
    HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
    the special CPU field to know what compare routine to use, and since that
    special field does not have a size, it returns tracing_map_cmp_none.
    
    Cc: [email protected]
    Fixes: 1e3bac71c505 ("tracing/histogram: Rename "cpu" to "common_cpu"")
    Reported-by: Daniel Bristot de Oliveira <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tracing/probes: check the return value of kstrndup() for pbuf [+ + +]

Author: Xiaoke Wang <[email protected]>
Date:   Tue Dec 14 10:26:46 2021 +0800

    tracing/probes: check the return value of kstrndup() for pbuf
    
    [ Upstream commit 1c1857d400355e96f0fe8b32adc6fa7594d03b52 ]
    
    kstrndup() allocates memory and returns NULL when the allocation
    fails. Check its return value so that the memory error is caught
    in time.
    
    Link: https://lkml.kernel.org/r/[email protected]
    
    Acked-by: Masami Hiramatsu <[email protected]>
    Fixes: a42e3c4de964 ("tracing/probe: Add immediate string parameter support")
    Signed-off-by: Xiaoke Wang <[email protected]>
    Signed-off-by: Steven Rostedt <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tracing/uprobes: Check the return value of kstrdup() for tu->filename [+ + +]

Author: Xiaoke Wang <[email protected]>
Date:   Tue Dec 14 09:28:02 2021 +0800

    tracing/uprobes: Check the return value of kstrdup() for tu->filename
    
    [ Upstream commit 8c7224245557707c613f130431cafbaaa4889615 ]
    
    kstrdup() returns NULL when the memory allocation fails. Check its
    return value so that the memory error is caught in time.
    
    Link: https://lkml.kernel.org/r/[email protected]
    
    Acked-by: Masami Hiramatsu <[email protected]>
    Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU")
    Signed-off-by: Xiaoke Wang <[email protected]>
    Signed-off-by: Steven Rostedt <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tracing: Add test for user space strings when filtering on string pointers [+ + +]

Author: Steven Rostedt <[email protected]>
Date:   Mon Jan 10 11:55:32 2022 -0500

    tracing: Add test for user space strings when filtering on string pointers
    
    [ Upstream commit 77360f9bbc7e5e2ab7a2c8b4c0244fbbfcfc6f62 ]
    
    Pingfan reported that the following causes a fault:
    
      echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter
      echo 1 > events/syscalls/sys_enter_at/enable
    
    The reason is that trace event filter treats the user space pointer
    defined by "filename" as a normal pointer to compare against the "cpu"
    string. The following bug happened:
    
     kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60
     #PF: supervisor read access in kernel mode
     #PF: error_code(0x0001) - permissions violation
     PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867
     Oops: 0001 [#1] PREEMPT SMP PTI
     CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1
     Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
     RIP: 0010:strlen+0x0/0x20
     Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11
           48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8
           48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
     RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246
     RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000
     RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60
     RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000
     R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380
     R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0
     FS:  00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     PKRU: 55555554
     Call Trace:
      filter_pred_pchar+0x18/0x40
      filter_match_preds+0x31/0x70
      ftrace_syscall_enter+0x27a/0x2c0
      syscall_trace_enter.constprop.0+0x1aa/0x1d0
      do_syscall_64+0x16/0x90
      entry_SYSCALL_64_after_hwframe+0x44/0xae
     RIP: 0033:0x7fa861d88664
    
    The above happened because the kernel tried to access user space directly
    and triggered a "supervisor read access in kernel mode" fault. Worse yet,
    the memory might not even be loaded yet, and a SEGFAULT could happen as
    well. The same could be true when accessing kernel space.
    
    To be even more robust, test both kernel and user space strings. If the
    string fails to read, then simply have the filter fail.
    
    Note, TASK_SIZE is used to determine if the pointer is user or kernel space
    and the appropriate strncpy_from_kernel/user_nofault() function is used to
    copy the memory. For some architectures, the compare to TASK_SIZE may always
    pick user space or kernel space. If it gets it wrong, the only thing is that
    the filter will fail to match. In the future, this needs to be fixed to have
    the event denote which should be used. But failing a filter is much better
    than panicking the machine, and that can be solved later.
    
    Link: https://lore.kernel.org/all/[email protected]/
    Link: https://lkml.kernel.org/r/[email protected]
    
    Cc: [email protected]
    Cc: Ingo Molnar <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Tom Zanussi <[email protected]>
    Reported-by: Pingfan Liu <[email protected]>
    Tested-by: Pingfan Liu <[email protected]>
    Fixes: 87a342f5db69d ("tracing/filters: Support filtering for char * strings")
    Signed-off-by: Steven Rostedt <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tracing: Add ustring operation to filtering string pointers [+ + +]

Author: Steven Rostedt <[email protected]>
Date:   Thu Jan 13 20:08:40 2022 -0500

    tracing: Add ustring operation to filtering string pointers
    
    [ Upstream commit f37c3bbc635994eda203a6da4ba0f9d05165a8d6 ]
    
    Since referencing user space pointers is special, if the user wants to
    filter on a field that is a pointer to user space, then they need to
    specify it.
    
    Add a ".ustring" attribute to the field name for filters to state that the
    field is pointing to user space such that the kernel can take the
    appropriate action to read that pointer.
    
    Link: https://lore.kernel.org/all/[email protected]/
    
    Fixes: 77360f9bbc7e ("tracing: Add test for user space strings when filtering on string pointers")
    Tested-by: Sven Schnelle <[email protected]>
    Signed-off-by: Steven Rostedt <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tracing: Do not let synth_events block other dyn_event systems during create [+ + +]

Author: Beau Belgrave <[email protected]>
Date:   Thu Sep 30 15:38:21 2021 -0700

    tracing: Do not let synth_events block other dyn_event systems during create
    
    [ Upstream commit 4f67cca70c0f615e9cfe6ac42244f3416ec60877 ]
    
    synth_events is returning -EINVAL if the dyn_event create command does
    not contain ' \t'. This prevents other systems from getting called back.
    synth_events needs to return -ECANCELED in these cases when the command
    is not targeting the synth_event system.
    
    Link: https://lore.kernel.org/linux-trace-devel/[email protected]
    
    Fixes: c9e759b1e8456 ("tracing: Rework synthetic event command parsing")
    Reviewed-by: Masami Hiramatsu <[email protected]>
    Signed-off-by: Beau Belgrave <[email protected]>
    Signed-off-by: Steven Rostedt (VMware) <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
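
Why the error code matters in the synth_events entry above can be shown with a simplified model of a dispatcher that offers a command to several handlers in turn. This is a Python sketch under stated assumptions, not the kernel's dyn_event code: synth_create, other_create and dispatch are hypothetical, and the rule that only -ECANCELED passes the command on to the next handler is a simplified reading of the commit message.

```python
import errno


def synth_create(cmd, not_mine_error):
    """Hypothetical synth handler: a command needs a space or tab separator."""
    if " " not in cmd and "\t" not in cmd:
        return -not_mine_error  # the command is not for this system
    return 0


def other_create(cmd):
    """Hypothetical second dyn_event system that accepts any command."""
    return 0


def dispatch(cmd, handlers):
    """Offer cmd to each handler; only -ECANCELED passes it on to the next."""
    ret = -errno.ENODEV
    for handler in handlers:
        ret = handler(cmd)
        if ret != -errno.ECANCELED:
            break  # success or a hard error ends the search
    return ret
```

With -EINVAL the search stops at the first handler and the second system never sees the command; with -ECANCELED the command reaches it.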

tracing: Fix return value of __setup handlers [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Wed Mar 2 19:17:44 2022 -0800

    tracing: Fix return value of __setup handlers
    
    commit 1d02b444b8d1345ea4708db3bab4db89a7784b55 upstream.
    
    __setup() handlers should generally return 1 to indicate that the
    boot options have been handled.
    
    Using invalid option values causes the entire kernel boot option
    string to be reported as Unknown and added to init's environment
    strings, polluting it.
    
      Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
        kprobe_event=p,syscall_any,$arg1 trace_options=quiet
        trace_clock=jiffies", will be passed to user space.
    
     Run /sbin/init as init process
       with arguments:
         /sbin/init
       with environment:
         HOME=/
         TERM=linux
         BOOT_IMAGE=/boot/bzImage-517rc6
         kprobe_event=p,syscall_any,$arg1
         trace_options=quiet
         trace_clock=jiffies
    
    Return 1 from the __setup() handlers so that init's environment is not
    polluted with kernel boot options.
    
    Link: lore.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    
    Cc: [email protected]
    Fixes: 7bcfaf54f591 ("tracing: Add trace_options kernel command line parameter")
    Fixes: e1e232ca6b8f ("tracing: Add trace_clock=<clock> kernel parameter")
    Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Acked-by: Masami Hiramatsu <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
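
The effect of the handler return value in the __setup entry above can be modelled in a few lines. The sketch below is Python and deliberately simplified: parse_cmdline and the two handlers are hypothetical, and the rule that a parameter is passed to init unless its handler returns 1 is an assumption drawn from the commit message, not a copy of the kernel's boot parameter parsing.

```python
def parse_cmdline(params, handlers):
    """Return the parameters that would end up in init's environment."""
    passed_to_init = []
    for param in params:
        name, _, value = param.partition("=")
        handler = handlers.get(name)
        # A handler returning 1 marks the option as handled; anything else
        # (or no handler at all) leaves it unknown, so it is passed on.
        if handler is None or handler(value) != 1:
            passed_to_init.append(param)
    return passed_to_init


def handler_returning_0(value):
    """Models a handler that does not report the option as handled."""
    return 0


def handler_returning_1(value):
    """Models a handler that reports the option as handled."""
    return 1
```

In this model "trace_clock=jiffies" leaks into init's environment with the first handler and is consumed with the second, while a parameter with no handler, such as BOOT_IMAGE, is passed on either way.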

ucounts: Fix systemd LimitNPROC with private users regression [+ + +]

Author: Eric W. Biederman <[email protected]>
Date:   Thu Feb 24 08:32:28 2022 -0600

    ucounts: Fix systemd LimitNPROC with private users regression
    
    commit 0ac983f512033cb7b5e210c9589768ad25b1e36b upstream.
    
    Long story short, recursively enforcing RLIMIT_NPROC when it is not
    enforced on the process that creates a new user namespace causes
    currently working code to fail.  There is no reason to enforce
    RLIMIT_NPROC recursively when we don't enforce it normally, so update
    the code to detect this case.
    
    I would like to simply use capable(CAP_SYS_RESOURCE) to detect when
    RLIMIT_NPROC is not enforced upon the caller.  Unfortunately because
    RLIMIT_NPROC is charged and checked for enforcement based upon the
    real uid, using capable() which is euid based is inconsistent with reality.
    Come as close as possible to testing for capable(CAP_SYS_RESOURCE) by
    testing for when the real uid would match the conditions when
    CAP_SYS_RESOURCE would be present if the real uid was the effective
    uid.
    
    Reported-by: Etienne Dechamps <[email protected]>
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215596
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Cc: [email protected]
    Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
    Reviewed-by: Kees Cook <[email protected]>
    Signed-off-by: "Eric W. Biederman" <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: clear related members when goto fail [+ + +]

Author: Hangyu Hua <[email protected]>
Date:   Sat Jan 1 01:21:38 2022 +0800

    usb: gadget: clear related members when goto fail
    
    commit 501e38a5531efbd77d5c73c0ba838a889bfc1d74 upstream.
    
    dev->config, dev->hs_config and dev->dev need to be cleared if
    dev_config fails, to avoid a use-after-free (UAF).
    
    Acked-by: Alan Stern <[email protected]>
    Signed-off-by: Hangyu Hua <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: don't release an existing dev->buf [+ + +]

Author: Hangyu Hua <[email protected]>
Date:   Sat Jan 1 01:21:37 2022 +0800

    usb: gadget: don't release an existing dev->buf
    
    commit 89f3594d0de58e8a57d92d497dea9fee3d4b9cda upstream.
    
    dev->buf does not need to be released if it already exists before
    executing dev_config.
    
    Acked-by: Alan Stern <[email protected]>
    Signed-off-by: Hangyu Hua <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi() [+ + +]

Author: Vitaly Kuznetsov <[email protected]>
Date:   Thu Jan 6 10:46:11 2022 +0100

    x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
    
    [ Upstream commit 51500b71d500f251037ed339047a4d9e7d7e295b ]
    
    KASAN detected the following issue:
    
     BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_multi+0xf88/0x1060
     Read of size 4 at addr ffff8880011ccbc0 by task kcompactd0/33
    
     CPU: 1 PID: 33 Comm: kcompactd0 Not tainted 5.14.0-39.el9.x86_64+debug #1
     Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine,
         BIOS Hyper-V UEFI Release v4.0 12/17/2019
     Call Trace:
      dump_stack_lvl+0x57/0x7d
      print_address_description.constprop.0+0x1f/0x140
      ? hyperv_flush_tlb_multi+0xf88/0x1060
      __kasan_report.cold+0x7f/0x11e
      ? hyperv_flush_tlb_multi+0xf88/0x1060
      kasan_report+0x38/0x50
      hyperv_flush_tlb_multi+0xf88/0x1060
      flush_tlb_mm_range+0x1b1/0x200
      ptep_clear_flush+0x10e/0x150
    ...
     Allocated by task 0:
      kasan_save_stack+0x1b/0x40
      __kasan_kmalloc+0x7c/0x90
      hv_common_init+0xae/0x115
      hyperv_init+0x97/0x501
      apic_intr_mode_init+0xb3/0x1e0
      x86_late_time_init+0x92/0xa2
      start_kernel+0x338/0x3eb
      secondary_startup_64_no_verify+0xc2/0xcb
    
     The buggy address belongs to the object at ffff8880011cc800
      which belongs to the cache kmalloc-1k of size 1024
     The buggy address is located 960 bytes inside of
      1024-byte region [ffff8880011cc800, ffff8880011ccc00)
    
    'hyperv_flush_tlb_multi+0xf88/0x1060' points to
    hv_cpu_number_to_vp_number() and '960 bytes' means we're trying to get
    VP_INDEX for CPU#240. 'nr_cpus' here is exactly 240 so we're trying to
    access past hv_vp_index's last element. This can (and will) happen
    when 'cpus' mask is empty and cpumask_last() will return '>=nr_cpus'.
    
    Commit ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has
    been disabled") tried to deal with empty cpumask situation but
    apparently didn't fully fix the issue.
    
    'cpus' cpumask which is passed to hyperv_flush_tlb_multi() is
    'mm_cpumask(mm)' (which is '&mm->cpu_bitmap'). This mask changes every
    time the particular mm is scheduled/unscheduled on some CPU (see
    switch_mm_irqs_off()). Disabling IRQs on the CPU which is performing the
    remote TLB flush has zero influence on whether the particular process can
    get scheduled/unscheduled on _other_ CPUs, so e.g. the case where the mm
    was scheduled on one other CPU and got unscheduled during
    hyperv_flush_tlb_multi()'s execution will lead to the cpumask becoming
    empty.
    
    It doesn't seem that there's a good way to protect 'mm_cpumask(mm)'
    from changing during hyperv_flush_tlb_multi()'s execution. It would be
    possible to copy it in the very beginning of the function but this is a
    waste. It seems we can deal with changing cpumask just fine.
    
    When 'cpus' cpumask changes during hyperv_flush_tlb_multi()'s
    execution, there are two possible issues:
    - 'Under-flushing': we will not flush TLB on a CPU which got added to
    the mask while hyperv_flush_tlb_multi() was already running. This is
    not a problem as this is equal to mm getting scheduled on that CPU
    right after TLB flush.
    - 'Over-flushing': we may flush TLB on a CPU which is already cleared
    from the mask. First, extra TLB flush preserves correctness. Second,
    Hyper-V's TLB flush hypercall takes 'mm->pgd' argument so Hyper-V may
    avoid the flush if CR3 doesn't match.
    
    Fix the immediate issue with cpumask_last()/hv_cpu_number_to_vp_number()
    and remove the pointless cpumask_empty() check from the beginning of the
    function as it really doesn't protect anything. Also, avoid the hypercall
    altogether when 'flush->processor_mask' ends up being empty.
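    
    A minimal userspace sketch of the guard described above, assuming a
    "last set bit" helper that returns a value >= the CPU count for an
    empty mask, as cpumask_last() is described to do. All names here
    (NR_CPUS_MODEL, VP_INVAL_MODEL, vp_index_model, mask_last_model,
    last_vp_or_inval) are hypothetical and only model the idea; this is
    not the upstream patch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL  8u           /* hypothetical small CPU count */
#define VP_INVAL_MODEL 0xFFFFFFFFu  /* hypothetical "no valid VP" marker */

/* Hypothetical CPU number -> VP index table, one entry per CPU. */
static const uint32_t vp_index_model[NR_CPUS_MODEL] = {
    10, 11, 12, 13, 14, 15, 16, 17
};

/* Highest set bit of 'mask', or NR_CPUS_MODEL when the mask is empty. */
static unsigned mask_last_model(uint8_t mask)
{
    for (unsigned cpu = NR_CPUS_MODEL; cpu-- > 0; )
        if (mask & (1u << cpu))
            return cpu;
    return NR_CPUS_MODEL;
}

/* Look up the VP index of the last CPU, refusing to index past the end. */
static uint32_t last_vp_or_inval(uint8_t mask)
{
    unsigned cpu = mask_last_model(mask);

    if (cpu >= NR_CPUS_MODEL)       /* empty mask: nothing to look up */
        return VP_INVAL_MODEL;

    assert(cpu < NR_CPUS_MODEL);    /* invariant before the array access */
    return vp_index_model[cpu];
}
```

    A caller that gets VP_INVAL_MODEL back can skip the flush request
    entirely, which mirrors avoiding the hypercall when the resulting
    processor mask is empty.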
    
    Fixes: ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has been disabled")
    Signed-off-by: Vitaly Kuznetsov <[email protected]>
    Reviewed-by: Michael Kelley <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Wei Liu <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

xen/netfront: destroy queues before real_num_tx_queues is zeroed [+ + +]

Author: Marek Marczykowski-Górecki <[email protected]>
Date:   Wed Feb 23 22:19:54 2022 +0100

    xen/netfront: destroy queues before real_num_tx_queues is zeroed
    
    commit dcf4ff7a48e7598e6b10126cc02177abb8ae4f3f upstream.
    
    xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
    delete queues. Since d7dac083414eb5bb99a6d2ed53dc2c1b405224e5
    ("net-sysfs: update the queue counts in the unregistration path"),
    unregister_netdev() indirectly sets real_num_tx_queues to 0. Together,
    these two facts mean that xennet_destroy_queues() called from
    xennet_remove() cannot do its job, because it's called after
    unregister_netdev(). This results in kfree-ing queues that are still
    linked in napi, which ultimately crashes:
    
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP PTI
        CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W         5.16.10-1.32.fc32.qubes.x86_64+ #226
        RIP: 0010:free_netdev+0xa3/0x1a0
        Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
        RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
        RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
        RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
        R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
        FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
        Call Trace:
         <TASK>
         xennet_remove+0x13d/0x300 [xen_netfront]
         xenbus_dev_remove+0x6d/0xf0
         __device_release_driver+0x17a/0x240
         device_release_driver+0x24/0x30
         bus_remove_device+0xd8/0x140
         device_del+0x18b/0x410
         ? _raw_spin_unlock+0x16/0x30
         ? klist_iter_exit+0x14/0x20
         ? xenbus_dev_request_and_reply+0x80/0x80
         device_unregister+0x13/0x60
         xenbus_dev_changed+0x18e/0x1f0
         xenwatch_thread+0xc0/0x1a0
         ? do_wait_intr_irq+0xa0/0xa0
         kthread+0x16b/0x190
         ? set_kthread_struct+0x40/0x40
         ret_from_fork+0x22/0x30
         </TASK>
    
    Fix this by calling xennet_destroy_queues() from xennet_uninit(),
    when real_num_tx_queues is still available. This ensures that queues are
    destroyed before real_num_tx_queues is set to 0, regardless of how
    unregister_netdev() was called.
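    
    The ordering problem can be illustrated with a toy userspace model.
    It assumes only what the text above states: queue teardown iterates
    over real_num_tx_queues, and unregistration zeroes that count after
    the uninit callback has had a chance to run. The struct and function
    names are hypothetical, not the driver's.

```c
#include <assert.h>

/* Toy stand-in for the device state relevant to the bug. */
struct netdev_model {
    unsigned real_num_tx_queues;  /* count used to walk the queues */
    unsigned linked_queues;       /* queues still linked (e.g. in NAPI) */
};

/* Unlinks one queue per counted TX queue, like a destroy-queues helper. */
static void destroy_queues_model(struct netdev_model *d)
{
    for (unsigned i = 0; i < d->real_num_tx_queues; i++) {
        assert(d->linked_queues > 0);
        d->linked_queues--;
    }
}

/* Models unregistration: optional uninit hook, then the count is zeroed. */
static void unregister_model(struct netdev_model *d, int destroy_in_uninit)
{
    if (destroy_in_uninit)
        destroy_queues_model(d);  /* stands in for the uninit callback */
    d->real_num_tx_queues = 0;
}

/* Returns how many queues are still linked once removal has finished. */
static unsigned leftover_after_remove(unsigned nqueues, int destroy_in_uninit)
{
    struct netdev_model d = { nqueues, nqueues };

    unregister_model(&d, destroy_in_uninit);
    if (!destroy_in_uninit)
        destroy_queues_model(&d); /* old order: the count is already 0 */
    return d.linked_queues;
}
```

    In this model the old order leaves every queue linked, while
    destroying from the uninit hook leaves none.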
    
    Originally reported at
    https://github.com/QubesOS/qubes-issues/issues/7257
    
    Fixes: d7dac083414eb5bb9 ("net-sysfs: update the queue counts in the unregistration path")
    Cc: [email protected]
    Signed-off-by: Marek Marczykowski-Górecki <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfrm: enforce validity of offload input flags [+ + +]

Author: Leon Romanovsky <[email protected]>
Date:   Tue Feb 8 16:14:32 2022 +0200

    xfrm: enforce validity of offload input flags
    
    commit 7c76ecd9c99b6e9a771d813ab1aa7fa428b3ade1 upstream.
    
    struct xfrm_user_offload has a flags variable that receives user input,
    but the kernel didn't check whether valid bits were provided. As a
    result, unsanitized input was forwarded directly to the drivers.
    
    For example, the XFRM_OFFLOAD_IPV6 define was exposed and used by
    strongswan, but was not implemented in the kernel at all.
    
    As a solution, check and sanitize the input flags so that only
    XFRM_OFFLOAD_INBOUND is forwarded to the drivers.
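    
    A userspace sketch of this kind of check-and-sanitize step, under
    stated assumptions: the bit values, the *_MODEL names and
    sanitize_offload_flags() are hypothetical stand-ins, and rejecting
    unknown bits with -EINVAL is an assumption about what "check" means
    here rather than a quote of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values standing in for the exposed offload flags. */
#define OFFLOAD_IPV6_MODEL    0x1u  /* exposed, but never implemented */
#define OFFLOAD_INBOUND_MODEL 0x2u  /* the only flag drivers should see */
#define OFFLOAD_KNOWN_MODEL   (OFFLOAD_IPV6_MODEL | OFFLOAD_INBOUND_MODEL)

/*
 * Check: refuse any bit outside the known set.
 * Sanitize: pass only the inbound bit on to the driver.
 */
static int sanitize_offload_flags(uint8_t user_flags, uint8_t *driver_flags)
{
    assert(driver_flags != NULL);

    if (user_flags & ~OFFLOAD_KNOWN_MODEL)
        return -EINVAL;

    *driver_flags = user_flags & OFFLOAD_INBOUND_MODEL;
    return 0;
}
```

    Accepting the unimplemented IPv6 bit while silently dropping it keeps
    an existing user such as strongswan working, while unknown bits fail
    early instead of reaching a driver.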
    
    Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
    Signed-off-by: Leon Romanovsky <[email protected]>
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfrm: fix MTU regression [+ + +]

Author: Jiri Bohac <[email protected]>
Date:   Wed Jan 19 10:22:53 2022 +0100

    xfrm: fix MTU regression
    
    commit 6596a0229541270fb8d38d989f91b78838e5e9da upstream.
    
    Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6
    sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
    
    A Packet Too Big ICMPv6 message received in response to an ESP
    packet will prevent all further communication through the tunnel
    if the reported MTU minus the ESP overhead is smaller than 1280.
    
    E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead
    is 92 bytes. Receiving a PTB with MTU of 1371 or less will result
    in all further packets in the tunnel being dropped. A ping through the
    tunnel fails with "ping: sendmsg: Invalid argument".
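    
    The threshold follows from simple arithmetic, sketched here in
    userspace C. The 1280-byte IPv6 minimum and the 92-byte overhead come
    from the text above; the helper names and the *_MODEL constant are
    hypothetical.

```c
#include <assert.h>

#define IPV6_MIN_MTU_MODEL 1280  /* minimum IPv6 MTU, as stated above */

/* MTU left for the inner packet once the ESP overhead is subtracted. */
static int inner_mtu(int reported_mtu, int esp_overhead)
{
    assert(esp_overhead >= 0);
    return reported_mtu - esp_overhead;
}

/* Returns 1 when the inner MTU falls below the IPv6 minimum. */
static int below_ipv6_minimum(int reported_mtu, int esp_overhead)
{
    return inner_mtu(reported_mtu, esp_overhead) < IPV6_MIN_MTU_MODEL;
}
```

    With 92 bytes of overhead, a reported MTU of 1371 leaves 1279 bytes,
    one short of the minimum, while 1372 leaves exactly 1280.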
    
    Apparently the MTU on the xfrm route is smaller than 1280 and
    fails the check inside ip6_setup_cork() added by 749439bf.
    
    We found this by debugging USGv6/ipv6ready failures. Failing
    tests are: "Phase-2 Interoperability Test Scenario IPsec" /
    5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
    
    Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm:
    xfrm_state_mtu should return at least 1280 for ipv6") attempted
    to fix this but caused another regression in TCP MSS calculations
    and had to be reverted.
    
    The patch below fixes the situation by dropping the MTU
    check and instead checking for the underflows described in the
    749439bf commit message.
    
    Signed-off-by: Jiri Bohac <[email protected]>
    Fixes: 749439bfac6e ("ipv6: fix udpv6 sendmsg crash caused by too small MTU")
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfrm: fix the if_id check in changelink [+ + +]

Author: Antony Antony <[email protected]>
Date:   Tue Feb 1 07:51:57 2022 +0100

    xfrm: fix the if_id check in changelink
    
    commit 6d0d95a1c2b07270870e7be16575c513c29af3f1 upstream.
    
    if_id will always be 0, because it was not yet initialized.
    
    Fixes: 8dce43919566 ("xfrm: interface with if_id 0 should return error")
    Reported-by: Pavel Machek <[email protected]>
    Signed-off-by: Antony Antony <[email protected]>
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Changelog for Linux 5.15.27