Changelog

ALSA: intel_hdmi: Fix reference to PCM buffer address

Author: Zhen Ni <[email protected]>
Date:   Wed Mar 2 15:42:41 2022 +0800

    ALSA: intel_hdmi: Fix reference to PCM buffer address
    
    commit 0aa6b294b312d9710804679abd2c0c8ca52cc2bc upstream.
    
    PCM buffers might be allocated dynamically when the buffer
    preallocation failed or a larger buffer is requested, and it's not
    guaranteed that substream->dma_buffer points to the actually used
    buffer.  The driver needs to refer to substream->runtime->dma_addr
    instead for the buffer address.
    
    Signed-off-by: Zhen Ni <[email protected]>
    Cc: <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
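The difference between the two buffer references can be sketched in userspace C. The structures below are hypothetical, heavily simplified stand-ins for the ALSA substream and runtime objects. Only the field names `dma_buffer.addr` and `runtime->dma_addr` follow the commit text. The sketch shows why reading the preallocated buffer can return the wrong address once a different buffer has been allocated dynamically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the ALSA structures; the real
 * struct snd_pcm_substream and struct snd_pcm_runtime have many more fields. */
typedef uint64_t example_dma_addr_t;

struct fake_dma_buffer { example_dma_addr_t addr; };
struct fake_runtime    { example_dma_addr_t dma_addr; };

struct fake_substream {
    struct fake_dma_buffer dma_buffer; /* preallocated buffer, possibly unused */
    struct fake_runtime *runtime;      /* describes the buffer actually in use */
};

/* Buggy pattern: assumes the preallocated buffer is the one being used. */
static example_dma_addr_t buffer_addr_buggy(const struct fake_substream *s)
{
    return s->dma_buffer.addr;
}

/* Fixed pattern: always take the address of the active buffer from the runtime. */
static example_dma_addr_t buffer_addr_fixed(const struct fake_substream *s)
{
    return s->runtime->dma_addr;
}
```

In the test scenario, preallocation produced a buffer at 0x1000, but a larger buffer was later allocated at 0x2000. Only the runtime-based lookup sees the new address.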

arm64: dts: imx8mm: Fix VPU Hanging

Author: Adam Ford <[email protected]>
Date:   Tue Jan 25 11:11:25 2022 -0600

    arm64: dts: imx8mm: Fix VPU Hanging
    
    [ Upstream commit ef3075d6638d3d5353a97fcc7bb0338fc85675f5 ]
    
    The vpumix power domain has a reset assigned to it, however
    when used, it causes a system hang.  Testing has shown that
    it does not appear to be needed anywhere.
    
    Fixes: d39d4bb15310 ("arm64: dts: imx8mm: add GPC node")
    Signed-off-by: Adam Ford <[email protected]>
    Reviewed-by: Lucas Stach <[email protected]>
    Signed-off-by: Shawn Guo <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: juno: Remove GICv2m dma-range

Author: Robin Murphy <[email protected]>
Date:   Mon Jan 24 17:57:01 2022 +0000

    arm64: dts: juno: Remove GICv2m dma-range
    
    [ Upstream commit 31eeb6b09f4053f32a30ce9fbcdfca31f713028d ]
    
    Although it is painstakingly honest to describe all 3 PCI windows in
    "dma-ranges", it misses the subtle distinction that the window for
    the GICv2m range is normally programmed for Device memory attributes
    rather than Normal Cacheable like the DRAM windows. Since MMU-401 only
    offers stage 2 translation, this means that when the PCI SMMU is
    enabled, accesses through that IPA range unexpectedly lose coherency if
    mapped as cacheable at the SMMU, due to the attribute combining rules.
    Since an extra 256KB is neither here nor there when we still have 10GB
    worth of usable address space, rather than attempting to describe and
    cope with this detail let's just remove the offending range. If the SMMU
    is not used then it makes no difference anyway.
    
    Link: https://lore.kernel.org/r/856c3f7192c6c3ce545ba67462f2ce9c86ed6b0c.1643046936.git.robin.murphy@arm.com
    Fixes: 4ac4d146cb63 ("arm64: dts: juno: Describe PCI dma-ranges")
    Reported-by: Anders Roxell <[email protected]>
    Acked-by: Liviu Dudau <[email protected]>
    Signed-off-by: Robin Murphy <[email protected]>
    Signed-off-by: Sudeep Holla <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: rockchip: drop pclk_xpcs from gmac0 on rk3568

Author: Frank Wunderlich <[email protected]>
Date:   Sun Jan 23 14:35:10 2022 +0100

    arm64: dts: rockchip: drop pclk_xpcs from gmac0 on rk3568
    
    [ Upstream commit 85a8bccfa945680dc561f06b65ea01341d2033fc ]
    
    pclk_xpcs is not supported by the mainline driver and breaks dtbs_check.
    
    The following warnings occur, among many others:
    
    rk3568-evb1-v10.dt.yaml: ethernet@fe2a0000: clocks:
        [[15, 386], [15, 389], [15, 389], [15, 184], [15, 180], [15, 181],
        [15, 389], [15, 185], [15, 172]] is too long
            From schema: Documentation/devicetree/bindings/net/snps,dwmac.yaml
    rk3568-evb1-v10.dt.yaml: ethernet@fe2a0000: clock-names:
        ['stmmaceth', 'mac_clk_rx', 'mac_clk_tx', 'clk_mac_refout', 'aclk_mac',
        'pclk_mac', 'clk_mac_speed', 'ptp_ref', 'pclk_xpcs'] is too long
            From schema: Documentation/devicetree/bindings/net/snps,dwmac.yaml
    
    After removing it, the clock and other warnings are gone.
    
    pclk_xpcs on gmac is used to support QSGMII, but this requires a driver
    supporting it.
    Once xpcs support is introduced, the clock can be added to the
    documentation and both controllers.
    
    Fixes: b8d41e5053cd ("arm64: dts: rockchip: add gmac0 node to rk3568")
    Co-developed-by: Peter Geis <[email protected]>
    Signed-off-by: Peter Geis <[email protected]>
    Signed-off-by: Frank Wunderlich <[email protected]>
    Acked-by: Michael Riesch <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Heiko Stuebner <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: rockchip: fix Quartz64-A ddr regulator voltage

Author: Peter Geis <[email protected]>
Date:   Thu Jan 27 19:38:05 2022 -0500

    arm64: dts: rockchip: fix Quartz64-A ddr regulator voltage
    
    [ Upstream commit ad02776cf8d083e28b1ca4d93d8b1949668c27cc ]
    
    The Quartz64 Model A uses a voltage divider to ensure ddr voltage is
    within specification from the default regulator configuration.
    Adjusting this voltage is detrimental, and currently causes the ddr
    voltage to be about 0.8v.
    
    Remove the min and max voltage setpoints for the ddr regulator.
    
    Fixes: b33a22a1e7c4 ("arm64: dts: rockchip: add basic dts for Pine64 Quartz64-A")
    Signed-off-by: Peter Geis <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Heiko Stuebner <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output

Author: Brian Norris <[email protected]>
Date:   Fri Jan 14 15:02:07 2022 -0800

    arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
    
    commit b5fbaf7d779f5f02b7f75b080e7707222573be2a upstream.
    
    Commit b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
    switched the platform to SPDIF, but we didn't fix up the device tree.
    
    Drop the pinctrl settings, because the 'spdif_bus' pins are either:
     * unused (on kevin, bob), so the setting is ~harmless
     * used by a different function (on scarlet), which causes probe
       failures (!!)
    
    Fixes: b18c6c3c7768 ("ASoC: rockchip: cdn-dp sound output use spdif")
    Signed-off-by: Brian Norris <[email protected]>
    Reviewed-by: Chen-Yu Tsai <[email protected]>
    Link: https://lore.kernel.org/r/20220114150129.v2.1.I46f64b00508d9dff34abe1c3e8d2defdab4ea1e5@changeid
    Signed-off-by: Heiko Stuebner <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL

Author: Masami Hiramatsu <[email protected]>
Date:   Mon Jan 24 17:17:54 2022 +0900

    arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL
    
    [ Upstream commit 1e0924bd09916fab795fc2a21ec1d148f24299fd ]
    
    Mark start_backtrace() as notrace and NOKPROBE_SYMBOL
    because this function is called from ftrace and lockdep to
    get the caller address via return_address(). Since lockdep
    is used in kprobes, it should also be NOKPROBE_SYMBOL.
    
    Fixes: b07f3499661c ("arm64: stacktrace: Move start_backtrace() out of the header")
    Cc: <[email protected]> # 5.13.x
    Signed-off-by: Masami Hiramatsu <[email protected]>
    Reviewed-by: Mark Brown <[email protected]>
    Link: https://lore.kernel.org/r/164301227374.1433152.12808232644267107415.stgit@devnote2
    Signed-off-by: Catalin Marinas <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions

Author: Randy Dunlap <[email protected]>
Date:   Wed Feb 23 20:46:35 2022 +0100

    ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
    
    commit 7b83299e5b9385943a857d59e15cba270df20d7e upstream.
    
    early_param() handlers should return 0 on success.
    __setup() handlers should return 1 on success, i.e., the parameter
    has been handled. A return of 0 would cause the "option=value" string
    to be added to init's environment strings, polluting it.
    
    ../arch/arm/mm/mmu.c: In function 'test_early_cachepolicy':
    ../arch/arm/mm/mmu.c:215:1: error: no return statement in function returning non-void [-Werror=return-type]
    ../arch/arm/mm/mmu.c: In function 'test_noalign_setup':
    ../arch/arm/mm/mmu.c:221:1: error: no return statement in function returning non-void [-Werror=return-type]
    
    Fixes: b849a60e0903 ("ARM: make cr_alignment read-only #ifndef CONFIG_CPU_CP15")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Cc: Uwe Kleine-König <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Russell King (Oracle) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
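The two return conventions can be illustrated with a pair of handlers. This is a minimal userspace sketch under stated assumptions:

- The handler names and bodies are hypothetical.
- The kernel's early_param() and __setup() registration macros are omitted so that the code compiles on its own.
- Only the return values (0 for early_param(), 1 for __setup()) come from the commit message.

```c
#include <assert.h>

/* Hypothetical flags recording that each option was processed. */
static int cachepolicy_seen;
static int noalign_seen;

/* early_param()-style handler (hypothetical): 0 means success. */
static int example_early_cachepolicy(char *p)
{
    (void)p;              /* a real handler would parse the policy name */
    cachepolicy_seen = 1;
    return 0;
}

/* __setup()-style handler (hypothetical): 1 means "option handled", so the
 * "option=value" string is not appended to init's environment. */
static int example_noalign_setup(char *str)
{
    (void)str;
    noalign_seen = 1;
    return 1;
}
```

The asymmetry is easy to get wrong because both handlers have the same signature. Only the registering macro determines which return value means success.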

ARM: dts: switch timer config to common devkit8000 devicetree

Author: Anthoine Bourgeois <[email protected]>
Date:   Tue Jan 25 20:11:38 2022 +0100

    ARM: dts: switch timer config to common devkit8000 devicetree
    
    [ Upstream commit 64324ef337d0caa5798fa8fa3f6bbfbd3245868a ]
    
    This patch allows the lcd43 and lcd70 flavors to benefit from the timer
    evolution.
    
    Fixes: e428e250fde6 ("ARM: dts: Configure system timers for omap3")
    Signed-off-by: Anthoine Bourgeois <[email protected]>
    Signed-off-by: Tony Lindgren <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ARM: dts: Use 32KiHz oscillator on devkit8000

Author: Anthoine Bourgeois <[email protected]>
Date:   Tue Jan 25 20:11:39 2022 +0100

    ARM: dts: Use 32KiHz oscillator on devkit8000
    
    [ Upstream commit 8840f5460a23759403f1f2860429dcbcc2f04a65 ]
    
    The Devkit8000 board seems to have always used 32k_counter as its
    clocksource. Restore this behavior.
    
    If the clocksource is back to 32k_counter, timer12 is now the clockevent
    source (as before) and timer2 is no longer needed here.
    
    This commit fixes the same issue observed with commit 23885389dbbb
    ("ARM: dts: Fix timer regression for beagleboard revision c"), where sleep
    is blocked until keys are pressed on the serial console.
    
    Fixes: aba1ad05da08 ("clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support")
    Fixes: e428e250fde6 ("ARM: dts: Configure system timers for omap3")
    Signed-off-by: Anthoine Bourgeois <[email protected]>
    Signed-off-by: Tony Lindgren <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ARM: Fix kgdb breakpoint for Thumb2

Author: Russell King (Oracle) <[email protected]>
Date:   Wed Feb 16 15:37:38 2022 +0000

    ARM: Fix kgdb breakpoint for Thumb2
    
    commit d920eaa4c4559f59be7b4c2d26fa0a2e1aaa3da9 upstream.
    
    The kgdb code needs to register an undef hook for the Thumb UDF
    instruction that will fault in order to be functional on Thumb2
    platforms.
    
    Reported-by: Johannes Stezenbach <[email protected]>
    Tested-by: Johannes Stezenbach <[email protected]>
    Fixes: 5cbad0ebf45c ("kgdb: support for ARCH=arm")
    Signed-off-by: Russell King (Oracle) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ARM: tegra: Move panels to AUX bus

Author: Thierry Reding <[email protected]>
Date:   Mon Dec 20 11:32:39 2021 +0100

    ARM: tegra: Move panels to AUX bus
    
    [ Upstream commit 8d3b01e0d4bb54368d73d0984466d72c2eeeac74 ]
    
    Move the eDP panel on Venice 2 and Nyan boards into the corresponding
    AUX bus device tree node. This allows us to avoid a nasty circular
    dependency that would otherwise be created between the DPAUX and panel
    nodes via the DDC/I2C phandle.
    
    Fixes: eb481f9ac95c ("ARM: tegra: add Acer Chromebook 13 device tree")
    Fixes: 59fe02cb079f ("ARM: tegra: Add DTS for the nyan-blaze board")
    Fixes: 40e231c770a4 ("ARM: tegra: Enable eDP for Venice2")
    Signed-off-by: Thierry Reding <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: cs4265: Fix the duplicated control name

Author: Fabio Estevam <[email protected]>
Date:   Tue Feb 15 09:05:14 2022 -0300

    ASoC: cs4265: Fix the duplicated control name
    
    commit c5487b9cdea5c1ede38a7ec94db0fc59963c8e86 upstream.
    
    Currently, the following error messages are seen during boot:
    
    asoc-simple-card sound: control 2:0:0:SPDIF Switch:0 is already present
    cs4265 1-004f: ASoC: failed to add widget SPDIF dapm kcontrol SPDIF Switch: -16
    
    Quoting Mark Brown:
    
    "The driver is just plain buggy, it defines both a regular SPIDF Switch
    control and a SND_SOC_DAPM_SWITCH() called SPDIF both of which will
    create an identically named control, it can never have loaded without
    error.  One or both of those has to be renamed or they need to be
    merged into one thing."
    
    Fix the duplicated control name by combining the two SPDIF controls here,
    moving the register bits onto the DAPM widget and letting DAPM control them.
    
    Fixes: f853d6b3ba34 ("ASoC: cs4265: Add a S/PDIF enable switch")
    Signed-off-by: Fabio Estevam <[email protected]>
    Acked-by: Charles Keepax <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min

Author: Marek Vasut <[email protected]>
Date:   Tue Feb 15 14:06:45 2022 +0100

    ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
    
    commit 9bdd10d57a8807dba0003af0325191f3cec0f11c upstream.
    
    While the $val/$val2 values passed in from userspace are always >= 0
    integers, the limits of the control can be signed integers, and $min
    can be non-zero, including negative. To correctly validate $val/$val2
    against platform_max, add the $min offset to val first.
    
    Fixes: 817f7c9335ec0 ("ASoC: ops: Reject out of bounds values in snd_soc_put_volsw()")
    Signed-off-by: Marek Vasut <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
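The effect of shifting by $min can be checked with small numbers. This is a minimal sketch under stated assumptions:

- `val` is the zero-based value written by userspace.
- `min` is the control's (possibly negative) minimum.
- `platform_max` is expressed in the control's own, unshifted range, with 0 meaning "no limit".

The function names are illustrative, not the ASoC API.

```c
#include <assert.h>
#include <errno.h>

/* Check as it behaved before the fix (illustrative): compares the
 * zero-based userspace value directly against platform_max. */
static int check_volsw_unshifted(unsigned int val, int platform_max)
{
    if (platform_max && (int)val > platform_max)
        return -EINVAL;
    return 0;
}

/* Check with the +min shift: move the userspace value back into the
 * control's range before comparing it against platform_max. */
static int check_volsw(unsigned int val, int min, int platform_max)
{
    if (platform_max && ((int)val + min) > platform_max)
        return -EINVAL;
    return 0;
}
```

Take a control with min = -10 and platform_max = 5. The largest legal userspace value is 15, which maps to the control value 5. The unshifted check wrongly rejects it, while the shifted check accepts 15 and rejects 16.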

ASoC: rt5668: do not block workqueue if card is unbound

Author: Kai Vehmanen <[email protected]>
Date:   Mon Feb 7 17:29:59 2022 +0200

    ASoC: rt5668: do not block workqueue if card is unbound
    
    [ Upstream commit a6d78661dc903d90a327892bbc34268f3a5f4b9c ]
    
    The current rt5668_jack_detect_handler() assumes the component
    and card will always show up and implements an infinite usleep
    loop waiting for them to show up.
    
    This does not hold true if a codec interrupt (or other
    event) occurs when the card is unbound. The codec driver's
    remove or shutdown functions cannot cancel the workqueue due
    to the wait loop. As a result, code can either end up blocking
    the workqueue, or hit a kernel oops when the card is freed.
    
    Fix the issue by rescheduling the jack detect handler in
    case the card is not ready. If the card never shows up,
    the shutdown/remove/suspend calls can now cancel the detect
    task.
    
    Signed-off-by: Kai Vehmanen <[email protected]>
    Reviewed-by: Bard Liao <[email protected]>
    Reviewed-by: Ranjani Sridharan <[email protected]>
    Reviewed-by: Pierre-Louis Bossart <[email protected]>
    Reviewed-by: Péter Ujfalusi <[email protected]>
    Reviewed-by: Shuming Fan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
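The pattern shared by the rt5668, rt5682 and rt5682s fixes can be modelled in userspace C. This is a minimal sketch under stated assumptions:

- `struct fake_codec` and `schedule_again()` are hypothetical.
- In a driver, `schedule_again()` would correspond to re-queueing the delayed work, for example with `mod_delayed_work()`.

The handler checks readiness once and returns instead of sleeping in a loop. This is what lets remove, shutdown and suspend cancel the pending work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical codec state used only for this model. */
struct fake_codec {
    bool card_ready;  /* has the sound card been bound yet? */
    int reschedules;  /* how many times the handler re-queued itself */
    int detections;   /* how many times jack detection actually ran */
};

/* Stand-in for re-queueing the work item with a delay. */
static void schedule_again(struct fake_codec *c)
{
    c->reschedules++;
}

static void jack_detect_handler(struct fake_codec *c)
{
    if (!c->card_ready) {
        /* Do not spin waiting for the card: try again later and return,
         * so the work item can be cancelled in the meantime. */
        schedule_again(c);
        return;
    }
    c->detections++;  /* the real handler would read the jack state here */
}
```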

ASoC: rt5682: do not block workqueue if card is unbound

Author: Kai Vehmanen <[email protected]>
Date:   Mon Feb 7 17:30:00 2022 +0200

    ASoC: rt5682: do not block workqueue if card is unbound
    
    [ Upstream commit 4c33de0673ced9c7c37b3bbd9bfe0fda72340b2a ]
    
    The current rt5682_jack_detect_handler() assumes the component
    and card will always show up and implements an infinite usleep
    loop waiting for them to show up.
    
    This does not hold true if a codec interrupt (or other
    event) occurs when the card is unbound. The codec driver's
    remove or shutdown functions cannot cancel the workqueue due
    to the wait loop. As a result, code can either end up blocking
    the workqueue, or hit a kernel oops when the card is freed.
    
    Fix the issue by rescheduling the jack detect handler in
    case the card is not ready. If the card never shows up,
    the shutdown/remove/suspend calls can now cancel the detect
    task.
    
    Signed-off-by: Kai Vehmanen <[email protected]>
    Reviewed-by: Bard Liao <[email protected]>
    Reviewed-by: Ranjani Sridharan <[email protected]>
    Reviewed-by: Pierre-Louis Bossart <[email protected]>
    Reviewed-by: Péter Ujfalusi <[email protected]>
    Reviewed-by: Shuming Fan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ASoC: rt5682s: do not block workqueue if card is unbound

Author: Kai Vehmanen <[email protected]>
Date:   Mon Feb 7 17:29:58 2022 +0200

    ASoC: rt5682s: do not block workqueue if card is unbound
    
    [ Upstream commit d7b530fdc45e75a54914a194c4becd9672a4e24f ]
    
    The current rt5682s_jack_detect_handler() assumes the component
    and card will always show up and implements an infinite usleep
    loop waiting for them to show up.
    
    This does not hold true if a codec interrupt (or other
    event) occurs when the card is unbound. The codec driver's
    remove or shutdown functions cannot cancel the workqueue due
    to the wait loop. As a result, code can either end up blocking
    the workqueue, or hit a kernel oops when the card is freed.
    
    Fix the issue by rescheduling the jack detect handler in
    case the card is not ready. If the card never shows up,
    the shutdown/remove/suspend calls can now cancel the detect
    task.
    
    Signed-off-by: Kai Vehmanen <[email protected]>
    Reviewed-by: Bard Liao <[email protected]>
    Reviewed-by: Ranjani Sridharan <[email protected]>
    Reviewed-by: Pierre-Louis Bossart <[email protected]>
    Reviewed-by: Péter Ujfalusi <[email protected]>
    Reviewed-by: Shuming Fan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ata: pata_hpt37x: fix PCI clock detection

Author: Sergey Shtylyov <[email protected]>
Date:   Sat Feb 19 23:04:29 2022 +0300

    ata: pata_hpt37x: fix PCI clock detection
    
    [ Upstream commit 5f6b0f2d037c8864f20ff15311c695f65eb09db5 ]
    
    The f_CNT register (at the PCI config. address 0x78) is 16-bit, not
    8-bit! The bug was there from the very start... :-(
    
    Signed-off-by: Sergey Shtylyov <[email protected]>
    Fixes: 669a5db411d8 ("[libata] Add a bunch of PATA drivers.")
    Cc: [email protected]
    Signed-off-by: Damien Le Moal <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
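The consequence of reading a 16-bit register as 8 bits can be shown with a userspace model of PCI configuration space as a little-endian byte array.

- The 0x78 offset comes from the commit text.
- The helper functions are hypothetical stand-ins for `pci_read_config_byte()` and `pci_read_config_word()`.
- The register contents are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define F_CNT_OFFSET 0x78  /* f_CNT register offset, per the commit message */

/* Hypothetical 8-bit config read: returns only the low byte at 'off'. */
static uint8_t read_config_byte(const uint8_t *cfg, unsigned int off)
{
    return cfg[off];
}

/* Hypothetical 16-bit config read: PCI config space is little-endian. */
static uint16_t read_config_word(const uint8_t *cfg, unsigned int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}
```

If the counter holds 0x0134, the 8-bit read silently drops the high byte and reports 0x34. Any clock calculation based on that value would then be wrong.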

auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature

Author: Andy Shevchenko <[email protected]>
Date:   Wed Feb 23 17:47:16 2022 +0200

    auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature
    
    commit 4424c35ead667ba2e8de7ab8206da66453e6f728 upstream.
    
    It seems that lcd2s_redefine_char() has never been properly
    tested. The buffer is filled with the DEF_CUSTOM_CHAR command followed
    by the character number (from 0 to 7), but immediately after that
    these bytes are overwritten by the decoded hex stream.
    
    Fix the index to fill the buffer after the command and number.
    
    Fixes: 8c9108d014c5 ("auxdisplay: add a driver for lcd2s character display")
    Cc: Lars Poeschel <[email protected]>
    Signed-off-by: Andy Shevchenko <[email protected]>
    Reviewed-by: Geert Uytterhoeven <[email protected]>
    [fixed typo in commit message]
    Signed-off-by: Miguel Ojeda <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
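The corrected buffer layout can be sketched as follows. The layout is a command byte, then the character number, then the decoded glyph bytes. This is a minimal userspace sketch under stated assumptions:

- `EXAMPLE_CMD_DEF_CUSTOM_CHAR` is a hypothetical command value, not the driver's constant.
- The glyph is assumed to be 8 bytes, supplied as 16 hex digits.
- The function name is illustrative.

The key point is that decoding starts at index 2, so the first two bytes are not overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_CMD_DEF_CUSTOM_CHAR 0x92                 /* hypothetical value */
#define EXAMPLE_GLYPH_BYTES 8                            /* assumed glyph size */
#define EXAMPLE_BUF_LEN (EXAMPLE_GLYPH_BYTES + 2)

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Build [command, number, glyph bytes...]; returns bytes written or -1. */
static int build_redefine_char(uint8_t buf[EXAMPLE_BUF_LEN], unsigned int num,
                               const char *hex)
{
    size_t i = 2;  /* decoded bytes go after the command and the number */

    if (num > 7)
        return -1;
    buf[0] = EXAMPLE_CMD_DEF_CUSTOM_CHAR;
    buf[1] = (uint8_t)num;

    while (i < EXAMPLE_BUF_LEN && hex[0] != '\0' && hex[1] != '\0') {
        int hi = hex_nibble(hex[0]);
        int lo = hex_nibble(hex[1]);

        if (hi < 0 || lo < 0)
            return -1;
        buf[i++] = (uint8_t)((hi << 4) | lo);
        hex += 2;
    }
    return i == EXAMPLE_BUF_LEN ? (int)i : -1;  /* require a full glyph */
}
```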

auxdisplay: lcd2s: Fix memory leak in ->remove()

Author: Andy Shevchenko <[email protected]>
Date:   Wed Feb 23 17:47:17 2022 +0200

    auxdisplay: lcd2s: Fix memory leak in ->remove()
    
    commit 898c0a15425a5bcaa8d44bd436eae5afd2483796 upstream.
    
    Once allocated, the struct lcd2s_data is never freed.
    Fix the memory leak by switching to devm_kzalloc().
    
    Fixes: 8c9108d014c5 ("auxdisplay: add a driver for lcd2s character display")
    Cc: Lars Poeschel <[email protected]>
    Signed-off-by: Andy Shevchenko <[email protected]>
    Signed-off-by: Miguel Ojeda <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

auxdisplay: lcd2s: Use proper API to free the instance of charlcd object

Author: Andy Shevchenko <[email protected]>
Date:   Wed Feb 23 17:47:18 2022 +0200

    auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
    
    commit 9ed331f8a0fb674f4f06edf05a1687bf755af27b upstream.
    
    While it might work, the current approach is fragile in a few ways:
    - whenever members in the structure are shuffled, the pointer will be wrong
    - the resource freeing may include more than kfree() covers
    
    Fix this by using charlcd_free() call instead of kfree().
    
    Fixes: 8c9108d014c5 ("auxdisplay: add a driver for lcd2s character display")
    Cc: Lars Poeschel <[email protected]>
    Signed-off-by: Andy Shevchenko <[email protected]>
    Signed-off-by: Miguel Ojeda <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

batman-adv: Don't expect inter-netns unique iflink indices

Author: Sven Eckelmann <[email protected]>
Date:   Sun Feb 27 23:23:49 2022 +0100

    batman-adv: Don't expect inter-netns unique iflink indices
    
    commit 6c1f41afc1dbe59d9d3c8bb0d80b749c119aa334 upstream.
    
    The ifindex doesn't have to be unique for multiple network namespaces on
    the same machine.
    
      $ ip netns add test1
      $ ip -net test1 link add dummy1 type dummy
      $ ip netns add test2
      $ ip -net test2 link add dummy2 type dummy
    
      $ ip -net test1 link show dev dummy1
      6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff
      $ ip -net test2 link show dev dummy2
      6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff
    
    But the batman-adv code to walk through the various layers of virtual
    interfaces uses this assumption because dev_get_iflink handles it
    internally and doesn't return the actual netns of the iflink. And
    dev_get_iflink only documents the situation where ifindex == iflink for
    physical devices.
    
    But only checking for dev->netdev_ops->ndo_get_iflink is also not an option
    because ipoib_get_iflink implements it, and sometimes returns an
    iflink != ifindex and sometimes iflink == ifindex. The caller must
    therefore itself check both netns and iflink + ifindex for equality.
    Only when both are equal has a "physical" interface been detected,
    which should stop the traversal. On the other hand, vxcan_get_iflink can
    also return 0 when there is currently no valid peer. In this case, it
    is still necessary to stop.
    
    Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
    Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
    Reported-by: Sabrina Dubroca <[email protected]>
    Signed-off-by: Sven Eckelmann <[email protected]>
    Signed-off-by: Simon Wunderlich <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
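The stop condition described above can be written as a small predicate. This is a hypothetical, simplified model:

- `struct link_info` and the integer namespace IDs stand in for the real net_device fields and `struct net` pointers.
- The function name is illustrative.

It encodes the two rules from the commit message: stop when iflink is 0, and otherwise stop only when both the namespace and the index match.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of one layer of the virtual interface stack. */
struct link_info {
    int netns_id;         /* namespace of the device itself */
    int ifindex;          /* device's own index */
    int iflink;           /* result of dev_get_iflink(); 0 means no peer */
    int iflink_netns_id;  /* namespace that the iflink index refers to */
};

/* Returns true when traversal should stop at this device. */
static bool is_end_of_chain(const struct link_info *l)
{
    if (l->iflink == 0)  /* e.g. vxcan without a valid peer */
        return true;
    /* "Physical" only if both the namespace and the index match. */
    return l->netns_id == l->iflink_netns_id && l->ifindex == l->iflink;
}
```

This mirrors the `ip netns` example above. Both namespaces contain an index 6, so an index match alone could hide a device that actually lives in another namespace.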

batman-adv: Request iflink once in batadv-on-batadv check

Author: Sven Eckelmann <[email protected]>
Date:   Mon Feb 28 00:01:24 2022 +0100

    batman-adv: Request iflink once in batadv-on-batadv check
    
    commit 690bb6fb64f5dc7437317153902573ecad67593d upstream.
    
    There is no need to call dev_get_iflink multiple times for the same
    net_device in batadv_is_on_batman_iface. And since some of the
    .ndo_get_iflink callbacks are dynamic (for example via RCUs like in
    vxcan_get_iflink), it could easily happen that the returned values are not
    stable. The pre-checks before __dev_get_by_index are then of course bogus.
    
    Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
    Signed-off-by: Sven Eckelmann <[email protected]>
    Signed-off-by: Simon Wunderlich <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
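The hazard of calling a dynamic getter twice can be shown with a toy getter whose result changes on every call. This is a hypothetical model:

- `unstable_get_iflink()` only mimics an RCU-backed .ndo_get_iflink such as vxcan_get_iflink.
- Both lookup functions are illustrative.

```c
#include <assert.h>

/* Hypothetical getter whose result alternates 7, 0, 7, 0, ... */
static int calls;
static int unstable_get_iflink(void)
{
    return ++calls % 2 ? 7 : 0;
}

/* Buggy pattern: the pre-check and the use see different values. */
static int lookup_parent_twice(void)
{
    if (unstable_get_iflink() == 0)
        return -1;                 /* "no parent" */
    return unstable_get_iflink();  /* may already be 0 here */
}

/* Fixed pattern: read once, then base every decision on that snapshot. */
static int lookup_parent_once(void)
{
    int iflink = unstable_get_iflink();

    if (iflink == 0)
        return -1;
    return iflink;  /* would be passed on to __dev_get_by_index() */
}
```

In the test, the double-call version passes the "has a parent" check but still returns 0. The single-read version returns the value it actually checked.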

batman-adv: Request iflink once in batadv_get_real_netdevice

Author: Sven Eckelmann <[email protected]>
Date:   Mon Feb 28 00:01:24 2022 +0100

    batman-adv: Request iflink once in batadv_get_real_netdevice
    
    commit 6116ba09423f7d140f0460be6a1644dceaad00da upstream.
    
    There is no need to call dev_get_iflink multiple times for the same
    net_device in batadv_get_real_netdevice. And since some of the
    ndo_get_iflink callbacks are dynamic (for example via RCUs like in
    vxcan_get_iflink), it could easily happen that the returned values are not
    stable. The pre-checks before __dev_get_by_index are then of course bogus.
    
    Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
    Signed-off-by: Sven Eckelmann <[email protected]>
    Signed-off-by: Simon Wunderlich <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

binfmt_elf: Avoid total_mapping_size for ET_EXEC

Author: Kees Cook <[email protected]>
Date:   Mon Feb 28 10:59:12 2022 -0800

    binfmt_elf: Avoid total_mapping_size for ET_EXEC
    
    commit 439a8468242b313486e69b8cc3b45ddcfa898fbf upstream.
    
    Partially revert commit 5f501d555653 ("binfmt_elf: reintroduce using
    MAP_FIXED_NOREPLACE"), which applied the ET_DYN "total_mapping_size"
    logic also to ET_EXEC.
    
    At least ia64 has ET_EXEC PT_LOAD segments that are not virtual-address
    contiguous (but _are_ file-offset contiguous). This would result in a
    giant mapping attempting to cover the entire span, including the virtual
    address range hole, and well beyond the size of the ELF file itself,
    causing the kernel to refuse to load it. For example:
    
    $ readelf -lW /usr/bin/gcc
    ...
    Program Headers:
      Type Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   ...
    ...
      LOAD 0x000000 0x4000000000000000 0x4000000000000000 0x00b5a0 0x00b5a0 ...
      LOAD 0x00b5a0 0x600000000000b5a0 0x600000000000b5a0 0x0005ac 0x000710 ...
    ...
           ^^^^^^^^ ^^^^^^^^^^^^^^^^^^                    ^^^^^^^^ ^^^^^^^^
    
    File offset range     : 0x000000-0x00bb4c
                            0x00bb4c bytes
    
    Virtual address range : 0x4000000000000000-0x600000000000bcb0
                            0x200000000000bcb0 bytes
    
    Remove the total_mapping_size logic for ET_EXEC, which reduces the
    ET_EXEC MAP_FIXED_NOREPLACE coverage to only the first PT_LOAD (better
    than nothing), and retains it for ET_DYN.
    
    Ironically, this is the reverse of the problem that originally caused
    problems with MAP_FIXED_NOREPLACE: overlapping PT_LOAD segments. Future
    work could restore full coverage if load_elf_binary() were to perform
    mappings in a separate phase from the loading (where it could resolve
    both overlaps and holes).
    
    Cc: Eric Biederman <[email protected]>
    Cc: Alexander Viro <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Reported-by: matoro <[email protected]>
    Fixes: 5f501d555653 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE")
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: matoro <[email protected]>
    Link: https://lore.kernel.org/lkml/[email protected]
    Tested-By: John Paul Adrian Glaubitz <[email protected]>
    Link: https://lore.kernel.org/lkml/[email protected]
    Cc: [email protected]
    Signed-off-by: Kees Cook <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
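The two ranges quoted from `readelf` can be reproduced arithmetically. This is a simplified sketch:

- `struct load_seg` borrows the ELF program header field names but is not `Elf64_Phdr`.
- Page alignment, which the kernel's total_mapping_size logic applies to the first segment, is ignored.

The sketch computes the span from the first segment's start to the end of the last one, in file offsets and in virtual addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified PT_LOAD description (not Elf64_Phdr). */
struct load_seg {
    uint64_t offset;  /* p_offset */
    uint64_t vaddr;   /* p_vaddr */
    uint64_t filesz;  /* p_filesz */
    uint64_t memsz;   /* p_memsz */
};

/* Bytes of the file covered from the first to the last segment. */
static uint64_t file_span(const struct load_seg *s, int n)
{
    return s[n - 1].offset + s[n - 1].filesz - s[0].offset;
}

/* Virtual address span a single mapping would need to cover. */
static uint64_t vaddr_span(const struct load_seg *s, int n)
{
    return s[n - 1].vaddr + s[n - 1].memsz - s[0].vaddr;
}
```

With the two ia64 segments shown above, the file span is 0xbb4c bytes, but the virtual span is 0x200000000000bcb0 bytes. That gap is why a single mapping covering both segments cannot work for ET_EXEC.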

blktrace: fix use after free for struct blk_trace

Author: Yu Kuai <[email protected]>
Date:   Mon Feb 28 11:43:54 2022 +0800

    blktrace: fix use after free for struct blk_trace
    
    commit 30939293262eb433c960c4532a0d59c4073b2b84 upstream.
    
    When tracing the whole disk, 'dropped' and 'msg' will be created
    under 'q->debugfs_dir' and 'bt->dir' is NULL, thus blk_trace_free()
    won't remove those files. What's worse, the following UAF can be
    triggered because of accessing stale 'dropped' and 'msg':
    
    ==================================================================
    BUG: KASAN: use-after-free in blk_dropped_read+0x89/0x100
    Read of size 4 at addr ffff88816912f3d8 by task blktrace/1188
    
    CPU: 27 PID: 1188 Comm: blktrace Not tainted 5.17.0-rc4-next-20220217+ #469
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4
    Call Trace:
     <TASK>
     dump_stack_lvl+0x34/0x44
     print_address_description.constprop.0.cold+0xab/0x381
     ? blk_dropped_read+0x89/0x100
     ? blk_dropped_read+0x89/0x100
     kasan_report.cold+0x83/0xdf
     ? blk_dropped_read+0x89/0x100
     kasan_check_range+0x140/0x1b0
     blk_dropped_read+0x89/0x100
     ? blk_create_buf_file_callback+0x20/0x20
     ? kmem_cache_free+0xa1/0x500
     ? do_sys_openat2+0x258/0x460
     full_proxy_read+0x8f/0xc0
     vfs_read+0xc6/0x260
     ksys_read+0xb9/0x150
     ? vfs_write+0x3d0/0x3d0
     ? fpregs_assert_state_consistent+0x55/0x60
     ? exit_to_user_mode_prepare+0x39/0x1e0
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7fbc080d92fd
    Code: ce 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 1
    RSP: 002b:00007fbb95ff9cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
    RAX: ffffffffffffffda RBX: 00007fbb95ff9dc0 RCX: 00007fbc080d92fd
    RDX: 0000000000000100 RSI: 00007fbb95ff9cc0 RDI: 0000000000000045
    RBP: 0000000000000045 R08: 0000000000406299 R09: 00000000fffffffd
    R10: 000000000153afa0 R11: 0000000000000293 R12: 00007fbb780008c0
    R13: 00007fbb78000938 R14: 0000000000608b30 R15: 00007fbb780029c8
     </TASK>
    
    Allocated by task 1050:
     kasan_save_stack+0x1e/0x40
     __kasan_kmalloc+0x81/0xa0
     do_blk_trace_setup+0xcb/0x410
     __blk_trace_setup+0xac/0x130
     blk_trace_ioctl+0xe9/0x1c0
     blkdev_ioctl+0xf1/0x390
     __x64_sys_ioctl+0xa5/0xe0
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Freed by task 1050:
     kasan_save_stack+0x1e/0x40
     kasan_set_track+0x21/0x30
     kasan_set_free_info+0x20/0x30
     __kasan_slab_free+0x103/0x180
     kfree+0x9a/0x4c0
     __blk_trace_remove+0x53/0x70
     blk_trace_ioctl+0x199/0x1c0
     blkdev_common_ioctl+0x5e9/0xb30
     blkdev_ioctl+0x1a5/0x390
     __x64_sys_ioctl+0xa5/0xe0
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    The buggy address belongs to the object at ffff88816912f380
     which belongs to the cache kmalloc-96 of size 96
    The buggy address is located 88 bytes inside of
     96-byte region [ffff88816912f380, ffff88816912f3e0)
    The buggy address belongs to the page:
    page:000000009a1b4e7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0f
    flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
    raw: 0017ffffc0000200 ffffea00044f1100 dead000000000002 ffff88810004c780
    raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    
    Memory state around the buggy address:
     ffff88816912f280: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
     ffff88816912f300: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
    >ffff88816912f380: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                                                        ^
     ffff88816912f400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
     ffff88816912f480: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
    ==================================================================
    
    Fixes: c0ea57608b69 ("blktrace: remove debugfs file dentries from struct blk_trace")
    Signed-off-by: Yu Kuai <[email protected]>
    Reviewed-by: Greg Kroah-Hartman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern [+ + +]

Author: Haimin Zhang <[email protected]>
Date:   Wed Feb 16 16:40:38 2022 +0800

    block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
    
    [ Upstream commit cc8f7fe1f5eab010191aa4570f27641876fa1267 ]
    
    Add __GFP_ZERO flag for alloc_page in function bio_copy_kern to initialize
    the buffer of a bio.
    
    Signed-off-by: Haimin Zhang <[email protected]>
    Reviewed-by: Chaitanya Kulkarni <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
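
As a userspace illustration only, the effect of zero-initializing the bio buffer can be modeled as follows. calloc() stands in for alloc_page() with __GFP_ZERO, and the hypothetical partial_fill() stands in for a transfer that writes fewer bytes than the buffer holds. Bytes that are never written read back as zero rather than as whatever the allocator left there.

```c
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096

/* Hypothetical stand-in for a transfer that fills only the first n bytes. */
static void partial_fill(unsigned char *buf, size_t n)
{
    memset(buf, 0xab, n);
}

/*
 * Allocate a zeroed buffer (the calloc() analogue of __GFP_ZERO) and fill
 * only part of it. Returns NULL on allocation failure.
 */
unsigned char *alloc_and_fill(size_t filled)
{
    unsigned char *buf = calloc(1, BUF_SIZE);

    if (!buf)
        return NULL;
    if (filled > BUF_SIZE)
        filled = BUF_SIZE;
    partial_fill(buf, filled);
    return buf;
}
```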

block: loop:use kstatfs.f_bsize of backing file to set discard granularity [+ + +]

Author: Ming Lei <[email protected]>
Date:   Wed Jan 26 11:58:30 2022 +0800

    block: loop:use kstatfs.f_bsize of backing file to set discard granularity
    
    [ Upstream commit 06582bc86d7f48d35cd044098ca1e246e8c7c52e ]
    
    If the backing file's filesystem implements ->fallocate(), we assume the
    loop device can support discard and pass sb->s_blocksize as
    discard_granularity. However, some underlying filesystems, such as
    overlayfs, don't set sb->s_blocksize, which causes discard_granularity to
    be set to zero and triggers the warning in __blkdev_issue_discard().
    
    Christoph suggested passing kstatfs.f_bsize as the discard granularity.
    This works because kstatfs.f_bsize means 'Optimal transfer block size',
    which still matches the definition of discard granularity.
    
    So fix the issue by setting discard_granularity to kstatfs.f_bsize if it
    is available; otherwise, report that discard isn't supported.
    
    Cc: Christoph Hellwig <[email protected]>
    Cc: Vivek Goyal <[email protected]>
    Reported-by: Pei Zhang <[email protected]>
    Signed-off-by: Ming Lei <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
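
The rule in the loop commit above can be sketched in userspace. This is an analogue, not the loop driver code. The kernel side works with struct kstatfs, while the sketch uses the Linux fstatfs(2) call and its struct statfs. The function names are hypothetical, and a return value of 0 stands for "discard not supported".

```c
#include <fcntl.h>     /* open() */
#include <unistd.h>    /* close() */
#include <sys/vfs.h>   /* fstatfs(), struct statfs (Linux-specific) */

/*
 * Use the filesystem's "optimal transfer block size" (f_bsize) as the
 * discard granularity; return 0 (no discard) when it is unavailable.
 */
unsigned long discard_granularity_for_fd(int fd)
{
    struct statfs sfs;

    if (fstatfs(fd, &sfs) != 0)
        return 0;            /* statfs failed: treat discard as unsupported */
    if (sfs.f_bsize <= 0)
        return 0;            /* no usable block size: no discard */
    return (unsigned long)sfs.f_bsize;
}

/* Convenience wrapper: look up the granularity for a path. */
unsigned long discard_granularity_for_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    unsigned long g;

    if (fd < 0)
        return 0;
    g = discard_granularity_for_fd(fd);
    close(fd);
    return g;
}
```

Falling back to 0 mirrors the commit's choice of reporting no discard support instead of passing a zero granularity down to __blkdev_issue_discard().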

Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks [+ + +]

Author: Luiz Augusto von Dentz <[email protected]>
Date:   Mon Feb 14 17:59:38 2022 -0800

    Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
    
    [ Upstream commit 29fb608396d6a62c1b85acc421ad7a4399085b9f ]
    
    Since bt_skb_sendmmsg can be used with the likes of SOCK_STREAM, it
    should return the partial chunks it could allocate instead of freeing
    everything; otherwise it can cause problems such as the one reported in
    the bug linked below.
    
    Fixes: 81be03e026dc ("Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg")
    Reported-by: Paul Menzel <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215594
    Signed-off-by: Luiz Augusto von Dentz <[email protected]>
    Tested-by: Paul Menzel <[email protected]> (Nokia N9 (MeeGo/Harmattan))
    Signed-off-by: Marcel Holtmann <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
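
The partial-chunk behavior described in the Bluetooth commit above can be modeled in userspace. This is not the Bluetooth code: struct chunk stands in for an skb in a fragment chain, and the hypothetical alloc_chunk() fails once a budget runs out, which simulates memory pressure part-way through. The builder keeps whatever it already allocated and returns NULL only when not even the first chunk could be built.

```c
#include <stdlib.h>

/* Hypothetical chunk type standing in for an skb in a fragment chain. */
struct chunk {
    size_t len;
    struct chunk *next;
};

/* Hypothetical allocator that fails once `budget` allocations are used. */
static struct chunk *alloc_chunk(size_t len, int *budget)
{
    struct chunk *c;

    if (*budget <= 0)
        return NULL;
    c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    (*budget)--;
    c->len = len;
    return c;
}

/*
 * Split `len` bytes into chunks of at most `mtu` bytes. On allocation
 * failure, keep the partial chain instead of freeing everything.
 */
struct chunk *build_chunks(size_t len, size_t mtu, int *budget)
{
    struct chunk *head = NULL, **tail = &head;

    while (len > 0) {
        size_t n = len < mtu ? len : mtu;
        struct chunk *c = alloc_chunk(n, budget);

        if (!c)
            break;              /* keep what we already have */
        *tail = c;
        tail = &c->next;
        len -= n;
    }
    return head;
}

/* Total payload carried by a chain. */
size_t chain_bytes(const struct chunk *c)
{
    size_t total = 0;

    for (; c; c = c->next)
        total += c->len;
    return total;
}

void free_chain(struct chunk *c)
{
    while (c) {
        struct chunk *next = c->next;

        free(c);
        c = next;
    }
}
```

For a stream socket, sending 600 of 1000 bytes and letting the caller retry the rest is usable progress, whereas discarding the 600 bytes is not.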

bpf, sockmap: Do not ignore orig_len parameter [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Wed Mar 2 08:17:22 2022 -0800

    bpf, sockmap: Do not ignore orig_len parameter
    
    commit 60ce37b03917e593d8e5d8bcc7ec820773daf81d upstream.
    
    Currently, sk_psock_verdict_recv() returns skb->len.
    
    This is problematic because tcp_read_sock() might have
    passed orig_len < skb->len, due to the presence of TCP urgent data.
    
    This causes an infinite loop from tcp_read_sock().
    
    A follow-up patch will make tcp_read_sock() more robust against bad actors.
    
    Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Acked-by: John Fastabend <[email protected]>
    Acked-by: Jakub Sitnicki <[email protected]>
    Tested-by: Jakub Sitnicki <[email protected]>
    Acked-by: Daniel Borkmann <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
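
A simplified model of why the actor's return value matters is shown below. It is a sketch under a stated assumption: the caller only advances its offset when the actor reports at most the length it offered, and otherwise it retries the same position. All names are hypothetical. The bounded iteration count stands in for the infinite loop.

```c
#include <stddef.h>

/* Hypothetical actor: `offered` plays the role of orig_len. */
typedef size_t (*recv_actor_t)(size_t skb_len, size_t offered);

/* Buggy actor: reports the whole buffer, ignoring what it was offered. */
size_t actor_returns_skb_len(size_t skb_len, size_t offered)
{
    (void)offered;
    return skb_len;
}

/* Fixed actor: reports exactly what it was offered. */
size_t actor_returns_orig_len(size_t skb_len, size_t offered)
{
    (void)skb_len;
    return offered;
}

/*
 * Read-loop model over one buffer of `skb_len` bytes. Urgent data at
 * `urg_offset` (0 = none) caps what is offered per call. Returns the
 * bytes consumed, or -1 if the loop made no progress within `max_iters`.
 */
long read_loop(size_t skb_len, size_t urg_offset, recv_actor_t actor,
               int max_iters)
{
    size_t offset = 0;

    for (int i = 0; i < max_iters && offset < skb_len; i++) {
        size_t len = skb_len - offset;
        size_t used;

        if (urg_offset > offset && urg_offset - offset < len)
            len = urg_offset - offset;  /* stop before the urgent byte */
        used = actor(skb_len, len);
        if (used <= len)
            offset += used;             /* advance only on a sane return */
    }
    return offset < skb_len ? -1 : (long)offset;
}
```

Without urgent data, both actors behave the same because the offered length equals skb_len. The buggy actor only stalls when the offered length is smaller than skb_len.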

btrfs: add missing run of delayed items after unlink during log replay [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Mon Feb 28 16:29:28 2022 +0000

    btrfs: add missing run of delayed items after unlink during log replay
    
    commit 4751dc99627e4d1465c5bfa8cb7ab31ed418eff5 upstream.
    
    During log replay, whenever we need to check if a name (dentry) exists in
    a directory we do searches on the subvolume tree for inode references or
    directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY
    keys as well, before kernel 5.17). However, when we unlink a name during
    log replay, through btrfs_unlink_inode(), we may not delete inode
    references and dir index keys from a subvolume tree and instead just add
    the deletions to the delayed inode's delayed items, which will only be
    run when we commit the transaction used for log replay. This means that
    after an unlink operation during log replay, if we attempt to search for
    the same name during log replay, we will not see that the name was already
    deleted, since the deletion is recorded only on the delayed items.
    
    We run delayed items after every unlink operation during log replay,
    except at unlink_old_inode_refs() and at add_inode_ref(). This was due
    to an oversight, as delayed items should be run after every unlink, for
    the reasons stated above.
    
    So fix those two cases.
    
    Fixes: 0d836392cadd5 ("Btrfs: fix mount failure after fsync due to hard link recreation")
    Fixes: 1f250e929a9c9 ("Btrfs: fix log replay failure after unlink and link combination")
    CC: [email protected] # 4.19+
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
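
The stale-lookup problem described in the log replay commit above can be illustrated with a toy model. This is not btrfs code. The hypothetical struct toy_dir keeps names in a "tree" plus a queue of deletions that have been requested but not yet applied. Lookups only consult the tree, so a name stays visible after an unlink until the delayed deletions are run.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NAMES 16
#define NAME_LEN 32

/* Hypothetical directory: stored names plus pending (delayed) deletions. */
struct toy_dir {
    char tree[MAX_NAMES][NAME_LEN];
    int ntree;
    char pending_del[MAX_NAMES][NAME_LEN];
    int npending;
};

/* Lookups only consult the tree, like searching the subvolume tree. */
bool toy_lookup(const struct toy_dir *d, const char *name)
{
    for (int i = 0; i < d->ntree; i++)
        if (strcmp(d->tree[i], name) == 0)
            return true;
    return false;
}

/* Unlink only records the deletion; the tree is not touched yet. */
void toy_unlink(struct toy_dir *d, const char *name)
{
    if (d->npending < MAX_NAMES) {
        strncpy(d->pending_del[d->npending], name, NAME_LEN - 1);
        d->pending_del[d->npending][NAME_LEN - 1] = '\0';
        d->npending++;
    }
}

/* Apply all pending deletions to the tree ("run delayed items"). */
void toy_run_delayed(struct toy_dir *d)
{
    for (int p = 0; p < d->npending; p++) {
        for (int i = 0; i < d->ntree; i++) {
            if (strcmp(d->tree[i], d->pending_del[p]) == 0) {
                /* Move the last entry into the hole. */
                memmove(d->tree[i], d->tree[d->ntree - 1], NAME_LEN);
                d->ntree--;
                break;
            }
        }
    }
    d->npending = 0;
}
```

Calling toy_run_delayed() right after each toy_unlink() corresponds to the fix: any later lookup sees the deletion.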

btrfs: defrag: bring back the old file extent search behavior [+ + +]

Author: Qu Wenruo <[email protected]>
Date:   Fri Feb 11 14:46:12 2022 +0800

    btrfs: defrag: bring back the old file extent search behavior
    
    commit d5633b0dee02d7d25e93463a03709f11c71500e2 upstream.
    
    For defrag, we don't really want to use btrfs_get_extent() to iterate
    all extent maps of an inode.
    
    The reasons are:
    
    - btrfs_get_extent() can merge extent maps
      The resulting em has the higher generation of the two, causing defrag
      to mark unnecessary parts of such a merged large extent map.
    
      This can in fact result in extra IO for autodefrag in v5.16+ kernels.
    
      However, this patch is not going to completely solve the problem, as
      one can still use read() to trigger extent map reading and get them
      merged.
    
      The complete solution for the extent map merging generation problem
      will come as a standalone fix.
    
    - btrfs_get_extent() caches the extent map result
      Normally it's fine, but for defrag the target range may not get
      another read/write for a long time.
      Such a cache would only increase memory usage.
    
    - btrfs_get_extent() doesn't skip older extent maps
      Unlike the old find_new_extent(), which uses btrfs_search_forward() to
      skip older subtrees, it will pick up unnecessary extent maps.
    
    This patch will fix the regression by introducing defrag_get_extent() to
    replace the btrfs_get_extent() call.
    
    This helper will:
    
    - Not cache the file extent we found
      It will search the file extent and manually convert it to em.
    
    - Use btrfs_search_forward() to skip entire ranges which were modified
      in the past
    
    This should reduce the IO for autodefrag.
    
    Reported-by: Filipe Manana <[email protected]>
    Fixes: 7b508037d4ca ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: defrag: don't use merged extent map for their generation check [+ + +]

Author: Qu Wenruo <[email protected]>
Date:   Fri Feb 11 14:46:13 2022 +0800

    btrfs: defrag: don't use merged extent map for their generation check
    
    commit 199257a78bb01341c3ba6e85bdcf3a2e6e452c6d upstream.
    
    For extent maps, if they are not compressed extents and are adjacent by
    logical addresses and file offsets, they can be merged into one larger
    extent map.
    
    Such a merged extent map will have the higher generation of all the
    original ones.
    
    But this brings a problem for autodefrag, as it relies on accurate
    extent_map::generation to determine if one extent should be defragged.
    
    For merged extent maps, their higher generation can mark some older
    extents to be defragged while the original extent map doesn't meet the
    minimal generation threshold.
    
    Thus this will cause extra IO.
    
    To solve the problem, we introduce a new flag, EXTENT_FLAG_MERGED,
    to indicate if the extent map is merged from one or more ems.
    
    And for autodefrag, if we find a merged extent map, and its generation
    meets the generation requirement, we just don't use this one, and go
    back to defrag_get_extent() to read extent maps from subvolume trees.
    
    This could cause more read IO, but should result in less defrag data
    being written, so in the long run it should be a win for autodefrag.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
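
The generation problem with merged extent maps can be shown with a simplified model. Only the flag idea (EXTENT_FLAG_MERGED) comes from the commit above. struct toy_em and the helper functions are hypothetical. Merging an old extent (generation 5) with a young one (generation 10) yields a map that looks new as a whole, even though only part of it passes a threshold of 8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified extent map with only the fields the check needs. */
struct toy_em {
    uint64_t start;
    uint64_t len;
    uint64_t generation;
    bool merged;            /* models EXTENT_FLAG_MERGED */
};

/* Merging adjacent maps keeps the higher generation of the two. */
struct toy_em toy_merge(struct toy_em a, struct toy_em b)
{
    struct toy_em m = {
        .start = a.start,
        .len = a.len + b.len,
        .generation = a.generation > b.generation ? a.generation
                                                  : b.generation,
        .merged = true,
    };
    return m;
}

/* Generation check used to pick defrag candidates. */
bool toy_should_defrag(const struct toy_em *em, uint64_t min_gen)
{
    return em->generation >= min_gen;
}

/*
 * A merged map that passes the threshold may do so only because one of
 * its parts is new, so the original extents must be re-read (the commit
 * goes back to defrag_get_extent() in this case).
 */
bool toy_needs_reread(const struct toy_em *em, uint64_t min_gen)
{
    return em->merged && em->generation >= min_gen;
}
```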

btrfs: do not start relocation until in progress drops are done [+ + +]

Author: Josef Bacik <[email protected]>
Date:   Fri Feb 18 14:56:10 2022 -0500

    btrfs: do not start relocation until in progress drops are done
    
    commit b4be6aefa73c9a6899ef3ba9c5faaa8a66e333ef upstream.
    
    We hit a bug with a recovering relocation on mount for one of our file
    systems in production.  I reproduced this locally by injecting errors
    into snapshot delete with balance running at the same time.  This
    presented as an error while looking up an extent item:
    
      WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
      CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
      RIP: 0010:lookup_inline_extent_backref+0x647/0x680
      RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
      RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
      R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
      R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
      FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
      Call Trace:
       <TASK>
       insert_inline_extent_backref+0x46/0xd0
       __btrfs_inc_extent_ref.isra.0+0x5f/0x200
       ? btrfs_merge_delayed_refs+0x164/0x190
       __btrfs_run_delayed_refs+0x561/0xfa0
       ? btrfs_search_slot+0x7b4/0xb30
       ? btrfs_update_root+0x1a9/0x2c0
       btrfs_run_delayed_refs+0x73/0x1f0
       ? btrfs_update_root+0x1a9/0x2c0
       btrfs_commit_transaction+0x50/0xa50
       ? btrfs_update_reloc_root+0x122/0x220
       prepare_to_merge+0x29f/0x320
       relocate_block_group+0x2b8/0x550
       btrfs_relocate_block_group+0x1a6/0x350
       btrfs_relocate_chunk+0x27/0xe0
       btrfs_balance+0x777/0xe60
       balance_kthread+0x35/0x50
       ? btrfs_balance+0xe60/0xe60
       kthread+0x16b/0x190
       ? set_kthread_struct+0x40/0x40
       ret_from_fork+0x22/0x30
       </TASK>
    
    Normally snapshot deletion and relocation are excluded from running at
    the same time by the fs_info->cleaner_mutex.  However if we had a
    pending balance waiting to get the ->cleaner_mutex, and a snapshot
    deletion was running, and then the box crashed, we would come up in a
    state where we have a half deleted snapshot.
    
    Again, in the normal case the snapshot deletion needs to complete before
    relocation can start, but in this case relocation could very well start
    before the snapshot deletion completes, as we simply add the root to the
    dead roots list and wait for the next time the cleaner runs to clean up
    the snapshot.
    
    Fix this by checking whether any DEAD_ROOTs had a pending drop_progress
    key.  If they do, then we know we were in the middle of the drop
    operation, so we set a flag on the fs_info.  Balance can then wait
    until this flag is cleared before starting up again.
    
    If there are DEAD_ROOTs that don't have a drop_progress set, then we're
    safe to start balance right away, as we'll be properly protected by the
    cleaner_mutex.
    
    CC: [email protected] # 5.10+
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Josef Bacik <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
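
The decision described in the relocation commit above reduces to a simple scan. This is a hypothetical model, not the btrfs implementation. struct toy_dead_root only records whether a drop_progress key was persisted, and the result is the condition under which the fs_info flag would be set so that balance waits for the cleaner.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a dead root found when the filesystem is loaded. */
struct toy_dead_root {
    bool has_drop_progress;   /* a drop_progress key was persisted */
};

/* True if any dead root was in the middle of being dropped. */
bool toy_unfinished_drops(const struct toy_dead_root *roots, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (roots[i].has_drop_progress)
            return true;
    return false;
}
```

Dead roots without a drop_progress key do not set the flag, matching the commit's point that such roots are already covered by the cleaner_mutex.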

btrfs: do not WARN_ON() if we have PageError set [+ + +]

Author: Josef Bacik <[email protected]>
Date:   Fri Feb 18 10:17:39 2022 -0500

    btrfs: do not WARN_ON() if we have PageError set
    
    commit a50e1fcbc9b85fd4e95b89a75c0884cb032a3e06 upstream.
    
    Whenever we do any extent buffer operations we call
    assert_eb_page_uptodate() to complain loudly if we're operating on a
    non-uptodate page.  Our overnight tests caught this warning earlier this
    week:
    
      WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50
      CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G        W         5.17.0-rc3+ #564
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
      Workqueue: btrfs-cache btrfs_work_helper
      RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
      RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246
      RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000
      RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0
      RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000
      R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1
      R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000
      FS:  0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0
      Call Trace:
    
       extent_buffer_test_bit+0x3f/0x70
       free_space_test_bit+0xa6/0xc0
       load_free_space_tree+0x1f6/0x470
       caching_thread+0x454/0x630
       ? rcu_read_lock_sched_held+0x12/0x60
       ? rcu_read_lock_sched_held+0x12/0x60
       ? rcu_read_lock_sched_held+0x12/0x60
       ? lock_release+0x1f0/0x2d0
       btrfs_work_helper+0xf2/0x3e0
       ? lock_release+0x1f0/0x2d0
       ? finish_task_switch.isra.0+0xf9/0x3a0
       process_one_work+0x26d/0x580
       ? process_one_work+0x580/0x580
       worker_thread+0x55/0x3b0
       ? process_one_work+0x580/0x580
       kthread+0xf0/0x120
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x1f/0x30
    
    This was partially fixed by c2e39305299f01 ("btrfs: clear extent buffer
    uptodate when we fail to write it"); however, all that fix did was keep
    us from finding extent buffers after a failed writeout.  It didn't keep
    us from continuing to use a buffer that we already had found.
    
    In this case we're searching the commit root to cache the block group,
    so we can start committing the transaction and switch the commit root
    and then start writing.  After the switch we can look up an extent
    buffer that hasn't been written yet and start processing that block
    group.  Then we fail to write that block out and clear Uptodate on the
    page, and then we start spewing these errors.
    
    Normally we're protected by the tree lock to a certain degree here.  If
    we read a block we have that block read locked, and we block the writer
    from locking the block before we submit it for the write.  However this
    isn't necessarily foolproof because the read could happen before we do
    the submit_bio and after we locked and unlocked the extent buffer.
    
    Also in this particular case we have path->skip_locking set, so that
    won't save us here.  We'll simply get a block that was valid when we
    read it, but became invalid while we were using it.
    
    What we really want is to catch the case where we've "read" a block but
    it's not marked Uptodate.  On read we ClearPageError(), so if we're
    !Uptodate and !Error we know we didn't do the right thing for reading
    the page.
    
    Fix this by checking !Uptodate && !Error, this way we will not complain
    if our buffer gets invalidated while we're using it, and we'll maintain
    the spirit of the check which is to make sure we have a fully in-cache
    block while we're messing with it.
    
    CC: [email protected] # 5.4+
    Signed-off-by: Josef Bacik <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
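
The change to the assertion in the PageError commit above is a change in a boolean condition. It can be written out as two hypothetical predicates, where `uptodate` and `error` model the PageUptodate and PageError bits. The old check also fires for a buffer that was read correctly but later invalidated by a failed write (not uptodate, error set). The new check only fires for a page that was never read properly.

```c
#include <stdbool.h>

/* Old behaviour: warn whenever the page is not uptodate. */
bool old_should_warn(bool uptodate, bool error)
{
    (void)error;
    return !uptodate;
}

/* New behaviour: warn only if the page is neither uptodate nor in error. */
bool new_should_warn(bool uptodate, bool error)
{
    return !uptodate && !error;
}
```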

btrfs: fallback to blocking mode when doing async dio over multiple extents [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Wed Mar 2 11:48:39 2022 +0000

    btrfs: fallback to blocking mode when doing async dio over multiple extents
    
    commit ca93e44bfb5fd7996b76f0f544999171f647f93b upstream.
    
    Some users recently reported that MariaDB was getting a read corruption
    when using io_uring on top of btrfs. This started to happen in 5.16,
    after commit 51bd9563b6783d ("btrfs: fix deadlock due to page faults
    during direct IO reads and writes"). That changed btrfs to use the new
    iomap flag IOMAP_DIO_PARTIAL and to disable page faults before calling
    iomap_dio_rw(). This was necessary to fix deadlocks when the iovector
    corresponds to a memory mapped file region. That type of scenario is
    exercised by test case generic/647 from fstests.
    
    For this MariaDB scenario, we attempt to read 16K from file offset X
    using IOCB_NOWAIT and io_uring. In that range we have 4 extents, each
    with a size of 4K, and what happens is the following:
    
    1) btrfs_direct_read() disables page faults and calls iomap_dio_rw();
    
    2) iomap creates a struct iomap_dio object, its reference count is
       initialized to 1 and its ->size field is initialized to 0;
    
    3) iomap calls btrfs_dio_iomap_begin() with file offset X, which finds
       the first 4K extent, and sets up an iomap for this extent consisting
       of a single page;
    
    4) At iomap_dio_bio_iter(), we are able to access the first page of the
       buffer (struct iov_iter) with bio_iov_iter_get_pages() without
       triggering a page fault;
    
    5) iomap submits a bio for this 4K extent
       (iomap_dio_submit_bio() -> btrfs_submit_direct()) and increments
       the refcount on the struct iomap_dio object to 2; The ->size field
       of the struct iomap_dio object is incremented to 4K;
    
    6) iomap calls btrfs_dio_iomap_begin() again, this time with a file
       offset of X + 4K. There we set up an iomap for the next extent
       that also has a size of 4K;
    
    7) Then at iomap_dio_bio_iter() we call bio_iov_iter_get_pages(),
       which tries to access the next page (2nd page) of the buffer.
       This triggers a page fault and returns -EFAULT;
    
    8) At __iomap_dio_rw() we see the -EFAULT, but we reset the error
       to 0 because we passed the flag IOMAP_DIO_PARTIAL to iomap and
       the struct iomap_dio object has a ->size value of 4K (we submitted
       a bio for an extent already). The 'wait_for_completion' variable
       is not set to true, because our iocb has IOCB_NOWAIT set;
    
    9) At the bottom of __iomap_dio_rw(), we decrement the reference count
       of the struct iomap_dio object from 2 to 1. Because we were not
       the only ones holding a reference on it and 'wait_for_completion' is
       set to false, -EIOCBQUEUED is returned to btrfs_direct_read(), which
       just returns it up the callchain, up to io_uring;
    
    10) The bio submitted for the first extent (step 5) completes and its
        bio endio function, iomap_dio_bio_end_io(), decrements the last
        reference on the struct iomap_dio object, resulting in calling
        iomap_dio_complete_work() -> iomap_dio_complete().
    
    11) At iomap_dio_complete() we adjust the iocb->ki_pos from X to X + 4K
        and return 4K (the amount of io done) to iomap_dio_complete_work();
    
    12) iomap_dio_complete_work() calls the iocb completion callback,
        iocb->ki_complete() with a second argument value of 4K (total io
        done) and the iocb with the adjusted ki_pos of X + 4K. This results
        in completing the read request for io_uring, leaving it with a
        result of 4K bytes read, and only the first page of the buffer
        filled in, while the remaining 3 pages, corresponding to the other
        3 extents, were not filled;
    
    13) For the application, the result is unexpected because if we ask
        to read N bytes, it expects to get N bytes read as long as those
        N bytes don't cross the EOF (i_size).
    
    MariaDB reports this as an error, as it's not expecting a short read,
    since it knows it's asking for read operations fully within the i_size
    boundary. This expectation is typical of many applications, although it
    is debatable whether they should instead react to such short reads by
    issuing more read calls to get the remaining data. Nevertheless, the
    short read happened due to a change in btrfs regarding how it deals with
    page faults while in the middle of a read operation, and there's no
    reason why btrfs can't have the previous behaviour of returning the
    whole data that was requested by the application.
    
    The problem can also be triggered with the following simple program:
    
      /* Get O_DIRECT */
      #ifndef _GNU_SOURCE
      #define _GNU_SOURCE
      #endif
    
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <fcntl.h>
      #include <errno.h>
      #include <string.h>
      #include <liburing.h>
    
      int main(int argc, char *argv[])
      {
          char *foo_path;
          struct io_uring ring;
          struct io_uring_sqe *sqe;
          struct io_uring_cqe *cqe;
          struct iovec iovec;
          int fd;
          long pagesize;
          void *write_buf;
          void *read_buf;
          ssize_t ret;
          int i;
    
          if (argc != 2) {
              fprintf(stderr, "Use: %s <directory>\n", argv[0]);
              return 1;
          }
    
          foo_path = malloc(strlen(argv[1]) + 5);
          if (!foo_path) {
              fprintf(stderr, "Failed to allocate memory for file path\n");
              return 1;
          }
          strcpy(foo_path, argv[1]);
          strcat(foo_path, "/foo");
    
          /*
           * Create file foo with 2 extents, each with a size matching
           * the page size. Then allocate a buffer to read both extents
           * with io_uring, using O_DIRECT and IOCB_NOWAIT. Before doing
           * the read with io_uring, access the first page of the buffer
           * to fault it in, so that during the read we only trigger a
           * page fault when accessing the second page of the buffer.
           */
          fd = open(foo_path, O_CREAT | O_TRUNC | O_WRONLY |
                    O_DIRECT, 0666);
          if (fd == -1) {
              fprintf(stderr,
                      "Failed to create file 'foo': %s (errno %d)\n",
                      strerror(errno), errno);
              return 1;
          }
    
          pagesize = sysconf(_SC_PAGE_SIZE);
          ret = posix_memalign(&write_buf, pagesize, 2 * pagesize);
          if (ret) {
              fprintf(stderr, "Failed to allocate write buffer\n");
              return 1;
          }
    
          memset(write_buf, 0xab, pagesize);
          memset(write_buf + pagesize, 0xcd, pagesize);
    
          /* Create 2 extents, each with a size matching page size. */
          for (i = 0; i < 2; i++) {
              ret = pwrite(fd, write_buf + i * pagesize, pagesize,
                           i * pagesize);
              if (ret != pagesize) {
                  fprintf(stderr,
                          "Failed to write to file, ret = %zd errno %d (%s)\n",
                          ret, errno, strerror(errno));
                  return 1;
              }
              ret = fsync(fd);
              if (ret != 0) {
                  fprintf(stderr, "Failed to fsync file\n");
                  return 1;
              }
          }
    
          close(fd);
          fd = open(foo_path, O_RDONLY | O_DIRECT);
          if (fd == -1) {
              fprintf(stderr,
                      "Failed to open file 'foo': %s (errno %d)\n",
                      strerror(errno), errno);
              return 1;
          }
    
          ret = posix_memalign(&read_buf, pagesize, 2 * pagesize);
          if (ret) {
              fprintf(stderr, "Failed to allocate read buffer\n");
              return 1;
          }
    
          /*
           * Fault in only the first page of the read buffer.
           * We want to trigger a page fault for the 2nd page of the
           * read buffer during the read operation with io_uring
           * (O_DIRECT and IOCB_NOWAIT).
           */
          memset(read_buf, 0, 1);
    
          ret = io_uring_queue_init(1, &ring, 0);
          if (ret != 0) {
              fprintf(stderr, "Failed to create io_uring queue\n");
              return 1;
          }
    
          sqe = io_uring_get_sqe(&ring);
          if (!sqe) {
              fprintf(stderr, "Failed to get io_uring sqe\n");
              return 1;
          }
    
          iovec.iov_base = read_buf;
          iovec.iov_len = 2 * pagesize;
          io_uring_prep_readv(sqe, fd, &iovec, 1, 0);
    
          ret = io_uring_submit_and_wait(&ring, 1);
          if (ret != 1) {
              fprintf(stderr,
                      "Failed at io_uring_submit_and_wait()\n");
              return 1;
          }
    
          ret = io_uring_wait_cqe(&ring, &cqe);
          if (ret < 0) {
              fprintf(stderr, "Failed at io_uring_wait_cqe()\n");
              return 1;
          }
    
          printf("io_uring read result for file foo:\n\n");
          printf("  cqe->res == %d (expected %ld)\n", cqe->res, 2 * pagesize);
          printf("  memcmp(read_buf, write_buf) == %d (expected 0)\n",
                 memcmp(read_buf, write_buf, 2 * pagesize));
    
          io_uring_cqe_seen(&ring, cqe);
          io_uring_queue_exit(&ring);
    
          return 0;
      }
    
    When running it on an unpatched kernel:
    
      $ gcc io_uring_test.c -luring
      $ mkfs.btrfs -f /dev/sda
      $ mount /dev/sda /mnt/sda
      $ ./a.out /mnt/sda
      io_uring read result for file foo:
    
        cqe->res == 4096 (expected 8192)
        memcmp(read_buf, write_buf) == -205 (expected 0)
    
    After this patch, the read always returns 8192 bytes, with the buffer
    filled with the correct data. Although that reproducer always triggers
    the bug in my test vms, it may not be as reliable in other
    environments, since the bug does not happen if the bio for the first
    extent completes and decrements the reference on the struct iomap_dio
    object before we do the atomic_dec_and_test() on the reference at
    __iomap_dio_rw().
    
    Fix this in btrfs by having btrfs_dio_iomap_begin() return -EAGAIN
    whenever we try to satisfy a non-blocking IO request (IOMAP_NOWAIT flag
    set) over a range that spans multiple extents (or a mix of extents and
    holes). This avoids returning success to the caller when we only did
    partial IO, which is not optimal for writes and for reads it's actually
    incorrect, as the caller doesn't expect to get fewer bytes read than it
    has requested (unless EOF is crossed), as previously mentioned. This is
    also the type of behaviour that xfs follows (xfs_direct_write_iomap_begin()),
    even though it doesn't use IOMAP_DIO_PARTIAL.
    
    A test case for fstests will follow soon.
    
    Link: https://lore.kernel.org/linux-btrfs/CABVffEM0eEWho+206m470rtM0d9J8ue85TtR-A_oVTuGLWFicA@mail.gmail.com/
    Link: https://lore.kernel.org/linux-btrfs/CAHF2GV6U32gmqSjLe=XKgfcZAmLCiH26cJ2OnHGp5x=VAH4OHQ@mail.gmail.com/
    CC: [email protected] # 5.16+
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
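
The fallback rule from the dio commit above can be sketched as a single decision. This is a hypothetical model, not btrfs_dio_iomap_begin() itself. `extent_end` is where the extent (or hole) containing `start` ends, and the function assumes `extent_end > start`. A non-blocking request that extends past that extent gets -EAGAIN, so the caller retries in blocking mode instead of completing a short read.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return -EAGAIN for a NOWAIT request spanning more than one extent,
 * otherwise the number of bytes mapped for this iteration.
 * Assumes extent_end > start.
 */
int64_t toy_dio_iomap_begin(uint64_t start, uint64_t len,
                            uint64_t extent_end, bool nowait)
{
    uint64_t mapped = extent_end - start;

    if (mapped > len)
        mapped = len;
    if (nowait && mapped < len)
        return -EAGAIN;     /* would only be partial IO: retry blocking */
    return (int64_t)mapped;
}
```

In the MariaDB scenario (a 16K read over four 4K extents), the first call already returns -EAGAIN under IOCB_NOWAIT, so no 4K bio is submitted on the non-blocking path.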

btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Thu Oct 28 16:03:41 2021 +0100

    btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range
    
    commit f0bfa76a11e93d0fe2c896fcb566568c5e8b5d3f upstream.
    
    When doing a direct IO write against a file range that either has
    preallocated extents in that range or has regular extents and the file
    has the NOCOW attribute set, the write fails with -ENOSPC when all of
    the following conditions are met:
    
    1) There are no data block groups with enough free space matching
       the size of the write;
    
    2) There's not enough unallocated space for allocating a new data block
       group;
    
    3) The extents in the target file range are not shared, neither through
       snapshots nor through reflinks.
    
    This is wrong because a NOCOW write can be done in such a case, and in fact
    it's possible to do it using a buffered IO write, since when failing to
    allocate data space, the buffered IO path checks if a NOCOW write is
    possible.
    
    The failure in the direct IO write path comes from the fact that early
    on, at btrfs_dio_iomap_begin(), we try to allocate data space for the
    write and if that fails we return the error and stop; we never check if we
    can do NOCOW. But later, at btrfs_get_blocks_direct_write(), we check
    if we can do a NOCOW write into the range, or a subset of the range, and
    then release the previously reserved data space.
    
    Fix this by doing the data reservation only if needed, when we must COW,
    at btrfs_get_blocks_direct_write() instead of doing it at
    btrfs_dio_iomap_begin(). This also simplifies the logic a bit and removes
    the inefficiency of doing unnecessary data reservations.
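    
    A minimal sketch of the ordering change, assuming a heavily simplified
    model: the struct and both functions below are hypothetical and only
    capture the decision described above (reserve first vs. check NOCOW
    first), not the real btrfs_dio_iomap_begin() or
    btrfs_get_blocks_direct_write() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified model of the data space decision for a
 * direct IO write. None of these names exist in btrfs.
 */
struct dio_model {
	uint64_t free_data_space; /* bytes that can still be reserved for COW */
	bool nocow_possible;      /* NOCOW file or prealloc extent, not shared */
};

/* Old order: reserve first, only later check whether NOCOW is possible. */
static int dio_write_reserve_first(struct dio_model *m, uint64_t len)
{
	if (m->free_data_space < len)
		return -ENOSPC;            /* NOCOW is never considered */
	m->free_data_space -= len;
	if (m->nocow_possible)
		m->free_data_space += len; /* release the unneeded reservation */
	return 0;
}

/* New order: check NOCOW first and reserve only when we must COW. */
static int dio_write_reserve_if_cow(struct dio_model *m, uint64_t len)
{
	if (m->nocow_possible)
		return 0;                  /* write in place, nothing to reserve */
	if (m->free_data_space < len)
		return -ENOSPC;
	m->free_data_space -= len;
	return 0;
}
```

    With no free data space and a NOCOW-eligible range, the first variant
    fails with -ENOSPC while the second succeeds, which is the behaviour the
    reproducer below exercises.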
    
    The following example test script reproduces the problem:
    
      $ cat dio-nocow-enospc.sh
      #!/bin/bash
    
      DEV=/dev/sdj
      MNT=/mnt/sdj
    
      # Use a small fixed size (1G) filesystem so that it's quick to fill
      # it up.
      # Make sure the mixed block groups feature is not enabled because we
      # later want to not have more space available for allocating data
      # extents but still have enough metadata space free for the file writes.
      mkfs.btrfs -f -b $((1024 * 1024 * 1024)) -O ^mixed-bg $DEV
      mount $DEV $MNT
    
      # Create our test file with the NOCOW attribute set.
      touch $MNT/foobar
      chattr +C $MNT/foobar
    
      # Now fill in all unallocated space with data for our test file.
      # This will allocate a data block group that will be full and leave
      # no (or a very small amount of) unallocated space in the device, so
      # that it will not be possible to allocate a new block group later.
      echo
      echo "Creating test file with initial data..."
      xfs_io -c "pwrite -S 0xab -b 1M 0 900M" $MNT/foobar
    
      # Now try a direct IO write against file range [0, 10M[.
      # This should succeed since this is a NOCOW file and an extent for the
      # range was previously allocated.
      echo
      echo "Trying direct IO write over allocated space..."
      xfs_io -d -c "pwrite -S 0xcd -b 10M 0 10M" $MNT/foobar
    
      umount $MNT
    
    When running the test:
    
      $ ./dio-nocow-enospc.sh
      (...)
    
      Creating test file with initial data...
      wrote 943718400/943718400 bytes at offset 0
      900 MiB, 900 ops; 0:00:01.43 (625.526 MiB/sec and 625.5265 ops/sec)
    
      Trying direct IO write over allocated space...
      pwrite: No space left on device
    
    A test case for fstests will follow, testing both this direct IO write
    scenario and the buffered IO write scenario, to make future regressions
    in the buffered IO case less likely.
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: fix lost prealloc extents beyond eof after full fsync [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Thu Feb 17 12:12:02 2022 +0000

    btrfs: fix lost prealloc extents beyond eof after full fsync
    
    commit d99478874355d3a7b9d86dfb5d7590d5b1754b1f upstream.
    
    When doing a full fsync, if we have prealloc extents beyond (or at) eof,
    and the leaves that contain them were not modified in the current
    transaction, we end up not logging them. This results in losing those
    extents when we replay the log after a power failure, since the inode is
    truncated to the current value of the logged i_size.
    
    Just like for the fast fsync path, we need to always log all prealloc
    extents starting at or beyond i_size. The fast fsync case was fixed in
    commit 471d557afed155 ("Btrfs: fix loss of prealloc extents past i_size
    after fsync log replay") but it missed the full fsync path. The problem
    has existed since the very early days, when the log tree was added by
    commit e02119d5a7b439 ("Btrfs: Add a write ahead tree log to optimize
    synchronous operations").
    
    Example reproducer:
    
      $ mkfs.btrfs -f /dev/sdc
      $ mount /dev/sdc /mnt
    
      # Create our test file with many file extent items, so that they span
      # several leaves of metadata, even if the node/page size is 64K. Use
      # direct IO and not fsync/O_SYNC because it's both faster and it avoids
      # clearing the full sync flag from the inode - we want the fsync below
      # to trigger the slow full sync code path.
      $ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo
    
      # Now add two preallocated extents to our file without extending the
      # file's size. One right at i_size, and another further beyond, leaving
      # a gap between the two prealloc extents.
      $ xfs_io -c "falloc -k 16M 1M" /mnt/foo
      $ xfs_io -c "falloc -k 20M 1M" /mnt/foo
    
      # Make sure everything is durably persisted and the transaction is
      # committed. This makes all created extents have a generation lower
      # than the generation of the transaction used by the next write and
      # fsync.
      $ sync
    
      # Now overwrite only the first extent, which will result in modifying
      # only the first leaf of metadata for our inode. Then fsync it. This
      # fsync will use the slow code path (inode full sync bit is set) because
      # it's the first fsync since the inode was created/loaded.
      $ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo
    
      # Extent list before power failure.
      $ xfs_io -c "fiemap -v" /mnt/foo
      /mnt/foo:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..7]:          2178048..2178055     8   0x0
         1: [8..16383]:      26632..43007     16376   0x0
         2: [16384..32767]:  2156544..2172927 16384   0x0
         3: [32768..34815]:  2172928..2174975  2048 0x800
         4: [34816..40959]:  hole              6144
         5: [40960..43007]:  2174976..2177023  2048 0x801
    
      <power fail>
    
      # Mount fs again, trigger log replay.
      $ mount /dev/sdc /mnt
    
      # Extent list after power failure and log replay.
      $ xfs_io -c "fiemap -v" /mnt/foo
      /mnt/foo:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..7]:          2178048..2178055     8   0x0
         1: [8..16383]:      26632..43007     16376   0x0
         2: [16384..32767]:  2156544..2172927 16384   0x1
    
      # The prealloc extents at file offsets 16M and 20M are missing.
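    
    The fiemap ranges above are in 512-byte units. A small sketch decoding
    them, assuming the flag values from <linux/fiemap.h>
    (FIEMAP_EXTENT_LAST = 0x1, FIEMAP_EXTENT_UNWRITTEN = 0x800); the macro
    and helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* xfs_io "fiemap -v" prints offsets and block ranges in 512-byte units. */
#define FIEMAP_UNIT_BYTES 512u

/* Flag values as defined in <linux/fiemap.h>. */
#define EXT_FLAG_LAST      0x00000001u /* FIEMAP_EXTENT_LAST */
#define EXT_FLAG_UNWRITTEN 0x00000800u /* FIEMAP_EXTENT_UNWRITTEN (prealloc) */

static uint64_t units_to_bytes(uint64_t units)
{
	return units * FIEMAP_UNIT_BYTES;
}

static bool is_unwritten(uint32_t flags)
{
	return (flags & EXT_FLAG_UNWRITTEN) != 0;
}

static bool is_last(uint32_t flags)
{
	return (flags & EXT_FLAG_LAST) != 0;
}
```

    So [32768..34815] is the 1M unwritten (prealloc) extent at 16M,
    [40960..43007] is the one at 20M and also carries the LAST flag, and
    after replay extent 2 has only 0x1 set, i.e. the file now ends there.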
    
    So fix this by calling btrfs_log_prealloc_extents() when we are doing a
    full fsync, so that we always log all prealloc extents beyond eof.
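    
    The change in the logging decision can be modelled as below. This is a
    sketch under a simplifying assumption: each extent item is reduced to
    its start offset, a prealloc flag and whether its leaf changed in the
    current transaction. The struct and functions are hypothetical, not
    btrfs_log_prealloc_extents() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a file extent item, as seen by a full fsync.
 * Real btrfs code works on leaves and generations, not on these fields.
 */
struct extent_model {
	uint64_t start;     /* file offset in bytes */
	bool prealloc;      /* preallocated (unwritten) extent */
	bool leaf_changed;  /* its leaf was modified in the current transaction */
};

/* Before the fix: a full fsync only copied items from changed leaves. */
static bool logged_before_fix(const struct extent_model *e, uint64_t i_size)
{
	(void)i_size;
	return e->leaf_changed;
}

/* After the fix: prealloc extents at or beyond i_size are always logged too. */
static bool logged_after_fix(const struct extent_model *e, uint64_t i_size)
{
	return e->leaf_changed || (e->prealloc && e->start >= i_size);
}
```

    In the reproducer, i_size is 16M and the extents at 16M and 20M sit in
    unmodified leaves, so only the second function logs them.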
    
    A test case for fstests will follow soon.
    
    CC: [email protected] # 4.19+
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: fix relocation crash due to premature return from btrfs_commit_transaction() [+ + +]

Author: Omar Sandoval <[email protected]>
Date:   Thu Feb 17 15:14:43 2022 -0800

    btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
    
    commit 5fd76bf31ccfecc06e2e6b29f8c809e934085b99 upstream.
    
    We are seeing crashes similar to the following trace:
    
    [38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
    [38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54
    [38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
    [38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
    [38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206
    [38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14
    [38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360
    [38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000
    [38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800
    [38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360
    [38.987146] FS:  00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000
    [38.988662] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0
    [38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [38.992528] Call Trace:
    [38.992854]  <TASK>
    [38.993148]  btrfs_relocate_chunk+0x27/0xe0 [btrfs]
    [38.993941]  btrfs_balance+0x78e/0xea0 [btrfs]
    [38.994801]  ? vsnprintf+0x33c/0x520
    [38.995368]  ? __kmalloc_track_caller+0x351/0x440
    [38.996198]  btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs]
    [38.997084]  btrfs_ioctl+0x11b0/0x2da0 [btrfs]
    [38.997867]  ? mod_objcg_state+0xee/0x340
    [38.998552]  ? seq_release+0x24/0x30
    [38.999184]  ? proc_nr_files+0x30/0x30
    [38.999654]  ? call_rcu+0xc8/0x2f0
    [39.000228]  ? __x64_sys_ioctl+0x84/0xc0
    [39.000872]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
    [39.001973]  __x64_sys_ioctl+0x84/0xc0
    [39.002566]  do_syscall_64+0x3a/0x80
    [39.003011]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [39.003735] RIP: 0033:0x7f11c166959b
    [39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b
    [39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003
    [39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0
    [39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3
    [39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001
    [39.015040]  </TASK>
    [39.015418] ---[ end trace 0000000000000000 ]---
    [43.131559] ------------[ cut here ]------------
    [43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717!
    [43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI
    [43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G        W         5.17.0-rc4 #54
    [43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
    [43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
    [43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
    [43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
    [43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
    [43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
    [43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
    [43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
    [43.145686] FS:  00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000
    [43.146808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0
    [43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [43.150559] Call Trace:
    [43.150904]  <TASK>
    [43.151253]  btrfs_finish_extent_commit+0x88/0x290 [btrfs]
    [43.152127]  btrfs_commit_transaction+0x74f/0xaa0 [btrfs]
    [43.152932]  ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
    [43.153786]  btrfs_ioctl+0x1edc/0x2da0 [btrfs]
    [43.154475]  ? __check_object_size+0x150/0x170
    [43.155170]  ? preempt_count_add+0x49/0xa0
    [43.155753]  ? __x64_sys_ioctl+0x84/0xc0
    [43.156437]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
    [43.157456]  __x64_sys_ioctl+0x84/0xc0
    [43.157980]  do_syscall_64+0x3a/0x80
    [43.158543]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [43.159231] RIP: 0033:0x7f7657f1e59b
    [43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b
    [43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003
    [43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
    [43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    [43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800
    [43.166863]  </TASK>
    [43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common
    [43.169552] ---[ end trace 0000000000000000 ]---
    [43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
    [43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
    [43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
    [43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
    [43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
    [43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
    [43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
    [43.180071] FS:  00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000
    [43.181073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0
    [43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    
    We first hit the WARN_ON(rc->block_group->pinned > 0) in
    btrfs_relocate_block_group() and then the BUG_ON(!cache) in
    unpin_extent_range(). This tells us that we are exiting relocation and
    removing the block group with bytes still pinned for that block group.
    This is supposed to be impossible: the last thing relocate_block_group()
    does is commit the transaction to get rid of pinned extents.
    
    Commit d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when
    waiting for a transaction commit") introduced an optimization so that
    commits from fsync don't have to wait for the previous commit to unpin
    extents. This was only intended to affect fsync, but it inadvertently
    made it possible for any commit to skip waiting for the previous commit
    to unpin. This is because if a call to btrfs_commit_transaction() finds
    that another thread is already committing the transaction, it waits for
    the other thread to complete the commit and then returns. If that other
    thread was in fsync, then it completes the commit without waiting for the
    previous commit to complete. This makes the following sequence of events possible:
    
    Thread 1____________________|Thread 2 (fsync)_____________________|Thread 3 (balance)___________________
    btrfs_commit_transaction(N) |                                     |
      btrfs_run_delayed_refs    |                                     |
        pin extents             |                                     |
      ...                       |                                     |
      state = UNBLOCKED         |btrfs_sync_file                      |
                                |  btrfs_start_transaction(N + 1)     |relocate_block_group
                                |                                     |  btrfs_join_transaction(N + 1)
                                |  btrfs_commit_transaction(N + 1)    |
      ...                       |  trans->state = COMMIT_START        |
                                |                                     |  btrfs_commit_transaction(N + 1)
                                |                                     |    wait_for_commit(N + 1, COMPLETED)
                                |  wait_for_commit(N, SUPER_COMMITTED)|
      state = SUPER_COMMITTED   |  ...                                |
      btrfs_finish_extent_commit|                                     |
        unpin_extent_range()    |  trans->state = COMPLETED           |
                                |                                     |    return
                                |                                     |
        ...                     |                                     |Thread 1 isn't done, so pinned > 0
                                |                                     |and we WARN
                                |                                     |
                                |                                     |btrfs_remove_block_group
        unpin_extent_range()    |                                     |
          Thread 3 removed the  |                                     |
          block group, so we BUG|                                     |
    
    There are other sequences involving SUPER_COMMITTED transactions that
    can cause a similar outcome.
    
    We could fix this by making relocation explicitly wait for unpinning,
    but there may be other cases that need it. Josef mentioned ENOSPC
    flushing and the free space cache inode as other potential victims.
    Rather than playing whack-a-mole, this fix is conservative and makes all
    commits not in fsync wait for all previous transactions, which is what
    the optimization intended.
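    
    A hypothetical model of the return condition, using the transaction
    states named in the sequence above. The enum and functions are
    illustrative only and assume that the fix amounts to choosing which
    state to wait for based on whether the commit comes from fsync; they do
    not reproduce btrfs_commit_transaction().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified transaction states, ordered like the ones named in the text
 * (UNBLOCKED < SUPER_COMMITTED < COMPLETED). This is a model, not btrfs code.
 */
enum trans_state {
	STATE_RUNNING,
	STATE_COMMIT_START,
	STATE_UNBLOCKED,
	STATE_SUPER_COMMITTED,
	STATE_COMPLETED,
};

/*
 * Before the fix: a caller that found another thread committing transaction
 * N + 1 returned as soon as N + 1 completed, whatever the state of N.
 */
static bool may_return_before_fix(enum trans_state prev, enum trans_state cur,
				  bool in_fsync)
{
	(void)prev;
	(void)in_fsync;
	return cur == STATE_COMPLETED;
}

/*
 * After the fix: only fsync may settle for SUPER_COMMITTED; every other
 * commit also waits until the previous transaction is COMPLETED, i.e. its
 * extents have been unpinned.
 */
static bool may_return_after_fix(enum trans_state prev, enum trans_state cur,
				 bool in_fsync)
{
	enum trans_state want = in_fsync ? STATE_SUPER_COMMITTED : STATE_COMPLETED;

	return cur >= want && prev >= want;
}
```

    The first assertion pair in the test is Thread 3 from the sequence above:
    transaction N is only SUPER_COMMITTED (Thread 1 is still unpinning)
    while N + 1 is already COMPLETED.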
    
    Fixes: d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
    CC: [email protected] # 5.15+
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Omar Sandoval <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: get rid of warning on transaction commit when using flushoncommit [+ + +]

Author: Filipe Manana <[email protected]>
Date:   Wed Feb 2 15:26:09 2022 +0000

    btrfs: get rid of warning on transaction commit when using flushoncommit
    
    [ Upstream commit a0f0cf8341e34e5d2265bfd3a7ad68342da1e2aa ]
    
    When using the flushoncommit mount option, during almost every transaction
    commit we trigger a warning from __writeback_inodes_sb_nr():
    
      $ cat fs/fs-writeback.c
      (...)
      static void __writeback_inodes_sb_nr(struct super_block *sb, ...
      {
            (...)
            WARN_ON(!rwsem_is_locked(&sb->s_umount));
            (...)
      }
      (...)
    
    The trace produced in dmesg looks like the following:
    
      [947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3
      [947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif
      [947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186
      [947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3
      [947.502097] Code: 24 10 4c 89 44 24 18 c6 (...)
      [947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246
      [947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000
      [947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50
      [947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000
      [947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488
      [947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460
      [947.553621] FS:  0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000
      [947.560537] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0
      [947.571072] Call Trace:
      [947.572354]  <TASK>
      [947.573266]  btrfs_commit_transaction+0x1f1/0x998
      [947.576785]  ? start_transaction+0x3ab/0x44e
      [947.579867]  ? schedule_timeout+0x8a/0xdd
      [947.582716]  transaction_kthread+0xe9/0x156
      [947.585721]  ? btrfs_cleanup_transaction.isra.0+0x407/0x407
      [947.590104]  kthread+0x131/0x139
      [947.592168]  ? set_kthread_struct+0x32/0x32
      [947.595174]  ret_from_fork+0x22/0x30
      [947.597561]  </TASK>
      [947.598553] ---[ end trace 644721052755541c ]---
    
    This is because we started using writeback_inodes_sb() to flush delalloc
    when committing a transaction (when using -o flushoncommit), in order to
    avoid deadlocks with filesystem freeze operations. This change was made
    by commit ce8ea7cc6eb313 ("btrfs: don't call btrfs_start_delalloc_roots
    in flushoncommit"). Since that change we have been producing that
    warning, and every now and then a user reports it: the warning happens
    too often, it spams dmesg/syslog, and users are unsure whether it
    reflects a problem that might compromise the filesystem's reliability.
    
    We cannot simply lock the sb->s_umount semaphore before calling
    writeback_inodes_sb(), because that would, at least, deadlock with
    filesystem freezing: at fs/super.c:freeze_super(), sync_filesystem() is
    called while that semaphore is held in write mode, and that can trigger
    a transaction commit, resulting in a deadlock. It would also trigger the
    same type of deadlock in the unmount path. Possibly, it could also
    introduce other locking dependencies that lockdep would report.
    
    To fix this, call try_to_writeback_inodes_sb() instead of
    writeback_inodes_sb(), because that will try to read lock sb->s_umount
    and then will only call writeback_inodes_sb() if it was able to lock it.
    This is fine because the cases where it can't read lock sb->s_umount
    are during a filesystem unmount or during a filesystem freeze - in those
    cases sb->s_umount is write locked and sync_filesystem() is called, which
    calls writeback_inodes_sb(). In other words, in all cases where we can't
    take a read lock on sb->s_umount, writeback is already being triggered
    elsewhere.
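    
    The try-lock pattern can be sketched with a toy, single-threaded
    stand-in for sb->s_umount. Everything below is hypothetical; the real
    code uses the kernel's rw_semaphore and try_to_writeback_inodes_sb().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy stand-in for sb->s_umount: a reader count plus a writer flag.
 * Single-threaded model, no real locking.
 */
struct toy_rwsem {
	int readers;
	bool writer;
};

static int writeback_runs;

static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer)
		return false; /* held for write by unmount or freeze */
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

static void toy_writeback_inodes(struct toy_rwsem *sem)
{
	/* Same condition as WARN_ON(!rwsem_is_locked(&sb->s_umount)). */
	assert(sem->readers > 0 || sem->writer);
	writeback_runs++;
}

/* Returns true if this call ran writeback itself. */
static bool toy_try_writeback(struct toy_rwsem *sem)
{
	if (!toy_down_read_trylock(sem))
		return false; /* the write-lock holder calls sync_filesystem() */
	toy_writeback_inodes(sem);
	toy_up_read(sem);
	return true;
}
```

    When the semaphore is write locked, the call simply skips writeback
    instead of blocking, which is what avoids the freeze and unmount
    deadlocks described above.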
    
    An alternative would be to call btrfs_start_delalloc_roots() with a
    number of pages different from LONG_MAX, for example matching the number
    of delalloc bytes we currently have, in which case we would end up
    starting all delalloc with filemap_fdatawrite_wbc() and not with an
    async flush via filemap_flush(); that is only possible after the rather
    recent commit e076ab2a2ca70a ("btrfs: shrink delalloc pages instead of
    full inodes"). However, that creates a whole new can of worms due to new
    lock dependencies, which lockdep complains about, for example:
    
    [ 8948.247280] ======================================================
    [ 8948.247823] WARNING: possible circular locking dependency detected
    [ 8948.248353] 5.17.0-rc1-btrfs-next-111 #1 Not tainted
    [ 8948.248786] ------------------------------------------------------
    [ 8948.249320] kworker/u16:18/933570 is trying to acquire lock:
    [ 8948.249812] ffff9b3de1591690 (sb_internal#2){.+.+}-{0:0}, at: find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.250638]
                   but task is already holding lock:
    [ 8948.251140] ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
    [ 8948.252018]
                   which lock already depends on the new lock.
    
    [ 8948.252710]
                   the existing dependency chain (in reverse order) is:
    [ 8948.253343]
                   -> #2 (&root->delalloc_mutex){+.+.}-{3:3}:
    [ 8948.253950]        __mutex_lock+0x90/0x900
    [ 8948.254354]        start_delalloc_inodes+0x78/0x400 [btrfs]
    [ 8948.254859]        btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
    [ 8948.255408]        btrfs_commit_transaction+0x32f/0xc00 [btrfs]
    [ 8948.255942]        btrfs_mksubvol+0x380/0x570 [btrfs]
    [ 8948.256406]        btrfs_mksnapshot+0x81/0xb0 [btrfs]
    [ 8948.256870]        __btrfs_ioctl_snap_create+0x17f/0x190 [btrfs]
    [ 8948.257413]        btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
    [ 8948.257961]        btrfs_ioctl+0x1196/0x3630 [btrfs]
    [ 8948.258418]        __x64_sys_ioctl+0x83/0xb0
    [ 8948.258793]        do_syscall_64+0x3b/0xc0
    [ 8948.259146]        entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 8948.259709]
                   -> #1 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}:
    [ 8948.260330]        __mutex_lock+0x90/0x900
    [ 8948.260692]        btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
    [ 8948.261234]        btrfs_commit_transaction+0x32f/0xc00 [btrfs]
    [ 8948.261766]        btrfs_set_free_space_cache_v1_active+0x38/0x60 [btrfs]
    [ 8948.262379]        btrfs_start_pre_rw_mount+0x119/0x180 [btrfs]
    [ 8948.262909]        open_ctree+0x1511/0x171e [btrfs]
    [ 8948.263359]        btrfs_mount_root.cold+0x12/0xde [btrfs]
    [ 8948.263863]        legacy_get_tree+0x30/0x50
    [ 8948.264242]        vfs_get_tree+0x28/0xc0
    [ 8948.264594]        vfs_kern_mount.part.0+0x71/0xb0
    [ 8948.265017]        btrfs_mount+0x11d/0x3a0 [btrfs]
    [ 8948.265462]        legacy_get_tree+0x30/0x50
    [ 8948.265851]        vfs_get_tree+0x28/0xc0
    [ 8948.266203]        path_mount+0x2d4/0xbe0
    [ 8948.266554]        __x64_sys_mount+0x103/0x140
    [ 8948.266940]        do_syscall_64+0x3b/0xc0
    [ 8948.267300]        entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 8948.267790]
                   -> #0 (sb_internal#2){.+.+}-{0:0}:
    [ 8948.268322]        __lock_acquire+0x12e8/0x2260
    [ 8948.268733]        lock_acquire+0xd7/0x310
    [ 8948.269092]        start_transaction+0x44c/0x6e0 [btrfs]
    [ 8948.269591]        find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.270087]        btrfs_reserve_extent+0x14b/0x280 [btrfs]
    [ 8948.270588]        cow_file_range+0x17e/0x490 [btrfs]
    [ 8948.271051]        btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
    [ 8948.271586]        writepage_delalloc+0xb5/0x170 [btrfs]
    [ 8948.272071]        __extent_writepage+0x156/0x3c0 [btrfs]
    [ 8948.272579]        extent_write_cache_pages+0x263/0x460 [btrfs]
    [ 8948.273113]        extent_writepages+0x76/0x130 [btrfs]
    [ 8948.273573]        do_writepages+0xd2/0x1c0
    [ 8948.273942]        filemap_fdatawrite_wbc+0x68/0x90
    [ 8948.274371]        start_delalloc_inodes+0x17f/0x400 [btrfs]
    [ 8948.274876]        btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
    [ 8948.275417]        flush_space+0x1f2/0x630 [btrfs]
    [ 8948.275863]        btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
    [ 8948.276438]        process_one_work+0x252/0x5a0
    [ 8948.276829]        worker_thread+0x55/0x3b0
    [ 8948.277189]        kthread+0xf2/0x120
    [ 8948.277506]        ret_from_fork+0x22/0x30
    [ 8948.277868]
                   other info that might help us debug this:
    
    [ 8948.278548] Chain exists of:
                     sb_internal#2 --> &fs_info->delalloc_root_mutex --> &root->delalloc_mutex
    
    [ 8948.279601]  Possible unsafe locking scenario:
    
    [ 8948.280102]        CPU0                    CPU1
    [ 8948.280508]        ----                    ----
    [ 8948.280915]   lock(&root->delalloc_mutex);
    [ 8948.281271]                                lock(&fs_info->delalloc_root_mutex);
    [ 8948.281915]                                lock(&root->delalloc_mutex);
    [ 8948.282487]   lock(sb_internal#2);
    [ 8948.282800]
                    *** DEADLOCK ***
    
    [ 8948.283333] 4 locks held by kworker/u16:18/933570:
    [ 8948.283750]  #0: ffff9b3dc00a9d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
    [ 8948.284609]  #1: ffffa90349dafe70 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
    [ 8948.285637]  #2: ffff9b3e14db5040 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}, at: btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
    [ 8948.286674]  #3: ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
    [ 8948.287596]
                  stack backtrace:
    [ 8948.287975] CPU: 3 PID: 933570 Comm: kworker/u16:18 Not tainted 5.17.0-rc1-btrfs-next-111 #1
    [ 8948.288677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    [ 8948.289649] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
    [ 8948.290298] Call Trace:
    [ 8948.290517]  <TASK>
    [ 8948.290700]  dump_stack_lvl+0x59/0x73
    [ 8948.291026]  check_noncircular+0xf3/0x110
    [ 8948.291375]  ? start_transaction+0x228/0x6e0 [btrfs]
    [ 8948.291826]  __lock_acquire+0x12e8/0x2260
    [ 8948.292241]  lock_acquire+0xd7/0x310
    [ 8948.292714]  ? find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.293241]  ? lock_is_held_type+0xea/0x140
    [ 8948.293601]  start_transaction+0x44c/0x6e0 [btrfs]
    [ 8948.294055]  ? find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.294518]  find_free_extent+0x141e/0x1590 [btrfs]
    [ 8948.294957]  ? _raw_spin_unlock+0x29/0x40
    [ 8948.295312]  ? btrfs_get_alloc_profile+0x124/0x290 [btrfs]
    [ 8948.295813]  btrfs_reserve_extent+0x14b/0x280 [btrfs]
    [ 8948.296270]  cow_file_range+0x17e/0x490 [btrfs]
    [ 8948.296691]  btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
    [ 8948.297175]  ? find_lock_delalloc_range+0x247/0x270 [btrfs]
    [ 8948.297678]  writepage_delalloc+0xb5/0x170 [btrfs]
    [ 8948.298123]  __extent_writepage+0x156/0x3c0 [btrfs]
    [ 8948.298570]  extent_write_cache_pages+0x263/0x460 [btrfs]
    [ 8948.299061]  extent_writepages+0x76/0x130 [btrfs]
    [ 8948.299495]  do_writepages+0xd2/0x1c0
    [ 8948.299817]  ? sched_clock_cpu+0xd/0x110
    [ 8948.300160]  ? lock_release+0x155/0x4a0
    [ 8948.300494]  filemap_fdatawrite_wbc+0x68/0x90
    [ 8948.300874]  ? do_raw_spin_unlock+0x4b/0xa0
    [ 8948.301243]  start_delalloc_inodes+0x17f/0x400 [btrfs]
    [ 8948.301706]  ? lock_release+0x155/0x4a0
    [ 8948.302055]  btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
    [ 8948.302564]  flush_space+0x1f2/0x630 [btrfs]
    [ 8948.302970]  btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
    [ 8948.303510]  process_one_work+0x252/0x5a0
    [ 8948.303860]  ? process_one_work+0x5a0/0x5a0
    [ 8948.304221]  worker_thread+0x55/0x3b0
    [ 8948.304543]  ? process_one_work+0x5a0/0x5a0
    [ 8948.304904]  kthread+0xf2/0x120
    [ 8948.305184]  ? kthread_complete_and_exit+0x20/0x20
    [ 8948.305598]  ret_from_fork+0x22/0x30
    [ 8948.305921]  </TASK>
    
    It all comes from the fact that btrfs_start_delalloc_roots() takes the
    delalloc_root_mutex while, in the transaction commit path, we are holding
    a read lock on one of the superblock's freeze semaphores (via
    sb_start_intwrite()). The async reclaim task can also call
    btrfs_start_delalloc_roots(), which ends up triggering writeback with
    calls to filemap_fdatawrite_wbc(), resulting in extent allocation, which
    in turn can call btrfs_start_transaction() and take the freeze semaphore
    via sb_start_intwrite(). This forms a nasty dependency between all those
    locks, which can be taken in different orders by different code paths.
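    
    The cycle lockdep reports can be reproduced with a toy dependency graph,
    where an edge a -> b means b was taken while a was held. This is a
    sketch of the idea only, not lockdep's implementation; the three edges
    follow the chain printed above (sb_internal -> delalloc_root_mutex ->
    delalloc_mutex) plus the new acquisition of sb_internal from writeback.

```c
#include <assert.h>
#include <stdbool.h>

/* The three locks from the lockdep report, as graph nodes. */
enum lock_id { SB_INTERNAL, DELALLOC_ROOT_MUTEX, DELALLOC_MUTEX, NR_LOCKS };

/* dep[a][b] is true once lock b has been taken while lock a was held. */
static bool dep[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is there a dependency path from `from` to `to`? */
static bool reachable(int from, int to, bool seen[NR_LOCKS])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "take `acquired` while holding `held`". Returns false, without
 * recording it, if the new edge would close a cycle (a possible deadlock).
 */
static bool record_acquire(int held, int acquired)
{
	bool seen[NR_LOCKS] = { false };

	if (reachable(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}
```

    The first two edges come from the commit path and
    btrfs_start_delalloc_roots(); the third, from async reclaim writeback,
    is the one rejected as closing the cycle.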
    
    So just adopt the simple approach of calling try_to_writeback_inodes_sb()
    at btrfs_start_delalloc_flush().
    
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Reviewed-by: Omar Sandoval <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    [ add more link reports ]
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

btrfs: qgroup: fix deadlock between rescan worker and remove qgroup [+ + +]

Author: Sidong Yang <[email protected]>
Date:   Mon Feb 28 01:43:40 2022 +0000

    btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
    
    commit d4aef1e122d8bbdc15ce3bd0bc813d6b44a7d63a upstream.
    
    The commit e804861bd4e6 ("btrfs: fix deadlock between quota disable and
    qgroup rescan worker") by Kawasaki resolved a deadlock between quota
    disable and the qgroup rescan worker. However, there is a similar
    deadlock involving enabling or disabling quota while creating or
    removing a qgroup. It can be reproduced with the simple script below.
    
    for i in {1..100}
    do
        btrfs quota enable /mnt &
        btrfs qgroup create 1/0 /mnt &
        btrfs qgroup destroy 1/0 /mnt &
        btrfs quota disable /mnt &
    done
    
    Here's why the deadlock happens:
    
    1) The quota rescan task is running.
    
    2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock
       mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for
       the quota rescan task to complete.
    
    3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock
       the qgroup_ioctl_lock mutex, because it's being held by task A. At that
       point task B is holding a transaction handle for the current transaction.
    
    4) The quota rescan task calls btrfs_commit_transaction(). This results
       in it waiting for all other tasks to release their handles on the
       transaction, but task B is blocked on the qgroup_ioctl_lock mutex
       while holding a handle on the transaction, and that mutex is being held
       by task A, which is waiting for the quota rescan task to complete,
       resulting in a deadlock between these 3 tasks.
    
    To resolve this issue, the thread disabling quota should unlock
    qgroup_ioctl_lock before waiting for rescan completion. Move
    btrfs_qgroup_wait_for_completion() after the unlock of qgroup_ioctl_lock.
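
    The three-task dependency chain can be checked with a small wait-for
    graph. The sketch below is a hypothetical userspace model, not btrfs
    code: the task names and the has_deadlock()/quota_disable_deadlocks()
    helpers are assumptions for illustration. It shows that once task A
    drops qgroup_ioctl_lock before waiting, task B no longer waits on A and
    the cycle disappears.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the three tasks from the analysis above. */
enum task { RESCAN, TASK_A, TASK_B, NTASKS };
#define NO_WAIT (-1)

/* waits_for[t] is the task that t is blocked on, or NO_WAIT. */
static bool has_deadlock(const int waits_for[NTASKS])
{
	for (int start = 0; start < NTASKS; start++) {
		int t = start;

		/* Each task waits on at most one other, so NTASKS hops suffice. */
		for (int step = 0; step < NTASKS; step++) {
			t = waits_for[t];
			if (t == NO_WAIT)
				break;
			if (t == start)
				return true;	/* followed the chain back to start */
		}
	}
	return false;
}

/* Build the wait-for graph for quota disable vs. qgroup removal. */
static bool quota_disable_deadlocks(bool unlock_before_wait)
{
	int waits_for[NTASKS];

	/* Task A: btrfs_qgroup_wait_for_completion() waits on the rescan. */
	waits_for[TASK_A] = RESCAN;
	/* Rescan: the transaction commit waits for B's transaction handle. */
	waits_for[RESCAN] = TASK_B;
	/* Task B blocks on qgroup_ioctl_lock only while A still holds it. */
	waits_for[TASK_B] = unlock_before_wait ? NO_WAIT : TASK_A;

	return has_deadlock(waits_for);
}
```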
    
    Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
    CC: [email protected] # 5.4+
    Reviewed-by: Filipe Manana <[email protected]>
    Reviewed-by: Shin'ichiro Kawasaki <[email protected]>
    Signed-off-by: Sidong Yang <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

btrfs: subpage: fix a wrong check on subpage->writers

Author: Qu Wenruo <[email protected]>
Date:   Fri Feb 18 10:13:00 2022 +0800

    btrfs: subpage: fix a wrong check on subpage->writers
    
    commit c992fa1fd52380d0c4ced7b07479e877311ae645 upstream.
    
    [BUG]
    When looping btrfs/074 with 64K page size and 4K sectorsize, there is a
    low chance (1/50~1/100) of crashing with the following ASSERT() triggered
    in btrfs_subpage_start_writer():
    
            ret = atomic_add_return(nbits, &subpage->writers);
            ASSERT(ret == nbits); <<< This one <<<
    
    [CAUSE]
    Adding debug output for the parameters of btrfs_subpage_start_writer()
    reveals a very concerning error:
    
      ret=29 nbits=13 start=393216 len=53248
    
    @nbits is correct, but @ret, the value returned from atomic_add_return(),
    is not only larger than nbits but also larger than the maximum number of
    sectors per page (16 for 64K page size and 4K sector size).
    
    This indicates that some call sites are not properly decrementing the
    value.
    
    And that's exactly the case in btrfs_page_unlock_writer(). Because a
    page can be locked either by lock_page() or by process_one_page(), we
    have to check whether the subpage has any writers.
    
    If it has no writers, it was locked by lock_page() and we only need to
    unlock it.
    
    But unfortunately the check for writers is completely inverted:
    
            if (atomic_read(&subpage->writers))
                    /* No writers, locked by plain lock_page() */
                    return unlock_page(page);
    
    We directly unlock the page if it has writers, which is the exact
    opposite of what we want.
    
    Thankfully the affected call site is limited to
    extent_write_locked_range(), so the bug mostly affects compressed writes.
    
    [FIX]
    Correct the inverted check condition to fix the bug.
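
    A minimal userspace model of the counter leak, assuming a plain int in
    place of the atomic_t and hypothetical helper names: with the inverted
    check, a page that has writers takes the plain-unlock path, the counter
    is never decremented, and the next start_writer() returns more than
    nbits, which is exactly what the ASSERT() catches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a subpage's writer counter (not btrfs code). */
struct subpage_model {
	int writers;
};

/* Models atomic_add_return() in btrfs_subpage_start_writer(). */
static int start_writer(struct subpage_model *sp, int nbits)
{
	sp->writers += nbits;
	return sp->writers;
}

/* Models btrfs_page_unlock_writer(); `buggy` selects the inverted check. */
static void unlock_writer(struct subpage_model *sp, int nbits, bool buggy)
{
	bool has_writers = sp->writers != 0;
	bool plain_lock_page = buggy ? has_writers : !has_writers;

	if (plain_lock_page)
		return;			/* unlock_page() only: counter untouched */
	sp->writers -= nbits;		/* writer path: drop our bits */
}

/* Lock, unlock, and lock again; does the second ASSERT(ret == nbits) hold? */
static bool relock_passes_assert(bool buggy, int nbits)
{
	struct subpage_model sp = { .writers = 0 };

	start_writer(&sp, nbits);
	unlock_writer(&sp, nbits, buggy);
	return start_writer(&sp, nbits) == nbits;
}
```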
    
    Fixes: e55a0de18572 ("btrfs: rework page locking in __extent_writepage()")
    CC: [email protected] # 5.16
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8

Author: Vincent Mailhol <[email protected]>
Date:   Sat Feb 12 20:27:13 2022 +0900

    can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8
    
    [ Upstream commit f4896248e9025ff744b4147e6758274a1cb8cbae ]
    
    The driver uses an atomic_t variable: struct
    es58x_device::opened_channel_cnt to keep track of the number of opened
    channels in order to only allocate memory for the URBs when this count
    changes from zero to one.
    
    While the intent was to prevent race conditions, the choice of an
    atomic_t turns out to be a bad idea for several reasons:
    
    - the implementation is incorrect and fails to decrement
      opened_channel_cnt when the URB allocation fails, as reported in
      [1].
    
    - even if opened_channel_cnt were correctly decremented, atomic_t
      is insufficient to cover edge cases: there can be a race condition
      in which 1/ a first process fails to allocate URB memory and 2/ a
      second process enters es58x_open() before the first process does
      its cleanup and decrements opened_channel_cnt. In that case, the
      second process would return successfully even though the URB
      memory was never allocated.
    
    - actually, any kind of locking mechanism was useless here because
      it is redundant with the network stack big kernel lock
      (a.k.a. rtnl_lock), which is held by all the callers of
      net_device_ops:ndo_open() and net_device_ops:ndo_close(). Cf. the
      ASSERT_RTNL() calls in __dev_open() [2] and __dev_close_many()
      [3].
    
    The atomic_t is thus replaced by a simple u8 type, and the logic to
    increment and decrement es58x_device::opened_channel_cnt is simplified
    accordingly, which fixes the bug reported in [1]. We do not check
    ASSERT_RTNL() again, as this is already done by the callers.
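
    A sketch of the simplified open/close counting, under the assumption
    stated above that rtnl_lock serializes all callers. The struct and
    function names are hypothetical and the URB allocation is reduced to a
    flag; the point is that the counter is only incremented after the
    allocation succeeds, so a failed open leaves nothing to undo.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; callers are assumed to hold rtnl_lock. */
struct es58x_model {
	uint8_t opened_channel_cnt;
	bool urbs_allocated;
	bool fail_alloc;	/* knob to simulate URB allocation failure */
};

static int model_open(struct es58x_model *dev)
{
	if (dev->opened_channel_cnt == 0) {
		if (dev->fail_alloc)
			return -ENOMEM;	/* counter untouched: nothing to undo */
		dev->urbs_allocated = true;
	}
	dev->opened_channel_cnt++;
	return 0;
}

static void model_close(struct es58x_model *dev)
{
	dev->opened_channel_cnt--;
	if (dev->opened_channel_cnt == 0)
		dev->urbs_allocated = false;	/* last channel frees the URBs */
}
```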
    
    [1] https://lore.kernel.org/linux-can/20220201140351.GA2548@kili/T/#u
    [2] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1463
    [3] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1541
    
    Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
    Link: https://lore.kernel.org/all/[email protected]
    Reported-by: Dan Carpenter <[email protected]>
    Signed-off-by: Vincent Mailhol <[email protected]>
    Signed-off-by: Marc Kleine-Budde <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

can: gs_usb: change active_channels's type from atomic_t to u8

Author: Vincent Mailhol <[email protected]>
Date:   Tue Feb 15 08:48:14 2022 +0900

    can: gs_usb: change active_channels's type from atomic_t to u8
    
    commit 035b0fcf02707d3c9c2890dc1484b11aa5335eb1 upstream.
    
    The driver uses an atomic_t variable: gs_usb:active_channels to keep
    track of the number of opened channels in order to only allocate
    memory for the URBs when this count changes from zero to one.
    
    However, the driver does not decrement the counter when an error
    occurs in gs_can_open(). This issue is fixed by changing the type from
    atomic_t to u8 and by simplifying the logic accordingly.
    
    It is safe to use a u8 here because the network stack big kernel lock
    (a.k.a. rtnl_mutex) is held. For details, please refer to [1].
    
    [1] https://lore.kernel.org/linux-can/CAMZ6Rq+sHpiw34ijPsmp7vbUpDtJwvVtdV7CvRZJsLixjAFfrg@mail.gmail.com/T/#t
    
    Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices")
    Link: https://lore.kernel.org/all/[email protected]
    Signed-off-by: Vincent Mailhol <[email protected]>
    Signed-off-by: Marc Kleine-Budde <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

cifs: do not use uninitialized data in the owner/group sid

Author: Ronnie Sahlberg <[email protected]>
Date:   Sat Feb 12 08:16:20 2022 +1000

    cifs: do not use uninitialized data in the owner/group sid
    
    [ Upstream commit 26d3dadebbcbddfaf1d9caad42527a28a0ed28d8 ]
    
    When idsfromsid is used we create a special SID for owner/group.
    This structure must be initialized or else the first 5 bytes
    of the Authority field of the SID will contain uninitialized data
    and thus not be a valid SID.
    
    Signed-off-by: Ronnie Sahlberg <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

cifs: fix double free race when mount fails in cifs_get_root()

Author: Ronnie Sahlberg <[email protected]>
Date:   Fri Feb 11 02:59:15 2022 +1000

    cifs: fix double free race when mount fails in cifs_get_root()
    
    [ Upstream commit 3d6cc9898efdfb062efb74dc18cfc700e082f5d5 ]
    
    When cifs_get_root() fails during cifs_smb3_do_mount(), we call
    deactivate_locked_super(), which eventually calls delayed_free(), which
    frees the context.
    In this situation we should not proceed to the out: section in
    cifs_smb3_do_mount() and free the same resources a second time.
    
    [Thu Feb 10 12:59:06 2022] BUG: KASAN: use-after-free in rcu_cblist_dequeue+0x32/0x60
    [Thu Feb 10 12:59:06 2022] Read of size 8 at addr ffff888364f4d110 by task swapper/1/0
    
    [Thu Feb 10 12:59:06 2022] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE     5.17.0-rc3+ #4
    [Thu Feb 10 12:59:06 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
    [Thu Feb 10 12:59:06 2022] Call Trace:
    [Thu Feb 10 12:59:06 2022]  <IRQ>
    [Thu Feb 10 12:59:06 2022]  dump_stack_lvl+0x5d/0x78
    [Thu Feb 10 12:59:06 2022]  print_address_description.constprop.0+0x24/0x150
    [Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
    [Thu Feb 10 12:59:06 2022]  kasan_report.cold+0x7d/0x117
    [Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
    [Thu Feb 10 12:59:06 2022]  __asan_load8+0x86/0xa0
    [Thu Feb 10 12:59:06 2022]  rcu_cblist_dequeue+0x32/0x60
    [Thu Feb 10 12:59:06 2022]  rcu_core+0x547/0xca0
    [Thu Feb 10 12:59:06 2022]  ? call_rcu+0x3c0/0x3c0
    [Thu Feb 10 12:59:06 2022]  ? __this_cpu_preempt_check+0x13/0x20
    [Thu Feb 10 12:59:06 2022]  ? lock_is_held_type+0xea/0x140
    [Thu Feb 10 12:59:06 2022]  rcu_core_si+0xe/0x10
    [Thu Feb 10 12:59:06 2022]  __do_softirq+0x1d4/0x67b
    [Thu Feb 10 12:59:06 2022]  __irq_exit_rcu+0x100/0x150
    [Thu Feb 10 12:59:06 2022]  irq_exit_rcu+0xe/0x30
    [Thu Feb 10 12:59:06 2022]  sysvec_hyperv_stimer0+0x9d/0xc0
    ...
    [Thu Feb 10 12:59:07 2022] Freed by task 58179:
    [Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
    [Thu Feb 10 12:59:07 2022]  kasan_set_track+0x25/0x30
    [Thu Feb 10 12:59:07 2022]  kasan_set_free_info+0x24/0x40
    [Thu Feb 10 12:59:07 2022]  ____kasan_slab_free+0x137/0x170
    [Thu Feb 10 12:59:07 2022]  __kasan_slab_free+0x12/0x20
    [Thu Feb 10 12:59:07 2022]  slab_free_freelist_hook+0xb3/0x1d0
    [Thu Feb 10 12:59:07 2022]  kfree+0xcd/0x520
    [Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0x149/0xbe0 [cifs]
    [Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
    [Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
    [Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
    [Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
    [Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
    [Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    [Thu Feb 10 12:59:07 2022] Last potentially related work creation:
    [Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
    [Thu Feb 10 12:59:07 2022]  __kasan_record_aux_stack+0xb6/0xc0
    [Thu Feb 10 12:59:07 2022]  kasan_record_aux_stack_noalloc+0xb/0x10
    [Thu Feb 10 12:59:07 2022]  call_rcu+0x76/0x3c0
    [Thu Feb 10 12:59:07 2022]  cifs_umount+0xce/0xe0 [cifs]
    [Thu Feb 10 12:59:07 2022]  cifs_kill_sb+0xc8/0xe0 [cifs]
    [Thu Feb 10 12:59:07 2022]  deactivate_locked_super+0x5d/0xd0
    [Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0xab9/0xbe0 [cifs]
    [Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
    [Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
    [Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
    [Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
    [Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
    [Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae
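
    The ownership problem described above can be modelled by counting how
    often the context is released. This is a hypothetical sketch, not cifs
    code: model_deactivate_locked_super() merely stands in for the
    deactivate_locked_super() -> delayed_free() chain, and "fixed" means
    returning without falling through to the out: label.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: counts how often the mount context is released. */
struct ctx_model {
	int free_count;
};

static void release_ctx(struct ctx_model *ctx)
{
	ctx->free_count++;
}

/* Stands in for deactivate_locked_super() -> ... -> delayed_free(). */
static void model_deactivate_locked_super(struct ctx_model *ctx)
{
	release_ctx(ctx);
}

static int model_do_mount(struct ctx_model *ctx, bool get_root_fails,
			  bool fixed)
{
	if (get_root_fails) {
		model_deactivate_locked_super(ctx);
		if (fixed)
			return -1;	/* context already freed: skip out: */
		goto out;		/* buggy: shared cleanup frees it again */
	}
	return 0;
out:
	release_ctx(ctx);
	return -1;
}
```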
    
    Reported-by: Shyam Prasad N <[email protected]>
    Reviewed-by: Shyam Prasad N <[email protected]>
    Signed-off-by: Ronnie Sahlberg <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

cifs: modefromsids must add an ACE for authenticated users

Author: Ronnie Sahlberg <[email protected]>
Date:   Mon Feb 14 08:40:52 2022 +1000

    cifs: modefromsids must add an ACE for authenticated users
    
    [ Upstream commit 0c6f4ebf8835d01866eb686d47578cde80097981 ]
    
    When we create a file with modefromsids we set an ACL that
    has one ACE for the magic modefromsid as well as a second ACE that
    grants full access to all authenticated users.
    
    When we later change the mode on the file, set_chmod_dacl() strips away
    this ACE (and others) for authenticated users and then only adds
    back/updates the modefromsid ACE.
    This leaves the file with a single ACE that is for the mode and no ACE
    granting any user any rights to access the file.
    Fix this by always adding back the ACE for authenticated users as well,
    so that we do not drop the rights to access the file.
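
    A hypothetical model of the DACL rebuild, treating the ACL as a list of
    ACE kinds (the enum and helper names are illustrative assumptions, not
    cifs identifiers). Without the fix, chmod leaves only the mode ACE;
    with it, the authenticated-users ACE is restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a DACL as a list of ACE kinds (not cifs code). */
enum ace_kind { ACE_MODE_SID, ACE_AUTH_USERS, ACE_OTHER };

#define MAX_ACES 8

struct dacl_model {
	enum ace_kind aces[MAX_ACES];
	size_t n;
};

static void add_ace(struct dacl_model *d, enum ace_kind k)
{
	if (d->n < MAX_ACES)
		d->aces[d->n++] = k;
}

static bool has_ace(const struct dacl_model *d, enum ace_kind k)
{
	for (size_t i = 0; i < d->n; i++)
		if (d->aces[i] == k)
			return true;
	return false;
}

/* Models chmod with modefromsid: strip both special ACEs, then rebuild. */
static void model_chmod(struct dacl_model *d, bool fixed)
{
	size_t out = 0;

	for (size_t i = 0; i < d->n; i++)
		if (d->aces[i] != ACE_MODE_SID && d->aces[i] != ACE_AUTH_USERS)
			d->aces[out++] = d->aces[i];
	d->n = out;

	add_ace(d, ACE_MODE_SID);
	if (fixed)
		add_ace(d, ACE_AUTH_USERS);	/* keep access for authenticated users */
}
```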
    
    Signed-off-by: Ronnie Sahlberg <[email protected]>
    Reviewed-by: Shyam Prasad N <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

dmaengine: shdma: Fix runtime PM imbalance on error

Author: Yongzhi Liu <[email protected]>
Date:   Sat Jan 15 21:34:56 2022 -0800

    dmaengine: shdma: Fix runtime PM imbalance on error
    
    [ Upstream commit 455896c53d5b803733ddd84e1bf8a430644439b6 ]
    
    pm_runtime_get_() increments the runtime PM usage counter even
    when it returns an error code, thus a matching decrement is needed on
    the error handling path to keep the counter balanced.
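
    A minimal model of the imbalance, assuming an int counter and
    hypothetical helper names. It mirrors the documented behaviour of
    pm_runtime_get_sync(), which increments the usage counter even when it
    fails; a common way to balance that in kernel code is
    pm_runtime_put_noidle() on the error path, although this sketch does
    not claim that is the exact shdma change.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a device's runtime PM usage counter. */
struct rpm_model {
	int usage_count;
	bool resume_fails;
};

/* Like pm_runtime_get_sync(): the counter is bumped even on failure. */
static int model_get_sync(struct rpm_model *d)
{
	d->usage_count++;
	return d->resume_fails ? -EIO : 0;
}

/* Like pm_runtime_put_noidle() / pm_runtime_put(): drop the reference. */
static void model_put(struct rpm_model *d)
{
	d->usage_count--;
}

static int model_submit(struct rpm_model *d, bool fixed)
{
	int ret = model_get_sync(d);

	if (ret < 0) {
		if (fixed)
			model_put(d);	/* balance the failed get */
		return ret;
	}
	/* ... hardware access would happen here ... */
	model_put(d);
	return 0;
}
```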
    
    Signed-off-by: Yongzhi Liu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Vinod Koul <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: Reduce dmesg error to a debug print

Author: Leo (Hanghong) Ma <[email protected]>
Date:   Fri Nov 12 10:11:35 2021 -0500

    drm/amd/display: Reduce dmesg error to a debug print
    
    commit 1d925758ba1a5d2716a847903e2fd04efcbd9862 upstream.
    
    [Why & How]
    Dmesg errors are seen on dcn3.1 during the reset test, but they do not
    indicate a real failure. So reduce them to a debug print.
    
    Signed-off-by: Leo (Hanghong) Ma <[email protected]>
    Reviewed-by: Nicholas Kazlauskas <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: Mario Limonciello <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/amd/pm: correct UMD pstate clocks for Dimgrey Cavefish and Beige Goby

Author: Evan Quan <[email protected]>
Date:   Tue Jan 18 14:07:51 2022 +0800

    drm/amd/pm: correct UMD pstate clocks for Dimgrey Cavefish and Beige Goby
    
    [ Upstream commit 0136f5844b006e2286f873457c3fcba8c45a3735 ]
    
    Correct the UMD pstate profiling clocks for Dimgrey Cavefish and Beige
    Goby.
    
    Signed-off-by: Evan Quan <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: check vm ready by amdgpu_vm->evicting flag

Author: Qiang Yu <[email protected]>
Date:   Mon Feb 21 17:53:56 2022 +0800

    drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
    
    [ Upstream commit c1a66c3bc425ff93774fb2f6eefa67b83170dd7e ]
    
    Workstation application ANSA/META v21.1.4 gets this error in dmesg when
    running the CI test suite provided by ANSA/META:
    [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
    
    This is caused by:
    1. create a 256MB buffer in invisible VRAM
    2. CPU-map the buffer and access it, which causes a vm_fault that tries
       to move it to visible VRAM
    3. to force visible VRAM space, traverse all VRAM BOs to check whether
       evicting each BO is worthwhile
    4. when checking a VM BO (in invisible VRAM), amdgpu_vm_evictable()
       sets amdgpu_vm->evicting, but later, because the BO is not in
       visible VRAM, it is not actually evicted and so is not added to
       amdgpu_vm->evicted
    5. before the next CS clears amdgpu_vm->evicting, a user VM ops
       ioctl passes amdgpu_vm_ready() (which checks amdgpu_vm->evicted)
       but fails in amdgpu_vm_bo_update_mapping() (which checks
       amdgpu_vm->evicting) and produces this error log
    
    This error doesn't affect functionality, as the next CS will finish the
    pending VM ops. But we'd better clear the error log by checking the
    amdgpu_vm->evicting flag in amdgpu_vm_ready() so that
    amdgpu_vm_bo_update_mapping() is not called later.
    
    Another reason is that the amdgpu_vm->evicted list holds all BOs (both
    user buffers and page tables), but only the eviction of page table BOs
    prevents VM ops. The amdgpu_vm->evicting flag is set only for page
    table BOs, so we should use the evicting flag instead of the evicted
    list in amdgpu_vm_ready().
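
    A hypothetical model of the two readiness checks, reducing the VM state
    to a flag and a list length (not amdgpu code). In the step 5 situation
    (evicting set, evicted list empty) the old check reports ready while the
    new one does not; with only a user BO on the evicted list, the new check
    lets the VM op proceed, which is the side effect described below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state amdgpu_vm_ready() looks at. */
struct vm_model {
	bool evicting;		/* set only for page-table BOs */
	int evicted_count;	/* evicted list entries: user BOs and page tables */
};

/* Old check: ready when the evicted list is empty. */
static bool vm_ready_old(const struct vm_model *vm)
{
	return vm->evicted_count == 0;
}

/* New check: ready unless a page-table BO is being evicted. */
static bool vm_ready_new(const struct vm_model *vm)
{
	return !vm->evicting;
}
```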
    
    The side effect of this change is that a previously blocked VM op (user
    buffer in the "evicted" list but no page table in it) now completes
    immediately.
    
    v2: update commit comments.
    
    Acked-by: Paul Menzel <[email protected]>
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Qiang Yu <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdgpu: fix suspend/resume hang regression

Author: Qiang Yu <[email protected]>
Date:   Tue Mar 1 14:11:59 2022 +0800

    drm/amdgpu: fix suspend/resume hang regression
    
    [ Upstream commit f1ef17011c765495c876fa75435e59eecfdc1ee4 ]
    
    A regression has been reported: suspend/resume may hang with the
    previous VM ready check commit.
    
    So bring back the evicted list check as a temporary fix.
    
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922
    Fixes: c1a66c3bc425 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag")
    Reviewed-by: Christian König <[email protected]>
    Signed-off-by: Qiang Yu <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/bridge: ti-sn65dsi86: Properly undo autosuspend

Author: Douglas Anderson <[email protected]>
Date:   Tue Feb 22 14:18:43 2022 -0800

    drm/bridge: ti-sn65dsi86: Properly undo autosuspend
    
    [ Upstream commit 26d3474348293dc752c55fe6d41282199f73714c ]
    
    The PM Runtime docs say:
      Drivers in ->remove() callback should undo the runtime PM changes done
      in ->probe(). Usually this means calling pm_runtime_disable(),
      pm_runtime_dont_use_autosuspend() etc.
    
    We weren't doing that for autosuspend. Let's do it.
    
    Fixes: 9bede63127c6 ("drm/bridge: ti-sn65dsi86: Use pm_runtime autosuspend")
    Signed-off-by: Douglas Anderson <[email protected]>
    Reviewed-by: Linus Walleij <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/20220222141838.1.If784ba19e875e8ded4ec4931601ce6d255845245@changeid
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915/guc/slpc: Correct the param count for unset param

Author: Vinay Belgaumkar <[email protected]>
Date:   Wed Feb 16 10:15:04 2022 -0800

    drm/i915/guc/slpc: Correct the param count for unset param
    
    [ Upstream commit 1b279f6ad467535c3b8a66b4edefaca2cdd5bdc3 ]
    
    SLPC unset param H2G only needs one parameter - the id of the
    param.
    
    Fixes: 025cb07bebfa ("drm/i915/guc/slpc: Cache platform frequency limits")
    
    Suggested-by: Umesh Nerlige Ramappa <[email protected]>
    Signed-off-by: Vinay Belgaumkar <[email protected]>
    Reviewed-by: Umesh Nerlige Ramappa <[email protected]>
    Signed-off-by: Ramalingam C <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit 9648f1c3739505557d94ff749a4f32192ea81fe3)
    Signed-off-by: Tvrtko Ursulin <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915: s/JSP2/ICP2/ PCH

Author: Ville Syrjälä <[email protected]>
Date:   Thu Feb 24 15:21:42 2022 +0200

    drm/i915: s/JSP2/ICP2/ PCH
    
    commit 08783aa7693f55619859f4f63f384abf17cb58c5 upstream.
    
    This JSP2 PCH actually seems to be some special Apple
    specific ICP variant rather than a JSP. Make it so. Or at
    least all the references to it seem to be some Apple ICL
    machines. Didn't manage to find these PCI IDs in any
    public chipset docs unfortunately.
    
    The only thing we're losing here with this JSP->ICP change
    is Wa_14011294188, but based on the HSD that isn't actually
    needed on any ICP based design (including JSP), only TGP
    based stuff (including MCC) really need it. The documented
    w/a just never made that distinction because Windows didn't
    want to differentiate between JSP and MCC (not sure how
    they handle hpd/ddc/etc. then though...).
    
    Cc: [email protected]
    Cc: Matt Roper <[email protected]>
    Cc: Vivek Kasireddy <[email protected]>
    Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4226
    Fixes: 943682e3bd19 ("drm/i915: Introduce Jasper Lake PCH")
    Signed-off-by: Ville Syrjälä <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Acked-by: Vivek Kasireddy <[email protected]>
    Tested-by: Tomas Bzatek <[email protected]>
    (cherry picked from commit 53581504a8e216d435f114a4f2596ad0dfd902fc)
    Signed-off-by: Tvrtko Ursulin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

e1000e: Correct NVM checksum verification flow

Author: Sasha Neftin <[email protected]>
Date:   Thu Feb 3 14:21:49 2022 +0200

    e1000e: Correct NVM checksum verification flow
    
    commit ffd24fa2fcc76ecb2e61e7a4ef8588177bcb42a6 upstream.
    
    Update the MAC type check to e1000_pch_tgp, because NVM checksum
    update is still possible for e1000_pch_cnp.
    Emit a more detailed warning message.
    
    Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1191663
    Fixes: 4051f68318ca ("e1000e: Do not take care about recovery NVM checksum")
    Reported-by: Thomas Bogendoerfer <[email protected]>
    Signed-off-by: Sasha Neftin <[email protected]>
    Tested-by: Naama Meir <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

e1000e: Fix possible HW unit hang after an s0ix exit

Author: Sasha Neftin <[email protected]>
Date:   Tue Jan 25 19:31:23 2022 +0200

    e1000e: Fix possible HW unit hang after an s0ix exit
    
    [ Upstream commit 1866aa0d0d6492bc2f8d22d0df49abaccf50cddd ]
    
    Disable the OEM bit/Gig Disable/restart AN impact and disable the PHY
    LAN connected device (LCD) reset during power management flows. This
    fixes possible HW unit hangs on the s0ix exit on some corporate ADL
    platforms.
    
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
    Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix")
    Suggested-by: Dima Ruinskiy <[email protected]>
    Suggested-by: Nir Efrati <[email protected]>
    Signed-off-by: Sasha Neftin <[email protected]>
    Tested-by: Kai-Heng Feng <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

efivars: Respect "block" flag in efivar_entry_set_safe()

Author: Jann Horn <[email protected]>
Date:   Fri Feb 18 19:05:59 2022 +0100

    efivars: Respect "block" flag in efivar_entry_set_safe()
    
    commit 258dd902022cb10c83671176688074879517fd21 upstream.
    
    When the "block" flag is false, the old code would sometimes still call
    check_var_size(), which wrongly tells ->query_variable_store() that it can
    block.
    
    As far as I can tell, this can't really materialize as a bug at the moment,
    because ->query_variable_store only does something on X86 with generic EFI,
    and in that configuration we always take the efivar_entry_set_nonblocking()
    path.
    
    Fixes: ca0e30dcaa53 ("efi: Add nonblocking option to efi_query_variable_store()")
    Signed-off-by: Jann Horn <[email protected]>
    Signed-off-by: Ard Biesheuvel <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

exfat: fix i_blocks for files truncated over 4 GiB

Author: Christophe Vu-Brugier <[email protected]>
Date:   Mon Nov 22 22:02:37 2021 +0900

    exfat: fix i_blocks for files truncated over 4 GiB
    
    [ Upstream commit 92fba084b79e6bc7b12fc118209f1922c1a2df56 ]
    
    In exfat_truncate(), the computation of inode->i_blocks is wrong if
    the file is larger than 4 GiB because a 32-bit variable is used as a
    mask. This is fixed and simplified by using round_up().
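
    The truncation can be reproduced in userspace. The sketch assumes a
    32 KiB cluster and 512-byte i_blocks units (blkbits = 9) purely for
    illustration, and round_up_pow2() is a local stand-in for the kernel's
    round_up(), which likewise computes its mask in the type of x. For a
    4 GiB + 1 byte file the buggy form yields 64 blocks (as if the file
    were 32 KiB) while the round-up form yields 8388672.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy pattern: ~(cluster_size - 1) is a 32-bit mask, zero-extended when
 * combined with the 64-bit size, so bits at and above 4 GiB are cleared. */
static uint64_t i_blocks_buggy(int64_t i_size, uint32_t cluster_size,
			       unsigned int blkbits)
{
	return (uint64_t)(((i_size + (cluster_size - 1)) &
			   ~(cluster_size - 1)) >> blkbits);
}

/* Local stand-in for the kernel's round_up(); y must be a power of two.
 * The mask is computed in 64 bits, so no high bits are lost. */
static int64_t round_up_pow2(int64_t x, int64_t y)
{
	return ((x - 1) | (y - 1)) + 1;
}

static uint64_t i_blocks_fixed(int64_t i_size, uint32_t cluster_size,
			       unsigned int blkbits)
{
	return (uint64_t)(round_up_pow2(i_size, cluster_size) >> blkbits);
}
```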
    
    Also fix the same buggy computation in exfat_read_root(), and simplify
    another (already correct) one in exfat_fill_inode(). The latter was fixed
    another way last month but can be simplified by using round_up() as
    well. See:
    
      commit 0c336d6e33f4 ("exfat: fix incorrect loading of i_blocks for
                            large files")
    
    Fixes: 98d917047e8b ("exfat: add file operations")
    Cc: [email protected] # v5.7+
    Suggested-by: Matthew Wilcox <[email protected]>
    Reviewed-by: Sungjong Seo <[email protected]>
    Signed-off-by: Christophe Vu-Brugier <[email protected]>
    Signed-off-by: Namjae Jeon <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

exfat: reuse exfat_inode_info variable instead of calling EXFAT_I()

Author: Christophe Vu-Brugier <[email protected]>
Date:   Tue Nov 2 22:23:58 2021 +0100

    exfat: reuse exfat_inode_info variable instead of calling EXFAT_I()
    
    [ Upstream commit 7dee6f57d7f22a89dd214518c778aec448270d4c ]
    
    Also add a local "struct exfat_inode_info *ei" variable to
    exfat_truncate() to simplify the code.
    
    Signed-off-by: Christophe Vu-Brugier <[email protected]>
    Signed-off-by: Namjae Jeon <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ext4: drop ineligible txn start stop APIs

Author: Harshad Shirwadkar <[email protected]>
Date:   Thu Dec 23 12:21:38 2021 -0800

    ext4: drop ineligible txn start stop APIs
    
    [ Upstream commit 7bbbe241ec7ce0def9f71464c878fdbd2b0dcf37 ]
    
    This patch drops the ext4_fc_start_ineligible() and
    ext4_fc_stop_ineligible() APIs. Fast commit ineligible transactions
    should simply call ext4_fc_mark_ineligible() after starting the
    transaction.
    
    Signed-off-by: Harshad Shirwadkar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ext4: fast commit may miss file actions

Author: Xin Yin <[email protected]>
Date:   Mon Jan 17 17:36:55 2022 +0800

    ext4: fast commit may miss file actions
    
    [ Upstream commit bdc8a53a6f2f0b1cb5f991440f2100732299eb93 ]
    
    Consider the following scenario:
    1. jbd starts transaction n
    2. task A gets a new handle for transaction n+1
    3. task A does some actions and adds the inode to the FC_Q_MAIN fc_q
    4. jbd completes transaction n and clears the FC_Q_MAIN fc_q
    5. task A calls fsync
    
    The fast commit will lose the file actions made during the full commit.
    
    We should also add updates to the staging queue during a full commit.
    And in ext4_fc_cleanup(), when resetting an inode's fc track range,
    check its i_sync_tid; if it is bigger than the current transaction tid,
    do not reset it, or we will lose the track range.
    
    And EXT4_MF_FC_COMMITTING is not needed anymore, so drop it.
    
    Signed-off-by: Xin Yin <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

ext4: fast commit may not fallback for ineligible commit

Author: Xin Yin <[email protected]>
Date:   Mon Jan 17 17:36:54 2022 +0800

    ext4: fast commit may not fallback for ineligible commit
    
    [ Upstream commit e85c81ba8859a4c839bcd69c5d83b32954133a5b ]
    
    Consider the following scenario:
    1. jbd starts committing transaction n
    2. task A gets a new handle for transaction n+1
    3. task A does some ineligible actions and marks FC_INELIGIBLE
    4. jbd completes transaction n and clears FC_INELIGIBLE
    5. task A calls fsync
    
    In this case the fast commit will not fall back to a full commit, and
    transaction n+1 is not handled by jbd either.
    
    Make ext4_fc_mark_ineligible() also record the transaction tid for the
    latest ineligible case. When ext4_fc_cleanup() is called, check the
    current transaction tid; if it is smaller than the latest ineligible
    tid, do not clear EXT4_MF_FC_INELIGIBLE.
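
    A hypothetical model of the tid check, with tid wraparound ignored and
    made-up names (the real flag is EXT4_MF_FC_INELIGIBLE). Marking a
    handle in transaction n+1 and then cleaning up after commit n clears
    the flag in the old logic but keeps it in the fixed one, so the next
    fsync still falls back to a full commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tid_model_t;	/* tid wraparound is ignored in this model */

/* Hypothetical model of the fast-commit ineligibility state. */
struct fc_model {
	bool ineligible;		/* stands in for EXT4_MF_FC_INELIGIBLE */
	tid_model_t ineligible_tid;	/* tid of the latest ineligible handle */
};

static void model_mark_ineligible(struct fc_model *s, tid_model_t handle_tid)
{
	s->ineligible = true;
	s->ineligible_tid = handle_tid;
}

/* Called when transaction committed_tid completes. */
static void model_fc_cleanup(struct fc_model *s, tid_model_t committed_tid,
			     bool fixed)
{
	/* Fixed: keep the flag if it was set for a later transaction. */
	if (fixed && committed_tid < s->ineligible_tid)
		return;
	s->ineligible = false;
}
```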
    
    Reported-by: kernel test robot <[email protected]>
    Reported-by: Dan Carpenter <[email protected]>
    Reported-by: Ritesh Harjani <[email protected]>
    Suggested-by: Harshad Shirwadkar <[email protected]>
    Signed-off-by: Xin Yin <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Cc: [email protected]
    Signed-off-by: Sasha Levin <[email protected]>

ext4: simplify updating of fast commit stats

Author: Harshad Shirwadkar <[email protected]>
Date:   Thu Dec 23 12:21:39 2021 -0800

    ext4: simplify updating of fast commit stats
    
    [ Upstream commit 0915e464cb274648e1ef1663e1356e53ff400983 ]
    
    Move fast commit stats updating logic to a separate function from
    ext4_fc_commit(). This significantly improves readability of
    ext4_fc_commit().
    
    Signed-off-by: Harshad Shirwadkar <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

firmware: arm_scmi: Remove space in MODULE_ALIAS name

Author: Alyssa Ross <[email protected]>
Date:   Fri Feb 11 10:27:04 2022 +0000

    firmware: arm_scmi: Remove space in MODULE_ALIAS name
    
    commit 1ba603f56568c3b4c2542dfba07afa25f21dcff3 upstream.
    
    modprobe can't handle spaces in aliases. Remove the space to fix the
    issue.
    
    Link: https://lore.kernel.org/r/[email protected]
    Fixes: aa4f886f3893 ("firmware: arm_scmi: add basic driver infrastructure for SCMI")
    Reviewed-by: Cristian Marussi <[email protected]>
    Signed-off-by: Alyssa Ross <[email protected]>
    Signed-off-by: Sudeep Holla <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

HID: add mapping for KEY_ALL_APPLICATIONS

Author: William Mahon <[email protected]>
Date:   Thu Mar 3 18:26:22 2022 -0800

    HID: add mapping for KEY_ALL_APPLICATIONS
    
    commit 327b89f0acc4c20a06ed59e4d9af7f6d804dc2e2 upstream.
    
    This patch adds a new key definition for KEY_ALL_APPLICATIONS
    and aliases KEY_DASHBOARD to it.
    
    It also maps the 0x0c/0x2a2 usage code to KEY_ALL_APPLICATIONS.
    
    Signed-off-by: William Mahon <[email protected]>
    Acked-by: Benjamin Tissoires <[email protected]>
    Link: https://lore.kernel.org/r/20220303035618.1.I3a7746ad05d270161a18334ae06e3b6db1a1d339@changeid
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

HID: add mapping for KEY_DICTATE

Author: William Mahon <[email protected]>
Date:   Thu Mar 3 18:23:42 2022 -0800

    HID: add mapping for KEY_DICTATE
    
    commit bfa26ba343c727e055223be04e08f2ebdd43c293 upstream.
    
    Numerous keyboards now have dictate keys, which allow text
    messages to be dictated through a microphone.
    
    This patch adds a new key definition KEY_DICTATE and maps the 0x0c/0x0d8
    usage code to this new keycode. Additionally, hid-debug is adjusted to
    recognize this new usage code as well.
    
    Signed-off-by: William Mahon <[email protected]>
    Acked-by: Benjamin Tissoires <[email protected]>
    Link: https://lore.kernel.org/r/20220303021501.1.I5dbf50eb1a7a6734ee727bda4a8573358c6d3ec0@changeid
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

HID: amd_sfh: Add functionality to clear interrupts

Author: Basavaraj Natikar <[email protected]>
Date:   Tue Feb 8 17:51:11 2022 +0530

    HID: amd_sfh: Add functionality to clear interrupts
    
    [ Upstream commit fb75a3791a8032848c987db29b622878d8fe2b1c ]
    
    Newer AMD platforms with SFH may generate unwarranted interrupts on
    some events. Until these are cleared, the actual MP2 data processing
    may be stalled in some cases.
    
    Add a mechanism to clear the pending interrupts (if any) during the
    driver initialization and sensor command operations.
    
    Signed-off-by: Basavaraj Natikar <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

HID: amd_sfh: Add interrupt handler to process interrupts

Author: Basavaraj Natikar <[email protected]>
Date:   Tue Feb 8 17:51:12 2022 +0530

    HID: amd_sfh: Add interrupt handler to process interrupts
    
    [ Upstream commit 7f016b35ca7623c71b31facdde080e8ce171a697 ]
    
    On newer AMD platforms with SFH, it is observed that random interrupts
    get generated on the SFH hardware, and until they are cleared the firmware
    sensor processing is stalled, resulting in no data being received on the
    driver side.
    
    Add routines to handle these interrupts, so that firmware operations are
    not stalled.
    
    Signed-off-by: Basavaraj Natikar <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

HID: amd_sfh: Handle amd_sfh work buffer in PM ops

Author: Basavaraj Natikar <[email protected]>
Date:   Tue Feb 8 17:51:08 2022 +0530

    HID: amd_sfh: Handle amd_sfh work buffer in PM ops
    
    [ Upstream commit 0cf74235f4403b760a37f77271d2ca3424001ff9 ]
    
    Since in the current amd_sfh design the sensor data is periodically
    obtained in the form of poll data, scheduling delayed work during the
    suspend/resume cycle adds no value.
    
    So, cancel the work on suspend and restart it on resume.
    
    Signed-off-by: Basavaraj Natikar <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i2c: bcm2835: Avoid clock stretching timeouts

Author: Eric Anholt <[email protected]>
Date:   Fri Feb 23 22:42:31 2018 +0100

    i2c: bcm2835: Avoid clock stretching timeouts
    
    [ Upstream commit 9495b9b31abe525ebd93da58de2c88b9f66d3a0e ]
    
    At power-on the CLKT register contains 0x40, which at our typical 100 kHz
    bus rate means 0.64 ms. But there is no specified limit to how long devices
    should be able to stretch the clocks, so just disable the timeout. We
    still have a timeout wrapping the entire transfer.
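    The 0.64 ms figure follows if the CLKT value is read as a number of SCL
    clock cycles: 0x40 is 64 cycles, and each cycle lasts 10 µs at 100 kHz.
    The small helper below reproduces that arithmetic. The name
    clkt_timeout_us() is hypothetical, not a driver function.

```c
#include <assert.h>

/*
 * Sketch: treat the CLKT value as a count of SCL cycles, so the timeout
 * lasts cycles / bus rate. Returns whole microseconds.
 */
static unsigned long clkt_timeout_us(unsigned long clkt_cycles,
				     unsigned long bus_hz)
{
	return clkt_cycles * 1000000UL / bus_hz;
}
```

    At 400 kHz the same 0x40 would allow only 160 µs of clock stretching.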
    
    Signed-off-by: Eric Anholt <[email protected]>
    Signed-off-by: Stefan Wahren <[email protected]>
    BugLink: https://github.com/raspberrypi/linux/issues/3064
    Signed-off-by: Wolfram Sang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i2c: cadence: allow COMPILE_TEST

Author: Wolfram Sang <[email protected]>
Date:   Sat Feb 12 20:45:48 2022 +0100

    i2c: cadence: allow COMPILE_TEST
    
    [ Upstream commit 0b0dcb3882c8f08bdeafa03adb4487e104d26050 ]
    
    Driver builds fine with COMPILE_TEST. Enable it for wider test coverage
    and easier maintenance.
    
    Signed-off-by: Wolfram Sang <[email protected]>
    Acked-by: Michal Simek <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i2c: imx: allow COMPILE_TEST

Author: Wolfram Sang <[email protected]>
Date:   Sat Feb 12 20:46:57 2022 +0100

    i2c: imx: allow COMPILE_TEST
    
    [ Upstream commit 2ce4462f2724d1b3cedccea441c6d18bb360629a ]
    
    Driver builds fine with COMPILE_TEST. Enable it for wider test coverage
    and easier maintenance.
    
    Signed-off-by: Wolfram Sang <[email protected]>
    Acked-by: Oleksij Rempel <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i2c: qup: allow COMPILE_TEST

Author: Wolfram Sang <[email protected]>
Date:   Sat Feb 12 20:47:07 2022 +0100

    i2c: qup: allow COMPILE_TEST
    
    [ Upstream commit 5de717974005fcad2502281e9f82e139ca91f4bb ]
    
    Driver builds fine with COMPILE_TEST. Enable it for wider test coverage
    and easier maintenance.
    
    Signed-off-by: Wolfram Sang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Add trace while removing device

Author: Jedrzej Jagielski <[email protected]>
Date:   Tue Jun 22 15:43:48 2021 +0200

    iavf: Add trace while removing device
    
    [ Upstream commit bdb9e5c7aec73a7b8b5acab37587b6de1203e68d ]
    
    Add a kernel log message when the device is removed.
    Currently there is no such information: e.g. when the host admin
    removes a PCI device from a VM, the VM should log the event.
    
    This patch adds an info log to the iavf_remove function.
    
    Signed-off-by: Arkadiusz Kubalewski <[email protected]>
    Signed-off-by: Jedrzej Jagielski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Add waiting so the port is initialized in remove

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:36:56 2022 +0100

    iavf: Add waiting so the port is initialized in remove
    
    [ Upstream commit 974578017fc1fdd06cea8afb9dfa32602e8529ed ]
    
    Races exist when the port is being configured and remove is
    triggered.
    
    unregister_netdev is not, and can't be, called under the crit_lock
    mutex since it calls ndo_stop -> iavf_close, which requires
    this lock. Depending on the init state, the netdev could still be
    unregistered, so unregister_netdev never cleans up, while shortly
    after that the device could become registered.
    
    Make iavf_remove wait until the port finishes initialization.
    All critical state changes are atomic (under crit_lock).
    Crashes that come from iavf_reset_interrupt_capability and
    iavf_free_traffic_irqs should now be solved in a graceful
    manner.
    
    Fixes: 605ca7c5c6707 ("iavf: Fix kernel BUG in free_msi_irqs")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Fix __IAVF_RESETTING state usage

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:38:55 2022 +0100

    iavf: Fix __IAVF_RESETTING state usage
    
    [ Upstream commit 14756b2ae265d526b8356e86729090b01778fdf6 ]
    
    Setting the __IAVF_RESETTING state in the watchdog task had no
    effect and could lead to slow resets in the driver, as the
    watchdog handling of the __IAVF_RESETTING state only requeues the
    watchdog. Until now, __IAVF_RESETTING was interpreted by the reset
    task as the running state, which could lead to errors in resource
    allocation and disposal.
    
    Make watchdog_task queue the reset task when it's necessary.
    Do not update the state to __IAVF_RESETTING, so that the reset task
    knows exactly what the current state of the adapter is.
    
    Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Fix deadlock in iavf_reset_task

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:38:31 2022 +0100

    iavf: Fix deadlock in iavf_reset_task
    
    commit e85ff9c631e1bf109ce8428848dfc8e8b0041f48 upstream.
    
    A mutex_unlock call on crit_lock is missing in the
    iavf_reset_task call path.
    
    Unlock the crit_lock before returning from reset task.
    
    Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iavf: Fix init state closure on remove

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:37:10 2022 +0100

    iavf: Fix init state closure on remove
    
    [ Upstream commit 3ccd54ef44ebfa0792c5441b6d9c86618f3378d1 ]
    
    While the adapter's init states are running, errors such as lack
    of communication with the PF might occur. If that happens, the
    driver restores previous states in order to retry initialization
    properly. When the remove task kicks in, this situation could
    lead to races with unregistering the netdevice as well as with
    resource cleanup. With the commit that makes remove wait for
    init to complete, this problem turns into endless waiting if
    init never recovers from errors.
    
    Introduce __IAVF_IN_REMOVE_TASK bit to indicate that the
    remove thread has started.
    
    Make __IAVF_COMM_FAILED adapter state respect the
    __IAVF_IN_REMOVE_TASK bit and set the __IAVF_INIT_FAILED
    state and return without any action instead of trying to
    recover.
    
    Make __IAVF_INIT_FAILED adapter state respect the
    __IAVF_IN_REMOVE_TASK bit and return without any further
    actions.
    
    Make the loop in the remove handler break when the adapter has
    the __IAVF_INIT_FAILED state set.
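    The interplay of __IAVF_IN_REMOVE_TASK, __IAVF_COMM_FAILED and
    __IAVF_INIT_FAILED described above can be illustrated with a heavily
    reduced state machine. This is a hypothetical model, not the iavf code.
    It rests on simplifying assumptions: one watchdog step per loop
    iteration, a bounded wait instead of sleeping, and invented state names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-ins for the iavf init states. */
enum sketch_state { S_STARTUP, S_COMM_FAILED, S_INIT_FAILED, S_RUNNING };

struct sketch_adapter {
	enum sketch_state state;
	bool in_remove;		/* plays the role of __IAVF_IN_REMOVE_TASK */
	bool pf_responds;	/* whether init can make progress at all */
};

/* One pass of the (simplified) watchdog state machine. */
static void sketch_watchdog_step(struct sketch_adapter *a)
{
	switch (a->state) {
	case S_STARTUP:
		a->state = a->pf_responds ? S_RUNNING : S_COMM_FAILED;
		break;
	case S_COMM_FAILED:
		/* Remove has started: give up instead of retrying init. */
		a->state = a->in_remove ? S_INIT_FAILED : S_STARTUP;
		break;
	case S_INIT_FAILED:
		/* During remove, take no further action. */
		if (!a->in_remove)
			a->state = S_STARTUP;
		break;
	case S_RUNNING:
		break;
	}
}

/* Remove waits for init to settle; bounded so the sketch terminates. */
static bool sketch_remove(struct sketch_adapter *a, int max_steps)
{
	a->in_remove = true;
	for (int i = 0; i < max_steps; i++) {
		if (a->state == S_RUNNING || a->state == S_INIT_FAILED)
			return true;	/* loop breaks: safe to tear down */
		sketch_watchdog_step(a);
	}
	return false;	/* an unbounded loop would still be waiting */
}
```

    Without the in_remove flag, an adapter whose PF never answers keeps
    bouncing between S_STARTUP and S_COMM_FAILED. An unbounded remove loop
    would therefore never exit.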
    
    Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:37:50 2022 +0100

    iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS
    
    [ Upstream commit 0579fafd37fb7efe091f0e6c8ccf968864f40f3e ]
    
    iavf_virtchnl_completion is called under crit_lock but when
    the code for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is called,
    this lock is released in order to obtain rtnl_lock to avoid
    ABBA deadlock with unregister_netdev.
    
    Given the new way iavf_remove behaves, there are many risks
    related to releasing the lock and attempting to re-grab it. The
    driver faces crashes related to races between unregister_netdev
    and netdev_update_features. Yet another risk is that the driver
    could already have obtained the crit_lock in order to destroy it,
    and iavf_virtchnl_completion could crash or block forever.
    
    Make iavf_virtchnl_completion never relock crit_lock in its
    call paths.
    
    Extract rtnl_lock locking logic to the driver for
    unregister_netdev in order to set the netdev_registered flag
    inside the lock.
    
    Introduce a new flag that will inform adminq_task to perform
    the code from VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS right after
    it finishes processing messages. Guard this code with remove
    flags so it's never called when the driver is in remove state.
    
    Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Fix missing check for running netdev

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:38:43 2022 +0100

    iavf: Fix missing check for running netdev
    
    commit d2c0f45fcceb0995f208c441d9c9a453623f9ccf upstream.
    
    The driver was queueing reset_task regardless of the netdev
    state.
    
    Do not queue the reset task in iavf_change_mtu if netdev
    is not running.
    
    Fixes: fdd4044ffdc8 ("iavf: Remove timer for work triggering, use delaying work instead")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iavf: Fix race in init state

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:38:01 2022 +0100

    iavf: Fix race in init state
    
    [ Upstream commit a472eb5cbaebb5774672c565e024336c039e9128 ]
    
    When iavf_init_version_check sends the VIRTCHNL_OP_GET_VF_RESOURCES
    message, the driver waits for the response after requeueing
    the watchdog task in the iavf_init_get_resources call stack. The
    logic is implemented such that iavf_init_get_resources has
    to be called in order to allocate adapter->vf_res. It polls
    for the AQ response in the iavf_get_vf_config function. If the
    adminq_task worker handles this message first, expect a call
    trace from the kernel, because adapter->vf_res will be NULL in
    iavf_virtchnl_completion.
    
    Make the watchdog task not queue the adminq_task if the init
    process is not finished yet.
    
    Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iavf: Rework mutexes for better synchronisation

Author: Slawomir Laba <[email protected]>
Date:   Wed Feb 23 13:35:49 2022 +0100

    iavf: Rework mutexes for better synchronisation
    
    [ Upstream commit fc2e6b3b132a907378f6af08356b105a4139c4fb ]
    
    The driver used to crash in multiple spots when put to stress testing
    of the init, reset and remove paths.
    
    The user would experience call traces or hangs when creating,
    resetting or removing VFs. Depending on the machine, the call traces
    happen in random spots, e.g. reset restoring resources racing
    with driver remove.
    
    Make adapter->crit_lock mutex a mandatory lock for guarding the
    operations performed on all workqueues and functions dealing with
    resource allocation and disposal.
    
    Make __IAVF_REMOVE a final state of the driver, respected by
    workqueues, which shall not requeue when they fail to obtain the
    crit_lock.
    
    Make the IRQ handler not queue new work for adminq_task
    when the __IAVF_REMOVE state is set.
    
    Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
    Signed-off-by: Slawomir Laba <[email protected]>
    Signed-off-by: Phani Burra <[email protected]>
    Signed-off-by: Jacob Keller <[email protected]>
    Signed-off-by: Mateusz Palczewski <[email protected]>
    Tested-by: Konrad Jankowski <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: Allow queueing resets during probe

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:58 2022 -0800

    ibmvnic: Allow queueing resets during probe
    
    [ Upstream commit fd98693cb0721317f27341951593712c580c36a1 ]
    
    We currently don't allow queueing resets when the adapter is in the
    VNIC_PROBING state; instead we throw away the reset and return EBUSY.
    The reasoning is probably that during ibmvnic_probe() the ibmvnic_adapter
    itself is being initialized, so performing a reset during this time can
    lead us to access fields in the ibmvnic_adapter that are not fully
    initialized. A review of the code shows that all the adapter state needed
    to process a reset is initialized before registering the CRQ, so that
    should no longer be a concern.
    
    Further, the expectation is that if we do get a reset (transport event)
    during probe, the do..while() loop in ibmvnic_probe() will handle this
    by reinitializing the CRQ.
    
    While that is true to some extent, it is possible that the reset might
    occur _after_ the CRQ is registered and CRQ_INIT message was exchanged
    but _before_ the adapter state is set to VNIC_PROBED. As mentioned above,
    such a reset will be thrown away. While the client assumes that the
    adapter is functional, the vnic server will wait for the client to reinit
    the adapter. This disconnect between the two leaves the adapter down,
    needing manual intervention.
    
    Because ibmvnic_probe() has other work to do after initializing the CRQ
    (such as registering the netdev at a minimum) and because the reset event
    can occur at any instant after the CRQ is initialized, there will always
    be a window between initializing the CRQ and considering the adapter
    ready for resets (i.e. state == PROBED).
    
    So rather than discarding resets during this window, allow queueing them,
    but only process them after the adapter is fully initialized.
    
    To do this, introduce a new completion state ->probe_done and have the
    reset worker thread wait on this before processing resets.
    
    This change brings up two new situations in or just after ibmvnic_probe().
    First, after one or more resets were queued, we may encounter an error and
    decide to retry the initialization. At that point the queued resets are
    no longer relevant since we could be talking to a new vnic server. So we
    must purge/flush the queued resets before restarting the initialization.
    As a side note, since we are still in the probing stage and we have not
    registered the netdev, it will not be a CHANGE_PARAM reset.
    
    Second, this change opens up a potential race between the worker thread
    in __ibmvnic_reset(), the tasklet and the ibmvnic_open() due to the
    following sequence of events:
    
            1. Register CRQ
            2. Get transport event before CRQ_INIT completes.
            3. Tasklet schedules reset:
                    a) add rwi to list
                    b) schedule_work() to start worker thread which runs
                       and waits for ->probe_done.
            4. ibmvnic_probe() decides to retry, purges rwi_list
            5. Re-register crq and this time rest of probe succeeds - register
               netdev and complete(->probe_done).
            6. Worker thread resumes in __ibmvnic_reset() from 3b.
            7. Worker thread sets ->resetting bit
            8. ibmvnic_open() comes in, notices ->resetting bit, sets state
               to IBMVNIC_OPEN and returns early expecting worker thread to
               finish the open.
            9. Worker thread finds rwi_list empty and returns without
               opening the interface.
    
    If this happens, the ->ndo_open() call is effectively lost and the
    interface remains down. To address this, ensure that ->rwi_list is
    not empty before setting the ->resetting bit. See also comments in
    __ibmvnic_reset().
    
    Fixes: 6a2fb0e99f9c ("ibmvnic: driver initialization for kdump/kexec")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: clear fop when retrying probe

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:57 2022 -0800

    ibmvnic: clear fop when retrying probe
    
    [ Upstream commit f628ad531b4f34fdba0984255b4a2850dd369513 ]
    
    Clear the ->failover_pending flag that may have been set in the previous
    pass of registering the CRQ. If we don't clear it, a subsequent
    ibmvnic_open() call would be misled into thinking a failover is pending
    and assuming that the reset worker thread would open the adapter. If this
    pass of registering the CRQ succeeds (i.e. there is no transport event),
    there wouldn't be a reset worker thread.
    
    This would leave the adapter unconfigured and require manual intervention
    to bring it up during boot.
    
    Fixes: 5a18e1e0c193 ("ibmvnic: Fix failover case for non-redundant configuration")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: complete init_done on transport events

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:54 2022 -0800

    ibmvnic: complete init_done on transport events
    
    [ Upstream commit 36491f2df9ad2501e5a4ec25d3d95d72bafd2781 ]
    
    If we get a transport event, set the error and mark the init as
    complete so that attempts to send crq-init or login fail sooner
    rather than waiting for the timeout.
    
    Fixes: bbd669a868bb ("ibmvnic: Fix completion structure initialization")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: define flush_reset_queue helper

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:53 2022 -0800

    ibmvnic: define flush_reset_queue helper
    
    [ Upstream commit 83da53f7e4bd86dca4b2edc1e2bb324fb3c033a1 ]
    
    Define and use a helper to flush the reset queue.
    
    Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: free reset-work-item when flushing

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:51 2022 -0800

    ibmvnic: free reset-work-item when flushing
    
    commit 8d0657f39f487d904fca713e0bc39c2707382553 upstream.
    
    Fix a tiny memory leak when flushing the reset work queue.
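    The leak comes from unlinking queued entries without freeing them. The
    simplified sketch below flushes by doing both. It uses a singly linked
    list in place of the driver's list_head and omits all locking.
    struct sketch_rwi and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reset work item (the driver keeps these on a list_head). */
struct sketch_rwi {
	int reset_reason;
	struct sketch_rwi *next;
};

/* Push a request onto the queue; returns 0, or -1 if allocation fails. */
static int sketch_queue_reset(struct sketch_rwi **head, int reason)
{
	struct sketch_rwi *rwi = malloc(sizeof(*rwi));

	if (!rwi)
		return -1;
	rwi->reset_reason = reason;
	rwi->next = *head;
	*head = rwi;
	return 0;
}

/* Unlink AND free every entry; unlinking alone would leak each item. */
static size_t sketch_flush_reset_queue(struct sketch_rwi **head)
{
	size_t freed = 0;

	while (*head) {
		struct sketch_rwi *rwi = *head;

		*head = rwi->next;
		free(rwi);
		freed++;
	}
	return freed;
}
```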
    
    Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ibmvnic: init init_done_rc earlier

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:56 2022 -0800

    ibmvnic: init init_done_rc earlier
    
    [ Upstream commit ae16bf15374d8b055e040ac6f3f1147ab1c9bb7d ]
    
    We currently initialize the ->init_done completion/return code fields
    before issuing a CRQ_INIT command. But if we get a transport event soon
    after registering the CRQ, the tasklet may already have recorded the
    completion and error code. If we initialize the fields at that point, we
    might overwrite/lose that and end up issuing the CRQ_INIT only to time out
    later.
    
    If that timeout happens during probe, we will leave the adapter in the
    DOWN state rather than retrying to register/init the CRQ.
    
    Initialize the completion before registering the CRQ so we don't lose
    the notification.
    
    Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: initialize rc before completing wait

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:52 2022 -0800

    ibmvnic: initialize rc before completing wait
    
    [ Upstream commit 765559b10ce514eb1576595834f23cdc92125fee ]
    
    We should initialize ->init_done_rc before calling complete(). Otherwise
    the waiting thread may see ->init_done_rc as 0 before we have updated it
    and may assume that the CRQ was successful.
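    The ordering rule can be shown with a userspace stand-in for a kernel
    completion built on POSIX threads. This is a sketch under assumptions.
    struct sketch_completion, struct sketch_vnic and the helpers are all
    hypothetical, and the kernel's complete() provides its own ordering
    guarantees. The result is stored before the waiter is woken, so the
    waiter never reads a stale 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Minimal userspace stand-in for a kernel struct completion (sketch). */
struct sketch_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void sketch_complete(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void sketch_wait_for_completion(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical fields mirroring ->init_done and ->init_done_rc. */
struct sketch_vnic {
	struct sketch_completion init_done;
	int init_done_rc;
};

/* Fixed ordering: publish the result first, then wake the waiter. */
static void sketch_report_init(struct sketch_vnic *a, int rc)
{
	a->init_done_rc = rc;
	sketch_complete(&a->init_done);
}

/* Waiter side: read the rc only after the completion has fired. */
static int sketch_wait_init(struct sketch_vnic *a)
{
	sketch_wait_for_completion(&a->init_done);
	return a->init_done_rc;
}

/* Thread entry simulating the tasklet reporting an error. */
static void *sketch_tasklet(void *arg)
{
	sketch_report_init(arg, -5);
	return NULL;
}
```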
    
    Fixes: 6b278c0cb378 ("ibmvnic delay complete()")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ibmvnic: register netdev after init of adapter

Author: Sukadev Bhattiprolu <[email protected]>
Date:   Thu Feb 24 22:23:55 2022 -0800

    ibmvnic: register netdev after init of adapter
    
    commit 570425f8c7c18b14fa8a2a58a0adb431968ad118 upstream.
    
    Finish initializing the adapter before registering netdev so state
    is consistent.
    
    Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
    Signed-off-by: Sukadev Bhattiprolu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ibmvnic: Update driver return codes

Author: Dany Madden <[email protected]>
Date:   Tue Dec 14 00:17:47 2021 -0500

    ibmvnic: Update driver return codes
    
    [ Upstream commit b6ee566cf3940883d67c0d142fae8d410e975f47 ]
    
    Update return codes to be more informative.
    
    Signed-off-by: Jacob Root <[email protected]>
    Signed-off-by: Dany Madden <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

igc: igc_read_phy_reg_gpy: drop premature return

Author: Corinna Vinschen <[email protected]>
Date:   Wed Feb 16 14:31:35 2022 +0100

    igc: igc_read_phy_reg_gpy: drop premature return
    
    commit fda2635466cd26ad237e1bc5d3f6a60f97ad09b6 upstream.
    
    igc_read_phy_reg_gpy checks the return value from igc_read_phy_reg_mdic
    and if it's not 0, returns immediately. By doing this, it leaves the HW
    semaphore in the acquired state.
    
    Drop this premature return statement; the function returns immediately
    after releasing the semaphore anyway.
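    The fixed control flow can be sketched as follows. The semaphore and
    MDIC helpers are hypothetical stand-ins, not the igc functions. The
    register access result is returned only after the semaphore has been
    released, on the error path as well.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PHY semaphore and the MDIC access. */
static bool sem_held;
static int mdic_status;	/* result the fake MDIC read will report */

static int sketch_acquire_phy(void)
{
	if (sem_held)
		return -EBUSY;	/* a previous caller never released it */
	sem_held = true;
	return 0;
}

static void sketch_release_phy(void)
{
	sem_held = false;
}

static int sketch_read_mdic(unsigned int offset, unsigned short *data)
{
	(void)offset;
	*data = 0x1234;
	return mdic_status;
}

/* Fixed shape: no early return between acquire and release. */
static int sketch_read_phy_reg(unsigned int offset, unsigned short *data)
{
	int ret = sketch_acquire_phy();

	if (ret)
		return ret;
	ret = sketch_read_mdic(offset, data);
	sketch_release_phy();	/* released on success and on error */
	return ret;
}
```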
    
    Fixes: 5586838fe9ce ("igc: Add code for PHY support")
    Signed-off-by: Corinna Vinschen <[email protected]>
    Acked-by: Sasha Neftin <[email protected]>
    Tested-by: Naama Meir <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

igc: igc_write_phy_reg_gpy: drop premature return

Author: Sasha Neftin <[email protected]>
Date:   Sun Feb 20 09:29:15 2022 +0200

    igc: igc_write_phy_reg_gpy: drop premature return
    
    commit c4208653a327a09da1e9e7b10299709b6d9b17bf upstream.
    
    Similar to the "igc_read_phy_reg_gpy: drop premature return" patch.
    igc_write_phy_reg_gpy checks the return value from igc_write_phy_reg_mdic
    and if it's not 0, returns immediately. By doing this, it leaves the HW
    semaphore in the acquired state.
    
    Drop this premature return statement; the function returns immediately
    after releasing the semaphore anyway.
    
    Fixes: 5586838fe9ce ("igc: Add code for PHY support")
    Suggested-by: Dima Ruinskiy <[email protected]>
    Reported-by: Corinna Vinschen <[email protected]>
    Signed-off-by: Sasha Neftin <[email protected]>
    Tested-by: Naama Meir <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Input: clear BTN_RIGHT/MIDDLE on buttonpads

Author: José Expósito <[email protected]>
Date:   Tue Feb 8 09:59:16 2022 -0800

    Input: clear BTN_RIGHT/MIDDLE on buttonpads
    
    [ Upstream commit 37ef4c19b4c659926ce65a7ac709ceaefb211c40 ]
    
    Buttonpads are expected to map the INPUT_PROP_BUTTONPAD property bit
    and the BTN_LEFT key bit.
    
    As explained in the specification, where a device has a button type
    value of 0 (click-pad) or 1 (pressure-pad) there should not be
    discrete buttons:
    https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/touchpad-windows-precision-touchpad-collection#device-capabilities-feature-report
    
    However, some drivers map the BTN_RIGHT and/or BTN_MIDDLE key bits even
    though the device is a buttonpad and therefore does not have those
    buttons.
    
    This behavior has forced userspace applications like libinput to
    implement different workarounds and quirks to detect buttonpads and
    offer to the user the right set of features and configuration options.
    For more information:
    https://gitlab.freedesktop.org/libinput/libinput/-/merge_requests/726
    
    In order to avoid this issue, clear the BTN_RIGHT and BTN_MIDDLE key
    bits when the input device is registered if the INPUT_PROP_BUTTONPAD
    property bit is set.
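    Below is a userspace sketch of that rule. The constant values come from
    include/uapi/linux/input-event-codes.h. A reduced, hypothetical device
    struct with its own bitmap helpers stands in for the kernel's
    struct input_dev and bitops.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Values as defined in include/uapi/linux/input-event-codes.h. */
#define INPUT_PROP_BUTTONPAD	0x02
#define INPUT_PROP_CNT		0x20
#define BTN_LEFT		0x110
#define BTN_RIGHT		0x111
#define BTN_MIDDLE		0x112
#define KEY_CNT			0x300

#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LONGS(n)	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical, reduced stand-in for struct input_dev's bitmaps. */
struct sketch_input_dev {
	unsigned long propbit[SKETCH_LONGS(INPUT_PROP_CNT)];
	unsigned long keybit[SKETCH_LONGS(KEY_CNT)];
};

static void bm_set(unsigned int nr, unsigned long *map)
{
	map[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

static void bm_clear(unsigned int nr, unsigned long *map)
{
	map[nr / SKETCH_BITS_PER_LONG] &= ~(1UL << (nr % SKETCH_BITS_PER_LONG));
}

static bool bm_test(unsigned int nr, const unsigned long *map)
{
	return (map[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* At registration time: a buttonpad drops BTN_RIGHT and BTN_MIDDLE. */
static void sketch_sanitize_buttonpad(struct sketch_input_dev *dev)
{
	if (bm_test(INPUT_PROP_BUTTONPAD, dev->propbit)) {
		bm_clear(BTN_RIGHT, dev->keybit);
		bm_clear(BTN_MIDDLE, dev->keybit);
	}
}
```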
    
    Notice that this change will not affect udev because it does not check
    for buttons. See systemd/src/udev/udev-builtin-input_id.c.
    
    List of known affected hardware:
    
     - Chuwi AeroBook Plus
     - Chuwi Gemibook
     - Framework Laptop
     - GPD Win Max
     - Huawei MateBook 2020
     - Prestigio Smartbook 141 C2
     - Purism Librem 14v1
     - StarLite Mk II   - AMI firmware
     - StarLite Mk II   - Coreboot firmware
     - StarLite Mk III  - AMI firmware
     - StarLite Mk III  - Coreboot firmware
     - StarLabTop Mk IV - AMI firmware
     - StarLabTop Mk IV - Coreboot firmware
     - StarBook Mk V
    
    Acked-by: Peter Hutterer <[email protected]>
    Acked-by: Benjamin Tissoires <[email protected]>
    Acked-by: Jiri Kosina <[email protected]>
    Signed-off-by: José Expósito <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

Input: elan_i2c - fix regulator enable count imbalance after suspend/resume

Author: Hans de Goede <[email protected]>
Date:   Mon Feb 28 23:39:50 2022 -0800

    Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
    
    commit 04b7762e37c95d9b965d16bb0e18dbd1fa2e2861 upstream.
    
    Before these changes elan_suspend() would only disable the regulator
    when device_may_wakeup() returns false; whereas elan_resume() would
    unconditionally enable it, leading to an enable count imbalance when
    device_may_wakeup() returns true.
    
    This triggers the "WARN_ON(regulator->enable_count)" in regulator_put()
    when the elan_i2c driver gets unbound. This happens, e.g., with the
    hot-pluggable dock with an Elan I2C touchpad for the Asus TF103C 2-in-1.
    
    Fix this by making the regulator_enable() call also be conditional
    on device_may_wakeup() returning false.
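    
    A minimal model of the balanced pairing, assuming a regulator can be
    reduced to its enable count. The fake_* names are hypothetical; the
    point is only that suspend and resume test the same condition, so each
    disable is matched by exactly one enable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical regulator model: only the enable count matters here. */
struct fake_regulator {
	int enable_count;
};

static void fake_regulator_enable(struct fake_regulator *r)
{
	r->enable_count++;
}

static void fake_regulator_disable(struct fake_regulator *r)
{
	assert(r->enable_count > 0);	/* disabling below zero would be a bug */
	r->enable_count--;
}

/* Suspend: the supply stays on when the device may wake the system. */
static void fake_suspend(struct fake_regulator *r, bool may_wakeup)
{
	if (!may_wakeup)
		fake_regulator_disable(r);
}

/* Resume: same condition as suspend (before the fix this was unconditional). */
static void fake_resume(struct fake_regulator *r, bool may_wakeup)
{
	if (!may_wakeup)
		fake_regulator_enable(r);
}
```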
    
    Signed-off-by: Hans de Goede <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()

Author: Hans de Goede <[email protected]>
Date:   Mon Feb 28 23:39:38 2022 -0800

    Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
    
    commit 81a36d8ce554b82b0a08e2b95d0bd44fcbff339b upstream.
    
    elan_disable_power() is called conditionally on suspend, whereas
    elan_enable_power() is always called on resume. This leads to
    an imbalance in the regulator's enable count.
    
    Move the regulator_[en|dis]able() calls out of elan_[en|dis]able_power()
    in preparation for fixing this.
    
    No functional changes intended.
    
    Signed-off-by: Hans de Goede <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [dtor: consolidate elan_[en|dis]able() into elan_set_power()]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Input: samsung-keypad - properly state IOMEM dependency

Author: David Gow <[email protected]>
Date:   Sun Feb 27 21:00:10 2022 -0800

    Input: samsung-keypad - properly state IOMEM dependency
    
    commit ba115adf61b36b8c167126425a62b0efc23f72c0 upstream.
    
    Make the samsung-keypad driver explicitly depend on CONFIG_HAS_IOMEM, as it
    calls devm_ioremap(). This prevents compile errors in some configs (e.g.,
    allyesconfig/randconfig under UML):
    
    /usr/bin/ld: drivers/input/keyboard/samsung-keypad.o: in function `samsung_keypad_probe':
    samsung-keypad.c:(.text+0xc60): undefined reference to `devm_ioremap'
    
    Signed-off-by: David Gow <[email protected]>
    Acked-by: anton ivanov <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Dmitry Torokhov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iommu/amd: Fix I/O page table memory leak

Author: Suravee Suthikulpanit <[email protected]>
Date:   Thu Feb 10 09:47:45 2022 -0600

    iommu/amd: Fix I/O page table memory leak
    
    [ Upstream commit 6b0b2d9a6a308bcd9300c2d83000a82812c56cea ]
    
    The current logic updates the I/O page table mode for the domain
    before calling the logic to free memory used for the page table.
    This results in an IOMMU page table memory leak, which can be observed
    when launching a VM with pass-through devices.
    
    Fix by freeing the memory used for the page table before updating the mode.
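    
    The ordering problem can be illustrated with a deliberately simplified
    model in which each page-table level is one page and the mode records
    how many levels exist. struct fake_domain and both teardown helpers are
    hypothetical; only the PAGE_MODE_NONE name is borrowed from the driver.

```c
#include <assert.h>

#define PAGE_MODE_NONE 0	/* name borrowed from the driver; "no page table" */

/* Hypothetical model: one page per level, mode == number of levels. */
struct fake_domain {
	int mode;
	int pages_in_use;
};

/* The free routine relies on the current mode to know how much to walk. */
static void fake_free_pagetable(struct fake_domain *d)
{
	assert(d->pages_in_use >= d->mode);
	d->pages_in_use -= d->mode;
}

/* Buggy order: the mode is cleared first, so nothing is freed (leak). */
static void teardown_buggy(struct fake_domain *d)
{
	d->mode = PAGE_MODE_NONE;
	fake_free_pagetable(d);
}

/* Fixed order: free while the mode still describes the table. */
static void teardown_fixed(struct fake_domain *d)
{
	fake_free_pagetable(d);
	d->mode = PAGE_MODE_NONE;
}
```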
    
    Cc: Joerg Roedel <[email protected]>
    Reported-by: Daniel Jordan <[email protected]>
    Tested-by: Daniel Jordan <[email protected]>
    Signed-off-by: Suravee Suthikulpanit <[email protected]>
    Fixes: e42ba0633064 ("iommu/amd: Restructure code for freeing page table")
    Link: https://lore.kernel.org/all/[email protected]/
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Joerg Roedel <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iommu/amd: Recover from event log overflow

Author: Lennert Buytenhek <[email protected]>
Date:   Mon Oct 4 13:07:24 2021 +0300

    iommu/amd: Recover from event log overflow
    
    commit 5ce97f4ec5e0f8726a5dda1710727b1ee9badcac upstream.
    
    The AMD IOMMU logs I/O page faults and such to a ring buffer in
    system memory, and this ring buffer can overflow.  The AMD IOMMU
    spec has the following to say about the interrupt status bit that
    signals this overflow condition:
    
            EventOverflow: Event log overflow. RW1C. Reset 0b. 1 = IOMMU
            event log overflow has occurred. This bit is set when a new
            event is to be written to the event log and there is no usable
            entry in the event log, causing the new event information to
            be discarded. An interrupt is generated when EventOverflow = 1b
            and MMIO Offset 0018h[EventIntEn] = 1b. No new event log
            entries are written while this bit is set. Software Note: To
            resume logging, clear EventOverflow (W1C), and write a 1 to
            MMIO Offset 0018h[EventLogEn].
    
    The AMD IOMMU driver doesn't currently implement this recovery
    sequence, meaning that if a ring buffer overflow occurs, logging
    of EVT/PPR/GA events will cease entirely.
    
    This patch implements the spec-mandated reset sequence, with the
    minor tweak that the hardware seems to want to have a 0 written to
    MMIO Offset 0018h[EventLogEn] first, before writing a 1 into this
    field, or the IOMMU won't actually resume logging events.
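    
    The recovery sequence can be sketched against simulated registers. The
    bit positions below are hypothetical placeholders rather than the real
    register layout, and fake_write_control() models the observed
    requirement that EventLogEn must see a 0 before a 1 for logging to
    restart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for this sketch only. */
#define FAKE_STATUS_EVT_OVERFLOW (1u << 0)
#define FAKE_CONTROL_EVT_LOG_EN  (1u << 2)

struct fake_iommu_mmio {
	uint32_t status;	/* RW1C: writing 1 clears the bit */
	uint32_t control;	/* models MMIO Offset 0018h */
	int log_restarts;	/* counts 0 -> 1 transitions of EventLogEn */
};

static void fake_write_status(struct fake_iommu_mmio *m, uint32_t val)
{
	m->status &= ~val;	/* write-1-to-clear semantics */
}

static void fake_write_control(struct fake_iommu_mmio *m, uint32_t val)
{
	/* Modelled hardware quirk: logging restarts only on a 0 -> 1 edge. */
	if (!(m->control & FAKE_CONTROL_EVT_LOG_EN) && (val & FAKE_CONTROL_EVT_LOG_EN))
		m->log_restarts++;
	m->control = val;
}

static void fake_recover_event_log(struct fake_iommu_mmio *m)
{
	if (!(m->status & FAKE_STATUS_EVT_OVERFLOW))
		return;
	fake_write_status(m, FAKE_STATUS_EVT_OVERFLOW);			/* clear EventOverflow */
	fake_write_control(m, m->control & ~FAKE_CONTROL_EVT_LOG_EN);	/* write 0 first */
	fake_write_control(m, m->control | FAKE_CONTROL_EVT_LOG_EN);	/* then write 1 */
	assert(!(m->status & FAKE_STATUS_EVT_OVERFLOW));
}
```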
    
    Signed-off-by: Lennert Buytenhek <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Joerg Roedel <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find

Author: Miaoqian Lin <[email protected]>
Date:   Fri Jan 7 08:09:11 2022 +0000

    iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find
    
    commit 9826e393e4a8c3df474e7f9eacd3087266f74005 upstream.
    
    The reference taken by 'of_find_device_by_node()' must be released when
    not needed anymore.
    Add the corresponding 'put_device()' in the error handling path.
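    
    A hedged sketch of the reference-counting rule, using a hypothetical
    struct fake_device in place of struct platform_device/struct device.
    It only models the error path the patch touches; whoever receives the
    success result is assumed to manage the remaining reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted device. */
struct fake_device {
	int refcount;
	void *drvdata;	/* NULL until the driver has probed */
};

/* Like of_find_device_by_node(): a successful lookup takes a reference. */
static struct fake_device *fake_find_device(struct fake_device *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

static void fake_put_device(struct fake_device *dev)
{
	assert(dev->refcount > 0);
	dev->refcount--;
}

static void *fake_smmu_find(struct fake_device *node_dev)
{
	struct fake_device *dev = fake_find_device(node_dev);
	void *smmu;

	if (!dev)
		return NULL;
	smmu = dev->drvdata;
	if (!smmu) {
		fake_put_device(dev);	/* the fix: drop the lookup reference */
		return NULL;
	}
	return smmu;	/* success path not modelled further in this sketch */
}
```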
    
    Fixes: 765a9d1d02b2 ("iommu/tegra-smmu: Fix mc errors on tegra124-nyan")
    Signed-off-by: Miaoqian Lin <[email protected]>
    Acked-by: Thierry Reding <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Joerg Roedel <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

iommu/vt-d: Fix double list_add when enabling VMD in scalable mode

Author: Adrian Huang <[email protected]>
Date:   Mon Feb 21 13:33:48 2022 +0800

    iommu/vt-d: Fix double list_add when enabling VMD in scalable mode
    
    commit b00833768e170a31af09268f7ab96aecfcca9623 upstream.
    
    When enabling VMD and IOMMU scalable mode, the following kernel panic
    call trace/kernel log is shown on the Eagle Stream platform (Sapphire
    Rapids CPU) during boot:
    
    pci 0000:59:00.5: Adding to iommu group 42
    ...
    vmd 0000:59:00.5: PCI host bridge to bus 10000:80
    pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
    pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
    pci 10000:80:01.0: enabling Extended Tags
    pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
    pci 10000:80:01.0: DMAR: Setup RID2PASID failed
    pci 10000:80:01.0: Failed to add to iommu group 42: -16
    pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
    pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
    pci 10000:80:03.0: enabling Extended Tags
    pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
    ------------[ cut here ]------------
    kernel BUG at lib/list_debug.c:29!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
    Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
    Workqueue: events work_for_cpu_fn
    RIP: 0010:__list_add_valid.cold+0x26/0x3f
    Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f
          0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1
          fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9
          9e e8 8b b1 fe
    RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
    RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
    RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
    RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
    R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
    R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
    FS:  0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
     <TASK>
     intel_pasid_alloc_table+0x9c/0x1d0
     dmar_insert_one_dev_info+0x423/0x540
     ? device_to_iommu+0x12d/0x2f0
     intel_iommu_attach_device+0x116/0x290
     __iommu_attach_device+0x1a/0x90
     iommu_group_add_device+0x190/0x2c0
     __iommu_probe_device+0x13e/0x250
     iommu_probe_device+0x24/0x150
     iommu_bus_notifier+0x69/0x90
     blocking_notifier_call_chain+0x5a/0x80
     device_add+0x3db/0x7b0
     ? arch_memremap_can_ram_remap+0x19/0x50
     ? memremap+0x75/0x140
     pci_device_add+0x193/0x1d0
     pci_scan_single_device+0xb9/0xf0
     pci_scan_slot+0x4c/0x110
     pci_scan_child_bus_extend+0x3a/0x290
     vmd_enable_domain.constprop.0+0x63e/0x820
     vmd_probe+0x163/0x190
     local_pci_probe+0x42/0x80
     work_for_cpu_fn+0x13/0x20
     process_one_work+0x1e2/0x3b0
     worker_thread+0x1c4/0x3a0
     ? rescuer_thread+0x370/0x370
     kthread+0xc7/0xf0
     ? kthread_complete_and_exit+0x20/0x20
     ret_from_fork+0x1f/0x30
     </TASK>
    Modules linked in:
    ---[ end trace 0000000000000000 ]---
    ...
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception ]---
    
    The following 'lspci' output shows that devices '10000:80:*' are
    subdevices of the VMD device 0000:59:00.5:
    
      $ lspci
      ...
      0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
      ...
      10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
      10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
      10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
      10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
      10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
      10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
    
    The 'list_add double add' symptom is triggered by the failure reported
    in the following messages:
    
      pci 10000:80:01.0: DMAR: Setup RID2PASID failed
      pci 10000:80:01.0: Failed to add to iommu group 42: -16
      pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
    
    Device 10000:80:01.0 is a subdevice of the VMD device 0000:59:00.5,
    so invoking intel_pasid_alloc_table() obtains the pasid_table of the VMD
    device 0000:59:00.5. Here is the call path:
    
      intel_pasid_alloc_table
        pci_for_each_dma_alias
         get_alias_pasid_table
           search_pasid_table
    
    pci_real_dma_dev() in pci_for_each_dma_alias() returns the real DMA
    device, which is the VMD device 0000:59:00.5. However, the pasid entry
    (pte) of the VMD device 0000:59:00.5 was already configured when the
    message "pci 0000:59:00.5: Adding to iommu group 42" was printed. So
    -EBUSY is returned when configuring the pasid entry for device
    10000:80:01.0.
    
    It then invokes dmar_remove_one_dev_info() to free the
    'struct device_domain_info *' back to iommu_devinfo_cache. However, the
    pasid table is not released because of the following statement in
    __dmar_remove_one_dev_info():
    
            if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
                    ...
                    intel_pasid_free_table(info->dev);
            }
    
    The subsequent dmar_insert_one_dev_info() operation for device
    10000:80:03.0 allocates a 'struct device_domain_info *' from
    iommu_devinfo_cache. The allocated address is the same one that was
    previously released for device 10000:80:01.0. Finally, invoking
    device_attach_pasid_table() causes the issue.
    
    `git bisect` points to the offending commit 474dd1c65064 ("iommu/vt-d:
    Fix clearing real DMA device's scalable-mode context entries"), which
    releases the pasid table only if dev_is_real_dma_subdevice() reports
    that the device is not a subdevice. Reverting the offending commit
    works around the issue.
    
    The solution is to avoid allocating a pasid table for devices that
    are subdevices of the VMD device.
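    
    The guard can be modelled as follows. The struct and helper names are
    hypothetical stand-ins for the driver's device info and
    dev_is_real_dma_subdevice(); the sketch only shows that a subdevice,
    whose real DMA device is the VMD endpoint, never gets its own PASID
    table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: real_dma_dev is the device that actually
 * issues DMA (the VMD endpoint for devices behind it, itself otherwise). */
struct fake_dev {
	struct fake_dev *real_dma_dev;
	bool has_pasid_table;
};

static bool fake_is_real_dma_subdevice(const struct fake_dev *dev)
{
	return dev->real_dma_dev != dev;
}

/* Allocate a PASID table only for devices that own their DMA identity. */
static void fake_insert_dev_info(struct fake_dev *dev, bool scalable_mode)
{
	assert(dev->real_dma_dev != NULL);
	if (scalable_mode && !fake_is_real_dma_subdevice(dev))
		dev->has_pasid_table = true;
}
```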
    
    Fixes: 474dd1c65064 ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries")
    Cc: [email protected] # v5.14+
    Signed-off-by: Adrian Huang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Joerg Roedel <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()

Author: Eric Dumazet <[email protected]>
Date:   Thu Mar 3 09:37:28 2022 -0800

    ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
    
    [ Upstream commit 2d3916f3189172d5c69d33065c3c21119fe539fc ]
    
    While investigating why a synchronize_net() was recently added to
    ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
    might drop skbs in some cases.
    
    Discussion about removing synchronize_net() from ipv6_mc_down()
    will happen in a different thread.
    
    Fixes: f185de28d9ae ("mld: add new workqueues for process mld events")
    Signed-off-by: Eric Dumazet <[email protected]>
    Cc: Taehee Yoo <[email protected]>
    Cc: Cong Wang <[email protected]>
    Cc: David Ahern <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

iwlwifi: mvm: check debugfs_dir ptr before use

Author: Randy Dunlap <[email protected]>
Date:   Tue Feb 22 19:06:30 2022 -0800

    iwlwifi: mvm: check debugfs_dir ptr before use
    
    commit 5a6248c0a22352f09ea041665d3bd3e18f6f872c upstream.
    
    When "debugfs=off" is used on the kernel command line, iwlwifi's
    mvm module uses an invalid/unchecked debugfs_dir pointer and causes
    a BUG:
    
     BUG: kernel NULL pointer dereference, address: 000000000000004f
     #PF: supervisor read access in kernel mode
     #PF: error_code(0x0000) - not-present page
     PGD 0 P4D 0
     Oops: 0000 [#1] PREEMPT SMP
     CPU: 1 PID: 503 Comm: modprobe Tainted: G        W         5.17.0-rc5 #7
     Hardware name: Dell Inc. Inspiron 15 5510/076F7Y, BIOS 2.4.1 11/05/2021
     RIP: 0010:iwl_mvm_dbgfs_register+0x692/0x700 [iwlmvm]
     Code: 69 a0 be 80 01 00 00 48 c7 c7 50 73 6a a0 e8 95 cf ee e0 48 8b 83 b0 1e 00 00 48 c7 c2 54 73 6a a0 be 64 00 00 00 48 8d 7d 8c <48> 8b 48 50 e8 15 22 07 e1 48 8b 43 28 48 8d 55 8c 48 c7 c7 5f 73
     RSP: 0018:ffffc90000a0ba68 EFLAGS: 00010246
     RAX: ffffffffffffffff RBX: ffff88817d6e3328 RCX: ffff88817d6e3328
     RDX: ffffffffa06a7354 RSI: 0000000000000064 RDI: ffffc90000a0ba6c
     RBP: ffffc90000a0bae0 R08: ffffffff824e4880 R09: ffffffffa069d620
     R10: ffffc90000a0ba00 R11: ffffffffffffffff R12: 0000000000000000
     R13: ffffc90000a0bb28 R14: ffff88817d6e3328 R15: ffff88817d6e3320
     FS:  00007f64dd92d740(0000) GS:ffff88847f640000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 000000000000004f CR3: 000000016fc79001 CR4: 0000000000770ee0
     PKRU: 55555554
     Call Trace:
      <TASK>
      ? iwl_mvm_mac_setup_register+0xbdc/0xda0 [iwlmvm]
      iwl_mvm_start_post_nvm+0x71/0x100 [iwlmvm]
      iwl_op_mode_mvm_start+0xab8/0xb30 [iwlmvm]
      _iwl_op_mode_start+0x6f/0xd0 [iwlwifi]
      iwl_opmode_register+0x6a/0xe0 [iwlwifi]
      ? 0xffffffffa0231000
      iwl_mvm_init+0x35/0x1000 [iwlmvm]
      ? 0xffffffffa0231000
      do_one_initcall+0x5a/0x1b0
      ? kmem_cache_alloc+0x1e5/0x2f0
      ? do_init_module+0x1e/0x220
      do_init_module+0x48/0x220
      load_module+0x2602/0x2bc0
      ? __kernel_read+0x145/0x2e0
      ? kernel_read_file+0x229/0x290
      __do_sys_finit_module+0xc5/0x130
      ? __do_sys_finit_module+0xc5/0x130
      __x64_sys_finit_module+0x13/0x20
      do_syscall_64+0x38/0x90
      entry_SYSCALL_64_after_hwframe+0x44/0xae
     RIP: 0033:0x7f64dda564dd
     Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 29 0f 00 f7 d8 64 89 01 48
     RSP: 002b:00007ffdba393f88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
     RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f64dda564dd
     RDX: 0000000000000000 RSI: 00005575399e2ab2 RDI: 0000000000000001
     RBP: 000055753a91c5e0 R08: 0000000000000000 R09: 0000000000000002
     R10: 0000000000000001 R11: 0000000000000246 R12: 00005575399e2ab2
     R13: 000055753a91ceb0 R14: 0000000000000000 R15: 000055753a923018
      </TASK>
     Modules linked in: btintel(+) btmtk bluetooth vfat snd_hda_codec_hdmi fat snd_hda_codec_realtek snd_hda_codec_generic iwlmvm(+) snd_sof_pci_intel_tgl mac80211 snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence soundwire_bus snd_sof_intel_hda snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core btrfs snd_compress snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec raid6_pq iwlwifi snd_hda_core snd_pcm snd_timer snd soundcore cfg80211 intel_ish_ipc(+) thunderbolt rfkill intel_ishtp ucsi_acpi wmi i2c_hid_acpi i2c_hid evdev
     CR2: 000000000000004f
     ---[ end trace 0000000000000000 ]---
    
    Check the debugfs_dir pointer for an error before using it.
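    
    The oops above appears consistent with an error pointer being
    dereferenced: RAX holds ffffffffffffffff (ERR_PTR(-1), i.e. -EPERM),
    and the faulting address CR2 = 0x4f is that value plus 0x50. The sketch
    below re-creates the kernel's ERR_PTR()/IS_ERR() convention in userspace
    (MAX_ERRNO is 4095 as in include/linux/err.h); fake_dbgfs_register() is
    a hypothetical stand-in for the registration code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation: the top MAX_ERRNO addresses encode a negative errno. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical registration helper: skip file creation under an error dir. */
static void fake_dbgfs_register(void *debugfs_dir, int *files_created)
{
	assert(files_created != NULL);
	if (IS_ERR(debugfs_dir))
		return;			/* debugfs disabled: nothing to register */
	(*files_created)++;		/* stand-in for the debugfs_create_*() calls */
}
```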
    
    Fixes: 8c082a99edb9 ("iwlwifi: mvm: simplify iwl_mvm_dbgfs_register")
    Signed-off-by: Randy Dunlap <[email protected]>
    Cc: Luca Coelho <[email protected]>
    Cc: [email protected]
    Cc: Kalle Valo <[email protected]>
    Cc: Greg Kroah-Hartman <[email protected]>
    Cc: Emmanuel Grumbach <[email protected]>
    Cc: stable <[email protected]>
    Reviewed-by: Greg Kroah-Hartman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [change to make both conditional]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()

Author: Maciej Fijalkowski <[email protected]>
Date:   Wed Mar 2 09:59:27 2022 -0800

    ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
    
    commit 6c7273a266759d9d36f7c862149f248bcdeddc0f upstream.
    
    Commit c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if
    netif is not OK") addressed the transient ring state while
    MEM_TYPE_XSK_BUFF_POOL was being configured, which in turn caused the
    interface to go through down/up. Maurice reported that when the carrier
    is not OK and an xsk_pool is present on the ring pair, ksoftirqd
    consumes 100% of CPU cycles due to constant NAPI rescheduling, as
    ixgbe_poll() reports that there is still some work to be done.
    
    To fix this, do not set work_done to false when netif_carrier_ok() returns false.
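    
    The intended behaviour can be modelled with a hypothetical TX loop in
    which returning false asks NAPI to poll again. Only a full ring should
    report unfinished work; a lost carrier simply stops the loop. The
    struct and field names are illustrative, not ixgbe's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring state for one zero-copy TX pass. */
struct fake_ring {
	bool carrier_ok;
	int free_descs;
	int pending_frames;
};

/* Returns true when NAPI may stop polling, false to request a re-poll. */
static bool fake_xmit_zc(struct fake_ring *r, int budget)
{
	bool work_done = true;

	assert(budget >= 0);
	while (budget-- > 0 && r->pending_frames > 0) {
		if (r->free_descs == 0) {
			work_done = false;	/* ring full: genuinely retry later */
			break;
		}
		if (!r->carrier_ok)
			break;			/* link down: stop without forcing a re-poll */
		r->free_descs--;
		r->pending_frames--;
	}
	return work_done;
}
```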
    
    Fixes: c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK")
    Reported-by: Maurice Baijens <[email protected]>
    Tested-by: Maurice Baijens <[email protected]>
    Signed-off-by: Maciej Fijalkowski <[email protected]>
    Tested-by: Sandeep Penigalapati <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: arm64: vgic: Read HW interrupt pending state from the HW

Author: Marc Zyngier <[email protected]>
Date:   Thu Feb 3 09:24:45 2022 +0000

    KVM: arm64: vgic: Read HW interrupt pending state from the HW
    
    [ Upstream commit 5bfa685e62e9ba93c303a9a8db646c7228b9b570 ]
    
    It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always
    result in the pending interrupts being accurately reported if they are
    mapped to a HW interrupt. This is particularly visible when acking
    the timer interrupt and reading the GICR_ISPENDR1 register immediately
    after, for example (the interrupt appears as not-pending while it really
    is...).
    
    This is because a HW interrupt has its 'active and pending state' kept
    in the *physical* distributor, and not in the virtual one, as mandated
    by the spec (this is what allows the direct deactivation). The virtual
    distributor only carries the pending and active *states* (note the
    plural, as these are two independent and non-overlapping states).
    
    Fix it by reading the HW state back, either from the timer itself or
    from the distributor if necessary.
    
    Reported-by: Ricardo Koller <[email protected]>
    Tested-by: Ricardo Koller <[email protected]>
    Reviewed-by: Ricardo Koller <[email protected]>
    Signed-off-by: Marc Zyngier <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata

Author: James Morse <[email protected]>
Date:   Thu Jan 27 12:20:52 2022 +0000

    KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
    
    [ Upstream commit 1dd498e5e26ad71e3e9130daf72cfb6a693fee03 ]
    
    Cortex-A510's erratum #2077057 causes SPSR_EL2 to be corrupted when
    single-stepping authenticated ERET instructions. A single step is
    expected, but a pointer authentication trap is taken instead. The
    erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
    EL1 to cause a return to EL2 with a guest controlled ELR_EL2.
    
    Because the conditions require an ERET into active-not-pending state,
    this is only a problem for EL2 when EL2 is stepping EL1. In this case
    the previous SPSR_EL2 value is preserved in struct kvm_vcpu, and can be
    restored.
    
    Cc: [email protected] # 53960faf2b73: arm64: Add Cortex-A510 CPU part definition
    Cc: [email protected]
    Signed-off-by: James Morse <[email protected]>
    [maz: fixup cpucaps ordering]
    Signed-off-by: Marc Zyngier <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()

Author: Like Xu <[email protected]>
Date:   Tue Mar 1 20:49:41 2022 +0800

    KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
    
    commit c6c937d673aaa1d603f62f134e1ca9c173eeeed3 upstream.
    
    Just as on the alternative mmu_alloc_direct_roots() path, once the
    shadow path reaches "r = -EIO" somewhere, the caller needs to know the
    actual error state in order to enter error handling and avoid something
    worse.
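    
    A minimal sketch of the propagation pattern, with hypothetical function
    names: whichever root-allocation path runs, its return value is checked
    and passed up instead of being dropped.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical shadow-root allocator that can fail with -EIO. */
static int fake_alloc_shadow_roots(int ok)
{
	return ok ? 0 : -EIO;
}

/* Hypothetical caller: both paths report their result the same way. */
static int fake_mmu_alloc_roots(int direct, int shadow_ok)
{
	int r;

	if (direct)
		r = 0;		/* stand-in for the direct-roots path */
	else
		r = fake_alloc_shadow_roots(shadow_ok);
	if (r)
		return r;	/* pass the error up so the caller can bail out */
	/* ... continue loading the MMU ... */
	return 0;
}
```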
    
    Fixes: 4a38162ee9f1 ("KVM: MMU: load PDPTRs outside mmu_lock")
    Signed-off-by: Like Xu <[email protected]>
    Reviewed-by: Sean Christopherson <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

KVM: x86: Add KVM_CAP_ENABLE_CAP to x86

Author: Aaron Lewis <[email protected]>
Date:   Mon Feb 14 21:29:51 2022 +0000

    KVM: x86: Add KVM_CAP_ENABLE_CAP to x86
    
    [ Upstream commit 127770ac0d043435375ab86434f31a93efa88215 ]
    
    Follow the precedent set by other architectures that support the VCPU
    ioctl, KVM_ENABLE_CAP, and advertise the VM extension, KVM_CAP_ENABLE_CAP.
    This way, userspace can ensure that KVM_ENABLE_CAP is available on a
    vcpu before using it.
    
    Fixes: 5c919412fe61 ("kvm/x86: Hyper-V synthetic interrupt controller")
    Signed-off-by: Aaron Lewis <[email protected]>
    Message-Id: <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

Linux: Linux 5.16.13

Author: Greg Kroah-Hartman <[email protected]>
Date:   Tue Mar 8 19:14:20 2022 +0100

    Linux 5.16.13
    
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: Fox Chen <[email protected]>
    Tested-by: Ronald Warsow <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: Florian Fainelli <[email protected]>
    Tested-by: Shuah Khan <[email protected]>
    Tested-by: Fox Chen <[email protected]>
    Tested-by: Linux Kernel Functional Testing <[email protected]>
    Tested-by: Jon Hunter <[email protected]>
    Tested-by: Rudi Heitbaum <[email protected]>
    Tested-by: Justin M. Forbes <[email protected]>
    Tested-by: Bagas Sanjaya <[email protected]>
    Tested-by: Luna Jernberg <[email protected]>
    Tested-by: Ron Economos <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mac80211: fix EAPoL rekey fail in 802.3 rx path

Author: Deren Wu <[email protected]>
Date:   Sun Feb 13 00:20:15 2022 +0800

    mac80211: fix EAPoL rekey fail in 802.3 rx path
    
    commit 610d086d6df0b15c3732a7b4a5b0f1c3e1b84d4c upstream.
    
    mac80211 sets the capability NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211
    towards the upper layer by default. That means EAPoL packets should be
    passed through the nl80211 path only, and the EAPoL skb should not be
    sent to the netdevice directly. Meanwhile, wpa_supplicant does not
    register a socket to listen for EAPoL skbs on the netdevice.
    
    However, there is no control_port_protocol handler in mac80211 for 802.3
    RX packets, so if SUPPORTS_RX_DECAP_OFFLOAD is enabled, mac80211 passes
    the EAPoL rekey frame up to the netdevice and wpa_supplicant never sees
    it. This causes rekeying on the STA to always fail when the EAPoL frame
    goes through the 802.3 path.
    
    To avoid this problem, handle this frame the same way as on the 802.11
    RX path before passing it to the network stack.
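    
    The dispatch decision can be sketched as below, assuming a hypothetical
    helper that sees the ethertype of an already-decapsulated frame.
    ETH_P_PAE (0x888E) is the EAPoL ethertype from <linux/if_ether.h>; the
    enum and function are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_P_PAE 0x888E	/* EAPoL ethertype, as in <linux/if_ether.h> */

enum fake_rx_dest { TO_NETDEV, TO_NL80211_CONTROL_PORT };

/* Hypothetical 802.3 RX dispatcher: with control port over nl80211
 * advertised, control-port frames must not go straight to the netdev. */
static enum fake_rx_dest fake_rx_8023(uint16_t ethertype,
				      uint16_t control_port_protocol,
				      int control_port_over_nl80211)
{
	if (control_port_over_nl80211 && ethertype == control_port_protocol)
		return TO_NL80211_CONTROL_PORT;
	return TO_NETDEV;
}
```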
    
    This also addresses a potential security issue in 802.3 RX mode that was
    previously fixed in commit a8c4d76a8dd4 ("mac80211: do not accept/forward
    invalid EAPOL frames").
    
    Cc: [email protected] # 5.12+
    Fixes: 80a915ec4427 ("mac80211: add rx decapsulation offload support")
    Signed-off-by: Deren Wu <[email protected]>
    Link: https://lore.kernel.org/r/6889c9fced5859ebb088564035f84fd0fa792a49.1644680751.git.deren.wu@mediatek.com
    [fix typos, update comment and add note about security issue]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mac80211: fix forwarded mesh frames AC & queue selection

Author: Nicolas Escande <[email protected]>
Date:   Mon Feb 14 18:32:14 2022 +0100

    mac80211: fix forwarded mesh frames AC & queue selection
    
    commit 859ae7018316daa4adbc496012dcbbb458d7e510 upstream.
    
    There are two problems with the current code that have been highlighted
    by the AQL feature, which is now enabled by default.
    
    The first problem is in ieee80211_rx_h_mesh_fwding(), where
    ieee80211_select_queue_80211() is used on received packets to choose the
    sending AC queue of the forwarded packet, although this function should
    only be called on TX packets (it uses ieee80211_tx_info). This results
    in forwarded mesh packets being sent on an unrelated, random AC queue.
    To fix that, the AC queue can be inferred directly from skb->priority,
    which has been extracted from the QoS info (see ieee80211_parse_qos()).
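    
    Inferring the AC from skb->priority amounts to a table lookup. The
    sketch below assumes mac80211's AC numbering (VO=0, VI=1, BE=2, BK=3)
    and copies the mapping of its ieee802_1d_to_ac[] table; the function
    name is hypothetical.

```c
#include <assert.h>

/* Access-category numbering as in mac80211. */
enum { AC_VO = 0, AC_VI = 1, AC_BE = 2, AC_BK = 3 };

/* 802.1D user priority -> access category. */
static const int prio_to_ac[8] = {
	AC_BE, AC_BK, AC_BK, AC_BE, AC_VI, AC_VI, AC_VO, AC_VO,
};

/* Hypothetical helper: AC of a forwarded mesh frame from skb->priority,
 * which RX has already filled in from the QoS control field. */
static int fake_mesh_fwd_ac(unsigned int skb_priority)
{
	return prio_to_ac[skb_priority & 7];
}
```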
    
    The second problem is that the value of queue_mapping set on forwarded
    mesh frames via skb_set_queue_mapping() is not the AC of the packet but
    a hardware queue index. This may or may not work depending on the
    AC-to-HW-queue mapping, which is driver specific.
    
    Both of these issues lead to improper AC selection while forwarding
    mesh packets and, more importantly, caused traffic stalls once AQL was
    introduced, due to improper airtime accounting (which is done on a
    per-STA, per-AC basis).
    
    Fixes: cf44012810cc ("mac80211: fix unnecessary frame drops in mesh fwding")
    Fixes: d3c1597b8d1b ("mac80211: fix forwarded mesh frame queue mapping")
    Co-developed-by: Remi Pommarel <[email protected]>
    Signed-off-by: Remi Pommarel <[email protected]>
    Signed-off-by: Nicolas Escande <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mac80211: treat some SAE auth steps as final

Author: Johannes Berg <[email protected]>
Date:   Thu Feb 24 10:39:34 2022 +0100

    mac80211: treat some SAE auth steps as final
    
    commit 94d9864cc86f572f881db9b842a78e9d075493ae upstream.
    
    When we get the "anti-clogging token required" status (handling added
    by the commit mentioned below), or the other status codes added by the
    later commit 4e56cde15f7d ("mac80211: Handle special status codes in
    SAE commit"), we currently just pretend (towards the internal
    state machine of authentication) that we didn't receive anything.
    
    This has the undesirable consequence of retransmitting the prior
    frame, which is not expected, because the timer is still armed.
    
    If we just disarm the timer at that point, it would result in
    the undesirable side effect of being in this state indefinitely
    if userspace crashes, for example.
    
    So to fix this, reset the timer and set a new auth_data->waiting
    in order to have no more retransmissions, but to have the data
    destroyed when the timer actually fires, which will only happen
    if userspace didn't continue (i.e. crashed or abandoned it.)
    
    Fixes: a4055e74a2ff ("mac80211: Don't destroy auth data in case of anti-clogging")
    Reported-by: Jouni Malinen <[email protected]>
    Link: https://lore.kernel.org/r/20220224103932.75964e1d7932.Ia487f91556f29daae734bf61f8181404642e1eec@changeid
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work

Author: JaeMan Park <[email protected]>
Date:   Thu Jan 13 15:02:35 2022 +0900

    mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
    
    [ Upstream commit cacfddf82baf1470e5741edeecb187260868f195 ]
    
    In mac80211_hwsim, the probe_req frame is created and sent while
    scanning, with an uninitialized ieee80211_tx_info. An uninitialized
    ieee80211_tx_info can cause problems when using mac80211_hwsim with
    wmediumd: wmediumd checks the tx_rates field of ieee80211_tx_info and
    does not relay the probe_req frame to other clients, even though it is
    a broadcast frame.
    
    Call ieee80211_tx_prepare_skb() to initialize ieee80211_tx_info for
    the probe_req that is created by hw_scan_work in mac80211_hwsim.
    
    Signed-off-by: JaeMan Park <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [fix memory leak]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mac80211_hwsim: report NOACK frames in tx_status [+ + +]

Author: Benjamin Beichler <[email protected]>
Date:   Tue Jan 11 22:13:26 2022 +0000

    mac80211_hwsim: report NOACK frames in tx_status
    
    [ Upstream commit 42a79960ffa50bfe9e0bf5d6280be89bf563a5dd ]
    
    Add IEEE80211_TX_STAT_NOACK_TRANSMITTED to tx_status flags to have proper
    statistics for non-acked frames.
    
    Signed-off-by: Benjamin Beichler <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

memfd: fix F_SEAL_WRITE after shmem huge page allocated [+ + +]

Author: Hugh Dickins <[email protected]>
Date:   Fri Mar 4 20:29:01 2022 -0800

    memfd: fix F_SEAL_WRITE after shmem huge page allocated
    
    commit f2b277c4d1c63a85127e8aa2588e9cc3bd21cb99 upstream.
    
    Wangyong reports: after enabling tmpfs filesystem to support transparent
    hugepage with the following command:
    
      echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled
    
    the docker program tries to add F_SEAL_WRITE through the following
    command, but it fails unexpectedly with errno EBUSY:
    
      fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.
    
    That is because memfd_tag_pins() and memfd_wait_for_pins() were never
    updated for shmem huge pages: checking page_mapcount() against
    page_count() is hopeless on THP subpages - they need to check
    total_mapcount() against page_count() on THP heads only.
    
    Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
    (compared != 1): either can be justified, but given the non-atomic
    total_mapcount() calculation, it is better now to be strict.  Bear in
    mind that total_mapcount() itself scans all of the THP subpages, when
    choosing to take an XA_CHECK_SCHED latency break.
    
    Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
    page has been swapped out since memfd_tag_pins(), then its refcount must
    have fallen, and so it can safely be untagged.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Hugh Dickins <[email protected]>
    Reported-by: Zeal Robot <[email protected]>
    Reported-by: wangyong <[email protected]>
    Cc: Mike Kravetz <[email protected]>
    Cc: Matthew Wilcox (Oracle) <[email protected]>
    Cc: CGEL ZTE <[email protected]>
    Cc: Kirill A. Shutemov <[email protected]>
    Cc: Song Liu <[email protected]>
    Cc: Yang Yang <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

MIPS: ralink: mt7621: do memory detection on KSEG1 [+ + +]

Author: Chuanhong Guo <[email protected]>
Date:   Fri Feb 11 08:13:44 2022 +0800

    MIPS: ralink: mt7621: do memory detection on KSEG1
    
    [ Upstream commit cc19db8b312a6c75645645f5cc1b45166b109006 ]
    
    It has been reported that the current memory detection code
    occasionally detects more memory than is actually present under some
    bootloaders. The current code tests whether the address space wraps
    around on KSEG0, which is unreliable because KSEG0 is cached.
    
    Rewrite memory size detection to perform the same test on KSEG1
    instead. While at it, this patch also does the following two things:
    1. use a fixed pattern instead of a random function pointer as the
       magic value.
    2. add an additional memory write and a second comparison as part of
       the test to avoid a too-small detection result caused by leftover
       values in memory.
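    A minimal userspace sketch of the wraparound test described above,
    assuming a simulated RAM whose address decoding ignores the high bits,
    so that offsets past the installed size alias back to the start. The
    array, the sizes and TEST_PATTERN are hypothetical, the cached (KSEG0)
    versus uncached (KSEG1) distinction is not modelled, and this is not
    the mt7621 code itself. It shows why the second, inverted write
    protects against a stale value that happens to equal the pattern.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_REAL_WORDS 1024u     /* hypothetical installed memory, in words */
#define TEST_PATTERN 0x12345678u /* hypothetical fixed test pattern */

static uint32_t sim_ram[SIM_REAL_WORDS];

/* Simulated address decoding: high bits are ignored, so offsets alias. */
static uint32_t *sim_word(size_t offset)
{
	return &sim_ram[offset % SIM_REAL_WORDS];
}

/* True if the word at offset `size` is an alias of the word at offset 0. */
static bool wraparound_test(size_t size)
{
	*sim_word(0) = TEST_PATTERN;
	if (*sim_word(0) != *sim_word(size))
		return false;
	/*
	 * Second write with the inverted pattern: a leftover value at `size`
	 * that merely equals TEST_PATTERN cannot also follow this change.
	 */
	*sim_word(0) = ~TEST_PATTERN;
	return *sim_word(0) == *sim_word(size);
}

/* Probe power-of-two sizes; the first one that wraps is the memory size. */
static size_t detect_words(size_t min_words, size_t max_words)
{
	size_t size;

	for (size = min_words; size < max_words; size <<= 1)
		if (wraparound_test(size))
			return size;
	return max_words;
}
```

    With a stale TEST_PATTERN planted at offset 64, a single comparison
    would report 64 words; the extra inverted write rejects that size and
    the probe still settles on 1024.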
    
    Fixes: 139c949f7f0a ("MIPS: ralink: mt7621: add memory detection support")
    Reported-by: Rui Salvaterra <[email protected]>
    Signed-off-by: Chuanhong Guo <[email protected]>
    Tested-by: Sergio Paracuellos <[email protected]>
    Tested-by: Rui Salvaterra <[email protected]>
    Signed-off-by: Thomas Bogendoerfer <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

MIPS: ralink: mt7621: use bitwise NOT instead of logical [+ + +]

Author: Ilya Lipnitskiy <[email protected]>
Date:   Mon Feb 28 17:15:07 2022 -0800

    MIPS: ralink: mt7621: use bitwise NOT instead of logical
    
    [ Upstream commit 5d8965704fe5662e2e4a7e4424a2cbe53e182670 ]
    
    The intention was to invert the bits, not to make them all zero by
    using the logical NOT operator.
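    A small illustration of the difference, using a hypothetical 32-bit
    test pattern: logical NOT collapses any nonzero value to 0, while
    bitwise NOT flips every bit, which is what an inverted second write
    needs.

```c
#include <stdint.h>

#define TEST_PATTERN 0x12345678u /* hypothetical nonzero test pattern */

/* Buggy variant: logical NOT yields 0 for every nonzero input. */
static uint32_t invert_logical(uint32_t v)
{
	return !v;
}

/* Intended variant: bitwise NOT flips each of the 32 bits. */
static uint32_t invert_bitwise(uint32_t v)
{
	return ~v;
}
```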
    
    Fixes: cc19db8b312a ("MIPS: ralink: mt7621: do memory detection on KSEG1")
    Suggested-by: Chuanhong Guo <[email protected]>
    Signed-off-by: Ilya Lipnitskiy <[email protected]>
    Reviewed-by: Sergio Paracuellos <[email protected]>
    Signed-off-by: Thomas Bogendoerfer <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mips: setup: fix setnocoherentio() boolean setting [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Mon Feb 21 09:50:29 2022 -0800

    mips: setup: fix setnocoherentio() boolean setting
    
    commit 1e6ae0e46e32749b130f1823da30cea9aa2a59a0 upstream.
    
    Correct a typo/pasto: setnocoherentio() should set
    dma_default_coherent to false, not true.
    
    Fixes: 14ac09a65e19 ("MIPS: refactor the runtime coherent vs noncoherent DMA indicators")
    Signed-off-by: Randy Dunlap <[email protected]>
    Cc: Christoph Hellwig <[email protected]>
    Cc: Thomas Bogendoerfer <[email protected]>
    Cc: [email protected]
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Thomas Bogendoerfer <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls [+ + +]

Author: Daniel Borkmann <[email protected]>
Date:   Fri Mar 4 15:26:32 2022 +0100

    mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
    
    commit 0708a0afe291bdfe1386d74d5ec1f0c27e8b9168 upstream.
    
    syzkaller was recently triggering an oversized kvmalloc() warning via
    xdp_umem_create().
    
    The triggered warning was added back in 7661809d493b ("mm: don't allow
    oversized kvmalloc() calls"). The warning for huge kvmalloc sizes was
    added in reaction to a security bug where the size was more than
    UINT_MAX but not everything was prepared to handle unsigned long sizes.
    
    Anyway, the AF_XDP related call trace from this syzkaller report was:
    
      kvmalloc include/linux/mm.h:806 [inline]
      kvmalloc_array include/linux/mm.h:824 [inline]
      kvcalloc include/linux/mm.h:829 [inline]
      xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
      xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
      xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
      xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
      __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
      __do_sys_setsockopt net/socket.c:2187 [inline]
      __se_sys_setsockopt net/socket.c:2184 [inline]
      __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Björn mentioned that requests for >2GB allocation can still be valid:
    
      The structure that is being allocated is the page-pinning accounting.
      AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
      still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
      PAGE_SIZE on 64 bit systems). [...]
    
      I could just change from U32_MAX to INT_MAX, but as I stated earlier
      that has a hacky feeling to it. [...] From my perspective, the code
      isn't broken, with the memcg limits in consideration. [...]
    
    Linus says:
    
      [...] Pretty much every time this has come up, the kernel warning has
      shown that yes, the code was broken and there really wasn't a reason
      for doing allocations that big.
    
      Of course, some people would be perfectly fine with the allocation
      failing, they just don't want the warning. I didn't want __GFP_NOWARN
      to shut it up originally because I wanted people to see all those
      cases, but these days I think we can just say "yeah, people can shut
      it up explicitly by saying 'go ahead and fail this allocation, don't
      warn about it'".
    
      So enough time has passed that by now I'd certainly be ok with [it].
    
    Thus allow call-sites to silence such userspace triggered splats if the
    allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
    to kvcalloc() this is already the case, so nothing else needed there.
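    A userspace sketch of the resulting behaviour, assuming an INT_MAX
    cutoff for "oversized", a hypothetical SKETCH_GFP_NOWARN flag standing
    in for __GFP_NOWARN, and malloc() standing in for the real allocator:
    oversized requests always fail, but the warning is only emitted when
    the caller did not opt out.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_GFP_NOWARN 0x1u /* hypothetical stand-in for __GFP_NOWARN */

static unsigned int sketch_warnings; /* counts emitted warnings */

static void *sketch_kvmalloc(size_t size, unsigned int flags)
{
	if (size > (size_t)INT_MAX) {
		/* Always refuse oversized requests; warn only if not opted out. */
		if (!(flags & SKETCH_GFP_NOWARN)) {
			sketch_warnings++;
			fprintf(stderr, "WARNING: oversized allocation: %zu bytes\n",
				size);
		}
		return NULL;
	}
	return malloc(size);
}
```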
    
    Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
    Reported-by: [email protected]
    Suggested-by: Linus Torvalds <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Tested-by: [email protected]
    Cc: Björn Töpel <[email protected]>
    Cc: Magnus Karlsson <[email protected]>
    Cc: Willy Tarreau <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Alexei Starovoitov <[email protected]>
    Cc: Andrii Nakryiko <[email protected]>
    Cc: Jakub Kicinski <[email protected]>
    Cc: David S. Miller <[email protected]>
    Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
    Link: https://lore.kernel.org/bpf/[email protected]
    Reviewed-by: Leon Romanovsky <[email protected]>
    Acked-by: Michal Hocko <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mptcp: Correctly set DATA_FIN timeout when number of retransmits is large [+ + +]

Author: Mat Martineau <[email protected]>
Date:   Thu Feb 24 16:52:59 2022 -0800

    mptcp: Correctly set DATA_FIN timeout when number of retransmits is large
    
    commit 877d11f0332cd2160e19e3313e262754c321fa36 upstream.
    
    Syzkaller with UBSAN uncovered a scenario where a large number of
    DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN
    timeout calculation:
    
    ================================================================================
    UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
    shift exponent 32 is too large for 32-bit type 'unsigned int'
    CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted 5.17.0-rc2-00630-g5fbf21c90c60 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    Workqueue: events mptcp_worker
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
     ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
     __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
     mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
     __mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
     mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
     process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
     worker_thread+0x95/0xe10 kernel/workqueue.c:2454
     kthread+0x2f4/0x3b0 kernel/kthread.c:377
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
     </TASK>
    ================================================================================
    
    This change limits the maximum timeout by limiting the size of the
    shift, which keeps all intermediate values in-bounds.
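    A sketch of the idea of clamping the shift, assuming hypothetical
    millisecond stand-ins of 200 and 120000 for TCP_RTO_MIN and
    TCP_RTO_MAX; it is an illustration, not the upstream patch. The
    backoff doubles per retransmit until the shift reaches
    log2(RTO_MAX / RTO_MIN), so the exponent can never reach 32.

```c
#include <stdint.h>

#define SKETCH_RTO_MIN_MS 200u    /* hypothetical stand-in for TCP_RTO_MIN */
#define SKETCH_RTO_MAX_MS 120000u /* hypothetical stand-in for TCP_RTO_MAX */

/* Floor of log2(v) for v > 0. */
static uint32_t sketch_ilog2(uint32_t v)
{
	uint32_t r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Exponential backoff with the shift clamped to a safe maximum. */
static uint32_t sketch_datafin_timeout(uint32_t retransmits)
{
	uint32_t max_shift = sketch_ilog2(SKETCH_RTO_MAX_MS / SKETCH_RTO_MIN_MS);

	if (retransmits > max_shift)
		retransmits = max_shift;
	return SKETCH_RTO_MIN_MS << retransmits;
}
```

    With these constants the shift is capped at 9, so the timeout stops
    growing at 102400 ms, still below the assumed maximum.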
    
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259
    Fixes: 6477dd39e62c ("mptcp: Retransmit DATA_FIN")
    Acked-by: Paolo Abeni <[email protected]>
    Signed-off-by: Mat Martineau <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/smc: fix connection leak [+ + +]

Author: D. Wythe <[email protected]>
Date:   Thu Feb 24 23:26:19 2022 +0800

    net/smc: fix connection leak
    
    commit 9f1c50cf39167ff71dc5953a3234f3f6eeb8fcb5 upstream.
    
    There is a potential leak under the following execution sequence:
    
    smc_release                             smc_connect_work
    if (sk->sk_state == SMC_INIT)
                                            send_clc_confirm
            tcp_abort();
                                            ...
                                            sk.sk_state = SMC_ACTIVE
    smc_close_active
    switch(sk->sk_state) {
    ...
    case SMC_ACTIVE:
            smc_close_final()
            // then wait peer closed
    
    Unfortunately, tcp_abort() may discard CLC CONFIRM messages that are
    still in the TCP send buffer, in which case our connection token cannot
    be delivered to the server side, which means that we never receive a
    passive close message. Therefore, the connection can never be fully
    closed.
    
    This patch takes a very simple approach to avoid this issue: once the
    state has changed to SMC_ACTIVE after tcp_abort(), we actively abort
    the SMC connection. Since the state was SMC_INIT before tcp_abort(),
    abandoning the complete disconnection process should not cause much
    of a problem.
    
    In fact, this problem may exist whenever the CLC CONFIRM message is
    not received by the server. Whether a timer should be added after
    smc_close_final() needs to be discussed in the future. Even so, this
    patch releases the connection faster in the above case, so it should
    still be valuable.
    
    Fixes: 39f41f367b08 ("net/smc: common release code for non-accepted sockets")
    Signed-off-by: D. Wythe <[email protected]>
    Acked-by: Karsten Graul <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server [+ + +]

Author: D. Wythe <[email protected]>
Date:   Wed Mar 2 21:25:12 2022 +0800

    net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
    
    commit 4940a1fdf31c39f0806ac831cde333134862030b upstream.
    
    The cause of SMC_CLC_DECL_ERR_REGRMB on the server is clear: whether
    a new SMC connection can be accepted depends not only on the limit on
    the number of connections, but also on the number of available rtoken
    entries. Since rtoken release is triggered by the peer, while the
    connection count is decremented locally, a lot can happen in between.
    
    The only thing worth mentioning is that all connection creations are
    now fully protected by the smc_server_lgr_pending lock, so it is
    enough to check only the available entries in rtokens_used_mask.
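    A simplified sketch of why the two checks can disagree, assuming a
    hypothetical capacity of 8 for both the connection limit and the
    rtoken mask (the real limits differ): right after a local close the
    connection count is below the limit, yet every rtoken slot can still
    be in use because the peer has not released its entry yet.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RTOKENS_MAX 8u /* hypothetical capacity, not the real limit */
#define SKETCH_CONNS_MAX 8u   /* hypothetical connection limit */

/* Check that only looks at the locally maintained connection count. */
static bool accept_by_conn_count(unsigned int conns)
{
	return conns < SKETCH_CONNS_MAX;
}

/* Check on the rtoken bitmap: accept only if some slot is actually free. */
static bool accept_by_rtoken_mask(uint32_t rtokens_used_mask)
{
	const uint32_t all = (1u << SKETCH_RTOKENS_MAX) - 1u;

	return (rtokens_used_mask & all) != all;
}
```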
    
    Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)")
    Signed-off-by: D. Wythe <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client [+ + +]

Author: D. Wythe <[email protected]>
Date:   Wed Mar 2 21:25:11 2022 +0800

    net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
    
    commit 0537f0a2151375dcf90c1bbfda6a0aaf57164e89 upstream.
    
    The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB on the
    client is the following execution sequence:
    
    Server Conn A:           Server Conn B:                 Client Conn B:
    
    smc_lgr_unregister_conn
                            smc_lgr_register_conn
                            smc_clc_send_accept     ->
                                                            smc_rtoken_add
    smcr_buf_unuse
                    ->              Client Conn A:
                                    smc_rtoken_delete
    
    smc_lgr_unregister_conn() makes the current link available to be
    assigned to a new incoming connection while smcr_buf_unuse() has not
    executed yet, which means that smc_rtoken_add() may fail because of
    insufficient rtoken entries. Reversing their execution order avoids
    this problem.
    
    Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
    Signed-off-by: D. Wythe <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe() [+ + +]

Author: Zheyu Ma <[email protected]>
Date:   Wed Mar 2 20:24:23 2022 +0800

    net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
    
    commit bd6f1fd5d33dfe5d1b4f2502d3694a7cc13f166d upstream.
    
    During driver initialization, the card info pointer, i.e. the
    variable 'ci', is required. However, the definition of
    'com20020pci_id_table' shows that this field is empty for some
    devices, which causes a null pointer dereference when initializing
    these devices.
    
    The following log reveals it:
    
    [    3.973806] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
    [    3.973819] RIP: 0010:com20020pci_probe+0x18d/0x13e0 [com20020_pci]
    [    3.975181] Call Trace:
    [    3.976208]  local_pci_probe+0x13f/0x210
    [    3.977248]  pci_device_probe+0x34c/0x6d0
    [    3.977255]  ? pci_uevent+0x470/0x470
    [    3.978265]  really_probe+0x24c/0x8d0
    [    3.978273]  __driver_probe_device+0x1b3/0x280
    [    3.979288]  driver_probe_device+0x50/0x370
    
    Fix this by first checking whether 'ci' is a null pointer.
    
    Fixes: 8c14f9c70327 ("ARCNET: add com20020 PCI IDs with metadata")
    Signed-off-by: Zheyu Ma <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: chelsio: cxgb3: check the return value of pci_find_capability() [+ + +]

Author: Jia-Ju Bai <[email protected]>
Date:   Fri Feb 25 04:37:27 2022 -0800

    net: chelsio: cxgb3: check the return value of pci_find_capability()
    
    [ Upstream commit 767b9825ed1765894e569a3d698749d40d83762a ]
    
    The function pci_find_capability() in t3_prep_adapter() can fail, so its
    return value should be checked.
    
    Fixes: 4d22de3e6cc4 ("Add support for the latest 1G/10G Chelsio adapter, T3")
    Reported-by: TOTE Robot <[email protected]>
    Signed-off-by: Jia-Ju Bai <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: dcb: disable softirqs in dcbnl_flush_dev() [+ + +]

Author: Vladimir Oltean <[email protected]>
Date:   Wed Mar 2 21:39:39 2022 +0200

    net: dcb: disable softirqs in dcbnl_flush_dev()
    
    [ Upstream commit 10b6bb62ae1a49ee818fc479cf57b8900176773e ]
    
    Ido Schimmel points out that since commit 52cff74eef5d ("dcbnl : Disable
    software interrupts before taking dcb_lock"), the DCB API can be called
    by drivers from softirq context.
    
    One such in-tree example is the chelsio cxgb4 driver:
    dcb_rpl
    -> cxgb4_dcb_handle_fw_update
       -> dcb_ieee_setapp
    
    If the firmware for this driver happened to send an event which resulted
    in a call to dcb_ieee_setapp() at the exact same time as another
    DCB-enabled interface was unregistering on the same CPU, the softirq
    would deadlock, because the interrupted process was already holding the
    dcb_lock in dcbnl_flush_dev().
    
    Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
    in the rest of the dcbnl code.
    
    Fixes: 91b0383fef06 ("net: dcb: flush lingering app table entries for unregistered devices")
    Reported-by: Ido Schimmel <[email protected]>
    Signed-off-by: Vladimir Oltean <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: dcb: flush lingering app table entries for unregistered devices [+ + +]

Author: Vladimir Oltean <[email protected]>
Date:   Thu Feb 24 18:01:54 2022 +0200

    net: dcb: flush lingering app table entries for unregistered devices
    
    commit 91b0383fef06f20b847fa9e4f0e3054ead0b1a1b upstream.
    
    If I'm not mistaken (and I don't think I am), the way in which the
    dcbnl_ops work is that drivers call dcb_ieee_setapp() and this populates
    the application table with dynamically allocated struct dcb_app_type
    entries that are kept in the module-global dcb_app_list.
    
    However, nobody keeps exact track of these entries, and although
    dcb_ieee_delapp() is supposed to remove them, nobody does so when the
    interface goes away (example: driver unbinds from device). So the
    dcb_app_list will contain lingering entries with an ifindex that no
    longer matches any device in dcb_app_lookup().
    
    Reclaim the lost memory by listening for the NETDEV_UNREGISTER event and
    flushing the app table entries of interfaces that are now gone.
    
    In fact something like this used to be done as part of the initial
    commit (blamed below), but it was done in dcbnl_exit() -> dcb_flushapp(),
    essentially at module_exit time. That became dead code after commit
    7a6b6f515f77 ("DCB: fix kconfig option") which essentially merged
    "tristate config DCB" and "bool config DCBNL" into a single "bool config
    DCB", so net/dcb/dcbnl.c could not be built as a module anymore.
    
    Commit 36b9ad8084bd ("net/dcb: make dcbnl.c explicitly non-modular")
    recognized this and deleted dcbnl_exit() and dcb_flushapp() altogether,
    leaving us with the version we have today.
    
    Since flushing application table entries can and should be done as soon
    as the netdevice disappears, fundamentally the commit that is to blame
    is the one that introduced the design of this API.
    
    Fixes: 9ab933ab2cc8 ("dcbnl: add appliction tlv handlers")
    Signed-off-by: Vladimir Oltean <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: dsa: microchip: fix bridging with more than two member ports [+ + +]

Author: Svenning Sørensen <[email protected]>
Date:   Fri Feb 18 11:27:01 2022 +0000

    net: dsa: microchip: fix bridging with more than two member ports
    
    commit 3d00827a90db6f79abc7cdc553887f89a2e0a184 upstream.
    
    Commit b3612ccdf284 ("net: dsa: microchip: implement multi-bridge support")
    plugged a packet leak between ports that were members of different bridges.
    Unfortunately, this broke another use case, namely that of more than two
    ports that are members of the same bridge.
    
    After that commit, when a port is added to a bridge, hardware bridging
    between other member ports of that bridge will be cleared, preventing
    packet exchange between them.
    
    Fix by ensuring that the Port VLAN Membership bitmap includes any existing
    ports in the bridge, not just the port being added.
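    A simplified model of the membership rule, assuming a hypothetical
    8-port switch with one bit per port and ignoring STP state and VLANs:
    when a port joins, every member's map is recomputed so that it covers
    all other members plus the CPU port, instead of updating only the
    joining port.

```c
#include <stdint.h>

#define SKETCH_MAX_PORTS 8 /* hypothetical switch size */

/* Recompute each member's forwarding map after `new_port` joins. */
static void sketch_bridge_join(uint8_t maps[SKETCH_MAX_PORTS], uint8_t *members,
			       unsigned int new_port, unsigned int cpu_port)
{
	unsigned int p;

	*members |= (uint8_t)(1u << new_port);
	for (p = 0; p < SKETCH_MAX_PORTS; p++) {
		if (!(*members & (1u << p)))
			continue;
		/* Each member may reach every other member plus the CPU port. */
		maps[p] = (uint8_t)((*members & ~(1u << p)) | (1u << cpu_port));
	}
}
```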
    
    Fixes: b3612ccdf284 ("net: dsa: microchip: implement multi-bridge support")
    Signed-off-by: Svenning Sørensen <[email protected]>
    Tested-by: Oleksij Rempel <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: fix up skbs delta_truesize in UDP GRO frag_list [+ + +]

Author: lena wang <[email protected]>
Date:   Tue Mar 1 19:17:09 2022 +0800

    net: fix up skbs delta_truesize in UDP GRO frag_list
    
    commit 224102de2ff105a2c05695e66a08f4b5b6b2d19c upstream.
    
    The truesize of a UDP GRO packet accumulates the truesize of the main
    skb and of the skbs in the main skb's frag_list:
    skb_gro_receive_list
            p->truesize += skb->truesize;
    
    Commit 53475c5dd856 ("net: fix use-after-free when UDP GRO with
    shared fraglist") introduced a truesize increase for frag_list skbs.
    When an skb is uncloned, pskb_expand_head() is called and the
    truesize of frag_list skbs may increase. This can occur when the
    allocator uses __netdev_alloc_skb() without going through
    __alloc_skb(): that path does not use ksize(len) to calculate
    truesize, while pskb_expand_head() does.
    skb_segment_list
    err = skb_unclone(nskb, GFP_ATOMIC);
    pskb_expand_head
            if (!skb->sk || skb->destructor == sock_edemux)
                    skb->truesize += size - osize;
    
    If the increased truesize is accumulated into delta_truesize, the
    delta will be larger than before, and can even exceed the previous
    total truesize if there are many skbs in the frag_list. The main skb's
    truesize then becomes smaller, possibly negative, which wraps around
    to a huge value for an unsigned int. The subsequent memory check then
    drops this abnormal skb.
    
    To avoid this error, use the original truesize when segmenting the
    main skb.
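    A bookkeeping-only sketch of the arithmetic, assuming hypothetical
    segment sizes and a hypothetical growth caused by unclone: subtracting
    the grown truesize of each frag_list skb from the main skb can wrap
    the unsigned total, while subtracting the truesize that GRO originally
    accumulated restores the main skb's own value. No real sk_buff is
    involved.

```c
#include <stddef.h>

/* Hypothetical model: only truesize bookkeeping, no real skbs. */
struct sketch_seg {
	unsigned int truesize;       /* truesize when GRO merged the segment */
	unsigned int unclone_growth; /* extra bytes a later unclone might add */
};

/*
 * Truesize left on the main skb after its frag_list segments are split off.
 * use_original selects the fixed behaviour: subtract each segment's truesize
 * as accumulated by GRO, not its value after uncloning.
 */
static unsigned int main_truesize_after_split(unsigned int main_truesize,
					      const struct sketch_seg *segs,
					      size_t n, int use_original)
{
	unsigned int delta = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int grown = segs[i].truesize + segs[i].unclone_growth;

		delta += use_original ? segs[i].truesize : grown;
	}
	/* Unsigned arithmetic: wraps to a huge value if delta is too large. */
	return main_truesize - delta;
}
```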
    
    Fixes: 53475c5dd856 ("net: fix use-after-free when UDP GRO with shared fraglist")
    Signed-off-by: lena wang <[email protected]>
    Acked-by: Paolo Abeni <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: ipa: add an interconnect dependency [+ + +]

Author: Alex Elder <[email protected]>
Date:   Tue Mar 1 05:34:40 2022 -0600

    net: ipa: add an interconnect dependency
    
    commit 1dba41c9d2e2dc94b543394974f63d55aa195bfe upstream.
    
    In order to function, the IPA driver very clearly requires the
    interconnect framework to be enabled in the kernel configuration.
    State that dependency in the Kconfig file.
    
    This became a problem when CONFIG_COMPILE_TEST support was added.
    Non-Qualcomm platforms won't necessarily enable CONFIG_INTERCONNECT.
    
    Reported-by: kernel test robot <[email protected]>
    Fixes: 38a4066f593c5 ("net: ipa: support COMPILE_TEST")
    Signed-off-by: Alex Elder <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: ipa: fix a build dependency [+ + +]

Author: Alex Elder <[email protected]>
Date:   Fri Feb 25 14:15:30 2022 -0600

    net: ipa: fix a build dependency
    
    commit caef14b7530c065fb85d54492768fa48fdb5093e upstream.
    
    An IPA build problem arose in the linux-next tree the other day.
    The problem is that a recent commit adds a new dependency on some
    code, and the Kconfig file for IPA doesn't reflect that dependency.
    As a result, some configurations can fail to build (particularly
    when COMPILE_TEST is enabled).
    
    The recent patch adds calls to qmp_get(), qmp_put(), and qmp_send(),
    and those are built based on the QCOM_AOSS_QMP config option.  If
    that symbol is not defined, stubs are defined, so we just need to
    ensure QCOM_AOSS_QMP is compatible with QCOM_IPA, or it's not
    defined.
    
    Reported-by: Randy Dunlap <[email protected]>
    Fixes: 34a081761e4e3 ("net: ipa: request IPA register values be retained")
    Signed-off-by: Alex Elder <[email protected]>
    Tested-by: Randy Dunlap <[email protected]>
    Acked-by: Randy Dunlap <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: ipv6: ensure we call ipv6_mc_down() at most once [+ + +]

Author: [email protected] <[email protected]>
Date:   Thu Feb 24 10:06:49 2022 +0100

    net: ipv6: ensure we call ipv6_mc_down() at most once
    
    commit 9995b408f17ff8c7f11bc725c8aa225ba3a63b1c upstream.
    
    There are two reasons for addrconf_notify() to be called with NETDEV_DOWN:
    either the network device is actually going down, or IPv6 was disabled
    on the interface.
    
    If either of them stays down while the other is toggled, we repeatedly
    call the code for NETDEV_DOWN, including ipv6_mc_down(), while never
    calling the corresponding ipv6_mc_up() in between. This will cause a
    new entry in idev->mc_tomb to be allocated for each multicast group
    the interface is subscribed to, which in turn leaks one struct ifmcaddr6
    per nontrivial multicast group the interface is subscribed to.
    
    The following reproducer will leak at least $n objects:
    
    ip addr add ff2e::4242/32 dev eth0 autojoin
    sysctl -w net.ipv6.conf.eth0.disable_ipv6=1
    for i in $(seq 1 $n); do
            ip link set up eth0; ip link set down eth0
    done
    
    Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the
    sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2)
    can also be used to create a nontrivial idev->mc_list, which will then
    leak objects with the right up/down sequence.
    
    Taking both sources of NETDEV_DOWN events into account, the
    interface's IPv6 state should be considered:
    
     - not ready if the network interface is not ready OR IPv6 is disabled
       for it
     - ready if the network interface is ready AND IPv6 is enabled for it
    
    The functions ipv6_mc_up() and ipv6_mc_down() should only be run when
    this state changes.
    
    Implement this by remembering when the IPv6 state is ready, and only
    run ipv6_mc_down() if it actually changed from ready to not ready.
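    A simplified model of remembering the ready state, assuming a
    hypothetical struct with counters standing in for ipv6_mc_up() and
    ipv6_mc_down(); it does not model link readiness, idev->mc_tomb or
    addrconf_link_ready(). The up path returns early while IPv6 is
    disabled, and the down path only acts on a ready to not-ready change.

```c
#include <stdbool.h>

/* Hypothetical model of one interface's multicast state. */
struct sketch_idev {
	bool ipv6_enabled;          /* disable_ipv6 == 0 */
	bool mc_ready;              /* remembered "IPv6 ready" state */
	unsigned int mc_up_calls;   /* stands in for ipv6_mc_up() */
	unsigned int mc_down_calls; /* stands in for ipv6_mc_down() */
};

/* NETDEV_UP / NETDEV_CHANGE: returns early while IPv6 is disabled. */
static void sketch_netdev_up(struct sketch_idev *idev)
{
	if (!idev->ipv6_enabled)
		return;
	idev->mc_up_calls++;
	idev->mc_ready = true;
}

/* NETDEV_DOWN: tear down multicast state only on ready -> not ready. */
static void sketch_netdev_down(struct sketch_idev *idev)
{
	if (!idev->mc_ready)
		return;
	idev->mc_down_calls++;
	idev->mc_ready = false;
}
```

    Replaying the reproducer's disable-then-toggle sequence against this
    model calls the mc_down stand-in once, not once per iteration.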
    
    The other direction (not ready -> ready) already works correctly, as:
    
     - the interface notification triggered codepath for NETDEV_UP /
       NETDEV_CHANGE returns early if ipv6 is disabled, and
     - the disable_ipv6=0 triggered codepath skips fully initializing the
       interface as long as addrconf_link_ready(dev) returns false
     - calling ipv6_mc_up() repeatedly does not leak anything
    
    Fixes: 3ce62a84d53c ("ipv6: exit early in addrconf_notify() if IPv6 is disabled")
    Signed-off-by: Johannes Nixdorf <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: sparx5: Fix add vlan when invalid operation [+ + +]

Author: Casper Andersson <[email protected]>
Date:   Fri Feb 25 11:15:16 2022 +0100

    net: sparx5: Fix add vlan when invalid operation
    
    [ Upstream commit b3a34dc362c03215031b268fcc0b988e69490231 ]
    
    Check if operation is valid before changing any
    settings in hardware. Otherwise it results in
    changes being made despite it not being a valid
    operation.
    
    Fixes: 78eab33bb68b ("net: sparx5: add vlan support")
    
    Signed-off-by: Casper Andersson <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: stmmac: enhance XDP ZC driver level switching performance [+ + +]

Author: Ong Boon Leong <[email protected]>
Date:   Thu Nov 11 22:39:49 2021 +0800

    net: stmmac: enhance XDP ZC driver level switching performance
    
    [ Upstream commit ac746c8520d9d056b6963ecca8ff1da9929d02f1 ]
    
    The previous stmmac_xdp_set_prog() implementation uses stmmac_release()
    and stmmac_open(), which tear down the PHY device and trigger an
    undesirable autonegotiation that causes a delay whenever AF_XDP ZC is
    set up.
    
    This patch introduces two new functions, stmmac_xdp_release() and
    stmmac_xdp_open(), that tear down only what is necessary (DMA
    descriptors, buffers, NAPI processing and IRQs) and re-establish them
    accordingly.
    
    As a result of this enhancement, we get rid of the transient state
    introduced by link auto-negotiation:
    
    $ ./xdpsock -i eth0 -t -z
    
     sock0@eth0:0 txonly xdp-drv
                       pps            pkts           1.00
    rx                 0              0
    tx                 634444         634560
    
     sock0@eth0:0 txonly xdp-drv
                       pps            pkts           1.00
    rx                 0              0
    tx                 632330         1267072
    
     sock0@eth0:0 txonly xdp-drv
                       pps            pkts           1.00
    rx                 0              0
    tx                 632438         1899584
    
     sock0@eth0:0 txonly xdp-drv
                       pps            pkts           1.00
    rx                 0              0
    tx                 632502         2532160
    
    Reported-by: Kurt Kanzenbach <[email protected]>
    Signed-off-by: Ong Boon Leong <[email protected]>
    Tested-by: Kurt Kanzenbach <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: stmmac: fix return value of __setup handler

Author: Randy Dunlap <[email protected]>
Date:   Wed Feb 23 19:35:36 2022 -0800

    net: stmmac: fix return value of __setup handler
    
    commit e01b042e580f1fbf4fd8da467442451da00c7a90 upstream.
    
    __setup() handlers should return 1 on success, i.e., the parameter
    has been handled. A return of 0 causes the "option=value" string to be
    added to init's environment strings, polluting it.
    
    Fixes: 47dd7a540b8a ("net: add support for STMicroelectronics Ethernet controllers.")
    Fixes: f3240e2811f0 ("stmmac: remove warning when compile as built-in (V2)")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Link: lore.kernel.org/r/[email protected]
    Cc: Giuseppe Cavallaro <[email protected]>
    Cc: Alexandre Torgue <[email protected]>
    Cc: Jose Abreu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
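
The __setup() return-value convention described above can be modelled in user space. The sketch below is a hypothetical, simplified model of boot-option dispatch, not the code in init/main.c; the names boot_param, dispatch_option, good_handler, buggy_handler and the "demo=" option are illustrative. A handler that returns 1 consumes its option, while one that returns 0 lets the "option=value" string fall through into init's environment.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified model of boot-option dispatch. It is NOT the
 * code in init/main.c; it only illustrates the return-value convention.
 */
typedef int (*setup_fn)(const char *val);

struct boot_param {
	const char *prefix;     /* e.g. "demo=" (illustrative name) */
	setup_fn handler;
};

#define MAX_ENV 8
static const char *init_env[MAX_ENV];   /* stands in for init's envp */
static size_t init_env_len;

/* Correct convention: return 1 once the option has been handled. */
static int good_handler(const char *val)
{
	(void)val;              /* option parsing omitted */
	return 1;
}

/* Buggy convention: returning 0 tells the caller "not handled". */
static int buggy_handler(const char *val)
{
	(void)val;
	return 0;
}

static void dispatch_option(const struct boot_param *params, size_t n,
			    const char *opt)
{
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(params[i].prefix);

		if (strncmp(opt, params[i].prefix, len) == 0 &&
		    params[i].handler(opt + len))
			return;         /* consumed: nothing leaks to init */
	}
	/* Unconsumed "option=value" strings end up in init's environment. */
	if (strchr(opt, '=') && init_env_len < MAX_ENV)
		init_env[init_env_len++] = opt;
}
```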

net: stmmac: only enable DMA interrupts when ready

Author: Vincent Whitchurch <[email protected]>
Date:   Thu Feb 24 12:38:29 2022 +0100

    net: stmmac: only enable DMA interrupts when ready
    
    [ Upstream commit 087a7b944c5db409f7c1a68bf4896c56ba54eaff ]
    
    In this driver's ->ndo_open() callback, it enables DMA interrupts,
    starts the DMA channels, then requests interrupts with request_irq(),
    and then finally enables napi.
    
    If RX DMA interrupts are received before napi is enabled, no processing
    is done because napi_schedule_prep() will return false.  If the network
    has a lot of broadcast/multicast traffic, then the RX ring could fill up
    completely before napi is enabled.  When this happens, no further RX
    interrupts will be delivered, and the driver will fail to receive any
    packets.
    
    Fix this by only enabling DMA interrupts after all other initialization
    is complete.
    
    Fixes: 523f11b5d4fd72efb ("net: stmmac: move hardware setup for stmmac_open to new function")
    Reported-by: Lars Persson <[email protected]>
    Signed-off-by: Vincent Whitchurch <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net: stmmac: perserve TX and RX coalesce value during XDP setup

Author: Ong Boon Leong <[email protected]>
Date:   Wed Nov 24 19:40:19 2021 +0800

    net: stmmac: perserve TX and RX coalesce value during XDP setup
    
    commit 61da6ac715700bcfeef50d187e15c6cc7c9d079b upstream.
    
    When an XDP program is loaded, the previous TX and RX coalesce values
    should not be reset to their defaults. Keeping them avoids needlessly
    reconfiguring coalesce values that were working fine before.
    
    Fixes: ac746c8520d9 ("net: stmmac: enhance XDP ZC driver level switching performance")
    Signed-off-by: Ong Boon Leong <[email protected]>
    Tested-by: Kurt Kanzenbach <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: sxgbe: fix return value of __setup handler

Author: Randy Dunlap <[email protected]>
Date:   Wed Feb 23 19:35:28 2022 -0800

    net: sxgbe: fix return value of __setup handler
    
    commit 50e06ddceeea263f57fe92baa677c638ecd65bb6 upstream.
    
    __setup() handlers should return 1 on success, i.e., the parameter
    has been handled. A return of 0 causes the "option=value" string to be
    added to init's environment strings, polluting it.
    
    Fixes: acc18c147b22 ("net: sxgbe: add EEE(Energy Efficient Ethernet) for Samsung sxgbe")
    Fixes: 1edb9ca69e8a ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Link: lore.kernel.org/r/[email protected]
    Cc: Siva Reddy <[email protected]>
    Cc: Girish K S <[email protected]>
    Cc: Byungho An <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990

Author: Daniele Palmas <[email protected]>
Date:   Tue Feb 15 12:13:35 2022 +0100

    net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
    
    [ Upstream commit 21e8a96377e6b6debae42164605bf9dcbe5720c5 ]
    
    Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990
    0x1071 composition in order to avoid bind error.
    
    Signed-off-by: Daniele Palmas <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: fix use-after-free in __nf_register_net_hook()

Author: Eric Dumazet <[email protected]>
Date:   Sun Feb 27 10:01:41 2022 -0800

    netfilter: fix use-after-free in __nf_register_net_hook()
    
    commit 56763f12b0f02706576a088e85ef856deacc98a0 upstream.
    
    We must not dereference @new_hooks after nf_hook_mutex has been released,
    because other threads might have freed our allocated hooks already.
    
    BUG: KASAN: use-after-free in nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline]
    BUG: KASAN: use-after-free in hooks_validate net/netfilter/core.c:171 [inline]
    BUG: KASAN: use-after-free in __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438
    Read of size 2 at addr ffff88801c1a8000 by task syz-executor237/4430
    
    CPU: 1 PID: 4430 Comm: syz-executor237 Not tainted 5.17.0-rc5-syzkaller-00306-g2293be58d6a1 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
     print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
     __kasan_report mm/kasan/report.c:442 [inline]
     kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
     nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline]
     hooks_validate net/netfilter/core.c:171 [inline]
     __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438
     nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571
     nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587
     nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218
     synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81
     xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038
     check_target net/ipv6/netfilter/ip6_tables.c:530 [inline]
     find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573
     translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735
     do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline]
     do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
     nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
     ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1024
     rawv6_setsockopt+0xd3/0x6a0 net/ipv6/raw.c:1084
     __sys_setsockopt+0x2db/0x610 net/socket.c:2180
     __do_sys_setsockopt net/socket.c:2191 [inline]
     __se_sys_setsockopt net/socket.c:2188 [inline]
     __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7f65a1ace7d9
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 71 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f65a1a7f308 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f65a1ace7d9
    RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000003
    RBP: 00007f65a1b574c8 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000020000000 R11: 0000000000000246 R12: 00007f65a1b55130
    R13: 00007f65a1b574c0 R14: 00007f65a1b24090 R15: 0000000000022000
     </TASK>
    
    The buggy address belongs to the page:
    page:ffffea0000706a00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c1a8
    flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
    raw: 00fff00000000000 ffffea0001c1b108 ffffea000046dd08 0000000000000000
    raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    page_owner tracks the page as freed
    page last allocated via order 2, migratetype Unmovable, gfp_mask 0x52dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO), pid 4430, ts 1061781545818, free_ts 1061791488993
     prep_new_page mm/page_alloc.c:2434 [inline]
     get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165
     __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389
     __alloc_pages_node include/linux/gfp.h:572 [inline]
     alloc_pages_node include/linux/gfp.h:595 [inline]
     kmalloc_large_node+0x62/0x130 mm/slub.c:4438
     __kmalloc_node+0x35a/0x4a0 mm/slub.c:4454
     kmalloc_node include/linux/slab.h:604 [inline]
     kvmalloc_node+0x97/0x100 mm/util.c:580
     kvmalloc include/linux/slab.h:731 [inline]
     kvzalloc include/linux/slab.h:739 [inline]
     allocate_hook_entries_size net/netfilter/core.c:61 [inline]
     nf_hook_entries_grow+0x140/0x780 net/netfilter/core.c:128
     __nf_register_net_hook+0x144/0x820 net/netfilter/core.c:429
     nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571
     nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587
     nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218
     synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81
     xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038
     check_target net/ipv6/netfilter/ip6_tables.c:530 [inline]
     find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573
     translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735
     do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline]
     do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
     nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
    page last free stack trace:
     reset_page_owner include/linux/page_owner.h:24 [inline]
     free_pages_prepare mm/page_alloc.c:1352 [inline]
     free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404
     free_unref_page_prepare mm/page_alloc.c:3325 [inline]
     free_unref_page+0x19/0x690 mm/page_alloc.c:3404
     kvfree+0x42/0x50 mm/util.c:613
     rcu_do_batch kernel/rcu/tree.c:2527 [inline]
     rcu_core+0x7b1/0x1820 kernel/rcu/tree.c:2778
     __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
    
    Memory state around the buggy address:
     ffff88801c1a7f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ffff88801c1a7f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    >ffff88801c1a8000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                       ^
     ffff88801c1a8080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ffff88801c1a8100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    
    Fixes: 2420b79f8c18 ("netfilter: debug: check for sorted array")
    Signed-off-by: Eric Dumazet <[email protected]>
    Reported-by: syzbot <[email protected]>
    Acked-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
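
The rule behind this fix, that a pointer published under a mutex must not be dereferenced after the mutex is released, can be sketched in user space with pthreads. In this hypothetical model, hook_entries, entries_sorted() and publish_hooks() only stand in for the netfilter structures and for hooks_validate(); the point is that the debug check runs before the unlock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hook array published under a mutex. */
struct hook_entries {
	size_t num;
	int priority[8];
};

static pthread_mutex_t hook_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct hook_entries *published;   /* protected by hook_mutex */

/* Debug check in the spirit of hooks_validate(): priorities sorted. */
static bool entries_sorted(const struct hook_entries *e)
{
	for (size_t i = 1; i < e->num; i++)
		if (e->priority[i - 1] > e->priority[i])
			return false;
	return true;
}

/*
 * Publish @new_hooks and validate it while the mutex is still held.
 * Once the mutex is released, another thread may replace and free the
 * array, so @new_hooks must not be dereferenced afterwards.
 */
static bool publish_hooks(struct hook_entries *new_hooks)
{
	bool ok;

	pthread_mutex_lock(&hook_mutex);
	published = new_hooks;
	ok = entries_sorted(new_hooks);     /* safe: lock held */
	pthread_mutex_unlock(&hook_mutex);
	/* Do NOT touch new_hooks past this point. */
	return ok;
}
```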

netfilter: nf_queue: don't assume sk is full socket

Author: Florian Westphal <[email protected]>
Date:   Fri Feb 25 14:02:41 2022 +0100

    netfilter: nf_queue: don't assume sk is full socket
    
    commit 747670fd9a2d1b7774030dba65ca022ba442ce71 upstream.
    
    There is no guarantee that state->sk refers to a full socket.
    
    If the refcount drops to 0, sock_put() calls sk_free(), which then
    operates on garbage fields.
    
    I'd like to thank Oleksandr Natalenko and Jiri Benc for considerable
    debug work and pointing out state->sk oddities.
    
    Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
    Tested-by: Oleksandr Natalenko <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: nf_queue: fix possible use-after-free

Author: Florian Westphal <[email protected]>
Date:   Mon Feb 28 06:22:22 2022 +0100

    netfilter: nf_queue: fix possible use-after-free
    
    commit c3873070247d9e3c7a6b0cf9bf9b45e8018427b1 upstream.
    
    Eric Dumazet says:
      The sock_hold() side seems suspect, because there is no guarantee
      that sk_refcnt is not already 0.
    
    On failure, we cannot queue the packet and need to indicate an
    error.  The packet will be dropped by the caller.
    
    v2: split skb prefetch hunk into separate change
    
    Fixes: 271b72c7fa82c ("udp: RCU handling for Unicast packets.")
    Reported-by: Eric Dumazet <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
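
Taking a reference on an object whose count may already be 0 requires an "increment unless zero" operation rather than a plain increment such as sock_hold(). A minimal user-space sketch of that idiom with C11 atomics follows, modelled on the kernel's refcount_inc_not_zero(); the message does not say which helper the patch uses, and queue_packet() is a hypothetical caller that fails so the packet can be dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * User-space model of refcount_inc_not_zero(): take a reference only if
 * the object is still live (count > 0). A plain increment would
 * "resurrect" an object whose count already reached 0 and which may be
 * in the process of being freed.
 */
static bool ref_inc_not_zero(atomic_int *refs)
{
	int old = atomic_load(refs);

	do {
		if (old == 0)
			return false;   /* object is dying: do not touch it */
	} while (!atomic_compare_exchange_weak(refs, &old, old + 1));

	return true;
}

/* Hypothetical queueing step: fail (caller drops) if no ref was taken. */
static int queue_packet(atomic_int *sk_refs)
{
	if (!ref_inc_not_zero(sk_refs))
		return -1;              /* caller drops the packet */
	/* ... enqueue while holding the reference ... */
	return 0;
}
```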

netfilter: nf_queue: handle socket prefetch

Author: Florian Westphal <[email protected]>
Date:   Tue Mar 1 00:46:19 2022 +0100

    netfilter: nf_queue: handle socket prefetch
    
    commit 3b836da4081fa585cf6c392f62557496f2cb0efe upstream.
    
    If someone combines BPF socket assignment and nf_queue, we will queue
    an skb that references a struct sock whose reference count was not
    incremented.
    
    Once we leave RCU protection, there is no guarantee that skb->sk is
    still valid.
    
    For the refcount-less skb->sk case, try to increment the reference count
    and then override the destructor.
    
    In case of failure we have two choices: orphan the skb and 'delete'
    preselect or let nf_queue() drop the packet.
    
    Do the latter; it should not happen during normal operation.
    
    Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
    Acked-by: Joe Stringer <[email protected]>
    Signed-off-by: Florian Westphal <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant

Author: Eric Dumazet <[email protected]>
Date:   Tue Feb 22 10:13:31 2022 -0800

    netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant
    
    [ Upstream commit ae089831ff28a115908b8d796f667c2dadef1637 ]
    
    While kfree_rcu(ptr) _is_ supported, it has some limitations.
    
    Given that 99.99% of kfree_rcu() users [1] use the legacy
    two-parameter variant, and @catchall objects do have an rcu head,
    simply use it.
    
    Choice of kfree_rcu(ptr) variant was probably not intentional.
    
    [1] including calls from net/netfilter/nf_tables_api.c
    
    Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
    Signed-off-by: Eric Dumazet <[email protected]>
    Reviewed-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

nl80211: Handle nla_memdup failures in handle_nan_filter

Author: Jiasheng Jiang <[email protected]>
Date:   Tue Mar 1 18:00:20 2022 +0800

    nl80211: Handle nla_memdup failures in handle_nan_filter
    
    [ Upstream commit 6ad27f522cb3b210476daf63ce6ddb6568c0508b ]
    
    Since nla_memdup() can fail, check its return value.
    
    Fixes: a442b761b24b ("cfg80211: add add_nan_func / del_nan_func")
    Signed-off-by: Jiasheng Jiang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
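
The check-and-unwind pattern this fix calls for can be sketched as follows. memdup_attr() is a hypothetical user-space stand-in for nla_memdup() built on malloc() and memcpy(), and struct filter and copy_filters() are illustrative names, not the nl80211 code: when a duplication fails, the copies made so far are freed and an error is reported instead of continuing with a NULL pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for nla_memdup(): duplicate a payload. */
static void *memdup_attr(const void *src, size_t len)
{
	void *dst = malloc(len);

	if (dst)
		memcpy(dst, src, len);
	return dst;                 /* NULL on allocation failure */
}

struct filter {
	void *data;
	size_t len;
};

/* Copy each filter; on failure release what was copied and report it. */
static int copy_filters(struct filter *out, const struct filter *in,
			size_t n)
{
	for (size_t i = 0; i < n; i++) {
		out[i].data = memdup_attr(in[i].data, in[i].len);
		if (!out[i].data) {
			while (i--)
				free(out[i].data);
			return -1;      /* the kernel would use -ENOMEM */
		}
		out[i].len = in[i].len;
	}
	return 0;
}
```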

ntb: intel: fix port config status offset for SPR

Author: Dave Jiang <[email protected]>
Date:   Thu Jan 27 13:31:12 2022 -0700

    ntb: intel: fix port config status offset for SPR
    
    commit d5081bf5dcfb1cb83fb538708b0ac07a10a79cc4 upstream.
    
    The port configuration status field moved from bit 12 on ICX to bit 14
    on SPR. Link status detection kept working on SPR only by chance,
    because bit 12 there is a configuration bit that stays in sync with the
    status bit. Fix this by checking for an SPR device and reading the
    correct status bit.
    
    Fixes: 26bfe3d0b227 ("ntb: intel: Add Icelake (gen4) support for Intel NTB")
    Tested-by: Jerry Dai <[email protected]>
    Signed-off-by: Dave Jiang <[email protected]>
    Signed-off-by: Jon Mason <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
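
Selecting the status bit by device generation can be sketched like this. The bit positions (12 on ICX, 14 on SPR) come from the commit message; the macro names, the helper and the 32-bit register width are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in the commit message; names hypothetical. */
#define ICX_PORT_CFG_STS_BIT 12
#define SPR_PORT_CFG_STS_BIT 14

/*
 * Hypothetical helper: read the port configuration status bit at the
 * position used by the given device generation. On SPR, bit 12 is a
 * configuration bit, so reading it only works by coincidence.
 */
static bool port_cfg_status(uint32_t reg, bool is_spr)
{
	unsigned int bit = is_spr ? SPR_PORT_CFG_STS_BIT
				  : ICX_PORT_CFG_STS_BIT;

	return (reg >> bit) & 1u;
}
```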

pinctrl: sunxi: Use unique lockdep classes for IRQs

Author: Samuel Holland <[email protected]>
Date:   Tue Feb 15 22:00:36 2022 -0600

    pinctrl: sunxi: Use unique lockdep classes for IRQs
    
    commit bac129dbc6560dfeb634c03f0c08b78024e71915 upstream.
    
    This driver, like several others, uses a chained IRQ for each GPIO bank,
    and forwards .irq_set_wake to the GPIO bank's upstream IRQ. As a result,
    a call to irq_set_irq_wake() needs to lock both the upstream and
    downstream irq_desc's. Lockdep considers this to be a possible deadlock
    when the irq_desc's share lockdep classes, which they do by default:
    
     ============================================
     WARNING: possible recursive locking detected
     5.17.0-rc3-00394-gc849047c2473 #1 Not tainted
     --------------------------------------------
     init/307 is trying to acquire lock:
     c2dfe27c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
    
     but task is already holding lock:
     c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
    
     other info that might help us debug this:
      Possible unsafe locking scenario:
    
            CPU0
            ----
       lock(&irq_desc_lock_class);
       lock(&irq_desc_lock_class);
    
      *** DEADLOCK ***
    
      May be due to missing lock nesting notation
    
     4 locks held by init/307:
      #0: c1f29f18 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot+0x90/0x23c
      #1: c20f7760 (&dev->mutex){....}-{3:3}, at: device_shutdown+0xf4/0x224
      #2: c2e804d8 (&dev->mutex){....}-{3:3}, at: device_shutdown+0x104/0x224
      #3: c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
    
     stack backtrace:
     CPU: 0 PID: 307 Comm: init Not tainted 5.17.0-rc3-00394-gc849047c2473 #1
     Hardware name: Allwinner sun8i Family
      unwind_backtrace from show_stack+0x10/0x14
      show_stack from dump_stack_lvl+0x68/0x90
      dump_stack_lvl from __lock_acquire+0x1680/0x31a0
      __lock_acquire from lock_acquire+0x148/0x3dc
      lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
      _raw_spin_lock_irqsave from __irq_get_desc_lock+0x58/0xa0
      __irq_get_desc_lock from irq_set_irq_wake+0x2c/0x19c
      irq_set_irq_wake from irq_set_irq_wake+0x13c/0x19c
        [tail call from sunxi_pinctrl_irq_set_wake]
      irq_set_irq_wake from gpio_keys_suspend+0x80/0x1a4
      gpio_keys_suspend from gpio_keys_shutdown+0x10/0x2c
      gpio_keys_shutdown from device_shutdown+0x180/0x224
      device_shutdown from __do_sys_reboot+0x134/0x23c
      __do_sys_reboot from ret_fast_syscall+0x0/0x1c
    
    However, this can never deadlock because the upstream and downstream
    IRQs are never the same (nor do they even involve the same irqchip).
    
    Silence this erroneous lockdep splat by applying what appears to be the
    usual fix of moving the GPIO IRQs to separate lockdep classes.
    
    Fixes: a59c99d9eaf9 ("pinctrl: sunxi: Forward calls to irq_set_irq_wake")
    Reported-by: Guenter Roeck <[email protected]>
    Signed-off-by: Samuel Holland <[email protected]>
    Reviewed-by: Jernej Skrabec <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Linus Walleij <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup

Author: Mario Limonciello <[email protected]>
Date:   Wed Feb 23 11:52:37 2022 -0600

    platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup
    
    commit 68af28426b3ca1bf9ba21c7d8bdd0ff639e5134c upstream.
    
    commit 59348401ebed ("platform/x86: amd-pmc: Add special handling for
    timer based S0i3 wakeup") adds support for using another platform timer
    in lieu of the RTC which doesn't work properly on some systems. This path
    was validated and worked well before submission. During the 5.16-rc1 merge
    window other patches were merged that caused this to stop working properly.
    
    When this feature was used with 5.16-rc1 or later, some OEM laptops with
    the matching firmware requirements from that commit would shut down
    instead of programming a timer-based wakeup.
    
    This was bisected to commit 8d89835b0467 ("PM: suspend: Do not pause
    cpuidle in the suspend-to-idle path").  This wasn't supposed to cause any
    negative impacts and also tested well on both Intel and ARM platforms.
    However this changed the semantics of when CPUs are allowed to be in the
    deepest state. For the AMD systems in question it appears this causes a
    firmware crash for timer based wakeup.
    
    It's hypothesized to be caused by the `amd-pmc` driver sending `OS_HINT`
    and all the CPUs going into a deep state while the timer is still being
    programmed. It's likely a firmware bug, but to avoid it don't allow setting
    CPUs into the deepest state while using CZN timer wakeup path.
    
    If it is later discovered that this also occurs during "regular"
    suspends without a timer, or on other silicon, this may be expanded to
    run in the suspend path for more scenarios.
    
    Cc: [email protected] # 5.16+
    Suggested-by: Rafael J. Wysocki <[email protected]>
    Link: https://lore.kernel.org/linux-acpi/BL1PR12MB51570F5BD05980A0DCA1F3F4E23A9@BL1PR12MB5157.namprd12.prod.outlook.com/T/#mee35f39c41a04b624700ab2621c795367f19c90e
    Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
    Fixes: 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions")
    Fixes: 59348401ebed ("platform/x86: amd-pmc: Add special handling for timer based S0i3 wakeup")
    Reviewed-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Mario Limonciello <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Hans de Goede <[email protected]>
    Signed-off-by: Hans de Goede <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

proc: fix documentation and description of pagemap

Author: Yun Zhou <[email protected]>
Date:   Fri Mar 4 20:29:07 2022 -0800

    proc: fix documentation and description of pagemap
    
    commit dd21bfa425c098b95ca86845f8e7d1ec1ddf6e4a upstream.
    
    Bit 57 has been exported to report uffd-wp write protection since commit
    fb8e37f35a2f ("mm/pagemap: export uffd-wp protection information");
    update the documentation and description accordingly to avoid
    unnecessary confusion.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information")
    Signed-off-by: Yun Zhou <[email protected]>
    Reviewed-by: Peter Xu <[email protected]>
    Cc: Jonathan Corbet <[email protected]>
    Cc: Tiberiu A Georgescu <[email protected]>
    Cc: Florian Schmidt <[email protected]>
    Cc: Ivan Teterevkov <[email protected]>
    Cc: SeongJae Park <[email protected]>
    Cc: Yang Shi <[email protected]>
    Cc: David Hildenbrand <[email protected]>
    Cc: Axel Rasmussen <[email protected]>
    Cc: Miaohe Lin <[email protected]>
    Cc: Andrea Arcangeli <[email protected]>
    Cc: Colin Cross <[email protected]>
    Cc: Alistair Popple <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
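
For reference, a sketch of decoding a /proc/pid/pagemap entry, including the uffd-wp bit 57 that this patch documents. The bit positions follow Documentation/admin-guide/mm/pagemap.rst; the macro and struct names are illustrative. The PFN field may read as 0 for readers without CAP_SYS_ADMIN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a pagemap entry; macro names are illustrative. */
#define PM_PFN_MASK        ((UINT64_C(1) << 55) - 1)  /* bits 0-54 */
#define PM_SOFT_DIRTY      (UINT64_C(1) << 55)
#define PM_MMAP_EXCLUSIVE  (UINT64_C(1) << 56)
#define PM_UFFD_WP         (UINT64_C(1) << 57)        /* bit 57: uffd-wp */
#define PM_FILE_OR_SHANON  (UINT64_C(1) << 61)
#define PM_SWAPPED         (UINT64_C(1) << 62)
#define PM_PRESENT         (UINT64_C(1) << 63)

struct pagemap_info {
	bool present;
	bool uffd_wp;
	uint64_t pfn;      /* only meaningful when the page is present */
};

static struct pagemap_info decode_pagemap(uint64_t entry)
{
	struct pagemap_info info = {
		.present = (entry & PM_PRESENT) != 0,
		.uffd_wp = (entry & PM_UFFD_WP) != 0,
		.pfn     = (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0,
	};

	return info;
}
```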

ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments

Author: Jonathan Lemon <[email protected]>
Date:   Mon Feb 28 12:39:57 2022 -0800

    ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
    
    [ Upstream commit 90f8f4c0e3cebd541deaa45cf0e470bb9810dd4f ]
    
    In commit 6d59d4fa1789 ("ptp: ocp: Have FPGA fold in ns adjustment for
    adjtime."), the ns adjustment was written to the FPGA register so that
    the clock could accurately perform adjustments.
    
    However, the adjtime() call passes in an s64, while the clock adjustment
    registers use an s32, so adjustments with a large value (e.g. 37 sec)
    fail.
    
    Examine the incoming delta, and if larger than 1 sec, use the original
    (coarse) adjustment method.  If smaller than 1 sec, then allow the
    FPGA to fold in the changes over a 1 second window.
    
    Fixes: 6d59d4fa1789 ("ptp: ocp: Have FPGA fold in ns adjustment for adjtime.")
    Signed-off-by: Jonathan Lemon <[email protected]>
    Acked-by: Richard Cochran <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
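
The selection between the two paths can be sketched as follows. The 1-second threshold and the s64/s32 mismatch come from the commit message; the helper names are hypothetical, and comparing the absolute value exactly this way is an assumption about the driver. Note that 1 s (10^9 ns) fits in an s32, while 37 s (3.7 * 10^10 ns) does not.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC INT64_C(1000000000)

/*
 * Hypothetical dispatch: the fine-grained path programs a signed 32-bit
 * register, so only deltas within +/-1 s go there; larger deltas fall
 * back to the coarse step of the clock.
 */
static bool use_coarse_adjust(int64_t delta_ns)
{
	return delta_ns > NSEC_PER_SEC || delta_ns < -NSEC_PER_SEC;
}

static int32_t fine_adjust_value(int64_t delta_ns)
{
	/* Caller guarantees |delta_ns| <= 1 s, which fits in an s32. */
	return (int32_t)delta_ns;
}
```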

regulator: core: fix false positive in regulator_late_cleanup()

Author: Oliver Barta <[email protected]>
Date:   Tue Feb 8 09:46:45 2022 +0100

    regulator: core: fix false positive in regulator_late_cleanup()
    
    [ Upstream commit 4e2a354e3775870ca823f1fb29bbbffbe11059a6 ]
    
    The check done by regulator_late_cleanup() to detect whether a regulator
    is on was inconsistent with the check done by _regulator_is_enabled().
    While _regulator_is_enabled() takes the enable GPIO into account,
    regulator_late_cleanup() was not doing that.
    
    This resulted in a false positive, for example for a GPIO-controlled
    fixed regulator that was not enabled at boot time:
    
    reg_disp_1v2: reg_disp_1v2 {
            compatible = "regulator-fixed";
            regulator-name = "display_1v2";
            regulator-min-microvolt = <1200000>;
            regulator-max-microvolt = <1200000>;
            gpio = <&tlmm 148 0>;
            enable-active-high;
    };
    
    Such a regulator doesn't have an is_enabled() operation. Nevertheless,
    its state can be determined from the enable GPIO. The check in
    regulator_late_cleanup() wrongly assumed that the regulator was on and
    tried to disable it.
    
    Signed-off-by: Oliver Barta <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
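
A sketch of a single enabled-state check that both code paths could share. The order shown (enable GPIO first, then the driver's is_enabled() op, otherwise assume the regulator is on) is an assumption about _regulator_is_enabled() that goes beyond what the message states, and struct reg_model with its fields is a hypothetical simplification of the regulator core's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down regulator description. */
struct reg_model {
	int (*is_enabled)(const struct reg_model *r);  /* may be NULL */
	bool has_enable_gpio;
	bool gpio_enabled;      /* last state written to the enable GPIO */
};

/*
 * One decision order for every caller: the enable GPIO takes
 * precedence, then the driver's is_enabled() op, and only if neither
 * exists is the regulator assumed to be always on.
 */
static bool reg_is_enabled(const struct reg_model *r)
{
	if (r->has_enable_gpio)
		return r->gpio_enabled;
	if (r->is_enabled)
		return r->is_enabled(r) > 0;
	return true;            /* no way to tell: treat as always on */
}
```

With this shared helper, the fixed GPIO regulator from the example above (enable GPIO present, never driven high) reports "off", so late cleanup has nothing to disable.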

Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"

Author: Jiri Bohac <[email protected]>
Date:   Wed Jan 26 16:00:18 2022 +0100

    Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"
    
    commit a6d95c5a628a09be129f25d5663a7e9db8261f51 upstream.
    
    This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a.
    
    Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu
    should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS
    calculation in ipsec transport mode, resulting in complete stalls of TCP
    connections. This happens when the (P)MTU is 1280 or slightly larger.
    
    The desired formula for the MSS is:
    MSS = (MTU - ESP_overhead) - IP header - TCP header
    
    However, the above commit clamps the (MTU - ESP_overhead) to a
    minimum of 1280, turning the formula into
    MSS = max(MTU - ESP_overhead, 1280) - IP header - TCP header
    
    With the (P)MTU near 1280, the calculated MSS is too large and the
    resulting TCP packets never make it to the destination because they
    are over the actual PMTU.
    
    The above commit also causes suboptimal double fragmentation in
    xfrm tunnel mode, as described in
    https://lore.kernel.org/netdev/[email protected]/
    
    The original problem the above commit was trying to fix is now fixed
    by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU
    regression").
    
    Signed-off-by: Jiri Bohac <[email protected]>
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
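
A worked version of the two MSS formulas above. The 60-byte ESP overhead is an assumed example value (the real overhead depends on the algorithms in use); the 40-byte IPv6 header and 20-byte option-less TCP header are the standard minimum sizes.

```c
/* Standard minimum header sizes; ESP overhead is passed in. */
#define IPV6_HDR_LEN 40
#define TCP_HDR_LEN  20   /* no TCP options */

/* Desired: MSS = (MTU - ESP_overhead) - IP header - TCP header */
static int mss_correct(int mtu, int esp_overhead)
{
	return (mtu - esp_overhead) - IPV6_HDR_LEN - TCP_HDR_LEN;
}

/* Reverted behaviour: (MTU - ESP_overhead) clamped to at least 1280. */
static int mss_clamped(int mtu, int esp_overhead)
{
	int payload_mtu = mtu - esp_overhead;

	if (payload_mtu < 1280)
		payload_mtu = 1280;
	return payload_mtu - IPV6_HDR_LEN - TCP_HDR_LEN;
}
```

With these values, mss_correct(1280, 60) is 1160, giving packets of exactly 1280 bytes, whereas mss_clamped(1280, 60) is 1220, giving 1340-byte packets that exceed the 1280-byte PMTU.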

riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value

Author: Sunil V L <[email protected]>
Date:   Fri Jan 28 10:20:04 2022 +0530

    riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value
    
    commit dcf0c838854c86e1f41fb1934aea906845d69782 upstream.
    
    The get_boot_hartid_from_fdt() function currently returns U32_MAX on
    failure, which is not correct because U32_MAX is a valid hartid value.
    This patch fixes the issue by returning an error code instead.
    
    Cc: <[email protected]>
    Fixes: d7071743db31 ("RISC-V: Add EFI stub support.")
    Signed-off-by: Sunil V L <[email protected]>
    Reviewed-by: Heinrich Schuchardt <[email protected]>
    Signed-off-by: Ard Biesheuvel <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
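
The approach of signalling failure through the return value, so that every u32 including U32_MAX stays a valid hart ID, can be sketched as below. The function name, the out-parameter and the raw-pointer stand-in for the FDT property are hypothetical; the message does not show how the patch passes the hart ID back.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: report failure through the return value and pass
 * the hart ID through an out-parameter, so no u32 value (not even
 * UINT32_MAX) has to be reserved as an "error" marker.
 */
static int get_boot_hartid(const uint32_t *fdt_prop, uint32_t *hartid)
{
	if (fdt_prop == NULL)
		return -EINVAL;     /* missing property: an error, not an ID */
	*hartid = *fdt_prop;
	return 0;
}
```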

riscv: Fix config KASAN && DEBUG_VIRTUAL

Author: Alexandre Ghiti <[email protected]>
Date:   Fri Feb 25 13:39:51 2022 +0100

    riscv: Fix config KASAN && DEBUG_VIRTUAL
    
    commit c648c4bb7d02ceb53ee40172fdc4433b37cee9c6 upstream.
    
    The __virt_to_phys() function is called very early in the boot process
    (i.e. from kasan_early_init()), so it must not be instrumented by KASAN;
    otherwise it breaks.
    
    Fix this by excluding phys_addr.c from KASAN instrumentation.
    
    Signed-off-by: Alexandre Ghiti <[email protected]>
    Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
    Cc: [email protected]
    Signed-off-by: Palmer Dabbelt <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP

Author: Alexandre Ghiti <[email protected]>
Date:   Fri Feb 25 13:39:49 2022 +0100

    riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
    
    commit a3d328037846d013bb4c7f3777241e190e4c75e1 upstream.
    
    In order to get the pfn of a struct page* when sparsemem is enabled
    without vmemmap, the mem_section structures need to be initialized which
    happens in sparse_init.
    
    But kasan_early_init calls pfn_to_page way before sparse_init is called,
    which then tries to dereference a null mem_section pointer.
    
    Fix this by removing the usage of this function in kasan_early_init.
    
    Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
    Signed-off-by: Alexandre Ghiti <[email protected]>
    Cc: [email protected]
    Signed-off-by: Palmer Dabbelt <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

s390/extable: fix exception table sorting [+ + +]

Author: Heiko Carstens <[email protected]>
Date:   Thu Feb 24 22:03:29 2022 +0100

    s390/extable: fix exception table sorting
    
    commit c194dad21025dfd043210912653baab823bdff67 upstream.
    
    s390 has a swap_ex_entry_fixup function; however, it is not being used
    since common code expects a swap_ex_entry_fixup define. If it is not
    defined, the default implementation is used. So fix this by adding
    a proper define.
    However, the implementation of the function must also be fixed, since a
    NULL value for handler has a special meaning and must not be adjusted.
    
    Luckily, none of this fixes a real bug currently: the main extable
    is correctly sorted at build time, and for runtime sorting there
    is currently no case where the handler field is not NULL.
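
    The relative-offset adjustment that the fix protects can be modelled in userspace. This is a simplified sketch, not the s390 code. struct ex_entry, its layout and swap_entries() are hypothetical, and every field stores an offset relative to the start of its own entry. Swapping two entries must rebase each offset by the distance between them. The exception is a zero handler, which means "no handler" rather than an offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry: each field is an offset relative to the start of the
 * entry it lives in; handler == 0 means "no handler". */
struct ex_entry {
    long insn;
    long fixup;
    long handler;
};

/* Absolute address a relative field points to (unsigned arithmetic wraps
 * correctly for negative offsets). */
static uintptr_t ex_target(const struct ex_entry *e, long rel)
{
    return (uintptr_t)e + (uintptr_t)rel;
}

/* Swap two entries of the same table while keeping every target intact. */
static void swap_entries(struct ex_entry *x, struct ex_entry *y)
{
    long delta = (long)((char *)y - (char *)x);
    struct ex_entry tmp = *x;

    x->insn = y->insn + delta;
    y->insn = tmp.insn - delta;
    x->fixup = y->fixup + delta;
    y->fixup = tmp.fixup - delta;

    /* A zero handler is a marker, not an offset: move it unadjusted. */
    x->handler = y->handler ? y->handler + delta : 0;
    y->handler = tmp.handler ? tmp.handler - delta : 0;
}
```

    After a swap, ex_target() on each moved field still yields the address it pointed to before, and an empty handler stays 0. Adjusting the zero as well would turn "no handler" into a bogus offset, which is the problem the commit describes.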
    
    Fixes: 05a68e892e89 ("s390/kernel: expand exception table logic to allow new handling options")
    Acked-by: Ilya Leoshkevich <[email protected]>
    Reviewed-by: Alexander Gordeev <[email protected]>
    Signed-off-by: Heiko Carstens <[email protected]>
    Signed-off-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

s390/ftrace: fix arch_ftrace_get_regs implementation [+ + +]

Author: Heiko Carstens <[email protected]>
Date:   Tue Feb 22 14:53:47 2022 +0100

    s390/ftrace: fix arch_ftrace_get_regs implementation
    
    commit 1389f17937a03fe4ec71b094e1aa6530a901963e upstream.
    
    arch_ftrace_get_regs is supposed to return a struct pt_regs pointer
    only if the pt_regs structure contains all register contents, which
    means it must have been populated when created via ftrace_regs_caller.
    
    If it was populated via ftrace_caller, the contents are not complete
    (the psw mask part is missing), and therefore a NULL pointer needs to be
    returned.
    
    The current code incorrectly always returns a struct pt_regs pointer.
    
    Fix this by adding another pt_regs flag which indicates if the
    contents are complete, and fix arch_ftrace_get_regs accordingly.
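
    A minimal model of the "only return complete registers" rule follows. It assumes a hypothetical flag bit REGS_FLAG_COMPLETE and simplified structures; the real s390 flag and structure names are not given in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit meaning "every register, psw mask included, was saved". */
#define REGS_FLAG_COMPLETE 0x1UL

struct pt_regs_model {
    unsigned long flags;
    unsigned long psw_mask;   /* only filled in by the "regs" caller */
};

struct ftrace_regs_model {
    struct pt_regs_model regs;
};

/* Hand out the register set only if it was fully populated. */
static struct pt_regs_model *get_regs(struct ftrace_regs_model *fregs)
{
    if (!(fregs->regs.flags & REGS_FLAG_COMPLETE))
        return NULL;
    return &fregs->regs;
}
```

    In this model the caller that saves everything sets the flag, and the lighter-weight caller leaves it clear. Consumers then get NULL instead of a half-filled structure.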
    
    Fixes: 894979689d3a ("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller implementations")
    Reported-by: Christophe Leroy <[email protected]>
    Reported-by: Naveen N. Rao <[email protected]>
    Reviewed-by: Sven Schnelle <[email protected]>
    Acked-by: Ilya Leoshkevich <[email protected]>
    Signed-off-by: Heiko Carstens <[email protected]>
    Signed-off-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation [+ + +]

Author: Heiko Carstens <[email protected]>
Date:   Wed Feb 23 13:02:59 2022 +0100

    s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
    
    commit 9fa881f7e3c74ce6626d166bca9397e5d925937f upstream.
    
    ftrace_caller was used for both ftrace_caller and ftrace_regs_caller,
    which means that the target address of the hotpatch trampoline was
    never updated.
    
    With commit 894979689d3a ("s390/ftrace: provide separate
    ftrace_caller/ftrace_regs_caller implementations") a separate
    ftrace_regs_caller entry point was implemented; however, the necessary
    changes to ftrace_modify_call and ftrace_make_call, where the branch
    target has to be modified accordingly, were forgotten.
    
    Therefore add the missing code now.
    
    Fixes: 894979689d3a ("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller implementations")
    Reviewed-by: Sven Schnelle <[email protected]>
    Acked-by: Ilya Leoshkevich <[email protected]>
    Signed-off-by: Heiko Carstens <[email protected]>
    Signed-off-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE [+ + +]

Author: Alexander Egorenkov <[email protected]>
Date:   Wed Feb 9 11:25:09 2022 +0100

    s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
    
    commit 6b4b54c7ca347bcb4aa7a3cc01aa16e84ac7fbe4 upstream.
    
    We need to preserve the values at OLDMEM_BASE and OLDMEM_SIZE, which are
    used by zgetdump in case kdump crashes. In that case, zgetdump will
    attempt to read OLDMEM_BASE and OLDMEM_SIZE in order to find out where
    the memory range [0 - OLDMEM_SIZE] belonging to the production kernel is.
    
    Fixes: f1a546947431 ("s390/setup: don't reserve memory that occupied decompressor's head")
    Cc: [email protected] # 5.15+
    Signed-off-by: Alexander Egorenkov <[email protected]>
    Acked-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

sched/fair: Fix fault in reweight_entity [+ + +]

Author: Tadeusz Struk <[email protected]>
Date:   Thu Feb 3 08:18:46 2022 -0800

    sched/fair: Fix fault in reweight_entity
    
    [ Upstream commit 13765de8148f71fa795e0a6607de37c49ea5915a ]
    
    Syzbot found a GPF in reweight_entity(). This has been bisected to
    commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid
    sched_task_group").
    
    There is a race between sched_post_fork() and setpriority(PRIO_PGRP)
    within a thread group that causes a null-ptr-deref in
    reweight_entity() in CFS. The scenario is that the main process spawns
    a number of new threads, which then call setpriority(PRIO_PGRP, 0, -20),
    wait, and exit. For each of the new threads, copy_process() is
    invoked, which adds the new task_struct and calls sched_post_fork()
    for it.
    
    In the above scenario, setpriority(PRIO_PGRP) and set_one_prio() may
    be called for a thread in the group that is still being created by
    copy_process(), and for which sched_post_fork() has not been executed
    yet. This triggers a null pointer dereference in reweight_entity(),
    as it tries to access the run queue pointer, which hasn't been set.
    
    Before the mentioned change, the cfs_rq pointer for the task was set
    in sched_fork(), which is called much earlier in copy_process(),
    before the new task is added to the thread group. Now it is done in
    sched_post_fork(), which is called after that. To fix the issue,
    remove the update_load parameter from set_load_weight() and call
    reweight_task() only if the task does not have the TASK_NEW flag set.
    
    Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
    Reported-by: [email protected]
    Signed-off-by: Tadeusz Struk <[email protected]>
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Reviewed-by: Dietmar Eggemann <[email protected]>
    Cc: [email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

sched: Fix yet more sched_fork() races [+ + +]

Author: Peter Zijlstra <[email protected]>
Date:   Mon Feb 14 10:16:57 2022 +0100

    sched: Fix yet more sched_fork() races
    
    commit b1e8206582f9d680cff7d04828708c8b6ab32957 upstream.
    
    Where commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an
    invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
    race vs syscalls by not placing the task on the runqueue before it
    gets exposed through the pidhash.
    
    Commit 13765de8148f ("sched/fair: Fix fault in reweight_entity") tried
    to fix a single instance of this; instead, fix the whole class of
    issues, effectively reverting that commit.
    
    Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
    Reported-by: Linus Torvalds <[email protected]>
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Tested-by: Tadeusz Struk <[email protected]>
    Tested-by: Zhang Qiao <[email protected]>
    Tested-by: Dietmar Eggemann <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT [+ + +]

Author: Krzysztof Kozlowski <[email protected]>
Date:   Mon Feb 14 09:36:57 2022 +0100

    selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT
    
    [ Upstream commit 6fec1ab67f8d60704cc7de64abcfd389ab131542 ]
    
    The PREEMPT_RT patchset does not use the do_softirq() function, so trying
    to filter on do_softirq fails on such kernels:
    
      echo do_softirq
      ftracetest: 81: echo: echo: I/O error
    
    Choose some other visible function for the test. The function does not
    have to be actually executed during the test, because the test only
    exercises the filter API.
    
    Signed-off-by: Krzysztof Kozlowski <[email protected]>
    Reviewed-by: Shuah Khan <[email protected]>
    Acked-by: Sebastian Andrzej Siewior <[email protected]>
    Reviewed-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Shuah Khan <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

selftests/seccomp: Fix seccomp failure by adding missing headers [+ + +]

Author: Sherry Yang <[email protected]>
Date:   Thu Feb 10 12:30:49 2022 -0800

    selftests/seccomp: Fix seccomp failure by adding missing headers
    
    [ Upstream commit 21bffcb76ee2fbafc7d5946cef10abc9df5cfff7 ]
    
    seccomp_bpf failed on tests 47 global.user_notification_filter_empty
    and 48 global.user_notification_filter_empty_threaded when tested on
    an updated kernel but with old kernel headers, because old kernel
    headers don't define the __NR_clone3 macro, which these two tests
    require. Since under selftests/ the headers can be installed once for
    all tests (the default INSTALL_HDR_PATH is usr/include), fix it by
    adding usr/include to the list of directories to be searched. Use
    "-isystem" to mark it as a system directory, as the real kernel header
    directories are.
    
    Signed-off-by: Sherry Yang <[email protected]>
    Tested-by: Sherry Yang <[email protected]>
    Reviewed-by: Kees Cook <[email protected]>
    Signed-off-by: Shuah Khan <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

selftests: mlxsw: resource_scale: Fix return value [+ + +]

Author: Amit Cohen <[email protected]>
Date:   Wed Mar 2 18:14:47 2022 +0200

    selftests: mlxsw: resource_scale: Fix return value
    
    [ Upstream commit 196f9bc050cbc5085b4cbb61cce2efe380bc66d0 ]
    
    The test runs several test cases and is supposed to return an error in
    case at least one of them failed.
    
    Currently, the check of the return value of each test case is in the
    wrong place, which can result in the wrong return value. For example:
    
     # TESTS='tc_police' ./resource_scale.sh
     TEST: 'tc_police' [default] 968                                     [FAIL]
             tc police offload count failed
     Error: mlxsw_spectrum: Failed to allocate policer index.
     We have an error talking to the kernel
     Command failed /tmp/tmp.i7Oc5HwmXY:969
     TEST: 'tc_police' [default] overflow 969                            [ OK ]
     ...
     TEST: 'tc_police' [ipv4_max] overflow 969                           [ OK ]
    
     $ echo $?
     0
    
    Fix this by moving the check to be done after each test case.
    
    Fixes: 059b18e21c63 ("selftests: mlxsw: Return correct error code in resource scale test")
    Signed-off-by: Amit Cohen <[email protected]>
    Reviewed-by: Petr Machata <[email protected]>
    Signed-off-by: Ido Schimmel <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

selftests: mlxsw: tc_police_scale: Make test more robust [+ + +]

Author: Amit Cohen <[email protected]>
Date:   Wed Mar 2 18:14:46 2022 +0200

    selftests: mlxsw: tc_police_scale: Make test more robust
    
    commit dc9752075341e7beb653e37c6f4a3723074dc8bc upstream.
    
    The test adds tc filters and checks how many of them were offloaded by
    grepping for 'in_hw'.
    
    iproute2 commit f4cd4f127047 ("tc: add skip_hw and skip_sw to control
    action offload") added offload indication to tc actions, producing the
    following output:
    
     $ tc filter show dev swp2 ingress
     ...
     filter protocol ipv6 pref 1000 flower chain 0 handle 0x7c0
       eth_type ipv6
       dst_ip 2001:db8:1::7bf
       skip_sw
       in_hw in_hw_count 1
             action order 1:  police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b
             ref 1 bind 1
             not_in_hw
             used_hw_stats immediate
    
    The current grep expression matches on both 'in_hw' and 'not_in_hw',
    resulting in incorrect results.
    
    Fix that by using JSON output instead.
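
    The matching pitfall is easy to reproduce with strstr() in C. Searching for the substring "in_hw" also hits "not_in_hw". The token-aware has_word() below only illustrates the distinction; the actual fix, as stated above, switches the test to tc's JSON output instead of refining the text match. Both helpers are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Naive check, similar in spirit to grepping for a substring. */
static int has_substring(const char *line, const char *word)
{
    return strstr(line, word) != NULL;
}

/* True only if WORD occurs with no identifier character ([A-Za-z0-9_])
 * directly before or after it. */
static int has_word(const char *line, const char *word)
{
    size_t n = strlen(word);
    const char *p = line;

    while ((p = strstr(p, word)) != NULL) {
        int left_ok = (p == line) ||
                      !(isalnum((unsigned char)p[-1]) || p[-1] == '_');
        int right_ok = !(isalnum((unsigned char)p[n]) || p[n] == '_');

        if (left_ok && right_ok)
            return 1;
        p++;    /* keep scanning after this partial match */
    }
    return 0;
}
```

    Applied to the output above, has_substring() counts both the filter's "in_hw" line and the action's "not_in_hw" line. has_word() counts only the former. Structured JSON output avoids relying on either heuristic.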
    
    Fixes: 5061e773264b ("selftests: mlxsw: Add scale test for tc-police")
    Signed-off-by: Amit Cohen <[email protected]>
    Reviewed-by: Petr Machata <[email protected]>
    Signed-off-by: Ido Schimmel <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: stm32: prevent TDR register overwrite when sending x_char [+ + +]

Author: Valentin Caron <[email protected]>
Date:   Tue Jan 11 17:44:40 2022 +0100

    serial: stm32: prevent TDR register overwrite when sending x_char
    
    [ Upstream commit d3d079bde07e1b7deaeb57506dc0b86010121d17 ]
    
    When sending x_char in stm32_usart_transmit_chars(), the driver can
    overwrite the TDR register with the value of x_char. If this happens,
    the previous value in the TDR register is never sent over the
    UART.
    
    This patch checks that the previous value in the TDR register has been
    sent before writing the x_char value into the register.
    
    Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
    Cc: stable <[email protected]>
    Signed-off-by: Valentin Caron <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: guts: Add a missing memory allocation failure check [+ + +]

Author: Christophe JAILLET <[email protected]>
Date:   Wed Nov 3 21:00:33 2021 +0100

    soc: fsl: guts: Add a missing memory allocation failure check
    
    [ Upstream commit b9abe942cda43a1d46a0fd96efb54f1aa909f757 ]
    
    If 'devm_kstrdup()' fails, we should return -ENOMEM.
    
    While at it, move the 'of_node_put()' call into the error handling path
    and after 'machine' has been copied.
    Better safe than sorry.
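
    The added check follows the usual "copy, then verify" pattern. Below is a userspace sketch that assumes a caller-supplied allocator, so the failure path can be exercised. copy_machine_name() and fail_alloc() are hypothetical stand-ins, and alloc plays the role of the allocation inside devm_kstrdup().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*alloc_fn)(size_t size);

/* Simulated allocation failure, for exercising the error path. */
static void *fail_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Duplicate 'src' into '*out'; report -ENOMEM instead of using a NULL copy. */
static int copy_machine_name(const char *src, alloc_fn alloc, char **out)
{
    size_t len = strlen(src) + 1;   /* include the terminating NUL */
    char *copy = alloc(len);

    if (!copy)
        return -ENOMEM;             /* the previously missing check */

    memcpy(copy, src, len);
    *out = copy;
    return 0;
}
```

    Passing malloc exercises the normal path, and passing fail_alloc exercises the -ENOMEM path. In both cases *out is written only on success.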
    
    Fixes: a6fc3b698130 ("soc: fsl: add GUTS driver for QorIQ platforms")
    Depends-on: fddacc7ff4dd ("soc: fsl: guts: Revert commit 3c0d64e867ed")
    Suggested-by: Tyrel Datwyler <[email protected]>
    Signed-off-by: Christophe JAILLET <[email protected]>
    Signed-off-by: Li Yang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: guts: Revert commit 3c0d64e867ed [+ + +]

Author: Christophe JAILLET <[email protected]>
Date:   Wed Nov 3 21:00:17 2021 +0100

    soc: fsl: guts: Revert commit 3c0d64e867ed
    
    [ Upstream commit b113737cf12964a20cc3ba1ddabe6229099661c6 ]
    
    This reverts commit 3c0d64e867ed
    ("soc: fsl: guts: reuse machine name from device tree").
    
    A following patch will fix the missing memory allocation failure check
    instead.
    
    Suggested-by: Tyrel Datwyler <[email protected]>
    Signed-off-by: Christophe JAILLET <[email protected]>
    Signed-off-by: Li Yang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

soc: fsl: qe: Check of ioremap return value [+ + +]

Author: Jiasheng Jiang <[email protected]>
Date:   Thu Dec 30 09:45:43 2021 +0800

    soc: fsl: qe: Check of ioremap return value
    
    [ Upstream commit a222fd8541394b36b13c89d1698d9530afd59a9c ]
    
    Because ioremap() may fail, par_io could be NULL. It is therefore
    better to check it and return an error in order to guarantee
    successful initialization.
    I also noticed that callers such as mpc85xx_qe_par_io_init() in
    `arch/powerpc/platforms/85xx/common.c` don't check the return value of
    par_io_init().
    Actually, the return value of par_io_init() needs to be checked to
    handle the potential error. I will submit another patch to fix that.
    Either way, par_io_init() itself should be fixed.
    
    Fixes: 7aa1aa6ecec2 ("QE: Move QE from arch/powerpc to drivers/soc")
    Signed-off-by: Jiasheng Jiang <[email protected]>
    Signed-off-by: Li Yang <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

soc: imx: gpcv2: Fix clock disabling imbalance in error path [+ + +]

Author: Laurent Pinchart <[email protected]>
Date:   Fri Feb 18 23:57:20 2022 +0200

    soc: imx: gpcv2: Fix clock disabling imbalance in error path
    
    [ Upstream commit fa231bef3b34f1670b240409c11e59a3ce095e6d ]
    
    The imx_pgc_power_down() starts by enabling the domain clocks, and thus
    disables them in the error path. Commit 18c98573a4cf ("soc: imx: gpcv2:
    add domain option to keep domain clocks enabled") made the clock enable
    conditional, but forgot to add the same condition to the error path.
    This can result in a clock enable/disable imbalance. Fix it.
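
    The balancing rule can be illustrated with a small counter model. This is a hypothetical sketch, not the gpcv2 driver: power_down(), struct domain_model and the injected error are invented for illustration, and clk_count stands in for the clock enable count. With keep_clocks set, the clocks are assumed to already be on (count 1), and a failed power-down must leave that count unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical power-domain model; clk_count stands in for the clock
 * framework's enable count. */
struct domain_model {
    bool keep_clocks;   /* "keep domain clocks enabled" option */
    int clk_count;
};

static void clk_on(struct domain_model *d)  { d->clk_count++; }
static void clk_off(struct domain_model *d) { d->clk_count--; }

/* Power down, optionally injecting a failure midway. The error path uses the
 * same condition as the enable path, so enables and disables stay paired. */
static int power_down(struct domain_model *d, bool inject_error)
{
    if (!d->keep_clocks)
        clk_on(d);

    if (inject_error)
        goto out_clk_disable;

    /* ... the actual power-down sequence would run here ... */

    if (!d->keep_clocks)
        clk_off(d);
    return 0;

out_clk_disable:
    if (!d->keep_clocks)    /* the condition the error path was missing */
        clk_off(d);
    return -EIO;
}
```

    Dropping the condition under out_clk_disable would make the keep_clocks case fall from 1 to 0 on error. That is the kind of imbalance the commit fixes.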
    
    Fixes: 18c98573a4cf ("soc: imx: gpcv2: add domain option to keep domain clocks enabled")
    Signed-off-by: Laurent Pinchart <[email protected]>
    Reviewed-by: Lucas Stach <[email protected]>
    Signed-off-by: Shawn Guo <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

thermal: core: Fix TZ_GET_TRIP NULL pointer dereference [+ + +]

Author: Nicolas Cavallari <[email protected]>
Date:   Mon Feb 28 12:03:51 2022 +0100

    thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
    
    commit 5838a14832d447990827d85e90afe17e6fb9c175 upstream.
    
    Do not call get_trip_hyst() from thermal_genl_cmd_tz_get_trip() if
    the thermal zone does not define one.
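
    The general pattern is to test an optional callback before calling it. Below is a minimal sketch with a hypothetical, trimmed-down ops table; struct tz_ops_model, get_trip_info() and demo_temp() are illustrative names. In this sketch a missing get_trip_hyst callback is reported as a hysteresis of 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ops table: get_trip_hyst is optional. */
struct tz_ops_model {
    int (*get_trip_temp)(int trip, int *temp);
    int (*get_trip_hyst)(int trip, int *hyst);   /* may be NULL */
};

/* Fill in temperature and hysteresis for one trip point without ever
 * calling through a NULL function pointer. */
static int get_trip_info(const struct tz_ops_model *ops, int trip,
                         int *temp, int *hyst)
{
    int ret = ops->get_trip_temp(trip, temp);

    if (ret)
        return ret;

    *hyst = 0;                      /* default when the zone has no hysteresis */
    if (ops->get_trip_hyst)
        ret = ops->get_trip_hyst(trip, hyst);
    return ret;
}

/* Example callback for a zone that only reports trip temperatures. */
static int demo_temp(int trip, int *temp)
{
    *temp = 50000 + trip;
    return 0;
}
```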
    
    Fixes: 1ce50e7d408e ("thermal: core: genetlink support for events/cmd/sampling")
    Signed-off-by: Nicolas Cavallari <[email protected]>
    Cc: 5.10+ <[email protected]> # 5.10+
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tipc: fix a bit overflow in tipc_crypto_key_rcv() [+ + +]

Author: Hangyu Hua <[email protected]>
Date:   Fri Feb 11 12:55:10 2022 +0800

    tipc: fix a bit overflow in tipc_crypto_key_rcv()
    
    [ Upstream commit 143de8d97d79316590475dc2a84513c63c863ddf ]
    
    msg_data_sz() returns a 32-bit value, but size is a 16-bit variable.
    This may lead to a bit overflow.
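
    The truncation can be shown in a few lines of C. The helpers are hypothetical. They only show what happens when a 32-bit length is stored in a 16-bit variable, and one way to avoid it: keep the 32-bit value and range-check it.

```c
#include <assert.h>
#include <stdint.h>

/* Storing a 32-bit length in 16 bits silently keeps only the low 16 bits. */
static uint16_t store_in_u16(uint32_t len)
{
    return (uint16_t)len;
}

/* Safer pattern: keep the full 32-bit value and check it against a limit. */
static int len_fits(uint32_t len, uint32_t max)
{
    return len <= max;
}
```

    For example, a length of 0x10010 (65552) becomes 0x10 (16) after the conversion. Any later size check then operates on the wrong value.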
    
    Signed-off-by: Hangyu Hua <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tracing/histogram: Fix sorting on old "cpu" value [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Tue Mar 1 22:29:04 2022 -0500

    tracing/histogram: Fix sorting on old "cpu" value
    
    commit 1d1898f65616c4601208963c3376c1d828cbf2c7 upstream.
    
    When trying to add a histogram against an event with the "cpu" field, it
    was impossible due to "cpu" being a keyword to key off of the running CPU.
    So to fix this, it was changed to "common_cpu" to match the other generic
    fields (like "common_pid"). But since some scripts used "cpu" for keying
    off of the CPU (for events that did not have "cpu" as a field, which is
    most of them), a backward compatibility trick was added such that if "cpu"
    was used as a key, and the event did not have "cpu" as a field name, then
    it would fall back and switch over to "common_cpu".
    
    This fix has a couple of subtle bugs. One was that when switching over to
    "common_cpu", it did not change the field name; it just set a flag. But
    the code still found a "cpu" field. The "cpu" field is used for filtering
    and is returned when the event does not have a "cpu" field.
    
    This was found by:
    
      # cd /sys/kernel/tracing
      # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
      # cat events/sched/sched_wakeup/hist
    
    Which showed the histogram unsorted:
    
    { cpu:         19, pid:       1175 } hitcount:          1
    { cpu:          6, pid:        239 } hitcount:          2
    { cpu:         23, pid:       1186 } hitcount:         14
    { cpu:         12, pid:        249 } hitcount:          2
    { cpu:          3, pid:        994 } hitcount:          5
    
    Instead of hard coding the "cpu" checks, take advantage of the fact that
    trace_find_event_field() returns a special field for "cpu" and "CPU" if
    the event does not have "cpu" as a field. This special field has the
    "filter_type" of "FILTER_CPU". Check that to test if the returned field is
    of the CPU type instead of doing the string compare.
    
    Also, fix the sorting bug by testing for the hist_field flag of
    HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
    the special CPU field to know what compare routine to use, and since that
    special field does not have a size, it returns tracing_map_cmp_none.
    
    Cc: [email protected]
    Fixes: 1e3bac71c505 ("tracing/histogram: Rename "cpu" to "common_cpu"")
    Reported-by: Daniel Bristot de Oliveira <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tracing: Add test for user space strings when filtering on string pointers [+ + +]

Author: Steven Rostedt <[email protected]>
Date:   Mon Jan 10 11:55:32 2022 -0500

    tracing: Add test for user space strings when filtering on string pointers
    
    [ Upstream commit 77360f9bbc7e5e2ab7a2c8b4c0244fbbfcfc6f62 ]
    
    Pingfan reported that the following causes a fault:
    
      echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter
      echo 1 > events/syscalls/sys_enter_openat/enable
    
    The reason is that trace event filter treats the user space pointer
    defined by "filename" as a normal pointer to compare against the "cpu"
    string. The following bug happened:
    
     kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60
     #PF: supervisor read access in kernel mode
     #PF: error_code(0x0001) - permissions violation
     PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867
     Oops: 0001 [#1] PREEMPT SMP PTI
     CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1
     Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
     RIP: 0010:strlen+0x0/0x20
     Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11
           48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8
           48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
     RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246
     RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000
     RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60
     RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000
     R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380
     R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0
     FS:  00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     PKRU: 55555554
     Call Trace:
      filter_pred_pchar+0x18/0x40
      filter_match_preds+0x31/0x70
      ftrace_syscall_enter+0x27a/0x2c0
      syscall_trace_enter.constprop.0+0x1aa/0x1d0
      do_syscall_64+0x16/0x90
      entry_SYSCALL_64_after_hwframe+0x44/0xae
     RIP: 0033:0x7fa861d88664
    
    The above happened because the kernel tried to access user space directly
    and triggered a "supervisor read access in kernel mode" fault. Worse yet,
    the memory may not even be loaded yet, and a SEGFAULT could happen as
    well. The same could be true for kernel space accesses.
    
    To be even more robust, test both kernel and user space strings. If the
    string fails to read, then simply have the filter fail.
    
    Note, TASK_SIZE is used to determine if the pointer is user or kernel space
    and the appropriate strncpy_from_kernel/user_nofault() function is used to
    copy the memory. For some architectures, the compare to TASK_SIZE may always
    pick user space or kernel space. If it gets it wrong, the only consequence
    is that the filter fails to match. In the future, this needs to be fixed to
    have the event denote which should be used. But failing a filter is much
    better than panicking the machine, and that can be solved later.
    
    Link: https://lore.kernel.org/all/[email protected]/
    Link: https://lkml.kernel.org/r/[email protected]
    
    Cc: [email protected]
    Cc: Ingo Molnar <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Tom Zanussi <[email protected]>
    Reported-by: Pingfan Liu <[email protected]>
    Tested-by: Pingfan Liu <[email protected]>
    Fixes: 87a342f5db69d ("tracing/filters: Support filtering for char * strings")
    Signed-off-by: Steven Rostedt <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tracing: Add ustring operation to filtering string pointers [+ + +]

Author: Steven Rostedt <[email protected]>
Date:   Thu Jan 13 20:08:40 2022 -0500

    tracing: Add ustring operation to filtering string pointers
    
    [ Upstream commit f37c3bbc635994eda203a6da4ba0f9d05165a8d6 ]
    
    Since referencing user space pointers is special, if the user wants to
    filter on a field that is a pointer to user space, then they need to
    specify it.
    
    Add a ".ustring" attribute to the field name for filters to state that the
    field is pointing to user space such that the kernel can take the
    appropriate action to read that pointer.
    
    Link: https://lore.kernel.org/all/[email protected]/
    
    Fixes: 77360f9bbc7e ("tracing: Add test for user space strings when filtering on string pointers")
    Tested-by: Sven Schnelle <[email protected]>
    Signed-off-by: Steven Rostedt <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tracing: Fix return value of __setup handlers [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Wed Mar 2 19:17:44 2022 -0800

    tracing: Fix return value of __setup handlers
    
    commit 1d02b444b8d1345ea4708db3bab4db89a7784b55 upstream.
    
    __setup() handlers should generally return 1 to indicate that the
    boot options have been handled.
    
    Using invalid option values causes the entire kernel boot option
    string to be reported as Unknown and added to init's environment
    strings, polluting it.
    
      Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
        kprobe_event=p,syscall_any,$arg1 trace_options=quiet
        trace_clock=jiffies", will be passed to user space.
    
     Run /sbin/init as init process
       with arguments:
         /sbin/init
       with environment:
         HOME=/
         TERM=linux
         BOOT_IMAGE=/boot/bzImage-517rc6
         kprobe_event=p,syscall_any,$arg1
         trace_options=quiet
         trace_clock=jiffies
    
    Return 1 from the __setup() handlers so that init's environment is not
    polluted with kernel boot options.
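
    The effect of the return value can be modelled in userspace. This is a loose, hypothetical model of boot-option dispatch, not init/main.c itself. A handler that returns 1 has consumed the option. An option that no handler consumes is treated as unknown and, as shown above, ends up in init's environment.

```c
#include <assert.h>
#include <string.h>

/* A handler returns nonzero if it consumed the option's value. */
typedef int (*setup_fn)(const char *val);

struct setup_entry {
    const char *prefix;   /* e.g. "trace_clock=" */
    setup_fn fn;
};

static int handler_returns_0(const char *val) { (void)val; return 0; }  /* old behaviour */
static int handler_returns_1(const char *val) { (void)val; return 1; }  /* fixed behaviour */

/* Returns 1 if OPT would be passed on to init as an unknown option. */
static int leaks_to_init(const struct setup_entry *tab, size_t n,
                         const char *opt)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tab[i].prefix);

        if (strncmp(opt, tab[i].prefix, len) == 0 && tab[i].fn(opt + len))
            return 0;   /* handled: not passed on */
    }
    return 1;           /* unhandled: ends up in init's environment */
}
```

    With handler_returns_0 registered for "trace_clock=", the option "trace_clock=jiffies" still leaks to init. With handler_returns_1 it does not.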
    
    Link: lore.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    
    Cc: [email protected]
    Fixes: 7bcfaf54f591 ("tracing: Add trace_options kernel command line parameter")
    Fixes: e1e232ca6b8f ("tracing: Add trace_clock=<clock> kernel parameter")
    Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")
    Signed-off-by: Randy Dunlap <[email protected]>
    Reported-by: Igor Zhbanov <[email protected]>
    Acked-by: Masami Hiramatsu <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ucounts: Fix systemd LimitNPROC with private users regression [+ + +]

Author: Eric W. Biederman <[email protected]>
Date:   Thu Feb 24 08:32:28 2022 -0600

    ucounts: Fix systemd LimitNPROC with private users regression
    
    commit 0ac983f512033cb7b5e210c9589768ad25b1e36b upstream.
    
    Long story short: recursively enforcing RLIMIT_NPROC when it is not
    enforced on the process that creates a new user namespace causes
    currently working code to fail. There is no reason to enforce
    RLIMIT_NPROC recursively when we don't enforce it normally, so update
    the code to detect this case.
    
    I would like to simply use capable(CAP_SYS_RESOURCE) to detect when
    RLIMIT_NPROC is not enforced upon the caller. Unfortunately, because
    RLIMIT_NPROC is charged and checked for enforcement based upon the
    real uid, using capable(), which is euid based, is inconsistent with
    reality. Come as close as possible to testing for
    capable(CAP_SYS_RESOURCE) by testing for when the real uid would match
    the conditions under which CAP_SYS_RESOURCE would be present if the
    real uid were the effective uid.
    
    Reported-by: Etienne Dechamps <[email protected]>
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215596
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Cc: [email protected]
    Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
    Reviewed-by: Kees Cook <[email protected]>
    Signed-off-by: "Eric W. Biederman" <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: clear related members when goto fail [+ + +]

Author: Hangyu Hua <[email protected]>
Date:   Sat Jan 1 01:21:38 2022 +0800

    usb: gadget: clear related members when goto fail
    
    commit 501e38a5531efbd77d5c73c0ba838a889bfc1d74 upstream.
    
    dev->config, dev->hs_config and dev->dev need to be cleared if
    dev_config fails, to avoid a use-after-free (UAF).
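
    The pattern behind the fix is to reset pointers that refer into a buffer freed on the error path. Below is a hypothetical userspace model in which config, hs_config and dev all point into one allocation. struct gadget_dev_model and dev_config_model() are illustrative names, and a 64-byte buffer stands in for the configuration data.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device state: all three pointers refer into one buffer. */
struct gadget_dev_model {
    void *config;
    void *hs_config;
    void *dev;
};

static int dev_config_model(struct gadget_dev_model *d, int inject_error)
{
    char *kbuf = malloc(64);

    if (!kbuf)
        return -ENOMEM;

    d->config = kbuf;
    d->hs_config = kbuf + 16;
    d->dev = kbuf + 32;

    if (inject_error)
        goto fail;
    return 0;

fail:
    free(kbuf);
    /* Without these resets the fields would dangle, and a later user or
     * cleanup path could touch freed memory (use-after-free). */
    d->config = NULL;
    d->hs_config = NULL;
    d->dev = NULL;
    return -EINVAL;
}
```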
    
    Acked-by: Alan Stern <[email protected]>
    Signed-off-by: Hangyu Hua <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: don't release an existing dev->buf [+ + +]

Author: Hangyu Hua <[email protected]>
Date:   Sat Jan 1 01:21:37 2022 +0800

    usb: gadget: don't release an existing dev->buf
    
    commit 89f3594d0de58e8a57d92d497dea9fee3d4b9cda upstream.
    
    dev->buf should not be released if it already existed before
    dev_config was executed.
    
    Acked-by: Alan Stern <[email protected]>
    Signed-off-by: Hangyu Hua <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64 [+ + +]

Author: Dexuan Cui <[email protected]>
Date:   Fri Feb 25 00:46:00 2022 -0800

    x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64
    
    commit 92e68cc558774de01024c18e8b35cdce4731c910 upstream.
    
    When Linux runs as an Isolated VM on Hyper-V, it supports AMD SEV-SNP
    but it's partially enlightened, i.e. cc_platform_has(
    CC_ATTR_GUEST_MEM_ENCRYPT) is true but sev_active() is false.
    
    Commit 4d96f9109109 per se is good, but with it now
    kvm_setup_vsyscall_timeinfo() -> kvmclock_init_mem() calls
    set_memory_decrypted(), and later gets stuck when trying to zero out
    the pages pointed to by 'hvclock_mem', if Linux runs as an Isolated VM on
    Hyper-V. The cause is that the Linux VM should no longer access
    the original guest physical address (GPA); instead, the VM should do
    memremap() and access the original GPA + ms_hyperv.shared_gpa_boundary:
    see the example code in drivers/hv/connection.c: vmbus_connect() or
    drivers/hv/ring_buffer.c: hv_ringbuffer_init(). If the VM tries to
    access the original GPA, Hyper-V keeps injecting a fault into it
    and the VM gets stuck there.
    
    The issue happens only when the VM has >= 65 vCPUs, because the
    global static array hv_clock_boot[] can hold 64 "struct
    pvclock_vsyscall_time_info" entries (the struct is 64 bytes), so
    kvmclock_init_mem() only allocates memory when there are more than 64 vCPUs.
    
    Since the 'hvclock_mem' pages are only useful when the kvm clock is
    supported by the underlying hypervisor, fix the issue by returning
    early when the Linux VM runs on Hyper-V, which doesn't support kvm clock.
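The arithmetic behind the ">= 65 vCPUs" threshold and the effect of the early return can be sketched as follows. Several pieces are assumptions: 4 KiB pages, the function `hvclock_extra_pages()` and its `kvmclock_supported` flag are hypothetical, and the rounding is simplified to whole pages rather than a power-of-two order. One page divided by 64-byte entries gives 64 boot slots, so memory that must be decrypted and zeroed exists only beyond that, and never when kvmclock is unsupported.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE    4096u /* assumption: 4 KiB pages, as on x86 */
#define SKETCH_ENTRY_SIZE   64u   /* struct size stated in the text */
#define SKETCH_BOOT_ENTRIES (SKETCH_PAGE_SIZE / SKETCH_ENTRY_SIZE) /* 64 slots */

/* Hypothetical: extra pages a kvmclock-style init would allocate, decrypt
 * and zero. Rounds up to whole pages (simplified). */
static unsigned int hvclock_extra_pages(bool kvmclock_supported, unsigned int ncpus)
{
	if (!kvmclock_supported)          /* the fix: e.g. on Hyper-V, return early */
		return 0;
	if (ncpus <= SKETCH_BOOT_ENTRIES) /* the static boot array suffices */
		return 0;
	unsigned int bytes = (ncpus - SKETCH_BOOT_ENTRIES) * SKETCH_ENTRY_SIZE;
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

With 65 vCPUs one extra entry spills over, so one page is allocated. On a hypervisor without kvmclock the function returns 0 regardless of the vCPU count.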
    
    Fixes: 4d96f9109109 ("x86/sev: Replace occurrences of sev_active() with cc_platform_has()")
    Tested-by: Andrea Parri (Microsoft) <[email protected]>
    Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
    Signed-off-by: Dexuan Cui <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xen/netfront: destroy queues before real_num_tx_queues is zeroed [+ + +]

Author: Marek Marczykowski-Górecki <[email protected]>
Date:   Wed Feb 23 22:19:54 2022 +0100

    xen/netfront: destroy queues before real_num_tx_queues is zeroed
    
    commit dcf4ff7a48e7598e6b10126cc02177abb8ae4f3f upstream.
    
    xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
    delete queues. Since d7dac083414eb5bb99a6d2ed53dc2c1b405224e5
    ("net-sysfs: update the queue counts in the unregistration path"),
    unregister_netdev() indirectly sets real_num_tx_queues to 0. Together,
    these two facts mean that xennet_destroy_queues() called from
    xennet_remove() cannot do its job, because it's called after
    unregister_netdev(). This results in kfree-ing queues that are still
    linked in napi, which ultimately crashes:
    
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP PTI
        CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W         5.16.10-1.32.fc32.qubes.x86_64+ #226
        RIP: 0010:free_netdev+0xa3/0x1a0
        Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
        RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
        RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
        RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
        R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
        FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
        Call Trace:
         <TASK>
         xennet_remove+0x13d/0x300 [xen_netfront]
         xenbus_dev_remove+0x6d/0xf0
         __device_release_driver+0x17a/0x240
         device_release_driver+0x24/0x30
         bus_remove_device+0xd8/0x140
         device_del+0x18b/0x410
         ? _raw_spin_unlock+0x16/0x30
         ? klist_iter_exit+0x14/0x20
         ? xenbus_dev_request_and_reply+0x80/0x80
         device_unregister+0x13/0x60
         xenbus_dev_changed+0x18e/0x1f0
         xenwatch_thread+0xc0/0x1a0
         ? do_wait_intr_irq+0xa0/0xa0
         kthread+0x16b/0x190
         ? set_kthread_struct+0x40/0x40
         ret_from_fork+0x22/0x30
         </TASK>
    
    Fix this by calling xennet_destroy_queues() from xennet_uninit(),
    when real_num_tx_queues is still available. This ensures that queues are
    destroyed when real_num_tx_queues is set to 0, regardless of how
    unregister_netdev() was called.
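The ordering problem can be modeled in a few lines. The assumptions are stated here: `struct netdev_model`, `destroy_queues()` and `unregister_model()` are hypothetical. The model relies on the description above, in which the uninit callback runs before the unregistration path zeroes the queue count. A destroy loop bounded by `real_num_tx_queues` does nothing if it runs after the count is zeroed, but it visits every queue when invoked from the uninit hook.

```c
#include <assert.h>

/* Hypothetical model, not the real net_device or netfront structures. */
struct netdev_model {
	unsigned int real_num_tx_queues;
	unsigned int queues_destroyed;
	void (*ndo_uninit)(struct netdev_model *);
};

/* Like xennet_destroy_queues(): trusts real_num_tx_queues as its bound. */
static void destroy_queues(struct netdev_model *nd)
{
	for (unsigned int i = 0; i < nd->real_num_tx_queues; i++)
		nd->queues_destroyed++;
}

/* Models unregistration: the uninit hook runs, then the count is zeroed. */
static void unregister_model(struct netdev_model *nd)
{
	if (nd->ndo_uninit)
		nd->ndo_uninit(nd);
	nd->real_num_tx_queues = 0;
}
```

Calling `destroy_queues()` after `unregister_model()` destroys zero queues, which corresponds to the old order that left queues linked in NAPI. Installing it as `ndo_uninit` destroys all of them.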
    
    Originally reported at
    https://github.com/QubesOS/qubes-issues/issues/7257
    
    Fixes: d7dac083414eb5bb9 ("net-sysfs: update the queue counts in the unregistration path")
    Cc: [email protected]
    Signed-off-by: Marek Marczykowski-Górecki <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfrm: enforce validity of offload input flags [+ + +]

Author: Leon Romanovsky <[email protected]>
Date:   Tue Feb 8 16:14:32 2022 +0200

    xfrm: enforce validity of offload input flags
    
    commit 7c76ecd9c99b6e9a771d813ab1aa7fa428b3ade1 upstream.
    
    struct xfrm_user_offload has a flags field that receives user input,
    but the kernel didn't check whether only valid bits were set. As a
    result, unsanitized input was forwarded directly to the drivers.
    
    For example, the exposed XFRM_OFFLOAD_IPV6 define was used by
    strongSwan but was not implemented in the kernel at all.
    
    As a solution, check and sanitize the input flags so that only
    XFRM_OFFLOAD_INBOUND is forwarded to the drivers.
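A minimal sketch of the validate-then-mask pattern described above follows. Two points are assumptions: the flag values `1` and `2` are taken to match the UAPI header of that era, and `sanitize_offload_flags()` is a hypothetical helper, not the kernel function. Unknown bits are rejected outright, and of the known bits only the one that drivers implement is passed on.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: values as in the xfrm UAPI header at the time. */
#define XFRM_OFFLOAD_IPV6    1
#define XFRM_OFFLOAD_INBOUND 2

/* Hypothetical helper: reject unknown bits, forward only implemented ones. */
static int sanitize_offload_flags(uint8_t user_flags, uint8_t *driver_flags)
{
	if (user_flags & ~(XFRM_OFFLOAD_IPV6 | XFRM_OFFLOAD_INBOUND))
		return -EINVAL;               /* bits the kernel does not define */
	/* Drop XFRM_OFFLOAD_IPV6: accepted for compatibility, not forwarded. */
	*driver_flags = (uint8_t)(user_flags & XFRM_OFFLOAD_INBOUND);
	return 0;
}
```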
    
    Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
    Signed-off-by: Leon Romanovsky <[email protected]>
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfrm: fix MTU regression [+ + +]

Author: Jiri Bohac <[email protected]>
Date:   Wed Jan 19 10:22:53 2022 +0100

    xfrm: fix MTU regression
    
    commit 6596a0229541270fb8d38d989f91b78838e5e9da upstream.
    
    Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6
    sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
    
    A Packet Too Big ICMPv6 message received in response to an ESP
    packet will prevent all further communication through the tunnel
    if the reported MTU minus the ESP overhead is smaller than 1280.
    
    E.g. in the case of tunnel-mode ESP with sha256/aes, the overhead
    is 92 bytes. Receiving a PTB with an MTU of 1371 or less will result
    in all further packets in the tunnel being dropped. A ping through the
    tunnel fails with "ping: sendmsg: Invalid argument".
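The threshold in the example works out as 1371 - 92 = 1279, one byte under the IPv6 minimum of 1280. The model below makes that explicit. It assumes the check added by 749439bf reduces to "tunnel MTU < 1280 means reject", and `tunnel_mtu()` and `old_check_rejects()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_MIN_MTU 1280 /* minimum IPv6 link MTU */

/* MTU left for the inner packet after ESP overhead. */
static int tunnel_mtu(int pmtu, int esp_overhead)
{
	return pmtu - esp_overhead;
}

/* Hypothetical model of the old check: sub-1280 route MTU is rejected. */
static bool old_check_rejects(int pmtu, int esp_overhead)
{
	return tunnel_mtu(pmtu, esp_overhead) < IPV6_MIN_MTU;
}
```

A PTB of 1372 leaves exactly 1280 and passes. Anything from 1371 downward fails, which matches the range given above.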
    
    Apparently the MTU on the xfrm route is smaller than 1280 and
    fails the check inside ip6_setup_cork() added by 749439bf.
    
    We found this by debugging USGv6/ipv6ready failures. Failing
    tests are: "Phase-2 Interoperability Test Scenario IPsec" /
    5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
    
    Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm:
    xfrm_state_mtu should return at least 1280 for ipv6") attempted
    to fix this but caused another regression in TCP MSS calculations
    and had to be reverted.
    
    This patch fixes the situation by dropping the MTU
    check and instead checking for the underflows described in the
    749439bf commit message.
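To illustrate "check for underflow instead of rejecting small MTUs", here is a sketch under stated assumptions. It is loosely modeled on a fragment-length computation that aligns the payload to 8 bytes and subtracts an 8-byte Fragment header. The function `max_frag_len()` and its exact expression are hypothetical, not the patched kernel code. A 1279-byte MTU remains usable; only inputs that would wrap the unsigned arithmetic are refused.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAG_HDR_LEN 8u /* size of the IPv6 Fragment extension header */

/* Hypothetical: compute the max fragment length, refusing inputs that
 * would make the unsigned subtractions wrap around. */
static bool max_frag_len(unsigned int mtu, unsigned int fragheaderlen,
			 unsigned int *maxfraglen)
{
	if (mtu <= fragheaderlen)
		return false;                  /* mtu - fragheaderlen would wrap */
	unsigned int aligned = ((mtu - fragheaderlen) & ~7u) + fragheaderlen;
	if (aligned <= FRAG_HDR_LEN)
		return false;                  /* subtracting the header would wrap */
	*maxfraglen = aligned - FRAG_HDR_LEN;
	return true;
}
```

For example, with a 40-byte IPv6 header and an MTU of 1279, the payload aligns down to 1232 bytes. The result is a maximum fragment length of 1264 rather than an error.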
    
    Signed-off-by: Jiri Bohac <[email protected]>
    Fixes: 749439bfac6e ("ipv6: fix udpv6 sendmsg crash caused by too small MTU")
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfrm: fix the if_id check in changelink [+ + +]

Author: Antony Antony <[email protected]>
Date:   Tue Feb 1 07:51:57 2022 +0100

    xfrm: fix the if_id check in changelink
    
    commit 6d0d95a1c2b07270870e7be16575c513c29af3f1 upstream.
    
    if_id will always be 0 at the point of the check, because it has not
    yet been initialized.
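The bug pattern is a check placed before the value it inspects has been parsed. A minimal model follows. The assumptions are labeled here: `struct link_attrs`, `changelink_buggy()` and `changelink_fixed()` are hypothetical stand-ins for the netlink attribute handling. In the buggy order the zero-initialized `if_id` always fails the check; moving the parse first lets valid IDs through while still rejecting 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical parsed attributes: if_id is meaningful only when present. */
struct link_attrs {
	int has_if_id;
	uint32_t if_id;
};

/* Buggy order: validates before parsing, so it always sees 0. */
static int changelink_buggy(const struct link_attrs *a)
{
	uint32_t if_id = 0;

	if (!if_id)
		return -EINVAL;
	if (a->has_if_id)
		if_id = a->if_id;
	return 0;
}

/* Fixed order: parse the attribute first, then validate it. */
static int changelink_fixed(const struct link_attrs *a)
{
	uint32_t if_id = 0;

	if (a->has_if_id)
		if_id = a->if_id;
	if (!if_id)
		return -EINVAL;
	return 0;
}
```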
    
    Fixes: 8dce43919566 ("xfrm: interface with if_id 0 should return error")
    Reported-by: Pavel Machek <[email protected]>
    Signed-off-by: Antony Antony <[email protected]>
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Changes in Linux 5.16.13