Kernel Live Patch Security Notice LSN-0107-1
Linux kernel vulnerabilities
A security issue affects these releases of Ubuntu and its derivatives:
– Ubuntu 20.04 LTS
– Ubuntu 18.04 LTS
– Ubuntu 16.04 LTS
– Ubuntu 22.04 LTS
– Ubuntu 14.04 LTS
Summary
Several security issues were fixed in the kernel.
Software Description
– linux – Linux kernel
– linux-aws – Linux kernel for Amazon Web Services (AWS) systems
– linux-azure – Linux kernel for Microsoft Azure Cloud systems
– linux-gcp – Linux kernel for Google Cloud Platform (GCP) systems
– linux-gke – Linux kernel for Google Container Engine (GKE) systems
– linux-gkeop – Linux kernel for Google Container Engine On-Prem (GKE On-Prem) systems
– linux-ibm – Linux kernel for IBM cloud systems
– linux-oracle – Linux kernel for Oracle Cloud systems
Details
In the Linux kernel, the following vulnerability has been resolved:
inet: inet_defrag: prevent sk release while still in use ip_local_out()
and other functions can pass skb->sk as a function argument. If the skb is
a fragment and reassembly happens before such function call returns, the
sk must not be released. This affects skb fragments reassembled via
netfilter or similar modules, e.g. openvswitch or act_ct.c, when run as
part of the tx pipeline. Eric Dumazet made an initial analysis of this bug.
Quoting Eric: Calling ip_defrag() in output path is also implying
skb_orphan(), which is buggy because output path relies on sk not
disappearing. A relevant old patch about the issue was: 8282f27449bf
(“inet: frag: Always orphan skbs inside ip_defrag()”) [..]
net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
inet socket, not an arbitrary one. If we orphan the packet in ipvlan,
then downstream things like FQ packet scheduler will not work properly.
We need to change ip_defrag() to only use skb_orphan() when really
needed, ie whenever frag_list is going to be used. Eric suggested to
stash sk in fragment queue and made an initial patch. However there is a
problem with this: If skb is refragmented again right after,
ip_do_fragment() will copy head->sk to the new fragments, and sets up
destructor to sock_wfree. IOW, we have no choice but to fix up sk_wmem
accounting to reflect the fully reassembled skb, else wmem will
underflow. This change moves the orphan down into the core, to last
possible moment. As ip_defrag_offset is aliased with sk_buff->sk member,
we must move the offset into the FRAG_CB, else skb->sk gets clobbered.
This allows delaying the orphaning long enough to learn whether the skb has
to be queued or if the skb is completing the reasm queue. In the former
case, things work as before, skb is orphaned. This is safe because skb
gets queued/stolen and won’t continue past reasm engine. In the latter
case, we will steal the skb->sk reference, reattach it to the head skb,
and fix up wmem accounting when inet_frag inflates truesize.
(CVE-2024-26921)
In the Linux kernel, the following vulnerability has been resolved:
af_unix: Fix garbage collector racing against connect() Garbage
collector does not take into account the risk of embryo getting enqueued
during the garbage collection. If such embryo has a peer that carries
SCM_RIGHTS, two consecutive passes of scan_children() may see a
different set of children, leading to an incorrectly elevated inflight
count, and then a dangling pointer within the gc_inflight_list.

Setup: the sockets are AF_UNIX/SOCK_STREAM. S is an unconnected socket.
L is a listening in-flight socket bound to addr, not in the fdtable.
V’s fd will be passed via sendmsg(), so V gets its inflight count bumped.

connect(S, addr):
  NS = unix_create1()
  skb1 = sock_wmalloc(NS)
  L = unix_find_other(addr)
  unix_state_lock(L)
  unix_peer(S) = NS            // V count=1 inflight=0

sendmsg(S, [V]); close(V):
  NS = unix_peer(S)
  skb2 = sock_alloc()
  skb_queue_tail(NS, skb2[V])  // V became in-flight
                               // V count=2 inflight=1
  close(V)                     // V count=1 inflight=1
                               // GC candidate condition met

__unix_gc():
  for u in gc_inflight_list:
    if (total_refs == inflight_refs)
      add u to gc_candidates   // gc_candidates={L, V}
  for u in gc_candidates:
    scan_children(u, dec_inflight)
                               // embryo (skb1) was not reachable
                               // from L yet, so V’s inflight
                               // remains unchanged

connect(S, addr), continued:
  __skb_queue_tail(L, skb1)
  unix_state_unlock(L)

__unix_gc(), continued:
  for u in gc_candidates:
    if (u.inflight)
      scan_children(u, inc_inflight_move_tail)
                               // V count=1 inflight=2 (!)

If there is a GC-candidate listening socket, lock/unlock
its state. This makes GC wait until the end of any ongoing connect() to
that socket. After flipping the lock, a possibly SCM-laden embryo is
already enqueued. And if there is another embryo coming, it can not
possibly carry SCM_RIGHTS. At this point, unix_inflight() can not happen
because unix_gc_lock is already taken. Inflight graph remains
unaffected. (CVE-2024-26923)
In the Linux kernel, the following vulnerability has been resolved: mm:
swap: fix race between free_swap_and_cache() and swapoff() There was
previously a theoretical window where swapoff() could run and teardown a
swap_info_struct while a call to free_swap_and_cache() was running in
another thread. This could cause, amongst other bad possibilities,
swap_page_trans_huge_swapped() (called by free_swap_and_cache()) to
access the freed memory for swap_map. This is a theoretical problem and
I haven’t been able to provoke it from a test case. But there has been
agreement based on code review that this is possible (see link below).
Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that
the swap entry was not free. This isn’t present in get_swap_device()
because it doesn’t make sense in general due to the race between getting
the reference and swapoff. So I’ve added an equivalent check directly in
free_swap_and_cache(). Details of how to provoke one possible issue
(thanks to David Hildenbrand for deriving this): --8<-----
__swap_entry_free() might be the last user and result in “count ==
SWAP_HAS_CACHE”. swapoff->try_to_unuse() will stop as soon as
si->inuse_pages==0. So the question is: could someone reclaim the folio
and turn si->inuse_pages==0 before we complete
swap_page_trans_huge_swapped()? Imagine the following: a 2 MiB folio in
the swapcache, with only 2 subpages still referenced by swap entries.
Process 1 still references subpage 0 via swap entry. Process 2 still
references subpage 1 via swap entry. Process 1 quits. Calls
free_swap_and_cache(). -> count == SWAP_HAS_CACHE [then, preempted in
the hypervisor etc.] Process 2 quits. Calls free_swap_and_cache(). ->
count == SWAP_HAS_CACHE Process 2 goes ahead, passes
swap_page_trans_huge_swapped(), and calls __try_to_reclaim_swap().
__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()-> … WRITE_ONCE(si->inuse_pages,
si->inuse_pages - nr_entries); What stops swapoff from succeeding after
process 2 reclaimed the swap cache but before process 1 finished its call
to swap_page_trans_huge_swapped()? --8<----- (CVE-2024-26960)
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: Fix use-after-free bugs caused by sco_sock_timeout When the
sco connection is established and the sco socket is then being released,
timeout_work is scheduled to judge whether the sco disconnection has
timed out. The sock will be deallocated later, but it is dereferenced
again in sco_sock_timeout. As a result, use-after-free bugs will happen.
The root cause is shown below:

Cleanup Thread               | Worker Thread
sco_sock_release             |
  sco_sock_close             |
    __sco_sock_close         |
      sco_sock_set_timer     |
        schedule_delayed_work|
  sco_sock_kill              | (wait a time)
    sock_put(sk) //FREE      | sco_sock_timeout
                             |   sock_hold(sk) //USE

The KASAN report triggered by the POC is shown below:

[   95.890016] ==================================================================
[   95.890496] BUG: KASAN: slab-use-after-free in sco_sock_timeout+0x5e/0x1c0
[   95.890755] Write of size 4 at addr ffff88800c388080 by task kworker/0:0/7
...
[   95.890755] Workqueue: events sco_sock_timeout
[   95.890755] Call Trace:
[   95.890755]  dump_stack_lvl+0x45/0x110
[   95.890755]  print_address_description+0x78/0x390
[   95.890755]  print_report+0x11b/0x250
[   95.890755]  ? __virt_addr_valid+0xbe/0xf0
[   95.890755]  ? sco_sock_timeout+0x5e/0x1c0
[   95.890755]  kasan_report+0x139/0x170
[   95.890755]  ? update_load_avg+0xe5/0x9f0
[   95.890755]  ? sco_sock_timeout+0x5e/0x1c0
[   95.890755]  kasan_check_range+0x2c3/0x2e0
[   95.890755]  sco_sock_timeout+0x5e/0x1c0
[   95.890755]  process_one_work+0x561/0xc50
[   95.890755]  worker_thread+0xab2/0x13c0
[   95.890755]  ? pr_cont_work+0x490/0x490
[   95.890755]  kthread+0x279/0x300
[   95.890755]  ? pr_cont_work+0x490/0x490
[   95.890755]  ? kthread_blkcg+0xa0/0xa0
[   95.890755]  ret_from_fork+0x34/0x60
[   95.890755]  ? kthread_blkcg+0xa0/0xa0
[   95.890755]  ret_from_fork_asm+0x11/0x20
[   95.890755]
[   95.890755] Allocated by task 506:
[   95.890755]  kasan_save_track+0x3f/0x70
[   95.890755]  __kasan_kmalloc+0x86/0x90
[   95.890755]  __kmalloc+0x17f/0x360
[   95.890755]  sk_prot_alloc+0xe1/0x1a0
[   95.890755]  sk_alloc+0x31/0x4e0
[   95.890755]  bt_sock_alloc+0x2b/0x2a0
[   95.890755]  sco_sock_create+0xad/0x320
[   95.890755]  bt_sock_create+0x145/0x320
[   95.890755]  __sock_create+0x2e1/0x650
[   95.890755]  __sys_socket+0xd0/0x280
[   95.890755]  __x64_sys_socket+0x75/0x80
[   95.890755]  do_syscall_64+0xc4/0x1b0
[   95.890755]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   95.890755]
[   95.890755] Freed by task 506:
[   95.890755]  kasan_save_track+0x3f/0x70
[   95.890755]  kasan_save_free_info+0x40/0x50
[   95.890755]  poison_slab_object+0x118/0x180
[   95.890755]  __kasan_slab_free+0x12/0x30
[   95.890755]  kfree+0xb2/0x240
[   95.890755]  __sk_destruct+0x317/0x410
[   95.890755]  sco_sock_release+0x232/0x280
[   95.890755]  sock_close+0xb2/0x210
[   95.890755]  __fput+0x37f/0x770
[   95.890755]  task_work_run+0x1ae/0x210
[   95.890755]  get_signal+0xe17/0xf70
[   95.890755]  arch_do_signal_or_restart+0x3f/0x520
[   95.890755]  syscall_exit_to_user_mode+0x55/0x120
[   95.890755]  do_syscall_64+0xd1/0x1b0
[   95.890755]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   95.890755]
[   95.890755] The buggy address belongs to the object at ffff88800c388000
[   95.890755]  which belongs to the cache kmalloc-1k of size 1024
[   95.890755] The buggy address is located 128 bytes inside of
[   95.890755]  freed 1024-byte region [ffff88800c388000, ffff88800c388400)
[   95.890755]
[   95.890755] The buggy address belongs to the physical page:
[   95.890755] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800c38a800 pfn:0xc388
[   95.890755] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   95.890755] ano —truncated—
(CVE-2024-27398)
In the Linux kernel, the following vulnerability has been resolved:
watchdog: cpu5wdt.c: Fix use-after-free bug caused by cpu5wdt_trigger
When the cpu5wdt module is being removed, the original code uses
del_timer() to deactivate the timer. If the timer handler is running,
del_timer() cannot stop it and will return directly. If the port region
is released by release_region() and the timer handler cpu5wdt_trigger()
then calls outb() to write into the released region, a use-after-free
bug will occur. Change del_timer() to timer_shutdown_sync() so that the
timer handler is guaranteed to finish before the port region is
released. (CVE-2024-38630)
Update instructions
The problem can be corrected by updating your kernel livepatch to the
following versions:
Ubuntu 20.04 LTS
aws – 107.1
aws – 107.2
azure – 107.1
azure – 107.2
gcp – 107.1
gcp – 107.2
generic – 107.1
generic – 107.2
gke – 107.1
gkeop – 107.1
gkeop – 107.2
ibm – 107.1
ibm – 107.2
lowlatency – 107.1
lowlatency – 107.2
oracle – 107.1
oracle – 107.2
Ubuntu 18.04 LTS
aws – 107.1
aws – 107.2
azure – 107.1
azure – 107.2
gcp – 107.1
gcp – 107.2
generic – 107.1
generic – 107.2
lowlatency – 107.1
lowlatency – 107.2
oracle – 107.1
oracle – 107.2
Ubuntu 16.04 LTS
aws – 107.1
aws – 107.2
azure – 107.1
azure – 107.2
gcp – 107.1
gcp – 107.2
generic – 107.1
generic – 107.2
lowlatency – 107.1
lowlatency – 107.2
Ubuntu 22.04 LTS
aws – 107.1
aws – 107.2
azure – 107.1
azure – 107.2
gcp – 107.1
gcp – 107.2
generic – 107.1
generic – 107.2
gke – 107.1
gke – 107.2
ibm – 107.1
ibm – 107.2
oracle – 107.1
Ubuntu 14.04 LTS
generic – 107.1
lowlatency – 107.1
Support Information
Livepatches for supported LTS kernels will receive upgrades for a period
of up to 13 months after the build date of the kernel.
Livepatches for supported HWE kernels which are not based on an LTS
kernel version will receive upgrades for a period of up to 9 months
after the build date of the kernel, or until the end of support for that
kernel’s non-LTS distro release version, whichever is sooner.
References
– CVE-2024-26921
– CVE-2024-26923
– CVE-2024-26960
– CVE-2024-27398
– CVE-2024-38630