{"id":57786,"date":"2024-07-01T18:49:47","date_gmt":"2024-07-01T15:49:47","guid":{"rendered":"https:\/\/packetstormsecurity.com\/files\/179290\/regresshion.txt"},"modified":"2024-07-01T18:49:47","modified_gmt":"2024-07-01T15:49:47","slug":"openssh-server-regresshion-remote-code-execution","status":"publish","type":"post","link":"https:\/\/afaghhosting.net\/blog\/openssh-server-regresshion-remote-code-execution\/","title":{"rendered":"OpenSSH Server regreSSHion Remote Code Execution"},"content":{"rendered":"<p>Qualys Security Advisory<\/p>\n<p>regreSSHion: RCE in OpenSSH&#8217;s server, on glibc-based Linux systems<br \/>(CVE-2024-6387)<\/p>\n<p>========================================================================<br \/>Contents<br \/>========================================================================<\/p>\n<p>Summary<br \/>SSH-2.0-OpenSSH_3.4p1 Debian 1:3.4p1-1.woody.3 (Debian 3.0r6, from 2005)<br \/>&#8211; Theory<br \/>&#8211; Practice<br \/>&#8211; Timing<br \/>SSH-2.0-OpenSSH_4.2p1 Debian-7ubuntu3 (Ubuntu 6.06.1, from 2006)<br \/>&#8211; Theory, take one<br \/>&#8211; Theory, take two<br \/>&#8211; Practice<br \/>&#8211; Timing<br \/>SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u2 (Debian 12.5.0, from 2024)<br \/>&#8211; Theory<br \/>&#8211; Practice<br \/>&#8211; Timing<br \/>Towards an amd64 exploit<br \/>Patches and mitigation<br \/>Acknowledgments<br \/>Timeline<\/p>\n<p>========================================================================<br \/>Summary<br \/>========================================================================<\/p>\n<p>All it takes is a leap of faith<br \/>&#8212; The Interrupters, &#8220;Leap of Faith&#8221;<\/p>\n<p>Preliminary note: OpenSSH is one of the most secure pieces of software<br \/>in the world; this vulnerability is one slip-up in an otherwise near-flawless<br \/>implementation. 
Its defense-in-depth design and code are a model and an<br \/>inspiration, and we thank OpenSSH&#8217;s developers for their exemplary work.<\/p>\n<p>We discovered a vulnerability (a signal handler race condition) in<br \/>OpenSSH&#8217;s server (sshd): if a client does not authenticate within<br \/>LoginGraceTime seconds (120 by default, 600 in old OpenSSH versions),<br \/>then sshd&#8217;s SIGALRM handler is called asynchronously, but this signal<br \/>handler calls various functions that are not async-signal-safe (for<br \/>example, syslog()). This race condition affects sshd in its default<br \/>configuration.<\/p>\n<p>On investigation, we realized that this vulnerability is in fact a<br \/>regression of CVE-2006-5051 (&#8220;Signal handler race condition in OpenSSH<br \/>before 4.4 allows remote attackers to cause a denial of service (crash),<br \/>and possibly execute arbitrary code&#8221;), which was reported in 2006 by<br \/>Mark Dowd.<\/p>\n<p>This regression was introduced in October 2020 (OpenSSH 8.5p1) by commit<br \/>752250c (&#8220;revised log infrastructure for OpenSSH&#8221;), which accidentally<br \/>removed an &#8220;#ifdef DO_LOG_SAFE_IN_SIGHAND&#8221; from sigdie(), a function<br \/>that is directly called by sshd&#8217;s SIGALRM handler. 
In other words:<\/p>\n<p>&#8211; OpenSSH &lt; 4.4p1 is vulnerable to this signal handler race condition,<br \/>if not backport-patched against CVE-2006-5051, or not patched against<br \/>CVE-2008-4109, which was an incorrect fix for CVE-2006-5051;<\/p>\n<p>&#8211; 4.4p1 &lt;= OpenSSH &lt; 8.5p1 is not vulnerable to this signal handler race<br \/>condition (because the &#8220;#ifdef DO_LOG_SAFE_IN_SIGHAND&#8221; that was added<br \/>to sigdie() by the patch for CVE-2006-5051 transformed this unsafe<br \/>function into a safe _exit(1) call);<\/p>\n<p>&#8211; 8.5p1 &lt;= OpenSSH &lt; 9.8p1 is vulnerable again to this signal handler<br \/>race condition (because the &#8220;#ifdef DO_LOG_SAFE_IN_SIGHAND&#8221; was<br \/>accidentally removed from sigdie()).<\/p>\n<p>This vulnerability is exploitable remotely on glibc-based Linux systems,<br \/>where syslog() itself calls async-signal-unsafe functions (for example,<br \/>malloc() and free()): an unauthenticated remote code execution as root,<br \/>because it affects sshd&#8217;s privileged code, which is not sandboxed and<br \/>runs with full privileges. 
We have not investigated any other libc or<br \/>operating system; but OpenBSD is notably not vulnerable, because its<br \/>SIGALRM handler calls syslog_r(), an async-signal-safer version of<br \/>syslog() that was invented by OpenBSD in 2001.<\/p>\n<p>To exploit this vulnerability remotely (to the best of our knowledge,<br \/>CVE-2006-5051 has never been successfully exploited before), we drew<br \/>inspiration from a visionary paper, &#8220;Delivering Signals for Fun and<br \/>Profit&#8221;, which was published in 2001 by Michal Zalewski:<\/p>\n<p>https:\/\/lcamtuf.coredump.cx\/signals.txt<\/p>\n<p>Nevertheless, we immediately faced three major problems:<\/p>\n<p>&#8211; From a theoretical point of view, we must find a useful code path<br \/>that, if interrupted at the right time by SIGALRM, leaves sshd in an<br \/>inconsistent state, and we must then exploit this inconsistent state<br \/>inside the SIGALRM handler.<\/p>\n<p>&#8211; From a practical point of view, we must find a way to reach this<br \/>useful code path in sshd, and maximize our chances of interrupting it<br \/>at the right time.<\/p>\n<p>&#8211; From a timing point of view, we must find a way to further increase<br \/>our chances of interrupting this useful code path at the right time,<br \/>remotely.<\/p>\n<p>To focus on these three problems without having to immediately fight<br \/>against all the modern operating system protections (in particular, ASLR<br \/>and NX), we decided to exploit old OpenSSH versions first, on i386, and<br \/>then, based on this experience, recent versions:<\/p>\n<p>&#8211; First, &#8220;SSH-2.0-OpenSSH_3.4p1 Debian 1:3.4p1-1.woody.3&#8221;, from<br \/>&#8220;debian-30r6-dvd-i386-binary-1_NONUS.iso&#8221;: this is the first Debian<br \/>version that has privilege separation enabled by default and that is<br \/>patched against all the critical vulnerabilities of that era (in<br \/>particular, CVE-2003-0693 and CVE-2002-0640).<\/p>\n<p>To remotely exploit this version, 
we interrupt a call to free() with<br \/>SIGALRM (inside sshd&#8217;s public-key parsing code), leave the heap in an<br \/>inconsistent state, and exploit this inconsistent state during another<br \/>call to free(), inside the SIGALRM handler.<\/p>\n<p>In our experiments, it takes ~10,000 tries on average to win this race<br \/>condition; i.e., with 10 connections (MaxStartups) accepted per 600<br \/>seconds (LoginGraceTime), it takes ~1 week on average to obtain a<br \/>remote root shell.<\/p>\n<p>&#8211; Second, &#8220;SSH-2.0-OpenSSH_4.2p1 Debian-7ubuntu3&#8221;, from<br \/>&#8220;ubuntu-6.06.1-server-i386.iso&#8221;: this is the last Ubuntu version that<br \/>is still vulnerable to CVE-2006-5051 (&#8220;Signal handler race condition<br \/>in OpenSSH before 4.4&#8221;).<\/p>\n<p>To remotely exploit this version, we interrupt a call to pam_start()<br \/>with SIGALRM, leave one of PAM&#8217;s structures in an inconsistent state,<br \/>and exploit this inconsistent state during a call to pam_end(), inside<br \/>the SIGALRM handler.<\/p>\n<p>In our experiments, it takes ~10,000 tries on average to win this race<br \/>condition; i.e., with 10 connections (MaxStartups) accepted per 120<br \/>seconds (LoginGraceTime), it takes ~1-2 days on average to obtain a<br \/>remote root shell.<\/p>\n<p>&#8211; Finally, &#8220;SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u2&#8221;, from<br \/>&#8220;debian-12.5.0-i386-DVD-1.iso&#8221;: this is the current Debian stable<br \/>version, and it is vulnerable to the regression of CVE-2006-5051.<\/p>\n<p>To remotely exploit this version, we interrupt a call to malloc() with<br \/>SIGALRM (inside sshd&#8217;s public-key parsing code), leave the heap in an<br \/>inconsistent state, and exploit this inconsistent state during another<br \/>call to malloc(), inside the SIGALRM handler (more precisely, inside<br \/>syslog()).<\/p>\n<p>In our experiments, it takes ~10,000 tries on average to win this race<br \/>condition, so ~3-4 hours with 100 
connections (MaxStartups) accepted<br \/>per 120 seconds (LoginGraceTime). Ultimately, it takes ~6-8 hours on<br \/>average to obtain a remote root shell, because we can only guess the<br \/>glibc&#8217;s address correctly half of the time (because of ASLR).<\/p>\n<p>This research is still a work in progress:<\/p>\n<p>&#8211; we have targeted virtual machines only, not bare-metal servers, on a<br \/>mostly stable network link (~10ms packet jitter);<\/p>\n<p>&#8211; we are convinced that various aspects of our exploits can be greatly<br \/>improved;<\/p>\n<p>&#8211; we have started to work on an amd64 exploit, which is much harder<br \/>because of the stronger ASLR.<\/p>\n<p>A few days after we started our work on amd64, we noticed the following<br \/>bug report (in OpenSSH&#8217;s public Bugzilla), about a deadlock in sshd&#8217;s<br \/>SIGALRM handler:<\/p>\n<p>https:\/\/bugzilla.mindrot.org\/show_bug.cgi?id=3690<\/p>\n<p>We therefore decided to contact OpenSSH&#8217;s developers immediately (to let<br \/>them know that this deadlock is caused by an exploitable vulnerability),<br \/>we put our amd64 work on hold, and we started to write this advisory.<\/p>\n<p>========================================================================<br \/>SSH-2.0-OpenSSH_3.4p1 Debian 1:3.4p1-1.woody.3 (Debian 3.0r6, from 2005)<br \/>========================================================================<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Theory<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>But that&#8217;s not like me, I&#8217;m breaking free<br \/>&#8212; The Interrupters, &#8220;Haven&#8217;t Seen the Last of Me&#8221;<\/p>\n<p>The SIGALRM handler of this OpenSSH version calls packet_close(), 
which<br \/>calls buffer_free(), which calls xfree() and hence free(), which is not<br \/>async-signal-safe:<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>302 grace_alarm_handler(int sig)<br \/>303 {<br \/>&#8230;<br \/>307 packet_close();<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>329 packet_close(void)<br \/>330 {<br \/>&#8230;<br \/>341 buffer_free(&amp;input);<br \/>342 buffer_free(&amp;output);<br \/>343 buffer_free(&amp;outgoing_packet);<br \/>344 buffer_free(&amp;incoming_packet);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>35 buffer_free(Buffer *buffer)<br \/>36 {<br \/>37 memset(buffer-&gt;buf, 0, buffer-&gt;alloc);<br \/>38 xfree(buffer-&gt;buf);<br \/>39 }<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>51 xfree(void *ptr)<br \/>52 {<br \/>53 if (ptr == NULL)<br \/>54 fatal(&#8220;xfree: NULL pointer given as argument&#8221;);<br \/>55 free(ptr);<br \/>56 }<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Consequently, we started to read the malloc code of this Debian&#8217;s glibc<br \/>(2.2.5), to see if a first call to free() can be interrupted by SIGALRM<br \/>and exploited during a second call to free() inside the SIGALRM handler<br \/>(at lines 341-344, above). 
Because this glibc&#8217;s malloc is not hardened<br \/>against the unlink() technique pioneered by Solar Designer in 2000, we<br \/>quickly spotted an interesting code path in chunk_free() (which is<br \/>called internally by free()):<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>1028 struct malloc_chunk<br \/>1029 {<br \/>1030 INTERNAL_SIZE_T prev_size; \/* Size of previous chunk (if free). *\/<br \/>1031 INTERNAL_SIZE_T size; \/* Size in bytes, including overhead. *\/<br \/>1032 struct malloc_chunk* fd; \/* double links &#8212; used only if free. *\/<br \/>1033 struct malloc_chunk* bk;<br \/>1034 };<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>2516 #define unlink(P, BK, FD) \\<br \/>2517 { \\<br \/>2518 BK = P-&gt;bk; \\<br \/>2519 FD = P-&gt;fd; \\<br \/>2520 FD-&gt;bk = BK; \\<br \/>2521 BK-&gt;fd = FD; \\<br \/>2522 } \\<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>3160 chunk_free(arena *ar_ptr, mchunkptr p)<br \/>&#8230;.<br \/>3164 {<br \/>3165 INTERNAL_SIZE_T hd = p-&gt;size; \/* its head field *\/<br \/>&#8230;.<br \/>3177 sz = hd &amp; ~PREV_INUSE;<br \/>3178 next = chunk_at_offset(p, sz);<br \/>3179 nextsz = chunksize(next);<br \/>&#8230;.<br \/>3230 if (!(inuse_bit_at_offset(next, nextsz))) \/* consolidate forward *\/<br \/>3231 {<br \/>&#8230;.<br \/>3241 unlink(next, bck, fwd);<br \/>&#8230;.<br \/>3244 }<br \/>3245 else<br \/>3246 set_head(next, nextsz); \/* clear inuse bit *\/<br \/>&#8230;.<br \/>3251 frontlink(ar_ptr, p, sz, idx, bck, fwd);<br 
\/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>To exploit this code path, we arrange for sshd&#8217;s heap to have the<br \/>following layout (chunk_X, chunk_Y, and chunk_Z are malloc()ated chunks<br \/>of memory, and p, s, f, b are their prev_size, size, fd, and bk fields):<\/p>\n<p>-----|---+---------------|---+---------------|---+---------------|-----<br \/>... |p|s|f|b| chunk_X |p|s|f|b| chunk_Y |p|s|f|b| chunk_Z | ...<br \/>-----|---+---------------|---+---------------|---+---------------|-----<br \/>|&lt;-------------&gt;|<br \/>user data<\/p>\n<p>&#8211; First, if a call to free(chunk_Y) is interrupted by SIGALRM *after*<br \/>line 3246 but *before* line 3251, then chunk_Y is already marked as<br \/>free (because chunk_Z&#8217;s PREV_INUSE bit is cleared at line 3246) but it<br \/>is not yet linked into its doubly-linked list (at line 3251): in other<br \/>words, chunk_Y&#8217;s fd and bk pointers still contain user data (attacker-<br \/>controlled data).<\/p>\n<p>&#8211; Second, if (inside the SIGALRM handler) packet_close() calls<br \/>free(chunk_X), then the code block at lines 3230-3244 is entered<br \/>(because chunk_Y is marked as free) and chunk_Y is unlink()ed (at line<br \/>3241): a so-called aa4bmo primitive (almost arbitrary 4 bytes mirrored<br \/>overwrite), because chunk_Y&#8217;s fd and bk pointers are still attacker-<br \/>controlled. 
For more information on the unlink() technique and the<br \/>aa4bmo primitive:<\/p>\n<p>https:\/\/www.openwall.com\/articles\/JPEG-COM-Marker-Vulnerability#exploit<br \/>http:\/\/phrack.org\/issues\/61\/6.html#article<\/p>\n<p>&#8211; Last, with this aa4bmo primitive we overwrite the glibc&#8217;s __free_hook<br \/>function pointer (this old Debian version does not have ASLR, nor NX)<br \/>with the address of our shellcode in the heap, thus achieving remote<br \/>code execution during the next call to free() in packet_close().<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Practice<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Now they&#8217;re taking over and they got complete control<br \/>&#8212; The Interrupters, &#8220;Liberty&#8221;<\/p>\n<p>To mount this attack against sshd, we interrupt a call to free() inside<br \/>sshd&#8217;s parsing code of a DSA public key (i.e., line 144 below is our<br \/>free(chunk_Y)) and exploit it during one of the free() calls in<br \/>packet_close() (i.e., one of the lines 341-344 above is our<br \/>free(chunk_X)):<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>136 buffer_get_bignum2(Buffer *buffer, BIGNUM *value)<br \/>137 {<br \/>138 u_int len;<br \/>139 u_char *bin = buffer_get_string(buffer, &amp;len);<br \/>&#8230;<br \/>143 BN_bin2bn(bin, len, value);<br \/>144 xfree(bin);<br \/>145 }<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Initially, however, we were never able to win this 
race condition (i.e.,<br \/>interrupt the free() call at line 144 at the right time). Eventually, we<br \/>realized that we could greatly improve our chances of winning this race:<br \/>the DSA public-key parsing code allows us to call free() four times (at<br \/>lines 704-707 below), and furthermore sshd allows us to attempt six user<br \/>authentications (AUTH_FAIL_MAX); if any one of these 24 free() calls is<br \/>interrupted at the right time, then we later achieve remote code<br \/>execution inside the SIGALRM handler.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>678 key_from_blob(u_char *blob, int blen)<br \/>679 {<br \/>&#8230;<br \/>693 switch (type) {<br \/>&#8230;<br \/>702 case KEY_DSA:<br \/>703 key = key_new(type);<br \/>704 buffer_get_bignum2(&amp;b, key-&gt;dsa-&gt;p);<br \/>705 buffer_get_bignum2(&amp;b, key-&gt;dsa-&gt;q);<br \/>706 buffer_get_bignum2(&amp;b, key-&gt;dsa-&gt;g);<br \/>707 buffer_get_bignum2(&amp;b, key-&gt;dsa-&gt;pub_key);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>With this improvement, we finally won the race condition after ~1 month:<br \/>we were happy (and did a root-shell dance), but we also felt that there<br \/>was still room for improvement.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Timing<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Don&#8217;t worry, just wait and see<br \/>&#8212; The Interrupters, &#8220;Haven&#8217;t Seen the Last of Me&#8221;<\/p>\n<p>We therefore 
implemented the following threefold timing strategy:<\/p>\n<p>&#8211; We do not wait until the last moment to send our (rather large) DSA<br \/>public-key packet to sshd: instead, we send the entire packet minus<br \/>one byte (the last byte) long before the LoginGraceTime, and send the<br \/>very last byte at the very last moment, to minimize the effects of<br \/>network delays. (And we disable the Nagle algorithm.)<\/p>\n<p>&#8211; We keep track of the median round-trip time (by regularly sending<br \/>packets that produce a response from sshd), and keep track of the<br \/>difference between the moment we are expecting our connection to be<br \/>closed by sshd (essentially the moment we receive the first byte of<br \/>sshd&#8217;s banner, plus LoginGraceTime) and the moment our connection is<br \/>really closed by sshd, and accordingly adjust our timing (i.e., the<br \/>moment when we send the last byte of our DSA packet).<\/p>\n<p>These time differences allow us to track clock skews and network<br \/>delays, which show predictable patterns over time: we experimented<br \/>with linear and spline regressions, but in the end, nothing worked<br \/>better than simply re-using the most recent measurement. 
Possibly,<br \/>deep learning might yield even better results; this is left as an<br \/>exercise for the interested reader.<\/p>\n<p>&#8211; More importantly, we further increase our chances of winning this race<br \/>condition by slowly adjusting our timing through involuntary feedback<br \/>from sshd:<\/p>\n<p>&#8211; if we receive a response (SSH2_MSG_USERAUTH_FAILURE) to our DSA<br \/>public-key packet, then we sent it too early (sshd had the time to<br \/>receive our packet in the unprivileged child, parse it, send it to<br \/>the privileged child, parse it there, and send a response all the<br \/>way back to us);<\/p>\n<p>&#8211; if we cannot even send the last byte of our DSA packet, then we<br \/>waited too long (sshd already received the SIGALRM and closed our<br \/>connection);<\/p>\n<p>&#8211; if we can send the last byte of our DSA packet, and receive no<br \/>response before sshd closes our connection, then our timing was<br \/>reasonably accurate.<\/p>\n<p>This feedback allows us to target what we call the &#8220;large&#8221; race<br \/>window: hitting it does not guarantee that we win the race condition,<br \/>but inside this large window are the 24 &#8220;small&#8221; race windows (inside<br \/>the 24 free() calls) that, if hit, guarantee that we do win the race<br \/>condition.<\/p>\n<p>With these improvements, it takes ~10,000 tries on average to win this<br \/>race condition; i.e., with 10 connections (MaxStartups) accepted per 600<br \/>seconds (LoginGraceTime), it takes ~1 week on average to obtain a remote<br \/>root shell.<\/p>\n<p>========================================================================<br \/>SSH-2.0-OpenSSH_4.2p1 Debian-7ubuntu3 (Ubuntu 6.06.1, from 2006)<br \/>========================================================================<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Theory, 
take one<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>I sleep when the sun starts to rise<br \/>&#8212; The Interrupters, &#8220;Alien&#8221;<\/p>\n<p>The SIGALRM handler of this OpenSSH version does not call packet_close()<br \/>anymore; moreover, this Ubuntu&#8217;s glibc (2.3.6) always takes a mandatory<br \/>lock when entering the functions of the malloc family (even if single-<br \/>threaded like sshd), which prevents us from interrupting a call to one<br \/>of the malloc functions and later exploiting it during another call to<br \/>these functions (they would always deadlock). We must find another<br \/>solution.<\/p>\n<p>CVE-2006-5051 mentions a double-free in GSSAPI, but GSSAPI (or Kerberos)<br \/>is not enabled by default, so this does not sound very appealing. On the<br \/>other hand, PAM is enabled by default, and pam_end() is called by sshd&#8217;s<br \/>SIGALRM handler (and is, of course, not async-signal-safe). We therefore<br \/>searched for a PAM function that, if interrupted by SIGALRM at the right<br \/>time, would leave PAM&#8217;s internal structures in an inconsistent state,<br \/>exploitable during pam_end() in the SIGALRM handler. 
We found<br \/>pam_set_data():<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>33 int pam_set_data(<br \/>34 pam_handle_t *pamh,<br \/>..<br \/>37 void (*cleanup)(pam_handle_t *pamh, void *data, int error_status))<br \/>38 {<br \/>39 struct pam_data *data_entry;<br \/>..<br \/>57 } else if ((data_entry = malloc(sizeof(*data_entry)))) {<br \/>..<br \/>65 data_entry-&gt;next = pamh-&gt;data;<br \/>66 pamh-&gt;data = data_entry;<br \/>..<br \/>74 data_entry-&gt;cleanup = cleanup;<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>If this function is interrupted by SIGALRM *after* line 66 but *before*<br \/>line 74, then data_entry is already linked into PAM&#8217;s structures (pamh),<br \/>but its cleanup field (a function pointer) is not yet initialized (since<br \/>the malloc() at line 57 does not initialize its memory). 
If we are able<br \/>to control cleanup (through leftovers from previous heap allocations),<br \/>then we can execute arbitrary code when pam_end() (inside the SIGALRM<br \/>handler) calls _pam_free_data() (at line 118):<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>104 void _pam_free_data(pam_handle_t *pamh, int status)<br \/>105 {<br \/>106 struct pam_data *last;<br \/>107 struct pam_data *data;<br \/>&#8230;<br \/>112 data = pamh-&gt;data;<br \/>113 <br \/>114 while (data) {<br \/>115 last = data;<br \/>116 data = data-&gt;next;<br \/>117 if (last-&gt;cleanup) {<br \/>118 last-&gt;cleanup(pamh, last-&gt;data, status);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>This would have been an extremely simple exploit; unfortunately, we<br \/>completely overlooked that pam_set_data() can only be called from PAM<br \/>modules: if we interrupt it with SIGALRM, then pamh-&gt;caller_is is still<br \/>_PAM_CALLED_FROM_MODULE, in which case pam_end() returns immediately,<br \/>without ever calling _pam_free_data(). 
Back to the drawing board.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Theory, take two<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Not giving up, it&#8217;s not what we do<br \/>&#8212; The Interrupters, &#8220;Title Holder&#8221;<\/p>\n<p>We noticed that, at line 601 below, sshd passes a pointer to its global<br \/>sshpam_handle pointer directly to pam_start() (which is called once per<br \/>connection):<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>202 static pam_handle_t *sshpam_handle = NULL;<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>584 sshpam_init(Authctxt *authctxt)<br \/>585 {<br \/>&#8230;<br \/>600 sshpam_err =<br \/>601 pam_start(SSHD_PAM_SERVICE, user, &amp;store_conv, &amp;sshpam_handle);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>We therefore decided to look into pam_start() itself: if interrupted by<br \/>SIGALRM, it might leave the structure pointed to by sshpam_handle in an<br \/>inconsistent state, which could then be exploited inside the SIGALRM<br \/>handler, when &#8220;pam_end(sshpam_handle, sshpam_err)&#8221; is called.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>18 int pam_start (<br \/>..<br \/>22 pam_handle_t 
**pamh)<br \/>23 {<br \/>..<br \/>32 if ((*pamh = calloc(1, sizeof(**pamh))) == NULL) {<br \/>&#8230;<br \/>110 if ( _pam_init_handlers(*pamh) != PAM_SUCCESS ) {<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>319 int _pam_init_handlers(pam_handle_t *pamh)<br \/>320 {<br \/>&#8230;<br \/>398 retval = _pam_parse_conf_file(pamh, f, pamh-&gt;service_name, PAM_T_ANY<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>66 static int _pam_parse_conf_file(pam_handle_t *pamh, FILE *f<br \/>..<br \/>73 {<br \/>&#8230;<br \/>252 res = _pam_add_handler(pamh, must_fail, other<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>581 int _pam_add_handler(pam_handle_t *pamh<br \/>&#8230;<br \/>585 {<br \/>&#8230;<br \/>755 the_handlers = (other) ? &amp;pamh-&gt;handlers.other : &amp;pamh-&gt;handlers.conf;<br \/>&#8230;<br \/>767 handler_p = &amp;the_handlers-&gt;authenticate;<br \/>&#8230;<br \/>874 if ((*handler_p = malloc(sizeof(struct handler))) == NULL) {<br \/>&#8230;<br \/>886 (*handler_p)-&gt;next = NULL;<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>At line 32, pam_start() immediately sets sshd&#8217;s sshpam_handle to a<br \/>calloc()ated chunk of memory; this is safe, because calloc() initializes<br \/>this memory to zero. 
On the other hand, if _pam_add_handler() (which is<br \/>called multiple times by pam_start()) is interrupted by SIGALRM *after*<br \/>line 874 but *before* line 886, then a malloc()ated structure is linked<br \/>into pamh, but its next field is not yet initialized. If we are able to<br \/>control next (through leftovers from previous heap allocations), then we<br \/>can pass an arbitrary pointer to free() during the call to pam_end()<br \/>(inside the SIGALRM handler), at line 1020 (and line 1017) below:<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>11 int pam_end(pam_handle_t *pamh, int pam_status)<br \/>12 {<br \/>..<br \/>31 if ((ret = _pam_free_handlers(pamh)) != PAM_SUCCESS) {<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>925 int _pam_free_handlers(pam_handle_t *pamh)<br \/>926 {<br \/>&#8230;<br \/>954 _pam_free_handlers_aux(&amp;(pamh-&gt;handlers.conf.authenticate));<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>1009 void _pam_free_handlers_aux(struct handler **hp)<br \/>1010 {<br \/>1011 struct handler *h = *hp;<br \/>1012 struct handler *last;<br \/>&#8230;.<br \/>1015 while (h) {<br \/>1016 last = h;<br \/>1017 _pam_drop(h-&gt;argv); \/* This is all alocated in a single chunk *\/<br \/>1018 h = h-&gt;next;<br \/>1019 memset(last, 0, sizeof(*last));<br \/>1020 free(last);<br \/>1021 }<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Because the malloc of this Ubuntu&#8217;s glibc is already hardened against<br \/>the old 
unlink() technique, we decided to transform our arbitrary free()<br \/>into the Malloc Maleficarum&#8217;s House of Mind (fastbin version): we free()<br \/>our own NON_MAIN_ARENA chunk, point our fake arena to sshd&#8217;s .got.plt<br \/>(this Ubuntu&#8217;s sshd has ASLR but not PIE), and overwrite _exit()&#8217;s entry<br \/>with the address of our shellcode in the heap (this Ubuntu&#8217;s heap is<br \/>still executable by default). For more information on the Malloc<br \/>Maleficarum:<\/p>\n<p>https:\/\/seclists.org\/bugtraq\/2005\/Oct\/118<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Practice<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>I learned everything the hard way<br \/>&#8212; The Interrupters, &#8220;The Hard Way&#8221;<\/p>\n<p>To mount this attack against sshd, we initially faced three problems:<\/p>\n<p>&#8211; The House of Mind requires us to store the pointer to our fake arena<br \/>at address 0x08100000 in the heap; but are we able to store attacker-<br \/>controlled data at such a high address? 
Because sshd calls pam_start()<br \/>at the very beginning of the user authentication, we do not control<br \/>anything except the user name itself; luckily, a user name of length<br \/>~128KB (shorter than DEFAULT_MMAP_THRESHOLD) allows us to store our<br \/>own data at address 0x08100000.<\/p>\n<p>&#8211; The size field of our fake NON_MAIN_ARENA chunk must not be too large<br \/>(to pass free()&#8217;s security checks); i.e., it must contain null bytes.<br \/>But our long user name is a null-terminated string that cannot contain<br \/>null bytes; luckily we remembered that _pam_free_handlers_aux() zeroes<br \/>the structures that it free()s (line 1019 above): we therefore &#8220;patch&#8221;<br \/>the size field of our fake chunk with such a memset(0), and only then<br \/>free() it.<\/p>\n<p>&#8211; We must survive several calls to free() (at lines 1017 and 1020 above)<br \/>before the free() of our fake NON_MAIN_ARENA chunk. We transform these<br \/>free()s into no-ops by pointing them to fake IS_MMAPPED chunks: free()<br \/>calls munmap_chunk(), which calls munmap(), which fails because these<br \/>fake IS_MMAPPED chunks are misaligned; effectively a no-op, because<br \/>assert()ion failures are not enforced in this Ubuntu&#8217;s glibc.<\/p>\n<p>Finally, our long user name also allows us to control the potentially<br \/>uninitialized next field of 20 different structures (through leftovers<br \/>from temporary copies of our long user name), because pam_start() calls<br \/>_pam_add_handler() multiple times; i.e., our large race window contains<br \/>20 small race windows.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Timing<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Same tricks they 
used before<br \/>&#8212; The Interrupters, &#8220;Divide Us&#8221;<\/p>\n<p>For this attack against Ubuntu 6.06.1, we simply re-used the timing<br \/>strategy that we used against Debian 3.0r6: it takes ~10,000 tries on<br \/>average to win the race condition, and with 10 connections (MaxStartups)<br \/>accepted per 120 seconds (LoginGraceTime), it takes ~1-2 days on average<br \/>to obtain a remote root shell.<\/p>\n<p>Note: because this Ubuntu&#8217;s glibc always takes a mandatory lock when<br \/>entering the functions of the malloc family, an unlucky attacker might<br \/>deadlock all 10 MaxStartups connections before obtaining a root shell;<br \/>we have not tried to work around this problem because our ultimate goal<br \/>was to exploit a modern OpenSSH version anyway.<\/p>\n<p>========================================================================<br \/>SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u2 (Debian 12.5.0, from 2024)<br \/>========================================================================<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Theory<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Now you&#8217;re ready, take the demons head on<br \/>&#8212; The Interrupters, &#8220;Be Gone&#8221;<\/p>\n<p>The SIGALRM handler of this OpenSSH version does not call packet_close()<br \/>nor pam_end(); in fact it calls only one interesting function, syslog():<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>358 grace_alarm_handler(int sig)<br \/>359 {<br \/>&#8230;<br \/>370 sigdie(&#8220;Timeout before authentication for %s port %d&#8221;,<br \/>371 
ssh_remote_ipaddr(the_active_state),<br \/>372 ssh_remote_port(the_active_state));<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>96 #define sigdie(&#8230;) sshsigdie(__FILE__, __func__, __LINE__, 0, SYSLOG_LEVEL_ERROR, NULL, __VA_ARGS__)<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>451 sshsigdie(const char *file, const char *func, int line, int showfunc,<br \/>452 LogLevel level, const char *suffix, const char *fmt, &#8230;)<br \/>453 {<br \/>&#8230;<br \/>457 sshlogv(file, func, line, showfunc, SYSLOG_LEVEL_FATAL,<br \/>458 suffix, fmt, args);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>464 sshlogv(const char *file, const char *func, int line, int showfunc,<br \/>465 LogLevel level, const char *suffix, const char *fmt, va_list args)<br \/>466 {<br \/>&#8230;<br \/>489 do_log(level, forced, suffix, fmt2, args);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>337 do_log(LogLevel level, int force, const char *suffix, const char *fmt,<br \/>338 va_list args)<br \/>339 {<br \/>&#8230;<br \/>419 syslog(pri, &#8220;%.500s&#8221;, fmtbuf);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Our two key questions, then, are: Does the syslog() of this Debian&#8217;s<br \/>glibc (2.36) call async-signal-unsafe functions such as malloc() and<br \/>free()? 
And if yes, does this glibc still take a mandatory lock when<br \/>entering the functions of the malloc family?<\/p>\n<p>&#8211; Luckily for us attackers, the answer to our first question is yes; if,<br \/>and only if, the syslog() inside the SIGALRM handler is the very first<br \/>call to syslog(), then __localtime64_r() (which is called by syslog())<br \/>calls malloc(304) to allocate a FILE structure (at line 166) and calls<br \/>malloc(4096) to allocate an internal read buffer (at line 186):<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>28 __localtime64_r (const __time64_t *t, struct tm *tp)<br \/>29 {<br \/>30 return __tz_convert (*t, 1, tp);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>567 __tz_convert (__time64_t timer, int use_localtime, struct tm *tp)<br \/>568 {<br \/>&#8230;<br \/>577 tzset_internal (tp == &amp;_tmbuf &amp;&amp; use_localtime);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>367 tzset_internal (int always)<br \/>368 {<br \/>&#8230;<br \/>405 __tzfile_read (tz, 0, NULL);<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>105 __tzfile_read (const char *file, size_t extra, char **extrap)<br \/>106 {<br \/>&#8230;<br \/>109 FILE *f;<br \/>&#8230;<br \/>166 f = fopen (file, &#8220;rce&#8221;);<br \/>&#8230;<br \/>186 if (__builtin_expect (__fread_unlocked ((void *) &amp;tzhead, sizeof (tzhead),<br \/>187 1, f) != 1, 0)<br 
\/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Note: because we do not control anything about these malloc()ations<br \/>(not their order, not their sizes, not their contents), we took the<br \/>&#8220;rce&#8221; at line 166 as a much-needed good omen.<\/p>\n<p>&#8211; And luckily for us, the answer to our second question is no; since<br \/>October 2017, the glibc&#8217;s malloc functions do not take any lock<br \/>anymore, when single-threaded (like sshd):<\/p>\n<p>https:\/\/sourceware.org\/git?p=glibc.git;a=commit;h=a15d53e2de4c7d83bda251469d92a3c7b49a90db<br \/>https:\/\/sourceware.org\/git?p=glibc.git;a=commit;h=3f6bb8a32e5f5efd78ac08c41e623651cc242a89<br \/>https:\/\/sourceware.org\/git?p=glibc.git;a=commit;h=905a7725e9157ea522d8ab97b4c8b96aeb23df54<\/p>\n<p>Moreover, this Debian version suffers from the ASLR weakness described<br \/>in the following great blog posts (by Justin Miller and Mathias Krause,<br \/>respectively):<\/p>\n<p>https:\/\/zolutal.github.io\/aslrnt\/<br \/>https:\/\/grsecurity.net\/toolchain_necromancy_past_mistakes_haunting_aslr<\/p>\n<p>Concretely, in the case of sshd on i386, every memory mapping is<br \/>randomized normally (sshd&#8217;s PIE, the heap, most libraries, the stack),<br \/>but the glibc itself is always mapped either at address 0xb7200000 or at<br \/>address 0xb7400000; in other words, we can correctly guess the glibc&#8217;s<br \/>address half of the time (a small price to pay for defeating ASLR). 
In<br \/>our exploit we assume that the glibc is mapped at address 0xb7400000,<br \/>because it is slightly more common than 0xb7200000.<\/p>\n<p>Our next question is: which code paths inside the glibc&#8217;s malloc<br \/>functions, if interrupted by SIGALRM at the right time, leave the heap<br \/>in an inconsistent state, exploitable during one of the malloc() calls<br \/>inside the SIGALRM handler?<\/p>\n<p>We found several interesting (and surprising!) code paths, but the one<br \/>we chose involves only relative sizes, not absolute addresses (unlike<br \/>various code paths inside unlink_chunk(), for example); this difference<br \/>might prove crucial for a future amd64 exploit. This code path, inside<br \/>malloc(), splits a large free chunk (victim) into two smaller chunks;<br \/>the first chunk is returned to malloc()&#8217;s caller (at line 4345) and the<br \/>second chunk (remainder) is linked into an unsorted list of free chunks<br \/>(at lines 4324-4327):<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>1449 #define set_head(p, s) ((p)-&gt;mchunk_size = (s))<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>3765 _int_malloc (mstate av, size_t bytes)<br \/>3766 {<br \/>&#8230;.<br \/>3798 nb = checked_request2size (bytes);<br \/>&#8230;.<br \/>4295 size = chunksize (victim);<br \/>&#8230;.<br \/>4300 remainder_size = size &#8211; nb;<br \/>&#8230;.<br \/>4316 remainder = chunk_at_offset (victim, nb);<br \/>&#8230;.<br \/>4320 bck = unsorted_chunks (av);<br \/>4321 fwd = bck-&gt;fd;<br \/>&#8230;.<br \/>4324 remainder-&gt;bk = bck;<br \/>4325 remainder-&gt;fd = fwd;<br \/>4326 bck-&gt;fd = remainder;<br \/>4327 fwd-&gt;bk = remainder;<br \/>&#8230;.<br \/>4337 set_head (victim, nb | PREV_INUSE 
|<br \/>4338 (av != &amp;main_arena ? NON_MAIN_ARENA : 0));<br \/>4339 set_head (remainder, remainder_size | PREV_INUSE);<br \/>&#8230;.<br \/>4343 void *p = chunk2mem (victim);<br \/>&#8230;.<br \/>4345 return p;<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>&#8211; If this code path is interrupted by SIGALRM *after* line 4327 but<br \/>*before* line 4339, then the remainder chunk of this split is already<br \/>linked into the unsorted list of free chunks (lines 4324-4327), but<br \/>its size field (mchunk_size) is not yet initialized (line 4339).<\/p>\n<p>&#8211; If we are able to control its size field (through leftovers from<br \/>previous heap allocations), then we can make this remainder chunk<br \/>larger and overlap with other heap chunks, and therefore corrupt heap<br \/>memory when this enlarged, overlapping remainder chunk is eventually<br \/>malloc()ated and written to (inside the SIGALRM handler).<\/p>\n<p>Our last question, then, is: given that we do not control anything about<br \/>the malloc() calls inside the SIGALRM handler, what can we overwrite in<br \/>the heap to achieve arbitrary code execution before sshd calls _exit()<br \/>(in sshsigdie())?<\/p>\n<p>Because __tzfile_read() (inside the SIGALRM handler) malloc()ates a FILE<br \/>structure in the heap (at line 166 above), and because FILE structures<br \/>have a long history of abuse for arbitrary code execution, we decided to<br \/>aim our heap corruption at this FILE structure. 
This is, however, easier<br \/>said than done: our heap corruption is very limited, and FILE structures<br \/>have been significantly hardened over the years (by IO_validate_vtable()<br \/>and PTR_DEMANGLE(), for example).<\/p>\n<p>Eventually, we devised the following technique (which seems to be<br \/>specific to the i386 glibc &#8212; the amd64 glibc does not seem to use<br \/>_vtable_offset at all):<\/p>\n<p>&#8211; with our limited heap corruption, we overwrite the _vtable_offset<br \/>field (a single signed char) of __tzfile_read()&#8217;s FILE structure;<\/p>\n<p>&#8211; the glibc&#8217;s libio functions will therefore look for this FILE<br \/>structure&#8217;s vtable pointer (a pointer to an array of function<br \/>pointers) at a non-zero offset (our overwritten _vtable_offset),<br \/>instead of the default zero offset;<\/p>\n<p>&#8211; we (attackers) can easily control this fake vtable pointer (through<br \/>leftovers from previous heap allocations), because the FILE structure<br \/>around this offset is not explicitly initialized by fopen();<\/p>\n<p>&#8211; to pass the glibc&#8217;s security checks, our fake vtable pointer must<br \/>point somewhere into the __libc_IO_vtables section: we decided to<br \/>point it to the vtable for wide-character streams, _IO_wfile_jumps<br \/>(i.e., to 0xb761b740, since we assume that the glibc is mapped at<br \/>address 0xb7400000);<\/p>\n<p>&#8211; as a result, __fread_unlocked() (at line 186 above) calls<br \/>_IO_wfile_underflow() (instead of _IO_file_underflow()), which calls a<br \/>function pointer (__fct) that basically comes from a structure whose<br \/>pointer (_codecvt) is yet another field of the FILE structure;<\/p>\n<p>&#8211; we (attackers) can easily control this _codecvt pointer (through<br \/>leftovers from previous heap allocations, because this field of the<br \/>FILE structure is not explicitly initialized by fopen()), which also<br \/>allows us to control the __fct function pointer.<\/p>\n<p>In 
summary, by overwriting a single byte (_vtable_offset) of the FILE<br \/>structure malloc()ated by fopen(), we can call our own __fct function<br \/>pointer and execute arbitrary code during __fread_unlocked().<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Practice<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>I wanted it perfect, no wrinkles in it<br \/>&#8212; The Interrupters, &#8220;In the Mirror&#8221;<\/p>\n<p>To mount this attack against sshd&#8217;s privileged child, let us first<br \/>imagine the following heap layout (the &#8220;XXX&#8221;s are &#8220;barrier&#8221; chunks that<br \/>allow us to make holes in the heap; for example, small memory-leaked<br \/>chunks):<\/p>\n<p>&#8212;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;<br \/>XXX| large hole |XXX| small hole |XXX<br \/>&#8212;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;<br \/>| ~8KB | | 320B |<\/p>\n<p>&#8211; shortly before sshd receives the SIGALRM, we malloc()ate a ~4KB chunk<br \/>that splits the large ~8KB hole into two smaller chunks:<\/p>\n<p>&#8212;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;<br \/>XXX| large allocated chunk | free remainder chunk |XXX| small hole |XXX<br \/>&#8212;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;<br \/>| ~4KB | ~4KB | | 320B |<\/p>\n<p>&#8211; but if this malloc() 
is interrupted by SIGALRM *after* line 4327 but<br \/>*before* line 4339, then the remainder chunk of this split is already<br \/>linked into the unsorted list of free chunks, but its size field is<br \/>under our control (through leftovers from previous heap allocations),<br \/>and this artificially enlarged remainder chunk overlaps with the<br \/>following small hole:<\/p>\n<p>&#8212;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;<br \/>XXX| large allocated chunk | real remainder chunk |XXX| small hole |XXX<br \/>&#8212;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;<br \/>| ~4KB |&lt;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-&gt;|<br \/>artificially enlarged remainder chunk<\/p>\n<p>&#8211; when the SIGALRM handler calls syslog() and hence __tzfile_read(),<br \/>fopen() malloc()ates the small hole for its FILE structure, and<br \/>__fread_unlocked() malloc()ates a 4KB read buffer, thereby splitting<br \/>the enlarged remainder chunk in two (the 4KB read buffer and a small<br \/>remainder chunk):<\/p>\n<p>&#8212;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;<br \/>XXX| large allocated chunk | |XXX| FILE |XXX<br \/>&#8212;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;|&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-|&#8212;|&#8211;|&#8212;&#8212;&#8212;|&#8212;<br \/>| ~4KB |&lt;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&gt;|&lt;&#8212;&#8212;-&gt;|<br \/>4KB read buffer remainder<\/p>\n<p>&#8211; we therefore overwrite parts of the FILE structure with the internal<br \/>header of this small remainder chunk: more precisely, we overwrite the<br \/>FILE&#8217;s _vtable_offset 
with the third byte of this header&#8217;s bk field,<br \/>which is a pointer to the unsorted list of free chunks, 0xb761d7f8<br \/>(i.e., we overwrite _vtable_offset with 0x61);<\/p>\n<p>&#8211; then, as explained in the &#8220;Theory&#8221; subsection, __fread_unlocked()<br \/>calls _IO_wfile_underflow() (instead of _IO_file_underflow()), which<br \/>calls our own __fct function pointer (through our own _codecvt<br \/>pointer) and executes our arbitrary code.<\/p>\n<p>Note: we have not yet explained how to reliably go from a controlled<br \/>_codecvt pointer to a controlled __fct function pointer; we will do<br \/>so, but we must first solve a more pressing problem.<\/p>\n<p>Indeed, we learned from our work on older OpenSSH versions that we will<br \/>never win this signal handler race condition if our large race window<br \/>contains only one small race window. Consequently, we implemented the<br \/>following strategy, based on the following heap layout:<\/p>\n<p>&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;<br \/>XXX|large hole 1|XXX|small hole 1|XXX|large hole 2|XXX|small hole 2|&#8230;<br \/>&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;|&#8212;&#8212;&#8212;&#8212;|&#8212;<br \/>| ~8KB | | 320B | | ~8KB | | 320B |<\/p>\n<p>The last packet that we send to sshd (shortly before the delivery of<br \/>SIGALRM) forces sshd to perform the following sequence of malloc()<br \/>calls: malloc(~4KB), malloc(304), malloc(~4KB), malloc(304), etc.<\/p>\n<p>1\/ Our first malloc(~4KB) splits the large hole 1 in two:<\/p>\n<p>&#8211; if this first split is interrupted by SIGALRM at the right time, then<br \/>the fopen() inside the SIGALRM handler malloc()ates the small hole 1<br \/>for its FILE structure, and we achieve arbitrary code execution as<br \/>explained above;<\/p>\n<p>&#8211; if not, 
then we malloc()ate the small hole 1 ourselves with our first<br \/>malloc(304), and:<\/p>\n<p>2\/ Our second malloc(~4KB) splits the large hole 2 in two:<\/p>\n<p>&#8211; if this second split is interrupted by SIGALRM at the right time, then<br \/>the fopen() inside the SIGALRM handler malloc()ates the small hole 2<br \/>for its FILE structure, and we achieve arbitrary code execution as<br \/>explained above;<\/p>\n<p>&#8211; if not, then we malloc()ate the small hole 2 ourselves with our second<br \/>malloc(304), etc.<\/p>\n<p>We were able to make 27 pairs of such large and small holes in sshd&#8217;s<br \/>heap (28 would exceed PACKET_MAX_SIZE, 256KB): our large race window now<br \/>contains 27 small race windows! Achieving this complex heap layout was<br \/>extremely painful and time-consuming, but the two highlights are:<\/p>\n<p>&#8211; We abuse sshd&#8217;s public-key parsing code to perform arbitrary sequences<br \/>of malloc() and free() calls (at lines 1805 and 573):<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>1754 cert_parse(struct sshbuf *b, struct sshkey *key, struct sshbuf *certbuf)<br \/>1755 {<br \/>&#8230;.<br \/>1797 while (sshbuf_len(principals) &gt; 0) {<br \/>&#8230;.<br \/>1805 if ((ret = sshbuf_get_cstring(principals, &amp;principal,<br \/>&#8230;.<br \/>1820 key-&gt;cert-&gt;principals[key-&gt;cert-&gt;nprincipals++] = principal;<br \/>1821 }<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>562 cert_free(struct sshkey_cert *cert)<br \/>563 {<br \/>&#8230;<br \/>572 for (i = 0; i &lt; cert-&gt;nprincipals; i++)<br \/>573 free(cert-&gt;principals[i]);<br 
\/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>&#8211; We were unable to find a memory leak for our small &#8220;barrier&#8221; chunks;<br \/>instead, we use tcache chunks (which are never really freed, because<br \/>their inuse bit is never cleared) as makeshift &#8220;barrier&#8221; chunks.<\/p>\n<p>To reliably achieve this heap layout, we send five different public-key<br \/>packets to sshd (packets a\/ to d\/ can be sent long before SIGALRM; most<br \/>of packet e\/ can also be sent long before SIGALRM, but its very last<br \/>byte must be sent at the very last moment):<\/p>\n<p>a\/ We malloc()ate and free() a variety of tcache chunks, to ensure that<br \/>the heap allocations that we do not control end up in these tcache<br \/>chunks and do not interfere with our careful heap layout.<\/p>\n<p>b\/ We malloc()ate and free() chunks of various sizes, to make our 27<br \/>pairs of large and small holes (and the corresponding &#8220;barrier&#8221; chunks).<\/p>\n<p>c\/ We malloc()ate and free() ~4KB chunks and 320B chunks, to:<\/p>\n<p>&#8211; write the fake header (the large size field) of our potentially<br \/>enlarged remainder chunk, into the middle of our large holes;<\/p>\n<p>&#8211; write the fake footer of our potentially enlarged remainder chunk, to<br \/>the end of our small holes (to pass the glibc&#8217;s security checks);<\/p>\n<p>&#8211; write our fake vtable and _codecvt pointers, into our small holes<br \/>(which are potential FILE structures).<\/p>\n<p>d\/ We malloc()ate and free() one very large string (nearly 256KB), to<br \/>ensure that our large and small holes are removed from the unsorted list<br \/>of free chunks and placed into their respective malloc bins.<\/p>\n<p>e\/ We force sshd to perform our final sequence of malloc() calls<br \/>(malloc(~4KB), malloc(304), malloc(~4KB), malloc(304), etc), to open 
our<br \/>27 small race windows.<\/p>\n<p>Attentive readers may have noticed that we have still not addressed<br \/>(literally and figuratively) the problem of _codecvt. In fact, _codecvt<br \/>is a pointer to a structure (_IO_codecvt) that contains a pointer to a<br \/>structure (__gconv_step) that contains the __fct function pointer that<br \/>allows us to execute arbitrary code. To reliably control __fct through<br \/>_codecvt, we simply point _codecvt to one of the glibc&#8217;s malloc bins,<br \/>which conveniently contains a pointer to one of our free chunks in the<br \/>heap, which contains our own __fct function pointer to arbitrary glibc<br \/>code (all of these glibc addresses are known to us, because we assume<br \/>that the glibc is mapped at address 0xb7400000).<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>Timing<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>We&#8217;re running out of time<br \/>&#8212; The Interrupters, &#8220;As We Live&#8221;<\/p>\n<p>As we implemented this third exploit, it became clear that we could not<br \/>simply re-use the timing strategy that we had used against the two older<br \/>OpenSSH versions: we were never winning this new race condition.<br \/>Eventually, we understood why:<\/p>\n<p>&#8211; It takes a long time (~10ms) for sshd to parse our fifth and last<br \/>public key (packet e\/ above); in other words, our large race window is<br \/>too large (our 27 small race windows are like needles in a haystack).<\/p>\n<p>&#8211; The user_specific_delay() that was introduced recently (OpenSSH 7.8p1)<br \/>delays sshd&#8217;s response to our last public-key packet by up to ~9ms and<br \/>therefore destroys our feedback-based timing strategy.<\/p>\n<p>As 
a result, we developed a completely different timing strategy:<\/p>\n<p>&#8211; from time to time, we send our last public-key packet with a little<br \/>mistake that produces an error response (lines 138-142 below), right<br \/>before the call to sshkey_from_blob() that parses our public key;<\/p>\n<p>&#8211; from time to time, we send our last public-key packet with another<br \/>little mistake that produces an error response (lines 151-155 below),<br \/>right after the call to sshkey_from_blob() that parses our public key;<\/p>\n<p>&#8211; the difference between these two response times is the time that it<br \/>takes for sshd to parse our last public key, and this allows us to<br \/>precisely time the transmission of our last packets (to ensure that<br \/>sshd has the time to parse our public key in the unprivileged child,<br \/>send it to the privileged child, and start to parse it there, before<br \/>the delivery of SIGALRM).<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>88 userauth_pubkey(struct ssh *ssh, const char *method)<br \/>89 {<br \/>&#8230;<br \/>138 if (pktype == KEY_UNSPEC) {<br \/>139 \/* this is perfectly legal *\/<br \/>140 verbose_f(&#8220;unsupported public key algorithm: %s&#8221;, pkalg);<br \/>141 goto done;<br \/>142 }<br \/>143 if ((r = sshkey_from_blob(pkblob, blen, &amp;key)) != 0) {<br \/>144 error_fr(r, &#8220;parse key&#8221;);<br \/>145 goto done;<br \/>146 }<br \/>&#8230;<br \/>151 if (key-&gt;type != pktype) {<br \/>152 error_f(&#8220;type mismatch for decoded key &#8220;<br \/>153 &#8220;(received %d, expected %d)&#8221;, key-&gt;type, pktype);<br \/>154 goto done;<br \/>155 }<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>With this change in strategy, it 
takes ~10,000 tries on average to win<br \/>the race condition; i.e., with 100 connections (MaxStartups) accepted<br \/>per 120 seconds (LoginGraceTime), it takes ~3-4 hours on average to win<br \/>the race condition, and ~6-8 hours to obtain a remote root shell<br \/>(because of ASLR).<\/p>\n<p>========================================================================<br \/>Towards an amd64 exploit<br \/>========================================================================<\/p>\n<p>What&#8217;s your plan for tomorrow?<br \/>&#8212; The Interrupters, &#8220;Take Back the Power&#8221;<\/p>\n<p>We decided to target Rocky Linux 9 (a Red Hat Enterprise Linux 9<br \/>derivative), from &#8220;Rocky-9.4-x86_64-minimal.iso&#8221;, for two reasons:<\/p>\n<p>&#8211; its OpenSSH version (8.7p1) is vulnerable to this signal handler race<br \/>condition and its glibc is always mapped at a multiple of 2MB (because<br \/>of the ASLR weakness discussed in the previous &#8220;Theory&#8221; subsection),<br \/>which makes partial pointer overwrites much more powerful;<\/p>\n<p>&#8211; the syslog() function (which is async-signal-unsafe but is called by<br \/>sshd&#8217;s SIGALRM handler) of this glibc version (2.34) internally calls<br \/>__open_memstream(), which malloc()ates a FILE structure in the heap,<br \/>and also calls calloc(), realloc(), and free() (which gives us some<br \/>much-needed freedom).<\/p>\n<p>With a heap corruption as a primitive, two FILE structures malloc()ated<br \/>in the heap, and 21 fixed bits in the glibc&#8217;s addresses, we believe that<br \/>this signal handler race condition is exploitable on amd64 (probably not<br \/>in ~6-8 hours, but hopefully in less than a week). 
Only time will tell.<\/p>\n<p>Side note: we discovered that Ubuntu 24.04 does not re-randomize the<br \/>ASLR of its sshd children (it is randomized only once, at boot time); we<br \/>tracked this down to the patch below, which turns off sshd&#8217;s rexec_flag.<br \/>This is generally a bad idea, but in the particular case of this signal<br \/>handler race condition, it prevents sshd from being exploitable: the<br \/>syslog() inside the SIGALRM handler does not call any of the malloc<br \/>functions, because it is never the very first call to syslog().<\/p>\n<p>https:\/\/git.launchpad.net\/ubuntu\/+source\/openssh\/tree\/debian\/patches\/systemd-socket-activation.patch<\/p>\n<p>========================================================================<br \/>Patches and mitigation<br \/>========================================================================<\/p>\n<p>The storm has come and gone<br \/>&#8212; The Interrupters, &#8220;Good Things&#8221;<\/p>\n<p>On June 6, 2024, this signal handler race condition was fixed by commit<br \/>81c1099 (&#8220;Add a facility to sshd(8) to penalise particular problematic<br \/>client behaviours&#8221;), which moved the async-signal-unsafe code from<br \/>sshd&#8217;s SIGALRM handler to sshd&#8217;s listener process, where it can be<br \/>handled synchronously:<\/p>\n<p>https:\/\/github.com\/openssh\/openssh-portable\/commit\/81c1099d22b81ebfd20a334ce986c4f753b0db29<\/p>\n<p>Because this fix is part of a large commit (81c1099), on top of an even<br \/>larger defense-in-depth commit (03e3de4, &#8220;Start the process of splitting<br \/>sshd into separate binaries&#8221;), it might prove difficult to backport. 
In<br \/>that case, the signal handler race condition itself can be fixed by<br \/>removing or commenting out the async-signal-unsafe code from the<br \/>sshsigdie() function; for example:<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>sshsigdie(const char *file, const char *func, int line, int showfunc,<br \/>LogLevel level, const char *suffix, const char *fmt, &#8230;)<br \/>{<br \/>#if 0<br \/>va_list args;<\/p>\n<p>va_start(args, fmt);<br \/>sshlogv(file, func, line, showfunc, SYSLOG_LEVEL_FATAL,<br \/>suffix, fmt, args);<br \/>va_end(args);<br \/>#endif<br \/>_exit(1);<br \/>}<br \/>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>Finally, if sshd cannot be updated or recompiled, this signal handler<br \/>race condition can be fixed by simply setting LoginGraceTime to 0 in the<br \/>configuration file. This makes sshd vulnerable to a denial of service<br \/>(the exhaustion of all MaxStartups connections), but it makes it safe<br \/>from the remote code execution presented in this advisory.<\/p>\n<p>========================================================================<br \/>Acknowledgments<br \/>========================================================================<\/p>\n<p>We thank OpenSSH&#8217;s developers for their outstanding work and close<br \/>collaboration on this release. We also thank the distros@openwall.<br \/>Finally, we dedicate this advisory to Sophia d&#8217;Antoine.<\/p>\n<p>========================================================================<br \/>Timeline<br \/>========================================================================<\/p>\n<p>2024-05-19: We contacted OpenSSH&#8217;s developers. 
Successive iterations of<br \/>patches and patch reviews followed.<\/p>\n<p>2024-06-20: We contacted the distros@openwall.<\/p>\n<p>2024-07-01: Coordinated Release Date.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Qualys Security Advisory regreSSHion: RCE in OpenSSH&#8217;s server, on glibc-based Linux systems(CVE-2024-6387) ========================================================================Contents======================================================================== SummarySSH-2.0-OpenSSH_3.4p1 Debian 1:3.4p1-1.woody.3 (Debian 3.0r6, from 2005)&#8211; Theory&#8211; Practice&#8211; TimingSSH-2.0-OpenSSH_4.2p1 Debian-7ubuntu3 (Ubuntu 6.06.1, from 2006)&#8211; Theory, take one&#8211; Theory, take two&#8211; Practice&#8211; TimingSSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u2 (Debian 12.5.0, from 2024)&#8211; Theory&#8211; Practice&#8211; TimingTowards an amd64 exploitPatches and mitigationAcknowledgmentsTimeline ========================================================================Summary======================================================================== All it takes is a 
&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26],"tags":[],"class_list":["post-57786","post","type-post","status-publish","format-standard","hentry","category-vulnerability"],"_links":{"self":[{"href":"https:\/\/afaghhosting.net\/blog\/wp-json\/wp\/v2\/posts\/57786","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/afaghhosting.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/afaghhosting.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/afaghhosting.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/afaghhosting.net\/blog\/wp-json\/wp\/v2\/comments?post=57786"}],"version-history":[{"count":0,"href":"https:\/\/afaghhosting.net\/blog\/wp-json\/wp\/v2\/posts\/57786\/revisions"}],"wp:attachment":[{"href":"https:\/\/afaghhosting.net\/blog\/wp-json\/wp\/v2\/media?parent=57786"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/afaghhosting.net\/blog\/wp-json\/wp\/v2\/categories?post=57786"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/afaghhosting.net\/blog\/wp-json\/wp\/v2\/tags?post=57786"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}