commit feb908dd62056fd757a6764e459a7440302e3799 Author: Willy Tarreau Date: Mon Jun 10 11:42:10 2013 +0200 Linux 2.6.32.61 Signed-off-by: Willy Tarreau commit 4ed3bb08f1698c62685278051c19f474fbf961d2 Author: Willy Tarreau Date: Fri Jun 7 07:11:37 2013 +0200 x86, ptrace: fix build breakage with gcc 4.7 Christoph Biedl reported that 2.6.32 does not build with gcc 4.7 on i386 : CC arch/x86/kernel/ptrace.o arch/x86/kernel/ptrace.c:1472:17: error: conflicting types for 'syscall_trace_enter' In file included from /«PKGBUILDDIR»/arch/x86/include/asm/vm86.h:130:0, from /«PKGBUILDDIR»/arch/x86/include/asm/processor.h:10, from /«PKGBUILDDIR»/arch/x86/include/asm/thread_info.h:22, from include/linux/thread_info.h:56, from include/linux/preempt.h:9, from include/linux/spinlock.h:50, from include/linux/seqlock.h:29, from include/linux/time.h:8, from include/linux/timex.h:56, from include/linux/sched.h:56, from arch/x86/kernel/ptrace.c:11: /«PKGBUILDDIR»/arch/x86/include/asm/ptrace.h:145:13: note: previous declaration of 'syscall_trace_enter' was here arch/x86/kernel/ptrace.c:1517:17: error: conflicting types for 'syscall_trace_leave' In file included from /«PKGBUILDDIR»/arch/x86/include/asm/vm86.h:130:0, from /«PKGBUILDDIR»/arch/x86/include/asm/processor.h:10, from /«PKGBUILDDIR»/arch/x86/include/asm/thread_info.h:22, from include/linux/thread_info.h:56, from include/linux/preempt.h:9, from include/linux/spinlock.h:50, from include/linux/seqlock.h:29, from include/linux/time.h:8, from include/linux/timex.h:56, from include/linux/sched.h:56, from arch/x86/kernel/ptrace.c:11: /«PKGBUILDDIR»/arch/x86/include/asm/ptrace.h:146:13: note: previous declaration of 'syscall_trace_leave' was here make[4]: *** [arch/x86/kernel/ptrace.o] Error 1 make[3]: *** [arch/x86/kernel] Error 2 make[3]: *** Waiting for unfinished jobs.... He also found that this issue did not appear in more recent kernels since this asmregparm disappeared in 3.0-rc1 with commit 1b4ac2a935 that was applied after some UM changes that we don't necessarily want in 2.6.32. Thus, the cleanest fix for older kernels is to make the declaration in ptrace.h match the one in ptrace.c by specifying asmregparm on these functions. They're only called from asm which explains why it used to work despite the inconsistency in the declaration. Reported-by: Christoph Biedl Tested-by: Christoph Biedl Signed-off-by: Willy Tarreau commit ad6bb568777206368e78b2aaa16d501d14c39be4 Author: Kashyap, Desai Date: Mon Apr 5 14:19:21 2010 +0530 mpt2sas: Send default descriptor for RAID pass through in mpt2ctl commit ebda4d38df542e1ff4747c4daadfc7da250b4fa6 upstream. RAID_SCSI_IO_PASSTHROUGH: Driver needs to be sending the default descriptor for RAID Passthru, currently its sending SCSI_IO descriptor. Signed-off-by: Kashyap Desai Signed-off-by: James Bottomley Signed-off-by: Willy Tarreau commit 0736a7172906aa0356f1ce47363c429c60214fec Author: Mathias Krause Date: Sun Apr 7 01:52:00 2013 +0000 tipc: fix info leaks via msg_name in recv_msg/recv_stream commit 60085c3d009b0df252547adb336d1ccca5ce52ec upstream. The code in set_orig_addr() does not initialize all of the members of struct sockaddr_tipc when filling the sockaddr info -- namely the union is only partly filled. This will make recv_msg() and recv_stream() -- the only users of this function -- leak kernel stack memory as the msg_name member is a local variable in net/socket.c. Additionally to that both recv_msg() and recv_stream() fail to update the msg_namelen member to 0 while otherwise returning with 0, i.e. "success". This is the case for, e.g., non-blocking sockets. This will lead to a 128 byte kernel stack leak in net/socket.c. Fix the first issue by initializing the memory of the union with memset(0). Fix the second one by setting msg_namelen to 0 early as it will be updated later if we're going to fill the msg_name member. Cc: Jon Maloy Cc: Allan Stephens Signed-off-by: Mathias Krause Signed-off-by: David S. Miller [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 21f908c9c285c0e9a1148aef6243caf6930bc158 Author: Mathias Krause Date: Sun Apr 7 01:51:53 2013 +0000 irda: Fix missing msg_namelen update in irda_recvmsg_dgram() commit 5ae94c0d2f0bed41d6718be743985d61b7f5c47d upstream. The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about irda_recvmsg_dgram() not filling the msg_name in case it was set. Cc: Samuel Ortiz Signed-off-by: Mathias Krause Signed-off-by: David S. Miller [dannf: adjusted to apply to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 3cab351d977035a0f397491d3266a15eb8e363dc Author: Mathias Krause Date: Sun Apr 7 01:51:59 2013 +0000 rose: fix info leak via msg_name in rose_recvmsg() [ Upstream commit 4a184233f21645cf0b719366210ed445d1024d72 ] The code in rose_recvmsg() does not initialize all of the members of struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info. Nor does it initialize the padding bytes of the structure inserted by the compiler for alignment. This will lead to leaking uninitialized kernel stack bytes in net/socket.c. Fix the issue by initializing the memory used for sockaddr info with memset(0). Signed-off-by: Mathias Krause Cc: Ralf Baechle Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a86c9b39fdff1740b631b94901e0ea0bb0c22789 Author: Weiping Pan Date: Mon Jul 23 10:37:48 2012 +0800 rds: set correct msg_namelen commit 06b6a1cf6e776426766298d055bb3991957d90a7 upstream Jay Fenlason (fenlason@redhat.com) found a bug, that recvfrom() on an RDS socket can return the contents of random kernel memory to userspace if it was called with a address length larger than sizeof(struct sockaddr_in). rds_recvmsg() also fails to set the addr_len paramater properly before returning, but that's just a bug. There are also a number of cases wher recvfrom() can return an entirely bogus address. Anything in rds_recvmsg() that returns a non-negative value but does not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path at the end of the while(1) loop will return up to 128 bytes of kernel memory to userspace. And I write two test programs to reproduce this bug, you will see that in rds_server, fromAddr will be overwritten and the following sock_fd will be destroyed. Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is better to make the kernel copy the real length of address to user space in such case. How to run the test programs ? I test them on 32bit x86 system, 3.5.0-rc7. 1 compile gcc -o rds_client rds_client.c gcc -o rds_server rds_server.c 2 run ./rds_server on one console 3 run ./rds_client on another console 4 you will see something like: server is waiting to receive data... old socket fd=3 server received data from client:data from client msg.msg_namelen=32 new socket fd=-1067277685 sendmsg() : Bad file descriptor /***************** rds_client.c ********************/ int main(void) { int sock_fd; struct sockaddr_in serverAddr; struct sockaddr_in toAddr; char recvBuffer[128] = "data from client"; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if (sock_fd < 0) { perror("create socket error\n"); exit(1); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4001); if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind() error\n"); close(sock_fd); exit(1); } memset(&toAddr, 0, sizeof(toAddr)); toAddr.sin_family = AF_INET; toAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); toAddr.sin_port = htons(4000); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = strlen(recvBuffer) + 1; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendto() error\n"); close(sock_fd); exit(1); } printf("client send data:%s\n", recvBuffer); memset(recvBuffer, '\0', 128); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("receive data from server:%s\n", recvBuffer); close(sock_fd); return 0; } /***************** rds_server.c ********************/ int main(void) { struct sockaddr_in fromAddr; int sock_fd; struct sockaddr_in serverAddr; unsigned int addrLen; char recvBuffer[128]; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if(sock_fd < 0) { perror("create socket error\n"); exit(0); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4000); if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind error\n"); close(sock_fd); exit(1); } printf("server is waiting to receive data...\n"); msg.msg_name = &fromAddr; /* * I add 16 to sizeof(fromAddr), ie 32, * and pay attention to the definition of fromAddr, * recvmsg() will overwrite sock_fd, * since kernel will copy 32 bytes to userspace. * * If you just use sizeof(fromAddr), it works fine. * */ msg.msg_namelen = sizeof(fromAddr) + 16; /* msg.msg_namelen = sizeof(fromAddr); */ msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; while (1) { printf("old socket fd=%d\n", sock_fd); if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("server received data from client:%s\n", recvBuffer); printf("msg.msg_namelen=%d\n", msg.msg_namelen); printf("new socket fd=%d\n", sock_fd); strcat(recvBuffer, "--data from server"); if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendmsg()\n"); close(sock_fd); exit(1); } } close(sock_fd); return 0; } Signed-off-by: Weiping Pan Signed-off-by: David S. Miller [dannf: Adjusted to apply to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 5f52780225b1707e51b67123eb13b21c500b9997 Author: Mathias Krause Date: Sun Apr 7 01:51:56 2013 +0000 llc: Fix missing msg_namelen update in llc_ui_recvmsg() [ Upstream commit c77a4b9cffb6215a15196ec499490d116dfad181 ] For stream sockets the code misses to update the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. The msg_namelen update is also missing for datagram sockets in case the socket is shutting down during receive. Fix both issues by setting msg_namelen to 0 early. It will be updated later if we're going to fill the msg_name member. Signed-off-by: Mathias Krause Cc: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3c445af8ed354897ae0d01d53b4ebc53647b7f1f Author: Mathias Krause Date: Wed Aug 15 11:31:53 2012 +0000 llc: fix info leak via getsockname() [ Upstream commit 3592aaeb80290bda0f2cf0b5456c97bfc638b192 ] The LLC code wrongly returns 0, i.e. "success", when the socket is zapped. Together with the uninitialized uaddrlen pointer argument from sys_getsockname this leads to an arbitrary memory leak of up to 128 bytes kernel stack via the getsockname() syscall. Return an error instead when the socket is zapped to prevent the info leak. Also remove the unnecessary memset(0). We don't directly write to the memory pointed by uaddr but memcpy() a local structure at the end of the function that is properly initialized. Signed-off-by: Mathias Krause Cc: Arnaldo Carvalho de Melo Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 775ad2a81199393161e6293f86fcbcd75a2ddb65 Author: Mathias Krause Date: Sun Apr 7 01:51:54 2013 +0000 iucv: Fix missing msg_namelen update in iucv_sock_recvmsg() [ Upstream commit a5598bd9c087dc0efc250a5221e5d0e6f584ee88 ] The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about iucv_sock_recvmsg() not filling the msg_name in case it was set. Signed-off-by: Mathias Krause Cc: Ursula Braun Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3822b3c57419e02ee9d8604ec9fbad832ade55ce Author: Wu Fengguang Date: Thu Aug 2 23:10:01 2012 +0000 isdnloop: fix and simplify isdnloop_init() [ Upstream commit 77f00f6324cb97cf1df6f9c4aaeea6ada23abdb2 ] Fix a buffer overflow bug by removing the revision and printk. [ 22.016214] isdnloop-ISDN-driver Rev 1.11.6.7 [ 22.097508] isdnloop: (loop0) virtual card added [ 22.174400] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff83244972 [ 22.174400] [ 22.436157] Pid: 1, comm: swapper Not tainted 3.5.0-bisect-00018-gfa8bbb1-dirty #129 [ 22.624071] Call Trace: [ 22.720558] [] ? CallcNew+0x56/0x56 [ 22.815248] [] panic+0x110/0x329 [ 22.914330] [] ? isdnloop_init+0xaf/0xb1 [ 23.014800] [] ? CallcNew+0x56/0x56 [ 23.090763] [] __stack_chk_fail+0x2b/0x30 [ 23.185748] [] isdnloop_init+0xaf/0xb1 Signed-off-by: Fengguang Wu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 92df2f6066f0c9882abef19b70def40d4198f895 Author: Mathias Krause Date: Sun Apr 7 01:51:48 2013 +0000 ax25: fix info leak via msg_name in ax25_recvmsg() [ Upstream commit ef3313e84acbf349caecae942ab3ab731471f1a1 ] When msg_namelen is non-zero the sockaddr info gets filled out, as requested, but the code fails to initialize the padding bytes of struct sockaddr_ax25 inserted by the compiler for alignment. Additionally the msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is not always filled up to this size. Both issues lead to the fact that the code will leak uninitialized kernel stack bytes in net/socket.c. Fix both issues by initializing the memory with memset(0). Signed-off-by: Mathias Krause Cc: Ralf Baechle Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 55dde8cfc4af2d57a6d8c377f3cd523d1b40e10b Author: Mathias Krause Date: Wed Aug 15 11:31:44 2012 +0000 atm: fix info leak in getsockopt(SO_ATMPVC) commit e862f1a9b7df4e8196ebec45ac62295138aa3fc2 upstream. The ATM code fails to initialize the two padding bytes of struct sockaddr_atmpvc inserted for alignment. Add an explicit memset(0) before filling the structure to avoid the info leak. Signed-off-by: Mathias Krause Signed-off-by: David S. Miller [bwh: Backported to 2.6.32: adjust context, indentation] Signed-off-by: Willy Tarreau commit dde45d396e5d0395b821a536a22b545ce9d45618 Author: Mathias Krause Date: Wed Aug 15 11:31:45 2012 +0000 atm: fix info leak via getsockname() commit 3c0c5cfdcd4d69ffc4b9c0907cec99039f30a50a upstream. The ATM code fails to initialize the two padding bytes of struct sockaddr_atmpvc inserted for alignment. Add an explicit memset(0) before filling the structure to avoid the info leak. Signed-off-by: Mathias Krause Signed-off-by: David S. Miller [bwh: Backported to 2.6.32: adjust context] Signed-off-by: Willy Tarreau commit 531539ab21be455b59226d241a3274bead7458ab Author: Mathias Krause Date: Sun Apr 7 01:51:47 2013 +0000 atm: update msg_namelen in vcc_recvmsg() [ Upstream commit 9b3e617f3df53822345a8573b6d358f6b9e5ed87 ] The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about vcc_recvmsg() not filling the msg_name in case it was set. Signed-off-by: Mathias Krause Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 75ca20885ed80079d507fddda00b805e88258094 Author: Mathias Krause Date: Wed Aug 15 11:31:56 2012 +0000 ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT) commit 2d8a041b7bfe1097af21441cb77d6af95f4f4680 upstream. If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is not set, __ip_vs_get_timeouts() does not fully initialize the structure that gets copied to userland and that for leaks up to 12 bytes of kernel stack. Add an explicit memset(0) before passing the structure to __ip_vs_get_timeouts() to avoid the info leak. Signed-off-by: Mathias Krause Cc: Wensong Zhang Cc: Simon Horman Cc: Julian Anastasov Signed-off-by: David S. Miller [bwh: Backported to 2.6.32: adjust context] Signed-off-by: Willy Tarreau commit 9df2c9ad3bafd141a5e76b3d695aed0e37fac5c4 Author: Jesper Dangaard Brouer Date: Tue Aug 28 22:05:51 2012 +0200 ipvs: IPv6 MTU checking cleanup and bugfix Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using a common helper function __mtu_check_toobig_v6(). The MTU check for tunnel mode can also use this helper as ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to skb->len. And the 'mtu' variable have been adjusted before calling helper. Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6() were missing a check for skb_is_gso(). This bug e.g. caused issues for KVM IPVS setups, where different Segmentation Offloading techniques are utilized, between guests, via the virtio driver. This resulted in very bad performance, due to the ICMPv6 "too big" messages didn't affect the sender. Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Patrick McHardy Signed-off-by: Pablo Neira Ayuso (cherry picked from commit 590e3f79a21edd2e9857ac3ced25ba6b2a491ef8) Signed-off-by: Willy Tarreau commit ec3dc8cdeadcbbac55e837d5931131ec5464f508 Author: Simon Horman Date: Tue Nov 9 10:08:49 2010 +0900 ipvs: allow transmit of GRO aggregated skbs Attempt at allowing LVS to transmit skbs of greater than MTU length that have been aggregated by GRO and can thus be deaggregated by GSO. Cc: Julian Anastasov Cc: Herbert Xu Signed-off-by: Simon Horman (cherry picked from commit 8f1b03a4c18e8f3f0801447b62330faa8ed3bb37) Signed-off-by: Willy Tarreau commit df7753cfd9be4b4fca56b38eb30375d20d3dda4f Author: Jozsef Kadlecsik Date: Tue Apr 3 22:02:01 2012 +0200 netfilter: nf_ct_ipv4: packets with wrong ihl are invalid commit 07153c6ec074257ade76a461429b567cff2b3a1e upstream. It was reported that the Linux kernel sometimes logs: klogd: [2629147.402413] kernel BUG at net / netfilter / nf_conntrack_proto_tcp.c: 447! klogd: [1072212.887368] kernel BUG at net / netfilter / nf_conntrack_proto_tcp.c: 392 ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in nf_conntrack_proto_tcp.c should catch malformed packets, so the errors at the indicated lines - TCP options parsing - should not happen. However, tcp_error() relies on the "dataoff" offset to the TCP header, calculated by ipv4_get_l4proto(). But ipv4_get_l4proto() does not check bogus ihl values in IPv4 packets, which then can slip through tcp_error() and get caught at the TCP options parsing routines. The patch fixes ipv4_get_l4proto() by invalidating packets with bogus ihl value. The patch closes netfilter bugzilla id 771. Signed-off-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso Acked-by: David Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit c023a0b45bb91bce7bf3300518592f07fcd35244 Author: Eric Dumazet Date: Mon Aug 8 23:44:00 2011 -0700 ipv6: make fragment identifications less predictable [ Backport of upstream commit 87c48fa3b4630905f98268dde838ee43626a060c ] Fernando Gont reported current IPv6 fragment identification generation was not secure, because using a very predictable system-wide generator, allowing various attacks. IPv4 uses inetpeer cache to address this problem and to get good performance. We'll use this mechanism when IPv6 inetpeer is stable enough in linux-3.1 For the time being, we use jhash on destination address to provide less predictable identifications. Also remove a spinlock and use cmpxchg() to get better SMP performance. Reported-by: Fernando Gont Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman [bwh: Backport further to 2.6.32] Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit b1a1c38d6d1860584ec828b798276fffd2b7eadc Author: Nicolas Dichtel Date: Fri Sep 3 05:13:05 2010 +0000 ipv6: discard overlapping fragment commit 70789d7052239992824628db8133de08dc78e593 upstream RFC5722 prohibits reassembling fragments when some data overlaps. Bug spotted by Zhang Zuotao . Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 9c51a966d1dc8a0dc359a4e2f94b31c7273b8f69 Author: Daniel Borkmann Date: Thu Feb 7 00:55:37 2013 +0000 net: sctp: sctp_auth_key_put: use kzfree instead of kfree [ Upstream commit 586c31f3bf04c290dc0a0de7fc91d20aa9a5ee53 ] For sensitive data like keying material, it is common practice to zero out keys before returning the memory back to the allocator. Thus, use kzfree instead of kfree. Signed-off-by: Daniel Borkmann Acked-by: Neil Horman Acked-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 57201215910fc6fb5496ca85025dfa92d7518660 Author: Daniel Borkmann Date: Fri Feb 8 03:04:35 2013 +0000 net: sctp: sctp_endpoint_free: zero out secret key data [ Upstream commit b5c37fe6e24eec194bb29d22fdd55d73bcc709bf ] On sctp_endpoint_destroy, previously used sensitive keying material should be zeroed out before the memory is returned, as we already do with e.g. auth keys when released. Signed-off-by: Daniel Borkmann Acked-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit bae3ff4ab33d03d11233c9967cfbdc76c6ad7685 Author: Daniel Borkmann Date: Fri Feb 8 03:04:34 2013 +0000 net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree [ Upstream commit 6ba542a291a5e558603ac51cda9bded347ce7627 ] In sctp_setsockopt_auth_key, we create a temporary copy of the user passed shared auth key for the endpoint or association and after internal setup, we free it right away. Since it's sensitive data, we should zero out the key before returning the memory back to the allocator. Thus, use kzfree instead of kfree, just as we do in sctp_auth_key_put(). Signed-off-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 5e4b9c85cc21931ed6543cd8b527fd34ceb40717 Author: Tommi Rantala Date: Tue Nov 27 04:01:46 2012 +0000 sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails [ Upstream commit be364c8c0f17a3dd42707b5a090b318028538eb9 ] Trinity (the syscall fuzzer) discovered a memory leak in SCTP, reproducible e.g. with the sendto() syscall by passing invalid user space pointer in the second argument: #include #include #include int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } As far as I can tell, the leak has been around since ~2003. Signed-off-by: Tommi Rantala Acked-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0e03cad4fb451922fb5371ddec1919b491ad2b08 Author: Mathias Krause Date: Sat Mar 9 05:52:21 2013 +0000 dcbnl: fix various netlink info leaks commit 29cd8ae0e1a39e239a3a7b67da1986add1199fc0 upstream. The dcb netlink interface leaks stack memory in various places: * perm_addr[] buffer is only filled at max with 12 of the 32 bytes but copied completely, * no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand, so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes for ieee_pfc structs, etc., * the same is true for CEE -- no in-kernel driver fills the whole struct, Prevent all of the above stack info leaks by properly initializing the buffers/structures involved. Signed-off-by: Mathias Krause Signed-off-by: David S. Miller [bwh: Backported to 2.6.32: no support for IEEE or CEE commands, so only deal with perm_addr] Signed-off-by: Willy Tarreau commit 480cabccafcfd68837bd4423d93e1cc2b1f170d5 Author: Paul Moore Date: Mon Mar 25 03:18:33 2013 +0000 unix: fix a race condition in unix_release() [ Upstream commit ded34e0fe8fe8c2d595bfa30626654e4b87621e0 ] As reported by Jan, and others over the past few years, there is a race condition caused by unix_release setting the sock->sk pointer to NULL before properly marking the socket as dead/orphaned. This can cause a problem with the LSM hook security_unix_may_send() if there is another socket attempting to write to this partially released socket in between when sock->sk is set to NULL and it is marked as dead/orphaned. This patch fixes this by only setting sock->sk to NULL after the socket has been marked as dead; I also take the opportunity to make unix_release_sock() a void function as it only ever returned 0/success. Dave, I think this one should go on the -stable pile. Special thanks to Jan for coming up with a reproducer for this problem. Reported-by: Jan Stancek Signed-off-by: Paul Moore Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0d99f34409d5108997afece6932e407e43d5db09 Author: Eric Dumazet Date: Thu Mar 21 17:36:09 2013 +0000 tcp: preserve ACK clocking in TSO [ Upstream commit f4541d60a449afd40448b06496dcd510f505928e ] A long standing problem with TSO is the fact that tcp_tso_should_defer() rearms the deferred timer, while it should not. Current code leads to following bad bursty behavior : 20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119 20:11:24.484337 IP B > A: . ack 263721 win 1117 20:11:24.485086 IP B > A: . ack 265241 win 1117 20:11:24.485925 IP B > A: . ack 266761 win 1117 20:11:24.486759 IP B > A: . ack 268281 win 1117 20:11:24.487594 IP B > A: . ack 269801 win 1117 20:11:24.488430 IP B > A: . ack 271321 win 1117 20:11:24.489267 IP B > A: . ack 272841 win 1117 20:11:24.490104 IP B > A: . ack 274361 win 1117 20:11:24.490939 IP B > A: . ack 275881 win 1117 20:11:24.491775 IP B > A: . ack 277401 win 1117 20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119 20:11:24.492620 IP B > A: . ack 278921 win 1117 20:11:24.493448 IP B > A: . ack 280441 win 1117 20:11:24.494286 IP B > A: . ack 281961 win 1117 20:11:24.495122 IP B > A: . ack 283481 win 1117 20:11:24.495958 IP B > A: . ack 285001 win 1117 20:11:24.496791 IP B > A: . ack 286521 win 1117 20:11:24.497628 IP B > A: . ack 288041 win 1117 20:11:24.498459 IP B > A: . ack 289561 win 1117 20:11:24.499296 IP B > A: . ack 291081 win 1117 20:11:24.500133 IP B > A: . ack 292601 win 1117 20:11:24.500970 IP B > A: . ack 294121 win 1117 20:11:24.501388 IP B > A: . ack 295641 win 1117 20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119 While the expected behavior is more like : 20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119 20:19:49.260446 IP B > A: . ack 154281 win 1212 20:19:49.261282 IP B > A: . ack 155801 win 1212 20:19:49.262125 IP B > A: . ack 157321 win 1212 20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119 20:19:49.262958 IP B > A: . ack 158841 win 1212 20:19:49.263795 IP B > A: . ack 160361 win 1212 20:19:49.264628 IP B > A: . ack 161881 win 1212 20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119 20:19:49.265465 IP B > A: . ack 163401 win 1212 20:19:49.265886 IP B > A: . ack 164921 win 1212 20:19:49.266722 IP B > A: . ack 166441 win 1212 20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119 20:19:49.267559 IP B > A: . ack 167961 win 1212 20:19:49.268394 IP B > A: . ack 169481 win 1212 20:19:49.269232 IP B > A: . ack 171001 win 1212 20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119 Signed-off-by: Eric Dumazet Cc: Yuchung Cheng Cc: Van Jacobson Cc: Neal Cardwell Cc: Nandita Dukkipati Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit ef32163f6120ba913ab768a6a6a36dd7a34d15b6 Author: Eric Dumazet Date: Sun Jan 6 18:21:49 2013 +0000 tcp: fix MSG_SENDPAGE_NOTLAST logic [ Upstream commit ae62ca7b03217be5e74759dc6d7698c95df498b3 ] commit 35f9c09fe9c72e (tcp: tcp_sendpages() should call tcp_push() once) added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all frags but the last one for a splice() call. The condition used to set the flag in pipe_to_sendpage() relied on splice() user passing the exact number of bytes present in the pipe, or a smaller one. But some programs pass an arbitrary high value, and the test fails. The effect of this bug is a lack of tcp_push() at the end of a splice(pipe -> socket) call, and possibly very slow or erratic TCP sessions. We should both test sd->total_len and fact that another fragment is in the pipe (pipe->nrbufs > 1) Many thanks to Willy for providing very clear bug report, bisection and test programs. Reported-by: Willy Tarreau Bisected-by: Willy Tarreau Tested-by: Willy Tarreau Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 2ed8840ba140a6c4c50f32521a3f551be20d9883 Author: Eric Dumazet Date: Tue Apr 24 22:12:06 2012 -0400 tcp: allow splice() to build full TSO packets [ This combines upstream commit 2f53384424251c06038ae612e56231b96ab610ee and the follow-on bug fix commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 ] vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096) The call to tcp_push() at the end of do_tcp_sendpages() forces an immediate xmit when pipe is not already filled, and tso_fragment() try to split these skb to MSS multiples. 4096 bytes are usually split in a skb with 2 MSS, and a remaining sub-mss skb (assuming MTU=1500) This makes slow start suboptimal because many small frames are sent to qdisc/driver layers instead of big ones (constrained by cwnd and packets in flight of course) In fact, applications using sendmsg() (adding an additional memory copy) instead of vmsplice()/splice()/sendfile() are a bit faster because of this anomaly, especially if serving small files in environments with large initial [c]wnd. Call tcp_push() only if MSG_MORE is not set in the flags parameter. This bit is automatically provided by splice() internals but for the last page, or on all pages if user specified SPLICE_F_MORE splice() flag. In some workloads, this can reduce number of sent logical packets by an order of magnitude, making zero-copy TCP actually faster than one-copy :) Reported-by: Tom Herbert Cc: Nandita Dukkipati Cc: Neal Cardwell Cc: Tom Herbert Cc: Yuchung Cheng Cc: H.K. Jerry Chu Cc: Maciej Å»enczykowski Cc: Mahesh Bandewar Cc: Ilpo Järvinen Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit b8710128e201bc0d628d9644454585925a6e59fd Author: Eric Dumazet Date: Thu Apr 21 09:45:37 2011 +0000 inet: add RCU protection to inet->opt commit f6d8bd051c391c1c0458a30b2a7abcd939329259 upstream. We lack proper synchronization to manipulate inet->opt ip_options Problem is ip_make_skb() calls ip_setup_cork() and ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options), without any protection against another thread manipulating inet->opt. Another thread can change inet->opt pointer and free old one under us. Use RCU to protect inet->opt (changed to inet->inet_opt). Instead of handling atomic refcounts, just copy ip_options when necessary, to avoid cache line dirtying. We cant insert an rcu_head in struct ip_options since its included in skb->cb[], so this patch is large because I had to introduce a new ip_options_rcu structure. Signed-off-by: Eric Dumazet Cc: Herbert Xu Signed-off-by: David S. Miller [dannf/bwh: backported to Debian's 2.6.32] Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit 66096e8915ce1ff5daa23362dc890651b160be7d Author: Mathias Krause Date: Wed Aug 15 11:31:57 2012 +0000 net: fix info leak in compat dev_ifconf() commit 43da5f2e0d0c69ded3d51907d9552310a6b545e8 upstream. The implementation of dev_ifconf() for the compat ioctl interface uses an intermediate ifc structure allocated in userland for the duration of the syscall. Though, it fails to initialize the padding bytes inserted for alignment and that for leaks four bytes of kernel stack. Add an explicit memset(0) before filling the structure to avoid the info leak. Signed-off-by: Mathias Krause Signed-off-by: David S. Miller [bwh: Backported to 2.6.32: adjust filename, context] Signed-off-by: Willy Tarreau commit f2e244fabae264798ac7980a3d3056ba1684a187 Author: Eric Dumazet Date: Mon Sep 24 07:00:11 2012 +0000 net: guard tcp_set_keepalive() to tcp sockets [ Upstream commit 3e10986d1d698140747fcfc2761ec9cb64c1d582 ] Its possible to use RAW sockets to get a crash in tcp_set_keepalive() / sk_reset_timer() Fix is to make sure socket is a SOCK_STREAM one. Reported-by: Dave Jones Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 81821315da86f301a126c3fdcdb542f77c3d7fe7 Author: Jesper Dangaard Brouer Date: Wed Oct 31 02:45:32 2012 +0000 net: fix divide by zero in tcp algorithm illinois commit 8f363b77ee4fbf7c3bbcf5ec2c5ca482d396d664 upstream Reading TCP stats when using TCP Illinois congestion control algorithm can cause a divide by zero kernel oops. The division by zero occur in tcp_illinois_info() at: do_div(t, ca->cnt_rtt); where ca->cnt_rtt can become zero (when rtt_reset is called) Steps to Reproduce: 1. Register tcp_illinois: # sysctl -w net.ipv4.tcp_congestion_control=illinois 2. Monitor internal TCP information via command "ss -i" # watch -d ss -i 3. Establish new TCP conn to machine Either it fails at the initial conn, or else it needs to wait for a loss or a reset. This is only related to reading stats. The function avg_delay() also performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its calling point in update_params(). Thus, simply fix tcp_illinois_info(). Function tcp_illinois_info() / get_info() is called without socket lock. Thus, eliminate any race condition on ca->cnt_rtt by using a local stack variable. Simply reuse info.tcpv_rttcnt, as its already set to ca->cnt_rtt. Function avg_delay() is not affected by this race condition, as its called with the socket lock. Cc: Petr Matousek Signed-off-by: Jesper Dangaard Brouer Acked-by: Eric Dumazet Acked-by: Stephen Hemminger Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 495ba7ee74f9cd65a0db9bcf43ccb90af79f1b93 Author: Cong Wang Date: Mon Jan 7 21:17:00 2013 +0000 net: prevent setting ttl=0 via IP_TTL [ Upstream commit c9be4a5c49cf51cc70a993f004c5bb30067a65ce ] A regression is introduced by the following commit: commit 4d52cfbef6266092d535237ba5a4b981458ab171 Author: Eric Dumazet Date: Tue Jun 2 00:42:16 2009 -0700 net: ipv4/ip_sockglue.c cleanups Pure cleanups but it is not a pure cleanup... - if (val != -1 && (val < 1 || val>255)) + if (val != -1 && (val < 0 || val > 255)) Since there is no reason provided to allow ttl=0, change it back. Reported-by: nitin padalia Cc: nitin padalia Cc: Eric Dumazet Cc: David S. Miller Signed-off-by: Cong Wang Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 6e9a4199eb6df7a4444cf3a18dbfba51a021e4bb Author: Stefan Hasko Date: Fri Dec 21 15:04:59 2012 +0000 net: sched: integer overflow fix [ Upstream commit d2fe85da52e89b8012ffad010ef352a964725d5f ] Fixed integer overflow in function htb_dequeue Signed-off-by: Stefan Hasko Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0b42ca0de11329f2e7330273d6b029a426a5dc17 Author: Hiroaki SHIMODA Date: Fri Aug 3 19:57:52 2012 +0900 net_sched: gact: Fix potential panic in tcf_gact(). [ Upstream commit 696ecdc10622d86541f2e35cc16e15b6b3b1b67e ] gact_rand array is accessed by gact->tcfg_ptype whose value is assumed to less than MAX_RAND, but any range checks are not performed. So add a check in tcf_gact_init(). And in tcf_gact(), we can reduce a branch. Signed-off-by: Hiroaki SHIMODA Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 32ed4a99bbfca22a44a843564598c275ffa6b210 Author: Benjamin LaHaise Date: Fri Oct 19 15:21:01 2012 -0400 ipv4: check rt_genid in dst_check commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92 Author: Timo Teräs Date: Thu Mar 18 23:20:20 2010 +0000 ipv4: check rt_genid in dst_check Xfrm_dst keeps a reference to ipv4 rtable entries on each cached bundle. The only way to renew xfrm_dst when the underlying route has changed, is to implement dst_check for this. This is what ipv6 side does too. The problems started after 87c1e12b5eeb7b30b4b41291bef8e0b41fc3dde9 ("ipsec: Fix bogus bundle flowi") which fixed a bug causing xfrm_dst to not get reused, until that all lookups always generated new xfrm_dst with new route reference and path mtu worked. But after the fix, the old routes started to get reused even after they were expired causing pmtu to break (well it would occationally work if the rtable gc had run recently and marked the route obsolete causing dst_check to get called). Signed-off-by: Timo Teras Acked-by: Herbert Xu Signed-off-by: David S. Miller This commit is based on the above, with the addition of verifying blackhole routes in the same manner. Fixing the issue with blackhole routes as it was accomplished in mainline would require pulling in a lot more code, and people were not interested in pulling in all of the dependencies given the much higher risk of trying to select the right subset of changes to include. The addition of the single line of "dst->obsolete = -1;" in ipv4_dst_blackhole() was much easier to verify, and is in the spirit of the patch in question. This is the minimal set of changes to fix the bug in question. A test case is available here : http://marc.info/?l=linux-netdev&m=135015076708950&w=2 Signed-off-by: Benjamin LaHaise Signed-off-by: Willy Tarreau commit 8500db68e0a884c16b9c4ca5fcd6dabfbdcf7522 Author: Hillf Danton Date: Fri Dec 10 18:54:11 2010 +0000 bonding: Fix slave selection bug. The returned slave is incorrect, if the net device under check is not charged yet by the master. Signed-off-by: Hillf Danton Signed-off-by: David S. Miller (cherry picked from commit af3e5bd5f650163c2e12297f572910a1af1b8236) Signed-off-by: Willy Tarreau commit 48bff07f50471e3a64250e4cce39529999230dad Author: Stephen Hemminger Date: Mon Feb 11 08:22:22 2013 +0000 bridge: set priority of STP packets Spanning Tree Protocol packets should have always been marked as control packets, this causes them to get queued in the high prirority FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge gets overloaded and can't communicate. This is a long-standing bug back to the first versions of Linux bridge. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller (cherry picked from commit 547b4e718115eea74087e28d7fa70aec619200db) Signed-off-by: Willy Tarreau commit 6f96a94d976a4fd37ee1617da8a3767be5494932 Author: danborkmann@iogearbox.net Date: Fri Aug 10 22:48:54 2012 +0000 af_packet: remove BUG statement in tpacket_destruct_skb [ Upstream commit 7f5c3e3a80e6654cf48dfba7cf94f88c6b505467 ] Here's a quote of the comment about the BUG macro from asm-generic/bug.h: Don't use BUG() or BUG_ON() unless there's really no way out; one example might be detecting data structure corruption in the middle of an operation that can't be backed out of. If the (sub)system can somehow continue operating, perhaps with reduced functionality, it's probably not BUG-worthy. If you're tempted to BUG(), think again: is completely giving up really the *only* solution? There are usually better options, where users don't need to reboot ASAP and can mostly shut down cleanly. In our case, the status flag of a ring buffer slot is managed from both sides, the kernel space and the user space. This means that even though the kernel side might work as expected, the user space screws up and changes this flag right between the send(2) is triggered when the flag is changed to TP_STATUS_SENDING and a given skb is destructed after some time. Then, this will hit the BUG macro. As David suggested, the best solution is to simply remove this statement since it cannot be used for kernel side internal consistency checks. I've tested it and the system still behaves /stable/ in this case, so in accordance with the above comment, we should rather remove it. Signed-off-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3003ed6412e5a587eaa670cdadb34ab978c507f9 Author: Eric Dumazet Date: Thu Jan 10 15:26:34 2013 -0800 softirq: reduce latencies In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100. This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles. This patch changes the fallback to ksoftirqd condition to : - A time limit of 2 ms. - need_resched() being set on current task When one of this condition is met, we wakeup ksoftirqd for further softirq processing if we still have pending softirqs. Using need_resched() as the only condition can trigger RCU stalls, as we can keep BH disabled for too long. I ran several benchmarks and got no significant difference in throughput, but a very significant reduction of latencies (one order of magnitude) : In following bench, 200 antagonist "netperf -t TCP_RR" are started in background, using all available cpus. Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC IRQ (hard+soft) Before patch : RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind RT_LATENCY=550110.424 MIN_LATENCY=146858 MAX_LATENCY=997109 P50_LATENCY=305000 P90_LATENCY=550000 P99_LATENCY=710000 MEAN_LATENCY=376989.12 STDDEV_LATENCY=184046.92 After patch : RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind RT_LATENCY=40545.492 MIN_LATENCY=9834 MAX_LATENCY=78366 P50_LATENCY=33583 P90_LATENCY=59000 P99_LATENCY=69000 MEAN_LATENCY=38364.67 STDDEV_LATENCY=12865.26 Signed-off-by: Eric Dumazet Cc: David Miller Cc: Tom Herbert Cc: Ben Hutchings Signed-off-by: David S. Miller (cherry picked from commit c10d73671ad30f54692f7f69f0e09e75d3a8926a) Signed-off-by: Willy Tarreau commit e072f1c62ab6b74be00105c8db56eff6b760d42e Author: Eric Dumazet Date: Tue Mar 5 07:15:13 2013 +0000 net: reduce net_rx_action() latency to 2 HZ We should use time_after_eq() to get maximum latency of two ticks, instead of three. Bug added in commit 24f8b2385 (net: increase receive packet quantum) Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit d1f41b67ff7735193bc8b418b98ac99a448833e2) Signed-off-by: Willy Tarreau commit b680135dfdf54a1d59efca19c432412f4ed481ea Author: Alexey Khoroshilov Date: Wed Aug 8 00:33:25 2012 +0000 net/core: Fix potential memory leak in dev_set_alias() [ Upstream commit 7364e445f62825758fa61195d237a5b8ecdd06ec ] Do not leak memory by updating pointer with potentially NULL realloc return value. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit be1a41d754084db22bb7599edd140dff61853ced Author: J. Bruce Fields Date: Tue Dec 4 18:25:10 2012 -0500 nfsd4: fix oops on unusual readlike compound commit d5f50b0c290431c65377c4afa1c764e2c3fe5305 upstream. If the argument and reply together exceed the maximum payload size, then a reply with a read-like operation can overlow the rq_pages array. Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit fc9e795a20735726400c834f3290806139f2c957 Author: Trond Myklebust Date: Mon Dec 20 21:19:26 2010 +0000 kernel panic when mount NFSv4 On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote: > Hi, > > When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic > at NFS client's __rpc_create_common function. > > The panic place is: > rpc_mkpipe > __rpc_lookup_create() <=== find pipefile *idmap* > __rpc_mkpipe() <=== pipefile is *idmap* > __rpc_create_common() > ****** BUG_ON(!d_unhashed(dentry)); ****** *panic* > > It means that the dentry's d_flags have be set DCACHE_UNHASHED, > but it should not be set here. > > Is someone known this bug? or give me some idea? > > A reproduce program is append, but it can't reproduce the bug every time. > the export is: "/nfsroot *(rw,no_root_squash,fsid=0,insecure)" > > And the panic message is append. > > ============================================================================ > #!/bin/sh > > LOOPTOTAL=768 > LOOPCOUNT=0 > ret=0 > > while [ $LOOPCOUNT -ne $LOOPTOTAL ] > do > ((LOOPCOUNT += 1)) > service nfs restart > /usr/sbin/rpc.idmapd > mount -t nfs4 127.0.0.1:/ /mnt|| return 1; > ls -l /var/lib/nfs/rpc_pipefs/nfs/*/ > umount /mnt > echo $LOOPCOUNT > done > > =============================================================================== > Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b > 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20 > de ee f0 c7 44 24 04 56 ea > EIP:[] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28 > ---[ end trace 8f5606cd08928ed2]--- > Kernel panic - not syncing: Fatal exception > Pid:7131, comm: mount.nfs4 Tainted: G D -------------------2.6.32 #1 > Call Trace: > [] ? panic+0x42/0xed > [] ? oops_end+0xbc/0xd0 > [] ? do_invalid_op+0x0/0x90 > [] ? do_invalid_op+0x7f/0x90 > [] ? __rpc_create_common+0x8a/0xc0[sunrpc] > [] ? rpc_free_task+0x33/0x70[sunrpc] > [] ? prc_call_sync+0x48/0x60[sunrpc] > [] ? rpc_ping+0x4e/0x60[sunrpc] > [] ? rpc_create+0x38f/0x4f0[sunrpc] > [] ? error_code+0x73/0x78 > [] ? __rpc_create_common+0x8a/0xc0[sunrpc] > [] ? d_lookup+0x2a/0x40 > [] ? rpc_mkpipe+0x111/0x1b0[sunrpc] > [] ? nfs_create_rpc_client+0xb4/0xf0[nfs] > [] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs] > [] ? nfs_idmap_new+0x7b/0x140[nfs] > [] ? strlcpy+0x3a/0x60 > [] ? nfs4_set_client+0xea/0x2b0[nfs] > [] ? nfs4_create_server+0xac/0x1b0[nfs] > [] ? krealloc+0x40/0x50 > [] ? nfs4_remote_get_sb+0x6b/0x250[nfs] > [] ? kstrdup+0x3c/0x60 > [] ? vfs_kern_mount+0x69/0x170 > [] ? nfs_do_root_mount+0x6c/0xa0[nfs] > [] ? nfs4_try_mount+0x37/0xa0[nfs] > [] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs] > [] ? nfs4_get_sb+0x92/0x2f0 > [] ? vfs_kern_mount+0x69/0x170 > [] ? get_fs_type+0x32/0xb0 > [] ? do_kern_mount+0x3f/0xe0 > [] ? do_mount+0x2ef/0x740 > [] ? copy_mount_options+0xb0/0x120 > [] ? sys_mount+0x6e/0xa0 Hi, Does the following patch fix the problem? Cheers Trond -------------------------- SUNRPC: Fix a BUG in __rpc_create_common From: Trond Myklebust Mi Jinlong reports: When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic at NFS client's __rpc_create_common function. The panic place is: rpc_mkpipe __rpc_lookup_create() <=== find pipefile *idmap* __rpc_mkpipe() <=== pipefile is *idmap* __rpc_create_common() ****** BUG_ON(!d_unhashed(dentry)); ****** *panic* The test is wrong: we can find ourselves with a hashed negative dentry here if the idmapper tried to look up the file before we got round to creating it. Just replace the BUG_ON() with a d_drop(dentry). [2.6.32 background info from Jonathan below] > Hi Willy et al, > > Please consider > > beb0f0a9fba1 kernel panic when mount NFSv4, 2010-12-20 > > for application to kernel.org's 2.6.32.y and 2.6.34.y trees. The > patch was applied upstream during the 2.6.38 merge window, so newer > kernels don't need it. > > (Context: .) Tom Downes (cc-ed) > experienced the bug on a Debian kernel close to 2.6.32.58 and > confirmed that the patch doesn't seem to hurt. > > The patch is part of Fedora 13's 2.6.34-based and Fedora 14's > 2.6.35-based kernels[1]. It was also included in the RHEL kernel at > some point between 2.6.32-71.29.1.el6 and 2.6.32-131.0.15.el6[2]. > > Thoughts of all kinds welcome, as always. > > Regards, > Jonathan > > [1] https://bugzilla.redhat.com/673207 > [2] https://oss.oracle.com/git/?p=redpatch.git;a=commit;h=8028cccdc4b1 Reported-by: Mi Jinlong Signed-off-by: Trond Myklebust (cherry picked from commit beb0f0a9fba1fa98b378329a9a5b0a73f25097ae) Cc: Jonathan Nieder Signed-off-by: Willy Tarreau commit eeb72f5d343f362aaf16e7ec037026c963c9112b Author: Eric Sandeen Date: Sat Mar 9 15:18:39 2013 +0000 btrfs: use rcu_barrier() to wait for bdev puts at unmount commit bc178622d40d87e75abc131007342429c9b03351 upstream. Doing this would reliably fail with -EBUSY for me: # mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2 ... unable to open /dev/sdb2: Device or resource busy because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it. Using systemtap to track bdev gets & puts shows a kworker thread doing a blkdev put after mkfs attempts a get; this is left over from the unmount path: btrfs_close_devices __btrfs_close_devices call_rcu(&device->rcu, free_device); free_device INIT_WORK(&device->rcu_work, __free_device); schedule_work(&device->rcu_work); so unmount might complete before __free_device fires & does its blkdev_put. Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait until all blkdev_put()s are done, and the device is truly free once unmount completes. Signed-off-by: Eric Sandeen Signed-off-by: Josef Bacik Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit c3ff29a27cb6acef072eb552ca6aaefeee2839d3 Author: Vyacheslav Dubeyko Date: Wed Apr 17 15:58:33 2013 -0700 hfsplus: fix potential overflow in hfsplus_file_truncate() commit 12f267a20aecf8b84a2a9069b9011f1661c779b4 upstream. Change a u32 to loff_t hfsplus_file_truncate(). Signed-off-by: Vyacheslav Dubeyko Cc: Christoph Hellwig Cc: Al Viro Cc: Hin-Tak Leung Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9ecb77213c3f5393deaf3d9c942d25ce31f80074 Author: Alan Stern Date: Thu Nov 17 16:42:19 2011 -0500 NLS: improve UTF8 -> UTF16 string conversion routine commit 0720a06a7518c9d0c0125bd5d1f3b6264c55c3dd upstream. The utf8s_to_utf16s conversion routine needs to be improved. Unlike its utf16s_to_utf8s sibling, it doesn't accept arguments specifying the maximum length of the output buffer or the endianness of its 16-bit output. This patch (as1501) adds the two missing arguments, and adjusts the only two places in the kernel where the function is called. A follow-on patch will add a third caller that does utilize the new capabilities. The two conversion routines are still annoyingly inconsistent in the way they handle invalid byte combinations. But that's a subject for a different patch. Signed-off-by: Alan Stern CC: Clemens Ladisch Signed-off-by: Greg Kroah-Hartman [bwh: Bakckported to 2.6.32: drop Hyper-V change] Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit c71109d6d398e60fbe55102ae752f1c036cce27b Author: Kevin Dankwardt Date: Wed Feb 10 23:43:40 2010 +0900 fat: Fix stat->f_namelen commit eeb5b4ae81f4a750355fa0c15f4fea22fdf83be1 upstream. I found that the length of a file name when created cannot exceed 255 characters, yet, pathconf(), via statfs(), returns the maximum as 260. Signed-off-by: Kevin Dankwardt Signed-off-by: OGAWA Hirofumi Signed-off-by: Willy Tarreau commit cf2e392e665040845ab7877951a2add3ffc8d7e4 Author: Mathias Krause Date: Thu Jul 12 08:46:54 2012 +0200 isofs: avoid info leak on export commit fe685aabf7c8c9f138e5ea900954d295bf229175 upstream. For type 1 the parent_offset member in struct isofs_fid gets copied uninitialized to userland. Fix this by initializing it to 0. Signed-off-by: Mathias Krause Signed-off-by: Jan Kara Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 1b407e59a7c1c97c71977bbe03ce448de8b50877 Author: Cong Ding Date: Tue Jan 22 19:20:58 2013 -0500 fs/cifs/cifs_dfs_ref.c: fix potential memory leakage commit 10b8c7dff5d3633b69e77f57d404dab54ead3787 upstream. When it goes to error through line 144, the memory allocated to *devname is not freed, and the caller doesn't free it either in line 250. So we free the memroy of *devname in function cifs_compose_mount_options() when it goes to error. Signed-off-by: Cong Ding Reviewed-by: Jeff Layton Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a27c8ce1ae0def6866b2ce9b427df53aa8678c27 Author: Jan Kara Date: Tue Feb 5 13:59:56 2013 +0100 udf: Fix bitmap overflow on large filesystems with small block size commit 89b1f39eb4189de745fae554b0d614d87c8d5c63 upstream. For large UDF filesystems with 512-byte blocks the number of necessary bitmap blocks is larger than 2^16 so s_nr_groups in udf_bitmap overflows (the number will overflow for filesystems larger than 128 GB with 512-byte blocks). That results in ENOSPC errors despite the filesystem has plenty of free space. Fix the problem by changing s_nr_groups' type to 'int'. That is enough even for filesystems 2^32 blocks (UDF maximum) and 512-byte blocksize. Reported-and-tested-by: v10lator@myway.de Signed-off-by: Jan Kara Cc: Jim Trigg Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 78d630bd79019250a11ea4aae6ab4ecadf1da68e Author: Mathias Krause Date: Thu Jul 12 08:46:55 2012 +0200 udf: avoid info leak on export commit 0143fc5e9f6f5aad4764801015bc8d4b4a278200 upstream. For type 0x51 the udf.parent_partref member in struct fid gets copied uninitialized to userland. Fix this by initializing it to 0. Signed-off-by: Mathias Krause Signed-off-by: Jan Kara Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 5ece3559d5d7264dc62ef35df87c92e5c342fd24 Author: Namjae Jeon Date: Wed Oct 10 00:08:56 2012 +0900 udf: fix memory leak while allocating blocks during write commit 2fb7d99d0de3fd8ae869f35ab682581d8455887a upstream. Need to brelse the buffer_head stored in cur_epos and next_epos. Signed-off-by: Namjae Jeon Signed-off-by: Ashish Sangwan Signed-off-by: Jan Kara Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 5f591370f5a4614a2575e03acc9adc84f5dd794a Author: Theodore Ts'o Date: Thu Dec 27 01:42:50 2012 -0500 ext4: avoid hang when mounting non-journal filesystems with orphan list commit 0e9a9a1ad619e7e987815d20262d36a2f95717ca upstream. When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) Signed-off-by: "Theodore Ts'o" Reviewed-by: Zheng Liu Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau commit 9b3c0cff40f10b6d6fd852e7313719a168a63d7e Author: Anatol Pomozov Date: Tue Sep 18 13:38:59 2012 -0400 ext4: make orphan functions be no-op in no-journal mode commit c9b92530a723ac5ef8e352885a1862b18f31b2f5 upstream. Instead of checking whether the handle is valid, we check if journal is enabled. This avoids taking the s_orphan_lock mutex in all cases when there is no journal in use, including the error paths where ext4_orphan_del() is called with a handle set to NULL. Signed-off-by: Anatol Pomozov Signed-off-by: "Theodore Ts'o" Signed-off-by: Willy Tarreau commit 82c1ce5414718cfee2653c28fe7b9aa268fa0e48 Author: Jamie Iles Date: Thu Feb 21 10:18:51 2013 +0000 CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure [dannf: backported to Debian's 2.6.32] According to Ben : > The original upstream commits were c278531d39f3158bfee93dc67da0b77e09776de2, > 60d4616f3dc63371b3dc367e5e88fd4b4f037f65 and (most importantly) > dee1f973ca341c266229faa5a1a5bb268bed3531 by Dmitry Monakhov > . They were backported into the RHEL 6 kernel by > Lukas Czerner, according to its changelog. Dann got this version from > Oracle's redpatch repository, where, if I understand rightly, Jamie Iles > attempted to regenerate Lukas's patch(es). Cc: Ben Hutchings Signed-off-by: Jamie Iles Signed-off-by: Willy Tarreau commit 83c866120f3ae54640dd1fb37efb92c49cf5476d Author: Lachlan McIlroy Date: Sun May 5 23:10:00 2013 -0400 ext4: limit group search loop for non-extent files commit e6155736ad76b2070652745f9e54cdea3f0d8567 upstream. In the case where we are allocating for a non-extent file, we must limit the groups we allocate from to those below 2^32 blocks, and ext4_mb_regular_allocator() attempts to do this initially by putting a cap on ngroups for the subsequent search loop. However, the initial target group comes in from the allocation context (ac), and it may already be beyond the artificially limited ngroups. In this case, the limit if (group == ngroups) group = 0; at the top of the loop is never true, and the loop will run away. Catch this case inside the loop and reset the search to start at group 0. [sandeen@redhat.com: add commit msg & comments] Signed-off-by: Lachlan McIlroy Signed-off-by: Eric Sandeen Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 75de899bc295e9f4d14a359fe59c55bc9fe6ac27 Author: Niu Yawei Date: Fri Feb 1 21:31:27 2013 -0500 ext4: fix race in ext4_mb_add_n_trim() commit f1167009711032b0d747ec89a632a626c901a1ad upstream. In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when changing the lg_prealloc_list. Signed-off-by: Niu Yawei Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a89571bfb8ffbe9ec54e9fa82682f1a9eca9a5bf Author: Theodore Ts'o Date: Thu Dec 27 01:42:48 2012 -0500 ext4: lock i_mutex when truncating orphan inodes commit 721e3eba21e43532e438652dd8f1fcdfce3187e7 upstream. Commit c278531d39 added a warning when ext4_flush_unwritten_io() is called without i_mutex being taken. It had previously not been taken during orphan cleanup since races weren't possible at that point in the mount process, but as a result of this c278531d39, we will now see a kernel WARN_ON in this case. Take the i_mutex in ext4_orphan_cleanup() to suppress this warning. Reported-by: Alexander Beregalov Signed-off-by: "Theodore Ts'o" Reviewed-by: Zheng Liu Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 55efcf30c42f266bc3303d7151ad328674bd51f2 Author: Jan Kara Date: Wed Sep 26 21:52:20 2012 -0400 ext4: fix fdatasync() for files with only i_size changes commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 4ae5f4a212db4c83983a2672a3c6c9e6a7a05559 Author: Bernd Schubert Date: Wed Sep 26 21:24:57 2012 -0400 ext4: always set i_op in ext4_mknod() commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream. ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR to mask those methods. And ext4_iget also always sets it, so there is an inconsistency. Signed-off-by: Bernd Schubert Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e5180af75740e7239c37e1f63d5a67535209dae5 Author: Dmitry Monakhov Date: Wed Sep 26 12:32:54 2012 -0400 ext4: online defrag is not supported for journaled files commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream. Proper block swap for inodes with full journaling enabled is truly non obvious task. In order to be on a safe side let's explicitly disable it for now. Signed-off-by: Dmitry Monakhov Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 5d0735ba9c2f8a2e2b03b33bfa1de1840209d2e9 Author: Eugene Shatokhin Date: Thu Nov 8 15:11:11 2012 -0500 ext4: fix memory leak in ext4_xattr_set_acl()'s error path commit 24ec19b0ae83a385ad9c55520716da671274b96c upstream. In ext4_xattr_set_acl(), if ext4_journal_start() returns an error, posix_acl_release() will not be called for 'acl' which may result in a memory leak. This patch fixes that. Reviewed-by: Lukas Czerner Signed-off-by: Eugene Shatokhin Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 352585523cea4cad8a77c441f7e50ed18a99a466 Author: Lukas Czerner Date: Mon Jun 6 00:05:17 2011 -0400 ext4: Fix max file size and logical block counting of extent format file commit f17722f917b2f21497deb6edc62fb1683daa08e6 upstream Kazuya Mio reported that he was able to hit BUG_ON(next == lblock) in ext4_ext_put_gap_in_cache() while creating a sparse file in extent format and fill the tail of file up to its end. We will hit the BUG_ON when we write the last block (2^32-1) into the sparse file. The root cause of the problem lies in the fact that we specifically set s_maxbytes so that block at s_maxbytes fit into on-disk extent format, which is 32 bit long. However, we are not storing start and end block number, but rather start block number and length in blocks. It means that in order to cover extent from 0 to EXT_MAX_BLOCK we need EXT_MAX_BLOCK+1 to fit into len (because we counting block 0 as well) - and it does not. The only way to fix it without changing the meaning of the struct ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes by one fs block so we can cover the whole extent we can get by the on-disk extent format. Also in many places EXT_MAX_BLOCK is used as length instead of maximum logical block number as the name suggests, it is all a bit messy. So this commit renames it to EXT_MAX_BLOCKS and change its usage in some places to actually be maximum number of blocks in the extent. The bug which this commit fixes can be reproduced as follows: dd if=/dev/zero of=/mnt/mp1/file bs= count=1 seek=$((2**32-2)) sync dd if=/dev/zero of=/mnt/mp1/file bs= count=1 seek=$((2**32-1)) Reported-by: Kazuya Mio Signed-off-by: Lukas Czerner Signed-off-by: "Theodore Ts'o" [dannf: Applied the backport from RHEL6 to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit efa760eb2d84b4b1af836fa858952f6dd8c9f126 Author: Allison Henderson Date: Sun May 15 00:19:41 2011 -0400 ext4: don't dereference null pointer when make_indexed_dir() fails Fix for a null pointer bug found while running punch hole tests Signed-off-by: Allison Henderson Signed-off-by: "Theodore Ts'o" (cherry picked from commit 6976a6f2acde2b0443cd64f1d08af90630e4ce81) Signed-off-by: Willy Tarreau commit 04b7278098b06c517c5ffcbbda97cd4bdd5cf881 Author: Jan Kara Date: Tue May 3 11:05:55 2011 -0400 ext4: Fix fs corruption when make_indexed_dir() fails When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated block for index tree root, we did not properly mark all changed buffers dirty. This lead to only some of these buffers being written out and thus effectively corrupting the directory. Fix the issue by marking all changed data dirty even in the error failure case. Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" (cherry picked from commit 7ad8e4e6ae2a7c95445ee1715b1714106fb95037) Signed-off-by: Willy Tarreau commit 35ea3124c4fc9589d2d2326f81d17957a02a0356 Author: Jan Kara Date: Fri Nov 23 14:03:04 2012 +0100 jbd: Fix lock ordering bug in journal_unmap_buffer() commit 25389bb207987b5774182f763b9fb65ff08761c8 upstream. Commit 09e05d48 introduced a wait for transaction commit into journal_unmap_buffer() in the case we are truncating a buffer undergoing commit in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly we forgot to drop buffer lock before waiting for transaction commit and thus deadlock is possible when kjournald wants to lock the buffer. Fix the problem by dropping the buffer lock before waiting for transaction commit. Since we are still holding page lock (and that is OK), buffer cannot disappear under us. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d940878632e63d7f0c2af5e4ebfcf5135c48dcfb Author: Jan Kara Date: Wed Jul 11 23:16:25 2012 +0200 jbd: Fix assertion failure in commit code due to lacking transaction credits ext3 users of data=journal mode with blocksize < pagesize were occasionally hitting assertion failure in journal_commit_transaction() checking whether the transaction has at least as many credits reserved as buffers attached. The core of the problem is that when a file gets truncated, buffers that still need checkpointing or that are attached to the committing transaction are left with buffer_mapped set. When this happens to buffers beyond i_size attached to a page stradding i_size, subsequent write extending the file will see these buffers and as they are mapped (but underlying blocks were freed) things go awry from here. The assertion failure just coincidentally (and in this case luckily as we would start corrupting filesystem) triggers due to journal_head not being properly cleaned up as well. Under some rare circumstances this bug could even hit data=ordered mode users. There the assertion won't trigger and we would end up corrupting the filesystem. We fix the problem by unmapping buffers if possible (in lots of cases we just need a buffer attached to a transaction as a place holder but it must not be written out anyway). And in one case, we just have to bite the bullet and wait for transaction commit to finish. Reviewed-by: Josef Bacik Signed-off-by: Jan Kara (cherry picked from commit 09e05d4805e6c524c1af74e524e5d0528bb3fef3) Signed-off-by: Willy Tarreau commit 0b48676ff9d2fae5cacf2569f34a0bfbdd02754c Author: Jan Kara Date: Tue Feb 16 20:37:12 2010 +0100 jbd: Delay discarding buffers in journal_unmap_buffer Delay discarding buffers in journal_unmap_buffer until we know that "add to orphan" operation has definitely been committed, otherwise the log space of committing transation may be freed and reused before truncate get committed, updates may get lost if crash happens. This patch is a backport of JBD2 fix by dingdinghua . Signed-off-by: Jan Kara (cherry picked from commit 86963918965eb8fe0c8ae009e7c1b4c630f533d5) Signed-off-by: Willy Tarreau commit 1e3e810627e46506f92d198b9bfee640541c9d09 Author: Greg Thelen Date: Fri Feb 22 16:36:01 2013 -0800 tmpfs: fix use-after-free of mempolicy object commit 5f00110f7273f9ff04ac69a5f85bb535a4fd0987 upstream. The tmpfs remount logic preserves filesystem mempolicy if the mpol=M option is not specified in the remount request. A new policy can be specified if mpol=M is given. Before this patch remounting an mpol bound tmpfs without specifying mpol= mount option in the remount request would set the filesystem's mempolicy object to a freed mempolicy object. To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run: # mkdir /tmp/x # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0 # mount -o remount,size=200M nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0 # note ? garbage in mpol=... output above # dd if=/dev/zero of=/tmp/x/f count=1 # panic here Panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) [...] Oops: 0010 [#1] SMP DEBUG_PAGEALLOC Call Trace: mpol_shared_policy_init+0xa5/0x160 shmem_get_inode+0x209/0x270 shmem_mknod+0x3e/0xf0 shmem_create+0x18/0x20 vfs_create+0xb5/0x130 do_last+0x9a1/0xea0 path_openat+0xb3/0x4d0 do_filp_open+0x42/0xa0 do_sys_open+0xfe/0x1e0 compat_sys_open+0x1b/0x20 cstar_dispatch+0x7/0x1f Non-debug kernels will not crash immediately because referencing the dangling mpol will not cause a fault. Instead the filesystem will reference a freed mempolicy object, which will cause unpredictable behavior. The problem boils down to a dropped mpol reference below if shmem_parse_options() does not allocate a new mpol: config = *sbinfo shmem_parse_options(data, &config, true) mpol_put(sbinfo->mpol) sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */ This patch avoids the crash by not releasing the mempolicy if shmem_parse_options() doesn't create a new mpol. How far back does this issue go? I see it in both 2.6.36 and 3.3. I did not look back further. Signed-off-by: Greg Thelen Acked-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 59deaa364afff9d2a6e5d82b51994ab4736ae253 Author: Geert Uytterhoeven Date: Sat Sep 29 22:23:19 2012 +0200 sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat() commit 66081a72517a131430dcf986775f3268aafcb546 upstream. The warning check for duplicate sysfs entries can cause a buffer overflow when printing the warning, as strcat() doesn't check buffer sizes. Use strlcat() instead. Since strlcat() doesn't return a pointer to the passed buffer, unlike strcat(), I had to convert the nested concatenation in sysfs_add_one() to an admittedly more obscure comma operator construct, to avoid emitting code for the concatenation if CONFIG_BUG is disabled. Signed-off-by: Geert Uytterhoeven Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 4495d600ba082682bbd6cc17303a3a177f135864 Author: Anurup m Date: Mon Apr 29 15:05:52 2013 -0700 fs/fscache/stats.c: fix memory leak commit ec686c9239b4d472052a271c505d04dae84214cc upstream. There is a kernel memory leak observed when the proc file /proc/fs/fscache/stats is read. The reason is that in fscache_stats_open, single_open is called and the respective release function is not called during release. Hence fix with correct release function - single_release(). Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101 Signed-off-by: Anurup m Cc: shyju pv Cc: Sanil kumar Cc: Nataraj m Cc: Li Zefan Cc: David Howells Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e5a2fb5600c04a1ea2fea0e36c35ec7409150c56 Author: Kees Cook Date: Thu Oct 25 13:38:16 2012 -0700 fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check commit 12176503366885edd542389eed3aaf94be163fdb upstream. The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check while converting ioctl arguments. This could lead to leaking kernel stack contents into userspace. Patch extracted from existing fix in grsecurity. Signed-off-by: Kees Cook Cc: David Miller Cc: Brad Spengler Cc: PaX Team Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Willy Tarreau commit 1c137a47bbdd6e86298627e04f547afd7f35d523 Author: Eric Wong Date: Tue Jan 1 21:20:27 2013 +0000 epoll: prevent missed events on EPOLL_CTL_MOD commit 128dd1759d96ad36c379240f8b9463e8acfd37a1 upstream. EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to ensure events are not missed. Since the modifications to the interest mask are not protected by the same lock as ep_poll_callback, we need to ensure the change is visible to other CPUs calling ep_poll_callback. We also need to ensure f_op->poll() has an up-to-date view of past events which occured before we modified the interest mask. So this barrier also pairs with the barrier in wq_has_sleeper(). This should guarantee either ep_poll_callback or f_op->poll() (or both) will notice the readiness of a recently-ready/modified item. This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in: http://thread.gmane.org/gmane.linux.kernel/1408782/ Signed-off-by: Eric Wong Cc: Hans Verkuil Cc: Jiri Olsa Cc: Jonathan Corbet Cc: Al Viro Cc: Davide Libenzi Cc: Hans de Goede Cc: Mauro Carvalho Chehab Cc: David Miller Cc: Eric Dumazet Cc: Andrew Morton Cc: Andreas Voellmy Tested-by: "Junchang(Jason) Wang" Cc: netdev@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit b7924161e554fa880c9c9c1c987f6800abc7ce29 Author: Ying Xue Date: Mon Aug 6 17:46:37 2012 +0800 USB: ftdi_sio: Quiet sparse noise about using plain integer was NULL pointer commit a816e3113b63753c330ca4751ea1d208e93e3015 upstream. Pointers should not be compared to plain integers. Quiets the sparse warning: warning: Using plain integer as NULL pointer Signed-off-by: Ying Xue Signed-off-by: Greg Kroah-Hartman Suggested-by: Lotfi Manseur Signed-off-by: Willy Tarreau commit d62e044103ca7e75bfaa97c562472c33d5b9e408 Author: Andrew Worsley Date: Tue Nov 22 20:00:19 2011 +1100 USB: serial: ftdi_sio: Handle the old_termios == 0 case e.g. uart_resume_port() commit c515598e0f5769916c31c00392cc2bfe6af74e55 upstream. Handle null old_termios in ftdi_set_termios() calls from uart_resume_port(). Signed-off-by: Andrew Worsley Signed-off-by: Greg Kroah-Hartman Suggested-by: Lotfi Manseur Signed-off-by: Willy Tarreau commit e3c81f656abdbe5ba541a04d60d264046fb6fa22 Author: Oliver Neukum Date: Tue Mar 12 14:52:42 2013 +0100 USB: cdc-wdm: fix buffer overflow commit c0f5ecee4e741667b2493c742b60b6218d40b3aa upstream. The buffer for responses must not overflow. If this would happen, set a flag, drop the data and return an error after user space has read all remaining data. Signed-off-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 2.6.32: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit 09b1d9f2b34b2dd409897602fd0133b7d1a5f117 Author: Wolfgang Frisch Date: Thu Jan 17 01:07:02 2013 +0100 USB: io_ti: Fix NULL dereference in chase_port() commit 1ee0a224bc9aad1de496c795f96bc6ba2c394811 upstream The tty is NULL when the port is hanging up. chase_port() needs to check for this. This patch is intended for stable series. The behavior was observed and tested in Linux 3.2 and 3.7.1. Johan Hovold submitted a more elaborate patch for the mainline kernel. [ 56.277883] usb 1-1: edge_bulk_in_callback - nonzero read bulk status received: -84 [ 56.278811] usb 1-1: USB disconnect, device number 3 [ 56.278856] usb 1-1: edge_bulk_in_callback - stopping read! [ 56.279562] BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8 [ 56.280536] IP: [] _raw_spin_lock_irqsave+0x19/0x35 [ 56.281212] PGD 1dc1b067 PUD 1e0f7067 PMD 0 [ 56.282085] Oops: 0002 [#1] SMP [ 56.282744] Modules linked in: [ 56.283512] CPU 1 [ 56.283512] Pid: 25, comm: khubd Not tainted 3.7.1 #1 innotek GmbH VirtualBox/VirtualBox [ 56.283512] RIP: 0010:[] [] _raw_spin_lock_irqsave+0x19/0x35 [ 56.283512] RSP: 0018:ffff88001fa99ab0 EFLAGS: 00010046 [ 56.283512] RAX: 0000000000000046 RBX: 00000000000001c8 RCX: 0000000000640064 [ 56.283512] RDX: 0000000000010000 RSI: ffff88001fa99b20 RDI: 00000000000001c8 [ 56.283512] RBP: ffff88001fa99b20 R08: 0000000000000000 R09: 0000000000000000 [ 56.283512] R10: 0000000000000000 R11: ffffffff812fcb4c R12: ffff88001ddf53c0 [ 56.283512] R13: 0000000000000000 R14: 00000000000001c8 R15: ffff88001e19b9f4 [ 56.283512] FS: 0000000000000000(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000 [ 56.283512] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 56.283512] CR2: 00000000000001c8 CR3: 000000001dc51000 CR4: 00000000000006e0 [ 56.283512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 56.283512] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 56.283512] Process khubd (pid: 25, threadinfo ffff88001fa98000, task ffff88001fa94f80) [ 56.283512] Stack: [ 56.283512] 0000000000000046 00000000000001c8 ffffffff810578ec ffffffff812fcb4c [ 56.283512] ffff88001e19b980 0000000000002710 ffffffff812ffe81 0000000000000001 [ 56.283512] ffff88001fa94f80 0000000000000202 ffffffff00000001 0000000000000296 [ 56.283512] Call Trace: [ 56.283512] [] ? add_wait_queue+0x12/0x3c [ 56.283512] [] ? usb_serial_port_work+0x28/0x28 [ 56.283512] [] ? chase_port+0x84/0x2d6 [ 56.283512] [] ? try_to_wake_up+0x199/0x199 [ 56.283512] [] ? tty_ldisc_hangup+0x222/0x298 [ 56.283512] [] ? edge_close+0x64/0x129 [ 56.283512] [] ? __wake_up+0x35/0x46 [ 56.283512] [] ? should_resched+0x5/0x23 [ 56.283512] [] ? tty_port_shutdown+0x39/0x44 [ 56.283512] [] ? usb_serial_port_work+0x28/0x28 [ 56.283512] [] ? __tty_hangup+0x307/0x351 [ 56.283512] [] ? usb_hcd_flush_endpoint+0xde/0xed [ 56.283512] [] ? _raw_spin_lock_irqsave+0x14/0x35 [ 56.283512] [] ? usb_serial_disconnect+0x57/0xc2 [ 56.283512] [] ? usb_unbind_interface+0x5c/0x131 [ 56.283512] [] ? __device_release_driver+0x7f/0xd5 [ 56.283512] [] ? device_release_driver+0x1a/0x25 [ 56.283512] [] ? bus_remove_device+0xd2/0xe7 [ 56.283512] [] ? device_del+0x119/0x167 [ 56.283512] [] ? usb_disable_device+0x6a/0x180 [ 56.283512] [] ? usb_disconnect+0x81/0xe6 [ 56.283512] [] ? hub_thread+0x577/0xe82 [ 56.283512] [] ? __schedule+0x490/0x4be [ 56.283512] [] ? abort_exclusive_wait+0x79/0x79 [ 56.283512] [] ? usb_remote_wakeup+0x2f/0x2f [ 56.283512] [] ? usb_remote_wakeup+0x2f/0x2f [ 56.283512] [] ? kthread+0x81/0x89 [ 56.283512] [] ? __kthread_parkme+0x5c/0x5c [ 56.283512] [] ? ret_from_fork+0x7c/0xb0 [ 56.283512] [] ? __kthread_parkme+0x5c/0x5c [ 56.283512] Code: 8b 7c 24 08 e8 17 0b c3 ff 48 8b 04 24 48 83 c4 10 c3 53 48 89 fb 41 50 e8 e0 0a c3 ff 48 89 04 24 e8 e7 0a c3 ff ba 00 00 01 00 0f c1 13 48 8b 04 24 89 d1 c1 ea 10 66 39 d1 74 07 f3 90 66 [ 56.283512] RIP [] _raw_spin_lock_irqsave+0x19/0x35 [ 56.283512] RSP [ 56.283512] CR2: 00000000000001c8 [ 56.283512] ---[ end trace 49714df27e1679ce ]--- Signed-off-by: Wolfgang Frisch Cc: Johan Hovold Cc: stable Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit f9477f694dccfe65fe26506924b23d464f770ce3 Author: Johan Hovold Date: Tue Mar 19 09:21:07 2013 +0100 USB: garmin_gps: fix memory leak on disconnect commit 618aa1068df29c37a58045fe940f9106664153fd upstream. Remove bogus disconnect test introduced by 95bef012e ("USB: more serial drivers writing after disconnect") which prevented queued data from being freed on disconnect. The possible IO it was supposed to prevent is long gone. Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 22045d4973f27bfbb485ba8c73e67264df9285a9 Author: Johan Hovold Date: Thu Oct 25 13:35:10 2012 +0200 USB: mos7840: fix port-device leak in error path commit 3eb55cc4ed88eee3b5230f66abcdbd2a91639eda upstream. The driver set the usb-serial port pointers to NULL on errors in attach, effectively preventing usb-serial core from decrementing the port ref counters and releasing the port devices and associated data. Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 451820e7e7073a69f4d3c3cfd98059feeb65ec28 Author: Johan Hovold Date: Thu Oct 25 13:35:09 2012 +0200 USB: mos7840: fix urb leak at release commit 65a4cdbb170e4ec1a7fa0e94936d47e24a17b0e8 upstream. Make sure control urb is freed at release. Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 70c15a31d29774d6fe95fc7b96feb6be21dee846 Author: Lennart Sorensen Date: Wed Oct 24 10:23:09 2012 -0400 USB: serial: Fix memory leak in sierra_release() commit f7bc5051667b74c3861f79eed98c60d5c3b883f7 upstream. I found a memory leak in sierra_release() (well sierra_probe() I guess) that looses 8 bytes each time the driver releases a device. Signed-off-by: Len Sorensen Acked-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3fe1b6421a9b124004440ba30f1295e92476e1cd Author: Johan Hovold Date: Thu Oct 25 10:29:01 2012 +0200 USB: whiteheat: fix memory leak in error path commit c129197c99550d356cf5f69b046994dd53cd1b9d upstream. Make sure command buffer is deallocated in case of errors during attach. Signed-off-by: Johan Hovold Cc: Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 4b4963a9e7bbd37dcbdc282eecf8ba131828c450 Author: Alan Stern Date: Tue Jul 5 16:34:05 2011 +0000 USB: EHCI: go back to using the system clock for QH unlinks commit 004c19682884d4f40000ce1ded53f4a1d0b18206 upstream This patch (as1477) fixes a problem affecting a few types of EHCI controller. Contrary to what one might expect, these controllers automatically stop their internal frame counter when no ports are enabled. Since ehci-hcd currently relies on the frame counter for determining when it should unlink QHs from the async schedule, those controllers run into trouble: The frame counter stops and the QHs never get unlinked. Some systems have also experienced other problems traced back to commit b963801164618e25fbdc0cd452ce49c3628b46c8 (USB: ehci-hcd unlink speedups), which made the original switch from using the system clock to using the frame counter. It never became clear what the reason was for these problems, but evidently it is related to use of the frame counter. To fix all these problems, this patch more or less reverts that commit and goes back to using the system clock. But this can't be done cleanly because other changes have since been made to the scan_async() subroutine. One of these changes involved the tricky logic that tries to avoid rescanning QHs that have already been seen when the scanning loop is restarted, which happens whenever an URB is given back. Switching back to clock-based unlinks would make this logic even more complicated. Therefore the new code doesn't rescan the entire async list whenever a giveback occurs. Instead it rescans only the current QH and continues on from there. This requires the use of a separate pointer to keep track of the next QH to scan, since the current QH may be unlinked while the scanning is in progress. That new pointer must be global, so that it can be adjusted forward whenever the _next_ QH gets unlinked. (uhci-hcd uses this same trick.) Simplification of the scanning loop removes a level of indentation, which accounts for the size of the patch. The amount of code changed is relatively small, and it isn't exactly a reversion of the b963801164 commit. This fixes Bugzilla #32432. Signed-off-by: Alan Stern CC: Tested-by: Matej Kenda Signed-off-by: Greg Kroah-Hartman Signed-off-by: Thomas Bork Signed-off-by: Willy Tarreau commit 1238778a3ee1ff35da4d47dac157f03c557d2d86 Author: Matthew Garrett Date: Tue Aug 14 16:44:49 2012 -0400 xhci: Make handover code more robust commit e955a1cd086de4d165ae0f4c7be7289d84b63bdc upstream. My test platform (Intel DX79SI) boots reliably under BIOS, but frequently crashes when booting via UEFI. I finally tracked this down to the xhci handoff code. It seems that reads from the device occasionally just return 0xff, resulting in xhci_find_next_cap_offset generating a value that's larger than the resource region. We then oops when attempting to read the value. Sanity checking that value lets us avoid the crash. I've no idea what's causing the underlying problem, and xhci still doesn't actually *work* even with this, but the machine at least boots which will probably make further debugging easier. This should be backported to kernels as old as 2.6.31, that contain the commit 66d4eadd8d067269ea8fead1a50fe87c2979a80d "USB: xhci: BIOS handoff and HW initialization." Signed-off-by: Matthew Garrett Signed-off-by: Sarah Sharp Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e55f3e4377c1908b7ef7b5d42b53c4f646502ce6 Author: Mathias Krause Date: Sun Apr 7 01:51:49 2013 +0000 Bluetooth: fix possible info leak in bt_sock_recvmsg() commit 4683f42fde3977bdb4e8a09622788cc8b5313778 upstream. In case the socket is already shutting down, bt_sock_recvmsg() returns with 0 without updating msg_namelen leading to net/socket.c leaking the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix this by moving the msg_namelen assignment in front of the shutdown test. Cc: Marcel Holtmann Cc: Gustavo Padovan Cc: Johan Hedberg Signed-off-by: Mathias Krause Signed-off-by: David S. Miller [dannf: adjusted to apply to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit cb15ea45068c50395623764bba398347cc20c81d Author: Mathias Krause Date: Wed Aug 15 11:31:51 2012 +0000 Bluetooth: L2CAP - Fix info leak via getsockname() commit 792039c73cf176c8e39a6e8beef2c94ff46522ed upstream. The L2CAP code fails to initialize the l2_bdaddr_type member of struct sockaddr_l2 and the padding byte added for alignment. It that for leaks two bytes kernel stack via the getsockname() syscall. Add an explicit memset(0) before filling the structure to avoid the info leak. Signed-off-by: Mathias Krause Cc: Marcel Holtmann Cc: Gustavo Padovan Cc: Johan Hedberg Signed-off-by: David S. Miller [bwh: Backported to 2.6.32: adjust filename] Signed-off-by: Willy Tarreau commit 78ded8d633aaa9e45846e8d6157438f0adaee89b Author: Mathias Krause Date: Sun Apr 7 01:51:50 2013 +0000 Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg() [ Upstream commit e11e0455c0d7d3d62276a0c55d9dfbc16779d691 ] If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns early with 0 without updating the possibly set msg_namelen member. This, in turn, leads to a 128 byte kernel stack leak in net/socket.c. Fix this by updating msg_namelen in this case. For all other cases it will be handled in bt_sock_stream_recvmsg(). Signed-off-by: Mathias Krause Cc: Marcel Holtmann Cc: Gustavo Padovan Cc: Johan Hedberg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 473f13e1e624fc5007ff741dd50bb634ad38dbee Author: Mathias Krause Date: Wed Aug 15 11:31:50 2012 +0000 Bluetooth: RFCOMM - Fix info leak via getsockname() [ Upstream commit 9344a972961d1a6d2c04d9008b13617bcb6ec2ef ] The RFCOMM code fails to initialize the trailing padding byte of struct sockaddr_rc added for alignment. It that for leaks one byte kernel stack via the getsockname() syscall. Add an explicit memset(0) before filling the structure to avoid the info leak. Signed-off-by: Mathias Krause Cc: Marcel Holtmann Cc: Gustavo Padovan Cc: Johan Hedberg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit c47abf3b0aa3281b9cd1bfb333ea853b90026d82 Author: Mathias Krause Date: Wed Aug 15 11:31:46 2012 +0000 Bluetooth: HCI - Fix info leak in getsockopt(HCI_FILTER) [ Upstream commit e15ca9a0ef9a86f0477530b0f44a725d67f889ee ] The HCI code fails to initialize the two padding bytes of struct hci_ufilter before copying it to userland -- that for leaking two bytes kernel stack. Add an explicit memset(0) before filling the structure to avoid the info leak. Signed-off-by: Mathias Krause Cc: Marcel Holtmann Cc: Gustavo Padovan Cc: Johan Hedberg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 326af266f7d2ce13c432292e3430fa022f3f8c5a Author: Anderson Lizardo Date: Sun Jan 6 18:28:53 2013 -0400 Bluetooth: Fix incorrect strncpy() in hidp_setup_hid() commit 0a9ab9bdb3e891762553f667066190c1d22ad62b upstream The length parameter should be sizeof(req->name) - 1 because there is no guarantee that string provided by userspace will contain the trailing '\0'. Can be easily reproduced by manually setting req->name to 128 non-zero bytes prior to ioctl(HIDPCONNADD) and checking the device name setup on input subsystem: $ cat /sys/devices/pnp0/00\:04/tty/ttyS0/hci0/hci0\:1/input8/name AAAAAA[...]AAAAAAAAf0:af:f0:af:f0:af ("f0:af:f0:af:f0:af" is the device bluetooth address, taken from "phys" field in struct hid_device due to overflow.) Cc: stable@vger.kernel.org Signed-off-by: Anderson Lizardo Acked-by: Marcel Holtmann Signed-off-by: Gustavo Padovan [backported to 2.6.32 jmm] Signed-off-by: Willy Tarreau commit 7664ec67365690cf2a1a997ea6e4dc949a8d80f8 Author: Dan Carpenter Date: Mon Dec 3 22:05:12 2012 +0300 telephony: ijx: buffer overflow in ixj_write_cid() [Not needed in 3.8 or newer as this driver is removed there. - gregkh] We get this from user space and nothing has been done to ensure that these strings are NUL terminated. Reported-by: Chen Gang Signed-off-by: Dan Carpenter Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit dd70f87c0a17389be3a10ce979a8f59cc99440f1 Author: Patrick McHardy Date: Thu Aug 30 07:01:30 2012 +0000 IPoIB: Fix use-after-free of multicast object commit bea1e22df494a729978e7f2c54f7bda328f74bc3 upstream. Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz) Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue, and hence the workqueue can't be flushed from the context of ipoib_stop(). In the current code, ipoib_stop() (which doesn't flush the workqueue) calls ipoib_mcast_dev_flush(), which goes and deletes all the multicast entries. This takes place without any synchronization with a possible running instance of ipoib_mcast_join_task() for the same ipoib device, leading to a crash due to NULL pointer dereference. Fix this by making sure that the workqueue is flushed before ipoib_mcast_dev_flush() is called. To make that possible, we move the RTNL-lock wrapped code to ipoib_mcast_join_finish(). Signed-off-by: Patrick McHardy Signed-off-by: Roland Dreier Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e6d11ed587bbd99b2cd3bb2dba791e30d1211a1b Author: Nithin Nayak Sujir Date: Mon Jan 14 17:10:59 2013 +0000 tg3: Avoid null pointer dereference in tg3_interrupt in netconsole mode [ Upstream commit 9c13cb8bb477a83b9a3c9e5a5478a4e21294a760 ] When netconsole is enabled, logging messages generated during tg3_open can result in a null pointer dereference for the uninitialized tg3 status block. Use the irq_sync flag to disable polling in the early stages. irq_sync is cleared when the driver is enabling interrupts after all initialization is completed. Signed-off-by: Nithin Nayak Sujir Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 28c684cee153e764af0a49a95e14b70463c3c69c Author: Larry Finger Date: Wed Sep 26 12:32:02 2012 -0500 b43legacy: Fix crash on unload when firmware not available commit 2d838bb608e2d1f6cb4280e76748cb812dc822e7 upstream. When b43legacy is loaded without the firmware being available, a following unload generates a kernel NULL pointer dereference BUG as follows: [ 214.330789] BUG: unable to handle kernel NULL pointer dereference at 0000004c [ 214.330997] IP: [] drain_workqueue+0x15/0x170 [ 214.331179] *pde = 00000000 [ 214.331311] Oops: 0000 [#1] SMP [ 214.331471] Modules linked in: b43legacy(-) ssb pcmcia mac80211 cfg80211 af_packet mperf arc4 ppdev sr_mod cdrom sg shpchp yenta_socket pcmcia_rsrc pci_hotplug pcmcia_core battery parport_pc parport floppy container ac button edd autofs4 ohci_hcd ehci_hcd usbcore usb_common thermal processor scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh fan thermal_sys hwmon ata_generic pata_ali libata [last unloaded: cfg80211] [ 214.333421] Pid: 3639, comm: modprobe Not tainted 3.6.0-rc6-wl+ #163 Source Technology VIC 9921/ALI Based Notebook [ 214.333580] EIP: 0060:[] EFLAGS: 00010246 CPU: 0 [ 214.333687] EIP is at drain_workqueue+0x15/0x170 [ 214.333788] EAX: c162ac40 EBX: cdfb8360 ECX: 0000002a EDX: 00002a2a [ 214.333890] ESI: 00000000 EDI: 00000000 EBP: cd767e7c ESP: cd767e5c [ 214.333957] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 214.333957] CR0: 8005003b CR2: 0000004c CR3: 0c96a000 CR4: 00000090 [ 214.333957] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 214.333957] DR6: ffff0ff0 DR7: 00000400 [ 214.333957] Process modprobe (pid: 3639, ti=cd766000 task=cf802e90 task.ti=cd766000) [ 214.333957] Stack: [ 214.333957] 00000292 cd767e74 c12c5e09 00000296 00000296 cdfb8360 cdfb9220 00000000 [ 214.333957] cd767e90 c104c4fd cdfb8360 cdfb9220 cd682800 cd767ea4 d0c10184 cd682800 [ 214.333957] cd767ea4 cba31064 cd767eb8 d0867908 cba31064 d087e09c cd96f034 cd767ec4 [ 214.333957] Call Trace: [ 214.333957] [] ? skb_dequeue+0x49/0x60 [ 214.333957] [] destroy_workqueue+0xd/0x150 [ 214.333957] [] ieee80211_unregister_hw+0xc4/0x100 [mac80211] [ 214.333957] [] b43legacy_remove+0x78/0x80 [b43legacy] [ 214.333957] [] ssb_device_remove+0x1d/0x30 [ssb] [ 214.333957] [] __device_release_driver+0x5a/0xb0 [ 214.333957] [] driver_detach+0x87/0x90 [ 214.333957] [] bus_remove_driver+0x6c/0xe0 [ 214.333957] [] driver_unregister+0x40/0x70 [ 214.333957] [] ssb_driver_unregister+0xb/0x10 [ssb] [ 214.333957] [] b43legacy_exit+0xd/0xf [b43legacy] [ 214.333957] [] sys_delete_module+0x14e/0x2b0 [ 214.333957] [] ? vfs_write+0xf7/0x150 [ 214.333957] [] ? tty_write_lock+0x50/0x50 [ 214.333957] [] ? sys_write+0x38/0x70 [ 214.333957] [] syscall_call+0x7/0xb [ 214.333957] Code: bc 27 00 00 00 00 a1 74 61 56 c1 55 89 e5 e8 a3 fc ff ff 5d c3 90 55 89 e5 57 56 89 c6 53 b8 40 ac 62 c1 83 ec 14 e8 bb b7 34 00 <8b> 46 4c 8d 50 01 85 c0 89 56 4c 75 03 83 0e 40 80 05 40 ac 62 [ 214.333957] EIP: [] drain_workqueue+0x15/0x170 SS:ESP 0068:cd767e5c [ 214.333957] CR2: 000000000000004c [ 214.341110] ---[ end trace c7e90ec026d875a6 ]---Index: wireless-testing/drivers/net/wireless/b43legacy/main.c The problem is fixed by making certain that the ucode pointer is not NULL before deregistering the driver in mac80211. Signed-off-by: Larry Finger Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 5d977bde9cb9df4ae5c21875f074ab05a3cb22f7 Author: Francois Romieu Date: Sun Jul 11 17:10:09 2010 -0700 r8169: incorrect identifier for a 8168dp Merge error. See CFG_METHOD_8 (0x3c800000 + 0x00300000) since version 8.002.00 of Realtek's driver. Signed-off-by: Francois Romieu Cc: Hayes Signed-off-by: David S. Miller (cherry picked from commit 17c99297212a2d1b1779a08caf4b0d83a85545df) Cc: Thomas Bork Signed-off-by: Willy Tarreau commit fa9652ba043f012ceea64f11f8e92aa7011a1833 Author: Lennart Sorensen Date: Thu Jul 28 13:18:11 2011 +0000 r8169: Add support for D-Link 530T rev C1 (Kernel Bug 38862) [ Upstream commit 93a3aa25933461d76141179fc94aa32d5f9d954a ] The D-Link DGE-530T rev C1 is a re-badged Realtek 8169 named DLG10028C, unlike the previous revisions which were skge based. It is probably the same as the discontinued DGE-528T (0x4300) other than the PCI ID. The PCI ID is 0x1186:0x4302. Adding it to r8169.c where 0x1186:0x4300 is already found makes the card be detected and work. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=38862 Signed-off-by: Len Sorensen Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 7106159f8bd33bd5e5b0ea2c87e499117fc22c69) Cc: Thomas Bork Signed-off-by: Willy Tarreau commit a0a6cfcd0f5ef71a4a42075c3dc9cee797a3859c Author: Timo Teräs Date: Mon Jan 21 22:30:35 2013 +0000 r8169: remove the obsolete and incorrect AMD workaround [ Upstream commit 5d0feaff230c0abfe4a112e6f09f096ed99e0b2d ] This was introduced in commit 6dccd16 "r8169: merge with version 6.001.00 of Realtek's r8169 driver". I did not find the version 6.001.00 online, but in 6.002.00 or any later r8169 from Realtek this hunk is no longer present. Also commit 05af214 "r8169: fix Ethernet Hangup for RTL8110SC rev d" claims to have fixed this issue otherwise. The magic compare mask of 0xfffe000 is dubious as it masks parts of the Reserved part, and parts of the VLAN tag. But this does not make much sense as the VLAN tag parts are perfectly valid there. In matter of fact this seems to be triggered with any VLAN tagged packet as RxVlanTag bit is matched. I would suspect 0xfffe0000 was intended to test reserved part only. Finally, this hunk is evil as it can cause more packets to be handled than what was NAPI quota causing net/core/dev.c: net_rx_action(): WARN_ON_ONCE(work > weight) to trigger, and mess up the NAPI state causing device to hang. As result, any system using VLANs and having high receive traffic (so that NAPI poll budget limits rtl_rx) would result in device hang. Signed-off-by: Timo Teräs Acked-by: Francois Romieu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 3a42cce923b0242d0293cae0a162601afa89d552) Cc: Thomas Bork Signed-off-by: Willy Tarreau commit 3e472a9ae743f01b6c8e78693f9edcd32ba45230 Author: Hayes Wang Date: Tue Feb 22 17:26:19 2011 +0800 r8169: correct settings of rtl8102e. Adjust and remove certain settings of RTL8102E which are for previous chips. Signed-off-by: Hayes Wang Acked-off-by: Francois Romieu (cherry picked from commit d24e9aafe5d5dfdf6d114b29e67f8afd5fae5ef0) Cc: Thomas Bork Signed-off-by: Willy Tarreau commit 95f86856ab0265ec1ad76b72b52fc99c8d0fa840 Author: Mathias Krause Date: Fri Sep 14 09:58:32 2012 +0000 xfrm_user: return error pointer instead of NULL #2 commit c25463722509fef0ed630b271576a8c9a70236f3 upstream. When dump_one_policy() returns an error, e.g. because of a too small buffer to dump the whole xfrm policy, xfrm_policy_netlink() returns NULL instead of an error pointer. But its caller expects an error pointer and therefore continues to operate on a NULL skbuff. Signed-off-by: Mathias Krause Acked-by: Steffen Klassert Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 30d5e9d86fd710b54d6ec4ab9d38ebc357e10f84 Author: Mathias Krause Date: Thu Sep 13 11:41:26 2012 +0000 xfrm_user: return error pointer instead of NULL commit 864745d291b5ba80ea0bd0edcbe67273de368836 upstream. When dump_one_state() returns an error, e.g. because of a too small buffer to dump the whole xfrm state, xfrm_state_netlink() returns NULL instead of an error pointer. But its callers expect an error pointer and therefore continue to operate on a NULL skbuff. This could lead to a privilege escalation (execution of user code in kernel context) if the attacker has CAP_NET_ADMIN and is able to map address 0. Signed-off-by: Mathias Krause Acked-by: Steffen Klassert Signed-off-by: David S. Miller Signed-off-by: Willy Tarreau commit 89cee197c7947e90ae3600553aaa477eac1ee55a Author: Mathias Krause Date: Wed Sep 19 11:33:41 2012 +0000 xfrm_user: fix info leak in copy_to_user_tmpl() [ Upstream commit 1f86840f897717f86d523a13e99a447e6a5d2fa5 ] The memory used for the template copy is a local stack variable. As struct xfrm_user_tmpl contains multiple holes added by the compiler for alignment, not initializing the memory will lead to leaking stack bytes to userland. Add an explicit memset(0) to avoid the info leak. Initial version of the patch by Brad Spengler. Signed-off-by: Mathias Krause Cc: Brad Spengler Acked-by: Steffen Klassert Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit ab7dec14409366c185ecf4cb1a4d0a6b1370be89 Author: Mathias Krause Date: Wed Sep 19 11:33:40 2012 +0000 xfrm_user: fix info leak in copy_to_user_policy() [ Upstream commit 7b789836f434c87168eab067cfbed1ec4783dffd ] The memory reserved to dump the xfrm policy includes multiple padding bytes added by the compiler for alignment (padding bytes in struct xfrm_selector and struct xfrm_userpolicy_info). Add an explicit memset(0) before filling the buffer to avoid the heap info leak. Signed-off-by: Mathias Krause Acked-by: Steffen Klassert Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d6927c1ec8b078e0b7e416b1da320a4f5b70bf4d Author: Mathias Krause Date: Wed Sep 19 11:33:39 2012 +0000 xfrm_user: fix info leak in copy_to_user_state() [ Upstream commit f778a636713a435d3a922c60b1622a91136560c1 ] The memory reserved to dump the xfrm state includes the padding bytes of struct xfrm_usersa_info added by the compiler for alignment (7 for amd64, 3 for i386). Add an explicit memset(0) before filling the buffer to avoid the info leak. Signed-off-by: Mathias Krause Acked-by: Steffen Klassert Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e760d49b961785ee6bebe8b87832335b8e61c8a2 Author: Jussi Kivilinna Date: Sun Oct 21 20:42:28 2012 +0300 crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption commit 9efade1b3e981f5064f9db9ca971b4dc7557ae42 upstream. cryptd_queue_worker attempts to prevent simultaneous accesses to crypto workqueue by cryptd_enqueue_request using preempt_disable/preempt_enable. However cryptd_enqueue_request might be called from softirq context, so add local_bh_disable/local_bh_enable to prevent data corruption and panics. Bug report at http://marc.info/?l=linux-crypto-vger&m=134858649616319&w=2 v2: - Disable software interrupts instead of hardware interrupts Reported-by: Gurucharan Shetty Signed-off-by: Jussi Kivilinna Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 554da46d8c6d24c1920ec1b9b333de907bbe71c7 Author: David Howells Date: Tue Mar 12 16:44:31 2013 +1100 keys: fix race with concurrent install_user_keyrings() commit 0da9dfdd2cd9889201bc6f6f43580c99165cd087 upstream. This fixes CVE-2013-1792. There is a race in install_user_keyrings() that can cause a NULL pointer dereference when called concurrently for the same user if the uid and uid-session keyrings are not yet created. It might be possible for an unprivileged user to trigger this by calling keyctl() from userspace in parallel immediately after logging in. Assume that we have two threads both executing lookup_user_key(), both looking for KEY_SPEC_USER_SESSION_KEYRING. THREAD A THREAD B =============================== =============================== ==>call install_user_keyrings(); if (!cred->user->session_keyring) ==>call install_user_keyrings() ... user->uid_keyring = uid_keyring; if (user->uid_keyring) return 0; <== key = cred->user->session_keyring [== NULL] user->session_keyring = session_keyring; atomic_inc(&key->usage); [oops] At the point thread A dereferences cred->user->session_keyring, thread B hasn't updated user->session_keyring yet, but thread A assumes it is populated because install_user_keyrings() returned ok. The race window is really small but can be exploited if, for example, thread B is interrupted or preempted after initializing uid_keyring, but before doing setting session_keyring. This couldn't be reproduced on a stock kernel. However, after placing systemtap probe on 'user->session_keyring = session_keyring;' that introduced some delay, the kernel could be crashed reliably. Fix this by checking both pointers before deciding whether to return. Alternatively, the test could be done away with entirely as it is checked inside the mutex - but since the mutex is global, that may not be the best way. Signed-off-by: David Howells Reported-by: Mateusz Guzik Signed-off-by: Andrew Morton Signed-off-by: James Morris Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 4d9de348035b2822961ad43e380c2cff2a52cccf Author: Eddie Wai Date: Tue Aug 21 10:35:53 2012 -0700 SCSI: bnx2i: Fixed NULL ptr deference for 1G bnx2 Linux iSCSI offload commit d6532207116307eb7ecbfa7b9e02c53230096a50 upstream. This patch fixes the following kernel panic invoked by uninitialized fields in the chip initialization for the 1G bnx2 iSCSI offload. One of the bits in the chip initialization is being used by the latest firmware to control overflow packets. When this control bit gets enabled erroneously, it would ultimately result in a bad packet placement which would cause the bnx2 driver to dereference a NULL ptr in the placement handler. This can happen under certain stress I/O environment under the Linux iSCSI offload operation. This change only affects Broadcom's 5709 chipset. Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: [] :bnx2:bnx2_poll_work+0xd0d/0x13c5 Pid: 0, comm: swapper Tainted: G ---- 2.6.18-333.el5debug #2 RIP: 0010:[] [] :bnx2:bnx2_poll_work+0xd0d/0x13c5 RSP: 0018:ffff8101b575bd50 EFLAGS: 00010216 RAX: 0000000000000005 RBX: ffff81007c5fb180 RCX: 0000000000000000 RDX: 0000000000000ffc RSI: 00000000817e8000 RDI: 0000000000000220 RBP: ffff81015bbd7ec0 R08: ffff8100817e9000 R09: 0000000000000000 R10: ffff81007c5fb180 R11: 00000000000000c8 R12: 000000007a25a010 R13: 0000000000000000 R14: 0000000000000005 R15: ffff810159f80558 FS: 0000000000000000(0000) GS:ffff8101afebc240(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006a0 Process swapper (pid: 0, threadinfo ffff8101b5754000, task ffff8101afebd820) Stack: 000000000000000b ffff810159f80000 0000000000000040 ffff810159f80520 ffff810159f80500 00cf00cf8008e84b ffffc200100939e0 ffff810009035b20 0000502900000000 000000be00000001 ffff8100817e7810 00d08101b575bea8 Call Trace: [] show_schedstat+0x1c2/0x25b [] :bnx2:bnx2_poll+0xf6/0x231 [] net_rx_action+0xac/0x1b1 [] __do_softirq+0x89/0x133 [] call_softirq+0x1c/0x28 [] do_softirq+0x2c/0x7d [] do_IRQ+0xee/0xf7 [] ret_from_intr+0x0/0xa [] acpi_processor_idle_simple+0x1c5/0x341 [] acpi_processor_idle_simple+0x182/0x341 [] acpi_processor_idle_simple+0x0/0x341 [] cpu_idle+0x95/0xb8 [] start_secondary+0x479/0x488 Signed-off-by: Eddie Wai Reviewed-by: Mike Christie Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 34cee3a5778bd7d68e3dcd618526cd2edfc835a1 Author: James Bottomley Date: Thu Jul 7 15:45:40 2011 -0500 fix crash in scsi_dispatch_cmd() USB surprise removal of sr is triggering an oops in scsi_dispatch_command(). What seems to be happening is that USB is hanging on to a queue reference until the last close of the upper device, so the crash is caused by surprise remove of a mounted CD followed by attempted unmount. The problem is that USB doesn't issue its final commands as part of the SCSI teardown path, but on last close when the block queue is long gone. The long term fix is probably to make sr do the teardown in the same way as sd (so remove all the lower bits on ejection, but keep the upper disk alive until last close of user space). However, the current oops can be simply fixed by not allowing any commands to be sent to a dead queue. Cc: stable@kernel.org Signed-off-by: James Bottomley (cherry picked from commit bfe159a51203c15d23cb3158fffdc25ec4b4dda1) Cc: Thomas Bork Signed-off-by: Willy Tarreau commit c6203cd4fc2f54da77898c2a76ae2356a90c992f Author: Peter Korsgaard Date: Tue Aug 10 18:01:24 2010 -0700 scsi: use __uX types for headers exported to user space Commit 9e4f5e29 ("FC Pass Thru support") exported a number of header files in include/scsi to user space, but didn't change the uX types to the userspace-compatible __uX types. Without that you'll get compile errors when including them - E.G.: include/scsi/scsi.h:145: error: expected specifier-qualifier-list before `u8' Signed-off-by: Peter Korsgaard Cc: Boaz Harrosh Cc: James Smart Cc: James Bottomley Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds (cherry picked from commit 083c8c1e60e5c27a277e87dbeb6b89b47937559f) Cc: Thomas Bork Signed-off-by: Willy Tarreau commit 9eb218602e21779f584a50953c21f8214b7e2f43 Author: Jan Kara Date: Fri Jun 15 12:52:46 2012 +0200 scsi: Silence unnecessary warnings about ioctl to partition commit 6d9359280753d2955f86d6411047516a9431eb51 upstream. Sometimes, warnings about ioctls to partition happen often enough that they form majority of the warnings in the kernel log and users complain. In some cases warnings are about ioctls such as SG_IO so it's not good to get rid of the warnings completely as they can ease debugging of userspace problems when ioctl is refused. Since I have seen warnings from lots of commands, including some proprietary userspace applications, I don't think disallowing the ioctls for processes with CAP_SYS_RAWIO will happen in the near future if ever. So lets just stop warning for processes with CAP_SYS_RAWIO for which ioctl is allowed. CC: Paolo Bonzini CC: James Bottomley CC: linux-scsi@vger.kernel.org Acked-by: Paolo Bonzini Signed-off-by: Jan Kara Signed-off-by: Jens Axboe [bwh: Backported to 3.2: use ENOTTY, not ENOIOCTLCMD] Signed-off-by: Ben Hutchings (cherry picked from commit f45c9a6eec20cd712421c442785e7a4e9215a230) Cc: Thomas Bork Signed-off-by: Willy Tarreau commit c398e000072a575a036ef644cffd8c0b642a6d5a Author: Xiaotian Feng Date: Thu Dec 13 16:12:18 2012 +0800 libata: fix Null pointer dereference on disk error commit 26cd4d65deba587f3cf2329b6869ce02bcbe68ec upstream. Following oops were observed when disk error happened: [ 4272.896937] sd 0:0:0:0: [sda] Unhandled error code [ 4272.896939] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [ 4272.896942] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 5a de a7 00 00 08 00 [ 4272.896951] end_request: I/O error, dev sda, sector 5955239 [ 4291.574947] BUG: unable to handle kernel NULL pointer dereference at (null) [ 4291.658305] IP: [] ahci_activity_show+0x1/0x40 [ 4291.730090] PGD 76dbbc067 PUD 6c4fba067 PMD 0 [ 4291.783408] Oops: 0000 [#1] SMP [ 4291.822100] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/sw_activity [ 4291.934235] CPU 9 [ 4291.958301] Pid: 27942, comm: hwinfo ...... ata_scsi_find_dev could return NULL, so ata_scsi_activity_{show,store} should check if atadev is NULL. Signed-off-by: Xiaotian Feng Cc: James Bottomley Signed-off-by: Jeff Garzik Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3d4329e7924d1a1ccb1db2a074b8a4961b6fc254 Author: Malcolm Priestley Date: Sun Oct 7 08:27:00 2012 +0100 staging: vt6656: [BUG] out of bound array reference in RFbSetPower. commit ab1dd9963137a1e122004d5378a581bf16ae9bc8 upstream. Calling RFbSetPower with uCH zero value will cause out of bound array reference. This causes 64 bit kernels to oops on boot. Note: Driver does not function on 64 bit kernels and should be blacklisted on them. Signed-off-by: Malcolm Priestley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit cb4d9e9e10d8b3fbc8cc75caf6f8d89f6cc0c0e2 Author: Ian Abbott Date: Fri Aug 31 20:41:30 2012 +0100 staging: comedi: das08: Correct AO output for das08jr-16-ao commit 61ed59ed09e6ad2b8395178ea5ad5f653bba08e3 upstream. Don't zero out bits 15..12 of the data value in `das08jr_ao_winsn()` as that knobbles the upper three-quarters of the output range for the 'das08jr-16-ao' board. Signed-off-by: Ian Abbott Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 57d87ddb278da69ddd8f9c4163136f22eebed1f3 Author: Ian Abbott Date: Wed Feb 27 10:56:19 2013 +0000 staging: comedi: check s->async for poll(), read() and write() commit cc400e185c07c15a42d2635995f422de5b94b696 upstream. Some low-level comedi drivers (incorrectly) point `dev->read_subdev` or `dev->write_subdev` to a subdevice that does not support asynchronous commands. Comedi's poll(), read() and write() file operation handlers assume these subdevices do support asynchronous commands. In particular, they assume `s->async` is valid (where `s` points to the read or write subdevice), which it won't be if it has been set incorrectly. This can lead to a NULL pointer dereference. Check `s->async` is non-NULL in `comedi_poll()`, `comedi_read()` and `comedi_write()` to avoid the bug. Signed-off-by: Ian Abbott Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d20405f567ed2eff7460808423245c5a60c2709b Author: Ian Abbott Date: Tue Sep 18 19:46:58 2012 +0100 staging: comedi: don't dereference user memory for INSN_INTTRIG commit 5d06e3df280bd230e2eadc16372e62818c63e894 upstream. `parse_insn()` is dereferencing the user-space pointer `insn->data` directly when handling the `INSN_INTTRIG` comedi instruction. It shouldn't be using `insn->data` at all; it should be using the separate `data` pointer passed to the function. Fix it. Signed-off-by: Ian Abbott Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3b436bd94211c911c6c6f026829c3a8a4e8b0113 Author: Ian Abbott Date: Thu Sep 27 17:45:27 2012 +0100 staging: comedi: jr3_pci: fix iomem dereference commit e1878957b4676a17cf398f7f5723b365e9a2ca48 upstream. Correct a direct dereference of I/O memory to use an appropriate I/O memory access function. Note that the pointer being dereferenced is not currently tagged with `__iomem` but I plan to correct that for 3.7. Signed-off-by: Ian Abbott Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 61d9db0744fd67b24ff883ad631299f999496470 Author: Ian Abbott Date: Mon Sep 24 17:20:52 2012 +0100 staging: comedi: s626: don't dereference insn->data commit b655c2c4782ed3e2e71d2608154e295a3e860311 upstream. `s626_enc_insn_config()` is incorrectly dereferencing `insn->data` which is a pointer to user memory. It should be dereferencing the separate `data` parameter that points to a copy of the data in kernel memory. Signed-off-by: Ian Abbott Reviewed-by: H Hartley Sweeten Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 09f6f2bbba76142af91b4ac9834521c23a1575e2 Author: Ian Abbott Date: Wed Sep 19 19:37:39 2012 +0100 staging: comedi: fix memory leak for saved channel list commit c8cad4c89ee3b15935c532210ae6ebb5c0a2734d upstream. When `do_cmd_ioctl()` allocates memory for the kernel copy of a channel list, it frees any previously allocated channel list in `async->cmd.chanlist` and replaces it with the new one. However, if the device is ever removed (or "detached") the cleanup code in `cleanup_device()` in "drivers.c" does not free this memory so it is lost. A sensible place to free the kernel copy of the channel list is in `do_become_nonbusy()` as at that point the comedi asynchronous command associated with the channel list is no longer valid. Free the channel list in `do_become_nonbusy()` instead of `do_cmd_ioctl()` and clear the pointer to prevent it being freed more than once. Note that `cleanup_device()` could be called at an inappropriate time while the comedi device is open, but that's a separate bug not related to this this patch. Signed-off-by: Ian Abbott Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 52b64ec2fda108f5a802b1c0891fb75ccdec3f48 Author: Ian Abbott Date: Fri Jan 4 11:33:21 2013 +0000 staging: comedi: comedi_test: fix race when cancelling command commit c0729eeefdcd76db338f635162bf0739fd2c5f6f upstream. Éric Piel reported a kernel oops in the "comedi_test" module. It was a NULL pointer dereference within `waveform_ai_interrupt()` (actually a timer function) that sometimes occurred when a running asynchronous command is cancelled (either by the `COMEDI_CANCEL` ioctl or by closing the device file). This seems to be a race between the caller of `waveform_ai_cancel()` which on return from that function goes and tears down the running command, and the timer function which uses the command. In particular, `async->cmd.chanlist` gets freed (and the pointer set to NULL) by `do_become_nonbusy()` in "comedi_fops.c" but a previously scheduled `waveform_ai_interrupt()` timer function will dereference that pointer regardless, leading to the oops. Fix it by replacing the `del_timer()` call in `waveform_ai_cancel()` with `del_timer_sync()`. Signed-off-by: Ian Abbott Reported-by: Éric Piel Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit deb76af69d8fa03c6dbd68a007b1c914b41b3ccd Author: Ian Abbott Date: Wed Feb 27 13:27:38 2013 +0000 staging: comedi: ni_labpc: set up command4 register *after* command3 Commit 22056e2b46246d97ff0f7c6e21a77b8daa07f02c upstream. Tuomas reported problems getting meaningful output from a Lab-PC+ in differential mode for AI cmds, but AI insn reads gave correct readings. He tracked it down to two problems, one of which is addressed by this patch. It seems that writing to the command3 register after writing to the command4 register in `labpc_ai_cmd()` messes up the differential reference bit setting in the command4 register. Set up the command4 register after the command3 register (as in `labpc_ai_rinsn()`) to avoid the problem. Thanks to Tuomas for suggesting the fix. Signed-off-by: Ian Abbott Signed-off-by: Willy Tarreau commit 0bada243965ab71669cde9b0e1bc671b73d95df6 Author: Ian Abbott Date: Wed Feb 27 13:27:37 2013 +0000 staging: comedi: ni_labpc: correct differential channel sequence for AI commands Commit 4c4bc25d0fa6beaf054c0b4c3b324487f266c820 upstream. Tuomas reported problems getting meaningful output from a Lab-PC+ in differential mode for AI cmds, but AI insn reads gave correct readings. He tracked it down to two problems, one of which is addressed by this patch. It seems the setting of the channel bits for particular scanning modes was incorrect for differential mode. (Only half the number of channels are available in differential mode; comedi refers to them as channels 0, 1, 2 and 3, but the hardware documentation refers to them as channels 0, 2, 4 and 6.) In differential mode, the setting of the channel enable bits in the command1 register should depend on whether the scan enable bit is set. Effectively, we need to double the comedi channel number when the scan enable bit is not set in differential mode. The scan enable bit gets set when the AI scan mode is `MODE_MULT_CHAN_UP` or `MODE_MULT_CHAN_DOWN`, and gets cleared when the AI scan mode is `MODE_SINGLE_CHAN` or `MODE_SINGLE_CHAN_INTERVAL`. The existing test for whether the comedi channel number needs to be doubled in differential mode is incorrect in `labpc_ai_cmd()`. This patch corrects the test. Thanks to Tuomas for suggesting the fix. Signed-off-by: Ian Abbott Signed-off-by: Willy Tarreau commit 245f69a0948e328dde824b00de7343ffac948365 Author: Marcin Jurkowski Date: Sat Mar 2 14:50:15 2013 +0100 w1: fix oops when w1_search is called from netlink connector commit 9d1817cab2f030f6af360e961cc69bb1da8ad765 upstream. On Sat, Mar 02, 2013 at 10:45:10AM +0100, Sven Geggus wrote: > This is the bad commit I found doing git bisect: > 04f482faf50535229a5a5c8d629cf963899f857c is the first bad commit > commit 04f482faf50535229a5a5c8d629cf963899f857c > Author: Patrick McHardy > Date: Mon Mar 28 08:39:36 2011 +0000 Good job. I was too lazy to bisect for bad commit;) Reading the code I found problematic kthread_should_stop call from netlink connector which causes the oops. After applying a patch, I've been testing owfs+w1 setup for nearly two days and it seems to work very reliable (no hangs, no memleaks etc). More detailed description and possible fix is given below: Function w1_search can be called from either kthread or netlink callback. While the former works fine, the latter causes oops due to kthread_should_stop invocation. This patch adds a check if w1_search is serving netlink command, skipping kthread_should_stop invocation if so. Signed-off-by: Marcin Jurkowski Acked-by: Evgeniy Polyakov Cc: Josh Boyer Tested-by: Sven Geggus Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 9eeeda586fae4eeb7af6a52a6cbb66051624bc48 Author: Chen Gang Date: Thu May 16 14:04:25 2013 -0500 drivers/char/ipmi: memcpy, need additional 2 bytes to avoid memory overflow commit a5f2b3d6a738e7d4180012fe7b541172f8c8dcea upstream. When calling memcpy, read_data and write_data need additional 2 bytes. write_data: for checking: "if (size > IPMI_MAX_MSG_LENGTH)" for operating: "memcpy(bt->write_data + 3, data + 1, size - 1)" read_data: for checking: "if (msg_len < 3 || msg_len > IPMI_MAX_MSG_LENGTH)" for operating: "memcpy(data + 2, bt->read_data + 4, msg_len - 2)" Signed-off-by: Chen Gang Signed-off-by: Corey Minyard Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d9cbbc3f7be22f5c2e60524827644cd0316d5fe5 Author: Jiri Slaby Date: Sun Jun 5 22:51:49 2011 +0200 serial: 8250, increase PASS_LIMIT With virtual machines like qemu, it's pretty common to see "too much work for irq4" messages nowadays. This happens when a bunch of output is printed on the emulated serial console. This is caused by too low PASS_LIMIT. When ISR loops more than the limit, it spits the message. I've been using a kernel with doubled the limit and I couldn't see no problems. Maybe it's time to get rid of the message now? [2.6.32: background info from Ram Gupta] > I need a patch for serial driver that increases PASS_LIMIT merged in > 3.1. I am using 2.6.32 kernel which experiences kernel panic > occasionally. It will be great if you can backport to 2.6.32 and 3.0 > kernel. The commit ID is e7328ae1 serial: 8250, increase PASS_LIMIT Signed-off-by: Jiri Slaby Cc: Alan Cox Signed-off-by: Greg Kroah-Hartman (cherry picked from commit e7328ae1848966181a7ac47e8ae6cddbd2cf55f3) Cc: Ram Gupta Signed-off-by: Willy Tarreau commit 0f6bacee381f71861b3a7d8849d097d0f4ac7343 Author: Al Viro Date: Wed Nov 21 19:27:23 2012 +0000 PARISC: fix user-triggerable panic on parisc commit 441a179dafc0f99fc8b3a8268eef66958621082e upstream. int sys32_rt_sigprocmask(int how, compat_sigset_t __user *set, compat_sigset_t __user *oset, unsigned int sigsetsize) { sigset_t old_set, new_set; int ret; if (set && get_sigset32(set, &new_set, sigsetsize)) ... static int get_sigset32(compat_sigset_t __user *up, sigset_t *set, size_t sz) { compat_sigset_t s; int r; if (sz != sizeof *set) panic("put_sigset32()"); In other words, rt_sigprocmask(69, (void *)69, 69) done by 32bit process will promptly panic the box. Signed-off-by: Al Viro Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 05cd028a8ba98802239b06bbec63fb8777f58712 Author: Jay Estabrook Date: Sun Apr 7 21:36:09 2013 +1200 alpha: Add irongate_io to PCI bus resources commit aa8b4be3ac049c8b1df2a87e4d1d902ccfc1f7a9 upstream. Fixes a NULL pointer dereference at boot on UP1500. Reviewed-and-Tested-by: Matt Turner Signed-off-by: Jay Estabrook Signed-off-by: Matt Turner Signed-off-by: Michael Cree Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 711a06c6d59c33b6441afabb2f7a84453b16b267 Author: Konrad Rzeszutek Wilk Date: Wed Jan 16 23:40:07 2013 +0100 ACPI / cpuidle: Fix NULL pointer issues when cpuidle is disabled commit b88a634a903d9670aa5f2f785aa890628ce0dece upstream. If cpuidle is disabled, that means that: per_cpu(acpi_cpuidle_device, pr->id) is set to NULL as the acpi_processor_power_init ends up failing at retval = cpuidle_register_driver(&acpi_idle_driver) (in acpi_processor_power_init) and never sets the per_cpu idle device. So when acpi_processor_hotplug on CPU online notification tries to reference said device it crashes: cpu 3 spinlock event irq 62 BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [] acpi_processor_setup_cpuidle_cx+0x3f/0x105 PGD a259b067 PUD ab38b067 PMD 0 Oops: 0002 [#1] SMP odules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi libcrc32c crc32c nouveau mxm_wmi wmi radeon ttm sg sr_mod sd_mod cdrom ata_generic ata_piix libata crc32c_intel scsi_mod atl1c i915 fbcon tileblit font bitblit softcursor drm_kms_helper video xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs xen_privcmd mperf CPU 1 Pid: 3047, comm: bash Not tainted 3.8.0-rc3upstream-00250-g165c029 #1 MSI MS-7680/H61M-P23 (MS-7680) RIP: e030:[] [] acpi_processor_setup_cpuidle_cx+0x3f/0x105 RSP: e02b:ffff88001742dca8 EFLAGS: 00010202 RAX: 0000000000010be9 RBX: ffff8800a0a61800 RCX: ffff880105380000 RDX: 0000000000000003 RSI: 0000000000000200 RDI: ffff8800a0a61800 RBP: ffff88001742dce8 R08: ffffffff81812360 R09: 0000000000000200 R10: aaaaaaaaaaaaaaaa R11: 0000000000000001 R12: ffff8800a0a61800 R13: 00000000ffffff01 R14: 0000000000000000 R15: ffffffff81a907a0 FS: 00007fd6942f7700(0000) GS:ffff880105280000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000004 CR3: 00000000a6773000 CR4: 0000000000042660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 3047, threadinfo ffff88001742c000, task ffff880017944000) Stack: 0000000000000150 ffff880100f59e00 ffff88001742dcd8 ffff8800a0a61800 0000000000000000 00000000ffffff01 0000000000000000 ffffffff81a907a0 ffff88001742dd18 ffffffff813815b1 ffff88001742dd08 ffffffff810ae336 Call Trace: [] acpi_processor_hotplug+0x7c/0x9f [] ? schedule_delayed_work_on+0x16/0x20 [] acpi_cpu_soft_notify+0x90/0xca [] notifier_call_chain+0x4d/0x70 [] __raw_notifier_call_chain+0x9/0x10 [] __cpu_notify+0x1b/0x30 [] _cpu_up+0x103/0x14b [] cpu_up+0xd9/0xec [] store_online+0x94/0xd0 [] dev_attr_store+0x1b/0x20 [] sysfs_write_file+0xf4/0x170 This patch fixes it. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a4d30bc1b46b335b98c50ec2f900e4340cc029f2 Author: Andi Kleen Date: Fri Nov 19 13:16:22 2010 +0100 MCE: Fix vm86 handling for 32bit mce handler commit a129a7c84582629741e5fa6f40026efcd7a65bd4 upstream. When running on 32bit the mce handler could misinterpret vm86 mode as ring 0. This can affect whether it does recovery or not; it was possible to panic when recovery was actually possible. Fix this by always forcing vm86 to look like ring 3. [ Backport to 3.0 notes: Things changed there slightly: - move mce_get_rip() up. It fills up m->cs and m->ip values which are evaluated in mce_severity(). Therefore move it up right before the mce_severity call. This seem to be another bug in 3.0? - Place the backport (fix m->cs in V86 case) to where m->cs gets filled which is mce_get_rip() in 3.0 ] Signed-off-by: Andi Kleen Signed-off-by: Tony Luck Signed-off-by: Thomas Renninger Reviewed-by: Tony Luck Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 228b7542cc253d63faa60a18034b4ace5cbacf91 Author: Petr Matousek Date: Tue Nov 6 19:24:07 2012 +0100 KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461) commit 6d1068b3a98519247d8ba4ec85cd40ac136dbdf9 upstream. On hosts without the XSAVE support unprivileged local user can trigger oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN ioctl. invalid opcode: 0000 [#2] SMP Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables ... Pid: 24935, comm: zoog_kvm_monito Tainted: G D 3.2.0-3-686-pae EIP: 0060:[] EFLAGS: 00210246 CPU: 0 EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0 task.ti=d7c62000) Stack: 00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000 ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0 c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80 Call Trace: [] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm] ... [] ? syscall_call+0x7/0xb Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74 1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01 d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89 EIP: [] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP 0068:d7c63e70 QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID and then sets them later. So guest's X86_FEATURE_XSAVE should be masked out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with X86_FEATURE_XSAVE even on hosts that do not support it, might be susceptible to this attack from inside the guest as well. Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support. Signed-off-by: Petr Matousek Signed-off-by: Marcelo Tosatti [bwh: Backported to 2.6.32: XSAVE is not supported at all, so always deny setting OSXSAVE] Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau commit d277a2de4f7fb7f346578cf8cf28cebda2960345 Author: Andy Honig Date: Wed Feb 20 14:49:16 2013 -0800 KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798) commit a2c118bfab8bc6b8bb213abfc35201e441693d55 upstream. If the guest specifies a IOAPIC_REG_SELECT with an invalid value and follows that with a read of the IOAPIC_REG_WINDOW KVM does not properly validate that request. ioapic_read_indirect contains an ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in non-debug builds. In recent kernels this allows a guest to cause a kernel oops by reading invalid memory. In older kernels (pre-3.3) this allows a guest to read from large ranges of host memory. Tested: tested against apic unit tests. Signed-off-by: Andrew Honig Signed-off-by: Marcelo Tosatti Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit aa184a862d1617273bb56da292a923dbfcc1486c Author: Marcelo Tosatti Date: Fri Mar 22 19:14:07 2013 +0000 KVM: x86: relax MSR_KVM_SYSTEM_TIME alignment check This was fixed by commit 8f964525a121f2ff2df948dac908dcc65be21b5b upstream. This alternate fix avoids the need for extensive backporting. RHEL5 i386 guests register non 32-byte aligned addresses: kvm-clock: cpu 1, msr 0:3018aa5, secondary cpu clock kvm-clock: cpu 2, msr 0:301f8e9, secondary cpu clock kvm-clock: cpu 3, msr 0:302672d, secondary cpu clock Check for an address+len that would cross page boundary instead. Signed-off-by: Marcelo Tosatti [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 1a33aafff4febd7d3b0c2e27f1c2c7c5b5e08bd1 Author: Andy Honig Date: Mon Mar 11 09:34:52 2013 -0700 KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) commit c300aa64ddf57d9c5d9c898a64b36877345dd4a9 upstream. If the guest sets the GPA of the time_page so that the request to update the time straddles a page then KVM will write onto an incorrect page. The write is done byusing kmap atomic to get a pointer to the page for the time structure and then performing a memcpy to that page starting at an offset that the guest controls. Well behaved guests always provide a 32-byte aligned address, however a malicious guest could use this to corrupt host kernel memory. Tested: Tested against kvmclock unit test. Signed-off-by: Andrew Honig Signed-off-by: Marcelo Tosatti Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 13fbb87535cf44e1cc06c4dcdd0469bf209571a0 Author: Konrad Rzeszutek Wilk Date: Wed Oct 10 13:25:48 2012 -0400 xen/bootup: allow {read|write}_cr8 pvops call. commit 1a7bbda5b1ab0e02622761305a32dc38735b90b2 upstream. We actually do not do anything about it. Just return a default value of zero and if the kernel tries to write anything but 0 we BUG_ON. This fixes the case when an user tries to suspend the machine and it blows up in save_processor_state b/c 'read_cr8' is set to NULL and we get: kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100! invalid opcode: 0000 [#1] SMP Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs RIP: e030:[] [] save_processor_state+0x212/0x270 .. snip.. Call Trace: [] do_suspend_lowlevel+0xf/0xac [] ? x86_acpi_suspend_lowlevel+0x10c/0x150 [] acpi_suspend_enter+0x57/0xd5 Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e73a5ba6f155c8bb631e0d62b6fd7c4ed3660c07 Author: Konrad Rzeszutek Wilk Date: Wed Oct 10 13:30:47 2012 -0400 xen/bootup: allow read_tscp call for Xen PV guests. commit cd0608e71e9757f4dae35bcfb4e88f4d1a03a8ab upstream. The hypervisor will trap it. However without this patch, we would crash as the .read_tscp is set to NULL. This patch fixes it and sets it to the native_read_tscp call. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 2ed9527f11b43c3ec112cd88f6c46b4a4b470250 Author: Samu Kallio Date: Sat Mar 23 09:36:35 2013 -0400 x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates commit 1160c2779b826c6f5c08e5cc542de58fd1f667d5 upstream. In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops when lazy MMU updates are enabled, because set_pgd effects are being deferred. One instance of this problem is during process mm cleanup with memory cgroups enabled. The chain of events is as follows: - zap_pte_range enables lazy MMU updates - zap_pte_range eventually calls mem_cgroup_charge_statistics, which accesses the vmalloc'd mem_cgroup per-cpu stat area - vmalloc_fault is triggered which tries to sync the corresponding PGD entry with set_pgd, but the update is deferred - vmalloc_fault oopses due to a mismatch in the PUD entries The OOPs usually looks as so: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/fault.c:396! invalid opcode: 0000 [#1] SMP .. snip .. CPU 1 Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1 RIP: e030:[] [] vmalloc_fault+0x11f/0x208 .. snip .. Call Trace: [] do_page_fault+0x399/0x4b0 [] ? xen_mc_extend_args+0xec/0x110 [] page_fault+0x25/0x30 [] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50 [] __mem_cgroup_uncharge_common+0xd8/0x350 [] mem_cgroup_uncharge_page+0x57/0x60 [] page_remove_rmap+0xe0/0x150 [] ? vm_normal_page+0x1a/0x80 [] unmap_single_vma+0x531/0x870 [] unmap_vmas+0x52/0xa0 [] ? pte_mfn_to_pfn+0x72/0x100 [] exit_mmap+0x98/0x170 [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e [] mmput+0x83/0xf0 [] exit_mm+0x104/0x130 [] do_exit+0x15a/0x8c0 [] do_group_exit+0x3f/0xa0 [] sys_exit_group+0x17/0x20 [] system_call_fastpath+0x16/0x1b Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the changes visible to the consistency checks. RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737 Tested-by: Josh Boyer Reported-and-Tested-by: Krishna Raman Signed-off-by: Samu Kallio Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com Tested-by: Konrad Rzeszutek Wilk Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 49beb96cacf64f7d4ac956348f7d19bbab7acfe4 Author: Mel Gorman Date: Mon Feb 11 14:52:36 2013 +0000 x86/mm: Check if PUD is large when validating a kernel address commit 0ee364eb316348ddf3e0dfcd986f5f13f528f821 upstream. A user reported the following oops when a backup process reads /proc/kcore: BUG: unable to handle kernel paging request at ffffbb00ff33b000 IP: [] kern_addr_valid+0xbe/0x110 [...] Call Trace: [] read_kcore+0x17a/0x370 [] proc_reg_read+0x77/0xc0 [] vfs_read+0xc7/0x130 [] sys_read+0x53/0xa0 [] system_call_fastpath+0x16/0x1b Investigation determined that the bug triggered when reading system RAM at the 4G mark. On this system, that was the first address using 1G pages for the virt->phys direct mapping so the PUD is pointing to a physical address, not a PMD page. The problem is that the page table walker in kern_addr_valid() is not checking pud_large() and treats the physical address as if it was a PMD. If it happens to look like pmd_none then it'll silently fail, probably returning zeros instead of real data. If the data happens to look like a present PMD though, it will be walked resulting in the oops above. This patch adds the necessary pud_large() check. Unfortunately the problem was not readily reproducible and now they are running the backup program without accessing /proc/kcore so the patch has not been validated but I think it makes sense. Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Reviewed-by: Michal Hocko Acked-by: Johannes Weiner Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 3dcf19f3a879bf8b77970abc60e88a8e62af4afe Author: Alan Cox Date: Thu Nov 15 13:06:22 2012 +0000 x86/msr: Add capabilities check commit c903f0456bc69176912dee6dd25c6a66ee1aed00 upstream At the moment the MSR driver only relies upon file system checks. This means that anything as root with any capability set can write to MSRs. Historically that wasn't very interesting but on modern processors the MSRs are such that writing to them provides several ways to execute arbitary code in kernel space. Sample code and documentation on doing this is circulating and MSR attacks are used on Windows 64bit rootkits already. In the Linux case you still need to be able to open the device file so the impact is fairly limited and reduces the security of some capability and security model based systems down towards that of a generic "root owns the box" setup. Therefore they should require CAP_SYS_RAWIO to prevent an elevation of capabilities. The impact of this is fairly minimal on most setups because they don't have heavy use of capabilities. Those using SELinux, SMACK or AppArmor rules might want to consider if their rulesets on the MSR driver could be tighter. Signed-off-by: Alan Cox Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra Cc: Horses Signed-off-by: Ingo Molnar [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 783defce6e23f3d387c14af72c3eb009d28f00bf Author: Jan Beulich Date: Thu Jan 24 13:11:10 2013 +0000 x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS. commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream. This fixes CVE-2013-0228 / XSA-42 Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user in 32bit PV guest can use to crash the > guest with the panic like this: ------------- general protection fault: 0000 [#1] SMP last sysfs file: /sys/devices/vbd-51712/block/xvda/dev Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1 EIP: 0061:[] EFLAGS: 00010086 CPU: 0 EIP is at xen_iret+0x12/0x2b EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010 ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069 Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000) Stack: 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000 Call Trace: Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00 8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40 10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02 EIP: [] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0 general protection fault: 0000 [#2] ---[ end trace ab0d29a492dcd330 ]--- Kernel panic - not syncing: Fatal exception Pid: 1250, comm: r Tainted: G D --------------- 2.6.32-356.el6.i686 #1 Call Trace: [] ? panic+0x6e/0x122 [] ? oops_end+0xbc/0xd0 [] ? do_general_protection+0x0/0x210 [] ? error_code+0x73/ ------------- Petr says: " I've analysed the bug and I think that xen_iret() cannot cope with mangled DS, in this case zeroed out (null selector/descriptor) by either xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT entry was invalidated by the reproducer. " Jan took a look at the preliminary patch and came up a fix that solves this problem: "This code gets called after all registers other than those handled by IRET got already restored, hence a null selector in %ds or a non-null one that got loaded from a code or read-only data descriptor would cause a kernel mode fault (with the potential of crashing the kernel as a whole, if panic_on_oops is set)." The way to fix this is to realize that the we can only relay on the registers that IRET restores. The two that are guaranteed are the %cs and %ss as they are always fixed GDT selectors. Also they are inaccessible from user mode - so they cannot be altered. This is the approach taken in this patch. Another alternative option suggested by Jan would be to relay on the subtle realization that using the %ebp or %esp relative references uses the %ss segment. In which case we could switch from using %eax to %ebp and would not need the %ss over-rides. That would also require one extra instruction to compensate for the one place where the register is used as scaled index. However Andrew pointed out that is too subtle and if further work was to be done in this code-path it could escape folks attention and lead to accidents. Reviewed-by: Petr Matousek Reported-by: Petr Matousek Reviewed-by: Andrew Cooper Signed-off-by: Jan Beulich Signed-off-by: Konrad Rzeszutek Wilk [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 119274d60ecf507189679f18e478ef682183d95c Author: Romain Francoise Date: Wed Jan 23 11:58:36 2013 +0100 x86, random: make ARCH_RANDOM prompt if EMBEDDED, not EXPERT Before v2.6.38 CONFIG_EXPERT was known as CONFIG_EMBEDDED but the Kconfig entry was not changed to match when upstream commit 628c6246d47b85f5357298601df2444d7f4dd3fd ("x86, random: Architectural inlines to get random integers with RDRAND") was backported. Signed-off-by: Romain Francoise Signed-off-by: Willy Tarreau commit d9869cbed12fa2aacd744b86aaff5f826fb7b2bf Author: Matthew Garrett Date: Wed Jul 6 16:52:37 2011 -0400 x86: Don't use the EFI reboot method by default Testing suggests that at least some Lenovos and some Intels will fail to reboot via EFI, attempting to jump to an unmapped physical address. In the long run we could handle this by providing a page table with a 1:1 mapping of physical addresses, but for now it's probably just easier to assume that ACPI or legacy methods will be present and reboot via those. [2.6.32: additional background information from Jonathan below] > > Please consider > > f70e957cda22 x86: Don't use the EFI reboot method by default, > 2011-07-06 > > for application to the 2.6.32.y and 2.6.34.y trees. The patch was > applied upstream late in the 3.0 cycle, so newer kernels don't need > it. > > In 2011, Keith Ward wrote[1]: > > > When attempting to reboot my my UEFI enabled system, the system hangs when > > calling reboot requiring me to manually reset the system via the reset switch. > > > > Screenshot: http://twitgoo.com/29bq1c > > Ben Hutchings writes[1]: > > > Version: 3.0.0-1 > > > > I also had this problem on my own system, but it is fixed now. > > I bisected the fix to: > > > > commit f70e957cda22d309c769805cbb932407a5232219 > > Author: Matthew Garrett > > Date: Wed Jul 6 16:52:37 2011 -0400 > > > > x86: Don't use the EFI reboot method by default > > > > which is basically equivalent to the workaround! > > > > I'll also apply this fix to squeeze as it's so simple. > > Keith Ward also wrote[1]: > > > It seems as if this has recently been reported at Ubuntu's Launchpad as well: > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/721576 > > There are a variety of reports of the same panic at that bug on > 2.6.32.y-, 2.6.38.y-, and 2.6.39-based kernels. Passing "reboot=a,w" > on the kernel command line avoids trouble for reporters. Signed-off-by: Matthew Garrett Cc: Linus Torvalds Cc: Andrew Morton Cc: Alan Cox Link: http://lkml.kernel.org/r/1309985557-15350-1-git-send-email-mjg@redhat.com Signed-off-by: Ingo Molnar (cherry picked from commit f70e957cda22d309c769805cbb932407a5232219) Cc: Jonathan Nieder Cc: Ben Hutchings Signed-off-by: Willy Tarreau commit dadd486a2f6db966992f5100eaf497ac99264835 Author: Suresh Siddha Date: Wed Oct 24 15:24:30 2012 -0700 x86, ioapic: initialize nr_ioapic_registers early in mp_register_ioapic() Lin Bao reported that one of the HP platforms failed to boot 2.6.32 kernel, when the BIOS enabled interrupt-remapping and x2apic before handing over the control to the Linux kernel. During boot, Linux kernel masks all the interrupt sources (8259, IO-APIC RTE's), setup the interrupt-remapping hardware with the OS controlled table and unmasks the 8259 interrupts but not the IO-APIC RTE's (as the newly setup interrupt-remapping table and the IO-APIC RTE's are not yet programmed by the kernel). Shortly after this, IO-APIC RTE's and the interrupt-remapping table entries are programmed based on the ACPI tables etc. So the expectation is that any interrupt during this window will be dropped and not see the intermediate configuration. In the reported problematic case, BIOS has configured the IO-APIC in virtual wire-B mode. Between the window of the kernel setting up new interrupt-remapping table and the IO-APIC RTE's are properly configured, an interrupt gets routed by the IO-APIC RTE (setup by the virtual wire-B configuration) and sees the empty interrupt-remapping table entry, resulting in vt-d fault causing the platform to generate NMI. And the OS panics on this unexpected NMI. This problem doesn't happen with more recent kernels and closer look at the 2.6.32 kernel shows that the code which masks the IO-APIC RTE's is not working as expected as the nr_ioapic_registers for each IO-APIC is not yet initialized at this point. In the later kernels we initialize nr_ioapic_registers much before and everything works as expected. For 2.6.[32..34] kernels, fix this issue by initializing nr_ioapic_registers early in mp_register_ioapic() [ Relevant upstream commit info: commit 7716a5c4ff5f1f3dc5e9edcab125cbf7fceef0af Author: Eric W. Biederman Date: Tue Mar 30 01:07:12 2010 -0700 x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic. As the upstream commit depends on quite a few prior commits and some followup fixes in the mainline, we just picked the smallest relevant hunk for fixing the issue at hand. Problematic platform uses ACPI for IO-APIC, VT-d enumeration etc and this hunk only touches the ACPI based platforms. nr_ioapic_reigsters initialization in enable_IO_APIC() is still retained, so that other configurations like legacy MPS table based enumeration etc works with no change. ] Reported-and-tested-by: Zhang, Lin-Bao Signed-off-by: Suresh Siddha Cc: stable@vger.kernel.org Reviewed-by: Jonathan Nieder Acked-by: "Eric W. Biederman" Signed-off-by: Willy Tarreau commit cda9add3150b19b35e456250cbb164ec31474d5a Author: Takashi Iwai Date: Thu Oct 11 16:43:40 2012 +0200 ALSA: ac97 - Fix missing NULL check in snd_ac97_cvol_new() commit 733a48e5ae5bf28b046fad984d458c747cbb8c21 upstream. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44721 Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 402adfdd8f83dd9b9ee8417db32ca508d2a2c65f Author: Takashi Iwai Date: Fri Mar 8 18:11:17 2013 +0100 ALSA: seq: Fix missing error handling in snd_seq_timer_open() commit 66efdc71d95887b652a742a5dae51fa834d71465 upstream. snd_seq_timer_open() didn't catch the whole error path but let through if the timer id is a slave. This may lead to Oops by accessing the uninitialized pointer. BUG: unable to handle kernel NULL pointer dereference at 00000000000002ae IP: [] snd_seq_timer_open+0xe7/0x130 PGD 785cd067 PUD 76964067 PMD 0 Oops: 0002 [#4] SMP CPU 0 Pid: 4288, comm: trinity-child7 Tainted: G D W 3.9.0-rc1+ #100 Bochs Bochs RIP: 0010:[] [] snd_seq_timer_open+0xe7/0x130 RSP: 0018:ffff88006ece7d38 EFLAGS: 00010246 RAX: 0000000000000286 RBX: ffff88007851b400 RCX: 0000000000000000 RDX: 000000000000ffff RSI: ffff88006ece7d58 RDI: ffff88006ece7d38 RBP: ffff88006ece7d98 R08: 000000000000000a R09: 000000000000fffe R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8800792c5400 R14: 0000000000e8f000 R15: 0000000000000007 FS: 00007f7aaa650700(0000) GS:ffff88007f800000(0000) GS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000002ae CR3: 000000006efec000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process trinity-child7 (pid: 4288, threadinfo ffff88006ece6000, task ffff880076a8a290) Stack: 0000000000000286 ffffffff828f2be0 ffff88006ece7d58 ffffffff810f354d 65636e6575716573 2065756575712072 ffff8800792c0030 0000000000000000 ffff88006ece7d98 ffff8800792c5400 ffff88007851b400 ffff8800792c5520 Call Trace: [] ? trace_hardirqs_on+0xd/0x10 [] snd_seq_queue_timer_open+0x29/0x70 [] snd_seq_ioctl_set_queue_timer+0xda/0x120 [] snd_seq_do_ioctl+0x9b/0xd0 [] snd_seq_ioctl+0x10/0x20 [] do_vfs_ioctl+0x522/0x570 [] ? file_has_perm+0x83/0xa0 [] ? trace_hardirqs_on+0xd/0x10 [] sys_ioctl+0x5d/0xa0 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] system_call_fastpath+0x16/0x1b Reported-and-tested-by: Tommi Rantala Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0b6c9f057c24cfb1543c55c378a7e3ecb61716d9 Author: Takashi Iwai Date: Tue Dec 1 12:19:37 2009 +0100 ALSA: hda - Add a pin-fix for FSC Amilo Pi1505 FSC Amilo Pi 1505 has a buggy BIOS and doesn't set up the HP and speaker pins properly. Add the pinfix entry for that. Reference: Novell bnc#557403 https://bugzilla.novell.com/show_bug.cgi?id=557403 [2.6.32: additional background from Jonathan below] > Hi Willy, > > Please consider > > cfc9b06f0bef ALSA: hda - Add a pin-fix for FSC Amilo Pi1505 > > for application to the 2.6.32.y tree. Without this patch, the Amilo > Pi 1505's internal speaker is silent unless a jack is plugged into its > headphone jack. > > Jose Manuel Castroagudin noticed[1] that 2.6.30 is not affected, so > this seems to be a regression. > > The patch was applied upstream during the 2.6.33 merge window, where > it worked. That said, I didn't manage to track down anyone with a > Pi1505 to test it against 2.6.32, so thoughts from alsa folks on > whether this is appropriate for 2.6.32.y would be useful. > > Hope that helps, > Jonathan > > [1] http://bugs.debian.org/599582 has many more details. Signed-off-by: Takashi Iwai (cherry picked from commit cfc9b06f0befe50ef02253f72b76946363549031) Cc: Jonathan Nieder Signed-off-by: Willy Tarreau commit da8aad73d6ae5936483ee9b5dfc100be8bfadbc5 Author: Kailang Yang Date: Thu Dec 17 12:23:00 2009 +0100 ALSA: hda - More ALC663 fixes and support of compatible chips commit ebb83eeb6469bedda83b4dc6f23ddf93eb32b347 upstream. 1. Add more ASUS NB model. 2. Fixed alc663_m51va_setup M51VA has Digital Mic that NID is 0x12. The record source index is 0x9 for ALC663. So, to modify the alc663_m51va_setup function to index 0x9 and add analog Mic aupport function alc663_mode1_setup. 3. Add ASUS mode7 and mode8 modules for ALC663 [jn: backport to 2.6.32.y to address http://bugs.debian.org/688564] Signed-off-by: Kailang Yang Signed-off-by: Takashi Iwai Signed-off-by: Jonathan Nieder Tested-by: Bill Allombert # Vaio w/ ALC275 Signed-off-by: Willy Tarreau commit aaa1bc4772e1a66ac5738fad439586c35d56c25e Author: Mel Gorman Date: Mon Oct 8 16:29:17 2012 -0700 mempolicy: fix a race in shared_policy_replace() commit b22d127a39ddd10d93deee3d96e643657ad53a49 upstream. shared_policy_replace() use of sp_alloc() is unsafe. 1) sp_node cannot be dereferenced if sp->lock is not held and 2) another thread can modify sp_node between spin_unlock for allocating a new sp node and next spin_lock. The bug was introduced before 2.6.12-rc2. Kosaki's original patch for this problem was to allocate an sp node and policy within shared_policy_replace and initialise it when the lock is reacquired. I was not keen on this approach because it partially duplicates sp_alloc(). As the paths were sp->lock is taken are not that performance critical this patch converts sp->lock to sp->mutex so it can sleep when calling sp_alloc(). [kosaki.motohiro@jp.fujitsu.com: Original patch] Signed-off-by: Mel Gorman Acked-by: KOSAKI Motohiro Reviewed-by: Christoph Lameter Cc: Josh Boyer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit bc68657ff7acd52c300fbfadddbab8069f1b7b4f Author: Hugh Dickins Date: Mon Oct 8 16:33:14 2012 -0700 mm: fix invalidate_complete_page2() lock ordering commit ec4d9f626d5908b6052c2973f37992f1db52e967 upstream. In fuzzing with trinity, lockdep protested "possible irq lock inversion dependency detected" when isolate_lru_page() reenabled interrupts while still holding the supposedly irq-safe tree_lock: invalidate_inode_pages2 invalidate_complete_page2 spin_lock_irq(&mapping->tree_lock) clear_page_mlock isolate_lru_page spin_unlock_irq(&zone->lru_lock) isolate_lru_page() is correct to enable interrupts unconditionally: invalidate_complete_page2() is incorrect to call clear_page_mlock() while holding tree_lock, which is supposed to nest inside lru_lock. Both truncate_complete_page() and invalidate_complete_page() call clear_page_mlock() before taking tree_lock to remove page from radix_tree. I guess invalidate_complete_page2() preferred to test PageDirty (again) under tree_lock before committing to the munlock; but since the page has already been unmapped, its state is already somewhat inconsistent, and no worse if clear_page_mlock() moved up. Reported-by: Sasha Levin Deciphered-by: Andrew Morton Signed-off-by: Hugh Dickins Acked-by: Mel Gorman Cc: Rik van Riel Cc: Johannes Weiner Cc: Michel Lespinasse Cc: Ying Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 61daa923b480d1bb6f2340d8bca26b6503723d41 Author: Takamori Yamaguchi Date: Thu Nov 8 15:53:39 2012 -0800 mm: bugfix: set current->reclaim_state to NULL while returning from kswapd() commit b0a8cc58e6b9aaae3045752059e5e6260c0b94bc upstream. In kswapd(), set current->reclaim_state to NULL before returning, as current->reclaim_state holds reference to variable on kswapd()'s stack. In rare cases, while returning from kswapd() during memory offlining, __free_slab() and freepages() can access the dangling pointer of current->reclaim_state. Signed-off-by: Takamori Yamaguchi Signed-off-by: Aaditya Kumar Acked-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 6d9f42bed29dd5e643d392b5e038cb0c12c96678 Author: Christoffer Dall Date: Fri Dec 21 13:03:50 2012 -0500 mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED commit ad4b3fb7ff9940bcdb1e4cd62bd189d10fa636ba upstream. Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and (PageHead) is true, for tail pages. If this is indeed the intended behavior, which I doubt because it breaks cache cleaning on some ARM systems, then the nomenclature is highly problematic. This patch makes sure PageHead is only true for head pages and PageTail is only true for tail pages, and neither is true for non-compound pages. [ This buglet seems ancient - seems to have been introduced back in Apr 2008 in commit 6a1e7f777f61: "pageflags: convert to the use of new macros". And the reason nobody noticed is because the PageHead() tests are almost all about just sanity-checking, and only used on pages that are actual page heads. The fact that the old code returned true for tail pages too was thus not really noticeable. - Linus ] Signed-off-by: Christoffer Dall Acked-by: Andrea Arcangeli Cc: Andrew Morton Cc: Will Deacon Cc: Steve Capper Cc: Christoph Lameter Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 6b36635bfe86eb1bd5c9ee1b1b6d343349ea9942 Author: Dave Hansen Date: Wed May 30 07:51:07 2012 -0700 mm: fix vma_resv_map() NULL pointer commit 4523e1458566a0e8ecfaff90f380dd23acc44d27 upstream hugetlb_reserve_pages() can be used for either normal file-backed hugetlbfs mappings, or MAP_HUGETLB. In the MAP_HUGETLB, semi-anonymous mode, there is not a VMA around. The new call to resv_map_put() assumed that there was, and resulted in a NULL pointer dereference: BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 IP: vma_resv_map+0x9/0x30 PGD 141453067 PUD 1421e1067 PMD 0 Oops: 0000 [#1] PREEMPT SMP ... Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ #36 RIP: vma_resv_map+0x9/0x30 ... Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0) Call Trace: resv_map_put+0xe/0x40 hugetlb_reserve_pages+0xa6/0x1d0 hugetlb_file_setup+0x102/0x2c0 newseg+0x115/0x360 ipcget+0x1ce/0x310 sys_shmget+0x5a/0x60 system_call_fastpath+0x16/0x1b This was reported by Dave Jones, but was reproducible with the libhugetlbfs test cases, so shame on me for not running them in the first place. With this, the oops is gone, and the output of libhugetlbfs's run_tests.py is identical to plain 3.4 again. [ Marked for stable, since this was introduced by commit c50ac050811d ("hugetlb: fix resv_map leak in error path") which was also marked for stable ] Reported-by: Dave Jones Cc: Mel Gorman Cc: KOSAKI Motohiro Cc: Christoph Lameter Cc: Andrea Arcangeli Cc: Andrew Morton Cc: Signed-off-by: Linus Torvalds [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit bc89fc4aa9a77e1a3411a514f9d7e7c17317c239 Author: Dave Hansen Date: Tue May 29 15:06:46 2012 -0700 hugetlb: fix resv_map leak in error path commit c50ac050811d6485616a193eb0f37bfbd191cc89 upstream When called for anonymous (non-shared) mappings, hugetlb_reserve_pages() does a resv_map_alloc(). It depends on code in hugetlbfs's vm_ops->close() to release that allocation. However, in the mmap() failure path, we do a plain unmap_region() without the remove_vma() which actually calls vm_ops->close(). This is a decent fix. This leak could get reintroduced if new code (say, after hugetlb_reserve_pages() in hugetlbfs_file_mmap()) decides to return an error. But, I think it would have to unroll the reservation anyway. Christoph's test case: http://marc.info/?l=linux-mm&m=133728900729735 This patch applies to 3.4 and later. A version for earlier kernels is at https://lkml.org/lkml/2012/5/22/418. Signed-off-by: Dave Hansen Acked-by: Mel Gorman Acked-by: KOSAKI Motohiro Reported-by: Christoph Lameter Tested-by: Christoph Lameter Cc: Andrea Arcangeli Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 83c84290dc4a61a8898967ea99c2ac44f3fea744 Author: Namhyung Kim Date: Mon Apr 1 21:46:23 2013 +0900 tracing: Fix double free when function profile init failed commit 83e03b3fe4daffdebbb42151d5410d730ae50bd1 upstream. On the failure path, stat->start and stat->pages will refer same page. So it'll attempt to free the same page again and get kernel panic. Link: http://lkml.kernel.org/r/1364820385-32027-1-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim Cc: Frederic Weisbecker Cc: Namhyung Kim Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a2cefc7b084718d81b4ab765e948fb77993a9f5e Author: Wen Congyang Date: Thu Sep 20 14:04:47 2012 +0800 tracing: Don't call page_to_pfn() if page is NULL commit 85f2a2ef1d0ab99523e0b947a2b723f5650ed6aa upstream. When allocating memory fails, page is NULL. page_to_pfn() will cause the kernel panicked if we don't use sparsemem vmemmap. Link: http://lkml.kernel.org/r/505AB1FF.8020104@cn.fujitsu.com Acked-by: Mel Gorman Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Andrew Morton Reviewed-by: Minchan Kim Signed-off-by: Wen Congyang Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 48bac8099a1b28e7d400d89c067e64ea5e196ef0 Author: Li Zhong Date: Tue Jul 24 15:02:49 2012 -0700 Fix a dead loop in async_synchronize_full() [Fixed upstream by commits 2955b47d2c1983998a8c5915cb96884e67f7cb53 and a4683487f90bfe3049686fc5c566bdc1ad03ace6 from Dan Williams, but they are much more intrusive than this tiny fix, according to Andrew - gregkh] This patch tries to fix a dead loop in async_synchronize_full(), which could be seen when preemption is disabled on a single cpu machine. void async_synchronize_full(void) { do { async_synchronize_cookie(next_cookie); } while (!list_empty(&async_running) || ! list_empty(&async_pending)); } async_synchronize_cookie() calls async_synchronize_cookie_domain() with &async_running as the default domain to synchronize. However, there might be some works in the async_pending list from other domains. On a single cpu system, without preemption, there is no chance for the other works to finish, so async_synchronize_full() enters a dead loop. It seems async_synchronize_full() wants to synchronize all entries in all running lists(domains), so maybe we could just check the entry_count to know whether all works are finished. Currently, async_synchronize_cookie_domain() expects a non-NULL running list ( if NULL, there would be NULL pointer dereference ), so maybe a NULL pointer could be used as an indication for the functions to synchronize all works in all domains. Reported-by: Paul E. McKenney Signed-off-by: Li Zhong Tested-by: Paul E. McKenney Tested-by: Christian Kujau Cc: Andrew Morton Cc: Dan Williams Cc: Christian Kujau Cc: Andrew Morton Cc: Cong Wang Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 5e224edf1762931e0259d7d1dcedbbeae4b7d29e Author: Tejun Heo Date: Mon Nov 19 08:13:35 2012 -0800 cgroup: remove incorrect dget/dput() pair in cgroup_create_dir() commit 175431635ec09b1d1bba04979b006b99e8305a83 upstream. cgroup_create_dir() does weird dancing with dentry refcnt. On success, it gets and then puts it achieving nothing. On failure, it puts but there isn't no matching get anywhere leading to the following oops if cgroup_create_file() fails for whatever reason. ------------[ cut here ]------------ kernel BUG at /work/os/work/fs/dcache.c:552! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU 2 Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs RIP: 0010:[] [] dput+0x1dc/0x1e0 RSP: 0018:ffff88001a3ebef8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403 RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58 RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001 R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60 FS: 00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080) Stack: ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000 ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8 ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8 Call Trace: [] done_path_create+0x19/0x50 [] sys_mkdirat+0x59/0x80 [] sys_mkdir+0x19/0x20 [] system_call_fastpath+0x16/0x1b Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 RIP [] dput+0x1dc/0x1e0 RSP ---[ end trace 1277bcfd9561ddb0 ]--- Fix it by dropping the unnecessary dget/dput() pair. Signed-off-by: Tejun Heo Acked-by: Li Zefan Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a21f3aa1d36219b92389d68f585d3bc405ba3e78 Author: Bjorn Helgaas Date: Tue Jan 29 16:44:27 2013 -0700 Driver core: treat unregistered bus_types as having no devices commit 4fa3e78be7e985ca814ce2aa0c09cbee404efcf7 upstream. A bus_type has a list of devices (klist_devices), but the list and the subsys_private structure that contains it are not initialized until the bus_type is registered with bus_register(). The panic/reboot path has fixups that look up devices in pci_bus_type. If we panic before registering pci_bus_type, the bus_type exists but the list does not, so mach_reboot_fixups() trips over a null pointer and panics again: mach_reboot_fixups pci_get_device .. bus_find_device(&pci_bus_type, ...) bus->p is NULL Joonsoo reported a problem when panicking before PCI was initialized. I think this patch should be sufficient to replace the patch he posted here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip reboot_fixups in early boot phase") Reported-by: Joonsoo Kim Signed-off-by: Bjorn Helgaas Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 0cb1546ca0c37fd23d0ba86deff5167a0ff514b4 Author: T Makphaibulchoke Date: Thu Oct 4 17:16:55 2012 -0700 kernel/resource.c: fix stack overflow in __reserve_region_with_split() commit 4965f5667f36a95b41cda6638875bc992bd7d18b upstream. Using a recursive call add a non-conflicting region in __reserve_region_with_split() could result in a stack overflow in the case that the recursive calls are too deep. Convert the recursive calls to an iterative loop to avoid the problem. Tested on a machine containing 135 regions. The kernel no longer panicked with stack overflow. Also tested with code arbitrarily adding regions with no conflict, embedding two consecutive conflicts and embedding two non-consecutive conflicts. Signed-off-by: T Makphaibulchoke Reviewed-by: Ram Pai Cc: Paul Gortmaker Cc: Wei Yang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Jiri Slaby Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit b4c006c379f551e0f31d0ef17273e71915d9638d Author: Thadeu Lima de Souza Cascardo Date: Thu Oct 25 13:37:51 2012 -0700 genalloc: stop crashing the system when destroying a pool commit eedce141cd2dad8d0cefc5468ef41898949a7031 upstream. The genalloc code uses the bitmap API from include/linux/bitmap.h and lib/bitmap.c, which is based on long values. Both bitmap_set from lib/bitmap.c and bitmap_set_ll, which is the lockless version from genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in the bitmap. That one uses (1 << bits) - 1, 0b111, if you are setting the first three bits. This means that the API counts from the least significant bits (LSB from now on) to the MSB. The LSB in the first long is bit 0, then. The same works for the lookup functions. The genalloc code uses longs for the bitmap, as it should. In include/linux/genalloc.h, struct gen_pool_chunk has unsigned long bits[0] as its last member. When allocating the struct, genalloc should reserve enough space for the bitmap. This should be a proper number of longs that can fit the amount of bits in the bitmap. However, genalloc allocates an integer number of bytes that fit the amount of bits, but may not be an integer amount of longs. 9 bytes, for example, could be allocated for 70 bits. This is a problem in itself if the Least Significat Bit in a long is in the byte with the largest address, which happens in Big Endian machines. This means genalloc is not allocating the byte in which it will try to set or check for a bit. This may end up in memory corruption, where genalloc will try to set the bits it has not allocated. In fact, genalloc may not set these bits because it may find them already set, because they were not zeroed since they were not allocated. And that's what causes a BUG when gen_pool_destroy is called and check for any set bits. What really happens is that genalloc uses kmalloc_node with __GFP_ZERO on gen_pool_add_virt. With SLAB and SLUB, this means the whole slab will be cleared, not only the requested bytes. Since struct gen_pool_chunk has a size that is a multiple of 8, and slab sizes are multiples of 8, we get lucky and allocate and clear the right amount of bytes. Hower, this is not the case with SLOB or with older code that did memset after allocating instead of using __GFP_ZERO. So, a simple module as this (running 3.6.0), will cause a crash when rmmod'ed. [root@phantom-lp2 foo]# cat foo.c #include #include #include #include MODULE_LICENSE("GPL"); MODULE_VERSION("0.1"); static struct gen_pool *foo_pool; static __init int foo_init(void) { int ret; foo_pool = gen_pool_create(10, -1); if (!foo_pool) return -ENOMEM; ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1); if (ret) { gen_pool_destroy(foo_pool); return ret; } return 0; } static __exit void foo_exit(void) { gen_pool_destroy(foo_pool); } module_init(foo_init); module_exit(foo_exit); [root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB CONFIG_SLOB=y [root@phantom-lp2 foo]# insmod ./foo.ko [root@phantom-lp2 foo]# rmmod foo ------------[ cut here ]------------ kernel BUG at lib/genalloc.c:243! cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960] pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110 lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110 sp: c0000000bb0e7be0 msr: 8000000000029032 current = 0xc0000000bb0e0000 paca = 0xc000000006d30e00 softe: 0 irq_happened: 0x01 pid = 13044, comm = rmmod kernel BUG at lib/genalloc.c:243! [c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo] [c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290 [c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94 --- Exception: c00 (System Call) at 000000800753d1a0 SP (fffd0b0e640) is in userspace Signed-off-by: Thadeu Lima de Souza Cascardo Cc: Paul Gortmaker Cc: Benjamin Gaignard Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 501563ac4d9083bf1bf98b3d83a98992d798e4ae Author: Steven Rostedt Date: Thu Nov 29 22:31:16 2012 -0500 ring-buffer: Fix race between integrity check and readers commit 54f7be5b831254199522523ccab4c3d954bbf576 upstream. The function rb_check_pages() was added to make sure the ring buffer's pages were sane. This check is done when the ring buffer size is modified as well as when the iterator is released (closing the "trace" file), as that was considered a non fast path and a good place to do a sanity check. The problem is that the check does not have any locks around it. If one process were to read the trace file, and another were to read the raw binary file, the check could happen while the reader is reading the file. The issues with this is that the check requires to clear the HEAD page before doing the full check and it restores it afterward. But readers require the HEAD page to exist before it can read the buffer, otherwise it gives a nasty warning and disables the buffer. By adding the reader lock around the check, this keeps the race from happening. Signed-off-by: Steven Rostedt Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit a289a811c8210d79bf8bb29f5e8600edbbe37593 Author: Shawn Guo Date: Thu Oct 4 17:12:23 2012 -0700 kernel/sys.c: call disable_nonboot_cpus() in kernel_restart() commit f96972f2dc6365421cf2366ebd61ee4cf060c8d5 upstream. As kernel_power_off() calls disable_nonboot_cpus(), we may also want to have kernel_restart() call disable_nonboot_cpus(). Doing so can help machines that require boot cpu be the last alive cpu during reboot to survive with kernel restart. This fixes one reboot issue seen on imx6q (Cortex-A9 Quad). The machine requires that the restart routine be run on the primary cpu rather than secondary ones. Otherwise, the secondary core running the restart routine will fail to come to online after reboot. Signed-off-by: Shawn Guo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 8b7435d260ce2dce26751e5477c339eb02ad07e7 Author: Denys Vlasenko Date: Wed Sep 26 11:34:50 2012 +1000 coredump: prevent double-free on an error path in core dumper commit f34f9d186df35e5c39163444c43b4fc6255e39c5 upstream. In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate memory for info->fields, it frees already allocated stuff and returns error to its caller, fill_note_info. Which in turn returns error to its caller, elf_core_dump. Which jumps to cleanup label and calls free_note_info, which will happily try to free all info->fields again. BOOM. This is the fix. Signed-off-by: Oleg Nesterov Signed-off-by: Denys Vlasenko Cc: Venu Byravarasu Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 16365e5badcb314f350c47d4a9be90ed8a61b71a Author: Oleg Nesterov Date: Tue Feb 19 17:31:10 2013 +0000 wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task CVE-2013-0871 BugLink: http://bugs.launchpad.net/bugs/1129192 wake_up_process() should never wakeup a TASK_STOPPED/TRACED task. Change it to use TASK_NORMAL and add the WARN_ON(). TASK_ALL has no other users, probably can be killed. Signed-off-by: Oleg Nesterov Signed-off-by: Linus Torvalds (backported from commit 9067ac85d533651b98c2ff903182a20cbb361fcb) Conflicts: kernel/sched/core.c Signed-off-by: Luis Henriques Acked-by: Colin King Signed-off-by: Tim Gardner Signed-off-by: Willy Tarreau commit 77ae04b3a15ae4a1bd063be6e2768d6f489f51b5 Author: Andrew Morton Date: Wed Mar 13 14:59:34 2013 -0700 kernel/signal.c: use __ARCH_HAS_SA_RESTORER instead of SA_RESTORER commit 522cff142d7d2f9230839c9e1f21a4d8bcc22a4a upstream. __ARCH_HAS_SA_RESTORER is the preferred conditional for use in 3.9 and later kernels, per Kees. Signed-off-by: Andrew Morton Cc: Emese Revfy Cc: Emese Revfy Cc: PaX Team Cc: Al Viro Cc: Oleg Nesterov Cc: "Eric W. Biederman" Cc: Serge Hallyn Cc: Julien Tinnes Signed-off-by: Linus Torvalds Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit e3542c8c8c0a689c990acdb684e27989b96169e1 Author: Ben Hutchings Date: Sun Nov 25 22:24:19 2012 -0500 signal: Define __ARCH_HAS_SA_RESTORER so we know whether to clear sa_restorer Vaguely based on upstream commit 574c4866e33d 'consolidate kernel-side struct sigaction declarations'. flush_signal_handlers() needs to know whether sigaction::sa_restorer is defined, not whether SA_RESTORER is defined. Define the __ARCH_HAS_SA_RESTORER macro to indicate this. Signed-off-by: Ben Hutchings Cc: Al Viro Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d7328a9bffdd25bb510d98e2ed5bcced32a9475d Author: Emese Revfy Date: Wed Apr 17 15:58:36 2013 -0700 kernel/signal.c: stop info leak via the tkill and the tgkill syscalls commit b9e146d8eb3b9ecae5086d373b50fa0c1f3e7f0f upstream. This fixes a kernel memory contents leak via the tkill and tgkill syscalls for compat processes. This is visible in the siginfo_t->_sifields._rt.si_sigval.sival_ptr field when handling signals delivered from tkill. The place of the infoleak: int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from) { ... put_user_ex(ptr_to_compat(from->si_ptr), &to->si_ptr); ... } Signed-off-by: Emese Revfy Reviewed-by: PaX Team Signed-off-by: Kees Cook Cc: Al Viro Cc: Oleg Nesterov Cc: "Eric W. Biederman" Cc: Serge Hallyn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 96ec5658179a39389e482d4d8e31d1aeeea9c539 Author: John Johansen Date: Thu Mar 21 05:04:13 2013 -0700 ptrace: Fix ptrace when task is in task_is_stopped() state This patch fixes a regression in ptrace, introduced by commit 9e74eb39 (backport of 9899d11f) which makes assumptions about ptrace behavior which are not true in the 2.6.32 kernel. BugLink: http://bugs.launchpad.net/bugs/1145234 9899d11f makes the assumption that task_is_stopped() is not a valid state in ptrace because it is built on top of a series of patches which change how the TASK_STOPPED state is tracked (321fb561 which requires d79fdd6d and several other patches). Because Lucid does not have the set of patches that make task_is_stopped() an invalid state in ptrace_check_attach, partially revert 9e74eb39 so that ptrace_check_attach() correctly handles task_is_stopped(). However we must replace the assignment of TASK_TRACED with __TASK_TRACED to ensure TASK_WAKEKILL is cleared. Signed-off-by: John Johansen Acked-by: Colin King Acked-by: Stefan Bader Acked-by: Luis Henriques Signed-off-by: Tim Gardner Cc: Oleg Nesterov Signed-off-by: Willy Tarreau commit 062bdcb3e5d3e9892c91eff5950dac767af71fae Author: Oleg Nesterov Date: Tue Feb 19 17:31:09 2013 +0000 ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL CVE-2013-0871 BugLink: http://bugs.launchpad.net/bugs/1129192 putreg() assumes that the tracee is not running and pt_regs_access() can safely play with its stack. However a killed tracee can return from ptrace_stop() to the low-level asm code and do RESTORE_REST, this means that debugger can actually read/modify the kernel stack until the tracee does SAVE_REST again. set_task_blockstep() can race with SIGKILL too and in some sense this race is even worse, the very fact the tracee can be woken up breaks the logic. As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace() call, this ensures that nobody can ever wakeup the tracee while the debugger looks at it. Not only this fixes the mentioned problems, we can do some cleanups/simplifications in arch_ptrace() paths. Probably ptrace_unfreeze_traced() needs more callers, for example it makes sense to make the tracee killable for oom-killer before access_process_vm(). While at it, add the comment into may_ptrace_stop() to explain why ptrace_stop() still can't rely on SIGKILL and signal_pending_state(). Reported-by: Salman Qazi Reported-by: Suleiman Souhlal Suggested-by: Linus Torvalds Signed-off-by: Oleg Nesterov Signed-off-by: Linus Torvalds (backported from commit 9899d11f654474d2d54ea52ceaa2a1f4db3abd68) Conflicts: arch/x86/kernel/step.c kernel/ptrace.c kernel/signal.c Signed-off-by: Luis Henriques Acked-by: Colin King Signed-off-by: Tim Gardner Signed-off-by: Willy Tarreau commit 905f1272f58fbeceee4f5b66527a0b1c367f8c57 Author: Oleg Nesterov Date: Tue Feb 19 17:31:08 2013 +0000 ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() CVE-2013-0871 BugLink: http://bugs.launchpad.net/bugs/1129192 Cleanup and preparation for the next change. signal_wake_up(resume => true) is overused. None of ptrace/jctl callers actually want to wakeup a TASK_WAKEKILL task, but they can't specify the necessary mask. Turn signal_wake_up() into signal_wake_up_state(state), reintroduce signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up() which adds __TASK_TRACED. This way ptrace_signal_wake_up() can work "inside" ptrace_request() even if the tracee doesn't have the TASK_WAKEKILL bit set. Signed-off-by: Oleg Nesterov Signed-off-by: Linus Torvalds (backported from commit 910ffdb18a6408e14febbb6e4b6840fd2c928c82) Conflicts: kernel/ptrace.c kernel/signal.c Signed-off-by: Luis Henriques Acked-by: Colin King Signed-off-by: Tim Gardner Signed-off-by: Willy Tarreau commit fd2ab7dcdaaf7fd5ec4e1e702b405610069f052c Author: Oleg Nesterov Date: Tue Feb 19 17:31:07 2013 +0000 ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread CVE-2013-0871 BugLink: http://bugs.launchpad.net/bugs/1129192 It is not clear why ptrace_resume() does wake_up_process(). Unless the caller is PTRACE_KILL the tracee should be TASK_TRACED so we can use wake_up_state(__TASK_TRACED). If sys_ptrace() races with SIGKILL we do not need the extra and potentionally spurious wakeup. If the caller is PTRACE_KILL, wake_up_process() is even more wrong. The tracee can sleep in any state in any place, and if we have a buggy code which doesn't handle a spurious wakeup correctly PTRACE_KILL can be used to exploit it. For example: int main(void) { int child, status; child = fork(); if (!child) { int ret; assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); ret = pause(); printf("pause: %d %m\n", ret); return 0x23; } sleep(1); assert(ptrace(PTRACE_KILL, child, 0,0) == 0); assert(child == wait(&status)); printf("wait: %x\n", status); return 0; } prints "pause: -1 Unknown error 514", -ERESTARTNOHAND leaks to the userland. In this case sys_pause() is buggy as well and should be fixed. I do not know what was the original rationality behind PTRACE_KILL. The man page is simply wrong and afaics it was always wrong. Imho it should be deprecated, or may be it should do send_sig(SIGKILL) as Denys suggests, but in any case I do not think that the current behaviour was intentional. Note: there is another problem, ptrace_resume() changes ->exit_code and this can race with SIGKILL too. Eventually we should change ptrace to not use ->exit_code. Signed-off-by: Oleg Nesterov (cherry picked from commit 0666fb51b1483f27506e212cc7f7b2645b5c7acc) Signed-off-by: Luis Henriques Acked-by: Colin King Signed-off-by: Tim Gardner Signed-off-by: Willy Tarreau commit 0433cb597b3c1143c5436d28ea35408df02f0670 Author: Kees Cook Date: Wed Mar 13 14:59:33 2013 -0700 signal: always clear sa_restorer on execve commit 2ca39528c01a933f6689cd6505ce65bd6d68a530 upstream. When the new signal handlers are set up, the location of sa_restorer is not cleared, leaking a parent process's address space location to children. This allows for a potential bypass of the parent's ASLR by examining the sa_restorer value returned when calling sigaction(). Based on what should be considered "secret" about addresses, it only matters across the exec not the fork (since the VMAs haven't changed until the exec). But since exec sets SIG_DFL and keeps sa_restorer, this is where it should be fixed. Given the few uses of sa_restorer, a "set" function was not written since this would be the only use. Instead, we use __ARCH_HAS_SA_RESTORER, as already done in other places. Example of the leak before applying this patch: $ cat /proc/$$/maps ... 7fb9f3083000-7fb9f3238000 r-xp 00000000 fd:01 404469 .../libc-2.15.so ... $ ./leak ... 7f278bc74000-7f278be29000 r-xp 00000000 fd:01 404469 .../libc-2.15.so ... 1 0 (nil) 0x7fb9f30b94a0 2 4000000 (nil) 0x7f278bcaa4a0 3 4000000 (nil) 0x7f278bcaa4a0 4 0 (nil) 0x7fb9f30b94a0 ... [akpm@linux-foundation.org: use SA_RESTORER for backportability] Signed-off-by: Kees Cook Reported-by: Emese Revfy Cc: Emese Revfy Cc: PaX Team Cc: Al Viro Cc: Oleg Nesterov Cc: "Eric W. Biederman" Cc: Serge Hallyn Cc: Julien Tinnes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit fa078af1f06731a76f76a2f5fd2dd4b4b1dc167e Author: Kees Cook Date: Mon Dec 17 16:03:20 2012 -0800 exec: use -ELOOP for max recursion depth commit d740269867021faf4ce38a449353d2b986c34a67 upstream To avoid an explosion of request_module calls on a chain of abusive scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon as maximum recursion depth is hit, the error will fail all the way back up the chain, aborting immediately. This also has the side-effect of stopping the user's shell from attempting to reexecute the top-level file as a shell script. As seen in the dash source: if (cmd != path_bshell && errno == ENOEXEC) { *argv-- = cmd; *argv = cmd = path_bshell; goto repeat; } The above logic was designed for running scripts automatically that lacked the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC, things continue to behave as the shell expects. Additionally, when tracking recursion, the binfmt handlers should not be involved. The recursion being tracked is the depth of calls through search_binary_handler(), so that function should be exclusively responsible for tracking the depth. Signed-off-by: Kees Cook Cc: halfdog Cc: P J P Cc: Alexander Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 59e376d0e4f12daf3817621c023e13556996c851 Author: Kees Cook Date: Thu Dec 20 15:05:16 2012 -0800 exec: do not leave bprm->interp on stack commit b66c5984017533316fd1951770302649baf1aa33 upstream If a series of scripts are executed, each triggering module loading via unprintable bytes in the script header, kernel stack contents can leak into the command line. Normally execution of binfmt_script and binfmt_misc happens recursively. However, when modules are enabled, and unprintable bytes exist in the bprm->buf, execution will restart after attempting to load matching binfmt modules. Unfortunately, the logic in binfmt_script and binfmt_misc does not expect to get restarted. They leave bprm->interp pointing to their local stack. This means on restart bprm->interp is left pointing into unused stack memory which can then be copied into the userspace argv areas. After additional study, it seems that both recursion and restart remains the desirable way to handle exec with scripts, misc, and modules. As such, we need to protect the changes to interp. This changes the logic to require allocation for any changes to the bprm->interp. To avoid adding a new kmalloc to every exec, the default value is left as-is. Only when passing through binfmt_script or binfmt_misc does an allocation take place. For a proof of concept, see DoTest.sh from: http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/ Signed-off-by: Kees Cook Cc: halfdog Cc: P J P Cc: Alexander Viro Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit ec4808e70c93b17965a7c6de9f243eceb02389a9 Author: Oleg Nesterov Date: Fri Mar 23 15:02:50 2012 -0700 kmod: make __request_module() killable commit 1cc684ab75123efe7ff446eb821d44375ba8fa30 upstream As Tetsuo Handa pointed out, request_module() can stress the system while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE. The task T uses "almost all" memory, then it does something which triggers request_module(). Say, it can simply call sys_socket(). This in turn needs more memory and leads to OOM. oom-killer correctly chooses T and kills it, but this can't help because it sleeps in TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the TIF_MEMDIE task T. Make __request_module() killable. The only necessary change is that call_modprobe() should kmalloc argv and module_name, they can't live in the stack if we use UMH_KILLABLE. This memory is freed via call_usermodehelper_freeinfo()->cleanup. Reported-by: Tetsuo Handa Signed-off-by: Oleg Nesterov Cc: Rusty Russell Cc: Tejun Heo Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [dannf, bwh: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit d5734d32817bb3a425eee4b1f5b9be78e9a8684d Author: Oleg Nesterov Date: Fri Mar 23 15:02:49 2012 -0700 kmod: introduce call_modprobe() helper commit 3e63a93b987685f02421e18b2aa452d20553a88b upstream No functional changes. Move the call_usermodehelper code from __request_module() into the new simple helper, call_modprobe(). Signed-off-by: Oleg Nesterov Cc: Tetsuo Handa Cc: Rusty Russell Cc: Tejun Heo Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 67fd05cc8b160538a9da654758c45e0295272c47 Author: Oleg Nesterov Date: Fri Mar 23 15:02:49 2012 -0700 usermodehelper: ____call_usermodehelper() doesn't need do_exit() commit 5b9bd473e3b8a8c6c4ae99be475e6e9b27568555 upstream Minor cleanup. ____call_usermodehelper() can simply return, no need to call do_exit() explicitely. Signed-off-by: Oleg Nesterov Cc: Tetsuo Handa Cc: Rusty Russell Cc: Tejun Heo Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [dannf: adjusted to apply to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit e2a28e9a9f98d9a73ea5ce58328595dcab64a1f6 Author: Oleg Nesterov Date: Fri Mar 23 15:02:47 2012 -0700 usermodehelper: implement UMH_KILLABLE commit d0bd587a80960d7ba7e0c8396e154028c9045c54 upstream Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC. The caller must ensure that subprocess_info->path/etc can not go away until call_usermodehelper_freeinfo(). call_usermodehelper_exec(UMH_KILLABLE) does wait_for_completion_killable. If it fails, it uses xchg(&sub_info->complete, NULL) to serialize with umh_complete() which does the same xhcg() to access sub_info->complete. If call_usermodehelper_exec wins, it can safely return. umh_complete() should get NULL and call call_usermodehelper_freeinfo(). Otherwise we know that umh_complete() was already called, in this case call_usermodehelper_exec() falls back to wait_for_completion() which should succeed "very soon". Note: UMH_NO_WAIT == -1 but it obviously should not be used with UMH_KILLABLE. We delay the neccessary cleanup to simplify the back porting. Signed-off-by: Oleg Nesterov Cc: Tetsuo Handa Cc: Rusty Russell Cc: Tejun Heo Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit 13db73535cc63e0034b901a287cf5bb20919825b Author: Oleg Nesterov Date: Fri Mar 23 15:02:47 2012 -0700 usermodehelper: introduce umh_complete(sub_info) commit b3449922502f5a161ee2b5022a33aec8472fbf18 upstream Preparation. Add the new trivial helper, umh_complete(). Currently it simply does complete(sub_info->complete). Signed-off-by: Oleg Nesterov Cc: Tetsuo Handa Cc: Rusty Russell Cc: Tejun Heo Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [dannf: Adjusted to apply to Debian's 2.6.32] Signed-off-by: Willy Tarreau commit dbd3462bbd6cbdfbe8c9db662b2f2b4a90311db3 Author: Kees Cook Date: Thu Oct 25 13:38:14 2012 -0700 gen_init_cpio: avoid stack overflow when expanding commit 20f1de659b77364d55d4e7fad2ef657e7730323f upstream. Fix possible overflow of the buffer used for expanding environment variables when building file list. In the extremely unlikely case of an attacker having control over the environment variables visible to gen_init_cpio, control over the contents of the file gen_init_cpio parses, and gen_init_cpio was built without compiler hardening, the attacker can gain arbitrary execution control via a stack buffer overflow. $ cat usr/crash.list file foo ${BIG}${BIG}${BIG}${BIG}${BIG}${BIG} 0755 0 0 $ BIG=$(perl -e 'print "A" x 4096;') ./usr/gen_init_cpio usr/crash.list *** buffer overflow detected ***: ./usr/gen_init_cpio terminated This also replaces the space-indenting with tabs. Patch based on existing fix extracted from grsecurity. Signed-off-by: Kees Cook Cc: Michal Marek Cc: Brad Spengler Cc: PaX Team Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 56fb3d90a30b207a17562fc18bf62f30e566e319 Author: Jean Delvare Date: Wed Nov 21 09:24:42 2012 +0100 kbuild: Fix gcc -x syntax This is upstream commit b1e0d8b70fa31821ebca3965f2ef8619d7c5e316 backported to the 2.6.32.x stable branch. The correct syntax for gcc -x is "gcc -x assembler", not "gcc -xassembler". Even though the latter happens to work, the former is what is documented in the manual page and thus what gcc wrappers such as icecream do expect. This isn't a cosmetic change. The missing space prevents icecream from recognizing compilation tasks it can't handle, leading to silent kernel miscompilations. Besides me, credits go to Michael Matz and Dirk Mueller for investigating the miscompilation issue and tracking it down to this incorrect -x parameter syntax. Signed-off-by: Jean Delvare Cc: stable@vger.kernel.org Cc: Bernhard Walle Cc: Michal Marek Cc: Ralf Baechle Signed-off-by: Willy Tarreau commit d31e3b9e8be60e71d8c9edb15689c496f73755c0 Author: Thomas Gleixner Date: Fri May 3 15:02:50 2013 +0200 tick: Cleanup NOHZ per cpu data on cpu down commit 4b0c0f294f60abcdd20994a8341a95c8ac5eeb96 upstream. Prarit reported a crash on CPU offline/online. The reason is that on CPU down the NOHZ related per cpu data of the dead cpu is not cleaned up. If at cpu online an interrupt happens before the per cpu tick device is registered the irq_enter() check potentially sees stale data and dereferences a NULL pointer. Cleanup the data after the cpu is dead. Reported-by: Prarit Bhargava Cc: Mike Galbraith Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 34d9182227e979c2242dfa0ec73e71ad677c1617 Author: Tirupathi Reddy Date: Tue May 14 13:59:02 2013 +0530 timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE commit 42a5cf46cd56f46267d2a9fcf2655f4078cd3042 upstream. An inactive timer's base can refer to a offline cpu's base. In the current code, cpu_base's lock is blindly reinitialized each time a CPU is brought up. If a CPU is brought online during the period that another thread is trying to modify an inactive timer on that CPU with holding its timer base lock, then the lock will be reinitialized under its feet. This leads to following SPIN_BUG(). <0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466 <0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1 <4> [] (unwind_backtrace+0x0/0x11c) from [] (do_raw_spin_unlock+0x40/0xcc) <4> [] (do_raw_spin_unlock+0x40/0xcc) from [] (_raw_spin_unlock+0x8/0x30) <4> [] (_raw_spin_unlock+0x8/0x30) from [] (mod_timer+0x294/0x310) <4> [] (mod_timer+0x294/0x310) from [] (queue_delayed_work_on+0x104/0x120) <4> [] (queue_delayed_work_on+0x104/0x120) from [] (sdhci_msm_bus_voting+0x88/0x9c) <4> [] (sdhci_msm_bus_voting+0x88/0x9c) from [] (sdhci_disable+0x40/0x48) <4> [] (sdhci_disable+0x40/0x48) from [] (mmc_release_host+0x4c/0xb0) <4> [] (mmc_release_host+0x4c/0xb0) from [] (mmc_sd_detect+0x90/0xfc) <4> [] (mmc_sd_detect+0x90/0xfc) from [] (mmc_rescan+0x7c/0x2c4) <4> [] (mmc_rescan+0x7c/0x2c4) from [] (process_one_work+0x27c/0x484) <4> [] (process_one_work+0x27c/0x484) from [] (worker_thread+0x210/0x3b0) <4> [] (worker_thread+0x210/0x3b0) from [] (kthread+0x80/0x8c) <4> [] (kthread+0x80/0x8c) from [] (kernel_thread_exit+0x0/0x8) As an example, this particular crash occurred when CPU #3 is executing mod_timer() on an inactive timer whose base is refered to offlined CPU #2. The code locked the timer_base corresponding to CPU #2. Before it could proceed, CPU #2 came online and reinitialized the spinlock corresponding to its base. Thus now CPU #3 held a lock which was reinitialized. When CPU #3 finally ended up unlocking the old cpu_base corresponding to CPU #2, we hit the above SPIN_BUG(). CPU #0 CPU #3 CPU #2 ------ ------- ------- ..... ...... mod_timer() lock_timer_base spin_lock_irqsave(&base->lock) cpu_up(2) ..... ...... init_timers_cpu() .... ..... spin_lock_init(&base->lock) ..... spin_unlock_irqrestore(&base->lock) ...... Allocation of per_cpu timer vector bases is done only once under "tvec_base_done[]" check. In the current code, spinlock_initialization of base->lock isn't under this check. When a CPU is up each time the base lock is reinitialized. Move base spinlock initialization under the check. Signed-off-by: Tirupathi Reddy Link: http://lkml.kernel.org/r/1368520142-4136-1-git-send-email-tirupath@codeaurora.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 820e8b30ea5fcc708b49078d1746131a158ba8bb Author: Stanislaw Gruszka Date: Fri Feb 15 11:08:11 2013 +0100 posix-cpu-timers: Fix nanosleep task_struct leak commit e6c42c295e071dd74a66b5a9fcf4f44049888ed8 upstream. The trinity fuzzer triggered a task_struct reference leak via clock_nanosleep with CPU_TIMERs. do_cpu_nanosleep() calls posic_cpu_timer_create(), but misses a corresponding posix_cpu_timer_del() which leads to the task_struct reference leak. Reported-and-tested-by: Tommi Rantala Signed-off-by: Stanislaw Gruszka Cc: Dave Jones Cc: John Stultz Cc: Oleg Nesterov Link: http://lkml.kernel.org/r/20130215100810.GF4392@redhat.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit 30145daa32a9f20f6de2cdcff698365f72d2d4b6 Author: Mark Rutland Date: Thu Mar 7 15:09:24 2013 +0000 clockevents: Don't allow dummy broadcast timers commit a7dc19b8652c862d5b7c4d2339bd3c428bd29c4a upstream. Currently tick_check_broadcast_device doesn't reject clock_event_devices with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real hardware if they have a higher rating value. In this situation, the dummy timer is responsible for broadcasting to itself, and the core clockevents code may attempt to call non-existent callbacks for programming the dummy, eventually leading to a panic. This patch makes tick_check_broadcast_device always reject dummy timers, preventing this problem. Signed-off-by: Mark Rutland Cc: linux-arm-kernel@lists.infradead.org Cc: Jon Medhurst (Tixy) Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman Signed-off-by: Willy Tarreau commit d556d3262945a9f7733c0c2cf70be5ac0c358254 Author: John Stultz Date: Mon Jan 28 12:01:40 2013 -0800 2.6.32.y: timekeeping: Fix nohz issue with commit 61b76840ddee647c0c223365378c3f394355b7d7 Commit 61b76840ddee647c0c223365378c3f394355b7d7 ("time: Avoid making adjustments if we haven't accumulated anything") introduced a regression with nohz. Basically with kernels between 2.6.20-something to 2.6.32, we accumulate time in half second chunks, rather then every timer-tick. This was added because when NOHZ landed, if you were idle for a few seconds, you had to spin for every tick we skipped in the accumulation loop, which created some bad latencies. However, this required that we create the xtime_cache() which was still updated each tick, so that filesystem timestamps, etc continued to see time increment normally. Of course, the xtime_cache is updated at the bottom of update_wall_time(). So the early return on (offset < timekeeper.cycle_interval), added by the problematic commit causes the xtime_cache to not be updated. This can cause code using current_kernel_time() (like the mqueue code) or hrtimer_get_softirq_time(), which uses the non-updated xtime_cache, to see timers to fire with very coarse half-second granularity. Many thanks to Romain for describing the issue clearly, providing test case to reproduce it and helping with testing the solution. This change is for 2.6.32-stable ONLY! Cc: stable@vger.kernel.org Cc: Willy Tarreau Cc: Romain Francoise Reported-by: Romain Francoise Tested-by: Romain Francoise Signed-off-by: John Stultz Signed-off-by: Willy Tarreau commit ec2826bc2f8c9874284080a9cda657c4c1dc948e Author: Jens Axboe Date: Tue Feb 23 08:40:43 2010 +0100 Revert "block: improve queue_should_plug() by looking at IO depths" This reverts commit fb1e75389bd06fd5987e9cda1b4e0305c782f854. "Benjamin S." reports that the patch in question causes a big drop in sequential throughput for him, dropping from 200MB/sec down to only 70MB/sec. Needs to be investigated more fully, for now lets just revert the offending commit. Conflicts: include/linux/blkdev.h Signed-off-by: Jens Axboe (cherry picked from commit 79da0644a8e0838522828f106e4049639eea6baf) Cc: Thomas Bork Signed-off-by: Willy Tarreau commit 01ab25d50c6d5e1a0e82f9053e03b86b9fa757c8 Author: Ben Hutchings Date: Sat Jan 12 21:25:51 2013 +0000 Revert "pcdp: use early_ioremap/early_iounmap to access pcdp table" This reverts commit 2af3af56e7d4756b21a2e0d86e4fc4e5b7f0df24, which was commit 6c4088ac3a4d82779903433bcd5f048c58fb1aca upstream. This broke compilation of the driver in 2.6.32.y as the early_io{remap,unmap}() functions are not defined for ia64. The driver can *only* be built for ia64 (even in current mainline), so a fix for x86_64 is pointless. Signed-off-by: Ben Hutchings Signed-off-by: Willy Tarreau