Must-fix bugs
=============

drivers/char/
~~~~~~~~~~~~~

o TTY locking is broken.

  o see FIXME in do_tty_hangup().  This causes ppp BUGs in local_bh_enable()

  o Other problems: aviro, dipankar, Alan have details.

  o somebody will have to document the tty driver and ldisc API

drivers/tty
~~~~~~~~~~~

o viro: tty_driver refcounting, tty/misc/upper levels of sound still not
  completely fixed.

drivers/block/
~~~~~~~~~~~~~~

o loop.c: Concurrent write access on block devices might cause a deadlock
  of the complete system. See:
  http://marc.theaimsgroup.com/?l=linux-kernel&m=106275365925769&w==
  http://bugzilla.kernel.org/show_bug.cgi?id=1198
  Thread of possible fix:
  http://www.kerneli.org/pipermail/cryptoapi-devel/2003-October/000676.html

  (Fruhwirth Clemens)

o ideraid hasn't been ported to 2.5 at all yet.

  We need to understand whether the proposed BIO split code will suffice
  for this.

drivers/input/
~~~~~~~~~~~~~~

o rmk: unconverted keyboard/mouse drivers (there's a deadline of 2.6.0
  currently on these remaining in my/Linus' tree.)

o viro: large absence of locking.

o viro: parport is nearly as bad as that and there the code is more hairy.
  IMO parport is more of "figure out what API changes are needed for its
  users, get them done ASAP, then fix generic layer at leisure"

o (Albert Cahalan) Lots of people (check Google) get this message from the
  kernel:

  psmouse.c: Lost synchronization, throwing 2 bytes away.

  (the number of bytes will be 1, 2, or 3)

  At work, I get it when there is heavy NFS traffic.  The mouse goes crazy,
  jumping around and doing random cut-and-paste all over everything.  This
  is with a decently fast and modern PC.

o There seem to be too many reports of keyboards and mice failing or acting
  strangely.

drivers/misc/
~~~~~~~~~~~~~

o rmk: UCB1[23]00 drivers, currently sitting in drivers/misc in the ARM tree.
  (touchscreen, audio, gpio, type device.)

  These need to be moved out of drivers/misc/ and into real places

o viro: actually, misc.c has a good chance to die.
  With cdev-cidr that's trivial.

drivers/net/
~~~~~~~~~~~~

drivers/net/irda/
~~~~~~~~~~~~~~~~~

o dongle drivers need to be converted to sir-dev

o irport needs to be converted to sir-kthread

o new drivers (irtty-sir/smsc-ircc2/donauboe) need more testing

o rmk: Refuse IrDA initialisation if sizeof(structures) is incorrect (I'm
  not sure if we still need this; I think gcc 2.95.3 on ARM shows this
  problem though.)

drivers/pci/
~~~~~~~~~~~~

o alan: Some cardbus cards crash the system

  (bugzilla, please?)

drivers/pcmcia/
~~~~~~~~~~~~~~~

o alan: This is a locking disaster.

  (rmk, brodo: in progress)

drivers/pld/
~~~~~~~~~~~~

o rmk: EPXA (ARM platform) PLD hotswap drivers (drivers/pld)

  (rmk: will work out what to do here.  maybe drivers/arm/)

drivers/video/
~~~~~~~~~~~~~~

o Lots of drivers don't compile, others do but don't work.

drivers/scsi/
~~~~~~~~~~~~~

o Convert am53c974, dpt_i2o, initio and pci2220i to DMA-mapping

o Make inia100, cpqfc, pci2000 and dc390t compile

o Convert

   wd33c93 based: a2091 a3000 gvp11 mvme147 sgiwd93
   53c7xx based: amiga7xxx bvme6000 mvme16x initio am53c974 pci2000
   pci2220i dc390t

  To new error handling

  It also might be possible to shift the 53c7xx based drivers over to
  53c700 which does the new EH stuff, but I don't have the hardware to check
  such a shift.

  For the non-compiling stuff, I've probably missed a few that just aren't
  compilable on my platforms, so any updates would be welcome.  Also, are
  some of our non-compiling or unconverted drivers obsolete?

fs/
~~~

o AIO/direct-IO writes can race with truncate and wreck filesystems.
  (Badari has a patch)

o viro: fs/char_dev.c needs removal of aeb stuff and merge of cdev-cidr.
  In progress.

o forward-port sct's O_DIRECT fixes (Badari has a patch)

o viro: there is some generic stuff for namei/namespace/super, but that's a
  slow-merge and can go in 2.6 just fine

o andi: also soft needs to be fixed - there are quite a lot of
  uninterruptible waits in sunrpc/nfs

o trond: NFS has a mmap-versus-truncate problem

kernel/sched.c
~~~~~~~~~~~~~~

o Starvation, general interactivity need close monitoring.

o SMT aware scheduler (Ingo, Rusty, Nick have implementations)

kernel/
~~~~~~~

o Alan: 32bit uid support is *still* broken for process accounting.

  Create a 32bit uid, turn accounting on.  Shock horror it doesn't work
  because the field is 16bit.  We need an acct structure flag day for 2.6
  IMHO

  (alan has patch)

o viro: core sysctl code is racy.  And its interaction with sysfs

o (ingo) rwsems (on x86) are limited to 32766 waiting processes.  This
  means that setting pid_max to above 32K is unsafe :-(

  An option is to use CONFIG_RWSEM_GENERIC_SPINLOCK variant all the time,
  for all archs, and not inline any part of the ops.

lib/kobject.c
~~~~~~~~~~~~~

o kobject refcounting (comments from Al Viro):

  _anything_ can grab a temporary reference to kobject.  IOW, if kobject is
  embedded into something that could be freed - it _MUST_ have a destructor
  and that destructor _MUST_ be the destructor for containing object.

  Any violation of the above (and we already have a bunch of those) is a
  user-triggerable memory corruption.

  We can tolerate it for a while in 2.5 (e.g.  during work on subsystem we
  can decide to switch to that way of handling objects and have subsystem
  vulnerable for a while), but all such windows must be closed before 2.6
  and during 2.6 we can't open them at all.

o All block drivers which control multiple gendisks with a single
  request_queue are broken, due to one-to-one assumptions in the request
  queue sysfs hookup.

mm/
~~~

o GFP_DMA32 (or something like that).  Lots of ideas.  jejb, zaitcev,
  willy, arjan, wli.

  Specifically, 64-bit systems need to be able to enforce 32-bit addressing
  limits for device metadata like network cards' ring buffers and SCSI
  command descriptors.

o access_process_vm() doesn't flush right.  We probably need new flushing
  primitives to do this (davem?)

modules
~~~~~~~

  (Rusty)

net/
~~~~

  (davem)

o UDP apps can in theory deadlock, because the ip_append_data path can end
  up sleeping while the socket lock is held.

  It is OK to sleep with the socket lock held, normally.  But in this case
  the sleep happens while waiting for socket memory/space to become
  available; if another context needs to take the socket lock to free up the
  space we could hang.

  I sent a rough patch on how to fix this to Alexey, and he is analyzing
  the situation.  I expect a final fix from him next week or so.

o Semantics for IPSEC during operations such as TCP connect suck currently.

  When we first try to connect to a destination, we may need to ask the
  IPSEC key management daemon to resolve the IPSEC routes for us.  For the
  purposes of what the kernel needs to do, you can think of it like ARP.  We
  can't send the packet out properly until we resolve the path.

  What happens now for IPSEC is basically this:

  O_NONBLOCK: returns -EAGAIN over and over until route is resolved

  !O_NONBLOCK: Sleeps until route is resolved

  These semantics are total crap.  The solution, which Alexey is working
  on, is to allow incomplete routes to exist.  These "incomplete" routes
  merely put the packet onto a "resolution queue", and once the key manager
  does its thing we finish the output of the packet.  This is precisely how
  ARP works.

  I don't know when Alexey will be done with this.

net/*/netfilter/
~~~~~~~~~~~~~~~~

  (Rusty)

sound/
~~~~~~

global
~~~~~~

o viro: 64-bit dev_t (not a mustfix for 2.6.0).  32-bit dev_t is done, 64-bit
  means extra work on nfsd/raid/etc.

o alan: Forward port 2.4 fixes

  - Chris Wright: Security fixes including execve holes, execve vs proc
    races

o There are about 60 or 70 security related checks that need doing
  (copy_user etc) from Stanford tools.  (badari is looking into this, and
  hollisb)

o A couple of hundred real looking bugzilla bugs

o viro: cdev rework.  Mostly done.
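
Illustration (not part of the original list)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The kobject refcounting rule listed under lib/kobject.c (an embedded
object must have a destructor, and that destructor must free the
containing object) can be sketched in plain userspace C.  This is only a
rough model under stated assumptions: mini_kobj, mini_dev and every
function below are hypothetical names made up for this example, not the
kernel API, and the counter is a plain int rather than an atomic one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kobject: a refcount plus a destructor. */
struct mini_kobj {
	int refcount;	/* plain int: single-threaded sketch only */
	void (*release)(struct mini_kobj *kobj);
};

/* Hypothetical containing object that embeds the refcounted object. */
struct mini_dev {
	int id;
	struct mini_kobj kobj;
};

/* Counts destructor runs so the lifetime can be observed from outside. */
static int mini_dev_freed;

/* The destructor of the embedded object frees the whole container. */
static void mini_dev_release(struct mini_kobj *kobj)
{
	struct mini_dev *dev = (struct mini_dev *)
		((char *)kobj - offsetof(struct mini_dev, kobj));

	free(dev);
	mini_dev_freed++;
}

/* Allocate a container holding one initial reference. */
static struct mini_dev *mini_dev_create(int id)
{
	struct mini_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->id = id;
	dev->kobj.refcount = 1;
	dev->kobj.release = mini_dev_release;
	return dev;
}

/* Anything may take a temporary reference at any time. */
static struct mini_kobj *mini_kobj_get(struct mini_kobj *kobj)
{
	kobj->refcount++;
	return kobj;
}

/* Dropping the last reference is the only path that frees the container. */
static void mini_kobj_put(struct mini_kobj *kobj)
{
	if (--kobj->refcount == 0)
		kobj->release(kobj);
}
```

The owner never calls free() on the container directly; it drops its
reference with mini_kobj_put(), so a temporary holder that called
mini_kobj_get() earlier keeps the memory alive until its own put.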