summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] check for null vfsmount in dentry_open()Christoph Hellwig2008-03-191-0/+12
| | | | | | | | | | | | | | | Make sure no-one calls dentry_open with a NULL vfsmount argument and crap out with a stacktrace otherwise. A NULL file->f_vfsmnt has always been problematic, but with the per-mount r/o tracking we can't accept anymore at all. [AV] the last place that passed NULL had been eliminated by the previous patch (reiserfs xattr stuff) Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] reiserfs: eliminate private use of struct file in xattrJeff Mahoney2008-03-191-80/+30
| | | | | | | | | | | | | | | | | | | | | After several posts and bug reports regarding interaction with the NULL nameidata, here's a patch to clean up the mess with struct file in the reiserfs xattr code. As observed in several of the posts, there's really no need for struct file to exist in the xattr code. It was really only passed around due to the f_op->readdir() and a_ops->{prepare,commit}_write prototypes requiring it. reiserfs_prepare_write() and reiserfs_commit_write() don't actually use the struct file passed to it, and the xattr code uses a private version of reiserfs_readdir() to enumerate the xattr directories. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] sanitize hppfsAl Viro2008-03-191-79/+34
| | | | | | | | | | | | | | * hppfs_iget() and its users are racy; there's no need to pollute icache anyway, new_inode() works fine and is safe, unlike the current kludges (these relied on overwriting ->i_ino before another iget_locked() gets to that one - and did it after unlocking). * merge hppfs_iget()/init_inode()/hppfs_read_inode(), while we are at it. * to pass proper vfsmount to dentry_open() store the reference in hppfs superblock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --
* hppfs pass vfsmount to dentry_open()Dave Hansen2008-03-191-176/+188
| | | | | | | | | | | | | | | Here's patch for hppfs that uses vfs_kern_mount to make sure it always has a procfs instance and passed the vfsmount on through the inode private data. Also fixes a procfs file_system_type leak for every attempted hppfs mount. [ jdike - gave this file a style workover, plus deleted hppfs_dentry_ops ] Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] restore export of do_kern_mount()Al Viro2008-03-171-0/+1
| | | | | | | | | vfs_kern_mount() requires having a reference to fs type, which makes it impossible for module to create procfs, etc. private mount. Open-coding is not an option, since e.g. put_filesystem() is _not_ exported, and for a good reason. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'upstream-linus' of ↵Linus Torvalds2008-03-175-52/+145
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: ahci: Add Marvell 6121 SATA support pata_ali: use atapi_cmd_type() to determine cmd type instead of transfer size ahci: implement skip_host_reset parameter ahci: request all PCI BARs devres: implement pcim_iomap_regions_request_all() libata-acpi: improve dock event handling
| * ahci: Add Marvell 6121 SATA supportJose Alberto Reguero2008-03-171-4/+15
| | | | | | | | | | Signed-off-by: Jose Alberto Reguero <jareguero@telefonica.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
| * pata_ali: use atapi_cmd_type() to determine cmd type instead of transfer sizeTejun Heo2008-03-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | pata_ali was using qc->nbytes to determine whether a command is data transfer type or not. As now qc->nbytes can be extended by padding and draining buffers, these tests are not useful anymore. Use atapi_cmd_type() instead. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jeff Garzik <jeff@garzik.org>
| * ahci: implement skip_host_reset parameterTejun Heo2008-03-171-19/+29
| | | | | | | | | | | | | | | | | | Under certain circumstances (SSP turned off by the BIOS) and for debugging purposes, skipping global controller reset is helpful. Add a kernel parameter for it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
| * ahci: request all PCI BARsTejun Heo2008-03-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | ahci is often implemented with accompanying SFF compatible interface and legacy IDE driver may attach to the legacy IO ports when the controller is already claimed by ahci and vice-versa. This patch makes ahci use pcim_iomap_regions_request_all() so that all IO regions are claimed on attach. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
| * devres: implement pcim_iomap_regions_request_all()Tejun Heo2008-03-172-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | Some drivers need to reserve all PCI BARs to prevent other drivers misusing unoccupied BARs. pcim_iomap_regions_request_all() requests all BARs and iomap specified BARs. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
| * libata-acpi: improve dock event handlingTejun Heo2008-03-171-27/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve ACPI hotplug handling such that dock event is handled properly. * Register handlers for dock events. * Directly detach device on EJECT_REQUEST instead of signaling hotplug event. This prevents libata from accessing severed controller and/or device. * While at it, use named constants for ACPI events and move uevent signaling inside host lock. Original patch and testing by Holger Macht. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Holger Macht <hmacht@suse.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds2008-03-176-14/+34
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio: fix race in enable_cb virtio: Enable netpoll interface for netconsole logging virtio: handle > 2 billion page balloon targets virtio: Fix sysfs bits to have proper block symlink virtio: Use spin_lock_irqsave/restore for virtio-pci
| * | virtio: fix race in enable_cbChristian Borntraeger2008-03-173-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race in virtio_net, dealing with disabling/enabling the callback. I saw the following oops: kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218! illegal operation: 0001 [#1] SMP Modules linked in: sunrpc dm_mod CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99 Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60) Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3 Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001 000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000 0000000000000000 000000000f870000 0000000000000000 0000000000001237 000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8 Krnl Code: 00000000002b819a: a7110001 tmll %r1,1 00000000002b819e: a7840004 brc 8,2b81a6 00000000002b81a2: a7f40001 brc 15,2b81a4 >00000000002b81a6: a51b0001 oill %r1,1 00000000002b81aa: 40102000 sth %r1,0(%r2) 00000000002b81ae: 07fe bcr 15,%r14 00000000002b81b0: eb7ff0380024 stmg %r7,%r15,56(%r15) 00000000002b81b6: a7f13e00 tmll %r15,15872 Call Trace: ([<000000000fa0bcd0>] 0xfa0bcd0) [<00000000002b8350>] vring_interrupt+0x5c/0x6c [<000000000010ab08>] do_extint+0xb8/0xf0 [<0000000000110716>] ext_no_vtime+0x16/0x1a [<0000000000107e72>] cpu_idle+0x1c2/0x1e0 The problem can be triggered with a high amount of host->guest traffic. I think its the following race: poll says netif_rx_complete poll calls enable_cb enable_cb opens the interrupt mask a new packet comes, an interrupt is triggered----\ enable_cb sees that there is more work | enable_cb disables the interrupt | . V . interrupt is delivered . skb_recv_done does atomic napi test, ok some waiting disable_cb is called->check fails->bang! . poll would do napi check poll would do disable_cb The fix is to let enable_cb not disable the interrupt again, but expect the caller to do the cleanup if it returns false. In that case, the interrupt is only disabled, if the napi test_set_bit was successful. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned up doco)
| * | virtio: Enable netpoll interface for netconsole loggingAmit Shah2008-03-171-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new poll_controller handler that the netpoll interface needs. This enables netconsole logging from a kvm guest over the virtio net interface. Signed-off-by: Amit Shah <amitshah@gmx.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | virtio: handle > 2 billion page balloon targetsRusty Russell2008-03-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | If the host asks for a huge target towards_target() can overflow, and we up oops as we try to release more pages than we have. The simple fix is to use a 64-bit value. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | virtio: Fix sysfs bits to have proper block symlinkJeremy Katz2008-03-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix up so that the virtio_blk devices in sysfs link correctly to their block device. This then allows them to be detected by hal, etc Signed-off-by: Jeremy Katz <katzj@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | virtio: Use spin_lock_irqsave/restore for virtio-pciAnthony Liguori2008-03-171-6/+9
| |/ | | | | | | | | | | | | | | | | virtio-pci acquires its spin lock in an interrupt context so it's necessary to use spin_lock_irqsave/restore variants. This patch fixes guest SMP when using virtio devices in KVM. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* / hfs_bnode_find() can fail, resulting in hfs_bnode_split() breakageAl Viro2008-03-171-3/+15
|/ | | | | | | oops and fs corruption; the latter can happen even on valid fs in case of oom. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Linux 2.6.25-rc6v2.6.25-rc6Linus Torvalds2008-03-161-1/+1
|
* Merge branch 'master' of ↵Linus Torvalds2008-03-1618-65/+125
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6 * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6: [PARISC] make ptr_to_pide() static [PARISC] head.S: section mismatch fixes [PARISC] add back Crestone Peak cpu [PARISC] futex: special case cmpxchg NULL in kernel space [PARISC] clean up show_stack [PARISC] add pa8900 CPUs to hardware inventory [PARISC] clean up include/asm-parisc/elf.h [PARISC] move defconfig to arch/parisc/configs/ [PARISC] add back AD1889 MAINTAINERS entry [PARISC] pdc_console: fix bizarre panic on boot [PARISC] dump_stack in show_regs [PARISC] pdc_stable: fix compile errors [PARISC] remove unused pdc_iodc_printf function [PARISC] bump __NR_syscalls [PARISC] unbreak pgalloc.h [PARISC] move VMALLOC_* definitions to fixmap.h [PARISC] wire up timerfd syscalls [PARISC] remove old timerfd syscall
| * [PARISC] make ptr_to_pide() staticFUJITA Tomonori2008-03-151-2/+2
| | | | | | | | | | Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
| * [PARISC] head.S: section mismatch fixesHelge Deller2008-03-151-3/+4
| | | | | | | | | | | | | | | | | | | | | | - move boot_args[] into the init section - move $global$ into the read_mostly section - fix the following two section mismatches: WARNING: vmlinux.o(.text+0x9c): Section mismatch: reference to .init.text:start_kernel (between '$pgt_fill_loop' and '$is_pa20') WARNING: vmlinux.o(.text+0xa0): Section mismatch: reference to .init.text:start_kernel (between '$pgt_fill_loop' and '$is_pa20') Signed-off-by: Helge Deller <deller@gmx.de> SIgned-off-by: Kyle McMartin <kyle@mcmartin.ca>
| * [PARISC] add back Crestone Peak cpuKyle McMartin2008-03-151-0/+1
| | | | | | | | | | | | | | Crestone Peak Slow is the 800MHz PA-8800 cpu in the C8000. 0x88B is probably the Crestone Peak Fast. Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
| * [PARISC] futex: special case cmpxchg NULL in kernel spaceKyle McMartin2008-03-151-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a0c1e9073ef7428a14309cba010633a6cd6719ea added code to futex.c to detect whether futex_atomic_cmpxchg_inatomic was implemented at run time: + curval = cmpxchg_futex_value_locked(NULL, 0, 0); + if (curval == -EFAULT) + futex_cmpxchg_enabled = 1; This is bogus on parisc, since page zero in kernel virtual space is the gateway page for syscall entry, and should not be read from the kernel. (That, and we really don't like the kernel faulting on its own address space...) Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
| * [PARISC] clean up show_stackKyle McMartin2008-03-151-4/+21
| | | | | | | | | | | | | | | | When we show_regs, we obviously have a struct pt_regs of the calling frame. Use these in show_stack so we don't have the entire bogus call trace up to the show_stack call. Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
| * [PARISC] add pa8900 CPUs to hardware inventoryJames Bottomley2008-03-151-1/+11
| | | | | | | | | | | | | | | | This patch adds the known pa8900 CPUs to the inventory list and removes the Crestone Peak one which apparently never escaped into the wild. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
| * [PARISC] clean up include/asm-parisc/elf.hRandolph Chung2008-03-151-12/+10
| | | | | | | | | | | | | | Cleanup some cruft. No functionality changes. Signed-off-by: Randolph Chung <tausq@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
| * [PARISC] move defconfig to arch/parisc/configs/Adrian Bunk2008-03-152-0/+3
| | | | | | | | | | | | | | | | | | This patch moves the default parisc defconfig to arch/parisc/configs/generic_defconfig where it belongs and selects it as the default defconfig through KBUILD_DEFCONFIG. Signed-off-by: Adrian Bunk <adrian.bunk@movial.fi> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
| * [PARISC] add back AD1889 MAINTAINERS entryThibaut VARENE2008-03-151-0/+9
| | | | | | | | | | Signed-off-by: Thibaut VARENE <T-Bone@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
| * [PARISC] pdc_console: fix bizarre panic on bootKyle McMartin2008-03-153-13/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 721fdf34167580ff98263c74cead8871d76936e6 introduced a subtle bug by accidently removing the "static" from iodc_dbuf. This resulted in, what appeared to be, a trap without *current set to a task. Probably the result of a trap in real mode while calling firmware. Also do other misc clean ups. Since the only input from firmware is non blocking, share iodc_dbuf between input and output, and spinlock the only callers. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
| * [PARISC] dump_stack in show_regsKyle McMartin2008-03-151-0/+2
| | | | | | | | | | | | | | | | Originally, show_stack was used in BUG() output. However, a recent commit changed it to print register state (no idea what that's supposed to help, really...) and parisc was missing a backtrace because of it. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
| * [PARISC] pdc_stable: fix compile errorsJoel Soete2008-03-151-3/+3
| | | | | | | | | | Signed-off-by: Joel Soete <rubisher@scarlet.be> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
| * [PARISC] remove unused pdc_iodc_printf functionKyle McMartin2008-03-152-14/+0
| | | | | | | | Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
| * [PARISC] bump __NR_syscallsKyle McMartin2008-03-151-1/+1
| | | | | | | | | | | | oops, forgot this in the previous commit. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
| * [PARISC] unbreak pgalloc.hKyle McMartin2008-03-151-2/+2
| | | | | | | | | | | | | | Commit 2f569afd9ced9ebec9a6eb3dbf6f83429be0a7b4 broke the compile rather spectacularly. Fix code errors. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
| * [PARISC] move VMALLOC_* definitions to fixmap.hKyle McMartin2008-03-152-9/+8
| | | | | | | | | | | | They make way more sense here, really... Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
| * [PARISC] wire up timerfd syscallsKyle McMartin2008-03-152-0/+6
| | | | | | | | Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
| * [PARISC] remove old timerfd syscallKyle McMartin2008-03-151-1/+1
| | | | | | | | Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
* | ACPI: Remove ACPI_CUSTOM_DSDT_INITRD optionLinus Torvalds2008-03-157-165/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This essentially reverts commit 71fc47a9adf8ee89e5c96a47222915c5485ac437 ("ACPI: basic initramfs DSDT override support"), because the code simply isn't ready. It did ugly things to the init sequence to populate the rootfs image early, but that just ended up showing other problems with the whole approach. The fact is, the VFS layer simply isn't initialized this early, and the relevant ACPI code should either run much later, or this shouldn't be done at all. For 2.6.25, we'll just pick the latter option. We can revisit this concept later if necessary. Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Tilman Schmidt <tilman@imap.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Renninger <trenn@suse.de> Cc: Eric Piel <eric.piel@tremplin-utc.net> Cc: Len Brown <len.brown@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Markus Gaugusch <dsdt@gaugusch.at> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | tifm_sd: DATA_CARRY is not boolean in tifm_sd_transfer_data()Roel Kluin2008-03-151-1/+1
| | | | | | | | | | | | | | | | DATA_CARRY is not boolean Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2008-03-151-9/+14
|\ \ | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: Fix tbench regression in 2.6.25-rc1
| * | [NET]: Fix tbench regression in 2.6.25-rc1Zhang Yanmin2008-03-121-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Comparing with kernel 2.6.24, tbench result has regression with 2.6.25-rc1. 1) On 2 quad-core processor stoakley: 4%. 2) On 4 quad-core processor tigerton: more than 30%. bisect located below patch. b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b is first bad commit commit b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Tue Nov 13 21:33:32 2007 -0800 [IPV6]: Move nfheader_len into rt6_info The dst member nfheader_len is only used by IPv6. It's also currently creating a rather ugly alignment hole in struct dst. Therefore this patch moves it from there into struct rt6_info. Above patch changes the cache line alignment, especially member __refcnt. I did a testing by adding 2 unsigned long pading before lastuse, so the 3 members, lastuse/__refcnt/__use, are moved to next cache line. The performance is recovered. I created a patch to rearrange the members in struct dst_entry. With Eric and Valdis Kletnieks's suggestion, I made finer arrangement. 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by moving tclassid to different place. It looks like tclassid could also have impact on performance. If moving tclassid before metrics, or just don't move tclassid, the performance isn't good. So I move it behind metrics. 2) Add comments before __refcnt. On 16-core tigerton: If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than the one without the patch; If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than the one without the patch. With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't introduce regression. Thank Eric, Valdis, and David! Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sched: simplify sched_slice()Ingo Molnar2008-03-151-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the existing calc_delta_mine() calculation for sched_slice(). This saves a divide and simplifies the code because we share it with the other /cfs_rq->load users. It also improves code size: text data bss dec hex filename 42659 2740 144 45543 b1e7 sched.o.before 42093 2740 144 44977 afb1 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* | | sched: fix fair sleepersIngo Molnar2008-03-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | Fair sleepers need to scale their latency target down by runqueue weight. Otherwise busy systems will gain ever larger sleep bonus. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* | | sched: fix overload performance: buddy wakeupsPeter Zijlstra2008-03-152-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we schedule to the leftmost task in the runqueue. When the runtimes are very short because of some server/client ping-pong, especially in over-saturated workloads, this will cycle through all tasks trashing the cache. Reduce cache trashing by keeping dependent tasks together by running newly woken tasks first. However, by not running the leftmost task first we could starve tasks because the wakee can gain unlimited runtime. Therefore we only run the wakee if its within a small (wakeup_granularity) window of the leftmost task. This preserves fairness, but does alternate server/client task groups. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | sched: fix calc_delta_mine()Ingo Molnar2008-03-151-1/+1
| | | | | | | | | | | | | | | | | | | | | lw->weight can be 0 for a short time during bootup. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* | | sched: fix update_load_add()/sub()Ingo Molnar2008-03-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Clear the cached inverse value when updating load. This is needed for calc_delta_mine() to work correctly when using the rq load. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* | | sched: min_vruntime fixPeter Zijlstra2008-03-151-18/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current min_vruntime tracking is incorrect and will cause serious problems when we don't run the leftmost task for some reason. min_vruntime does two things; 1) it's used to determine a forward direction when the u64 vruntime wraps, 2) it's used to track the leftmost vruntime to position newly enqueued tasks from. The current logic advances min_vruntime whenever the current task's vruntime advance. Because the current task may pass the leftmost task still waiting we're failing the second goal. This causes new tasks to be placed too far ahead and thus penalizes their runtime. Fix this by making min_vruntime the min_vruntime of the waiting tasks by tracking it in enqueue/dequeue, and compare against current's vruntime to obtain the absolute minimum when placing new tasks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | sched: fix race in schedule()Hiroshi Shimamoto2008-03-151-22/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a hard to trigger crash seen in the -rt kernel that also affects the vanilla scheduler. There is a race condition between schedule() and some dequeue/enqueue functions; rt_mutex_setprio(), __setscheduler() and sched_move_task(). When scheduling to idle, idle_balance() is called to pull tasks from other busy processor. It might drop the rq lock. It means that those 3 functions encounter on_rq=0 and running=1. The current task should be put when running. Here is a possible scenario: CPU0 CPU1 | schedule() | ->deactivate_task() | ->idle_balance() | -->load_balance_newidle() rt_mutex_setprio() | | --->double_lock_balance() *get lock *rel lock * on_rq=0, ruuning=1 | * sched_class is changed | *rel lock *get lock : | : ->put_prev_task_rt() ->pick_next_task_fair() => panic The current process of CPU1(P1) is scheduling. Deactivated P1, and the scheduler looks for another process on other CPU's runqueue because CPU1 will be idle. idle_balance(), load_balance_newidle() and double_lock_balance() are called and double_lock_balance() could drop the rq lock. On the other hand, CPU0 is trying to boost the priority of P1. The result of boosting only P1's prio and sched_class are changed to RT. The sched entities of P1 and P1's group are never put. It makes cfs_rq invalid, because the cfs_rq has curr and no leaf, but pick_next_task_fair() is called, then the kernel panics. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>