summaryrefslogtreecommitdiffstats
path: root/arch/ppc
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] kzalloc() conversion in arch/ppcEric Sesterhenn2006-03-175-10/+5
| | | | | | | | This converts arch/ppc to kzalloc usage. Crosscompile tested with allyesconfig. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
* Merge ../linux-2.6Paul Mackerras2006-03-092-72/+24
|\
| * powerpc: Fix various syscall/signal/swapcontext bugsPaul Mackerras2006-03-082-72/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A careful reading of the recent changes to the system call entry/exit paths revealed several problems, plus some things that could be simplified and improved: * 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit path, so it was only doing anything with it once it saw some other bit being set. In other words, the noerror behaviour would apply to the next system call where we had to reschedule or deliver a signal, which is not necessarily the current system call. * 32-bit wasn't doing the call to ptrace_notify in the syscall exit path when the _TIF_SINGLESTEP bit was set. * _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and _TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set by system calls. I took it out of _TIF_USER_WORK_MASK. * On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers to be restored (unless perhaps a signal was delivered or the syscall was traced or single-stepped). Thus the non-volatile registers weren't restored on exit from a signal handler. We probably got away with it mostly because signal handlers written in C wouldn't alter the non-volatile registers. * On 32-bit I simplified the code and made it more like 64-bit by making the syscall exit path jump to ret_from_except to handle preemption and signal delivery. * 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was set - but I think because of that 32-bit was actually restoring the non-volatile registers on exit from a signal handler. * I changed the order of enabling interrupts and saving the non-volatile registers before calling do_syscall_trace_leave; now we enable interrupts first. Signed-off-by: Paul Mackerras <paulus@samba.org>
* | Merge ../powerpc-mergePaul Mackerras2006-02-245-508/+5
|\|
| * [PATCH] powerpc: fix altivec_unavailable_exception OopsesAlan Curry2006-02-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | altivec_unavailable_exception is called without setting r3... it looks like the r3 that actually gets passed in as struct pt_regs *regs is the undisturbed value of r3 at the time the altivec instruction was encountered. The user actually gets to choose the pt_regs printed in the Oops! This fixes the oops by passing the correct pt_regs pointer to altivec_unavailable_exception. Signed-off-by: Alan Curry <pacman@TheWorld.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * [PATCH] ppc: fix adb breakage in xmonOlaf Hering2006-02-243-486/+3
| | | | | | | | | | | | | | | | Fix up xmon compilation after the last change. Remove lots of dead code, all the pmac and chrp support is in arch/powerpc Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * [PATCH] powerpc: remove duplicate exportsOlaf Hering2006-02-202-22/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A few symbols are exported twice, remove them from ppc_ksyms.c Remove users of sys_ctrler in arch/ppc/ WARNING: vmlinux: duplicate symbol '__delay' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol '__up' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol '__down' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol '__down_interruptible' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol 'sys_ctrler' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol 'strncat' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol 'strncmp' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol 'strchr' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol 'strrchr' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol 'strnlen' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol 'strpbrk' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol 'memscan' previous definition was in vmlinux WARNING: vmlinux: duplicate symbol 'strstr' previous definition was in vmlinux Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | [PATCH] powerpc: trivial: modify comments to refer to new location of filesJon Mason2006-02-10200-402/+1
| | | | | | | | | | | | | | | | | | This patch removes all self references and fixes references to files in the now defunct arch/ppc64 tree. I think this accomplises everything wanted, though there might be a few references I missed. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | [PATCH] ppc32: Make platform devices being able to assign functionsVitaly Bordug2006-02-101-3/+174
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implemented by modification of the .name field of the platform device, when PDs with the same names are to be used within different drivers, as <device_name> -> <device_name>:<function> Corresponding drivers should change the .name in struct device_driver to reflect upper of course. Added ppc_sys_device_disable/enable function set, making it easier to disable all the inexistent/not utilized platform device way pdevs. By the check of the "disabled" bit in the config field of ppc_sys_specs, disabled platform devices will be either added/removed from the bus, or simply not registered on it, depending on the time when disable/enable call asserted. The default behaviour when nothing is disabled/enabled will be "all devices are enabled", which is the same as before. Also helper platform_notify_map function added, making assignment of board-specific platform_info more consistent and generic. Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | [PATCH] powerpc/ppc: Add missing isyncs in head_fsl_booke.SBecky Bruce2006-02-101-0/+4
| | | | | | | | | | | | | | | | | | | | The e500 core reference manual indicates that isync is required after mtmsr(DE bit) and mtspr DBCR0. Add isyncs to make the code conform to the spec. Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | Merge ../powerpc-mergePaul Mackerras2006-02-1020-313/+34
|\|
| * ppc: Use the system call table from arch/powerpc/kernel/systbl.SPaul Mackerras2006-02-101-283/+0
| | | | | | | | | | | | | | With this, new system calls only have to be wired up in one place for ARCH=ppc and ARCH=powerpc, rather than 2. Signed-off-by: Paul Mackerras <paulus@samba.org>
| * Merge master.kernel.org:/home/rmk/linux-2.6-serialLinus Torvalds2006-02-0818-30/+30
| |\
| | * [SERIAL] uart_port flags member should use UPF_*Russell King2006-02-058-8/+8
| | | | | | | | | | | | | | | | | | Convert usage of ASYNC_* to UPF_*. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | * [SERIAL] uart_port iotype member should use UPIO_*Russell King2006-02-0518-22/+22
| | | | | | | | | | | | | | | | | | Convert usage of SERIAL_IO_* to UPIO_*. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-mergeLinus Torvalds2006-02-071-2/+5
| |\ \
| * | | [PATCH] ppc: last_task_.... is defined only on non-SMPAl Viro2006-02-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | ... so it should be exported only on non-SMP. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | [PATCH] fix breakage in ocp.cAl Viro2006-02-071-2/+2
| | |/ | |/| | | | | | | | | | | | | it's ocp_device_...., not ocp_driver_.... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpcPaul Mackerras2006-02-0832-249/+3243
|\ \ \
| * | | [PATCH] powerpc: Remove arch/ppc/syslib/ppc4xx_pm.cDomen Puncer2006-02-071-47/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove nowhere referenced file ("grep ppc4xx_pm -r ." didn't find anything). Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | [PATCH] ppc32: MPC885ADS, MPC866ADS and MPC8272ADS-specific platform stuff ↵Vitaly Bordug2006-02-078-1/+1067
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for fs_enet Added proper ppc_sys identification and fs_platform_info's for MPC 885ADS, 866ADS and 8272ADS, utilizing function assignment to remove/do not use platform devices which conflict with PD-incompatible drivers. Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | [PATCH] powerpc: Add ML403 defconfigGrant C. Likely2006-02-071-0/+740
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | [PATCH] powerpc: Add support for Xilinx ML403 reference designGrant C. Likely2006-02-078-4/+251
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Includes fix for Xilinx silicon errata 213 Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | [PATCH] powerpc: Add xparameters file for Xilinx ML403 reference designGrant C. Likely2006-02-071-0/+243
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | [PATCH] powerpc: Add ML300 defconfigGrant C. Likely2006-02-071-0/+739
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | [PATCH] powerpc: Migrate ML300 reference design to the platform busGrant C. Likely2006-02-074-25/+55
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | [PATCH] powerpc: Migrate Xilinx Vertex support from the OCP bus to the ↵Grant C. Likely2006-02-076-128/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | platfom bus. This patch only deals with the serial port definitions as there is no support for any other xilinx IP cores in the kernel tree at the moment. Board specific configuration moved out of virtex.[ch] and into the xparameters.h wrapper. This also prepares for the transition to the flattened device tree model. When the bootloader provides a device tree generated from an xparameters.h files, the kernel will no longer need xparameters/*. The platform bus will get populated with data from the device tree, and the device drivers will be automatically connected to the devices. Only the bootloader (or ppcboot) will need xparameters directly. Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | [PATCH] powerpc: Make Virtex-II Pro support generic for all Virtex devicesGrant C. Likely2006-02-077-11/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PPC405 hard core is used in both the Virtex-II Pro and Virtex 4 FX FPGAs. This patch cleans up the Virtex naming convention to reflect more than just the Virtex-II Pro. Rename files virtex-ii_pro.[ch] to virtex.[ch] Rename config value VIRTEX_II_PRO to XILINX_VIRTEX Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | [PATCH] powerpc: Move xparameters.h into xilinx virtex device specific pathGrant C. Likely2006-02-073-2/+20
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xparameters should not be needed by anything but virtex platform code. Move it from include/asm-ppc/ to platforms/4xx/xparameters/ This is preparing for work to remove xparameters from the dependancy tree for most c files. xparam changes should not cause a recompile of the world. Instead, drivers should get device info from the platform bus (populated by the boot code) Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | [PATCH] powerpc/8xx: last two 8MB D-TLB entries are incorrectly setMarcelo Tosatti2006-02-071-2/+5
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last two 8MB TLB entries are being incorrectly set by initial_mmu on 8xx. The first entry is written with the same virtual/physical address, which renders it invalid: BDI>rms 792 0x00001e00 BDI>rms 824 1 BDI>rds 824 SPR 824 : 0xc08000c0 -1065353024 BDI>rds 825 SPR 825 : 0xc0800de0 -1065349664 BDI>rds 826 SPR 826 : 0x00000000 0 And the second entry, in addition, does not have its TLB index set correctly. Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * [PATCH] mv643xx_eth: Fix for building as a moduleDale Farnsworth2006-01-271-2/+2
| | | | | | | | | | | | | | | | Enable mv643xx_eth driver to work when built as a module on mv64x60-based embedded systems. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
* | [PATCH] PPC32 8xx: support for the physmapped flash on m8xxVitaly Bordug2006-01-201-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | Implemented more correct way to support physmapped flash on m8xx than map in mtd. The areas intended to contain bootloader are protected readonly. Note that CFI and JEDEC stuff should be configured properly in order this to work, e.g. for 885/86x CFI should support 4-chip flash interleave. Also fixed compilation warning. Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | [PATCH] powerpc: generalize PPC44x_PIN_SIZEMarcelo Tosatti2006-01-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | The following patch generalizes PPC44x_PIN_SIZE by changing it to PPC_PIN_SIZE, which can be defined by any sub-arch to automatically adjust VMALLOC_START. Define PPC_PIN_SIZE on 8xx, avoiding potential conflicts with the pinned space. Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | [PATCH] ppc32 8xx: Added setbitsXX/clrbitsXX macro for read-modify-write ↵Vitaly Bordug2006-01-203-11/+11
|/ | | | | | | | | | | operations This adds setbitsXX/clrbitsXX macro for read-modify-write operations and converts the 8xx core and drivers to use them. Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [PATCH] e1000: Added disable packet split capabilityJesse Brandeburg2006-01-184-0/+4
| | | | | | | | | Adds the ability to disability packet split at compile time and use the legacy receive path on PCI express hardware. Made this a CONFIG option and modified the Kconfig, to reflect the new option. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: John Ronciak <john.ronciak@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
* powerpc/32: Restore previous version of 32-bit PCI codePaul Mackerras2006-01-151-1/+0
| | | | | | | | | When I removed the powermac support from arch/ppc/kernel/pci.c, I overlooked the fact that that file is used in 32-bit ARCH=powerpc builds. To prevent problems in future, restore the original version of that file as arch/powerpc/kernel/pci_32.c, and use that. Signed-off-by: Paul Mackerras <paulus@samba.org>
* [PATCH] ppc: Remove powermac support from ARCH=ppcPaul Mackerras2006-01-1533-10283/+195
| | | | | | | | | | | This makes it possible to build kernels for PReP and/or CHRP with ARCH=ppc by removing the (non-building) powermac support. It's now also possible to select PReP and CHRP independently. Powermac users should now build with ARCH=powerpc instead of ARCH=ppc. (This does mean that it is no longer possible to build a 32-bit kernel for a G5.) Signed-off-by: Paul Mackerras <paulus@samba.org>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds2006-01-141-2/+2
|\
| * [PATCH] Add ocp_bus_type probe and remove methodsRussell King2006-01-131-2/+2
| | | | | | | | | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | [PATCH] remove unused tmp_buf_sem'sAdrian Bunk2006-01-141-1/+0
| | | | | | | | | | | | | | | | tmp_buf_sem sems to be a common name for something completely unused... Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> ("usb portion") Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | ppc: Remove duplicate export of get_wchanPaul Mackerras2006-01-141-1/+0
| | | | | | | | | | | | | | | | | | | | The arch/powerpc version of process.c exports get_wchan itself. When I moved ARCH=ppc over to using arch/powerpc/kernel/process.c the get_wchan export in arch/ppc/kernel/ppc_ksyms.c became redundant, so remove it. Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 9871166ad692121d6b944159ef3f053570158ea8 commit)
* | [PATCH] powerpc/8xx: Use 8MB D-TLB's for kernel static mapping faultsMarcelo Tosatti2006-01-141-0/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following implements support for instantiation of 8MB D-TLB entries for the kernel direct virtual mapping on 8xx, thus reducing TLB space consumed for the kernel. Test used: writing 40MB from /dev/zero to file in ext2fs over RAMDISK. $ time dd if=/dev/zero of=file bs=4k count=10000 VANILLA 8MB kernel data pages real 0m11.485s real 0m11.267s user 0m0.218s user 0m0.250s sys 0m8.939s sys 0m9.108s real 0m11.518s real 0m10.978s user 0m0.203s user 0m0.222s sys 0m9.585s sys 0m9.138s real 0m11.554s real 0m10.967s user 0m0.228s user 0m0.222s sys 0m9.497s sys 0m9.127s real 0m11.633s real 0m11.286s user 0m0.214s user 0m0.196s sys 0m9.529s sys 0m9.134s and averages for both: real 11.54750 real 11.12450 Which is a 3.6% improvement in execution time. More improvement is expected for loads with larger kernel data footprint (real workloads). Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | [PATCH] powerpc: Updated platforms that use gianfar to match driverKumar Gala2006-01-139-55/+52
|/ | | | | | | | | The gianfar driver changed how it required MDIO bus and phy id's to be passed to it. Also, it no longer passes the physical address of the MDIO bus. Instead we have a proper platform device. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-mergeLinus Torvalds2006-01-122-852/+0
|\ | | | | | | | | | | | | Fix up delete/modify conflict of arch/ppc/kernel/process.c by hand (it's gone, gone, gone). Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * powerpc: make ARCH=ppc use arch/powerpc/kernel/process.cPaul Mackerras2006-01-122-852/+0
| | | | | | | | | | | | | | | | | | | | Commit 5388fb1025443ec223ba556b10efc4c5f83f8682 made signal_32.c use discard_lazy_cpu_state, which broke ARCH=ppc because that uses the common signal_32.c but has its own process.c. Make ARCH=ppc use the common process.c to fix this and to reduce the amount of duplicated code. Signed-off-by: Paul Mackerras <paulus@samba.org>
* | [PATCH] m68k: kill mach_floppy_setup, convert to proper __setup() in driversAl Viro2006-01-122-20/+0
| | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] m68k: namespace pollution fix (custom->amiga_custom)Al Viro2006-01-124-41/+41
| | | | | | | | | | | | | | | | | | | | in amigahw.h custom renamed to amiga_custom, in drivers with few instances the same replacement, in the rest - #define custom amiga_custom in driver itself Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] powerpc: task_stack_page()Al Viro2006-01-121-4/+4
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] powerpc: task_thread_info()Al Viro2006-01-122-3/+3
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] scheduler cache-hot-autodetectakpm@osdl.org2006-01-121-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ) From: Ingo Molnar <mingo@elte.hu> This is the latest version of the scheduler cache-hot-auto-tune patch. The first problem was that detection time scaled with O(N^2), which is unacceptable on larger SMP and NUMA systems. To solve this: - I've added a 'domain distance' function, which is used to cache measurement results. Each distance is only measured once. This means that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT distances 0 and 1, and on SMP distance 0 is measured. The code walks the domain tree to determine the distance, so it automatically follows whatever hierarchy an architecture sets up. This cuts down on the boot time significantly and removes the O(N^2) limit. The only assumption is that migration costs can be expressed as a function of domain distance - this covers the overwhelming majority of existing systems, and is a good guess even for more assymetric systems. [ People hacking systems that have assymetries that break this assumption (e.g. different CPU speeds) should experiment a bit with the cpu_distance() function. Adding a ->migration_distance factor to the domain structure would be one possible solution - but lets first see the problem systems, if they exist at all. Lets not overdesign. ] Another problem was that only a single cache-size was used for measuring the cost of migration, and most architectures didnt set that variable up. Furthermore, a single cache-size does not fit NUMA hierarchies with L3 caches and does not fit HT setups, where different CPUs will often have different 'effective cache sizes'. To solve this problem: - Instead of relying on a single cache-size provided by the platform and sticking to it, the code now auto-detects the 'effective migration cost' between two measured CPUs, via iterating through a wide range of cachesizes. The code searches for the maximum migration cost, which occurs when the working set of the test-workload falls just below the 'effective cache size'. I.e. real-life optimized search is done for the maximum migration cost, between two real CPUs. This, amongst other things, has the positive effect hat if e.g. two CPUs share a L2/L3 cache, a different (and accurate) migration cost will be found than between two CPUs on the same system that dont share any caches. (The reliable measurement of migration costs is tricky - see the source for details.) Furthermore i've added various boot-time options to override/tune migration behavior. Firstly, there's a blanket override for autodetection: migration_cost=1000,2000,3000 will override the depth 0/1/2 values with 1msec/2msec/3msec values. Secondly, there's a global factor that can be used to increase (or decrease) the autodetected values: migration_factor=120 will increase the autodetected values by 20%. This option is useful to tune things in a workload-dependent way - e.g. if a workload is cache-insensitive then CPU utilization can be maximized by specifying migration_factor=0. I've tested the autodetection code quite extensively on x86, on 3 P3/Xeon/2MB, and the autodetected values look pretty good: Dual Celeron (128K L2 cache): --------------------- migration cost matrix (max_cache_size: 131072, cpu: 467 MHz): --------------------- [00] [01] [00]: - 1.7(1) [01]: 1.7(1) - --------------------- cacheflush times [2]: 0.0 (0) 1.7 (1784008) --------------------- Here the slow memory subsystem dominates system performance, and even though caches are small, the migration cost is 1.7 msecs. Dual HT P4 (512K L2 cache): --------------------- migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz): --------------------- [00] [01] [02] [03] [00]: - 0.4(1) 0.0(0) 0.4(1) [01]: 0.4(1) - 0.4(1) 0.0(0) [02]: 0.0(0) 0.4(1) - 0.4(1) [03]: 0.4(1) 0.0(0) 0.4(1) - --------------------- cacheflush times [2]: 0.0 (33900) 0.4 (448514) --------------------- Here it can be seen that there is no migration cost between two HT siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory system makes inter-physical-CPU migration pretty cheap: 0.4 msecs. 8-way P3/Xeon [2MB L2 cache]: --------------------- migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz): --------------------- [00] [01] [02] [03] [04] [05] [06] [07] [00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) [04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) [05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) [06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) [07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - --------------------- cacheflush times [2]: 0.0 (0) 19.2 (19281756) --------------------- This one has huge caches and a relatively slow memory subsystem - so the migration cost is 19 msecs. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: <wilder@us.ibm.com> Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>