summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'omap-fixes' of ↵Russell King2009-06-2514-22/+108
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
| * OMAP: Fix IOMEM macro for assemblyTony Lindgren2009-06-231-1/+1
| | | | | | | | | | | | Otherwise IOMEM calculations can fail. Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP2/3: Initialize gpio debounce registerjanboe2009-06-231-0/+1
| | | | | | | | | | | | | | | | | | | | Some bootloader may initialize debounce register and this will make dbclk not consist with the debounce register after linux kernel boot up. Signed-off-by: janboe <janboe.ye@gmail.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP: IOMMU: function flush_iotlb_page is not flushing correct entryFernando Guzman Lugo2009-06-231-1/+1
| | | | | | | | | | | | | | | | | | The function flush_iotlb_page is not loading the CAM register with the correct entry to be flushed, so it is flushing other entry Signed-off-by: Fernando Guzman Lugo <x0095840@ti.com> Signed-off-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP3: RX51: Use OneNAND sync read / writeAdrian Hunter2009-06-231-0/+1
| | | | | | | | | | | | | | Use OneNAND sync read / write Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP2/3: gpmc-onenand: correct use of async timingsAdrian Hunter2009-06-231-2/+19
| | | | | | | | | | | | | | | | | | | | Use async timings when sync timings are not requested. Also ensure that OneNAND is in async mode when async timings are used. Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP3: DMA: Enable idlemodes for DMA OCPKalle Jokiniemi2009-06-232-0/+28
| | | | | | | | | | | | | | | | | | This patch enables MStandby smart-idle mode, autoidle smartidle mode, and the autoidle bit for DMA4_OCP_SYSCONFIG. Signed-off-by: Kalle Jokiniemi <ext-kalle.jokiniemi@nokia.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Kevin Hilman <khilman@ti.deeprootsystems.com>
| * OMAP3: SRAM size fix for HS/EMU devicesTero Kristo2009-06-231-1/+6
| | | | | | | | | | | | | | | | SRAM size fix for HS/EMU devices Signed-off-by: Tero Kristo <tero.kristo@nokia.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP2/3: Add omap_type() for determining GP/EMU/HSKevin Hilman2009-06-232-11/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The omap_type() function is added and returns the DEVICETYPE field of the CONTROL_STATUS register. The result can be used for conditional code based on whether device is GP (general purpose), EMU or HS (high security). Also move the type defines so omap1 code compile does not require ifdefs for sections using these defines. This code is needed for the following fix to set the SRAM size correctly for HS omaps. Also at least PM and watchdog code will need this function. Signed-off-by: Kevin Hilman <khilman@ti.deeprootsystems.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP2/3: omap mailbox: platform_get_irq() error ignoredRoel Kluin2009-06-231-3/+3
| | | | | | | | | | | | | | | | | | platform_get_irq may return -ENXIO. but struct omap_mbox mbox_dsp_info.irq is unsigned, so the error was not noticed. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP2/3: mmc-twl4030: use correct controller in twl_mmc23_set_powerGrazvydas Ignotas2009-06-231-1/+12
| | | | | | | | | | | | | | | | | | twl_mmc23_set_power() has MMC2 twl_mmc_controller hardcoded in it, which breaks MMC3. Find the right controller to use instead. Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP1: remove duplicated #includeHuang Weiyi2009-06-231-1/+0
| | | | | | | | | | | | | | Remove duplicated #include in arch/arm/mach-omap1/board-nokia770.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP1: Fix N770 MMC supportAndrew de Quincey2009-06-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of the N770's MMC configuration options seem to have been dropped. This patch adds them back in again. Note that only the .ocr_mask change was /critical/, but I've added the .max_freq setting back as well, as the original sources had it. Can anyone confirm if this is unnecessary? Secondly, there is support in the original code for a 4wire/higher speed mode. As I don't have the requisite N770 hardware (I think it was a rev2 N770?) to test this, I can't really add it back. Signed-off-by: Andrew de Quincey <adq@lidskialf.net> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * OMAP1: Fix compilation of arch/arm/mach-omap1/mailbox.cJonathan McDowell2009-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | This fixes the positioning of " in MODULE_AUTHOR, which is currently causing a build failure on latest git with CONFIG_OMAP_MBOX_FWK=m; the original breakage appears to date from the end of last year in a5abbbe52b7e89a7633319c5417bd4331f7ac8ed Signed-Off-By: Jonathan McDowell <noodles@earth.li> Acked-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
* | Linux 2.6.31-rc1v2.6.31-rc1Linus Torvalds2009-06-241-2/+2
| |
* | Revert "PCI: use ACPI _CRS data by default"Linus Torvalds2009-06-245-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 9e9f46c44e487af0a82eb61b624553e2f7118f5b. Quoting from the commit message: "At this point, it seems to solve more problems than it causes, so let's try using it by default. It's an easy revert if it ends up causing trouble." And guess what? The _CRS code causes trouble. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.infradead.org/battery-2.6Linus Torvalds2009-06-248-17/+420
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | * git://git.infradead.org/battery-2.6: da9030_battery: Fix race between event handler and monitor Add MAX17040 Fuel Gauge driver w1: ds2760_battery: add support for sleep mode feature w1: ds2760: add support for EEPROM read and write ds2760_battery: cleanups in ds2760_battery_probe()
| * | da9030_battery: Fix race between event handler and monitorMike Rapoport2009-06-091-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are cases when charging monitor and the event handler try to change the charger state simultaneously. For instance, a charger is connected to the system, there's the detection event and the event handler tries to enable charging. It is possible that the periodic charging monitor runs at the same time and it still thinks there's no external charger. So it tries to disable the charging. As the result, even if the conditions necessary to charge the battery hold, there will be no actual charging. The patch changes the event handler so that instead of enabling/ disabling the charger immediately it would rather make the monitor run. The monitor code then decides what should be the charger state. Signed-off-by: Mike Rapoport <mike@compulab.co.il> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
| * | Add MAX17040 Fuel Gauge driverMinkyu Kang2009-06-094-1/+338
| | | | | | | | | | | | | | | | | | | | | | | | The MAX17040 is a I2C interfaced Fuel Gauge systems for lithium-ion batteries This patch adds support the MAX17040 Fuel Gauge Signed-off-by: Minkyu Kang <mk7.kang@samsung.com> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
| * | w1: ds2760_battery: add support for sleep mode featureDaniel Mack2009-06-082-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for ds2760's sleep mode feature. With this feature enabled, the chip enters a deep sleep mode and disconnects from the battery when the w1 line is held down for more than 2 seconds. This new behaviour can be switched on and off using a new module parameter. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu> Acked-by: Matt Reimer <mreimer@vpop.net> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
| * | w1: ds2760: add support for EEPROM read and writeDaniel Mack2009-06-082-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to modify the DS2762's status registers and to add support for sleep mode, there is need for functions to write the internal EEPROM. Signed-off-by: Daniel Mack <daniel@caiaq.de> Acked-by: Matt Reimer <mreimer@vpop.net> Acked-by: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
| * | ds2760_battery: cleanups in ds2760_battery_probe()Daniel Mack2009-06-081-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removed struct ds2760_platform_data which wasn't defined anywhere. Indentation cleanups. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu> Acked-by: Matt Reimer <mreimer@vpop.net> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
| | |
| \ \
*-. \ \ Merge branches 'for-linus' of ↵Linus Torvalds2009-06-246-18/+23
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/{vfs-2.6,audit-current} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: another race fix in jfs_check_acl() Get "no acls for this inode" right, fix shmem breakage inline functions left without protection of ifdef (acl) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL
| | * | | audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALLEric Paris2009-06-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even though one cannot make use of the audit watch code without CONFIG_AUDIT_SYSCALL the spaghetti nature of the audit code means that the audit rule filtering requires that it at least be compiled. Thus build the audit_watch code when we build auditfilter like it was before cfcad62c74abfef83762dc05a556d21bdf3980a2 Clearly this is a point of potential future cleanup.. Reported-by: Frans Pop <elendil@planet.nl> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | another race fix in jfs_check_acl()Al Viro2009-06-241-6/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | Get "no acls for this inode" right, fix shmem breakageAl Viro2009-06-244-10/+13
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | inline functions left without protection of ifdef (acl)Markus Trippelsdorf2009-06-241-1/+2
| |/ / / | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge branch 'futexes-for-linus' of ↵Linus Torvalds2009-06-241-21/+24
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Fix the write access fault problem for real
| * | | | futex: Fix the write access fault problem for realThomas Gleixner2009-06-241-21/+24
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 64d1304a64 (futex: setup writeable mapping for futex ops which modify user space data) did address only half of the problem of write access faults. The patch was made on two wrong assumptions: 1) access_ok(VERIFY_WRITE,...) would actually check write access. On x86 it does _NOT_. It's a pure address range check. 2) a RW mapped region can not go away under us. That's wrong as well. Nobody can prevent another thread to call mprotect(PROT_READ) on that region where the futex resides. If that call hits between the get_user_pages_fast() verification and the actual write access in the atomic region we are toast again. The solution is to not rely on access_ok and get_user() for any write access related fault on private and shared futexes. Instead we need to fault it in with verification of write access. There is no generic non destructive write mechanism which would fault the user page in trough a #PF, but as we already know that we will fault we can as well call get_user_pages() directly and avoid the #PF overhead. If get_user_pages() returns -EFAULT we know that we can not fix it anymore and need to bail out to user space. Remove a bunch of confusing comments on this issue as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
* | | | SLUB: Don't pass __GFP_FAIL for the initial allocationPekka Enberg2009-06-241-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SLUB uses higher order allocations by default but falls back to small orders under memory pressure. Make sure the GFP mask used in the initial allocation doesn't include __GFP_NOFAIL. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Don't warn about order-1 allocations with __GFP_NOFAILLinus Torvalds2009-06-241-2/+2
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally, we never failed small orders (even regardless of any __GFP_NOFAIL flags), and slab will allocate order-1 allocations even for small allocations that could fit in a single page (in order to avoid excessive fragmentation). Maybe we should remove this warning entirely, but before making that judgement, at least limit it to bigger allocations. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds2009-06-2463-492/+996
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: Staging: octeon-ethernet: Fix race freeing transmit buffers. Staging: octeon-ethernet: Convert to use net_device_ops. MIPS: Cavium: Add CPU hotplugging code. MIPS: SMP: Allow suspend and hibernation if CPU hotplug is available MIPS: Add arch generic CPU hotplug DMA: txx9dmac: use dma_unmap_single if DMA_COMPL_{SRC,DEST}_UNMAP_SINGLE set MIPS: Sibyte: Fix build error if CONFIG_SERIAL_SB1250_DUART is undefined. MIPS: MIPSsim: Fix build error if MSC01E_INT_BASE is undefined. MIPS: Hibernation: Remove SMP TLB and cacheflushing code. MIPS: Build fix - include <linux/smp.h> into all smp_processor_id() users. MIPS: bug.h Build fix - include <linux/compiler.h>.
| * | | Staging: octeon-ethernet: Fix race freeing transmit buffers.David Daney2009-06-244-66/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing code had the following race: Thread-1 Thread-2 inc/read in_use inc/read in_use inc tx_free_list[qos].len inc tx_free_list[qos].len The actual in_use value was incremented twice, but thread-1 is going to free memory based on its stale value, and will free one too many times. The result is that memory is freed back to the kernel while its packet is still in the transmit buffer. If the memory is overwritten before it is transmitted, the hardware will put a valid checksum on it and send it out (just like it does with good packets). If by chance the TCP flags are clobbered but not the addresses or ports, the result can be a broken TCP stream. The fix is to track the number of freed packets in a single location (a Fetch-and-Add Unit register). That way it can never get out of sync with itself. We try to free up to MAX_SKB_TO_FREE (currently 10) buffers at a time. If fewer are available we adjust the free count with the difference. The action of claiming buffers to free is atomic so two threads cannot claim the same buffers. Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | Staging: octeon-ethernet: Convert to use net_device_ops.David Daney2009-06-2410-396/+390
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the driver to use net_device_ops as it is now mandatory. Also compensate for the removal of struct sk_buff's dst field. The changes are mostly mechanical, the content of ethernet-common.c was moved to ethernet.c and ethernet-common.{c,h} are removed. Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Cavium: Add CPU hotplugging code.Ralf Baechle2009-06-245-2/+365
| | | | | | | | | | | | | | | | | | | | | | | | Thanks to Cavium Inc. for the code contribution and help. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: SMP: Allow suspend and hibernation if CPU hotplug is availableRalf Baechle2009-06-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SMP implementation of suspend and hibernate depends on CPU hotplugging. In the past we didn't have CPU hotplug so suspend and hibernation were not possible on SMP systems. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Add arch generic CPU hotplugRalf Baechle2009-06-247-7/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each platform has to add support for CPU hotplugging itself by providing suitable definitions for the cpu_disable and cpu_die of the smp_ops methods and setting SYS_SUPPORTS_HOTPLUG_CPU. A platform should only set SYS_SUPPORTS_HOTPLUG_CPU once all it's smp_ops definitions have the necessary changes. This patch contains the changes to the dummy smp_ops definition for uni-processor systems. Parts of the code contributed by Cavium Inc. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | DMA: txx9dmac: use dma_unmap_single if DMA_COMPL_{SRC,DEST}_UNMAP_SINGLE setAtsushi Nemoto2009-06-241-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch does not change actual behaviour since dma_unmap_page is just an alias of dma_unmap_single on MIPS. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Ralf Baechle <ralf@linux-mips.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Sibyte: Fix build error if CONFIG_SERIAL_SB1250_DUART is undefined.Ralf Baechle2009-06-241-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes kernel.org bugzilla 13596, see http://bugzilla.kernel.org/show_bug.cgi?id=13596 Reported-by: dvice_null@yahoo.com Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: MIPSsim: Fix build error if MSC01E_INT_BASE is undefined.Ralf Baechle2009-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes kernel.org bugzilla 13595, see http://bugzilla.kernel.org/show_bug.cgi?id=13595 Reported-by: dvice_null@yahoo.com Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Hibernation: Remove SMP TLB and cacheflushing code.Ralf Baechle2009-06-241-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't perform any flushes on SMP from swsusp_arch_resume because interrupts are disabled. A cross-CPU flush is unnecessary anyway because all but the local CPU have already been disabled. A local flush is not needed either because we didn't change any mappings. So just delete the code. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Build fix - include <linux/smp.h> into all smp_processor_id() users.Ralf Baechle2009-06-2439-1/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of the were relying into smp.h being dragged in by another header which of course is fragile. <asm/cpu-info.h> uses smp_processor_id() only in macros and including smp.h there leads to an include loop, so don't change cpu-info.h. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: bug.h Build fix - include <linux/compiler.h>.Ralf Baechle2009-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | In the past this file somehow used to be dragged in. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dmLinus Torvalds2009-06-2435-435/+3993
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (48 commits) dm mpath: change to be request based dm: disable interrupt when taking map_lock dm: do not set QUEUE_ORDERED_DRAIN if request based dm: enable request based option dm: prepare for request based option dm raid1: add userspace log dm: calculate queue limits during resume not load dm log: fix create_log_context to use logical_block_size of log device dm target:s introduce iterate devices fn dm table: establish queue limits by copying table limits dm table: replace struct io_restrictions with struct queue_limits dm table: validate device logical_block_size dm table: ensure targets are aligned to logical_block_size dm ioctl: support cookies for udev dm: sysfs add suspended attribute dm table: improve warning message when devices not freed before destruction dm mpath: add service time load balancer dm mpath: add queue length load balancer dm mpath: add start_io and nr_bytes to path selectors dm snapshot: use barrier when writing exception store ...
| * | | | dm mpath: change to be request basedKiyoshi Ueda2009-06-221-65/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch converts dm-multipath target to request-based from bio-based. Basically, the patch just converts the I/O unit from struct bio to struct request. In the course of the conversion, it also changes the I/O queueing mechanism. The change in the I/O queueing is described in details as follows. I/O queueing mechanism change ----------------------------- In I/O submission, map_io(), there is no mechanism change from bio-based, since the clone request is ready for retry as it is. However, in I/O complition, do_end_io(), there is a mechanism change from bio-based, since the clone request is not ready for retry. In do_end_io() of bio-based, the clone bio has all needed memory for resubmission. So the target driver can queue it and resubmit it later without memory allocations. The mechanism has almost no overhead. On the other hand, in do_end_io() of request-based, the clone request doesn't have clone bios, so the target driver can't resubmit it as it is. To resubmit the clone request, memory allocation for clone bios is needed, and it takes some overheads. To avoid the overheads just for queueing, the target driver doesn't queue the clone request inside itself. Instead, the target driver asks dm core for queueing and remapping the original request of the clone request, since the overhead for queueing is just a freeing memory for the clone request. As a result, the target driver doesn't need to record/restore the information of the original request for resubmitting the clone request. So dm_bio_details in dm_mpath_io is removed. multipath_busy() --------------------- The target driver returns "busy", only when the following case: o The target driver will map I/Os, if map() function is called and o The mapped I/Os will wait on underlying device's queue due to their congestions, if map() function is called now. In other cases, the target driver doesn't return "busy". Otherwise, dm core will keep the I/Os and the target driver can't do what it wants. (e.g. the target driver can't map I/Os now, so wants to kill I/Os.) Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * | | | dm: disable interrupt when taking map_lockKiyoshi Ueda2009-06-221-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch disables interrupt when taking map_lock to avoid lockdep warnings in request-based dm. request-based dm takes map_lock after taking queue_lock with disabling interrupt: spin_lock_irqsave(queue_lock) q->request_fn() == dm_request_fn() => dm_get_table() => read_lock(map_lock) while queue_lock could be (but isn't) taken in interrupt context. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Christof Schmitt <christof.schmitt@de.ibm.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * | | | dm: do not set QUEUE_ORDERED_DRAIN if request basedKiyoshi Ueda2009-06-223-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Request-based dm doesn't have barrier support yet. So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm. Since the device type is decided at the first table loading time, the flag set is deferred until then. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * | | | dm: enable request based optionKiyoshi Ueda2009-06-224-26/+285
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables request-based dm. o Request-based dm and bio-based dm coexist, since there are some target drivers which are more fitting to bio-based dm. Also, there are other bio-based devices in the kernel (e.g. md, loop). Since bio-based device can't receive struct request, there are some limitations on device stacking between bio-based and request-based. type of underlying device bio-based request-based ---------------------------------------------- bio-based OK OK request-based -- OK The device type is recognized by the queue flag in the kernel, so dm follows that. o The type of a dm device is decided at the first table binding time. Once the type of a dm device is decided, the type can't be changed. o Mempool allocations are deferred to at the table loading time, since mempools for request-based dm are different from those for bio-based dm and needed mempool type is fixed by the type of table. o Currently, request-based dm supports only tables that have a single target. To support multiple targets, we need to support request splitting or prevent bio/request from spanning multiple targets. The former needs lots of changes in the block layer, and the latter needs that all target drivers support merge() function. Both will take a time. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * | | | dm: prepare for request based optionKiyoshi Ueda2009-06-224-4/+725
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds core functions for request-based dm. When struct mapped device (md) is initialized, md->queue has an I/O scheduler and the following functions are used for request-based dm as the queue functions: make_request_fn: dm_make_request() pref_fn: dm_prep_fn() request_fn: dm_request_fn() softirq_done_fn: dm_softirq_done() lld_busy_fn: dm_lld_busy() Actual initializations are done in another patch (PATCH 2). Below is a brief summary of how request-based dm behaves, including: - making request from bio - cloning, mapping and dispatching request - completing request and bio - suspending md - resuming md bio to request ============== md->queue->make_request_fn() (dm_make_request()) calls __make_request() for a bio submitted to the md. Then, the bio is kept in the queue as a new request or merged into another request in the queue if possible. Cloning and Mapping =================== Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()), when requests are dispatched after they are sorted by the I/O scheduler. dm_request_fn() checks busy state of underlying devices using target's busy() function and stops dispatching requests to keep them on the dm device's queue if busy. It helps better I/O merging, since no merge is done for a request once it is dispatched to underlying devices. Actual cloning and mapping are done in dm_prep_fn() and map_request() called from dm_request_fn(). dm_prep_fn() clones not only request but also bios of the request so that dm can hold bio completion in error cases and prevent the bio submitter from noticing the error. (See the "Completion" section below for details.) After the cloning, the clone is mapped by target's map_rq() function and inserted to underlying device's queue using blk_insert_cloned_request(). Completion ========== Request completion can be hooked by rq->end_io(), but then, all bios in the request will have been completed even error cases, and the bio submitter will have noticed the error. To prevent the bio completion in error cases, request-based dm clones both bio and request and hooks both bio->bi_end_io() and rq->end_io(): bio->bi_end_io(): end_clone_bio() rq->end_io(): end_clone_request() Summary of the request completion flow is below: blk_end_request() for a clone request => blk_update_request() => bio->bi_end_io() == end_clone_bio() for each clone bio => Free the clone bio => Success: Complete the original bio (blk_update_request()) Error: Don't complete the original bio => blk_finish_request() => rq->end_io() == end_clone_request() => blk_complete_request() => dm_softirq_done() => Free the clone request => Success: Complete the original request (blk_end_request()) Error: Requeue the original request end_clone_bio() completes the original request on the size of the original bio in successful cases. Even if all bios in the original request are completed by that completion, the original request must not be completed yet to keep the ordering of request completion for the stacking. So end_clone_bio() uses blk_update_request() instead of blk_end_request(). In error cases, end_clone_bio() doesn't complete the original bio. It just frees the cloned bio and gives over the error handling to end_clone_request(). end_clone_request(), which is called with queue lock held, completes the clone request and the original request in a softirq context (dm_softirq_done()), which has no queue lock, to avoid a deadlock issue on submission of another request during the completion: - The submitted request may be mapped to the same device - Request submission requires queue lock, but the queue lock has been held by itself and it doesn't know that The clone request has no clone bio when dm_softirq_done() is called. So target drivers can't resubmit it again even error cases. Instead, they can ask dm core for requeueing and remapping the original request in that cases. suspend ======= Request-based dm uses stopping md->queue as suspend of the md. For noflush suspend, just stops md->queue. For flush suspend, inserts a marker request to the tail of md->queue. And dispatches all requests in md->queue until the marker comes to the front of md->queue. Then, stops dispatching request and waits for the all dispatched requests to complete. After that, completes the marker request, stops md->queue and wake up the waiter on the suspend queue, md->wait. resume ====== Starts md->queue. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * | | | dm raid1: add userspace logJonthan Brassow2009-06-229-1/+1448
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains a device-mapper mirror log module that forwards requests to userspace for processing. The structures used for communication between kernel and userspace are located in include/linux/dm-log-userspace.h. Due to the frequency, diversity, and 2-way communication nature of the exchanges between kernel and userspace, 'connector' was chosen as the interface for communication. The first log implementations written in userspace - "clustered-disk" and "clustered-core" - support clustered shared storage. A userspace daemon (in the LVM2 source code repository) uses openAIS/corosync to process requests in an ordered fashion with the rest of the nodes in the cluster so as to prevent log state corruption. Other implementations with no association to LVM or openAIS/corosync, are certainly possible. (Imagine if two machines are writing to the same region of a mirror. They would both mark the region dirty, but you need a cluster-aware entity that can handle properly marking the region clean when they are done. Otherwise, you might clear the region when the first machine is done, not the second.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Alasdair G Kergon <agk@redhat.com>