summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-block-rssd12
-rw-r--r--Documentation/ABI/testing/sysfs-bus-fcoe77
-rw-r--r--Documentation/ABI/testing/sysfs-bus-i2c-devices-lm353315
-rw-r--r--Documentation/ABI/testing/sysfs-bus-rbd4
-rw-r--r--Documentation/ABI/testing/sysfs-class-backlight-driver-lm353348
-rw-r--r--Documentation/ABI/testing/sysfs-class-led-driver-lm353365
-rw-r--r--Documentation/ABI/testing/sysfs-class-mtd51
-rw-r--r--Documentation/CodingStyle16
-rw-r--r--Documentation/DocBook/mtdnand.tmpl2
-rw-r--r--Documentation/arm/OMAP/DSS46
-rw-r--r--Documentation/cgroups/memory.txt37
-rw-r--r--Documentation/cgroups/resource_counter.txt8
-rw-r--r--Documentation/device-mapper/thin-provisioning.txt11
-rw-r--r--Documentation/devicetree/bindings/gpio/gpio-mm-lantiq.txt38
-rw-r--r--Documentation/devicetree/bindings/gpio/gpio-stp-xway.txt42
-rw-r--r--Documentation/devicetree/bindings/iommu/nvidia,tegra20-gart.txt14
-rw-r--r--Documentation/devicetree/bindings/mfd/da9052-i2c.txt60
-rw-r--r--Documentation/devicetree/bindings/mfd/tps65910.txt133
-rw-r--r--Documentation/devicetree/bindings/mfd/twl6040.txt62
-rw-r--r--Documentation/devicetree/bindings/mtd/gpmi-nand.txt33
-rw-r--r--Documentation/devicetree/bindings/mtd/mxc-nand.txt19
-rw-r--r--Documentation/devicetree/bindings/rtc/lpc32xx-rtc.txt15
-rw-r--r--Documentation/devicetree/bindings/rtc/spear-rtc.txt17
-rw-r--r--Documentation/feature-removal-schedule.txt6
-rw-r--r--Documentation/filesystems/Locking5
-rw-r--r--Documentation/filesystems/proc.txt25
-rw-r--r--Documentation/filesystems/vfs.txt17
-rw-r--r--Documentation/i2c/functionality9
-rw-r--r--Documentation/i2c/i2c-protocol9
-rw-r--r--Documentation/kernel-parameters.txt15
-rw-r--r--Documentation/leds/ledtrig-transient.txt152
-rw-r--r--Documentation/power/charger-manager.txt41
-rw-r--r--Documentation/power/power_supply_class.txt2
-rw-r--r--Documentation/sysctl/fs.txt7
-rw-r--r--Documentation/vm/frontswap.txt278
-rw-r--r--Documentation/vm/pagemap.txt2
-rw-r--r--Documentation/vm/slub.txt2
-rw-r--r--Documentation/vm/transhuge.txt62
-rw-r--r--Documentation/watchdog/watchdog-kernel-api.txt43
-rw-r--r--Documentation/watchdog/watchdog-parameters.txt5
-rw-r--r--Documentation/x86/efi-stub.txt65
41 files changed, 1525 insertions, 45 deletions
diff --git a/Documentation/ABI/testing/sysfs-block-rssd b/Documentation/ABI/testing/sysfs-block-rssd
index d535757799fe..679ce3543122 100644
--- a/Documentation/ABI/testing/sysfs-block-rssd
+++ b/Documentation/ABI/testing/sysfs-block-rssd
@@ -6,13 +6,21 @@ Description: This is a read-only file. Dumps below driver information and
hardware registers.
- S ACTive
- Command Issue
- - Allocated
- Completed
- PORT IRQ STAT
- HOST IRQ STAT
+ - Allocated
+ - Commands in Q
What: /sys/block/rssd*/status
Date: April 2012
KernelVersion: 3.4
Contact: Asai Thambi S P <asamymuthupa@micron.com>
-Description: This is a read-only file. Indicates the status of the device.
+Description: This is a read-only file. Indicates the status of the device.
+
+What: /sys/block/rssd*/flags
+Date: May 2012
+KernelVersion: 3.5
+Contact: Asai Thambi S P <asamymuthupa@micron.com>
+Description: This is a read-only file. Dumps the flags in port and driver
+ data structure
diff --git a/Documentation/ABI/testing/sysfs-bus-fcoe b/Documentation/ABI/testing/sysfs-bus-fcoe
new file mode 100644
index 000000000000..469d09c02f6b
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-fcoe
@@ -0,0 +1,77 @@
+What: /sys/bus/fcoe/ctlr_X
+Date: March 2012
+KernelVersion: TBD
+Contact: Robert Love <robert.w.love@intel.com>, devel@open-fcoe.org
+Description: 'FCoE Controller' instances on the fcoe bus
+Attributes:
+
+ fcf_dev_loss_tmo: Device loss timeout peroid (see below). Changing
+ this value will change the dev_loss_tmo for all
+ FCFs discovered by this controller.
+
+ lesb_link_fail: Link Error Status Block (LESB) link failure count.
+
+ lesb_vlink_fail: Link Error Status Block (LESB) virtual link
+ failure count.
+
+ lesb_miss_fka: Link Error Status Block (LESB) missed FCoE
+ Initialization Protocol (FIP) Keep-Alives (FKA).
+
+ lesb_symb_err: Link Error Status Block (LESB) symbolic error count.
+
+ lesb_err_block: Link Error Status Block (LESB) block error count.
+
+ lesb_fcs_error: Link Error Status Block (LESB) Fibre Channel
+ Serivces error count.
+
+Notes: ctlr_X (global increment starting at 0)
+
+What: /sys/bus/fcoe/fcf_X
+Date: March 2012
+KernelVersion: TBD
+Contact: Robert Love <robert.w.love@intel.com>, devel@open-fcoe.org
+Description: 'FCoE FCF' instances on the fcoe bus. A FCF is a Fibre Channel
+ Forwarder, which is a FCoE switch that can accept FCoE
+ (Ethernet) packets, unpack them, and forward the embedded
+ Fibre Channel frames into a FC fabric. It can also take
+ outbound FC frames and pack them in Ethernet packets to
+ be sent to their destination on the Ethernet segment.
+Attributes:
+
+ fabric_name: Identifies the fabric that the FCF services.
+
+ switch_name: Identifies the FCF.
+
+ priority: The switch's priority amongst other FCFs on the same
+ fabric.
+
+ selected: 1 indicates that the switch has been selected for use;
+ 0 indicates that the swich will not be used.
+
+ fc_map: The Fibre Channel MAP
+
+ vfid: The Virtual Fabric ID
+
+ mac: The FCF's MAC address
+
+ fka_peroid: The FIP Keep-Alive peroid
+
+ fabric_state: The internal kernel state
+ "Unknown" - Initialization value
+ "Disconnected" - No link to the FCF/fabric
+ "Connected" - Host is connected to the FCF
+ "Deleted" - FCF is being removed from the system
+
+ dev_loss_tmo: The device loss timeout peroid for this FCF.
+
+Notes: A device loss infrastructre similar to the FC Transport's
+ is present in fcoe_sysfs. It is nice to have so that a
+ link flapping adapter doesn't continually advance the count
+ used to identify the discovered FCF. FCFs will exist in a
+ "Disconnected" state until either the timer expires and the
+ FCF becomes "Deleted" or the FCF is rediscovered and becomes
+ "Connected."
+
+
+Users: The first user of this interface will be the fcoeadm application,
+ which is commonly packaged in the fcoe-utils package.
diff --git a/Documentation/ABI/testing/sysfs-bus-i2c-devices-lm3533 b/Documentation/ABI/testing/sysfs-bus-i2c-devices-lm3533
new file mode 100644
index 000000000000..1b62230b33b9
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-i2c-devices-lm3533
@@ -0,0 +1,15 @@
+What: /sys/bus/i2c/devices/.../output_hvled[n]
+Date: April 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Set the controlling backlight device for high-voltage current
+ sink HVLED[n] (n = 1, 2) (0, 1).
+
+What: /sys/bus/i2c/devices/.../output_lvled[n]
+Date: April 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Set the controlling led device for low-voltage current sink
+ LVLED[n] (n = 1..5) (0..3).
diff --git a/Documentation/ABI/testing/sysfs-bus-rbd b/Documentation/ABI/testing/sysfs-bus-rbd
index dbedafb095e2..bcd88eb7ebcd 100644
--- a/Documentation/ABI/testing/sysfs-bus-rbd
+++ b/Documentation/ABI/testing/sysfs-bus-rbd
@@ -65,11 +65,11 @@ snap_*
Entries under /sys/bus/rbd/devices/<dev-id>/snap_<snap-name>
-------------------------------------------------------------
-id
+snap_id
The rados internal snapshot id assigned for this snapshot
-size
+snap_size
The size of the image when this snapshot was taken.
diff --git a/Documentation/ABI/testing/sysfs-class-backlight-driver-lm3533 b/Documentation/ABI/testing/sysfs-class-backlight-driver-lm3533
new file mode 100644
index 000000000000..77cf7ac949af
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-backlight-driver-lm3533
@@ -0,0 +1,48 @@
+What: /sys/class/backlight/<backlight>/als_channel
+Date: May 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Get the ALS output channel used as input in
+ ALS-current-control mode (0, 1), where
+
+ 0 - out_current0 (backlight 0)
+ 1 - out_current1 (backlight 1)
+
+What: /sys/class/backlight/<backlight>/als_en
+Date: May 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Enable ALS-current-control mode (0, 1).
+
+What: /sys/class/backlight/<backlight>/id
+Date: April 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Get the id of this backlight (0, 1).
+
+What: /sys/class/backlight/<backlight>/linear
+Date: April 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Set the brightness-mapping mode (0, 1), where
+
+ 0 - exponential mode
+ 1 - linear mode
+
+What: /sys/class/backlight/<backlight>/pwm
+Date: April 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Set the PWM-input control mask (5 bits), where
+
+ bit 5 - PWM-input enabled in Zone 4
+ bit 4 - PWM-input enabled in Zone 3
+ bit 3 - PWM-input enabled in Zone 2
+ bit 2 - PWM-input enabled in Zone 1
+ bit 1 - PWM-input enabled in Zone 0
+ bit 0 - PWM-input enabled
diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-lm3533 b/Documentation/ABI/testing/sysfs-class-led-driver-lm3533
new file mode 100644
index 000000000000..620ebb3b9baa
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-led-driver-lm3533
@@ -0,0 +1,65 @@
+What: /sys/class/leds/<led>/als_channel
+Date: May 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Set the ALS output channel to use as input in
+ ALS-current-control mode (1, 2), where
+
+ 1 - out_current1
+ 2 - out_current2
+
+What: /sys/class/leds/<led>/als_en
+Date: May 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Enable ALS-current-control mode (0, 1).
+
+What: /sys/class/leds/<led>/falltime
+What: /sys/class/leds/<led>/risetime
+Date: April 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Set the pattern generator fall and rise times (0..7), where
+
+ 0 - 2048 us
+ 1 - 262 ms
+ 2 - 524 ms
+ 3 - 1.049 s
+ 4 - 2.097 s
+ 5 - 4.194 s
+ 6 - 8.389 s
+ 7 - 16.78 s
+
+What: /sys/class/leds/<led>/id
+Date: April 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Get the id of this led (0..3).
+
+What: /sys/class/leds/<led>/linear
+Date: April 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Set the brightness-mapping mode (0, 1), where
+
+ 0 - exponential mode
+ 1 - linear mode
+
+What: /sys/class/leds/<led>/pwm
+Date: April 2012
+KernelVersion: 3.5
+Contact: Johan Hovold <jhovold@gmail.com>
+Description:
+ Set the PWM-input control mask (5 bits), where
+
+ bit 5 - PWM-input enabled in Zone 4
+ bit 4 - PWM-input enabled in Zone 3
+ bit 3 - PWM-input enabled in Zone 2
+ bit 2 - PWM-input enabled in Zone 1
+ bit 1 - PWM-input enabled in Zone 0
+ bit 0 - PWM-input enabled
diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd
index 4d55a1888981..db1ad7e34fc3 100644
--- a/Documentation/ABI/testing/sysfs-class-mtd
+++ b/Documentation/ABI/testing/sysfs-class-mtd
@@ -123,3 +123,54 @@ Description:
half page, or a quarter page).
In the case of ECC NOR, it is the ECC block size.
+
+What: /sys/class/mtd/mtdX/ecc_strength
+Date: April 2012
+KernelVersion: 3.4
+Contact: linux-mtd@lists.infradead.org
+Description:
+ Maximum number of bit errors that the device is capable of
+ correcting within each region covering an ecc step. This will
+ always be a non-negative integer. Note that some devices will
+ have multiple ecc steps within each writesize region.
+
+ In the case of devices lacking any ECC capability, it is 0.
+
+What: /sys/class/mtd/mtdX/bitflip_threshold
+Date: April 2012
+KernelVersion: 3.4
+Contact: linux-mtd@lists.infradead.org
+Description:
+ This allows the user to examine and adjust the criteria by which
+ mtd returns -EUCLEAN from mtd_read(). If the maximum number of
+ bit errors that were corrected on any single region comprising
+ an ecc step (as reported by the driver) equals or exceeds this
+ value, -EUCLEAN is returned. Otherwise, absent an error, 0 is
+ returned. Higher layers (e.g., UBI) use this return code as an
+ indication that an erase block may be degrading and should be
+ scrutinized as a candidate for being marked as bad.
+
+ The initial value may be specified by the flash device driver.
+ If not, then the default value is ecc_strength.
+
+ The introduction of this feature brings a subtle change to the
+ meaning of the -EUCLEAN return code. Previously, it was
+ interpreted to mean simply "one or more bit errors were
+ corrected". Its new interpretation can be phrased as "a
+ dangerously high number of bit errors were corrected on one or
+ more regions comprising an ecc step". The precise definition of
+ "dangerously high" can be adjusted by the user with
+ bitflip_threshold. Users are discouraged from doing this,
+ however, unless they know what they are doing and have intimate
+ knowledge of the properties of their device. Broadly speaking,
+ bitflip_threshold should be low enough to detect genuine erase
+ block degradation, but high enough to avoid the consequences of
+ a persistent return value of -EUCLEAN on devices where sticky
+ bitflips occur. Note that if bitflip_threshold exceeds
+ ecc_strength, -EUCLEAN is never returned by mtd_read().
+ Conversely, if bitflip_threshold is zero, -EUCLEAN is always
+ returned, absent a hard error.
+
+ This is generally applicable only to NAND flash devices with ECC
+ capability. It is ignored on devices lacking ECC capability;
+ i.e., devices for which ecc_strength is zero.
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index c58b236bbe04..cb9258b8fd35 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -671,8 +671,9 @@ ones already enabled by DEBUG.
Chapter 14: Allocating memory
The kernel provides the following general purpose memory allocators:
-kmalloc(), kzalloc(), kcalloc(), vmalloc(), and vzalloc(). Please refer to
-the API documentation for further information about them.
+kmalloc(), kzalloc(), kmalloc_array(), kcalloc(), vmalloc(), and
+vzalloc(). Please refer to the API documentation for further information
+about them.
The preferred form for passing a size of a struct is the following:
@@ -686,6 +687,17 @@ Casting the return value which is a void pointer is redundant. The conversion
from void pointer to any other pointer type is guaranteed by the C programming
language.
+The preferred form for allocating an array is the following:
+
+ p = kmalloc_array(n, sizeof(...), ...);
+
+The preferred form for allocating a zeroed array is the following:
+
+ p = kcalloc(n, sizeof(...), ...);
+
+Both forms check for overflow on the allocation size n * sizeof(...),
+and return NULL if that occurred.
+
Chapter 15: The inline disease
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl
index 0c674be0d3c6..e0aedb7a7827 100644
--- a/Documentation/DocBook/mtdnand.tmpl
+++ b/Documentation/DocBook/mtdnand.tmpl
@@ -1119,8 +1119,6 @@ in this page</entry>
These constants are defined in nand.h. They are ored together to describe
the chip functionality.
<programlisting>
-/* Chip can not auto increment pages */
-#define NAND_NO_AUTOINCR 0x00000001
/* Buswitdh is 16 bit */
#define NAND_BUSWIDTH_16 0x00000002
/* Device supports partial programming without padding */
diff --git a/Documentation/arm/OMAP/DSS b/Documentation/arm/OMAP/DSS
index 888ae7b83ae4..a564ceea9e98 100644
--- a/Documentation/arm/OMAP/DSS
+++ b/Documentation/arm/OMAP/DSS
@@ -47,6 +47,51 @@ flexible way to enable non-common multi-display configuration. In addition to
modelling the hardware overlays, omapdss supports virtual overlays and overlay
managers. These can be used when updating a display with CPU or system DMA.
+omapdss driver support for audio
+--------------------------------
+There exist several display technologies and standards that support audio as
+well. Hence, it is relevant to update the DSS device driver to provide an audio
+interface that may be used by an audio driver or any other driver interested in
+the functionality.
+
+The audio_enable function is intended to prepare the relevant
+IP for playback (e.g., enabling an audio FIFO, taking in/out of reset
+some IP, enabling companion chips, etc). It is intended to be called before
+audio_start. The audio_disable function performs the reverse operation and is
+intended to be called after audio_stop.
+
+While a given DSS device driver may support audio, it is possible that for
+certain configurations audio is not supported (e.g., an HDMI display using a
+VESA video timing). The audio_supported function is intended to query whether
+the current configuration of the display supports audio.
+
+The audio_config function is intended to configure all the relevant audio
+parameters of the display. In order to make the function independent of any
+specific DSS device driver, a struct omap_dss_audio is defined. Its purpose
+is to contain all the required parameters for audio configuration. At the
+moment, such structure contains pointers to IEC-60958 channel status word
+and CEA-861 audio infoframe structures. This should be enough to support
+HDMI and DisplayPort, as both are based on CEA-861 and IEC-60958.
+
+The audio_enable/disable, audio_config and audio_supported functions could be
+implemented as functions that may sleep. Hence, they should not be called
+while holding a spinlock or a readlock.
+
+The audio_start/audio_stop function is intended to effectively start/stop audio
+playback after the configuration has taken place. These functions are designed
+to be used in an atomic context. Hence, audio_start should return quickly and be
+called only after all the needed resources for audio playback (audio FIFOs,
+DMA channels, companion chips, etc) have been enabled to begin data transfers.
+audio_stop is designed to only stop the audio transfers. The resources used
+for playback are released using audio_disable.
+
+The enum omap_dss_audio_state may be used to help the implementations of
+the interface to keep track of the audio state. The initial state is _DISABLED;
+then, the state transitions to _CONFIGURED, and then, when it is ready to
+play audio, to _ENABLED. The state _PLAYING is used when the audio is being
+rendered.
+
+
Panel and controller drivers
----------------------------
@@ -156,6 +201,7 @@ timings Display timings (pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw)
"pal" and "ntsc"
panel_name
tear_elim Tearing elimination 0=off, 1=on
+output_type Output type (video encoder only): "composite" or "svideo"
There are also some debugfs files at <debugfs>/omapdss/ which show information
about clocks and registers.
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 9b1067afb224..dd88540bb995 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -184,12 +184,14 @@ behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
+But see section 8.2: when moving a task to another cgroup, its pages may
+be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
+
Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
When you do swapoff and make swapped-out pages of shmem(tmpfs) to
be backed into memory in force, charges for pages are accounted against the
caller of swapoff rather than the users of shmem.
-
2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
Swap Extension allows you to record charge for swap. A swapped-in page is
@@ -374,14 +376,15 @@ cgroup might have some charge associated with it, even though all
tasks have migrated away from it. (because we charge against pages, not
against tasks.)
-Such charges are freed or moved to their parent. At moving, both of RSS
-and CACHES are moved to parent.
-rmdir() may return -EBUSY if freeing/moving fails. See 5.1 also.
+We move the stats to root (if use_hierarchy==0) or parent (if
+use_hierarchy==1), and no change on the charge except uncharging
+from the child.
Charges recorded in swap information is not updated at removal of cgroup.
Recorded information is discarded and a cgroup which uses swap (swapcache)
will be charged as a new owner of it.
+About use_hierarchy, see Section 6.
5. Misc. interfaces.
@@ -394,13 +397,15 @@ will be charged as a new owner of it.
Almost all pages tracked by this memory cgroup will be unmapped and freed.
Some pages cannot be freed because they are locked or in-use. Such pages are
- moved to parent and this cgroup will be empty. This may return -EBUSY if
- VM is too busy to free/move all pages immediately.
+ moved to parent(if use_hierarchy==1) or root (if use_hierarchy==0) and this
+ cgroup will be empty.
Typical use case of this interface is that calling this before rmdir().
Because rmdir() moves all pages to parent, some out-of-use page caches can be
moved to the parent. If you want to avoid that, force_empty will be useful.
+ About use_hierarchy, see Section 6.
+
5.2 stat file
memory.stat file includes following statistics
@@ -430,17 +435,10 @@ hierarchical_memory_limit - # of bytes of memory limit with regard to hierarchy
hierarchical_memsw_limit - # of bytes of memory+swap limit with regard to
hierarchy under which memory cgroup is.
-total_cache - sum of all children's "cache"
-total_rss - sum of all children's "rss"
-total_mapped_file - sum of all children's "cache"
-total_pgpgin - sum of all children's "pgpgin"
-total_pgpgout - sum of all children's "pgpgout"
-total_swap - sum of all children's "swap"
-total_inactive_anon - sum of all children's "inactive_anon"
-total_active_anon - sum of all children's "active_anon"
-total_inactive_file - sum of all children's "inactive_file"
-total_active_file - sum of all children's "active_file"
-total_unevictable - sum of all children's "unevictable"
+total_<counter> - # hierarchical version of <counter>, which in
+ addition to the cgroup's own value includes the
+ sum of all hierarchical children's values of
+ <counter>, i.e. total_cache
# The following additional stats are dependent on CONFIG_DEBUG_VM.
@@ -622,8 +620,7 @@ memory cgroup.
bit | what type of charges would be moved ?
-----+------------------------------------------------------------------------
0 | A charge of an anonymous page(or swap of it) used by the target task.
- | Those pages and swaps must be used only by the target task. You must
- | enable Swap Extension(see 2.4) to enable move of swap charges.
+ | You must enable Swap Extension(see 2.4) to enable move of swap charges.
-----+------------------------------------------------------------------------
1 | A charge of file pages(normal file, tmpfs file(e.g. ipc shared memory)
| and swaps of tmpfs file) mmapped by the target task. Unlike the case of
@@ -636,8 +633,6 @@ memory cgroup.
8.3 TODO
-- Implement madvise(2) to let users decide the vma to be moved or not to be
- moved.
- All of moving charge operations are done under cgroup_mutex. It's not good
behavior to hold the mutex too long, so we may need some trick.
diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
index f3c4ec3626a2..0c4a344e78fa 100644
--- a/Documentation/cgroups/resource_counter.txt
+++ b/Documentation/cgroups/resource_counter.txt
@@ -92,6 +92,14 @@ to work with it.
The _locked routines imply that the res_counter->lock is taken.
+ f. void res_counter_uncharge_until
+ (struct res_counter *rc, struct res_counter *top,
+ unsinged long val)
+
+ Almost same as res_cunter_uncharge() but propagation of uncharge
+ stops when rc == top. This is useful when kill a res_coutner in
+ child cgroup.
+
2.1 Other accounting routines
There are more routines that may help you with common needs, like
diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
index 3370bc4d7b98..f5cfc62b7ad3 100644
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ b/Documentation/device-mapper/thin-provisioning.txt
@@ -287,6 +287,17 @@ iii) Messages
the current transaction id is when you change it with this
compare-and-swap message.
+ reserve_metadata_snap
+
+ Reserve a copy of the data mapping btree for use by userland.
+ This allows userland to inspect the mappings as they were when
+ this message was executed. Use the pool's status command to
+ get the root block associated with the metadata snapshot.
+
+ release_metadata_snap
+
+ Release a previously reserved copy of the data mapping btree.
+
'thin' target
-------------
diff --git a/Documentation/devicetree/bindings/gpio/gpio-mm-lantiq.txt b/Documentation/devicetree/bindings/gpio/gpio-mm-lantiq.txt
new file mode 100644
index 000000000000..f93d51478d5a
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio-mm-lantiq.txt
@@ -0,0 +1,38 @@
+Lantiq SoC External Bus memory mapped GPIO controller
+
+By attaching hardware latches to the EBU it is possible to create output
+only gpios. This driver configures a special memory address, which when
+written to outputs 16 bit to the latches.
+
+The node describing the memory mapped GPIOs needs to be a child of the node
+describing the "lantiq,localbus".
+
+Required properties:
+- compatible : Should be "lantiq,gpio-mm-lantiq"
+- reg : Address and length of the register set for the device
+- #gpio-cells : Should be two. The first cell is the pin number and
+ the second cell is used to specify optional parameters (currently
+ unused).
+- gpio-controller : Marks the device node as a gpio controller.
+
+Optional properties:
+- lantiq,shadow : The default value that we shall assume as already set on the
+ shift register cascade.
+
+Example:
+
+localbus@0 {
+ #address-cells = <2>;
+ #size-cells = <1>;
+ ranges = <0 0 0x0 0x3ffffff /* addrsel0 */
+ 1 0 0x4000000 0x4000010>; /* addsel1 */
+ compatible = "lantiq,localbus", "simple-bus";
+
+ gpio_mm0: gpio@4000000 {
+ compatible = "lantiq,gpio-mm";
+ reg = <1 0x0 0x10>;
+ gpio-controller;
+ #gpio-cells = <2>;
+ lantiq,shadow = <0x77f>
+ };
+}
diff --git a/Documentation/devicetree/bindings/gpio/gpio-stp-xway.txt b/Documentation/devicetree/bindings/gpio/gpio-stp-xway.txt
new file mode 100644
index 000000000000..854de130a971
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio-stp-xway.txt
@@ -0,0 +1,42 @@
+Lantiq SoC Serial To Parallel (STP) GPIO controller
+
+The Serial To Parallel (STP) is found on MIPS based Lantiq socs. It is a
+peripheral controller used to drive external shift register cascades. At most
+3 groups of 8 bits can be driven. The hardware is able to allow the DSL modem
+to drive the 2 LSBs of the cascade automatically.
+
+
+Required properties:
+- compatible : Should be "lantiq,gpio-stp-xway"
+- reg : Address and length of the register set for the device
+- #gpio-cells : Should be two. The first cell is the pin number and
+ the second cell is used to specify optional parameters (currently
+ unused).
+- gpio-controller : Marks the device node as a gpio controller.
+
+Optional properties:
+- lantiq,shadow : The default value that we shall assume as already set on the
+ shift register cascade.
+- lantiq,groups : Set the 3 bit mask to select which of the 3 groups are enabled
+ in the shift register cascade.
+- lantiq,dsl : The dsl core can control the 2 LSBs of the gpio cascade. This 2 bit
+ property can enable this feature.
+- lantiq,phy1 : The gphy1 core can control 3 bits of the gpio cascade.
+- lantiq,phy2 : The gphy2 core can control 3 bits of the gpio cascade.
+- lantiq,rising : use rising instead of falling edge for the shift register
+
+Example:
+
+gpio1: stp@E100BB0 {
+ compatible = "lantiq,gpio-stp-xway";
+ reg = <0xE100BB0 0x40>;
+ #gpio-cells = <2>;
+ gpio-controller;
+
+ lantiq,shadow = <0xffff>;
+ lantiq,groups = <0x7>;
+ lantiq,dsl = <0x3>;
+ lantiq,phy1 = <0x7>;
+ lantiq,phy2 = <0x7>;
+ /* lantiq,rising; */
+};
diff --git a/Documentation/devicetree/bindings/iommu/nvidia,tegra20-gart.txt b/Documentation/devicetree/bindings/iommu/nvidia,tegra20-gart.txt
new file mode 100644
index 000000000000..099d9362ebc1
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/nvidia,tegra20-gart.txt
@@ -0,0 +1,14 @@
+NVIDIA Tegra 20 GART
+
+Required properties:
+- compatible: "nvidia,tegra20-gart"
+- reg: Two pairs of cells specifying the physical address and size of
+ the memory controller registers and the GART aperture respectively.
+
+Example:
+
+ gart {
+ compatible = "nvidia,tegra20-gart";
+ reg = <0x7000f024 0x00000018 /* controller registers */
+ 0x58000000 0x02000000>; /* GART aperture */
+ };
diff --git a/Documentation/devicetree/bindings/mfd/da9052-i2c.txt b/Documentation/devicetree/bindings/mfd/da9052-i2c.txt
new file mode 100644
index 000000000000..1857f4a6b9a9
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/da9052-i2c.txt
@@ -0,0 +1,60 @@
+* Dialog DA9052/53 Power Management Integrated Circuit (PMIC)
+
+Required properties:
+- compatible : Should be "dlg,da9052", "dlg,da9053-aa",
+ "dlg,da9053-ab", or "dlg,da9053-bb"
+
+Sub-nodes:
+- regulators : Contain the regulator nodes. The DA9052/53 regulators are
+ bound using their names as listed below:
+
+ buck0 : regulator BUCK0
+ buck1 : regulator BUCK1
+ buck2 : regulator BUCK2
+ buck3 : regulator BUCK3
+ ldo4 : regulator LDO4
+ ldo5 : regulator LDO5
+ ldo6 : regulator LDO6
+ ldo7 : regulator LDO7
+ ldo8 : regulator LDO8
+ ldo9 : regulator LDO9
+ ldo10 : regulator LDO10
+ ldo11 : regulator LDO11
+ ldo12 : regulator LDO12
+ ldo13 : regulator LDO13
+
+ The bindings details of individual regulator device can be found in:
+ Documentation/devicetree/bindings/regulator/regulator.txt
+
+Examples:
+
+i2c@63fc8000 { /* I2C1 */
+ status = "okay";
+
+ pmic: dialog@48 {
+ compatible = "dlg,da9053-aa";
+ reg = <0x48>;
+
+ regulators {
+ buck0 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <2075000>;
+ };
+
+ buck1 {
+ regulator-min-microvolt = <500000>;
+ regulator-max-microvolt = <2075000>;
+ };
+
+ buck2 {
+ regulator-min-microvolt = <925000>;
+ regulator-max-microvolt = <2500000>;
+ };
+
+ buck3 {
+ regulator-min-microvolt = <925000>;
+ regulator-max-microvolt = <2500000>;
+ };
+ };
+ };
+};
diff --git a/Documentation/devicetree/bindings/mfd/tps65910.txt b/Documentation/devicetree/bindings/mfd/tps65910.txt
new file mode 100644
index 000000000000..645f5eaadb3f
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/tps65910.txt
@@ -0,0 +1,133 @@
+TPS65910 Power Management Integrated Circuit
+
+Required properties:
+- compatible: "ti,tps65910" or "ti,tps65911"
+- reg: I2C slave address
+- interrupts: the interrupt outputs of the controller
+- #gpio-cells: number of cells to describe a GPIO, this should be 2.
+ The first cell is the GPIO number.
+ The second cell is used to specify additional options <unused>.
+- gpio-controller: mark the device as a GPIO controller
+- #interrupt-cells: the number of cells to describe an IRQ, this should be 2.
+ The first cell is the IRQ number.
+ The second cell is the flags, encoded as the trigger masks from
+ Documentation/devicetree/bindings/interrupts.txt
+- regulators: This is the list of child nodes that specify the regulator
+ initialization data for defined regulators. Not all regulators for the given
+ device need to be present. The definition for each of these nodes is defined
+ using the standard binding for regulators found at
+ Documentation/devicetree/bindings/regulator/regulator.txt.
+
+ The valid names for regulators are:
+ tps65910: vrtc, vio, vdd1, vdd2, vdd3, vdig1, vdig2, vpll, vdac, vaux1,
+ vaux2, vaux33, vmmc
+ tps65911: vrtc, vio, vdd1, vdd3, vddctrl, ldo1, ldo2, ldo3, ldo4, ldo5,
+ ldo6, ldo7, ldo8
+
+Optional properties:
+- ti,vmbch-threshold: (tps65911) main battery charged threshold
+ comparator. (see VMBCH_VSEL in TPS65910 datasheet)
+- ti,vmbch2-threshold: (tps65911) main battery discharged threshold
+ comparator. (see VMBCH_VSEL in TPS65910 datasheet)
+- ti,en-gpio-sleep: enable sleep control for gpios
+ There should be 9 entries here, one for each gpio.
+
+Regulator Optional properties:
+- ti,regulator-ext-sleep-control: enable external sleep
+ control through external inputs [0 (not enabled), 1 (EN1), 2 (EN2) or 4(EN3)]
+ If this property is not defined, it defaults to 0 (not enabled).
+
+Example:
+
+ pmu: tps65910@d2 {
+ compatible = "ti,tps65910";
+ reg = <0xd2>;
+ interrupt-parent = <&intc>;
+ interrupts = < 0 118 0x04 >;
+
+ #gpio-cells = <2>;
+ gpio-controller;
+
+ #interrupt-cells = <2>;
+ interrupt-controller;
+
+ ti,vmbch-threshold = 0;
+ ti,vmbch2-threshold = 0;
+
+ ti,en-gpio-sleep = <0 0 1 0 0 0 0 0 0>;
+
+ regulators {
+ vdd1_reg: vdd1 {
+ regulator-min-microvolt = < 600000>;
+ regulator-max-microvolt = <1500000>;
+ regulator-always-on;
+ regulator-boot-on;
+ ti,regulator-ext-sleep-control = <0>;
+ };
+ vdd2_reg: vdd2 {
+ regulator-min-microvolt = < 600000>;
+ regulator-max-microvolt = <1500000>;
+ regulator-always-on;
+ regulator-boot-on;
+ ti,regulator-ext-sleep-control = <4>;
+ };
+ vddctrl_reg: vddctrl {
+ regulator-min-microvolt = < 600000>;
+ regulator-max-microvolt = <1400000>;
+ regulator-always-on;
+ regulator-boot-on;
+ ti,regulator-ext-sleep-control = <0>;
+ };
+ vio_reg: vio {
+ regulator-min-microvolt = <1500000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-always-on;
+ regulator-boot-on;
+ ti,regulator-ext-sleep-control = <1>;
+ };
+ ldo1_reg: ldo1 {
+ regulator-min-microvolt = <1000000>;
+ regulator-max-microvolt = <3300000>;
+ ti,regulator-ext-sleep-control = <0>;
+ };
+ ldo2_reg: ldo2 {
+ regulator-min-microvolt = <1050000>;
+ regulator-max-microvolt = <1050000>;
+ ti,regulator-ext-sleep-control = <0>;
+ };
+ ldo3_reg: ldo3 {
+ regulator-min-microvolt = <1000000>;
+ regulator-max-microvolt = <3300000>;
+ ti,regulator-ext-sleep-control = <0>;
+ };
+ ldo4_reg: ldo4 {
+ regulator-min-microvolt = <1000000>;
+ regulator-max-microvolt = <3300000>;
+ regulator-always-on;
+ ti,regulator-ext-sleep-control = <0>;
+ };
+ ldo5_reg: ldo5 {
+ regulator-min-microvolt = <1000000>;
+ regulator-max-microvolt = <3300000>;
+ ti,regulator-ext-sleep-control = <0>;
+ };
+ ldo6_reg: ldo6 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
+ ti,regulator-ext-sleep-control = <0>;
+ };
+ ldo7_reg: ldo7 {
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <1200000>;
+ regulator-always-on;
+ regulator-boot-on;
+ ti,regulator-ext-sleep-control = <1>;
+ };
+ ldo8_reg: ldo8 {
+ regulator-min-microvolt = <1000000>;
+ regulator-max-microvolt = <3300000>;
+ regulator-always-on;
+ ti,regulator-ext-sleep-control = <1>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/mfd/twl6040.txt b/Documentation/devicetree/bindings/mfd/twl6040.txt
new file mode 100644
index 000000000000..bc67c6f424aa
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/twl6040.txt
@@ -0,0 +1,62 @@
+Texas Instruments TWL6040 family
+
+The TWL6040s are 8-channel high quality low-power audio codecs providing audio
+and vibra functionality on OMAP4+ platforms.
+They are connected ot the host processor via i2c for commands, McPDM for audio
+data and commands.
+
+Required properties:
+- compatible : Must be "ti,twl6040";
+- reg: must be 0x4b for i2c address
+- interrupts: twl6040 has one interrupt line connecteded to the main SoC
+- interrupt-parent: The parent interrupt controller
+- twl6040,audpwron-gpio: Power on GPIO line for the twl6040
+
+- vio-supply: Regulator for the twl6040 VIO supply
+- v2v1-supply: Regulator for the twl6040 V2V1 supply
+
+Optional properties, nodes:
+- enable-active-high: To power on the twl6040 during boot.
+
+Vibra functionality
+Required properties:
+- vddvibl-supply: Regulator for the left vibra motor
+- vddvibr-supply: Regulator for the right vibra motor
+- vibra { }: Configuration section for vibra parameters containing the following
+ properties:
+- ti,vibldrv-res: Resistance parameter for left driver
+- ti,vibrdrv-res: Resistance parameter for right driver
+- ti,viblmotor-res: Resistance parameter for left motor
+- ti,viblmotor-res: Resistance parameter for right motor
+
+Optional properties within vibra { } section:
+- vddvibl_uV: If the vddvibl default voltage need to be changed
+- vddvibr_uV: If the vddvibr default voltage need to be changed
+
+Example:
+&i2c1 {
+ twl6040: twl@4b {
+ compatible = "ti,twl6040";
+ reg = <0x4b>;
+
+ interrupts = <0 119 4>;
+ interrupt-parent = <&gic>;
+ twl6040,audpwron-gpio = <&gpio4 31 0>;
+
+ vio-supply = <&v1v8>;
+ v2v1-supply = <&v2v1>;
+ enable-active-high;
+
+ /* regulators for vibra motor */
+ vddvibl-supply = <&vbat>;
+ vddvibr-supply = <&vbat>;
+
+ vibra {
+ /* Vibra driver, motor resistance parameters */
+ ti,vibldrv-res = <8>;
+ ti,vibrdrv-res = <3>;
+ ti,viblmotor-res = <10>;
+ ti,vibrmotor-res = <10>;
+ };
+ };
+};
diff --git a/Documentation/devicetree/bindings/mtd/gpmi-nand.txt b/Documentation/devicetree/bindings/mtd/gpmi-nand.txt
new file mode 100644
index 000000000000..1a5bbd346d22
--- /dev/null
+++ b/Documentation/devicetree/bindings/mtd/gpmi-nand.txt
@@ -0,0 +1,33 @@
+* Freescale General-Purpose Media Interface (GPMI)
+
+The GPMI nand controller provides an interface to control the
+NAND flash chips. We support only one NAND chip now.
+
+Required properties:
+ - compatible : should be "fsl,<chip>-gpmi-nand"
+ - reg : should contain registers location and length for gpmi and bch.
+ - reg-names: Should contain the reg names "gpmi-nand" and "bch"
+ - interrupts : The first is the DMA interrupt number for GPMI.
+ The second is the BCH interrupt number.
+ - interrupt-names : The interrupt names "gpmi-dma", "bch";
+ - fsl,gpmi-dma-channel : Should contain the dma channel it uses.
+
+The device tree may optionally contain sub-nodes describing partitions of the
+address space. See partition.txt for more detail.
+
+Examples:
+
+gpmi-nand@8000c000 {
+ compatible = "fsl,imx28-gpmi-nand";
+ #address-cells = <1>;
+ #size-cells = <1>;
+ reg = <0x8000c000 2000>, <0x8000a000 2000>;
+ reg-names = "gpmi-nand", "bch";
+ interrupts = <88>, <41>;
+ interrupt-names = "gpmi-dma", "bch";
+ fsl,gpmi-dma-channel = <4>;
+
+ partition@0 {
+ ...
+ };
+};
diff --git a/Documentation/devicetree/bindings/mtd/mxc-nand.txt b/Documentation/devicetree/bindings/mtd/mxc-nand.txt
new file mode 100644
index 000000000000..b5833d11c7be
--- /dev/null
+++ b/Documentation/devicetree/bindings/mtd/mxc-nand.txt
@@ -0,0 +1,19 @@
+* Freescale's mxc_nand
+
+Required properties:
+- compatible: "fsl,imxXX-nand"
+- reg: address range of the nfc block
+- interrupts: irq to be used
+- nand-bus-width: see nand.txt
+- nand-ecc-mode: see nand.txt
+- nand-on-flash-bbt: see nand.txt
+
+Example:
+
+ nand@d8000000 {
+ compatible = "fsl,imx27-nand";
+ reg = <0xd8000000 0x1000>;
+ interrupts = <29>;
+ nand-bus-width = <8>;
+ nand-ecc-mode = "hw";
+ };
diff --git a/Documentation/devicetree/bindings/rtc/lpc32xx-rtc.txt b/Documentation/devicetree/bindings/rtc/lpc32xx-rtc.txt
new file mode 100644
index 000000000000..a87a1e9bc060
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/lpc32xx-rtc.txt
@@ -0,0 +1,15 @@
+* NXP LPC32xx SoC Real Time Clock controller
+
+Required properties:
+- compatible: must be "nxp,lpc3220-rtc"
+- reg: physical base address of the controller and length of memory mapped
+ region.
+- interrupts: The RTC interrupt
+
+Example:
+
+ rtc@40024000 {
+ compatible = "nxp,lpc3220-rtc";
+ reg = <0x40024000 0x1000>;
+ interrupts = <52 0>;
+ };
diff --git a/Documentation/devicetree/bindings/rtc/spear-rtc.txt b/Documentation/devicetree/bindings/rtc/spear-rtc.txt
new file mode 100644
index 000000000000..ca67ac62108e
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/spear-rtc.txt
@@ -0,0 +1,17 @@
+* SPEAr RTC
+
+Required properties:
+- compatible : "st,spear600-rtc"
+- reg : Address range of the rtc registers
+- interrupt-parent: Should be the phandle for the interrupt controller
+ that services interrupts for this device
+- interrupt: Should contain the rtc interrupt number
+
+Example:
+
+ rtc@fc000000 {
+ compatible = "st,spear600-rtc";
+ reg = <0xfc000000 0x1000>;
+ interrupt-parent = <&vic1>;
+ interrupts = <12>;
+ };
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index ebaffe208ccb..56000b33340b 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -606,3 +606,9 @@ Why: There are two mci drivers: at91-mci and atmel-mci. The PDC support
Who: Ludovic Desroches <ludovic.desroches@atmel.com>
----------------------------
+
+What: net/wanrouter/
+When: June 2013
+Why: Unsupported/unmaintained/unused since 2.6
+
+----------------------------
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 4fca82e5276e..8e2da1e06e3b 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -60,8 +60,8 @@ ata *);
ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
int (*removexattr) (struct dentry *, const char *);
- void (*truncate_range)(struct inode *, loff_t, loff_t);
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len);
+ void (*update_time)(struct inode *, struct timespec *, int);
locking rules:
all may block
@@ -87,8 +87,9 @@ setxattr: yes
getxattr: no
listxattr: no
removexattr: yes
-truncate_range: yes
fiemap: no
+update_time: no
+
Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_mutex on
victim.
cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index ef088e55ab2e..fb0a6aeb936c 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -40,6 +40,7 @@ Table of Contents
3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
3.5 /proc/<pid>/mountinfo - Information about mounts
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
+ 3.7 /proc/<pid>/task/<tid>/children - Information about task children
4 Configuring procfs
4.1 Mount options
@@ -310,6 +311,11 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
start_data address above which program data+bss is placed
end_data address below which program data+bss is placed
start_brk address above which program heap can be expanded with brk()
+ arg_start address above which program command line is placed
+ arg_end address below which program command line is placed
+ env_start address above which program environment is placed
+ env_end address below which program environment is placed
+ exit_code the thread's exit_code in the form reported by the waitpid system call
..............................................................................
The /proc/PID/maps file containing the currently mapped memory regions and
@@ -743,6 +749,7 @@ Committed_AS: 100056 kB
VmallocTotal: 112216 kB
VmallocUsed: 428 kB
VmallocChunk: 111088 kB
+AnonHugePages: 49152 kB
MemTotal: Total usable ram (i.e. physical ram minus a few reserved
bits and the kernel binary code)
@@ -776,6 +783,7 @@ VmallocChunk: 111088 kB
Dirty: Memory which is waiting to get written back to the disk
Writeback: Memory which is actively being written back to the disk
AnonPages: Non-file backed pages mapped into userspace page tables
+AnonHugePages: Non-file backed huge pages mapped into userspace page tables
Mapped: files which have been mmaped, such as libraries
Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
@@ -1576,6 +1584,23 @@ then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
comm value.
+3.7 /proc/<pid>/task/<tid>/children - Information about task children
+-------------------------------------------------------------------------
+This file provides a fast way to retrieve first level children pids
+of a task pointed by <pid>/<tid> pair. The format is a space separated
+stream of pids.
+
+Note the "first level" here -- if a child has own children they will
+not be listed here, one needs to read /proc/<children-pid>/task/<tid>/children
+to obtain the descendants.
+
+Since this interface is intended to be fast and cheap it doesn't
+guarantee to provide precise results and some children might be
+skipped, especially if they've exited right after we printed their
+pids, so one need to either stop or freeze processes being inspected
+if precise results are needed.
+
+
------------------------------------------------------------------------------
Configuring procfs
------------------------------------------------------------------------------
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 0d0492028082..efd23f481704 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -363,7 +363,7 @@ struct inode_operations {
ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
int (*removexattr) (struct dentry *, const char *);
- void (*truncate_range)(struct inode *, loff_t, loff_t);
+ void (*update_time)(struct inode *, struct timespec *, int);
};
Again, all methods are called without any locks being held, unless
@@ -472,9 +472,9 @@ otherwise noted.
removexattr: called by the VFS to remove an extended attribute from
a file. This method is called by removexattr(2) system call.
- truncate_range: a method provided by the underlying filesystem to truncate a
- range of blocks , i.e. punch a hole somewhere in a file.
-
+ update_time: called by the VFS to update a specific time or the i_version of
+ an inode. If this is not defined the VFS will update the inode itself
+ and call mark_inode_dirty_sync.
The Address Space Object
========================
@@ -760,7 +760,7 @@ struct file_operations
----------------------
This describes how the VFS can manipulate an open file. As of kernel
-2.6.22, the following members are defined:
+3.5, the following members are defined:
struct file_operations {
struct module *owner;
@@ -790,6 +790,8 @@ struct file_operations {
int (*flock) (struct file *, int, struct file_lock *);
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned int);
+ int (*setlease)(struct file *, long arg, struct file_lock **);
+ long (*fallocate)(struct file *, int mode, loff_t offset, loff_t len);
};
Again, all methods are called without any locks being held, unless
@@ -858,6 +860,11 @@ otherwise noted.
splice_read: called by the VFS to splice data from file to a pipe. This
method is used by the splice(2) system call
+ setlease: called by the VFS to set or release a file lock lease.
+ setlease has the file_lock_lock held and must not sleep.
+
+ fallocate: called by the VFS to preallocate blocks or punch a hole.
+
Note that the file operations are implemented by the specific
filesystem in which the inode resides. When opening a device node
(character or block special) most filesystems will call special
diff --git a/Documentation/i2c/functionality b/Documentation/i2c/functionality
index 42c17c1fb3cd..b0ff2ab596ce 100644
--- a/Documentation/i2c/functionality
+++ b/Documentation/i2c/functionality
@@ -18,9 +18,9 @@ For the most up-to-date list of functionality constants, please check
adapters typically can not do these)
I2C_FUNC_10BIT_ADDR Handles the 10-bit address extensions
I2C_FUNC_PROTOCOL_MANGLING Knows about the I2C_M_IGNORE_NAK,
- I2C_M_REV_DIR_ADDR, I2C_M_NOSTART and
- I2C_M_NO_RD_ACK flags (which modify the
- I2C protocol!)
+ I2C_M_REV_DIR_ADDR and I2C_M_NO_RD_ACK
+ flags (which modify the I2C protocol!)
+ I2C_FUNC_NOSTART Can skip repeated start sequence
I2C_FUNC_SMBUS_QUICK Handles the SMBus write_quick command
I2C_FUNC_SMBUS_READ_BYTE Handles the SMBus read_byte command
I2C_FUNC_SMBUS_WRITE_BYTE Handles the SMBus write_byte command
@@ -50,6 +50,9 @@ A few combinations of the above flags are also defined for your convenience:
emulated by a real I2C adapter (using
the transparent emulation layer)
+In kernel versions prior to 3.5 I2C_FUNC_NOSTART was implemented as
+part of I2C_FUNC_PROTOCOL_MANGLING.
+
ADAPTER IMPLEMENTATION
----------------------
diff --git a/Documentation/i2c/i2c-protocol b/Documentation/i2c/i2c-protocol
index 10518dd58814..0b3e62d1f77a 100644
--- a/Documentation/i2c/i2c-protocol
+++ b/Documentation/i2c/i2c-protocol
@@ -49,7 +49,9 @@ a byte read, followed by a byte write:
Modified transactions
=====================
-We have found some I2C devices that needs the following modifications:
+The following modifications to the I2C protocol can also be generated,
+with the exception of I2C_M_NOSTART these are usually only needed to
+work around device issues:
Flag I2C_M_NOSTART:
In a combined transaction, no 'S Addr Wr/Rd [A]' is generated at some
@@ -60,6 +62,11 @@ We have found some I2C devices that needs the following modifications:
we do not generate Addr, but we do generate the startbit S. This will
probably confuse all other clients on your bus, so don't try this.
+ This is often used to gather transmits from multiple data buffers in
+ system memory into something that appears as a single transfer to the
+ I2C device but may also be used between direction changes by some
+ rare devices.
+
Flags I2C_M_REV_DIR_ADDR
This toggles the Rd/Wr flag. That is, if you want to do a write, but
need to emit an Rd instead of a Wr, or vice versa, you set this
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b40b413db88e..a92c5ebf373e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -335,6 +335,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
requirements as needed. This option
does not override iommu=pt
+ amd_iommu_dump= [HW,X86-64]
+ Enable AMD IOMMU driver option to dump the ACPI table
+ for AMD IOMMU. With this option enabled, AMD IOMMU
+ driver will print ACPI tables for AMD IOMMU during
+ IOMMU initialization.
+
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
@@ -2537,6 +2543,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
sched_debug [KNL] Enables verbose scheduler debug messages.
+ skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
+ xtime_lock contention on larger systems, and/or RCU lock
+ contention on all systems with CONFIG_MAXSMP set.
+ Format: { "0" | "1" }
+ 0 -- disable. (may be 1 via CONFIG_CMDLINE="skew_tick=1"
+ 1 -- enable.
+ Note: increases power consumption, thus should only be
+ enabled if running jitter sensitive (HPC/RT) workloads.
+
security= [SECURITY] Choose a security module to enable at boot.
If this boot parameter is not specified, only the first
security module asking for security registration will be
diff --git a/Documentation/leds/ledtrig-transient.txt b/Documentation/leds/ledtrig-transient.txt
new file mode 100644
index 000000000000..3bd38b487df1
--- /dev/null
+++ b/Documentation/leds/ledtrig-transient.txt
@@ -0,0 +1,152 @@
+LED Transient Trigger
+=====================
+
+The leds timer trigger does not currently have an interface to activate
+a one shot timer. The current support allows for setting two timers, one for
+specifying how long a state to be on, and the second for how long the state
+to be off. The delay_on value specifies the time period an LED should stay
+in on state, followed by a delay_off value that specifies how long the LED
+should stay in off state. The on and off cycle repeats until the trigger
+gets deactivated. There is no provision for one time activation to implement
+features that require an on or off state to be held just once and then stay in
+the original state forever.
+
+Without one shot timer interface, user space can still use timer trigger to
+set a timer to hold a state, however when user space application crashes or
+goes away without deactivating the timer, the hardware will be left in that
+state permanently.
+
+As a specific example of this use-case, let's look at vibrate feature on
+phones. Vibrate function on phones is implemented using PWM pins on SoC or
+PMIC. There is a need to activate one shot timer to control the vibrate
+feature, to prevent user space crashes leaving the phone in vibrate mode
+permanently causing the battery to drain.
+
+Transient trigger addresses the need for one shot timer activation. The
+transient trigger can be enabled and disabled just like the other leds
+triggers.
+
+When an led class device driver registers itself, it can specify all leds
+triggers it supports and a default trigger. During registration, activation
+routine for the default trigger gets called. During registration of an led
+class device, the LED state does not change.
+
+When the driver unregisters, deactivation routine for the currently active
+trigger will be called, and LED state is changed to LED_OFF.
+
+Driver suspend changes the LED state to LED_OFF and resume doesn't change
+the state. Please note that there is no explicit interaction between the
+suspend and resume actions and the currently enabled trigger. LED state
+changes are suspended while the driver is in suspend state. Any timers
+that are active at the time driver gets suspended, continue to run, without
+being able to actually change the LED state. Once driver is resumed, triggers
+start functioning again.
+
+LED state changes are controlled using brightness which is a common led
+class device property. When brightness is set to 0 from user space via
+echo 0 > brightness, it will result in deactivating the current trigger.
+
+Transient trigger uses standard register and unregister interfaces. During
+trigger registration, for each led class device that specifies this trigger
+as its default trigger, trigger activation routine will get called. During
+registration, the LED state does not change, unless there is another trigger
+active, in which case LED state changes to LED_OFF.
+
+During trigger unregistration, LED state gets changed to LED_OFF.
+
+Transient trigger activation routine doesn't change the LED state. It
+creates its properties and does its initialization. Transient trigger
+deactivation routine, will cancel any timer that is active before it cleans
+up and removes the properties it created. It will restore the LED state to
+non-transient state. When driver gets suspended, irrespective of the transient
+state, the LED state changes to LED_OFF.
+
+Transient trigger can be enabled and disabled from user space on led class
+devices, that support this trigger as shown below:
+
+echo transient > trigger
+echo none > trigger
+
+NOTE: Add a new property trigger state to control the state.
+
+This trigger exports three properties, activate, state, and duration. When
+transient trigger is activated these properties are set to default values.
+
+- duration allows setting timer value in msecs. The initial value is 0.
+- activate allows activating and deactivating the timer specified by
+ duration as needed. The initial and default value is 0. This will allow
+ duration to be set after trigger activation.
+- state allows user to specify a transient state to be held for the specified
+ duration.
+
+ activate - one shot timer activate mechanism.
+ 1 when activated, 0 when deactivated.
+ default value is zero when transient trigger is enabled,
+ to allow duration to be set.
+
+ activate state indicates a timer with a value of specified
+ duration running.
+ deactivated state indicates that there is no active timer
+ running.
+
+ duration - one shot timer value. When activate is set, duration value
+ is used to start a timer that runs once. This value doesn't
+ get changed by the trigger unless user does a set via
+ echo new_value > duration
+
+ state - transient state to be held. It has two values 0 or 1. 0 maps
+ to LED_OFF and 1 maps to LED_FULL. The specified state is
+ held for the duration of the one shot timer and then the
+ state gets changed to the non-transient state which is the
+ inverse of transient state.
+ If state = LED_FULL, when the timer runs out the state will
+ go back to LED_OFF.
+ If state = LED_OFF, when the timer runs out the state will
+ go back to LED_FULL.
+ Please note that current LED state is not checked prior to
+ changing the state to the specified state.
+ Driver could map these values to inverted depending on the
+ default states it defines for the LED in its brightness_set()
+ interface which is called from the led brightness_set()
+ interfaces to control the LED state.
+
+When timer expires activate goes back to deactivated state, duration is left
+at the set value to be used when activate is set at a future time. This will
+allow user app to set the time once and activate it to run it once for the
+specified value as needed. When timer expires, state is restored to the
+non-transient state which is the inverse of the transient state.
+
+ echo 1 > activate - starts timer = duration when duration is not 0.
+ echo 0 > activate - cancels currently running timer.
+ echo n > duration - stores timer value to be used upon next
+ activate. Currently active timer if
+ any, continues to run for the specified time.
+ echo 0 > duration - stores timer value to be used upon next
+ activate. Currently active timer if any,
+ continues to run for the specified time.
+ echo 1 > state - stores desired transient state LED_FULL to be
+ held for the specified duration.
+ echo 0 > state - stores desired transient state LED_OFF to be
+ held for the specified duration.
+
+What is not supported:
+======================
+- Timer activation is one shot and extending and/or shortening the timer
+ is not supported.
+
+Example use-case 1:
+ echo transient > trigger
+ echo n > duration
+ echo 1 > state
+repeat the following step as needed:
+ echo 1 > activate - start timer = duration to run once
+ echo 1 > activate - start timer = duration to run once
+ echo none > trigger
+
+This trigger is intended to be used for for the following example use cases:
+ - Control of vibrate (phones, tablets etc.) hardware by user space app.
+ - Use of LED by user space app as activity indicator.
+ - Use of LED by user space app as a kind of watchdog indicator -- as
+ long as the app is alive, it can keep the LED illuminated, if it dies
+ the LED will be extinguished automatically.
+ - Use by any user space app that needs a transient GPIO output.
diff --git a/Documentation/power/charger-manager.txt b/Documentation/power/charger-manager.txt
index fdcca991df30..b4f7f4b23f64 100644
--- a/Documentation/power/charger-manager.txt
+++ b/Documentation/power/charger-manager.txt
@@ -44,6 +44,16 @@ Charger Manager supports the following:
Normally, the platform will need to resume and suspend some devices
that are used by Charger Manager.
+* Support for premature full-battery event handling
+ If the battery voltage drops by "fullbatt_vchkdrop_uV" after
+ "fullbatt_vchkdrop_ms" from the full-battery event, the framework
+ restarts charging. This check is also performed while suspended by
+ setting wakeup time accordingly and using suspend_again.
+
+* Support for uevent-notify
+ With the charger-related events, the device sends
+ notification to users with UEVENT.
+
2. Global Charger-Manager Data related with suspend_again
========================================================
In order to setup Charger Manager with suspend-again feature
@@ -55,7 +65,7 @@ if there are multiple batteries. If there are multiple batteries, the
multiple instances of Charger Manager share the same charger_global_desc
and it will manage in-suspend monitoring for all instances of Charger Manager.
-The user needs to provide all the two entries properly in order to activate
+The user needs to provide all the three entries properly in order to activate
in-suspend monitoring:
struct charger_global_desc {
@@ -74,6 +84,11 @@ bool (*rtc_only_wakeup)(void);
same struct. If there is any other wakeup source triggered the
wakeup, it should return false. If the "rtc" is the only wakeup
reason, it should return true.
+
+bool assume_timer_stops_in_suspend;
+ : if true, Charger Manager assumes that
+ the timer (CM uses jiffies as timer) stops during suspend. Then, CM
+ assumes that the suspend-duration is same as the alarm length.
};
3. How to setup suspend_again
@@ -111,6 +126,16 @@ enum polling_modes polling_mode;
CM_POLL_CHARGING_ONLY: poll this battery if and only if the
battery is being charged.
+unsigned int fullbatt_vchkdrop_ms;
+unsigned int fullbatt_vchkdrop_uV;
+ : If both have non-zero values, Charger Manager will check the
+ battery voltage drop fullbatt_vchkdrop_ms after the battery is fully
+ charged. If the voltage drop is over fullbatt_vchkdrop_uV, Charger
+ Manager will try to recharge the battery by disabling and enabling
+ chargers. Recharge with voltage drop condition only (without delay
+ condition) is needed to be implemented with hardware interrupts from
+ fuel gauges or charger devices/chips.
+
unsigned int fullbatt_uV;
: If specified with a non-zero value, Charger Manager assumes
that the battery is full (capacity = 100) if the battery is not being
@@ -122,6 +147,8 @@ unsigned int polling_interval_ms;
this battery every polling_interval_ms or more frequently.
enum data_source battery_present;
+ : CM_BATTERY_PRESENT: assume that the battery exists.
+ CM_NO_BATTERY: assume that the battery does not exists.
CM_FUEL_GAUGE: get battery presence information from fuel gauge.
CM_CHARGER_STAT: get battery presence from chargers.
@@ -151,7 +178,17 @@ bool measure_battery_temp;
the value of measure_battery_temp.
};
-5. Other Considerations
+5. Notify Charger-Manager of charger events: cm_notify_event()
+=========================================================
+If there is an charger event is required to notify
+Charger Manager, a charger device driver that triggers the event can call
+cm_notify_event(psy, type, msg) to notify the corresponding Charger Manager.
+In the function, psy is the charger driver's power_supply pointer, which is
+associated with Charger-Manager. The parameter "type"
+is the same as irq's type (enum cm_event_types). The event message "msg" is
+optional and is effective only if the event type is "UNDESCRIBED" or "OTHERS".
+
+6. Other Considerations
=======================
At the charger/battery-related events such as battery-pulled-out,
diff --git a/Documentation/power/power_supply_class.txt b/Documentation/power/power_supply_class.txt
index 9f16c5178b66..211831d4095f 100644
--- a/Documentation/power/power_supply_class.txt
+++ b/Documentation/power/power_supply_class.txt
@@ -84,6 +84,8 @@ are already charged or discharging, 'n/a' can be displayed (or
HEALTH - represents health of the battery, values corresponds to
POWER_SUPPLY_HEALTH_*, defined in battery.h.
+VOLTAGE_OCV - open circuit voltage of the battery.
+
VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and
minimal power supply voltages. Maximal/minimal means values of voltages
when battery considered "full"/"empty" at normal conditions. Yes, there is
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 88fd7f5c8dcd..13d6166d7a27 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -225,6 +225,13 @@ a queue must be less or equal then msg_max.
maximum message size value (it is every message queue's attribute set during
its creation).
+/proc/sys/fs/mqueue/msg_default is a read/write file for setting/getting the
+default number of messages in a queue value if attr parameter of mq_open(2) is
+NULL. If it exceed msg_max, the default value is initialized msg_max.
+
+/proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting
+the default message size value if attr parameter of mq_open(2) is NULL. If it
+exceed msgsize_max, the default value is initialized msgsize_max.
4. /proc/sys/fs/epoll - Configuration options for the epoll interface
--------------------------------------------------------
diff --git a/Documentation/vm/frontswap.txt b/Documentation/vm/frontswap.txt
new file mode 100644
index 000000000000..37067cf455f4
--- /dev/null
+++ b/Documentation/vm/frontswap.txt
@@ -0,0 +1,278 @@
+Frontswap provides a "transcendent memory" interface for swap pages.
+In some environments, dramatic performance savings may be obtained because
+swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk.
+
+(Note, frontswap -- and cleancache (merged at 3.0) -- are the "frontends"
+and the only necessary changes to the core kernel for transcendent memory;
+all other supporting code -- the "backends" -- is implemented as drivers.
+See the LWN.net article "Transcendent memory in a nutshell" for a detailed
+overview of frontswap and related kernel parts:
+https://lwn.net/Articles/454795/ )
+
+Frontswap is so named because it can be thought of as the opposite of
+a "backing" store for a swap device. The storage is assumed to be
+a synchronous concurrency-safe page-oriented "pseudo-RAM device" conforming
+to the requirements of transcendent memory (such as Xen's "tmem", or
+in-kernel compressed memory, aka "zcache", or future RAM-like devices);
+this pseudo-RAM device is not directly accessible or addressable by the
+kernel and is of unknown and possibly time-varying size. The driver
+links itself to frontswap by calling frontswap_register_ops to set the
+frontswap_ops funcs appropriately and the functions it provides must
+conform to certain policies as follows:
+
+An "init" prepares the device to receive frontswap pages associated
+with the specified swap device number (aka "type"). A "store" will
+copy the page to transcendent memory and associate it with the type and
+offset associated with the page. A "load" will copy the page, if found,
+from transcendent memory into kernel memory, but will NOT remove the page
+from from transcendent memory. An "invalidate_page" will remove the page
+from transcendent memory and an "invalidate_area" will remove ALL pages
+associated with the swap type (e.g., like swapoff) and notify the "device"
+to refuse further stores with that swap type.
+
+Once a page is successfully stored, a matching load on the page will normally
+succeed. So when the kernel finds itself in a situation where it needs
+to swap out a page, it first attempts to use frontswap. If the store returns
+success, the data has been successfully saved to transcendent memory and
+a disk write and, if the data is later read back, a disk read are avoided.
+If a store returns failure, transcendent memory has rejected the data, and the
+page can be written to swap as usual.
+
+If a backend chooses, frontswap can be configured as a "writethrough
+cache" by calling frontswap_writethrough(). In this mode, the reduction
+in swap device writes is lost (and also a non-trivial performance advantage)
+in order to allow the backend to arbitrarily "reclaim" space used to
+store frontswap pages to more completely manage its memory usage.
+
+Note that if a page is stored and the page already exists in transcendent memory
+(a "duplicate" store), either the store succeeds and the data is overwritten,
+or the store fails AND the page is invalidated. This ensures stale data may
+never be obtained from frontswap.
+
+If properly configured, monitoring of frontswap is done via debugfs in
+the /sys/kernel/debug/frontswap directory. The effectiveness of
+frontswap can be measured (across all swap devices) with:
+
+failed_stores - how many store attempts have failed
+loads - how many loads were attempted (all should succeed)
+succ_stores - how many store attempts have succeeded
+invalidates - how many invalidates were attempted
+
+A backend implementation may provide additional metrics.
+
+FAQ
+
+1) Where's the value?
+
+When a workload starts swapping, performance falls through the floor.
+Frontswap significantly increases performance in many such workloads by
+providing a clean, dynamic interface to read and write swap pages to
+"transcendent memory" that is otherwise not directly addressable to the kernel.
+This interface is ideal when data is transformed to a different form
+and size (such as with compression) or secretly moved (as might be
+useful for write-balancing for some RAM-like devices). Swap pages (and
+evicted page-cache pages) are a great use for this kind of slower-than-RAM-
+but-much-faster-than-disk "pseudo-RAM device" and the frontswap (and
+cleancache) interface to transcendent memory provides a nice way to read
+and write -- and indirectly "name" -- the pages.
+
+Frontswap -- and cleancache -- with a fairly small impact on the kernel,
+provides a huge amount of flexibility for more dynamic, flexible RAM
+utilization in various system configurations:
+
+In the single kernel case, aka "zcache", pages are compressed and
+stored in local memory, thus increasing the total anonymous pages
+that can be safely kept in RAM. Zcache essentially trades off CPU
+cycles used in compression/decompression for better memory utilization.
+Benchmarks have shown little or no impact when memory pressure is
+low while providing a significant performance improvement (25%+)
+on some workloads under high memory pressure.
+
+"RAMster" builds on zcache by adding "peer-to-peer" transcendent memory
+support for clustered systems. Frontswap pages are locally compressed
+as in zcache, but then "remotified" to another system's RAM. This
+allows RAM to be dynamically load-balanced back-and-forth as needed,
+i.e. when system A is overcommitted, it can swap to system B, and
+vice versa. RAMster can also be configured as a memory server so
+many servers in a cluster can swap, dynamically as needed, to a single
+server configured with a large amount of RAM... without pre-configuring
+how much of the RAM is available for each of the clients!
+
+In the virtual case, the whole point of virtualization is to statistically
+multiplex physical resources acrosst the varying demands of multiple
+virtual machines. This is really hard to do with RAM and efforts to do
+it well with no kernel changes have essentially failed (except in some
+well-publicized special-case workloads).
+Specifically, the Xen Transcendent Memory backend allows otherwise
+"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple
+virtual machines, but the pages can be compressed and deduplicated to
+optimize RAM utilization. And when guest OS's are induced to surrender
+underutilized RAM (e.g. with "selfballooning"), sudden unexpected
+memory pressure may result in swapping; frontswap allows those pages
+to be swapped to and from hypervisor RAM (if overall host system memory
+conditions allow), thus mitigating the potentially awful performance impact
+of unplanned swapping.
+
+A KVM implementation is underway and has been RFC'ed to lkml. And,
+using frontswap, investigation is also underway on the use of NVM as
+a memory extension technology.
+
+2) Sure there may be performance advantages in some situations, but
+ what's the space/time overhead of frontswap?
+
+If CONFIG_FRONTSWAP is disabled, every frontswap hook compiles into
+nothingness and the only overhead is a few extra bytes per swapon'ed
+swap device. If CONFIG_FRONTSWAP is enabled but no frontswap "backend"
+registers, there is one extra global variable compared to zero for
+every swap page read or written. If CONFIG_FRONTSWAP is enabled
+AND a frontswap backend registers AND the backend fails every "store"
+request (i.e. provides no memory despite claiming it might),
+CPU overhead is still negligible -- and since every frontswap fail
+precedes a swap page write-to-disk, the system is highly likely
+to be I/O bound and using a small fraction of a percent of a CPU
+will be irrelevant anyway.
+
+As for space, if CONFIG_FRONTSWAP is enabled AND a frontswap backend
+registers, one bit is allocated for every swap page for every swap
+device that is swapon'd. This is added to the EIGHT bits (which
+was sixteen until about 2.6.34) that the kernel already allocates
+for every swap page for every swap device that is swapon'd. (Hugh
+Dickins has observed that frontswap could probably steal one of
+the existing eight bits, but let's worry about that minor optimization
+later.) For very large swap disks (which are rare) on a standard
+4K pagesize, this is 1MB per 32GB swap.
+
+When swap pages are stored in transcendent memory instead of written
+out to disk, there is a side effect that this may create more memory
+pressure that can potentially outweigh the other advantages. A
+backend, such as zcache, must implement policies to carefully (but
+dynamically) manage memory limits to ensure this doesn't happen.
+
+3) OK, how about a quick overview of what this frontswap patch does
+ in terms that a kernel hacker can grok?
+
+Let's assume that a frontswap "backend" has registered during
+kernel initialization; this registration indicates that this
+frontswap backend has access to some "memory" that is not directly
+accessible by the kernel. Exactly how much memory it provides is
+entirely dynamic and random.
+
+Whenever a swap-device is swapon'd frontswap_init() is called,
+passing the swap device number (aka "type") as a parameter.
+This notifies frontswap to expect attempts to "store" swap pages
+associated with that number.
+
+Whenever the swap subsystem is readying a page to write to a swap
+device (c.f swap_writepage()), frontswap_store is called. Frontswap
+consults with the frontswap backend and if the backend says it does NOT
+have room, frontswap_store returns -1 and the kernel swaps the page
+to the swap device as normal. Note that the response from the frontswap
+backend is unpredictable to the kernel; it may choose to never accept a
+page, it could accept every ninth page, or it might accept every
+page. But if the backend does accept a page, the data from the page
+has already been copied and associated with the type and offset,
+and the backend guarantees the persistence of the data. In this case,
+frontswap sets a bit in the "frontswap_map" for the swap device
+corresponding to the page offset on the swap device to which it would
+otherwise have written the data.
+
+When the swap subsystem needs to swap-in a page (swap_readpage()),
+it first calls frontswap_load() which checks the frontswap_map to
+see if the page was earlier accepted by the frontswap backend. If
+it was, the page of data is filled from the frontswap backend and
+the swap-in is complete. If not, the normal swap-in code is
+executed to obtain the page of data from the real swap device.
+
+So every time the frontswap backend accepts a page, a swap device read
+and (potentially) a swap device write are replaced by a "frontswap backend
+store" and (possibly) a "frontswap backend loads", which are presumably much
+faster.
+
+4) Can't frontswap be configured as a "special" swap device that is
+ just higher priority than any real swap device (e.g. like zswap,
+ or maybe swap-over-nbd/NFS)?
+
+No. First, the existing swap subsystem doesn't allow for any kind of
+swap hierarchy. Perhaps it could be rewritten to accomodate a hierarchy,
+but this would require fairly drastic changes. Even if it were
+rewritten, the existing swap subsystem uses the block I/O layer which
+assumes a swap device is fixed size and any page in it is linearly
+addressable. Frontswap barely touches the existing swap subsystem,
+and works around the constraints of the block I/O subsystem to provide
+a great deal of flexibility and dynamicity.
+
+For example, the acceptance of any swap page by the frontswap backend is
+entirely unpredictable. This is critical to the definition of frontswap
+backends because it grants completely dynamic discretion to the
+backend. In zcache, one cannot know a priori how compressible a page is.
+"Poorly" compressible pages can be rejected, and "poorly" can itself be
+defined dynamically depending on current memory constraints.
+
+Further, frontswap is entirely synchronous whereas a real swap
+device is, by definition, asynchronous and uses block I/O. The
+block I/O layer is not only unnecessary, but may perform "optimizations"
+that are inappropriate for a RAM-oriented device including delaying
+the write of some pages for a significant amount of time. Synchrony is
+required to ensure the dynamicity of the backend and to avoid thorny race
+conditions that would unnecessarily and greatly complicate frontswap
+and/or the block I/O subsystem. That said, only the initial "store"
+and "load" operations need be synchronous. A separate asynchronous thread
+is free to manipulate the pages stored by frontswap. For example,
+the "remotification" thread in RAMster uses standard asynchronous
+kernel sockets to move compressed frontswap pages to a remote machine.
+Similarly, a KVM guest-side implementation could do in-guest compression
+and use "batched" hypercalls.
+
+In a virtualized environment, the dynamicity allows the hypervisor
+(or host OS) to do "intelligent overcommit". For example, it can
+choose to accept pages only until host-swapping might be imminent,
+then force guests to do their own swapping.
+
+There is a downside to the transcendent memory specifications for
+frontswap: Since any "store" might fail, there must always be a real
+slot on a real swap device to swap the page. Thus frontswap must be
+implemented as a "shadow" to every swapon'd device with the potential
+capability of holding every page that the swap device might have held
+and the possibility that it might hold no pages at all. This means
+that frontswap cannot contain more pages than the total of swapon'd
+swap devices. For example, if NO swap device is configured on some
+installation, frontswap is useless. Swapless portable devices
+can still use frontswap but a backend for such devices must configure
+some kind of "ghost" swap device and ensure that it is never used.
+
+5) Why this weird definition about "duplicate stores"? If a page
+ has been previously successfully stored, can't it always be
+ successfully overwritten?
+
+Nearly always it can, but no, sometimes it cannot. Consider an example
+where data is compressed and the original 4K page has been compressed
+to 1K. Now an attempt is made to overwrite the page with data that
+is non-compressible and so would take the entire 4K. But the backend
+has no more space. In this case, the store must be rejected. Whenever
+frontswap rejects a store that would overwrite, it also must invalidate
+the old data and ensure that it is no longer accessible. Since the
+swap subsystem then writes the new data to the read swap device,
+this is the correct course of action to ensure coherency.
+
+6) What is frontswap_shrink for?
+
+When the (non-frontswap) swap subsystem swaps out a page to a real
+swap device, that page is only taking up low-value pre-allocated disk
+space. But if frontswap has placed a page in transcendent memory, that
+page may be taking up valuable real estate. The frontswap_shrink
+routine allows code outside of the swap subsystem to force pages out
+of the memory managed by frontswap and back into kernel-addressable memory.
+For example, in RAMster, a "suction driver" thread will attempt
+to "repatriate" pages sent to a remote machine back to the local machine;
+this is driven using the frontswap_shrink mechanism when memory pressure
+subsides.
+
+7) Why does the frontswap patch create the new include file swapfile.h?
+
+The frontswap code depends on some swap-subsystem-internal data
+structures that have, over the years, moved back and forth between
+static and global. This seemed a reasonable compromise: Define
+them as global but declare them in a new include file that isn't
+included by the large number of source files that include swap.h.
+
+Dan Magenheimer, last updated April 9, 2012
diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 4600cbe3d6be..7587493c67f1 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -16,7 +16,7 @@ There are three components to pagemap:
* Bits 0-4 swap type if swapped
* Bits 5-54 swap offset if swapped
* Bits 55-60 page shift (page size = 1<<page shift)
- * Bit 61 reserved for future use
+ * Bit 61 page is file-page or shared-anon
* Bit 62 page swapped
* Bit 63 page present
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index 6752870c4970..b0c6d1bbb434 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -17,7 +17,7 @@ data and perform operation on the slabs. By default slabinfo only lists
slabs that have data in them. See "slabinfo -h" for more options when
running the command. slabinfo can be compiled with
-gcc -o slabinfo tools/slub/slabinfo.c
+gcc -o slabinfo tools/vm/slabinfo.c
Some of the modes of operation of slabinfo require that slub debugging
be enabled on the command line. F.e. no tracking information will be
diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 29bdf62aac09..f734bb2a78dc 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -166,6 +166,68 @@ behavior. So to make them effective you need to restart any
application that could have been using hugepages. This also applies to
the regions registered in khugepaged.
+== Monitoring usage ==
+
+The number of transparent huge pages currently used by the system is
+available by reading the AnonHugePages field in /proc/meminfo. To
+identify what applications are using transparent huge pages, it is
+necessary to read /proc/PID/smaps and count the AnonHugePages fields
+for each mapping. Note that reading the smaps file is expensive and
+reading it frequently will incur overhead.
+
+There are a number of counters in /proc/vmstat that may be used to
+monitor how successfully the system is providing huge pages for use.
+
+thp_fault_alloc is incremented every time a huge page is successfully
+ allocated to handle a page fault. This applies to both the
+ first time a page is faulted and for COW faults.
+
+thp_collapse_alloc is incremented by khugepaged when it has found
+ a range of pages to collapse into one huge page and has
+ successfully allocated a new huge page to store the data.
+
+thp_fault_fallback is incremented if a page fault fails to allocate
+ a huge page and instead falls back to using small pages.
+
+thp_collapse_alloc_failed is incremented if khugepaged found a range
+ of pages that should be collapsed into one huge page but failed
+ the allocation.
+
+thp_split is incremented every time a huge page is split into base
+ pages. This can happen for a variety of reasons but a common
+ reason is that a huge page is old and is being reclaimed.
+
+As the system ages, allocating huge pages may be expensive as the
+system uses memory compaction to copy data around memory to free a
+huge page for use. There are some counters in /proc/vmstat to help
+monitor this overhead.
+
+compact_stall is incremented every time a process stalls to run
+ memory compaction so that a huge page is free for use.
+
+compact_success is incremented if the system compacted memory and
+ freed a huge page for use.
+
+compact_fail is incremented if the system tries to compact memory
+ but failed.
+
+compact_pages_moved is incremented each time a page is moved. If
+ this value is increasing rapidly, it implies that the system
+ is copying a lot of data to satisfy the huge page allocation.
+ It is possible that the cost of copying exceeds any savings
+ from reduced TLB misses.
+
+compact_pagemigrate_failed is incremented when the underlying mechanism
+ for moving a page failed.
+
+compact_blocks_moved is incremented each time memory compaction examines
+ a huge page aligned range of pages.
+
+It is possible to establish how long the stalls were using the function
+tracer to record how long was spent in __alloc_pages_nodemask and
+using the mm_page_alloc tracepoint to identify which allocations were
+for huge pages.
+
== get_user_pages and follow_page ==
get_user_pages and follow_page if run on a hugepage, will return the
diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
index 25fe4304f2fc..086638f6c82d 100644
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ b/Documentation/watchdog/watchdog-kernel-api.txt
@@ -1,6 +1,6 @@
The Linux WatchDog Timer Driver Core kernel API.
===============================================
-Last reviewed: 16-Mar-2012
+Last reviewed: 22-May-2012
Wim Van Sebroeck <wim@iguana.be>
@@ -39,6 +39,10 @@ watchdog_device structure.
The watchdog device structure looks like this:
struct watchdog_device {
+ int id;
+ struct cdev cdev;
+ struct device *dev;
+ struct device *parent;
const struct watchdog_info *info;
const struct watchdog_ops *ops;
unsigned int bootstatus;
@@ -46,10 +50,20 @@ struct watchdog_device {
unsigned int min_timeout;
unsigned int max_timeout;
void *driver_data;
+ struct mutex lock;
unsigned long status;
};
It contains following fields:
+* id: set by watchdog_register_device, id 0 is special. It has both a
+ /dev/watchdog0 cdev (dynamic major, minor 0) as well as the old
+ /dev/watchdog miscdev. The id is set automatically when calling
+ watchdog_register_device.
+* cdev: cdev for the dynamic /dev/watchdog<id> device nodes. This
+ field is also populated by watchdog_register_device.
+* dev: device under the watchdog class (created by watchdog_register_device).
+* parent: set this to the parent device (or NULL) before calling
+ watchdog_register_device.
* info: a pointer to a watchdog_info structure. This structure gives some
additional information about the watchdog timer itself. (Like it's unique name)
* ops: a pointer to the list of watchdog operations that the watchdog supports.
@@ -61,6 +75,7 @@ It contains following fields:
* driver_data: a pointer to the drivers private data of a watchdog device.
This data should only be accessed via the watchdog_set_drvdata and
watchdog_get_drvdata routines.
+* lock: Mutex for WatchDog Timer Driver Core internal use only.
* status: this field contains a number of status bits that give extra
information about the status of the device (Like: is the watchdog timer
running/active, is the nowayout bit set, is the device opened via
@@ -78,6 +93,8 @@ struct watchdog_ops {
unsigned int (*status)(struct watchdog_device *);
int (*set_timeout)(struct watchdog_device *, unsigned int);
unsigned int (*get_timeleft)(struct watchdog_device *);
+ void (*ref)(struct watchdog_device *);
+ void (*unref)(struct watchdog_device *);
long (*ioctl)(struct watchdog_device *, unsigned int, unsigned long);
};
@@ -85,6 +102,21 @@ It is important that you first define the module owner of the watchdog timer
driver's operations. This module owner will be used to lock the module when
the watchdog is active. (This to avoid a system crash when you unload the
module and /dev/watchdog is still open).
+
+If the watchdog_device struct is dynamically allocated, just locking the module
+is not enough and a driver also needs to define the ref and unref operations to
+ensure the structure holding the watchdog_device does not go away.
+
+The simplest (and usually sufficient) implementation of this is to:
+1) Add a kref struct to the same structure which is holding the watchdog_device
+2) Define a release callback for the kref which frees the struct holding both
+3) Call kref_init on this kref *before* calling watchdog_register_device()
+4) Define a ref operation calling kref_get on this kref
+5) Define a unref operation calling kref_put on this kref
+6) When it is time to cleanup:
+ * Do not kfree() the struct holding both, the last kref_put will do this!
+ * *After* calling watchdog_unregister_device() call kref_put on the kref
+
Some operations are mandatory and some are optional. The mandatory operations
are:
* start: this is a pointer to the routine that starts the watchdog timer
@@ -125,6 +157,10 @@ they are supported. These optional routines/operations are:
(Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
watchdog's info structure).
* get_timeleft: this routines returns the time that's left before a reset.
+* ref: the operation that calls kref_get on the kref of a dynamically
+ allocated watchdog_device struct.
+* unref: the operation that calls kref_put on the kref of a dynamically
+ allocated watchdog_device struct.
* ioctl: if this routine is present then it will be called first before we do
our own internal ioctl call handling. This routine should return -ENOIOCTLCMD
if a command is not supported. The parameters that are passed to the ioctl
@@ -144,6 +180,11 @@ bit-operations. The status bits that are defined are:
(This bit should only be used by the WatchDog Timer Driver Core).
* WDOG_NO_WAY_OUT: this bit stores the nowayout setting for the watchdog.
If this bit is set then the watchdog timer will not be able to stop.
+* WDOG_UNREGISTERED: this bit gets set by the WatchDog Timer Driver Core
+ after calling watchdog_unregister_device, and then checked before calling
+ any watchdog_ops, so that you can be sure that no operations (other then
+ unref) will get called after unregister, even if userspace still holds a
+ reference to /dev/watchdog
To set the WDOG_NO_WAY_OUT status bit (before registering your watchdog
timer device) you can either:
diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
index 17ddd822b456..04fddbacdbde 100644
--- a/Documentation/watchdog/watchdog-parameters.txt
+++ b/Documentation/watchdog/watchdog-parameters.txt
@@ -78,6 +78,11 @@ wd0_timeout: Default watchdog0 timeout in 1/10secs
wd1_timeout: Default watchdog1 timeout in 1/10secs
wd2_timeout: Default watchdog2 timeout in 1/10secs
-------------------------------------------------
+da9052wdt:
+timeout: Watchdog timeout in seconds. 2<= timeout <=131, default=2.048s
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
davinci_wdt:
heartbeat: Watchdog heartbeat period in seconds from 1 to 600, default 60
-------------------------------------------------
diff --git a/Documentation/x86/efi-stub.txt b/Documentation/x86/efi-stub.txt
new file mode 100644
index 000000000000..44e6bb6ead10
--- /dev/null
+++ b/Documentation/x86/efi-stub.txt
@@ -0,0 +1,65 @@
+ The EFI Boot Stub
+ ---------------------------
+
+On the x86 platform, a bzImage can masquerade as a PE/COFF image,
+thereby convincing EFI firmware loaders to load it as an EFI
+executable. The code that modifies the bzImage header, along with the
+EFI-specific entry point that the firmware loader jumps to are
+collectively known as the "EFI boot stub", and live in
+arch/x86/boot/header.S and arch/x86/boot/compressed/eboot.c,
+respectively.
+
+By using the EFI boot stub it's possible to boot a Linux kernel
+without the use of a conventional EFI boot loader, such as grub or
+elilo. Since the EFI boot stub performs the jobs of a boot loader, in
+a certain sense it *IS* the boot loader.
+
+The EFI boot stub is enabled with the CONFIG_EFI_STUB kernel option.
+
+
+**** How to install bzImage.efi
+
+The bzImage located in arch/x86/boot/bzImage must be copied to the EFI
+System Partiion (ESP) and renamed with the extension ".efi". Without
+the extension the EFI firmware loader will refuse to execute it. It's
+not possible to execute bzImage.efi from the usual Linux file systems
+because EFI firmware doesn't have support for them.
+
+
+**** Passing kernel parameters from the EFI shell
+
+Arguments to the kernel can be passed after bzImage.efi, e.g.
+
+ fs0:> bzImage.efi console=ttyS0 root=/dev/sda4
+
+
+**** The "initrd=" option
+
+Like most boot loaders, the EFI stub allows the user to specify
+multiple initrd files using the "initrd=" option. This is the only EFI
+stub-specific command line parameter, everything else is passed to the
+kernel when it boots.
+
+The path to the initrd file must be an absolute path from the
+beginning of the ESP, relative path names do not work. Also, the path
+is an EFI-style path and directory elements must be separated with
+backslashes (\). For example, given the following directory layout,
+
+fs0:>
+ Kernels\
+ bzImage.efi
+ initrd-large.img
+
+ Ramdisks\
+ initrd-small.img
+ initrd-medium.img
+
+to boot with the initrd-large.img file if the current working
+directory is fs0:\Kernels, the following command must be used,
+
+ fs0:\Kernels> bzImage.efi initrd=\Kernels\initrd-large.img
+
+Notice how bzImage.efi can be specified with a relative path. That's
+because the image we're executing is interpreted by the EFI shell,
+which understands relative paths, whereas the rest of the command line
+is passed to bzImage.efi.