summaryrefslogtreecommitdiffstats
path: root/Documentation/power
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/power')
-rw-r--r--Documentation/power/00-INDEX44
-rw-r--r--Documentation/power/apm-acpi.txt32
-rw-r--r--Documentation/power/basic-pm-debugging.txt227
-rw-r--r--Documentation/power/charger-manager.txt200
-rw-r--r--Documentation/power/devices.txt669
-rw-r--r--Documentation/power/drivers-testing.txt46
-rw-r--r--Documentation/power/freezing-of-tasks.txt225
-rw-r--r--Documentation/power/interface.txt75
-rw-r--r--Documentation/power/notifiers.txt53
-rw-r--r--Documentation/power/opp.txt380
-rw-r--r--Documentation/power/pci.txt1025
-rw-r--r--Documentation/power/pm_qos_interface.txt148
-rw-r--r--Documentation/power/power_supply_class.txt182
-rw-r--r--Documentation/power/regulator/consumer.txt182
-rw-r--r--Documentation/power/regulator/design.txt33
-rw-r--r--Documentation/power/regulator/machine.txt100
-rw-r--r--Documentation/power/regulator/overview.txt171
-rw-r--r--Documentation/power/regulator/regulator.txt30
-rw-r--r--Documentation/power/runtime_pm.txt887
-rw-r--r--Documentation/power/s2ram.txt81
-rw-r--r--Documentation/power/states.txt80
-rw-r--r--Documentation/power/suspend-and-cpuhotplug.txt275
-rw-r--r--Documentation/power/swsusp-and-swap-files.txt60
-rw-r--r--Documentation/power/swsusp-dmcrypt.txt138
-rw-r--r--Documentation/power/swsusp.txt408
-rw-r--r--Documentation/power/tricks.txt27
-rw-r--r--Documentation/power/userland-swsusp.txt170
-rw-r--r--Documentation/power/video.txt185
-rw-r--r--Documentation/power/video_extension.txt37
29 files changed, 0 insertions, 6170 deletions
diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
deleted file mode 100644
index a4d682f5423..00000000000
--- a/Documentation/power/00-INDEX
+++ /dev/null
@@ -1,44 +0,0 @@
-00-INDEX
- - This file
-apm-acpi.txt
- - basic info about the APM and ACPI support.
-basic-pm-debugging.txt
- - Debugging suspend and resume
-devices.txt
- - How drivers interact with system-wide power management
-drivers-testing.txt
- - Testing suspend and resume support in device drivers
-freezing-of-tasks.txt
- - How processes and controlled during suspend
-interface.txt
- - Power management user interface in /sys/power
-notifiers.txt
- - Registering suspend notifiers in device drivers
-opp.txt
- - Operating Performance Point library
-pci.txt
- - How the PCI Subsystem Does Power Management
-pm_qos_interface.txt
- - info on Linux PM Quality of Service interface
-power_supply_class.txt
- - Tells userspace about battery, UPS, AC or DC power supply properties
-s2ram.txt
- - How to get suspend to ram working (and debug it when it isn't)
-states.txt
- - System power management states
-suspend-and-cpuhotplug.txt
- - Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
-swsusp-and-swap-files.txt
- - Using swap files with software suspend (to disk)
-swsusp-dmcrypt.txt
- - How to use dm-crypt and software suspend (to disk) together
-swsusp.txt
- - Goals, implementation, and usage of software suspend (ACPI S3)
-tricks.txt
- - How to trick software suspend (to disk) into working when it isn't
-userland-swsusp.txt
- - Experimental implementation of software suspend in userspace
-video_extension.txt
- - ACPI video extensions
-video.txt
- - Video issues during resume from suspend
diff --git a/Documentation/power/apm-acpi.txt b/Documentation/power/apm-acpi.txt
deleted file mode 100644
index 6cc423d3662..00000000000
--- a/Documentation/power/apm-acpi.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-APM or ACPI?
-------------
-If you have a relatively recent x86 mobile, desktop, or server system,
-odds are it supports either Advanced Power Management (APM) or
-Advanced Configuration and Power Interface (ACPI). ACPI is the newer
-of the two technologies and puts power management in the hands of the
-operating system, allowing for more intelligent power management than
-is possible with BIOS controlled APM.
-
-The best way to determine which, if either, your system supports is to
-build a kernel with both ACPI and APM enabled (as of 2.3.x ACPI is
-enabled by default). If a working ACPI implementation is found, the
-ACPI driver will override and disable APM, otherwise the APM driver
-will be used.
-
-No, sorry, you cannot have both ACPI and APM enabled and running at
-once. Some people with broken ACPI or broken APM implementations
-would like to use both to get a full set of working features, but you
-simply cannot mix and match the two. Only one power management
-interface can be in control of the machine at once. Think about it..
-
-User-space Daemons
-------------------
-Both APM and ACPI rely on user-space daemons, apmd and acpid
-respectively, to be completely functional. Obtain both of these
-daemons from your Linux distribution or from the Internet (see below)
-and be sure that they are started sometime in the system boot process.
-Go ahead and start both. If ACPI or APM is not available on your
-system the associated daemon will exit gracefully.
-
- apmd: http://ftp.debian.org/pool/main/a/apmd/
- acpid: http://acpid.sf.net/
diff --git a/Documentation/power/basic-pm-debugging.txt b/Documentation/power/basic-pm-debugging.txt
deleted file mode 100644
index 262acf56fa7..00000000000
--- a/Documentation/power/basic-pm-debugging.txt
+++ /dev/null
@@ -1,227 +0,0 @@
-Debugging hibernation and suspend
- (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
-
-1. Testing hibernation (aka suspend to disk or STD)
-
-To check if hibernation works, you can try to hibernate in the "reboot" mode:
-
-# echo reboot > /sys/power/disk
-# echo disk > /sys/power/state
-
-and the system should create a hibernation image, reboot, resume and get back to
-the command prompt where you have started the transition. If that happens,
-hibernation is most likely to work correctly. Still, you need to repeat the
-test at least a couple of times in a row for confidence. [This is necessary,
-because some problems only show up on a second attempt at suspending and
-resuming the system.] Moreover, hibernating in the "reboot" and "shutdown"
-modes causes the PM core to skip some platform-related callbacks which on ACPI
-systems might be necessary to make hibernation work. Thus, if your machine fails
-to hibernate or resume in the "reboot" mode, you should try the "platform" mode:
-
-# echo platform > /sys/power/disk
-# echo disk > /sys/power/state
-
-which is the default and recommended mode of hibernation.
-
-Unfortunately, the "platform" mode of hibernation does not work on some systems
-with broken BIOSes. In such cases the "shutdown" mode of hibernation might
-work:
-
-# echo shutdown > /sys/power/disk
-# echo disk > /sys/power/state
-
-(it is similar to the "reboot" mode, but it requires you to press the power
-button to make the system resume).
-
-If neither "platform" nor "shutdown" hibernation mode works, you will need to
-identify what goes wrong.
-
-a) Test modes of hibernation
-
-To find out why hibernation fails on your system, you can use a special testing
-facility available if the kernel is compiled with CONFIG_PM_DEBUG set. Then,
-there is the file /sys/power/pm_test that can be used to make the hibernation
-core run in a test mode. There are 5 test modes available:
-
-freezer
-- test the freezing of processes
-
-devices
-- test the freezing of processes and suspending of devices
-
-platform
-- test the freezing of processes, suspending of devices and platform
- global control methods(*)
-
-processors
-- test the freezing of processes, suspending of devices, platform
- global control methods(*) and the disabling of nonboot CPUs
-
-core
-- test the freezing of processes, suspending of devices, platform global
- control methods(*), the disabling of nonboot CPUs and suspending of
- platform/system devices
-
-(*) the platform global control methods are only available on ACPI systems
- and are only tested if the hibernation mode is set to "platform"
-
-To use one of them it is necessary to write the corresponding string to
-/sys/power/pm_test (eg. "devices" to test the freezing of processes and
-suspending devices) and issue the standard hibernation commands. For example,
-to use the "devices" test mode along with the "platform" mode of hibernation,
-you should do the following:
-
-# echo devices > /sys/power/pm_test
-# echo platform > /sys/power/disk
-# echo disk > /sys/power/state
-
-Then, the kernel will try to freeze processes, suspend devices, wait 5 seconds,
-resume devices and thaw processes. If "platform" is written to
-/sys/power/pm_test , then after suspending devices the kernel will additionally
-invoke the global control methods (eg. ACPI global control methods) used to
-prepare the platform firmware for hibernation. Next, it will wait 5 seconds and
-invoke the platform (eg. ACPI) global methods used to cancel hibernation etc.
-
-Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
-hibernation/suspend operations. Also, when open for reading, /sys/power/pm_test
-contains a space-separated list of all available tests (including "none" that
-represents the normal functionality) in which the current test level is
-indicated by square brackets.
-
-Generally, as you can see, each test level is more "invasive" than the previous
-one and the "core" level tests the hardware and drivers as deeply as possible
-without creating a hibernation image. Obviously, if the "devices" test fails,
-the "platform" test will fail as well and so on. Thus, as a rule of thumb, you
-should try the test modes starting from "freezer", through "devices", "platform"
-and "processors" up to "core" (repeat the test on each level a couple of times
-to make sure that any random factors are avoided).
-
-If the "freezer" test fails, there is a task that cannot be frozen (in that case
-it usually is possible to identify the offending task by analysing the output of
-dmesg obtained after the failing test). Failure at this level usually means
-that there is a problem with the tasks freezer subsystem that should be
-reported.
-
-If the "devices" test fails, most likely there is a driver that cannot suspend
-or resume its device (in the latter case the system may hang or become unstable
-after the test, so please take that into consideration). To find this driver,
-you can carry out a binary search according to the rules:
-- if the test fails, unload a half of the drivers currently loaded and repeat
-(that would probably involve rebooting the system, so always note what drivers
-have been loaded before the test),
-- if the test succeeds, load a half of the drivers you have unloaded most
-recently and repeat.
-
-Once you have found the failing driver (there can be more than just one of
-them), you have to unload it every time before hibernation. In that case please
-make sure to report the problem with the driver.
-
-It is also possible that the "devices" test will still fail after you have
-unloaded all modules. In that case, you may want to look in your kernel
-configuration for the drivers that can be compiled as modules (and test again
-with these drivers compiled as modules). You may also try to use some special
-kernel command line options such as "noapic", "noacpi" or even "acpi=off".
-
-If the "platform" test fails, there is a problem with the handling of the
-platform (eg. ACPI) firmware on your system. In that case the "platform" mode
-of hibernation is not likely to work. You can try the "shutdown" mode, but that
-is rather a poor man's workaround.
-
-If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
-work (of course, this only may be an issue on SMP systems) and the problem
-should be reported. In that case you can also try to switch the nonboot CPUs
-off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
-see if that works.
-
-If the "core" test fails, which means that suspending of the system/platform
-devices has failed (these devices are suspended on one CPU with interrupts off),
-the problem is most probably hardware-related and serious, so it should be
-reported.
-
-A failure of any of the "platform", "processors" or "core" tests may cause your
-system to hang or become unstable, so please beware. Such a failure usually
-indicates a serious problem that very well may be related to the hardware, but
-please report it anyway.
-
-b) Testing minimal configuration
-
-If all of the hibernation test modes work, you can boot the system with the
-"init=/bin/bash" command line parameter and attempt to hibernate in the
-"reboot", "shutdown" and "platform" modes. If that does not work, there
-probably is a problem with a driver statically compiled into the kernel and you
-can try to compile more drivers as modules, so that they can be tested
-individually. Otherwise, there is a problem with a modular driver and you can
-find it by loading a half of the modules you normally use and binary searching
-in accordance with the algorithm:
-- if there are n modules loaded and the attempt to suspend and resume fails,
-unload n/2 of the modules and try again (that would probably involve rebooting
-the system),
-- if there are n modules loaded and the attempt to suspend and resume succeeds,
-load n/2 modules more and try again.
-
-Again, if you find the offending module(s), it(they) must be unloaded every time
-before hibernation, and please report the problem with it(them).
-
-c) Advanced debugging
-
-In case that hibernation does not work on your system even in the minimal
-configuration and compiling more drivers as modules is not practical or some
-modules cannot be unloaded, you can use one of the more advanced debugging
-techniques to find the problem. First, if there is a serial port in your box,
-you can boot the kernel with the 'no_console_suspend' parameter and try to log
-kernel messages using the serial console. This may provide you with some
-information about the reasons of the suspend (resume) failure. Alternatively,
-it may be possible to use a FireWire port for debugging with firescope
-(ftp://ftp.firstfloor.org/pub/ak/firescope/). On x86 it is also possible to
-use the PM_TRACE mechanism documented in Documentation/power/s2ram.txt .
-
-2. Testing suspend to RAM (STR)
-
-To verify that the STR works, it is generally more convenient to use the s2ram
-tool available from http://suspend.sf.net and documented at
-http://en.opensuse.org/SDB:Suspend_to_RAM.
-
-Namely, after writing "freezer", "devices", "platform", "processors", or "core"
-into /sys/power/pm_test (available if the kernel is compiled with
-CONFIG_PM_DEBUG set) the suspend code will work in the test mode corresponding
-to given string. The STR test modes are defined in the same way as for
-hibernation, so please refer to Section 1 for more information about them. In
-particular, the "core" test allows you to test everything except for the actual
-invocation of the platform firmware in order to put the system into the sleep
-state.
-
-Among other things, the testing with the help of /sys/power/pm_test may allow
-you to identify drivers that fail to suspend or resume their devices. They
-should be unloaded every time before an STR transition.
-
-Next, you can follow the instructions at http://en.opensuse.org/s2ram to test
-the system, but if it does not work "out of the box", you may need to boot it
-with "init=/bin/bash" and test s2ram in the minimal configuration. In that
-case, you may be able to search for failing drivers by following the procedure
-analogous to the one described in section 1. If you find some failing drivers,
-you will have to unload them every time before an STR transition (ie. before
-you run s2ram), and please report the problems with them.
-
-There is a debugfs entry which shows the suspend to RAM statistics. Here is an
-example of its output.
- # mount -t debugfs none /sys/kernel/debug
- # cat /sys/kernel/debug/suspend_stats
- success: 20
- fail: 5
- failed_freeze: 0
- failed_prepare: 0
- failed_suspend: 5
- failed_suspend_noirq: 0
- failed_resume: 0
- failed_resume_noirq: 0
- failures:
- last_failed_dev: alarm
- adc
- last_failed_errno: -16
- -16
- last_failed_step: suspend
- suspend
-Field success means the success number of suspend to RAM, and field fail means
-the failure number. Others are the failure number of different steps of suspend
-to RAM. suspend_stats just lists the last 2 failed devices, error number and
-failed step of suspend.
diff --git a/Documentation/power/charger-manager.txt b/Documentation/power/charger-manager.txt
deleted file mode 100644
index b4f7f4b23f6..00000000000
--- a/Documentation/power/charger-manager.txt
+++ /dev/null
@@ -1,200 +0,0 @@
-Charger Manager
- (C) 2011 MyungJoo Ham <myungjoo.ham@samsung.com>, GPL
-
-Charger Manager provides in-kernel battery charger management that
-requires temperature monitoring during suspend-to-RAM state
-and where each battery may have multiple chargers attached and the userland
-wants to look at the aggregated information of the multiple chargers.
-
-Charger Manager is a platform_driver with power-supply-class entries.
-An instance of Charger Manager (a platform-device created with Charger-Manager)
-represents an independent battery with chargers. If there are multiple
-batteries with their own chargers acting independently in a system,
-the system may need multiple instances of Charger Manager.
-
-1. Introduction
-===============
-
-Charger Manager supports the following:
-
-* Support for multiple chargers (e.g., a device with USB, AC, and solar panels)
- A system may have multiple chargers (or power sources) and some of
- they may be activated at the same time. Each charger may have its
- own power-supply-class and each power-supply-class can provide
- different information about the battery status. This framework
- aggregates charger-related information from multiple sources and
- shows combined information as a single power-supply-class.
-
-* Support for in suspend-to-RAM polling (with suspend_again callback)
- While the battery is being charged and the system is in suspend-to-RAM,
- we may need to monitor the battery health by looking at the ambient or
- battery temperature. We can accomplish this by waking up the system
- periodically. However, such a method wakes up devices unncessary for
- monitoring the battery health and tasks, and user processes that are
- supposed to be kept suspended. That, in turn, incurs unnecessary power
- consumption and slow down charging process. Or even, such peak power
- consumption can stop chargers in the middle of charging
- (external power input < device power consumption), which not
- only affects the charging time, but the lifespan of the battery.
-
- Charger Manager provides a function "cm_suspend_again" that can be
- used as suspend_again callback of platform_suspend_ops. If the platform
- requires tasks other than cm_suspend_again, it may implement its own
- suspend_again callback that calls cm_suspend_again in the middle.
- Normally, the platform will need to resume and suspend some devices
- that are used by Charger Manager.
-
-* Support for premature full-battery event handling
- If the battery voltage drops by "fullbatt_vchkdrop_uV" after
- "fullbatt_vchkdrop_ms" from the full-battery event, the framework
- restarts charging. This check is also performed while suspended by
- setting wakeup time accordingly and using suspend_again.
-
-* Support for uevent-notify
- With the charger-related events, the device sends
- notification to users with UEVENT.
-
-2. Global Charger-Manager Data related with suspend_again
-========================================================
-In order to setup Charger Manager with suspend-again feature
-(in-suspend monitoring), the user should provide charger_global_desc
-with setup_charger_manager(struct charger_global_desc *).
-This charger_global_desc data for in-suspend monitoring is global
-as the name suggests. Thus, the user needs to provide only once even
-if there are multiple batteries. If there are multiple batteries, the
-multiple instances of Charger Manager share the same charger_global_desc
-and it will manage in-suspend monitoring for all instances of Charger Manager.
-
-The user needs to provide all the three entries properly in order to activate
-in-suspend monitoring:
-
-struct charger_global_desc {
-
-char *rtc_name;
- : The name of rtc (e.g., "rtc0") used to wakeup the system from
- suspend for Charger Manager. The alarm interrupt (AIE) of the rtc
- should be able to wake up the system from suspend. Charger Manager
- saves and restores the alarm value and use the previously-defined
- alarm if it is going to go off earlier than Charger Manager so that
- Charger Manager does not interfere with previously-defined alarms.
-
-bool (*rtc_only_wakeup)(void);
- : This callback should let CM know whether
- the wakeup-from-suspend is caused only by the alarm of "rtc" in the
- same struct. If there is any other wakeup source triggered the
- wakeup, it should return false. If the "rtc" is the only wakeup
- reason, it should return true.
-
-bool assume_timer_stops_in_suspend;
- : if true, Charger Manager assumes that
- the timer (CM uses jiffies as timer) stops during suspend. Then, CM
- assumes that the suspend-duration is same as the alarm length.
-};
-
-3. How to setup suspend_again
-=============================
-Charger Manager provides a function "extern bool cm_suspend_again(void)".
-When cm_suspend_again is called, it monitors every battery. The suspend_ops
-callback of the system's platform_suspend_ops can call cm_suspend_again
-function to know whether Charger Manager wants to suspend again or not.
-If there are no other devices or tasks that want to use suspend_again
-feature, the platform_suspend_ops may directly refer to cm_suspend_again
-for its suspend_again callback.
-
-The cm_suspend_again() returns true (meaning "I want to suspend again")
-if the system was woken up by Charger Manager and the polling
-(in-suspend monitoring) results in "normal".
-
-4. Charger-Manager Data (struct charger_desc)
-=============================================
-For each battery charged independently from other batteries (if a series of
-batteries are charged by a single charger, they are counted as one independent
-battery), an instance of Charger Manager is attached to it.
-
-struct charger_desc {
-
-char *psy_name;
- : The power-supply-class name of the battery. Default is
- "battery" if psy_name is NULL. Users can access the psy entries
- at "/sys/class/power_supply/[psy_name]/".
-
-enum polling_modes polling_mode;
- : CM_POLL_DISABLE: do not poll this battery.
- CM_POLL_ALWAYS: always poll this battery.
- CM_POLL_EXTERNAL_POWER_ONLY: poll this battery if and only if
- an external power source is attached.
- CM_POLL_CHARGING_ONLY: poll this battery if and only if the
- battery is being charged.
-
-unsigned int fullbatt_vchkdrop_ms;
-unsigned int fullbatt_vchkdrop_uV;
- : If both have non-zero values, Charger Manager will check the
- battery voltage drop fullbatt_vchkdrop_ms after the battery is fully
- charged. If the voltage drop is over fullbatt_vchkdrop_uV, Charger
- Manager will try to recharge the battery by disabling and enabling
- chargers. Recharge with voltage drop condition only (without delay
- condition) is needed to be implemented with hardware interrupts from
- fuel gauges or charger devices/chips.
-
-unsigned int fullbatt_uV;
- : If specified with a non-zero value, Charger Manager assumes
- that the battery is full (capacity = 100) if the battery is not being
- charged and the battery voltage is equal to or greater than
- fullbatt_uV.
-
-unsigned int polling_interval_ms;
- : Required polling interval in ms. Charger Manager will poll
- this battery every polling_interval_ms or more frequently.
-
-enum data_source battery_present;
- : CM_BATTERY_PRESENT: assume that the battery exists.
- CM_NO_BATTERY: assume that the battery does not exists.
- CM_FUEL_GAUGE: get battery presence information from fuel gauge.
- CM_CHARGER_STAT: get battery presence from chargers.
-
-char **psy_charger_stat;
- : An array ending with NULL that has power-supply-class names of
- chargers. Each power-supply-class should provide "PRESENT" (if
- battery_present is "CM_CHARGER_STAT"), "ONLINE" (shows whether an
- external power source is attached or not), and "STATUS" (shows whether
- the battery is {"FULL" or not FULL} or {"FULL", "Charging",
- "Discharging", "NotCharging"}).
-
-int num_charger_regulators;
-struct regulator_bulk_data *charger_regulators;
- : Regulators representing the chargers in the form for
- regulator framework's bulk functions.
-
-char *psy_fuel_gauge;
- : Power-supply-class name of the fuel gauge.
-
-int (*temperature_out_of_range)(int *mC);
-bool measure_battery_temp;
- : This callback returns 0 if the temperature is safe for charging,
- a positive number if it is too hot to charge, and a negative number
- if it is too cold to charge. With the variable mC, the callback returns
- the temperature in 1/1000 of centigrade.
- The source of temperature can be battery or ambient one according to
- the value of measure_battery_temp.
-};
-
-5. Notify Charger-Manager of charger events: cm_notify_event()
-=========================================================
-If there is an charger event is required to notify
-Charger Manager, a charger device driver that triggers the event can call
-cm_notify_event(psy, type, msg) to notify the corresponding Charger Manager.
-In the function, psy is the charger driver's power_supply pointer, which is
-associated with Charger-Manager. The parameter "type"
-is the same as irq's type (enum cm_event_types). The event message "msg" is
-optional and is effective only if the event type is "UNDESCRIBED" or "OTHERS".
-
-6. Other Considerations
-=======================
-
-At the charger/battery-related events such as battery-pulled-out,
-charger-pulled-out, charger-inserted, DCIN-over/under-voltage, charger-stopped,
-and others critical to chargers, the system should be configured to wake up.
-At least the following should wake up the system from a suspend:
-a) charger-on/off b) external-power-in/out c) battery-in/out (while charging)
-
-It is usually accomplished by configuring the PMIC as a wakeup source.
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
deleted file mode 100644
index 872815cd41d..00000000000
--- a/Documentation/power/devices.txt
+++ /dev/null
@@ -1,669 +0,0 @@
-Device Power Management
-
-Copyright (c) 2010-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
-Copyright (c) 2010 Alan Stern <stern@rowland.harvard.edu>
-
-
-Most of the code in Linux is device drivers, so most of the Linux power
-management (PM) code is also driver-specific. Most drivers will do very
-little; others, especially for platforms with small batteries (like cell
-phones), will do a lot.
-
-This writeup gives an overview of how drivers interact with system-wide
-power management goals, emphasizing the models and interfaces that are
-shared by everything that hooks up to the driver model core. Read it as
-background for the domain-specific work you'd do with any specific driver.
-
-
-Two Models for Device Power Management
-======================================
-Drivers will use one or both of these models to put devices into low-power
-states:
-
- System Sleep model:
- Drivers can enter low-power states as part of entering system-wide
- low-power states like "suspend" (also known as "suspend-to-RAM"), or
- (mostly for systems with disks) "hibernation" (also known as
- "suspend-to-disk").
-
- This is something that device, bus, and class drivers collaborate on
- by implementing various role-specific suspend and resume methods to
- cleanly power down hardware and software subsystems, then reactivate
- them without loss of data.
-
- Some drivers can manage hardware wakeup events, which make the system
- leave the low-power state. This feature may be enabled or disabled
- using the relevant /sys/devices/.../power/wakeup file (for Ethernet
- drivers the ioctl interface used by ethtool may also be used for this
- purpose); enabling it may cost some power usage, but let the whole
- system enter low-power states more often.
-
- Runtime Power Management model:
- Devices may also be put into low-power states while the system is
- running, independently of other power management activity in principle.
- However, devices are not generally independent of each other (for
- example, a parent device cannot be suspended unless all of its child
- devices have been suspended). Moreover, depending on the bus type the
- device is on, it may be necessary to carry out some bus-specific
- operations on the device for this purpose. Devices put into low power
- states at run time may require special handling during system-wide power
- transitions (suspend or hibernation).
-
- For these reasons not only the device driver itself, but also the
- appropriate subsystem (bus type, device type or device class) driver and
- the PM core are involved in runtime power management. As in the system
- sleep power management case, they need to collaborate by implementing
- various role-specific suspend and resume methods, so that the hardware
- is cleanly powered down and reactivated without data or service loss.
-
-There's not a lot to be said about those low-power states except that they are
-very system-specific, and often device-specific. Also, that if enough devices
-have been put into low-power states (at runtime), the effect may be very similar
-to entering some system-wide low-power state (system sleep) ... and that
-synergies exist, so that several drivers using runtime PM might put the system
-into a state where even deeper power saving options are available.
-
-Most suspended devices will have quiesced all I/O: no more DMA or IRQs (except
-for wakeup events), no more data read or written, and requests from upstream
-drivers are no longer accepted. A given bus or platform may have different
-requirements though.
-
-Examples of hardware wakeup events include an alarm from a real time clock,
-network wake-on-LAN packets, keyboard or mouse activity, and media insertion
-or removal (for PCMCIA, MMC/SD, USB, and so on).
-
-
-Interfaces for Entering System Sleep States
-===========================================
-There are programming interfaces provided for subsystems (bus type, device type,
-device class) and device drivers to allow them to participate in the power
-management of devices they are concerned with. These interfaces cover both
-system sleep and runtime power management.
-
-
-Device Power Management Operations
-----------------------------------
-Device power management operations, at the subsystem level as well as at the
-device driver level, are implemented by defining and populating objects of type
-struct dev_pm_ops:
-
-struct dev_pm_ops {
- int (*prepare)(struct device *dev);
- void (*complete)(struct device *dev);
- int (*suspend)(struct device *dev);
- int (*resume)(struct device *dev);
- int (*freeze)(struct device *dev);
- int (*thaw)(struct device *dev);
- int (*poweroff)(struct device *dev);
- int (*restore)(struct device *dev);
- int (*suspend_late)(struct device *dev);
- int (*resume_early)(struct device *dev);
- int (*freeze_late)(struct device *dev);
- int (*thaw_early)(struct device *dev);
- int (*poweroff_late)(struct device *dev);
- int (*restore_early)(struct device *dev);
- int (*suspend_noirq)(struct device *dev);
- int (*resume_noirq)(struct device *dev);
- int (*freeze_noirq)(struct device *dev);
- int (*thaw_noirq)(struct device *dev);
- int (*poweroff_noirq)(struct device *dev);
- int (*restore_noirq)(struct device *dev);
- int (*runtime_suspend)(struct device *dev);
- int (*runtime_resume)(struct device *dev);
- int (*runtime_idle)(struct device *dev);
-};
-
-This structure is defined in include/linux/pm.h and the methods included in it
-are also described in that file. Their roles will be explained in what follows.
-For now, it should be sufficient to remember that the last three methods are
-specific to runtime power management while the remaining ones are used during
-system-wide power transitions.
-
-There also is a deprecated "old" or "legacy" interface for power management
-operations available at least for some subsystems. This approach does not use
-struct dev_pm_ops objects and it is suitable only for implementing system sleep
-power management methods. Therefore it is not described in this document, so
-please refer directly to the source code for more information about it.
-
-
-Subsystem-Level Methods
------------------------
-The core methods to suspend and resume devices reside in struct dev_pm_ops
-pointed to by the ops member of struct dev_pm_domain, or by the pm member of
-struct bus_type, struct device_type and struct class. They are mostly of
-interest to the people writing infrastructure for platforms and buses, like PCI
-or USB, or device type and device class drivers. They also are relevant to the
-writers of device drivers whose subsystems (PM domains, device types, device
-classes and bus types) don't provide all power management methods.
-
-Bus drivers implement these methods as appropriate for the hardware and the
-drivers using it; PCI works differently from USB, and so on. Not many people
-write subsystem-level drivers; most driver code is a "device driver" that builds
-on top of bus-specific framework code.
-
-For more information on these driver calls, see the description later;
-they are called in phases for every device, respecting the parent-child
-sequencing in the driver model tree.
-
-
-/sys/devices/.../power/wakeup files
------------------------------------
-All device objects in the driver model contain fields that control the handling
-of system wakeup events (hardware signals that can force the system out of a
-sleep state). These fields are initialized by bus or device driver code using
-device_set_wakeup_capable() and device_set_wakeup_enable(), defined in
-include/linux/pm_wakeup.h.
-
-The "power.can_wakeup" flag just records whether the device (and its driver) can
-physically support wakeup events. The device_set_wakeup_capable() routine
-affects this flag. The "power.wakeup" field is a pointer to an object of type
-struct wakeup_source used for controlling whether or not the device should use
-its system wakeup mechanism and for notifying the PM core of system wakeup
-events signaled by the device. This object is only present for wakeup-capable
-devices (i.e. devices whose "can_wakeup" flags are set) and is created (or
-removed) by device_set_wakeup_capable().
-
-Whether or not a device is capable of issuing wakeup events is a hardware
-matter, and the kernel is responsible for keeping track of it. By contrast,
-whether or not a wakeup-capable device should issue wakeup events is a policy
-decision, and it is managed by user space through a sysfs attribute: the
-"power/wakeup" file. User space can write the strings "enabled" or "disabled"
-to it to indicate whether or not, respectively, the device is supposed to signal
-system wakeup. This file is only present if the "power.wakeup" object exists
-for the given device and is created (or removed) along with that object, by
-device_set_wakeup_capable(). Reads from the file will return the corresponding
-string.
-
-The "power/wakeup" file is supposed to contain the "disabled" string initially
-for the majority of devices; the major exceptions are power buttons, keyboards,
-and Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with
-ethtool. It should also default to "enabled" for devices that don't generate
-wakeup requests on their own but merely forward wakeup requests from one bus to
-another (like PCI Express ports).
-
-The device_may_wakeup() routine returns true only if the "power.wakeup" object
-exists and the corresponding "power/wakeup" file contains the string "enabled".
-This information is used by subsystems, like the PCI bus type code, to see
-whether or not to enable the devices' wakeup mechanisms. If device wakeup
-mechanisms are enabled or disabled directly by drivers, they also should use
-device_may_wakeup() to decide what to do during a system sleep transition.
-Device drivers, however, are not supposed to call device_set_wakeup_enable()
-directly in any case.
-
-It ought to be noted that system wakeup is conceptually different from "remote
-wakeup" used by runtime power management, although it may be supported by the
-same physical mechanism. Remote wakeup is a feature allowing devices in
-low-power states to trigger specific interrupts to signal conditions in which
-they should be put into the full-power state. Those interrupts may or may not
-be used to signal system wakeup events, depending on the hardware design. On
-some systems it is impossible to trigger them from system sleep states. In any
-case, remote wakeup should always be enabled for runtime power management for
-all devices and drivers that support it.
-
-/sys/devices/.../power/control files
-------------------------------------
-Each device in the driver model has a flag to control whether it is subject to
-runtime power management. This flag, called runtime_auto, is initialized by the
-bus type (or generally subsystem) code using pm_runtime_allow() or
-pm_runtime_forbid(); the default is to allow runtime power management.
-
-The setting can be adjusted by user space by writing either "on" or "auto" to
-the device's power/control sysfs file. Writing "auto" calls pm_runtime_allow(),
-setting the flag and allowing the device to be runtime power-managed by its
-driver. Writing "on" calls pm_runtime_forbid(), clearing the flag, returning
-the device to full power if it was in a low-power state, and preventing the
-device from being runtime power-managed. User space can check the current value
-of the runtime_auto flag by reading the file.
-
-The device's runtime_auto flag has no effect on the handling of system-wide
-power transitions. In particular, the device can (and in the majority of cases
-should and will) be put into a low-power state during a system-wide transition
-to a sleep state even though its runtime_auto flag is clear.
-
-For more information about the runtime power management framework, refer to
-Documentation/power/runtime_pm.txt.
-
-
-Calling Drivers to Enter and Leave System Sleep States
-======================================================
-When the system goes into a sleep state, each device's driver is asked to
-suspend the device by putting it into a state compatible with the target
-system state. That's usually some version of "off", but the details are
-system-specific. Also, wakeup-enabled devices will usually stay partly
-functional in order to wake the system.
-
-When the system leaves that low-power state, the device's driver is asked to
-resume it by returning it to full power. The suspend and resume operations
-always go together, and both are multi-phase operations.
-
-For simple drivers, suspend might quiesce the device using class code
-and then turn its hardware as "off" as possible during suspend_noirq. The
-matching resume calls would then completely reinitialize the hardware
-before reactivating its class I/O queues.
-
-More power-aware drivers might prepare the devices for triggering system wakeup
-events.
-
-
-Call Sequence Guarantees
-------------------------
-To ensure that bridges and similar links needing to talk to a device are
-available when the device is suspended or resumed, the device tree is
-walked in a bottom-up order to suspend devices. A top-down order is
-used to resume those devices.
-
-The ordering of the device tree is defined by the order in which devices
-get registered: a child can never be registered, probed or resumed before
-its parent; and can't be removed or suspended after that parent.
-
-The policy is that the device tree should match hardware bus topology.
-(Or at least the control bus, for devices which use multiple busses.)
-In particular, this means that a device registration may fail if the parent of
-the device is suspending (i.e. has been chosen by the PM core as the next
-device to suspend) or has already suspended, as well as after all of the other
-devices have been suspended. Device drivers must be prepared to cope with such
-situations.
-
-
-System Power Management Phases
-------------------------------
-Suspending or resuming the system is done in several phases. Different phases
-are used for standby or memory sleep states ("suspend-to-RAM") and the
-hibernation state ("suspend-to-disk"). Each phase involves executing callbacks
-for every device before the next phase begins. Not all busses or classes
-support all these callbacks and not all drivers use all the callbacks. The
-various phases always run after tasks have been frozen and before they are
-unfrozen. Furthermore, the *_noirq phases run at a time when IRQ handlers have
-been disabled (except for those marked with the IRQF_NO_SUSPEND flag).
-
-All phases use PM domain, bus, type, class or driver callbacks (that is, methods
-defined in dev->pm_domain->ops, dev->bus->pm, dev->type->pm, dev->class->pm or
-dev->driver->pm). These callbacks are regarded by the PM core as mutually
-exclusive. Moreover, PM domain callbacks always take precedence over all of the
-other callbacks and, for example, type callbacks take precedence over bus, class
-and driver callbacks. To be precise, the following rules are used to determine
-which callback to execute in the given phase:
-
- 1. If dev->pm_domain is present, the PM core will choose the callback
- included in dev->pm_domain->ops for execution
-
- 2. Otherwise, if both dev->type and dev->type->pm are present, the callback
- included in dev->type->pm will be chosen for execution.
-
- 3. Otherwise, if both dev->class and dev->class->pm are present, the
- callback included in dev->class->pm will be chosen for execution.
-
- 4. Otherwise, if both dev->bus and dev->bus->pm are present, the callback
- included in dev->bus->pm will be chosen for execution.
-
-This allows PM domains and device types to override callbacks provided by bus
-types or device classes if necessary.
-
-The PM domain, type, class and bus callbacks may in turn invoke device- or
-driver-specific methods stored in dev->driver->pm, but they don't have to do
-that.
-
-If the subsystem callback chosen for execution is not present, the PM core will
-execute the corresponding method from dev->driver->pm instead if there is one.
-
-
-Entering System Suspend
------------------------
-When the system goes into the standby or memory sleep state, the phases are:
-
- prepare, suspend, suspend_late, suspend_noirq.
-
- 1. The prepare phase is meant to prevent races by preventing new devices
- from being registered; the PM core would never know that all the
- children of a device had been suspended if new children could be
- registered at will. (By contrast, devices may be unregistered at any
- time.) Unlike the other suspend-related phases, during the prepare
- phase the device tree is traversed top-down.
-
- After the prepare callback method returns, no new children may be
- registered below the device. The method may also prepare the device or
- driver in some way for the upcoming system power transition, but it
- should not put the device into a low-power state.
-
- 2. The suspend methods should quiesce the device to stop it from performing
- I/O. They also may save the device registers and put it into the
- appropriate low-power state, depending on the bus type the device is on,
- and they may enable wakeup events.
-
- 3 For a number of devices it is convenient to split suspend into the
- "quiesce device" and "save device state" phases, in which cases
- suspend_late is meant to do the latter. It is always executed after
- runtime power management has been disabled for all devices.
-
- 4. The suspend_noirq phase occurs after IRQ handlers have been disabled,
- which means that the driver's interrupt handler will not be called while
- the callback method is running. The methods should save the values of
- the device's registers that weren't saved previously and finally put the
- device into the appropriate low-power state.
-
- The majority of subsystems and device drivers need not implement this
- callback. However, bus types allowing devices to share interrupt
- vectors, like PCI, generally need it; otherwise a driver might encounter
- an error during the suspend phase by fielding a shared interrupt
- generated by some other device after its own device had been set to low
- power.
-
-At the end of these phases, drivers should have stopped all I/O transactions
-(DMA, IRQs), saved enough state that they can re-initialize or restore previous
-state (as needed by the hardware), and placed the device into a low-power state.
-On many platforms they will gate off one or more clock sources; sometimes they
-will also switch off power supplies or reduce voltages. (Drivers supporting
-runtime PM may already have performed some or all of these steps.)
-
-If device_may_wakeup(dev) returns true, the device should be prepared for
-generating hardware wakeup signals to trigger a system wakeup event when the
-system is in the sleep state. For example, enable_irq_wake() might identify
-GPIO signals hooked up to a switch or other external hardware, and
-pci_enable_wake() does something similar for the PCI PME signal.
-
-If any of these callbacks returns an error, the system won't enter the desired
-low-power state. Instead the PM core will unwind its actions by resuming all
-the devices that were suspended.
-
-
-Leaving System Suspend
-----------------------
-When resuming from standby or memory sleep, the phases are:
-
- resume_noirq, resume_early, resume, complete.
-
- 1. The resume_noirq callback methods should perform any actions needed
- before the driver's interrupt handlers are invoked. This generally
- means undoing the actions of the suspend_noirq phase. If the bus type
- permits devices to share interrupt vectors, like PCI, the method should
- bring the device and its driver into a state in which the driver can
- recognize if the device is the source of incoming interrupts, if any,
- and handle them correctly.
-
- For example, the PCI bus type's ->pm.resume_noirq() puts the device into
- the full-power state (D0 in the PCI terminology) and restores the
- standard configuration registers of the device. Then it calls the
- device driver's ->pm.resume_noirq() method to perform device-specific
- actions.
-
- 2. The resume_early methods should prepare devices for the execution of
- the resume methods. This generally involves undoing the actions of the
- preceding suspend_late phase.
-
- 3 The resume methods should bring the the device back to its operating
- state, so that it can perform normal I/O. This generally involves
- undoing the actions of the suspend phase.
-
- 4. The complete phase should undo the actions of the prepare phase. Note,
- however, that new children may be registered below the device as soon as
- the resume callbacks occur; it's not necessary to wait until the
- complete phase.
-
-At the end of these phases, drivers should be as functional as they were before
-suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
-gated on. Even if the device was in a low-power state before the system sleep
-because of runtime power management, afterwards it should be back in its
-full-power state. There are multiple reasons why it's best to do this; they are
-discussed in more detail in Documentation/power/runtime_pm.txt.
-
-However, the details here may again be platform-specific. For example,
-some systems support multiple "run" states, and the mode in effect at
-the end of resume might not be the one which preceded suspension.
-That means availability of certain clocks or power supplies changed,
-which could easily affect how a driver works.
-
-Drivers need to be able to handle hardware which has been reset since the
-suspend methods were called, for example by complete reinitialization.
-This may be the hardest part, and the one most protected by NDA'd documents
-and chip errata. It's simplest if the hardware state hasn't changed since
-the suspend was carried out, but that can't be guaranteed (in fact, it usually
-is not the case).
-
-Drivers must also be prepared to notice that the device has been removed
-while the system was powered down, whenever that's physically possible.
-PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses
-where common Linux platforms will see such removal. Details of how drivers
-will notice and handle such removals are currently bus-specific, and often
-involve a separate thread.
-
-These callbacks may return an error value, but the PM core will ignore such
-errors since there's nothing it can do about them other than printing them in
-the system log.
-
-
-Entering Hibernation
---------------------
-Hibernating the system is more complicated than putting it into the standby or
-memory sleep state, because it involves creating and saving a system image.
-Therefore there are more phases for hibernation, with a different set of
-callbacks. These phases always run after tasks have been frozen and memory has
-been freed.
-
-The general procedure for hibernation is to quiesce all devices (freeze), create
-an image of the system memory while everything is stable, reactivate all
-devices (thaw), write the image to permanent storage, and finally shut down the
-system (poweroff). The phases used to accomplish this are:
-
- prepare, freeze, freeze_late, freeze_noirq, thaw_noirq, thaw_early,
- thaw, complete, prepare, poweroff, poweroff_late, poweroff_noirq
-
- 1. The prepare phase is discussed in the "Entering System Suspend" section
- above.
-
- 2. The freeze methods should quiesce the device so that it doesn't generate
- IRQs or DMA, and they may need to save the values of device registers.
- However the device does not have to be put in a low-power state, and to
- save time it's best not to do so. Also, the device should not be
- prepared to generate wakeup events.
-
- 3. The freeze_late phase is analogous to the suspend_late phase described
- above, except that the device should not be put in a low-power state and
- should not be allowed to generate wakeup events by it.
-
- 4. The freeze_noirq phase is analogous to the suspend_noirq phase discussed
- above, except again that the device should not be put in a low-power
- state and should not be allowed to generate wakeup events.
-
-At this point the system image is created. All devices should be inactive and
-the contents of memory should remain undisturbed while this happens, so that the
-image forms an atomic snapshot of the system state.
-
- 5. The thaw_noirq phase is analogous to the resume_noirq phase discussed
- above. The main difference is that its methods can assume the device is
- in the same state as at the end of the freeze_noirq phase.
-
- 6. The thaw_early phase is analogous to the resume_early phase described
- above. Its methods should undo the actions of the preceding
- freeze_late, if necessary.
-
- 7. The thaw phase is analogous to the resume phase discussed above. Its
- methods should bring the device back to an operating state, so that it
- can be used for saving the image if necessary.
-
- 8. The complete phase is discussed in the "Leaving System Suspend" section
- above.
-
-At this point the system image is saved, and the devices then need to be
-prepared for the upcoming system shutdown. This is much like suspending them
-before putting the system into the standby or memory sleep state, and the phases
-are similar.
-
- 9. The prepare phase is discussed above.
-
- 10. The poweroff phase is analogous to the suspend phase.
-
- 11. The poweroff_late phase is analogous to the suspend_late phase.
-
- 12. The poweroff_noirq phase is analogous to the suspend_noirq phase.
-
-The poweroff, poweroff_late and poweroff_noirq callbacks should do essentially
-the same things as the suspend, suspend_late and suspend_noirq callbacks,
-respectively. The only notable difference is that they need not store the
-device register values, because the registers should already have been stored
-during the freeze, freeze_late or freeze_noirq phases.
-
-
-Leaving Hibernation
--------------------
-Resuming from hibernation is, again, more complicated than resuming from a sleep
-state in which the contents of main memory are preserved, because it requires
-a system image to be loaded into memory and the pre-hibernation memory contents
-to be restored before control can be passed back to the image kernel.
-
-Although in principle, the image might be loaded into memory and the
-pre-hibernation memory contents restored by the boot loader, in practice this
-can't be done because boot loaders aren't smart enough and there is no
-established protocol for passing the necessary information. So instead, the
-boot loader loads a fresh instance of the kernel, called the boot kernel, into
-memory and passes control to it in the usual way. Then the boot kernel reads
-the system image, restores the pre-hibernation memory contents, and passes
-control to the image kernel. Thus two different kernels are involved in
-resuming from hibernation. In fact, the boot kernel may be completely different
-from the image kernel: a different configuration and even a different version.
-This has important consequences for device drivers and their subsystems.
-
-To be able to load the system image into memory, the boot kernel needs to
-include at least a subset of device drivers allowing it to access the storage
-medium containing the image, although it doesn't need to include all of the
-drivers present in the image kernel. After the image has been loaded, the
-devices managed by the boot kernel need to be prepared for passing control back
-to the image kernel. This is very similar to the initial steps involved in
-creating a system image, and it is accomplished in the same way, using prepare,
-freeze, and freeze_noirq phases. However the devices affected by these phases
-are only those having drivers in the boot kernel; other devices will still be in
-whatever state the boot loader left them.
-
-Should the restoration of the pre-hibernation memory contents fail, the boot
-kernel would go through the "thawing" procedure described above, using the
-thaw_noirq, thaw, and complete phases, and then continue running normally. This
-happens only rarely. Most often the pre-hibernation memory contents are
-restored successfully and control is passed to the image kernel, which then
-becomes responsible for bringing the system back to the working state.
-
-To achieve this, the image kernel must restore the devices' pre-hibernation
-functionality. The operation is much like waking up from the memory sleep
-state, although it involves different phases:
-
- restore_noirq, restore_early, restore, complete
-
- 1. The restore_noirq phase is analogous to the resume_noirq phase.
-
- 2. The restore_early phase is analogous to the resume_early phase.
-
- 3. The restore phase is analogous to the resume phase.
-
- 4. The complete phase is discussed above.
-
-The main difference from resume[_early|_noirq] is that restore[_early|_noirq]
-must assume the device has been accessed and reconfigured by the boot loader or
-the boot kernel. Consequently the state of the device may be different from the
-state remembered from the freeze, freeze_late and freeze_noirq phases. The
-device may even need to be reset and completely re-initialized. In many cases
-this difference doesn't matter, so the resume[_early|_noirq] and
-restore[_early|_norq] method pointers can be set to the same routines.
-Nevertheless, different callback pointers are used in case there is a situation
-where it actually does matter.
-
-
-Device Power Management Domains
--------------------------------
-Sometimes devices share reference clocks or other power resources. In those
-cases it generally is not possible to put devices into low-power states
-individually. Instead, a set of devices sharing a power resource can be put
-into a low-power state together at the same time by turning off the shared
-power resource. Of course, they also need to be put into the full-power state
-together, by turning the shared power resource on. A set of devices with this
-property is often referred to as a power domain.
-
-Support for power domains is provided through the pm_domain field of struct
-device. This field is a pointer to an object of type struct dev_pm_domain,
-defined in include/linux/pm.h, providing a set of power management callbacks
-analogous to the subsystem-level and device driver callbacks that are executed
-for the given device during all power transitions, instead of the respective
-subsystem-level callbacks. Specifically, if a device's pm_domain pointer is
-not NULL, the ->suspend() callback from the object pointed to by it will be
-executed instead of its subsystem's (e.g. bus type's) ->suspend() callback and
-anlogously for all of the remaining callbacks. In other words, power management
-domain callbacks, if defined for the given device, always take precedence over
-the callbacks provided by the device's subsystem (e.g. bus type).
-
-The support for device power management domains is only relevant to platforms
-needing to use the same device driver power management callbacks in many
-different power domain configurations and wanting to avoid incorporating the
-support for power domains into subsystem-level callbacks, for example by
-modifying the platform bus type. Other platforms need not implement it or take
-it into account in any way.
-
-
-Device Low Power (suspend) States
----------------------------------
-Device low-power states aren't standard. One device might only handle
-"on" and "off, while another might support a dozen different versions of
-"on" (how many engines are active?), plus a state that gets back to "on"
-faster than from a full "off".
-
-Some busses define rules about what different suspend states mean. PCI
-gives one example: after the suspend sequence completes, a non-legacy
-PCI device may not perform DMA or issue IRQs, and any wakeup events it
-issues would be issued through the PME# bus signal. Plus, there are
-several PCI-standard device states, some of which are optional.
-
-In contrast, integrated system-on-chip processors often use IRQs as the
-wakeup event sources (so drivers would call enable_irq_wake) and might
-be able to treat DMA completion as a wakeup event (sometimes DMA can stay
-active too, it'd only be the CPU and some peripherals that sleep).
-
-Some details here may be platform-specific. Systems may have devices that
-can be fully active in certain sleep states, such as an LCD display that's
-refreshed using DMA while most of the system is sleeping lightly ... and
-its frame buffer might even be updated by a DSP or other non-Linux CPU while
-the Linux control processor stays idle.
-
-Moreover, the specific actions taken may depend on the target system state.
-One target system state might allow a given device to be very operational;
-another might require a hard shut down with re-initialization on resume.
-And two different target systems might use the same device in different
-ways; the aforementioned LCD might be active in one product's "standby",
-but a different product using the same SOC might work differently.
-
-
-Power Management Notifiers
---------------------------
-There are some operations that cannot be carried out by the power management
-callbacks discussed above, because the callbacks occur too late or too early.
-To handle these cases, subsystems and device drivers may register power
-management notifiers that are called before tasks are frozen and after they have
-been thawed. Generally speaking, the PM notifiers are suitable for performing
-actions that either require user space to be available, or at least won't
-interfere with user space.
-
-For details refer to Documentation/power/notifiers.txt.
-
-
-Runtime Power Management
-========================
-Many devices are able to dynamically power down while the system is still
-running. This feature is useful for devices that are not being used, and
-can offer significant power savings on a running system. These devices
-often support a range of runtime power states, which might use names such
-as "off", "sleep", "idle", "active", and so on. Those states will in some
-cases (like PCI) be partially constrained by the bus the device uses, and will
-usually include hardware states that are also used in system sleep states.
-
-A system-wide power transition can be started while some devices are in low
-power states due to runtime power management. The system sleep PM callbacks
-should recognize such situations and react to them appropriately, but the
-necessary actions are subsystem-specific.
-
-In some cases the decision may be made at the subsystem level while in other
-cases the device driver may be left to decide. In some cases it may be
-desirable to leave a suspended device in that state during a system-wide power
-transition, but in other cases the device must be put back into the full-power
-state temporarily, for example so that its system wakeup capability can be
-disabled. This all depends on the hardware and the design of the subsystem and
-device driver in question.
-
-During system-wide resume from a sleep state it's easiest to put devices into
-the full-power state, as explained in Documentation/power/runtime_pm.txt. Refer
-to that document for more information regarding this particular issue as well as
-for information on the device runtime power management framework in general.
diff --git a/Documentation/power/drivers-testing.txt b/Documentation/power/drivers-testing.txt
deleted file mode 100644
index 638afdf4d6b..00000000000
--- a/Documentation/power/drivers-testing.txt
+++ /dev/null
@@ -1,46 +0,0 @@
-Testing suspend and resume support in device drivers
- (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
-
-1. Preparing the test system
-
-Unfortunately, to effectively test the support for the system-wide suspend and
-resume transitions in a driver, it is necessary to suspend and resume a fully
-functional system with this driver loaded. Moreover, that should be done
-several times, preferably several times in a row, and separately for hibernation
-(aka suspend to disk or STD) and suspend to RAM (STR), because each of these
-cases involves slightly different operations and different interactions with
-the machine's BIOS.
-
-Of course, for this purpose the test system has to be known to suspend and
-resume without the driver being tested. Thus, if possible, you should first
-resolve all suspend/resume-related problems in the test system before you start
-testing the new driver. Please see Documentation/power/basic-pm-debugging.txt
-for more information about the debugging of suspend/resume functionality.
-
-2. Testing the driver
-
-Once you have resolved the suspend/resume-related problems with your test system
-without the new driver, you are ready to test it:
-
-a) Build the driver as a module, load it and try the test modes of hibernation
- (see: Documentation/power/basic-pm-debugging.txt, 1).
-
-b) Load the driver and attempt to hibernate in the "reboot", "shutdown" and
- "platform" modes (see: Documentation/power/basic-pm-debugging.txt, 1).
-
-c) Compile the driver directly into the kernel and try the test modes of
- hibernation.
-
-d) Attempt to hibernate with the driver compiled directly into the kernel
- in the "reboot", "shutdown" and "platform" modes.
-
-e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.txt,
- 2). [As far as the STR tests are concerned, it should not matter whether or
- not the driver is built as a module.]
-
-f) Attempt to suspend to RAM using the s2ram tool with the driver loaded
- (see: Documentation/power/basic-pm-debugging.txt, 2).
-
-Each of the above tests should be repeated several times and the STD tests
-should be mixed with the STR tests. If any of them fails, the driver cannot be
-regarded as suspend/resume-safe.
diff --git a/Documentation/power/freezing-of-tasks.txt b/Documentation/power/freezing-of-tasks.txt
deleted file mode 100644
index 6ec291ea1c7..00000000000
--- a/Documentation/power/freezing-of-tasks.txt
+++ /dev/null
@@ -1,225 +0,0 @@
-Freezing of tasks
- (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
-
-I. What is the freezing of tasks?
-
-The freezing of tasks is a mechanism by which user space processes and some
-kernel threads are controlled during hibernation or system-wide suspend (on some
-architectures).
-
-II. How does it work?
-
-There are three per-task flags used for that, PF_NOFREEZE, PF_FROZEN
-and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have
-PF_NOFREEZE unset (all user space processes and some kernel threads) are
-regarded as 'freezable' and treated in a special way before the system enters a
-suspend state as well as before a hibernation image is created (in what follows
-we only consider hibernation, but the description also applies to suspend).
-
-Namely, as the first step of the hibernation procedure the function
-freeze_processes() (defined in kernel/power/process.c) is called. A system-wide
-variable system_freezing_cnt (as opposed to a per-task flag) is used to indicate
-whether the system is to undergo a freezing operation. And freeze_processes()
-sets this variable. After this, it executes try_to_freeze_tasks() that sends a
-fake signal to all user space processes, and wakes up all the kernel threads.
-All freezable tasks must react to that by calling try_to_freeze(), which
-results in a call to __refrigerator() (defined in kernel/freezer.c), which sets
-the task's PF_FROZEN flag, changes its state to TASK_UNINTERRUPTIBLE and makes
-it loop until PF_FROZEN is cleared for it. Then, we say that the task is
-'frozen' and therefore the set of functions handling this mechanism is referred
-to as 'the freezer' (these functions are defined in kernel/power/process.c,
-kernel/freezer.c & include/linux/freezer.h). User space processes are generally
-frozen before kernel threads.
-
-__refrigerator() must not be called directly. Instead, use the
-try_to_freeze() function (defined in include/linux/freezer.h), that checks
-if the task is to be frozen and makes the task enter __refrigerator().
-
-For user space processes try_to_freeze() is called automatically from the
-signal-handling code, but the freezable kernel threads need to call it
-explicitly in suitable places or use the wait_event_freezable() or
-wait_event_freezable_timeout() macros (defined in include/linux/freezer.h)
-that combine interruptible sleep with checking if the task is to be frozen and
-calling try_to_freeze(). The main loop of a freezable kernel thread may look
-like the following one:
-
- set_freezable();
- do {
- hub_events();
- wait_event_freezable(khubd_wait,
- !list_empty(&hub_event_list) ||
- kthread_should_stop());
- } while (!kthread_should_stop() || !list_empty(&hub_event_list));
-
-(from drivers/usb/core/hub.c::hub_thread()).
-
-If a freezable kernel thread fails to call try_to_freeze() after the freezer has
-initiated a freezing operation, the freezing of tasks will fail and the entire
-hibernation operation will be cancelled. For this reason, freezable kernel
-threads must call try_to_freeze() somewhere or use one of the
-wait_event_freezable() and wait_event_freezable_timeout() macros.
-
-After the system memory state has been restored from a hibernation image and
-devices have been reinitialized, the function thaw_processes() is called in
-order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that
-have been frozen leave __refrigerator() and continue running.
-
-
-Rationale behind the functions dealing with freezing and thawing of tasks:
--------------------------------------------------------------------------
-
-freeze_processes():
- - freezes only userspace tasks
-
-freeze_kernel_threads():
- - freezes all tasks (including kernel threads) because we can't freeze
- kernel threads without freezing userspace tasks
-
-thaw_kernel_threads():
- - thaws only kernel threads; this is particularly useful if we need to do
- anything special in between thawing of kernel threads and thawing of
- userspace tasks, or if we want to postpone the thawing of userspace tasks
-
-thaw_processes():
- - thaws all tasks (including kernel threads) because we can't thaw userspace
- tasks without thawing kernel threads
-
-
-III. Which kernel threads are freezable?
-
-Kernel threads are not freezable by default. However, a kernel thread may clear
-PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
-directly is not allowed). From this point it is regarded as freezable
-and must call try_to_freeze() in a suitable place.
-
-IV. Why do we do that?
-
-Generally speaking, there is a couple of reasons to use the freezing of tasks:
-
-1. The principal reason is to prevent filesystems from being damaged after
-hibernation. At the moment we have no simple means of checkpointing
-filesystems, so if there are any modifications made to filesystem data and/or
-metadata on disks, we cannot bring them back to the state from before the
-modifications. At the same time each hibernation image contains some
-filesystem-related information that must be consistent with the state of the
-on-disk data and metadata after the system memory state has been restored from
-the image (otherwise the filesystems will be damaged in a nasty way, usually
-making them almost impossible to repair). We therefore freeze tasks that might
-cause the on-disk filesystems' data and metadata to be modified after the
-hibernation image has been created and before the system is finally powered off.
-The majority of these are user space processes, but if any of the kernel threads
-may cause something like this to happen, they have to be freezable.
-
-2. Next, to create the hibernation image we need to free a sufficient amount of
-memory (approximately 50% of available RAM) and we need to do that before
-devices are deactivated, because we generally need them for swapping out. Then,
-after the memory for the image has been freed, we don't want tasks to allocate
-additional memory and we prevent them from doing that by freezing them earlier.
-[Of course, this also means that device drivers should not allocate substantial
-amounts of memory from their .suspend() callbacks before hibernation, but this
-is a separate issue.]
-
-3. The third reason is to prevent user space processes and some kernel threads
-from interfering with the suspending and resuming of devices. A user space
-process running on a second CPU while we are suspending devices may, for
-example, be troublesome and without the freezing of tasks we would need some
-safeguards against race conditions that might occur in such a case.
-
-Although Linus Torvalds doesn't like the freezing of tasks, he said this in one
-of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
-
-"RJW:> Why we freeze tasks at all or why we freeze kernel threads?
-
-Linus: In many ways, 'at all'.
-
-I _do_ realize the IO request queue issues, and that we cannot actually do
-s2ram with some devices in the middle of a DMA. So we want to be able to
-avoid *that*, there's no question about that. And I suspect that stopping
-user threads and then waiting for a sync is practically one of the easier
-ways to do so.
-
-So in practice, the 'at all' may become a 'why freeze kernel threads?' and
-freezing user threads I don't find really objectionable."
-
-Still, there are kernel threads that may want to be freezable. For example, if
-a kernel thread that belongs to a device driver accesses the device directly, it
-in principle needs to know when the device is suspended, so that it doesn't try
-to access it at that time. However, if the kernel thread is freezable, it will
-be frozen before the driver's .suspend() callback is executed and it will be
-thawed after the driver's .resume() callback has run, so it won't be accessing
-the device while it's suspended.
-
-4. Another reason for freezing tasks is to prevent user space processes from
-realizing that hibernation (or suspend) operation takes place. Ideally, user
-space processes should not notice that such a system-wide operation has occurred
-and should continue running without any problems after the restore (or resume
-from suspend). Unfortunately, in the most general case this is quite difficult
-to achieve without the freezing of tasks. Consider, for example, a process
-that depends on all CPUs being online while it's running. Since we need to
-disable nonboot CPUs during the hibernation, if this process is not frozen, it
-may notice that the number of CPUs has changed and may start to work incorrectly
-because of that.
-
-V. Are there any problems related to the freezing of tasks?
-
-Yes, there are.
-
-First of all, the freezing of kernel threads may be tricky if they depend one
-on another. For example, if kernel thread A waits for a completion (in the
-TASK_UNINTERRUPTIBLE state) that needs to be done by freezable kernel thread B
-and B is frozen in the meantime, then A will be blocked until B is thawed, which
-may be undesirable. That's why kernel threads are not freezable by default.
-
-Second, there are the following two problems related to the freezing of user
-space processes:
-1. Putting processes into an uninterruptible sleep distorts the load average.
-2. Now that we have FUSE, plus the framework for doing device drivers in
-userspace, it gets even more complicated because some userspace processes are
-now doing the sorts of things that kernel threads do
-(https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).
-
-The problem 1. seems to be fixable, although it hasn't been fixed so far. The
-other one is more serious, but it seems that we can work around it by using
-hibernation (and suspend) notifiers (in that case, though, we won't be able to
-avoid the realization by the user space processes that the hibernation is taking
-place).
-
-There are also problems that the freezing of tasks tends to expose, although
-they are not directly related to it. For example, if request_firmware() is
-called from a device driver's .resume() routine, it will timeout and eventually
-fail, because the user land process that should respond to the request is frozen
-at this point. So, seemingly, the failure is due to the freezing of tasks.
-Suppose, however, that the firmware file is located on a filesystem accessible
-only through another device that hasn't been resumed yet. In that case,
-request_firmware() will fail regardless of whether or not the freezing of tasks
-is used. Consequently, the problem is not really related to the freezing of
-tasks, since it generally exists anyway.
-
-A driver must have all firmwares it may need in RAM before suspend() is called.
-If keeping them is not practical, for example due to their size, they must be
-requested early enough using the suspend notifier API described in notifiers.txt.
-
-VI. Are there any precautions to be taken to prevent freezing failures?
-
-Yes, there are.
-
-First of all, grabbing the 'pm_mutex' lock to mutually exclude a piece of code
-from system-wide sleep such as suspend/hibernation is not encouraged.
-If possible, that piece of code must instead hook onto the suspend/hibernation
-notifiers to achieve mutual exclusion. Look at the CPU-Hotplug code
-(kernel/cpu.c) for an example.
-
-However, if that is not feasible, and grabbing 'pm_mutex' is deemed necessary,
-it is strongly discouraged to directly call mutex_[un]lock(&pm_mutex) since
-that could lead to freezing failures, because if the suspend/hibernate code
-successfully acquired the 'pm_mutex' lock, and hence that other entity failed
-to acquire the lock, then that task would get blocked in TASK_UNINTERRUPTIBLE
-state. As a consequence, the freezer would not be able to freeze that task,
-leading to freezing failure.
-
-However, the [un]lock_system_sleep() APIs are safe to use in this scenario,
-since they ask the freezer to skip freezing this task, since it is anyway
-"frozen enough" as it is blocked on 'pm_mutex', which will be released
-only after the entire suspend/hibernation sequence is complete.
-So, to summarize, use [un]lock_system_sleep() instead of directly using
-mutex_[un]lock(&pm_mutex). That would prevent freezing failures.
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt
deleted file mode 100644
index c537834af00..00000000000
--- a/Documentation/power/interface.txt
+++ /dev/null
@@ -1,75 +0,0 @@
-Power Management Interface
-
-
-The power management subsystem provides a unified sysfs interface to
-userspace, regardless of what architecture or platform one is
-running. The interface exists in /sys/power/ directory (assuming sysfs
-is mounted at /sys).
-
-/sys/power/state controls system power state. Reading from this file
-returns what states are supported, which is hard-coded to 'standby'
-(Power-On Suspend), 'mem' (Suspend-to-RAM), and 'disk'
-(Suspend-to-Disk).
-
-Writing to this file one of those strings causes the system to
-transition into that state. Please see the file
-Documentation/power/states.txt for a description of each of those
-states.
-
-
-/sys/power/disk controls the operating mode of the suspend-to-disk
-mechanism. Suspend-to-disk can be handled in several ways. We have a
-few options for putting the system to sleep - using the platform driver
-(e.g. ACPI or other suspend_ops), powering off the system or rebooting the
-system (for testing).
-
-Additionally, /sys/power/disk can be used to turn on one of the two testing
-modes of the suspend-to-disk mechanism: 'testproc' or 'test'. If the
-suspend-to-disk mechanism is in the 'testproc' mode, writing 'disk' to
-/sys/power/state will cause the kernel to disable nonboot CPUs and freeze
-tasks, wait for 5 seconds, unfreeze tasks and enable nonboot CPUs. If it is
-in the 'test' mode, writing 'disk' to /sys/power/state will cause the kernel
-to disable nonboot CPUs and freeze tasks, shrink memory, suspend devices, wait
-for 5 seconds, resume devices, unfreeze tasks and enable nonboot CPUs. Then,
-we are able to look in the log messages and work out, for example, which code
-is being slow and which device drivers are misbehaving.
-
-Reading from this file will display all supported modes and the currently
-selected one in brackets, for example
-
- [shutdown] reboot test testproc
-
-Writing to this file will accept one of
-
- 'platform' (only if the platform supports it)
- 'shutdown'
- 'reboot'
- 'testproc'
- 'test'
-
-/sys/power/image_size controls the size of the image created by
-the suspend-to-disk mechanism. It can be written a string
-representing a non-negative integer that will be used as an upper
-limit of the image size, in bytes. The suspend-to-disk mechanism will
-do its best to ensure the image size will not exceed that number. However,
-if this turns out to be impossible, it will try to suspend anyway using the
-smallest image possible. In particular, if "0" is written to this file, the
-suspend image will be as small as possible.
-
-Reading from this file will display the current image size limit, which
-is set to 2/5 of available RAM by default.
-
-/sys/power/pm_trace controls the code which saves the last PM event point in
-the RTC across reboots, so that you can debug a machine that just hangs
-during suspend (or more commonly, during resume). Namely, the RTC is only
-used to save the last PM event point if this file contains '1'. Initially it
-contains '0' which may be changed to '1' by writing a string representing a
-nonzero integer into it.
-
-To use this debugging feature you should attempt to suspend the machine, then
-reboot it and run
-
- dmesg -s 1000000 | grep 'hash matches'
-
-CAUTION: Using it will cause your machine's real-time (CMOS) clock to be
-set to a random invalid time after a resume.
diff --git a/Documentation/power/notifiers.txt b/Documentation/power/notifiers.txt
deleted file mode 100644
index c2a4a346c0d..00000000000
--- a/Documentation/power/notifiers.txt
+++ /dev/null
@@ -1,53 +0,0 @@
-Suspend notifiers
- (C) 2007-2011 Rafael J. Wysocki <rjw@sisk.pl>, GPL
-
-There are some operations that subsystems or drivers may want to carry out
-before hibernation/suspend or after restore/resume, but they require the system
-to be fully functional, so the drivers' and subsystems' .suspend() and .resume()
-or even .prepare() and .complete() callbacks are not suitable for this purpose.
-For example, device drivers may want to upload firmware to their devices after
-resume/restore, but they cannot do it by calling request_firmware() from their
-.resume() or .complete() routines (user land processes are frozen at these
-points). The solution may be to load the firmware into memory before processes
-are frozen and upload it from there in the .resume() routine.
-A suspend/hibernation notifier may be used for this purpose.
-
-The subsystems or drivers having such needs can register suspend notifiers that
-will be called upon the following events by the PM core:
-
-PM_HIBERNATION_PREPARE The system is going to hibernate or suspend, tasks will
- be frozen immediately.
-
-PM_POST_HIBERNATION The system memory state has been restored from a
- hibernation image or an error occurred during
- hibernation. Device drivers' restore callbacks have
- been executed and tasks have been thawed.
-
-PM_RESTORE_PREPARE The system is going to restore a hibernation image.
- If all goes well, the restored kernel will issue a
- PM_POST_HIBERNATION notification.
-
-PM_POST_RESTORE An error occurred during restore from hibernation.
- Device drivers' restore callbacks have been executed
- and tasks have been thawed.
-
-PM_SUSPEND_PREPARE The system is preparing for suspend.
-
-PM_POST_SUSPEND The system has just resumed or an error occurred during
- suspend. Device drivers' resume callbacks have been
- executed and tasks have been thawed.
-
-It is generally assumed that whatever the notifiers do for
-PM_HIBERNATION_PREPARE, should be undone for PM_POST_HIBERNATION. Analogously,
-operations performed for PM_SUSPEND_PREPARE should be reversed for
-PM_POST_SUSPEND. Additionally, all of the notifiers are called for
-PM_POST_HIBERNATION if one of them fails for PM_HIBERNATION_PREPARE, and
-all of the notifiers are called for PM_POST_SUSPEND if one of them fails for
-PM_SUSPEND_PREPARE.
-
-The hibernation and suspend notifiers are called with pm_mutex held. They are
-defined in the usual way, but their last argument is meaningless (it is always
-NULL). To register and/or unregister a suspend notifier use the functions
-register_pm_notifier() and unregister_pm_notifier(), respectively, defined in
-include/linux/suspend.h . If you don't need to unregister the notifier, you can
-also use the pm_notifier() macro defined in include/linux/suspend.h .
diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt
deleted file mode 100644
index 3035d00757a..00000000000
--- a/Documentation/power/opp.txt
+++ /dev/null
@@ -1,380 +0,0 @@
-*=============*
-* OPP Library *
-*=============*
-
-(C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated
-
-Contents
---------
-1. Introduction
-2. Initial OPP List Registration
-3. OPP Search Functions
-4. OPP Availability Control Functions
-5. OPP Data Retrieval Functions
-6. Cpufreq Table Generation
-7. Data Structures
-
-1. Introduction
-===============
-Complex SoCs of today consists of a multiple sub-modules working in conjunction.
-In an operational system executing varied use cases, not all modules in the SoC
-need to function at their highest performing frequency all the time. To
-facilitate this, sub-modules in a SoC are grouped into domains, allowing some
-domains to run at lower voltage and frequency while other domains are loaded
-more. The set of discrete tuples consisting of frequency and voltage pairs that
-the device will support per domain are called Operating Performance Points or
-OPPs.
-
-OPP library provides a set of helper functions to organize and query the OPP
-information. The library is located in drivers/base/power/opp.c and the header
-is located in include/linux/opp.h. OPP library can be enabled by enabling
-CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on
-CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to
-optionally boot at a certain OPP without needing cpufreq.
-
-Typical usage of the OPP library is as follows:
-(users) -> registers a set of default OPPs -> (library)
-SoC framework -> modifies on required cases certain OPPs -> OPP layer
- -> queries to search/retrieve information ->
-
-Architectures that provide a SoC framework for OPP should select ARCH_HAS_OPP
-to make the OPP layer available.
-
-OPP layer expects each domain to be represented by a unique device pointer. SoC
-framework registers a set of initial OPPs per device with the OPP layer. This
-list is expected to be an optimally small number typically around 5 per device.
-This initial list contains a set of OPPs that the framework expects to be safely
-enabled by default in the system.
-
-Note on OPP Availability:
-------------------------
-As the system proceeds to operate, SoC framework may choose to make certain
-OPPs available or not available on each device based on various external
-factors. Example usage: Thermal management or other exceptional situations where
-SoC framework might choose to disable a higher frequency OPP to safely continue
-operations until that OPP could be re-enabled if possible.
-
-OPP library facilitates this concept in it's implementation. The following
-operational functions operate only on available opps:
-opp_find_freq_{ceil, floor}, opp_get_voltage, opp_get_freq, opp_get_opp_count
-and opp_init_cpufreq_table
-
-opp_find_freq_exact is meant to be used to find the opp pointer which can then
-be used for opp_enable/disable functions to make an opp available as required.
-
-WARNING: Users of OPP library should refresh their availability count using
-get_opp_count if opp_enable/disable functions are invoked for a device, the
-exact mechanism to trigger these or the notification mechanism to other
-dependent subsystems such as cpufreq are left to the discretion of the SoC
-specific framework which uses the OPP library. Similar care needs to be taken
-care to refresh the cpufreq table in cases of these operations.
-
-WARNING on OPP List locking mechanism:
--------------------------------------------------
-OPP library uses RCU for exclusivity. RCU allows the query functions to operate
-in multiple contexts and this synchronization mechanism is optimal for a read
-intensive operations on data structure as the OPP library caters to.
-
-To ensure that the data retrieved are sane, the users such as SoC framework
-should ensure that the section of code operating on OPP queries are locked
-using RCU read locks. The opp_find_freq_{exact,ceil,floor},
-opp_get_{voltage, freq, opp_count} fall into this category.
-
-opp_{add,enable,disable} are updaters which use mutex and implement it's own
-RCU locking mechanisms. opp_init_cpufreq_table acts as an updater and uses
-mutex to implment RCU updater strategy. These functions should *NOT* be called
-under RCU locks and other contexts that prevent blocking functions in RCU or
-mutex operations from working.
-
-2. Initial OPP List Registration
-================================
-The SoC implementation calls opp_add function iteratively to add OPPs per
-device. It is expected that the SoC framework will register the OPP entries
-optimally- typical numbers range to be less than 5. The list generated by
-registering the OPPs is maintained by OPP library throughout the device
-operation. The SoC framework can subsequently control the availability of the
-OPPs dynamically using the opp_enable / disable functions.
-
-opp_add - Add a new OPP for a specific domain represented by the device pointer.
- The OPP is defined using the frequency and voltage. Once added, the OPP
- is assumed to be available and control of it's availability can be done
- with the opp_enable/disable functions. OPP library internally stores
- and manages this information in the opp struct. This function may be
- used by SoC framework to define a optimal list as per the demands of
- SoC usage environment.
-
- WARNING: Do not use this function in interrupt context.
-
- Example:
- soc_pm_init()
- {
- /* Do things */
- r = opp_add(mpu_dev, 1000000, 900000);
- if (!r) {
- pr_err("%s: unable to register mpu opp(%d)\n", r);
- goto no_cpufreq;
- }
- /* Do cpufreq things */
- no_cpufreq:
- /* Do remaining things */
- }
-
-3. OPP Search Functions
-=======================
-High level framework such as cpufreq operates on frequencies. To map the
-frequency back to the corresponding OPP, OPP library provides handy functions
-to search the OPP list that OPP library internally manages. These search
-functions return the matching pointer representing the opp if a match is
-found, else returns error. These errors are expected to be handled by standard
-error checks such as IS_ERR() and appropriate actions taken by the caller.
-
-opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
- availability. This function is especially useful to enable an OPP which
- is not available by default.
- Example: In a case when SoC framework detects a situation where a
- higher frequency could be made available, it can use this function to
- find the OPP prior to call the opp_enable to actually make it available.
- rcu_read_lock();
- opp = opp_find_freq_exact(dev, 1000000000, false);
- rcu_read_unlock();
- /* dont operate on the pointer.. just do a sanity check.. */
- if (IS_ERR(opp)) {
- pr_err("frequency not disabled!\n");
- /* trigger appropriate actions.. */
- } else {
- opp_enable(dev,1000000000);
- }
-
- NOTE: This is the only search function that operates on OPPs which are
- not available.
-
-opp_find_freq_floor - Search for an available OPP which is *at most* the
- provided frequency. This function is useful while searching for a lesser
- match OR operating on OPP information in the order of decreasing
- frequency.
- Example: To find the highest opp for a device:
- freq = ULONG_MAX;
- rcu_read_lock();
- opp_find_freq_floor(dev, &freq);
- rcu_read_unlock();
-
-opp_find_freq_ceil - Search for an available OPP which is *at least* the
- provided frequency. This function is useful while searching for a
- higher match OR operating on OPP information in the order of increasing
- frequency.
- Example 1: To find the lowest opp for a device:
- freq = 0;
- rcu_read_lock();
- opp_find_freq_ceil(dev, &freq);
- rcu_read_unlock();
- Example 2: A simplified implementation of a SoC cpufreq_driver->target:
- soc_cpufreq_target(..)
- {
- /* Do stuff like policy checks etc. */
- /* Find the best frequency match for the req */
- rcu_read_lock();
- opp = opp_find_freq_ceil(dev, &freq);
- rcu_read_unlock();
- if (!IS_ERR(opp))
- soc_switch_to_freq_voltage(freq);
- else
- /* do something when we can't satisfy the req */
- /* do other stuff */
- }
-
-4. OPP Availability Control Functions
-=====================================
-A default OPP list registered with the OPP library may not cater to all possible
-situation. The OPP library provides a set of functions to modify the
-availability of a OPP within the OPP list. This allows SoC frameworks to have
-fine grained dynamic control of which sets of OPPs are operationally available.
-These functions are intended to *temporarily* remove an OPP in conditions such
-as thermal considerations (e.g. don't use OPPx until the temperature drops).
-
-WARNING: Do not use these functions in interrupt context.
-
-opp_enable - Make a OPP available for operation.
- Example: Lets say that 1GHz OPP is to be made available only if the
- SoC temperature is lower than a certain threshold. The SoC framework
- implementation might choose to do something as follows:
- if (cur_temp < temp_low_thresh) {
- /* Enable 1GHz if it was disabled */
- rcu_read_lock();
- opp = opp_find_freq_exact(dev, 1000000000, false);
- rcu_read_unlock();
- /* just error check */
- if (!IS_ERR(opp))
- ret = opp_enable(dev, 1000000000);
- else
- goto try_something_else;
- }
-
-opp_disable - Make an OPP to be not available for operation
- Example: Lets say that 1GHz OPP is to be disabled if the temperature
- exceeds a threshold value. The SoC framework implementation might
- choose to do something as follows:
- if (cur_temp > temp_high_thresh) {
- /* Disable 1GHz if it was enabled */
- rcu_read_lock();
- opp = opp_find_freq_exact(dev, 1000000000, true);
- rcu_read_unlock();
- /* just error check */
- if (!IS_ERR(opp))
- ret = opp_disable(dev, 1000000000);
- else
- goto try_something_else;
- }
-
-5. OPP Data Retrieval Functions
-===============================
-Since OPP library abstracts away the OPP information, a set of functions to pull
-information from the OPP structure is necessary. Once an OPP pointer is
-retrieved using the search functions, the following functions can be used by SoC
-framework to retrieve the information represented inside the OPP layer.
-
-opp_get_voltage - Retrieve the voltage represented by the opp pointer.
- Example: At a cpufreq transition to a different frequency, SoC
- framework requires to set the voltage represented by the OPP using
- the regulator framework to the Power Management chip providing the
- voltage.
- soc_switch_to_freq_voltage(freq)
- {
- /* do things */
- rcu_read_lock();
- opp = opp_find_freq_ceil(dev, &freq);
- v = opp_get_voltage(opp);
- rcu_read_unlock();
- if (v)
- regulator_set_voltage(.., v);
- /* do other things */
- }
-
-opp_get_freq - Retrieve the freq represented by the opp pointer.
- Example: Lets say the SoC framework uses a couple of helper functions
- we could pass opp pointers instead of doing additional parameters to
- handle quiet a bit of data parameters.
- soc_cpufreq_target(..)
- {
- /* do things.. */
- max_freq = ULONG_MAX;
- rcu_read_lock();
- max_opp = opp_find_freq_floor(dev,&max_freq);
- requested_opp = opp_find_freq_ceil(dev,&freq);
- if (!IS_ERR(max_opp) && !IS_ERR(requested_opp))
- r = soc_test_validity(max_opp, requested_opp);
- rcu_read_unlock();
- /* do other things */
- }
- soc_test_validity(..)
- {
- if(opp_get_voltage(max_opp) < opp_get_voltage(requested_opp))
- return -EINVAL;
- if(opp_get_freq(max_opp) < opp_get_freq(requested_opp))
- return -EINVAL;
- /* do things.. */
- }
-
-opp_get_opp_count - Retrieve the number of available opps for a device
- Example: Lets say a co-processor in the SoC needs to know the available
- frequencies in a table, the main processor can notify as following:
- soc_notify_coproc_available_frequencies()
- {
- /* Do things */
- rcu_read_lock();
- num_available = opp_get_opp_count(dev);
- speeds = kzalloc(sizeof(u32) * num_available, GFP_KERNEL);
- /* populate the table in increasing order */
- freq = 0;
- while (!IS_ERR(opp = opp_find_freq_ceil(dev, &freq))) {
- speeds[i] = freq;
- freq++;
- i++;
- }
- rcu_read_unlock();
-
- soc_notify_coproc(AVAILABLE_FREQs, speeds, num_available);
- /* Do other things */
- }
-
-6. Cpufreq Table Generation
-===========================
-opp_init_cpufreq_table - cpufreq framework typically is initialized with
- cpufreq_frequency_table_cpuinfo which is provided with the list of
- frequencies that are available for operation. This function provides
- a ready to use conversion routine to translate the OPP layer's internal
- information about the available frequencies into a format readily
- providable to cpufreq.
-
- WARNING: Do not use this function in interrupt context.
-
- Example:
- soc_pm_init()
- {
- /* Do things */
- r = opp_init_cpufreq_table(dev, &freq_table);
- if (!r)
- cpufreq_frequency_table_cpuinfo(policy, freq_table);
- /* Do other things */
- }
-
- NOTE: This function is available only if CONFIG_CPU_FREQ is enabled in
- addition to CONFIG_PM as power management feature is required to
- dynamically scale voltage and frequency in a system.
-
-opp_free_cpufreq_table - Free up the table allocated by opp_init_cpufreq_table
-
-7. Data Structures
-==================
-Typically an SoC contains multiple voltage domains which are variable. Each
-domain is represented by a device pointer. The relationship to OPP can be
-represented as follows:
-SoC
- |- device 1
- | |- opp 1 (availability, freq, voltage)
- | |- opp 2 ..
- ... ...
- | `- opp n ..
- |- device 2
- ...
- `- device m
-
-OPP library maintains a internal list that the SoC framework populates and
-accessed by various functions as described above. However, the structures
-representing the actual OPPs and domains are internal to the OPP library itself
-to allow for suitable abstraction reusable across systems.
-
-struct opp - The internal data structure of OPP library which is used to
- represent an OPP. In addition to the freq, voltage, availability
- information, it also contains internal book keeping information required
- for the OPP library to operate on. Pointer to this structure is
- provided back to the users such as SoC framework to be used as a
- identifier for OPP in the interactions with OPP layer.
-
- WARNING: The struct opp pointer should not be parsed or modified by the
- users. The defaults of for an instance is populated by opp_add, but the
- availability of the OPP can be modified by opp_enable/disable functions.
-
-struct device - This is used to identify a domain to the OPP layer. The
- nature of the device and it's implementation is left to the user of
- OPP library such as the SoC framework.
-
-Overall, in a simplistic view, the data structure operations is represented as
-following:
-
-Initialization / modification:
- +-----+ /- opp_enable
-opp_add --> | opp | <-------
- | +-----+ \- opp_disable
- \-------> domain_info(device)
-
-Search functions:
- /-- opp_find_freq_ceil ---\ +-----+
-domain_info<---- opp_find_freq_exact -----> | opp |
- \-- opp_find_freq_floor ---/ +-----+
-
-Retrieval functions:
-+-----+ /- opp_get_voltage
-| opp | <---
-+-----+ \- opp_get_freq
-
-domain_info <- opp_get_opp_count
diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt
deleted file mode 100644
index 62328d76b55..00000000000
--- a/Documentation/power/pci.txt
+++ /dev/null
@@ -1,1025 +0,0 @@
-PCI Power Management
-
-Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
-
-An overview of concepts and the Linux kernel's interfaces related to PCI power
-management. Based on previous work by Patrick Mochel <mochel@transmeta.com>
-(and others).
-
-This document only covers the aspects of power management specific to PCI
-devices. For general description of the kernel's interfaces related to device
-power management refer to Documentation/power/devices.txt and
-Documentation/power/runtime_pm.txt.
-
----------------------------------------------------------------------------
-
-1. Hardware and Platform Support for PCI Power Management
-2. PCI Subsystem and Device Power Management
-3. PCI Device Drivers and Power Management
-4. Resources
-
-
-1. Hardware and Platform Support for PCI Power Management
-=========================================================
-
-1.1. Native and Platform-Based Power Management
------------------------------------------------
-In general, power management is a feature allowing one to save energy by putting
-devices into states in which they draw less power (low-power states) at the
-price of reduced functionality or performance.
-
-Usually, a device is put into a low-power state when it is underutilized or
-completely inactive. However, when it is necessary to use the device once
-again, it has to be put back into the "fully functional" state (full-power
-state). This may happen when there are some data for the device to handle or
-as a result of an external event requiring the device to be active, which may
-be signaled by the device itself.
-
-PCI devices may be put into low-power states in two ways, by using the device
-capabilities introduced by the PCI Bus Power Management Interface Specification,
-or with the help of platform firmware, such as an ACPI BIOS. In the first
-approach, that is referred to as the native PCI power management (native PCI PM)
-in what follows, the device power state is changed as a result of writing a
-specific value into one of its standard configuration registers. The second
-approach requires the platform firmware to provide special methods that may be
-used by the kernel to change the device's power state.
-
-Devices supporting the native PCI PM usually can generate wakeup signals called
-Power Management Events (PMEs) to let the kernel know about external events
-requiring the device to be active. After receiving a PME the kernel is supposed
-to put the device that sent it into the full-power state. However, the PCI Bus
-Power Management Interface Specification doesn't define any standard method of
-delivering the PME from the device to the CPU and the operating system kernel.
-It is assumed that the platform firmware will perform this task and therefore,
-even though a PCI device is set up to generate PMEs, it also may be necessary to
-prepare the platform firmware for notifying the CPU of the PMEs coming from the
-device (e.g. by generating interrupts).
-
-In turn, if the methods provided by the platform firmware are used for changing
-the power state of a device, usually the platform also provides a method for
-preparing the device to generate wakeup signals. In that case, however, it
-often also is necessary to prepare the device for generating PMEs using the
-native PCI PM mechanism, because the method provided by the platform depends on
-that.
-
-Thus in many situations both the native and the platform-based power management
-mechanisms have to be used simultaneously to obtain the desired result.
-
-1.2. Native PCI Power Management
---------------------------------
-The PCI Bus Power Management Interface Specification (PCI PM Spec) was
-introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a
-standard interface for performing various operations related to power
-management.
-
-The implementation of the PCI PM Spec is optional for conventional PCI devices,
-but it is mandatory for PCI Express devices. If a device supports the PCI PM
-Spec, it has an 8 byte power management capability field in its PCI
-configuration space. This field is used to describe and control the standard
-features related to the native PCI power management.
-
-The PCI PM Spec defines 4 operating states for devices (D0-D3) and for buses
-(B0-B3). The higher the number, the less power is drawn by the device or bus
-in that state. However, the higher the number, the longer the latency for
-the device or bus to return to the full-power state (D0 or B0, respectively).
-
-There are two variants of the D3 state defined by the specification. The first
-one is D3hot, referred to as the software accessible D3, because devices can be
-programmed to go into it. The second one, D3cold, is the state that PCI devices
-are in when the supply voltage (Vcc) is removed from them. It is not possible
-to program a PCI device to go into D3cold, although there may be a programmable
-interface for putting the bus the device is on into a state in which Vcc is
-removed from all devices on the bus.
-
-PCI bus power management, however, is not supported by the Linux kernel at the
-time of this writing and therefore it is not covered by this document.
-
-Note that every PCI device can be in the full-power state (D0) or in D3cold,
-regardless of whether or not it implements the PCI PM Spec. In addition to
-that, if the PCI PM Spec is implemented by the device, it must support D3hot
-as well as D0. The support for the D1 and D2 power states is optional.
-
-PCI devices supporting the PCI PM Spec can be programmed to go to any of the
-supported low-power states (except for D3cold). While in D1-D3hot the
-standard configuration registers of the device must be accessible to software
-(i.e. the device is required to respond to PCI configuration accesses), although
-its I/O and memory spaces are then disabled. This allows the device to be
-programmatically put into D0. Thus the kernel can switch the device back and
-forth between D0 and the supported low-power states (except for D3cold) and the
-possible power state transitions the device can undergo are the following:
-
-+----------------------------+
-| Current State | New State |
-+----------------------------+
-| D0 | D1, D2, D3 |
-+----------------------------+
-| D1 | D2, D3 |
-+----------------------------+
-| D2 | D3 |
-+----------------------------+
-| D1, D2, D3 | D0 |
-+----------------------------+
-
-The transition from D3cold to D0 occurs when the supply voltage is provided to
-the device (i.e. power is restored). In that case the device returns to D0 with
-a full power-on reset sequence and the power-on defaults are restored to the
-device by hardware just as at initial power up.
-
-PCI devices supporting the PCI PM Spec can be programmed to generate PMEs
-while in a low-power state (D1-D3), but they are not required to be capable
-of generating PMEs from all supported low-power states. In particular, the
-capability of generating PMEs from D3cold is optional and depends on the
-presence of additional voltage (3.3Vaux) allowing the device to remain
-sufficiently active to generate a wakeup signal.
-
-1.3. ACPI Device Power Management
----------------------------------
-The platform firmware support for the power management of PCI devices is
-system-specific. However, if the system in question is compliant with the
-Advanced Configuration and Power Interface (ACPI) Specification, like the
-majority of x86-based systems, it is supposed to implement device power
-management interfaces defined by the ACPI standard.
-
-For this purpose the ACPI BIOS provides special functions called "control
-methods" that may be executed by the kernel to perform specific tasks, such as
-putting a device into a low-power state. These control methods are encoded
-using special byte-code language called the ACPI Machine Language (AML) and
-stored in the machine's BIOS. The kernel loads them from the BIOS and executes
-them as needed using an AML interpreter that translates the AML byte code into
-computations and memory or I/O space accesses. This way, in theory, a BIOS
-writer can provide the kernel with a means to perform actions depending
-on the system design in a system-specific fashion.
-
-ACPI control methods may be divided into global control methods, that are not
-associated with any particular devices, and device control methods, that have
-to be defined separately for each device supposed to be handled with the help of
-the platform. This means, in particular, that ACPI device control methods can
-only be used to handle devices that the BIOS writer knew about in advance. The
-ACPI methods used for device power management fall into that category.
-
-The ACPI specification assumes that devices can be in one of four power states
-labeled as D0, D1, D2, and D3 that roughly correspond to the native PCI PM
-D0-D3 states (although the difference between D3hot and D3cold is not taken
-into account by ACPI). Moreover, for each power state of a device there is a
-set of power resources that have to be enabled for the device to be put into
-that state. These power resources are controlled (i.e. enabled or disabled)
-with the help of their own control methods, _ON and _OFF, that have to be
-defined individually for each of them.
-
-To put a device into the ACPI power state Dx (where x is a number between 0 and
-3 inclusive) the kernel is supposed to (1) enable the power resources required
-by the device in this state using their _ON control methods and (2) execute the
-_PSx control method defined for the device. In addition to that, if the device
-is going to be put into a low-power state (D1-D3) and is supposed to generate
-wakeup signals from that state, the _DSW (or _PSW, replaced with _DSW by ACPI
-3.0) control method defined for it has to be executed before _PSx. Power
-resources that are not required by the device in the target power state and are
-not required any more by any other device should be disabled (by executing their
-_OFF control methods). If the current power state of the device is D3, it can
-only be put into D0 this way.
-
-However, quite often the power states of devices are changed during a
-system-wide transition into a sleep state or back into the working state. ACPI
-defines four system sleep states, S1, S2, S3, and S4, and denotes the system
-working state as S0. In general, the target system sleep (or working) state
-determines the highest power (lowest number) state the device can be put
-into and the kernel is supposed to obtain this information by executing the
-device's _SxD control method (where x is a number between 0 and 4 inclusive).
-If the device is required to wake up the system from the target sleep state, the
-lowest power (highest number) state it can be put into is also determined by the
-target state of the system. The kernel is then supposed to use the device's
-_SxW control method to obtain the number of that state. It also is supposed to
-use the device's _PRW control method to learn which power resources need to be
-enabled for the device to be able to generate wakeup signals.
-
-1.4. Wakeup Signaling
----------------------
-Wakeup signals generated by PCI devices, either as native PCI PMEs, or as
-a result of the execution of the _DSW (or _PSW) ACPI control method before
-putting the device into a low-power state, have to be caught and handled as
-appropriate. If they are sent while the system is in the working state
-(ACPI S0), they should be translated into interrupts so that the kernel can
-put the devices generating them into the full-power state and take care of the
-events that triggered them. In turn, if they are sent while the system is
-sleeping, they should cause the system's core logic to trigger wakeup.
-
-On ACPI-based systems wakeup signals sent by conventional PCI devices are
-converted into ACPI General-Purpose Events (GPEs) which are hardware signals
-from the system core logic generated in response to various events that need to
-be acted upon. Every GPE is associated with one or more sources of potentially
-interesting events. In particular, a GPE may be associated with a PCI device
-capable of signaling wakeup. The information on the connections between GPEs
-and event sources is recorded in the system's ACPI BIOS from where it can be
-read by the kernel.
-
-If a PCI device known to the system's ACPI BIOS signals wakeup, the GPE
-associated with it (if there is one) is triggered. The GPEs associated with PCI
-bridges may also be triggered in response to a wakeup signal from one of the
-devices below the bridge (this also is the case for root bridges) and, for
-example, native PCI PMEs from devices unknown to the system's ACPI BIOS may be
-handled this way.
-
-A GPE may be triggered when the system is sleeping (i.e. when it is in one of
-the ACPI S1-S4 states), in which case system wakeup is started by its core logic
-(the device that was the source of the signal causing the system wakeup to occur
-may be identified later). The GPEs used in such situations are referred to as
-wakeup GPEs.
-
-Usually, however, GPEs are also triggered when the system is in the working
-state (ACPI S0) and in that case the system's core logic generates a System
-Control Interrupt (SCI) to notify the kernel of the event. Then, the SCI
-handler identifies the GPE that caused the interrupt to be generated which,
-in turn, allows the kernel to identify the source of the event (that may be
-a PCI device signaling wakeup). The GPEs used for notifying the kernel of
-events occurring while the system is in the working state are referred to as
-runtime GPEs.
-
-Unfortunately, there is no standard way of handling wakeup signals sent by
-conventional PCI devices on systems that are not ACPI-based, but there is one
-for PCI Express devices. Namely, the PCI Express Base Specification introduced
-a native mechanism for converting native PCI PMEs into interrupts generated by
-root ports. For conventional PCI devices native PMEs are out-of-band, so they
-are routed separately and they need not pass through bridges (in principle they
-may be routed directly to the system's core logic), but for PCI Express devices
-they are in-band messages that have to pass through the PCI Express hierarchy,
-including the root port on the path from the device to the Root Complex. Thus
-it was possible to introduce a mechanism by which a root port generates an
-interrupt whenever it receives a PME message from one of the devices below it.
-The PCI Express Requester ID of the device that sent the PME message is then
-recorded in one of the root port's configuration registers from where it may be
-read by the interrupt handler allowing the device to be identified. [PME
-messages sent by PCI Express endpoints integrated with the Root Complex don't
-pass through root ports, but instead they cause a Root Complex Event Collector
-(if there is one) to generate interrupts.]
-
-In principle the native PCI Express PME signaling may also be used on ACPI-based
-systems along with the GPEs, but to use it the kernel has to ask the system's
-ACPI BIOS to release control of root port configuration registers. The ACPI
-BIOS, however, is not required to allow the kernel to control these registers
-and if it doesn't do that, the kernel must not modify their contents. Of course
-the native PCI Express PME signaling cannot be used by the kernel in that case.
-
-
-2. PCI Subsystem and Device Power Management
-============================================
-
-2.1. Device Power Management Callbacks
---------------------------------------
-The PCI Subsystem participates in the power management of PCI devices in a
-number of ways. First of all, it provides an intermediate code layer between
-the device power management core (PM core) and PCI device drivers.
-Specifically, the pm field of the PCI subsystem's struct bus_type object,
-pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing
-pointers to several device power management callbacks:
-
-const struct dev_pm_ops pci_dev_pm_ops = {
- .prepare = pci_pm_prepare,
- .complete = pci_pm_complete,
- .suspend = pci_pm_suspend,
- .resume = pci_pm_resume,
- .freeze = pci_pm_freeze,
- .thaw = pci_pm_thaw,
- .poweroff = pci_pm_poweroff,
- .restore = pci_pm_restore,
- .suspend_noirq = pci_pm_suspend_noirq,
- .resume_noirq = pci_pm_resume_noirq,
- .freeze_noirq = pci_pm_freeze_noirq,
- .thaw_noirq = pci_pm_thaw_noirq,
- .poweroff_noirq = pci_pm_poweroff_noirq,
- .restore_noirq = pci_pm_restore_noirq,
- .runtime_suspend = pci_pm_runtime_suspend,
- .runtime_resume = pci_pm_runtime_resume,
- .runtime_idle = pci_pm_runtime_idle,
-};
-
-These callbacks are executed by the PM core in various situations related to
-device power management and they, in turn, execute power management callbacks
-provided by PCI device drivers. They also perform power management operations
-involving some standard configuration registers of PCI devices that device
-drivers need not know or care about.
-
-The structure representing a PCI device, struct pci_dev, contains several fields
-that these callbacks operate on:
-
-struct pci_dev {
- ...
- pci_power_t current_state; /* Current operating state. */
- int pm_cap; /* PM capability offset in the
- configuration space */
- unsigned int pme_support:5; /* Bitmask of states from which PME#
- can be generated */
- unsigned int pme_interrupt:1;/* Is native PCIe PME signaling used? */
- unsigned int d1_support:1; /* Low power state D1 is supported */
- unsigned int d2_support:1; /* Low power state D2 is supported */
- unsigned int no_d1d2:1; /* D1 and D2 are forbidden */
- unsigned int wakeup_prepared:1; /* Device prepared for wake up */
- unsigned int d3_delay; /* D3->D0 transition time in ms */
- ...
-};
-
-They also indirectly use some fields of the struct device that is embedded in
-struct pci_dev.
-
-2.2. Device Initialization
---------------------------
-The PCI subsystem's first task related to device power management is to
-prepare the device for power management and initialize the fields of struct
-pci_dev used for this purpose. This happens in two functions defined in
-drivers/pci/pci.c, pci_pm_init() and platform_pci_wakeup_init().
-
-The first of these functions checks if the device supports native PCI PM
-and if that's the case the offset of its power management capability structure
-in the configuration space is stored in the pm_cap field of the device's struct
-pci_dev object. Next, the function checks which PCI low-power states are
-supported by the device and from which low-power states the device can generate
-native PCI PMEs. The power management fields of the device's struct pci_dev and
-the struct device embedded in it are updated accordingly and the generation of
-PMEs by the device is disabled.
-
-The second function checks if the device can be prepared to signal wakeup with
-the help of the platform firmware, such as the ACPI BIOS. If that is the case,
-the function updates the wakeup fields in struct device embedded in the
-device's struct pci_dev and uses the firmware-provided method to prevent the
-device from signaling wakeup.
-
-At this point the device is ready for power management. For driverless devices,
-however, this functionality is limited to a few basic operations carried out
-during system-wide transitions to a sleep state and back to the working state.
-
-2.3. Runtime Device Power Management
-------------------------------------
-The PCI subsystem plays a vital role in the runtime power management of PCI
-devices. For this purpose it uses the general runtime power management
-(runtime PM) framework described in Documentation/power/runtime_pm.txt.
-Namely, it provides subsystem-level callbacks:
-
- pci_pm_runtime_suspend()
- pci_pm_runtime_resume()
- pci_pm_runtime_idle()
-
-that are executed by the core runtime PM routines. It also implements the
-entire mechanics necessary for handling runtime wakeup signals from PCI devices
-in low-power states, which at the time of this writing works for both the native
-PCI Express PME signaling and the ACPI GPE-based wakeup signaling described in
-Section 1.
-
-First, a PCI device is put into a low-power state, or suspended, with the help
-of pm_schedule_suspend() or pm_runtime_suspend() which for PCI devices call
-pci_pm_runtime_suspend() to do the actual job. For this to work, the device's
-driver has to provide a pm->runtime_suspend() callback (see below), which is
-run by pci_pm_runtime_suspend() as the first action. If the driver's callback
-returns successfully, the device's standard configuration registers are saved,
-the device is prepared to generate wakeup signals and, finally, it is put into
-the target low-power state.
-
-The low-power state to put the device into is the lowest-power (highest number)
-state from which it can signal wakeup. The exact method of signaling wakeup is
-system-dependent and is determined by the PCI subsystem on the basis of the
-reported capabilities of the device and the platform firmware. To prepare the
-device for signaling wakeup and put it into the selected low-power state, the
-PCI subsystem can use the platform firmware as well as the device's native PCI
-PM capabilities, if supported.
-
-It is expected that the device driver's pm->runtime_suspend() callback will
-not attempt to prepare the device for signaling wakeup or to put it into a
-low-power state. The driver ought to leave these tasks to the PCI subsystem
-that has all of the information necessary to perform them.
-
-A suspended device is brought back into the "active" state, or resumed,
-with the help of pm_request_resume() or pm_runtime_resume() which both call
-pci_pm_runtime_resume() for PCI devices. Again, this only works if the device's
-driver provides a pm->runtime_resume() callback (see below). However, before
-the driver's callback is executed, pci_pm_runtime_resume() brings the device
-back into the full-power state, prevents it from signaling wakeup while in that
-state and restores its standard configuration registers. Thus the driver's
-callback need not worry about the PCI-specific aspects of the device resume.
-
-Note that generally pci_pm_runtime_resume() may be called in two different
-situations. First, it may be called at the request of the device's driver, for
-example if there are some data for it to process. Second, it may be called
-as a result of a wakeup signal from the device itself (this sometimes is
-referred to as "remote wakeup"). Of course, for this purpose the wakeup signal
-is handled in one of the ways described in Section 1 and finally converted into
-a notification for the PCI subsystem after the source device has been
-identified.
-
-The pci_pm_runtime_idle() function, called for PCI devices by pm_runtime_idle()
-and pm_request_idle(), executes the device driver's pm->runtime_idle()
-callback, if defined, and if that callback doesn't return error code (or is not
-present at all), suspends the device with the help of pm_runtime_suspend().
-Sometimes pci_pm_runtime_idle() is called automatically by the PM core (for
-example, it is called right after the device has just been resumed), in which
-cases it is expected to suspend the device if that makes sense. Usually,
-however, the PCI subsystem doesn't really know if the device really can be
-suspended, so it lets the device's driver decide by running its
-pm->runtime_idle() callback.
-
-2.4. System-Wide Power Transitions
-----------------------------------
-There are a few different types of system-wide power transitions, described in
-Documentation/power/devices.txt. Each of them requires devices to be handled
-in a specific way and the PM core executes subsystem-level power management
-callbacks for this purpose. They are executed in phases such that each phase
-involves executing the same subsystem-level callback for every device belonging
-to the given subsystem before the next phase begins. These phases always run
-after tasks have been frozen.
-
-2.4.1. System Suspend
-
-When the system is going into a sleep state in which the contents of memory will
-be preserved, such as one of the ACPI sleep states S1-S3, the phases are:
-
- prepare, suspend, suspend_noirq.
-
-The following PCI bus type's callbacks, respectively, are used in these phases:
-
- pci_pm_prepare()
- pci_pm_suspend()
- pci_pm_suspend_noirq()
-
-The pci_pm_prepare() routine first puts the device into the "fully functional"
-state with the help of pm_runtime_resume(). Then, it executes the device
-driver's pm->prepare() callback if defined (i.e. if the driver's struct
-dev_pm_ops object is present and the prepare pointer in that object is valid).
-
-The pci_pm_suspend() routine first checks if the device's driver implements
-legacy PCI suspend routines (see Section 3), in which case the driver's legacy
-suspend callback is executed, if present, and its result is returned. Next, if
-the device's driver doesn't provide a struct dev_pm_ops object (containing
-pointers to the driver's callbacks), pci_pm_default_suspend() is called, which
-simply turns off the device's bus master capability and runs
-pcibios_disable_device() to disable it, unless the device is a bridge (PCI
-bridges are ignored by this routine). Next, the device driver's pm->suspend()
-callback is executed, if defined, and its result is returned if it fails.
-Finally, pci_fixup_device() is called to apply hardware suspend quirks related
-to the device if necessary.
-
-Note that the suspend phase is carried out asynchronously for PCI devices, so
-the pci_pm_suspend() callback may be executed in parallel for any pair of PCI
-devices that don't depend on each other in a known way (i.e. none of the paths
-in the device tree from the root bridge to a leaf device contains both of them).
-
-The pci_pm_suspend_noirq() routine is executed after suspend_device_irqs() has
-been called, which means that the device driver's interrupt handler won't be
-invoked while this routine is running. It first checks if the device's driver
-implements legacy PCI suspends routines (Section 3), in which case the legacy
-late suspend routine is called and its result is returned (the standard
-configuration registers of the device are saved if the driver's callback hasn't
-done that). Second, if the device driver's struct dev_pm_ops object is not
-present, the device's standard configuration registers are saved and the routine
-returns success. Otherwise the device driver's pm->suspend_noirq() callback is
-executed, if present, and its result is returned if it fails. Next, if the
-device's standard configuration registers haven't been saved yet (one of the
-device driver's callbacks executed before might do that), pci_pm_suspend_noirq()
-saves them, prepares the device to signal wakeup (if necessary) and puts it into
-a low-power state.
-
-The low-power state to put the device into is the lowest-power (highest number)
-state from which it can signal wakeup while the system is in the target sleep
-state. Just like in the runtime PM case described above, the mechanism of
-signaling wakeup is system-dependent and determined by the PCI subsystem, which
-is also responsible for preparing the device to signal wakeup from the system's
-target sleep state as appropriate.
-
-PCI device drivers (that don't implement legacy power management callbacks) are
-generally not expected to prepare devices for signaling wakeup or to put them
-into low-power states. However, if one of the driver's suspend callbacks
-(pm->suspend() or pm->suspend_noirq()) saves the device's standard configuration
-registers, pci_pm_suspend_noirq() will assume that the device has been prepared
-to signal wakeup and put into a low-power state by the driver (the driver is
-then assumed to have used the helper functions provided by the PCI subsystem for
-this purpose). PCI device drivers are not encouraged to do that, but in some
-rare cases doing that in the driver may be the optimum approach.
-
-2.4.2. System Resume
-
-When the system is undergoing a transition from a sleep state in which the
-contents of memory have been preserved, such as one of the ACPI sleep states
-S1-S3, into the working state (ACPI S0), the phases are:
-
- resume_noirq, resume, complete.
-
-The following PCI bus type's callbacks, respectively, are executed in these
-phases:
-
- pci_pm_resume_noirq()
- pci_pm_resume()
- pci_pm_complete()
-
-The pci_pm_resume_noirq() routine first puts the device into the full-power
-state, restores its standard configuration registers and applies early resume
-hardware quirks related to the device, if necessary. This is done
-unconditionally, regardless of whether or not the device's driver implements
-legacy PCI power management callbacks (this way all PCI devices are in the
-full-power state and their standard configuration registers have been restored
-when their interrupt handlers are invoked for the first time during resume,
-which allows the kernel to avoid problems with the handling of shared interrupts
-by drivers whose devices are still suspended). If legacy PCI power management
-callbacks (see Section 3) are implemented by the device's driver, the legacy
-early resume callback is executed and its result is returned. Otherwise, the
-device driver's pm->resume_noirq() callback is executed, if defined, and its
-result is returned.
-
-The pci_pm_resume() routine first checks if the device's standard configuration
-registers have been restored and restores them if that's not the case (this
-only is necessary in the error path during a failing suspend). Next, resume
-hardware quirks related to the device are applied, if necessary, and if the
-device's driver implements legacy PCI power management callbacks (see
-Section 3), the driver's legacy resume callback is executed and its result is
-returned. Otherwise, the device's wakeup signaling mechanisms are blocked and
-its driver's pm->resume() callback is executed, if defined (the callback's
-result is then returned).
-
-The resume phase is carried out asynchronously for PCI devices, like the
-suspend phase described above, which means that if two PCI devices don't depend
-on each other in a known way, the pci_pm_resume() routine may be executed for
-the both of them in parallel.
-
-The pci_pm_complete() routine only executes the device driver's pm->complete()
-callback, if defined.
-
-2.4.3. System Hibernation
-
-System hibernation is more complicated than system suspend, because it requires
-a system image to be created and written into a persistent storage medium. The
-image is created atomically and all devices are quiesced, or frozen, before that
-happens.
-
-The freezing of devices is carried out after enough memory has been freed (at
-the time of this writing the image creation requires at least 50% of system RAM
-to be free) in the following three phases:
-
- prepare, freeze, freeze_noirq
-
-that correspond to the PCI bus type's callbacks:
-
- pci_pm_prepare()
- pci_pm_freeze()
- pci_pm_freeze_noirq()
-
-This means that the prepare phase is exactly the same as for system suspend.
-The other two phases, however, are different.
-
-The pci_pm_freeze() routine is quite similar to pci_pm_suspend(), but it runs
-the device driver's pm->freeze() callback, if defined, instead of pm->suspend(),
-and it doesn't apply the suspend-related hardware quirks. It is executed
-asynchronously for different PCI devices that don't depend on each other in a
-known way.
-
-The pci_pm_freeze_noirq() routine, in turn, is similar to
-pci_pm_suspend_noirq(), but it calls the device driver's pm->freeze_noirq()
-routine instead of pm->suspend_noirq(). It also doesn't attempt to prepare the
-device for signaling wakeup and put it into a low-power state. Still, it saves
-the device's standard configuration registers if they haven't been saved by one
-of the driver's callbacks.
-
-Once the image has been created, it has to be saved. However, at this point all
-devices are frozen and they cannot handle I/O, while their ability to handle
-I/O is obviously necessary for the image saving. Thus they have to be brought
-back to the fully functional state and this is done in the following phases:
-
- thaw_noirq, thaw, complete
-
-using the following PCI bus type's callbacks:
-
- pci_pm_thaw_noirq()
- pci_pm_thaw()
- pci_pm_complete()
-
-respectively.
-
-The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq(),
-but it doesn't put the device into the full power state and doesn't attempt to
-restore its standard configuration registers. It also executes the device
-driver's pm->thaw_noirq() callback, if defined, instead of pm->resume_noirq().
-
-The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the device
-driver's pm->thaw() callback instead of pm->resume(). It is executed
-asynchronously for different PCI devices that don't depend on each other in a
-known way.
-
-The complete phase it the same as for system resume.
-
-After saving the image, devices need to be powered down before the system can
-enter the target sleep state (ACPI S4 for ACPI-based systems). This is done in
-three phases:
-
- prepare, poweroff, poweroff_noirq
-
-where the prepare phase is exactly the same as for system suspend. The other
-two phases are analogous to the suspend and suspend_noirq phases, respectively.
-The PCI subsystem-level callbacks they correspond to
-
- pci_pm_poweroff()
- pci_pm_poweroff_noirq()
-
-work in analogy with pci_pm_suspend() and pci_pm_poweroff_noirq(), respectively,
-although they don't attempt to save the device's standard configuration
-registers.
-
-2.4.4. System Restore
-
-System restore requires a hibernation image to be loaded into memory and the
-pre-hibernation memory contents to be restored before the pre-hibernation system
-activity can be resumed.
-
-As described in Documentation/power/devices.txt, the hibernation image is loaded
-into memory by a fresh instance of the kernel, called the boot kernel, which in
-turn is loaded and run by a boot loader in the usual way. After the boot kernel
-has loaded the image, it needs to replace its own code and data with the code
-and data of the "hibernated" kernel stored within the image, called the image
-kernel. For this purpose all devices are frozen just like before creating
-the image during hibernation, in the
-
- prepare, freeze, freeze_noirq
-
-phases described above. However, the devices affected by these phases are only
-those having drivers in the boot kernel; other devices will still be in whatever
-state the boot loader left them.
-
-Should the restoration of the pre-hibernation memory contents fail, the boot
-kernel would go through the "thawing" procedure described above, using the
-thaw_noirq, thaw, and complete phases (that will only affect the devices having
-drivers in the boot kernel), and then continue running normally.
-
-If the pre-hibernation memory contents are restored successfully, which is the
-usual situation, control is passed to the image kernel, which then becomes
-responsible for bringing the system back to the working state. To achieve this,
-it must restore the devices' pre-hibernation functionality, which is done much
-like waking up from the memory sleep state, although it involves different
-phases:
-
- restore_noirq, restore, complete
-
-The first two of these are analogous to the resume_noirq and resume phases
-described above, respectively, and correspond to the following PCI subsystem
-callbacks:
-
- pci_pm_restore_noirq()
- pci_pm_restore()
-
-These callbacks work in analogy with pci_pm_resume_noirq() and pci_pm_resume(),
-respectively, but they execute the device driver's pm->restore_noirq() and
-pm->restore() callbacks, if available.
-
-The complete phase is carried out in exactly the same way as during system
-resume.
-
-
-3. PCI Device Drivers and Power Management
-==========================================
-
-3.1. Power Management Callbacks
--------------------------------
-PCI device drivers participate in power management by providing callbacks to be
-executed by the PCI subsystem's power management routines described above and by
-controlling the runtime power management of their devices.
-
-At the time of this writing there are two ways to define power management
-callbacks for a PCI device driver, the recommended one, based on using a
-dev_pm_ops structure described in Documentation/power/devices.txt, and the
-"legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
-.resume() callbacks from struct pci_driver are used. The legacy approach,
-however, doesn't allow one to define runtime power management callbacks and is
-not really suitable for any new drivers. Therefore it is not covered by this
-document (refer to the source code to learn more about it).
-
-It is recommended that all PCI device drivers define a struct dev_pm_ops object
-containing pointers to power management (PM) callbacks that will be executed by
-the PCI subsystem's PM routines in various circumstances. A pointer to the
-driver's struct dev_pm_ops object has to be assigned to the driver.pm field in
-its struct pci_driver object. Once that has happened, the "legacy" PM callbacks
-in struct pci_driver are ignored (even if they are not NULL).
-
-The PM callbacks in struct dev_pm_ops are not mandatory and if they are not
-defined (i.e. the respective fields of struct dev_pm_ops are unset) the PCI
-subsystem will handle the device in a simplified default manner. If they are
-defined, though, they are expected to behave as described in the following
-subsections.
-
-3.1.1. prepare()
-
-The prepare() callback is executed during system suspend, during hibernation
-(when a hibernation image is about to be created), during power-off after
-saving a hibernation image and during system restore, when a hibernation image
-has just been loaded into memory.
-
-This callback is only necessary if the driver's device has children that in
-general may be registered at any time. In that case the role of the prepare()
-callback is to prevent new children of the device from being registered until
-one of the resume_noirq(), thaw_noirq(), or restore_noirq() callbacks is run.
-
-In addition to that the prepare() callback may carry out some operations
-preparing the device to be suspended, although it should not allocate memory
-(if additional memory is required to suspend the device, it has to be
-preallocated earlier, for example in a suspend/hibernate notifier as described
-in Documentation/power/notifiers.txt).
-
-3.1.2. suspend()
-
-The suspend() callback is only executed during system suspend, after prepare()
-callbacks have been executed for all devices in the system.
-
-This callback is expected to quiesce the device and prepare it to be put into a
-low-power state by the PCI subsystem. It is not required (in fact it even is
-not recommended) that a PCI driver's suspend() callback save the standard
-configuration registers of the device, prepare it for waking up the system, or
-put it into a low-power state. All of these operations can very well be taken
-care of by the PCI subsystem, without the driver's participation.
-
-However, in some rare case it is convenient to carry out these operations in
-a PCI driver. Then, pci_save_state(), pci_prepare_to_sleep(), and
-pci_set_power_state() should be used to save the device's standard configuration
-registers, to prepare it for system wakeup (if necessary), and to put it into a
-low-power state, respectively. Moreover, if the driver calls pci_save_state(),
-the PCI subsystem will not execute either pci_prepare_to_sleep(), or
-pci_set_power_state() for its device, so the driver is then responsible for
-handling the device as appropriate.
-
-While the suspend() callback is being executed, the driver's interrupt handler
-can be invoked to handle an interrupt from the device, so all suspend-related
-operations relying on the driver's ability to handle interrupts should be
-carried out in this callback.
-
-3.1.3. suspend_noirq()
-
-The suspend_noirq() callback is only executed during system suspend, after
-suspend() callbacks have been executed for all devices in the system and
-after device interrupts have been disabled by the PM core.
-
-The difference between suspend_noirq() and suspend() is that the driver's
-interrupt handler will not be invoked while suspend_noirq() is running. Thus
-suspend_noirq() can carry out operations that would cause race conditions to
-arise if they were performed in suspend().
-
-3.1.4. freeze()
-
-The freeze() callback is hibernation-specific and is executed in two situations,
-during hibernation, after prepare() callbacks have been executed for all devices
-in preparation for the creation of a system image, and during restore,
-after a system image has been loaded into memory from persistent storage and the
-prepare() callbacks have been executed for all devices.
-
-The role of this callback is analogous to the role of the suspend() callback
-described above. In fact, they only need to be different in the rare cases when
-the driver takes the responsibility for putting the device into a low-power
-state.
-
-In that cases the freeze() callback should not prepare the device system wakeup
-or put it into a low-power state. Still, either it or freeze_noirq() should
-save the device's standard configuration registers using pci_save_state().
-
-3.1.5. freeze_noirq()
-
-The freeze_noirq() callback is hibernation-specific. It is executed during
-hibernation, after prepare() and freeze() callbacks have been executed for all
-devices in preparation for the creation of a system image, and during restore,
-after a system image has been loaded into memory and after prepare() and
-freeze() callbacks have been executed for all devices. It is always executed
-after device interrupts have been disabled by the PM core.
-
-The role of this callback is analogous to the role of the suspend_noirq()
-callback described above and it very rarely is necessary to define
-freeze_noirq().
-
-The difference between freeze_noirq() and freeze() is analogous to the
-difference between suspend_noirq() and suspend().
-
-3.1.6. poweroff()
-
-The poweroff() callback is hibernation-specific. It is executed when the system
-is about to be powered off after saving a hibernation image to a persistent
-storage. prepare() callbacks are executed for all devices before poweroff() is
-called.
-
-The role of this callback is analogous to the role of the suspend() and freeze()
-callbacks described above, although it does not need to save the contents of
-the device's registers. In particular, if the driver wants to put the device
-into a low-power state itself instead of allowing the PCI subsystem to do that,
-the poweroff() callback should use pci_prepare_to_sleep() and
-pci_set_power_state() to prepare the device for system wakeup and to put it
-into a low-power state, respectively, but it need not save the device's standard
-configuration registers.
-
-3.1.7. poweroff_noirq()
-
-The poweroff_noirq() callback is hibernation-specific. It is executed after
-poweroff() callbacks have been executed for all devices in the system.
-
-The role of this callback is analogous to the role of the suspend_noirq() and
-freeze_noirq() callbacks described above, but it does not need to save the
-contents of the device's registers.
-
-The difference between poweroff_noirq() and poweroff() is analogous to the
-difference between suspend_noirq() and suspend().
-
-3.1.8. resume_noirq()
-
-The resume_noirq() callback is only executed during system resume, after the
-PM core has enabled the non-boot CPUs. The driver's interrupt handler will not
-be invoked while resume_noirq() is running, so this callback can carry out
-operations that might race with the interrupt handler.
-
-Since the PCI subsystem unconditionally puts all devices into the full power
-state in the resume_noirq phase of system resume and restores their standard
-configuration registers, resume_noirq() is usually not necessary. In general
-it should only be used for performing operations that would lead to race
-conditions if carried out by resume().
-
-3.1.9. resume()
-
-The resume() callback is only executed during system resume, after
-resume_noirq() callbacks have been executed for all devices in the system and
-device interrupts have been enabled by the PM core.
-
-This callback is responsible for restoring the pre-suspend configuration of the
-device and bringing it back to the fully functional state. The device should be
-able to process I/O in a usual way after resume() has returned.
-
-3.1.10. thaw_noirq()
-
-The thaw_noirq() callback is hibernation-specific. It is executed after a
-system image has been created and the non-boot CPUs have been enabled by the PM
-core, in the thaw_noirq phase of hibernation. It also may be executed if the
-loading of a hibernation image fails during system restore (it is then executed
-after enabling the non-boot CPUs). The driver's interrupt handler will not be
-invoked while thaw_noirq() is running.
-
-The role of this callback is analogous to the role of resume_noirq(). The
-difference between these two callbacks is that thaw_noirq() is executed after
-freeze() and freeze_noirq(), so in general it does not need to modify the
-contents of the device's registers.
-
-3.1.11. thaw()
-
-The thaw() callback is hibernation-specific. It is executed after thaw_noirq()
-callbacks have been executed for all devices in the system and after device
-interrupts have been enabled by the PM core.
-
-This callback is responsible for restoring the pre-freeze configuration of
-the device, so that it will work in a usual way after thaw() has returned.
-
-3.1.12. restore_noirq()
-
-The restore_noirq() callback is hibernation-specific. It is executed in the
-restore_noirq phase of hibernation, when the boot kernel has passed control to
-the image kernel and the non-boot CPUs have been enabled by the image kernel's
-PM core.
-
-This callback is analogous to resume_noirq() with the exception that it cannot
-make any assumption on the previous state of the device, even if the BIOS (or
-generally the platform firmware) is known to preserve that state over a
-suspend-resume cycle.
-
-For the vast majority of PCI device drivers there is no difference between
-resume_noirq() and restore_noirq().
-
-3.1.13. restore()
-
-The restore() callback is hibernation-specific. It is executed after
-restore_noirq() callbacks have been executed for all devices in the system and
-after the PM core has enabled device drivers' interrupt handlers to be invoked.
-
-This callback is analogous to resume(), just like restore_noirq() is analogous
-to resume_noirq(). Consequently, the difference between restore_noirq() and
-restore() is analogous to the difference between resume_noirq() and resume().
-
-For the vast majority of PCI device drivers there is no difference between
-resume() and restore().
-
-3.1.14. complete()
-
-The complete() callback is executed in the following situations:
- - during system resume, after resume() callbacks have been executed for all
- devices,
- - during hibernation, before saving the system image, after thaw() callbacks
- have been executed for all devices,
- - during system restore, when the system is going back to its pre-hibernation
- state, after restore() callbacks have been executed for all devices.
-It also may be executed if the loading of a hibernation image into memory fails
-(in that case it is run after thaw() callbacks have been executed for all
-devices that have drivers in the boot kernel).
-
-This callback is entirely optional, although it may be necessary if the
-prepare() callback performs operations that need to be reversed.
-
-3.1.15. runtime_suspend()
-
-The runtime_suspend() callback is specific to device runtime power management
-(runtime PM). It is executed by the PM core's runtime PM framework when the
-device is about to be suspended (i.e. quiesced and put into a low-power state)
-at run time.
-
-This callback is responsible for freezing the device and preparing it to be
-put into a low-power state, but it must allow the PCI subsystem to perform all
-of the PCI-specific actions necessary for suspending the device.
-
-3.1.16. runtime_resume()
-
-The runtime_resume() callback is specific to device runtime PM. It is executed
-by the PM core's runtime PM framework when the device is about to be resumed
-(i.e. put into the full-power state and programmed to process I/O normally) at
-run time.
-
-This callback is responsible for restoring the normal functionality of the
-device after it has been put into the full-power state by the PCI subsystem.
-The device is expected to be able to process I/O in the usual way after
-runtime_resume() has returned.
-
-3.1.17. runtime_idle()
-
-The runtime_idle() callback is specific to device runtime PM. It is executed
-by the PM core's runtime PM framework whenever it may be desirable to suspend
-the device according to the PM core's information. In particular, it is
-automatically executed right after runtime_resume() has returned in case the
-resume of the device has happened as a result of a spurious event.
-
-This callback is optional, but if it is not implemented or if it returns 0, the
-PCI subsystem will call pm_runtime_suspend() for the device, which in turn will
-cause the driver's runtime_suspend() callback to be executed.
-
-3.1.18. Pointing Multiple Callback Pointers to One Routine
-
-Although in principle each of the callbacks described in the previous
-subsections can be defined as a separate function, it often is convenient to
-point two or more members of struct dev_pm_ops to the same routine. There are
-a few convenience macros that can be used for this purpose.
-
-The SIMPLE_DEV_PM_OPS macro declares a struct dev_pm_ops object with one
-suspend routine pointed to by the .suspend(), .freeze(), and .poweroff()
-members and one resume routine pointed to by the .resume(), .thaw(), and
-.restore() members. The other function pointers in this struct dev_pm_ops are
-unset.
-
-The UNIVERSAL_DEV_PM_OPS macro is similar to SIMPLE_DEV_PM_OPS, but it
-additionally sets the .runtime_resume() pointer to the same value as
-.resume() (and .thaw(), and .restore()) and the .runtime_suspend() pointer to
-the same value as .suspend() (and .freeze() and .poweroff()).
-
-The SET_SYSTEM_SLEEP_PM_OPS can be used inside of a declaration of struct
-dev_pm_ops to indicate that one suspend routine is to be pointed to by the
-.suspend(), .freeze(), and .poweroff() members and one resume routine is to
-be pointed to by the .resume(), .thaw(), and .restore() members.
-
-3.2. Device Runtime Power Management
-------------------------------------
-In addition to providing device power management callbacks PCI device drivers
-are responsible for controlling the runtime power management (runtime PM) of
-their devices.
-
-The PCI device runtime PM is optional, but it is recommended that PCI device
-drivers implement it at least in the cases where there is a reliable way of
-verifying that the device is not used (like when the network cable is detached
-from an Ethernet adapter or there are no devices attached to a USB controller).
-
-To support the PCI runtime PM the driver first needs to implement the
-runtime_suspend() and runtime_resume() callbacks. It also may need to implement
-the runtime_idle() callback to prevent the device from being suspended again
-every time right after the runtime_resume() callback has returned
-(alternatively, the runtime_suspend() callback will have to check if the
-device should really be suspended and return -EAGAIN if that is not the case).
-
-The runtime PM of PCI devices is disabled by default. It is also blocked by
-pci_pm_init() that runs the pm_runtime_forbid() helper function. If a PCI
-driver implements the runtime PM callbacks and intends to use the runtime PM
-framework provided by the PM core and the PCI subsystem, it should enable this
-feature by executing the pm_runtime_enable() helper function. However, the
-driver should not call the pm_runtime_allow() helper function unblocking
-the runtime PM of the device. Instead, it should allow user space or some
-platform-specific code to do that (user space can do it via sysfs), although
-once it has called pm_runtime_enable(), it must be prepared to handle the
-runtime PM of the device correctly as soon as pm_runtime_allow() is called
-(which may happen at any time). [It also is possible that user space causes
-pm_runtime_allow() to be called via sysfs before the driver is loaded, so in
-fact the driver has to be prepared to handle the runtime PM of the device as
-soon as it calls pm_runtime_enable().]
-
-The runtime PM framework works by processing requests to suspend or resume
-devices, or to check if they are idle (in which cases it is reasonable to
-subsequently request that they be suspended). These requests are represented
-by work items put into the power management workqueue, pm_wq. Although there
-are a few situations in which power management requests are automatically
-queued by the PM core (for example, after processing a request to resume a
-device the PM core automatically queues a request to check if the device is
-idle), device drivers are generally responsible for queuing power management
-requests for their devices. For this purpose they should use the runtime PM
-helper functions provided by the PM core, discussed in
-Documentation/power/runtime_pm.txt.
-
-Devices can also be suspended and resumed synchronously, without placing a
-request into pm_wq. In the majority of cases this also is done by their
-drivers that use helper functions provided by the PM core for this purpose.
-
-For more information on the runtime PM of devices refer to
-Documentation/power/runtime_pm.txt.
-
-
-4. Resources
-============
-
-PCI Local Bus Specification, Rev. 3.0
-PCI Bus Power Management Interface Specification, Rev. 1.2
-Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b
-PCI Express Base Specification, Rev. 2.0
-Documentation/power/devices.txt
-Documentation/power/runtime_pm.txt
diff --git a/Documentation/power/pm_qos_interface.txt b/Documentation/power/pm_qos_interface.txt
deleted file mode 100644
index 17e130a8034..00000000000
--- a/Documentation/power/pm_qos_interface.txt
+++ /dev/null
@@ -1,148 +0,0 @@
-PM Quality Of Service Interface.
-
-This interface provides a kernel and user mode interface for registering
-performance expectations by drivers, subsystems and user space applications on
-one of the parameters.
-
-Two different PM QoS frameworks are available:
-1. PM QoS classes for cpu_dma_latency, network_latency, network_throughput.
-2. the per-device PM QoS framework provides the API to manage the per-device latency
-constraints.
-
-Each parameters have defined units:
- * latency: usec
- * timeout: usec
- * throughput: kbs (kilo bit / sec)
-
-
-1. PM QoS framework
-
-The infrastructure exposes multiple misc device nodes one per implemented
-parameter. The set of parameters implement is defined by pm_qos_power_init()
-and pm_qos_params.h. This is done because having the available parameters
-being runtime configurable or changeable from a driver was seen as too easy to
-abuse.
-
-For each parameter a list of performance requests is maintained along with
-an aggregated target value. The aggregated target value is updated with
-changes to the request list or elements of the list. Typically the
-aggregated target value is simply the max or min of the request values held
-in the parameter list elements.
-Note: the aggregated target value is implemented as an atomic variable so that
-reading the aggregated value does not require any locking mechanism.
-
-
-From kernel mode the use of this interface is simple:
-
-void pm_qos_add_request(handle, param_class, target_value):
-Will insert an element into the list for that identified PM QoS class with the
-target value. Upon change to this list the new target is recomputed and any
-registered notifiers are called only if the target value is now different.
-Clients of pm_qos need to save the returned handle for future use in other
-pm_qos API functions.
-
-void pm_qos_update_request(handle, new_target_value):
-Will update the list element pointed to by the handle with the new target value
-and recompute the new aggregated target, calling the notification tree if the
-target is changed.
-
-void pm_qos_remove_request(handle):
-Will remove the element. After removal it will update the aggregate target and
-call the notification tree if the target was changed as a result of removing
-the request.
-
-int pm_qos_request(param_class):
-Returns the aggregated value for a given PM QoS class.
-
-int pm_qos_request_active(handle):
-Returns if the request is still active, i.e. it has not been removed from a
-PM QoS class constraints list.
-
-int pm_qos_add_notifier(param_class, notifier):
-Adds a notification callback function to the PM QoS class. The callback is
-called when the aggregated value for the PM QoS class is changed.
-
-int pm_qos_remove_notifier(int param_class, notifier):
-Removes the notification callback function for the PM QoS class.
-
-
-From user mode:
-Only processes can register a pm_qos request. To provide for automatic
-cleanup of a process, the interface requires the process to register its
-parameter requests in the following way:
-
-To register the default pm_qos target for the specific parameter, the process
-must open one of /dev/[cpu_dma_latency, network_latency, network_throughput]
-
-As long as the device node is held open that process has a registered
-request on the parameter.
-
-To change the requested target value the process needs to write an s32 value to
-the open device node. Alternatively the user mode program could write a hex
-string for the value using 10 char long format e.g. "0x12345678". This
-translates to a pm_qos_update_request call.
-
-To remove the user mode request for a target value simply close the device
-node.
-
-
-2. PM QoS per-device latency framework
-
-For each device a list of performance requests is maintained along with
-an aggregated target value. The aggregated target value is updated with
-changes to the request list or elements of the list. Typically the
-aggregated target value is simply the max or min of the request values held
-in the parameter list elements.
-Note: the aggregated target value is implemented as an atomic variable so that
-reading the aggregated value does not require any locking mechanism.
-
-
-From kernel mode the use of this interface is the following:
-
-int dev_pm_qos_add_request(device, handle, value):
-Will insert an element into the list for that identified device with the
-target value. Upon change to this list the new target is recomputed and any
-registered notifiers are called only if the target value is now different.
-Clients of dev_pm_qos need to save the handle for future use in other
-dev_pm_qos API functions.
-
-int dev_pm_qos_update_request(handle, new_value):
-Will update the list element pointed to by the handle with the new target value
-and recompute the new aggregated target, calling the notification trees if the
-target is changed.
-
-int dev_pm_qos_remove_request(handle):
-Will remove the element. After removal it will update the aggregate target and
-call the notification trees if the target was changed as a result of removing
-the request.
-
-s32 dev_pm_qos_read_value(device):
-Returns the aggregated value for a given device's constraints list.
-
-
-Notification mechanisms:
-The per-device PM QoS framework has 2 different and distinct notification trees:
-a per-device notification tree and a global notification tree.
-
-int dev_pm_qos_add_notifier(device, notifier):
-Adds a notification callback function for the device.
-The callback is called when the aggregated value of the device constraints list
-is changed.
-
-int dev_pm_qos_remove_notifier(device, notifier):
-Removes the notification callback function for the device.
-
-int dev_pm_qos_add_global_notifier(notifier):
-Adds a notification callback function in the global notification tree of the
-framework.
-The callback is called when the aggregated value for any device is changed.
-
-int dev_pm_qos_remove_global_notifier(notifier):
-Removes the notification callback function from the global notification tree
-of the framework.
-
-
-From user mode:
-No API for user space access to the per-device latency constraints is provided
-yet - still under discussion.
-
diff --git a/Documentation/power/power_supply_class.txt b/Documentation/power/power_supply_class.txt
deleted file mode 100644
index 211831d4095..00000000000
--- a/Documentation/power/power_supply_class.txt
+++ /dev/null
@@ -1,182 +0,0 @@
-Linux power supply class
-========================
-
-Synopsis
-~~~~~~~~
-Power supply class used to represent battery, UPS, AC or DC power supply
-properties to user-space.
-
-It defines core set of attributes, which should be applicable to (almost)
-every power supply out there. Attributes are available via sysfs and uevent
-interfaces.
-
-Each attribute has well defined meaning, up to unit of measure used. While
-the attributes provided are believed to be universally applicable to any
-power supply, specific monitoring hardware may not be able to provide them
-all, so any of them may be skipped.
-
-Power supply class is extensible, and allows to define drivers own attributes.
-The core attribute set is subject to the standard Linux evolution (i.e.
-if it will be found that some attribute is applicable to many power supply
-types or their drivers, it can be added to the core set).
-
-It also integrates with LED framework, for the purpose of providing
-typically expected feedback of battery charging/fully charged status and
-AC/USB power supply online status. (Note that specific details of the
-indication (including whether to use it at all) are fully controllable by
-user and/or specific machine defaults, per design principles of LED
-framework).
-
-
-Attributes/properties
-~~~~~~~~~~~~~~~~~~~~~
-Power supply class has predefined set of attributes, this eliminates code
-duplication across drivers. Power supply class insist on reusing its
-predefined attributes *and* their units.
-
-So, userspace gets predictable set of attributes and their units for any
-kind of power supply, and can process/present them to a user in consistent
-manner. Results for different power supplies and machines are also directly
-comparable.
-
-See drivers/power/ds2760_battery.c and drivers/power/pda_power.c for the
-example how to declare and handle attributes.
-
-
-Units
-~~~~~
-Quoting include/linux/power_supply.h:
-
- All voltages, currents, charges, energies, time and temperatures in µV,
- µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
- stated. It's driver's job to convert its raw values to units in which
- this class operates.
-
-
-Attributes/properties detailed
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~
-~ ~
-~ Because both "charge" (µAh) and "energy" (µWh) represents "capacity" ~
-~ of battery, this class distinguish these terms. Don't mix them! ~
-~ ~
-~ CHARGE_* attributes represents capacity in µAh only. ~
-~ ENERGY_* attributes represents capacity in µWh only. ~
-~ CAPACITY attribute represents capacity in *percents*, from 0 to 100. ~
-~ ~
-~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
-
-Postfixes:
-_AVG - *hardware* averaged value, use it if your hardware is really able to
-report averaged values.
-_NOW - momentary/instantaneous values.
-
-STATUS - this attribute represents operating status (charging, full,
-discharging (i.e. powering a load), etc.). This corresponds to
-BATTERY_STATUS_* values, as defined in battery.h.
-
-CHARGE_TYPE - batteries can typically charge at different rates.
-This defines trickle and fast charges. For batteries that
-are already charged or discharging, 'n/a' can be displayed (or
-'unknown', if the status is not known).
-
-HEALTH - represents health of the battery, values corresponds to
-POWER_SUPPLY_HEALTH_*, defined in battery.h.
-
-VOLTAGE_OCV - open circuit voltage of the battery.
-
-VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and
-minimal power supply voltages. Maximal/minimal means values of voltages
-when battery considered "full"/"empty" at normal conditions. Yes, there is
-no direct relation between voltage and battery capacity, but some dumb
-batteries use voltage for very approximated calculation of capacity.
-Battery driver also can use this attribute just to inform userspace
-about maximal and minimal voltage thresholds of a given battery.
-
-VOLTAGE_MAX, VOLTAGE_MIN - same as _DESIGN voltage values except that
-these ones should be used if hardware could only guess (measure and
-retain) the thresholds of a given power supply.
-
-CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when
-battery considered full/empty.
-
-ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy.
-
-CHARGE_FULL, CHARGE_EMPTY - These attributes means "last remembered value
-of charge when battery became full/empty". It also could mean "value of
-charge when battery considered full/empty at given conditions (temperature,
-age)". I.e. these attributes represents real thresholds, not design values.
-
-CHARGE_COUNTER - the current charge counter (in µAh). This could easily
-be negative; there is no empty or full value. It is only useful for
-relative, time-based measurements.
-
-ENERGY_FULL, ENERGY_EMPTY - same as above but for energy.
-
-CAPACITY - capacity in percents.
-CAPACITY_LEVEL - capacity level. This corresponds to
-POWER_SUPPLY_CAPACITY_LEVEL_*.
-
-TEMP - temperature of the power supply.
-TEMP_AMBIENT - ambient temperature.
-
-TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e.
-while battery powers a load)
-TIME_TO_FULL - seconds left for battery to be considered full (i.e.
-while battery is charging)
-
-
-Battery <-> external power supply interaction
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Often power supplies are acting as supplies and supplicants at the same
-time. Batteries are good example. So, batteries usually care if they're
-externally powered or not.
-
-For that case, power supply class implements notification mechanism for
-batteries.
-
-External power supply (AC) lists supplicants (batteries) names in
-"supplied_to" struct member, and each power_supply_changed() call
-issued by external power supply will notify supplicants via
-external_power_changed callback.
-
-
-QA
-~~
-Q: Where is POWER_SUPPLY_PROP_XYZ attribute?
-A: If you cannot find attribute suitable for your driver needs, feel free
- to add it and send patch along with your driver.
-
- The attributes available currently are the ones currently provided by the
- drivers written.
-
- Good candidates to add in future: model/part#, cycle_time, manufacturer,
- etc.
-
-
-Q: I have some very specific attribute (e.g. battery color), should I add
- this attribute to standard ones?
-A: Most likely, no. Such attribute can be placed in the driver itself, if
- it is useful. Of course, if the attribute in question applicable to
- large set of batteries, provided by many drivers, and/or comes from
- some general battery specification/standard, it may be a candidate to
- be added to the core attribute set.
-
-
-Q: Suppose, my battery monitoring chip/firmware does not provides capacity
- in percents, but provides charge_{now,full,empty}. Should I calculate
- percentage capacity manually, inside the driver, and register CAPACITY
- attribute? The same question about time_to_empty/time_to_full.
-A: Most likely, no. This class is designed to export properties which are
- directly measurable by the specific hardware available.
-
- Inferring not available properties using some heuristics or mathematical
- model is not subject of work for a battery driver. Such functionality
- should be factored out, and in fact, apm_power, the driver to serve
- legacy APM API on top of power supply class, uses a simple heuristic of
- approximating remaining battery capacity based on its charge, current,
- voltage and so on. But full-fledged battery model is likely not subject
- for kernel at all, as it would require floating point calculation to deal
- with things like differential equations and Kalman filters. This is
- better be handled by batteryd/libbattery, yet to be written.
diff --git a/Documentation/power/regulator/consumer.txt b/Documentation/power/regulator/consumer.txt
deleted file mode 100644
index 55c4175d809..00000000000
--- a/Documentation/power/regulator/consumer.txt
+++ /dev/null
@@ -1,182 +0,0 @@
-Regulator Consumer Driver Interface
-===================================
-
-This text describes the regulator interface for consumer device drivers.
-Please see overview.txt for a description of the terms used in this text.
-
-
-1. Consumer Regulator Access (static & dynamic drivers)
-=======================================================
-
-A consumer driver can get access to its supply regulator by calling :-
-
-regulator = regulator_get(dev, "Vcc");
-
-The consumer passes in its struct device pointer and power supply ID. The core
-then finds the correct regulator by consulting a machine specific lookup table.
-If the lookup is successful then this call will return a pointer to the struct
-regulator that supplies this consumer.
-
-To release the regulator the consumer driver should call :-
-
-regulator_put(regulator);
-
-Consumers can be supplied by more than one regulator e.g. codec consumer with
-analog and digital supplies :-
-
-digital = regulator_get(dev, "Vcc"); /* digital core */
-analog = regulator_get(dev, "Avdd"); /* analog */
-
-The regulator access functions regulator_get() and regulator_put() will
-usually be called in your device drivers probe() and remove() respectively.
-
-
-2. Regulator Output Enable & Disable (static & dynamic drivers)
-====================================================================
-
-A consumer can enable its power supply by calling:-
-
-int regulator_enable(regulator);
-
-NOTE: The supply may already be enabled before regulator_enabled() is called.
-This may happen if the consumer shares the regulator or the regulator has been
-previously enabled by bootloader or kernel board initialization code.
-
-A consumer can determine if a regulator is enabled by calling :-
-
-int regulator_is_enabled(regulator);
-
-This will return > zero when the regulator is enabled.
-
-
-A consumer can disable its supply when no longer needed by calling :-
-
-int regulator_disable(regulator);
-
-NOTE: This may not disable the supply if it's shared with other consumers. The
-regulator will only be disabled when the enabled reference count is zero.
-
-Finally, a regulator can be forcefully disabled in the case of an emergency :-
-
-int regulator_force_disable(regulator);
-
-NOTE: this will immediately and forcefully shutdown the regulator output. All
-consumers will be powered off.
-
-
-3. Regulator Voltage Control & Status (dynamic drivers)
-======================================================
-
-Some consumer drivers need to be able to dynamically change their supply
-voltage to match system operating points. e.g. CPUfreq drivers can scale
-voltage along with frequency to save power, SD drivers may need to select the
-correct card voltage, etc.
-
-Consumers can control their supply voltage by calling :-
-
-int regulator_set_voltage(regulator, min_uV, max_uV);
-
-Where min_uV and max_uV are the minimum and maximum acceptable voltages in
-microvolts.
-
-NOTE: this can be called when the regulator is enabled or disabled. If called
-when enabled, then the voltage changes instantly, otherwise the voltage
-configuration changes and the voltage is physically set when the regulator is
-next enabled.
-
-The regulators configured voltage output can be found by calling :-
-
-int regulator_get_voltage(regulator);
-
-NOTE: get_voltage() will return the configured output voltage whether the
-regulator is enabled or disabled and should NOT be used to determine regulator
-output state. However this can be used in conjunction with is_enabled() to
-determine the regulator physical output voltage.
-
-
-4. Regulator Current Limit Control & Status (dynamic drivers)
-===========================================================
-
-Some consumer drivers need to be able to dynamically change their supply
-current limit to match system operating points. e.g. LCD backlight driver can
-change the current limit to vary the backlight brightness, USB drivers may want
-to set the limit to 500mA when supplying power.
-
-Consumers can control their supply current limit by calling :-
-
-int regulator_set_current_limit(regulator, min_uA, max_uA);
-
-Where min_uA and max_uA are the minimum and maximum acceptable current limit in
-microamps.
-
-NOTE: this can be called when the regulator is enabled or disabled. If called
-when enabled, then the current limit changes instantly, otherwise the current
-limit configuration changes and the current limit is physically set when the
-regulator is next enabled.
-
-A regulators current limit can be found by calling :-
-
-int regulator_get_current_limit(regulator);
-
-NOTE: get_current_limit() will return the current limit whether the regulator
-is enabled or disabled and should not be used to determine regulator current
-load.
-
-
-5. Regulator Operating Mode Control & Status (dynamic drivers)
-=============================================================
-
-Some consumers can further save system power by changing the operating mode of
-their supply regulator to be more efficient when the consumers operating state
-changes. e.g. consumer driver is idle and subsequently draws less current
-
-Regulator operating mode can be changed indirectly or directly.
-
-Indirect operating mode control.
---------------------------------
-Consumer drivers can request a change in their supply regulator operating mode
-by calling :-
-
-int regulator_set_optimum_mode(struct regulator *regulator, int load_uA);
-
-This will cause the core to recalculate the total load on the regulator (based
-on all its consumers) and change operating mode (if necessary and permitted)
-to best match the current operating load.
-
-The load_uA value can be determined from the consumers datasheet. e.g.most
-datasheets have tables showing the max current consumed in certain situations.
-
-Most consumers will use indirect operating mode control since they have no
-knowledge of the regulator or whether the regulator is shared with other
-consumers.
-
-Direct operating mode control.
-------------------------------
-Bespoke or tightly coupled drivers may want to directly control regulator
-operating mode depending on their operating point. This can be achieved by
-calling :-
-
-int regulator_set_mode(struct regulator *regulator, unsigned int mode);
-unsigned int regulator_get_mode(struct regulator *regulator);
-
-Direct mode will only be used by consumers that *know* about the regulator and
-are not sharing the regulator with other consumers.
-
-
-6. Regulator Events
-===================
-Regulators can notify consumers of external events. Events could be received by
-consumers under regulator stress or failure conditions.
-
-Consumers can register interest in regulator events by calling :-
-
-int regulator_register_notifier(struct regulator *regulator,
- struct notifier_block *nb);
-
-Consumers can uregister interest by calling :-
-
-int regulator_unregister_notifier(struct regulator *regulator,
- struct notifier_block *nb);
-
-Regulators use the kernel notifier framework to send event to their interested
-consumers.
diff --git a/Documentation/power/regulator/design.txt b/Documentation/power/regulator/design.txt
deleted file mode 100644
index f9b56b72b78..00000000000
--- a/Documentation/power/regulator/design.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-Regulator API design notes
-==========================
-
-This document provides a brief, partially structured, overview of some
-of the design considerations which impact the regulator API design.
-
-Safety
-------
-
- - Errors in regulator configuration can have very serious consequences
- for the system, potentially including lasting hardware damage.
- - It is not possible to automatically determine the power confugration
- of the system - software-equivalent variants of the same chip may
- have different power requirments, and not all components with power
- requirements are visible to software.
-
- => The API should make no changes to the hardware state unless it has
- specific knowledge that these changes are safe to do perform on
- this particular system.
-
-Consumer use cases
-------------------
-
- - The overwhelming majority of devices in a system will have no
- requirement to do any runtime configuration of their power beyond
- being able to turn it on or off.
-
- - Many of the power supplies in the system will be shared between many
- different consumers.
-
- => The consumer API should be structured so that these use cases are
- very easy to handle and so that consumers will work with shared
- supplies without any additional effort.
diff --git a/Documentation/power/regulator/machine.txt b/Documentation/power/regulator/machine.txt
deleted file mode 100644
index ce63af0a8e3..00000000000
--- a/Documentation/power/regulator/machine.txt
+++ /dev/null
@@ -1,100 +0,0 @@
-Regulator Machine Driver Interface
-===================================
-
-The regulator machine driver interface is intended for board/machine specific
-initialisation code to configure the regulator subsystem.
-
-Consider the following machine :-
-
- Regulator-1 -+-> Regulator-2 --> [Consumer A @ 1.8 - 2.0V]
- |
- +-> [Consumer B @ 3.3V]
-
-The drivers for consumers A & B must be mapped to the correct regulator in
-order to control their power supply. This mapping can be achieved in machine
-initialisation code by creating a struct regulator_consumer_supply for
-each regulator.
-
-struct regulator_consumer_supply {
- const char *dev_name; /* consumer dev_name() */
- const char *supply; /* consumer supply - e.g. "vcc" */
-};
-
-e.g. for the machine above
-
-static struct regulator_consumer_supply regulator1_consumers[] = {
-{
- .dev_name = "dev_name(consumer B)",
- .supply = "Vcc",
-},};
-
-static struct regulator_consumer_supply regulator2_consumers[] = {
-{
- .dev = "dev_name(consumer A"),
- .supply = "Vcc",
-},};
-
-This maps Regulator-1 to the 'Vcc' supply for Consumer B and maps Regulator-2
-to the 'Vcc' supply for Consumer A.
-
-Constraints can now be registered by defining a struct regulator_init_data
-for each regulator power domain. This structure also maps the consumers
-to their supply regulator :-
-
-static struct regulator_init_data regulator1_data = {
- .constraints = {
- .name = "Regulator-1",
- .min_uV = 3300000,
- .max_uV = 3300000,
- .valid_modes_mask = REGULATOR_MODE_NORMAL,
- },
- .num_consumer_supplies = ARRAY_SIZE(regulator1_consumers),
- .consumer_supplies = regulator1_consumers,
-};
-
-The name field should be set to something that is usefully descriptive
-for the board for configuration of supplies for other regulators and
-for use in logging and other diagnostic output. Normally the name
-used for the supply rail in the schematic is a good choice. If no
-name is provided then the subsystem will choose one.
-
-Regulator-1 supplies power to Regulator-2. This relationship must be registered
-with the core so that Regulator-1 is also enabled when Consumer A enables its
-supply (Regulator-2). The supply regulator is set by the supply_regulator
-field below and co:-
-
-static struct regulator_init_data regulator2_data = {
- .supply_regulator = "Regulator-1",
- .constraints = {
- .min_uV = 1800000,
- .max_uV = 2000000,
- .valid_ops_mask = REGULATOR_CHANGE_VOLTAGE,
- .valid_modes_mask = REGULATOR_MODE_NORMAL,
- },
- .num_consumer_supplies = ARRAY_SIZE(regulator2_consumers),
- .consumer_supplies = regulator2_consumers,
-};
-
-Finally the regulator devices must be registered in the usual manner.
-
-static struct platform_device regulator_devices[] = {
-{
- .name = "regulator",
- .id = DCDC_1,
- .dev = {
- .platform_data = &regulator1_data,
- },
-},
-{
- .name = "regulator",
- .id = DCDC_2,
- .dev = {
- .platform_data = &regulator2_data,
- },
-},
-};
-/* register regulator 1 device */
-platform_device_register(&regulator_devices[0]);
-
-/* register regulator 2 device */
-platform_device_register(&regulator_devices[1]);
diff --git a/Documentation/power/regulator/overview.txt b/Documentation/power/regulator/overview.txt
deleted file mode 100644
index 8ed17587a74..00000000000
--- a/Documentation/power/regulator/overview.txt
+++ /dev/null
@@ -1,171 +0,0 @@
-Linux voltage and current regulator framework
-=============================================
-
-About
-=====
-
-This framework is designed to provide a standard kernel interface to control
-voltage and current regulators.
-
-The intention is to allow systems to dynamically control regulator power output
-in order to save power and prolong battery life. This applies to both voltage
-regulators (where voltage output is controllable) and current sinks (where
-current limit is controllable).
-
-(C) 2008 Wolfson Microelectronics PLC.
-Author: Liam Girdwood <lrg@slimlogic.co.uk>
-
-
-Nomenclature
-============
-
-Some terms used in this document:-
-
- o Regulator - Electronic device that supplies power to other devices.
- Most regulators can enable and disable their output whilst
- some can control their output voltage and or current.
-
- Input Voltage -> Regulator -> Output Voltage
-
-
- o PMIC - Power Management IC. An IC that contains numerous regulators
- and often contains other subsystems.
-
-
- o Consumer - Electronic device that is supplied power by a regulator.
- Consumers can be classified into two types:-
-
- Static: consumer does not change its supply voltage or
- current limit. It only needs to enable or disable it's
- power supply. Its supply voltage is set by the hardware,
- bootloader, firmware or kernel board initialisation code.
-
- Dynamic: consumer needs to change it's supply voltage or
- current limit to meet operation demands.
-
-
- o Power Domain - Electronic circuit that is supplied its input power by the
- output power of a regulator, switch or by another power
- domain.
-
- The supply regulator may be behind a switch(s). i.e.
-
- Regulator -+-> Switch-1 -+-> Switch-2 --> [Consumer A]
- | |
- | +-> [Consumer B], [Consumer C]
- |
- +-> [Consumer D], [Consumer E]
-
- That is one regulator and three power domains:
-
- Domain 1: Switch-1, Consumers D & E.
- Domain 2: Switch-2, Consumers B & C.
- Domain 3: Consumer A.
-
- and this represents a "supplies" relationship:
-
- Domain-1 --> Domain-2 --> Domain-3.
-
- A power domain may have regulators that are supplied power
- by other regulators. i.e.
-
- Regulator-1 -+-> Regulator-2 -+-> [Consumer A]
- |
- +-> [Consumer B]
-
- This gives us two regulators and two power domains:
-
- Domain 1: Regulator-2, Consumer B.
- Domain 2: Consumer A.
-
- and a "supplies" relationship:
-
- Domain-1 --> Domain-2
-
-
- o Constraints - Constraints are used to define power levels for performance
- and hardware protection. Constraints exist at three levels:
-
- Regulator Level: This is defined by the regulator hardware
- operating parameters and is specified in the regulator
- datasheet. i.e.
-
- - voltage output is in the range 800mV -> 3500mV.
- - regulator current output limit is 20mA @ 5V but is
- 10mA @ 10V.
-
- Power Domain Level: This is defined in software by kernel
- level board initialisation code. It is used to constrain a
- power domain to a particular power range. i.e.
-
- - Domain-1 voltage is 3300mV
- - Domain-2 voltage is 1400mV -> 1600mV
- - Domain-3 current limit is 0mA -> 20mA.
-
- Consumer Level: This is defined by consumer drivers
- dynamically setting voltage or current limit levels.
-
- e.g. a consumer backlight driver asks for a current increase
- from 5mA to 10mA to increase LCD illumination. This passes
- to through the levels as follows :-
-
- Consumer: need to increase LCD brightness. Lookup and
- request next current mA value in brightness table (the
- consumer driver could be used on several different
- personalities based upon the same reference device).
-
- Power Domain: is the new current limit within the domain
- operating limits for this domain and system state (e.g.
- battery power, USB power)
-
- Regulator Domains: is the new current limit within the
- regulator operating parameters for input/output voltage.
-
- If the regulator request passes all the constraint tests
- then the new regulator value is applied.
-
-
-Design
-======
-
-The framework is designed and targeted at SoC based devices but may also be
-relevant to non SoC devices and is split into the following four interfaces:-
-
-
- 1. Consumer driver interface.
-
- This uses a similar API to the kernel clock interface in that consumer
- drivers can get and put a regulator (like they can with clocks atm) and
- get/set voltage, current limit, mode, enable and disable. This should
- allow consumers complete control over their supply voltage and current
- limit. This also compiles out if not in use so drivers can be reused in
- systems with no regulator based power control.
-
- See Documentation/power/regulator/consumer.txt
-
- 2. Regulator driver interface.
-
- This allows regulator drivers to register their regulators and provide
- operations to the core. It also has a notifier call chain for propagating
- regulator events to clients.
-
- See Documentation/power/regulator/regulator.txt
-
- 3. Machine interface.
-
- This interface is for machine specific code and allows the creation of
- voltage/current domains (with constraints) for each regulator. It can
- provide regulator constraints that will prevent device damage through
- overvoltage or over current caused by buggy client drivers. It also
- allows the creation of a regulator tree whereby some regulators are
- supplied by others (similar to a clock tree).
-
- See Documentation/power/regulator/machine.txt
-
- 4. Userspace ABI.
-
- The framework also exports a lot of useful voltage/current/opmode data to
- userspace via sysfs. This could be used to help monitor device power
- consumption and status.
-
- See Documentation/ABI/testing/sysfs-class-regulator
diff --git a/Documentation/power/regulator/regulator.txt b/Documentation/power/regulator/regulator.txt
deleted file mode 100644
index 13902778ae4..00000000000
--- a/Documentation/power/regulator/regulator.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-Regulator Driver Interface
-==========================
-
-The regulator driver interface is relatively simple and designed to allow
-regulator drivers to register their services with the core framework.
-
-
-Registration
-============
-
-Drivers can register a regulator by calling :-
-
-struct regulator_dev *regulator_register(struct regulator_desc *regulator_desc,
- const struct regulator_config *config);
-
-This will register the regulators capabilities and operations to the regulator
-core.
-
-Regulators can be unregistered by calling :-
-
-void regulator_unregister(struct regulator_dev *rdev);
-
-
-Regulator Events
-================
-Regulators can send events (e.g. over temp, under voltage, etc) to consumer
-drivers by calling :-
-
-int regulator_notifier_call_chain(struct regulator_dev *rdev,
- unsigned long event, void *data);
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
deleted file mode 100644
index 4abe83e1045..00000000000
--- a/Documentation/power/runtime_pm.txt
+++ /dev/null
@@ -1,887 +0,0 @@
-Runtime Power Management Framework for I/O Devices
-
-(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
-(C) 2010 Alan Stern <stern@rowland.harvard.edu>
-
-1. Introduction
-
-Support for runtime power management (runtime PM) of I/O devices is provided
-at the power management core (PM core) level by means of:
-
-* The power management workqueue pm_wq in which bus types and device drivers can
- put their PM-related work items. It is strongly recommended that pm_wq be
- used for queuing all work items related to runtime PM, because this allows
- them to be synchronized with system-wide power transitions (suspend to RAM,
- hibernation and resume from system sleep states). pm_wq is declared in
- include/linux/pm_runtime.h and defined in kernel/power/main.c.
-
-* A number of runtime PM fields in the 'power' member of 'struct device' (which
- is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
- be used for synchronizing runtime PM operations with one another.
-
-* Three device runtime PM callbacks in 'struct dev_pm_ops' (defined in
- include/linux/pm.h).
-
-* A set of helper functions defined in drivers/base/power/runtime.c that can be
- used for carrying out runtime PM operations in such a way that the
- synchronization between them is taken care of by the PM core. Bus types and
- device drivers are encouraged to use these functions.
-
-The runtime PM callbacks present in 'struct dev_pm_ops', the device runtime PM
-fields of 'struct dev_pm_info' and the core helper functions provided for
-runtime PM are described below.
-
-2. Device Runtime PM Callbacks
-
-There are three device runtime PM callbacks defined in 'struct dev_pm_ops':
-
-struct dev_pm_ops {
- ...
- int (*runtime_suspend)(struct device *dev);
- int (*runtime_resume)(struct device *dev);
- int (*runtime_idle)(struct device *dev);
- ...
-};
-
-The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
-are executed by the PM core for the device's subsystem that may be either of
-the following:
-
- 1. PM domain of the device, if the device's PM domain object, dev->pm_domain,
- is present.
-
- 2. Device type of the device, if both dev->type and dev->type->pm are present.
-
- 3. Device class of the device, if both dev->class and dev->class->pm are
- present.
-
- 4. Bus type of the device, if both dev->bus and dev->bus->pm are present.
-
-If the subsystem chosen by applying the above rules doesn't provide the relevant
-callback, the PM core will invoke the corresponding driver callback stored in
-dev->driver->pm directly (if present).
-
-The PM core always checks which callback to use in the order given above, so the
-priority order of callbacks from high to low is: PM domain, device type, class
-and bus type. Moreover, the high-priority one will always take precedence over
-a low-priority one. The PM domain, bus type, device type and class callbacks
-are referred to as subsystem-level callbacks in what follows.
-
-By default, the callbacks are always invoked in process context with interrupts
-enabled. However, the pm_runtime_irq_safe() helper function can be used to tell
-the PM core that it is safe to run the ->runtime_suspend(), ->runtime_resume()
-and ->runtime_idle() callbacks for the given device in atomic context with
-interrupts disabled. This implies that the callback routines in question must
-not block or sleep, but it also means that the synchronous helper functions
-listed at the end of Section 4 may be used for that device within an interrupt
-handler or generally in an atomic context.
-
-The subsystem-level suspend callback, if present, is _entirely_ _responsible_
-for handling the suspend of the device as appropriate, which may, but need not
-include executing the device driver's own ->runtime_suspend() callback (from the
-PM core's point of view it is not necessary to implement a ->runtime_suspend()
-callback in a device driver as long as the subsystem-level suspend callback
-knows what to do to handle the device).
-
- * Once the subsystem-level suspend callback (or the driver suspend callback,
- if invoked directly) has completed successfully for the given device, the PM
- core regards the device as suspended, which need not mean that it has been
- put into a low power state. It is supposed to mean, however, that the
- device will not process data and will not communicate with the CPU(s) and
- RAM until the appropriate resume callback is executed for it. The runtime
- PM status of a device after successful execution of the suspend callback is
- 'suspended'.
-
- * If the suspend callback returns -EBUSY or -EAGAIN, the device's runtime PM
- status remains 'active', which means that the device _must_ be fully
- operational afterwards.
-
- * If the suspend callback returns an error code different from -EBUSY and
- -EAGAIN, the PM core regards this as a fatal error and will refuse to run
- the helper functions described in Section 4 for the device until its status
- is directly set to either'active', or 'suspended' (the PM core provides
- special helper functions for this purpose).
-
-In particular, if the driver requires remote wakeup capability (i.e. hardware
-mechanism allowing the device to request a change of its power state, such as
-PCI PME) for proper functioning and device_run_wake() returns 'false' for the
-device, then ->runtime_suspend() should return -EBUSY. On the other hand, if
-device_run_wake() returns 'true' for the device and the device is put into a
-low-power state during the execution of the suspend callback, it is expected
-that remote wakeup will be enabled for the device. Generally, remote wakeup
-should be enabled for all input devices put into low-power states at run time.
-
-The subsystem-level resume callback, if present, is _entirely_ _responsible_ for
-handling the resume of the device as appropriate, which may, but need not
-include executing the device driver's own ->runtime_resume() callback (from the
-PM core's point of view it is not necessary to implement a ->runtime_resume()
-callback in a device driver as long as the subsystem-level resume callback knows
-what to do to handle the device).
-
- * Once the subsystem-level resume callback (or the driver resume callback, if
- invoked directly) has completed successfully, the PM core regards the device
- as fully operational, which means that the device _must_ be able to complete
- I/O operations as needed. The runtime PM status of the device is then
- 'active'.
-
- * If the resume callback returns an error code, the PM core regards this as a
- fatal error and will refuse to run the helper functions described in Section
- 4 for the device, until its status is directly set to either 'active', or
- 'suspended' (by means of special helper functions provided by the PM core
- for this purpose).
-
-The idle callback (a subsystem-level one, if present, or the driver one) is
-executed by the PM core whenever the device appears to be idle, which is
-indicated to the PM core by two counters, the device's usage counter and the
-counter of 'active' children of the device.
-
- * If any of these counters is decreased using a helper function provided by
- the PM core and it turns out to be equal to zero, the other counter is
- checked. If that counter also is equal to zero, the PM core executes the
- idle callback with the device as its argument.
-
-The action performed by the idle callback is totally dependent on the subsystem
-(or driver) in question, but the expected and recommended action is to check
-if the device can be suspended (i.e. if all of the conditions necessary for
-suspending the device are satisfied) and to queue up a suspend request for the
-device in that case. The value returned by this callback is ignored by the PM
-core.
-
-The helper functions provided by the PM core, described in Section 4, guarantee
-that the following constraints are met with respect to runtime PM callbacks for
-one device:
-
-(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
- ->runtime_suspend() in parallel with ->runtime_resume() or with another
- instance of ->runtime_suspend() for the same device) with the exception that
- ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
- ->runtime_idle() (although ->runtime_idle() will not be started while any
- of the other callbacks is being executed for the same device).
-
-(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
- devices (i.e. the PM core will only execute ->runtime_idle() or
- ->runtime_suspend() for the devices the runtime PM status of which is
- 'active').
-
-(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
- the usage counter of which is equal to zero _and_ either the counter of
- 'active' children of which is equal to zero, or the 'power.ignore_children'
- flag of which is set.
-
-(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
- PM core will only execute ->runtime_resume() for the devices the runtime
- PM status of which is 'suspended').
-
-Additionally, the helper functions provided by the PM core obey the following
-rules:
-
- * If ->runtime_suspend() is about to be executed or there's a pending request
- to execute it, ->runtime_idle() will not be executed for the same device.
-
- * A request to execute or to schedule the execution of ->runtime_suspend()
- will cancel any pending requests to execute ->runtime_idle() for the same
- device.
-
- * If ->runtime_resume() is about to be executed or there's a pending request
- to execute it, the other callbacks will not be executed for the same device.
-
- * A request to execute ->runtime_resume() will cancel any pending or
- scheduled requests to execute the other callbacks for the same device,
- except for scheduled autosuspends.
-
-3. Runtime PM Device Fields
-
-The following device runtime PM fields are present in 'struct dev_pm_info', as
-defined in include/linux/pm.h:
-
- struct timer_list suspend_timer;
- - timer used for scheduling (delayed) suspend and autosuspend requests
-
- unsigned long timer_expires;
- - timer expiration time, in jiffies (if this is different from zero, the
- timer is running and will expire at that time, otherwise the timer is not
- running)
-
- struct work_struct work;
- - work structure used for queuing up requests (i.e. work items in pm_wq)
-
- wait_queue_head_t wait_queue;
- - wait queue used if any of the helper functions needs to wait for another
- one to complete
-
- spinlock_t lock;
- - lock used for synchronisation
-
- atomic_t usage_count;
- - the usage counter of the device
-
- atomic_t child_count;
- - the count of 'active' children of the device
-
- unsigned int ignore_children;
- - if set, the value of child_count is ignored (but still updated)
-
- unsigned int disable_depth;
- - used for disabling the helper funcions (they work normally if this is
- equal to zero); the initial value of it is 1 (i.e. runtime PM is
- initially disabled for all devices)
-
- unsigned int runtime_error;
- - if set, there was a fatal error (one of the callbacks returned error code
- as described in Section 2), so the helper funtions will not work until
- this flag is cleared; this is the error code returned by the failing
- callback
-
- unsigned int idle_notification;
- - if set, ->runtime_idle() is being executed
-
- unsigned int request_pending;
- - if set, there's a pending request (i.e. a work item queued up into pm_wq)
-
- enum rpm_request request;
- - type of request that's pending (valid if request_pending is set)
-
- unsigned int deferred_resume;
- - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
- being executed for that device and it is not practical to wait for the
- suspend to complete; means "start a resume as soon as you've suspended"
-
- unsigned int run_wake;
- - set if the device is capable of generating runtime wake-up events
-
- enum rpm_status runtime_status;
- - the runtime PM status of the device; this field's initial value is
- RPM_SUSPENDED, which means that each device is initially regarded by the
- PM core as 'suspended', regardless of its real hardware status
-
- unsigned int runtime_auto;
- - if set, indicates that the user space has allowed the device driver to
- power manage the device at run time via the /sys/devices/.../power/control
- interface; it may only be modified with the help of the pm_runtime_allow()
- and pm_runtime_forbid() helper functions
-
- unsigned int no_callbacks;
- - indicates that the device does not use the runtime PM callbacks (see
- Section 8); it may be modified only by the pm_runtime_no_callbacks()
- helper function
-
- unsigned int irq_safe;
- - indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
- will be invoked with the spinlock held and interrupts disabled
-
- unsigned int use_autosuspend;
- - indicates that the device's driver supports delayed autosuspend (see
- Section 9); it may be modified only by the
- pm_runtime{_dont}_use_autosuspend() helper functions
-
- unsigned int timer_autosuspends;
- - indicates that the PM core should attempt to carry out an autosuspend
- when the timer expires rather than a normal suspend
-
- int autosuspend_delay;
- - the delay time (in milliseconds) to be used for autosuspend
-
- unsigned long last_busy;
- - the time (in jiffies) when the pm_runtime_mark_last_busy() helper
- function was last called for this device; used in calculating inactivity
- periods for autosuspend
-
-All of the above fields are members of the 'power' member of 'struct device'.
-
-4. Runtime PM Device Helper Functions
-
-The following runtime PM helper functions are defined in
-drivers/base/power/runtime.c and include/linux/pm_runtime.h:
-
- void pm_runtime_init(struct device *dev);
- - initialize the device runtime PM fields in 'struct dev_pm_info'
-
- void pm_runtime_remove(struct device *dev);
- - make sure that the runtime PM of the device will be disabled after
- removing the device from device hierarchy
-
- int pm_runtime_idle(struct device *dev);
- - execute the subsystem-level idle callback for the device; returns 0 on
- success or error code on failure, where -EINPROGRESS means that
- ->runtime_idle() is already being executed
-
- int pm_runtime_suspend(struct device *dev);
- - execute the subsystem-level suspend callback for the device; returns 0 on
- success, 1 if the device's runtime PM status was already 'suspended', or
- error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
- to suspend the device again in future and -EACCES means that
- 'power.disable_depth' is different from 0
-
- int pm_runtime_autosuspend(struct device *dev);
- - same as pm_runtime_suspend() except that the autosuspend delay is taken
- into account; if pm_runtime_autosuspend_expiration() says the delay has
- not yet expired then an autosuspend is scheduled for the appropriate time
- and 0 is returned
-
- int pm_runtime_resume(struct device *dev);
- - execute the subsystem-level resume callback for the device; returns 0 on
- success, 1 if the device's runtime PM status was already 'active' or
- error code on failure, where -EAGAIN means it may be safe to attempt to
- resume the device again in future, but 'power.runtime_error' should be
- checked additionally, and -EACCES means that 'power.disable_depth' is
- different from 0
-
- int pm_request_idle(struct device *dev);
- - submit a request to execute the subsystem-level idle callback for the
- device (the request is represented by a work item in pm_wq); returns 0 on
- success or error code if the request has not been queued up
-
- int pm_request_autosuspend(struct device *dev);
- - schedule the execution of the subsystem-level suspend callback for the
- device when the autosuspend delay has expired; if the delay has already
- expired then the work item is queued up immediately
-
- int pm_schedule_suspend(struct device *dev, unsigned int delay);
- - schedule the execution of the subsystem-level suspend callback for the
- device in future, where 'delay' is the time to wait before queuing up a
- suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
- item is queued up immediately); returns 0 on success, 1 if the device's PM
- runtime status was already 'suspended', or error code if the request
- hasn't been scheduled (or queued up if 'delay' is 0); if the execution of
- ->runtime_suspend() is already scheduled and not yet expired, the new
- value of 'delay' will be used as the time to wait
-
- int pm_request_resume(struct device *dev);
- - submit a request to execute the subsystem-level resume callback for the
- device (the request is represented by a work item in pm_wq); returns 0 on
- success, 1 if the device's runtime PM status was already 'active', or
- error code if the request hasn't been queued up
-
- void pm_runtime_get_noresume(struct device *dev);
- - increment the device's usage counter
-
- int pm_runtime_get(struct device *dev);
- - increment the device's usage counter, run pm_request_resume(dev) and
- return its result
-
- int pm_runtime_get_sync(struct device *dev);
- - increment the device's usage counter, run pm_runtime_resume(dev) and
- return its result
-
- void pm_runtime_put_noidle(struct device *dev);
- - decrement the device's usage counter
-
- int pm_runtime_put(struct device *dev);
- - decrement the device's usage counter; if the result is 0 then run
- pm_request_idle(dev) and return its result
-
- int pm_runtime_put_autosuspend(struct device *dev);
- - decrement the device's usage counter; if the result is 0 then run
- pm_request_autosuspend(dev) and return its result
-
- int pm_runtime_put_sync(struct device *dev);
- - decrement the device's usage counter; if the result is 0 then run
- pm_runtime_idle(dev) and return its result
-
- int pm_runtime_put_sync_suspend(struct device *dev);
- - decrement the device's usage counter; if the result is 0 then run
- pm_runtime_suspend(dev) and return its result
-
- int pm_runtime_put_sync_autosuspend(struct device *dev);
- - decrement the device's usage counter; if the result is 0 then run
- pm_runtime_autosuspend(dev) and return its result
-
- void pm_runtime_enable(struct device *dev);
- - decrement the device's 'power.disable_depth' field; if that field is equal
- to zero, the runtime PM helper functions can execute subsystem-level
- callbacks described in Section 2 for the device
-
- int pm_runtime_disable(struct device *dev);
- - increment the device's 'power.disable_depth' field (if the value of that
- field was previously zero, this prevents subsystem-level runtime PM
- callbacks from being run for the device), make sure that all of the pending
- runtime PM operations on the device are either completed or canceled;
- returns 1 if there was a resume request pending and it was necessary to
- execute the subsystem-level resume callback for the device to satisfy that
- request, otherwise 0 is returned
-
- int pm_runtime_barrier(struct device *dev);
- - check if there's a resume request pending for the device and resume it
- (synchronously) in that case, cancel any other pending runtime PM requests
- regarding it and wait for all runtime PM operations on it in progress to
- complete; returns 1 if there was a resume request pending and it was
- necessary to execute the subsystem-level resume callback for the device to
- satisfy that request, otherwise 0 is returned
-
- void pm_suspend_ignore_children(struct device *dev, bool enable);
- - set/unset the power.ignore_children flag of the device
-
- int pm_runtime_set_active(struct device *dev);
- - clear the device's 'power.runtime_error' flag, set the device's runtime
- PM status to 'active' and update its parent's counter of 'active'
- children as appropriate (it is only valid to use this function if
- 'power.runtime_error' is set or 'power.disable_depth' is greater than
- zero); it will fail and return error code if the device has a parent
- which is not active and the 'power.ignore_children' flag of which is unset
-
- void pm_runtime_set_suspended(struct device *dev);
- - clear the device's 'power.runtime_error' flag, set the device's runtime
- PM status to 'suspended' and update its parent's counter of 'active'
- children as appropriate (it is only valid to use this function if
- 'power.runtime_error' is set or 'power.disable_depth' is greater than
- zero)
-
- bool pm_runtime_suspended(struct device *dev);
- - return true if the device's runtime PM status is 'suspended' and its
- 'power.disable_depth' field is equal to zero, or false otherwise
-
- bool pm_runtime_status_suspended(struct device *dev);
- - return true if the device's runtime PM status is 'suspended'
-
- void pm_runtime_allow(struct device *dev);
- - set the power.runtime_auto flag for the device and decrease its usage
- counter (used by the /sys/devices/.../power/control interface to
- effectively allow the device to be power managed at run time)
-
- void pm_runtime_forbid(struct device *dev);
- - unset the power.runtime_auto flag for the device and increase its usage
- counter (used by the /sys/devices/.../power/control interface to
- effectively prevent the device from being power managed at run time)
-
- void pm_runtime_no_callbacks(struct device *dev);
- - set the power.no_callbacks flag for the device and remove the runtime
- PM attributes from /sys/devices/.../power (or prevent them from being
- added when the device is registered)
-
- void pm_runtime_irq_safe(struct device *dev);
- - set the power.irq_safe flag for the device, causing the runtime-PM
- callbacks to be invoked with interrupts off
-
- void pm_runtime_mark_last_busy(struct device *dev);
- - set the power.last_busy field to the current time
-
- void pm_runtime_use_autosuspend(struct device *dev);
- - set the power.use_autosuspend flag, enabling autosuspend delays
-
- void pm_runtime_dont_use_autosuspend(struct device *dev);
- - clear the power.use_autosuspend flag, disabling autosuspend delays
-
- void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
- - set the power.autosuspend_delay value to 'delay' (expressed in
- milliseconds); if 'delay' is negative then runtime suspends are
- prevented
-
- unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
- - calculate the time when the current autosuspend delay period will expire,
- based on power.last_busy and power.autosuspend_delay; if the delay time
- is 1000 ms or larger then the expiration time is rounded up to the
- nearest second; returns 0 if the delay period has already expired or
- power.use_autosuspend isn't set, otherwise returns the expiration time
- in jiffies
-
-It is safe to execute the following helper functions from interrupt context:
-
-pm_request_idle()
-pm_request_autosuspend()
-pm_schedule_suspend()
-pm_request_resume()
-pm_runtime_get_noresume()
-pm_runtime_get()
-pm_runtime_put_noidle()
-pm_runtime_put()
-pm_runtime_put_autosuspend()
-pm_runtime_enable()
-pm_suspend_ignore_children()
-pm_runtime_set_active()
-pm_runtime_set_suspended()
-pm_runtime_suspended()
-pm_runtime_mark_last_busy()
-pm_runtime_autosuspend_expiration()
-
-If pm_runtime_irq_safe() has been called for a device then the following helper
-functions may also be used in interrupt context:
-
-pm_runtime_idle()
-pm_runtime_suspend()
-pm_runtime_autosuspend()
-pm_runtime_resume()
-pm_runtime_get_sync()
-pm_runtime_put_sync()
-pm_runtime_put_sync_suspend()
-pm_runtime_put_sync_autosuspend()
-
-5. Runtime PM Initialization, Device Probing and Removal
-
-Initially, the runtime PM is disabled for all devices, which means that the
-majority of the runtime PM helper funtions described in Section 4 will return
--EAGAIN until pm_runtime_enable() is called for the device.
-
-In addition to that, the initial runtime PM status of all devices is
-'suspended', but it need not reflect the actual physical state of the device.
-Thus, if the device is initially active (i.e. it is able to process I/O), its
-runtime PM status must be changed to 'active', with the help of
-pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
-
-However, if the device has a parent and the parent's runtime PM is enabled,
-calling pm_runtime_set_active() for the device will affect the parent, unless
-the parent's 'power.ignore_children' flag is set. Namely, in that case the
-parent won't be able to suspend at run time, using the PM core's helper
-functions, as long as the child's status is 'active', even if the child's
-runtime PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
-the child yet or pm_runtime_disable() has been called for it). For this reason,
-once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
-should be called for it too as soon as reasonably possible or its runtime PM
-status should be changed back to 'suspended' with the help of
-pm_runtime_set_suspended().
-
-If the default initial runtime PM status of the device (i.e. 'suspended')
-reflects the actual state of the device, its bus type's or its driver's
-->probe() callback will likely need to wake it up using one of the PM core's
-helper functions described in Section 4. In that case, pm_runtime_resume()
-should be used. Of course, for this purpose the device's runtime PM has to be
-enabled earlier by calling pm_runtime_enable().
-
-If the device bus type's or driver's ->probe() callback runs
-pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
-they will fail returning -EAGAIN, because the device's usage counter is
-incremented by the driver core before executing ->probe(). Still, it may be
-desirable to suspend the device as soon as ->probe() has finished, so the driver
-core uses pm_runtime_put_sync() to invoke the subsystem-level idle callback for
-the device at that time.
-
-Moreover, the driver core prevents runtime PM callbacks from racing with the bus
-notifier callback in __device_release_driver(), which is necessary, because the
-notifier is used by some subsystems to carry out operations affecting the
-runtime PM functionality. It does so by calling pm_runtime_get_sync() before
-driver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications. This
-resumes the device if it's in the suspended state and prevents it from
-being suspended again while those routines are being executed.
-
-To allow bus types and drivers to put devices into the suspended state by
-calling pm_runtime_suspend() from their ->remove() routines, the driver core
-executes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER
-notifications in __device_release_driver(). This requires bus types and
-drivers to make their ->remove() callbacks avoid races with runtime PM directly,
-but also it allows of more flexibility in the handling of devices during the
-removal of their drivers.
-
-The user space can effectively disallow the driver of the device to power manage
-it at run time by changing the value of its /sys/devices/.../power/control
-attribute to "on", which causes pm_runtime_forbid() to be called. In principle,
-this mechanism may also be used by the driver to effectively turn off the
-runtime power management of the device until the user space turns it on.
-Namely, during the initialization the driver can make sure that the runtime PM
-status of the device is 'active' and call pm_runtime_forbid(). It should be
-noted, however, that if the user space has already intentionally changed the
-value of /sys/devices/.../power/control to "auto" to allow the driver to power
-manage the device at run time, the driver may confuse it by using
-pm_runtime_forbid() this way.
-
-6. Runtime PM and System Sleep
-
-Runtime PM and system sleep (i.e., system suspend and hibernation, also known
-as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of
-ways. If a device is active when a system sleep starts, everything is
-straightforward. But what should happen if the device is already suspended?
-
-The device may have different wake-up settings for runtime PM and system sleep.
-For example, remote wake-up may be enabled for runtime suspend but disallowed
-for system sleep (device_may_wakeup(dev) returns 'false'). When this happens,
-the subsystem-level system suspend callback is responsible for changing the
-device's wake-up setting (it may leave that to the device driver's system
-suspend routine). It may be necessary to resume the device and suspend it again
-in order to do so. The same is true if the driver uses different power levels
-or other settings for runtime suspend and system sleep.
-
-During system resume, the simplest approach is to bring all devices back to full
-power, even if they had been suspended before the system suspend began. There
-are several reasons for this, including:
-
- * The device might need to switch power levels, wake-up settings, etc.
-
- * Remote wake-up events might have been lost by the firmware.
-
- * The device's children may need the device to be at full power in order
- to resume themselves.
-
- * The driver's idea of the device state may not agree with the device's
- physical state. This can happen during resume from hibernation.
-
- * The device might need to be reset.
-
- * Even though the device was suspended, if its usage counter was > 0 then most
- likely it would need a runtime resume in the near future anyway.
-
-If the device had been suspended before the system suspend began and it's
-brought back to full power during resume, then its runtime PM status will have
-to be updated to reflect the actual post-system sleep status. The way to do
-this is:
-
- pm_runtime_disable(dev);
- pm_runtime_set_active(dev);
- pm_runtime_enable(dev);
-
-The PM core always increments the runtime usage counter before calling the
-->suspend() callback and decrements it after calling the ->resume() callback.
-Hence disabling runtime PM temporarily like this will not cause any runtime
-suspend attempts to be permanently lost. If the usage count goes to zero
-following the return of the ->resume() callback, the ->runtime_idle() callback
-will be invoked as usual.
-
-On some systems, however, system sleep is not entered through a global firmware
-or hardware operation. Instead, all hardware components are put into low-power
-states directly by the kernel in a coordinated way. Then, the system sleep
-state effectively follows from the states the hardware components end up in
-and the system is woken up from that state by a hardware interrupt or a similar
-mechanism entirely under the kernel's control. As a result, the kernel never
-gives control away and the states of all devices during resume are precisely
-known to it. If that is the case and none of the situations listed above takes
-place (in particular, if the system is not waking up from hibernation), it may
-be more efficient to leave the devices that had been suspended before the system
-suspend began in the suspended state.
-
-The PM core does its best to reduce the probability of race conditions between
-the runtime PM and system suspend/resume (and hibernation) callbacks by carrying
-out the following operations:
-
- * During system suspend it calls pm_runtime_get_noresume() and
- pm_runtime_barrier() for every device right before executing the
- subsystem-level .suspend() callback for it. In addition to that it calls
- pm_runtime_disable() for every device right after executing the
- subsystem-level .suspend() callback for it.
-
- * During system resume it calls pm_runtime_enable() and pm_runtime_put_sync()
- for every device right before and right after executing the subsystem-level
- .resume() callback for it, respectively.
-
-7. Generic subsystem callbacks
-
-Subsystems may wish to conserve code space by using the set of generic power
-management callbacks provided by the PM core, defined in
-driver/base/power/generic_ops.c:
-
- int pm_generic_runtime_idle(struct device *dev);
- - invoke the ->runtime_idle() callback provided by the driver of this
- device, if defined, and call pm_runtime_suspend() for this device if the
- return value is 0 or the callback is not defined
-
- int pm_generic_runtime_suspend(struct device *dev);
- - invoke the ->runtime_suspend() callback provided by the driver of this
- device and return its result, or return -EINVAL if not defined
-
- int pm_generic_runtime_resume(struct device *dev);
- - invoke the ->runtime_resume() callback provided by the driver of this
- device and return its result, or return -EINVAL if not defined
-
- int pm_generic_suspend(struct device *dev);
- - if the device has not been suspended at run time, invoke the ->suspend()
- callback provided by its driver and return its result, or return 0 if not
- defined
-
- int pm_generic_suspend_noirq(struct device *dev);
- - if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq()
- callback provided by the device's driver and return its result, or return
- 0 if not defined
-
- int pm_generic_resume(struct device *dev);
- - invoke the ->resume() callback provided by the driver of this device and,
- if successful, change the device's runtime PM status to 'active'
-
- int pm_generic_resume_noirq(struct device *dev);
- - invoke the ->resume_noirq() callback provided by the driver of this device
-
- int pm_generic_freeze(struct device *dev);
- - if the device has not been suspended at run time, invoke the ->freeze()
- callback provided by its driver and return its result, or return 0 if not
- defined
-
- int pm_generic_freeze_noirq(struct device *dev);
- - if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq()
- callback provided by the device's driver and return its result, or return
- 0 if not defined
-
- int pm_generic_thaw(struct device *dev);
- - if the device has not been suspended at run time, invoke the ->thaw()
- callback provided by its driver and return its result, or return 0 if not
- defined
-
- int pm_generic_thaw_noirq(struct device *dev);
- - if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq()
- callback provided by the device's driver and return its result, or return
- 0 if not defined
-
- int pm_generic_poweroff(struct device *dev);
- - if the device has not been suspended at run time, invoke the ->poweroff()
- callback provided by its driver and return its result, or return 0 if not
- defined
-
- int pm_generic_poweroff_noirq(struct device *dev);
- - if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq()
- callback provided by the device's driver and return its result, or return
- 0 if not defined
-
- int pm_generic_restore(struct device *dev);
- - invoke the ->restore() callback provided by the driver of this device and,
- if successful, change the device's runtime PM status to 'active'
-
- int pm_generic_restore_noirq(struct device *dev);
- - invoke the ->restore_noirq() callback provided by the device's driver
-
-These functions can be assigned to the ->runtime_idle(), ->runtime_suspend(),
-->runtime_resume(), ->suspend(), ->suspend_noirq(), ->resume(),
-->resume_noirq(), ->freeze(), ->freeze_noirq(), ->thaw(), ->thaw_noirq(),
-->poweroff(), ->poweroff_noirq(), ->restore(), ->restore_noirq() callback
-pointers in the subsystem-level dev_pm_ops structures.
-
-If a subsystem wishes to use all of them at the same time, it can simply assign
-the GENERIC_SUBSYS_PM_OPS macro, defined in include/linux/pm.h, to its
-dev_pm_ops structure pointer.
-
-Device drivers that wish to use the same function as a system suspend, freeze,
-poweroff and runtime suspend callback, and similarly for system resume, thaw,
-restore, and runtime resume, can achieve this with the help of the
-UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
-last argument to NULL).
-
-8. "No-Callback" Devices
-
-Some "devices" are only logical sub-devices of their parent and cannot be
-power-managed on their own. (The prototype example is a USB interface. Entire
-USB devices can go into low-power mode or send wake-up requests, but neither is
-possible for individual interfaces.) The drivers for these devices have no
-need of runtime PM callbacks; if the callbacks did exist, ->runtime_suspend()
-and ->runtime_resume() would always return 0 without doing anything else and
-->runtime_idle() would always call pm_runtime_suspend().
-
-Subsystems can tell the PM core about these devices by calling
-pm_runtime_no_callbacks(). This should be done after the device structure is
-initialized and before it is registered (although after device registration is
-also okay). The routine will set the device's power.no_callbacks flag and
-prevent the non-debugging runtime PM sysfs attributes from being created.
-
-When power.no_callbacks is set, the PM core will not invoke the
-->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks.
-Instead it will assume that suspends and resumes always succeed and that idle
-devices should be suspended.
-
-As a consequence, the PM core will never directly inform the device's subsystem
-or driver about runtime power changes. Instead, the driver for the device's
-parent must take responsibility for telling the device's driver when the
-parent's power state changes.
-
-9. Autosuspend, or automatically-delayed suspends
-
-Changing a device's power state isn't free; it requires both time and energy.
-A device should be put in a low-power state only when there's some reason to
-think it will remain in that state for a substantial time. A common heuristic
-says that a device which hasn't been used for a while is liable to remain
-unused; following this advice, drivers should not allow devices to be suspended
-at runtime until they have been inactive for some minimum period. Even when
-the heuristic ends up being non-optimal, it will still prevent devices from
-"bouncing" too rapidly between low-power and full-power states.
-
-The term "autosuspend" is an historical remnant. It doesn't mean that the
-device is automatically suspended (the subsystem or driver still has to call
-the appropriate PM routines); rather it means that runtime suspends will
-automatically be delayed until the desired period of inactivity has elapsed.
-
-Inactivity is determined based on the power.last_busy field. Drivers should
-call pm_runtime_mark_last_busy() to update this field after carrying out I/O,
-typically just before calling pm_runtime_put_autosuspend(). The desired length
-of the inactivity period is a matter of policy. Subsystems can set this length
-initially by calling pm_runtime_set_autosuspend_delay(), but after device
-registration the length should be controlled by user space, using the
-/sys/devices/.../power/autosuspend_delay_ms attribute.
-
-In order to use autosuspend, subsystems or drivers must call
-pm_runtime_use_autosuspend() (preferably before registering the device), and
-thereafter they should use the various *_autosuspend() helper functions instead
-of the non-autosuspend counterparts:
-
- Instead of: pm_runtime_suspend use: pm_runtime_autosuspend;
- Instead of: pm_schedule_suspend use: pm_request_autosuspend;
- Instead of: pm_runtime_put use: pm_runtime_put_autosuspend;
- Instead of: pm_runtime_put_sync use: pm_runtime_put_sync_autosuspend.
-
-Drivers may also continue to use the non-autosuspend helper functions; they
-will behave normally, not taking the autosuspend delay into account.
-Similarly, if the power.use_autosuspend field isn't set then the autosuspend
-helper functions will behave just like the non-autosuspend counterparts.
-
-Under some circumstances a driver or subsystem may want to prevent a device
-from autosuspending immediately, even though the usage counter is zero and the
-autosuspend delay time has expired. If the ->runtime_suspend() callback
-returns -EAGAIN or -EBUSY, and if the next autosuspend delay expiration time is
-in the future (as it normally would be if the callback invoked
-pm_runtime_mark_last_busy()), the PM core will automatically reschedule the
-autosuspend. The ->runtime_suspend() callback can't do this rescheduling
-itself because no suspend requests of any kind are accepted while the device is
-suspending (i.e., while the callback is running).
-
-The implementation is well suited for asynchronous use in interrupt contexts.
-However such use inevitably involves races, because the PM core can't
-synchronize ->runtime_suspend() callbacks with the arrival of I/O requests.
-This synchronization must be handled by the driver, using its private lock.
-Here is a schematic pseudo-code example:
-
- foo_read_or_write(struct foo_priv *foo, void *data)
- {
- lock(&foo->private_lock);
- add_request_to_io_queue(foo, data);
- if (foo->num_pending_requests++ == 0)
- pm_runtime_get(&foo->dev);
- if (!foo->is_suspended)
- foo_process_next_request(foo);
- unlock(&foo->private_lock);
- }
-
- foo_io_completion(struct foo_priv *foo, void *req)
- {
- lock(&foo->private_lock);
- if (--foo->num_pending_requests == 0) {
- pm_runtime_mark_last_busy(&foo->dev);
- pm_runtime_put_autosuspend(&foo->dev);
- } else {
- foo_process_next_request(foo);
- }
- unlock(&foo->private_lock);
- /* Send req result back to the user ... */
- }
-
- int foo_runtime_suspend(struct device *dev)
- {
- struct foo_priv foo = container_of(dev, ...);
- int ret = 0;
-
- lock(&foo->private_lock);
- if (foo->num_pending_requests > 0) {
- ret = -EBUSY;
- } else {
- /* ... suspend the device ... */
- foo->is_suspended = 1;
- }
- unlock(&foo->private_lock);
- return ret;
- }
-
- int foo_runtime_resume(struct device *dev)
- {
- struct foo_priv foo = container_of(dev, ...);
-
- lock(&foo->private_lock);
- /* ... resume the device ... */
- foo->is_suspended = 0;
- pm_runtime_mark_last_busy(&foo->dev);
- if (foo->num_pending_requests > 0)
- foo_process_requests(foo);
- unlock(&foo->private_lock);
- return 0;
- }
-
-The important point is that after foo_io_completion() asks for an autosuspend,
-the foo_runtime_suspend() callback may race with foo_read_or_write().
-Therefore foo_runtime_suspend() has to check whether there are any pending I/O
-requests (while holding the private lock) before allowing the suspend to
-proceed.
-
-In addition, the power.autosuspend_delay field can be changed by user space at
-any time. If a driver cares about this, it can call
-pm_runtime_autosuspend_expiration() from within the ->runtime_suspend()
-callback while holding its private lock. If the function returns a nonzero
-value then the delay has not yet expired and the callback should return
--EAGAIN.
diff --git a/Documentation/power/s2ram.txt b/Documentation/power/s2ram.txt
deleted file mode 100644
index 1bdfa044377..00000000000
--- a/Documentation/power/s2ram.txt
+++ /dev/null
@@ -1,81 +0,0 @@
- How to get s2ram working
- ~~~~~~~~~~~~~~~~~~~~~~~~
- 2006 Linus Torvalds
- 2006 Pavel Machek
-
-1) Check suspend.sf.net, program s2ram there has long whitelist of
- "known ok" machines, along with tricks to use on each one.
-
-2) If that does not help, try reading tricks.txt and
- video.txt. Perhaps problem is as simple as broken module, and
- simple module unload can fix it.
-
-3) You can use Linus' TRACE_RESUME infrastructure, described below.
-
- Using TRACE_RESUME
- ~~~~~~~~~~~~~~~~~~
-
-I've been working at making the machines I have able to STR, and almost
-always it's a driver that is buggy. Thank God for the suspend/resume
-debugging - the thing that Chuck tried to disable. That's often the _only_
-way to debug these things, and it's actually pretty powerful (but
-time-consuming - having to insert TRACE_RESUME() markers into the device
-driver that doesn't resume and recompile and reboot).
-
-Anyway, the way to debug this for people who are interested (have a
-machine that doesn't boot) is:
-
- - enable PM_DEBUG, and PM_TRACE
-
- - use a script like this:
-
- #!/bin/sh
- sync
- echo 1 > /sys/power/pm_trace
- echo mem > /sys/power/state
-
- to suspend
-
- - if it doesn't come back up (which is usually the problem), reboot by
- holding the power button down, and look at the dmesg output for things
- like
-
- Magic number: 4:156:725
- hash matches drivers/base/power/resume.c:28
- hash matches device 0000:01:00.0
-
- which means that the last trace event was just before trying to resume
- device 0000:01:00.0. Then figure out what driver is controlling that
- device (lspci and /sys/devices/pci* is your friend), and see if you can
- fix it, disable it, or trace into its resume function.
-
- If no device matches the hash (or any matches appear to be false positives),
- the culprit may be a device from a loadable kernel module that is not loaded
- until after the hash is checked. You can check the hash against the current
- devices again after more modules are loaded using sysfs:
-
- cat /sys/power/pm_trace_dev_match
-
-For example, the above happens to be the VGA device on my EVO, which I
-used to run with "radeonfb" (it's an ATI Radeon mobility). It turns out
-that "radeonfb" simply cannot resume that device - it tries to set the
-PLL's, and it just _hangs_. Using the regular VGA console and letting X
-resume it instead works fine.
-
-NOTE
-====
-pm_trace uses the system's Real Time Clock (RTC) to save the magic number.
-Reason for this is that the RTC is the only reliably available piece of
-hardware during resume operations where a value can be set that will
-survive a reboot.
-
-Consequence is that after a resume (even if it is successful) your system
-clock will have a value corresponding to the magic number instead of the
-correct date/time! It is therefore advisable to use a program like ntp-date
-or rdate to reset the correct date/time from an external time source when
-using this trace option.
-
-As the clock keeps ticking it is also essential that the reboot is done
-quickly after the resume failure. The trace option does not use the seconds
-or the low order bits of the minutes of the RTC, but a too long delay will
-corrupt the magic value.
diff --git a/Documentation/power/states.txt b/Documentation/power/states.txt
deleted file mode 100644
index 4416b28630d..00000000000
--- a/Documentation/power/states.txt
+++ /dev/null
@@ -1,80 +0,0 @@
-
-System Power Management States
-
-
-The kernel supports three power management states generically, though
-each is dependent on platform support code to implement the low-level
-details for each state. This file describes each state, what they are
-commonly called, what ACPI state they map to, and what string to write
-to /sys/power/state to enter that state
-
-
-State: Standby / Power-On Suspend
-ACPI State: S1
-String: "standby"
-
-This state offers minimal, though real, power savings, while providing
-a very low-latency transition back to a working system. No operating
-state is lost (the CPU retains power), so the system easily starts up
-again where it left off.
-
-We try to put devices in a low-power state equivalent to D1, which
-also offers low power savings, but low resume latency. Not all devices
-support D1, and those that don't are left on.
-
-A transition from Standby to the On state should take about 1-2
-seconds.
-
-
-State: Suspend-to-RAM
-ACPI State: S3
-String: "mem"
-
-This state offers significant power savings as everything in the
-system is put into a low-power state, except for memory, which is
-placed in self-refresh mode to retain its contents.
-
-System and device state is saved and kept in memory. All devices are
-suspended and put into D3. In many cases, all peripheral buses lose
-power when entering STR, so devices must be able to handle the
-transition back to the On state.
-
-For at least ACPI, STR requires some minimal boot-strapping code to
-resume the system from STR. This may be true on other platforms.
-
-A transition from Suspend-to-RAM to the On state should take about
-3-5 seconds.
-
-
-State: Suspend-to-disk
-ACPI State: S4
-String: "disk"
-
-This state offers the greatest power savings, and can be used even in
-the absence of low-level platform support for power management. This
-state operates similarly to Suspend-to-RAM, but includes a final step
-of writing memory contents to disk. On resume, this is read and memory
-is restored to its pre-suspend state.
-
-STD can be handled by the firmware or the kernel. If it is handled by
-the firmware, it usually requires a dedicated partition that must be
-setup via another operating system for it to use. Despite the
-inconvenience, this method requires minimal work by the kernel, since
-the firmware will also handle restoring memory contents on resume.
-
-For suspend-to-disk, a mechanism called 'swsusp' (Swap Suspend) is used
-to write memory contents to free swap space. swsusp has some restrictive
-requirements, but should work in most cases. Some, albeit outdated,
-documentation can be found in Documentation/power/swsusp.txt.
-Alternatively, userspace can do most of the actual suspend to disk work,
-see userland-swsusp.txt.
-
-Once memory state is written to disk, the system may either enter a
-low-power state (like ACPI S4), or it may simply power down. Powering
-down offers greater savings, and allows this mechanism to work on any
-system. However, entering a real low-power state allows the user to
-trigger wake up events (e.g. pressing a key or opening a laptop lid).
-
-A transition from Suspend-to-Disk to the On state should take about 30
-seconds, though it's typically a bit more with the current
-implementation.
diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
deleted file mode 100644
index e13dafc8e8f..00000000000
--- a/Documentation/power/suspend-and-cpuhotplug.txt
+++ /dev/null
@@ -1,275 +0,0 @@
-Interaction of Suspend code (S3) with the CPU hotplug infrastructure
-
- (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
-
-
-I. How does the regular CPU hotplug code differ from how the Suspend-to-RAM
- infrastructure uses it internally? And where do they share common code?
-
-Well, a picture is worth a thousand words... So ASCII art follows :-)
-
-[This depicts the current design in the kernel, and focusses only on the
-interactions involving the freezer and CPU hotplug and also tries to explain
-the locking involved. It outlines the notifications involved as well.
-But please note that here, only the call paths are illustrated, with the aim
-of describing where they take different paths and where they share code.
-What happens when regular CPU hotplug and Suspend-to-RAM race with each other
-is not depicted here.]
-
-On a high level, the suspend-resume cycle goes like this:
-
-|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
-|tasks | | cpus | | | | cpus | |tasks|
-
-
-More details follow:
-
- Suspend call path
- -----------------
-
- Write 'mem' to
- /sys/power/state
- sysfs file
- |
- v
- Acquire pm_mutex lock
- |
- v
- Send PM_SUSPEND_PREPARE
- notifications
- |
- v
- Freeze tasks
- |
- |
- v
- disable_nonboot_cpus()
- /* start */
- |
- v
- Acquire cpu_add_remove_lock
- |
- v
- Iterate over CURRENTLY
- online CPUs
- |
- |
- | ----------
- v | L
- ======> _cpu_down() |
- | [This takes cpuhotplug.lock |
- Common | before taking down the CPU |
- code | and releases it when done] | O
- | While it is at it, notifications |
- | are sent when notable events occur, |
- ======> by running all registered callbacks. |
- | | O
- | |
- | |
- v |
- Note down these cpus in | P
- frozen_cpus mask ----------
- |
- v
- Disable regular cpu hotplug
- by setting cpu_hotplug_disabled=1
- |
- v
- Release cpu_add_remove_lock
- |
- v
- /* disable_nonboot_cpus() complete */
- |
- v
- Do suspend
-
-
-
-Resuming back is likewise, with the counterparts being (in the order of
-execution during resume):
-* enable_nonboot_cpus() which involves:
- | Acquire cpu_add_remove_lock
- | Reset cpu_hotplug_disabled to 0, thereby enabling regular cpu hotplug
- | Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
- | Release cpu_add_remove_lock
- v
-
-* thaw tasks
-* send PM_POST_SUSPEND notifications
-* Release pm_mutex lock.
-
-
-It is to be noted here that the pm_mutex lock is acquired at the very
-beginning, when we are just starting out to suspend, and then released only
-after the entire cycle is complete (i.e., suspend + resume).
-
-
-
- Regular CPU hotplug call path
- -----------------------------
-
- Write 0 (or 1) to
- /sys/devices/system/cpu/cpu*/online
- sysfs file
- |
- |
- v
- cpu_down()
- |
- v
- Acquire cpu_add_remove_lock
- |
- v
- If cpu_hotplug_disabled is 1
- return gracefully
- |
- |
- v
- ======> _cpu_down()
- | [This takes cpuhotplug.lock
- Common | before taking down the CPU
- code | and releases it when done]
- | While it is at it, notifications
- | are sent when notable events occur,
- ======> by running all registered callbacks.
- |
- |
- v
- Release cpu_add_remove_lock
- [That's it!, for
- regular CPU hotplug]
-
-
-
-So, as can be seen from the two diagrams (the parts marked as "Common code"),
-regular CPU hotplug and the suspend code path converge at the _cpu_down() and
-_cpu_up() functions. They differ in the arguments passed to these functions,
-in that during regular CPU hotplug, 0 is passed for the 'tasks_frozen'
-argument. But during suspend, since the tasks are already frozen by the time
-the non-boot CPUs are offlined or onlined, the _cpu_*() functions are called
-with the 'tasks_frozen' argument set to 1.
-[See below for some known issues regarding this.]
-
-
-Important files and functions/entry points:
-------------------------------------------
-
-kernel/power/process.c : freeze_processes(), thaw_processes()
-kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
-kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
-
-
-
-II. What are the issues involved in CPU hotplug?
- -------------------------------------------
-
-There are some interesting situations involving CPU hotplug and microcode
-update on the CPUs, as discussed below:
-
-[Please bear in mind that the kernel requests the microcode images from
-userspace, using the request_firmware() function defined in
-drivers/base/firmware_class.c]
-
-
-a. When all the CPUs are identical:
-
- This is the most common situation and it is quite straightforward: we want
- to apply the same microcode revision to each of the CPUs.
- To give an example of x86, the collect_cpu_info() function defined in
- arch/x86/kernel/microcode_core.c helps in discovering the type of the CPU
- and thereby in applying the correct microcode revision to it.
- But note that the kernel does not maintain a common microcode image for the
- all CPUs, in order to handle case 'b' described below.
-
-
-b. When some of the CPUs are different than the rest:
-
- In this case since we probably need to apply different microcode revisions
- to different CPUs, the kernel maintains a copy of the correct microcode
- image for each CPU (after appropriate CPU type/model discovery using
- functions such as collect_cpu_info()).
-
-
-c. When a CPU is physically hot-unplugged and a new (and possibly different
- type of) CPU is hot-plugged into the system:
-
- In the current design of the kernel, whenever a CPU is taken offline during
- a regular CPU hotplug operation, upon receiving the CPU_DEAD notification
- (which is sent by the CPU hotplug code), the microcode update driver's
- callback for that event reacts by freeing the kernel's copy of the
- microcode image for that CPU.
-
- Hence, when a new CPU is brought online, since the kernel finds that it
- doesn't have the microcode image, it does the CPU type/model discovery
- afresh and then requests the userspace for the appropriate microcode image
- for that CPU, which is subsequently applied.
-
- For example, in x86, the mc_cpu_callback() function (which is the microcode
- update driver's callback registered for CPU hotplug events) calls
- microcode_update_cpu() which would call microcode_init_cpu() in this case,
- instead of microcode_resume_cpu() when it finds that the kernel doesn't
- have a valid microcode image. This ensures that the CPU type/model
- discovery is performed and the right microcode is applied to the CPU after
- getting it from userspace.
-
-
-d. Handling microcode update during suspend/hibernate:
-
- Strictly speaking, during a CPU hotplug operation which does not involve
- physically removing or inserting CPUs, the CPUs are not actually powered
- off during a CPU offline. They are just put to the lowest C-states possible.
- Hence, in such a case, it is not really necessary to re-apply microcode
- when the CPUs are brought back online, since they wouldn't have lost the
- image during the CPU offline operation.
-
- This is the usual scenario encountered during a resume after a suspend.
- However, in the case of hibernation, since all the CPUs are completely
- powered off, during restore it becomes necessary to apply the microcode
- images to all the CPUs.
-
- [Note that we don't expect someone to physically pull out nodes and insert
- nodes with a different type of CPUs in-between a suspend-resume or a
- hibernate/restore cycle.]
-
- In the current design of the kernel however, during a CPU offline operation
- as part of the suspend/hibernate cycle (the CPU_DEAD_FROZEN notification),
- the existing copy of microcode image in the kernel is not freed up.
- And during the CPU online operations (during resume/restore), since the
- kernel finds that it already has copies of the microcode images for all the
- CPUs, it just applies them to the CPUs, avoiding any re-discovery of CPU
- type/model and the need for validating whether the microcode revisions are
- right for the CPUs or not (due to the above assumption that physical CPU
- hotplug will not be done in-between suspend/resume or hibernate/restore
- cycles).
-
-
-III. Are there any known problems when regular CPU hotplug and suspend race
- with each other?
-
-Yes, they are listed below:
-
-1. When invoking regular CPU hotplug, the 'tasks_frozen' argument passed to
- the _cpu_down() and _cpu_up() functions is *always* 0.
- This might not reflect the true current state of the system, since the
- tasks could have been frozen by an out-of-band event such as a suspend
- operation in progress. Hence, it will lead to wrong notifications being
- sent during the cpu online/offline events (eg, CPU_ONLINE notification
- instead of CPU_ONLINE_FROZEN) which in turn will lead to execution of
- inappropriate code by the callbacks registered for such CPU hotplug events.
-
-2. If a regular CPU hotplug stress test happens to race with the freezer due
- to a suspend operation in progress at the same time, then we could hit the
- situation described below:
-
- * A regular cpu online operation continues its journey from userspace
- into the kernel, since the freezing has not yet begun.
- * Then freezer gets to work and freezes userspace.
- * If cpu online has not yet completed the microcode update stuff by now,
- it will now start waiting on the frozen userspace in the
- TASK_UNINTERRUPTIBLE state, in order to get the microcode image.
- * Now the freezer continues and tries to freeze the remaining tasks. But
- due to this wait mentioned above, the freezer won't be able to freeze
- the cpu online hotplug task and hence freezing of tasks fails.
-
- As a result of this task freezing failure, the suspend operation gets
- aborted.
diff --git a/Documentation/power/swsusp-and-swap-files.txt b/Documentation/power/swsusp-and-swap-files.txt
deleted file mode 100644
index f281886de49..00000000000
--- a/Documentation/power/swsusp-and-swap-files.txt
+++ /dev/null
@@ -1,60 +0,0 @@
-Using swap files with software suspend (swsusp)
- (C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
-
-The Linux kernel handles swap files almost in the same way as it handles swap
-partitions and there are only two differences between these two types of swap
-areas:
-(1) swap files need not be contiguous,
-(2) the header of a swap file is not in the first block of the partition that
-holds it. From the swsusp's point of view (1) is not a problem, because it is
-already taken care of by the swap-handling code, but (2) has to be taken into
-consideration.
-
-In principle the location of a swap file's header may be determined with the
-help of appropriate filesystem driver. Unfortunately, however, it requires the
-filesystem holding the swap file to be mounted, and if this filesystem is
-journaled, it cannot be mounted during resume from disk. For this reason to
-identify a swap file swsusp uses the name of the partition that holds the file
-and the offset from the beginning of the partition at which the swap file's
-header is located. For convenience, this offset is expressed in <PAGE_SIZE>
-units.
-
-In order to use a swap file with swsusp, you need to:
-
-1) Create the swap file and make it active, eg.
-
-# dd if=/dev/zero of=<swap_file_path> bs=1024 count=<swap_file_size_in_k>
-# mkswap <swap_file_path>
-# swapon <swap_file_path>
-
-2) Use an application that will bmap the swap file with the help of the
-FIBMAP ioctl and determine the location of the file's swap header, as the
-offset, in <PAGE_SIZE> units, from the beginning of the partition which
-holds the swap file.
-
-3) Add the following parameters to the kernel command line:
-
-resume=<swap_file_partition> resume_offset=<swap_file_offset>
-
-where <swap_file_partition> is the partition on which the swap file is located
-and <swap_file_offset> is the offset of the swap header determined by the
-application in 2) (of course, this step may be carried out automatically
-by the same application that determines the swap file's header offset using the
-FIBMAP ioctl)
-
-OR
-
-Use a userland suspend application that will set the partition and offset
-with the help of the SNAPSHOT_SET_SWAP_AREA ioctl described in
-Documentation/power/userland-swsusp.txt (this is the only method to suspend
-to a swap file allowing the resume to be initiated from an initrd or initramfs
-image).
-
-Now, swsusp will use the swap file in the same way in which it would use a swap
-partition. In particular, the swap file has to be active (ie. be present in
-/proc/swaps) so that it can be used for suspending.
-
-Note that if the swap file used for suspending is deleted and recreated,
-the location of its header need not be the same as before. Thus every time
-this happens the value of the "resume_offset=" kernel command line parameter
-has to be updated.
diff --git a/Documentation/power/swsusp-dmcrypt.txt b/Documentation/power/swsusp-dmcrypt.txt
deleted file mode 100644
index 59931b46ff7..00000000000
--- a/Documentation/power/swsusp-dmcrypt.txt
+++ /dev/null
@@ -1,138 +0,0 @@
-Author: Andreas Steinmetz <ast@domdv.de>
-
-
-How to use dm-crypt and swsusp together:
-========================================
-
-Some prerequisites:
-You know how dm-crypt works. If not, visit the following web page:
-http://www.saout.de/misc/dm-crypt/
-You have read Documentation/power/swsusp.txt and understand it.
-You did read Documentation/initrd.txt and know how an initrd works.
-You know how to create or how to modify an initrd.
-
-Now your system is properly set up, your disk is encrypted except for
-the swap device(s) and the boot partition which may contain a mini
-system for crypto setup and/or rescue purposes. You may even have
-an initrd that does your current crypto setup already.
-
-At this point you want to encrypt your swap, too. Still you want to
-be able to suspend using swsusp. This, however, means that you
-have to be able to either enter a passphrase or that you read
-the key(s) from an external device like a pcmcia flash disk
-or an usb stick prior to resume. So you need an initrd, that sets
-up dm-crypt and then asks swsusp to resume from the encrypted
-swap device.
-
-The most important thing is that you set up dm-crypt in such
-a way that the swap device you suspend to/resume from has
-always the same major/minor within the initrd as well as
-within your running system. The easiest way to achieve this is
-to always set up this swap device first with dmsetup, so that
-it will always look like the following:
-
-brw------- 1 root root 254, 0 Jul 28 13:37 /dev/mapper/swap0
-
-Now set up your kernel to use /dev/mapper/swap0 as the default
-resume partition, so your kernel .config contains:
-
-CONFIG_PM_STD_PARTITION="/dev/mapper/swap0"
-
-Prepare your boot loader to use the initrd you will create or
-modify. For lilo the simplest setup looks like the following
-lines:
-
-image=/boot/vmlinuz
-initrd=/boot/initrd.gz
-label=linux
-append="root=/dev/ram0 init=/linuxrc rw"
-
-Finally you need to create or modify your initrd. Lets assume
-you create an initrd that reads the required dm-crypt setup
-from a pcmcia flash disk card. The card is formatted with an ext2
-fs which resides on /dev/hde1 when the card is inserted. The
-card contains at least the encrypted swap setup in a file
-named "swapkey". /etc/fstab of your initrd contains something
-like the following:
-
-/dev/hda1 /mnt ext3 ro 0 0
-none /proc proc defaults,noatime,nodiratime 0 0
-none /sys sysfs defaults,noatime,nodiratime 0 0
-
-/dev/hda1 contains an unencrypted mini system that sets up all
-of your crypto devices, again by reading the setup from the
-pcmcia flash disk. What follows now is a /linuxrc for your
-initrd that allows you to resume from encrypted swap and that
-continues boot with your mini system on /dev/hda1 if resume
-does not happen:
-
-#!/bin/sh
-PATH=/sbin:/bin:/usr/sbin:/usr/bin
-mount /proc
-mount /sys
-mapped=0
-noresume=`grep -c noresume /proc/cmdline`
-if [ "$*" != "" ]
-then
- noresume=1
-fi
-dmesg -n 1
-/sbin/cardmgr -q
-for i in 1 2 3 4 5 6 7 8 9 0
-do
- if [ -f /proc/ide/hde/media ]
- then
- usleep 500000
- mount -t ext2 -o ro /dev/hde1 /mnt
- if [ -f /mnt/swapkey ]
- then
- dmsetup create swap0 /mnt/swapkey > /dev/null 2>&1 && mapped=1
- fi
- umount /mnt
- break
- fi
- usleep 500000
-done
-killproc /sbin/cardmgr
-dmesg -n 6
-if [ $mapped = 1 ]
-then
- if [ $noresume != 0 ]
- then
- mkswap /dev/mapper/swap0 > /dev/null 2>&1
- fi
- echo 254:0 > /sys/power/resume
- dmsetup remove swap0
-fi
-umount /sys
-mount /mnt
-umount /proc
-cd /mnt
-pivot_root . mnt
-mount /proc
-umount -l /mnt
-umount /proc
-exec chroot . /sbin/init $* < dev/console > dev/console 2>&1
-
-Please don't mind the weird loop above, busybox's msh doesn't know
-the let statement. Now, what is happening in the script?
-First we have to decide if we want to try to resume, or not.
-We will not resume if booting with "noresume" or any parameters
-for init like "single" or "emergency" as boot parameters.
-
-Then we need to set up dmcrypt with the setup data from the
-pcmcia flash disk. If this succeeds we need to reset the swap
-device if we don't want to resume. The line "echo 254:0 > /sys/power/resume"
-then attempts to resume from the first device mapper device.
-Note that it is important to set the device in /sys/power/resume,
-regardless if resuming or not, otherwise later suspend will fail.
-If resume starts, script execution terminates here.
-
-Otherwise we just remove the encrypted swap device and leave it to the
-mini system on /dev/hda1 to set the whole crypto up (it is up to
-you to modify this to your taste).
-
-What then follows is the well known process to change the root
-file system and continue booting from there. I prefer to unmount
-the initrd prior to continue booting but it is up to you to modify
-this.
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
deleted file mode 100644
index ac190cf1963..00000000000
--- a/Documentation/power/swsusp.txt
+++ /dev/null
@@ -1,408 +0,0 @@
-Some warnings, first.
-
- * BIG FAT WARNING *********************************************************
- *
- * If you touch anything on disk between suspend and resume...
- * ...kiss your data goodbye.
- *
- * If you do resume from initrd after your filesystems are mounted...
- * ...bye bye root partition.
- * [this is actually same case as above]
- *
- * If you have unsupported (*) devices using DMA, you may have some
- * problems. If your disk driver does not support suspend... (IDE does),
- * it may cause some problems, too. If you change kernel command line
- * between suspend and resume, it may do something wrong. If you change
- * your hardware while system is suspended... well, it was not good idea;
- * but it will probably only crash.
- *
- * (*) suspend/resume support is needed to make it safe.
- *
- * If you have any filesystems on USB devices mounted before software suspend,
- * they won't be accessible after resume and you may lose data, as though
- * you have unplugged the USB devices with mounted filesystems on them;
- * see the FAQ below for details. (This is not true for more traditional
- * power states like "standby", which normally don't turn USB off.)
-
-You need to append resume=/dev/your_swap_partition to kernel command
-line. Then you suspend by
-
-echo shutdown > /sys/power/disk; echo disk > /sys/power/state
-
-. If you feel ACPI works pretty well on your system, you might try
-
-echo platform > /sys/power/disk; echo disk > /sys/power/state
-
-. If you have SATA disks, you'll need recent kernels with SATA suspend
-support. For suspend and resume to work, make sure your disk drivers
-are built into kernel -- not modules. [There's way to make
-suspend/resume with modular disk drivers, see FAQ, but you probably
-should not do that.]
-
-If you want to limit the suspend image size to N bytes, do
-
-echo N > /sys/power/image_size
-
-before suspend (it is limited to 500 MB by default).
-
-
-Article about goals and implementation of Software Suspend for Linux
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Author: G‚ábor Kuti
-Last revised: 2003-10-20 by Pavel Machek
-
-Idea and goals to achieve
-
-Nowadays it is common in several laptops that they have a suspend button. It
-saves the state of the machine to a filesystem or to a partition and switches
-to standby mode. Later resuming the machine the saved state is loaded back to
-ram and the machine can continue its work. It has two real benefits. First we
-save ourselves the time machine goes down and later boots up, energy costs
-are real high when running from batteries. The other gain is that we don't have to
-interrupt our programs so processes that are calculating something for a long
-time shouldn't need to be written interruptible.
-
-swsusp saves the state of the machine into active swaps and then reboots or
-powerdowns. You must explicitly specify the swap partition to resume from with
-``resume='' kernel option. If signature is found it loads and restores saved
-state. If the option ``noresume'' is specified as a boot parameter, it skips
-the resuming. If the option ``hibernate=nocompress'' is specified as a boot
-parameter, it saves hibernation image without compression.
-
-In the meantime while the system is suspended you should not add/remove any
-of the hardware, write to the filesystems, etc.
-
-Sleep states summary
-====================
-
-There are three different interfaces you can use, /proc/acpi should
-work like this:
-
-In a really perfect world:
-echo 1 > /proc/acpi/sleep # for standby
-echo 2 > /proc/acpi/sleep # for suspend to ram
-echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative
-echo 4 > /proc/acpi/sleep # for suspend to disk
-echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system
-
-and perhaps
-echo 4b > /proc/acpi/sleep # for suspend to disk via s4bios
-
-Frequently Asked Questions
-==========================
-
-Q: well, suspending a server is IMHO a really stupid thing,
-but... (Diego Zuccato):
-
-A: You bought new UPS for your server. How do you install it without
-bringing machine down? Suspend to disk, rearrange power cables,
-resume.
-
-You have your server on UPS. Power died, and UPS is indicating 30
-seconds to failure. What do you do? Suspend to disk.
-
-
-Q: Maybe I'm missing something, but why don't the regular I/O paths work?
-
-A: We do use the regular I/O paths. However we cannot restore the data
-to its original location as we load it. That would create an
-inconsistent kernel state which would certainly result in an oops.
-Instead, we load the image into unused memory and then atomically copy
-it back to it original location. This implies, of course, a maximum
-image size of half the amount of memory.
-
-There are two solutions to this:
-
-* require half of memory to be free during suspend. That way you can
-read "new" data onto free spots, then cli and copy
-
-* assume we had special "polling" ide driver that only uses memory
-between 0-640KB. That way, I'd have to make sure that 0-640KB is free
-during suspending, but otherwise it would work...
-
-suspend2 shares this fundamental limitation, but does not include user
-data and disk caches into "used memory" by saving them in
-advance. That means that the limitation goes away in practice.
-
-Q: Does linux support ACPI S4?
-
-A: Yes. That's what echo platform > /sys/power/disk does.
-
-Q: What is 'suspend2'?
-
-A: suspend2 is 'Software Suspend 2', a forked implementation of
-suspend-to-disk which is available as separate patches for 2.4 and 2.6
-kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB
-highmem and preemption. It also has a extensible architecture that
-allows for arbitrary transformations on the image (compression,
-encryption) and arbitrary backends for writing the image (eg to swap
-or an NFS share[Work In Progress]). Questions regarding suspend2
-should be sent to the mailing list available through the suspend2
-website, and not to the Linux Kernel Mailing List. We are working
-toward merging suspend2 into the mainline kernel.
-
-Q: What is the freezing of tasks and why are we using it?
-
-A: The freezing of tasks is a mechanism by which user space processes and some
-kernel threads are controlled during hibernation or system-wide suspend (on some
-architectures). See freezing-of-tasks.txt for details.
-
-Q: What is the difference between "platform" and "shutdown"?
-
-A:
-
-shutdown: save state in linux, then tell bios to powerdown
-
-platform: save state in linux, then tell bios to powerdown and blink
- "suspended led"
-
-"platform" is actually right thing to do where supported, but
-"shutdown" is most reliable (except on ACPI systems).
-
-Q: I do not understand why you have such strong objections to idea of
-selective suspend.
-
-A: Do selective suspend during runtime power management, that's okay. But
-it's useless for suspend-to-disk. (And I do not see how you could use
-it for suspend-to-ram, I hope you do not want that).
-
-Lets see, so you suggest to
-
-* SUSPEND all but swap device and parents
-* Snapshot
-* Write image to disk
-* SUSPEND swap device and parents
-* Powerdown
-
-Oh no, that does not work, if swap device or its parents uses DMA,
-you've corrupted data. You'd have to do
-
-* SUSPEND all but swap device and parents
-* FREEZE swap device and parents
-* Snapshot
-* UNFREEZE swap device and parents
-* Write
-* SUSPEND swap device and parents
-
-Which means that you still need that FREEZE state, and you get more
-complicated code. (And I have not yet introduce details like system
-devices).
-
-Q: There don't seem to be any generally useful behavioral
-distinctions between SUSPEND and FREEZE.
-
-A: Doing SUSPEND when you are asked to do FREEZE is always correct,
-but it may be unnecessarily slow. If you want your driver to stay simple,
-slowness may not matter to you. It can always be fixed later.
-
-For devices like disk it does matter, you do not want to spindown for
-FREEZE.
-
-Q: After resuming, system is paging heavily, leading to very bad interactivity.
-
-A: Try running
-
-cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null
-
-after resume. swapoff -a; swapon -a may also be useful.
-
-Q: What happens to devices during swsusp? They seem to be resumed
-during system suspend?
-
-A: That's correct. We need to resume them if we want to write image to
-disk. Whole sequence goes like
-
- Suspend part
- ~~~~~~~~~~~~
- running system, user asks for suspend-to-disk
-
- user processes are stopped
-
- suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
- with state snapshot
-
- state snapshot: copy of whole used memory is taken with interrupts disabled
-
- resume(): devices are woken up so that we can write image to swap
-
- write image to swap
-
- suspend(PMSG_SUSPEND): suspend devices so that we can power off
-
- turn the power off
-
- Resume part
- ~~~~~~~~~~~
- (is actually pretty similar)
-
- running system, user asks for suspend-to-disk
-
- user processes are stopped (in common case there are none, but with resume-from-initrd, no one knows)
-
- read image from disk
-
- suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
- with image restoration
-
- image restoration: rewrite memory with image
-
- resume(): devices are woken up so that system can continue
-
- thaw all user processes
-
-Q: What is this 'Encrypt suspend image' for?
-
-A: First of all: it is not a replacement for dm-crypt encrypted swap.
-It cannot protect your computer while it is suspended. Instead it does
-protect from leaking sensitive data after resume from suspend.
-
-Think of the following: you suspend while an application is running
-that keeps sensitive data in memory. The application itself prevents
-the data from being swapped out. Suspend, however, must write these
-data to swap to be able to resume later on. Without suspend encryption
-your sensitive data are then stored in plaintext on disk. This means
-that after resume your sensitive data are accessible to all
-applications having direct access to the swap device which was used
-for suspend. If you don't need swap after resume these data can remain
-on disk virtually forever. Thus it can happen that your system gets
-broken in weeks later and sensitive data which you thought were
-encrypted and protected are retrieved and stolen from the swap device.
-To prevent this situation you should use 'Encrypt suspend image'.
-
-During suspend a temporary key is created and this key is used to
-encrypt the data written to disk. When, during resume, the data was
-read back into memory the temporary key is destroyed which simply
-means that all data written to disk during suspend are then
-inaccessible so they can't be stolen later on. The only thing that
-you must then take care of is that you call 'mkswap' for the swap
-partition used for suspend as early as possible during regular
-boot. This asserts that any temporary key from an oopsed suspend or
-from a failed or aborted resume is erased from the swap device.
-
-As a rule of thumb use encrypted swap to protect your data while your
-system is shut down or suspended. Additionally use the encrypted
-suspend image to prevent sensitive data from being stolen after
-resume.
-
-Q: Can I suspend to a swap file?
-
-A: Generally, yes, you can. However, it requires you to use the "resume=" and
-"resume_offset=" kernel command line parameters, so the resume from a swap file
-cannot be initiated from an initrd or initramfs image. See
-swsusp-and-swap-files.txt for details.
-
-Q: Is there a maximum system RAM size that is supported by swsusp?
-
-A: It should work okay with highmem.
-
-Q: Does swsusp (to disk) use only one swap partition or can it use
-multiple swap partitions (aggregate them into one logical space)?
-
-A: Only one swap partition, sorry.
-
-Q: If my application(s) causes lots of memory & swap space to be used
-(over half of the total system RAM), is it correct that it is likely
-to be useless to try to suspend to disk while that app is running?
-
-A: No, it should work okay, as long as your app does not mlock()
-it. Just prepare big enough swap partition.
-
-Q: What information is useful for debugging suspend-to-disk problems?
-
-A: Well, last messages on the screen are always useful. If something
-is broken, it is usually some kernel driver, therefore trying with as
-little as possible modules loaded helps a lot. I also prefer people to
-suspend from console, preferably without X running. Booting with
-init=/bin/bash, then swapon and starting suspend sequence manually
-usually does the trick. Then it is good idea to try with latest
-vanilla kernel.
-
-Q: How can distributions ship a swsusp-supporting kernel with modular
-disk drivers (especially SATA)?
-
-A: Well, it can be done, load the drivers, then do echo into
-/sys/power/disk/resume file from initrd. Be sure not to mount
-anything, not even read-only mount, or you are going to lose your
-data.
-
-Q: How do I make suspend more verbose?
-
-A: If you want to see any non-error kernel messages on the virtual
-terminal the kernel switches to during suspend, you have to set the
-kernel console loglevel to at least 4 (KERN_WARNING), for example by
-doing
-
- # save the old loglevel
- read LOGLEVEL DUMMY < /proc/sys/kernel/printk
- # set the loglevel so we see the progress bar.
- # if the level is higher than needed, we leave it alone.
- if [ $LOGLEVEL -lt 5 ]; then
- echo 5 > /proc/sys/kernel/printk
- fi
-
- IMG_SZ=0
- read IMG_SZ < /sys/power/image_size
- echo -n disk > /sys/power/state
- RET=$?
- #
- # the logic here is:
- # if image_size > 0 (without kernel support, IMG_SZ will be zero),
- # then try again with image_size set to zero.
- if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size
- echo 0 > /sys/power/image_size
- echo -n disk > /sys/power/state
- RET=$?
- fi
-
- # restore previous loglevel
- echo $LOGLEVEL > /proc/sys/kernel/printk
- exit $RET
-
-Q: Is this true that if I have a mounted filesystem on a USB device and
-I suspend to disk, I can lose data unless the filesystem has been mounted
-with "sync"?
-
-A: That's right ... if you disconnect that device, you may lose data.
-In fact, even with "-o sync" you can lose data if your programs have
-information in buffers they haven't written out to a disk you disconnect,
-or if you disconnect before the device finished saving data you wrote.
-
-Software suspend normally powers down USB controllers, which is equivalent
-to disconnecting all USB devices attached to your system.
-
-Your system might well support low-power modes for its USB controllers
-while the system is asleep, maintaining the connection, using true sleep
-modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the
-/sys/power/state file; write "standby" or "mem".) We've not seen any
-hardware that can use these modes through software suspend, although in
-theory some systems might support "platform" modes that won't break the
-USB connections.
-
-Remember that it's always a bad idea to unplug a disk drive containing a
-mounted filesystem. That's true even when your system is asleep! The
-safest thing is to unmount all filesystems on removable media (such USB,
-Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
-before suspending; then remount them after resuming.
-
-There is a work-around for this problem. For more information, see
-Documentation/usb/persist.txt.
-
-Q: Can I suspend-to-disk using a swap partition under LVM?
-
-A: No. You can suspend successfully, but you'll not be able to
-resume. uswsusp should be able to work with LVM. See suspend.sf.net.
-
-Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
-compiled with the similar configuration files. Anyway I found that
-suspend to disk (and resume) is much slower on 2.6.16 compared to
-2.6.15. Any idea for why that might happen or how can I speed it up?
-
-A: This is because the size of the suspend image is now greater than
-for 2.6.15 (by saving more data we can get more responsive system
-after resume).
-
-There's the /sys/power/image_size knob that controls the size of the
-image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as
-root), the 2.6.15 behavior should be restored. If it is still too
-slow, take a look at suspend.sf.net -- userland suspend is faster and
-supports LZF compression to speed it up further.
diff --git a/Documentation/power/tricks.txt b/Documentation/power/tricks.txt
deleted file mode 100644
index a1b8f7249f4..00000000000
--- a/Documentation/power/tricks.txt
+++ /dev/null
@@ -1,27 +0,0 @@
- swsusp/S3 tricks
- ~~~~~~~~~~~~~~~~
-Pavel Machek <pavel@ucw.cz>
-
-If you want to trick swsusp/S3 into working, you might want to try:
-
-* go with minimal config, turn off drivers like USB, AGP you don't
- really need
-
-* turn off APIC and preempt
-
-* use ext2. At least it has working fsck. [If something seems to go
- wrong, force fsck when you have a chance]
-
-* turn off modules
-
-* use vga text console, shut down X. [If you really want X, you might
- want to try vesafb later]
-
-* try running as few processes as possible, preferably go to single
- user mode.
-
-* due to video issues, swsusp should be easier to get working than
- S3. Try that first.
-
-When you make it work, try to find out what exactly was it that broke
-suspend, and preferably fix that.
diff --git a/Documentation/power/userland-swsusp.txt b/Documentation/power/userland-swsusp.txt
deleted file mode 100644
index 0e870825c1b..00000000000
--- a/Documentation/power/userland-swsusp.txt
+++ /dev/null
@@ -1,170 +0,0 @@
-Documentation for userland software suspend interface
- (C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
-
-First, the warnings at the beginning of swsusp.txt still apply.
-
-Second, you should read the FAQ in swsusp.txt _now_ if you have not
-done it already.
-
-Now, to use the userland interface for software suspend you need special
-utilities that will read/write the system memory snapshot from/to the
-kernel. Such utilities are available, for example, from
-<http://suspend.sourceforge.net>. You may want to have a look at them if you
-are going to develop your own suspend/resume utilities.
-
-The interface consists of a character device providing the open(),
-release(), read(), and write() operations as well as several ioctl()
-commands defined in include/linux/suspend_ioctls.h . The major and minor
-numbers of the device are, respectively, 10 and 231, and they can
-be read from /sys/class/misc/snapshot/dev.
-
-The device can be open either for reading or for writing. If open for
-reading, it is considered to be in the suspend mode. Otherwise it is
-assumed to be in the resume mode. The device cannot be open for simultaneous
-reading and writing. It is also impossible to have the device open more than
-once at a time.
-
-Even opening the device has side effects. Data structures are
-allocated, and PM_HIBERNATION_PREPARE / PM_RESTORE_PREPARE chains are
-called.
-
-The ioctl() commands recognized by the device are:
-
-SNAPSHOT_FREEZE - freeze user space processes (the current process is
- not frozen); this is required for SNAPSHOT_CREATE_IMAGE
- and SNAPSHOT_ATOMIC_RESTORE to succeed
-
-SNAPSHOT_UNFREEZE - thaw user space processes frozen by SNAPSHOT_FREEZE
-
-SNAPSHOT_CREATE_IMAGE - create a snapshot of the system memory; the
- last argument of ioctl() should be a pointer to an int variable,
- the value of which will indicate whether the call returned after
- creating the snapshot (1) or after restoring the system memory state
- from it (0) (after resume the system finds itself finishing the
- SNAPSHOT_CREATE_IMAGE ioctl() again); after the snapshot
- has been created the read() operation can be used to transfer
- it out of the kernel
-
-SNAPSHOT_ATOMIC_RESTORE - restore the system memory state from the
- uploaded snapshot image; before calling it you should transfer
- the system memory snapshot back to the kernel using the write()
- operation; this call will not succeed if the snapshot
- image is not available to the kernel
-
-SNAPSHOT_FREE - free memory allocated for the snapshot image
-
-SNAPSHOT_PREF_IMAGE_SIZE - set the preferred maximum size of the image
- (the kernel will do its best to ensure the image size will not exceed
- this number, but if it turns out to be impossible, the kernel will
- create the smallest image possible)
-
-SNAPSHOT_GET_IMAGE_SIZE - return the actual size of the hibernation image
-
-SNAPSHOT_AVAIL_SWAP_SIZE - return the amount of available swap in bytes (the
- last argument should be a pointer to an unsigned int variable that will
- contain the result if the call is successful).
-
-SNAPSHOT_ALLOC_SWAP_PAGE - allocate a swap page from the resume partition
- (the last argument should be a pointer to a loff_t variable that
- will contain the swap page offset if the call is successful)
-
-SNAPSHOT_FREE_SWAP_PAGES - free all swap pages allocated by
- SNAPSHOT_ALLOC_SWAP_PAGE
-
-SNAPSHOT_SET_SWAP_AREA - set the resume partition and the offset (in <PAGE_SIZE>
- units) from the beginning of the partition at which the swap header is
- located (the last ioctl() argument should point to a struct
- resume_swap_area, as defined in kernel/power/suspend_ioctls.h,
- containing the resume device specification and the offset); for swap
- partitions the offset is always 0, but it is different from zero for
- swap files (see Documentation/power/swsusp-and-swap-files.txt for
- details).
-
-SNAPSHOT_PLATFORM_SUPPORT - enable/disable the hibernation platform support,
- depending on the argument value (enable, if the argument is nonzero)
-
-SNAPSHOT_POWER_OFF - make the kernel transition the system to the hibernation
- state (eg. ACPI S4) using the platform (eg. ACPI) driver
-
-SNAPSHOT_S2RAM - suspend to RAM; using this call causes the kernel to
- immediately enter the suspend-to-RAM state, so this call must always
- be preceded by the SNAPSHOT_FREEZE call and it is also necessary
- to use the SNAPSHOT_UNFREEZE call after the system wakes up. This call
- is needed to implement the suspend-to-both mechanism in which the
- suspend image is first created, as though the system had been suspended
- to disk, and then the system is suspended to RAM (this makes it possible
- to resume the system from RAM if there's enough battery power or restore
- its state on the basis of the saved suspend image otherwise)
-
-The device's read() operation can be used to transfer the snapshot image from
-the kernel. It has the following limitations:
-- you cannot read() more than one virtual memory page at a time
-- read()s across page boundaries are impossible (ie. if ypu read() 1/2 of
- a page in the previous call, you will only be able to read()
- _at_ _most_ 1/2 of the page in the next call)
-
-The device's write() operation is used for uploading the system memory snapshot
-into the kernel. It has the same limitations as the read() operation.
-
-The release() operation frees all memory allocated for the snapshot image
-and all swap pages allocated with SNAPSHOT_ALLOC_SWAP_PAGE (if any).
-Thus it is not necessary to use either SNAPSHOT_FREE or
-SNAPSHOT_FREE_SWAP_PAGES before closing the device (in fact it will also
-unfreeze user space processes frozen by SNAPSHOT_UNFREEZE if they are
-still frozen when the device is being closed).
-
-Currently it is assumed that the userland utilities reading/writing the
-snapshot image from/to the kernel will use a swap partition, called the resume
-partition, or a swap file as storage space (if a swap file is used, the resume
-partition is the partition that holds this file). However, this is not really
-required, as they can use, for example, a special (blank) suspend partition or
-a file on a partition that is unmounted before SNAPSHOT_CREATE_IMAGE and
-mounted afterwards.
-
-These utilities MUST NOT make any assumptions regarding the ordering of
-data within the snapshot image. The contents of the image are entirely owned
-by the kernel and its structure may be changed in future kernel releases.
-
-The snapshot image MUST be written to the kernel unaltered (ie. all of the image
-data, metadata and header MUST be written in _exactly_ the same amount, form
-and order in which they have been read). Otherwise, the behavior of the
-resumed system may be totally unpredictable.
-
-While executing SNAPSHOT_ATOMIC_RESTORE the kernel checks if the
-structure of the snapshot image is consistent with the information stored
-in the image header. If any inconsistencies are detected,
-SNAPSHOT_ATOMIC_RESTORE will not succeed. Still, this is not a fool-proof
-mechanism and the userland utilities using the interface SHOULD use additional
-means, such as checksums, to ensure the integrity of the snapshot image.
-
-The suspending and resuming utilities MUST lock themselves in memory,
-preferably using mlockall(), before calling SNAPSHOT_FREEZE.
-
-The suspending utility MUST check the value stored by SNAPSHOT_CREATE_IMAGE
-in the memory location pointed to by the last argument of ioctl() and proceed
-in accordance with it:
-1. If the value is 1 (ie. the system memory snapshot has just been
- created and the system is ready for saving it):
- (a) The suspending utility MUST NOT close the snapshot device
- _unless_ the whole suspend procedure is to be cancelled, in
- which case, if the snapshot image has already been saved, the
- suspending utility SHOULD destroy it, preferably by zapping
- its header. If the suspend is not to be cancelled, the
- system MUST be powered off or rebooted after the snapshot
- image has been saved.
- (b) The suspending utility SHOULD NOT attempt to perform any
- file system operations (including reads) on the file systems
- that were mounted before SNAPSHOT_CREATE_IMAGE has been
- called. However, it MAY mount a file system that was not
- mounted at that time and perform some operations on it (eg.
- use it for saving the image).
-2. If the value is 0 (ie. the system state has just been restored from
- the snapshot image), the suspending utility MUST close the snapshot
- device. Afterwards it will be treated as a regular userland process,
- so it need not exit.
-
-The resuming utility SHOULD NOT attempt to mount any file systems that could
-be mounted before suspend and SHOULD NOT attempt to perform any operations
-involving such file systems.
-
-For details, please refer to the source code.
diff --git a/Documentation/power/video.txt b/Documentation/power/video.txt
deleted file mode 100644
index 3e6272bc447..00000000000
--- a/Documentation/power/video.txt
+++ /dev/null
@@ -1,185 +0,0 @@
-
- Video issues with S3 resume
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
- 2003-2006, Pavel Machek
-
-During S3 resume, hardware needs to be reinitialized. For most
-devices, this is easy, and kernel driver knows how to do
-it. Unfortunately there's one exception: video card. Those are usually
-initialized by BIOS, and kernel does not have enough information to
-boot video card. (Kernel usually does not even contain video card
-driver -- vesafb and vgacon are widely used).
-
-This is not problem for swsusp, because during swsusp resume, BIOS is
-run normally so video card is normally initialized. It should not be
-problem for S1 standby, because hardware should retain its state over
-that.
-
-We either have to run video BIOS during early resume, or interpret it
-using vbetool later, or maybe nothing is necessary on particular
-system because video state is preserved. Unfortunately different
-methods work on different systems, and no known method suits all of
-them.
-
-Userland application called s2ram has been developed; it contains long
-whitelist of systems, and automatically selects working method for a
-given system. It can be downloaded from CVS at
-www.sf.net/projects/suspend . If you get a system that is not in the
-whitelist, please try to find a working solution, and submit whitelist
-entry so that work does not need to be repeated.
-
-Currently, VBE_SAVE method (6 below) works on most
-systems. Unfortunately, vbetool only runs after userland is resumed,
-so it makes debugging of early resume problems
-hard/impossible. Methods that do not rely on userland are preferable.
-
-Details
-~~~~~~~
-
-There are a few types of systems where video works after S3 resume:
-
-(1) systems where video state is preserved over S3.
-
-(2) systems where it is possible to call the video BIOS during S3
- resume. Unfortunately, it is not correct to call the video BIOS at
- that point, but it happens to work on some machines. Use
- acpi_sleep=s3_bios.
-
-(3) systems that initialize video card into vga text mode and where
- the BIOS works well enough to be able to set video mode. Use
- acpi_sleep=s3_mode on these.
-
-(4) on some systems s3_bios kicks video into text mode, and
- acpi_sleep=s3_bios,s3_mode is needed.
-
-(5) radeon systems, where X can soft-boot your video card. You'll need
- a new enough X, and a plain text console (no vesafb or radeonfb). See
- http://www.doesi.gmxhome.de/linux/tm800s3/s3.html for more information.
- Alternatively, you should use vbetool (6) instead.
-
-(6) other radeon systems, where vbetool is enough to bring system back
- to life. It needs text console to be working. Do vbetool vbestate
- save > /tmp/delme; echo 3 > /proc/acpi/sleep; vbetool post; vbetool
- vbestate restore < /tmp/delme; setfont <whatever>, and your video
- should work.
-
-(7) on some systems, it is possible to boot most of kernel, and then
- POSTing bios works. Ole Rohne has patch to do just that at
- http://dev.gentoo.org/~marineam/patch-radeonfb-2.6.11-rc2-mm2.
-
-(8) on some systems, you can use the video_post utility and or
- do echo 3 > /sys/power/state && /usr/sbin/video_post - which will
- initialize the display in console mode. If you are in X, you can switch
- to a virtual terminal and back to X using CTRL+ALT+F1 - CTRL+ALT+F7 to get
- the display working in graphical mode again.
-
-Now, if you pass acpi_sleep=something, and it does not work with your
-bios, you'll get a hard crash during resume. Be careful. Also it is
-safest to do your experiments with plain old VGA console. The vesafb
-and radeonfb (etc) drivers have a tendency to crash the machine during
-resume.
-
-You may have a system where none of above works. At that point you
-either invent another ugly hack that works, or write proper driver for
-your video card (good luck getting docs :-(). Maybe suspending from X
-(proper X, knowing your hardware, not XF68_FBcon) might have better
-chance of working.
-
-Table of known working notebooks:
-
-Model hack (or "how to do it")
-------------------------------------------------------------------------------
-Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI
-Acer TM 230 s3_bios (2)
-Acer TM 242FX vbetool (6)
-Acer TM C110 video_post (8)
-Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8)
-Acer TM 4052LCi s3_bios (2)
-Acer TM 636Lci s3_bios,s3_mode (4)
-Acer TM 650 (Radeon M7) vga=normal plus boot-radeon (5) gets text console back
-Acer TM 660 ??? (*)
-Acer TM 800 vga=normal, X patches, see webpage (5) or vbetool (6)
-Acer TM 803 vga=normal, X patches, see webpage (5) or vbetool (6)
-Acer TM 803LCi vga=normal, vbetool (6)
-Arima W730a vbetool needed (6)
-Asus L2400D s3_mode (3)(***) (S1 also works OK)
-Asus L3350M (SiS 740) (6)
-Asus L3800C (Radeon M7) s3_bios (2) (S1 also works OK)
-Asus M6887Ne vga=normal, s3_bios (2), use radeon driver instead of fglrx in x.org
-Athlon64 desktop prototype s3_bios (2)
-Compal CL-50 ??? (*)
-Compaq Armada E500 - P3-700 none (1) (S1 also works OK)
-Compaq Evo N620c vga=normal, s3_bios (2)
-Dell 600m, ATI R250 Lf none (1), but needs xorg-x11-6.8.1.902-1
-Dell D600, ATI RV250 vga=normal and X, or try vbestate (6)
-Dell D610 vga=normal and X (possibly vbestate (6) too, but not tested)
-Dell Inspiron 4000 ??? (*)
-Dell Inspiron 500m ??? (*)
-Dell Inspiron 510m ???
-Dell Inspiron 5150 vbetool needed (6)
-Dell Inspiron 600m ??? (*)
-Dell Inspiron 8200 ??? (*)
-Dell Inspiron 8500 ??? (*)
-Dell Inspiron 8600 ??? (*)
-eMachines athlon64 machines vbetool needed (6) (someone please get me model #s)
-HP NC6000 s3_bios, may not use radeonfb (2); or vbetool (6)
-HP NX7000 ??? (*)
-HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X
-HP Omnibook XE3 athlon version none (1)
-HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV
-HP Omnibook XE3L-GF vbetool (6)
-HP Omnibook 5150 none (1), (S1 also works OK)
-IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work.
-IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(]
-IBM TP R32 / Type 2658-MMG none (1)
-IBM TP R40 2722B3G ??? (*)
-IBM TP R50p / Type 1832-22U s3_bios (2)
-IBM TP R51 none (1)
-IBM TP T30 236681A ??? (*)
-IBM TP T40 / Type 2373-MU4 none (1)
-IBM TP T40p none (1)
-IBM TP R40p s3_bios (2)
-IBM TP T41p s3_bios (2), switch to X after resume
-IBM TP T42 s3_bios (2)
-IBM ThinkPad T42p (2373-GTG) s3_bios (2)
-IBM TP X20 ??? (*)
-IBM TP X30 s3_bios, s3_mode (4)
-IBM TP X31 / Type 2672-XXH none (1), use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
-IBM TP X32 none (1), but backlight is on and video is trashed after long suspend. s3_bios,s3_mode (4) works too. Perhaps that gets better results?
-IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4)
-IBM TP 600e none(1), but a switch to console and back to X is needed
-Medion MD4220 ??? (*)
-Samsung P35 vbetool needed (6)
-Sharp PC-AR10 (ATI rage) none (1), backlight does not switch off
-Sony Vaio PCG-C1VRX/K s3_bios (2)
-Sony Vaio PCG-F403 ??? (*)
-Sony Vaio PCG-GRT995MP none (1), works with 'nv' X driver
-Sony Vaio PCG-GR7/K none (1), but needs radeonfb, use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
-Sony Vaio PCG-N505SN ??? (*)
-Sony Vaio vgn-s260 X or boot-radeon can init it (5)
-Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X.
-Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4)
-Toshiba Libretto L5 none (1)
-Toshiba Libretto 100CT/110CT vbetool (6)
-Toshiba Portege 3020CT s3_mode (3)
-Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK)
-Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK)
-Toshiba Satellite 4090XCDT ??? (*)
-Toshiba Satellite P10-554 s3_bios,s3_mode (4)(****)
-Toshiba M30 (2) xor X with nvidia driver using internal AGP
-Uniwill 244IIO ??? (*)
-
-Known working desktop systems
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Mainboard Graphics card hack (or "how to do it")
-------------------------------------------------------------------------------
-Asus A7V8X nVidia RIVA TNT2 model 64 s3_bios,s3_mode (4)
-
-
-(*) from https://wiki.ubuntu.com/HoaryPMResults, not sure
- which options to use. If you know, please tell me.
-
-(***) To be tested with a newer kernel.
-
-(****) Not with SMP kernel, UP only.
diff --git a/Documentation/power/video_extension.txt b/Documentation/power/video_extension.txt
deleted file mode 100644
index b2f9b1598ac..00000000000
--- a/Documentation/power/video_extension.txt
+++ /dev/null
@@ -1,37 +0,0 @@
-ACPI video extensions
-~~~~~~~~~~~~~~~~~~~~~
-
-This driver implement the ACPI Extensions For Display Adapters for
-integrated graphics devices on motherboard, as specified in ACPI 2.0
-Specification, Appendix B, allowing to perform some basic control like
-defining the video POST device, retrieving EDID information or to
-setup a video output, etc. Note that this is an ref. implementation
-only. It may or may not work for your integrated video device.
-
-Interfaces exposed to userland through /proc/acpi/video:
-
-VGA/info : display the supported video bus device capability like Video ROM, CRT/LCD/TV.
-VGA/ROM : Used to get a copy of the display devices' ROM data (up to 4k).
-VGA/POST_info : Used to determine what options are implemented.
-VGA/POST : Used to get/set POST device.
-VGA/DOS : Used to get/set ownership of output switching:
- Please refer ACPI spec B.4.1 _DOS
-VGA/CRT : CRT output
-VGA/LCD : LCD output
-VGA/TVO : TV output
-VGA/*/brightness : Used to get/set brightness of output device
-
-Notify event through /proc/acpi/event:
-
-#define ACPI_VIDEO_NOTIFY_SWITCH 0x80
-#define ACPI_VIDEO_NOTIFY_PROBE 0x81
-#define ACPI_VIDEO_NOTIFY_CYCLE 0x82
-#define ACPI_VIDEO_NOTIFY_NEXT_OUTPUT 0x83
-#define ACPI_VIDEO_NOTIFY_PREV_OUTPUT 0x84
-
-#define ACPI_VIDEO_NOTIFY_CYCLE_BRIGHTNESS 0x82
-#define ACPI_VIDEO_NOTIFY_INC_BRIGHTNESS 0x83
-#define ACPI_VIDEO_NOTIFY_DEC_BRIGHTNESS 0x84
-#define ACPI_VIDEO_NOTIFY_ZERO_BRIGHTNESS 0x85
-#define ACPI_VIDEO_NOTIFY_DISPLAY_OFF 0x86
-