diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/ext4.txt | 24 | ||||
-rw-r--r-- | Documentation/filesystems/gfs2-uevents.txt | 100 | ||||
-rw-r--r-- | Documentation/filesystems/ncpfs.txt | 6 | ||||
-rw-r--r-- | Documentation/filesystems/nfs.txt | 98 | ||||
-rw-r--r-- | Documentation/filesystems/nfs41-server.txt | 54 | ||||
-rw-r--r-- | Documentation/filesystems/nfsroot.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 27 | ||||
-rw-r--r-- | Documentation/filesystems/seq_file.txt | 2 |
8 files changed, 292 insertions, 21 deletions
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 7be02ac5fa3..18b5ec8cea4 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -134,15 +134,9 @@ ro Mount filesystem read only. Note that ext4 will mount options "ro,noload" can be used to prevent writes to the filesystem. -journal_checksum Enable checksumming of the journal transactions. - This will allow the recovery code in e2fsck and the - kernel to detect corruption in the kernel. It is a - compatible change and will be ignored by older kernels. - journal_async_commit Commit block can be written to disk without waiting for descriptor blocks. If enabled older kernels cannot - mount the device. This will enable 'journal_checksum' - internally. + mount the device. journal=update Update the ext4 file system's journal to the current format. @@ -263,10 +257,18 @@ resuid=n The user ID which may use the reserved blocks. sb=n Use alternate superblock at this location. -quota -noquota -grpquota -usrquota +quota These options are ignored by the filesystem. They +noquota are used only by quota tools to recognize volumes +grpquota where quota should be turned on. See documentation +usrquota in the quota-tools package for more details + (http://sourceforge.net/projects/linuxquota). + +jqfmt=<quota type> These options tell filesystem details about quota +usrjquota=<file> so that quota information can be properly updated +grpjquota=<file> during journal replay. They replace the above + quota options. See documentation in the quota-tools + package for more details + (http://sourceforge.net/projects/linuxquota). bh (*) ext4 associates buffer heads to data pages to nobh (a) cache disk block mapping information diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt new file mode 100644 index 00000000000..fd966dc9979 --- /dev/null +++ b/Documentation/filesystems/gfs2-uevents.txt @@ -0,0 +1,100 @@ + uevents and GFS2 + ================== + +During the lifetime of a GFS2 mount, a number of uevents are generated. +This document explains what the events are and what they are used +for (by gfs_controld in gfs2-utils). + +A list of GFS2 uevents +----------------------- + +1. ADD + +The ADD event occurs at mount time. It will always be the first +uevent generated by the newly created filesystem. If the mount +is successful, an ONLINE uevent will follow. If it is not successful +then a REMOVE uevent will follow. + +The ADD uevent has two environment variables: SPECTATOR=[0|1] +and RDONLY=[0|1] that specify the spectator status (a read-only mount +with no journal assigned), and read-only (with journal assigned) status +of the filesystem respectively. + +2. ONLINE + +The ONLINE uevent is generated after a successful mount or remount. It +has the same environment variables as the ADD uevent. The ONLINE +uevent, along with the two environment variables for spectator and +RDONLY are a relatively recent addition (2.6.32-rc+) and will not +be generated by older kernels. + +3. CHANGE + +The CHANGE uevent is used in two places. One is when reporting the +successful mount of the filesystem by the first node (FIRSTMOUNT=Done). +This is used as a signal by gfs_controld that it is then ok for other +nodes in the cluster to mount the filesystem. + +The other CHANGE uevent is used to inform of the completion +of journal recovery for one of the filesystems journals. It has +two environment variables, JID= which specifies the journal id which +has just been recovered, and RECOVERY=[Done|Failed] to indicate the +success (or otherwise) of the operation. These uevents are generated +for every journal recovered, whether it is during the initial mount +process or as the result of gfs_controld requesting a specific journal +recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file. + +Because the CHANGE uevent was used (in early versions of gfs_controld) +without checking the environment variables to discover the state, we +cannot add any more functions to it without running the risk of +someone using an older version of the user tools and breaking their +cluster. For this reason the ONLINE uevent was used when adding a new +uevent for a successful mount or remount. + +4. OFFLINE + +The OFFLINE uevent is only generated due to filesystem errors and is used +as part of the "withdraw" mechanism. Currently this doesn't give any +information about what the error is, which is something that needs to +be fixed. + +5. REMOVE + +The REMOVE uevent is generated at the end of an unsuccessful mount +or at the end of a umount of the filesystem. All REMOVE uevents will +have been preceeded by at least an ADD uevent for the same fileystem, +and unlike the other uevents is generated automatically by the kernel's +kobject subsystem. + + +Information common to all GFS2 uevents (uevent environment variables) +---------------------------------------------------------------------- + +1. LOCKTABLE= + +The LOCKTABLE is a string, as supplied on the mount command +line (locktable=) or via fstab. It is used as a filesystem label +as well as providing the information for a lock_dlm mount to be +able to join the cluster. + +2. LOCKPROTO= + +The LOCKPROTO is a string, and its value depends on what is set +on the mount command line, or via fstab. It will be either +lock_nolock or lock_dlm. In the future other lock managers +may be supported. + +3. JOURNALID= + +If a journal is in use by the filesystem (journals are not +assigned for spectator mounts) then this will give the +numeric journal id in all GFS2 uevents. + +4. UUID= + +With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID +into the filesystem superblock. If it exists, this will +be included in every uevent relating to the filesystem. + + + diff --git a/Documentation/filesystems/ncpfs.txt b/Documentation/filesystems/ncpfs.txt index f12c30c93f2..5af164f4b37 100644 --- a/Documentation/filesystems/ncpfs.txt +++ b/Documentation/filesystems/ncpfs.txt @@ -7,6 +7,6 @@ ftp.gwdg.de/pub/linux/misc/ncpfs, but sunsite and its many mirrors will have it as well. Related products are linware and mars_nwe, which will give Linux partial -NetWare server functionality. Linware's home site is -klokan.sh.cvut.cz/pub/linux/linware; mars_nwe can be found on -ftp.gwdg.de/pub/linux/misc/ncpfs. +NetWare server functionality. + +mars_nwe can be found on ftp.gwdg.de/pub/linux/misc/ncpfs. diff --git a/Documentation/filesystems/nfs.txt b/Documentation/filesystems/nfs.txt new file mode 100644 index 00000000000..f50f26ce6cd --- /dev/null +++ b/Documentation/filesystems/nfs.txt @@ -0,0 +1,98 @@ + +The NFS client +============== + +The NFS version 2 protocol was first documented in RFC1094 (March 1989). +Since then two more major releases of NFS have been published, with NFSv3 +being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April +2003). + +The Linux NFS client currently supports all the above published versions, +and work is in progress on adding support for minor version 1 of the NFSv4 +protocol. + +The purpose of this document is to provide information on some of the +upcall interfaces that are used in order to provide the NFS client with +some of the information that it requires in order to fully comply with +the NFS spec. + +The DNS resolver +================ + +NFSv4 allows for one server to refer the NFS client to data that has been +migrated onto another server by means of the special "fs_locations" +attribute. See + http://tools.ietf.org/html/rfc3530#section-6 +and + http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 + +The fs_locations information can take the form of either an ip address and +a path, or a DNS hostname and a path. The latter requires the NFS client to +do a DNS lookup in order to mount the new volume, and hence the need for an +upcall to allow userland to provide this service. + +Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual +/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: + + (1) The process checks the dns_resolve cache to see if it contains a + valid entry. If so, it returns that entry and exits. + + (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' + (may be changed using the 'nfs.cache_getent' kernel boot parameter) + is run, with two arguments: + - the cache name, "dns_resolve" + - the hostname to resolve + + (3) After looking up the corresponding ip address, the helper script + writes the result into the rpc_pipefs pseudo-file + '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' + in the following (text) format: + + "<ip address> <hostname> <ttl>\n" + + Where <ip address> is in the usual IPv4 (123.456.78.90) or IPv6 + (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. + <hostname> is identical to the second argument of the helper + script, and <ttl> is the 'time to live' of this cache entry (in + units of seconds). + + Note: If <ip address> is invalid, say the string "0", then a negative + entry is created, which will cause the kernel to treat the hostname + as having no valid DNS translation. + + + + +A basic sample /sbin/nfs_cache_getent +===================================== + +#!/bin/bash +# +ttl=600 +# +cut=/usr/bin/cut +getent=/usr/bin/getent +rpc_pipefs=/var/lib/nfs/rpc_pipefs +# +die() +{ + echo "Usage: $0 cache_name entry_name" + exit 1 +} + +[ $# -lt 2 ] && die +cachename="$1" +cache_path=${rpc_pipefs}/cache/${cachename}/channel + +case "${cachename}" in + dns_resolve) + name="$2" + result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" + [ -z "${result}" ] && result="0" + ;; + *) + die + ;; +esac +echo "${result} ${name} ${ttl}" >${cache_path} + diff --git a/Documentation/filesystems/nfs41-server.txt b/Documentation/filesystems/nfs41-server.txt index 05d81cbcb2e..5920fe26e6f 100644 --- a/Documentation/filesystems/nfs41-server.txt +++ b/Documentation/filesystems/nfs41-server.txt @@ -11,6 +11,11 @@ the /proc/fs/nfsd/versions control file. Note that to write this control file, the nfsd service must be taken down. Use your user-mode nfs-utils to set this up; see rpc.nfsd(8) +(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and +"-4", respectively. Therefore, code meant to work on both new and old +kernels must turn 4.1 on or off *before* turning support for version 4 +on or off; rpc.nfsd does this correctly.) + The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based on the latest NFSv4.1 Internet Draft: http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29 @@ -25,6 +30,49 @@ are still under development out of tree. See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design for more information. +The current implementation is intended for developers only: while it +does support ordinary file operations on clients we have tested against +(including the linux client), it is incomplete in ways which may limit +features unexpectedly, cause known bugs in rare cases, or cause +interoperability problems with future clients. Known issues: + + - gss support is questionable: currently mounts with kerberos + from a linux client are possible, but we aren't really + conformant with the spec (for example, we don't use kerberos + on the backchannel correctly). + - no trunking support: no clients currently take advantage of + trunking, but this is a mandatory failure, and its use is + recommended to clients in a number of places. (E.g. to ensure + timely renewal in case an existing connection's retry timeouts + have gotten too long; see section 8.3 of the draft.) + Therefore, lack of this feature may cause future clients to + fail. + - Incomplete backchannel support: incomplete backchannel gss + support and no support for BACKCHANNEL_CTL mean that + callbacks (hence delegations and layouts) may not be + available and clients confused by the incomplete + implementation may fail. + - Server reboot recovery is unsupported; if the server reboots, + clients may fail. + - We do not support SSV, which provides security for shared + client-server state (thus preventing unauthorized tampering + with locks and opens, for example). It is mandatory for + servers to support this, though no clients use it yet. + - Mandatory operations which we do not support, such as + DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and + TEST_STATEID, are not currently used by clients, but will be + (and the spec recommends their uses in common cases), and + clients should not be expected to know how to recover from the + case where they are not supported. This will eventually cause + interoperability failures. + +In addition, some limitations are inherited from the current NFSv4 +implementation: + + - Incomplete delegation enforcement: if a file is renamed or + unlinked, a client holding a delegation may continue to + indefinitely allow opens of the file under the old name. + The table below, taken from the NFSv4.1 document, lists the operations that are mandatory to implement (REQ), optional (OPT), and NFSv4.0 operations that are required not to implement (MNI) @@ -142,6 +190,12 @@ NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 | Implementation notes: +DELEGPURGE: +* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or + CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that + persist across client reboots). Thus we need not implement this for + now. + EXCHANGE_ID: * only SP4_NONE state protection supported * implementation ids are ignored diff --git a/Documentation/filesystems/nfsroot.txt b/Documentation/filesystems/nfsroot.txt index 68baddf3c3e..3ba0b945aaf 100644 --- a/Documentation/filesystems/nfsroot.txt +++ b/Documentation/filesystems/nfsroot.txt @@ -105,7 +105,7 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf> the client address and this parameter is NOT empty only replies from the specified server are accepted. - Only required for for NFS root. That is autoconfiguration + Only required for NFS root. That is autoconfiguration will not be triggered if it is missing and NFS root is not in operation. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index ffead13f944..b5aee7838a0 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -176,6 +176,7 @@ read the file /proc/PID/status: CapBnd: ffffffffffffffff voluntary_ctxt_switches: 0 nonvoluntary_ctxt_switches: 1 + Stack usage: 12 kB This shows you nearly the same information you would get if you viewed it with the ps command. In fact, ps uses the proc file system to obtain its @@ -229,6 +230,7 @@ Table 1-2: Contents of the statm files (as of 2.6.30-rc7) Mems_allowed_list Same as previous, but in "list format" voluntary_ctxt_switches number of voluntary context switches nonvoluntary_ctxt_switches number of non voluntary context switches + Stack usage: stack usage high water mark (round up to page size) .............................................................................. Table 1-3: Contents of the statm files (as of 2.6.8-rc3) @@ -307,7 +309,7 @@ address perms offset dev inode pathname 08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test 0804a000-0806b000 rw-p 00000000 00:00 0 [heap] a7cb1000-a7cb2000 ---p 00000000 00:00 0 -a7cb2000-a7eb2000 rw-p 00000000 00:00 0 +a7cb2000-a7eb2000 rw-p 00000000 00:00 0 [threadstack:001ff4b4] a7eb2000-a7eb3000 ---p 00000000 00:00 0 a7eb3000-a7ed5000 rw-p 00000000 00:00 0 a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6 @@ -343,6 +345,7 @@ is not associated with a file: [stack] = the stack of the main process [vdso] = the "virtual dynamic shared object", the kernel system call handler + [threadstack:xxxxxxxx] = the stack of the thread, xxxxxxxx is the stack size or if empty, the mapping is anonymous. @@ -375,6 +378,19 @@ of memory currently marked as referenced or accessed. This file is only present if the CONFIG_MMU kernel configuration option is enabled. +The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG +bits on both physical and virtual pages associated with a process. +To clear the bits for all the pages associated with the process + > echo 1 > /proc/PID/clear_refs + +To clear the bits for the anonymous pages associated with the process + > echo 2 > /proc/PID/clear_refs + +To clear the bits for the file mapped pages associated with the process + > echo 3 > /proc/PID/clear_refs +Any other value written to /proc/PID/clear_refs will have no effect. + + 1.2 Kernel data --------------- @@ -1032,9 +1048,9 @@ Various pieces of information about kernel activity are available in the since the system first booted. For a quick look, simply cat the file: > cat /proc/stat - cpu 2255 34 2290 22625563 6290 127 456 0 - cpu0 1132 34 1441 11311718 3675 127 438 0 - cpu1 1123 0 849 11313845 2614 0 18 0 + cpu 2255 34 2290 22625563 6290 127 456 0 0 + cpu0 1132 34 1441 11311718 3675 127 438 0 0 + cpu1 1123 0 849 11313845 2614 0 18 0 0 intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...] ctxt 1990473 btime 1062191376 @@ -1056,6 +1072,7 @@ second). The meanings of the columns are as follows, from left to right: - irq: servicing interrupts - softirq: servicing softirqs - steal: involuntary wait +- guest: running a guest The "intr" line gives counts of interrupts serviced since boot time, for each of the possible system interrupts. The first column is the total of all @@ -1191,7 +1208,7 @@ The following heuristics are then applied: * if the task was reniced, its score doubles * superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE or CAP_SYS_RAWIO) have their score divided by 4 - * if oom condition happened in one cpuset and checked task does not belong + * if oom condition happened in one cpuset and checked process does not belong to it, its score is divided by 8 * the resulting score is multiplied by two to the power of oom_adj, i.e. points <<= oom_adj when it is positive and diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt index b843743aa0b..0d15ebccf5b 100644 --- a/Documentation/filesystems/seq_file.txt +++ b/Documentation/filesystems/seq_file.txt @@ -46,7 +46,7 @@ better to do. The file is seekable, in that one can do something like the following: dd if=/proc/sequence of=out1 count=1 - dd if=/proc/sequence skip=1 out=out2 count=1 + dd if=/proc/sequence skip=1 of=out2 count=1 Then concatenate the output files out1 and out2 and get the right result. Yes, it is a thoroughly useless module, but the point is to show |