summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | ocfs2: Stop orphan scan as early as possible during umountSunil Mushran2009-06-223-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if the orphan scan fires a tick before the user issues the umount, the umount will wait for the queued orphan scan tasks to complete. This patch makes the umount stop the orphan scan as early as possible so as to reduce the probability of the queued tasks slowing down the umount. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | | | ocfs2: Fix ocfs2_osb_dump()Sunil Mushran2009-06-221-14/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Skip printing information that is not valid for local mounts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | | | ocfs2: Pin journal head before accessing jh->b_committed_dataSunil Mushran2009-06-221-4/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds jbd_lock_bh_state() and jbd_unlock_bh_state() around accessses to jh->b_committed_data. Fixes oss bugzilla#1131 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1131 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | | | ocfs2: Update atime in splice read if necessary.Tao Ma2009-06-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should call ocfs2_inode_lock_atime instead of ocfs2_inode_lock in ocfs2_file_splice_read like we do in ocfs2_file_aio_read so that we can update atime in splice read if necessary. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | | | ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.Joel Becker2009-06-225-9/+38
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and the DLM cannot reconstruct its state. Clients of the DLM need to know this. ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses track of the state. This is not a standard behavior, but ocfs2 has always relied on it. Thus, an o2dlm LVB is always "valid". ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When fs/dlm loses track of an LVBs state, it sets a flag (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of the LVB may be garbage or merely stale. ocfs2 doesn't want to try to guess at the validity of the stale LVB. Instead, it should be checking the VALNOTVALID flag. As this is the 'standard' way of treating LVBs, we will promote this behavior. We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when the LVB is valid. o2dlm will always return valid, while fs/dlm will check VALNOTVALID. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com>
* | | | NFS: Correct the NFS mount path when following a referralTrond Myklebust2009-06-221-0/+24
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | NFS: Fix nfs_path() to always return a '/' at the beginning of the pathTrond Myklebust2009-06-221-0/+5
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespaceTrond Myklebust2009-06-221-21/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As noted in the previous patch, the NFSv4 client mount code currently has several limitations. If the mount path contains symlinks, or referrals, or even if it just contains a '..', then the client code in nfs4_path_walk() will fail with an error. This patch replaces the nfs4_path_walk()-based lookup with a helper function that sets up a private namespace to represent the namespace on the server, then uses the ordinary VFS and NFS path lookup code to walk down the mount path in that namespace. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | VFS: Add VFS helper functions for setting up private namespacesTrond Myklebust2009-06-221-8/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purpose of this patch is to improve the remote mount path lookup support for distributed filesystems such as the NFSv4 client. When given a mount command of the form "mount server:/foo/bar /mnt", the NFSv4 client is required to look up the filehandle for "server:/", and then look up each component of the remote mount path "foo/bar" in order to find the directory that is actually going to be mounted on /mnt. Following that remote mount path may involve following symlinks, crossing server-side mount points and even following referrals to filesystem volumes on other servers. Since the standard VFS path lookup code already supports walking paths that contain all these features (using in-kernel automounts for following referrals) we would like to be able to reuse that rather than duplicate the full path traversal functionality in the NFSv4 client code. This patch therefore defines a VFS helper function create_mnt_ns(), that sets up a temporary filesystem namespace and attaches a root filesystem to it. It exports the create_mnt_ns() and put_mnt_ns() function for use by filesystem modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | VFS: Uninline the function put_mnt_ns()Trond Myklebust2009-06-221-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to allow modules to use it without having to export vfsmount_lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge git://git.infradead.org/mtd-2.6Linus Torvalds2009-06-222-54/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.infradead.org/mtd-2.6: (63 commits) mtd: OneNAND: Allow setting of boundary information when built as module jffs2: leaking jffs2_summary in function jffs2_scan_medium mtd: nand: Fix memory leak on txx9ndfmc probe failure. mtd: orion_nand: use burst reads with double word accesses mtd/nand: s3c6400 support for s3c2410 driver [MTD] [NAND] S3C2410: Use DIV_ROUND_UP [MTD] [NAND] S3C2410: Deal with unaligned lengths in S3C2440 buffer read/write [MTD] [NAND] S3C2410: Allow the machine code to get the BBT table from NAND [MTD] [NAND] S3C2410: Added a kerneldoc for s3c2410_nand_set mtd: physmap_of: Add multiple regions and concatenation support mtd: nand: max_retries off by one in mxc_nand mtd: nand: s3c2410_nand_setrate(): use correct macros for 2412/2440 mtd: onenand: add bbt_wait & unlock_all as replaceable for some platform mtd: Flex-OneNAND support mtd: nand: add OMAP2/OMAP3 NAND driver mtd: maps: Blackfin async: fix memory leaks in probe/remove funcs mtd: uclinux: mark local stuff static mtd: uclinux: do not allow to be built as a module mtd: uclinux: allow systems to override map addr/size mtd: blackfin NFC: fix hang when using NAND on BF527-EZKITs ...
| * | | | jffs2: leaking jffs2_summary in function jffs2_scan_mediumChristian Engelmayer2009-06-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of an error returned by file_dirty() 's' is not freed as the cleanup path is skipped. Reported by Coverity. Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | Merge branch 'next-mtd' of git://aeryn.fluff.org.uk/bjdooks/linuxDavid Woodhouse2009-06-0870-719/+831
| |\ \ \ \
| * | | | | mtd: Handle compat ioctls directly; remove all trace from compat_ioctl.cKevin Cernekee2009-05-291-22/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove all references to MTD ioctls from fs/compat_ioctl.c and let them all be handled by mtd_compat_ioctl(). Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | | mtd: add OOB ioctls for >4GiB devicesKevin Cernekee2009-05-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | | mtd: compat_ioctl cleanupKevin Cernekee2009-05-291-42/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Move the MEMREADOOB/MEMWRITEOOB compat_ioctl wrappers from fs/compat_ioctl.c into mtdchar.c . Original request was here: http://lkml.org/lkml/2009/4/1/295 2) Add missing COMPATIBLE_IOCTL lines, so that mtd-utils does not error out when running in 64/32 compatibility mode. LKML-Reference: <200904011650.22928.arnd@arndb.de> Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | | mtd: add MEMERASE64 ioctl for >4GiB devicesKevin Cernekee2009-05-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | New MEMERASE/MEMREADOOB/MEMWRITEOOB ioctls are needed in order to support 64-bit offsets into large NAND flash devices. Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* | | | | | Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsdLinus Torvalds2009-06-2217-597/+1142
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits) SUNRPC: Fix the TCP server's send buffer accounting nfsd41: Backchannel: minorversion support for the back channel nfsd41: Backchannel: cleanup nfs4.0 callback encode routines nfsd41: Remove ip address collision detection case nfsd: optimise the starting of zero threads when none are running. nfsd: don't take nfsd_mutex twice when setting number of threads. nfsd41: sanity check client drc maxreqs nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct NFS: kill off complicated macro 'PROC' sunrpc: potential memory leak in function rdma_read_xdr nfsd: minor nfsd_vfs_write cleanup nfsd: Pull write-gathering code out of nfsd_vfs_write nfsd: track last inode only in use_wgather case sunrpc: align cache_clean work's timer nfsd: Use write gathering only with NFSv2 NFSv4: kill off complicated macro 'PROC' NFSv4: do exact check about attribute specified knfsd: remove unreported filehandle stats counters knfsd: fix reply cache memory corruption knfsd: reply cache cleanups ...
| * | | | | | nfsd41: Backchannel: minorversion support for the back channelAndy Adamson2009-06-182-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prepare to share backchannel code with NFSv4.1. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [nfsd41: use nfsd4_cb_sequence for callback minorversion] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | nfsd41: Backchannel: cleanup nfs4.0 callback encode routinesAndy Adamson2009-06-181-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mimic the client and prepare to share the back channel xdr with NFSv4.1. Bump the number of operations in each encode routine, then backfill the number of operations. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | nfsd41: Remove ip address collision detection caseMike Sager2009-06-181-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Verified that cthon and pynfs exchange id tests pass (except for the two expected fails: EID8 and EID50) Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | nfsd: optimise the starting of zero threads when none are running.NeilBrown2009-06-181-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if we ask to set then number of nfsd threads to zero when there are none running, we set up all the sockets and register the service, and then tear it all down again. This is pointless. So detect that case and exit promptly. (also remove an assignment to 'error' which was never used. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com>
| * | | | | | nfsd: don't take nfsd_mutex twice when setting number of threads.NeilBrown2009-06-182-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when we write a number to 'threads' in nfsdfs, we take the nfsd_mutex, update the number of threads, then take the mutex again to read the number of threads. Mostly this isn't a big deal. However if we are write '0', and portmap happens to be dead, then we can get unpredictable behaviour. If the nfsd threads all got killed quickly and the last thread is waiting for portmap to respond, then the second time we take the mutex we will block waiting for the last thread. However if the nfsd threads didn't die quite that fast, then there will be no contention when we try to take the mutex again. Unpredictability isn't fun, and waiting for the last thread to exit is pointless, so avoid taking the lock twice. To achieve this, get nfsd_svc return a non-negative number of active threads when not returning a negative error. Signed-off-by: NeilBrown <neilb@suse.de>
| * | | | | | nfsd41: sanity check client drc maxreqsAndy Adamson2009-06-161-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure the client requested maximum requests are between 1 and NFSD_MAX_SLOTS_PER_SESSION Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr ↵Alexandros Batsakis2009-06-162-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct the change is valid for both the forechannel and the backchannel (currently dummy) Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | NFS: kill off complicated macro 'PROC'Yu Zhiguo2009-06-152-56/+379
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kill off obscure macro 'PROC' of NFSv2&3 in order to make the code more clear. Among other things, this makes it simpler to grep for callers of these functions--something which has frequently caused confusion among nfs developers. Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | nfsd: minor nfsd_vfs_write cleanupJ. Bruce Fields2009-06-151-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no need to check host_err >= 0 every time here when we could check host_err < 0 once, following the usual kernel style. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | nfsd: Pull write-gathering code out of nfsd_vfs_writeJ. Bruce Fields2009-06-151-30/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a relatively self-contained piece of code that handles a special case--move it to its own function. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | nfsd: track last inode only in use_wgather caseJ. Bruce Fields2009-06-151-15/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updating last_ino and last_dev probably isn't useful in the !use_wgather case. Also remove some pointless ifdef'd-out code. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | nfsd: Use write gathering only with NFSv2Trond Myklebust2009-06-151-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFSv3 and above can use unstable writes whenever they are sending more than one write, rather than relying on the flaky write gathering heuristics. More often than not, write gathering is currently getting it wrong when the NFSv3 clients are sending a single write with FILE_SYNC for efficiency reasons. This patch turns off write gathering for NFSv3/v4, and ensures that it only applies to the one case that can actually benefit: namely NFSv2. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | Merge commit 'v2.6.30' into for-2.6.31J. Bruce Fields2009-06-15144-3168/+2531
| |\ \ \ \ \ \
| * | | | | | | NFSv4: kill off complicated macro 'PROC'Yu Zhiguo2009-06-011-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | J. Bruce Fields wrote: ... > (This is extremely confusing code to track down: note that > proc->pc_decode is set to nfs4svc_decode_compoundargs() by the PROC() > macro at the end of fs/nfsd/nfs4proc.c. Which means, for example, that > grepping for nfs4svc_decode_compoundargs() gets you nowhere. Patches to > kill off that macro would be welcomed....) the macro 'PROC' is complicated and obscure, it had better be killed off in order to make the code more clear. Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | NFSv4: do exact check about attribute specifiedYu Zhiguo2009-06-012-38/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Server should return NFS4ERR_ATTRNOTSUPP if an attribute specified is not supported in current environment. Operations CREATE, NVERIFY, OPEN, SETATTR and VERIFY should do this check. This bug is found when do newpynfs tests. The names of the tests that failed are following: CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s Add function do_check_fattr() to do exact check: 1, Check attribute specified is supported by the NFSv4 server or not. 2, Check FATTR4_WORD0_ACL & FATTR4_WORD0_FS_LOCATIONS are supported in current environment or not. 3, Check attribute specified is writable or not. step 1 and 3 are done in function nfsd4_decode_fattr() but removed to this function now. Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | knfsd: remove unreported filehandle stats countersGreg Banks2009-05-271-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The file nfsfh.c contains two static variables nfsd_nr_verified and nfsd_nr_put. These are counters which are incremented as a side effect of the fh_verify() fh_compose() and fh_put() operations, i.e. at least twice per NFS call for any non-trivial workload. Needless to say this makes the cacheline that contains them (and any other innocent victims) a very hot contention point indeed under high call-rate workloads on multiprocessor NFS server. It also turns out that these counters are not used anywhere. They're not reported to userspace, they're not used in logic, they're not even exported from the object file (let alone the module). All they do is waste CPU time. So this patch removes them. Tests on a 16 CPU Altix A4700 with 2 10gige Myricom cards, configured separately (no bonding). Workload is 640 client threads doing directory traverals with random small reads, from server RAM. Before ====== Kernel profile: % cumulative self self total time samples samples calls 1/call 1/call name 6.05 2716.00 2716.00 30406 0.09 1.02 svc_process 4.44 4706.00 1990.00 1975 1.01 1.01 spin_unlock_irqrestore 3.72 6376.00 1670.00 1666 1.00 1.00 svc_export_put 3.41 7907.00 1531.00 1786 0.86 1.02 nfsd_ofcache_lookup 3.25 9363.00 1456.00 10965 0.13 1.01 nfsd_dispatch 3.10 10752.00 1389.00 1376 1.01 1.01 nfsd_cache_lookup 2.57 11907.00 1155.00 4517 0.26 1.03 svc_tcp_recvfrom ... 2.21 15352.00 1003.00 1081 0.93 1.00 nfsd_choose_ofc <---- ^^^^ Here the function nfsd_choose_ofc() reads a global variable which by accident happened to be located in the same cacheline as nfsd_nr_verified. Call rate: nullarbor:~ # pmdumptext nfs3.server.calls ... Thu Dec 13 00:15:27 184780.663 Thu Dec 13 00:15:28 184885.881 Thu Dec 13 00:15:29 184449.215 Thu Dec 13 00:15:30 184971.058 Thu Dec 13 00:15:31 185036.052 Thu Dec 13 00:15:32 185250.475 Thu Dec 13 00:15:33 184481.319 Thu Dec 13 00:15:34 185225.737 Thu Dec 13 00:15:35 185408.018 Thu Dec 13 00:15:36 185335.764 After ===== kernel profile: % cumulative self self total time samples samples calls 1/call 1/call name 6.33 2813.00 2813.00 29979 0.09 1.01 svc_process 4.66 4883.00 2070.00 2065 1.00 1.00 spin_unlock_irqrestore 4.06 6687.00 1804.00 2182 0.83 1.00 nfsd_ofcache_lookup 3.20 8110.00 1423.00 10932 0.13 1.00 nfsd_dispatch 3.03 9456.00 1346.00 1343 1.00 1.00 nfsd_cache_lookup 2.62 10622.00 1166.00 4645 0.25 1.01 svc_tcp_recvfrom [...] 0.10 42586.00 44.00 74 0.59 1.00 nfsd_choose_ofc <--- HA!! ^^^^ Call rate: nullarbor:~ # pmdumptext nfs3.server.calls ... Thu Dec 13 01:45:28 194677.118 Thu Dec 13 01:45:29 193932.692 Thu Dec 13 01:45:30 194294.364 Thu Dec 13 01:45:31 194971.276 Thu Dec 13 01:45:32 194111.207 Thu Dec 13 01:45:33 194999.635 Thu Dec 13 01:45:34 195312.594 Thu Dec 13 01:45:35 195707.293 Thu Dec 13 01:45:36 194610.353 Thu Dec 13 01:45:37 195913.662 Thu Dec 13 01:45:38 194808.675 i.e. about a 5.3% improvement in call rate. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Reviewed-by: David Chinner <dgc@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | knfsd: fix reply cache memory corruptionGreg Banks2009-05-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a regression in the reply cache introduced when the code was converted to use proper Linux lists. When a new entry needs to be inserted, the case where all the entries are currently being used by threads is not correctly detected. This can result in memory corruption and a crash. In the current code this is an extremely unlikely corner case; it would require the machine to have 1024 nfsd threads and all of them to be busy at the same time. However, upcoming reply cache changes make this more likely; a crash due to this problem was actually observed in field. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | knfsd: reply cache cleanupsGreg Banks2009-05-271-10/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make REQHASH() an inline function. Rename hash_list to cache_hash. Fix an obsolete comment. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | lockd: fix FILE_LOCKING=n build errorRandy Dunlap2009-05-132-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockd/svclock.c is missing a header file <linux/fs.h>. <linux/fs.h> is missing a definition of locks_release_private() for the config case of FILE_LOCKING=n, causing a build error: fs/lockd/svclock.c:330: error: implicit declaration of function 'locks_release_private' lockd without FILE_LOCKING doesn't make sense, so make LOCKD and LOCKD_V4 depend on FILE_LOCKING, and make NFS depend on FILE_LOCKING. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd: nfs4_stat_init cleanupWang Chen2009-05-061-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Save some loop time. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd: use C99 struct initializersRandy Dunlap2009-05-031-56/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eliminate 56 sparse warnings like this one: fs/nfsd/nfs4xdr.c:1331:15: warning: obsolete array initializer, use C99 syntax Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: make recall callback an asynchronous rpcJ. Bruce Fields2009-05-032-50/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As with the probe, this removes the need for another kthread. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: track recall retries in nfs4_delegationJ. Bruce Fields2009-05-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move this out of a local variable into the nfs4_delegation object in preparation for making this an async rpc call (at which point we'll need any state like this in a common object that's preserved across function calls). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: remove unused dl_truncJ. Bruce Fields2009-05-012-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no point in keeping this field around--it's always zero. (Background: the protocol allows you to tell the client that the file is about to be truncated, as an optimization to save the client from writing back dirty pages that will just be discarded. We don't implement this hint. If we do some day, adding this field back in will be the least of the work involved.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: eliminate struct nfs4_cb_recallJ. Bruce Fields2009-05-012-16/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nfs4_cb_recall struct is used only in nfs4_delegation, so its pointer to the containing delegation is unnecessary--we could just use container_of(). But there's no real reason to have this a separate struct at all--just move these fields to nfs4_delegation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: rename callback struct to cb_connJ. Bruce Fields2009-05-012-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I want to use the name for a struct that actually does represent a single callback. (Actually, I've never been sure it helps to a separate struct for the callback information. Some day maybe those fields could just be dumped into struct nfs4_client. I don't know.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: replace callback thread by asynchronous rpcJ. Bruce Fields2009-04-291-16/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't really need a synchronous rpc, and moving to an asynchronous rpc allows us to do without this extra kthread. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: lookup up callback cred only onceJ. Bruce Fields2009-04-292-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lookup the callback cred once and then use it for all subsequent callbacks. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: create rpc callback client from server threadJ. Bruce Fields2009-04-291-15/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code is a little simpler, and it should be easier to avoid races, if we just do all rpc client creation/destruction from nfsd or laundromat threads and do only the rpc calls themselves asynchronously. The rpc creation doesn't involve any significant waiting (it doesn't call the client, for example), so there's no reason not to do this. Also don't bother destroying the client on failure of the rpc null probe. We may want to retry the probe later anyway. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: set cb_client inside setup_callback_clientJ. Bruce Fields2009-04-291-14/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is just a minor code simplification. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: set shorter timeoutJ. Bruce Fields2009-04-291-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We tried to do something overly complicated with the callback rpc timeouts here. And they're wrong--the result is that by the time a single callback times out, it's already too late to tell the client (using the cb_path_down return to RENEW) that the callback is down. Use a much shorter, simpler timeout. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | | | | nfsd4: setclientid_confirm callback-change fixesJ. Bruce Fields2009-04-291-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This setclientid_confirm case should allow the client to change callbacks, but it currently has a dummy implementation that just turns off callbacks completely. That dummy implementation isn't completely correct either, though: - There's no need to remove any client recovery directory in this case. - New clientid confirm verifiers should be generated (and returned) in setclientid; there's no need to generate a new one here. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>