summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: should add a permission check for setfaclShi Weihua2010-07-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2f26afba46f0ebf155cf9be746496a0304a5b7cf upstream. On btrfs, do the following ------------------ # su user1 # cd btrfs-part/ # touch aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rw- group::rw- other::r-- # su user2 # cd btrfs-part/ # setfacl -m u::rwx aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rwx <- successed to setfacl group::rw- other::r-- ------------------ but we should prohibit it that user2 changing user1's acl. In fact, on ext3 and other fs, a message occurs: setfacl: aaa: Operation not permitted This patch fixed it. Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* vfs: add NOFOLLOW flag to umount(2)Miklos Szeredi2010-07-051-1/+8
| | | | | | | | | | | | | | | | | | commit db1f05bb85d7966b9176e293f3ceead1cb8b5d79 upstream. Add a new UMOUNT_NOFOLLOW flag to umount(2). This is needed to prevent symlink attacks in unprivileged unmounts (fuse, samba, ncpfs). Additionally, return -EINVAL if an unknown flag is used (and specify an explicitly unused flag: UMOUNT_UNUSED). This makes it possible for the caller to determine if a flag is supported or not. CC: Eugene Teo <eugene@redhat.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* CIFS: Allow null nd (as nfs server uses) on createSteve French2010-07-053-14/+23
| | | | | | | | | | | | | | commit fa588e0c57048b3d4bfcd772d80dc0615f83fd35 upstream. While creating a file on a server which supports unix extensions such as Samba, if a file is being created which does not supply nameidata (i.e. nd is null), cifs client can oops when calling cifs_posix_open. Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* GFS2: Fix permissions checking for setflags ioctl()Steven Whitehouse2010-07-051-0/+7
| | | | | | | | | | | | commit 7df0e0397b9a18358573274db9fdab991941062f upstream. We should be checking for the ownership of the file for which flags are being set, rather than just for write access. Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only filesTheodore Ts'o2010-07-051-0/+3
| | | | | | | | | | | | | | commit 1f5a81e41f8b1a782c68d3843e9ec1bfaadf7d72 upstream. Dan Roseberg has reported a problem with the MOVE_EXT ioctl. If the donor file is an append-only file, we should not allow the operation to proceed, lest we end up overwriting the contents of an append-only file. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ext4: check s_log_groups_per_flex in online resize codeEric Sandeen2010-07-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 42007efd569f1cf3bfb9a61da60ef6c2179508ca upstream. If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out, and every other access to this first tests s_log_groups_per_flex; same thing needs to happen in resize or we'll wander off into a null pointer when doing an online resize of the file system. Thanks to Christoph Biedl, who came up with the trivial testcase: # truncate --size 128M fsfile # mkfs.ext3 -F fsfile # tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile # e2fsck -yDf -C0 fsfile # truncate --size 132M fsfile # losetup /dev/loop0 fsfile # mount /dev/loop0 mnt # resize2fs -p /dev/loop0 https://bugzilla.kernel.org/show_bug.cgi?id=13549 Reported-by: Alessandro Polverini <alex@nibbles.it> Test-case-by: Christoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* wrong type for 'magic' argument in simple_fill_super()Roberto Sassu2010-07-051-1/+2
| | | | | | | | | | | | | commit 7d683a09990ff095a91b6e724ecee0ff8733274a upstream. It's used to superblock ->s_magic, which is unsigned long. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Reviewed-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* exofs: confusion between kmap() and kmap_atomic() apiDan Carpenter2010-07-051-1/+1
| | | | | | | | | | | | | commit ddf08f4b90a413892bbb9bb2e8a57aed991cd47d upstream. For kmap_atomic() we call kunmap_atomic() on the returned pointer. That's different from kmap() and kunmap() and so it's easy to get them backwards. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* writeback: disable periodic old data writeback for !dirty_writeback_centisecsJens Axboe2010-07-051-2/+12
| | | | | | | | | | | | | | commit 69b62d01ec44fe0d505d89917392347732135a4d upstream. Prior to 2.6.32, setting /proc/sys/vm/dirty_writeback_centisecs disabled periodic dirty writeback from kupdate. This got broken and now causes excessive sys CPU usage if set to zero, as we'll keep beating on schedule(). Reported-by: Justin Maggard <jmaggard10@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFSD: don't report compiled-out versions as presentPavel Emelyanov2010-07-051-1/+1
| | | | | | | | | | | | | | | | | | | | commit 15ddb4aec54422ead137b03ea4e9b3f5db3f7cc2 upstream. The /proc/fs/nfsd/versions file calls nfsd_vers() to check whether the particular nfsd version is present/available. The problem is that once I turn off e.g. NFSD-V4 this call returns -1 which is true from the callers POV which is wrong. The proposal is to report false in that case. The bug has existed since 6658d3a7bbfd1768 "[PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions". Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* nilfs2: fix sync silent failureRyusuke Konishi2010-05-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 973bec34bfc1bc2465646181653d67f767d418c8 upstream. As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set. And nilfs does not set s_bdi anywhere. I noticed this problem by the warning introduced by the recent commit 5129a469 ("Catch filesystem lacking s_bdi"). WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e() Hardware name: PowerEdge 2850 Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38 Call Trace: [<c1028422>] warn_slowpath_common+0x60/0x90 [<c102845f>] warn_slowpath_null+0xd/0x10 [<c1095936>] vfs_kern_mount+0xc5/0x14e [<c1095a03>] do_kern_mount+0x32/0xbd [<c10a811e>] do_mount+0x671/0x6d0 [<c1073794>] ? __get_free_pages+0x1f/0x21 [<c10a684f>] ? copy_mount_options+0x2b/0xe2 [<c107b634>] ? strndup_user+0x48/0x67 [<c10a81de>] sys_mount+0x61/0x8f [<c100280c>] sysenter_do_call+0x12/0x32 This ensures to set s_bdi for nilfs and fixes the sync silent failure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* CacheFiles: Fix error handling in cachefiles_determine_cache_security()David Howells2010-05-261-0/+4
| | | | | | | | | | | | | | | | | commit 7ac512aa8237c43331ffaf77a4fd8b8d684819ba upstream. cachefiles_determine_cache_security() is expected to return with a security override in place. However, if set_create_files_as() fails, we fail to do this. In this case, we should just reinstate the security override that was set by the caller. Furthermore, if set_create_files_as() fails, we should dispose of the new credentials we were in the process of creating. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Btrfs: check for read permission on src file in the clone ioctlDan Rosenberg2010-05-261-0/+5
| | | | | | | | | | | commit 5dc6416414fb3ec6e2825fd4d20c8bf1d7fe0395 upstream. The existing code would have allowed you to clone a file that was only open for writing Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* inotify: don't leak user struct on inotify releasePavel Emelyanov2010-05-261-0/+2
| | | | | | | | | | | | | | | | | | commit b3b38d842fa367d862b83e7670af4e0fd6a80fc0 upstream. inotify_new_group() receives a get_uid-ed user_struct and saves the reference on group->inotify_data.user. The problem is that free_uid() is never called on it. Issue seem to be introduced by 63c882a0 (inotify: reimplement inotify using fsnotify) after 2.6.30. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* inotify: race use after free/double free in inotify inode marksEric Paris2010-05-261-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e08733446e72b983fed850fc5d8bd21b386feb29 upstream. There is a race in the inotify add/rm watch code. A task can find and remove a mark which doesn't have all of it's references. This can result in a use after free/double free situation. Task A Task B ------------ ----------- inotify_new_watch() allocate a mark (refcnt == 1) add it to the idr inotify_rm_watch() inotify_remove_from_idr() fsnotify_put_mark() refcnt hits 0, free take reference because we are on idr [at this point it is a use after free] [time goes on] refcnt may hit 0 again, double free The fix is to take the reference BEFORE the object can be found in the idr. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* cifs: guard against hardlinking directoriesJeff Layton2010-05-262-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3d69438031b00c601c991ab447cafb7d5c3c59a6 upstream. When we made serverino the default, we trusted that the field sent by the server in the "uniqueid" field was actually unique. It turns out that it isn't reliably so. Samba, in particular, will just put the st_ino in the uniqueid field when unix extensions are enabled. When a share spans multiple filesystems, it's quite possible that there will be collisions. This is a server bug, but when the inodes in question are a directory (as is often the case) and there is a collision with the root inode of the mount, the result is a kernel panic on umount. Fix this by checking explicitly for directory inodes with the same uniqueid. If that is the case, then we can assume that using server inode numbers will be a problem and that they should be disabled. Fixes Samba bugzilla 7407 Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* revert "procfs: provide stack information for threads" and its fixup commitsRobin Holt2010-05-264-25/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 34441427aab4bdb3069a4ffcda69a99357abcb2e upstream. Originally, commit d899bf7b ("procfs: provide stack information for threads") attempted to introduce a new feature for showing where the threadstack was located and how many pages are being utilized by the stack. Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was applied to fix the NO_MMU case. Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") was applied to fix a bug in ia32 executables being loaded. Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel threads") was applied to fix a bug which had kernel threads printing a userland stack address. Commit 1306d603f ('proc: partially revert "procfs: provide stack information for threads"') was then applied to revert the stack pages being used to solve a significant performance regression. This patch nearly undoes the effect of all these patches. The reason for reverting these is it provides an unusable value in field 28. For x86_64, a fork will result in the task->stack_start value being updated to the current user top of stack and not the stack start address. This unpredictability of the stack_start value makes it worthless. That includes the intended use of showing how much stack space a thread has. Other architectures will get different values. As an example, ia64 gets 0. The do_fork() and copy_process() functions appear to treat the stack_start and stack_size parameters as architecture specific. I only partially reverted c44972f1 ("procfs: disable per-task stack usage on NOMMU") . If I had completely reverted it, I would have had to change mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is configured. Since I could not test the builds without significant effort, I decided to not change mm/Makefile. I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") . I left the KSTK_ESP() change in place as that seemed worthwhile. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* xfs: add a shrinker to background inode reclaimDave Chinner2010-05-126-9/+115
| | | | | | | | | | | | | | | | | | | | commit 9bf729c0af67897ea8498ce17c29b0683f7f2028 upstream On low memory boxes or those with highmem, kernel can OOM before the background reclaims inodes via xfssyncd. Add a shrinker to run inode reclaim so that it inode reclaim is expedited when memory is low. This is more complex than it needs to be because the VM folk don't want a context added to the shrinker infrastructure. Hence we need to add a global list of XFS mount structures so the shrinker can traverse them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* jfs: fix diAllocExt error in resizing filesystemBill Pemberton2010-05-121-1/+5
| | | | | | | | | | | | | | | commit 2b0b39517d1af5294128dbc2fd7ed39c8effa540 upstream. Resizing the filesystem would result in an diAllocExt error in some instances because changes in bmp->db_agsize would not get noticed if goto extendBmap was called. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: jfs-discussion@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ext4: correctly calculate number of blocks for fiemapLeonard Michlmayr2010-05-121-2/+7
| | | | | | | | | | | | | | | | | | commit aca92ff6f57c000d1b4523e383c8bd6b8269b8b1 upstream. ext4_fiemap() rounds the length of the requested range down to blocksize, which is is not the true number of blocks that cover the requested region. This problem is especially impressive if the user requests only the first byte of a file: not a single extent will be reported. We fix this by calculating the last block of the region and then subtract to find the number of blocks in the extents. Signed-off-by: Leonard Michlmayr <leonard.michlmayr@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFS: rsize and wsize settings ignored on v4 mountsChuck Lever2010-05-121-0/+2
| | | | | | | | | | | | | | | | | | commit 356e76b855bdbfd8d1c5e75bcf0c6bf0dfe83496 upstream. NFSv4 mounts ignore the rsize and wsize mount options, and always use the default transfer size for both. This seems to be because all NFSv4 mounts are now cloned, and the cloning logic doesn't copy the rsize and wsize settings from the parent nfs_server. I tested Fedora's 2.6.32.11-99 and it seems to have this problem as well, so I'm guessing that .33, .32, and perhaps older kernels have this issue as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* nfs d_revalidate() is too trigger-happy with d_drop()Al Viro2010-05-121-0/+2
| | | | | | | | | | | | | | | | | | commit d9e80b7de91db05c1c4d2e5ebbfd70b3b3ba0e0f upstream. If dentry found stale happens to be a root of disconnected tree, we can't d_drop() it; its d_hash is actually part of s_anon and d_drop() would simply hide it from shrink_dcache_for_umount(), leading to all sorts of fun, including busy inodes on umount and oopsen after that. Bug had been there since at least 2006 (commit c636eb already has it), so it's definitely -stable fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ocfs2_dlmfs: Fix math error when reading LVB.Joel Becker2010-05-121-1/+1
| | | | | | | | | | | | commit a36d515c7a2dfacebcf41729f6812dbc424ebcf0 upstream. When asked for a partial read of the LVB in a dlmfs file, we can accidentally calculate a negative count. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ocfs2: Compute metaecc for superblocks during online resize.Joel Becker2010-05-121-0/+2
| | | | | | | | | | | | commit a42ab8e1a37257da37e0f018e707bf365ac24531 upstream. Online resize writes out the new superblock and its backups directly. The metaecc data wasn't being recomputed. Let's do that directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ocfs2: potential ERR_PTR dereference on error pathsDan Carpenter2010-05-121-0/+1
| | | | | | | | | | | | commit 0350cb078f5035716ebdad4ad4709d02fe466a8a upstream. If "handle" is non null at the end of the function then we assume it's a valid pointer and pass it to ocfs2_commit_trans(); Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ocfs2: Update VFS inode's id info after reflink.Tao Ma2010-05-121-0/+3
| | | | | | | | | | | | | | commit c21a534e2f24968cf74976a4e721ac194db30ded upstream. In reflink we update the id info on the disk but forgot to update the corresponding information in the VFS inode. Update them accordingly when we want to preserve the attributes. Reported-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* nfsd4: bug in read_bufNeil Brown2010-05-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | commit 2bc3c1179c781b359d4f2f3439cb3df72afc17fc upstream. When read_buf is called to move over to the next page in the pagelist of an NFSv4 request, it sets argp->end to essentially a random number, certainly not an address within the page which argp->p now points to. So subsequent calls to READ_BUF will think there is much more than a page of spare space (the cast to u32 ensures an unsigned comparison) so we can expect to fall off the end of the second page. We never encountered thsi in testing because typically the only operations which use more than two pages are write-like operations, which have their own decoding logic. Something like a getattr after a write may cross a page boundary, but it would be very unusual for it to cross another boundary after that. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* procfs: fix tid fdinfoJerome Marchand2010-05-121-1/+1
| | | | | | | | | | | | | | | | | | commit 3835541dd481091c4dbf5ef83c08aed12e50fd61 upstream. Correct the file_operations struct in fdinfo entry of tid_base_stuff[]. Presently /proc/*/task/*/fdinfo contains symlinks to opened files like /proc/*/fd/. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* reiserfs: fix corruption during shrinking of xattrsJeff Mahoney2010-05-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fb2162df74bb19552db3d988fd11c787cf5fad56 upstream. Commit 48b32a3553a54740d236b79a90f20147a25875e3 ("reiserfs: use generic xattr handlers") introduced a problem that causes corruption when extended attributes are replaced with a smaller value. The issue is that the reiserfs_setattr to shrink the xattr file was moved from before the write to after the write. The root issue has always been in the reiserfs xattr code, but was papered over by the fact that in the shrink case, the file would just be expanded again while the xattr was written. The end result is that the last 8 bytes of xattr data are lost. This patch fixes it to use new_size. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Edward Shishkin <edward.shishkin@gmail.com> Cc: Jethro Beekman <kernel@jbeekman.nl> Cc: Greg Surbey <gregsurbey@hotmail.com> Cc: Marco Gatti <marco.gatti@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* reiserfs: fix permissions on .reiserfs_privJeff Mahoney2010-05-122-15/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit cac36f707119b792b2396aed371d6b5cdc194890 upstream. Commit 677c9b2e393a0cd203bd54e9c18b012b2c73305a ("reiserfs: remove privroot hiding in lookup") removed the magic from the lookup code to hide the .reiserfs_priv directory since it was getting loaded at mount-time instead. The intent was that the entry would be hidden from the user via a poisoned d_compare, but this was faulty. This introduced a security issue where unprivileged users could access and modify extended attributes or ACLs belonging to other users, including root. This patch resolves the issue by properly hiding .reiserfs_priv. This was the intent of the xattr poisoning code, but it appears to have never worked as expected. This is fixed by using d_revalidate instead of d_compare. This patch makes -oexpose_privroot a no-op. I'm fine leaving it this way. The effort involved in working out the corner cases wrt permissions and caching outweigh the benefit of the feature. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Edward Shishkin <edward.shishkin@gmail.com> Reported-by: Matt McCutchen <matt@mattmccutchen.net> Tested-by: Matt McCutchen <matt@mattmccutchen.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ext4: fix async i/o writes beyond 4GB to a sparse fileEric Sandeen2010-04-263-5/+5
| | | | | | | | | | | | | | | | | | | | commit a1de02dccf906faba2ee2d99cac56799bda3b96a upstream. The "offset" member in ext4_io_end holds bytes, not blocks, so ext4_lblk_t is wrong - and too small (u32). This caused the async i/o writes to sparse files beyond 4GB to fail when they wrapped around to 0. Also fix up the type of arguments to ext4_convert_unwritten_extents(), it gets ssize_t from ext4_end_aio_dio_nolock() and ext4_ext_direct_IO(). Reported-by: Giel de Nijs <giel@vectorwise.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* xfs: check for more work before sleeping in xfssyncdDave Chinner2010-04-261-3/+3
| | | | | | | | | | | | | | | | | | | commit 20f6b2c785cf187445f126321638ab8ba7aa7494 upstream. xfssyncd processes a queue of work by detaching the queue and then iterating over all the work items. It then sleeps for a time period or until new work comes in. If new work is queued while xfssyncd is actively processing the detached work queue, it will not process that new work until after a sleep timeout or the next work event queued wakes it. Fix this by checking the work queue again before going to sleep. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* xfs: fix locking for inode cache radix tree tag updatesChristoph Hellwig2010-04-262-8/+15
| | | | | | | | | | | | | | | | | | | commit f1f724e4b523d444c5a598d74505aefa3d6844d2 upstream. The radix-tree code requires it's users to serialize tag updates against other updates to the tree. While XFS protects tag updates against each other it does not serialize them against updates of the tree contents, which can lead to tag corruption. Fix the inode cache to always take pag_ici_lock in exclusive mode when updating radix tree tags. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Patrick Schreurs <patrick@news-service.com> Tested-by: Patrick Schreurs <patrick@news-service.com> Signed-off-by: Alex Elder <aelder@sgi.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* xfs: Non-blocking inode locking in IO completionDave Chinner2010-04-261-37/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 77d7a0c2eeb285c9069e15396703d0cb9690ac50 upstream. The introduction of barriers to loop devices has created a new IO order completion dependency that XFS does not handle. The loop device implements barriers using fsync and so turns a log IO in the XFS filesystem on the loop device into a data IO in the backing filesystem. That is, the completion of log IOs in the loop filesystem are now dependent on completion of data IO in the backing filesystem. This can cause deadlocks when a flush daemon issues a log force with an inode locked because the IO completion of IO on the inode is blocked by the inode lock. This in turn prevents further data IO completion from occuring on all XFS filesystems on that CPU (due to the shared nature of the completion queues). This then prevents the log IO from completing because the log is waiting for data IO completion as well. The fix for this new completion order dependency issue is to make the IO completion inode locking non-blocking. If the inode lock can't be grabbed, simply requeue the IO completion back to the work queue so that it can be processed later. This prevents the completion queue from being blocked and allows data IO completion on other inodes to proceed, hence avoiding completion order dependent deadlocks. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ecryptfs: fix error code for missing xattrs in lower fsChristian Pulvermacher2010-04-261-4/+4
| | | | | | | | | | | | | | | commit cfce08c6bdfb20ade979284e55001ca1f100ed51 upstream. If the lower file system driver has extended attributes disabled, ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP. This breaks execution of programs in the ecryptfs mount, since the kernel expects the latter error when checking for security capabilities in xattrs. Signed-off-by: Christian Pulvermacher <pulvermacher@gmx.de> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* eCryptfs: Decrypt symlink target for stat sizeTyler Hicks2010-04-261-48/+52
| | | | | | | | | | | | | | | | | | commit 3a60a1686f0d51c99bd0df8ac93050fb6dfce647 upstream. Create a getattr handler for eCryptfs symlinks that is capable of reading the lower target and decrypting its path. Prior to this patch, a stat's st_size field would represent the strlen of the encrypted path, while readlink() would return the strlen of the decrypted path. This could lead to confusion in some userspace applications, since the two values should be equal. https://bugs.launchpad.net/bugs/524919 Reported-by: Loïc Minier <loic.minier@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inodeJeff Mahoney2010-04-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 133b8f9d632cc23715c6d72d1c5ac449e054a12a upstream. Since tmpfs has no persistent storage, it pins all its dentries in memory so they have d_count=1 when other file systems would have d_count=0. ->lookup is only used to create new dentries. If the caller doesn't instantiate it, it's freed immediately at dput(). ->readdir reads directly from the dcache and depends on the dentries being hashed. When an ecryptfs mount is mounted, it associates the lower file and dentry with the ecryptfs files as they're accessed. When it's umounted and destroys all the in-memory ecryptfs inodes, it fput's the lower_files and d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in 2008 and several months later commit caeeeecf removed the d_delete. I believe the d_drop() needs to be removed as well. The d_drop effectively hides any file that has been accessed via ecryptfs from the underlying tmpfs since it depends on it being hashed for it to be accessible. I've removed the d_drop on my development node and see no ill effects with basic testing on both tmpfs and persistent storage. As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs BUGs on umount. This is due to the dentries being unhashed. tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop the reference pinning the dentry. It skips unhashed and negative dentries, but shrink_dcache_for_umount_subtree doesn't. Since those dentries still have an elevated d_count, we get a BUG(). This patch removes the d_drop call and fixes both issues. This issue was reported at: https://bugzilla.novell.com/show_bug.cgi?id=567887 Reported-by: Árpád Bíró <biroa@demasz.hu> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Dustin Kirkland <kirkland@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 9p: Skip check for mandatory locks when unlockingSachin Prabhu2010-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | commit f78233dd44a110c574fe760ad6f9c1e8741a0d00 upstream. While investigating a bug, I came across a possible bug in v9fs. The problem is similar to the one reported for NFS by ASANO Masahiro in http://lkml.org/lkml/2005/12/21/334. v9fs_file_lock() will skip locks on file which has mode set to 02666. This is a problem in cases where the mode of the file is changed after a process has obtained a lock on the file. Such a lock will be skipped during unlock and the machine will end up with a BUG in locks_remove_flock(). v9fs_file_lock() should skip the check for mandatory locks when unlocking a file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ocfs2: Change bg_chain check for ocfs2_validate_gd_parent.Tao Ma2010-04-261-5/+8
| | | | | | | | | | | | | | | | | | | | | | commit 78c37eb0d5e6a9727b12ea0f1821795ffaa66cfe upstream. In ocfs2_validate_gd_parent, we check bg_chain against the cl_next_free_rec of the dinode. Actually in resize, we have the chance of bg_chain == cl_next_free_rec. So add some additional condition check for it. I also rename paramter "clean_error" to "resize", since the old one is not clearly enough to indicate that we should only meet with this case in resize. btw, the correpsonding bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ocfs2: set i_mode on disk during acl operationsMark Fasheh2010-04-261-5/+72
| | | | | | | | | | | | | | | | | commit fcefd25ac89239cb57fa198f125a79ff85468c75 upstream. ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory inode, but never setting it on the disk copy. Thus, acls were some times not getting propagated between nodes. This patch fixes the issue by adding a helper function ocfs2_acl_set_mode() which does this the right way. ocfs2_set_acl() and ocfs2_init_acl() are then updated to call ocfs2_acl_set_mode(). Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* quota: Fix possible dq_flags corruptionAndrew Perepechko2010-04-261-6/+6
| | | | | | | | | | | | | | commit 08261673cb6dc638c39f44d69b76fffb57b92a8b upstream. dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a change done by an atomic operation can be overwritten by a change done by a non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk. Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* fix NFS4 handling of mountpoint statAl Viro2010-04-261-3/+9
| | | | | | | | | | | commit 462d60577a997aa87c935ae4521bd303733a9f2b upstream. RFC says we need to follow the chain of mounts if there's more than one stacked on that point. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFSv4: fix delegated lockingTrond Myklebust2010-04-262-2/+5
| | | | | | | | | | | | | | | | | | | | | | commit 0df5dd4aae211edeeeb84f7f84f6d093406d7c22 upstream. Arnaud Giersch reports that NFSv4 locking is broken when we hold a delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4: Don't allow posix locking against servers that don't support it). According to Arnaud, the lock succeeds the first time he opens the file (since we cannot do a delegated open) but then fails after we start using delegated opens. The following patch fixes it by ensuring that locking behaviour is governed by a per-filesystem capability flag that is initially set, but gets cleared if the server ever returns an OPEN without the NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set. Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIRTrond Myklebust2010-04-261-1/+1
| | | | | | | | commit 80e60639f1b7c121a7fea53920c5a4b94009361a upstream. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* CIFS: initialize nbytes at the beginning of CIFSSMBWrite()Steve French2010-04-261-1/+2
| | | | | | | | | | | | | | | commit a24e2d7d8f512340991ef0a59cb5d08d491b8e98 upstream. By doing this we always overwrite nbytes value that is being passed on to CIFSSMBWrite() and need not rely on the callers to initialize. CIFSSMBWrite2 is doing this already. Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* cifs: Fix a kernel BUG with remote OS/2 server (try #3)Suresh Jayaraman2010-04-261-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6513a81e9325d712f1bfb9a1d7b750134e49ff18 upstream. While chasing a bug report involving a OS/2 server, I noticed the server sets pSMBr->CountHigh to a incorrect value even in case of normal writes. This results in 'nbytes' being computed wrongly and triggers a kernel BUG at mm/filemap.c. void iov_iter_advance(struct iov_iter *i, size_t bytes) { BUG_ON(i->count < bytes); <--- BUG here Why the server is setting 'CountHigh' is not clear but only does so after writing 64k bytes. Though this looks like the server bug, the client side crash may not be acceptable. The workaround is to mask off high 16 bits if the number of bytes written as returned by the server is greater than the bytes requested by the client as suggested by Jeff Layton. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* raw: fsync method is now requiredAnton Blanchard2010-04-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 55ab3a1ff843e3f0e24d2da44e71bffa5d853010 upstream. Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke the raw driver. We now call through generic_file_aio_write -> generic_write_sync -> vfs_fsync_range. vfs_fsync_range has: if (!fop || !fop->fsync) { ret = -EINVAL; goto out; } But drivers/char/raw.c doesn't set an fsync method. We have two options: fix it or remove the raw driver completely. I'm happy to do either, the fact this has been broken for so long suggests it is rarely used. The patch below adds an fsync method to the raw driver. My knowledge of the block layer is pretty sketchy so this could do with a once over. If we instead decide to remove the raw driver, this patch might still be useful as a backport to 2.6.33 and 2.6.32. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* reiserfs: Fix locking BUG during mount failureJeff Mahoney2010-04-261-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit b7b7fa43103a9fb30dbcc60cbd5161fdfc25f904 upstream. Commit 8ebc423238341b52912c7295b045a32477b33f09 (reiserfs: kill-the-BKL) introduced a bug in the mount failure case. The error label releases the lock before calling journal_release_error, but it requires that the lock be held. do_journal_release unlocks and retakes it. When it releases it without it held, we trigger a BUG(). The error_alloc label skips the unlock since the lock isn't held yet but none of the other conditions that are clean up exist yet either. This patch returns immediately after the kzalloc failure and moves the reiserfs_write_unlock after the journal_release_error call. This was reported in https://bugzilla.novell.com/show_bug.cgi?id=591807 Reported-by: Thomas Siedentopf <thomas.siedentopf@novell.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Thomas Siedentopf <thomas.siedentopf@novell.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* oom: fix the unsafe usage of badness() in proc_oom_score()Oleg Nesterov2010-04-261-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | commit b95c35e76b29ba812e5dabdd91592e25ec640e93 upstream. proc_oom_score(task) has a reference to task_struct, but that is all. If this task was already released before we take tasklist_lock - we can't use task->group_leader, it points to nowhere - it is not safe to call badness() even if this task is ->group_leader, has_intersects_mems_allowed() assumes it is safe to iterate over ->thread_group list. - even worse, badness() can hit ->signal == NULL Add the pid_alive() check to ensure __unhash_process() was not called. Also, use "task" instead of task->group_leader. badness() should return the same result for any sub-thread. Currently this is not true, but this should be changed anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* fat: fix buffer overflow in vfat_create_shortname()Nikolaus Schulz2010-04-261-3/+3
| | | | | | | | | | | | | | | | | | | commit 30d1872d9eb3663b4cf7bdebcbf5cd465674cced upstream. When using the string representation of a random counter as part of the base name, ensure that it is no longer than 4 bytes. Since we are repeatedly decrementing the counter in a loop until we have found a unique base name, the counter may wrap around zero; therefore, it is not enough to mask its higher bits before entering the loop, this must be done inside the loop. [hirofumi@mail.parknet.co.jp: use snprintf()] Signed-off-by: Nikolaus Schulz <microschulz@web.de> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>