summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2012-06-024-9/+19
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Make syn floods consume significantly less resources by a) Not pre-COW'ing routing metrics for SYN/ACKs b) Mirroring the device queue mapping of the SYN for the SYN/ACK reply. Both from Eric Dumazet. 2) Fix calculation errors in Byte Queue Limiting, from Hiroaki SHIMODA. 3) Validate the length requested when building a paged SKB for a socket, so we don't overrun the page vector accidently. From Jason Wang. 4) When netlabel is disabled, we abort all IP option processing when we see a CIPSO option. This isn't the right thing to do, we should simply skip over it and continue processing the remaining options (if any). Fix from Paul Moore. 5) SRIOV fixes for the mellanox driver from Jack orgenstein and Marcel Apfelbaum. 6) 8139cp enables the receiver before the ring address is properly programmed, which potentially lets the device crap over random memory. Fix from Jason Wang. 7) e1000/e1000e fixes for i217 RST handling, and an improper buffer address reference in jumbo RX frame processing from Bruce Allan and Sebastian Andrzej Siewior, respectively. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: fec_mpc52xx: fix timestamp filtering mcs7830: Implement link state detection e1000e: fix Rapid Start Technology support for i217 e1000: look into the page instead of skb->data for e1000_tbi_adjust_stats() r8169: call netif_napi_del at errpaths and at driver unload tcp: reflect SYN queue_mapping into SYNACK packets tcp: do not create inetpeer on SYNACK message 8139cp/8139too: terminate the eeprom access with the right opmode 8139cp: set ring address before enabling receiver cipso: handle CIPSO options correctly when NetLabel is disabled net: sock: validate data_len before allocating skb in sock_alloc_send_pskb() bql: Avoid possible inconsistent calculation. bql: Avoid unneeded limit decrement. bql: Fix POSDIFF() to integer overflow aware. net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap net/mlx4_core: Fixes for VF / Guest startup flow net/mlx4_en: Fix improper use of "port" parameter in mlx4_en_event net/mlx4_core: Fix number of EQs used in ICM initialisation net/mlx4_core: Fix the slave_id out-of-range test in mlx4_eq_int
| * tcp: reflect SYN queue_mapping into SYNACK packetsEric Dumazet2012-06-012-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing how linux behaves on SYNFLOOD attack on multiqueue device (ixgbe), I found that SYNACK messages were dropped at Qdisc level because we send them all on a single queue. Obvious choice is to reflect incoming SYN packet @queue_mapping to SYNACK packet. Under stress, my machine could only send 25.000 SYNACK per second (for 200.000 incoming SYN per second). NIC : ixgbe with 16 rx/tx queues. After patch, not a single SYNACK is dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: do not create inetpeer on SYNACK messageEric Dumazet2012-06-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Another problem on SYNFLOOD/DDOS attack is the inetpeer cache getting larger and larger, using lots of memory and cpu time. tcp_v4_send_synack() ->inet_csk_route_req() ->ip_route_output_flow() ->rt_set_nexthop() ->rt_init_metrics() ->inet_getpeer( create = true) This is a side effect of commit a4daad6b09230 (net: Pre-COW metrics for TCP) added in 2.6.39 Possible solution : Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS Before patch : # grep peer /proc/slabinfo inet_peer_cache 4175430 4175430 192 42 2 : tunables 0 0 0 : slabdata 99415 99415 0 Samples: 41K of event 'cycles', Event count (approx.): 30716565122 + 20,24% ksoftirqd/0 [kernel.kallsyms] [k] inet_getpeer + 8,19% ksoftirqd/0 [kernel.kallsyms] [k] peer_avl_rebalance.isra.1 + 4,81% ksoftirqd/0 [kernel.kallsyms] [k] sha_transform + 3,64% ksoftirqd/0 [kernel.kallsyms] [k] fib_table_lookup + 2,36% ksoftirqd/0 [ixgbe] [k] ixgbe_poll + 2,16% ksoftirqd/0 [kernel.kallsyms] [k] __ip_route_output_key + 2,11% ksoftirqd/0 [kernel.kallsyms] [k] kernel_map_pages + 2,11% ksoftirqd/0 [kernel.kallsyms] [k] ip_route_input_common + 2,01% ksoftirqd/0 [kernel.kallsyms] [k] __inet_lookup_established + 1,83% ksoftirqd/0 [kernel.kallsyms] [k] md5_transform + 1,75% ksoftirqd/0 [kernel.kallsyms] [k] check_leaf.isra.9 + 1,49% ksoftirqd/0 [kernel.kallsyms] [k] ipt_do_table + 1,46% ksoftirqd/0 [kernel.kallsyms] [k] hrtimer_interrupt + 1,45% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc + 1,29% ksoftirqd/0 [kernel.kallsyms] [k] inet_csk_search_req + 1,29% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb + 1,16% ksoftirqd/0 [kernel.kallsyms] [k] copy_user_generic_string + 1,15% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free + 1,02% ksoftirqd/0 [kernel.kallsyms] [k] tcp_make_synack + 0,93% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh + 0,87% ksoftirqd/0 [kernel.kallsyms] [k] __call_rcu + 0,84% ksoftirqd/0 [kernel.kallsyms] [k] rt_garbage_collect + 0,84% ksoftirqd/0 [kernel.kallsyms] [k] fib_rules_lookup Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()Jason Wang2012-05-311-2/+5
| | | | | | | | | | | | | | | | | | We need to validate the number of pages consumed by data_len, otherwise frags array could be overflowed by userspace. So this patch validate data_len and return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tty: Revert the tty locking series, it needs more workLinus Torvalds2012-06-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts the tty layer change to use per-tty locking, because it's not correct yet, and fixing it will require some more deep surgery. The main revert is d29f3ef39be4 ("tty_lock: Localise the lock"), but there are several smaller commits that built upon it, they also get reverted here. The list of reverted commits is: fde86d310886 - tty: add lockdep annotations 8f6576ad476b - tty: fix ldisc lock inversion trace d3ca8b64b97e - pty: Fix lock inversion b1d679afd766 - tty: drop the pty lock during hangup abcefe5fc357 - tty/amiserial: Add missing argument for tty_unlock() fd11b42e3598 - cris: fix missing tty arg in wait_event_interruptible_tty call d29f3ef39be4 - tty_lock: Localise the lock The revert had a trivial conflict in the 68360serial.c staging driver that got removed in the meantime. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2012-06-011-2/+0
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs changes from Al Viro. "A lot of misc stuff. The obvious groups: * Miklos' atomic_open series; kills the damn abuse of ->d_revalidate() by NFS, which was the major stumbling block for all work in that area. * ripping security_file_mmap() and dealing with deadlocks in the area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in general. * ->encode_fh() switched to saner API; insane fake dentry in mm/cleancache.c gone. * assorted annotations in fs (endianness, __user) * parts of Artem's ->s_dirty work (jff2 and reiserfs parts) * ->update_time() work from Josef. * other bits and pieces all over the place. Normally it would've been in two or three pull requests, but signal.git stuff had eaten a lot of time during this cycle ;-/" Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the 'truncate_range' inode method was removed by the VM changes, the VFS update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due to sparse fix added twice, with other changes nearby). * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits) nfs: don't open in ->d_revalidate vfs: retry last component if opening stale dentry vfs: nameidata_to_filp(): don't throw away file on error vfs: nameidata_to_filp(): inline __dentry_open() vfs: do_dentry_open(): don't put filp vfs: split __dentry_open() vfs: do_last() common post lookup vfs: do_last(): add audit_inode before open vfs: do_last(): only return EISDIR for O_CREAT vfs: do_last(): check LOOKUP_DIRECTORY vfs: do_last(): make ENOENT exit RCU safe vfs: make follow_link check RCU safe vfs: do_last(): use inode variable vfs: do_last(): inline walk_component() vfs: do_last(): make exit RCU safe vfs: split do_lookup() Btrfs: move over to use ->update_time fs: introduce inode operation ->update_time reiserfs: get rid of resierfs_sync_super reiserfs: mark the superblock as dirty a bit later ...
| * | sch_atm.c: get rid of poinless externAl Viro2012-06-011-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | sockfd_lookup() is declared in linux/net.h, which is pulled by linux/skbuff.h (and needed for a lot of other stuff in sch_atm.c anyway). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2012-06-016-58/+75
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull the rest of the nfsd commits from Bruce Fields: "... and then I cherry-picked the remainder of the patches from the head of my previous branch" This is the rest of the original nfsd branch, rebased without the delegation stuff that I thought really needed to be redone. I don't like rebasing things like this in general, but in this situation this was the lesser of two evils. * 'for-3.5' of git://linux-nfs.org/~bfields/linux: (50 commits) nfsd4: fix, consolidate client_has_state nfsd4: don't remove rebooted client record until confirmation nfsd4: remove some dprintk's and a comment nfsd4: return "real" sequence id in confirmed case nfsd4: fix exchange_id to return confirm flag nfsd4: clarify that renewing expired client is a bug nfsd4: simpler ordering of setclientid_confirm checks nfsd4: setclientid: remove pointless assignment nfsd4: fix error return in non-matching-creds case nfsd4: fix setclientid_confirm same_cred check nfsd4: merge 3 setclientid cases to 2 nfsd4: pull out common code from setclientid cases nfsd4: merge last two setclientid cases nfsd4: setclientid/confirm comment cleanup nfsd4: setclientid remove unnecessary terms from a logical expression nfsd4: move rq_flavor into svc_cred nfsd4: stricter cred comparison for setclientid/exchange_id nfsd4: move principal name into svc_cred nfsd4: allow removing clients not holding state nfsd4: rearrange exchange_id logic to simplify ...
| * | | nfsd4: move rq_flavor into svc_credJ. Bruce Fields2012-05-312-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the rq_flavor into struct svc_cred, and use it in setclientid and exchange_id comparisons as well. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: move principal name into svc_credJ. Bruce Fields2012-05-312-19/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of keeping the principal name associated with a request in a structure that's private to auth_gss and using an accessor function, move it to svc_cred. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | svcrpc: fix a comment typoJ. Bruce Fields2012-05-311-1/+1
| | | | | | | | | | | | | | | | Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | sunrpc: do array overrun check in svc_recv before allocating pagesJeff Layton2012-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's little point in waiting until after we allocate all of the pages to see if we're going to overrun the array. In the event that this calculation is really off we could end up scribbling over a bunch of memory and make it tougher to debug. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | SUNRPC: move per-net operations from svc_destroy()Stanislav Kinsbursky2012-05-311-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea is to separate service destruction and per-net operations, because these are two different things and the mix looks ugly. Notes: 1) For NFS server this patch looks ugly (sorry for that). But these place will be rewritten soon during NFSd containerization. 2) LockD per-net counter increase int lockd_up() was moved prior to make_socks() to make lockd_down_net() call safe in case of error. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | SUNRPC: new svc_bind() routine introducedStanislav Kinsbursky2012-05-312-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This new routine is responsible for service registration in a specified network context. The idea is to separate service creation from per-net operations. Note also: since registering service with svc_bind() can fail, the service will be destroyed and during destruction it will try to unregister itself from rpcbind. In this case unregistration has to be skipped. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | rpc: handle rotated gss data for Windows interoperabilityJ. Bruce Fields2012-05-311-16/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The data in Kerberos gss tokens can be rotated. But we were lazy and rejected any nonzero rotation value. It wasn't necessary for the implementations we were testing against at the time. But it appears that Windows does use a nonzero value here. So, implement rotation to bring ourselves into compliance with the spec and to interoperate with Windows. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | Merge branch 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2012-05-312-45/+68
|\| | | | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd update from Bruce Fields. * 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux: (23 commits) nfsd: trivial: use SEEK_SET instead of 0 in vfs_llseek SUNRPC: split upcall function to extract reusable parts nfsd: allocate id-to-name and name-to-id caches in per-net operations. nfsd: make name-to-id cache allocated per network namespace context nfsd: make id-to-name cache allocated per network namespace context nfsd: pass network context to idmap init/exit functions nfsd: allocate export and expkey caches in per-net operations. nfsd: make expkey cache allocated per network namespace context nfsd: make export cache allocated per network namespace context nfsd: pass pointer to export cache down to stack wherever possible. nfsd: pass network context to export caches init/shutdown routines Lockd: pass network namespace to creation and destruction routines NFSd: remove hard-coded dereferences to name-to-id and id-to-name caches nfsd: pass pointer to expkey cache down to stack wherever possible. nfsd: use hash table from cache detail in nfsd export seq ops nfsd: pass svc_export_cache pointer as private data to "exports" seq file ops nfsd: use exp_put() for svc_export_cache put nfsd: use cache detail pointer from svc_export structure on cache put nfsd: add link to owner cache detail to svc_export structure nfsd: use passed cache_detail pointer expkey_parse() ...
| * | SUNRPC: split upcall function to extract reusable partsSimo Sorce2012-04-251-36/+64
| | | | | | | | | | | | | | | | | | | | | | | | This is needed to share code between the current server upcall mechanism and the new gssproxy upcall mechanism introduced in a following patch. Signed-off-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | nfsd: make expkey cache allocated per network namespace contextStanislav Kinsbursky2012-04-121-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch also changes svcauth_unix_purge() function: added network namespace as a parameter and thus loop over all networks was replaced by only one call for ip map cache purge. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2012-05-3114-61/+122
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking changes from David S. Miller: 1) Fix IPSEC header length calculation for transport mode in ESP. The issue is whether to do the calculation before or after alignment. Fix from Benjamin Poirier. 2) Fix regression in IPV6 IPSEC fragment length calculations, from Gao Feng. This is another transport vs tunnel mode issue. 3) Handle AF_UNSPEC connect()s properly in L2TP to avoid OOPSes. Fix from James Chapman. 4) Fix USB ASIX driver's reception of full sized VLAN packets, from Eric Dumazet. 5) Allow drop monitor (and, more generically, all generic netlink protocols) to be automatically loaded as a module. From Neil Horman. Fix up trivial conflict in Documentation/feature-removal-schedule.txt due to new entries added next to each other at the end. As usual. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net/smsc911x: Repair broken failure paths virtio-net: remove useless disable on freeze netdevice: Update netif_dbg for CONFIG_DYNAMIC_DEBUG drop_monitor: Add module alias to enable automatic module loading genetlink: Build a generic netlink family module alias net: add MODULE_ALIAS_NET_PF_PROTO_NAME r6040: Do a Proper deinit at errorpath and also when driver unloads (calling r6040_remove_one) r6040: disable pci device if the subsequent calls (after pci_enable_device) fails skb: avoid unnecessary reallocations in __skb_cow net: sh_eth: fix the rxdesc pointer when rx descriptor empty happens asix: allow full size 8021Q frames to be received rds_rdma: don't assume infiniband device is PCI l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case mac80211: fix ADDBA declined after suspend with wowlan wlcore: fix undefined symbols when CONFIG_PM is not defined mac80211: fix flag check for QoS NOACK frames ath9k_hw: apply internal regulator settings on AR933x ath9k_hw: update AR933x initvals to fix issues with high power devices ath9k: fix a use-after-free-bug when ath_tx_setup_buffer() fails ath9k: stop rx dma before stopping tx ...
| * \ \ Merge branch 'master' of ↵John W. Linville2012-05-303-8/+9
| |\ \ \ | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
| | * | | mac80211: fix ADDBA declined after suspend with wowlanEyal Shapira2012-05-291-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WLAN_STA_BLOCK_BA is set while suspending but doesn't get cleared when resuming in case of wowlan. This causes further ADDBA requests received to be rejected. Fix it by clearing it in the wowlan path as well. Signed-off-by: Eyal Shapira <eyal@wizery.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Cc: stable@vger.kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | mac80211: fix flag check for QoS NOACK framesClaudio Pisa2012-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Claudio Pisa <claudio.pisa@uniroma2.it> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | mac80211: Fix race in checking AP status by sending null frameSoumik Das2012-05-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mac80211 tries to verify the existence of the current AP by probing or sending a NULL frame in function ieee80211_mgd_probe_ap_send. It 1st sends a null frame to the AP, increments probe_send_count and waits for the ACK to the NULL frame for a finite duration of time. At times, it happens that by the time mac80211 gets to increment probe_send_count, the ACK for the NULL frame transmitted has already been processed. This leads to a race condition where mac80211 times out waiting for the ACK for the NULL frame causing unnecessary disconnection with the AP. Signed-off-by: Soumik Das <soumik.das@stericsson.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | drop_monitor: Add module alias to enable automatic module loadingNeil Horman2012-05-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have module alias macros for generic netlink families, lets use those to mark modules with the appropriate family names for loading Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | genetlink: Build a generic netlink family module aliasNeil Horman2012-05-292-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generic netlink searches for -type- formatted aliases when requesting a module to fulfill a protocol request (i.e. net-pf-16-proto-16-type-<x>, where x is a type value). However generic netlink protocols have no well defined type numbers, they have string names. Modify genl_ctrl_getfamily to request an alias in the format net-pf-16-proto-16-family-<x> instead, where x is a generic string, and add a macro that builds on the previously added MODULE_ALIAS_NET_PF_PROTO_NAME macro to allow modules to specifify those generic strings. Note, l2tp previously hacked together an net-pf-16-proto-16-type-l2tp alias using the MODULE_ALIAS macro, with these updates we can convert that to use the PROTO_NAME macro. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: James Chapman <jchapman@katalix.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | rds_rdma: don't assume infiniband device is PCIThadeu Lima de Souza Cascardo2012-05-291-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RDS code assumes that the struct ib_device dma_device member, which is a pointer, points to a struct device embedded in a struct pci_dev. This is not the case for ehca, for example, which is a OF driver, and makes dma_device point to a struct device embedded in a struct platform_device. This will make the system crash when rds_rdma is loaded in a system with ehca, since it will try to access the bus member of a non-existent struct pci_dev. The only reason rds_rdma uses the struct pci_dev is to get the NUMA node the device is attached to. Using dev_to_node for that is much better, since it won't assume which bus the infiniband is attached to. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Cc: dledford@redhat.com Cc: Jes.Sorensen@redhat.com Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC caseJames Chapman2012-05-292-3/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An application may call connect() to disconnect a socket using an address with family AF_UNSPEC. The L2TP IP sockets were not handling this case when the socket is not bound and an attempt to connect() using AF_UNSPEC in such cases would result in an oops. This patch addresses the problem by protecting the sk_prot->disconnect() call against trying to unhash the socket before it is bound. The L2TP IPv4 and IPv6 sockets have the same problem. Both are fixed by this patch. The patch also adds more checks that the sockaddr supplied to bind() and connect() calls is valid. RIP: 0010:[<ffffffff82e133b0>] [<ffffffff82e133b0>] inet_unhash+0x50/0xd0 RSP: 0018:ffff88001989be28 EFLAGS: 00010293 Stack: ffff8800407a8000 0000000000000000 ffff88001989be78 ffffffff82e3a249 ffffffff82e3a050 ffff88001989bec8 ffff88001989be88 ffff8800407a8000 0000000000000010 ffff88001989bec8 ffff88001989bea8 ffffffff82e42639 Call Trace: [<ffffffff82e3a249>] udp_disconnect+0x1f9/0x290 [<ffffffff82e42639>] inet_dgram_connect+0x29/0x80 [<ffffffff82d012fc>] sys_connect+0x9c/0x100 Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | ipv6: fix incorrect ipsec fragmentGao feng2012-05-272-18/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit ad0081e43a "ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed" the fragment of packets is incorrect. because tunnel mode needs IPsec headers and trailer for all fragments, while on transport mode it is sufficient to add the headers to the first fragment and the trailer to the last. so modify mtu and maxfraglen base on ipsec mode and if fragment is first or last. with my test,it work well(every fragment's size is the mtu) and does not trigger slow fragment path. Changes from v1: though optimization, mtu_prev and maxfraglen_prev can be delete. replace xfrm mode codes with dst_entry's new frag DST_XFRM_TUNNEL. add fuction ip6_append_data_mtu to make codes clearer. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | xfrm: take net hdr len into account for esp payload size calculationBenjamin Poirier2012-05-272-26/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Corrects the function that determines the esp payload size. The calculations done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for certain mtu values and suboptimal frames for others. According to what is done, mainly in esp{,6}_output() and tcp_mtu_to_mss(), net_header_len must be taken into account before doing the alignment calculation. Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net/wanrouter: Deprecate and schedule for removalJoe Perches2012-05-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No one uses this on current kernels anymore. Let it be known it's going to be removed eventually. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds2012-05-307-276/+235
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Sage Weil: "There are some updates and cleanups to the CRUSH placement code, a bug fix with incremental maps, several cleanups and fixes from Josh Durgin in the RBD block device code, a series of cleanups and bug fixes from Alex Elder in the messenger code, and some miscellaneous bounds checking and gfp cleanups/fixes." Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the networking people preferring "unsigned int" over just "unsigned". * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits) libceph: fix pg_temp updates libceph: avoid unregistering osd request when not registered ceph: add auth buf in prepare_write_connect() ceph: rename prepare_connect_authorizer() ceph: return pointer from prepare_connect_authorizer() ceph: use info returned by get_authorizer ceph: have get_authorizer methods return pointers ceph: ensure auth ops are defined before use ceph: messenger: reduce args to create_authorizer ceph: define ceph_auth_handshake type ceph: messenger: check return from get_authorizer ceph: messenger: rework prepare_connect_authorizer() ceph: messenger: check prepare_write_connect() result ceph: don't set WRITE_PENDING too early ceph: drop msgr argument from prepare_write_connect() ceph: messenger: send banner in process_connect() ceph: messenger: reset connection kvec caller libceph: don't reset kvec in prepare_write_banner() ceph: ignore preferred_osd field ceph: fully initialize new layout ...
| * | | | | libceph: fix pg_temp updatesSage Weil2012-05-211-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Usually, we are adding pg_temp entries or removing them. Occasionally they update. In that case, osdmap_apply_incremental() was failing because the rbtree entry already exists. Fix by removing the existing entry before inserting a new one. Fixes http://tracker.newdream.net/issues/2446 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
| * | | | | libceph: avoid unregistering osd request when not registeredSage Weil2012-05-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race between two __unregister_request() callers: the reply path and the ceph_osdc_wait_request(). If we get a reply *and* the timeout expires at roughly the same time, both callers will try to unregister the request, and the second one will do bad things. Simply check if the request is still already unregistered; if so, return immediately and do nothing. Fixes http://tracker.newdream.net/issues/2420 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
| * | | | | ceph: add auth buf in prepare_write_connect()Alex Elder2012-05-181-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the addition of the authorizer buffer to a connection's out_kvec out of get_connect_authorizer() and into its caller. This way, the caller--prepare_write_connect()--can avoid adding the connect header to out_kvec before it has been fully initialized. Prior to this patch, it was possible for a connect header to be sent over the wire before the authorizer protocol or buffer length fields were initialized. An authorizer buffer associated with that header could also be queued to send only after the connection header that describes it was on the wire. Fixes http://tracker.newdream.net/issues/2424 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: rename prepare_connect_authorizer()Alex Elder2012-05-171-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the name of prepare_connect_authorizer(). The next patch is going to make this function no longer add anything to the connection's out_kvec, so it will no longer fit the pattern of the rest of the prepare_connect_*() functions. In addition, pass the address of a variable that will hold the authorization protocol to use. Move the assignment of that to the connection's out_connect structure into prepare_write_connect(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: return pointer from prepare_connect_authorizer()Alex Elder2012-05-171-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change prepare_connect_authorizer() so it returns a pointer (or pointer-coded error). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: use info returned by get_authorizerAlex Elder2012-05-172-14/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than passing a bunch of arguments to be filled in with the content of the ceph_auth_handshake buffer now returned by the get_authorizer method, just use the returned information in the caller, and drop the unnecessary arguments. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: have get_authorizer methods return pointersAlex Elder2012-05-172-11/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Have the get_authorizer auth_client method return a ceph_auth pointer rather than an integer, pointer-encoding any returned error value. This is to pave the way for making use of the returned value in an upcoming patch. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: ensure auth ops are defined before useAlex Elder2012-05-171-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the create_authorizer method for both the mds and osd clients, the auth_client->ops pointer is blindly dereferenced. There is no obvious guarantee that this pointer has been assigned. And furthermore, even if the ops pointer is non-null there is definitely no guarantee that the create_authorizer or destroy_authorizer methods are defined. Add checks in both routines to make sure they are defined (non-null) before use. Add similar checks in a few other spots in these files while we're at it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: messenger: reduce args to create_authorizerAlex Elder2012-05-173-32/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make use of the new ceph_auth_handshake structure in order to reduce the number of arguments passed to the create_authorizor method in ceph_auth_client_ops. Use a local variable of that type as a shorthand in the get_authorizer method definitions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: define ceph_auth_handshake typeAlex Elder2012-05-171-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The definitions for the ceph_mds_session and ceph_osd both contain five fields related only to "authorizers." Encapsulate those fields into their own struct type, allowing for better isolation in some upcoming patches. Fix the #includes in "linux/ceph/osd_client.h" to lay out their more complete canonical path. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: messenger: check return from get_authorizerAlex Elder2012-05-171-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In prepare_connect_authorizer(), a connection's get_authorizer method is called but ignores its return value. This function can return an error, so check for it and return it if that ever occurs. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: messenger: rework prepare_connect_authorizer()Alex Elder2012-05-171-9/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change prepare_connect_authorizer() so it returns without dropping the connection mutex if the connection has no get_authorizer method. Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning authorization protocols. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: messenger: check prepare_write_connect() resultAlex Elder2012-05-171-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | prepare_write_connect() can return an error, but only one of its callers checks for it. All the rest are in functions that already return errors, so it should be fine to return the error if one gets returned. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: don't set WRITE_PENDING too earlyAlex Elder2012-05-171-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | prepare_write_connect() prepares a connect message, then sets WRITE_PENDING on the connection. Then *after* this, it calls prepare_connect_authorizer(), which updates the content of the connection buffer already queued for sending. It's also possible it will result in prepare_write_connect() returning -EAGAIN despite the WRITE_PENDING big getting set. Fix this by preparing the connect authorizer first, setting the WRITE_PENDING bit only after that is done. Partially addresses http://tracker.newdream.net/issues/2424 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: drop msgr argument from prepare_write_connect()Alex Elder2012-05-171-14/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In all cases, the value passed as the msgr argument to prepare_write_connect() is just con->msgr. Just get the msgr value from the ceph connection and drop the unneeded argument. The only msgr passed to prepare_write_banner() is also therefore just the one from con->msgr, so change that function to drop the msgr argument as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: messenger: send banner in process_connect()Alex Elder2012-05-171-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | prepare_write_connect() has an argument indicating whether a banner should be sent out before sending out a connection message. It's only ever set in one of its callers, so move the code that arranges to send the banner into that caller and drop the "include_banner" argument from prepare_write_connect(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: messenger: reset connection kvec callerAlex Elder2012-05-171-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reset a connection's kvec fields in the caller rather than in prepare_write_connect(). This ends up repeating a few lines of code but it's improving the separation between distinct operations on the connection, which we can take advantage of later. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | libceph: don't reset kvec in prepare_write_banner()Alex Elder2012-05-171-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the kvec reset for a connection out of prepare_write_banner and into its only caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | ceph: messenger: change read_partial() to take "end" argAlex Elder2012-05-141-22/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the second argument to read_partial() be the ending input byte position rather than the beginning offset it now represents. This amounts to moving the addition "to + size" into the caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>