glusterfs.git/xlators/protocol/server/src/server.h, branch v3.3.0qa30

protocol/server: Handle server send reply failure gracefully.

2012-03-19T15:11:43+00:00

Server send reply failure should not call server connection cleanup because
if a reconnection happens with in the grace-timeout the connection object is
reused. We must cleanup only on grace-timeout.

Change-Id: I7d171a863382646ff392031c2b845fe4f0d3d5dc
BUG: 803365
Signed-off-by: Mohammed Junaid 
Reviewed-on: http://review.gluster.com/2947
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

protocol/server: Clear internal locks on disconnect

2012-03-18T08:09:43+00:00

If there is a disconnect observed on the client when the
inode/entry unlock is issued, but the reconnection to server
happens with in the grace-time period the inode/entry lk will
live and the unlock will never come from that client.
The internal locks should be cleared on disconnect.

Change-Id: Ib45b1035cfe3b1de381ef3b331c930011e7403be
BUG: 803209
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.com/2966
Reviewed-by: Anand Avati 
Tested-by: Gluster Build System

protocol/server: Avoid race in add/del locker, connection_cleanup

2012-03-18T06:22:24+00:00

conn->ltable address keeps changing in
server_connection_cleanup every time it is called.
i.e. New ltable is created every time it is called.
Here is the race that happened:
---------------------------------------------------
thread-1                        | thread-2
add_locker is called with       |
conn->ltable. lets call the     |
ltable address lt1              |
                                | connection cleanup is called
                                | and do_lock_table_cleanup is
                                | triggered for lt1. locker
                                | lists are splice_inited under
                                | the lt1->lock
lt1 adds the locker under       |
lt1->lock (lets call this l1)   |
                                | GF_FREE(lt1) happens in
                                | do_lock_table_cleanup

The locker l1 that is added just before lt1 is freed will never
be cleared in the subsequent server_connection_cleanups as there
does not exist a reference to the locker. The stale lock remains
in the locks xlator even though the transport on which it was
issued is destroyed.

Change-Id: I0a02f16c703d1e7598b083aa1057cda9624eb3fe
BUG: 787601
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.com/2957
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy 
Reviewed-by: Amar Tumballi 
Reviewed-by: Anand Avati

protocol/server: Remove connection from conf->conns w.o. race

2012-03-13T11:58:45+00:00

1) Adding the connection to conf->conns used to
happen in conf->mutex, but removing happened under conn->lock.
Fixed that as below.
When the connection object is created conn's ref, bind_ref count
is set to '1'. For bind_ref ref/unref happens under conf->mutex
whenever server_connection_get, put is called.
When bind_ref goes to '0' connection object is removed from
conf->conns under conf->mutex.  After it is removed from the list,
conn_unref is called outside the conf->mutex.
conn_ref/unref still happens under conn->lock.

2) Fixed races in server_connection_cleaup in grace_timer_handler
and server_setvolume.

Change-Id: Ie7b63b10f658af909a11c3327066667f5b7bd114
BUG: 801675
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.com/2911
Tested-by: Gluster Build System 
Reviewed-by: Amar Tumballi 
Reviewed-by: Vijay Bellur

protocol/server: Make conn object ref-counted

2012-03-01T17:12:10+00:00

Change-Id: I992a7f8a75edfe7d75afaa1abe0ad45e8f351c8b
BUG: 796581
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.com/2806
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

transport/socket: configuring tcp window-size

2012-02-29T10:10:44+00:00

Till now, send and recieve buffer window sizes for sockets
were set to a default glusterfs-specific value.
Linux's default window sizes have been found to be better
w.r.t performance, and hence, no more setting it to any
default value.

However, if one wishes, there's the new configuration option:
   network.tcp-window-size 
which takes a size value (int or human readable) and will set
the window size of sockets for both clients and servers.
Nfs clients will also be updated with the same.

Change-Id: I841479bbaea791b01086c42f58401ed297ff16ea
BUG: 795635
Signed-off-by: Rajesh Amaravathi 
Reviewed-on: http://review.gluster.com/2821
Tested-by: Gluster Build System 
Reviewed-by: Amar Tumballi 
Reviewed-by: Jeff Darcy 
Reviewed-by: Vijay Bellur

protocol/client,server: fcntl lock self healing.

2012-02-20T12:45:31+00:00

Currently(with out this patch), on a disconnect the server cleans up
the transport which inturn closes the fd's and releases the locks acquired on
those fd's by that client. On a reconnect, client just reopens the fd's but
doesn't reacquire the locks. The application that had previously acquired
the locks still is under the assumption that it is the owner of those locks
which might have been granted to other clients(if they request) by the server
leading to data corruption.

This patch allows the client to reacquire the fcntl locks (held on the fd's)
during client-server handshake.

* The server identifies the client via process-uuid-xl (which is a combination
  of uuid and client-protocol name, it is assumed to be unique) and lk-version
  number.

* The client maintains a list of process-uuid-xl, lk-version pair for each
  accepted connection. On a connect, the server traverses the list for a
  matching pair, if a matching pair is not found the the server returns
  lk-version with value 0, else it returns the lk-version it has in store.

* On a disconnect, the server and client enter grace period, and on the
  completion of the grace period, the client bumps up its lk-version number
  (which means, it will reacquire the locks the next time) and the server will
  distroy the connection. If reconnection happens within the grace period, the
  server will find the matching (process-uuid-xl, lk-version) pair in its list
  which guarantees that the fd's and there corresponding locks are still valid
  for this client.

Configurable options:
  To set grace-timeout, the following options are
    option server.grace-timeout value
    option client.grace-timeout value

  To enable or disable the lk-heal,
    option lk-heal [on|off]

gluster volume set command can be used to configurable options
Change-Id: Id677ef1087b300d649f278b8b2aa0d94eae85ed2
BUG: 795386
Signed-off-by: Mohammed Junaid 
Reviewed-on: http://review.gluster.com/2766
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

core: get xattrs also as part of readdirp

2012-01-25T10:03:44+00:00

readdirp_req() call sends a dict_t * as an argument, which
contains all the xattr keys for which the entries got in
readdirp_rsp() are having xattr value filled dictionary.

Change-Id: I8b7e1290740ea3e884e67d19156ce849227167c0
Signed-off-by: Amar Tumballi 
BUG: 765785
Reviewed-on: http://review.gluster.com/771
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

core: change lk-owner as a 1k buffer

2012-01-25T04:14:17+00:00

so, NLM can send the lk-owner field directly to the locks translators,
while doing the same effort, also enabled sending maximum of 500 aux gid
over protocol.

Change-Id: I87c2514392748416f7ffe21d5154faad2e413969
Signed-off-by: Amar Tumballi 
BUG: 767229
Reviewed-on: http://review.gluster.com/779
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

core: GFID filehandle based backend and anonymous FDs

2012-01-20T13:03:42+00:00

1. What
--------
This change introduces an infrastructure change in the filesystem
which lets filesystem operation address objects (inodes) just by its
GFID. Thus far GFID has been a unique identifier of a user-visible
inode. But in terms of addressability the only mechanism thus far has
been the backend filesystem path, which could be derived from the
GFID only if it was cached in the inode table along with the entire set
of dentry ancestry leading up to the root.

This change essentially decouples addressability from the namespace. It
is no more necessary to be aware of the parent directory to address a
file or directory.

2. Why
-------
The biggest use case for such a feature is NFS for generating
persistent filehandles. So far the technique for generating filehandles
in NFS has been to encode path components so that the appropriate
inode_t can be repopulated into the inode table by means of a recursive
lookup of each component top-down.

Another use case is the ability to perform more intelligent self-healing
and rebalancing of inodes with hardlinks and also to detect renames.

A derived feature from GFID filehandles is anonymous FDs. An anonymous FD
is an internal USABLE "fd_t" which does not map to a user opened file
descriptor or to an internal ->open()'d fd. The ability to address a file
by the GFID eliminates the need to have a persistent ->open()'d fd for the
purpose of avoiding the namespace. This improves NFS read/write performance
significantly eliminating open/close calls and also fixes some of today's
limitations (like keeping an FD open longer than necessary resulting
in disk space leakage)

3. How
-------

At each storage/posix translator level, every file is hardlinked inside
a hidden .glusterfs directory (under the top level export) with the name
as the ascii-encoded standard UUID format string. For reasons of performance
and scalability there is a two-tier classification of those hardlinks
under directories with the initial parts of the UUID string as the directory
names.

For directories (which cannot be hardlinked), the approach is to use a symlink
which dereferences the parent GFID path along with basename of the directory.
The parent GFID dereference will in turn be a dereference of the grandparent
with the parent's basename, and so on recursively up to the root export.

4. Development
---------------

4a. To leverage the ability to address an inode by its GFID, the technique is
to perform a "nameless lookup". This means, to populate a loc_t structure as:

loc_t {
   pargfid: NULL
   parent: NULL
   name: NULL
   path: NULL
   gfid: GFID to be looked up [out parameter]
   inode: inode_new () result [in parameter]
}

and performing such lookup will return in its callback an inode_t
populated with the right contexts and a struct iatt which can be
used to perform an inode_link () on the inode (without a parent and
basename). The inode will now be hashed and linked in the inode table
and findable via inode_find().

A fundamental change moving forward is that the primary fields in a
loc_t structure are now going to be (pargfid, name) and (gfid) depending
on the kind of FOP. So far path had been the primary field for operations.
The remaining fields only serve as hints/helpers.

4b. If read/write is to be performed on an inode_t, the approach so far
has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and
then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do
fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost
in performance in the inbuilt NFS server.

5. Misc
-------
The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent
with the rest of the codebase.

Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f
BUG: 781318
Reviewed-on: http://review.gluster.com/669
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati