From 8c7904a00b06d2ee51149794b619e07369fcf9d4 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 31 Mar 2006 02:31:37 -0800 Subject: [PATCH] task: RCU protect task->usage A big problem with rcu protected data structures that are also reference counted is that you must jump through several hoops to increase the reference count. I think someone finally implemented atomic_inc_not_zero(&count) to automate the common case. Unfortunately this means you must special case the rcu access case. When data structures are only visible via rcu in a manner that is not determined by the reference count on the object (i.e. tasks are visible until their zombies are reaped) there is a much simpler technique we can employ. Simply delaying the decrement of the reference count until the rcu interval is over. What that means is that the proc code that looks up a task and later wants to sleep can now do: rcu_read_lock(); task = find_task_by_pid(some_pid); if (task) { get_task_struct(task); } rcu_read_unlock(); The effect on the rest of the kernel is that put_task_struct becomes cheaper and immediate, and in the case where the task has been reaped it frees the task immediate instead of unnecessarily waiting an until the rcu interval is over. Cleanup of task_struct does not happen when its reference count drops to zero, instead cleanup happens when release_task is called. Tasks can only be looked up via rcu before release_task is called. All rcu protected members of task_struct are freed by release_task. Therefore we can move call_rcu from put_task_struct into release_task. And we can modify release_task to not immediately release the reference count but instead have it call put_task_struct from the function it gives to call_rcu. The end result: - get_task_struct is safe in an rcu context where we have just looked up the task. - put_task_struct() simplifies into its old pre rcu self. This reorganization also makes put_task_struct uncallable from modules as it is not exported but it does not appear to be called from any modules so this should not be an issue, and is trivially fixed. Signed-off-by: Eric W. Biederman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'kernel') diff --git a/kernel/exit.c b/kernel/exit.c index bc0ec674d3f..6c2eeb8f639 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -127,6 +127,11 @@ static void __exit_signal(struct task_struct *tsk) } } +static void delayed_put_task_struct(struct rcu_head *rhp) +{ + put_task_struct(container_of(rhp, struct task_struct, rcu)); +} + void release_task(struct task_struct * p) { int zap_leader; @@ -168,7 +173,7 @@ repeat: spin_unlock(&p->proc_lock); proc_pid_flush(proc_dentry); release_thread(p); - put_task_struct(p); + call_rcu(&p->rcu, delayed_put_task_struct); p = leader; if (unlikely(zap_leader)) -- cgit