Delivered-To: jwboyer@gmail.com
Received: by 10.76.168.104 with SMTP id zv8csp55663oab;
        Fri, 30 Aug 2013 15:52:46 -0700 (PDT)
X-Received: by 10.68.244.168 with SMTP id xh8mr12419215pbc.3.1377903166373;
        Fri, 30 Aug 2013 15:52:46 -0700 (PDT)
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
        by mx.google.com with ESMTP id qc9si280431pac.269.1969.12.31.16.00.00;
        Fri, 30 Aug 2013 15:52:46 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Authentication-Results: mx.google.com;
       spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mail=linux-kernel-owner@vger.kernel.org;
       dkim=neutral (bad format) header.i=@hds.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753535Ab3H3WrV (ORCPT <rfc822;georgezhim@gmail.com>
	+ 99 others); Fri, 30 Aug 2013 18:47:21 -0400
Received: from usindpps04.hds.com ([207.126.252.17]:35636 "EHLO
	usindpps04.hds.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752650Ab3H3WrU (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Fri, 30 Aug 2013 18:47:20 -0400
DKIM-Signature:	v=1; a=rsa-sha256; c=relaxed/simple; d=hds.com; h=subject : to : from : cc
 : date : message-id : mime-version : content-type :
 content-transfer-encoding; s=mail1;
 bh=VofHN8IMnygn2hbqnFjLmX0PPEPbvpzE377u1RxpGOY=;
 b=piW6J78W57qDXBPJJuodWw/tvf0T//JbxKX6sLPvpuaOG2nBLMHzDqUeTYwFEQqUvdmf
 ZTkiwsKi0WEku3MKcxJ7veR7wvTZcQ4fGMETFTf1c2J/1JOKpXLnft4ERuW89/FAxw25
 wQM1ulsuQ3Cncl0I/sIaqMlaMOtvuQ/C8rsHorp+75eFiL6yx1jU5wMbuti4D/NprIET
 3r57cPZ0YCh6sLjvOgjay6mKyktMToyjHPx6X1TWCSWcwes33Popc1hpadxUdFI/0npL
 mN3Tttbe7e2RcmkXAZbwg8xj+FwSu3nIRC4G9UpFCsMz518C/AWZj4puwWE6VHZWVvVZ Rg== 
Received: from usindmail01.hds.com (usindmail03 [207.126.252.22])
	by usindpps04.hds.com (8.14.5/8.14.5) with ESMTP id r7UMlBjr025492;
	Fri, 30 Aug 2013 18:47:11 -0400
Received: from hds.com (usindnetf5d-vlan47float.corp.hds.com [10.74.73.11])
	by usindmail01.hds.com (8.14.1/8.14.1) with ESMTP id r7UMl8SG058466;
	Fri, 30 Aug 2013 18:47:10 -0400 (EDT)
Subject: [PATCH v2 1/2] elevator: Fix a race in elevator switching and md
 device initialization
To:	linux-kernel@vger.kernel.org
From:	Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Cc:	axboe@kernel.dk, tj@kernel.org, seiji.aguchi@hds.com,
	vgoyal@redhat.com, majianpeng@gmail.com
Date:	Fri, 30 Aug 2013 18:47:07 -0400
Message-ID: <20130830224707.21812.63516.stgit@hds.com>
User-Agent: StGit/0.16
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Proofpoint-SPF-Result: pass
X-Proofpoint-SPF-Record: v=spf1 mx ip4:207.126.244.0/26 ip4:207.126.252.0/25 include:mktomail.com
 include:cloud.hds.com ~all
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.10.8794,1.0.431,0.0.0000
 definitions=2013-08-30_09:2013-08-30,2013-08-30,1970-01-01 signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=outbound_policy score=0 spamscore=0 suspectscore=1
 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx
 scancount=1 engine=7.0.1-1305240000 definitions=main-1308300162
Sender:	linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List:	linux-kernel@vger.kernel.org

The soft lockup below happens at boot time on a system that uses dm
multipath together with udev rules that switch the I/O scheduler.

[  356.127001] BUG: soft lockup - CPU#3 stuck for 22s! [sh:483]
[  356.127001] RIP: 0010:[<ffffffff81072a7d>]  [<ffffffff81072a7d>] lock_timer_base.isra.35+0x1d/0x50
...
[  356.127001] Call Trace:
[  356.127001]  [<ffffffff81073810>] try_to_del_timer_sync+0x20/0x70
[  356.127001]  [<ffffffff8118b08a>] ? kmem_cache_alloc_node_trace+0x20a/0x230
[  356.127001]  [<ffffffff810738b2>] del_timer_sync+0x52/0x60
[  356.127001]  [<ffffffff812ece22>] cfq_exit_queue+0x32/0xf0
[  356.127001]  [<ffffffff812c98df>] elevator_exit+0x2f/0x50
[  356.127001]  [<ffffffff812c9f21>] elevator_change+0xf1/0x1c0
[  356.127001]  [<ffffffff812caa50>] elv_iosched_store+0x20/0x50
[  356.127001]  [<ffffffff812d1d09>] queue_attr_store+0x59/0xb0
[  356.127001]  [<ffffffff812143f6>] sysfs_write_file+0xc6/0x140
[  356.127001]  [<ffffffff811a326d>] vfs_write+0xbd/0x1e0
[  356.127001]  [<ffffffff811a3ca9>] SyS_write+0x49/0xa0
[  356.127001]  [<ffffffff8164e899>] system_call_fastpath+0x16/0x1b

This is caused by a race between md device initialization by multipathd and
a shell script that switches the scheduler through sysfs.

 - multipathd:
   SyS_ioctl -> do_vfs_ioctl -> dm_ctl_ioctl -> ctl_ioctl -> table_load
   -> dm_setup_md_queue -> blk_init_allocated_queue -> elevator_init
    q->elevator = elevator_alloc(q, e); // not yet initialized

 - sh -c 'echo deadline > /sys/$DEVPATH/queue/scheduler':
   elevator_switch (in the call trace above)
    struct elevator_queue *old = q->elevator;
    q->elevator = elevator_alloc(q, new_e);
    elevator_exit(old);                 // lockup! (*)

 - multipathd: (cont.)
    err = e->ops.elevator_init_fn(q);   // init fails; q->elevator is modified

(*) When del_timer_sync() is called, lock_timer_base() will loop infinitely
while timer->base == NULL. In this case the timer is never initialized, so
the loop never exits and the CPU locks up.
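
The spin described in (*) can be sketched in userspace as follows. This is
a simplified model, not the kernel's lock_timer_base(): the model_* names
and the max_spins bound are assumptions added only so that the sketch
terminates; the kernel loop has no such bound.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's timer base and timer. */
struct model_tvec_base { int unused; };
struct model_timer { struct model_tvec_base *base; };

/*
 * Spin until timer->base becomes non-NULL, giving up after max_spins
 * iterations. Returns the base, or NULL if it was still unset.
 */
struct model_tvec_base *
model_lock_timer_base(const struct model_timer *timer,
		      unsigned long max_spins)
{
	unsigned long i;

	for (i = 0; i < max_spins; i++) {
		struct model_tvec_base *base = timer->base;

		if (base != NULL)
			return base;
		/* base still NULL: try again */
	}
	return NULL;
}
```

A zero-filled timer, as found in an elevator that was allocated but never
initialized, makes this model return NULL; without the bound it would spin
forever.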

This patch acquires q->sysfs_lock around elevator_init() in
blk_init_allocated_queue(), to provide mutual exclusion between
initialization of q->elevator and switching of the scheduler.
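
The locking scheme can be modelled in userspace as below. This is a sketch
under stated assumptions, not the kernel code: a pthread mutex stands in
for q->sysfs_lock, the timer_ready flag stands in for the work done by
elevator_init_fn(), and every model_* name is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_elevator { bool timer_ready; };

struct model_queue {
	pthread_mutex_t sysfs_lock;	/* stands in for q->sysfs_lock */
	struct model_elevator *elevator;
};

void model_queue_setup(struct model_queue *q)
{
	pthread_mutex_init(&q->sysfs_lock, NULL);
	q->elevator = NULL;
}

/* Init path: publish and initialize the elevator under the lock. */
int model_queue_init_elevator(struct model_queue *q)
{
	int ret = 0;

	pthread_mutex_lock(&q->sysfs_lock);
	if (!q->elevator) {
		struct model_elevator *e = calloc(1, sizeof(*e));

		if (!e) {
			ret = -1;
		} else {
			q->elevator = e;	/* visible, not yet ready */
			e->timer_ready = true;	/* models elevator_init_fn() */
		}
	}
	pthread_mutex_unlock(&q->sysfs_lock);
	return ret;
}

/* Switch path: swap elevators under the same lock, then free the old. */
int model_queue_switch_elevator(struct model_queue *q)
{
	struct model_elevator *new_e = calloc(1, sizeof(*new_e));
	struct model_elevator *old;
	int ret = 0;

	if (!new_e)
		return -1;
	new_e->timer_ready = true;

	pthread_mutex_lock(&q->sysfs_lock);
	old = q->elevator;
	/* A half-initialized old elevator is what hangs in the kernel. */
	if (old && !old->timer_ready)
		ret = -2;
	q->elevator = new_e;
	pthread_mutex_unlock(&q->sysfs_lock);

	free(old);
	return ret;
}
```

Because both paths take the same mutex, the switch path in this model can
never observe the window between publishing the elevator and finishing
its initialization, so the -2 case is unreachable.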

This should fix this bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=902012

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
---
 block/blk-core.c |   10 +++++++++-
 block/elevator.c |    6 ++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 93a18d1..2f6275f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -739,9 +739,17 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
 
 	q->sg_reserved_size = INT_MAX;
 
+	/* Protect q->elevator from elevator_change */
+	mutex_lock(&q->sysfs_lock);
+
 	/* init elevator */
-	if (elevator_init(q, NULL))
+	if (elevator_init(q, NULL)) {
+		mutex_unlock(&q->sysfs_lock);
 		return NULL;
+	}
+
+	mutex_unlock(&q->sysfs_lock);
+
 	return q;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue);
diff --git a/block/elevator.c b/block/elevator.c
index 668394d..02d4390 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -186,6 +186,12 @@ int elevator_init(struct request_queue *q, char *name)
 	struct elevator_type *e = NULL;
 	int err;
 
+	/*
+	 * q->sysfs_lock must be held to provide mutual exclusion between
+	 * elevator_switch() and here.
+	 */
+	lockdep_assert_held(&q->sysfs_lock);
+
 	if (unlikely(q->elevator))
 		return 0;
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/