lockdep splat when disconnecting NVMe oF TCP initiator
Sagi Grimberg
sagi at grimberg.me
Tue Sep 29 04:20:00 EDT 2020
> Hi nvme folks,
>
> I've toyed a bit with NVMe oF TCP over the last few days and came
> across the following lockdep splat on the target when disconnecting the
> initiator. I don't know if I'll have time to look into this anytime
> soon, so I figured I could at least leave the report here.
This is the way the driver has always behaved... To be honest, I'm not
at all sure this isn't a false positive.
Nothing in the release_work depends on the flush_scheduled_work waiting
for online queue removals...
I'll try to have a deeper look later.
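For reference, the cycle lockdep is reporting boils down to the pattern
sketched below. This is a minimal, illustrative sketch, not the actual
nvmet-tcp code; the work items here are standalone globals rather than
the per-queue members, and per the splat the real paths are
nvmet_tcp_io_work() -> nvmet_tcp_install_queue() -> flush_workqueue()
and nvmet_tcp_release_queue_work() -> flush_work().

#include <linux/workqueue.h>

static void io_work_fn(struct work_struct *w);
static void release_work_fn(struct work_struct *w);

static DECLARE_WORK(io_work, io_work_fn);
static DECLARE_WORK(release_work, release_work_fn);

/* Runs on the system ("events") workqueue. During admin connect the
 * install path flushes the whole system workqueue, so completion of
 * io_work is made to depend on everything queued there, including any
 * pending release_work. */
static void io_work_fn(struct work_struct *w)
{
	flush_scheduled_work();
}

/* Also runs on the system workqueue, and flushes io_work before
 * tearing the queue down -- the edge that closes the cycle in
 * lockdep's eyes. */
static void release_work_fn(struct work_struct *w)
{
	flush_work(&io_work);
}

For an actual hang, an io_work instance would have to call
flush_scheduled_work() while a release_work on the same workqueue is
flushing that io_work; whether both directions can really happen at the
same time here is the open question.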
>
> Regards,
> Lucas
>
> [ 142.078964] WARNING: possible circular locking dependency detected
> [ 142.085172] 5.9.0-rc5 #2 Not tainted
> [ 142.091982] ------------------------------------------------------
> [ 142.098186] kworker/0:3/74 is trying to acquire lock:
> [ 142.103259] ffff000015228830 ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: __flush_work+0x54/0x510
> [ 142.112981]
> [ 142.112981] but task is already holding lock:
> [ 142.118835] ffff800011efbdc0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
> [ 142.129413]
> [ 142.129413] which lock already depends on the new lock.
> [ 142.129413]
> [ 142.137618]
> [ 142.137618] the existing dependency chain (in reverse order) is:
> [ 142.145124]
> [ 142.145124] -> #2 ((work_completion)(&queue->release_work)){+.+.}-{0:0}:
> [ 142.153352] process_one_work+0x248/0x710
> [ 142.157906] worker_thread+0x74/0x470
> [ 142.162119] kthread+0x15c/0x160
> [ 142.165894] ret_from_fork+0x10/0x38
> [ 142.170005]
> [ 142.170005] -> #1 ((wq_completion)events){+.+.}-{0:0}:
> [ 142.176665] flush_workqueue+0x98/0x400
> [ 142.181054] nvmet_tcp_install_queue+0x11c/0x130
> [ 142.186217] nvmet_install_queue+0xbc/0x150
> [ 142.190946] nvmet_execute_admin_connect+0x11c/0x200
> [ 142.196458] nvmet_tcp_io_work+0x8c0/0x950
> [ 142.201099] process_one_work+0x294/0x710
> [ 142.205654] worker_thread+0x74/0x470
> [ 142.209860] kthread+0x15c/0x160
> [ 142.213630] ret_from_fork+0x10/0x38
> [ 142.217742]
> [ 142.217742] -> #0 ((work_completion)(&queue->io_work)){+.+.}-{0:0}:
> [ 142.225542] __lock_acquire+0x13fc/0x2160
> [ 142.230097] lock_acquire+0xec/0x4d0
> [ 142.234215] __flush_work+0x7c/0x510
> [ 142.238333] flush_work+0x14/0x20
> [ 142.242192] nvmet_tcp_release_queue_work+0xb0/0x280
> [ 142.247703] process_one_work+0x294/0x710
> [ 142.252257] worker_thread+0x74/0x470
> [ 142.256465] kthread+0x15c/0x160
> [ 142.260234] ret_from_fork+0x10/0x38
> [ 142.264346]
> [ 142.264346] other info that might help us debug this:
> [ 142.264346]
> [ 142.272375] Chain exists of:
> [ 142.272375] (work_completion)(&queue->io_work) --> (wq_completion)events --> (work_completion)(&queue->release_work)
> [ 142.272375]
> [ 142.287476] Possible unsafe locking scenario:
> [ 142.287476]
> [ 142.293415]        CPU0                    CPU1
> [ 142.297962]        ----                    ----
> [ 142.302508]   lock((work_completion)(&queue->release_work));
> [ 142.308194]                                lock((wq_completion)events);
> [ 142.314836]                                lock((work_completion)(&queue->release_work));
> [ 142.323046]   lock((work_completion)(&queue->io_work));
> [ 142.328296]
> [ 142.328296] *** DEADLOCK ***
> [ 142.328296]
> [ 142.334239] 2 locks held by kworker/0:3/74:
> [ 142.338438] #0: ffff000017405738 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
> [ 142.347887] #1: ffff800011efbdc0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
> [ 142.358901]
> [ 142.358901] stack backtrace:
> [ 142.363284] CPU: 0 PID: 74 Comm: kworker/0:3 Not tainted 5.9.0-rc5 #2
> [ 142.372970] Hardware name: XXX (DT)
> [ 142.378053] Workqueue: events nvmet_tcp_release_queue_work
> [ 142.383563] Call trace:
> [ 142.386032] dump_backtrace+0x0/0x1b0
> [ 142.389717] show_stack+0x18/0x30
> [ 142.393055] dump_stack+0xe8/0x15c
> [ 142.396480] print_circular_bug+0x278/0x280
> [ 142.400688] check_noncircular+0x164/0x1e0
> [ 142.404808] __lock_acquire+0x13fc/0x2160
> [ 142.408841] lock_acquire+0xec/0x4d0
> [ 142.412437] __flush_work+0x7c/0x510
> [ 142.416032] flush_work+0x14/0x20
> [ 142.419369] nvmet_tcp_release_queue_work+0xb0/0x280
> [ 142.424358] process_one_work+0x294/0x710
> [ 142.428390] worker_thread+0x74/0x470
> [ 142.432075] kthread+0x15c/0x160
> [ 142.435322] ret_from_fork+0x10/0x38
>