lockdep splat when disconnecting NVMe-oF TCP initiator
Lucas Stach
l.stach at pengutronix.de
Wed Sep 23 05:02:05 EDT 2020
Hi nvme folks,
I've been experimenting with NVMe-oF TCP over the last few days and hit
the following lockdep splat on the target side when disconnecting the
initiator. I don't know if I'll have time to look into this anytime
soon, so I figured I should at least leave the report here.
Regards,
Lucas
[ 142.078964] WARNING: possible circular locking dependency detected
[ 142.085172] 5.9.0-rc5 #2 Not tainted
[ 142.091982] ------------------------------------------------------
[ 142.098186] kworker/0:3/74 is trying to acquire lock:
[ 142.103259] ffff000015228830 ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: __flush_work+0x54/0x510
[ 142.112981]
[ 142.112981] but task is already holding lock:
[ 142.118835] ffff800011efbdc0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
[ 142.129413]
[ 142.129413] which lock already depends on the new lock.
[ 142.129413]
[ 142.137618]
[ 142.137618] the existing dependency chain (in reverse order) is:
[ 142.145124]
[ 142.145124] -> #2 ((work_completion)(&queue->release_work)){+.+.}-{0:0}:
[ 142.153352] process_one_work+0x248/0x710
[ 142.157906] worker_thread+0x74/0x470
[ 142.162119] kthread+0x15c/0x160
[ 142.165894] ret_from_fork+0x10/0x38
[ 142.170005]
[ 142.170005] -> #1 ((wq_completion)events){+.+.}-{0:0}:
[ 142.176665] flush_workqueue+0x98/0x400
[ 142.181054] nvmet_tcp_install_queue+0x11c/0x130
[ 142.186217] nvmet_install_queue+0xbc/0x150
[ 142.190946] nvmet_execute_admin_connect+0x11c/0x200
[ 142.196458] nvmet_tcp_io_work+0x8c0/0x950
[ 142.201099] process_one_work+0x294/0x710
[ 142.205654] worker_thread+0x74/0x470
[ 142.209860] kthread+0x15c/0x160
[ 142.213630] ret_from_fork+0x10/0x38
[ 142.217742]
[ 142.217742] -> #0 ((work_completion)(&queue->io_work)){+.+.}-{0:0}:
[ 142.225542] __lock_acquire+0x13fc/0x2160
[ 142.230097] lock_acquire+0xec/0x4d0
[ 142.234215] __flush_work+0x7c/0x510
[ 142.238333] flush_work+0x14/0x20
[ 142.242192] nvmet_tcp_release_queue_work+0xb0/0x280
[ 142.247703] process_one_work+0x294/0x710
[ 142.252257] worker_thread+0x74/0x470
[ 142.256465] kthread+0x15c/0x160
[ 142.260234] ret_from_fork+0x10/0x38
[ 142.264346]
[ 142.264346] other info that might help us debug this:
[ 142.264346]
[ 142.272375] Chain exists of:
[ 142.272375] (work_completion)(&queue->io_work) --> (wq_completion)events --> (work_completion)(&queue->release_work)
[ 142.272375]
[ 142.287476] Possible unsafe locking scenario:
[ 142.287476]
[ 142.293415]        CPU0                    CPU1
[ 142.297962]        ----                    ----
[ 142.302508]   lock((work_completion)(&queue->release_work));
[ 142.308194]                               lock((wq_completion)events);
[ 142.314836]                               lock((work_completion)(&queue->release_work));
[ 142.323046]   lock((work_completion)(&queue->io_work));
[ 142.328296]
[ 142.328296] *** DEADLOCK ***
[ 142.328296]
[ 142.334239] 2 locks held by kworker/0:3/74:
[ 142.338438] #0: ffff000017405738 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
[ 142.347887] #1: ffff800011efbdc0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
[ 142.358901]
[ 142.358901] stack backtrace:
[ 142.363284] CPU: 0 PID: 74 Comm: kworker/0:3 Not tainted 5.9.0-rc5 #2
[ 142.372970] Hardware name: XXX (DT)
[ 142.378053] Workqueue: events nvmet_tcp_release_queue_work
[ 142.383563] Call trace:
[ 142.386032] dump_backtrace+0x0/0x1b0
[ 142.389717] show_stack+0x18/0x30
[ 142.393055] dump_stack+0xe8/0x15c
[ 142.396480] print_circular_bug+0x278/0x280
[ 142.400688] check_noncircular+0x164/0x1e0
[ 142.404808] __lock_acquire+0x13fc/0x2160
[ 142.408841] lock_acquire+0xec/0x4d0
[ 142.412437] __flush_work+0x7c/0x510
[ 142.416032] flush_work+0x14/0x20
[ 142.419369] nvmet_tcp_release_queue_work+0xb0/0x280
[ 142.424358] process_one_work+0x294/0x710
[ 142.428390] worker_thread+0x74/0x470
[ 142.432075] kthread+0x15c/0x160
[ 142.435322] ret_from_fork+0x10/0x38
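My reading of the chain: nvmet_tcp_release_queue_work() is executed on
the system "events" workqueue and does flush_work(&queue->io_work),
while nvmet_tcp_io_work() ends up in nvmet_tcp_install_queue(), which
flushes the system workqueue (the flush_workqueue in frame #1 looks
like flush_scheduled_work() to me). So the release work waits for the
io work, while the io work waits for everything queued on the workqueue
that the release work is currently running on.

Below is a minimal, nvmet-independent sketch of the pattern that should
trigger the same report with lockdep enabled. The repro_* names are
made up for illustration and don't exist anywhere in the tree:

/*
 * Reproducer sketch: work B runs from a private workqueue and flushes
 * the system workqueue; work A runs *on* the system workqueue and
 * flushes work B. Lockdep should record the same circular dependency
 * as in the nvmet/tcp report above.
 */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *repro_wq;	/* stands in for the nvmet tcp wq */
static struct work_struct repro_io_work;	/* stands in for queue->io_work */
static struct work_struct repro_release_work;	/* stands in for queue->release_work */

static void repro_io_fn(struct work_struct *w)
{
	/* like the nvmet_tcp_install_queue() path: wait for the system
	 * workqueue, which may currently be running repro_release_fn() */
	flush_scheduled_work();
}

static void repro_release_fn(struct work_struct *w)
{
	/* like nvmet_tcp_release_queue_work(): runs on the system
	 * workqueue and waits for the io work to finish */
	flush_work(&repro_io_work);
}

static int __init repro_init(void)
{
	repro_wq = alloc_workqueue("repro_wq", WQ_MEM_RECLAIM, 0);
	if (!repro_wq)
		return -ENOMEM;

	INIT_WORK(&repro_io_work, repro_io_fn);
	INIT_WORK(&repro_release_work, repro_release_fn);

	queue_work(repro_wq, &repro_io_work);	/* io work on the private wq */
	schedule_work(&repro_release_work);	/* release work on the system wq */
	return 0;
}

static void __exit repro_exit(void)
{
	cancel_work_sync(&repro_release_work);
	cancel_work_sync(&repro_io_work);
	destroy_workqueue(repro_wq);
}

module_init(repro_init);
module_exit(repro_exit);
MODULE_LICENSE("GPL");

Naively, moving the release work off the system workqueue onto a
dedicated one would break the cycle, since the flush in the install
path would then no longer wait on it, but I haven't tried that.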