Opened 2 years ago
Closed 22 months ago
#199 closed defect (fixed)
Failure to attach vif to netvm
| Reported by: | rafal | Owned by: | marmarek |
|---|---|---|---|
| Priority: | major | Milestone: | Release 1 Beta 2 |
| Component: | xen | Keywords: | |
| Cc: | | | |
Description
After many create/destroy domain cycles, Xen is unable to network-attach a vif to the netvm. The netvm logs show:
[root@netvm1 ~]# udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent
KERNEL[1302084480.843334] add /devices/xen-backend/vif-76-0 (xen-backend)
Apr 6 06:08:00 localhost kernel: [ 2516.786792] vif vif-76-0: 2 writing feature-sg
Apr 6 06:08:00 localhost kernel: [ 2516.787018] vif vif-76-0: xenbus: failed to write error node for backend/vif/76/0 (2 writing feature-sg)
Apr 6 06:08:00 localhost kernel: [ 2516.787532] vif vif-76-0: 2 xenbus_dev_probe on backend/vif/76/0
Apr 6 06:08:00 localhost kernel: [ 2516.787719] vif vif-76-0: xenbus: failed to write error node for backend/vif/76/0 (2 xenbus_dev_probe on backend/vif/76/0)
UDEV [1302084480.848507] add /devices/xen-backend/vif-76-0 (xen-backend)
The hotplug script is not called, and the vif76.0 device is not present.
There is nothing in the Xen logs or in the dom0 logs.
Change History (6)
comment:1 Changed 2 years ago by rafal
comment:2 Changed 2 years ago by rafal
- Owner changed from joanna to rafal
- Status changed from new to accepted
comment:3 Changed 2 years ago by joanna
- Owner changed from rafal to marmarek
- Status changed from accepted to assigned
comment:4 Changed 2 years ago by joanna
- Resolution set to notanissue
- Status changed from assigned to closed
This will likely be gone in Xen 4.1, which we now use in Beta 2. So I'm closing this for now; if somebody encounters it on Beta 2, it should be reopened.
comment:5 Changed 22 months ago by rafal
- Resolution notanissue deleted
- Status changed from closed to reopened
The issue is still present in Beta 2.
This time, there is a warning_slowpath entry in /var/log/messages in firewallvm, followed by:
vif vif-68-0: xenbus: failed to write error node for backend/vif/68/0 (2)
In order to reproduce, it is enough to just run/destroy a domain in a loop, e.g.:
while qvm-run -a personal --pass_io 'echo alive' | grep -q alive ; do echo still alive; qvm-kill personal; done
After about 60 iterations, firewallvm is unable to attach a device, and the VM then takes 300s to boot because of the xenbus warnings.
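For counting how many cycles it takes before the failure shows up, the loop above can be bounded and instrumented. The following is only a sketch: it issues the same two commands as the one-liner, while the function names, the `max_cycles` bound, and the injectable `run` parameter are hypothetical additions for illustration.

```python
import subprocess


def vm_answers(vm, run=subprocess.run):
    """Return True if 'echo alive' executed in the VM prints 'alive'."""
    # Same command as in the shell loop: qvm-run -a <vm> --pass_io 'echo alive'
    result = run(["qvm-run", "-a", vm, "--pass_io", "echo alive"],
                 capture_output=True, text=True)
    return "alive" in result.stdout


def count_cycles(vm, max_cycles=200, run=subprocess.run):
    """Start and kill the VM repeatedly; return the number of completed cycles.

    Stops at the first cycle where the VM no longer answers, or after
    max_cycles (a hypothetical safety bound, not part of the original loop).
    """
    for completed in range(max_cycles):
        if not vm_answers(vm, run):
            return completed
        # Same command as in the shell loop: qvm-kill <vm>
        run(["qvm-kill", vm], capture_output=True, text=True)
    return max_cycles
```

Calling `count_cycles("personal")` in dom0 would correspond to the shell loop, with the return value giving the iteration at which the VM stopped responding.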
The problem seems to be caused by two factors:
1) there is a limit on the number of xenstore keys a domain can create
2) if the backend is not dom0, it has no privilege to remove keys such as backend/vif/client-xid upon device detach (e.g. upon domain termination)
This issue is likely to affect all non-dom0 backends, not only vifs.
The solution is to do
xenstore-chmod /local/domain/$backend-xid/backend/vif/client-xid w$backend-xid
when creating the key. The key is created by xl, so the proper place for the patch is libxl. Reassigning to Marek, who knows libxl already :) If possible, do it generically, not only for vifs, but for all backends.
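The interaction of the two factors can be illustrated with a toy model. This is not xenstore or libxl code: `ToyStore`, the quota value, the one-key-per-device simplification, and the rule that keys under a backend's subtree count against that backend are all assumptions made for the sketch.

```python
class ToyStore:
    """Toy stand-in for xenstore: a per-domain key quota plus per-key writers."""

    def __init__(self, quota):
        self.quota = quota  # hypothetical limit on keys charged to one domain
        self.keys = {}      # path -> set of domain IDs allowed to remove it

    def _path(self, backend, client):
        return f"/local/domain/{backend}/backend/vif/{client}"

    def count_for(self, backend):
        """Number of keys in the backend's subtree (assumed to count against it)."""
        prefix = f"/local/domain/{backend}/"
        return sum(1 for path in self.keys if path.startswith(prefix))

    def create_vif_key(self, backend, client, grant_backend_write):
        """Toolstack in dom0 creates the backend key; fails once the quota is hit."""
        if self.count_for(backend) >= self.quota:
            return False
        writers = {0}  # dom0 can always write
        if grant_backend_write:
            writers.add(backend)  # models the proposed chmod at creation time
        self.keys[self._path(backend, client)] = writers
        return True

    def backend_remove(self, backend, client):
        """Backend tries to clean up its key on device detach."""
        path = self._path(backend, client)
        if backend in self.keys.get(path, ()):
            del self.keys[path]
            return True
        return False  # no permission: the key is leaked


def cycles_until_failure(quota, grant_backend_write, max_cycles=1000):
    """Return the cycle at which attach first fails, or None if it never does."""
    store = ToyStore(quota)
    for cycle in range(max_cycles):
        client = cycle + 2  # a fresh client xid per VM start (illustrative)
        if not store.create_vif_key(1, client, grant_backend_write):
            return cycle
        store.backend_remove(1, client)  # domain destroyed, backend cleans up
    return None
```

With an arbitrary quota of 60, `cycles_until_failure(60, False)` fails at cycle 60 because every leaked key stays behind, whereas `cycles_until_failure(60, True)` never fails within the bound. The real quota and the number of keys created per device are not modeled here.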
comment:6 Changed 22 months ago by marmarek
- Resolution set to fixed
- Status changed from reopened to closed

Error 2 is ENOENT. However, if I
1) pause netvm1 (it has xid 1)
2) manually run xm network-attach test7 backend=netvm1
then /local/domain/1/backend/vif/XID/0 is present, along with the keys in it.
3) unpause netvm1
the same error occurs.
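The numeric code in kernel messages of the form "vif vif-76-0: 2 writing feature-sg" can be decoded with the standard errno table. The helper below is a small sketch; the regular expression and the function name `decode_vif_error` are assumptions based only on the log lines quoted in this ticket.

```python
import errno
import re

# Matches messages such as "vif vif-76-0: 2 writing feature-sg"
VIF_ERROR_RE = re.compile(r"vif (vif-\d+-\d+): (\d+) (.+)")


def decode_vif_error(line):
    """Return (device, errno name, action) for a matching log line, else None."""
    match = VIF_ERROR_RE.search(line)
    if match is None:
        return None
    device = match.group(1)
    code = int(match.group(2))
    action = match.group(3)
    # errno.errorcode maps numeric codes to symbolic names, e.g. 2 -> "ENOENT"
    return device, errno.errorcode.get(code, "unknown"), action
```

Lines that do not carry a numeric code directly after the device name, such as the "xenbus: failed to write error node" messages, do not match and yield `None`.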