Opened 2 years ago

Closed 22 months ago

#199 closed defect (fixed)

Failure to attach vif to netvm

Reported by: rafal
Owned by: marmarek
Priority: major
Milestone: Release 1 Beta 2
Component: xen
Keywords:
Cc:

Description

After many domain create/destroy cycles, Xen is unable to network-attach a vif to netvm. The netvm logs show:
[root@netvm1 ~]# udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[1302084480.843334] add /devices/xen-backend/vif-76-0 (xen-backend)
Apr 6 06:08:00 localhost kernel: [ 2516.786792] vif vif-76-0: 2 writing feature-sg
Apr 6 06:08:00 localhost kernel: [ 2516.787018] vif vif-76-0: xenbus: failed to write error node for backend/vif/76/0 (2 writing feature-sg)
Apr 6 06:08:00 localhost kernel: [ 2516.787532] vif vif-76-0: 2 xenbus_dev_probe on backend/vif/76/0
Apr 6 06:08:00 localhost kernel: [ 2516.787719] vif vif-76-0: xenbus: failed to write error node for backend/vif/76/0 (2 xenbus_dev_probe on backend/vif/76/0)
UDEV [1302084480.848507] add /devices/xen-backend/vif-76-0 (xen-backend)

The hotplug script is not called, and the vif76.0 device is not present.
There is nothing in the Xen logs, nor in the dom0 logs.

Change History (6)

comment:1 Changed 2 years ago by rafal

Error 2 is ENOENT. However, if I:
1) pause netvm1 (it has xid 1),
2) do a manual xm network-attach test7 backend=netvm1,
then /local/domain/1/backend/vif/XID/0 is present, along with the keys in it. Yet after I
3) unpause netvm1,
the same error occurs.
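The manual experiment above can be scripted so it is easy to repeat after each failed cycle. This is a sketch assuming the xm toolstack used in the ticket and a shell in dom0; the function name manual_attach_probe and its argument order are hypothetical. It only wraps the pause, network-attach and unpause steps, listing the backend directory while the backend is still paused.

```bash
#!/usr/bin/env bash
# Hypothetical helper reproducing the manual steps from comment:1.
#   $1  backend VM name   (netvm1 in the ticket)
#   $2  backend domid     (1 in the ticket)
#   $3  frontend VM name  (test7 in the ticket)
manual_attach_probe() {
    local backend=$1 backend_xid=$2 frontend=$3
    # 1) Freeze the backend so it cannot react to the new xenstore keys yet.
    xm pause "$backend"
    # 2) Let the toolstack write the backend/vif/<client-xid> directory.
    xm network-attach "$frontend" backend="$backend"
    # The directory and its keys should be visible at this point.
    xenstore-ls "/local/domain/$backend_xid/backend/vif"
    # 3) Resume the backend; it now probes the device and may fail with error 2.
    xm unpause "$backend"
}
```

Usage: `manual_attach_probe netvm1 1 test7`, then check netvm1's kernel log for the xenbus errors.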

comment:2 Changed 2 years ago by rafal

  • Owner changed from joanna to rafal
  • Status changed from new to accepted

comment:3 Changed 2 years ago by joanna

  • Owner changed from rafal to marmarek
  • Status changed from accepted to assigned

comment:4 Changed 2 years ago by joanna

  • Resolution set to notanissue
  • Status changed from assigned to closed

This will likely be gone in Xen 4.1, which we now use in Beta 2. So I'm closing this now; if somebody discovers it on Beta 2, it should be reopened.

comment:5 Changed 22 months ago by rafal

  • Resolution notanissue deleted
  • Status changed from closed to reopened

The issue is still present in Beta 2.
This time there is a warning_slowpath in /var/log/messages in firewallvm, followed by:

vif vif-68-0: xenbus: failed to write error node for backend/vif/68/0 (2)

To reproduce, it is enough to run and destroy a domain in a loop, e.g.:

while qvm-run -a personal --pass_io 'echo alive' | grep -q alive ; do echo still alive; qvm-kill personal; done

After about 60 iterations, firewallvm is unable to attach a device, and the VM takes 300 s to boot because of the xenbus warnings.
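The reproduction loop above can be extended with a counter so that the iteration at which attachment stops working is printed. A minimal sketch: it uses only the qvm-run and qvm-kill invocations already shown in the ticket, and the function name count_cycles is hypothetical.

```bash
#!/usr/bin/env bash
# Same run/kill loop as in the ticket, but numbering each successful cycle.
#   $1  VM name (defaults to "personal", as in the ticket)
count_cycles() {
    local vm=${1:-personal} n=0
    # Keep going while the VM starts and answers "alive".
    while qvm-run -a "$vm" --pass_io 'echo alive' | grep -q alive; do
        n=$(( n + 1 ))
        echo "cycle $n: still alive"
        qvm-kill "$vm"
    done
    echo "stopped after $n successful cycles"
}
```

Usage: `count_cycles personal`; the last line reports how many cycles completed before the VM failed to come up.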

The problem seems to be caused by two factors:
1) there is a limit on the number of xenstore keys a domain can create;
2) if a backend is not dom0, the backend has no privilege to remove keys such as backend/vif/client-xid upon device detach (e.g. upon domain termination).
As a result, stale keys accumulate with every create/destroy cycle until the limit is reached.
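How the two factors combine can be shown with a toy model: if detach cannot remove the keys, every cycle adds to a per-domain count until the limit is hit. This is not xenstored's actual accounting; the quota and the number of keys per attach are made-up parameters, and cycles_until_quota is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Toy model, not real xenstored code.
#   $1  quota       - hypothetical per-domain key limit
#   $2  per_cycle   - hypothetical number of keys one device attach creates
#   $3  can_delete  - 1 if detach may remove those keys, 0 otherwise
#   $4  max_cycles  - how long to simulate (default 10000)
# Prints the first cycle whose attach would exceed the quota, or "never".
cycles_until_quota() {
    local quota=$1 per_cycle=$2 can_delete=$3 max_cycles=${4:-10000}
    local used=0 cycle
    for (( cycle = 1; cycle <= max_cycles; cycle++ )); do
        # Attach: creating the device keys must fit under the quota.
        if (( used + per_cycle > quota )); then
            echo "$cycle"
            return 0
        fi
        used=$(( used + per_cycle ))
        # Detach: the keys are freed only if the backend may remove them.
        if (( can_delete )); then
            used=$(( used - per_cycle ))
        fi
    done
    echo never
}
```

For example, `cycles_until_quota 1000 16 0` prints 63, while `cycles_until_quota 1000 16 1 100` prints never. Both numbers are illustrative, not measured; the point is only that a leak of any fixed size per cycle eventually exhausts a fixed limit.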

This issue is likely to affect all non-dom0 backends, not only vifs.
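To watch the leak on a live system, one can count the entries left under the backend's device directory between loop iterations; if the count keeps growing while the clients are destroyed, keys are leaking. A sketch to run in dom0 with the standard xenstore-list tool; the function name count_backend_entries and its optional device-class argument are hypothetical.

```bash
#!/usr/bin/env bash
# Count client entries under /local/domain/<backend-xid>/backend/<class>.
#   $1  backend domid
#   $2  device class (defaults to "vif")
# Prints 0 if the directory does not exist.
count_backend_entries() {
    local backend_xid=$1 class=${2:-vif}
    xenstore-list "/local/domain/$backend_xid/backend/$class" 2>/dev/null | wc -l
}
```

Usage: `count_backend_entries 1` after each kill in the reproduction loop.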

The solution is to run
xenstore-chmod /local/domain/<backend-xid>/backend/vif/<client-xid> w<backend-xid>
when creating the key. The key is created by xl, so the proper place for the patch is libxl. Reassigning to Marek, who already knows libxl :) If possible, do it generically for all backends, not only for vifs.
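For experiments before a libxl patch exists, the proposed permission change could be applied by hand from dom0. A minimal sketch: the mode string w<backend-xid> is copied from the proposal, the path follows the backend/vif/<client-xid> layout seen in the kernel messages, and fix_backend_perms with its optional device-class argument (reflecting the request to handle all backends) is hypothetical. Per the ticket, the permission should be set when the key is created, which a manual step cannot guarantee; that is why the real fix belongs in libxl.

```bash
#!/usr/bin/env bash
# Hypothetical manual version of the proposed fix.
#   $1  backend domid
#   $2  client (frontend) domid
#   $3  device class (defaults to "vif")
fix_backend_perms() {
    local backend_xid=$1 client_xid=$2 class=${3:-vif}
    local path="/local/domain/$backend_xid/backend/$class/$client_xid"
    # Mode string taken verbatim from the proposal in this ticket.
    xenstore-chmod "$path" "w$backend_xid"
}
```

Usage: `fix_backend_perms 1 76` for the vif in the original report.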

comment:6 Changed 22 months ago by marmarek

  • Resolution set to fixed
  • Status changed from reopened to closed