Eucalyptus / EUCA-3421

NC: loses track of instances when libvirtd segfaults

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 3.1.1, 3.2.0-devel
    • Fix Version/s: 3.1.2
    • Component/s: Node Controller
    • Security Level: Public (Anonymously viewable)
    • Labels: None
    • Security: No
    • SLA: Not Applicable
    • Rank: 7135

      Description

      The NC cannot get information about running instances when libvirtd segfaults. We are observing many libvirtd segfaults on RHEL/CentOS 6 when 10 or more VMs are running on a single host. The workaround is to harden the NC against this case by having the NC code restart libvirtd whenever it detects a libvirtd crash.

        Activity

        Dmitrii Zagorodnov added a comment -

        I moved the libvirtd check and restart code from NC proper into an NC hook, which gets invoked at the same point where check_libvirt_runtime() used to be. Currently, this NC hook (in tools/nc-hooks/libvirt-check.sh) faithfully implements the logic that used to be in check_libvirt_runtime().

        Garrett pointed out that on some distros we may not want to perform this check-and-restart because this is best left to general mechanisms. Also, the mechanism for checking and restarting may vary among distros (e.g. '/etc/init.d/SERVICE status' vs 'service SERVICE status'), as well as the name of the service (libvirt-bin, libvirtd). Those are all valid points, but for now I wanted to have the minimal viable solution in place, since the current logic has been tested on the system that inspired this change in the first place.

        Packaging implications: While the installation footprint has not changed (tools/nc-hooks/libvirt-check.sh is not installed by the Makefile), the packaging spec should install that hook (into $EUCALYPTUS/etc/eucalyptus/nc-hooks) on the distros where the fix is desired. Let's talk about which distros those are, but I suspect they will include at least CentOS 5.x and 6.x.
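
The packaging step described here could be sketched as a small shell function. This is only an illustration: the function name `install_nc_hook`, its two arguments, and the 0755 mode are assumptions, not part of the Eucalyptus spec files. Only the source path and the nc-hooks destination come from the comment above.

```shell
#!/bin/sh
# Illustrative packaging step; the function name, arguments, and file mode
# are assumptions, not project code.

# Usage: install_nc_hook SRC_TREE EUCALYPTUS_ROOT
# Copies tools/nc-hooks/libvirt-check.sh from the source tree into
# EUCALYPTUS_ROOT/etc/eucalyptus/nc-hooks and marks it executable.
install_nc_hook() {
    src="$1/tools/nc-hooks/libvirt-check.sh"
    dest="$2/etc/eucalyptus/nc-hooks"
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    install -d "$dest" && install -m 0755 "$src" "$dest/"
}
```

A spec file for a distro that wants the fix might call something like `install_nc_hook . "$EUCALYPTUS"` from its install section, and omit the call on distros that rely on their own service supervision.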

        QA implications: Once we know what distros we are targeting with this hook and we have the packages that install the hook, the efficacy of the code change can be tested by shutting down libvirtd at will and ensuring that it is brought back. For distros on which we do not install the fix, we should provide instructions for how to configure the operating system to restart libvirtd automatically, and those instructions, then, are the ones to be QAed.
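
The manual QA procedure described above (stop libvirtd, then confirm it is brought back) could be scripted roughly as follows. This is a sketch under stated assumptions: the helper name `wait_for_recovery`, the timeout, and the polling interval are hypothetical, and this issue does not state how long the NC takes to invoke its hooks.

```shell
#!/bin/sh
# Illustrative QA helper; names and timings are assumptions, not project code.

# Poll STATUS_CMD until it succeeds or roughly TIMEOUT seconds have elapsed.
# Usage: wait_for_recovery TIMEOUT INTERVAL STATUS_CMD [ARGS...]
# Returns 0 if STATUS_CMD succeeded in time, 1 otherwise.
wait_for_recovery() {
    timeout="$1"; interval="$2"; shift 2
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}

# Example (run as root on an NC host where the hook is installed):
#   service libvirtd stop
#   wait_for_recovery 120 5 service libvirtd status \
#       && echo "PASS: libvirtd came back" \
#       || echo "FAIL: libvirtd still down"
```

Passing the status command as arguments keeps the helper independent of the service name and of the '/etc/init.d/SERVICE status' versus 'service SERVICE status' difference noted earlier in this comment.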

        Dmitrii Zagorodnov added a comment -

        As of 66700823d70edf145446ed1c9860f400ddb839c5 the hook will try to accommodate the 'service' method for checking and restarting the libvirt daemon. The hook should also work on distros that call the service 'libvirt-bin'. I think the right course of action now is to decide on which distros we are installing the hook and to test it manually on each of those (by shutting down the libvirt service and seeing if it comes back).
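
The behaviour described in this comment could look roughly like the sketch below. It is not the contents of tools/nc-hooks/libvirt-check.sh: the function names, the `INIT_DIR` and `SERVICE_BIN` variables, the exit codes, and the order in which the two methods are tried are all assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the real tools/nc-hooks/libvirt-check.sh.
# INIT_DIR and SERVICE_BIN are hypothetical knobs so the logic can be
# exercised without touching a real system.
INIT_DIR="${INIT_DIR:-/etc/init.d}"
SERVICE_BIN="${SERVICE_BIN:-service}"

# Print the command prefix that controls libvirt on this host, trying both
# service names from the discussion (libvirtd, libvirt-bin) and both
# invocation styles. Returns 1 if neither init script is present.
libvirt_ctl() {
    for svc in libvirtd libvirt-bin; do
        if [ -x "$INIT_DIR/$svc" ]; then
            if command -v "$SERVICE_BIN" >/dev/null 2>&1; then
                echo "$SERVICE_BIN $svc"      # 'service SERVICE ...' style
            else
                echo "$INIT_DIR/$svc"         # '/etc/init.d/SERVICE ...' style
            fi
            return 0
        fi
    done
    return 1
}

# Check the daemon and restart it only when the status check fails.
# Exit status: 0 = healthy or restarted, 1 = restart failed, 2 = no service.
check_and_restart_libvirt() {
    ctl=$(libvirt_ctl) || return 2
    # $ctl is intentionally unquoted: it may be two words ("service NAME"),
    # so this sketch assumes INIT_DIR contains no whitespace.
    if $ctl status >/dev/null 2>&1; then
        return 0
    fi
    echo "libvirt not running, restarting via: $ctl restart" >&2
    $ctl restart >/dev/null 2>&1 || return 1
    $ctl status >/dev/null 2>&1 || return 1
    return 0
}
```

A hook built along these lines would end with a call to `check_and_restart_libvirt`. Detecting the service by the presence of an init script is a simplification and would need revisiting on hosts that manage libvirt without one.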


          People

          • Votes: 0
          • Watchers: 3
