Details

    • Type: New Feature
    • Status: Unconfirmed
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Security Level: Public (Anonymously viewable)
    • Labels:
    • Epic/Theme:
    • Rank:
      2659
    • Feature Category:
      Resource Management: Physical

      Description

      Thanks to the many people who had offline discussions with me on this topic. Some quick points about Eucalyptus upgrade (mostly related to "Live Upgrade"):

      1. We realized that there are no clearly defined industry standards for "rolling upgrade", "live upgrade", and "online upgrade", especially for clouds. Without theorizing too much, we decided to focus on "live upgrade" for the near term, and "rolling upgrade" for the longer term, with the following description customized for Eucalyptus.

      2. Live Upgrade refers to the process where the instances in Eucalyptus will remain live.

      2a. In other words, instances in Eucalyptus that are running before the upgrade begins will continue to run during the upgrade and beyond.

      2b. Live Upgrade does not guarantee that instances that are live during the upgrade process will also be accessible to end users during that process. There will be some connectivity downtime, but the goal is to keep it minimal.

      2c. We need to distinguish between two types of downtime: first, the downtime experienced by the cloud management services; second, the downtime in access to the live instances.

      2d. Live Upgrade will focus on minimizing both types of downtime, but priority is given to reducing the downtime in access to the instances.

      3. "Rolling Upgrade" refers to the process where Eucalyptus components can be upgraded one at a time while keeping all the services and instance access intact. This is a tall order since this will involve running and testing Eucalyptus clouds with multiple versions of Eucalyptus at the same time. We will get to this. This is NOT in scope for this feature request.
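
      As an illustration only, the distinctions in points 2 and 3 can be sketched as a small check over the two kinds of downtime. The class, field names, and return labels below are hypothetical and are not part of Eucalyptus; zero downtime is treated here as a necessary condition for the rolling-upgrade bar, not as its full definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpgradeObservation:
    """Hypothetical summary of one upgrade run (all fields are illustrative)."""
    instances_survived: bool            # instances running before are still running after
    management_downtime_s: float        # seconds the cloud management services were down
    instance_access_downtime_s: float   # seconds end users could not reach live instances


def classify(obs: UpgradeObservation) -> str:
    """Map an observation onto the terms used in points 2 and 3."""
    if not obs.instances_survived:
        # Point 2a: instances must keep running through the upgrade.
        return "not a live upgrade"
    if obs.management_downtime_s == 0 and obs.instance_access_downtime_s == 0:
        # Point 3: all services and instance access stay intact.
        return "meets rolling upgrade bar"
    # Points 2b-2d: some downtime of either kind is tolerated but minimized.
    return "live upgrade"


if __name__ == "__main__":
    print(classify(UpgradeObservation(True, 120.0, 15.0)))
```

      Keeping the two downtime figures as separate fields mirrors point 2c and makes it possible to prioritize instance-access downtime as point 2d asks.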

        Issue Links

          Activity

          Shashi Mysore added a comment -

          Zach Hill says -
          "We should probably generalize Live Upgrade to include any user-owned and externally reachable resource. Currently that is just instances, but in the future it would include ELBs as well, for example.

          We are going to have to be careful with regard to EBS volumes, particularly those on TGT-based SCs because upgrading eucalyptus on such a host may also upgrade TGT which will reset the connections when it restarts itself after upgrade. This is something we will have to address directly and test exact behavior."

          Lester Wade added a comment -

          I think rolling upgrade is a long way off. It's a good goal to aim for, but I don't think it's an immediate priority. Obviously it's desirable, but I'm not aware of any users who wouldn't expect some period of downtime during an upgrade; this is pretty much a fact of life. Before and after such a live upgrade, they want resources to persist and to remain accessible to users (users just can't make new requests against services). This is most important to them.

          I think the explanation above is very good. We still need to message this appropriately: "Live" might still be a little misleading, since it can be read as "the service is still live during the upgrade".

          Shashi Mysore added a comment -

          Agreed, Lester Wade and Benno Joy. Now contemplating renaming this to "Warm Upgrade" to avoid confusion.

          Shashi Mysore added a comment -

          Further discussion on this: "live" upgrade may be misleading in terms of what the industry is already used to. Any time we say "live", the expectation is that the services will remain live during the upgrade. In our case, there are two services: the Eucalyptus cloud itself serving API requests, and the cloud resources (VMs, EBS volumes, etc.). We know that with the current definition of live upgrade, neither the cloud nor the resources are going to be live (available for use / access) during the upgrade. Hence, "live upgrade" will lead to confusion. We need a different name. We came up with the following proposal:

          Let's call this "Warm Upgrade".

          I immediately googled to see if this jibes well with the market (specifically data center operations folks), and apparently it does. Cisco IOS uses this term extensively, mainly to convey:

          "This functionality reduces the downtime of a device during planned Cisco IOS software upgrades or downgrades."

          and

          "You want to upgrade the router IOS version with minimal service interruption."

          I think "warm upgrade" is the right name for this feature. Any objections or comments?

          For future:

          • When we figure out a way to keep access to our services and cloud resources available during the upgrade, we will call it "live upgrade".
          • When we figure out a way to roll in the upgrades one service / one machine at a time with zero downtime, we will call it "rolling upgrade".

          Thoughts?

          Colby Dyess added a comment -

          Thanks for the heads up Shashi Mysore. From a marketing standpoint the current messaging is sufficient. Details for this, or any other capability, should be covered in documentation and other authoritative sources.

          Lester Wade added a comment -

          As an administrator, I would also like the capability to upgrade AZs one at a time. This would enable me to schedule the upgrade across different AZs or the second tier (with cloud-wide components like the service endpoints being the first tier).


            People

            • Votes:
              2
            • Watchers:
              7

              Dates

              • Created:
                Updated:

                Development