taikun.cloud

Taikun OCP Guide

Upgrading Keystone

As of the Newton release, keystone supports two different approaches
to upgrading across releases. The traditional approach requires a
significant outage to be scheduled for the entire duration of the
upgrade process. The more modern approach results in zero downtime, but
is more complicated due to a longer upgrade procedure.

Note

The details of these steps depend entirely on your specific
deployment, such as your chosen application server and database
management system. Use them only as a guide when implementing your own
upgrade process.

Before you begin

Plan your upgrade:

  • Read and ensure you understand the release notes for the next
    release.
  • Resolve any outstanding deprecation warnings in your logs. Some
    deprecation cycles are as short as a single release, so it’s possible to
    break a deployment if you leave any outstanding warnings. It
    might be a good idea to re-read the release notes for the previous
    release (or two!).
  • Prepare your new configuration files, including
    keystone.conf, logging.conf,
    policy.yaml, keystone-paste.ini, and anything
    else in /etc/keystone/, by customizing the corresponding
    files from the next release.

Upgrading with downtime

This is a high-level description of our upgrade strategy built around
keystone-manage db_sync. It assumes that you are willing to
have downtime of your control plane during the upgrade process and
presents minimal risk. With keystone unavailable, no other OpenStack
services will be able to authenticate requests, effectively preventing
the rest of the control plane from functioning normally.

  1. Stop all keystone processes. Otherwise, you’ll risk multiple
    releases of keystone trying to write to the database at the same time,
    which may result in data being inconsistently written and read.
  2. Make a backup of your database. Keystone does not support
    downgrading the database, so restoring from a full backup is your only
    option for recovery in the event of an upgrade failure.
  3. Upgrade all keystone nodes to the next release.
  4. Update your configuration files (/etc/keystone/) with
    those corresponding to the latest release.
  5. Run keystone-manage db_sync from any single node to
    upgrade the database schema and run any corresponding database
    migrations.
  6. (New in Newton) Run keystone-manage doctor to
    diagnose symptoms of common deployment issues and receive instructions
    for resolving them.
  7. Start all keystone processes.
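The database part of this procedure (steps 5 and 6) is easy to script. The following is a minimal Python sketch, not part of keystone: the names run_command and migrate_offline are hypothetical, and it assumes the earlier steps (stopping keystone, taking the backup, upgrading code and configuration) have already been handled by your own deployment tooling. It also assumes keystone-manage doctor exits non-zero when it detects a problem; confirm this against your release before relying on it.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run argv without a shell and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def migrate_offline(run: Runner = run_command) -> None:
    """Steps 5 and 6: run db_sync once, then keystone-manage doctor.

    Assumes every keystone process is stopped (step 1), the database is
    backed up (step 2), and code and /etc/keystone/ are upgraded (steps 3-4).
    """
    if run(["keystone-manage", "db_sync"]) != 0:
        # The database cannot be downgraded, so recovery means restoring the backup.
        raise RuntimeError("db_sync failed; restore the database from backup")
    # Assumption: doctor exits non-zero when it detects a known symptom.
    if run(["keystone-manage", "doctor"]) != 0:
        raise RuntimeError("keystone-manage doctor reported issues to resolve")
```

Call migrate_offline() on exactly one node; the runner parameter exists so the sequence can be dry-run with a fake runner before touching a real database.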

Upgrading with minimal downtime

If you run a multi-node keystone cluster that uses a replicated
database, like a Galera cluster, it is possible to upgrade with minimal
downtime. This method also optimizes recovery time from a failed
upgrade. This section assumes familiarity with the base case (Upgrading with downtime) outlined
above. In these steps, the nodes are divided into the first node and
the other nodes.

  1. Back up your database. There is no way to roll back the upgrade of
    keystone, and this backup is your worst-case fallback option.
  2. Disable keystone on all nodes but the first node. This
    can be done via a variety of mechanisms that will depend on the
    deployment. If you are unable to disable a service or place a service
    into maintenance mode in your load balancer, you can stop the keystone
    processes.
  3. Stop the database service on one of the other nodes in
    the cluster. This will isolate the old dataset on a single node in the
    cluster. In the event of a failed upgrade, this data can be used to
    rebuild the cluster without having to restore from backup.
  4. Update the configuration files on the first node.
  5. Upgrade keystone on the first node. keystone is now
    down for your cloud.
  6. Run keystone-manage db_sync on the first
    node. As soon as this finishes, keystone is working again on a
    single node in the cluster.
  7. keystone is now upgraded on a single node. Your load balancers will
    be sending all traffic to this single node. This is your chance to
    verify that keystone is up and running and not broken. If keystone is
    broken, see the Rollback after a failed upgrade section below.
  8. Once you have verified that keystone is up and running, begin the
    upgrade on the other nodes. This entails updating
    configuration files and upgrading the code. The db_sync
    does not need to be run again.
  9. On the node where you stopped the database service, be sure to
    restart it and ensure that it properly rejoins the cluster.

Using this model, the outage window is minimized because the only
time when your cluster is totally offline is between loading the newer
version of keystone and running the db_sync command.
Typically, the outage with this method can be measured in tens of
seconds, especially if automation is used.
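To make the split between the first node and the other nodes concrete, here is a hypothetical Python helper that only produces the ordered plan described above; it executes nothing. It assumes the first entry in the node list is the first node and, as an arbitrary choice, that the second entry is the node whose database service is stopped to preserve the old dataset. The action strings are illustrative labels, not commands.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (node, action)


def minimal_downtime_plan(nodes: List[str]) -> List[Step]:
    """Return the ordered (node, action) steps of the minimal-downtime upgrade.

    nodes[0] is treated as the "first" node; nodes[1] (an arbitrary pick
    among the other nodes) keeps the old dataset with its database stopped.
    """
    if len(nodes) < 2:
        raise ValueError("a replicated cluster needs at least two nodes")
    first, others = nodes[0], nodes[1:]
    db_holdout = others[0]
    plan: List[Step] = [(first, "back up the database")]
    plan += [(node, "disable keystone") for node in others]
    plan.append((db_holdout, "stop the database service"))
    plan += [
        (first, "update /etc/keystone/"),
        (first, "upgrade keystone"),
        (first, "keystone-manage db_sync"),  # the only schema migration
        (first, "verify keystone"),
    ]
    # The other nodes get new code and configuration, but no db_sync.
    for node in others:
        plan += [(node, "update /etc/keystone/"), (node, "upgrade keystone")]
    plan.append((db_holdout, "start the database service and rejoin the cluster"))
    return plan
```

In the resulting plan, keystone-manage db_sync appears exactly once, on the first node, and the holdout node's database is only restarted after every node is upgraded.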

Rollback after a failed upgrade

If the upgrade fails, only a single node has been affected. This
makes the recovery simpler and quicker. If issues are not discovered
until the entire cluster is upgraded, a full shutdown and restore from
backup will be required. That will take much longer than just fixing a
single node with an old copy of the database still available. This
process depends on your architecture, and it is highly recommended
that you practice it in a development environment before using it for
the first time.

  1. Isolate the bad node. Shut down keystone and the database services on
    the upgraded “bad” node.
  2. Bootstrap the database cluster from the node holding the old data.
    This may require wiping the data first on any nodes that are not
    holding old data.
  3. Enable keystone on the old nodes in your load balancer or, if the
    processes were stopped, restart them.
  4. Validate that keystone is working.
  5. Downgrade the code and config files on the bad node.

This process should be doable in a matter of minutes and will
minimize cloud downtime if it is required.

Upgrading without downtime

New in version 10.0.0 (Newton)

Upgrading without downtime is only supported in deployments upgrading
from Newton or a newer release.

If upgrading a Mitaka deployment to Newton, the commands described
below will be available, but the keystone-manage db_sync --expand
command will incur downtime (similar to running keystone-manage
db_sync), as it runs legacy (downtime-incurring) migrations prior to
running schema expansions.

Changed in version 21.0.0 (Yoga)

The migration tooling was changed from SQLAlchemy-Migrate to
Alembic. As part of this change, the data migration phase of
the database upgrades was dropped.

This is a high-level description of our upgrade strategy built around
additional options in keystone-manage db_sync. Although it
is much more complex than the upgrade process described above, it
assumes that you are not willing to have downtime of your control plane
during the upgrade process. With this upgrade process, end users will
still be able to authenticate to receive tokens normally, and other
OpenStack services will still be able to authenticate requests
normally.

  1. Make a backup of your database. keystone does not support
    downgrading the database, so restoring from a full backup is your only
    option for recovery in the event of an upgrade failure.

  2. Stop the keystone processes on the first node (or really, any
    arbitrary node). This node will serve to orchestrate database
    upgrades.

  3. Upgrade your first node to the next release, but do not start any
    keystone processes.

  4. Update your configuration files on the first node
    (/etc/keystone/) with those corresponding to the latest
    release.

  5. Run keystone-manage doctor on the first node to
    diagnose symptoms of common deployment issues and receive instructions
    for resolving them.

  6. Run keystone-manage db_sync --expand on the first
    node to expand the database schema to a superset of what both the
    previous and next release can utilize, and create triggers to facilitate
    the live migration process.

    Warning

    For MySQL, using the keystone-manage db_sync --expand
    command requires that you either grant your keystone user
    SUPER privileges, or run
    set global log_bin_trust_function_creators=1; in MySQL
    beforehand.

    At this point, new columns and tables may exist in the database, but
    will not all be populated in such a way that the next release
    will be able to function normally.

    As the previous release continues to write to the old schema,
    database triggers will live migrate the data to the new schema so it can
    be read by the next release.

    Note

    Prior to Yoga, data migrations were treated separately and required
    the use of the keystone-manage db_sync --migrate command
    after applying the expand migrations. This is no longer necessary and
    the keystone-manage db_sync --migrate command is now a
    no-op.

  7. Update your configuration files (/etc/keystone/) on
    all nodes (except the first node, which you’ve already done) with those
    corresponding to the latest release.

  8. Upgrade all keystone nodes to the next release, and restart them
    one at a time. During this step, you’ll have a mix of releases operating
    side by side, both writing to the database.

    As the next release begins writing to the new schema, database
    triggers will also migrate the data to the old schema, keeping both data
    schemas in sync.

  9. Run keystone-manage db_sync --contract to remove the
    old schema and all data migration triggers.

    When this process completes, the database will no longer be able to
    support the previous release.
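The essential constraint in this procedure is ordering: expand before any node runs the next release, and contract only after every node does. Below is a minimal Python sketch of steps 5 through 9, assuming it runs on the first node after steps 1 to 4 are complete. The function names are illustrative, and upgrade_node is a hypothetical hook you would supply to update /etc/keystone/, upgrade the code, and restart keystone on one node (covering steps 7 and 8). As in the earlier sketch, it assumes keystone-manage doctor exits non-zero when it finds a problem.

```python
import subprocess
from typing import Callable, Iterable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run argv without a shell and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def zero_downtime_upgrade(nodes: Iterable[str],
                          upgrade_node: Callable[[str], None],
                          run: Runner = run_command) -> None:
    """Steps 5-9, run on the first node after steps 1-4 are done.

    nodes: every keystone node, including the first one.
    upgrade_node: hypothetical hook that updates /etc/keystone/, upgrades
    the code, and restarts keystone on a single node.
    """
    def keystone_manage(*args: str) -> None:
        argv = ["keystone-manage", *args]
        if run(argv) != 0:
            raise RuntimeError("failed: " + " ".join(argv))

    keystone_manage("doctor")                 # step 5
    keystone_manage("db_sync", "--expand")    # step 6: superset schema + triggers
    for node in nodes:                        # steps 7-8: one node at a time
        upgrade_node(node)
    keystone_manage("db_sync", "--contract")  # step 9: drop old schema and triggers
```

If the expand step fails, no node is touched; if a node upgrade raises, contract is never reached, so the previous release can keep serving from the still-compatible schema.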

Using db_sync check

New in version 12.0.0 (Pike)

Changed in version 21.0.0 (Yoga)

Previously, this command would return 3 if data
migrations were required. Data migrations are now part of the expand
schema migrations, so this step is no longer necessary.

In order to check the current state of your rolling upgrades, you may
run the command keystone-manage db_sync --check. This will
inform you of any outstanding actions you have left to take as well as
any possible upgrades you can make from your current version. Here is
a list of the possible return codes:

  • A return code of 0 means you are currently up to date
    with the latest migration script version and all db_sync
    commands are complete.
  • A return code of 1 generally means something serious is
    wrong with your database and operator intervention will be
    required.
  • A return code of 2 means that an upgrade from your
    current database version is available, your database is not currently
    under version control, or the database is already under control. Your
    first step is to run keystone-manage db_sync --expand.
  • A return code of 4 means that the expansion and data
    migration stages are complete, and the next step is to run
    keystone-manage db_sync --contract.
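A small wrapper can translate the return code into the next action. This Python sketch is hypothetical (NEXT_STEP, next_step, and check_rolling_upgrade are not part of keystone); the messages paraphrase the list above, and code 3 is included only for databases managed by a pre-Yoga release, where it meant that keystone-manage db_sync --migrate was still required.

```python
import subprocess

# Documented exit statuses of "keystone-manage db_sync --check" and the next action.
NEXT_STEP = {
    0: "Up to date: all db_sync commands are complete.",
    1: "Serious database problem: operator intervention is required.",
    2: "Upgrade available (or database not under version control): "
       "run keystone-manage db_sync --expand.",
    3: "Pre-Yoga only: run keystone-manage db_sync --migrate.",
    4: "Expand complete: run keystone-manage db_sync --contract.",
}


def next_step(returncode: int) -> str:
    """Map a --check exit status to the operator's next action."""
    return NEXT_STEP.get(returncode, f"Unexpected return code {returncode}.")


def check_rolling_upgrade() -> str:
    """Run the check on this node and describe what to do next."""
    result = subprocess.run(["keystone-manage", "db_sync", "--check"], check=False)
    return next_step(result.returncode)
```

For example, print(check_rolling_upgrade()) on a node with access to the database reports the next command to run, while next_step can be reused by other automation that already captured the exit status.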