Live-migrate instances

Live-migrating an instance means moving its virtual machine to a
different OpenStack Compute server while the instance continues running.
Before starting a live migration, review the section on configuring
compute migrations. It covers the configuration settings required to
enable live migration, as well as the reasons for migrating and the
available non-live-migration options.

The instructions below cover shared-storage and volume-backed
migration. To block-migrate instances, add the command-line option
--block-migrate to the nova live-migration command, and
--block-migration to the openstack server migrate command.
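
For example, assuming the instance ID used later in this guide and a
destination host named HostC (both are placeholders here), a block
migration can be requested in either of these ways:

    $ nova live-migration --block-migrate d1df1b5a-70c4-4fed-98b7-423362f2c47c HostC
    $ openstack server migrate --live HostC --block-migration d1df1b5a-70c4-4fed-98b7-423362f2c47c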

Manual selection of the destination host

  1. Obtain the ID of the instance you want to migrate:

    $ openstack server list
    
    +--------------------------------------+------+--------+-----------------+------------+
    | ID                                   | Name | Status | Networks        | Image Name |
    +--------------------------------------+------+--------+-----------------+------------+
    | d1df1b5a-70c4-4fed-98b7-423362f2c47c | vm1  | ACTIVE | private=a.b.c.d | ...        |
    | d693db9e-a7cf-45ef-a7c9-b3ecb5f22645 | vm2  | ACTIVE | private=e.f.g.h | ...        |
    +--------------------------------------+------+--------+-----------------+------------+
  2. Determine on which host the instance is currently running. In
    this example, vm1 is running on HostB:

    $ openstack server show d1df1b5a-70c4-4fed-98b7-423362f2c47c
    
    +----------------------+--------------------------------------+
    | Field                | Value                                |
    +----------------------+--------------------------------------+
    | ...                  | ...                                  |
    | OS-EXT-SRV-ATTR:host | HostB                                |
    | ...                  | ...                                  |
    | addresses            | a.b.c.d                              |
    | flavor               | m1.tiny                              |
    | id                   | d1df1b5a-70c4-4fed-98b7-423362f2c47c |
    | name                 | vm1                                  |
    | status               | ACTIVE                               |
    | ...                  | ...                                  |
    +----------------------+--------------------------------------+
  3. Select the compute node the instance will be migrated to. In this
    example, we will migrate the instance to HostC, because
    nova-compute is running on it:

    $ openstack compute service list
    
    +----+------------------+-------+----------+---------+-------+----------------------------+
    | ID | Binary           | Host  | Zone     | Status  | State | Updated At                 |
    +----+------------------+-------+----------+---------+-------+----------------------------+
    |  3 | nova-conductor   | HostA | internal | enabled | up    | 2017-02-18T09:42:29.000000 |
    |  4 | nova-scheduler   | HostA | internal | enabled | up    | 2017-02-18T09:42:26.000000 |
    |  5 | nova-compute     | HostB | nova     | enabled | up    | 2017-02-18T09:42:29.000000 |
    |  6 | nova-compute     | HostC | nova     | enabled | up    | 2017-02-18T09:42:29.000000 |
    +----+------------------+-------+----------+---------+-------+----------------------------+
  4. Check that HostC has enough resources for
    migration:

    $ openstack host show HostC
    
    +-------+------------+-----+-----------+---------+
    | Host  | Project    | CPU | Memory MB | Disk GB |
    +-------+------------+-----+-----------+---------+
    | HostC | (total)    |  16 |     32232 |     878 |
    | HostC | (used_now) |  22 |     21284 |     422 |
    | HostC | (used_max) |  22 |     21284 |     422 |
    | HostC | p1         |  22 |     21284 |     422 |
    | HostC | p2         |  22 |     21284 |     422 |
    +-------+------------+-----+-----------+---------+
    • cpu: Number of CPUs
    • memory_mb: Total amount of memory, in MB
    • disk_gb: Total amount of space for
      NOVA-INST-DIR/instances, in GB

    In this table, the first row shows the total amount of resources
    available on the physical server. The second row shows the currently
    used resources. The third row shows the maximum used resources. The
    fourth row and below show the resources used by each
    project.

  5. Migrate the instance:

    $ openstack server migrate d1df1b5a-70c4-4fed-98b7-423362f2c47c --live HostC
  6. Confirm that the instance has been migrated successfully:

    $ openstack server show d1df1b5a-70c4-4fed-98b7-423362f2c47c
    
    +----------------------+--------------------------------------+
    | Field                | Value                                |
    +----------------------+--------------------------------------+
    | ...                  | ...                                  |
    | OS-EXT-SRV-ATTR:host | HostC                                |
    | ...                  | ...                                  |
    +----------------------+--------------------------------------+

    If the instance is still running on HostB, the migration
    failed. The nova-scheduler and nova-conductor
    log files on the controller and the nova-compute log file
    on the source compute host can help pinpoint the problem.
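
    For example, a first pass over those logs might look like this. Log
    file locations vary by deployment; the paths below follow the
    convention used later in this guide and are only an assumption:

    # grep -i error /var/log/nova/nova-scheduler.log /var/log/nova/nova-conductor.log
    # grep d1df1b5a-70c4-4fed-98b7-423362f2c47c /var/log/nova/nova-compute.log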

Automatic selection of the destination host

To leave the selection of the destination host to the Compute
service, use the nova command-line client.

  1. Obtain the instance ID as shown in step 1 of the Manual selection
    of the destination host section above.

  2. Leave out the host selection steps 2, 3, and 4.

  3. Migrate the instance:

    $ nova live-migration d1df1b5a-70c4-4fed-98b7-423362f2c47c
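
    With a sufficiently recent python-openstackclient, the unified
    client can usually do the same. The option name below is an
    assumption that depends on the client version, so verify it with
    openstack server migrate --help before relying on it:

    $ openstack server migrate --live-migration d1df1b5a-70c4-4fed-98b7-423362f2c47c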

Monitoring the migration

  1. Confirm that the instance is migrating:

    $ openstack server show d1df1b5a-70c4-4fed-98b7-423362f2c47c
    
    +----------------------+--------------------------------------+
    | Field                | Value                                |
    +----------------------+--------------------------------------+
    | ...                  | ...                                  |
    | status               | MIGRATING                            |
    | ...                  | ...                                  |
    +----------------------+--------------------------------------+
  2. Check progress

    Use the nova command-line client for nova’s migration monitoring
    feature. First, obtain the migration ID:

    $ nova server-migration-list d1df1b5a-70c4-4fed-98b7-423362f2c47c
    +----+-------------+-----------  (...)
    | Id | Source Node | Dest Node | (...)
    +----+-------------+-----------+ (...)
    | 2  | -           | -         | (...)
    +----+-------------+-----------+ (...)

    For readability, most output columns were removed. Only the first
    column, Id, is relevant. In this example, the migration
    ID is 2. Use this to get the migration status.

    $ nova server-migration-show d1df1b5a-70c4-4fed-98b7-423362f2c47c 2
    +------------------------+--------------------------------------+
    | Property               | Value                                |
    +------------------------+--------------------------------------+
    | created_at             | 2017-03-08T02:53:06.000000           |
    | dest_compute           | controller                           |
    | dest_host              | -                                    |
    | dest_node              | -                                    |
    | disk_processed_bytes   | 0                                    |
    | disk_remaining_bytes   | 0                                    |
    | disk_total_bytes       | 0                                    |
    | id                     | 2                                    |
    | memory_processed_bytes | 65502513                             |
    | memory_remaining_bytes | 786427904                            |
    | memory_total_bytes     | 1091379200                           |
    | server_uuid            | d1df1b5a-70c4-4fed-98b7-423362f2c47c |
    | source_compute         | compute2                             |
    | source_node            | -                                    |
    | status                 | running                              |
    | updated_at             | 2017-03-08T02:53:47.000000           |
    +------------------------+--------------------------------------+

    The output shows that the migration is running. Progress is measured
    by the number of memory bytes that remain to be copied (a rough
    percentage calculation is sketched after this list). If this number
    is not decreasing over time, the migration may be unable to complete,
    and it may be aborted by the Compute service.

    Note

    The command reports that no disk bytes are processed, even in the
    event of block migration.
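
As a rough worked example using the byte counters above (the exact
semantics of libvirt's counters can vary between versions, so treat
this only as an estimate):

    $ echo $(( 100 * 786427904 / 1091379200 ))
    72

About 72 percent of the guest memory still remains to be copied at
this point; re-running nova server-migration-show and repeating the
calculation shows whether the migration is converging.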

What to do when the migration times out

During the migration process, the instance may write to a memory page
after that page has been copied to the destination. When that happens,
the same page has to be copied again. The instance may write to memory
pages faster than they can be copied, so that the migration cannot
complete. There are two optional actions, controlled by libvirt.live_migration_timeout_action,
which can be taken against a VM after libvirt.live_migration_completion_timeout
is reached:

  1. abort (default): The live migration operation is cancelled
    after the completion timeout is reached. This is similar to using the
    API call
    DELETE /servers/{server_id}/migrations/{migration_id}.
  2. force_complete: The compute service will either pause
    the VM or trigger post-copy, depending on whether post-copy is enabled
    and available (libvirt.live_migration_permit_post_copy
    is set to True). This is similar to using the API call
    POST /servers/{server_id}/migrations/{migration_id}/action (force_complete).

See the help text of the libvirt.live_migration_timeout_action
configuration option for more details.
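
As an illustration, both options live in the [libvirt] section of
nova.conf on the compute hosts; the values below are placeholders, not
recommendations:

    [libvirt]
    # Abort (the default) or force-complete a live migration that has
    # not finished when the completion timeout is reached.
    live_migration_timeout_action = force_complete
    # See the option help for the exact timeout semantics; this value
    # is only a placeholder.
    live_migration_completion_timeout = 800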

The following remarks assume the KVM/Libvirt hypervisor.

How to know that the migration timed out

To determine that the migration timed out, inspect the
nova-compute log file on the source host. The following log
entry shows that the migration timed out:

# grep WARNING.*d1df1b5a-70c4-4fed-98b7-423362f2c47c /var/log/nova/nova-compute.log
...
WARNING nova.virt.libvirt.migration [req-...] [instance: ...]
live migration not completed after 1800 sec

Addressing migration timeouts

To stop the migration from putting load on infrastructure resources
like network and disks, you may opt to cancel it manually.

$ nova live-migration-abort INSTANCE_ID MIGRATION_ID

To make live-migration succeed, you have several options:

  • Manually force-complete the migration

    $ nova live-migration-force-complete INSTANCE_ID MIGRATION_ID

    The instance is paused until memory copy completes.

    Caution

    Since the pause impacts time keeping on the instance and not all
    applications tolerate incorrect time settings, use this approach with
    caution.

  • Enable auto-convergence

    Auto-convergence is a Libvirt feature. Libvirt detects that the
    migration is unlikely to complete and slows down the instance’s CPU
    until the memory copy process is faster than the instance’s memory
    writes.

    To enable auto-convergence, set
    live_migration_permit_auto_converge=true in
    nova.conf and restart nova-compute. Do this on all
    compute hosts (see the configuration sketch after this list).

    Caution

    One possible downside of auto-convergence is that it slows the
    instance down.

  • Enable post-copy

    This is a Libvirt feature. Libvirt detects that the migration does
    not progress and responds by activating the virtual machine on the
    destination host before all of its memory has been copied. Access to
    missing memory pages results in page faults that are satisfied from
    the source host.

    To enable post-copy, set
    live_migration_permit_post_copy=true in
    nova.conf and restart nova-compute. Do this on all
    compute hosts (see the configuration sketch after this list).

    When post-copy is enabled, manual force-completion does not pause the
    instance but switches to the post-copy process.

    Caution

    Possible downsides:

    • When the network connection between source and destination is
      interrupted, page faults cannot be resolved anymore, and the virtual
      machine is rebooted.
    • Post-copy may lead to an increased page fault rate during migration,
      which can slow the instance down.
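
The auto-convergence and post-copy switches mentioned above belong to
the [libvirt] section of nova.conf on each compute host; a minimal
illustrative sketch (enable only the behaviour you actually want):

    [libvirt]
    # Let libvirt throttle the guest CPU when the migration does not converge.
    live_migration_permit_auto_converge = true
    # Allow switching to post-copy when the migration does not converge.
    live_migration_permit_post_copy = true

After changing either option, restart nova-compute on that host.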

If live migrations routinely time out or fail during cleanup
operations because the user token has expired, consider configuring nova
to use service user tokens.
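
A minimal sketch of the corresponding nova.conf section, with
placeholder credentials and an assumed Keystone endpoint (check the
service user token documentation for your release before applying
this):

    [service_user]
    send_service_user_token = true
    auth_type = password
    auth_url = https://keystone.example.com/identity/v3
    project_name = service
    project_domain_name = Default
    username = nova
    user_domain_name = Default
    password = REPLACE_ME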