Deploy Steps Tutorial

In this tutorial I'm showing how to create a custom deploy step for Ironic, how to build a ramdisk with it and how to use it when deploying a node.

Deploy steps are an answer to the question "how do I run non-standard actions during deployment?". Out-of-band steps run from your control plane and can talk to the BMC. More interesting for us are in-band steps that run from within the machine and offer nearly infinite opportunity for customization.

Today we'll create a solution for the following story:

As an operator, I would like to inject small files into the root partition of the final instance through the bare metal API.

There are, of course, numerous ways of implementing it, cloud-init being probably the most popular. But we will concentrate on using a deploy step. The complete source code for this tutorial can be found here: https://github.com/dtantsur/ironic-inject-files/

Prerequisites

What will we need for this exercise? A functional Ironic installation, of course! If you don't have one, create it with Bifrost. If you installed Ironic in another way, everything should still work, but the paths may be different.

Note

In Bifrost projects are cloned into

/opt/stack/<project name>

and installed into a virtual environment at

/opt/stack/bifrost

I will use a CentOS 8 image created with:

DIB_RELEASE=8 disk-image-create -o centos8 centos vm openssh-server

This guide assumes some familiarity with Ironic and its CLI. I also recommend reading my post on scheduling first.

Writing a step

We are creating an in-band deploy step, thus the code will be executed inside the deployment ramdisk on the target node. We will not have access to the control plane, but we will have unlimited access to hardware and will be able to mount the disks.

Before we dive into the code, we need to decide on the priority, which defines when exactly the step runs. Looking at the existing steps, our step has to run after the image is written (priority 80) and before the ramdisk is shut down (priority 40). In theory the new deploy step may modify the files that affect the bootloader installation, so let's pick 50.
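To make the ordering concrete, here is a minimal sketch of how priorities translate into execution order. The step names are made up for illustration; only the values 80, 50 and 40 come from the paragraph above. The rule it demonstrates is that steps with a higher priority run earlier.

```python
# Hypothetical step names; only the priorities mirror the text above.
steps = [
    {'step': 'shut_down_ramdisk', 'priority': 40},
    {'step': 'inject_files', 'priority': 50},
    {'step': 'write_image', 'priority': 80},
]

# Deploy steps run from the highest priority to the lowest.
execution_order = [
    s['step']
    for s in sorted(steps, key=lambda s: s['priority'], reverse=True)
]
print(execution_order)
```

With priority 50 our step lands after the image is written and before the ramdisk goes away.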

Finally, we need to understand what we can and cannot use. We cannot use Ironic itself. We can use any Python library that can be installed with pip. What about the agent API?

The more stable API is offered by ironic-lib - the internal shared library used by Ironic components. Somewhat less stable API is offered by ironic-python-agent itself. Try to use ironic-lib whenever possible, falling back to ironic-python-agent internal API when required. Hardware manager calls (the ones from HardwareManager) must always be made via dispatch_to_managers, not directly!

Initial structure

We are building a Python package, and we'll use some standard OpenStack libraries, for example pbr. Let's start with this layout:

$ tree
.
├── ironic_inject_files.py
├── LICENSE
├── setup.cfg
└── setup.py

0 directories, 4 files

setup.py will be a stub that tells pbr to look at setup.cfg:

import setuptools

setuptools.setup(
    setup_requires=['pbr>=2.0.0'],
    pbr=True)

We'll start with a very simple setup.cfg:

[metadata]
name = ironic_inject_files
summary = File injection deploy step for Ironic
author = Dmitry Tantsur
python-requires = >=3.6

Hardware manager

Hardware managers are a kind of ironic-python-agent plugins. They are extremely powerful, but today we're only interested in get_deploy_steps.

Let us start working on ironic_inject_files.py by adding a skeleton for a hardware manager:

from ironic_python_agent import hardware

class InjectFilesHardwareManager(hardware.HardwareManager):

    HARDWARE_MANAGER_NAME = 'InjectFilesHardwareManager'
    HARDWARE_MANAGER_VERSION = '1'

    def evaluate_hardware_support(self):
        return hardware.HardwareSupport.SERVICE_PROVIDER

First, we specify the hardware manager name and version. Then we provide the mandatory evaluate_hardware_support call that tells the ramdisk whether this hardware manager is suitable for this node. Our hardware manager is considered suitable for all nodes and receives the highest priority (SERVICE_PROVIDER).

Warning

Do not inherit from an existing hardware manager, e.g. GenericHardwareManager. This is nearly never what you want. Inherit from the base HardwareManager instead and provide only the methods you care about. Lower priority hardware managers will be used for methods that you don't provide.
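The fallback behaviour described in this warning can be modelled in a few lines. This is a deliberately simplified toy and not the real ironic-python-agent code: the classes, the `support` attribute and `toy_dispatch` are all hypothetical stand-ins, and the real agent detects missing methods differently.

```python
# A simplified model of hardware manager dispatching, NOT the real
# ironic-python-agent implementation. All names here are hypothetical.

class ToyGenericManager:
    support = 1  # lower value: tried later

    def get_os_install_device(self):
        return '/dev/sda'


class ToyInjectFilesManager:
    support = 3  # higher value: tried first

    def inject_files(self, files):
        return sorted(files)


def toy_dispatch(managers, method, *args):
    """Call `method` on the best-ranked manager that provides it."""
    ranked = sorted(managers, key=lambda m: m.support, reverse=True)
    for manager in ranked:
        func = getattr(manager, method, None)
        if func is not None:
            return func(*args)
    raise RuntimeError('No manager implements %s' % method)


managers = [ToyGenericManager(), ToyInjectFilesManager()]
# Provided by the high-priority manager.
print(toy_dispatch(managers, 'inject_files', {'/etc/motd': '...'}))
# Not provided by it, so the call falls through to the generic one.
print(toy_dispatch(managers, 'get_os_install_device'))
```

Because the second call falls through, our manager does not need to know anything about disks: it only has to provide the methods it cares about.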

Then we need to declare our future deploy step, let's call it inject_files:

def get_deploy_steps(self, node, ports):
    return [
        {
            'interface': 'deploy',
            'step': 'inject_files',
            'priority': 0,
            'reboot_requested': False,
            'abortable': True,
            'argsinfo': {
                'files': {
                    'required': True,
                    'description': 'Mapping between file paths and their '
                                   'base64 encoded contents'
                }
            }
        }
    ]

def inject_files(self, node, ports, files):
    pass

Most of the values are obvious, but why is the priority 0? Haven't we agreed to use 50? Well, I could put 50 here, and the step would always run for all nodes. But I want to show you how to enable optional steps during scheduling. A priority of 0 makes a step optional.

The inject_files call accepts two standard arguments, the node as a dictionary and a list of ports belonging to it, and one custom argument, files. If we wanted per-node customization, we could have used the node["instance_info"] or node["extra"] dictionaries for passing information. But I would like to showcase passing arguments via deploy templates since this approach is more cloud-style and is compatible with using OpenStack Nova if you need it.

Looking for a partition

We continue by writing a helper function to find a partition with the given directory. I will cut a corner here and look for the /etc directory as a marker of the root partition. Real code should rather allow specifying the target partition.

Using a few existing primitives from ironic-lib and ironic-python-agent:

import contextlib
import os

from ironic_lib import disk_utils
from ironic_lib import utils
from oslo_concurrency import processutils
from oslo_log import log

from ironic_python_agent import hardware

LOG = log.getLogger(__name__)


# This is being moved to ironic-lib: https://review.opendev.org/c/openstack/ironic-lib/+/774502
def partition_index_to_name(device, index):
    # NVMe devices separate the partition number with a "p"
    part_delimiter = ''
    if 'nvme' in device:
        part_delimiter = 'p'
    return device + part_delimiter + str(index)


@contextlib.contextmanager
def partition_with_path(path):
    root_dev = hardware.dispatch_to_managers('get_os_install_device')
    partitions = disk_utils.list_partitions(root_dev)

    for part in partitions:
        if 'esp' in part['flags'] or 'lvm' in part['flags']:
            LOG.debug('Skipping partition %s', part)
            continue

        part_path = partition_index_to_name(root_dev, part['number'])
        try:
            # mounted() creates (and later removes) a temporary mount point
            with utils.mounted(part_path) as local_path:
                found_path = os.path.join(local_path, path)
                LOG.debug('Checking for path %s on %s', found_path, part_path)
                if not os.path.isdir(found_path):
                    continue

                LOG.info('Path found: /%s on %s', path, part_path)
                yield found_path
                return
        except processutils.ProcessExecutionError as exc:
            LOG.warning('Failure when inspecting partition %s: %s', part, exc)

    raise RuntimeError("No partition found with path %s, scanned: %s"
                       % (path, partitions))

  1. dispatch_to_managers calls into another hardware manager that implements get_os_install_device - a call to find out which disk device is used for the root file system.

  2. list_partitions returns a list of dictionaries with partition information.

  3. We filter out the UEFI boot partition and ignore LVM.

  4. Then we mount each partition (requires ironic-lib 4.5.0 from Wallaby) and look for a path there.

  5. If the path exists, yield it and stop.
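The naming helper is easy to check in isolation, without a ramdisk. The snippet below repeats the logic of partition_index_to_name from the listing above and feeds it two device names picked purely for illustration.

```python
def partition_index_to_name(device, index):
    # NVMe devices separate the partition number with a "p".
    part_delimiter = 'p' if 'nvme' in device else ''
    return device + part_delimiter + str(index)


sata_name = partition_index_to_name('/dev/sda', 1)
nvme_name = partition_index_to_name('/dev/nvme0n1', 2)
print(sata_name, nvme_name)
```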

Deploy step

Now we are ready to finish our deploy step:

class InjectFilesHardwareManager(hardware.HardwareManager):

    HARDWARE_MANAGER_NAME = 'InjectFilesHardwareManager'
    HARDWARE_MANAGER_VERSION = '1'

    def evaluate_hardware_support(self):
        return hardware.HardwareSupport.SERVICE_PROVIDER

    def get_deploy_steps(self, node, ports):
        return [
            {
                'interface': 'deploy',
                'step': 'inject_files',
                'priority': 0,
                'reboot_requested': False,
                'abortable': True,
                'argsinfo': {
                    'files': {
                        'required': True,
                        'description': 'Mapping between file paths and their '
                                       'base64 encoded contents'
                    }
                }
            }
        ]

    def inject_files(self, node, ports, files):
        with partition_with_path('etc') as path:
            for dest, content in files.items():
                content = base64.b64decode(content)
                fname = os.path.normpath(
                    os.path.join(path, '..', dest.lstrip('/')))
                LOG.info('Injecting %s into %s', dest, fname)
                with open(fname, 'wb') as fp:
                    fp.write(content)

The code is pretty straightforward: we find the root partition, decode the content, recalculate the path and write the file. Since the step uses base64.b64decode, do not forget to add import base64 at the top of the module.
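The path recalculation is the least obvious part, so here is a sketch that exercises the same logic on a workstation. A temporary directory stands in for the mounted root partition, and `inject_into` is a hypothetical helper that mirrors the loop body of inject_files without any Ironic dependencies.

```python
import base64
import os
import tempfile


def inject_into(etc_path, files):
    """Write base64-encoded files relative to the parent of etc_path."""
    written = []
    for dest, content in files.items():
        # <mount>/etc + '..' + 'etc/motd' normalizes to <mount>/etc/motd
        fname = os.path.normpath(
            os.path.join(etc_path, '..', dest.lstrip('/')))
        with open(fname, 'wb') as fp:
            fp.write(base64.b64decode(content))
        written.append(fname)
    return written


# A temporary directory plays the role of the mounted root partition.
fake_root = tempfile.mkdtemp()
etc_path = os.path.join(fake_root, 'etc')
os.mkdir(etc_path)

# The base64 string encodes "Hello Ironic" followed by a newline.
written = inject_into(etc_path, {'/etc/motd': 'SGVsbG8gSXJvbmljCg=='})
with open(written[0], 'rb') as fp:
    motd = fp.read()
print(written[0], motd)
```

Note that, like the original step, this sketch assumes the parent directory of each destination already exists.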

Entry point

The package needs the last touch before we can work on building a ramdisk with it. Let us update setup.cfg to help ironic-python-agent find our new hardware manager:

[metadata]
name = ironic-inject-files
summary = File injection deploy step for Ironic
author = Dmitry Tantsur
python-requires = >=3.6

[files]
modules =
    ironic_inject_files

[entry_points]
ironic_python_agent.hardware_managers =
    ironic_inject_files = ironic_inject_files:InjectFilesHardwareManager

The last line creates an entry point called ironic_inject_files in the ironic_python_agent.hardware_managers namespace pointing at our new class.
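After installing the package (for example with pip install . in a virtual environment), you can check that the entry point is registered. The sketch below uses only the standard library (importlib.metadata, so Python 3.8 or newer) and is not necessarily how ironic-python-agent itself loads its plugins; the function name is made up for this example.

```python
from importlib import metadata  # available since Python 3.8

GROUP = 'ironic_python_agent.hardware_managers'


def list_hardware_manager_entry_points():
    """Return (name, target) pairs registered in the namespace."""
    eps = metadata.entry_points()
    if hasattr(eps, 'select'):
        # Python 3.10 and newer
        found = eps.select(group=GROUP)
    else:
        # Python 3.8 and 3.9 return a plain dict of lists
        found = eps.get(GROUP, [])
    return sorted((ep.name, ep.value) for ep in found)


for name, value in list_hardware_manager_entry_points():
    print(name, '->', value)
```

If our package is installed in the current environment, the output should include an ironic_inject_files entry pointing at ironic_inject_files:InjectFilesHardwareManager. Without it, the loop simply prints whatever other hardware managers are installed, possibly nothing.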

Note

All hardware managers are loaded automatically and sorted according to what their evaluate_hardware_support returns.

Building ramdisk

In this part we are building a deployment ramdisk with ironic-python-agent and our new hardware manager.

Writing an element

The standard way of building production-ready ramdisks for Ironic is with diskimage-builder (DIB). We will create a new element to install our hardware manager automatically (using the same repository for convenience):

$ tree elements/
elements/
└── ironic-inject-files
    ├── element-deps
    ├── install.d
    │   └── 80-ironic-inject-files-install
    └── source-repository-ironic-inject-files

2 directories, 3 files

The diskimage-builder documentation explains these files in greater detail. Here is what we need:

element-deps

lists dependencies on other elements:

ironic-python-agent-ramdisk
source-repositories

The first item simply declares a dependency on ironic-python-agent itself, the second - on a helper element for checking out git repositories.

source-repository-ironic-inject-files

is a configuration for checking out the ironic-inject-files repository:

ironic-inject-files git /tmp/ironic-inject-files https://github.com/dtantsur/ironic-inject-files

This line specifies the destination directory and the source repository. Conveniently, DIB allows overriding it via environment variables, so you can, for example, use your local repository instead of my GitHub one.

install.d/80-ironic-inject-files-install

is the actual installation script. The leading number is a priority. Since ironic-python-agent itself is installed with priority 60, we need a higher value (in DIB, priorities work the opposite way from deploy steps: lower numbers run first).

#!/bin/bash

if [ "${DIB_DEBUG_TRACE:-0}" -gt 0 ]; then
    set -x
fi
set -eu
set -o pipefail

/opt/ironic-python-agent/bin/pip install /tmp/ironic-inject-files

The first lines are boilerplate typical for DIB elements. The last line installs our project (cloned by source-repositories) into the virtual environment where ironic-python-agent lives.

Note

DIB also supports installing ironic-python-agent from packages, in which case the path will be different.

It is mandatory to make install scripts executable!

chmod +x elements/ironic-inject-files/install.d/80-ironic-inject-files-install

Building and configuring

Now we are ready to build the ramdisk! Assuming you want to use Debian, that ironic-python-agent-builder is cloned in /opt/stack (as is the case in Bifrost) and that your project is in /home/$USER/ironic-inject-files, run:

# Bifrost-specific
source /opt/stack/bifrost/bin/activate
cd /opt/stack/ironic-python-agent-builder
# Build the image
export DIB_REPOLOCATION_ironic_inject_files=/home/$USER/ironic-inject-files
ironic-python-agent-builder -o ~/ipa-debian-inject-files debian-minimal \
    --elements-path /home/$USER/ironic-inject-files/elements \
    -e dhcp-all-interfaces -e ironic-inject-files

What is happening here?

  • DIB_REPOLOCATION_ironic_inject_files overrides the location for our project (note underscores instead of dashes),

  • debian-minimal creates a minimal Debian image,

  • dhcp-all-interfaces (optional) runs DHCP for all interfaces on start-up,

  • ironic-inject-files requests our new element to be used.

The output will be the files ipa-debian-inject-files.kernel and ipa-debian-inject-files.initramfs in your home directory, which we will supply to Ironic. Bifrost keeps images in /httpboot, thus:

sudo cp ~/ipa-debian-inject-files.kernel /httpboot/ipa.kernel
sudo cp ~/ipa-debian-inject-files.initramfs /httpboot/ipa.initramfs

Note

Rather than overwriting the default deploy ramdisk, you can also copy the files to /httpboot under their original names and set them per node with

baremetal node set <node> \
    --driver-info deploy_kernel=file:///httpboot/ipa-debian-inject-files.kernel \
    --driver-info deploy_ramdisk=file:///httpboot/ipa-debian-inject-files.initramfs

Lastly, Bifrost uses a feature called fast track, in which it keeps the node powered on with the ramdisk running and waiting for commands. A node in the fast track mode won't recognize your new hardware manager until you restart it. Power the nodes off with:

baremetal node power off <node>

Deploy templates

Our deploy step is disabled by default (priority = 0). Planned in the Wallaby release is the ability to request non-default deploy steps directly via the provisioning API/CLI. But in our case we can benefit from the older procedure that involves deploy templates.

Note

Deploy templates are handy when only some nodes support a certain feature (e.g. you only use the new IPA image for a subset of nodes).

Let us create a new deploy template called CUSTOM_INJECT_FILES that runs our new step at priority 50 (as agreed before). We will inject a message of the day:

Hello Ironic

or in base64:

SGVsbG8gSXJvbmljCg==

baremetal deploy template create CUSTOM_INJECT_FILES \
    --steps '[{"interface": "deploy",
               "step": "inject_files",
               "priority": 50,
               "args": {"files": {
                   "/etc/motd": "SGVsbG8gSXJvbmljCg=="
               }}}]'

Here:

  • interface must match the interface of the step (usually deploy),

  • step is the step name,

  • priority is the priority to run the step at,

  • args are used to pass additional arguments to the step.
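Encoding files by hand gets tedious quickly. As a convenience, the JSON for --steps can be generated with a few lines of Python. This is only a sketch: build_inject_files_steps is a hypothetical helper, not part of any Ironic tooling, and it assumes the injected files are UTF-8 text.

```python
import base64
import json


def build_inject_files_steps(files, priority=50):
    """Return the --steps JSON for a mapping of file path to text."""
    encoded = {
        path: base64.b64encode(text.encode('utf-8')).decode('ascii')
        for path, text in files.items()
    }
    steps = [{
        'interface': 'deploy',
        'step': 'inject_files',
        'priority': priority,
        'args': {'files': encoded},
    }]
    return json.dumps(steps)


# The trailing newline is part of the message of the day.
steps_json = build_inject_files_steps({'/etc/motd': 'Hello Ironic\n'})
print(steps_json)
```

The printed string can be passed to baremetal deploy template create as the value of --steps.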

Next we assign a trait CUSTOM_INJECT_FILES to some nodes. Traits represent a certain ability of a node, in this case - to execute the deploy template with the same name. See my post on scheduling for a much more detailed explanation.

baremetal node add trait <node> CUSTOM_INJECT_FILES

Deployment

Finally, we need to request an allocation with this trait and deploy on the resulting node. The manual procedure is somewhat verbose (check the deploy templates documentation), so we will use metalsmith which has native support for traits. Assuming you have a CentOS 8 cloud image in /httpboot/centos8.qcow2:

metalsmith deploy --resource-class baremetal --trait CUSTOM_INJECT_FILES \
    --image file:///httpboot/centos8.qcow2 \
    --ssh-public-key ~/.ssh/id_ed25519.pub

  • The first two arguments concern scheduling. We request resource class baremetal and trait CUSTOM_INJECT_FILES. The trait request automatically engages the corresponding deploy template.

  • The last two arguments provide the image to deploy and a public key to connect to the node later.
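The scheduling part can be pictured with a toy model. The node data and the candidates function below are invented for illustration, and real allocations check more than this (for example, whether a node is available at all), but the core rule is the same: a node qualifies when its resource class matches and it has every requested trait.

```python
# A toy model of trait-based scheduling; names and data are made up.
nodes = [
    {'name': 'node-0', 'resource_class': 'baremetal', 'traits': []},
    {'name': 'node-1', 'resource_class': 'baremetal',
     'traits': ['CUSTOM_INJECT_FILES']},
]


def candidates(nodes, resource_class, traits):
    """Return names of nodes with the class and all requested traits."""
    return [
        node['name'] for node in nodes
        if node['resource_class'] == resource_class
        and set(traits) <= set(node['traits'])
    ]


print(candidates(nodes, 'baremetal', ['CUSTOM_INJECT_FILES']))
```

Only the node that carries the trait, and therefore presumably runs the new ramdisk, is selected.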

Under the hood metalsmith creates an allocation like this:

$ baremetal allocation show 840a2966-e8da-4337-b2e7-e400c81c4fe9
+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| candidate_nodes | []                                   |
| created_at      | 2021-02-08T18:13:27+00:00            |
| extra           | {}                                   |
| last_error      | None                                 |
| name            | testvm1                              |
| node_uuid       | 4e41df61-84b1-5856-bfb6-6b5f2cd3dd11 |
| owner           | None                                 |
| resource_class  | baremetal                            |
| state           | active                               |
| traits          | ['CUSTOM_INJECT_FILES']              |
| updated_at      | 2021-02-08T18:13:28+00:00            |
| uuid            | 840a2966-e8da-4337-b2e7-e400c81c4fe9 |
+-----------------+--------------------------------------+

and populates instance_info like this (more fields in reality):

$ baremetal node show <node> --fields instance_info -f json
{
  "instance_info": {
    "traits": [
      "CUSTOM_INJECT_FILES"
    ],
    "capabilities": {
      "boot_option": "local"
    },
    "image_source": "file:///httpboot/centos8.qcow2"
  }
}

Once the deployment is successful, we can wait a bit to let the operating system boot, find out the IP address via dnsmasq logs and try SSH access using the provided IP and our SSH key:

$ ssh centos@192.168.122.78
...
Hello Ironic

Conclusion

In this post you've learned how to create a deploy step that injects arbitrary files into the final image. Fortunately, you won't have to: I'm planning on upstreaming a more sophisticated version of this deploy step in the Wallaby release. But this knowledge will definitely help you build your own deploy steps.
