Ephemeral workloads with Ironic

In this post I'm presenting the ramdisk deploy interface, explaining how to use it to run ephemeral workloads and how to provide configuration data for them.

Ironic has been actively explored by the scientific community as a way to automate running calculations without incurring the costs of virtualization. This sort of workload does not necessarily require installing anything on the machine's hard drive, which may instead be used for caching or swap, or not used at all. The results are posted back via HTTP(S) or stored on a network share.

Ramdisk deploy

Ironic has a concept of deploy interfaces. They can be configured per node and define how exactly the provisioning process happens. One of the deploy interface implementations is the ramdisk deploy interface, which essentially bypasses the whole deployment process and boots the provided ramdisk or ISO image directly. The hard drive, if present at all, is not touched; the operating system runs fully in RAM.

Two use cases influenced the development of this interface:

  1. Scientific workloads that don't need persistent local state.

  2. Third-party installers.

The latter is a relatively new idea currently being researched by the OpenShift community as a way to reuse the same installer across platforms.

Configuring

The ramdisk deploy is configured the same way as other deploy interfaces:

[DEFAULT]
enabled_deploy_interfaces = direct,ramdisk

and can be set per node:

baremetal node set <node> --deploy-interface ramdisk

Starting with the Wallaby cycle, Bifrost enables the ramdisk deploy by default, and I will use it throughout this guide.

Boot interfaces

Another Ironic concept is the boot interface, which specifies how exactly a ramdisk (either our service ramdisk or a user-provided one) gets to a node. Since boot interface implementations have a direct impact on booting ramdisks, not all of them support the ramdisk deploy or all of its options. The two main implementations are listed below; both must be enabled in ironic.conf, as sketched after the list:

  • ipxe - uses iPXE to boot the ramdisk or an ISO via network.

  • redfish-virtual-media - uses the Redfish protocol to attach a remote ISO as virtual media. It will appear as a normal CD drive to the operating system.
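
For example, enabling them works the same way as for deploy interfaces (a sketch; adjust the list to what your hardware actually supports):

[DEFAULT]
enabled_boot_interfaces = ipxe,redfish-virtual-media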

Deploying

First, nodes must be configured with the right deploy and boot interfaces. For iPXE:

baremetal node set <node> --boot-interface ipxe --deploy-interface ramdisk

For Redfish virtual media:

baremetal node set <node> --boot-interface redfish-virtual-media --deploy-interface ramdisk

Then for each deployment you need to provide links to the kernel and ramdisk. Using the file protocol:

baremetal node set <node> \
    --instance-info kernel=file:///httpboot/ramdisk.kernel \
    --instance-info ramdisk=file:///httpboot/ramdisk.initramfs \
    --instance-info image_source=file:///httpboot/ramdisk.initramfs

Using HTTP:

baremetal node set <node> \
    --instance-info kernel=http://<bifrost IP>/ramdisk.kernel \
    --instance-info ramdisk=http://<bifrost IP>/ramdisk.initramfs \
    --instance-info image_source=http://<bifrost IP>/ramdisk.initramfs

If you have an ISO image, provide it alone:

baremetal node set <node> --instance-info boot_iso=http://<bifrost IP>/myimage.iso

Finally, deploy as usual:

baremetal node deploy <node>

After a few seconds your node will become active and start booting your ramdisk of choice.
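
You can follow the provisioning progress and, once the workload is done, tear the node down again (a sketch using standard client commands):

baremetal node show <node> --fields provision_state power_state
baremetal node undeploy <node>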

To clean or not to clean?

Automated cleaning normally runs before a node is first available for deployment and between deployments. While it's highly encouraged to leave cleaning enabled, it may not make much sense for ephemeral workloads. Starting with Ironic 16.1.0 (Wallaby) and the soon-to-be-released ironicclient 4.5.0 (also Wallaby), it is possible to disable cleaning for a node:

baremetal node set <node> --no-automated-clean

and enable it back afterwards:

baremetal node set <node> --automated-clean

If your workloads leave any temporary data (or you're using an older Ironic), it's highly recommended to keep cleaning enabled. If the data is not sensitive, you can limit cleaning to only removing metadata (partitioning) in ironic.conf:

[deploy]
erase_devices_priority = 0
erase_devices_metadata_priority = 10

Building images

Ironic does not place any restrictions on the content of the operating system, other than a few caveats mentioned for boot interfaces. Starting with version 2.4.0, ironic-python-agent-builder contains the ironic-ramdisk-base element that can be used by diskimage-builder to build general purpose ramdisks.
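
If you are not using Bifrost, a minimal way to get the tooling in place is to clone the builder repository and install diskimage-builder yourself (a sketch matching the paths used below):

git clone https://opendev.org/openstack/ironic-python-agent-builder /opt/stack/ironic-python-agent-builder
pip install diskimage-builder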

Assuming you want to use a minimal Debian image and that ironic-python-agent-builder is cloned in /opt/stack (as is the case in Bifrost), run:

export ELEMENTS_PATH=/opt/stack/ironic-python-agent-builder/dib
disk-image-create -o ~/ramdisk debian-minimal \
    ironic-ramdisk-base devuser simple-init openssh-server

What is happening here?

  • debian-minimal creates a minimal Debian image,

  • ironic-ramdisk-base creates a ramdisk instead of a normal image,

  • devuser (optional) configures authorized keys for user devuser (can also create a user of your choice - see devuser documentation),

  • simple-init (optional) installs Glean which handles network configuration,

  • openssh-server ensures the image has an SSH server enabled.

Instead of simple-init you can use:

  • dhcp-all-interfaces (optional) runs DHCP for all interfaces on start-up.

Depending on your workloads you may want to add:

  • stable-interface-names (optional) forces network interface names to be stable using biosdevname.

The output will be the files ramdisk.kernel and ramdisk.initramfs that you supply to Ironic. On Bifrost, images are kept in /httpboot, thus:

sudo cp ~/ramdisk.kernel ~/ramdisk.initramfs /httpboot

Disconnected deploy

In the previous example we relied on DHCP for networking and devuser for SSH access. Using DHCP may be problematic for nodes that do not have L2 connectivity to the control plane; you may also need predictable IP addresses or different SSH keys per instance. For normal deployments this is handled by providing a config drive to first-boot scripts like Glean or cloud-init. Starting with the Wallaby release of Ironic, it's also possible for ramdisk deployments.

Config drives

A config drive is a file system (usually ISO 9660, the one used for data CDs) that contains first-boot configuration. The Bare Metal provisioning API accepts config drives either as a gzipped and base64-encoded blob or as JSON source data to build one from.

meta_data

This structure contains generic information about the instance. The most useful fields are:

  • name (Glean) or hostname (cloud-init) - host name for the node. Ironic defaults name to the node name.

  • public_keys - a dictionary with SSH public keys as values. They will be added as authorized keys for the root user.

Example:

{
  "public_keys": {
    "0": "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIChPTEJpAcqEaz0u1m9HuGizxZkjl401Jyw2nEU7or3T dtantsur@bifrost.localdomain"
  },
  "name": "testvm2"
}

network_data

Network configuration. The exact capabilities differ between Glean and cloud-init, but both are capable of setting IPv4 addresses (Glean has only limited support for IPv6), configuring DNS and creating bonds.

Example:

{
  "links": [
    {
      "id": "port-023c6a90-1e4b-4e02-a119-131d8a729b60",
      "type": "phy",
      "ethernet_mac_address": "52:54:00:1f:79:7e"
    }
  ],
  "networks": [
    {
      "id": "network0",
      "type": "ipv4",
      "link": "port-023c6a90-1e4b-4e02-a119-131d8a729b60",
      "ip_address": "192.168.122.42",
      "netmask": "255.255.255.0",
      "network_id": "network0",
      "routes": []
    }
  ],
  "services": []
}

Ramdisk with a config drive

Adding a config drive to a ramdisk deployment is only supported for Redfish virtual media and works by attaching the config drive to a virtual USB slot (the hardware must have one, which is quite common).
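
Before deploying, you can also ask Ironic to validate the node; the boot entry in the output indicates whether the required virtual media operations are expected to work (a sketch):

baremetal node validate <node>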

If your hardware supports redfish-virtual-media boot, everything else works the same way as for normal deployments: either build the config drive image yourself or make Ironic build it for you:

baremetal node deploy <node> --config-drive '{"meta_data": {...}, "network_data": {...}}'
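
If you prefer to build the config drive yourself, it is a small ISO 9660 file system labelled config-2 with the JSON documents under openstack/latest, gzipped and base64-encoded before being passed to Ironic. A rough sketch, assuming you saved the examples above as meta_data.json and network_data.json (the exact mkisofs flags may need tuning for your distribution):

mkdir -p configdrive/openstack/latest
cp meta_data.json network_data.json configdrive/openstack/latest/
mkisofs -o configdrive.iso -V config-2 -r -J configdrive
gzip -9 < configdrive.iso | base64 -w0 > configdrive.gz.b64
baremetal node deploy <node> --config-drive "$(cat configdrive.gz.b64)"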

Scripting

The deployment command becomes quite large for larger config drives, so you may script the deployment in Python instead. As a nice side effect, you'll be able to populate MAC addresses automatically.

Start by installing openstacksdk and following its instructions on populating the environment. For Bifrost it's enough to do:

source /opt/stack/bifrost/bin/activate
export OS_CLOUD=bifrost

Now you can create a connection and fetch the required objects:

import os
import openstack

# Using environment variables (OS_CLOUD) for connection parameters
conn = openstack.connect().baremetal
node = conn.get_node(node_id)  # node_id is the node's name or UUID
port = next(conn.ports(node=node_id))

Then build meta data and network configuration:

meta_data = {
    "public_keys": {
        # This key will be authorized for the root user
        "0": open(os.path.expanduser("~/.ssh/id_ed25519.pub"), "rt").read(),
    }
}

network_data = {
    "links": [
        {
            "id": f"port-{port.id}",
            "type": "phy",
            "ethernet_mac_address": port.address,
        }
    ],
    "networks": [
        {
            "id": "network0",
            "type": "ipv4",
            "link": f"port-{port.id}",
            "ip_address": ip,
            "netmask": "255.255.255.0",
            "network_id": "network0",
            "routes": []
        }
    ],
    "services": []
}

Finally, configure the node (we set boot and deploy interfaces just in case) and deploy it:

# kernel and initramfs hold the image URLs from the deploying section above
conn.update_node(
    node,
    boot_interface="redfish-virtual-media",
    deploy_interface="ramdisk",
    instance_info={"kernel": kernel,
                   "ramdisk": initramfs,
                   "image_source": initramfs},
)
conn.set_node_provision_state(
    node, 'active',
    config_drive={'meta_data': meta_data, 'network_data': network_data},
    wait=True, timeout=300,
)
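
Once the workload has finished, the same connection can tear the node down again (a sketch; 'deleted' is the provision state target that triggers undeploying):

conn.set_node_provision_state(node, 'deleted', wait=True, timeout=300)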

A complete example script can be found here: https://gist.github.com/dtantsur/7e614963d48cd929ef39fa60c0b34a3d.

See it in action

This short demo (no sound) showcases what I've just explained: