Bare Metal + Kubernetes = ♡
In this blog post, I'm talking about Metal3 (pronounced "metal-kubed"): a Kubernetes API and a cluster API provider for bare metal. I'll explain how and why it uses Ironic, and how you can use it to provision bare metal machines.
I assume that you are familiar with Kubernetes and the concept of Custom Resource Definitions (CRD). I also won't spend time explaining what Cluster API is, since there are enough good resources on this topic. I will only say that it's a way for Kubernetes to manage itself, in this case by provisioning (and deprovisioning) bare metal machines.
Introduction to Metal3
To put it simply, the Metal3 project provides two components: a bare metal management API and a Cluster API implementation that uses it. The API is a Kubernetes-native wrapper around Ironic that lets you use it in a straightforward and declarative way. On top of that, Metal3 provides container images and tools to deploy both Ironic and its components.
Metal3 deliverables
- baremetal-operator (BMO)
  The heart of the project: the BareMetalHost (BMH) custom resource definition and the controller to manage such resources. The main deployment tools can also be found in this repository.
- cluster-api-provider-metal3 (CAPM3)
  The Cluster API provider that uses baremetal-operator to provision machines for the cluster.
- ironic-image
  The container image for Ironic with a configuration suitable for Metal3.
- ip-address-manager (IPAM)
  An optional controller to manage IP addresses. See this blog post for details: Introducing the Metal3 IP Address Manager.
- hardware-classification-controller
  An optional controller to label hosts according to their physical properties, which are obtained through inspection (more about it below).
- ironic-ipa-downloader
  An init container to download the IPA (ironic-python-agent, our service agent that is loaded on the machines when provisioning) from the RDO project.
- metal3-dev-env
  Scripts and Ansible playbooks to set up a development environment using virtual machines. If you know OpenStack: metal3-dev-env is like DevStack.
- ironic-client
  A container image with the baremetal CLI for debugging. Of course, you can also just install the CLI locally; it ships in the python-ironicclient package on PyPI.
Why Ironic
… or, as many people put it nowadays:
Why not Just Rewrite It in Go ™?
Maybe I'm just a bit too old to rewrite things that already work?
But seriously though, isn't it easier to just write bare metal management in a Kubernetes-native fashion from scratch? No. Definitely not.
All eggs in their baskets
By using a separate project for bare metal management, Metal3 achieves a clean separation of concerns between two areas:
- A lower-level bare metal state machine provided by Ironic.
- A high-level declarative API provided by baremetal-operator.
Therefore, the bare metal management system doesn't need to know how Kubernetes works, while the Kubernetes API provider doesn't need to be aware of the sometimes awkward properties of real hardware. Both projects can attract experts in the relevant domains of knowledge without requiring them to keep too much context in their heads.
On top of that, Ironic (like all OpenStack software) is API-first: it has been designed around an API rather than around some sort of user interface. This is why Ironic can be easily integrated into higher-level projects like Metal3 and also used on its own.
Isn't it OpenStack?
Indeed, Ironic has been and still is developed under the OpenStack umbrella, which includes dozens of projects governed by the OpenInfra Foundation. However, this relationship does not imply a hard dependency! What we call standalone Ironic can be used without the rest of OpenStack or with only a few chosen services (Identity and Networking are two common examples).
The above-mentioned TripleO and Kayobe don't use the whole of OpenStack with Ironic. The Ironic project itself also provides a standalone tool called Bifrost that can be used for bare metal management (see also a wonderful introduction to Bifrost from Julia). Ironic works very well with and without OpenStack, and Metal3 is great proof of that!
Kubernetes API for bare metal
Now that you understand what Ironic is and why Metal3 uses it, let us take a look at the most important object in Metal3: BareMetalHost (BMH). In Ironic terms, a BMH is a node with ports and introspection data embedded in it. The only required information is the address of the BMC (the node's management controller) and its credentials.
    ---
    # This is the secret with the BMC credentials (Redfish in this case).
    apiVersion: v1
    kind: Secret
    metadata:
      name: node-1-bmc-secret
    type: Opaque
    data:
      username: VXNlcg==
      password: UGFzc3dvcmQ=
    ---
    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    metadata:
      name: node-1
    spec:
      bmc:
        # BMC address, the actual format depends on the protocol:
        address: redfish+http://mgmt.node1.example.com/redfish/v1/Systems/1
        # Good old IPMI is supported as well:
        #address: ipmi://192.168.122.1:6233
        # BMC credentials - a link to a secret:
        credentialsName: node-1-bmc-secret
      # MAC address the node boots from. Will eventually be optional for
      # Redfish, but it's better to provide it.
      bootMACAddress: 00:5f:33:20:6b:5f
      # The node will use UEFI for booting (the default).
      bootMode: UEFI
      # Bring it online for further actions
      online: true
      # It's recommended to tell Ironic which device to use as a root device.
      # The `deviceName` hint is not the most reliable, you can use
      # `serialNumber`, `model`, `minSizeGigabytes` and a few others instead.
      rootDeviceHints:
        deviceName: /dev/sda
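The comments in the manifest above mention more reliable root device hints than deviceName. As a minimal sketch, a spec fragment using two of them could look like the following; the serial number and the size threshold are hypothetical values chosen for illustration, not recommendations:

```yaml
spec:
  # ...
  rootDeviceHints:
    # Hypothetical values: pick the disk by its serial number and also
    # require it to be at least 50 GiB. When several hints are given,
    # the chosen device has to satisfy all of them.
    serialNumber: drive-scsi0-0-0-0
    minSizeGigabytes: 50
```

Serial numbers survive reboots and device reordering, which is why they tend to be a safer choice than a name like /dev/sda.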
Once created, the BMH undergoes inspection and cleaning, then reaches the ready state, where it can be used for deployment. Inspection collects a lot of hardware information, which is saved as part of the status:
    status:
      # ...
      hardware:
        cpu:
          arch: x86_64
          clockMegahertz: 2100
          count: 8
          flags:
          - 3dnowprefetch
          - abm
          # ...
        firmware:
          bios:
            date: 02/06/2015
            vendor: EFI Development Kit II / OVMF
            version: 0.0.0
        hostname: master-0.ostest.test.metalkube.org
        nics:
        - ip: fd2e:6f44:5dd8:c956::14
          mac: 00:5f:33:20:6b:5d
          model: 0x1af4 0x0001
          name: enp2s0
        - ip: fd00:1101::8eb0:ccc2:928e:2a5
          mac: 00:5f:33:20:6b:5b
          model: 0x1af4 0x0001
          name: enp1s0
          pxe: true
        ramMebibytes: 32768
        storage:
        - hctl: "0:0:0:0"
          model: QEMU HARDDISK
          name: /dev/sda
          rotational: true
          serialNumber: drive-scsi0-0-0-0
          sizeBytes: 107374182400
          type: HDD
          vendor: QEMU
        - name: /dev/vda
          rotational: true
          sizeBytes: 8589934592
          type: HDD
          vendor: "0x1af4"
        systemVendor:
          manufacturer: Red Hat
          productName: KVM (8.2.0)
To deploy, you only need to populate the image information, i.e. add this to the spec:
    spec:
      # ...
      image:
        # The image URL can be in the qcow2 format or raw.
        url: http://images.example.com/images/my-os.qcow2
        # The image checksum URL: either the checksum itself or a file with
        # checksums per file name.
        checksum: http://images.example.com/images/my-os.qcow2.md5sum
        # Checksum type must be set if not md5. Supported are sha256 and sha512.
        checksumType: md5
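As the comment in the example notes, the checksum type has to be set explicitly for anything other than md5. A sketch of the same image section using SHA-256 instead; the .sha256sum file name is a hypothetical example and not part of the original setup:

```yaml
spec:
  # ...
  image:
    url: http://images.example.com/images/my-os.qcow2
    # Hypothetical checksum file published next to the image.
    checksum: http://images.example.com/images/my-os.qcow2.sha256sum
    # Must match the algorithm that produced the checksum above.
    checksumType: sha256
```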
After the image is downloaded, written to the disk and configured, the BMH reaches the provisioned state. After that, it can be used for creating a Kubernetes worker or for any other purpose.
    status:
      # ...
      provisioning:
        ID: 7ef3f064-2b86-4d34-8b33-fb5d127a713b
        bootMode: UEFI
        image:
          url: http://images.example.com/images/my-os.qcow2
          checksum: http://images.example.com/images/my-os.qcow2.md5sum
        state: provisioned
To de-provision a node, simply remove the whole image field. After cleaning, it will be back in the ready state.
When writing an image is not enough
Sometimes operators ask for the ability to use a custom installer while keeping all other benefits of Ironic and Metal3. This is also possible. By setting diskFormat to the special value live-iso, you can request Ironic to boot the provided ISO and to consider its part of the installation finished once the ISO starts booting. From this point, you can use your installation procedure of choice.
    spec:
      # ...
      image:
        diskFormat: live-iso
        # The image URL: an ISO in this case.
        url: http://images.example.com/images/my-installer.iso
If you already have bare metal machines provisioned through other means, you can simply add them as they are by marking them as externallyProvisioned.
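A minimal sketch of such a spec, assuming externallyProvisioned is set as a boolean field and with all other fields elided:

```yaml
spec:
  # ...
  # The host was deployed by other means, so BMO should not try to
  # provision it and no image section is needed.
  externallyProvisioned: true
```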
How it works
When BMO starts, it accepts the Ironic and Inspector endpoints and credentials (only HTTP basic authentication is supported), as well as the deploy kernel/ramdisk URLs, via environment variables. Ironic and Inspector are expected to be started and managed separately, for example by using the deployment templates provided in the BMO repository (OpenShift does it via a separate operator, cluster-baremetal-operator).
Once started, the controller manages the BareMetalHost (BMH) CRD by synchronizing the changes between Kubernetes and Ironic. BMO expects to completely own the nodes defined as BMH. It creates them in Ironic if they are missing, updates them if they don't match the information it has, provisions them if there is an image, de-provisions them if the image is removed, and deletes them if the BMH is deleted (or detached).
Nodes are created with names of the form <NAMESPACE>~<BMH NAME>. For example, on OpenShift you can see the following picture:
    $ baremetal node list --fields uuid name provision_state
    +--------------------------------------+---------------------------------------+--------------------+
    | UUID                                 | Name                                  | Provisioning State |
    +--------------------------------------+---------------------------------------+--------------------+
    | 7ef3f064-2b86-4d34-8b33-fb5d127a713b | openshift-machine-api~ostest-master-0 | active             |
    | 13b00cf0-562b-47ee-8d45-c0f4dba0e074 | openshift-machine-api~ostest-master-2 | active             |
    | 8e13a1ad-1029-4106-aec2-ba640eb99a1e | openshift-machine-api~ostest-master-1 | active             |
    +--------------------------------------+---------------------------------------+--------------------+
    $ baremetal node show openshift-machine-api~ostest-master-0 --fields driver_info instance_info -f json
    {
      "driver_info": {
        "deploy_iso": "http://localhost:6181/images/ironic-python-agent.iso",
        "deploy_kernel": "http://localhost:6181/images/ironic-python-agent.kernel",
        "deploy_ramdisk": "http://localhost:6181/images/ironic-python-agent.initramfs",
        "redfish_address": "http://[fd2e:6f44:5dd8:c956::1]:8000",
        "redfish_password": "******",
        "redfish_system_id": "/redfish/v1/Systems/3f32c07a-b060-4d44-b73b-894be044b347",
        "redfish_username": "admin"
      },
      "instance_info": {
        "capabilities": {},
        "image_source": "http://[fd00:1101::3]:6181/images/rhcos-48.84.202105190318-0-openstack.x86_64.qcow2/cached-rhcos-48.84.202105190318-0-openstack.x86_64.qcow2",
        "image_os_hash_algo": "md5",
        "image_os_hash_value": "http://[fd00:1101::3]:6181/images/rhcos-48.84.202105190318-0-openstack.x86_64.qcow2/cached-rhcos-48.84.202105190318-0-openstack.x86_64.qcow2.md5sum",
        "image_checksum": "http://[fd00:1101::3]:6181/images/rhcos-48.84.202105190318-0-openstack.x86_64.qcow2/cached-rhcos-48.84.202105190318-0-openstack.x86_64.qcow2.md5sum"
      }
    }
One interesting consequence of this approach is that Ironic's database (usually MySQL/MariaDB) becomes transient: it can be wiped on container restart, and BMO will re-create the nodes in the right states. Any in-progress operations are restarted from scratch, while already deployed nodes are adopted.
Future plans
There is a lot of work planned in Metal3. In the near future, we want to bring BIOS and RAID support to BMO; previews of these features are already available. Another contributor is working on a network configuration proposal, which will eventually make it possible to manage bare metal switches while provisioning the nodes connected to them.
Looking further in the future, I'd like us to consider the possibility of dropping MySQL in favor of either SQLite or anything else that is not a full-featured database. Additionally, we need to develop our multi-master story: currently the Metal3 pod with all services is usually deployed on one master.
Get involved
We're looking for more users, more developers, and more opinions! If you'd like to talk to us, check the metal3 community page; we have a Slack channel, as well as a good old mailing list. If you're interested in Ironic specifically, check out the Ironic community page instead.
If you'd like to experiment with Metal3 or evaluate it before production, try metal3-dev-env. If you prefer OpenShift, check out openshift dev-scripts. Finally, if you want to learn more about Ironic, check the Ironic documentation or start with the Bifrost installation guide.
Good luck with your bare metal journey!