Auto-discovery of bare metal nodes is a peculiar thing: everyone wants it in theory, but very few end up using it after facing the harsh reality. The truth is, there is not that much information you can discover by powering a machine on and booting a special ramdisk on it. I mean, oh, sure, we can collect literally thousands of various facts and runtime characteristics, but a few critical ones keep evading us. Specifically, BMC credentials. The very few facts ironic needs to be able to manage the machine. Oops.
In all fairness to hardware vendors, it's not very sensible to allow any user, even one with root access, to learn these critical bits of a hardware infrastructure. Not in the cloud era.
Two ideas appeared from numerous heated (and not so much) discussions, one great and one abysmal:
Introspection rules as a way to encode the logic of setting the credentials post-discovery.
Setting IPMI credentials during discovery.
Today we're talking about the latter.
What can be easier: the ramdisk generates a random password, then does:
    ipmitool user set name 2 ironic
    ipmitool user set password 2 'pa$$w0rd'
    ipmitool user enable 2
    ipmitool channel setaccess 1 2 link=on ipmi=on callin=on privilege=4
Isn't IPMI obvious? No? Okay, fine, these commands do the following:
Assign the name ironic to user #2.
Set the password for user #2 to pa$$w0rd.
Enable user #2.
Grant user #2 full access (privilege level 4, administrator) to the 1st channel.
If you already feel an uncontrollable urge to burn your monitor, feel free to close this post. Otherwise read on!
A careful reader has already noticed two seemingly arbitrary numbers that carry certain assumptions: we chose user #2 for our manipulations, and we assumed that the IPMI channel to use is channel #1. It wasn't until a few years later that I realized that hardware vendors may get quite creative when assigning channel numbers.
Or was it operators?
Anyway, these two numbers had to be parameterized. This was the first red flag: while user #2 and channel #1 were probably reasonable defaults (they worked in my testing), a robust implementation would require an operator to input these numbers, especially the channel number. To pick them in a sensible way they would go to the BMC web interface and check the IPMI configuration.
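To make the parameterization concrete, here is a minimal sketch of what the ramdisk side could look like. It is not the code we actually had: the function names `generate_password` and `build_ipmitool_commands` are made up for this post, and the 16-character default is my assumption (it fits the IPMI 1.5 password limit; IPMI 2.0 allows up to 20). It only builds the commands and does not run them.

```python
import secrets
import string


def generate_password(length=16):
    """Return a random alphanumeric password.

    The default of 16 characters is an assumption: it is the IPMI 1.5
    limit, while IPMI 2.0 allows up to 20.
    """
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def build_ipmitool_commands(username, password, user_id=2, channel=1):
    """Build argv lists for the four ipmitool calls described above.

    user_id and channel are the two numbers an operator would have to
    supply; 2 and 1 are merely the defaults that worked in testing.
    """
    uid, chan = str(user_id), str(channel)
    return [
        ["ipmitool", "user", "set", "name", uid, username],
        ["ipmitool", "user", "set", "password", uid, password],
        # ipmitool documents this subcommand as taking only the user ID.
        ["ipmitool", "user", "enable", uid],
        ["ipmitool", "channel", "setaccess", chan, uid,
         "link=on", "ipmi=on", "callin=on", "privilege=4"],
    ]
```

Each list can be handed to `subprocess.run(argv, check=True)` as is. Passing an argv list rather than a shell string also means a password like pa$$w0rd is not mangled by shell expansion.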
… which requires BMC credentials. Hmm.
Our discovery process is designed, well, to discover. Not to modify node configuration, definitely not in a dangerous fashion. Information is collected, pre-processed and sent to the control plane; the ramdisk then either powers off or stays up awaiting deployment commands. What happens if network connectivity dies mid-way?
Well, a normal discovery may just retry until success. Or terminate. But what if it has already modified the IPMI credentials?
The fire-and-forget model no longer worked: we needed to generate the credentials on the server side, then pass them to the ramdisk. We needed to change the whole procedure to accommodate this feature.
… but at least when we're talking about passing credentials over the network, we're talking about a securely encrypted channel, right? RIGHT??
Here comes an issue that has been haunting us for years: when you're booting a node with e.g. PXE, you don't have a secure way to exchange any secrets. See for yourself:
boot parameters are passed via DHCP (insecure);
iPXE firmware is loaded via TFTP (insecure);
the ramdisk is loaded over HTTP, but using TLS requires building a custom iPXE ROM (problematic or insecure).
At this point an intruder has had enough chances to read and intercept the traffic. The best we can do is to establish an unverified TLS connection and hope for the best. You're relieved, aren't you?
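In case "unverified TLS" sounds more reassuring than it should: in Python terms it boils down to a context like the one below. This is an illustration only, not what the ramdisk ships, and the helper name `unverified_tls_context` is made up. The traffic is encrypted, but nothing proves that the other end is really the control plane.

```python
import ssl


def unverified_tls_context():
    """Return a TLS context that encrypts but does not authenticate the peer."""
    context = ssl.create_default_context()
    # Order matters: hostname checking has to be disabled before
    # certificate verification can be switched off.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context
```

Anyone able to sit between the node and the server can terminate such a connection themselves and read the freshly generated password.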
Ironic has always served as a primary authority for the state of a node, but has never been a primary authority for access credentials. Users have a habit of deleting nodes that misbehave, but doing it after discovery would remove the only place where the generated credentials are stored!
Oh, and by the way, in the default configuration ironic does not allow reading passwords via API.
Around 2017 we became aware of Redfish - a new standard for managing hardware. I could say a lot of good and bad words about it, but what matters now is that its approach to creating a host interface (accessing the BMC from within the machine) is different from IPMI. For one, it does not allow credential-less access, which pretty much kills this idea. There should be a way to read temporarily created credentials, but they seem to be valid only during the early UEFI boot phase.
Honestly, some hardware (Cisco?) did not even allow the IPMI trick described above. As a result, the hardware support for this feature was patchy and kept decreasing with time.
What killed the feature in the end was customer demand, or rather the lack of it. While a lot of product managers were (and are) very excited about it, it has never received any major traction in the field. Honestly, I think the assumption that operators don't know their BMC credentials was completely wrong from the beginning.
You can have it, but you need to write a bit of code. Or wait for someone (not me) to write it for you. What you will need is a new ironic-inspector introspection hook that:
checks if BMC credentials are present;
generates new credentials and stores them on a node immediately;
accesses the node via SSH to execute the commands above;
verifies the result.
An alternative to SSH could be a new ironic-python-agent API extension.
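The four steps above could be sketched roughly like this. To be clear, this is not an actual ironic-inspector hook and does not use its plugin API: `ensure_bmc_credentials` and its four collaborators (`node`, `store`, `run_on_node`, `verify`) are hypothetical stand-ins, and user #2 and channel #1 are the same shaky defaults as before. The only real names are `ipmi_username` and `ipmi_password`, the `driver_info` fields the IPMI driver reads.

```python
import secrets
import string


def ensure_bmc_credentials(node, store, run_on_node, verify,
                           username="ironic", user_id=2, channel=1):
    """Set BMC credentials on a discovered node unless it already has them.

    Hypothetical collaborators:
      node        -- dict with the node's driver_info
      store       -- callable(node) that persists it on the control plane
      run_on_node -- callable(argv) that runs a command on the node
                     (via SSH or an agent extension)
      verify      -- callable(username, password) -> bool
    Returns True if new credentials were set, False if none were needed.
    """
    info = node.setdefault("driver_info", {})
    # Step 1: nothing to do if the credentials are already present.
    if info.get("ipmi_username") and info.get("ipmi_password"):
        return False

    # Step 2: generate and store *before* touching the BMC, so that a
    # failure half-way never leaves a password that nobody knows.
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(16))
    info["ipmi_username"] = username
    info["ipmi_password"] = password
    store(node)

    # Step 3: run the ipmitool commands on the node itself.
    uid, chan = str(user_id), str(channel)
    for argv in (
        ["ipmitool", "user", "set", "name", uid, username],
        ["ipmitool", "user", "set", "password", uid, password],
        ["ipmitool", "user", "enable", uid],
        ["ipmitool", "channel", "setaccess", chan, uid,
         "link=on", "ipmi=on", "callin=on", "privilege=4"],
    ):
        run_on_node(argv)

    # Step 4: check that the BMC really accepts the new credentials.
    if not verify(username, password):
        raise RuntimeError("BMC rejected the newly set credentials")
    return True
```

Storing first and applying second is deliberate: it is the opposite of the fire-and-forget flow that got us into trouble earlier.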
The ironic community is not likely to accept responsibility for the new code, but it can happily live on OpenDev… provided that somebody actually writes it in the end.