Maintaining a distbuild network on a Calxeda Highbank server
This guide is for administrators who are managing a Morph distributed build network on a Calxeda Highbank ARM server. Note that Calxeda have gone out of business: for new deployments we recommend setting up a cluster of NVIDIA Jetson boards to do distributed builds (or something else, if you want).
This guide assumes you have a Baserock build environment set up. Follow the 'Preparation' section in HOW-TOs - Using distbuild for Baserock on ARM if you have not done this already.
Placeholders that are used in this guide:
- $trove -- hostname and trove-id of your Trove
- $infrastructure.git -- your infrastructure definitions repo. (This can be
a keyed Morph URL, e.g. baserock:baserock/infrastructure.git, or a full
URL like git://$trove/$trove/site/definitions.)
- $cxmanage -- hostname of a machine that contains `ipmitool` and has
network access to your Calxeda server
- $version -- a version label. This can be anything you want that is valid
as a directory name. I recommend using the date, e.g. `2015-04-20`.
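As a minimal sketch of the date-based label suggested above, the snippet below sets `$version` from today's date and checks that it can be used as a single directory name. The helper name `is_valid_label` is hypothetical and is not part of Morph.

```shell
# Use today's date as the version label, e.g. 2015-04-20.
version=$(date +%Y-%m-%d)

# Hypothetical helper: a label must be a single path component, so it
# must be non-empty, must not contain '/', and must not be '.' or '..'.
is_valid_label() {
    case $1 in
        ''|.|..|*/*) return 1 ;;
        *) return 0 ;;
    esac
}

is_valid_label "$version" && echo "VERSION_LABEL=$version"
```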
You will need an 'infrastructure' repository, forked from the Baserock reference system definitions. You may have this in your Trove as `git://$trove/$trove/site/definitions.git`.
Upgrading distbuild to the latest release
This describes how to upgrade your distbuild network to an imaginary 20.00 release of Baserock.
First, clone your infrastructure definitions repository:
git clone $infrastructure.git
Merge in the latest tagged release of the Baserock reference systems from the reference systems Git repository.
cd infrastructure
git remote add upstream git://git.baserock.org/baserock/baserock/definitions.git
git remote update upstream
git merge --no-ff baserock-20.00
Build the new version of systems/build-system-armv7lhf-highbank.morph
using your distbuild network. You need to push your branch first, so the
distbuild network can see it.
git push origin HEAD
morph distbuild systems/build-system-armv7lhf-highbank.morph --local-changes=ignore
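Because the build is started with `--local-changes=ignore` and the distbuild network only sees what has been pushed, it can be worth checking the state of the checkout first. The following is a rough sketch only; the function name `ready_for_distbuild` is hypothetical, and it assumes you run it from inside the infrastructure checkout.

```shell
# Hypothetical pre-flight check: succeeds only if the working tree has
# no uncommitted or untracked changes and HEAD is contained in at least
# one remote-tracking branch (that is, it has been pushed).
ready_for_distbuild() {
    if [ -n "$(git status --porcelain)" ]; then
        echo "uncommitted changes present" >&2
        return 1
    fi
    if [ -z "$(git branch -r --contains HEAD)" ]; then
        echo "HEAD has not been pushed" >&2
        return 1
    fi
}
```

You could then run `ready_for_distbuild && morph distbuild ...` so the build is only started when the check passes.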
Deploy the cluster for your distbuild network. I'll assume that this is called `distbuild-cluster.morph` for the purpose of this guide, and contains a system named `distbuild`, deployed with `extensions/distbuild-trove-nfsboot.write`.
This will create a new rootfs for each node. It will not overwrite the running
rootfs, so you can keep using the distbuild network while this completes.
morph upgrade distbuild-cluster.morph --local-changes=ignore \
distbuild.VERSION_LABEL=$version
When you are ready, you need to restart the distbuild nodes in the updated system. This will stop any builds that are running, so it's a good idea to check first if there are any running builds:
morph distbuild-list-jobs
Restart all of the nodes on your distbuild network using `ipmitool`. First run `ssh root@$cxmanage` to log into the right machine. You then need to check and reset the power of each node. The example below assumes your nodes are on the subnet 172.17.1.0 with the first node's IPMI interface at 172.17.1.3, the second node's IPMI interface at 172.17.1.7, and so on. It also assumes you are using 8 nodes (0 to 7). You can use the commands `ipmitool chassis power status`, `ipmitool chassis power off` and `on` as well as `reset`, which may be useful if not all of the nodes on the server are currently powered on, or you don't know the status of them.
for i in `seq 0 7`; do
    ipmi_address=172.17.1.$(expr $i \* 4 + 3)
    ipmitool -U admin -P admin -H $ipmi_address chassis power reset
done
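If you don't know which nodes are powered on, you may want to query their state before resetting anything. The sketch below uses the same addressing scheme, node count and `admin`/`admin` credentials assumed in this guide; the function names `ipmi_address` and `power_status_all` are hypothetical.

```shell
# Map a 0-based node index to its IPMI address under the scheme assumed
# in this guide: node 0 is 172.17.1.3, node 1 is 172.17.1.7, and so on.
ipmi_address() {
    echo "172.17.1.$(( $1 * 4 + 3 ))"
}

# Print the chassis power state of nodes 0 to 7. This only queries the
# state; it does not power anything on or off.
power_status_all() {
    for i in $(seq 0 7); do
        addr=$(ipmi_address "$i")
        printf '%s: ' "$addr"
        ipmitool -U admin -P admin -H "$addr" chassis power status
    done
}
```

Run `power_status_all` on `$cxmanage`. For a node that reports it is off, `chassis power on` is likely to be what you want rather than `chassis power reset`.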
Roll back distbuild nodes to an older version
If you find that a new version does not work for some reason and you want to roll back to the previous version, you need to update the 'default' symlinks on the NFS server to point to the old version.
First, log in to the Trove (assuming you use Trove as your NFS server) with `ssh root@$trove`. Then, set a shell variable listing the name of each node. The example below assumes your nodes are called `node0`, `node1` and `node2`.
nodes="node0 node1 node2"
You can see the 'default' symlink for each node with this command. This tells you which directory each node will read its root filesystem from next time it boots.
cd /srv/nfsboot
for name in $nodes; do readlink $name/systems/default; done
To cause each node to boot from an older system version, update the 'default' symlink for each node. For example, to switch them to the version labelled '2015-01-01', do this:
cd /srv/nfsboot
for name in $nodes; do
    # -n replaces the existing 'default' symlink instead of following it
    ln -sfn /srv/nfsboot/$name/systems/2015-01-01 $name/systems/default
done
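A slightly more defensive variant is sketched below: it refuses to repoint a node whose target version directory does not exist, so a mistyped label cannot leave a node with a dangling 'default' symlink. The function name `switch_default` and its argument order are hypothetical.

```shell
# Hypothetical helper: point each node's 'default' symlink at VERSION,
# skipping any node that has no systems/VERSION directory.
# Usage: switch_default ROOT VERSION NODE...
switch_default() {
    root=$1
    version=$2
    shift 2
    for name in "$@"; do
        target="$root/$name/systems/$version"
        if [ ! -d "$target" ]; then
            echo "skipping $name: $target not found" >&2
            continue
        fi
        # -n treats an existing 'default' symlink as the thing to
        # replace, rather than following it into the old directory.
        ln -sfn "$target" "$root/$name/systems/default"
    done
}
```

With the layout used in this guide, that would be called as `switch_default /srv/nfsboot 2015-01-01 $nodes`.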
You can then reboot the nodes as described above. When you reboot the nodes, any running builds will be cancelled.
Debugging distbuild
If you find that your distbuild network doesn't work, start by logging into the controller node and checking:
systemctl status distbuild-setup.service
systemctl status morph-controller.service
systemctl status morph-controller-helper.service
systemctl status morph-worker.service
systemctl status morph-worker-helper.service
systemctl status morph-cache-server.service
You can find full logs for these services in `/var/log`.
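To get a one-line summary per service instead of the full status output, something like the following sketch may help. It assumes the unit names listed above; the function name `check_distbuild_units` is hypothetical. Not every unit is expected to run on every node, so an inactive unit is not necessarily a fault.

```shell
# Unit names taken from the list of services in this guide.
distbuild_units="distbuild-setup morph-controller morph-controller-helper
morph-worker morph-worker-helper morph-cache-server"

# Print "<unit>.service: <state>" for each unit. 'systemctl is-active'
# exits non-zero for units that are not active, so its exit status is
# deliberately ignored and only the printed state is used.
check_distbuild_units() {
    for unit in $distbuild_units; do
        state=$(systemctl is-active "$unit.service" || true)
        echo "$unit.service: $state"
    done
}
```

Run `check_distbuild_units` on the node you are debugging, then use `systemctl status` on any unit that is reported in an unexpected state.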