Continuous Integration and Automated Testing (C.I.A.T.) initiative

Introduction

Continuous delivery is a very popular concept nowadays, especially in the application and web services space, and it is gradually being adopted by other segments.

For this first attempt at building a reference implementation, CIAT was originally designed with a specific context, goal and target in mind:

  • Context: CIAT has initially been developed for the embedded industry, although it is being extended to cover use cases from other segments.
  • Goal: by design, CIAT intends to reduce the time window from detecting a new version of a relevant component of a customised OS to publishing a candidate ready for verification and validation (delivery) to just a few hours, allowing any organisation to establish a 24-hour continuous delivery cycle for very complex and mission-critical solutions.
  • Target: the initial target is the car industry. This is a consequence of the long-term relationship between Baserock and GENIVI/AGL, where some of the ideas and technologies that are part of CIAT have proven production ready for over two years.

Continuous Integration and Automated Testing (C.I.A.T.) is a Baserock initiative that creates a Linux-based operating system using existing Open Source software through an automated heartbeat pipeline process with six main stages (illustrated below):

  1. Detection of a new version of a component included in our customised OS.
  2. Integration of the update into the OS definition.
  3. Creation of a new build based on the updated definition.
  4. Preparation of the builds/images and artifacts, and provisioning of a bare-metal or VM farm for (automated) testing.
  5. Testing of the build.
  6. Publication of the candidate and the associated artifacts, making them ready for verification and validation.
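
To make the flow concrete, here is a minimal runnable sketch of one heartbeat cycle; every function below is an illustrative stub standing in for a real CIAT component, not actual CIAT code:

    # Minimal, illustrative sketch of one heartbeat cycle. Every function
    # is a stub standing in for a real CIAT component.

    def detect_change():
        # Stage 1: in CIAT this is event-driven (a git server notification).
        return {"component": "linux", "ref": "v4.3"}

    def integrate(change):
        # Stage 2: point the OS definitions at the new version.
        print("updating definitions: %(component)s -> %(ref)s" % change)
        return "candidate-branch"

    def build(branch):
        # Stage 3: build the system from the candidate branch.
        return ["demo-platform.img", "arm-base.img"]

    def prepare_and_provision(images):
        # Stage 4: publish artifacts and provision bare-metal or VM testers.
        print("provisioning test machines for", images)

    def test(images):
        # Stage 5: run the automated tests against the booted images.
        return True

    def publish(branch, images):
        # Stage 6: make the candidate ready for verification and validation.
        print("publishing candidate", branch, images)

    change = detect_change()
    branch = integrate(change)
    images = build(branch)
    prepare_and_provision(images)
    if test(images):
        publish(branch, images)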

But CIAT does not stop there...

Now that many embedded projects and organisations are widely using GNU/Linux based systems, there is an opportunity to take full advantage of upstream development processes, consuming newer software on a regular basis. The idea behind Baserock, implemented in CIAT, is to integrate the latest software from the most relevant Open Source community projects (upstream) as soon as they release it. Staying as close to upstream as possible simplifies integration and maintenance at scale.

Hence, this combination of existing Open Source technologies, applied to modern automated delivery processes to build a customised GNU/Linux based OS using the latest released software from upstream, provides any organisation with the ideal foundation to deliver complex solutions significantly faster and cheaper.

A first version of CIAT was shown at ELCE 2015, which took place in Dublin, Ireland. Further improvements are being developed and implemented on ciat.baserock.org. Additional use cases are being considered. Like any Baserock initiative, CIAT is an Open and Free Software project.

CIAT design

This is the current high-level design of CIAT. A description of each block follows the image.

[Image: CIAT high-level design diagram]

Upstream

By upstream we mean all of the open source community projects integrated into the systems we are working with, along with any software created locally by whichever team is instantiating the CIAT infrastructure, and any other third-party projects integrated into those systems.

Trove

The Trove is a git server (or collection of git servers if appropriate) which contains every repository involved in the integration of a set of systems which are then tested using the CIAT deployment. For the Baserock project, that is git.baserock.org; for any other project, it can be any git server capable of notifying the CIAT orchestration component when changes come in.

In addition, a Baserock Trove runs a service which regularly polls upstream repositories and injects changes from upstream into the locally held git repositories. This forms the basis for triggering the CIAT orchestration on upstream changes.
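
As an illustration only, not the actual Trove service, such a poller reduces to fetching each upstream into the corresponding local mirror; the repository paths and URLs below are hypothetical:

    # Illustrative sketch of an upstream-polling service of the kind a
    # Trove runs. Mirror paths and upstream URLs are hypothetical.
    import subprocess
    import time

    MIRRORS = {
        # local mirror (bare repo)  ->  upstream URL
        "/srv/git/delta/linux.git":
            "git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git",
    }

    def sync_mirror(local, upstream):
        # Fetch all refs from upstream into the locally held repository;
        # the Trove then notifies the orchestration of any changed refs.
        subprocess.check_call(["git", "--git-dir", local,
                               "fetch", upstream, "+refs/*:refs/*"])

    while True:
        for local, upstream in MIRRORS.items():
            sync_mirror(local, upstream)
        time.sleep(15 * 60)  # poll every fifteen minutes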

Firehose

Firehose is the name of the tool we use which reacts to incoming changes to all software components integrated into the systems it is instructed to care about. Firehose acts as a computerised equivalent to an integration engineer and determines what changed, how to react to it, and modifies system definitions to include new changes. Firehose then commits those changes and pushes them to specified branches of the system definitions repository.
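
Conceptually, each Firehose configuration pairs an upstream tracking rule with the definition it should rewrite and the branch to push candidates to. The sketch below expresses that as Python data; the real Firehose configuration format differs, and the field names here are assumptions:

    # Hypothetical sketch of what a Firehose configuration expresses;
    # field names and values are illustrative, not Firehose's real schema.
    FIREHOSE_CONFIGS = [
        {
            "component": "linux",              # the chunk to track
            "track": "refs/tags/v*",           # follow upstream release tags
            "definition": "strata/bsp.morph",  # definition file to rewrite
            "base_branch": "master",           # branch candidates are based on
            "candidate_branch": "firehose/candidates/linux",
        },
    ]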

Orchestration

The CIAT orchestration is a service which is responsible for processing incoming triggers from all parts of the infrastructure related to CIAT and instructing builders and testers in the work they need to perform. The Orchestration is also responsible for managing the resources available to the CIAT infrastructure such as requesting more build resource if the incoming build load is higher than can be supported with the currently available infrastructure.

Builder

Builders (or build slaves) are simply systems which are capable of building the software which the CIAT infrastructure is monitoring and integrating. Typically these will be either Baserock systems or other Linux systems running software such as Morph or ybd, though they could equally well run Yocto build systems or any other build infrastructure appropriate to the systems being integrated and tested.

Fileserver

The file server is, simply, a place where there is some shared storage. Intermediate and completed build artifacts are present on the file server and the various test systems retrieve artifacts from the file server in order to test them. The file server may also have externally visible interfaces which allow people to acquire artifacts such as test systems, build logs, and test logs.

CIAT-Tester

The tester is another build-slave, but this time one oriented toward testing the systems integrated and built by earlier stages of the CIAT pipeline. Testing can involve putting images onto target hardware, or spinning up virtual machines in order to test various aspects of the various systems integrated by the CIAT pipeline. Hidden inside this box is a target system farm manager, should target boards be necessary for testing.

CIAT implementation

You can currently see an early implementation of CIAT in action through a very simple UI that helps to visualise the whole automated process (the heartbeat pipeline).

In the current implementation, CIAT is divided into five main stages:

  1. Integration: this process checks whether there are upstream changes relevant to our customised OS and integrates those changes into the definitions.
  2. Build: the building stage. In the current implementation we are dealing with two different builds, a GENIVI-based demo platform and an ARM base image.
  3. Image creation: creation of an image from the build, which is stored together with the required artifacts on a file server so they are available to further stages.
  4. Provision and test: once published, VMs are created and deployed in a cloud so tests can be run automatically.
  5. Publish candidate: if the tests pass, the image is declared a candidate and, together with the artifacts, published and announced.

By clicking on each box, you can access further information about the currently running process and the previous one, including logs, together with a summary of previous runs. You can identify which processes are executing by an animation and a progress bar. All this information will help you not only to get context on what is happening, but also to analyse failures.

If you are interested in digging deeper, it is recommended to check the CIAT Buildbot page, especially the summary that the Waterfall view provides.

Integration

This stage is triggered when its configuration, or any of the other git repositories on the Trove, changes.

It fetches the latest Firehose configuration, then runs it on each defined set of configuration files.

Each set of configuration files determines how to update a base branch with the latest development commit or tag of the specified projects.

This is done by modifying the definition files, and if the result differs from the previous version, then a candidate branch is pushed.
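
A minimal sketch of that last step, assuming the definition stores the tracked ref as plain text and using hypothetical paths and branch names:

    # Illustrative sketch: rewrite the ref in a definition file, and only
    # commit and push a candidate branch if the result actually differs.
    import subprocess

    def push_candidate(repo, definition, old_ref, new_ref, branch):
        path = repo + "/" + definition
        with open(path) as f:
            text = f.read()
        updated = text.replace(old_ref, new_ref)
        if updated == text:
            return  # no change, so no candidate branch is pushed
        with open(path, "w") as f:
            f.write(updated)
        subprocess.check_call(["git", "-C", repo, "commit", "-am",
                               "Update %s to %s" % (definition, new_ref)])
        subprocess.check_call(["git", "-C", repo, "push", "origin",
                               "HEAD:refs/heads/" + branch])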

Build

Jobs in this stage are triggered by changes to candidate branches.

Build jobs are run on potentially remote worker machines.

They fetch their configured candidate branch, and build the system they are configured for, with YBD.

Because Baserock tooling has reliable caching, we don't need to build the whole system from scratch unless the candidate branch changed the toolchain.
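
A build job therefore reduces to roughly the following sketch; the paths, branch and system names are examples, and the exact YBD invocation depends on the version in use:

    # Rough sketch of a build job: fetch the candidate branch, then build
    # the configured system with YBD. All names here are examples.
    import subprocess

    DEFINITIONS = "/src/definitions"
    CANDIDATE_BRANCH = "firehose/candidates/linux"          # hypothetical
    SYSTEM = "systems/genivi-demo-platform-x86_64.morph"    # example target

    subprocess.check_call(["git", "-C", DEFINITIONS, "fetch", "origin"])
    subprocess.check_call(["git", "-C", DEFINITIONS, "checkout",
                           "origin/" + CANDIDATE_BRANCH])
    # YBD caches every chunk artifact, so only the components affected by
    # the candidate change are actually rebuilt.
    subprocess.check_call(["ybd", SYSTEM, "x86_64"], cwd=DEFINITIONS)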

Image Creation

Jobs in this stage are triggered by the completion of Build jobs.

Publish jobs are responsible for fetching the artifacts produced by the Build jobs, producing a configured disk image that may be tested, and putting it somewhere that it may be found by the test jobs.
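
As a hedged sketch, with hypothetical hostnames and paths, and with a plain copy standing in for the Baserock deployment tooling that actually produces the configured disk image:

    # Illustrative publish job: take a built system artifact, produce a
    # disk image from it, and copy it to the shared file server.
    import subprocess

    ARTIFACT = "/src/cache/artifacts/genivi-demo-platform-rootfs"  # example
    IMAGE = "/tmp/genivi-demo-platform.img"
    FILESERVER = "fileserver.example.com:/srv/images/"             # example

    # In CIAT the configured image is produced by the deployment tooling;
    # a plain copy stands in for that step here.
    subprocess.check_call(["cp", ARTIFACT, IMAGE])

    # Put the image where the test jobs can find it.
    subprocess.check_call(["rsync", "-a", IMAGE, FILESERVER])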

Provision and test

Provision and test is configured with a set of published disk images, which it must fetch and instantiate in a test harness.

Currently the only test harness supported is using SSH to run commands on machines in an OpenStack cloud.

It handles injecting credentials, uploading the images to the cloud, instantiating the machines, waiting for them to boot, then running the configured set of commands to test whether they pass.

Because of these tests, we can tell whether an upstream change leaves a machine unbootable, along with the results of whatever tests the commands specify.

Once the tests have finished being run, the machines that were used are deconfigured and returned to the pool of test machines.

For OpenStack clouds, this involves shutting the machines down and cleaning up the disk images.
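
For illustration, the whole flow can be sketched with the standard openstack command line client; the image path, flavour, and test commands below are examples, and credentials are assumed to come from the usual OS_* environment variables:

    # Minimal sketch of the OpenStack test harness flow: upload the image,
    # boot a machine, run the test commands over SSH, then tear everything
    # down. Names, flavour and commands are examples.
    import subprocess

    IMAGE = "/srv/images/genivi-demo-platform.img"       # hypothetical path
    NAME = "ciat-test-1"
    TESTS = ["uname -a", "systemctl is-system-running"]  # example commands

    subprocess.check_call(["openstack", "image", "create", NAME,
                           "--disk-format", "raw", "--file", IMAGE])
    subprocess.check_call(["openstack", "server", "create", NAME,
                           "--image", NAME, "--flavor", "m1.small", "--wait"])
    # Assumes a single network; the output looks like "private=10.0.0.3".
    ip = subprocess.check_output(
        ["openstack", "server", "show", NAME,
         "-f", "value", "-c", "addresses"]).decode().split("=")[-1].strip()
    try:
        # A real harness retries until SSH answers: --wait only waits for
        # the server to become ACTIVE, not for the OS to finish booting.
        for command in TESTS:
            subprocess.check_call(["ssh", "root@" + ip, command])
    finally:
        # Return the resources to the pool: delete the machine and clean
        # up the uploaded disk image.
        subprocess.check_call(["openstack", "server", "delete", NAME])
        subprocess.check_call(["openstack", "image", "delete", NAME])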

Publish candidate

This step is intended to collect up all the information required for a human integrator to make a judgement call on whether to apply the candidate change.

This is currently a stub, so we can prove that it's triggered after tests.

In future, we want it to:

  1. Ensure the artifacts are preserved somewhere that a human integrator can make use of.
  2. Make the candidate branch available somewhere appropriate.
  3. Provide a report for human integrators.

Next steps

In the coming weeks and months there are some areas that will be analysed, improved, refactored or developed further. These actions relate to:

  • Orchestration
    • Multiplicity
    • Elasticity
    • UID passed down pipeline
    • Automation of Orchestration Reconfiguration
  • Baserock on AWS
  • Publish step is a stub
  • Shared artifact cache
  • Improvements in the testing stage

Orchestration Features

Multiplicity

CIAT Orchestration has a concept of Pipeline. This defines what Steps are done and where. Each pipeline has a slave-type, and each slave-type has an architecture. At the moment there is only one slave per slave-type, so which slave a pipeline runs on is a one-to-one-to-one mapping of pipelines to slave-types to slaves.
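
Expressed as data, with illustrative names, the current mapping looks like this:

    # The current one-to-one-to-one mapping, expressed as Python data.
    # Pipeline, slave-type and slave names are illustrative.
    PIPELINES = {
        # pipeline              -> slave-type (each slave-type has an arch)
        "genivi-demo-platform": "x86_64-builder",
        "arm-base-image":       "armv7-builder",
    }
    SLAVES = {
        # slave-type -> the single slave of that type, for now
        "x86_64-builder": "slave-x86-01",
        "armv7-builder":  "slave-arm-01",
    }

    def slave_for(pipeline):
        return SLAVES[PIPELINES[pipeline]]

Multiplicity means relaxing SLAVES so that each slave-type maps to a pool of machines rather than a single one.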

The Pipeline code can be found in git://cu010-trove.codethink.com/cu010-trove/br6/ciatlib, and the configuration is pulled from git://cu010-trove.codethink.com/cu010-trove/br6/ciatconfig.

Note: comments have been left in the code to note where future expansion is necessary to complete it.

Elasticity

  • For CIAT to be elastic it needs to be able to create slave machines, install the slave (git://cu010-trove.codethink.com/cu010-trove/br6/ciat-slave) and then dynamically distribute the pipelines amongst the available slaves.
  • This works best when instantiation is a one-step process: configuration baked in, and slaves registering themselves with the master.
  • Slaves must be registered by type, and scheduling works by picking slaves out of the pool by type.
  • Some process looks at the Buildbot status, spawns elastic workers when there is more work than slaves available, and tears slaves down when there are more slaves than jobs (see the sketch below).
  • There need be no direct relation between the number of slaves and the work available: a proxy service could handle builds and be assigned worker slaves to batch jobs out to.
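
A minimal sketch of that scaling policy, assuming hypothetical hooks into Buildbot's status and the cloud provider (pending_builds, idle_slaves, spawn_slave and teardown_slave are not real APIs):

    # Hedged sketch of the scaling loop described above: grow the pool
    # when there is more work than slaves, shrink it when slaves idle.
    def autoscale(slave_type, pending_builds, idle_slaves,
                  spawn_slave, teardown_slave):
        pending = pending_builds(slave_type)
        idle = idle_slaves(slave_type)
        if pending > idle:
            for _ in range(pending - idle):
                spawn_slave(slave_type)      # new slave registers itself
        elif idle > pending:
            for _ in range(idle - pending):
                teardown_slave(slave_type)   # return capacity to the cloud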

UID passed down pipeline

It is desirable for each run of each pipeline to have a UID. This means passing it from step to step as it goes through the pipeline. It would also need to be associated with an upstream change. A UID could be created in bottlerock when a candidate-ref changes and then traced back to the upstream change, as sketched after the list below.

  1. Have bottlerock.py generate a UUID to pass into the Integration step's build properties.
  2. Have the Integration job pass the UUID into the build commands.
  3. Have firehose include the UUID in the commit message.
  4. Have bottlerock.py read the commit messages for the UUIDs and include them in the Build step's build properties.
  5. Have all build jobs' on-finish trigger steps pass the UUID along in a parameter.
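
Steps 1 to 4 amount to a small amount of plumbing; the sketch below shows the UUID generation, its embedding in a commit message, and its recovery. The "CIAT-Run-ID" marker format is an assumption:

    # Sketch of the UUID plumbing from steps 1-4 above: generate an ID per
    # pipeline run, embed it in the Firehose commit message, and recover
    # it when the candidate branch triggers a build.
    import re
    import uuid

    def new_run_id():
        return str(uuid.uuid4())

    def commit_message(summary, run_id):
        # Step 3: Firehose includes the UUID in the commit message.
        return "%s\n\nCIAT-Run-ID: %s\n" % (summary, run_id)

    def run_id_from_commit(message):
        # Step 4: bottlerock reads the UUID back out of the message.
        match = re.search(r"^CIAT-Run-ID: ([0-9a-f-]+)$", message, re.M)
        return match.group(1) if match else None

    rid = new_run_id()
    msg = commit_message("Update linux to v4.3", rid)
    assert run_id_from_commit(msg) == rid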

Automation of Orchestration Reconfiguration

Orchestration should update itself to use its own source when it notices that source change. This can be done by installing the new master.cfg, using buildbot's reconfig command, and restarting bottlerock. It would also need to watch ciatlib.

Extend the existing logic for updating the tests when the test repository changes to work for ciatlib and orchestration (ssh://git@cu010-trove.codethink.com/cu010-trove/br6/orchestration).
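
A minimal sketch of the self-update step, assuming a hypothetical basedir and service name ('buildbot reconfig' itself is a real command):

    # Sketch: install the new master.cfg, ask the running buildbot master
    # to reload its configuration, then restart bottlerock.
    import shutil
    import subprocess

    MASTER_DIR = "/srv/ciat/master"  # hypothetical buildbot basedir

    shutil.copy("orchestration/master.cfg", MASTER_DIR + "/master.cfg")
    subprocess.check_call(["buildbot", "reconfig", MASTER_DIR])
    subprocess.check_call(["systemctl", "restart", "bottlerock"])  # assumed unit name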

Baserock on AWS

Implementation isn't complete at this point, although it is functional.

We determined the steps necessary to put Baserock images into AWS, but we can't get into the images we deployed, so there's potentially more systems engineering required, as well as wrapping up the tooling into something easy to consume.

Publish step is a stub

We need to ensure the locations of all the tested artifacts that need to be published are passed along.

Primarily we need the tested artifacts, log results and candidate to be published, so integrators can make a judgement on whether to include the candidate change.

Secondarily, we want to include all the intermediate build artifacts, to aid in debugging, or just to share intermediate build steps.

CIAT visualization

Once we have the UUIDs, we can make the UI display a stack of pipelines for each UUID, filtering by triggered builds and showing all the current builds, with a pipeline per UUID.

The UX of the current visualization should be improved. So far we understand which kinds of data CIAT can provide and represent; how to present them, based on use cases for specific targets, still has to be defined.

Shared artifact cache

Further analysis is required in this area, followed by a first implementation integrated with the rest of the pipeline.

Improvements in the testing stage

The testing stage can test various images on different hosts. At the moment the testing step gets a single URL to the image to test and modifies a YAML file to insert that URL. This testing step should instead receive a whole YAML file with all the information about all the images to be tested and the tests to be run, as in the sketch below.
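
For example, the step could consume a single document like the one below, parsed with a few lines of Python; the schema is a proposal, not an implemented format:

    # Proposal sketch: give the test step one YAML document describing all
    # the images and the tests to run on each. The schema is an assumption.
    import yaml  # PyYAML

    DESCRIPTION = """
    images:
      - name: genivi-demo-platform
        url: http://fileserver.example.com/images/gdp.img
        host: openstack
        tests: [uname -a, systemctl is-system-running]
      - name: arm-base-image
        url: http://fileserver.example.com/images/arm-base.img
        host: arm-board-farm
        tests: [uname -a]
    """

    plan = yaml.safe_load(DESCRIPTION)
    for image in plan["images"]:
        print("would test %(name)s from %(url)s on %(host)s" % image)
        for test in image["tests"]:
            print("  running:", test)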