Making use of BitBake recipes outside of BitBake

Lots of software integration projects are based on Poky, the reference distribution maintained by the Yocto and OpenEmbedded projects. These use a build tool called BitBake. BitBake is a very generic tool, like an extended version of make. To make it useful for building GNU/Linux operating systems from source code, Poky implements a lot of build logic on top of the primitives set out by BitBake. Poky also provides a set of build instructions (recipes) for many common free software packages. Projects based on Poky can then add to or modify this set of recipes to produce some kind of OS image.

In Baserock we are keen on finding a way to represent build instructions that is independent of any actual build tool. We think this will help how we build software in general.

Exporting each task as shell / Python programs

At its core, BitBake is just generating snippets of shell or Python that operate on the filesystem, and running them in order. We can export them, instead of running them.

An important caveat: don't expect them to be pretty! The shell scripts are actually quite readable, but because the Python code is executed inside the running BitBake process, we need to dump the huge internal dictionary of variables that BitBake maintains.

In many cases the generated scripts do actually work, so this is a really useful first step. Later we can hopefully strip out most of the context that makes them so huge.
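
To make the idea concrete, here is a minimal Python sketch of what "export instead of run" means. This is not the code from my BitBake patch: the `export_tasks` function, the task names and the file layout are all made up for illustration, and it assumes the tasks are already in execution order (real BitBake works that out from the task dependencies).

```python
import os
import tempfile

def export_tasks(tasks, outdir):
    """Write each (name, shell_body) task to its own script instead of running it.

    Hypothetical helper: 'tasks' is assumed to be an already-ordered list.
    """
    os.makedirs(outdir, exist_ok=True)
    run_lines = ["#!/bin/sh", "set -e"]
    for name, body in tasks:
        script = name + ".sh"
        with open(os.path.join(outdir, script), "w") as f:
            f.write("#!/bin/sh\nset -e\n" + body + "\n")
        # run.sh records the order in which the scripts should be executed.
        run_lines.append("sh ./" + script)
    with open(os.path.join(outdir, "run.sh"), "w") as f:
        f.write("\n".join(run_lines) + "\n")
    return run_lines

if __name__ == "__main__":
    # Illustrative task names only; they do not come from a real dump.
    demo = [("zlib.do_configure", "echo configuring"),
            ("zlib.do_compile", "echo compiling")]
    with tempfile.TemporaryDirectory() as d:
        print("\n".join(export_tasks(demo, d)))
```

The point is only that nothing is executed at export time: the build becomes a directory of scripts plus one file that fixes their order.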

Demo: The FreeDesktop SDK

I tested this with the Freedesktop SDK Base project. Please try with other Yocto/OpenEmbedded projects too!

NOTE: this is totally unofficial, and if the Devil visits you in the night as a result then I am not going to be held responsible.

I strongly advise you to do this in a clean VM or container, not in your main development machine! It only takes one screwed-up variable expansion to turn rm -rf ${tempdir}/* into rm -rf /*.

Clone the source repo:

 git clone --recurse-submodules https://git.gnome.org/browse/freedesktop-sdk-base

Use my patched version of BitBake (I had to fork the whole of poky.git, but only bitbake code is changed):

 cd freedesktop-sdk-base/yocto
 git remote add fork https://github.com/ssssam/poky
 git remote update fork
 git checkout sam/dump-build-definitions-freedesktop-sdk-backports

Edit the freedesktop-sdk-build-yocto script (in the top Git repo) so that the bitbake call becomes bitbake --export-tasks ~/bitbake-dump (or whatever location you want to dump the scripts).
Also, edit meta-freedesktop/conf/distro/freedesktop.conf and comment out the 'PREFERRED_PROVIDER_virtual/kernel' line. (Otherwise you get an error because some Git repo at git.yoctoproject.org is missing a tag.)
Make the necessary directories and run it:
```
 mkdir -p ~/work/x86_64/conf
 mkdir ~/bitbake-dump
 ./freedesktop-sdk-build-yocto . ../bitbake-work x86_64 1  > log 2>&1
```
The parameters are: source dir, work dir, architecture, and some 'hash' field which seems irrelevant to us. Change whatever you want. It writes a huge number of pointless DEBUG messages to stdout and stderr (I'm not sure why!). You'll find that it takes 5 times as long if you leave the output going to a terminal.

Note that BitBake will hang once it has written all the files and you'll have to kill it with CTRL+Z and probably kill -9. This is my fault for blocking the server process's main loop, probably.

So now in ~/bitbake-dump you should have about 3GB of generated shell and Python scripts. There are a few special ones: data.py has the 'global' set of variables, globals.sh has the global set of variables that are exported to shell processes, and run.sh contains the task order.

Now you can run a build. The Python tools need to be able to import the 'bb' and 'oe' modules from BitBake, so set PYTHONPATH (change ~ to wherever you checked out freedesktop-sdk-base.git) to this:

 export PYTHONPATH=~/freedesktop-sdk-base/yocto/bitbake/lib/:~/freedesktop-sdk-base/yocto/meta/lib

OK! Try calling sh run.sh and see how far your build gets. Mine built quite a lot of stuff. All the work it does will be within the directory you gave to freedesktop-sdk-build-yocto as the 2nd argument. After a while it will break for some reason. Mine broke at glibc-initial due to a variable called ld_append_if_tune_exists being set to None instead of the Python function definition that it should be set to. That's probably because I have no idea what I'm doing.

You can see a video of this, from August 2015: https://www.youtube.com/watch?v=gW1NKZPomjI

Future work

Submit the --export-tasks patch to BitBake developers for advice (it's unlikely to be merged, but they could probably give helpful comments on the approach, anyway).
Avoid the need to patch siggen.py, by monkeypatching bb.parse.siggen.dump_sigtask in each generated .py file.
Instead of --export-tasks writing the entire data dict for each class, it should 'import data' and then only write out the values that are different from the initial set of data. Ditto for shell scripts, but '. globals.sh'. I implemented this already in a branch, but it turned out to be incredibly slow due to the number of value comparisons between data dicts. It needs a bit more thought.
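
The "only write out what differs" idea from the last item can be sketched like this. It is a simplification and not what my branch does: it treats both sets of data as plain Python dicts, which the real BitBake datastore is not, and `delta_against_base` is a name I made up.

```python
_MISSING = object()

def delta_against_base(base, task_data):
    """Return only the entries of task_data that differ from base.

    Both arguments are plain dicts here, which is an assumption made
    for the sake of the example.
    """
    changed = {k: v for k, v in task_data.items()
               if base.get(k, _MISSING) != v}
    # Keys present in the base but absent from the task need recording too,
    # otherwise importing the base data would silently reintroduce them.
    removed = sorted(k for k in base if k not in task_data)
    return changed, removed

if __name__ == "__main__":
    base = {"CC": "gcc", "PN": "unknown", "TMPDIR": "/tmp/work"}
    task = {"CC": "gcc", "PN": "zlib"}
    print(delta_against_base(base, task))
```

Note that this still does one comparison per variable per task, so it does nothing about the slowness mentioned above. It only shows what the output should contain.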

Distilling the most important information

It'd be nice to get the key information in a nice browseable format. This will probably be a format that no longer actually executes as-is.

Shell tasks (configure, compile, install) are actually quite readable already. If we removed the 'export' statements that are duplicated from 'globals.sh', squashed the chains of shell functions that just call another function (such as do_make -> oe_runmake -> oe_runmake_call), and removed log functions like bb_note, they would be pretty clear as-is.
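
As a rough sketch of the first and last of those clean-ups, something like the following could work. It is line-based and deliberately naive: it does not parse shell, so multi-line statements would trip it up, and it makes no attempt at squashing the function chains. The `tidy_shell_task` name and the sample input are invented for the example.

```python
def tidy_shell_task(task_text, globals_text, log_funcs=("bb_note",)):
    """Drop 'export' lines already in globals.sh and calls to log functions.

    Hypothetical helper working on whole lines only.
    """
    global_exports = {line.strip() for line in globals_text.splitlines()
                      if line.strip().startswith("export ")}
    kept = []
    for line in task_text.splitlines():
        stripped = line.strip()
        if stripped in global_exports:
            continue  # duplicated from globals.sh
        if stripped.split(" ", 1)[0] in log_funcs:
            continue  # a logging call such as bb_note
        kept.append(line)
    return "\n".join(kept)

if __name__ == "__main__":
    globals_sh = 'export CC="gcc"\nexport PATH="/usr/bin"\n'
    task_sh = ('export CC="gcc"\n'
               'export PN="zlib"\n'
               'do_compile() {\n'
               '    bb_note "make"\n'
               '    make\n'
               '}\n')
    print(tidy_shell_task(task_sh, globals_sh))
```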

The 'fetch' and 'unpack' tasks call functions in the 'bb' module to fetch source code. Really, we just care about where the source comes from.

The 'patch' task also calls functions in the 'bb' module. Again, we only really care what the patches are.

Modifying the .bbclass files to export this information somehow in a machine-readable way might make sense. A tool could then scan their code or their output for markers that say "this is the source code URI", "this is a patch", etc.

Need to figure out how to represent source code and patches in our data model. Since source code on 3rd party servers can be changed at any time, we should really import it to Git straight away, and apply the patches there and then. This leaves you with a huge amount of data, though. Maybe just storing the patches and the source code URI is useful.

Dependency information: it's there in the variables, somewhere. The BitBake runqueue module has all the task dependency information in memory, so could easily write that out in some machine-parsable format. We probably are only interested in component level dependencies -- although we may find that we have dependency loops if we try to simplify BitBake's internal task-level dependency model.
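
Here is a small sketch of that simplification and of how a loop can appear. It assumes the task dependencies have already been written out as a dict mapping 'recipe:task' names to lists of 'recipe:task' names. That format is my invention, not something the runqueue module produces today.

```python
def component_graph(task_deps):
    """Collapse 'recipe:task' edges into recipe-level edges."""
    graph = {}
    for task, deps in task_deps.items():
        recipe = task.split(":", 1)[0]
        graph.setdefault(recipe, set())
        for dep in deps:
            dep_recipe = dep.split(":", 1)[0]
            graph.setdefault(dep_recipe, set())
            if dep_recipe != recipe:
                graph[recipe].add(dep_recipe)
    return {k: sorted(v) for k, v in graph.items()}

def find_cycle(graph):
    """Return one dependency loop as a list of recipes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        state[node] = GREY
        stack.append(node)
        for nxt in graph[node]:
            if state[nxt] == GREY:
                # nxt is already on the stack, so we have found a loop.
                return stack[stack.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        state[node] = BLACK
        return None

    for node in graph:
        if state[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

if __name__ == "__main__":
    # Made-up recipes: fine at task level, circular at component level.
    deps = {
        "a:do_compile": ["b:do_populate_sysroot"],
        "a:do_populate_sysroot": ["a:do_compile"],
        "b:do_populate_sysroot": [],
        "b:do_package": ["a:do_populate_sysroot"],
    }
    graph = component_graph(deps)
    print(graph)
    print(find_cycle(graph))
```

In the example the task-level graph has no loop, but once every task is folded into its recipe, 'a' needs 'b' and 'b' needs 'a'. That is exactly the kind of loop I would expect to hit.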

The 'package' task calls the functions listed in PACKAGESPLITFUNCS, and these set the FILES_xx variables. Each of these seems to expand to globs of what should be included in a package. For example:

d.setVar('FILES_zlib-bin', '${bindir}/* ${sbindir}/*')

We should be able to dump those variables as 'split rules'.
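
A sketch of how a dumped split rule could be applied by some other tool follows. The variable values, the 'zlib' rule and the file list are invented. The matching is a simplification too: `fnmatch` lets '*' match across '/', and I am assuming a file goes to the first package that claims it.

```python
import fnmatch
import re

def expand(value, variables):
    """Expand ${name} references; unknown names are left untouched."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: variables.get(m.group(1), m.group(0)), value)

def split_files(files, rules, variables):
    """Assign each installed path to the first package whose globs match.

    'rules' maps a package name to a space-separated string of globs,
    in the same form as the FILES_xx value shown above.
    """
    result = {pkg: [] for pkg in rules}
    for path in files:
        for pkg, globs in rules.items():
            patterns = expand(globs, variables).split()
            if any(fnmatch.fnmatchcase(path, p) for p in patterns):
                result[pkg].append(path)
                break  # first match wins (an assumption of this sketch)
    return result

if __name__ == "__main__":
    variables = {"bindir": "/usr/bin", "sbindir": "/usr/sbin",
                 "libdir": "/usr/lib"}
    rules = {
        "zlib-bin": "${bindir}/* ${sbindir}/*",
        "zlib": "${libdir}/lib*.so.*",  # hypothetical rule for illustration
    }
    files = ["/usr/bin/minigzip", "/usr/lib/libz.so.1",
             "/usr/share/doc/README"]
    print(split_files(files, rules, variables))
```

Files that match no rule are simply left out of the result, which is probably something a real tool would want to report.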

'package_write_rpm' task generates a .spec file to generate one or more RPM packages from the build output, with some metadata too. We don't

The end result will be the same set of information as the recipes, but in a much more static form. This means it'll be much easier to comprehend, and alternative build tools can try to build the same system. On the other hand, it will probably be harder to change this version compared to the original, as lots of information that was previously generated at runtime (such as the splitting rules) will now be 'hardcoded' for each component.

Building BitBake recipes with a Baserock build tool

A completely automated conversion from BitBake recipes to Baserock definitions would be a huge amount of effort and probably pretty fragile. I think the main challenges are...

Dealing with the hardcoded paths. Baserock build tools run each component build in a 'staging' directory with a random filename. The Baserock reference systems do some hairy hacks at bootstrap time to work around that (including patching GCC specs so that we can change the sysroot path in an environment variable). Build instructions from Poky might not cope with that. That said, the 'sstate_hardcode_path' function seems to remove hardcoded paths after 'populate_sysroot', so perhaps it could work.
Source code import. We need a way of producing a suitable Git branch, given a source code URI and a set of patches. The 'lorry' tool is mostly capable of this, but we need to figure out how to handle the patches.
fakeroot/pseudo. Currently we just ignore that completely.
Anything that does special things during the fetch, unpack, patch, populate_sysroot or package stages will not work as-is, because we'll be ignoring the actual code for those tasks.
Lots of other things too, probably.