Versioned Exploit Tracker

VET is a system for tracking exploit data down to the granularity of git commits. This page outlines how it works and how to integrate it into Baserock. VET is still under development, and integration into Baserock tooling is still ongoing, but this page will serve as an explanation and exploration.

The Problem

Baserock is designed to allow flexible, traceable, repeatable builds. You can build from stable versions, from upstream heads, from custom branches, from public or private repositories... whatever you need to build the system you want.

Unfortunately, this doesn't sit well with the traditional model of software security. In this model, vulnerabilities are usually identified in a particular release version, and fixed in another release version. This model already causes problems with stable distributions such as Debian, which backport security fixes to make minimal changes without altering the version number.

With Baserock, you must be able to establish that the builds you create are secure - or at least, which known vulnerabilities are present in them. You must also be able to make, test and deploy minimal changes to keep your systems secure.

The git "Solution"

Initially, it was thought that a git-based solution would suffice - assuming that all projects either do use, or will use, git. If a vulnerability is introduced by a commit (or set of commits) and fixed by another commit (or set of commits), then all you need to do is make sure that:

  • you can identify which commit(s) in upstream introduce and fix the vulnerability
    • this would be a significant manual curation effort at first, but if useful it might become a standard feature of vulnerability reporting
  • either the commit(s) which introduce the vulnerability are not in your build, or
  • both the commit(s) which introduce the vulnerability and the commit(s) that fix it are present in it
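If full upstream history were preserved, the checks above would reduce to plain git ancestry queries. A minimal sketch of that idea (the function names are illustrative; this is not an existing tool):

```python
import subprocess

def commit_reachable(repo, commit, ref="HEAD"):
    """Return True if `commit` is an ancestor of `ref` in `repo`,
    i.e. it is part of the history that `ref` would be built from."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref],
        capture_output=True,
    )
    return result.returncode == 0

def build_is_vulnerable(repo, introducing, fixing, ref="HEAD"):
    """A build is vulnerable if any introducing commit is present
    but the fix is not (an empty `fixing` list means no fix exists yet)."""
    flaw_present = any(commit_reachable(repo, c, ref) for c in introducing)
    fix_present = bool(fixing) and all(commit_reachable(repo, c, ref) for c in fixing)
    return flaw_present and not fix_present
```

This is exactly what breaks down in practice: once commits are cherry-picked into a trove without their history, merge-base can no longer answer the question.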

Unfortunately this fails for a number of reasons. The most serious is that when you pull from upstream branches into your local trove, the git history is lost - there's no way of knowing whether upstream commit X is in your build. It might be possible to mandate a specific way of cherry-picking changes that preserves this information as metadata (rather than in the git history itself), but that ties developers to a rigid model and risks false results whenever they deviate from it.

Code Searching

Because initially we are only interested in scanning for problems in open source software, searching for the actual lines of code that embody a flaw is feasible.

By analysing new releases that fix security flaws, we can often find the lines of code that cause the problem, and then scan the code that is actually used to build a system to see if it contains them. This solves numerous problems that the git approach above does not address, such as loss of information when rebasing or merging, as well as embedded copies of commonly-exploited libraries.
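As a sketch of this approach (the vulnerability identifier and snippet below are invented for illustration), an exact-substring scan over a checked-out tree might look like:

```python
import os

def scan_tree(root, snippets):
    """Walk a checked-out source tree and report files containing any
    of the known-vulnerable code snippets (exact substring match).
    `snippets` maps a vulnerability identifier to its code snippet."""
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            for vuln_id, snippet in snippets.items():
                if snippet in text:
                    hits.append((path, vuln_id))
    return hits
```

Because this works on the bytes actually handed to the build, it catches embedded and copy-and-pasted code that no amount of git archaeology would find.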

Embedding is an especially large concern: libraries such as libpng, zlib and OpenSSL are routinely embedded into other pieces of software (or have functionality copy-and-pasted from them), which not only means the embedded copies are unlikely to be updated, but also that these libraries are among the most commonly targeted by attackers.

While a "fuzzy" code search would be useful, it is difficult to do in a language-agnostic way (beyond simply collapsing all whitespace to a single space). Care must be taken to ensure that the "snippet" of code searched for is not so generic that it triggers false positives.
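The whitespace-collapsing form of fuzzy matching mentioned above can be sketched in a few lines (the function names are illustrative, not part of VET):

```python
import re

def normalise(code):
    """Collapse every run of whitespace to a single space, so that
    re-indented or re-wrapped copies of a snippet still match."""
    return re.sub(r"\s+", " ", code).strip()

def snippet_present(snippet, source):
    """Language-agnostic fuzzy match: compare whitespace-normalised text."""
    return normalise(snippet) in normalise(source)
```

This tolerates reformatting but nothing else - a renamed variable or changed comment still defeats the match, which is why the choice of snippet matters so much.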

The downside to code searching is that everything must be checked out to be scanned, which is time-consuming. Heuristics are being considered, such as only scanning files whose names are associated with the snippet being searched for, so that not every file has to be opened.
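One such heuristic could be sketched as a pre-filter that yields only files whose names are associated with the snippets being searched for (the mapping from snippet to file names is hypothetical):

```python
import os

def candidate_files(root, names_of_interest):
    """Yield only paths whose basename is associated with a snippet
    being searched for, so that most files are never opened."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name in names_of_interest:
                yield os.path.join(dirpath, name)
```

The trade-off is that an embedded copy under an unexpected file name is missed, so a full scan would still be needed periodically.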

Additionally, it may be beneficial to use Aho–Corasick string matching, which builds a state machine that allows multiple needles to be searched for simultaneously in a single pass over a haystack. Not only would this substantially reduce the amount of data that must be read, it also opens the possibility of pre-creating the state machine data and distributing that via the web service.
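As a sketch of how that state machine works, here is a minimal hand-rolled Aho–Corasick automaton (illustrative only; a production implementation would likely use an existing library and a serialisable representation for distribution):

```python
from collections import deque

class AhoCorasick:
    """Build a state machine from many needles, then find every
    occurrence of every needle in one pass over the haystack."""

    def __init__(self, needles):
        self.goto = [{}]   # per-state transition table
        self.fail = [0]    # per-state failure link
        self.out = [[]]    # needles recognised on entering each state
        for needle in needles:
            state = 0
            for ch in needle:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].append(needle)
        # Breadth-first pass to compute failure links.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def search(self, haystack):
        """Return (start_index, needle) for every match found."""
        state, hits = 0, []
        for i, ch in enumerate(haystack):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for needle in self.out[state]:
                hits.append((i - len(needle) + 1, needle))
        return hits
```

Pre-building one automaton for all known snippets means each file is read exactly once, however many vulnerabilities are being checked for.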