critique

Notes on Trove Reference Manual

Introduction

Trove is the central store of your software in the Baserock ecosystem. Trove manages both your source code and sources acquired from third parties, along with caching and serving the binary artifacts which are built from those sources. Trove also provides your project management team with a central location for managing access to source code and for your development and integration team to look to for branches and codelines.

The terminology is confusing:

How can a store "manage" anything? A store is (normally) a passive object. It is something that someone puts things into (or buys things from), it doesn't "manage" those things, it simply stores them.

It can help, facilitate, enable or allow the management of those things, but it can't "do" anything. If it does, the manual needs to explain how the Baserock use of the unmodified word "store" differs from that of the rest of the world.

Reading on, it seems that Trove isn't a store per se, it's an active management system. Could the first sentence be re-worded to reflect this? e.g. "Trove acts as the central store of..." or something like that. Possibly even better is "Trove is the tool you use to manage the central store of..."

Maybe I'm too literal-minded, but I find that the current wording leaves me confused and wanting to know: How? Why? What does it do? Does it do things without me asking it to? Where do I fit into the picture when it sounds like Trove is some kind of AI?

Basically I come to the end of the first paragraph with more questions than I started with.

Trove is itself a Baserock appliance and is comprised of several important moving parts. The most obvious and actively used part of Trove is the Git service. Development users normally interact with Trove via Morph, which uses the Git service to access and modify the repositories on Trove. Integration, Q.A. and even project management users may also interact with the Git service during their normal work, either directly using Git, or via Morph, or via a web interface.

Less trouble here but the Git/Morph interaction is confusing. Perhaps it would be better to introduce Morph first then Git, explaining that Morph uses Git and making it clear that Morph is higher level and can simplify things for the user but using Git is ultimately more powerful albeit potentially dangerous?

The next important moving part is the source code acquisition service, called Lorry. This runs in the background on Trove and for the most part users do not need to interact with it. However if you are acquiring source code from multiple places in order to build your product, you may need to configure the Lorry service to get the acquisitions going.

Little problem with this. However, if I understand its implications correctly, it might be worth including a note that you should always acquire new repos via Lorry rather than fetching them first and then trying to retro-fit them into Trove (a trap I have just fallen into).

Trove also contains a cache service which is used to provide your Baserock ecosystem with a place to store and retrieve binary artifacts and also to look up information about source code branches and file content without having to check the code out of the Git service directly.

The description doesn't really sound like a cache, perhaps more explanation is needed?

All these services are integrated within the Trove system and work together to ensure that your Baserock-based developers can always get at the source code they need in order to build products.

Quick Start

Something You Need To Know Before You Start

Because Trove is designed to be secure, users and programs are forced to access Trove in a secure way. All configuration of repositories, users and services is done via git. Users and programs are expected to interact with Trove via ssh, and security is enforced by ssh keys. If you do not know what ssh is, how public/private keys work, and why they are generally preferable to passwords, you should probably read up on these topics first.

The recommendation to read up is a bit vague. Perhaps a link to a tutorial and a bit more about the aspects to be concentrated on?

There are no user passwords for Trove. In normal operation, if a user is prompted for a password by Trove, it is because they are using a computer, a virtual machine, or an account which has not been authorised on the Trove. Authorisation is done by creating a public/private key pair on the computer/virtual machine and adding the public key to the list of keys that Trove will accept. See the The Git Service section of this manual for information about Trove roles, setting up users and groups, and adding keys.

This warning sounds a bit dire. Is it needed at this stage? It just gives a user, who is probably already a bit daunted, something else to worry about.

Note: there are only a few Linux "users" set up on Trove; "root", "git" and "lorry". Of these only "root" can actually be used as a login.

At this point the user (me) does not have any idea what the implications of this statement are so I think it should either be explained or left out.

Anyone who can log in to the Trove as root can subvert the security scheme, so root access should be very carefully controlled. After initial setup it should only be necessary to log in as root in exceptional circumstances.

Ok, another slightly dire warning. Perhaps it should be explained what the alternative is?

How to Add a Repo To Your Trove

New repos in Trove are either managed by Lorry, or by you.

Good to know.

Lorry should be used for code which is not being developed directly by your organisation, for example:

open source projects
third party code delivered as tarballs or pulled from external repos

I would suggest that this should be merged with the preceding paragraph as it is an immediate clarification of the statement made there. On reading the first paragraph my mind raced ahead with "ok, but which should I choose and why?" This paragraph helps to clarify that.

I would also suggest that you have a line or two explaining when repos should be managed by the user rather than Trove and why. I can see that the implication is that you should manage repos that are being developed internally but I think it bears spelling out explicitly.

To setup trove-cmd

alias trove-cmd="ssh git@YOUR-TROVE-HOST"

"set up" should be two words when used as a verb. "setup" is a noun (and an American one at that; the English version should probably be hyphenated, but I'm not that fussed, language changes).

What is "trove-cmd"? You should add a few words as to why it will be useful.

What is "YOUR-TROVE-HOST"? How do I find out? Do I create it myself? Is it done by some administrator somewhere? At this point I don't know because you haven't told me.

If you are authorised to add Lorried repos, trove-cmd whoami should show that you are a member of the group local-config-writers.

The final line seems a bit random, sort of an afterthought or throw-away. I don't know what it means or implies because it is floating, out of context.

Your Project Repos

Code for projects created and managed in your organisation should be added via trove-cmd create, under your trove_id. So for example, if your project is called new-code, and your trove_id is ab-cd

trove-cmd create ab-cd/new-code

This seems good. Clear, concise and tells me why I might want to use the command shown.

I am, however, a little puzzled by "trove_id". Is this an arbitrary designation that the user selects? It kind of looks like it is but I think it should be spelled out for the avoidance of doubt. As I worked on through the document, it became clear that "trove_id" is an important designator so you should make more of it here so that the user understands what it is for.

Open Source Repos

For an open source project which is to be lorried, the process is:

git clone ssh://git@TROVE-HOST/TROVE_ID/local-config/lorries
cd lorries
mkdir open-source-lorries # if it doesn't exist already
cd open-source-lorries
edit new-project.lorry
git add . & git commit & git push

Jargon alert: "lorried" is not a word because "Lorry" is a noun and does not have a past tense. It may seem obvious to you what it means but it is not to someone new to Baserock and it is capable of misinterpretation.

You should either use English or explain your jargon. It could conveniently have been done when you introduced Lorry earlier on.

In the code box, can you make clearer what is meant to be literal text and what is to be substituted with user specific text? If CAPITALS means "put your own item name here", you should spell that out somewhere. And always explain, at some point, how to work out what the substitution should be.

If I understand your meaning correctly, for consistency with the earlier example you should actually use "YOUR-TROVE-HOST" not "TROVE-HOST".
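To show the kind of convention I have in mind, here is a minimal sketch in which everything to be substituted is a named shell variable and everything else is literal. The host name trove.example.com is my own placeholder, ab-cd is the trove_id from the manual's earlier example, and the variable names are hypothetical rather than anything Trove defines.

```shell
# Hypothetical values; substitute the ones for your own installation.
TROVE_HOST="trove.example.com"
TROVE_ID="ab-cd"

# Everything outside ${...} is literal text to be typed exactly as shown.
lorries_url="ssh://git@${TROVE_HOST}/${TROVE_ID}/local-config/lorries"
echo "$lorries_url"
```

Written this way, a reader can see at a glance which two values they have to supply, and the manual only has to explain once where each value comes from.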

To reinforce my earlier comment about trove_id, there was a break between my reading the preceding section and this one. Because trove_id was not made anything of, I had forgotten what it was.

It seems odd to use the two "&" in the last line. Doesn't that parallelize the three commands? That would mean that there is a risk that the push could occur before the commit was finished and the commit could occur before the add was complete. Or are we using a weirdo shell where "&" does not mean "run the preceding command in background"? I'd recommend using a separate line for each command unless you are certain that it is ok.
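For what it's worth, here is a small sketch of the behaviour I would expect the manual to rely on. In a POSIX shell, "&&" starts the next command only after the previous one has exited successfully, whereas a single "&" puts the preceding command in the background. The step function is a hypothetical stand-in for git add, git commit and git push, so that the snippet runs without a repository.

```shell
# Hypothetical stand-in for "git add", "git commit" and "git push":
# it records the order in which the steps actually ran.
log=""
step() {
    log="${log}$1 "
}

# '&&' starts each command only after the previous one has succeeded,
# so the steps run strictly in order.
step add && step commit && step push
echo "$log"

# If an earlier step fails, the later ones are skipped entirely.
log=""
step add && false && step push
echo "$log"
```

In the manual's terms, I would expect the last line of the example to read "git add . && git commit && git push", or simply to be split into three separate lines.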

Third Party Repos

For a third-party closed-source project which is to be lorried, the process is the same, only we recommend you use the closed-source-lorries directory in the lorries repository and that you do not lorry them into delta/ but rather create your own prefix for each third party.

Jargon as above.

Where are closed-source-lorries and lorries?

What is delta/?

On a style note, I'd recommend sticking to using either repo or repository and not mixing them in the same document.

Note that if Lorry needs to use the SSH protocol to access a server, the host keys for that server must be known by the Trove machine. You can achieve this by running the following command on the Trove:

ssh-keyscan REMOTE-SERVER >> /etc/ssh/ssh_known_hosts

You say to run a command "on the Trove"; what does that mean? The manual now appears to be using the word "Trove" to refer to the hardware that the Trove is running on. What does the sentence actually mean?

Trove's Moving Parts

The Git service

Trove's Git service allows developers, integrators and build machines to gain access to source code stored within the Trove. The Git service also allows project managers to define access control rights for engineers, testers and other roles, and to apply those access control rights across the user base.

In addition to acting as a central location to store your code, Trove's Git service also has hooks within it to facilitate other parts of the Baserock ecosystem's behaviour.

From day-to-day, we expect developers, integrators, Q.A and project managers to interact with Trove's Git service.

The only additional thing I would like to have been told is how "The Git Service" relates to Git as known and loved by millions of developers worldwide. Is it vanilla Git? Is it an overlay? Does it provide a superset of commands? A subset?

Otherwise, this section works well for me.

The source code acquisition service

Trove's source code acquisition service (Lorry) allows you to acquire source from multiple locations and centralise access to that in your Trove. Typically you will draw your source code from three disparate sources. Your upstream Trove which is likely to be Baserock itself, any third party projects, and any internal projects which are not themselves going to be stored in the Trove Git service.

Trove can acquire source code from all sorts of different revision control systems and normalises them all into Git which is then used throughout Baserock.

NOTE: that while you can use HTTPS addresses, or other addresses invoking SSL or TLS encryption on the connection, the Trove does not verify certificate validity. Trove does not validate or require any certificates or signatures on the remote end, and does not check them if they are present.

Is this a technology limitation? If not, is there an intention to add this in the future? I can see that a corporate client might regard this as a potential security hole. If you intend to plug the hole, it might be a good idea to make that point. If not, you should explain why it isn't really a problem.

Otherwise, another good section.

The cache service

Trove provides a cache service to your users and to your build network. Services such (missing "as"?) the distributed build workers use the cache to acquire information about the Git repositories stored in the Trove Git service. They also use the Trove to store, index, and serve binary artefacts (built elements of the systems) to all other parts of the Baserock ecosystem.

At this point you haven't introduced me to "distributed build workers", so I don't know what they are, nor why the cache should matter to me just because it matters to them.

For the most part, users will not interact with the cache system on the Trove except to acquire built system artifacts (use UK spelling) once they have been produced by the build network. This is handled transparently when using Morph.

This section is informative but it doesn't tell me why I need to know about cache.

Also, this service sounds more like an "index" than a "cache", perhaps explain why the choice of term?

Visualising your Trove in a browser

Trove provides two web interfaces. The first is to the Git service and that can be found by pointing your web browser at the Trove system. This interface will allow you to visualise all the Git repositories in the Trove system which have been designated as "public" in the sense that anyone in your organisation may look at them.

How do I thus point? You haven't told me what I should point at.

Does it only show the "public" repos or does it also show private ones that the user is authorised to see?

The second interface is to the source code acquisition service which can be accessed by selecting the link at the bottom of any of the Git service pages. The link is entitled "View Lorry Controller Status" and will take you to a self-refreshing page which indicates the current status of the source code acquisition service.

Isn't this Lorry? You should try to be consistent in naming throughout the document.

Backing up all these moving parts

Backups can be made using the following command:

$ sudo rsync --numeric-ids --delete-before --delete-excluded \
      --exclude '/nfsboot/' --exclude '/lorry/' \
      -ahHSx root@TROVE_HOST:/home/. /path/to/backed/up/trove/files/.

What is this backing up? What is it not backing up? When is a good time to back up? How often should the user do so?

Basically, not enough information to allow the user to make an informed judgement about what and when to back up.

The Lorry service

Lorry is a service in Trove that mirrors git repositories from elsewhere, and converts repositories of other version control systems into git for Baserock use. For example, Lorry gets updates from the git repository Linus Torvalds publishes for the Linux kernel, and puts the updates to the repository in Trove. Lorry can also convert Subversion repositories, for example, into git, and also create a git repository from a release tar archive if that is the only format in which some software is available. Within the Baserock system, all source code is kept in git.

Elsewhere in this document you describe Lorry as the "source code acquisition service". This seems to be an accurate description, it is good, understandable and succinct. It makes Lorry's raison d'etre immediately understandable to a software engineer. I think you should open the previous paragraph by using it. The rest of the paragraph is excellent as it gives a good feel for the detail of what Lorry does.

The Lorry service in Trove consists of a command line tool, called Lorry, and a management tool, the Lorry controller, which runs Lorry at suitable intervals. The Lorry controller reads its configuration from a git repository, stored in the same Trove instance. To make configuration changes, you commit them to the configuration repository and push them to Trove. The Lorry controller will automatically get the changes from there.

The first sentence is hard to parse. It has too many commas. As two items is a bit short for a list, I'd suggest naming the two items then starting new sentences (possibly even new paragraphs) to describe what each does.

This is the first time in the document that you have even hinted that there could be more than one instance of Trove. That is a pretty major thing to introduce in an aside about a component. I strongly suggest that you introduce the concept in a separate topic all of its own and, if that introduction is not earlier in the document than this, that you remove the reference from here, just "Trove" rather than "same Trove instance".

Reading back, I can see how this maybe ties in with trove_id and we're back to you needing to explain what that was about when it was first introduced.

The rest of the paragraph is fine.

Configuring the Lorry controller

The configuration for the Lorry controller is stored in a git repository created during the initial trove-setup of the unit. The repository will be stored in the customer's trove_id, as TROVE_ID/local-config/lorries. In the root of that repository must exist a file called lorry-controller.conf, which is a JSON document consisting of a list of dictionaries.

What is "trove-setup"? i.e. why the hyphen?

I am still not clear what a "trove_id" is. Is it a directory? Is it a repo? Is it an arbitrary bit of text used to identify something? Terms you use MUST be defined.

[
    {
        "type": "trove",
        "uuid": "TROVE_ID/initial",
        "serial": 1,
        "trovehost": "git.baserock.org",
        "ls-interval": "4H",
        "interval": "2H",
        "create": "always",
        "destroy": "never",
        "stagger": true,
        "prefixmap": {
            "baserock": "baserock",
            "delta": "delta"
        },
        "ignore": [
            "baserock/lorries"
        ],
        "tarball": "always"
    },
    {
        "type": "lorries",
        "uuid": "TROVE_ID/open-source-lorries",
        "serial": 1,
        "interval": "6H",
        "create": "always",
        "destroy": "never",
        "stagger": true,
        "prefix": "delta",
        "tarball": "always",
        "globs": [
            "open-source-lorries/*.lorry"
        ]
    }
]

Those are two very non-random looking UUIDs! If they aren't UUIDs, and the example ones certainly aren't, you should explain why you have used the tag "uuid".

The Lorry controller reads this configuration file on its first run, and then monitors changes to it. Whenever the configuration file changes, the Lorry controller compares the new file with the current state, and makes any changes needed.

How often does this happen? Is the frequency configurable?

The configuration file has two types of stanzas:

trove specifies an upstream Trove instance to mirror
lorries specifies a set of .lorry files in the configuration git repository to use to mirror individual repositories

See the next section for how to write .lorry files. With a trove stanza, Lorry controller connect to another Trove instance, use the ls command in gitano to get a list of git repositories, and mirror those. The list of git repositories can be limited, see below. For each repository, Lorry controller then generates an internal .lorry file, using parameters it deduces automatically.

The second sentence is so badly constructed I can't make sense of it. Do you mean "connects" rather than "connect"? The second comma should be something else, full stop or semicolon probably. The clause after the second comma is an imperative, is it meant as such? Or does it mean "You can use..."? The third comma appears to be totally redundant. "gitano" appears out of nowhere. There is no indication of why we would want to mirror repos nor where we would mirror them to.

After that it is not too bad, although it makes a forward reference, which is often a bad idea.

At frequent intervals, the Lorry controller runs all the .lorry files it has (whether it generated them itself, or got them from the configuration repository), and runs the ones that are due.

The following fields can be used in the stanzas.

field	description
`type`	type of stanza; must be `trove` or `lorries`; mandatory
`serial`	if this changes, trigger all lorries from this stanza at once
`trovehost`	domain name of another Trove instance (`trove` only)
`ls-interval`	interval between checking the upstream Trove for new or removed repositories.
`interval`	interval between runs of `lorry` for each respository for this stanza.
`create`	when to create new repositories: `always` or `never`
`destroy`	when to destroy repos that vanish: `always`, `never`, or `unchanged`
`prefixmap`	map local Trove's prefixes to remote ones (`trove only`)
`ignore`	glob patterns on repository pathnames which to not mirror (`trove` only)
`globs`	glob patterns on `.lorry` files which to use (`lorries` only)

The intervals for ls-interval and interval are in seconds (no suffix), minutes (M suffix), hours (H suffix), or days (D suffix). The suffix may be in upper or lower case. There must be no space between the number and the suffix. For example, 3600, 60m and 1H all specify an interval of one hour.
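A worked example might help here. The sketch below is my own reading of the rules in the paragraph above, not the Lorry controller's actual parser. The function name interval_to_seconds is hypothetical, and it does no validation of its input.

```shell
# Convert an interval such as 3600, 60m, 1H or 2D to a number of seconds.
# No suffix means seconds; the suffix may be in upper or lower case.
interval_to_seconds() {
    value="$1"
    case "$value" in
        *[mM]) echo $(( ${value%?} * 60 )) ;;     # minutes
        *[hH]) echo $(( ${value%?} * 3600 )) ;;   # hours
        *[dD]) echo $(( ${value%?} * 86400 )) ;;  # days
        *)     echo "$value" ;;                   # plain seconds
    esac
}

# The three spellings of one hour from the text all print 3600.
interval_to_seconds 3600
interval_to_seconds 60m
interval_to_seconds 1H
```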

It is important that you do not configure your interval to be so short that Trove cannot complete the mirroring process in time. If your interval is too short then Trove may never "catch up" with itself. It is recommended that if it takes one hour to run lorry for all of the repositories in a stanza, then the interval for that stanza should be no less than 90 minutes in order to cope with instances of lorry taking longer sometimes. If you need some parts of your Trove to be mirrored more often than this, you can simply create additional stanzas and use the ignore and prefixmap entries to control what you mirror at what schedules.
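The one-hour and 90-minute figures above can be turned into a rule of thumb. The sketch below assumes that the same 1.5x margin applies to other run times, which is my extrapolation from the single example given rather than something the manual states; the function name is hypothetical.

```shell
# Smallest suggested interval, in minutes, for a stanza whose lorry runs
# take the given number of minutes in total. Uses the 60 -> 90 ratio from
# the manual's example (an assumed 1.5x margin), rounded up.
min_interval_minutes() {
    echo $(( ($1 * 3 + 1) / 2 ))
}

min_interval_minutes 60   # prints 90, matching the manual's example
```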

Your Trove comes pre-configured to mirror the Baserock Trove. You can just use the default configuration, until and unless you need to add new repositories, or you need to mirror from other Troves, such as those run by chipset vendors or third party contractors.

This whole section is good solid information. The only thing I would like clarified is at the start of the last paragraph: "Your Trove comes pre-configured to...", does it mean "As part of the set up process, a Trove is configured to..."? If so, I think the latter is clearer and less ambiguous.

Writing .lorry files

Lorry can convert Bazaar, Mercurial, Subversion, and CVS repositories to git, and create git repositories from tar archives. It can also mirror git repositories from elsewhere. All of these are controlled by writing .lorry files specifying what to mirror or to convert, and what the result should be called.

A .lorry file uses JSON syntax. The entire file is a JSON object (key/value mapping), where each key is the nickname for a repository to convert. The corresponding value is another JSON object, where the key/value pairs specify what type the repository is, where it is, and so on. This is best explained using an example:

{
    "git": {
        "type": "git",
        "url": "git://github.com/gitster/git.git"
    }
}

The above file tells Lorry how to mirror the git repository for git itself. A .lorry file can contain as many repositories as needed. The example below has an example for every type of repository Lorry supports.

I know it's not really Baserock's fault but how can anybody fail to be confused by an example where out of 38 alphabetics, 21 are the word git repeated 7 times and the rest of the text consists of {":,/.} in various combinations? While it may be amusing to geeks (me included), it doesn't seem like a really helpful example.

{
    "git": {
        "type": "git",
        "url": "git://github.com/gitster/git.git",
        "refspecs": [
            "snap", "master", "+next", "naster", "initial",
            "maint", "maint-1.7.6", "maint-1.7.7",
            "refs/tags/*"
        ]
    },
    "bzr": {
        "type": "bzr",
        "branches": {
            "trunk": "lp:bzr/trunk"
        }
    },
    "mercurial": {
        "type": "hg",
        "url": "http://selenic.com/hg"
    },
    "subversion": {
        "type": "svn",
        "url": "https://svn.apache.org/repos/asf/subversion/",
        "layout": "standard"
    },
    "cvs": {
        "type": "cvs",
        "url": ":pserver:anonymous@cvs.savannah.nongnu.org:/sources/cvs",
        "module": "ccvs"
    },
    "gcc-tarball": {
        "type": "tarball",
        "url": "http://ftp.gnu.org/gnu/gcc/gcc-4.6.2/gcc-4.6.2.tar.bz2"
    }
}

Note how every repository specifies at least the type field. Everything else is dependent on that field.

field	description	types of VCS
url	URL to remote repository	all except Bazaar
refspecs	ref names (branches, tags) to push	all
branches	mapping of branch names to URLs	Bazaar
layout	repository layout	Subversion
module	name of module to convert	CVS

For all:

url: the URL to the remote repository to be mirrored
refspecs: the ref names (branches, tags) to push to the git server. Default is all.

See git push's documentation for what is allowed here.

Does the last sentence apply to both the preceding list items?

refspecs description refers specifically to "the git server".

Both of these are confusing as both items are supposed to be "for all". This sort of implies that both of the tags could be used for all repo types but there are no instances of refspec for repositories other than "git" in the example above.
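I found it easier to read the field table earlier in this section as a lookup from repository type to the fields that apply to it. The sketch below is only my reading of that table: the function name lorry_fields is hypothetical, the type names are the ones used in the manual's example file, and refspecs is left out because the table lists it for all types.

```shell
# Type-specific fields for each repository type, as I read the field table.
# "refspecs" is listed for all types, so it is not repeated here.
lorry_fields() {
    case "$1" in
        bzr)            echo "branches" ;;
        svn)            echo "url layout" ;;
        cvs)            echo "url module" ;;
        git|hg|tarball) echo "url" ;;
        *)              echo "unknown type: $1" >&2; return 1 ;;
    esac
}

lorry_fields svn
lorry_fields bzr
```

If a per-type listing like this were in the manual, the "For all" heading would no longer have to carry the exception for Bazaar by implication.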

For Bazaar:

branches: a mapping of branch names to URLs for the branches, in addition to the branch specified in url, which gets mapped to trunk. It is customary to not specify a url for Bazaar, only branches.

For Subversion:

layout: specifies the Subversion repository layout. This is used to locate where the master branch, other branches and tags can be found in the repository. The usual layout of being at the base of the repository url looks like this:

"layout of being": missing word(s)? It doesn't make sense to me.

    PROJECT-1/
    |- trunk/
    |  |- src
    |  \- doc
    |- branches/
    |  |- foo
    |  |  |- src
    |  |  \- doc
    |  |- bar
    |  |  |- src
    |  |  \- doc
    |  \- baz
    |     |- src
    |     \- doc
    \- tags/
       \- v0.1
          |- src
          \- doc
    PROJECT-2/
    |- trunk/
    |- branches/
    \- tags/

The layout to lorry this is:

    "url": "http://example.com/svn/PROJECT-1",
    "layout": {
        "trunk": "trunk",
        "branches": "branches/*",
        "tags": "tags/*"
    }

This is by far the most common, so may be referred to as "layout": "standard".

Another common layout, which is somewhat inverted, has the projects inside the branches:

    trunk/
    |- PROJECT-1/
    |  |- src
    |  \- doc
    \- PROJECT-2/
    branches/
    |- PROJECT-1/
    |  |- foo
    |  |  |- src
    |  |  \- doc
    |  |- bar
    |  |  |- src
    |  |  \- doc
    |  \- baz
    |     |- src
    |     \- doc
    \- PROJECT-2/
    tags/
    |- PROJECT-1/
    |  \- v0.1
    |     |- src
    |     \- doc
    \- PROJECT-2/

The layout to lorry this is:

    "url": "http://example.com/svn/",
    "layout": {
        "trunk": "trunk/PROJECT-1",
        "branches": "branches/PROJECT-1/*",
        "tags": "tags/PROJECT-1/*"
    }

Sometimes you only want one part of a project. For example, you may only care about the src directory of PROJECT-1 when it has the first layout. You could lorry only that with the following config:

    "url": "http://example.com/svn/PROJECT-1",
    "layout": {
        "trunk": "trunk/src",
        "branches": "branches/*/src",
        "tags": "tags/*/src"
    }

You can also only lorry a fixed set of branches. For example, you may only want the bar and baz branches of PROJECT-1.

    "url": "http://example.com/svn/PROJECT-1",
    "layout": {
        "trunk": "trunk/src",
        "branches": "branches/{bar,baz}",
        "tags": "tags/*"
    }

I'm afraid I've never used subversion so can't really comment on how useful the rest of this is.

For CVS:

module: names the CVS module to mirror.

Mercurial and tarball do not have any additional fields.

All seems good here.

Monitoring the Lorry controller

The Lorry controller provides a status page, which it keeps up to date. You can find it at the following URL:

http://TROVE_HOST/lc-status.html

There is a link to this at the bottom of the front page of your Trove host.

What front page? This is the first mention that I can see. It is also the first use of "Trove host" in a sentence rather than a code block.

Please, define your terms.

The Lorry controller is in one of six states when executing:

initialisation
loading repository listings from other Troves
removing vanished repositories
creating new repositories
running Lorry on due .lorry files
waiting for the next run ("finished")

The states are shown as "tabs" at the top of the page. Underneath are two tables. The first one shows the other Troves that the Lorry controller is configured to mirror, and the second one shows all the individual repositories it is mirroring, whether from other Troves or individual .lorry files.

Minor point: "finished" seems a slightly odd name for the state, wouldn't "idling" or similar be better? I'm not that bothered by it but Baserock has too much jargon as it is.

The Morph Cache Service

What is the cache service

The cache service is logically split into two halves. The access to repository data, and the access and management of binary artifacts.

I find this paragraph hard to understand. Do you mean "The accessing of" rather than "The access to"?

The cache server software is implemented as an HTTP service on non-standard specific TCP ports and access to those ports can be managed at the network layer.

What are these "non-standard" ports? Why are they "non-standard"? Could "standard" ports have been used? If not, why not?

In other words, please give more explanation.

Repository data access

The repository data access interface is used by Morph to retrieve information about repositories. This allows Morph to determine what needs building and to process morphologies without having to clone all the Git repositories into their local caches.

Does this mean that the user does not really need to concern himself with this interface? If so, you should say so.

Binary artifacts

The binary artifact interface is split into two parts. The read-only interface is visible to all users of the Baserock ecosystem and allows access to download binary artifacts from the system. This allows Morph to fetch pre-built artefacts from the cache rather than having to re-build them itself on every node. The write-enabled interface allows the distributed-build controller to cause the Trove-based Morph cache server to retrieve artifacts from the distributed-build worker nodes.

For clarity, I'd suggest that you have a separate paragraph for each half of this interface. I would then start "The first part is read-only and is visible to all users of the Baserock ecosystem. It allows..." and so on.

Short, sharp sentences tend to be easier to absorb than long, multi-part ones.

What uses the cache service

There are three main classes of users of the cache service. Automated Morph instances, human-driven Morph instances and non-Morph services. Automated Morph instances include those running on distributed-build workers. These workers have access to the read-only interfaces of the cache service on the Trove and use it to improve performance during the preparatory phase of building artifacts. Human-driven Morph instances run on engineers' workstations and also have access to the read-only interfaces of the cache service for the same purpose. Non-Morph services, including the distributed build controller, can have access to the read-only interfaces and, in the case of the distributed-build controller, the write-enabled interfaces too. The distributed-build controller can cause the cache server on the Trove to retrieve artifacts which its workers have finished building, in order that they will be available during further builds.

Again, for clarity, I would use a separate paragraph for each of the classes.

Again, you are using concepts ("distributed-build workers") without previously defining them.

What configuration applies to the cache service

The only configuration which is applicable at this time is that of any firewalls involved in managing access to the Trove's write-enabled cache server interface. Typically your Trove will already be configured so as to only allow write access to the Trove from the distributed-build controller node.

The Git Service

What is it

Trove has a Git service which provides storage and access to a large collection of Git repositories. Trove's Git service also provides a rich access control system for project access, role-based access and various other important features often needed in revision control. The Git service also offers a web interface for browsing all repositories in the Trove which have been marked as visible to all users.

This is excellent.

Top level configuration

The Git service is configured entirely in a Git repository of its own. In normal operation no-one should need to interact with this repository. Instead users and administrators use the remote administration features built into the service. Trove builds a complex default ruleset which provides for roles, system users, groups and a concept known as your 'trove_id'.

At last something that tries to say what a 'trove_id' is. Unfortunately it's very woolly and still doesn't tell me what it actually is, how it comes about/is created, what format it takes, if it can be changed and so on. Describing it as a "concept" is at odds with its use earlier on as something that can be used in commands etc. which sort of implies it is a label or has some kind of alphanumeric value that can be used in commands.

Everything in this section of the manual corresponds to the default set of rules provided with Trove. If your local Trove has been modified in a non-standard way then this documentation may not apply.

Roles within Trove

Trove defines system roles which relate to the administration of the Git service and also some roles related to the rest of the Baserock ecosystem. Some of these roles are to facilitate the automation of building and take the form of users in the system. Others are to facilitate the management of the access rules and take the form of groups.

The primary roles defined by the default rules are:

Trove 'System' administration
Trove 'Site' administration
Project administration, management, write access and read access.
- These roles are an automatic multiplicity based on the path to the repository in consideration.
Lorry (source code acquisition service)
Distributed build (Distributed-build controller and workers)
Worker (Any of the above who has read-access to everything)

This is mostly nice and clear although "are an automatic multiplicity" is a very convoluted way of saying ... what?

Default provided users, groups and projects

Trove comes pre-configured with the following users and groups:

Users
- trove
  - This is the system administration user.
  - It corresponds to the POSIX account on the Trove system which owns the Git service.
- lorry
  - This is the source code acquisition user.
  - It corresponds to the POSIX account on the Trove system which runs the Lorry service.
- distbuild
  - This is the user for the distributed-build infrastructure.
  - It corresponds to both the controller and the workers which form the distributed-build network used by your engineers to build systems.
Groups
- gitano-admin
  - This is the system administration role for adding/deleting users and repositories
  - It contains the trove user by default.
- workers
  - This is the group which provides read-only access to the entire trove.
  - This contains distbuild by default.
  - Do not put humans into this group.
- trove-admin
  - This is the group which provides administration access to the users and groups on the Trove.
  - During initial setup, one of your managers will have been granted membership of this group and will then have set things up further.

I have a slight problem with the very last item being in the reference manual. The said manager may well be reading this manual to find out how to do what the manual is telling him may already have been done.

Projects
- local-config
  - This project contains the local Lorry configuration for your Baserock ecosystem.
  - Nobody has access to this project by default.

The last item here may well be true but it is insufficient information: should this remain the case? If so, why? If not, who should be given access and why?

What is a 'Trove_ID'?

The 'Trove_ID' is the part of the repository path name or branch name which is unique to your Trove instance.

All source code projects local to your instance of the Baserock ecosystem will be stored underneath your trove_id in your Trove. All local branches of other repositories which are not stored in your trove_id will contain your trove_id in the branch name instead.

OK, this tries to give a definition of Trove_ID! Still some questions though:

Who decides what the Trove_ID is? Is it automagically generated? Is it picked by the user? Is there a standard format for it?

The two uses of "will": do you mean "will" or "must"?

Morph allows you to specify the hostname and ID of the Trove from which it should pull source code and binary artefacts, using the trove-host and trove-id variables.

Where, when and how does it allow you to specify these items?

For example, when Trove is shipped to you, it contains several trove_ids:

baserock
- This trove_id contains the Baserock open source code, such as the source to morph, lorry etc.
delta
- This trove_id contains all open source "upstream" projects which form part of Baserock. For example you will find linux, busybox and gcc here.

I think what is missing is an overview of how all these Baserocky things relate to Linuxy things.

What Baserock things are solely concepts? What Baserock things are configuration items? In which config files? What Baserock things are Linux files? Where are those files? What Baserock things are Linux directories? Where are those directories? What Baserock things are Linux programs? What are the programs called?

Baserock may be an ecosystem but it is not a shell. The user is still operating within the Linux environment so you need to relate all Baserock terms to that environment. The relationships may be obvious to you but they are not necessarily so to a new user.

Accessing the git service

The Git service can be accessed in three ways. There is a web interface to allow you to browse the open source repositories (and any other repositories which are designated as all-access). Also those repositories are available via the git:// protocol for rapid cloning.

Do you mean "which are designated as all-access" or do you mean "to which you have authorised access"?

All other access must be done via the Secure Shell protocol (ssh) which provides both encryption and authentication so that the Git service can provide access control to your repositories.

Since all ssh access to the Trove is via the git POSIX account on the Trove, you must provide an ssh public key to the Git service. For information on how to do this, see the section on creating and managing users below.

Can you make the forward reference into a hyperlink?

To verify if you have access to the Trove, you can run the following command:

ssh git@LOCAL-TROVE-ADDRESS whoami

"LOCAL-TROVE-ADDRESS" is another new thing. What is it?

If you have access, you will be presented with the information that the Trove has stored about you, along with your group memberships. If you do not have access then it is likely that you will be presented with a Password: prompt which will never grant you access.

When you say "have access", do you mean "are authorised to access"?

If entering a password will never grant you access, why issue the prompt?

Your development machine's morph.conf will carry information on how to access the local Trove instance, including your 'trove_id'. Thus for the most part, you will only interact with the Trove Git service via Morph's commands for cloning, branching and editing repositories.

This is the first and only reference to "morph.conf" in this document. What is it? Where is it? Who sets it up? How is it set up? When is it set up? What does it contain?

It is not immediately obvious to me how the second sentence is a concomitant of the first. Can you explain why it follows, or will it be obvious when "morph.conf" is explained?

Trove and Project access levels

Access to the Trove Git service is carefully structured to allow you to screen access to the codebases stored in the Trove while still allowing systems to be built.

Minor point: maybe it's just me, but "screen access" caused me a moment's confusion because I initially read "screen" as a noun adjunct to "access" rather than as a verb. Might it be better to use "restrict", "validate" or something similar?

Once I'd got my head round that, it made perfect sense.

The top level 'access all areas' role is the system administration role represented by the gitano-admin group. This role is typically used only during the very early setup of your Trove instance.

"set up" should be two words. In fact, it should probably be "setting up".

The primary 'site' administration role is represented by the group trove-admin and when your Trove was configured, at least one person will have been granted membership of this group. Membership of this group grants the user the right to administer other groups (other than gitano-admin) and the right to manage users. Members of this group will be involved with creating user accounts, granting access to projects, etc.

Missing comma after the first "and". Alternatively, move the parenthetical clause to the end of the sentence and do without the commas altogether.

Content is good though.

Within a project there are four levels of access. These are referred to as the -managers, -admins, -writers and -readers roles. They offer, respectively, reducing access to the given project. Projects are named for the part of the path to the repository after your trove_id and before the repository name. For example, the repository named trove_id/local-config/lorries is in the local-config project and is therefore governed by the groups local-config-managers, local-config-admins, local-config-writers and local-config-readers.

Membership of these groups is hierarchical. Thus all -managers are implicitly -admins, all -admins are implicitly -writers and all -writers are implicitly -readers. The rights granted at each level (from -readers up) are:

-readers may see the repository and clone from it. They may fetch updates from the repository as and when they wish.
-writers may create branches and push to refs within the repository. The naming scheme of the branches is up to local policy.
-admins may alter the administration ref of project repositories and may create and destroy repositories for a project. They are also permitted to perform remote repository configuration operations, such as setting the HEAD of the repository.
-managers may query and modify the makeup of the groups associated with controlling access to the project. For example, a member of -managers may grant read, write, admin or management access to a project to any other user in the system.

Note: membership of trove-admin does not imply membership of any of the -managers groups.

What is "the administration ref"?

Otherwise, this is really good, meaty stuff.

Creating and managing users

Within Trove's Git service, users represent entities which have access to Git repositories. These can either be humans or programs which need source code access for various reasons. The role which is permitted to perform the management of users is the trove-admin role.

Can you make it clear whether or not your use of the word "user" refers to bog-standard Linux users?

The second sentence is ambiguous. "These" could refer to any of the plural nouns in the preceding sentence. I presume that you mean the users; if so, just add the word "users" after the word "These" to remove the ambiguity.

Users can be listed, created and destroyed using the user command. Each user has a number of ssh public keys associated with it. These keys can be managed using the sshkey command coupled with the as command.

Creating a user is therefore a two-phase operation. From here on, we assume that trove-cmd is either a shell alias for morph trovectl or else a shell alias for ssh git@local-trove-address. You can substitute either of the expansions for the trove-cmd in the below commands and everything will otherwise work as stated.

$ trove-cmd user add someusername email.address@domain.com Real Person

The above command will add a user to the Trove Git service. The user's username will be someusername and Trove will store an email address and real name for the person as provided. These are used by Trove if the user makes changes to administration refs or to their own configuration in any way. Since all of the Git service's configuration is also stored in a Git repository, changes will be committed using the user's details where appropriate.

$ trove-cmd as someusername sshkey add sometag < somekey.pub

The above command then adds an ssh public key to the user named. The key is given a tag, 'sometag', which is used to identify the key in the case that multiple keys are registered to a single user. This information can be used in complex rules which are beyond the scope of this document. The public key to register for the user is provided in the file somekey.pub and must be in OpenSSH public key format.

I have added the parenthetical "'sometag'" to the second sentence, just for the sake of completeness and clarity.

Once these two commands have been run, the new user can check their access by executing the command:

$ trove-cmd whoami

Which should display their personal details as entered above.

Removing users should only be done in extreme circumstances. Instead should access need to be revoked, it is better to simply remove all the ssh keys registered with a user. This can be done by the following sequence of commands...

$ trove-cmd as someusername sshkey list

...to acquire the list of keys registered to the user, and...

$ trove-cmd as someusername sshkey del sometag

...for each tag listed in the first command.

I can guess why removing the user is a bad idea but it would be helpful to spell out why, otherwise this just comes across as a hint/tip rather than a recommendation.
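
The key-removal sequence above could be wrapped up along these lines. This is only a sketch: revoke_keys and the TROVE_CMD variable are names I have made up, and the tags are passed in by hand (read off the output of sshkey list) because the manual does not show the format of that output. By default the commands are only printed, not run.

```shell
#!/bin/sh
# Sketch only. TROVE_CMD is an assumed variable: set it to whichever
# expansion of trove-cmd you use (e.g. "morph trovectl"). The default
# merely echoes each command so that nothing is changed on the Trove.
TROVE_CMD=${TROVE_CMD:-echo trove-cmd}

# revoke_keys USERNAME TAG...  (hypothetical helper)
# Deletes each listed ssh key tag from the given user.
revoke_keys() {
    user=$1
    shift
    for tag in "$@"; do
        $TROVE_CMD as "$user" sshkey del "$tag" || return 1
    done
}

# Example with made-up tags; substitute the tags reported by
#   trove-cmd as someusername sshkey list
revoke_keys someusername laptop workstation
```

The user account itself is left in place, which is what the manual recommends.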

All users have the right to create branches under the local trove_id in every repository which is not part of any local project. For example, any registered user may create branches named under the local trove_id in the baserock/morphs repository. Commonly this functionality will only be used by Morph automatically creating build branches as users build systems.

Creating and managing projects

Projects are the fundamental access control primitive which will be used day-to-day. Projects are designated by the path element directly after the local trove_id in your Trove. For example, the repository trove_id/local-config/lorries is in the local-config project.

Projects exist ephemerally as a side-effect of users have rights granted to them and repositories existing. The default ruleset has an expectation of how the access groups will be set up. Thus, to create a project requires that you choose a name for the project, nominate a user to be the manager for that project and then that you create the project's set of groups. The nominated user can then set about granting access to other users and/or creating repositories.

Last paragraph, first sentence: "having", not "have".

Second sentence: the sentence structure here comes across to me as a bit stilted. It might be clearer to say "The default ruleset requires that certain rules must be followed when setting up access groups." or something similar.

Do you need to create the project directory or will that happen automagically?

Users who are a member of the trove-admin role group may create project groups.

Creating projects

Assuming that we are creating a project called new-proj and that the user who has been nominated to manage the project has the username stevesmith then the command sequence to create and configure all of the project groups would be:

$ trove-cmd group add new-proj-managers Managers for the Badger IVI Project
$ trove-cmd group adduser new-proj-managers stevesmith

$ trove-cmd group add new-proj-admins Repository admins for the Badger IVI Project
$ trove-cmd group addgroup new-proj-admins new-proj-managers

$ trove-cmd group add new-proj-writers Users with write-access to Badger IVI repositories
$ trove-cmd group addgroup new-proj-writers new-proj-admins

$ trove-cmd group add new-proj-readers Users with read-access to Badger IVI repositories
$ trove-cmd group addgroup new-proj-readers new-proj-writers

While that sequence of commands seems long, it does configure the series of four groups necessary to screen access to the new-proj project, along with chaining the groups together so that higher access permissions grant lower permissions automatically. It also ensures that the user stevesmith is capable of configuring the access control as the project requires.

This looks like a perfect candidate for a script.

N.b. Even if a script is provided, I'd still keep the above text because it clearly explains the mechanism by which the hierarchy is established.
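
To illustrate what I mean, here is a minimal sketch of such a script. It simply replays the eight commands quoted above for any project name. The function name create_project_groups and the TROVE_CMD variable are my own inventions, and the default setting only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. TROVE_CMD is an assumed variable: set it to your expansion
# of trove-cmd (e.g. "morph trovectl"). The default just echoes commands.
TROVE_CMD=${TROVE_CMD:-echo trove-cmd}

# create_project_groups PROJECT MANAGER TITLE  (hypothetical helper)
# Creates the four access groups and chains them so that each higher
# level is automatically a member of the level below it.
create_project_groups() {
    project=$1
    manager=$2
    title=$3

    $TROVE_CMD group add "${project}-managers" "Managers for $title" || return 1
    $TROVE_CMD group adduser "${project}-managers" "$manager" || return 1

    $TROVE_CMD group add "${project}-admins" "Repository admins for $title" || return 1
    $TROVE_CMD group addgroup "${project}-admins" "${project}-managers" || return 1

    $TROVE_CMD group add "${project}-writers" "Users with write-access to $title" || return 1
    $TROVE_CMD group addgroup "${project}-writers" "${project}-admins" || return 1

    $TROVE_CMD group add "${project}-readers" "Users with read-access to $title" || return 1
    $TROVE_CMD group addgroup "${project}-readers" "${project}-writers" || return 1
}

create_project_groups new-proj stevesmith "the Badger IVI Project"
```

Stopping at the first failure is a deliberate choice here: a half-built chain of groups is easier to diagnose than one where later commands ran against groups that were never created.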

Controlling access

As stated above in the Trove and Project access levels section, access to projects is split into four levels. It is not even possible for a user to gain read access to a project unless they are in the -readers group either directly or by virtue of being in one of the other groups for the project. The groups relevant to a project called foo would be foo-managers, foo-admins, foo-writers and foo-readers.

To examine a group's membership you can use the group command as follows:

$ trove-cmd group show GROUPNAME

This will not show the users who are members of the given group by virtue of being members of another group.

I would also describe what it does show.

To add a user to a group, you would use the following command:

$ trove-cmd group adduser GROUPNAME USERNAME

However, to remove a user from a group is a two step process. First run:

$ trove-cmd group deluser GROUPNAME USERNAME

Your Trove will then give you a confirmation code which must be added to the command line to make the group deluser work.

$ trove-cmd group deluser GROUPNAME USERNAME CONFIRMATIONCODE

The confirmation codes are 40 hex-digit numbers and are related to the state of the Trove's administration repository. If someone else performs an administration command in between your two commands then Trove will inform you and provide you with a new confirmation code. You should verify that your change will not conflict with what has happened and then try again.

Excellent stuff.
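
Since the codes are described as 40 hex-digit numbers, anyone scripting the second step could sanity-check what they have copied before re-running the command. A minimal sketch, in which is_confirmation_code is a hypothetical helper name and the sample value is made up:

```shell
#!/bin/sh
# Sketch only: succeeds when the argument is exactly 40 hexadecimal digits.
# It checks the shape of the code, not whether the Trove will accept it.
is_confirmation_code() {
    case $1 in
        *[!0-9A-Fa-f]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 40 ]
}

# Made-up value, for illustration only.
code=0123456789abcdef0123456789abcdef01234567
if is_confirmation_code "$code"; then
    echo "code has the expected shape"
fi
```

A code that passes this check may still be rejected if someone else has changed the administration repository in the meantime, as the manual explains.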

Creating and destroying repositories

Trove's Git service requires that repositories be created before they can be used. This is to ensure that typographical errors do not result in a proliferation of misspelled auto-vivified repositories.

Only the lorry user is permitted to create repositories outside of the local trove_id and the lorry user is controlled by lorry-controller on the Trove. The lorry-controller can be managed via the TROVE_ID/local-config/lorries.git repository. Within the local trove_id, only members of a -admins group may create repositories and only inside the projects they administer.

Continuing the example from above, if stevesmith wishes to create a repository in the new-proj project for the system morphologies, he might run a command along the lines of:

$ trove-cmd create trove_id/new-proj/morphs

As with removing users from groups, the destruction of repositories requires confirmation in the form of a confirmation token. The tokens are acquired by attempting to destroy a repository...

$ trove-cmd destroy trove_id/new-proj/obsolete-stuff

...and once acquired, can be used to complete the destruction process...

$ trove-cmd destroy trove_id/new-proj/obsolete-stuff CONFIRMATIONTOKEN

When a repository is destroyed, it is moved to a graveyard. Currently only someone with filesystem access to the Trove instance can restore a repository from the graveyard. If you require a repository restoring on your Trove then you will need the information provided by the destroy command when it completes the destruction of the repository.

Final paragraph:

This is the first use of the term "filesystem access". Its meaning is obvious but what is less obvious is who might have such access or how it can be obtained. You should indicate one or the other (or both).

You say that "you will need ..."; does this mean that you must record the information at the time of deletion? Can it be subsequently recovered? If the former, you really need to emphasise it more. Someone deleting a repo might well be convinced at the time that it is absolutely necessary and only discover later that they were wrong, by which time the information has already been lost.

Apart from that, all of the above seems excellent. Good, solid content.

Command reference

I think you should have a short section here showing the general form of usage of the following commands. e.g.

trove-cmd <command> <options>

and possibly reiterate the definition of trove-cmd.
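
For instance, the definition could be restated as a fragment for the user's shell start-up file, along these lines. Both expansions are the ones given earlier in the manual, and local-trove-address is the manual's own placeholder, not a real host name.

```shell
# Sketch of a start-up file fragment (e.g. ~/.bashrc). Use ONE of the two.
# Replace local-trove-address with the address of your Trove.
alias trove-cmd='ssh git@local-trove-address'
# alias trove-cmd='morph trovectl'
```

Note that most shells only expand aliases in interactive sessions, so a script would need a function or a variable instead.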

The `as` command

usage: as <user> <cmdline>...

Runs the given command line as the given user. The only limitation is that you are not permitted to run 'as' as someone else.

Is anyone allowed to use this command? If not, what restrictions are there?

In fact, reading on, I think you should make it clear for each command what membership level is needed to run it. A few already have this information but most don't so I will not flag each one up that needs it adding.

The `config` command

usage: config <reponame> <cmd> [args...]

View and manipulate the configuration of a repository.

config <reponame> show [<filter>...] List all configuration variables which match any of the filters provided. The filters are prefixes which are matched against the keys of the configuration variables.

For example: config sampler list project will list all the project configuration entries for the sampler.git repository.

Keys which represent lists are shown as foo.* If you wish to show the detailed key, showing the index of the entry in the list then you should set the filter exactly to foo.* which will cause the show command to expand list keys into the form foo.i_N where N is the index in the list.

It is not clear to me how this relates to the config file as described earlier.

Could the earlier example perhaps be brought in line with the content of this section or vice versa?

config <reponame> set key value Set the given configuration key to the given value. If the key ends in .* then the system will add the given value to the end of the list represented by the key. To replace a specific entry, set the specific i_N entry to the value you want to replace it.

I don't understand this. Could you clarify, perhaps with an example?
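
My best guess at the kind of example that would help is sketched below, for the author to confirm or correct. The repository name sampler and the key foo are taken from the manual's own text; the values, the function name config_list_demo and the TROVE_CMD variable are made up. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch only. TROVE_CMD is an assumed variable; the default echoes the
# commands instead of sending them to a Trove.
TROVE_CMD=${TROVE_CMD:-echo trove-cmd}

config_list_demo() {
    # Append two values to the list-valued key "foo". The quotes stop the
    # local shell from treating .* as a filename pattern.
    $TROVE_CMD config sampler set 'foo.*' first
    $TROVE_CMD config sampler set 'foo.*' second

    # Expand the list so each entry is shown with its index (foo.i_N).
    $TROVE_CMD config sampler show 'foo.*'

    # Replace one specific entry, using an index reported by show.
    $TROVE_CMD config sampler set foo.i_1 replacement
}

config_list_demo
```

Whether the index N starts at 0 or 1 is not stated in the manual, which is one more reason to show real output there.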

config <reponame> {del,delete,rm} key Removes the given key from the configuration set. If the key ends in .* then the system will remove all configuration values below that prefix. To remove a specific element of a list, instead, be sure to delete the i_N entry instead.

The command syntax with the brackets is not defined anywhere in this document.

The term "configuration set" has not been previously used or defined.

By "below", do you mean "with"?

What does "i_N" mean?

The `count-objects` command

usage: count-objects repo [options]

Counts objects in your repository.

You must have read access to the repository in order to run count-objects.

The options are passed to git count-objects and the most commonly useful option is -v to increase the verbosity of the count.

The `create` command

usage: create <reponame> [<owner>]

Create a new repository, optionally setting its owner directly.

In order to create a repository, the site administrators must grant you the ability in some part of the namespace. Specifying an owner is equivalent to creating the repository and then calling set-owner to re-assign it.

Wrong tense: use "have granted", not "grant".

I presume that by "ability" you mean "access rights"? If so, please use the same terminology used everywhere else in this document.

I'd add "(q.v.)" after "set-owner" and I'd probably either back-tick it or hotlink it to the command description (can you do that in Wiki?).

The `destroy` command

usage: destroy <repo> [confirmtoken]

This command destroys a repository. Run without a confirmation token it will tell the caller what the confirmation token is for that repository. The caller will then run the destroy command again with the confirmation token if they really do wish to destroy the repository.

I'd use "The caller must" rather than "The caller will". It better conveys the essential nature of the second step.

The `gc` command

usage: gc repo [options]

Invoke git gc, passing the given options, on the given repository. You must have basic write access to the repository in order to invoke a gc.

The options will be passed to git gc and the most commonly used options are --auto (to only gc if Git thinks it worthwhile) and --aggressive to cause Git to try much harder to gc the repository.

It might be nice to give a brief idea of what the "git gc" command does.

e.g. add a clause such as "to perform standard housekeeping on the repo".

It costs nothing but a couple of dozen bytes of storage and may be of help to those still feeling their way around the system.

The `group` command

usage: group [list]
       group show <groupname>
       group add <groupname> <description>
       group del <groupname> [confirm token]
       group description <groupname> <description>
       group adduser <groupname> <username>
       group deluser <groupname> <username> [confirm token]
       group addgroup <groupname> <groupname>
       group delgroup <groupname> <groupname> [confirm token]

The syntax that uses "[]" and "<>" is not defined anywhere in this document.

The use of "<>" is commonly known (it should still be defined though) and, as such, could probably be used in the earlier examples where CAPITAL_LETTERS were used to indicate user supplied variable text.

I'd also suggest using "<trove-id>" or similar rather than "trove-id".

With no subcommand, or the subcommand list, the group command will show a list of all the groups, along with their descriptions.

Showing a group will display membership information

What membership information? Describe it or give an example.

Adding a user to a group adds the user to the direct membership list. Removing a user from a group removes them from the direct membership list only.

If this is saying that it does not affect their inherited membership, you should spell that out unambiguously, not leave it to the user to deduce it.

If you add a group to a group, you are stating that everyone in the sub group is to be considered a member of this group also. Removing a group undoes this effect.

Please unambiguously identify which group you mean.

If I have interpreted the syntax correctly, I suggest:

If you add a group to a group, you are stating that everyone in the sub group is to be considered a member of the primary group also.

e.g. the command group addgroup <group1> <group2>, is stating that everyone in <group2> is to be considered a member of <group1> and have all the access rights of a member of <group1>.

Removing a group undoes this effect.

To delete a group, remove a user from a group or remove a group from a group requires a confirmation token which will be supplied to you if missing.

This is inadequate; give each sub-command a full section of its own.

The `help` command

usage: help [admin|all|commandname]

Without the command argument, lists all visible commands. With the command argument, provides detailed help about the given command.

If the command argument is specifically admin then list the admin commands instead of the normal commands. If it is all then list all the commands, even the hidden commands.

This is the first (and only) mention of "visible" and "hidden" commands. This needs explanation, here or elsewhere.

The `ls` command

usage: ls [--verbose|-v] [<pattern>...]

List repositories on the server. If you do not provide a pattern then all repositories are considered, otherwise only ones which match the given patterns will be considered. If you specify --verbose then the HEAD ref name and the description will be provided in addition to the access rights and repository name, separated by tabs.

Patterns are a type of extended glob style syntax:

  ? == any one character except /
  * == zero or more characters except /
 ** == zero or more characters including /

Any other characters are "as-is" except \ which escapes the next character.

If your pattern contains no / and no ** then it will be matched against leafnames of repositories, no matter the depth of the filesystem tree.

Note, this means that if you run ls foo then the server is going to look for repositories called foo.git rather than look in side a folder called foo/. For the latter, do ls foo/ instead.

Good stuff. Except for the random space that has crept into "inside" in the penultimate sentence.
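
A short worked example of the matching rules would make this section even stronger. The sketch below is my own reading of the three wildcards and of the leafname rule. It is not Trove's implementation: glob_to_regex and ls_matches are hypothetical helpers, the repository names are examples, and the handling of the .git suffix is left out altogether.

```shell
#!/bin/sh
# Sketch only: mimics the pattern rules quoted above using grep -E.

# Translate an ls-style pattern into an anchored extended regular expression.
glob_to_regex() {
    pat=$1
    re=''
    while [ -n "$pat" ]; do
        case $pat in
            '**'*) re="$re.*";    pat=${pat#??}; continue ;;  # ** includes /
            '*'*)  re="$re[^/]*"; pat=${pat#?};  continue ;;  # *  excludes /
            '?'*)  re="$re[^/]";  pat=${pat#?};  continue ;;  # ?  one char, not /
            '\'?*) pat=${pat#?} ;;   # drop the backslash, keep what follows
        esac
        rest=${pat#?}
        c=${pat%"$rest"}             # first remaining character, taken as-is
        case $c in
            '.'|'+'|'('|')'|'['|']'|'{'|'}'|'|'|'^'|'$'|'\'|'*'|'?')
                re="$re\\$c" ;;      # quote characters special to grep -E
            *)  re="$re$c" ;;
        esac
        pat=$rest
    done
    printf '^%s$\n' "$re"
}

# ls_matches PATTERN REPO: patterns with no / and no ** see only the leafname.
ls_matches() {
    case $1 in
        */*|*'**'*) subject=$2 ;;
        *)          subject=${2##*/} ;;
    esac
    printf '%s\n' "$subject" | grep -Eq "$(glob_to_regex "$1")"
}

# List the example repositories that match delta/*
for repo in baserock/morphs delta/linux delta/busybox; do
    if ls_matches 'delta/*' "$repo"; then
        echo "$repo"
    fi
done
```

Under this reading, the pattern morphs matches baserock/morphs through its leafname, whereas baserock/* does not reach into deeper directories.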

The `readme` command

usage: readme <reponame>

Shows you the readme for the given repository. You must have read access to the repository in order to see it. The readme is typically provided in the README.mdwn file in the administration ref.

What is "the administration ref."?

The `rename` command

usage: rename <repo> <newname>

Renames a repository to the given new name. In order to do this, you must have the ability to create repositories at the new name, the ability to read the current repository and the ability to rename the current repository.

"at the new name" is not good English. I suggest using at <newname> instead, or "at the location specified by the given new name".

The `set-description` command

usage: set-description <repo> Description text

Sets the short description of the repository to the given text. This text is used in the web-based display tool or shown in ls --verbose output.

The `set-head` command

usage: set-head <repo> <ref>

Sets the HEAD of the repository to the given ref. You may need to be the owner of the repository or the project administrator to use this command.

The `set-owner` command

usage: set-owner <reponame> <owner>

Set the owner of a repository. Who is allowed to do this is configured by the site administrators. Typically site admins and repository owners are the only people allowed to change the ownership of a repository.

Why "Typically"? Who else might be able to do it and how and why?

The `sshkey` command

usage: sshkey [list]
       sshkey add <tag>
       sshkey del <tag>

sshkey list

With the list subcommand (or no subcommand), sshkey will list the ssh keys you have, similarly to the whoami command.

sshkey add <tag>

Adds an ssh key with the given tag. The content of the key must be supplied on a single line on stdin.

cat id_rsa.pub | ssh gitano@somehost sshkey add personal-laptop

sshkey del <tag>

Removes an ssh key with the given tag. If the current access is via that ssh key then you cannot remove it. Add a new key and switch to using that one first.

This all seems good except that "gitano@..." has suddenly appeared without prior introduction. According to Wikipedia it means a Spanish gypsy.

The `user` command

usage: user [list]
       user add <username> <email> <real name>
       user del <username> [confirm token]
       user email <username> <email>
       user name <user> <real name>

With no subcommand, or the subcommand list the user command will show a list of all the users, along with their email addresses and real names.

With the add subcommand, you can add a new user to the system.
With the del subcommand, you can delete a user from the system.
With the email subcommand, you can change a user's email address.
With the name subcommand, you can change a user's real name.

If you try and delete a user, you will need to paste a confirmation token which will be supplied if you try and delete the user without it. That token is reliant on the state of the admin repository. Any admin operations performed between the two delete attempts will invalidate the token and you will have to retry.

All good, although I feel the description of using tokens could be improved.

The `whoami` command

usage: whoami

Tells you who you are, what your email address is set to, what keys you have registered etc.

Good, although the description is a bit terse. Perhaps a sample of output would help?