The main building blocks of the API are put together this way:
```mermaid
graph TD
  api(FastAPI)
  pubsub(Pub/Sub) --- api
  redis(Redis) --- pubsub
  models(Pydantic) --- api
  db(MongoDB) --- models
  storage(Storage)
  api --> client
  storage --> client
```
The client component can be a number of things. Each pipeline step is a client, and there can also be extra standalone ones. They interact with the API and the storage services separately. By default, storage is provided via SSH for uploads and HTTP for downloads in standalone deployments using docker-compose. Production deployments may use alternative solutions, as long as all the binaries are accessible via direct HTTP URLs.
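Since every binary must be reachable via a direct HTTP URL, any client can fetch artifacts with plain HTTP. A minimal sketch, assuming a hypothetical base URL and `<node>/<file>` path layout (both are illustrative, not the actual KernelCI storage convention):

```python
# Sketch of a client locating a build artifact on storage over plain HTTP.
# The base URL and path layout are assumptions for illustration only.
from urllib.parse import urljoin

def artifact_url(storage_base, node_id, filename):
    """Build the direct HTTP URL for an artifact."""
    return urljoin(storage_base, f"{node_id}/{filename}")

# A client would then download it with any HTTP library, e.g.:
#   urllib.request.urlretrieve(url, "linux.tar.gz")
url = artifact_url("http://storage.example.com/", "64f1c0", "linux.tar.gz")
```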
The new KernelCI API is a complete redesign to replace the legacy backend used in production, which has several limitations. In particular, the previous backend API:
To overcome these limitations, the new API has the following characteristics:
A few things aren’t changing:
Here’s a simplified view of the current pipeline:
```mermaid
graph LR
  trigger(Trigger) --> new{New <br /> kernel?}
  new --> |yes| tarball(Tarball)
  new -.-> |no| trigger
  tarball --> |event: <br /> new revision| scheduler(Test <br /> Scheduler)
  scheduler --> |event: <br /> test running| runtime(Runtime <br /> Environment)
  runtime --> |event: <br /> test complete| email(Email <br /> Report)
```
Each step is a client for the API and related storage solutions. They are all implemented in Python, rely on the KernelCI Core tools, and each runs in its own docker-compose container. Here’s a summary of each step:
The Trigger step periodically checks whether a new kernel revision has appeared on a number of git branches. It first gets the revision checksum (SHA-1) from the top of the branch in the remote git repo and then queries the API to check whether it has already been covered. If there’s no revision entry with this checksum in the API, it pushes a new checkout node with the “running” state.
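The trigger logic can be sketched as a small function, with the git and API interactions replaced by injected callables (all names here are illustrative, not the actual KernelCI Core API):

```python
# Minimal sketch of the trigger loop body; the git and API calls are
# injected as callables, and all names are illustrative.

def trigger(get_remote_sha, revision_covered, push_node, tree, branch):
    """Push a new 'running' checkout node if the branch head is unseen."""
    sha = get_remote_sha(tree, branch)      # e.g. via `git ls-remote`
    if revision_covered(sha):               # API query: already covered?
        return None
    node = {
        "kind": "checkout",
        "data": {"kernel_revision": sha},
        "state": "running",
    }
    push_node(node)                         # POST the new node to the API
    return node
```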
The Tarball step listens for pub/sub events about new revisions or checkout nodes. When one is received, typically because the trigger pushed a new one to the API, it updates a local git checkout of the full kernel source tree. Then it makes a tarball with the source code and pushes it to the storage. Finally, it updates the checkout node via the API, adding the URL of the tarball to the list of artifacts and setting the state to “available”.
The Scheduler step listens for a variety of pub/sub events and looks for matches in the scheduler YAML configuration. Every time an event matches an entry in the config, it sends a new node with the “running” state to the API and submits a job in the designated runtime environment.
Note: It’s then up to the job itself to send the results to the API and any files to storage. The scheduler doesn’t track the status of submitted jobs; however, the system’s timeout feature will update the state of nodes that reach their timeout value.
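The matching step can be sketched against a simplified, in-memory stand-in for the scheduler configuration (the schema below is hypothetical; the real YAML config differs):

```python
# Sketch of the scheduler's matching step against a simplified, in-memory
# stand-in for the scheduler YAML configuration (hypothetical schema).

SCHEDULER_CONFIG = [
    {"event": "checkout", "state": "available",
     "job": "kbuild-gcc-10-x86", "runtime": "k8s"},
    {"event": "kbuild", "state": "done",
     "job": "kunit-x86_64", "runtime": "shell"},
]

def match_jobs(event, config=SCHEDULER_CONFIG):
    """Return the config entries whose conditions match the event."""
    return [entry for entry in config
            if entry["event"] == event.get("kind")
            and entry["state"] == event.get("state")]
```

For each returned entry, the scheduler would then create a “running” node via the API and submit the job to the entry’s runtime.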
Jobs are all the concrete parts of the actual pipeline payload: preparing some source files, running static checks, building the kernel, running functional tests on real hardware, checking regressions, post-processing the test results etc. They get run in Runtime environments.
Runtime environments can be anything able to run a job. Currently, this can be a local shell, a local Docker container, a Kubernetes cluster or a LAVA lab. Additional Runtime implementations could include on-demand VMs in GCE, LabGrid and maybe SSH. It’s of course possible to create custom ones for particular use-cases.
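One way to picture the runtime abstraction is a small class hierarchy with one `submit()` implementation per environment. The sketch below only covers the local shell case, and all names are illustrative rather than the actual KernelCI Core API:

```python
# Illustrative runtime abstraction: one class per environment, each with
# its own submit() method. Only the local shell case is implemented here.
import subprocess

class Runtime:
    """Base class: anything that can run a job."""
    def submit(self, job):
        raise NotImplementedError

class ShellRuntime(Runtime):
    """Run the job's command in a local shell and return its exit code."""
    def submit(self, job):
        return subprocess.run(job["command"], shell=True).returncode
```

A Kubernetes or LAVA implementation would override `submit()` to create a cluster job or submit a LAVA job definition instead.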
The current implementation for email reports is very minimalist. This pipeline step just listens for events about kernel revisions being completed, then sends an email to a fixed recipient with the top-level results and details about any test failures. Here’s an example (you’ll notice a few inconsistencies, work in progress…):
```
[STAGING] mainline/master v6.5-11329-g708283abf896: 4 runs 0 failures

Summary
=======

Tree:     mainline
Branch:   master
Describe: v6.5-11329-g708283abf896
URL:      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
SHA1:     708283abf896dd4853e673cc8cba70acaf9bf4ea

Name              | Result   | Total    | Failures
------------------+----------+----------+---------
kver              | pass     | 0        | 0
kbuild-gcc-10-x86 | pass     | 5        | 0
kunit             | None     | 0        | 0
kunit-x86_64      | pass     | 104      | 2

Failing tests
=============

kunit-x86_64
------------
* exec.example.example_skip_test # SKIP this test should be skipped
* exec.example.example_mark_skipped_test # SKIP this test should be skipped
```