An attempt at defining an ideal pipeline
The Continuous Delivery Foundation is currently looking to build out a reference architecture, which I think is a fantastic idea. While there are a bunch of social things that need to be figured out to really "get" CI/CD, the Best Practices SIG is working to get those well documented. I thought it might be helpful for me to document what my ideal pipeline is.
When a developer submits their pull request, automatic validation begins. We validate:
- The user can make the change (DCO check, if relevant).
- The user has a strong identity (GPG-signed or Gitsign-signed commit).
- The artifact (e.g. container image) is built and non-network tests are run against it.
- The artifact is cryptographically signed.
- Any test metrics (coverage, etc) are sent to the appropriate services.
- The artifact is pushed into storage.
- An ephemeral environment is stood up.
- The ephemeral environment passes some simple "health checks".
- A small suite of network tests is run (by which I mean tests that make network calls to mocked backends).
- The artifact undergoes validation that there aren't known security issues (known CVEs in dependencies, committed secrets, etc).
- All of the above steps are written into a datastore (e.g. rekor) with signed attestations that we can later validate (see the sketch after this list for what recording one such attestation might look like).
- Someone has agreed that this code change is a good idea.
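To make the attestation idea concrete, here is a minimal Python sketch (assuming the `cryptography` package) that hashes a built artifact, wraps the results of the checks in an in-toto-style statement, and signs it. In a real pipeline you would lean on tooling like cosign and rekor rather than hand-rolling this; the predicate type, field contents, and key handling here are illustrative assumptions.

```python
# Sketch: hash a built artifact, wrap the results of the pipeline checks
# in an in-toto-style statement, and sign it. Illustrative only; a real
# pipeline would use cosign/rekor tooling and a managed signing key.
import base64
import hashlib
import json

from cryptography.hazmat.primitives.asymmetric import ed25519

def make_attestation(artifact_path: str, checks: dict) -> dict:
    with open(artifact_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    statement = {
        "_type": "https://in-toto.io/Statement/v0.1",
        "subject": [{"name": artifact_path, "digest": {"sha256": digest}}],
        # Hypothetical predicate type describing which pipeline checks ran.
        "predicateType": "https://example.com/pipeline-checks/v1",
        "predicate": checks,
    }
    payload = json.dumps(statement, sort_keys=True).encode()
    key = ed25519.Ed25519PrivateKey.generate()  # stand-in for a real managed key
    return {
        "payload": base64.b64encode(payload).decode(),
        "signature": base64.b64encode(key.sign(payload)).decode(),
    }

print(make_attestation("app.tar.gz", {"dco": True, "unit_tests": "pass", "cve_scan": "clean"}))
```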
At this point, we should have a pretty well-tested system with cryptographic assurances that the relevant steps were run. When that code is merged, we will ideally reuse the work that was already done, in the case of fast-forward commits (if you use those).
From here, there is no human involvement.
As we deploy to each environment, we consult a policy engine (like OpenPolicyAgent) at various points to ensure that all of the correct steps have been followed. This uses the signed attestations from earlier, so we can be confident those steps actually ran.
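As a sketch of what that consultation might look like, the snippet below asks an OPA server for a decision through its standard Data API (`POST /v1/data/<policy path>`). The policy path, port, and input fields are assumptions for illustration.

```python
# Sketch: gate a deploy step on an OpenPolicyAgent decision.
# Assumes an OPA server running locally with a policy at deploy/allow;
# the input fields shown are illustrative placeholders.
import json
import urllib.request

def deploy_allowed(artifact_digest: str, environment: str) -> bool:
    body = json.dumps({
        "input": {
            "artifact": {"sha256": artifact_digest},
            "environment": environment,
            # In the real pipeline these would be the verified attestations.
            "attestations": ["built", "signed", "tests-passed"],
        }
    }).encode()
    req = urllib.request.Request(
        "http://localhost:8181/v1/data/deploy/allow",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # OPA returns {"result": <policy value>}; a missing result means undefined (deny).
    return result.get("result", False) is True

print(deploy_allowed("deadbeef", "staging"))
```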
If this is a non-fast-forward commit, we should re-run all of the PR checks. The "ephemeral environment" is instead replaced by a stable environment like "dev" or "staging".
After we have deployed to this environment, we should have a few synthetic tests which run on a continual basis (every 1-5 minutes). We should validate that our metrics reflect that the synthetic is running happily. In some services where performance is critical, we may also run a simulated load test, either through traffic replay or using synthetic traffic, depending on the service.
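A synthetic can be very small. Here is an illustrative Python sketch of one that probes a health endpoint every minute and emits a pass/fail signal; the endpoint and the metric name are placeholders, and in practice this would run from your monitoring or synthetics platform rather than a bare loop.

```python
# Sketch: a trivial synthetic check that hits a health endpoint every
# minute and records a pass/fail result. Endpoint and metric name are
# illustrative; a real setup would emit to your monitoring stack.
import time
import urllib.error
import urllib.request

HEALTH_URL = "https://staging.example.com/healthz"  # assumed endpoint

def probe() -> bool:
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False

while True:
    ok = probe()
    # In a real synthetic this would be a metric emit, not a print.
    print(f"synthetic.healthz.ok={int(ok)}")
    time.sleep(60)
```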
For performance-critical systems, we may choose to do a scale test. This could take the form of running JMeter tests against a single box.
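As a rough idea of what a single-box scale test involves, here is a small Python sketch that fires a fixed number of concurrent requests at one endpoint and reports latency percentiles. The target URL and volumes are placeholders; a real run would use JMeter, k6, or similar.

```python
# Sketch: a tiny single-box load test that fires N requests at one
# endpoint and reports latency percentiles. URL and volumes are
# placeholders for whatever your real load-testing tool would use.
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET = "https://staging.example.com/api/ping"  # assumed endpoint
REQUESTS = 500
CONCURRENCY = 20

def timed_call(_: int) -> float:
    start = time.perf_counter()
    with urllib.request.urlopen(TARGET, timeout=10):
        pass
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = sorted(pool.map(timed_call, range(REQUESTS)))

print(f"p50={statistics.median(latencies):.3f}s "
      f"p95={latencies[int(len(latencies) * 0.95)]:.3f}s")
```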
The production deploy should be identical, except synthetics are the only networked tests we run. Both staging and production deploys are done through a progressive roll-out mechanism (e.g. 1 pod, then 2, 5, 20, 100, etc.; perhaps using percentages).
If at any time a step in the pipeline fails or system alarms go off, we stop the pipeline and do any relevant rollbacks. We do not roll back the commits, and instead rely on a developer to do that explicitly.
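Putting the last two paragraphs together, a progressive roll-out can be modeled as a loop that widens in steps, soaks, checks health, and backs out on the first failure. The sketch below is illustrative; the step sizes and the scale/health/rollback hooks are placeholders for whatever your deployment tooling provides.

```python
# Sketch: a progressive roll-out that widens in steps, soaks, checks
# health, and backs out on the first failed check or alarm. Step sizes
# and the scale_to/healthy/rollback hooks are illustrative placeholders.
import time
from typing import Callable

ROLLOUT_STEPS = [1, 2, 5, 20, 100]  # pod counts, or percentages
SOAK_SECONDS = 300                  # how long to watch each step

def progressive_rollout(
    scale_to: Callable[[int], None],
    healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    for step in ROLLOUT_STEPS:
        scale_to(step)
        time.sleep(SOAK_SECONDS)
        if not healthy():
            # Stop the pipeline and undo the deploy; the commit itself is
            # left alone for a developer to revert explicitly.
            rollback()
            return False
    return True
```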
Each one of these steps has a "break glass" feature which allows for a one-time override in case of emergency. A notification is sent to an audit log, security, and possibly up the reporting structure.
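A break-glass override might be nothing more than a recorded, single-use approval. The sketch below is one illustrative shape for it; the audit log location and the notification target are placeholder assumptions.

```python
# Sketch: a one-time "break glass" override that always carries a reason
# and is written to an audit log and flagged to security. The audit file
# and notification sink here are placeholder assumptions.
import datetime
import getpass
import json

def break_glass(step: str, reason: str) -> dict:
    event = {
        "step": step,
        "reason": reason,
        "user": getpass.getuser(),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "one_time": True,
    }
    with open("audit.log", "a") as audit:  # placeholder audit sink
        audit.write(json.dumps(event) + "\n")
    # Placeholder notification; a real pipeline would page security and
    # possibly the reporting chain.
    print(f"NOTIFY security: break-glass used on {step!r}: {reason}")
    return event
```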
The status of the relevant steps is communicated in a chat program (e.g. Slack). To prevent lots of spam, ideally this would be one primary message for the pull request and one for the deploy pipeline, with any updates threaded under them.
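With Slack specifically, this maps to one `chat.postMessage` call for the parent message and further calls that pass the parent's `ts` as `thread_ts`, so updates land in its thread. The token, channel, and message text below are placeholders.

```python
# Sketch: one parent Slack message per pipeline run, with step updates
# threaded under it via thread_ts. Token, channel, and text are placeholders.
import json
import os
import urllib.request
from typing import Optional

SLACK_URL = "https://slack.com/api/chat.postMessage"

def post(channel: str, text: str, thread_ts: Optional[str] = None) -> str:
    payload = {"channel": channel, "text": text}
    if thread_ts:
        payload["thread_ts"] = thread_ts
    req = urllib.request.Request(
        SLACK_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {os.environ['SLACK_BOT_TOKEN']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # The message timestamp doubles as the thread id for replies.
        return json.load(resp)["ts"]

# One parent message for the deploy, then threaded status updates.
parent = post("#deploys", "Deploy pipeline started for build 1234")
post("#deploys", "Policy checks passed", thread_ts=parent)
post("#deploys", "Roll-out step 1/5 healthy", thread_ts=parent)
```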
You will note that there are no traditional "end to end" tests in this pipeline. They tend to be slow and flaky. If possible, I prefer to use a mixture of component tests and synthetics to cover similar ground.
Thank you to Todd Baert and David Van Couvering for their review.