The Service MOT takes its influence from the UK's MOT, a yearly test on motor vehicles to check they are roadworthy. Rather than checking a car, we check our services. By doing MOTs you can be confident in your service and know it is ready to be worked on at any time.
Birth of the Service MOT
The team which I am part of looks after a large number of applications. Often we will be focusing on adding features to one application at a time. This results in the other applications not getting much attention and becoming stale. One of the side effects of this is that when we do need to make a change to an application we have not worked on in a while, it often takes us a long time, because things no longer work and/or do not follow standards.
We had been sharing our frustrations with our Product Owner (PO). Although we are encouraged to “Scout”, meaning fixing things we see as we go, this only fixes things in areas we are actively working on and does not address the bigger issues. We had previously tried having a “Tech Debt Register”, which kind of worked, but we had to sell each item on the register to our PO and schedule it into our Backlog.
The MOT was born out of these iterations of looking after tech debt for a large number of critical applications. It removes the burden of prioritising debt with the PO and lets the engineers be in control of what should be worked on.
How does it work?
A Service MOT should be done at regular intervals for all the services your team owns. You work through a list of tasks that check the service is up to standard, and complete any work needed to bring it up to standard.
The MOT should be time-boxed so that it does not spiral out of control; if you don’t complete all the tasks, that is ok. As you complete more MOTs you will get faster at fixing things and will also have less to fix.
You may own services with different development statuses. For example, you may be trying to get rid of a service, which may be known as deprecating or sunsetting it. In this case, you may skip certain tasks on that service as they would be a waste of time. In the base MOT, there is a colour-coded system to help identify which items should be completed for the different statuses.
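As a rough illustration of the idea, the status filtering could be sketched in Python like this. The status names and task names below are hypothetical and are not taken from the base MOT; its actual colour codes and checklist items may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical development statuses, for illustration only.
    ACTIVE = "active"
    MAINTENANCE = "maintenance"
    SUNSETTING = "sunsetting"


@dataclass(frozen=True)
class Task:
    name: str
    applies_to: frozenset  # statuses for which this task is worth doing


# Hypothetical checklist; a real one would come from your own MOT.
TASKS = [
    Task("Pipeline builds and deploys", frozenset(Status)),
    Task("Alerts reach the right channel", frozenset(Status)),
    Task("README is up to date", frozenset({Status.ACTIVE, Status.MAINTENANCE})),
    Task("Code follows current team standards", frozenset({Status.ACTIVE})),
]


def tasks_for(status: Status) -> list[str]:
    """Return the names of the tasks that apply to a service with this status."""
    return [task.name for task in TASKS if status in task.applies_to]


if __name__ == "__main__":
    for status in Status:
        print(status.value, "->", tasks_for(status))
```

With this made-up checklist, a sunsetting service only gets the two tasks that keep it running safely, while an active service gets all four.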
The list of tasks may be company-specific and tailored to how you build and deploy your applications. A base MOT can be found at ServiceMOT.GitHub.io, which is a great way of getting started. Once you have got used to completing MOTs and see a need for some business-specific checks, feel free to fork the project. If you think there is something missing which would be useful for all applications, please open a pull request or raise an issue.
When we started doing MOTs we found ourselves getting distracted by things which were not 100% necessary, and this stopped us completing more important tasks. The Service MOT now contains an “Upgrades” section at the end. This contains the nice-to-have tasks and the tasks which will help the service further into the future. You can see this in how library upgrades are split out: in the main MOT you should only upgrade patch versions of dependencies. These upgrades should, in theory, need no code changes and provide bug and security fixes. Once all the tasks are complete, if you still have time, you can upgrade the major versions and deal with any code changes. It is important to note that if the version in use is no longer receiving security fixes, you should do a major upgrade as part of the MOT. The Upgrades section is in priority order and should be agreed with your team before starting MOTs.
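The patch-versus-major split can be expressed as a small decision rule. The sketch below is only an illustration: it assumes dependencies use plain `MAJOR.MINOR.PATCH` version strings, the function names are hypothetical, and treating minor upgrades like major ones is an assumption, since only patch and major versions are mentioned above.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def mot_section(current: str, candidate: str, current_is_supported: bool = True) -> str:
    """Decide where an upgrade belongs: 'main', 'upgrades' or 'none'.

    current_is_supported should be False when the version in use no longer
    receives security fixes.
    """
    cur, new = parse(current), parse(candidate)
    if new <= cur:
        return "none"  # nothing newer to move to
    if not current_is_supported:
        return "main"  # unsupported versions are upgraded as part of the MOT
    # Same major and minor means only the patch number changed.
    patch_only = cur[:2] == new[:2]
    # Assumption: minor bumps are deferred to the Upgrades section, like majors.
    return "main" if patch_only else "upgrades"


if __name__ == "__main__":
    print(mot_section("2.4.1", "2.4.7"))
    print(mot_section("2.4.1", "3.0.0"))
    print(mot_section("2.4.1", "3.0.0", current_is_supported=False))
```

The point of keeping the rule this small is that nobody has to debate during the time-box whether an upgrade is in scope.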
Selling to Your Team
In our team, the MOT was originally sold to our PO by asking for a couple of hours every month or two for each service. Unfortunately for our PO, I think at that moment they forgot how many services we own as this actually adds up to a large amount of time. We ended up starting with two hours every week and it has now turned into Monday mornings as the improvements are starting to show.
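To see how quickly a per-service request adds up, here is a back-of-the-envelope calculation. The figure of 20 services is purely hypothetical, as the real number is not given here; swap in your own.

```python
def monthly_hours(services: int, hours_per_mot: float, months_between_mots: float) -> float:
    """Average hours per month if every service gets its own MOT slot."""
    return services * hours_per_mot / months_between_mots


def weekly_slot_monthly_hours(hours_per_week: float) -> float:
    """Average hours per month for a fixed weekly slot (52 weeks over 12 months)."""
    return hours_per_week * 52 / 12


if __name__ == "__main__":
    # Hypothetical team: 20 services, a 2-hour MOT per service every 2 months.
    print(monthly_hours(services=20, hours_per_mot=2, months_between_mots=2))
    # A fixed slot of 2 hours every week, shared across all services.
    print(round(weekly_slot_monthly_hours(2), 1))
```

Under these made-up numbers the per-service request works out at 20 hours a month, while a shared two-hour weekly slot is under 9, which is why a fixed team-wide time-box can be an easier starting point.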
One of the main points of Service MOTs is to show that owning and running applications is not free. Whether it is patching security vulnerabilities, keeping up with standards or migrating infrastructure, there are always things moving. By providing a routine set of checks, we can evidence this to non-technical people, and by asking for a set amount of time we give them confidence this is not going to become a time sink.
Product Owners want their team to be predictable in terms of delivery, as this allows them to manage stakeholder expectations and plan work with other teams. When time is allocated to maintaining services, changes to a service which has not been worked on for a while are more likely to go smoothly.
No one can build a system which has 100% uptime. By spending time working on alerts and Observability, when live issues happen your service is more likely to self-heal and the root cause is easier to find. Teams are always learning about their applications and subsequently improving Observability. Service MOTs encourage this behaviour, making it less likely your team will have actions from Reason For Outage meetings (RFOs) or misconfigured alerts which take your team away from product delivery.
We have improved our delivery predictability and reliability in production by conducting regular checks on our applications called Service MOTs. Hopefully, this story gives you the information to encourage your team to start doing MOTs on your services.
An initial MOT for your team can be found at ServiceMOT.GitHub.io.