I have been using Stadia for a number of months and am thoroughly enjoying it. Yes, I have had one or two days where it gets a bit laggy, but overall it has been a positive experience, and the ease of access to games makes up for the occasional lag.

Because I am using even more streaming services, I thought it was worth upgrading my home network. I ended up choosing the Ubiquiti EdgeRouter X and an access point. This is more of a prosumer setup, but I have always wanted to delve into the networking side of things. …

Photo by Nina Mercado on Unsplash

The Service MOT takes its inspiration from the UK's MOT, a yearly test that checks motor vehicles are roadworthy. Rather than checking a car, we check our services. By doing MOTs, you can be confident in your service and know it is ready to be worked on at any time.

Birth of the Service MOT

The team I am part of looks after a large number of applications. We often focus on adding features to one application at a time, which leaves the other applications with little attention, and they become stale. …

Improve the security of your Express app today

Photo by Irvan Smith on Unsplash

My team has recently started implementing CSP on our website. As we started building out the configuration, we realised that we were testing things manually and our feedback loop was not as short as we would have liked. We decided to create some tests so we wouldn’t have to retest all the different pages after every change.

This story walks through some of the key parts of how we tested our CSP header using an example application which can be found here.
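Whatever framework serves the page, a test of this kind comes down to parsing the `Content-Security-Policy` header and asserting on individual directives. Below is a minimal sketch in Python. It assumes the header value has already been fetched from each page under test. `parse_csp`, `sources_for` and the example policy are hypothetical names for illustration, not the code from the example application. The fallback to `default-src` follows the CSP rule for simple fetch directives such as `script-src` and `img-src`.

```python
from typing import Optional


def parse_csp(header: str) -> dict[str, list[str]]:
    """Parse a Content-Security-Policy header value into directive -> sources."""
    policy: dict[str, list[str]] = {}
    for part in header.split(";"):
        tokens = part.strip().split()
        if not tokens:
            continue  # skip empty segments, e.g. from a trailing ";"
        name, *sources = tokens
        # Directive names are case-insensitive; if a directive is repeated,
        # only its first occurrence takes effect, so keep that one.
        policy.setdefault(name.lower(), sources)
    return policy


def sources_for(policy: dict[str, list[str]], directive: str) -> Optional[list[str]]:
    """Sources for a fetch directive, falling back to default-src.

    None means the policy does not restrict this directive at all.
    """
    return policy.get(directive, policy.get("default-src"))


# Example: assertions a page-level test might make against its CSP header.
header = "default-src 'self'; script-src 'self' https://cdn.example.com; img-src *;"
policy = parse_csp(header)
assert "https://cdn.example.com" in sources_for(policy, "script-src")
assert "'unsafe-inline'" not in sources_for(policy, "script-src")
assert sources_for(policy, "style-src") == ["'self'"]  # inherited from default-src
```

Running assertions like these for every page after each configuration change gives the short feedback loop described above, without clicking through the site by hand.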

Strategies against failure in distributed systems

Photo by Mitchell Griest on Unsplash.

It is inevitable that something will fail in a distributed system, and we should plan as if it is a normal occurrence. One solution to this problem is to run multiple instances of a service. That way, if one fails, the others can take over.

In this article, we will explore some of the different ways we can achieve this on Kubernetes (K8s).


Redundancy has a cost, and we should consider this when deciding how much resiliency we need. …
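A common back-of-the-envelope model can put a number on that trade-off. It assumes each replica is up with probability `p`, independently of the others. The service is then available if at least one replica is up, which gives 1 - (1 - p)^n for n replicas. Independence is a strong simplifying assumption, because replicas that share a node or zone can fail together. The function name `availability` is illustrative.

```python
def availability(p: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent instances is up,
    given each one is up with probability `p` (a simplifying assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only if every replica is down at the same time.
    return 1.0 - (1.0 - p) ** replicas


# Each extra replica multiplies the chance of a total outage by (1 - p),
# while the cost of running the replicas grows linearly.
for n in range(1, 5):
    print(f"{n} replica(s): {availability(0.99, n):.6%} available")
```

With p = 0.99, going from one replica to two moves the estimate from 99% to 99.99%. Each further replica buys far less in absolute terms while adding the same cost again.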

As easy as it is to change configuration in a user interface, maintaining it over a long time becomes a hassle. This is especially true if you are looking after configuration for multiple environments and applications. In this article, we look at a new Terraform provider for AppDynamics that gets around this issue by using configuration as code.

Why Configuration as Code

The ‘as code’ practice has been talked about a lot recently. You may recognise it from infrastructure as code, which is often used to build applications on AWS. There are various tools for this, such as Terraform, CDK, CloudFormation and Puppet, to name…

TL;DR: Yes, if everything is set up correctly. Keep reading to find out if you have

Image credit: Author

We all strive to build resilient, self-healing applications, but occasionally we make a mistake and have to restart one. Hopefully, we will have time to fix the underlying problem, but until then, we may need manual intervention. In this article, we look at what happens when we delete a Kubernetes (K8s) pod while it is serving live traffic. We can then apply this knowledge to our operations so we don't affect our customers' experience.

Pod Lifecycle

First, let's understand what actually happens when a pod is deleted.

Kubernetes sends two signals to the process in a container when it is deleted. The initial…
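In Kubernetes' documented termination flow, the container's main process first receives `SIGTERM`. If it is still running after the grace period (`terminationGracePeriodSeconds`, 30 seconds by default), it receives `SIGKILL`. Only the first of these can be handled by the application. The minimal Python sketch below shows one way a process might react to `SIGTERM`: it sets a flag so the main loop stops taking new work and can drain in-flight requests. `serve_forever` and the flag name are hypothetical, and the sketch assumes the handler is installed from the main thread, as Python requires.

```python
import signal
import threading

# Set when SIGTERM arrives; request-handling code can check it to stop
# accepting new work while letting in-flight requests finish.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    # SIGKILL cannot be caught, so SIGTERM is the only chance to clean up.
    shutting_down.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def serve_forever(poll_seconds: float = 0.1) -> None:
    """Hypothetical main loop: keep working until SIGTERM is received."""
    while not shutting_down.is_set():
        # ... accept and handle requests here ...
        shutting_down.wait(poll_seconds)
    # ... drain in-flight requests and close connections before exiting ...
```

If draining takes longer than the grace period, the process is killed anyway, so any clean-up has to fit inside that window.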

Learn how Node scales with CPU

Photo by Edward Howell on Unsplash.

I have heard many people say we should scale applications horizontally rather than vertically, but is this actually the best way to scale? In this article, we will explore how Node.js scales with CPU and see if there is anything else we need to take into account if we do so.

Test Infrastructure

To test Node.js, a demo application was created with endpoints that could be used to simulate load. The application is Dockerised and available on Docker Hub. The source code can be found on GitHub.

The application was deployed on AWS ECS with different CPU limits…

Live issues are a great opportunity to learn and improve. Here’s what happened to us

Photo by Fleur on Unsplash.

In this article, we will explore a case where one of our services scaled to its maximum and how we changed our alerting to stop this from becoming an issue in the future.

Our Infrastructure

The service we are using as an example in this article is deployed on Kubernetes (K8s) with autoscaling enabled. We scale based on requests per second (RPS), and K8s is configured to keep RPS at 50. There is a slight delay before the service scales, as RPS is averaged over one minute. For more information on K8s scaling, check out its documentation.
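The Kubernetes Horizontal Pod Autoscaler documentation gives the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The autoscaler skips the change when that ratio is within a tolerance, 0.1 by default. Below is a minimal Python sketch of the rule, clamped to a minimum and maximum replica count. It omits details such as stabilisation windows and pods that are not yet ready. The `capped` flag is a hypothetical signal for "the service wants more replicas than it is allowed". It is not necessarily the alert the team introduced.

```python
import math
from dataclasses import dataclass


@dataclass
class ScaleDecision:
    desired: int   # replica count after clamping to [min_replicas, max_replicas]
    capped: bool   # True if the unclamped recommendation exceeded max_replicas


def desired_replicas(current_replicas: int, current_rps: float, target_rps: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> ScaleDecision:
    """Sketch of the HPA rule; current_rps is the average RPS per pod."""
    ratio = current_rps / target_rps
    if abs(ratio - 1.0) <= tolerance:
        raw = current_replicas  # close enough to the target: no change
    else:
        raw = math.ceil(current_replicas * ratio)
    desired = max(min_replicas, min(raw, max_replicas))
    return ScaleDecision(desired=desired, capped=raw > max_replicas)


# 4 pods averaging 80 RPS against a target of 50 -> ceil(4 * 1.6) = 7 pods.
print(desired_replicas(4, 80, 50, min_replicas=2, max_replicas=10))
```

Once `capped` is true, the extra load stays on the existing pods, and per-pod RPS keeps climbing above the target instead of being absorbed by new replicas.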


Photo by Hal Gatewood on Unsplash

Now that shops are limiting the number of people inside, shopping takes a lot longer than it used to. Most times I visit a store, I have to queue outside for an extended period, probably longer than I spend inside. This article explores why this is the case through the eyes of a software engineer.

The Problem

Putting a limit on the number of people in a store is like having a thread pool for a server. Most new servers are moving to asynchronous processing rather than thread pools (e.g. Nginx and Node.js). …
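A toy model makes the analogy concrete. It rests on simplifying assumptions: everyone arrives at once, the store (or thread pool) admits at most `capacity` people, and each person spends the same `service_time` inside. The function name and the numbers are illustrative only.

```python
def queue_waits(arrivals: int, capacity: int, service_time: float) -> list[float]:
    """Time each arrival spends queuing outside a store (or waiting for a
    free thread) when all arrive together and each visit takes service_time."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    # Person i gets in once the (i // capacity) full groups ahead have left.
    return [(i // capacity) * service_time for i in range(arrivals)]


waits = queue_waits(arrivals=20, capacity=5, service_time=15.0)
print(f"longest wait outside: {max(waits)} minutes")
```

Here the last person queues for 45 minutes to spend 15 inside. The fixed limit, not the time spent shopping, dominates their total time, just as a saturated thread pool makes requests wait far longer than they take to process.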

Applied to Software

Photo by Vladyslav Cherkasenko on Unsplash

This story is by no means trying to make light of the Chernobyl incident. It tries to show that observability and resiliency are problems that have been around for a long time, and that we can learn from each other to make our systems better. I wrote this after watching the HBO TV series; if you haven't watched it, add it to your watch list, as it is a really interesting and eye-opening show.

Below are some key points which I connected with when watching the series in relation to some of the problems we have seen with our observability…

Harry Martland

Senior Software Engineer and Observability Guild Lead at Booking.com — Transport, writing mainly about observability and micro front ends.
