Resources and scaling
In Kubernetes, resources like CPU and memory are crucial for allocating workloads to Nodes. Pods declare resource requests and limits: the scheduler uses the requests to decide where to place them, while the limits cap what the containers can consume. Proper provisioning ensures efficient resource use and system stability.
Choosing the right resource requests for Kubernetes workloads is challenging: you need to balance application performance against efficient resource utilization, avoiding both over-provisioning and under-provisioning.
TIP
If you're not familiar with Kubernetes resources, check out the "How Pods with resource requests are scheduled" section of the official Kubernetes documentation.
Horizontal scaling is another critical aspect of managing workloads in Kubernetes: it lets you scale the number of Pods based on resource usage or other metrics. Smaller Pods are preferable, as they are easier to scale horizontally and allow better resource utilization on Nodes.
Resource allocation
Web server Pods
When building our PHP-FPM Docker image, we fixed the number of PHP-FPM workers to 10 (see this section).
This means that, in theory, each PHP-FPM worker can consume up to 1 CPU core, so the total CPU usage of all PHP-FPM workers in one Pod can reach 10 CPU cores.
In practice, however, PHP-FPM workers do not use CPU for the whole duration of a request: they spend part of it waiting for I/O operations, network requests, etc.
Depending on your workload, the time a worker actually spends on CPU typically ranges from 20% to 80% of the total time it takes to handle a request.
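A rough back-of-the-envelope estimate, assuming all 10 workers are busy at the same time and each spends a fraction f of its request time on CPU:

```latex
\text{CPU}_{\text{Pod}} \approx N_{\text{workers}} \times f = 10 \times f,
\qquad f \in [0.2,\ 0.8] \;\Rightarrow\; \text{CPU}_{\text{Pod}} \in [2,\ 8] \text{ cores}
```

The 2-core CPU request recommended below corresponds to the lower end of this range; Pods serving more CPU-heavy requests will use more than their request.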
Our general recommendation for the resources of the web server Pods' containers is:
Container | Resource | Request | Limit |
---|---|---|---|
PHP-FPM | CPU | 2 | 2 [1] |
PHP-FPM | Memory | 2Gi | 2Gi |
nginx | CPU | 0.2 | |
nginx | Memory | 256Mi | 256Mi |
[1] Setting CPU limits is often not recommended: it can cause CPU throttling and performance degradation, even when the Node has idle CPU. Such a limit is also of little use with properly configured horizontal autoscaling.
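As a sketch, the recommendations in the table above could translate into the following container resources in the web server Deployment's Pod template. The container names php-fpm and nginx are assumptions, and the images and other container fields are omitted.

```yaml
# Fragment of a Deployment Pod template (spec.template.spec).
# Container names are hypothetical; images and other fields omitted.
containers:
  - name: php-fpm
    resources:
      requests:
        cpu: "2"          # 2 CPU cores
        memory: 2Gi
      limits:
        cpu: "2"          # optional: CPU limits are often not recommended (see the footnote above)
        memory: 2Gi
  - name: nginx
    resources:
      requests:
        cpu: 200m         # 0.2 CPU core
        memory: 256Mi
      limits:
        memory: 256Mi     # memory limit only, no CPU limit
```

Dropping the PHP-FPM CPU limit from this fragment leaves the CPU request in place, which is what the scheduler and the HorizontalPodAutoscaler rely on.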
IMPORTANT
Requesting the proper amount of memory is more critical than CPU: if a Node runs out of memory, it can become unresponsive, and Pods can be evicted. Also, when a container reaches its memory limit, it is killed (OOMKilled). CPU throttling is less critical: the Pod keeps running, only slower.
TIP
Although we're using round numbers for resource requests, you might want to fine-tune those values depending on your Nodes' capacity: an 8-core Node will actually have around 7 CPU cores available for Pods, as between 0.5 and 1 core is used by system processes and DaemonSets.
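As an illustration, assuming an 8-core Node with about 7 cores available for Pods and the web server requests from the table above (2 cores for PHP-FPM plus 0.2 for nginx, i.e. 2.2 cores per Pod), the scheduler could place, by CPU alone:

```latex
\left\lfloor \frac{7}{2.2} \right\rfloor = \lfloor 3.18 \rfloor = 3 \text{ Pods},
\qquad 7 - 3 \times 2.2 = 0.4 \text{ core left unrequested}
```

That remaining 0.4 core can only be used by Pods with smaller requests (such as the consumers described below), which is where tuning the requests to your Node size may pay off. Memory requests must fit on the Node as well.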
Consumer and cron Pods
The consumer and cron Pods run a single process, so they cannot use more than 1 CPU core.
Our general recommendation for the resources of the consumer and cron Pods' containers is:
Type | Resource | Request | Limit |
---|---|---|---|
Consumer | CPU | 0.2 | |
Consumer | Memory | 1Gi | 1Gi |
Cron | CPU | 0.5 | |
Cron | Memory | 1Gi | 1Gi |
The CPU resources to request may differ between cron groups (if they run separately) and between consumers.
For instance, the index cron group has lower CPU requirements, as it mainly performs I/O operations (database, search, etc.).
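A sketch of the matching container resources, based on the table above. The container names are hypothetical, and each item belongs in the containers list of its own Pod template (consumer or cron), not in a single Pod.

```yaml
# Consumer container (hypothetical name), in the consumer Pod template
- name: consumer
  resources:
    requests:
      cpu: 200m       # 0.2 core; a single process uses at most 1 core
      memory: 1Gi
    limits:
      memory: 1Gi     # memory limit only, no CPU limit
# Cron container (hypothetical name), in the cron Pod template
- name: cron
  resources:
    requests:
      cpu: 500m       # 0.5 core; lower for I/O-bound groups such as index
      memory: 1Gi
    limits:
      memory: 1Gi
```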
Wrapping up
Allocating resources to Pods is a complex task that may require some trial and error to find the right balance between performance and resource utilization.
You should monitor your Pods and Nodes (e.g. using Prometheus, covered further in this guide) to ensure that the allocated resources match the workload, and adjust them if needed.
Workload placement
When deploying Pods in a Kubernetes cluster, the scheduler decides where to place them based on the Pods' resource requests and the resources available on the Nodes.
It also takes into account other factors, such as the Pods' affinity and anti-affinity rules, taints and tolerations, etc.
As a general rule, you should avoid placing all the Pods of the same Deployment on the same Node, to ensure high availability in case of Node failure.
Such a rule can be expressed using Pod anti-affinity rules, which discourage (preferred rules) or forbid (required rules) scheduling Pods with matching labels on the same Node. The following example uses a preferred rule:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: magento-deployment
spec:
  # ...
  template:
    spec:
      # ...
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    <label-key>: <label-value>
TIP
To ensure high availability, you should also add an anti-affinity rule based on the topology.kubernetes.io/zone label (the replacement for the deprecated failure-domain.beta.kubernetes.io/zone label), so that Pods are spread across Availability Zones.
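A sketch of such a combined rule, assuming the same placeholder labels as the Node-level rule above. The equal weights are an arbitrary choice (valid weights range from 1 to 100); raise one of them to make that spreading preference dominate.

```yaml
# Fragment of spec.template.spec in the Deployment
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # Prefer spreading replicas across Nodes
      - weight: 1
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              <label-key>: <label-value>
      # Prefer spreading replicas across Availability Zones
      - weight: 1
        podAffinityTerm:
          topologyKey: topology.kubernetes.io/zone
          labelSelector:
            matchLabels:
              <label-key>: <label-value>
```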
Horizontal scaling
We want to achieve two main goals with horizontal scaling:
- High availability: if a Pod (or a Node) fails, another one can take over
- Resource and cost optimization: we can scale the number of Pods based on resource usage: less traffic, fewer Pods; more traffic, more Pods
Horizontal scaling can be achieved in Kubernetes using the HorizontalPodAutoscaler resource.
The HorizontalPodAutoscaler can automatically scale the number of Pods in a Deployment based on observed CPU utilization (or other metrics, but we'll only use CPU).
The configuration of a HorizontalPodAutoscaler is quite simple; we need to define:
- A target Deployment
- A minimum and maximum number of replicas (Pods)
- A target average CPU utilization percentage: above it, Pods are added; below it, Pods are removed
A sample configuration would look like this:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: magento-autoscaler
  labels:
    app: magento
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: magento-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
A few important points to note:
- The HorizontalPodAutoscaler can only scale the Deployment's Pods based on CPU utilization if their containers have CPU requests defined.
- The average CPU utilization is calculated every 15 seconds by default, over a short time window, so the Pods won't be scaled up immediately when CPU usage spikes.
- CPU utilization is computed relative to requests: the total CPU usage of the Pods' containers divided by the total of their CPU requests.
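The scaling decision follows the algorithm described in the Kubernetes HorizontalPodAutoscaler documentation: the controller compares the current utilization with the target, rounds up, then clamps the result between minReplicas and maxReplicas (and skips scaling while the ratio stays within a tolerance, 0.1 by default). A worked example with the sample configuration above, assuming 4 running Pods averaging 90% CPU utilization:

```latex
\text{utilization} = \frac{\sum_{\text{Pods}} \text{CPU usage}}{\sum_{\text{Pods}} \text{CPU requests}},
\qquad
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{utilization}}{\text{target}} \right\rceil
= \left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
```

Since 6 lies between minReplicas (2) and maxReplicas (10), the Deployment would be scaled to 6 Pods.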
TIP
You can also use the PodDisruptionBudget resource to define how many Pods of a Deployment can be down at the same time during voluntary disruptions (such as Node drains). We'll cover this further in the high availability section.
Recommendations
You should keep your minReplicas as low as possible to save resources and costs (while keeping at least 2 Pods for high availability), and your maxReplicas as high as possible so the service stays available during traffic peaks, while keeping in mind the capacity of your Nodes.
From our experience, a good target CPU utilization to start with is 70%: high enough to prevent over-provisioning, but low enough for the Pods to handle traffic spikes, by scaling up before CPU usage reaches 100%.
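Two quick checks, assuming the sample values used in this guide: web server Pods requesting 2.2 cores (2 for PHP-FPM plus 0.2 for nginx), and 8-core Nodes with about 7 cores available for Pods. The first line is the headroom a 70% target leaves each Pod; the second is the minimum number of Nodes needed to run maxReplicas, ignoring all other workloads:

```latex
\text{headroom} = \frac{100\%}{70\%} \approx 1.43
\quad (\text{about } 43\% \text{ more load per Pod before usage reaches its CPU request})

\text{Pods per Node} = \left\lfloor \frac{7}{2.2} \right\rfloor = 3,
\qquad
\text{Nodes}_{\min} = \left\lceil \frac{\text{maxReplicas}}{\text{Pods per Node}} \right\rceil = \left\lceil \frac{10}{3} \right\rceil = 4
```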
As always, you should monitor your Pods and Nodes to ensure that autoscaling works as expected, and adjust the configuration if needed.
Cluster sizing
When sizing your Kubernetes cluster, you should take into account the resources needed by your workloads, as well as the capacity of your Nodes.
To scale horizontally in a cost-effective way, you should aim for a cluster with relatively small Nodes, between 4 and 8 CPU cores.
On cloud providers, instance cost is generally proportional to the number of CPU cores and the amount of memory, so a cluster with small Nodes won't be more expensive than a cluster with large Nodes.
Keep in mind that system processes and DaemonSets consume some CPU and memory on each Node (between 0.5 and 1 CPU core), so take that into account when sizing your Nodes:
- If Nodes are too large, you may end up with underutilized resources
- If Nodes are too small, you will proportionally have fewer resources left for your Pods
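As a quick illustration of the second point, assuming roughly 1 core of overhead per Node (the upper end of the range above), the share of each Node's CPU left for your Pods grows with the Node's core count C:

```latex
\text{usable share} = \frac{C - 1}{C}
\qquad
C = 2 \Rightarrow 50\%,\quad C = 4 \Rightarrow 75\%,\quad C = 8 \Rightarrow 87.5\%
```

Beyond 8 cores the gain flattens out (about 94% at 16 cores), while each additional Node adds a larger block of capacity that may sit idle at low traffic.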