Core Concepts
Kubernetes Architecture
Nodes — worker machines (formerly called minions): physical or virtual.
Cluster — a set of Nodes grouped together. If one node fails, the others remain available. Used for load balancing.
Master — keeps track of all nodes and manages the orchestration.
Kubectl — the command-line management utility.
Components:
- Master nodes:
  - API Server (kube-apiserver) — acts as the frontend for Kubernetes; everyone interacts with the cluster through it.
  - etcd — distributed key-value store; ensures that all masters share the same state and there are no conflicts.
  - Controller — the central brain of orchestration. Monitors the cluster and responds when nodes go down, deciding whether to bring up new containers.
  - Scheduler — distributes load (containers) across Nodes: it watches for newly created containers and assigns them to nodes.
- Worker nodes:
  - kubelet agent — ensures that containers on the node are running and configured as expected. Interacts with the master (kube-apiserver).
  - Container Runtime — the underlying software for running containers. Not necessarily Docker: rkt, CRI-O, etc.
POD
- Kubernetes does not deploy containers directly to Worker Nodes; instead it encapsulates them in objects known as PODs. A POD is a single application instance, the smallest object that can be created in Kubernetes.
- New application instances are added by adding new PODs.
- Typically, PODs have a 1:1 relationship with the application container. It is allowed to add auxiliary containers to a POD (for example, a Redis helper), although this is a rare case.
- Several PODs can be placed on the same Node, or on different ones.
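A minimal POD definition might look like this (the pod name, label, and image are illustrative, not from the notes above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod      # hypothetical pod name
  labels:
    app: my-app         # hypothetical label, used later by selectors
spec:
  containers:
    - name: my-app      # the single application container (the typical 1:1 case)
      image: nginx:1.25 # illustrative image
```

Apply it with `kubectl apply -f pod.yaml`; a second application instance would be a second POD, not a second container in this one.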
Namespaces
- default — created automatically.
- kube-system — isolated, for internal use only.
- kube-public — readable by all users.
You can add custom Namespaces: dev, prod, etc.
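A custom Namespace such as dev can be created declaratively (equivalently, `kubectl create namespace dev`):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: dev   # the custom namespace mentioned above
```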
DNS:
- In default: mysql.connect('db-service').
- In dev: mysql.connect('db-service.dev.svc.cluster.local'), where:
  - db-service — Service name;
  - dev — Namespace;
  - svc — object type (Service);
  - cluster.local — cluster domain.
Service accounts
- User accounts — for humans.
- Service accounts — for machines (e.g., processes running in PODs).
kubectl create serviceaccount my-service-acc-name
kubectl describe serviceaccounts default
kubectl describe serviceaccounts my-service-acc-name
kubectl describe secrets my-service-acc-name-sa-token-kbbdm
kubectl get serviceaccount
For each Namespace in Kubernetes, a service account named default is automatically created. It only has permissions for default Kubernetes API requests. Each Namespace has its own default service account.
Each POD automatically mounts the default token at the path /var/run/secrets/kubernetes.io/serviceaccount.
kubectl exec -it my-kubernetes-dashboard -- ls /var/run/secrets/kubernetes.io/serviceaccount
kubectl exec -it my-kubernetes-dashboard -- cat /var/run/secrets/kubernetes.io/serviceaccount/token
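To run a POD under a non-default service account, set spec.serviceAccountName. A sketch reusing the account name created above (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-kubernetes-dashboard
spec:
  serviceAccountName: my-service-acc-name  # created with kubectl create serviceaccount above
  containers:
    - name: dashboard
      image: nginx:1.25                    # illustrative image
```

The token for my-service-acc-name, rather than for default, is then mounted at /var/run/secrets/kubernetes.io/serviceaccount.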
Resource requests and limits
If the node where a Pod is running has enough of a resource available, a container is allowed to use more of that resource than its request specifies. However, a container is not allowed to use more than its resource limit.
Limits and requests for CPU resources are measured in cpu units. In Kubernetes, 1 CPU unit is equivalent to 1 physical CPU core, or 1 virtual core, depending on whether the node is a physical host or a virtual machine running inside a physical machine.
Attention
CPU resource is always specified as an absolute amount, never as a relative amount. For example, 500m CPU represents roughly the same amount of computing power whether that container runs on a single-core, dual-core, or 48-core machine.
Behavior when a container attempts to use resources in excess of the set limit:
- CPU — THROTTLE;
- Memory — TERMINATE (see "Exceed a Container's memory limit").
CPU
Containers cannot use more CPU than the configured limit. Provided the system has CPU time free, a container is guaranteed to be allocated as much CPU as it requests.
Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough CPU resources available to satisfy the Pod CPU request.
Memory
A Container is guaranteed to have as much memory as it requests, but is not allowed to use more memory than its limit.
A Container can exceed its memory request if the Node has memory available. But a Container is not allowed to
use more than its memory limit. If a Container allocates more memory than its limit, the Container becomes a
candidate for termination. If the Container continues to consume memory beyond its limit, the Container is
terminated. If a terminated Container can be restarted, the kubelet
restarts it, as with any other type of
runtime failure.
If you do not specify a CPU limit for a Container, then one of these situations applies:
- The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the Node where it is running.
- The Container is running in a namespace that has a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.
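The rules above map onto the resources block of a container spec. A sketch (names, image, and the specific values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: 500m        # scheduler only places the Pod on a Node with 500m free
          memory: 256Mi    # guaranteed amount of memory
        limits:
          cpu: "1"         # usage above this is throttled
          memory: 512Mi    # usage above this makes the container a termination candidate
```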
To view the CPU usage of a Pod: kubectl top pod cpu-demo --namespace=cpu-example
Equivalent notations:
- CPU: 1 == 1000m, min = 1m. 1 CPU is equivalent to:
  - 1 AWS vCPU;
  - 1 GCP Core;
  - 1 Azure Core;
  - 1 Hyperthread on a bare-metal Intel processor with Hyperthreading.
- Memory: 256 Mi is equivalent to:
  - 268435456 bytes;
  - approximately 268M (≈ 0.268G).
Note
- 1 G (Gigabyte) = 1_000_000_000 bytes;
- 1 M (Megabyte) = 1_000_000 bytes;
- 1 K (Kilobyte) = 1_000 bytes;
- 1 Gi (Gibibyte) = 1_073_741_824 bytes;
- 1 Mi (Mebibyte) = 1_048_576 bytes;
- 1 Ki (Kibibyte) = 1_024 bytes.
References:
- Resource Management for Pods and Containers
- Assign Memory Resources to Containers and Pods
- Assign CPU Resources to Containers and Pods
- Configure Default CPU Requests and Limits for a Namespace
- Configure Default Memory Requests and Limits for a Namespace
Taints and Tolerations
Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite — they allow a node to repel a set of pods.
Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.
Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
You add a taint to a node using kubectl taint. For example:
kubectl taint nodes node1 key1=value1:NoSchedule
places a taint on node node1. The taint has key key1, value value1, and taint effect NoSchedule. This means that no pod will be able to schedule onto node1 unless it has a matching toleration.
To remove the taint added by the command above, you can run:
kubectl taint nodes node1 key1=value1:NoSchedule-
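A toleration matching the taint above goes into the Pod spec (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  tolerations:
    - key: "key1"          # matches the taint's key
      operator: "Equal"
      value: "value1"      # matches the taint's value
      effect: "NoSchedule" # matches the taint's effect
  containers:
    - name: app
      image: nginx:1.25
```

Note that the toleration only allows scheduling onto node1; it does not force the Pod there (that is what node affinity is for).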
nodeSelector
nodeSelector is the simplest recommended form of node selection constraint. You can add the nodeSelector field to your Pod specification and specify the node labels you want the target node to have. Kubernetes only schedules the Pod onto nodes that have each of the labels you specify.
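A sketch of nodeSelector in a Pod spec, assuming a node has been labeled with something like `kubectl label nodes node1 size=large` (the label key/value, pod name, and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: large-node-pod
spec:
  nodeSelector:
    size: large        # Pod only schedules onto nodes carrying the label size=large
  containers:
    - name: app
      image: nginx:1.25
```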
Node Affinity
| Type | DuringScheduling | DuringExecution |
|---|---|---|
| requiredDuringSchedulingIgnoredDuringExecution | Required | Ignored |
| preferredDuringSchedulingIgnoredDuringExecution | Preferred | Ignored |
| requiredDuringSchedulingRequiredDuringExecution | Required | Required |
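Node affinity expresses the same intent as nodeSelector but with richer operators (In, NotIn, Exists, etc.). A sketch using the first type from the table (label key/values, pod name, and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement at scheduling time
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: In          # node label size must be one of the listed values
                values: ["large", "medium"]
  containers:
    - name: app
      image: nginx:1.25
```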
Observability
POD Status: Pending → ContainerCreating → Running
POD Conditions:
- PodScheduled: true/false;
- Initialized: true/false;
- ContainersReady: true/false;
- Ready: true/false.
Monitoring, Logging, and Debugging
Labels & Selectors
- By object type: PODs, ReplicaSets, Deployments, Services.
- By application name: app1, app2, ..., appN, etc.
- By functionality: front-end, back-end, web-servers, app-servers, db, auth, audit, cache, etc.
kubectl get pods --selector environment=production,tier=frontend --show-labels
kubectl get pods -l environment=production,tier=frontend
kubectl get pods -l environment=production,tier=frontend --output name | wc -l # to return the number of pods
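A POD that the selector queries above would match carries the corresponding labels in its metadata (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend-pod
  labels:
    environment: production   # matched by environment=production
    tier: frontend            # matched by tier=frontend
spec:
  containers:
    - name: web
      image: nginx:1.25
```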
Services, Load Balancing, and Networking
Publishing Services (ServiceTypes)
Kubernetes ServiceTypes allow you to specify what kind of Service you want. Type values and their behaviors are:
- ClusterIP (default): Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster.
- NodePort: Exposes the Service on each Node's IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You'll be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
- LoadBalancer: Exposes the Service externally using a cloud provider's load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
- ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up.
Packet path from outside to POD: NodePort (on the Node) → Port (the Service's port) → TargetPort (the container port inside the POD).
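The three ports in that path correspond to three fields of a NodePort Service spec. A sketch (service name, selector, and port values are illustrative; nodePort must fall in the default 30000-32767 range):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: my-app         # routes to PODs carrying this label
  ports:
    - nodePort: 30080   # NodePort — static port on every Node's IP
      port: 80          # Port — the Service's own cluster-internal port
      targetPort: 8080  # TargetPort — the container port inside the POD
```

External clients reach it at <NodeIP>:30080; in-cluster clients use my-service:80.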