Core Concepts
Kubernetes Architecture
Nodes — worker machines (formerly called minions): physical or virtual.
Cluster — a set of Nodes grouped together. If one node fails, the others remain available. Used for load balancing.
Master — keeps track of all nodes and manages the orchestration.
Kubectl — the command-line management utility.
Components:
- Master nodes:
  - API Server (kube-apiserver) — acts as the frontend for Kubernetes; everyone interacts with the cluster through it.
  - etcd — distributed key-value store; ensures that all masters share the same state and there are no conflicts.
  - Controller — the central brain of orchestration. Monitors the cluster and responds when nodes go down, deciding whether to bring up new containers.
  - Scheduler — distributes load (containers) across Nodes: it watches for newly created containers and assigns them to nodes.
- Worker nodes:
  - kubelet agent — ensures that containers on the node are running and configured as expected. Interacts with the master (kube-apiserver).
  - Container Runtime — the underlying software for running containers. Not necessarily Docker: rkt, CRI-O, etc.
POD
- Kubernetes does not deploy containers directly to Worker Nodes; instead it encapsulates them in objects known as PODs. A POD is a single application instance, the smallest object that can be created in Kubernetes.
- New application instances are added by adding new PODs.
- Typically, PODs have a 1:1 relationship with the application container. It is allowed to add auxiliary containers to a POD (for example, a Redis helper), although this is a rare case.
- Several PODs can be placed on the same Node, or on different ones.
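A minimal POD definition might look like this (the pod name, label, and image are illustrative, not from the notes above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod      # hypothetical pod name
  labels:
    app: my-app         # hypothetical label, used later by selectors
spec:
  containers:
    - name: my-app      # the single application container (the typical 1:1 case)
      image: nginx:1.25 # illustrative image
```

Apply it with `kubectl apply -f pod.yaml`; a second application instance would be a second POD, not a second container in this one.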
Namespaces
- default — created automatically.
- kube-system — isolated, for internal use only.
- kube-public — readable by all users.
You can add custom Namespaces: dev, prod, etc.
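A custom Namespace such as dev can be created declaratively (equivalently, `kubectl create namespace dev`):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: dev   # the custom namespace mentioned above
```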
DNS:
- In default: mysql.connect('db-service').
- In dev: mysql.connect('db-service.dev.svc.cluster.local'), where:
  - db-service — Service name;
  - dev — Namespace;
  - svc — object type (Service);
  - cluster.local — cluster domain.
Service accounts
- User accounts — for humans.
- Service accounts — for machines (e.g., processes running in PODs).
kubectl create serviceaccount my-service-acc-name
kubectl describe serviceaccounts default
kubectl describe serviceaccounts my-service-acc-name
kubectl describe secrets my-service-acc-name-sa-token-kbbdm
kubectl get serviceaccount
For each Namespace in Kubernetes, a service account named default is automatically created. It only has permissions for default Kubernetes API requests. Each Namespace has its own default service account.
Each POD automatically mounts the default token at the path /var/run/secrets/kubernetes.io/serviceaccount.
kubectl exec -it my-kubernetes-dashboard -- ls /var/run/secrets/kubernetes.io/serviceaccount
kubectl exec -it my-kubernetes-dashboard -- cat /var/run/secrets/kubernetes.io/serviceaccount/token
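To run a POD under a non-default service account, set spec.serviceAccountName. A sketch reusing the account name created above (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-kubernetes-dashboard
spec:
  serviceAccountName: my-service-acc-name  # created with kubectl create serviceaccount above
  containers:
    - name: dashboard
      image: nginx:1.25                    # illustrative image
```

The token for my-service-acc-name, rather than for default, is then mounted at /var/run/secrets/kubernetes.io/serviceaccount.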
Resource requests and limits
If the node where a Pod is running has enough of a resource available, a container is allowed to use more of that resource than its request specifies. However, a container is not allowed to use more than its resource limit.
Limits and requests for CPU resources are measured in cpu units. In Kubernetes, 1 CPU unit is equivalent to 1 physical CPU core, or 1 virtual core, depending on whether the node is a physical host or a virtual machine running inside a physical machine.
Attention
CPU resource is always specified as an absolute amount, never as a relative amount. For example, 500m CPU represents roughly the same amount of computing power whether that container runs on a single-core, dual-core, or 48-core machine.
Behavior when a container attempts to use resources in excess of the set limit:
- CPU — THROTTLE;
- Memory — TERMINATE (see "Exceed a Container's memory limit").
CPU
Containers cannot use more CPU than the configured limit. Provided the system has CPU time free, a container is guaranteed to be allocated as much CPU as it requests.
Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough CPU resources available to satisfy the Pod CPU request.
Memory
A Container is guaranteed to have as much memory as it requests, but is not allowed to use more memory than its limit.
A Container can exceed its memory request if the Node has memory available. But a Container is not allowed to
use more than its memory limit. If a Container allocates more memory than its limit, the Container becomes a
candidate for termination. If the Container continues to consume memory beyond its limit, the Container is
terminated. If a terminated Container can be restarted, the kubelet
restarts it, as with any other type of
runtime failure.
If you do not specify a CPU limit for a Container, then one of these situations applies:
- The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the Node where it is running.
- The Container is running in a namespace that has a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.
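The rules above map onto the resources block of a container spec. A sketch (names, image, and the specific values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: 500m        # scheduler only places the Pod on a Node with 500m free
          memory: 256Mi    # guaranteed amount of memory
        limits:
          cpu: "1"         # usage above this is throttled
          memory: 512Mi    # usage above this makes the container a termination candidate
```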
To view the CPU usage of a Pod: kubectl top pod cpu-demo --namespace=cpu-example
Equivalent notations:
- CPU: 1 == 1000m, min = 1m. 1 CPU is equivalent to:
  - 1 AWS vCPU;
  - 1 GCP Core;
  - 1 Azure Core;
  - 1 Hyperthread on a bare-metal Intel processor with Hyperthreading.
- Memory: 256 Mi is equivalent to:
  - 268435456 bytes;
  - approximately 268M (≈ 0.268G).
Note
- 1 G (Gigabyte) = 1_000_000_000 bytes;
- 1 M (Megabyte) = 1_000_000 bytes;
- 1 K (Kilobyte) = 1_000 bytes;
- 1 Gi (Gibibyte) = 1_073_741_824 bytes;
- 1 Mi (Mebibyte) = 1_048_576 bytes;
- 1 Ki (Kibibyte) = 1_024 bytes.
References:
- Resource Management for Pods and Containers
- Assign Memory Resources to Containers and Pods
- Assign CPU Resources to Containers and Pods
- Configure Default CPU Requests and Limits for a Namespace
- Configure Default Memory Requests and Limits for a Namespace
Taints and Tolerations
Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite — they allow a node to repel a set of pods.
Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.
Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
You add a taint to a node using kubectl taint. For example:
kubectl taint nodes node1 key1=value1:NoSchedule
places a taint on node node1. The taint has key key1, value value1, and taint effect NoSchedule. This means that no pod will be able to schedule onto node1 unless it has a matching toleration.
To remove the taint added by the command above, you can run:
kubectl taint nodes node1 key1=value1:NoSchedule-
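A toleration matching the taint above goes into the Pod spec (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  tolerations:
    - key: "key1"          # matches the taint's key
      operator: "Equal"
      value: "value1"      # matches the taint's value
      effect: "NoSchedule" # matches the taint's effect
  containers:
    - name: app
      image: nginx:1.25
```

Note that the toleration only allows scheduling onto node1; it does not force the Pod there (that is what node affinity is for).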
nodeSelector
nodeSelector is the simplest recommended form of node selection constraint. You can add the nodeSelector field to your Pod specification and specify the node labels you want the target node to have. Kubernetes only schedules the Pod onto nodes that have each of the labels you specify.
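A sketch of nodeSelector in a Pod spec, assuming a node has been labeled with something like `kubectl label nodes node1 size=large` (the label key/value, pod name, and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: large-node-pod
spec:
  nodeSelector:
    size: large        # Pod only schedules onto nodes carrying the label size=large
  containers:
    - name: app
      image: nginx:1.25
```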
Node Affinity
| Type | DuringScheduling | DuringExecution |
|---|---|---|
| requiredDuringSchedulingIgnoredDuringExecution | Required | Ignored |
| preferredDuringSchedulingIgnoredDuringExecution | Preferred | Ignored |
| requiredDuringSchedulingRequiredDuringExecution | Required | Required |
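Node affinity expresses the same intent as nodeSelector but with richer operators (In, NotIn, Exists, etc.). A sketch using the first type from the table (label key/values, pod name, and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement at scheduling time
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: In          # node label size must be one of the listed values
                values: ["large", "medium"]
  containers:
    - name: app
      image: nginx:1.25
```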
Observability
POD Status: Pending → ContainerCreating → Running
POD Conditions:
- PodScheduled: true/false;
- Initialized: true/false;
- ContainersReady: true/false;
- Ready: true/false.
Monitoring, Logging, and Debugging
Labels & Selectors
- By object type: PODs, ReplicaSets, Deployments, Services.
- By application name: app1, app2, ..., appN, etc.
- By functionality: front-end, back-end, web-servers, app-servers, db, auth, audit, cache, etc.
kubectl get pods --selector environment=production,tier=frontend --show-labels
kubectl get pods -l environment=production,tier=frontend
kubectl get pods -l environment=production,tier=frontend --output name | wc -l # to return the number of pods
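A POD that the selector queries above would match carries the corresponding labels in its metadata (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend-pod
  labels:
    environment: production   # matched by environment=production
    tier: frontend            # matched by tier=frontend
spec:
  containers:
    - name: web
      image: nginx:1.25
```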
Services, Load Balancing, and Networking
Publishing Services (ServiceTypes)
Kubernetes ServiceTypes allow you to specify what kind of Service you want. Type values and their behaviors are:
- ClusterIP (default): Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster.
- NodePort: Exposes the Service on each Node's IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You'll be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
- LoadBalancer: Exposes the Service externally using a cloud provider's load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
- ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up.
Packet path from outside to POD: NodePort (on the Node) → Port (the Service's port) → TargetPort (the container port inside the POD).
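The three ports in that path correspond to three fields of a NodePort Service spec. A sketch (service name, selector, and port values are illustrative; nodePort must fall in the default 30000-32767 range):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: my-app         # routes to PODs carrying this label
  ports:
    - nodePort: 30080   # NodePort — static port on every Node's IP
      port: 80          # Port — the Service's own cluster-internal port
      targetPort: 8080  # TargetPort — the container port inside the POD
```

External clients reach it at <NodeIP>:30080; in-cluster clients use my-service:80.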