KIA CH01 Introducing Kubernetes

Hi there.

Today, let us read Chapter 01: Introducing Kubernetes (Part I: Overview) of Kubernetes in Action. The chapter covers:

  1. the history of software development
  2. isolation by containers
  3. how containers and Docker are used by Kubernetes
  4. how Kubernetes simplifies our work

Software architecture has transitioned from monoliths to microservices. Legacy applications were big monoliths; nowadays they are broken down into microservices: small, independently running components that are decoupled from each other and can therefore be developed, deployed, updated, and scaled individually to meet changing business requirements.

Kubernetes (k8s) reduces the complexity brought by the growing number of microservices by automating the scheduling of components onto our servers, their configuration, supervision, and failure handling. K8s abstracts the hardware infrastructure as a single enormous computational resource, selects a server for each component, deploys it, and enables it to easily find and communicate with all the other components.

1.1 Understanding the need for a system like Kubernetes

In this section, the book talks about how the development and deployment of applications have changed in recent years. The changes are caused by:

  • splitting big monolithic apps into smaller microservices
  • the changes in the infrastructure that runs those apps

1.1.1 Moving from monolithic to microservices

Monolithic applications: components that are all tightly coupled together and have to be developed, deployed, and managed as one entity, because they all run as a single OS process.

Microservices: smaller, independently deployable components.

              | Monolithic                    | Microservices
components    | tightly coupled together      | independently deployable
scaling       | vertical scaling (scaling up) | horizontal scaling (scaling out)
communication | function invoking             | well-defined interfaces (RESTful APIs, AMQP, etc.)
changes       | redeployment of whole system  | minimal redeployment
deployment    | easy                          | tedious and error-prone
debug/trace   | easy                          | hard: span multiple processes and machines (requires Zipkin)

1.1.2 Providing a consistent environment to applications

The environments on which the apps rely can differ from one machine to another, from one operating system to another, and from one library to another.

A consistent environment is required to prevent failures:

  • the exact same operating system, libraries, system configuration, networking environment, etc.
  • the ability to add applications to the same server without affecting any of the existing applications on that server.

1.1.3 Moving to continuous delivery: DevOps and NoOps

Nowadays, the same team often develops the app, deploys it, and takes care of it over its whole lifetime. Two practices are typical:

  • DevOps: a practice in which the developer, QA, and operations teams collaborate throughout the whole process.
    • a better understanding of issues from users and the ops team, and earlier feedback
    • a streamlined deployment process, so newer versions of applications are released more often
  • NoOps: a practice in which developers deploy applications themselves, without knowing the hardware infrastructure and without dealing with the ops team.
    • Kubernetes allows developers to configure and deploy their apps independently
    • sysadmins focus on keeping the underlying infrastructure up and running, rather than on how the apps run on top of it.

1.2 Introducing container technologies

Kubernetes uses Linux container technologies to provide isolation.

1.2.1 What are containers

Containers are much more lightweight (than VMs), which allows you to run many software components on the same hardware.

  • the process in the container is isolated from other processes inside the same host OS
  • containers consume only the resources they need (while each VM requires a whole separate operating system and additional compute resources)

Containers use two mechanisms to isolate processes: Linux Namespaces and Linux Control Groups (cgroups).

  1. Linux Namespaces

    Linux Namespaces isolate system resources, so that each process can see only the resources inside its own namespace.

    The following table shows the kinds of namespaces:

    namespace | meaning
    mnt       | Mount
    pid       | Process ID
    net       | Network [1]
    ipc       | Inter-process communication
    UTS       | hostname and domain name [2]
    user      | User ID
  2. Linux Control Groups (cgroups)

    Linux Control Groups (cgroups) is a Linux kernel feature that limits the resource usage (CPU, memory, network bandwidth, and so on) of a process or a group of processes.
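
To make the namespace idea a bit more concrete, here is a minimal sketch that lists the namespaces the current process belongs to. It assumes a Linux host where the kernel exposes them as symlinks under /proc/<pid>/ns; the function name list_namespaces is my own, and on other systems it simply returns an empty result.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc.

    Returns an empty dict when /proc/<pid>/ns is not available
    (for example on a non-Linux host).
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in names:
        try:
            # Each entry is a symlink whose target looks like "net:[4026531992]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable; skip it
    return result


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:10} {ident}")
```

Two processes that print the same identifier for an entry (say, net) share that namespace; a process inside a container would normally show different identifiers from a process on the host.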

1.2.2 Introducing the Docker container platform

Docker is a platform for packaging, distributing, and running applications.

  • Image: packages the application and its environment, and comprises:
    • an isolated filesystem, which is available to the app
    • metadata, such as the path to the executable that should be run when the image is run
  • Registry: a (public or private) repository that stores and shares Docker images.
    • push: uploading the image to a registry
    • pull: downloading the image from a registry
  • Container: a running process created from a Docker-based container image; it runs on the host OS, but is isolated from other processes and resource-constrained.
@startuml
start
:Docker builds image;
:Docker pushes image to registry;
:Docker pulls image from registry;
:Docker runs container from image;
stop
@enduml

Docker container images are composed of "layers":

  • shared and reused by building a new image on top of an existing parent image
    • speeding up distribution across network
    • reducing the storage footprint (each layer stored only once)
  • image layers are read-only
    • when a container is run, a new writable layer is created on top of the image layers;
    • when a write request is made to a file located in an underlying image layer, a copy of the file is created in the top-most writable layer, and the write is applied to that copy.
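
The layering behaviour can be imitated with a toy model. This is only a sketch and not how Docker stores layers: each layer is a plain dict from path to content, and Python's collections.ChainMap plays the role of the union filesystem, because it looks keys up layer by layer and sends every write to the first (top-most) mapping. The file paths, contents, and the run_container helper are made up for illustration.

```python
from collections import ChainMap

# Read-only image layers (hypothetical contents), listed bottom to top.
base_layer = {"/etc/os-release": "distro base", "/bin/sh": "shell"}
app_layer = {"/app/server.py": "v1"}


def run_container(*image_layers):
    """Return a container filesystem: a fresh writable layer over shared image layers."""
    writable = {}
    # ChainMap searches mappings left to right and applies all writes to the
    # first one, so the image layers underneath are never modified.
    return ChainMap(writable, *reversed(image_layers))


c1 = run_container(base_layer, app_layer)
c2 = run_container(base_layer, app_layer)

# Writing in one container only touches that container's writable layer.
c1["/app/server.py"] = "patched in c1"

print(c1["/app/server.py"])  # content seen inside c1
print(c2["/app/server.py"])  # c2 still sees the image layer
print(app_layer)             # the shared image layer is unchanged
```

Unlike a real copy-on-write filesystem, the toy replaces the whole value instead of copying the file and then modifying it, but the visible effect is the same: both containers share the image layers, and each keeps its own changes.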

However, containers use the Linux kernel of the host OS, so Docker does have limitations. A containerized app runs only on hosts that provide:

  • the Linux kernel version the app requires
  • the kernel modules the app requires

1.2.3 Introducing 'rkt' — an alternative to Docker

Just like Docker, rkt is a platform for running containers, but with a strong emphasis on security, composability, and conforming to open standards.

1.3 Introducing Kubernetes

Kubernetes is a software system that allows you to easily deploy and manage containerized applications.

1.3.1 The origins of Kubernetes

Google built Kubernetes on the experience gained from its internal systems such as Borg and Omega, which brought two benefits:

  • simpler development and management
  • higher utilization of infrastructure

1.3.2 Looking at Kubernetes from the top of a mountain

Kubernetes has three key features:

  1. easy deployment and management

    • Linux containers to run heterogeneous applications
      • without detailed knowledge of their internals
      • without manual deployment on each host
    • containerization to isolate applications, on shared hardware
      • optimal hardware utilization
      • complete isolation of hosted applications
  2. abstraction of the underlying infrastructure

    • runs applications on thousands of nodes as if all nodes were one single enormous computer
    • easy development, deployment, and management for both the development and the operations teams
  3. consistent deployment process

    • cluster nodes represent the amount of resources available to the apps
    • the number of nodes does not change the deployment process

In practice, Kubernetes exposes the whole data center as a single deployment platform. This allows developers to focus on implementing the actual features of their applications, while Kubernetes handles infrastructure-related services (such as service discovery, scaling, load balancing, self-healing, and leader election).

1.3.3 Architecture of a Kubernetes cluster

A Kubernetes cluster is composed of two types of nodes:

  1. Control Plane (Master): controls the cluster
    • API Server: the component that you and the other Control Plane components communicate with
    • Scheduler: schedules apps by assigning a worker node to each deployable component of the app
    • Controller Manager: performs cluster-level functions, such as replicating components, keeping track of worker nodes, and handling node failures.
    • etcd: a reliable distributed database that persistently stores the cluster configuration
  2. Worker Nodes: run containerized applications
    • Kubelet: talks to the API server and manages containers on its node
    • kube-proxy (Kubernetes Service Proxy): load-balances network traffic between application components
    • container runtime: runs containers, e.g., Docker or rkt
@startuml
title "components of Kubernetes cluster"
node "Control Plane (master)" {
    database "etcd" as etcd
    rectangle  "API server" as apiServer
    rectangle  "Scheduler" as scheduler
    rectangle  "Controller Manager" as controllerManager
    scheduler --> apiServer
    controllerManager --> apiServer
    apiServer --> etcd
}
node "Worker node(s)" {
    rectangle  "Container Runtime" as containerRuntime
    rectangle  "Kubelet" as kubelet
    rectangle  "kube-proxy" as kubeProxy
    kubelet --> containerRuntime
    kubelet --> apiServer
    kubeProxy --> apiServer
}
@enduml

1.3.4 Running an application in Kubernetes

When the developer submits an App Descriptor (a list of apps) to the master, Kubernetes chooses worker nodes and deploys the apps.

The App Descriptor describes the details of the containers to run:

  • which container images contain your application
  • how many replicas to run for each component
  • how components are related to each other
    • co-located: run together on the same worker node
    • otherwise, spread around the cluster
  • whether a component provides a service to internal or external clients
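
The chapter keeps the App Descriptor abstract. As one possible concrete form, the sketch below builds a Deployment manifest (apiVersion apps/v1) as a Python dict and prints it as JSON. Treating a Deployment as "the" descriptor is my assumption, not something these notes state; the helper make_descriptor, the app name, and the image registry.example.com/frontend:1.0 are hypothetical. Only the image and the replica count from the list above are covered.

```python
import json


def make_descriptor(name, image, replicas):
    """Build a minimal Deployment manifest for one component."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # how many replicas to run for this component
            "replicas": replicas,
            # the selector must match the labels in the template below
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                # which container image contains the application
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


descriptor = make_descriptor("frontend", "registry.example.com/frontend:1.0", 3)
print(json.dumps(descriptor, indent=2))
```

Manifests are more commonly written in YAML, but JSON carries the same structure, which keeps the sketch free of third-party libraries.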

The diagram below shows how an App Descriptor is used to start an app:

@startuml
start
:Developer submits App Descriptor to API Server;
:Scheduler schedules the specified groups of containers onto the available worker nodes;
:Kubelet on the worker node instructs Container Runtime to pull and run the containers;
stop
@enduml
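
The scheduling step in the diagram can be pictured with a deliberately naive sketch. It is not the real Kubernetes scheduler, which weighs many more factors; it only assumes that each group of containers requests some CPU (in millicores) and that each worker node has a known amount of free CPU. The function schedule, the group names, and the node names are all hypothetical.

```python
def schedule(groups, free_cpu):
    """Assign each (name, cpu_request) group to the node with the most free CPU.

    groups:   list of (group_name, cpu_request_in_millicores)
    free_cpu: dict mapping node name to free CPU in millicores
    Returns a dict mapping group name to a node name, or None if nothing fits.
    """
    remaining = dict(free_cpu)  # copy, so the caller's data is not modified
    placement = {}
    for name, cpu_request in groups:
        candidates = [n for n, free in remaining.items() if free >= cpu_request]
        if not candidates:
            placement[name] = None  # no node has enough room
            continue
        # Prefer the node with the most free CPU; break ties by node name.
        node = min(candidates, key=lambda n: (-remaining[n], n))
        remaining[node] -= cpu_request
        placement[name] = node
    return placement


placement = schedule(
    [("web-1", 400), ("web-2", 400), ("web-3", 400), ("db", 700)],
    {"node-a": 1000, "node-b": 500},
)
print(placement)
```

In this run the three web groups fit on the two nodes, while db stays unassigned because no node has 700 millicores left.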

Once the application is running, Kubernetes continuously makes sure that the deployed state of the application always matches the description:

  • if an instance stops working, Kubernetes restarts it
  • if a worker node dies (or becomes inaccessible), Kubernetes selects new nodes for all the containers that were running on it and runs them there
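
This "keep the actual state equal to the described state" idea can be sketched as a single comparison step. The code below is a toy, not Kubernetes' controller code: the status values running, crashed, and node-lost, the action names, and the function reconcile are assumptions chosen for illustration.

```python
def reconcile(desired_replicas, observed):
    """Return the actions needed to bring the observed state back to the description.

    observed: list of {"name": str, "status": "running" | "crashed" | "node-lost"}
    """
    actions = []
    for instance in observed:
        if instance["status"] == "crashed":
            # the instance stopped working: restart it in place
            actions.append(("restart", instance["name"]))
        elif instance["status"] == "node-lost":
            # its worker node is gone: run it on a newly selected node
            actions.append(("reschedule", instance["name"]))
    # create instances that the description asks for but that do not exist yet
    missing = desired_replicas - len(observed)
    actions.extend(("create", None) for _ in range(max(missing, 0)))
    return actions


observed = [
    {"name": "web-1", "status": "running"},
    {"name": "web-2", "status": "crashed"},
    {"name": "web-3", "status": "node-lost"},
]
print(reconcile(4, observed))
```

Running such a step over and over, each time against freshly observed state, is what makes the system self-healing: no single pass has to be perfect, because the next pass corrects whatever is still off.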

If the workload fluctuates, Kubernetes can also automatically scale (increase or decrease) the number of replicas, based on real-time metrics such as CPU load, memory consumption, queries per second, or any other metric your app exposes.
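
As a rough illustration of metric-based scaling, the sketch below uses the ratio formula described in the Kubernetes Horizontal Pod Autoscaler documentation: desired = ceil(current replicas × current metric / target metric). The notes above do not give a formula, so this is borrowed from there; the real autoscaler also applies a tolerance and minimum/maximum bounds, which are left out. The function name and the sample numbers are made up.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """Scale the replica count in proportion to how far the metric is from its target."""
    return math.ceil(current_replicas * current_metric / target_metric)


# 4 replicas at 90% average CPU with a 60% target: scale out
print(desired_replicas(4, 90, 60))
# 4 replicas at 30% average CPU with a 60% target: scale in
print(desired_replicas(4, 30, 60))
```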

However, Kubernetes may need to move containers around the cluster in the following two circumstances:

  • the worker node they were running on has failed
  • they were evicted from a node to make room for other containers

To ensure services remain available to clients while containers move around, Kubernetes exposes all the containers that provide the same service at a single static IP address, and makes that address known to all applications running in the cluster through environment variables (clients can also look up the service IP through DNS). Clients can therefore always reach the service at a constant IP address, and kube-proxy ensures that connections to the service are load-balanced across all the containers providing it.
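
From the client's side, finding a service through environment variables could look like the sketch below. It relies on the variables Kubernetes injects into containers, named <SERVICE>_SERVICE_HOST and <SERVICE>_SERVICE_PORT (service name upper-cased, dashes turned into underscores). The service name backend, the addresses, and both helper functions are hypothetical, and the round-robin picker is only a stand-in for the load balancing that kube-proxy does; it does not claim to reproduce kube-proxy's actual strategy.

```python
import itertools
import os


def service_address(name, environ=os.environ):
    """Look up a service's static IP and port from Kubernetes-style env variables."""
    prefix = name.upper().replace("-", "_")
    host = environ.get(f"{prefix}_SERVICE_HOST")
    port = environ.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        return None  # the service is not known in this environment
    return host, int(port)


def round_robin(backends):
    """Toy load balancer: hand out the backing containers in turn, forever."""
    return itertools.cycle(backends)


# Simulated environment of a container in the cluster (made-up values).
env = {"BACKEND_SERVICE_HOST": "10.0.0.10", "BACKEND_SERVICE_PORT": "8080"}
print(service_address("backend", env))

picker = round_robin(["10.1.0.4", "10.1.0.7"])
print([next(picker) for _ in range(3)])
```

The client only ever sees the one service address; which container actually answers can change from connection to connection without the client noticing.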

1.3.5 Benefits of using Kubernetes

  • Simplifying application deployment

  • Achieving better utilization of hardware

  • Health checking and self-healing

  • Automatic scaling

  • Simplifying application development


  1. Each network interface belongs to exactly one namespace, but can be moved from one namespace to another.

  2. Different UTS namespaces make processes see different hostnames.


* This blog was last updated on 2024-05-06 15:54