Summary
This section describes the components that make up Kubernetes (K8s) and explains how Kubernetes works.
References
The purpose of this document is to explain what you need to know in order to read the referenced documents.
Basic Configuration of K8s
The official documentation includes a diagram of the overall cluster configuration (not reproduced here).
When the documentation simply says "Node", it means a Worker Node. The special node that runs the critical API server at the core of the K8s system is sometimes called the Control Plane Node to distinguish it. In some clusters, a few of the worker nodes double as control plane nodes.
In the Kubernetes cluster used in this SCCP, the Control Plane functionality runs on some of the Worker Nodes, so our Control Plane Nodes are not dedicated machines.
You can check the nodes and their types with the following command.
The Control Plane node, where the api-server that is the heart of the system runs, carries the role control-plane (since v1.20.x).
It used to be called the Master node; like many other projects, Kubernetes has been replacing terms such as master and slave in its documentation.
$ kubectl get node
NAME       STATUS   ROLES           AGE      VERSION
u109ls01   Ready    control-plane   2y358d   v1.30.4
u109ls02   Ready    control-plane   2y358d   v1.30.4
u109ls03   Ready    <none>          2y358d   v1.30.4
u109ls04   Ready    <none>          2y358d   v1.30.4
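Incidentally, the ROLES column is derived from a node label. To list only the control plane nodes, a label selector can be used (a minimal sketch, assuming the standard node-role.kubernetes.io/control-plane label; output shown for this cluster):

$ kubectl get node -l node-role.kubernetes.io/control-plane -o name
node/u109ls01
node/u109ls02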
Node (Worker Node) configuration
To run containers on the Kubernetes system, each node runs a Container Runtime Interface (CRI) compliant container engine such as Docker, containerd, or CRI-O. In addition, Kubernetes-specific components such as kubelet and kube-proxy are running. Each node therefore hosts:
- Virtualized networking
- CRI-compliant container engine (Docker, containerd, CRI-O, etc.)
- kubelet process
- kube-proxy process
The container engine, kubelet, and kube-proxy are always running on every node and cooperate with the API server (kube-apiserver, described later) to run user pods and other workloads.
Virtualized network
Although not explicitly shown in the diagrams in the official guide, the heart of Kubernetes is network virtualization.
The initial design is documented on GitHub.
The document lists four basic features.
- Communication between containers (inside a pod)
- Communication between Pods
- Communication between Services and Pods
- Communication between the outside world and the cluster
In the environment of Seminar Room 10, Calico is used to build a virtualized network.
Calico operates at Layer 3 and uses BGP to advertise the location of the /32 IP addresses assigned on each node. Other solutions, such as Flannel, use VXLAN technology to provide virtualization at Layer 2, but there are performance concerns. Calico, which works at Layer 3, is considered the better choice if Layer-2 functionality is not required.
Note that the default Calico network backend mode has changed from ipip to vxlan. Please check the GitHub issues page for details.
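One way to observe this Layer-3 approach on a node is the kernel routing table: routes to pods on other nodes are installed by Calico's BGP daemon (bird), while local pods get per-address routes through their cali* interfaces. A sketch with addresses modeled on this cluster; your table will differ:

## On a cluster node: pod routes learned via BGP carry "proto bird"
$ ip route | grep -E 'bird|cali' | head
10.233.112.0/24 via 192.168.100.53 dev enp1s0 proto bird
10.233.113.131 dev cali1223486b3a4 scope link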
CRI-compliant container engine (Runtime)
The official Kubernetes documentation lists the following three CRI runtimes.
- containerd
- CRI-O
- Docker
The Protobuf API described in "CRI: the Container Runtime Interface" lists the services implemented by CRI-compliant container engines.
The basic role is to provide the following functions described in the service definition (a crictl sketch follows the list):
- Runtime Service: starting, stopping, and deleting containers (pod operations)
- Image Service: pulling, storing, and deleting container images (container-image management)
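The crictl tool talks to these two services directly over the CRI socket, which makes the split visible. A minimal sketch, assuming containerd's socket at its usual path (adjust --runtime-endpoint for your engine):

## RuntimeService: list running containers
$ sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps
## ImageService: list stored images
$ sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock images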
Container engines provide this defined set of services, but the following behaviors may differ depending on which engine you choose.
- Where container images are pulled from (not necessarily docker.io when the registry name is omitted)
- Where container images are stored (not necessarily under /var/lib/docker/)
- How to configure a connection to your own registry that uses your own TLS/SSL CA certificate file (the CA file is placed in different locations)
The Kubernetes team used Docker as the container runtime at first. However, Docker (dockershim) support was deprecated with the v1.20 release and later removed in v1.24. We now use containerd as the container runtime. For more details, please see the official blog.
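You can confirm which runtime each node actually uses: kubectl get node -o wide includes a CONTAINER-RUNTIME column (columns abbreviated below, and the version string is illustrative):

$ kubectl get node -o wide
NAME       STATUS   ROLES           ...   CONTAINER-RUNTIME
u109ls01   Ready    control-plane   ...   containerd://1.7.x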
Essential Processes of Kubernetes Node
kubelet
kubelet works with the Control Plane's api-server (kube-apiserver) to manage pod (container) activities.
On each node, it exists as the following process:
## ps auxww | grep kubelet output
root 151957 7.4 0.6 3357840 158592 ? Ssl May31 5958:14 /usr/local/bin/kubelet --logtostderr=true --v=2 --node-ip=192.168.100.51 --hostname-override=u109ls01 --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/etc/kubernetes/kubelet-config.yaml --kubeconfig=/etc/kubernetes/kubelet.conf --pod-infra-container-image=k8s.gcr.io/pause:3.3 --runtime-cgroups=/systemd/system.slice --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin
kube-proxy
kube-proxy controls pod communication (network).
It runs as a Docker container on each node.
## docker ps | grep kube-proxy output
39ff1f5995bd 9d368f4517bb "/usr/local/bin/kube..." 5 days ago Up 5 days k8s_kube-proxy_kube-proxy-pg7d4_kube-system_f95cad6f-482b-4c52-91f1-a6759cbe7a0b_2
622fa3ac83bd k8s.gcr.io/pause:3.3 "/pause" 5 days ago Up 5 days k8s_POD_kube-proxy-pg7d4_kube-system_f95cad6f-482b-4c52-91f1-a6759cbe7a0b_2
Since kube-proxy is running as a pod, you can see how it works from kubectl.
$ kubectl -n kube-system get pod -l k8s-app=kube-proxy -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-proxy-56j9f 1/1 Running 3 55d 192.168.100.54 u109ls04 <none> <none>
kube-proxy-gt7gg 1/1 Running 2 55d 192.168.100.52 u109ls02 <none> <none>
kube-proxy-hlkn8 1/1 Running 2 55d 192.168.100.53 u109ls03 <none> <none>
kube-proxy-pg7d4 1/1 Running 2 55d 192.168.100.51 u109ls01 <none> <none>
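In its default iptables mode, kube-proxy implements Service IPs as NAT rules on every node. With root on a node, you can peek at the chain it maintains (a sketch; the rule list depends on your Services):

## Service virtual IPs are DNAT targets in the KUBE-SERVICES chain
$ sudo iptables -t nat -L KUBE-SERVICES -n | head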
For communication, the IP address used by a Load-Balancer is associated with the MAC address of one of the nodes.
Other IP addresses used within K8s, such as ClusterIPs, are associated with locally administered MAC addresses (x[26ae]:xx:xx:xx:xx:xx) and assigned to one of the nodes.
## K8s internal network as observed from the outside
$ arp -n
Address HWtype HWaddress Flags Mask Iface
192.168.100.160 ether 00:1b:21:bc:0c:3a C ens1
192.168.100.52 ether 00:1b:21:bc:0c:89 C ens1
192.168.100.53 ether 00:1b:21:bc:0c:3a C ens1
192.168.100.54 ether 00:1b:21:bc:0c:3b C ens1
...
## K8s internal network that can be observed from the inside
$ arp -n
Address HWtype HWaddress Flags Mask Iface
192.168.100.52 ether 00:1b:21:bc:0c:89 C enp1s0
192.168.100.53 ether 00:1b:21:bc:0c:3a C enp1s0
192.168.100.54 ether 00:1b:21:bc:0c:3b C enp1s0
10.233.113.131 ether 1a:73:20:a4:cd:b3 C cali1223486b3a4
10.233.113.140 ether 72:e9:69:66:14:dc C cali79c5fc4a9e9
...
K8s System Components (Control Plane)
The main body of the Kubernetes system runs as a Control Plane.
- api-server
- kube-scheduler
- etcd
- kube-controller-manager (Controller Manager)
- (optional) Cloud Controller Manager
All of these components except etcd are running under the control of the container engine as pods on the Control Plane Node.
For the components running as pods, you can use kubectl to check which server they are running on.
$ kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-8b5ff5d58-rr4jp 1/1 Running 1 6d1h 192.168.100.54 u109ls04 <none> <none>
calico-node-2cm5r 1/1 Running 8 171d 192.168.100.53 u109ls03 <none> <none>
calico-node-5x5pr 1/1 Running 10 171d 192.168.100.51 u109ls01 <none> <none>
calico-node-7v65s 1/1 Running 10 171d 192.168.100.52 u109ls02 <none> <none>
calico-node-l7hqn 1/1 Running 7 171d 192.168.100.54 u109ls04 <none> <none>
coredns-85967d65-7g7fb 1/1 Running 0 6d1h 10.233.112.58 u109ls03 <none> <none>
coredns-85967d65-hbtjj 1/1 Running 3 55d 10.233.105.203 u109ls04 <none> <none>
dns-autoscaler-5b7b5c9b6f-44jh8 1/1 Running 0 6d1h 10.233.105.9 u109ls04 <none> <none>
kube-apiserver-u109ls01 1/1 Running 20 110d 192.168.100.51 u109ls01 <none> <none>
kube-apiserver-u109ls02 1/1 Running 17 109d 192.168.100.52 u109ls02 <none> <none>
kube-controller-manager-u109ls01 1/1 Running 8 171d 192.168.100.51 u109ls01 <none> <none>
kube-controller-manager-u109ls02 1/1 Running 7 171d 192.168.100.52 u109ls02 <none> <none>
kube-proxy-56j9f 1/1 Running 3 55d 192.168.100.54 u109ls04 <none> <none>
kube-proxy-gt7gg 1/1 Running 2 55d 192.168.100.52 u109ls02 <none> <none>
kube-proxy-hlkn8 1/1 Running 2 55d 192.168.100.53 u109ls03 <none> <none>
kube-proxy-pg7d4 1/1 Running 2 55d 192.168.100.51 u109ls01 <none> <none>
kube-scheduler-u109ls01 1/1 Running 8 171d 192.168.100.51 u109ls01 <none> <none>
kube-scheduler-u109ls02 1/1 Running 7 171d 192.168.100.52 u109ls02 <none> <none>
metrics-server-7c5f68c54d-zrtgl 2/2 Running 1 6d1h 10.233.105.248 u109ls04 <none> <none>
nodelocaldns-9pz2w 1/1 Running 7 171d 192.168.100.54 u109ls04 <none> <none>
nodelocaldns-bzhwn 1/1 Running 11 171d 192.168.100.51 u109ls01 <none> <none>
nodelocaldns-nsgk7 1/1 Running 12 171d 192.168.100.53 u109ls03 <none> <none>
nodelocaldns-z44sj 1/1 Running 12 171d 192.168.100.52 u109ls02 <none> <none>
etcd and kubelet, which run directly under OS management, do not appear here.
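On a control plane node, those two can be checked through systemd instead (a sketch; unit names assume a setup like this cluster's, where etcd runs as an OS service):

## On a control plane node
$ systemctl status etcd --no-pager
$ systemctl status kubelet --no-pager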
api-server
This is the server component that provides the Kubernetes API; it communicates with the kubelet running on each Node and with the kubectl client.
The received content is stored in etcd.
The component we always communicate with directly is the api-server; if this function is lost, the cluster can no longer be operated.
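Since kubectl is only an API client, you can also issue a raw request to confirm the api-server is alive (read-only endpoints; /healthz simply answers ok):

$ kubectl get --raw /healthz
ok
$ kubectl get --raw /version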
kube-scheduler
This is the component that decides which Node a newly registered Pod object will run on.
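Its decisions can be observed as Events with the reason Scheduled (the output shape below is illustrative):

$ kubectl get events --field-selector reason=Scheduled
LAST SEEN   TYPE     REASON      OBJECT        MESSAGE
2m          Normal   Scheduled   pod/example   Successfully assigned default/example to u109ls03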
etcd
etcd is an open-source, distributed key-value (NoSQL) database. It is used in many projects besides Kubernetes.
Here etcd is not a pod; on Ubuntu it is managed by systemd and runs as an ordinary OS server process.
The etcdctl command can be used to check what data is stored in etcd. Although it must be run on one of the servers, information on the namespace metallb-system can be checked as follows:
## Running on 192.168.100.51-54
$ sudo etcdctl --endpoints https://192.168.100.51:2379 --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/member-u109ls01.pem --key=/etc/ssl/etcd/ssl/member-u109ls01-key.pem get /registry/namespaces/metallb-system
/registry/namespaces/metallb-system
k8s
v1 Namespace
metallb-system "*$50ded0f3-600b-4433-a4ac-0adc17a50f192ZB
appmetallbb
0kubectl.kubernetes.io/last-applied-configurationx{"apiVersion": "v1", "kind": "Namespace", "metadata":{"annotatio
ns":{}, "labels":{"app": "metallb"}, "name": "metallb-system"}}
z
kubectl-client-side-applyUpdatevFieldsV1:.
{"f:metadata":{"f:annotations":{"." :{}, "f:kubectl.kubernetes.io/last-applied-configuration":{}}, "f:labels":{".
":{}, "f:app":{}}, "f:status":{"f:phase":{}}
kubernetes
Active"
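To browse which keys exist without decoding the binary values, the same connection flags can be combined with --prefix and --keys-only (a sketch; run it on one of the etcd servers as above):

$ sudo etcdctl --endpoints https://192.168.100.51:2379 \
    --cacert=/etc/ssl/etcd/ssl/ca.pem \
    --cert=/etc/ssl/etcd/ssl/member-u109ls01.pem \
    --key=/etc/ssl/etcd/ssl/member-u109ls01-key.pem \
    get /registry/namespaces/ --prefix --keys-only | head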
kube-controller-manager
A pod called kube-controller-manager is running; it implements the mechanism known as a "Controller".
Controller objects can also be created on your own, but this component implements the several Controller objects required to run Kubernetes itself.
The official guide lists the following four, but it also controls Deployment and StatefulSet objects.
- Node Controller: detects and responds when a Node goes up or down.
- Replication Controller: maintains the appropriate number of Pods.
- Endpoints Controller: connects Pods to Services.
- ServiceAccount/Token Controller: creates the default account and API access token when a new Namespace is created.
You are usually not aware of the last one, the ServiceAccount/Token Controller, but it stores the necessary information as Secret objects.
$ kubectl -n $(id -un) get secret
The default-token-xxxxx at the top of the output is the Secret object created by the ServiceAccount/Token Controller.
NAME TYPE DATA AGE
default-token-4pmsl kubernetes.io/service-account-token 3 109d
objectstore Opaque 4 109d
ssh-auhorized-keys Opaque 1 13d
ssh-host-keys Opaque 3 13d
$ kubectl -n $(id -un) get secret default-token-4pmsl -o yaml
As shown below, token: ..., kubernetes.io/service-account.name: default, and related information are registered.
apiVersion: v1
data:
  ca.crt: ....
  namespace: eWFzdS1hYmU=
  token: ....
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: default
    kubernetes.io/service-account.uid: a37638b6-917e-41d0-b14b-7ba4eac7889c
  creationTimestamp: "2021-04-07T02:53:50Z"
  ....
The pod of kube-controller-manager can be checked with the following operation.
$ kubectl -n kube-system get pod -l component=kube-controller-manager -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-controller-manager-u109ls01 1/1 Running 8 170d 192.168.100.51 u109ls01 <none> <none>
kube-controller-manager-u109ls02 1/1 Running 7 170d 192.168.100.52 u109ls02 <none> <none>
Other K8s components
Other components running include a DNS server and the network controller required to run Calico.
Customizing K8s
In the Kubernetes system, the api-server is the hub that communicates with kubelet, with the client kubectl, and with kube-controller-manager, which implements the Controller objects. If you can communicate with this api-server, you can customize the system's behavior.
Kubernetes has flexible mechanisms for extensions, and we will discuss the following mechanisms here.
- Controller
- Custom Resource Definition (CRD)
CRD and Controller are separate mechanisms, but they are usually used together to extend the system.
Examples of customization
For example, the following project introduces its own Operator mechanism to simplify system implementation.
- RabbitMQ Cluster Operator for Kubernetes (Source Code: GitHub rabbitmq/cluster-operator)
- Apache Solr Operator (Source Code: GitHub apache/solr-operator)
As you can see on GitHub, the programming language used by these projects is Go. Kubernetes provides client-go as a library for communicating with the API, and the other libraries are also based on the Go language.
If you are creating systems-management applications, not just for Kubernetes, you should aim to learn not only scripting languages but also C and Go. It is useful to be able to read code, make the necessary modifications, and compile it.
The Go language is likely to be needed more and more for customization like this. It also avoids problems common with C or Perl, where different authors have different styles, behavior varies across OSes, and required shared libraries fail to work when a program is copied to the production environment. For these reasons, we expect more applications to be written in Go, and it is well suited to slightly more complex utility programming.
client-go includes samples of external commands, like kubectl, that communicate with the api-server.
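For instance, the repository ships a runnable out-of-cluster example that lists pods using your kubeconfig. A sketch, assuming the upstream repository layout (the binary name list-pods is arbitrary):

$ git clone https://github.com/kubernetes/client-go.git
$ cd client-go/examples/out-of-cluster-client-configuration
$ go build -o list-pods .
$ ./list-pods    ## reads ~/.kube/config by default and prints the pod count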
To actually create your own Operator, client-go alone is not enough, so the following framework is provided as a tool to assist you.
- code-generator (aka the Kubernetes Way) (GitHub kubernetes/code-generator)
- Kubebuilder (GitHub kubernetes-sigs/kubebuilder) (SIGs: Special Interest Groups)
- Operator SDK (hosted by Red Hat, Inc.)
Choose one of these to build your own Operator using CRD and Controller.
Controller
The Controller object is a program that manages a Resource and communicates with the api-server.
Already registered resources have corresponding controllers, and the kube-controller-manager manages basic objects such as Deployment, ReplicaSet, etc.
The end result is a Docker container. If you check the Dockerfile of the RabbitMQ Operator and others, as excerpted below, you will see a simple container image that merely copies the compiled executable file and runs it.
## Excerpt of the main body of the Dockerfile, just copy the manager command and run
FROM scratch
ARG GIT_COMMIT
LABEL GitCommit=$GIT_COMMIT
WORKDIR /
COPY --from=builder /workspace/manager .
COPY --from=etc-builder /etc/passwd /etc/group /etc/
COPY --from=etc-builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
USER 1000:1000
ENTRYPOINT ["/manager"]
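Building FROM scratch works here because a statically linked Go binary needs no OS userland: only the passwd/group files and CA certificates are copied in, which keeps the image small and its attack surface minimal.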
Basic controller behavior
The Controller object is simply a command that communicates with the api-server, but it has a mechanism for constantly monitoring the system state and acting quickly when the state changes.
This mechanism is implemented by each controller separately from Kubernetes itself, and various frameworks exist to support writing this common machinery. The frameworks also employ techniques to reduce the load on the api-server, such as incremental updates backed by an in-memory cache.
This behavior is called the Reconciliation Loop; see the reference material "Kubernetes Controller starting from scratch" (https://speakerdeck.com/govargo/under-the-kubernetes-controller-36f9b71b-9781-4846-9625-23c31da93014?slide=5).
For this to work, the design is restricted to one controller working in concert with one resource.
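The monitoring half of the loop is built on the Kubernetes watch API; you can observe the same event stream that controllers consume with kubectl (runs until interrupted; output shape illustrative):

$ kubectl get pods --watch --output-watch-events
EVENT   NAME          READY   STATUS    RESTARTS   AGE
ADDED   example-pod   1/1     Running   0          5m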
Custom Resource Definition (CRD)
Any CRD can be registered with Kubernetes via kubectl. As a sample, let’s check what kind of CRDs RabbitMQ’s Operator has registered.
The definition is long, so you can pipe it to less, save it to a file and inspect it in an editor, or view it in a web browser.
$ curl -L "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml" | less
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.6.0
  labels:
    app.kubernetes.io/component: rabbitmq-operator
    app.kubernetes.io/name: rabbitmq-cluster-operator
    app.kubernetes.io/part-of: rabbitmq
  name: rabbitmqclusters.rabbitmq.com
From this much, you can already see that the file was generated with Kubebuilder's controller-gen command (controller-gen.kubebuilder.io/version: v0.6.0).
Resource definitions follow from here.
spec:
  group: rabbitmq.com
  names:
    categories:
    - all
    kind: RabbitmqCluster
    listKind: RabbitmqClusterList
    plural: rabbitmqclusters
    shortNames:
    - rmq
    singular: rabbitmqcluster
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - jsonPath: .status.conditions[?(@.type == 'AllReplicasReady')].status
      name: AllReplicasReady
      type: string
    - jsonPath: .status.conditions[?(@.type == 'ReconcileSuccess')].status
      name: ReconcileSuccess
      type: string
    - jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    name: v1beta1
    schema:
      openAPIV3Schema:
        ....
Here, in the kind: RabbitmqCluster section, we see that it is the definition of a new RabbitmqCluster resource.
It is quite a challenge to read a file generated by a tool, but if you follow the --- separators, you will see that this file contains the following definitions (a quick way to count them appears after the list).
- Namespace
- CustomResourceDefinition
- ServiceAccount
- Role
- ClusterRole
- RoleBinding
- ClusterRoleBinding
- Deployment
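A quick way to confirm this structure without reading the whole file is to count the --- document separators (a rough sketch; the count changes between releases):

$ curl -sL "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml" | grep -c '^---'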
This file is applied with kubectl apply -f, and then the results are checked.
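Applying it typically requires administrator privileges, since the manifest contains cluster-scoped resources; the command would look like this:

$ kubectl apply -f "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml"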
$ kubectl -n rabbitmq-system get all
The result of this command is as follows
NAME READY STATUS RESTARTS AGE
pod/rabbitmq-cluster-operator-5b4b795998-sfxmm 1/1 Running 0 9m32s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/rabbitmq-cluster-operator 1/1 1 1 9m32s
NAME DESIRED CURRENT READY AGE
replicaset.apps/rabbitmq-cluster-operator-5b4b795998 1 1 1 9m32s
Error from server (Forbidden): rabbitmqclusters.rabbitmq.com is forbidden: User "yasu-abe@u-aizu.ac.jp" cannot list resource "rabbitmqclusters" in API group "rabbitmq.com" in the namespace "rabbitmq-system"
An error appears at the end, but we will ignore it for now; apart from the error, the output is the same even with administrator privileges.
To create a RabbitmqCluster object corresponding to the registered CRD, a YAML file like the following is accepted.
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: definition
spec:
  replicas: 3
  persistence:
    storageClassName: rook-ceph-block
    storage: 20Gi
  service:
    type: LoadBalancer
    annotations:
      metallb.universe.tf/address-pool: rabbitmq-pool
After running kubectl -n rabbitmq-system apply -f on this file and checking the status with administrator privileges, you should see something like this
$ kubectl -n rabbitmq-system get all
NAME READY STATUS RESTARTS AGE
pod/definition-server-0 0/1 Init:0/1 0 30s
pod/definition-server-1 0/1 Init:0/1 0 30s
pod/definition-server-2 0/1 Init:0/1 0 29s
pod/rabbitmq-cluster-operator-5b4b795998-sfxmm 1/1 Running 0 29m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/definition LoadBalancer 10.233.53.4 <pending> 5672:32533/TCP,15672:30224/TCP,15692:32648/TCP 31s
service/definition-nodes ClusterIP None <none> 4369/TCP,25672/TCP 32s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/rabbitmq-cluster-operator 1/1 1 1 29m
NAME DESIRED CURRENT READY AGE
replicaset.apps/rabbitmq-cluster-operator-5b4b795998 1 1 1 29m
NAME READY AGE
statefulset.apps/definition-server 0/3 30s
NAME ALLREPLICASREADY RECONCILESUCCESS AGE
rabbitmqcluster.rabbitmq.com/definition False Unknown 32s
This shows how the Operator, given the registered RabbitmqCluster object, creates the necessary StatefulSet and Service objects.
The best way to see how it actually works is to check the code on GitHub.