How to build a Kubernetes cluster in Openstack for orchestrating Docker containers

“Kubernetes is a portable, extensible open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available. Google open-sourced the Kubernetes project in 2014. Kubernetes builds upon a decade and a half of experience that Google has with running production workloads at scale, combined with best-of-breed ideas and practices from the community.”

Below is a step-by-step guide to manually building a Kubernetes cluster in Openstack without the extra complexity, instability and constraints of additional products such as Terraform, Kubespray and Ansible. Those tools are certainly valid for large deployments (i.e. hundreds of clusters), but they seem like overkill for a one-off installation.

It should also be noted that Openstack has a container orchestration module called Magnum. It may not be available in your Openstack deployment due to Pike version limitations.

Step-by-step guide

1. Install the Openstack CLI client (openstackclient 3.12) for Pike on a Linux (CentOS) VM (can be anything, including VMs that run in VMware Player, VirtualBox, etc.)

 yum install centos-release-openstack-pike
 yum install python-openstackclient
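
You can confirm which client version was installed with

 openstack --version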

2. Obtain an RC file from the Horizon dashboard to authenticate to Openstack.

Click on your username in the top right corner of the dashboard and choose OpenStack RC File v3. Save it as openstack-env in your home folder and source it. You will be asked for your keystone password, which will be cached in the environment variable OS_PASSWORD, so handle it with care.

 source openstack-env
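
For reference, the v3 RC file generated by Horizon looks roughly like the sketch below; the exact variables depend on your cloud, and the keystone host, project, user and region shown here are placeholders.

 export OS_AUTH_URL=https://<keystone-host>:5000/v3
 export OS_IDENTITY_API_VERSION=3
 export OS_PROJECT_NAME="<project-name>"
 export OS_PROJECT_DOMAIN_ID="default"
 export OS_USER_DOMAIN_NAME="Default"
 export OS_USERNAME="<username>"
 export OS_REGION_NAME="<region>"
 echo "Please enter your OpenStack Password for project $OS_PROJECT_NAME as user $OS_USERNAME: "
 read -sr OS_PASSWORD_INPUT
 export OS_PASSWORD=$OS_PASSWORD_INPUT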

3. Create a two-server HA load balancer for the 3 Kubernetes master node VMs. Note that Openstack's load balancer (part of the Neutron service) may or may not be usable, because the key manager service Barbican may not be installed in your Openstack; Octavia is a better choice where available.

3.1. Generate an RSA key pair for a kubeadmin SSH user, saving the private key as id_rsa and the public key as id_rsa.pub

 groupadd kubeadmin
 useradd -g kubeadmin -m -d /home/kubeadmin kubeadmin
 mkdir /home/kubeadmin/.ssh
 openstack keypair create kubeadmin > /home/kubeadmin/.ssh/id_rsa
 chmod 600 /home/kubeadmin/.ssh/id_rsa
 openstack keypair show --public-key kubeadmin > /home/kubeadmin/.ssh/id_rsa.pub
 chown -R kubeadmin:kubeadmin /home/kubeadmin/.ssh

3.2. Create bootable volumes ds-kube-server-lb-1 and ds-kube-server-lb-2, 16GB each, from the glance image centos-7.6

 openstack image list
 openstack volume create --image centos-7.6 --bootable --size 16 ds-kube-server-lb-1
 openstack volume create --image centos-7.6 --bootable --size 16 ds-kube-server-lb-2
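
The new volumes should be listed as "available" before they are attached to servers

 openstack volume list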

3.3. Create a hard anti-affinity policy to ensure that the haproxy VMs are created on different hardware blades

 openstack server group create --policy anti-affinity ha
 openstack server group list

3.4. Create a flavor project.2c.2r.0d with 2 CPUs, 2GB RAM and 0GB local disk (we will use an external NetApp volume), make it private to your project, and include the tenant_id property to ensure that VMs are created on the hardware blades dedicated to DS, if host aggregates with tenant_id filtering are used instead of availability zones.

 openstack flavor create --disk 0 --ram 2048 --vcpus 2 --project <project-name> --private --property aggregate_instance_extra_specs:filter_tenant_id='<tenant-id>' project.2c.2r.0d

3.5. Create the VMs (ds-kube-server-proxy-1 and ds-kube-server-proxy-2) using the flavor, volume, HA group and ssh key that we just created, attached to a network (the available networks can be listed with the "openstack network list" command).

 openstack server create --flavor project.2c.2r.0d --volume ds-kube-server-lb-1 --key-name kubeadmin --network <network> --hint group=<ha-group-id> ds-kube-server-proxy-1
 openstack server create --flavor project.2c.2r.0d --volume ds-kube-server-lb-2 --key-name kubeadmin --network <network> --hint group=<ha-group-id> ds-kube-server-proxy-2
 openstack server list
 openstack server show ds-kube-server-proxy-1
 openstack server show ds-kube-server-proxy-2

3.6. Note adminPass in the output of the "openstack server show" commands once the servers are built, and ensure that the servers are created on different blades according to the anti-affinity policy
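
If your role exposes the extended server attributes (they are often admin-only), the hosting blade can also be checked directly

 openstack server show ds-kube-server-proxy-1 -c OS-EXT-SRV-ATTR:hypervisor_hostname
 openstack server show ds-kube-server-proxy-2 -c OS-EXT-SRV-ATTR:hypervisor_hostname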

3.7. To verify that both servers are part of the same anti-affinity group, run

 openstack server group show ha

3.8. ssh to the servers with the private key and configure them

 ssh -i /home/kubeadmin/.ssh/id_rsa kubeadmin@ds-kube-server-proxy-1
 ssh -i /home/kubeadmin/.ssh/id_rsa kubeadmin@ds-kube-server-proxy-2

3.9. Extend the LV and then resize the file system to match the size of the volume (the image size for CentOS is 8GB; if your image's root filesystem is XFS rather than ext4, use xfs_growfs instead of resize2fs)

 lvextend -l +100%FREE /dev/mapper/VolGroup00-root
 resize2fs /dev/mapper/VolGroup00-root

3.10. Update CentOS

 yum -y update

3.11. Download the haproxy sources from the Internet because the yum version (1.5.18) is too old; the latest at the time of writing is 1.8.20.

 yum info haproxy|grep Version
 wget http://www.haproxy.org/download/1.8/src/haproxy-1.8.20.tar.gz
 gunzip haproxy-1.8.20.tar.gz
 tar -xf haproxy-1.8.20.tar
 cd haproxy-1.8.20
 yum -y groupinstall "Development Tools"
 yum -y install openssl-devel pcre-devel zlib-devel systemd-devel
 make TARGET=linux2628 USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1

3.12. Install haproxy into /opt/haproxy-1.8.20 (default is /usr/local)

 make install PREFIX=/opt/haproxy-1.8.20 MANDIR=/usr/share/man DOCDIR=/usr/share/doc/haproxy

3.13. Create a convenience symlink and config directory, then configure SSL pass-through on TCP port 6443 for the 3 Kubernetes master nodes

 ln -s /opt/haproxy-1.8.20 /opt/haproxy
 mkdir -p /opt/haproxy/etc
 vi /opt/haproxy/etc/haproxy.conf
 global
     master-worker
     log /run/systemd/journal/syslog local6
 defaults
     mode tcp
     log global
     timeout connect 5000ms
     timeout server 5000ms
     timeout client 5000ms
 frontend kube-endpoint
     bind :6443
     default_backend kube-servers
 backend kube-servers
     balance source
     server ds-kube-server-1 <ip1>:6443 check
     server ds-kube-server-2 <ip2>:6443 check
     server ds-kube-server-3 <ip3>:6443 check

3.14. Configure syslog and logrotate

 vi /etc/rsyslog.conf
 #comment out $OmitLocalLogging on, otherwise rsyslog will not create /run/systemd/journal/syslog unix socket even though the imuxsock module is loaded
 #$OmitLocalLogging on
 local6.*                                                /var/log/haproxy.log
 touch /var/log/haproxy.log
 systemctl restart rsyslog
 vi /etc/logrotate.d/syslog
 /var/log/haproxy.log

3.15. Verify the config

 /opt/haproxy/sbin/haproxy -c -f /opt/haproxy/etc/haproxy.conf

3.16. Create a systemd startup script, enable haproxy daemon and start it

 vi /etc/systemd/system/haproxy.service
 [Unit]
 Description=HAProxy Server
 [Service]
 Restart=on-failure
 ExecStart=/opt/haproxy/sbin/haproxy -f /opt/haproxy/etc/haproxy.conf
 ExecStop=/bin/pkill -SIGTERM haproxy
 ExecReload=/bin/pkill -SIGUSR2 haproxy
 [Install]
 WantedBy=default.target
 systemctl daemon-reload
 systemctl enable haproxy
 systemctl start haproxy
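
Once started, haproxy should be listening on the frontend port even before the master nodes exist; a quick check

 systemctl status haproxy
 ss -ltn | grep 6443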

3.17. Install keepalived on both haproxy VMs and configure it

 wget https://www.keepalived.org/software/keepalived-2.0.16.tar.gz
 gunzip keepalived-2.0.16.tar.gz
 tar -xf keepalived-2.0.16.tar
 cd keepalived-2.0.16
 yum -y install epel-release
 yum -y install openssl-devel libnl3-devel ipset-devel iptables-devel file-devel net-snmp-devel glib2-devel json-c-devel pcre2-devel libnftnl-devel libmnl-devel python-sphinx python-sphinx_rtd_theme
 ./configure --prefix=/opt/keepalived-2.0.16 --datarootdir=/usr/share --enable-snmp --enable-snmp-vrrp --enable-snmp-checker --enable-snmp-rfc --enable-dbus --enable-dbus-create-instance
 make
 make install
 ln -s /opt/keepalived-2.0.16 /opt/keepalived
 mkdir /etc/keepalived
 vi /etc/keepalived/keepalived.conf
 ! Configuration File for keepalived
 global_defs {
   notification_email {
     <email>
   }
   notification_email_from <email>
   smtp_server <smtp-server>
   smtp_connect_timeout 30
   enable_script_security
 }
 vrrp_script chk_haproxy {
    script "/bin/killall -0 haproxy"
    interval 2
 }
 vrrp_instance VI_1 {
    state MASTER #BACKUP on the other haproxy VM
    interface eth0
    virtual_router_id 51
    priority 100 #50 on BACKUP keepalived instance
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass <password>
    }
    virtual_ipaddress {
        <virt-ip>
    }
    track_script {
        chk_haproxy
    }
 }
 chmod 640 /etc/keepalived/keepalived.conf
 vi /etc/sysctl.conf
 net.ipv4.ip_nonlocal_bind = 1
 sysctl net.ipv4.ip_nonlocal_bind=1
 systemctl enable keepalived
 systemctl start keepalived

3.18. On a VM that has the Openstack client installed, run the following commands to add the virtual proxy IP address <virt-ip> (ds-kube-server-proxy) as an allowed address on the network ports of the two proxy servers.

 openstack port list --network <network>
 openstack port set --allowed-address ip-address=<virt-ip> <ID_of_the_proxy-1_server's_ip>
 openstack port set --allowed-address ip-address=<virt-ip> <ID_of_the_proxy-2_server's_ip>
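
To confirm that the allowed address pair was added, inspect each port

 openstack port show <ID_of_the_proxy-1_server's_ip> -c allowed_address_pairs
 openstack port show <ID_of_the_proxy-2_server's_ip> -c allowed_address_pairs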

Note that this virtual IP must be excluded from the DHCP allocation pool of the subnet of <network>. This can be done via the Openstack Horizon dashboard or the CLI (the commands are not shown here and are left as an exercise for the reader).

3.19. Add a DNS A record for ds-kube-server-proxy

 ds-kube-server-proxy IN A <virt-ip>
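
Once DNS has been updated, the record can be checked with, for example, dig (from the bind-utils package)

 dig +short ds-kube-server-proxy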

3.20. Ensure that IP multicast and VRRP traffic (IP protocol 112) are allowed in the Openstack security group assigned to the haproxy servers so that keepalived failover works (in our case it is the default security group)

 openstack security group list --project <project>
 openstack security group show  <default-group-id>
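
If VRRP traffic turns out to be blocked, a rule can be added along the lines of the sketch below. VRRP is IP protocol 112; whether the client accepts the numeric protocol (or the name vrrp) varies between client and Neutron versions, so treat this as an assumption to verify against your Pike deployment, and substitute your subnet CIDR.

 openstack security group rule create --ingress --protocol 112 --remote-ip <subnet-cidr> <default-group-id>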

3.21. Once both nodes are up, shut down haproxy on the MASTER and watch the virtual IP fail over between MASTER and BACKUP using

 systemctl stop haproxy
 ip addr show
 systemctl start haproxy

4. Build 3 Kubernetes master nodes: ds-kube-server-1, ds-kube-server-2, ds-kube-server-3

4.1. Create a flavor matching the Kubernetes requirements for control plane nodes: 2 CPUs, 2GB RAM and no disk, because we use a volume (skip this step if the project.2c.2r.0d flavor was already created in step 3.4)

 openstack flavor create --disk 0 --ram 2048 --vcpus 2 --project <project> --private --property  aggregate_instance_extra_specs:filter_tenant_id='<tenant_id>' project.2c.2r.0d

4.2. Create 3 bootable volumes 16GB each from an image

 openstack volume create --image centos-7.6 --bootable --size 16 ds-kube-server-vol-1
 openstack volume create --image centos-7.6 --bootable --size 16 ds-kube-server-vol-2
 openstack volume create --image centos-7.6 --bootable --size 16 ds-kube-server-vol-3

4.3. Create a soft anti-affinity group, because we have three servers but only 2 blades

 openstack server group create --os-compute-api-version 2.15 --policy soft-anti-affinity soft-ha

4.4. Create 3 servers. Get the group ID of soft-ha by running the "openstack server group list" command

 openstack server create --flavor project.2c.2r.0d --volume ds-kube-server-vol-1 --key-name kubeadmin --network <network> --hint group=<soft-ha-id> ds-kube-server-1
 openstack server create --flavor project.2c.2r.0d --volume ds-kube-server-vol-2 --key-name kubeadmin --network <network> --hint group=<soft-ha-id> ds-kube-server-2
 openstack server create --flavor project.2c.2r.0d --volume ds-kube-server-vol-3 --key-name kubeadmin --network <network> --hint group=<soft-ha-id> ds-kube-server-3

Note that it is possible to migrate a server from one host to another, for HA or other reasons, using this command

 openstack server migrate --live <blade> ds-kube-server-2

4.5. ssh to each master node server and configure it initially the same way as the proxy servers (extend the LV and resize the FS, set the yum repos, update CentOS). See the commands above.

4.6. Install docker runtime

 yum install -y yum-utils
 yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
 yum install -y docker-ce docker-ce-cli containerd.io
 systemctl enable docker

4.7. Docker relies on iptables, and a Kubernetes cluster requires certain ports to be open (see https://kubernetes.io/docs/setup/independent/install-kubeadm/)

 systemctl enable firewalld
 systemctl start firewalld
 firewall-cmd --permanent --direct --add-rule ipv4 filter IN_public_allow 0 -p tcp --dport 2379 -m conntrack --ctstate NEW -j ACCEPT
 firewall-cmd --permanent --direct --add-rule ipv4 filter IN_public_allow 0 -p tcp --dport 2380 -m conntrack --ctstate NEW -j ACCEPT
 firewall-cmd --permanent --direct --add-rule ipv4 filter IN_public_allow 0 -p tcp --dport 6443 -m conntrack --ctstate NEW -j ACCEPT
 firewall-cmd --permanent --direct --add-rule ipv4 filter IN_public_allow 0 -p tcp --dport 10250 -m conntrack --ctstate NEW -j ACCEPT
 firewall-cmd --permanent --direct --add-rule ipv4 filter IN_public_allow 0 -p tcp --dport 10251 -m conntrack --ctstate NEW -j ACCEPT
 firewall-cmd --permanent --direct --add-rule ipv4 filter IN_public_allow 0 -p tcp --dport 10252 -m conntrack --ctstate NEW -j ACCEPT
 firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -s 224.0.0.0/4 -j ACCEPT
 firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -s 224.0.0.0/4 -d 224.0.0.0/4 -j ACCEPT
 firewall-cmd --permanent --direct --add-rule ipv4 filter OUTPUT 0 -d 224.0.0.0/4 -j ACCEPT
 firewall-cmd --reload
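
The resulting direct rules can be reviewed with

 firewall-cmd --permanent --direct --get-all-rules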

4.8. Docker should be configured with the systemd cgroup driver rather than cgroupfs, otherwise Kubernetes complains (see https://kubernetes.io/docs/setup/cri/)

 mkdir /etc/docker
 vi /etc/docker/daemon.json
 {
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
   "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": ["overlay2.override_kernel_check=true"],
  "registry-mirrors": ["<url of the docker mirror>"]
 }

 systemctl start docker
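
After Docker starts, confirm that the systemd cgroup driver took effect

 docker info | grep -i 'cgroup driver'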

5. Install Kubernetes on the master nodes

 vi /etc/yum.repos.d/kubernetes.repo
 [kubernetes]
 name=Kubernetes
 baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
 enabled=1
 gpgcheck=1
 repo_gpgcheck=1
 gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
 exclude=kube*
 yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
 systemctl enable kubelet

5.1. Disable swap - this is required by Kubernetes to run the installation

 swapoff -a
 lvremove -Ay /dev/VolGroup00/swap
 lvextend -l +100%FREE /dev/VolGroup00/root
 resize2fs /dev/mapper/VolGroup00-root
 vi /etc/fstab
 #remove (or comment out) the swap entry so the system does not try to activate it on boot
 vi /etc/default/grub
 ##GRUB_CMDLINE_LINUX="nofb splash=quiet crashkernel=auto rd.lvm.lv=VolGroup00/root rd.lvm.lv=VolGroup00/swap rhgb quiet"
 GRUB_CMDLINE_LINUX="nofb splash=quiet crashkernel=auto rd.lvm.lv=VolGroup00/root rhgb quiet"
 cp /etc/grub2.cfg /etc/grub2.cfg.bak
 grub2-mkconfig >/etc/grub2.cfg
 vi /etc/sysctl.conf
 vm.swappiness = 0
 sysctl vm.swappiness=0
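
After the next reboot (or right after swapoff -a), verify that no swap is active; swapon -s prints nothing when swap is fully disabled

 swapon -s
 free -m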

5.2. Create a kubeadm config file

 vi kubeadm-config.yaml
 apiVersion: kubeadm.k8s.io/v1beta1
 kind: ClusterConfiguration
 kubernetesVersion: 1.14.1
 apiServerCertSANs:
 - "ds-kube-server-proxy"
 controlPlaneEndpoint: "ds-kube-server-proxy:6443"
 networking:
   podSubnet: "10.244.0.0/16"
 #imageRepository: docker.com/google-containers

5.3. Generate the certs

 kubeadm init phase certs all --config=kubeadm-config.yaml

5.4. Comment out the two lines in the kubeadm-config.yaml file that were only needed to include the proxy DNS name in the certificate Subject Alternative Names (apiServerCertSANs)

 #apiServerCertSANs:
 #- "ds-kube-server-proxy"

5.5. Initialize the first master node

 kubeadm init --config=kubeadm-config.yaml --experimental-upload-certs --skip-phases certs
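
Save the join commands that kubeadm init prints for the other control-plane nodes and for workers. It also suggests kubeconfig setup along these lines, which lets kubectl be run as a regular user on this node

 mkdir -p $HOME/.kube
 cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
 chown $(id -u):$(id -g) $HOME/.kube/config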

5.6. Download flannel pod network add-on config file from https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml and apply it

 wget https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
 export KUBECONFIG=/etc/kubernetes/admin.conf
 kubectl apply -f kube-flannel.yml
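
Give the network add-on a minute and check that the flannel (and eventually coredns) pods reach the Running state

 kubectl get pods -n kube-system -o wide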

5.7. Repeat the steps above for ds-kube-server-2 and ds-kube-server-3, except instead of running “kubeadm init”, run

 kubeadm join ds-kube-server-proxy:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> --experimental-control-plane --certificate-key <the-key>

5.8. If the token (the --token option in the above command) has expired, it can be re-created with

 kubeadm token create --print-join-command --ttl 0
 kubeadm token list

5.9. To generate the sha256 hash of the CA cert (the value for --discovery-token-ca-cert-hash), run

 openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

5.10. To see all the nodes in the Kubernetes cluster, run

 kubectl get nodes

5.11. Configure permissions that allow bootstrapping nodes to create CSRs, access keys and certificates, and have CSRs approved and certs renewed automatically, as described in https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/

 vi worker_kubelet_csr_creation_authorization.yml
 # enable bootstrapping nodes to create CSR
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
   name: create-csrs-for-bootstrapping
 subjects:
 - kind: Group
   name: system:bootstrappers
   apiGroup: rbac.authorization.k8s.io
 roleRef:
   kind: ClusterRole
   name: system:node-bootstrapper
   apiGroup: rbac.authorization.k8s.io
 kubectl apply -f worker_kubelet_csr_creation_authorization.yml
 vi approve_worker_csr.yml
 # Approve all CSRs for the group "system:bootstrappers"
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
   name: auto-approve-csrs-for-group
 subjects:
 - kind: Group
   name: system:bootstrappers
   apiGroup: rbac.authorization.k8s.io
 roleRef:
   kind: ClusterRole
   name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
   apiGroup: rbac.authorization.k8s.io
 kubectl apply -f approve_worker_csr.yml
 vi renew_worker_certs.yml
 # Approve renewal CSRs for the group "system:nodes"
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
   name: auto-approve-renewals-for-nodes
 subjects:
 - kind: Group
   name: system:nodes
   apiGroup: rbac.authorization.k8s.io
 roleRef:
   kind: ClusterRole
   name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
   apiGroup: rbac.authorization.k8s.io
 kubectl apply -f renew_worker_certs.yml

5.12. Verify the permissions

 kubectl get ClusterRoleBinding create-csrs-for-bootstrapping -o wide
 kubectl get ClusterRoleBinding auto-approve-csrs-for-group -o wide
 kubectl get ClusterRoleBinding auto-approve-renewals-for-nodes -o wide

6. Create 3 worker (or minion) nodes and join them to the Kubernetes cluster

6.1. Create a flavor with 8 CPUs, 100GB RAM and 0GB disk

 openstack flavor create --disk 0 --ram 102400 --vcpus 8 --project <project> --private --property aggregate_instance_extra_specs:filter_tenant_id='<tenant_id>' project.8c.100r.0d

6.2. Create 3 bootable volumes, 32GB each, from an image

 openstack volume create --image centos-7.6 --bootable --size 32 ds-kube-minion-vol-1
 openstack volume create --image centos-7.6 --bootable --size 32 ds-kube-minion-vol-2
 openstack volume create --image centos-7.6 --bootable --size 32 ds-kube-minion-vol-3

6.3. Create 3 worker servers using the flavor, volumes, ssh key and soft anti-affinity group, attached to both the <network> and <nfs_network> networks (the latter for NFS-mounted storage)

 openstack server create --flavor project.8c.100r.0d --volume ds-kube-minion-vol-1 --key-name kubeadmin --network <network> --network <nfs_network> --hint group=<soft-ha-id> ds-kube-minion-1
 openstack server create --flavor project.8c.100r.0d --volume ds-kube-minion-vol-2 --key-name kubeadmin --network <network> --network <nfs_network> --hint group=<soft-ha-id> ds-kube-minion-2
 openstack server create --flavor project.8c.100r.0d --volume ds-kube-minion-vol-3 --key-name kubeadmin --network <network> --network <nfs_network> --hint group=<soft-ha-id> ds-kube-minion-3

6.4. ssh to each worker node and configure it like the other Kubernetes nodes (see above), except for the firewall rules; open the worker ports and join the cluster with the following commands

 firewall-cmd --permanent --direct --add-rule ipv4 filter IN_public_allow 0 -p tcp --dport 10250 -m conntrack --ctstate NEW -j ACCEPT
 firewall-cmd --permanent --direct --add-rule ipv4 filter IN_public_allow 0 -p tcp --dport 30000:32767 -m conntrack --ctstate NEW -j ACCEPT
 firewall-cmd --reload
 kubeadm join ds-kube-server-proxy:6443 --token <token> --discovery-token-ca-cert-hash <hash>

6.5. To list all the nodes, run the following on one of the masters, or copy the /etc/kubernetes/admin.conf file from a master into ~/.kube/config on another machine

 kubectl get nodes
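
To run kubectl from another machine instead, copy the admin kubeconfig from one of the masters; note that admin.conf is owned by root, so the kubeadmin user in this sketch is assumed to have been given read access (or use a root copy)

 mkdir -p ~/.kube
 scp kubeadmin@ds-kube-server-1:/etc/kubernetes/admin.conf ~/.kube/config
 kubectl get nodes -o wide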