How to build postgres HA cluster in K8s

One of the python development tools is Eclipse Che that relies on Keycloak for authentication, which in turn requires a postgres database. All three are bundled in k8s operator and are not-functional because of complexity of each product. Deploying each of the three products separately helps to avoid the complexity and is much more transparent. This article focuses on postgres HA cluster deployment in Kubernetes.

 FROM centos:7.6.1810
 LABEL "description"="PostgreSQL Database Server 12"
 LABEL "version"="12.2-2"
 RUN rm -rf /etc/yum.repos.d/*
 COPY system.repo /etc/yum.repos.d/
 RUN yum -y update
 RUN yum -y install https://yum.postgresql.org/12/redhat/rhel-7-x86_64/postgresql12-libs-12.2-2PGDG.rhel7.x86_64.rpm
 RUN yum -y install https://yum.postgresql.org/12/redhat/rhel-7-x86_64/postgresql12-12.2-2PGDG.rhel7.x86_64.rpm
 RUN yum -y install https://yum.postgresql.org/12/redhat/rhel-7-x86_64/postgresql12-server-12.2-2PGDG.rhel7.x86_64.rpm
 RUN yum -y clean all
 RUN mkdir -p /data/postgres && chown postgres:postgres /data/postgres
 VOLUME /data
 USER postgres
 #CMD ["sh", "-c", "/usr/pgsql-12/bin/initdb -D /data/postgres; /usr/pgsql-12/bin/postmaster -D /data/postgres -i -h 0.0.0.0 --logging_collector=FALSE --ssl_ca_file=<CA.pem> --ssl_cert_file=<ssl.crt> --ssl_ciphers=HIGH --ssl_key_file=<ssl.key> --ssl_min_protocol_version=TLSv12 --unix_socket_directories="]
 CMD ["sh", "-c", "/usr/pgsql-12/bin/initdb -D /data/postgres; /usr/pgsql-12/bin/postmaster -D /data/postgres -i -h 0.0.0.0 --logging_collector=FALSE --unix_socket_directories="]
  
  
 docker build -t=postgresql:12.2-2 .
 docker tag postgresql:12.2-2 nexus:8443/repository/docker-repo/postgresql:12.2-2
 docker login nexus:8443
 docker push
 docker logout nexus:8443

To accomplish HA, we use k8s !StatefulSet with 2 node-antiaffinity replicas and a headless Service plus Openstack Cinder persistent volumeClaimTemplates. !InitContainer will help running initdb and pg_basebackup that is necessary to establish the replication. Since the focus is on HA rather than performance, we use synchronous streaming replication waiting for standby server to acknowledge a transaction (‘remote_apply’). As usual we keep config files (postgresql.conf and pg_hba.conf) and initdb.sh script in k8s !ConmigMaps and postgres password (.pgpass and pwfile files) in k8s Secrets. Here is the complete yaml template (postgres.yaml) of postgres HA cluster.

 apiVersion: v1
 kind: Service
 metadata:
   name: postgres
 spec:
   type: "ClusterIP"
   clusterIP: None
   selector:
     app: postgres
   ports:
   - port: 5432
   sessionAffinity: "ClientIP"
   sessionAffinityConfig:
     clientIP:
       timeoutSeconds: 3600
 ---
 apiVersion: apps/v1
 kind: StatefulSet
 metadata:
   name: postgres
   labels:
     app: postgres
 spec:
   replicas: 2
   selector:
     matchLabels:
       app: postgres
   serviceName: postgres
   template:
     metadata:
       name: postgres
       labels:
         app: postgres
     spec:
       affinity:
         podAntiAffinity:
           preferredDuringSchedulingIgnoredDuringExecution:
           - podAffinityTerm:
               labelSelector:
                 matchExpressions:
                 - key: app
                   operator: In
                   values:
                   - postgres
               topologyKey: "kubernetes.io/hostname"
             weight: 50
       initContainers:
       - name: init
         image: nexus:8443/repository/docker-repo/postgresql:12.2-2
         securityContext:
           runAsUser: 0
         resources:
           limits:
             cpu: 100m
             memory: 200Mi
           requests:
             cpu: 100m
             memory: 100Mi
         env:
           - name: PGDATA
             value: "/data/postgres"
         args: ["sh", "/tmp/dbinit.sh"]
         volumeMounts:
         - name: "postgres-dbinit"
           mountPath: "/tmp/dbinit.sh"
           subPath: "dbinit.sh"
         - name: "pwfile"
           mountPath: "/tmp/pwfile.txt"
           subPath: "pwfile.txt"
         - name: "pgpass"
           mountPath: "/tmp/.pgpass"
           subPath: ".pgpass"
         - name: "postgres"
           mountPath: "/data"
       containers:
       - name: postgres
         image: nexus:8443/repository/docker-repo/postgresql:12.2-2
         ports:
         - name: postgres
           containerPort: 5432
         resources:
           limits:
             cpu: 2
             memory: 1Gi
           requests:
             cpu: 1
             memory: 500Mi
         env:
           - name: PGDATA
             value: "/data/postgres"
           - name: PGPASSFILE
             value: "/data/postgres/.pgpass"
         args: ["sh", "-c", "/usr/pgsql-12/bin/postmaster -D $(PGDATA) -i -h 0.0.0.0 --logging_collector=FALSE --unix_socket_directories="]
         volumeMounts:
         - name: "postgresql-conf"
           mountPath: "/data/postgres/postgresql.conf"
           subPath: "postgresql.conf"
         - name: "pg-hba-conf"
           mountPath: "/data/postgres/pg_hba.conf"
           subPath: "pg_hba.conf"
         - name: "postgres"
           mountPath: "/data"
       volumes:
       - name: "postgres-dbinit"
         configMap:
           name: "postgres-dbinit"
       - name: "pwfile"
         secret:
           secretName: "postgres-pwfile"
           items:
           - key: "pwfile"
             mode: 0400
             path: "pwfile.txt"
           optional: false
       - name: "pgpass"
         secret:
           secretName: "postgres-pgpass"
           items:
           - key: ".pgpass"
             mode: 0400
             path: ".pgpass"
           optional: false
       - name: "postgresql-conf"
         configMap:
           name: "postgresql-conf"
           items:
           - key: "postgresql.conf"
             mode: 0644
             path: "postgresql.conf"
           optional: false
       - name: "pg-hba-conf"
         configMap:
           name: "pg-hba-conf"
           items:
           - key: "pg_hba.conf"
             mode: 0644
             path: "pg_hba.conf"
           optional: false
   volumeClaimTemplates:
   - metadata:
       name: postgres
     spec:
       resources:
         requests:
           storage: 1Gi
         limits:
           storage: 2Gi
       accessModes:
         - ReadWriteOnce
       storageClassName: cinder
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
   name: pg-hba-conf
   namespace: default
 data:
   pg_hba.conf: |
     host         all             postgres        127.0.0.1/32                                   trust
     host         replication     postgres        10.244.0.0/16                                  password
     hostnossl    all             all             0.0.0.0/0                                      scram-sha-256
     hostssl      all             all             0.0.0.0/0                                      password
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
   name: postgresql-conf
   namespace: default
 data:
   postgresql.conf: |
     max_connections = 100                   # (change requires restart)
     shared_buffers = 128MB                  # min 128kB
     dynamic_shared_memory_type = posix      # the default is the first option
     max_wal_size = 1GB
     min_wal_size = 80MB
     log_destination = 'stderr'              # Valid values are combinations of
     logging_collector = off                 # Enable capturing of stderr and csvlog
     datestyle = 'iso, mdy'
     timezone = 'UTC'
     lc_messages = 'C'                       # locale for system error message
     lc_monetary = 'C'                       # locale for monetary formatting
     lc_numeric = 'C'                        # locale for number formatting
     lc_time = 'C'                           # locale for time formatting
     default_text_search_config = 'pg_catalog.english'
     #archive_mode = on
     #wal_level = replica
     #primary_conninfo = 'host=postgres-0.postgres.default.svc.cluster.local port=5432 user=postgres passfile=/var/lib/pgsql/.pgpass dbname=replication replication=on'
     #primary_slot_name = 'replication_slot'
     synchronous_standby_names = '*'
     synchronous_commit = 'remote_apply'
 ---

The script that runs inside !InitContainer is kept in postgres-dbinit !ConfigMap

 cat dbinit.sh
 #!/usr/bin/bash
 #sleep 5m
 if [[ ! -d ${PGDATA} ]]; then
   mkdir ${PGDATA}
   chmod 750 ${PGDATA}
   chown postgres:postgres ${PGDATA}
 fi
 cp /tmp/pwfile.txt /var/lib/pgsql/pwfile
 chown postgres:postgres /var/lib/pgsql/pwfile
 cp /tmp/.pgpass ${PGDATA}/.pgpass
 chown postgres:postgres ${PGDATA}/.pgpass
 HOSTNAME=`hostname`
 if [[ "${HOSTNAME}" == "postgres-0" ]]; then
   su postgres -c "/usr/pgsql-12/bin/initdb -D ${PGDATA} --pwfile=/var/lib/pgsql/pwfile"
 else
   su postgres -c "pg_basebackup -w -R -X stream -C -S replication_slot -d 'host=postgres-0.postgres.default.svc.cluster.local port=5432 user=postgres passfile=/data/postgres/.pgpass' -D ${PGDATA}"
 fi
  
 kubectl create cm postgres-dbinit --from-file=dbinit.sh

Secrets (.pgfile and pwfile) are created using kubectl commands:

 openssl rand -base64 -out pwfile 32
 kubectl create secret generic postgres-pwfile --from-file=pwfile
 kubectl create secret generic postgres-pgpass --from-file=.pgpass

.pgpass has the following format (https://www.postgresql.org/docs/12/libpq-pgpass.html): hostname:port:database:username:password

So we use this:

PGPASSFILE env var is set to /data/postgres/.pgpass

To deploy the postgres HA cluster, once the above !ConfigMap and Secrets are in place, just run

 kubectl apply -f postgres.yaml