How to install JupyterHub in Kubernetes

JupyterHub allows users to interact with a computing environment through a web page. As most devices have access to a web browser, JupyterHub makes it easy to provide and standardize the computing environment for a group of people (e.g., a class of students or an analytics team). The most common use of JupyterHub is to launch JupyterLab via KubeSpawner once a user authenticates.

Moreover, one can create configurations for multiple user environments and let users select from them once they log in to JupyterHub. This is done by creating multiple profiles, each of which is attached to a set of configuration options that override JupyterHub's default configuration (specified in the Helm chart). This can be used to let users choose among several Docker images, select the hardware on which their jobs should run, or pick the default interface (JupyterLab vs. the classic Jupyter Notebook).
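
As an illustration only (the profile names and memory limit below are made up; the image matches the singleuser image used later in this guide), such profiles are declared under singleuser.profileList in the chart values, and each entry can override KubeSpawner settings:

 singleuser:
   profileList:
     - display_name: "Default environment"
       description: "Standard conda image"
     - display_name: "Large-memory environment"
       description: "Same image with a higher memory limit"
       kubespawner_override:
         image: conda:1.2
         mem_limit: "16G"

Step-by-step guide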

JupyterHub for Kubernetes is provided in the form of a Helm chart. Helm is a Kubernetes package manager similar to yum or apt, so the first step was to install Helm. Helm (version 2) consists of a server (Tiller) and a client (helm). Tiller, once deployed in a Kubernetes cluster, runs as a pod, but for a one-off installation of JupyterHub I decided not to install it into Kubernetes and instead just run it from the console on one of the Kubernetes cluster minion nodes. I followed the instructions published at https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-helm.html but only ran

 helm init --client-only

I deliberately did not run helm init --service-account tiller --wait.

Instead, I started Tiller locally from the terminal:

 /home/esportz/linux-amd64/tiller

and pointed the Helm client at it:

 export HELM_HOST=localhost:44134
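
As a quick sanity check, helm version should now report both a Client and a Server (Tiller) section; if the Server part is missing, the client is not reaching the locally running Tiller:

 linux-amd64/helm version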

Note that in later versions of Helm (>= 3.0) the Tiller component has been removed, and the helm binary alone is all that is needed to install charts (no need to run helm init, start Tiller, etc.).
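
For reference, a minimal sketch of the same installation with Helm 3, assuming the official JupyterHub chart repository and the same values file used later in this guide, would look roughly like this:

 helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
 helm repo update
 helm upgrade --install jhub jupyterhub/jupyterhub \
   --namespace default --version 0.8.2 -f jhub-config-ldap.yaml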

The other prerequisite for running JupyterHub is Kubernetes dynamic volume provisioning. I chose the nfs-client provisioner and installed it as described at https://github.com/kubernetes-incubator/external-storage/tree/master/nfs-client

Basically I modified the external-storage-master/nfs-client/deploy/deployment.yaml file:

 apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: nfs-client-provisioner
 ---
 kind: Deployment
 apiVersion: extensions/v1beta1
 metadata:
   name: nfs-client-provisioner
 spec:
   replicas: 1
   strategy:
     type: Recreate
   template:
     metadata:
       labels:
         app: nfs-client-provisioner
     spec:
       serviceAccountName: nfs-client-provisioner
       containers:
         - name: nfs-client-provisioner
           image: quay.io/external_storage/nfs-client-provisioner:latest
           volumeMounts:
             - name: nfs-client-root
               mountPath: /persistentvolumes
           env:
             - name: PROVISIONER_NAME
               value: netapp/nfs-client
             - name: NFS_SERVER
               value: nfs_server
             - name: NFS_PATH
               value: /export/volumes
       volumes:
         - name: nfs-client-root
           nfs:
             server: nfs_server
             path: /export/volumes

I then set the StorageClass as the default in external-storage-master/nfs-client/deploy/class.yaml like this:

 apiVersion: storage.k8s.io/v1
 kind: StorageClass
 metadata:
   name: managed-nfs-storage
   annotations:
     storageclass.kubernetes.io/is-default-class: "true"
 provisioner: netapp/nfs-client # or choose another name; must match the deployment's PROVISIONER_NAME env variable
 parameters:
   archiveOnDelete: "false"
 allowVolumeExpansion: true

I then created the RBAC objects:

 kubectl create -f external-storage-master/nfs-client/deploy/rbac.yaml

and applied the StorageClass and the provisioner deployment:

 kubectl apply -f external-storage-master/nfs-client/deploy/class.yaml
 kubectl apply -f external-storage-master/nfs-client/deploy/deployment.yaml

Then I tested the provisioner as described at the URL above.
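
As a rough sketch of that test (test-claim.yaml is the sample manifest shipped under deploy/ in the external-storage repository; the exact file name may differ between revisions), the point is to confirm that a PVC against the new default StorageClass actually binds:

 kubectl get storageclass                 # managed-nfs-storage should be marked "(default)"
 kubectl create -f external-storage-master/nfs-client/deploy/test-claim.yaml
 kubectl get pvc test-claim               # STATUS should become Bound within a few seconds
 kubectl delete -f external-storage-master/nfs-client/deploy/test-claim.yaml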

Lastly, I downloaded the Helm chart jupyterhub-0.8.2.tgz and followed the instructions at https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-jupyterhub.html to install JupyterHub.
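
One detail worth calling out: the proxy.secretToken referenced in the config below must be a long random hex string; the Zero to JupyterHub guide suggests generating it with openssl:

 openssl rand -hex 32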

I created jhub-config-ldap.yaml:

 proxy:
   secretToken: "<secret-token>"
   service:
     type: ClusterIP
   https:
     enabled: true
     type: manual
     manual:
 #      key: |
 #        <data from cert/conda.example.com.key>
 #      cert: |
 #        <data from cert/conda.example.com.crt>
 prePuller:
   pause:
     image:
       name: docker.com/google_containers/pause
 scheduling:
   userScheduler:
     image:
       name: docker.com/google_containers/kube-scheduler-amd64
 singleuser:
   defaultUrl: "/lab"
   image:
     name: conda
     tag: '1.2'
   events: false
 hub:
   extraConfig:
     jupyterlab: |
       c.Spawner.cmd = ['jupyter-labhub']
 auth:
   type: ldap
   ldap:
     server:
       address: adc.example.com
     dn:
       lookup: True
       search:
         filter: '({login_attr}={login})'
         user: 'ctera-ldap-svc'
         password: '<password-from-the-vault>'
       templates:
         - 'CN={username},OU=ITO,OU=Users,OU=NY,OU=US,OU=Offices,DC=example,DC=com'
       user:
         searchBase: 'ou=Offices,dc=example,dc=com'
         escape: False
         attribute: 'sAMAccountName'
         dnAttribute: 'cn'
     allowedGroups:
       - 'cn=admins,ou=users,ou=Security Groups,ou=Groups,dc=example,dc=com'
 ingress:
   enabled: true
   annotations:
     ingress.kubernetes.io/ssl-passthrough: "true"
   hosts:
     - conda.example.com

We must include DN templates covering all users! Otherwise, a user whose DN does not match any of the templates gets an “invalid password” error. The problem with https://github.com/jupyterhub/ldapauthenticator/blob/master/ldapauthenticator/ldapauthenticator.py is that it forms the DN by concatenating the CN with either the searchBase or a template, and then binds with that DN and the user's password. Instead, it should bind with just the sAMAccountName, or, if it really wants to use a DN, take the one available verbatim in the distinguishedName attribute.
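
To sanity-check a DN template against Active Directory before rolling it out, an ldapsearch along these lines can be used (jdoe is a made-up account; the bind DN is built from the template above):

 ldapsearch -x -H ldap://adc.example.com \
   -D 'CN=jdoe,OU=ITO,OU=Users,OU=NY,OU=US,OU=Offices,DC=example,DC=com' -W \
   -b 'ou=Offices,dc=example,dc=com' '(sAMAccountName=jdoe)' distinguishedName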

Then I ran:

 linux-amd64/helm upgrade --install jhub jupyterhub-0.8.2.tgz -f jhub-config-ldap.yaml
 
 Release "jhub" does not exist. Installing it now.
 NAME: jhub
 LAST DEPLOYED: Tue Aug 27 23:05:06 2019
 NAMESPACE: default
 STATUS: DEPLOYED
  
 RESOURCES:
 ==> v1/ConfigMap
 NAME        DATA  AGE
 hub-config  1     21s
 
 ==> v1/Deployment
 NAME   READY  UP-TO-DATE  AVAILABLE  AGE
 hub    0/1    1           0          21s
 proxy  0/1    1           0          21s
 
 ==> v1/PersistentVolumeClaim
 NAME        STATUS  VOLUME                                     CAPACITY  ACCESS MODES  STORAGECLASS         AGE
 hub-db-dir  Bound   pvc-8608fc30-7478-42f9-8f05-d6907896138c   1Gi       RWO           managed-nfs-storage  21s
 
 ==> v1/Pod(related)
 NAME                    READY  STATUS             RESTARTS  AGE
 hub-65b79f57b6-8lnk7    0/1    ContainerCreating  0         21s
 proxy-79f45b4bb4-4b6zr  0/1    ContainerCreating  0         21s
 
 ==> v1/Role
 NAME  AGE
 hub   21s
 
 ==> v1/RoleBinding
 NAME  AGE
 hub   21s
 
 ==> v1/Secret
 NAME        TYPE    DATA  AGE
 hub-secret  Opaque  2     21s
 
 ==> v1/Service
 NAME          TYPE          CLUSTER-IP      EXTERNAL-IP  PORT(S)                     AGE
 hub           ClusterIP     10.100.254.40   <none>       8081/TCP                    21s
 proxy-api     ClusterIP     10.108.146.3    <none>       8001/TCP                    21s
 proxy-public  LoadBalancer  10.102.222.146  <pending>    80:32522/TCP,443:32022/TCP  21s
 
 ==> v1/ServiceAccount
 NAME  SECRETS  AGE
 hub   1        21s
 
 ==> v1/StatefulSet
 NAME              READY  AGE
 user-placeholder  0/0    21s
 
 ==> v1beta1/PodDisruptionBudget
 NAME              MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
 hub               1              N/A              0                    21s
 proxy             1              N/A              0                    21s
 user-placeholder  0              N/A              0                    21s
 user-scheduler    1              N/A              0                    21s
  
  
 NOTES:
 Thank you for installing JupyterHub!
  
 Your release is named jhub and installed into the namespace default.
  
 You can find if the hub and proxy is ready by doing:
  
 kubectl --namespace=default get pod
  
 and watching for both those pods to be in status 'Ready'.
  
 You can find the public IP of the JupyterHub by doing:
  
 kubectl --namespace=default get svc proxy-public
  
 It might take a few minutes for it to appear!
  
 Note that this is still an alpha release! If you have questions, feel free to
 1. Read the guide at https://z2jh.jupyter.org
 2. Chat with us at https://gitter.im/jupyterhub/jupyterhub
 3. File issues at https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues

Since our cluster did not have public IPs and ingress had not yet been enabled, I had to configure HAProxy to point to HTTP NodePort 32522 of the proxy-public service:

 frontend sni
    acl conda req.ssl_sni -i conda.example.com
    use_backend conda if conda
 
 backend conda
    timeout server 86400000
    server localhost 127.0.0.1:3443
 
 frontend conda-https
    timeout client 86400000
    bind :3443 ssl crt /opt/haproxy/etc/conda.example.com.crt
    default_backend conda-servers
 
 backend conda-servers
    timeout server 86400000
    balance source
    server ds-kube-minion-1.example.com 10.161.124.23:32522 check
    server ds-kube-minion-2.example.com 10.161.124.24:32522 check
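
Note that NodePort numbers such as 32522/32022 are reassigned if the proxy-public service is ever recreated; they can be looked up at any time with:

 kubectl --namespace=default get svc proxy-public
 kubectl --namespace=default get svc proxy-public \
   -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}'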

That first HAProxy config (pointing at the NodePort) is no longer necessary because of the k8s ingress; HAProxy now just forwards TLS traffic straight through to the Kubernetes ingress:

 frontend sni
    timeout client 86400000
    bind :443
    tcp-request inspect-delay 5s
    tcp-request content accept if { req_ssl_hello_type 1 }
    default_backend kubernetes-ingress
 backend kubernetes-ingress
    timeout server 86400000
    server ds-kube-minion-1.lat.internal 10.161.124.23:443 check-ssl verify none

Important!

The JupyterHub proxy deployment now handles SSL traffic arriving via the k8s ingress (conda-ssl-ingress), using the TLS secret (proxy-manual-tls) mounted through a volumeMounts entry (tls-secret). The k8s service name is proxy-public.
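
If the key and cert are not supplied through the chart values (they are commented out in jhub-config-ldap.yaml above), a TLS secret with that name can be created by hand; a sketch, reusing the certificate paths mentioned earlier purely as placeholders:

 kubectl --namespace=default create secret tls proxy-manual-tls \
   --cert=cert/conda.example.com.crt --key=cert/conda.example.com.key

The relevant proxy deployment, service, and ingress manifests are shown below.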

 apiVersion: apps/v1
 kind: Deployment
 metadata:
   annotations:
     deployment.kubernetes.io/revision: "4"
   creationTimestamp: "2019-08-28T03:05:07Z"
   generation: 4
   labels:
     app: jupyterhub
     chart: jupyterhub-0.8.2
     component: proxy
     heritage: Tiller
     release: jhub
   name: proxy
   namespace: default
 spec:
   progressDeadlineSeconds: 600
   replicas: 1
   revisionHistoryLimit: 10
   selector:
     matchLabels:
       app: jupyterhub
       component: proxy
       release: jhub
   strategy:
     rollingUpdate:
       maxSurge: 25%
       maxUnavailable: 25%
     type: RollingUpdate
   template:
     metadata:
       annotations:
         checksum/hub-secret: fc1435556e4b18bf057d25296940f1dbc49e803339faaddb0fbd94e34a7a1b88
         checksum/proxy-secret: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
       creationTimestamp: null
       labels:
         app: jupyterhub
         component: proxy
         hub.jupyter.org/network-access-hub: "true"
         hub.jupyter.org/network-access-singleuser: "true"
         release: jhub
     spec:
       affinity:
         nodeAffinity:
           preferredDuringSchedulingIgnoredDuringExecution:
           - preference:
               matchExpressions:
               - key: hub.jupyter.org/node-purpose
                 operator: In
                 values:
                 - core
             weight: 100
       containers:
       - command:
         - configurable-http-proxy
         - --ip=0.0.0.0
         - --api-ip=0.0.0.0
         - --api-port=8001
         - --default-target=http://$(HUB_SERVICE_HOST):$(HUB_SERVICE_PORT)
         - --error-target=http://$(HUB_SERVICE_HOST):$(HUB_SERVICE_PORT)/hub/error
         - --port=8443
         - --redirect-port=8000
         - --ssl-key=/etc/chp/tls/tls.key
         - --ssl-cert=/etc/chp/tls/tls.crt
         env:
         - name: CONFIGPROXY_AUTH_TOKEN
           valueFrom:
             secretKeyRef:
               key: proxy.token
               name: hub-secret
         image: jupyterhub/configurable-http-proxy:4.1.0
         imagePullPolicy: IfNotPresent
         name: chp
         ports:
         - containerPort: 8443
           name: proxy-https
           protocol: TCP
         - containerPort: 8000
           name: proxy-public
           protocol: TCP
         - containerPort: 8001
           name: api
           protocol: TCP
         resources:
           requests:
             cpu: 200m
             memory: 512Mi
         terminationMessagePath: /dev/termination-log
         terminationMessagePolicy: File
         volumeMounts:
         - mountPath: /etc/chp/tls
           name: tls-secret
           readOnly: true
       dnsPolicy: ClusterFirst
       restartPolicy: Always
       schedulerName: default-scheduler
       securityContext: {}
       terminationGracePeriodSeconds: 60
       volumes:
       - name: tls-secret
         secret:
           defaultMode: 420
           secretName: proxy-manual-tls
  
 ---
 apiVersion: v1
 kind: Service
 metadata:
   labels:
     app: jupyterhub
     chart: jupyterhub-0.8.2
     component: proxy-public
     heritage: Tiller
     release: jhub
   name: proxy-public
   namespace: default
 spec:
   ports:
   - name: http
     port: 80
     protocol: TCP
     targetPort: 8000
   - name: https
     port: 443
     protocol: TCP
     targetPort: 8443
   selector:
     component: proxy
     release: jhub
  
 ---
 apiVersion: networking.k8s.io/v1beta1
 kind: Ingress
 metadata:
   name: conda-ssl-ingress
   annotations:
     ingress.kubernetes.io/ssl-passthrough: "true"
   namespace: default
 spec:
   rules:
   - host: conda.example.com
     http:
       paths:
       - backend:
           serviceName: proxy-public
           servicePort: 443
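
To confirm the whole chain end to end (assuming DNS for conda.example.com points at the HAProxy host), a couple of quick checks; the ingress and service should show the host and ports from the manifests above, and the hub login page should answer over HTTPS:

 kubectl --namespace=default get ingress conda-ssl-ingress
 kubectl --namespace=default get svc proxy-public
 curl -skI https://conda.example.com/hub/login    # expect an HTTP 200 from the hub login page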