...

The initial step is to define the data structure of the resource. This data structure defines the fields and the data types exposed to the user.

type AppASpec struct {

    Name string `json:"name,omitempty"`

    Size int    `json:"size,omitempty"`

}

type AppAStatus struct {

    Status string `json:"status,omitempty"`

}


These data structures define a name and a size field as user inputs, and a status field that holds the status information.
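In the operator-sdk scaffolding these two structures are typically embedded in a top-level resource type plus a list type. A minimal sketch (metav1 stands for k8s.io/apimachinery/pkg/apis/meta/v1; the generated file contains additional markers):

// AppA combines the Spec and Status above into the custom resource type.
type AppA struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    // Spec holds the user input, Status the observed state.
    Spec   AppASpec   `json:"spec,omitempty"`
    Status AppAStatus `json:"status,omitempty"`
}

// AppAList contains a list of AppA resources.
type AppAList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []AppA `json:"items"`
}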

...

The Operator SDK allows us to generate an OpenAPIv3 representation of the data structure automatically.
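With the SDK versions current at the time of writing, this is typically done with the following commands (the exact subcommands depend on the operator-sdk version and are an assumption here):

operator-sdk generate k8s        # regenerate deepcopy code after changing the types
operator-sdk generate openapi    # regenerate the CRD including the openAPIV3Schema validation

The generated CRD looks like this: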

apiVersion: apiextensions.k8s.io/v1beta1

kind: CustomResourceDefinition

metadata:

 name: appas.app.example.com

spec:

 group: app.example.com

 names:

   kind: AppA

   listKind: AppAList

   plural: appas

   singular: appa

 scope: Namespaced

 subresources:

   status: {}

 validation:

   openAPIV3Schema:

     properties:

       apiVersion:

         description: 'APIVersion defines the versioned schema of this representation

           of an object. Servers should convert recognized schemas to the latest

           internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources'

         type: string

       kind:

         description: 'Kind is a string value representing the REST resource this

           object represents. Servers may infer this from the endpoint the client

           submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds'

         type: string

       metadata:

         type: object

       spec:

         properties:

           name:

             description: 'INSERT ADDITIONAL SPEC FIELDS - desired state of cluster

               Important: Run "operator-sdk generate k8s" to regenerate code after

               modifying this file Add custom validation using kubebuilder tags:

               https://book.kubebuilder.io/beyond_basics/generating_crd.html'

             type: string

           size:

             format: int64

             type: integer

         type: object

       status:

         properties:

           status:

             description: 'INSERT ADDITIONAL STATUS FIELD - define observed state

               of cluster Important: Run "operator-sdk generate k8s" to regenerate

               code after modifying this file Add custom validation using kubebuilder

               tags: https://book.kubebuilder.io/beyond_basics/generating_crd.html'

             type: string

         type: object

 version: v1alpha1

 versions:

 - name: v1alpha1

   served: true

   storage: true


The CRD (Custom Resource Definition) above can be applied to the KubeAPI server, which will then expose an endpoint for the resource. KubeAPI will perform syntax validation based on this definition.
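Once the CRD is registered, instances of the resource can be created like any other Kubernetes object. A hypothetical instance (name and values are illustrative):

apiVersion: app.example.com/v1alpha1
kind: AppA
metadata:
  name: appa-example
spec:
  name: example
  size: 3

Applying this manifest succeeds, whereas a value that violates the schema (for example a string for size) is rejected by the API server.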

...

Once the application resources are created, the Manager Controller watches them for changes. The Manager Controller status provides a holistic view of the state of the entire system.
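As a rough illustration of how such a watch can be wired with controller-runtime (this is not the actual Manager Controller code; the import path, reconciler type name and owned types are assumptions):

import (
    appsv1 "k8s.io/api/apps/v1"
    ctrl "sigs.k8s.io/controller-runtime"

    // assumed import path for the Manager API types
    contrailv1alpha1 "github.com/example/contrail-operator/pkg/apis/contrail/v1alpha1"
)

// SetupWithManager registers the reconciler so it is triggered whenever a
// Manager resource changes, or whenever an object owned by a Manager
// (e.g. a Deployment created for one of the sub-components) changes.
// ReconcileManager is assumed to implement reconcile.Reconciler.
func (r *ReconcileManager) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&contrailv1alpha1.Manager{}).
        Owns(&appsv1.Deployment{}).
        Complete(r)
}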

Manifest format


apiVersion: contrail.juniper.net/v1alpha1

kind: Manager

metadata:

 name: cluster-1

spec:

 size: 1

 hostNetwork: true

 contrailStatusImage: hub.juniper.net/contrail-nightly/contrail-status:5.2.0-0.740

 imagePullSecrets:

 - contrail-nightly

 config:

   activate: true

   create: true

   configuration:

     cloudOrchestrator: kubernetes

   images:

     api: hub.juniper.net/contrail-nightly/contrail-controller-config-api:5.2.0-0.740

     devicemanager: hub.juniper.net/contrail-nightly/contrail-controller-config-devicemgr:5.2.0-0.740

     schematransformer: hub.juniper.net/contrail-nightly/contrail-controller-config-schema:5.2.0-0.740

     servicemonitor: hub.juniper.net/contrail-nightly/contrail-controller-config-svcmonitor:5.2.0-0.740

     analyticsapi: hub.juniper.net/contrail-nightly/contrail-analytics-api:5.2.0-0.740

     collector: hub.juniper.net/contrail-nightly/contrail-analytics-collector:5.2.0-0.740

     redis: hub.juniper.net/contrail-nightly/contrail-external-redis:5.2.0-0.740

     nodemanagerconfig: hub.juniper.net/contrail-nightly/contrail-nodemgr:5.2.0-0.740

     nodemanageranalytics: hub.juniper.net/contrail-nightly/contrail-nodemgr:5.2.0-0.740

     nodeinit: hub.juniper.net/contrail-nightly/contrail-node-init:5.2.0-0.740

     init: busybox

 control:

   activate: true

   create: true

   images:

     control: hub.juniper.net/contrail-nightly/contrail-controller-control-control:5.2.0-0.740

     dns: hub.juniper.net/contrail-nightly/contrail-controller-control-dns:5.2.0-0.740

     named: hub.juniper.net/contrail-nightly/contrail-controller-control-named:5.2.0-0.740

     nodemanager: hub.juniper.net/contrail-nightly/contrail-nodemgr:5.2.0-0.740

     nodeinit: hub.juniper.net/contrail-nightly/contrail-node-init:5.2.0-0.740

     init: busybox

 kubemanager:

   activate: true

   create: true

   images:

     kubemanager: hub.juniper.net/contrail-nightly/contrail-kubernetes-kube-manager:5.2.0-0.740

     nodeinit: hub.juniper.net/contrail-nightly/contrail-node-init:5.2.0-0.740

     init: busybox

   configuration:

     serviceAccount: contrail-service-account

     clusterRoleBinding: contrail-cluster-role-binding

     clusterRole: contrail-cluster-role

     cloudOrchestrator: kubernetes

     #useKubeadmConfig: true

     kubernetesApiServer: "10.96.0.1"

     kubernetesApiSecurePort: 443

     kubernetesPodSubnets: 10.32.0.0/12

     kubernetesServiceSubnets: 10.96.0.0/12

     kubernetesClusterName: kubernetes

     kubernetesIpFabricForwarding: true

     kubernetesIpFabricSnat: true

     k8sTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token

 webui:

   activate: true

   create: true

   images:

     webuiweb: hub.juniper.net/contrail-nightly/contrail-controller-webui-web:5.2.0-0.740

     webuijob: hub.juniper.net/contrail-nightly/contrail-controller-webui-job:5.2.0-0.740

     nodeinit: hub.juniper.net/contrail-nightly/contrail-node-init:5.2.0-0.740

 vrouter:

   activate: true

   create: true

   images:

     vrouteragent: hub.juniper.net/contrail-nightly/contrail-vrouter-agent:5.2.0-0.740

     vrouterkernelinit: hub.juniper.net/contrail-nightly/contrail-vrouter-kernel-init:5.2.0-0.740

     vroutercni: hub.juniper.net/contrail-nightly/contrail-kubernetes-cni-init:5.2.0-0.740

     nodemanager: hub.juniper.net/contrail-nightly/contrail-nodemgr:5.2.0-0.740

     nodeinit: hub.juniper.net/contrail-nightly/contrail-node-init:5.2.0-0.740

 cassandra:

   activate: true

   create: true

   images:

     cassandra: gcr.io/google-samples/cassandra:v13

     init: busybox

   configuration:

     cassandraListenAddress: auto

     cassandraPort: 9160

     cassandraCqlPort: 9042

     cassandraSslStoragePort: 7001

     cassandraStoragePort: 7000

     cassandraJmxPort: 7199

     cassandraStartRpc: true

     cassandraClusterName: ContrailConfigDB

     maxHeapSize: 512M

     heapNewSize: 100M

     nodeType: config-database

 zookeeper:

   activate: true

   create: true

   images:

     zookeeper: hub.juniper.net/contrail-nightly/contrail-external-zookeeper:5.2.0-0.740

     init: busybox

   configuration:

     zookeeperPort: 2181

     zookeeperPorts: 2888:3888

     nodeType: config-database

 rabbitmq:

   activate: true

   create: true

   images:

     rabbitmq: hub.juniper.net/contrail-nightly/contrail-external-rabbitmq:5.2.0-0.740

     init: busybox

   configuration:

     erlangCookie: 47EFF3BB-4786-46E0-A5BB-58455B3C2CB4

     nodePort: 5673

     nodeType: config-database


Workflow diagrams

Cassandra Controller

...

Each Cassandra Pod requires its own configuration file, because the file contains both Pod-specific and cluster-wide configuration information. In a Deployment there is no concept of per-Pod configuration; all Pods share the same configuration. To overcome this limitation, the configuration is written as values in a ConfigMap keyed by a Pod identifier (Pod name or IP): the ConfigMap has one key per Pod, and that key's value is the Pod's configuration.

apiVersion: v1

kind: ConfigMap

metadata:

 name: cassandra-cluster1

 namespace: default

data:

 "172.16.0.1": |

     listen_address: "172.16.0.1"

     seed_provider:

     - class_name: org.apache.cassandra.locator.SimpleSeedProvider

       parameters:

       - seeds: 172.16.0.1,172.16.0.2

 "172.16.0.2": |

     listen_address: "172.16.0.2"

     seed_provider:

     - class_name: org.apache.cassandra.locator.SimpleSeedProvider

       parameters:

       - seeds: 172.16.0.1,172.16.0.2

 "172.16.0.3": |

     listen_address: "172.16.0.3"

     seed_provider:

     - class_name: org.apache.cassandra.locator.SimpleSeedProvider

       parameters:

       - seeds: 172.16.0.1,172.16.0.2
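Conceptually, the Cassandra Controller builds this ConfigMap by rendering one configuration document per Pod and keying it by the Pod IP. A simplified sketch (the real controller logic and the seed selection are more involved; function and parameter names are assumptions):

import (
    "fmt"
    "strings"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// cassandraConfigMap renders one configuration document per Pod IP and stores
// it under a key named after that IP, mirroring the ConfigMap shown above.
func cassandraConfigMap(name, namespace string, podIPs, seeds []string) *corev1.ConfigMap {
    data := map[string]string{}
    for _, ip := range podIPs {
        data[ip] = fmt.Sprintf(
            "listen_address: %q\n"+
                "seed_provider:\n"+
                "- class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"+
                "  parameters:\n"+
                "  - seeds: %s\n",
            ip, strings.Join(seeds, ","))
    }
    return &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
        Data:       data,
    }
}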

The ConfigMap is mounted as a volume inside the Pods: each key becomes a filename and its value the content of the file. Each Pod therefore sees all configuration files:

➜  ~ cat configs/172.16.0.1.yaml

listen_address: "172.16.0.1"

seed_provider:

- class_name: org.apache.cassandra.locator.SimpleSeedProvider

  parameters:

  - seeds: 172.16.0.1,172.16.0.2

➜  ~ cat configs/172.16.0.2.yaml

listen_address: "172.16.0.2"

seed_provider:

- class_name: org.apache.cassandra.locator.SimpleSeedProvider

  parameters:

  - seeds: 172.16.0.1,172.16.0.2

➜  ~ cat configs/172.16.0.3.yaml

listen_address: "172.16.0.3"

seed_provider:

- class_name: org.apache.cassandra.locator.SimpleSeedProvider

  parameters:

  - seeds: 172.16.0.1,172.16.0.2
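The mount itself could look roughly like this (a fragment of the Pod spec; the volume name and mount path are assumptions, and the controller may map the keys to file names differently, e.g. to add the .yaml suffix seen above):

      volumes:
      - name: cassandra-config
        configMap:
          name: cassandra-cluster1
      containers:
      - name: cassandra
        volumeMounts:
        - name: cassandra-config
          mountPath: /configs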


Each Pod is configured with a command parameter that starts Cassandra using the Pod-specific configuration file:

command := []string{"bash", "-c", "/docker-entrypoint.sh cassandra -f -Dcassandra.config=file:///configs/${POD_IP}.yaml"}
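The ${POD_IP} variable referenced in the command is assumed to be injected via the Kubernetes downward API, for example:

            env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP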


This allows the Cassandra Controller to update the configuration at runtime. For example, if two nodes are added to the cluster, one of them should be added to the seeds list. The Controller updates the ConfigMap values and changes

...

To signal the readiness of the Cassandra cluster to other controllers, Readiness Probes are used. A Readiness Probe runs in a loop, executes a command inside the container, and returns 0 (ready) or a non-zero exit code (not ready). If all Pods of the cluster return 0, the cluster is considered ready.

      readinessProbe:

         exec:

           command:

           - /bin/bash

           - -c

           - "seeds=$(grep -r '  - seeds:' /mydata/${POD_IP}.yaml |awk -F'  - seeds: ' '{print $2}'|tr  ',' ' ') &&  for seed in $(echo $seeds); do if [[ $(nodetool status | grep $seed |awk '{print $1}') != 'UN' ]]; then exit -1; fi; done"


// Needs to be adjusted as it only checks for the seeds but not all nodes

...

When a Cassandra node leaves the cluster, it needs to be de-registered first. When the Pod is stopped, a lifecycle hook executes a command inside the container before the Pod is terminated.

      lifecycle:

         preStop:

           exec:

             command:

             - /bin/sh

             - -c

             - nodetool drain

Zookeeper Controller

The Zookeeper Controller operates the Zookeeper cluster. It allows to

...

Crafting the static configuration is a bit more complex because the Zookeeper process cannot be started with a configuration file as an argument; it only accepts a directory. Mounting a ConfigMap as a volume does not allow for per-Pod directory names: the directory name is the same across all Pods and carries no node identifier. As a consequence, the controller can only craft a configuration that is common to all Pods. However, the configuration file requires Pod-specific information. This is where the container startup command comes into play:

command := []string{"bash", "-c", "myid=$(cat /mydata/${POD_IP}) && echo ${myid} > /data/myid && cp /conf-1/* /conf/ && sed -i \"s/clientPortAddress=.*/clientPortAddress=${POD_IP}/g\" /conf/zoo.cfg && zkServer.sh --config /conf start-foreground"}


It takes the generic static configuration file crafted by the controller, adds the Pod-specific information, and copies it to a separate directory. Zookeeper is started pointing at that directory. The static configuration is immutable and cannot be changed at runtime; changes to it require re-creation of the Pod.

...

Static configuration file (Pod-specific)

root@zookeeper-cluster-1-7c9467656b-hzkg2:/apache-zookeeper-3.5.5-bin# cat /conf/zoo.cfg
clientPort=2181
clientPortAddress=172.17.0.7
dataDir=/data
dataLogDir=/datalog
tickTime=2000
initLimit=5
syncLimit=2
maxClientCnxns=60
admin.enableServer=true
standaloneEnabled=false
4lw.commands.whitelist=stat,ruok,conf,isro
reconfigEnabled=true
dynamicConfigFile=/mydata/zoo.cfg.dynamic.100000000


Dynamic configuration file (same across all Pods)

root@zookeeper-cluster-1-7c9467656b-hzkg2:/apache-zookeeper-3.5.5-bin# cat /mydata/zoo.cfg.dynamic.100000000
server.1=172.17.0.6:2888:3888:participant
server.2=172.17.0.7:2888:3888:participant
server.3=172.17.0.8:2888:3888:participant

Readiness Probe

      readinessProbe:

         exec:

           command:

           - /bin/bash

           - -c

           - "OK=$(echo ruok | nc ${POD_IP} 2181); if [[ ${OK} == \"imok\" ]]; then exit 0; else exit 1;fi"


Rabbitmq Controller

The Rabbitmq Controller operates the Rabbitmq cluster. It allows to

...

Rabbitmq cannot be started with a configuration file location as an argument; the location of the configuration file is defined by environment variables. Each Pod requires its own RABBITMQ_NODENAME (derived from the Pod IP), so the variable cannot be set statically in the Pod definition. The export of the variable and the startup sequence are therefore part of the startup command.

            runner := `#!/bin/bash

echo $RABBITMQ_ERLANG_COOKIE > /var/lib/rabbitmq/.erlang.cookie

chmod 0600 /var/lib/rabbitmq/.erlang.cookie

export RABBITMQ_NODENAME=rabbit@${POD_IP}

if [[ $(grep $POD_IP /etc/rabbitmq/0) ]] ; then

 rabbitmq-server

else

 rabbitmqctl --node rabbit@$(cat /etc/rabbitmq/0) ping

 while [[ $? -ne 0 ]]; do

   rabbitmqctl --node rabbit@$(cat /etc/rabbitmq/0) ping

 done

 rabbitmq-server -detached

 rabbitmqctl --node rabbit@$(cat /etc/rabbitmq/0) node_health_check

 while [[ $? -ne 0 ]]; do

   rabbitmqctl --node rabbit@$(cat /etc/rabbitmq/0) node_health_check

 done

 rabbitmqctl stop_app

 sleep 2

 rabbitmqctl join_cluster rabbit@$(cat /etc/rabbitmq/0)

 rabbitmqctl shutdown

 rabbitmq-server

fi`

command := []string{"bash", "/runner/run.sh"}

Readiness Probe

The Controller creates a file in each Pod that contains the intended set of cluster nodes. The Readiness Probe script compares that list with the output of the rabbitmqctl cluster_status command.

      readinessProbe:

         exec:

           command:

            - /bin/bash

           - -c

           - "export RABBITMQ_NODENAME=rabbit@$POD_IP; cluster_status=$(rabbitmqctl cluster_status);nodes=$(echo $cluster_status | sed -e 's/.*disc,\\[\\(.*\\)]}]}, {.*/\\1/' | grep -oP \"(?<=rabbit@).*?(?=')\"); for node in $(cat /etc/rabbitmq/rabbitmq.nodes); do echo ${nodes} |grep ${node}; if [[ $? -ne 0 ]]; then exit -1; fi; done"

         initialDelaySeconds: 15

         timeoutSeconds: 5

Node Drain

When a Pod is (gracefully) stopped, the command rabbitmqctl reset is executed.
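Analogous to the Cassandra Controller, this can be expressed as a preStop lifecycle hook. A sketch (the actual mechanism may differ; rabbitmqctl reset typically requires the application to be stopped first, hence the stop_app):

      lifecycle:
         preStop:
           exec:
             command:
             - /bin/sh
             - -c
             - rabbitmqctl stop_app && rabbitmqctl reset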

...

Nodes are labeled, and based on these labels a node selector determines which Group/DaemonSet is deployed on them.
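For example, using hypothetical node names and the label keys from the node selectors below:

kubectl label node worker-1 nicType=x710      # matched by the DPDK group selector
kubectl label node worker-2 nodeType=sriov    # matched by the SRIOV group selector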

Data structure

type Vrouter struct {

    Spec Spec

}

type Spec struct {

    Groups []*Group

    Profiles []*Profile

}

type Group struct {

    Name           string

    Profiles       []*Profile

    VrouterGateway net.IPAddr

    NodeSelector   NodeSelector

    Tolerations    []Toleration

}

type Profile struct {

    Name                    string

    DpdkConfiguration       DpdkConfiguration

    KernelModeConfiguration KernelModeConfiguration

    SriovConfiguration      SriovConfiguration

    OtherConfig             OtherConfig

}

type DpdkConfiguration struct {

    CoreMask   string

    MoreConfig map[string]string

}

type KernelModeConfiguration struct {

    MoreConfig map[string]string

}

type SriovConfiguration struct {

    NumberOfVfs             int

    VirtualFunctionMappings []string

    MoreConfig              map[string]string

}

type OtherConfig struct {

    MoreConfig map[string]string

}

Custom Resource

vrouterProfileTemplates:

 - metadata:

     name: dpdk-profile1

     labels:

       contrailcluster: cluster-1

   spec:

     dpdkConfiguration:

       coreMask: "0xF"

       2MBHugePages: 1024

       1GBHugePages: 10

       cpuPinning:

       moreConfig:

         key1: value1

         key2: value2

 - metadata:

     name: sriov-profile1

     labels:

       contrailcluster: cluster-1

   spec:

     sriovConfiguration:

       numberOfVfs: 7

       virtualFunctionMappings:

       - vf1

       - vf2

       moreConfig:

         key1: value1

         key2: value2

 - metadata:

     name: kernelmode-profile1

     labels:

       contrailcluster: cluster-1

   spec:

     kernelModeConfiguration:

       moreConfig:

         key1: value1

         key2: value2

vrouterTemplates:

 - metadata:

     name: vrouter-dpdk-group1

     kind: ContrailVrouter

     labels:

       contrailcluster: cluster-1

   spec:

     activate: true

     nodeSelector:

       node-role.kubernetes.io/infra: ""

       nicType: x710

     tolerations:

     - operator: Exists

       effect: NoSchedule

     override: false

     upgradeStrategy: rolling

     configuration:

       vRouterGateway: 1.1.1.1

       profiles:

       - dpdk-profile1

       - other-profile1

 - metadata:

     name: vrouter-sriov-group1

     kind: ContrailVrouter

     labels:

       contrailcluster: cluster-1

   spec:

     activate: true

     nodeSelector:

       node-role.kubernetes.io/infra: ""

       nodeType: sriov

     tolerations:

     - operator: Exists

       effect: NoSchedule

     override: false

     upgradeStrategy: rolling

     configuration:

       vRouterGateway: 1.1.1.2

       profiles:

       - sriov-profile1

       - other-profile1

 - metadata:

     name: vrouter-kernelmode-group1

     kind: ContrailVrouter

     labels:

       contrailcluster: cluster-1

   spec:

     activate: true

     nodeSelector:

       node-role.kubernetes.io/infra: ""

       nodeType: sriov

     tolerations:

     - operator: Exists

       effect: NoSchedule

     override: false

     upgradeStrategy: rolling

     configuration:

       vRouterGateway: 1.1.1.3

       profiles:

       - kernelmode-profile1

       - other-profile1

...




Action Items:

  •  Unittests

...

  •  Unittests

...

  •  Unittests

...

...

...

  •  Add controllers for fabric Pods (swift, ironic, keystone, mysql)

...

  •  Description of rolling/in-place upgrade per Controller

...

  •  Ansible playbook (...due to the lack of any real alternatives) for kubernetes deployment (we should consider KubeSpray) - tested with KubeSpray

...

  •  KubeAPI HA strategy

...

  •  reverse proxy per node?

...

...

...

  •  Contrail-status must be replaced. Status of the components must be shown in the status field of the resource

...

  •  Add DPDK/SRIOV agent roles

...

  •  Add TLS

...