kubeRL is a software framework that implements a reward-biased scheduler extender for Kubernetes. kubeRL borrows an idea from reinforcement learning: it adaptively learns the failures and performance issues of containers and models their runtime performance on nodes as rewards. kubeRL then avoids scheduling pods on nodes that give low rewards.
Resource scheduling has been studied intensively under multiple objectives. Common objectives include maximizing the cloud's return on investment, reducing power consumption, and providing good load balancing, while meeting workload performance constraints and resource preferences. Existing schedulers, however, assume that workload performance depends only on resources or status that can be accurately monitored. In practice, accurate information about the current and future state of the cloud can hardly be obtained, and not all factors that affect workload performance can be monitored.
In production clusters, we observed the following issues.
- Certain types of workloads fail to be scheduled on only a subset of nodes due to issues that are unknown to the scheduler. Such failures are very common in Kubernetes clusters and include PersistentVolumeClaim is not bound (a Persistent Volume Claim is created without the actual volume being created), Operation cannot be fulfilled on pods/binding ... (a pod is scheduled to a node where the binding operation always fails), Timeout (a pod is scheduled to a node where the binding requests always time out), etc.
- Certain types of workloads always perform poorly on only a subset of nodes due to factors unobservable to the scheduler. Machines in datacenters may have different hardware configurations, such as CPU architecture, I/O speed, and memory architecture. Nodes in a Kubernetes cluster can be virtual machines with the same t-shirt size in terms of vCPU count and memory size. However, the same t-shirt size does not guarantee the same speed when running the same workload, such as Spark or deep learning jobs. We notice that some Spark jobs are always slower on certain nodes even though they have been allocated the same amount of resources. These are performance issues caused by factors that are unobservable to schedulers.
kubeRL treats workloads as black boxes, monitors the above failures and performance issues as rewards for pods during their runtime, and adaptively avoids scheduling similar pods on nodes that have given low rewards.
The framework consists of the following modules:
- A reward monitoring agent, which monitors a pod's reward on the node it runs on. The reward is updated periodically or at runtime to reflect the runtime performance of the pod on that node.
- A reward cache (etcd), which stores the rewards of pods on all nodes. Rewards are evaluated from performance, QoS, or errors, depending on how a particular application or service defines its reward function.
- A reward-biased scheduler extender, which biases scheduling decisions based on rewards that reflect how different pod types perform on different nodes.
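To make the role of the scheduler extender more concrete, the following is a minimal sketch of how cached rewards could be turned into node priorities. It is not kubeRL's actual code: the function name `prioritize`, the `(pod_type, node)` key layout, the default reward for unseen nodes, and the 0 to 10 integer score range are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

MAX_PRIORITY = 10  # assumed upper bound of the score range; not taken from kubeRL


def prioritize(pod_type: str,
               candidate_nodes: Iterable[str],
               rewards: Dict[Tuple[str, str], float],
               default_reward: float = 0.0) -> Dict[str, int]:
    """Map cached rewards to integer priorities in [0, MAX_PRIORITY].

    Nodes with no cached reward for this pod type get `default_reward`,
    so a node is not penalized before anything has been observed on it.
    """
    node_rewards = {node: rewards.get((pod_type, node), default_reward)
                    for node in candidate_nodes}
    if not node_rewards:
        return {}
    lo, hi = min(node_rewards.values()), max(node_rewards.values())
    if hi == lo:
        # All nodes look the same: apply no bias at all.
        return {node: MAX_PRIORITY for node in node_rewards}
    # Linear rescaling: lowest reward -> 0, highest reward -> MAX_PRIORITY.
    return {node: round(MAX_PRIORITY * (r - lo) / (hi - lo))
            for node, r in node_rewards.items()}


# Hypothetical rewards for pod type "D"; node-c has never been observed.
cached = {("D", "node-a"): -5.0, ("D", "node-b"): -1.0}
scores = prioritize("D", ["node-a", "node-b", "node-c"], cached)
```

With these hypothetical inputs, the node with the lowest reward receives the lowest priority, so new pods of type "D" are steered away from it without the node being excluded outright.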
The design of kubeRL is shown in the following picture.
The testing environment includes:
- A Kubernetes cluster created by IBM Cloud Kubernetes Service (IKS) with Kubernetes version 1.15.3_1515.
- A cluster created by kubeadm.
Once you have a Kubernetes cluster set up, you can deploy kubeRL using the Makefile.
$ make deploy-only
You should see several pods with the rl- prefix deployed.
Now you can run a simple demo to test how the rl-monitor pod updates the rewards and how rl-scheduler-extender biases the scheduling decisions for pods that arrive afterwards.
Follow the steps below to try kubeRL.
- Step 1: Set up a Kubernetes cluster, through IKS or kubeadm.
- Step 2: Deploy kubeRL.
- Step 3: Watch logs and cached rewards in rl-etcd.
- Step 4: Build and deploy rl-demo to emulate <pod, node> failures.
- Step 5: Test your rl-scheduler.
Details of the above steps are given below.
- Set up a Kubernetes cluster using IBM Cloud Kubernetes Service.
- Or you can set up a Kubernetes cluster on multiple VMs using kubeadm. Please follow the tutorial Creating a single control-plane cluster with kubeadm.
Deploy the kubeRL framework.
$ cd kubeRL
$ make deploy-only
You should now see all pods with the rl- prefix created.
$ watch 'kubectl get pods --all-namespaces |grep rl-'
You can stream logs to your console to study how kubeRL works.
- Open a terminal to watch the rl-monitor pod logs.
  $ kubectl logs -f $(kubectl get pods |grep rl-monitor|cut -d' ' -f1)
- Open a new terminal and get a shell in a pod in the cluster to watch the reward changes in rl-etcd. You first need to get the rl-etcd-client endpoint for rl-etcd.
  $ kubectl describe service rl-etcd-client
  You can log into a pod in the cluster to watch the rewards cached in rl-etcd.
  $ kubectl exec -it shell-demo -- /bin/bash
  When watching rewards in rl-etcd for pod type "D", you should see the cached rewards updating. Replace the IP and port with the CLUSTER-IP and PORT of the rl-etcd endpoint in your cluster.
  $ watch -d 'curl -LsS http://172.30.187.125:2379/v2/keys/pods/D |jq .'
- Open a terminal to stream the rl-scheduler-extender logs to the console.
  $ kubectl logs -f -c rl-scheduler-extender $(kubectl get pods --namespace=kube-system|grep rl-scheduler|cut -d' ' -f1) --namespace=kube-system
rl-demo randomly chooses two nodes in the cluster and periodically fails on them at different frequencies: on one node it fails every 10 seconds, and on the other it fails every minute.
- Set up your Docker Hub ID in config.env.
  USERID=chenw
- Build the rl-demo image.
  $ make build-demo
- Deploy the 3 rl-demo pods on different nodes using the default Kubernetes scheduler. The purpose of deploying rl-demo is to emulate pod crashes on different nodes and examine whether our rl-monitor agent can learn failures from pods and update the pods' rewards on different nodes accordingly.
  $ make demo
You can watch the states of all rl-demo pods to verify how they crash on different nodes.
$ watch 'kubectl get pods --selector=app=rl-demo'
When streaming logs for the rl-monitor pod, you should see the rewards of pod type "D" being updated.
$ kubectl logs -f $(kubectl get pods |grep rl-monitor|cut -d' ' -f1)
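The effect of the two failure frequencies on the learned rewards can be illustrated with a small simulation. The update rule below (an exponentially weighted average with a fixed penalty per observed failure) is only one plausible choice and is not necessarily the reward function rl-monitor uses; the class name `RewardTable`, the penalty, the decay factor, and the 10-second observation interval are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


class RewardTable:
    """Per-(pod type, node) rewards updated from failure observations."""

    def __init__(self, failure_penalty: float = -1.0, decay: float = 0.9):
        self.failure_penalty = failure_penalty  # reward signal for one failure
        self.decay = decay                      # weight kept from the old reward
        self._rewards: Dict[Tuple[str, str], float] = defaultdict(float)

    def record(self, pod_type: str, node: str, failed: bool) -> float:
        """Blend one observation into the running reward and return it."""
        key = (pod_type, node)
        observation = self.failure_penalty if failed else 0.0
        self._rewards[key] = (self.decay * self._rewards[key]
                              + (1.0 - self.decay) * observation)
        return self._rewards[key]

    def get(self, pod_type: str, node: str) -> float:
        return self._rewards[(pod_type, node)]


# Emulate 10 minutes of the demo, observing every 10 seconds:
# node-fast fails every 10 s, node-slow every 60 s, node-ok never fails.
table = RewardTable()
for t in range(0, 600, 10):
    table.record("D", "node-fast", failed=True)
    table.record("D", "node-slow", failed=(t % 60 == 0))
    table.record("D", "node-ok", failed=False)

# Nodes ordered from lowest to highest reward.
ranking = sorted(["node-fast", "node-slow", "node-ok"],
                 key=lambda n: table.get("D", n))
```

Under these assumptions the node that fails most often ends up with the lowest reward, which is the ordering the scheduler extender needs in order to steer new pods of type "D" away from it.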
- You can now test whether your rl-scheduler schedules a new pod according to the rewards learned. Let's create a new nginx pod with podtype=D using rl-scheduler.
  $ kubectl create -f manifests/test-pod.yaml
- You should see the following when streaming logs from rl-scheduler.
  $ kubectl logs -f -c rl-scheduler-extender $(kubectl get pods --namespace=kube-system|grep rl-scheduler|cut -d' ' -f1) --namespace=kube-system