Demo - Pod Scheduling Based on Runtime Tiered Locality

In Pod Scheduling Optimization, we introduce how to schedule application Pods to nodes with cached data.

However, in some cases the nodes with cached data cannot accommodate the application Pod. If the Pod is instead scheduled to a node close to the cached data, for example a node in the same zone, its read and write performance will still be better than in a different zone.

Fluid supports configuring tiered locality information for a K8s cluster. The configuration is defined in the values.yaml file of Fluid's Helm Chart and stored in the ConfigMap named 'webhook-plugins' in the Fluid system namespace.

The following is a specific example. Assuming that the K8s cluster has locality information for zones and regions, the configuration achieves the following goals:

  • When the application Pod is not configured with required dataset scheduling, prefer to schedule it to nodes with cached data. If it cannot be scheduled to those nodes, prefer nodes in the same zone; if it cannot be scheduled in the same zone either, prefer nodes in the same region;
  • When the application Pod is configured with required dataset scheduling, require it to be scheduled in the same zone as the nodes with cached data, rather than on those nodes themselves.

0. Prerequisites

The version of Kubernetes you are using needs to support admissionregistration.k8s.io/v1 (Kubernetes version > 1.16). Enabling admission controllers is configured by passing a flag to the Kubernetes API server. Make sure that your cluster is properly configured:

--enable-admission-plugins=MutatingAdmissionWebhook

Note that if your cluster has already enabled other admission plugins, you only need to append MutatingAdmissionWebhook to the existing list.
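Before installing Fluid, you can check whether the cluster serves the API group that MutatingAdmissionWebhook relies on (the exact output depends on your cluster version):

```shell
# Verify that the admissionregistration.k8s.io/v1 API group is available
kubectl api-versions | grep admissionregistration.k8s.io/v1
```

If the command prints a matching line, the cluster supports v1 admission webhooks.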

1. Configure Tiered Locality in Fluid

  1.1 Configure before installing Fluid

Define the tiered locality configuration in the Helm Chart's values.yaml as below.

pluginConfig:
  - name: NodeAffinityWithCache
    args: |
      preferred:
        # fluid built-in name(default not enabled), used to schedule pods to the node with existing fuse pod
        # - name: fluid.io/fuse
        #   weight: 100
        # fluid built-in name(default enabled), used to schedule pods to the data cached node
        - name: fluid.io/node
          weight: 100
        # runtime worker's zone label name(default enabled), can be changed according to k8s environment.
        - name: topology.kubernetes.io/zone
          weight: 50
        # runtime worker's region label name(default enabled), can be changed according to k8s environment.
        - name: topology.kubernetes.io/region
          weight: 10
      required:
        # If a Pod is configured with required affinity, schedule it only to nodes matching the label.
        # Default value is 'fluid.io/node'. Multiple names are combined with an AND relation.
        - topology.kubernetes.io/zone

Install Fluid following the document Installation. After installation, a ConfigMap named webhook-plugins storing the above configuration will exist in the Fluid namespace (default fluid-system).
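You can inspect the generated configuration directly (assuming the default fluid-system namespace):

```shell
# Print the tiered locality configuration stored by the webhook
kubectl get configmap webhook-plugins -n fluid-system -o yaml
```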

  1.2 Modify the tiered locality configuration in an existing Fluid cluster

Modify the tiered locality configuration (see the content in step 1.1) in the ConfigMap named 'webhook-plugins' in the Fluid namespace (default fluid-system). The new configuration only takes effect after the fluid-webhook pod restarts.
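A minimal sketch of this workflow, assuming the webhook Deployment is named fluid-webhook (check the actual name in your cluster):

```shell
# Edit the tiered locality configuration in place
kubectl edit configmap webhook-plugins -n fluid-system

# Restart the webhook so it reloads the ConfigMap
kubectl rollout restart deployment fluid-webhook -n fluid-system
```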

2. Configure the tiered locality information for the Runtime

Tiered locality information can be configured through the NodeAffinity field of the Dataset or the NodeSelector field of the Runtime.

The following is the tiered locality information defined in the YAML of the Dataset. The workers of the Runtime will be deployed on nodes matching these labels.

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hbase
spec:
  mounts:
    - mountPoint: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/stable/
      name: hbase
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: 
              - zone-a
          - key: topology.kubernetes.io/region
            operator: In
            values:
              - region-a
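Alternatively, the same constraint can be expressed on the Runtime side. The following is a sketch assuming an AlluxioRuntime whose worker spec exposes a nodeSelector field; adjust the runtime kind and labels to your environment:

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: hbase
spec:
  replicas: 2
  worker:
    # Hypothetical example: pin runtime workers to nodes in zone-a
    nodeSelector:
      topology.kubernetes.io/zone: zone-a
```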

3. Application Pod Scheduling

3.1 Preferred Affinity Scheduling

Creating the Pod

$ cat<<EOF >nginx-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-1
  labels:
    # enable Fluid's scheduling optimization for the pod
    fuse.serverful.fluid.io/inject: "true"
spec:
  containers:
    - name: nginx-1
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: hbase-vol
  volumes:
    - name: hbase-vol
      persistentVolumeClaim:
        claimName: hbase
EOF
$ kubectl create -f nginx-1.yaml

Check the Pod

Checking the Pod's YAML shows that the following affinity constraints have been injected:

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: fluid.io/s-default-hbase
                operator: In
                values:
                  - "true"
          weight: 100
        - preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - "zone-a"
          weight: 50
        - preference:
            matchExpressions:
              - key: topology.kubernetes.io/region
                operator: In
                values:
                  - "region-a"
          weight: 10         

This affinity achieves the following effects:

  • If a node with cached data (a node with the label 'fluid.io/s-default-hbase') is schedulable, the Pod is scheduled to that node;
  • If the nodes with cached data are unschedulable, the Pod is preferentially scheduled to nodes in the same zone ("zone-a");
  • If the same-zone nodes are also unschedulable, the Pod is preferentially scheduled to nodes in the same region ("region-a");
  • If none of the above can be satisfied, the Pod is scheduled to any other schedulable node in the cluster.
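To verify where the Pod actually landed, check its node assignment (the NODE column):

```shell
# Show the node the scheduler picked for the Pod
kubectl get pod nginx-1 -o wide
```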

3.2 Required Affinity Scheduling

If the Pod is configured with required dataset scheduling, as below:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-1
  labels:
    # required dataset scheduling
    fluid.io/dataset.hbase.sched: required
    fuse.serverful.fluid.io/inject: "true"
spec:
  containers:
    - name: nginx-1
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: hbase-vol
  volumes:
    - name: hbase-vol
      persistentVolumeClaim:
        claimName: hbase

The Pod will be injected with required node affinity, as shown below, forcing it to be scheduled onto nodes whose "topology.kubernetes.io/zone" label has the value "zone-a".

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - "zone-a"

3.3 Notes

  1. If the application Pod already specifies affinity for the tiered locality labels (in 'spec.affinity' or 'spec.nodeSelector'), the webhook will not inject the corresponding locality affinity, and the user's configuration will be kept;
  2. Tiered locality affinity scheduling is a global configuration that takes effect for all datasets; different affinity configurations for different datasets are not supported.
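For example, a Pod like the following sketch, which pins the zone itself, keeps its own constraint and receives no zone affinity injection from the webhook (a hypothetical illustration reusing the labels from this demo):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-2
  labels:
    fuse.serverful.fluid.io/inject: "true"
spec:
  # User-specified zone constraint: the webhook keeps this as-is
  nodeSelector:
    topology.kubernetes.io/zone: zone-b
  containers:
    - name: nginx-2
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: hbase-vol
  volumes:
    - name: hbase-vol
      persistentVolumeClaim:
        claimName: hbase
```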