Intel Kubernetes Service

The Intel® Kubernetes Service (IKS) gives you the tools to manage Kubernetes clusters for application development, AI/ML training, and helm chart deployments.

Tip

Currently IKS is only available to premium and enterprise account users.

Control Plane

IKS provides a managed Kubernetes service in Intel® Tiber™ AI Cloud. IKS manages the availability and scalability of the Kubernetes control plane. For a technical overview, see also Kubernetes Control Plane Components.

Pricing

Pricing is 0.10 cents per cluster per hour. See Billing and usage for more information on payment methods and account types.

Provision Kubernetes Cluster

Create a Cluster

  1. Navigate to the Intel® Tiber™ AI Cloud console.

  2. In the menu at left, click the Intel Kubernetes Service menu.

  3. Visit the Overview tab to view the workflow.

  4. Click Clusters tab.

  5. Click Launch Cluster.

  6. Complete the required fields under Cluster details and configuration.

    1. In Cluster name, enter a name.

    2. In Select cluster K8S version, select a version.

      Cluster details and configuration

  7. Click Launch. After launching, the State column shows Updating.

  8. Under Cluster Name column, click your cluster.

    Note

    Your cluster name now appears below, along with its Actions menu.

Add Node Group to Cluster

  1. From the Actions pulldown menu, select Add node group.

  2. Enter your data in the Node group configuration menu.

    1. In Node type, choose between Virtual Machine and Bare Metal for your node. Note the cost per hour. See also Compare Instance Types below.

    2. In Node group name, enter a name.

    3. In Node quantity, select the number of worker nodes you need in your cluster, from 1 to 10.

    Tip

    You can scale the number of worker nodes up or down.

  3. Under Public Keys, select Upload Key or Refresh Keys.

  4. Select Upload Key, name your key, and paste your local SSH public key into the fields shown. If you don't have a local key yet, see the sketch after this list.

  5. Select Upload Key.

  6. Now, in Node group configuration, check the box next to the SSH key you added.
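
If you don't already have a local SSH key pair to upload, the following is a minimal sketch of generating one; the key file path and comment are placeholders, so adjust them as needed.

ssh-keygen -t ed25519 -f ~/.ssh/iks-node-key -C "iks-node-access"

# Print the public key and paste its contents into the Upload Key form.
cat ~/.ssh/iks-node-key.pub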

Compare Instance Types

At any time during Node group configuration, you may choose Compare instance types. This pop-out screen helps you compare and select your preferred processor.

Launch Kubernetes Cluster

When you create a cluster, it includes:

  • K8S Control-plane

  • ETCD Database

  • Scheduler

  • API Server

  1. Select Launch.

  2. Now that your Node group is added, it shows Updating in the submenu.

  3. When the Node group is added successfully, each node name appears and its State shows Active.

Connect to cluster

  1. Set the KUBECONFIG Environment Variable:

    export KUBECONFIG=/path/to/your/kubeconfig
    
  2. Verify Configuration: Ensure that the current context points to the correct cluster.

    kubectl config view
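
    To confirm that the kubeconfig is picked up and the cluster is reachable, a quick check is to view the active context and list the worker nodes:

    kubectl config current-context

    kubectl get nodes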
    

Kubeconfig Admin Access

Ideally, you export the KUBECONFIG to your secret management system and continue.

  1. In the Kubernetes Console, locate options below Kube Config.

  2. Copy or Download the KUBECONFIG file and export it to your development environment.

  3. For more help on exporting, follow related steps in the next section.

Caution

Exercise caution while downloading, accessing, or sharing this file.
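
As a minimal sketch of handling the downloaded file (the download location and destination path below are placeholders), you might restrict its permissions before exporting it:

# Move the downloaded kubeconfig to a private location and lock down its permissions.
mkdir -p ~/.kube
mv ~/Downloads/kubeconfig.yaml ~/.kube/my-iks-cluster
chmod 600 ~/.kube/my-iks-cluster
export KUBECONFIG=~/.kube/my-iks-cluster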

Set Context for Multiple Clusters

  1. Optional: List all available contexts.

    kubectl config get-contexts -o=name
    
  2. Create a kubeconfig directory if it doesn’t exist, then change into it.

    mkdir -p ./kubeconfig

    cd ./kubeconfig
    
  3. In the Kubernetes Console, navigate to My clusters, Kube Config.

  4. From the Kubernetes Console, download (or copy) the KUBECONFIG file to the current directory.

  5. Extract the value from the KUBECONFIG and paste it into the shell, following the example below.

    1. Export KUBECONFIG as an environment variable as shown below.

    export KUBECONFIG=/home/sdp/.kube/dev-env
    
  6. Use kubectl config set-context to modify an existing context or create a new cluster context (a combined sketch follows this list).

    kubectl config set-context
    
  7. To view the cluster's nodes, enter the following command.

    kubectl get nodes
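
The following is a combined sketch of the steps above for two clusters; the file paths and the context name are hypothetical, so substitute the values from your own KUBECONFIG files.

# Point kubectl at one or more kubeconfig files (colon-separated).
export KUBECONFIG=/home/sdp/.kube/dev-env:/home/sdp/.kube/prod-env

# List the available contexts and switch to the one you want.
kubectl config get-contexts -o=name
kubectl config use-context dev-env-context

# Verify that the selected cluster's nodes are returned.
kubectl get nodes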
    

Important

If you wish to launch another cluster, return to the start of this section and perform all steps again, exporting a different KUBECONFIG file.

Controlling Node Auto-repair Behavior

By default, IKS detects when a worker node becomes unavailable. If the node remains unavailable beyond a specific grace period, it is automatically replaced (auto-repair) with a new node of the same type. If you do not want this behavior for one or more worker nodes in your cluster, you can turn off auto-repair for any given worker node.

Auto-repair Options

If you want to opt out of auto-repair mode (where a node is automatically replaced after it has been unavailable or unreachable for the grace period), label the given node with iks.cloud.intel.com/autorepair=false.

As long as the node has this label, IKS will not replace it if it becomes unavailable. The user interface shows the status as Updating while the node is unavailable (and not ready in Kubernetes), indicating that IKS detected the node's unavailability. If the node becomes available later, the status changes from Updating to Active. If you remove the auto-repair label while the node is unavailable, the default auto-replacement behavior resumes and IKS replaces the node, as designed.

We do not recommend removing the node from the compute console while this label is set; doing so defeats the purpose of the label and results in a dangling node in your Kubernetes console.

Note

You can add and remove this label using standard kubectl label commands, as shown in the examples below.

Examples

Add a label to a node to avoid auto replacement:

kubectl label node ng-hdmqnphxi-f49b8 iks.cloud.intel.com/autorepair=false

Remove a label from a node to enable auto replacement:

kubectl label node ng-hdmqnphxi-f49b8 iks.cloud.intel.com/autorepair-
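
To confirm whether the label is currently set on the node (using the same example node name as above):

kubectl get node ng-hdmqnphxi-f49b8 --show-labels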

Manage Kubernetes Cluster

  1. Create a YAML or JSON file with your pod specification. See the example below.

    apiVersion: v1
    kind: Pod
    metadata:
      name: mypod
    spec:
      containers:
      - name: mycontainer
        image: nginx

  2. Create the pod.

    kubectl apply -f pod-definition.yaml

  3. View your pod. Replace mypod with the name of your pod.

    kubectl get pods

    kubectl describe pod mypod
    
  4. Update a Pod:

    kubectl edit pod mypod
    

    Note

    This opens the pod configuration in your default editor. Make changes and save the file.

  5. Delete a Pod. Replace mypod with the name of your pod.

    kubectl delete pod mypod
    

Upgrade Kubernetes Cluster

  1. In the Cluster name submenu, under Details, find the Upgrade link.

  2. Select Upgrade.

  3. In the Upgrade K8S Version pull-down menu, select your desired version.

  4. Click the Upgrade button.

  5. During the upgrade, the Details menu State may show Upgrading controlplane.

    Note

    If the current version is one release behind the latest, only the latest version appears. When the version upgrade succeeds, Cluster reconciled appears.
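
    After the upgrade completes, you can confirm the new control plane version from your workstation, assuming your KUBECONFIG points at this cluster:

    # Server Version reflects the upgraded control plane.
    kubectl version

    # The VERSION column shows the kubelet version on each worker node.
    kubectl get nodes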

Apply Load Balancer

  1. Navigate to the Cluster name submenu.

  2. In the Actions menu, select Add load balancer.

  3. In the Add load balancer dialog, complete these fields.

    1. Select the port number of your service from the dropdown menu.

    2. For Type, select public or private.

    3. Click on Launch.

  4. In the Cluster name submenu, view the Load Balancer menu.

  5. Your Load Balancer appears with its Name, and its State shows Active.

K8S will automatically perform load balancing for your service.
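
When choosing the port for a load balancer, it can help to list the services running in your cluster and the ports they expose:

kubectl get services --all-namespaces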

Add Security Rules

You can create a security rule if you have already created a Load Balancer.

Note

    If you haven’t created a Load Balancer, return to the section above before proceeding. After a Cluster is available, you must create a Node Group.

  1. Click on your Cluster name.

  2. Select the tab Worker Node Group.

  3. Select Add Node Group.

  4. Complete all required fields as shown in Add Node Group to Cluster. Then return to this workflow.

  5. Wait until the State shows “Active” before proceeding.

  6. Complete all steps in Apply Load Balancer. Then return here.

Add security rule to your own Load Balancer

  1. For your own Load Balancer, click Edit.

  2. Add a Source IP address to create a security rule.

  3. Select a protocol.

  4. Click Save. The rule is created.

Edit or delete security rule

Optional: After the State changes to Active:

  • You may edit the security rules by selecting Edit.

  • You may delete the security rule by selecting Delete.

Add security rule to default Load Balancer

  1. Navigate to the Security tab. You may see Load Balancers populated in a table.

    Note

    The public-apiserver is the default Load Balancer.

  2. For the public-apiserver, click “Edit”.

  3. Then add a Source IP address to create a security rule.

  4. Select a protocol.

  5. Click Save. The rule is created.

Configure Ingress, Expose Cluster Services

Note

This requires the Helm version 3 client utility. See also Helm Docs.

  1. Create a cluster with at least one worker node. See Create a Cluster.

  2. Create a Load balancer (public) using port 80. See Apply Load Balancer.

    Note

    This IP is used in the last step, in the URL for testing. Your port number may differ.

  3. Install the ingress controller.

    helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace --set controller.hostPort.enabled=true
    
  4. To install the test NGINX Pod, Service, and Ingress object, download ingress-test.yml.

    1. Alternatively, copy the contents of the file below and save it as ingress-test.yml.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app
        labels:
          app: my-app
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: my-app
        template:
          metadata:
            labels:
              app: my-app
          spec:
            containers:
            - name: my-app
              image: nginx:stable
              ports:
                - containerPort: 80
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: my-service
      spec:
        selector:
          app: my-app
        ports:
          - protocol: TCP
            port: 80
            targetPort: 80
      ---
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        name: minimal-ingress
        annotations:
          nginx.ingress.kubernetes.io/rewrite-target: /
      spec:
        ingressClassName: nginx
        rules:
        - http:
            paths:
            - path: /test
              pathType: Prefix
              backend:
                service:
                  name: my-service
                  port:
                    number: 80
      
  5. Run command to apply.

    kubectl apply -f ingress-test.yml
    
  6. Visit your browser and test, inserting your IP where shown below.

    http://<IP>/test
    
    1. The IP mentioned here is the Public Load balancer IP.
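
    You can also verify the ingress controller pods and script the same check from a terminal; replace <IP> with your public Load Balancer IP.

    kubectl get pods --namespace ingress-nginx

    curl -i http://<IP>/test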

De-Provision Kubernetes Cluster

Delete Node Group or Node

Delete Node Group

  1. In the Cluster name submenu, select the Node group you wish to delete.

  2. Click the Delete button.

Delete Node

  1. Below the Node name table, note the Add node and Delete node buttons.

  2. Click the Delete node button, as desired.

  3. Select Continue.

Deploy Example AI/ML Workloads

Add an instance of the Intel® Gaudi® 2 processor to a cluster to deploy LLM and Stable Diffusion models.

  1. Complete the tutorial Training a PyTorch Model on Intel Gaudi 2.

  2. Add nodes to the Intel Kubernetes Cluster.

  3. Ensure you’re able to access the KUBECONFIG file and the Kubernetes Cluster.
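
One way to confirm that the Gaudi accelerators are advertised to Kubernetes (habana.ai/gaudi is the resource name requested in the pod specs later in this guide) is to inspect the node capacity:

kubectl describe nodes | grep -i "habana.ai/gaudi"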

Deploy Stable Diffusion

To deploy Stable Diffusion, try the example below. Run it on an Intel® Gaudi® 2 processor instance and deploy it on an IKS cluster.

Intel® Gaudi® 2 processor with Stable Diffusion

To run Stable Diffusion in IKS with the Intel® Gaudi® 2 processor, apply the following configuration.

  1. Apply this configuration if huge pages are not set on all nodes. Otherwise, skip to the next section.

    sudo sysctl -w vm.nr_hugepages=156300
    
  2. Verify configuration.

    grep HugePages_Free /proc/meminfo

    grep HugePages_Total /proc/meminfo
    
  3. Ensure that your output is similar to this.

    HugePages_Free:    34142
    
    HugePages_Total:   35201
    
  4. Use the suggested settings for model inference.

    hugepages2Mi: 500Mi
    memory: 60G
    
  5. Revise your YAML file, using this example.

    apiVersion: v1
    kind: Pod
    metadata:
      name: std
      labels:
        name: std
    spec:
      containers:
      - name: std
        image: docker.io/rramamu1/std-gaudi:latest
        securityContext:
          capabilities:
            add: ["SYS_NICE"]   
        ports:
            - containerPort: 8000
        resources:
          limits:
            habana.ai/gaudi: "1"
            hugepages-2Mi: 500Mi
            memory: 60G
            #cpu: "25"
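
    Assuming you save the manifest above as std-pod.yaml (the file name is arbitrary), apply it and watch the pod start:

    kubectl apply -f std-pod.yaml

    kubectl get pod std -w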
    

HugePages Settings by Model

HugePages Settings

  Model Name                              hugepages-2Mi    Memory    Number of Cards
  runwayml/stable-diffusion-v1-5          500Mi            60G       1
  meta-llama/Meta-Llama-3-70B-Instruct    9800Mi           250G      >= 2
  mistralai/Mixtral-8x7B-Instruct-v0.1    9800Mi           250G      >= 2
  mistralai/Mistral-7B-v0.1               600Mi            50G       1

Generate Image with Stable Diffusion

Consider using this YAML deployment for Helm Chart resources.

  1. Download the Helm Charts from the STD Helm Charts.

  2. Configuration for hugepages, as noted above, is already applied.

    Note

    This YAML file overrides default configuration. Apply your custom configuration to this file to ensure your settings are applied.

    # Default values for tgi-chart.
    # This is a YAML-formatted file.
    # Declare variables to be passed into your templates.
    
    replicaCount: 1
    
    modelName: runwayml/stable-diffusion-v1-5
    
    hostVolumePath: /scratch-2/data
    
    image:
      repository: docker.io/rramamu1/std-gaudi
      pullPolicy: IfNotPresent
      # Overrides the image tag whose default is the chart appVersion.
      tag: "latest"
    
    service:
      type: ClusterIP
      port: 8000
    
    resources:
      numofgaudi: 1
      hugepages2Mi: 500Mi
      #cpu: 25
      memory: 60G
    
  3. Next, run the install command.

    helm install std std-chart -f ./std-values.yaml
    
  4. Access the result using the load balancer IP.

    Note

    Ensure you followed the section Apply Load Balancer.

  5. Construct a full URL for the Load Balancer by following this two-step process.

    1. Replace the value of <Load Balancer IP> with your own, as shown below.

      http://<Load Balancer IP>/std/generate_image
      
    2. Add the prompt, including parameters, as the second part of the URL.

      Example: The second part starts with “prompts=”

      http://<Load Balancer IP>/std/generate_image/prompts=dark sci-fi , A huge radar on mountain ,sunset, concept art&height=512&width=512&num_inference_steps=50&guidance_scale=7.5&batch_size=1&negative_prompts=''&seed=100&num_images_per_prompt=1
      
    3. Paste the full URL in a browser and press <Enter>.

    4. Change the value of “prompts=”, as desired.

      Example 2: Change the second part of the URL. Replace the text, starting with “prompts=”, as shown below.

      http://<Load Balancer IP>/std/generate_image/prompts=Flying Cars&height=512&width=512&num_inference_steps=50&guidance_scale=7.5&batch_size=1&negative_prompts=''&seed=100&num_images_per_prompt=1
      
    5. Paste the full URL in a browser and press <Enter>.

      Tip

      Your image will differ. Any image that you generate may require managing copyright permissions.

See Helm Docs for more details.
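
As a quick check after the install, you can confirm the Helm release and its pods, and script the browser test above with curl. Whether the endpoint returns the image bytes directly in the response body is an assumption; adjust the output handling if your deployment responds differently.

helm list

kubectl get pods

# Quote the URL so the shell does not interpret the & characters; spaces in the prompt are percent-encoded.
curl -o generated.png "http://<Load Balancer IP>/std/generate_image/prompts=Flying%20Cars&height=512&width=512&num_inference_steps=50&guidance_scale=7.5&batch_size=1&negative_prompts=''&seed=100&num_images_per_prompt=1"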

Generate Text with TGI

Consider using this sample YAML deployment for Text Generation Inference (TGI). Refer to HugePages Settings by Model.

Note

To use this sample template, you must provide your own HUGGING_FACE_HUB_TOKEN value.

apiVersion: v1
kind: Pod
metadata:
  name: tgi-lama3
  labels:
    name: tgi-lama3
spec:
  tolerations:
  - key: "nodeowner"
    operator: "Equal"
    value: "admin"
    effect: "NoSchedule"
  containers:
  - name: tgi-lama3
    envFrom:
      - configMapRef:
          name: proxy-config
    image: ghcr.io/huggingface/tgi-gaudi:1.2.1 #amr-registry.caas.intel.com/bda-mlop/genops/tgi_gaudi:1.3 #ghcr.io/huggingface/tgi-gaudi:1.2.1
    securityContext:
      capabilities:
        add: ["SYS_NICE"]   
    env:
      - name: PT_HPU_ENABLE_LAZY_COLLECTIVES
        value: "true"
      - name: OMPI_MCA_btl_vader_single_copy_mechanism
        value: none
      - name: MODEL_ID
        value: meta-llama/Meta-Llama-3-8B-Instruct #meta-llama/Meta-Llama-3-8B #meta-llama/Llama-2-70b-chat-hf  
      - name: PORT 
        value: "8080"
      - name: HUGGINGFACE_HUB_CACHE 
        value: /models-cache
      - name: TGI_PROFILER_ENABLED 
        value: "true"    
      - name: NUM_SHARD 
        value: "1"
      - name: SHARDED 
        value: "false"    
      - name: HUGGING_FACE_HUB_TOKEN 
        value: "xxxxxxxxxxxxxxxxxxxxxxx"       
    resources:
      limits:
        habana.ai/gaudi: "1"
        hugepages-2Mi: 9200Mi
        memory: 200G
        #cpu: "50"
    volumeMounts:
        - name: models-cache
          mountPath: /models-cache
  volumes:
  - name: models-cache
    hostPath:
     path: /data
     type: Directory     
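
Assuming the manifest above is saved as tgi-pod.yaml (the file name is arbitrary) and your Hugging Face token is filled in, it can be applied directly as an alternative to the Helm-based flow below:

kubectl apply -f tgi-pod.yaml

# Follow the container log to watch the model download and server startup.
kubectl logs -f pod/tgi-lama3
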
  1. Download the TGI Helm Charts.

  2. To deploy TGI with Mistral using Helm:

    helm install mistral tgi-chart -f ./mistral-values.yaml
    
  3. Access the result with the load balancer IP.

    1. Follow the section Apply Load Balancer.

  4. Replace the value of <Load Balancer IP>, shown below, with your own.

    http://<Load Balancer IP>/mistral/generate
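
    A request can then be sent with curl. The JSON body below follows the TGI generate API; the prompt text and max_new_tokens value are only examples.

    curl -X POST "http://<Load Balancer IP>/mistral/generate" \
      -H "Content-Type: application/json" \
      -d '{"inputs": "What is Kubernetes?", "parameters": {"max_new_tokens": 64}}'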