This guide walks through setting up the models used in this workshop on OpenShift AI.
Requirements
- Admin access to an OpenShift AI-ready cluster
- The OpenShift CLI (oc)
Installing the OpenShift AI Operator
The OpenShift AI Operator is used to set up the model serving runtimes and to deploy the models needed for this workshop.
Instructions for installing the OpenShift AI Operator can be found here.
Installing the MinIO Operator
The MinIO Operator is used to store the models needed for this workshop.
Instructions for installing the MinIO Operator can be found here.
Uploading the Models to MinIO
Create a MinIO bucket, for example models, and upload the models needed for this workshop. Inside the models bucket, create a folder called llms to hold them.
The models used in the workshop can be downloaded from:
- https://huggingface.co/ibm-granite/granite-7b-instruct
- https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
After uploading, each model should sit in its own folder under llms in the models bucket.
Creating a Data Connection
To use the models on OpenShift AI we need to create a data connection.
Open the OpenShift AI console and click on the Data Connections tab. Select the MinIO object storage and add a new MinIO data connection if one does not already exist.
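Under the hood, the dashboard stores a data connection as a Kubernetes Secret in the data science project namespace. Below is a minimal sketch of such a Secret; the name, endpoint, and credentials are placeholders, not values from the workshop:

```yaml
# Sketch of the Secret behind a data connection (all values are placeholders).
apiVersion: v1
kind: Secret
metadata:
  name: minio-models
  labels:
    opendatahub.io/dashboard: "true"
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: MinIO models
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
  AWS_S3_ENDPOINT: http://minio.minio.svc.cluster.local:9000
  AWS_DEFAULT_REGION: us-east-1
  AWS_S3_BUCKET: models
```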
Create a Serving Runtime
Go to the OpenShift AI console and click on the Settings tab, then select Serving Runtimes.
Find vLLM ServingRuntime for KServe (or a similar vLLM runtime) and click the ... menu at the end of that row. Select Duplicate from that menu.
Customize the duplicated runtime (display name, resource name, and vLLM arguments) as shown below:
Granite
```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
    openshift.io/display-name: vLLM ServingRuntime for KServe (granite)
  labels:
    opendatahub.io/dashboard: "true"
  name: vllm-granite
spec:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
  containers:
  - args:
    - --port=8080
    - --model=/mnt/models
    - --served-model-name={{.Name}}
    - --distributed-executor-backend=mp
    - --enable-auto-tool-choice
    - --tool-call-parser=granite
    - --max_model_len=11000
    command:
    - python
    - -m
    - vllm.entrypoints.openai.api_server
    env:
    - name: HF_HOME
      value: /tmp/hf_home
    image: quay.io/modh/vllm:rhoai-2.17-cuda
    name: kserve-container
    ports:
    - containerPort: 8080
      protocol: TCP
  multiModel: false
  supportedModelFormats:
  - autoSelect: true
    name: vLLM
```
Mistral
```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
    openshift.io/display-name: vLLM ServingRuntime for KServe (mistral)
  labels:
    opendatahub.io/dashboard: "true"
  name: vllm-mistral
spec:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
  containers:
  - args:
    - --port=8080
    - --model=/mnt/models
    - --served-model-name={{.Name}}
    - --distributed-executor-backend=mp
    - --max-model-len=17856
    - --enable-auto-tool-choice
    - --tool-call-parser=mistral
    - --chat-template=examples/tool_chat_template_mistral_parallel.jinja
    command:
    - python
    - -m
    - vllm.entrypoints.openai.api_server
    env:
    - name: HF_HOME
      value: /tmp/hf_home
    image: quay.io/modh/vllm@sha256:3c56d4c2a5a9565e8b07ba17a6624290c4fb39ac9097b99b946326c09a8b40c8
    name: kserve-container
    ports:
    - containerPort: 8080
      protocol: TCP
  multiModel: false
  supportedModelFormats:
  - autoSelect: true
    name: vLLM
```
Troubleshooting
Different versions of the vLLM image may contain different configuration files, chat templates, and so on. If a required file is not found in the image, it can be mounted externally, similarly to the model; see the sketch below.
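As an illustration of the pattern, a missing chat template could be shipped in a ConfigMap and mounted into the runtime container. The ConfigMap name, mount path, and file name here are hypothetical, and only the relevant fragment of the ServingRuntime spec is shown:

```yaml
# Hypothetical ConfigMap carrying the chat template (contents elided).
apiVersion: v1
kind: ConfigMap
metadata:
  name: mistral-chat-template
data:
  tool_chat_template_mistral_parallel.jinja: |
    {# paste the full template contents here #}
---
# Fragment of the ServingRuntime spec: mount the ConfigMap and point vLLM at it.
spec:
  containers:
  - name: kserve-container
    args:
    - --chat-template=/mnt/chat-templates/tool_chat_template_mistral_parallel.jinja
    volumeMounts:
    - name: chat-templates
      mountPath: /mnt/chat-templates
  volumes:
  - name: chat-templates
    configMap:
      name: mistral-chat-template
```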
Deploying the Models
Open the OpenShift AI console and click on the Model Serving tab.
Select the target project and click on the Deploy button. Fill in the model configuration parameters and click on Deploy.
After this step the model deployment starts, and the model should soon be ready to use within the cluster.
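The same deployment can also be expressed directly as a KServe InferenceService. Below is a minimal sketch for the Granite model, assuming the vllm-granite runtime defined above and a data connection Secret named minio-models; the resource name and storage path are illustrative:

```yaml
# Sketch of an InferenceService for the Granite model (names and path are placeholders).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: granite-7b-instruct
  labels:
    opendatahub.io/dashboard: "true"
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      runtime: vllm-granite              # the ServingRuntime created earlier
      storage:
        key: minio-models                # name of the data connection Secret
        path: llms/granite-7b-instruct   # folder inside the models bucket
      resources:
        requests:
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
```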
Exposing the model to the outside world
If the model needs to be accessed from outside the cluster, further modifications are required.
The model is deployed as a KServe InferenceService backed by a Knative service, which by default is configured for cluster-local visibility only.
Edit the InferenceService resource and delete the following label: networking.knative.dev/visibility: cluster-local. The sketch below shows where the label lives.
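A sketch of the relevant part of the InferenceService (the resource name is illustrative); once the label is removed, Knative exposes the service through an external route:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: granite-7b-instruct
  labels:
    # Delete this label to make the service reachable from outside the cluster:
    networking.knative.dev/visibility: cluster-local
```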
The service is now exposed to the outside world. The endpoint URL can be found in the Model Serving tab of the OpenShift AI console.