Skip to content

2.2 Deploy AI Models

Let's revisit our Retrieval Augmented Generation design pattern. Note that we make use of two models to implement this design architecture.

  1. A Large Language Model (Embedding) for vectorizing the user query.
  2. A Large Language Model (Chat) for generating the response returned to user.

RAG


Let's find the right models to use and deploy them to our Azure AI project so we can use them in our RAG-based copilot implementation. Start by navigating to the Azure AI Project overview page. Then select the Models + endpoints link under My assets.

Deploy AI Model 1


1. Deploy Chat Model

1.1. Select A Chat Model

  1. Click the blue "Deploy model" button and pick Deploy base model from the dropdown options.

    Deploy AI Model 2

  2. If you know the specific model to use, you can search for it here. In our case, let's look at what our options are. First, click the Collections filter and select Azure OpenAI. Next, click the Inference tasks filter and select Chat completion. We can see that this reduces our choices from 1800+ models in the Azure AI model catalog to 9 matching models.

    Deploy AI Model 3


1.2. Deploy the Chat Model

  1. We can pick any of those options as shown above, to see a Model Card with more details. Let's pick gpt-4o-mini and click Confirm to get this deployment wizard. Note that it selects a Global Standard deployment type by default. The deployment has a default capacity of 10K tokens per minute (TPM) which can be useful when we begin the evaluation phase.

Deploy AI Model 4

  1. Click on the dropdown to see other Deployment type options. Read the documentation to learn more about what each provides. Global Standard is the recommended starting place for customers so let's go with that.

Deploy AI Model 5


1.3. Verify Deployment

On successful deployment, you will be taken to the model deployment page where you can review the details and retrieve relevant information like the Endpoint and Key information, for use with code-first clients.

Deploy AI Model 5

You can Open in playground and use the Azure AI Portal as an ideation tool to explore different prompt templates, model configurations and multi-turn conversational approaches to determine if this model is in fact suitable for your app scenario.

Task: Ask the model to Tell me about hiking boots for my trip to Spain - is response grounded?

Deploy AI Model 5

Homework: Complete the Deploy an enterprise chat app tutorial with your data. Is response grounded now?


1.4. View Metrics

You can also select the Metrics tab of deployed models within an Azure AI project to get metrics about the cost (tokens) and performance (requests) of that model in a given time frame. The screenshot shows the data taken for this model after completing the entire project. The spike reflects the requests made during the evaluation stage of the workflow.

Deploy AI Model 5

You can also click the Open in Azure Monitor link to open up the Azure Monitor dashboard in the Azure Portal as shown below, allowing you to drill down into various metrics or establish dashboards to monitor trends of interest.

Deploy AI Model 5


2. Deploy Embedding Model

The previous steps focused on the chat completion model which has many choices for us to select from. Now, let's look at embeddings.

2.1. Select & Deploy Model

Start by setting the Inference Task to Embeddings.

You'll find there are only about 11 models that match this filter - setting the Collection to Azure OpenAI reduces this further to 5. As before, let's select a model and review the card to see if it matches our needs.

Then Confirm to get the Deployment wizard dialog.

Deploy AI Model 5

And Deploy to complete the workflow.

Deploy AI Model 5

2.3. Verify Deployment

Similarly, we can use the deployment card to view deployment details and explore metrics. But note that we don't have a Playground for embeddings.

Deploy AI Model 5


3. Deployment Complete

The Azure AI Project overview page will now list both models under the Models + endpoints tab for easy lookup later (if you want to explore metrics or engage in Playground).

Deploy AI Model Final


CONGRATULATIONS! You deployed both the AI models needed for RAG