Skip to content

Resources

1. API Reference:

API Endpoint Description
/info Returns the information about the model deployed under the endpoint.
/embeddings Creates an embedding vector representing the input text.
/completions Creates a completion for the provided prompt and parameters (think request-response)
/chat/completions Creates a model response for the given chat conversation. (think multi-turn with history)
/images/embeddings Creates an embedding vector representing the input image and text pair.

2. Documentation

The inference API is currently supported with SDKs for Python, JavaScript, C# and REST API calls.

  1. Azure AI Model Inference API - Main documentation page for the API

3. Code Samples

Samples can be used with Serverless API and Managed Compute model deployments. All samples use Python and depend on the azure-ai-inference library, with the async samples also requiring the aiohttp library.

  1. 17 Samples using sync client with chat completions endpoint
  2. 4 Samples using sync client with embeddings endpoint
  3. 6 Samples using async client with chat completions endpoint
  4. 1 Samples using async client with embeddings endpoint