Chat Completion and Embedding API Configuration

Overview

This area of the product lets you configure connection information and specify the models used for both Chat Completion endpoints and Embedding endpoints.

Purpose

The purpose of the Language Model API Configuration feature is to provide AnswerRocket with the third-party language model connections required to run the chat experience.

Prerequisites

  1. Obtain API keys from your OpenAI account or from Azure.

    1. See the Max Release Notes for your release to see which model versions were tested with it.

    2. You will need both a Chat Completion model and an Embedding model.

    3. Make sure you provision enough tokens for your environment. We recommend starting with 150K TPM for the language model, then monitoring usage and adjusting from there. Actual requirements will vary by use case and number of active users.

    4. For OpenAI, go to OpenAI API Keys to manage your keys. See the OpenAI documentation to learn more.

    5. For Azure, create an Azure OpenAI resource and then create a deployment for each of the models you plan to use. See the Azure documentation to learn more.
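Before entering these values in the product, it can help to confirm the keys and endpoints work. The sketch below (stdlib only, no third-party packages) shows the two request shapes the configuration maps onto; the endpoint, deployment name, and api-version values are placeholders, so substitute the ones from your own account.

```python
import json

def build_chat_request(provider, key, model, messages,
                       endpoint=None, deployment=None,
                       api_version="2024-02-01"):
    """Return (url, headers, body) for a chat completion call.

    provider: "azure" or "openai".
    endpoint/deployment: required for Azure; unused for OpenAI.
    """
    body = json.dumps({"model": model, "messages": messages}).encode()
    if provider == "azure":
        # Azure routes by deployment name, not model name, and
        # authenticates with an "api-key" header.
        url = (f"{endpoint}/openai/deployments/{deployment}"
               f"/chat/completions?api-version={api_version}")
        headers = {"api-key": key, "Content-Type": "application/json"}
    else:
        # OpenAI routes by model name and uses a Bearer token.
        url = "https://api.openai.com/v1/chat/completions"
        headers = {"Authorization": f"Bearer {key}",
                   "Content-Type": "application/json"}
    return url, headers, body
```

To actually send the request, pass the three values to urllib.request.Request and urlopen; a 200 response confirms the key, endpoint, and deployment name line up.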

Navigation Path

  1. Navigate to "Skill Studio".

  2. Select "API Configurations" from the left panel.

  3. Within this area, you can add a "Chat Completion" configuration and an "Embeddings" configuration.

Configuring

  1. Add Chat Completion:

    • Click on the "Add" button next to "Chat Completion".

    • Enter the required connection information and specify the model details.

      • Name: A name for this entry. The name is up to you; a common practice is to use the model name.

      • Choose whether you are using Azure or OpenAI.

      • Azure Specific Configuration

        • Endpoint: Provide the endpoint for the resource you have provisioned in Azure. This is a URL.

        • Key: Provide the key for the resource you have provisioned in Azure. Keep this key secret.

        • Deployment Name: Provide the name of the deployment the model was deployed under.

        • Model Name: The name of the model used for the deployment. The name must match the model actually deployed in Azure.

      • See the advanced configuration notes below for the remaining fields. Generally, these should not be modified.

    • Once all the details are entered, click on "Add" to apply the configurations.

  2. Add Embeddings:

    • Click on the "Add" button next to "Embeddings".

    • Enter the required connection information and specify the model details.

    • Once all the details are entered, click on "Add" to apply the configurations.

Ensure that a default model is set for both Chat Completion and Embedding. Each is required for the system to operate properly.

Examples and Use Cases

Example 1: Cost-Efficiency and Performance Flexibility

Scenario: A user wants to use different models for different purposes to balance cost and performance.

  • Chat and Narrative: The user configures a fast, low-cost model for the narrative endpoint to ensure fast response times, but uses a slower, more advanced model for chat to ensure a quality chat experience.

  • Outcome: The system runs efficiently with high performance for interactive components and cost savings for reports.

Example 2: Testing and Development

Scenario: A user wants to test the latest models and easily switch between them for various tasks.

  • Testing Latest Models: The user adds new models as they become available for testing purposes.

  • Switching Models: The user can easily toggle between the currently used model and the alternative model to evaluate performance and accuracy.

  • Outcome: The user can experiment with and compare different models without losing the current setup, facilitating smoother development and testing processes.

Example 3: Skill Development

  • Skill Development: During skill development, a skill's code can directly access any of the models configured in the system. This lets skill writers use a model that is particularly good at the task they are asking of it. For example, if a skill prompts a model to generate SQL, there may be a model that excels at SQL generation but is weaker at other tasks.

  • Outcome: The user can leverage custom models for specific tasks within their skill.
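The per-task routing described above can be sketched as a small lookup inside skill code. Note this is purely illustrative: the MODEL_CONFIGS names and the pick_model helper are hypothetical and not part of any AnswerRocket API; the model names are placeholders for entries you configured in this screen.

```python
# Hypothetical per-task model routing inside a skill.
# The keys mirror configuration entry names set up in API Configurations.
MODEL_CONFIGS = {
    "default": "gpt-4o",        # placeholder: general chat/narrative model
    "sql": "sql-tuned-model",   # placeholder: a model strong at SQL generation
}

def pick_model(task: str) -> str:
    """Route a task to its configured model, falling back to the default."""
    return MODEL_CONFIGS.get(task, MODEL_CONFIGS["default"])
```

With this pattern, a SQL-generation prompt goes to the SQL-tuned entry while every other task falls back to the default chat model.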

Advanced Options/Settings

For the Chat Completion models, there are advanced options available to fine-tune the model's behavior. These options can be explored in detail on the OpenAI API website. Here are some key options:

  • Temperature: Controls the creativity of the model. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more deterministic. We recommend keeping it at 0.

  • Max Tokens: Sets the maximum number of tokens to generate. Our default is 1024.

  • Top P: Controls diversity via nucleus sampling. Default is 1.

  • Frequency Penalty: Reduces the likelihood of repeating the same line verbatim. Default is 0.

  • Presence Penalty: Increases the likelihood of introducing new topics. Default is 0.

Recommendation: Generally, it is recommended to go with the default settings provided, but specific use cases may benefit from adjustments to these parameters for increased creativity or other desired outcomes.
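The defaults listed above can be expressed as the parameter block they become in a chat completion request body (the parameter names match the OpenAI API; the sampling_params helper is just an illustrative convenience, not a product API):

```python
# Recommended defaults from this article, keyed by OpenAI API parameter name.
DEFAULT_SAMPLING = {
    "temperature": 0,        # deterministic output (recommended)
    "max_tokens": 1024,      # product default
    "top_p": 1,              # full nucleus; no extra truncation
    "frequency_penalty": 0,  # no penalty on repeated tokens
    "presence_penalty": 0,   # no push toward new topics
}

def sampling_params(**overrides):
    """Merge use-case-specific overrides onto the recommended defaults."""
    return {**DEFAULT_SAMPLING, **overrides}
```

For example, a use case that wants more creative output might call sampling_params(temperature=0.8) while leaving every other setting at its default.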

Best Practices

  • Consistent Updates: Regularly check for new models and updates to ensure you are using the most efficient and accurate models available. Check our release pages to see what we are testing.

  • Testing and Validation: Before switching models in production, thoroughly test them in a controlled environment.

  • Set Defaults: Ensure that a default model is set for both Chat Completion and Embedding to avoid system malfunction.
