Diagnostics

Introduction

The Diagnostic Feature is a powerful tool designed to help users understand and optimize the entire answer pipeline so that responses are accurate and efficient. It provides general information about each answer, including the source copilot, cost, model, token usage, and more. It also provides a comprehensive view of the stages involved in generating an answer, from the system prompt and hints to skill selection, parameter settings, grounding, and response generation. Additionally, it lets users evaluate behavior in a playground environment and track admin feedback for continuous improvement.

Navigation

To access the diagnostics feature, enable the Question Browser permission. A diagnostics icon at the bottom of each answer lets you drill into the diagnostics for that answer.

Key Components of the Diagnostic Feature:

Chat Pipeline Diagnosis

Overall diagnosis of the chat pipeline.

Skill Selection:

Understand which skill was selected to answer a specific query. This section shows the inputs to that choice: the user question, message history, system prompt, selected hints, the names and descriptions of the available functions, and the relevant LLM (large language model).

These elements help identify whether the right functions are being used and show how the function was selected. Users can also run evaluations of skill selection to identify and address any gaps.

For diagnosis support, users can choose to run the following LLM evaluation:

  • Skill Choice:
    • A second LLM pass that checks whether the model agrees with the skill selected when the question was first answered.

Parameter Selection:

This section provides detailed information relevant to parameter selection, including the final arguments used by the skill, the chat history, hints, the system prompt, and detailed information on each parameter, such as its description, type, value, and enum values.

This information helps in understanding and diagnosing the parameter selection process. Users can review the selected parameters, run diagnostics, and use the playground to change and reevaluate the parameters to ensure they are optimized for the query.

For diagnosis support, users can choose to run the following LLM evaluation:

  • Parameters Choice:
    • A second LLM pass that checks whether the model agrees with the parameter values selected when the question was first answered.

Grounding and Response:

Analyze how the system grounds its responses, using context and background information to provide accurate answers. This section includes:

  • Parameters before grounding: The initial parameters selected based on the user’s query and context.
  • Parameters after grounding: The adjusted parameters that are used to generate the final response.

Understanding the parameters before and after grounding helps in evaluating how the system refines its inputs to provide accurate and relevant answers.
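As an illustration only, the comparison described above can be sketched in a few lines. This assumes you have copied the parameters shown before and after grounding into two key/value maps; the function name `grounding_changes` and the parameter names and values (`metric`, `region`, `period`) are hypothetical and do not come from the product, which is not assumed to expose any programmatic API.

```python
def grounding_changes(before: dict, after: dict) -> dict:
    """Map each parameter that grounding added, removed, or changed to (before, after).

    None marks a parameter that is absent on that side.
    """
    changes = {}
    # Union of parameter names seen on either side, in a stable order.
    for name in sorted(before.keys() | after.keys()):
        present_in_both = name in before and name in after
        if not present_in_both or before[name] != after[name]:
            changes[name] = (before.get(name), after.get(name))
    return changes


# Hypothetical parameter values copied from the diagnostics view.
before = {"metric": "sales", "region": "west", "period": "last quarter"}
after = {"metric": "sales", "region": "West Region", "period": "2024-Q1"}

print(grounding_changes(before, after))
# {'period': ('last quarter', '2024-Q1'), 'region': ('west', 'West Region')}
```

Parameters that grounding left untouched (here `metric`) are omitted, so the output lists only the refinements worth reviewing.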

Answer

The final LLM response is generated based on all the steps above, including skill selection, history, parameters, prompts, and facts passed to the LLM. This section includes the assistant's response and provides additional details if there was a guardrail response or if no function was selected.

For diagnosis support, users can choose to run the following LLM evaluations:

  • Faithfulness:
    • Assess whether the response accurately represents the information and context provided. This can often highlight where hallucinations did or could take place.
  • Query Relevance:
    • Determine if the response directly answers the user’s query.
  • Content Completeness:
    • Check if the content contains enough information to be considered complete and informative.

Users can run these evaluations to ensure the response meets the required standards. Additionally, they can use the playground to simulate different scenarios for further analysis and improvement, ensuring the assistant's responses are accurate, relevant, and comprehensive.

Answer Info, Link and General Functionality

This section shows information about the user’s question along with relevant links, including the Copilot name and ID, cost, token budget, LLM, a link to the answer, and an option to add the question to question collections.

Track Admin Feedback:

Monitor and incorporate feedback from administrators to refine and enhance the system’s performance. This feedback loop, when coupled with a Continuous Improvement plan, helps maintain high standards of accuracy and relevance in responses.

Comprehensive Logs for Debugging and Analysis:

Access detailed logs to debug issues within the answer pipeline. This is mainly useful for developers diagnosing failing questions.

Benefits of the Diagnostic Feature:

  1. Comprehensive Understanding: Gain a complete view of how answers are generated, ensuring transparency and accuracy.
  2. Skill Optimization: Verify and refine the selection of skills to ensure the right functions are used for queries.
  3. Parameter Adjustment: Diagnose and optimize parameters to enhance response quality.
  4. Grounding Accuracy: Analyze and refine how inputs are grounded for accurate and relevant answers.
  5. Response Evaluation: Assess the final response for faithfulness, relevance, and completeness, ensuring high-quality answers.
  6. Continuous Improvement: Utilize admin feedback and detailed logs for ongoing system optimization and troubleshooting.

Use Cases

Use Case 1: Diagnosis of a Failed Question

When a question fails to generate a response, it’s crucial to diagnose the issue to understand the root cause and take corrective actions. The diagnostic feature provides a structured approach to identify and resolve such failures.

Recommended Key Actions:

  1. Check the Answer Section: Review the final LLM response to see if the reason for the failure is explicitly stated. Sometimes the answer section includes details about why the response failed, such as a guardrail response, missing data, no dataset connection, or no function being selected.
  2. Review Logs for Debugging: Access and examine detailed logs related to the failed question.
  3. Examine Parameter Selection: Review the parameters used and determine whether incorrect or suboptimal parameters led to the failure. Look at the parameters before and after grounding to see how the system refined its inputs to generate the final response.
  4. Incorporate Admin Feedback: Use admin feedback to enhance the performance and accuracy of responses.


Use Case 2: Unexpected Dimension Value in Parameter

When you encounter an issue where a parameter has an unexpected dimension value (e.g., you expected X but got Y), follow these steps to diagnose and correct the problem:

  1. Analyze Grounding and Response: Examine the parameters before and after grounding to determine if the issue occurred before or after grounding.
  2. If Issue Occurred Before Grounding:
    • Review the conversation history leading up to the parameter selection.
    • Check the value match in the dataset or hints to ensure the correct values are being matched and selected based on the user’s query and context.
  3. If Issue Occurred After Grounding:
    • Review the default settings of the skill being used to ensure they are correctly configured.
    • Examine the parameter descriptions, sample values, and enums to ensure accuracy and conformity to expected types and ranges.
  4. Run LLM Evaluations:
    • Run the Parameters Choice evaluation, which focuses on parameter accuracy.

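The first step of this use case (deciding whether the unexpected value appeared before or after grounding) can be sketched as a small triage helper. This is a hypothetical illustration, not part of the product: the function `locate_issue` and the example values are invented, and it makes the simplifying assumption that grounding may change a value's letter case but not its spelling. Real grounding can rewrite a value into a dataset's canonical form, in which case you need to compare the two lists by eye.

```python
def locate_issue(name: str, unexpected: str, before: dict, after: dict) -> str:
    """Return the stage where an unexpected parameter value first appears."""

    def same(value) -> bool:
        # Simplifying assumption: grounding may change letter case but not spelling.
        return isinstance(value, str) and value.casefold() == unexpected.casefold()

    if same(before.get(name)):
        # Already wrong before grounding: review the conversation history and
        # value matching in the dataset or hints.
        return "before grounding"
    if same(after.get(name)):
        # Introduced during or after grounding: review the skill's default settings
        # and the parameter descriptions, sample values, and enums.
        return "after grounding"
    return "unexpected value not found in either list"


# Hypothetical example: you expected "West" but the answer used "East".
print(locate_issue("region", "East", {"region": "east"}, {"region": "East"}))  # before grounding
print(locate_issue("region", "East", {"region": "West"}, {"region": "East"}))  # after grounding
```

A parameter that is missing before grounding and filled in afterwards (for example, from a skill default) is reported as "after grounding", which points to reviewing the skill's default settings.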