
Bedrock Claude models seem to raise "InvalidLLMOutputType("Response validation failed!")" #1249

Open
metalshanked opened this issue Jun 24, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@metalshanked

System Info

2.2.6

🐛 Describe the bug

Using the Claude Bedrock models (any of the supported models listed) throws the error below after successful code generation and execution:



InvalidLLMOutputType("Response validation failed!")



dosubot bot commented Jun 24, 2024

To resolve the InvalidLLMOutputType("Response validation failed!") error with the Claude Bedrock models, ensure that the response from the model is correctly formatted and validated. The error suggests that the response does not match the expected format.

Here is a potential fix:

  1. Ensure that the response from the invoke_model method is correctly parsed and validated.
  2. Check that the response contains the expected fields and types.

Here is the relevant part of the code with added validation:

def call(self, instruction: BasePrompt, context: PipelineContext = None) -> str:
    prompt = instruction.to_string()
    memory = context.memory if context else None

    messages = []
    system_prompt = ""
    if memory:
        if memory.agent_info:
            system_prompt = memory.get_system_prompt()
        for message in memory.all():
            if message["is_user"]:
                if messages and messages[-1]["role"] == "user":
                    messages[-1]["content"].append(
                        {"type": "text", "text": message["message"]}
                    )
                else:
                    messages.append(
                        {
                            "role": "user",
                            "content": [
                                {"type": "text", "text": message["message"]}
                            ],
                        }
                    )
            else:
                messages.append(
                    {
                        "role": "assistant",
                        "content": [{"type": "text", "text": message["message"]}],
                    }
                )

    # adding current prompt as latest query message
    if messages and messages[-1]["role"] == "user":
        messages[-1]["content"].append({"type": "text", "text": prompt})
    else:
        messages.append(
            {"role": "user", "content": [{"type": "text", "text": prompt}]},
        )

    params = {
        "anthropic_version": "bedrock-2023-05-31",
        "system": system_prompt,
        "messages": messages,
    }
    for key, value in self._default_params.items():
        if key != "model" and value is not None:
            params[key] = value

    body = json.dumps(params)
    response = self.client.invoke_model(modelId=self.model, body=body)

    response_body = json.loads(response.get("body").read())

    # Validate response format
    if "content" not in response_body or not isinstance(response_body["content"], list):
        raise InvalidLLMOutputType("Response validation failed!")

    self.last_prompt = prompt
    return response_body["content"][0]["text"]

This code ensures that the response from the invoke_model method contains a content field and that it is a list. If the validation fails, it raises the InvalidLLMOutputType error.
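
For reference, the Messages API response that Bedrock returns for the Claude models is a JSON object whose content field is a list of content blocks; only that field (and the text of its first block) is read above. A minimal sketch of the shape being validated (field values are illustrative, not taken from this issue):

# Illustrative response body from invoke_model for a Claude Messages API call.
# Only "content" (a list of blocks) and content[0]["text"] are used above.
response_body = {
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "generated answer"}],
    "stop_reason": "end_turn",
}
assert isinstance(response_body["content"], list)
print(response_body["content"][0]["text"])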

Additionally, refer to the unit tests in tests/unit_tests/llms/test_bedrock_claude.py to ensure your implementation aligns with the expected behavior:

class MockBedrockRuntimeClient:
    def invoke_model(self, **kwargs):
        text = json.dumps({"content": [{"text": "This is the expected text."}]})
        text = io.StringIO(text)
        return {"body": text}

class TestBedrockClaude:
    """Unit tests for the BedrockClaude LLM class"""

    @pytest.fixture
    def prompt(self):
        class MockBasePrompt(BasePrompt):
            template: str = "Hello"

        return MockBasePrompt()

    @pytest.fixture
    def context(self):
        return MagicMock()

    def test_call(self, mocker, prompt):
        llm = BedrockClaude(bedrock_runtime_client=MockBedrockRuntimeClient())
        expected_text = "This is the expected text."
        result = llm.call(instruction=prompt)
        assert result == expected_text

By ensuring that your response validation logic matches the expected format and using the provided unit tests as a reference, you should be able to resolve the InvalidLLMOutputType("Response validation failed!") error [1][2].


@dosubot dosubot bot added the bug Something isn't working label Jun 24, 2024
@metalshanked
Author

metalshanked commented Jun 25, 2024

The error seems to come from the validate method of ClarificationQuestionPrompt(BasePrompt), shown below.
The json.loads call fails because the output I captured (after the method) is not valid JSON: it is a sentence of prose followed by the JSON array.

    def validate(self, output) -> bool:
        try:
            output = output.replace("```json", "").replace("```", "")
            json_data = json.loads(output)
            return isinstance(json_data, List)
        except json.JSONDecodeError:
            return False
            
Based on the query "what was the max and min", here are some potential clarification questions a senior data scientist might ask:

[
  "QuestionA?",
"QuestionB?"
]
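
Since the array itself is valid JSON, one possible client-side workaround is to slice the bracketed block out of the mixed output before parsing. A minimal sketch, not pandasai code: extract_json_array is a hypothetical helper, and it assumes the questions array is the last bracketed block in the response.

import json
from typing import List, Optional


def extract_json_array(output: str) -> Optional[List]:
    """Hypothetical helper: pull a JSON array out of prose + JSON output."""
    # Strip markdown fences first, as the existing validate() does.
    output = output.replace("```json", "").replace("```", "")
    start, end = output.find("["), output.rfind("]")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(output[start : end + 1])
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, list) else None

This only works around the symptom on the client side; the model is still wrapping the array in explanatory prose.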


dosubot bot commented Jun 25, 2024

To ensure that the response from the validate method for ClarificationQuestionPrompt(BasePrompt) is correctly formatted as valid JSON and to avoid the InvalidLLMOutputType("Response validation failed!") error, you need to make sure that the output string is properly formatted as JSON. The validate method already attempts to strip any markdown formatting and parse the JSON. Here is the relevant code snippet:

def validate(self, output) -> bool:
    try:
        output = output.replace("```json", "").replace("```", "")
        json_data = json.loads(output)
        return isinstance(json_data, list)
    except json.JSONDecodeError:
        return False

To avoid the error, ensure that the output string passed to this method is a valid JSON array. For example, the output should look like this:

[
    {"question": "What is the purpose of this data?"},
    {"question": "Can you provide more details on the data source?"}
]

If the output is not in this format, it will fail the json.loads parsing and return False, leading to the InvalidLLMOutputType error.
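
As a quick local check of the two cases, plain json.loads on a prose-prefixed string fails, while a bare JSON array parses cleanly (illustrative strings, not the project's tests):

import json

mixed = 'Here are some clarification questions:\n["QuestionA?", "QuestionB?"]'
bare = '["QuestionA?", "QuestionB?"]'

try:
    json.loads(mixed)  # prose before the array -> JSONDecodeError
except json.JSONDecodeError as exc:
    print("invalid:", exc)

print(json.loads(bare))  # ['QuestionA?', 'QuestionB?']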

Additionally, you can refer to the unit tests provided in the tests/unit_tests/_tests_agent.py file to see how the validate method is tested. For example, the test_clarification_prompt_validate_output_false_case and test_clarification_prompt_validate_output_true_case tests show how the method handles invalid and valid JSON responses, respectively:

def test_clarification_prompt_validate_output_false_case(self, agent: Agent):
    agent._lake.llm.call = Mock()
    agent._lake.llm.call.return_value = "This is not json"

    prompt = ClarificationQuestionPrompt(
        dataframes=agent._lake.dfs,
        conversation="test conversation",
        query="test query",
    )
    with pytest.raises(Exception):
        agent._call_llm_with_prompt(prompt)

def test_clarification_prompt_validate_output_true_case(self, agent: Agent):
    agent._lake.llm.call = Mock()
    agent._lake.llm.call.return_value = '["This is test question"]'

    prompt = ClarificationQuestionPrompt(
        dataframes=agent._lake.dfs,
        conversation="test conversation",
        query="test query",
    )
    result = agent._call_llm_with_prompt(prompt)
    assert isinstance(result, str)

These tests ensure that the validate method correctly identifies valid and invalid JSON responses [1][2].


@metalshanked
Author

Is there a way to remove the extra text so that the JSON loads correctly for Bedrock?


dosubot bot commented Jun 25, 2024

Yes, you can remove the extra text to ensure the JSON loads correctly for Bedrock. Specifically, you need to remove the conflict markers (<<<<<<<, =======, >>>>>>>) and ensure the JSON structure is valid.

Here is the corrected code snippet:

class BedrockClaude(LLM):
    """Bedrock Claude LLM
       Generates text using Anthropic Claude Messages API.

    Attributes:
        bedrock_runtime_client: The boto3 bedrock runtime client.
        max_tokens: Max number of tokens to generate.
        model: The Bedrock Claude model to use.
        temperature: (Optional) The amount of randomness injected into the response.
        top_p: (Optional) Use nucleus sampling. In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
        top_k: (Optional) Only sample from the top K options for each subsequent token.
        stop_sequences: (Optional) Custom text sequences that cause the model to stop generating. Anthropic Claude models normally stop when they have naturally completed their turn, in this case the value of the stop_reason response field is end_turn. If you want the model to stop generating when it encounters custom strings of text, you can use the stop_sequences parameter. If the model encounters one of the custom text strings, the value of the stop_reason response field is stop_sequence and the value of stop_sequence contains the matched stop sequence.
    """

    _supported__models = [
        "anthropic.claude-3-opus-20240229-v1:0",
        "anthropic.claude-3-5-sonnet-20240620-v1:0",
        "anthropic.claude-3-sonnet-20240229-v1:0",
        "anthropic.claude-3-haiku-20240307-v1:0",
    ]
    _valid_params = [
        "max_tokens",
        "model",
        "temperature",
        "top_p",
        "top_k",
        "stop_sequences",
    ]

    max_tokens: int = 1024
    model: str = "anthropic.claude-3-sonnet-20240229-v1:0"
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    top_k: Optional[float] = None
    stop_sequences: Optional[str] = None
    client: Any

    def __init__(self, bedrock_runtime_client, **kwargs):
        for key, val in kwargs.items():
            if key in self._valid_params:
                setattr(self, key, val)

        self.client = bedrock_runtime_client

        if self.model not in self._supported__models:
            raise UnsupportedModelError(self.model)

        invoke_model = getattr(self.client, "invoke_model", None)
        if not callable(invoke_model):
            raise APIKeyNotFoundError

    @property
    def _default_params(self) -> Dict[str, Any]:
        """Get the default parameters for calling text generation inference API."""
        return {
            "max_tokens": self.max_tokens,
            "model": self.model,
            "temperature": self.temperature,
            "top_p": self.top_p,
            "top_k": self.top_k,
            "stop_sequences": self.stop_sequences,
        }

    def call(self, instruction: BasePrompt, context: PipelineContext = None) -> str:
        prompt = instruction.to_string()
        memory = context.memory if context else None

        messages = []
        system_prompt = ""
        if memory:
            if memory.agent_info:
                system_prompt = memory.get_system_prompt()
            for message in memory.all():
                if message["is_user"]:
                    if messages and messages[-1]["role"] == "user":
                        messages[-1]["content"].append(
                            {"type": "text", "text": message["message"]}
                        )
                    else:
                        messages.append(
                            {
                                "role": "user",
                                "content": [
                                    {"type": "text", "text": message["message"]}
                                ],
                            }
                        )
                else:
                    messages.append(
                        {
                            "role": "assistant",
                            "content": [{"type": "text", "text": message["message"]}],
                        }
                    )

        # adding current prompt as latest query message
        if messages and messages[-1]["role"] == "user":
            messages[-1]["content"].append({"type": "text", "text": prompt})
        else:
            messages.append(
                {"role": "user", "content": [{"type": "text", "text": prompt}]},
            )

        params = {
            "anthropic_version": "bedrock-2023-05-31",
            "system": system_prompt,
            "messages": messages,
        }
        for key, value in self._default_params.items():
            if key != "model" and value is not None:
                params[key] = value

        body = json.dumps(params)
        # print(body)

        response = self.client.invoke_model(modelId=self.model, body=body)

        response_body = json.loads(response.get("body").read())

        self.last_prompt = prompt
        # print(response_body["content"][0]["text"])
        return response_body["content"][0]["text"]

    @property
    def type(self) -> str:
        return "bedrock-claude"

This ensures that the JSON structure is valid and removes any extra text that might interfere with the JSON parsing [1].
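
For completeness, a minimal usage sketch of the class above, assuming boto3 is installed, AWS credentials and region are configured for Bedrock access, and BedrockClaude is importable from pandasai.llm as in pandasai 2.x (adjust the import path, region, and model ID to your setup):

import boto3
from pandasai.llm import BedrockClaude

# Bedrock runtime client; region and credentials are assumed to be configured.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

llm = BedrockClaude(
    bedrock_runtime_client=bedrock_runtime,
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    max_tokens=1024,
)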


@shivamongit

The BedrockClaude class has unresolved merge conflicts.
Here is the cleaned-up version of the class without the conflict markers:
from __future__ import annotations

import json
from typing import TYPE_CHECKING, Any, Dict, Optional

from ..exceptions import APIKeyNotFoundError, UnsupportedModelError
from ..helpers import load_dotenv
from ..prompts.base import BasePrompt
from .base import LLM

if TYPE_CHECKING:
    from pandasai.pipelines.pipeline_context import PipelineContext

load_dotenv()


class BedrockClaude(LLM):
    """Bedrock Claude LLM
       Generates text using Anthropic Claude Messages API.

    Attributes:
        bedrock_runtime_client: The boto3 bedrock runtime client.
        max_tokens: Max number of tokens to generate.
        model: The Bedrock Claude model to use.
        temperature: (Optional) The amount of randomness injected into the response.
        top_p: (Optional) Use nucleus sampling. In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by top_p. You should alter either temperature or top_p, but not both.
        top_k: (Optional) Only sample from the top K options for each subsequent token.
        stop_sequences: (Optional) Custom text sequences that cause the model to stop generating. Anthropic Claude models normally stop when they have naturally completed their turn, in this case the value of the stop_reason response field is end_turn. If you want the model to stop generating when it encounters custom strings of text, you can use the stop_sequences parameter. If the model encounters one of the custom text strings, the value of the stop_reason response field is stop_sequence and the value of stop_sequence contains the matched stop sequence.
    """

    _supported__models = [
        "anthropic.claude-3-opus-20240229-v1:0",
        "anthropic.claude-3-5-sonnet-20240620-v1:0",
        "anthropic.claude-3-sonnet-20240229-v1:0",
        "anthropic.claude-3-haiku-20240307-v1:0",
    ]
    _valid_params = [
        "max_tokens",
        "model",
        "temperature",
        "top_p",
        "top_k",
        "stop_sequences",
    ]

    max_tokens: int = 1024
    model: str = "anthropic.claude-3-sonnet-20240229-v1:0"
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    top_k: Optional[float] = None
    stop_sequences: Optional[str] = None
    client: Any

    def __init__(self, bedrock_runtime_client, **kwargs):
        for key, val in kwargs.items():
            if key in self._valid_params:
                setattr(self, key, val)

        self.client = bedrock_runtime_client

        if self.model not in self._supported__models:
            raise UnsupportedModelError(self.model)

        invoke_model = getattr(self.client, "invoke_model", None)
        if not callable(invoke_model):
            raise APIKeyNotFoundError

    @property
    def _default_params(self) -> Dict[str, Any]:
        """Get the default parameters for calling text generation inference API."""
        return {
            "max_tokens": self.max_tokens,
            "model": self.model,
            "temperature": self.temperature,
            "top_p": self.top_p,
            "top_k": self.top_k,
            "stop_sequences": self.stop_sequences,
        }

    def call(self, instruction: BasePrompt, context: PipelineContext = None) -> str:
        prompt = instruction.to_string()
        memory = context.memory if context else None

        messages = []
        system_prompt = ""
        if memory:
            if memory.agent_info:
                system_prompt = memory.get_system_prompt()

            for message in memory.all():
                if message["is_user"]:
                    if messages and messages[-1]["role"] == "user":
                        messages[-1]["content"].append(
                            {"type": "text", "text": message["message"]}
                        )
                    else:
                        messages.append(
                            {
                                "role": "user",
                                "content": [
                                    {"type": "text", "text": message["message"]}
                                ],
                            }
                        )
                else:
                    messages.append(
                        {
                            "role": "assistant",
                            "content": [{"type": "text", "text": message["message"]}],
                        }
                    )

        # adding current prompt as latest query message
        if messages and messages[-1]["role"] == "user":
            messages[-1]["content"].append({"type": "text", "text": prompt})
        else:
            messages.append(
                {"role": "user", "content": [{"type": "text", "text": prompt}]},
            )

        params = {
            "anthropic_version": "bedrock-2023-05-31",
            "system": system_prompt,
            "messages": messages,
        }
        for key, value in self._default_params.items():
            if key != "model" and value is not None:
                params[key] = value

        body = json.dumps(params)
        # print(body)

        response = self.client.invoke_model(modelId=self.model, body=body)

        response_body = json.loads(response.get("body").read())

        self.last_prompt = prompt
        # print(response_body["content"][0]["text"])
        return response_body["content"][0]["text"]

    @property
    def type(self) -> str:
        return "bedrock-claude"
Also, install PyYAML: pip install pyyaml
