
Unable To Generate Schema: InvalidLLMOutputType (Response validation failed!) #1265

Open
mayankagarwal19911 opened this issue Jul 2, 2024 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@mayankagarwal19911

System Info

pandasai 2.2.8
Python 3.9.6

🐛 Describe the bug

Sample code

df = pd.DataFrame(columns=["column1"], data=[["value1"]])
df.head()
semantic_agent = SemanticAgent(dfs=df)
print(semantic_agent._schema)

Error

in BaseAgent.call_llm_with_prompt(self, prompt)
                    return result
                else:
                    raise InvalidLLMOutputType("Response validation failed!")
            except Exception:
                if (
                    not self.context.config.use_error_correction_framework
                    or retry_count >= self.context.config.max_retries - 1
                ):

@mayankagarwal19911 mayankagarwal19911 changed the title Unable To Generate Schema InvalidLLMOutputType: Response validation failed! Unable To Generate Schema: InvalidLLMOutputType (Response validation failed!) Jul 2, 2024
@dosubot dosubot bot added the bug Something isn't working label Jul 2, 2024

dosubot bot commented Jul 2, 2024

Hello, @mayankagarwal19911! I'm here to assist you with any bugs, questions, or contributions you may have. Let's tackle this issue together!

To resolve the InvalidLLMOutputType: Response validation failed! error and successfully generate the schema using pandasai 2.2.8 and Python 3.9.6, follow these steps:

  1. Ensure the Template Correctly Specifies the Expected Output Type:
    Modify the template in pandasai/prompts/templates/correct_output_type_error_prompt.tmpl to ensure it specifies the expected output type.

    {% for df in context.dfs %}{% set index = loop.index %}{% include 'shared/dataframe.tmpl' with context %}{% endfor %}
    
    The user asked the following question:
    {{context.memory.get_conversation()}}
    
    You generated this python code:
    {{code}}
    
    Fix the python code above and return the new python code but the result type should be: {{output_type}}
    
  2. Set the Correct Output Type in Your Context:
    Ensure that the output_type variable is correctly set to the expected type in your context. For example, if you expect a DataFrame, output_type should be set to DataFrame.

    from jinja2 import Template
    
    template_str = """
    {% for df in context.dfs %}{% set index = loop.index %}{% include 'shared/dataframe.tmpl' with context %}{% endfor %}
    
    The user asked the following question:
    {{context.memory.get_conversation()}}
    
    You generated this python code:
    {{code}}
    
    Fix the python code above and return the new python code but the result type should be: {{output_type}}
    """
    
    context = {
        'dfs': [df],  # Your DataFrame
        'memory': memory,  # Your conversation memory
        'code': generated_code,  # The generated Python code
        'output_type': 'DataFrame'  # The expected output type
    }
    
    template = Template(template_str)
    corrected_code = template.render(context=context)
    
    print(corrected_code)
  3. Handle Validation in call_llm_with_prompt Method:
    Ensure that the call_llm_with_prompt method in your BaseAgent class correctly handles the validation of the output type. The method should retry the prompt if the validation fails.

    def call_llm_with_prompt(self, prompt: BasePrompt):
        retry_count = 0
        while retry_count < self.context.config.max_retries:
            try:
                result: str = self.context.config.llm.call(prompt)
                if prompt.validate(result):
                    return result
                else:
                    raise InvalidLLMOutputType("Response validation failed!")
            except Exception:
                if (
                    not self.context.config.use_error_correction_framework
                    or retry_count >= self.context.config.max_retries - 1
                ):
                    raise
                retry_count += 1
  4. Generate and Validate the Schema:
    Ensure that the schema generation and validation are correctly implemented in the SemanticAgent class.

    def _create_schema(self):
        if self._schema:
            self.logger.log(f"using user provided schema: {self._schema}")
            return
    
        key = self._get_schema_cache_key()
        if self.config.enable_cache:
            value = self._schema_cache.get(key)
            if value is not None:
                self._schema = json.loads(value)
                self.logger.log(f"using schema: {self._schema}")
                return
    
        prompt = GenerateDFSchemaPrompt(context=self.context)
        result = self.call_llm_with_prompt(prompt)
        self.logger.log(f"Initializing Schema:  {result}")
        self._schema = result.replace("# SAMPLE SCHEMA", "")
        schema_data = extract_json_from_json_str(result.replace("# SAMPLE SCHEMA", ""))
        if isinstance(schema_data, dict):
            schema_data = [schema_data]
    
        self._schema = schema_data
        if self.config.enable_cache:
            self._schema_cache.set(key, json.dumps(self._schema))
    
        self.logger.log(f"using schema: {self._schema}")
  5. Ensure Correct Output Type Validation:
    Ensure that your generated code produces a result dictionary with the correct type and value fields that match the expected output type.

    class DataFrameOutputType(BaseOutputType):
        @property
        def template_hint(self):
            return """type (must be "dataframe"), value must be pd.DataFrame or pd.Series. Example: { "type": "dataframe", "value": pd.DataFrame({...}) }"""
    
        @property
        def name(self):
            return "dataframe"
    
        def _validate_value(self, actual_value: Any) -> bool:
            return bool(df_type(actual_value))

By following these steps, you should be able to resolve the InvalidLLMOutputType: Response validation failed! error and successfully generate the schema [1][2][3][4].
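The retry-and-validate pattern from steps 3 and 5 can be sketched in isolation. In this minimal sketch, `call` and `validate` are illustrative stand-ins for `config.llm.call` and `prompt.validate`; none of the names below are pandasai API:

```python
# Self-contained sketch of the retry-on-validation-failure pattern.
# `call` and `validate` stand in for the real LLM call and prompt
# validation; they are illustrative, not the pandasai API.

class InvalidLLMOutputType(Exception):
    pass

def call_with_retries(call, validate, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = call()
            if validate(result):
                return result
            raise InvalidLLMOutputType("Response validation failed!")
        except Exception:
            # Re-raise once the retry budget is exhausted.
            if attempt >= max_retries - 1:
                raise

# The first two "responses" fail validation; the third passes.
responses = iter(["oops", "oops", '{"type": "dataframe"}'])
result = call_with_retries(lambda: next(responses),
                           lambda r: r.startswith("{"))
print(result)
```

The key detail is that `InvalidLLMOutputType` is raised inside the `try`, so the same `except` path handles both LLM errors and validation failures, which is why disabling the error-correction framework surfaces the exception immediately.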


@pratik9409

Would like to solve this bug (#1265).

@gventuri
Collaborator

gventuri commented Jul 9, 2024

@pratik9409 sure, thanks a lot for the availability, I've assigned it to you! :D

@pratik9409

Testing with a simple DataFrame:

df = pd.DataFrame(columns=["Empdata"], data=[[1], [2]])
df.head()

try:
    # Create an instance of the SemanticAgent with the provided dataframe
    semantic_agent = SemanticAgent(dfs=df)
    # Print the generated schema
    print(semantic_agent._schema)
except InvalidLLMOutputType as e:
    # If the LLM fails to generate a valid schema, catch the exception
    print(f"Error: {e}")  # Print the error message
    print("Using fallback schema...")  # Inform the user that a fallback schema will be used
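A fallback schema like the one mentioned here could be derived from the DataFrame itself when the LLM cannot produce one. This is only a sketch: the keys (`name`, `columns`, `type`) and the `fallback_schema` helper are assumptions for illustration, not pandasai's actual schema format.

```python
import pandas as pd

def fallback_schema(df: pd.DataFrame) -> list:
    # Build a bare-bones schema from the DataFrame's own columns and dtypes.
    # Key names here are illustrative, not the official pandasai format.
    return [{
        "name": "fallback_table",
        "columns": [
            {"name": col, "type": str(df[col].dtype)} for col in df.columns
        ],
    }]

df = pd.DataFrame(columns=["Empdata"], data=[[1], [2]])
print(fallback_schema(df))
```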

[Screenshot: SemanticAgent schema output]
