Getting KeyError for most use cases on a simple DataFrame using the open-source llama3:8b-instruct model via Ollama #1243

Open
sktg84 opened this issue Jun 20, 2024 · 4 comments
Labels
bug Something isn't working

Comments


sktg84 commented Jun 20, 2024

System Info

Server:
mac m1

Packages:
pandas==1.5.3
pandasai==2.1

Python 3.10.9

🐛 Describe the bug

Getting a KeyError for most use cases; it's not able to parse DataFrames with a local LLM.

2024-06-20 10:05:10 [ERROR] Failed with error: Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandasai/pipelines/chat/code_execution.py", line 85, in execute
result = self.execute_code(code_to_run, code_context)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandasai/pipelines/chat/code_execution.py", line 171, in execute_code
exec(code, environment)
File "<string>", line 3, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/frame.py", line 8402, in groupby
return DataFrameGroupBy(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 965, in __init__
grouper, exclusions, obj = get_grouper(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/groupby/grouper.py", line 888, in get_grouper
raise KeyError(gpr)
KeyError: 'software_version'

2024-06-20 10:05:10 [ERROR] Pipeline failed on step 6: 'software_version'
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandasai/pipelines/chat/generate_chat_pipeline.py", line 310, in run
).run(input)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandasai/pipelines/pipeline.py", line 137, in run
raise e
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandasai/pipelines/pipeline.py", line 101, in run
step_output = logic.execute(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandasai/pipelines/chat/code_execution.py", line 113, in execute
raise e
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandasai/pipelines/chat/code_execution.py", line 85, in execute
result = self.execute_code(code_to_run, code_context)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandasai/pipelines/chat/code_execution.py", line 171, in execute_code
exec(code, environment)
File "<string>", line 3, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/frame.py", line 8402, in groupby
return DataFrameGroupBy(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 965, in __init__
grouper, exclusions, obj = get_grouper(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/groupby/grouper.py", line 888, in get_grouper
raise KeyError(gpr)
KeyError: 'software_version'
2024-06-20 10:05:38 [INFO] HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-20 10:05:38 [INFO] Prompt used:

@dosubot dosubot bot added the bug Something isn't working label Jun 20, 2024

dosubot bot commented Jun 20, 2024

Hey there, @sktg84! 👋 I'm here to help you out with any bugs or questions you have. I'm a bot that can assist with solving bugs, answering questions, and becoming a contributor. Let's tackle this issue together!

To resolve the KeyError related to the 'software_version' key and successfully parse the DataFrame while using the llama3:8b-instruct model via Ollama, you need to ensure that the 'software_version' key is included in the extras dictionary when calling the serialize method. Here is an example of how you can do this:

import pandas as pd
from pandasai.helpers.dataframe_serializer import DataframeSerializer, DataframeSerializerType

# Sample DataFrame
data = {
    'column1': [1, 2, 3],
    'column2': [4, 5, 6]
}
df = pd.DataFrame(data)

# Create an instance of DataframeSerializer
serializer = DataframeSerializer()

# Define the extras dictionary with the 'software_version' key
extras = {
    'index': 0,
    'type': 'example_type',
    'software_version': '1.0.0'  # Ensure this key is included
}

# Serialize the DataFrame to JSON
json_output = serializer.serialize(df, extras, type_=DataframeSerializerType.JSON)

print(json_output)

This code ensures that the 'software_version' key is present in the extras dictionary, which should prevent the KeyError and allow the DataFrame to be successfully parsed [1].

To continue talking to Dosu, mention @dosu.

Author

sktg84 commented Jun 20, 2024

This code is auto-generated by the LLM.

Author

sktg84 commented Jun 25, 2024

@dosu any updates?


dosubot bot commented Jun 25, 2024

To resolve the KeyError: 'software_version', you need to ensure that the 'software_version' column exists in the DataFrame before performing operations like groupby. Since the code is auto-generated by the LLM, you can add a check in the execute_code method to handle this gracefully.
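
The failure itself is plain pandas behavior and can be reproduced without PandasAI. The sketch below assumes a hypothetical DataFrame with columns named `Software Version` and `count`, since the real column names are not shown in this issue. Grouping by a label that is not in `df.columns` raises the same `KeyError` seen in the traceback, and a membership check detects the problem before `groupby` is called.

```python
import pandas as pd

# Hypothetical data: the real DataFrame from this issue is not shown.
df = pd.DataFrame(
    {"Software Version": ["1.0", "1.0", "2.0"], "count": [3, 4, 5]}
)

# The column name used by the LLM-generated code in the traceback.
group_key = "software_version"

raised = False
missing = None
try:
    # groupby validates its keys eagerly, so this line raises immediately.
    df.groupby(group_key)["count"].sum()
except KeyError as exc:
    raised = True
    missing = exc.args[0]

# A membership test on df.columns detects the problem before grouping.
column_exists = group_key in df.columns

print(f"KeyError raised: {raised}, missing key: {missing!r}")
print(f"{group_key!r} in df.columns: {column_exists}")
```

If this is what happens here, the generated code refers to a column name that differs from the one actually present in the DataFrame.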

Here is an updated version of the execute_code method that includes a check for the 'software_version' column:

def execute_code(self, code: str, context: CodeExecutionContext) -> Any:
    """
    Execute the python code generated by LLMs to answer the question
    about the input dataframe. Run the code in the current context and return the
    result.

    Args:
        code (str): Python code to execute.
        context (CodeExecutionContext): Code Execution Context
                with prompt id and skills.

    Returns:
        Any: The result of the code execution. The type of the result depends
            on the generated code.

    """
    # List the required dfs, so we can avoid to run the connectors
    # if the code does not need them
    dfs = self._required_dfs(code)
    environment: dict = get_environment(self._additional_dependencies)
    environment["dfs"] = self._get_originals(dfs)
    if len(environment["dfs"]) == 1:
        environment["df"] = environment["dfs"][0]

    if self._config.direct_sql:
        environment["execute_sql_query"] = self._dfs[0].execute_direct_sql_query

    # Add skills to the env
    if context.skills_manager.used_skills:
        for skill_func_name in context.skills_manager.used_skills:
            skill = context.skills_manager.get_skill_by_func_name(skill_func_name)
            environment[skill_func_name] = skill

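    # Check if 'software_version' column exists in the DataFrame.
    # "df" is only set above when exactly one dataframe is in scope,
    # so look it up defensively instead of indexing the dict directly.
    df = environment.get("df")
    if df is not None and 'software_version' not in df.columns:
        raise KeyError("The 'software_version' column is missing from the DataFrame.")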

    # Execute the code
    exec(code, environment)

    # Get the result
    if "result" not in environment:
        raise NoResultFoundError("No result returned")

    return environment["result"]

This code snippet adds a check to ensure that the 'software_version' column exists in the DataFrame before executing the code. If the column is missing, it raises a KeyError with a descriptive message [1].

