Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue #1257 empty dataframe #1258

Closed
wants to merge 1 commit into from
Closed

Conversation

xuyuanme
Copy link

@xuyuanme xuyuanme commented Jun 28, 2024

As described in #1257: considering all possible cases, in pandasai/connectors/sql.py the "where clause" needs to use OR instead of AND to ensure all data gets loaded in dfs.

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jun 28, 2024
@gventuri
Copy link
Collaborator

gventuri commented Jul 1, 2024

@xuyuanme it is indeed expected to be a "AND". We need to come up with a new syntax for the "OR" in my opinion, otherwise we risk some breaking changes

@xuyuanme
Copy link
Author

xuyuanme commented Jul 1, 2024

Well received! I think we do need an "OR" syntax, otherwise in case described in #1257, it results a bug of empty data frame and chart.

@gventuri
Copy link
Collaborator

gventuri commented Jul 1, 2024

@xuyuanme agreed.

This is the currently accepted value:

"where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["loan_status", "=", "PAIDOFF"],
],

I suggest to add an option to accept an object in the where clause, something like this:

"where": {
    "operator": "AND",
    "conditions": [
        {"column": "loan_status", "operator": "=", "value": "PAIDOFF"},
        {
            "operator": "OR",
            "conditions": [
                {"column": "loan_amount", "operator": ">", "value": 3000},
                {"column": "loan_amount", "operator": "<", "value": 1500}
            ]
        }
    ]
}

Of course we should keep supporting the old way as well.

What's your take? Any alternative syntax?

@xuyuanme
Copy link
Author

xuyuanme commented Jul 2, 2024

Thanks @gventuri . I think the syntax itself should work. One problem is, the conditions (in #1257) are generated from self._additional_filters, instead of self.config.where.

self._additional_filters are created by function _extract_filters() in code_execution.py, which returns:

        Returns:
            dict: The dictionary containing all filters parsed from
                the passed code. The dictionary has the following structure:
                {
                    "<df_number>": [
                        ("<left_operand>", "<operator>", "<right_operand>")
                    ]
                }

This code does not read user config to generate the sql. So it does not have a place to config whether it's "AND" or "OR". Any thought on how this part should be handled?

@gventuri
Copy link
Collaborator

gventuri commented Jul 3, 2024

@xuyuanme I recommend in that case to pass in the configs "direct_sql": True.
In this way, PandasAI will generate directly the SQL query with the more appropriate OR!

@xuyuanme
Copy link
Author

xuyuanme commented Jul 3, 2024

Ok let me do more tests against it. Thanks for the guide!

@xuyuanme
Copy link
Author

xuyuanme commented Jul 9, 2024

Close this PR.

@xuyuanme xuyuanme closed this Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XS This PR changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Empty dataframe is generated in code execution stage, which result in empty chart
2 participants