Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty dataframe is generated in code execution stage, which result in empty chart #1257

Open
xuyuanme opened this issue Jun 28, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@xuyuanme
Copy link

xuyuanme commented Jun 28, 2024

System Info

The current version of pandasai being used: 2.2.7

馃悰 Describe the bug

In some case LLM will generate code to retrieve different subset of a same dataframe. In the example below it's generating the "first_purchases" dataframe as well as "repurchases" dataframe. Then these two dataframes are merged for further analysis, which makes perfect sense:

# Assuming dfs[0] is the DataFrame we need to work with
df = dfs[0]

# Filter out first-time purchases and repurchases
first_purchases = df[df['buyer_purchase_sequence'] == 1][['buyer_uid', 'product_type']]
repurchases = df[df['buyer_purchase_sequence'] > 1][['buyer_uid', 'product_type']]

# Merge first purchases with repurchases to track the product type of the first purchase against repurchases
merged_df = pd.merge(repurchases, first_purchases, on='buyer_uid', suffixes=('_repurchase', '_first_purchase'))

In the code_execution phase, two filters are extracted: "buyer_purchase_sequence == 1" (for "first_purchases" dataframe) and "buyer_purchase_sequence > 1" (for "repurchases" dataframe).

However in pandasai/connectors/sql.py, this will concatenate a where clause: where buyer_purchase_sequence == 1 AND buyer_purchase_sequence > 1. This will result an empty dataframe in dfs, and eventually an empty chart.

I understand the design of filtering dataframe is for better system performance. But considering all possible cases, the "where clause" needs to use OR instead of AND to ensure all data gets loaded in dfs.

@dosubot dosubot bot added the bug Something isn't working label Jun 28, 2024
Copy link

dosubot bot commented Jun 28, 2024

Hey @xuyuanme! 馃憢 I'm here to help you with any bugs, questions, or contributions you have in mind. Let's tackle this together!

To address the issue where the where clause in pandasai/connectors/sql.py uses AND instead of OR, you need to modify the _build_query method. Specifically, replace the " AND ".join(condition_strings) with " OR ".join(condition_strings).

Here is the modified _build_query method:

def _build_query(self, limit=None, order=None):
    base_query = select("*").select_from(text(self.cs_table_name))
    if self.config.where or self._additional_filters:
        # conditions is the list of where + additional filters
        conditions = []
        if self.config.where:
            conditions += self.config.where
        if self._additional_filters:
            conditions += self._additional_filters

        query_params = {}
        condition_strings = []
        valid_operators = ["=", ">", "<", ">=", "<=", "LIKE", "!=", "IN", "NOT IN"]

        for i, condition in enumerate(conditions):
            if len(condition) == 3:
                column_name, operator, value = condition
                if operator in valid_operators:
                    self._validate_column_name(column_name)

                    condition_strings.append(f"{column_name} {operator} :value_{i}")
                    query_params[f"value_{i}"] = value

        if condition_strings:
            where_clause = " OR ".join(condition_strings)  # Changed from AND to OR
            base_query = base_query.where(
                text(where_clause).bindparams(**query_params)
            )

    if order:
        base_query = base_query.order_by(asc(text(order)))

    if limit:
        base_query = base_query.limit(limit)

    return base_query

This change ensures that the conditions in the where clause are combined using OR instead of AND, which will load all relevant data into dfs [1].

To continue talking to Dosu, mention @dosu.

xuyuanme added a commit to xuyuanme/pandas-ai that referenced this issue Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant