table with negative data fails to save as image when using .fmt_number #391

Open
Mike-Purtell opened this issue Jul 3, 2024 · 14 comments


@Mike-Purtell

Description

Saving a table as a PNG image fails when the table contains negative values and .fmt_number is used.

Reproducible example: verified on complex use cases and on the simple example posted here. Note that the file extension is .txt; please change it to .py or paste the code into a notebook to run it.

gt_bug_2024_07_03_MP.txt

Development environment

Windows 11, great_tables 0.9.0, Python 3.11.5 with Anaconda/JupyterLab, polars 0.20.31

Expected result

I expected that a table with negative data could be cleaned up with .fmt_number and then saved as an image file. Instead, saving failed.

@machow
Collaborator

machow commented Jul 8, 2024

Hey, thanks for raising--I'm having some trouble viewing the .txt file. Do you mind pasting in the python code directly?

@Mike-Purtell
Author

Mike-Purtell commented Jul 8, 2024 via email

@Mike-Purtell
Author

Here is the python code I wrote to demonstrate the reported bug:

import great_tables
from great_tables import GT
import polars as pl


def save_gt(df, filename):
    my_gt = (
        GT(df).tab_header(title=f'{filename}', subtitle=f'subtitle')
        # TO TEST THIS BUG, RUN THIS CODE WITH and WITHOUT .fmt_number
        # save table to image fails when .fmt_number with negative values is used
        .fmt_number(
            columns=df.columns,
            decimals=1,
            use_seps=True,
            sep_mark=','
        )
    )
    try:
        my_gt.save(filename, window_size=(6, 6))
        print(f'\n ###########  SUCCESSFULLY WROTE {filename}  ###########\n')
    except:
        print(f'\n ###########  FAILED TO WRITE {filename}  ###########\n')
    return


df_pos = pl.DataFrame(
    {
        'A': [x for x in list(range(3))],
        'B': [x * 0.5 for x in list(range(3))],
        'C': [x * 01.5 for x in list(range(3))],
    }
)

# make df_neg by multiplying all values of df_pos by -1
df_neg = df_pos.with_columns(pl.all() * pl.lit(-1))
display(df_neg, df_pos)
save_gt(df_neg, 'df_neg.png')
save_gt(df_pos, 'df_pos.png')

@jrycw
Contributor

jrycw commented Jul 9, 2024

Hello, I reformatted the code to make it easier to read on GitHub. Hope this helps!

By the way, it seems that the display import is missing. I suspect we need to add from IPython.display import display at the top.

import great_tables
import polars as pl
from great_tables import GT
from IPython.display import display


def save_gt(df, filename):
    my_gt = (
        GT(df).tab_header(title=f"{filename}", subtitle=f"subtitle")
        # TO TEST THIS BUG, RUN THIS CODE WITH and WITHOUT .fmt_number
        # save table to image fails when .fmt_number with negative values is used
        .fmt_number(columns=df.columns, decimals=1, use_seps=True, sep_mark=",")
    )
    try:
        my_gt.save(filename, window_size=(6, 6))
        print(f"\n ###########  SUCCESSFULLY WROTE {filename}  ###########\n")
    except:
        print(f"\n ###########  FAILED TO WRITE {filename}  ###########\n")
    return


df_pos = pl.DataFrame(
    {
        "A": [x for x in list(range(3))],
        "B": [x * 0.5 for x in list(range(3))],
        "C": [x * 01.5 for x in list(range(3))],
    }
)

# make df_neg by multiplying all values of df_pos by -1
df_neg = df_pos.with_columns(pl.all() * pl.lit(-1))
display(df_neg, df_pos)
save_gt(df_neg, "df_neg.png")
save_gt(df_pos, "df_pos.png")

@Mike-Purtell
Author

Thank you for reformatting the Python code. I'm not sure how I get away without from IPython.display import display; it is probably made available automatically by my Anaconda/Jupyter environment. Thank you for working on this issue, it is greatly appreciated, and if I can help in any way, please don't hesitate to ask.

@Mike-Purtell
Author

Thank you for releasing 0.10. I ran the test case I submitted and it worked, which I'm very happy about. In my production code, however, I still cannot format tables with negative values. My error message indicates a problem encoding the minus sign, which is the Unicode character U+2212 (0x2212). In polars, I tried casting to Utf8 and then back to Float64, but still had the issue. I also tried multiplying all values by -1 twice to see whether that would produce an acceptable minus sign, also to no avail. I will see if I can come up with a usable workaround for now.

@Mike-Purtell
Author

great_tables 0.10.0 has issues with .fmt_number. Verified using Python 3.11.9 and polars 1.1.0, both with Anaconda/Spyder and with a notebook in JupyterLab. A short Python script (18 lines) is attached as a .txt file.

A workaround is to have polars do the rounding instead of great_tables' .fmt_number. This workaround only covers rounding; it does not cover other features of .fmt_number, such as thousands separators.
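A related stdlib alternative, sketched under the assumption that the columns are converted to pre-formatted strings before `GT(df)` is called (so great_tables receives plain text and applies no number formatting of its own): Python's format specification handles both rounding and thousands separators, and it writes negatives with the ASCII hyphen-minus rather than U+2212. The helper name `preformat` is hypothetical.

```python
def preformat(values, decimals=1):
    """Round and add thousands separators using Python's own formatting.

    Negative numbers keep the ASCII hyphen-minus ('-', U+002D) instead of the
    Unicode minus sign (U+2212), so every result encodes cleanly in cp1252.
    """
    return [format(v, f",.{decimals}f") for v in values]


if __name__ == "__main__":
    formatted = preformat([-1234.56, -0.5, 0.0, 1500.0])
    print(formatted)  # ['-1,234.6', '-0.5', '0.0', '1,500.0']
    # Each string survives the Windows "charmap" (cp1252) codec.
    for s in formatted:
        s.encode("cp1252")
```

The trade-off is that the formatted columns become strings, so column alignment and any later numeric formatting in the table may behave differently than with numeric columns.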

great_table_fmt_number_2024_07_13.txt

@machow
Collaborator

machow commented Jul 15, 2024

Thanks for looking into this (and to @jrycw for the cleanup!). I'm having some trouble reproducing it :/. Based on the examples, I ran the code below but did not hit an error.

import polars as pl
from great_tables import GT
from IPython.display import display


df_pos = pl.DataFrame(
    {
        "A": [x for x in list(range(3))],
        "B": [x * 0.5 for x in list(range(3))],
        "C": [x * 01.5 for x in list(range(3))],
    }
)

# make df_neg by multiplying all values of df_pos by -1
df_neg = df_pos.with_columns(pl.all() * pl.lit(-1))
display(df_neg, df_pos)
(
    GT(df_neg)
    .tab_header(title="a", subtitle="b")
    .fmt_number(columns=df_neg.columns, decimals=1, use_seps=True, sep_mark=",")
    .save("test.png", window_size=(6,6))
)

Do you mind pasting in the traceback for the error (or the error name)? I'm a bit stumped on what might cause saving a table to fail when formatting negative numbers... 😵
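The bare `except:` in the original reproducer hides exactly this information. A minimal sketch of a wrapper that reports the error name and full traceback instead; `run_and_report` is a hypothetical helper, and the demo uses a cp1252 encode as a stand-in for `my_gt.save(...)`.

```python
import traceback


def run_and_report(action, label):
    """Run action() and report success, or the error name plus traceback.

    Unlike a bare `except:`, this only catches Exception subclasses and keeps
    the details needed for a bug report.
    """
    try:
        action()
    except Exception as exc:
        print(f"FAILED TO WRITE {label}: {type(exc).__name__}: {exc}")
        traceback.print_exc()  # full traceback goes to stderr
        return False
    print(f"SUCCESSFULLY WROTE {label}")
    return True


if __name__ == "__main__":
    # Stand-ins for my_gt.save(...): encoding U+2212 with cp1252 fails,
    # while the ASCII hyphen-minus encodes fine.
    run_and_report(lambda: "\u2212".encode("cp1252"), "df_neg.png")
    run_and_report(lambda: "-".encode("cp1252"), "df_pos.png")
```

Inside `save_gt`, the call would become `run_and_report(lambda: my_gt.save(filename, window_size=(6, 6)), filename)`.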

@Mike-Purtell
Author

Mike-Purtell commented Jul 15, 2024 via email

@Mike-Purtell
Author

Here is just the code from my previous post:
great_table_fmt_number_2024_07_13.txt

@jrycw
Contributor

jrycw commented Jul 15, 2024

I'm running on Windows 11 as well and cannot reproduce the error with or without .fmt_number(). However, I suspect the error may stem from these lines, which deal with the minus sign.

import random

import polars as pl
from great_tables import GT

random.seed(42)
col_1 = [random.uniform(-1.0, 1.0) for _ in range(7)]
col_2 = [random.uniform(-1.0, 1.0) for _ in range(7)]
df = pl.DataFrame({"COL_1": col_1, "COL_2": col_2})

print(df.head(7))
my_gt = (
    GT(df).tab_header(title="Positive, Negative Cosine")
    # Test with .fmt_number invoked, and with .fmt_number commented out
    # .fmt_number(columns=['COL_1', 'COL_2'], decimals=3)
)

# .save fails when great_tables .fmt_number was used
my_gt.save("Random.png", window_size=(6, 6))

@Mike-Purtell
Author

I ran this code on my personal machine and my work PC, both running Windows 11 with Anaconda/Spyder and great_tables 0.10.0. I get the same error on both whenever I include .fmt_number. The error message says the character \u2212 (the Unicode minus sign, U+2212) cannot be encoded. Can the lines that deal with negative values be changed to handle this character, or to replace the minus sign with an equivalent character that the default encoding supports, such as the ASCII hyphen-minus? Here is the error message:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 7431: character maps to <undefined>
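A small stdlib illustration of that error, assuming (as the 'charmap' codec name suggests) that the failing write uses the Windows cp1252 code page. `\u2212` is Python's escape notation for a Unicode code point, not a UTF-16 value; whether it can be written depends on the target encoding. `MINUS` and `cell` are illustrative names.

```python
MINUS = "\u2212"  # U+2212 MINUS SIGN: a Unicode code point
cell = f"{MINUS}0.5"  # e.g. how a formatted negative cell might look

# UTF-8 can represent every code point; U+2212 takes three bytes.
print(cell.encode("utf-8"))  # b'\xe2\x88\x920.5'

# cp1252 (implemented by Python's "charmap" codec) has no slot for U+2212,
# which reproduces the error from the issue.
try:
    cell.encode("cp1252")
except UnicodeEncodeError as exc:
    print(exc)  # 'charmap' codec can't encode character '\u2212' in position 0: character maps to <undefined>

# Replacing it with the ASCII hyphen-minus makes the text cp1252-safe.
print(cell.replace(MINUS, "-").encode("cp1252"))  # b'-0.5'
```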

@jrycw
Contributor

jrycw commented Jul 16, 2024

Another possible fix would be to set the encoding to UTF-8 when writing files in GT.save() and related helper functions.
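A minimal stdlib sketch of that idea, assuming the save path writes an intermediate HTML file through a plain `open()` call (the great_tables internals are not shown here). `write_text`, the file name, and the HTML fragment are hypothetical; cp1252 is passed explicitly to simulate a typical Windows default.

```python
import locale
import os
import tempfile

# Hypothetical HTML fragment like one a table export might write.
html = "<td>\u22120.5</td>"

# open() without encoding= uses the locale's preferred encoding; on Windows
# this is often cp1252, which cannot store U+2212.
print("default text encoding:", locale.getpreferredencoding(False))


def write_text(path, text, encoding):
    """Write text to path with an explicit encoding; return the file size."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.html")

    # Simulated Windows default: raises UnicodeEncodeError.
    try:
        write_text(path, html, encoding="cp1252")
    except UnicodeEncodeError as exc:
        print("cp1252 failed:", exc.reason)

    # An explicit encoding="utf-8" behaves the same on every platform.
    write_text(path, html, encoding="utf-8")
    with open(path, encoding="utf-8") as f:
        assert f.read() == html
```

On the user side, Python's UTF-8 mode (`PYTHONUTF8=1` or `python -X utf8`) makes `open()` default to UTF-8 on Windows too, which may sidestep the error until a fix lands.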

@machow
Collaborator

machow commented Jul 16, 2024

Ah, thanks for surfacing! That bit of code definitely looks like the issue, and setting the encoding seems like it should resolve it 😓

Projects
None yet
Development

No branches or pull requests

3 participants