BUG: ValueError when accessing dataFrame with array attribute #59196

Open
2 of 3 tasks
Zybulon opened this issue Jul 6, 2024 · 3 comments · May be fixed by #59217
Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)


Zybulon commented Jul 6, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

attrs = {"A": "B", "G": np.array([1.2, 2.4])}

# This one works
arr = np.random.rand(60, 1)
df_named = pd.DataFrame(arr)
df_named.attrs = attrs
print(df_named[0])

# This one works
arr = np.random.rand(61, 1)
df_named = pd.DataFrame(arr)
df_named.attrs = {"A": "B", "G": "A"}
print(df_named[0])

# This one does not work
arr = np.random.rand(61, 1)
df_named = pd.DataFrame(arr)
df_named.attrs = attrs
print(df_named)  # This works
print(df_named[0])  # This raises ValueError

Issue Description

Hello,

I have a DataFrame of shape (61, 1) with two entries in attrs (one of them a NumPy array), and I can't print the first Series of the DataFrame.
I get the following error:

Traceback (most recent call last):

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File d:\documents\perso\travail\mbda\pandas_extension\h5pandas\tests\debug.py:23
    print(df_named[0])  # This does not works

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\series.py:1784 in __repr__
    return self.to_string(**repr_params)

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\series.py:1871 in to_string
    formatter = fmt.SeriesFormatter(

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\io\formats\format.py:225 in __init__
    self._chk_truncate()

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\io\formats\format.py:247 in _chk_truncate
    series = concat((series.iloc[:row_num], series.iloc[-row_num:]))

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\reshape\concat.py:395 in concat
    return op.get_result()

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\reshape\concat.py:650 in get_result
    return result.__finalize__(self, method="concat")

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\generic.py:6273 in __finalize__
    have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\generic.py:6273 in <genexpr>
    have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

However, printing the whole DataFrame works and does not raise the ValueError.
If attrs does not contain the array, there is no ValueError.
If the DataFrame has only 60 rows, there is no ValueError either.
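
The last frame of the traceback compares two attrs dicts with ==. The following is a minimal sketch of why that can raise, using only NumPy and plain dicts. It assumes the two objects being compared hold separate copies of the array, which is simulated here with copy.deepcopy; the helper name compare is made up for this illustration and is not part of pandas.

```python
import copy

import numpy as np

attrs = {"A": "B", "G": np.array([1.2, 2.4])}
# Assumption: the two slices being concatenated carry separate copies of attrs.
attrs_copy = copy.deepcopy(attrs)


def compare(left, right):
    """Return the result of left == right, or the error text if it raises."""
    try:
        return left == right
    except ValueError as exc:
        return f"ValueError: {exc}"


# Same array object on both sides: the dict comparison short-circuits on identity.
same_object = compare(attrs, attrs)

# Distinct array objects: array == array gives an element-wise array,
# and taking its truth value is ambiguous for more than one element.
distinct_copy = compare(attrs, attrs_copy)

print(same_object)
print(distinct_copy)
```

This would also be consistent with the 60-row case passing: if no truncation happens, no concat and no attrs comparison take place.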

Expected Behavior

Printing the Series should not raise a ValueError, whatever the contents of attrs.
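
As an illustration of what a comparison that tolerates array values could look like, here is a rough sketch. The function attrs_equal is hypothetical: it is not a pandas API and is not claimed to be what the linked pull request does. It only handles top-level values, so arrays nested inside lists or dicts are not covered.

```python
import numpy as np


def attrs_equal(left, right):
    """Hypothetical helper: compare two attrs dicts without bool() on arrays."""
    if left.keys() != right.keys():
        return False
    for key in left:
        a, b = left[key], right[key]
        if isinstance(a, np.ndarray) or isinstance(b, np.ndarray):
            # np.array_equal returns a single bool instead of an element-wise array.
            if not np.array_equal(a, b):
                return False
        elif a != b:
            return False
    return True


first = {"A": "B", "G": np.array([1.2, 2.4])}
second = {"A": "B", "G": np.array([1.2, 2.4])}
third = {"A": "B", "G": np.array([1.2, 9.9])}

print(attrs_equal(first, second))
print(attrs_equal(first, third))
```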

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.12.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : fr_FR.cp1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0
setuptools : 70.1.1
pip : 24.0
Cython : None
pytest : 8.2.2
hypothesis : None
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.26.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.4
numba : None
numexpr : 2.8.7
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : 3.9.2
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : 0.22.0
tzdata : 2024.1
qtpy : 2.4.1
pyqt5 : None

Zybulon added the Bug and Needs Triage labels Jul 6, 2024
@crspencer11

take

crspencer11 linked a pull request Jul 9, 2024 that will close this issue
crspencer11 removed their assignment Jul 9, 2024
@Anurag-Varma
Contributor

I did some debugging.

By default, display.max_rows in pandas is set to 60.

With more than 60 rows, the Series repr is truncated, and that truncation path (the concat call in your traceback) is where it fails, as in your case above.

To avoid it, raise the limit. For example, if you want a maximum of 100 rows:

pd.set_option("display.max_rows", 100)

Then it will work. For any other limit, replace 100 with that value.
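
If changing the option globally is not desirable, the same workaround can be scoped to a single print. A small sketch, assuming pd.option_context is acceptable in your setup; the helper name repr_with_max_rows is made up for this example.

```python
import numpy as np
import pandas as pd


def repr_with_max_rows(obj, max_rows=100):
    """Build the repr with display.max_rows raised only inside this call."""
    with pd.option_context("display.max_rows", max_rows):
        return repr(obj)


df_named = pd.DataFrame(np.random.rand(61, 1))
df_named.attrs = {"A": "B", "G": np.array([1.2, 2.4])}

# 61 rows is below the temporary limit, so the repr is not truncated.
text = repr_with_max_rows(df_named[0])
print(text)
```

This only sidesteps the truncation path; the attrs comparison itself is unchanged.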

@Anurag-Varma
Contributor

take
