BUG: ValueError when accessing dataFrame with array attribute #59196

Zybulon · 2024-07-06T22:52:44Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

attrs = {"A": "B", "G": np.array([1.2, 2.4])}

# Works: 60 rows, attrs contain an array
arr = np.random.rand(60, 1)
df_named = pd.DataFrame(arr)
df_named.attrs = attrs
print(df_named[0])

# Works: 61 rows, attrs contain no array
arr = np.random.rand(61, 1)
df_named = pd.DataFrame(arr)
df_named.attrs = {"A": "B", "G": "A"}
print(df_named[0])

# Fails: 61 rows, attrs contain an array
arr = np.random.rand(61, 1)
df_named = pd.DataFrame(arr)
df_named.attrs = attrs
print(df_named)  # This works
print(df_named[0])  # This does not work

Issue Description

Hello,

I have a DataFrame of shape (61, 1) with two attributes (one of which is a NumPy array), and I can't print the first Series of the DataFrame.
I get the following error:

Traceback (most recent call last):

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File d:\documents\perso\travail\mbda\pandas_extension\h5pandas\tests\debug.py:23
    print(df_named[0])  # This does not works

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\series.py:1784 in __repr__
    return self.to_string(**repr_params)

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\series.py:1871 in to_string
    formatter = fmt.SeriesFormatter(

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\io\formats\format.py:225 in __init__
    self._chk_truncate()

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\io\formats\format.py:247 in _chk_truncate
    series = concat((series.iloc[:row_num], series.iloc[-row_num:]))

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\reshape\concat.py:395 in concat
    return op.get_result()

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\reshape\concat.py:650 in get_result
    return result.__finalize__(self, method="concat")

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\generic.py:6273 in __finalize__
    have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])

  File ~\miniforge-pypy3\envs\h5pandas_dev\Lib\site-packages\pandas\core\generic.py:6273 in <genexpr>
    have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

However, printing the whole DataFrame works and does not raise the ValueError.
If the DataFrame does not have the array attribute, no ValueError is raised.
If the DataFrame has only 60 rows, no ValueError is raised.
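
The last frames of the traceback compare two attrs dictionaries with ==, reached from _chk_truncate through concat. The sketch below reproduces that kind of comparison without pandas, on the assumption (not confirmed in this thread) that the two truncated slices carry separate copies of the array. Comparing dicts whose values are distinct NumPy arrays asks for the truth value of an element-wise result, which raises the same ValueError. The names shared and copied are illustrative only.

```python
import copy

import numpy as np

attrs = {"A": "B", "G": np.array([1.2, 2.4])}

# Shallow copy: both dicts hold the very same array object, so the dict
# comparison short-circuits on identity and never evaluates array == array.
shared = dict(attrs)
shared_equal = attrs == shared

# Deep copy: the arrays are distinct objects, so the dict comparison has to
# evaluate array == array and then take the truth value of the result.
copied = copy.deepcopy(attrs)
try:
    attrs == copied
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(shared_equal)
print(error_message)
```

If that assumption holds, it would also fit the observations above: with 60 rows the Series repr is not truncated, so the concat step and the attrs comparison are never reached.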

Expected Behavior

Printing the Series should not raise this ValueError.

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.12.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : fr_FR.cp1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0
setuptools : 70.1.1
pip : 24.0
Cython : None
pytest : 8.2.2
hypothesis : None
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.26.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.4
numba : None
numexpr : 2.8.7
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : 3.9.2
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : 0.22.0
tzdata : 2024.1
qtpy : 2.4.1
pyqt5 : None


crspencer11 · 2024-07-09T17:16:30Z

take

Anurag-Varma · 2024-07-15T12:48:10Z

I did some debugging:

By default, display.max_rows in pandas is set to 60.

With more than 60 rows, printing the Series fails as in your case above.

To avoid it, you can raise the limit.
For example, if you want at most 100 rows:

pd.set_option("display.max_rows", 100)

Then it will work. For any other limit, replace 100 with that value.
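
A scoped variant of this workaround, as a sketch: pd.option_context applies display.max_rows only inside the with block, so the global display setting is left alone. The 61-row frame and the limit of 100 mirror the example in this issue. This only sidesteps the truncated repr and does not fix the underlying attrs comparison.

```python
import numpy as np
import pandas as pd

# Same shape and attrs as the failing case in the report, with
# deterministic values instead of random ones.
df = pd.DataFrame(np.arange(61, dtype=float).reshape(61, 1))
df.attrs = {"A": "B", "G": np.array([1.2, 2.4])}

# Raise the row limit only for this block, so the 61-row Series is
# printed in full and the truncation path is not taken.
with pd.option_context("display.max_rows", 100):
    text = repr(df[0])

print(text)
```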

Anurag-Varma · 2024-07-15T13:43:29Z

take

Zybulon added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Jul 6, 2024

github-actions bot assigned crspencer11 Jul 9, 2024

crspencer11 linked a pull request Jul 9, 2024 that will close this issue

fix-issue-59196: add extra check for np arrays #59217

Open

5 tasks

crspencer11 removed their assignment Jul 9, 2024

github-actions bot assigned Anurag-Varma Jul 15, 2024
