ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1'] #59139

Open
3 tasks done
Hermann12 opened this issue Jun 28, 2024 · 6 comments
Labels
Bug IO CSV read_csv, to_csv

Comments

@Hermann12

Hermann12 commented Jun 28, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# https://stackoverflow.com/a/78681763/12621346

import pandas as pd

df = pd.read_csv("test.csv", usecols=['Col1', 'Col2'], header=0, names=['first', 'third'])
print(df)

Issue Description

This is still a bug! The documentation clearly says: "For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']."
If I use it as described, I get: "ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']". Only the index form [0, 1, 2] works! The error message is also misleading.

Expected Behavior

The behavior should match the documentation. Use case: https://stackoverflow.com/a/78681763/12621346
If I want to select columns by their old names and rename them to new names, this works only with indices such as 1, 2, 3 and not with column names.

Installed Versions

2.0.3

@Hermann12 added the Bug and Needs Triage labels on Jun 28, 2024
@Aloqeely
Member

Thanks for the report! The documentation states: "If names are given, the document header row(s) are not taken into account", which is the current behavior. So this sounds more like an enhancement request than a bug report to me. Is that right?
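
For anyone following along without test.csv, here is a minimal sketch of the behavior described above. The three-column CSV text is a hypothetical stand-in (the real file was not shared), and Col1/Col3 are chosen to match the names in the error message from the title.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv; the real file was not attached to the report.
CSV_TEXT = "Col1,Col2,Col3\n1,2,3\n4,5,6\n"

try:
    pd.read_csv(
        io.StringIO(CSV_TEXT),
        usecols=["Col1", "Col3"],  # names taken from the header row in the file
        header=0,
        names=["first", "third"],  # replaces the header row, so Col1/Col3 are unknown
    )
    error_message = None
except ValueError as exc:
    # Capture the message instead of letting the script stop.
    error_message = str(exc)

print(error_message)
```

Because names replaces the header row, usecols is matched against 'first' and 'third', not against the names in the file.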

@Aloqeely added the Needs Info and IO CSV labels and removed the Needs Triage label on Jun 28, 2024
@Hermann12
Author

Hermann12 commented Jun 28, 2024

I think this contradicts the other sentence in the documentation that I referenced in my report.
Quote: "For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']."
Therefore I assume usecols works with both forms, as the "or" says.
As I understand it, usecols is for reading the CSV and names is for how the result is presented. So in my opinion it's a bug, because it does not work with both forms as described in the documentation.

@Aloqeely
Member

Aloqeely commented Jun 28, 2024

Well yes, you can pass a list of the column names just as the documentation states. But it also states that if names are provided then the header row won't be considered.

@Aloqeely removed the Needs Info label on Jun 28, 2024
@Hermann12
Author

Stupid behavior. Not consistent in my opinion.

@Aloqeely
Member

Aloqeely commented Jun 29, 2024

If your CSV file has the columns col1, col2, col3, and you passed names=['name1', 'name2', 'name3'], then, passing usecols=['name1', 'name3'] will work correctly.

Can you share why you think it's inconsistent? If you pass names, it makes sense that usecols relies on those names rather than the names in the CSV header row. Do you agree?
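
A minimal sketch of that case, assuming a small in-memory CSV with the header col1, col2, col3 (the data values are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical sample data with the header row col1, col2, col3.
CSV_TEXT = "col1,col2,col3\n1,2,3\n4,5,6\n"

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=0,  # treat the first row as a header to be replaced, not as data
    names=["name1", "name2", "name3"],  # one new name per column in the file
    usecols=["name1", "name3"],  # selection refers to the new names
)

print(df)
```

Here names covers every column in the file, and usecols then picks a subset by the new names.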

@Hermann12
Author

Hermann12 commented Jul 1, 2024

That works, I agree. But in a use case where the input CSV has 25 columns and you need only the 1st and maybe the 23rd, you have to supply 25 new names just so that usecols can select by column name, even though the original names are still in the CSV. I think this is inefficient.
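
One possible workaround for this use case, sketched under assumptions: leave names out so that usecols matches the header names in the file, then rename the two selected columns afterwards with DataFrame.rename. The 25-column CSV and the new names 'first' and 'twenty_third' are hypothetical, invented only for illustration.

```python
import io

import pandas as pd

# Build a hypothetical 25-column CSV: header Col1..Col25 and two data rows.
columns = [f"Col{i}" for i in range(1, 26)]
rows = [[r * 100 + i for i in range(1, 26)] for r in range(2)]
csv_text = ",".join(columns) + "\n"
csv_text += "\n".join(",".join(map(str, row)) for row in rows) + "\n"

# Select by the names in the file (no names= argument), then rename.
df = pd.read_csv(io.StringIO(csv_text), usecols=["Col1", "Col23"]).rename(
    columns={"Col1": "first", "Col23": "twenty_third"}
)

print(df)
```

This avoids listing all 25 names, at the cost of a separate rename step after reading.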
