ENH: In pd.cut(), allow bins='auto' (leveraging np.histogram_bin_edges) #59165

Open
Hari-Shankar-Karthik opened this issue Jul 2, 2024 · 3 comments · May be fixed by #59241

Comments


Hari-Shankar-Karthik commented Jul 2, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When converting a quantitative variable into a qualitative one, pd.cut() is very handy. However, it requires the user to specify bins either as an integer or as a sequence of bin edges. I wish it were possible to pass bins='auto', the way np.histogram allows; np.histogram computes those edges with the same estimators that np.histogram_bin_edges exposes. Thank you.

Expectation

Instead of writing
pd.cut(df['x1'], bins=np.histogram_bin_edges(df['x1'], bins='auto'))
allow writing
pd.cut(df['x1'], bins='auto')
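The current workaround can be run as follows. This is a sketch on synthetic data; the DataFrame and the column name x1 are assumptions borrowed from the example above. One caveat: np.histogram_bin_edges returns edges whose first value equals the data minimum, and pd.cut intervals are right-closed by default, so the minimum falls outside the first interval and becomes NaN unless include_lowest=True is passed.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for df['x1'] from the issue (assumed).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=500)})

# Current workaround: compute the edges with NumPy, then hand them to pd.cut.
edges = np.histogram_bin_edges(df["x1"], bins="auto")

# With the default right=True the first interval is (edges[0], edges[1]],
# so the value equal to edges[0] (the minimum) is not assigned to any bin.
without_lowest = pd.cut(df["x1"], bins=edges)

# include_lowest=True closes the first interval on the left as well.
with_lowest = pd.cut(df["x1"], bins=edges, include_lowest=True)

print("unassigned without include_lowest:", without_lowest.isna().sum())
print("unassigned with include_lowest:", with_lowest.isna().sum())
print("number of bins:", len(with_lowest.cat.categories))
```

Any built-in bins='auto' support would presumably need to handle this edge case so that every non-missing value lands in a bin.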

Additional Context

NumPy already exposes this bin-edge calculation as np.histogram_bin_edges. Reference: https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html#numpy-histogram-bin-edges
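For reference, here is a short NumPy-only sketch of the function linked above. It computes edges for a few of the string estimators NumPy documents ('auto' uses the smaller bin width of 'sturges' and 'fd') and checks that, when no range is given, the edges run exactly from the data minimum to the data maximum. The sample data is an arbitrary assumption.

```python
import numpy as np

# Arbitrary sample data (assumption): 1,000 draws from N(50, 10^2).
rng = np.random.default_rng(42)
x = rng.normal(loc=50.0, scale=10.0, size=1_000)

results = {}
for method in ("auto", "fd", "sturges", "doane"):
    edges = np.histogram_bin_edges(x, bins=method)
    results[method] = edges
    # With range=None the outer edges are the data min and max.
    print(f"{method:>8}: {len(edges) - 1:3d} bins, "
          f"first edge == min: {edges[0] == x.min()}, "
          f"last edge == max: {edges[-1] == x.max()}")
```

Because the outer edges coincide with the extremes, these edges are directly usable as pd.cut bins, subject to the include_lowest caveat for the minimum.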

@Hari-Shankar-Karthik Hari-Shankar-Karthik added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 2, 2024
@Hari-Shankar-Karthik Hari-Shankar-Karthik changed the title ENH: In pd.cut(), allow bins='auto' (Automatic Calculation of bin edges using np.histogram_bin_edges ENH: In pd.cut(), allow bins='auto' (leveraging np.histogram_bin_edges) Jul 2, 2024
Member

Aloqeely commented Jul 4, 2024

Thanks for the suggestion! It appears there was an effort to allow string bins in pd.cut in #23567, but that PR went stale.
PRs are welcome to add string bins support, dispatching the string to np.histogram_bin_edges.
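A rough sketch of what that dispatch could look like, written as a standalone wrapper rather than a change to pandas internals. The function name cut_with_string_bins is hypothetical and is not part of the pandas API; unknown estimator names are left for np.histogram_bin_edges to reject with its own ValueError. Two assumptions are baked in: missing values are dropped before computing edges (NumPy cannot derive a finite range from NaN), and include_lowest defaults to True so the minimum is not lost.

```python
import numpy as np
import pandas as pd


def cut_with_string_bins(x, bins, **kwargs):
    """Hypothetical wrapper: resolve a string `bins` via NumPy, then call pd.cut.

    Non-string `bins` (int, sequence of edges, IntervalIndex) pass straight through.
    """
    if isinstance(bins, str):
        # NaN would make the autodetected range non-finite, so drop it first.
        values = pd.Series(x).dropna().to_numpy(dtype=float)
        # np.histogram_bin_edges raises ValueError for unknown estimator names.
        bins = np.histogram_bin_edges(values, bins=bins)
        # bins[0] equals the minimum; keep it inside the first interval.
        kwargs.setdefault("include_lowest", True)
    return pd.cut(x, bins=bins, **kwargs)


s = pd.Series([1.0, 2.5, np.nan, 4.0, 7.5, 9.0])
binned = cut_with_string_bins(s, bins="auto")
print(binned)
```

Missing values stay missing in the output, and integer or explicit-edge bins behave exactly as they do with pd.cut today. A real implementation inside pandas would likely also need to decide how string bins interact with parameters such as retbins and duplicates.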

@Aloqeely Aloqeely added cut cut, qcut and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 4, 2024

chaarvii commented Jul 4, 2024

Hey, I would like to work on this.


chaarvii commented Jul 4, 2024

Take

@chaarvii chaarvii linked a pull request Jul 12, 2024 that will close this issue

3 participants