Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: bin width < half spread #3646 #3724

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Gornoka
Copy link

@Gornoka Gornoka commented Jul 5, 2024

prevents edge-case when a bin only has elements with spread < 0.5*bin_width .

I would personally prefer this option, with ceil instead of min(1,..).
This would alter previous behavior for binning in general( creating an additional bin in ~50% of cases),I would be interested to know if that is acceptable here or if you would like to use the min approach instead.

* prevents edge-case when a bin only has elements with spread < 0.5 bin width
@mwaskom
Copy link
Owner

mwaskom commented Jul 5, 2024

What is the edge case? Is there an issue that you can link to? Is there a failing test that you could add?

@Gornoka
Copy link
Author

Gornoka commented Jul 5, 2024

What is the edge case? Is there an issue that you can link to? Is there a failing test that you could add?

this is related to #3646 , i can also add a test.

@mwaskom
Copy link
Owner

mwaskom commented Jul 5, 2024

I would personally prefer this option, with ceil instead of min(1,..).

Do you mean max(1, ...)?

That sounds like a better approach to me — IIUC the linked issue, the edge case is entirely about cases where the default bin rule gives "less than 1 bin" and that doesn't make sense. Why would we want to change other plots?

@Gornoka
Copy link
Author

Gornoka commented Jul 5, 2024

I would personally prefer this option, with ceil instead of min(1,..).

Do you mean max(1, ...)?

That sounds like a better approach to me — IIUC the linked issue, the edge case is entirely about cases where the default bin rule gives "less than 1 bin" and that doesn't make sense. Why would we want to change other plots?

Mainly I would want to fix the issue with no bins, personally i would prefer the behaviour with ceil plots instead round plots. But I understand that some people may expect the old behavior, that is why I asked for opinions on that.

I will update this PR to use the max approach, the min was just a mixup, sorry for the confusion.

@mwaskom
Copy link
Owner

mwaskom commented Jul 5, 2024

personally i would prefer the behaviour with ceil plots instead round plots

why though? do you have a reason?

@Gornoka
Copy link
Author

Gornoka commented Jul 6, 2024

personally i would prefer the behaviour with ceil plots instead round plots

why though? do you have a reason?

When I specify binwidth my assumption was that I would get a histogram with that bin width.
When I specify bins I would get a histogram with that number of bins.
When I specify the binrange I would get whatever is in that range depending on the other settings.

Example graphs of the current behavior below.
bin_sizes

I was expecting bindwidth to create new bins until it fits my whole data range .
If I have data that does not fit perfectly in the bin width, I should get the last bin semi-empty,
resulting in a round up of n_bins. (like it does in the bottom row)

If the range is not specified explicitly seaborn will shrink or extend the bins to fit the bins to evenly over the data.
That is a good decision to make the histogram easier to read correctly, and when using round the shrunk/extended bin width is closer to what the user asked for. So probably my preference is wrong here for many applications and I will have to restort to setting both binrange and binwidth in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants