Bug in the computeQ function - v2 classifier #41

Open
bharat-biradar opened this issue Aug 20, 2021 · 0 comments

bharat-biradar commented Aug 20, 2021

Describe the issue
In computeQ, when the threshold is set to 1.0, the granularity is calculated as 10. But if we set the threshold to 0.95, 0.99, or 0.999, the granularity is calculated as 19, 99, or 999, respectively. The granularity grows sharply as the threshold approaches 1.0, and it is greater than the granularity of 10 that is used at the maximum threshold (1.0).

Is this intentional?

A problem caused by this is that when we set the threshold to 0.95 or greater, a lot of licenses are not detected, even though they are easily detected when the threshold is set to 0.9.

I ran the program on around 17,300 license files. Around 2,950 BSD-3-Clause files, 850 BSD-2-Clause files, and some other licenses were not detected at all, although they were detected at a granularity of 10. The reason is that at that threshold the granularity is greater than 20 and nearly reaches 100.

A possible solution would be to set the granularity to 10 for any threshold greater than 0.9, which would also handle the divide-by-zero case.
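
A minimal sketch of that proposal, to make the suggested behavior concrete. The function name computeQCapped and the constant cappedQ are hypothetical and not part of the library. The fallback formula, threshold / (1 - threshold) truncated to an integer and floored at 1, is an assumption inferred from the values reported above, not a copy of the actual computeQ.

```go
package main

import "fmt"

// cappedQ is the granularity proposed in this issue for thresholds above 0.9.
const cappedQ = 10

// computeQCapped is a hypothetical variant of computeQ, not the library's code.
// Assumption: below the cap, the granularity is threshold/(1-threshold),
// truncated to an integer and never smaller than 1.
func computeQCapped(threshold float64) int {
	if threshold > 0.9 {
		// This branch also covers threshold == 1.0, so the division
		// below never sees a zero denominator.
		return cappedQ
	}
	q := int(threshold / (1.0 - threshold))
	if q < 1 {
		return 1
	}
	return q
}

func main() {
	// Print the granularity for a range of thresholds.
	for _, t := range []float64{0.5, 0.8, 0.9, 0.95, 0.99, 0.999, 1.0} {
		fmt.Printf("threshold=%.3f q=%d\n", t, computeQCapped(t))
	}
}
```

With this variant, every threshold above 0.9 maps to the same granularity as 1.0, so raising the threshold from 0.9 to 0.95 would no longer increase the minimum run length. Whether 10 is the right cap is a separate question for the maintainers.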
