This repository has been archived by the owner on May 7, 2021. It is now read-only.

Probability Returning Infinity for most Categories #3

Open
Nath5 opened this issue Jan 12, 2015 · 4 comments

Comments

@Nath5

Nath5 commented Jan 12, 2015

Hello,

I know you haven't worked on this in a while, but I was wondering if you had any idea why I keep seeing this issue. I have added about 25 categories to the model, with lots of data in each category. No matter what chunk of text I classify, most of the categories return a probability of Infinity.

ex.

Classification[
category=friends_gatherings,
probability=Infinity,
featureset=[
after,
school,
soccerabout,
this,
...
--
]
]

@windweller

I literally encountered the same issue LOL. I think it may be because he didn't apply any smoothing technique.

@ptnplanet
Owner

Hello, yes, unfortunately there is no smoothing technique applied. PROD(P(feat_i|cat)) becomes pretty big with lots of features and categories. You can, however, provide your own IFeatureProbability<T, K> calculator. This requires you to provide your own Classifier<T, K>, though (or to override featuresProbabilityProduct(Collection<T> features, K category) in BayesClassifier<T, K>).
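
For readers who want to see what a smoothed calculation could look like, here is a minimal stand-alone sketch. It is not this library's code and does not use its Classifier or IFeatureProbability types; the class name SmoothedNaiveBayesSketch and all of its methods are hypothetical. It assumes add-one (Laplace) smoothing over per-document feature presence, and it sums logarithms instead of multiplying raw probabilities, so the score stays finite even with many features.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical stand-alone sketch; none of these names come from the library.
final class SmoothedNaiveBayesSketch {
    // category -> (feature -> number of training documents containing it)
    private final Map<String, Map<String, Integer>> featureDocCounts = new HashMap<>();
    // category -> number of training documents
    private final Map<String, Integer> categoryDocCounts = new HashMap<>();
    private int totalDocs = 0;

    void learn(String category, Collection<String> features) {
        categoryDocCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                featureDocCounts.computeIfAbsent(category, c -> new HashMap<>());
        // Count each feature at most once per document.
        for (String feature : new HashSet<>(features)) {
            counts.merge(feature, 1, Integer::sum);
        }
    }

    // Add-one smoothing: the result is always strictly between 0 and 1,
    // even for a feature never seen in this category.
    double smoothedFeatureProbability(String feature, String category) {
        int docs = categoryDocCounts.getOrDefault(category, 0);
        int withFeature = featureDocCounts
                .getOrDefault(category, Collections.<String, Integer>emptyMap())
                .getOrDefault(feature, 0);
        return (withFeature + 1.0) / (docs + 2.0);
    }

    // log P(category) + sum of log P(feature | category).
    // Assumes at least one document was learned for the category.
    double logScore(String category, Collection<String> features) {
        double score = Math.log(
                (double) categoryDocCounts.getOrDefault(category, 0) / totalDocs);
        for (String feature : features) {
            score += Math.log(smoothedFeatureProbability(feature, category));
        }
        return score;
    }

    // Returns the learned category with the highest log score, or null if none.
    String classify(Collection<String> features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : categoryDocCounts.keySet()) {
            double score = logScore(category, features);
            if (best == null || score > bestScore) {
                best = category;
                bestScore = score;
            }
        }
        return best;
    }
}
```

The log scores are only meaningful relative to each other; they are not normalized probabilities, so they should be compared across categories rather than reported as a probability value.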

Orbiter added a commit to loklak/loklak_server that referenced this issue Mar 8, 2016
#256 (comment)
which addresses a problem in the Bayesian Classifier source code as
discussed in
ptnplanet/Java-Naive-Bayes-Classifier#3
@ptnplanet
Owner

Hi all. You might want to explore the latest feature branch (feature/weight).

Take the feature weight into consideration when calculating the featuresProbabilityProduct

  • Made BayesClassifier.featuresProbabilityProduct public to enable other
    implementations to override the calculation
  • By default now take the feature weight and the assumed probability
    into consideration when calculating the featuresProbabilityProduct
  • Added a test with a high number of categories
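
One common way to combine a feature weight with an assumed probability is a weighted average, sketched below. Whether the feature/weight branch uses exactly this formula is an assumption; the class WeightedProbabilitySketch, the method weightedAverage and its parameter names are hypothetical and are not taken from the library.

```java
// Hypothetical stand-alone sketch; not the library's implementation.
final class WeightedProbabilitySketch {
    /**
     * Blends the observed probability with an assumed prior probability.
     *
     * @param basicProbability   observed P(feature | category)
     * @param featureTotal       how often the feature was seen across all categories
     * @param weight             how strongly the assumed probability counts (must be > 0)
     * @param assumedProbability probability used when there is little or no evidence
     */
    static double weightedAverage(double basicProbability, int featureTotal,
                                  double weight, double assumedProbability) {
        if (weight <= 0.0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        // With featureTotal == 0 this returns the assumed probability;
        // as featureTotal grows, the observed probability dominates.
        return (weight * assumedProbability + featureTotal * basicProbability)
                / (weight + featureTotal);
    }
}
```

With a positive weight and an assumed probability strictly between 0 and 1, a feature that was never seen in a category no longer contributes a factor of exactly zero to the product.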

@ptnplanet ptnplanet self-assigned this Feb 3, 2017
@ptnplanet ptnplanet added this to the v1.1 milestone Feb 3, 2017
@barovehicles

I compared the results from Python with NumPy against the results from this routine, and they are completely different. This routine definitely doesn't work.
