Swapping phi, token.frequency, vocab, and topic.proportion rda files removes some visualization features #53

Open
Graybosch opened this issue Feb 2, 2016 · 1 comment

@Graybosch

Hello Carson,

I'd appreciate any thoughts on what might be causing an issue with your otherwise great visualization package.

One of your tutorials generates a beautiful Shiny application. I replaced your RDA files (phi, topic.proportion, token.frequency, and vocab) with my own. The app draws the topic regions, but clicking a topic no longer lists its relevant terms, and I get no bar charts of the per-topic token breakdown, only a list of the most salient terms for the corpus overall.

Initially, I got a NaN error when the Shiny application tried to build. I built my model with the super-fast Vowpal Wabbit implementation of LDA. VW requires the vocabulary size to be a power of 2, plus 1, so if |Vocabulary| is not equal to 2^N + 1 you end up with some rows of zeros in phi. My guess is that those zeros made the Kullback-Leibler divergence blow up. When I forced the zero entries in phi to 10^-6, the app ran and gave me a beautiful picture of the overlapping topics. However, selecting a region no longer brings up the bar charts of relevant terms for that cluster. That feature worked beautifully before I replaced your RDA files. The app still tells me how much of the corpus comes from each topic and still lists the overall most salient terms.
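
For anyone debugging the same symptom, here is a minimal sketch of the check and the workaround described above, written in plain Python rather than R so it runs without any packages. It assumes phi is stored with one row per token and one column per topic, as in this report. The function names, the toy matrix, and the renormalization step are illustrative assumptions, not part of the package or of VW.

```python
import math

def find_zero_rows(phi):
    """Return indices of token rows whose probability is zero in every topic."""
    return [i for i, row in enumerate(phi) if all(x == 0 for x in row)]

def floor_and_renormalize(phi, eps=1e-6):
    """Replace zeros with eps, then rescale each topic column to sum to 1.

    The rescaling is an assumption added here so that each column stays
    a probability distribution after flooring.
    """
    floored = [[x if x > 0 else eps for x in row] for row in phi]
    n_topics = len(floored[0])
    col_sums = [sum(row[k] for row in floored) for k in range(n_topics)]
    return [[row[k] / col_sums[k] for k in range(n_topics)] for row in floored]

def kl_divergence(p, q):
    """KL(p || q) for two strictly positive distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy phi: 4 token rows x 2 topic columns; the last row mimics a padding row.
phi = [
    [0.5, 0.2],
    [0.3, 0.3],
    [0.2, 0.5],
    [0.0, 0.0],
]

print(find_zero_rows(phi))  # [3]

fixed = floor_and_renormalize(phi)
topic0 = [row[0] for row in fixed]
topic1 = [row[1] for row in fixed]
print(kl_divergence(topic0, topic1))  # finite once no entry is zero
```

A row that is zero in every topic contributes a 0 * log(0 / 0) term to a naively computed divergence, which is undefined, so flooring removes the NaN but leaves the padding rows in the matrix.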

I've checked my phi, topic.proportion, token.frequency, and vocab. I'd appreciate any thoughts on what might be causing the issue. Thanks again for the great visualization package.

Anthony

@Graybosch
Author

This has been resolved by adding row and column headings to phi and by removing the rows corresponding to the extra tokens added by VW's implementation of LDA.
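
A rough sketch of that fix, again in plain Python for illustration. It assumes the extra rows are exactly the ones that are zero in every topic and that the remaining rows are in the same order as vocab. `label_phi`, the topic names, and the toy data are hypothetical. In R the equivalent would be subsetting the matrix and setting its row and column names.

```python
def label_phi(phi, vocab, topic_names):
    """Drop all-zero token rows, then key the rest by token and topic name."""
    kept = [row for row in phi if any(x > 0 for x in row)]
    if len(kept) != len(vocab):
        raise ValueError(
            "expected %d non-zero rows, found %d" % (len(vocab), len(kept))
        )
    n_topics = len(topic_names)
    # Rescale each topic column so it sums to 1 over the kept tokens.
    col_sums = [sum(row[k] for row in kept) for k in range(n_topics)]
    return {
        token: {topic_names[k]: row[k] / col_sums[k] for k in range(n_topics)}
        for token, row in zip(vocab, kept)
    }

# Toy data: 3 real tokens followed by 2 padding rows.
phi = [
    [0.5, 0.2],
    [0.3, 0.3],
    [0.2, 0.5],
    [0.0, 0.0],
    [0.0, 0.0],
]
vocab = ["cat", "dog", "fish"]
topic_names = ["topic1", "topic2"]

labeled = label_phi(phi, vocab, topic_names)
print(labeled["dog"])
```

The length check is there because a mismatch between the number of rows in phi and the length of vocab means the tokens and probabilities no longer line up.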
