
Index error when using anchors #34

Open
owlas opened this issue Mar 8, 2020 · 7 comments

Comments

@owlas

owlas commented Mar 8, 2020

When using the `fit` method with anchors, I get an index error from this line:

```python
p_y_given_x[:, j] = 0.5 * p_y_given_x[:, j] + 0.5 * X[:, a].mean(axis=1).A1  # Assumes X is a binary matrix
```

The error is understandable: if X is a 2D array and `a` is a single integer, then `X[:, a]` is a 1D slice, so `X[:, a].mean(axis=1)` is undefined because there is no axis 1.
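To make the shape argument concrete, here is a minimal NumPy sketch, assuming X reaches the quoted line as a dense 2D `ndarray`. The toy matrix and the `anchor_mean` helper are hypothetical, not part of corextopic, and the sketch leaves out `.A1`, which only exists on `np.matrix` results (such as those returned by scipy sparse matrices).

```python
import numpy as np

# Hypothetical toy binary document-word matrix: 3 documents x 4 words (dense ndarray).
X = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]])


def anchor_mean(X, a):
    """Row-wise mean over the anchor column(s) `a`, mirroring the quoted line without `.A1`."""
    return X[:, a].mean(axis=1)


# A list of column indices keeps the slice 2D, so axis=1 exists.
print(X[:, [0, 2]].shape)      # (3, 2)
print(anchor_mean(X, [0, 2]))  # per-document fraction of the anchor words present

# A bare int drops a dimension, so axis=1 is out of range.
print(X[:, 2].shape)           # (3,)
try:
    anchor_mean(X, 2)
except IndexError as err:      # NumPy's AxisError subclasses IndexError
    print("IndexError:", type(err).__name__)
```

Indexing with `[2]` instead of `2` keeps the slice 2D, so the same call succeeds for a single anchor word.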

I installed `corextopic==1.0.5` from PyPI.

I can reproduce this with any value passed to `anchors`.

@owlas
Author

owlas commented Mar 8, 2020

Is `a` expected to be a list? I'm wondering whether something is missing in the preprocessing step that turns anchor words into column indices.

@gregversteeg
Owner

Thanks for pointing this out. I think anchors has to be a list of lists, for example `anchors = [['cat', 'dog'], ['apple']]`.
Then `a` will always be a list, and the slice will always produce an object that has axis=1.
However, I notice that the examples in the README do not always follow this convention. I'll ask @ryanjgallagher to check whether there is an issue with the README.

```python
topic_model.fit(X, words=words, anchors=[['dog','cat'], 'apple'], anchor_strength=2)
```
Should be `[['dog','cat'], ['apple']]`?

```python
topic_model.fit(X, words=words, anchors=['protest', 'protest', 'protest', 'riot', 'riot', 'riot'], anchor_strength=2)
```
Should be `[['protest'], ['protest']...`?

```python
topic_model.fit(X, words=words, anchors=[['bernese', 'mountain', 'dog'], ['mountain', 'rocky', 'colorado'], anchor_strength=2)
```
This one is missing a bracket, I think.

@ryanjgallagher
Collaborator

The anchor input for the topic model should be a list. Within that list, the entries can be strings, ints, or lists, and you should be able to use any combination of those for anchoring. Individual strings or ints (indicating you want to anchor only one word to a topic) are converted to lists with a single entry, and strings are converted to their corresponding column index.

So @gregversteeg isn't quite right: even if you pass just a string or int in the anchors list, it should be preprocessed properly. (He's right that the last example is missing a bracket, though. My bad.)
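A minimal sketch of the normalization described above, written as a standalone function. `normalize_anchors` is a hypothetical helper for illustration, not corextopic's actual preprocessing code, and it assumes `words` lists the column labels of X in order.

```python
from typing import List, Sequence, Union

AnchorItem = Union[str, int]


def normalize_anchors(anchors: Sequence[Union[AnchorItem, Sequence[AnchorItem]]],
                      words: Sequence[str]) -> List[List[int]]:
    """Hypothetical helper: map every anchor entry to a list of column indices."""
    word_index = {w: i for i, w in enumerate(words)}
    normalized = []
    for entry in anchors:
        # A bare string or int anchors one word to the topic: wrap it in a one-element list.
        items = [entry] if isinstance(entry, (str, int)) else list(entry)
        # Strings are looked up in the vocabulary (KeyError if missing); ints are used as-is.
        normalized.append([word_index[x] if isinstance(x, str) else x for x in items])
    return normalized


words = ['apple', 'cat', 'dog', 'riot']
print(normalize_anchors([['dog', 'cat'], 'apple', 3], words))
# [[2, 1], [0], [3]]
```

Because every entry comes out as a list, `X[:, a]` stays two-dimensional even when X is a dense array.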

@owlas Can you provide a simple reproducible example? I can run the examples in the README without any issues using version 1.0.5, so I'm not sure what error you may be getting.

@d-lowl

d-lowl commented Nov 10, 2020

I can report the same for 1.0.6, although I cannot reproduce the error consistently. For some sets of anchors it works and for others it throws an error (all of them should be valid according to the docs, though).

@ryanjgallagher
Collaborator

@d-lowl Would you be able to provide a minimal reproducible example that does fail?

@d-lowl

d-lowl commented Nov 10, 2020

Yeah, I made it work for my case (it was partially a problem in my code), but I still think that there are some edge cases. I will play around with it tomorrow and try to come up with one.

@joelplantinga

Hi! I am not sure whether this repo is still maintained, but I ran into the same issue.

I found that single-item anchor lists are converted (back) into single items here:
https://github.com/gregversteeg/corex_topic/blob/beea64bc41e62dffc5fb87deb506a3e253be0a6c/corextopic/corextopic.py#L367C27-L367C27

This contradicts @ryanjgallagher's comment:

Individual strings or ints (indicating you want to anchor only one word to a topic) are converted to lists with a single entry

Removing the edge case for single-item lists solved the problem for me:

```python
if len(new_anchor_list) == 0:
    continue
# if len(new_anchor_list) == 1:
#    processed_anchors.append(new_anchor_list[0])
else:
    processed_anchors.append(new_anchor_list)
```
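To see why the patch matters, this sketch replays only the tail of that loop on anchor lists that are already column indices, with a hypothetical `collapse_singletons` flag standing in for the commented-out branch. It assumes X is a dense NumPy array when it is sliced, as in the first comment; the surrounding corextopic code is not reproduced.

```python
import numpy as np


def process_anchors(anchor_lists, collapse_singletons):
    """Hypothetical, simplified replay of the loop tail; each entry is a list of column indices."""
    processed_anchors = []
    for new_anchor_list in anchor_lists:
        if len(new_anchor_list) == 0:
            continue  # empty anchors are skipped, as in the snippet above
        if collapse_singletons and len(new_anchor_list) == 1:
            processed_anchors.append(new_anchor_list[0])  # original branch: bare int
        else:
            processed_anchors.append(new_anchor_list)  # patched behavior: always a list
    return processed_anchors


X = np.eye(4)  # dense stand-in for a 4-document x 4-word binary matrix
anchors = [[0, 1], [2], []]

for collapse in (True, False):
    processed = process_anchors(anchors, collapse)
    dims = [X[:, a].ndim for a in processed]
    print(collapse, processed, dims)
# prints: True [[0, 1], 2] [2, 1]
# prints: False [[0, 1], [2]] [2, 2]
```

With the single-item branch active, the collapsed anchor yields a 1D slice, and `.mean(axis=1)` then fails on a dense array; keeping every anchor as a list avoids that.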


5 participants