Effect of the different possible parameter combinations for ‘mentions’ in the REST API #124

Open
aa303554 opened this issue May 3, 2021 · 2 comments
@aa303554

aa303554 commented May 3, 2021

The following observations come from the online API.

1/ Reversing the order of ‘wikipedia’ and ‘ner’ in the mentions parameter changes the result. Specifically, when ‘ner’ comes second, NER does not appear to be performed at all. The documentation doesn’t mention this constraint.

Result with wikipedia first and ner second:

image

Result with ner first and wikipedia second:

image

@lfoppiano
Collaborator

Thanks for reporting this, @aa303554. This does look like a bug, since the order of the processes used to extract mentions should not change the results.
I need to look into it. For the time being, keep them in the order ["ner", "wikipedia"].
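For reference, here is a minimal sketch of how a client could keep that order fixed when building the query. Only the `mentions` field and its two values come from this thread; the `text` field name and the `build_query` helper are assumptions made for illustration, and the snippet only serializes the payload without calling any service.

```python
import json


def build_query(text, recognizers):
    """Serialize a query whose 'mentions' list keeps the caller's order.

    'text' is an assumed field name; 'mentions' is the parameter discussed here.
    """
    return json.dumps({"text": text, "mentions": list(recognizers)})


# JSON arrays are ordered, so the recognizers are sent exactly as listed.
query = build_query("Austria invaded Serbia in 1914.", ["ner", "wikipedia"])
print(query)
```

Passing a list rather than a set matters here, because a set would not guarantee which recognizer comes first.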

@lfoppiano lfoppiano added the bug label May 19, 2021
@kermitt2
Owner

kermitt2 commented May 19, 2021

Hmm, it's not a bug: the result depends on the order, and that is the expected behaviour. The order actually has to be taken into account.

The mentions field gives the list of "mention recognizers" to be applied successively. If a mention has already been recognized by wikipedia, it is not "overwritten" by the NER mention. Similarly, if the NER mention is found first, the wikipedia one does not apply. In general, we must start with the most specific mention recognizer and finish with the most generic one, Wikipedia.
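As a rough illustration of that "first recognizer wins" rule, here is a toy sketch. It is not the tool's actual code: the recognizers are stand-ins that match fixed strings, all function names are hypothetical, and treating "already recognized" as "overlapping character span" is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]
Mention = Tuple[int, int, str]  # (start, end, name of the recognizer that found it)


def find_all(text: str, phrase: str) -> List[Span]:
    """Return the (start, end) span of every occurrence of phrase in text."""
    spans = []
    pos = text.find(phrase)
    while pos != -1:
        spans.append((pos, pos + len(phrase)))
        pos = text.find(phrase, pos + 1)
    return spans


def apply_recognizers(
    text: str,
    order: List[str],
    recognizers: Dict[str, Callable[[str], List[Span]]],
) -> List[Mention]:
    """Run recognizers in the given order; a later one never replaces
    a mention whose span overlaps one that is already kept."""
    kept: List[Mention] = []
    for name in order:
        for start, end in recognizers[name](text):
            if not any(start < e and s < end for s, e, _ in kept):
                kept.append((start, end, name))
    return sorted(kept)


# Toy stand-ins: "ner" only knows "Paris"; "wikipedia" knows "Paris" and "museum".
toy = {
    "ner": lambda t: find_all(t, "Paris"),
    "wikipedia": lambda t: find_all(t, "Paris") + find_all(t, "museum"),
}

text = "Paris has a museum."
print(apply_recognizers(text, ["ner", "wikipedia"], toy))
# [(0, 5, 'ner'), (12, 18, 'wikipedia')]
print(apply_recognizers(text, ["wikipedia", "ner"], toy))
# [(0, 5, 'wikipedia'), (12, 18, 'wikipedia')]
```

With wikipedia listed first, every span the toy NER would report is already taken, which would look from the outside as if NER had not run at all.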

This is probably easier to understand with a specialized mention recognizer, such as a module that recognizes species names. It has to be applied first because it is the most specific (it already disambiguates the species name, which wikipedia does less precisely). However, the tool has no way to know in advance which recognizer is the most specific, so the order is used.
Does that make sense to you?

OK, we need to update the documentation to clarify that.
