Add stemming and lemmatisation section #304

Open
LifeIsStrange opened this issue Jul 23, 2019 · 3 comments

Comments

@LifeIsStrange
Contributor

LifeIsStrange commented Jul 23, 2019

According to the List_of_unsolved_problems_in_computer_science, this is still an open question:

Is there any perfect stemming algorithm in the English language?

I believe lemmatization is not solved either.

It would be wonderful to add the state of the art for both tasks.
BTW, lemmatization consists, for example, of transforming an inflected verb such as jumped into its base form (lemma): jump.
Is there a tool that takes a word (e.g. quick) as one argument and the requested part of speech (e.g. adverb) as another,
and outputs the corresponding form (quickly)?
In fact, stemming and lemmatization are special cases of the NLP task I need.
If it exists, does anyone know what it's called? Where could I ask?
Sorry for the digression.
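
To make the difference between the two tasks concrete, here is a minimal toy sketch. The suffix list and the mini-lexicon are assumptions made up for illustration; real stemmers and lemmatizers rely on far richer rules and dictionaries. It only shows why rule-based stemming is imperfect (it over-strips or misses irregular forms), while lemmatization needs lexical knowledge.

```python
# Toy illustration only: the rules and lexicon below are hypothetical and
# far smaller than anything a real stemmer or lemmatizer would use.

SUFFIXES = ("ing", "ed", "es", "s")  # assumed, deliberately tiny rule set

# Assumed mini-lexicon mapping inflected forms to their lemmas.
LEMMA_LEXICON = {
    "jumped": "jump",
    "went": "go",
    "mice": "mouse",
    "running": "run",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the lexicon; fall back to the word itself."""
    return LEMMA_LEXICON.get(word, word)


if __name__ == "__main__":
    for w in ("jumped", "running", "went", "mice"):
        # Print the word, its toy stem, and its toy lemma side by side.
        print(w, toy_stem(w), toy_lemmatize(w))
```

With these toy rules, the stemmer handles jumped but turns running into runn and leaves went and mice untouched, whereas the lexicon lookup maps all four to their lemmas.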

@LifeIsStrange
Contributor Author

So, if en means English, the SOTA results are:
en_ewt: 97.23
en_gum: 96.18
en_lines: 96.56
en_pud: 96.39

which is not all that accurate...
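
For context on what such figures could mean: the thread does not state the metric, so the sketch below assumes they are token-level accuracy percentages (the share of tokens whose predicted lemma exactly matches the gold lemma). The function name and the example data are hypothetical.

```python
from typing import Sequence


def lemma_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Return the percentage of tokens whose predicted lemma equals the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    gold = ["jump", "go", "mouse", "be"]       # made-up gold lemmas
    predicted = ["jump", "go", "mice", "be"]   # one wrong prediction
    print(lemma_accuracy(predicted, gold))
```

Under that assumption, a score of 97.23 would correspond to roughly 2.77 wrong lemmas per 100 tokens.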

@sebastianruder
Owner

Thanks for the note! Would you mind taking the lead on this, i.e. adding some state-of-the-art results for lemmatization and/or stemming?
I think the task that you're looking for is morphological reinflection. Note that you need not only the part of speech but also the remaining morphosyntactic features (otherwise the problem is underspecified).
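
A minimal sketch of why the part of speech alone is underspecified, assuming a hypothetical lookup table of forms. The feature names loosely imitate Universal Dependencies style but are illustrative assumptions, and real reinflection systems are learned models rather than tables.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Features = FrozenSet[str]

# Hypothetical mini-paradigm: (lemma, features) -> inflected form.
PARADIGMS: Dict[Tuple[str, Features], str] = {
    ("jump", frozenset({"VERB", "Tense=Past"})): "jumped",
    ("jump", frozenset({"VERB", "Tense=Pres", "Person=3", "Number=Sing"})): "jumps",
    ("jump", frozenset({"VERB", "VerbForm=Ger"})): "jumping",
}


def reinflect(lemma: str, features: Iterable[str]) -> List[str]:
    """Return every known form of `lemma` compatible with the requested features."""
    wanted = frozenset(features)
    return sorted(
        form
        for (lem, feats), form in PARADIGMS.items()
        if lem == lemma and wanted <= feats  # all requested features must be present
    )


if __name__ == "__main__":
    # Part of speech only: several candidate forms remain.
    print(reinflect("jump", {"VERB"}))
    # Adding the tense narrows the request to a single form.
    print(reinflect("jump", {"VERB", "Tense=Past"}))
```

Requesting only VERB leaves three candidates in this toy table, while adding Tense=Past pins the answer down to jumped.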
