
Chinese scattertext #55

Open
sound118 opened this issue Apr 6, 2020 · 1 comment

Comments

@sound118

sound118 commented Apr 6, 2020

Your Environment

  • Operating System:
  • Python Version Used:
  • Scattertext Version Used:
  • Environment Information:
  • Browser used (if an HTML error):

Hi,

It seems that in your demo code a developer can use the "chinese_nlp" module from the scattertext package directly. For plotting a Chinese scattertext, I am wondering whether we could add a list of user-defined stop words, and perhaps a user-defined dictionary for a specific Chinese domain, then use jieba to do the word segmentation and feed the cleaned results into your demo program. Something like the sketch below is what I have in mind.
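(Untested sketch only; user_dict.txt, stopwords.txt, and the toy documents are placeholders, and it relies on scattertext's whitespace_nlp_with_sentences tokenizer and CorpusFromParsedDocuments builder instead of chinese_nlp.)

```python
# Segment with jieba (custom dictionary plus a stop-word list), join the
# tokens with spaces, and hand the result to scattertext's whitespace tokenizer.
import jieba
import pandas as pd
import scattertext as st

jieba.load_userdict('user_dict.txt')  # placeholder: domain-specific vocabulary
stopwords = set(open('stopwords.txt', encoding='utf-8').read().split())  # placeholder stop-word list

def segment(text):
    # Keep non-empty tokens that are not in the stop-word list.
    return ' '.join(t for t in jieba.cut(text) if t.strip() and t not in stopwords)

df = pd.DataFrame({
    'text': ['这是一个关于散点文本的例子', '这是另一个类别的例子'],  # placeholder documents
    'category': ['category_a', 'category_b'],                          # placeholder categories
})
df['parsed'] = df['text'].apply(segment).apply(st.whitespace_nlp_with_sentences)

corpus = st.CorpusFromParsedDocuments(
    df, category_col='category', parsed_col='parsed'
).build()
```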

Thanks

@JasonKessler
Owner

You could apply a stop list after tokenization by running corpus.remove_terms(...). Otherwise, feel free to modify AsianNLP.py to fit your use case; it just duck-types spaCy's interface.
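A minimal sketch of that, assuming corpus is an already-built scattertext corpus and chinese_stopwords is your own stop-word list:

```python
# Drop stop words from the term-document matrix after tokenization.
chinese_stopwords = ['的', '了', '是', '在']   # placeholder stop words
corpus = corpus.remove_terms(chinese_stopwords)
```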
