corpora
Here are 154 public repositories matching this topic...
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Updated Jun 25, 2024 - Python
WeChat official account corpus
Updated Jan 7, 2019
Data repository for pretrained NLP models and NLP corpora.
Updated Mar 16, 2018 - Python
A collaborative catalog of NLP resources for Indic languages
Updated Mar 14, 2024
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)
Updated Jan 5, 2021 - Python
Automatic document categorization assigns a category to a text based on the information it contains. This project follows several supervised machine learning approaches.
Updated Jan 1, 2019 - Python
A web-based engine for creating and annotating textual corpora
Updated Aug 26, 2023 - PHP
An advanced, extensible web front-end for the Manatee-open corpus search engine
Updated Jul 12, 2024 - TypeScript
Official source for Spanish language models and resources made at BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
Updated Jul 27, 2023 - Python
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Updated Dec 19, 2023 - PHP
Named Entity Recognition for biomedical entities
Updated Jan 11, 2023 - Python
Unannotated Spanish 3 Billion Words Corpora
Updated Oct 20, 2022 - Python
German Parliamentary Corpus (GerParCor)
Updated May 24, 2024 - Java