Datasets of RIDE (A review journal for digital editions and resources)
Updated Jul 16, 2024
An advanced, extensible web front-end for the Manatee-open corpus search engine
A collection of text corpora of publicly available speeches by Mexican president Andres Manuel Lopez Obrador (AMLO), sourced from YouTube. The dataset includes his daily morning conferences (conferencias mañaneras).
An HTTP API server for mining language corpora using the Manatee-open engine.
A collection of corpora for named entity recognition (NER) and other entity recognition tasks. These annotated datasets cover a variety of languages, domains, and entity types.
CLARIN FCS 2.0 endpoint for the Manatee-open corpus search engine
German Parliamentary Corpus (GerParCor)
Framework for working with brat-annotated .ann files
The user interface for the Corpus & Repository of Writing, built in Angular
A program for calculating corpora alignments using a pivot language
A collaborative catalog of NLP resources for Indic languages
Scripts for building a geo-located web corpus using Common Crawl data
A Google Apps Script library for managing corpora in the Gemini API.
WaG: install your own word profile generator, built from diverse data resources
This repository publishes some of the corpora that I helped to create or annotate.
Measure the similarity of text corpora for 74 languages