(Ongoing module in development) Parses the content of Wikipedia articles. Created for building text corpora quickly and easily, but can be freely used for other purposes too.
Some Faroese language statistics taken from the fo.wikipedia.org content dump
Builds Wikipedia corpora in I5 (a TEI-based format)
A search engine built on a 75 GB Wikipedia dump. Builds an index file and returns search results in real time.
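The index-then-search approach the entry above describes can be sketched with a minimal in-memory inverted index. This is an illustration, not the repository's actual implementation: the function names, whitespace tokenization, and boolean-AND retrieval are assumptions, and a 75 GB dump would of course need on-disk index structures rather than a Python dict.

```python
from collections import defaultdict

def build_index(docs):
    """Map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return doc ids containing every query token (boolean AND)."""
    hit_sets = [index.get(tok, set()) for tok in query.lower().split()]
    return set.intersection(*hit_sets) if hit_sets else set()

# Toy corpus standing in for parsed Wikipedia articles
docs = {1: "Wikipedia is a free encyclopedia",
        2: "A search engine indexes documents"}
idx = build_index(docs)
print(search(idx, "free encyclopedia"))  # {1}
```

Because the index maps tokens directly to posting sets, each query is answered by a handful of set intersections rather than a scan over all documents, which is what makes real-time results feasible.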
Clustering of Spanish Wikipedia articles.
A search engine trained on a corpus of Wikipedia articles to provide efficient query results.
Create a wiki corpus from a wiki dump file for natural language processing.
RNN model trained on a Wikipedia corpus
Python script to split the text generated by 'wikipedia parallel title extractor' into separate text files (one file per language)
Command line tool to extract plain text from Wikipedia database dumps
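Several of the tools listed here parse the MediaWiki XML dump format. A streaming approach with `xml.etree.ElementTree.iterparse` is the usual way to do this without loading the whole dump into memory; the sketch below runs on a tiny inline fragment (real dumps are bz2-compressed and use the MediaWiki export XML namespace, both omitted here for brevity):

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Minimal fragment mimicking the MediaWiki dump layout (namespace and
# bz2 compression of real dumps are omitted for this in-memory demo).
SAMPLE = """<mediawiki>
  <page>
    <title>Example</title>
    <revision><text>'''Example''' is a [[sample]] article.</text></revision>
  </page>
</mediawiki>"""

def iter_pages(fileobj):
    """Stream (title, wikitext) pairs one page at a time."""
    title, text = None, None
    for event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == "title":
            title = elem.text
        elif elem.tag == "text":
            text = elem.text
        elif elem.tag == "page":
            yield title, text
            elem.clear()  # free the parsed subtree as we go

pages = list(iter_pages(StringIO(SAMPLE)))
print(pages[0][0])  # Example
```

The yielded text is still raw wikitext (note the `'''...'''` and `[[...]]` markup), so a plain-text extractor would apply a wikitext-stripping step on top of this loop.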
Wiki dump parser (Jupyter)
Interactive chatbot using Python :)
An IR search engine for a Wikipedia app
A desktop application that searches through a set of Wikipedia articles using Apache Lucene.
Code and data for the paper 'Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings'
📚 A Kotlin project which extracts ngram counts from Wikipedia data dumps.
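The n-gram counting that the Kotlin project above performs over Wikipedia dumps boils down to sliding a window of n tokens over the text and tallying each tuple. A minimal Python sketch of that core step (tokenization by whitespace is an assumption; the actual project's pipeline may differ):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count n-grams: tuples of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))

tokens = "the quick brown fox jumps over the quick dog".split()
bigrams = ngram_counts(tokens, 2)
print(bigrams[("the", "quick")])  # 2
```

At Wikipedia scale the same logic is typically run per article and the per-article `Counter`s merged, since the full token stream will not fit in memory.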
Repository providing preprocessed Wikipedia and Simple Wikipedia datasets, along with Python scripts for preprocessing and dataset generation.
Convert Chinese Wikipedia XML dumps to human-readable documents in Markdown and txt.