Builds a search engine on the 2013 Wikipedia data dump (43 GB). Search results are returned in real time.
Clustering of Spanish Wikipedia articles.
Python script to split the output of 'wikipedia parallel title extractor' into separate text files, one file per language.
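A minimal sketch of that kind of split, assuming the extractor's output is a tab-separated file with one title column per language (the column order and file names below are illustrative assumptions, not taken from the actual tool):

```python
import csv

LANG_CODES = ["en", "es", "de"]  # hypothetical column order in the TSV

def split_by_language(input_path: str) -> None:
    # One output file per language, e.g. titles.en.txt
    outputs = {lang: open(f"titles.{lang}.txt", "w", encoding="utf-8")
               for lang in LANG_CODES}
    try:
        with open(input_path, encoding="utf-8") as src:
            for row in csv.reader(src, delimiter="\t"):
                for lang, title in zip(LANG_CODES, row):
                    if title:  # skip missing translations
                        outputs[lang].write(title + "\n")
    finally:
        for handle in outputs.values():
            handle.close()

if __name__ == "__main__":
    split_by_language("parallel_titles.tsv")
```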
Wiki dump parser (Jupyter notebook)
Reads data from OPIEC, an Open Information Extraction corpus.
A search engine built on a 75 GB Wikipedia dump. Builds an index file and returns search results in real time.
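A minimal in-memory sketch of the inverted-index idea behind such a search engine (illustrative only; the project's actual index format isn't shown here, and a 75 GB dump would need an on-disk index):

```python
from collections import defaultdict
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document ids containing it
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    # Intersect postings so every query term must appear (AND query)
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()

if __name__ == "__main__":
    docs = {1: "Alan Turing computer science", 2: "Turing machine theory"}
    idx = build_index(docs)
    print(search(idx, "Turing machine"))  # -> {2}
```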
Converts Chinese Wikipedia XML dumps into human-readable Markdown and plain-text documents.
Code and data for the paper 'Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings'
Interactive chatbot using Python :)
Convert Wikipedia XML dump files to JSON or plain-text files
Command line tool to extract plain text from Wikipedia database dumps
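A rough sketch of how such a conversion can stream a dump with Python's standard library, assuming the usual pages-articles XML layout; the element names and namespace below are assumptions (the namespace varies by dump version), and real tools also strip wiki markup, which is omitted here:

```python
import json
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # assumed dump namespace

def pages_to_jsonl(dump_path: str, out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as out:
        # iterparse streams the file, keeping memory flat even for multi-GB dumps
        for _, elem in ET.iterparse(dump_path, events=("end",)):
            if elem.tag == f"{NS}page":
                title = elem.findtext(f"{NS}title", default="")
                text = elem.findtext(f"{NS}revision/{NS}text", default="")
                out.write(json.dumps({"title": title, "text": text},
                                     ensure_ascii=False) + "\n")
                elem.clear()  # free the parsed page subtree

if __name__ == "__main__":
    pages_to_jsonl("enwiki-latest-pages-articles.xml", "pages.jsonl")
```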
A desktop application that searches through a set of Wikipedia articles using Apache Lucene.
Corpus creator for Chinese Wikipedia
A search engine trained on a corpus of Wikipedia articles to provide efficient query results.
Builds Wikipedia corpora in I5 (a TEI-based format)
Wikipedia text corpus for self-supervised NLP model training
Some Faroese language statistics derived from the fo.wikipedia.org content dump