Python & command-line tool to gather text on the Web: Crawling & scraping, content extraction, metadata. TXT, Markdown, CSV & XML output.
Updated Jul 16, 2024 (Python)
Developed an NLP system using Gradio and Hugging Face to classify disaster tweets with both machine learning (ML) and deep learning (DL) models.
Remove extra whitespace from text.
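As an illustration of this task, here is a minimal sketch that assumes "extra whitespace" means any run of spaces, tabs, or newlines, which is collapsed to a single space and trimmed from the ends. The function name `remove_extra_whitespace` is hypothetical and is not the listed project's API.

```python
import re

def remove_extra_whitespace(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

print(remove_extra_whitespace("  Hello,\t\tworld!\n\nThis   is  text.  "))
# → Hello, world! This is text.
```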
ValX is an open-source Python package for text-cleaning tasks, including profanity detection and removal. It now also detects and removes sensitive information.
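Sensitive-information removal often starts with pattern matching. The following is a rough sketch of regex-based redaction, written under two assumptions: "sensitive information" means email addresses and phone-number-like digit runs, and a fixed placeholder is an acceptable replacement. This is not ValX's method or API, and `redact_sensitive` and both patterns are hypothetical. Real PII detection needs much broader coverage.

```python
import re

# Illustrative patterns only (hypothetical); real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_sensitive(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace email addresses, then phone-number-like digit runs, with a placeholder."""
    text = EMAIL_RE.sub(placeholder, text)
    return PHONE_RE.sub(placeholder, text)

print(redact_sensitive("Mail jane.doe@example.com or call +1 555-123-4567."))
# → Mail [REDACTED] or call [REDACTED].
```

Emails are redacted first so that digits inside an address are never partially matched by the phone pattern.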
NLP pre-/post-processing tools.
A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.
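Text mining and preprocessing tools (text cleaning, new-word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) based on unsupervised or weakly supervised methods.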
Sentiment Analysis For Restaurant Reviews
Language-Detection
Extract text content from an HTML page, process it, and extract the unique words from the processed text. This notebook uses several text-processing techniques, including cleaning, normalization, tokenization, lemmatization or stemming, and stop-word removal.
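The steps in this pipeline can be sketched with the Python standard library alone. The sketch below has several limitations to keep in mind. It assumes English ASCII text and uses a tiny, hypothetical stop-word list. It also omits lemmatization and stemming, which would normally come from a library such as NLTK or spaCy. The names `unique_words` and `_TextCollector` are illustrative and do not come from the notebook.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative stop-word list (hypothetical); real pipelines use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on", "for", "with"}

class _TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style> tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def unique_words(html: str) -> list[str]:
    # Cleaning: strip markup and keep only text nodes.
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    # Normalization: lowercase and replace anything that is not a letter or whitespace.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Tokenization on whitespace, then stop-word removal.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    # Unique words, sorted for stable output.
    return sorted(set(tokens))

page = "<html><head><style>p {color: red}</style></head><body><p>The cats and the dogs are playing.</p><p>Cats sleep!</p></body></html>"
print(unique_words(page))
# → ['cats', 'dogs', 'playing', 'sleep']
```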
Article title, authors, date and body extraction dataset.
NLP
Corpora and scripts for cleaning political science texts. Scripts are translated into transformations that support SAGE Texti.
A fork of Dragnet that also extracts the author, headline, date, and keywords from context, and includes built-in metadata extraction, all in one package.
A recommender system that suggests suitable candidates to recruiters for a job application, based on each candidate's personal information and job preferences. It uses filtering techniques and natural language processing to recommend the top jobs by similarity.
This repo includes modules that help with NLP-related tasks.
Semantic Enrichment, Data Augmentation and Deep Learning for Boosting Invoice Text Classification Performance: A Novel Natural Language Processing Strategy
In this project, I utilized the TripAdvisor Hotel Review dataset from Kaggle to perform sentiment analysis on hotel reviews. The main objective was to build a predictive model using LSTM (Long Short-Term Memory) neural networks to classify hotel reviews as positive or negative based on their textual content.
A repo with a basic introduction to recurrent neural networks, Word2Vec, Doc2Vec, TF-IDF vectors, and NLP fundamentals.
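To make the TF-IDF vectors mentioned above concrete, here is a from-scratch sketch of one common, unsmoothed variant. It uses tf = count / document length and idf = ln(N / df), applied to already-tokenized documents. Libraries weight terms differently: scikit-learn's `TfidfVectorizer`, for example, smooths idf and normalizes vectors by default, so its numbers will not match these. The `tfidf` function and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Unsmoothed TF-IDF: tf = count / doc length, idf = ln(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
for w in tfidf(docs):
    print({t: round(v, 3) for t, v in w.items()})
```

A term that appears in every document ("the") gets an idf of ln(1) = 0. This is how TF-IDF down-weights words that do not distinguish one document from another.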
👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing.