A Document Search Engine with TF-IDF.
Updated Jul 16, 2024 - Python
MailGuard is an intelligent spam detection tool that classifies emails as spam or ham using a Multinomial Naive Bayes model. Built with Streamlit, it leverages natural language processing techniques for text cleaning and feature extraction.
Fake news detection using machine learning
This project focuses on building a classifier to distinguish between spam and ham emails using Logistic Regression. Key steps include data preprocessing, feature extraction with TF-IDF vectorization, and model evaluation with accuracy metrics and a confusion matrix.
Magic-XML is a modern web application for quickly and conveniently converting data from XML files into CSV format. The application uses FastAPI for high-performance request processing, and applies machine learning algorithms and natural language processing for efficient analysis
This repository explores the correlation between news headlines' textual embeddings and their political orientation. Using clustering and transformer-based embeddings, the goal is to classify news sources based on headline content. Key features include clustering visualizations, BERT embeddings, and comparisons between K-Means, Spectral, and DBSCAN
Sentiment Analysis of Product Reviews from Amazon
NLP demos and talks made with Jupyter Notebook and reveal.js
This repository contains my solution for the Kaggle competition Automated Essay Scoring 2.0. The goal of this project is to develop an automated system capable of scoring essays based on their content and quality using machine learning techniques.
The document classification solution should significantly reduce the manual human effort in the HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.
A TF-IDF based Research paper Recommender that gives the most relevant research paper for a Topic of Interest.
Film Junky Union, a new community for classic movie fans, is developing a system to filter and categorize movie reviews. Its main mission is to train models to automatically detect negative reviews.
Simple sentiment analysis of the IMDB movie reviews dataset using CountVectorizer, TfidfVectorizer, and the NLTK library.
This project aims to detect spam emails: TF-IDF vectorization converts email messages into numerical features, and logistic regression classifies them as spam or ham.
A Natural Language Processing model to perform Sentiment Analysis of US Airline Customers
These three tasks study and investigate the proximity between three groups of texts taken from different press sections. They cover text mining, data cleaning, vector representation of rituals using various methods, and a range of NLP tasks.
This application features a GUI for classifying user-input text as spam or ham using a Naive Bayes algorithm for machine learning.
Train a model on your own dataset and use it to predict the label for a given text. It also identifies whether the text is likely to be spam or irrelevant.
This cloud recommendation system suggests similar services based on use cases. Powered by a TF-IDF backend in Flask and a React frontend, it provides accurate and user-friendly recommendations for cloud services.
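Many of the projects listed here share one core step: turning text into TF-IDF vectors and comparing them. The sketch below illustrates that idea using only the Python standard library. It is not taken from any of the repositories above; the function names (`tokenize`, `build_index`, `to_vector`, `search`), the whitespace tokenizer, and the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def to_vector(tokens, idf):
    # Weight each known term by (term count * idf), then L2-normalize.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    # Document frequency: number of documents containing each term.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant; other formulas are also common).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [to_vector(toks, idf) for toks in tokenized]
    return idf, vectors


def search(query, idf, vectors):
    # Cosine similarity reduces to a dot product on normalized vectors.
    q = to_vector(tokenize(query), idf)
    scores = [
        (sum(w * v.get(t, 0.0) for t, w in q.items()), i)
        for i, v in enumerate(vectors)
    ]
    return sorted(scores, reverse=True)
```

Calling `build_index` on a list of strings and passing the result to `search` returns `(score, document_index)` pairs, best match first. A real project would more likely use a library vectorizer with proper tokenization and stop-word handling.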