I'm a Linguistics graduate pursuing a master's in Machine Learning and Natural Language Processing.
I'm passionate about continually improving my skills in Data Science and Machine Learning, with the aim of bringing effective solutions to real-world business problems.
During my master's studies, I realized I would enjoy leveraging AI to drive business impact in BI environments. Therefore, alongside my studies, I'm learning the well-known BI tools PowerBI and Tableau to gain insights from data and support business decision-making.
In my spare time, I write posts about my personal experience in NLP and publish them on my GitHub profile in the NLP Tutorials repository. Very soon, I'm also going to add the dashboards I have built so far with Tableau and PowerBI.
Languages
Business Intelligence
Natural Language Processing
Machine Learning
IDEs & Notebooks
Other Technologies & Tools
-
Turkish Sentiment Analyser - Hugging Face - Web App
Fine-tuned the distilled Turkish BERT model on a review classification dataset for sentiment analysis. The final model achieved 86% accuracy and was deployed to Hugging Face Spaces using Streamlit as an interactive web app. The app provides a no-code way for people to see whether a particular review is "positive" or "negative".
-
Toxic Comment Detector - Web App
A binary classification project that predicts whether a comment is toxic. Three machine learning models were compared: Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine. The best model was a Naive Bayes classifier with a TF-IDF vectorizer, which achieved F1 and recall scores of 0.85 and 0.88, respectively. The application uses this model to predict the toxicity of comments.
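For readers curious what a TF-IDF plus Multinomial Naive Bayes classifier looks like in practice, here is a minimal sketch using scikit-learn. It is not the project's actual code: the toy comments, labels, and variable names are hypothetical, and the real project was trained on a proper dataset with its own preprocessing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score, recall_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (1 = toxic, 0 = not toxic); for illustration only.
train_texts = [
    "you are an idiot and nobody likes you",
    "shut up you stupid fool",
    "thanks for the helpful explanation",
    "great article, I learned a lot",
]
train_labels = [1, 1, 0, 0]

# TF-IDF turns each comment into a weighted bag-of-words vector,
# and Multinomial Naive Bayes classifies those vectors.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Hypothetical held-out comments for a quick sanity check.
test_texts = ["what a stupid idiot", "thanks, that was helpful"]
test_labels = [1, 0]
predictions = model.predict(test_texts)

# F1 and recall are the two metrics reported for the project.
f1 = f1_score(test_labels, predictions, zero_division=0)
recall = recall_score(test_labels, predictions, zero_division=0)
print("F1:", f1, "Recall:", recall)
```

Wrapping the vectorizer and the classifier in one pipeline means a web app only has to call `model.predict` on raw text, which is one reasonable way to serve a model like this.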
-
cst5 is a tiny T5 model for the Czech language, based on the smaller version of Google's mT5 model. It is meant to help people run experiments for Czech with a lightweight model rather than the massive mT5, which covers 101 languages. cst5 was obtained by retaining only the Czech and English embeddings of the mT5 model. Shrinking the original "sentencepiece" vocabulary from 250K to 30K tokens reduced the parameter count from 582M to 244M and the total size from 2.2GB to 0.9GB. cst5 thus allows fine-tuning for downstream tasks in Czech with lower storage requirements and without any loss in quality relative to the original multilingual model.
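The core idea of keeping only some embeddings can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the procedure actually used to build cst5: the function name `trim_embeddings`, the 6-token vocabulary, and the chosen token ids are all hypothetical, and a real model would also need its tokenizer and output layer updated to match.

```python
def trim_embeddings(embeddings, keep_ids):
    """Keep only the embedding rows listed in keep_ids.

    Returns the trimmed matrix and a mapping from old token ids
    to their new, compacted ids.
    """
    kept = sorted(set(keep_ids))
    old_to_new = {old: new for new, old in enumerate(kept)}
    trimmed = [embeddings[old] for old in kept]
    return trimmed, old_to_new


# Hypothetical toy vocabulary: 6 tokens, each with a 2-dimensional embedding.
embeddings = [[float(i), float(i) + 0.5] for i in range(6)]

# Pretend only tokens 0, 1, and 4 occur in the languages we want to keep.
keep_ids = [0, 1, 4]

trimmed, old_to_new = trim_embeddings(embeddings, keep_ids)
print(len(embeddings), "->", len(trimmed), "rows")
print("old id 4 is now id", old_to_new[4])
```

Because the kept rows are copied unchanged, the retained tokens get exactly the same vectors as before; only the rows for unused tokens disappear, which is where the size reduction comes from.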
-
Financial Sentiment Analysis with Machine Learning, LSTM, and BERT Transformer
A financial sentiment analysis project that predicts whether a given financial text is positive, negative, or neutral. Machine learning models, an LSTM, and a BERT transformer were used. BERT gave the best result, with an accuracy of 0.77.