Materials and approach used to compete in a 10-day hackathon hosted by Analytics Vidhya. Given a set of research articles, the category of each article has to be classified.

ishitashah23/AV-Janatahack-topic-modeling-research-articles

AV-Janatahack-NLP-classification

This competition was hosted from 15th to 23rd August 2020.

Team Members

https://github.com/ishitashah23 and https://github.com/shrey-B

Problem Statement

Topic Modeling for Research Articles

Researchers have access to large online archives of scientific articles. As a consequence, finding relevant articles has become more difficult. Tagging or topic modelling gives each research article a token of identification, which facilitates the recommendation and search process.

Given the abstract and title for a set of research articles, predict the topics for each article included in the test set.

Note that a research article can possibly have more than 1 topic. The research article abstracts and titles are sourced from the following 6 topics:

  1. Computer Science
  2. Physics
  3. Mathematics
  4. Statistics
  5. Quantitative Biology
  6. Quantitative Finance
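
Because an article can carry more than one topic, the task is multi-label rather than multi-class. A minimal sketch of how the labels could be represented as 0/1 indicator vectors is shown below. The names `TOPICS`, `encode_topics`, and `decode_topics` are illustrative assumptions and are not taken from the competition code.

```python
# Topic order follows the list in the problem statement.
TOPICS = [
    "Computer Science",
    "Physics",
    "Mathematics",
    "Statistics",
    "Quantitative Biology",
    "Quantitative Finance",
]


def encode_topics(article_topics):
    """Return a 0/1 list with one slot per topic, in TOPICS order."""
    unknown = set(article_topics) - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if topic in article_topics else 0 for topic in TOPICS]


def decode_topics(indicator):
    """Turn a 0/1 indicator list back into a list of topic names."""
    return [topic for topic, flag in zip(TOPICS, indicator) if flag == 1]


# An article tagged with two topics at once (hypothetical example).
example = encode_topics(["Computer Science", "Statistics"])
print(example)                 # [1, 0, 0, 1, 0, 0]
print(decode_topics(example))  # ['Computer Science', 'Statistics']
```

With this representation, each of the six positions can be treated as its own binary prediction.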

Files provided

Data Dictionary, train.zip and train.csv

Evaluation Metric

Submissions are evaluated on the micro F1 score between the predicted and observed topics for each article in the test set.
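
For reference, micro F1 pools true positives, false positives, and false negatives over every (article, topic) pair before computing a single score. The sketch below is a plain-Python illustration of that definition, not the organisers' scoring code. The function name `micro_f1`, the toy data, and the choice to return 0.0 when there are no positives at all are assumptions.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over rows of 0/1 topic indicators."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for truth, guess in zip(true_row, pred_row):
            if truth == 1 and guess == 1:
                tp += 1  # topic correctly predicted
            elif truth == 0 and guess == 1:
                fp += 1  # topic predicted but not present
            elif truth == 1 and guess == 0:
                fn += 1  # topic present but missed
    denominator = 2 * tp + fp + fn
    # Assumption: score 0.0 when nothing is positive in truth or prediction.
    return 2 * tp / denominator if denominator else 0.0


# Toy data: 3 articles, 6 topics each (hypothetical values).
y_true = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 1]]

# Here TP = 4, FP = 1, FN = 1, so F1 = 8 / 10.
print(micro_f1(y_true, y_pred))  # 0.8
```

Because the counts are pooled, frequent topics influence the score more than rare ones.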

Public and Private Split

The test set is further divided into Public (40%) and Private (60%).

Guidelines for Final Submission

Please ensure that your final submission includes the following:

  1. Solution file containing the predicted 1/0 for each of the 6 topics for every research article in the test set
  2. Code file for reproducing the submission. Note that submitting your code is mandatory for a valid final submission
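
A solution file of this kind could be written as sketched below, with one row per test article and one 0/1 column per topic. The exact layout required by the competition is not stated here, so the `ID` column name, the topic column headers, the article IDs, and the helper name `write_submission` are all assumptions for illustration.

```python
import csv
import io

# Assumed column order: the six topics from the problem statement.
TOPICS = [
    "Computer Science",
    "Physics",
    "Mathematics",
    "Statistics",
    "Quantitative Biology",
    "Quantitative Finance",
]


def write_submission(handle, article_ids, predictions):
    """Write a header row, then one row of 0/1 flags per article."""
    writer = csv.writer(handle)
    writer.writerow(["ID"] + TOPICS)  # "ID" is an assumed column name
    for article_id, row in zip(article_ids, predictions):
        if len(row) != len(TOPICS) or any(v not in (0, 1) for v in row):
            raise ValueError(f"bad prediction row for article {article_id}")
        writer.writerow([article_id] + list(row))


# Hypothetical IDs and predictions, written to an in-memory buffer.
buffer = io.StringIO()
write_submission(buffer, [101, 102], [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0]])
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as `handle`.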

Final Results

Public leaderboard score - 0.8425905598 (Rank 24);
Private leaderboard score - 0.8385170078 (Rank 47)

Winning Results

Public leaderboard score - 0.8651449994;
Private leaderboard score - 0.8614475202
