Skip to content

Identify the freedom of a local news outlet by comparing sentiment and stance of published news against international outlets.

Notifications You must be signed in to change notification settings

Shakleen/News-Outlet-Freedom-Detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

News Outlet Freedom Detection

There are many news organizations around the world. News organizations play a vital role in relaying important news of home and abroad to its readers. These news often cover incidents that are either positive, negative, or neutral. Moreover, some news stories can be viewed as talking for a country or talking against it.

Freedom of speech is the right to express one's ideas and opinions without censorship, restraint, or fear of retribution.

A news outlet is free if it can report news in an unbiased manner and free from censorship. In this project, we aim to detect if a local news organization is free. To do so, we compare the sentiment and stance of the organization with international news reporting institutions Reuters and Associated Press. A news outlet whose news correlates well with these international organizations are deemed as having freedom of press. Meaning, they are free from censorship.

Python Jupyter PyTorch HuggingFace LLAMA2 Open In Colab

Table of Contents
  1. Problem Statement
  2. Data Collection
  3. BERTopic
  4. Sentiment and stance analysis with LLaMa-2
  5. Case Studies
  6. Acknolwedgements

Problem Statement

This study investigates freedom of speech in local news across countries, examining topic-specific distinctions by comparing sentiment and stance scores with international sources to reveal correlations and assess agreement levels.

(back to top)

Data Collection

Source Canada Russia China
Local CBC and Global News The Moscow Times China Daily
International Reuters and Associated Press
  • Used Selenium to search and accumulate article URLs.
  • Employed News Please to fetch article data from collected URLs.
  • Raw data was processed to get it ready for the text mining steps.

drawing

(back to top)

BERTopic

I used BERTopic to perform topic modeling. BERTopic has 4 distinct phases:

  1. It uses Sentence Transformer Model to convert sentences into vector representations. Often having dimensions exceeding 256.

  2. To reduce dimensions from 256, BERTopic employs UMAP. This reduces the dimensions while retaining global and local information among the data.

  3. Afterwards, vectors are clustered using HDBSCAN a hiererchical algorithm.

  4. Finally, c-TF-IDF is used to get topic representations for each cluster.

  5. Fed top 10 represention words for each topic into ChatGPT to get a word for custom topic name.

drawing

(back to top)


Sentiment and stance analysis with LLaMa-2

I used LLaMa-2, the open-source LLM from meta, for sentiment and stance analysis. To do so, I first had to finetune the base version of a 6 billion parameter LLaMa2 model.

Funetuning LLaMa-2

  1. Engineered prompt to get the best possible answer from an LLM. The prompt was tuned with Prompt Perfect.

     As a neutral news analyst, assess the sentiment and stance of the news article excerpt and assign a score between -1.0 (completely negative/against-{country}) and 1.0 (completely positive/pro-{country}) for both sentiment and stance. Provide a single short sentence to justify your scores, drawing on the article's language, tone, and presentation to support your analysis.
    
     Article Excerpt:
     - Title: "{title}"
     - Content: "{content}{dot}"
    
     Output format: 
     1. Sentiment: [Positive/Neutral/Negative]
         * Score: [Your Score]
         * Reason: [Your Reason] 
     2. Stance: [Pro-{country}/Impartial/Against-{country}]
         * Score: [Your Score]
         * Reason: [Your Reason]
    
  2. Select 300 samples from dataset to finetune LLaMa-2 model. Fit each example in the prompt and feed it to ChatGPT. Save answers from ChatGPT as finetuning dataset.

  3. Utilize huggingface's autotrain package to finetune LLaMa-2.

    • Used QLoRA to enable training on single GPU on google colab.

    • Used PEFT (Parameter Efficient Finetuning) to reduce training time.

Sentiment and Stance Analysis

  1. Use finetuned model to inference on collected data.

  2. Parse responses to get sentiment and stance classes and scores for each article.

  3. Perform hypothesis testing to arrive at conclusions.

Test Name Parameter of Interest Null Hypothesis
Welch Test Mean Both sources on average report news with the same score.
Wilcoxon Test Median
F-test Variance News from sources have similar variance across sentiment and/or stance.
Pearson’s Test Linear Correlation Sentiment and/or stance of reported news from sources aren’t correlated.
Spearman’s Test Monotonic Relationship

(back to top)


Case Studies

Detailed case studies about China, Russia, and Canada can be found here.

Acknolwedgements

As a graduate student of University of Rochester, I am greatly indebted to my teachers for arming me with the knowledge required to perform the analytical and technical aspects of this project. In particular,

  • I would like to express my gratitude to Professor Jiebo Luo for his invaluable guidance throughout the Data Mining course. The knowledge and insights I gained from this course have been instrumental in processing the accumulated news corpus and performing topic modelling using BERTopic. I am thrilled to see how the techniques I learned from the course can be applied in real-world scenarios.

  • I would like to extend my sincere appreciation to Professor Anson Kahng for his invaluable guidance throughout the Computational Introduction to Statistics course. The coursework provided me with the necessary tools to design and carry out hypothesis tests to find statistically significant distinctions between local and international news. I am grateful for the opportunity to apply the knowledge I gained from the course in real-world scenarios.

  • I would like to extend my sincere appreciation to Professor Hangfeng He for his invaluable guidance throughout the Natural Language Processing course. The course provided me with a comprehensive understanding of the world of LLMs and armed me with the knowledge required to utilize LLaMa-2 for this project. I am grateful for the opportunity to apply the knowledge I gained from the course in real-world scenarios.

(back to top)

About

Identify the freedom of a local news outlet by comparing sentiment and stance of published news against international outlets.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages