Abstractive-Summarization-using-BART

An abstractive summarization tool that can condense documents and web pages using Bidirectional and Auto-Regressive Transformers, or BART for short. It is a desktop application with a UI constructed with Python.

How to Run it

  1. Install the required packages using the requirements.txt

pip install -r requirements.txt

  2. Run Main.py

1. Architecture Diagram

[Figure: system architecture diagram]

The system diagram starts with the methods of input: the user can type a document directly, upload a text file or document from their system, or supply the URL of a web page in a text field. In the URL case, the request is sent to the Web Scraping module, which collects the page content and processes it to remove unnecessary details such as tags, links, and stopwords. The text summarization model takes the resulting document and builds a bag-of-words model to retain only the vital information, which is then displayed to the user. The user is then given the option to encrypt the output using the ECIES algorithm, which generates a 256-bit symmetric key that can be exchanged with other users via the Diffie-Hellman key exchange algorithm. The file can then be saved to the user's system for future reference or shared with other authorized personnel.
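A minimal sketch of the Web Scraping module's cleanup step, assuming requests, BeautifulSoup, and NLTK stopwords; the actual module may use different libraries, and `scrape_page` is an illustrative name:

```python
# Hypothetical sketch of the Web Scraping module's cleanup step.
# Library choices (requests, BeautifulSoup, NLTK) are assumptions,
# not necessarily what this repository uses.
import requests
from bs4 import BeautifulSoup
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

def scrape_page(url: str) -> str:
    """Fetch a web page and strip tags, links, and stopwords."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "a"]):  # drop scripts, styling, links
        tag.decompose()
    stops = set(stopwords.words("english"))
    words = soup.get_text(separator=" ").split()
    return " ".join(w for w in words if w.lower() not in stops)
```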


2. Methodology

In the BERT architecture, the model has access to the entire sequence of tokens when predicting masked or missing tokens. While this is useful for many NLP tasks, such as predicting a token at a given position, it is limiting for summarization. Summarization, by its nature, restricts the model to the tokens generated thus far. The argument that controls how much of the sequence is visible to the model is known as the attention mask.

[Figure: BERT attention mask]
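As a quick illustration (assuming the Hugging Face transformers library, which this project does not necessarily use), a fill-mask pipeline shows BERT drawing on context from both sides of the mask:

```python
# Illustration only: BERT's fully visible attention lets the model use
# context on both sides of the [MASK] token when predicting it.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("A summary [MASK] the key points of a document.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```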

The GPT2 model is better suited to predicting the next token due to its use of a causal attention mask. This makes it effective for generation tasks but less effective at downstream tasks where the whole input is required to produce an output. In essence, GPT2 only uses words it has already seen.

[Figure: GPT2 attention mask]
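The causal mask itself is just a lower-triangular matrix; a minimal PyTorch sketch (PyTorch here is an assumption for illustration):

```python
import torch

# Causal (autoregressive) mask for a length-5 sequence: position i may
# attend only to positions <= i, so the matrix is lower-triangular.
seq_len = 5
causal_mask = torch.tril(torch.ones(seq_len, seq_len))
print(causal_mask)
# tensor([[1., 0., 0., 0., 0.],
#         [1., 1., 0., 0., 0.],
#         [1., 1., 1., 0., 0.],
#         [1., 1., 1., 1., 0.],
#         [1., 1., 1., 1., 1.]])
```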

BART adopts a fully visible mask, similar to BERT, for its encoder and a causal mask, similar to GPT2, for its decoder. The encoder and decoder are connected through cross-attention, where every decoder layer attends over the encoder's hidden states. This structure keeps the overall output closer to the given input.
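A hedged end-to-end sketch of BART summarization, assuming the Hugging Face transformers library and the facebook/bart-large-cnn checkpoint (neither is confirmed by this repository):

```python
# Sketch of abstractive summarization with a pretrained BART model.
# The checkpoint name is an assumption, not confirmed by this repo.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

document = "Long input text to be condensed goes here..."
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(
    inputs["input_ids"], num_beams=4, max_length=150, early_stopping=True
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```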


3. Results

A. Rouge-1 Scores

[Figure: Rouge-1 scores]

B. Rouge-2 Scores

[Figure: Rouge-2 scores]

C. Rouge-L Scores

[Figure: Rouge-L scores]
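For reference, scores like the ones above can be computed with the rouge-score package (an assumption; this repository's evaluation code may differ):

```python
# Sketch of computing Rouge-1/2/L; the rouge-score package is an
# assumption and the strings are placeholders, not project data.
from rouge_score import rouge_scorer

reference = "the cat sat on the mat"
generated = "a cat was sitting on the mat"

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for name, score in scorer.score(reference, generated).items():
    print(f"{name}: f1={score.fmeasure:.3f}")
```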
