Implement your Own BERT

This is an exercise in developing a minimalist version of BERT.

This project implements the key components of the BERT model to build a better understanding of its architecture, then uses the implemented model for sentence classification on two datasets: SST and CFIMDB.
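One component any BERT implementation needs is scaled dot-product self-attention. The sketch below is illustrative only and is not the code from bert.py: the function name, the tensor layout, and the additive-mask convention are assumptions chosen for this example. It relies only on torch and the Python standard library.

```python
import math
import torch


def scaled_dot_product_attention(query, key, value, attention_mask=None):
    """Hypothetical helper, not taken from bert.py.

    query, key, value: tensors of shape [batch, heads, seq_len, head_dim].
    attention_mask: optional additive mask broadcastable to
        [batch, heads, seq_len, seq_len]; 0 keeps a position and a large
        negative value suppresses it (assumed convention).
    Returns the attended values and the attention weights.
    """
    head_dim = query.size(-1)
    # Similarity of every query position with every key position,
    # scaled by sqrt(head_dim).
    scores = torch.matmul(query, key.transpose(-1, -2)) / math.sqrt(head_dim)
    if attention_mask is not None:
        scores = scores + attention_mask
    # Normalize over key positions so each row of weights sums to 1.
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, value), weights


torch.manual_seed(0)
q = torch.randn(2, 4, 5, 8)
k = torch.randn(2, 4, 5, 8)
v = torch.randn(2, 4, 5, 8)

# Mask out the last key position, as one might for a padding token.
mask = torch.zeros(2, 1, 1, 5)
mask[..., -1] = -1e4

out, weights = scaled_dot_product_attention(q, k, v, mask)
print(out.shape)
```

The output has the same shape as the input values, and the masked position receives essentially zero attention weight.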

Important Notes

Follow setup.sh to set up the environment and install dependencies. Do the rest of your work in that environment.
structure.md gives a detailed description of the code structure, including which parts you need to implement.
You may use only torch; no other external libraries (e.g., transformers) are allowed.
Use the following commands to run the code:

mkdir -p GMUID

python3 classifier.py --option [pretrain/finetune] --epochs NUM_EPOCHS --lr LR --train data/sst-train.txt --dev data/sst-dev.txt --test data/sst-test.txt
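For readers unfamiliar with how such flags are typically handled, here is a minimal sketch of parsing them with argparse. It is an assumption that classifier.py works this way; only the flag names and the SST paths come from the command above, and build_parser is a hypothetical helper.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags shown in the command above.
    parser = argparse.ArgumentParser(
        description="Sentence classification with a minimal BERT"
    )
    parser.add_argument("--option", choices=["pretrain", "finetune"], required=True)
    parser.add_argument("--epochs", type=int, required=True)
    parser.add_argument("--lr", type=float, required=True)
    # Path defaults are copied from the example command; treat them as assumptions.
    parser.add_argument("--train", default="data/sst-train.txt")
    parser.add_argument("--dev", default="data/sst-dev.txt")
    parser.add_argument("--test", default="data/sst-test.txt")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--option", "finetune", "--epochs", "5", "--lr", "1e-5"]
)
print(args.option, args.epochs, args.lr, args.train)
```

Restricting --option with choices makes a mistyped mode fail at startup rather than partway through training.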

Reference accuracies:

Reference accuracies are means over 10 random seeds, with the standard deviation in parentheses.

Pretraining for SST:
Dev Accuracy: 0.391 (0.007) Test Accuracy: 0.403 (0.008)

Finetuning for SST:
Dev Accuracy: 0.515 (0.004) Test Accuracy: 0.526 (0.008)

Finetuning for CFIMDB:
Dev Accuracy: 0.966 (0.007) Test Accuracy: -
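To compare your own runs with the reference numbers, you can summarize per-seed accuracies in the same "mean (standard deviation)" form. The sketch below uses placeholder values that are not real results, and it assumes the sample standard deviation; the README does not say whether the sample or population formula was used. summarize is a hypothetical helper.

```python
import statistics


def summarize(accuracies):
    """Format per-seed accuracies as 'mean (std)' with three decimals."""
    mean = statistics.mean(accuracies)
    # Sample standard deviation (n - 1 denominator); an assumption, see above.
    std = statistics.stdev(accuracies)
    return f"{mean:.3f} ({std:.3f})"


# Placeholder accuracies for illustration only, one per seed.
placeholder_accuracies = [0.50, 0.52, 0.51, 0.53, 0.49]
print(summarize(placeholder_accuracies))  # 0.510 (0.016)
```

Replace the placeholder list with the dev or test accuracy from each of your seeded runs.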

Acknowledgements

This assignment is adapted from Carnegie Mellon University's CS11-711 course and the minBERT assignment created by Shuyan Zhou, Zhengbao Jiang, Ritam Dutt, and Brendon Boldt.

Parts of the code are from the transformers library (Apache License 2.0).

