TinyModel addition #31804

noanabeshima · 2024-07-05T05:50:56Z

Model description

https://github.com/noanabeshima/tiny_model

It's a small language model trained on TinyStories for interpretability, with sparse autoencoders and transcoders added. It has no layernorms (this helps with interpretability), so it doesn't fit any existing model architecture in the transformers library. Its architecture is essentially GPT-2's, except that it has no layernorms and its embedding and unembedding matrices are untied.
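
To make the description concrete, here is a rough PyTorch sketch of a GPT-2-style decoder with the layernorms removed and separate embedding and unembedding weights. This is not the code in `lm.py`: the class names `NoLayerNormBlock` and `TinyLMSketch`, the 4x MLP width, the GELU activation, and the learned positional embeddings are all assumptions borrowed from GPT-2 conventions.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class NoLayerNormBlock(nn.Module):
    """GPT-2-style residual block with no LayerNorm (illustrative, hypothetical)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.attn_out = nn.Linear(d_model, d_model)
        # Assumption: 4x hidden width and GELU, as in GPT-2.
        self.mlp_in = nn.Linear(d_model, 4 * d_model)
        self.mlp_out = nn.Linear(4 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (b, t, d) -> (b, n_heads, t, d_head)
        q, k, v = (
            z.reshape(b, t, self.n_heads, self.d_head).transpose(1, 2)
            for z in (q, k, v)
        )
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        # Causal mask: position i may not attend to positions j > i.
        future = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        scores = scores.masked_fill(future, float("-inf"))
        attn = scores.softmax(dim=-1) @ v
        attn = attn.transpose(1, 2).reshape(b, t, d)
        # Residual updates read the raw residual stream: no LayerNorm anywhere.
        x = x + self.attn_out(attn)
        x = x + self.mlp_out(F.gelu(self.mlp_in(x)))
        return x


class TinyLMSketch(nn.Module):
    """Decoder with untied embed/unembed and no final LayerNorm (hypothetical)."""

    def __init__(self, d_model=768, n_layers=4, n_heads=16,
                 max_seq_len=256, vocab_size=10_000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Assumption: learned absolute positions, as in GPT-2.
        self.pos_embed = nn.Embedding(max_seq_len, d_model)
        self.blocks = nn.ModuleList(
            NoLayerNormBlock(d_model, n_heads) for _ in range(n_layers)
        )
        # Untied: a separate weight matrix, not a view of self.embed.weight.
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(tokens.shape[1], device=tokens.device)
        x = self.embed(tokens) + self.pos_embed(positions)
        for block in self.blocks:
            x = block(x)
        return self.unembed(x)  # logits of shape (batch, seq, vocab_size)
```

Calling `TinyLMSketch()(tokens)` on a `(batch, seq)` tensor of token ids returns logits of shape `(batch, seq, vocab_size)`. The state dict keys of this sketch will not match `tiny_model.pt`, so it is only useful for seeing where the architecture departs from GPT-2.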

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

The implementation is here:
https://github.com/noanabeshima/tiny_model/blob/main/tiny_model/lm.py

The weights are here:
https://huggingface.co/noanabeshima/tiny_model/blob/main/tiny_model.pt

The default config corresponding to the weights is:

    d_model=768,
    n_layers=4,
    n_heads=16,
    max_seq_len=256,
    vocab_size=10_000
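
For reference, the values above can be written as a small config object. This is only a sketch: `TinyModelConfig` is a hypothetical name, and the per-head width assumes `d_model` is split evenly across heads, which the issue does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyModelConfig:
    """Hypothetical container; field names and defaults copied from the issue."""

    d_model: int = 768
    n_layers: int = 4
    n_heads: int = 16
    max_seq_len: int = 256
    vocab_size: int = 10_000

    @property
    def d_head(self) -> int:
        # Assumption: heads split d_model evenly, as in GPT-2.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    @property
    def embed_unembed_weights(self) -> int:
        # Untied embed and unembed: two separate vocab_size x d_model matrices.
        # Counts weight matrices only (no biases, no positional embeddings).
        return 2 * self.vocab_size * self.d_model


cfg = TinyModelConfig()
print(cfg.d_head)                 # 48
print(cfg.embed_unembed_weights)  # 15360000
```

Under these assumptions each head is 48-dimensional, and the untied token embedding and unembedding together account for about 15.4M weights.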

I am the author.


LysandreJik · 2024-07-05T05:57:26Z

It would be quite nice to add this using the new model adder that @ArthurZucker has contributed; @ArthurZucker, when back from leave (next week), do you mind sharing with @noanabeshima how to get this done the best way?

ArthurZucker · 2024-07-16T05:40:58Z

Hey! Sorry for the delay! Yep, my recommendation is to use the #30868 tool to isolate the changes as much as possible 🤗

noanabeshima added the New model label Jul 5, 2024
