TinyModel addition #31804

Open
2 tasks done
noanabeshima opened this issue Jul 5, 2024 · 2 comments

Comments

noanabeshima commented Jul 5, 2024

Model description

https://github.com/noanabeshima/tiny_model

It's a small language model trained on TinyStories for interpretability research, with sparse autoencoders and transcoders added. It has no layernorms, which helps with interpretability but means it doesn't fit any existing model architecture in the transformers library. Its architecture is essentially GPT-2's, except that it has no layernorms and its embedding and unembedding matrices are untied.
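
To make the architectural difference concrete, here is a minimal PyTorch sketch of a GPT-2-style decoder with the layernorms removed and a separate unembedding matrix. It is not the code from the linked repository: the class names, the 4x MLP expansion, the GELU activation, and the learned positional embeddings are all assumptions borrowed from GPT-2, and only the default sizes come from this issue.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class NoLayerNormBlock(nn.Module):
    """Hypothetical GPT-2-style block with both layernorms removed."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.attn_out = nn.Linear(d_model, d_model)
        # Assumption: 4x MLP expansion as in GPT-2 (not stated in the issue).
        self.mlp_in = nn.Linear(d_model, 4 * d_model)
        self.mlp_out = nn.Linear(4 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        q, k, v = (
            z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
            for z in (q, k, v)
        )
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        # Causal mask: position i may not attend to positions j > i.
        future = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        scores = scores.masked_fill(future, float("-inf"))
        attn = (F.softmax(scores, dim=-1) @ v).transpose(1, 2).reshape(b, t, d)
        # GPT-2 would normalize x before each sublayer; here the residual
        # stream is read directly.
        x = x + self.attn_out(attn)
        x = x + self.mlp_out(F.gelu(self.mlp_in(x)))
        return x


class NoLayerNormLM(nn.Module):
    """Hypothetical language model; default sizes follow the issue's config."""

    def __init__(self, d_model=768, n_layers=4, n_heads=16,
                 max_seq_len=256, vocab_size=10_000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Assumption: learned positional embeddings, as in GPT-2.
        self.pos_embed = nn.Embedding(max_seq_len, d_model)
        self.blocks = nn.ModuleList(
            [NoLayerNormBlock(d_model, n_heads) for _ in range(n_layers)]
        )
        # Untied: a separate matrix rather than a reuse of self.embed.weight.
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        pos = torch.arange(tokens.size(1), device=tokens.device)
        x = self.embed(tokens) + self.pos_embed(pos)
        for block in self.blocks:
            x = block(x)
        return self.unembed(x)  # no final layernorm before the logits
```

The two places where this departs from GPT-2 are the missing normalization before each sublayer (and before the logits) and the independent `unembed` weight, which are the points that keep the model from mapping onto an existing transformers architecture.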

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

The implementation is here:
https://github.com/noanabeshima/tiny_model/blob/main/tiny_model/lm.py

The weights are here:
https://huggingface.co/noanabeshima/tiny_model/blob/main/tiny_model.pt

The default config corresponding to the weights is:

    d_model=768,
    n_layers=4,
    n_heads=16,
    max_seq_len=256,
    vocab_size=10_000
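
For reference, the same defaults can be written as a small config object with a couple of derived quantities. This is only a sketch: `TinyModelConfig` is a hypothetical name rather than a class from the repository, and the even split of `d_model` across heads is assumed from GPT-2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyModelConfig:  # hypothetical name, not taken from the repository
    d_model: int = 768
    n_layers: int = 4
    n_heads: int = 16
    max_seq_len: int = 256
    vocab_size: int = 10_000

    @property
    def d_head(self) -> int:
        # Assumes d_model is split evenly across heads, as in GPT-2.
        return self.d_model // self.n_heads

    @property
    def embedding_params(self) -> int:
        # Untied embed and unembed: two separate vocab_size x d_model matrices.
        return 2 * self.vocab_size * self.d_model


cfg = TinyModelConfig()
print(cfg.d_head)            # 48
print(cfg.embedding_params)  # 15360000
```

Because the embeddings are untied, the token embedding and unembedding together account for about 15.4M parameters under these defaults.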

I am the author.

@LysandreJik
Member

It would be quite nice to add this using the new model adder that @ArthurZucker has contributed. @ArthurZucker, when you're back from leave (next week), would you mind sharing with @noanabeshima the best way to get this done?

@ArthurZucker
Collaborator

Hey! Sorry for the delay! Yep, my recommendation is to use the tool from #30868 to isolate the changes as much as possible 🤗
