It would be quite nice to add this using the new model adder that @ArthurZucker has contributed; @ArthurZucker, when back from leave (next week), do you mind sharing with @noanabeshima how to get this done the best way?
Model description
https://github.com/noanabeshima/tiny_model
It's a small language model trained on TinyStories for interpretability work, with sparse autoencoders and transcoders added. It has no layernorms (this helps with interpretability), so it doesn't fit any existing model architecture in the transformers library. Its architecture is essentially GPT-2's, except that it has no layernorms and its embedding and unembedding matrices are untied.
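To make the architectural difference concrete, here is a rough PyTorch sketch of a GPT-2-style decoder with the layernorms removed and a separate (untied) unembedding matrix. It is not the code from the linked repository: the class names `NoLayerNormBlock` and `TinyNoLNLM`, the dimensions, the learned positional embeddings, the 4x MLP width, and the GELU activation are all assumptions borrowed from GPT-2 for illustration.

```python
import torch
import torch.nn as nn


class NoLayerNormBlock(nn.Module):
    """GPT-2-style residual block with both layernorms removed (hypothetical name)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # 4x width is an assumption from GPT-2
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.shape[1]
        # Boolean causal mask: True marks positions a query may NOT attend to.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        # GPT-2 would apply a layernorm before attention; here the raw
        # residual stream is fed in directly.
        attn_out, _ = self.attn(x, x, x, attn_mask=causal, need_weights=False)
        x = x + attn_out
        # Likewise, no layernorm before the MLP.
        return x + self.mlp(x)


class TinyNoLNLM(nn.Module):
    """Illustrative language model: no layernorms, untied embed/unembed."""

    def __init__(self, vocab_size=100, n_ctx=16, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(n_ctx, d_model)
        self.blocks = nn.ModuleList(
            [NoLayerNormBlock(d_model, n_heads) for _ in range(n_layers)]
        )
        # Separate parameter from self.embed: the weights are not tied,
        # and there is no final layernorm before it.
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(tokens.shape[1], device=tokens.device)
        x = self.embed(tokens) + self.pos_embed(positions)
        for block in self.blocks:
            x = block(x)
        return self.unembed(x)  # logits of shape (batch, seq, vocab)
```

In stock GPT-2 the output projection reuses the token embedding matrix and every block is wrapped in layernorms, which is why a model like this one can't simply be loaded through the existing GPT-2 classes.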
Open source status
Provide useful links for the implementation
The implementation is here:
https://github.com/noanabeshima/tiny_model/blob/main/tiny_model/lm.py
The weights are here:
https://huggingface.co/noanabeshima/tiny_model/blob/main/tiny_model.pt
The default config corresponding to the weights is:
I am the author.