Support google/flan-t5-* #20

Open
mooijtech opened this issue Jun 23, 2023 · 5 comments

Comments

@mooijtech

I would like to use the following model(s):
https://huggingface.co/google/flan-t5-small
https://huggingface.co/google/flan-t5-xxl

What would be required to add support if I were to look at contributing myself?

Kind regards,
Marten

@mooijtech (Author)

Not sure if T5 is compatible with BART; hopefully it is, since both are encoder-decoder. There seem to be some config.json differences; I'm trying to modify it now.
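For reference, a quick way to see how the two configs differ, assuming the Hugging Face `transformers` package is available (the BART checkpoint name below is just an example I picked, not something from this repo):

```python
# Illustrative only: diff the config keys of a BART and a Flan-T5 checkpoint.
from transformers import AutoConfig

bart_cfg = AutoConfig.from_pretrained("facebook/bart-base").to_dict()
t5_cfg = AutoConfig.from_pretrained("google/flan-t5-small").to_dict()

print("only in BART:", sorted(set(bart_cfg) - set(t5_cfg)))
print("only in T5:  ", sorted(set(t5_cfg) - set(bart_cfg)))
```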

@mooijtech (Author)

Stuck on input encoding embeddings.
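If it helps: as far as I understand, T5 applies only token embeddings at the input; there are no absolute position embeddings, since position information enters later as a relative bias on the attention logits. A minimal sketch, assuming PyTorch and the flan-t5-small dimensions:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 32128, 512  # values from flan-t5-small's config.json
shared_embedding = nn.Embedding(vocab_size, d_model)

input_ids = torch.tensor([[37, 1712, 19, 1]])  # (batch, seq_len) of token ids
# Both the encoder and the decoder reuse this shared token embedding;
# no positional embedding is added at the input.
input_embeds = shared_embedding(input_ids)     # (batch, seq_len, d_model)
```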

@mooijtech (Author) commented Jul 27, 2023

T5 uses an encoder-decoder architecture that closely resembles the original Transformer. The differences are (a rough sketch of these pieces follows below):

- LayerNorm is applied immediately before each attention and feed-forward transformation (i.e., outside of the residual path).
- No additive bias is used in LayerNorm (only a scale parameter is kept).
- A simple relative position embedding scheme is used that adds a scalar to the corresponding logit used to compute attention weights.
- Dropout is applied throughout the network (e.g., attention weights, feed-forward network, skip connections, etc.).

[attached image]
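To make those bullets concrete, here is a minimal PyTorch sketch of the three building blocks; the class and function names are my own, not from any particular implementation:

```python
import torch
import torch.nn as nn

class T5LayerNorm(nn.Module):
    """Scale-only norm: no additive bias and no mean subtraction (RMS norm)."""
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)

class PreNormResidual(nn.Module):
    """LayerNorm applied before the sublayer (outside the residual path),
    with dropout on the sublayer output before it is added back."""
    def __init__(self, hidden_size, sublayer, dropout=0.1):
        super().__init__()
        self.norm = T5LayerNorm(hidden_size)
        self.sublayer = sublayer
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return x + self.dropout(self.sublayer(self.norm(x)))

def attention_with_position_bias(q, k, v, position_bias):
    """position_bias holds one scalar per (head, query, key) pair, added to the logits.
    Note that T5 does not rescale the dot products by sqrt(d_k)."""
    scores = torch.matmul(q, k.transpose(-1, -2)) + position_bias
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)
```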

@matteo-grella (Member)

@mooijtech I am ready to work on this together, let me know if you're still interested :)

@mooijtech (Author) commented Nov 2, 2023 via email
