Training HuggingFace tokenizer - ignore_merges #1537

Closed
ykoyfman opened this issue May 22, 2024 · 2 comments

Comments

@ykoyfman

Looking through Llama3 changes, I see that "ignore_merges" was added as a property to support conversion from tiktoken models. Can a native HF tokenizer be trained using this property? It's not clear if this is possible with, say, train_new_from_iterator. CC @ArthurZucker - Thanks
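For context, here is a minimal pure-Python sketch of what the flag changes at encode time, as I understand it: with `ignore_merges` enabled, a pre-tokenized word that already exists in the vocabulary is emitted as a single token instead of being rebuilt through the merge table. The function `bpe_encode` and the toy vocabulary and merges are hypothetical and only illustrate the behaviour; this is not the `tokenizers` implementation.

```python
from typing import List, Set, Tuple


def bpe_encode(
    word: str,
    vocab: Set[str],
    merges: List[Tuple[str, str]],
    ignore_merges: bool = False,
) -> List[str]:
    """Toy BPE encoder for one pre-tokenized word (illustrative only)."""
    # With ignore_merges, a word that is already a vocabulary entry is
    # returned whole, without consulting the merge table.
    if ignore_merges and word in vocab:
        return [word]

    # Earlier merges have higher priority (lower rank).
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; unknown pairs get infinite rank.
        candidates = [
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(candidates)
        if best_rank == float("inf"):
            break  # no applicable merge left
        symbols[i : i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical toy data: "hello" is in the vocab, but the merges cannot build it.
vocab = {"h", "e", "l", "o", "he", "ll", "hello"}
merges = [("h", "e"), ("l", "l")]

without_flag = bpe_encode("hello", vocab, merges)
with_flag = bpe_encode("hello", vocab, merges, ignore_merges=True)
```

In this toy setup the two calls disagree, which is the situation that comes up with tiktoken-converted vocabularies: the vocab holds whole tokens that the merge list alone would not reproduce.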

@ArthurZucker
Collaborator

I think that it's not the case yet, but we should support it!

@github-actions github-actions bot added the Stale label Jul 7, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 13, 2024
@huggingface huggingface deleted a comment from github-actions bot Jul 16, 2024
@ArthurZucker
Collaborator

(Anyone feel free to open a PR if you have time!)
