let logsum_alt = (sum + freq[id] * (alternatives.len() - 1) as f64).ln();
When sentencepiece[i] is removed, every occurrence of it in the sentences is replaced with alternatives[i]. Should the code at line 402 instead be
let logsum_alt = (sum + freq[id] * (alternatives[id].len() - 1) as f64).ln();
since alternatives[id] stores the pieces that replace the piece whose frequency is freq[id]?
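To make the difference concrete, here is a minimal sketch of the two variants side by side. It assumes the renormalization is new_sum = sum - freq[id] + freq[id] * k = sum + freq[id] * (k - 1), where k is the number of pieces that replace piece id. The helper names `logsum_after_removal` and `logsum_with_outer_len` and the toy values are hypothetical and not part of the trainer.

```rust
/// Log of the total frequency after piece `id` is removed and its frequency
/// is reassigned to each of its replacement pieces:
///   new_sum = sum - freq[id] + freq[id] * k = sum + freq[id] * (k - 1)
/// where k = alternatives[id].len(). (Hypothetical helper, not trainer code.)
fn logsum_after_removal(sum: f64, freq: &[f64], alternatives: &[Vec<usize>], id: usize) -> f64 {
    let k = alternatives[id].len() as f64;
    // Subtract in f64 so an empty alternatives[id] cannot underflow a usize.
    (sum + freq[id] * (k - 1.0)).ln()
}

/// Same formula, but with k taken from the outer vector, as on the quoted
/// line 402. Here k is the total number of pieces, independent of `id`.
fn logsum_with_outer_len(sum: f64, freq: &[f64], alternatives: &[Vec<usize>], id: usize) -> f64 {
    let k = alternatives.len() as f64;
    (sum + freq[id] * (k - 1.0)).ln()
}
```

With freq = [4.0, 3.0, 3.0], sum = 10.0 and alternatives = [[1, 2], [], []], removing piece 0 gives a new sum of 14 with the first helper (10 - 4 + 4 * 2) and 18 with the second, so the two expressions only agree when alternatives[id].len() happens to equal the number of pieces.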
Codesticker changed the title from "Might be a bug in Unigram Trainer" to "[BUG]Might be a bug in Unigram Trainer" on Jun 4, 2024
tokenizers/tokenizers/src/models/unigram/trainer.rs
Lines 397 to 402 in 25aee8b