Flashtext seem inferior to hash table - am I missing something? #106

Make42 · 2020-04-20T13:50:59Z

Maybe I missed something, but I do not understand what the advantage of using the tie dictionary is over using a hashing dictionary is. Can you tell me what I am missing?

Let K denote the number of keywords in the corpus and Q the number of query words for which you request the check for.

I am guessing that https://www.freecodecamp.org/news/regex-was-taking-5-days-flashtext-does-it-in-15-minutes-55f04411025f/ reflects the time per query. So we have query time complexity of O(Q * K). I actually think this is weird - instinctually I would have expected O(Q * log(K)). Anyhow, just building a hash table would take O(Q).

Regarding build the respective data structures, flashtext should probably take something like Q(K * log(K). The first K is because you need to insert K words for which you need to query the tries data structure. However, since the trie is small at the beginning I put the second K into log. In opposite, the hash table only requires O(1), so we get O(K) in total for inserting the K words.

Assuming I am right: A way to make flashtext useful could be if you build two tries: One for the query words and one for the dictionary. Building the tries would take O(Q * log(Q) + K * log(K)), which would be slower than O(K) for the hash table. But querying would take about O(log(K) * log(Q)), which would be better the O(Q) if Q is large enough and and K is small enough.

Another application for the principles might be for finding longest common substrings.

These two ideas for improvement would of course need a bit more implementation.

thakur-nandan · 2020-05-23T09:24:20Z

Hi @Make42

Regarding the advantages of using the trie dictionary is over a hashing dictionary check this StackOverflow answer- https://stackoverflow.com/questions/245878/how-do-i-choose-between-a-hash-table-and-a-trie-prefix-tree.

Kind Regards,
Nandan Thakur

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flashtext seem inferior to hash table - am I missing something? #106

Flashtext seem inferior to hash table - am I missing something? #106

Make42 commented Apr 20, 2020

thakur-nandan commented May 23, 2020

Flashtext seem inferior to hash table - am I missing something? #106

Flashtext seem inferior to hash table - am I missing something? #106

Comments

Make42 commented Apr 20, 2020

thakur-nandan commented May 23, 2020