Model's _fit should accept Dataset also, not just BatchVectorizer #70

Alvant · 2020-05-24T22:47:40Z

Seems more natural for a model to fit on Dataset.
Maybe better to use Union[artm.BatchVectorizer, topicnet.cooking_machine.Dataset] instead of just artm.BatchVectorizer (Union — for compatibility)?
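A minimal sketch of what the proposed signature could look like. The `BatchVectorizer` and `Dataset` classes below are local stand-ins, not the real `artm` / `topicnet` classes, and the `get_batch_vectorizer()` conversion is an assumption about how a `Dataset` would be turned into batches. The `last_fit_input` attribute is hypothetical and exists only to make the behaviour observable.

```python
from typing import Union


class BatchVectorizer:
    """Stand-in for artm.BatchVectorizer (illustration only)."""


class Dataset:
    """Stand-in for topicnet.cooking_machine.Dataset (illustration only)."""

    def get_batch_vectorizer(self) -> BatchVectorizer:
        # Assumption: a Dataset can produce a BatchVectorizer on request.
        return BatchVectorizer()


class Model:
    def __init__(self) -> None:
        # Hypothetical attribute: remembers what the model was trained on.
        self.last_fit_input = None

    def _fit(self, dataset_trainable: Union[BatchVectorizer, Dataset]) -> None:
        # Normalize the argument so the rest of the method only deals
        # with a BatchVectorizer, whichever type the caller passed in.
        if isinstance(dataset_trainable, Dataset):
            batch_vectorizer = dataset_trainable.get_batch_vectorizer()
        elif isinstance(dataset_trainable, BatchVectorizer):
            batch_vectorizer = dataset_trainable
        else:
            raise TypeError(
                "dataset_trainable must be a BatchVectorizer or a Dataset, "
                f"got {type(dataset_trainable).__name__}"
            )
        # The real training step would go here; the sketch only records the input.
        self.last_fit_input = batch_vectorizer
```

Existing callers that pass a `BatchVectorizer` would keep working unchanged, which is the compatibility argument for `Union`.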

Alvant · 2020-05-24T22:50:46Z

off-topic (although not quite): BaseModel has TODO in _fit's docstring for dataset_trainable

bt2901 · 2020-05-25T00:39:04Z

Did you mean Union instead of Tuple? Or am I confused about OR operator in typing?

Alvant · 2020-05-25T07:09:58Z

Exactly! The owls are not what they seem. Corrected!

Evgeny-Egorov-Projects · 2020-05-25T10:39:17Z

First, _fit is a "protected" method, meaning we do not guarantee that it will work nicely and easily for the user or that everything will work. Normally, the user should not use it to train a model; it exists so that we can hook up library components through it.

If we go forward and implement this enhancement, we will have to change some of the core architecture. Making the method "legal" to use means that we have to 1) add cube information to the fit, 2) check that the fit does not overlap with previous actions, 3) train the model in a separate thread and save/load it afterwards...

See where it's going? The nice and simple method grows into something that duplicates existing functionality and puts it into the "models" class that we already wanted to "separate" from the training action.

bt2901 · 2020-05-25T11:02:41Z

I think you are moving the goalposts. We do not provide guarantees on _fit, but that does not forbid the user from using it. Making this method a bit more flexible does not change that.

Also, training a model without Cubes + Experiment overhead is exactly why one would consider using the method (e.g. for very dirty prototyping or perhaps for cases not covered by Cubes + Experiment yet).

Alvant · 2020-05-25T14:02:28Z

> First, _fit is a "protected" method, meaning we do not guarantee that it will work nicely and easily for the user or that everything will work

Ok, but it doesn't mean that we shouldn't think about how to make the method better 🙂

Alvant changed the title ~~Model's _fit should accept Dataset also, not just batch_vectorizer~~ Model's _fit should accept Dataset also, not just BatchVectorizer May 24, 2020

Alvant added the labels discuss (Not everything clear, further communication required) and enhancement (New feature or request) May 24, 2020
