Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model's _fit should accept Dataset also, not just BatchVectorizer #70

Open
Alvant opened this issue May 24, 2020 · 6 comments
Open

Model's _fit should accept Dataset also, not just BatchVectorizer #70

Alvant opened this issue May 24, 2020 · 6 comments
Labels
discuss Not everything clear, further communication required enhancement New feature or request

Comments

@Alvant
Copy link
Collaborator

Alvant commented May 24, 2020

_fit

Seems more natural for a model to fit on Dataset.
Maybe better to use Union[artm.BatchVectorizer, topicnet.cooking_machine.Dataset] instead of just artm.BatchVectorizer (Union — for compatibility)?

@Alvant Alvant changed the title Model's _fit should accept Dataset also, not just batch_vectorizer Model's _fit should accept Dataset also, not just BatchVectorizer May 24, 2020
@Alvant
Copy link
Collaborator Author

Alvant commented May 24, 2020

off-topic (although not quite): BaseModel has TODO in _fit's docstring for dataset_trainable
base_fit

@Alvant Alvant added discuss Not everything clear, further communication required enhancement New feature or request labels May 24, 2020
@bt2901
Copy link
Contributor

bt2901 commented May 25, 2020

Did you mean Union instead of Tuple? Or am I confused about OR operator in typing?

@Alvant
Copy link
Collaborator Author

Alvant commented May 25, 2020

Exactly! The owls are not what they seem. Corrected!

@Evgeny-Egorov-Projects
Copy link
Contributor

First, _fit is "protected" method, meaning we do not guarantee that it should work nice and easy for the user and that everything will work. Meaning, that normally user should not use it to train a model and it exists so we can hook up library components with this method.

Given that we go forward and implement this enhancement we will have to change some of the core architecture: making method "legal" to use makes it so that we have to 1) add a cube information to the fit 2) check that the fit is not overlapping with previous actions 3) train model in a separate thread and save/load it afterwards...

See where it's going? the nice and simple method grows into something that duplicates existing functionality and puts it into the "models" class that we already wanted to "separate" from the training action.

@bt2901
Copy link
Contributor

bt2901 commented May 25, 2020

I think you are moving the goalposts. We do not provide guarantees on _fit, but it does not forbid the user to use it. Making this method a bit more flexible does not change that.

Also, training a model without Cubes + Experiment overhead is exactly why one would consider using the method (e.g. for very dirty prototyping or perhaps for cases not covered by Cubes + Experiment yet).

@Alvant
Copy link
Collaborator Author

Alvant commented May 25, 2020

First, _fit is "protected" method, meaning we do not guarantee that it should work nice and easy for the user and that everything will work

Ok, but it doesn't mean that we shouldn't think about how to make the method better 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Not everything clear, further communication required enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants