Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop the numpy requirement #16649

Open
carmocca opened this issue Feb 6, 2023 · 23 comments · May be fixed by #19959 or #20090
Open

Drop the numpy requirement #16649

carmocca opened this issue Feb 6, 2023 · 23 comments · May be fixed by #19959 or #20090
Assignees
Labels
good first issue Good for newcomers help wanted Open to be worked on pl Generic label for PyTorch Lightning package refactor
Milestone

Comments

@carmocca
Copy link
Contributor

carmocca commented Feb 6, 2023

Outline & Motivation

We barely use numpy. We should replace its uses with torch.

Reduce the package footprint

Pitch

Remove https://github.com/Lightning-AI/lightning/blob/7bbbe22636853e1ee6fa57b557b3408d45233589/requirements/pytorch/base.txt#L4

Update numpy usages with torch.

Additional context

No response

cc @Borda @justusschock @awaelchli

@carmocca carmocca added help wanted Open to be worked on refactor pl Generic label for PyTorch Lightning package labels Feb 6, 2023
@carmocca carmocca added this to the future milestone Feb 6, 2023
@carmocca carmocca added the good first issue Good for newcomers label Feb 6, 2023
@Borda
Copy link
Member

Borda commented Feb 6, 2023

I think there was some cross-dependency, leaking numpy typing which was changed in the frozen version but lets see if all is fine now :)

@awaelchli
Copy link
Member

We had a strong dependency on numpy for working with jagged arrays when post-processing the various output formats for the epoch_end hooks.
Since this was removed, we have a weaker dependency now.
One notable usage though is in the seeding: https://github.com/Lightning-AI/lightning/blob/7bbbe22636853e1ee6fa57b557b3408d45233589/src/lightning/fabric/utilities/seed.py#L96-L101

@carmocca
Copy link
Contributor Author

carmocca commented Feb 6, 2023

There's this issue on PyTorch to include the SeedSequence API: pytorch/pytorch#56805, but as a workaround, they added pytorch/pytorch#56797
The relevant discussion is at https://github.com/pytorch/pytorch/pull/56488/files#r619435608

This PR can be done in pieces, we don't need to replace all usages concurrently.

@dingusagar
Copy link
Contributor

@carmocca , if this issue is planned to be done in pieces, I can help with some module. let me know.

@carmocca
Copy link
Contributor Author

carmocca commented Feb 8, 2023

@dingusagar You can go ahead and replace the usages you are confident about.

@ishandutta0098
Copy link
Contributor

@carmocca @Borda is this issue still active? I would be interested to work on it.

Can you provide me with a more detailed context of what needs to be done?

Do we plan to remove the numpy requirement completely and replace it with torch or this is for some particular files?

@carmocca
Copy link
Contributor Author

Do we plan to remove the numpy requirement completely and replace it with torch

This is the end goal. Some of our numpy usages are easy to replace, others not so much. You can start doing the easy replacements first, get those merged, and then addressing the hard ones separately after.

Once all are done, the requirement can be removed.

@ishandutta0098
Copy link
Contributor

Okay, I will get started with it. @carmocca can you assign this issue to me?

@ishandutta0098
Copy link
Contributor

@carmocca I have started off with the numpy to torch conversion, have begun with the LR Finder, and raised a PR at #17264

@ishandutta0098
Copy link
Contributor

I have removed the numpy usage for the callback scripts, and raised a PR #17267

@ryan597
Copy link
Contributor

ryan597 commented Apr 4, 2023

I'd also like to help on this.

@ishandutta0098 I can work on the model summary if you haven't yet.

@ishandutta0098
Copy link
Contributor

Hi @ryan597 I have made changes to most of the files in lightning and they are currently on my local branch, the testing was pending so I didn't create a pull request for them yet.

However I haven't changed anything in the examples folder yet, maybe you can give it a try. If you do kindly share the files you are modifying so that we don't change the same ones!

@ryan597
Copy link
Contributor

ryan597 commented Apr 4, 2023

Sure thanks. I did a quick grep through the files and see numpy in these examples. They look like quick changes so I can do these.

examples/fabric/meta_learning/train_fabric.py
examples/fabric/meta_learning/train_torch.py
examples/fabric/reinforcement_learning/rl/agent.py
examples/fabric/reinforcement_learning/rl/utils.py

Edit to include:
examples/fabric/reinforcement_learning/train_torch.py
examples/fabric/reinforcement_learning/train_fabric.py
examples/fabric/reinforcement_learning/train_fabric_decoupled.py

@ishandutta0098
Copy link
Contributor

Okay, I would skip these files then.

@abisheksaravanakumar
Copy link

Hey are you guys still working on this issue? Anything I could contribute?

@Borda
Copy link
Member

Borda commented Apr 17, 2024

Hey are you guys still working on this issue? Anything I could contribute?

You can take it =)

@Bhavay-2001
Copy link
Contributor

Hi @Peiffap, which files in src/lightning are you currently working at? Can I share some files with you to drop the numpy requirement.
Thanks

@Peiffap
Copy link
Contributor

Peiffap commented Jun 6, 2024

Hi @Bhavay-2001, cheers for checking in with me on this!

which files in src/lightning are you currently working at?

I'm not currently actively working on this issue, but I think it would make sense in general to create a task list for this issue with the work left to do (i.e. the files or directories with numpy dependencies), in order to keep track of who's doing what/which tasks remain.

Can I share some files with you to drop the numpy requirement.

I'm not sure I understand what you mean here.

@Bhavay-2001
Copy link
Contributor

Hi @Peiffap, I will make a list of files that need to be changed or updated from numpy to torch and will share. Then I can work on my half of files and you can work on yours.
Thanks

@Peiffap
Copy link
Contributor

Peiffap commented Jun 7, 2024

I will make a list of files that need to be changed or updated from numpy to torch and will share.

That sounds good!

Then I can work on my half of files and you can work on yours.

I don't currently plan on working on this, but in any case it'd be good to have a place for people wanting to contribute to find which files are being worked on by someone else already.

@Bhavay-2001
Copy link
Contributor

There are mainly 3-4 files in the src/lightning folder that have the numpy library dependency. I think I can handle those myself.
Also @awaelchli, do we need to change numpy dependency in the tests package as well?

@awaelchli
Copy link
Member

awaelchli commented Jul 4, 2024

Hi
Sorry for the late reply here, I've been busy with other things, but I've seen some activity here and wanted to clarify a few things.

  1. The motivation of this issue is to drop the numpy requirement, but to be more precise what that means is making the numpy depencency no longer a required dependency. There are places in the code where we handle numpy, and these need to stay. What that means for this issue is that these places need to be guarded by a check whether numpy is available. If it is not, we do nothing. If it is available, we do what we used to do. The relevant places where this applies are:

    These are all occurrences where we don't want to (and can't) replace numpy with torch. What we would do there is simply do a check whether numpy is available, like so:

    from lightning_utilities.core.imports import RequirementCache
    
    _NUMPY_AVAILABLE = RequirementCache("numpy")
    
    if _NUMPY_AVAILABLE:
         # execute special numpy code
    else:
         # do nothing

    Contributions for these are welcome, please start with these if you are interested.

  2. Should we also remove the numpy from tests? The answer is no. Tests are not shipped as part of the package, and numpy is useful in our tests for various reasons. We should keep numpy as a testing requirement

  3. This issue does not apply to src/lightning/app. In fact, that code will soon be removed from the repository anyway. @01AbhiSingh I've seen you've opened a PR for examples/app, but the timing for this unfortunate because we plan to remove the app code in the next day or so. Maybe you'd like to take one of the above items I've listed instead?

Thanks everyone for the willingness to contribute. I hope the directions above make it a bit clearer what should be done as the next step.

@01AbhiSingh
Copy link
Contributor

to be more precise what that means is making the numpy depencency no longer a required dependency. There are places in the code where we handle numpy, and these need to stay. What that means for this issue is that these places need to be guarded by a check whether numpy is available. If it is not, we do nothing.

So, the goal is to make the numpy an optional dependency ? Thank you for explaining it a bit more, I'll pick one of the issues you mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment