Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make numpy an optional dependency in utilities\seed.py #20055

Merged
merged 10 commits into from
Jul 12, 2024

Conversation

01AbhiSingh
Copy link
Contributor

@01AbhiSingh 01AbhiSingh commented Jul 6, 2024

What does this PR do?

Part of #16649

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Numpy no longer a required dependency in utilites\seed.py.
Made the following changes in code :

  1. Added RequirementCache:
_NUMPY_AVAILABLE = RequirementCache("numpy")
  1. Fallback for max_seed_value and min_seed_value:
if _NUMPY_AVAILABLE:
    import numpy as np
    max_seed_value = np.iinfo(np.uint32).max
    min_seed_value = np.iinfo(np.uint32).min
  1. In seed_everything():
if _NUMPY_AVAILABLE:
    np.random.seed(seed)
  1. In pl_worker_init_function():
if _NUMPY_AVAILABLE:
        ss = np.random.SeedSequence([base_seed, worker_id, global_rank])
        # use 128 bits (4 x 32-bit words)
        np.random.seed(ss.generate_state(4))
        # Spawn distinct SeedSequences for the PyTorch PRNG and the stdlib random module
        torch_ss, stdlib_ss = ss.spawn(2)
        torch.manual_seed(torch_ss.generate_state(1, dtype=np.uint64)[0])
        # use 128 bits expressed as an integer
        stdlib_seed = (stdlib_ss.generate_state(2, dtype=np.uint64).astype(object) * [1 << 64, 1]).sum()

 random.seed(stdlib_seed)

  1. In _collect_rng_states():
def _collect_rng_states(include_cuda: bool = True) -> Dict[str, Any]:
    r"""Collect the global random state of :mod:`torch`, :mod:`torch.cuda`, :mod:`numpy` and Python."""
    states = {
        "torch": torch.get_rng_state(),
        "python": python_get_rng_state(),
    }
    if _NUMPY_AVAILABLE:
        states["numpy"] = np.random.get_state()
    if include_cuda:
        states["torch.cuda"] = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else []
    return states
  1. In _set_rng_states():
if _NUMPY_AVAILABLE and "numpy" in rng_state_dict:
    np.random.set_state(rng_state_dict["numpy"])

Below is the test I ran to ensure that numpy is an optional dependency now and all the functions are doing what they are supposed to do without numpy.

import unittest
import subprocess
import sys
import importlib
from unittest.mock import patch

# Command to uninstall numpy for the test
uninstall_numpy_command = [sys.executable, "-m", "pip", "uninstall", "-y", "numpy"]

# Command to reinstall numpy after the test
install_numpy_command = [sys.executable, "-m", "pip", "install", "numpy"]

class TestOptionalNumPyDependency(unittest.TestCase):
    
    @classmethod
    def setUpClass(cls):
        # Uninstall numpy if it is installed
        subprocess.run(uninstall_numpy_command, check=True)
    
    @classmethod
    def tearDownClass(cls):
        # Reinstall numpy after tests
        subprocess.run(install_numpy_command, check=True)
    
    def test_seed_everything_without_numpy(self):
        from lightning.fabric.utilities.seed import seed_everything

        # Mock logging to avoid unnecessary output during tests
        with patch('lightning.fabric.utilities.seed.log') as mock_log:
            # Test seed_everything without numpy
            seed = seed_everything(42, workers=True)
            self.assertEqual(seed, 42)
    
    def test_reset_seed_without_numpy(self):
        from lightning.fabric.utilities.seed import seed_everything, reset_seed

        # Mock logging to avoid unnecessary output during tests
        with patch('lightning.fabric.utilities.seed.log') as mock_log:
            # Test reset_seed without numpy
            seed_everything(42, workers=True)
            reset_seed()

    def test_pl_worker_init_function_without_numpy(self):
        from lightning.fabric.utilities.seed import pl_worker_init_function

        # Mock logging to avoid unnecessary output during tests
        with patch('lightning.fabric.utilities.seed.log') as mock_log:
            # Test pl_worker_init_function without numpy
            pl_worker_init_function(0)

if __name__ == '__main__':
    unittest.main()

tagging @awaelchli for better visibility.

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--20055.org.readthedocs.build/en/20055/

@github-actions github-actions bot added the fabric lightning.fabric.Fabric label Jul 6, 2024
@01AbhiSingh 01AbhiSingh changed the title Descriptive message about what you changed numpy now an optional dependency in utilities\seed.py Jul 6, 2024
i accidently pushed the test file also. so I deleted it
@01AbhiSingh
Copy link
Contributor Author

I accidently push the test file also with the PR. I have deleted the file though.

Copy link
Member

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is great, exactly what we need.

src/lightning/fabric/utilities/seed.py Outdated Show resolved Hide resolved
src/lightning/fabric/utilities/seed.py Outdated Show resolved Hide resolved
src/lightning/fabric/utilities/seed.py Outdated Show resolved Hide resolved
@awaelchli awaelchli added the community This PR is from the community label Jul 6, 2024
@awaelchli awaelchli added this to the 2.4 milestone Jul 6, 2024
Copy link

codecov bot commented Jul 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90%. Comparing base (9987d99) to head (490fcc9).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #20055   +/-   ##
=======================================
  Coverage      90%      90%           
=======================================
  Files         266      266           
  Lines       22948    22961   +13     
=======================================
+ Hits        20678    20690   +12     
- Misses       2270     2271    +1     

@01AbhiSingh
Copy link
Contributor Author

01AbhiSingh commented Jul 9, 2024

Hi @awaelchli.

I was working on the remaining issue :

Parameter sanitization

in the comment :
#16649 (comment)

i want to ask that according to what I have understood reading the code is that the code below already checking numpy is present or not in the code below :

 for k in params:
        # convert relevant np scalars to python types first (instead of str)
        if isinstance(params[k], (np.bool_, np.integer, np.floating)):
            params[k] = params[k].item()
        elif type(params[k]) not in [bool, int, float, str, Tensor]:
            params[k] = str(params[k])
    return params

My question is that if we already have a Numpy checking condition, is there a need to add another conditional statement to check whether numpy is available or not ?
Please help. :)

@awaelchli
Copy link
Member

My question is that if we already have a Numpy checking condition, is there a need to add another conditional statement to check whether numpy is available or not ?

@01AbhiSingh Yes there is still some work needed there. But it's much easier than the seed work here.
All you need to do is remove the import at the top of the file and move it locally to that function:

if _NUMPY_AVAILABLE:
	import numpy as np 
	
	if isinstance(params[k], (np.bool_, np.integer, np.floating)):
		params[k] = params[k].item()

For this one, please send a separate PR, and we can discuss any additional challenges there :)
Thanks

@01AbhiSingh
Copy link
Contributor Author

if _NUMPY_AVAILABLE:
	import numpy as np 
	
	if isinstance(params[k], (np.bool_, np.integer, np.floating)):
		params[k] = params[k].item()

I will push this in a new PR.

What about this current PR ? Do I need to change anything in this current one or is it's all good ?

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Jul 12, 2024
@awaelchli awaelchli changed the title numpy now an optional dependency in utilities\seed.py Make numpy an optional dependency in utilities\seed.py Jul 12, 2024
Copy link
Member

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @01AbhiSingh!
I also finally found the time to put together the alternative implementation for the seed sequence generator that was needed to replace the numpy implementation. I committed that to this branch directly and added a simple test.

@awaelchli awaelchli merged commit d5ae9ec into Lightning-AI:master Jul 12, 2024
97 of 101 checks passed
@01AbhiSingh
Copy link
Contributor Author

Great work @01AbhiSingh! I also finally found the time to put together the alternative implementation for the seed sequence generator that was needed to replace the numpy implementation. I committed that to this branch directly and added a simple test.

It took me a little time to understand the difference between the code changes you made and my code change in the seed generation method lol . Definitely some high quality code I can learn a lot from.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community This PR is from the community fabric lightning.fabric.Fabric pl Generic label for PyTorch Lightning package refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants