Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

removed scipy from build-only dependencies #3538

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

filip-komarzyniec
Copy link

Resolves: #3536

In addition to removing scipy from build-system.requires table I've also updated the README file.

Please refer to the comments for some additional doubts I have regarding more changes that might need to be introduced.

README.md Outdated
Comment on lines 52 to 53
This software depends on [NumPy], a Python package for
scientific computing. You must have it installed prior to installing
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really have to be installed prior to installing gensim having in mind that a proper numpy version is already specified in build-system.requires table? Meaning it will be downloaded automatically during gensim build phase.

If it does not --- what should we do with regard to the next paragraph about BLAS lib? I think it will not be customisable for any architecture that a numpy .whl build exists for.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the sentence starting "You must" is incorrect. Numpy will get installed automatically if it already isn't there.

With BLAS, I think we just leave that note as-is. You can't customize it as part of the gensim install. If you want to mess with BLAS, then you have to do it yourself, before installing numpy and gensim.

Right @piskvorky ?

Copy link
Owner

@piskvorky piskvorky Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot how exactly it worked back then. IIRC numpy was needed for the installation of gensim: as in numpy actively used during install, down to its C API. Maybe scipy too. I believe that is the intent behind "You must have [numpy] installed prior to installing gensim".

If pip installs numpy automatically before the gensim install itself starts, that's great. Some people don't use pip, I don't know all the deployment methods people use (from source, conda, etc) and whether they all install dependencies first.

In any case that was 15 years ago, the current ecosystem might work differently, maybe none of this is needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, the problem of linking Numpy to any BLAS lib is a matter to worry about only during building Numpy from source.
This can occur only if:

  • one wants to customise his Numpy build in a non typical way. In this case I believe such person knows what he's doing so this note is not really valuable for him,

  • one installs gensim which in turn downloads appropriate NumPy build (but src distribution instead of a .whl). Then the pip (or I believe any package manager) would try building the NumPy from source and implicitly linking an existing BLAS lib.

What's more, a NumPy installed as a gensim's dependency will not be (in some cases) the same NumPy as used for building gensim. Pip by default builds a library in a separate temporary virtual env where it installs build-only dependencies.

I've redacted the paragraph. Please review whether it sounds sensible .

@filip-komarzyniec filip-komarzyniec force-pushed the 3536_remove_scipy_from_build_only_deps branch from b170395 to 1dc75de Compare June 15, 2024 13:19
pyproject.toml Outdated
# oldest supported Numpy for non-'arm64|aarch64' architectures is 1.17.*
# as well as for 'Windows' on arm64. The oldest supported Numpy by Gensim
# is 1.18.5. Remove below line when they increase the oldest supported Numpy for those platforms
"numpy==1.18.5; python_version=='3.8' and platform_machine != 'aarch64' and (platform_machine != 'arm64' or platform_system == 'Windows')",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation behind changing this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the meta package oldest-supported-numpy I see that numpy<1.18.5 will be downloaded for python 3.8 and Windows on arm64 (apart from all non-arm64 and non-aarch64 platforms).

So I believe this check was incomplete resulting in installing unsupported version of Numpy for python3.8 arm64 Windows users.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piskvorky Do we want to deal with this now? It hasn't been reported yet, so it may not actually be a problem. It's been a while since I've messed with oldest-supported-numpy, so I'm not 100% sure this fix is needed and safe.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reasoning.
Should I revert this change then?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's revert it. Feel free to make a separate PR with that change so we can review it later, after the current release is out.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do as advised, thanks!

Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, thank you!

@filip-komarzyniec
Copy link
Author

I see the builds failing on tests step (due to different scipy versions being installed for a given python version).

My PR should not affect anything more than library building step (which passes successfully each time).
Is there anything that can be done in order to fix the tests (for example should I merge my changes with any specific other branch fixing scipy runtime dependency?)

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 28, 2024

Very strange. I see your previous commits built fine. I will investigate.

@juhoinkinen
Copy link

juhoinkinen commented Jul 16, 2024

@mpenkov and @filip-komarzyniec, I tried restricting scipy versions to <1.14 in setup.py, and this makes the CICD tests pass, see this run.

Not sure why scipy 1.13.1 got installed for Python 3.8 and 3.9 in the first place (this logs line), and scipy 1.14.0 for Python 3.10+ (this logs line). I checked that:

Edit: Well, ofcourse the difference for Python versions comes because scipy 1.14.0 supports only Python 3.10+, as stated in the release notes.

Anyway, I wonder could this scipy version constraint <1.14 be applied or is there something more to it?

(I took a look because our CICD pipeline is not working due to Gensim installation issues.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scipy probably not needed in [build-system.requires] table
4 participants