The FileWatching.mkpidlock function appears to introduce a 10-second delay when sufficiently many Julia processes try to acquire the same pid file at once.
Minimal test code:
for i in {1..10}; do julia -e "using FileWatching; @time FileWatching.mkpidlock(\"testwaitfile.pid\") do ; end" & done
Example output:
0.036439 seconds (21.68 k allocations: 1.112 MiB, 64.32% compilation time)
0.037288 seconds (21.68 k allocations: 1.112 MiB, 62.75% compilation time)
0.037221 seconds (21.68 k allocations: 1.112 MiB, 63.32% compilation time)
0.049062 seconds (21.68 k allocations: 1.112 MiB, 67.08% compilation time)
0.036490 seconds (21.68 k allocations: 1.112 MiB, 63.47% compilation time)
0.035922 seconds (21.68 k allocations: 1.112 MiB, 65.30% compilation time)
0.035703 seconds (21.68 k allocations: 1.112 MiB, 64.55% compilation time)
0.040621 seconds (21.79 k allocations: 1.117 MiB, 65.54% compilation time)
0.037188 seconds (21.68 k allocations: 1.112 MiB, 63.14% compilation time)
10.076159 seconds (27.64 k allocations: 1.417 MiB, 0.46% compilation time)
The number of instances needed to trigger the delay, and the number of instances that end up delayed, depend on the filesystem and its speed. On faster filesystems, try more instances, e.g. 50 or 100. This may hint at a race condition.
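For trying larger instance counts, the shell loop can also be written as a small Julia driver. This is only a sketch: spawn_contenders and its lockfile keyword are hypothetical names introduced here, and the child script is the same mkpidlock call as in the minimal test code.

```julia
# Hypothetical helper: start `n` Julia processes that all contend for one pid file.
function spawn_contenders(n::Integer; lockfile::AbstractString = "testwaitfile.pid")
    # Child script: the same call as in the minimal test code above.
    script = """
        using FileWatching
        @time FileWatching.mkpidlock($(repr(lockfile))) do
        end
        """
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $script`
    # Forward the children's output so that their @time lines are visible;
    # with wait=false alone, the output would be sent to devnull.
    procs = [run(pipeline(cmd; stdout = stdout, stderr = stderr); wait = false) for _ in 1:n]
    foreach(wait, procs)
    return procs
end
```

Calling spawn_contenders(50) or spawn_contenders(100) then corresponds to the larger runs suggested above. Whether the 10-second outlier shows up still depends on the filesystem.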
Expected behavior
Once the pid file has been released, waiting instances should acquire it promptly instead of waiting for 10 seconds.
Background
This problem first came up when starting multiple Julia instances to run MPI on an HPC cluster as a SLURM job.
The script contained code to activate an environment:
using Pkg;
Pkg.activate(".");
This resulted in an accumulated delay as the instances started, and increased the total runtime.
We work around the problem by passing the --project argument instead of calling Pkg.activate in the script; with --project the delay does not occur. We also traced the delay to the use of mkpidlock on the manifest_usage.toml.pid file.
Julia versions
The behavior was reproduced with Julia 1.10.4 and Julia 1.11.0-rc1 using the official builds from the website.
This happens if the file watcher (FileWatching.watch_file) in the OS fails for some reason, which varies with the OS, file system, etc. mkpidlock then falls back to polling. The default poll interval is 10 seconds. It can be changed with e.g. mkpidlock(..., poll_interval=1.0).
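As a minimal sketch of that suggestion, the call from the test code can pass a shorter poll_interval. The temporary directory and the variable names are assumptions made only for this illustration, and a shorter interval reduces the fallback wait without removing the polling itself.

```julia
using FileWatching

# Illustrative lock path in a fresh temporary directory (an assumption for this sketch).
lockfile = joinpath(mktempdir(), "testwaitfile.pid")

ran = Ref(false)
elapsed = @elapsed FileWatching.mkpidlock(lockfile; poll_interval = 1.0) do
    # Critical section: runs while this process holds the pid lock.
    ran[] = true
end
println("lock acquired and released in ", elapsed, " seconds")
```

If the file watcher fails, a waiting process should then re-check the lock about once per second instead of once every 10 seconds.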