Feature request: Finer-grained sleep function #54971

MilesCranmer · 2024-06-28T12:33:47Z

Base.sleep has a minimum time of 1 millisecond. This is very coarse compared to other high-performance languages like C and Rust and can cause some issues for some applications (see thread) such as periodically polling workers without blocking the main thread.

In other languages, for example, Python can actually get fairly high resolution with the default time.sleep (although its blocking):

[ins] In [3]: %%timeit
         ...: time.sleep(1e-6)
         ...:
         ...:
4.52 µs ± 570 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

which calls the corresponding system sleep.

Rust has:

async_std::task::sleep(Duration::from_nanos(1)).await;

which calls the system sleep in a thread.

One workaround is to use Libc.systemsleep. However, this blocks the calling thread. Similarly, one can use time() in a while-loop, but this also blocks.

One option which I would like comments on is the following function which is basically a non-blocking call of usleep by the use of @threadcall:

function systemsleep(seconds::Real)
    microseconds = round(UInt32, 1e6 * seconds)
    @threadcall(:usleep, UInt32, (UInt32,), microseconds)
    return nothing
end

This is OS-dependent and on some systems will not be better than sleep. But it's at least better for unix systems. With this function, one can get a higher resolution than Base.sleep:

julia> @btime systemsleep(1e-4)
  126.625 μs (11 allocations: 496 bytes)

julia> @btime sleep(1e-4)
  1.049 ms (4 allocations: 112 bytes)

with the smallest time being about 20us (presumably due to overhead of threadcall)

julia> @btime systemsleep(0)
  17.333 μs (11 allocations: 496 bytes)

There is also nanosleep which is another option to consider for this task: https://linux.die.net/man/2/nanosleep.

The text was updated successfully, but these errors were encountered:

ufechner7 · 2024-06-28T12:41:19Z

Looks similar to #12770

MilesCranmer · 2024-06-28T12:52:54Z

Wow, 2015! Okay maybe this just needs a PR at this point rather than an issue..

stevengj · 2024-06-28T12:56:56Z

@threadcall(:nanosleep, Int, (Int,), nanoseconds)

This is not the API of nanosleep.

MilesCranmer · 2024-06-28T13:09:21Z

You are fast! I realised my mistake and deleted the comment about 30 seconds after posting 😅 Cunningham's law shows no mercy

ufechner7 · 2024-06-28T13:21:47Z

@MilesCranmer Feel free to create a pull request for https://github.com/ufechner7/Timers.jl if you have a good idea for a new sleep function and what to test it out before creating a pull request to Julia directly.

MilesCranmer · 2024-06-28T13:37:12Z

Thanks, I will probably just go with this code for now:

function systemsleep(seconds::Real)
    microseconds = round(Int, 1e6 * seconds)
    @threadcall(:usleep, Int, (Int,), microseconds)
    return nothing
end

It's small enough that I will just stick it directly in my library, but thanks for the offer.

nanosleep would likely need a change to the C side of Julia because it takes the timespec struct as input. And due to the overhead of @threadcall, we wouldn't be able to get to nanosecond precision anyways, so there's not much point in using nanosleep for a non-blocking call over usleep.

I think a "proper" solution to all of this should handle it directly in C with nanosleep and use jl_yield. I'm happy to help add it if Julia devs are interested.

stevengj · 2024-06-28T14:04:09Z

nanosleep would likely need a change to the C side of Julia because it takes the timespec struct as input.

usleep has a similar issue because useconds_t is a system-dependent width. Moreover, nanosleep can (perversely) sleep for much longer intervals that usleep because of the timespec.

It's not particularly difficult for us to add a new jl_nanosleep function wrapping this in a system-independent API. (What would we do on Windows? There doesn't seem to be a closely analogous Windows function?)

MilesCranmer · 2024-06-28T14:21:52Z

I think for Windows it could just fall back to regular sleep to start with. Later it could use https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/high-resolution-timers like what Python does: https://docs.python.org/3/library/time.html#time.sleep (only Windows 8.1+)

MilesCranmer · 2024-06-28T14:25:39Z

Relevant code in CPython: https://github.com/python/cpython/blob/81a654a3425eaa05a51342509089533c1f623f1b/Modules/timemodule.c#L2164-L2344

oscardssmith · 2024-06-28T14:27:36Z

How do these short system sleeps compare to

function yieldsleep(t)
    t1 = time()
    while time()-t1<t
        yield()
    end
end

It looks like it works pretty well down to ~10 microsecond (and below that, you probably should just busy-wait).

julia> @benchmark yieldsleep(1e-5)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   9.057 μs …  29.295 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.069 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):    9.871 μs ± 621.258 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▃▂                     ▅█▆▁                              
  ▂▃▃▄▇███▄▃▃▂▁▁▁▁▁▁▁▁▁▁▂▂▃▄▅█████▅▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▁▁▁▁▁ ▂
  9.06 μs         Histogram: frequency by time         11.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark yieldsleep(1e-6)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.507 μs …   6.189 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.711 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.711 μs ± 185.194 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

              ▁█▇▃       ▂▇▇▁        ▁                         
  ▃▅▅▅▃▂▂▂▂▂▂▃████▅▂▂▂▂▃▅████▅▂▂▂▂▂▃███▇▂▂▂▂▂▂▂▂▃▃▃▂▂▁▁▁▁▂▂▂▂ ▃
  1.51 μs         Histogram: frequency by time        2.01 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

MilesCranmer · 2024-06-28T14:30:28Z

I guess the downside of this yieldsleep is that it would occupy CPU time (from other processes) if nothing else is running in the thread?

oscardssmith · 2024-06-28T14:41:49Z

That's fair, the better version of this function probably includes a check for whether t>1e-3 in which case it just does a regular `sleep. For 1ms or less, worrying about the extra power consumption of keeping a single core alive is probably a mistake.

MilesCranmer · 2024-06-28T15:08:48Z

But this would really depend on the context as usually you wouldn't have a one-off sleep call but loop over them (like, say, polling workers or checking a signal). If I'm sleeping for N% of the time, regardless of the duration of each individual sleep, it would still mean N% redundant CPU usage spinning the core (in the case of a "busy" sleep) – whereas the system sleep would be closer to 0%

oscardssmith · 2024-06-28T17:43:04Z

I don't think you can get accurate times and low CPU usage with a loop of short sleeps. (since if you yield back to the OS, you can't know that you will be scheduled in time). If you are using this to try to respond to incoming events, you really want a push based system (or something like wait that is integrated into the Julia scheduler)

MilesCranmer · 2024-06-28T18:10:30Z

Just to be clear I didn't mean using a loop of short sleeps to emulate a single long sleep... I meant doing small tasks with short sleeps between them.

MilesCranmer · 2024-06-28T19:02:09Z

Another example: I want to check if a file exists, and check regularly (<1ms) so I can react to updates. This is where a short sleep separating reads is useful. You could also find this kind of pattern in control applications, graphics, or networking (with various requirements for resolution). I'm not sure wait can save us here, because the task being waited on would be another Julia task that would need to continuously spin a CPU core (or sleep itself). Short-duration sleep is a fairly common utility in other languages so seems like a good thing to have in general, if anything just for allowing common programming patterns.

oscardssmith · 2024-06-28T19:37:49Z

For the file existence example, there are OS APIs to receive notifications that a file has been created/destroyed etc and C# for example, has a platform independent API that wraps them https://learn.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?view=net-8.0 OnCreated(FileSystemEventArgs). Specifically, for this use-case, you would call OnCreated(FileSystemEventArgs) which will call your registered callback whenever a file is created (that matches the filtering conditions).

I agree with your general point that we should make sleep more reliable for short sleeps (possibly via wait), but yielding back to the OS for single digit microseconds is a bad idea. The only advantage of sleep compared to yield is that sleep will let your computer do other work (or go into a low power mode), but if you're only sleeping for a few microseconds, there isn't enough time for much to happen anyway (since chips take a while to go into low power modes and going to the OS scheduler and having to clean out all your registers and cache takes a while).

vtjnash · 2024-06-28T19:58:51Z

I have looked into adding a higher precision mode to libuv, but on Windows (contrary to the python documentation, but per the kernel documentation linked to by the python documentation as its citation), the minimum possible sleep request is 1ms, and often it will round-up significantly more (usually up to the nearest 15ms interval). As Oscar mentioned, there are usually better ways to deal with short sleeps. For example, polling a file at 1ms will still often miss most updates to the file (as most updates to it won't change the metadata of the file), in addition to wasting a lot of computer time, that a call to the kernel filewatching stdlib would have gotten correct more often.

oscardssmith · 2024-06-28T20:10:53Z

Do we need/want libuv support for short sleeps? I think it would be reasonable to make sleeps less than 1ms be a yield loop even if libuv supported short sleeps.

MilesCranmer · 2024-06-28T21:32:45Z

I think if the operating system makes it possible to sleep for a certain interval under a millisecond, Julia should make that function available to the user without added overhead, blocking, or CPU usage.

MilesCranmer · 2024-06-29T11:44:31Z

contrary to the python documentation, but per the kernel documentation linked to by the python documentation as its citation), the minimum possible sleep request is 1ms, and often it will round-up significantly more (usually up to the nearest 15ms interval).

I think you might be reading the wrong page in the kernel docs? You can indeed get lower than 1ms resolution on Windows, I just verified it:

oscardssmith · 2024-06-29T18:49:53Z

The question with the python implementation is what happens if you request a sleep for (say) 5e-7 seconds. It seems very plausible that python turns sleep 0 into a noop,

MilesCranmer · 2024-06-29T19:31:00Z

I'm not sure 5e-7 is relevant, when even 1e-4 is out of the picture at the moment for Julia?

Anyways, in Python, 5e-7 is about 4x slower than expected (on macOS; currently away from my Windows machine) –

[nav] In [2]: import time

[ins] In [3]: %%timeit
         ...: time.sleep(5e-7)
         ...: 
         ...: 
1.91 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

I think even 1e-4 or 1e-5 sleeps would be nice in Julia, though, if 1e-6 isn't yet reachable, as the current status is (also on macOS) –

julia> @btime sleep(5e-7)
  1.145 ms (4 allocations: 112 bytes)

MilesCranmer · 2024-06-29T19:49:19Z

For reference, these are the main functions used by the python sleep code on Windows:

The docs for the second function mention:

The time after which the state of the timer is to be set to signaled, in 100 nanosecond intervals. Use the format described by the FILETIME structure. Positive values indicate absolute time. Be sure to use a UTC-based absolute time, as the system uses UTC-based time internally. Negative values indicate relative time. The actual timer accuracy depends on the capability of your hardware. For more information about UTC-based time, see System Time.

So it sounds like you can "request" sleep times in 100 ns intervals, but the actual accuracy is hardware-dependent.

The code shows that there's also an early-quit branch if sleep of 0 is passed – which calls Sleep(0) on Windows. So I guess on my machine, noting the 1e-6 example I gave above, ~20us is probably the lowest it can go on the hardware. (Also including the overhead of the other Python stuff in that call.)

In any case it would be nice to have access to this :)

vtjnash · 2024-06-29T21:09:18Z

Interesting. It seems that as of 2022, Python developers could prove in testing that the code didn't work (instead giving 16ms resolution) per https://bugs.python.org/issue45429

vtjnash · 2024-06-29T21:14:03Z

Also, just to point out the obvious: 1ms is already 1000 frames per second and therefore already significantly more than the OS checks for new keystrokes

MilesCranmer · 2024-06-29T21:16:26Z

I'm reading that thread now and they say CREATE_WAITABLE_TIMER_HIGH_RESOLUTION was the trick they used to fix it, which is used here: https://github.com/python/cpython/blob/81a654a3425eaa05a51342509089533c1f623f1b/Modules/timemodule.c#L447-L449

vtjnash · 2024-06-29T21:23:27Z

Ah, I see now where in the thread they mention this was implemented in the NT kernel in late 2021. I had not know that, as the documentation seems to not be fully updated for it

oscardssmith · 2024-06-29T22:08:17Z

Also, just to point out the obvious: 1ms is already 1000 frames per second and therefore already significantly more than the OS checks for new keystrokes

I don't think this is quite right. As I understand it, 1000hz polling rates were fairly common for gaming mice/keyboards.

vtjnash · 2024-06-30T05:48:38Z

Not with a stock kernel, though you can compile your own to increase it up to 1khz eg https://superuser.com/questions/369826/increase-usb-polling-rate-across-all-devices-in-linux

MilesCranmer mentioned this issue Jun 29, 2024

Avoid Base.sleep MilesCranmer/SymbolicRegression.jl#305

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: Finer-grained sleep function #54971

Feature request: Finer-grained sleep function #54971

MilesCranmer commented Jun 28, 2024 •

edited

Loading

ufechner7 commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

stevengj commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

ufechner7 commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

stevengj commented Jun 28, 2024 •

edited

Loading

MilesCranmer commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

oscardssmith commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024 •

edited

Loading

oscardssmith commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024 •

edited

Loading

oscardssmith commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024 •

edited

Loading

oscardssmith commented Jun 28, 2024

vtjnash commented Jun 28, 2024

oscardssmith commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024 •

edited

Loading

MilesCranmer commented Jun 29, 2024

oscardssmith commented Jun 29, 2024

MilesCranmer commented Jun 29, 2024

MilesCranmer commented Jun 29, 2024 •

edited

Loading

vtjnash commented Jun 29, 2024

vtjnash commented Jun 29, 2024

MilesCranmer commented Jun 29, 2024

vtjnash commented Jun 29, 2024

oscardssmith commented Jun 29, 2024

vtjnash commented Jun 30, 2024

Feature request: Finer-grained sleep function #54971

Feature request: Finer-grained sleep function #54971

Comments

MilesCranmer commented Jun 28, 2024 • edited Loading

ufechner7 commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

stevengj commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

ufechner7 commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

stevengj commented Jun 28, 2024 • edited Loading

MilesCranmer commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

oscardssmith commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024 • edited Loading

oscardssmith commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024 • edited Loading

oscardssmith commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024 • edited Loading

oscardssmith commented Jun 28, 2024

vtjnash commented Jun 28, 2024

oscardssmith commented Jun 28, 2024

MilesCranmer commented Jun 28, 2024 • edited Loading

MilesCranmer commented Jun 29, 2024

oscardssmith commented Jun 29, 2024

MilesCranmer commented Jun 29, 2024

MilesCranmer commented Jun 29, 2024 • edited Loading

vtjnash commented Jun 29, 2024

vtjnash commented Jun 29, 2024

MilesCranmer commented Jun 29, 2024

vtjnash commented Jun 29, 2024

oscardssmith commented Jun 29, 2024

vtjnash commented Jun 30, 2024

MilesCranmer commented Jun 28, 2024 •

edited

Loading

stevengj commented Jun 28, 2024 •

edited

Loading

MilesCranmer commented Jun 28, 2024 •

edited

Loading

MilesCranmer commented Jun 28, 2024 •

edited

Loading

MilesCranmer commented Jun 28, 2024 •

edited

Loading

MilesCranmer commented Jun 28, 2024 •

edited

Loading

MilesCranmer commented Jun 29, 2024 •

edited

Loading