[Core] Optionally allow user code to directly produce data into a plasma store managed buffer #46438

Open
Superskyyy opened this issue Jul 5, 2024 · 0 comments
Labels
core Issues that should be addressed in Ray Core enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks

Superskyyy commented Jul 5, 2024

Description

Since Plasma was deprecated from Arrow and Ray uses a heavily modified fork, I'm not sure whether there is an existing way to produce data directly into a fixed-size buffer that lives inside the object store. Although this is an advanced pattern, it would let heavy data-loading operations skip one unnecessary memcpy from the heap and therefore potentially save a lot of CPU time.
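To make the saved copy concrete, here is a minimal sketch that uses Python's standard `multiprocessing.shared_memory` purely as a stand-in for a plasma-managed buffer. It is not Ray's API, and the helper names `produce_via_heap` and `produce_direct` are hypothetical. It contrasts staging data on the heap first with letting the producer fill the pre-allocated buffer in place.

```python
import io
from multiprocessing import shared_memory


def produce_via_heap(source, shm_buf, size):
    """Baseline: stage the data on the heap, then copy it into the shared buffer."""
    data = source.read(size)        # first copy: a new heap-allocated bytes object
    shm_buf[:len(data)] = data      # second copy: heap -> shared buffer
    return len(data)


def produce_direct(source, shm_buf, size):
    """Requested pattern: the producer writes straight into the shared buffer."""
    with shm_buf[:size] as view:    # writable view, released when the block exits
        return source.readinto(view)  # single copy, no heap staging


payload = bytes(range(256)) * 4
size = len(payload)

# Fixed-size shared-memory block standing in for an object-store buffer (assumption).
shm = shared_memory.SharedMemory(create=True, size=size)
try:
    n_direct = produce_direct(io.BytesIO(payload), shm.buf, size)
    direct_ok = n_direct == size and bytes(shm.buf[:size]) == payload

    shm.buf[:size] = bytes(size)    # zero the buffer before the second run
    n_heap = produce_via_heap(io.BytesIO(payload), shm.buf, size)
    heap_ok = n_heap == size and bytes(shm.buf[:size]) == payload
finally:
    shm.close()
    shm.unlink()
```

Both paths leave the same bytes in the buffer; the difference is only that the direct path never materializes the intermediate heap object, which is the copy this issue asks to avoid.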

Use case

One important use case is rapidly saving large model checkpoints directly into the Plasma store without first staging them on the heap. A guardian checkpointer actor (i.e., one that takes care of the checkpoint lifecycle and supports fault tolerance) can then take its time transferring the checkpoint elsewhere, into distributed storage (other plasma stores, disk, or a network drive), which forms a reliable layered checkpointing system. Whenever training fails, the training process can recover from (1) the local plasma store, (2) local disk, or (3) a remote plasma store or network drive, depending on the failure level.

Another use case is an external data source that supports some kind of direct memory access (DMA) operation; it should be able to write directly into plasma instead of paying for an additional memcpy.

These are just my initial thoughts; any comments and feedback are welcome. @Bye-legumes @liuxsh9 @nemo9cby

@Superskyyy Superskyyy added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 5, 2024
@anyscalesam anyscalesam added the core Issues that should be addressed in Ray Core label Jul 8, 2024
@jjyao jjyao added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 8, 2024