[Data] Deprecate Dataset.get_internal_block_refs() #46455

Merged
29 commits merged into ray-project:master on Jul 11, 2024

Contributor

@scottjlee scottjlee commented Jul 6, 2024

Why are these changes needed?

Stacked on: #46369 (merged)

Replaces Dataset.get_internal_block_refs() usages with Dataset.iter_internal_ref_bundles(), and marks the method as deprecated.
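
As a rough illustration of the deprecation pattern described above, the sketch below keeps an old list-returning method working while warning callers and delegating to the new iterator-based method. `Bundle` and `ToyDataset` are hypothetical stand-ins, plain strings stand in for object references, and the use of `warnings.warn` is an assumption made for the example; it is not necessarily how Ray marks the method as deprecated.

```python
import warnings
from typing import Iterator, List


class Bundle:
    """Hypothetical stand-in for a RefBundle: holds a list of block refs."""

    def __init__(self, block_refs: List[str]):
        self.block_refs = block_refs


class ToyDataset:
    """Hypothetical stand-in for Dataset; not Ray's implementation."""

    def __init__(self, bundles: List[Bundle]):
        self._bundles = bundles

    def iter_internal_ref_bundles(self) -> Iterator[Bundle]:
        # New-style API: hand out bundles one at a time.
        return iter(self._bundles)

    def get_internal_block_refs(self) -> List[str]:
        # Old-style API: still works, but warns and delegates to the
        # iterator-based method.
        warnings.warn(
            "get_internal_block_refs() is deprecated; "
            "use iter_internal_ref_bundles() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return [
            ref
            for bundle in self.iter_internal_ref_bundles()
            for ref in bundle.block_refs
        ]
```

Callers that only need a block count or a flat list of refs can move to the iterator directly and drop the deprecated call.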

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@scottjlee scottjlee added the go add ONLY when ready to merge, run all tests label Jul 6, 2024
@scottjlee scottjlee self-assigned this Jul 8, 2024
@scottjlee scottjlee marked this pull request as ready for review July 10, 2024 21:50
@@ -59,7 +59,7 @@ def __setattr__(self, key, value):
        object.__setattr__(self, key, value)

    @property
    def block_refs(self) -> List[BlockMetadata]:
Contributor Author

fix incorrect typehint

Member

Nice

Contributor

@omatthew98 omatthew98 left a comment

Don't have a lot of context here but the code changes seem good. One small nit on the comment.

def _ref_bundles_iterator_to_block_refs_list(
    ref_bundles: Iterator[RefBundle],
) -> List[ObjectRef[Block]]:
    """Convert an iterator of RefBundles to a list of object references to Blocks."""
Contributor

Nit: Change to "Convert an iterator of RefBundles to a list of Block object references." to avoid the double "to"; for a sec I thought this was a double transformation.
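
For readers without the full diff, here is a minimal sketch of what a helper with this signature might do. The body of the real function is not shown in this thread, so the flattening logic is an assumption. `FakeRefBundle` is a hypothetical stand-in for `RefBundle`, with plain strings and dicts standing in for `ObjectRef[Block]` and `BlockMetadata`.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class FakeRefBundle:
    """Hypothetical stand-in for RefBundle.

    `blocks` pairs each block reference with its metadata; plain strings
    and dicts stand in for ObjectRef[Block] and BlockMetadata.
    """

    blocks: Tuple[Tuple[str, dict], ...]

    @property
    def block_refs(self) -> List[str]:
        # Keep only the reference from each (reference, metadata) pair.
        return [ref for ref, _ in self.blocks]


def ref_bundles_iterator_to_block_refs_list(
    ref_bundles: Iterator[FakeRefBundle],
) -> List[str]:
    """Convert an iterator of bundles to a list of block references."""
    # One pass over the iterator, preserving bundle order and block order.
    return [ref for bundle in ref_bundles for ref in bundle.block_refs]
```

The result is a single flat list, which is why the docstring wording matters: there is one conversion here, not two.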

@@ -69,7 +69,8 @@ def test_zip_different_num_blocks_split_smallest(
         override_num_blocks=num_blocks2,
     )
     ds = ds1.zip(ds2).materialize()
-    num_blocks = len(ds.get_internal_block_refs())
+    bundles = ds.iter_internal_ref_bundles()
+    num_blocks = sum([len(b.block_refs) for b in bundles])
Member

Suggested change
-    num_blocks = sum([len(b.block_refs) for b in bundles])
+    num_blocks = sum(len(b.block_refs) for b in bundles)

     assert nblocks == 1, nblocks
     ctx.target_max_block_size = 2_000_000
-    nblocks = len(ds2.map_batches(lambda x: x, batch_size=16).get_internal_block_refs())
+    bundles = ds2.map_batches(lambda x: x, batch_size=16).iter_internal_ref_bundles()
+    nblocks = sum([len(b.block_refs) for b in bundles])
Member

Nit: Here and elsewhere

Suggested change
-    nblocks = sum([len(b.block_refs) for b in bundles])
+    nblocks = sum(len(b.block_refs) for b in bundles)
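
The suggestion swaps a list comprehension for a generator expression inside `sum()`. A small sketch of the difference, using a hypothetical `fake_bundles()` generator in place of `iter_internal_ref_bundles()` (each yielded list stands in for one bundle's `block_refs`):

```python
from typing import Iterator, List


def fake_bundles() -> Iterator[List[str]]:
    """Hypothetical stand-in: yields each bundle's list of block refs."""
    yield ["r0", "r1"]
    yield ["r2"]
    yield ["r3", "r4", "r5"]


# List comprehension: builds a temporary list of lengths, then sums it.
count_from_list = sum([len(refs) for refs in fake_bundles()])

# Generator expression: sums the lengths as they are produced.
count_from_gen = sum(len(refs) for refs in fake_bundles())

# An iterator is exhausted after one pass, so a second count over the
# same iterator object sees nothing.
bundles = fake_bundles()
first_pass = sum(len(refs) for refs in bundles)
second_pass = sum(len(refs) for refs in bundles)
```

Both forms give the same count; the generator form just skips the intermediate list. The last three lines show a related caveat when moving from a list-returning method to an iterator: the iterator can only be consumed once.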

@bveeramani bveeramani merged commit 19dc58e into ray-project:master Jul 11, 2024
5 checks passed
Catch-Bull pushed a commit to Catch-Bull/ray that referenced this pull request Jul 15, 2024
Replaces Dataset.get_internal_block_refs() usages with Dataset.iter_internal_ref_bundles(), and marks the method as deprecated.

Signed-off-by: sjl <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
Signed-off-by: hejialing.hjl <[email protected]>
bveeramani pushed a commit that referenced this pull request Jul 16, 2024
…dles` instead of `(Block, BlockMetadata)` (#46575)

Followup to #46369 and
#46455.
Update `ExecutionPlan.execute_to_iterator()` to return `RefBundles`
instead of `(Block, BlockMetadata)`, to unify the logic between
`RefBundle`s and `Block`s. Also refactor the `iter_batches()` code path
accordingly to handle `RefBundle`s instead of raw `Block` and
`BlockMetadata`.

Signed-off-by: sjl <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
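
To make the follow-up change more concrete, here is a sketch of how code written against per-block pairs could consume bundles instead. It assumes a bundle carries a tuple of (block reference, metadata) pairs. `FakeMetadata`, `FakeRefBundle`, `iter_block_pairs`, and `total_rows` are hypothetical names for illustration only and are not part of `ExecutionPlan` or the `iter_batches()` code path.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for BlockMetadata."""

    num_rows: int


@dataclass
class FakeRefBundle:
    """Hypothetical stand-in for RefBundle: (block ref, metadata) pairs."""

    blocks: Tuple[Tuple[str, FakeMetadata], ...]


def iter_block_pairs(
    bundles: Iterable[FakeRefBundle],
) -> Iterator[Tuple[str, FakeMetadata]]:
    """Flatten bundles back into per-block (ref, metadata) pairs."""
    for bundle in bundles:
        yield from bundle.blocks


def total_rows(bundles: Iterable[FakeRefBundle]) -> int:
    """Sum row counts across all blocks in all bundles."""
    return sum(meta.num_rows for _, meta in iter_block_pairs(bundles))
```

A small adapter like `iter_block_pairs` lets existing per-block logic keep working while the producer side switches to yielding bundles.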
Labels
go add ONLY when ready to merge, run all tests
3 participants