Return cudf::detail::host_vector from make_host_vector and add a make_device_uvector overload #16206

Open · wants to merge 110 commits into base: branch-24.08
Conversation

@vuule vuule commented Jul 6, 2024

Description

Issue #15616

Modified make_host_vector functions to return cudf::detail::host_vector, which can use a pinned or a pageable memory resource. When pinned memory is used, the D2H copy is potentially done using a CUDA kernel.

Also added factories to create host_vectors without device data. These are useful to replace uses of std::vector and thrust::host_vector when the data eventually gets copied to the GPU.

Also added make_device_uvector overloads that take a cudf::detail::host_vector. These allow the H2D copy to be done using a CUDA kernel.

Modified cudf::detail::host_vector to be derived from thrust::host_vector, to avoid issues with implicit conversion from std::vector.

Used cudf::detail::host_vector and its new factory functions wherever data ends up copied to the GPU.

TODO:

  • Add unit tests for allocate_host_as_pinned_threshold.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

vuule and others added 30 commits May 30, 2024 16:24
Co-authored-by: David Wendt <[email protected]>
@github-actions github-actions bot added the CMake CMake build issue label Jul 8, 2024
Comment on lines +849 to +852
auto d_comp_in = cudf::detail::make_device_uvector_async(
comp_in, stream, rmm::mr::get_current_device_resource());
auto d_comp_out = cudf::detail::make_device_uvector_async(
comp_out, stream, rmm::mr::get_current_device_resource());
vuule (Contributor Author):

refactored the loop to avoid partial comp_in/comp_out copies to the device.

vuule (Contributor Author):

const members were preventing copy assignment

@@ -308,8 +308,6 @@ TYPED_TEST(StringsIntegerConvertTest, FromToInteger)
// convert to strings
auto results_strings = cudf::strings::from_integers(integers->view());

// copy back to host
h_integers = cudf::detail::make_host_vector_sync(d_integers, cudf::get_default_stream());
vuule (Contributor Author):

this was copying back the same data AFAICT

cudf::detail::make_host_vector_async(tokens_gpu, stream);
thrust::host_vector<cuio_json::SymbolOffsetT> token_indices =
cudf::detail::make_host_vector_async(token_indices_gpu1, stream);
auto tokens = cudf::detail::make_host_vector_async(tokens_gpu, stream);
vuule (Contributor Author) · Jul 11, 2024:

This is a fun one. Here's what I think happened: this broke when I changed the return type, because the implicit conversion (cudf::detail::host_vector -> thrust::host_vector) copies the buffer; however, the data in the original object was not yet ready because of the async D2H copy.

@@ -186,6 +186,63 @@ CUDF_EXPORT rmm::host_device_async_resource_ref& host_mr()
return mr_ref;
}

class new_delete_memory_resource {
vuule (Contributor Author):

hopefully temporary implementation; this should probably be in rmm.

@@ -123,7 +123,7 @@ struct format_compiler {
: format(fmt), d_items(0, stream)
{
specifiers.insert(extra_specifiers.begin(), extra_specifiers.end());
std::vector<format_item> items;
auto items = cudf::detail::make_empty_host_vector<format_item>(format.length(), stream);
vuule (Contributor Author):

estimate of the eventual vector size; no need to be exact

Comment on lines +173 to +174
h_offsets[0] = 0;
h_offsets[1] = chars.size();
vuule (Contributor Author):

no list initialization in thrust::host_vector AFAICT

Comment on lines +81 to +83
size_type id; // stripe id
size_type first; // first rowgroup in the stripe
size_type size; // number of rowgroups in the stripe
vuule (Contributor Author):

Integer conversion was allowed by emplace_back, but push_back is having none of it, so I had to iron out the types and add a few static_casts in compute_page_splits_by_row.

@vuule vuule marked this pull request as ready for review July 11, 2024 01:07
@vuule vuule requested review from a team as code owners July 11, 2024 01:07
cpp/include/cudf/lists/detail/dremel.hpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/writer_impl.cu Show resolved Hide resolved
cpp/src/utilities/host_memory.cpp Outdated Show resolved Hide resolved

copy-pr-bot bot commented Jul 15, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

vuule commented Jul 15, 2024

/ok to test

@PointKernel PointKernel (Member) left a comment:

Some final questions/comments.

@@ -186,6 +186,63 @@ CUDF_EXPORT rmm::host_device_async_resource_ref& host_mr()
return mr_ref;
}

class new_delete_memory_resource {
public:
void* allocate(std::size_t bytes, std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT)
PointKernel (Member):

probably a leftover: alignment is not used in the function.


void deallocate(void* ptr,
std::size_t bytes,
std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT)
PointKernel (Member):

alignment not used

void deallocate_async(void* ptr,
std::size_t bytes,
std::size_t alignment,
cuda::stream_ref stream)
PointKernel (Member):

Suggested change
cuda::stream_ref stream)
cuda::stream_ref)

Remove stream since it's not used, or mark it [[maybe_unused]].

return ::operator new(size);
});
} catch (std::bad_alloc const& e) {
RMM_FAIL("Failed to allocate memory: " + std::string{e.what()}, rmm::out_of_memory);
PointKernel (Member):

Forgot to ask last time: do we want to use CUDF_FAIL instead?

CUDF_EXPORT rmm::host_async_resource_ref get_pageable_memory_resource()
{
static new_delete_memory_resource mr{};
static rmm::host_async_resource_ref mr_ref{mr};
PointKernel (Member):

question: do we need mr_ref to be static as well?

Labels
CMake (CMake build issue), feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code), non-breaking (Non-breaking change), Performance (Performance related issue), Spark (Functionality that helps Spark RAPIDS)
Projects
Status: In Progress
Status: Burndown
Development

Successfully merging this pull request may close these issues.

None yet

3 participants