Return cudf::detail::host_vector from make_host_vector and add a make_device_uvector overload #16206

Open · wants to merge 110 commits into base: branch-24.08
Conversation

@vuule vuule commented Jul 6, 2024

Description

Issue #15616

Modified make_host_vector functions to return cudf::detail::host_vector, which can use a pinned or a pageable memory resource. When pinned memory is used, the D2H copy is potentially done using a CUDA kernel.

Also added factories to create host_vectors without device data. These are useful to replace uses of std::vector and thrust::host_vector when the data eventually gets copied to the GPU.

Also added make_device_uvector overloads that take a cudf::detail::host_vector. These allow the H2D copy to be done using a CUDA kernel.

Modified cudf::detail::host_vector to be derived from thrust::host_vector, to avoid issues with implicit conversion from std::vector.

Used cudf::detail::host_vector and its new factory functions wherever data ends up copied to the GPU.

TODO:

  • Add unit tests for allocate_host_as_pinned_threshold.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

vuule and others added 30 commits May 30, 2024 16:24
Co-authored-by: David Wendt <[email protected]>
@github-actions github-actions bot added the CMake CMake build issue label Jul 8, 2024
Comment on lines +849 to +852
auto d_comp_in = cudf::detail::make_device_uvector_async(
comp_in, stream, rmm::mr::get_current_device_resource());
auto d_comp_out = cudf::detail::make_device_uvector_async(
comp_out, stream, rmm::mr::get_current_device_resource());
vuule (Contributor Author):

refactored the loop to avoid partial comp_in/comp_out copies to the device.

vuule (Contributor Author):

const members were preventing copy assignment

@@ -308,8 +308,6 @@ TYPED_TEST(StringsIntegerConvertTest, FromToInteger)
// convert to strings
auto results_strings = cudf::strings::from_integers(integers->view());

// copy back to host
h_integers = cudf::detail::make_host_vector_sync(d_integers, cudf::get_default_stream());
vuule (Contributor Author):

this was copying back the same data AFAICT

cudf::detail::make_host_vector_async(tokens_gpu, stream);
thrust::host_vector<cuio_json::SymbolOffsetT> token_indices =
cudf::detail::make_host_vector_async(token_indices_gpu1, stream);
auto tokens = cudf::detail::make_host_vector_async(tokens_gpu, stream);
vuule (Contributor Author) · Jul 11, 2024:

This is a fun one. Here's what I think happened: this broke when I changed the return type, because the implicit conversion (cudf::detail::host_vector -> thrust::host_vector) copies the buffer; however, the data in the original object was not yet ready because of the async D2H copy.

@@ -186,6 +186,63 @@ CUDF_EXPORT rmm::host_device_async_resource_ref& host_mr()
return mr_ref;
}

class new_delete_memory_resource {
vuule (Contributor Author):

hopefully temporary implementation; this should probably be in rmm.

@@ -123,7 +123,7 @@ struct format_compiler {
: format(fmt), d_items(0, stream)
{
specifiers.insert(extra_specifiers.begin(), extra_specifiers.end());
std::vector<format_item> items;
auto items = cudf::detail::make_empty_host_vector<format_item>(format.length(), stream);
vuule (Contributor Author):

estimate of the eventual vector size; no need to be exact

Comment on lines +173 to +174
h_offsets[0] = 0;
h_offsets[1] = chars.size();
vuule (Contributor Author):

no list initialization in thrust::host_vector AFAICT

Comment on lines +81 to +83
size_type id; // stripe id
size_type first; // first rowgroup in the stripe
size_type size; // number of rowgroups in the stripe
vuule (Contributor Author):

Integer conversion was allowed by emplace_back, but push_back is having none of it, so I had to iron out the types and add a few static_casts in compute_page_splits_by_row.

@vuule vuule marked this pull request as ready for review July 11, 2024 01:07
@vuule vuule requested review from a team as code owners July 11, 2024 01:07
cpp/include/cudf/lists/detail/dremel.hpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/writer_impl.cu Show resolved Hide resolved
cpp/src/utilities/host_memory.cpp Outdated Show resolved Hide resolved

copy-pr-bot bot commented Jul 15, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

vuule commented Jul 15, 2024

/ok to test

@PointKernel PointKernel (Member) left a comment:

Some final questions/comments.

@@ -186,6 +186,63 @@ CUDF_EXPORT rmm::host_device_async_resource_ref& host_mr()
return mr_ref;
}

class new_delete_memory_resource {
public:
void* allocate(std::size_t bytes, std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT)
PointKernel (Member):

probably a leftover: alignment is not used in the function.


void deallocate(void* ptr,
std::size_t bytes,
std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT)
PointKernel (Member):

alignment not used

void deallocate_async(void* ptr,
std::size_t bytes,
std::size_t alignment,
cuda::stream_ref stream)
PointKernel (Member):

Suggested change
cuda::stream_ref stream)
cuda::stream_ref)

Remove stream since it's not used, or mark it [[maybe_unused]].

return ::operator new(size);
});
} catch (std::bad_alloc const& e) {
RMM_FAIL("Failed to allocate memory: " + std::string{e.what()}, rmm::out_of_memory);
PointKernel (Member):

Forgot to ask last time: do we want to use CUDF_FAIL instead?

CUDF_EXPORT rmm::host_async_resource_ref get_pageable_memory_resource()
{
static new_delete_memory_resource mr{};
static rmm::host_async_resource_ref mr_ref{mr};
PointKernel (Member):

question: do we need mr_ref to be static as well?

Labels
CMake (CMake build issue), feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code), non-breaking (Non-breaking change), Performance (Performance related issue), Spark (Functionality that helps Spark RAPIDS)
Projects
Status: In Progress
Status: Burndown
Development

Successfully merging this pull request may close these issues.

None yet

3 participants