Return `cudf::detail::host_vector` from `make_host_vector` and add a `make_device_uvector` overload #16206
base: branch-24.08
Conversation
Co-authored-by: David Wendt <[email protected]>
auto d_comp_in = cudf::detail::make_device_uvector_async(
  comp_in, stream, rmm::mr::get_current_device_resource());
auto d_comp_out = cudf::detail::make_device_uvector_async(
  comp_out, stream, rmm::mr::get_current_device_resource());
Refactored the loop to avoid partial `comp_in`/`comp_out` copies to the device.
`const` members were preventing copy assignment.
@@ -308,8 +308,6 @@ TYPED_TEST(StringsIntegerConvertTest, FromToInteger)
   // convert to strings
   auto results_strings = cudf::strings::from_integers(integers->view());

-  // copy back to host
-  h_integers = cudf::detail::make_host_vector_sync(d_integers, cudf::get_default_stream());
this was copying back the same data AFAICT
  cudf::detail::make_host_vector_async(tokens_gpu, stream);
thrust::host_vector<cuio_json::SymbolOffsetT> token_indices =
  cudf::detail::make_host_vector_async(token_indices_gpu1, stream);
auto tokens = cudf::detail::make_host_vector_async(tokens_gpu, stream);
This is a fun one. Here's what I think happened here: this broke when I changed the return type because it would copy the buffer as a part of the implicit conversion (`cudf::detail::host_vector` -> `thrust::host_vector`); however, the data in the original object would not be ready because of the async D2H copy.
@@ -186,6 +186,63 @@ CUDF_EXPORT rmm::host_device_async_resource_ref& host_mr()
   return mr_ref;
 }

+class new_delete_memory_resource {
hopefully temporary implementation; this should probably be in rmm.
@@ -123,7 +123,7 @@ struct format_compiler {
   : format(fmt), d_items(0, stream)
 {
   specifiers.insert(extra_specifiers.begin(), extra_specifiers.end());
-  std::vector<format_item> items;
+  auto items = cudf::detail::make_empty_host_vector<format_item>(format.length(), stream);
estimate of the eventual vector size; no need to be exact
h_offsets[0] = 0;
h_offsets[1] = chars.size();
no list initialization in thrust::host_vector AFAICT
size_type id;     // stripe id
size_type first;  // first rowgroup in the stripe
size_type size;   // number of rowgroups in the stripe
Integer conversion was allowed by `emplace_back`, but `push_back` is having none of it. So I had to iron out the types and `static_cast` a bit in `compute_page_splits_by_row`.
/ok to test
Some final questions/comments.
@@ -186,6 +186,63 @@ CUDF_EXPORT rmm::host_device_async_resource_ref& host_mr()
   return mr_ref;
 }

+class new_delete_memory_resource {
+ public:
+  void* allocate(std::size_t bytes, std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT)
Probably a leftover: `alignment` is not used in the function.
void deallocate(void* ptr,
                std::size_t bytes,
                std::size_t alignment = rmm::RMM_DEFAULT_HOST_ALIGNMENT)
`alignment` is not used.
void deallocate_async(void* ptr,
                      std::size_t bytes,
                      std::size_t alignment,
                      cuda::stream_ref stream)
Suggested change:
- cuda::stream_ref stream)
+ cuda::stream_ref)

Remove `stream` since it's not used, or use `[[maybe_unused]]`.
      return ::operator new(size);
    });
  } catch (std::bad_alloc const& e) {
    RMM_FAIL("Failed to allocate memory: " + std::string{e.what()}, rmm::out_of_memory);
Forgot to ask last time: do we want to use `CUDF_FAIL` instead?
CUDF_EXPORT rmm::host_async_resource_ref get_pageable_memory_resource()
{
  static new_delete_memory_resource mr{};
  static rmm::host_async_resource_ref mr_ref{mr};
Question: do we need `mr_ref` to be static as well?
Description
Issue #15616
Modified `make_host_vector` functions to return `cudf::detail::host_vector`, which can use a pinned or a pageable memory resource. When pinned memory is used, the D2H copy is potentially done using a CUDA kernel.

Also added factories to create `host_vector`s without device data. These are useful to replace uses of `std::vector` and `thrust::host_vector` when the data eventually gets copied to the GPU.

Also added `make_device_uvector` overloads that take a `cudf::detail::host_vector`. These allow the H2D copy to be done using a CUDA kernel.

Modified `cudf::detail::host_vector` to be derived from `thrust::host_vector`, to avoid issues with implicit conversion from `std::vector`.

Used `cudf::detail::host_vector` and its new factory functions wherever data ends up copied to the GPU.

TODO: `allocate_host_as_pinned_threshold`.
.Checklist