Describe the bug
libcudf's cudf::datetime::extract_year returns an INT16 column, which can lose information for large positive or negative years.
The date32 type is a signed 32-bit number of days since the Unix epoch.
The timestamp types (for millisecond, microsecond, and nanosecond resolutions) are a signed 64-bit number of RESOLUTION ticks since the Unix epoch.
The most positive year representable by the date32 type is (approximately) $1970 + (2^{31} - 1)/365 \approx 5885486 \gg 2^{15} - 1$.
Similarly, the most positive years representable by the timestamp64[ms] and timestamp64[us] types are approximately 292473178 and 294441 respectively. Both are again larger than $2^{15} - 1$.
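The estimates above can be reproduced with plain integer arithmetic. This is a rough sketch that uses the same 365-day year approximation as the formula in the text; the helper names are made up for illustration and are not part of libcudf or cudf.

```python
INT16_MAX = 2**15 - 1

# Same approximation as the text: every year has exactly 365 days.
DAYS_PER_YEAR = 365
SECONDS_PER_YEAR = DAYS_PER_YEAR * 86_400


def max_year_date32():
    """Approximate largest year for a signed 32-bit count of days."""
    return 1970 + (2**31 - 1) // DAYS_PER_YEAR


def max_year_timestamp(ticks_per_second):
    """Approximate largest year for a signed 64-bit count of ticks."""
    return 1970 + (2**63 - 1) // (ticks_per_second * SECONDS_PER_YEAR)


print("date32:", max_year_date32())
for name, ticks in [("ms", 10**3), ("us", 10**6), ("ns", 10**9)]:
    year = max_year_timestamp(ticks)
    print(f"timestamp64[{name}]:", year, "fits in INT16:", year <= INT16_MAX)
```

Under this approximation only the nanosecond resolution stays within the INT16 range; the other types overflow it by several orders of magnitude.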
Steps/Code to reproduce bug
import cudf
s = cudf.Series([2**63 - 1], dtype="datetime64[us]")
cudf_year = s.dt.year[0]
pandas_year = s.to_pandas().dt.year[0]
print(cudf_year)  # 32103, incorrect
print(pandas_year)  # 294247, correct, depending on how much the earth's rotation speed changes over the next few millennia
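The incorrect value is consistent with the true year being narrowed to a 16-bit two's-complement integer. The sketch below checks that arithmetic in pure Python; it assumes this is the mechanism behind the wrong result, and `wrap_to_int16` is a hypothetical helper, not a cudf function.

```python
def wrap_to_int16(value):
    """Reinterpret the low 16 bits of an integer as a two's-complement int16."""
    return (value + 2**15) % 2**16 - 2**15


true_year = 294247  # the year reported by pandas
print(wrap_to_int16(true_year))  # 32103, the year reported by cudf
```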
Expected behavior
We should produce the right answer. This might be doable by returning an INT32 column for year extraction.
This is a bit fiddly, since std::chrono specifies that the minimum and maximum representable year values are $-32767$ and $2^{15} - 1 = 32767$ respectively. So, given that the manipulations rely on cuda::std::chrono, this may not be fixable.
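One way around the 16-bit year limit would be to compute the year directly from the day count using wide integer arithmetic instead of a chrono year type. The sketch below follows Howard Hinnant's public-domain `civil_from_days` algorithm for the proleptic Gregorian calendar. It is written in Python only to illustrate the arithmetic; it is not libcudf's implementation, and `year_from_days` is a hypothetical name.

```python
def year_from_days(days):
    """Proleptic Gregorian year for a count of days since 1970-01-01."""
    z = days + 719468                 # shift the epoch to 0000-03-01
    era, doe = divmod(z, 146097)      # 400-year era and day of era [0, 146096]
    # Year of era [0, 399], correcting for leap days within the era.
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # day of March-based year
    mp = (5 * doy + 2) // 153         # March-based month index [0, 11]
    month = mp + 3 if mp < 10 else mp - 9
    # January and February belong to the next civil year.
    return yoe + era * 400 + (1 if month <= 2 else 0)


MICROSECONDS_PER_DAY = 86_400 * 10**6
max_us_days = (2**63 - 1) // MICROSECONDS_PER_DAY
print(year_from_days(max_us_days))  # 294247, matching the pandas result above
```

In a C++ or CUDA port the same steps would need 64-bit intermediates and an explicit floor division for negative day counts, since Python's `divmod` already rounds toward negative infinity.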