BUG: Timestamp.asm8 has different behavior depending on how it was constructed #59184

Closed
3 tasks done
Aloqeely opened this issue Jul 4, 2024 · 5 comments · Fixed by #59200
Labels: Bug, Needs Triage

Aloqeely (Member) commented Jul 4, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [1]: import numpy as np; import pandas as pd

In [2]: ts1 = pd.Timestamp(year=2023, month=1, day=1, hour=10, second=15)

In [3]: ts2 = pd.Timestamp('2023-01-01 10:00:15')  # ts1 and ts2 are equal

In [4]: np.int64(ts1.asm8)
Out[4]: 1672567215000000

In [5]: np.int64(ts2.asm8)
Out[5]: 1672567215

Issue Description

The Timestamp.asm8 doc states "Return numpy datetime64 format in nanoseconds.", but when I construct the timestamp from a string, asm8 behaves differently (wrongly) and returns a value in seconds. Any thoughts @MarcoGorelli?

Expected Behavior

I would expect np.int64(ts2.asm8) to be equal to 1672567215000000000

Installed Versions

INSTALLED VERSIONS

commit : 1b2d39c
python : 3.12.1
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 24.0
Cython : 3.0.10
sphinx : 7.3.7
IPython : 8.23.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : 1.3.8
fastparquet : 2024.2.0
fsspec : 2024.3.1
html5lib : 1.1
hypothesis : 6.100.1
gcsfs : 2024.3.1
jinja2 : 3.1.4
lxml.etree : 5.2.1
matplotlib : 3.8.4
numba : 0.59.1
numexpr : 2.10.0
odfpy : None
openpyxl : 3.1.2
psycopg2 : 2.9.9
pymysql : 1.4.6
pyarrow : 16.1.0
pyreadstat : 1.2.7
pytest : 8.1.1
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.3.1
scipy : 1.13.0
sqlalchemy : 2.0.29
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.5.0
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

Aloqeely added the Bug and Needs Triage labels Jul 4, 2024

ritwizsinha (Contributor) commented Jul 5, 2024

I think there is a mistake in the docs here.

  • ts1.asm8 returns data in microseconds, not nanoseconds.
  • ts2.asm8 returns data in seconds, not nanoseconds.
  • In both cases the returned unit does not match the documentation.

Moving on and addressing the issue:

The date string is parsed by the C function parse_iso_8601_datetime. This function picks the unit based on the precision it encounters in the string. For example:

  • ts2 = pd.Timestamp('2023-01-01 10:00:15.000000') gives a precision of microseconds, but
  • ts2 = pd.Timestamp('2023-01-01 10:00:15') gives a precision of seconds.

Then, after creating the datetime struct from the string, its resolution is identified using get_supported_reso.

For ts2 this function returns a resolution of seconds, the same as was determined during parsing. asm8 therefore outputs seconds, because it simply reads the resolution that was set when the string was read.

For the former case, ts1, a precision of microseconds is determined while creating the tsobject, so it outputs microseconds.

asm8 calls to_datetime64, which returns an object with the same precision the value was stored with.
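The relationship described above can be checked directly. The following is a minimal sketch, assuming pandas 2.0 or later (where Timestamp.unit is available); it only compares the stored unit with the unit of the datetime64 returned by asm8, and does not assume which unit a given constructor picks.

```python
import numpy as np
import pandas as pd

ts1 = pd.Timestamp(year=2023, month=1, day=1, hour=10, second=15)
ts2 = pd.Timestamp("2023-01-01 10:00:15")

for name, ts in [("ts1", ts1), ("ts2", ts2)]:
    # np.datetime_data returns (unit, count) for a datetime64 dtype
    asm8_unit, _ = np.datetime_data(ts.asm8.dtype)
    print(name, "stored unit:", ts.unit, "| asm8 unit:", asm8_unit)

# asm8 should carry whatever unit the Timestamp was stored with
units_match = all(
    np.datetime_data(ts.asm8.dtype)[0] == ts.unit for ts in (ts1, ts2)
)
print("asm8 unit matches ts.unit:", units_match)
```

On the versions discussed in this thread, the two timestamps compare equal even though their stored units differ.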

@Aloqeely Maybe the docs need to be updated for this.

Anurag-Varma (Contributor) commented

take

Anurag-Varma (Contributor) commented Jul 6, 2024

@Aloqeely

I have tested a few different combinations of inputs to pd.Timestamp:

  • pd.Timestamp(year=2023, month=1, day=1, hour=10, second=15): asm8 returns microseconds
  • pd.Timestamp('2023-01-01 10:00:15'): asm8 returns seconds
  • pd.Timestamp('2023-01-01 10:00:15.0') through pd.Timestamp('2023-01-01 10:00:15.000'): asm8 returns milliseconds
  • pd.Timestamp('2023-01-01 10:00:15.0000') through pd.Timestamp('2023-01-01 10:00:15.000000'): asm8 returns microseconds
  • pd.Timestamp('2023-01-01 10:00:15.0000000') through pd.Timestamp('2023-01-01 10:00:15.000000000'): asm8 returns nanoseconds

So yes, I think the documentation should be updated, since it says the return value is always in nanoseconds.
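The combinations above could be reproduced with a short loop. This is a rough sketch, assuming pandas 2.0 or later; the `results` dict and the loop bounds are illustrative choices, and the printed units may differ between pandas versions, so no expected output is shown.

```python
import numpy as np
import pandas as pd

base = "2023-01-01 10:00:15"
results = {}  # number of fractional digits -> (ts.unit, unit of ts.asm8)

for ndigits in range(0, 10):
    # 0 digits means no fractional part at all; otherwise append ".000..."
    text = base if ndigits == 0 else f"{base}.{'0' * ndigits}"
    ts = pd.Timestamp(text)
    results[ndigits] = (ts.unit, np.datetime_data(ts.asm8.dtype)[0])
    print(f"{ndigits} fractional digits -> unit={ts.unit}")
```

Each entry pairs the stored unit with the unit of the datetime64 returned by asm8, which makes it easy to see whether the two ever disagree.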

However, in the first case, pd.Timestamp(year=2023, month=1, day=1, hour=10, second=15), we only specify values down to seconds, with no milliseconds, microseconds, or nanoseconds, so I think it should return seconds rather than microseconds.

This last point needs to be confirmed by a dev.

Aloqeely (Member, Author) commented Jul 6, 2024

The asm8 doc says it returns a datetime in nanoseconds, while the to_datetime64 doc says it returns a datetime with the same precision. The problem is that asm8 is an alias of to_datetime64, so why do the docs differ?

I'm leaning towards fixing asm8 so that it returns nanoseconds, rather than changing the docs and treating it as an alias for to_datetime64.

jbrockmendel (Member) commented

The behavior is correct; the doc needs updating. ts.asm8 returns a dt64 object with the same unit as ts.unit.
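For code that needs nanoseconds regardless of how the Timestamp was constructed, one possible workaround is to convert the unit explicitly before reading asm8. This is a sketch assuming pandas 2.0 or later, where Timestamp.as_unit exists; it is not the fix applied in #59200.

```python
import numpy as np
import pandas as pd

ts1 = pd.Timestamp(year=2023, month=1, day=1, hour=10, second=15)
ts2 = pd.Timestamp("2023-01-01 10:00:15")

# as_unit("ns") returns a Timestamp stored at nanosecond resolution,
# so asm8 then yields a datetime64[ns] for both construction paths.
ns1 = ts1.as_unit("ns").asm8.astype("int64")
ns2 = ts2.as_unit("ns").asm8.astype("int64")

print(ns1, ns2)
```

Both values should then equal the 1672567215000000000 given under Expected Behavior.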
