UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 6: invalid continuation byte #29457
Comments
To resolve the error, ensure that the dataframe post-processing normalizes the timestamp columns. Here is the relevant code snippet:

```python
if not df.empty:
    utils.normalize_dttm_col(
        df=df,
        dttm_cols=tuple(
            [
                DateColumn.get_legacy_time_column(
                    timestamp_format=timestamp_format,
                    offset=self.datasource.offset,
                    time_shift=self.form_data.get("time_shift"),
                )
            ]
        ),
    )
if self.enforce_numerical_metrics:
    self.df_metrics_to_num(df)
df.replace([np.inf, -np.inf], np.nan, inplace=True)
```

Ensure that this normalization step is correctly applied to handle the timestamp columns and avoid the UnicodeDecodeError. Additionally, the Pinot database connector is known to cause issues with timestamp columns:

```python
# Pinot driver infers TIMESTAMP column as LONG, so make the quick fix.
# When the Pinot driver fixes this bug, the current method could be removed.
if isinstance(sqla_column_type, types.TIMESTAMP):
    return sqla_column_type.compile().upper()
```

This fix ensures that the TIMESTAMP type is reported correctly despite the driver bug. |
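As background on the normalization step above: when a driver returns timestamps as raw epoch values (the Pinot LONG quirk), the normalization coerces them to datetimes. A minimal standalone sketch of that coercion using only the standard library — the helper name, the sample values, and the epoch-milliseconds assumption are illustrative, not taken from Superset's code:

```python
from datetime import datetime, timezone


def epoch_ms_to_datetime(values):
    """Convert a list of epoch-millisecond integers to UTC datetimes."""
    return [datetime.fromtimestamp(v / 1000, tz=timezone.utc) for v in values]


# A LONG column that actually holds timestamps, as a raw epoch-ms list:
raw = [1700000000000, 1700000060000]
print(epoch_ms_to_datetime(raw)[0].isoformat())  # 2023-11-14T22:13:20+00:00
```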
#28266 was first cherry-picked in 4.0.2. Please try upgrading and confirm the issue is fixed. |
@mistercrunch unfortunately, this did not fix the issue. I upgraded to 4.0.2. Additionally, I noticed that this issue occurs only when selecting columns with the TIMESTAMP datatype. |
Full stack trace, please! Also curious which database engine/driver/version you are using. |
Database engine: mssql+pyodbc. Stack trace:
|
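Since the engine here is mssql+pyodbc, one driver-level workaround worth noting is a pyodbc output converter that decodes tolerantly instead of raising. This is a sketch, not something Superset does itself; the converter function is mine, the registration lines use pyodbc's documented `add_output_converter` API, and `connection_string` is a placeholder:

```python
def tolerant_decode(raw: bytes) -> str:
    """Decode driver bytes as UTF-8, replacing undecodable bytes instead of raising."""
    return raw.decode("utf-8", errors="replace")


# Registration requires a live pyodbc connection; shown for illustration only:
# import pyodbc
# cnxn = pyodbc.connect(connection_string)
# cnxn.add_output_converter(pyodbc.SQL_CHAR, tolerant_decode)
# cnxn.add_output_converter(pyodbc.SQL_VARCHAR, tolerant_decode)

print(tolerant_decode(b"abcde\xc3\xd4"))  # 'abcde' followed by replacement characters
```

Replacement characters (U+FFFD) make the bad bytes visible without crashing the query, which is useful for diagnosing which column holds the non-UTF-8 data.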
Oh, it appears 4.0.2 does not include the large JSON refactor that centralized all of those calls. This should make 4.1.x, I believe. I don't recommend bringing in this large refactor as a cherry-pick, as it'll merge-conflict heavily. |
@Habeeb556 if you have the ability to test against the current development branch, please report back. |
@mistercrunch, I have some good news and bad news. The good news is that I think I have successfully pushed to the branch and tested it; the bad news is that I'm not sure if this is a bug or if my push was incorrect and missed something. |
The Chinese characters would show if/when your binary blob is decodable to UTF-8 or UTF-16. What is in your binary blob? What do you expect to see? Maybe you're using some funky other encoding or "collation". At this point, if you're using something other than UTF-N in this day and age, you may want to standardize, or wrap the column with some database function that brings things to a modern encoding. |
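The "is it decodable" question above can be checked mechanically. A small sketch that probes a blob against a few candidate encodings — the candidate list is an assumption, with cp1252 standing in for whatever legacy collation the column actually uses:

```python
def probe_encodings(blob: bytes, candidates=("utf-8", "utf-16", "cp1252")):
    """Return (encoding, text) for the first candidate that decodes cleanly, else None."""
    for enc in candidates:
        try:
            return enc, blob.decode(enc)
        except UnicodeDecodeError:
            continue
    return None


# 0xd4 is invalid as standalone UTF-8 and too short for UTF-16,
# but maps to a printable character under cp1252:
print(probe_encodings(b"\xd4"))  # ('cp1252', 'Ô')
```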
But what's in there? Some other language/character set? Guessing these bytes represent something intelligible(?). Having worked with SQL Server a long time ago, I'm guessing this has to do with "collation" and MSFT SQL Server's deep support for different character sets. From my understanding, all of this is pretty much obsolete with the rise of the UTF-8/UTF-16 standards. Given that, Apache Superset probably shouldn't go out of its way to support the intricacies of how different databases support different character sets, and should just tell people to convert to UTF-8. |
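Following that suggestion, the usual column wrap on SQL Server is a cast to NVARCHAR, which makes the driver return proper Unicode instead of collation-dependent bytes. A hypothetical helper that builds such a T-SQL expression — the function name and bracket-quoting scheme are mine, not Superset's:

```python
def wrap_mssql_column(column: str) -> str:
    """Build a T-SQL expression that returns the column as Unicode text."""
    return f"CONVERT(NVARCHAR(MAX), [{column}])"


print(wrap_mssql_column("created_at"))  # CONVERT(NVARCHAR(MAX), [created_at])
```

A view defined with such expressions is one way to hand analysts a UTF-safe surface without touching the underlying table's collation.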
I agree with you. I'm not exactly sure about the business logic here, since I'm a DBA focused on database support for analytical tools. They encountered the error because of one of these columns. Overall, it's good that we can skip this error now. |
Bug description
I encountered the following error when querying `select * from table` in SQL Lab. This issue occurred after upgrading from Superset version 2.1.3 to version 4.0.1.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 6: invalid continuation byte
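For reference, this exact error message can be reproduced in plain Python, independent of Superset or the database: 0xd4 starts a two-byte UTF-8 sequence, so when the byte after it is not a valid continuation byte the decoder raises at the position of 0xd4 (the surrounding byte values here are illustrative):

```python
raw = b"abcdef\xd4\x31"  # 0xd4 at position 6, followed by a non-continuation byte

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xd4 in position 6: invalid continuation byte
```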
How to reproduce the bug
Any table column with a TIMESTAMP datatype generates this error.
Screenshots/recordings
Superset version
4.0.1
Python version
3.11
Node version
I don't know
Browser
Chrome
Additional context
No response