UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 6: invalid continuation byte #29457

Habeeb556 · 2024-07-02T14:52:42Z

Bug description

I encountered the following error when running a SELECT * query against a table in SQL Lab. The issue started after upgrading Superset from version 2.1.3 to version 4.0.1.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 6: invalid continuation byte

How to reproduce the bug

Any table column with a TIMESTAMP datatype generates this error.
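A likely explanation, assuming the source database is SQL Server (the reporter later confirms mssql+pyodbc): in SQL Server, TIMESTAMP is a deprecated synonym for ROWVERSION, an 8-byte binary counter rather than a date/time type, so the driver returns raw bytes. The following minimal Python sketch uses made-up 8-byte values to show that decoding such bytes as UTF-8 reproduces both error messages quoted in this thread. The helper name utf8_error is hypothetical.

```python
def utf8_error(raw: bytes) -> str:
    """Return the UnicodeDecodeError message for raw, or '' if it decodes cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


# Hypothetical rowversion values: big-endian counters, so leading bytes are often zero.
rv_a = bytes([0, 0, 0, 0, 0, 0, 0xD4, 0x41])  # 0xd4 opens a 2-byte sequence; 0x41 cannot continue it
rv_b = bytes([0, 0, 0, 0, 0, 0, 0xFF, 0x01])  # 0xff never appears in valid UTF-8

print(utf8_error(rv_a))  # 'utf-8' codec can't decode byte 0xd4 in position 6: invalid continuation byte
print(utf8_error(rv_b))  # 'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
```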

Screenshots/recordings

Superset version

4.0.1

Python version

3.11

Node version

I don't know

Browser

Chrome

Additional context

No response

Checklist

I have searched Superset docs and Slack and didn't find a solution to my problem.
I have searched the GitHub issue tracker and didn't find a similar bug report.
I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.


dosubot · 2024-07-02T15:00:24Z

To resolve the UnicodeDecodeError when querying a table with a TIMESTAMP datatype in SQL Lab after upgrading from Superset version 2.1.3 to version 4.0.1, ensure that the timestamp columns are properly normalized to a pandas-supported datetime format. This can be done using the utils.normalize_dttm_col function in the get_df method of the BaseViz class.

Here is the relevant code snippet from the get_df method:

if not df.empty:
    utils.normalize_dttm_col(
        df=df,
        dttm_cols=tuple(
            [
                DateColumn.get_legacy_time_column(
                    timestamp_format=timestamp_format,
                    offset=self.datasource.offset,
                    time_shift=self.form_data.get("time_shift"),
                )
            ]
        ),
    )

    if self.enforce_numerical_metrics:
        self.df_metrics_to_num(df)

    df.replace([np.inf, -np.inf], np.nan, inplace=True)

Ensure that this normalization step is correctly applied to handle the timestamp columns and avoid UnicodeDecodeError ^[1].

Additionally, the Pinot database connector is known to cause issues with TIMESTAMP columns in Superset. The Pinot driver infers TIMESTAMP columns as LONG, which can lead to problems such as UnicodeDecodeError. Here is a quick fix for the Pinot driver:

# Pinot driver infers TIMESTAMP column as LONG, so make the quick fix.
# When the Pinot driver fixes this bug, the current method could be removed.
if isinstance(sqla_column_type, types.TIMESTAMP):
    return sqla_column_type.compile().upper()

This fix ensures that TIMESTAMP columns are correctly interpreted and processed ^[2].


mistercrunch · 2024-07-02T20:43:59Z

#28266 was first cherry-picked in 4.0.2

Please try upgrading and confirm whether the issue is fixed.

Habeeb556 · 2024-07-03T08:47:22Z

@mistercrunch unfortunately, this did not fix the issue. I upgraded to version 4.0.2 and encountered the same error.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
ERROR:superset.views.base:'utf-8' codec can't decode byte 0xff in position 6: invalid start byte

Additionally, I noticed that this issue occurs only when selecting columns with the TIMESTAMP datatype. All other columns work fine. It worked correctly with version 2.1.3 when I switched back.

mistercrunch · 2024-07-03T15:23:23Z

Full stack trace please! I'm also curious which database engine/driver/version you are using.

Habeeb556 · 2024-07-03T17:32:31Z

Database engine: mssql+pyodbc
Version: 5.1.0

Stack trace:

'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
Traceback (most recent call last):
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask/app.py", line 1484, in full_dispatch_request
   rv = self.dispatch_request()
        ^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask/app.py", line 1469, in dispatch_request
   return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
   return f(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/views/base_api.py", line 127, in wraps
   raise ex
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/views/base_api.py", line 121, in wraps
   duration, response = time_function(f, self, *args, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/utils/core.py", line 1470, in time_function
   response = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask_appbuilder/api/__init__.py", line 183, in wraps
   return f(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/utils/log.py", line 255, in wrapper
   value = f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/sqllab/api.py", line 346, in get_results
   payload = json.dumps(
             ^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/simplejson/__init__.py", line 395, in dumps
   **kw).encode(obj)
         ^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/simplejson/encoder.py", line 298, in encode
   chunks = self.iterencode(o)
            ^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/simplejson/encoder.py", line 379, in iterencode
   return _iterencode(o, 0)
          ^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
2024-07-03 20:26:50,944:INFO:superset.commands.sql_lab.execute:Triggering query_id: 41782
2024-07-03 20:26:50,954:INFO:superset.sqllab.sql_json_executer:Query 41782: Running query on a Celery worker

mistercrunch · 2024-07-03T20:35:28Z

Oh, it appears 4.0.2 does not include #28702, the large JSON refactor that centralized all JSON calls in superset/utils/json.py.
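The traceback above fails inside simplejson's encoder, called directly from get_results in superset/sqllab/api.py. simplejson's dumps documents an encoding parameter (default 'utf-8') that decodes bytes before any default hook runs, which would explain why the error surfaces in iterencode. The following minimal stdlib sketch shows why routing every call through one module helps: a single shared fallback applies everywhere. The names dumps and _fallback are hypothetical, not Superset's actual superset/utils/json.py API, and rendering bytes as hex is an illustrative choice only.

```python
import json
from datetime import datetime
from typing import Any


def _fallback(obj: Any) -> Any:
    # The stdlib json module calls this only for objects it cannot encode natively.
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, bytes):
        # Hypothetical choice for this sketch: hex never raises on arbitrary bytes.
        return obj.hex()
    return str(obj)


def dumps(obj: Any, **kwargs: Any) -> str:
    # Single shared entry point: callers get the same fallback unless they override it.
    kwargs.setdefault("default", _fallback)
    return json.dumps(obj, **kwargs)


row = {"id": 1, "ts": datetime(2024, 7, 3, 20, 26, 50), "rv": bytes([0, 0, 0, 0, 0, 0, 0xFF, 0x01])}
print(dumps(row))  # {"id": 1, "ts": "2024-07-03T20:26:50", "rv": "000000000000ff01"}
```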

I believe this should make it into 4.1.x. I don't recommend bringing this large refactor in as a cherry-pick, since it will merge-conflict heavily.

mistercrunch · 2024-07-03T20:36:25Z

@Habeeb556 if you have the ability to test against the master branch, you could confirm that it's working there. I'm tempted to close the issue, but will wait until you confirm the fix.

Habeeb556 · 2024-07-03T22:08:00Z

@mistercrunch, I have some good news and some bad news. The good news is that I think I have successfully deployed the master branch, and the query runs fine. The bad news is that the column's values are displayed as Chinese characters.

I'm not sure whether this is a bug or whether my deployment was incorrect and missed something.

mistercrunch · 2024-07-08T17:54:09Z

This is where the [bytes] come from:
https://github.com/apache/superset/blob/master/superset/utils/json.py#L102

The Chinese characters show up when your binary blob happens to be decodable as UTF-8 or UTF-16.
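The following minimal sketch shows the fallback order described here. It assumes the linked code tries UTF-8, then UTF-16, and otherwise returns the literal "[bytes]" placeholder. decode_bytes_for_json is a hypothetical name, not Superset's function. Any even-length byte string without unpaired surrogates decodes as UTF-16, so an 8-byte binary value usually comes back as four code units. Those code units can fall in CJK ranges and read as Chinese text.

```python
from typing import Any


def decode_bytes_for_json(obj: Any) -> Any:
    # Hypothetical helper: try UTF-8, then UTF-16, else a placeholder string.
    if not isinstance(obj, bytes):
        return obj
    for encoding in ("utf-8", "utf-16"):
        try:
            return obj.decode(encoding)
        except UnicodeDecodeError:
            continue
    return "[bytes]"


rowversion = bytes([0, 0, 0, 0, 0, 0, 0xD4, 0x41])  # made-up 8-byte value
text = decode_bytes_for_json(rowversion)  # not valid UTF-8, but valid UTF-16
print(len(text), [hex(ord(c)) for c in text])
print(decode_bytes_for_json(b"\xff"))  # [bytes]
```

An odd-length value such as b"\xff" fails both decoders and falls through to the placeholder.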

What is in your binary blob? What do you expect to see?

Maybe you're using some funky other encoding or "collation". If you're using something other than UTF-N in this day and age, you may want to standardize, or wrap the column in a database function that converts it to a modern encoding.
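If a column actually held text in a legacy code page, the fix is to decode (or CONVERT in SQL) using that code page rather than UTF-8. The following minimal sketch assumes a varchar column under a Latin1_General collation, which SQL Server stores in Windows code page 1252. The function name decode_varchar is hypothetical. This approach only helps for character data and would not make binary values readable.

```python
LATIN1_GENERAL_CODEPAGE = "cp1252"  # assumed: code page of Latin1_General varchar collations


def decode_varchar(raw: bytes, codepage: str = LATIN1_GENERAL_CODEPAGE) -> str:
    # Decode with the column's code page instead of assuming UTF-8.
    return raw.decode(codepage)


raw = b"caf\xe9"  # "café" as it would be stored in a cp1252 varchar column
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xe9 in position 3: unexpected end of data
print(decode_varchar(raw))  # café
```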

Habeeb556 · 2024-07-09T07:55:54Z

Yes, I just checked with the old version 2.1.3, and it returned the same [bytes] value. So I can confirm that the master branch build (4.x) is working.

Regarding the binary blob, here's what I expect to see when querying SQL Server directly.

mistercrunch · 2024-07-09T17:11:04Z

But what's in there? Some other language/character set? I'm guessing these bytes represent something intelligible (?)

Having worked with SQL Server a long time ago, I'm guessing this has to do with "collation" and MSFT SQL Server's deep support for different character sets. As I understand it, all of this is largely obsolete with the rise of the UTF-8 / UTF-16 standards.

Given that, Apache Superset probably shouldn't go out of its way to support the intricacies of how different databases handle character sets. It should instead tell people to convert to UTF-x (either physically in their tables or by casting in views) so that Superset can deal with non-ASCII characters.
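If the column is a SQL Server TIMESTAMP (ROWVERSION), its value is a binary counter, so no character-set conversion will make it readable. Casting it in the query or a view is the usual route. Assuming that is the case here, the following Python sketch shows the two renderings such a cast should produce. The first is a signed big-endian integer, as CAST(col AS BIGINT) returns. The second is a "0x"-prefixed hex string, as CONVERT(VARCHAR(18), col, 1) returns. The helper names are hypothetical.

```python
def rowversion_as_int(raw: bytes) -> int:
    # Signed big-endian, mirroring CAST(col AS BIGINT) on a binary(8) value.
    return int.from_bytes(raw, byteorder="big", signed=True)


def rowversion_as_hex(raw: bytes) -> str:
    # "0x" prefix plus upper-case hex digits, like CONVERT(VARCHAR(18), col, 1).
    return "0x" + raw.hex().upper()


rv = bytes([0, 0, 0, 0, 0, 0, 0xD4, 0x41])  # made-up rowversion value
print(rowversion_as_int(rv))  # 54337
print(rowversion_as_hex(rv))  # 0x000000000000D441
```

Either rendering is plain ASCII, so it serializes to JSON without any decoding fallback.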

Habeeb556 · 2024-07-09T18:33:19Z

I agree with you. I'm not exactly sure about the business logic here, since I'm a DBA focused on database support for analytical tools. The users hit the error because of a SELECT * FROM table query; they might not need that column, or it could reference something within the application. I'm not sure.

Overall, it's good that we can skip this error now when using SELECT *.

dosubot bot added the #bug:regression (Bugs that are identified as regressions) and sqllab (Namespace | Anything related to the SQL Lab) labels Jul 2, 2024

Habeeb556 mentioned this issue Jul 2, 2024 in: fix: use pessimistic json encoder in SQL Lab #28266 (Merged)
