With this csv (`animals.csv`), when I build the multiclass classification report using:
```python
import pandas as pd

from evidently import ColumnMapping
from evidently.metric_preset import ClassificationPreset
from evidently.report import Report

df = pd.read_csv("animals.csv")

column_mapping = ColumnMapping()
column_mapping.target = "target"
# One prediction-probability column per class: every column except the target.
column_mapping.prediction = [col for col in df.columns if col != "target"]

classification_performance_report = Report(metrics=[ClassificationPreset()])
classification_performance_report.run(
    current_data=df, reference_data=None, column_mapping=column_mapping
)
```
I correctly got some warnings of this kind:

```
UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
```
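For context, that warning comes from scikit-learn's precision computation whenever a class is never predicted. A minimal standalone illustration (toy labels, not the ones from `animals.csv`):

```python
import warnings

from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2]  # three classes appear in the ground truth
y_pred = [0, 1, 1, 1]  # class 2 is never predicted

# Without zero_division, sklearn warns and substitutes 0.0 for class 2.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # suppressed here only to keep output clean
    score = precision_score(y_true, y_pred, average="macro")

# Passing zero_division makes the 0.0 fallback explicit and silences the warning.
score_explicit = precision_score(y_true, y_pred, average="macro", zero_division=0)
```

So the warning is benign: the macro average simply counts the unpredicted class as 0.0 precision.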
Then the report is generated correctly, but when I try to save it in HTML format to use it in my Streamlit app, I get this error:
```
Traceback (most recent call last):
  File "/evidently-report/utils/csv2report.py", line 32, in <module>
    classification_performance_report.save_html(report_filepath)
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/suite/base_suite.py", line 207, in save_html
    dashboard_id, dashboard_info, graphs = self._build_dashboard_info()
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/report/report.py", line 212, in _build_dashboard_info
    html_info = renderer.render_html(test)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/metrics/classification_performance/classification_quality_metric.py", line 73, in render_html
    metric_result = obj.get_result()
                    ^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/base_metric.py", line 232, in get_result
    raise result.exception
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/calculation_engine/engine.py", line 42, in execute_metrics
    calculations[metric] = calculation.calculate(context, converted_data)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/calculation_engine/python_engine.py", line 88, in calculate
    return self.metric.calculate(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/metrics/classification_performance/classification_quality_metric.py", line 45, in calculate
    current = calculate_metrics(
              ^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/evidently/calculations/classification_performance.py", line 382, in calculate_metrics
    roc_auc = metrics.roc_auc_score(binaraized_target, prediction_probas_array, average="macro")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/sklearn/metrics/_ranking.py", line 580, in roc_auc_score
    return _average_binary_score(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/sklearn/metrics/_base.py", line 118, in _average_binary_score
    score[c] = binary_metric(y_true_c, y_score_c, sample_weight=score_weight)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/evidently-report/venv/lib/python3.11/site-packages/sklearn/metrics/_ranking.py", line 339, in _binary_roc_auc_score
    raise ValueError(
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
```
It would be better if this scenario were handled gracefully, for example by setting the ROC AUC score for that class to 0 (or 1).
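The failure can be reproduced outside Evidently. When a class never occurs in the target, its binarized column is all zeros, and scikit-learn refuses to score it (a minimal sketch with made-up class names, not the actual `animals.csv` data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

classes = ["cat", "dog", "bird"]
y_true = ["cat", "dog", "cat", "dog"]  # "bird" never occurs in the target
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
])

# Binarizing against the full label set leaves the "bird" column all zeros.
binarized = label_binarize(y_true, classes=classes)

try:
    roc_auc_score(binarized, proba, average="macro")
except ValueError as err:
    message = str(err)

print(message)  # Only one class present in y_true. ROC AUC score is not defined in that case.
```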
I have the same problem. I did a bit of investigating, thinking we could work together to resolve the issue, but the solution to this problem did not seem very straightforward to me.
The main issue is that, when some labels are missing from the set of predictions in the .csv file, certain metrics become meaningless, and when Evidently tries to calculate them via scikit-learn, the calculation fails with errors like the one above.
Philosophically, I believe this shouldn't crash. Even though, by the law of large numbers, missing labels should be rare in large samples, a tool like Evidently should handle them: they occur frequently both in testing/debugging scenarios and in standard tasks where one label is significantly less prevalent than the others (e.g., spam detection, anomaly detection, forgery detection).
Practically speaking, fixing this is not trivial. Ideally, the report would be generated without omitting the plots whose metric calculations fail; instead, those plots would show placeholders for the missing labels. That is not easy to achieve, since the code relies heavily on scikit-learn's abstractions. Should we ask scikit-learn to modify the ROC AUC function to tolerate labels absent from the target? That seems wrong, because the statistic genuinely is undefined in that case. So the fix should come from a higher level, although integrating such a change elegantly with Evidently's use of scikit-learn is challenging, if it is the right approach at all.
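One possible higher-level shape for that fix, sketched as a standalone wrapper (this is not Evidently code, just an illustration of the idea): let undefined per-class scores become NaN and average over the rest, instead of letting one missing class abort the whole report.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def safe_macro_roc_auc(binarized_target: np.ndarray, proba: np.ndarray) -> float:
    """Macro ROC AUC that tolerates classes absent from the target.

    Per-class scores that scikit-learn cannot define (only one class in the
    binarized column) become NaN and are skipped by the nanmean, rather than
    raising and failing the whole computation.
    """
    per_class = []
    for k in range(binarized_target.shape[1]):
        try:
            per_class.append(roc_auc_score(binarized_target[:, k], proba[:, k]))
        except ValueError:
            per_class.append(np.nan)  # class never appears in the target
    return float(np.nanmean(per_class))


# Third class is absent from the target: its column is all zeros.
binarized = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0]])
proba = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
macro_auc = safe_macro_roc_auc(binarized, proba)  # averages only the defined classes
```

A placeholder plot could then render "undefined" for the NaN classes instead of being dropped.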
We could force the label set to contain every class, or insert dummy data; although that should work, it is not a definitive solution.
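The dummy-data workaround could look roughly like this (a sketch; it assumes the layout from the issue, a `target` column plus one probability column per class, and the helper name is made up):

```python
import pandas as pd


def pad_missing_labels(df: pd.DataFrame, target: str = "target") -> pd.DataFrame:
    """Append one dummy row per class that never occurs in the target column.

    The class names are taken from the prediction-probability columns (one
    column per class). The dummy rows skew the metrics slightly, so this is
    a stopgap, not a real fix.
    """
    label_columns = [col for col in df.columns if col != target]
    present = set(df[target])
    dummy_rows = []
    for label in label_columns:
        if label not in present:
            row = {col: (1.0 if col == label else 0.0) for col in label_columns}
            row[target] = label
            dummy_rows.append(row)
    if not dummy_rows:
        return df
    return pd.concat([df, pd.DataFrame(dummy_rows)], ignore_index=True)


# Example: "bird" never occurs in the target, so one dummy row is appended.
df = pd.DataFrame({
    "cat": [0.9, 0.2], "dog": [0.05, 0.7], "bird": [0.05, 0.1],
    "target": ["cat", "dog"],
})
padded = pad_missing_labels(df)
```

Running the report on `padded` instead of `df` should avoid the crash, at the cost of slightly distorted metrics.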
I'd like to help, but I'm not sure where to start. @emeli-dral, @mike0sv, what do you think? Thanks in advance, and great work on this project!