E File "test_shard_blip2.py", line 28, in check_forward_backward
E assert_hf_output_close(org_output, shard_output, ignore_keys=["past_key_values"])
E File "colossalai/testing/comparison.py", line 125, in assert_hf_output_close
E assert_hf_output_close(
E File "colossalai/testing/comparison.py", line 149, in assert_hf_output_close
E assert_close(
E File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 1520, in assert_close
E raise error_metas[0].to_error(msg)
E AssertionError: Tensor-likes are not close!
E
E Mismatched elements: 5947392 / 5947392 (100.0%)
E Greatest absolute difference: nan at index (0, 0) (up to 1e-06 allowed)
E Greatest relative difference: nan at index (0, 0) (up to 1e-05 allowed)
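For context on why the fp16 run produces NaN everywhere: fp16 saturates at 65504, so intermediate reductions in a layer norm (e.g. squaring activations for the variance) can overflow to inf, and inf arithmetic then yields nan. A minimal sketch of that failure mode (the tensor values here are illustrative, not taken from the test):

```python
import torch

# fp16's maximum value is 65504, so squaring moderately large
# activations during a variance computation already overflows ...
x = torch.full((4,), 300.0, dtype=torch.float16)
sq = x * x                      # 300**2 = 90000 > 65504 -> inf

# ... and inf - inf in the E[x^2] - E[x]^2 form of the variance
# is nan, which then propagates through the whole output tensor.
var = sq.mean() - x.mean() ** 2

print(sq)   # tensor([inf, inf, inf, inf], dtype=torch.float16)
print(var)  # tensor(nan, dtype=torch.float16)
```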
With dtype=torch.bfloat16 and without enable_fused_normalization it passes, but if I enable enable_fused_normalization, it fails again:
E File "test_shard_blip2.py", line 28, in check_forward_backward
E assert_hf_output_close(org_output, shard_output, ignore_keys=["past_key_values"])
E File "/colossalai/testing/comparison.py", line 125, in assert_hf_output_close
E assert_hf_output_close(
E File "/colossalai/testing/comparison.py", line 125, in assert_hf_output_close
E assert_hf_output_close(
E File "/colossalai/testing/comparison.py", line 149, in assert_hf_output_close
E assert_close(
E File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 1520, in assert_close
E raise error_metas[0].to_error(msg)
E AssertionError: Tensor-likes are not close!
E
E Mismatched elements: 24271 / 2161696 (1.1%)
E Greatest absolute difference: 0.0078125 at index (0, 3, 47) (up to 1e-05 allowed)
E Greatest relative difference: 169.0 at index (0, 3, 47325) (up to 1e-05 allowed)
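Worth noting about this second failure: the greatest absolute difference of 0.0078125 is exactly 2**-7, one machine epsilon of bfloat16, i.e. a single-ulp rounding difference between the fused and unfused kernels for values near 1.0, while the 1e-05 bound in the traceback is far below what bf16 can resolve. A quick check (the tolerance values below are illustrative, not what the test passes):

```python
import torch

# bf16 keeps only 7 mantissa bits, so its machine epsilon is 2**-7,
# exactly the 0.0078125 reported as the greatest absolute difference.
eps = torch.finfo(torch.bfloat16).eps
print(eps)  # 0.0078125

# two results one ulp apart are as close as bf16 math can get;
# they fail an atol=1e-5 check but pass a bf16-scaled one.
a = torch.ones(8, dtype=torch.bfloat16)
b = a + eps
torch.testing.assert_close(a, b, rtol=0.0, atol=eps)  # passes
```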
Environment
torch 2.2.1 / CUDA 12.1
colossalai 0.3.6
transformers 4.36.0
The text was updated successfully, but these errors were encountered:
I am not sure if this is a bug or an unavoidable error due to lower precision, and whether the test was only ever intended to run in fp32. I would appreciate any insights about it. Thanks.
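On the precision question: torch.testing.assert_close already scales its default tolerances by dtype (per the PyTorch docs, rtol defaults to 1.6e-2 for bfloat16 vs 1.3e-6 for float32), so one option would be for the comparison to rely on those defaults rather than fixed fp32-level bounds. A sketch of the difference (the pinned tolerances below are illustrative, not the test's actual arguments):

```python
import torch

a = torch.ones(4, dtype=torch.bfloat16)
b = a + torch.finfo(torch.bfloat16).eps  # one ulp apart

# dtype-aware defaults (rtol=1.6e-2 for bf16) accept a one-ulp gap:
torch.testing.assert_close(a, b)

# pinned fp32-level tolerances reject the same pair:
try:
    torch.testing.assert_close(a, b, rtol=1.3e-6, atol=1e-5)
    raised = False
except AssertionError:
    raised = True
print(raised)  # True
```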
🐛 Describe the bug
colossalai.shardformer.layer.FusedLayerNorm doesn't seem to work correctly.
https://github.com/hpcaitech/ColossalAI/blob/main/tests/test_shardformer/test_model/test_shard_blip2.py
This test file passes as it is. But if I change dtype to torch.float16 (ColossalAI/tests/test_shardformer/test_model/test_shard_blip2.py, line 92 in 89049b0), it fails with the first traceback above.