Ignore spark.task.resource.gpu.amount if num_spark_task_gpus passed manually
Joseph-Sarsfield added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage (eg: priority, bug/not-bug, and owning component)) labels on Sep 11, 2023
rkooo567 added the P2 (Important issue, but not time-critical) label and removed the triage (Needs triage (eg: priority, bug/not-bug, and owning component)) label on Sep 25, 2023
@jjyao do we have an update on this? spark.task.resource.gpu.amount can legitimately be a decimal value and shouldn't be used to set num_gpus_worker_node.
What happened + What you expected to happen
Bug: spark.task.resource.gpu.amount does not support fractional GPU values, which are required for running Spark jobs in parallel on a GPU.
https://github.com/ray-project/ray/blob/master/python/ray/util/spark/cluster_init.py
line 1026
num_spark_task_gpus = int(
spark.sparkContext.getConf().get("spark.task.resource.gpu.amount", "0")
)
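The cast above fails outright on a fractional conf value, because Python's int() does not accept decimal strings. A minimal sketch of the failure and one tolerant parse (the helper name parse_task_gpu_amount is hypothetical, not Ray code):

```python
import math

def parse_task_gpu_amount(conf_value: str) -> int:
    # int("0.5") raises ValueError, which is the crash reported here.
    # Parsing as float first, then rounding up, lets a fractional
    # request still reserve at least one whole GPU.
    return math.ceil(float(conf_value))

# int("0.5")  -> ValueError: invalid literal for int() with base 10
print(parse_task_gpu_amount("0.5"))  # -> 1
print(parse_task_gpu_amount("0"))    # -> 0
```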
Suggested fix: ignore spark.task.resource.gpu.amount when num_gpus_worker_node is passed manually.
Versions / Dependencies
All versions
Reproduction script
Set spark.task.resource.gpu.amount to a fractional value and pass a non-None num_gpus_worker_node in the call to setup_ray_cluster.
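The precedence this report asks for can be sketched as follows: an explicitly supplied num_gpus_worker_node wins, and the (possibly fractional) Spark conf value is only parsed as a fallback. The function name resolve_task_gpus is hypothetical, not part of Ray's API:

```python
import math
from typing import Optional

def resolve_task_gpus(conf_value: str, num_gpus_worker_node: Optional[int]) -> int:
    # An explicit argument takes precedence, so a fractional
    # spark.task.resource.gpu.amount never reaches int() at all.
    if num_gpus_worker_node is not None:
        return num_gpus_worker_node
    return math.ceil(float(conf_value))

print(resolve_task_gpus("0.5", num_gpus_worker_node=2))     # -> 2
print(resolve_task_gpus("0.5", num_gpus_worker_node=None))  # -> 1
```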
Issue Severity
Medium: It is a significant difficulty but I can work around it.