[Core | Client] Ray client fails to reconnect after explicit disconnect #46403

Closed
shixiaocaia opened this issue Jul 3, 2024 · 1 comment
Labels
bug (Something that is supposed to be working; but isn't) · core (Issues that should be addressed in Ray Core) · core-client (ray client related issues) · P2 (Important issue, but not time-critical)

Comments

@shixiaocaia

What happened + What you expected to happen

Hi,

Our team is using Ray to develop some applications, and I am an intern helping to validate the performance of Ray. My task involves connecting to Ray using the Ray Client API and performing various tasks or deploying actors to measure Ray's performance.

To facilitate this, I implemented an API using FastAPI that connects to Ray through the Ray Client API with the Python SDK. When the application shuts down, I attempt to disconnect the Ray client. However, when I try to reconnect to Ray afterward, I encounter an issue.

I expected Ray to clean up the previous ClientContext and establish a new one. Instead, Ray tries to reuse the last ClientContext and repeatedly attempts to reconnect. Eventually, I receive the following error message:

Request can't be sent because the Ray client has already been disconnected due to an error. Last exception: Failed to reconnect within the reconnection grace period (30s)

I read the Ray documentation, which states that when using multiple Ray clients, each connection must be disconnected explicitly. In practice, however, disconnecting did not have the expected effect. How can I release a Ray Client connection cleanly so that it does not interfere with subsequent connections?
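
One pattern that may help here, sketched under assumptions: keep each client context in a small registry and drop the entry as soon as it is disconnected, so that a later connect always builds a fresh context rather than reusing a stale one. `ClientRegistry` and its `connect_fn` parameter are hypothetical names introduced for illustration. The only Ray calls used are `ray.init(address=..., allow_multiple=True)` and `disconnect()` on the returned context, both taken from the reproduction script in this issue. Whether this avoids the reconnection error on Ray 2.24.0 has not been verified.

```python
import threading
from typing import Any, Callable, Dict


def _ray_connect(address: str) -> Any:
    # Imported lazily so the registry can be exercised without Ray installed.
    import ray

    return ray.init(address=address, allow_multiple=True)


class ClientRegistry:
    """Hypothetical helper: at most one live Ray Client context per cluster IP."""

    def __init__(self, connect_fn: Callable[[str], Any] = _ray_connect) -> None:
        self._connect_fn = connect_fn
        self._clients: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def connect(self, ip: str, port: int = 10001) -> Any:
        """Return the existing context for this IP, or create a new one."""
        address = f"ray://{ip}:{port}"
        with self._lock:
            if ip not in self._clients:
                self._clients[ip] = self._connect_fn(address)
            return self._clients[ip]

    def disconnect(self, ip: str) -> bool:
        """Disconnect and forget the context for this IP; False if none existed."""
        with self._lock:
            client = self._clients.pop(ip, None)
        if client is None:
            return False
        client.disconnect()
        return True

    def disconnect_all(self) -> None:
        """Disconnect every tracked context and leave the registry empty."""
        with self._lock:
            clients, self._clients = self._clients, {}
        for client in clients.values():
            client.disconnect()
```

In the FastAPI app, the `lifespan` handler could call `disconnect_all()` after `yield`, and the `/ray_init` route could call `connect(item.ip)`. Because the entry is removed before `disconnect()` is called, a failed or slow disconnect cannot leave a dead context in the dictionary.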

Versions / Dependencies

OS: Mac
ray[client]==2.24.0
Python 3.10.14
fastapi==0.108.0
pydantic==1.10.13
uvicorn==0.25.0

Reproduction script

The FastAPI code below is a simplified version of the application and reproduces the problem:

from fastapi import FastAPI
from contextlib import asynccontextmanager
import ray
import logging
from pydantic import BaseModel

logger = logging.getLogger(__name__)

ray_cluster_client = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    logger.info("FastAPI application startup")
    yield
    global ray_cluster_client
    for name, client in ray_cluster_client.items():
        client.disconnect()
        logger.info(f"Client {name} disconnected")
    ray_cluster_client.clear()
    logger.info("FastAPI application shutdown")


app = FastAPI(lifespan=lifespan)


class RayAddress(BaseModel):
    ip: str

@app.get("/ping")
async def root():
    return {"message": "Hello World"}

@app.post('/ray_init')
def connect_ray_cluster(item: RayAddress):
    global ray_cluster_client
    if item.ip not in ray_cluster_client:
        try:
            ray_cluster_client[item.ip] = ray.init(address="ray://" + item.ip + ":10001", allow_multiple=True)
            return {"message": "Connected to Ray cluster", "ip": item.ip}
        except ConnectionError as e:
            return {"message": f"Error connecting to Ray cluster: {str(e)}", "ip": item.ip}
    else:
        return {"message": "Client already connected", "ip": item.ip}

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@shixiaocaia shixiaocaia added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 3, 2024
@anyscalesam anyscalesam added the core Issues that should be addressed in Ray Core label Jul 8, 2024
@jjyao jjyao added core-client ray client related issues P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 8, 2024
@jjyao
Contributor

jjyao commented Jul 8, 2024

Hi @shixiaocaia,

Ray Client is not recommended. Are you able to use Ray job submission instead?
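
For reference, a minimal sketch of what the job submission route might look like from the same kind of service. `JobSubmissionClient` and `submit_job` come from Ray's job submission SDK. The helper names `dashboard_url` and `submit_script` are hypothetical, and the example assumes the cluster's dashboard is reachable on its default port, 8265; none of these details come from this thread.

```python
from typing import Optional


def dashboard_url(ip: str, port: int = 8265) -> str:
    """Build the HTTP address of the Ray dashboard, which serves the Jobs API."""
    return f"http://{ip}:{port}"


def submit_script(ip: str, entrypoint: str, working_dir: Optional[str] = None) -> str:
    """Submit a shell entrypoint as a Ray job and return its submission ID."""
    # Imported lazily so that importing this module does not require Ray.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url(ip))
    # Assumption: the job's code lives in a local directory uploaded as working_dir.
    runtime_env = {"working_dir": working_dir} if working_dir else None
    return client.submit_job(entrypoint=entrypoint, runtime_env=runtime_env)
```

A call such as `submit_script("127.0.0.1", "python my_task.py", working_dir="./")` would return an ID that can later be passed to the client's `get_job_status`. Because each job runs as its own driver process on the cluster, the web service holds no long-lived Ray Client connection that needs to be released.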


3 participants