[Core] gcs_server.out becomes too big #45678

Open
jingzhaoo opened this issue Jun 3, 2024 · 3 comments · May be fixed by #46423
Labels
core: Issues that should be addressed in Ray Core
enhancement: Request for new feature and/or capability
P1: Issue that should be fixed within a few weeks
stability

Comments


jingzhaoo commented Jun 3, 2024

What happened + What you expected to happen

We run a few Ray Serve applications. After a few days, our node runs out of disk space, which brings all our Ray applications down. Upon further digging, the cause is the gcs_server.out file growing too big. I checked the source code, and it looks like only gcs_server.err is picked up by the log monitor, not gcs_server.out:

monitor_log_paths += glob.glob(f"{self.logs_dir}/gcs_server*.err")

I am very curious why. Is there any way to limit the max size of gcs_server.out? I appreciate your help!
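To make the observation about the glob concrete, here is a minimal sketch of which file names that pattern selects. The file names in the list are hypothetical examples, not a listing from a real Ray log directory, and `fnmatch` is used here as a stand-in for `glob.glob` so that no files need to exist on disk.

```python
import fnmatch

# Hypothetical file names, for illustration only.
names = ["gcs_server.out", "gcs_server.err", "gcs_server.1.err", "raylet.out"]

# Same pattern as in the quoted line, without the directory prefix.
pattern = "gcs_server*.err"

# Keep only the names the pattern selects.
matched = [n for n in names if fnmatch.fnmatch(n, pattern)]
print(matched)
```

Under this pattern, names ending in `.out` are never selected, which is consistent with what the issue describes.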

Versions / Dependencies

2.22.0

Reproduction script

N/A

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@jingzhaoo jingzhaoo added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jun 3, 2024
@anyscalesam anyscalesam added the core Issues that should be addressed in Ray Core label Jun 3, 2024
Collaborator

jjyao commented Jun 3, 2024

We currently do not rotate gcs_server.out. Can you use an external tool such as logrotate to rotate the log and delete old logs, so that the disk does not fill up?
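As a rough illustration of the external-rotation workaround, the sketch below copies the log to a numbered backup and then truncates the original in place, which is the same idea as logrotate's `copytruncate` option. The function name `copy_truncate` and its parameters are made up for this example and are not part of Ray. It assumes the process writing the log opened it in append mode; if it did not, the writer keeps its old offset after truncation and the file becomes sparse rather than small. Lines written between the copy and the truncate are lost.

```python
import os
import shutil


def copy_truncate(log_path: str, max_bytes: int, keep: int = 3) -> bool:
    """Rotate log_path in place if it is larger than max_bytes.

    Returns True if a rotation happened. Hypothetical helper, not a Ray API.
    """
    if os.path.getsize(log_path) <= max_bytes:
        return False

    # Shift older backups up by one: .2 -> .3, .1 -> .2, and so on.
    # The oldest backup is overwritten.
    for i in range(keep - 1, 0, -1):
        src = f"{log_path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{log_path}.{i + 1}")

    # Copy the current contents to the newest backup slot.
    shutil.copyfile(log_path, f"{log_path}.1")

    # Truncate in place so the writer's open file descriptor stays valid.
    with open(log_path, "r+b") as f:
        f.truncate(0)
    return True
```

A cron job or a small loop could call `copy_truncate(path, max_bytes=...)` periodically on the gcs_server.out path of the node; the path and the size threshold depend on the deployment.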

@jjyao jjyao added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jun 3, 2024

frank-berlin commented Jun 4, 2024

After the file is moved, what action is needed to make the daemon reopen a new file?
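For background on this question: on POSIX systems an open file descriptor follows the file itself, not its name, so a process that keeps its log open continues writing to the moved file until it reopens the path. The sketch below demonstrates this with an ordinary Python writer; `write_after_rename` is a hypothetical helper for illustration, and nothing here is specific to how the GCS server opens its log.

```python
import os
import tempfile


def write_after_rename(directory: str):
    """Write to a file, rename it while it is open, then write again.

    Returns (original_path_exists, contents_of_renamed_file).
    """
    path = os.path.join(directory, "demo.out")
    moved = path + ".1"
    with open(path, "a") as writer:
        writer.write("before\n")
        writer.flush()
        # Move the file while the writer still has it open (POSIX only).
        os.rename(path, moved)
        writer.write("after\n")
        writer.flush()
    with open(moved) as reader:
        contents = reader.read()
    return os.path.exists(path), contents


with tempfile.TemporaryDirectory() as d:
    exists, contents = write_after_rename(d)
    print(exists, repr(contents))
```

Both lines end up in the renamed file and no new file appears at the original path. A plain move therefore does not free space on its own unless the writer reopens its log or the file is truncated in place.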

@manasvitickoo17

Hi, we are encountering the same issue.
Here is a snapshot of the increase in disk usage (1.5 GB per hour) by gcs_server.out:
[image: gcs]

@anyscalesam anyscalesam added stability triage Needs triage (eg: priority, bug/not-bug, and owning component) and removed P1 Issue that should be fixed within a few weeks labels Jun 21, 2024
@jjyao jjyao added P1 Issue that should be fixed within a few weeks enhancement Request for new feature and/or capability and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) bug Something that is supposed to be working; but isn't labels Jul 1, 2024
@Bye-legumes Bye-legumes linked a pull request Jul 3, 2024 that will close this issue
5 tasks