Current recommended way to deal with restarted Streamlit sessions, for both recovering the session state and continuing calculations #9031

andrew-weisman · 2024-07-03T18:42:30Z

Checklist

I have searched the existing issues for similar feature requests.
I added a descriptive title and summary to this issue.

Summary

Background:

Unfortunately the platform where we are primarily deploying Streamlit, Palantir Foundry Code Workspaces, randomly restarts Streamlit sessions. It seems they are unable or unwilling to fix this, which is probably the right way to solve the problem, but I am posting here in case others have similar issues, as this could be a more general topic.

If you run a Jupyter notebook in say JupyterLab and then restart the browser, you see in the console that Jupyter reconnects to the session and everything is preserved nicely.

However, we all know that when Streamlit is restarted (and I personally think Streamlit could be utilized as a more user-friendly Jupyter notebook), nothing is preserved.

We have partially combatted the problem by allowing the user to manually save the session state, but if our users forget, then their work since their last save is lost. Note a consideration at least in our case is that we are performing sometimes complex analyses on large datasets, so at times we have 10-15 GB of data in the session state and are running hours-long computations using multiple cores. Each manual save can therefore take a significant amount of time (~a minute).

Questions:

What is Streamlit's current recommended way to continuously save the session state so that if the session is randomly restarted, all is recovered? This has to be a fast process since again we often have large datasets in memory.
What is Streamlit's current recommended way to persist jobs being run by Streamlit instead of them getting killed if the session is restarted?

Thanks so much in advance for your help!

Why?

No response

How?

No response

Additional Context

No response

github-actions · 2024-07-03T18:42:40Z

To help Streamlit prioritize this feature, react with a 👍 (thumbs up emoji) to the initial post.

Your vote helps us identify which enhancements matter most to our users.

Asaurus1 · 2024-07-08T01:48:48Z

Streamlit isn't really designed to be a "more user-friendly jupyter notebook", but here are some answers to your questions:

What is Streamlit's current recommended way to continuously save the session state so that if the session is randomly restarted, all is recovered? This has to be a fast process since again we often have large datasets in memory.

There isn't a "streamlit" way to do this -- you can handle it like you handle any other long-running python process: periodically save a representation of your calculation "state" to disk, and program a way to load that state back into memory on application startup. One easy way to do this would be to periodically pickle the contents of st.session_state, although your mileage may vary depending on how "large" these large datasets you're talking about are. Large pandas datasets and numpy arrays can be written to disk faster with their own serialization methods than with the pickle library. You may also find it best to detect which portions of the data have changed since the last save, and only save those portions.

None of this is something that streamlit natively provides; so it will require you to write your own code for your own application.
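As a rough illustration of the "only save what changed" idea above, here is a minimal sketch using only the standard library. The function names (`save_changed`, `load_all`) and the one-file-per-key layout are made up for this example and are not part of streamlit. It assumes the state is a dict-like mapping with string keys (st.session_state behaves like one) and that every value is picklable.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Dict, List, Mapping


def _file_for(save_dir: Path, key: str) -> Path:
    # Hash the key so that arbitrary key strings map to safe file names.
    return save_dir / (hashlib.sha256(key.encode()).hexdigest() + ".pkl")


def save_changed(state: Mapping[str, Any], save_dir: Path,
                 digests: Dict[str, str]) -> List[str]:
    """Write one pickle file per entry, skipping entries whose bytes are unchanged.

    `digests` remembers the hash of what was last written for each key and is
    updated in place. Returns the keys that were written on this call.
    """
    save_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, value in state.items():
        payload = pickle.dumps((key, value), protocol=pickle.HIGHEST_PROTOCOL)
        digest = hashlib.sha256(payload).hexdigest()
        if digests.get(key) == digest:
            continue  # same bytes as last time, nothing to write
        target = _file_for(save_dir, key)
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(payload)
        tmp.replace(target)  # rename into place so a crash never leaves a half-written file
        digests[key] = digest
        written.append(key)
    return written


def load_all(save_dir: Path) -> Dict[str, Any]:
    """Read every saved entry back into a plain dict."""
    restored = {}
    for path in save_dir.glob("*.pkl"):
        key, value = pickle.loads(path.read_bytes())
        restored[key] = value
    return restored
```

In an app you would call something like `save_changed(st.session_state, some_dir, digests)` periodically and, on startup, copy the result of `load_all(some_dir)` back into st.session_state. Note that this sketch still serializes every value in order to hash it, so it only saves the disk writes. For multi-GB objects you would probably want a cheaper change signal (for example a version counter you bump yourself) and a faster format for dataframes and arrays.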

What is Streamlit's current recommended way to persist jobs being run by Streamlit instead of them getting killed if the session is restarted?

If the entire streamlit python process is getting killed and restarted:

You need to run your long-running jobs as separate processes on the computer, and communicate with them via some means. What you're looking for is something like Celery, a python-based task scheduler that can be run along-side a streamlit-based UI. You can send it messages to have it start and stop certain jobs, and monitor their status as they run.
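This is not Celery, but the underlying idea (a job that runs as its own process and reports back through some channel) can be sketched with just the standard library. Everything here is a hypothetical illustration: the `launch_job` and `read_result` names, and the `job.log`, `job.pid` and `result.txt` file names, are invented for the example. Whether a detached process actually outlives a restart depends on how the platform kills things. If the whole container is torn down, nothing started this way will survive, and you need a worker running somewhere else.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def launch_job(script_args: List[str], job_dir: Path) -> int:
    """Start a Python worker in its own session and record its PID.

    `script_args` is whatever follows `python` on the command line,
    e.g. ["worker.py", "--input", "data.parquet"].
    """
    job_dir.mkdir(parents=True, exist_ok=True)
    with open(job_dir / "job.log", "ab") as log:
        proc = subprocess.Popen(
            [sys.executable, *script_args],
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            # POSIX only: put the worker in a new session so it is not in
            # the UI process's process group.
            start_new_session=True,
        )
    (job_dir / "job.pid").write_text(str(proc.pid))
    return proc.pid


def read_result(job_dir: Path) -> Optional[str]:
    """Return the worker's output if it has written any, else None."""
    result = job_dir / "result.txt"
    return result.read_text() if result.exists() else None
```

The UI side would call `launch_job(...)` once, and on every rerun (or after a restart) call `read_result(job_dir)` to see whether the worker has finished. The worker itself is responsible for writing `result.txt` into the same directory.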

If it's just the user sessions that are getting removed and restarted (but the streamlit python process is persisting):

You may be able to use something like my https://github.com/Asaurus1/streamlit-process-manager to spawn child python processes that will process your data and output results to a file that you can then read back into the streamlit front-end. Keep in mind, this library is in alpha so use it at your own risk.

andrew-weisman · 2024-07-12T02:47:40Z

Thanks so much for your suggestions and comments @Asaurus1!

Good points with your first set of suggestions. We are actually having users manually save the session, but there are performance issues with this and we haven't put the effort into implementing incremental saves. We haven't tried the df.to_pickle() etc. ideas you suggest, so that could help with performance... right now we're basically saving everything in the session state, saving what we can with pickle (fast), and everything else (basically custom classes) with dill (slow), since pickle can't reliably handle custom classes. Regardless, it would be really nice if Streamlit acted like Jupyter, which seems to be completely robust to these session disconnects/restarts.

Good ideas with Celery and your streamlit-process-manager tool. We toyed around with our own custom solutions for this sort of thing but good to know what else is out there. Will give them a try!

Asaurus1 · 2024-07-12T03:00:01Z

Glad to hear you've already tried some of the things I suggested!

You mentioned that Jupyter seems to be "robust to these session disconnects/restarts". I've used Jupyter many times and I am not aware of a built-in mechanism for preserving the state of the kernel across a restart. What you may be experiencing instead is that when a user reconnects to a running Jupyter kernel with a notebook, they reconnect to the same session that they left. In Streamlit, if a user disconnects (e.g. there's a network disruption because they lose Wi-Fi or the server is having problems), then when they reconnect Streamlit assumes they're a new user and drops them into a new session with a new session state.

If this is truly what's happening when you say you're experiencing disconnects, then there is hope. Currently there is no "global" equivalent of st.session_state, but the @st.cache_resource decorator does have global state. You can make a class that contains all the state variables you need to access, and then a function that creates an instance of that class and returns it. If you decorate that function with cache_resource, that state object will be preserved across all sessions and available to every streamlit session.

This obviously comes with all of the normal concurrency issues that you would get when you have multiple people accessing the same memory from different sessions. You could get around this by asking users to provide a username, and then giving them an instance of your new GlobalState class that is tied to their username. There are many ways to skin this cat :)
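To make the per-username idea concrete, here is one possible shape for it. This is only a sketch: the `GlobalState` class, its `for_user` method and the `get_global_state` function are names invented for the example. The streamlit wiring is left in comments so the class itself can be run and tested on its own. `st.cache_resource` is the real decorator mentioned above.

```python
import threading
from typing import Any, Dict


class GlobalState:
    """Process-wide store holding one independent state dict per username."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._per_user: Dict[str, Dict[str, Any]] = {}

    def for_user(self, username: str) -> Dict[str, Any]:
        # The lock only guards creation and lookup of the per-user dict.
        # Two sessions logged in as the same user still share that dict,
        # so writes into it are not synchronized by this class.
        with self._lock:
            return self._per_user.setdefault(username, {})


# In the streamlit app this would be wired up roughly as follows
# (commented out so the sketch runs without streamlit installed):
#
#   import streamlit as st
#
#   @st.cache_resource
#   def get_global_state() -> GlobalState:
#       return GlobalState()
#
#   username = st.text_input("Username")
#   if username:
#       my_state = get_global_state().for_user(username)
#       my_state.setdefault("visits", 0)
#       my_state["visits"] += 1
```

Because the cached `GlobalState` lives in the server process, this only helps when sessions are dropped while the streamlit process itself keeps running. It does nothing if the process is killed, and a typed-in username is not authentication.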

If I'm wrong about what you mean by "disconnects/restarts" then perhaps you could share more details?

andrew-weisman added the type:enhancement Requests for feature enhancements or new features label Jul 3, 2024
