Current recommended way to deal with restarted Streamlit sessions, for both recovering the session state and continuing calculations #9031
Comments
Streamlit isn't really designed to be a "more user-friendly Jupyter notebook", but here are some answers to your questions:
There isn't a "Streamlit" way to do this: you can handle it like you handle any other long-running Python process. Periodically save a representation of your calculation "state" to disk, and then program a means to load that state back into memory on application startup. One easy way to do this would be to periodically ... None of this is something that Streamlit natively provides, so you will need to write your own code for your own application.
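The save-and-reload approach described above can be sketched as follows. This is a minimal illustration and not something Streamlit provides: the file name `checkpoint.pkl`, the 300-second interval, the `state` dictionary layout, and the functions `save_checkpoint`, `load_checkpoint`, and `run_calculation` are all hypothetical, and the loop body stands in for a real calculation. It assumes the state is picklable.

```python
import os
import pickle
import tempfile
import time
from pathlib import Path
from typing import Optional

CHECKPOINT_PATH = Path("checkpoint.pkl")  # hypothetical location
SAVE_INTERVAL_S = 300.0                   # hypothetical: save every 5 minutes


def save_checkpoint(state: dict, path: Path = CHECKPOINT_PATH) -> None:
    """Write the state to a temp file, then atomically swap it into place.

    A restart in the middle of a save then leaves the previous
    checkpoint intact instead of a half-written file.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_checkpoint(path: Path = CHECKPOINT_PATH) -> Optional[dict]:
    """Return the last saved state, or None if no checkpoint exists."""
    if not path.exists():
        return None
    with path.open("rb") as f:
        return pickle.load(f)


def run_calculation(total_steps: int,
                    path: Path = CHECKPOINT_PATH,
                    save_interval_s: float = SAVE_INTERVAL_S) -> dict:
    """Resume from the last checkpoint (if any) and save periodically."""
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    last_save = time.monotonic()
    while state["step"] < total_steps:
        state["total"] += state["step"]  # placeholder for real work
        state["step"] += 1
        if time.monotonic() - last_save >= save_interval_s:
            save_checkpoint(state, path)
            last_save = time.monotonic()
    save_checkpoint(state, path)  # final save once the work is done
    return state
```

Calling `run_calculation` again after a restart picks up from the last saved `step` rather than from zero. For multi-gigabyte state, saving only the pieces that changed since the last checkpoint would likely matter more than the serialization format.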
If the entire
If it's just the user sessions that are getting removed and restarted (but the Streamlit Python process is persisting):
Thanks so much for your suggestions and comments @Asaurus1! Good points with your first set of suggestions. We are actually having users manually save the session, but there are performance issues with this and we haven't put the effort into implementing incremental saves. We haven't tried the df.to_pickle() etc. ideas you suggest, so that could help with performance... right now we're basically saving everything in the session state, saving what we can with pickle (fast), and everything else (basically custom classes) with dill (slow), since pickle can't reliably handle custom classes. Regardless, it would be really nice if Streamlit acted like Jupyter, which seems to be completely robust to these session disconnects/restarts. Good ideas with Celery and your streamlit-process-manager tool. We toyed around with our own custom solutions for this sort of thing, but it's good to know what else is out there. Will give them a try!
Glad to hear you have already tried and checked out some of the things I suggested! You mentioned that Jupyter seems to be "robust to these session disconnects/restarts". I've used Jupyter many times and I am not aware of a built-in mechanism for preserving the state of the kernel across a restart. What you may be experiencing instead is that when a user reconnects to a running Jupyter kernel with a notebook, they reconnect to the same session that they left. In Streamlit, if a user disconnects (e.g. there's a network disruption because they lose Wi-Fi or the server is having problems), then when they reconnect Streamlit assumes they're a new user and drops them into a new session with a new session state. If this is truly what's happening when you say you're experiencing disconnects, then there is hope.

Currently there is no "global" equivalent of st.session_state, but @st.cache_resource does have global state. You can make a class that contains all the state variables you need to access, and then a function that creates an instance of that class and returns it. If you decorate that function with cache_resource, that state object will be preserved across all sessions and available to every Streamlit session. It obviously comes with all of the normal concurrency issues that you get when multiple people access the same memory from different sessions. You could get around this by asking users to provide a username, and then giving them an instance of your new GlobalState class that is tied to their username. There are many ways to skin this cat :)

If I'm wrong about what you mean by "disconnects/restarts" then perhaps you could share more details?
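A rough sketch of the cache_resource idea, under some assumptions: `st.cache_resource` is the real Streamlit decorator, but `UserState`, `GlobalState.for_user`, and `get_global_state` are hypothetical names, and the per-user lock is just one possible way to handle the concurrency issue mentioned above. The `functools.lru_cache` fallback is there only so the sketch can be exercised where Streamlit is not installed.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict

try:
    import streamlit as st
    cache_resource = st.cache_resource
except ImportError:
    # Assumption: stand-in so the sketch runs without Streamlit installed.
    import functools
    cache_resource = functools.lru_cache(maxsize=None)


@dataclass
class UserState:
    """Everything one user needs to get back after reconnecting."""
    values: dict = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)


class GlobalState:
    """Process-wide registry of per-user state, keyed by username."""

    def __init__(self) -> None:
        self._users: Dict[str, UserState] = {}
        self._lock = threading.Lock()

    def for_user(self, username: str) -> UserState:
        # Guard the registry: several sessions may call this at once.
        with self._lock:
            if username not in self._users:
                self._users[username] = UserState()
            return self._users[username]


@cache_resource
def get_global_state() -> GlobalState:
    """Created once per server process and shared by every session."""
    return GlobalState()
```

In the app, something like `state = get_global_state().for_user(username)` (with `username` taken from, say, `st.text_input`) would hand a reconnecting user the same `UserState` they had before, as long as the Streamlit process itself has not restarted. Writes should go through `with state.lock:` if the same user might have more than one session open.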
Checklist
Summary
Background:
Unfortunately the platform where we are primarily deploying Streamlit, Palantir Foundry Code Workspaces, randomly restarts Streamlit sessions. It seems they are unable or unwilling to fix this, which is probably the right way to solve the problem, but I am posting here in case others have similar issues, as this could be a more general topic.
If you run a Jupyter notebook in, say, JupyterLab and then restart the browser, you can see in the console that Jupyter reconnects to the session and everything is preserved nicely.
However, we all know that when Streamlit is restarted (and I personally think Streamlit could be utilized as a more user-friendly Jupyter notebook), nothing is preserved.
We have partially combatted the problem by allowing the user to manually save the session state, but if our users forget, then their work since their last save is lost. One consideration, at least in our case, is that we are sometimes performing complex analyses on large datasets, so at times we have 10-15 GB of data in the session state and are running hours-long computations using multiple cores. Each manual save can therefore take a significant amount of time (~a minute).
Questions:
Thanks so much in advance for your help!
Why?
No response
How?
No response
Additional Context
No response