Out of memory when generating the specially formatted files (tfrecords) for pre-training #173

Open
TITONIChen opened this issue Jan 18, 2023 · 1 comment

TITONIChen commented Jan 18, 2023

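Hi all. When generating the specially formatted files for pre-training, if the input file is large (e.g. news_zh_1.txt; mine is about 600 MB), running create_pretrain_data.sh takes a very long time (more than 4 hours), and once usage of the 96 GB of memory reaches 100% the process is killed. How do you all handle this situation? Is the only option to split the file and run the unsupervised training in stages?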
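For reference, the file-splitting idea raised in the question could look roughly like the sketch below. It is only an illustration: `split_text_file`, the shard naming, and the 50 MB default are hypothetical and not part of this repository. It streams the input line by line, so the splitting step itself needs very little memory.

```python
import os


def split_text_file(src_path, out_dir, max_bytes=50 * 1024 * 1024):
    """Split src_path into shards of roughly max_bytes each.

    Hypothetical helper, not part of the repository. Cuts happen only at
    line ends, and the file is read in binary mode so the original
    encoding is preserved byte for byte.
    """
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:  # streamed: only one line is held in memory
            # Start a new shard at the beginning or once the current one is full.
            if out is None or written >= max_bytes:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"shard_{len(shard_paths):04d}.txt")
                out = open(path, "wb")
                shard_paths.append(path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return shard_paths
```

Each shard could then be passed through the data-generation step on its own, producing one tfrecords file per shard. How create_pretrain_data.sh takes its input path is not shown in this thread, so that wiring is left out. Note also that if the corpus separates documents with blank lines, a cut at an arbitrary line may split one document across two shards.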

@zhuchenxi

Same question here.
