[BUG] cudf JAVA binding failed Maximum pool size exceeded in LargeTableTest #16199

Closed
pxLi opened this issue Jul 5, 2024 · 2 comments · Fixed by #16216
Labels: bug (Something isn't working), Java (Affects Java cuDF API), tests (Unit testing for project)

Comments

@pxLi
Member

pxLi commented Jul 5, 2024

Describe the bug
cudf_nightly-dev runs 1286 and 1287 (the two most recent runs)

ai.rapids.cudf.LargeTableTest failed on both CUDA 11 and CUDA 12 (A30 and L4 GPUs with 24 GB of memory)

[2024-07-04T12:47:34.687Z] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.0:test (default-test) on project cudf: There are test failures.
[2024-07-04T12:47:34.687Z] [ERROR] 
[2024-07-04T12:47:34.687Z] [ERROR] Please refer to /home/jenkins/agent/workspace/jenkins-cudf_nightly-dev-github-1287-cuda11/java/target/surefire-reports for the individual test results.
[2024-07-04T12:47:34.687Z] [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream.
[2024-07-04T12:47:34.687Z] [ERROR] There was an error in the forked process
[2024-07-04T12:47:34.687Z] [ERROR] Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/jenkins/agent/workspace/jenkins-cudf_nightly-dev-github-1287-cuda11/cpp/build/_deps/rmm-src/include/rmm/mr/device/pool_memory_resource.hpp:255: Maximum pool size exceeded
[2024-07-04T12:47:34.687Z] [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: There was an error in the forked process
[2024-07-04T12:47:34.687Z] [ERROR] Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/jenkins/agent/workspace/jenkins-cudf_nightly-dev-github-1287-cuda11/cpp/build/_deps/rmm-src/include/rmm/mr/device/pool_memory_resource.hpp:255: Maximum pool size exceeded
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:658)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:533)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:278)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:244)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1194)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1022)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:868)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[2024-07-04T12:47:34.687Z] [ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[2024-07-04T12:47:34.688Z] [ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[2024-07-04T12:47:34.688Z] [ERROR] 	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[2024-07-04T12:47:34.688Z] [ERROR] 	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:954)
[2024-07-04T12:47:34.688Z] [ERROR] 	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
[2024-07-04T12:47:34.688Z] [ERROR] 	at org.apache.maven.cli.MavenCli.main(MavenCli.java:192)
[2024-07-04T12:47:34.688Z] [ERROR] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2024-07-04T12:47:34.688Z] [ERROR] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2024-07-04T12:47:34.688Z] [ERROR] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2024-07-04T12:47:34.688Z] [ERROR] 	at java.lang.reflect.Method.invoke(Method.java:498)
[2024-07-04T12:47:34.688Z] [ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[2024-07-04T12:47:34.688Z] [ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[2024-07-04T12:47:34.688Z] [ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[2024-07-04T12:47:34.688Z] [ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)

Expected behavior
The unit test (LargeTableTest) passes.

@pxLi pxLi added bug Something isn't working Java Affects Java cuDF API. tests Unit testing for project labels Jul 5, 2024
@davidwendt
Contributor

This looks very similar to the error I got when running CI while working on #16037:
https://github.com/rapidsai/cudf/actions/runs/9649886400/job/26623988951#step:9:2361

Can you check that your libcudf CMake build sets -DCUDF_LARGE_STRINGS_DISABLED?
Otherwise, you can disable large strings by setting this environment variable:

export LIBCUDF_LARGE_STRINGS_ENABLED=0

Or perhaps this is something else, but the error looked very similar.
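
For anyone who wants to confirm what the test JVM actually sees, here is a minimal sketch that reads the variable from the process environment. The class name LargeStringsEnvCheck and the helper disabledByEnv are hypothetical and not part of cudf. The sketch only recognizes the value 0 suggested above; how libcudf interprets any other value is not covered in this thread, so it is not modeled here.

```java
// Hypothetical helper, not part of the cudf Java API.
class LargeStringsEnvCheck {
    // Environment variable name taken from the comment above.
    static final String ENV_NAME = "LIBCUDF_LARGE_STRINGS_ENABLED";

    // Returns true only when the variable is explicitly set to "0",
    // the value suggested in this thread for disabling large strings.
    // Assumption: any other value, or an unset variable, is reported
    // as "not explicitly disabled" because its meaning is unknown here.
    static boolean disabledByEnv(String value) {
        return value != null && value.trim().equals("0");
    }

    public static void main(String[] args) {
        String value = System.getenv(ENV_NAME);
        System.out.println(ENV_NAME + "=" + value
            + ", explicitly disabled: " + disabledByEnv(value));
    }
}
```

Running this in the same shell that launches the tests shows whether the export took effect. Whether a forked test JVM inherits the same environment depends on how the build launches it, so it may be worth running the check from inside the test process as well.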

cc @jlowe

@jlowe jlowe self-assigned this Jul 8, 2024
@jlowe
Member

jlowe commented Jul 8, 2024

Thanks for taking a look, @davidwendt. In #16037 I missed that we need to add the CUDF_LARGE_STRINGS_DISABLED flag to the Java pom. I added it to spark-rapids-jni which we use for the Spark plugin, but I missed it for the cudf jar artifact. I'll post a PR shortly.
