-
-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TestJlmRemoteMemoryNoAuth intermittently fails with 'Peak Usage used memory smaller than Current Usage used memory' #274
Comments
5x Grinder passed 100%: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1660/console |
Summary of what the failing test is doing:
So, intermittently, on various platforms, this validation is failing. Question is - whether or not this failure indicates something wrong with what we are testing - i.e., the JLM API of the SDK under test. Java doc for java.lang.management.MemoryPoolMXBean.getUsage()
Javadoc for java.lang.management.MemoryPoolMXBean.getPeakUsage()
FYI.. |
@LinHu2016 can you please take a look. Any relation to eclipse-openj9/openj9#5324? |
@pshipton I will take a look the issue, Thanks |
base on example (https://ci.adoptopenjdk.net/view/Test_system/job/openjdk12_hs_systemtest_x86-64_windows/71/tapResults/), the error is for OpenJDK hotspot G1, please provide the error case on OpenJ9, Thanks CL1 stderr j> 2019/06/10 11:07:07.499 Peak usage smaller than current usage here: |
@LinHu2016 : The error is intermittent. Started a Grinder for OpenJ9 here: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1664/console |
Hi @Mesbah-Alam,Could you please share test code or point the test src code location? if it has the issue on both Hotspot and OpenJ9, sounds like issue is on JCL or testing code. |
another thought it might relate nonheap memory pool(such as codecache?), wait the grinder to catch the issue on OpenJ9 for investigating the details. |
@Mesbah-Alam the Grinder failed to run the tests. |
Likely something is wrong with the Windows build at Adopt. The failure above happened due to AUTO_DETECT mechanism being turned on but the java -version output not containing "Compressed references" string. I launched another Grinder with AUTO_DETECT off, using the published build at Adopt, that failed too: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1675/tapResults/ |
Still occurring from time to time: https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_sanity.system_aarch64_linux/400 |
This is still occurring in the test builds.
are updated by the JVM? I'm looking for a possible explanation for the intermittent test case behavior - for instance, is the data returned by getUsage() always 'live' data but the getPeakUsage() data only updated periodically? |
Hi @lumpfish Could you please provide a failure case? I would like investigate the case. In general for heap memory pools, the peakUsage would be updated at the begin of GC and at the end of GC or when getUsage(), getPeakUsage() and resetPeakUsage() are called by user, so usage should never be larger than peakUsage. previously there was a bug for retrieving usage for the non heap memory pool, I would like confirm if the failure is related with the fix. Thanks |
also getUsage(), getPeakUsage() are two independent api, heap usage could be changed between api calls, so for realtime you should call getUsage before calling getPeakUsage if you want to compare between the two sizes. |
so above the validation code should be like this
|
@LinHu2016 - thank you for the insight about why current usage might be reported as higher than peak usage, I'll try reversing the API calls as you suggest and see if it makes a difference. I haven't managed to recreate this failure in a smaller, simpler test case, nor reproduce it locally - I have only seen the failure on Jenkins slave machines. If you want to run the test locally:
For Unix
The test creates a new directory under RESULTS_HOME. In there you will see an execute.pl file which is run during the test but can be rerun manually. So the commands issued in the test can be tweaked (e.g. to add extra arguments to java command lines) and the test rerun. Each time the test is run via stf.pl a new directory is created with a newly generated execute.pl etc. files. The java test case code is in the openjdk.test.jlm directory in the openjdk-systemtest repository (under SYSTEMTEST_HOME). If you make want to try changes to those files, edit, rebuild and rerun. |
I mean call getUsagge() before calling getPeakUsage(), not current_usage.getUsed() before peak_usage.getUsed(), so for the confusion. |
After more investigation (using hotspot jdk8 mac, which shows this failure more often than other platforms (about 1 in 10 test runs)), is seems that peak_usage is not updated often enough to guarantee that it will always be equal to or greater than the maximum value that current usage has ever had. Adding some additional code to the test case demonstrates this:
The test case was modified such that if the returned peak usage value was less than the returned current usage value the values were retrieved once more. Upon retrieving the values a second time, the peak usage value was unchanged, but the current usage value has fallen back below the peak usage value. The openjdk test case https://github.com/openjdk/jdk/blob/05a764f4ffb8030d6b768f2d362c388e5aabd92d/test/hotspot/jtreg/compiler/codecache/jmx/PeakUsageTest.java#L77 performs a similar test, suggesting that the behavior observed here is not expected. |
I add some debug info in CodeHeapPool::used_in_bytes to print _next_segment and _freelist_segments debug info like this
and print the CodeBlob's name at CodeCache::free and open log in NMethodSweeper::sweep_code_cache then reproduce this issue ,the debug is below
I think this issue is caused by CodeCache sweep, |
I had been investigating along these lines:
Running locally
I never thought to look at Code Cache sweep though. |
the CodeCache usage is related to _next_segment - _freelist_segments In my debug log , _next_segment alaways increase, but _freelist_segments increase or decrease dynamically . |
In my experiment I only modified I added a call:
before
with the same logic as It appears to stop the problem occurring - but now peak_usage will be updated more frequently. As a rough performance assessment I was going to add counters for the number of times peak_usage is updated with and without the fix. I haven't got round to it yet - busy on other things at the moment. I tried reproducing the issue using the openjdk head stream (openjdk/jdk repository), but could not. Have you seen it on a release other than jdk8? |
My working JDK is jdk8,I only run the systemtest on jdk8. |
I've created a branch with my attempt at a fix: The change currently includes code to turn the behaviour on and off and print some diagnostics using environment variables:
Running the test with and without UPDATE_PEAK_USAGE_IN_GET_MEMORY_USAGE set shows that with the change _peak_usage is updated approximately 1.5 to 2.5 times more frequently with the change depending on the memory pool. The test runs for approximately 5 minutes. Output at end of the test without the change (i.e. without enviroment variable UPDATE_PEAK_USAGE_IN_GET_MEMORY_USAGE set):
Output at end of the test with the change (i.e. with enviroment variable UPDATE_PEAK_USAGE_IN_GET_MEMORY_USAGE set):
|
I have created a standalone client which can attach to any java process (started with the required command line options) and monitor the peak usage in that process. Using this client the issue can also be seen when running the java db-shootout benchmark from the renaissance benchmarks project https://github.com/renaissance-benchmarks/renaissance. (Presumably other benchmarks would also demonstrate the issue.) When running the benchmark for 5 minutes, peak usage being lower than current usage is reported about one in eight runs. This is the client test case: To allow the client to be attached to a running process, that process must have the following options added to its command line:
To reproduce using the db-shootout benchmark:
The test case monitors the peak usage every second 240 times (so should complete before the benchmark). If the benchmark ends while the client test case is still running the client ends abnormally with various exceptions (ConnectException, SocketException or others). If at any point peak usage is lower than current usage the test case outputs a message:
showing the name of the memory pool with the error (Code Cache), and the usage values returned. Since the error occurs infrequently this shell script will run both the benchmark and the client repeatedly and should ensure that the benchmark never ends before the client test case.
|
@andrew-m-leonard - FYI, #274 (comment) |
I've now seen this on a jdk11 hotspot aarch64_linux nightly build for a different memory pool.
|
The test fails randomly with jdk8 at a high rate and according to the test history reports it's not platform nor machine related. While the test validity is still under investigation I will exclude it for jdk8 temurin. |
adoptium/aqa-systemtest#274 (comment) Signed-off-by: Sophia Guo <[email protected]>
* Exclude TestJlmRemoteMemoryNoAuth with jdk8 temurin adoptium/aqa-systemtest#274 (comment) Signed-off-by: Sophia Guo <[email protected]> * Update os-arch to be linux-all Test fails running within a docker container, should be excluded for linux-all Signed-off-by: Sophia Guo <[email protected]> * Update exclude os-arch Signed-off-by: Sophia Guo <[email protected]> * Temp change for test only Signed-off-by: Sophia Guo <[email protected]> * update exclude tagged issues Signed-off-by: Sophia Guo <[email protected]> * Remove duplicate excludes Signed-off-by: Sophia Guo <[email protected]> * java/lang/ProcessBuilder/Basic.java on solaris-x64 jdk8 Signed-off-by: Sophia Guo <[email protected]> * Remove duplicate exclude item Signed-off-by: Sophia Guo <[email protected]> * exclude known issue Signed-off-by: Sophia Guo <[email protected]> * Fix the indentation Signed-off-by: Sophia Guo <[email protected]>
also happened with TestJlmRemoteMemoryAuth on alpine-linux |
The similar JDK test was failing and excluded temporarily, but fixed here: The peak usage is acquired before current usage which is done in |
TestJlmRemoteMemoryNoAuth test intermittently fails with
Peak Usage used memory smaller than Current Usage used memory
.We should revisit the unstable test to check its validity.
Example:
https://ci.adoptopenjdk.net/view/Test_system/job/openjdk12_hs_systemtest_x86-64_windows/71/tapResults/
The text was updated successfully, but these errors were encountered: