S3 download leak connection #5355

Open
sbxz opened this issue Jul 2, 2024 · 2 comments
Labels
bug This issue is a bug. needs-triage This issue or PR still needs to be triaged.

Comments

sbxz commented Jul 2, 2024

Describe the bug

We are experiencing connection leak issues with the AWS SDK when downloading a file from S3.
If an error occurs just before we subscribe to the download stream (or if we never subscribe to the stream at all), the connection is never released.
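
To make this concrete, here is a minimal sketch (it reuses the s3AsyncClient, BUCKET and S3_KEY placeholders from the reproduction below, and assumes the pooled connection is only handed back once the ResponsePublisher has been consumed or cancelled):

    // Leaks: the returned ResponsePublisher is never subscribed, so the pooled
    // connection backing it is never released.
    s3AsyncClient.getObject(r -> r.bucket(BUCKET).key(S3_KEY),
        AsyncResponseTransformer.toPublisher());

    // Workaround sketch: explicitly draining the publisher (even without using
    // the bytes) releases the connection back to the pool.
    s3AsyncClient.getObject(r -> r.bucket(BUCKET).key(S3_KEY),
            AsyncResponseTransformer.toPublisher())
        .thenCompose(publisher -> publisher.subscribe(byteBuffer -> { /* discard */ }));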

Expected Behavior

I would expect the connection to be released after a certain period of inactivity, but none of the timeouts I have configured seem to have any effect.

Current Behavior

Once all the connections in the pool are occupied, we get connection acquisition errors, even an hour later.

Caused by: java.lang.Throwable: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.
Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout.
If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests.
	at software.amazon.awssdk.http.nio.netty.internal.utils.NettyUtils.decorateException(NettyUtils.java:69) ~[netty-nio-client-2.25.24.jar:na]
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:307) ~[netty-nio-client-2.25.24.jar:na]
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.makeRequestListener(NettyRequestExecutor.java:188) ~[netty-nio-client-2.25.24.jar:na]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.DefaultPromise.access$200(DefaultPromise.java:35) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:503) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]
Caused by: java.util.concurrent.TimeoutException: Acquire operation took longer than 10000 milliseconds.
	at software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.timeoutAcquire(HealthCheckedChannelPool.java:77) ~[netty-nio-client-2.25.24.jar:na]
	at software.amazon.awssdk.http.nio.netty.internal.HealthCheckedChannelPool.lambda$acquire$0(HealthCheckedChannelPool.java:67) ~[netty-nio-client-2.25.24.jar:na]
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
	... 7 common frames omitted

Reproduction Steps

For example, here is a simple test with a single connection in the pool: if I call /download twice, the second call never manages to acquire a connection.

    @GetMapping("/download")
    public Mono<Void> download() {
        return Mono.fromFuture(this.s3AsyncClient.getObject(r ->
                    r.bucket(BUCKET)
                        .key(S3_KEY),
                AsyncResponseTransformer.toPublisher()))
            // the ResponsePublisher is dropped without ever being subscribed,
            // so the connection it holds on to is never returned to the pool
            .then();
    }
    
    @Bean
    public S3AsyncClient s3AsyncClients(final S3Configuration s3Configuration,
        final SdkAsyncHttpClient sdkAsyncHttpClient)
        throws URISyntaxException {

        return S3AsyncClient.builder()
            .httpClient(sdkAsyncHttpClient)
            .serviceConfiguration(s3Configuration)
            .forcePathStyle(Boolean.TRUE)
            .endpointOverride(new URI(ENDPOINT))
            .region(Region.of(REGION))
            .credentialsProvider(() -> AwsBasicCredentials.create(ACCESS_KEY, PRIVATE_KEY))
            .build();
    }

    @Bean
    public SdkAsyncHttpClient sdkAsyncHttpClient() {
        return NettyNioAsyncHttpClient.builder()
            .maxConcurrency(1)
            .connectionTimeToLive(Duration.ofSeconds(2))
            .connectionTimeout(Duration.ofSeconds(2))
            .connectionAcquisitionTimeout(Duration.ofSeconds(10))
            .connectionMaxIdleTime(Duration.ofSeconds(2))
            .writeTimeout(Duration.ofSeconds(2))
            .tlsNegotiationTimeout(Duration.ofSeconds(2))
            .readTimeout(Duration.ofSeconds(2))
            .tcpKeepAlive(false)
            .useIdleConnectionReaper(true)
            .build();
    }

    @Bean
    public S3Configuration s3Configuration() {
        return S3Configuration.builder()
            .checksumValidationEnabled(Boolean.FALSE)
            .chunkedEncodingEnabled(Boolean.TRUE)
            .build();
    }
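
For comparison, a workaround sketch (not a fix): a variant of the handler above that subscribes to and drains the ResponsePublisher, which should release the connection back to the pool if the leak is indeed caused by the missing subscription. The /download-drained mapping and the Flux.from draining are illustration only, assuming Reactor's Flux is available on the classpath.

    @GetMapping("/download-drained")
    public Mono<Void> downloadDrained() {
        return Mono.fromFuture(this.s3AsyncClient.getObject(r ->
                    r.bucket(BUCKET)
                        .key(S3_KEY),
                AsyncResponseTransformer.toPublisher()))
            // subscribing to the body stream (and discarding the ByteBuffers)
            // is what lets the underlying connection go back to the pool
            .flatMapMany(Flux::from)
            .then();
    }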

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.26.12

JDK version used

21

Operating System and version

Windows 11

kuzd4niil commented

Hello @sbxz, I faced the same problem.
I think this problem is related to reactor-core#3541.

BastianBue commented

We are currently facing this issue as well.

ClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
	at s.a.a.c.e.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at s.a.a.c.e.SdkClientException.create(SdkClientException.java:47)
	at s.a.a.c.i.h.p.s.u.RetryableStageHelper2.setLastException(RetryableStageHelper2.java:226)
	at s.a.a.c.i.h.p.s.RetryableStage2.execute(RetryableStage2.java:65)
	at s.a.a.c.i.h.p.s.RetryableStage2.execute(RetryableStage2.java:36)
	at s.a.a.c.i.h.p.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at s.a.a.c.i.h.StreamManagingStage.execute(StreamManagingStage.java:53)
	at s.a.a.c.i.h.StreamManagingStage.execute(StreamManagingStage.java:35)
	... 33 frames truncated (including 14 common frames)
Caused by: o.a.h.c.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
	at o.a.h.i.c.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:316)
	at o.a.h.i.c.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:282)
	at s.a.a.h.a.i.c.ClientConnectionRequestFactory$DelegatingConnectionRequest.get(ClientConnectionRequestFactory.java:92)
	at s.a.a.h.a.i.c.ClientConnectionRequestFactory$InstrumentedConnectionRequest.get(ClientConnectionRequestFactory.java:69)
	... 5 frames excluded
	at s.a.a.h.a.i.i.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
	at s.a.a.h.a.ApacheHttpClient.execute(ApacheHttpClient.java:254)
	at s.a.a.h.a.ApacheHttpClient.access$500(ApacheHttpClient.java:104)
	at s.a.a.h.a.ApacheHttpClient$1.call(ApacheHttpClient.java:231)
	... 59 frames truncated (including 37 common frames)
