Is your feature request related to a problem? Please describe.
Consider the following code:
var endpoints = new[]
{
    new DnsEndPoint("localhost", 5000),
    new DnsEndPoint("localhost", 6000)
};

var grpcClient = CreateGrpcClient(endpoints);

using var serverStream = grpcClient.ServerStream(new MyGrpcRequest());

// This server stream is effectively endless - it constantly pushes new responses and never returns.
while (await serverStream.ResponseStream.MoveNext())
{
    // process response
}
static MyGrpcClient CreateGrpcClient(IReadOnlyCollection<DnsEndPoint> endpoints)
{
    const string Scheme = "test";
    const string ServiceName = "test";

    var services = new ServiceCollection();

    services.AddLogging(logging =>
    {
        logging
            .SetMinimumLevel(LogLevel.Information)
            .AddFilter("Grpc.Net.Client.Balancer", LogLevel.Trace);

        logging.AddSimpleConsole(console =>
        {
            console.TimestampFormat = "[HH:mm:ss] ";
            console.SingleLine = true;
        });
    });

    services
        .AddGrpcClient<MyGrpcClient>(o =>
        {
            o.Address = new Uri($"{Scheme}://{ServiceName}");
        })
        .ConfigureChannel(o =>
        {
            o.Credentials = ChannelCredentials.Insecure;
            o.ServiceConfig = new ServiceConfig
            {
                LoadBalancingConfigs =
                {
                    new CustomLoadBalancingConfig()
                }
            };
        });

    services.AddSingleton<ResolverFactory>(new CustomResolverFactory(
        Scheme,
        new Dictionary<string, IReadOnlyCollection<DnsEndPoint>>
        {
            { ServiceName, endpoints }
        }));
    services.AddSingleton<LoadBalancerFactory, CustomLoadBalancerFactory>();

    var serviceProvider = services.BuildServiceProvider();
    return serviceProvider.GetRequiredService<MyGrpcClient>();
}
/// <summary>
/// Basic resolver factory that creates a basic resolver for the specified host
/// </summary>
sealed class CustomResolverFactory(string name, IReadOnlyDictionary<string, IReadOnlyCollection<DnsEndPoint>> endpointsByHost)
    : ResolverFactory
{
    public override Resolver Create(ResolverOptions options)
    {
        var endpoints =
            endpointsByHost.GetValueOrDefault(options.Address.Host)
            ?? throw new InvalidOperationException($"No endpoints for host {options.Address.Host}");

        return new CustomResolver(endpoints, options.LoggerFactory);
    }

    public override string Name => name;
}

/// <summary>
/// Basic resolver that returns a set of specified endpoints
/// </summary>
sealed class CustomResolver(IReadOnlyCollection<DnsEndPoint> endpoints, ILoggerFactory loggerFactory)
    : PollingResolver(loggerFactory)
{
    protected override Task ResolveAsync(CancellationToken cancellationToken)
    {
        var addresses = endpoints.Select(e => new BalancerAddress(e)).ToArray();
        var result = ResolverResult.ForResult(addresses);
        Listener(result);
        return Task.CompletedTask;
    }
}

sealed class CustomLoadBalancingConfig() : LoadBalancingConfig(CustomLoadBalancerFactory.LoadBalancerFactoryName);

sealed class CustomLoadBalancerFactory : LoadBalancerFactory
{
    public const string LoadBalancerFactoryName = nameof(CustomLoadBalancerFactory);

    public override LoadBalancer Create(LoadBalancerOptions options) =>
        new CustomLoadBalancer(options.Controller, options.LoggerFactory);

    public override string Name => LoadBalancerFactoryName;
}

sealed class CustomLoadBalancer(IChannelControlHelper controller, ILoggerFactory loggerFactory)
    : SubchannelsLoadBalancer(controller, loggerFactory)
{
    protected override SubchannelPicker CreatePicker(IReadOnlyList<Subchannel> readySubchannels) =>
        new CustomSubchannelPicker(readySubchannels);
}

sealed class CustomSubchannelPicker(IReadOnlyList<Subchannel> readySubchannels) : SubchannelPicker
{
    public override PickResult Pick(PickContext context) =>
        readySubchannels switch
        {
            [var singleSubChannel] => PickResult.ForSubchannel(singleSubChannel),
            null or [] => PickResult.ForFailure(new Status(StatusCode.Unavailable, "No ready subchannels")),
            _ => PickResult.ForFailure(new Status(StatusCode.Unavailable,
                $"Too many ready subchannels: {readySubchannels.Count}")),
        };
}
After letting this code run for a couple of minutes and analyzing the logs, we can spot unexpected behavior:
[12:07:45] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4] Subchannel id '1-2' checking socket Unspecified/localhost:6000.
[12:07:50] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4] Subchannel id '1-2' checking socket Unspecified/localhost:6000.
[12:07:55] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4] Subchannel id '1-2' checking socket Unspecified/localhost:6000.
[12:07:55] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[15] Subchannel id '1-2' socket Unspecified/localhost:6000 is receiving 17 available bytes.
[12:07:55] dbug: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[14] Subchannel id '1-2' socket Unspecified/localhost:6000 is in a bad state and can't be used.
[12:07:55] dbug: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[16] Subchannel id '1-2' socket Unspecified/localhost:6000 is being closed because it can't be used. Socket lifetime of 00:02:15.2011945. The socket either can't receive data or it has received unexpected data.
[12:07:55] dbug: Grpc.Net.Client.Balancer.Subchannel[11] Subchannel id '1-2' state changed to Idle. Detail: 'Lost connection to socket.'.
[12:07:55] trce: Grpc.Net.Client.Balancer.Subchannel[14] Subchannel id '1-2' executing state changed registration '1-2-1'.
[12:07:55] dbug: Grpc.Net.Client.Balancer.Internal.ConnectionManager[4] Channel picker updated.
[12:07:55] trce: Grpc.Net.Client.Balancer.PollingResolver[1] CustomResolver refresh requested.
[12:07:55] trce: Grpc.Net.Client.Balancer.PollingResolver[8] CustomResolver resolve starting.
[12:07:55] trce: Grpc.Net.Client.Balancer.PollingResolver[4] CustomResolver result with status code 'OK' and 2 addresses.
[12:07:55] trce: Grpc.Net.Client.Balancer.Subchannel[4] Subchannel id '1-2' connection requested.
[12:07:55] dbug: Grpc.Net.Client.Balancer.Subchannel[11] Subchannel id '1-2' state changed to Connecting. Detail: 'Connection requested.'.
[12:07:55] trce: Grpc.Net.Client.Balancer.Subchannel[14] Subchannel id '1-2' executing state changed registration '1-2-1'.
[12:07:55] dbug: Grpc.Net.Client.Balancer.Internal.ConnectionManager[4] Channel picker updated.
[12:07:55] dbug: Grpc.Net.Client.Balancer.Subchannel[6] Subchannel id '1-2' connecting to transport.
[12:07:55] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[1] Subchannel id '1-2' connecting socket to Unspecified/localhost:6000.
[12:07:55] trce: Grpc.Net.Client.Balancer.Subchannel[19] Subchannel id '1-1' updated with addresses: localhost:5000
[12:07:55] trce: Grpc.Net.Client.Balancer.PollingResolver[7] CustomResolver resolve task completed.
[12:07:57] dbug: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[2] Subchannel id '1-2' connected to socket Unspecified/localhost:6000.
[12:07:57] dbug: Grpc.Net.Client.Balancer.Subchannel[11] Subchannel id '1-2' state changed to Ready. Detail: 'Successfully connected to socket.'.
[12:07:57] trce: Grpc.Net.Client.Balancer.Subchannel[14] Subchannel id '1-2' executing state changed registration '1-2-1'.
[12:07:57] dbug: Grpc.Net.Client.Balancer.Internal.ConnectionManager[4] Channel picker updated.
[12:08:02] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4] Subchannel id '1-2' checking socket Unspecified/localhost:6000.
[12:08:07] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4] Subchannel id '1-2' checking socket Unspecified/localhost:6000.
[12:08:12] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4] Subchannel id '1-2' checking socket Unspecified/localhost:6000.
To summarize what happened:
1. I've implemented a basic custom resolver plus a basic custom load balancer. Because of this custom load balancing, grpc-dotnet now creates two subchannels with one endpoint each instead of the default single subchannel with two endpoints.
2. I connect to my gRPC server using two endpoints and establish an endless server-side stream. grpc-dotnet selects one subchannel to serve this stream, while the other remains idle.
3. grpc-dotnet begins its health-check routine for the idle subchannel by establishing a TCP socket and polling it every 5 seconds (the sketch after this list illustrates the kind of check involved).
4. Because no actual HTTP/2 traffic passes through this socket, the gRPC server considers the connection stale and closes it after some time. For Kestrel, this happens after 130 seconds by default (see the keep-alive example after this list).
5. Our socket is now closed, which forces grpc-dotnet to update the corresponding picker and re-query our resolver.
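The periodic "checking socket" entries in the log come from this polling. As an illustration only, a simplified sketch of the kind of liveness probe involved (not grpc-dotnet's actual internal code, which lives in the internal SocketConnectivitySubchannelTransport):

```csharp
using System.Net.Sockets;

// Simplified sketch of a socket liveness probe; illustrative only.
static bool SocketLooksUsable(Socket socket)
{
    // Zero-timeout poll: readable with zero buffered bytes means the peer
    // has closed the connection.
    if (socket.Poll(0, SelectMode.SelectRead) && socket.Available == 0)
    {
        return false;
    }

    // Buffered bytes on an idle connection are unexpected data (e.g. the
    // server's GOAWAY before closing), so the socket can't be reused; this
    // matches the "is receiving 17 available bytes" log line above.
    if (socket.Available > 0)
    {
        return false;
    }

    return true;
}
```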
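For reference, the 130-second figure is Kestrel's KeepAliveTimeout default. A hedged server-side example; the builder variable and the 30-minute value are illustrative assumptions, not a recommended fix:

```csharp
// Raise Kestrel's idle-connection timeout (defaults to 130 seconds).
// 'builder' is assumed to be a WebApplicationBuilder.
builder.WebHost.ConfigureKestrel(kestrel =>
{
    kestrel.Limits.KeepAliveTimeout = TimeSpan.FromMinutes(30);
});
```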
Of course, in this particular example the quirk is nothing to worry about. The problem is that I use DnsResolver in my production code, and this behavior forces DnsResolver to re-query DNS entries over and over again. Given that I have a large number of Kubernetes pods in my production environment sharing the same custom gRPC balancing mechanism, this means that roughly every 135 seconds a wave of unnecessary DNS queries is issued, creating needless load on my DNS servers.
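For context, the production resolver is wired up along these lines (a sketch; the 30-second refresh interval is an assumed value). Each forced refresh caused by the socket closure adds a DNS lookup on top of that interval:

```csharp
// Production-style registration of the built-in DNS resolver (sketch).
services.AddSingleton<ResolverFactory>(
    new DnsResolverFactory(refreshInterval: TimeSpan.FromSeconds(30)));
```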
Describe the solution you'd like
I can think of two solutions:

1. Implement an actual HTTP/2 ping-pong health-check mechanism in SocketConnectivitySubchannelTransport instead of just polling a TCP socket. If I understand correctly, this is exactly what grpc-go does (see the frame sketch after this list).
2. Make ISubchannelTransport and the corresponding types, such as TransportStatus and ConnectResult, public so one can implement their own health-check logic.
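To make the first option concrete: an HTTP/2 PING is a small connection-level frame defined in RFC 7540 §6.7, and the peer must echo its 8 bytes of opaque data back in an ACK. A minimal sketch of building such a frame, assuming the connection has already completed the HTTP/2 preface and SETTINGS exchange (illustrative only, not grpc-dotnet code):

```csharp
using System.Buffers.Binary;

// Build an HTTP/2 PING frame (RFC 7540 §6.7): a 9-byte frame header
// followed by 8 bytes of opaque data echoed back by the peer in an ACK.
static byte[] BuildHttp2PingFrame(ulong opaqueData)
{
    var frame = new byte[9 + 8];
    frame[0] = 0x00; frame[1] = 0x00; frame[2] = 0x08; // 24-bit payload length = 8
    frame[3] = 0x06;                                   // type = PING
    frame[4] = 0x00;                                   // flags = 0 (not an ACK)
    // Bytes 5-8 stay 0: PING is connection-level (stream identifier 0).
    BinaryPrimitives.WriteUInt64BigEndian(frame.AsSpan(9), opaqueData);
    return frame;
}
```

A transport that periodically writes such a frame and waits for the ACK (flags = 0x01) generates real HTTP/2 traffic, so the server no longer sees the connection as idle.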
Describe alternatives you've considered
I've considered implementing my own health-check logic but, as stated above, ISubchannelTransport and the corresponding types are internal, which makes this impossible.