
Increase the default number of IO queues on larger machines #56501

Merged 2 commits into dotnet:main on Jul 1, 2024

Conversation

@kouvel (Member) commented Jun 28, 2024

The number of IO queues is currently limited to 16. On machines with more processors, increasing the number of IO queues appears to improve throughput on some benchmarks.
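
As a rough sketch of the idea (illustration only; the exact default formula the PR adopts is not shown here, and the scaling factor below is an assumption):

int processorCount = Environment.ProcessorCount;

// Old behavior: the default was effectively clamped at 16 regardless of core count:
//     IOQueueCount = Math.Min(Environment.ProcessorCount, 16);
// A processor-count-based default could keep scaling on larger machines instead;
// with the assumed factor below, an 80-proc machine would get 40 IO queues.
int ioQueueCount = processorCount <= 16
    ? processorCount                     // small machines: same as before
    : Math.Max(16, processorCount / 2);  // larger machines: scale up (assumed factor)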
kouvel added this to the 9.0.0 milestone Jun 28, 2024
kouvel self-assigned this Jun 28, 2024
dotnet-issue-labeler bot added the area-networking label (Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions) Jun 28, 2024
@kouvel (Member, Author) commented Jun 28, 2024

Some throughput numbers are below, from the arm64 Ampere machine with 80 procs. Before the change is with 16 IO queues, and after the change is with 40 IO queues. It looks like the change would bring some regressions and some improvements. Not sure if the regressions would be acceptable, but opening for consideration anyway.

Plaintext:

[throughput chart]

Plaintext regresses with any number of connections. This appears to be due to pipelining and the way IO queues work. Not sure if pipelining is an interesting scenario.

Json:

[throughput chart]

Slight regression at 256 connections, and slight improvement at the other connection counts.

Fortunes:

[throughput chart]

Looks mostly like no change.

// parallelism of processing work queued to IOQueues. The default number below is based on the processor count and tries
// to use a high-enough number for that to not be a significant limiting factor for throughput.

int processorCount = Environment.ProcessorCount;
Member commented:
Nit, suggested change:

- int processorCount = Environment.ProcessorCount;
+ var processorCount = Environment.ProcessorCount;

@kouvel (Member, Author) replied:

Isn't the explicit type preferred in this case? I thought that was the case in the runtime repo anyway, not sure about the guidelines in this repo.

Member replied:

Even if var is allowed in this repo, how is it better here?

Member replied:

"var all the things" is convention here (personally I prefer the explicit type, but to be consistent w/ the codebase -> var).

@kouvel (Member, Author) commented Jun 28, 2024

A consideration is that it seems plausible that in some cases fewer IOQueues could perform the relevant work more efficiently and sufficiently quickly for them not to be a bottleneck. Alternatives could be considered; some challenges are outlined in #41391.

@halter73 (Member) left a review comment:

> Plaintext regresses with any number of connections. This appears to be due to pipelining and the way IO queues work. Not sure if pipelining is an interesting scenario.

I don't think the plaintext scenario is super interesting on its own, because as you note, it uses HTTP/1.1 pipelining which is rare to see in the wild outside of benchmarking.

However, poor HTTP/1.1 pipelining performance can be indicative of poor HTTP/2 performance since HTTP/2 can effectively pipeline 100 requests at a time with the default stream limit of 100. It would be nice to get some HTTP/2 numbers before changing this.
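
(For reference, the stream limit referred to here is Kestrel's default and is configurable; a minimal sketch, where 100 is the documented default value:)

var builder = WebApplication.CreateBuilder(args);
builder.WebHost.ConfigureKestrel(options =>
{
    // 100 concurrent streams per HTTP/2 connection is the default; each stream is an
    // in-flight request, so one connection can queue up a lot of work at once.
    options.Limits.Http2.MaxStreamsPerConnection = 100;
});
var app = builder.Build();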

The main benchmark app takes a --threadCount parameter, so we can also use that to test this without needing to modify the socket transport. @sebastienros has done a lot of testing using non-default IOQueueCounts for different benchmarks. I know we've hard coded the --threadCount for some of those scenarios in the past, but I don't think we do currently.
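
(As an illustration of wiring that up: a hypothetical app-side sketch, assuming a benchmark app maps a --threadCount argument onto the socket transport's IO queue count. UseSockets and IOQueueCount are real APIs; the argument handling is an assumption, not the actual benchmark code.)

var builder = WebApplication.CreateBuilder(args);
// Override the default IO queue count from a command-line argument such as "--threadCount 40",
// without modifying the socket transport itself.
if (int.TryParse(builder.Configuration["threadCount"], out int threadCount) && threadCount > 0)
{
    builder.WebHost.UseSockets(options => options.IOQueueCount = threadCount);
}
var app = builder.Build();
app.MapGet("/", () => "Hello World!");
app.Run();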

> A consideration is that it seems plausible that in some cases fewer IOQueues could perform the relevant work more efficiently and sufficiently quickly for them not to be a bottleneck.

This is why we clamped the value to begin with. We erred on the side of limiting parallelism where the benchmark numbers didn't show a clear improvement, which happened to be any count higher than 16 on the machines we had at the time.

But given we now have evidence that more parallelism can significantly help many-core machines, at least on Linux, I'm all for trying new things. The way we came up with Math.Min(Environment.ProcessorCount, 16) wasn't super scientific to begin with. I think it's stayed that way mostly due to inertia and fear of regressing scenarios.

> Alternatives could be considered; some challenges are outlined in #41391.

The challenges and tradeoffs are what make it interesting. I'm all for trying bigger changes too, but I don't see the harm of trying something like this first. We'll need to keep a close eye on how this affects all our benchmarks at https://aka.ms/aspnet/benchmarks, and be ready to revert if this causes significant regressions in important scenarios.

@halter73 (Member) commented:
@sebastienros I can merge this if you don't have any objections.

@kouvel (Member, Author) commented Jun 28, 2024

> However, poor HTTP/1.1 pipelining performance can be indicative of poor HTTP/2 performance since HTTP/2 can effectively pipeline 100 requests at a time with the default stream limit of 100. It would be nice to get some HTTP/2 numbers before changing this.

I see that the https variant of plaintext still has pipeline: 16 here: https://github.com/aspnet/Benchmarks/blob/9c454b7357ac0b032c48535df236b8bbb9588052/scenarios/plaintext.benchmarks.yml#L62

Whereas the https variant of json does not: https://github.com/aspnet/Benchmarks/blob/9c454b7357ac0b032c48535df236b8bbb9588052/scenarios/json.benchmarks.yml#L60-L72

What's the difference there, and is that kind of pipelining different and interesting?

[Edit] Never mind, I mistook one thing for another; these are not HTTP/2 benchmarks.

@kouvel (Member, Author) commented Jun 28, 2024

What would be a good benchmark to run to measure HTTP/2 multiplexing?

@halter73 (Member) commented Jun 28, 2024

One of the gRPC scenarios would be good. Make sure to set the stream count to something like 70 or 100. I think the default is 1.

halter73 merged commit 8046091 into dotnet:main Jul 1, 2024
26 checks passed
dotnet-policy-service bot changed the milestone from 9.0.0 to 9.0-preview7 on Jul 1, 2024
@kouvel (Member, Author) commented Jul 2, 2024

gRPC: This is running on the Ampere machine as server (80 procs) and citrine-amd as load generator (48 procs), with --scenario grpcaspnetcoreserver-grpcnetclient --variable scenario=serverstreaming --variable requestSize=0 --variable responseSize=0 --variable streams=70 --variable connections=28 --variable threads=28:

[throughput chart]

Maybe a slight regression, but with a high error margin too, so maybe not too significant. The test also appears to be more load-heavy than app-heavy, and I'm not sure how representative it is.

kouvel deleted the IoQueueLinFix branch July 2, 2024 00:57