More than once in my career I have come across this scenario: a .Net application frequently showing high response times. This latency can have several causes, such as slow access to an external resource (a database or an API, for example), CPU usage reaching 100%, or disk access overload, among others. I would like to add another, often overlooked, possibility to that list: ThreadPool exhaustion. I will briefly show how the .Net ThreadPool works, present code examples where this problem can happen, and finally demonstrate how to avoid it.

The .Net ThreadPool

The .Net Task-based asynchronous programming model is well known by the development community, but I believe its implementation details are poorly understood - and, as the saying goes, the devil is in the details. Behind the .Net Task execution mechanism there is a scheduler, responsible, as its name suggests, for scheduling the execution of Tasks. Unless explicitly changed, the default .Net scheduler is the ThreadPoolTaskScheduler, which, as the name suggests, uses the default .Net ThreadPool to perform its work.

The ThreadPool manages, as expected, a pool of threads, to which it assigns the Tasks it receives through a queue. Tasks wait in this queue until a thread in the pool becomes free and starts processing them. By default, the minimum number of threads in the pool is equal to the number of logical processors on the host. And here is the detail that matters: when there are more Tasks to execute than there are threads in the pool, the ThreadPool can either wait for a thread to become free or create more threads. If it chooses to create a new thread, and the current number of threads in the pool is equal to or greater than the configured minimum, this growth takes between 1 and 2 seconds for each new thread added to the pool.
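The default minimum can be inspected at runtime. The short console sketch below (not from the original article) prints the pool's minimum worker thread count next to the logical processor count, which match by default:

```csharp
using System;
using System.Threading;

class ThreadPoolDefaults
{
    static void Main()
    {
        // By default, the minimum number of worker threads in the pool
        // equals the number of logical processors on the host.
        ThreadPool.GetMinThreads(out int minWorker, out int minCompletionPort);
        Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
        Console.WriteLine($"Minimum worker threads: {minWorker}");
    }
}
```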
Note: Starting with .Net 6, improvements were introduced to this process, allowing the ThreadPool to grow faster, but the main idea still remains.

Let's look at an example to make this clearer. Suppose a computer has 4 cores; the ThreadPool minimum will therefore be 4. If all arriving Tasks finish their work quickly, the pool may even have fewer than 4 active threads. Now imagine that 4 slightly longer-running Tasks arrive simultaneously, occupying all the threads in the pool. When the next Task arrives in the queue, it will need to wait between 1 and 2 seconds until a new thread is added to the pool, and only then leave the queue and start processing. If this new Task is also long-running, subsequent Tasks will wait in the queue again and will have to "pay the toll" of 1 to 2 seconds before they can start executing. If this flow of new long-running Tasks continues for some time, clients of this process will perceive slowness for every new Task that arrives at the ThreadPool queue. This scenario is called ThreadPool exhaustion (or ThreadPool starvation). It lasts until the Tasks finish their work and start returning threads to the pool, allowing the queue of pending Tasks to shrink, or until the pool grows enough to meet the current demand. This can take several seconds, depending on the load, and only then does the previously observed slowdown cease.

Synchronous vs. asynchronous code

It is now necessary to make an important distinction between types of long-running work. Generally, they can be classified into 2 types: CPU-bound (or GPU-bound), such as the execution of complex calculations, and I/O-bound, such as database access or network calls. In the case of CPU-bound tasks, apart from algorithm optimizations, there is not much that can be done: you need enough processors to meet the demand.
However, in the case of I/O-bound tasks, it is possible to free up the processor to respond to other requests while waiting for the I/O operation to finish. And this is exactly what the ThreadPool does when asynchronous I/O APIs are used. In this case, even if the specific task is still time-consuming, the thread is returned to the pool and can serve another Task from the queue. When the I/O operation finishes, the Task is requeued and then continues executing. For more details on how the ThreadPool waits for I/O operations to finish, see Microsoft's documentation.

However, it is important to note that there are still synchronous I/O APIs, which block the thread and prevent it from being returned to the pool. These APIs - and any other kind of call that blocks a thread before returning - compromise the proper functioning of the ThreadPool and may exhaust it when it is subjected to sufficiently large and/or long loads. We can therefore say that the ThreadPool - and by extension ASP.NET Core/Kestrel, designed to operate asynchronously - is optimized for executing tasks of low computational complexity with asynchronous I/O-bound workloads. In this scenario, a small number of threads can efficiently process a very high number of tasks/requests.

Thread blocking with ASP.NET Core

Let's look at some code examples that cause pool threads to block, using ASP.NET Core 8.
Note: These are simple code examples and are not intended to represent any particular practice, recommendation, or style, except for the points specifically related to the ThreadPool demonstration.
To keep behavior identical across the examples, each one queries a SQL Server database with a statement that simulates a workload taking 1 second to return, using WAITFOR DELAY.
To generate a usage load and demonstrate the practical effects of each example, we will use siege, a free command-line utility designed for this purpose.
In all examples, a load of 120 concurrent accesses will be simulated for 1 minute, with a random delay of up to 200 milliseconds between requests. These numbers are enough to demonstrate the effects on the ThreadPool without generating timeouts when accessing the database.
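The original siege invocation was not preserved, but with the parameters described above it might look like the following sketch (the URL is a placeholder for the example endpoint; siege's -d flag sets the maximum random delay, in seconds, between each simulated user's requests):

```shell
# 120 concurrent users, for 1 minute, random delay of up to 0.2 s (200 ms)
siege -c120 -t1M -d0.2 http://localhost:5000/demo
```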
Synchronous Version

Let's start with a completely synchronous implementation. The DbCall action is synchronous, and the ExecuteNonQuery method of DbCommand/SqlCommand is synchronous, so it blocks the thread until the database returns. Below is the result of the load simulation (with the siege command used).
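The original listing was not preserved, but a fully synchronous action along these lines reproduces the behavior described (controller name, route, and connection string are illustrative assumptions):

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.Data.SqlClient;

[ApiController]
[Route("demo")]
public class DemoController : ControllerBase
{
    // Illustrative connection string; the original was not shown.
    private const string ConnectionString =
        "Server=localhost;Database=Demo;Integrated Security=true;TrustServerCertificate=true";

    [HttpGet]
    public IActionResult DbCall()
    {
        using var connection = new SqlConnection(ConnectionString);
        using var command = new SqlCommand("WAITFOR DELAY '00:00:01'", connection);
        connection.Open();
        command.ExecuteNonQuery(); // synchronous: blocks the pool thread for ~1 second
        return Ok();
    }
}
```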
You can see that we achieved a rate of 27 requests per second (Transaction rate) and an average response time (Response time) of around 4 seconds, with the longest request (Longest transaction) taking more than 16 seconds - very poor performance.
Asynchronous Version – Attempt 1

Let's now use an asynchronous action (returning Task), but still call the synchronous ExecuteNonQuery method.
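A hypothetical reconstruction of this attempt is sketched below (same illustrative controller setup as the synchronous version): the action signature returns a Task, but the database call is still the blocking ExecuteNonQuery.

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.Data.SqlClient;
using System.Threading.Tasks;

[ApiController]
[Route("demo")]
public class DemoController : ControllerBase
{
    // Illustrative connection string; the original was not shown.
    private const string ConnectionString =
        "Server=localhost;Database=Demo;Integrated Security=true;TrustServerCertificate=true";

    [HttpGet]
    public Task<IActionResult> DbCall()
    {
        using var connection = new SqlConnection(ConnectionString);
        using var command = new SqlCommand("WAITFOR DELAY '00:00:01'", connection);
        connection.Open();
        command.ExecuteNonQuery(); // still synchronous: blocks despite the Task signature
        return Task.FromResult<IActionResult>(Ok());
    }
}
```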
Running the same load scenario as before, we have the following result.
Note that the result was even worse in this case, with a request rate of 14 per second (compared to 27 for the completely synchronous version) and an average response time of more than 7 seconds (compared to 4 for the previous one).
Asynchronous Version – Attempt 2

This next version exemplifies a common - and not recommended - attempt to turn a synchronous I/O call (in our case, ExecuteNonQuery) into an "asynchronous API" using Task.Run.
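A sketch of this pattern might look like the following (controller setup and names remain illustrative assumptions). Note that Task.Run does not make the I/O asynchronous; it only moves the blocking call to another ThreadPool thread, which stays blocked for the full duration of the query:

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.Data.SqlClient;
using System.Threading.Tasks;

[ApiController]
[Route("demo")]
public class DemoController : ControllerBase
{
    // Illustrative connection string; the original was not shown.
    private const string ConnectionString =
        "Server=localhost;Database=Demo;Integrated Security=true;TrustServerCertificate=true";

    [HttpGet]
    public async Task<IActionResult> DbCall()
    {
        // The synchronous call is merely shifted to another pool thread,
        // which is blocked while the query runs.
        await Task.Run(() =>
        {
            using var connection = new SqlConnection(ConnectionString);
            using var command = new SqlCommand("WAITFOR DELAY '00:00:01'", connection);
            connection.Open();
            command.ExecuteNonQuery();
        });
        return Ok();
    }
}
```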
The simulation shows a result close to the synchronous version: a request rate of 24 per second, an average response time of more than 4 seconds, and the longest request taking more than 14 seconds to return.
Asynchronous Version – Attempt 3

Now the variation known as "sync over async", where we use asynchronous methods, such as ExecuteNonQueryAsync in this example, but call the .Wait() method of the returned Task. Both .Wait() and the .Result property of a Task behave the same way: they block the executing thread!
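A hypothetical sketch of the sync-over-async variant (controller setup and names remain illustrative):

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.Data.SqlClient;

[ApiController]
[Route("demo")]
public class DemoController : ControllerBase
{
    // Illustrative connection string; the original was not shown.
    private const string ConnectionString =
        "Server=localhost;Database=Demo;Integrated Security=true;TrustServerCertificate=true";

    [HttpGet]
    public IActionResult DbCall()
    {
        using var connection = new SqlConnection(ConnectionString);
        using var command = new SqlCommand("WAITFOR DELAY '00:00:01'", connection);
        connection.Open();
        // "Sync over async": the asynchronous API is used, but .Wait()
        // blocks the current thread until the Task completes.
        command.ExecuteNonQueryAsync().Wait();
        return Ok();
    }
}
```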
Running our simulation, we can see that the result is also bad: a rate of 32 requests per second, an average time of more than 3 seconds, and requests taking up to 25 seconds to return. Not surprisingly, the use of .Wait() or .Result on a Task is discouraged in asynchronous code.
The Solution

Finally, let's look at the code written to work in the most efficient way, using asynchronous APIs and applying async/await correctly, following Microsoft's recommendation.
We then have the asynchronous action, with the ExecuteNonQueryAsync call awaited.
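A sketch of the fully asynchronous version might look like this (controller setup and names remain illustrative assumptions); while the database works, the thread is returned to the pool and can serve other requests:

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.Data.SqlClient;
using System.Threading.Tasks;

[ApiController]
[Route("demo")]
public class DemoController : ControllerBase
{
    // Illustrative connection string; the original was not shown.
    private const string ConnectionString =
        "Server=localhost;Database=Demo;Integrated Security=true;TrustServerCertificate=true";

    [HttpGet]
    public async Task<IActionResult> DbCall()
    {
        using var connection = new SqlConnection(ConnectionString);
        using var command = new SqlCommand("WAITFOR DELAY '00:00:01'", connection);
        await connection.OpenAsync();
        await command.ExecuteNonQueryAsync(); // thread goes back to the pool during the wait
        return Ok();
    }
}
```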
The simulation result speaks for itself: a request rate of 88 per second, an average response time of 1.23 seconds, and the longest request taking 3 seconds to return - numbers roughly 3 times better than any previous option.
The table below summarizes the results of the different versions, for a better comparison of the data between them.
The table below summarizes the results of the different versions, for easier comparison.

Code Version      Request Rate (/s)   Average Time (s)   Max Time (s)
Synchronous       27.38               4.14               16.93
Asynchronous 1    14.33               7.94               14.03
Asynchronous 2    24.90               4.57               14.80
Asynchronous 3    32.43               3.52               25.03
Solution          88.91               1.23                3.18

Workaround

It is worth mentioning that we can configure the ThreadPool to have a minimum number of threads greater than the default (the number of logical processors). With this, the pool will be able to grow quickly without paying that "toll" of 1 to 2 seconds per new thread.
There are at least 3 ways to do this: by runtime configuration, using the runtimeconfig.json file; by project configuration, setting the ThreadPoolMinThreads property; or by code, calling the ThreadPool.SetMinThreads method.
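In code, the call might look like the sketch below. The value 32 is purely illustrative, not a recommendation; as discussed, any minimum should be validated by load testing:

```csharp
using System;
using System.Threading;

class MinThreadsConfig
{
    static void Main()
    {
        // Keep the I/O completion-port minimum as-is and raise only the
        // worker-thread minimum. 32 is an illustrative value.
        ThreadPool.GetMinThreads(out _, out int minCompletionPort);
        bool applied = ThreadPool.SetMinThreads(32, minCompletionPort);
        Console.WriteLine($"New minimum applied: {applied}");
    }
}
```

The equivalent settings are `"System.Threading.ThreadPool.MinThreads"` in runtimeconfig.json and the `ThreadPoolMinThreads` MSBuild property in the project file.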
This should be seen as a temporary measure while the code has not yet been fixed as shown above, and it should only be adopted after prior testing confirms that it brings benefits without performance side effects, as recommended by Microsoft.
Conclusion

ThreadPool exhaustion is an implementation detail that can have unexpected consequences. It can also be difficult to detect, considering that .Net offers several ways to achieve the same result, even in its best-known APIs - a consequence, I believe, of years of evolution in the language and in ASP.NET, always aiming at backward compatibility.
When we talk about operating at increasing rates or volumes, such as going from dozens to hundreds of requests, it is essential to know the latest practices and recommendations. Furthermore, knowing one or another implementation detail can make a difference in avoiding scale problems or diagnosing them more quickly.
In a future article, we will explore how to diagnose ThreadPool exhaustion and identify the source of the problem in the code of a running process.