I was recently talking to several colleagues about the CLR garbage collector, and the question came up of where the 85,000-byte threshold for objects on the Large Object Heap (LOH) comes from (instead of the seemingly more obvious value of 65,535).
This value was determined as a result of performance tuning by the CLR garbage collector team.
One of the reasons you should try to keep your object allocation sizes below this value (and off the Large Object Heap) is that, unlike the Generation 0, 1 and 2 areas, the LOH is not compacted, so the free space left behind by collected objects can become fragmented.
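A quick way to see the threshold in action is to ask the runtime which generation a freshly allocated object belongs to. The sketch below relies on the fact that objects on the LOH are reported as the oldest generation (GC.MaxGeneration), while a new small object normally starts in Generation 0. The class name LohProbe and its members are made up for this illustration, and the check is only meaningful right after allocation, before the object has had a chance to be promoted.

```csharp
using System;

public static class LohProbe
{
    // Threshold quoted in the post, in bytes.
    public const int LohThresholdBytes = 85000;

    // LOH objects are reported as belonging to the oldest generation, so a
    // brand-new object that already reports GC.MaxGeneration was allocated
    // on the LOH rather than in Generation 0.
    public static bool IsOnLargeObjectHeap(object freshlyAllocated)
    {
        return GC.GetGeneration(freshlyAllocated) == GC.MaxGeneration;
    }

    public static void Main()
    {
        byte[] small = new byte[1000];               // far below the threshold
        byte[] large = new byte[LohThresholdBytes];  // payload alone reaches it

        Console.WriteLine("small on LOH: " + IsOnLargeObjectHeap(small));
        Console.WriteLine("large on LOH: " + IsOnLargeObjectHeap(large));
    }
}
```

Note that the array header counts towards the object size, so a byte array slightly shorter than 85,000 elements can still end up on the LOH.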
An excellent reference for the LOH is the article "Large Object Heap Uncovered".
.NET 4.0 changed how the LOH performs, but according to this Connect issue there is still room for improvement:
Based on the example provided, we were able to allocate nearly 23 times as much memory before running out of memory on the large object heap going from version 3.5 to version 4. That’s not to say we are finished addressing fragmentation issues—we will continue to pay attention as we improve in future versions. In the .NET 4 release, we heard from customers that latency was a high priority. So that is where we have spent much of our focus. [Brandon Bray: Lead program manager responsible for the garbage collector in the .NET Framework]
The workstation and server versions of the garbage collector in .NET 4.0 also differ, as the documentation describes:
Comparing Workstation and Server Garbage Collection
Threading and performance considerations for workstation garbage collection:
The collection occurs on the user thread that triggered the garbage collection and remains at the same priority. Because user threads typically run at normal priority, the garbage collector (which runs on a normal priority thread) must compete with other threads for CPU time.
Threads that are running native code are not suspended.
Workstation garbage collection is always used on a computer that has only one processor, regardless of the setting. If you specify server garbage collection, the CLR uses workstation garbage collection with concurrency disabled.
Threading and performance considerations for server garbage collection:
The collection occurs on multiple dedicated threads that are running at THREAD_PRIORITY_HIGHEST priority level.
A dedicated thread to perform garbage collection and a heap are provided for each CPU, and the heaps are collected at the same time. Each heap contains a small object heap and a large object heap, and all heaps can be accessed by user code. Objects on different heaps can refer to each other.
Because multiple garbage collection threads work together, server garbage collection is faster than workstation garbage collection on the same size heap.
Server garbage collection often has larger size segments.
Server garbage collection can be resource-intensive. For example, if you have 12 processes running on a computer that has 4 processors, there will be 48 dedicated garbage collection threads if they are all using server garbage collection. In a high memory load situation, if all the processes start doing garbage collection, the garbage collector will have 48 threads to schedule.
If you are running hundreds of instances of an application, consider using workstation garbage collection with concurrent garbage collection disabled. This will result in less context switching, which can improve performance.
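To check which flavor a process is actually running, and to reproduce the thread arithmetic from the example above, something like the following sketch could be used. GCSettings.IsServerGC and GCSettings.LatencyMode are part of System.Runtime; the class GcModeReport and the helper EstimateServerGcThreads are hypothetical names introduced here, and the estimate assumes exactly one dedicated GC thread per processor in every process, as the documentation excerpt describes.

```csharp
using System;
using System.Runtime;

public static class GcModeReport
{
    // Server GC dedicates one collection thread per processor in each
    // process, so the machine-wide total is a simple product.
    public static int EstimateServerGcThreads(int processCount, int processorCount)
    {
        return processCount * processorCount;
    }

    public static void Main()
    {
        // True only when server GC is really in effect for this process.
        Console.WriteLine("Server GC active: " + GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: " + GCSettings.LatencyMode);

        // The example from the text: 12 processes on a 4-processor machine.
        Console.WriteLine("Dedicated GC threads: " + EstimateServerGcThreads(12, 4));
    }
}
```

The mode itself is normally chosen in the application configuration file, through the gcServer and gcConcurrent elements under runtime, rather than in code.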