.NET GC Internals mini-series

Google234 | 5 years ago

Is this the GC that’s contained in a single 50,000-LOC file?

npalli | 5 years ago

Yes, a 43K-line file.

https://raw.githubusercontent.com/dotnet/runtime/master/src/...

u678u | 5 years ago

Does anyone have a good comparison of .NET vs. Java GCs? I've never had trouble with the former, but Java has been terrible. I'm not sure if it's just my project or the -Xmx memory cap.

throwaway189262 | 5 years ago

.NET's GC is nothing to write home about: pretty standard, with long stop-the-world phases. It's not super interesting, so nobody raves about it.

Java's new collectors, ZGC and Shenandoah, are state of the art, perhaps the best GCs in the world. There are collectors with lower latency and others with higher throughput, but nothing out there achieves both like Java's new collectors do.

It's a large part of why you hear that Java "performs better in production" than Go and C#, even though those languages are faster in microbenchmarks. Java doesn't have value types and generates a ton of garbage, but the collectors are good enough that it's still competitive with these faster languages.

Some will tell you that Go's GC is state of the art, but I disagree. The last time it came up I started a huge flamewar in the comments, so I'll avoid comparing them :).

majkinetor | 5 years ago

I'm quite interested in this too, and I can confirm it from an empirical standpoint: I never had a single GC episode with .NET, but it was the usual stuff on Java apps. We even had a 3-month GC optimization session directly with the guy working on it at Sun.

kkokosa | 5 years ago

There are many valid comments here, but I'll just leave a voice from the .NET GC architect herself: https://devblogs.microsoft.com/dotnet/how-to-evaluate-info-y....

pjmlp | 5 years ago

Regarding Java GCs, it depends very much on which JVM implementation you use, and which GC is chosen from the plethora that each vendor packages.

Even if a GC has the same name as another vendor's, that doesn't mean they are 1:1, as each vendor fine-tunes its own implementation.

This also applies to .NET actually, as there are other implementations besides .NET Framework/Core.

So it always boils down to profiling.

na85 | 5 years ago

I don't use Java, but I was under the impression that it has the most sophisticated GCs available.

Isn't the Shenandoah GC supposed to be able to collect garbage asynchronously, without pauses?

The_rationalist | 5 years ago

ZGC is the state of the art

kkokosa | 5 years ago

Thank you for all the comments! Here's the link to the second episode if you're interested: https://www.youtube.com/watch?v=OXvT9f5PPbs

mars4rp | 5 years ago

Please forgive my ignorance, but why can't I manually delete an object? Often I know that an object is pretty big and want to delete it when I'm done with it, but I have to force the GC to run to clean it up.

torginus | 5 years ago

Well, this is tricky.

I'm assuming you're trying to allocate a large array, since otherwise it's pretty difficult to have a large object in .NET.

One thing you can do is use Marshal.AllocHGlobal (essentially an unsafe malloc), which gives you a pointer (not a reference) to a chunk of unmanaged memory that you access through unsafe pointers and must release yourself with Marshal.FreeHGlobal. This is pretty messy.

The other, more modern option is MemoryPool<T>, which hands you an IMemoryOwner<T> wrapping a buffer of T (exposed as Memory<T>, whose Span<T> is .NET's version of a slice). When you're done with it, you Dispose() the owner to release the buffer back to the pool.

The third option is to just allocate it using new, abandon all references, then create a different copy. The memory pressure of allocating a big object will probably trigger a GC. This is dangerous, since there can still be lingering references to the old object (which might not even be present in the source, but compiler-generated), leading you to retain both the old and the new memory.
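A small sketch of the failure mode in that third option, written in Python rather than C# (so the collection semantics differ: CPython frees objects by reference counting). A closure quietly captures the old object, much like a compiler-generated closure class would in C#, so "abandoning" the visible variable does not make the old object unreachable. BigBuffer and make_size_probe are hypothetical names for illustration.

```python
import weakref


class BigBuffer:
    """Hypothetical stand-in for a large object."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)


def make_size_probe(buf: BigBuffer):
    # The lambda captures `buf` in a closure cell: an easy-to-miss hidden reference.
    return lambda: len(buf.data)


buf = BigBuffer(1_000_000)
watcher = weakref.ref(buf)        # observes the old object without keeping it alive
size_probe = make_size_probe(buf)

buf = BigBuffer(1_000_000)        # "abandon" the old buffer and allocate a replacement
old_still_alive = watcher() is not None  # the closure keeps the old buffer reachable

del size_probe                    # drop the hidden reference
old_freed = watcher() is None     # CPython frees it as soon as the last reference goes

print(old_still_alive, old_freed)  # True True (on CPython)
```

Until the hidden reference is dropped, both buffers stay resident, which is exactly the "retain both the old and the new memory" problem.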

MikeTheGreat | 5 years ago

Because they're memory objects that are managed by .NET.

If you say "Hey, I'm done with this," that's great, but .NET can't actually delete it until it has checked for itself that nothing else is using it. Otherwise you would corrupt the memory manager's state by deleting an object that's still in use.

So you can kinda fake a delete by releasing the last reference to an object and then forcing a garbage collection (in .NET you call GC.Collect()). It's worth noting that the GC.Collect overloads taking GCCollectionMode.Optimized let .NET decide whether a collection is worthwhile, so that form is more of a suggestion than a guaranteed garbage collection.

patrec | 5 years ago

In extreme cases you might be able to emulate arena allocation by spawning an external process, creating the huge object, doing some work on it, and throwing away the process. How viable this is depends on the programming language you are using. In Erlang it can work great, because GC is per process (no shared memory) and you can use green threads; in languages with a very crummy GC like Python it can also be worth considering[1], even with the cost of spawning a system process. I haven't kept up to date with .NET, but my guess is that it's generally not a very attractive strategy there.

[1] Refcounting frees objects instantly, of course, but if you have a large heap with a lot of objects and the cycle-collecting GC kicks in, you can get very long GC pauses (even if next to nothing is reclaimed).
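A minimal Python sketch of the throwaway-process idea, assuming the work can be expressed as a standalone script and that only a small result needs to come back. The child process builds the big structure; when it exits, the OS reclaims its entire heap at once, with no GC walk in the parent. WORKER and run_in_throwaway_process are hypothetical names.

```python
import subprocess
import sys

# Hypothetical worker script: builds a big throwaway structure, prints a small result, exits.
WORKER = """
import sys
n = int(sys.argv[1])
big = [i * i for i in range(n)]   # the "huge object", private to the child process
print(sum(big) % 1_000_003)       # only a small result crosses the process boundary
"""


def run_in_throwaway_process(n: int) -> int:
    """Run the allocation-heavy work in a child; its whole heap is freed when it exits."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER, str(n)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print(run_in_throwaway_process(1_000_000))
```

The cost is a full interpreter start-up per call plus serializing inputs and outputs, which is why this only pays off for large, short-lived working sets.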

Skinney | 5 years ago

With a GC like the one used in .NET, «deleting» an object is a no-op. It wouldn’t give you any benefit over not deleting it.

You only pay a price for live objects (as in, something still holds a reference to them); the rest is free.
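To make "you only pay for live objects" concrete, here is a toy semispace copying collector (Cheney's algorithm) in Python. It is not .NET's collector, which is far more involved; it only illustrates why a tracing collector's work scales with what survives. Everything reachable from the roots is copied into a fresh to-space, and garbage is never visited at all. Obj and copy_collect are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    refs: list["Obj"] = field(default_factory=list)


def copy_collect(roots: list[Obj]) -> list[Obj]:
    """Copy everything reachable from `roots` into a new to-space and return it.

    len(result) is the number of objects the collector touched: garbage costs nothing.
    """
    to_space: list[Obj] = []
    forwarded: dict[int, Obj] = {}  # id(old object) -> its copy (the "forwarding pointer")

    def forward(old: Obj) -> Obj:
        if id(old) not in forwarded:
            new = Obj(old.name, list(old.refs))  # refs still point at old objects for now
            forwarded[id(old)] = new
            to_space.append(new)
        return forwarded[id(old)]

    for root in roots:
        forward(root)
    scan = 0
    while scan < len(to_space):  # Cheney scan: the to-space doubles as the work queue
        obj = to_space[scan]
        obj.refs = [forward(child) for child in obj.refs]
        scan += 1
    return to_space


b = Obj("b")
a = Obj("a", [b, b])
b.refs.append(a)                                  # a cycle is fine for a tracing collector
garbage = [Obj(f"dead{i}") for i in range(1000)]  # never reachable from the roots below
survivors = copy_collect([a])
print(len(survivors))  # 2: the 1,000 dead objects cost nothing to "free"
```

An explicit "delete" would add work here rather than save it, since the collector never looks at dead objects in the first place.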

pjmlp | 5 years ago

Allocate it via Marshal.AllocHGlobal and use unsafe code to place it there.

Then you can allocate and deallocate at will.

Or just use a struct.

kevingadd | 5 years ago

What will deleting it achieve if the GC doesn't run?

jayd16 | 5 years ago

The runtime can't guarantee no use after free if it also allows manual, unchecked free.

tester756 | 5 years ago

Why does .NET still have no interface for serious, custom GCs?

MarkSweep | 5 years ago

CoreCLR does have the ability to compile the GC as a DLL and then choose different GCs at runtime by loading different DLLs. Search for FEATURE_STANDALONE_GC in the code base.

This feature is enabled in the official builds though.

graycat | 5 years ago

Maybe for some readers:

GC likely abbreviates garbage collection.

Going way back, some programming languages have permitted dynamic storage allocation, that is, a programmer using that language could, during execution of the program, ask for storage, that is, bytes in main memory, to be allocated, i.e., made available for use. Later the programmer could free that storage. E.g., early versions of the programming language Fortran did not offer dynamic storage allocation, but some programmers would implement their own, say, in a Fortran array. Then for pointers to the allocated storage, just use a subscript on the array name. The array might be in storage called COMMON, which to the linkage editor was external, thus permitting all parts of the program to use dynamic storage. The programming language PL/I had several kinds of dynamic storage: AUTOMATIC, BASED, and CONTROLLED. The programming language C has storage allocation via the function malloc and freeing via free.
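A minimal Python sketch of the array-based allocation described above, assuming fixed-size one-cell allocations: a "pointer" is just a subscript into one big array, and freed cells are chained into a free list stored in a parallel array. ArrayHeap, alloc, and free are hypothetical names; a Fortran version would use COMMON arrays instead of a class.

```python
class ArrayHeap:
    """Fixed-capacity pool inside one array; a 'pointer' is just a subscript."""

    NIL = -1  # the "null pointer"

    def __init__(self, capacity: int) -> None:
        self.cells = [0] * capacity
        # Free list threaded through a parallel array: next_free[i] is the next free subscript.
        self.next_free = (list(range(1, capacity)) + [self.NIL]) if capacity else []
        self.head = 0 if capacity else self.NIL

    def alloc(self, value: int) -> int:
        """Take the first free cell, store `value` there, and return its subscript."""
        if self.head == self.NIL:
            raise MemoryError("pool exhausted")
        idx = self.head
        self.head = self.next_free[idx]
        self.cells[idx] = value
        return idx

    def free(self, idx: int) -> None:
        """Push the cell back onto the free list; the caller must not use `idx` afterwards."""
        self.next_free[idx] = self.head
        self.head = idx


heap = ArrayHeap(3)
p = heap.alloc(10)   # subscript 0
q = heap.alloc(20)   # subscript 1
heap.free(p)
r = heap.alloc(30)   # reuses subscript 0
```

Note that nothing stops a stale subscript like p from being used after free; the allocator trusts the programmer, just as malloc/free does.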

As a first cut, you can intuitively think of garbage collection (GC) as automated freeing of dynamically allocated storage.

In the case of the original post (OP) of this thread: in the .NET languages (C#, Visual Basic (VB), F#, etc.), code can, e.g., in a function, allocate storage (e.g., with the VB statement ReDim), use that storage, let flow of control leave that function with the storage still allocated, and then have garbage collection notice automatically when that storage can no longer be reached and free it, i.e., make it available again for allocation and use. In addition, the code for some language features likely needs dynamic storage itself and might rely on GC for freeing.

The broad idea of garbage collection is old and goes back decades in several programming languages. E.g., in PL/I, AUTOMATIC gave automatic storage freeing.

Why should .NET implement garbage collection at all? Because otherwise programmers sometimes forget to free storage themselves, and allocated storage grows until it is too large. One of the old examples came from handling exceptional conditions: in some cases the code that got control did not have the data needed to know what storage should be freed.

GC has some challenges:

First, in a rich language, it may not be easy to know what storage can be freed, so GC implementations can have bugs.

Second, GC takes time, and in some programs it can take too much time and result in, say, noticeably slower response times. One place where GC tends to be unwelcome is real-time programming, where the software must respond in no more than a few milliseconds to external events that occur at unpredictable times.

One of the main ideas for GC implementation is reference counting, where the compiler inserts extra code that, for each allocated block of storage, keeps a count of how many references (variables, fields, etc., across all threads of execution) point to that storage. When a block's reference count reaches zero, the storage is freed.
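A minimal Python sketch of what that inserted bookkeeping amounts to, written out by hand. RcBox, retain, release, and the FREED log are hypothetical names; "freeing" is simulated by appending to a list. For context, this is not how .NET's collector works (it traces reachable objects instead), but it is the scheme CPython and Swift rely on.

```python
FREED: list[str] = []  # records simulated deallocations so they can be observed


class RcBox:
    """A manually reference-counted block of storage."""

    def __init__(self, name: str, children: tuple["RcBox", ...] = ()) -> None:
        self.name = name
        self.count = 1                  # the reference held by whoever created it
        self.children = list(children)  # references this block holds (already retained)

    def retain(self) -> "RcBox":
        """What the compiler emits when a new reference to the block is stored."""
        self.count += 1
        return self

    def release(self) -> None:
        """What the compiler emits when a reference goes away."""
        self.count -= 1
        if self.count == 0:
            FREED.append(self.name)     # stand-in for returning the storage to the allocator
            for child in self.children:
                child.release()         # dropping this block drops the references it held


leaf = RcBox("leaf")
parent = RcBox("parent", (leaf.retain(),))  # parent holds its own reference to leaf
leaf.release()    # our local reference goes away; leaf survives because parent holds it
parent.release()  # parent reaches zero, which in turn drops leaf to zero
print(FREED)      # ['parent', 'leaf']
```

One known weakness: two blocks that retain each other never reach zero, which is why refcounted runtimes such as CPython add a separate cycle collector.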

wahern | 5 years ago

> E.g., early versions of the programming language Fortran did not offer dynamic storage allocation, but some programmers would implement their own, say, in a Fortran array. Then for pointers to the allocated storage, just use a subscript on the array name.

Even today people do the same thing in languages like Java and Rust as workarounds for performance or semantic constraints of the environment while still nominally obeying language semantics. I assume the same phenomenon is true in the C# universe.

When you bring this up many users of those languages are quick to explain why array indices are safer than managing raw memory addresses from outside the language's object model, but let's not go there ;)
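A minimal Python sketch of the index-as-pointer workaround with one common refinement: each slot carries a generation counter, so a stale index ("use after free") is detected instead of silently reading whatever now occupies the slot. The generation check is an added technique, often called generational indices, not something the comment above describes; Slab and Handle are hypothetical names.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Handle:
    index: int       # the "pointer": a subscript into the slab
    generation: int  # which occupant of that slot this handle refers to


class Slab(Generic[T]):
    def __init__(self) -> None:
        self._values: list[Optional[T]] = []
        self._generations: list[int] = []
        self._free: list[int] = []

    def insert(self, value: T) -> Handle:
        if self._free:
            i = self._free.pop()          # reuse a freed slot
            self._values[i] = value
        else:
            i = len(self._values)
            self._values.append(value)
            self._generations.append(0)
        return Handle(i, self._generations[i])

    def _valid(self, h: Handle) -> bool:
        return 0 <= h.index < len(self._generations) and self._generations[h.index] == h.generation

    def get(self, h: Handle) -> Optional[T]:
        """Return the value, or None if the handle is stale."""
        return self._values[h.index] if self._valid(h) else None

    def remove(self, h: Handle) -> bool:
        """Free the slot; returns False on a stale handle or double free."""
        if not self._valid(h):
            return False
        self._values[h.index] = None
        self._generations[h.index] += 1   # invalidates every outstanding handle to this slot
        self._free.append(h.index)
        return True


slab: Slab[str] = Slab()
old = slab.insert("old")
slab.remove(old)
new = slab.insert("new")   # same index as `old`, newer generation
print(slab.get(old), slab.get(new))  # None new
```

The check turns a use-after-free into a detectable None rather than memory corruption, but it is still a runtime check the programmer has to handle, not the static guarantee a GC or borrow checker gives.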

zbendefy | 5 years ago

[deleted]

kyberias | 5 years ago

How do I know this person really knows enough about .NET GC internals for me to spend any time watching their videos?

ojnabieoot | 5 years ago

You could make a similar argument about... any video, song, essay, etc., created by anyone. So I’m not sure what the point of your comment is. I think you are getting at the fact that there are a lot of bad-faith grifters out there who prey on entry-level devs. This is a legitimate concern! But in my experience it’s pretty easy to sniff those folks out with a superficial investigation. You don’t want to “spend any time” watching the video, but I think you can spend two minutes skimming their other work, their CV, etc.

I myself didn’t know anything about this person, so I poked through their blog a bit. They are clearly marketing their education business (and I am not their target audience), but they also seem to know what they’re talking about. More to the point: their material appears to be sincerely helpful to intermediate (and even advanced) .NET devs. This relatively short post on unsafe array access in C#[1], although not especially deep, does at least dive into the runtime / IL internals enough to give a clear picture of what the unsafe code is actually doing.

I haven’t watched the video either, but they are creating good-faith tutorials, they are obviously capable of understanding the .NET GC, and there’s no reason to suspect they would change tack and BS their way through a new video series.

[1] https://tooslowexception.com/getting-rid-of-array-bound-chec...

bargl | 5 years ago

While actually a good question, I am reading a bit of snark here.

Don't spend your time on anything you don't want to, but if you want to learn about .NET GC then either research the author or research other deep dives into .NET GC.

To research the author, you could:

Go to the About page on their blog...

You could also look them up on Stack Overflow: https://stackoverflow.com/users/2894974/konrad-kokosa

You could watch the first 10 minutes where he addresses this.

jpfed | 5 years ago

Not that this proves anything, but this author's twitter account is followed by:

* David Fowler (ASP.NET Core creator)

* Andy Gocke (Lead developer for the CLR)

* Jared Parsons (Lead developer for C# Compiler team)

* Miguel de Icaza

Presumably if he wasn't saying anything worth listening to, they would have unfollowed him by now.

Avalaxy | 5 years ago

The author is a .NET MVP and the author of the book Pro .NET Memory Management.

fctorial | 5 years ago

Just read the source code. It has everything you'll ever need:

https://raw.githubusercontent.com/dotnet/runtime/master/src/...

Or spend an hour listening to this person. Who knows, his videos might have some important bits of information.

mustak_im | 5 years ago

He's very well known in the community. If you've been to any talks about .NET GC, chances are you've met him there. Oh yeah, he's also the author of an entire book about .NET GC, Pro .NET Memory Management. So I trust he knows what he's doing.

NicoJuicy | 5 years ago

A minimal amount of effort would lead you to an author with very good reviews (5/5 on Amazon).

PieUser | 5 years ago

Faith.

74 comments