top | item 46416330

(no title)

luaKmua | 2 months ago

In general when game development comes up here I tend not to engage as professional gamedev is so different than what other people tend to deal with that it's hard to even get on the same page, but seeing as how this one is very directly dealing with my expertise I'll chime in.

There are few things off with the this post that essentially sound as someone more green when it comes to Unity development (no problem, we all start somewhere).

1. The stated approach of separating the simulation and presentation layers isn't all that uncommon, in fact it was the primary way of achieving performance in the past (though, you usually used C++, not C#).

2. Most games don't ship on the mono backend, but instead on il2cpp (it's hard to gauge how feasible that'd be from this post as it lacks details).

3. In modern Unity, if you want to achieve performance, you'd be better off taking the approach of utilizing the burst compiler and HPC#, especially with what appears to be happening in the in sample here as the job system will help tremendously.

4. Profiling the editor is always a fools errand, it's so much slower than even a debug build for obvious reasons.

Long story short, Unity devs are excited for the mentioned update, but it's for accessing modern language features, not particularly for any performance gains. Also, I've seen a lot of mention around GC through this comment section, and professional Unity projects tend to go out of their way to minimize these at runtime, or even sidestep entirely with unmanaged memory and DOTS.

discuss

torginus|2 months ago

I think you've unfortunately got suckered in by Unity marketing wholesale, and things would stand to be cleared up a bit.

Unity's whole shtick is that they make something horrible, then improve upon it marginally. The ground reality is that these performance enhancement schemes still fall very much short of just doing the basic sensible thing - using CoreCLR for most code, and writing C++ for the truly perf critical parts.

IL2Cpp is a horror-kludge of generated code, that generates low-quality C++ code from .NET IL, relying on the opitmizing compiler to extract decent performance out of it.

You can check it out: https://unity.com/blog/engine-platform/il2cpp-internals-a-to...

The resulting code gives up every possible convenience of C# (compile speed, convenience, debuggability), while falling well short of even modern .NET on performance.

The Burst compiler/HPC# plays on every meme perpetuated by modern gamedev culture (structure-of-arrays, ECS), but performance wise, generally still falls short of competently, but naively written C++ or even sometimes .NET C#. (Though tbf, most naive CoreCLR C# code is like 70-80% the speed of hyper-optimized Burst)

These technologies needless to say, are entirely proprietary, and require you to architect your code entirely their paradigms, use proprietary non-free libraries that make it unusable outside unity, and other nasty side effects.

This whole snakeoil salesmanship is enabled by these cooked Unity benchmarks that always compare performance to the (very slow) baseline Mono, not modern C# or C++ compilers.

These are well-established facts, benchmarked time and time again, but Unity marketing somehow still manages to spread the narrative of their special sauce compilers somehow being technically superior.

But it seems the truth has been catching up to them, and even they realized they have to embrace CoreCLR - which is coming soonTM in Unity. I think it's going to be a fun conversation when people realize that their regular Unity code using CoreCLR runs just as fast or faster than the kludgey stuff they spent 3 times as much time writing, that Unity has been pushing for more than a decade as the future of the engine.

pjmlp|2 months ago

The biggest issue is that Unity is at the same time, the farol beacon for doing game develpment in C#, that Microsoft refuses to support, see how much effort Apple puts on game kits for Swift, versus DirectX team.

Efforts like Managed DirectX and XNA were driven by highly motivated individuals, and were quickly killed as soon as those individuals changed role.

One could blame them for leaving the project, or see that without them managemenent did not care enough to keep them going.

While at the same time, since Unity relies on such alternative approaches, it also creates a false perception on how good .NET and C# are in reality, for those devs that never learned C# outside Unity.

In a similar way it is like those devs that have learnt Java in Android, and get sold on the Kotlin vs Java marketing from Google, by taking Android Java as their perception of what it is all about.

Going back to game development and .NET, at least Capcom has the resources to have their own fork of modern .NET, e.g. Devil May Cry for the Playstation was done with it.

"RE:2023 C# 8.0 / .NET Support for Game Code, and the Future"

https://www.youtube.com/watch?v=tDUY90yIC7U

animal531|2 months ago

This part of your comment is wrong on many levels: "The Burst compiler/HPC# plays on every meme perpetuated by modern gamedev culture (structure-of-arrays, ECS), but performance wise, generally still falls short of competently, but naively written C++ or even sometimes .NET C#. (Though tbf, most naive CoreCLR C# code is like 70-80% the speed of hyper-optimized Burst)".

C++ code is much faster than C#, but modern C# has become a lot better with all the time that's been invested into it. But you can't just take a random bit of C code and think that it's going to be better than an optimized bit of C#, those days are long past.

Secondly, the whole point of Burst is that it enables vectorization, which means that if you've converted code to it and it's used properly that its going to support instructions up to 256 wide (from what I remember it doesn't use AVX512). That means that it's going to be significantly faster than standard C# (and C).

If the author is generating for example maps and it takes 80 seconds with Mono, then getting to between 10-30 seconds with Burst is easy to achieve just due to its thread usage. Once you then add in focused optimizations that make use of vectorization you can get that down to probably 4 odd seconds (the actual numbers really depend on what you're doing, if its a numerical calculation you can easily get to 80x improvement, but if there's a lot of logic being applied then you'll be stuck at e.g. 8x.

For the last point, new modern C# can't just magically apply vectorization everywhere, because developers intersperse far too much logic. It has a lot of libraries etc. that have become a lot more performant, but again you can't compare that directly to Burst. To compare to Burst you have to do a comparison with Numerics, etc.

doctorpangloss|2 months ago

While I get that you’re making a stylized comment, it’s a big drag. It’s one of those, “everyone is an idiot except me” styles. By all means, make a game engine that people will adopt based on CoreCLR (or whatever).

It’s not saying much that everything has tradeoffs. During the “decade” you are talking about, CoreCLR didn’t have a solution for writing anything for iOS, and today, it isn’t a solution for writing games for iOS. What you are calling kludges was ultimately a very creative solution. Usually the “right” solution, the nonexistent one that you are advocating with, ends with Apple saying no.

That is why Unity is a valuable piece of software and a big company: not because of C# runtimes, but because they get Apple and Nintendo to say yes in a world where they usually say no.

iliketrains|2 months ago

Author here, thanks for your perspective. Here some thoughts:

> approach of separating the simulation and presentation layers isn't all that uncommon

I agree that some level of separation is is not that uncommon, but games usually depend on things from their respective engine, especially on things like datatypes (e.g. Vector3) or math libraries. The reason I mention that our game is unique in this way is that its non-rendering code does not depend on any Unity types or DLLs. And I think that is quite uncommon, especially for a game made in Unity.

> Most games don't ship on the mono backend, but instead on il2cpp

I think this really depends. If we take absolute numbers, roughly 20% of Unity games on Steam use IL2CPP [1]. Of course many simple games won't be using it so the sample is skewed is we want to measure "how many players play games with IL2CPP tech". But there are still many and higher perf of managed code would certainly have an impact.

We don't use IL2CPP because we use many features that are not compatible with it. For example DLC and mods loading at runtime via DLLs, reflection for custom serialization, things like [FieldOffset] for efficient struct packing and for GPU communication, etc.

Also, having managed code makes the game "hackabe". Some modders use IL injection to be able to hook to places where our APIs don't allow. This is good and bad, but so far this allowed modders to progress faster than we expected so it's a net positive.

> In modern Unity, if you want to achieve performance, you'd be better off taking the approach of utilizing the burst compiler and HPC#

Yeah, and I really wish we would not need to do that. Burst and HPC# are messy and add a lot of unnecessary complexity and artificial limitations.

The thing is, if Mono and .NET were both equally "slow", then sure, let's do some HPC# tricks to get high performance, but it is not! Modern .NET is fast, but Unity devs cannot take advantage of it, which is frustrating.

By the way, the final trace with parallel workers was just C#'s workers threads and thread pool.

> Profiling the editor is always a fools errand

Maybe, but we (devs) spend 99% of our time in the editor. And perf gains from editor usually translate to the Release build with very similar percentage gains (I know this is generally not true, but in my experience it is). We have done many significant optimizations before and measurements from the editor were always useful indicator.

What is not very useful is Unity's profiler, especially with "deep profile" enabled. It adds constant cost per method, highly exaggerating cost of small methods. So we have our own tracing system that does not do this.

> I've seen a lot of mention around GC through this comment section, and professional Unity projects tend to go out of their way to minimize these at runtime

Yes, minimizing allocations is key, but there are many cases where they are hard to avoid. Things like strings processing for UI generates a lot of garbage every frame. And there are APIs that simply don't have an allocation-free options. CoreCLR would allow to further cut down on allocations and have better APIs available.

Just the fact that the current GC is non-moving means that the memory consumption goes up over time due to fragmentation. We have had numerous reports of "memory" leaks where players report that after periodic load/quit-to-menu loops, memory consumption goes up over time.

Even if we got fast CoreCLR C# code execution, these issues would prevail, so improved CG would be the next on the list.

[1] https://steamdb.info/stats/releases/?tech=SDK.UnityIL2CPP

timmytokyo|2 months ago

>We don't use IL2CPP because we use many features that are not compatible with it. For example DLC and mods loading at runtime via DLLs, reflection for custom serialization, things like [FieldOffset] for efficient struct packing and for GPU communication, etc.

FieldOffset is supported by IL2CPP at compile time [0]. You can also install new DLLs and force the player to restart if you want downloadable mod support.

It's true that you can't do reflection for serialization, but there are better, more performant alternatives for that use case, in my experience.

[0] https://docs.unity3d.com/Manual/scripting-restrictions.html

animal531|2 months ago

What I agree on is that if we had modern .NET available we'd get a free 2-3x improvement, it would definitely be great. BUT having said that, if you're into performance but unwilling to use the tools available then that's on you.

From the article it seems that you're using some form of threading to create things, but you don't really specify which and/or how.

The default C# implementations are usually quite poor performance wise, so if you used for example the default thread pool I can definitively say that I've achieved a 3x speedup over that by using my own thread pool implementation which would yield about the same 30s -> 12s reduction.

Burst threading/scheduling in general is also a lot better than the standard one, in general if I feed it a logic heavy method (so no vectorization) then I can beat it by a bit, but not close to the 3x of the normal thread pool.

But then if your generation is number heavy (vs logic) then having used Burst you could probably drop that calculation time down to 2-3 seconds (in the same as if you used Vector<256> numerics).

Finally you touch on GC, that's definitely a problem. The Mono variant has been upgraded by them over time, but C# remains C# which was never meant for gaming. Even if we had access to the modern one there would still be issues with it. As with all the other C# libraries etc., they never considered gaming a target where what we want is extremely fast access/latency with no hiccups. C# in the business world doesn't really care if it loses 16ms (or 160ms) here and there due to garbage, it's usually not a problem there.

Coding in Unity means having to go over every instance of allocation outside of startup and eliminating them, you mention API's that still need to allocate which I've never run into myself. Again modern isn't going to simply make those go away.

luaKmua|2 months ago

Hey there, always appreciate a dialog

Per the separation, I think this was far more common both in older unity games, and also professional settings.

For games shipping on mono on steam, that statistic isn't surprising to me given the amount of indie games on there and Unity's prevalence in that environment. My post in general can be read in a professional setting (ie, career game devs). The IL injection is a totally reasonable consideration, but does (currently) lock you out of platforms where AoT is a requirement. You can also support mods/DLC via addressables, and there has been improvement of modding tools for il2cpp, however you're correct it's not nearly as easy.

Going to completely disagree that Burst and HPC# are unnecessary and messy. This is for a few reasons. The restrictions that HPC# enforce essentially are the same you already have if you want to write performant C# code as you just simply use Unity's allocators for your memory up front and then operate on those. Depending on how you do this, you either can eliminate your per frame allocations, or likely eliminate some of the fragmentation you were referring to. Modern .Net is fast, of course, but it's not burst compiled HPC# fast. There are so many things that the compiler and LLVM can do based on those assumptions. Agreed C# strings are always a pain if you actually need to interpolate things at runtime. We always try to avoid these as much as we can, and intern common ones.

The fragmentation you mention on after large operations is (in my experience) indicative of save/load systems, or possibly level init code that do tons of allocations causing that to froth up. That or tons of reflection stuff, which is also usually nono for runtime perf code. The memory profiler used to have a helpful fragmentation view for that, but Unity removed it unfortunately.

oppo777|2 months ago

[deleted]