Small freezes / fps drops with increased frequency when speeding up game

Started by RayZoar-Z, July 10, 2019, 06:05:10 PM

Elvenstar32

Quote from: Canute on October 14, 2019, 12:03:07 PM
Quote: (sometimes it even causes the audio to freeze)
I had trouble with the audio too.
But that was because a hard disk had gone into auto-sleep mode and the system tried to access it.
This caused the audio to freeze/loop for a few seconds until the system could access the files on that hard disk.

But I don't think that is related to these lag spikes.

Given that this only occurs during the stuttering, that the stuttering only occurs when the GC is clearing itself, that I have the game on an SSD, and that I have disabled hard drive sleep in the Windows power plan, I strongly suspect the two are linked in my case.

madd_mugsy

So I've done a little bit of testing at 60Hz, 3x speed.  I noticed the lag spike for <1s every 55-60 seconds or so (approx 57 seconds usually), though there is also always a spike within 10 seconds of starting a new game.

I then added Apparello2 because it's a pretty big content mod with zero code, and while the memory usage was higher, the time to lag remained the same, as did the duration of the lag.

Then I tried running with just HugsLib, and it was basically the same as vanilla.

Then I tried running with HugsLib + JecsTools and it would lag every 19 seconds instead of every ~57 seconds.  The memory usage goes up very quickly with JecsTools, and I'd wager that it's exacerbating things quite a bit.  Has anyone mentioned this to Jecrell?

In these tests, I only ran the game for several minutes each time, as I don't have a ton of spare time today.  Anecdotally, my past experience has been that the lag becomes more pronounced (longer spikes) over time, though I don't have the data to back that up yet.

Also if anyone knows a good memory profiler that will work with Rimworld, please let me know.  RenderDoc was the one I saw recommended to use with Unity games, but I haven't had much luck with it so far.

Elvenstar32

I have done more testing as well, and I can confirm everything madd_mugsy said.

Testing conditions were similar: 60 Hz and 3x speed. However, I started with 10 colonists instead of 3, just to have something of a midgame performance simulation.

On the hardware side of things, I have RimWorld on an M.2 SATA SSD, running the game on Windows 10 (1903) 64-bit with an i7-9750H, a GTX 1660 Ti and 16 GB of DDR4 RAM @ 2666 MHz. No thermal throttling at any point.

So back to the results:

The spike is a lot shorter with vanilla or few mods, but it gets longer as the number of mods increases. It always remains a single spike, however; the game does not freeze for several seconds in a row. Nevertheless, the vanilla spike seems to be only a few microseconds, whereas a fully modded spike seems to be closer to a full second.

Using RimworldGC to monitor RAM usage, vanilla rimworld RAM usage increases by 0.5MB per second. As you add mods this increases progressively and fairly linearly except for problem mods.

It seems that the larger the RAM increase per second, the more frequent and the longer the spikes become.

There are however as madd_mugsy also mentioned some mods that increase the RAM usage per second a lot more than other mods.

So far I have found 3 mods that are way over the line and are probably a big source of these issues, and 3 mods that need more testing but definitely use more than other mods on average.

The 3 big ones were: "JecsTools", "What the hack" and "Achtung". Each of those raises the RAM increase per second by 10 to 30 MB.

The 3 questionable ones were: "Giddy up" and all its modules, "Combat Enhanced" with all the patches this mod always needs, and the Humanoid Alien Race Framework (even without any additional factions installed). Those raise the RAM increase per second by between 5 and 10 MB.

As I have tested a wide variety of mods, here's the list of all mods I have currently enabled : https://pastebin.com/QYdHXwx9

I'm fairly confident that none of those mods is individually responsible for big spikes (power++ seems to be slightly more RAM-hungry than average, though, using a whole MB per second on its own). As it stands, RAM usage increases by 4 to 5 MB per second, with a spike every 50 seconds.
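As a back-of-the-envelope check on the figures above, the sketch below multiplies the reported RAM growth rate by the reported spike interval to estimate how much memory piles up between two spikes. It assumes, without proof, that all of the growth is short-lived garbage released at the spike; the ~57 s vanilla interval is taken from madd_mugsy's post, and the function name is made up for illustration.

```python
# Rough estimate only: assumes the reported RAM growth is entirely
# garbage that gets freed at each spike (an unverified assumption).

def garbage_per_cycle_mb(growth_mb_per_s, spike_interval_s):
    """Memory accumulated between two spikes, in MB."""
    return growth_mb_per_s * spike_interval_s

# Vanilla: 0.5 MB/s growth, spike roughly every 57 s.
vanilla = garbage_per_cycle_mb(0.5, 57)

# Current mod list: 4 to 5 MB/s growth, spike every 50 s.
modded_low = garbage_per_cycle_mb(4.0, 50)
modded_high = garbage_per_cycle_mb(5.0, 50)

print(f"vanilla: ~{vanilla:.1f} MB per cycle")
print(f"modded:  ~{modded_low:.0f}-{modded_high:.0f} MB per cycle")
```

If the assumption holds, the modded game has several times more memory to clean up at each spike than vanilla does, which would be consistent with the longer spikes reported here.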

Ophaq

Hey all, I'm just going to put this out there and say that I have also been having this issue since around A18. I, however, have over 250 mods installed. But my issue is probably the same thing, especially late game. I've been experiencing it in the early game as well, though, where the game will hiccup for a split second regardless of whether I'm panning around the map. It gets way worse late game, and I usually get sound stutters where the sound "drags", like in a BSOD, for half a second or a second.

My GPU is an HD 7970 and I run an i5-3570K OC'd to 4.2 GHz.

Penchekrak

I think there is a silver bullet for this issue: disable the garbage collector and trigger it manually just before each autosave. We always get a lag before autosaving anyway, so it changes nothing if that lag is 1 second longer, but it would make gameplay much smoother.
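The idea can be sketched as follows. This is an analogy in Python, not RimWorld or Unity code: Python's `gc` module only controls its cycle collector, and whether the Mono runtime used by the game offers an equivalent switch is an assumption not established in this thread. `autosave` and `run` are hypothetical stand-ins.

```python
import gc

def autosave(state):
    """Hypothetical stand-in for the game's autosave."""
    return f"saved tick {state['tick']}"

def run(ticks, autosave_every):
    """Run a toy game loop with automatic collection switched off."""
    log = []
    gc.disable()  # no automatic collections during normal play
    try:
        state = {"tick": 0}
        for tick in range(1, ticks + 1):
            state["tick"] = tick
            # Simulate per-tick garbage: a self-referencing list that only
            # the cycle collector can reclaim.
            junk = []
            junk.append(junk)
            if tick % autosave_every == 0:
                gc.collect()  # pay the collection cost right before saving
                log.append(autosave(state))
    finally:
        gc.enable()  # restore default behaviour
    return log

print(run(10, 5))
```

The trade-off is that garbage keeps accumulating between saves, so memory use grows and the single collection before the autosave has more work to do.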

LWM

Quote from: Tynan on October 14, 2019, 12:02:04 AM
So basically if code is unoptimized in general, in ways that are not obvious, it'll produce GC hitches. But it won't produce these hitches when it runs; it'll produce them some time later. I think there's definitely some of this going on with these mods. So part of the solution will be simply having mod authors optimize their code to stop allocating and releasing so many objects.

For mod authors who are unversed in Unity programming or unfamiliar with C#, is there anything you can recommend in terms of practices to avoid, practices to follow, tools to use, or perhaps reading about the problem?

For everyone reporting problems, there's also the question of memory pressure in general - what's going on in the background?  Many of us (including the dev apparently) don't ever see the distracting stutters, so this has got to be a very frustrating problem to work on.

One other question for Tynan: is it possible to force the GC to run more often - and might that help at all anyway?

Thanks!

Ophaq

I also have an issue where game speeds become slower late game. For example, speed 3 will look like speed 1 or 2. But I think that is just because RimWorld is a single-core game and so much is going on, especially with over 250 mods.

Ophaq

Quote from: Penchekrak on October 24, 2019, 04:30:00 AM
I think there is a silver bullet for this issue: disable the garbage collector and trigger it manually just before each autosave. We always get a lag before autosaving anyway, so it changes nothing if that lag is 1 second longer, but it would make gameplay much smoother.
How do you do this?

Tassatir

Dunno if this will help but I've had this issue recently and I've realized it's because my AI (Colony Manager mod) had its tasks set to check things too often. (Such as checking forestry and hunting among other things)
I had it set to check every hour IGT which was too much for my game to handle, so I cranked it back to every 8 hours and daily and it solved my issue. Hope this helps.

Edited to mention the specific mod, since I noticed it on your mods list and the symptoms seemed to match up with mine, since the faster the game runs the more often it can check tasks. I know it's a bit late but I hope it can help you.

Penchekrak

There is a fix in the new version of Unity, an incremental garbage collector: https://www.youtube.com/watch?v=5Fks2NArDc0
It looks like that should solve this problem in RimWorld.
Tynan, could you please comment on that feature? Has it been tested, and is it worth it?

Pangaea

The video says that .NET 4 is a requirement, which I believe means this GC wouldn't work on Linux or Mac systems. Not good.

Since the base game runs well, however, do we know that this really is a GC issue, and not a modding issue?

Pressensaft

For me, the base game exhibits the same behavior, though. The lag isn't as prominent or as frequent, but it's there, and when calling the GC with RuntimeGC you can reproduce the lag. That being said, it's a lot harder to notice in vanilla as it only happens every 50-60 secs. I played with B18 and the lag didn't appear, no matter the number of mods I used.
The issue has been in the game since around 1.0 and I've been looking for a fix since then, as the game is unenjoyable for me. There has to be a difference between these two versions that causes the lag.

Tynan

If there is code spewing garbage into the memory heap:

-- Running GC more often won't help for a variety of deeper technical reasons. Otherwise we'd just GC every frame.

-- Incremental GC would likely smooth out the spikes, but would just spread performance degradation more among all the different frames. This may be an improvement but I'm not sure I'd call it an acceptable solution. No way of knowing without testing.
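To put a rough number on "spreading" the cost, here is a worked example using figures reported earlier in the thread (a spike of about one second roughly every 50 seconds at 60 Hz). It makes the optimistic assumption that an incremental collector does exactly the same total work as the single spike; real incremental collectors typically add bookkeeping overhead, so treat the result as a best case rather than a prediction.

```python
# Best-case arithmetic: same total GC work, spread evenly over every frame.
FPS = 60              # display rate used in the tests above
SPIKE_MS = 1000.0     # reported length of a heavily modded spike
INTERVAL_S = 50       # reported time between spikes

frames_between_spikes = FPS * INTERVAL_S        # frames available to share the work
per_frame_ms = SPIKE_MS / frames_between_spikes  # GC time added to each frame
frame_budget_ms = 1000.0 / FPS                   # time available per frame at 60 Hz
share_of_budget = per_frame_ms / frame_budget_ms

print(f"{frames_between_spikes} frames between spikes")
print(f"{per_frame_ms:.3f} ms of GC per frame")
print(f"{share_of_budget:.1%} of the {frame_budget_ms:.2f} ms frame budget")
```

Under these assumptions the hitch disappears, but every frame pays a small, permanent cost, which matches the trade-off described above and can only be judged by testing.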

This isn't a problem you're going to solve with a simple band-aid, it's just a matter of engineering code not to spew unnecessary garbage into the heap. It's a continuous process of profiling, optimization, and complex technical decision-making that lasts for an entire project. We did it and modders need to do it too to produce quality, performant code.

Without looking at code it's hard to say how mod authors can do better in terms of writing code specifically. However, the one critical thing they should try to do is profile their code so they can see where the problems are. (I believe there are some .NET profilers that could work, though I'm not an expert here since we use the internal profiler.)
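For readers who have never profiled allocations, the general workflow is: take a snapshot, run the suspect code, take another snapshot, and compare. The sketch below shows that workflow with Python's standard `tracemalloc` module, purely as an analogy; it is not a .NET or Unity profiler, and the `wasteful` function is a made-up example of allocation-heavy code.

```python
import tracemalloc

def wasteful(n):
    """Made-up example: allocates a new small string for every item."""
    return [str(i) + "," for i in range(n)]

tracemalloc.start()
before = tracemalloc.take_snapshot()

data = wasteful(10_000)  # the code under investigation

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Rank source lines by how much memory they allocated between snapshots.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)
```

The line that allocates the most shows up at the top of the list, which is the kind of information a mod author would need before deciding what to optimize.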

Quote from: Pressensaft on October 30, 2019, 08:07:29 AM
For me, the base game exhibits the same behavior, though. The lag isn't as prominent or as frequent, but it's there, and when calling the GC with RuntimeGC you can reproduce the lag. That being said, it's a lot harder to notice in vanilla as it only happens every 50-60 secs. I played with B18 and the lag didn't appear, no matter the number of mods I used.
The issue has been in the game since around 1.0 and I've been looking for a fix since then, as the game is unenjoyable for me. There has to be a difference between these two versions that causes the lag.

If you're using RuntimeGC, you're not using vanilla. If you're using RuntimeGC to induce performance problems deliberately, it's definitely not vanilla.

I'm not sure how it's working now, but we took a look at RuntimeGC a year or two ago and found that it didn't do anything useful; it's perfectly possible this mod is degrading your performance by unnecessarily checking and shuffling data. Not everything that 'looks' or 'feels' like an optimization actually is, and sometimes they're de-optimizations.

If RuntimeGC is forcing garbage collection more often (not sure it is, but if) it's almost certainly degrading performance. GC timing is an extremely complex subject because of the generational nature of the GC and all the non-obvious things it's doing internally to speed itself up. It does not work the way you would think, and the way it does work does not produce results the way you would think. This is a hyper-optimized system designed by people with massive expertise in their field backed by years of formal research, analysis, and performance data. The smartest thing to do is to leave it alone. This is a hard point to make without actually teaching a course on garbage collection, cache lines, pipelines, branch prediction, etc. But it's just not a system that bears simplistic non-expert reasoning.

The core game of 1.0 is more optimized than B18, but a lot has changed around it in the ecosystem that could affect performance if you're using mods. There are more mods, bigger mods, more ambitious and complex mods. Modders are more numerous and may be less experienced on average than they were before.

---

Regarding core count, even if we made the vanilla game code use multiple cores it wouldn't help GC problems at all. The current situation is several seconds of smooth framerates with hitches at GC time. Moving code to multi-core just speeds up the already-smooth sections without helping the GC whatsoever. So you're solving a problem you don't have, without addressing a problem you do have.

Also note that even if we put a ton of the vanilla game code on multiple cores, it wouldn't mean that mods had to do the same thing. And if mods need more performance, mods already have everything they need to run their code on multiple cores. Even if vanilla ran on multiple cores, mods might not. Mods can still spew garbage on any number of cores, producing the same GC hitches as now, because as noted above the speed of the GC has nothing to do with how many cores the game is running on. It has to do with how much memory garbage the game is producing, which is a figure we optimized quite a bit in vanilla because it's the one that really matters.

Honestly there's a lot of voodoo reasoning about optimization and performance out there. When you actually profile a real game in real-world situations on a real and flawed tech stack (like Unity and the Mono GC), the things that actually matter can often turn out to be completely different than what you might have thought. In this case, what actually matters is reducing memory garbage. Multi-core doesn't help this. Messing with the default GC works against this.
Tynan Sylvester - @TynanSylvester - Tynan's Blog

ReZpawner

This is a bit of a shot in the dark, but during my own development in Unity, I came across someone mentioning that strings could cause stuttering and tiny lag spikes due to garbage collection, and that the way around it was to use StringBuilder instead of strings. It stuck with me since it describes the problem that a lot of people are having in RimWorld in the late game. Could this be relevant?

Tynan

Quote from: ReZpawner on November 10, 2019, 10:56:37 AM
This is a bit of a shot in the dark, but during my own development in Unity, I came across someone mentioning that strings could cause stuttering and tiny lag spikes due to garbage collection, and that the way around it was to use StringBuilder instead of strings. It stuck with me since it describes the problem that a lot of people are having in RimWorld in the late game. Could this be relevant?

Building complex strings with repeated string allocations is one out of a thousand things that can spew memory and make GC spikes. Vanilla doesn't do it, but some mods probably do.
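To illustrate the pattern being discussed, the sketch below builds the same text two ways: by repeated concatenation, where each `+` produces a new intermediate string that soon becomes garbage, and by writing into a single buffer. It is written in Python with `io.StringIO` standing in for C#'s `StringBuilder`; the function names and the item data are hypothetical, and nothing here is taken from the game or any mod.

```python
import io

def build_report_concat(items):
    """Repeated concatenation: every + creates a throwaway intermediate string."""
    text = ""
    for name, count in items:
        text = text + name + ": " + str(count) + "\n"
    return text

def build_report_buffer(items):
    """Single growing buffer, analogous to appending to a StringBuilder."""
    buf = io.StringIO()
    for name, count in items:
        buf.write(name)
        buf.write(": ")
        buf.write(str(count))
        buf.write("\n")
    return buf.getvalue()

items = [("steel", 3), ("wood", 7)]
print(build_report_buffer(items), end="")
```

Both functions return identical text; the difference is only in how many short-lived objects are created along the way, which is what matters when the code runs every tick.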