9thFinger's ThreadingMod

Started by FrodoOf9Fingers, October 09, 2017, 04:38:56 PM

Previous topic - Next topic

FrodoOf9Fingers

I figured that there should be a thread in the mods section for the work I'm attempting to do.

I'm currently working on making the main update loop of the game multi-threaded. Currently, every "thing" does it's "tick" on the same thread, I'm working on splitting the work into separate threads.

Goals:
Implement this using harmony as much as possible. Currently blocked on Harmony not working with reverse patching, so I've hijacking functions (which will destroy compatibility with mods that modify the same functions).

Progress: I feel like I got a good chunk of the errors while threading resolved, the game no longer explodes with errors. Working on a good way to handle loading resources, as unity wants those loaded from the main thread (apparently). These include things like speech bubbles and sounds.

If there are any modders reading this, you might want to consider making your mod thread safe. Once I get this to a certain point, I might release a version for modders to use to see what if they need to modify their code for parallelism.

Wish me luck!

P.S. If you see FifthElement in discord, thats me.

Kiame

Thanks for this post. Most of my mods will probably work without an issue though two of them could be problematic (Change Dresser and Weapon Storage). I'll pull the dev version down and test it out when it's available  :)

TastyCookies

Thank you for making this mod, RimWorld definitely needs multi-threading. I wish you the best luck on your progress :)

SpaceDorf

Maxim 1   : Pillage, then burn
Maxim 37 : There is no overkill. There is only open fire and reload.
Rule 34 of Rimworld :There is a mod for that.
Avatar Made by Chickenplucker

FrodoOf9Fingers

Not to ruin the party, but I thought I should mention...

This may not work. Sometimes communication between threads is greater than the gains from multiple threads. Now, I don't think that's the case here, but it is possible.

Also, I'm changing function definitions at runtime, which is inefficient. We will see if we gain speed rather than lose it.

CannibarRechter

> Currently blocked on Harmony not working with reverse patching,

I think I asked you this before, but I missed your response. What is this reverse patching thing you are referring to? I know Harmony pretty well, but haven't encountered this terminology or feature.
CR All Mods and Tools Download Link
CR Total Texture Overhaul : Gives RimWorld a Natural Feel
CR Moddable: make RimWorld more moddable.
CR CompFX: display dynamic effects over RimWorld objects

FrodoOf9Fingers

Sorry for getting back to you late, visiting family work and school had me busy for awhile.

Reverse patching...

A good example would be the verse.ai.Pathfinder class in rimworld. All it does is run an algorithm, but the class is stateful (game is still in alpha, things can change...), meaning that while one request for findPath is running, running another one in the same instance will corrupt the results of both (or just cause errors).

The way for a modder to fix this (I cannot emphasize enough the word modder there) is to create a pool of Pathfinder instances. Then, forward calls to findPath to available Pathfinder instances.

One way of patching this with harmony is to have a prePatch that forwards the call to an available PathFinder and call it's findPath. But now you've got a circular loop, because as soon as you call the forwarded findPath, the prePatch code runs again.

The way to fix this is to copy the original findPath code to a new function that is an extension method for Pathfinder (lets call it findPathHijack, or something like that). Then your prePatch forwards calls from findPath to another instance of Pathfinder, but instead of calling findPath again you call findPathHijack so that the prePatch doesn't run again.

In short, reversePatching is taking the original code and patching your own code with the original code. Patching is modifying the original code with your own code. 

There are some workarounds, I can still create a working mod. It just won't play nice with other mods that affect the same methods. I'm not even sure if reverse patching will be possible to the extent that I want it to be.

FrodoOf9Fingers

As a heads up, I am taking a short break from the actual coding. I'm trying to make this a senior project for my bachelor's degree, and there's some paperwork I need to fill out, and then I can start reporting to the project coordinator on a weekly basis for credit.

SpaceDorf

Good Idea  ;D

and real long term comitment.
Maxim 1   : Pillage, then burn
Maxim 37 : There is no overkill. There is only open fire and reload.
Rule 34 of Rimworld :There is a mod for that.
Avatar Made by Chickenplucker

Kori


Nightinggale

Now this is a mod, which should really be in Rimworld itself. However based on official statements, it's not going to happen anytime soon, meaning a mod like this is most welcome.

Quote from: FrodoOf9Fingers on October 12, 2017, 08:14:24 PM
Not to ruin the party, but I thought I should mention...

This may not work. Sometimes communication between threads is greater than the gains from multiple threads. Now, I don't think that's the case here, but it is possible.

Also, I'm changing function definitions at runtime, which is inefficient. We will see if we gain speed rather than lose it.
This post makes me so happy about this thread because of past experience with people, who claim they can do wonders if only (certain tech word) is used, like "I can move the graphics from the CPU to the GPU to free up CPU time.", later: "I don't get it. I only get max 1 fps, the CPU load has increased significantly and the screen is drawn upside down" (sadly true story). However the post I reply to here reveals actual insights in threading and added bottlenecks, which can be bigger than the gain.

There are plenty of bottlenecks in going multithreaded and they are not all equally obvious. The most talked about is semaphores, which are used to make sure writing to shared memory stays thread safe, effectively making the threads stall and queue to access memory. If used too little, the code can become unstable. If used too much, the overhead will slow down more than what is gained, effectively making the resulting program run slower. It's a lot harder to get a good stable design with as few semaphores than most new programmers realize.

There are some less talked about issues, which prevents the multithreaded performance from increasing as much as one would think at first glance. Memory I/O and CPU cache activity causes the CPU cores to stall more. Another is CPU temperature or power usage. For instance my CPU clock speed drops 15% when going from one fully loaded core to 4 fully loaded because of a limit to the current (power) to all cores combined and even with reduced speed, it produces more heat. Heat tend to slow down laptops more than this desktop, but I don't have laptop numbers ready. However usually CPU cores aren't fully loaded though as they idle while waiting for memory, even if they show up at 100% in the process overview, meaning it depends on how memory efficient the programming is. That makes this topic quite complex to predict, to say the least.

The real bottleneck in this task is not how much can be spread to other cores, but rather how much the main thread can be reduced. If the main thread is reduced by 10% including the overhead, it doesn't really matter if there are 3 other cores with 10% or 50% load each because it's still the main thread, which will be the bottleneck.

So yeah, predicting performance boost from using multiple CPU cores is not strait forward. Now the question is what to expect from using multiple threads and how to measure it. Combined CPU time makes no sense because that includes time spend on other CPUs and we don't care (much) about that for the time being. I think the best would be how many calculations can we make each tick without visible slowdowns. If we can get a 10% increase there, then I will call it a great success. Sure I hope for 100%+ increase, but I don't find that very likely.

Quote from: FrodoOf9Fingers on October 17, 2017, 01:20:24 PMThe way for a modder to fix this (I cannot emphasize enough the word modder there) is to create a pool of Pathfinder instances. Then, forward calls to findPath to available Pathfinder instances.
This would likely require one instance for each core, multiplied by 2 if the CPU has hyperthreading. This brings up the question: how much extra memory will this require? It's already approaching the limit due to being a 32 bit application, though I have found the out of memory issues seems to go away if playing on a 30% world rather than a 100% world.

Having said that, I also has to point out that it has been announced that pathfinding in A16 is slow, A17 improved performance, but made pawns use weird routes and in A18 it will change again, this time to be equally optimized while using fastest paths. A18 is currently in internal testing, meaning it's very likely that you will have to review your pathfinding code within the near future.

Quote from: FrodoOf9Fingers on October 09, 2017, 04:38:56 PMIf there are any modders reading this, you might want to consider making your mod thread safe.
Good advice, but it's almost useless because the vast majority of modders will not have a clue to how to do that. You will have to write a guide or a list of "do this" and "never do this".

It brings up an interesting question. What if two colonists plans what to do during the same tick and at the very same time on two CPU cores decides to take the same wood to the same furniture construction? What will happen and how do you make it thread safe to prevent the same item from being picked up multiple times?
ModCheck - boost your patch loading times and include patchmods in your main mod.

FrodoOf9Fingers

Quote from: Nightinggale on October 17, 2017, 04:20:00 PMThe real bottleneck in this task is not how much can be spread to other cores, but rather how much the main thread can be reduced. If the main thread is reduced by 10% including the overhead, it doesn't really matter if there are 3 other cores with 10% or 50% load each because it's still the main thread, which will be the bottleneck.

This should be fairly high. I don't believe that graphics takes more than 25% of the CPU time (leaving 70-75% for the main update loop, which I'm threading). But, there is that chance...

Updating the pathfinding algorithm will be rather easy, I haven't modified the actual function code at all, so it'll just be a matter of updating function calls (once Harmony updates). I'm not worried about memory usage right now (I create 8 of everything, I'm on an I7), but before release, I'll go through and re-evaluate my decisions. My first thought is to only spin up extra pathfinders on demand (if I need a 4th, create a 4th and keep it around for awhile), which would be easy to do with my current setup. This is actually my first large C# project, I'm a Java programmer by trade, so I've been playing around alot with coding styles, conventions, etc... that I'll need to clean up / standardize throughout the project before release.

As for your specific threading example: I don't know yet :P. There's tons of hard problems and I haven't gotten tot hat one yet. For the most part, I'll been resolving NREs throughout the code stemming from multithreading. After I get those errors out of the way I get to think about resource contention (finding exceptions is a great way to know the code too, so when I get to the harder debugging I'll stand a better chance). From my ignorant position, I feel like there potentially three possibilities: lock the thought trees so only 1 decision maker can act at a time (eww...), lock some smaller section in the thought tree based on item (I doubt this is possible), or just allow it: two Pawns heading for the same resource trying to do the same task would be resolved when one pawn inevitablely gets there first. If not (VERY unlikely), an NRE would be thrown from picking up an item thats not there, which I would catch and handle with a retick (ewww, but it works).

This would work for an initial release version. I'm not writting perfect code, just working code. Later versions will make things go better.

Kiame

For actions of pawns on shared objects - this is just observation from the game, not actually looking at the code

It looks like items get reserved by pawns before a pawn starts an action on it. In this case it could just be last one wins. If two pawns reserve a production table:
P1 checks and sees table is not reserved and-
P2 checks the table and it's not reserved and reserves it
P1 -reserves the table. P1 verifies it has the reservation on the table and starts going that way.
P2 Verifies it has the reservation but it doesn't. P2 does something else.

Worst case both pawns start doing an action the the bench but the losing pawn gives up after one update cycle.

FrodoOf9Fingers

Quote from: Kiame on October 20, 2017, 03:58:02 PM
For actions of pawns on shared objects - this is just observation from the game, not actually looking at the code

It looks like items get reserved by pawns before a pawn starts an action on it. In this case it could just be last one wins. If two pawns reserve a production table:
P1 checks and sees table is not reserved and-
P2 checks the table and it's not reserved and reserves it
P1 -reserves the table. P1 verifies it has the reservation on the table and starts going that way.
P2 Verifies it has the reservation but it doesn't. P2 does something else.

Worst case both pawns start doing an action the the bench but the losing pawn gives up after one update cycle.

Actually, the first pawn that reserves it gets it. It may be "overridden" by direct commands, but the first pawn to cycle through reserves the item, assuming that they both tried to get the item on the same tick.

If the same happens in a multi-threaded environment, the outcome is somewhat random (currently, working towards this point).

Nightinggale

Quote from: FrodoOf9Fingers on October 20, 2017, 02:28:49 PMFrom my ignorant position, I feel like there potentially three possibilities: lock the thought trees so only 1 decision maker can act at a time (eww...), lock some smaller section in the thought tree based on item (I doubt this is possible), or just allow it: two Pawns heading for the same resource trying to do the same task would be resolved when one pawn inevitablely gets there first. If not (VERY unlikely), an NRE would be thrown from picking up an item thats not there, which I would catch and handle with a retick (ewww, but it works).
It's possible that the problem can be solved by reserving the item and then next tick act on it. This way if two pawns reserve the same item, the next tick they both check for it and one of them will realize it's reserved to somebody else and figure out something else to do. The beauty of this approach is that it should be possible to do without semaphores, which would be beneficial for performance. Picking up and dropping items would need some consideration though as even a minor risk of item duplication would be a serious issue.

Quote from: FrodoOf9Fingers on October 20, 2017, 07:56:36 PMIf the same happens in a multi-threaded environment, the outcome is somewhat random (currently, working towards this point).
Random is ok. Nobody will really be able to tell the difference anyway and Rimworld is singleplayer. Introducing that kind of randomness in multiplayer (read: multiple computers doing the same calculations in parallel) is gamebreaking because then it requires that the output of the randomness (race condition) will have to be the same for everybody every single time, which is extremely unlikely.
ModCheck - boost your patch loading times and include patchmods in your main mod.