64 bit?

Started by daedelus, December 24, 2016, 12:03:30 AM


daedelus

I'm just thinking out loud here, but is there a reason this game is 32-bit? It seems like Tynan is severely limiting its potential by not going 64-bit... If it were 64-bit, multiple colonies would still be feasible even though the game hasn't been optimized for them, since it could use much more memory. Is this maybe planned for the future?

Zhentar

Making the game 64-bit is pretty much five minutes of work: switch the build target and hit compile.

But we wouldn't benefit from it, and neither would Tynan. I'm sure he doesn't want the memory usage to go past 4GB, since that would push the system requirements past what many players are using successfully today. And we wouldn't benefit from it because RimWorld has little need for more memory. Games that seriously benefit from large amounts of memory are using many very large textures; RimWorld is not. Much of RimWorld's memory usage is game data structures, and making those structures larger directly harms performance.

The right fix for A16's memory usage is to reduce it by trimming the fat. There is likely a lot of low-hanging fruit, since Tynan hasn't needed to focus on it before, so it shouldn't be a hard problem to solve.

Bozobub

Pardon me? You're asserting a player running several concurrent colonies wouldn't benefit from having more RAM? I find that at least a little difficult to believe; at some point it has to matter, wouldn't it?

And if it's as easy as flipping a compiler toggle, as you say, I kind of fail to see the problem with offering both. Hell, offer it with no additional support, if you must -.-' . It's not like running concurrent colonies is encouraged at the moment, anyhow, right?

Zhentar

Yes, that is exactly what I'm asserting. A16 RimWorld is not actively working harder in order to constrain memory usage. It does not unload textures and load new ones from disk. It doesn't load anything from disk at all during gameplay (except background music in some cases, and that's both very light and easily cached by the OS). It's not compressing anything.

Using more memory just for the sake of using more memory actively harms performance: more memory means more paging, more TLB misses, and more L3, L2, and L1 cache misses. And when you need a 64 bit address space to get that memory, the problem is even worse; now all your pointers are twice the size. The 'Thing' class (one of the more common objects in RimWorld) would grow by over 50%, so everything that uses Thing (which is pretty much everything) gets slower just because we might want to use more than 4GB of RAM, never mind the cost of actually using that much.

RimWorld also doesn't do much (maybe not any) 64-bit math, and it already aggressively inlines performance-critical code paths, so it benefits fairly little from the x64 instruction set.
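
To put rough numbers on the Thing point, here's a back-of-the-envelope C# sketch. The fields are invented for illustration (not the real Thing layout), and actual CLR padding/alignment will vary:

    // Back-of-the-envelope sketch: invented fields, not the real Thing
    // layout, and actual CLR padding/alignment will vary.
    class Map { }
    class ThingDef { }
    struct IntVec3 { public int x, y, z; }   // 12 bytes on either target

    class Thing
    {
        public Map map;           // reference: 4 bytes on x86, 8 on x64
        public ThingDef def;      // reference: 4 bytes on x86, 8 on x64
        public Thing holder;      // reference: 4 bytes on x86, 8 on x64
        public IntVec3 position;  // 12 bytes on both
        public int hitPoints;     // 4 bytes on both

        // x86: 3 refs x 4B + 16B of value data = 28B of fields, +8B object header = 36B
        // x64: 3 refs x 8B + 16B of value data = 40B of fields, +16B header = 56B
        // Roughly a 55% increase, without storing anything new.
    }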

The only users that would benefit from a 64 bit build are the ones experiencing out of memory crashes. The rest would, at best, have a performance-neutral experience while consuming more system resources, and at worst see significantly worse performance. Offering a second build gives users a false choice: there's one right answer, and giving people the opportunity to choose otherwise implies there is an important difference, creating confusion and leading people to make the incorrect choice. Even if the second version is offered "without support", it still creates a support burden, if only to explain to users which build they should use.

LordMunchkin

Quote from: Zhentar on December 24, 2016, 05:12:32 PM
The only users that would benefit from a 64 bit build are the ones experiencing out of memory crashes.

As one of the users experiencing out of memory crashes regularly (lots and lots of mods), I'd really like a 64 bit build!  :P

RawCode

You won't benefit, due to the CPU single-thread bottleneck. Please read the tech docs.

eiggzdpvod

Quote from: LordMunchkin on December 25, 2016, 02:33:30 AM
Quote from: Zhentar on December 24, 2016, 05:12:32 PM
The only users that would benefit from a 64 bit build are the ones experiencing out of memory crashes.

As one of the users experiencing out of memory crashes regularly (lots and lots of mods), I'd really like a 64 bit build!  :P

Me too!!!
We want 64-bit.
4GB is too small.

OFWG

Quote from: RawCode on December 25, 2016, 09:38:12 AM
You won't benefit, due to the CPU single-thread bottleneck. Please read the tech docs.

Did you mean to post in a different thread or something? This is about memory usage.

RawCode

Quote from: OFWG on February 08, 2017, 09:07:41 PM
Quote from: RawCode on December 25, 2016, 09:38:12 AM
You won't benefit, due to the CPU single-thread bottleneck. Please read the tech docs.

Did you mean to post in a different thread or something? This is about memory usage.

Read the tech docs first if you don't understand how and why.

XeronX

I have seen references to the single-thread bottleneck before, and I'm honestly curious why we still have it in this day and age. IIRC very few processors are single-threaded anymore.

And no, please understand, I am not being snippy. I truly am curious why this is a single-threaded game; I don't have the technical knowledge to know on my own.

Zhentar

Writing code that can do multiple things at the same time is, generally speaking, hard. I've personally written multithreaded processing routines a number of times at my job, and it remains a last resort for when we absolutely cannot get adequate performance in a single thread. If you make mistakes, you end up with random data corruption. Debugging that corruption can be quite challenging because it happens at random: you can't just set a breakpoint where it goes wrong, because you don't know when and where it will go wrong, and hooking up a debugger or adding logging code can change the timing just enough that it stops happening. That's if it was even happening for you in the first place; the particular timing may never line up on your own system at all. You have to work harder and think more carefully when you write the code the first time, and any time you touch it afterwards.

One time, I had logs proving one of my customers was getting a particular error in an area that was just a couple hundred lines of code. I spent three straight days trying to reproduce the issue and poring over the code searching for any possible logic error. I never reproduced it, my testers could never reproduce it, and I couldn't ever find the problem, so I ended up shipping a patch that just locked far more than could possibly need it (even things that were guaranteed to be atomic). They stopped complaining, so it must have done the job without hurting performance too much.
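
To make the failure mode concrete, here's a minimal C# sketch of the classic lost-update race - a contrived counter, not code from any real project:

    using System;
    using System.Threading.Tasks;

    class RaceDemo
    {
        static int counter = 0;
        static readonly object gate = new object();

        static void Main()
        {
            // Unsynchronized: both tasks do read-increment-write on the same
            // field, so increments interleave and get lost. The total is
            // usually well under 2000000, and different on every run.
            Task a = Task.Run(() => { for (int i = 0; i < 1000000; i++) counter++; });
            Task b = Task.Run(() => { for (int i = 0; i < 1000000; i++) counter++; });
            Task.WaitAll(a, b);
            Console.WriteLine("Unsynchronized: " + counter);

            // Synchronized: the lock makes each increment atomic, at the cost
            // of serializing both tasks on the shared counter.
            counter = 0;
            Task c = Task.Run(() => { for (int i = 0; i < 1000000; i++) lock (gate) counter++; });
            Task d = Task.Run(() => { for (int i = 0; i < 1000000; i++) lock (gate) counter++; });
            Task.WaitAll(c, d);
            Console.WriteLine("Synchronized: " + counter);   // always 2000000
        }
    }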

If you can plan things out to be almost completely independent, then it's generally not too hard to get two separate things going at once. In fact, Unity is actually dual-threaded, with a second thread doing some of the graphics work. But RimWorld isn't rendering-intensive, so that doesn't help us much, and the stuff RimWorld actually spends its time on is very tightly interdependent.

XeronX

Wow, that was a very quick and completely informative answer. TY very much, Zhentar!

hoffmale

Quote from: Zhentar on December 24, 2016, 05:12:32 PM
Yes, that is exactly what I'm asserting. A16 RimWorld is not actively working harder in order to constrain memory usage. It does not unload textures and load new ones from disk. It doesn't load anything from disk at all during gameplay (except background music in some cases, and that's both very light and easily cached by the OS). It's not compressing anything.

Using more memory just for the sake of using more memory actively harms performance: more memory means more paging, more TLB misses, and more L3, L2, and L1 cache misses. And when you need a 64 bit address space to get that memory, the problem is even worse; now all your pointers are twice the size. The 'Thing' class (one of the more common objects in RimWorld) would grow by over 50%, so everything that uses Thing (which is pretty much everything) gets slower just because we might want to use more than 4GB of RAM, never mind the cost of actually using that much.

RimWorld also doesn't do much (maybe not any) 64-bit math, and it already aggressively inlines performance-critical code paths, so it benefits fairly little from the x64 instruction set.

The only users that would benefit from a 64 bit build are the ones experiencing out of memory crashes. The rest would, at best, have a performance-neutral experience while consuming more system resources, and at worst see significantly worse performance. Offering a second build gives users a false choice: there's one right answer, and giving people the opportunity to choose otherwise implies there is an important difference, creating confusion and leading people to make the incorrect choice. Even if the second version is offered "without support", it still creates a support burden, if only to explain to users which build they should use.

While you raise some good points against a 64-bit version, you aren't telling the whole story, either.

Modern processors employ quite an arsenal of predictors, especially for memory accesses, and they are quite good at their job when you use predictable access patterns. Done well, more memory means about the same number of TLB/L3/L2/L1 misses (sometimes those numbers even decrease).

Yes, your pointers are twice the size compared to 32-bit. However, you shouldn't really be chasing pointers in performance-critical code anyway (unless you really can't avoid it), as pointer chasing is what causes those TLB/L3/L2/L1 misses. If that 'Thing' class (your example) consists of more than 50% pointers, there are some performance and software-architecture issues right there. (This problem is independent of 32-bit vs. 64-bit.) I know it's the "OOP way" to have pointers and references to every object everywhere, but sometimes that isn't the "right" approach, especially in high-performance applications like games. But I digress...
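
As a rough sketch of what I mean (invented names, nothing to do with RimWorld's actual code), compare chasing references against scanning a contiguous array of structs:

    using System;

    // Invented names; nothing to do with RimWorld's actual code.
    class PawnObj { public float health; }    // the "OOP way": one heap object per pawn
    struct PawnData { public float health; }  // data-oriented: plain values, no identity

    class LayoutDemo
    {
        static void Main()
        {
            const int n = 100000;

            // An array of references holds n pointers to n separate heap
            // objects. In a long-running game the GC scatters them, so each
            // element access risks a cache miss (and on x64 each slot is 8B).
            var objs = new PawnObj[n];
            for (int i = 0; i < n; i++) objs[i] = new PawnObj { health = 1f };
            float sum1 = 0;
            for (int i = 0; i < n; i++) sum1 += objs[i].health;   // pointer chase per element

            // An array of structs is one contiguous block the hardware
            // prefetcher can stream; no pointers at all, and the element
            // size is identical on x86 and x64.
            var data = new PawnData[n];
            for (int i = 0; i < n; i++) data[i].health = 1f;
            float sum2 = 0;
            for (int i = 0; i < n; i++) sum2 += data[i].health;   // sequential reads

            Console.WriteLine(sum1 + " " + sum2);
        }
    }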

Also, while it's true that the 64-bit and 32-bit instruction sets don't differ much if you aren't using 64-bit math or the additional registers (!), targeting 64-bit basically lets you assume the presence of other CPU features (like the SSE2 instruction set), since some modern 64-bit OSes (Windows *cough*) require those for their 64-bit versions. (Yes, there might be some very old 64-bit CPUs or some very exotic 64-bit Linux platforms that don't support those features, but those can still use the 32-bit version.)

Also, you're contradicting your earlier point ("Using more memory just for the sake of using more memory actively harms performance") with your remark about aggressive inlining: sometimes you can speed up execution by using more memory. In the case of inlining, that means using more memory for instructions, thereby saving the overhead of a function call. In other cases it could mean using look-up tables, memoization, and/or other dynamic programming techniques.
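
For example, here is a minimal memoization sketch along those lines; the function and names are made up purely for illustration:

    using System;
    using System.Collections.Generic;

    // Minimal memoization sketch: spend memory on a cache to skip
    // recomputation. The function and names are made up for illustration.
    class MemoDemo
    {
        static readonly Dictionary<int, double> cache = new Dictionary<int, double>();

        static double ExpensiveCost(int input)
        {
            double result;
            if (cache.TryGetValue(input, out result))
                return result;                     // cache hit: no recomputation

            result = Math.Sqrt(input) * 1.5;       // stand-in for real work
            cache[input] = result;                 // memory spent here...
            return result;                         // ...buys time on later calls
        }

        static void Main()
        {
            Console.WriteLine(ExpensiveCost(42));  // computed
            Console.WriteLine(ExpensiveCost(42));  // served from the cache
        }
    }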

I can't really disagree with your final argument, though: providing multiple versions carries some burden in directing users to the correct one. That said, you could still make the 32-bit version the default and only point users to the 64-bit version (with warning labels!) if the 32-bit version has problems or a user is explicitly looking for it.

Finally, I really hope the RimWorld code base isn't bad enough to warrant your grim outlook on a 64-bit version. And remember: this is all speculation until we can measure the difference ;)

(I really hope I wasn't too aggressive in presenting my arguments. I don't have anything against you personally ;) )

@XeronX:
Games are wonderfully complicated programs that try to do many things at the same time. But in order not to stumble over their many parallel tasks, they need to synchronize at certain points. These synchronization points can be rather expensive, computationally speaking.

One of these synchronization points usually comes when sending the graphics driver info about which elements should be drawn (as this data really should not be in a corrupted state!). This is sometimes called the "single-thread bottleneck", as there can normally be only one thread talking to the graphics driver (yes, there are exceptions, but those just shift the real synchronization point around). Other threads can of course work on other things, like playing sounds or updating the game state - but the frame rate (and therefore the perceived update rate) still depends mostly on whichever thread has the most computationally expensive work to do between synchronization points.
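
As a toy illustration of such a synchronization point, here's a generic double-buffer pattern in C# - emphatically a sketch of the general idea, not Unity's actual internals:

    using System;
    using System.Threading;

    // Toy double-buffer: the simulation writes one snapshot while readers
    // see the other, and they trade places at a single, cheap sync point.
    class DoubleBuffer
    {
        private float[] front = new float[4];   // the stable, readable state
        private float[] back = new float[4];    // the state being updated
        private readonly object swapLock = new object();

        public void Simulate(int tick)
        {
            for (int i = 0; i < back.Length; i++)
                back[i] = tick + i;              // mutate freely; nobody reads this
            lock (swapLock)                      // the only synchronization point
            {
                float[] tmp = front; front = back; back = tmp;
            }
        }

        public float[] Snapshot()
        {
            lock (swapLock)
                return (float[])front.Clone();   // copy out a consistent frame
        }

        static void Main()
        {
            var buf = new DoubleBuffer();
            var sim = new Thread(() => { for (int t = 0; t < 1000; t++) buf.Simulate(t); });
            sim.Start();
            for (int frame = 0; frame < 5; frame++)
                Console.WriteLine(string.Join(",", buf.Snapshot()));
            sim.Join();
        }
    }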

What Zhentar wrote about the difficulty of writing correct multithreaded programs is also relevant ;) FYI: I just checked in Task Manager, and my currently running RimWorld instance uses 53 threads!

Miridor

In certain modern languages like C#, Golang, and Java, or when using modern utility libraries, multithreading really is nothing compared to doing it the 'old' way. And, mind you, I've done both: PThreads in Linux and Windows native applications, multithreaded code in managed .NET with partly-native C++ libraries bound in, and currently multithreaded database import/export/conversion software in Golang and multithreaded microservices using C++, Boost, and the Casablanca REST SDK... I've seen pretty much all of it, except for embedded.
However, the ease of just declaring a goroutine or a Java Thread, of not having to mess with 'real' pointers (ones with a memory address attached) because the language doesn't have them, and of mostly not having to care about access to shared data (and where concurrent writes are a thing, the built-in sync or mutex functionality is easy to use) can come with other drawbacks, like stubborn garbage collectors that won't free data until the very last moment and hog memory. There is a reason why manual allocate/deallocate, done right, gives you better performance, and why you'd rather use pointers than keep multiple copies of the 'same' data. And that's also where it gets tricky, because you have to do manually in code what has been automated away for about a decade. Many CS students I speak to nowadays don't even know what a pointer is, let alone the benefits and possible dangers of using one, until they get to their third year - and only if they pick specific advanced subjects like Algorithms or Networking and Operating Systems (with mandatory lessons in 'classic' C).

Now, for this game, I don't have any sources, but I can make some 'educated' guesses:
- If the software is CLR-compliant (I read somewhere most of it is C#/.NET?), making a 64-bit version is indeed as easy as a compiler switch. Data types are independent of 'bit-ness', afaik; it's not like unmanaged C++ (and C, of course), where the primary data types depend on the architecture's word size.
- Managed C#/C++ doesn't have 'real' pointers. You have managed objects, everything goes through garbage collection, and threading is quite 'easy'. For concurrent write access to data, the built-in mutex classes suffice. Let me know when you see anything that could cause a hard lock; dining philosophers is still a thing.
- In a multi-colony game it should be very easy to run each colony in its own thread (see the sketch after this list). The only places you'd need synchronisation are inter-colony interactions (caravans departing/arriving, drop pods, and the global timer, so colonies don't 'drift'), selection from the world-pawn pool in case of concurrent raids, and some interface elements (the colonist overview and message list).
- In a single-colony game you may at least be able to separate controlling the video output from the AI. Video output only needs to -read- the game state and display all elements on screen accordingly.
- I'm asserting that video performance (even on large maps) is not what's slowing the game down; it's the AI that calculates the game state. Larger colonies don't use more textures per se, just a bunch of them more often than 'once' on the map. But they do have more interacting colonists, larger raids, more animals, etc., and an abundance of those makes the game crawl. I have a three-colony world with more than 100 pawns total. Even when my main colony is in the 'background' and the small mining colony is displayed (where I'm strip-mining a map for materials with 10 colonists so I can expand the main colony), I can't increase the game speed any more than I can while the main colony is displayed. Single speed is all I have that's usable, and even then I get regular stutters. Double speed makes pawns jump all over the map (seems to be a cosmetic thing). Triple speed is just like single speed but with a slide-show effect.
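
To make the multi-colony idea from the list above concrete, here's a rough C# sketch (invented names; obviously not RimWorld's actual architecture, and it assumes colony ticks really are independent):

    using System;
    using System.Threading.Tasks;

    // Sketch of the per-colony threading idea: tick colonies in parallel,
    // then handle shared state single-threaded. Invented names throughout.
    class Colony
    {
        public int Id;
        public void Tick() { /* pathfinding, AI, jobs... for this map only */ }
    }

    class MultiColonySim
    {
        static void Main()
        {
            Colony[] colonies = { new Colony { Id = 0 }, new Colony { Id = 1 } };

            for (int tick = 0; tick < 10000; tick++)
            {
                // Independent per-colony work runs concurrently...
                Parallel.ForEach(colonies, c => c.Tick());

                // ...then one single-threaded phase per tick touches the
                // shared state: caravans, drop pods, the global timer.
                ResolveInterColonyEvents(colonies, tick);
            }
            Console.WriteLine("done");
        }

        static void ResolveInterColonyEvents(Colony[] colonies, int tick)
        {
            // Runs alone, so it may safely read and write any colony.
        }
    }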

Usually if I play for about an hour, I then run into memory issues - earlier if I loaded a save halfway through. The world map stops being displayed properly, and text in the item tables for trade or caravan selection starts to disappear. Making a save or autosave gives out-of-memory errors. Usually it's not long thereafter that the game crashes, and I see a clean-up of swapped-out memory blocks (noticeable as a few minutes of 100% SSD activity) and a decrease of ~4GB in the actual in-use memory shown in Task Manager. Sometimes I get a dialog popup with directions to a debug log; sometimes the process just segfaults and all resources are dropped.

I know the way I play the game isn't 'usual'. But if it weren't for the 4GB memory limit, and with a bit more available processing speed (running an AMD Phenom here - quad core, but not that performant per core), the game would be absolutely playable. Actually it already is, because I keep loading that save ;) and expanding my colonies a bit more.

hoffmale

Quote from: Miridor on February 09, 2017, 03:35:28 AM
There is a reason why manual allocate/deallocate, done right, gives you better performance, and why you'd rather use pointers than keep multiple copies of the 'same' data. And that's also where it gets tricky, because you have to do manually in code what has been automated away for about a decade.
Well, the main reason manual memory management can get you better performance is that you have absolute control over where in memory you create your objects (so you can exploit the hardware prefetching mechanisms). Of course, the control alone doesn't do anything; to gain measurable improvements you need to lay out your data structures in a cache-friendly way that helps the prefetcher do its thing (arrays/object pools can really help with that).
Conversely, having to jump through main memory every time you access a new object (one that isn't in an array or similar contiguous memory region) basically sets you up for a cache miss on every access!
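
Here's a minimal object-pool sketch along those lines - invented names, purely illustrative:

    using System;
    using System.Collections.Generic;

    // Minimal object-pool sketch: live objects sit in one contiguous array
    // (prefetcher-friendly) and slots are recycled instead of re-allocated
    // (no GC churn). Invented names, purely illustrative.
    struct Projectile { public float x, y, vx, vy; public bool alive; }

    class ProjectilePool
    {
        private readonly Projectile[] items;
        private readonly Stack<int> free = new Stack<int>();

        public ProjectilePool(int capacity)
        {
            items = new Projectile[capacity];
            for (int i = capacity - 1; i >= 0; i--) free.Push(i);
        }

        public int Spawn(float x, float y, float vx, float vy)
        {
            int slot = free.Pop();   // reuse a dead slot (throws if the pool is exhausted)
            items[slot] = new Projectile { x = x, y = y, vx = vx, vy = vy, alive = true };
            return slot;
        }

        public void Kill(int slot)
        {
            items[slot].alive = false;
            free.Push(slot);         // the slot is recycled; the GC never gets involved
        }

        public void Update(float dt)
        {
            // One linear pass over contiguous memory: exactly the access
            // pattern the hardware prefetcher is built for.
            for (int i = 0; i < items.Length; i++)
            {
                if (!items[i].alive) continue;
                items[i].x += items[i].vx * dt;
                items[i].y += items[i].vy * dt;
            }
        }
    }

    class PoolDemo
    {
        static void Main()
        {
            var pool = new ProjectilePool(1024);
            int p = pool.Spawn(0f, 0f, 10f, 0f);
            pool.Update(0.016f);
            pool.Kill(p);
            Console.WriteLine("pooled update done");
        }
    }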

Quote from: Miridor on February 09, 2017, 03:35:28 AM
Managed C#/C++ doesn't have 'real' pointers. You have managed objects, everything goes through garbage collection [...]
But every reference to a managed object is basically a 'fancy' pointer under the hood (even though you can't directly manipulate it). So just accessing an object requires dereferencing a pointer (which can point anywhere the GC/runtime thinks appropriate).

Quote from: Miridor on February 09, 2017, 03:35:28 AM
Many CS students I speak to nowadays don't even know what a pointer is, let alone the benefits and possible dangers of using one, until they get to their third year - and only if they pick specific advanced subjects like Algorithms or Networking and Operating Systems (with mandatory lessons in 'classic' C).
Interestingly, I am a CS student, and the subjects you mention were mandatory in the 2nd through 4th semesters at my university in Germany (minus the lessons in C; the only language we were formally taught was Java, so people had to figure out any other language on their own...). Yes, most of my fellow students aren't always comfortable using pointers, but they really should know their dangers!

I guess you are speaking from an American point of view?