Alright, results for the 2v1s and 3v1s are in. Since I normalised quality this time around and also cut the ranged verbs for this testing, these results won't be directly comparable to the 1v1 ones.
3v1 also had a smaller sample size: just 70 battles, compared to the standard 100 for 2v1. I only have 20 pawns in the setup and didn't feel like 'binning' too many new pawns, so I just spawned in a 21st who was trait-neutral, melee-capable and healthy.
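For anyone wondering how much the smaller 3v1 sample matters, here is a rough sketch in Python. It is not something I ran as part of the testing; it just computes an approximate 95% Wilson score interval for a win rate. The function names and the 60% win rate in the usage note are made up purely for illustration.

```python
import math


def wilson_interval(wins, battles, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    if battles <= 0:
        raise ValueError("battles must be positive")
    p = wins / battles
    denom = 1 + z**2 / battles
    centre = (p + z**2 / (2 * battles)) / denom
    half = z * math.sqrt(p * (1 - p) / battles + z**2 / (4 * battles**2)) / denom
    return centre - half, centre + half


def interval_width(wins, battles):
    """Width of the interval; a wider interval means a less certain win rate."""
    low, high = wilson_interval(wins, battles)
    return high - low
```

With a hypothetical 60% win rate, `interval_width(42, 70)` comes out wider than `interval_width(60, 100)`, so the same observed rate is less certain over 70 battles than over 100. That's worth keeping in mind when reading the 3v1 numbers next to the 2v1 ones.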
2v1 Results:

3v1 Results:

Edit: I mistakenly put 6-1 for the final trial with the longsword in the 3v1; I only just noticed this when I went to grab the link to this post for the OP. The final trial was actually 7-0.
It's somewhat surprising that maces still produce consistently worse results, but less so when you consider that mace tools have a lower selection weight, since commonality scales exponentially with damage.




