Truel problem analysis

Author: gmnenad

Summary

Abstract

Truel problem is an old problem which can be summarized as :
Several people are doing duel.
Given their probabilities to hit, what are probability of each of them to win and who should they choose as optimal initial target ?

This document presents solutions and analysis of different scenarios for "classical truel" problem - one where players can have different probabilities to hit, shoot sequentially with infinite number of bullets, and where missing is allowed. It contains several novel elements:

Novel part about solution is that it is fast enough to be interactive even for visualization ( where millions of individual pixels/solutions need to be calculated in short time to present AB graph with all possible hit probabilities for players A and B ). That speed was achieved by implementing default solution using mathematical calculations, while still providing option for conventional solution using simulation. Solutions support not only finding win probabilities for players with predefined initial targets, but also finding actual optimal targets - which is another novel element not found in most previous truel related papers. Mathematical solution is around 400 times faster than simulation when solving for predefined targets, and over 2000 times faster when finding optimal targets in truel scenarios. Another novel aspect of the solution is that is support more than three players, both for solutions and for visualization ( AB graphs still show all hit probabilities for players A and B, with fixed but configurable hit probabilities for players C,D... ) - although such variants are not technically "truel". To further improve speed and allow interactivity, Numba accelerated Python functions are used, along with 'smart' optimizations for AB graph that use dynamic interpolation/calculation.

Novel part about truel analysis is that interactive solution was used to demonstrate situations where it is optimal for some players to choose "random" target, as opposed to choosing to target other players or to miss. Document also demonstrate that self targeting is not optimal choice in any situation, and confirms that intentionally missing is optimal choice in large number of cases. While analysis of intentional misses and even self targeting was done previously, it generally lacked proof or demonstration in which situation those options would be optimal choice - because most previous analyses did not solve for optimal targets.

Code for this document is available at GitHub repository, or in ZIP file at gmnenad.com site ( which also include precalculated cache file, to save around 45 min of initial execution )



Problem definition

This document solves generalized version of truel problem:

Original problem:
Three players are doing truel, where each can choose who to try to shoot, then next player plays and so on until only one winner is left. If first player has 1/3 chance to hit, second has 2/3 and last has 100% chance to hit - what is the chance of first player to win and who he should target at start ? ( uses target= Miss )

Alternative problem:
If first player has 75% chance to hit, second has 55% and last has 100% chance to hit - what is the chance of first player to win and who he should target at start ? ( uses target= Random )

Assumptions for problems:

1. Solutions

There are two types of solutions to Truel problem in this document: exact matematical calculation and simulation. Both support solving for win probabilities and for optimal target order, and both support advanced targeting options like Miss, Self or Random targets. Simulation is slower to execute by factor 200-3000x, but even mathematical calculation can be slow for very large number of points ( in 2D graphs ) or very large range of targets ( especially for high random levels).

While different functions will be used ( simulation, mathematical calculation, AB graphs), they all share certain input parameters - same way to define problem :

function ( pHits, targets, search, ... )

Explanation on main shared parameters:

Possible target choices

Options when defining predefined targets or possible targets in optimal searches. Each constant is usable by tg.Name, or directly by name. There are three places where these constants are applicable:

Constants defining possible targets are listed below. They could be used as members of tg class, or as direct variables ( so tg.Miss == Miss , tg.Optimal==Optimal, etc ) :

1.1 Simulation as approximate solution

Python function that simulate truel problem for arbitrary number of players, and finds not only winning chances for each player but also optimal targets.


If we have N players with given targets and hit probabilities for each player, simulate until one player is eliminates, and then:

Complexity of solution differs depending on supplied targets:

Finding who to target ( optimal targets ) :

Search for optimal targets is needed if any of initial 'targets' was 'Optimal' ( for example targets= [Optimal, C, B], asking solution to find optimal target for player A ; or targets= [Optimal, Optimal, Optimal] to find optimal targets for each player ), or in case of 4+ players when we want to keep searching for optimal targets even when some players are eliminated ( specified in 'targetLater' parameter, which default to 'Optimal' , although other options like 'Strongest' would not require increased time ) Finding optimal solution with simulations requires simulating every specific combination of targets ( ie, Random, Miss, A, B, C...). If we have N players, and all possible targets are allowed ie serach=[Others,Self, Random, Miss] , then each player has (N+2) possible targets, and total number of target combinations is: $ nTargetCombinations = (N+2)^N $ For 3 players that amounts to 5^3 = 125 combinations, which all need to be simulated before algorithm for optimality search can be applied ( explained in section 1.3 ). Therefore if we are solving for targets= [Optimal, Optimal, Optimal] , it will take 125 times longer execution than if we are solving for specific targets = [ Miss, C, B ] . For 4+ players, that time increases exponentially for optimality search.


There is one limitation for simulation solution : when used for 4+ players, with targets=[Optimal,...] to find optimal targets, it does not propagate optimal search to cases when players are eliminated - instead, it default to targeting 'Strongest' opponent in those reduced cases. For example, when one player in 4-player duel is eliminated and it become 3-player truel, remaining 3 players will not use their true optimal targets. While in many cases targeting 'Strongest' opponent is optimal, it is not always - this mean that mathematical calculation should be used to find optimal targets in 4+ player cases, instead of simulation.

nb_truel_sim() function calculates win probabilities using simulations. It will select either optimized function for 3 players, or slower generic simulation for N players.

Starting parameters:

Result:

1.2 Mathematical calculation for exact solutions

Python function that solves truel problem for arbitrary number of players, and finds not only winning chances for each player but also optimal targets.


Solution reasoning:

if we have N players, with probability of i-th player (0..N-1) in pHit[i], then if i-th player is on move and he is targeting T:
    pwin(i first out of N)= pHit[i]* pwin(i+1 first out of N-1, T is removed ) + (1-pHit[i])* pwin(i+1 first, out of N)
Assuming we can get exact result for reduced number of players (N-1), we can express solution for N players with i-th player on move as function of solution for (i+1)th player on move:
     pwin(i)= v[]+ k*pwin(i+1) , where v[]= constant vector with N elements = pHit[i]*pwin(i+1 first out of N-1, T is removed ) , and k= 1-ph[i] ... so we can express this as tuple pwn=(v[], k) so that pwin(i)=pwn*pwin(i+1)

Infinite loop:
Recursively calling pwin(i+1) will eventually get to last player, who will have to rely on first player result calling pwin(0) - which would result in infinite loop
That infinite loop can be avoided by passing function that express result for win probabilities if initial player is on move as function of win probabilities if (i+1)th player is on move : pwin(0)=pw0= f( pwin(i+1) ) = pwn * pwin(i+1) , with pwn= (v[],k )
That pw0 can be modified by each successive player, and when we get to last player and he recalculate pw0, he will have pw0= f ( pwin(N+1) ) = f ( pwin(0) ) ... or pw0 = f(pw0 ) which can be solved directly, avoiding infinite loop
Given pwn=(v[],k) and pw0= pwn*pw0= v[]+k*pw0 -> pw0(1-k)= v[] -> pw0[]= v[]/(1-k) ... resulting in probability to win for each player, if first to play is initial player
That exact result for pw0 can be now returned back to exactly calculate pwin(N-1), pwin(N-2)... all way back to pwin(0) which will get same result as pw0.
Finding who to target ( optimal targets ) :
above calculation assume that each player knows in advance who is best target for him to shoot at (T). If he does not know that, he need to apply same calculation to every possible target (including no target) and select one which give him best chance to win
when 'no target' is selected, his chance to win is slightly different formula: pwin(i)= pwin(i+1) - or if we want to present in same format as before ( pwin(i)= v[]+ k*pwin(i+1) ), then v[]=[0,0..0] and k=1, or in tuple for ([0,0...0], 1.0)
special case is if everyone decide to skip (no target), which would lead to infinite loop, so need to be forbidden in some way. For example - last player is not allowed to skip if everyone else did it.

nb_truel_math() function calculates win probabilities using mathematical solution (as opposed to simulations). It will initialize dictionary and check parameters before calling recursive truelC.
Function truelC is main mathematical solution to recursively finds probability to win and optimal targets for each player.

Starting parameters:

Result:

1.3 Optimal targets

Finding optimal targets and win probabilities is more complex problem than just finding win probabilities for predefined initial targets.

What are "predefined targets" ?

Lets say that we have three players, customarily labeled as A (first to shoot), B (second to shoot), and C(third to shoot). And lets use probabilities to hit from original problem, so pA=1/3, pB=2/3 and pC=1.
That could be written as pHits=[1/3,2/3,1] , where size of list determine number of players (here three), and players shoot in order of list ( so player A is represented by pHits[0], player B by pHits[1] etc ).

Now let's say we want to solve problem where all players will always shoot at highest threat ( which is often optimal strategy ). In this case it means A and B will target C, and C will target B. That could be written as targets=[C,C,B] . And that 'targets' list would represent "predefined targets" for this problem setup.
But it is important to understand how and where those targets are used. Using them in initial phase of the game ( "game" refers to one duel/truel match until final winner is known ) is straightforward - and "initial phase" means that all players are still in game.
In initial phase, player A will target C, when he miss then player B will target C, when B miss player C will target B, and when/if player C miss (not possible in case when pC=100%, but in general) then player A will again target player C and so on.
What happens when one player is eliminated? Let's say player A hits player C, and only players A and B remain with player B on the move. It is obvious that player B can not follow 'targets' any more, since playe C is eliminated.
But what if player could follow 'targets' in next phase (after one player is eliminated)? Let's say our predefined targets were [Miss, C, B] , meaning player A would intentionally miss and players B and C shoot each other. Assuming player B hits player C, should player A still shoot to miss? Obviously not, if playe A wants to win.

Conclusion is that:

'targets' only define what players will shoot in initial phase of game, while all players are still in game. As soon as one player is eliminated, 'targets' are not applicable any more.

Note that this holds true both for problem input ( predefined targets ) and problem result ( optimal targets ) - since "optimal targets" (as solution result), can be viewed as "predefined targets" that give best chance to win to all players.

This is often glossed over in Truel analyses, since for three players it is obvious that, when reduced to two players, optimal strategy is to shoot each other. Therefore specifying "targets" for truel completely define how each player will behave during entire game.
But it is not so for more players. If we have four players with targets=[ D,D,D,C ] and player D is eliminated, what will remaining three players target in next phase? Clearly, old targets are not applicable, but neither is simple 'shoot each other' as in duel case.
Therefore, predefined 'targets' parameter does not define all player choices in games with four or more players. To fully specify how four players should play it would need initial 'targets' while all are in game, and at least four additional 'targets' for each of four possible three player cases, and possibly even more than four - if player D is eliminated by player A (so now player B is on move), new targets could be [Miss,C,B,-]. But if player D was eliminated by player B instead ( so now player C is on move), it is possible that players would choose different target, for example [C,C,B,-]. At maximum that would mean 12 different possible 'targets' choices for three remaining players, depending on which player was eliminated and by who. This only gets more complex with 5+ players. So initial targets alone can not uniquely define rest of the game, unless we assume that, as soon as one player is eliminated, players will select targets for next phase in some predefined way. Such predefined way could be "select highest threat", or "keep old target if alive, otherwise Miss" or even "all suicide" etc ... but most of them would not be logical choices since players would tend to select targets that maximize their chance for victory. In any case, choice for targetsLater, or targets to apply after one player is eliminated, must be something applicable to all remaining players regardless of their number and of which players were eliminated. So it can not be "target player C" ( C could have been eliminated), or "Miss" (since it is pointless if all players choose to miss).

Valid options for targetLater would be:

It follows that:

initial predefined 'targets' can define entire game only if we assume that, after one player is eliminated, remaining players will all select same type of target, specified in 'targetLater' - 'Optimal' should be logical choice

What are **"optimal targets"** ?

From above discussion, it follows that:

"optimal targets" are predefined targets that would maximize victory chance of all players.

Since we have shown that 'predefined targets' only define initial phase ( while all players are still in game), that also holds true for 'optimal targets'. Similarly, if new 'optimal targets' are selected once one player is eliminated ( for example, from 4 to 3 players ), that new optimal targets will be valid only until next player is eliminated.

How to "maximize victory chance of all players" ?

Truel game is inherently zero-sum game ... if one player increase their chance to win, it must reduce some others player chance. Ultimately, sum of all players chance to win must be one:

$$ \sum_{player} pWin\, (player) = 1 $$

That presents problem to maximize victory chance for all players - if we maximize for player A, it will not be optimal for player B. But it is still possible to determine "optimal" targets if we make following assumptions:

We need some way to reduce complex problem of 'find optimal targets' to simpler problem of 'find probabilities to win for predefined targets', using above assumptions.

finding optimal targets can be solved using form of **minimax approach**:

Last player to shoot (player C in three players example) will see/know targets of A and B ( as per assumptions ), and would be able to calculate his optimal target by calculating win probabilities for series of predefined targets, where those predefined targets will always have same targets selected by A and B, and his own targets will be his every possible target option. Let's say that A selected tgA=B, B to shoot tgB=A ... and possible targets for C are A/B/Miss. Player C will calculate three times:

Player C will obviously select for his target option where his pwC is maximal - in this example, one marked with 'C+' where pwC=100% - so he will Miss, and all targets are [ B, A, Miss] ). He needed 3 calculations for that.

How does that influence selection process of player B?
When player B is on the move, he knows what is target of player A (tgA=B in above example), and he also can analyse each of his possible choices. But for each of his choices, he needs to analyse potential responses of player C, in the same way that C would reason it from above. Player B will need 9 calculations (3x3):

Same logic now extend to player A. He needs to evaluate each of his target options tgA in { Miss, B, C }, and for each of them repeat logic that B would apply, so he will need to do 27 calculations (3x3x3).
After determining what targets would B and C choose if he select each of his targets, he select one where he has maximal win probability. Note that it may not ( and in many cases will not ) be situation where player A has best win probability of all players. It will merely be one of three options where he has largest probability. In above example, player A options are:

What if different targeting options are available ?

It should not change logic, it will only change number of available target options for each player. For example, if players can target themselves (with Self option), they will have 4 target options at each step, so player C will examine 4 cases, player B 4x4 and player A 4x4x4
In general, there will be $ T^N = TargetChoices ^ {Players} $ cases .

If 'Random' option is available, where players can randomly shoot at any other player each round, it can be considered just as another option - as long as we know how to calculate win probabilities when Random is included in predefined targets, for example targets=[ Random, C, B ]

Same approach can be applied for cases where only some players are allowed to search for optimal target, and other players are fixed to predefined targets. Lets say that we have:

In this case, player A and C can choose any target, but player B must shoot at player C. It only changes step for player B ( who does not have anything to choose ), and player A will need smaller number of total calculations to find optimum (3x1x3=9).
Basically, different options will change total number of possible 'predefined targets', but logic of choosing optimal targets after that should be same application of minimax.

Conclusion:

Optimal targets can be found by using minimax approach on all possible combinations of specific predefined targets


Is same approach to find optimal targets applicable both in mathematical calculations and in simulations ?

In essence, yes.
Simulations need to use this approach just as explained, by simulating all possible predefined targets combinations (27 in above example), and then applying this approach to find optimal target.
Mathematical calculation can do the same, but also has an option to optimize calculation and at same time calculate for optimal and for specific targets, as explained in later section.

So simulating for targets=[Optimal, Optimal, Optimal ] will generally takes 27x more time than for specific predefined targets=[ Miss, C, B ]. Even more for more target options, for example if target options are { Others, Self, Miss, Random }, which means 5 target options ( A,B,C,Miss,Random), then it needs 5^3= 125 more calculations to find optimal solution than to find one specific solution using simulations. Using optimized mathematical calculations instead of simulation can avoid calculation of some of those Targets^Players, but would still need more time than single calculation needed for predefined specific targets .

allTargetOptions is function that generate all possible specific targets , based on predefined 'targets' and 'search' options for valid targets, and probability to hit of each player

Function selectOptimalResult is used to select optimal targets among all possible

truel() function calculates win probabilities using mathematical solution (as opposed to simulations). It enumerate all possible target combinations based on inputs, and then call numba function to actually produce result. Function nb_truel is numba accelerated version which calls either simulation or math calculation, based on specified algorithm.

Starting parameters:

Result:

2. Visualization

Multiple visualization procedures use similar parameters to define one 'test case', with given number of players, their probabilities to hit etc. This is Test class which contain all those parameters:

2.1 ptruel - to find and print result for given hit probabilities and predefined or optimal targets


Helper ptruel (pHits < , targets, search, targetLater,searchLater, ifEQ, alg, iterations, timeit > ) function used to invoke single truel() calculation and pretty print result.
It accept same starting parameters as truel(), and return same result (but only for single case , ie single set of probabilities ) In addition it display result as : pHit%= [,,..] pWin%= [,,..] targets= [,,..] tm= h:mm:ss.ms        ; win probabilities are rounded to one decimal, and targets are final/optimal targets for each player (tw[]) zero based. Asterix (*) denote that target was forced as parameter.


Solution to original problem, where first player can hit in 1/3 cases, second player in 2/3 cases and last player always hit - and program is allowed to find optimal targets for each player:

  • best for first player is to intentionally miss ( 'm' as first value in targets )
  • his probability to win is 39.7% ( first value in pWin%[] ) ... in fact, first player has highest win probability of all 3 players
  • best target for other two players are each other ( second player shoot third and vice-versa )



Example that demonstrate how truel() can find optimal solution for generalized problem with N players - here with 4 players, each 25% better shot than previous.

  • best for first player is to shoot last player, and he has 22.2% chance to win
  • best for second player is to shoot first player, and he has 41.8% chance to win - most of all players
  • best for third player is to shoot last player, and he has 27.8% chance to win
  • best for last player is to shoot third player, and he has only 8.2% chance to win - least of all players, even if he is best shot


Same example as above, but we force all to shoot at highest shot, so B and C shoots D, D shoots C, allowing only first player A to find optimal target. We also do not find optimal targets when 3 players remain, instead forcing them to shoot at strongest.
Now first player is better ( at 32% chance to win ) but second player is worse and now best chance to win has third player with 35%. Note how slower simulation is in this case, even if we did optimal search only for one player.

2.2 ptruelWhy - to demonstrate "why" is target list returned in result really 'optimal'

Function ptruelWhy (pHits <, search, targets,ifEQ, alg > ) to generate all possible target combinations and select optimal targets among them , while printing all options and reasoning behind choices.

Example of minimax selection process for Optimal result

2.3 testAB - to display results as colored graph image for any hit probability of first two players (A and B) on Y and X axis

Functions in this area are intended to create AB image/graph showing how win probabilities or optimal targets change for all possible combinations of hit probabilities for first two players, A and B ( hence 'AB' in names )
They can work for any number of players, using fixed hit probabilities for player C and above (players 3+), and exploring entire [0..1] range of probabilities for first two players A and B. On all images Y-axis represent every possible probability 0%..100% to hit for player A (0% chance is bottom, 100% chance for A to hit is top ) and X-axis represent players B chance to hit ( 0% is left, 100% is right)
Most functions share same parameters, that allows different data to be shown. Main AB functions are :

Common parameters, either passed directly or as members of typed class test:

Initial three parameters are standard parameter passed to truel function:

Next three parameters are also passed to truel(), but are rarely changed from default. They define search/target behaviour after one player is eliminated :

Following are ImageAB specific parameters:

Calculate AB result matrix : makeMatAB

Function that calculate result matrix [ points, points, 2, N ], with win probabilities and target orders for every mat[x,y] point:

makeMatAB : create full matrix ( size x size ) with probabilities and targets for entire AB image area ( where Y axis represent probabilities of player A to hit, from 0.0 to 1.0 , and X-axis is same for player B ), using one of:

Make AB result image : makeImageAB

Functions that create image from result matrix:

Main AB functions : testAB and diffAB

Functions that allow user to calculate and display test results in AB image format :

Additional formats for above functions or useful helper functions :

For each test case, parameter colors can also be list, again generating multiple side-by-side images with different coloring schemes.

3. Analysis

3.1 Comparison between "smart" and "normal" AB methods

There are two methods available for every 'AB' function:

Performance advantage range from moderate for small images (200 x 200), to dozens or hundreds of times for larger images ( 1000 x 1000 ) - depending not only on size of image, but also on parameters of AB function like search options etc.
But any optimization has potential to introduce errors, so here we will use those same AB functions to analyze difference between 'smart' and 'normal' methods. It will show that smart method is acceptable approach with very low result difference.