# Truel problem analysis

## Abstract

Truel problem is an old problem which can be summarized as :
Several people are doing duel.
Given their probabilities to hit, what are probability of each of them to win and who should they choose as optimal initial target ?

This document presents solutions and analysis of different scenarios for "classical truel" problem - one where players can have different probabilities to hit, shoot sequentially with infinite number of bullets, and where missing is allowed. It contains several novel elements:

• solving for optimal targets: solutions that not only find/show win probabilities for predefined targets, but are able to find optimal targets for each player
• interactive reusable solution: interactive Python functions for fast solution and visualization, with flexible configurable parameters
• random choice analysis: demonstration that "random" target choice can be optimal for some players in certain scenarios
• four players analysis: analyze optimal targets in "four players" truel scenarios

Novel part about solution is that it is fast enough to be interactive even for visualization ( where millions of individual pixels/solutions need to be calculated in short time to present AB graph with all possible hit probabilities for players A and B ). That speed was achieved by implementing default solution using mathematical calculations, while still providing option for conventional solution using simulation. Solutions support not only finding win probabilities for players with predefined initial targets, but also finding actual optimal targets - which is another novel element not found in most previous truel related papers. Mathematical solution is around 400 times faster than simulation when solving for predefined targets, and over 2000 times faster when finding optimal targets in truel scenarios. Another novel aspect of the solution is that is support more than three players, both for solutions and for visualization ( AB graphs still show all hit probabilities for players A and B, with fixed but configurable hit probabilities for players C,D... ) - although such variants are not technically "truel". To further improve speed and allow interactivity, Numba accelerated Python functions are used, along with 'smart' optimizations for AB graph that use dynamic interpolation/calculation.

Novel part about truel analysis is that interactive solution was used to demonstrate situations where it is optimal for some players to choose "random" target, as opposed to choosing to target other players or to miss. Document also demonstrate that self targeting is not optimal choice in any situation, and confirms that intentionally missing is optimal choice in large number of cases. While analysis of intentional misses and even self targeting was done previously, it generally lacked proof or demonstration in which situation those options would be optimal choice - because most previous analyses did not solve for optimal targets.

Code for this document is available at GitHub repository, or in ZIP file at gmnenad.com site ( which also include precalculated cache file, to save around 45 min of initial execution )

## Problem definition

This document solves generalized version of truel problem:

• N players: generalization of fixed three participants in original problem - so technically this may not be "truel", although most of analysis is done for truel version
• Any hit probability: unlike original problem which usually constrains 0 <= pA <= pB <= pC == 1, and especially makes it easier with pC fixed at 100%, this solution must support all possible hit probabilities in [0 .. 1] range for any player
• Optimal targets: in addition to solving for predefined initial targets ( which is easier), this solution must be able to find optimal targets for any or all players - so initial targeting order is part of solution result, in addition to win probabilities
• Target selection: to support different analyses solution should be able to include or exclude different target options, as combination of following:
• Others: standard target option, where player can select one among other players as target
• Miss: player can intentionally miss (shoot at ground etc)
• Self: player can shoot at himself ( suicide option )
• Strongest: player will shoot at highest threat - one of other remaining players with highest chance to hit
• Random: player can, at each round, randomly select one of available target options to shoot at. This option should honor inclusion/exclusion of Others/Miss/Self/Strongest

Original problem:
Three players are doing truel, where each can choose who to try to shoot, then next player plays and so on until only one winner is left. If first player has 1/3 chance to hit, second has 2/3 and last has 100% chance to hit - what is the chance of first player to win and who he should target at start ? ( uses target= Miss )

Alternative problem:
If first player has 75% chance to hit, second has 55% and last has 100% chance to hit - what is the chance of first player to win and who he should target at start ? ( uses target= Random )

Assumptions for problems:

• players can have different probabilities to hit
• players shoot sequentially ( player A, then B, then C ... then again A etc )
• players have infinite number of bullets
• missing is allowed as a potential target ( along with other options as self targeting or random targeting )
• all players have same allowed targets ( if player A can choose to miss, so can players B and C )
• all players are capable of finding optimal solution
• players know which target choice was made by players before them ( for example, each player must announce his target choice on his first round and keep that choice until one player is eliminated )

# 1. Solutions

There are two types of solutions to Truel problem in this document: exact matematical calculation and simulation. Both support solving for win probabilities and for optimal target order, and both support advanced targeting options like Miss, Self or Random targets. Simulation is slower to execute by factor 200-3000x, but even mathematical calculation can be slow for very large number of points ( in 2D graphs ) or very large range of targets ( especially for high random levels).

While different functions will be used ( simulation, mathematical calculation, AB graphs), they all share certain input parameters - same way to define problem :

function ( pHits, targets, search, ... )

Explanation on main shared parameters:

• pHits:

• list of probabilities for each player to hit ( floats in range 0.0 .. 1.0 )
• size of pHits list (or array) determines number of players, so for original truel problem it would have pHits=[ 1/3, 2/3, 1 ]
• targets:

• list of predefined targets that each player will choose to shoot at in initial phase of game ( while no player is eliminated yet )
• list where each element is one of: A, B, C ..., Miss, Random, Self, Strongest, Optimal ( or alternative complex 1D targets, see below )
• it should have same number of elements as number of players, and for example targets= [ Miss, C, B ] indicate that player A will intentionally miss, while B and C shoot each other
• default empty list means Optimal for all players
• valid targets for each player are those that are either specific single target (A,B,C,.., Miss), can be resolved to single target (Self, Strongest) or are able to find single target (Random, Optimal) among 'Search' options
• if 'Random' is selected it has equal chance to select any target from 'search' list ( ignoring 'Random' option if in 'search' list ). If 'Miss' is also in search option, it has same chance as any player
• if 'Optimal' is selected as target for any player, solution should find optimal target for that player ( among options allowed by 'search' list ) that will maximize players win probability
• both Optimal and Random will use 'search' parameter to determine which targets are eligible/valid for selection
• Alternative way to specify target for one player is to specify exact 'Random' distribution in list of N floats (probabilities that player would select each player as target) instead of single integer constant. So for example:
• [ 1, 0, 0 ] is equivalent to 'A' ( player will target A with 1.0 probability, ie always). It is also equivalent to 'Self' if used by player A or 'Strongest' if used by B and player A is strongest etc
• [ 0, 0, 0 ] is equivalent to 'Miss' - player will not select any target
• [ 0.5, 0, 0.5 ] is equivalent to 'Random' if used by player B with search=[Other] - it will equally choose among A and C as targets ( each with 50% chance)
• [ 0, 1/3, 1/3 ] is equivalent to 'Random' if used by player A with search=[Other, Miss] - it will equally choose among B and C as targets ( each with 1/3 chance) and remaining chance is Miss (also equal to 1/3)
• [ 1/4, 1/4, 1/4 ] is equivalent to 'Random' if used by any player with search=[Other, Miss, Self ] - it will equally choose among A,B,C,Miss as targets ( each with 1/4 chance)
• Alternative 1D targets ( list with N floats) allow for combinations that can not be expressed with simple 0D targets ( single integer ). For example:
• [ 0, 30%, 70% ] is not equivalent to any simple target - here target will be randomly selected as B in 30% of cases and C in 70% of cases
• [ 1/4, 0, 1/4 ] is also not equivalent to any simple target - here target will be randomly selected among A and C with equal chance, but only 25% chance each while there is 50% chance to select Miss
• Since targets parameter is list of preferred targets for each player, when each player use complex target it becomes list of lists. It is also possible to mix simple and complex targets:
• targets= [ [0,0,0] , [0.5,0,0.5], [0,1,0] ] is equivalent to targets=[ Miss, Random, B ]
• targets= [ Optimal, [30%,0,70%], B ] is mixed use of simple and complex targets ( where B has uneven random selection among A and C as targets)
• 'Optimal' is only simple (single integer) target that can not be expressed as complex 1D target - since it is hint to solver to actually find best target for that player among all allowed with 'search' option
• this parameter is only meaningful in first phase of truel, until one of players is eliminated and it turns into simple duel (where players always target each other).
• For N=4+ players, after one player is eliminated, solution will use 'targetLater' parameter ( and corresponding searchLater parameter )
• search:
• list or set of valid targets for optimal search or random selection.
• default is search= [ Others, Miss ] , meaning players can target other players or intentionally miss, but can not target themselves or shoot randomly
• it uses same constants as those in 'targets' , except 'Optimal' (since this is used to find optimal). In addition, it can use 'Others' - meaning any player except shooter
• it can only accept integer target constants, not alternative 1D list targets - since it must work for reduced number of players too. But 'Random' as simple int constant is valid search option
• for 3 player truel, this parameter is not used unless 'targets' list contains Optimal or Random. But for 4+ players it can be used anyway, after player is eliminated - see 'Optimal search' section
• these are only most important parameters. Full list of parameters can be seen in definition of Test parameters , specifically "fields related to calculating solution".

### Possible target choices

Options when defining predefined targets or possible targets in optimal searches. Each constant is usable by tg.Name, or directly by name. There are three places where these constants are applicable:

• predefined 'targets' : what targets will each player select in initial phase of game. It can be single specific player ( A, C ...), option that evaluate to single choice ( Self, Strongest, Miss ) or group choice ( Random, Optimal ). It can not be Others, since it must be reducible to single target.
This is array of values, where each value define initial target for that player. So targets = [ Miss, C, Optimal ] means that initially player A will miss, player B will shoot player C, and player C will find optimal target for himself.
• 'search' options : what targets are eligible when player is searching optimal target or shooting random target. Unlike for 'targets', it can be Others - since it define set of options. But it can not be Optimal , since it is used to find optimal.
This is basically set of values, that together define valid targets for all players. It can be represented as set or array, so search = { Others, Miss } = [ tg.Others, tg.Miss] means that when players look for optimal target , they can choose among other players or to miss.

• result 'targets' : part of solution result, it shows optimal targets as found by solution ( or passed predefined targets if no optimal option was present ). It can have same values as predefined target, except Optimal - whole idea of solution is to find what was that 'optimal' target and to replace it with best target for given player.
This is array of values, where each value represent either predefined or optimal initial target for that player. So result targets = [ Miss, C, B ] for example predefined targets = [ Miss, C, Optimal ] means that solution kept supplied [ Miss, C, -] targets for A and B, and found that optimal initial target for C is 'B'

Constants defining possible targets are listed below. They could be used as members of tg class, or as direct variables ( so tg.Miss == Miss , tg.Optimal==Optimal, etc ) :

## 1.1 Simulation as approximate solution

Python function that simulate truel problem for arbitrary number of players, and finds not only winning chances for each player but also optimal targets.

• approximate solution: returns simulation result, and while it defaults to 1 million rounds, it is still approximation
• slow: since it need million rounds on average, single calculation can take 3ms - and search for optimal solution can take 100ms just for 3 players
• find optimal targets: can find optimal targets for each player, with options to allow intentional misses (default), selftargeting, or even random targeting
• predefined targets as parameter: also allows specifying fixed targets for some or all players (for initial rounds while all are alive)

If we have N players with given targets and hit probabilities for each player, simulate until one player is eliminates, and then:

• if N was 3, reduction to two players is simple : those two players will always continue to shoot at each other, until one hits and win
• if N was 4+, then simulation need to change player targets (since initial targets may not be valid any more ). Simulation will use 'targetLater' parameter, to determine if next targets should be: Optimal (default), Strongest

Complexity of solution differs depending on supplied targets:

• if only specific targets were supplied ( 'Others', 'Self' or even 'Miss' - targets that uniquely determine what to shoot ), then only single simulation ( with default million rounds ) is enough
• if one or more targets are 'Random', it does not change much in simulation - player will use 'search' option to determine who he can randomly target , but execution time remains roughly same
• but if one or more targets are 'Optimal', it increases complexity. Players will again use 'search' option to determine what targets are valid candidates for optimal search. But simulation needs to be run for each valid combination ( up to 125 times slower for 3 players )
• same optimal search needs to be done each time when number of players is reduced ( for N-1 players), if our goal is for players to find optimal targets. While for 3 players this is not significant issue, it can exponentially increase optimality search for N=4+ players

Finding who to target ( optimal targets ) :

Search for optimal targets is needed if any of initial 'targets' was 'Optimal' ( for example targets= [Optimal, C, B], asking solution to find optimal target for player A ; or targets= [Optimal, Optimal, Optimal] to find optimal targets for each player ), or in case of 4+ players when we want to keep searching for optimal targets even when some players are eliminated ( specified in 'targetLater' parameter, which default to 'Optimal' , although other options like 'Strongest' would not require increased time ) Finding optimal solution with simulations requires simulating every specific combination of targets ( ie, Random, Miss, A, B, C...). If we have N players, and all possible targets are allowed ie serach=[Others,Self, Random, Miss] , then each player has (N+2) possible targets, and total number of target combinations is: $nTargetCombinations = (N+2)^N$ For 3 players that amounts to 5^3 = 125 combinations, which all need to be simulated before algorithm for optimality search can be applied ( explained in section 1.3 ). Therefore if we are solving for targets= [Optimal, Optimal, Optimal] , it will take 125 times longer execution than if we are solving for specific targets = [ Miss, C, B ] . For 4+ players, that time increases exponentially for optimality search.

There is one limitation for simulation solution : when used for 4+ players, with targets=[Optimal,...] to find optimal targets, it does not propagate optimal search to cases when players are eliminated - instead, it default to targeting 'Strongest' opponent in those reduced cases. For example, when one player in 4-player duel is eliminated and it become 3-player truel, remaining 3 players will not use their true optimal targets. While in many cases targeting 'Strongest' opponent is optimal, it is not always - this mean that mathematical calculation should be used to find optimal targets in 4+ player cases, instead of simulation.

nb_truel_sim() function calculates win probabilities using simulations. It will select either optimized function for 3 players, or slower generic simulation for N players.

Starting parameters:

• pHit: list with probabilities of each starting player to hit
• targets: list with predefined target for each player ( default is Optimal for each player )
• search: what targets are allowed when searching for optimal/best or random target ( default= tg.Miss | tg.Others )
• targetLater: what should all players select as target after initial phase (after one player is eliminated, and initial 'targets' become invalid . default= Optimal )
• searchLater: if search options in later phases (when some player was eliminated) should be different than in initial phase
• ifEQ : how to choose if equal probabilities. Default +1 means choose most 'simple' targets
• iterations : how many simulated rounds before averaging result. Default is 1 million

Result:

• pw[]: probability of each player to win (given players shoot in pHits order) , array of float64 in [0,1] range
• tw[]: who is their initial target ( 0..n-1 index of player, or tg.Miss or tg.Random ), array of int32 with target constants (-2,-1,0,1,2.. )

## 1.2 Mathematical calculation for exact solutions

Python function that solves truel problem for arbitrary number of players, and finds not only winning chances for each player but also optimal targets.

• exact solution: returns exact mathematical solution, not approximate simulation result
• fast: computes fast even with 4+ players. It is 400-2000 times faster than simulation for 3 players, and much faster for 4+ players
• find optimal targets: can find optimal targets for each player, with options to allow intentional misses (default), self targeting, or even random targeting
• predefined targets as parameter: also allows specifying fixed targets for some or all players (for initial rounds while all are alive)

Solution reasoning:

if we have N players, with probability of i-th player (0..N-1) in pHit[i], then if i-th player is on move and he is targeting T:
pwin(i first out of N)= pHit[i]* pwin(i+1 first out of N-1, T is removed ) + (1-pHit[i])* pwin(i+1 first, out of N)
Assuming we can get exact result for reduced number of players (N-1), we can express solution for N players with i-th player on move as function of solution for (i+1)th player on move:
pwin(i)= v[]+ k*pwin(i+1) , where v[]= constant vector with N elements = pHit[i]*pwin(i+1 first out of N-1, T is removed ) , and k= 1-ph[i] ... so we can express this as tuple pwn=(v[], k) so that pwin(i)=pwn*pwin(i+1)

Infinite loop:
Recursively calling pwin(i+1) will eventually get to last player, who will have to rely on first player result calling pwin(0) - which would result in infinite loop
That infinite loop can be avoided by passing function that express result for win probabilities if initial player is on move as function of win probabilities if (i+1)th player is on move : pwin(0)=pw0= f( pwin(i+1) ) = pwn * pwin(i+1) , with pwn= (v[],k )
That pw0 can be modified by each successive player, and when we get to last player and he recalculate pw0, he will have pw0= f ( pwin(N+1) ) = f ( pwin(0) ) ... or pw0 = f(pw0 ) which can be solved directly, avoiding infinite loop
Given pwn=(v[],k) and pw0= pwn*pw0= v[]+k*pw0 -> pw0(1-k)= v[] -> pw0[]= v[]/(1-k) ... resulting in probability to win for each player, if first to play is initial player
That exact result for pw0 can be now returned back to exactly calculate pwin(N-1), pwin(N-2)... all way back to pwin(0) which will get same result as pw0.
Finding who to target ( optimal targets ) :
above calculation assume that each player knows in advance who is best target for him to shoot at (T). If he does not know that, he need to apply same calculation to every possible target (including no target) and select one which give him best chance to win
when 'no target' is selected, his chance to win is slightly different formula: pwin(i)= pwin(i+1) - or if we want to present in same format as before ( pwin(i)= v[]+ k*pwin(i+1) ), then v[]=[0,0..0] and k=1, or in tuple for ([0,0...0], 1.0)
special case is if everyone decide to skip (no target), which would lead to infinite loop, so need to be forbidden in some way. For example - last player is not allowed to skip if everyone else did it.

nb_truel_math() function calculates win probabilities using mathematical solution (as opposed to simulations). It will initialize dictionary and check parameters before calling recursive truelC.
Function truelC is main mathematical solution to recursively finds probability to win and optimal targets for each player.

Starting parameters:

• pHit: list with probabilities of each starting player to hit
• targets: list with predefined target for each player ( default is Optimal for each player )
• search: what targets are allowed when searching for optimal/best or random target ( default= tg.Miss | tg.Others )
• targetLater: what should all players select as target after initial phase (after one player is eliminated, and initial 'targets' become invalid . default= Optimal )
• searchLater: if search options in later phases (when some player was eliminated) should be different than in initial phase
• ifEQ : how to choose if equal probabilities. Default +1 means choose most 'simple' targets

Result:

• pw[]: probability of each player to win (given players shoot in pHits order) , array of float64 in [0,1] range
• tw[]: who is their initial target ( 0..n-1 index of player, or tg.Miss or tg.Random ), array of int32 with target constants (-2,-1,0,1,2.. )

## 1.3 Optimal targets

Finding optimal targets and win probabilities is more complex problem than just finding win probabilities for predefined initial targets.

### What are "predefined targets" ?

Lets say that we have three players, customarily labeled as A (first to shoot), B (second to shoot), and C(third to shoot). And lets use probabilities to hit from original problem, so pA=1/3, pB=2/3 and pC=1.
That could be written as pHits=[1/3,2/3,1] , where size of list determine number of players (here three), and players shoot in order of list ( so player A is represented by pHits[0], player B by pHits[1] etc ).

Now let's say we want to solve problem where all players will always shoot at highest threat ( which is often optimal strategy ). In this case it means A and B will target C, and C will target B. That could be written as targets=[C,C,B] . And that 'targets' list would represent "predefined targets" for this problem setup.
But it is important to understand how and where those targets are used. Using them in initial phase of the game ( "game" refers to one duel/truel match until final winner is known ) is straightforward - and "initial phase" means that all players are still in game.
In initial phase, player A will target C, when he miss then player B will target C, when B miss player C will target B, and when/if player C miss (not possible in case when pC=100%, but in general) then player A will again target player C and so on.
What happens when one player is eliminated? Let's say player A hits player C, and only players A and B remain with player B on the move. It is obvious that player B can not follow 'targets' any more, since playe C is eliminated.
But what if player could follow 'targets' in next phase (after one player is eliminated)? Let's say our predefined targets were [Miss, C, B] , meaning player A would intentionally miss and players B and C shoot each other. Assuming player B hits player C, should player A still shoot to miss? Obviously not, if playe A wants to win.

Conclusion is that:

#### 'targets' only define what players will shoot in initial phase of game, while all players are still in game. As soon as one player is eliminated, 'targets' are not applicable any more.

Note that this holds true both for problem input ( predefined targets ) and problem result ( optimal targets ) - since "optimal targets" (as solution result), can be viewed as "predefined targets" that give best chance to win to all players.

This is often glossed over in Truel analyses, since for three players it is obvious that, when reduced to two players, optimal strategy is to shoot each other. Therefore specifying "targets" for truel completely define how each player will behave during entire game.
But it is not so for more players. If we have four players with targets=[ D,D,D,C ] and player D is eliminated, what will remaining three players target in next phase? Clearly, old targets are not applicable, but neither is simple 'shoot each other' as in duel case.
Therefore, predefined 'targets' parameter does not define all player choices in games with four or more players. To fully specify how four players should play it would need initial 'targets' while all are in game, and at least four additional 'targets' for each of four possible three player cases, and possibly even more than four - if player D is eliminated by player A (so now player B is on move), new targets could be [Miss,C,B,-]. But if player D was eliminated by player B instead ( so now player C is on move), it is possible that players would choose different target, for example [C,C,B,-]. At maximum that would mean 12 different possible 'targets' choices for three remaining players, depending on which player was eliminated and by who. This only gets more complex with 5+ players. So initial targets alone can not uniquely define rest of the game, unless we assume that, as soon as one player is eliminated, players will select targets for next phase in some predefined way. Such predefined way could be "select highest threat", or "keep old target if alive, otherwise Miss" or even "all suicide" etc ... but most of them would not be logical choices since players would tend to select targets that maximize their chance for victory. In any case, choice for targetsLater, or targets to apply after one player is eliminated, must be something applicable to all remaining players regardless of their number and of which players were eliminated. So it can not be "target player C" ( C could have been eliminated), or "Miss" (since it is pointless if all players choose to miss).

Valid options for targetLater would be:

• Strongest : everyone target strongest player
• Optimal: everyone finds optimal target - this is most logical choice, and only one that allows to find real optimal targets for 4+ players, so it is default

It follows that:

### What are **"optimal targets"** ?

From above discussion, it follows that:

#### "optimal targets" are predefined targets that would maximize victory chance of all players.

Since we have shown that 'predefined targets' only define initial phase ( while all players are still in game), that also holds true for 'optimal targets'. Similarly, if new 'optimal targets' are selected once one player is eliminated ( for example, from 4 to 3 players ), that new optimal targets will be valid only until next player is eliminated.

### How to "maximize victory chance of all players" ?

Truel game is inherently zero-sum game ... if one player increase their chance to win, it must reduce some others player chance. Ultimately, sum of all players chance to win must be one:

$$\sum_{player} pWin\, (player) = 1$$

That presents problem to maximize victory chance for all players - if we maximize for player A, it will not be optimal for player B. But it is still possible to determine "optimal" targets if we make following assumptions:

• Assumption: players are able to detect which target players before them have selected. This is straightforward, and can either work by assuming players can 'see' what others shot, or by some other means like each player must announce his target before he shoots.
• Assumption: it is possible to calculate win probabilities given predefined targets. This is even more straightforward, since it is main assumption of entire 'Truel' problem.

We need some way to reduce complex problem of 'find optimal targets' to simpler problem of 'find probabilities to win for predefined targets', using above assumptions.

#### finding optimal targets can be solved using form of **minimax approach**:

Last player to shoot (player C in three players example) will see/know targets of A and B ( as per assumptions ), and would be able to calculate his optimal target by calculating win probabilities for series of predefined targets, where those predefined targets will always have same targets selected by A and B, and his own targets will be his every possible target option. Let's say that A selected tgA=B, B to shoot tgB=A ... and possible targets for C are A/B/Miss. Player C will calculate three times:

• for targetsC1= [ B, A, A], winProbabilities=[ 0.0% 14.8% 85.2% ]
• for targetsC2= [ B, A, B], winProbabilities=[ 7.4% 0.0% 92.6% ]
• for targetsC3= [ B, A, Miss], winProbabilities=[ 0.0% 0.0% 100.0% ] C+

Player C will obviously select for his target option where his pwC is maximal - in this example, one marked with 'C+' where pwC=100% - so he will Miss, and all targets are [ B, A, Miss] ). He needed 3 calculations for that.

How does that influence selection process of player B?
When player B is on the move, he knows what is target of player A (tgA=B in above example), and he also can analyse each of his possible choices. But for each of his choices, he needs to analyse potential responses of player C, in the same way that C would reason it from above. Player B will need 9 calculations (3x3):

• for targetsB1= [ B, A, * ] -> B can expect 0%, since C will choose 100% as his maximum in B13 case
• for targetsB11= [ B, A, A ], winProbabilities=[ 0.0% 14.8% 85.2% ]
• for targetsB12= [ B, A, B ], winProbabilities=[ 7.4% 0.0% 92.6% ]
• for targetsB13= [ B, A, Miss ], winProbabilities=[ 0.0% 0.0% 100.0% ] C+
• for targetsB2= [ B, C, * ] -> B can expect 25.4%, since C would choose 48.1% in B22 case
• for targetsB21= [ B, C, A ], winProbabilities=[ 19.0% 40.2% 40.7% ]
• for targetsB22= [ B, C, B ], winProbabilities=[ 26.5% 25.4% 48.1% ] C+ B+
• for targetsB23= [ B, C, Miss ], winProbabilities=[ 24.5% 32.7% 42.9% ]
• for targetsB3= [ B, Miss, * ] -> B can expect 0%, since C will choose 100% in B33 case
• for targetsB31= [ B, Miss, A ], winProbabilities=[ 0.0% 44.4% 55.6% ]
• for targetsB32= [ B, Miss, B ], winProbabilities=[ 22.2% 0.0% 77.8% ]
• for targetsB33= [ B, Miss, Miss ], winProbabilities=[ 0.0% 0.0% 100.0% ] C+
• therefore B chooses max ( 0%, 25.4%, 0% ), and it turns to be pwB22=25.4% ( marked with B+), so B chooses 'C' as target. It also means that C will choose 'B' as target, and that optimal targets are [ B, C, B ] if A targeted B

Same logic now extend to player A. He needs to evaluate each of his target options tgA in { Miss, B, C }, and for each of them repeat logic that B would apply, so he will need to do 27 calculations (3x3x3).
After determining what targets would B and C choose if he select each of his targets, he select one where he has maximal win probability. Note that it may not ( and in many cases will not ) be situation where player A has best win probability of all players. It will merely be one of three options where he has largest probability. In above example, player A options are:

• if A targets B : as above, B will do 9 calculations and choose to target C, targets=[ B, C, B ] -> pwin=[ 26.5% 25.4% 48.1% ]
• if A targets C : after 3x3 calculations B will choose to target C, targets=[ C, C, B ] -> pwin=[ 31.2% 54.0% 14.8% ]
• if A targets Miss : after 9 calculations B will again choose to target C, targets=[ Miss, C, B ] -> pwin=[ 39.7% 38.1% 22.2% ]
• so after 3x9=27 calculations, player A will find max( 26.5%, 31.2%,39.7%) of his win probabilities, and select to Miss as optimal initial target
• meaning final problem solution for above example with pHits=[1/3,2/3,1] is targets=[ Miss, C, B ] and pwin=[ 39.7% 38.1% 22.2% ]

### What if different targeting options are available ?

It should not change logic, it will only change number of available target options for each player. For example, if players can target themselves (with Self option), they will have 4 target options at each step, so player C will examine 4 cases, player B 4x4 and player A 4x4x4
In general, there will be $T^N = TargetChoices ^ {Players}$ cases .

If 'Random' option is available, where players can randomly shoot at any other player each round, it can be considered just as another option - as long as we know how to calculate win probabilities when Random is included in predefined targets, for example targets=[ Random, C, B ]

Same approach can be applied for cases where only some players are allowed to search for optimal target, and other players are fixed to predefined targets. Lets say that we have:

• predefined targets = [ Optimal, C, Optimal ]

In this case, player A and C can choose any target, but player B must shoot at player C. It only changes step for player B ( who does not have anything to choose ), and player A will need smaller number of total calculations to find optimum (3x1x3=9).
Basically, different options will change total number of possible 'predefined targets', but logic of choosing optimal targets after that should be same application of minimax.

Conclusion:

### Is same approach to find optimal targets applicable both in mathematical calculations and in simulations ?

In essence, yes.
Simulations need to use this approach just as explained, by simulating all possible predefined targets combinations (27 in above example), and then applying this approach to find optimal target.
Mathematical calculation can do the same, but also has an option to optimize calculation and at same time calculate for optimal and for specific targets, as explained in later section.

So simulating for targets=[Optimal, Optimal, Optimal ] will generally takes 27x more time than for specific predefined targets=[ Miss, C, B ]. Even more for more target options, for example if target options are { Others, Self, Miss, Random }, which means 5 target options ( A,B,C,Miss,Random), then it needs 5^3= 125 more calculations to find optimal solution than to find one specific solution using simulations. Using optimized mathematical calculations instead of simulation can avoid calculation of some of those Targets^Players, but would still need more time than single calculation needed for predefined specific targets .

allTargetOptions is function that generate all possible specific targets , based on predefined 'targets' and 'search' options for valid targets, and probability to hit of each player

Function selectOptimalResult is used to select optimal targets among all possible

• finds optimal targets among supplied list of target candidates and their results, using minimax approach
• returns single result with optimal targets
• optionally can print all options and indicate reasons why this result is optimal, which is used by ptruelWhy

truel() function calculates win probabilities using mathematical solution (as opposed to simulations). It enumerate all possible target combinations based on inputs, and then call numba function to actually produce result. Function nb_truel is numba accelerated version which calls either simulation or math calculation, based on specified algorithm.

Starting parameters:

• pHit: list with probabilities of each starting player to hit
• targets: list with predefined target for each player ( default is Optimal for each player )
• search: what targets are allowed when searching for optimal/best or random target ( default= Miss | Others )
• targetLater: what should all players select as target after initial phase (after one player is eliminated, and initial 'targets' become invalid . default= Optimal )
• searchLater: if search options in later phases (when some player was eliminated) should be different than in initial phase. Default=[], which uses same as 'search'
• ifEQ : how to choose if equal probabilities. Default +1 means choose higher target indexes ( so player over Miss/Random )
• iterations : how many simulated rounds before averaging result. Default is 1 million
• alg : what algorithm is used to find truel solution: 'tg.math' solves by calculation, 'tg.sim' by random simulation ( default alg= tg.math )

Result:

• pw[]: probability of each player to win (given players shoot in pHits order) , array of float64 in [0,1] range
• tw[]: who is their initial target, array of int32 with target constants

# 2. Visualization

Multiple visualization procedures use similar parameters to define one 'test case', with given number of players, their probabilities to hit etc. This is Test class which contain all those parameters:

## 2.1 ptruel - to find and print result for given hit probabilities and predefined or optimal targets

Helper ptruel (pHits < , targets, search, targetLater,searchLater, ifEQ, alg, iterations, timeit > ) function used to invoke single truel() calculation and pretty print result.
It accept same starting parameters as truel(), and return same result (but only for single case , ie single set of probabilities ) In addition it display result as : pHit%= [,,..] pWin%= [,,..] targets= [,,..] tm= h:mm:ss.ms        ; win probabilities are rounded to one decimal, and targets are final/optimal targets for each player (tw[]) zero based. Asterix (*) denote that target was forced as parameter.

Solution to original problem, where first player can hit in 1/3 cases, second player in 2/3 cases and last player always hit - and program is allowed to find optimal targets for each player:

• best for first player is to intentionally miss ( 'm' as first value in targets )
• his probability to win is 39.7% ( first value in pWin%[] ) ... in fact, first player has highest win probability of all 3 players
• best target for other two players are each other ( second player shoot third and vice-versa )

Example that demonstrate how truel() can find optimal solution for generalized problem with N players - here with 4 players, each 25% better shot than previous.

• best for first player is to shoot last player, and he has 22.2% chance to win
• best for second player is to shoot first player, and he has 41.8% chance to win - most of all players
• best for third player is to shoot last player, and he has 27.8% chance to win
• best for last player is to shoot third player, and he has only 8.2% chance to win - least of all players, even if he is best shot

Same example as above, but we force all to shoot at highest shot, so B and C shoots D, D shoots C, allowing only first player A to find optimal target. We also do not find optimal targets when 3 players remain, instead forcing them to shoot at strongest.
Now first player is better ( at 32% chance to win ) but second player is worse and now best chance to win has third player with 35%. Note how slower simulation is in this case, even if we did optimal search only for one player.

## 2.2 ptruelWhy - to demonstrate "why" is target list returned in result really 'optimal'

Function ptruelWhy (pHits <, search, targets,ifEQ, alg > ) to generate all possible target combinations and select optimal targets among them , while printing all options and reasoning behind choices.

• uses allTargetResults function to generate all posible target combinations and calculate their probabilities using truel() with all predefined targets
• uses selectOptimalResult function to find best target among all candidates, using minimax approach
• print all options and indicate reasons why this result is optimal.

### Example of minimax selection process for Optimal result

• using ptruelWhy function that actually calculate all probabilities for all possible target options for three players with hit chances pHits= [ 1/3, 2/3, 1] and possible targets { Others, Miss }
• it then uses minimax approach to select optimal targets for all players ( see also explanation in 1.3 about minimax approach ):
• player C will, for each combination of targets selected by players A and B before him, select one out of three target options ( Miss, A , B ) that gives him best result. So for cases #1..3 , where A and B target [B,A] ( each other ), best option for C is #3 ( to Miss ) with 100% chance to win. That option is marked by "C+". Similarly, if A and B selected [ C, C ] before him , his best option is #14 ( 14% #14 > 7% #13 > 0% #15 ), so he selects #14 in that case ( marked with C+ on line #14), etc...
• player B can also select three options (Miss, A, C ) for every target that A select before him. But he knows that, for each of his selections, he can expect player C to select one of rows marked with "C+". So if player A selected Miss, player B can only choose among rows #21,23,26 , and best win probability for him in that case is 38% at row #23 , so he chooses that one (marked with B+). Similarly, if A select to target C, player B can choose among C+ rows at # 11,14,17 where best result for him is at #14 with 54%, so #14 gets B+, and in similar way #5
• when player A need to select his best target, he also have three options ( Miss, B , C ). For each of them he will get options marked by B+ from player B, so he needs to choose among three of those ( at rows #5, #14 and #23 ), and best win probability for player A among those is one at row #23 with 39.7% chance ( where A chose to Miss, and that row gets marked by A+)
• therefore A chooses as his target Miss , B chooses to target player C, anc C chooses to target player B or targets = [ m C B ] as shown in row #23, with win probabilities [ 39.7% 38.1% 22.2% ], meaning 40% chance for player A to win, 38% for player B and just 22% for player C
• if 'search' parameter is expanded to search=[Others,Miss,Self,Random] , then number of rows ( possible target combinations ) would increase to 125 ( 5^3). Or in general: nTargetCombinations = nTargetOptions ^ nPlayers

## 2.3 testAB - to display results as colored graph image for any hit probability of first two players (A and B) on Y and X axis

Functions in this area are intended to create AB image/graph showing how win probabilities or optimal targets change for all possible combinations of hit probabilities for first two players, A and B ( hence 'AB' in names )
They can work for any number of players, using fixed hit probabilities for player C and above (players 3+), and exploring entire [0..1] range of probabilities for first two players A and B. On all images Y-axis represent every possible probability 0%..100% to hit for player A (0% chance is bottom, 100% chance for A to hit is top ) and X-axis represent players B chance to hit ( 0% is left, 100% is right)
Most functions share same parameters, that allows different data to be shown. Main AB functions are :

• testAB([tests] <, title="", inches=7 > ) has parameters passed in members of test class, either as single class or as list ( in which case multiple images side-by-side are generated )
• diffAB(tests1, [tests2], < colors=['w','dp','do'], title="", inches=7 > ) calculate difference in results of test1 and one or more [tests2], and display them using specified color schemes.
• testAB3(< pC=[1.0] >, ... params... ) fixed for 3 players, ignoring parameter pHits and recreating it from new optional parameter pC= list of probabilities for player C ( default pC=[1.0]) . If multiple pC, it create multiple images side-by-side

#### Common parameters, either passed directly or as members of typed class test:

Initial three parameters are standard parameter passed to truel function:

• pHits: chance of each player to hit. Only chances for player C and above are used ( so pHits[2:] ), since pA and pB will be changing along Y-axis and X-axis. For real truel (3 players) typical pHits value here is [0,0,1]
• targets: predefined targets for each player. If not specified, optimal targets will be searched for players. Default = [] , meaning find optimal target for each player
• search: valid targets to choose when searching for optimal target or random target. Default = [ Others, Miss ]

Next three parameters are also passed to truel(), but are rarely changed from default. They define search/target behaviour after one player is eliminated :

• targetLater: what will remaining players target in later phases of game, after some player is eliminated. Default= Optimal ( they will try to find optimal target )
• searchLater: what will be valid targets for Optimal/Random search in later phases of game after someone is eliminated. Default = same as 'search' parameter
• ifEQ: what target will player choose if both give him equal maximal win chance. Default = +1 , choose higher target first, player C,B,A,Miss,Random ( so exotic options chosen last). use ifEQ=+1 for opposite order

Following are ImageAB specific parameters:

• points: how many different pA and pB values to calculate. Resulting image will have points x points dimension - so higher points require more time to calculate. Default is 300 ( so 300x300 pixels image )
• AgtB: if True (default), pA can be greater than pB - thus pA and pB will both cover every possible value between 0.0 and 1.0 and image will be square. If False, pA < = pB and image will be bottom right triangle
• method: how to calculate truel probability values for each point:
• 'smart': (default) progressive resolution, calculating every point near edges using truel() function, and interpolating points in homogenous areas
• 'normal': calculating every point using truel() function - slower than 'smart' option, and does not result in noticeable difference.
• intensityRange: when probability is used for color intensity, determine where to have max/min intensity. Default is max intensity for pWin < = 40%, and min intensity for pWin > = 90%. Rarely changed.
• colors: how to determine color of points on image. Just initial letter of option is enough to pass in parameter. If passed as list , eg colors=['u','w'], it generate multiple images side-by-side. Can specify for which player ( eg "p" vs "pA" ). Default is "w".
• w: 'winner' (default) color is based on which player has most chance to win at that point. Red for player A, Green for player B and Blue for player C. Intensity is brighter for lower chance to win, darker for higher chance to win.
• Red: player A has highest chance to win
• Green: player B has highest chance to win
• Blue: player C has highest chance to win
• o: 'win order' color is based on win order - similar to "w" so same primary colors are retained based on who has most chance to win, only its subdivided to different colors based on order 1st/2nd/3rd
• p: 'pwin' , color is based on best probability to win at that point , with gamut from black / dark blues for low probabilities to white / bright green for high probabilities. If "pC" it shows probabilities of player C to win.
• u: 'unusualTargets', color is based on how unusual result targets are ( targets they will use while still alive ). When multiple 'unusual' targets are present, most unusual one determine color in order:
• green: random target.
• red: self target / suicide
• blue: intentional miss
• black: other player, but not one with highest chance to hit
• lightgrey: other player who has highest chance to hit - there will be most of those.
• t: 'targets', color is based on optimal/selected target for that pA,pB. If player is specified, eg. "tB", it shows targets of player B

### Calculate AB result matrix : makeMatAB¶

Function that calculate result matrix [ points, points, 2, N ], with win probabilities and target orders for every mat[x,y] point:

makeMatAB : create full matrix ( size x size ) with probabilities and targets for entire AB image area ( where Y axis represent probabilities of player A to hit, from 0.0 to 1.0 , and X-axis is same for player B ), using one of:

• fixResolution : create matrix by calling truel() calculation for every point, if method=="normal"
• smartResolution : create matrix by calling truel() calculation only for non-homogenous areas ( around edges or where targets change ), and interpolate for homogenous areas. faster than fixResolution, if method=="smart"
• CalcMat: recursive function that determine if it can use interpolation or need to subdivide further
• InterpolateMat: interpolation for homogenous areas ( with same win order )

### Make AB result image : makeImageAB¶

Functions that create image from result matrix:

• makeImageAB : create AB image from params ( using Fixed or Smart methods), and return (image, label, title ). Can return multiple images, one for each option in 'colors'
• colorImageAB : create image from matrix, based on 'colors' parameter
• CalcMat: recursive function that determine if it can use interpolation or need to subdivide further
• InterpolateMat: interpolation for homogenous areas ( with same win order )
• horizImages : display multiple results of makeImageAB horizontally in same cell.

### Main AB functions : testAB and diffAB¶

Functions that allow user to calculate and display test results in AB image format :

• testAB([tests] <, title="", inches=7, cache=1 > ) has parameters passed in members of test class, either as single class or as list ( in which case multiple images side-by-side are generated )
• diffAB(tests1, [tests2], < colors=['w','dp','do'], title="", inches=7, cache=1 > ) calculate difference in results of test1 and one or more [tests2], and display them using specified color schemes. Additional supported colorings are:
• 'dp'= pWin average DIFF% : average difference in win probabilities across all three players ( with colorscale, dark means low difference )
• 'do'= DIFF in Win Order : colored cases with different win order, regardless or how much probabilities differ ( white means no change in win order )

Additional formats for above functions or useful helper functions :

• testAB3t(test <, pC=[1.0], inches=7 > ) fixed for 3 players, ignoring parameter pHits and recreating it from new optional parameter pC= list of probabilities for player C ( default pC=[1.0]) . If multiple pC, it create multiple images side-by-side
• testAB3 same as testAB3t, but parameters are passed directly instead of in Test class
• Replace( [tests], field1=val1, field2=val2, ...) to replace same field values in multiple test cases ( works on single test case as well )
• preCalc( [tests] , cache=cNone/cUse/cUpdate/cReadOnly) - precalculate all tests and return new list of [tests] with those precalculated matrix results included. Rarely needed, included in all testXY, diffXY functions

For each test case, parameter colors can also be list, again generating multiple side-by-side images with different coloring schemes.

# 3. Analysis

## 3.1 Comparison between "smart" and "normal" AB methods

There are two methods available for every 'AB' function:

• Normal method : calculate value of truel() solution for every point x point case in 0<= pA,pB <=1 range ( using simulation or math calculation algorithm )
• Smart method : (default) performance optimization that does not calculate for every point. Instead, it subdivide image to smaller areas, and if area is homogenous then it interpolate values, otherwise subdivide/calculate.

Performance advantage range from moderate for small images (200 x 200), to dozens or hundreds of times for larger images ( 1000 x 1000 ) - depending not only on size of image, but also on parameters of AB function like search options etc.
But any optimization has potential to introduce errors, so here we will use those same AB functions to analyze difference between 'smart' and 'normal' methods. It will show that smart method is acceptable approach with very low result difference.