Act 1#

The Source of Payoffs:#

These numbers represent rewards or penalties that depend on the specific game context. They are often chosen based on empirical data, simulations, or theoretical models. In game theory, the payoffs usually come from the following factors:

  1. Resource allocation: How much of a resource is gained or lost when players cooperate or defect?

  2. Utility functions: Mathematical functions representing preferences, satisfaction, or outcomes for players.

  3. Statistical or probabilistic outcomes: Random factors that influence the payoff, especially in uncertain or risky situations.

  4. Real-world examples: Payoffs derived from actual scenarios like business deals, wars, negotiations, etc.
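As a quick illustration of factor 2, a utility function maps a raw outcome (say, units of resource won) into a payoff. The logarithmic form below is just one common, hypothetical choice, capturing diminishing returns; the function name and baseline parameter are mine, not part of any standard model:

```python
import math

def log_utility(amount, baseline=1.0):
    """Map a raw outcome to a payoff with diminishing returns.

    The logarithmic form is a hypothetical example, not the only choice.
    """
    return math.log(amount + baseline)

# Doubling the resource win does not double the payoff:
print(log_utility(50))    # payoff from winning 50 units
print(log_utility(100))   # larger, but less than double the payoff above
```

A payoff matrix built this way would hold the utility of each outcome rather than the raw resource count.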

Let’s now build a resource-driven example where payoffs are not arbitrary but calculated from specific interactions.


Step 1: Define the Problem Context#

Let’s imagine a game where two players are sharing resources. The total pool of resources is 100 units. If both cooperate, they share the resources equally. If one defects, they take more resources, leaving the other with less. However, if both defect, the resource pool shrinks due to inefficiencies (say, down to 80 units).

Step 2: Create a Payoff Model Based on Resource Allocation#

  • Cooperate-Cooperate: Each player gets half the total resources (50 units each).

  • Cooperate-Defect: The defector takes 70% of the total resources (70 units), while the cooperator gets the remaining 30% (30 units).

  • Defect-Defect: Both players get less due to inefficiency. The total resource pool shrinks to 80 units, and they split it equally (40 units each).
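These three rules can be written compactly. Using \( R \) for the total pool, \( s \) for the defector’s share, and \( \eta \) for the inefficiency factor (my notation, introduced here for convenience):

\( \pi_{CC} = \frac{R}{2} = 50, \quad \pi_{DC} = sR = 70, \quad \pi_{CD} = (1-s)R = 30, \quad \pi_{DD} = \frac{\eta R}{2} = 40, \)

with \( R = 100 \), \( s = 0.7 \), and \( \eta = 0.8 \).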

Step 3: Compute the Payoff Matrix#

From the resource-sharing setup above, we can now compute the payoffs. The key here is that these numbers aren’t arbitrary; they reflect specific interactions and consequences.

import numpy as np

def resource_based_payoff(total_resources=100, defect_share=0.7, inefficiency=0.8):
    """
    Calculate payoffs based on sharing of total resources.
    Cooperation leads to an even split, defection allows one to take a larger share.
    Defection by both players leads to a reduction in total resources.
    """
    # Cooperate-Cooperate: Equal split of total resources
    cooperate_cooperate = total_resources / 2
    
    # Cooperate-Defect: Defector takes the larger share, cooperator gets the rest
    cooperate_defect = total_resources * (1 - defect_share)
    defect_cooperate = total_resources * defect_share
    
    # Defect-Defect: Both defect, causing inefficiency (resource pool shrinks)
    total_defect_resources = total_resources * inefficiency
    defect_defect = total_defect_resources / 2
    
    # Payoff matrix in the format: [(C, C), (C, D)], [(D, C), (D, D)]
    return np.array([[cooperate_cooperate, cooperate_defect],
                     [defect_cooperate, defect_defect]])

# Compute the payoff matrix from the resource-sharing model
payoff_matrix = resource_based_payoff()
print("Payoff Matrix (Resource-Driven):\n", payoff_matrix)
Payoff Matrix (Resource-Driven):
 [[50. 30.]
 [70. 40.]]

Explanation of the Code:#

  1. Cooperate-Cooperate: Players split 100 units equally (50 each).

  2. Cooperate-Defect: The defector takes 70% (70 units) and the cooperator gets the rest (30 units).

  3. Defect-Defect: The total pool shrinks to 80 units due to inefficiency, and both players get 40 units each.

This payoff matrix is not arbitrary—it’s derived from specific resource allocation rules. You can change the parameters to model different scenarios.


In this example:

  • If both cooperate, they get 50 each.

  • If one defects while the other cooperates, the defector gets 70 and the cooperator gets 30.

  • If both defect, they each get 40 due to inefficiency.


Step 4: Generalize the Model for Any Scenario#

You can adapt this method for any game scenario by changing:

  • Total resources: The overall amount available for sharing.

  • Defection share: How much more a defector takes compared to a cooperator.

  • Inefficiency: How much the total resource pool shrinks if both players defect.

By adjusting these parameters, you can model real-world situations like business negotiations, environmental cooperation, or competitive markets.
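For instance, re-running the resource_based_payoff function from Step 3 with different (purely illustrative) parameter values models a larger pool, a greedier defector, and a harsher penalty for mutual defection:

```python
import numpy as np

def resource_based_payoff(total_resources=100, defect_share=0.7, inefficiency=0.8):
    # Same model as in Step 3
    cooperate_cooperate = total_resources / 2
    cooperate_defect = total_resources * (1 - defect_share)
    defect_cooperate = total_resources * defect_share
    defect_defect = total_resources * inefficiency / 2
    return np.array([[cooperate_cooperate, cooperate_defect],
                     [defect_cooperate, defect_defect]])

# C-C: 200 / 2 = 100 each; D-C: 0.75 * 200 = 150 vs 50; D-D: 0.6 * 200 / 2 = 60 each
harsh = resource_based_payoff(total_resources=200, defect_share=0.75, inefficiency=0.6)
print(harsh)
```

Every entry still follows mechanically from the three parameters, so the matrix stays interpretable no matter which scenario you model.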


Step 5: Reverse-Engineer for Understanding#

If we reverse-engineer this model, we can see that the numbers in the payoff matrix (50, 30, 70, 40) come directly from:

  • Total resources: How much is available to share.

  • Strategy outcomes: Whether players cooperate or defect.

  • Penalties for defection: Inefficiencies that reduce overall resources when both defect.

The key idea is that the payoffs are calculated based on specific game dynamics—not arbitrary numbers.
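This reverse-engineering can itself be automated. The sketch below (the helper name is mine) assumes the matrix really was produced by the Step 2 rules, and reads the three parameters back out:

```python
def recover_parameters(payoff_matrix):
    """Invert the resource-sharing model of Step 2.

    Assumes payoff_matrix = [[C-C, C-D], [D-C, D-D]] and that the matrix
    was generated by the resource-sharing rules.
    """
    total_resources = 2 * payoff_matrix[0][0]                 # C-C splits the pool evenly
    defect_share = payoff_matrix[1][0] / total_resources      # D-C is the defector's cut
    inefficiency = 2 * payoff_matrix[1][1] / total_resources  # D-D splits a shrunken pool
    return total_resources, defect_share, inefficiency

matrix = [[50.0, 30.0], [70.0, 40.0]]
print(recover_parameters(matrix))  # (100.0, 0.7, 0.8)
```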


Conclusion:#

The payoffs in game theory don’t come from nowhere: they are derived from specific resource interactions, utility functions, or statistical models. By using a resource-based model like this one, you can see exactly where payoff numbers come from, whether the 50, 30, 70, 40 derived here or a textbook set like 0, 1, 3, 5. They’re just the result of carefully defining the game rules.

Equilibrium in game theory, particularly Nash Equilibrium, comes into play when players reach a point where no one can improve their outcome by changing their strategy while the others keep theirs the same. This is where the game stabilizes, as each player’s choice is optimal given the choices of the others.

In simpler terms:

  • Equilibrium is the point where no one has any incentive to deviate.

  • Empirically, we see equilibrium when the players’ strategies stabilize because each player is “stuck” in a position where changing their choice would lead to a worse outcome for them.

Step-by-Step Breakdown of Equilibrium Empirically:#

Step 1: Empirical Payoff Matrix#

We first need a payoff matrix derived from some real interaction (like the resource-sharing model from earlier). The payoffs define the incentives for each strategy.

Let’s reuse our resource-sharing example from before, where the payoff matrix reflects the incentives for cooperation and defection:

\( M = \begin{bmatrix} 50 & 30 \\ 70 & 40 \end{bmatrix} \)

Here:

  • Rows represent Player 1’s choices (row 1: Cooperate, row 2: Defect).

  • Columns represent Player 2’s choices (column 1: Cooperate, column 2: Defect).

  • Each entry is Player 1’s payoff for that strategy combination; because the game is symmetric, Player 2’s payoff is the mirrored entry.

Step 2: Find the Nash Equilibrium#

To empirically find the Nash Equilibrium, we check whether either player can improve their payoff by changing strategies. For a 2x2 game like this, we follow these steps:

  1. Check Player 1’s best responses:

    • If Player 2 cooperates (Column 1), Player 1 gets 50 if they cooperate, but 70 if they defect. So, Player 1 would defect.

    • If Player 2 defects (Column 2), Player 1 gets 30 if they cooperate, but 40 if they defect. So, Player 1 would still defect.

  2. Check Player 2’s best responses:

    • If Player 1 cooperates (Row 1), Player 2 gets 50 if they cooperate, but 70 if they defect. So, Player 2 would defect.

    • If Player 1 defects (Row 2), Player 2 gets 30 if they cooperate, but 40 if they defect. So, Player 2 would also defect.

From this, we see that both players choosing to defect is the Nash Equilibrium, because neither can improve their outcome by unilaterally changing their strategy. In this case, the equilibrium is (Defect, Defect), which gives both players a payoff of 40.
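The best-response check above can be automated. The helper below is a sketch I’m adding: it assumes a symmetric game where entry [i][j] is Player 1’s payoff when Player 1 plays i and Player 2 plays j (0 = Cooperate, 1 = Defect), enumerates all four pure-strategy profiles, and keeps those where neither player gains by deviating:

```python
def pure_nash_equilibria(payoff_p1):
    """Pure-strategy Nash equilibria of a symmetric 2x2 game.

    payoff_p1[i][j] is Player 1's payoff when P1 plays i and P2 plays j;
    by symmetry, Player 2's payoff in that profile is payoff_p1[j][i].
    """
    equilibria = []
    for i in range(2):        # Player 1's strategy
        for j in range(2):    # Player 2's strategy
            # Can Player 1 do better by switching, holding j fixed?
            p1_best = payoff_p1[i][j] >= max(payoff_p1[k][j] for k in range(2))
            # Can Player 2 do better by switching, holding i fixed?
            p2_best = payoff_p1[j][i] >= max(payoff_p1[k][i] for k in range(2))
            if p1_best and p2_best:
                equilibria.append((i, j))
    return equilibria

M = [[50, 30], [70, 40]]
print(pure_nash_equilibria(M))  # [(1, 1)], i.e. (Defect, Defect)
```

The same helper handles games with multiple pure equilibria, such as coordination games.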

Step 3: How to Empirically Observe Equilibrium#

In a real-world situation, equilibrium behavior is observed when players’ strategies stabilize because deviating would make them worse off. For example:

  • Businesses in competition: If both firms are pricing their goods similarly, and lowering the price further would reduce profits without gaining market share, they reach an equilibrium.

  • Countries in a standoff (e.g., Cold War): Neither side escalates the conflict because it would lead to mutual destruction, stabilizing the situation in a “Cold War” state.

These are empirical observations of equilibrium—players learn over time that switching strategies leads to worse outcomes, so they stick with the current strategies.

Empirical Simulation of Equilibrium in Python:#

Now, let’s simulate this in Python to dynamically find equilibrium through repeated interactions between players.

We’ll use best response dynamics: each player adjusts their strategy to the best possible response to the other player’s strategy. The simulation iterates through several rounds until the players’ strategies stabilize (reach equilibrium).

import numpy as np

def resource_based_payoff(total_resources=100, defect_share=0.7, inefficiency=0.8):
    cooperate_cooperate = total_resources / 2
    cooperate_defect = total_resources * (1 - defect_share)
    defect_cooperate = total_resources * defect_share
    defect_defect = total_resources * inefficiency / 2

    # Payoffs indexed as [Player 1's strategy][Player 2's strategy]
    payoff_matrix_p1 = np.array([[cooperate_cooperate, cooperate_defect],
                                 [defect_cooperate, defect_defect]])
    payoff_matrix_p2 = np.array([[cooperate_cooperate, defect_cooperate],
                                 [cooperate_defect, defect_defect]])
    return payoff_matrix_p1, payoff_matrix_p2

def repeated_game(payoff_matrix_p1, payoff_matrix_p2, rounds=10):
    # Strategy 0 = Cooperate, 1 = Defect; both players start by cooperating
    player1_strategy = 0
    player2_strategy = 0
    player1_payoff_history = []
    player2_payoff_history = []

    for round_num in range(rounds):
        # Best response dynamics: each player switches to the strategy that
        # maximizes their payoff against the other's strategy from last round
        new_p1 = int(np.argmax(payoff_matrix_p1[:, player2_strategy]))
        new_p2 = int(np.argmax(payoff_matrix_p2[player1_strategy, :]))
        player1_strategy, player2_strategy = new_p1, new_p2

        # Record the payoffs realized under the updated strategy pair
        player1_payoff_history.append(payoff_matrix_p1[player1_strategy][player2_strategy])
        player2_payoff_history.append(payoff_matrix_p2[player1_strategy][player2_strategy])

        print(f"Round {round_num + 1}: Player 1 chooses {'Defect' if player1_strategy else 'Cooperate'}, "
              f"Player 2 chooses {'Defect' if player2_strategy else 'Cooperate'}")

    # Return the final strategies and the combined total payoff
    return (player1_strategy, player2_strategy), sum(player1_payoff_history) + sum(player2_payoff_history)

# Set up and run the game
payoff_matrix_p1, payoff_matrix_p2 = resource_based_payoff()
equilibrium, total_payoff = repeated_game(payoff_matrix_p1, payoff_matrix_p2)
print(f"Equilibrium: Player 1 = {'Defect' if equilibrium[0] else 'Cooperate'}, "
      f"Player 2 = {'Defect' if equilibrium[1] else 'Cooperate'}")
print(f"Total Payoff: {total_payoff}")
Round 1: Player 1 chooses Defect, Player 2 chooses Defect
Round 2: Player 1 chooses Defect, Player 2 chooses Defect
Round 3: Player 1 chooses Defect, Player 2 chooses Defect
Round 4: Player 1 chooses Defect, Player 2 chooses Defect
Round 5: Player 1 chooses Defect, Player 2 chooses Defect
Round 6: Player 1 chooses Defect, Player 2 chooses Defect
Round 7: Player 1 chooses Defect, Player 2 chooses Defect
Round 8: Player 1 chooses Defect, Player 2 chooses Defect
Round 9: Player 1 chooses Defect, Player 2 chooses Defect
Round 10: Player 1 chooses Defect, Player 2 chooses Defect
Equilibrium: Player 1 = Defect, Player 2 = Defect
Total Payoff: 800.0

Explanation:#

  1. Best Response Dynamics: In each round, Player 1 and Player 2 adjust their strategies based on what maximizes their payoff given the other player’s current strategy.

  2. Convergence to Equilibrium: The game continues until neither player can improve by changing their strategy, which leads to the Nash Equilibrium.

Output Example:#

Round 1: Player 1 chooses Defect, Player 2 chooses Defect
Round 2: Player 1 chooses Defect, Player 2 chooses Defect
...
Equilibrium: Player 1 = Defect, Player 2 = Defect

Conclusion:#

Equilibrium empirically emerges when players adjust their strategies in response to one another until no one has an incentive to change. In this example, both players defect in every round because it’s their best response given the other’s strategy, leading to a Nash Equilibrium at (Defect, Defect). This is where equilibrium is observed empirically in real-life games—through iterative adjustments and learning that stabilizes strategy choices.

This should give you a clearer understanding of how equilibrium emerges both conceptually and empirically through game interactions.