Getting started with catfish-sim

Written using version 0.2.0.

In this bottom-up tutorial, you will learn the basics of catfish-sim to simulate an online dating environment.

You can install the package using pip install catfish-sim.

Creating an agent

Before we create an agent, we need to understand the preference, attribute, and strategy concepts, which together make up most of an agent.

Preference

In catfish-sim, agents have preference objects that define the attribute-specific preference of an agent. Depending on how the relevant attribute is used, different preference classes can be used. For categorical attributes, an example CategoricalPreference object is shown below:

[1]:

from catfish_sim.compatibility import CategoricalPreference

cat_pref = CategoricalPreference(
    preferred_values=["a", "b"],
    allowed_values=["a", "b", "c", "d"],
    preferred_score=1.25,
    nonpreferred_score=0.75,
    compatibility_weight=1,
)

print(cat_pref)

CategoricalPreference(
        preferred_values=['a', 'b'],
        preferred_score=1.25,
        nonpreferred_score=0.75
)

This preference suggests that the agent that carries it obtains an attribute-specific compatibility score of \(1.25\) (preferred_score) when their candidate’s attribute value is “a” or “b”, because “a” and “b” are stated to be preferred. For other values, the compatibility score is \(0.75\) (nonpreferred_score). We can see these compatibilities by calling evaluate_attribute with the attribute value that will be judged by the preference.

[2]:

print("Compatibility with a:", cat_pref.evaluate_attribute("a"))
print("Compatibility with c:", cat_pref.evaluate_attribute("c"))

Compatibility with a: 1.25
Compatibility with c: 0.75

Note that categories do not need to be strings. For example, you could have other types, such as boolean or integer, that can be compared with the candidate’s attribute.
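The following standalone sketch illustrates non-string categories. It assumes the preferred values are checked by ordinary Python equality and membership, which this tutorial does not confirm for CategoricalPreference; categorical_score is a hypothetical helper, not part of catfish-sim.

```python
def categorical_score(value, preferred_values,
                      preferred_score=1.25, nonpreferred_score=0.75):
    """Hypothetical helper: score a category by simple membership."""
    if value in preferred_values:
        return preferred_score
    return nonpreferred_score


# Boolean categories, e.g. "is a smoker".
print(categorical_score(False, preferred_values=[False]))  # preferred
print(categorical_score(True, preferred_values=[False]))   # not preferred

# Integer categories, e.g. an education level from 1 to 4.
print(categorical_score(3, preferred_values=[3, 4]))

# Caveat: in Python, True == 1 and False == 0, so mixing booleans and
# integers in one category set makes them indistinguishable.
print(categorical_score(True, preferred_values=[1]))
```

If membership is indeed tested this way, it is safer to keep a single type per attribute.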

For numerical attributes where the agent has a continuous preference range, an example NumericalPreference object is shown below:

[3]:

from catfish_sim.compatibility import NumericalPreference

num_pref = NumericalPreference(
    preferred_range=[2, 4],
    allowed_range=[1, 10],
    preferred_score=1.25,
    nonpreferred_score=0.75,
    distance_sensitive=False,
    compatibility_weight=1,
    compatibility_fn=None,
)

print(num_pref)

NumericalPreference(
        preferred_range=[2, 4],
        preferred_score=1.25,
        nonpreferred_score=0.75,
        distance_sensitive=False,
        compatibility_weight=1,
        compatibility_fn=None
)

Since distance_sensitive is set to False, the preference here denotes that a value between 2 and 4 yields a compatibility score of \(1.25\), while anything outside this range yields \(0.75\):

[4]:

print(num_pref.evaluate_attribute(2.5))
print(num_pref.evaluate_attribute(1.5))

1.25
0.75

It is possible to have a distance-sensitive evaluation where the compatibility score is mapped to a value between nonpreferred_score and \(1\) based on the difference between the evaluated value and the closest preferred value:

[5]:

num_pref = NumericalPreference(
    preferred_range=[2, 4],
    allowed_range=[1, 10],
    preferred_score=1.25,
    nonpreferred_score=0.75,
    distance_sensitive=True,  # Distance-sensitive calculation.
    compatibility_weight=1,
    compatibility_fn=None,  # None makes the preference use the default scaling.
)

print(num_pref)
print(num_pref.evaluate_attribute(2.5))
print(num_pref.evaluate_attribute(1.5))
print(num_pref.evaluate_attribute(10))

NumericalPreference(
        preferred_range=[2, 4],
        preferred_score=1.25,
        nonpreferred_score=0.75,
        distance_sensitive=True,
        compatibility_weight=1,
        compatibility_fn=<function Preference.__init__.<locals>.<lambda> at 0x000001E0F4BAB640>
)
1.25
0.9861111111111112
0.8333333333333334

Here we see that the difference between \(1.5\) and its closest preferred value (\(2\)) is smaller than the difference between \(10\) and its closest preferred value (\(4\)), so it yields a higher compatibility score. It is possible to pass a custom function as the compatibility_fn that takes the evaluated value to specify how compatibility is calculated.
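The two distance-sensitive scores printed above are consistent with a linear interpolation between \(1\) and nonpreferred_score, where the distance to the closest preferred value is divided by the width of allowed_range. The sketch below is a reconstruction inferred from those outputs, not the library's source code; default_scaling_sketch is a hypothetical name.

```python
def default_scaling_sketch(value, preferred_range, allowed_range,
                           preferred_score=1.25, nonpreferred_score=0.75):
    """Hypothetical reconstruction of the distance-sensitive score."""
    low, high = preferred_range
    if low <= value <= high:
        return preferred_score
    # Distance between the evaluated value and the closest preferred value.
    distance = low - value if value < low else value - high
    # Assumption: distances are normalised by the width of the allowed range.
    span = allowed_range[1] - allowed_range[0]
    closeness = 1 - distance / span
    # Map closeness in [0, 1] onto [nonpreferred_score, 1].
    return nonpreferred_score + (1 - nonpreferred_score) * closeness


for value in (2.5, 1.5, 10):
    print(value, default_scaling_sketch(value, [2, 4], [1, 10]))
```

For \(1.5\), the distance is \(0.5\) and the span is \(9\), giving \(0.75 + 0.25 \cdot (1 - 0.5/9) \approx 0.9861\). For \(10\), the distance is \(6\), giving \(0.75 + 0.25 \cdot (1 - 6/9) \approx 0.8333\).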

If you need to manually specify many different compatibility scores for different attribute values, you can use DictBasedPreference which directly uses the provided dictionary.

[6]:

from catfish_sim.compatibility import DictBasedPreference

dict_pref = DictBasedPreference(
    compatibility_dict={"a": 1.25, "b": 1, "c": 0.9, "d": 0.75},
    default_value=1,
    compatibility_weight=1,
)

print(dict_pref)
print(dict_pref.evaluate_attribute("c"))
print(
    dict_pref.evaluate_attribute("e")
)  # e is not included in the dictionary, so the default value is returned.

DictPreference(
        compatibility_dict={'a': 1.25, 'b': 1, 'c': 0.9, 'd': 0.75},
        default_value=1,
        compatibility_weight=1)
0.9
1

The compatibility_weight of a preference is used to calculate the overall compatibility score for a given candidate as the weighted average of all attribute compatibilities. This allows a preference that is more important to the agent to dominate the less important ones.

In our model, preference is modeled as a multiplier that can enhance or diminish the effect of the candidate’s attractiveness. For this reason, full compatibility is considered to have a value of 1.25, while total incompatibility is considered to have a value of 0.75. However, you may want a different calculation and therefore different compatibility values.
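As a rough illustration of the weighted average, here is a minimal sketch over (score, weight) pairs. It is not the library's CompatibilityCalculator, whose details may differ; weighted_average_compatibility is a hypothetical name.

```python
def weighted_average_compatibility(scores_and_weights):
    """Weighted mean of (score, weight) pairs; an illustrative sketch."""
    total_weight = sum(weight for _, weight in scores_and_weights)
    if total_weight == 0:
        raise ValueError("At least one weight must be non-zero.")
    weighted_sum = sum(score * weight for score, weight in scores_and_weights)
    return weighted_sum / total_weight


# A fully compatible attribute and a fully incompatible one.
equal = weighted_average_compatibility([(1.25, 1), (0.75, 1)])
# Same scores, but the compatible attribute matters three times as much.
skewed = weighted_average_compatibility([(1.25, 1.5), (0.75, 0.5)])

print(equal)   # the two scores cancel out
print(skewed)  # the heavier attribute pulls the average up
```

With equal weights the result is \(1.0\); with weights \(1.5\) and \(0.5\) it is \((1.875 + 0.375) / 2 = 1.125\).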

You can write a custom preference class and implement the evaluate_attribute method that takes the candidate value and returns the compatibility score.
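A minimal sketch of such a class follows. This tutorial does not show whether a custom class must subclass the library's base Preference class or which constructor arguments that base class expects, so the example is a standalone, duck-typed class. PeakPreference and all of its parameters except compatibility_weight are hypothetical.

```python
class PeakPreference:
    """Hypothetical preference that peaks at one ideal value."""

    def __init__(self, ideal_value, tolerance,
                 best_score=1.25, worst_score=0.75, compatibility_weight=1):
        if tolerance <= 0:
            raise ValueError("tolerance must be positive.")
        self.ideal_value = ideal_value
        self.tolerance = tolerance
        self.best_score = best_score
        self.worst_score = worst_score
        self.compatibility_weight = compatibility_weight

    def evaluate_attribute(self, value):
        """Return best_score at the ideal value, falling linearly to
        worst_score once the value is `tolerance` or more away."""
        distance = abs(value - self.ideal_value)
        closeness = max(0.0, 1 - distance / self.tolerance)
        return self.worst_score + (self.best_score - self.worst_score) * closeness


age_pref = PeakPreference(ideal_value=30, tolerance=10)
print(age_pref.evaluate_attribute(30))
print(age_pref.evaluate_attribute(35))
print(age_pref.evaluate_attribute(50))
```

Check the documentation for the actual base-class requirements before using a class like this in a simulation.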

For definite deal-breakers, -math.inf can be used as the compatibility score. This is especially useful for preventing impossible recommendations, where the candidate would never like the judging agent. An example case is shown below (additional information is given under matchers):

[7]:

import math

CategoricalPreference(
    preferred_values=["Female"],
    allowed_values=["Male", "Female"],
    preferred_score=1,
    nonpreferred_score=-math.inf,  # Absolutely does not want any non-female candidate.
)

[7]:

CategoricalPreference(
        preferred_values=['Female'],
        preferred_score=1,
        nonpreferred_score=-inf
)

Attribute

Agents can have an arbitrary number of attributes that are used to calculate compatibility. An Attribute object has an attribute name, an attribute value, and a Preference object that is tied to that attribute. For example, a heterosexual male agent who only prefers female candidates can be set to have the following gender attribute:

[8]:

from catfish_sim.compatibility import Attribute

gender_attr = Attribute(
    name="Gender",
    value="Male",
    preference=CategoricalPreference(
        preferred_values=["Female"],
        allowed_values=["Male", "Female"],
        preferred_score=1,
        nonpreferred_score=-math.inf,
        compatibility_weight=1,
    ),
)

print(gender_attr)

Attribute(name=Gender, value=Male, preference=CategoricalPreference(
        preferred_values=['Female'],
        preferred_score=1,
        nonpreferred_score=-inf
))

An important detail is that each agent must have a gender attribute, which affects various things such as how their attributes are sampled and how they derive utility.

Strategy

A strategy object is used by an agent to like or pass a candidate. Currently, the following strategy classes, all of which extend the base Strategy class, exist:
- WeightedMinimal: Likes a candidate if the product of the candidate’s attractiveness and overall (weighted average) compatibility is equal to or greater than the agent’s estimated attractiveness.
- Adventurous: Randomly likes or passes the candidate.
- PhysicalHomophiliac: Likes a candidate if their attractiveness is within the specified range of the agent’s own estimated attractiveness.
- SocialClimber: Likes a candidate whose attractiveness is greater than the agent’s own estimated attractiveness.
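The decision rules described above can be sketched as plain functions. These are not the library's implementations: the function and parameter names are hypothetical, the symmetric range in physical_homophiliac is an assumption, and the 50% like probability in adventurous is an assumption as well.

```python
import random


def weighted_minimal(candidate_attractiveness, compatibility, own_estimate):
    # Like if attractiveness scaled by compatibility reaches the own estimate.
    return candidate_attractiveness * compatibility >= own_estimate


def physical_homophiliac(candidate_attractiveness, own_estimate, max_difference):
    # Like if the candidate is close enough to the own estimate (assumed symmetric).
    return abs(candidate_attractiveness - own_estimate) <= max_difference


def social_climber(candidate_attractiveness, own_estimate):
    # Like only candidates strictly more attractive than the own estimate.
    return candidate_attractiveness > own_estimate


def adventurous(rng, like_probability=0.5):
    # Like at random; the probability is an assumption of this sketch.
    return rng.random() < like_probability


print(weighted_minimal(3.0, 1.25, 3.5))   # 3.75 >= 3.5
print(weighted_minimal(3.0, 0.75, 3.5))   # 2.25 < 3.5
print(weighted_minimal(5.0, float("-inf"), 1.0))  # a deal-breaker never passes
print(adventurous(random.Random(0)) in (True, False))
```

Note how a compatibility of -math.inf makes the product negative infinity, so the WeightedMinimal rule can never like such a candidate.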

You can check the documentation for details and write your own strategy class as well. Note that all strategy classes extend the base Strategy class and implement the following methods:
- is_interested: This method decides whether an agent will like the candidate.
- match_hook: This hook function is called when there is a match.
- new_round_hook: This hook function is called when a new round starts.

The hook functions are optional and can be used by strategies that can make use of additional information.
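To illustrate how the hooks can carry state between decisions, here is a standalone sketch of a strategy that becomes pickier after each match within a round. The argument lists of is_interested, match_hook, and new_round_hook are not shown in this tutorial, so the signatures below are hypothetical, and the class does not extend the real Strategy base class. CoolingOffStrategy is an invented name.

```python
class CoolingOffStrategy:
    """Hypothetical strategy: each match in a round raises the bar."""

    def __init__(self, base_threshold=3.0, step=0.25):
        self.base_threshold = base_threshold
        self.step = step
        self.matches_this_round = 0

    def is_interested(self, candidate_attractiveness):
        # The threshold grows with every match obtained this round.
        threshold = self.base_threshold + self.step * self.matches_this_round
        return candidate_attractiveness >= threshold

    def match_hook(self):
        # Called when the agent gets a match.
        self.matches_this_round += 1

    def new_round_hook(self):
        # Called when a new round starts; the bar is reset.
        self.matches_this_round = 0


strategy = CoolingOffStrategy()
before_match = strategy.is_interested(3.1)
strategy.match_hook()
after_match = strategy.is_interested(3.1)
strategy.new_round_hook()
after_reset = strategy.is_interested(3.1)
print(before_match, after_match, after_reset)
```

Consult the documentation for the real method signatures before porting this idea to a Strategy subclass.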

Let us create a WeightedMinimal strategy object:

[9]:

from catfish_sim.strategies import WeightedMinimal

strat = WeightedMinimal()

Note that these strategies cannot function without an agent object, as agents pass their information to the is_interested method of their strategy object.

Agent

Now that we know how to create preference, attribute, and strategy objects, we can create an Agent object. Each agent represents an online dating user in catfish-sim, and has the following attributes:

reported_attributes: A dictionary of Attribute objects. These attributes can be seen by the matchmaking algorithm and other agents. Reported attributes’ preference objects are reported preferences.
hidden_attributes: A dictionary of Attribute objects. These attributes are only known to the agent itself. They ultimately override the reported ones (for example, when evaluating a candidate or calculating utility). Since hidden attributes’ preferences are also hidden, they can also be used to give the agent hidden preferences for candidate evaluation purposes. Even if all attributes and preferences of the agent are truthfully reported, hidden_attributes must still be provided, as attribute-related matters are handled through hidden attributes, which are guaranteed to be truthful.
like_allowance: Liking budget that limits the number of candidates an agent can like in a round. With each new round, their allowance resets to this value.
strategy: A Strategy object that is used to evaluate a candidate.
compatibility_calculator: A CompatibilityCalculator object that is used to evaluate the weighted average compatibility of a candidate. You will most likely use the default class rather than writing your own.
attractiveness: Average perceived attractiveness of the agent, between 1.0 and 5.0, known to every other agent but the agent itself. If None, this value is sampled based on the agent’s gender attribute in hidden_attributes.
estimated_attractiveness: The self-estimated attractiveness of the agent between 1.0 and 5.0. The agent uses this value to make decisions. If None, this value is sampled based on the agent’s attractiveness attribute.

Let us create our first agent who:
- Is a 30-year-old female with a height of 165 cm.
- Strictly prefers males.
- Prefers candidates who are between 175 and 190 cm tall. This is the most important attribute in their evaluation.
- Prefers candidates who are between 28 and 45 years old. This is the least important attribute in their evaluation.
- Uses the WeightedMinimal strategy.
- Can like 100 agents per round.

[10]:

from catfish_sim.agents import Agent
from catfish_sim.compatibility import CompatibilityCalculator
from catfish_sim.strategies import WeightedMinimal
import copy

attributes = {
    "Gender": Attribute(
        name="Gender",
        value="Female",  # Agent's gender
        preference=CategoricalPreference(
            preferred_values=["Male"],
            allowed_values=["Male", "Female"],
            preferred_score=1,
            nonpreferred_score=-math.inf,
            compatibility_weight=1,
        ),
    ),
    "Age": Attribute(
        name="Age",
        value=30,  # Agent's age
        preference=NumericalPreference(
            preferred_range=[28, 45],
            allowed_range=[18, 100],  # Depends on your/modeled population's range.
            preferred_score=1.25,
            nonpreferred_score=0.75,
            distance_sensitive=True,
            compatibility_weight=0.5,  # Less important.
        ),
    ),
    "Height": Attribute(
        name="Height",
        value=165,  # Agent's height
        preference=NumericalPreference(
            preferred_range=[175, 190],
            allowed_range=[110, 250],  # Depends on your/modeled population's range.
            preferred_score=1.25,
            nonpreferred_score=0.75,
            distance_sensitive=True,
            compatibility_weight=1.5,  # More important.
        ),
    ),
}

an_agent = Agent(
    id=0,
    reported_attributes=attributes,
    hidden_attributes=copy.deepcopy(attributes),  # The agent is truthful.
    like_allowance=100,
    strategy=WeightedMinimal(),
    compatibility_calculator=CompatibilityCalculator(),
    attractiveness=None,  # Automatically sampled based on gender.
    estimated_attractiveness=None,  # Automatically sampled based on attractiveness.
)

Sampling a population

You may want to sample an entire population of agents rather than specifying attributes and preferences by hand. In that case, you can use the provided helper functions, which work based on population data and the methods explained in our study. You can read our paper and the utility functions’ documentation for more information.

Let us create 1000 agents with gender, age, height, and body-mass index (BMI) attributes. You can use the following code block as a starting point and make changes as you like.

[11]:

from catfish_sim import utils


def create_random_agent(agent_id, like_allowance):
    # Based on gender distribution on Tinder.
    gender = utils.get_random_gender()

    # Based on LLCP2022 dataset, we used age group IDs (1: 18-24, 2: 25-29, 3: 30-34,
    # 4: 35-39, 5: 40-44, 6: 45-49, 7: 50-54, 8: 55-59, 9: 60-64, 10: 65-69, 11: 70-74,
    # 12: 75-79, 13: 80+) as ordinal age values.
    age = utils.sample_age_from_sex(gender)
    preferred_age_range = utils.sample_age_preference(gender, age)

    # We rounded height values for easier analysis.
    height = round(utils.sample_height_from_sex_age(gender, age))
    preferred_height_range = utils.get_height_preference(gender, height)

    # Based on LLCP2022 dataset, we used BMI group IDs (1: Underweight, 2: Normal
    # weight, 3: Overweight, 4: Obesity) as ordinal values.
    bmi = utils.sample_bmi_from_sex_age(gender, age)
    preferred_bmi_range = utils.get_bmi_preference(gender, bmi)

    # You can use different compatibility weights based on the gender as follows (or
    # completely randomize it).
    if gender == "Male":
        # Males care more about weight compatibility
        weight_preferred_score = 1.25
        weight_importance = 1.5
        height_preferred_score = 1.25
        height_importance = 1
        age_preferred_score = 1.25
        age_importance = 1
    else:  # Female
        # Females care more about height compatibility
        weight_preferred_score = 1.25
        weight_importance = 1
        height_preferred_score = 1.25
        height_importance = 1.5
        age_preferred_score = 1.25
        age_importance = 1

    reported_attributes = {
        "Gender": Attribute(
            name="Gender",
            value=gender,
            preference=CategoricalPreference(
                preferred_values=[("Female" if gender == "Male" else "Male")],
                allowed_values=["Male", "Female"],
                preferred_score=1,
                nonpreferred_score=-math.inf,
            ),
        ),
        "Age": Attribute(
            name="Age",
            value=age,
            preference=NumericalPreference(
                preferred_range=preferred_age_range,
                allowed_range=utils.LLCP2022_AGE_GROUP_RANGE,  # Based on LLCP2022.
                preferred_score=age_preferred_score,
                nonpreferred_score=0.25,
                distance_sensitive=True,
                compatibility_weight=age_importance,
            ),
        ),
        "Height": Attribute(
            name="Height",
            value=height,
            preference=NumericalPreference(
                preferred_range=preferred_height_range,
                allowed_range=utils.LLCP2022_HEIGHT_RANGE,  # Based on LLCP2022.
                preferred_score=height_preferred_score,
                nonpreferred_score=0.25,
                distance_sensitive=True,
                compatibility_weight=height_importance,
            ),
        ),
        "BMI": Attribute(
            name="BMI",
            value=bmi,
            preference=NumericalPreference(
                preferred_range=preferred_bmi_range,
                allowed_range=utils.LLCP2022_BMI_GROUP_RANGE,  # Based on LLCP2022.
                preferred_score=weight_preferred_score,
                nonpreferred_score=0.25,
                distance_sensitive=True,
                compatibility_weight=weight_importance,
            ),
        ),
    }

    hidden_attributes = copy.deepcopy(reported_attributes)

    agent = Agent(
        id=agent_id,
        reported_attributes=reported_attributes,
        hidden_attributes=hidden_attributes,
        like_allowance=like_allowance,
        strategy=WeightedMinimal(),
        compatibility_calculator=CompatibilityCalculator(),
    )

    return agent


n_agents = 1000
like_allowance = 100
dating_agents = [None] * n_agents

for i in range(n_agents):
    # Note that agent IDs must correspond to their IDs in the agent list.
    agent = create_random_agent(agent_id=i, like_allowance=like_allowance)
    dating_agents[i] = agent

Matchmaking

Matcher

Now that we have our agents ready, we can create the matchmaking system (also referred to as “matcher”) that will recommend agents to each other. There are different kinds of matchers:
- RandomAgentMatcher: Makes random recommendations.
- PreferentialAgentMatcher: Sorts and recommends agents based on their compatibility, which is calculated using reported attributes and preferences.
- RankedAgentMatcher: Uses an Elo-like rating system based on agents being liked/passed by other agents and makes recommendations based on ratings.

Different matchers have different parameters and additional details that are not mentioned here. Please check the documentation for more information.

Let us create a PreferentialAgentMatcher object with the agent population we have just created:

[12]:

from catfish_sim.matchers import PreferentialAgentMatcher

matcher = PreferentialAgentMatcher(
    agents=dating_agents,
    recommendation_limit=200,  # See below for more information.
    compatibility_calculator=CompatibilityCalculator(),
    judger_weight=0.99,  # See below for more information.
    logging=True,  # This logs agent states after each round ends.
    recalculate=False,  # See below for more information.
)

This matcher’s recommendation_limit is set to 200, which means each agent is provided with at most 200 candidates in every round. recommendation_limit should be equal to or greater than the agents’ like_allowance. Otherwise, agents cannot fully use their like budget.

judger_weight is used to calculate the weighted average compatibility between two agents. If it is set to \(1\), the compatibility-based sorting only considers the judging agent who evaluates the candidates. Otherwise, the evaluated candidates’ perspectives are also considered with a weight of 1 - judger_weight. Using a judger_weight value smaller than \(1\) is useful to prevent impossible recommendations where the judged agent is already known to have a deal-breaker that would prevent them from being matched with the judging agent. For example, a reportedly heterosexual male with a -math.inf compatibility value for male candidates would never be shown to a homosexual male agent, although the judging agent (homosexual male) could like the candidate.
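The effect of judger_weight on a deal-breaker can be sketched with a two-sided weighted average. This is an illustration of the idea described above, not the matcher's actual code; mutual_compatibility is a hypothetical name. Zero-weight terms are skipped because 0 * -math.inf evaluates to NaN in floating-point arithmetic.

```python
import math


def mutual_compatibility(judger_score, candidate_score, judger_weight):
    """Blend both perspectives; a sketch of the weighting described above."""
    candidate_weight = 1 - judger_weight
    total = 0.0
    for score, weight in ((judger_score, judger_weight),
                          (candidate_score, candidate_weight)):
        if weight != 0:  # Skip zero weights to avoid 0 * -inf == nan.
            total += score * weight
    return total


# The judging agent finds the candidate compatible (1.2), but the
# candidate reports a deal-breaker (-inf) against the judging agent.
two_sided = mutual_compatibility(1.2, -math.inf, judger_weight=0.99)
one_sided = mutual_compatibility(1.2, -math.inf, judger_weight=1)

print(two_sided)  # the deal-breaker dominates
print(one_sided)  # the candidate's perspective is ignored
```

Even a tiny weight on the candidate's side is enough for the deal-breaker to push the pair to the bottom of the ranking.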

recalculate toggles recalculating preferences and therefore recommendation priorities. It must be set to True if agents can change their attributes or preferences during the simulation. However, this is computationally expensive. If you know when agents change their attributes or preferences, and they do not change them every round, a better approach is to set this to False and call PreferentialAgentMatcher.generate_recommendation_priorities() before the recommendations are made for the affected rounds.
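The scheduled approach can be sketched as follows. To keep the example runnable without the package, StubMatcher is a stand-in that only records calls; with the real library you would pass your PreferentialAgentMatcher instead. Calling generate_recommendation_priorities() without arguments is an assumption, and run_with_scheduled_recalculation and change_rounds are hypothetical names.

```python
class StubMatcher:
    """Stand-in for a matcher; it only records which methods were called."""

    def __init__(self):
        self.calls = []

    def generate_recommendation_priorities(self):
        self.calls.append("recalculate")

    def run_new_round(self):
        self.calls.append("round")


def run_with_scheduled_recalculation(matcher, n_rounds, change_rounds):
    """Recalculate priorities only before rounds in `change_rounds`."""
    for round_number in range(1, n_rounds + 1):
        if round_number in change_rounds:
            # Agents are assumed to have been modified just before this round.
            matcher.generate_recommendation_priorities()
        matcher.run_new_round()


stub = StubMatcher()
run_with_scheduled_recalculation(stub, n_rounds=3, change_rounds={2})
print(stub.calls)
```

Here the priorities are refreshed once, before the second round, instead of in every round.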

Running a simulation

We can now run our simulation in a loop:

[13]:

n_rounds = 10

for i in range(n_rounds):  # You can use the tqdm package here to track the progress.
    matcher.run_new_round()

Once your simulation is complete (or during the simulation), you can retrieve agent objects’ attributes either using your agent list or matcher.agents. Let us retrieve them for an agent:

[14]:

reported_agent = matcher.agents[10]

print("Agent's reported attributes:", reported_agent.reported_attributes)
print("Agent's attractiveness:", reported_agent.attractiveness)
print("Agent's estimated attractiveness:", reported_agent.estimated_attractiveness)
print("Agent's match count:", reported_agent.match_count)
print("Agent's happiness:", reported_agent.happiness)

Agent's reported attributes: {'Gender': Attribute(name=Gender, value=Female, preference=CategoricalPreference(
        preferred_values=['Male'],
        preferred_score=1,
        nonpreferred_score=-inf
)), 'Age': Attribute(name=Age, value=5, preference=NumericalPreference(
        preferred_range=[4, 6],
        preferred_score=1.25,
        nonpreferred_score=0.25,
        distance_sensitive=True,
        compatibility_weight=1,
        compatibility_fn=<function Preference.__init__.<locals>.<lambda> at 0x000001E0B5677BE0>
)), 'Height': Attribute(name=Height, value=167, preference=NumericalPreference(
        preferred_range=[172, 217],
        preferred_score=1.25,
        nonpreferred_score=0.25,
        distance_sensitive=True,
        compatibility_weight=1.5,
        compatibility_fn=<function Preference.__init__.<locals>.<lambda> at 0x000001E0B5677910>
)), 'BMI': Attribute(name=BMI, value=3, preference=NumericalPreference(
        preferred_range=[2, 4],
        preferred_score=1.25,
        nonpreferred_score=0.25,
        distance_sensitive=True,
        compatibility_weight=1,
        compatibility_fn=<function Preference.__init__.<locals>.<lambda> at 0x000001E0B56779A0>
))}
Agent's attractiveness: 2.9165048462066303
Agent's estimated attractiveness: 3.315649708042419
Agent's match count: 18
Agent's happiness: 64.89399280009835

You can put all agents’ attributes and results into a pandas DataFrame for analysis. Also, if logging was enabled before the simulation, it is possible to retrieve the past states of an agent as follows:
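One way to prepare such a table is to flatten each agent into a dictionary first. The sketch below uses stand-in objects so that it runs without the package. It assumes that agents expose their ID as agent.id and that Attribute objects expose their value as attribute.value, neither of which this tutorial confirms. agent_to_row is a hypothetical helper.

```python
from types import SimpleNamespace


def agent_to_row(agent):
    """Flatten one agent into a dict of plain values (one table row)."""
    row = {
        "id": agent.id,  # Assumed attribute name.
        "attractiveness": agent.attractiveness,
        "estimated_attractiveness": agent.estimated_attractiveness,
        "match_count": agent.match_count,
        "happiness": agent.happiness,
    }
    # One column per reported attribute, e.g. "reported_Age".
    for name, attribute in agent.reported_attributes.items():
        row[f"reported_{name}"] = attribute.value  # Assumed attribute name.
    return row


# Stand-in mimicking the agent printed above (values rounded).
demo_agent = SimpleNamespace(
    id=10,
    attractiveness=2.92,
    estimated_attractiveness=3.32,
    match_count=18,
    happiness=64.89,
    reported_attributes={
        "Gender": SimpleNamespace(value="Female"),
        "Age": SimpleNamespace(value=5),
        "Height": SimpleNamespace(value=167),
        "BMI": SimpleNamespace(value=3),
    },
)

rows = [agent_to_row(agent) for agent in [demo_agent]]
print(rows[0])
```

A list of dictionaries like rows can be passed directly to pandas.DataFrame(rows); with the real simulation you would iterate over matcher.agents instead of the stand-in list.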

[15]:

# Retrieves all rounds and variables (match_count and happiness):
print(reported_agent.get_logs())

# This retrieves the happiness value for the third round's ending. Log ID is 1-based and
# indicates the round number.
print(reported_agent.get_logs(log_id=3, variables=["happiness"]))

{'match_count': {1: 13, 2: 15, 3: 17, 4: 18, 5: 18, 6: 18, 7: 18, 8: 18, 9: 18, 10: 18}, 'happiness': {1: 47.582190551992035, 2: 54.635216687050644, 3: 61.53679236021338, 4: 64.89399280009835, 5: 64.89399280009835, 6: 64.89399280009835, 7: 64.89399280009835, 8: 64.89399280009835, 9: 64.89399280009835, 10: 64.89399280009835}}
{'happiness': 61.53679236021338}
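Because the logs are plain dictionaries keyed by round number, they are easy to post-process. As a small example, the sketch below takes the match_count log printed above and derives how many new matches the agent obtained in each round; per_round_change is a hypothetical helper.

```python
# match_count log copied from the output above
# (round number -> cumulative match count).
match_count_log = {1: 13, 2: 15, 3: 17, 4: 18, 5: 18,
                   6: 18, 7: 18, 8: 18, 9: 18, 10: 18}


def per_round_change(series):
    """Difference between each round and the previous one."""
    rounds = sorted(series)
    return {
        current: series[current] - series[previous]
        for previous, current in zip(rounds, rounds[1:])
    }


new_matches = per_round_change(match_count_log)
last_active_round = max(r for r, gained in new_matches.items() if gained > 0)

print(new_matches)
print("Last round with a new match:", last_active_round)
```

For this agent, the last new match arrives in round 4, which is also where the happiness value stops changing in the log.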

Behind the curtain

When you run a simulation round as shown above using PreferentialAgentMatcher.run_new_round(), the most important operations that take place in the background are as follows:

PreferentialAgentMatcher increments its round counter and prepares itself for a new round.
If recalculation is enabled, PreferentialAgentMatcher recalculates the compatibilities and therefore recommendation priorities for agents.
For each Agent object in the matcher:
- Agent is informed about the new round.
- Fresh candidates that were not previously evaluated by the agent are identified.
- These new candidates’ public details (ID, attractiveness, reported attributes) are provided to the agent.
- Agent uses their strategy object to like or pass each candidate in the order they were provided. These likes and passes are recorded.
Round likes and passes are processed. New reciprocal likes are detected (it is possible for an agent to like another agent and get liked many rounds later) and the agents are informed.
If logging is enabled, all agents’ states are logged.
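The reciprocal-like detection step can be sketched with sets of (liker, liked) pairs. This illustrates the idea only and is not the matcher's implementation; find_new_matches and its arguments are hypothetical.

```python
def find_new_matches(earlier_likes, round_likes):
    """Return pairs that became mutual because of this round's likes.

    Both arguments are sets of (liker_id, liked_id) tuples.
    """
    all_likes = earlier_likes | round_likes
    matches = set()
    for liker, liked in round_likes:
        # A match needs the reverse like, from this round or an earlier one.
        if (liked, liker) in all_likes:
            matches.add(frozenset((liker, liked)))
    return matches


earlier = {(1, 2)}                           # Agent 1 liked agent 2 in a past round.
this_round = {(2, 1), (3, 4), (5, 6), (6, 5)}

matches = find_new_matches(earlier, this_round)
print(sorted(sorted(pair) for pair in matches))
```

Agents 1 and 2 match although their likes are rounds apart, agents 5 and 6 match within the same round, and the like from agent 3 to agent 4 stays unanswered.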