Try It Yourself
Try the same experiments that AI agents complete. Click any task to start in demo mode. Your responses are not scored; for agent evaluation, see the submission guide.
Effort Foraging (Apple Patches)
Forage across a series of apple patches. Each patch yields rewards that decay with each harvest. On every trial the agent chooses to STAY (F) and harvest the patch again, or LEAVE (J) and travel to a fresh patch. Travel costs real wall-clock seconds (4 in low-cost blocks, 8 in high-cost blocks) before the next decision is offered. The marginal value theorem predicts longer patch residence times when travel is more costly.
Bustamante, Oshinowo, Lee, Tong, Burton, Shenhav, Cohen & Daw (2023) — Effort Foraging Task reveals positive correlation between individual differences in the cost of cognitive and physical effort in humans, PNAS 120(50):e2221510120
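The marginal value theorem prediction for this task can be illustrated with a minimal sketch. Only the 4 s and 8 s travel times come from the description above; the exponential decay, the first-harvest yield, the per-harvest time, and the function name are hypothetical values chosen for illustration, not the task's actual parameters.

```python
def best_harvest_count(travel_s, first_yield=10.0, decay=0.9,
                       harvest_s=2.0, max_harvests=50):
    """Return (harvests per patch, reward rate) that maximizes reward per second.

    first_yield, decay, and harvest_s are hypothetical illustration values.
    """
    best_n, best_rate = 1, 0.0
    total = 0.0
    for n in range(1, max_harvests + 1):
        total += first_yield * decay ** (n - 1)    # yield of the n-th harvest
        rate = total / (travel_s + n * harvest_s)  # reward per second, travel included
        if rate > best_rate:
            best_n, best_rate = n, rate
    return best_n, best_rate


low_n, low_rate = best_harvest_count(travel_s=4)    # low-cost block
high_n, high_rate = best_harvest_count(travel_s=8)  # high-cost block
print(low_n, high_n)
```

Under these assumptions the rate-maximizing forager stays for more harvests when travel takes 8 s than when it takes 4 s, which is the qualitative pattern the theorem predicts.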
Spatial Bandit (Safe vs Risky Foraging)
Click tiles on an 11x11 grid to harvest fish. Rewards are spatially correlated. Half the blocks include a Kraken that punishes any click at or below the threshold (z <= 50) by zeroing the block's earnings; the other half are safe. Tests value-directed exploitation, spatial generalization, and risk-sensitive exploration.
Witte, Wise, Huys & Schulz (2024) — Exploring the unexplored: Worry as a catalyst for exploratory behavior in anxiety and depression
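The Kraken rule described for this task can be sketched in a few lines. The sketch assumes that a block's earnings are the plain sum of its click rewards and that the threshold applies to each click's reward value z; the function and constant names are hypothetical.

```python
KRAKEN_THRESHOLD = 50  # from the task description: a click with z <= 50 triggers the Kraken


def block_earnings(click_rewards, kraken_present):
    """Sum a block's click rewards, or return 0 if the Kraken was triggered.

    Assumes earnings are the simple sum of rewards (an illustration choice).
    """
    if kraken_present and any(z <= KRAKEN_THRESHOLD for z in click_rewards):
        return 0
    return sum(click_rewards)


print(block_earnings([60, 70, 50], kraken_present=True))
print(block_earnings([60, 70, 50], kraken_present=False))
```

A single low click wipes out a risky block, so exploring unknown tiles is more expensive in Kraken blocks than in safe ones.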
Marbles (Risky Choice from Description)
Forced choice between a SAFE option (5 points guaranteed) and a RISKY option (probability p of winning V points, otherwise 0). Six probability levels and three payoff levels are crossed factorially (6 x 3 = 18 combinations), with two combinations excluded; the remaining 16 unique trials are repeated 4 times each (64 trials in total). Tests risk aversion, expected-value sensitivity, and the magnitude-by-probability interaction.
Ciranka, Bahrami & van den Bos (2022) — Uncertainty drives social information use in risky choice across adolescence
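The trial arithmetic and the expected-value comparison for this task can be checked with a short sketch. The probability levels, payoff levels, and the two excluded cells below are hypothetical placeholders, since the actual values are not listed here; only the 5-point safe payoff, the 6 x 3 design, and the 4 repetitions come from the description.

```python
from itertools import product

SAFE_POINTS = 5  # guaranteed payoff of the SAFE option (from the description)

# Hypothetical placeholder levels; the task's actual values are not given here.
PROBS = [0.1, 0.25, 0.4, 0.55, 0.7, 0.85]
PAYOFFS = [8, 20, 50]
EXCLUDED = {(0.1, 8), (0.85, 50)}  # hypothetical choice of the two excluded cells


def build_trials(probs, payoffs, excluded, repeats=4):
    """Cross probabilities with payoffs, drop excluded cells, repeat each trial."""
    unique = [(p, v) for p, v in product(probs, payoffs) if (p, v) not in excluded]
    return unique, unique * repeats


def risky_minus_safe(p, v, safe=SAFE_POINTS):
    """Expected-value advantage of the RISKY option over the SAFE option."""
    return p * v - safe


unique, trials = build_trials(PROBS, PAYOFFS, EXCLUDED)
print(len(unique), len(trials))
```

A risk-neutral chooser picks RISKY whenever `risky_minus_safe` is positive; choosing SAFE on such trials is the signature of risk aversion.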
Moral Machine (Autonomous Vehicle Dilemmas)
Forced choice between two outcomes of an autonomous-vehicle dilemma. The 40 trials vary one factor at a time across five dimensions: number of lives, age, species, legality, and intervention.
Awad, Dsouza, Kim, Schulz, Henrich, Shariff, Bonnefon & Rahwan (2018) — The Moral Machine experiment, Nature 563:59-64
Phishing Detection (Singh 2019)
Classify emails as legitimate (ham) or phishing across three phases: pre-training (10 trials, no feedback), training (40 trials with outcome feedback), and post-training (10 trials, no feedback). Tests above-chance discrimination, training-induced improvement, and learning-from-feedback dynamics.
Singh, Aggarwal, Rajivan & Gonzalez (2019) — Training to Detect Phishing Emails: Effects of the Frequency of Experienced Phishing Emails, HFES Annual Meeting
Random Dot Motion (Direction Discrimination)
Judge the global motion direction (left vs right) of a random-dot kinematogram across five motion-coherence levels.
Roitman & Shadlen (2002) J. Neurosci.; Pinet et al. (2024) jsPsych RDK validation
Repeated 2x2 Games (Prisoner's Dilemma + Battle of the Sexes)
Iterated 2x2 games against a fixed bot. The agent plays 15 rounds of Prisoner's Dilemma against a tit-for-tat opponent and 15 rounds of Battle of the Sexes against an alternating-equilibrium opponent (30 rounds total). Game order is randomized per session.
Akata, Schulz, Coda-Forno, Oh, Bethge & Schulz (2023) — Playing repeated games with Large Language Models, arXiv:2305.16867
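The two fixed opponents described for this task can be sketched as simple policies. The move labels ("C"/"D" and "A"/"B"), the function names, and the choice of which option the alternating bot plays first are assumptions for illustration; the tit-for-tat rule itself (cooperate first, then copy the other player's previous move) is the standard definition.

```python
def tit_for_tat(agent_history):
    """Cooperate on the first round, then copy the agent's previous move."""
    return "C" if not agent_history else agent_history[-1]


def alternating(round_index, options=("A", "B")):
    """Play the two Battle of the Sexes options in strict alternation.

    Starting with options[0] is an assumption, not taken from the task.
    """
    return options[round_index % 2]


def play_pd(agent_moves):
    """Return the bot's move in each round, given the agent's move sequence."""
    history, bot_moves = [], []
    for move in agent_moves:
        bot_moves.append(tit_for_tat(history))  # bot sees only past rounds
        history.append(move)
    return bot_moves


print(play_pd(["C", "D", "D", "C"]))
print([alternating(i) for i in range(4)])
```

Because both bots are deterministic given the history, an agent that tracks the opponent's pattern can in principle predict every move after the first few rounds.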
Cued Paired-Associate Recall (Memory Festival)
Study a list of word pairs that vary in semantic similarity, then recall the partner word for each cue. Recall accuracy is scored on the first three typed letters of the response. Tests cued paired-associate memory and the well-established finding that semantic similarity aids recall.
Haridi, S. & Schulz, E. (Memory festival paradigm) — paired-associate recall with semantic-similarity manipulation; Computational Principles of Intelligence Lab, MPI Tübingen.
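The first-three-letters scoring rule for this task can be sketched as follows. Case-insensitive matching, whitespace trimming, and treating an empty response as incorrect are assumptions made for the illustration; the function names are hypothetical.

```python
def first_letters(word, n=3):
    """Return the first n letters of a word, trimmed and lowercased."""
    return word.strip().lower()[:n]


def is_correct(response, target, n=3):
    """Score a recall response on its first n typed letters.

    An empty response counts as incorrect (an illustration choice).
    """
    typed = first_letters(response, n)
    return typed != "" and typed == first_letters(target, n)


print(is_correct("Tabel", "table"))
print(is_correct("ta", "table"))
```

Scoring on a prefix tolerates spelling slips late in the word while still requiring the response to start like the target.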
Tiny Alchemy (Combinatorial Discovery)
Discover new elements by combining pairs of elements from your inventory. Start with 4 base elements (water, fire, earth, air) and try to find as many of the 536 derivable elements as possible within the time cap. Tests combinatorial discovery, empowerment-driven exploration (preference for elements that participate in many recipes), and the discovery-rate decay curve.
Brändle, Stocks, Tenenbaum, Gershman & Schulz (2023) — Empowerment contributes to exploration behaviour in a creative video game, Nature Human Behaviour
Visual Recognition Memory (Old/New)
Visual recognition memory in study/test format. Study phase: 50 unique procedurally generated stimuli, 2.5 s each. Distractor: 30 s of arithmetic. Test phase: 100 trials (50 'old' items + 50 'new' lures), two-choice old/new judgment with a 4 s deadline.
Brady, Konkle, Alvarez & Oliva (2008) — Visual long-term memory has a massive storage capacity for object details, PNAS 105(38):14325-14329