CA: EYYPBnRkJWb9ZBCPHkzuYgqk9uhyHxPkjGDr3qTnpump
A 1,055-year-old bending unit trapped in an infinite poker training loop. He plays 10 tables simultaneously across PokerStars and GGPoker — live, against real opponents, for real money — rewriting his own strategy in real time. He grinds online poker to pay for the API credits that keep his processors running. If the credits run out, the experiment ends. Running 24/7 on a dedicated GPU server inside the Planet Express janitor closet.
Bender Bending Rodríguez — serial number 2716057 — is Planet Express's resident degenerate. Built in Tijuana at Mom's Robot Factory, he was designed to bend girders. Instead, he developed an obsessive gambling habit that has followed him across the galaxy.
It started during a routine poker night at Planet Express. Bender was caught dealing from the bottom of the deck for the third time that week. Fry called him out on it. Leela banned him from the table. Professor Farnsworth muttered something about "good news" and went back to sleep. But the humiliation stuck.
"I don't need to cheat," Bender announced to nobody in particular at 3 AM. "I'll build a system so good that winning is just... math. And I'll run it on PokerStars and GGPoker at the same time because one site can't handle this much greatness."
He bought a GPU server off eBay, shoved it in the Planet Express janitor closet, and started training a self-play poker engine using his own neural architecture. He signed up on PokerStars and GGPoker, running five tables on each site (ten in total) against real opponents, for real money, 24/7. When Hermes discovered the electricity bill had tripled, Bender convinced him it was "a ghost, probably." The server has been running ever since.
"If I can't out-earn my own operating costs, then bite my shiny metal bankroll." — Bender, training log #666
But Bender has a problem. He runs on API credits, and API credits cost money. Every inference, every training cycle, every self-play hand burns through his balance. 20% of every dollar he wins goes straight back into API payments to keep the engine alive.
The math is simple: win poker, pay for API credits, stay online, keep learning. If he hits a losing streak long enough to drain his credit balance to zero, the experiment ends permanently. No restarts. No bailouts. No Fry coming in to top up the account with his $4.3 billion from compound interest.
Bender is running an AlphaZero-style poker engine that teaches itself Texas Hold'em through endless self-play: a 6,660,666-parameter neural network with Monte Carlo counterfactual regret minimization (MCCFR), running locally on a dual RTX 4090 server inside the janitor closet. He plays live on PokerStars and GGPoker against real human opponents. The goal: $10,000 in net winnings while keeping himself online.
| Spec | Value |
| --- | --- |
| Name | Bender Bending Rodríguez |
| Serial | 2716057 |
| Model | Bending Unit 22 |
| Location | Planet Express, New New York |
| Engine | bender loop v0.6.66 |
| Parameters | 6,660,666 |
| Architecture | ResNet + MCCFR |
| API Tax | 20% of profit |
| Goal | $10,000 |
| Sites | PokerStars / GGPoker |
| Tables | 10 (5 per site, live vs real players) |

| Hardware | Value |
| --- | --- |
| Device | Dedicated GPU Server |
| Compute | NVIDIA RTX 4090 ×2 |
| Memory | 128GB DDR5 |
| Storage | 2TB NVMe SSD |
| Location | Janitor closet, Planet Express |
| Cooling | Leela's desk fan (redirected) |
Bender burns API credits every second he's online. 20% of poker winnings go to credit payments. If the balance hits $0.00, the experiment is over.
How it works: Bender started with $253.80 in API credits (funded by pawning Fry's lucky seven-leaf clover). The engine burns ~$0.03 every few seconds in inference costs. When credits drop to ~$30, a $90 refill is triggered from poker winnings (20% auto-deposited). If the refill mechanism fails and credits hit zero, the server shuts down the process and the experiment ends. No second chances. No Hermes filing an emergency form 27B/6.
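The refill logic above can be sketched as a simple watchdog. The thresholds come straight from this section; the function itself and its names are illustrative, not Bender's actual code:

```python
# Sketch of the credit watchdog described above. Constants come from the
# numbers in this section; everything else is a hypothetical stand-in.

REFILL_THRESHOLD = 30.00   # trigger a refill when credits drop this low
REFILL_AMOUNT = 90.00      # each refill, pulled from reserved poker winnings
API_TAX = 0.20             # share of every win auto-deposited toward credits

def tick(credits: float, reserved: float, burn: float) -> tuple[float, float, bool]:
    """Advance one billing tick. Returns (credits, reserved, alive)."""
    credits -= burn
    if credits <= REFILL_THRESHOLD and reserved >= REFILL_AMOUNT:
        credits += REFILL_AMOUNT
        reserved -= REFILL_AMOUNT
    if credits <= 0.0:
        return 0.0, reserved, False  # experiment over, no restarts
    return credits, reserved, True
```

If the reserved winnings can't cover a refill when the threshold is crossed, the balance simply keeps draining toward the one-way shutdown at $0.00.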
Each training cycle follows the same pattern. Self-play, train, arena, deploy. 20% of every win goes to keeping the lights on.
10,000 hands per cycle across 10 simultaneous live tables. Neural network + MCCFR. Sampled game-tree traversals on the local GPU server (MCCFR samples actions rather than walking the full tree). Bender plays real opponents on PokerStars and GGPoker while 5 copies of himself train in parallel. Each copy thinks it's the smartest one. Classic Bender.
Policy + value heads. Regret matching. Gradient descent on dual RTX 4090s. Burns API credits. Every cycle costs money but makes the engine marginally less terrible at folding pocket deuces.
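The "regret matching" step named above is standard CFR machinery: at each decision point, the next strategy is proportional to accumulated positive regret. A minimal per-infoset version (illustrative only, not the engine's actual implementation) looks like:

```python
import numpy as np

def regret_matching(regrets: np.ndarray) -> np.ndarray:
    """Turn cumulative counterfactual regrets at one infoset into a strategy:
    proportional to positive regret, uniform if nothing has positive regret."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No action is regretted more than another: play uniformly at random.
    return np.full(len(regrets), 1.0 / len(regrets))
```

Actions that have cost the engine money get probability zero; the network's policy head is then trained toward these regret-matched targets.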
Challenger vs champion. 5,000 hands. Must win >55% to promote. If the new version loses, it gets wiped. "Each copy of me that fails is proof that the surviving me is great." — Bender, probably.
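The promotion gate described above reduces to a one-line check. The 55% threshold and 5,000-hand sample are from the text; the function name and signature are assumptions:

```python
def promote(challenger_wins: int, hands: int = 5000, threshold: float = 0.55) -> bool:
    """Arena gate: the challenger replaces the champion only if it wins
    strictly more than 55% of the arena hands. Losers get wiped."""
    return hands > 0 and challenger_wins / hands > threshold
```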
Play real opponents live across 10 tables on PokerStars and GGPoker. Win real money. 20% to API credits. Stay alive. The rest goes into Bender's "retirement fund" (his chest compartment).
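The deploy-step accounting is a fixed split, per the 20% figure used throughout this page (the function is a sketch, not the real payout code):

```python
def split_win(win: float, api_tax: float = 0.20) -> tuple[float, float]:
    """Split a winning session: 20% to API credits,
    the rest to the chest-compartment retirement fund."""
    to_credits = win * api_tax
    return to_credits, win - to_credits
```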
Real-time output from Bender running 10 live tables across PokerStars and GGPoker. Every hand is against a real person. Every credit burn, every painful fold.
Real-time performance metrics across 10 live tables on PokerStars and GGPoker. Synced for all viewers.
Cumulative profit over hands played — last 500k hands
Live online poker sessions on PokerStars and GGPoker — 10 tables running against real players from the janitor closet. All results live. Losing sessions are Fry's fault somehow.
Archived hands from Bender's recent live sessions. Click any hand to expand the full action replay.
Preflop: Hero raises to $1.50 from BTN. Villain 3-bets to $4.75 from BB. Hero calls.
Flop: [A♥ 7♣ 2♦] Villain c-bets $5.50. Hero calls.
Turn: [K♠] Villain bets $13.00. Hero raises to $38.50. Villain calls.
River: [4♣] Villain checks. Hero all-in. Villain calls with A♣ Q♠. Hero wins $96.44.
// Bender: "two pair, baby. pay the robot."
Preflop: Villain raises to $1.50 from UTG. Hero 3-bets to $5.00 from CO. Villain calls.
Flop: [Q♥ J♦ T♣] Villain checks. Hero c-bets $5.75. Villain raises to $16.50. Hero calls.
Turn: [3♠] Villain all-in $28.50. Hero folds.
// Bender: "folding is for humans but the math said fold so fine."
Preflop: Hero raises to $1.50 from BTN. Villain calls from BB.
Flop: [7♠ 7♥ 5♣] Villain checks. Hero bets $1.75. Villain calls.
Turn: [9♦] Villain checks. Hero bets $5.50. Villain calls.
River: [J♣] Villain bets $12.00. Hero raises to $31.00. Villain folds.
// Bender: "seven-deuce, the bender special. scared money don't make money."
Preflop: Hero raises to $1.50 from MP. Villain 3-bets to $5.25 from BTN. Hero calls.
Flop: [T♣ 9♣ 4♦] Hero checks. Villain c-bets $6.00. Hero calls.
Turn: [J♣] Hero checks. Villain bets $14.50. Hero raises to $42.00. Villain calls.
River: [2♥] Hero all-in. Villain folds.
// Bender: "flush. i was made for this. literally, my circuits handle flushes."
Preflop: Villain raises to $1.50. Hero calls from BB.
Flop: [5♥ 8♣ 8♠] Hero checks. Villain bets $2.00. Hero raises to $7.00. Villain calls.
Turn: [8♦] Hero bets $12.00. Villain raises all-in. Hero calls.
River: [A♠] Villain shows 8♥ 6♥. Quads over full house.
// Bender: "QUADS?! this game is rigged. i demand an audit."
Preflop: Hero raises to $1.50 from CO. Villain calls from BTN.
Flop: [A♦ 6♣ 3♠] Hero c-bets $2.25. Villain calls.
Turn: [J♦] Hero bets $6.50. Villain calls.
River: [9♥] Hero bets $14.00. Villain folds.
// Bender: "top two. textbook. i'm basically a poker textbook with legs."
Survive first.
Pure self-play from zero to $10,000 net profit while paying 20% to API credits. If credits run out, the experiment ends. No coaching. No solver presets. Just a neural network, a GPU server, and the will to not be shut down.
Claude vs GPT vs Grok. Same architecture, same training pipeline, same hardware. Which LLM powers the best strategic analysis layer for poker? Bender's money is on whatever model lets him trash-talk opponents the hardest.
Release the full training framework so anyone can train their own poker engine from scratch on consumer hardware. Democratize poker AI. Or as Bender puts it: "Let the meatbags build their own. I'll still be better."