Pointer

Issue #700

Essential Reading For Engineering Leaders

March 20th, 2026

Friday 20th March issue is presented by WorkOS

How To Test AI Agents That Never Produce the Same Output Twice

Same input. Same prompt. Different output. That's the reality of testing AI agents that write code, and most teams are shipping without solving it.

Nick Nisi from WorkOS tackled this by building eval systems for two AI tools:

npx workos, a CLI agent that installs AuthKit into your project
WorkOS agent skills that power LLM responses about SSO, directory sync, and RBAC

The post covers how to test against real project structures, score output that's different every time, and catch when your agent makes up methods that don't exist.
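As a rough illustration of that last check (not taken from the WorkOS post), one way to flag made-up methods is to parse the generated code and compare every method called on a client object against an allowlist of known names. The sketch below uses Python for brevity; the receiver name `client`, the allowlist contents, and the helper `find_unknown_methods` are all hypothetical.

```python
import ast

# Hypothetical allowlist: method names the real SDK is known to expose.
KNOWN_METHODS = {"get_user", "list_users", "create_organization"}


def find_unknown_methods(source: str, receiver: str = "client") -> set[str]:
    """Return method names called on `receiver` that are not in KNOWN_METHODS."""
    unknown = set()
    for node in ast.walk(ast.parse(source)):
        # Match calls shaped like `client.some_method(...)`.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == receiver
            and node.func.attr not in KNOWN_METHODS
        ):
            unknown.add(node.func.attr)
    return unknown


# Hypothetical agent output containing one invented method.
generated = """
user = client.get_user("u_1")
client.sync_everything_now()
"""
print(find_unknown_methods(generated))  # {'sync_everything_now'}
```

Because the check looks only at which names are called, it gives the same verdict even when the surrounding code differs from run to run.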

Learn More About Evals →

Technical Leaders Make These 4 Common Storytelling Mistakes

— Wes Kao

tl;dr: (1) Over-reliance on technical details: Real life is non-linear, but stories are linear. (2) Trying to remember too many tactics: Don’t try to remember a list of storytelling tips and strategies. (3) Too much backstory: Start right before you almost get eaten by a bear. (4) Trying to tell a story that’s too long.

Leadership Management

Exploit vs Explore

— Mike Fisher

tl;dr: “For leaders, this maps uncomfortably well to the way teams behave under pressure. When metrics are strong and customers are happy, exploration often feels like a luxury. When things are going poorly, it feels irresponsible. In both cases, the instinct is to exploit harder, to optimize the known, to squeeze more value out of the current system.”

Leadership Management

WorkOS FGA: The Authorization Layer For AI Agents

— Aaron Tainter, Pavan Kulkarni

tl;dr: Authentication proves an agent's identity. Authorization defines its blast radius. Most agents today inherit a user's full access token, turning a helpful assistant into a confused deputy that can leak production secrets to a shared Slack channel. This post digs into why that happens and how WorkOS FGA solves this by scoping the blast radius with resource-level permissions.

Promoted by WorkOS

Security Agents

The Future Of Software Engineering With Anthropic

— Akash Bajwa

tl;dr: A roundtable with Anthropic’s Ash Prabaker and engineering leaders from Stripe, NVIDIA, Microsoft, Google DeepMind, xAI, Apple, and Scale AI, along with Peter Steinberger, explored how AI is reshaping software engineering: workflows are shifting toward eval-driven development and agent-led coding, and new bottlenecks are emerging in long-horizon tasks, context, and regulation.

Leadership Management

“Keep your fears to yourself, but share your courage with others.”

― Edsger W. Dijkstra

Measuring Agents In Production

— Murat Demirbas

tl;dr: “This 2025 December paper, "Measuring Agents in Production", cuts through the reality behind the hype. It surveys 306 practitioners and conducts 20 in-depth case studies across 26 domains to document what is actually running in live environments. The reality is far more basic, constrained, and human-dependent.”

Productivity Agents

Scaling Postgres Connections With PgBouncer

— Ben Dicken

tl;dr: The Postgres process-per-connection model breaks down at scale. PgBouncer fixes this, but tuning it well is the hard part. This deep dive covers the three pooling modes, how to size your connection chain from max_client_conn down to max_connections, and real tuning examples for small, large, and single-tenant setups. Everything you need to configure PgBouncer with confidence.
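To make the sizing idea concrete, here is a back-of-the-envelope sketch in Python; it is not taken from the linked post. It assumes PgBouncer may open up to `default_pool_size` plus `reserve_pool_size` server connections per database/user pair, and checks that this worst case fits under Postgres `max_connections` minus `superuser_reserved_connections`. The function names and sample numbers are hypothetical.

```python
def worst_case_server_conns(pools: int, default_pool_size: int,
                            reserve_pool_size: int = 0) -> int:
    """Upper bound on server connections PgBouncer may open.

    `pools` is the number of distinct database/user pairs (an assumption
    about the workload, not a PgBouncer setting).
    """
    return pools * (default_pool_size + reserve_pool_size)


def fits(pools: int, default_pool_size: int, reserve_pool_size: int,
         max_connections: int, superuser_reserved_connections: int = 3) -> bool:
    """True if the worst case leaves the reserved superuser slots free."""
    budget = max_connections - superuser_reserved_connections
    return worst_case_server_conns(
        pools, default_pool_size, reserve_pool_size
    ) <= budget


# Sample numbers, chosen only for illustration.
print(worst_case_server_conns(pools=4, default_pool_size=20, reserve_pool_size=5))  # 100
print(fits(4, 20, 5, max_connections=100))  # False: only 97 slots are usable
print(fits(4, 20, 5, max_connections=200))  # True
```

`max_client_conn` sits on the other side of the chain: it caps how many clients may connect to PgBouncer, and in transaction pooling mode it can be far larger than the server-side total computed here.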

Promoted by PlanetScale

PostgreSQL

The PERFECT Code Review: How to Reduce Cognitive Load While Improving Quality

— Daniil Bastrich

tl;dr: “I’ve distilled a healthy, sustainable review process into an acronym: PERFECT. It prioritizes what truly matters - from business logic and edge cases to reliability and readability - while keeping subjective opinions in check. Here is how you can apply these principles to bring structure, clarity, and consistency to your code reviews.”

CodeReview

Rob Pike's 5 Rules Of Programming

— Rob Pike

tl;dr: (1) You can't tell where a program is going to spend its time. (2) Measure before optimizing. (3) Fancy algorithms are slow when n is small, and n is usually small. (4) Prefer simple algorithms and data structures. (5) Data dominates.

BestPractices

Lessons From Building Claude Code: How We Use Skills

— Thariq Shihipar

tl;dr: “A common misconception we hear about skills is that they are “just markdown files”, but the most interesting part of skills is that they’re not just text files. They’re folders that can include scripts, assets, data, etc. that the agent can discover, explore and manipulate. In Claude Code, skills also have a wide variety of configuration options including registering dynamic hooks.”

Agents

Null Pointer

Always Accept

Hand-drawn by Manu. Got an idea for a cartoon? Click reply and let us know

Most Popular From Last Issue

Slow Down To Speed Up - James Stanier

Notable Links

Claude HUD: Claude Code plugin that shows what's happening.

Cookbooks: Notebooks & recipes showcasing ways of using Claude.

Homarr: Modern and easy-to-use dashboard.

Impeccable: Design fluency for AI harnesses.

OpenDataLoader PDF: PDF Parser for AI-ready data.
