Issue #700

Essential Reading For Engineering Leaders

Friday 20th March issue is presented by WorkOS

Same input. Same prompt. Different output. That's the reality of testing AI agents that write code, and most teams are shipping without solving it.

Nick Nisi from WorkOS tackled this by building eval systems for two AI tools:

The post covers how to test against real project structures, score output that's different every time, and catch when your agent makes up methods that don't exist.
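A minimal sketch of the last two ideas, assuming Python output and our own approach rather than the post's. It parses each generated sample, collects calls made on a known module, and flags any attribute the real module lacks. Because output changes from run to run, it scores a pass rate over several samples instead of a single pass/fail. The helper names and the `samples` list are hypothetical.

```python
import ast
import json  # the "real" API surface we check generated calls against


def called_attributes(source: str, module_name: str) -> set[str]:
    """Attribute names called on `module_name`, e.g. json.dumps(...) -> {"dumps"}."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == module_name
        ):
            names.add(node.func.attr)
    return names


def hallucinated_calls(source: str, module) -> set[str]:
    """Calls on `module` that do not exist on the real module object."""
    return {
        name
        for name in called_attributes(source, module.__name__)
        if not hasattr(module, name)
    }


def pass_rate(samples: list[str], module) -> float:
    """Fraction of samples that parse and only call methods that really exist."""
    passed = 0
    for source in samples:
        try:
            if not hallucinated_calls(source, module):
                passed += 1
        except SyntaxError:
            pass  # unparseable output counts as a failure
    return passed / len(samples)


# Hypothetical agent outputs for the same prompt, run three times.
samples = [
    "import json\nprint(json.dumps({'a': 1}))",
    "import json\ndata = json.loads('{}')",
    "import json\nprint(json.to_string({'a': 1}))",  # json.to_string does not exist
]

print(hallucinated_calls(samples[2], json))  # {'to_string'}
print(round(pass_rate(samples, json), 2))    # 0.67
```

This only catches calls written as `module.method(...)`; methods on objects returned by the module would need type information or a test run against the real project.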

— Wes Kao

tl;dr: (1) Over-reliance on technical details: Real life is non-linear, but stories are linear. (2) Trying to remember too many tactics: Don’t try to remember a list of storytelling tips and strategies. (3) Too much backstory: Start right before you almost get eaten by a bear. (4) Trying to tell a story that’s too long.

Leadership Management

— Mike Fisher

tl;dr: “For leaders, this maps uncomfortably well to the way teams behave under pressure. When metrics are strong and customers are happy, exploration often feels like a luxury. When things are going poorly, it feels irresponsible. In both cases, the instinct is to exploit harder, to optimize the known, to squeeze more value out of the current system.”

Leadership Management

— Aaron Tainter, Pavan Kulkarni

tl;dr: Authentication proves an agent's identity. Authorization defines its blast radius. Most agents today inherit a user's full access token, turning a helpful assistant into a confused deputy that can leak production secrets to a shared Slack channel. This post digs into why that happens and how WorkOS FGA solves this by scoping the blast radius with resource-level permissions.
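To make the blast-radius idea concrete, here is a toy sketch of resource-level authorization for an agent. It does not use the WorkOS FGA API; the `Grant` tuples, the `check` helper, and the resource names are all hypothetical. Instead of inheriting the user's full token, the agent holds only the grants delegated for its task, and every tool call is checked against them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical (subject, relation, resource) tuple, relationship-based style."""
    subject: str
    relation: str
    resource: str


def check(grants: set[Grant], subject: str, relation: str, resource: str) -> bool:
    """Allow an action only if an explicit grant exists for this exact resource."""
    return Grant(subject, relation, resource) in grants


# The user can reach far more than the agent should.
user_grants = {
    Grant("user:alice", "reader", "repo:billing"),
    Grant("user:alice", "reader", "secret:prod-db-password"),
    Grant("user:alice", "writer", "channel:#general"),
}

# Delegate a narrow subset to the agent instead of the user's full token.
agent_grants = {
    Grant("agent:helper", g.relation, g.resource)
    for g in user_grants
    if g.resource.startswith("repo:")
}

print(check(agent_grants, "agent:helper", "reader", "repo:billing"))             # True
print(check(agent_grants, "agent:helper", "reader", "secret:prod-db-password"))  # False
print(check(agent_grants, "agent:helper", "writer", "channel:#general"))         # False
```

Because delegation only copies a subset of the user's grants, the agent can never exceed the user's access, and reading a production secret or posting to a shared channel is refused at the authorization check instead of depending on the model's judgment.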

Promoted by WorkOS

Security Agents

— Akash Bajwa

tl;dr: A roundtable with Anthropic’s Ash Prabaker, engineering leaders from Stripe, NVIDIA, Microsoft, Google DeepMind, xAI, Apple, and Scale AI, and Peter Steinberger explored how AI is reshaping software engineering: workflows are shifting toward eval-driven development and agent-led coding, while new bottlenecks appear in long-horizon tasks, context, and regulation.

Leadership Management

“Keep your fears to yourself, but share your courage with others.”

― Edsger W. Dijkstra

— Murat Demirbas

tl;dr: “This 2025 December paper, "Measuring Agents in Production", cuts through the reality behind the hype. It surveys 306 practitioners and conducts 20 in-depth case studies across 26 domains to document what is actually running in live environments. The reality is far more basic, constrained, and human-dependent.”

Productivity Agents

— Ben Dicken

tl;dr: The Postgres process-per-connection model breaks down at scale. PgBouncer fixes this, but tuning it well is the hard part. This deep dive covers the three pooling modes, how to size your connection chain from max_client_conn down to max_connections, and real tuning examples for small, large, and single-tenant setups. Everything you need to configure PgBouncer with confidence.
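The post's own tuning numbers aren't reproduced here. This is a rough sketch of the arithmetic behind the connection chain, under one common worst-case assumption: every (database, user) pool on every PgBouncer instance fills to `default_pool_size + reserve_pool_size`, and that total must fit under Postgres `max_connections` minus `superuser_reserved_connections`. The setting names are real PgBouncer and Postgres parameters; `PoolerConfig` and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoolerConfig:
    """Hypothetical container for a few real PgBouncer settings."""
    max_client_conn: int    # client connections each PgBouncer instance accepts
    default_pool_size: int  # server connections per (database, user) pool
    reserve_pool_size: int  # extra server connections a pool may open under load
    pools: int              # distinct (database, user) pairs being pooled
    instances: int = 1      # PgBouncer processes in front of the same Postgres server


def server_connections_needed(cfg: PoolerConfig) -> int:
    """Worst case: every pool on every instance is full, reserve included."""
    return cfg.pools * (cfg.default_pool_size + cfg.reserve_pool_size) * cfg.instances


def multiplexing_ratio(cfg: PoolerConfig) -> float:
    """Client slots per Postgres backend across all instances."""
    return cfg.max_client_conn * cfg.instances / server_connections_needed(cfg)


def check_chain(cfg: PoolerConfig, max_connections: int,
                superuser_reserved: int = 3, direct_headroom: int = 0) -> list[str]:
    """Problems found when fitting the pools under Postgres max_connections.

    superuser_reserved mirrors superuser_reserved_connections (Postgres default 3);
    direct_headroom is slots kept for clients that bypass PgBouncer.
    """
    problems = []
    available = max_connections - superuser_reserved - direct_headroom
    needed = server_connections_needed(cfg)
    if needed > available:
        problems.append(f"pools need {needed} server connections, only {available} available")
    return problems


cfg = PoolerConfig(max_client_conn=2000, default_pool_size=20,
                   reserve_pool_size=5, pools=4, instances=2)
print(server_connections_needed(cfg))         # 200
print(multiplexing_ratio(cfg))                # 20.0
print(check_chain(cfg, max_connections=200))  # one problem: 200 needed, 197 available
print(check_chain(cfg, max_connections=250))  # []
```

Pool mode does not change this server-side bound; it changes how many clients can share each backend, which is why transaction mode usually supports a much higher ratio than session mode.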

Promoted by PlanetScale

PostgreSQL

— Daniil Bastrich

tl;dr: “I’ve distilled a healthy, sustainable review process into an acronym: PERFECT. It prioritizes what truly matters - from business logic and edge cases to reliability and readability - while keeping subjective opinions in check. Here is how you can apply these principles to bring structure, clarity, and consistency to your code reviews.”

CodeReview

— Rob Pike

tl;dr: (1) You can't tell where a program is going to spend its time. (2) Measure before optimizing. (3) Fancy algorithms are slow when n is small, and n is usually small. (4) Prefer simple algorithms and data structures. (5) Data dominates.
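Rules 2 and 3 together argue for measuring instead of guessing. A minimal sketch with Python's standard `timeit` module compares a plain linear scan against a binary search (`bisect`) on small and larger sorted lists. The function names are illustrative, and the winner depends on your interpreter, machine, and data, so the script prints timings rather than claiming a result.

```python
import bisect
import timeit


def linear_contains(items: list[int], target: int) -> bool:
    """Simple O(n) scan: no preconditions, no setup."""
    for x in items:
        if x == target:
            return True
    return False


def binary_contains(items: list[int], target: int) -> bool:
    """O(log n) search; requires `items` to be sorted."""
    i = bisect.bisect_left(items, target)
    return i < len(items) and items[i] == target


def compare(n: int, repeats: int = 10_000) -> dict[str, float]:
    """Total seconds for `repeats` lookups of the worst-case target."""
    items = list(range(0, 2 * n, 2))  # sorted even numbers
    target = 2 * n - 2                # last element: worst case for the scan
    return {
        "linear": timeit.timeit(lambda: linear_contains(items, target), number=repeats),
        "binary": timeit.timeit(lambda: binary_contains(items, target), number=repeats),
    }


for n in (8, 512):
    timings = compare(n)
    print(n, {name: round(t, 4) for name, t in timings.items()})
```

In CPython the per-element cost of a Python loop differs from that of the C-implemented `bisect`, so the crossover point is not where big-O alone would put it. That gap between theory and the actual numbers is the point of rule 2.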

BestPractices

— Thariq Shihipar

tl;dr: “A common misconception we hear about skills is that they are “just markdown files”, but the most interesting part of skills is that they’re not just text files. They’re folders that can include scripts, assets, data, etc. that the agent can discover, explore and manipulate. In Claude Code, skills also have a wide variety of configuration options including registering dynamic hooks.”

Agents

Null Pointer

Always Accept

Hand-drawn by Manu. Got an idea for a cartoon? Click reply and let us know.

Slow Down To Speed Up - James Stanier

Claude HUD: Claude Code plugin that shows what's happening.

Cookbooks: Notebooks & recipes showcasing ways of using Claude.

Homarr: Modern and easy to use dashboard.

Impeccable: Design fluency for AI harnesses.

OpenDataLoader PDF: PDF parser for AI-ready data.


How did you like this issue of Pointer?
