Hiring in an AI World

Hiring must shift from evaluating code production to evaluating engineering judgment because AI has collapsed the cost of typing but raised the cost of misunderstanding.

AI accelerates code production, so wrong assumptions propagate faster. Misunderstandings now reach production before they are caught, making judgment the bottleneck.

What it is to be a software engineer

Even before the rise of AI, a software engineer's job was about 30 percent code generation and 70 percent teamwork.

AI automates the 30 percent, not the 70 percent. The teamwork remains, and it breaks down roughly as follows:

  • Requirements, clarification and planning (15 to 20 percent)
  • Meetings and coordination (10 to 15 percent)
  • Code review (10 to 15 percent)
  • Debugging, testing, and validation (15 to 20 percent)
  • DevOps, tooling, and environment work (5 to 10 percent)
  • Documentation and knowledge work (5 to 10 percent)
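As a quick arithmetic check on the ranges listed above, the sketch below sums their low and high ends. The dictionary keys are shortened versions of the bullets, and the names `ACTIVITY_RANGES` and `total_range` are illustrative only.

```python
# Percentage ranges (low, high) copied from the list above.
ACTIVITY_RANGES = {
    "requirements, clarification and planning": (15, 20),
    "meetings and coordination": (10, 15),
    "code review": (10, 15),
    "debugging, testing, and validation": (15, 20),
    "devops, tooling, and environment work": (5, 10),
    "documentation and knowledge work": (5, 10),
}


def total_range(ranges):
    """Return the summed (low, high) bounds across all activities."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


low, high = total_range(ACTIVITY_RANGES)
print(f"Non-coding work: {low} to {high} percent")
```

The ranges total 60 to 90 percent, which brackets the 70 percent figure.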

Teamwork ensures that there is sufficient problem-domain clarity so that any generated code is not only an appropriate solution but also safe and compliant with company rules.

The above figures come from McKinsey, GitHub, Stripe, and the Harris Poll.

How we used to hire engineers

When interviewing software engineers, most companies would set a programming task, to be completed either before or during the interview.

AI now makes this task trivial and professional code generation automatic.

What matters more than ever is engineering judgment.

Hiring an AI-aware engineer

The interview must simulate real engineering conditions: ambiguity, risk, constraints, and system behaviour. These four can lead to expensive misunderstandings. The interview assesses the candidate's ability to work in such conditions.

Present the scenario in the interview, not as a take‑home task, so you interview the candidate rather than their AI‑augmented version.

In the interview, the candidate receives the following:

  1. A short, ambiguous business requirement
  2. A set of constraints and risks, some explicit and some implied
  3. An AI‑generated code snippet
  4. A failing integration test
  5. A log excerpt from a degraded service
  6. A description of upstream and downstream components
  7. A note defining their decision‑rights for the scenario

Decision-rights define what the candidate can and cannot decide for themselves, and why.

The candidate is then invited to read through and ask questions. This shifts the interaction toward real engineering practice.

The interviewer evaluates the candidate on how they think, not how they type. This single scenario measures the 70 percent of the modern software engineering role: judgment, verification, system reasoning, operational awareness, and clarity under uncertainty.

Why the seven topic areas are insurance for your business

Engineering is the discipline of operating with incomplete information.

Hiring against the seven topics reduces business risk because you select for candidates who can demonstrate judgment: reasoning about failure, evaluating AI‑generated output that may be incomplete or wrong, working within constraints, understanding coupling, and escalating with clarity.

These are the conditions of real engineering and real systems. Hiring for them is not overhead. It is risk management.

What to look for within each section

Business Requirement

Can the candidate clarify the intent of the requirement and surface any missing information?

It is instructive to have the candidate restate the goal of the requirement in business terms without any focus on the technology that could be used to deliver a solution. You can then judge whether they have grasped the idea behind the requirement and what a successful solution should deliver for the business.

A more senior candidate will be able to demonstrate more insight. However, a junior candidate should still be able to state relevant information.

No candidate will describe what is to be done exactly as your company does it. That is fine. You are assessing judgment, not process knowledge.

Constraints

Can the candidate identify boundaries to the requirement, any risks, compliance obligations, and unsafe assumptions?

AI‑generated code

Can the candidate assess the code for correctness, instruction adherence, and schema reliability? Can they challenge plausible but wrong output?

Instruction adherence captures whether the code respects the constraints passed in the prompt. For example, does the code handle exceptions the way your organisation requires, or has the AI written something obviously non-compliant?
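To make the exception-handling point concrete, here is a minimal sketch of the kind of snippet a candidate might be shown. It assumes a hypothetical house rule: failures must be logged and re-raised as a domain error, never silently swallowed. Every name here (`PaymentError`, `charge_gateway`, the `payments` logger) is invented for illustration.

```python
import logging

logger = logging.getLogger("payments")  # hypothetical service name


class PaymentError(Exception):
    """Hypothetical domain error that callers are expected to handle."""


def charge_gateway(amount_pence: int) -> str:
    """Stand-in for a real payment gateway call; it always fails here."""
    raise ConnectionError("gateway unreachable")


def charge_plausible_but_wrong(amount_pence: int) -> bool:
    # Looks tidy, but swallows every failure and reports it as a plain False,
    # so the caller cannot tell a declined payment from an outage.
    try:
        charge_gateway(amount_pence)
        return True
    except Exception:
        return False


def charge_compliant(amount_pence: int) -> str:
    # Follows the assumed house rule: log with context, then raise a domain error
    # that keeps the original exception attached as its cause.
    try:
        return charge_gateway(amount_pence)
    except ConnectionError as exc:
        logger.error("charge failed for amount_pence=%d: %s", amount_pence, exc)
        raise PaymentError("payment gateway unavailable") from exc
```

A strong candidate would be expected to question the first version without being told the rule, simply because it hides the cause of the failure.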

The candidate is not expected to know the original AI prompt. They are expected to detect when the output does not logically follow from the requirement, the domain, or the surrounding system. The candidate can make assumptions about what the surrounding system contains. This is not a test of how well they know your system, but an assessment of their judgment.

If a candidate makes a sensible guess, that shows judgment. You might then follow up by asking what they would do to confirm their guess. A good answer will entail discussing their assumption with a more experienced colleague.

Failure‑mode literacy

Can the candidate trace the failure chain from the integration test they have been given? Are they able to distinguish symptoms from causes, and explain how the system behaves under stress?

From the log, are they able to discuss what production failure looks like and how that might ripple through the system?

Operational literacy

Using the log excerpt they have been shown, can the candidate interpret logs, metrics, and signals? Are they able to explain what "healthy" looks like and how to restore it?
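As a rough illustration of the signals a candidate might pull from such an excerpt, the sketch below counts server errors and slow responses. The log format, the sample lines, and the one-second threshold are all invented for this example and do not come from any real service.

```python
# Illustrative log lines in an invented format:
# "<time> <level> <service> status=<code> latency_ms=<n>"
SAMPLE_LOG = """\
12:00:01 INFO orders status=200 latency_ms=85
12:00:02 INFO orders status=200 latency_ms=92
12:00:03 WARN orders status=200 latency_ms=1840
12:00:04 ERROR orders status=503 latency_ms=3000
12:00:05 ERROR orders status=503 latency_ms=3000
"""


def summarise(log_text, slow_ms=1000):
    """Count requests, server errors (5xx), and slow responses."""
    total = errors = slow = 0
    for line in log_text.splitlines():
        # Collect the key=value pairs on the line into a dictionary.
        fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
        if "status" not in fields:
            continue  # skip lines that are not request records
        total += 1
        if fields["status"].startswith("5"):
            errors += 1
        if int(fields["latency_ms"]) >= slow_ms:
            slow += 1
    return {"total": total, "errors": errors, "slow": slow}


print(summarise(SAMPLE_LOG))
```

In this invented excerpt the slow response appears before the first error, which is the kind of ordering a candidate could use to separate an early symptom from the eventual failure.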

System awareness

Can the candidate show awareness of upstream and downstream effects, identify risks between systems, and reason about integration behaviour?

Discuss with the candidate how the AI-generated code might behave if a system it invokes is currently down.
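One way to frame that discussion is to contrast two versions of the same call. The sketch below is an assumption-laden example, not a recommended implementation: `fetch_stock_down`, `InventoryUnavailable`, and both `can_fulfil_*` functions are hypothetical names, and the downstream service is simulated by a function that always raises.

```python
class InventoryUnavailable(Exception):
    """Hypothetical error raised when the downstream inventory service is down."""


def fetch_stock_down(sku: str) -> int:
    """Stand-in for a downstream call while that service is unavailable."""
    raise ConnectionError("inventory service not responding")


def can_fulfil_naive(sku: str, quantity: int, fetch_stock=fetch_stock_down) -> bool:
    # Assumes the call always succeeds, so an outage surfaces
    # as an unhandled ConnectionError in whatever called this function.
    return fetch_stock(sku) >= quantity


def can_fulfil_guarded(sku: str, quantity: int, fetch_stock=fetch_stock_down,
                       retries: int = 2) -> bool:
    # Retries a bounded number of times, then fails with an explicit
    # domain error that the caller can handle deliberately.
    last_error = None
    for _ in range(retries + 1):
        try:
            return fetch_stock(sku) >= quantity
        except ConnectionError as exc:
            last_error = exc
    raise InventoryUnavailable(f"could not check stock for {sku}") from last_error
```

The guarded version is deliberately simple. A real service would likely also need timeouts and a delay between retries, which are useful follow-up questions for the candidate.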

Communication of reasoning

During the discussion, can the candidate articulate their assumptions, any trade‑offs they might make (such as space vs. time), and next steps with clarity? Can you follow what they say, as one engineer listening to another, even if the candidate is more junior? Or is their communication muddled, unclear and verbose?

Conclusion

Hiring in an AI world is about finding people who can think clearly under uncertainty, evaluate machine‑generated artefacts, and protect system integrity.

Hire for judgment.

Further Reading

  • GitHub — The Economic Impact of GitHub Copilot https://github.blog/news-insights/research/the-economic-impact-of-github-copilot/

  • McKinsey & Company — Unleashing developer productivity with generative AI https://www.mckinsey.com/capabilities/quantumblack/our-insights/unleashing-developer-productivity-with-generative-ai

  • McKinsey & Company — Yes, you can measure software developer productivity https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/yes-you-can-measure-software-developer-productivity

  • Stripe — The Developer Coefficient (with Harris Poll) https://stripe.com/reports/developer-coefficient-2018
