Why not estimate in hours?
Hours invite false precision. Two engineers looking at the same story often produce very different hour estimates and both feel confident — because "how long it takes me" and "how long it takes the team" are different questions. Hours also force you to account for interruptions, code review, pairing, and context-switching. Story points sidestep all of that by asking a different question: how does the effort of this story compare to stories we already know?
What a story point actually represents
A point blends three things the team cares about:
- Effort — how much work, regardless of who does it.
- Complexity — how much thinking, design, or unknown tech is involved.
- Risk / uncertainty — how wrong we might be, and how much that hurts.
A story can be small in effort but high in uncertainty (a one-line config change in a legacy system you've never touched) and deserve more points than its size suggests.
Calibrating your reference story
The easiest way to make points meaningful: pick one already-completed story the team remembers well, call it a 2, and estimate everything against it. "Is this bigger or smaller than the login form we did last sprint?" beats "is this closer to 5 hours or 8 hours?" every time.
Why planning poker helps
Relative estimation is a team consensus exercise. If one person says 3 and another says 13, they're not looking at the same story — and the conversation that follows is where alignment actually happens. Planning poker forces independent voting first so the conversation isn't anchored to whoever speaks loudest.
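As a rough illustration of "vote independently, then talk", the sketch below collects every vote before revealing any of them and flags the round when the cards are far apart. The deck, the `reveal` function, and the `max_steps_apart` threshold are assumptions made up for this example, not part of any particular tool.

```python
from dataclasses import dataclass

# Assumed deck for illustration; use whatever cards your team plays with.
DECK = (1, 2, 3, 5, 8, 13, 21)


@dataclass
class RoundResult:
    low: int
    high: int
    needs_discussion: bool


def reveal(votes: dict[str, int], max_steps_apart: int = 1) -> RoundResult:
    """Reveal all votes at once and flag the round if they diverge.

    Divergence is measured in deck positions rather than raw points, so
    8 vs 13 (adjacent cards) counts as close while 3 vs 13 does not.
    """
    if not votes:
        raise ValueError("no votes cast")
    # DECK.index raises ValueError if someone votes with a card not in the deck.
    positions = [DECK.index(v) for v in votes.values()]
    spread = max(positions) - min(positions)
    return RoundResult(
        low=DECK[min(positions)],
        high=DECK[max(positions)],
        needs_discussion=spread > max_steps_apart,
    )


# Hypothetical round: one person says 3, another says 13.
result = reveal({"ana": 3, "bo": 13})
if result.needs_discussion:
    print(f"Votes range from {result.low} to {result.high}: talk it through, then re-vote.")
```

Measuring the spread in deck positions is one design choice among several; a team could just as reasonably discuss every round that is not unanimous.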
Fibonacci isn't arbitrary
The gaps grow on purpose. The difference between 1 and 2 is meaningful. The difference between 18 and 19 is not. The Fibonacci scale encodes the fact that uncertainty grows with size: the bigger the story, the less you know about it. See Fibonacci vs T-shirt sizing for when a different deck makes sense.
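To make the growing gaps concrete, here is a small sketch that builds a Fibonacci deck, lists the distance between neighbouring cards, and snaps a raw guess to the nearest card. The function names and the choice to round ties upward are assumptions for illustration only.

```python
def fibonacci_deck(max_card: int = 21) -> list[int]:
    """Return the distinct Fibonacci numbers from 1 up to max_card."""
    deck, a, b = [], 1, 2
    while a <= max_card:
        deck.append(a)
        a, b = b, a + b
    return deck


def gaps(deck: list[int]) -> list[int]:
    """Distance between each card and the next one up."""
    return [hi - lo for lo, hi in zip(deck, deck[1:])]


def nearest_card(raw: float, deck: list[int]) -> int:
    """Snap a raw guess to the closest card; ties go to the larger card."""
    return min(deck, key=lambda card: (abs(card - raw), -card))


deck = fibonacci_deck()
print(deck)                    # [1, 2, 3, 5, 8, 13, 21]
print(gaps(deck))              # [1, 1, 2, 3, 5, 8]
print(nearest_card(18, deck))  # 21
print(nearest_card(19, deck))  # 21
```

An 18 and a 19 collapse onto the same card, while a 1 and a 2 stay distinct, which is the behaviour the deck is designed to have.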
Velocity without misusing it
Once you consistently estimate and complete stories, the team's velocity (points completed per sprint) stabilizes. Velocity is useful for the team's own planning ("can we fit this into the sprint?") but dangerous as a cross-team metric, because points are calibrated per team: one team's 5 is not another team's 5. Comparing velocities across teams usually rewards estimate inflation, not throughput.
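For the team's own planning, the arithmetic is simple. The sketch below averages the points completed over the last few sprints and checks whether a candidate set of stories fits. The three-sprint window and the function names are assumptions chosen for the example, and the sprint history is made-up data.

```python
def velocity(completed_points: list[int], window: int = 3) -> float:
    """Average points completed over the most recent `window` sprints."""
    if not completed_points:
        raise ValueError("need at least one finished sprint")
    recent = completed_points[-window:]
    return sum(recent) / len(recent)


def fits_in_sprint(candidate_points: list[int],
                   completed_points: list[int],
                   window: int = 3) -> bool:
    """True if the candidate stories total no more than recent velocity."""
    return sum(candidate_points) <= velocity(completed_points, window)


# Hypothetical history: points completed in the last four sprints, oldest first.
history = [21, 18, 24, 20]
print(round(velocity(history), 1))        # 20.7
print(fits_in_sprint([8, 5, 5], history))  # True  (18 points)
print(fits_in_sprint([8, 8, 5], history))  # False (21 points)
```

Treat the result as a planning aid for one team; the number has no meaning when placed next to another team's.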
Anti-patterns to watch
- Converting points to hours. Defeats the purpose. If leadership asks for an hour estimate, give it separately — don't calibrate the deck around it.
- Punishing teams for "low velocity". Teams will respond by inflating estimates. Velocity is a planning aid, not a performance review tool.
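- Estimating half the backlog. Estimate what you'll actually plan this sprint or the next. Points on stories two months out go stale fast, because the requirements and the team's understanding will have changed by the time the work starts.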
How to run your first calibrated session
- Pick a 2-point reference story everyone remembers.
- Take 5–10 recent stories of varying size, sort them by relative effort, and assign points.
- Use that ordered list as the team's baseline for the next planning session.
- Run the next session with backlog imports from Jira or Notion, and re-calibrate the reference every quarter or two.
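One way to keep the result of those steps around is a tiny baseline table the team can consult during planning. This is a minimal sketch under assumed names (`ReferenceStory`, `build_baseline`, `comparables`); every story title other than the login form is hypothetical, and the points themselves still come from the team's own sorting exercise, not from the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceStory:
    title: str
    points: int


def build_baseline(stories: list[ReferenceStory],
                   anchor_points: int = 2) -> list[ReferenceStory]:
    """Order the reference stories by points and check the anchor exists."""
    if not any(s.points == anchor_points for s in stories):
        raise ValueError(
            f"baseline needs at least one {anchor_points}-point reference story"
        )
    return sorted(stories, key=lambda s: s.points)


def comparables(baseline: list[ReferenceStory], points: int) -> list[str]:
    """Titles the team already agreed are worth `points`."""
    return [s.title for s in baseline if s.points == points]


# Hypothetical outcome of a calibration session.
baseline = build_baseline([
    ReferenceStory("CSV export", 5),
    ReferenceStory("Login form", 2),
    ReferenceStory("Copy tweak on settings page", 1),
])
print(comparables(baseline, 2))  # ['Login form']
```

During a session, "does this feel like the login form or like the CSV export?" is then a lookup rather than a memory test.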