Spec-Driven Development with GitHub Copilot¶
You don't need to know what a specification is, or how to write one, to work in a structured, deliberate way with Copilot. The Plan agent does that for you - it interviews you about what you want, asks the right clarifying questions, and turns your answers into a concrete implementation plan before a single line of code is written.
This tutorial walks through the full cycle using a real example: a Python tool that reads an Excel timesheet export from Maconomy and produces a project hours summary.
A sample file is included in this folder: Fictional Timesheet.xlsx - a realistic Maconomy export covering two years of consultant time, with three sheets (Week, Month, Year), ~1,300 data rows, and columns like Client Working Hours, Non-Client Working Hours, Absence, and ISO Week numbers. Use it as your test data throughout the exercise.
Why This Workflow?¶
Most people jump straight to asking Copilot to write code. That works for small tasks - but for anything beyond a few lines, it leads to code that almost-but-not-quite does what you wanted, which you then spend time patching rather than thinking clearly. The workflow in this tutorial builds good habits:
| Phase | What you do | Why it matters |
|---|---|---|
| Plan | Describe what you want; Copilot asks questions and writes a spec | Catches misunderstandings before they become code. Cheap to change a plan; expensive to change working code. |
| Implement | Copilot writes the code step by step, you review each change | You stay in control. Every change is shown as a diff - read it and push back if it's wrong. |
| Run it | Copilot runs the tool with your test data | Confirms the code actually works end-to-end before you invest time in tests. |
| Test | Copilot writes automated tests for the behaviour you described | Tests prove the code does what you agreed. They catch regressions when you change things later. You don't need to know how to write tests - just ask. |
| Code review | Copilot reviews what it wrote, looking for bugs and gaps | AI-generated code contains mistakes. A second pass - especially with a different model - catches things the first pass missed. |
Before You Start¶
Open VS Code in an empty folder - this will be your project. All files Copilot creates will land here.
Create the folder anywhere you like, then open it: File → Open Folder.
Copy Fictional Timesheet.xlsx from this tutorial folder into your project folder. You'll use it as test data throughout the exercise.
The Idea: Project Hours Summary Tool¶
AFRY consultants often export their logged hours from Maconomy into Excel. The resulting file is more complex than a flat table - it has three sheets at different levels of granularity (daily, monthly, yearly), separate columns for chargeable and non-chargeable hours, and ISO week numbers. Summarising that manually - utilisation rates, totals per project, trends by week - is tedious and error-prone.
We'll build a small Python tool that:
- Reads the Week sheet from a Maconomy .xlsx export
- Calculates chargeable utilisation - client hours as a percentage of total working hours
- Summarises hours per project and per ISO week
- Optionally filters by date range or project code
- Prints the summary to the terminal
You don't need to know any of this in advance - you'll just describe the rough idea to Copilot, and the Plan agent will figure out the details with you.
Step 1: Inspect the Data¶
Before starting the planning interview, give Copilot a chance to look at the actual file. This makes the planning conversation much sharper - instead of asking generic questions about column names, Copilot will ask specific, useful questions based on what it can see.
Use the regular Agent for this step, not Plan. Plan mode cannot run terminal commands, so it cannot open or inspect an Excel file. You'll switch to Plan in the next step.
- Open the Chat panel (Ctrl+Alt+I)
- Click the paperclip icon in the chat input bar and attach Fictional Timesheet.xlsx (or drag it directly into the chat)
- Ask Copilot to describe what it sees:
I've attached a Maconomy timesheet export. Can you describe its structure?
How many sheets does it have, what are the column names on each, and what
kind of data is in each column?
Copilot will enumerate the three sheets, the column names, and what the data looks like.
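If you want to cross-check Copilot's description yourself, the sheet names can be read with the Python standard library alone, because an .xlsx file is a ZIP archive whose `xl/workbook.xml` part lists the worksheets. The following is a minimal sketch, not what Copilot runs; the function name `sheet_names` is hypothetical, and it assumes the common (transitional) SpreadsheetML namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML parts (assumption: a transitional,
# non-strict .xlsx file, which is what Excel writes by default).
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(xlsx_path):
    """Return the worksheet names in workbook order."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.findall("m:sheets/m:sheet", NS)]


if __name__ == "__main__":
    print(sheet_names("Fictional Timesheet.xlsx"))
```

Reading the column headers and cell values takes more work than this, which is why the generated tool relies on a library such as openpyxl.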
Why this matters: Without seeing the file, Copilot asks generic questions and makes assumptions. With the file attached, it asks sharper questions - "should I use the Week sheet or the Month sheet?" and "should absence days be excluded from the utilisation calculation?" - which lead to a much more accurate implementation plan.
Step 2: Create the Plan¶
Open the Chat panel (Ctrl+Alt+I) and select Plan from the agents dropdown.
The file context from Step 1 is still available in the conversation - no need to attach it again. Describe your idea:
I want to build a Python tool that reads this Maconomy timesheet export
and gives me a utilisation summary - how much of my time was chargeable,
broken down by project and by week. Let us work on the specifications together.
That's it. Press Enter.
The Interview¶
The Plan agent will respond with clarifying questions. This is intentional - it's extracting the requirements from you rather than making assumptions. Answer honestly and conversationally.
A typical exchange might look like:
Copilot: The file has three sheets - Week, Month, and Year. Which should I use as the primary data source?
You: The Week sheet. It has one row per project entry per day, which gives the most flexibility for filtering and grouping.
Copilot: The Week sheet has both Client Working Hours and Non-Client Working Hours columns. Should utilisation be based only on client hours, or should I include non-client hours in the total too?
You: Total hours in the denominator (client + non-client), client hours in the numerator.
Copilot: Should absence days be excluded from the utilisation calculation, or treated as non-chargeable time?
You: Excluded. I only want to measure utilisation across days I was actually working.
Copilot: Should the project summary show Subproject No. (e.g. GBR002100-001), Subproject Name, or both?
You: Both - the code and the name next to it.
Copilot: Should this be a command-line tool with arguments, or a script you just run directly?
You: CLI. At minimum an argument for the input file path, and optionally a date range.
Keep answering until Copilot stops asking. If you're unsure about something, say so - "I don't know, you decide" is a perfectly valid answer.
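The answers in the sample exchange already pin down the core calculation. As a rough sketch of what they imply (the row layout with keys `client`, `non_client` and `absence` is a hypothetical stand-in for the real columns, and returning 0.0 when no hours were worked is an assumption):

```python
def utilisation_pct(rows):
    """Chargeable utilisation in percent, ignoring absence rows.

    `rows` is a list of dicts with hypothetical keys "client",
    "non_client" and "absence", each holding hours as a float.
    """
    # Absence rows are excluded entirely, as agreed in the interview.
    worked = [r for r in rows if r["absence"] == 0]
    client = sum(r["client"] for r in worked)
    total = client + sum(r["non_client"] for r in worked)
    # Assumption: report 0.0 rather than failing when nothing was worked.
    return 100.0 * client / total if total else 0.0


rows = [
    {"client": 6.0, "non_client": 2.0, "absence": 0.0},
    {"client": 8.0, "non_client": 0.0, "absence": 0.0},
    {"client": 0.0, "non_client": 4.0, "absence": 4.0},  # excluded
]
print(utilisation_pct(rows))  # 87.5 (it would be 70.0 if the absence row counted)
```

Writing the rule down this way makes it easy to see how much a single interview answer, such as excluding absence rows, changes the result.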
Step 3: Review and Refine the Plan¶
Once the interview is done, the Plan agent produces a structured implementation plan. It will look something like:
## Implementation Plan: Maconomy Hours Summary Tool
1. Set up project structure
   - Create hours_summary.py
   - Add requirements.txt (pandas, openpyxl, tabulate)
2. Implement Excel reader
   - Read the "Week" sheet from the .xlsx file using pandas
   - Validate expected columns: Subproject No., Subproject Name, Task,
     Date, Hours, Client Working Hours, Non-Client Working Hours, Absence
   - Parse Date as datetime; parse Week as integer
   - Handle missing file or unrecognised format gracefully
3. Implement utilisation logic
   - Exclude absence rows (where Absence > 0) before calculating utilisation
   - Utilisation % = Client Working Hours / (Client + Non-Client) * 100
   - Aggregate per Subproject No. + Subproject Name
4. Implement weekly breakdown
   - Group by ISO Week number; show total and client hours per week
   - Highlight the highest- and lowest-utilisation weeks
5. Implement output
   - Print summary tables to terminal using tabulate:
     (a) Overall utilisation % and total hours
     (b) Hours and utilisation % per project
     (c) Hours per ISO week
6. Implement CLI
   - --input: path to .xlsx file
   - --from / --to: optional date range filter (YYYY-MM-DD)
   - --project: optional filter by Subproject No.
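To give a feel for what the CLI step of such a plan might turn into, here is a minimal `argparse` sketch. It is an illustration only, not the code Copilot will generate; the function name `build_parser`, the attribute names `date_from` and `date_to`, and the choice to make `--input` required are assumptions.

```python
import argparse
from datetime import date


def build_parser():
    """Build a parser for the options listed in the example plan."""
    parser = argparse.ArgumentParser(
        description="Summarise hours from a Maconomy timesheet export"
    )
    # Assumption: the input path is mandatory.
    parser.add_argument("--input", required=True, help="path to the .xlsx file")
    # "from" is a Python keyword, so store the values under explicit names.
    parser.add_argument("--from", dest="date_from", type=date.fromisoformat,
                        help="first date to include (YYYY-MM-DD)")
    parser.add_argument("--to", dest="date_to", type=date.fromisoformat,
                        help="last date to include (YYYY-MM-DD)")
    parser.add_argument("--project", help="only include this Subproject No.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--input", "Fictional Timesheet.xlsx", "--from", "2024-01-01"]
)
print(args.date_from, args.date_to, args.project)
```

Using `date.fromisoformat` as the argument type means a malformed date is rejected by the parser with a usage error before any file is read.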
Read the plan carefully. This is the moment to catch misunderstandings before they become code.
- Does it match what you had in mind?
- Is there anything missing or wrong?
Push back with follow-up messages if needed:
"I don't want pandas as a dependency - use openpyxl directly"
"Add a --week argument so I can filter by ISO week number"
"Don't bother with the output Excel file for now - just terminal output"
Iterate until the plan is right. It's cheap to change a plan. It's expensive to change code.
When you're happy, click Open in editor - this opens the plan as an editable document in VS Code. Read through it once more and make any final tweaks directly in the file.
Tip: The Plan agent automatically saves the plan to /memories/session/plan.md. You can retrieve it later via the Command Palette: Chat: Show Memory Files.
Once you're done reviewing, open a new chat (Ctrl+Alt+I), make sure the regular Agent is selected, and kick off the implementation:
Step 4: Implement¶
After handing off from Plan, the Agent takes over. It will:
- Create the project files and folder structure (hours_summary.py, requirements.txt, etc.)
- Write the implementation following the plan step by step
- Run terminal commands as needed (e.g., pip install openpyxl tabulate)
- Show you inline diffs for every file change
What to Watch For¶
Review each file change before accepting. The agent shows diffs inline - use the overlay controls to Keep or Undo individual changes. Don't accept everything blindly.
Answer permission prompts honestly. If the agent asks to run a terminal command you don't recognise, ask it first:
What does this command do and why is it needed?
Steer mid-flight if needed. If the implementation diverges from what you discussed in the plan, say so:
The output should show a table grouped by project first, then employees within each project - not a flat list.
Keeping the Agent on Track¶
If the conversation grows long, the agent may start to drift. You can re-anchor it to the plan at any point:
Look at the plan in #file:plan.md. The --project filter hasn't been implemented yet.
Please implement that next.
Step 5: Run It¶
Before writing tests, confirm the tool actually works end-to-end with your sample data.
Ask the Agent:
The Agent will run the command in the terminal and show you the output. Check:
- Does it print output without errors?
- Do the numbers look plausible? (Spot-check a week manually against the raw data.)
If something is wrong, describe what you expected and let the Agent fix it before moving on to tests.
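For the manual spot-check, a few lines of Python can total the hours of a week independently of the generated tool. The sketch below assumes you have copied a handful of (date, hours) pairs out of the Week sheet by hand; the values shown are made up, and `hours_per_iso_week` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date


def hours_per_iso_week(entries):
    """Sum hours per (ISO year, ISO week) for (date, hours) pairs."""
    totals = defaultdict(float)
    for day, hours in entries:
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[(iso_year, iso_week)] += hours
    return dict(totals)


# Made-up sample values, not taken from the tutorial file.
entries = [
    (date(2024, 12, 30), 7.5),  # a Monday that already belongs to ISO week 1 of 2025
    (date(2024, 12, 31), 8.0),
    (date(2025, 1, 2), 6.0),
]
print(hours_per_iso_week(entries))  # {(2025, 1): 21.5}
```

Keying on the ISO year as well as the week number matters here: the sample file covers two years, so a bare week number would merge two different weeks into one total.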
Step 6: Write Tests¶
You don't need to know how to write tests, or even what to test - just ask:
The Agent will read the implementation and generate a test file covering the main behaviours. It will also run the tests and fix any failures it finds.
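If you have never seen a test file, this is roughly the shape to expect. The sketch uses the standard `unittest` module and a small stand-in for the function under test; the real generated file will import from hours_summary.py instead, and its function names and row format will differ.

```python
import unittest


def utilisation_pct(rows):
    """Stand-in for the real function; rows are (client, non_client, absence) hours."""
    worked = [(c, n) for c, n, absence in rows if absence == 0]
    client = sum(c for c, _ in worked)
    total = sum(c + n for c, n in worked)
    # Assumption: 0.0 is reported when no hours were worked.
    return 100.0 * client / total if total else 0.0


class UtilisationTests(unittest.TestCase):
    def test_absence_rows_are_excluded_from_denominator(self):
        # If the second row counted, the result would be 50.0 instead of 75.0.
        rows = [(6.0, 2.0, 0.0), (0.0, 4.0, 4.0)]
        self.assertAlmostEqual(utilisation_pct(rows), 75.0)

    def test_all_hours_zero_does_not_divide_by_zero(self):
        self.assertEqual(utilisation_pct([(0.0, 0.0, 0.0)]), 0.0)

    def test_empty_sheet(self):
        self.assertEqual(utilisation_pct([]), 0.0)
```

Saved under a hypothetical name such as test_hours_summary.py, a file like this runs with `python -m unittest`. Note that the first test uses data where excluding the absence row actually changes the answer; a test whose result is the same either way proves nothing.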
Once the initial tests are passing, push for more coverage:
- Edge cases: "Add a test for a file where all hours are 0"
- Invalid inputs: "Test what happens if the Hours column contains text instead of numbers"
- Boundary conditions: "Test with a single row, and with an empty sheet"
- Your specific requirements: "Add a test that confirms absence rows are excluded from the utilisation denominator"
Test-First Variant¶
If you want to try a more disciplined approach, you can ask for tests before the implementation. Go back to Step 3, and before opening the plan in the editor, add:
Before we implement, please also add a set of failing unit tests to the plan
that verify the key behaviours. The implementation should make these tests pass.
Step 7: Code Review¶
AI-generated code is not always correct - it can contain bugs, miss edge cases, and make poor design decisions. Asking Copilot to review what it just wrote is a cheap way to catch problems before they cause trouble.
Tip: switch models before reviewing. The model that wrote the code is likely to miss the same things it missed when writing. Switch to a different model in the chat model picker - a fresh perspective catches more.
Self-Review Before Committing¶
Review #file:hours_summary.py for:
1. Potential bugs or unhandled edge cases
2. Input validation - what if the Excel file has unexpected column names or types?
3. Anything that would make this fragile or hard to maintain
Spec Compliance Check¶
Reference the plan Copilot generated earlier:
Based on our earlier implementation plan, does hours_summary.py implement
everything we agreed on? Are there any gaps?
Focused Review Prompts¶
Robustness:
What happens if the Hours column contains a blank cell or the text "n/a"?
Is this handled gracefully?
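When you read Copilot's answer to this prompt, it helps to know what graceful handling could look like. One possible policy, sketched below, treats blank and non-numeric cells as zero hours; that policy and the helper name `parse_hours` are assumptions, and you may prefer the tool to report such rows instead.

```python
def parse_hours(cell):
    """Convert a raw cell value to hours.

    Assumed policy: blank cells and non-numeric text such as "n/a"
    count as 0.0 hours instead of raising an exception.
    """
    if cell is None:
        return 0.0
    try:
        return float(cell)
    except (TypeError, ValueError):
        return 0.0


for raw in (7.5, "8", "", "n/a", None):
    print(repr(raw), "->", parse_hours(raw))
```

Silently converting bad cells to zero keeps the summary running but can hide data problems, so it is worth asking Copilot which policy the generated code follows.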
Correctness:
Does the per-project summary correctly aggregate hours when the same
employee appears across multiple rows for the same project?
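The aggregation this prompt asks about amounts to summing every row that shares the same project key. A minimal sketch, assuming rows have already been reduced to (subproject number, subproject name, hours) tuples; the names and the second project code are invented for illustration:

```python
from collections import defaultdict


def hours_per_project(rows):
    """Sum hours per (subproject number, subproject name)."""
    totals = defaultdict(float)
    for number, name, hours in rows:
        totals[(number, name)] += hours
    return dict(totals)


# Illustrative rows; only the first project code appears in the tutorial text.
rows = [
    ("GBR002100-001", "Example subproject A", 4.0),
    ("GBR002100-001", "Example subproject A", 3.5),
    ("GBR002100-002", "Example subproject B", 8.0),
]
for (number, name), hours in hours_per_project(rows).items():
    print(number, name, hours)
```

If the generated code groups on the name alone, or overwrites instead of adding, repeated rows for one project will produce wrong totals; that is the kind of bug this review prompt is meant to surface.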
What Copilot Catches Well vs. What Needs Human Eyes¶
| Copilot catches well | Needs human judgement |
|---|---|
| Missing error handling | Whether the output format is actually useful |
| Off-by-one and aggregation errors | Whether the column names match your real files |
| Unhandled file/IO exceptions | Whether the filtering logic fits your team's workflow |
| Inconsistency with the agreed plan | Performance with very large files |
Implementing the Fixes¶
Once the review is done, apply everything in one go:
Full Workflow Summary¶
[Agent] Inspect the data file
↓
[Plan] Describe your idea → answer interview questions
↓
Review and edit the plan (Open in editor)
↓
[Agent] Implement the plan → review diffs
↓
Run the tool with test data
↓
Ask Agent to write tests → review and extend
↓
Switch model → ask for code review → apply fixes
Tips for Getting the Best Results¶
You Don't Need to Know Everything Upfront¶
The whole point of starting in Plan mode is that Copilot extracts the requirements from you through conversation. If you don't know the answer to one of its questions, say so - it will suggest a reasonable default.
Iterate on the Plan Before Implementing¶
It's cheap to change a plan. It's expensive to change code. Spend time in Plan until you're genuinely happy with the steps before clicking Start Implementation.
Re-anchor With the Plan if the Agent Drifts¶
In a long session, the Agent can forget earlier decisions. Remind it:
Review Diffs, Don't Blindly Accept¶
Every change the Agent makes is shown as a diff. Read them. You are responsible for the code - Copilot is your assistant, not your replacement.
Save the Plan as a Spec¶
After a successful session, ask Copilot to write the plan out as a docs/spec.md file. This becomes your documentation, your onboarding guide, and your starting point for the next session.
Please write a clean specification document based on everything we built
and save it as docs/spec.md
Extension Exercises¶
Once the core tool works, try these - each one requires a new planning conversation.
1. Utilisation trend chart¶
Add a matplotlib bar chart showing my monthly utilisation % over the full
dataset. Each bar = one month, coloured green if above 75%, amber if 50–75%,
red if below 50%. Save it as utilisation_trend.png.
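The colour rule in this prompt leaves the boundaries slightly open, which is a good thing to raise in the planning conversation. One reading, sketched here with a hypothetical helper `bar_colour`, puts exactly 75% and exactly 50% in the amber band:

```python
def bar_colour(pct):
    """Map a utilisation percentage to a colour band name.

    Assumed reading of the prompt: strictly above 75 is green,
    50 to 75 inclusive is amber, anything below 50 is red.
    """
    if pct > 75:
        return "green"
    if pct >= 50:
        return "amber"
    return "red"


for pct in (82.0, 75.0, 50.0, 49.9):
    print(pct, bar_colour(pct))
```

The band names are plain labels; mapping them to the colour values a plotting library accepts is left to the implementation.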
2. Busiest and quietest weeks¶
Extend the tool to print my 5 highest- and 5 lowest-utilisation weeks,
with the date range of each week and the top project worked on that week.
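The ranking part of this exercise can be sketched in a few lines. The example assumes weekly utilisation has already been computed into a dict keyed by (ISO year, ISO week); the helper name `extreme_weeks` and the sample values are made up.

```python
def extreme_weeks(weekly_pct, n=5):
    """Return (highest, lowest) lists of (week, pct) pairs, n entries each."""
    ranked = sorted(weekly_pct.items(), key=lambda item: item[1])
    highest = ranked[-n:][::-1]  # best week first
    lowest = ranked[:n]          # worst week first
    return highest, lowest


# Made-up values for illustration.
weekly = {
    (2024, 1): 80.0,
    (2024, 2): 55.0,
    (2024, 3): 92.5,
    (2024, 4): 40.0,
    (2024, 5): 70.0,
}
highest, lowest = extreme_weeks(weekly, n=2)
print(highest)  # [((2024, 3), 92.5), ((2024, 1), 80.0)]
print(lowest)   # [((2024, 4), 40.0), ((2024, 2), 55.0)]
```

With fewer than 2 * n weeks of data the two lists overlap, which is another detail worth settling during planning.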
3. Project timeline¶
Add a --timeline flag that prints a text-based Gantt chart showing which
projects were active in which months, based on the first and last entry
for each project in the file.
4. Test-first variant¶
Go back to Step 3 and this time, before clicking Start Implementation, add:
Before implementing, write failing unit tests that cover:
- Correct utilisation calculation when absence rows are present
- Correct handling of a project with only Non-Client hours (utilisation = 0%)
- --from / --to date filter correctly excludes out-of-range rows
- A graceful error message when the input file is missing
Don't implement the logic yet.
Then hand off to Agent and watch it make the tests pass.
Next Steps¶
- Try the same workflow on a real task from your own work
- After the session, ask Copilot to generate the spec document from the plan - this is how spec-driven development starts to happen naturally
- Explore custom agents to encode your team's preferred planning style into a reusable agent