Worked Example: Automating a Tax Return PDF Form
This step-by-step example shows how to use flyfield on a real, complex form: extracting field positions, marking them up, generating form fields, filling them, and capturing the data back out. It starts by copying the original PDF to a new working filename, which safeguards the source and serves as the base name for every output file in the workflow.
Prerequisites
Before following this example, you should be familiar with the Quick Start Guide. You will also need:
- flyfield installed and operational
- Source PDF: Trust-tax-return-2024.pdf
- CSV file containing field codes and associated data: example-fields-filled-capture.csv
Take a look at the CSV contents:
code,fill
1-3-1,123
1-3-2,456
1-3-3,789
1-4-1,GREENFIELD FAMILY TRUST
1-6-1,12
1-6-2,345
1-6-3,678
1-6-4,901
1-9-1,45 MARKET STREET
1-11-1,SYDNEY
1-11-2,NSW
1-11-3,2000
2-4-1,DOE & CO TRUSTEES PTY LTD
2-6-1,98
2-6-2,765
2-6-3,432
2-6-4,109
2-6-5,(02) 9876 5432
2-10-1,D
2-15-1,62000
2-15-2,123
2-15-3,456
2-15-4,789
2-16-1,GREENFIELD FAMILY TRUST
4-6-1,3606
4-8-1,22870
4-10-1,26476
4-12-1,7069.35
5-1-1,123
5-1-2,456
5-1-3,789
5-8-1,26476
5-17-1,26476
6-3-1,463
6-3-5,463
6-4-1,15.64
6-6-1,26939
6-8-1,26939
11-7-1,26939
12-2-1,X
12-3-1,DOE
12-4-1,JOHN
12-11-1,1
12-11-2,4
12-11-3,1945
12-12-1,30
12-13-1,8889
12-25-1,1190
12-28-1,152
12-30-1,5.16
12-33-1,7547
12-35-1,2332.89
14-2-2,X
14-3-1,DOE
14-4-1,JANE
14-11-1,25
14-11-2,12
14-11-3,1950
14-13-1,17779
14-25-1,2380
14-28-1,305
14-30-1,10.32
14-33-1,15094
14-35-1,4665.77
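To get a feel for how the codes are organized, the rows can be grouped by page. The following is a minimal sketch, assuming that the first hyphen-separated part of each code is the page number (the workflow notes below say field codes embed the original page number). The helper name rows_by_page and the short sample are illustrative only.

```python
import csv
import io
from collections import defaultdict


def rows_by_page(csv_text):
    """Group (code, fill) rows by the page number at the start of each code."""
    pages = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: the first hyphen-separated part of the code is the page.
        page = int(row["code"].split("-")[0])
        pages[page].append((row["code"], row["fill"]))
    return dict(pages)


# A few rows taken from the CSV listed above.
sample = """code,fill
1-3-1,123
1-4-1,GREENFIELD FAMILY TRUST
2-6-5,(02) 9876 5432
4-12-1,7069.35
"""

grouped = rows_by_page(sample)
for page, rows in sorted(grouped.items()):
    print(page, rows)
```

To inspect the real file, read its text with open("example-fields-filled-capture.csv", newline="") and pass the contents to rows_by_page.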
Workflow Overview
- Copy the original PDF to a working filename
- Create a marked-up PDF to verify field positions
- Generate interactive form fields
- Fill form fields using CSV data
- Capture filled form data back into a CSV for editing or reuse
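The flyfield steps in this overview can also be scripted. The sketch below only assembles the four command lines used later in this example (the flags and filenames are the ones shown in the steps that follow) and runs them in order with Python's subprocess module. The function names build_commands and run_all are hypothetical helpers, not part of flyfield, and the copy step is assumed to have been done already.

```python
import subprocess


def build_commands(base="example", pages="1,2,4-6,11,12,14"):
    """Return the flyfield command lines for steps 2 to 5 of this example."""
    csv_name = f"{base}-fields-filled-capture.csv"
    return [
        ["flyfield", "--input-pdf", f"{base}.pdf", "--markup"],
        ["flyfield", "--input-pdf", f"{base}.pdf", f"--pdf-pages={pages}", "--fields"],
        ["flyfield", "--input-pdf", f"{base}-fields.pdf", "--fill", csv_name],
        ["flyfield", "--input-pdf", f"{base}-fields-filled.pdf", "--capture"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling run_all(build_commands()) would execute the whole sequence. Running the steps one at a time, as described below, is still preferable the first time through so that the markup PDF can be reviewed before fields are generated.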
Step 1: Copy the Source PDF
Begin by copying the original tax return PDF (on macOS or Linux, use cp in place of copy):
copy Trust-tax-return-2024.pdf example.pdf
Outputs:
- example.pdf: copied working PDF file.
This protects the integrity of the original form by avoiding accidental overwrites or corruption. It also keeps all outputs organized and traceable by using a convenient base filename, example.pdf.
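For a platform-independent version of the copy step, something like the following could be used. This is a small sketch using the standard library only. The helper name make_working_copy is hypothetical, and refusing to overwrite an existing working file is a design choice made here, not flyfield behavior.

```python
import shutil
from pathlib import Path


def make_working_copy(source, working):
    """Copy source to working, refusing to overwrite an existing working file."""
    source, working = Path(source), Path(working)
    if working.exists():
        # Stop rather than silently replace earlier work.
        raise FileExistsError(f"{working} already exists")
    shutil.copyfile(source, working)
    return working
```

For this example the call would be make_working_copy("Trust-tax-return-2024.pdf", "example.pdf").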
Step 2: Create Markup PDF for Verification
Create a visual markup of the fields for manual verification:
flyfield --input-pdf example.pdf --markup
Outputs:
- example-markup.pdf: PDF showing detected field boxes and codes.
- example.csv: updated field codes and metadata.
Review example-markup.pdf carefully to validate the correctness and alignment of the detected form fields against the original.
Step 3: Generate Interactive PDF Form Fields
The --fields option tells flyfield to detect placeholder boxes and add interactive fields.
On long documents (e.g. 20 pages), this process can be slow.
Important best practices:
- Always process the full PDF to preserve consistent field codes that embed the original page number.
- Do not physically extract and save partial-page PDFs before running flyfield; doing so changes the page-numbering context and breaks reproducibility.
If you want to limit processing while preserving field code consistency, use the --pdf-pages option. This processes only the selected pages but retains the original page numbering.
Example: restrict processing to pages 1, 2, 4-6, 11, 12, and 14 by running:
flyfield --input-pdf example.pdf --pdf-pages="1,2,4-6,11,12,14" --fields
Outputs:
- example-fields.pdf: PDF with interactive form fields added
- example-field-generator.py: Python script used to generate those fields
- example.csv: extracted field codes and metadata
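The page list passed to --pdf-pages above matches the pages that appear in the CSV codes, so it can be derived from them. The sketch below assumes the first part of each code is the page number. It collapses runs of three or more consecutive pages into a range, which reproduces the value used in this example. The function name page_spec is hypothetical.

```python
def page_spec(codes):
    """Build a --pdf-pages style string from field codes such as '4-6-1'."""
    # Assumption: the first hyphen-separated part of a code is the page number.
    pages = sorted({int(code.split("-")[0]) for code in codes})

    # Collect runs of consecutive pages as [start, end] pairs.
    runs = []
    for page in pages:
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page
        else:
            runs.append([page, page])

    # Write runs of three or more pages as a range, shorter runs page by page.
    parts = []
    for start, end in runs:
        if end - start >= 2:
            parts.append(f"{start}-{end}")
        else:
            parts.extend(str(page) for page in range(start, end + 1))
    return ",".join(parts)


codes = ["1-3-1", "2-4-1", "4-6-1", "5-1-1", "6-3-1", "11-7-1", "12-2-1", "14-2-2"]
print(page_spec(codes))  # prints 1,2,4-6,11,12,14
```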
Advanced User Bonus: Editing the Python Field Generator Script
This generated Python script can be modified to customize or unify fields, for instance to link repeated field entries across pages.
Example: Linking the Tax File Number (TFN) fields at the top of page 5 and page 1.
- Open example-field-generator.py in a text editor
- Locate the code block that defines fields at the top of page 5
- Change each field name in that section from something like "5-1-1" to the corresponding page 1 TFN field name, e.g., "1-3-1", "1-3-2", "1-3-3"
- Save your changes
- Regenerate the fields PDF by running:
python example-field-generator.py
This rebuilds example-fields.pdf with linked fields, so entering the TFN once will automatically update it on both pages.
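The renaming can also be done programmatically instead of by hand. The following is a rough sketch that assumes field names appear in the generated script as double-quoted string literals such as "5-1-1". The actual layout of example-field-generator.py is not shown here, so check the result before rerunning it. The function relink_fields and the add_field call in the sample text are hypothetical.

```python
import re


def relink_fields(script_text, mapping):
    """Replace double-quoted field names in script_text according to mapping."""

    def swap(match):
        name = match.group(1)
        # Names without an entry in the mapping are left unchanged.
        return f'"{mapping.get(name, name)}"'

    return re.sub(r'"(\d+-\d+-\d+)"', swap, script_text)


# Page 5 TFN fields mapped to the page 1 TFN fields, as in the steps above.
tfn_links = {"5-1-1": "1-3-1", "5-1-2": "1-3-2", "5-1-3": "1-3-3"}

# Stand-in text; the real generator script will look different.
sample = 'add_field("5-1-1")\nadd_field("5-1-2")\nadd_field("5-8-1")\n'
relinked = relink_fields(sample, tfn_links)
print(relinked)
```

Matching only whole quoted names keeps unrelated fields such as "5-8-1" untouched.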
Step 4: Fill Form Fields With CSV Data
After confirming the fields, supply a CSV file containing at least two columns, code and fill, to fill the interactive form fields:
flyfield --input-pdf example-fields.pdf --fill example-fields-filled-capture.csv
Outputs:
- example-fields-filled.pdf: filled PDF form with CSV data inserted
- example-fields-filler.py: Python script used to perform filling (editable for customization)
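Before running the fill step, it can help to sanity-check the CSV. The sketch below verifies that the code and fill columns are present and flags codes that do not look like the three-part codes in this example or that appear more than once. The three-number pattern is inferred from the sample data, and whether flyfield itself rejects such rows is not known, so treat the findings as hints only. The name check_fill_csv is hypothetical.

```python
import csv
import io
import re

# Assumption: codes look like "page-number-number", as in the sample CSV.
CODE_PATTERN = re.compile(r"^\d+-\d+-\d+$")


def check_fill_csv(csv_text):
    """Return a list of human-readable problems found in the fill CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"code", "fill"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]

    problems = []
    seen = set()
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(reader, start=2):
        code = (row["code"] or "").strip()
        if not CODE_PATTERN.match(code):
            problems.append(f"line {line_no}: unexpected code {code!r}")
        elif code in seen:
            problems.append(f"line {line_no}: duplicate code {code!r}")
        seen.add(code)
    return problems
```

An empty list means nothing suspicious was found. Pass in the text of example-fields-filled-capture.csv to check the file used in this example.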
Step 5: Capture Data From Filled Form
Capture the filled form data back into a CSV file for review or reuse:
flyfield --input-pdf example-fields-filled.pdf --capture
Outputs:
- example-fields-filled-capture.csv: extracted form data, which can be edited and reused.
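To confirm the round trip, the captured values can be compared with the values that were supplied for filling. The sketch below assumes the captured CSV has the same code and fill columns as the input. Note that in this example the capture output has the same filename as the fill input, so keep a separate copy of the input first if you want to compare the two. The helper names load_fills and diff_fills are hypothetical.

```python
import csv
import io


def load_fills(csv_text):
    """Map each code to its fill value from CSV text with code,fill columns."""
    return {row["code"]: row["fill"] for row in csv.DictReader(io.StringIO(csv_text))}


def diff_fills(expected, captured):
    """Return {code: (expected value, captured value)} for every mismatch."""
    diffs = {}
    for code in sorted(set(expected) | set(captured)):
        before, after = expected.get(code), captured.get(code)
        if before != after:
            # None marks a code that is missing on one side.
            diffs[code] = (before, after)
    return diffs
```

An empty result from diff_fills means every code came back with the value that went in.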
Demonstrated
- Working safely with a copy of the original PDF to protect the source
- Validating detected form fields visually by generating a markup PDF
- Improving efficiency by generating fields only on required pages
- Tailoring workflows by direct editing and rerunning the generated Python field generation script
- Managing data consistency by reusing CSV data for filling and capturing
This example provides a practical, real-world automation workflow for PDF form processing with flyfield. It assumes the basic familiarity covered by the Quick Start Guide and guides users through managing complex tax return forms safely and effectively.