
Worked Example: Automating a Tax Return PDF Form

This detailed, step-by-step example shows how to use flyfield to extract, mark up, generate form fields for, fill, and capture data from a real, complex form. It starts by copying the original PDF to a new working filename, which protects the source file and serves as the base name for every output file in the workflow.


Prerequisites

Before following this example, you should be familiar with the Quick Start Guide.

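The data used to fill the form in this example is the following CSV, where the code column identifies each field and the fill column holds its value: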
code,fill
1-3-1,123
1-3-2,456
1-3-3,789
1-4-1,GREENFIELD FAMILY TRUST
1-6-1,12
1-6-2,345
1-6-3,678
1-6-4,901
1-9-1,45 MARKET STREET
1-11-1,SYDNEY
1-11-2,NSW
1-11-3,2000
2-4-1,DOE & CO TRUSTEES PTY LTD
2-6-1,98
2-6-2,765
2-6-3,432
2-6-4,109
2-6-5,(02) 9876 5432
2-10-1,D
2-15-1,62000
2-15-2,123
2-15-3,456
2-15-4,789
2-16-1,GREENFIELD FAMILY TRUST
4-6-1,3606
4-8-1,22870
4-10-1,26476
4-12-1,7069.35
5-1-1,123
5-1-2,456
5-1-3,789
5-8-1,26476
5-17-1,26476
6-3-1,463
6-3-5,463
6-4-1,15.64
6-6-1,26939
6-8-1,26939
11-7-1,26939
12-2-1,X
12-3-1,DOE
12-4-1,JOHN
12-11-1,1
12-11-2,4
12-11-3,1945
12-12-1,30
12-13-1,8889
12-25-1,1190
12-28-1,152
12-30-1,5.16
12-33-1,7547
12-35-1,2332.89
14-2-2,X
14-3-1,DOE
14-4-1,JANE
14-11-1,25
14-11-2,12
14-11-3,1950
14-13-1,17779
14-25-1,2380
14-28-1,305
14-30-1,10.32
14-33-1,15094
14-35-1,4665.77
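If you want to inspect or validate this data before filling, a few lines of standard-library Python are enough. The sketch below is not part of flyfield: it reads a CSV with the required code and fill columns and groups the rows by page, assuming (as Step 3 notes) that the first component of each code is the original page number. This guide does not spell out what the remaining components mean, so the sketch does not rely on them. Values are kept as strings, so entries such as 7069.35 or (02) 9876 5432 pass through unchanged. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the CSV above; for the real file, use
# open("example-fields-filled-capture.csv", newline="") instead of StringIO.
SAMPLE = """code,fill
1-3-1,123
1-3-2,456
1-4-1,GREENFIELD FAMILY TRUST
2-6-5,(02) 9876 5432
12-11-3,1945
"""

def load_fill_csv(f):
    """Return (code, fill) pairs after checking the two required columns exist."""
    reader = csv.DictReader(f)
    missing = {"code", "fill"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing column(s): {sorted(missing)}")
    return [(row["code"], row["fill"]) for row in reader]

def group_by_page(rows):
    """Group rows by page, assuming the page number is the code's first component."""
    pages = defaultdict(list)
    for code, fill in rows:
        page = int(code.split("-", 1)[0])
        pages[page].append((code, fill))
    return dict(pages)

rows = load_fill_csv(io.StringIO(SAMPLE))
by_page = group_by_page(rows)
print(sorted(by_page))  # [1, 2, 12]
```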

Workflow Overview

  1. Copy the original PDF to a working filename
  2. Create a marked-up PDF to verify field positions
  3. Generate interactive form fields
  4. Fill form fields using CSV data
  5. Capture filled form data back into a CSV for editing or reuse
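These steps can also be scripted. The following is a minimal sketch, not part of flyfield itself, that builds the same flyfield command lines used in Steps 2 to 5 of this guide and, only when dry_run is False, runs them with subprocess. The names workflow_commands and run_workflow are hypothetical. It defaults to a dry run that just prints the commands, because in practice you will want to stop after Step 2 to review the markup PDF before generating and filling fields.

```python
import shutil
import subprocess

def workflow_commands(base="example", pages="1,2,4-6,11,12,14"):
    """Return the flyfield command lines for Steps 2-5, using only flags shown in this guide."""
    return [
        # Step 2: markup PDF for visual verification
        ["flyfield", "--input-pdf", f"{base}.pdf", "--markup"],
        # Step 3: generate interactive fields on the selected pages only
        ["flyfield", "--input-pdf", f"{base}.pdf", f"--pdf-pages={pages}", "--fields"],
        # Step 4: fill the generated fields from a code/fill CSV
        ["flyfield", "--input-pdf", f"{base}-fields.pdf",
         "--fill", f"{base}-fields-filled-capture.csv"],
        # Step 5: capture the filled values back into a CSV
        ["flyfield", "--input-pdf", f"{base}-fields-filled.pdf", "--capture"],
    ]

def run_workflow(base="example", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    if not dry_run and shutil.which("flyfield") is None:
        raise RuntimeError("flyfield was not found on PATH")
    for argv in workflow_commands(base):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step

run_workflow()  # dry run: prints the four commands without executing them
```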

Step 1: Copy the Source PDF

Begin by copying the original tax return PDF (copy is the Windows command; on macOS or Linux, use cp instead):

copy Trust-tax-return-2024.pdf example.pdf

Outputs:

  • example.pdf — copied working PDF file.

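Working on a copy protects the original form from accidental overwrites or corruption. It also keeps the outputs organized and traceable, because every file produced in later steps (example-markup.pdf, example-fields.pdf, example-fields-filled.pdf, and so on) is named after the base filename example.pdf.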
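A cross-platform alternative to the shell copy is sketched below in Python. make_working_copy and expected_outputs are hypothetical helpers, not flyfield functions: the first opens the destination in exclusive-create mode so it can never overwrite a working copy you have already started on, and the second lists the output filenames that this guide says each step derives from the base name.

```python
import shutil
from pathlib import Path

def make_working_copy(source, working="example.pdf"):
    """Copy the source PDF to a working file, refusing to overwrite an existing one."""
    src, dst = Path(source), Path(working)
    # "xb" mode raises FileExistsError if the working file already exists.
    with open(src, "rb") as fin, open(dst, "xb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst

def expected_outputs(working="example.pdf"):
    """Output names listed in this guide, derived from the working file's base name."""
    stem = Path(working).stem  # "example" for "example.pdf"
    return {
        "markup": f"{stem}-markup.pdf",                  # Step 2
        "codes": f"{stem}.csv",                          # Steps 2 and 3
        "fields": f"{stem}-fields.pdf",                  # Step 3
        "generator": f"{stem}-field-generator.py",       # Step 3
        "filled": f"{stem}-fields-filled.pdf",           # Step 4
        "filler": f"{stem}-fields-filler.py",            # Step 4
        "capture": f"{stem}-fields-filled-capture.csv",  # Step 5
    }

# Usage:
# make_working_copy("Trust-tax-return-2024.pdf", "example.pdf")
# print(expected_outputs("example.pdf")["fields"])
```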


Step 2: Create Markup PDF for Verification

Create a visual markup of the fields for manual verification:

flyfield --input-pdf example.pdf --markup

Outputs:

  • example-markup.pdf — PDF showing detected field boxes and codes.
  • example.csv — updated field codes and metadata.

Review example-markup.pdf carefully to confirm that each detected field box is correctly positioned and aligned with the corresponding box on the original form.


Step 3: Generate Interactive PDF Form Fields

The --fields option tells flyfield to detect placeholder boxes and add interactive fields.

On long documents (e.g. 20 pages), this process can be slow.

Important best practices:

  • Always process the full PDF to preserve consistent field codes that embed the original page number.
  • Do not extract selected pages into a separate PDF before running flyfield. Doing so renumbers the pages, so the generated field codes no longer match those of the full form and results are not reproducible.

If you want to limit processing while preserving field code consistency, use the --pdf-pages option. This processes only selected pages but retains original numbering.

For example, to restrict processing to pages 1, 2, 4–6, 11, 12, and 14, run:

flyfield --input-pdf example.pdf --pdf-pages="1,2,4-6,11,12,14" --fields
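To double-check a page selection before a slow run, you can expand the --pdf-pages value and compare it with the pages your fill data uses. This is an illustration only: parse_pages (a hypothetical helper) handles just the comma-and-range syntax used in this example, and flyfield's own parser may accept other forms. It also treats the first component of each field code as the page number. Note that the command itself uses a plain ASCII hyphen in 4-6.

```python
def parse_pages(spec):
    """Expand a page list such as "1,2,4-6,11,12,14" into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            if start > end:
                raise ValueError(f"range runs backwards: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)

def pages_missing_from_selection(codes, spec):
    """Pages used by field codes (first component) that the selection leaves out."""
    selected = set(parse_pages(spec))
    used = {int(code.split("-", 1)[0]) for code in codes}
    return sorted(used - selected)

print(parse_pages("1,2,4-6,11,12,14"))  # [1, 2, 4, 5, 6, 11, 12, 14]
```

For the CSV in this example, every code falls on pages 1, 2, 4, 5, 6, 11, 12, or 14, so pages_missing_from_selection returns an empty list for this selection.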

Outputs:

  • example-fields.pdf — PDF with interactive form fields added
  • example-field-generator.py — Python script used to generate those fields
  • example.csv — extracted field codes and metadata
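Before filling, it can help to confirm that every code in your fill CSV matches a field that flyfield actually generated. This sketch assumes that example.csv has a column named code holding the field codes; this guide only describes the file as containing field codes and metadata, so check your file's header and pass a different column name if it differs. read_codes and unknown_fill_codes are hypothetical helper names.

```python
import csv

def read_codes(path, column="code"):
    """Return the set of values in one column of a CSV file."""
    # utf-8-sig also accepts files that start with a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise ValueError(f"{path} has no {column!r} column")
        return {row[column] for row in reader}

def unknown_fill_codes(detected_csv, fill_csv, column="code"):
    """Codes in the fill CSV with no matching detected field (column name assumed)."""
    return sorted(read_codes(fill_csv) - read_codes(detected_csv, column))

# Usage, with the file names from this guide:
# print(unknown_fill_codes("example.csv", "example-fields-filled-capture.csv"))
```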

Advanced User Bonus: Editing the Python Field Generator Script

This generated Python script can be modified to customize or unify fields, for instance to link repeated field entries across pages.

Example: linking the Tax File Number (TFN) fields at the top of page 5 to the TFN fields on page 1.

  1. Open example-field-generator.py in a text editor
  2. Locate the code block that defines fields at the top of page 5
  3. Change each field name in that section from its page 5 code (something like "5-1-1") to the matching page 1 TFN field name, e.g., "1-3-1", "1-3-2", "1-3-3"
  4. Save your changes
  5. Regenerate the fields PDF by running:
    python example-field-generator.py

This rebuilds example-fields.pdf with linked fields, so entering the TFN once will automatically update it on both pages.
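Linking works because fields that share a name in a PDF form hold a single value, so after the rename the page 5 rows in the fill CSV become redundant. The sketch below does not touch the generator script; it applies the same rename to fill data and raises an error if two linked codes carry different values. The LINKS mapping, which pairs 5-1-1, 5-1-2, and 5-1-3 with 1-3-1, 1-3-2, and 1-3-3 in order, is an assumption based on the matching values (123, 456, 789) in this example's CSV, and apply_links is a hypothetical helper.

```python
# Assumed pairing of the page 5 TFN boxes with the page 1 TFN boxes.
LINKS = {"5-1-1": "1-3-1", "5-1-2": "1-3-2", "5-1-3": "1-3-3"}

def apply_links(rows, links):
    """Rename linked codes and merge them, raising if linked values disagree."""
    merged = {}
    for code, value in rows:
        target = links.get(code, code)  # unlinked codes keep their own name
        if target in merged and merged[target] != value:
            raise ValueError(
                f"{code} is linked to {target} but has value {value!r}, "
                f"not {merged[target]!r}"
            )
        merged[target] = value
    return merged

rows = [("1-3-1", "123"), ("1-3-2", "456"), ("1-3-3", "789"),
        ("5-1-1", "123"), ("5-1-2", "456"), ("5-1-3", "789"), ("5-8-1", "26476")]
print(apply_links(rows, LINKS))
# {'1-3-1': '123', '1-3-2': '456', '1-3-3': '789', '5-8-1': '26476'}
```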


Step 4: Fill Form Fields With CSV Data

After confirming the fields, fill them with your data by supplying a CSV file that has at least two columns, code and fill:

flyfield --input-pdf example-fields.pdf --fill example-fields-filled-capture.csv

Outputs:

  • example-fields-filled.pdf — filled PDF form with CSV data inserted
  • example-fields-filler.py — Python script used to perform filling (editable for customization)
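If your values come from another system, you can generate the fill CSV instead of typing it. This is a minimal sketch using Python's csv module; write_fill_csv is a hypothetical helper name. DictWriter automatically quotes any value that contains a comma, which matters for free-text fields such as names and addresses. When writing a real file, open it with newline="" as the csv module documentation recommends.

```python
import csv
import io

def write_fill_csv(f, values):
    """Write a {code: value} mapping with the required code and fill columns."""
    writer = csv.DictWriter(f, fieldnames=["code", "fill"])
    writer.writeheader()
    for code, value in values.items():
        writer.writerow({"code": code, "fill": value})

buf = io.StringIO()
write_fill_csv(buf, {
    "1-4-1": "GREENFIELD FAMILY TRUST",
    "2-6-5": "(02) 9876 5432",
    "1-9-1": "45 MARKET STREET, SYDNEY",  # hypothetical value containing a comma
})
print(buf.getvalue())

# For a real file:
# with open("example-fields-filled-capture.csv", "w", newline="", encoding="utf-8") as f:
#     write_fill_csv(f, values)
```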

Step 5: Capture Data From Filled Form

Capture the filled form data back into a CSV file for review or reuse:

flyfield --input-pdf example-fields-filled.pdf --capture

Outputs:

  • example-fields-filled-capture.csv — extracted form data, which can be edited and reused.
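Because the captured CSV can be fed back in with --fill (Step 4 uses this very filename), it presumably uses the same code and fill columns; that is an inference, so confirm it against your own output. Assuming it does, the sketch below compares the data you filled with the data captured back and reports codes that are missing, unexpected, or changed. diff_fill is a hypothetical helper, the sample values are illustrative, and ignore_empty drops blank captured values in case the capture also lists unfilled fields, which this guide does not specify.

```python
def diff_fill(expected, captured, ignore_empty=True):
    """Compare {code: value} mappings; return (missing, extra, changed) code lists."""
    if ignore_empty:
        # Drop blank captured values in case unfilled fields are listed too.
        captured = {c: v for c, v in captured.items() if v != ""}
    missing = sorted(expected.keys() - captured.keys())
    extra = sorted(captured.keys() - expected.keys())
    changed = sorted(c for c in expected.keys() & captured.keys()
                     if expected[c] != captured[c])
    return missing, extra, changed

# Each mapping could be loaded with:
# {row["code"]: row["fill"] for row in csv.DictReader(f)}
expected = {"1-3-1": "123", "4-12-1": "7069.35", "12-2-1": "X"}
captured = {"1-3-1": "123", "4-12-1": "7069.3", "14-2-2": "", "99-1-1": "Y"}
print(diff_fill(expected, captured))
# (['12-2-1'], ['99-1-1'], ['4-12-1'])
```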

What This Example Demonstrates

  • Working safely with a copy of the original PDF to protect the source
  • Validating detected form fields visually by generating a markup PDF
  • Improving efficiency by generating fields only on required pages
  • Tailoring the workflow by editing and rerunning the generated Python field generator script
  • Managing data consistency by reusing CSV data for filling and capturing

This example provides a practical, real-world workflow for automating PDF form processing with flyfield. Building on the basics covered in the Quick Start Guide, it guides you through managing a complex tax return form safely and effectively.