API Documentation for flyfield

This documentation provides a detailed reference to the flyfield Python API for programmatically working with PDF forms that use white box placeholders.

Overview

The flyfield API automates workflows including:

- Extracting white box placeholders from vector PDFs
- Filtering, deduplicating, and grouping detected regions into logical fields
- Generating interactive AcroForm fields in PDFs programmatically
- Filling form fields with data from CSV files
- Capturing data back from filled PDFs into CSV

The API is modular and can be imported into Python projects, offering programmable control beyond the CLI.

Key Modules and Functions

1. Extraction (`extract.py`)

extract_boxes(pdf_path: str) -> List[dict] Extracts all white boxes from a PDF that match config.TARGET_COLOUR (pure white by default).
- Converts coordinates to the standard bottom-left PDF system.
- Returns a list of box dictionaries with metadata such as page_num, bbox, chars, and field_type.
filter_boxes(page: fitz.Page, boxes: List[dict]) -> List[dict] Filters raw boxes by:
- Size (MIN_BOX_HEIGHT, MAX_BOX_HEIGHT)
- Allowed text (utils.allowed_text)
- Retains only candidate placeholders.
remove_duplicates(boxes: List[dict]) -> List[dict] Removes duplicates based on rounded coordinates on each page.
sort_boxes(boxes: List[dict], decimal_places: int=0) -> List[dict] Sorts results top-to-bottom, then left-to-right.
process_boxes(pdf_path: str, csv_path: str) -> Dict[int, List[dict]] Full extraction pipeline:
- Extract → Filter → Deduplicate → Sort
- Compute layout fields (calculate_layout_fields)
- Assign numeric block types (assign_numeric_blocks)
- Save annotated results to CSV
Returns a dictionary keyed by page_num.

2. Layout (`layout.py`)

calculate_layout_fields(boxes: List[dict]) -> Dict[int, List[dict]] Annotates box rows with:
- IDs, line numbers, block grouping
- Block length/width
- Concatenated block_fill text or formatted money values
assign_numeric_blocks(page_dict: Dict[int, List[dict]]) -> Dict[int, List[dict]] Merges sequential numeric blocks (e.g. ### ### ## patterns) into currency fields.
- Assigns "Currency" or "CurrencyDecimal" where applicable.

3. CSV I/O (`io_utils.py`)

load_boxes_from_csv(csv_path: str) -> Dict[int, List[dict]] Reads CSV data into a page dictionary for further processing.
write_csv(data, csv_path: str) -> None Writes box/page data back to CSV in canonical format.
- Ensures only one fill column is stored (block_fill or fallback fill).
read_csv_rows(filename: str) -> List[dict] Reads CSV into dictionaries, parsing numeric fills with parse_money_space or parse_implied_decimal.
save_pdf_form_data_to_csv(pdf_path: str, csv_path: str, boxes: dict=None) -> None Captures filled AcroForm values from a PDF and writes them to CSV.
- Applies NUMERIC_FIELD_TYPES parsing rules.
- Uppercases strings where applicable.

4. Markup and Field Scripts (`markup_and_fields.py`)

markup_pdf(pdf_path: str, page_dict: Dict[int,List[dict]], output_pdf: str, mark_color=(0,0,1)) -> None Creates a debug PDF marking detected fields with circles and rotated field codes.
generate_form_fields_script(csv_path: str, input_pdf: str, output_pdf: str, script_path: str) -> str Generates a standalone Python script that adds AcroForm fields to a given PDF, based on detected CSV data.
run_standalone_script(script_path: str) -> None Executes the generated script in a subprocess to apply fields.
run_fill_pdf_fields(csv_path: str, output_pdf: str, template_pdf: str, generator_script: str, boxes: dict=None) -> None Generates and runs a filler script that populates an interactive PDF with values from a CSV.
- Supports monetary formatting via format_money_space.
- Supports normalization of Currency/CurrencyDecimal values by stripping non-digits.

5. Utilities (`utils.py`)

add_suffix_to_filename(filename: str, suffix: str) -> str Adds a suffix before the file extension.
colour_match(color: Tuple, target_color=(1,1,1), tol=1e-3) -> bool Compares normalized RGB colors with tolerance.
int_to_rgb(color_int: int) -> Tuple[float,float,float] Converts an integer 0xRRGGBB color to normalized floats.
clean_fill_string(line_text: str) -> str Removes single spaces but preserves aligned spacing.
allowed_text(text: str, field_type: Optional[str]) -> Tuple[bool, Optional[str]] Checks whether a string value inside a field is allowed (filters out pre-printed text).
format_money_space(amount: Union[float,int], decimal=True) -> str Formats numeric values with:
- Space as thousand separator
- Space as decimal marker (if decimal=True)
parse_money_space(s: str, decimal=True) -> Union[int,float] Parses strings formatted above back into numbers.
parse_implied_decimal(s: str) -> float Parses numbers treating the last two digits as cents.
parse_pages(pages_str: str) -> List[int] Parses "1,3-5,7" into [1,3,4,5,7].
conditional_merge_list(main_list, ref_list, match_key, keys_to_merge) Merges keys from a reference list into a main list when values of match_key match.
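
To make the page-range syntax accepted by parse_pages concrete, here is a minimal sketch of a parser for strings such as "1,3-5,7". The name `parse_pages_sketch` is hypothetical and written only for illustration; the real `utils.parse_pages` may differ in validation, ordering, and duplicate handling.

```python
def parse_pages_sketch(pages_str: str) -> list[int]:
    """Expand a spec like "1,3-5,7" into a sorted list of page numbers.

    Hypothetical stand-in for utils.parse_pages, not the library code.
    """
    pages: set[int] = set()
    for part in pages_str.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # "3-5" is an inclusive range
            start, end = (int(p) for p in part.split("-", 1))
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages_sketch("1,3-5,7"))  # [1, 3, 4, 5, 7]
```

Collecting pages in a set is a design choice of this sketch: overlapping ranges collapse to unique page numbers.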

Field Data Structure

flyfield represents form fields as dictionaries (not classes):

Key	Type	Description
`code`	str	Unique identifier (`page-line-block` naming scheme)
`page_num`	int	PDF page number (1-based)
`x0,y0,x1,y1`	float	Bounding box coordinates (PDF bottom-left system)
`left, right`	float	Rounded left/right coordinates
`top, bottom`	float	Rounded top/bottom coordinates
`line`	int	Line number on page
`block`	int	Block number within line
`block_length`	int	Number of boxes in block
`block_width`	float	Width of block in points
`field_type`	str	One of `"Dollars"`, `"DollarCents"`, `"Currency"`, etc.
`chars`	str	Non-black overlay text extracted
`fill`	str/num	Black text found inside the box (user values, may be pre-filled)
`block_fill`	str/num	Aggregated/normalized block fill
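
As an illustration of the table above, one box dictionary might look like the following. The `FieldBox` type is a hypothetical helper for readability and is not part of flyfield. All values are made up, and the exact rendering of `code` (shown here as "1-3-2") is an assumption based on the `page-line-block` naming scheme.

```python
from typing import Optional, TypedDict, Union


class FieldBox(TypedDict, total=False):
    """Illustrative type for one box dictionary; not defined by flyfield."""
    code: str
    page_num: int
    x0: float
    y0: float
    x1: float
    y1: float
    left: float
    right: float
    top: float
    bottom: float
    line: int
    block: int
    block_length: int
    block_width: float
    field_type: Optional[str]
    chars: str
    fill: Union[str, int, float]
    block_fill: Union[str, int, float]


box: FieldBox = {
    "code": "1-3-2",  # assumed format: page 1, line 3, block 2
    "page_num": 1,
    "x0": 72.004, "y0": 671.996, "x1": 86.4, "y1": 691.2,
    "left": 72.0, "bottom": 672.0, "right": 86.4, "top": 691.2,
    "line": 3, "block": 2, "block_length": 4, "block_width": 57.6,
    "field_type": "Dollars",
    "chars": "", "fill": "", "block_fill": "",
}

# Height in points, measured in the bottom-left PDF coordinate system
height = box["y1"] - box["y0"]
```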

Example Usage

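from flyfield.extract import process_boxes
from flyfield.io_utils import save_pdf_form_data_to_csv
from flyfield.markup_and_fields import (
    generate_form_fields_script,
    markup_pdf,
    run_fill_pdf_fields,
    run_standalone_script,
)

# Process boxes and save CSV
page_dict = process_boxes("example.pdf", "example.csv")

# Generate a markup PDF
markup_pdf("example.pdf", page_dict, "example-markup.pdf")

# Generate and run the script that adds AcroForm fields (creates example-fields.pdf)
generate_form_fields_script("example.csv", "example.pdf",
                            "example-fields.pdf", "example-field-generator.py")
run_standalone_script("example-field-generator.py")

# Fill fields with values from a CSV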
run_fill_pdf_fields("example.csv",
                    "example-filled.pdf",
                    "example-fields.pdf",
                    "example-filler.py",
                    page_dict)

# Capture back to CSV after filling
save_pdf_form_data_to_csv("example-filled.pdf", "example-capture.csv", page_dict)

Info

flyfield depends on PyMuPDF (fitz) for box extraction and markup, and PyPDFForm for form field creation and filling.
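Monetary/Currency parsing is opinionated: amounts use a space as both the thousands separator and the decimal marker, and decimal values without a separator are read with the last two digits as cents.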
All generated scripts (-field-generator.py, -filler.py) are standalone and reusable in case of workflow adjustments.
Debug logging (--debug) outputs stepwise CSVs for troubleshooting.

Further Resources

Configuration Reference — adjustable thresholds and suffixes
Developer Guide — core architecture and extension points
Worked Example — end-to-end workflow with CSV integration

The reference below is generated automatically from the source code by mkdocstrings.

Core Modules

`flyfield.extract`

Extraction functions for PDF processing.

Provides methods to extract PDF box data and text.

`extract_boxes(pdf_path)`

Extract filled rectangles (boxes) from a PDF matching a target color.

Parameters:

Name	Type	Description	Default
`pdf_path`	`str`	Path to the input PDF file.	required

Returns:

Type	Description
`List[Dict]`	list of dict: Each dict details box coordinates (PDF coordinates, origin bottom-left), page number, and other metadata for detected boxes.

Notes

Converts PyMuPDF coordinates (origin top-left) to PDF standard bottom-left origin. Only boxes filled with the target color are extracted.
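
The coordinate flip described in this note can be shown with plain arithmetic. This is a small sketch; the helper name `to_pdf_y` and the page height are illustrative assumptions, while the formula mirrors the one in the source listing for extract_boxes.

```python
def to_pdf_y(page_height: float, y0_top: float, y1_top: float) -> tuple[float, float]:
    """Flip a top-left-origin vertical span into bottom-left-origin PDF space.

    Hypothetical helper; extract_boxes performs the same subtraction inline.
    """
    # The lower edge in PDF space comes from the larger top-left y value.
    return page_height - y1_top, page_height - y0_top


page_height = 792.0  # US Letter height in points (illustrative)
pdf_y0, pdf_y1 = to_pdf_y(page_height, 100.0, 120.0)
print(pdf_y0, pdf_y1)  # 672.0 692.0

# Applying the same flip again recovers the original span,
# which is how filter_boxes builds its clip rectangle.
back_y0, back_y1 = to_pdf_y(page_height, pdf_y0, pdf_y1)
print(back_y0, back_y1)  # 100.0 120.0
```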

Source code in flyfield/extract.py

def extract_boxes(pdf_path: str) -> List[Dict]:
    """
    Extract filled rectangles (boxes) from a PDF matching a target color.

    Args:
        pdf_path (str): Path to the input PDF file.

    Returns:
        list of dict: Each dict details box coordinates (PDF coordinates, origin bottom-left),
        page number, and other metadata for detected boxes.

    Notes:
        Converts PyMuPDF coordinates (origin top-left) to PDF standard bottom-left origin.
        Only boxes filled with the target color are extracted.
    """
    boxes = []
    try:
        with fitz.open(pdf_path) as doc:
            for page_num in range(1, len(doc) + 1):
                try:
                    page = doc[page_num - 1]
                except IndexError:
                    logger.warning(f"Page {page_num} not found in document.")
                    continue
                page_height = page.rect.height
                for drawing in page.get_drawings():
                    rect = drawing.get("rect")
                    fill_color = drawing.get("fill")
                    if rect and colour_match(fill_color, target_color=TARGET_COLOUR):
                        # Convert PyMuPDF page coordinates (origin top-left)
                        # to PDF coordinate system (origin bottom-left)

                        pdf_y0 = page_height - rect.y1
                        pdf_y1 = page_height - rect.y0
                        boxes.append(
                            {
                                "page_num": page_num,
                                "x0": rect.x0,
                                "y0": pdf_y0,
                                "x1": rect.x1,
                                "y1": pdf_y1,
                                "left": round(rect.x0, 2),
                                "bottom": round(pdf_y0, 2),
                                "right": round(rect.x1, 2),
                                "top": round(pdf_y1, 2),
                                "chars": "",
                                "field_type": None,
                            }
                        )
    except Exception as e:
        logger.error(f"Could not open PDF file {pdf_path}: {e}")
    return boxes

`filter_boxes(page, boxes)`

Filter a list of boxes on a PDF page based on height and allowed text.

Parameters:

Name	Type	Description	Default
`page`	`Page`	PyMuPDF page object.	required
`boxes`	`list of dict`	List of box dictionaries to filter.	required

Returns:

Type	Description
`List[Dict]`	list of dict: Filtered boxes that meet size and allowed text criteria.

Notes

Excludes boxes outside valid height ranges or with disallowed text.
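
Text inside each box is split into black and non-black spans before the allowed-text check. The sketch below shows one way that classification could work. Both function names are hypothetical stand-ins for `utils.int_to_rgb` and `utils.colour_match`, whose real implementations are not shown on this page; the 0xRRGGBB layout and the tolerance default come from the summaries above.

```python
def int_to_rgb_sketch(color_int: int) -> tuple[float, float, float]:
    """Split a 0xRRGGBB integer into normalized floats (assumed behaviour)."""
    r = (color_int >> 16) & 0xFF
    g = (color_int >> 8) & 0xFF
    b = color_int & 0xFF
    return (r / 255.0, g / 255.0, b / 255.0)


def colour_match_sketch(color, target=(1.0, 1.0, 1.0), tol=1e-3) -> bool:
    """Compare two normalized colours channel by channel within a tolerance."""
    if color is None or len(color) != len(target):
        return False  # assumption: a missing colour never matches
    return all(abs(c - t) <= tol for c, t in zip(color, target))


black = (0.0, 0.0, 0.0)
print(colour_match_sketch(int_to_rgb_sketch(0x000000), black))  # True
print(colour_match_sketch(int_to_rgb_sketch(0x0000FF), black))  # False
```

In filter_boxes, spans that match black are collected into `fill` and all other spans into `chars`.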

Source code in flyfield/extract.py

def filter_boxes(page: fitz.Page, boxes: List[Dict]) -> List[Dict]:
    """
    Filter a list of boxes on a PDF page based on height and allowed text.

    Args:
        page (fitz.Page): PyMuPDF page object.
        boxes (list of dict): List of box dictionaries to filter.

    Returns:
        list of dict: Filtered boxes that meet size and allowed text criteria.

    Notes:
        Excludes boxes outside valid height ranges or with disallowed text.
    """
    filtered = []
    page_height = page.rect.height
    black = (0, 0, 0)  # RGB for black text matching

    for box in boxes:
        height = box.get("y1", 0) - box.get("y0", 0)
        if height < MIN_BOX_HEIGHT or height > MAX_BOX_HEIGHT:
            continue
        # Convert box coordinates to PyMuPDF's coordinate system for clipping

        pymupdf_y0 = page_height - box["y1"]
        pymupdf_y1 = page_height - box["y0"]
        clip_rect = fitz.Rect(box["x0"], pymupdf_y0, box["x1"], pymupdf_y1)

        text_dict = page.get_text("dict", clip=clip_rect)

        black_text_parts = []
        non_black_text_parts = []

        for block in text_dict.get("blocks", []):
            for line in block.get("lines", []):
                for span in line.get("spans", []):
                    span_text = span.get("text", "").strip()
                    if not span_text:
                        continue
                    span_color = span.get("color")
                    rgb = None
                    if span_color is not None:
                        if isinstance(span_color, int):
                            rgb = int_to_rgb(span_color)
                        elif isinstance(span_color, str):
                            try:
                                rgb = fitz.utils.getColor(span_color)
                            except Exception:
                                rgb = None
                    if rgb and colour_match(rgb, target_color=black):
                        black_text_parts.append(span_text)
                    else:
                        non_black_text_parts.append(span_text)
        fill_text = "".join(black_text_parts)
        box_text = "".join(non_black_text_parts)

        allowed, detected_field_type = allowed_text(
            box_text, field_type=box.get("field_type")
        )
        if box_text and not allowed:
            continue
        box["field_type"] = detected_field_type
        box["chars"] = box_text
        box["fill"] = fill_text
        filtered.append(box)
    return filtered

`remove_duplicates(boxes)`

Remove duplicate boxes on the same page based on rounded coordinates.

Parameters:

Name	Type	Description	Default
`boxes`	`list of dict`	List of box dictionaries.	required

Returns:

Type	Description
`List[Dict]`	list of dict: Boxes with duplicates removed.
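
The effect of the rounded-coordinate key can be seen on a few hand-made boxes. This is a compact sketch of the same idea, not the library function: `dedupe_sketch` is a hypothetical name, and unlike remove_duplicates it keeps the input order rather than grouping the output by page.

```python
from collections import defaultdict


def dedupe_sketch(boxes: list[dict]) -> list[dict]:
    """Keep the first box for each (page, rounded bbox) combination."""
    seen_per_page: dict[int, set] = defaultdict(set)
    kept = []
    for box in boxes:
        # Coordinates that agree to 3 decimal places count as the same box.
        key = tuple(round(box[k], 3) for k in ("x0", "y0", "x1", "y1"))
        if key not in seen_per_page[box["page_num"]]:
            seen_per_page[box["page_num"]].add(key)
            kept.append(box)
    return kept


boxes = [
    {"page_num": 1, "x0": 72.0001, "y0": 672.0, "x1": 86.4, "y1": 691.2},
    {"page_num": 1, "x0": 72.0004, "y0": 672.0, "x1": 86.4, "y1": 691.2},  # duplicate
    {"page_num": 2, "x0": 72.0001, "y0": 672.0, "x1": 86.4, "y1": 691.2},  # other page
    {"page_num": 1, "x0": 72.0020, "y0": 672.0, "x1": 86.4, "y1": 691.2},  # distinct
]
unique = dedupe_sketch(boxes)
print(len(unique))  # 3
```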

Source code in flyfield/extract.py

def remove_duplicates(boxes: List[Dict]) -> List[Dict]:
    """
    Remove duplicate boxes on the same page based on rounded coordinates.

    Args:
        boxes (list of dict): List of box dictionaries.

    Returns:
        list of dict: Boxes with duplicates removed.
    """
    page_groups = defaultdict(list)
    for box in boxes:
        page_groups[box["page_num"]].append(box)
    cleaned = []
    for _page_num, page_boxes in page_groups.items():
        seen = set()
        for box in page_boxes:
            key = (
                round(box["x0"], 3),
                round(box["y0"], 3),
                round(box["x1"], 3),
                round(box["y1"], 3),
            )
            if key not in seen:
                seen.add(key)
                cleaned.append(box)
    return cleaned

`sort_boxes(boxes, decimal_places=0)`

Sort boxes by page number, top-to-bottom (descending), then left-to-right.

Parameters:

Name	Type	Description	Default
`boxes`	`list of dict`	List of boxes to sort.	required
`decimal_places`	`int`	Precision for vertical grouping (bottom coordinate rounding).	`0`

Returns:

Type	Description
`List[Dict]`	list of dict: Sorted boxes.
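
The `decimal_places` argument matters more than its default suggests, because process_boxes calls sort_boxes with `decimal_places=-1`. The sketch below reuses the sort key from the source listing on this page to show the difference. The `name` key and the sample coordinates are illustrative assumptions.

```python
def sort_boxes_sketch(boxes: list[dict], decimal_places: int = 0) -> list[dict]:
    # Same key as the sort_boxes source listing: page, then top-to-bottom, then left-to-right.
    return sorted(
        boxes,
        key=lambda b: (b["page_num"], -round(b["bottom"], decimal_places), b["left"]),
    )


boxes = [
    {"name": "A", "page_num": 1, "bottom": 701.2, "left": 200.0},
    {"name": "B", "page_num": 1, "bottom": 698.9, "left": 72.0},
    {"name": "C", "page_num": 1, "bottom": 650.0, "left": 72.0},
]

# Rounding to whole points treats A and B as different rows.
strict = [b["name"] for b in sort_boxes_sketch(boxes, decimal_places=0)]
# Rounding to the nearest 10 points puts A and B on one row, ordered by left edge.
coarse = [b["name"] for b in sort_boxes_sketch(boxes, decimal_places=-1)]

print(strict)  # ['A', 'B', 'C']
print(coarse)  # ['B', 'A', 'C']
```

Rounding buckets have hard edges, so two boxes whose bottoms sit just either side of a boundary (for example 704.9 and 705.1 with `decimal_places=-1`) still sort as separate rows.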

Source code in flyfield/extract.py

def sort_boxes(boxes: List[Dict], decimal_places: int = 0) -> List[Dict]:
    """
    Sort boxes by page number, top-to-bottom (descending), then left-to-right.

    Args:
        boxes (list of dict): List of boxes to sort.
        decimal_places (int): Precision for vertical grouping (bottom coordinate rounding).

    Returns:
        list of dict: Sorted boxes.
    """
    return sorted(
        boxes,
        key=lambda b: (b["page_num"], -round(b["bottom"], decimal_places), b["left"]),
    )

`process_boxes(pdf_path, csv_path)`

Full pipeline to extract, filter, deduplicate, sort, layout annotate, and save boxes from a PDF.

Parameters:

Name	Type	Description	Default
`pdf_path`	`str`	Path to input PDF file.	required
`csv_path`	`str`	Path to output CSV file for annotated box data.	required

Returns:

Name	Type	Description
`dict`	`Dict[int, List[Dict]]`	Dictionary keyed by page number containing processed boxes with layout metadata.

Notes

Extract filled white boxes matching TARGET_COLOUR.
Filter boxes by valid height and allowed text content.
Remove duplicate boxes by coordinate proximity.
Sort boxes by page, vertical then horizontal order.
Compute layout fields such as IDs, block grouping, lines.
Assign numeric block field types using heuristics.
Write the full annotated box data to CSV.
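
When debug logging is enabled, the pipeline also writes intermediate CSVs whose names are derived from `csv_path` with `str.replace`. The sketch below lists those names in the order they are written, based on the source listing on this page; `debug_csv_paths` is a hypothetical helper, not a flyfield function.

```python
def debug_csv_paths(csv_path: str) -> list[str]:
    """Return the debug CSV names process_boxes would derive from csv_path."""
    # Order follows the source: after extraction, after filtering,
    # after dedupe and sort, and after layout calculation.
    suffixes = ["-extracted", "-grouped", "-filtered", "-layout"]
    return [csv_path.replace(".csv", f"{suffix}.csv") for suffix in suffixes]


for path in debug_csv_paths("example.csv"):
    print(path)
# example-extracted.csv
# example-grouped.csv
# example-filtered.csv
# example-layout.csv
```

Because the names come from a plain string replacement, a `csv_path` that does not contain ".csv" is returned unchanged, so every debug write would target the final output path.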

Source code in flyfield/extract.py

def process_boxes(pdf_path: str, csv_path: str) -> Dict[int, List[Dict]]:
    """
    Full pipeline to extract, filter, deduplicate, sort, layout annotate, and save boxes from a PDF.

    Args:
        pdf_path (str): Path to input PDF file.
        csv_path (str): Path to output CSV file for annotated box data.

    Returns:
        dict: Dictionary keyed by page number containing processed boxes with layout metadata.

    Notes:
        - Extract filled white boxes matching TARGET_COLOUR.
        - Filter boxes by valid height and allowed text content.
        - Remove duplicate boxes by coordinate proximity.
        - Sort boxes by page, vertical then horizontal order.
        - Compute layout fields such as IDs, block grouping, lines.
        - Assign numeric block field types using heuristics.
        - Write the full annotated box data to CSV.
    """
    logger.info(f"Extracting boxes from PDF: {pdf_path}")
    boxes = extract_boxes(pdf_path)
    logger.info(f"Extracted {len(boxes)} white boxes.")

    try:
        doc = fitz.open(pdf_path)
    except Exception as e:
        logger.error(f"Error opening input PDF: {e}")
        return defaultdict(list)
    if logger.isEnabledFor(logging.DEBUG):
        write_csv(boxes, csv_path.replace(".csv", "-extracted.csv"))
    filtered_boxes = []
    for page_num in range(1, len(doc) + 1):
        page_boxes = [p for p in boxes if p["page_num"] == page_num]
        filtered_boxes.extend(filter_boxes(doc[page_num - 1], page_boxes))
    doc.close()

    if logger.isEnabledFor(logging.DEBUG):
        write_csv(filtered_boxes, csv_path.replace(".csv", "-grouped.csv"))
    filtered_boxes = remove_duplicates(filtered_boxes)
    filtered_boxes = sort_boxes(filtered_boxes, decimal_places=-1)

    if logger.isEnabledFor(logging.DEBUG):
        write_csv(filtered_boxes, csv_path.replace(".csv", "-filtered.csv"))
    page_dict = calculate_layout_fields(filtered_boxes)

    if logger.isEnabledFor(logging.DEBUG):
        write_csv(filtered_boxes, csv_path.replace(".csv", "-layout.csv"))
    page_dict = assign_numeric_blocks(page_dict)

    write_csv(page_dict, csv_path)
    return page_dict

`flyfield.io_utils`

Utility functions for input/output operations.

Includes CSV reading/writing and data transformation helpers.

`load_boxes_from_csv(csv_path)`

Load box data from a CSV into a dictionary keyed by page number, applying the specified types to each column in the CSV.

Parameters:

Name	Type	Description	Default
`csv_path`	`str`	Path to the CSV file.	required

Returns:

Type	Description
`dict[int, list[dict]]`	dict[int, list[dict]]: Dictionary mapping page number (int) to a list of box dictionaries with appropriately typed values.

Description

The CSV is expected to contain columns:

page_num (int), id (int), x0 (float), y0 (float), x1 (float), y1 (float),
left (float), top (float), right (float), bottom (float),
height (float), width (float), pgap (float), gap (float),
line (int), block (int), block_length (int), block_width (float),
code (str), field_type (str), chars (str), fill (str)

Each value from the CSV is converted from string to the appropriate type. Empty or missing values are converted to None for numeric types and empty string for strings. Conversion errors are caught and logged; original strings are kept in those cases.
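
The conversion rules in this description can be tried in isolation. The sketch below follows the nested `convert_value` helper from the source listing, without the logging. `COLUMN_TYPES_SKETCH` is a hypothetical four-column subset of the real `COLUMN_TYPES` mapping, which is not shown on this page.

```python
# Hypothetical subset; the real COLUMN_TYPES covers every CSV column.
COLUMN_TYPES_SKETCH = {"page_num": int, "x0": float, "code": str, "fill": str}


def convert_value(value, to_type):
    """Convert one CSV cell, following the rules described above."""
    if value is None or value == "":
        # Missing numerics become None, missing strings become "".
        return None if to_type in (int, float) else ""
    try:
        return to_type(value)
    except (TypeError, ValueError):
        return value  # fallback: keep the original string


row = {"page_num": "2", "x0": "", "code": "2-1-1"}  # "fill" column is missing
typed = {col: convert_value(row.get(col), t) for col, t in COLUMN_TYPES_SKETCH.items()}
print(typed)  # {'page_num': 2, 'x0': None, 'code': '2-1-1', 'fill': ''}

# A value that cannot be converted is returned unchanged.
print(convert_value("2.0", int))  # 2.0 (still a string)
```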

Source code in flyfield/io_utils.py

def load_boxes_from_csv(csv_path: str) -> dict[int, list[dict]]:
    """
    Load boxes data from a CSV into a dictionary keyed by page number,

    applying the specified types to each column in the CSV.

    Args:
        csv_path (str): Path to the CSV file.

    Returns:
        dict[int, list[dict]]: Dictionary mapping page number (int)
        to a list of box dictionaries with appropriately typed values.

    Description:
        The CSV is expected to contain columns:

        - page_num (int), id (int), x0 (float), y0 (float), x1 (float), y1 (float),
        - left (float), top (float), right (float), bottom (float),
        - height (float), width (float), pgap (float), gap (float),
        - line (int), block (int), block_length (int), block_width (float),
        - code (str), field_type (str), chars (str), fill (str)

        Each value from the CSV is converted from string to the appropriate type.
        Empty or missing values are converted to None for numeric types and empty string for strings.
        Conversion errors are caught and logged; original strings are kept in those cases.
    """
    logger.info(f"Reading blocks from CSV: {csv_path}")
    rows = read_csv_rows(csv_path)  # Should return list of dict[str, str]

    def convert_value(value: str, to_type):
        if value is None or value == "":
            if to_type in (int, float):
                return None
            return ""
        try:
            return to_type(value)
        except Exception as e:
            logger.warning(f"Failed to convert value '{value}' to {to_type}: {e}")
            return value  # fallback: keep original string

    page_dict = defaultdict(list)
    for row in rows:
        typed_row = {
            col: convert_value(row.get(col), col_type)
            for col, col_type in COLUMN_TYPES.items()
        }
        if typed_row.get("page_num") is not None:
            page_dict[typed_row["page_num"]].append(typed_row)
    return page_dict

`write_csv(boxes_or_page_dict, csv_path)`

Write box data or page dictionary data to CSV file.

Saves only one 'fill' column:
- Uses 'block_fill' if present,
- Otherwise falls back to original 'fill'.

Parameters:

Name	Type	Description	Default
`boxes_or_page_dict`	`list or dict`	List of box dicts or dict keyed by page containing lists of boxes.	required
`csv_path`	`str`	Output CSV file path.	required

Source code in flyfield/io_utils.py

def write_csv(
    boxes_or_page_dict: Union[List[Dict], Dict[int, List[Dict]]], csv_path: str
) -> None:
    """
    Write box data or page dictionary data to CSV file.

    Saves only one 'fill' column:
        - Uses 'block_fill' if present,
        - Otherwise falls back to original 'fill'.

    Args:
        boxes_or_page_dict (list or dict): List of box dicts or dict keyed by page containing lists of boxes.
        csv_path (str): Output CSV file path.
    """
    if isinstance(boxes_or_page_dict, dict):
        all_boxes = [
            box
            for boxes in boxes_or_page_dict.values()
            if boxes is not None
            for box in boxes
        ]
    else:
        all_boxes = boxes_or_page_dict or []
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(CSV_HEADER)
            for box in all_boxes:
                height = round(box.get("y1", 0) - box.get("y0", 0), 1)
                width = round(box.get("x1", 0) - box.get("x0", 0), 1)
                fill_value = box.get("block_fill")
                if fill_value is None:
                    fill_value = box.get("fill", "")
                field_type = box.get("field_type")
                # Convert monetary fill values back to float/int as appropriate

                if (
                    field_type in ("Dollars", "DollarCents", "CurrencyDecimal")
                    and fill_value
                ):
                    decimal = field_type in ("DollarCents", "CurrencyDecimal")
                    try:
                        fill_value = parse_money_space(fill_value, decimal=decimal)
                    except Exception as e:
                        logger.warning(
                            f"Failed to parse money from fill_value '{fill_value}' for field_type '{field_type}': {e}"
                        )
                row = [
                    box.get("page_num", ""),
                    box.get("id", ""),
                    box.get("x0", ""),
                    box.get("y0", ""),
                    box.get("x1", ""),
                    box.get("y1", ""),
                    box.get("left", ""),
                    box.get("top", ""),
                    box.get("right", ""),
                    box.get("bottom", ""),
                    height,
                    width,
                    box.get("pgap", ""),
                    box.get("gap", ""),
                    box.get("line", ""),
                    box.get("block", ""),
                    box.get("block_length", ""),
                    box.get("block_width", ""),
                    box.get("code", ""),
                    box.get("field_type", ""),
                    box.get("chars", ""),
                    fill_value,
                ]

                writer.writerow(row)
    except Exception as e:
        logger.error(f"Failed to write CSV {csv_path}: {e}")

`read_csv_rows(filename)`

Read CSV rows into a list, converting typed fields and normalizing monetary fills.

Parameters:

Name	Type	Description	Default
`filename`	`str`	Path to CSV file.	required

Returns:

Type	Description
`List[Dict[str, str]]`	list of dict: Rows with typed values and block_fill normalized.
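
How a monetary fill is routed to a parser depends on the field type and on whether the text contains a space. The sketch below isolates that decision, following the branch structure in the source listing for read_csv_rows. `choose_parser` and the returned labels are hypothetical and only name which utility would be called.

```python
from typing import Optional

CURRENCY_FIELD_TYPES = {"Dollars", "DollarCents", "Currency", "CurrencyDecimal"}
DECIMAL_FIELD_TYPES = ("DollarCents", "CurrencyDecimal")


def choose_parser(field_type: str, fill_value: str) -> Optional[str]:
    """Name the parser read_csv_rows would apply to this fill, if any."""
    if field_type not in CURRENCY_FIELD_TYPES or not fill_value.strip():
        return None  # non-monetary or empty fills are left untouched
    if field_type in DECIMAL_FIELD_TYPES and " " not in fill_value:
        return "parse_implied_decimal"  # last two digits are cents
    return "parse_money_space"  # spaces already mark the groups


print(choose_parser("DollarCents", "12345"))   # parse_implied_decimal
print(choose_parser("DollarCents", "123 45"))  # parse_money_space
print(choose_parser("Dollars", "12345"))       # parse_money_space
print(choose_parser("Text", "abc"))            # None
```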

Source code in flyfield/io_utils.py

def read_csv_rows(filename: str) -> List[Dict[str, str]]:
    """
    Read CSV rows into a list, converting typed fields and normalizing monetary fills.

    Args:
        filename (str): Path to CSV file.

    Returns:
        list of dict: Rows with typed values and block_fill normalized.
    """
    rows = []
    currency_field_types = {"Dollars", "DollarCents", "Currency", "CurrencyDecimal"}

    try:
        with open(filename, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            headers = reader.fieldnames or []
            is_extraction_csv = "page_num" in headers

            for row in reader:
                if is_extraction_csv:
                    try:
                        # Convert page_num, line, gap, block_length, height, width fields to correct types

                        row["page_num"] = (
                            int(row["page_num"]) if row["page_num"].strip() else None
                        )
                        row["line"] = int(row["line"]) if row["line"].strip() else None
                        row["gap"] = float(row["gap"]) if row["gap"].strip() else 0.0
                        row["block_length"] = (
                            int(row["block_length"])
                            if row["block_length"].strip()
                            else 0
                        )
                        row["height"] = float(row.get("height", 0))
                        row["width"] = float(row.get("width", 0))
                    except (ValueError, KeyError) as e:
                        logger.warning(f"Skipping row due to value error: {e}")
                        continue
                # Rearrange 'fill' to 'block_fill' with formatted monetary fields

                if "fill" in row:
                    fill_value = row["fill"]
                    field_type = row.get("field_type", "")

                    if field_type in currency_field_types and fill_value.strip():
                        if (
                            field_type in ("DollarCents", "CurrencyDecimal")
                            and " " not in fill_value
                        ):
                            # Use implied decimal parser for no explicit decimal separator

                            try:
                                amount = parse_implied_decimal(fill_value)
                                fill_value = format_money_space(amount, decimal=True)
                            except Exception as e:
                                logger.warning(
                                    f"Failed to parse implied decimal fill '{fill_value}' for field_type '{field_type}': {e}"
                                )
                        else:
                            # Use existing parser for explicit decimal formatting

                            decimal = field_type in ("DollarCents", "CurrencyDecimal")
                            try:
                                amount = parse_money_space(fill_value, decimal=decimal)
                                fill_value = format_money_space(amount, decimal=decimal)
                            except Exception as e:
                                logger.warning(
                                    f"Failed to parse/format fill '{fill_value}' for field_type '{field_type}': {e}"
                                )
                        row["block_fill"] = fill_value
                        del row["fill"]
                rows.append(row)
    except Exception as e:
        logger.error(f"Failed to read CSV rows from {filename}: {e}")
    return rows

`save_pdf_form_data_to_csv(pdf_path, csv_path, boxes=None)`

Extract PDF form data, convert string values to uppercase and numeric fields to raw numbers, then save as CSV.

Parameters:

Name	Type	Description	Default
`pdf_path`	`str`	Input PDF form file path.	required
`csv_path`	`str`	Output CSV path.	required
`boxes`	`dict`	Boxes metadata to enrich form data.	`None`

Returns:

Type	Description
`None`	None
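
Before any CSV is written, the raw form values are filtered and normalized. The sketch below applies the same expression as the source listing to a plain dictionary standing in for the form data. The sample keys and values are made up, and `clean_form_data` is a hypothetical wrapper.

```python
def clean_form_data(raw: dict) -> dict:
    """Drop empty and all-zero values, uppercase strings, stringify the rest."""
    return {
        k: v.upper() if isinstance(v, str) else str(v)
        for k, v in raw.items()
        if v is not None and str(v).strip() != "" and str(v).strip("0") != ""
    }


raw = {
    "1-1-1": "smith",  # kept and uppercased
    "1-2-1": "000",    # dropped: only zeros
    "1-3-1": "  ",     # dropped: whitespace only
    "1-4-1": None,     # dropped: no value
    "1-5-1": True,     # kept as the string "True"
    "1-6-1": "0.00",   # kept: stripping zeros leaves "."
}
print(clean_form_data(raw))
# {'1-1-1': 'SMITH', '1-5-1': 'True', '1-6-1': '0.00'}
```

Note that the zero check is purely textual, so a value such as "0.00" survives the filter even though it is numerically zero.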

Source code in flyfield/io_utils.py

def save_pdf_form_data_to_csv(
    pdf_path: str, csv_path: str, boxes: Optional[Dict[int, List[dict]]] = None
) -> None:
    """
    Extract PDF form data, convert string values to uppercase and numeric fields to raw numbers, then save as CSV.

    Args:
        pdf_path (str): Input PDF form file path.
        csv_path (str): Output CSV path.
        boxes (dict, optional): Boxes metadata to enrich form data.

    Returns:
        None
    """
    data = []
    try:
        # Extract form data; convert string values to uppercase where applicable

        form_data = {
            k: v.upper() if isinstance(v, str) else str(v)
            for k, v in PdfWrapper(pdf_path).data.items()
            if v is not None and str(v).strip() != "" and str(v).strip("0") != ""
        }
        # Convert raw data dict to list of dicts with explicit 'code' and 'value' keys

        data = [{"code": k, "value": v} for k, v in form_data.items()]
    except Exception as e:
        logger.error(f"Failed to extract data from {pdf_path}: {e}")
    logger.debug(f"Extracted PDF form data (type={type(data)}), count={len(data)}")

    if boxes:
        flat_boxes = [entry for sublist in boxes.values() for entry in sublist]
        conditional_merge_list(data, flat_boxes, "code", ["field_type"])
    try:
        with open(csv_path, mode="w", newline="", encoding="utf-8") as file:
            writer = csv.writer(file)
            # Write CSV header

            writer.writerow(["code", "fill"])

            # Write each field record as a CSV row

            for field in data:
                code = field.get("code")
                fill_value = field.get("value")
                field_type = field.get("field_type")

                logger.debug(
                    f"Code: {code}, Raw Value: {fill_value}, Field Type: {field_type}"
                )
                if field_type in NUMERIC_FIELD_TYPES and isinstance(fill_value, str):
                    try:
                        if field_type == "CurrencyDecimal":
                            amount = parse_implied_decimal(fill_value)
                            fill_value = str(amount)  # Save raw number, not formatted!
                            logger.debug(f"Parsed CurrencyDecimal: {fill_value}")
                        elif field_type in ("DollarCents", "Dollars"):
                            decimal = field_type == "DollarCents"
                            amount = parse_money_space(fill_value, decimal=decimal)
                            fill_value = str(amount)  # Save raw number, not formatted!
                            logger.debug(f"Parsed {field_type}: {fill_value}")
                    except Exception as e:
                        logger.warning(
                            f"Failed parsing money value '{fill_value}' for field '{code}': {e}"
                        )
                writer.writerow([code, fill_value])
    except Exception as e:
        logger.error(f"Failed to write CSV file {csv_path}: {e}")

`flyfield.layout`

Layout processing for PDFs.

Calculates layout box positions and formatting.

`calculate_layout_fields(boxes)`

Annotate boxes with layout metadata: IDs, line numbers, block membership, block dimensions (length and width), and either concatenated fill text or space-formatted monetary values per block.

Parameters:

Name	Type	Description	Default
`boxes`	`list`	List of boxes sorted by page and vertical order.	required

Returns:

Name	Type	Description
`dict`	`DefaultDict[int, List[Dict]]`	Mapping page numbers to lists of annotated boxes.

Notes

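Vertical tolerance epsilon (1 unit) decides whether a box joins the current line: its bottom must lie within epsilon of the bottom of the line's first box.
A new block starts whenever the horizontal gap to the previous box on the line is at least GAP_THRESHOLD.
Monetary fills ("Dollars", "DollarCents") are formatted with space-separated thousands; "DollarCents" values carry the cents as a final space-separated group.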

Source code in flyfield/layout.py

def calculate_layout_fields(boxes: List[Dict]) -> DefaultDict[int, List[Dict]]:
    """
    Annotate boxes with layout metadata: IDs, line numbers, block membership,
    block dimensions, and either concatenated fill text or space-formatted
    monetary values per block.

    Args:
        boxes (list): List of boxes sorted by page and vertical order.

    Returns:
        dict: Mapping page numbers to lists of annotated boxes.

    Notes:
        - Vertical tolerance epsilon controls grouping boxes into the same line.
        - Blocks are formed by grouping boxes separated by large gaps (GAP_THRESHOLD).
        - Monetary fills are formatted with spaces and decimals where appropriate.
    """
    epsilon = 1  # Vertical tolerance for grouping boxes into the same line
    idx = 0
    current_page = None
    line_counter = 1
    while idx < len(boxes):
        page_num = boxes[idx]["page_num"]
        if page_num != current_page:
            current_page = page_num
            line_counter = 1
        block_id_counter = 1
        # Initialize first box in a new line and block

        boxes[idx].update(
            {
                "id": idx + 1,
                "line": line_counter,
                "block_start": block_id_counter,
                "block": block_id_counter,
                "code": f"{page_num}-{line_counter}-{block_id_counter}",
                "pgap": None,  # Gap before this box (none for first)
            }
        )
        block_start = idx
        j = idx + 1
        # Group boxes horizontally on the same line by bottom alignment and gap thresholds

        while (
            j < len(boxes)
            and boxes[j]["page_num"] == page_num
            and abs(boxes[j]["bottom"] - boxes[idx]["bottom"]) < epsilon
        ):
            boxes[j]["id"] = j + 1
            boxes[j]["line"] = line_counter
            prev_gap = round(boxes[j]["x0"] - boxes[j - 1]["x1"], 1)
            boxes[j]["pgap"] = prev_gap
            boxes[j - 1]["gap"] = prev_gap
            if prev_gap >= GAP_THRESHOLD:
                # Close current block and start a new block

                end_idx = j - 1
                block_length = (end_idx - block_start) + 1
                block_width = round(boxes[end_idx]["x1"] - boxes[block_start]["x0"], 1)
                boxes[block_start]["block_length"] = block_length
                boxes[block_start]["block_width"] = block_width
                current_box = boxes[block_start]
                if current_box.get("field_type") not in ("DollarCents", "Dollars"):
                    raw_fill = " ".join(
                        box.get("fill", "") for box in boxes[block_start : end_idx + 1]
                    )
                    boxes[block_start]["block_fill"] = clean_fill_string(raw_fill)
                else:
                    decimal = current_box.get("field_type") == "DollarCents"
                    fill_val = current_box.get("fill", "")
                    try:
                        if fill_val == "" or fill_val is None:
                            fill_val = 0
                        current_box["fill"] = format_money_space(fill_val, decimal)
                    except Exception as e:
                        logger.warning(f"Failed to format fill value '{fill_val}': {e}")
                        # fall back to original fill value if formatting fails

                        current_box["fill"] = fill_val
                block_id_counter += 1
                block_start = j
                boxes[j].update(
                    {
                        "block_start": block_id_counter,
                        "block": block_id_counter,
                        "code": f"{page_num}-{line_counter}-{block_id_counter}",
                    }
                )
            else:
                # Continue current block

                boxes[j].update(
                    {
                        "block_start": block_id_counter,
                        "block": block_id_counter,
                        "code": f"{page_num}-{line_counter}-{block_id_counter}",
                    }
                )
            j += 1
        # Close last block on the line

        end_idx = j - 1
        block_length = (end_idx - block_start) + 1
        block_width = round(boxes[end_idx]["x1"] - boxes[block_start]["x0"], 1)
        boxes[block_start]["block_length"] = block_length
        boxes[block_start]["block_width"] = block_width
        current_box = boxes[block_start]
        if current_box.get("field_type") not in ("DollarCents", "Dollars"):
            raw_fill = " ".join(
                box.get("fill", "") for box in boxes[block_start : end_idx + 1]
            )
            boxes[block_start]["block_fill"] = clean_fill_string(raw_fill)
        else:
            decimal = current_box.get("field_type") == "DollarCents"
            fill_val = current_box.get("fill", "")
            try:
                if fill_val == "" or fill_val is None:
                    fill_val = 0
                current_box["fill"] = format_money_space(fill_val, decimal)
            except Exception as e:
                logger.warning(f"Failed to format fill value '{fill_val}': {e}")
                current_box["fill"] = fill_val
        boxes[end_idx]["gap"] = None  # No gap after the last box in the line
        line_counter += 1
        idx = j
    block_id_counter = 1
    # Group boxes by page number, only include blocks with length >= 1, then sort by line and left coordinate

    page_dict = defaultdict(list)
    for box in boxes:
        if box.get("block_length", 0) >= 1:
            page_dict[box["page_num"]].append(box)
    for page_num in page_dict:
        page_dict[page_num].sort(key=lambda r: (r.get("line", 0), r.get("left", 0)))
    return page_dict
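
The line and block grouping rules in the listing above can be illustrated with a minimal sketch, assuming a single page of boxes that is already sorted top-to-bottom and left-to-right. `label_boxes` is a hypothetical helper, not part of flyfield, and the `GAP_THRESHOLD` value is an assumed figure chosen for illustration. Labels here are `line-block`; the real `code` field also carries the page number.

```python
from typing import Dict, List

GAP_THRESHOLD = 3.0  # assumed value for illustration; the real constant is defined in flyfield
EPSILON = 1.0        # vertical tolerance, matching `epsilon = 1` in the listing


def label_boxes(boxes: List[Dict]) -> List[str]:
    """Return a 'line-block' label for each box on one page (hypothetical helper)."""
    labels = []
    line = 0
    block = 1
    line_bottom = None  # bottom of the first box on the current line
    prev_x1 = None      # right edge of the previous box
    for box in boxes:
        if line_bottom is None or abs(box["bottom"] - line_bottom) >= EPSILON:
            # Bottom edge differs too much: start a new line and reset the block counter
            line += 1
            block = 1
            line_bottom = box["bottom"]
        elif round(box["x0"] - prev_x1, 1) >= GAP_THRESHOLD:
            # Same line, but a wide horizontal gap: start a new block
            block += 1
        prev_x1 = box["x1"]
        labels.append(f"{line}-{block}")
    return labels


sample = [
    {"x0": 10, "x1": 20, "bottom": 700},
    {"x0": 21, "x1": 31, "bottom": 700.4},  # same line, gap of 1: same block
    {"x0": 40, "x1": 50, "bottom": 700},    # same line, gap of 9: new block
    {"x0": 10, "x1": 20, "bottom": 680},    # new line
]
labels = label_boxes(sample)
print(labels)  # ['1-1', '1-1', '1-2', '2-1']
```

As in the listing, each box is compared against the first box of its line rather than its immediate neighbour, so small vertical drift does not accumulate along a row.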

`assign_numeric_blocks(page_dict)`

Merge and assign numeric block types based on heuristics of adjacency and length.

Parameters:

Name	Type	Description	Default
`page_dict`	`dict`	Keyed by page number with boxes list.	required

Returns:

Name	Type	Description
`dict`	`DefaultDict[int, List[Dict]]`	Updated page_dict with numeric block types assigned.

Notes

Modifies the page_dict in place:

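Merges runs of adjacent blocks of length 3 when the gap before each following block is greater than 0 and less than 8.
Prepends the preceding block to a run when that block has length 1 or 2 and the gap before the run's first block is at least 1 and less than 8.
Assigns "CurrencyDecimal" to the run's first block when a block of length 2 follows the run, otherwise "Currency".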
Aggregates block lengths, widths, and concatenates fill strings.

Source code in flyfield/layout.py

def assign_numeric_blocks(
    page_dict: DefaultDict[int, List[Dict]],
) -> DefaultDict[int, List[Dict]]:
    """
    Merge and assign numeric block types based on heuristics of adjacency and length.

    Args:
        page_dict (dict): Keyed by page number with boxes list.

    Returns:
        dict: Updated page_dict with numeric block types assigned.

    Notes:
        Modifies the page_dict in place:

        - Merges runs of adjacent blocks of length 3 if gaps between them are small.
        - Optionally prepends certain preceding blocks to runs.
        - Assigns field types "CurrencyDecimal" or "Currency" based on heuristics.
        - Aggregates block lengths, widths, and concatenates fill strings.
    """
    for page_num, rows in page_dict.items():
        rows.sort(key=lambda r: (r.get("line", 0), r.get("left", 0)))
        page_dict[page_num] = rows
        i = 0
        while i < len(rows):
            block_length = rows[i].get("block_length", 0)
            if block_length == 3:
                run = [rows[i]]
                j = i + 1
                # Collect consecutive blocks of length 3 separated by small gaps

                while j < len(rows):
                    next_block_length = rows[j].get("block_length", 0)
                    next_pgap = rows[j].get("pgap")
                    if (
                        next_block_length == 3
                        and next_pgap is not None
                        and 0 < next_pgap < 8
                    ):
                        run.append(rows[j])
                        j += 1
                    else:
                        break
                # Optionally prepend preceding block if conditions met

                if len(run) >= 2 and i > 0:
                    prev = rows[i - 1]
                    first_pgap = rows[i].get("pgap", 0)
                    if (
                        prev.get("block_length") in (1, 2)
                        and first_pgap is not None
                        and 1 <= first_pgap < 8
                    ):
                        run.insert(0, prev)
                        i -= 1
                next_idx = j
                next_block_length = (
                    rows[next_idx].get("block_length") if next_idx < len(rows) else None
                )
                next_gap = rows[next_idx].get("pgap") if next_idx < len(rows) else None
                if len(run) >= 2:
                    if (
                        next_idx < len(rows)
                        and next_block_length == 2
                        and next_gap is not None
                    ):
                        run.append(rows[next_idx])
                        run[0]["field_type"] = "CurrencyDecimal"
                        j += 1
                    else:
                        run[0]["field_type"] = "Currency"
                    # Aggregate block length and width for the merged block

                    block_length_sum = sum(
                        r.get("block_length", 0) for r in run if r.get("block_length")
                    )
                    run[0]["block_length"] = block_length_sum
                    first_left = min(r.get("left", float("inf")) for r in run)
                    last_left = max(r.get("left", float("-inf")) for r in run)
                    run[0]["block_width"] = (
                        last_left - first_left + run[-1]["block_width"]
                    )
                    fills = [
                        r.get("block_fill", "") for r in run if r.get("block_fill")
                    ]
                    run[0]["block_fill"] = "".join(fills).strip()
                    # Clear subordinate blocks lengths and fills

                    for r in run[1:]:
                        r["block_length"] = None
                        r["block_width"] = None
                        r["block_fill"] = None
                    i = j
                else:
                    i += 1
            else:
                i += 1
    return page_dict
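
A simplified sketch of the run-detection heuristic may help when reading the listing above. `classify_numeric_run` is a hypothetical helper, not part of flyfield; it keeps the thresholds from the listing but omits the optional prepend step and the aggregation of lengths, widths, and fills.

```python
from typing import Dict, List, Optional, Tuple


def classify_numeric_run(rows: List[Dict], start: int) -> Tuple[Optional[str], int]:
    """Return (field_type, next_index) for a run starting at rows[start].

    field_type is None when no run of at least two length-3 blocks is found.
    """
    if rows[start]["block_length"] != 3:
        return None, start + 1
    j = start + 1
    # Extend the run while the next block has length 3 and a small positive gap before it
    while (
        j < len(rows)
        and rows[j]["block_length"] == 3
        and rows[j]["pgap"] is not None
        and 0 < rows[j]["pgap"] < 8
    ):
        j += 1
    if j - start < 2:
        return None, start + 1
    # A trailing length-2 block is treated as the cents part of the amount
    if j < len(rows) and rows[j]["block_length"] == 2 and rows[j]["pgap"] is not None:
        return "CurrencyDecimal", j + 1
    return "Currency", j


with_cents = [
    {"block_length": 3, "pgap": None},
    {"block_length": 3, "pgap": 5.0},
    {"block_length": 2, "pgap": 6.0},
]
whole_only = with_cents[:2]
too_far = [{"block_length": 3, "pgap": None}, {"block_length": 3, "pgap": 12.0}]

print(classify_numeric_run(with_cents, 0))  # ('CurrencyDecimal', 3)
print(classify_numeric_run(whole_only, 0))  # ('Currency', 2)
print(classify_numeric_run(too_far, 0))     # (None, 1)
```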

`flyfield.markup_and_fields`

Functions for PDF markup and form field annotation.

`markup_pdf(pdf_path, page_dict, output_pdf_path, mark_color=(0, 0, 1), mark_radius=1)`

Mark PDF with circles and codes at block locations for debugging.

Parameters:

Name	Type	Description	Default
`pdf_path`	`str`	Input PDF file.	required
`page_dict`	`dict`	Pages and boxes with layout info.	required
`output_pdf_path`	`str`	Output marked PDF file path.	required
`mark_color`	`tuple`	RGB float tuple for marker color.	`(0, 0, 1)`
`mark_radius`	`int or float`	Radius of circle marks.	`1`

Returns:

Type	Description
`None`	None

Source code in flyfield/markup_and_fields.py

def markup_pdf(
    pdf_path: str,
    page_dict: Dict[int, List[Dict]],
    output_pdf_path: str,
    mark_color: Tuple[float, float, float] = (0, 0, 1),
    mark_radius: float = 1,
) -> None:
    """
    Mark PDF with circles and codes at block locations for debugging.

    Args:
        pdf_path (str): Input PDF file.
        page_dict (dict): Pages and boxes with layout info.
        output_pdf_path (str): Output marked PDF file path.
        mark_color (tuple): RGB float tuple for marker color.
        mark_radius (int or float): Radius of circle marks.

    Returns:
        None
    """
    try:
        doc = fitz.open(pdf_path)
    except Exception as e:
        logger.error(f"Failed to open PDF for markup: {e}")
        return
    for page_num, boxes in sorted(page_dict.items()):
        if config.PDF_PAGES and page_num not in config.PDF_PAGES:
            continue
        page = doc[page_num - 1]
        page_height = page.rect.height
        shape = page.new_shape()

        for box in boxes:
            # Only mark boxes that have a meaningful block_length

            if box.get("block_length") not in ("", 0, None):
                x, y_raw = box.get("x0"), box.get("y0")
                y = page_height - y_raw
                shape.draw_circle((x, y), mark_radius)

                point = fitz.Point(x + 4, y)
                shape.insert_text(
                    point,
                    str(box.get("code", "?")),
                    fontsize=8,
                    color=mark_color,
                    morph=(point, fitz.Matrix(1, 0, 0, 1, 0, 0).prerotate(45)),
                )
        shape.finish(color=mark_color, fill=None)
        shape.commit()
    try:
        doc.save(output_pdf_path)
    except Exception as e:
        logger.error(f"Failed to save output PDF: {e}")
    finally:
        doc.close()
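
The `y = page_height - y_raw` step in the listing converts a bottom-left-origin coordinate into the top-left-origin system that PyMuPDF uses when drawing on a page. A minimal sketch of that conversion, with `flip_y` as a hypothetical helper and an assumed page height of 842 points (roughly A4):

```python
def flip_y(y: float, page_height: float) -> float:
    """Convert a y coordinate between bottom-left and top-left origins."""
    return page_height - y


page_height = 842.0                      # assumed page height in points
y_drawn = flip_y(100.0, page_height)     # box stored 100 pt above the bottom edge
print(y_drawn)                           # 742.0
print(flip_y(y_drawn, page_height))      # 100.0, the flip is its own inverse
```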

`adjust_form_boxes(row, width, block_length)`

Adjust the position and width of form boxes depending on field type and block length.

Parameters:

Name	Type	Description	Default
`row`	`dict`	Box attributes.	required
`width`	`float`	Original block width.	required
`block_length`	`int`	Block length in contained boxes.	required

Returns:

Name	Type	Description
`tuple`	`Tuple[float, float, List[str]]`	(adjusted x, adjusted width, list of extra args)

Source code in flyfield/markup_and_fields.py

def adjust_form_boxes(
    row: Dict,
    width: float,
    block_length: int,
) -> Tuple[float, float, List[str]]:
    """
    Adjust the position and width of form boxes depending on field type and block length.

    Args:
        row (dict): Box attributes.
        width (float): Original block width.
        block_length (int): Block length in contained boxes.

    Returns:
        tuple: (adjusted x, adjusted width, list of extra args)
    """
    x = float(row["left"])
    field_type = row.get("field_type")
    extra_args = ["alignment=2"]

    if (
        block_length == 1
        and width > 14
        and field_type not in ("Currency", "CurrencyDecimal")
    ):
        # Reduce width by size of layout characters

        width_adjusted = width
        if field_type == "Dollars":
            width_adjusted -= 21
        elif field_type == "DollarCents":
            width_adjusted -= 4
        return x, max(0, width_adjusted), extra_args
    if field_type in ("Currency", "CurrencyDecimal"):
        gap_adj = (2 * GAP + GAP_GROUP) / 3 / 2
        gap_start = (gap_adj * (((block_length - 1) % 3) + 1)) / 2 + F
        if field_type == "CurrencyDecimal":
            gap_start += F * 2
        gap_end = gap_adj + F * 2 if field_type == "Currency" else (gap_adj * 3) / 2
    else:
        gap_adj = GAP
        gap_start = gap_end = gap_adj / 2 + F
        extra_args[0] = "alignment=0"
    x -= gap_start
    width_adjusted = width + gap_start + gap_end
    extra_args += [
        f"max_length={block_length}" if block_length else "max_length=None",
        "comb=True",
    ]
    return x, max(0, width_adjusted), extra_args
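
As a worked example, the sketch below mirrors only the final non-currency branch of the listing (a multi-box comb field). `pad_plain_comb_field` is a hypothetical helper, and the `GAP` and `F` values are assumptions made for illustration; the real constants are defined elsewhere in flyfield.

```python
from typing import List, Tuple

GAP = 2.0  # assumed value for illustration only
F = 0.5    # assumed value for illustration only


def pad_plain_comb_field(
    left: float, width: float, block_length: int
) -> Tuple[float, float, List[str]]:
    """Widen a non-currency block by the same padding on both sides."""
    pad = GAP / 2 + F  # gap_start and gap_end are equal in this branch
    extra_args = ["alignment=0", f"max_length={block_length}", "comb=True"]
    return left - pad, max(0, width + 2 * pad), extra_args


x, w, args = pad_plain_comb_field(left=100.0, width=50.0, block_length=5)
print(x, w, args)  # 98.5 53.0 ['alignment=0', 'max_length=5', 'comb=True']
```

With these assumed constants the field starts 1.5 points to the left of the first box and is 3 points wider than the block, so the comb cells line up with the printed boxes.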

`generate_form_fields_script(csv_path, input_pdf, output_pdf_with_fields, script_path)`

Generate a standalone Python script to create PDF form fields from CSV block data.

Parameters:

Name	Type	Description	Default
`csv_path`	`str`	CSV data path.	required
`input_pdf`	`str`	Input PDF to annotate.	required
`output_pdf_with_fields`	`str`	Output annotated PDF.	required
`script_path`	`str`	Output script file path.	required

Returns:

Name	Type	Description
`str`	`str`	Path to the generated script file.

Source code in flyfield/markup_and_fields.py

def generate_form_fields_script(
    csv_path: str,
    input_pdf: str,
    output_pdf_with_fields: str,
    script_path: str,
) -> str:
    """
    Generate a standalone Python script to create PDF form fields from CSV block data.

    Args:
        csv_path (str): CSV data path.
        input_pdf (str): Input PDF to annotate.
        output_pdf_with_fields (str): Output annotated PDF.
        script_path (str): Output script file path.

    Returns:
        str: Path to the generated script file.
    """
    lines = [
        "from PyPDFForm import Fields, PdfWrapper",
        f'pdf = PdfWrapper("{input_pdf}")',
    ]
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            current_page = None
            for row in reader:
                page_number = int(row["page_num"])

                # Skip rows whose page number is not in PDF_PAGES if PDF_PAGES filter is set

                if config.PDF_PAGES and page_number not in config.PDF_PAGES:
                    continue
                code = row["code"]
                if (
                    not code
                    or row["block_length"] in ("", "0")
                    or row.get("field_type") == "Skip"
                ):
                    continue
                if page_number != current_page:
                    lines.append(f'print("Starting page {page_number}...", flush=True)')
                    current_page = page_number
                block_length = (
                    int(float(row["block_length"]))
                    if row["block_length"] not in ("", "0")
                    else 0
                )
                width = (
                    float(row["block_width"])
                    if row["block_width"] not in ("", "0")
                    else 0
                )
                y, height = float(row["bottom"]), float(row.get("height", 0))
                x, width_adjusted, extra_args = adjust_form_boxes(
                    row, width, block_length
                )
                sanitized_code = re.sub(r"[^\w\-_]", "_", code)
                base_args = [
                    #                    'widget_type="text"',
                    f'name="{sanitized_code}"',
                    f"page_number={page_number}",
                    f"x={x:.2f}",
                    f"y={y:.2f}",
                    f"height={height:.2f}",
                    f"width={width_adjusted:.2f}",
                    "bg_color=(0,0,0,0)",
                    "border_color=(0,0,0,0)",
                    "border_width=0",
                ]
                args = [*base_args, *extra_args]
                lines.append(f"pdf.create_field(Fields.TextField({', '.join(args)}))")
            lines.extend(
                [
                    f'pdf.write("{output_pdf_with_fields}")',
                    'print("Created form fields PDF.", flush=True)',
                ]
            )
        with open(script_path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))
    except Exception as e:
        logger.error(f"Failed to generate form fields script: {e}")
    return script_path
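
To show what one line of the generated script looks like, here is a minimal sketch of the string-building step. `field_line` is a hypothetical helper that reuses the sanitizing regex and number formatting from the listing but leaves out the colour and border arguments for brevity; it only builds text and does not import PyPDFForm.

```python
import re
from typing import List


def field_line(
    code: str,
    page_number: int,
    x: float,
    y: float,
    height: float,
    width: float,
    extra_args: List[str],
) -> str:
    """Build one `pdf.create_field(...)` line as the generator does (abridged)."""
    # Replace anything that is not a word character, hyphen, or underscore
    sanitized_code = re.sub(r"[^\w\-_]", "_", code)
    args = [
        f'name="{sanitized_code}"',
        f"page_number={page_number}",
        f"x={x:.2f}",
        f"y={y:.2f}",
        f"height={height:.2f}",
        f"width={width:.2f}",
        *extra_args,
    ]
    return f"pdf.create_field(Fields.TextField({', '.join(args)}))"


line = field_line("1-2 3/a", 1, 98.5, 700, 12, 53, ["alignment=0", "comb=True"])
print(line)
```

The sample code `1-2 3/a` is deliberately malformed to show the sanitizing step; it becomes `1-2_3_a` in the generated field name.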

`run_standalone_script(script_path)`

Execute a standalone script for PDF form field creation.

Parameters:

Name	Type	Description	Default
`script_path`	`str`	Path to the script to run.	required

Source code in flyfield/markup_and_fields.py

def run_standalone_script(script_path: str) -> None:
    """
    Execute a standalone script for PDF form field creation.

    Args:
        script_path (str): Path to the script to run.
    """
    print(f"Running generated form field creation script: {script_path}")
    try:
        result = subprocess.run([sys.executable, "-u", script_path], text=True)
        if result.returncode != 0:
            raise RuntimeError(
                f"Generated script failed with exit code {result.returncode}"
            )
    except Exception as e:
        logger.error(f"Error running generated script: {e}")

`run_fill_pdf_fields(csv_path, output_pdf_path, template_pdf_path, generator_script_path, boxes=None)`

Generates and runs a standalone Python script that fills PDF form fields using PyPDFForm, based on data from a CSV file with 'code' and 'fill' columns.

Parameters:

Name	Type	Description	Default
`csv_path`	`str`	Path to the CSV input file.	required
`output_pdf_path`	`str`	Path where the filled PDF should be saved.	required
`template_pdf_path`	`str`	Path to the input (template) PDF file.	required
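`generator_script_path`	`str`	Path where the generated fill script will be saved.	required
`boxes`	`dict or None`	Optional mapping of page numbers to box lists; when given, it is used to merge each box's field_type into the CSV rows, matched on code.	`None`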

Source code in flyfield/markup_and_fields.py

def run_fill_pdf_fields(
    csv_path: str,
    output_pdf_path: str,
    template_pdf_path: str,
    generator_script_path: str,
    boxes: Optional[Dict[int, List[Dict]]] = None,
) -> None:
    """
    Generates and runs a standalone Python script that fills PDF form fields
    using PyPDFForm, based on data from a CSV file with 'code' and 'fill' columns.

    Args:
        csv_path (str): Path to the CSV input file.
        output_pdf_path (str): Path where the filled PDF should be saved.
        template_pdf_path (str): Path to the input (template) PDF file.
        generator_script_path (str): Path where the generated fill script will be saved.
        boxes (dict or None): Optional mapping of page numbers to box lists, used to
            merge each box's field_type into the CSV rows, matched on code.
    """
    from .utils import format_money_space, parse_money_space

    fill_data = {}
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            rows = []
            for row in reader:
                # Clean row values

                stripped_row = {
                    k: v.strip() if isinstance(v, str) else v for k, v in row.items()
                }
                if all(v == "" or v == "0" for v in stripped_row.values()):
                    continue
                rows.append(stripped_row)
        # Flatten boxes if any and merge to rows

        if boxes:
            flat_boxes = [entry for sublist in boxes.values() for entry in sublist]
            conditional_merge_list(rows, flat_boxes, "code", ["field_type"])
        for row in rows:
            field = row.get("code")
            value = row.get("fill")
            field_type = row.get("field_type", "")
            if not field or value in ("", "0"):
                continue
            if field_type in ("Dollars", "DollarCents"):
                decimal = field_type == "DollarCents"
                try:
                    amount = parse_money_space(value, decimal=decimal)
                    value = format_money_space(amount, decimal=decimal)
                except Exception as e:
                    print(
                        f"Warning: Could not format value '{value}' for field_type '{field_type}': {e}"
                    )
            elif field_type in ("Currency", "CurrencyDecimal"):
                import re

                value = re.sub(r"\D", "", value)
            fill_data[field] = value
    except Exception as e:
        print(f"Error reading CSV {csv_path}: {e}")
        return
    fill_dict_items = ",\n ".join(f'"{k}": {repr(v)}' for k, v in fill_data.items())
    script_content = f"""\
import sys

from PyPDFForm import PdfWrapper
print("Starting to fill PDF fields...", flush=True)
try:
    filled = PdfWrapper(
        "{template_pdf_path}",
        adobe_mode=False
    ).fill(
        {{
            {fill_dict_items}
        }},
        flatten=False
    )
    filled.write("{output_pdf_path}")
    print("Filled PDF saved to {output_pdf_path}", flush=True)
except Exception as e:
    print(f"Exception during filling: {{e}}", file=sys.stderr, flush=True)
    sys.exit(1)
"""

    try:
        with open(generator_script_path, "w", encoding="utf-8") as script_file:
            script_file.write(script_content)
        print(f"Generated fill script saved to {generator_script_path}")
    except Exception as e:
        print(f"Error writing fill script to {generator_script_path}: {e}")
        return
    try:
        result = subprocess.run(
            [sys.executable, generator_script_path], capture_output=True, text=True
        )
        print("Fill script stdout:")
        print(result.stdout)
        print("Fill script stderr:")
        print(result.stderr)
        if result.returncode != 0:
            print(f"Fill script failed with exit code {result.returncode}")
        else:
            print("Fill script completed successfully.")
    except Exception as e:
        print(f"Error running fill script: {e}")

`flyfield.utils`

General utility functions.

Helper functions for parsing, formatting, and validation.

`add_suffix_to_filename(filename, suffix)`

Add a suffix before the file extension in a filename.

Parameters:

Name	Type	Description	Default
`filename`	`str`	Original filename.	required
`suffix`	`str`	Suffix to add.	required

Returns:

Name	Type	Description
`str`	`str`	Filename with suffix added.

Source code in flyfield/utils.py

def add_suffix_to_filename(filename: str, suffix: str) -> str:
    """
    Add a suffix before the file extension in a filename.

    Args:
        filename (str): Original filename.
        suffix (str): Suffix to add.

    Returns:
        str: Filename with suffix added.
    """
    base, ext = os.path.splitext(filename)
    return f"{base}{suffix}{ext}"
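
A short usage sketch follows; the function body is copied from the listing so the snippet runs without flyfield installed, and the file names are made up for the example.

```python
import os


def add_suffix_to_filename(filename: str, suffix: str) -> str:
    """Copied from the listing above."""
    base, ext = os.path.splitext(filename)
    return f"{base}{suffix}{ext}"


print(add_suffix_to_filename("forms/tax.pdf", "-filled"))  # forms/tax-filled.pdf
print(add_suffix_to_filename("archive.tar.gz", "_v2"))     # archive.tar_v2.gz
print(add_suffix_to_filename("README", "-old"))            # README-old
```

Because `os.path.splitext` splits at the last dot only, a double extension such as `.tar.gz` receives the suffix before `.gz`.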

`colour_match(color, target_color=TARGET_COLOUR, tol=0.001)`

Check if a color matches a target within a tolerance.

Parameters:

Name	Type	Description	Default
`color`	`tuple`	RGB color tuple.	required
`target_color`	`tuple`	RGB target color.	`TARGET_COLOUR`
`tol`	`float`	Allowed tolerance.	`0.001`

Returns:

Name	Type	Description
`bool`	`bool`	True if colors match within tolerance.

Note

If the input color has an alpha channel (RGBA), the alpha component is ignored.

Source code in flyfield/utils.py

def colour_match(
    color: Tuple[float, ...],
    target_color: Tuple[float, float, float] = TARGET_COLOUR,
    tol: float = 1e-3,
) -> bool:
    """
    Check if a color matches a target within a tolerance.

    Args:
        color (tuple): RGB color tuple.
        target_color (tuple): RGB target color.
        tol (float): Allowed tolerance.

    Returns:
        bool: True if colors match within tolerance.

    Note:
        If the input color has an alpha channel (RGBA), the alpha component is ignored.
    """
    if not color or len(color) < 3:
        return False
    # Compare only RGB channels; ignore alpha if present

    return all(abs(a - b) < tol for a, b in zip(color[:3], target_color))

`int_to_rgb(color_int)`

Convert a 24-bit integer color in 0xRRGGBB format to normalized RGB tuple of floats.

Parameters:

Name	Type	Description	Default
`color_int`	`int`	Integer encoding color as 0xRRGGBB.	required

Returns:

Name	Type	Description
`tuple`	`Tuple[float, float, float]`	Normalized (r, g, b) floats in range [0.0, 1.0].

Source code in flyfield/utils.py

def int_to_rgb(color_int: int) -> Tuple[float, float, float]:
    """
    Convert a 24-bit integer color in 0xRRGGBB format to normalized RGB tuple of floats.

    Args:
        color_int (int): Integer encoding color as 0xRRGGBB.

    Returns:
        tuple: Normalized (r, g, b) floats in range [0.0, 1.0].
    """
    r = ((color_int >> 16) & 0xFF) / 255
    g = ((color_int >> 8) & 0xFF) / 255
    b = (color_int & 0xFF) / 255
    return (r, g, b)
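
One plausible way to combine `int_to_rgb` with `colour_match` is sketched below. Both bodies are copied from the listings so the snippet runs standalone, and `TARGET_COLOUR` is assumed to be pure white, as the overview describes the default. Whether flyfield chains the two calls in exactly this way is not shown in this section.

```python
from typing import Tuple

TARGET_COLOUR = (1.0, 1.0, 1.0)  # assumed default: pure white


def int_to_rgb(color_int: int) -> Tuple[float, float, float]:
    r = ((color_int >> 16) & 0xFF) / 255
    g = ((color_int >> 8) & 0xFF) / 255
    b = (color_int & 0xFF) / 255
    return (r, g, b)


def colour_match(color, target_color=TARGET_COLOUR, tol: float = 1e-3) -> bool:
    if not color or len(color) < 3:
        return False
    # Compare only RGB channels; ignore alpha if present
    return all(abs(a - b) < tol for a, b in zip(color[:3], target_color))


print(colour_match(int_to_rgb(0xFFFFFF)))    # True: exact white
print(colour_match(int_to_rgb(0xFEFEFE)))    # False: 254/255 is outside the tolerance
print(colour_match((1.0, 1.0, 1.0, 0.5)))    # True: alpha is ignored
print(colour_match((1.0, 1.0)))              # False: fewer than three channels
```

With the default tolerance of 0.001, even an off-white one step below full intensity (a channel difference of about 0.004) is rejected.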

`clean_fill_string(line_text)`

Clean a concatenated fill text string by removing single spaces and collapsing each run of two or more spaces into one space.

Parameters:

Name	Type	Description	Default
`line_text`	`str`	Raw line text.	required

Returns:

Name	Type	Description
`str`	`str`	Cleaned fill string.

Source code in flyfield/utils.py

def clean_fill_string(line_text: str) -> str:
    """
    Clean a concatenated fill text string by removing single spaces and
    collapsing each run of two or more spaces into one space.

    Args:
        line_text (str): Raw line text.

    Returns:
        str: Cleaned fill string.
    """
    line_text = re.sub(r" {2,}", "<<<SPACE>>>", line_text)
    line_text = line_text.replace(" ", "")
    line_text = line_text.replace("<<<SPACE>>>", " ")
    return line_text
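
The effect is easiest to see on per-box fills joined with single spaces, which is how `calculate_layout_fields` builds `raw_fill`. The function body below is copied from the listing so the snippet runs standalone; the sample fills are made up.

```python
import re


def clean_fill_string(line_text: str) -> str:
    """Copied from the listing above."""
    line_text = re.sub(r" {2,}", "<<<SPACE>>>", line_text)
    line_text = line_text.replace(" ", "")
    line_text = line_text.replace("<<<SPACE>>>", " ")
    return line_text


# One character per box; the empty string is an empty box between the two words
fills = ["J", "O", "H", "N", "", "D", "O", "E"]
raw_fill = " ".join(fills)
print(repr(raw_fill))               # 'J O H N  D O E'
print(clean_fill_string(raw_fill))  # JOHN DOE
```

An empty box produces two adjacent separators in the joined string, which is why a run of spaces survives as a single word break while the single separators between characters are removed.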

`allowed_text(text, field_type=None)`

Determine if text is allowed based on predefined rules and field type.

Helps to filter out pre-filled or invalid box contents.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text extracted from a box.	required
`field_type`	`str or None`	Optional current field type guess to refine allowed patterns.	`None`

Returns:

Name	Type	Description
`tuple`	`Tuple[bool, Optional[str]]`	(bool indicating if allowed, detected field type or None)

Source code in flyfield/utils.py

def allowed_text(
    text: str, field_type: Optional[str] = None
) -> Tuple[bool, Optional[str]]:
    """
    Determine if text is allowed based on predefined rules and field type.

    Helps to filter out pre-filled or invalid box contents.

    Args:
        text (str): Text extracted from a box.
        field_type (str or None): Optional current field type guess to refine allowed patterns.

    Returns:
        tuple: (bool indicating if allowed, detected field type or None)
    """
    allowed_text_by_type = {
        "DollarCents": {".", ".00."},
        "Dollars": {".00", ".00.00"},
    }
    generic_allowed_text = {"S", "M", "I", "T", "H"}
    if field_type in allowed_text_by_type:
        allowed_set = allowed_text_by_type[field_type] | generic_allowed_text
        if text in allowed_set:
            return True, field_type
        else:
            return False, None
    else:
        for ftype, texts in allowed_text_by_type.items():
            if text in texts:
                return True, ftype
        if text in generic_allowed_text:
            return True, None
        return False, None
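
The return values for a few inputs are shown below. The function is restated in a compact but equivalent form so the snippet runs standalone; the module-level constant names `ALLOWED_BY_TYPE` and `GENERIC` are introduced here for the example only.

```python
from typing import Optional, Tuple

ALLOWED_BY_TYPE = {
    "DollarCents": {".", ".00."},
    "Dollars": {".00", ".00.00"},
}
GENERIC = {"S", "M", "I", "T", "H"}


def allowed_text(text: str, field_type: Optional[str] = None) -> Tuple[bool, Optional[str]]:
    """Compact restatement of the listing above."""
    if field_type in ALLOWED_BY_TYPE:
        ok = text in (ALLOWED_BY_TYPE[field_type] | GENERIC)
        return (True, field_type) if ok else (False, None)
    for ftype, texts in ALLOWED_BY_TYPE.items():
        if text in texts:
            return True, ftype
    return (text in GENERIC), None


print(allowed_text(".00"))                 # (True, 'Dollars'): type is detected
print(allowed_text("."))                   # (True, 'DollarCents')
print(allowed_text("S"))                   # (True, None): generic placeholder letter
print(allowed_text("John"))                # (False, None): pre-filled text is rejected
print(allowed_text(".00", "DollarCents"))  # (False, None): wrong pattern for the given type
```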

`format_money_space(amount, decimal=True)`

Format a numeric amount to a string with space-separated thousands and optional decimal.

Parameters:

Name	Type	Description	Default
`amount`	`float or int`	Numeric amount to format.	required
`decimal`	`bool`	Whether to include two decimal places.	`True`

Returns:

Name	Type	Description
`str`	`str`	Formatted monetary string.

Source code in flyfield/utils.py

def format_money_space(amount: float, decimal: bool = True) -> str:
    """
    Format a numeric amount to a string with space-separated thousands and optional decimal.

    Args:
        amount (float or int): Numeric amount to format.
        decimal (bool): Whether to include two decimal places.

    Returns:
        str: Formatted monetary string.
    """
    if decimal:
        s = f"{amount:,.2f}"
        int_part, dec_part = s.split(".")
        int_part = int_part.replace(",", " ")
        return f"{int_part} {dec_part}"
    else:
        s = f"{int(amount):,}"
        int_part = s.replace(",", " ")
        return int_part

`parse_money_space(money_str, decimal=True)`

Parse a space-separated monetary string, optionally treating the final space-separated group as the decimal part.

Parameters:

Name	Type	Description	Default
`money_str`	`str`	Monetary string to parse (e.g., "1 234 50" means 1234.50 if decimal is True).	required
`decimal`	`bool`	Whether the final space-separated group is the decimal part (default True).	`True`

Returns:

Name	Type	Description
`float`	`float`	Parsed monetary value; the code returns an int when decimal is False.

Source code in flyfield/utils.py

def parse_money_space(money_str: str, decimal: bool = True) -> float:
    """
    Parse a space-separated monetary string, optionally treating the final
    space-separated group as the decimal part.

    Args:
        money_str (str): Monetary string to parse (e.g., "1 234 50" means 1234.50 if decimal is True).
        decimal (bool): Whether the final space-separated group is the decimal part (default True).

    Returns:
        float: Parsed monetary value as a float.
    """
    if decimal:
        if " " in money_str:
            parts = money_str.rsplit(" ", 1)
            int_part = parts[0].replace(" ", "")
            dec_part = parts[1]
            combined = f"{int_part}.{dec_part}"
            return float(combined)
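        else:
            # No space, so no decimal group: parse the whole string as the amount
            return float(money_str.replace(" ", ""))
    else:
        # Note: this branch returns an int, although the annotation says float
        return int(money_str.replace(" ", ""))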
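
The parsing rule is easiest to see on strings of the shape `format_money_space` produces. The sketch below uses a condensed copy of the listing above so that it runs without flyfield installed; the input strings are illustrative.

```python
def parse_money_space(money_str: str, decimal: bool = True):
    # Condensed copy of the listing above, kept here so the snippet is self-contained.
    if not decimal:
        return int(money_str.replace(" ", ""))
    if " " in money_str:
        # Everything after the last space becomes the decimal part
        int_part, dec_part = money_str.rsplit(" ", 1)
        return float(f"{int_part.replace(' ', '')}.{dec_part}")
    return float(money_str)


print(parse_money_space("1 234 50"))             # 1234.5
print(parse_money_space("12 345"))               # 12.345
print(parse_money_space("500"))                  # 500.0
print(parse_money_space("1 234", decimal=False)) # 1234
```

The second call shows that the split is made on the last space, not on the last two digits: "12 345" parses as 12.345, so callers that want cents should make sure the final group has exactly two digits.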

`parse_implied_decimal(s)`

Parse a numeric string with implied decimal (last two digits as decimals).

Parameters:

Name	Type	Description	Default
`s`	`str`	Numeric string (e.g., "12345" -> 123.45).	required

Returns:

Name	Type	Description
`float`	`float`	Parsed float value.

Source code in flyfield/utils.py

def parse_implied_decimal(s: str) -> float:
    """
    Parse a numeric string with implied decimal (last two digits as decimals).

    Args:
        s (str): Numeric string (e.g., "12345" -> 123.45).

    Returns:
        float: Parsed float value.
    """
    s = s.strip()
    digits_only = re.sub(r"\D", "", s)

    if not digits_only:
        return 0.0
    if len(digits_only) <= 2:
        # If only 1 or 2 digits, treat as fractional part
        combined = f"0.{digits_only.zfill(2)}"
    else:
        combined = f"{digits_only[:-2]}.{digits_only[-2:]}"
    return float(combined)
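
Because every non-digit character is stripped before the decimal point is placed, some inputs parse in ways that may be surprising. The sketch below uses a condensed copy of the listing above so that it runs without flyfield installed; the inputs are illustrative.

```python
import re


def parse_implied_decimal(s: str) -> float:
    # Condensed copy of the listing above, kept here so the snippet is self-contained.
    digits_only = re.sub(r"\D", "", s.strip())
    if not digits_only:
        return 0.0
    if len(digits_only) <= 2:
        return float(f"0.{digits_only.zfill(2)}")
    return float(f"{digits_only[:-2]}.{digits_only[-2:]}")


print(parse_implied_decimal("12345"))     # 123.45
print(parse_implied_decimal("5"))         # 0.05
print(parse_implied_decimal("1 234 50"))  # 1234.5
print(parse_implied_decimal("-123"))      # 1.23 (the sign is discarded)
print(parse_implied_decimal("12.5"))      # 1.25 (the decimal point is discarded)
print(parse_implied_decimal(""))          # 0.0
```

The last three calls are the ones to watch: a minus sign or an explicit decimal point is removed along with any other non-digit, so the function is only suitable for unsigned strings whose last two digits really are the cents.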

`version()`

Return the current version string of the library/module.

Returns:

Name	Type	Description
`str`	`str`	Version string.

Source code in flyfield/utils.py

def version() -> str:
    """
    Return the current version string of the library/module.

    Returns:
        str: Version string.
    """
    try:
        # Python 3.8+
        from importlib.metadata import PackageNotFoundError
        from importlib.metadata import version as pkg_version
    except ImportError:
        # For Python <3.8
        from importlib_metadata import PackageNotFoundError
        from importlib_metadata import version as pkg_version
    try:
        return pkg_version("flyfield")
    except PackageNotFoundError:
        return "unknown"
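
The same lookup-with-fallback pattern can be sketched for any distribution name. `installed_version` below is a hypothetical helper, not part of flyfield, and the sketch assumes Python 3.8 or later so that `importlib.metadata` is in the standard library.

```python
from importlib.metadata import PackageNotFoundError
from importlib.metadata import version as pkg_version


def installed_version(dist_name: str) -> str:
    # Hypothetical helper (not part of flyfield): return the installed version
    # of a distribution, or "unknown" when no metadata is found for it.
    try:
        return pkg_version(dist_name)
    except PackageNotFoundError:
        return "unknown"


# Prints the installed version, or "unknown" if flyfield is not installed
print(installed_version("flyfield"))
# A name with no installed metadata falls back to the sentinel string
print(installed_version("no-such-distribution-xyz"))
```

Returning a sentinel string instead of raising keeps the call safe when the code is run from a source checkout that was never installed as a package.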

`parse_pages(pages_str)`

Parse a string specifying pages or page ranges into a list of page integers.

Parameters:

Name	Type	Description	Default
`pages_str`	`str`	Pages specified as a comma-separated list or ranges (e.g., "1,3-5").	required

Returns:

Type	Description
`List[int]`	Sorted list of unique page numbers.

Source code in flyfield/utils.py

def parse_pages(pages_str: str) -> List[int]:
    """
    Parse a string specifying pages or page ranges into a list of page integers.

    Args:
        pages_str (str): Pages specified as a comma-separated list or ranges (e.g., "1,3-5").

    Returns:
        list[int]: List of individual page numbers.
    """
    pages = set()
    for part in pages_str.split(","):
        part = part.strip()
        if "-" in part:
            start_str, end_str = part.split("-")
            start, end = int(start_str), int(end_str)
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
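
The following sketch shows how page specifications are expanded and how malformed input surfaces. The function is copied from the listing above so that the snippet runs without flyfield installed; the input strings are illustrative.

```python
from typing import List


def parse_pages(pages_str: str) -> List[int]:
    # Copy of the listing above, kept here so the snippet is self-contained.
    pages = set()
    for part in pages_str.split(","):
        part = part.strip()
        if "-" in part:
            start_str, end_str = part.split("-")
            start, end = int(start_str), int(end_str)
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("1,3-5"))         # [1, 3, 4, 5]
print(parse_pages("4, 3-5, 1, 1"))  # [1, 3, 4, 5]
print(parse_pages("5-3"))           # []

# Malformed specifications raise ValueError rather than being skipped
for bad in ("1-2-3", "1,,2", "a-b"):
    try:
        parse_pages(bad)
    except ValueError as exc:
        print(f"{bad!r} rejected: {exc}")
```

Duplicates and overlapping ranges collapse because the pages are collected in a set, and a reversed range such as "5-3" silently yields no pages. Callers that accept user input may want to catch `ValueError` and check for an empty result.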

`conditional_merge_list(main_list, ref_list, match_key, keys_to_merge)`

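For each dictionary in main_list, copy the keys listed in keys_to_merge from the dictionary in ref_list that has the same match_key value. Records without a match are left unchanged.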

Parameters:

Name	Type	Description	Default
`main_list`	`list[dict]`	Primary list of dictionaries.	required
`ref_list`	`list[dict]`	Reference list of dictionaries.	required
`match_key`	`str`	Key to match dictionaries.	required
`keys_to_merge`	`list[str]`	Keys to merge from ref_list into main_list.	required

Returns:

Name	Type	Description
`None`	`None`	Modifies main_list in place.

Source code in flyfield/utils.py

def conditional_merge_list(
    main_list: List[Dict],
    ref_list: List[Dict],
    match_key: str,
    keys_to_merge: List[str],
) -> None:
    """
    Conditionally merge dictionaries in a main list with those in a reference list.

    Args:
        main_list (list[dict]): Primary list of dictionaries.
        ref_list (list[dict]): Reference list of dictionaries.
        match_key (str): Key to match dictionaries.
        keys_to_merge (list[str]): Keys to merge from ref_list into main_list.

    Returns:
        None: Modifies main_list in place.
    """
    # Build lookup dictionary for efficient matching
    ref_lookup = {item[match_key]: item for item in ref_list if match_key in item}
    for record in main_list:
        ref_record = ref_lookup.get(record.get(match_key))
        if ref_record:
            for key in keys_to_merge:
                if key in ref_record:
                    record[key] = ref_record[key]
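
A small worked example may clarify which values are copied and which are left alone. The function is copied from the listing above so that the snippet runs without flyfield installed; the `code`, `fill`, and `note` keys and their values are hypothetical sample data, not a statement of flyfield's field schema.

```python
from typing import Dict, List


def conditional_merge_list(
    main_list: List[Dict], ref_list: List[Dict], match_key: str, keys_to_merge: List[str]
) -> None:
    # Copy of the listing above, kept here so the snippet is self-contained.
    ref_lookup = {item[match_key]: item for item in ref_list if match_key in item}
    for record in main_list:
        ref_record = ref_lookup.get(record.get(match_key))
        if ref_record:
            for key in keys_to_merge:
                if key in ref_record:
                    record[key] = ref_record[key]


# Hypothetical sample data
boxes = [
    {"code": "1-1-1", "fill": ""},
    {"code": "1-1-2", "fill": "old"},
    {"code": "1-2-1", "fill": ""},
]
rows = [
    {"code": "1-1-1", "fill": "Alice", "note": "not merged"},
    {"code": "1-1-2"},                    # matches, but has no "fill" to copy
    {"code": "9-9-9", "fill": "orphan"},  # matches nothing in boxes
]

conditional_merge_list(boxes, rows, "code", ["fill"])
for box in boxes:
    print(box)
```

After the call, only the first box has changed: its `fill` becomes "Alice", while `note` is not copied because it is not listed in `keys_to_merge`. The second box keeps "old" because its matching row has no `fill` key, and the third box has no matching row at all. The function returns `None`, so the result is read from `boxes` itself.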
