PDF Integration Guide

This document describes how the STUPA PDF API integrates with LaTeX templates to generate and process PDF forms.

Overview

The STUPA PDF API system works with two types of funding application forms:

  • QSM (Qualitätssicherungsmittel) - Quality assurance funding
  • VSM (Verfasste Studierendenschaft Mittel) - Student body funding

LaTeX Templates

Repository Structure

The LaTeX templates are maintained in a separate Git repository and integrated as a submodule:

  • Repository: git@git.beimgraben.net:frederik/PA_Vorlage.git
  • Location: /backend/latex-templates/

Branch Organization

Different form types are maintained in separate branches:

  • v1.2/QSM - QSM application forms
  • v1.2/VSM - VSM application forms
  • v1.2/VGL - VGL forms (deprecated, not used)

Working with Templates

The project uses Git worktrees to check out the QSM and VSM template branches side by side:

# Templates are checked out to:
/backend/latex-qsm/  # QSM templates (branch: v1.2/QSM)
/backend/latex-vsm/  # VSM templates (branch: v1.2/VSM)

PDF Generation Workflow

1. Template Compilation

LaTeX templates are compiled to PDF using XeLaTeX:

# Build PDFs from LaTeX sources
./scripts/build-pdfs.sh

This script:

  • Sets up Git worktrees for QSM and VSM branches
  • Compiles LaTeX to PDF using latexmk -xelatex
  • Copies generated PDFs to /backend/assets/
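The three steps above can be sketched as a list of commands. This is only an illustration and not the contents of build-pdfs.sh: the function name `build_plan`, the use of `git -C` against the submodule directory, and the output name Main.pdf (derived from Main.tex) are assumptions. The function returns the commands without running them.

```python
# Branch and worktree directory per form type, as listed in this guide.
WORKTREES = {
    "qsm": ("v1.2/QSM", "latex-qsm"),
    "vsm": ("v1.2/VSM", "latex-vsm"),
}


def build_plan(form_type: str) -> list[list[str]]:
    """Return the build commands for one form type, in order (hypothetical helper)."""
    branch, worktree = WORKTREES[form_type]
    return [
        # 1. Check the branch out into its own worktree next to the submodule.
        ["git", "-C", "backend/latex-templates", "worktree", "add", f"../{worktree}", branch],
        # 2. Compile with XeLaTeX; -cd makes latexmk run inside the source directory.
        ["latexmk", "-xelatex", "-cd", f"backend/{worktree}/Main.tex"],
        # 3. Copy the result to the assets directory (assumes the output is Main.pdf).
        ["cp", f"backend/{worktree}/Main.pdf", f"backend/assets/{form_type}.pdf"],
    ]


if __name__ == "__main__":
    for command in build_plan("qsm"):
        print(" ".join(command))
```

Keeping the plan as data makes it easy to print or log the commands before handing them to `subprocess.run`.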

2. Form Field Mapping

The system uses pre-compiled PDFs with form fields:

  • /backend/assets/qsm.pdf - QSM form template
  • /backend/assets/vsm.pdf - VSM form template

These PDFs contain named form fields that correspond to the application data structure.

3. PDF Processing Pipeline

User uploads PDF → Parse form data → Store in database → Generate filled PDF

LaTeX Template Structure

Main Components

Main.tex                 # Main document file
Content/
├── 01_content.tex      # Form content and fields
└── 99_glossary.tex     # Glossary definitions
HSRTReport/             # Custom document class
TeX/
├── Preamble.tex        # Package imports and settings
└── Settings/           # Configuration files

Form Fields

LaTeX form fields are defined using custom commands:

\CustomTextFieldDefault{pa-project-name}{}{Projektname}{width=\linewidth}
\CustomChoiceMenuDefault{pa-course}{}{width=\linewidth,default=-}{-,INF,ESB,LS,TEC,TEX,NXT}
\CheckBox[name=pa-qsm-studierende,width=1em,height=1em]{}

Field naming convention: pa- prefix followed by the field identifier.
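A small script can list the field names declared in a template and flag any that break the pa- convention. The sketch below assumes that the three command forms shown above are the only ways fields are declared; the regular expression and the helper names `extract_field_names` and `follows_convention` are illustrative and not part of the project.

```python
import re

# First brace argument of the two custom commands, or the name= option of \CheckBox.
FIELD_PATTERN = re.compile(
    r"\\Custom(?:TextField|ChoiceMenu)Default\{([^}]+)\}"
    r"|\\CheckBox\[[^\]]*?\bname=([^,\]]+)"
)


def extract_field_names(tex_source: str) -> list[str]:
    """Return field names in the order they appear in the LaTeX source."""
    names = []
    for match in FIELD_PATTERN.finditer(tex_source):
        names.append((match.group(1) or match.group(2)).strip())
    return names


def follows_convention(name: str) -> bool:
    """True if the name is 'pa-' followed by a non-empty identifier."""
    return name.startswith("pa-") and len(name) > len("pa-")


if __name__ == "__main__":
    sample = r"\CheckBox[name=pa-qsm-studierende,width=1em,height=1em]{}"
    for field in extract_field_names(sample):
        print(field, "ok" if follows_convention(field) else "violates convention")
```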

Docker Integration

Build Requirements

The Docker image includes all necessary LaTeX packages:

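# Refresh the package index in the same layer so the install does not hit a stale cache
RUN apt-get update && apt-get install -y \
    texlive-full \
    texlive-xetex \
    texlive-lang-german \
    texlive-fonts-extra \
    latexmk \
    git \
    && rm -rf /var/lib/apt/lists/*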

Environment Variables

# PDF template paths
QSM_TEMPLATE=/app/assets/qsm.pdf
VSM_TEMPLATE=/app/assets/vsm.pdf

# LaTeX source paths
LATEX_QSM_PATH=/app/latex-qsm
LATEX_VSM_PATH=/app/latex-vsm
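On the Python side, these variables could be read once at startup with the documented values as defaults. This is a minimal sketch: the `PdfPaths` dataclass and `load_pdf_paths` function are hypothetical names, and the API may load its configuration differently.

```python
import os
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Mapping, Optional


@dataclass(frozen=True)
class PdfPaths:
    qsm_template: PurePosixPath
    vsm_template: PurePosixPath
    latex_qsm: PurePosixPath
    latex_vsm: PurePosixPath


def load_pdf_paths(env: Optional[Mapping[str, str]] = None) -> PdfPaths:
    """Read the four path variables, falling back to the values documented above."""
    env = os.environ if env is None else env
    return PdfPaths(
        qsm_template=PurePosixPath(env.get("QSM_TEMPLATE", "/app/assets/qsm.pdf")),
        vsm_template=PurePosixPath(env.get("VSM_TEMPLATE", "/app/assets/vsm.pdf")),
        latex_qsm=PurePosixPath(env.get("LATEX_QSM_PATH", "/app/latex-qsm")),
        latex_vsm=PurePosixPath(env.get("LATEX_VSM_PATH", "/app/latex-vsm")),
    )


if __name__ == "__main__":
    print(load_pdf_paths())
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the process environment.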

Development Workflow

1. Modifying Templates

To modify PDF templates:

  1. Navigate to the appropriate worktree:

    cd backend/latex-qsm  # or latex-vsm
    
  2. Edit the LaTeX files

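  3. Build the PDF. The script path is relative to the directory that contains scripts/, so change back to that directory first: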

    ./scripts/build-pdfs.sh
    
  4. Test the new PDF with the application

2. Adding New Form Fields

  1. Add field definition in LaTeX:

    \CustomTextFieldDefault{pa-new-field}{}{Field Label}{width=\linewidth}
    
  2. Update the field mapping in pdf_field_mapping.py

  3. Add corresponding database fields if needed

  4. Rebuild the PDF template

3. Testing PDF Generation

# Test PDF generation in Docker
docker compose exec api python -c "
from pdf_filler import fill_pdf
# Test code here
"

Field Mapping Reference

Common Fields (Both QSM and VSM)

LaTeX Field        Database Field  Description
pa-applicant-type  applicantType   Person or Institution
pa-institution     institution     Institution name
pa-first-name      firstName       Applicant first name
pa-last-name       lastName        Applicant last name
pa-email           email           Contact email
pa-phone           phone           Phone number
pa-project-name    name            Project name
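Expressed as code, the common-field table is a plain dictionary lookup. The sketch below is an assumption about shape only: the real structure of pdf_field_mapping.py is not shown in this guide, and `COMMON_FIELD_MAP` and `to_database_record` are hypothetical names.

```python
# LaTeX/PDF field name -> database field name, copied from the table above.
COMMON_FIELD_MAP = {
    "pa-applicant-type": "applicantType",
    "pa-institution": "institution",
    "pa-first-name": "firstName",
    "pa-last-name": "lastName",
    "pa-email": "email",
    "pa-phone": "phone",
    "pa-project-name": "name",
}


def to_database_record(pdf_fields: dict) -> dict:
    """Rename known PDF fields to database fields; unknown fields are dropped."""
    return {
        COMMON_FIELD_MAP[field]: value
        for field, value in pdf_fields.items()
        if field in COMMON_FIELD_MAP
    }


if __name__ == "__main__":
    print(to_database_record({"pa-first-name": "Ada", "pa-project-name": "Lab kit"}))
```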

QSM-Specific Fields

LaTeX Field  Database Field          Description
pa-qsm-*     Various                 QSM-specific checkboxes
pa-cost-*    costs[].name/amountEur  Cost positions
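The pa-cost-* row maps a flat set of PDF fields onto a list of cost positions. The exact field names are not given in this guide, so the sketch below assumes a hypothetical pattern, pa-cost-<n>-name and pa-cost-<n>-amount, purely to illustrate the grouping; adjust the pattern to the names actually used in the template.

```python
import re

# Hypothetical naming scheme; the real template may number or name these differently.
COST_FIELD = re.compile(r"^pa-cost-(\d+)-(name|amount)$")


def collect_costs(pdf_fields: dict) -> list[dict]:
    """Group flat pa-cost-* fields into a list ordered by position number."""
    rows: dict[int, dict] = {}
    for field, value in pdf_fields.items():
        match = COST_FIELD.match(field)
        if not match or value in (None, ""):
            continue  # not a cost field, or left empty in the form
        index, kind = int(match.group(1)), match.group(2)
        row = rows.setdefault(index, {})
        if kind == "name":
            row["name"] = value
        else:
            # Accept a German decimal comma, e.g. "12,50".
            row["amountEur"] = float(str(value).replace(",", "."))
    return [rows[index] for index in sorted(rows)]


if __name__ == "__main__":
    print(collect_costs({"pa-cost-1-name": "Printing", "pa-cost-1-amount": "12,50"}))
```

The amount parsing handles a simple decimal comma only; values with thousands separators would need extra handling.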

VSM-Specific Fields

LaTeX Field     Database Field  Description
pa-vsm-*        Various         VSM-specific fields
pa-financing-*  financing.*     Financing options

Troubleshooting

Common Issues

  1. PDF build fails with XeLaTeX error

    • Ensure all LaTeX dependencies are installed
    • Check for syntax errors in .tex files
    • Verify fonts are available
  2. Form fields not filling

    • Check field names match between LaTeX and mapping
    • Verify that the PDF contains form fields (open it in a PDF reader, or list them with the debugging commands in the next section)
    • Check data types match expected format
  3. Git worktree errors

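    • Clear stale worktree entries with git worktree prune; to delete a worktree that still exists on disk, use git worktree remove <path>
    • Re-run ./scripts/build-pdfs.sh, which sets the worktrees up again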
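For the "form fields not filling" case, comparing the names found in the PDF with the names in the mapping usually shows the mismatch directly. A minimal sketch, assuming both sides are available as plain collections of strings; `compare_field_names` is a hypothetical helper, not an existing project function.

```python
def compare_field_names(pdf_field_names, mapped_field_names) -> dict:
    """Report names that exist on only one side of the PDF/mapping pair."""
    pdf = set(pdf_field_names)
    mapped = set(mapped_field_names)
    return {
        # Mapped but absent from the PDF: these values can never be filled in.
        "missing_in_pdf": sorted(mapped - pdf),
        # Present in the PDF but unmapped: these fields stay empty.
        "unmapped_in_pdf": sorted(pdf - mapped),
    }


if __name__ == "__main__":
    report = compare_field_names(
        ["pa-first-name", "pa-last-name", "pa-new-field"],
        ["pa-first-name", "pa-lastname"],
    )
    print(report)
```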

Debugging Commands

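# List text form fields in the PDF. get_form_text_fields() returns text fields only;
# use get_fields() instead to also see checkboxes and choice menus.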
docker compose exec api python -c "
import PyPDF2
with open('/app/assets/qsm.pdf', 'rb') as f:
    pdf = PyPDF2.PdfReader(f)
    fields = pdf.get_form_text_fields()
    for name, value in fields.items():
        print(f'{name}: {value}')
"

# Check LaTeX compilation log
cd backend/latex-qsm
cat Main.log
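Compilation logs are long, and TeX marks each error with a line that starts with "!". The sketch below pulls out those lines plus a little context instead of reading the whole file; `latex_errors` is a hypothetical helper, and the log path is the one used in the command above.

```python
from pathlib import Path


def latex_errors(log_text: str, context: int = 2) -> list[str]:
    """Return each '!' error line together with the next `context` lines."""
    lines = log_text.splitlines()
    errors = []
    for index, line in enumerate(lines):
        if line.startswith("!"):
            errors.append("\n".join(lines[index : index + 1 + context]))
    return errors


if __name__ == "__main__":
    log_path = Path("backend/latex-qsm/Main.log")
    if log_path.exists():
        # Logs may contain bytes that are not valid UTF-8, so replace them.
        for error in latex_errors(log_path.read_text(errors="replace")):
            print(error)
            print("-" * 40)
```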

Best Practices

  1. Version Control

    • Keep LaTeX templates in sync with main repo
    • Tag releases when updating PDF templates
    • Document field changes in commit messages
  2. Testing

    • Test both empty and filled PDFs
    • Verify all form fields are accessible
    • Check PDF compatibility across readers
  3. Performance

    • Pre-compile PDFs rather than generating on demand
    • Cache compiled PDFs in Docker image
    • Minimize LaTeX package dependencies

Future Enhancements

  1. Dynamic PDF Generation

    • Generate PDFs on-demand from LaTeX
    • Support custom form layouts
    • Template versioning system
  2. Field Validation

    • Implement LaTeX-side validation
    • Sync validation rules with frontend
    • Generate field documentation from LaTeX
  3. Multi-language Support

    • Internationalize LaTeX templates
    • Support multiple PDF languages
    • Dynamic language selection