# PDF Integration Guide
This document describes how the STUPA PDF API integrates with LaTeX templates to generate and process PDF forms.
## Overview
The STUPA PDF API system works with two types of funding application forms:
- **QSM** (Qualitätssicherungsmittel) - Quality assurance funding
- **VSM** (Verfasste Studierendenschaft Mittel) - Student body funding
## LaTeX Templates
### Repository Structure
The LaTeX templates are maintained in a separate Git repository and integrated as a submodule:
- Repository: `git@git.beimgraben.net:frederik/PA_Vorlage.git`
- Location: `/backend/latex-templates/`
### Branch Organization
Different form types are maintained in separate branches:
- `v1.2/QSM` - QSM application forms
- `v1.2/VSM` - VSM application forms
- `v1.2/VGL` - VGL forms (deprecated, not used)
### Working with Templates
The project uses Git worktrees to access multiple template versions simultaneously:
```bash
# Templates are checked out to:
#   /backend/latex-qsm/   QSM templates (branch: v1.2/QSM)
#   /backend/latex-vsm/   VSM templates (branch: v1.2/VSM)
```
## PDF Generation Workflow
### 1. Template Compilation
LaTeX templates are compiled to PDF using XeLaTeX:
```bash
# Build PDFs from LaTeX sources
./scripts/build-pdfs.sh
```
This script:
- Sets up Git worktrees for QSM and VSM branches
- Compiles LaTeX to PDF using `latexmk -xelatex`
- Copies generated PDFs to `/backend/assets/`
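
The three steps above can be sketched in Python as a list of commands per form type. This is only an illustration of what a script such as `build-pdfs.sh` might run, not its actual contents. The helper name `build_commands` is hypothetical, and the sketch assumes that the worktrees are created from the `latex-templates` submodule and that `latexmk` writes `Main.pdf` next to `Main.tex`.

```python
from pathlib import PurePosixPath
from typing import List

# Branch names as listed under "Branch Organization".
FORM_BRANCHES = {"qsm": "v1.2/QSM", "vsm": "v1.2/VSM"}


def build_commands(form: str, backend: str = "backend") -> List[List[str]]:
    """Return the commands a build could run for one form type (hypothetical helper)."""
    branch = FORM_BRANCHES[form]
    root = PurePosixPath(backend)
    worktree = root / f"latex-{form}"
    return [
        # 1. Check out the form's branch into its own worktree
        #    (the path is relative to the submodule because of `git -C`).
        ["git", "-C", str(root / "latex-templates"),
         "worktree", "add", f"../latex-{form}", branch],
        # 2. Compile with XeLaTeX; -cd makes latexmk change into the source directory.
        ["latexmk", "-xelatex", "-cd", str(worktree / "Main.tex")],
        # 3. Copy the result to the assets directory under the form's name.
        ["cp", str(worktree / "Main.pdf"), str(root / "assets" / f"{form}.pdf")],
    ]


if __name__ == "__main__":
    for command in build_commands("qsm"):
        print(" ".join(command))
```

Each command list could be passed to `subprocess.run(command, check=True)` so that a failing step stops the build.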
### 2. Form Field Mapping
The system uses pre-compiled PDFs with form fields:
- `/backend/assets/qsm.pdf` - QSM form template
- `/backend/assets/vsm.pdf` - VSM form template
These PDFs contain named form fields that correspond to the application data structure.
### 3. PDF Processing Pipeline
```
User uploads PDF → Parse form data → Store in database → Generate filled PDF
```
## LaTeX Template Structure
### Main Components
```
Main.tex # Main document file
Content/
├── 01_content.tex # Form content and fields
└── 99_glossary.tex # Glossary definitions
HSRTReport/ # Custom document class
TeX/
├── Preamble.tex # Package imports and settings
└── Settings/ # Configuration files
```
### Form Fields
LaTeX form fields are defined using custom commands:
```latex
\CustomTextFieldDefault{pa-project-name}{}{Projektname}{width=\linewidth}
\CustomChoiceMenuDefault{pa-course}{}{width=\linewidth,default=-}{-,INF,ESB,LS,TEC,TEX,NXT}
\CheckBox[name=pa-qsm-studierende,width=1em,height=1em]{}
```
Field naming convention: `pa-` prefix followed by the field identifier.
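
A small check can catch field names that break the convention before a template is rebuilt. The sketch below is an assumption about what "field identifier" allows: lowercase words and digits joined by hyphens, as in the examples shown here. The function name `is_valid_field_name` is hypothetical.

```python
import re

# Assumption: after the "pa-" prefix, identifiers are lowercase alphanumeric
# words joined by single hyphens (e.g. pa-project-name, pa-qsm-studierende).
FIELD_NAME = re.compile(r"pa-[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_field_name(name: str) -> bool:
    """Return True if the whole name follows the assumed pa-<identifier> pattern."""
    return FIELD_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["pa-project-name", "project-name", "pa-"]:
        print(candidate, is_valid_field_name(candidate))
```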
## Docker Integration
### Build Requirements
The Docker image includes all necessary LaTeX packages:
```dockerfile
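# texlive-full already contains XeTeX, German language support, and the extra
# fonts; the individual packages are listed to make the requirements explicit.
RUN apt-get update && apt-get install -y \
    texlive-full \
    texlive-xetex \
    texlive-lang-german \
    texlive-fonts-extra \
    latexmk \
    git \
    && rm -rf /var/lib/apt/lists/*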
```
### Environment Variables
```env
# PDF template paths
QSM_TEMPLATE=/app/assets/qsm.pdf
VSM_TEMPLATE=/app/assets/vsm.pdf
# LaTeX source paths
LATEX_QSM_PATH=/app/latex-qsm
LATEX_VSM_PATH=/app/latex-vsm
```
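
On the application side, these variables could be read once at startup. The following is a minimal sketch, not the project's actual configuration code: the `PdfSettings` class and `load_settings` function are hypothetical, and the fallback values simply repeat the paths listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PdfSettings:
    """Paths used for PDF processing (hypothetical container)."""
    qsm_template: str
    vsm_template: str
    latex_qsm_path: str
    latex_vsm_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> PdfSettings:
    """Read the variables from `env`, falling back to the documented paths."""
    return PdfSettings(
        qsm_template=env.get("QSM_TEMPLATE", "/app/assets/qsm.pdf"),
        vsm_template=env.get("VSM_TEMPLATE", "/app/assets/vsm.pdf"),
        latex_qsm_path=env.get("LATEX_QSM_PATH", "/app/latex-qsm"),
        latex_vsm_path=env.get("LATEX_VSM_PATH", "/app/latex-vsm"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.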
## Development Workflow
### 1. Modifying Templates
To modify PDF templates:
1. Navigate to the appropriate worktree:
   ```bash
   cd backend/latex-qsm # or latex-vsm
   ```
2. Edit the LaTeX files.
3. Return to the directory that contains `scripts/` and build the PDF:
   ```bash
   ./scripts/build-pdfs.sh
   ```
4. Test the new PDF with the application.
### 2. Adding New Form Fields
1. Add the field definition in LaTeX:
   ```latex
   \CustomTextFieldDefault{pa-new-field}{}{Field Label}{width=\linewidth}
   ```
2. Update the field mapping in `pdf_field_mapping.py`.
3. Add corresponding database fields if needed.
4. Rebuild the PDF template.
### 3. Testing PDF Generation
```bash
# Test PDF generation in Docker
docker compose exec api python -c "
from pdf_filler import fill_pdf
# Test code here
"
```
## Field Mapping Reference
### Common Fields (Both QSM and VSM)
| LaTeX Field | Database Field | Description |
|-------------|----------------|-------------|
| `pa-applicant-type` | `applicantType` | Person or Institution |
| `pa-institution` | `institution` | Institution name |
| `pa-first-name` | `firstName` | Applicant first name |
| `pa-last-name` | `lastName` | Applicant last name |
| `pa-email` | `email` | Contact email |
| `pa-phone` | `phone` | Phone number |
| `pa-project-name` | `name` | Project name |
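
The table above can be expressed as a plain dictionary. The sketch below is an illustration only: the guide does not show the contents of `pdf_field_mapping.py`, so the names `COMMON_FIELDS`, `to_database_record`, and `to_pdf_fields` are hypothetical.

```python
from typing import Dict

# LaTeX/PDF field name -> database field, copied from the table above.
COMMON_FIELDS: Dict[str, str] = {
    "pa-applicant-type": "applicantType",
    "pa-institution": "institution",
    "pa-first-name": "firstName",
    "pa-last-name": "lastName",
    "pa-email": "email",
    "pa-phone": "phone",
    "pa-project-name": "name",
}


def to_database_record(pdf_fields: Dict[str, str]) -> Dict[str, str]:
    """Rename known PDF fields to database keys; unknown fields are ignored."""
    return {COMMON_FIELDS[k]: v for k, v in pdf_fields.items() if k in COMMON_FIELDS}


def to_pdf_fields(record: Dict[str, str]) -> Dict[str, str]:
    """Inverse direction, used when filling a PDF from stored data."""
    reverse = {db: pdf for pdf, db in COMMON_FIELDS.items()}
    return {reverse[k]: v for k, v in record.items() if k in reverse}


if __name__ == "__main__":
    parsed = {"pa-first-name": "Ada", "pa-project-name": "Workshop", "pa-unknown": "x"}
    print(to_database_record(parsed))
```

Keeping a single mapping and deriving the reverse direction from it avoids the two directions drifting apart.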
### QSM-Specific Fields
| LaTeX Field | Database Field | Description |
|-------------|----------------|-------------|
| `pa-qsm-*` | Various | QSM-specific checkboxes |
| `pa-cost-*` | `costs[].name/amountEur` | Cost positions |
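
The `pa-cost-*` row maps a flat set of PDF fields onto a list of cost positions. The sketch below shows one way to do the grouping. It rests on an assumption the guide does not confirm: that cost fields are named `pa-cost-<n>-name` and `pa-cost-<n>-amount`. The actual names in the templates may differ, and `collect_costs` is a hypothetical helper.

```python
import re
from typing import Dict, List

# ASSUMPTION: hypothetical naming scheme pa-cost-<n>-name / pa-cost-<n>-amount.
COST_FIELD = re.compile(r"pa-cost-(\d+)-(name|amount)")


def collect_costs(pdf_fields: Dict[str, str]) -> List[dict]:
    """Group flat cost fields into a list shaped like costs[].name/amountEur."""
    rows: Dict[int, dict] = {}
    for key, value in pdf_fields.items():
        match = COST_FIELD.fullmatch(key)
        if match is None:
            continue  # not a cost field
        index, part = int(match.group(1)), match.group(2)
        row = rows.setdefault(index, {"name": "", "amountEur": 0.0})
        if part == "name":
            row["name"] = value.strip()
        elif value.strip():
            # Accept a decimal comma; thousands separators are not handled here.
            row["amountEur"] = float(value.strip().replace(",", "."))
    # Keep positions in numeric order and drop rows that were left empty.
    return [rows[i] for i in sorted(rows) if rows[i]["name"] or rows[i]["amountEur"]]


if __name__ == "__main__":
    print(collect_costs({"pa-cost-1-name": "Room", "pa-cost-1-amount": "100"}))
```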
### VSM-Specific Fields
| LaTeX Field | Database Field | Description |
|-------------|----------------|-------------|
| `pa-vsm-*` | Various | VSM-specific fields |
| `pa-financing-*` | `financing.*` | Financing options |
## Troubleshooting
### Common Issues
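1. **PDF build fails with XeLaTeX error**
   - Ensure all LaTeX dependencies are installed
   - Check for syntax errors in the `.tex` files
   - Verify that the required fonts are available
2. **Form fields not filling**
   - Check that field names match between the LaTeX source and the field mapping
   - Verify that the PDF contains form fields (open it in a PDF reader)
   - Check that data types match the expected format
3. **Git worktree errors**
   - Remove the affected worktree with `git worktree remove <path>`, or clear stale entries left behind by deleted directories with `git worktree prune`
   - Re-run `./scripts/build-pdfs.sh`, which sets up the worktrees again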
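
For the "form fields not filling" case, comparing the names in the PDF against the names in the mapping usually shows the mismatch quickly. The helper below is a minimal sketch with a hypothetical name, `diff_field_names`. The PDF names could come from the keys returned by PyPDF2's `PdfReader.get_fields()`, and the mapped names from whatever `pdf_field_mapping.py` defines.

```python
from typing import Dict, Iterable, Set


def diff_field_names(pdf_names: Iterable[str],
                     mapped_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Report names that exist on only one side (hypothetical helper)."""
    pdf, mapped = set(pdf_names), set(mapped_names)
    return {
        # In the mapping but absent from the PDF: these can never be filled.
        "missing_in_pdf": mapped - pdf,
        # In the PDF but not mapped: these always stay empty.
        "unmapped_in_pdf": pdf - mapped,
    }


if __name__ == "__main__":
    report = diff_field_names(
        pdf_names=["pa-email", "pa-phone", "pa-projekt-name"],
        mapped_names=["pa-email", "pa-phone", "pa-project-name"],
    )
    print(report)
```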
### Debugging Commands
```bash
# List all form fields in the PDF (text fields, checkboxes, choice menus)
docker compose exec api python -c "
import PyPDF2
with open('/app/assets/qsm.pdf', 'rb') as f:
    pdf = PyPDF2.PdfReader(f)
    fields = pdf.get_fields() or {}
    for name, field in fields.items():
        print(f'{name}: {field.value}')
"
# Check LaTeX compilation log
cd backend/latex-qsm
cat Main.log
```
## Best Practices
1. **Version Control**
   - Keep LaTeX templates in sync with the main repo
   - Tag releases when updating PDF templates
   - Document field changes in commit messages
2. **Testing**
   - Test both empty and filled PDFs
   - Verify all form fields are accessible
   - Check PDF compatibility across readers
3. **Performance**
   - Pre-compile PDFs rather than generating on demand
   - Cache compiled PDFs in the Docker image
   - Minimize LaTeX package dependencies
## Future Enhancements
1. **Dynamic PDF Generation**
   - Generate PDFs on demand from LaTeX
   - Support custom form layouts
   - Template versioning system
2. **Field Validation**
   - Implement LaTeX-side validation
   - Sync validation rules with frontend
   - Generate field documentation from LaTeX
3. **Multi-language Support**
   - Internationalize LaTeX templates
   - Support multiple PDF languages
   - Dynamic language selection