stupa-pdf-api/backend/ARCHITECTURE.md
Frederik Beimgraben ad697e5f54 feat: Complete redesign with OIDC auth, PDF upload, and enhanced workflow
2025-09-17 00:42:57 +02:00


Backend Architecture Documentation

Overview

The backend has been refactored from a monolithic structure into a modular, service-oriented architecture that emphasizes:

  • Separation of Concerns: Clear boundaries between layers (API, Service, Repository, Model)
  • Dependency Injection: Dynamic service resolution and configuration
  • Extensibility: Plugin-based system for PDF variants and providers
  • Maintainability: An organized code structure that follows the single responsibility principle
  • Scalability: Stateless services with proper connection pooling

Directory Structure

backend/
├── src/
│   ├── api/                 # API Layer
│   │   ├── routes/          # FastAPI routers
│   │   ├── middleware/      # Custom middleware
│   │   └── dependencies/    # Dependency injection helpers
│   │
│   ├── services/            # Business Logic Layer
│   │   ├── base.py         # Base service classes
│   │   ├── application.py  # Application business logic
│   │   ├── pdf.py          # PDF processing service
│   │   └── auth.py         # Authentication service
│   │
│   ├── repositories/        # Data Access Layer
│   │   ├── base.py         # Base repository pattern
│   │   ├── application.py  # Application repository
│   │   └── attachment.py   # Attachment repository
│   │
│   ├── models/             # Database Models
│   │   ├── base.py        # Base model with mixins
│   │   └── application.py # Application entities
│   │
│   ├── providers/          # Dynamic Providers
│   │   ├── pdf_qsm.py     # QSM PDF variant provider
│   │   └── pdf_vsm.py     # VSM PDF variant provider
│   │
│   ├── config/            # Configuration Management
│   │   └── settings.py    # Centralized settings with Pydantic
│   │
│   ├── core/              # Core Infrastructure
│   │   ├── container.py   # Dependency injection container
│   │   └── database.py    # Database management
│   │
│   └── utils/             # Utility Functions
│       └── helpers.py     # Common utilities

Architecture Layers

1. API Layer (api/)

Responsibility: HTTP request/response handling, validation, routing

  • Routes: Modular FastAPI routers for different domains
  • Middleware: Cross-cutting concerns (rate limiting, logging, error handling)
  • Dependencies: FastAPI dependency injection functions

# Example: api/routes/applications.py
@router.post("/", response_model=ApplicationResponse)
def create_application(
    data: ApplicationCreate,
    service: ApplicationService = Depends(get_application_service),
):
    # The CRUD service methods are synchronous, so the call is not awaited;
    # FastAPI runs plain `def` routes in a worker thread.
    return service.create(data.dict())

2. Service Layer (services/)

Responsibility: Business logic, orchestration, validation rules

  • Encapsulates all business rules and workflows
  • Coordinates between repositories and external services
  • Handles complex validations and transformations
  • Stateless and testable

# Example: services/application.py
class ApplicationService(CRUDService[Application]):
    def submit_application(self, id: int) -> Application:
        # Business logic for submission
        app = self.repository.get_or_404(id)
        self._validate_submission(app)
        app.status = ApplicationStatus.SUBMITTED
        return self.repository.update(app)
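
The `_validate_submission` step is not spelled out in this document. As a rough sketch of what such a business rule could look like, the standalone function below rejects anything that is not a non-empty draft. The `DRAFT` status, both rules, and the simplified `Application` dataclass are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class ApplicationStatus(Enum):
    DRAFT = "draft"          # hypothetical member, not confirmed by this document
    SUBMITTED = "submitted"  # used in the service example above


class BusinessRuleViolation(Exception):
    """Raised when a workflow rule is broken."""


@dataclass
class Application:
    # Simplified stand-in for the SQLAlchemy model
    status: ApplicationStatus = ApplicationStatus.DRAFT
    payload: dict = field(default_factory=dict)


def validate_submission(app: Application) -> None:
    # Assumed rule 1: only drafts may be submitted
    if app.status is not ApplicationStatus.DRAFT:
        raise BusinessRuleViolation(f"cannot submit from status {app.status.value}")
    # Assumed rule 2: an application without any payload is never submittable
    if not app.payload:
        raise BusinessRuleViolation("payload is empty")
```

Keeping the rule in the service layer means the route handler and the repository never need to know which status transitions are legal.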

3. Repository Layer (repositories/)

Responsibility: Data access abstraction, CRUD operations

  • Implements repository pattern for database access
  • Provides clean abstraction over SQLAlchemy
  • Handles query building and optimization
  • Manages transactions

# Example: repositories/application.py
class ApplicationRepository(BaseRepository[Application]):
    def find_by_status(self, status: ApplicationStatus) -> List[Application]:
        return self.query().filter(
            Application.status == status
        ).all()

4. Model Layer (models/)

Responsibility: Data structure definition, ORM mapping

  • SQLAlchemy models with proper relationships
  • Base classes with common functionality (timestamps, soft delete)
  • Model mixins for reusable behavior
  • Business entity representation

# Example: models/application.py
class Application(ExtendedBaseModel):
    __tablename__ = "applications"
    
    pa_id = Column(String(64), unique=True, index=True)
    status = Column(SQLEnum(ApplicationStatus))
    payload = Column(JSON)

Key Components

Dependency Injection Container

The system uses a custom dependency injection container for managing service lifecycles:

# core/container.py
class Container:
    def register_service(self, name: str, service_class: Type[BaseService]) -> None:
        # Register service with automatic dependency resolution
        ...

    def get_service(self, name: str) -> BaseService:
        # Retrieve service instance with dependencies injected
        ...

Benefits:

  • Loose coupling between components
  • Easy testing with mock services
  • Dynamic service configuration
  • Singleton pattern support
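
To make "automatic dependency resolution" and "singleton pattern support" more concrete, here is a minimal, self-contained sketch of how such a container might behave. It is not the project's `core/container.py`: resolving constructor parameters by matching their names to registered service names is an assumption chosen to keep the example short.

```python
import inspect
from typing import Any, Dict, Type


class Container:
    def __init__(self) -> None:
        self._classes: Dict[str, Type[Any]] = {}
        self._instances: Dict[str, Any] = {}

    def register_service(self, name: str, service_class: Type[Any]) -> None:
        self._classes[name] = service_class

    def get_service(self, name: str) -> Any:
        # Singleton behaviour: reuse the instance once it has been built
        if name in self._instances:
            return self._instances[name]
        cls = self._classes[name]
        # Assumed convention: each constructor parameter names another service
        params = inspect.signature(cls).parameters
        deps = {param: self.get_service(param) for param in params}
        instance = cls(**deps)
        self._instances[name] = instance
        return instance


class ApplicationRepository:
    pass


class ApplicationService:
    def __init__(self, application_repository: ApplicationRepository) -> None:
        self.repository = application_repository


container = Container()
container.register_service("application_repository", ApplicationRepository)
container.register_service("application_service", ApplicationService)
service = container.get_service("application_service")
```

In tests, registering a fake class under `"application_repository"` swaps the dependency without touching `ApplicationService`. This sketch does not detect circular dependencies.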

Configuration Management

Centralized configuration using Pydantic Settings:

# config/settings.py
class Settings(BaseSettings):
    database: DatabaseSettings
    security: SecuritySettings
    rate_limit: RateLimitSettings
    storage: StorageSettings
    pdf: PDFSettings
    app: ApplicationSettings

Features:

  • Environment variable support
  • Type validation
  • Default values
  • Configuration file support (JSON/YAML)
  • Dynamic override capability

Provider Pattern for PDF Variants

Extensible system for handling different PDF types:

# providers/pdf_qsm.py
class QSMProvider(PDFVariantProvider):
    def parse_pdf_fields(self, fields: Dict) -> Dict:
        # QSM-specific parsing logic
        ...

    def map_payload_to_fields(self, payload: Dict) -> Dict:
        # QSM-specific field mapping
        ...

Advantages:

  • Easy to add new PDF variants
  • Variant-specific validation rules
  • Dynamic provider registration
  • Clean separation of variant logic
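
"Dynamic provider registration" could be implemented in several ways. One common approach is a decorator-based registry, sketched below. The `register_provider` and `get_provider` helpers and the `qsm_` field prefix are hypothetical names introduced for this example, not part of the documented API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class PDFVariantProvider(ABC):
    @abstractmethod
    def map_payload_to_fields(self, payload: Dict) -> Dict:
        """Translate an application payload into PDF form field values."""


_REGISTRY: Dict[str, Type[PDFVariantProvider]] = {}


def register_provider(variant: str) -> Callable[[Type[PDFVariantProvider]], Type[PDFVariantProvider]]:
    # Class decorator that records the provider under a case-insensitive key
    def decorator(cls: Type[PDFVariantProvider]) -> Type[PDFVariantProvider]:
        _REGISTRY[variant.upper()] = cls
        return cls
    return decorator


def get_provider(variant: str) -> PDFVariantProvider:
    try:
        return _REGISTRY[variant.upper()]()
    except KeyError:
        raise ValueError(f"unknown PDF variant: {variant}") from None


@register_provider("QSM")
class QSMProvider(PDFVariantProvider):
    def map_payload_to_fields(self, payload: Dict) -> Dict:
        # Hypothetical mapping: prefix every key with the variant name
        return {f"qsm_{key}": value for key, value in payload.items()}
```

Adding a new variant then only requires a new decorated class; callers keep using `get_provider(variant)` unchanged.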

Database Architecture

Base Model Classes

# models/base.py
class BaseModel:
    # Common fields and methods
    ...
    
class TimestampMixin:
    created_at = Column(DateTime)
    updated_at = Column(DateTime)
    
class SoftDeleteMixin:
    is_deleted = Column(Boolean)
    deleted_at = Column(DateTime)
    
class AuditMixin:
    created_by = Column(String)
    updated_by = Column(String)

Connection Management

  • Connection pooling with configurable size
  • Automatic retry on connection failure
  • Session scoping for transaction management
  • Health check utilities
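
The retry behaviour is not specified further here. A minimal sketch, assuming exponential backoff and a caller-supplied `connect` callable, might look like this. The function name, the default attempt count, and the delays are illustrative choices.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists only so tests can record the delays instead of actually waiting.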

Service Patterns

CRUD Service Base

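from typing import Any, Dict, Generic, Optional, TypeVar

T = TypeVar("T")  # model type managed by the service

# Generic[T] is what allows CRUDService[Application] in the service layer
class CRUDService(BaseService, Generic[T]):
    def create(self, data: Dict) -> T: ...
    def update(self, id: Any, data: Dict) -> T: ...
    def delete(self, id: Any, soft: bool = True) -> bool: ...
    def get(self, id: Any) -> Optional[T]: ...
    def list(self, filters: Dict, page: int, page_size: int) -> Dict: ...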
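
The shape of the dictionary returned by `list` is not documented. One plausible layout, shown here purely as an assumption, bundles the page of items with paging metadata. The helper works on an in-memory sequence; a real repository would more likely push the slice into the query with LIMIT and OFFSET.

```python
from typing import Any, Dict, Sequence


def paginate(items: Sequence[Any], page: int = 1, page_size: int = 20) -> Dict[str, Any]:
    """Return one page of `items` plus paging metadata (hypothetical layout)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return {
        "items": list(items[start:start + page_size]),
        "total": len(items),
        "page": page,
        "page_size": page_size,
        # Ceiling division without importing math
        "pages": -(-len(items) // page_size),
    }
```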

Error Handling

Hierarchical exception system:

ServiceException
├── ValidationError
├── BusinessRuleViolation
├── ResourceNotFoundError
└── ResourceConflictError
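
Expressed as Python classes, the hierarchy above might look like the following sketch. The `status_code` attributes and the `to_http_status` helper are assumptions added to show how an error handler could translate service errors into HTTP responses; the specific codes are illustrative.

```python
class ServiceException(Exception):
    status_code = 500  # assumed default for unexpected service failures


class ValidationError(ServiceException):
    status_code = 422


class BusinessRuleViolation(ServiceException):
    status_code = 400


class ResourceNotFoundError(ServiceException):
    status_code = 404


class ResourceConflictError(ServiceException):
    status_code = 409


def to_http_status(exc: Exception) -> int:
    # Anything outside the hierarchy is treated as an internal error
    if isinstance(exc, ServiceException):
        return exc.status_code
    return 500
```

Note that Pydantic also exports a class named `ValidationError`, so modules that use both need an alias for one of them.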

Transaction Management

with service.handle_errors("operation"):
    with repository.transaction():
        # Perform multiple operations
        # Automatic rollback on error
        ...
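
How `handle_errors` and `transaction` work internally is not shown in this document. A minimal sketch using `contextlib.contextmanager` follows, written as free functions over a session object. The `RecordingSession` class is a hypothetical test double with the `commit` and `rollback` methods a database session would offer.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ServiceException(Exception):
    pass


class RecordingSession:
    """Hypothetical stand-in for a database session; records what was called."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def commit(self) -> None:
        self.events.append("commit")

    def rollback(self) -> None:
        self.events.append("rollback")


@contextmanager
def transaction(session: RecordingSession) -> Iterator[RecordingSession]:
    try:
        yield session
        session.commit()  # reached only if the block raised nothing
    except Exception:
        session.rollback()
        raise


@contextmanager
def handle_errors(operation: str) -> Iterator[None]:
    try:
        yield
    except ServiceException:
        raise  # already a domain error, pass it through unchanged
    except Exception as exc:
        raise ServiceException(f"{operation} failed: {exc}") from exc
```

Nesting the two in the order shown above means the rollback happens first and the caller then sees a single `ServiceException` naming the failed operation.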

API Design

RESTful Endpoints

POST   /api/applications          # Create application
GET    /api/applications          # List applications
GET    /api/applications/{id}     # Get application
PUT    /api/applications/{id}     # Update application
DELETE /api/applications/{id}     # Delete application

POST   /api/applications/{id}/submit     # Submit application
POST   /api/applications/{id}/review     # Review application
GET    /api/applications/{id}/pdf        # Generate PDF

Request/Response Models

Using Pydantic for validation:

class ApplicationCreate(BaseModel):
    variant: ApplicationType
    payload: Dict[str, Any]
    
class ApplicationResponse(BaseModel):
    id: int
    pa_id: str
    status: ApplicationStatus
    created_at: datetime

Middleware Stack

  1. CORS Middleware: Cross-origin resource sharing
  2. Rate Limit Middleware: Request throttling
  3. Logging Middleware: Request/response logging
  4. Error Handler Middleware: Global error handling
  5. Authentication Middleware: JWT/API key validation

Security Features

  • JWT-based authentication
  • API key support
  • Rate limiting per IP/key
  • SQL injection prevention via ORM
  • Input sanitization
  • Audit logging
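
As an illustration of rate limiting per IP or API key, the sketch below implements a sliding-window limiter keyed by an arbitrary string. It is not the project's middleware: the class name and the in-memory storage are assumptions, and a deployment with several worker processes would need a shared store such as Redis instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    def __init__(
        self,
        limit: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Return True and record the request if `key` is under its limit."""
        now = self._clock()
        hits = self._hits[key]
        # Forget requests that have left the window
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

With `limit=60` and the default window, calling `allow(client_ip)` once per request would correspond to a setting like `RATE_IP_PER_MIN=60`.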

Performance Optimizations

  • Database connection pooling
  • Lazy loading relationships
  • Query optimization with indexes
  • Caching support (Redis)
  • Async request handling
  • PDF generation caching
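
PDF generation caching could, for instance, key each rendered file by a hash of the variant and payload, so identical requests skip rendering. The sketch below assumes a JSON-serializable payload and a caller-supplied `render` function. Both names are hypothetical, and the in-memory dictionary stands in for whatever cache backend is actually used.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(variant: str, payload: Dict[str, Any]) -> str:
    # Sorting keys makes the hash independent of dictionary ordering
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{variant}:{canonical}".encode("utf-8")).hexdigest()


class PDFCache:
    def __init__(self, render: Callable[[str, Dict[str, Any]], bytes]) -> None:
        self._render = render
        self._store: Dict[str, bytes] = {}

    def get(self, variant: str, payload: Dict[str, Any]) -> bytes:
        key = cache_key(variant, payload)
        if key not in self._store:
            # Cache miss: render once and remember the bytes
            self._store[key] = self._render(variant, payload)
        return self._store[key]
```

A cache like this has to be invalidated when a PDF template changes, for example by including a template version in the key.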

Testing Strategy

Unit Tests

  • Service logic testing
  • Repository method testing
  • Model validation testing
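
Because services receive their repository from outside, service logic can be unit tested without a database. The following sketch uses `unittest.mock.Mock` as the repository; the trimmed-down `ApplicationService` and the string statuses are simplifications for this example, not the project's actual classes.

```python
from unittest.mock import Mock


class ApplicationService:
    """Trimmed-down stand-in for the service shown earlier."""

    def __init__(self, repository) -> None:
        self.repository = repository

    def submit_application(self, id: int):
        app = self.repository.get_or_404(id)
        app.status = "SUBMITTED"
        return self.repository.update(app)


def test_submit_sets_status_and_persists() -> None:
    app = Mock(status="DRAFT")
    repository = Mock()
    repository.get_or_404.return_value = app
    repository.update.side_effect = lambda entity: entity  # echo back what was saved

    result = ApplicationService(repository).submit_application(7)

    repository.get_or_404.assert_called_once_with(7)
    repository.update.assert_called_once_with(app)
    assert result.status == "SUBMITTED"
```

A test runner such as pytest would collect the `test_` function automatically.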

Integration Tests

  • API endpoint testing
  • Database transaction testing
  • PDF processing testing

End-to-End Tests

  • Complete workflow testing
  • Multi-service interaction testing

Deployment Considerations

Environment Variables

# Database
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_DB=stupa
MYSQL_USER=user
MYSQL_PASSWORD=password

# Security
MASTER_KEY=secret_key
JWT_SECRET_KEY=jwt_secret

# Rate Limiting
RATE_IP_PER_MIN=60
RATE_KEY_PER_MIN=30

# PDF Templates
QSM_TEMPLATE=assets/qsm.pdf
VSM_TEMPLATE=assets/vsm.pdf
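
The project loads these values through Pydantic Settings. To show the underlying idea without third-party dependencies, the sketch below reads the database variables with the standard library only. It is a stand-in, not the project's `config/settings.py`: treating `MYSQL_USER` and `MYSQL_PASSWORD` as required and the other three as optional with defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DatabaseSettings:
    host: str
    port: int
    name: str
    user: str
    password: str


def load_database_settings(env: Mapping[str, str] = os.environ) -> DatabaseSettings:
    return DatabaseSettings(
        host=env.get("MYSQL_HOST", "localhost"),
        port=int(env.get("MYSQL_PORT", "3306")),  # type conversion fails fast on bad input
        name=env.get("MYSQL_DB", "stupa"),
        user=env["MYSQL_USER"],          # assumed required: KeyError if missing
        password=env["MYSQL_PASSWORD"],  # assumed required: KeyError if missing
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.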

Docker Support

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]

Scaling Considerations

  • Stateless services for horizontal scaling
  • Support for database read replicas
  • Cache layer for frequently accessed data
  • Async processing for heavy operations
  • Ready for message queue integration

Migration Path

From Old to New Architecture

  1. Phase 1: Set up the new structure alongside the old code
  2. Phase 2: Migrate database models
  3. Phase 3: Implement service layer
  4. Phase 4: Create API routes
  5. Phase 5: Migrate business logic
  6. Phase 6: Remove old code

Database Migrations

Using Alembic for version control:

alembic init migrations
alembic revision --autogenerate -m "Initial migration"
alembic upgrade head

Monitoring & Observability

  • Structured logging with context
  • Prometheus metrics integration
  • Health check endpoints
  • Performance profiling hooks
  • Ready for error tracking integration
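
"Structured logging with context" can be approximated with the standard `logging` module alone. The formatter below is a minimal sketch, assuming that contextual fields arrive in a `context` attribute on the log record. That attribute name is a convention chosen for this example.

```python
import json
import logging


class JSONFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: extra fields are passed as a `context` dict
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry, sort_keys=True)
```

After attaching the formatter to a handler, a call such as `logger.info("application created", extra={"context": {"request_id": "r1"}})` would emit the request ID as a top-level JSON field.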

Future Enhancements

  1. GraphQL Support: Alternative API interface
  2. WebSocket Support: Real-time updates
  3. Event Sourcing: Audit trail and history
  4. Microservices: Service decomposition
  5. API Gateway: Advanced routing and auth
  6. Message Queue: Async task processing
  7. Search Engine: Elasticsearch integration
  8. Machine Learning: PDF field prediction

Conclusion

This refactored architecture provides:

  • Maintainability: Clear structure and separation
  • Scalability: Ready for growth
  • Testability: Isolated components
  • Extensibility: Plugin-based design
  • Performance: Optimized patterns
  • Security: Built-in best practices

The modular design allows teams to work independently on different components while maintaining system integrity through well-defined interfaces.