stupa-pdf-api/backend/ARCHITECTURE.md
Frederik Beimgraben ad697e5f54 feat: Complete redesign with OIDC auth, PDF upload, and enhanced workflow
BREAKING CHANGE: Major architecture overhaul removing LaTeX compilation

- Removed embedded LaTeX compilation
- Added OIDC/OAuth2 authentication with Nextcloud integration
- Added email authentication with magic links
- Implemented role-based access control (RBAC)
- Added PDF template upload and field mapping
- Implemented visual form designer capability
- Created multi-stage approval workflow
- Added voting mechanism for AStA members
- Enhanced user dashboard with application tracking
- Added comprehensive audit trail and history
- Improved security with JWT tokens and encryption

New Features:
- OIDC single sign-on with automatic role mapping
- Dual authentication (OIDC + Email)
- Upload fillable PDFs as templates
- Graphical field mapping interface
- Configurable workflow with reviews and voting
- Admin panel for role and permission management
- Email notifications for status updates
- Docker compose setup with Redis and MailHog

Migration Required:
- Database schema updates via Alembic
- Configuration of OIDC provider
- Upload of PDF templates to replace LaTeX
- Role mapping configuration
2025-09-17 00:42:57 +02:00


# Backend Architecture Documentation
## Overview
The backend has been refactored from a monolithic structure into a modular, service-oriented architecture that emphasizes:
- **Separation of Concerns**: Clear boundaries between layers (API, Service, Repository, Model)
- **Dependency Injection**: Dynamic service resolution and configuration
- **Extensibility**: Plugin-based system for PDF variants and providers
- **Maintainability**: Organized code structure that follows the single responsibility principle
- **Scalability**: Stateless services with proper connection pooling
## Directory Structure
```
backend/
├── src/
│   ├── api/                  # API Layer
│   │   ├── routes/           # FastAPI routers
│   │   ├── middleware/       # Custom middleware
│   │   └── dependencies/     # Dependency injection helpers
│   │
│   ├── services/             # Business Logic Layer
│   │   ├── base.py           # Base service classes
│   │   ├── application.py    # Application business logic
│   │   ├── pdf.py            # PDF processing service
│   │   └── auth.py           # Authentication service
│   │
│   ├── repositories/         # Data Access Layer
│   │   ├── base.py           # Base repository pattern
│   │   ├── application.py    # Application repository
│   │   └── attachment.py     # Attachment repository
│   │
│   ├── models/               # Database Models
│   │   ├── base.py           # Base model with mixins
│   │   └── application.py    # Application entities
│   │
│   ├── providers/            # Dynamic Providers
│   │   ├── pdf_qsm.py        # QSM PDF variant provider
│   │   └── pdf_vsm.py        # VSM PDF variant provider
│   │
│   ├── config/               # Configuration Management
│   │   └── settings.py       # Centralized settings with Pydantic
│   │
│   ├── core/                 # Core Infrastructure
│   │   ├── container.py      # Dependency injection container
│   │   └── database.py       # Database management
│   │
│   └── utils/                # Utility Functions
│       └── helpers.py        # Common utilities
```
## Architecture Layers
### 1. API Layer (`api/`)
**Responsibility**: HTTP request/response handling, validation, routing
- **Routes**: Modular FastAPI routers for different domains
- **Middleware**: Cross-cutting concerns (rate limiting, logging, error handling)
- **Dependencies**: FastAPI dependency injection functions
```python
# Example: api/routes/applications.py
@router.post("/", response_model=ApplicationResponse)
async def create_application(
    data: ApplicationCreate,
    service: ApplicationService = Depends(get_application_service),
):
    # CRUDService.create is declared as a synchronous method, so it is not awaited.
    return service.create(data.dict())
```
### 2. Service Layer (`services/`)
**Responsibility**: Business logic, orchestration, validation rules
- Encapsulates all business rules and workflows
- Coordinates between repositories and external services
- Handles complex validations and transformations
- Stateless and testable
```python
# Example: services/application.py
class ApplicationService(CRUDService[Application]):
    def submit_application(self, id: int) -> Application:
        # Business logic for submission
        app = self.repository.get_or_404(id)
        self._validate_submission(app)
        app.status = ApplicationStatus.SUBMITTED
        return self.repository.update(app)
```
### 3. Repository Layer (`repositories/`)
**Responsibility**: Data access abstraction, CRUD operations
- Implements repository pattern for database access
- Provides clean abstraction over SQLAlchemy
- Handles query building and optimization
- Transaction management
```python
# Example: repositories/application.py
class ApplicationRepository(BaseRepository[Application]):
    def find_by_status(self, status: ApplicationStatus) -> List[Application]:
        return self.query().filter(
            Application.status == status
        ).all()
```
### 4. Model Layer (`models/`)
**Responsibility**: Data structure definition, ORM mapping
- SQLAlchemy models with proper relationships
- Base classes with common functionality (timestamps, soft delete)
- Model mixins for reusable behavior
- Business entity representation
```python
# Example: models/application.py
class Application(ExtendedBaseModel):
    __tablename__ = "applications"

    pa_id = Column(String(64), unique=True, index=True)
    status = Column(SQLEnum(ApplicationStatus))
    payload = Column(JSON)
```
## Key Components
### Dependency Injection Container
The system uses a custom dependency injection container for managing service lifecycles:
```python
# core/container.py
class Container:
    def register_service(self, name: str, service_class: Type[BaseService]) -> None:
        # Register service with automatic dependency resolution
        ...

    def get_service(self, name: str) -> BaseService:
        # Retrieve service instance with dependencies injected
        ...
```
**Benefits:**
- Loose coupling between components
- Easy testing with mock services
- Dynamic service configuration
- Singleton pattern support
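
To illustrate the registration and lookup flow, here is a minimal sketch of such a container. It assumes constructor injection by parameter name and one shared instance per service; `register_service` and `get_service` follow the signatures shown above, while `MailService` and `NotificationService` are hypothetical, and the real `core/container.py` may resolve dependencies differently.

```python
import inspect
from typing import Any, Dict, List, Type


class BaseService:
    """Stand-in for the project's base service class (assumed)."""


class Container:
    """Tiny DI container that injects constructor parameters by registered name."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type[BaseService]] = {}
        self._instances: Dict[str, BaseService] = {}

    def register_service(self, name: str, service_class: Type[BaseService]) -> None:
        self._classes[name] = service_class

    def get_service(self, name: str) -> BaseService:
        # Singleton behaviour: build each service once, then reuse it.
        if name in self._instances:
            return self._instances[name]
        if name not in self._classes:
            raise KeyError(f"service not registered: {name}")
        service_class = self._classes[name]
        # Inject every constructor parameter whose name matches a registered service.
        params = inspect.signature(service_class.__init__).parameters
        kwargs: Dict[str, Any] = {
            p: self.get_service(p) for p in params if p != "self" and p in self._classes
        }
        instance = service_class(**kwargs)
        self._instances[name] = instance
        return instance


# Hypothetical services, used only to demonstrate the wiring.
class MailService(BaseService):
    def __init__(self) -> None:
        self.sent: List[str] = []


class NotificationService(BaseService):
    def __init__(self, mail: MailService) -> None:
        self.mail = mail


container = Container()
container.register_service("mail", MailService)
container.register_service("notifications", NotificationService)
notifications = container.get_service("notifications")
```

In tests, registering a fake class under the same name (for example `"mail"`) swaps the dependency without touching `NotificationService`. This sketch does not detect circular dependencies; a production container would need a guard for that.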
### Configuration Management
Centralized configuration using Pydantic Settings:
```python
# config/settings.py
class Settings(BaseSettings):
    database: DatabaseSettings
    security: SecuritySettings
    rate_limit: RateLimitSettings
    storage: StorageSettings
    pdf: PDFSettings
    app: ApplicationSettings
```
**Features:**
- Environment variable support
- Type validation
- Default values
- Configuration file support (JSON/YAML)
- Dynamic override capability
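
The combination of environment variable support, type validation, and default values can be sketched without any third-party dependency. The example below is a plain-dataclass stand-in, not the Pydantic implementation: `load_database_settings` is a hypothetical helper, and only the `MYSQL_*` variable names are taken from this document.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class DatabaseSettings:
    host: str = "localhost"
    port: int = 3306
    db: str = "stupa"


# Field name -> converter applied to the raw environment string.
_CASTS: Dict[str, Callable[[str], object]] = {"host": str, "port": int, "db": str}


def load_database_settings(env: Mapping[str, str] = os.environ) -> DatabaseSettings:
    """Build settings from MYSQL_* variables, falling back to the defaults."""
    values = {}
    for name, cast in _CASTS.items():
        raw = env.get(f"MYSQL_{name.upper()}")
        if raw is not None:
            # Type validation: int("abc") raises ValueError for a bad port.
            values[name] = cast(raw)
    return DatabaseSettings(**values)


settings = load_database_settings({"MYSQL_PORT": "3307"})
```

Passing the environment as a mapping keeps the loader easy to test; Pydantic's `BaseSettings` performs the equivalent lookup and coercion declaratively.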
### Provider Pattern for PDF Variants
Extensible system for handling different PDF types:
```python
# providers/pdf_qsm.py
class QSMProvider(PDFVariantProvider):
    def parse_pdf_fields(self, fields: Dict) -> Dict:
        # QSM-specific parsing logic
        ...

    def map_payload_to_fields(self, payload: Dict) -> Dict:
        # QSM-specific field mapping
        ...
```
**Advantages:**
- Easy to add new PDF variants
- Variant-specific validation rules
- Dynamic provider registration
- Clean separation of variant logic
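
Dynamic provider registration could look roughly like the following sketch. The decorator `register_provider`, the lookup function `get_provider`, and the `qsm_` field prefix are assumptions made for illustration; the actual registration mechanism is not shown in this document.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class PDFVariantProvider(ABC):
    @abstractmethod
    def map_payload_to_fields(self, payload: Dict) -> Dict:
        """Translate an application payload into PDF form field values."""


_PROVIDERS: Dict[str, Type[PDFVariantProvider]] = {}


def register_provider(variant: str) -> Callable[[Type[PDFVariantProvider]], Type[PDFVariantProvider]]:
    """Class decorator that registers a provider under a variant name."""
    def decorator(cls: Type[PDFVariantProvider]) -> Type[PDFVariantProvider]:
        _PROVIDERS[variant.upper()] = cls
        return cls
    return decorator


def get_provider(variant: str) -> PDFVariantProvider:
    """Instantiate the provider for a variant, case-insensitively."""
    try:
        return _PROVIDERS[variant.upper()]()
    except KeyError:
        raise ValueError(f"unknown PDF variant: {variant}") from None


@register_provider("QSM")
class QSMProvider(PDFVariantProvider):
    def map_payload_to_fields(self, payload: Dict) -> Dict:
        # Hypothetical mapping: the real QSM field names are not part of this document.
        return {f"qsm_{key}": value for key, value in payload.items()}


fields = get_provider("qsm").map_payload_to_fields({"title": "Workshop"})
```

With this shape, adding a new variant means adding one decorated class; callers keep using `get_provider` unchanged.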
## Database Architecture
### Base Model Classes
```python
# models/base.py
class BaseModel:
    # Common fields and methods
    ...

class TimestampMixin:
    created_at = Column(DateTime)
    updated_at = Column(DateTime)

class SoftDeleteMixin:
    is_deleted = Column(Boolean)
    deleted_at = Column(DateTime)

class AuditMixin:
    created_by = Column(String)
    updated_by = Column(String)
```
### Connection Management
- Connection pooling with configurable size
- Automatic retry on connection failure
- Session scoping for transaction management
- Health check utilities
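
Automatic retry on connection failure is commonly implemented with exponential backoff. The sketch below shows the idea under stated assumptions: `connect_with_retry`, its parameters, and the choice of `ConnectionError` as the retryable exception are all hypothetical, and the real `core/database.py` may rely on driver- or pool-level retry settings instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[ConnectionError] = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Do not sleep after the final attempt; just report the failure.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    assert last_error is not None
    raise last_error
```

The `sleep` parameter is injectable so that tests can record the delays instead of actually waiting.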
## Service Patterns
### CRUD Service Base
```python
# T is the model type, e.g. CRUDService[Application]
class CRUDService(BaseService, Generic[T]):
    def create(self, data: Dict) -> T: ...
    def update(self, id: Any, data: Dict) -> T: ...
    def delete(self, id: Any, soft: bool = True) -> bool: ...
    def get(self, id: Any) -> Optional[T]: ...
    def list(self, filters: Dict, page: int, page_size: int) -> Dict: ...
```
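
Since `list` takes `page` and `page_size` and returns a `Dict`, the result is presumably a pagination envelope. The following sketch shows one possible shape; the helper `paginate` and the key names (`items`, `total`, `pages`) are assumptions, and a real repository would apply `LIMIT`/`OFFSET` in the query rather than slicing in memory.

```python
import math
from typing import Any, Dict, Sequence


def paginate(items: Sequence[Any], page: int = 1, page_size: int = 20) -> Dict[str, Any]:
    """Return one page of `items` plus the metadata a client needs to navigate."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return {
        "items": list(items[start:start + page_size]),
        "total": len(items),
        "page": page,
        "page_size": page_size,
        "pages": math.ceil(len(items) / page_size),
    }


result = paginate(list(range(45)), page=3, page_size=20)
```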
### Error Handling
Hierarchical exception system:
```python
ServiceException
├── ValidationError
├── BusinessRuleViolation
├── ResourceNotFoundError
└── ResourceConflictError
```
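
Expressed as Python classes, the hierarchy might look like the sketch below. The HTTP status codes and the `to_http_error` helper are assumptions added to show how a global error handler could translate service errors into responses; they are not taken from the codebase.

```python
from typing import Any, Dict


class ServiceException(Exception):
    status_code = 500  # assumed default for unexpected service failures


class ValidationError(ServiceException):
    status_code = 422


class BusinessRuleViolation(ServiceException):
    status_code = 400


class ResourceNotFoundError(ServiceException):
    status_code = 404


class ResourceConflictError(ServiceException):
    status_code = 409


def to_http_error(exc: ServiceException) -> Dict[str, Any]:
    """Build a JSON-serializable error body from any service exception."""
    return {
        "status": exc.status_code,
        "error": type(exc).__name__,
        "detail": str(exc),
    }


body = to_http_error(ResourceNotFoundError("application 42 not found"))
```

Because every error derives from `ServiceException`, a single handler can catch the base class and still report the specific subtype. Note that the name `ValidationError` also exists in Pydantic, so imports need care where both are in scope.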
### Transaction Management
```python
with service.handle_errors("operation"):
    with repository.transaction():
        # Perform multiple operations
        # Automatic rollback on error
        ...
```
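
A `transaction()` context manager with automatic rollback is typically a thin wrapper around the session's commit and rollback calls. The sketch below assumes that shape; `FakeSession` is a hypothetical recorder that lets the example run without a database, standing in for a SQLAlchemy session.

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeSession:
    """Records calls so the example runs without a database."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")


@contextmanager
def transaction(session: FakeSession) -> Iterator[FakeSession]:
    """Commit if the block succeeds; roll back and re-raise if it fails."""
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise


ok = FakeSession()
with transaction(ok):
    pass  # all operations succeeded

failed = FakeSession()
try:
    with transaction(failed):
        raise RuntimeError("second write failed")
except RuntimeError:
    pass  # the error still reaches the caller after the rollback
```

Re-raising after the rollback matters: it lets an outer `handle_errors` block translate the failure into a service exception.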
## API Design
### RESTful Endpoints
```
POST   /api/applications               # Create application
GET    /api/applications               # List applications
GET    /api/applications/{id}          # Get application
PUT    /api/applications/{id}          # Update application
DELETE /api/applications/{id}          # Delete application
POST   /api/applications/{id}/submit   # Submit application
POST   /api/applications/{id}/review   # Review application
GET    /api/applications/{id}/pdf      # Generate PDF
```
### Request/Response Models
Using Pydantic for validation:
```python
class ApplicationCreate(BaseModel):
    variant: ApplicationType
    payload: Dict[str, Any]

class ApplicationResponse(BaseModel):
    id: int
    pa_id: str
    status: ApplicationStatus
    created_at: datetime
```
## Middleware Stack
1. **CORS Middleware**: Cross-origin resource sharing
2. **Rate Limit Middleware**: Request throttling
3. **Logging Middleware**: Request/response logging
4. **Error Handler Middleware**: Global error handling
5. **Authentication Middleware**: JWT/API key validation
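
The order of a middleware stack determines which layer sees a request first and a response last. The framework-free sketch below models this "onion" behaviour, assuming the first entry in the list is the outermost layer; `tagging` and `build_stack` are hypothetical helpers, not part of the codebase.

```python
from typing import Callable, Dict, List

Handler = Callable[[Dict], Dict]
Middleware = Callable[[Handler], Handler]


def tagging(name: str, trace: List[str]) -> Middleware:
    """Create a middleware that records when it is entered and left."""
    def middleware(next_handler: Handler) -> Handler:
        def handler(request: Dict) -> Dict:
            trace.append(f"{name}:in")
            response = next_handler(request)
            trace.append(f"{name}:out")
            return response
        return handler
    return middleware


def build_stack(endpoint: Handler, middlewares: List[Middleware]) -> Handler:
    """Wrap the endpoint so that middlewares[0] becomes the outermost layer."""
    handler = endpoint
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


trace: List[str] = []
app = build_stack(
    lambda request: {"status": 200},
    [tagging("cors", trace), tagging("rate_limit", trace), tagging("auth", trace)],
)
response = app({})
```

In FastAPI/Starlette, middleware registered later with `add_middleware` wraps middleware registered earlier, so it is worth checking that the registration order produces the intended stack.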
## Security Features
- JWT-based authentication
- API key support
- Rate limiting per IP/key
- SQL injection prevention via ORM
- Input sanitization
- Audit logging
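
Rate limiting per IP or key can be sketched as a sliding-window counter. The class below is an in-memory illustration only: `SlidingWindowLimiter` and its parameters are hypothetical, and a deployment with several workers would need shared state (for example in Redis) rather than a per-process dictionary.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the last `window_seconds`."""

    def __init__(
        self,
        limit: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


# A controllable clock keeps the example deterministic.
now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_seconds=60.0, clock=lambda: now[0])
first, second, third = (limiter.allow("203.0.113.7") for _ in range(3))
```

The same class serves both limits by choosing the key: the client IP for a setting such as `RATE_IP_PER_MIN`, or the API key for `RATE_KEY_PER_MIN`.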
## Performance Optimizations
- Database connection pooling
- Lazy loading relationships
- Query optimization with indexes
- Caching support (Redis)
- Async request handling
- PDF generation caching
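
PDF generation caching depends on a stable cache key for identical inputs. One way to get that is to hash a canonical JSON form of the payload, as in the sketch below; `pdf_cache_key`, `PDFCache`, and the `pdf:` key prefix are assumptions, and the in-memory dictionary stands in for a real cache backend such as Redis.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def pdf_cache_key(variant: str, payload: Dict[str, Any]) -> str:
    """Derive a key that is identical for equal payloads, whatever the key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"pdf:{variant}:{digest}"


class PDFCache:
    """Render each distinct (variant, payload) pair at most once."""

    def __init__(self, render: Callable[[str, Dict[str, Any]], bytes]) -> None:
        self._render = render
        self._store: Dict[str, bytes] = {}

    def get(self, variant: str, payload: Dict[str, Any]) -> bytes:
        key = pdf_cache_key(variant, payload)
        if key not in self._store:
            self._store[key] = self._render(variant, payload)
        return self._store[key]


render_calls = []


def fake_render(variant: str, payload: Dict[str, Any]) -> bytes:
    # Placeholder renderer; a real one would fill the PDF template.
    render_calls.append(variant)
    return b"%PDF-fake"


cache = PDFCache(fake_render)
pdf_a = cache.get("QSM", {"title": "Workshop", "amount": 100})
pdf_b = cache.get("QSM", {"amount": 100, "title": "Workshop"})
```

Because the template itself also affects the output, a production key would likely need to include a template version or checksum as well.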
## Testing Strategy
### Unit Tests
- Service logic testing
- Repository method testing
- Model validation testing
### Integration Tests
- API endpoint testing
- Database transaction testing
- PDF processing testing
### End-to-End Tests
- Complete workflow testing
- Multi-service interaction testing
## Deployment Considerations
### Environment Variables
```env
# Database
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_DB=stupa
MYSQL_USER=user
MYSQL_PASSWORD=password
# Security
MASTER_KEY=secret_key
JWT_SECRET_KEY=jwt_secret
# Rate Limiting
RATE_IP_PER_MIN=60
RATE_KEY_PER_MIN=30
# PDF Templates
QSM_TEMPLATE=assets/qsm.pdf
VSM_TEMPLATE=assets/vsm.pdf
```
### Docker Support
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
### Scaling Considerations
- Stateless services for horizontal scaling
- Support for database read replicas
- Cache layer for frequently accessed data
- Async processing for heavy operations
- Message queue integration ready
## Migration Path
### From Old to New Architecture
1. **Phase 1**: Setup new structure alongside old code
2. **Phase 2**: Migrate database models
3. **Phase 3**: Implement service layer
4. **Phase 4**: Create API routes
5. **Phase 5**: Migrate business logic
6. **Phase 6**: Remove old code
### Database Migrations
Using Alembic for version control:
```bash
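alembic init migrations                                 # one-time setup of the migrations directory
alembic revision --autogenerate -m "Initial migration"  # generate a revision from model changes
alembic upgrade head                                    # apply all pending revisions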
```
## Monitoring & Observability
- Structured logging with context
- Prometheus metrics integration
- Health check endpoints
- Performance profiling hooks
- Error tracking integration ready
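
Structured logging with context can be done with the standard `logging` module alone. The sketch below emits one JSON object per record and picks up a `context` dictionary passed through `extra`; the formatter name and the `context` attribute are assumptions, and the project may use a dedicated logging library instead.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `context` is set by callers via logger.info(..., extra={"context": {...}}).
        context = getattr(record, "context", None)
        if context:
            entry["context"] = context
        return json.dumps(entry, sort_keys=True)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("applications")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("application submitted", extra={"context": {"application_id": 7}})
```

Keeping the context as structured fields rather than interpolating it into the message makes the logs filterable by values such as the application ID.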
## Future Enhancements
1. **GraphQL Support**: Alternative API interface
2. **WebSocket Support**: Real-time updates
3. **Event Sourcing**: Audit trail and history
4. **Microservices**: Service decomposition
5. **API Gateway**: Advanced routing and auth
6. **Message Queue**: Async task processing
7. **Search Engine**: Elasticsearch integration
8. **Machine Learning**: PDF field prediction
## Conclusion
This refactored architecture provides:
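- **Maintainability**: Clear boundaries between the API, service, repository, and model layers
- **Scalability**: Stateless services that can be scaled horizontally
- **Testability**: Isolated components that can be tested with mock services
- **Extensibility**: Provider-based design for adding PDF variants
- **Performance**: Connection pooling, caching, and async request handling
- **Security**: JWT authentication, rate limiting, input sanitization, and audit logging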
The modular design allows teams to work independently on different components while maintaining system integrity through well-defined interfaces.