# Backend Architecture Documentation

## Overview

The backend has been refactored from a monolithic structure into a modular, service-oriented architecture that emphasizes:

- **Separation of Concerns**: Clear boundaries between layers (API, Service, Repository, Model)
- **Dependency Injection**: Dynamic service resolution and configuration
- **Extensibility**: Plugin-based system for PDF variants and providers
- **Maintainability**: Organized code structure that follows the single responsibility principle
- **Scalability**: Stateless services with proper connection pooling

## Directory Structure

```
backend/
├── src/
│   ├── api/                    # API Layer
│   │   ├── routes/             # FastAPI routers
│   │   ├── middleware/         # Custom middleware
│   │   └── dependencies/       # Dependency injection helpers
│   │
│   ├── services/               # Business Logic Layer
│   │   ├── base.py             # Base service classes
│   │   ├── application.py      # Application business logic
│   │   ├── pdf.py              # PDF processing service
│   │   └── auth.py             # Authentication service
│   │
│   ├── repositories/           # Data Access Layer
│   │   ├── base.py             # Base repository pattern
│   │   ├── application.py      # Application repository
│   │   └── attachment.py       # Attachment repository
│   │
│   ├── models/                 # Database Models
│   │   ├── base.py             # Base model with mixins
│   │   └── application.py      # Application entities
│   │
│   ├── providers/              # Dynamic Providers
│   │   ├── pdf_qsm.py          # QSM PDF variant provider
│   │   └── pdf_vsm.py          # VSM PDF variant provider
│   │
│   ├── config/                 # Configuration Management
│   │   └── settings.py         # Centralized settings with Pydantic
│   │
│   ├── core/                   # Core Infrastructure
│   │   ├── container.py        # Dependency injection container
│   │   └── database.py         # Database management
│   │
│   └── utils/                  # Utility Functions
│       └── helpers.py          # Common utilities
```

## Architecture Layers

### 1. API Layer (`api/`)

**Responsibility**: HTTP request/response handling, validation, routing

- **Routes**: Modular FastAPI routers for different domains
- **Middleware**: Cross-cutting concerns (rate limiting, logging, error handling)
- **Dependencies**: FastAPI dependency injection functions

```python
# Example: api/routes/applications.py
from fastapi import APIRouter, Depends

router = APIRouter()

# CRUDService.create is synchronous, so a plain `def` handler is used;
# FastAPI runs it in a worker thread.
@router.post("/", response_model=ApplicationResponse)
def create_application(
    data: ApplicationCreate,
    service: ApplicationService = Depends(get_application_service),
):
    return service.create(data.dict())
```

### 2. Service Layer (`services/`)

**Responsibility**: Business logic, orchestration, validation rules

- Encapsulates all business rules and workflows
- Coordinates between repositories and external services
- Handles complex validations and transformations
- Stateless and testable

```python
# Example: services/application.py
class ApplicationService(CRUDService[Application]):
    def submit_application(self, id: int) -> Application:
        # Load the application, check submission rules, then persist the new status
        app = self.repository.get_or_404(id)
        self._validate_submission(app)
        app.status = ApplicationStatus.SUBMITTED
        return self.repository.update(app)
```

### 3. Repository Layer (`repositories/`)

**Responsibility**: Data access abstraction, CRUD operations

- Implements the repository pattern for database access
- Provides a clean abstraction over SQLAlchemy
- Handles query building and optimization
- Manages transactions

```python
# Example: repositories/application.py
from typing import List

class ApplicationRepository(BaseRepository[Application]):
    def find_by_status(self, status: ApplicationStatus) -> List[Application]:
        return self.query().filter(
            Application.status == status
        ).all()
```

### 4. Model Layer (`models/`)

**Responsibility**: Data structure definition, ORM mapping

- SQLAlchemy models with proper relationships
- Base classes with common functionality (timestamps, soft delete)
- Model mixins for reusable behavior
- Business entity representation

```python
# Example: models/application.py
from sqlalchemy import JSON, Column, String
from sqlalchemy import Enum as SQLEnum

class Application(ExtendedBaseModel):
    __tablename__ = "applications"

    pa_id = Column(String(64), unique=True, index=True)
    status = Column(SQLEnum(ApplicationStatus))
    payload = Column(JSON)
```

## Key Components

### Dependency Injection Container

The system uses a custom dependency injection container to manage service lifecycles:

```python
# core/container.py
from typing import Type

class Container:
    def register_service(self, name: str, service_class: Type[BaseService]):
        """Register a service with automatic dependency resolution."""

    def get_service(self, name: str) -> BaseService:
        """Retrieve a service instance with its dependencies injected."""
```

**Benefits:**

- Loose coupling between components
- Easy testing with mock services
- Dynamic service configuration
- Singleton pattern support

### Configuration Management

Centralized configuration using Pydantic Settings:

```python
# config/settings.py
class Settings(BaseSettings):
    database: DatabaseSettings
    security: SecuritySettings
    rate_limit: RateLimitSettings
    storage: StorageSettings
    pdf: PDFSettings
    app: ApplicationSettings
```

**Features:**

- Environment variable support
- Type validation
- Default values
- Configuration file support (JSON/YAML)
- Dynamic override capability

### Provider Pattern for PDF Variants

Extensible system for handling different PDF types:

```python
# providers/pdf_qsm.py
from typing import Dict

class QSMProvider(PDFVariantProvider):
    def parse_pdf_fields(self, fields: Dict) -> Dict:
        """QSM-specific parsing logic."""

    def map_payload_to_fields(self, payload: Dict) -> Dict:
        """QSM-specific field mapping."""
```

**Advantages:**

- Easy to add new PDF variants
- Variant-specific validation rules
- Dynamic provider registration
- Clean separation of variant logic
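
The dynamic provider registration listed above could be wired up in several ways; the following is a minimal sketch, not the project's actual implementation. The `register_provider` decorator, the `get_provider` lookup, the `_PROVIDERS` dictionary, and the `qsm_title` field name are all hypothetical names introduced for illustration. Only `PDFVariantProvider`, `QSMProvider`, and the two method names come from the example above.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class PDFVariantProvider(ABC):
    """Interface for a PDF variant (method names follow the QSM example)."""

    @abstractmethod
    def parse_pdf_fields(self, fields: Dict) -> Dict: ...

    @abstractmethod
    def map_payload_to_fields(self, payload: Dict) -> Dict: ...


# Hypothetical registry: upper-cased variant name -> provider class.
_PROVIDERS: Dict[str, Type[PDFVariantProvider]] = {}


def register_provider(variant: str):
    """Class decorator that registers a provider under a variant name."""

    def decorator(cls: Type[PDFVariantProvider]) -> Type[PDFVariantProvider]:
        key = variant.upper()
        if key in _PROVIDERS:
            raise ValueError(f"Provider already registered for {key}")
        _PROVIDERS[key] = cls
        return cls

    return decorator


def get_provider(variant: str) -> PDFVariantProvider:
    """Return a new provider instance, or raise KeyError for unknown variants."""
    cls = _PROVIDERS.get(variant.upper())
    if cls is None:
        raise KeyError(f"No PDF provider registered for variant {variant!r}")
    return cls()


@register_provider("QSM")
class QSMProvider(PDFVariantProvider):
    # Illustrative field names only; the real QSM mapping is not shown here.
    def parse_pdf_fields(self, fields: Dict) -> Dict:
        return {"title": fields.get("qsm_title", "")}

    def map_payload_to_fields(self, payload: Dict) -> Dict:
        return {"qsm_title": payload.get("title", "")}
```

With this sketch, `get_provider("qsm").parse_pdf_fields({"qsm_title": "Budget"})` returns `{"title": "Budget"}`, and adding a new variant only requires defining a decorated class. A decorator-based registry is one option among several; registering providers through the dependency injection container would serve the same purpose.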

## Database Architecture

### Base Model Classes

```python
# models/base.py
from sqlalchemy import Boolean, Column, DateTime, String

class BaseModel:
    """Common fields and methods."""

class TimestampMixin:
    created_at = Column(DateTime)
    updated_at = Column(DateTime)

class SoftDeleteMixin:
    is_deleted = Column(Boolean)
    deleted_at = Column(DateTime)

class AuditMixin:
    created_by = Column(String)
    updated_by = Column(String)
```

### Connection Management

- Connection pooling with configurable size
- Automatic retry on connection failure
- Session scoping for transaction management
- Health check utilities

## Service Patterns

### CRUD Service Base

```python
from typing import Any, Dict, Generic, Optional, TypeVar

T = TypeVar("T")  # Entity type managed by the service

class CRUDService(BaseService, Generic[T]):
    def create(self, data: Dict) -> T: ...
    def update(self, id: Any, data: Dict) -> T: ...
    def delete(self, id: Any, soft: bool = True) -> bool: ...
    def get(self, id: Any) -> Optional[T]: ...
    def list(self, filters: Dict, page: int, page_size: int) -> Dict: ...
```

### Error Handling

Hierarchical exception system:

```
ServiceException
├── ValidationError
├── BusinessRuleViolation
├── ResourceNotFoundError
└── ResourceConflictError
```

### Transaction Management

```python
with service.handle_errors("operation"):
    with repository.transaction():
        # Perform multiple operations here;
        # the transaction rolls back automatically on error.
        ...
```

## API Design

### RESTful Endpoints

```
POST   /api/applications              # Create application
GET    /api/applications              # List applications
GET    /api/applications/{id}         # Get application
PUT    /api/applications/{id}         # Update application
DELETE /api/applications/{id}         # Delete application
POST   /api/applications/{id}/submit  # Submit application
POST   /api/applications/{id}/review  # Review application
GET    /api/applications/{id}/pdf     # Generate PDF
```

### Request/Response Models

Using Pydantic for validation:

```python
from datetime import datetime
from typing import Any, Dict

from pydantic import BaseModel

class ApplicationCreate(BaseModel):
    variant: ApplicationType
    payload: Dict[str, Any]

class ApplicationResponse(BaseModel):
    id: int
    pa_id: str
    status: ApplicationStatus
    created_at: datetime
```

## Middleware Stack

1. **CORS Middleware**: Cross-origin resource sharing
2. **Rate Limit Middleware**: Request throttling
3. **Logging Middleware**: Request/response logging
4. **Error Handler Middleware**: Global error handling
5. **Authentication Middleware**: JWT/API key validation

## Security Features

- JWT-based authentication
- API key support
- Rate limiting per IP/key
- SQL injection prevention via ORM
- Input sanitization
- Audit logging

## Performance Optimizations

- Database connection pooling
- Lazy loading relationships
- Query optimization with indexes
- Caching support (Redis)
- Async request handling
- PDF generation caching

## Testing Strategy

### Unit Tests

- Service logic testing
- Repository method testing
- Model validation testing

### Integration Tests

- API endpoint testing
- Database transaction testing
- PDF processing testing

### End-to-End Tests

- Complete workflow testing
- Multi-service interaction testing

## Deployment Considerations

### Environment Variables

```env
# Database
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_DB=stupa
MYSQL_USER=user
MYSQL_PASSWORD=password

# Security
MASTER_KEY=secret_key
JWT_SECRET_KEY=jwt_secret

# Rate Limiting
RATE_IP_PER_MIN=60
RATE_KEY_PER_MIN=30

# PDF Templates
QSM_TEMPLATE=assets/qsm.pdf
VSM_TEMPLATE=assets/vsm.pdf
```

### Docker Support

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

### Scaling Considerations

- Stateless services for horizontal scaling
- Support for database read replicas
- Cache layer for frequently accessed data
- Async processing for heavy operations
- Ready for message queue integration

## Migration Path

### From Old to New Architecture

1. **Phase 1**: Set up the new structure alongside the old code
2. **Phase 2**: Migrate database models
3. **Phase 3**: Implement service layer
4. **Phase 4**: Create API routes
5. **Phase 5**: Migrate business logic
6. **Phase 6**: Remove old code

### Database Migrations

Schema changes are versioned with Alembic:

```bash
alembic init migrations
alembic revision --autogenerate -m "Initial migration"
alembic upgrade head
```

## Monitoring & Observability

- Structured logging with context
- Prometheus metrics integration
- Health check endpoints
- Performance profiling hooks
- Ready for error tracking integration

## Future Enhancements

1. **GraphQL Support**: Alternative API interface
2. **WebSocket Support**: Real-time updates
3. **Event Sourcing**: Audit trail and history
4. **Microservices**: Service decomposition
5. **API Gateway**: Advanced routing and auth
6. **Message Queue**: Async task processing
7. **Search Engine**: Elasticsearch integration
8. **Machine Learning**: PDF field prediction

## Conclusion

This refactored architecture provides:

- **Maintainability**: Clear structure and separation
- **Scalability**: Ready for growth
- **Testability**: Isolated components
- **Extensibility**: Plugin-based design
- **Performance**: Optimized patterns
- **Security**: Built-in best practices

The modular design allows teams to work independently on different components while maintaining system integrity through well-defined interfaces.