# Architecture Overview
## System Architecture
The STUPA PDF API uses a multi-tier architecture with clear separation of concerns between the frontend, API gateway, backend, and database layers.
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│                 │     │                 │     │                 │
│    Frontend     │────▶│   API Gateway   │────▶│     Backend     │
│   (React/TS)    │     │     (Nginx)     │     │    (FastAPI)    │
│                 │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                ┌─────────────────┐     ┌─────────────────┐
                                                │                 │     │                 │
                                                │    Database     │     │  File Storage   │
                                                │     (MySQL)     │     │   (Base64 DB)   │
                                                │                 │     │                 │
                                                └─────────────────┘     └─────────────────┘
```
## Components
### Frontend (React Application)
- **Technology**: React 18 with TypeScript
- **State Management**: With State
- **UI Framework**: Material-UI
- **Build Tool**: Vite
- **Features**:
  - Single Page Application (SPA)
  - Responsive design
  - Real-time form validation
  - File upload management
  - PDF preview capabilities
### API Gateway (Nginx)
- **Purpose**: Reverse proxy and static file serving
- **Features**:
  - Route `/api/*` requests to backend
  - Serve static frontend assets
  - Handle CORS
  - SSL termination (production)
  - Request buffering
  - Gzip compression
### Backend (FastAPI)
- **Framework**: FastAPI (Python 3.11)
- **ORM**: SQLAlchemy
- **PDF Processing**: PyPDF2, ReportLab
- **Features**:
  - RESTful API design
  - Automatic API documentation
  - Request validation
  - Rate limiting
  - Authentication middleware
  - PDF parsing and generation
### Database (MySQL)
- **Version**: MySQL 8.0
- **Character Set**: utf8mb4
- **Features**:
  - Relational data model
  - Foreign key constraints
  - Indexes for performance
  - Transaction support
## Data Flow
### Application Creation Flow
```
1. User uploads PDF → Frontend
2. Frontend sends PDF to API → POST /upload
3. Backend parses PDF → Extracts structured data
4. Backend creates application → Stores in database
5. Backend returns application ID and key
6. Frontend redirects to application view
```
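
Steps 3 to 5 of the flow above can be sketched in plain Python. This is an illustrative outline only, not the project's implementation: `parse_pdf`, `create_application`, and the in-memory `APPLICATIONS` dict are hypothetical stand-ins for the real PDF parser and the MySQL-backed storage.

```python
import hashlib
import itertools
import secrets

# Hypothetical in-memory stand-in for the `applications` table.
APPLICATIONS: dict[int, dict] = {}
_next_id = itertools.count(1)


def parse_pdf(pdf_bytes: bytes) -> dict:
    """Placeholder parser: only checks the PDF magic bytes."""
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("not a PDF file")
    return {"size_bytes": len(pdf_bytes)}


def create_application(pdf_bytes: bytes) -> dict:
    """Parse the upload, store the application, return its ID and key."""
    data = parse_pdf(pdf_bytes)                  # step 3: extract structured data
    app_id = next(_next_id)
    key = secrets.token_urlsafe(32)              # plaintext key, returned once
    APPLICATIONS[app_id] = {                     # step 4: persist the application
        "data": data,
        "key_hash": hashlib.sha256(key.encode("utf-8")).hexdigest(),
    }
    return {"id": app_id, "key": key}            # step 5: response body
```

Only the hash of the key is kept in storage, so the response to the upload is the single place where the plaintext key is visible.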
### PDF Generation Flow
```
1. User requests PDF → GET /applications/{id}?format=pdf
2. Backend loads application data
3. Backend fills PDF template
4. Backend returns filled PDF
5. Frontend displays/downloads PDF
```
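
One way to picture the `format=pdf` switch is a small dispatch function. The sketch below is an assumption about how such a switch could look: `render_application` and the injected `fill_template` callable are hypothetical names, and the JSON branch is assumed as the non-PDF representation, which the flow above does not specify.

```python
import json
from typing import Callable


def render_application(
    application: dict,
    fmt: str,
    fill_template: Callable[[dict], bytes],
) -> tuple[str, bytes]:
    """Return (media_type, body) for the requested representation."""
    if fmt == "pdf":
        # Steps 3 and 4: fill the template and return the PDF bytes.
        return "application/pdf", fill_template(application)
    if fmt == "json":
        # Assumed alternative representation of the same data.
        body = json.dumps(application, sort_keys=True).encode("utf-8")
        return "application/json", body
    raise ValueError(f"unsupported format: {fmt!r}")
```

Passing the template filler in as a parameter keeps the dispatch logic testable without a PDF library installed.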
## Security Architecture
### Authentication Layers
1. **Application Key** (`X-PA-KEY`)
   - Generated per application
   - Allows read/write access to specific application
   - Stored as SHA-256 hash
2. **Master Key** (`X-MASTER-KEY`)
   - Environment variable
   - Full admin access
   - Never exposed to frontend
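
The two key checks can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the key length and encoding are illustrative choices rather than the project's actual settings.

```python
import hashlib
import hmac
import secrets


def generate_application_key() -> tuple[str, str]:
    """Return (plaintext_key, sha256_hex_digest); only the digest is stored."""
    plaintext = secrets.token_urlsafe(32)
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest


def verify_application_key(presented: str, stored_digest: str) -> bool:
    """Check a presented X-PA-KEY value against the stored SHA-256 digest."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking the mismatch position via timing.
    return hmac.compare_digest(candidate, stored_digest)


def is_master_key(presented: str, configured: str) -> bool:
    """Check a presented X-MASTER-KEY value against the configured secret."""
    return hmac.compare_digest(
        presented.encode("utf-8"), configured.encode("utf-8")
    )
```

Storing only the digest means a leaked `application_keys` table does not directly reveal usable keys.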
### Security Features
- Rate limiting per IP and per key
- SQL injection prevention (ORM)
- XSS protection (React)
- CORS configuration
- Input validation
- Secure password hashing
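
Rate limiting per IP and per key can be approximated with a sliding-window counter. The class below is a sketch of that general technique, not the limiter the backend actually uses; `SlidingWindowLimiter` and the `ip:`/`key:` identity prefixes are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_requests` per identity within `window_seconds`."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, identity: str) -> bool:
        now = self._clock()
        hits = self._hits[identity]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Calling `allow("ip:203.0.113.5")` and `allow("key:<hash>")` on the same instance tracks the two dimensions independently. The state lives in process memory, so several backend instances would need a shared store to enforce one global limit.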
## Database Schema
### Core Tables
- `applications` - Main application data
- `application_keys` - Authentication keys
- `attachments` - File storage (Base64)
- `application_attachments` - Link table
- `comparison_offers` - Cost comparison data
- `cost_position_justifications` - Justification text
### Key Relationships
- Applications ↔ Keys (1:N)
- Applications ↔ Attachments (N:N)
- Applications ↔ Comparison Offers (1:N)
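
The relationships above can be tried out with an in-memory SQLite database from the Python standard library. SQLite is used here only as a stand-in for MySQL, and every column name is an assumption; only the table names come from the list above. `cost_position_justifications` is omitted because its relationships are not listed.

```python
import sqlite3

# Column names are illustrative; only the table names come from the document.
SCHEMA = """
CREATE TABLE applications (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL
);
CREATE TABLE application_keys (          -- Applications to Keys, 1:N
    id INTEGER PRIMARY KEY,
    application_id INTEGER NOT NULL REFERENCES applications(id),
    key_hash TEXT NOT NULL
);
CREATE TABLE attachments (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    data_base64 TEXT NOT NULL
);
CREATE TABLE application_attachments (   -- link table for N:N
    application_id INTEGER NOT NULL REFERENCES applications(id),
    attachment_id INTEGER NOT NULL REFERENCES attachments(id),
    PRIMARY KEY (application_id, attachment_id)
);
CREATE TABLE comparison_offers (         -- Applications to Offers, 1:N
    id INTEGER PRIMARY KEY,
    application_id INTEGER NOT NULL REFERENCES applications(id),
    amount_cents INTEGER NOT NULL
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create the sketch schema in memory with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def attachments_for(conn: sqlite3.Connection, application_id: int) -> list:
    """List attachment filenames for one application via the link table."""
    rows = conn.execute(
        "SELECT a.filename FROM attachments a "
        "JOIN application_attachments aa ON aa.attachment_id = a.id "
        "WHERE aa.application_id = ? ORDER BY a.id",
        (application_id,),
    )
    return [row[0] for row in rows]
```

The composite primary key on `application_attachments` lets one attachment belong to several applications while preventing duplicate links.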
## Scalability Considerations
### Horizontal Scaling
- Stateless API design
- Database connection pooling
- Load balancer ready
- Containerized deployment
### Performance Optimizations
- Database indexes on foreign keys
- Lazy loading of attachments
- Efficient PDF streaming
- Response caching headers
- Gzip compression
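
As an illustration of PDF streaming, a generator can hand out a document in fixed-size chunks instead of one large response body. This is a sketch only: `iter_chunks` and the 64 KiB default are assumptions, not values taken from the codebase.

```python
import io
from typing import Iterator


def iter_chunks(data: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield `data` in pieces of at most `chunk_size` bytes."""
    buffer = io.BytesIO(data)
    while True:
        chunk = buffer.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

A generator of byte chunks like this is the kind of iterable that FastAPI's `StreamingResponse` accepts as its content, so the server can start sending before the whole body is assembled.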
## Development vs Production
### Development Environment
- Hot reloading (frontend & backend)
- Debug logging
- Local file storage
- Relaxed CORS
- Default credentials
### Production Environment
- Optimized builds
- Error tracking
- Cloud storage ready
- Strict CORS
- Secret management
- SSL/TLS encryption
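
The split between the two environments can be expressed as settings loaded from environment variables. The sketch below assumes hypothetical variable names (`APP_ENV`, `MASTER_KEY`, `CORS_ORIGINS`) and a made-up development default; the project's real configuration keys may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    production: bool
    debug_logging: bool
    cors_origins: tuple[str, ...]
    master_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (names are assumptions)."""
    production = env.get("APP_ENV", "development") == "production"
    origins = tuple(
        o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()
    )
    master_key = env.get("MASTER_KEY", "")
    if production:
        # Production refuses to start without explicit secrets and origins.
        if not master_key:
            raise RuntimeError("MASTER_KEY must be set in production")
        if not origins:
            raise RuntimeError("CORS_ORIGINS must be set in production")
    else:
        # Development falls back to permissive defaults.
        master_key = master_key or "dev-master-key"
        origins = origins or ("*",)
    return Settings(production, not production, origins, master_key)
```

Failing fast in production keeps default credentials and relaxed CORS from leaking out of the development setup.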
## Future Architecture Considerations
### Potential Enhancements
1. **Microservice Separation**
   - PDF service
   - Authentication service
   - Notification service
2. **External Storage**
   - S3-compatible object storage
   - CDN for static assets
3. **Caching Layer**
   - Redis for session management
   - Application data caching
4. **Message Queue**
   - Async PDF generation
   - Email notifications
5. **Monitoring**
   - Application metrics
   - Performance monitoring
   - Error tracking