PepsiMax Indekset
Complete technical architecture overview - January 2025
1. System Overview
PepsiMax Indekset is a Danish price comparison platform that tracks PepsiMax prices across all Danish supermarkets. The system is built on Cloudflare's edge platform for maximum performance and minimal latency.
flowchart TB
subgraph Users["Users"]
Consumer["Consumer
(price comparison)"]
Admin["Administrator
(data management)"]
end
subgraph Frontend["Frontend Layer"]
PublicUI["index.html
Public price page"]
AdminUI["admin.html
Admin dashboard"]
Maps["Leaflet Maps
Store map"]
Charts["Chart.js
Price history"]
end
subgraph CloudflareEdge["Cloudflare Edge"]
Pages["Cloudflare Pages
Static hosting + Functions"]
Workers["Cloudflare Workers
Cron dispatcher"]
R2["Cloudflare R2
Images"]
KV["KV Storage
Sessions"]
end
subgraph API["API Layer (50+ endpoints)"]
Public["Public API
/api/products, /api/stores
/api/prices, /api/stats"]
AdminAPI["Admin API
/api/admin/*"]
AuthAPI["Auth API
/api/auth/*"]
HealthAPI["Health API
/api/health, /api/debug/*"]
end
subgraph External["External Services"]
Turso[("Turso SQLite
Primary Database")]
Salling["Salling Group API
Bilka, Føtex, Netto"]
Browser["Browser Rendering
Web Scraping"]
GitHub["GitHub OAuth
Authentication"]
Discord["Discord
Notifications"]
end
Consumer --> PublicUI
Admin --> AdminUI
PublicUI --> Maps
PublicUI --> Charts
PublicUI --> Public
AdminUI --> AdminAPI
AdminUI --> AuthAPI
Pages --> Public
Pages --> AdminAPI
Pages --> AuthAPI
Pages --> HealthAPI
Workers --> AdminAPI
Public --> Turso
AdminAPI --> Turso
AdminAPI --> R2
AuthAPI --> GitHub
AuthAPI --> KV
Workers --> Browser
Workers --> Salling
Workers --> Discord
Workers --> Turso
2. Features & User Stories
The system is documented with 48 user stories across 7 feature areas. Each feature has acceptance criteria and API references.
Price Tracking (8 stories)
Compare prices and view promotions, price history, and best deals across chains.
Store Locator (5 stories)
Find the nearest stores using GPS, filter by chain, and view opening hours.
Package Types (3 stories)
Browse products by type: single cans, 6-packs, 24-packs, bottles.
Statistics (2 stories)
View price statistics, average prices, and trends over time.
Admin Dashboard (15 stories)
Full CRUD for products, chains, stores, and prices, plus an audit log.
Cron Jobs (10 stories)
Automated price collection, job scheduling, and monitoring.
Integrations (5 stories)
Salling Group API integration for Bilka, Føtex, and Netto data.
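The Store Locator feature ranks stores by distance from a GPS position. The source does not say how distance is computed, so the following is only a sketch that assumes a great-circle (haversine) distance over the `latitude`, `longitude`, and `chain_id` fields from the data model. The names `haversineKm` and `nearestStores`, and the `limit` option, are hypothetical.

```javascript
// Mean Earth radius in kilometres.
const EARTH_RADIUS_KM = 6371;

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance between two coordinates (haversine formula).
function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Hypothetical helper: the `limit` closest stores, optionally filtered by chain.
function nearestStores(stores, lat, lon, { chainId = null, limit = 5 } = {}) {
  return stores
    .filter((s) => chainId === null || s.chain_id === chainId)
    .map((s) => ({ ...s, distance_km: haversineKm(lat, lon, s.latitude, s.longitude) }))
    .sort((a, b) => a.distance_km - b.distance_km)
    .slice(0, limit);
}
```

Calling `nearestStores(stores, 55.68, 12.57, { chainId: "some-chain-id" })` would return at most five stores of that chain, closest first. With 700+ stores a linear scan like this is cheap; a spatial index would only matter at a much larger scale.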
3. API Architecture
The API follows REST principles and is documented with an OpenAPI 3.0.3 spec (2800+ lines). TypeScript types are generated automatically from the spec.
flowchart LR
subgraph PublicEndpoints["Public Endpoints (no auth)"]
direction TB
P1["/api/products
Products + prices"]
P2["/api/chains
Supermarket chains"]
P3["/api/stores
Store locations"]
P4["/api/nearest-stores
GPS-based search"]
P5["/api/prices/*
current, best, promotions, history"]
P6["/api/package-types
Product categories"]
P7["/api/stats
Price statistics"]
P8["/api/health
System status"]
end
subgraph AdminEndpoints["Admin Endpoints (OAuth required)"]
direction TB
A1["/api/admin/products
CRUD products"]
A2["/api/admin/chains
CRUD chains"]
A3["/api/admin/stores
CRUD stores"]
A4["/api/admin/prices
Import/delete prices"]
A5["/api/admin/cron-jobs
Job management"]
A6["/api/admin/integrations
Salling sync"]
A7["/api/admin/metrics
System metrics"]
A8["/api/admin/audit
Audit log"]
end
subgraph AuthEndpoints["Auth Endpoints"]
direction TB
AU1["/api/auth/github-login
Start OAuth"]
AU2["/api/auth/github-callback
OAuth callback"]
AU3["/api/auth/session
Verify session"]
AU4["/api/auth/logout
End session"]
end
subgraph DebugEndpoints["Debug Endpoints (admin)"]
direction TB
D1["/api/debug/r2-status
Storage diagnostics"]
D2["/api/debug/investigate-product
Image debugging"]
end
Client((Client)) --> PublicEndpoints
Client --> AuthEndpoints
AdminClient((Admin)) --> AdminEndpoints
AdminClient --> DebugEndpoints
API Endpoint Summary
| Category | Endpoints | Auth | Description |
|---|---|---|---|
| PUBLIC Products | /api/products | None | All products with chain prices |
| PUBLIC Prices | /api/prices/* | None | current, best, promotions, history, expiration-status |
| PUBLIC Stores | /api/stores, /api/nearest-stores | None | Stores with GPS coordinates |
| PUBLIC Package Types | /api/package-types/* | None | Product categories with prices |
| PUBLIC Statistics | /api/stats/* | None | Price statistics and history |
| ADMIN Products | /api/admin/products/* | OAuth | CRUD + image upload |
| ADMIN Cron Jobs | /api/admin/cron-jobs/* | OAuth | CRUD + manual run + history |
| ADMIN Integrations | /api/admin/integrations/* | OAuth | Salling sync, import, status |
| ADMIN Monitoring | /api/admin/metrics, /api/admin/analytics | OAuth | System health and analytics |
| DEBUG Diagnostics | /api/debug/* | Password | R2 status, image investigation |
4. Data Model
The database is hosted on Turso (distributed SQLite). Most tables use UUID-based primary keys; CRON_JOBS and CRON_JOB_RUNS use string IDs.
erDiagram
PRODUCT_TYPES ||--o{ PRODUCTS : categorizes
PRODUCTS ||--o{ PRICE_HISTORY : has
CHAINS ||--o{ PRICE_HISTORY : has
CHAINS ||--o{ STORES : has
CHAINS ||--o{ CHAIN_PRICE_SCHEDULES : has
CRON_JOBS ||--o{ CRON_JOB_RUNS : executes
INTEGRATIONS ||--o{ CRON_JOBS : uses
INTEGRATIONS ||--o{ SYNC_LOG : logs
PRODUCT_TYPES {
uuid id PK
string name
string description
string unit_type
int unit_size_ml
int units_per_package
}
PRODUCTS {
uuid id PK
uuid product_type_id FK
string display_name
string ean
int bundle_size
int volume_ml
int total_volume_ml
string image_url
boolean is_active
int sort_order
}
CHAINS {
uuid id PK
string name
string external_id
string logo_url
string color_hex
string api_source
boolean is_active
}
STORES {
uuid id PK
uuid chain_id FK
string external_id
string name
string address
string city
string postal_code
float latitude
float longitude
json opening_hours
datetime last_synced_at
}
PRICE_HISTORY {
uuid id PK
uuid product_id FK
uuid chain_id FK
uuid store_id FK
decimal price
decimal price_per_liter
boolean is_promotion
date promotion_end_date
string source
boolean is_current
datetime recorded_at
datetime expires_at
}
CRON_JOBS {
string id PK
string name
string description
uuid integration_id FK
string schedule_type
json schedule_config
boolean is_active
datetime last_run_at
string last_run_status
datetime next_run_at
}
CRON_JOB_RUNS {
string id PK
string cron_job_id FK
datetime started_at
datetime completed_at
string status
int items_found
int items_imported
string error_message
json result_details
}
INTEGRATIONS {
uuid id PK
string name
string source
string type
string api_base_url
datetime last_sync_at
string last_sync_status
}
AUDIT_LOG {
uuid id PK
string action
string resource_type
string resource_id
json changes
string user_id
datetime created_at
}
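PRICE_HISTORY stores both `price` and `price_per_liter`, and PRODUCTS stores `bundle_size`, `volume_ml`, and `total_volume_ml`. The source does not state how these relate, so the sketch below assumes the obvious derivation: total volume is bundle size times unit volume, and price per litre is price divided by total volume in litres. The function names and the two-decimal rounding are assumptions.

```javascript
// Assumed relation: total volume = number of units x volume per unit.
function totalVolumeMl(bundleSize, volumeMl) {
  return bundleSize * volumeMl;
}

// Price per litre, rounded to two decimals.
function pricePerLiter(price, volumeMl) {
  if (!(price > 0) || !(volumeMl > 0)) {
    throw new RangeError("price and volumeMl must be positive numbers");
  }
  return Math.round((price / volumeMl) * 1000 * 100) / 100;
}
```

For example, a 6-pack of 330 ml cans has a total volume of 1980 ml, so a price of 30 gives 15.15 per litre. Normalising to price per litre is what makes a 6-pack comparable with a 1.5 L bottle.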
5. Cron & Automation
Price collection runs automatically through a database-driven cron system. Jobs can be configured in the Admin UI with different schedules.
flowchart TB
subgraph Trigger["Cron Dispatcher (Worker)"]
Schedule["Runs every 15 min
:00, :15, :30, :45"]
Query["Query cron_jobs
WHERE is_active = 1
AND next_run_at <= NOW()"]
end
subgraph Mutex["Concurrency Control"]
CheckRunning["Check cron_job_runs
status = 'running'"]
Skip["Skip if
already running"]
end
subgraph Execution["Job Execution"]
RunAPI["POST /api/admin/
cron-jobs/:id/run"]
CreateRun["INSERT cron_job_runs
status = 'running'"]
end
subgraph JobTypes["Job Types"]
BrowserJob["Browser Rendering
━━━━━━━━━━━━━
etilbudsavis.dk
tilbudsugen.dk"]
SallingJob["Salling API
━━━━━━━━━━━━━
Bilka, Føtex, Netto
stores + prices"]
end
subgraph Processing["Data Processing"]
Parse["Parse HTML/JSON"]
Match["Match products
by name/EAN"]
Validate["Validate prices"]
Dedup["Deduplicate"]
end
subgraph Output["Output"]
InsertPrices[("INSERT price_history")]
UpdateJob["UPDATE cron_jobs
next_run_at"]
UpdateRun["UPDATE cron_job_runs
status = 'success'"]
Notify["Discord notification
(on failure)"]
end
Schedule --> Query
Query --> CheckRunning
CheckRunning -->|Not running| RunAPI
CheckRunning -->|Already running| Skip
RunAPI --> CreateRun
CreateRun --> BrowserJob
CreateRun --> SallingJob
BrowserJob --> Parse
SallingJob --> Parse
Parse --> Match
Match --> Validate
Validate --> Dedup
Dedup --> InsertPrices
InsertPrices --> UpdateRun
UpdateRun --> UpdateJob
UpdateRun --> Notify
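The first two stages of the flow above (query due jobs, then skip any job that already has a run in status `running`) can be expressed as a pure function over rows from `cron_jobs` and `cron_job_runs`. This is a minimal in-memory sketch of that selection logic, not the dispatcher's actual code; `selectDueJobs` is a hypothetical name.

```javascript
// Returns the jobs the dispatcher would start on this tick.
function selectDueJobs(jobs, runs, now) {
  // Concurrency control: IDs of jobs that already have a run in progress.
  const running = new Set(
    runs.filter((r) => r.status === "running").map((r) => r.cron_job_id)
  );
  return jobs.filter(
    (job) =>
      job.is_active && // SQLite stores booleans as 0/1, so test truthiness
      new Date(job.next_run_at).getTime() <= now.getTime() &&
      !running.has(job.id)
  );
}
```

Note that a check followed by a separate INSERT is not atomic. If two dispatcher invocations could overlap, the check and the insert of the `running` row would have to happen in a single transaction to act as a true mutex.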
Active Cron Jobs
| Job | Chains | Schedule | Integration | Source |
|---|---|---|---|---|
| Daily promotion scraping | All | 01:00 UTC | Browser Rendering | etilbudsavis.dk |
| Tilbudsugen scraper | All | 07:30 UTC | Browser Rendering | tilbudsugen.dk |
| Salling Bilka | Bilka | 07:00 UTC | Salling API | Products API |
| Salling Føtex | Føtex | 06:05 UTC | Salling API | Products API |
| Salling Netto | Netto | 05:00 UTC | Salling API | Products API |
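Whichever job produces the raw data, the Data Processing stage of the flow matches offers to products by EAN or name, validates prices, and deduplicates before inserting into `price_history`. The source gives no details on these rules, so the sketch below is one plausible shape. The offer fields, the price bounds in `isValidPrice`, and the dedup key are all assumptions.

```javascript
// Normalise a name for loose matching: lowercase, collapse whitespace.
const normalize = (s) => s.toLowerCase().replace(/\s+/g, " ").trim();

// Match a scraped offer to a known product: EAN first, then exact normalised name.
function matchProduct(offer, products) {
  if (offer.ean) {
    const byEan = products.find((p) => p.ean === offer.ean);
    if (byEan) return byEan;
  }
  if (offer.name) {
    const wanted = normalize(offer.name);
    return products.find((p) => normalize(p.display_name) === wanted) || null;
  }
  return null;
}

// Hypothetical sanity bounds; the real validation rules are not documented.
const isValidPrice = (price) => Number.isFinite(price) && price > 0 && price < 1000;

// Returns rows ready for insertion, skipping unmatched, invalid, and duplicate offers.
function processOffers(offers, products) {
  const seen = new Set();
  const rows = [];
  for (const offer of offers) {
    const product = matchProduct(offer, products);
    if (!product || !isValidPrice(offer.price)) continue;
    const key = `${product.id}|${offer.chain_id}|${offer.price}`;
    if (seen.has(key)) continue; // same product, chain, and price already queued
    seen.add(key);
    rows.push({
      product_id: product.id,
      chain_id: offer.chain_id,
      price: offer.price,
      source: offer.source,
    });
  }
  return rows;
}
```

Preferring the EAN over the name avoids false matches when two package sizes share similar display names.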
6. External Integrations
flowchart LR
subgraph PepsiMaxIndex["PepsiMax Index"]
API["Admin API"]
Parser["Parsers"]
DB[("Turso DB")]
end
subgraph BrowserRendering["Cloudflare Browser Rendering"]
BR["Headless Chrome"]
end
subgraph Salling["Salling Group API"]
StoresAPI["Stores API
/v2/stores"]
ProductsAPI["Products API
/v2/products/:ean"]
end
subgraph Scraped["Scraped Sites"]
ETilbud["etilbudsavis.dk"]
Tilbudsugen["tilbudsugen.dk"]
end
subgraph Notifications["Notifications"]
Discord["Discord Webhooks"]
end
subgraph Auth["Authentication"]
GitHub["GitHub OAuth"]
end
API --> BR
BR --> ETilbud
BR --> Tilbudsugen
ETilbud --> Parser
Tilbudsugen --> Parser
API --> StoresAPI
API --> ProductsAPI
StoresAPI --> Parser
ProductsAPI --> Parser
Parser --> DB
API --> Discord
API --> GitHub
Integration Details
| Integration | Type | Endpoints | Data |
|---|---|---|---|
| Salling Stores API | REST API | /v2/stores?brand={brand} | 700+ stores (Bilka, Føtex, Netto) |
| Salling Products API | REST API | /v2/products/{ean}?storeId={id} | Prices per product per store |
| Browser Rendering | Web Scraping | etilbudsavis.dk, tilbudsugen.dk | Weekly promotions for all chains |
| GitHub OAuth | OAuth 2.0 | github.com/login/oauth | Admin authentication |
| Discord | Webhook | discord.com/api/webhooks | Error notifications |
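The two Salling endpoint templates in the table can be turned into concrete request URLs with the standard `URL` API. The sketch below only builds URLs; it makes no request and handles no authentication. The base URL is a placeholder (the real one would presumably come from `INTEGRATIONS.api_base_url`), and the function names are hypothetical.

```javascript
// Placeholder base URL, NOT the real Salling host.
const EXAMPLE_BASE_URL = "https://salling.example";

// Builds /v2/stores?brand={brand}
function storesUrl(brand, baseUrl = EXAMPLE_BASE_URL) {
  const url = new URL("/v2/stores", baseUrl);
  url.searchParams.set("brand", brand);
  return url.toString();
}

// Builds /v2/products/{ean}?storeId={id}
function productUrl(ean, storeId, baseUrl = EXAMPLE_BASE_URL) {
  const url = new URL(`/v2/products/${encodeURIComponent(ean)}`, baseUrl);
  url.searchParams.set("storeId", storeId);
  return url.toString();
}
```

Using `URL` and `searchParams` rather than string concatenation ensures that brand names and store IDs are escaped correctly.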
7. Security & Authentication
flowchart TB
subgraph Public["Public Endpoints"]
GET1["GET /api/products"]
GET2["GET /api/stores"]
GET3["GET /api/prices/*"]
GET4["GET /api/stats"]
GET5["GET /api/health"]
end
subgraph Protected["Protected Endpoints"]
Admin["POST/PUT/DELETE
/api/admin/*"]
CronRun["POST /api/admin/
cron-jobs/:id/run"]
Debug["GET /api/debug/*"]
end
subgraph AuthMethods["Auth Methods"]
OAuth["GitHub OAuth
Session in KV"]
CronSecret["X-Cron-Secret
Header"]
AdminPwd["?password=
Query param"]
end
subgraph OAuthFlow["GitHub OAuth Flow"]
Login["1. /api/auth/github-login"]
Redirect["2. Redirect to GitHub"]
Callback["3. /api/auth/github-callback"]
Verify["4. Verify username"]
Session["5. Create KV session"]
Cookie["6. Set session cookie"]
end
Public -->|No auth| Response1["Response"]
Protected --> AuthMethods
OAuth --> Admin
CronSecret --> CronRun
AdminPwd --> Debug
Login --> Redirect
Redirect --> Callback
Callback --> Verify
Verify -->|SlambertDK| Session
Session --> Cookie
Verify -->|Others| Reject["403 Forbidden"]
Security Layers
| Layer | Protection | Implementation |
|---|---|---|
| Transport | HTTPS only | Cloudflare SSL/TLS |
| Authentication | GitHub OAuth | Whitelisted users only |
| Session | KV-stored sessions | Encrypted session tokens |
| Cron Auth | Shared secret | X-Cron-Secret header |
| Rate Limiting | Cloudflare WAF | DDoS protection |
| Audit | Audit log | All admin actions are logged |
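Two of the checks in this section are simple enough to sketch: the username allowlist (step 4 of the OAuth flow) and the `X-Cron-Secret` shared-secret header. The code below is an illustration under stated assumptions, not the project's middleware. All three function names are hypothetical, and the constant-time comparison is an added precaution the source does not mention.

```javascript
// Allowlist check. GitHub logins are case-insensitive, so compare in lowercase.
function isAllowedUser(username, allowlist) {
  const wanted = String(username).toLowerCase();
  return allowlist.some((u) => u.toLowerCase() === wanted);
}

// Compares two strings without exiting early on the first differing character.
// A length mismatch still returns immediately, so the secret's length is not hidden.
function timingSafeEqual(a, b) {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// `headers` is anything with a get(name) method, such as a Fetch API Headers object.
function hasValidCronSecret(headers, expectedSecret) {
  const provided = headers.get("X-Cron-Secret");
  return (
    typeof provided === "string" &&
    expectedSecret.length > 0 &&
    timingSafeEqual(provided, expectedSecret)
  );
}
```

A request that fails `isAllowedUser` would get the `403 Forbidden` shown in the flow. Rejecting an empty `expectedSecret` guards against a missing environment variable silently accepting an empty header.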
8. Monitoring & Health
flowchart LR
subgraph HealthCheck["Health Endpoints"]
Health["/api/health
━━━━━━━━━━
Database status
R2 status
Data counts
Last update"]
end
subgraph AdminMetrics["Admin Metrics"]
Metrics["/api/admin/metrics
━━━━━━━━━━━━
Scraper success rate
Data freshness
Coverage %
Stuck jobs"]
Analytics["/api/admin/analytics
━━━━━━━━━━━━
12-week trends
Jobs per week
Success rates"]
Images["/api/admin/check-images
━━━━━━━━━━━━
Broken URLs
Missing images
Legacy URLs"]
end
subgraph Debug["Debug Endpoints"]
R2Status["/api/debug/r2-status
━━━━━━━━━━━━
R2 binding test
File listing
Write/read test"]
Investigate["/api/debug/investigate-product
━━━━━━━━━━━━
DB record
R2 files
URL conflicts"]
end
subgraph Alerts["Alerting"]
Discord2["Discord Webhook
━━━━━━━━━━
Scraper failures
Stale data (>7 days)"]
end
Health --> Status["System OK/Degraded/Error"]
Metrics --> Dashboard["Admin Dashboard"]
Analytics --> Dashboard
Images --> Dashboard
R2Status --> Troubleshooting["Troubleshooting"]
Investigate --> Troubleshooting
Metrics --> Discord2
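The alerting box lists "Stale data (>7 days)" as a trigger. A staleness check of that kind might look like the sketch below. The `last_price_at` field and the `staleChains` helper are hypothetical; the source does not say which timestamp the real check uses.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Names of chains whose newest price is strictly older than maxAgeDays.
function staleChains(chains, now, maxAgeDays = 7) {
  return chains
    .filter((c) => now.getTime() - new Date(c.last_price_at).getTime() > maxAgeDays * DAY_MS)
    .map((c) => c.name);
}
```

The comparison is strict, matching the ">7 days" wording: a chain updated exactly seven days ago is not yet reported.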
Health Status Criteria
| Status | Criteria | Action |
|---|---|---|
| Healthy | Success rate >= 95%, no stale chains | None |
| Warning | Success rate 80% to <95%, OR >3 stale chains, OR stuck jobs | Review metrics |
| Critical | Success rate < 80% | Immediate attention |
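The criteria table translates into a small classification function. The table does not say what happens with one to three stale chains (Healthy requires none, Warning requires more than three), so this sketch assumes that anything that is neither Critical nor fully Healthy is a Warning. `classifyHealth` and its parameter names are hypothetical.

```javascript
// successRate is a percentage (0-100); staleChains and stuckJobs are counts.
function classifyHealth({ successRate, staleChains, stuckJobs }) {
  if (successRate < 80) return "critical";
  if (successRate >= 95 && staleChains === 0 && stuckJobs === 0) return "healthy";
  // Assumption: everything in between, including 1-3 stale chains, is a warning.
  return "warning";
}
```

Checking Critical first means a low success rate always dominates, regardless of the other signals.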
9. Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Vanilla JS, HTML5, CSS3 | No framework, fast loading |
| Maps | Leaflet.js + OpenStreetMap | Store locator |
| Charts | Chart.js | Price history visualization |
| Backend | Cloudflare Pages Functions | Serverless API (50+ endpoints) |
| Cron | Cloudflare Workers | Scheduled job dispatcher |
| Database | Turso (libSQL/SQLite) | Distributed edge database |
| Storage | Cloudflare R2 | Product/chain images |
| Sessions | Cloudflare KV | OAuth session storage |
| Scraping | Browser Rendering API | Headless Chrome scraping |
| API Docs | OpenAPI 3.0.3 + Swagger UI | Interactive documentation |
| Types | openapi-typescript | Generated TypeScript types |
| Hosting | Cloudflare Pages | Global CDN + CI/CD |
10. File Structure
pepsimax-index/
├── functions/  # Cloudflare Pages Functions
│   ├── api/
│   │   ├── products.js  # GET /api/products
│   │   ├── chains.js  # GET /api/chains
│   │   ├── stores.js  # GET /api/stores
│   │   ├── nearest-stores.js  # GET /api/nearest-stores
│   │   ├── health.js  # GET /api/health
│   │   ├── prices/  # Price endpoints
│   │   │   ├── index.js  # GET /api/prices
│   │   │   ├── current.js  # GET /api/prices/current
│   │   │   ├── best.js  # GET /api/prices/best
│   │   │   ├── promotions.js  # GET /api/prices/promotions
│   │   │   └── history.js  # GET /api/prices/history
│   │   ├── package-types/  # Package type endpoints
│   │   ├── stats/  # Statistics endpoints
│   │   ├── auth/  # GitHub OAuth
│   │   │   ├── github-login.js
│   │   │   ├── github-callback.js
│   │   │   └── session.js
│   │   ├── admin/  # Admin endpoints (40+)
│   │   │   ├── _middleware.js  # Auth middleware
│   │   │   ├── products.js  # CRUD products
│   │   │   ├── chains.js  # CRUD chains
│   │   │   ├── stores.js  # CRUD stores
│   │   │   ├── prices.js  # Price management
│   │   │   ├── cron-jobs/  # Cron job CRUD + run
│   │   │   ├── cron-runs.js  # Job history
│   │   │   ├── integrations/  # Salling integration
│   │   │   │   └── salling/
│   │   │   │       ├── sync-brand.js
│   │   │   │       ├── sync-stores.js
│   │   │   │       ├── import-prices.js
│   │   │   │       ├── ean-products.js
│   │   │   │       └── price-status.js
│   │   │   ├── chain-schedules/  # Price refresh schedules
│   │   │   ├── scraper-jobs.js  # Scraper job management
│   │   │   ├── metrics.js  # System metrics
│   │   │   ├── analytics.js  # Weekly analytics
│   │   │   ├── check-images.js  # Image health check
│   │   │   └── audit.js  # Audit log
│   │   └── debug/  # Debug endpoints
│   │       ├── r2-status.js
│   │       └── investigate-product.js
│   ├── images/  # R2 image server
│   └── utils/  # Shared utilities
│       ├── db.js  # Turso client
│       ├── salling-client.js  # Salling API client
│       ├── tilbudsavis-parser.js
│       └── chain-normalizer.js
├── workers/
│   └── cron-dispatcher/  # Cron Worker
│       ├── src/index.js
│       └── wrangler.toml
├── src/
│   ├── admin/  # Admin dashboard
│   │   └── admin.html
│   └── types/
│       └── api.d.ts  # Generated TypeScript types
├── js/
│   └── main.js  # Frontend logic
├── css/
│   └── styles.css  # Styling
├── docs/  # Documentation (48 user stories)
│   ├── openapi.yaml  # OpenAPI 3.0.3 spec (2800+ lines)
│   ├── README.md  # Docs index
│   ├── features/  # Feature documentation
│   │   ├── PRICE-TRACKING.md  # 8 user stories
│   │   ├── STORE-LOCATOR.md  # 5 user stories
│   │   ├── PACKAGE-TYPES.md  # 3 user stories
│   │   ├── STATISTICS.md  # 2 user stories
│   │   ├── ADMIN-DASHBOARD.md  # 15 user stories
│   │   ├── CRON-JOBS.md  # 10 user stories
│   │   └── INTEGRATIONS.md  # 5 user stories
│   └── technical/
│       └── SYSTEM-HEALTH.md  # Troubleshooting guide
├── index.html  # Public frontend
├── architecture.html  # This file
└── wrangler.toml  # Cloudflare config
Documentation Links
| Document | Description |
|---|---|
| /docs (Swagger UI) | Interactive API documentation |
| openapi.yaml | OpenAPI 3.0.3 specification |
| docs/features/ | Feature documentation (48 user stories) |
| SYSTEM-HEALTH.md | Monitoring & troubleshooting guide |
| DATABASE.md | Database schema & queries |
| CRON-JOBS-AND-INTEGRATIONS.md | Detailed cron system documentation |
Last updated: January 2025 | Built with Cloudflare