PepsiMax Indekset - Architecture

Complete technical architecture overview - January 2025

48 User Stories
50+ API Endpoints
13 Supermarkets
12 Product Variants
700+ Stores
7 Feature Docs

1. System Overview

PepsiMax Indekset is a Danish price comparison platform that tracks PepsiMax prices across all Danish supermarkets. The system is built on Cloudflare's edge platform for maximum performance and minimal latency.

flowchart TB
    subgraph Users["Users"]
        Consumer["Consumer<br/>(price comparison)"]
        Admin["Administrator<br/>(data management)"]
    end

    subgraph Frontend["Frontend Layer"]
        PublicUI["index.html<br/>Public price page"]
        AdminUI["admin.html<br/>Admin dashboard"]
        Maps["Leaflet Maps<br/>Store map"]
        Charts["Chart.js<br/>Price history"]
    end

    subgraph CloudflareEdge["Cloudflare Edge"]
        Pages["Cloudflare Pages<br/>Static hosting + Functions"]
        Workers["Cloudflare Workers<br/>Cron dispatcher"]
        R2["Cloudflare R2<br/>Images"]
        KV["KV Storage<br/>Sessions"]
    end

    subgraph API["API Layer (50+ endpoints)"]
        Public["Public API<br/>/api/products, /api/stores<br/>/api/prices, /api/stats"]
        AdminAPI["Admin API<br/>/api/admin/*"]
        AuthAPI["Auth API<br/>/api/auth/*"]
        HealthAPI["Health API<br/>/api/health, /api/debug/*"]
    end

    subgraph External["External Services"]
        Turso[("Turso SQLite<br/>Primary Database")]
        Salling["Salling Group API<br/>Bilka, Føtex, Netto"]
        Browser["Browser Rendering<br/>Web Scraping"]
        GitHub["GitHub OAuth<br/>Authentication"]
        Discord["Discord<br/>Notifications"]
    end

    Consumer --> PublicUI
    Admin --> AdminUI
    PublicUI --> Maps
    PublicUI --> Charts
    PublicUI --> Public
    AdminUI --> AdminAPI
    AdminUI --> AuthAPI
    Pages --> Public
    Pages --> AdminAPI
    Pages --> AuthAPI
    Pages --> HealthAPI
    Workers --> AdminAPI
    Public --> Turso
    AdminAPI --> Turso
    AdminAPI --> R2
    AuthAPI --> GitHub
    AuthAPI --> KV
    Workers --> Browser
    Workers --> Salling
    Workers --> Discord
    Workers --> Turso

2. Features & User Stories

The system is documented with 48 user stories across 7 feature areas. Each feature has acceptance criteria and API references.

Price Tracking (8 stories)

Compare prices and view offers, price history, and best deals across chains.

Store Locator (5 stories)

Find the nearest stores by GPS, filter by chain, and view opening hours.

Package Types (3 stories)

Browse products by type: single cans, 6-packs, 24-packs, bottles.

Statistics (2 stories)

View price statistics, average prices, and trends over time.

Admin Dashboard (15 stories)

Full CRUD for products, chains, stores, and prices, plus the audit log.

Cron Jobs (10 stories)

Automated price collection, job scheduling, and monitoring.

Integrations (5 stories)

Salling Group API integration for Bilka, Føtex, and Netto data.

3. API Architecture

The API follows REST principles and is documented in an OpenAPI 3.0.3 spec (2800+ lines). TypeScript types are generated automatically from the spec.

flowchart LR
    subgraph PublicEndpoints["Public Endpoints (no auth)"]
        direction TB
        P1["/api/products<br/>Products + prices"]
        P2["/api/chains<br/>Supermarket chains"]
        P3["/api/stores<br/>Store locations"]
        P4["/api/nearest-stores<br/>GPS-based search"]
        P5["/api/prices/*<br/>current, best, promotions, history"]
        P6["/api/package-types<br/>Product categories"]
        P7["/api/stats<br/>Price statistics"]
        P8["/api/health<br/>System status"]
    end

    subgraph AdminEndpoints["Admin Endpoints (OAuth required)"]
        direction TB
        A1["/api/admin/products<br/>CRUD products"]
        A2["/api/admin/chains<br/>CRUD chains"]
        A3["/api/admin/stores<br/>CRUD stores"]
        A4["/api/admin/prices<br/>Import/delete prices"]
        A5["/api/admin/cron-jobs<br/>Job management"]
        A6["/api/admin/integrations<br/>Salling sync"]
        A7["/api/admin/metrics<br/>System metrics"]
        A8["/api/admin/audit<br/>Audit log"]
    end

    subgraph AuthEndpoints["Auth Endpoints"]
        direction TB
        AU1["/api/auth/github-login<br/>Start OAuth"]
        AU2["/api/auth/github-callback<br/>OAuth callback"]
        AU3["/api/auth/session<br/>Verify session"]
        AU4["/api/auth/logout<br/>End session"]
    end

    subgraph DebugEndpoints["Debug Endpoints (admin)"]
        direction TB
        D1["/api/debug/r2-status<br/>Storage diagnostics"]
        D2["/api/debug/investigate-product<br/>Image debugging"]
    end

    Client((Client)) --> PublicEndpoints
    Client --> AuthEndpoints
    AdminClient((Admin)) --> AdminEndpoints
    AdminClient --> DebugEndpoints

API Endpoint Summary

Category | Endpoints | Auth | Description
PUBLIC Products | /api/products | - | All products with chain prices
PUBLIC Prices | /api/prices/* | - | current, best, promotions, history, expiration-status
PUBLIC Stores | /api/stores, /api/nearest-stores | - | Stores with GPS coordinates
PUBLIC Package Types | /api/package-types/* | - | Product categories with prices
PUBLIC Statistics | /api/stats/* | - | Price statistics and history
ADMIN Products | /api/admin/products/* | OAuth | CRUD + image upload
ADMIN Cron Jobs | /api/admin/cron-jobs/* | OAuth | CRUD + manual run + history
ADMIN Integrations | /api/admin/integrations/* | OAuth | Salling sync, import, status
ADMIN Monitoring | /api/admin/metrics, /api/admin/analytics | OAuth | System health and analytics
DEBUG Diagnostics | /api/debug/* | Password | R2 status, image investigation
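
The /api/nearest-stores endpoint is described as a GPS-based search over stores that carry coordinates. A minimal sketch of how such a lookup could rank stores, assuming a haversine great-circle distance over an in-memory list. The function names, the limit parameter, and the sample rows are illustrative and not taken from the project; the real endpoint may well do this in SQL.

```javascript
// Hypothetical helper: great-circle distance in kilometres (haversine formula).
function haversineKm(lat1, lon1, lat2, lon2) {
  const R = 6371; // mean Earth radius in km
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Hypothetical ranking step: annotate each store with its distance, keep the closest.
function nearestStores(stores, lat, lon, limit = 5) {
  return stores
    .map((s) => ({ ...s, distance_km: haversineKm(lat, lon, s.latitude, s.longitude) }))
    .sort((a, b) => a.distance_km - b.distance_km)
    .slice(0, limit);
}

// Sample rows using the STORES column names; coordinates are approximate city centres.
const sampleStores = [
  { name: "Store A (Aarhus)", latitude: 56.1629, longitude: 10.2039 },
  { name: "Store B (Odense)", latitude: 55.4038, longitude: 10.4024 },
  { name: "Store C (Copenhagen)", latitude: 55.6761, longitude: 12.5683 },
];

// A user standing in central Copenhagen asks for the two closest stores.
const nearest = nearestStores(sampleStores, 55.68, 12.57, 2);
console.log(nearest.map((s) => s.name));
```

Sorting in application code is fine for a few hundred stores; with a much larger table, a bounding-box filter before the distance calculation would likely be worth adding.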

4. Data Model

The database is hosted on Turso (distributed SQLite). Most tables use UUID-based primary keys; CRON_JOBS and CRON_JOB_RUNS use string IDs.

erDiagram
    PRODUCT_TYPES ||--o{ PRODUCTS : categorizes
    PRODUCTS ||--o{ PRICE_HISTORY : has
    CHAINS ||--o{ PRICE_HISTORY : has
    CHAINS ||--o{ STORES : has
    CHAINS ||--o{ CHAIN_PRICE_SCHEDULES : has
    CRON_JOBS ||--o{ CRON_JOB_RUNS : executes
    INTEGRATIONS ||--o{ CRON_JOBS : uses
    INTEGRATIONS ||--o{ SYNC_LOG : logs

    PRODUCT_TYPES {
        uuid id PK
        string name
        string description
        string unit_type
        int unit_size_ml
        int units_per_package
    }

    PRODUCTS {
        uuid id PK
        uuid product_type_id FK
        string display_name
        string ean
        int bundle_size
        int volume_ml
        int total_volume_ml
        string image_url
        boolean is_active
        int sort_order
    }

    CHAINS {
        uuid id PK
        string name
        string external_id
        string logo_url
        string color_hex
        string api_source
        boolean is_active
    }

    STORES {
        uuid id PK
        uuid chain_id FK
        string external_id
        string name
        string address
        string city
        string postal_code
        float latitude
        float longitude
        json opening_hours
        datetime last_synced_at
    }

    PRICE_HISTORY {
        uuid id PK
        uuid product_id FK
        uuid chain_id FK
        uuid store_id FK
        decimal price
        decimal price_per_liter
        boolean is_promotion
        date promotion_end_date
        string source
        boolean is_current
        datetime recorded_at
        datetime expires_at
    }

    CRON_JOBS {
        string id PK
        string name
        string description
        uuid integration_id FK
        string schedule_type
        json schedule_config
        boolean is_active
        datetime last_run_at
        string last_run_status
        datetime next_run_at
    }

    CRON_JOB_RUNS {
        string id PK
        string cron_job_id FK
        datetime started_at
        datetime completed_at
        string status
        int items_found
        int items_imported
        string error_message
        json result_details
    }

    INTEGRATIONS {
        uuid id PK
        string name
        string source
        string type
        string api_base_url
        datetime last_sync_at
        string last_sync_status
    }

    AUDIT_LOG {
        uuid id PK
        string action
        string resource_type
        string resource_id
        json changes
        string user_id
        datetime created_at
    }
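
PRICE_HISTORY stores both price and price_per_liter, while PRODUCTS carries bundle_size, volume_ml, and total_volume_ml. A sketch of how the per-liter figure could be derived from those columns. The function name, the two-decimal rounding, and the assumption that total_volume_ml equals bundle_size times volume_ml are not confirmed by the schema; the 6-pack and its price are made-up sample values.

```javascript
// Hypothetical derivation of price_per_liter from price and total_volume_ml.
function pricePerLiter(price, totalVolumeMl) {
  if (!(totalVolumeMl > 0)) {
    throw new RangeError("total_volume_ml must be a positive number");
  }
  const liters = totalVolumeMl / 1000;
  // Round to two decimals (assumed precision for the decimal column).
  return Math.round((price / liters) * 100) / 100;
}

// Sample product: a 6-pack of 330 ml cans.
const sixPack = { bundle_size: 6, volume_ml: 330 };
// Assumption: total_volume_ml = bundle_size * volume_ml.
const totalVolumeMl = sixPack.bundle_size * sixPack.volume_ml;

console.log(totalVolumeMl);                       // 1980
console.log(pricePerLiter(29.95, totalVolumeMl)); // 15.13
```

Storing the derived value alongside the price lets package sizes be compared directly without a join against PRODUCTS.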
        

5. Cron & Automation

Price collection runs automatically via a database-driven cron system. Jobs can be configured in the Admin UI with different schedules.

flowchart TB
    subgraph Trigger["Cron Dispatcher (Worker)"]
        Schedule["Runs every 15 min<br/>:00, :15, :30, :45"]
        Query["Query cron_jobs<br/>WHERE is_active = 1<br/>AND next_run_at <= NOW()"]
    end

    subgraph Mutex["Concurrency Control"]
        CheckRunning["Check cron_job_runs<br/>status = 'running'"]
        Skip["Skip if<br/>already running"]
    end

    subgraph Execution["Job Execution"]
        RunAPI["POST /api/admin/<br/>cron-jobs/:id/run"]
        CreateRun["INSERT cron_job_runs<br/>status = 'running'"]
    end

    subgraph JobTypes["Job Types"]
        BrowserJob["Browser Rendering<br/>━━━━━━━━━━━━━<br/>etilbudsavis.dk<br/>tilbudsugen.dk"]
        SallingJob["Salling API<br/>━━━━━━━━━━━━━<br/>Bilka, Føtex, Netto<br/>stores + prices"]
    end

    subgraph Processing["Data Processing"]
        Parse["Parse HTML/JSON"]
        Match["Match products<br/>by name/EAN"]
        Validate["Validate prices"]
        Dedup["Deduplicate"]
    end

    subgraph Output["Output"]
        InsertPrices[("INSERT price_history")]
        UpdateJob["UPDATE cron_jobs<br/>next_run_at"]
        UpdateRun["UPDATE cron_job_runs<br/>status = 'success'"]
        Notify["Discord notification<br/>(on failure)"]
    end

    Schedule --> Query
    Query --> CheckRunning
    CheckRunning -->|Not running| RunAPI
    CheckRunning -->|Already running| Skip
    RunAPI --> CreateRun
    CreateRun --> BrowserJob
    CreateRun --> SallingJob
    BrowserJob --> Parse
    SallingJob --> Parse
    Parse --> Match
    Match --> Validate
    Validate --> Dedup
    Dedup --> InsertPrices
    InsertPrices --> UpdateRun
    UpdateRun --> UpdateJob
    UpdateRun --> Notify
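
The query and concurrency-control steps of the dispatcher can be sketched as a pure function over rows shaped like cron_jobs and cron_job_runs. This is an in-memory illustration only, assuming ISO 8601 timestamps; selectJobsToRun, the job IDs, and the sample rows are hypothetical, and the real Worker would query Turso rather than filter arrays.

```javascript
// Hypothetical selection step: which jobs should the dispatcher trigger on this tick?
function selectJobsToRun(cronJobs, cronJobRuns, now) {
  // Jobs that already have a run in status 'running' are skipped (concurrency control).
  const running = new Set(
    cronJobRuns.filter((r) => r.status === "running").map((r) => r.cron_job_id)
  );
  // Mirrors: WHERE is_active = 1 AND next_run_at <= NOW()
  return cronJobs.filter(
    (job) =>
      job.is_active &&
      new Date(job.next_run_at).getTime() <= now.getTime() &&
      !running.has(job.id)
  );
}

const jobs = [
  { id: "salling-netto", is_active: 1, next_run_at: "2025-01-15T05:00:00Z" },
  { id: "salling-bilka", is_active: 1, next_run_at: "2025-01-15T07:00:00Z" },
  { id: "daily-scrape", is_active: 1, next_run_at: "2025-01-15T01:00:00Z" },
  { id: "paused-job", is_active: 0, next_run_at: "2025-01-15T01:00:00Z" },
];
const runs = [
  { id: "run-1", cron_job_id: "daily-scrape", status: "running" },
  { id: "run-0", cron_job_id: "salling-netto", status: "success" },
];

// Tick at 05:15 UTC: only salling-netto is active, due, and not already running.
const due = selectJobsToRun(jobs, runs, new Date("2025-01-15T05:15:00Z"));
console.log(due.map((j) => j.id)); // [ 'salling-netto' ]
```

Note that a check-then-insert mutex like this is only safe if a single dispatcher runs at a time; two overlapping ticks could both pass the check before either inserts its run row.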

Active Cron Jobs

Job | Chains | Schedule | Integration | Source
Daily offer scraping | All | 01:00 UTC | Browser Rendering | etilbudsavis.dk
Tilbudsugen scraper | All | 07:30 UTC | Browser Rendering | tilbudsugen.dk
Salling Bilka | Bilka | 07:00 UTC | Salling API | Products API
Salling Føtex | Føtex | 06:05 UTC | Salling API | Products API
Salling Netto | Netto | 05:00 UTC | Salling API | Products API
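
After each run, next_run_at has to be advanced to the next scheduled time. A minimal sketch, assuming a daily schedule is stored as an hour and minute in UTC; the actual shape of schedule_config is not documented here, and both function names are hypothetical. The second helper shows why a job scheduled for 06:05 is first picked up on the 06:15 dispatcher tick.

```javascript
// Hypothetical: next occurrence of HH:MM UTC strictly after `from`.
function nextDailyRun(from, hourUtc, minuteUtc) {
  const next = new Date(
    Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate(), hourUtc, minuteUtc)
  );
  if (next.getTime() <= from.getTime()) {
    next.setUTCDate(next.getUTCDate() + 1); // already passed today, so use tomorrow
  }
  return next;
}

// Hypothetical: first 15-minute dispatcher tick (:00, :15, :30, :45) at or after `dueAt`.
function firstDispatcherTick(dueAt) {
  const quarter = 15 * 60 * 1000;
  return new Date(Math.ceil(dueAt.getTime() / quarter) * quarter);
}

// The 06:05 UTC job has just finished at 06:20 UTC.
const finishedAt = new Date("2025-01-15T06:20:00Z");
const nextRun = nextDailyRun(finishedAt, 6, 5);

console.log(nextRun.toISOString());                      // 2025-01-16T06:05:00.000Z
console.log(firstDispatcherTick(nextRun).toISOString()); // 2025-01-16T06:15:00.000Z
```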

6. External Integrations

flowchart LR
    subgraph PepsiMaxIndex["PepsiMax Index"]
        API["Admin API"]
        Parser["Parsers"]
        DB[("Turso DB")]
    end

    subgraph BrowserRendering["Cloudflare Browser Rendering"]
        BR["Headless Chrome"]
    end

    subgraph Salling["Salling Group API"]
        StoresAPI["Stores API<br/>/v2/stores"]
        ProductsAPI["Products API<br/>/v2/products/:ean"]
    end

    subgraph Scraped["Scraped Sites"]
        ETilbud["etilbudsavis.dk"]
        Tilbudsugen["tilbudsugen.dk"]
    end

    subgraph Notifications["Notifications"]
        Discord["Discord Webhooks"]
    end

    subgraph Auth["Authentication"]
        GitHub["GitHub OAuth"]
    end

    API --> BR
    BR --> ETilbud
    BR --> Tilbudsugen
    ETilbud --> Parser
    Tilbudsugen --> Parser
    API --> StoresAPI
    API --> ProductsAPI
    StoresAPI --> Parser
    ProductsAPI --> Parser
    Parser --> DB
    API --> Discord
    API --> GitHub

Integration Details

Integration | Type | Endpoints | Data
Salling Stores API | REST API | /v2/stores?brand={brand} | 700+ stores (Bilka, Føtex, Netto)
Salling Products API | REST API | /v2/products/{ean}?storeId={id} | Prices per product per store
Browser Rendering | Web Scraping | etilbudsavis.dk, tilbudsugen.dk | Weekly offers, all chains
GitHub OAuth | OAuth 2.0 | github.com/login/oauth | Admin authentication
Discord | Webhook | discord.com/api/webhooks | Error notifications
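
The two Salling endpoint templates can be expanded into request URLs as follows. This is a sketch only: the base URL is passed in as a parameter because it is not stated here, the function names are hypothetical, and the brand value, EAN, and store ID are placeholders. Authentication headers are left out for the same reason.

```javascript
// Hypothetical URL builder for /v2/stores?brand={brand}
function sallingStoresUrl(baseUrl, brand) {
  const url = new URL("/v2/stores", baseUrl);
  url.searchParams.set("brand", brand);
  return url.toString();
}

// Hypothetical URL builder for /v2/products/{ean}?storeId={id}
function sallingProductUrl(baseUrl, ean, storeId) {
  // encodeURIComponent guards the path segment against unexpected characters.
  const url = new URL(`/v2/products/${encodeURIComponent(ean)}`, baseUrl);
  url.searchParams.set("storeId", storeId);
  return url.toString();
}

// "https://salling.example" is a placeholder host, not the real API.
const base = "https://salling.example";

console.log(sallingStoresUrl(base, "netto"));
// https://salling.example/v2/stores?brand=netto
console.log(sallingProductUrl(base, "5741000000000", "store-123"));
// https://salling.example/v2/products/5741000000000?storeId=store-123
```

Because prices are fetched per product per store, the number of product requests grows with products times stores, which is one reason to sync stores and prices as separate jobs.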

7. Security & Authentication

flowchart TB
    subgraph Public["Public Endpoints"]
        GET1["GET /api/products"]
        GET2["GET /api/stores"]
        GET3["GET /api/prices/*"]
        GET4["GET /api/stats"]
        GET5["GET /api/health"]
    end

    subgraph Protected["Protected Endpoints"]
        Admin["POST/PUT/DELETE<br/>/api/admin/*"]
        CronRun["POST /api/admin/<br/>cron-jobs/:id/run"]
        Debug["GET /api/debug/*"]
    end

    subgraph AuthMethods["Auth Methods"]
        OAuth["GitHub OAuth<br/>Session in KV"]
        CronSecret["X-Cron-Secret<br/>Header"]
        AdminPwd["?password=<br/>Query param"]
    end

    subgraph OAuthFlow["GitHub OAuth Flow"]
        Login["1. /api/auth/github-login"]
        Redirect["2. Redirect to GitHub"]
        Callback["3. /api/auth/github-callback"]
        Verify["4. Verify username"]
        Session["5. Create KV session"]
        Cookie["6. Set session cookie"]
    end

    Public -->|No auth| Response1["Response"]
    Protected --> AuthMethods
    OAuth --> Admin
    CronSecret --> CronRun
    AdminPwd --> Debug
    Login --> Redirect
    Redirect --> Callback
    Callback --> Verify
    Verify -->|SlambertDK| Session
    Session --> Cookie
    Verify -->|Others| Reject["403 Forbidden"]

Security Layers

Layer | Protection | Implementation
Transport | HTTPS only | Cloudflare SSL/TLS
Authentication | GitHub OAuth | Whitelisted users only
Session | KV-stored sessions | Encrypted session tokens
Cron Auth | Shared secret | X-Cron-Secret header
Rate Limiting | Cloudflare WAF | DDoS protection
Audit | Audit log | All admin actions are logged
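
Two of these layers, the shared cron secret and the GitHub username whitelist, reduce to small checks. A sketch under assumptions: the helper names are hypothetical, the comparison avoids exiting early on the first mismatching character, and case-insensitive username matching is an assumption about how the whitelist is applied. A Map stands in for the request's Headers object, since both expose get().

```javascript
// Hypothetical string comparison that always scans the full length (no early exit).
function safeEqual(a, b) {
  if (typeof a !== "string" || typeof b !== "string") return false;
  let diff = a.length ^ b.length;
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Hypothetical check of the X-Cron-Secret header on dispatcher-triggered requests.
function isAuthorizedCron(headers, expectedSecret) {
  const provided = headers.get("X-Cron-Secret");
  // An unset secret must never authorize anything.
  return Boolean(expectedSecret) && safeEqual(provided, expectedSecret);
}

// Hypothetical whitelist check applied after the GitHub OAuth callback.
function isWhitelisted(username, allowedUsers) {
  const wanted = String(username).toLowerCase();
  return allowedUsers.some((u) => u.toLowerCase() === wanted);
}

const cronHeaders = new Map([["X-Cron-Secret", "s3cret"]]);
console.log(isAuthorizedCron(cronHeaders, "s3cret"));       // true
console.log(isAuthorizedCron(new Map(), "s3cret"));         // false
console.log(isWhitelisted("SlambertDK", ["SlambertDK"]));   // true
console.log(isWhitelisted("someone-else", ["SlambertDK"])); // false
```

A failed whitelist check corresponds to the 403 Forbidden branch of the OAuth flow; no KV session is created in that case.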

8. Monitoring & Health

flowchart LR
    subgraph HealthCheck["Health Endpoints"]
        Health["/api/health<br/>━━━━━━━━━━<br/>Database status<br/>R2 status<br/>Data counts<br/>Last update"]
    end

    subgraph AdminMetrics["Admin Metrics"]
        Metrics["/api/admin/metrics<br/>━━━━━━━━━━━━<br/>Scraper success rate<br/>Data freshness<br/>Coverage %<br/>Stuck jobs"]
        Analytics["/api/admin/analytics<br/>━━━━━━━━━━━━<br/>12-week trends<br/>Jobs per week<br/>Success rates"]
        Images["/api/admin/check-images<br/>━━━━━━━━━━━━<br/>Broken URLs<br/>Missing images<br/>Legacy URLs"]
    end

    subgraph Debug["Debug Endpoints"]
        R2Status["/api/debug/r2-status<br/>━━━━━━━━━━━━<br/>R2 binding test<br/>File listing<br/>Write/read test"]
        Investigate["/api/debug/investigate-product<br/>━━━━━━━━━━━━<br/>DB record<br/>R2 files<br/>URL conflicts"]
    end

    subgraph Alerts["Alerting"]
        Discord2["Discord Webhook<br/>━━━━━━━━━━<br/>Scraper failures<br/>Stale data (>7 days)"]
    end

    Health --> Status["System OK/Degraded/Error"]
    Metrics --> Dashboard["Admin Dashboard"]
    Analytics --> Dashboard
    Images --> Dashboard
    R2Status --> Troubleshooting["Troubleshooting"]
    Investigate --> Troubleshooting
    Metrics --> Discord2

Health Status Criteria

Status | Criteria | Action
Healthy | Success rate >= 95%, no stale chains | None
Warning | Success rate 80-95% OR >3 stale chains OR stuck jobs | Review metrics
Critical | Success rate < 80% | Immediate attention
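
These criteria can be read as a small classifier. The sketch below is one possible interpretation, not the project's implementation: the function and field names are hypothetical, a success rate of exactly 95% is treated as healthy, and any case the table leaves open (for example 1 to 3 stale chains with a high success rate) is assumed to fall under Warning.

```javascript
// Hypothetical classifier for the health status criteria.
function classifyHealth({ successRate, staleChains, stuckJobs }) {
  // Critical: success rate below 80%, regardless of other signals.
  if (successRate < 80) return "critical";
  // Healthy: success rate >= 95% with no stale chains and no stuck jobs.
  if (successRate >= 95 && staleChains === 0 && stuckJobs === 0) return "healthy";
  // Assumption: everything between the two documented extremes is a warning.
  return "warning";
}

console.log(classifyHealth({ successRate: 97, staleChains: 0, stuckJobs: 0 })); // healthy
console.log(classifyHealth({ successRate: 90, staleChains: 0, stuckJobs: 0 })); // warning
console.log(classifyHealth({ successRate: 99, staleChains: 4, stuckJobs: 0 })); // warning
console.log(classifyHealth({ successRate: 72, staleChains: 0, stuckJobs: 0 })); // critical
```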

9. Tech Stack

Layer | Technology | Purpose
Frontend | Vanilla JS, HTML5, CSS3 | No framework, fast loading
Maps | Leaflet.js + OpenStreetMap | Store locator
Charts | Chart.js | Price history visualization
Backend | Cloudflare Pages Functions | Serverless API (50+ endpoints)
Cron | Cloudflare Workers | Scheduled job dispatcher
Database | Turso (libSQL/SQLite) | Distributed edge database
Storage | Cloudflare R2 | Product/chain images
Sessions | Cloudflare KV | OAuth session storage
Scraping | Browser Rendering API | Headless Chrome scraping
API Docs | OpenAPI 3.0.3 + Swagger UI | Interactive documentation
Types | openapi-typescript | Generated TypeScript types
Hosting | Cloudflare Pages | Global CDN + CI/CD

10. File Structure

pepsimax-index/
├── functions/                    # Cloudflare Pages Functions
│   ├── api/
│   │   ├── products.js          # GET /api/products
│   │   ├── chains.js            # GET /api/chains
│   │   ├── stores.js            # GET /api/stores
│   │   ├── nearest-stores.js    # GET /api/nearest-stores
│   │   ├── health.js            # GET /api/health
│   │   ├── prices/              # Price endpoints
│   │   │   ├── index.js         # GET /api/prices
│   │   │   ├── current.js       # GET /api/prices/current
│   │   │   ├── best.js          # GET /api/prices/best
│   │   │   ├── promotions.js    # GET /api/prices/promotions
│   │   │   └── history.js       # GET /api/prices/history
│   │   ├── package-types/       # Package type endpoints
│   │   ├── stats/               # Statistics endpoints
│   │   ├── auth/                # GitHub OAuth
│   │   │   ├── github-login.js
│   │   │   ├── github-callback.js
│   │   │   └── session.js
│   │   ├── admin/               # Admin endpoints (40+)
│   │   │   ├── _middleware.js   # Auth middleware
│   │   │   ├── products.js      # CRUD products
│   │   │   ├── chains.js        # CRUD chains
│   │   │   ├── stores.js        # CRUD stores
│   │   │   ├── prices.js        # Price management
│   │   │   ├── cron-jobs/       # Cron job CRUD + run
│   │   │   ├── cron-runs.js     # Job history
│   │   │   ├── integrations/    # Salling integration
│   │   │   │   └── salling/
│   │   │   │       ├── sync-brand.js
│   │   │   │       ├── sync-stores.js
│   │   │   │       ├── import-prices.js
│   │   │   │       ├── ean-products.js
│   │   │   │       └── price-status.js
│   │   │   ├── chain-schedules/ # Price refresh schedules
│   │   │   ├── scraper-jobs.js  # Scraper job management
│   │   │   ├── metrics.js       # System metrics
│   │   │   ├── analytics.js     # Weekly analytics
│   │   │   ├── check-images.js  # Image health check
│   │   │   └── audit.js         # Audit log
│   │   └── debug/               # Debug endpoints
│   │       ├── r2-status.js
│   │       └── investigate-product.js
│   ├── images/                  # R2 image server
│   └── utils/                   # Shared utilities
│       ├── db.js                # Turso client
│       ├── salling-client.js    # Salling API client
│       ├── tilbudsavis-parser.js
│       └── chain-normalizer.js
├── workers/
│   └── cron-dispatcher/         # Cron Worker
│       ├── src/index.js
│       └── wrangler.toml
├── src/
│   ├── admin/                   # Admin dashboard
│   │   └── admin.html
│   └── types/
│       └── api.d.ts             # Generated TypeScript types
├── js/
│   └── main.js                  # Frontend logic
├── css/
│   └── styles.css               # Styling
├── docs/                        # Documentation (48 user stories)
│   ├── openapi.yaml             # OpenAPI 3.0.3 spec (2800+ lines)
│   ├── README.md                # Docs index
│   ├── features/                # Feature documentation
│   │   ├── PRICE-TRACKING.md    # 8 user stories
│   │   ├── STORE-LOCATOR.md     # 5 user stories
│   │   ├── PACKAGE-TYPES.md     # 3 user stories
│   │   ├── STATISTICS.md        # 2 user stories
│   │   ├── ADMIN-DASHBOARD.md   # 15 user stories
│   │   ├── CRON-JOBS.md         # 10 user stories
│   │   └── INTEGRATIONS.md      # 5 user stories
│   └── technical/
│       └── SYSTEM-HEALTH.md     # Troubleshooting guide
├── index.html                   # Public frontend
├── architecture.html            # This file
└── wrangler.toml                # Cloudflare config
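
The comments in the tree pair each file under functions/ with the route it serves, which reflects the file-based routing of Cloudflare Pages Functions. A small sketch of that mapping for the static paths shown here; routeForFile is a hypothetical helper for illustration, it does not handle dynamic segments, and it treats underscore-prefixed files such as _middleware.js as having no route of their own.

```javascript
// Hypothetical helper mirroring the file-to-route comments in the tree.
function routeForFile(filePath) {
  // Files like _middleware.js wrap other handlers and are not routes themselves.
  if (/(^|\/)_[^/]*\.js$/.test(filePath)) return null;
  return (
    "/" +
    filePath
      .replace(/^functions\//, "") // routes are relative to the functions/ directory
      .replace(/\.js$/, "")        // drop the file extension
      .replace(/(^|\/)index$/, "") // index.js serves the directory path itself
  );
}

console.log(routeForFile("functions/api/products.js"));          // /api/products
console.log(routeForFile("functions/api/prices/index.js"));      // /api/prices
console.log(routeForFile("functions/api/prices/current.js"));    // /api/prices/current
console.log(routeForFile("functions/api/admin/_middleware.js")); // null
```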

Documentation Links

Document | Description
/docs (Swagger UI) | Interactive API documentation
openapi.yaml | OpenAPI 3.0.3 specification
docs/features/ | Feature documentation (48 user stories)
SYSTEM-HEALTH.md | Monitoring & troubleshooting guide
DATABASE.md | Database schema & queries
CRON-JOBS-AND-INTEGRATIONS.md | Detailed cron system documentation

Last updated: January 2025 | Built with Cloudflare