SONI-remix-new — Tech audit — Scalability & Performance

Kritikus Launch előtt L SCA-001 · scalability

Coach chat endpoint loads massive context on every turn (10+ Supabase queries + 7-day biometrics + full profile JSON-stringified into prompt)

Kód-hely

_clients/SONI-remix-new/src/server/_shared/coach-context.ts:27-50, 211-221

Evidence

buildCoachContext (called on EVERY /api/coach-chat POST and EVERY /api/voice-coach-chat POST) executes 11 Supabase queries via Promise.all (lines 38-50): profiles, biometrics(7 rows), meals(today), body_biometry_scans(latest), subjective_pulse(3 rows), lifestyle_logs(today), habit_logs(today), workout_logs(today), meals(last-1), coach_facts(top-10), workout_logs(last-14-days), daily_intents(today). loadAndBuildBodyTrendBlock at line 56 adds further queries. The route handler then runs ADDITIONAL queries in parallel: ritualSignals, functionalAgeBlock, snapshotArcBlock, todaysNorthStar (api.coach-chat.ts:863-869), plus body_biometry_scans again (line 873), profiles again (line 641), coach_memory_threads (line 902), stagnationDays (line 903). The final system prompt embeds JSON.stringify(profile), JSON.stringify(biometrics.data ?? []), and full today rows raw (coach-context.ts:212-220). On each user turn the server fires ~16-18 queries against Postgres AND constructs a prompt that includes a full week of biometrics + entire profile + all today meals/workouts/habits.

A probléma

Every coach turn is amplifying database load by a factor of ~17 and token cost by sending the full user-state snapshot to the LLM. With 100 concurrent users each sending one message per minute, that is ~1,700 Postgres queries per minute on top of the embedding overhead. Supabase pooler default of 60-90 connections will saturate well before the user-load justifies it. There is no caching of the assembled context: two messages 5 seconds apart by the same user re-run all 17 queries from scratch.

Üzleti hatás

At 100 daily active users sending 10 messages/day, the chat path alone executes ~17,000 Postgres queries per day from the coach endpoint. At 1,000 DAU this is 170k queries/day. AI-gateway cost: an 8-12 KB system prompt at ~3000 input tokens x 10 turns/day x 1000 users = 30M input tokens/day. At openai/gpt-5 indicative pricing (~$5/1M input tokens) that is ~$150/day input cost just for the coach endpoint context.

Magyarázat

Every time a user sends a chat message to your coach, your server runs around 17 database queries and packs the user entire week of biometrics, full profile, all today meals and workouts into the prompt sent to OpenAI. None of this is cached, so two messages five seconds apart re-do all the work.

Javaslat

Introduce a per-user context cache keyed by (userId, dayKey) with a 60-90 second TTL via Cloudflare Workers KV or an in-Worker Map.
Split the context blocks by volatility: profile (5 min cache), biometrics-7d (5 min), today meals/lifestyle (15 sec, invalidate on log events).
Reduce the prompt: summarise meal/lifestyle rows numerically rather than JSON-stringifying.
Use Postgres views or RPC functions to fold 11 queries into 2-3.
Add a token-count log per turn and alert when system_prompt_tokens > 4000.

Becsült munka

L — 1–2 weeks

Kapcsolódó dimenziók

SEC-005

Major overlap with AI integration (cost economics). Overlaps with Security (SEC-005 rate-limiting amplifies cost-runaway).

References

Magas Launch előtt S SCA-002 · scalability

Per-coach-turn fact extraction fires a SECOND openai/gpt-5 call on every chat turn - doubles AI cost per message

Kód-hely

_clients/SONI-remix-new/src/routes/api.coach-chat.ts:440-519

Evidence

extractAndPinFacts (lines 440-519) is called after every coach response. It selects up to 50 existing coach_facts (lines 458-462), then unconditionally calls the AI gateway with model openai/gpt-5 (line 483), passing the system prompt + user message + assistant message + existing-facts list. The only gate is a heuristic on lines 453-455 that skips only messages under 30 chars with no first-person keywords. No batching, no probabilistic sampling, no caching, no timeout/abort.

A probléma

Every text-coach turn now incurs TWO openai/gpt-5 calls: the streaming response (max_completion_tokens: 1600) plus the fact-extraction. For voice-coach turns the extractor is NOT called, so cost asymmetry between text and voice is unexpected. Fact extraction provides marginal value (most turns produce 0 new facts) but doubles per-turn AI cost. It also runs synchronously in the request handler. Combined with SCA-001, a single user message triggers ~17 Postgres queries + 2 paid GPT-5 calls.

Üzleti hatás

At 1000 DAU x 10 messages/day, ~10,000 extra openai/gpt-5 calls per day - at indicative pricing $5/1M input + $15/1M output, ~1000 input + 200 output tokens per call ~$10-15/day of pure overhead. A power user firing 50 messages in one session hits the gateway 100 times in minutes.

Magyarázat

Every chat message a user sends actually triggers two AI calls: the visible reply, and a hidden background call that tries to extract long-term facts. You pay for the hidden call on every turn even when it returns nothing.

Javaslat

Throttle fact extraction to every Nth turn (e.g. every 5th user message).
Extend the personal-disclosure keyword pre-filter on line 454.
Move fact extraction off the request hot path via Cloudflare Queue or a scheduled cron.
Use a cheaper model (gpt-5-mini, gemini-flash) for this classification task.
Cap with AbortSignal so a slow gateway cannot block the request handler.

Becsült munka

S — under ½ day

Kapcsolódó dimenziók

SCA-001, SEC-005

Overlaps with AI integration (cost) and Security (SEC-005 abuse amplification).

References

Magas Launch előtt M SCA-003 · scalability

i18n: all 6 locale JSON files (~858 KB total) eagerly imported and shipped to every visitor

Kód-hely

_clients/SONI-remix-new/src/i18n/index.ts:3-26

Evidence

src/i18n/index.ts imports all 6 locale files at module scope (lines 3-8): en.json (135 KB), de.json (143 KB), es.json (141 KB), fr.json (146 KB), hu.json (154 KB), it.json (140 KB) - total raw 858 KB across ~19,194 lines. The resources object on lines 19-26 wires all six into i18next at init time, so the bundler cannot tree-shake any of them. There is no namespace splitting, no lazy backend (i18next-http-backend / i18next-resources-to-backend), and lng is hard-coded to en with the user saved language applied only after hydration.

A probléma

Every visitor downloads JSON for all six languages on first page load even though they will only ever read one. After gzip the payload is roughly 200-280 KB of redundant translation text - large for a mobile-first PWA where TTI is heavily affected by JS size. The i18n module is imported eagerly at the root (__root.tsx:13), so the bundler cannot split it out of the critical-path chunk. For the Cloudflare Worker SSR pass all six locale JSONs are bundled into the Worker code itself, contributing to the 10 MB compressed Worker limit and to cold-start parse time.

Üzleti hatás

Slower first-contentful-paint on every visitor - especially mobile on cellular. ~200 KB of avoidable JS adds ~300-500 ms to interactive on a mid-tier Android. Core Web Vitals (LCP/INP) are affected. Cloudflare Worker cold-start: each cold isolate parses the locale bundles, adding ~50-100 ms to that subset of requests.

Magyarázat

Your app supports six languages but every visitor downloads all six translation files on first visit - about 200 KB of extra data. The fix is to fetch just the user language on demand.

Javaslat

Adopt i18next-resources-to-backend or i18next-http-backend and split locales into one chunk per language.
Keep en.json in the critical chunk; lazy-import the others via dynamic import of ./locales/de.json.
Consider namespace splitting (onboarding, settings, errors only loaded on those routes).
Add a bundle-size budget to CI.
Verify with vite-bundle-analyzer how much of the current bundle is locale JSON.

Becsült munka

M — 1–3 days

Kapcsolódó dimenziók

Pure scalability/performance. Overlaps with Ops (CI bundle budgets).

References

Magas Launch előtt M SCA-004 · scalability

Unbounded coach_messages and coach_conversations load on chat sheet open - no LIMIT on history

Kód-hely

_clients/SONI-remix-new/src/components/CoachChatSheet.tsx:183-209

Evidence

CoachChatSheet.tsx lines 183-187 select id, updated_at from coach_conversations filtered by user_id, ordered ascending - NO LIMIT clause. Then lines 193-197 select role, content, created_at from coach_messages filtered by conversation_id IN (convIds), ordered ascending - NO LIMIT, fetches every message ever sent across every conversation the user has had. The coach_messages table has an index on (conversation_id, created_at) per migration 20260418115322 line 161 but no index supports the user-scoped conversations list query. No pagination, no infinite scroll, no client-side cap.

A probléma

On day 30 of usage a daily user has ~30 conversations x 20-50 messages each = 600-1500 message rows. By day 180 that is 3,600-9,000 rows. The query loads them all into memory and renders via react-markdown. Three failure modes: bandwidth (5-15 MB on every chat-sheet open after months); render time (react-markdown invoked per message causes main-thread stalls); database (user_id with no LIMIT means the planner falls back to scan-and-filter as the user table grows). No index supports coach_conversations(user_id, updated_at) for the conversations list query.

Üzleti hatás

UX degrades silently the longer the user uses the product. A heavy user 6 months in sees a 3-5 second freeze when opening the chat sheet. Cost angle: every chat-sheet open re-downloads the entire history (no client cache) - Supabase egress fees scale linearly. Database angle: with 10k users averaging 1000 messages each, this query pattern over a 10M-row table becomes the dominant Postgres workload.

Magyarázat

When a user opens the chat with their coach, your app downloads every message they have ever sent or received. After a few months of daily use that is thousands of messages - megabytes of data on every open. The fix is to load only the most recent 50-100 messages and paginate older ones.

Javaslat

Add LIMIT 100 (or.
and ORDER BY created_at DESC to messages query; reverse client-side.
For conversations list add LIMIT 20 ORDER BY updated_at DESC.
Infinite-scroll pagination using .lt(created_at, oldestLoadedTimestamp).
Add covering index coach_conversations(user_id, updated_at DESC).
Archive messages older than 90 days to a coach_messages_archive table.
Cache fetched history in tanstack-query with a 60-second staleTime.
Log response size to alert on > 1 MB per open.

Becsült munka

M — 1–3 days

Kapcsolódó dimenziók

Overlaps with Data integrity (long-term retention policy) and AI integration (long history is fed back into prompts).

References

Magas Launch előtt S SCA-005 · scalability

Coach-chat AI gateway fetch has no timeout - a hung upstream stalls the Worker request indefinitely

Kód-hely

_clients/SONI-remix-new/src/routes/api.coach-chat.ts:1099-1127, 479-491

Evidence

callGatewayWithRetry (api.coach-chat.ts:1099-1127) invokes fetch to https://ai.gateway.lovable.dev/v1/chat/completions with NO AbortController, NO signal, NO timeout. Same pattern in extractAndPinFacts (lines 479-491). callAITool in _shared/ai-tool-call.ts:52-78 - no timeout. voice-coach gateway calls at api.voice-coach-chat.ts:253-262 and 377-386 - no timeout. The single explicit setTimeout in the codebase is api.coach-chat.ts:814-816, which RACES a fetch (already inflight without abort) against a 20s timeout - the race resolves but the underlying fetch keeps running and consuming Worker time. Cloudflare Workers have a 30-second wall-time limit by default.

A probléma

When the AI gateway has a transient slow-down, every in-flight coach request hangs until the Worker kills it at 30 s. The user sees a spinning indicator then a hard error. The retry logic in callGatewayWithRetry retries on transient 5xx - but if the first call is stuck (no response), it never enters the retry branch. Combined with the inFlightTurns dedup (45 sec window), a hung request also blocks the user from retrying the same message for 45 seconds.

Üzleti hatás

During an AI gateway incident every coach request is degraded - users see a long delay then a hard error, and may abandon. Worker bill is also affected: stalled requests consume the full 30s CPU/wall budget. At the upper bound of 100 concurrent stuck requests, queueing backlog can measurably degrade p99 for the whole app, not just chat.

Magyarázat

If the AI service is slow your server has no time limit on waiting for it. Each chat request can hang for 30 seconds before failing, tying up resources other users need. The fix is a 15-20 second timeout and fail-fast behaviour.

Javaslat

Wrap every fetch to the AI gateway in an AbortController with a 20-second timeout.
Centralize in a helper aiGatewayFetch(url, opts, { timeoutMs }).
On AbortError treat as transient 5xx and let retry path run.
Explicit 15-second timeout to the SSE first-byte.
Log timeout vs error vs success per call.
Add a Cloudflare limits.cpu_ms guard in wrangler.jsonc.

Becsült munka

S — under ½ day

Kapcsolódó dimenziók

SCA-001, SEC-005

Overlaps with Ops (observability) and Security (SEC-005 denial-of-wallet protection).

References

Magas Launch előtt L SCA-006 · scalability

Cron endpoints process all users in a tight sequential for-loop with no concurrency control or backpressure

Kód-hely

_clients/SONI-remix-new/src/routes/api/public/hooks/body-plateau-detect.ts:38-72

Evidence

body-plateau-detect.ts:38-41 selects ALL rows from body_progress_state where goal_pace_status in (plateau, reverse) - no LIMIT. Lines 55-72 iterate over them in a for...of loop, awaiting detectAndEmitBodyPlateau per user. Each iteration: 5 parallel Postgres queries + 1 paid openai/gpt-5 call + 1 INSERT into coaching_moments + 1 push notification. weekly-reports.ts:88-115 does the same: sequential for-loop, one runOrGenerateReport call per user. At 10,000 users the loop will take 10k x ~2-5s = 6-14 hours, exceeding Cloudflare 30s default limit and pg_cron net.http_post 60-second timeout.

A probléma

These cron handlers will run today (small base) but will silently fail to complete as the userbase grows past ~30-50 active users. The Worker dies at 30 seconds; pg_cron times out at 60 seconds. Loop interrupted mid-iteration with no resume marker - most users miss their weekly report or plateau check. No idempotency token, no resume cursor. Combined with unauthenticated cron endpoints (SEC-002, SEC-003), an attacker can hit them repeatedly to amplify cost.

Üzleti hatás

Silent feature degradation as you grow: at 30 users today everything works; at 100 users cron starts cutting off mid-batch; at 500 users it never finishes. End-user-visible symptom is mysterious: power users report missing Sunday reports. AI cost: each timed-out cron still consumed credits for every user processed before the cut.

Magyarázat

Your weekly-report and plateau-detection crons process users one at a time in a single 30-second request. Works for 30-50 users today; at 100+ the job will time out before reaching everyone, and some users silently stop getting their reports.

Javaslat

Move per-user work into a Cloudflare Queue or Supabase pg_net background job - cron handler enqueues a task per user, returns immediately.
Add a resume cursor (last_processed_user_id).
Split by time-zone bucket so each hourly cron tick processes only users whose local time matches.
Immediate-term: Promise.all-with-limit (p-limit, batch of 5-.
AND a hard time budget that returns 200 with summary processed/remaining.
Observability: log start/end timestamp and remaining; alert when remaining > 0.

Becsült munka

L — 1–2 weeks

Kapcsolódó dimenziók

SEC-002, SEC-003

Major overlap with Security (SEC-002, SEC-003 - unauthenticated endpoints allow attackers to amplify this load). Overlaps with Ops.

References

Magas Első sprint L SCA-007 · scalability

AI cost runaway: per-user concurrent AI calls unbounded; no per-user daily token budget; no usage telemetry

Kód-hely

_clients/SONI-remix-new/src/routes/api.coach-chat.ts:613-621, 1089-1094

Evidence

Per stack-profile and grep, 38+ server files invoke the Lovable AI Gateway. The only concurrency control is api.coach-chat.ts:613-621 - an in-memory inFlightTurns Map that dedupes identical (userId:conversationId:lastUserMessage) triples within 45 seconds. This Map is per-isolate (Workers spawn many isolates), so the same user hitting two isolates can fire concurrent gpt-5 calls; varying message content trivially bypasses dedup; opening multiple tabs bypasses it. No per-user daily token budget anywhere in the codebase (grep for tokens_used / daily_limit / budget / quota = 0 matches in server cost models). No file logs per-request token counts (gateway response is consumed via SSE without inspecting the usage field). max_completion_tokens bounds OUTPUT but INPUT is unbounded.

A probléma

Cost scaling is essentially linear-in-bad-actor: a single authenticated user with a script (or buggy frontend re-firing on keystrokes) can spawn dozens of parallel openai/gpt-5 calls. No application-level circuit breaker - only the gateway 429 (caught and surfaced, but not used to throttle subsequent attempts). Combined with SCA-001 and SCA-002, a user firing 50 messages in 5 minutes can consume $5-10 of credit; an attacker bypassing dedup with varied content can consume orders of magnitude more. No usage attribution: when the bill arrives there is no way to identify which users drove it.

Üzleti hatás

Direct dollar cost: a single power user can cost $10-50/day; a malicious script with one stolen session token can cost $100-1000/day. With no per-user budget the only ceiling is the prepaid Lovable credit balance, drainable in hours. Lack of usage attribution makes post-incident triage impossible.

Magyarázat

There is no spending cap per user on AI calls. One user, one bug, or one bad actor can fire many AI calls in parallel and drain your AI credit balance. You also have no way to see which user is driving cost.

Javaslat

Add a database monthly budget per user in profiles (monthly_ai_tokens_used INTEGER, monthly_ai_tokens_limit INTEGER).
Wrap every AI gateway call site in a chooseAndCallAI(userId, model, request) helper that checks the budget, reads the usage block, increments monthly_ai_tokens_used, and logs to an ai_call_log table for attribution.
Add an organisation-wide circuit breaker via a Durable Object counter.
Surface remaining quota in the UI.
Stream usage events so cost is recorded in real time.
Cache deterministic prompts (e.g. relocalize).

Becsült munka

L — 1–2 weeks

Kapcsolódó dimenziók

SCA-001, SCA-002, SEC-005

Major overlap with AI integration (cost economics) and Security (SEC-005 abuse rate-limiting).

References

Közepes Első sprint M SCA-008 · scalability

Frontend src/assets contains 5 raw mockup images >1 MB each (~8 MB total) imported via @/assets

Kód-hely

_clients/SONI-remix-new/src/assets:n/a

Evidence

ls -la src/assets shows raw assets: biotwin-mock-overlay.jpg (1.36 MB), coach-mockup-A-split-top.png (1.76 MB), coach-mockup-B-overlay.png (1.91 MB), coach-mockup-C-floating-avatar.png (1.58 MB), voice-coach-mic-3d.png (1.31 MB) - together ~7.9 MB unoptimised. Plus ~74 other JPGs in 50-300 KB range - total src/assets is 13 MB on disk. Repo-wide grep for import from @/assets shows 62 import sites across 14 files. No Vite image-optimization plugin in vite.config.ts. No WebP/AVIF generation. No responsive srcset. Of 30 components using img tags, only 17 use loading=lazy.

A probléma

Several of these images are likely mockups only used on marketing screens (or unused after a UI iteration). On a 4G connection 8 MB of images = ~15-20 seconds of perceived sluggishness. Image bytes are NOT bundled into the Worker (served as static assets) so this is a client-bundle and CDN-cost concern, not a Worker-size issue.

Üzleti hatás

Slower perceived performance for mobile users (chief target persona). ~2 MB per mockup means each marketing screen takes 3-5 seconds to render on typical 4G. CDN egress costs scale with bytes: 1000 visits/day to a 1.5 MB PNG screen = 1.5 GB/day. If any mockup PNGs are not referenced from current routes, they are pure dead weight.

Magyarázat

Your app ships mockup images at 1.5-2 MB each in raw PNG. On a phone over cellular each one takes a few seconds to load. Converting to WebP/AVIF and lazy-loading below-the-fold images would cut this substantially.

Javaslat

One-time pass with squoosh-cli or sharp to convert PNG mockups to WebP at 85% quality (typical 70-90% reduction).
Add vite-imagetools or vite-plugin-image-optimizer to vite.config.ts.
Audit each src/assets file - delete any not imported anywhere.
Add loading=lazy decoding=async to every below-the-fold img tag.
Use picture element with WebP + AVIF + PNG fallback and responsive srcset for hero images.
Add CI bundle-size check that fails when individual assets exceed 500 KB.

Becsült munka

M — 1–3 days

Kapcsolódó dimenziók

Overlaps with Code quality (dead assets) and Ops (bundle-size CI gate).

References

Közepes Első sprint S SCA-009 · scalability

No Cache-Control / CDN cache rules configured - every static request hits the Worker / origin

Kód-hely

_clients/SONI-remix-new/wrangler.jsonc:1-7

Evidence

wrangler.jsonc contains 5 fields (name, compatibility_date, compatibility_flags, main) - no routes cache directives, no observability section, no vars. There is no public/_headers file (only public/sw.js). There is no vercel.json or netlify.toml. Grep for Cache-Control in src returns matches only for getPublicUrl/createSignedUrl (Supabase Storage default), NOT for the app own HTTP responses. The SSE responses from api.coach-chat.ts and api.voice-coach-chat.ts have Content-Type but no Cache-Control header. Static assets in /dist/_app/ are fingerprinted by Vite but the response headers are not explicitly set to immutable.

A probléma

Without explicit Cache-Control: public, max-age=31536000, immutable on fingerprinted assets, Cloudflare CDN behaves conservatively - every cold-cache request hits the origin Worker, increasing CPU billing and worsening cold-start exposure. The HTML SSR response has no Cache-Control either. Sensitive endpoints should be marked no-store explicitly to avoid intermediate cache mishaps - they are not.

Üzleti hatás

Higher Cloudflare Worker invocation count and CPU time, because every static asset request goes through the Worker instead of being served from edge cache. At 1000 daily visitors loading 20 assets each = 20,000 extra Worker invocations per day. The HTML page being uncached prevents auto-minification and brotli cache hits.

Magyarázat

Your app does not tell browsers and Cloudflare how long to cache things. Every visitor re-downloads every image, font, and CSS file on every visit. Adding standard cache headers is a 30-minute change that reduces both your traffic and your visitors load time.

Javaslat

Add a public/_headers file (Cloudflare Pages-style) or header-injection middleware in src/start.ts. Recommended rules: /_app/* with Cache-Control: public, max-age=31536000, immutable; /assets/* with public, max-age=86400, stale-while-revalidate=604800; /api/* with no-store; / (landing) with public, max-age=300, stale-while-revalidate=3600.
Verify with curl -I that Vite fingerprinted assets get the immutable header.
SSE endpoints: explicit Cache-Control: no-store, no-cache, must-revalidate.

Becsült munka

S — under ½ day

Kapcsolódó dimenziók

SEC-004

Overlaps with Ops (deployment hardening) and Security (SEC-004 - natural place to add security headers too).

References

Közepes Első sprint S SCA-010 · scalability

useDashboardData uses select(*) on biometrics, meals, subjective_pulse - over-fetches every column

Kód-hely

_clients/SONI-remix-new/src/hooks/useDashboardData.ts:132-174

Evidence

useDashboardData.ts lines 132-174 runs 6 parallel queries on dashboard mount. Four of the six use select(*): biometrics (line 135), meals today (line 142), meals window of 20 (line 153), subjective_pulse (line 159). Only workout_logs (line 165) and lifestyle_logs (line 170) select named columns. biometrics table has 30+ columns; meals has many columns including raw photo_url, ai_analysis JSON.

A probléma

select(*) fetches every column including large JSON columns (ai_analysis blobs can be 1-5 KB each, zone_minutes JSON on biometrics). For meals limit 20 that is ~60 KB of unused JSON per dashboard mount. Compounds with React Query refetch on focus and the in-component useEffect refetch on data-change events. On slow mobile, dashboard mount spends a noticeable fraction of time fetching bytes the component never reads.

Üzleti hatás

Slower dashboard TTI on cellular networks, higher Supabase egress cost, noisier Postgres query plans (column list affects index-only-scan eligibility). Not critical today, but a multi-hundred-megabytes-per-day waste at 1000-DAU scale.

Magyarázat

Your dashboard downloads every field of every record from the database, even fields it never displays. For some users that is 30-100 KB of wasted data per page view. Changing select(*) to a named column list is a 15-minute fix per query.

Javaslat

Replace select(*) on biometrics with the explicit column list already in the Biometric interface.
Replace select(*) on meals with the Meal interface column list.
For ai_analysis specifically, do not load it on the dashboard - load on demand in MealDetailSheet.
Repeat for subjective_pulse (only energy/stress/soreness used).
Add ESLint rule or CI grep flagging new .select(*) usages.

Becsült munka

S — under ½ day

Kapcsolódó dimenziók

Overlaps with Code quality (useNotificationPrefs, useLifestyleData, useWorkoutLogs, server/coach-facts, server/habit-stacks, server/weekly-report, server/rewards, intelligence/HormonalSync, intelligence/PerformanceLab also use select(*)).

References

Közepes Első sprint M SCA-011 · scalability

Hot tables likely to grow large (coach_messages, meals, habit_logs) have user_id indexes but no archival or row-cap policy

Kód-hely

_clients/SONI-remix-new/supabase/migrations:n/a

Evidence

Per migration inspection: coach_messages indexed only on (conversation_id, created_at) - 20260418115322:161; no (user_id, created_at) compound. meals indexed on (user_id, consumed_at DESC) - 20260418023542:66. habit_logs indexed on (user_id, done_on, habit_key) - 20260511115651:4 and 20260429100506:5. lifestyle_logs indexed on (user_id, recorded_at DESC) and (user_id, type, recorded_at DESC) - 20260418025438. Two purge functions exist (purge_old_body_progress_photos, purge_old_coach_memory_threads - types.ts confirms) but no purge for coach_messages, meals, habit_logs, or lifestyle_logs. A daily-active user generates ~25-100 rows/day across these tables. At 1000 DAU x 365 days = ~10-40M rows/year per table.

A probléma

At year-2 scale (10-40M rows per hot table) the user-id-scoped queries are still index-efficient (logarithmic), but the supporting RLS policies execute auth.uid()::text comparisons on every row a candidate query touches. Postgres planner stats degrade without ANALYZE. No retention means historical data accumulates forever even when users never read it. Auto-vacuum becomes a concern past ~50M rows. Backup size grows linearly.

Üzleti hatás

At year-2 scale: Supabase Pro storage is included up to 8 GB; with multi-tenant logs the database will cross that threshold and start charging $0.125/GB/month. Backup snapshots compound storage cost. Query latency degrades modestly: p95 dashboard load goes from ~200 ms today to ~500-800 ms at year 2. Any future schema migration on a 40M-row table requires careful planning.

Magyarázat

Several of your busiest tables (chat messages, meal logs, habit logs) will keep growing forever - your app has no cleanup or archive policy. At 1000 daily users for a year those tables hold tens of millions of rows. Today this is fine; in 18-24 months it starts costing real money in storage and slowing queries.

Javaslat

Author retention policies per table: coach_messages > 365 days -> archive table; meals > 730 days -> drop ai_analysis column; lifestyle_logs > 180 days -> aggregate to daily rollup.
Implement as Supabase pg_cron jobs running weekly.
Add EXPLAIN ANALYZE to representative dashboard queries; ANALYZE the tables if planner is stale.
Add (user_id, created_at DESC) compound index to coach_messages (current index does NOT help the user-scoped query in SCA-004).
Monitoring alert when any table > 5 GB.
Long-term: consider partitioning hot tables by month.

Becsült munka

M — 1–3 days

Kapcsolódó dimenziók

SCA-004

Overlaps with Data integrity (dimension 3 - retention/erasure is also GDPR) and Domain compliance (dimension 10 - biometric data retention has regulatory limits).

References

Közepes Első sprint S SCA-012 · scalability

Supabase clients re-instantiated per server function (no shared admin client across in-loop invocations)

Kód-hely

_clients/SONI-remix-new/src/routes/api/public/hooks/body-plateau-detect.ts:29-34

Evidence

body-plateau-detect.ts:29-34 creates a fresh createClient with service-role key inside the POST handler on every cron invocation, bypassing the supabaseAdmin proxy in client.server.ts (lines 36-41) which would reuse a singleton per isolate. The api.coach-chat.ts POST handler also creates a fresh user-scoped client on every request. Across server functions, no PgBouncer/connection-pooler URL is verifiably configured (wrangler.jsonc shows no SUPABASE_URL set to pooler.supabase.com - the .env was not opened per Charter Rule 7).

A probléma

Each createClient instantiates a fetch-based PostgREST client - relatively cheap (1-2 ms) but not free. On Workers each isolate is reused for many requests; per-request instantiation misses connection state and HTTP/2 stream reuse. Critical risk: if SUPABASE_URL points at the direct Postgres endpoint instead of the pooler, every request opens a new pooled connection - direct endpoint caps at max_connections (60 Pro, 200 Team) while the pooler supports thousands.

Üzleti hatás

Small CPU overhead per request - multiplied by every API call and cron iteration. More importantly the architectural risk: if SUPABASE_URL is the direct host instead of the pooler, the project hits max_connections at modest concurrency with random too many connections errors. Verification step: confirm SUPABASE_URL contains pooler.supabase.com (transaction-pooler mode) for serverless.

Magyarázat

Your code creates a fresh database client on every request and every cron iteration instead of reusing one. The cost per request is small, but the pattern can become a problem under load. Also please verify the database URL in your environment uses Supabase connection pooler (a URL with pooler in the hostname) - without it you can hit a hard ceiling on simultaneous database connections.

Javaslat

Refactor cron handlers (body-plateau-detect.ts, weekly-reports.ts) to import supabaseAdmin from @/integrations/supabase/client.server instead of constructing a new client.
Verify SUPABASE_URL in the Cloudflare environment is the pooler URL (Supabase dashboard -> Settings -> Database -> Connection Pooler, port 6543 transaction mode for serverless).
Document the requirement.
Add a /api/_health endpoint that verifies the URL contains pooler so misconfiguration is caught in CI/staging.

Becsült munka

S — under ½ day

Kapcsolódó dimenziók

SEC-002

Overlaps with Ops (deployment configuration) and Security (SEC-002 already touches service-role hardening).

References

Alacsony Backlog S SCA-013 · scalability

No bundle-size budget in CI; vite-bundle-analyzer not wired; home route explicitly NOT code-split

Kód-hely

_clients/SONI-remix-new/vite.config.ts:1-12

Evidence

vite.config.ts is 12 lines - only cloudflare and tanstackStart settings, no rollup chunking config, no bundle-analyzer plugin. No GitHub Actions / CI files in repo. Bundle contains: 25 KB routeTree.gen.ts (generated), 858 KB locale JSONs (SCA-003), 25 Radix UI packages (~150-300 KB), framer-motion (~50 KB), recharts (~120 KB), react-markdown (~30 KB), embla-carousel, vaul, cmdk, lucide-react, date-fns. The tanstackStart router config explicitly keeps the / route un-split (line 8: routeId === / ? [] : undefined).

A probléma

Without a bundle budget there is no early signal when a new dependency pushes the Worker bundle past Cloudflare 10 MB compressed limit. Current dependency mix is well within today budget but trending upward. The home route being explicitly NOT code-split means the entire landing renders in the critical chunk. lucide-react can balloon if imported as import * as Icons from lucide-react - needs verification.

Üzleti hatás

Risk-only at current scale. The bundle is probably 3-5 MB compressed today and the 10 MB limit is far. But a single careless import * as Icons from lucide-react or adding a heavy chart library would breach the limit silently, and the team would learn at deploy time. From a Worker startup angle, parse time scales with bundle size: larger bundles = longer cold-start (50-150 ms cold vs 1-5 ms warm).

Magyarázat

There is no automatic check on how big your shipped JavaScript bundle is. As new features and libraries are added the bundle silently grows until a deploy fails because Cloudflare 10 MB limit was crossed. Adding a simple size-check catches this early.

Javaslat

Add rollup-plugin-visualizer to vite.config.ts so each build produces dist/stats.html.
Add npm script analyze that opens the report.
Once CI exists, use size-limit to fail the build if main chunk grows by more than 10% week-over-week or Worker bundle > 6 MB compressed.
Audit lucide-react import sites - ensure they use the recommended named-import form.
Investigate whether codeSplittingOptions excluding the home route is justified - the home page benefits most from code-splitting.

Becsült munka

S — under ½ day

Kapcsolódó dimenziók

SCA-003

Overlaps with Ops (CI setup) and Code quality.

References

Alacsony Backlog S SCA-014 · scalability

AI image generation max_completion_tokens: 8192 - high ceiling burned per avatar generation

Kód-hely

_clients/SONI-remix-new/src/server/bio-twin-avatar.ts:194, 356

Evidence

bio-twin-avatar.ts:194 - max_completion_tokens: 8192 on the gemini-3-pro-image-preview call. Same value at line 356 (retry branch) and at bio-twin-bank-generator.ts:223. Image-generation models use this differently than text models (the ceiling primarily caps reasoning tokens and any text the model emits alongside the image) but at gateway pricing tier 8192 is the maximum and is paid regardless of whether the model uses it. For a typical Bio Twin generation a 1024-2048 cap suffices.

A probléma

Per-avatar generation potentially costs 4-8x more than necessary. With the 8-image bank per user (TOTAL_BANK_VARIATIONS = 8), every new user triggers 8 image generations at this ceiling - meaningful at 1000+ signups. The historical bank size was 160 (comment line 26), so this code was previously 20x more expensive - the team optimised that but missed the per-call ceiling.

Üzleti hatás

Per-user signup cost on the image generation pipeline is higher than necessary by an estimated 2-4x. At 1000 signups, a marginal cost difference but predictable savings. At high signup-spike traffic (e.g. marketing campaign), the unconstrained per-call cap makes spike cost harder to forecast.

Magyarázat

The Bio Twin avatar generation requests the maximum response size from the AI even though it does not need it. Setting a lower cap (1024 instead of 8192) cuts the per-image cost without changing the output quality.

Javaslat

Reduce max_completion_tokens on image-generation calls (bio-twin-avatar.ts:194, 356; bio-twin-bank-generator.ts:.
to 2048 or 1024.
Add an explicit comment justifying whatever value is chosen so future devs do not bump it up reflexively.
Test with the lower cap to confirm quality is unchanged.

Becsült munka

S — under ½ day

Kapcsolódó dimenziók

SCA-007

Overlaps with AI integration (cost economics).

References

https://ai.google.dev/gemini-api/docs/image-generation

Scalability & Performance

Findings — Scalability & Performance