V17 "Galacticos" Migration & Training Guide

Date: February 6, 2026
Status: Architecture Implemented / Training Ready


1. Executive Summary

This document details the transition of the Suggest-Bet AI Engine from a monolithic, script-based system to a scalable, service-oriented architecture (SOA) powered by FastAPI and Docker. It also documents the rigorous data analysis that led to the "Strict Filtering" training strategy for the V17 Model.

Core Achievement: The system now represents individual players as learned embeddings and evaluates match context (odds, form) dynamically, served by a persistent HTTP API instead of per-request scripts.


2. Architecture Overhaul

Before (Legacy)

  • Execution: Backend spawned new Python processes (child_process.spawn) for EVERY prediction request.
  • Performance: ~3-5 seconds latency per request (loading PyTorch models from disk repeatedly).
  • Maintenance: Spaghetti code mixing API logic, feature engineering, and training scripts.
  • Integration: Brittle stdout parsing (Backend read text output from Python).

After (V17 Beast Mode)

  • Execution: Persistent Dockerized Service (ai-engine).
  • Performance: <100ms latency (Models kept in RAM).
  • Structure: Clean separation of concerns:
    • ai-engine/app: FastAPI Routes (HTTP Layer).
    • ai-engine/core: Pure Business Logic (Model, Features).
    • ai-engine/data: Database Abstraction.
  • Integration: Type-safe HTTP requests via axios from NestJS.

3. Data Analysis & Strategy Shift

We analyzed the database (~167k matches) and discovered critical data quality issues that were poisoning previous models.

The "Garbage Data" Problem

  • Total Matches: ~167,000
  • Matches with Lineups: Only ~73,000 (44%)
  • Matches with Odds: ~94,000 (56%)
  • Intersection (Quality Data): < 50,000 matches.

Conclusion: Training on the full dataset forces the model to learn from "blind" matches (missing lineups) or "contextless" matches (missing odds), which produces unreliable predictions.
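The coverage figures above can be sanity-checked with a few lines of arithmetic. This sketch uses the rounded counts from the list; the inclusion-exclusion bounds only show the range in which the intersection must fall, not its actual value.

```python
TOTAL_MATCHES = 167_000
WITH_LINEUPS = 73_000
WITH_ODDS = 94_000


def coverage_pct(part: int, whole: int) -> int:
    """Share of `whole` covered by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


def intersection_bounds(a: int, b: int, total: int) -> tuple:
    """Inclusion-exclusion bounds on how many items can be in both sets."""
    lower = max(0, a + b - total)  # forced overlap if the sets exceed the total
    upper = min(a, b)              # overlap cannot exceed the smaller set
    return lower, upper


if __name__ == "__main__":
    print(coverage_pct(WITH_LINEUPS, TOTAL_MATCHES))  # 44
    print(coverage_pct(WITH_ODDS, TOTAL_MATCHES))     # 56
    print(intersection_bounds(WITH_LINEUPS, WITH_ODDS, TOTAL_MATCHES))  # (0, 73000)
```

The reported intersection of fewer than 50,000 matches lies inside these bounds, so the three figures are mutually consistent.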

The "Top Leagues" Solution

We analyzed 20 Top Leagues (Premier League, LaLiga, etc.) and found much higher lineup coverage than the ~44% database-wide figure, for example:

  • Premier League: 77% Lineup Coverage.
  • Championship: 88% Lineup Coverage.

Decision: The V17 training pipeline now strictly filters for:

  1. Top 20 Leagues (top_leagues.json).
  2. Full Lineup Availability (11+ players per team).
  3. Odds Availability.

This reduces the dataset size but substantially increases the signal-to-noise ratio.
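The three filter rules can be expressed as a single predicate. The sketch below is illustrative only: the field names (`league_id`, `home_lineup`, `away_lineup`, `odds`) are assumptions, and the real pipeline may apply these conditions in SQL rather than in Python. The set of league IDs is assumed to have been read from top_leagues.json, whose format is not specified here.

```python
def is_quality_match(match: dict, top_league_ids: set) -> bool:
    """Return True only if a match passes all three strict filters.

    All dictionary keys are hypothetical placeholders.
    """
    home = match.get("home_lineup") or []
    away = match.get("away_lineup") or []
    return (
        match.get("league_id") in top_league_ids  # 1. top-league match
        and len(home) >= 11                       # 2. full home lineup
        and len(away) >= 11                       #    full away lineup
        and bool(match.get("odds"))               # 3. odds are present
    )


def filter_matches(matches: list, top_league_ids: set) -> list:
    """Keep only the matches that satisfy every filter, preserving order."""
    return [m for m in matches if is_quality_match(m, top_league_ids)]
```

Treating a missing or empty value as a failed check keeps the rule strict: a match is dropped unless every condition is positively satisfied.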


4. The V17 Model "Galacticos"

Philosophy

Instead of rating "Team A vs Team B", V17 rates "These 11 Players vs Those 11 Players" in the context of current odds and form.

Input Vector (The Brain)

  1. Player Embeddings: Each of the ~17,000 elite players has a learnable 32-dimensional vector.
  2. Context Vector (32-dim):
    • Odds (9): 1X2, Over/Under, BTTS (Normalized).
    • Form (12): Goals Scored/Conceded, Win Rate (Home/Away).
    • H2H (3): Historical dominance.
    • Advanced (8): Rest Days (Fatigue), League Standing Diff.
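The four context groups add up to the stated 32 dimensions (9 + 12 + 3 + 8). A minimal sketch of assembling and validating such a vector follows; the group keys and the `build_context_vector` helper are hypothetical, and how the context is combined with the player embeddings inside the model is not covered here.

```python
# Group sizes taken from the list above; the key names are placeholders.
CONTEXT_LAYOUT = {"odds": 9, "form": 12, "h2h": 3, "advanced": 8}
CONTEXT_DIM = sum(CONTEXT_LAYOUT.values())  # 32


def build_context_vector(features: dict) -> list:
    """Concatenate the feature groups in a fixed order, checking each size."""
    vector = []
    for name, size in CONTEXT_LAYOUT.items():
        block = features[name]
        if len(block) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(block)}")
        vector.extend(float(x) for x in block)
    return vector


if __name__ == "__main__":
    zeros = {name: [0.0] * size for name, size in CONTEXT_LAYOUT.items()}
    print(len(build_context_vector(zeros)))  # 32
```

Validating each group separately makes a missing feature fail loudly at build time instead of silently shifting every later position in the vector.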

Output Heads (Multi-Task Learning)

The model is trained on all targets jointly, so the shared layers must learn representations that serve every task:

  • Match Result (1X2): CrossEntropy Loss.
  • Goals (Home/Away): MSE Loss.
  • BTTS & Over 2.5: Binary CrossEntropy Loss.
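To make the combined objective concrete, here is a dependency-free sketch of the three loss types for a single example, summed into one scalar. The equal task weights and the choice to sum the two binary terms are assumptions; the actual trainer presumably uses the PyTorch loss modules and may weight the heads differently.

```python
import math


def cross_entropy(logits: list, target: int) -> float:
    """Softmax cross-entropy for one example (1X2 head)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mse(pred: list, target: list) -> float:
    """Mean squared error over the predicted values (home/away goals head)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability (BTTS or Over 2.5 head)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(result_logits, result_target, goals_pred, goals_target,
                   btts_prob, btts_label, over25_prob, over25_label,
                   weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three task losses; the weights are an assumption."""
    w_result, w_goals, w_binary = weights
    binary = (binary_cross_entropy(btts_prob, btts_label)
              + binary_cross_entropy(over25_prob, over25_label))
    return (w_result * cross_entropy(result_logits, result_target)
            + w_goals * mse(goals_pred, goals_target)
            + w_binary * binary)
```

Because the losses live on different scales (goal errors are squared counts, the others are log-likelihoods), the relative weights determine which head dominates the gradient and are usually worth tuning.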

5. How to Train (The Beast Trainer)

The training logic is consolidated into a single robust script: ai-engine/core/training/trainer_v17.py.

Prerequisites

  • Docker container running (ai-engine).
  • Database populated with historical data.

Execution Command

Run this from your host terminal (project root):

# 1. Install dependencies (if running locally outside docker)
pip3 install torch numpy pandas tqdm psycopg2-binary scikit-learn python-dotenv

# 2. Run Training
export DATABASE_URL="postgresql://suggestbet:<password>@127.0.0.1:15432/boilerplate_db?schema=public"
python3 ai-engine/core/training/trainer_v17.py

Key Metrics to Watch

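The trainer reports overall accuracy and "High Confidence Accuracy", i.e. accuracy measured only on the predictions the model is most confident about.

  • Bad: Overall Acc 50%, High Conf Acc 55%.
  • Good: Overall Acc 55%, High Conf Acc >85% (this is our goal).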
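One plausible way to compute such a metric is to restrict accuracy to predictions whose top class probability clears a threshold. The sketch below assumes a threshold of 0.6 and the function name `accuracy_report`; neither is taken from the trainer, which may define confidence differently.

```python
def accuracy_report(probs: list, labels: list, threshold: float = 0.6) -> dict:
    """Compute overall and high-confidence accuracy.

    probs: one list of class probabilities per match (e.g. 1X2).
    labels: index of the true class per match.
    threshold: minimum top probability to count as "high confidence"
               (assumed value, not taken from the trainer).
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    correct = hc_total = hc_correct = 0
    for row, label in zip(probs, labels):
        confidence = max(row)
        predicted = row.index(confidence)
        hit = predicted == label
        correct += hit
        if confidence >= threshold:
            hc_total += 1
            hc_correct += hit
    n = len(labels)
    return {
        "overall_acc": correct / n if n else None,
        "high_conf_acc": hc_correct / hc_total if hc_total else None,
        "high_conf_share": hc_total / n if n else None,
    }
```

Reporting the share of high-confidence predictions alongside the accuracy matters: a very high accuracy on a tiny fraction of matches is less useful than it looks.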

6. Deployment & Usage

1. Docker

The AI Engine is now part of docker-compose.yml.

docker-compose up -d --build

2. Backend Usage

NestJS services (SmartCouponService, PredictionsProcessor) now use axios to call the AI Engine:

// Example call
const response = await axios.post(`http://ai-engine:8000/predict/v17/${matchId}`);

3. API Endpoints

  • POST /predict/v17/{match_id}: Single match prediction.
  • GET /health: Health check.
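For quick checks outside NestJS, the same endpoints can be called from Python with the standard library. This is a sketch under assumptions: the host and port are the ones from the axios example, the response is assumed to be JSON, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

AI_ENGINE_URL = "http://ai-engine:8000"  # host and port from the axios example


def build_predict_request(match_id, base_url: str = AI_ENGINE_URL):
    """Build the POST request for a single-match prediction."""
    safe_id = urllib.parse.quote(str(match_id), safe="")  # escape path characters
    url = f"{base_url}/predict/v17/{safe_id}"
    return urllib.request.Request(url, data=b"", method="POST")


def predict(match_id, timeout: float = 5.0) -> dict:
    """Send the request and decode the body, assuming a JSON response."""
    request = build_predict_request(match_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_healthy(timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{AI_ENGINE_URL}/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False
```

The hostname `ai-engine` resolves only inside the Docker Compose network; from the host machine, substitute whatever address and port the service is published on.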

7. Future Roadmap (TODOs)

  1. Curriculum Learning V2: Sort training data by "Difficulty" (Goal Difference) to teach easy matches first. (Partially implemented).
  2. Live Dashboard: A simple frontend page to visualize v17_comprehensive.pth training metrics in real-time.
  3. Auto-Retraining: A Cron job (BullMQ) to retrain the model every Monday with the weekend's results.
  4. Feedback Loop: Integrate ai_predictions_log table to feed "Wrong Predictions" back into training with higher weight.
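As an illustration of the curriculum idea in item 1, the sketch below orders matches from "easy" to "hard" by absolute goal difference. It assumes that a larger margin means an easier example and uses hypothetical field names (`home_goals`, `away_goals`); the partially implemented version may define difficulty differently.

```python
def goal_difference(match: dict) -> int:
    """Absolute final goal margin of a match (hypothetical field names)."""
    return abs(match["home_goals"] - match["away_goals"])


def curriculum_order(matches: list) -> list:
    """Sort matches so that decisive results come first and draws last.

    Assumption: a larger goal difference is an easier training example.
    sorted() is stable, so ties keep their original relative order.
    """
    return sorted(matches, key=goal_difference, reverse=True)


if __name__ == "__main__":
    sample = [
        {"id": "a", "home_goals": 1, "away_goals": 1},
        {"id": "b", "home_goals": 4, "away_goals": 0},
        {"id": "c", "home_goals": 0, "away_goals": 2},
    ]
    print([m["id"] for m in curriculum_order(sample)])  # ['b', 'c', 'a']
```

A common variant feeds the sorted data in stages, widening the pool toward harder matches over successive epochs rather than using a strict global order.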

Generated by Gemini CLI Agent - Senior ML Engineer