19 Commits

Author SHA1 Message Date
fahricansecer 15c6313246 Merge branch 'main' of https://gitea.bilgich.com/fahricansecer/iddaai-be
Deploy Iddaai Backend / build-and-deploy (push) Successful in 54s
2026-05-24 02:44:52 +03:00
fahricansecer 1b420a425e Update .gitignore 2026-05-24 02:43:10 +03:00
fahricansecer 55e62d8fe5 Update .gitea/workflows/deploy.yml
Deploy Iddaai Backend / build-and-deploy (push) Successful in 4m56s
2026-05-24 02:30:14 +03:00
fahricansecer 21e05148c8 feat: league tier system + retrained V25 models (48 quality leagues)
Deploy Iddaai Backend / build-and-deploy (push) Failing after 3m56s
- Add LeagueTier DB model and Prisma schema
- Add league-tiers service (CRUD, sync, retrain trigger)
- Add league-tiers controller with admin API endpoints
- Add /v1/admin/retrain endpoint in AI engine (extract→train→reload pipeline)
- Retrain V25 Pro with 48 quality leagues (MS accuracy: 26.9%→51.4%)
- Update qualified_leagues.json (443→48 leagues)
- Include V25 model files in repo for Docker deployment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-20 21:57:15 +03:00
fahricansecer e001ce9ab5 fix: guarantee iddaai-ai-engine network alias on every deploy
Deploy Iddaai Backend / build-and-deploy (push) Successful in 29s
2026-05-20 10:40:00 +03:00
fahricansecer 9481ad7094 changes
Deploy Iddaai Backend / build-and-deploy (push) Successful in 42s
2026-05-20 10:10:28 +03:00
fahricansecer 1d4aa36602 gg
Deploy Iddaai Backend / build-and-deploy (push) Successful in 31s
2026-05-18 00:08:50 +03:00
fahricansecer 5574a3c59d feat: separate commentary endpoint - non-blocking Ollama
Deploy Iddaai Backend / build-and-deploy (push) Successful in 30s
2026-05-17 16:47:05 +03:00
fahricansecer 94c7a4481a main
Deploy Iddaai Backend / build-and-deploy (push) Successful in 37s
2026-05-17 02:17:22 +03:00
fahricansecer 17ace9bd12 feat: Ollama AI expert commentary integration
Deploy Iddaai Backend / build-and-deploy (push) Successful in 37s
- OllamaClient utility for llama3.2:3b API calls (timeout 30s, non-fatal)
- OllamaCommentary service builds structured Turkish prompt from prediction data
- PredictionsService enriches response with ai_expert_commentary field
- Frontend prediction-card displays AI commentary section above match_commentary
2026-05-17 02:09:04 +03:00
fahricansecer 2b87669f41 gg
Deploy Iddaai Backend / build-and-deploy (push) Successful in 31s
2026-05-13 16:56:14 +03:00
fahricansecer 2507678bc0 gg
Deploy Iddaai Backend / build-and-deploy (push) Successful in 32s
2026-05-12 17:41:49 +03:00
fahricansecer 2b8dce665f gg
Deploy Iddaai Backend / build-and-deploy (push) Successful in 1m8s
2026-05-12 03:06:54 +03:00
fahricansecer b6d64b59bf main
Deploy Iddaai Backend / build-and-deploy (push) Failing after 2m6s
2026-05-12 02:43:02 +03:00
fahricansecer f8599bdb9a gg
Deploy Iddaai Backend / build-and-deploy (push) Failing after 2m1s
2026-05-11 23:11:41 +03:00
fahricansecer 4dcc4ced50 gg
Deploy Iddaai Backend / build-and-deploy (push) Failing after 2m15s
2026-05-11 20:50:31 +03:00
fahricansecer 70fdc066c7 Merge branch 'v28'
Deploy Iddaai Backend / build-and-deploy (push) Successful in 6s
2026-05-10 22:52:21 +03:00
fahricansecer 8ce8fa5b94 Merge pull request 'gg' (#6) from v28 into main
Deploy Iddaai Backend / build-and-deploy (push) Successful in 39s
Reviewed-on: #6
2026-05-10 10:39:32 +03:00
fahricansecer 497b5d8d3b Merge pull request 'feat(ai-engine): value sniper thresholds and logic relaxed' (#5) from v28 into main
Deploy Iddaai Backend / build-and-deploy (push) Successful in 30s
Reviewed-on: #5
2026-05-06 17:56:24 +03:00
178 changed files with 149211 additions and 8549 deletions
+29 -5
@@ -11,13 +11,27 @@ jobs:
- name: Kodu Cek
uses: actions/checkout@v4
- name: Docker Build
- name: Docker Build (Backend)
run: docker build -t iddaai-be:latest .
- name: Eski Konteyneri Sil
run: docker rm -f iddaai-be || true
- name: Docker Build (AI Engine)
run: docker build -t iddaai-ai-engine:latest ./ai-engine
- name: Yeni Versiyonu Baslat
- name: Eski Konteynerleri Sil
run: |
docker rm -f iddaai-be || true
docker rm -f iddaai-ai-engine || true
- name: AI Engine'i Baslat
run: |
docker run -d \
--name iddaai-ai-engine \
--restart unless-stopped \
--network iddaai_iddaai-network \
-e DATABASE_URL='${{ secrets.DATABASE_URL }}' \
iddaai-ai-engine:latest
- name: Backend'i Baslat
run: |
docker run -d \
--name iddaai-be \
@@ -29,7 +43,17 @@ jobs:
-e REDIS_HOST='${{ secrets.REDIS_HOST }}' \
-e REDIS_PORT='${{ secrets.REDIS_PORT }}' \
-e REDIS_PASSWORD='${{ secrets.REDIS_PASSWORD }}' \
-e AI_ENGINE_URL='${{ secrets.AI_ENGINE_URL }}' \
-e AI_ENGINE_URL='http://iddaai-ai-engine:8000' \
-e JWT_SECRET='${{ secrets.JWT_SECRET }}' \
-e JWT_ACCESS_EXPIRATION='1d' \
iddaai-be:latest /bin/sh -c "npx prisma migrate deploy && node dist/src/main.js"
- name: Saglik Kontrolu
run: |
sleep 10
echo "=== AI Engine logs ==="
docker logs --tail 30 iddaai-ai-engine || true
echo "=== Backend logs ==="
docker logs --tail 30 iddaai-be || true
echo "=== AI Engine health ==="
docker exec iddaai-ai-engine python -c "import urllib.request; print(urllib.request.urlopen('http://127.0.0.1:8000/health').read().decode())" || echo "AI engine health check failed"
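The Saglik Kontrolu step above sleeps a fixed 10 seconds and then probes `/health` once. A polling variant is sketched below, assuming the endpoint answers HTTP 200 once the engine is ready. The names `http_ok` and `wait_for_health`, their parameters, and the injectable `fetch`/`sleep` callables are hypothetical and not part of the repository.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are OSError subclasses.
        return False


def wait_for_health(
    url: str,
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], bool] = http_ok,      # hypothetical hook, eases testing
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until `fetch` succeeds or `attempts` probes have been made."""
    for attempt in range(attempts):
        if fetch(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Called as `wait_for_health("http://127.0.0.1:8000/health")`, this returns as soon as the service responds instead of always waiting the full interval, and a `False` result could be used to fail the job rather than only echoing a message.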
+5 -1
@@ -21,7 +21,10 @@ venv/
env/
# Database / Docker Volumes
data/
/data/
ai-engine/data/**/*.csv
ai-engine/data/v26_shadow/
ai-engine/data/__pycache__/
postgres-data/
redis-data/
@@ -44,6 +47,7 @@ public/uploads/
# Large Datasets and ML Models
ai-engine/models/*
!ai-engine/models/*.py
!ai-engine/models/v25/
models/*
!models/*.py
colab_export/
+2 -2
@@ -16,7 +16,7 @@ RUN npm ci
COPY . .
# Generate Prisma client
RUN npx prisma generate
RUN DATABASE_URL="postgresql://dummy:dummy@localhost/dummy" npx prisma generate
# Build the application
RUN npm run build
@@ -38,7 +38,7 @@ RUN apk add --no-cache --virtual .build-deps python3 make g++ cairo-dev pango-de
# Copy Prisma schema and generate client
COPY prisma ./prisma
RUN npx prisma generate
RUN DATABASE_URL="postgresql://dummy:dummy@localhost/dummy" npx prisma generate
# Copy built application
COPY --from=builder /app/dist ./dist
+11
@@ -1,3 +1,14 @@
model_ensemble:
xgb_weight: 0.50
lgb_weight: 0.50
temperature: 1.5
default_ms_odds:
home: 2.65
draw: 3.20
away: 2.65
elo_staleness_days: 14
odds_staleness_hours: 48
engine_weights:
team: 0.30
player: 0.25
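The `model_ensemble` keys added above (`xgb_weight`, `lgb_weight`, `temperature`) suggest a weighted blend of two classifiers followed by temperature softening. The diff does not show how the engine consumes them, so the following is only a sketch under that assumption: it uses the common `p ** (1 / T)` form of temperature scaling, and `blend_and_soften` is a hypothetical name.

```python
from typing import List, Sequence


def blend_and_soften(
    xgb_probs: Sequence[float],
    lgb_probs: Sequence[float],
    xgb_weight: float = 0.50,   # defaults copied from the config hunk
    lgb_weight: float = 0.50,
    temperature: float = 1.5,
) -> List[float]:
    """Weighted average of two probability vectors, then temperature softening.

    ASSUMPTION: the real engine may apply the temperature differently
    (for example on logits); this only illustrates the general idea.
    """
    total_w = xgb_weight + lgb_weight
    blended = [
        (xgb_weight * x + lgb_weight * l) / total_w
        for x, l in zip(xgb_probs, lgb_probs)
    ]
    # T > 1 flattens the distribution, T < 1 sharpens it.
    softened = [p ** (1.0 / temperature) for p in blended]
    norm = sum(softened)
    if norm == 0:
        raise ValueError("all probabilities are zero")
    return [p / norm for p in softened]
```

With `temperature: 1.5`, the favorite's probability is pulled toward the others while the ranking of outcomes is preserved, which is the usual reason to soften an overconfident ensemble.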
@@ -40,7 +40,7 @@ class CalculationContext:
is_surprise: bool = False
# XGBoost Predictions (New)
xgboost_preds: dict[str, dict[str, Any]] = field(default_factory=dict)
xgboost_preds: dict[str, Any] = field(default_factory=dict)
class BaseCalculator:
@@ -28,7 +28,7 @@ class RecommendationResult:
class BetRecommender(BaseCalculator):
def calculate(self,
def calculate(self, # type: ignore[override]
ctx: CalculationContext,
ms_res: MatchResultPrediction,
ou_res: OverUnderPrediction,
@@ -36,7 +36,7 @@ class ExpertResult:
class ExpertRecommender(BaseCalculator):
def calculate(self,
def calculate(self, # type: ignore[override]
ctx: CalculationContext,
ms_res: MatchResultPrediction,
ou_res: OverUnderPrediction,
@@ -31,7 +31,7 @@ class HalfTimeCalculator(BaseCalculator):
return 1.0 if k == 0 else 0.0
return (lam ** k) * math.exp(-lam) / math.factorial(k)
def calculate(self, ctx: CalculationContext) -> HalfTimePrediction:
def calculate(self, ctx: CalculationContext) -> HalfTimePrediction: # type: ignore[override]
team_pred = ctx.team_pred
odds_pred = ctx.odds_pred
@@ -22,9 +22,9 @@ class MatchResultCalculator(BaseCalculator):
def _get_engine_winner(self, home_prob: float, draw_prob: float, away_prob: float) -> str:
"""Determine which outcome an engine favors."""
probs = {"1": home_prob, "X": draw_prob, "2": away_prob}
return max(probs, key=probs.get)
return max(probs, key=probs.__getitem__)
def calculate(self, ctx: CalculationContext) -> MatchResultPrediction:
def calculate(self, ctx: CalculationContext) -> MatchResultPrediction: # type: ignore[override]
# Weights
w_team = ctx.weights["team"]
w_player = ctx.weights["player"]
@@ -28,7 +28,7 @@ class OtherMarketsPrediction:
class OtherMarketsCalculator(BaseCalculator):
def calculate(
def calculate( # type: ignore[override]
self,
ctx: CalculationContext,
ms_result: MatchResultPrediction,
@@ -55,7 +55,7 @@ class OverUnderCalculator(BaseCalculator):
return over_15, over_25, over_35, btts_yes
def calculate(self, ctx: CalculationContext) -> OverUnderPrediction:
def calculate(self, ctx: CalculationContext) -> OverUnderPrediction: # type: ignore[override]
odds_pred = ctx.odds_pred
referee_mods = ctx.referee_mods
+40 -29
@@ -67,12 +67,14 @@ class RiskAssessor(BaseCalculator):
if sport_key == "basketball":
if is_top_league:
return float(
self.config.get("risk.surprise_threshold_basketball_top", self.config.get("risk.surprise_threshold_basketball", 0.30)),
)
return float(
self.config.get("risk.surprise_threshold_basketball_non_top", 0.34),
)
top_val = self.config.get("risk.surprise_threshold_basketball_top")
if top_val is not None:
return float(top_val)
base_val = self.config.get("risk.surprise_threshold_basketball")
return float(base_val) if base_val is not None else 0.30
non_top_val = self.config.get("risk.surprise_threshold_basketball_non_top")
return float(non_top_val) if non_top_val is not None else 0.34
if top_label not in ("1/2", "2/1"):
return base_threshold
@@ -81,27 +83,30 @@ class RiskAssessor(BaseCalculator):
favorite_side, gap = self._favorite_profile_from_odds(ctx.odds_data)
if is_top_league:
favorite_winner_threshold = float(
self.config.get(
"risk.surprise_threshold_favorite_reversal_top",
self.config.get("risk.surprise_threshold_favorite_reversal", 0.26),
),
)
underdog_winner_threshold = float(
self.config.get(
"risk.surprise_threshold_underdog_reversal_top",
self.config.get("risk.surprise_threshold_underdog_reversal", 0.20),
),
)
top_fav = self.config.get("risk.surprise_threshold_favorite_reversal_top")
if top_fav is not None:
favorite_winner_threshold = float(top_fav)
else:
base_fav = self.config.get("risk.surprise_threshold_favorite_reversal")
favorite_winner_threshold = float(base_fav) if base_fav is not None else 0.26
top_ud = self.config.get("risk.surprise_threshold_underdog_reversal_top")
if top_ud is not None:
underdog_winner_threshold = float(top_ud)
else:
base_ud = self.config.get("risk.surprise_threshold_underdog_reversal")
underdog_winner_threshold = float(base_ud) if base_ud is not None else 0.20
else:
favorite_winner_threshold = float(
self.config.get("risk.surprise_threshold_favorite_reversal_non_top", 0.30),
)
underdog_winner_threshold = float(
self.config.get("risk.surprise_threshold_underdog_reversal_non_top", 0.24),
)
gap_medium = float(self.config.get("risk.htft_reversal_gap_medium", 0.50))
gap_strong = float(self.config.get("risk.htft_reversal_gap_strong", 1.00))
nt_fav = self.config.get("risk.surprise_threshold_favorite_reversal_non_top")
favorite_winner_threshold = float(nt_fav) if nt_fav is not None else 0.30
nt_ud = self.config.get("risk.surprise_threshold_underdog_reversal_non_top")
underdog_winner_threshold = float(nt_ud) if nt_ud is not None else 0.24
gm = self.config.get("risk.htft_reversal_gap_medium")
gap_medium = float(gm) if gm is not None else 0.50
gs = self.config.get("risk.htft_reversal_gap_strong")
gap_strong = float(gs) if gs is not None else 1.00
if favorite_side in ("H", "A"):
threshold = (
@@ -117,7 +122,7 @@ class RiskAssessor(BaseCalculator):
return base_threshold
def calculate(self, ctx: CalculationContext, ms_result=None) -> RiskAnalysis:
def calculate(self, ctx: CalculationContext, ms_result: Any = None) -> RiskAnalysis: # type: ignore[override]
"""
Wrapper for assess_risk to match BaseCalculator interface but with extra arg.
"""
@@ -173,9 +178,15 @@ class RiskAssessor(BaseCalculator):
threshold = self._dynamic_reversal_threshold(ctx, top_label)
if getattr(ctx, "is_top_league", False):
min_gap = float(self.config.get("risk.surprise_min_top_gap_top", self.config.get("risk.surprise_min_top_gap", 0.02)))
top_gap_val = self.config.get("risk.surprise_min_top_gap_top")
if top_gap_val is not None:
min_gap = float(top_gap_val)
else:
base_gap_val = self.config.get("risk.surprise_min_top_gap")
min_gap = float(base_gap_val) if base_gap_val is not None else 0.02
else:
min_gap = float(self.config.get("risk.surprise_min_top_gap_non_top", 0.03))
non_top_gap_val = self.config.get("risk.surprise_min_top_gap_non_top")
min_gap = float(non_top_gap_val) if non_top_gap_val is not None else 0.03
# Trigger surprise only when reversal class is:
# - top HT/FT outcome
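The RiskAssessor hunks above replace nested `config.get(key, config.get(...))` calls with explicit `is not None` checks, repeated once per threshold. The same lookup order could be expressed with a small helper. This is a sketch only, assuming the config object exposes `get(key)` returning `None` for a missing key, as the diff implies; `first_config_float` is a hypothetical name.

```python
from typing import Any, Mapping, Sequence


def first_config_float(
    config: Mapping[str, Any],
    keys: Sequence[str],
    default: float,
) -> float:
    """Return the first non-None value among `keys` as a float, else `default`."""
    for key in keys:
        value = config.get(key)
        if value is not None:
            # `is not None` (rather than truthiness) keeps an explicit 0.0.
            return float(value)
    return default


# Usage mirroring the top-league basketball branch of the diff:
cfg = {"risk.surprise_threshold_basketball": 0.28}
threshold = first_config_float(
    cfg,
    ["risk.surprise_threshold_basketball_top", "risk.surprise_threshold_basketball"],
    0.30,
)
```

Here `threshold` falls through the missing `_top` key to the base key, matching the order used in the rewritten code.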
@@ -3,7 +3,7 @@ import pickle
import pandas as pd
import xgboost as xgb
from dataclasses import dataclass
from typing import List, Dict, Tuple
from typing import List, Dict, Tuple, Optional
import math
from .base_calculator import BaseCalculator, CalculationContext
from .confidence import calc_confidence_3way, calc_confidence_dc
@@ -16,7 +16,7 @@ class ScorePrediction:
ft_scores_top5: List[Dict]
# Reconciled MS/DC predictions (can be updated here)
reconciled_ms: MatchResultPrediction = None
reconciled_ms: Optional[MatchResultPrediction] = None
class ScoreCalculator(BaseCalculator):
@@ -57,7 +57,8 @@ class ScoreCalculator(BaseCalculator):
return 1.0 if k == 0 else 0.0
return (lam ** k) * math.exp(-lam) / math.factorial(k)
def calculate(self, ctx: CalculationContext, ms_result: MatchResultPrediction) -> ScorePrediction:
def calculate(self, ctx: CalculationContext, ms_result: MatchResultPrediction) -> ScorePrediction: # type: ignore[override]
predicted_ht = None
# Default Lambdas (fallback)
lambda_home = max(0.5, ctx.home_xg)
lambda_away = max(0.5, ctx.away_xg)
@@ -199,7 +200,7 @@ class ScoreCalculator(BaseCalculator):
predicted_ft = top_overall_score
# If we didn't calculate HT via ML (exception case), do it now
if 'predicted_ht' not in locals():
if predicted_ht is None:
ft_to_ht = self.config.get("half_time.ft_to_ht_ratio", 0.42)
ht_h = round(lambda_home * ft_to_ht)
ht_a = round(lambda_away * ft_to_ht)
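The ScoreCalculator hunks above show a Poisson probability helper and a half-time fallback that scales the full-time lambdas by `half_time.ft_to_ht_ratio` (default 0.42). A standalone sketch of both follows. The `lam <= 0` guard is an assumption, since the condition line is outside the visible hunk, and `poisson_pmf` and `fallback_ht_score` are hypothetical names.

```python
import math
from typing import Tuple


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    if lam <= 0:  # ASSUMPTION: guard condition not visible in the diff
        return 1.0 if k == 0 else 0.0
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


def fallback_ht_score(
    lambda_home: float,
    lambda_away: float,
    ft_to_ht: float = 0.42,  # default from the diff
) -> Tuple[int, int]:
    """Half-time score derived from full-time goal expectations."""
    # Same floor of 0.5 that the diff applies to the default lambdas.
    lambda_home = max(0.5, lambda_home)
    lambda_away = max(0.5, lambda_away)
    return round(lambda_home * ft_to_ht), round(lambda_away * ft_to_ht)
```

Initialising `predicted_ht = None` and testing `is None`, as the commit does, is more robust than the earlier `'predicted_ht' not in locals()` check because it survives refactoring and is visible to type checkers.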
+1 -7
@@ -1,16 +1,10 @@
# ai-engine/core/engines/__init__.py
"""
V20 Ensemble Prediction Engines
Prediction Engines
"""
from .team_predictor import TeamPredictorEngine, get_team_predictor
from .player_predictor import PlayerPredictorEngine, get_player_predictor
from .odds_predictor import OddsPredictorEngine, get_odds_predictor
from .referee_predictor import RefereePredictorEngine, get_referee_predictor
__all__ = [
"TeamPredictorEngine", "get_team_predictor",
"PlayerPredictorEngine", "get_player_predictor",
"OddsPredictorEngine", "get_odds_predictor",
"RefereePredictorEngine", "get_referee_predictor"
]
-237
@@ -1,237 +0,0 @@
"""
Odds Predictor Engine - V20 Ensemble Component
Uses market odds and Poisson mathematics for predictions.
Weight: 30% in ensemble
"""
import os
import sys
from typing import Dict, Optional
from dataclasses import dataclass
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from features.poisson_engine import get_poisson_engine
from features.value_calculator import get_value_calculator
@dataclass
class OddsPrediction:
"""Odds engine prediction output."""
# Market-implied probabilities
market_home_prob: float = 0.33
market_draw_prob: float = 0.33
market_away_prob: float = 0.33
# Poisson xG
poisson_home_xg: float = 1.3
poisson_away_xg: float = 1.1
# Over/Under probabilities
over_15_prob: float = 0.75
over_25_prob: float = 0.55
over_35_prob: float = 0.30
# BTTS
btts_yes_prob: float = 0.50
# Most likely scores
most_likely_score: str = "1-1"
second_likely_score: str = "1-0"
third_likely_score: str = "2-1"
# Value bet opportunities
value_bets: list = None
confidence: float = 0.0
def __post_init__(self):
if self.value_bets is None:
self.value_bets = []
def to_dict(self) -> dict:
return {
"market_home_prob": round(self.market_home_prob * 100, 1),
"market_draw_prob": round(self.market_draw_prob * 100, 1),
"market_away_prob": round(self.market_away_prob * 100, 1),
"poisson_home_xg": round(self.poisson_home_xg, 2),
"poisson_away_xg": round(self.poisson_away_xg, 2),
"over_15_prob": round(self.over_15_prob * 100, 1),
"over_25_prob": round(self.over_25_prob * 100, 1),
"over_35_prob": round(self.over_35_prob * 100, 1),
"btts_yes_prob": round(self.btts_yes_prob * 100, 1),
"most_likely_score": self.most_likely_score,
"second_likely_score": self.second_likely_score,
"third_likely_score": self.third_likely_score,
"value_bets": self.value_bets,
"confidence": round(self.confidence, 1)
}
class OddsPredictorEngine:
"""
Odds-based prediction engine.
Uses:
- Market odds to extract implied probabilities
- Poisson distribution for mathematical xG
- Value calculator for EV+ opportunities
"""
def __init__(self):
self.poisson_engine = get_poisson_engine()
try:
self.value_calc = get_value_calculator()
except Exception:
self.value_calc = None
self.default_ms_h = 2.65
self.default_ms_d = 3.20
self.default_ms_a = 2.65
print("✅ OddsPredictorEngine initialized")
def _odds_to_prob(self, odds: float) -> float:
"""Convert decimal odds to probability."""
try:
odds = float(odds)
except (TypeError, ValueError):
return 0.0
if odds <= 1.0:
return 0.0
return 1.0 / odds
def predict(self,
odds_data: Dict[str, float],
home_goals_avg: float = 1.5,
home_conceded_avg: float = 1.2,
away_goals_avg: float = 1.2,
away_conceded_avg: float = 1.4) -> OddsPrediction:
"""
Generate odds-based prediction.
Args:
odds_data: Dict with keys like 'ms_h', 'ms_d', 'ms_a', 'ou25_o', 'btts_y'
home_goals_avg: Home team's average goals scored
home_conceded_avg: Home team's average goals conceded
away_goals_avg: Away team's average goals scored
away_conceded_avg: Away team's average goals conceded
Returns:
OddsPrediction with market and Poisson analysis
"""
# 1. Extract market probabilities from odds
ms_h = odds_data.get("ms_h", self.default_ms_h)
ms_d = odds_data.get("ms_d", self.default_ms_d)
ms_a = odds_data.get("ms_a", self.default_ms_a)
# Remove vig to get fair probabilities
raw_probs = [
self._odds_to_prob(ms_h),
self._odds_to_prob(ms_d),
self._odds_to_prob(ms_a)
]
total = sum(raw_probs) or 1
market_home = raw_probs[0] / total
market_draw = raw_probs[1] / total
market_away = raw_probs[2] / total
# 2. Poisson prediction
poisson_pred = self.poisson_engine.predict(
home_goals_avg, home_conceded_avg,
away_goals_avg, away_conceded_avg
)
# 3. Get most likely scores
likely_scores = poisson_pred.most_likely_scores[:3] if poisson_pred.most_likely_scores else []
score_1 = likely_scores[0]["score"] if len(likely_scores) > 0 else "1-1"
score_2 = likely_scores[1]["score"] if len(likely_scores) > 1 else "1-0"
score_3 = likely_scores[2]["score"] if len(likely_scores) > 2 else "2-1"
# 4. Value bet detection
value_bets = []
# Check if our Poisson model disagrees with market significantly
if abs(poisson_pred.home_win_prob - market_home) > 0.10:
if poisson_pred.home_win_prob > market_home:
value_bets.append({
"market": "MS 1",
"edge": round((poisson_pred.home_win_prob - market_home) * 100, 1),
"confidence": "medium"
})
else:
value_bets.append({
"market": "MS 2",
"edge": round((poisson_pred.away_win_prob - market_away) * 100, 1),
"confidence": "medium"
})
# O/U value check
ou25_o = odds_data.get("ou25_o", 1.9)
market_over25 = self._odds_to_prob(ou25_o)
if abs(poisson_pred.over_25_prob - market_over25) > 0.08:
pick = "2.5 Üst" if poisson_pred.over_25_prob > market_over25 else "2.5 Alt"
edge = abs(poisson_pred.over_25_prob - market_over25) * 100
value_bets.append({
"market": pick,
"edge": round(edge, 1),
"confidence": "high" if edge > 10 else "medium"
})
# Calculate confidence
# Higher when market and Poisson agree
agreement = 1.0 - abs(poisson_pred.home_win_prob - market_home)
confidence = 50.0 + (agreement * 40) + (len(value_bets) * 5)
return OddsPrediction(
market_home_prob=market_home,
market_draw_prob=market_draw,
market_away_prob=market_away,
poisson_home_xg=poisson_pred.home_xg,
poisson_away_xg=poisson_pred.away_xg,
over_15_prob=poisson_pred.over_15_prob,
over_25_prob=poisson_pred.over_25_prob,
over_35_prob=poisson_pred.over_35_prob,
btts_yes_prob=poisson_pred.btts_yes_prob,
most_likely_score=score_1,
second_likely_score=score_2,
third_likely_score=score_3,
value_bets=value_bets,
confidence=min(99.9, confidence)
)
# Singleton
_engine: Optional[OddsPredictorEngine] = None
def get_odds_predictor() -> OddsPredictorEngine:
global _engine
if _engine is None:
_engine = OddsPredictorEngine()
return _engine
if __name__ == "__main__":
engine = get_odds_predictor()
print("\n🧪 Odds Predictor Engine Test")
print("=" * 50)
pred = engine.predict(
odds_data={
"ms_h": 1.85,
"ms_d": 3.40,
"ms_a": 4.20,
"ou25_o": 1.90
},
home_goals_avg=1.8,
home_conceded_avg=1.0,
away_goals_avg=1.2,
away_conceded_avg=1.5
)
print(f"\n📊 Prediction:")
for k, v in pred.to_dict().items():
print(f" {k}: {v}")
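The deleted engine above converts decimal odds to implied probabilities and then normalizes them so the bookmaker margin (vig) is removed. That step is isolated below as a standalone sketch that mirrors the logic of `_odds_to_prob` and the normalization in `predict`. The free-function names `odds_to_prob`, `fair_probs`, and `overround` are hypothetical.

```python
from typing import Any, Tuple


def odds_to_prob(odds: Any) -> float:
    """Implied probability of decimal odds; 0.0 for invalid input."""
    try:
        odds = float(odds)
    except (TypeError, ValueError):
        return 0.0
    if odds <= 1.0:
        return 0.0
    return 1.0 / odds


def overround(ms_h: Any, ms_d: Any, ms_a: Any) -> float:
    """Sum of raw implied probabilities; above 1.0 when a margin is present."""
    return sum(odds_to_prob(o) for o in (ms_h, ms_d, ms_a))


def fair_probs(ms_h: Any, ms_d: Any, ms_a: Any) -> Tuple[float, float, float]:
    """Normalize the three implied probabilities so they sum to 1."""
    raw = [odds_to_prob(o) for o in (ms_h, ms_d, ms_a)]
    total = sum(raw) or 1.0  # avoid division by zero when all odds are invalid
    return tuple(p / total for p in raw)


# Odds taken from the module's own __main__ test block.
home, draw, away = fair_probs(1.85, 3.40, 4.20)
```

Note that the deleted code applied this normalization only to the 1X2 market; the over 2.5 check compared the Poisson probability against the raw implied probability of `ou25_o` alone, so that edge still included the margin.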
+171 -63
@@ -24,32 +24,29 @@ class PlayerPrediction:
extract_training_data.py so that inference values match the
distribution the model was trained on (~3-36 range).
"""
home_squad_quality: float = 12.0 # training-scale composite (~3-36)
home_squad_quality: float = 12.0
away_squad_quality: float = 12.0
squad_diff: float = 0.0 # home - away (training scale)
squad_diff: float = 0.0
home_key_players: int = 0
away_key_players: int = 0
home_missing_impact: float = 0.0 # 0-1, how much weaker due to missing players
home_missing_impact: float = 0.0
away_missing_impact: float = 0.0
home_goals_form: int = 0 # Goals in last 5 matches
home_goals_form: int = 0
away_goals_form: int = 0
home_lineup_goals_per90: float = 0.0
away_lineup_goals_per90: float = 0.0
home_lineup_assists_per90: float = 0.0
away_lineup_assists_per90: float = 0.0
home_squad_continuity: float = 0.5
away_squad_continuity: float = 0.5
home_top_scorer_form: int = 0
away_top_scorer_form: int = 0
home_avg_player_exp: float = 0.0
away_avg_player_exp: float = 0.0
home_goals_diversity: float = 0.0
away_goals_diversity: float = 0.0
lineup_available: bool = False
confidence: float = 0.0
def to_dict(self) -> dict:
return {
"home_squad_quality": round(self.home_squad_quality, 1),
"away_squad_quality": round(self.away_squad_quality, 1),
"squad_diff": round(self.squad_diff, 1),
"home_key_players": self.home_key_players,
"away_key_players": self.away_key_players,
"home_missing_impact": round(self.home_missing_impact, 2),
"away_missing_impact": round(self.away_missing_impact, 2),
"home_goals_form": self.home_goals_form,
"away_goals_form": self.away_goals_form,
"lineup_available": self.lineup_available,
"confidence": round(self.confidence, 1)
}
class PlayerPredictorEngine:
@@ -72,9 +69,9 @@ class PlayerPredictorEngine:
match_id: str,
home_team_id: str,
away_team_id: str,
home_lineup: List[str] = None,
away_lineup: List[str] = None,
sidelined_data: Dict = None) -> PlayerPrediction:
home_lineup: Optional[List[str]] = None,
away_lineup: Optional[List[str]] = None,
sidelined_data: Optional[Dict] = None) -> PlayerPrediction:
"""
Generate player-based prediction.
@@ -90,8 +87,9 @@ class PlayerPredictorEngine:
"""
# Get squad features
home_analysis = None
away_analysis = None
if home_lineup and away_lineup:
# Use provided lineups (for live matches)
home_analysis = self.squad_engine.analyze_squad_from_list(
home_lineup, home_team_id
)
@@ -99,7 +97,6 @@ class PlayerPredictorEngine:
away_lineup, away_team_id
)
lineup_available = True
# Build features dict from analysis objects
features = {
"home_starting_11": home_analysis.starting_count or 11,
"home_goals_last_5": home_analysis.total_goals_last_5,
@@ -113,7 +110,6 @@ class PlayerPredictorEngine:
"away_forwards": away_analysis.forward_count or 2,
}
elif match_id:
# Try to get from database
try:
features = self.squad_engine.get_features(
match_id, home_team_id, away_team_id
@@ -132,58 +128,42 @@ class PlayerPredictorEngine:
home_team_id, away_team_id
)
lineup_available = False
# Extract features
home_goals = features.get("home_goals_last_5", 0)
away_goals = features.get("away_goals_last_5", 0)
home_key = features.get("home_key_players", 0)
away_key = features.get("away_key_players", 0)
home_assists = features.get("home_assists_last_5", 0)
away_assists = features.get("away_assists_last_5", 0)
home_goals = int(features.get("home_goals_last_5", 0))
away_goals = int(features.get("away_goals_last_5", 0))
home_key = int(features.get("home_key_players", 0))
away_key = int(features.get("away_key_players", 0))
home_starting = features.get("home_starting_11", 11)
away_starting = features.get("away_starting_11", 11)
home_fwd = features.get("home_forwards", 2)
away_fwd = features.get("away_forwards", 2)
# Calculate squad quality — MUST match extract_training_data.py formula
# Formula: starting_count * 0.3 + goals * 2.0 + assists * 1.0
# + key_players * 3.0 + fwd_count * 1.5
# Typical range: ~3-36 (model trained on this distribution)
home_quality = (
home_starting * 0.3 +
home_goals * 2.0 +
home_assists * 1.0 +
home_key * 3.0 +
home_fwd * 1.5
)
away_quality = (
away_starting * 0.3 +
away_goals * 2.0 +
away_assists * 1.0 +
away_key * 3.0 +
away_fwd * 1.5
)
# Squad difference
# Squad quality — matches V25 extract_training_data.py:579
home_quality = home_starting * 0.3 + home_key * 3.0 + home_fwd * 1.5
away_quality = away_starting * 0.3 + away_key * 3.0 + away_fwd * 1.5
squad_diff = home_quality - away_quality
# Missing player impact
# Priority: sidelined data (position-weighted) > lineup count (basic)
if sidelined_data:
home_impact, away_impact = self.sidelined_analyzer.analyze_match(sidelined_data)
home_missing = home_impact.impact_score
away_missing = away_impact.impact_score
home_missing = min(1.0, max(0.0, home_impact.impact_score))
away_missing = min(1.0, max(0.0, away_impact.impact_score))
sidelined_available = True
else:
# Fallback: basic lineup count method
expected_xi = 11
actual_home_xi = features.get("home_starting_11", 11)
actual_away_xi = features.get("away_starting_11", 11)
home_missing = (expected_xi - actual_home_xi) / expected_xi if actual_home_xi < expected_xi else 0
away_missing = (expected_xi - actual_away_xi) / expected_xi if actual_away_xi < expected_xi else 0
sidelined_available = False
# Confidence: more data sources = higher confidence
# Player-level features (matches extract_training_data.py:594-650)
player_feats = self._compute_player_level_features(
home_lineup or [], away_lineup or [],
home_team_id, away_team_id,
home_analysis, away_analysis,
)
confidence = 70.0 if lineup_available else 35.0
if home_goals + away_goals > 10:
confidence += 15
@@ -191,7 +171,7 @@ class PlayerPredictorEngine:
confidence += self.sidelined_analyzer.config.get("sidelined.confidence_boost", 10)
if not lineup_available:
confidence -= 5.0
return PlayerPrediction(
home_squad_quality=home_quality,
away_squad_quality=away_quality,
@@ -202,9 +182,137 @@ class PlayerPredictorEngine:
away_missing_impact=away_missing,
home_goals_form=home_goals,
away_goals_form=away_goals,
home_lineup_goals_per90=player_feats['home_lineup_goals_per90'],
away_lineup_goals_per90=player_feats['away_lineup_goals_per90'],
home_lineup_assists_per90=player_feats['home_lineup_assists_per90'],
away_lineup_assists_per90=player_feats['away_lineup_assists_per90'],
home_squad_continuity=player_feats['home_squad_continuity'],
away_squad_continuity=player_feats['away_squad_continuity'],
home_top_scorer_form=player_feats['home_top_scorer_form'],
away_top_scorer_form=player_feats['away_top_scorer_form'],
home_avg_player_exp=player_feats['home_avg_player_exp'],
away_avg_player_exp=player_feats['away_avg_player_exp'],
home_goals_diversity=player_feats['home_goals_diversity'],
away_goals_diversity=player_feats['away_goals_diversity'],
lineup_available=lineup_available,
confidence=max(5.0, confidence)
)
def _compute_player_level_features(
self,
home_lineup: List[str],
away_lineup: List[str],
home_team_id: str,
away_team_id: str,
home_analysis,
away_analysis,
) -> Dict[str, float]:
defaults = {
'home_lineup_goals_per90': 0.0, 'away_lineup_goals_per90': 0.0,
'home_lineup_assists_per90': 0.0, 'away_lineup_assists_per90': 0.0,
'home_squad_continuity': 0.5, 'away_squad_continuity': 0.5,
'home_top_scorer_form': 0, 'away_top_scorer_form': 0,
'home_avg_player_exp': 0.0, 'away_avg_player_exp': 0.0,
'home_goals_diversity': 0.0, 'away_goals_diversity': 0.0,
}
conn = self.squad_engine.get_conn()
if conn is None:
return defaults
try:
from psycopg2.extras import RealDictCursor
result = {}
for prefix, lineup, team_id in [
('home', home_lineup, home_team_id),
('away', away_lineup, away_team_id),
]:
if not lineup:
for k in ('lineup_goals_per90', 'lineup_assists_per90',
'squad_continuity', 'top_scorer_form',
'avg_player_exp', 'goals_diversity'):
result[f'{prefix}_{k}'] = defaults[f'{prefix}_{k}']
continue
g90, a90, total_exp = 0.0, 0.0, 0
best_scorer_total, best_scorer_id = 0, None
scorers_in_lineup = 0
with conn.cursor(cursor_factory=RealDictCursor) as cur:
for pid in lineup:
cur.execute("""
SELECT
COUNT(*) as starts,
COALESCE(SUM(CASE WHEN e.event_type = 'goal'
AND (e.event_subtype IS NULL OR e.event_subtype NOT ILIKE '%%penaltı kaçırma%%')
THEN 1 ELSE 0 END), 0) as goals,
COALESCE((SELECT COUNT(*) FROM match_player_events
WHERE assist_player_id = %s), 0) as assists
FROM match_player_participation mpp
LEFT JOIN match_player_events e
ON e.match_id = mpp.match_id AND e.player_id = mpp.player_id
WHERE mpp.player_id = %s AND mpp.is_starting = true
""", (pid, pid))
row = cur.fetchone()
if not row or not row['starts']:
continue
starts = row['starts']
goals = row['goals'] or 0
assists = row['assists'] or 0
g90 += goals / starts
a90 += assists / starts
total_exp += starts
if goals > 0:
scorers_in_lineup += 1
if goals > best_scorer_total:
best_scorer_total = goals
best_scorer_id = pid
n_st = len(lineup) or 1
# Top scorer recent form (goals in last 5 starts)
top_scorer_form = 0
if best_scorer_id:
cur.execute("""
SELECT COUNT(*) as goals
FROM match_player_events mpe
WHERE mpe.player_id = %s AND mpe.event_type = 'goal'
AND mpe.match_id IN (
SELECT match_id FROM match_player_participation
WHERE player_id = %s AND is_starting = true
ORDER BY match_id DESC LIMIT 5
)
""", (best_scorer_id, best_scorer_id))
tsf_row = cur.fetchone()
if tsf_row:
top_scorer_form = tsf_row['goals'] or 0
# Squad continuity (overlap with previous match lineup)
squad_continuity = 0.5
cur.execute("""
SELECT mpp.player_id
FROM match_player_participation mpp
JOIN matches m ON mpp.match_id = m.id
WHERE mpp.team_id = %s AND mpp.is_starting = true
AND m.status = 'FT'
ORDER BY m.mst_utc DESC
LIMIT 11
""", (team_id,))
prev_starters = {r['player_id'] for r in cur.fetchall()}
if prev_starters:
overlap = len(set(lineup) & prev_starters)
squad_continuity = overlap / n_st
result[f'{prefix}_lineup_goals_per90'] = round(g90, 3)
result[f'{prefix}_lineup_assists_per90'] = round(a90, 3)
result[f'{prefix}_squad_continuity'] = round(squad_continuity, 3)
result[f'{prefix}_top_scorer_form'] = top_scorer_form
result[f'{prefix}_avg_player_exp'] = round(total_exp / n_st, 1)
result[f'{prefix}_goals_diversity'] = round(scorers_in_lineup / n_st, 3)
return result
except Exception as e:
print(f"[PlayerPredictor] Player-level features failed: {e}")
return defaults
def get_1x2_modifier(self, prediction: PlayerPrediction) -> Dict[str, float]:
"""
@@ -241,7 +349,7 @@ if __name__ == "__main__":
print("=" * 50)
pred = engine.predict(
match_id=None,
match_id="test_match",
home_team_id="test_home",
away_team_id="test_away"
)
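Two of the simpler quantities in the PlayerPredictorEngine diff above are the revised squad quality formula (goals and assists terms dropped) and the squad continuity ratio. They are restated below as pure functions for illustration, with the database access left out. The function names are hypothetical, and the 0.5 default is the one used in the diff when no previous lineup is found.

```python
from typing import Iterable, Sequence


def squad_quality(starting_count: float, key_players: float, forward_count: float) -> float:
    """Composite used after this commit: starters, key players, forwards."""
    return starting_count * 0.3 + key_players * 3.0 + forward_count * 1.5


def squad_continuity(
    lineup: Sequence[str],
    prev_starters: Iterable[str],
    default: float = 0.5,
) -> float:
    """Share of the current lineup that also started the previous match."""
    prev = set(prev_starters)
    if not prev:
        return default
    n_st = len(lineup) or 1  # same zero guard as the diff
    return len(set(lineup) & prev) / n_st


# Example: 11 starters, 2 key players, 2 forwards on each side.
diff_value = squad_quality(11, 2, 2) - squad_quality(11, 2, 2)
```

Because the new formula no longer includes goals or assists, values computed this way fall in a narrower range than the "~3-36" noted in the class docstring, which may be worth re-checking against the training data.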
-188
@@ -1,188 +0,0 @@
"""
Referee Predictor Engine - V20 Ensemble Component
Analyzes referee patterns for cards, goals, and home bias.
Weight: 15% in ensemble
"""
import os
import sys
from typing import Dict, Optional
from dataclasses import dataclass
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from features.referee_engine import get_referee_engine
@dataclass
class RefereePrediction:
"""Referee engine prediction output."""
referee_name: str = ""
matches_officiated: int = 0
# Card tendencies
avg_yellow_cards: float = 4.0
avg_red_cards: float = 0.2
is_card_heavy: bool = False # Above average cards
# Goal tendencies
avg_goals_per_match: float = 2.5
over_25_rate: float = 0.50
is_high_scoring: bool = False # Above average goals
# Home bias
home_win_rate: float = 0.45
home_bias: float = 0.0 # -1 to +1, positive = favors home
# Penalty tendency
penalty_rate: float = 0.15
confidence: float = 0.0
def to_dict(self) -> dict:
return {
"referee_name": self.referee_name,
"matches_officiated": self.matches_officiated,
"avg_yellow_cards": round(self.avg_yellow_cards, 1),
"avg_red_cards": round(self.avg_red_cards, 2),
"is_card_heavy": self.is_card_heavy,
"avg_goals_per_match": round(self.avg_goals_per_match, 2),
"over_25_rate": round(self.over_25_rate * 100, 1),
"is_high_scoring": self.is_high_scoring,
"home_win_rate": round(self.home_win_rate * 100, 1),
"home_bias": round(self.home_bias, 2),
"penalty_rate": round(self.penalty_rate * 100, 1),
"confidence": round(self.confidence, 1)
}
class RefereePredictorEngine:
"""
Referee-based prediction engine.
Analyzes:
- Card tendency (average yellow/red cards)
- Goal tendency (goals per match, over-2.5 rate)
- Home bias (rate of decisions favoring the home side)
- Penalty tendency (rate of penalties awarded)
"""
# League average benchmarks
LEAGUE_AVG_GOALS = 2.65
LEAGUE_AVG_YELLOW = 4.0
LEAGUE_HOME_WIN_RATE = 0.45
def __init__(self):
self.referee_engine = get_referee_engine()
print("✅ RefereePredictorEngine initialized")
def predict(self,
match_id: Optional[str] = None,
referee_name: Optional[str] = None,
league_id: Optional[str] = None) -> RefereePrediction:
"""
Generate referee-based prediction.
Args:
match_id: Match ID to find referee
referee_name: Or provide referee name directly
league_id: League ID to scope stats (prevents name collisions)
Returns:
RefereePrediction with referee analysis
"""
# Get referee features
if match_id:
features = self.referee_engine.get_features(match_id, league_id=league_id)
# Live flows may already have referee_name while match_officials table is sparse.
# Prefer the richer profile if direct-name lookup has more history.
if referee_name:
name_features = self.referee_engine.get_features_by_name(referee_name, league_id=league_id)
if (name_features.get("referee_matches", 0) or 0) > (features.get("referee_matches", 0) or 0):
features = name_features
elif referee_name:
features = self.referee_engine.get_features_by_name(referee_name, league_id=league_id)
else:
# Return default
return RefereePrediction(confidence=10.0)
ref_name = features.get("referee_name", "Unknown")
matches = features.get("referee_matches", 0)
if matches < 5:
# Not enough data
return RefereePrediction(
referee_name=ref_name,
matches_officiated=matches,
confidence=20.0
)
# Extract features
avg_yellow = features.get("referee_avg_yellow", 4.0)
avg_red = features.get("referee_avg_red", 0.2)
avg_goals = features.get("referee_avg_goals", 2.5)
over25_rate = features.get("referee_over25_rate", 0.5)
home_win_rate = features.get("referee_home_win_rate", 0.45)
home_bias = features.get("referee_home_bias", 0.0)
penalty_rate = features.get("referee_penalty_rate", 0.15)
# Determine tendencies
is_card_heavy = (avg_yellow + avg_red * 4) > (self.LEAGUE_AVG_YELLOW + 1)
is_high_scoring = avg_goals > self.LEAGUE_AVG_GOALS
# Confidence based on matches officiated
confidence = min(90.0, 30.0 + matches * 2)
return RefereePrediction(
referee_name=ref_name,
matches_officiated=matches,
avg_yellow_cards=avg_yellow,
avg_red_cards=avg_red,
is_card_heavy=is_card_heavy,
avg_goals_per_match=avg_goals,
over_25_rate=over25_rate,
is_high_scoring=is_high_scoring,
home_win_rate=home_win_rate,
home_bias=home_bias,
penalty_rate=penalty_rate,
confidence=confidence
)
def get_modifiers(self, prediction: RefereePrediction) -> Dict[str, float]:
"""
Get modifiers to apply to other predictions based on referee profile.
"""
return {
# Home team gets slight boost if referee has home bias
"home_modifier": 1.0 + (prediction.home_bias * 0.05),
# O/U modifier
"over_25_modifier": 1.0 + (prediction.avg_goals_per_match - self.LEAGUE_AVG_GOALS) * 0.1,
# Card modifier for card markets
"cards_modifier": 1.0 + (prediction.avg_yellow_cards - self.LEAGUE_AVG_YELLOW) * 0.05
}
# Singleton
_engine: Optional[RefereePredictorEngine] = None
def get_referee_predictor() -> RefereePredictorEngine:
global _engine
if _engine is None:
_engine = RefereePredictorEngine()
return _engine
if __name__ == "__main__":
engine = get_referee_predictor()
print("\n🧪 Referee Predictor Engine Test")
print("=" * 50)
pred = engine.predict(referee_name="Cüneyt Çakır")
print(f"\n📊 Prediction:")
for k, v in pred.to_dict().items():
print(f" {k}: {v}")
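The confidence rule and the modifier arithmetic in the removed referee engine can be restated as two small pure functions. This is a minimal sketch for illustration only: the function names `referee_confidence` and `referee_modifiers` are hypothetical and do not exist in the repository; the constants and coefficients are copied from the class above.

```python
from typing import Dict

# Benchmarks copied from RefereePredictorEngine above.
LEAGUE_AVG_GOALS = 2.65
LEAGUE_AVG_YELLOW = 4.0


def referee_confidence(matches: int) -> float:
    """Hypothetical helper: 20 with fewer than 5 matches, else 30 + 2 per match, capped at 90."""
    if matches < 5:
        return 20.0
    return min(90.0, 30.0 + matches * 2)


def referee_modifiers(home_bias: float, avg_goals: float, avg_yellow: float) -> Dict[str, float]:
    """Hypothetical helper mirroring get_modifiers(): small multipliers around 1.0."""
    return {
        "home_modifier": 1.0 + home_bias * 0.05,
        "over_25_modifier": 1.0 + (avg_goals - LEAGUE_AVG_GOALS) * 0.1,
        "cards_modifier": 1.0 + (avg_yellow - LEAGUE_AVG_YELLOW) * 0.05,
    }


if __name__ == "__main__":
    print(referee_confidence(10))
    print(referee_modifiers(home_bias=0.2, avg_goals=3.15, avg_yellow=5.0))
```

Because every coefficient is small (0.05 or 0.1), a referee profile can only nudge the other engines' outputs by a few percent in either direction.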
-286
@@ -1,286 +0,0 @@
"""
Team Predictor Engine - V20 Ensemble Component
Combines ELO ratings, form stats, H2H records and team statistics.
Weight: 30% in ensemble
"""
import os
import sys
from typing import Dict, Optional, Tuple, Any
from dataclasses import dataclass, field
# Add parent to path
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from features.elo_system import get_elo_system
from features.h2h_engine import get_h2h_engine
from features.momentum_engine import get_momentum_engine, MomentumData
from features.team_stats_engine import get_team_stats_engine
@dataclass
class TeamPrediction:
"""Team engine prediction output."""
home_win_prob: float = 0.33
draw_prob: float = 0.33
away_win_prob: float = 0.33
home_xg: float = 1.3
away_xg: float = 1.1
form_advantage: float = 0.0 # -1 to +1, positive = home advantage
h2h_advantage: float = 0.0 # -1 to +1
elo_diff: float = 0.0
confidence: float = 0.0
def to_dict(self) -> dict:
return {
"home_win_prob": round(self.home_win_prob * 100, 1),
"draw_prob": round(self.draw_prob * 100, 1),
"away_win_prob": round(self.away_win_prob * 100, 1),
"home_xg": round(self.home_xg, 2),
"away_xg": round(self.away_xg, 2),
"form_advantage": round(self.form_advantage, 2),
"h2h_advantage": round(self.h2h_advantage, 2),
"elo_diff": round(self.elo_diff, 0),
"confidence": round(self.confidence, 1)
}
raw_features: Dict[str, Any] = field(default_factory=dict)
class TeamPredictorEngine:
"""
Team-based prediction engine.
Uses:
- ELO Rating System (venue-adjusted, league-weighted)
- H2H Engine (head-to-head history)
- Momentum Engine (recent form)
- Team Stats Engine (possession, shots, corners)
"""
def __init__(self):
self.elo_system = get_elo_system()
self.h2h_engine = get_h2h_engine()
self.momentum_engine = get_momentum_engine()
self.team_stats_engine = get_team_stats_engine()
print("✅ TeamPredictorEngine initialized")
def predict(self,
home_team_id: str,
away_team_id: str,
match_date_ms: int,
home_team_name: str = "",
away_team_name: str = "") -> TeamPrediction:
"""
Generate team-based prediction.
Args:
home_team_id: Home team ID
away_team_id: Away team ID
match_date_ms: Match date in milliseconds
home_team_name: Home team name (for ELO)
away_team_name: Away team name (for ELO)
Returns:
TeamPrediction with 1X2 probabilities and xG
"""
# 1. Get ELO predictions
elo_pred = self.elo_system.predict_match(home_team_id, away_team_id)
elo_features = self.elo_system.get_match_features(home_team_id, away_team_id)
# 2. Get H2H features
try:
h2h_features = self.h2h_engine.get_features(
home_team_id, away_team_id, match_date_ms
)
except Exception:
h2h_features = {
"h2h_home_win_rate": 0.5,
"h2h_away_win_rate": 0.5,
"h2h_avg_goals": 2.5,
"h2h_btts_rate": 0.5
}
# 3. Get Momentum/Form features
try:
# key: form_score should be 0-1 derived from momentum_score (-1 to 1)
home_mom_data = self.momentum_engine.calculate_momentum(home_team_id, match_date_ms)
away_mom_data = self.momentum_engine.calculate_momentum(away_team_id, match_date_ms)
home_form_score = (home_mom_data.momentum_score + 1) / 2
away_form_score = (away_mom_data.momentum_score + 1) / 2
except Exception as e:
print(f"⚠️ MomentumEngine error: {e}")
home_mom_data = MomentumData()
away_mom_data = MomentumData()
home_form_score = 0.5
away_form_score = 0.5
# 4. Get Team Stats
home_stats = self.team_stats_engine.get_features(home_team_id, match_date_ms)
away_stats = self.team_stats_engine.get_features(away_team_id, match_date_ms)
# 5. Combine predictions
# ELO-based 1X2 (60% weight)
elo_home = elo_pred.get("home_win_prob", 0.33)
elo_draw = elo_pred.get("draw_prob", 0.33)
elo_away = elo_pred.get("away_win_prob", 0.33)
# Adjust based on H2H (20% weight)
h2h_home_rate = h2h_features.get("h2h_home_win_rate", 0.5)
h2h_away_rate = h2h_features.get("h2h_away_win_rate", 0.5)
# Adjust based on form (20% weight)
home_form = home_form_score
away_form = away_form_score
form_diff = (home_form - away_form) # -1 to +1
# Weighted combination
final_home = elo_home * 0.6 + h2h_home_rate * 0.2 + (0.5 + form_diff * 0.3) * 0.2
final_away = elo_away * 0.6 + h2h_away_rate * 0.2 + (0.5 - form_diff * 0.3) * 0.2
final_draw = 1.0 - final_home - final_away
# Normalize
total = final_home + final_draw + final_away
if total > 0:
final_home /= total
final_draw /= total
final_away /= total
# Calculate xG based on stats and form (conservative base)
home_conversion = home_stats.get("shot_conversion_rate", 0.1)
away_conversion = away_stats.get("shot_conversion_rate", 0.1)
base_home_xg = 1.35 + (home_conversion * 3.0)
base_away_xg = 1.10 + (away_conversion * 2.5)
# Defense weakness factor: opponent's defensive quality affects xG
# Higher shots on target against = weaker defense
away_def_weakness = away_stats.get("shot_accuracy", 0.35) # opponent's shot accuracy as proxy
home_def_weakness = home_stats.get("shot_accuracy", 0.35)
# Adjust xG: stronger opponent defense → lower xG
home_xg = base_home_xg * (1 + form_diff * 0.15) * (0.8 + away_def_weakness * 0.6)
away_xg = base_away_xg * (1 - form_diff * 0.15) * (0.8 + home_def_weakness * 0.6)
# Apply xG Underperformance Penalty directly to calculated xG
# If a team chronically underperforms its xG, we subtract that historical difference here
if hasattr(home_mom_data, 'xg_underperformance') and home_mom_data.xg_underperformance > 0.2:
home_xg -= min(0.5, home_mom_data.xg_underperformance * 0.5)
if hasattr(away_mom_data, 'xg_underperformance') and away_mom_data.xg_underperformance > 0.2:
away_xg -= min(0.5, away_mom_data.xg_underperformance * 0.5)
# H2H adjustment (more conservative)
h2h_avg_goals = h2h_features.get("h2h_avg_goals", 2.5)
if h2h_avg_goals > 3.0:
home_xg *= 1.05
away_xg *= 1.05
elif h2h_avg_goals < 2.0:
home_xg *= 0.95
away_xg *= 0.95
# Clamp xG to reasonable range
home_xg = max(0.5, min(3.5, home_xg))
away_xg = max(0.3, min(3.0, away_xg))
# Calculate confidence
# Higher when ELO, H2H, and Form all agree
elo_winner = "H" if elo_home > max(elo_draw, elo_away) else ("A" if elo_away > elo_draw else "D")
h2h_winner = "H" if h2h_home_rate > h2h_away_rate else "A"
form_winner = "H" if form_diff > 0.1 else ("A" if form_diff < -0.1 else "D")
agreement = sum([
elo_winner == h2h_winner,
elo_winner == form_winner,
h2h_winner == form_winner
])
max_prob = max(final_home, final_draw, final_away)
confidence = max_prob * 100 * (0.7 + agreement * 0.1)
# Collect Raw Features for XGBoost
# Note: home_mom_data is an object now
def get_rate(val): return val if val is not None else 0.5
raw_features = {
**elo_features, # 8 features
# Form Features (need key mapping to match extract_training_data.py)
"home_goals_avg": 1.5 + home_mom_data.goals_trend, # Proxy
"home_conceded_avg": 1.5 - home_mom_data.conceded_trend, # Proxy
"away_goals_avg": 1.5 + away_mom_data.goals_trend,
"away_conceded_avg": 1.5 - away_mom_data.conceded_trend,
"home_clean_sheet_rate": 0.2, # Not in new MomentumData
"away_clean_sheet_rate": 0.2,
"home_scoring_rate": 0.8,
"away_scoring_rate": 0.8,
"home_winning_streak": home_mom_data.winning_streak,
"away_winning_streak": away_mom_data.winning_streak,
"home_unbeaten_streak": home_mom_data.unbeaten_streak,
"away_unbeaten_streak": away_mom_data.unbeaten_streak,
# H2H Features
**h2h_features,
# Team Stats
"home_avg_possession": home_stats.get("avg_possession", 0.5),
"away_avg_possession": away_stats.get("avg_possession", 0.5),
"home_avg_shots_on_target": home_stats.get("avg_shots_on_target", 3.5),
"away_avg_shots_on_target": away_stats.get("avg_shots_on_target", 3.5),
"home_shot_conversion": home_stats.get("shot_conversion_rate", 0.1),
"away_shot_conversion": away_stats.get("shot_conversion_rate", 0.1),
"home_avg_corners": home_stats.get("avg_corners", 4.5),
"away_avg_corners": away_stats.get("avg_corners", 4.5),
# Derived
"home_xga": 1.5 - home_mom_data.conceded_trend, # reusing as proxy
"away_xga": 1.5 - away_mom_data.conceded_trend
}
return TeamPrediction(
home_win_prob=final_home,
draw_prob=final_draw,
away_win_prob=final_away,
home_xg=home_xg,
away_xg=away_xg,
form_advantage=form_diff,
h2h_advantage=h2h_home_rate - h2h_away_rate,
elo_diff=elo_features.get("elo_diff", 0),
confidence=confidence,
raw_features=raw_features
)
# Singleton
_engine: Optional[TeamPredictorEngine] = None
def get_team_predictor() -> TeamPredictorEngine:
global _engine
if _engine is None:
_engine = TeamPredictorEngine()
return _engine
if __name__ == "__main__":
engine = get_team_predictor()
print("\n🧪 Team Predictor Engine Test")
print("=" * 50)
# Test with sample IDs
pred = engine.predict(
home_team_id="test_home",
away_team_id="test_away",
match_date_ms=1707393600000
)
print(f"\n📊 Prediction:")
for k, v in pred.to_dict().items():
print(f" {k}: {v}")
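The 60/20/20 weighting used in `TeamPredictorEngine.predict` can be checked by hand with a standalone function. The sketch below assumes only the arithmetic shown above; `blend_1x2` is a hypothetical name introduced for illustration and is not part of the repository.

```python
from typing import Tuple


def blend_1x2(
    elo_home: float,
    elo_away: float,
    h2h_home: float,
    h2h_away: float,
    form_diff: float,
) -> Tuple[float, float, float]:
    """Hypothetical helper: blend ELO (60%), H2H (20%) and form (20%) into 1X2 probabilities."""
    home = elo_home * 0.6 + h2h_home * 0.2 + (0.5 + form_diff * 0.3) * 0.2
    away = elo_away * 0.6 + h2h_away * 0.2 + (0.5 - form_diff * 0.3) * 0.2
    draw = 1.0 - home - away  # the draw share is whatever remains

    # Same normalization step as the engine; it guards against drift in the sum.
    total = home + draw + away
    if total > 0:
        home, draw, away = home / total, draw / total, away / total
    return home, draw, away


if __name__ == "__main__":
    # Neutral H2H and form: only the ELO split moves the result.
    print(blend_1x2(elo_home=0.5, elo_away=0.25, h2h_home=0.5, h2h_away=0.5, form_diff=0.0))
```

With neutral H2H rates (0.5 each) and no form difference, the draw probability reduces to 0.6 times the ELO draw probability, which is one way to sanity-check the weights.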
+1
@@ -0,0 +1 @@
# data package
+97
@@ -0,0 +1,97 @@
"""
Async Database Module - V2 Betting Engine
==========================================
Provides async SQLAlchemy sessions via asyncpg for the V2 router.
Usage:
async with get_session() as session:
result = await session.execute(text("SELECT ..."))
"""
from __future__ import annotations
import os
from contextlib import asynccontextmanager
from typing import AsyncGenerator
from dotenv import load_dotenv
from sqlalchemy.ext.asyncio import (
AsyncEngine,
AsyncSession,
async_sessionmaker,
create_async_engine,
)
load_dotenv()
_engine: AsyncEngine | None = None
_session_maker: async_sessionmaker[AsyncSession] | None = None
def _get_async_dsn() -> str:
"""
Convert DATABASE_URL to asyncpg-compatible format.
Handles:
1. Prisma's ``?schema=public`` suffix → stripped
2. ``postgresql://`` driver prefix → ``postgresql+asyncpg://``
"""
dsn = os.getenv(
"DATABASE_URL",
"postgresql://suggestbet:SuGGesT2026SecuRe@localhost:15432/boilerplate_db",
)
# Strip Prisma's ?schema= parameter
if "?" in dsn:
base, query = dsn.split("?", 1)
kept_parts = [
part for part in query.split("&") if part and not part.startswith("schema=")
]
dsn = base if not kept_parts else f"{base}?{'&'.join(kept_parts)}"
# Convert driver prefix for asyncpg
if dsn.startswith("postgresql://"):
dsn = dsn.replace("postgresql://", "postgresql+asyncpg://", 1)
elif dsn.startswith("postgres://"):
dsn = dsn.replace("postgres://", "postgresql+asyncpg://", 1)
return dsn
def _ensure_engine() -> AsyncEngine:
global _engine, _session_maker
if _engine is None:
_engine = create_async_engine(
_get_async_dsn(),
pool_size=5,
max_overflow=5,
pool_timeout=10,
pool_pre_ping=True,
echo=False,
)
_session_maker = async_sessionmaker(
bind=_engine,
class_=AsyncSession,
expire_on_commit=False,
)
print("✅ Async database engine created (asyncpg)")
return _engine
@asynccontextmanager
async def get_session() -> AsyncGenerator[AsyncSession, None]:
"""Provide an async session context manager."""
_ensure_engine()
assert _session_maker is not None
async with _session_maker() as session:
yield session
async def dispose_engine() -> None:
"""Shut down the async engine cleanly."""
global _engine, _session_maker
if _engine is not None:
await _engine.dispose()
_engine = None
_session_maker = None
print("Async database engine disposed")
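The DSN rewrite performed by `_get_async_dsn` can be exercised without a database by treating it as a pure string function. This is a sketch under that assumption: `to_asyncpg_dsn` is a hypothetical name, and the host, credentials, and `application_name` parameter in the example are made up.

```python
def to_asyncpg_dsn(dsn: str) -> str:
    """Hypothetical helper: drop Prisma's schema= parameter, then switch to the asyncpg driver prefix."""
    if "?" in dsn:
        base, query = dsn.split("?", 1)
        # Keep every query argument except schema=...
        kept = [part for part in query.split("&") if part and not part.startswith("schema=")]
        dsn = base if not kept else f"{base}?{'&'.join(kept)}"

    for prefix in ("postgresql://", "postgres://"):
        if dsn.startswith(prefix):
            return "postgresql+asyncpg://" + dsn[len(prefix):]
    return dsn


if __name__ == "__main__":
    # Example values are placeholders, not real connection details.
    print(to_asyncpg_dsn("postgresql://user:pw@db.example:5432/app?schema=public&application_name=demo"))
```

Keeping the conversion free of `os.getenv` makes it easy to unit-test; the module above reads the environment first and then applies the same two steps.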
+92
@@ -0,0 +1,92 @@
"""
Synchronous psycopg2 database helper for the AI Engine.
Uses a thread-safe connection pool for legacy V20+ endpoints.
"""
from __future__ import annotations
import os
from contextlib import contextmanager
from typing import Generator
import psycopg2
from psycopg2 import pool
from psycopg2.extensions import connection as PgConnection
from dotenv import load_dotenv
load_dotenv()
# Local-development fallback with placeholder postgres/postgres credentials; set DATABASE_URL in any real deployment.
_DEFAULT_DSN = "postgresql://postgres:postgres@localhost:15432/boilerplate_db"
def get_clean_dsn() -> str:
"""
Return a psycopg2-compatible DSN from DATABASE_URL.
Handles two cleanup steps:
1. Prisma appends '?schema=public', which psycopg2 cannot parse, so it is stripped.
2. A connect_timeout (from PGCONNECT_TIMEOUT, default 5) is appended when absent.
"""
dsn: str = os.getenv("DATABASE_URL", _DEFAULT_DSN)
connect_timeout: str = os.getenv("PGCONNECT_TIMEOUT", "5").strip() or "5"
# Strip Prisma's ?schema= query parameter while preserving any other query args.
if "?" in dsn:
base, query = dsn.split("?", 1)
kept_parts: list[str] = [
part for part in query.split("&") if part and not part.startswith("schema=")
]
dsn = base if not kept_parts else f"{base}?{'&'.join(kept_parts)}"
# Force bounded DB connect attempts so API calls do not hang indefinitely.
if "connect_timeout=" not in dsn:
separator = "&" if "?" in dsn else "?"
dsn = f"{dsn}{separator}connect_timeout={connect_timeout}"
return dsn
class Database:
_pool: pool.ThreadedConnectionPool | None = None
@classmethod
def initialize(cls) -> None:
if cls._pool is None:
dsn: str = get_clean_dsn()
try:
cls._pool = pool.ThreadedConnectionPool(
minconn=1,
maxconn=10,
dsn=dsn,
)
print("✅ Database connection pool created")
except Exception as e:
print(f"❌ Failed to create DB pool: {e}")
raise
@classmethod
def get_conn(cls) -> PgConnection:
if cls._pool is None:
cls.initialize()
assert cls._pool is not None # guaranteed by initialize()
return cls._pool.getconn()
@classmethod
def return_conn(cls, conn: PgConnection) -> None:
if cls._pool:
cls._pool.putconn(conn)
@classmethod
@contextmanager
def connection(cls) -> Generator[PgConnection, None, None]:
"""Context manager for safe connection handling."""
conn: PgConnection = cls.get_conn()
try:
yield conn
finally:
cls.return_conn(conn)
@classmethod
def close_all(cls) -> None:
if cls._pool:
cls._pool.closeall()
print("Database connection pool closed")
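The point of `Database.connection()` is that a borrowed connection goes back to the pool even when the caller's code raises. The sketch below shows that behaviour with a stand-in pool so it runs without PostgreSQL. `FakePool` and `demo` are hypothetical names for illustration; only `getconn` and `putconn` mirror the psycopg2 pool methods used above.

```python
from contextlib import contextmanager
from typing import Generator, List


class FakePool:
    """Hypothetical stand-in for a connection pool; not part of psycopg2."""

    def __init__(self) -> None:
        self.free: List[str] = ["conn-1"]

    def getconn(self) -> str:
        return self.free.pop()

    def putconn(self, conn: str) -> None:
        self.free.append(conn)


@contextmanager
def connection(pool: FakePool) -> Generator[str, None, None]:
    conn = pool.getconn()
    try:
        yield conn
    finally:
        # Runs on normal exit and when the body raises.
        pool.putconn(conn)


def demo() -> int:
    """Simulate a failing query and report how many connections are free afterwards."""
    pool = FakePool()
    try:
        with connection(pool) as conn:
            raise RuntimeError(f"query failed on {conn}")
    except RuntimeError:
        pass
    return len(pool.free)


if __name__ == "__main__":
    print(demo())
```

Callers that use `get_conn()` directly have to remember `return_conn()` themselves, which is why the context manager is the safer entry point.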
+726
@@ -0,0 +1,726 @@
{
"version": "v1",
"description": "Per-league odds reliability scores computed from Brier Score analysis",
"min_matches_threshold": 50,
"total_leagues": 265,
"default_reliability": 0.35,
"lookup": {
"bx57cmq1edfq53ckfk791supi": 0.9476,
"55hcphd1ccc6eai1ms77460on": 0.9445,
"d9eaigzyfnfiraqc3ius757tl": 0.9402,
"1gxlzw2ezkyeykhcaa5x8ozkk": 0.9259,
"5jd0k2txwnq69frs79eulba8j": 0.9233,
"6694fff47wqxl10lrd9tb91f8": 0.9193,
"4jg7he1n3rb5dniq6hf49xorq": 0.9061,
"59tpnfrwnvhnhzmnvfyug68hj": 0.8988,
"ac42gi3penartj88fe9l6plpk": 0.8937,
"3j81qr7yc4gdnakfwnxf95ovh": 0.8771,
"9z5643nd06afqu01ea2wt8y4g": 0.8734,
"482ofyysbdbeoxauk19yg7tdt": 0.8722,
"ahl3vljaignq9ebaos4uqkrvo": 0.8696,
"8x3sbh85gc8qir50utw39jl04": 0.865,
"agpweohvn9tugnyl6ry4rhivp": 0.8428,
"4c1nfi2j1m731hcay25fcgndq": 0.8425,
"1j4ehtrbry9depwt6oghaq3lu": 0.8299,
"40yjcbx2sq6oq736iqqqczwt1": 0.8237,
"145hkd59i6foieuwr4mwi6wlq": 0.823,
"34pl8szyvrbwcmfkuocjm3r6t": 0.8227,
"cse5oqqt2pzfcy8uz6yz3tkbj": 0.8212,
"zs18qaehvhg3w1208874zvfa": 0.8176,
"57nu0wygurzkp6fuy5hhrtaa2": 0.8099,
"1eruend45vd20g9hbrpiggs5u": 0.8083,
"595nsvo7ykvoe690b1e4u5n56": 0.7987,
"6vq8j5p3av14nr3iuyi4okhjt": 0.793,
"486rhdgz7yc0sygziht7hje65": 0.7901,
"9hh6n2f84k31zmlcxyvmc1w2y": 0.789,
"3n5046abeu3x482ds3jwda238": 0.7863,
"8yi6ejjd1zudcqtbn07haahg6": 0.7752,
"byhmntnl1b4lxw0zz21im3zkd": 0.7719,
"2bmwykmdlcc2u1c40ytoc39vy": 0.7668,
"82jkgccg7phfjpd0mltdl3pat": 0.7643,
"2nttcoriwf5co73vmz1vr8frm": 0.7641,
"dr2xk7muj8aqcjdz2b3li1c0k": 0.759,
"4yngyfinzd6bb1k7anqtqs0wt": 0.7586,
"eog6knrkfei68si736fpquyzc": 0.756,
"eg6s9f1jj7jr6stmbosn0g6c8": 0.7538,
"ae1wva3zrzcp2zd15gpvsntg6": 0.7517,
"cesdwwnxbc5fmajgroc0hqzy2": 0.7466,
"8k1xcsyvxapl4jlsluh3eomre": 0.7463,
"bdtat25m14jy85y484z3e6lf": 0.7437,
"iu1vi94p4p28oozl1h9bvplr": 0.7411,
"1r097lpxe0xn03ihb7wi98kao": 0.7391,
"2kwbbcootiqqgmrzs6o5inle5": 0.7386,
"9fuwphq8kvugrlc3ckm7k8wes": 0.7358,
"civf31q1inxohs4a03y8reetf": 0.735,
"ili150pwfuf39f7yfdch9lhw": 0.7286,
"abs7n2ae3oydilk0tgmpnsj89": 0.7277,
"9nbpdi9q3ywcm4q0j5u0ekwcq": 0.7254,
"6by3h89i2eykc341oz7lv1ddd": 0.7252,
"4qehj8hfxmy6o2ohp4fxinnzo": 0.7244,
"9u4pm8x0lfmfq3r0pypmrls71": 0.7244,
"c7b8o53flg36wbuevfzy3lb10": 0.7144,
"89ovpy1rarewwzqvi30bfdr8b": 0.7068,
"4d5d3sf6805n5u6jdoa0hdlog": 0.7052,
"eqz64pn0qsp2y7aq4m9id3fn6": 0.7031,
"8q60vlvn3krynkob6igrncdjq": 0.703,
"6ihotpaocgiovlxw18e9r9prx": 0.7019,
"c0r21rtokgnbtc0o2rldjmkxu": 0.7013,
"1mpjd0vbxbtu9zw89yj09xk3z": 0.6996,
"4zwgbb66rif2spcoeeol2motx": 0.6995,
"bu1l7ckihyr0errxw61p0m05": 0.6995,
"cv3tuitw3ho3v0opjjxpn83b9": 0.6974,
"8r98daokeuzsamu5fmjtblqx5": 0.6922,
"dvstmwnvw0mt5p38twn9yttyb": 0.688,
"8y29fg2s85ppcb8uugm5ee8s4": 0.6866,
"19q13y6ruzo0o84ipblcuouzs": 0.6858,
"f4jc2cc5nq7flaoptpi5ua4k4": 0.6852,
"4oogyu6o156iphvdvphwpck10": 0.684,
"3e40pestup9xzagsu2o6c0i8u": 0.6824,
"4rls982p5uzil6x30mhyhv9f3": 0.6812,
"e21cf135btr8t3upw0vl6n6x0": 0.6771,
"65q4uwm6ol1rkf5dp89m8omny": 0.6754,
"46b141eaqq9q7o4gz5gtdpikk": 0.6752,
"75i269i1ak43magshljadydrh": 0.6741,
"3ab1uwtoyjopdj1y1fynyy9jg": 0.6737,
"4mbfidy8zum5u0aqjqo0vuqs2": 0.673,
"7wssxdqi4xihseeam8grqa2b8": 0.666,
"61fzfjogstjuukzcehighq7mu": 0.6641,
"6g8hw3acenrw828la7gwx4mvs": 0.663,
"e1kxdivp5g4cpldgpwvnzl1vv": 0.6626,
"9ikchyu9fb8bvx0s673jofj6s": 0.6622,
"a9vrdkelbgif0gtu3wxsr75xo": 0.6618,
"6sxm2iln2w45ux498pty9miw8": 0.6615,
"ea0h6cf3bhl698hkxhpulh2zz": 0.661,
"apdwh753fupxheygs8seahh7x": 0.6604,
"er5745q30wnr8jv9nr863omzg": 0.659,
"2z7257m7hj58zuxcjrsg4erzc": 0.6551,
"2o9svokc5s7diish3ycrzk7jm": 0.655,
"8usjlmziv3p2re0r2wwzezki9": 0.6549,
"c0yqkbilbbg70ij2473xymmqv": 0.6506,
"du6jsenbjql5e8f3yk880ox4g": 0.6494,
"cbdbziaqczfuyuwqsylqi26zd": 0.6478,
"725gd73msyt08xm76v7gkxj7u": 0.6445,
"enzlj1as2raqm4ids1zyb07y1": 0.6442,
"scf9p4y91yjvqvg5jndxzhxj": 0.6414,
"5z8v4mj6cjs9ex6hdrpourjzh": 0.6389,
"4zwjlzdszduqmxzusysvzymms": 0.6387,
"7nmz249q89qg5ezcvzlheljji": 0.6381,
"2mdmx668tyhy4u4z9zszwjv5v": 0.6345,
"4a7o9rf7ytl8g3ejwpblc6p5n": 0.6306,
"2ty8ihceabty8yddmu31iuuej": 0.6283,
"dy8zaksw5e9nwrs1p5ss4o1nu": 0.628,
"1b70m6qtxrp75b4vtk8hxh8c3": 0.6261,
"ajxs0e0g6ryg5ol8qvw3evrcz": 0.6249,
"a4fgj2rfbpf4ejo1qi624fefo": 0.6184,
"akmkihra9ruad09ljapsm84b3": 0.6182,
"907l7wtxdvugdo9i2249wcmr0": 0.6171,
"6lwpjhktjhl9g7x2w7njmzva6": 0.6164,
"ax1yf4nlzqpcji4j8epdgx3zl": 0.6163,
"6ybvtzejh91761lqe7y1csrqo": 0.6158,
"3btdfgw79qiz3jmyfudovtbu2": 0.6122,
"5cwsxtx37les6m10xj71htkgf": 0.6101,
"9p3nnxhdjahfn8qswpzy8oyc3": 0.61,
"2xg0qvif1rh7du6wmk2eleku3": 0.6091,
"1wwro3z1eb3fl601dju6inlc6": 0.6084,
"gfskxsdituog2kqp9yiu7bzi": 0.6076,
"zilopfej2h0n3vpan5tcynpo": 0.6051,
"2hsidwomhjsaaytdy9u5niyi4": 0.6012,
"1klyfth8tl6lu6ra7k8zmy2n2": 0.5996,
"cegl2ivkc25blcatxp4jmk1ec": 0.5993,
"7qf0jaayyxy3ruamsexv5p1kl": 0.5988,
"erpufio3qaujd9gkszcqvb0bf": 0.5972,
"cfesxhzb83yl8b779uv3revz1": 0.597,
"3ww12jab49q8q8mk9avdwjqgk": 0.5961,
"8t2o4huu2e48ij23dxnl9w5qx": 0.5928,
"5vq1bl8h8dxdr34w0jaanokto": 0.5919,
"ac112osli9fvox1epcg4ld3t6": 0.59,
"3frp1zxrqulrlrnk503n6l4l": 0.5808,
"c76z5d6j7dpi1e79tm8fpm39z": 0.5807,
"6ifaeunfdelecgticvxanikzu": 0.5796,
"81txfenlgw75nq3u2nfdkj92o": 0.5789,
"yv73ms6v1995b5wny16jcfi3": 0.5787,
"b3ufcd24wfnnd5j98ped6irfu": 0.5752,
"29actv1ohj8r10kd9hu0jnb0n": 0.5737,
"bfqezwfhot1l3p1cpk4oonh25": 0.5705,
"5taraea6mqjjldg9zxswo825y": 0.5696,
"7qdv1xae7ikfe8dft3oj29yqc": 0.5692,
"dm5ka0os1e3dxcp3vh05kmp33": 0.5678,
"ay4u6j7lfkcg7x21mx5q121j": 0.5676,
"7af85xa75vozt2l4hzi6ryts7": 0.5663,
"5k620c7y6dlbmcm88dt3eb7t": 0.5644,
"ejunkmfhjz9weugd2bqrkgobb": 0.564,
"3428tckxcirwwh3o3jgc1m8ji": 0.5597,
"d6zovb8puwgcmsg91iya6rbtm": 0.5593,
"2wolc27r8z03itcvwp43e38c5": 0.5592,
"alpfd99yd3lfv7bhjo0biuq7b": 0.5582,
"beqqnubkv05mamuwvimeum015": 0.5577,
"4w7x0s5gfs5abasphlha5de8k": 0.5558,
"9ynnnx1qmkizq1o3qr3v0nsuk": 0.554,
"722fdbecxzcq9788l6jqclzlw": 0.5539,
"287tckirbfj9nb8ar2k9r60vn": 0.5529,
"esrunz7rjb0td98mx9e5cedoy": 0.5516,
"32n2r9bl6x90psj0wa7bfs6vq": 0.5487,
"50ap4sua1xyut3mpu7ehesp63": 0.5483,
"5c96g1zm7vo5ons9c42uy2w3r": 0.5469,
"3p81ltz6845appgkbgkzxueii": 0.5454,
"3n9mk5b2mxmq831wfmv6pu86i": 0.5437,
"5zr0b05eyx25km7z1k03ca9jx": 0.5424,
"1owhvvge4wlx7e0e431b4vhqx": 0.5423,
"3iwftmprsznl6yribr11a8l9m": 0.5393,
"7r1f93t6ddrsa5n8v1nq6qlzm": 0.5393,
"1gwajyt0pk2jm5fx5mu36v114": 0.5389,
"581t4mywybx21wcpmpykhyzr3": 0.5388,
"6wubmo7di3kdpflluf6s8c7vs": 0.5375,
"bq89wbdvedtov6auzuh6rsv7s": 0.5363,
"byu00jvt1j6csyv4y1lkt2fm2": 0.5359,
"af79lqrc0ntom74zq13ccjslo": 0.5357,
"3ri6juw2w6ma0jezszdlv1uqm": 0.5356,
"3l29w00m506ex93t5bbh9cg2a": 0.5355,
"1zp1du9n4rj36p1ss9zbxtqfb": 0.5353,
"9chuiarcjofld1dkj9kysehmb": 0.5346,
"5aw6uyw4pz2bpj24t5z8aacim": 0.5333,
"by5nibd18nkt40t0j8a0j5yzx": 0.5332,
"4yzidekywejmxxp77gqmdgopg": 0.5323,
"7ntvbsyq31jnzoqoa8850b9b8": 0.5305,
"a7247po5qs29o3zsfmt222ydu": 0.5299,
"117yqo02rs8dykkxpm274w3bd": 0.5298,
"193wqkyb0v5jnsblhvd2ocmyo": 0.5296,
"8jh0jejuxfhrpawnoztz2jlv4": 0.5295,
"5y0z0l2epprzbscvzsgldw8vu": 0.5288,
"47s2kt0e8m444ftqvsrqa3bvq": 0.5268,
"2hj3286pqov1g1g59k2t2qcgm": 0.5245,
"7swf4kpu3v38i2it4h94c5s9k": 0.5227,
"78wml3z5wrfxe5iky50tiotgu": 0.5196,
"f39uq10c8xhg5e6rwwcf6lhgc": 0.5186,
"bbajzna018c79opa1kl5kmkqo": 0.5172,
"4davonpqws4a4ejl1awu98zdg": 0.5168,
"1fedahp0rws09tj451onten8r": 0.5163,
"aho73e5udydy96iun3tkzdzsi": 0.5149,
"3aa4mumjl6zyetg6o9hwd5hhx": 0.5125,
"7cwemnr3vi40znjq451zxkus6": 0.5115,
"ajm86skyzse4ym8g6fpgzncxa": 0.5112,
"bgen5kjer2ytfp7lo9949t72g": 0.5102,
"8ey0ww2zsosdmwr8ehsorh6t7": 0.51,
"8najqkluatpaxvqws78b9s17c": 0.5082,
"8v97rcbthsxmzqk4ufxws9mug": 0.506,
"degxm4y6gmvp011ccyrev6z5p": 0.5049,
"3oa9e03e7w9nr8kqwqc3tlqz9": 0.5049,
"5dycj9wdhxh3n33qubw18ohlk": 0.5036,
"3is4bkgf3loxv9qfg3hm8zfqb": 0.5033,
"f47f3717z2vtpxfxrpdd4jl1x": 0.498,
"8ivsfwex4dfx1tvgsiq8askcx": 0.4972,
"8vbck9a4mxjms783lf72779uu": 0.4946,
"aql5z4osw5wmun0emnakfpwji": 0.4946,
"e6vzdkz6l236s9p288mharefy": 0.4925,
"4nidzmunvpvxk1ir9b6m8mpay": 0.4874,
"ein4fkggto3pdh5msp8huafiq": 0.4856,
"1q4ab2bpg5e8jl1g2udnakrju": 0.4852,
"8ztsv3pzrsyq5w1r3a0nfk1y5": 0.4842,
"1qd0wvt30rlswa4g6nu4na660": 0.4826,
"jznihqxle06xych9ygwiwnsa": 0.4796,
"2y8bntiif3a9y6gtmauv30gt": 0.4782,
"477yyajzheg2z8u7uick0e13e": 0.4706,
"bockl24qpr7ryjl8b6obukga": 0.4671,
"7mxwwunvot2pi69pj1yr1kh8i": 0.466,
"3w1hkk9k9gr8fwssyn4icvdfo": 0.4657,
"1txej2dzohnydl21zc9pgx6hy": 0.464,
"b8rae0ib0frjmwlca429bq19q": 0.4624,
"b5udgm9vakjqz8dcmy5b2g0xt": 0.4582,
"eitf7hulqfv1clb7toewkil24": 0.458,
"7hl0svs2hg225i2zud0g3xzp2": 0.4559,
"2aso72utuctat2ecs6nahjss6": 0.4521,
"3ymqchdzk8tt6lfphf26xfvh0": 0.4519,
"2yyjcbbryf1r10apyzl7c7jvp": 0.4507,
"bly7ema5au6j40i0grhl0pnub": 0.4476,
"b1rveez5u792gess9w3e7v5le": 0.4444,
"8sdpk4aerruf515yh76ezo7vi": 0.4434,
"32vph7vcjqgo1ksj1548di90n": 0.44,
"65ggsqdi6drpa4m8y3gkll25k": 0.4394,
"xaouuwuk8qyhv1libkeexwjh": 0.4347,
"6qitd9h242qkvjenaytfdnsf2": 0.4312,
"duuc1qczfnawwncru1ly6o66": 0.4213,
"b60nisd3qn427jm0hrg9kvmab": 0.4203,
"xwnjb1az11zffwty3m6vn8y6": 0.4197,
"dkarmrybx9vx10rg7cywumth0": 0.4158,
"75434tz9rc14xkkvudex742ui": 0.4137,
"c1d9p6b2e9zr5tqlzx3ktjplg": 0.4129,
"b73zounsynk9d3u1p9nvpu7i2": 0.4049,
"913mb508il6jzwtlj28fl892h": 0.4044,
"e0lck99w8meo9qoalfrxgo33o": 0.401,
"8dn0w8zh7nbn2i904603eigwf": 0.3984,
"ddyrh5latwfhesgfh4w401n92": 0.3973,
"avs3xposm3t9x1x2vzsoxzcbu": 0.3957,
"eu2g5j36zzxiazpd729osx0wm": 0.3924,
"67uya58idol2eq18ljecsru5o": 0.3912,
"23e698ls3x6vi9x8wl0mz7bsa": 0.3838,
"6321dlqv4ziuwqte4xpohijtw": 0.382,
"8o5tv5viv4hy1qg9jp94k7ayb": 0.381,
"53tknno09wqihmwxrqcuwq9sa": 0.3782,
"82wo38rqeizxlfjjhfjy4rx7u": 0.3781,
"dvtl8sf1262pd2aqgu641qa7u": 0.3767,
"663a54fmymndjeev47qm7d3nf": 0.3522,
"macko16888165594668885588": 0.3309,
"macko16698982162572521585": 0.3262,
"6lkj3o21cr4g7bql6tb3fk222": 0.3261,
"cu0rmpyff5692eo06ltddjo8a": 0.3161,
"1cnx2c8g3hhp8ssxnwwli0mjb": 0.3121,
"4vt0ldrcl6thpxpcs8zmpdq1g": 0.2926,
"etta63x1t7tnkn4jheisjwk4p": 0.2907,
"1n9l0ex47bu0762qg574hzjtd": 0.2626,
"6jgwiu2gq3dllmrwt45pfdn2z": 0.2416,
"392slbmf1kdqlr6sd1ckt71rs": 0.24,
"8z3180hhw2pj1i65uftlk54uz": 0.2096
},
"details": [
{
"league_id": "bx57cmq1edfq53ckfk791supi",
"league_name": "CAF Konfederasyon Kupası",
"match_count": 98,
"brier_score": 0.3046,
"heavy_fav_win_pct": 84.1,
"fav_win_pct": 63.3,
"odds_reliability": 0.9476
},
{
"league_id": "55hcphd1ccc6eai1ms77460on",
"league_name": "Şampiyonlar Ligi Kadınlar",
"match_count": 89,
"brier_score": 0.3258,
"heavy_fav_win_pct": 83.3,
"fav_win_pct": 74.2,
"odds_reliability": 0.9445
},
{
"league_id": "d9eaigzyfnfiraqc3ius757tl",
"league_name": "Kupa",
"match_count": 78,
"brier_score": 0.3141,
"heavy_fav_win_pct": 81.2,
"fav_win_pct": 73.1,
"odds_reliability": 0.9402
},
{
"league_id": "1gxlzw2ezkyeykhcaa5x8ozkk",
"league_name": "Concacaf Orta Amerika Kupası",
"match_count": 88,
"brier_score": 0.3338,
"heavy_fav_win_pct": 79.4,
"fav_win_pct": 61.4,
"odds_reliability": 0.9259
},
{
"league_id": "5jd0k2txwnq69frs79eulba8j",
"league_name": "Kupa",
"match_count": 69,
"brier_score": 0.3223,
"heavy_fav_win_pct": 78.4,
"fav_win_pct": 66.7,
"odds_reliability": 0.9233
},
{
"league_id": "6694fff47wqxl10lrd9tb91f8",
"league_name": "Kupa",
"match_count": 55,
"brier_score": 0.3099,
"heavy_fav_win_pct": 78.8,
"fav_win_pct": 67.3,
"odds_reliability": 0.9193
},
{
"league_id": "4jg7he1n3rb5dniq6hf49xorq",
"league_name": "Premier Lig",
"match_count": 79,
"brier_score": 0.3333,
"heavy_fav_win_pct": 77.1,
"fav_win_pct": 64.6,
"odds_reliability": 0.9061
},
{
"league_id": "59tpnfrwnvhnhzmnvfyug68hj",
"league_name": "Libertadores Kupası",
"match_count": 180,
"brier_score": 0.3408,
"heavy_fav_win_pct": 76.2,
"fav_win_pct": 61.7,
"odds_reliability": 0.8988
},
{
"league_id": "ac42gi3penartj88fe9l6plpk",
"league_name": "Premier Lig",
"match_count": 185,
"brier_score": 0.3148,
"heavy_fav_win_pct": 70.7,
"fav_win_pct": 68.1,
"odds_reliability": 0.8937
},
{
"league_id": "3j81qr7yc4gdnakfwnxf95ovh",
"league_name": "Premier Lig",
"match_count": 106,
"brier_score": 0.333,
"heavy_fav_win_pct": 72.2,
"fav_win_pct": 60.4,
"odds_reliability": 0.8771
},
{
"league_id": "9z5643nd06afqu01ea2wt8y4g",
"league_name": "Kuu Bara Ligi",
"match_count": 110,
"brier_score": 0.3294,
"heavy_fav_win_pct": 70.3,
"fav_win_pct": 53.6,
"odds_reliability": 0.8734
},
{
"league_id": "482ofyysbdbeoxauk19yg7tdt",
"league_name": "Trendyol Süper Lig",
"match_count": 342,
"brier_score": 0.3627,
"heavy_fav_win_pct": 80.7,
"fav_win_pct": 59.6,
"odds_reliability": 0.8722
},
{
"league_id": "ahl3vljaignq9ebaos4uqkrvo",
"league_name": "Kupa",
"match_count": 105,
"brier_score": 0.331,
"heavy_fav_win_pct": 70.4,
"fav_win_pct": 63.8,
"odds_reliability": 0.8696
},
{
"league_id": "8x3sbh85gc8qir50utw39jl04",
"league_name": "UEFA Kadınlar Euro 2025 Elemeleri",
"match_count": 88,
"brier_score": 0.3421,
"heavy_fav_win_pct": 75.5,
"fav_win_pct": 61.4,
"odds_reliability": 0.865
},
{
"league_id": "agpweohvn9tugnyl6ry4rhivp",
"league_name": "Eredivisie Kadınlar",
"match_count": 51,
"brier_score": 0.3356,
"heavy_fav_win_pct": 72.0,
"fav_win_pct": 56.9,
"odds_reliability": 0.8428
},
{
"league_id": "4c1nfi2j1m731hcay25fcgndq",
"league_name": "Avrupa Ligi",
"match_count": 242,
"brier_score": 0.3625,
"heavy_fav_win_pct": 77.6,
"fav_win_pct": 61.6,
"odds_reliability": 0.8425
},
{
"league_id": "1j4ehtrbry9depwt6oghaq3lu",
"league_name": "Süper Lig",
"match_count": 84,
"brier_score": 0.3201,
"heavy_fav_win_pct": 65.9,
"fav_win_pct": 60.7,
"odds_reliability": 0.8299
},
{
"league_id": "40yjcbx2sq6oq736iqqqczwt1",
"league_name": "DK Elemeler",
"match_count": 88,
"brier_score": 0.3383,
"heavy_fav_win_pct": 68.6,
"fav_win_pct": 55.7,
"odds_reliability": 0.8237
},
{
"league_id": "145hkd59i6foieuwr4mwi6wlq",
"league_name": "Pro Lig",
"match_count": 143,
"brier_score": 0.3546,
"heavy_fav_win_pct": 73.8,
"fav_win_pct": 60.1,
"odds_reliability": 0.823
},
{
"league_id": "34pl8szyvrbwcmfkuocjm3r6t",
"league_name": "LaLiga",
"match_count": 364,
"brier_score": 0.3773,
"heavy_fav_win_pct": 80.2,
"fav_win_pct": 56.6,
"odds_reliability": 0.8227
},
{
"league_id": "cse5oqqt2pzfcy8uz6yz3tkbj",
"league_name": "CAF Şampiyonlar Ligi",
"match_count": 91,
"brier_score": 0.3513,
"heavy_fav_win_pct": 73.9,
"fav_win_pct": 57.1,
"odds_reliability": 0.8212
},
{
"league_id": "zs18qaehvhg3w1208874zvfa",
"league_name": "1. Lig",
"match_count": 225,
"brier_score": 0.3744,
"heavy_fav_win_pct": 82.1,
"fav_win_pct": 59.6,
"odds_reliability": 0.8176
},
{
"league_id": "57nu0wygurzkp6fuy5hhrtaa2",
"league_name": "1. Lig",
"match_count": 286,
"brier_score": 0.3626,
"heavy_fav_win_pct": 72.9,
"fav_win_pct": 59.1,
"odds_reliability": 0.8099
},
{
"league_id": "1eruend45vd20g9hbrpiggs5u",
"league_name": "Botola Pro",
"match_count": 265,
"brier_score": 0.3625,
"heavy_fav_win_pct": 72.9,
"fav_win_pct": 50.2,
"odds_reliability": 0.8083
},
{
"league_id": "595nsvo7ykvoe690b1e4u5n56",
"league_name": "UEFA Uluslar Ligi",
"match_count": 67,
"brier_score": 0.3687,
"heavy_fav_win_pct": 83.3,
"fav_win_pct": 50.7,
"odds_reliability": 0.7987
},
{
"league_id": "6vq8j5p3av14nr3iuyi4okhjt",
"league_name": "Süper Lig Kadınlar",
"match_count": 70,
"brier_score": 0.356,
"heavy_fav_win_pct": 73.5,
"fav_win_pct": 58.6,
"odds_reliability": 0.793
},
{
"league_id": "486rhdgz7yc0sygziht7hje65",
"league_name": "Kupa",
"match_count": 62,
"brier_score": 0.3704,
"heavy_fav_win_pct": 81.1,
"fav_win_pct": 66.1,
"odds_reliability": 0.7901
},
{
"league_id": "9hh6n2f84k31zmlcxyvmc1w2y",
"league_name": "2. Lig",
"match_count": 204,
"brier_score": 0.357,
"heavy_fav_win_pct": 69.2,
"fav_win_pct": 62.3,
"odds_reliability": 0.789
},
{
"league_id": "3n5046abeu3x482ds3jwda238",
"league_name": "WE Lig Kadınlar",
"match_count": 102,
"brier_score": 0.3761,
"heavy_fav_win_pct": 85.4,
"fav_win_pct": 58.8,
"odds_reliability": 0.7863
},
{
"league_id": "8yi6ejjd1zudcqtbn07haahg6",
"league_name": "Premier Lig",
"match_count": 302,
"brier_score": 0.3712,
"heavy_fav_win_pct": 72.1,
"fav_win_pct": 56.3,
"odds_reliability": 0.7752
},
{
"league_id": "byhmntnl1b4lxw0zz21im3zkd",
"league_name": "Kupa",
"match_count": 96,
"brier_score": 0.3528,
"heavy_fav_win_pct": 68.2,
"fav_win_pct": 58.3,
"odds_reliability": 0.7719
},
{
"league_id": "2bmwykmdlcc2u1c40ytoc39vy",
"league_name": "Açık Kupası",
"match_count": 93,
"brier_score": 0.3807,
"heavy_fav_win_pct": 84.6,
"fav_win_pct": 66.7,
"odds_reliability": 0.7668
},
{
"league_id": "82jkgccg7phfjpd0mltdl3pat",
"league_name": "Süper Lig",
"match_count": 289,
"brier_score": 0.3782,
"heavy_fav_win_pct": 74.0,
"fav_win_pct": 57.4,
"odds_reliability": 0.7643
},
{
"league_id": "2nttcoriwf5co73vmz1vr8frm",
"league_name": "Nesine 2. Lig",
"match_count": 525,
"brier_score": 0.3782,
"heavy_fav_win_pct": 71.8,
"fav_win_pct": 55.2,
"odds_reliability": 0.7641
},
{
"league_id": "dr2xk7muj8aqcjdz2b3li1c0k",
"league_name": "Meistaradeildin",
"match_count": 129,
"brier_score": 0.3714,
"heavy_fav_win_pct": 73.6,
"fav_win_pct": 61.2,
"odds_reliability": 0.759
},
{
"league_id": "4yngyfinzd6bb1k7anqtqs0wt",
"league_name": "Premier Lig",
"match_count": 195,
"brier_score": 0.3772,
"heavy_fav_win_pct": 74.4,
"fav_win_pct": 57.4,
"odds_reliability": 0.7586
},
{
"league_id": "eog6knrkfei68si736fpquyzc",
"league_name": "Lig Kupası",
"match_count": 120,
"brier_score": 0.3632,
"heavy_fav_win_pct": 69.9,
"fav_win_pct": 66.7,
"odds_reliability": 0.756
},
{
"league_id": "eg6s9f1jj7jr6stmbosn0g6c8",
"league_name": "Süper Lig",
"match_count": 108,
"brier_score": 0.3657,
"heavy_fav_win_pct": 71.2,
"fav_win_pct": 55.6,
"odds_reliability": 0.7538
},
{
"league_id": "ae1wva3zrzcp2zd15gpvsntg6",
"league_name": "Ulusal Lig",
"match_count": 278,
"brier_score": 0.3783,
"heavy_fav_win_pct": 72.7,
"fav_win_pct": 55.0,
"odds_reliability": 0.7517
},
{
"league_id": "cesdwwnxbc5fmajgroc0hqzy2",
"league_name": "Hazırlık Maçları Ülkeler",
"match_count": 235,
"brier_score": 0.3669,
"heavy_fav_win_pct": 67.6,
"fav_win_pct": 56.2,
"odds_reliability": 0.7466
},
{
"league_id": "8k1xcsyvxapl4jlsluh3eomre",
"league_name": "Premier Lig",
"match_count": 328,
"brier_score": 0.385,
"heavy_fav_win_pct": 74.2,
"fav_win_pct": 45.7,
"odds_reliability": 0.7463
},
{
"league_id": "bdtat25m14jy85y484z3e6lf",
"league_name": "Kupa",
"match_count": 90,
"brier_score": 0.3772,
"heavy_fav_win_pct": 75.7,
"fav_win_pct": 55.6,
"odds_reliability": 0.7437
},
{
"league_id": "iu1vi94p4p28oozl1h9bvplr",
"league_name": "1. Lig",
"match_count": 158,
"brier_score": 0.3729,
"heavy_fav_win_pct": 71.2,
"fav_win_pct": 50.0,
"odds_reliability": 0.7411
},
{
"league_id": "1r097lpxe0xn03ihb7wi98kao",
"league_name": "Serie A",
"match_count": 359,
"brier_score": 0.3732,
"heavy_fav_win_pct": 67.8,
"fav_win_pct": 56.5,
"odds_reliability": 0.7391
},
{
"league_id": "2kwbbcootiqqgmrzs6o5inle5",
"league_name": "Premier Lig",
"match_count": 369,
"brier_score": 0.3791,
"heavy_fav_win_pct": 70.2,
"fav_win_pct": 54.2,
"odds_reliability": 0.7386
},
{
"league_id": "9fuwphq8kvugrlc3ckm7k8wes",
"league_name": "Ligler Kupası",
"match_count": 143,
"brier_score": 0.3934,
"heavy_fav_win_pct": 81.6,
"fav_win_pct": 50.3,
"odds_reliability": 0.7358
},
{
"league_id": "civf31q1inxohs4a03y8reetf",
"league_name": "Premier Lig",
"match_count": 320,
"brier_score": 0.3721,
"heavy_fav_win_pct": 67.2,
"fav_win_pct": 57.8,
"odds_reliability": 0.735
},
{
"league_id": "ili150pwfuf39f7yfdch9lhw",
"league_name": "UEFA U21 Şampiyonası Elemeler",
"match_count": 112,
"brier_score": 0.3715,
"heavy_fav_win_pct": 70.4,
"fav_win_pct": 67.9,
"odds_reliability": 0.7286
},
{
"league_id": "abs7n2ae3oydilk0tgmpnsj89",
"league_name": "Azadegan Ligi",
"match_count": 217,
"brier_score": 0.3801,
"heavy_fav_win_pct": 71.4,
"fav_win_pct": 45.2,
"odds_reliability": 0.7277
},
{
"league_id": "9nbpdi9q3ywcm4q0j5u0ekwcq",
"league_name": "Serie D",
"match_count": 232,
"brier_score": 0.3718,
"heavy_fav_win_pct": 67.2,
"fav_win_pct": 54.7,
"odds_reliability": 0.7254
}
]
}
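The per-league records above all carry the same fields, so a shortlist can be derived with a plain filter. The sketch below is illustrative only: the helper name `shortlist_leagues` and both thresholds (`min_matches`, `min_reliability`) are assumptions, not values taken from this repository. The three sample records are copied from the list above.

```python
from typing import Any, Dict, List

# Three records copied from the list above (fields unchanged).
LEAGUES: List[Dict[str, Any]] = [
    {"league_id": "482ofyysbdbeoxauk19yg7tdt", "league_name": "Trendyol Süper Lig",
     "match_count": 342, "brier_score": 0.3627, "heavy_fav_win_pct": 80.7,
     "fav_win_pct": 59.6, "odds_reliability": 0.8722},
    {"league_id": "agpweohvn9tugnyl6ry4rhivp", "league_name": "Eredivisie Kadınlar",
     "match_count": 51, "brier_score": 0.3356, "heavy_fav_win_pct": 72.0,
     "fav_win_pct": 56.9, "odds_reliability": 0.8428},
    {"league_id": "9nbpdi9q3ywcm4q0j5u0ekwcq", "league_name": "Serie D",
     "match_count": 232, "brier_score": 0.3718, "heavy_fav_win_pct": 67.2,
     "fav_win_pct": 54.7, "odds_reliability": 0.7254},
]


def shortlist_leagues(
    leagues: List[Dict[str, Any]],
    min_matches: int = 100,          # hypothetical threshold
    min_reliability: float = 0.75,   # hypothetical threshold
) -> List[Dict[str, Any]]:
    """Keep leagues with enough matches and reliable odds, best first."""
    kept = [
        lg for lg in leagues
        if lg["match_count"] >= min_matches
        and lg["odds_reliability"] >= min_reliability
    ]
    return sorted(kept, key=lambda lg: lg["odds_reliability"], reverse=True)


if __name__ == "__main__":
    for lg in shortlist_leagues(LEAGUES):
        print(lg["league_name"], lg["odds_reliability"])
```

With these assumed thresholds, the 51-match league is dropped for sample size and the 0.7254 league for reliability.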
-4
@@ -15,13 +15,9 @@ Original Factors:
- Historical upset pattern
"""
import os
import sys
from typing import Dict, Any, Optional, Tuple, List
from dataclasses import dataclass, field
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
try:
import psycopg2
from psycopg2.extras import RealDictCursor
+173 -18
@@ -7,11 +7,14 @@ import time
from contextlib import asynccontextmanager
from typing import Any
from datetime import datetime
import uvicorn
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
import subprocess
from pydantic import BaseModel
try:
@@ -21,6 +24,7 @@ except ImportError:
HAS_BASKETBALL = False
from services.single_match_orchestrator import get_single_match_orchestrator
from services.v26_shadow_engine import get_v26_shadow_engine
from models.league_model import get_league_model_loader
load_dotenv()
@@ -37,6 +41,23 @@ class CouponRequest(BaseModel):
    min_confidence: float | None = None


class RetrainRequest(BaseModel):
    reason: str | None = "manual"
    markets: str | None = None  # comma-separated, e.g. "MS,OU25,BTTS"
    trials: int | None = 50


# ─── Retrain state tracking ──────────────────────────────────
_retrain_state: dict[str, Any] = {
    "running": False,
    "last_started": None,
    "last_completed": None,
    "last_status": None,
    "last_error": None,
    "pid": None,
}


@asynccontextmanager
async def lifespan(_: FastAPI):
    try:
@@ -114,6 +135,8 @@ def read_root() -> dict[str, Any]:
"GET /v20plus/reversal-watchlist",
"POST /v20plus/coupon",
"GET /v20plus/daily-banker",
"POST /v1/admin/retrain",
"GET /v1/admin/retrain/status",
],
}
@@ -123,7 +146,15 @@ def health_check() -> dict[str, Any]:
try:
orchestrator = get_single_match_orchestrator()
shadow_engine = get_v26_shadow_engine()
# Per-market V25 model status
v25_readiness: dict[str, Any] = {"fully_loaded": False}
try:
v25_predictor = orchestrator._get_v25_predictor()
v25_readiness = v25_predictor.readiness_summary()
except Exception as v25_err:
v25_readiness = {"fully_loaded": False, "error": str(v25_err)}
if HAS_BASKETBALL:
basketball_predictor = get_basketball_v25_predictor()
basketball_readiness = basketball_predictor.readiness_summary()
@@ -131,35 +162,52 @@ def health_check() -> dict[str, Any]:
else:
basketball_readiness = {"fully_loaded": False, "error": "Basketball module not found"}
ready = True
league_readiness = get_league_model_loader().readiness_summary()
overall_ready = ready and v25_readiness.get("fully_loaded", False)
return {
"status": "healthy" if ready else "degraded",
"status": "healthy" if overall_ready else "degraded",
"engine": "v28.main",
"mode": os.getenv("AI_ENGINE_MODE", "v28"),
"ready": ready,
"ready": overall_ready,
"v25_football": v25_readiness,
"league_specific": league_readiness,
"basketball_v25": basketball_readiness,
"v26_shadow": shadow_engine.readiness_summary(),
"prediction_service_ready": True,
"model_loaded": ready,
"model_loaded": overall_ready,
"orchestrator_mode": getattr(orchestrator, "engine_mode", "v28"),
}
except Exception as error:
return {"status": "unhealthy", "ready": False, "error": str(error)}
_REQUIRED_RESPONSE_FIELDS = ("match_info", "market_board", "main_pick", "bet_summary", "data_quality")
@app.post("/v20plus/analyze/{match_id}")
async def analyze_match_v20plus(match_id: str) -> dict[str, Any]:
started_at = time.time()
orchestrator = get_single_match_orchestrator()
result = orchestrator.analyze_match(match_id)
result = await asyncio.to_thread(orchestrator.analyze_match, match_id)
elapsed_ms = int((time.time() - started_at) * 1000)
if not result:
raise HTTPException(status_code=404, detail=f"Match not found: {match_id}")
# Response validation: log missing required fields (non-fatal)
missing_fields = [f for f in _REQUIRED_RESPONSE_FIELDS if f not in result]
if missing_fields:
print(f"⚠️ [API] analyze/{match_id} response missing fields: {missing_fields} ({elapsed_ms}ms)")
result["timing_ms"] = elapsed_ms
return result
@app.get("/v20plus/analyze-htms/{match_id}")
async def analyze_match_htms_v20plus(match_id: str) -> dict[str, Any]:
orchestrator = get_single_match_orchestrator()
result = orchestrator.analyze_match_htms(match_id)
result = await asyncio.to_thread(orchestrator.analyze_match_htms, match_id)
if not result:
raise HTTPException(status_code=404, detail=f"Match not found: {match_id}")
return result
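The hunk above swaps direct orchestrator calls for `asyncio.to_thread(...)`, which runs a synchronous function in a worker thread so the event loop can keep serving other requests. A minimal sketch of the pattern follows; `blocking_analyze`, `analyze`, and `run_many` are hypothetical stand-ins and not functions from this repository.

```python
import asyncio
import time
from typing import Dict, List


def blocking_analyze(match_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for a synchronous, slow analysis call."""
    time.sleep(0.05)  # simulates blocking work (DB query, model inference)
    return {"match_id": match_id}


async def analyze(match_id: str) -> Dict[str, str]:
    # Positional arguments after the callable are forwarded to it.
    return await asyncio.to_thread(blocking_analyze, match_id)


async def run_many(match_ids: List[str]) -> List[Dict[str, str]]:
    # The calls overlap because each one waits in its own worker thread.
    return await asyncio.gather(*(analyze(m) for m in match_ids))


if __name__ == "__main__":
    print(asyncio.run(run_many(["a", "b", "c"])))
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which thread finishes first.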
@@ -230,11 +278,12 @@ async def analyze_match_htft_v20plus(match_id: str, timeout_sec: int = 30) -> di
@app.post("/v20plus/coupon")
async def generate_coupon_v20plus(request: CouponRequest) -> dict[str, Any]:
orchestrator = get_single_match_orchestrator()
return orchestrator.build_coupon(
match_ids=request.match_ids,
strategy=request.strategy or "BALANCED",
max_matches=request.max_matches,
min_confidence=request.min_confidence,
return await asyncio.to_thread(
orchestrator.build_coupon,
request.match_ids,
request.strategy or "BALANCED",
request.max_matches,
request.min_confidence,
)
@@ -244,7 +293,7 @@ async def get_daily_banker_v20plus(count: int = 3) -> dict[str, Any]:
raise HTTPException(status_code=400, detail="count must be >= 1")
orchestrator = get_single_match_orchestrator()
bankers = orchestrator.get_daily_bankers(count=count)
bankers = await asyncio.to_thread(orchestrator.get_daily_bankers, count)
return {"count": len(bankers), "bankers": bankers}
@app.get("/v20plus/reversal-watchlist")
@@ -262,14 +311,120 @@ async def get_reversal_watchlist_v20plus(
raise HTTPException(status_code=400, detail="min_score must be between 0 and 100")
orchestrator = get_single_match_orchestrator()
return orchestrator.get_reversal_watchlist(
count=count,
horizon_hours=horizon_hours,
min_score=min_score,
top_leagues_only=top_leagues_only,
return await asyncio.to_thread(
orchestrator.get_reversal_watchlist,
count,
horizon_hours,
min_score,
top_leagues_only,
)
# ─── ADMIN: Retrain Pipeline ─────────────────────────────────
def _run_retrain_pipeline(markets: str | None, trials: int):
    """Background function: extract data → train model → reload."""
    global _retrain_state
    ai_dir = os.path.dirname(os.path.abspath(__file__))
    scripts_dir = os.path.join(ai_dir, "scripts")
    python = os.path.join(ai_dir, "venv", "bin", "python3")
    if not os.path.exists(python):
        python = sys.executable  # fallback

    try:
        # Step 1: Extract training data
        print("🔄 [RETRAIN] Step 1/3: Extracting training data...", flush=True)
        result = subprocess.run(
            [python, os.path.join(scripts_dir, "extract_training_data.py")],
            capture_output=True, text=True, timeout=600, cwd=ai_dir,
        )
        if result.returncode != 0:
            raise RuntimeError(f"Extract failed:\n{result.stderr[-500:]}")
        print(f"✅ [RETRAIN] Extract done", flush=True)

        # Step 2: Train V25 Pro
        print("🔄 [RETRAIN] Step 2/3: Training V25 Pro model...", flush=True)
        train_cmd = [python, os.path.join(scripts_dir, "train_v25_pro.py")]
        if markets:
            train_cmd += ["--markets", markets]
        train_cmd += ["--trials", str(trials)]
        result = subprocess.run(
            train_cmd, capture_output=True, text=True, timeout=3600, cwd=ai_dir,
        )
        if result.returncode != 0:
            raise RuntimeError(f"Training failed:\n{result.stderr[-500:]}")
        print(f"✅ [RETRAIN] Training done", flush=True)

        # Step 3: Reload models in memory
        print("🔄 [RETRAIN] Step 3/3: Reloading models...", flush=True)
        try:
            orchestrator = get_single_match_orchestrator()
            v25 = orchestrator._get_v25_predictor()
            v25._loaded = False
            v25.load_models()
            print("✅ [RETRAIN] Models reloaded in memory", flush=True)
        except Exception as reload_err:
            print(f"⚠️ [RETRAIN] Hot reload failed (restart needed): {reload_err}", flush=True)

        _retrain_state.update({
            "running": False,
            "last_completed": datetime.now().isoformat(),
            "last_status": "success",
            "last_error": None,
        })
        print("🎉 [RETRAIN] Pipeline complete!", flush=True)
    except Exception as err:
        _retrain_state.update({
            "running": False,
            "last_completed": datetime.now().isoformat(),
            "last_status": "failed",
            "last_error": str(err),
        })
        print(f"❌ [RETRAIN] Pipeline failed: {err}", flush=True)
@app.post("/v1/admin/retrain")
async def admin_retrain(request: RetrainRequest) -> dict[str, Any]:
    """Trigger full retrain pipeline: extract → train → reload."""
    if _retrain_state["running"]:
        return {
            "status": "already_running",
            "message": f"Retrain in progress since {_retrain_state['last_started']}",
        }

    _retrain_state.update({
        "running": True,
        "last_started": datetime.now().isoformat(),
        "last_status": "running",
        "last_error": None,
    })

    # Run in background thread
    import threading
    thread = threading.Thread(
        target=_run_retrain_pipeline,
        args=(request.markets, request.trials or 50),
        daemon=True,
    )
    thread.start()

    return {
        "status": "triggered",
        "message": "Retrain pipeline started in background",
        "reason": request.reason,
        "markets": request.markets or "all",
        "trials": request.trials or 50,
    }


@app.get("/v1/admin/retrain/status")
async def admin_retrain_status() -> dict[str, Any]:
    """Check retrain pipeline status."""
    return {**_retrain_state}


if __name__ == "__main__":
    port = int(os.getenv("PORT", "8000"))
    uvicorn.run("main:app", host="0.0.0.0", port=port, reload=True)
+69 -26
@@ -46,6 +46,9 @@ SUPPORTED_MARKETS = [
"ht_ft", # Half-Time/Full-Time
"dc", # Double Chance
"ht", # Half-Time Result
"ht_home", # Half-Time Home win
"ht_draw", # Half-Time Draw
"ht_away", # Half-Time Away win
]
@@ -91,22 +94,29 @@ class Calibrator:
def __init__(self):
self.calibrators: Dict[str, IsotonicRegression] = {}
self.metrics: Dict[str, CalibrationMetrics] = {}
# Less aggressive shrinkage — only meaningful overconfident bands are pulled.
# Default raised from ~0.85-0.90 to 0.95+ since the orchestrator and config
# already apply market-level multipliers; double-shrinkage was the root cause
# of 24-35pt avg calibrated-vs-raw drops in production traces.
self.heuristic_fallback: Dict[str, float] = {
"ms": 0.90,
"ms_home": 0.90,
"ms_home_heavy_fav": 0.95,
"ms_home_fav": 0.90,
"ms_home_balanced": 0.85,
"ms_home_underdog": 0.80,
"ms_draw": 0.90,
"ms_away": 0.90,
"ou15": 0.90,
"ou25": 0.90,
"ou35": 0.90,
"btts": 0.90,
"ht_ft": 0.85,
"dc": 0.93,
"ht": 0.85,
"ms": 0.96,
"ms_home": 0.96,
"ms_home_heavy_fav": 0.98,
"ms_home_fav": 0.96,
"ms_home_balanced": 0.94,
"ms_home_underdog": 0.92,
"ms_draw": 0.94,
"ms_away": 0.96,
"ou15": 0.96,
"ou25": 0.96,
"ou35": 0.94,
"btts": 0.96,
"ht_ft": 0.92,
"dc": 0.97,
"ht": 0.92,
"ht_home": 0.92,
"ht_draw": 0.92,
"ht_away": 0.92,
}
self._load_calibrators()
@@ -139,21 +149,32 @@ class Calibrator:
except Exception as e:
print(f"[Calibrator] Warning: Failed to load metrics for {market}: {e}")
# Below this sample count, blend isotonic with raw_prob to dampen overfit jumps.
# Above this count, trust isotonic fully.
TRUSTED_SAMPLE_FLOOR = 30
TRUSTED_SAMPLE_CEILING = 200
# Hard cap on how far calibration can move probability in either direction.
MAX_DELTA = 0.20
def calibrate(self, market_type: str, raw_prob: float, odds_val: Optional[float] = None) -> float:
"""
Calibrate a raw probability using Isotonic Regression.
Calibrate a raw probability using Isotonic Regression with safeguards.
Args:
market_type (str): 'ms_home', 'ou25', 'btts', 'ht_ft', etc.
raw_prob (float): The raw probability from XGBoost (0.0 - 1.0)
odds_val (float, optional): The pre-match odds, used for context-aware bucket mapping
Returns:
float: Calibrated probability (0.0 - 1.0)
Safeguards:
* Low-sample trained models are blended with raw_prob to dampen overfit.
* MAX_DELTA caps the per-call adjustment (prevents 40pp swings).
"""
# Normalize market type
market_key = market_type.lower().replace("-", "_")
# Route to bucket if ms_home and odds provided
if market_key == "ms_home" and odds_val is not None and odds_val > 1.0:
if odds_val <= 1.40:
@@ -164,20 +185,42 @@ class Calibrator:
bucket_key = "ms_home_balanced"
else:
bucket_key = "ms_home_underdog"
if bucket_key in self.calibrators:
market_key = bucket_key
# If we have a trained Isotonic Regression model, use it
# If we have a trained Isotonic Regression model, use it (with safeguards)
if market_key in self.calibrators:
try:
calibrated = self.calibrators[market_key].predict([raw_prob])[0]
# Ensure output is valid probability
return float(np.clip(calibrated, 0.01, 0.99))
iso_pred = float(self.calibrators[market_key].predict([raw_prob])[0])
# Sample-count weighted blend with raw probability.
# Sparse models barely move probability; mature models dominate.
metrics = self.metrics.get(market_key)
n_samples = metrics.sample_count if metrics else 0
if n_samples >= self.TRUSTED_SAMPLE_CEILING:
iso_weight = 1.0
elif n_samples <= self.TRUSTED_SAMPLE_FLOOR:
# Very sparse: at least 30% trust to surface the signal
iso_weight = max(0.30, n_samples / self.TRUSTED_SAMPLE_CEILING)
else:
# Linearly ramp 30% → 100% between floor and ceiling
span = self.TRUSTED_SAMPLE_CEILING - self.TRUSTED_SAMPLE_FLOOR
iso_weight = 0.30 + 0.70 * (n_samples - self.TRUSTED_SAMPLE_FLOOR) / span
blended = iso_weight * iso_pred + (1.0 - iso_weight) * raw_prob
# Cap delta to avoid huge swings on noisy calibrators
delta = blended - raw_prob
if delta > self.MAX_DELTA:
blended = raw_prob + self.MAX_DELTA
elif delta < -self.MAX_DELTA:
blended = raw_prob - self.MAX_DELTA
return float(np.clip(blended, 0.01, 0.99))
except Exception as e:
print(f"[Calibrator] Warning: Isotonic failed for {market_key}: {e}")
# Fall through to heuristic
# Fallback to heuristic calibration
return self._heuristic_calibrate(market_key, raw_prob)
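The safeguards added to `calibrate` can be read as a pure function of three inputs: the raw probability, the isotonic prediction, and the calibrator's sample count. The sketch below restates that arithmetic outside the class so it can be checked by hand. The function name `blend_calibrated` is hypothetical; the defaults mirror `TRUSTED_SAMPLE_FLOOR`, `TRUSTED_SAMPLE_CEILING`, and `MAX_DELTA` from the hunk above.

```python
def blend_calibrated(
    raw_prob: float,
    iso_pred: float,
    n_samples: int,
    floor: int = 30,         # TRUSTED_SAMPLE_FLOOR
    ceiling: int = 200,      # TRUSTED_SAMPLE_CEILING
    max_delta: float = 0.20  # MAX_DELTA
) -> float:
    """Blend an isotonic prediction with the raw probability, then cap the move."""
    if n_samples >= ceiling:
        iso_weight = 1.0
    elif n_samples <= floor:
        iso_weight = max(0.30, n_samples / ceiling)
    else:
        # Linear ramp from 0.30 at the floor to 1.00 at the ceiling.
        iso_weight = 0.30 + 0.70 * (n_samples - floor) / (ceiling - floor)

    blended = iso_weight * iso_pred + (1.0 - iso_weight) * raw_prob

    # Limit how far calibration may move the probability in either direction.
    delta = max(-max_delta, min(max_delta, blended - raw_prob))
    return min(0.99, max(0.01, raw_prob + delta))


if __name__ == "__main__":
    print(blend_calibrated(0.60, 0.50, n_samples=115))
    print(blend_calibrated(0.90, 0.30, n_samples=500))
```

For example, with 115 samples the weight is 0.30 + 0.70 × 85/170 = 0.65, so a raw 0.60 and an isotonic 0.50 blend to 0.535. With a mature calibrator, a raw 0.90 against an isotonic 0.30 would move by 0.60, but the cap limits the result to 0.70.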
+191
@@ -0,0 +1,191 @@
"""
League-Specific Model Loader
=============================
Loads per-league XGBoost models + isotonic calibrators trained by
scripts/train_league_models.py and provides a unified prediction interface.
Falls back to general V25 for any market/league without a dedicated model.
"""
import os
import json
import pickle
from functools import lru_cache
from typing import Dict, Optional, Tuple
import numpy as np
import pandas as pd
import xgboost as xgb
AI_ENGINE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
LEAGUE_MODEL_DIR = os.path.join(AI_ENGINE_DIR, "models", "league_specific")
# Market file name → (num_class, label_list)
MARKET_META: Dict[str, Tuple[int, list]] = {
    "ms":       (3, ["1", "X", "2"]),
    "ou15":     (2, ["Over", "Under"]),
    "ou25":     (2, ["Over", "Under"]),
    "ou35":     (2, ["Over", "Under"]),
    "btts":     (2, ["Yes", "No"]),
    "ht":       (3, ["1", "X", "2"]),
    "ht_ou05":  (2, ["Over", "Under"]),
    "ht_ou15":  (2, ["Over", "Under"]),
    "htft":     (9, ["1/1", "1/X", "1/2", "X/1", "X/X", "X/2", "2/1", "2/X", "2/2"]),
    "oe":       (2, ["Odd", "Even"]),
    "cards":    (2, ["Over", "Under"]),
    "handicap": (3, ["1", "X", "2"]),
}

# Signal key map (file key → uppercase signal key used in _get_v25_signal)
FILE_TO_SIGNAL = {
    "ms": "MS", "ou15": "OU15", "ou25": "OU25", "ou35": "OU35",
    "btts": "BTTS", "ht": "HT", "ht_ou05": "HT_OU05", "ht_ou15": "HT_OU15",
    "htft": "HTFT", "oe": "OE", "cards": "CARDS", "handicap": "HCAP",
}


class LeagueModel:
    """Holds XGBoost models + isotonic calibrators for one league."""

    def __init__(self, league_id: str):
        self.league_id = league_id
        self.league_dir = os.path.join(LEAGUE_MODEL_DIR, league_id)
        self.models: Dict[str, xgb.Booster] = {}  # market_key → booster
        self.calibrators: Dict[str, object] = {}  # cal_key → isotonic
        self.feature_cols: Optional[list] = None
        self._loaded = False

    def load(self) -> bool:
        if not os.path.isdir(self.league_dir):
            return False
        try:
            fc_path = os.path.join(self.league_dir, "feature_cols.json")
            if os.path.exists(fc_path):
                with open(fc_path) as f:
                    self.feature_cols = json.load(f)

            for mkey in MARKET_META:
                xgb_path = os.path.join(self.league_dir, f"xgb_{mkey}.json")
                if os.path.exists(xgb_path) and os.path.getsize(xgb_path) > 100:
                    b = xgb.Booster()
                    b.load_model(xgb_path)
                    self.models[mkey] = b

            for fname in os.listdir(self.league_dir):
                if fname.startswith("cal_") and fname.endswith(".pkl"):
                    cal_key = fname[4:-4]  # strip cal_ and .pkl
                    with open(os.path.join(self.league_dir, fname), "rb") as f:
                        self.calibrators[cal_key] = pickle.load(f)

            self._loaded = bool(self.models or self.calibrators)
            return self._loaded
        except Exception as e:
            print(f"[LeagueModel] Load failed for {self.league_id}: {e}")
            return False

    def has_market(self, mkey: str) -> bool:
        return mkey in self.models

    def predict_market(
        self,
        mkey: str,
        feature_row: Dict[str, float],
    ) -> Optional[Dict[str, float]]:
        """
        Predict one market using league-specific XGBoost + isotonic calibration.
        Returns {label: prob} dict or None if no model available.
        """
        if mkey not in self.models:
            return None

        num_class, labels = MARKET_META[mkey]
        fc = self.feature_cols
        if fc is None:
            # Fallback to whatever the booster expects (it knows its feature names)
            fc = list(self.models[mkey].feature_names or [])

        try:
            X = pd.DataFrame([{col: feature_row.get(col, 0.0) for col in fc}])
            dmat = xgb.DMatrix(X)
            raw = self.models[mkey].predict(dmat)

            if num_class > 2:
                probs_arr = raw.reshape(-1, num_class)[0]
                probs = {labels[i]: float(probs_arr[i]) for i in range(num_class)}
                # Apply isotonic calibration per class
                cal_total = 0.0
                for i, label in enumerate(labels):
                    cal_key = f"{mkey}_{i}"
                    if cal_key in self.calibrators:
                        p_cal = float(self.calibrators[cal_key].predict([probs_arr[i]])[0])
                        probs[label] = max(0.01, min(0.99, p_cal))
                    cal_total += probs[label]
                if cal_total > 0:
                    probs = {k: v / cal_total for k, v in probs.items()}
            else:
                p = float(raw[0])
                cal_key = mkey
                if cal_key in self.calibrators:
                    p = float(self.calibrators[cal_key].predict([p])[0])
                p = max(0.01, min(0.99, p))
                probs = {labels[0]: p, labels[1]: 1.0 - p}

            return probs
        except Exception as e:
            print(f"[LeagueModel] predict_market({mkey}) failed for {self.league_id}: {e}")
            return None


class LeagueModelLoader:
    """
    In-memory cache for league-specific models.
    Thread-safe for single-process async servers (FastAPI/uvicorn).
    """

    def __init__(self, max_cached: int = 80):
        self._cache: Dict[str, Optional[LeagueModel]] = {}
        self._max_cached = max_cached

    def get(self, league_id: str) -> Optional[LeagueModel]:
        """Return loaded LeagueModel for this league, or None if unavailable."""
        if league_id in self._cache:
            return self._cache[league_id]

        # Evict oldest entry if cache is full
        if len(self._cache) >= self._max_cached:
            oldest = next(iter(self._cache))
            del self._cache[oldest]

        model = LeagueModel(league_id)
        loaded = model.load()
        self._cache[league_id] = model if loaded else None
        if loaded:
            n_models = len(model.models)
            n_cals = len(model.calibrators)
            print(f"[LeagueModel] Loaded {league_id}: {n_models} XGB models, {n_cals} calibrators")
        return self._cache[league_id]

    def available_leagues(self) -> list:
        if not os.path.isdir(LEAGUE_MODEL_DIR):
            return []
        return [d for d in os.listdir(LEAGUE_MODEL_DIR)
                if os.path.isdir(os.path.join(LEAGUE_MODEL_DIR, d))]

    def readiness_summary(self) -> dict:
        leagues = self.available_leagues()
        return {
            "league_specific_dir": LEAGUE_MODEL_DIR,
            "available_leagues": len(leagues),
            "cached": len([v for v in self._cache.values() if v is not None]),
        }


# ── Singleton ──────────────────────────────────────────────────────
_loader: Optional[LeagueModelLoader] = None


def get_league_model_loader() -> LeagueModelLoader:
    global _loader
    if _loader is None:
        _loader = LeagueModelLoader()
    return _loader
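In the multi-class branch of `predict_market`, each class probability may be calibrated on its own, after which the probabilities no longer sum to 1 and have to be renormalised. The sketch below isolates that step with plain callables in place of fitted isotonic models. It assumes the running total is accumulated over every class, calibrated or not, which the flattened listing does not make explicit. `calibrate_and_renormalise` is a hypothetical name.

```python
from typing import Callable, Dict, List


def calibrate_and_renormalise(
    raw_probs: List[float],
    labels: List[str],
    calibrators: Dict[int, Callable[[float], float]],
) -> Dict[str, float]:
    """Calibrate each class where a calibrator exists, then rescale to sum to 1."""
    adjusted: Dict[str, float] = {}
    for i, label in enumerate(labels):
        p = raw_probs[i]
        if i in calibrators:
            # Clamp so a calibrated class is never exactly 0 or 1.
            p = max(0.01, min(0.99, calibrators[i](p)))
        adjusted[label] = p

    total = sum(adjusted.values())
    if total <= 0:
        return adjusted
    return {label: p / total for label, p in adjusted.items()}


if __name__ == "__main__":
    # Only the home-win class ("1") has a calibrator in this toy example.
    out = calibrate_and_renormalise(
        [0.5, 0.3, 0.2], ["1", "X", "2"], {0: lambda p: 0.4}
    )
    print(out)
```

In the toy example the adjusted values are 0.4, 0.3, and 0.2, so each is divided by 0.9.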
File diff suppressed because it is too large
+154
@@ -0,0 +1,154 @@
[
"home_overall_elo",
"away_overall_elo",
"elo_diff",
"home_home_elo",
"away_away_elo",
"home_form_elo",
"away_form_elo",
"form_elo_diff",
"home_goals_avg",
"home_conceded_avg",
"away_goals_avg",
"away_conceded_avg",
"home_clean_sheet_rate",
"away_clean_sheet_rate",
"home_scoring_rate",
"away_scoring_rate",
"home_winning_streak",
"away_winning_streak",
"home_unbeaten_streak",
"away_unbeaten_streak",
"h2h_total_matches",
"h2h_home_win_rate",
"h2h_draw_rate",
"h2h_avg_goals",
"h2h_btts_rate",
"h2h_over25_rate",
"home_avg_possession",
"away_avg_possession",
"home_avg_shots_on_target",
"away_avg_shots_on_target",
"home_shot_conversion",
"away_shot_conversion",
"home_avg_corners",
"away_avg_corners",
"odds_ms_h",
"odds_ms_d",
"odds_ms_a",
"implied_home",
"implied_draw",
"implied_away",
"odds_ht_ms_h",
"odds_ht_ms_d",
"odds_ht_ms_a",
"odds_ou05_o",
"odds_ou05_u",
"odds_ou15_o",
"odds_ou15_u",
"odds_ou25_o",
"odds_ou25_u",
"odds_ou35_o",
"odds_ou35_u",
"odds_ht_ou05_o",
"odds_ht_ou05_u",
"odds_ht_ou15_o",
"odds_ht_ou15_u",
"odds_btts_y",
"odds_btts_n",
"odds_ms_h_present",
"odds_ms_d_present",
"odds_ms_a_present",
"odds_ht_ms_h_present",
"odds_ht_ms_d_present",
"odds_ht_ms_a_present",
"odds_ou05_o_present",
"odds_ou05_u_present",
"odds_ou15_o_present",
"odds_ou15_u_present",
"odds_ou25_o_present",
"odds_ou25_u_present",
"odds_ou35_o_present",
"odds_ou35_u_present",
"odds_ht_ou05_o_present",
"odds_ht_ou05_u_present",
"odds_ht_ou15_o_present",
"odds_ht_ou15_u_present",
"odds_btts_y_present",
"odds_btts_n_present",
"home_xga",
"away_xga",
"league_avg_goals",
"league_zero_goal_rate",
"upset_atmosphere",
"upset_motivation",
"upset_fatigue",
"upset_potential",
"referee_home_bias",
"referee_avg_goals",
"referee_cards_total",
"referee_avg_yellow",
"referee_experience",
"home_momentum_score",
"away_momentum_score",
"momentum_diff",
"home_squad_quality",
"away_squad_quality",
"squad_diff",
"home_key_players",
"away_key_players",
"home_missing_impact",
"away_missing_impact",
"home_goals_form",
"away_goals_form",
"home_lineup_goals_per90",
"away_lineup_goals_per90",
"home_lineup_assists_per90",
"away_lineup_assists_per90",
"home_squad_continuity",
"away_squad_continuity",
"home_top_scorer_form",
"away_top_scorer_form",
"home_avg_player_exp",
"away_avg_player_exp",
"home_goals_diversity",
"away_goals_diversity",
"h2h_home_goals_avg",
"h2h_away_goals_avg",
"h2h_recent_trend",
"h2h_venue_advantage",
"home_rolling5_goals",
"home_rolling5_conceded",
"home_rolling10_goals",
"home_rolling10_conceded",
"home_rolling20_goals",
"home_rolling20_conceded",
"away_rolling5_goals",
"away_rolling5_conceded",
"away_rolling10_goals",
"away_rolling10_conceded",
"home_rolling5_cs",
"away_rolling5_cs",
"home_venue_goals",
"home_venue_conceded",
"away_venue_goals",
"away_venue_conceded",
"home_goal_trend",
"away_goal_trend",
"home_days_rest",
"away_days_rest",
"match_month",
"is_season_start",
"is_season_end",
"attack_vs_defense_home",
"attack_vs_defense_away",
"xg_diff",
"form_momentum_interaction",
"elo_form_consistency",
"upset_x_elo_gap",
"league_home_win_rate",
"league_draw_rate",
"league_btts_rate",
"league_ou25_rate",
"league_reliability_score"
]
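A column list like the one above fixes the order in which features are handed to a model. `LeagueModel.predict_market` fills any feature missing from the input row with `0.0`, and the sketch below shows that lookup in isolation. `build_feature_vector` is a hypothetical helper, and `FEATURE_COLS` is a five-name excerpt of the list rather than the full set. Reading the `*_present` columns as "odds were available" flags is an assumption based on their names.

```python
from typing import Dict, List

# Excerpt of the column list above; the real list has many more entries.
FEATURE_COLS: List[str] = [
    "home_overall_elo",
    "away_overall_elo",
    "elo_diff",
    "odds_ms_h",
    "odds_ms_h_present",
]


def build_feature_vector(
    feature_row: Dict[str, float],
    feature_cols: List[str],
    default: float = 0.0,
) -> List[float]:
    """Return values in column order, using `default` for missing features."""
    return [float(feature_row.get(col, default)) for col in feature_cols]


if __name__ == "__main__":
    row = {"elo_diff": 50.0, "home_overall_elo": 1550.0, "away_overall_elo": 1500.0}
    print(build_feature_vector(row, FEATURE_COLS))
```

Because the order comes from the column list and not from the input dict, rows built from different sources line up with the columns a model was trained on, provided the same list is used at training and prediction time.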
Binary file not shown.
File diff suppressed because it is too large
+891
@@ -0,0 +1,891 @@
tree
version=v4
num_class=1
num_tree_per_iteration=1
label_index=0
max_feature_idx=151
objective=binary sigmoid:1
feature_names=Column_0 Column_1 Column_2 Column_3 Column_4 Column_5 Column_6 Column_7 Column_8 Column_9 Column_10 Column_11 Column_12 Column_13 Column_14 Column_15 Column_16 Column_17 Column_18 Column_19 Column_20 Column_21 Column_22 Column_23 Column_24 Column_25 Column_26 Column_27 Column_28 Column_29 Column_30 Column_31 Column_32 Column_33 Column_34 Column_35 Column_36 Column_37 Column_38 Column_39 Column_40 Column_41 Column_42 Column_43 Column_44 Column_45 Column_46 Column_47 Column_48 Column_49 Column_50 Column_51 Column_52 Column_53 Column_54 Column_55 Column_56 Column_57 Column_58 Column_59 Column_60 Column_61 Column_62 Column_63 Column_64 Column_65 Column_66 Column_67 Column_68 Column_69 Column_70 Column_71 Column_72 Column_73 Column_74 Column_75 Column_76 Column_77 Column_78 Column_79 Column_80 Column_81 Column_82 Column_83 Column_84 Column_85 Column_86 Column_87 Column_88 Column_89 Column_90 Column_91 Column_92 Column_93 Column_94 Column_95 Column_96 Column_97 Column_98 Column_99 Column_100 Column_101 Column_102 Column_103 Column_104 Column_105 Column_106 Column_107 Column_108 Column_109 Column_110 Column_111 Column_112 Column_113 Column_114 Column_115 Column_116 Column_117 Column_118 Column_119 Column_120 Column_121 Column_122 Column_123 Column_124 Column_125 Column_126 Column_127 Column_128 Column_129 Column_130 Column_131 Column_132 Column_133 Column_134 Column_135 Column_136 Column_137 Column_138 Column_139 Column_140 Column_141 Column_142 Column_143 Column_144 Column_145 Column_146 Column_147 Column_148 Column_149 Column_150 Column_151
feature_infos=[1150.3663761896189:1903.4781806887747] [1158.5088961211511:1916.84579108047] [-496.81477567713546:573.10259120534784] [1159.8767670517543:1884.5959848901657] [1151.7894548779084:1919.4116678360419] [1426.1496448360797:1585.9817930954068] [1427.9817118206745:1588.9895054335384] [-113.02538114266532:114.69704651598067] [0:5.9333333333333336] [0:6.2666666666666666] [0:6.0666666666666664] [0:5.2000000000000002] [0:1] [0:1] [0:1] [0:1] [0:5] [0:5] [0:5] [0:5] [0:8] [0:1] [0:1] [0:11] [0:1] [0:1] [0.20999999999999999:0.81000000000000005] [0.22500000000000001:0.76000000000000001] [0:13] [0:13] [0:6.333333333333333] [0:6.666666666666667] [0:4.5] [0:4.5] [0:22.550000000000001] [0:17.5] [0:35.5] [0.065285302506677897:0.80962131918207236] [0.1044438556117265:0.4288719106684401] [0.056651501780225801:0.79902050363656307] [0:26] [0:5.0899999999999999] [0:26.5] [0:1.0900000000000001] [0:13.050000000000001] [0:1.76] [0:13.25] [0:3.7400000000000002] [0:7.6500000000000004] [0:8.5600000000000005] [0:3.5299999999999998] [0:1.72] [0:7.3300000000000001] [0:5.0199999999999996] [0:2.5800000000000001] [0:3.77] [0:4.1299999999999999] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:1] [0:6.2666666666666666] [0:5.2000000000000002] [2.0046118370484245:4.1840324763193504] [0.018042399639151999:0.1552651806302843] [0:0.45000000000000001] none none [0:0.1575] [-0.92000000000000004:1] [0:7] none none [0:1] [-1:0.61250000000000004] [-1:0.59166666666666667] [-1.2908333333333333:1.3799999999999999] [0:40.799999999999997] [0:36.299999999999997] [-29.999999999999996:31.499999999999996] [0:10] [0:10] [0:1] [0:1] [0:5.7999999999999998] [0:5] [0:5] [0:6.25] [0:4] [0:5] [0:1] [0:1] [0:10] [0:11] [0:42.100000000000001] [0:41.799999999999997] [0:1] [0:1] [0:8] [0:8] [-1:1] [0:1] [0:5.7999999999999998] [0:6.5] [0:5.2000000000000002] [0:6.5] [0:4.4119999999999999] [0:6.5] [0:5.3330000000000002] [0:4.7999999999999998] [0:5.3330000000000002] [0:4.5] [0:1] [0:1] [0:7] [0:8] [0:6] [0:7] [-1.8:1.8] [-2.2000000000000002:2] [1:30] [1.2:30] [1:12] [0:1] [0:1] [-4.4669999999999996:4.5999999999999996] [-5.2000000000000002:5] [-4.133:5.3330000000000002] [-0.045499999999999999:0.088099999999999998] [0.063600000000000004:1] [0:0.088800000000000004] [0.35623409669211198:0.51556156968876865] [0.1051136363636363:0.32744043043812449] [0.40430438124519602:0.64185836716283262] [0.32129131437355879:0.77537212449255755] [0.70399999999999996:1]
tree_sizes=946 1005 993 986 1007 997 994 1005 992 1009 984 1007 997 684 1002 999 897 992 991 990 1001 905 1003 995 896 1010 908 1005 1007 1014 1012 1005 1005 691 688
Tree=0
num_leaves=8
num_cat=0
split_feature=45 54 46 96 4 3 16
split_gain=174.584 35.5641 24.9346 16.8552 14.3832 11.0732 10.8094
threshold=1.155 1.405 2.5850000000000004 4.5000000000000009 1704.4687685416313 1441.3975280143418 1.0000000180025095e-35
decision_type=2 2 2 2 2 2 2
left_child=1 3 5 -1 -4 -2 -3
right_child=2 6 4 -5 -6 -7 -8
leaf_value=0.87683759709368503 0.84782001969484888 0.90404985782878589 0.85793883408869098 0.90097077069702014 0.73127673195863474 0.81008235044809496 0.93515098554525133
leaf_weight=939.87957760691643 113.51603642106056 283.89385342597961 418.36988925933838 472.32632339000702 9.9611878395080549 330.17187193036079 212.09029108285904
leaf_count=4529 547 1368 2016 2276 48 1591 1022
internal_value=0.875686 0.893342 0.837052 0.884909 0.85499 0.819737 0.91735
internal_weight=2780.21 1908.19 872.019 1412.21 428.331 443.688 495.984
internal_count=13397 9195 4202 6805 2064 2138 2390
is_linear=0
shrinkage=1
Tree=1
num_leaves=8
num_cat=0
split_feature=47 48 49 86 141 133 150
split_gain=140.072 28.5634 26.9673 23.4898 19.0464 14.6224 10.7313
threshold=1.6850000000000003 2.1650000000000005 3.3850000000000002 3.3279569892473124 1.2620000000000002 0.31650000000000006 0.48538789856891795
decision_type=2 2 2 2 2 2 2
left_child=1 3 5 -1 -4 -2 -3
right_child=2 6 4 -5 -6 -7 -8
leaf_value=0.0068305934170468773 -0.15031955758812821 -0.031384052219233315 -0.052289652303345355 0.062911487393225371 0.031971580220150592 -0.011449057668411599 0.04709896679065162
leaf_weight=1284.6507234424353 8.3659095764160138 19.989385187625885 501.71516834199429 86.587033972144127 30.929726287722588 495.32643516361713 353.17835873365402
leaf_count=6221 40 98 2370 419 146 2373 1730
internal_value=-0.000699774 0.0173296 -0.0310475 0.0103722 -0.0473963 -0.0137584 0.0428942
internal_weight=2780.74 1744.41 1036.34 1371.24 532.645 503.692 373.168
internal_count=13397 8468 4929 6640 2516 2413 1828
is_linear=0
shrinkage=0.104222
Tree=2
num_leaves=8
num_cat=0
split_feature=45 50 53 39 96 28 150
split_gain=110.988 23.3948 18.1465 13.3271 12.0676 11.8858 11.7377
threshold=1.155 1.4150000000000003 3.3250000000000006 0.36187197152743827 4.5000000000000009 3.9500000000000006 0.48538789856891795
decision_type=2 2 2 2 2 2 2
left_child=1 4 5 -4 -1 -2 -3
right_child=2 6 3 -5 -6 -7 -8
leaf_value=0.0012888166907150367 -0.037269165059036741 -0.040251887739062138 -0.05134333704385069 -0.13896201271441588 0.020859848952961575 -0.011322757210899898 0.041785901483228478
leaf_weight=991.85861267149448 395.00907251238823 20.026126876473427 106.85419258475304 22.895305529236794 522.50144763290882 372.75306884944439 348.38899676501751
leaf_count=4802 1851 97 494 106 2555 1751 1741
internal_value=-0.000605268 0.0137719 -0.0307651 -0.0668133 0.00804152 -0.0246723 0.0373256
internal_weight=2780.29 1882.78 897.512 129.749 1514.36 767.762 368.415
internal_count=13397 9195 4202 600 7357 3602 1838
is_linear=0
shrinkage=0.104222
Tree=3
num_leaves=8
num_cat=0
split_feature=45 48 86 11 53 144 104
split_gain=91.2855 30.6398 20.2224 17.4942 14.4128 10.3549 10.2483
threshold=1.155 2.2850000000000006 3.3279569892473124 0.30952380952380959 3.4950000000000006 0.0075500000000000003 1.0005000000000002
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 -3 6 -6 -2
right_child=4 3 -4 -5 5 -7 -8
leaf_value=0.0040069597334026 -0.027086096902533729 -0.1715719381102207 0.053726932967164444 0.048773966775503941 -0.061900987427835959 -0.23396750965062757 0.016338062312498316
leaf_weight=1510.6300098896027 782.38381478190422 3.9652889966964713 94.400425717234612 257.22131448984146 73.248439565300941 3.9991354644298545 63.84617380797863
leaf_count=7369 3628 20 471 1306 331 18 297
internal_value=-0.000943352 0.0123188 0.00693156 0.0454226 -0.0277441 -0.0708364 -0.0238097
internal_weight=2789.69 1866.22 1605.03 261.187 923.478 77.2476 846.23
internal_count=13440 9166 7840 1326 4274 349 3925
is_linear=0
shrinkage=0.104222
Tree=4
num_leaves=8
num_cat=0
split_feature=49 35 6 53 44 126 38
split_gain=70.6071 28.8927 21.165 13.8246 12.9797 16.2351 8.96833
threshold=2.8350000000000004 3.3550000000000004 1541.1282734213053 1.9650000000000001 6.5150000000000006 1.8450000000000002 0.27603289214361998
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 -3 5 -2 -6
right_child=4 3 -4 -5 6 -7 -8
leaf_value=-0.0027517198054468608 -0.029319791370752923 0.043695199840511914 0.083541694807877556 0.015347179328011443 -0.028790250204289367 -0.099217570104222469 -0.0022072689999685109
leaf_weight=937.29022094607353 417.85833202302456 320.03798474371433 31.915322333574295 449.0833810120821 218.10735833644867 39.50090055167675 374.63410261273384
leaf_count=4582 1898 1642 157 2211 1027 181 1742
internal_value=-0.000829778 0.0120602 9.06525e-05 0.0271434 -0.0221677 -0.0353586 -0.0119891
internal_weight=2788.43 1738.33 969.206 769.121 1050.1 457.359 592.741
internal_count=13440 8592 4739 3853 4848 2079 2769
is_linear=0
shrinkage=0.104222
Tree=5
num_leaves=8
num_cat=0
split_feature=51 50 35 143 44 147 11
split_gain=63.712 19.0005 13.7227 11.8206 10.6018 10.4763 6.84163
threshold=1.2550000000000001 1.4650000000000001 5.035000000000001 -1.7974999999999997 7.2550000000000008 0.47199397614381194 2.6904761904761911
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 6 -5 -3 -2
right_child=3 5 -4 4 -6 -7 -8
leaf_value=0.0047544573525090221 -0.065940592898403594 0.046851958603355899 0.063639968219378742 -0.028402060075216558 -0.0037183287940315865 -0.010933729592520421 -0.2419531318087022
leaf_weight=1547.0932241082191 6.1367039680480939 207.7856714874506 44.20663620531559 674.41967558860779 262.60369572043419 40.760551080107689 3.9397892653942108
leaf_count=7584 28 1099 222 3053 1221 215 18
internal_value=-0.000729499 0.0105766 0.00639068 -0.0226927 -0.0214845 0.0373753 -0.134921
internal_weight=2786.95 1839.85 1591.3 947.1 937.023 248.546 10.0765
internal_count=13440 9120 7806 4320 4274 1314 46
is_linear=0
shrinkage=0.104222
Tree=6
num_leaves=8
num_cat=0
split_feature=41 130 54 48 99 34 105
split_gain=42.6482 15.4832 11.8244 11.5616 10.5481 8.08864 5.28375
threshold=1.9950000000000003 2.1835000000000004 1.6950000000000001 2.0250000000000004 0.13392857142857142 2.1550000000000007 1.0765000000000002
decision_type=2 2 2 2 2 2 2
left_child=1 3 4 -1 -2 -3 -4
right_child=2 5 6 -5 -6 -7 -8
leaf_value=-0.012930103603477631 -0.025837088447404372 0.009985288944886676 0.093598551064641447 0.0827918792790409 0.017739499999769957 0.067684395437451708 -0.011462427181900188
leaf_weight=1452.7631230950356 63.837418377399445 116.17519751191139 30.260349631309509 13.827319353818892 1098.0612086355686 34.140560820698738 6.2758191227912894
leaf_count=6763 321 560 163 70 5525 162 34
internal_value=0.00228156 -0.00876197 0.0171813 -0.0120271 0.0153451 0.0230937 0.0755539
internal_weight=2815.34 1616.91 1198.43 1466.59 1161.9 150.316 36.5362
internal_count=13598 7555 6043 6833 5846 722 197
is_linear=0
shrinkage=0.104222
Tree=7
num_leaves=8
num_cat=0
split_feature=41 135 130 21 53 102 16
split_gain=34.421 15.4189 13.9177 13.4109 11.0563 13.2179 9.32957
threshold=1.9950000000000003 -0.78349999999999997 2.1340000000000003 0.4642857142857143 2.0250000000000004 2.1120000000000005 1.5000000000000002
decision_type=2 2 2 2 2 2 2
left_child=1 3 -3 -1 6 -6 -2
right_child=4 2 -4 -5 5 -7 -8
leaf_value=-0.039029586512136658 0.021256889774663217 -0.0093694924607743823 0.024219953555788796 -0.18450744298912586 0.010656362766355395 -0.067587490182026186 0.059808579820180507
leaf_weight=29.468837857246399 324.64188092947006 1435.6618839651346 147.77664601802826 8.9744612574577314 753.81818389892578 24.195777088403702 86.313260063529015
leaf_count=138 1721 6659 716 42 3733 122 467
internal_value=0.00205658 -0.00781819 -0.00623458 -0.0730278 0.0155268 0.00822224 0.0293551
internal_weight=2810.85 1621.88 1583.44 38.4433 1188.97 778.014 410.955
internal_count=13598 7555 7375 180 6043 3855 2188
is_linear=0
shrinkage=0.104222
Tree=8
num_leaves=8
num_cat=0
split_feature=86 41 30 92 147 8 85
split_gain=36.407 25.9777 15.256 11.9724 10.8984 10.2187 7.72256
threshold=3.2679487179487183 1.8750000000000002 0.46841755319148942 -0.76249999999999984 0.40846280364372473 1.1083333333333336 -0.021724137931034445
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 -3 5 -2 -6
right_child=4 3 -4 -5 6 -7 -8
leaf_value=-0.012159457825842523 0.064307005786210347 -0.085506751396820332 -0.10592968794102031 0.0081406001340509019 0.027805156010338752 -0.1311253894779951 0.074917050140623179
leaf_weight=1030.2416722476482 5.0686567574739438 14.960604444146155 19.18836584687233 1572.8768468499184 63.189212292432785 6.7883874922990799 94.08995346724987
leaf_count=4724 25 75 92 7835 328 34 485
internal_value=0.00185481 -0.00115146 -0.0138749 0.00725775 0.0487278 -0.0475971 0.055992
internal_weight=2806.4 2637.27 1049.43 1587.84 169.136 11.857 157.279
internal_count=13598 12726 4816 7910 872 59 813
is_linear=0
shrinkage=0.104222
Tree=9
num_leaves=8
num_cat=0
split_feature=49 150 52 133 30 91 30
split_gain=26.3296 13.6848 11.0511 10.5586 10.2017 14.0051 13.9274
threshold=2.7850000000000006 0.60942003008724055 2.9650000000000003 2.7320000000000007 0.10822072072072071 -0.0024999999999999497 0.12681311751103638
decision_type=2 2 2 2 2 2 2
left_child=1 3 -3 -1 5 -2 -6
right_child=4 2 -4 -5 6 -7 -8
leaf_value=0.0056197323214787877 0.0092359089138059399 0.013793895571876906 0.055269467591862222 -0.056030521488072138 0.020769131467870516 -0.044189842169712931 -0.012941111969465513
leaf_weight=1335.4049420952797 73.188210651278496 194.0794630497694 108.95535556972027 30.86374768614769 165.2346598058939 196.06340071558952 684.92604845762253
leaf_count=6712 329 987 589 157 747 879 3130
internal_value=0.000378046 0.00867093 0.0287076 0.00422669 -0.0119885 -0.0296678 -0.00638913
internal_weight=2788.72 1669.3 303.035 1366.27 1119.41 269.252 850.161
internal_count=13530 8445 1576 6869 5085 1208 3877
is_linear=0
shrinkage=0.104222
Tree=10
num_leaves=8
num_cat=0
split_feature=86 41 0 6 110 102 53
split_gain=38.5149 20.3558 13.2957 10.1636 9.93058 14.9217 6.02783
threshold=3.0790020790020796 1.925 1413.5914823844723 1484.8073996434571 2.9500000000000006 0.65050000000000019 2.2150000000000003
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 -3 6 -6 -2
right_child=4 3 -4 -5 5 -7 -8
leaf_value=0.0140954684441122 0.085154679849317738 0.026410531501952182 -0.017501548280296479 0.0014621478522681489 -0.046636790759956186 0.037447698249607933 0.031794502065436499
leaf_weight=167.52074213325977 60.752241030335426 211.7988056242466 1059.5275938510895 1090.8869259208441 27.780392900109291 130.96362222731113 37.02421498298645
leaf_count=765 323 1076 4848 5512 140 685 181
internal_value=0.000347101 -0.00355495 -0.0131877 0.00551857 0.0388267 0.0227309 0.0649542
internal_weight=2786.25 2529.73 1227.05 1302.69 256.52 158.744 97.7765
internal_count=13530 12201 5613 6588 1329 825 504
is_linear=0
shrinkage=0.104222
Tree=11
num_leaves=8
num_cat=0
split_feature=86 122 0 132 136 148 145
split_gain=31.5628 19.7563 17.5348 15.2483 8.699 11.6526 8.53686
threshold=3.0790020790020796 1.0250000000000001 1444.5595683182339 0.68350000000000011 11.950000000000001 0.25070180469638537 0.63955000000000006
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 -3 6 -6 -2
right_child=4 3 -4 -5 5 -7 -8
leaf_value=-0.0029563172992086786 0.032749371502154893 -0.027078470578097272 -0.04142343191974563 0.004355282116605525 0.042836068356502124 -0.043229223026257771 0.082015555649337976
leaf_weight=268.76511310040951 128.29142379760742 184.50364246964455 247.01660700142384 1831.7398000508547 37.845243006944656 31.138597756624222 54.413731098175049
leaf_count=1259 680 882 1128 8932 205 158 286
internal_value=0.000321355 -0.0031776 -0.0213793 0.00147871 0.0355201 0.00398648 0.0474254
internal_weight=2783.71 2532.03 515.782 2016.24 251.689 68.9838 182.705
internal_count=13530 12201 2387 9814 1329 363 966
is_linear=0
shrinkage=0.104222
Tree=12
num_leaves=8
num_cat=0
split_feature=86 41 94 4 100 132 111
split_gain=16.7583 14.9692 11.4051 9.36529 9.15568 6.8883 6.86331
threshold=3.2679487179487183 2.5250000000000004 19.050000000000004 1454.7040164610194 2.1000000000000005 2.5855000000000006 1.6500000000000001
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 4 -2 -3 -5
right_child=3 5 -4 6 -6 -7 -8
leaf_value=-0.0077031612935488866 -0.037300665068035477 0.045270021664037818 0.0069515620157151858 0.08783804546768309 0.1145092464596736 -0.063294467679354607 0.032433508236858831
leaf_weight=1642.7547204941511 28.476281866431236 95.693837076425552 889.01127083599567 33.729012683033943 5.0763863474130622 6.7911695241928092 86.767844557762146
leaf_count=7824 156 540 4317 188 29 38 478
internal_value=0.000977474 -0.000976496 -0.00255725 0.0343884 -0.0143067 0.0380708 0.0479478
internal_weight=2788.3 2634.25 2531.77 154.05 33.5527 102.485 120.497
internal_count=13570 12719 12141 851 185 578 666
is_linear=0
shrinkage=0.104222
Tree=13
num_leaves=5
num_cat=0
split_feature=49 91 105 52
split_gain=12.5399 11.9283 13.2239 11.4266
threshold=6.7950000000000008 0.46333333333333332 0.66550000000000009 3.1450000000000005
decision_type=2 2 2 2
left_child=1 3 -3 -1
right_child=-2 2 -4 -5
leaf_value=-0.00096173862485333158 -0.18852641170062942 -0.26944729151735436 0.0083924991906715318 0.019698628791158867
leaf_weight=2444.542382568121 3.7820855826139441 3.7145474255084983 3.7179097086191177 330.02280321717262
leaf_count=11674 16 18 18 1844
internal_value=0.000884212 0.00114235 -0.13063 0.00149578
internal_weight=2785.78 2782 7.43246 2774.57
internal_count=13570 13554 36 13518
is_linear=0
shrinkage=0.104222
Tree=14
num_leaves=8
num_cat=0
split_feature=35 136 1 3 120 119 137
split_gain=12.9995 12.1396 9.67922 10.2069 8.87434 8.76233 7.82142
threshold=3.4050000000000007 3.1500000000000008 1408.1043210310934 1436.2136975970254 1.4145000000000001 0.55000000000000016 12.850000000000003
decision_type=2 2 2 2 2 2 2
left_child=1 5 3 -2 -3 -1 -4
right_child=2 4 6 -5 -6 -7 -8
leaf_value=-0.065982224130305259 -0.028897601514020019 -0.010800039930687538 0.014598348000302748 0.051779760894765055 0.0040416852382515198 0.035952394418601007 -0.011457004864636652
leaf_weight=9.93890532851219 20.277691304683685 1238.979572609067 441.32176646590233 106.29307742416859 676.56466819345951 115.69292894005775 174.66667002439499
leaf_count=46 107 5755 2341 568 3282 548 923
internal_value=0.000796137 -0.00349957 0.0126042 0.0388534 -0.00555799 0.0278839 0.00721016
internal_weight=2783.74 2041.18 742.559 126.571 1915.54 125.632 615.988
internal_count=13570 9631 3939 675 9037 594 3264
is_linear=0
shrinkage=0.104222
Tree=15
num_leaves=8
num_cat=0
split_feature=94 53 50 125 104 49 120
split_gain=15.421 19.7004 11.5702 10.8982 8.46618 7.68903 0.284565
threshold=17.550000000000004 3.5150000000000001 2.0050000000000003 0.45000000000000007 0.45750000000000007 1.8950000000000002 2.4365000000000001
decision_type=2 2 2 2 2 2 2
left_child=2 5 3 -1 -3 -2 -4
right_child=1 4 6 -5 -6 -7 -8
leaf_value=-0.040237205117941127 0.029193543717025244 -0.027898057080941927 0.13173751892789104 -0.0049749588986612815 -0.15821400423700094 0.0070086816806797644 0.089170490566897243
leaf_weight=101.41680394113064 210.2422444075346 11.58236499130726 5.4282936453819257 1551.7522683441639 10.168696627020834 879.85487067699432 2.6328659802675247
leaf_count=475 1114 49 33 7519 43 4276 17
internal_value=-0.000172415 0.00932761 -0.00653072 -0.00713841 -0.0888634 0.0112876 0.117955
internal_weight=2773.08 1111.85 1661.23 1653.17 21.7511 1090.1 8.06116
internal_count=13526 5482 8044 7994 92 5390 50
is_linear=0
shrinkage=0.104222
Tree=16
num_leaves=7
num_cat=0
split_feature=54 49 149 143 133 120
split_gain=13.5064 12.5454 12.7609 9.52104 4.64479 6.69198
threshold=1.6950000000000001 4.1950000000000012 0.53851874003189804 -2.5639999999999996 1.2110000000000001 3.0500000000000003
decision_type=2 2 2 2 2 2
left_child=1 3 -3 -1 -2 -6
right_child=4 2 -4 -5 5 -7
leaf_value=-0.11165723984306615 -0.022805170561939463 -0.034049153155287885 0.071318390164509041 0.0013999129723307169 0.11023664759120146 -0.064625521974505357
leaf_weight=8.1074528247117978 4.7248373478651073 177.6567175835371 13.41981780529022 2544.4961924999952 20.898206591606144 2.6760432869195929
leaf_count=41 29 758 58 12491 132 17
internal_value=-0.000148839 -0.000887801 -0.0266458 0.00104042 0.0714722 0.0903692
internal_weight=2771.98 2743.68 191.077 2552.6 28.2991 23.5742
internal_count=13526 13348 816 12532 178 149
is_linear=0
shrinkage=0.104222
Tree=17
num_leaves=8
num_cat=0
split_feature=86 89 86 49 89 92 95
split_gain=12.3529 16.6857 29.867 13.2745 12.903 19.095 7.28162
threshold=2.9298029556650254 1.0000000180025095e-35 1.8603896103896107 3.3850000000000002 0.27000000000000007 -0.62124999999999986 8.25
decision_type=2 2 2 2 2 2 2
left_child=1 3 -3 -1 6 -6 -2
right_child=4 2 -4 -5 5 -7 -8
leaf_value=0.0076740310795042049 0.058197093639699833 -0.094247277907232085 -0.010401082561217358 -0.014939641022089629 -0.10188514738956803 0.012348640807380216 -0.023647947189664047
leaf_weight=1310.5064302459359 93.891582764685154 49.551189616322517 670.03426378965378 359.25059229135513 16.932199478149414 257.1773796081543 13.500689059495924
leaf_count=6543 499 228 3182 1562 87 1350 75
internal_value=-0.000128236 -0.00290891 -0.016176 0.00280861 0.0172867 0.00528897 0.0479069
internal_weight=2770.84 2389.34 719.585 1669.76 381.502 274.11 107.392
internal_count=13526 11515 3410 8105 2011 1437 574
is_linear=0
shrinkage=0.104222
Tree=18
num_leaves=8
num_cat=0
split_feature=149 56 148 55 148 53 92
split_gain=15.7381 11.9383 11.3441 10.9087 9.49514 8.82734 5.7139
threshold=0.53433908520280893 1.7550000000000001 0.26796584848230992 1.905 0.24185130950380487 2.1250000000000004 -0.0099999999999999482
decision_type=2 2 2 2 2 2 2
left_child=1 4 -3 5 -1 -2 -5
right_child=3 2 -4 6 -6 -7 -8
leaf_value=-0.054435660661113758 0.0153280247340685 0.02819107947162831 -0.017585503842290271 -0.01840079708748131 -0.010724207322160621 -0.0015876644117591623 0.10180483948952145
leaf_weight=56.825720146298409 617.56049958616495 165.87729875743389 91.08600726723671 5.4966678321361568 1074.6435647159815 732.60994145274162 19.602076262235641
leaf_count=283 3334 829 448 27 4917 3565 97
internal_value=-0.000486987 -0.00831442 0.0119646 0.00741537 -0.01292 0.00614953 0.0754826
internal_weight=2763.7 1388.43 256.963 1375.27 1131.47 1350.17 25.0987
internal_count=13500 6477 1277 7023 5200 6899 124
is_linear=0
shrinkage=0.104222
Tree=19
num_leaves=8
num_cat=0
split_feature=149 55 135 1 27 115 54
split_gain=12.6285 9.13776 8.91602 12.9607 10.4883 7.46851 5.43256
threshold=0.53433908520280893 1.905 0.30550000000000005 1494.2870431281676 0.50845238095238099 2.3665000000000007 1.3650000000000004
decision_type=2 2 2 2 2 2 2
left_child=2 5 3 -1 -4 -2 -3
right_child=1 6 4 -5 -6 -7 -8
leaf_value=0.00092966247827827358 0.0028048803388890822 0.099877401856329082 -0.0069739024068449437 -0.021042388628781803 0.03967412519492558 0.027656530085268364 -0.011168774035797255
leaf_weight=536.86709239333868 1198.4237125739455 17.76616628468037 128.78820982575417 638.33134125173092 88.204329490661621 147.51364935934544 6.5457842200994483
leaf_count=2501 6137 84 598 2961 417 762 40
internal_value=-0.000430022 0.0066729 -0.007421 -0.0110049 0.0119887 0.00552878 0.0699864
internal_weight=2762.44 1370.25 1392.19 1175.2 216.993 1345.94 24.312
internal_count=13500 7023 6477 5462 1015 6899 124
is_linear=0
shrinkage=0.104222
Tree=20
num_leaves=8
num_cat=0
split_feature=86 29 98 85 111 54 1
split_gain=13.4852 11.6743 11.2269 10.3561 7.51975 9.02896 6.85752
threshold=3.3717607973421928 3.8452380952380958 0.53589743589743599 -0.45393665158371038 1.7500000000000002 1.3250000000000004 1430.9302281483838
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 -3 6 -6 -2
right_child=4 3 -4 -5 5 -7 -8
leaf_value=0.010161271773601144 -0.030831897558424644 -0.065476683381506764 -0.016436452248208454 0.004566632375329463 -0.017458853996767588 0.051773327327659352 0.10280712840651235
leaf_weight=220.18627671897411 5.0186129659414318 23.250052616000175 793.86604422330856 1612.9528871029615 39.005949392914772 43.020874515175819 24.570655010640621
leaf_count=1065 29 113 3791 7879 209 264 150
internal_value=-0.000380007 -0.0018746 -0.0106611 0.00357096 0.0351053 0.0188533 0.0801381
internal_weight=2761.87 2650.26 1014.05 1636.2 111.616 82.0268 29.5893
internal_count=13500 12848 4856 7992 652 473 179
is_linear=0
shrinkage=0.104222
Tree=21
num_leaves=7
num_cat=0
split_feature=96 132 136 141 123 120
split_gain=13.2003 16.5388 14.2713 13.2653 11.1928 9.3524
threshold=4.5000000000000009 0.64600000000000013 7.8500000000000005 -2.3269999999999995 0.73000000000000009 3.1835000000000004
decision_type=2 2 2 2 2 2
left_child=1 2 -1 -2 -3 -5
right_child=3 4 -4 5 -6 -7
leaf_value=-0.060960557629992744 -0.18516760103419116 0.033438636082733196 -0.0037890386549494811 0.011554321509567651 -0.0045660462097220139 -0.095189759824081052
leaf_weight=104.0578725785017 3.772067964076995 88.859501846134663 87.140555322170258 867.02598301321268 1593.5797092542052 8.9989516139030439
leaf_count=484 19 416 405 4401 7693 51
internal_value=-0.000914735 -0.00585967 -0.0349056 0.00961608 -0.00255862 0.0104568
internal_weight=2753.43 1873.64 191.198 879.797 1682.44 876.025
internal_count=13469 8998 889 4471 8109 4452
is_linear=0
shrinkage=0.104222
Tree=22
num_leaves=8
num_cat=0
split_feature=86 96 132 53 17 141 14
split_gain=11.7147 10.9619 13.1395 10.1877 3.25091 5.30827 0.81688
threshold=3.9466666666666668 4.5000000000000009 0.64600000000000013 2.0250000000000004 1.0000000180025095e-35 0.45950000000000008 0.63333333333333341
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 -3 5 -2 -6
right_child=4 3 -4 -5 6 -7 -8
leaf_value=-0.03177439277102298 0.091437039897303479 0.024857814401883283 -0.0029740838488449607 0.00058337476930965298 0.07050981561257999 -0.054418786094624118 0.13381725959363758
leaf_weight=191.83759662508965 7.081265255808832 274.07831323891878 1669.0744777023792 596.54252921044827 3.1656967699527723 4.3793987929821014 7.6569998264312744
leaf_count=880 41 1511 8020 2927 18 26 46
internal_value=-0.000813155 -0.00142725 -0.00594319 0.00822535 0.0744319 0.0357031 0.115377
internal_weight=2753.82 2731.53 1860.91 870.621 22.2834 11.4607 10.8227
internal_count=13469 13338 8900 4438 131 67 64
is_linear=0
shrinkage=0.104222
Tree=23
num_leaves=8
num_cat=0
split_feature=54 86 91 40 130 51 10
split_gain=12.222 10.6549 10.0211 7.99429 8.88031 6.65012 3.52018
threshold=1.5750000000000004 3.9115384615384619 0.46333333333333332 1.4550000000000003 3.4720000000000004 1.0650000000000002 1.2583333333333335
decision_type=2 2 2 2 2 2 2
left_child=1 2 -1 4 -2 -5 -3
right_child=3 6 -4 5 -6 -7 -8
leaf_value=-0.0022469666970611943 0.016170707495608474 0.01400989076087111 -0.12852821652405796 0.092762096676041614 -0.16668567356812367 0.014555154480083921 0.10191215472125625
leaf_weight=2637.7453966140747 32.849811799824238 7.6601853668689754 6.834140643477439 32.402883179485798 3.1530902385711661 18.586706772446632 13.987728200852869
leaf_count=12765 209 44 31 200 20 112 88
internal_value=-0.000723185 -0.00197755 -0.00257376 0.0377177 0.000116917 0.0642621 0.0708273
internal_weight=2753.22 2666.23 2644.58 86.9925 36.0029 50.9896 21.6479
internal_count=13469 12928 12796 541 229 312 132
is_linear=0
shrinkage=0.104222
Tree=24
num_leaves=7
num_cat=0
split_feature=49 137 127 10 121 40
split_gain=12.6198 16.4964 12.4085 9.96103 9.53044 3.42942
threshold=3.3850000000000002 2.8500000000000001 1.4365000000000003 4.3000000000000007 2.7750000000000004 2.7850000000000006
decision_type=2 2 2 2 2 2
left_child=3 5 -3 4 -1 -2
right_child=1 2 -4 -5 -6 -7
leaf_value=0.0033265147337535523 -0.2293764266460791 -0.0035570344953408817 -0.040820787979045939 -0.15680120567767367 0.077284263807364459 -0.095337183505289247
leaf_weight=2143.9148783534765 3.9466704875230771 459.63925896584988 123.04180367290974 4.1842633336782447 19.085369817912579 4.4060309380292892
leaf_count=10807 17 1977 532 22 100 19
internal_value=-1.32262e-05 -0.0135124 -0.0114264 0.00366831 0.00397942 -0.158858
internal_weight=2758.22 591.034 582.681 2167.18 2163 8.3527
internal_count=13474 2545 2509 10929 10907 36
is_linear=0
shrinkage=0.104222
Tree=25
num_leaves=8
num_cat=0
split_feature=49 6 94 56 127 104 7
split_gain=10.0943 10.6231 11.9381 10.4719 9.83816 10.0485 9.11168
threshold=3.3850000000000002 1485.1782059621762 17.550000000000004 2.1050000000000004 1.4365000000000003 0.42950000000000005 3.0523414722088096
decision_type=2 2 2 2 2 2 2
left_child=1 3 -3 -1 6 -6 -2
right_child=4 2 -4 -5 5 -7 -8
leaf_value=0.014478237478344286 -0.015463014973786505 -0.0073632514981839545 0.0096110031081186563 0.078530598053871026 -0.0086524748202200501 -0.067456935030922358 0.015332839885013407
leaf_weight=308.73068219423294 309.28241093456745 1017.4353533536196 807.05070595443249 30.452116876840591 63.547168910503387 62.713204026222229 157.49458535015583
leaf_count=1561 1327 5053 4143 172 271 269 678
internal_value=-7.78723e-06 0.00329399 0.000145221 0.0202309 -0.012054 -0.0378634 -0.00507213
internal_weight=2756.71 2163.67 1824.49 339.183 593.037 126.26 466.777
internal_count=13474 10929 9196 1733 2545 540 2005
is_linear=0
shrinkage=0.104222
Tree=26
num_leaves=7
num_cat=0
split_feature=86 78 86 86 106 45
split_gain=11.0807 19.2113 13.2975 16.5366 9.93615 9.16974
threshold=2.6909814323607431 3.2071428571428577 1.0000000180025095e-35 1.8603896103896107 0.59050000000000014 1.1950000000000001
decision_type=2 2 2 2 2 2
left_child=2 4 5 -4 -2 -1
right_child=1 -3 3 -5 -6 -7
leaf_value=0.0057285473288771786 0.034425187503696976 -0.17261249206353166 -0.072289780038289356 -0.011539802081962551 0.0051253977786566941 -0.012137044156937154
leaf_weight=1235.6497117057443 188.36661138385534 5.9902653545141211 54.218099623918533 475.07240612059832 377.99835128337145 417.49042452871799
leaf_count=6255 973 33 239 2206 1968 1800
internal_value=-1.60817e-06 0.0129056 -0.00338664 -0.0177639 0.0148706 0.00121666
internal_weight=2754.79 572.355 2182.43 529.291 566.365 1653.14
internal_count=13474 2974 10500 2445 2941 8055
is_linear=0
shrinkage=0.104222
Tree=27
num_leaves=8
num_cat=0
split_feature=148 141 115 107 104 4 134
split_gain=10.1298 10.8135 12.2053 9.7232 15.8813 8.97442 7.29505
threshold=0.17399267399267399 -0.79299999999999993 1.2250000000000003 0.40950000000000003 0.41250000000000003 1460.7202799166059 -0.30549999999999994
decision_type=2 2 2 2 2 2 2
left_child=3 5 -3 6 -5 -2 -1
right_child=1 2 -4 4 -6 -7 -8
leaf_value=-0.10259253098907423 -0.03269054108208802 -0.0021082423175915925 0.015308582355162925 -0.13734624687852556 0.036154293999346325 -0.0035202193944307817 0.042506627852751083
leaf_weight=4.9914454817771938 183.7718341127038 1603.1649219617248 600.83727415651083 20.186357498168949 7.9962391257286063 304.1783049851656 15.20464117079973
leaf_count=25 894 7830 2977 102 43 1483 78
internal_value=-0.0013178 -0.00046824 0.00263984 -0.0485807 -0.0881234 -0.0145068 0.00661435
internal_weight=2740.33 2691.95 2204 48.3787 28.1826 487.95 20.1961
internal_count=13432 13184 10807 248 145 2377 103
is_linear=0
shrinkage=0.104222
Tree=28
num_leaves=8
num_cat=0
split_feature=86 134 41 106 29 91 3
split_gain=10.2371 10.3112 10.5364 10.1993 11.3955 9.81442 8.99213
threshold=2.864250614250615 -0.43849999999999995 2.1050000000000004 0.95450000000000013 1.3875000000000002 -0.0054166666666666486 1670.9226550788155
decision_type=2 2 2 2 2 2 2
left_child=3 2 -2 5 -5 -1 -3
right_child=1 6 -4 4 -6 -7 -8
leaf_value=-0.015667034131642239 -0.063328583368563382 0.021380817595676553 0.036600112461665557 -0.21945686012767968 -0.026560041934415114 0.0014656039323343243 -0.12083840209161807
leaf_weight=461.02843705564737 34.460375130176544 373.77398503571749 17.162777505815029 3.4011466801166526 135.53743974119425 1711.2739861980081 4.8830340132117263
leaf_count=2173 176 1994 103 17 658 8282 29
internal_value=-0.0011736 0.0135866 -0.0301054 -0.00392153 -0.0312966 -0.00217051 0.0195438
internal_weight=2741.52 430.28 51.6232 2311.24 138.939 2172.3 378.657
internal_count=13432 2302 279 11130 675 10455 2023
is_linear=0
shrinkage=0.104222
Tree=29
num_leaves=8
num_cat=0
split_feature=12 84 115 141 136 11 110
split_gain=9.65816 11.3219 10.1518 11.5361 10.0734 9.44977 6.86501
threshold=1.0000000180025095e-35 0.017970779220779203 1.2915000000000003 -0.60349999999999981 3.6500000000000008 2.1083333333333338 27.750000000000004
decision_type=2 2 2 2 2 2 2
left_child=1 6 4 -4 -2 -3 -1
right_child=2 5 3 -5 -6 -7 -8
leaf_value=0.01695486197013276 0.018272032851518873 -0.047853351303007219 -0.020306195413698006 0.015936397489881831 -0.011004436565708025 0.099082325881264394 -0.086337197330871959
leaf_weight=520.42061326652765 139.83185759186745 50.192012257874012 122.51054417341948 430.93686553835869 1465.7739738970995 5.2428159564733496 7.0750899761915198
leaf_count=2585 694 257 602 2140 7090 29 35
internal_value=-0.00104481 0.0108592 -0.00425885 0.00791357 -0.00845466 -0.0339422 0.0155679
internal_weight=2741.98 582.931 2159.05 553.447 1605.61 55.4348 527.496
internal_count=13432 2906 10526 2742 7784 286 2620
is_linear=0
shrinkage=0.104222
Tree=30
num_leaves=8
num_cat=0
split_feature=120 37 38 38 105 85 100
split_gain=12.0364 15.2453 11.9342 13.3835 10.0958 10.0578 10.0546
threshold=1.4145000000000001 0.42084520042422408 0.23906113829447367 0.14593794095256588 1.1225000000000003 -0.37141065830721004 1.3650000000000002
decision_type=2 2 2 2 2 2 2
left_child=2 6 3 -1 -3 -4 -2
right_child=1 4 5 -5 -6 -7 -8
leaf_value=-0.12038760226397151 -0.0090962232080863334 0.0034107878159782353 -0.060847190395801561 0.022337300771774038 -0.044228760615057698 -0.0060312321358531873 0.030222541925148897
leaf_weight=7.3577561527490607 84.787891268730164 592.17002998292446 37.370823763310909 228.1118380650878 52.602927520871162 1330.5593820437789 423.39983003586531
leaf_count=43 418 3057 171 1205 280 6147 2165
internal_value=0.00204154 0.0101633 -0.00379863 0.0178729 -0.00047641 -0.00752917 0.0236624
internal_weight=2756.36 1152.96 1603.4 235.47 644.773 1367.93 508.188
internal_count=13486 5920 7566 1248 3337 6318 2583
is_linear=0
shrinkage=0.104222
Tree=31
num_leaves=8
num_cat=0
split_feature=86 86 86 1 127 31 122
split_gain=10.4188 10.4465 11.9432 10.3703 15.906 10.0392 7.58651
threshold=2.6939799331103687 1.0000000180025095e-35 1.8603896103896107 1467.9553804385575 2.1340000000000003 0.22980739360049704 2.3515000000000006
decision_type=2 2 2 2 2 2 2
left_child=1 6 -3 5 -5 -2 -1
right_child=3 2 -4 4 -6 -7 -8
leaf_value=0.0014256124570572787 0.0018941270616234518 -0.058645654322910731 -0.0086447877444392647 0.02733676763309234 -0.12590863729503893 -0.082423971345859881 0.042831761918242235
leaf_weight=1608.7813730537891 166.28092505782843 58.273459360003471 473.13907171785831 372.2649027556181 7.4961845725774756 16.887378051877022 49.535993173718452
leaf_count=7823 854 251 2181 1963 43 94 277
internal_value=0.00183871 -0.00141235 -0.0141287 0.0144848 0.0243088 -0.0058835 0.0026627
internal_weight=2752.66 2189.73 531.413 562.929 379.761 183.168 1658.32
internal_count=13486 10532 2432 2954 2006 948 8100
is_linear=0
shrinkage=0.104222
Tree=32
num_leaves=8
num_cat=0
split_feature=93 144 112 94 104 93 120
split_gain=10.1721 16.1941 11.2364 13.1765 9.78364 8.96284 8.48134
threshold=20.550000000000004 0.013450000000000002 0.22750000000000004 5.5500000000000007 1.2865000000000002 25.050000000000004 2.0500000000000003
decision_type=2 2 2 2 2 2 2
left_child=2 4 6 -4 -2 -3 -1
right_child=1 5 3 -5 -6 -7 -8
leaf_value=0.0034609471658556285 0.020050548549636428 -0.009082334773132807 -0.11004659616173273 -0.007594432301873808 -0.021486441758403186 -0.13291664093499478 0.038221762386595907
leaf_weight=820.33589915931225 594.67776323109865 13.120999380946161 13.791079476475714 1142.1561977639794 68.70460732281208 12.293409973382948 84.046657353639603
leaf_count=3903 3054 71 65 5495 382 70 446
internal_value=0.0016558 0.0126202 -0.00200979 -0.00881763 0.0157484 -0.0690103 0.00669173
internal_weight=2749.13 688.797 2060.33 1155.95 663.382 25.4144 904.383
internal_count=13486 3577 9909 5560 3436 141 4349
is_linear=0
shrinkage=0.104222
Tree=33
num_leaves=5
num_cat=0
split_feature=91 141 123 7
split_gain=9.64033 9.54985 10.2025 8.96435
threshold=0.46333333333333332 -0.77399999999999991 2.2565000000000004 33.104396552584838
decision_type=2 2 2 2
left_child=1 3 -3 -1
right_child=-2 2 -4 -5
leaf_value=-0.015732035707484552 -0.12133869515037762 0.0038526132779362976 0.057630556320633561 0.031446003016811871
leaf_weight=450.80555958300829 6.9056807309389105 2188.2216669544578 38.992827542126179 48.440718069672585
leaf_count=2162 31 10769 213 250
internal_value=0.00156231 0.00187402 0.00479437 -0.011154
internal_weight=2733.37 2726.46 2227.21 499.246
internal_count=13425 13394 10982 2412
is_linear=0
shrinkage=0.104222
Tree=34
num_leaves=5
num_cat=0
split_feature=45 98 80 123
split_gain=8.19305 8.37131 12.6079 10.3403
threshold=1.6150000000000004 0.58114035087719318 0.043836402184014855 2.1715000000000004
decision_type=2 2 2 2
left_child=1 3 -3 -1
right_child=-2 2 -4 -5
leaf_value=0.008356015193296689 -0.14816646007318571 -0.033602776052007143 0.00019618273212945178 0.070940775625300892
leaf_weight=749.49989224970341 3.9628616422414771 128.32899699360132 1819.463518589735 29.807201541960239
leaf_count=3713 16 704 8830 162
internal_value=0.00140384 0.0016217 -0.00203079 0.0107506
internal_weight=2731.06 2727.1 1947.79 779.307
internal_count=13425 13409 9534 3875
is_linear=0
shrinkage=0.104222
end of trees
feature_importances:
Column_86=18
Column_49=9
Column_53=8
Column_41=6
Column_54=6
Column_120=6
Column_141=6
Column_45=5
Column_91=5
Column_104=5
Column_1=4
Column_94=4
Column_96=4
Column_132=4
Column_136=4
Column_148=4
Column_3=3
Column_4=3
Column_6=3
Column_11=3
Column_30=3
Column_35=3
Column_38=3
Column_48=3
Column_50=3
Column_85=3
Column_92=3
Column_105=3
Column_115=3
Column_123=3
Column_127=3
Column_130=3
Column_133=3
Column_149=3
Column_150=3
Column_0=2
Column_7=2
Column_10=2
Column_16=2
Column_29=2
Column_40=2
Column_44=2
Column_51=2
Column_52=2
Column_55=2
Column_56=2
Column_89=2
Column_93=2
Column_98=2
Column_100=2
Column_102=2
Column_106=2
Column_110=2
Column_111=2
Column_122=2
Column_134=2
Column_135=2
Column_137=2
Column_143=2
Column_144=2
Column_147=2
Column_8=1
Column_12=1
Column_14=1
Column_17=1
Column_21=1
Column_27=1
Column_28=1
Column_31=1
Column_34=1
Column_37=1
Column_39=1
Column_46=1
Column_47=1
Column_78=1
Column_80=1
Column_84=1
Column_95=1
Column_99=1
Column_107=1
Column_112=1
Column_119=1
Column_121=1
Column_125=1
Column_126=1
Column_145=1
parameters:
[boosting: gbdt]
[objective: binary]
[metric: binary_logloss]
[tree_learner: serial]
[device_type: cpu]
[data_sample_strategy: bagging]
[data: ]
[valid: ]
[num_iterations: 1500]
[learning_rate: 0.104222]
[num_leaves: 8]
[num_threads: 4]
[seed: 42]
[deterministic: 0]
[force_col_wise: 0]
[force_row_wise: 0]
[histogram_pool_size: -1]
[max_depth: 3]
[min_data_in_leaf: 17]
[min_sum_hessian_in_leaf: 0.001]
[bagging_fraction: 0.709098]
[pos_bagging_fraction: 1]
[neg_bagging_fraction: 1]
[bagging_freq: 3]
[bagging_seed: 400]
[bagging_by_query: 0]
[feature_fraction: 0.58888]
[feature_fraction_bynode: 1]
[feature_fraction_seed: 30056]
[extra_trees: 0]
[extra_seed: 12879]
[early_stopping_round: 0]
[early_stopping_min_delta: 0]
[first_metric_only: 0]
[max_delta_step: 0]
[lambda_l1: 7.81041e-07]
[lambda_l2: 0.00942891]
[linear_lambda: 0]
[min_gain_to_split: 0]
[drop_rate: 0.1]
[max_drop: 50]
[skip_drop: 0.5]
[xgboost_dart_mode: 0]
[uniform_drop: 0]
[drop_seed: 17869]
[top_rate: 0.2]
[other_rate: 0.1]
[min_data_per_group: 100]
[max_cat_threshold: 32]
[cat_l2: 10]
[cat_smooth: 10]
[max_cat_to_onehot: 4]
[top_k: 20]
[monotone_constraints: ]
[monotone_constraints_method: basic]
[monotone_penalty: 0]
[feature_contri: ]
[forcedsplits_filename: ]
[refit_decay_rate: 0.9]
[cegb_tradeoff: 1]
[cegb_penalty_split: 0]
[cegb_penalty_feature_lazy: ]
[cegb_penalty_feature_coupled: ]
[path_smooth: 0]
[interaction_constraints: ]
[verbosity: -1]
[saved_feature_importance_type: 0]
[use_quantized_grad: 0]
[num_grad_quant_bins: 4]
[quant_train_renew_leaf: 0]
[stochastic_rounding: 1]
[linear_tree: 0]
[max_bin: 255]
[max_bin_by_feature: ]
[min_data_in_bin: 3]
[bin_construct_sample_cnt: 200000]
[data_random_seed: 175]
[is_enable_sparse: 1]
[enable_bundle: 1]
[use_missing: 1]
[zero_as_missing: 0]
[feature_pre_filter: 1]
[pre_partition: 0]
[two_round: 0]
[header: 0]
[label_column: ]
[weight_column: ]
[group_column: ]
[ignore_column: ]
[categorical_feature: ]
[forcedbins_filename: ]
[precise_float_parser: 0]
[parser_config_file: ]
[objective_seed: 16083]
[num_class: 1]
[is_unbalance: 0]
[scale_pos_weight: 1]
[sigmoid: 1]
[boost_from_average: 1]
[reg_sqrt: 0]
[alpha: 0.9]
[fair_c: 1]
[poisson_max_delta_step: 0.7]
[tweedie_variance_power: 1.5]
[lambdarank_truncation_level: 30]
[lambdarank_norm: 1]
[label_gain: ]
[lambdarank_position_bias_regularization: 0]
[eval_at: ]
[multi_error_top_k: 1]
[auc_mu_weights: ]
[num_machines: 1]
[local_listen_port: 12400]
[time_out: 120]
[machine_list_filename: ]
[machines: ]
[gpu_platform_id: -1]
[gpu_device_id: -1]
[gpu_use_dp: 0]
[num_gpu: 1]
end of parameters
pandas_categorical:null
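For readers unfamiliar with the LightGBM text format above, the following is a minimal sketch of how a single tree block is evaluated. It hard-codes the arrays of `Tree=33` from this file and assumes the usual convention that a non-negative child index points at another internal node, while a negative index `c` denotes leaf `~c` (that is, `-c - 1`), with the left branch taken when the feature value is less than or equal to the threshold. Missing-value handling and categorical splits (both encoded in `decision_type`) are deliberately ignored, and `eval_tree` is an illustrative helper, not part of LightGBM.

```python
from typing import Dict, List

# Arrays copied from the Tree=33 block of the model file.
TREE_33: Dict[str, List[float]] = {
    "split_feature": [91, 141, 123, 7],
    "threshold": [0.46333333333333332, -0.77399999999999991,
                  2.2565000000000004, 33.104396552584838],
    "left_child": [1, 3, -3, -1],
    "right_child": [-2, 2, -4, -5],
    "leaf_value": [-0.015732035707484552, -0.12133869515037762,
                   0.0038526132779362976, 0.057630556320633561,
                   0.031446003016811871],
}


def eval_tree(tree: Dict[str, List[float]], x: Dict[int, float]) -> float:
    """Walk one tree; `x` maps column index -> value (absent columns read as 0.0, an assumption)."""
    node = 0
    while node >= 0:  # negative index means we have reached a leaf
        feature = int(tree["split_feature"][node])
        go_left = x.get(feature, 0.0) <= tree["threshold"][node]
        node = int(tree["left_child"][node] if go_left else tree["right_child"][node])
    return tree["leaf_value"][~node]  # ~node == -node - 1


print(eval_tree(TREE_33, {91: 0.5}))                      # right branch at the root
print(eval_tree(TREE_33, {91: 0.3, 141: 0.0, 123: 2.0}))  # root left, then right, then left
```

The booster's raw score is the sum of such leaf values over all trees; with `objective: binary` and `sigmoid: 1`, as in the parameter block, that sum is mapped to a probability by the logistic function.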
+85 -26
@@ -20,6 +20,13 @@ from dataclasses import dataclass, field
 import xgboost as xgb
 import lightgbm as lgb
+import sys
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+try:
+    from config.config_loader import get_config as _get_cfg
+except ImportError:
+    _get_cfg = None  # type: ignore[assignment]
 # CatBoost is optional
 try:
     from catboost import CatBoostClassifier
@@ -228,15 +235,13 @@ class V25Predictor:
         print(f"[V25] Using fallback feature columns ({len(V25Predictor._FALLBACK_FEATURE_COLS)} features)")
         return V25Predictor._FALLBACK_FEATURE_COLS
-    FEATURE_COLS = _load_feature_cols.__func__()
-    # Model weights for ensemble
+    # Model weights for ensemble (overridden from config in __init__)
     DEFAULT_WEIGHTS = {
         'xgb': 0.50,
         'lgb': 0.50,
     }
-    def __init__(self, models_dir: str = None):
+    def __init__(self, models_dir: Optional[str] = None):
         """
         Initialize V25 Predictor.
@@ -246,6 +251,17 @@ class V25Predictor:
         self.models_dir = models_dir or MODELS_DIR
         self.models = {}  # market -> {'xgb': model, 'lgb': model}
         self._loaded = False
+        self.FEATURE_COLS = self._load_feature_cols()
+        # Load weights from config (falls back to class default 0.50/0.50)
+        if _get_cfg is not None:
+            try:
+                cfg = _get_cfg()
+                self.DEFAULT_WEIGHTS = {
+                    'xgb': float(cfg.get('model_ensemble.xgb_weight', 0.50)),
+                    'lgb': float(cfg.get('model_ensemble.lgb_weight', 0.50)),
+                }
+            except Exception:
+                pass  # keep class-level defaults
     # All trained market models available in V25
     ALL_MARKETS = [
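The hunk above only loads the `xgb`/`lgb` weights; how they are consumed is not visible in this diff. As a rough sketch, assuming the ensemble is a weighted average of the two models' probabilities, the blend could look like the following. The `blend` helper and its renormalisation over the models that are actually present are assumptions made for illustration.

```python
from typing import Dict

DEFAULT_WEIGHTS = {"xgb": 0.50, "lgb": 0.50}


def blend(probs: Dict[str, float], weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average over the models present in `probs`.

    Weights are renormalised, so a market with only one loaded model
    still yields that model's probability unchanged.
    """
    used = {name: weights.get(name, 0.0) for name in probs}
    total = sum(used.values())
    if total <= 0:
        raise ValueError("no positive weight for the supplied models")
    return sum(probs[name] * w for name, w in used.items()) / total


print(blend({"xgb": 0.60, "lgb": 0.40}))
print(blend({"xgb": 0.60, "lgb": 0.40}, {"xgb": 0.75, "lgb": 0.25}))
```

Renormalising matters here because the loader may skip one of the two models for a market; without it, a missing model would silently shrink the blended probability.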
@@ -276,21 +292,34 @@ class V25Predictor:
                     xgb_content = f.read()
                 booster = xgb.Booster()
                 booster.load_model(bytearray(xgb_content, 'utf-8'))
-                self.models[market]['xgb'] = booster
-                loaded_count += 1
+                # Corruption detection: verify model can run a dummy prediction
+                try:
+                    _dummy = pd.DataFrame([{col: 0.0 for col in self.FEATURE_COLS}])
+                    booster.predict(xgb.DMatrix(_dummy))
+                    self.models[market]['xgb'] = booster
+                    loaded_count += 1
+                except Exception as _ce:
+                    print(f"[V25] ⚠️ XGB model for {market} failed integrity check: {_ce} — skipping")
             # Load LightGBM (read content in Python to avoid non-ASCII path issues)
             lgb_path = os.path.join(self.models_dir, f'lgb_v25_{market}.txt')
             if os.path.exists(lgb_path) and os.path.getsize(lgb_path) > 0:
                 with open(lgb_path, 'r', encoding='utf-8') as f:
                     model_str = f.read()
-                self.models[market]['lgb'] = lgb.Booster(model_str=model_str)
-                loaded_count += 1
+                lgb_model = lgb.Booster(model_str=model_str)
+                # Corruption detection: verify model can run a dummy prediction
+                try:
+                    _dummy = pd.DataFrame([{col: 0.0 for col in self.FEATURE_COLS}])
+                    lgb_model.predict(_dummy)
+                    self.models[market]['lgb'] = lgb_model
+                    loaded_count += 1
+                except Exception as _ce:
+                    print(f"[V25] ⚠️ LGB model for {market} failed integrity check: {_ce} — skipping")
             # Remove empty entries
             if not self.models[market]:
                 del self.models[market]
         print(f"[V25] Loaded {loaded_count} model files across {len(self.models)} markets: {list(self.models.keys())}")
         self._loaded = loaded_count > 0
         return self._loaded
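The integrity check in this hunk is a smoke test: a model is registered only if it can score one all-zero row. A library-independent sketch of the same pattern is shown below. The `passes_smoke_test` helper is hypothetical, and its extra check that exactly one output comes back goes slightly beyond what the diff does, which only guards against exceptions.

```python
from typing import Callable, List, Sequence


def passes_smoke_test(
    predict: Callable[[List[List[float]]], Sequence[float]],
    n_features: int,
) -> bool:
    """Return True if `predict` accepts one all-zero row and returns one output."""
    try:
        out = predict([[0.0] * n_features])
        return len(out) == 1
    except Exception:
        # Any failure (shape mismatch, corrupt model, ...) means "do not register".
        return False


def healthy(rows: List[List[float]]) -> List[float]:
    return [0.5 for _ in rows]


def broken(rows: List[List[float]]) -> List[float]:
    raise RuntimeError("feature count mismatch")


print(passes_smoke_test(healthy, 3), passes_smoke_test(broken, 3))
```

A check of this kind catches models that cannot run at all, such as a truncated file or a feature-count mismatch. It does not detect a file that loads and predicts but has been altered; a stored checksum would be one way to cover that case.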
@@ -306,7 +335,27 @@ class V25Predictor:
         if not self._loaded:
             if not self.load_models():
                 raise RuntimeError("Failed to load V25 models")
+    def readiness_summary(self) -> Dict[str, Any]:
+        """Return per-market model status for health check endpoint."""
+        if not self._loaded:
+            self.load_models()
+        market_status = {}
+        for market in self.ALL_MARKETS:
+            m = self.models.get(market, {})
+            market_status[market] = {
+                "xgb": "xgb" in m,
+                "lgb": "lgb" in m,
+                "ready": bool(m),
+            }
+        loaded_markets = [k for k, v in market_status.items() if v["ready"]]
+        return {
+            "fully_loaded": len(loaded_markets) == len(self.ALL_MARKETS),
+            "loaded_markets": loaded_markets,
+            "missing_markets": [m for m in self.ALL_MARKETS if m not in loaded_markets],
+            "weights": self.DEFAULT_WEIGHTS,
+        }
     def _prepare_features(self, features: Dict[str, float]) -> pd.DataFrame:
         """Prepare feature vector for prediction."""
         X = pd.DataFrame([{col: features.get(col, 0.0) for col in self.FEATURE_COLS}])
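To make the semantics of `readiness_summary` concrete, here is a standalone sketch of the same bookkeeping on plain dictionaries. The `summarize` function and the sample market names are illustrative only. One consequence worth noticing is that `ready` is `bool(m)`, so a market with just one of its two models loaded still counts as ready.

```python
from typing import Any, Dict, List


def summarize(all_markets: List[str], models: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Mirror of the readiness logic: a market is loaded if it has at least one model."""
    loaded = [m for m in all_markets if models.get(m)]
    return {
        "fully_loaded": len(loaded) == len(all_markets),
        "loaded_markets": loaded,
        "missing_markets": [m for m in all_markets if m not in loaded],
    }


# "ou25" has only its LightGBM model, "btts" has nothing at all.
sample_models = {
    "ms": {"xgb": object(), "lgb": object()},
    "ou25": {"lgb": object()},
}
print(summarize(["ms", "ou25", "btts"], sample_models))
```

If a health check should fail when either model of a pair is missing, the per-market `xgb` and `lgb` flags computed in the method would need to be inspected as well; they are built there but not included in the returned dictionary.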
@@ -412,7 +461,7 @@ class V25Predictor:
         return float(avg_prob), float(1 - avg_prob)
-    def predict_market(self, market: str, features: Dict[str, float]) -> np.ndarray:
+    def predict_market(self, market: str, features: Dict[str, float]) -> Optional[np.ndarray]:
         """
         Generic prediction for any loaded market.
@@ -510,15 +559,15 @@ class V25Predictor:
         # Determine picks
         ms_probs = {'1': home_prob, 'X': draw_prob, '2': away_prob}
-        ms_pick = max(ms_probs, key=ms_probs.get)
+        ms_pick = max(ms_probs, key=ms_probs.__getitem__)
         ms_confidence = ms_probs[ms_pick] * 100
         ou25_probs = {'Over': over_prob, 'Under': under_prob}
-        ou25_pick = max(ou25_probs, key=ou25_probs.get)
+        ou25_pick = max(ou25_probs, key=ou25_probs.__getitem__)
         ou25_confidence = ou25_probs[ou25_pick] * 100
         btts_probs = {'Yes': btts_yes_prob, 'No': btts_no_prob}
-        btts_pick = max(btts_probs, key=btts_probs.get)
+        btts_pick = max(btts_probs, key=btts_probs.__getitem__)
         btts_confidence = btts_probs[btts_pick] * 100
         # Create prediction
@@ -564,13 +613,23 @@ class V25Predictor:
     ) -> List[ValueBet]:
         """Detect value bets based on model vs market odds."""
         value_bets = []
-        min_edge = 0.05  # 5% minimum edge
+        # Market-specific minimum edge thresholds
+        # MS: higher variance → require more edge
+        # OU/BTTS: binary markets → tighter edge acceptable
+        EDGE_THRESHOLDS = {
+            'MS': 0.06,
+            'OU25': 0.04,
+            'BTTS': 0.04,
+        }
+        ms_edge = EDGE_THRESHOLDS['MS']
+        ou_edge = EDGE_THRESHOLDS['OU25']
+        btts_edge = EDGE_THRESHOLDS['BTTS']
         # MS value bets
         if 'ms_h' in odds and odds['ms_h'] > 0:
             implied = 1 / odds['ms_h']
             edge = home_prob - implied
-            if edge > min_edge:
+            if edge > ms_edge:
                 value_bets.append(ValueBet(
                     market_type='MS',
                     pick='1',
@@ -583,7 +642,7 @@ class V25Predictor:
         if 'ms_d' in odds and odds['ms_d'] > 0:
             implied = 1 / odds['ms_d']
             edge = draw_prob - implied
-            if edge > min_edge:
+            if edge > ms_edge:
                 value_bets.append(ValueBet(
                     market_type='MS',
                     pick='X',
@@ -596,7 +655,7 @@ class V25Predictor:
         if 'ms_a' in odds and odds['ms_a'] > 0:
             implied = 1 / odds['ms_a']
             edge = away_prob - implied
-            if edge > min_edge:
+            if edge > ms_edge:
                 value_bets.append(ValueBet(
                     market_type='MS',
                     pick='2',
@@ -610,7 +669,7 @@ class V25Predictor:
         if 'ou25_o' in odds and odds['ou25_o'] > 0:
             implied = 1 / odds['ou25_o']
             edge = over_prob - implied
-            if edge > min_edge:
+            if edge > ou_edge:
                 value_bets.append(ValueBet(
                     market_type='OU25',
                     pick='Over',
@@ -623,7 +682,7 @@ class V25Predictor:
         if 'ou25_u' in odds and odds['ou25_u'] > 0:
             implied = 1 / odds['ou25_u']
             edge = under_prob - implied
-            if edge > min_edge:
+            if edge > ou_edge:
                 value_bets.append(ValueBet(
                     market_type='OU25',
                     pick='Under',
@@ -637,7 +696,7 @@ class V25Predictor:
         if 'btts_y' in odds and odds['btts_y'] > 0:
             implied = 1 / odds['btts_y']
             edge = btts_yes_prob - implied
-            if edge > min_edge:
+            if edge > btts_edge:
                 value_bets.append(ValueBet(
                     market_type='BTTS',
                     pick='Yes',
@@ -650,7 +709,7 @@ class V25Predictor:
         if 'btts_n' in odds and odds['btts_n'] > 0:
             implied = 1 / odds['btts_n']
             edge = btts_no_prob - implied
-            if edge > min_edge:
+            if edge > btts_edge:
                 value_bets.append(ValueBet(
                     market_type='BTTS',
                     pick='No',
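The hunks above repeat one rule per outcome: the implied probability is `1 / odds`, the edge is the model probability minus that value, and a bet is flagged when the edge exceeds the market's threshold. A compact sketch of that rule follows, using the thresholds from the diff. The `find_value` helper and the sample probabilities and odds are made up for illustration.

```python
from typing import Dict, List, Tuple

# Thresholds as introduced in the diff.
EDGE_THRESHOLDS = {"MS": 0.06, "OU25": 0.04, "BTTS": 0.04}


def find_value(
    market: str, probs: Dict[str, float], odds: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (pick, edge) pairs whose edge exceeds the market's threshold."""
    hits = []
    for pick, prob in probs.items():
        price = odds.get(pick, 0.0)
        if price <= 0:  # missing or invalid odds: skip, as the original guards do
            continue
        edge = prob - 1.0 / price
        if edge > EDGE_THRESHOLDS[market]:
            hits.append((pick, round(edge, 4)))
    return hits


# Hypothetical match: only the home win clears the 6% MS threshold.
print(find_value("MS", {"1": 0.50, "X": 0.25, "2": 0.25},
                 {"1": 2.40, "X": 3.40, "2": 3.10}))
# An edge of about 4.5% passes the new 4% OU25 threshold but would have failed the old flat 5%.
print(find_value("OU25", {"Over": 0.545}, {"Over": 2.00}))
```

Note that `1 / odds` includes the bookmaker's margin, so the implied probabilities of all outcomes in a market sum to more than 1; the edge computed this way is therefore measured against a slightly inflated baseline.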
Binary file not shown.
+160
@@ -0,0 +1,160 @@
{
"total_test": 23039,
"thresholds": {
"0.0": {
"n_matches": 22227,
"pct": 96.5,
"markets": {
"ms": {
"hit_rate": 0.5363,
"avg_roi": -0.0046,
"total_roi": -103.02
},
"ou15": {
"hit_rate": 0.7463,
"avg_roi": 0.0144,
"total_roi": 319.02
},
"ou25": {
"hit_rate": 0.6111,
"avg_roi": -0.006,
"total_roi": -134.41
},
"ou35": {
"hit_rate": 0.7302,
"avg_roi": -0.014,
"total_roi": -310.51
},
"btts": {
"hit_rate": 0.5848,
"avg_roi": 0.0031,
"total_roi": 69.5
}
}
},
"0.1": {
"n_matches": 23033,
"pct": 100.0,
"markets": {
"ms": {
"hit_rate": 0.546,
"avg_roi": -0.0045,
"total_roi": -104.38
},
"ou15": {
"hit_rate": 0.7533,
"avg_roi": 0.0145,
"total_roi": 335.02
},
"ou25": {
"hit_rate": 0.6193,
"avg_roi": -0.0042,
"total_roi": -96.97
},
"ou35": {
"hit_rate": 0.7277,
"avg_roi": -0.0147,
"total_roi": -338.57
},
"btts": {
"hit_rate": 0.5886,
"avg_roi": 0.0025,
"total_roi": 57.21
}
}
},
"0.2": {
"n_matches": 23034,
"pct": 100.0,
"markets": {
"ms": {
"hit_rate": 0.5459,
"avg_roi": -0.0046,
"total_roi": -105.38
},
"ou15": {
"hit_rate": 0.7533,
"avg_roi": 0.0146,
"total_roi": 335.26
},
"ou25": {
"hit_rate": 0.6193,
"avg_roi": -0.0043,
"total_roi": -97.97
},
"ou35": {
"hit_rate": 0.7276,
"avg_roi": -0.0147,
"total_roi": -339.57
},
"btts": {
"hit_rate": 0.5887,
"avg_roi": 0.0025,
"total_roi": 57.62
}
}
},
"0.3": {
"n_matches": 23039,
"pct": 100.0,
"markets": {
"ms": {
"hit_rate": 0.546,
"avg_roi": -0.0045,
"total_roi": -103.45
},
"ou15": {
"hit_rate": 0.7534,
"avg_roi": 0.0146,
"total_roi": 335.6
},
"ou25": {
"hit_rate": 0.6194,
"avg_roi": -0.0042,
"total_roi": -97.44
},
"ou35": {
"hit_rate": 0.7277,
"avg_roi": -0.0147,
"total_roi": -339.26
},
"btts": {
"hit_rate": 0.5887,
"avg_roi": 0.0025,
"total_roi": 58.61
}
}
},
"0.5": {
"n_matches": 23039,
"pct": 100.0,
"markets": {
"ms": {
"hit_rate": 0.546,
"avg_roi": -0.0045,
"total_roi": -103.45
},
"ou15": {
"hit_rate": 0.7534,
"avg_roi": 0.0146,
"total_roi": 335.6
},
"ou25": {
"hit_rate": 0.6194,
"avg_roi": -0.0042,
"total_roi": -97.44
},
"ou35": {
"hit_rate": 0.7277,
"avg_roi": -0.0147,
"total_roi": -339.26
},
"btts": {
"hit_rate": 0.5887,
"avg_roi": 0.0025,
"total_roi": 58.61
}
}
}
}
}
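One way to read the report above is to check that each `total_roi` is roughly `n_matches * avg_roi`, which holds if every match carries a single unit-stake bet per market. That staking assumption is not stated in the file; it is inferred from the numbers. The sketch below runs the check for the `"0.0"` threshold row, and `is_consistent` is an illustrative helper.

```python
from typing import Dict, Tuple

N_MATCHES = 22227  # "n_matches" of the "0.0" threshold row

# market -> (avg_roi, total_roi), copied from the "0.0" threshold row
ROW_0_0: Dict[str, Tuple[float, float]] = {
    "ms": (-0.0046, -103.02),
    "ou15": (0.0144, 319.02),
    "ou25": (-0.006, -134.41),
    "ou35": (-0.014, -310.51),
    "btts": (0.0031, 69.5),
}


def is_consistent(n: int, avg_roi: float, total_roi: float, decimals: int = 4) -> bool:
    """True if total_roi matches n * avg_roi within the rounding error of avg_roi."""
    # avg_roi is rounded to `decimals` places, so each match contributes
    # up to half a unit in the last place; 0.01 covers total_roi's own rounding.
    tolerance = n * 0.5 * 10 ** -decimals + 0.01
    return abs(n * avg_roi - total_roi) <= tolerance


for market, (avg_roi, total_roi) in ROW_0_0.items():
    print(market, is_consistent(N_MATCHES, avg_roi, total_roi))
```

Under the same unit-stake reading, an `avg_roi` of -0.0046 on the `ms` market means a loss of about 0.46% of turnover, despite a hit rate above 53%.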
@@ -0,0 +1,5 @@
[
{
"market": "MS-Ev",
"min_edge": 0.02,
"n":
+267
@@ -0,0 +1,267 @@
{
"generated_at": "2026-05-15T21:40:57.995899",
"matches_processed": 3000,
"matches_skipped": 0,
"markets": {
"MS": {
"overall_accuracy": 54.97,
"total_matches": 3000,
"by_confidence_band": {
"<50%": {
"accuracy": 38.87,
"count": 759,
"mean_confidence": 45.58
},
"50-65%": {
"accuracy": 52.62,
"count": 1300,
"mean_confidence": 57.19
},
"65-75%": {
"accuracy": 66.99,
"count": 624,
"mean_confidence": 69.49
},
"75%+": {
"accuracy": 79.5,
"count": 317,
"mean_confidence": 80.69
}
},
"by_league": {
"Bundesliga": {
"accuracy": 46.77,
"count": 62
},
"Ligue 1": {
"accuracy": 58.73,
"count": 63
},
"Serie A": {
"accuracy": 56.25,
"count": 64
},
"Other": {
"accuracy": 55.03,
"count": 2811
}
},
"by_pick_direction": {
"1": {
"accuracy": 58.38,
"count": 1946,
"mean_confidence": 60.84
},
"2": {
"accuracy": 48.72,
"count": 1053,
"mean_confidence": 56.44
},
"X": {
"accuracy": 0.0,
"count": 1,
"mean_confidence": 56.07
}
}
},
"OU15": {
"overall_accuracy": 74.4,
"total_matches": 3000,
"by_confidence_band": {
"50-65%": {
"accuracy": 70.97,
"count": 62,
"mean_confidence": 59.63
},
"65-75%": {
"accuracy": 68.0,
"count": 275,
"mean_confidence": 71.1
},
"75%+": {
"accuracy": 75.14,
"count": 2663,
"mean_confidence": 89.44
}
},
"by_league": {
"Bundesliga": {
"accuracy": 67.74,
"count": 62
},
"Ligue 1": {
"accuracy": 76.19,
"count": 63
},
"Serie A": {
"accuracy": 70.31,
"count": 64
},
"Other": {
"accuracy": 74.6,
"count": 2811
}
},
"by_pick_direction": {
"Over": {
"accuracy": 74.4,
"count": 3000,
"mean_confidence": 87.14
}
}
},
"OU25": {
"overall_accuracy": 51.77,
"total_matches": 3000,
"by_confidence_band": {
"50-65%": {
"accuracy": 49.33,
"count": 1267,
"mean_confidence": 57.13
},
"65-75%": {
"accuracy": 54.53,
"count": 453,
"mean_confidence": 69.42
},
"75%+": {
"accuracy": 53.2,
"count": 1280,
"mean_confidence": 90.2
}
},
"by_league": {
"Bundesliga": {
"accuracy": 41.94,
"count": 62
},
"Ligue 1": {
"accuracy": 50.79,
"count": 63
},
"Serie A": {
"accuracy": 43.75,
"count": 64
},
"Other": {
"accuracy": 52.19,
"count": 2811
}
},
"by_pick_direction": {
"Over": {
"accuracy": 51.03,
"count": 2432,
"mean_confidence": 76.11
},
"Under": {
"accuracy": 54.93,
"count": 568,
"mean_confidence": 60.17
}
}
},
"BTTS": {
"overall_accuracy": 51.83,
"total_matches": 3000,
"by_confidence_band": {
"50-65%": {
"accuracy": 48.74,
"count": 2214,
"mean_confidence": 58.66
},
"65-75%": {
"accuracy": 60.42,
"count": 758,
"mean_confidence": 68.19
},
"75%+": {
"accuracy": 64.29,
"count": 28,
"mean_confidence": 77.44
}
},
"by_league": {
"Bundesliga": {
"accuracy": 54.84,
"count": 62
},
"Ligue 1": {
"accuracy": 50.79,
"count": 63
},
"Serie A": {
"accuracy": 57.81,
"count": 64
},
"Other": {
"accuracy": 51.65,
"count": 2811
}
},
"by_pick_direction": {
"No": {
"accuracy": 50.26,
"count": 2099,
"mean_confidence": 61.56
},
"Yes": {
"accuracy": 55.49,
"count": 901,
"mean_confidence": 60.51
}
}
}
},
"calibration": {
"ms_home": {
"brier_score": 0.2054,
"calibration_error": 0.0,
"sample_count": 3000,
"last_trained": "2026-05-15T21:40:58.026574",
"mean_predicted": 0.4942,
"mean_actual": 0.46
},
"ms_draw": {
"brier_score": 0.1846,
"calibration_error": 0.0,
"sample_count": 3000,
"last_trained": "2026-05-15T21:40:58.030886",
"mean_predicted": 0.149,
"mean_actual": 0.2493
},
"ms_away": {
"brier_score": 0.1726,
"calibration_error": 0.0,
"sample_count": 3000,
"last_trained": "2026-05-15T21:40:58.033980",
"mean_predicted": 0.3567,
"mean_actual": 0.2907
},
"ou15": {
"brier_score": 0.1884,
"calibration_error": 0.0,
"sample_count": 3000,
"last_trained": "2026-05-15T21:40:58.037204",
"mean_predicted": 0.8714,
"mean_actual": 0.744
},
"ou25": {
"brier_score": 0.247,
"calibration_error": 0.0,
"sample_count": 3000,
"last_trained": "2026-05-15T21:40:58.041152",
"mean_predicted": 0.6924,
"mean_actual": 0.499
},
"btts": {
"brier_score": 0.2453,
"calibration_error": 0.0,
"sample_count": 3000,
"last_trained": "2026-05-15T21:40:58.044344",
"mean_predicted": 0.4506,
"mean_actual": 0.5147
}
},
"runtime_seconds": 94.1
}
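The `calibration` block reports a Brier score together with `mean_predicted` and `mean_actual` for each target. As a reminder of what those figures measure, the sketch below computes a Brier score on toy data and the gap between mean prediction and observed frequency for the values reported above. The function names and the toy forecasts are illustrative; only the `REPORTED` numbers come from the file.

```python
from typing import Dict, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equally long")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# target -> (mean_predicted, mean_actual), copied from the report
REPORTED: Dict[str, Tuple[float, float]] = {
    "ms_home": (0.4942, 0.46),
    "ms_draw": (0.149, 0.2493),
    "ms_away": (0.3567, 0.2907),
    "ou15": (0.8714, 0.744),
    "ou25": (0.6924, 0.499),
    "btts": (0.4506, 0.5147),
}

# A constant 0.5 forecast on a balanced binary target scores exactly 0.25.
print(brier_score([0.5, 0.5], [1, 0]))

for target, (predicted, actual) in REPORTED.items():
    # Positive gap: the model predicts the event more often than it occurs.
    print(f"{target}: gap = {predicted - actual:+.4f}")
```

Read this way, the reported `ou25` and `btts` Brier scores (0.247 and 0.2453) sit close to the 0.25 of an uninformative coin-flip forecast, and the `ou25` mean prediction exceeds the observed frequency by roughly 19 percentage points.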
+40
@@ -0,0 +1,40 @@
"""
MatchData dataclass: core data transfer object used throughout the engine.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MatchData:
    match_id: str
    home_team_id: str
    away_team_id: str
    home_team_name: str
    away_team_name: str
    match_date_ms: int
    sport: str
    league_id: Optional[str]
    league_name: str
    referee_name: Optional[str]
    odds_data: Dict[str, float]
    home_lineup: Optional[List[str]]
    away_lineup: Optional[List[str]]
    sidelined_data: Optional[Dict[str, Any]]
    home_goals_avg: float
    home_conceded_avg: float
    away_goals_avg: float
    away_conceded_avg: float
    home_position: int
    away_position: int
    lineup_source: str
    status: str = ""
    state: Optional[str] = None
    substate: Optional[str] = None
    current_score_home: Optional[int] = None
    current_score_away: Optional[int] = None
    lineup_confidence: float = 0.0
    source_table: str = "matches"
+292
@@ -0,0 +1,292 @@
"""
Shared prediction dataclasses used across the AI engine.

These were originally defined in models/v20_ensemble.py and are extracted here
so they can be used without importing the full V20 ensemble.
"""
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

from core.calculators.score_calculator import ScorePrediction


@dataclass
class MarketPrediction:
    """Prediction for a single betting market."""
    market_type: str
    pick: str
    probability: float
    confidence: float
    odds: float = 0.0
    is_recommended: bool = False
    is_value_bet: bool = False
    edge: float = 0.0  # Expected edge over market

    def to_dict(self) -> dict:
        return {
            "market_type": self.market_type,
            "pick": self.pick,
            "probability": round(self.probability * 100, 1),
            "confidence": round(self.confidence, 1),
            "odds": self.odds,
            "is_recommended": self.is_recommended,
            "is_value_bet": self.is_value_bet,
            "edge": round(self.edge, 1)
        }
@dataclass
class FullMatchPrediction:
    """Complete prediction for a match with ALL markets."""
    match_id: str
    home_team: str
    away_team: str
    match_date: str = ""

    # === MATCH RESULT (1X2) ===
    ms_home_prob: float = 0.33
    ms_draw_prob: float = 0.33
    ms_away_prob: float = 0.33
    ms_pick: str = ""
    ms_confidence: float = 0.0

    # === DOUBLE CHANCE ===
    dc_1x_prob: float = 0.66
    dc_x2_prob: float = 0.66
    dc_12_prob: float = 0.66
    dc_pick: str = ""
    dc_confidence: float = 0.0

    # === OVER/UNDER GOALS ===
    # 1.5
    over_15_prob: float = 0.70
    under_15_prob: float = 0.30
    ou15_pick: str = ""
    ou15_confidence: float = 0.0
    # 2.5
    over_25_prob: float = 0.50
    under_25_prob: float = 0.50
    ou25_pick: str = ""
    ou25_confidence: float = 0.0
    # 3.5
    over_35_prob: float = 0.30
    under_35_prob: float = 0.70
    ou35_pick: str = ""
    ou35_confidence: float = 0.0

    # === BOTH TEAMS TO SCORE (BTTS) ===
    btts_yes_prob: float = 0.50
    btts_no_prob: float = 0.50
    btts_pick: str = ""
    btts_confidence: float = 0.0

    # === FIRST-HALF RESULT ===
    ht_home_prob: float = 0.30
    ht_draw_prob: float = 0.40
    ht_away_prob: float = 0.30
    ht_pick: str = ""
    ht_confidence: float = 0.0

    # === SCORE PREDICTIONS ===
    score: Optional[ScorePrediction] = None
    predicted_ft_score: str = "1-1"
    predicted_ht_score: str = "0-0"
    ft_scores_top5: List[Dict] = field(default_factory=list)

    # === xG (Expected Goals) ===
    home_xg: float = 1.3
    away_xg: float = 1.1
    total_xg: float = 2.4

    # === RISK ASSESSMENT ===
    risk_level: str = "MEDIUM"  # LOW, MEDIUM, HIGH, EXTREME
    risk_score: float = 0.0
    is_surprise_risk: bool = False
    surprise_type: str = ""
    risk_warnings: List[str] = field(default_factory=list)
    ht_ft_probs: Dict[str, float] = field(default_factory=dict)

    # === GLM-5 UPSET SCORE ===
    upset_score: int = 0  # upset score in the range 0-100
    upset_level: str = "LOW"  # LOW, MEDIUM, HIGH, EXTREME
    upset_reasons: List[str] = field(default_factory=list)

    # === SURPRISE PROFILE ===
    surprise_score: float = 0.0  # 0-100 overall surprise risk score
    surprise_comment: str = ""  # Human-readable surprise commentary
    surprise_reasons: List[str] = field(default_factory=list)  # Flagged risk reasons
    surprise_breakdown: List[Dict[str, Any]] = field(default_factory=list)  # Per-factor {code, points, label}

    # === ENGINE CONTRIBUTIONS ===
    team_confidence: float = 0.0
    player_confidence: float = 0.0
    odds_confidence: float = 0.0
    referee_confidence: float = 0.0

    # === CORNERS, CARDS & OTHER ===
    total_corners_pred: float = 9.5
    corner_pick: str = "9.5 Üst"
    total_cards_pred: float = 4.5
    card_pick: str = "4.5 Alt"
    cards_over_prob: float = 0.50
    cards_under_prob: float = 0.50
    cards_confidence: float = 0.0
    handicap_pick: str = ""
    handicap_home_prob: float = 0.33
    handicap_draw_prob: float = 0.34
    handicap_away_prob: float = 0.33
    handicap_confidence: float = 0.0
    ht_over_05_prob: float = 0.65
    ht_under_05_prob: float = 0.35
    ht_over_15_prob: float = 0.30
    ht_under_15_prob: float = 0.70
    ht_ou_pick: str = "İY 0.5 Üst"
    ht_ou15_pick: str = "İY 1.5 Alt"
    odd_even_pick: str = "Çift"
    odd_prob: float = 0.50  # Probability of an odd goal total
    even_prob: float = 0.50  # Probability of an even goal total

    # === RECOMMENDATIONS ===
    best_bet: Optional[MarketPrediction] = None
    recommended_bets: List[MarketPrediction] = field(default_factory=list)
    alternative_bet: Optional[MarketPrediction] = None
    expert_recommendation: Dict[str, Any] = field(default_factory=dict)

    # === DETAILED ANALYSIS ===
    analysis_details: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {
            "match_info": {
                "match_id": self.match_id,
                "home_team": self.home_team,
                "away_team": self.away_team,
                "match_date": self.match_date
            },
            "predictions": {
                "match_result": {
                    "1": round(self.ms_home_prob * 100, 1),
                    "X": round(self.ms_draw_prob * 100, 1),
                    "2": round(self.ms_away_prob * 100, 1),
                    "pick": self.ms_pick,
                    "confidence": round(self.ms_confidence, 1)
                },
                "double_chance": {
                    "1X": round(self.dc_1x_prob * 100, 1),
                    "X2": round(self.dc_x2_prob * 100, 1),
                    "12": round(self.dc_12_prob * 100, 1),
                    "pick": self.dc_pick,
                    "confidence": round(self.dc_confidence, 1)
                },
                "over_under": {
                    "1.5": {
                        "over": round(self.over_15_prob * 100, 1),
                        "under": round(self.under_15_prob * 100, 1),
                        "pick": self.ou15_pick,
                        "confidence": round(self.ou15_confidence, 1)
                    },
                    "2.5": {
                        "over": round(self.over_25_prob * 100, 1),
                        "under": round(self.under_25_prob * 100, 1),
                        "pick": self.ou25_pick,
                        "confidence": round(self.ou25_confidence, 1)
                    },
                    "3.5": {
                        "over": round(self.over_35_prob * 100, 1),
                        "under": round(self.under_35_prob * 100, 1),
                        "pick": self.ou35_pick,
                        "confidence": round(self.ou35_confidence, 1)
                    }
                },
                "btts": {
                    "yes": round(self.btts_yes_prob * 100, 1),
                    "no": round(self.btts_no_prob * 100, 1),
                    "pick": self.btts_pick,
                    "confidence": round(self.btts_confidence, 1)
                },
                "first_half": {
                    "1": round(self.ht_home_prob * 100, 1),
                    "X": round(self.ht_draw_prob * 100, 1),
                    "2": round(self.ht_away_prob * 100, 1),
                    "pick": self.ht_pick,
                    "confidence": round(self.ht_confidence, 1),
                    "over_under_05": {
                        "over": round(self.ht_over_05_prob * 100, 1),
                        "under": round(self.ht_under_05_prob * 100, 1),
                        "pick": self.ht_ou_pick
                    },
                    "over_under_15": {
                        "over": round(self.ht_over_15_prob * 100, 1),
                        "under": round(self.ht_under_15_prob * 100, 1),
                        "pick": self.ht_ou15_pick
                    }
                },
                "scores": {
                    "predicted_ft": self.predicted_ft_score,
                    "predicted_ht": self.predicted_ht_score,
                    "top_5_ft_scores": self.ft_scores_top5
                },
                "others": {
                    "handicap": {
                        "pick": self.handicap_pick,
                        "confidence": round(self.handicap_confidence, 1),
                        "home": round(self.handicap_home_prob * 100, 1),
                        "draw": round(self.handicap_draw_prob * 100, 1),
                        "away": round(self.handicap_away_prob * 100, 1)
                    },
                    "corners": {
                        "total": round(self.total_corners_pred, 1),
                        "pick": self.corner_pick
                    },
                    "cards": {
                        "total": round(self.total_cards_pred, 1),
                        "pick": self.card_pick,
                        "confidence": round(self.cards_confidence, 1),
                        "over": round(self.cards_over_prob * 100, 1),
                        "under": round(self.cards_under_prob * 100, 1)
                    },
                    "odd_even": {
                        "pick": self.odd_even_pick,
                        "tek": round(self.odd_prob * 100, 1),
                        "cift": round(self.even_prob * 100, 1)
                    }
                },
                "xg": {
                    "home": round(self.home_xg, 2),
                    "away": round(self.away_xg, 2),
                    "total": round(self.total_xg, 2)
                }
            },
            "risk": {
                "level": self.risk_level,
                "score": round(self.risk_score, 1),
                "is_surprise_risk": self.is_surprise_risk,
                "surprise_type": self.surprise_type,
                "ht_ft_probs": {k: round(v * 100, 1) for k, v in self.ht_ft_probs.items()} if self.ht_ft_probs else {},
                "warnings": self.risk_warnings
            },
            "upset_analysis": {
                "score": self.upset_score,
                "level": self.upset_level,
                "reasons": self.upset_reasons
            },
            "engine_breakdown": {
                "team_engine": round(self.team_confidence, 1),
                "player_engine": round(self.player_confidence, 1),
                "odds_engine": round(self.odds_confidence, 1),
                "referee_engine": round(self.referee_confidence, 1)
            },
            "recommendations": {
                "best_bet": self.best_bet.to_dict() if self.best_bet else None,
                "all_recommended": [b.to_dict() for b in self.recommended_bets] if self.recommended_bets else [],
                "alternative_bet": self.alternative_bet.to_dict() if self.alternative_bet else None
            },
            "analysis_details": self.analysis_details
        }
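The serializers in this file mix two unit conventions: probabilities are stored as fractions and multiplied by 100 on output, while `confidence` and `edge` are rounded as they are. The sketch below uses a trimmed copy of `MarketPrediction` (only the fields needed here) to show what that implies for `edge`. Whether callers store `edge` as a fraction or as a percentage is not visible in this diff, so both cases are shown and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarketPrediction:
    """Trimmed copy of the dataclass above, keeping only the fields used here."""
    market_type: str
    pick: str
    probability: float
    confidence: float
    edge: float = 0.0

    def to_dict(self) -> dict:
        return {
            "probability": round(self.probability * 100, 1),  # fraction -> percent
            "confidence": round(self.confidence, 1),          # already a percentage
            "edge": round(self.edge, 1),                      # rounded as stored
        }


as_fraction = MarketPrediction("MS", "1", 0.50, 50.0, edge=0.0833).to_dict()
as_percent = MarketPrediction("MS", "1", 0.50, 50.0, edge=8.33).to_dict()
print(as_fraction["edge"], as_percent["edge"])
```

If `edge` arrives as a fraction, which is how the V25 value-bet detector computes it (probability minus implied probability), rounding to one decimal collapses most realistic edges to 0.0 or 0.1. Converting to a percentage before rounding, as is done for `probability`, would preserve the information.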
+510
@@ -0,0 +1,510 @@
"""
Calibration Backfill Script
============================
Runs V25 model against historical matches (using pre-computed ai_features + odds)
to generate calibration training data, then trains isotonic calibration models.
Usage:
python ai-engine/scripts/backfill_calibration.py
python ai-engine/scripts/backfill_calibration.py --limit 5000
python ai-engine/scripts/backfill_calibration.py --min-samples 50
"""
import argparse
import json
import os
import sys
import time
from typing import Any, Dict, List, Optional, Tuple
import numpy as np
import pandas as pd
import psycopg2
from psycopg2.extras import RealDictCursor
from dotenv import load_dotenv
AI_ENGINE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, AI_ENGINE_DIR)
from models.v25_ensemble import V25Predictor
from models.calibration import get_calibrator
load_dotenv()
def _normalize_pick(pick) -> str:
    return str(pick or "").strip().casefold()


def resolve_actual(market, pick, score_home, score_away, ht_home, ht_away):
    if score_home is None or score_away is None:
        return None
    market = (market or "").upper()
    p = _normalize_pick(pick)
    total = score_home + score_away
    ht_total = (ht_home or 0) + (ht_away or 0) if ht_home is not None else None
    if market == "MS":
        if p == "1": return int(score_home > score_away)
        if p in {"x", "0"}: return int(score_home == score_away)
        if p == "2": return int(score_away > score_home)
        return None
    if market in {"OU15", "OU25", "OU35"}:
        line = {"OU15": 1.5, "OU25": 2.5, "OU35": 3.5}[market]
        if "over" in p or "üst" in p or "ust" in p: return int(total > line)
        if "under" in p or "alt" in p: return int(total < line)
        return None
    if market == "BTTS":
        both = score_home > 0 and score_away > 0
        if "yes" in p or "var" in p: return int(both)
        if "no" in p or "yok" in p: return int(not both)
        return None
    if market == "HT":
        if ht_home is None or ht_away is None: return None
        if p == "1": return int(ht_home > ht_away)
        if p in {"x", "0"}: return int(ht_home == ht_away)
        if p == "2": return int(ht_away > ht_home)
        return None
    if market == "HTFT":
        if ht_home is None or ht_away is None or "/" not in p: return None
        ht_p, ft_p = p.split("/")
        ht_actual = "1" if ht_home > ht_away else "2" if ht_away > ht_home else "x"
        ft_actual = "1" if score_home > score_away else "2" if score_away > score_home else "x"
        return int(ht_p.strip() == ht_actual and ft_p.strip() == ft_actual)
    if market == "DC":
        norm = p.replace("-", "").upper()
        if norm == "1X": return int(score_home >= score_away)
        if norm == "X2": return int(score_away >= score_home)
        if norm == "12": return int(score_home != score_away)
        return None
    return None


def calibrator_key(market, pick):
    m = (market or "").upper()
    p = _normalize_pick(pick)
    # Accept the same "over" spellings as resolve_actual, including ASCII "ust".
    is_over = "over" in p or "üst" in p or "ust" in p
    if m == "MS":
        if p == "1": return "ms_home"
        if p in {"x", "0"}: return "ms_draw"
        if p == "2": return "ms_away"
        return None
    if m == "DC": return "dc"
    if m == "OU15" and is_over: return "ou15"
    if m == "OU25" and is_over: return "ou25"
    if m == "OU35" and is_over: return "ou35"
    if m == "BTTS" and ("yes" in p or "var" in p): return "btts"
    if m == "HT":
        if p == "1": return "ht_home"
        if p in {"x", "0"}: return "ht_draw"
        if p == "2": return "ht_away"
        return None
    if m == "HTFT": return "ht_ft"
    return None
def get_conn():
    db_url = os.getenv("DATABASE_URL", "")
    # Strip the Prisma-style "?schema=" suffix before handing the URL to psycopg2.
    if "?schema=" in db_url:
        db_url = db_url.split("?schema=")[0]
    if not db_url:
        raise ValueError("DATABASE_URL not set")
    return psycopg2.connect(db_url, cursor_factory=RealDictCursor)
ODD_CAT_MAP = {
"maç sonucu": {"1": "ms_h", "0": "ms_d", "x": "ms_d", "2": "ms_a"},
"1. yarı sonucu": {"1": "ht_ms_h", "0": "ht_ms_d", "x": "ht_ms_d", "2": "ht_ms_a"},
}
ODD_CAT_KEYWORD_MAP = {
"karşılıklı gol": {"var": "btts_y", "yok": "btts_n"},
"0,5 alt/üst": {"alt": "ou05_u", "üst": "ou05_o"},
"1,5 alt/üst": {"alt": "ou15_u", "üst": "ou15_o"},
"2,5 alt/üst": {"alt": "ou25_u", "üst": "ou25_o"},
"3,5 alt/üst": {"alt": "ou35_u", "üst": "ou35_o"},
"ilk yarı 0,5 alt/üst": {"alt": "ht_ou05_u", "üst": "ht_ou05_o"},
"ilk yarı 1,5 alt/üst": {"alt": "ht_ou15_u", "üst": "ht_ou15_o"},
}
def load_matches(cur, limit: int) -> List[Dict]:
cur.execute("""
SELECT m.id, m.score_home, m.score_away,
m.ht_score_home, m.ht_score_away
FROM matches m
JOIN football_ai_features f ON f.match_id = m.id
WHERE m.status = 'FT'
AND m.sport = 'football'
AND m.score_home IS NOT NULL
AND m.score_away IS NOT NULL
ORDER BY m.mst_utc DESC
LIMIT %s
""", (limit,))
return cur.fetchall()
def load_ai_features_batch(cur, match_ids: List[str]) -> Dict[str, Dict]:
if not match_ids:
return {}
ph = ",".join(["%s"] * len(match_ids))
cur.execute(f"""
SELECT match_id,
home_elo AS home_overall_elo,
away_elo AS away_overall_elo,
elo_diff,
home_home_elo, away_away_elo,
home_form_elo, away_form_elo,
(home_form_elo - away_form_elo) AS form_elo_diff,
home_goals_avg_5 AS home_goals_avg,
home_conceded_avg_5 AS home_conceded_avg,
away_goals_avg_5 AS away_goals_avg,
away_conceded_avg_5 AS away_conceded_avg,
home_clean_sheet_rate, away_clean_sheet_rate,
home_scoring_rate, away_scoring_rate,
home_win_streak AS home_winning_streak,
away_win_streak AS away_winning_streak,
0 AS home_unbeaten_streak,
0 AS away_unbeaten_streak,
h2h_total AS h2h_total_matches,
h2h_home_win_rate,
(1.0 - h2h_home_win_rate - 0.33) AS h2h_draw_rate,
h2h_avg_goals,
h2h_btts_rate, h2h_over25_rate,
home_avg_possession, away_avg_possession,
home_avg_shots_on_target, away_avg_shots_on_target,
home_shot_conversion, away_shot_conversion,
0.0 AS home_avg_corners, 0.0 AS away_avg_corners,
implied_home, implied_draw, implied_away,
league_avg_goals,
0.0 AS league_zero_goal_rate,
0.0 AS home_xga, 0.0 AS away_xga,
0.0 AS upset_atmosphere, 0.0 AS upset_motivation,
0.0 AS upset_fatigue, 0.0 AS upset_potential,
referee_home_bias, referee_avg_goals,
referee_avg_cards AS referee_cards_total,
0.0 AS referee_avg_yellow,
0.0 AS referee_experience,
0.0 AS home_momentum_score, 0.0 AS away_momentum_score,
0.0 AS momentum_diff,
0.0 AS home_squad_quality, 0.0 AS away_squad_quality,
0.0 AS squad_diff,
0 AS home_key_players, 0 AS away_key_players,
missing_players_impact AS home_missing_impact,
0.0 AS away_missing_impact,
home_goals_avg_5 AS home_goals_form,
away_goals_avg_5 AS away_goals_form
FROM football_ai_features
WHERE match_id IN ({ph})
""", match_ids)
return {str(row["match_id"]): dict(row) for row in cur.fetchall()}
def load_odds_batch(cur, match_ids: List[str]) -> Dict[str, Dict[str, float]]:
if not match_ids:
return {}
ph = ",".join(["%s"] * len(match_ids))
cur.execute(f"""
SELECT oc.match_id, oc.name AS cat_name,
os.name AS sel_name, os.odd_value
FROM odd_selections os
JOIN odd_categories oc ON os.odd_category_db_id = oc.db_id
WHERE oc.match_id IN ({ph})
""", match_ids)
odds: Dict[str, Dict[str, float]] = {}
for row in cur.fetchall():
mid = str(row["match_id"])
cat = (row["cat_name"] or "").lower().strip()
sel = (row["sel_name"] or "").strip()
val = float(row["odd_value"]) if row["odd_value"] else 0
if val <= 0:
continue
if mid not in odds:
odds[mid] = {}
if cat in ODD_CAT_MAP:
key = ODD_CAT_MAP[cat].get(sel.lower())
if key:
odds[mid][key] = val
else:
for cat_pattern, kw_map in ODD_CAT_KEYWORD_MAP.items():
if cat == cat_pattern:
for keyword, key in kw_map.items():
if keyword in sel.lower():
odds[mid][key] = val
break
return odds
MARKETS_TO_PREDICT = [
("MS", "1", lambda p: p[0]),
("MS", "X", lambda p: p[1]),
("MS", "2", lambda p: p[2]),
("OU25", "Over 2.5", lambda p: p[0]),
("BTTS", "Yes", lambda p: p[0]),
("OU15", "Over 1.5", lambda p: p[0]),
("OU35", "Over 3.5", lambda p: p[0]),
("HT", "1", lambda p: p[0]),
("HT", "X", lambda p: p[1]),
("HT", "2", lambda p: p[2]),
]
def run_backfill(args):
print("=" * 70)
print("CALIBRATION BACKFILL")
print("=" * 70)
conn = get_conn()
cur = conn.cursor(cursor_factory=RealDictCursor)
t0 = time.time()
print(f"Loading matches (limit={args.limit})...")
matches = load_matches(cur, args.limit)
print(f" Found {len(matches)} finished matches with ai_features")
match_ids = [str(m["id"]) for m in matches]
match_map = {str(m["id"]): m for m in matches}
print("Loading ai_features...")
features_map = load_ai_features_batch(cur, match_ids)
print(f" Loaded features for {len(features_map)} matches")
print("Loading odds...")
odds_map = load_odds_batch(cur, match_ids)
print(f" Loaded odds for {len(odds_map)} matches")
print(f"Data loading: {time.time() - t0:.1f}s")
print("\nLoading V25 model...")
predictor = V25Predictor()
predictor.load_models()
feature_cols = predictor.FEATURE_COLS
samples: List[Dict[str, Any]] = []
skipped = 0
processed = 0
print(f"\nRunning predictions on {len(match_ids)} matches...")
t1 = time.time()
for i, mid in enumerate(match_ids):
if mid not in features_map:
skipped += 1
continue
feat_row = features_map[mid]
odds_row = odds_map.get(mid, {})
match_row = match_map[mid]
feat_dict = {}
for col in feature_cols:
if col in feat_row and feat_row[col] is not None:
feat_dict[col] = float(feat_row[col])
elif col.startswith("odds_") and not col.endswith("_present"):
odds_key = col.replace("odds_", "")
feat_dict[col] = float(odds_row.get(odds_key, 0))
elif col.endswith("_present"):
base = col.replace("_present", "")
odds_key = base.replace("odds_", "")
feat_dict[col] = 1.0 if odds_row.get(odds_key, 0) > 0 else 0.0
else:
feat_dict[col] = 0.0
if odds_row.get("ms_h", 0) > 0:
feat_dict["odds_ms_h"] = odds_row["ms_h"]
if odds_row.get("ms_d", 0) > 0:
feat_dict["odds_ms_d"] = odds_row["ms_d"]
if odds_row.get("ms_a", 0) > 0:
feat_dict["odds_ms_a"] = odds_row["ms_a"]
ms_h = feat_dict.get("odds_ms_h", 0)
ms_d = feat_dict.get("odds_ms_d", 0)
ms_a = feat_dict.get("odds_ms_a", 0)
if ms_h > 0 and ms_d > 0 and ms_a > 0:
raw_sum = 1/ms_h + 1/ms_d + 1/ms_a
feat_dict["implied_home"] = (1/ms_h) / raw_sum
feat_dict["implied_draw"] = (1/ms_d) / raw_sum
feat_dict["implied_away"] = (1/ms_a) / raw_sum
sh = match_row["score_home"]
sa = match_row["score_away"]
ht_h = match_row.get("ht_score_home")
ht_a = match_row.get("ht_score_away")
try:
X = pd.DataFrame([{c: feat_dict.get(c, 0.0) for c in feature_cols}])
for market_name, model_key, market_list in [
("ms", "ms", ["MS"]),
("ou25", "ou25", ["OU25"]),
("btts", "btts", ["BTTS"]),
("ou15", "ou15", ["OU15"]),
("ou35", "ou35", ["OU35"]),
("ht_result", "ht_result", ["HT"]),
]:
if model_key not in predictor.models:
continue
probs = predictor.predict_market(model_key, feat_dict)
if probs is None:
continue
if model_key == "ms":
for pick, prob in [("1", probs[0]), ("X", probs[1]), ("2", probs[2])]:
actual = resolve_actual("MS", pick, sh, sa, ht_h, ht_a)
key = calibrator_key("MS", pick)
if actual is not None and key:
samples.append({
"match_id": mid,
"market": "MS",
"pick": pick,
"key": key,
"raw_prob": float(prob),
"actual": int(actual),
})
elif model_key == "ht_result":
if ht_h is None or ht_a is None:
continue
for pick, prob in [("1", probs[0]), ("X", probs[1]), ("2", probs[2])]:
actual = resolve_actual("HT", pick, sh, sa, ht_h, ht_a)
key = calibrator_key("HT", pick)
if actual is not None and key:
samples.append({
"match_id": mid,
"market": "HT",
"pick": pick,
"key": key,
"raw_prob": float(prob),
"actual": int(actual),
})
elif model_key in ("ou25", "ou15", "ou35"):
market_upper = model_key.upper()
over_prob = float(probs[0]) if len(probs) > 0 else 0.5
pick = "Over"
actual = resolve_actual(market_upper, "Over", sh, sa, ht_h, ht_a)
key = calibrator_key(market_upper, "Over")
if actual is not None and key:
samples.append({
"match_id": mid,
"market": market_upper,
"pick": pick,
"key": key,
"raw_prob": over_prob,
"actual": int(actual),
})
elif model_key == "btts":
yes_prob = float(probs[0]) if len(probs) > 0 else 0.5
actual = resolve_actual("BTTS", "Yes", sh, sa, ht_h, ht_a)
key = calibrator_key("BTTS", "Yes")
if actual is not None and key:
samples.append({
"match_id": mid,
"market": "BTTS",
"pick": "Yes",
"key": key,
"raw_prob": yes_prob,
"actual": int(actual),
})
processed += 1
except Exception as e:
skipped += 1
if skipped <= 5:
print(f" Error on {mid}: {e}")
if (i + 1) % 5000 == 0:
elapsed = time.time() - t1
rate = (i + 1) / elapsed
print(f" Processed {i+1}/{len(match_ids)} ({rate:.0f} matches/s)")
elapsed = time.time() - t1
print(f"\nPrediction complete: {processed} matches, {skipped} skipped, {elapsed:.1f}s")
if not samples:
print("No calibration samples generated!")
cur.close()
conn.close()
return
df = pd.DataFrame(samples)
print(f"\nTotal calibration samples: {len(df)}")
print(f"Unique matches: {df['match_id'].nunique()}")
print(f"\nPer-key counts:")
for key, count in df["key"].value_counts().items():
print(f" {key:<14} {count}")
print(f"\nTraining isotonic calibration models (min_samples={args.min_samples})...")
calibrator = get_calibrator()
results: Dict[str, Any] = {}
keys = sorted(df["key"].unique())
for key in keys:
sub = df[df["key"] == key].copy()
sub = sub.drop_duplicates(subset=["match_id", "key"], keep="first")
sub = sub.dropna(subset=["raw_prob", "actual"])
sub = sub[(sub["raw_prob"] > 0.0) & (sub["raw_prob"] < 1.0)]
n = len(sub)
if n < args.min_samples:
results[key] = {"status": "skipped", "samples": n}
continue
metrics = calibrator.train_calibration(
df=sub,
market=key,
prob_col="raw_prob",
actual_col="actual",
min_samples=args.min_samples,
save=True,
)
results[key] = {
"status": "trained",
"samples": metrics.sample_count,
"brier": round(metrics.brier_score, 4),
"ece": round(metrics.calibration_error, 4),
"mean_predicted": round(metrics.mean_predicted, 4),
"mean_actual": round(metrics.mean_actual, 4),
}
print("\n" + "=" * 70)
print("CALIBRATION RESULTS")
print("=" * 70)
print(f"{'market':<14} {'status':<10} {'n':<8} {'brier':<9} {'ece':<8} {'pred_avg':<9} {'actual_avg'}")
print("-" * 70)
for key, info in sorted(results.items()):
if info["status"] == "trained":
print(
f"{key:<14} {'OK':<10} {info['samples']:<8} "
f"{info['brier']:<9.4f} {info['ece']:<8.4f} "
f"{info['mean_predicted']:<9.4f} {info['mean_actual']}"
)
else:
print(f"{key:<14} {'SKIP':<10} {info['samples']:<8}")
print("=" * 70)
total_time = time.time() - t0
print(f"\nTotal time: {total_time:.1f}s")
print(f"Calibration models saved to: {os.path.join(AI_ENGINE_DIR, 'models', 'calibration')}/")
cur.close()
conn.close()
def main():
parser = argparse.ArgumentParser(description="Backfill calibration from historical matches")
parser.add_argument("--limit", type=int, default=50000,
help="Max matches to process (default: 50000)")
parser.add_argument("--min-samples", type=int, default=100,
help="Min samples per market for calibration (default: 100)")
args = parser.parse_args()
run_backfill(args)
if __name__ == "__main__":
main()
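The odds-to-probability step inside `run_backfill` (invert the three match-result odds, then divide each by their sum) can be sketched in isolation as below. This is a minimal illustration and not part of the commit: the helper name `implied_probabilities` and the sample odds are assumptions made for the example.

```python
from typing import Dict, Optional


def implied_probabilities(ms_h: float, ms_d: float, ms_a: float) -> Optional[Dict[str, float]]:
    """Normalize 1X2 decimal odds into probabilities that sum to 1.

    Hypothetical helper mirroring the inline logic in run_backfill:
    the result is only computed when all three odds are present (> 0).
    """
    if ms_h <= 0 or ms_d <= 0 or ms_a <= 0:
        return None
    inverse = [1.0 / ms_h, 1.0 / ms_d, 1.0 / ms_a]
    raw_sum = sum(inverse)  # exceeds 1.0 when the odds include a bookmaker margin
    return {
        "implied_home": inverse[0] / raw_sum,
        "implied_draw": inverse[1] / raw_sum,
        "implied_away": inverse[2] / raw_sum,
    }


# Sample odds chosen for the example; they are not taken from the database.
probs = implied_probabilities(1.80, 3.60, 4.50)
```

Dividing by `raw_sum` removes the margin proportionally across the three outcomes, which is the same simple normalization the script applies before overwriting `implied_home`, `implied_draw`, and `implied_away`.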
+206
@@ -0,0 +1,206 @@
"""
Backtest for September 13th (Top Leagues Only)
==============================================
Simulates the NEW 'Skip Logic' on matches from Sept 13, 2024.
"""
import os
import sys
import json
import psycopg2
from psycopg2.extras import RealDictCursor
from datetime import datetime
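# Resolve the project root (three directories above this file) and make it importable
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.insert(0, project_root) # Add root to path if needed
def get_clean_dsn() -> str:
    # Read the DSN from the environment rather than hardcoding credentials in the repo.
    dsn = os.getenv("DATABASE_URL", "")
    if not dsn:
        raise ValueError("DATABASE_URL not set")
    # Drop a Prisma-style "?schema=" suffix, mirroring get_conn() in the calibration backfill script.
    return dsn.split("?schema=")[0]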
# ─── Configuration ─────────
MIN_CONF_THRESHOLDS = {
"MS": 45.0, "DC": 40.0, "OU15": 50.0, "OU25": 45.0,
"OU35": 45.0, "BTTS": 45.0, "HT": 40.0,
}
def run_backtest():
print("🚀 Backtest: September 13, 2024 - Top Leagues")
print("="*60)
# 1. Load Top Leagues
leagues_path = os.path.join(project_root, "top_leagues.json")
try:
with open(leagues_path, 'r') as f:
top_leagues = json.load(f)
# Ensure they are strings for SQL IN clause
league_ids = tuple(str(lid) for lid in top_leagues)
print(f"📋 Loaded {len(top_leagues)} top leagues.")
except Exception as e:
print(f"❌ Error loading top_leagues.json: {e}")
return
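# 2. Define Date Range (Sept 13, 2024 UTC)
# Use timezone-aware datetimes: .timestamp() on a naive datetime is interpreted as local time, not UTC.
start_dt = datetime.fromisoformat("2024-09-13T00:00:00+00:00")
end_dt = datetime.fromisoformat("2024-09-13T23:59:59+00:00")
start_ts = int(start_dt.timestamp() * 1000)
end_ts = int(end_dt.timestamp() * 1000)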
dsn = get_clean_dsn()
conn = psycopg2.connect(dsn)
cur = conn.cursor(cursor_factory=RealDictCursor)
# 3. Fetch Matches & Predictions
# We need matches that are FT and have a prediction
query = """
SELECT p.match_id, p.prediction_json,
m.score_home, m.score_away, m.status, m.league_id
FROM predictions p
JOIN matches m ON p.match_id = m.id
WHERE m.mst_utc BETWEEN %s AND %s
AND m.league_id IN %s
AND m.status = 'FT'
AND p.prediction_json IS NOT NULL
"""
try:
cur.execute(query, (start_ts, end_ts, league_ids))
rows = cur.fetchall()
except Exception as e:
print(f"❌ DB Error: {e}")
cur.close()
conn.close()
return
print(f"📊 Found {len(rows)} matches with predictions on Sept 13, 2024.")
if not rows:
print("⚠️ No predictions found for this date. The AI Engine might not have processed these historical matches yet.")
print("💡 Tip: Run the feeder or AI engine on this date range to generate predictions first.")
cur.close()
conn.close()
return
total_bets = 0
winning_bets = 0
skipped_bets = 0
total_profit = 0.0
for row in rows:
data = row['prediction_json']
if isinstance(data, str):
data = json.loads(data)
home_score = row['score_home'] or 0
away_score = row['score_away'] or 0
total_goals = home_score + away_score
# Extract Main Pick
main_pick = None
main_pick_conf = 0.0
main_pick_odds = 0.0
if "main_pick" in data and isinstance(data["main_pick"], dict):
mp = data["main_pick"]
main_pick = mp.get("pick")
main_pick_conf = mp.get("confidence", 0.0)
main_pick_odds = mp.get("odds", 0.0)
if not main_pick or not main_pick_conf:
continue
# Determine Market Type
pick_str = str(main_pick).upper()
market_type = "MS"
if "1X" in pick_str or "X2" in pick_str or "12" in pick_str: market_type = "DC"
elif "ÜST" in pick_str or "ALT" in pick_str or "OVER" in pick_str or "UNDER" in pick_str:
if "1.5" in pick_str: market_type = "OU15"
elif "3.5" in pick_str: market_type = "OU35"
else: market_type = "OU25"
elif "VAR" in pick_str or "YOK" in pick_str or "BTTS" in pick_str: market_type = "BTTS"
threshold = MIN_CONF_THRESHOLDS.get(market_type, 45.0)
# --- SKIP LOGIC ---
# 1. Confidence Gate
if main_pick_conf < threshold:
skipped_bets += 1
continue
# 2. Value Gate
if main_pick_odds > 0:
implied_prob = 1.0 / main_pick_odds
my_prob = main_pick_conf / 100.0
edge = my_prob - implied_prob
if edge < -0.03:
skipped_bets += 1
continue
# --- BET PLAYED ---
total_bets += 1
is_won = False
# Resolve Result
if market_type == "MS":
if (main_pick == "1" or main_pick == "MS 1") and home_score > away_score: is_won = True
elif (main_pick == "X" or main_pick == "MS X") and home_score == away_score: is_won = True
elif (main_pick == "2" or main_pick == "MS 2") and away_score > home_score: is_won = True
elif market_type.startswith("OU"):
line = 2.5
if "1.5" in pick_str: line = 1.5
elif "3.5" in pick_str: line = 3.5
is_over = total_goals > line
is_under = total_goals < line
if ("ÜST" in pick_str or "OVER" in pick_str) and is_over: is_won = True
elif ("ALT" in pick_str or "UNDER" in pick_str) and is_under: is_won = True
elif market_type == "BTTS":
if home_score > 0 and away_score > 0:
if "VAR" in pick_str: is_won = True
else:
if "YOK" in pick_str: is_won = True
elif market_type == "DC":
if "1X" in pick_str and home_score >= away_score: is_won = True
elif "X2" in pick_str and away_score >= home_score: is_won = True
elif "12" in pick_str and home_score != away_score: is_won = True
if is_won:
winning_bets += 1
profit = main_pick_odds - 1.0
total_profit += profit
else:
total_profit -= 1.0
# Report
print("\n" + "="*60)
print("📈 BACKTEST RESULTS: SEPTEMBER 13, 2024 (TOP LEAGUES)")
print("="*60)
print(f"Total Matches Analyzed: {len(rows)}")
print(f"🚫 Bets SKIPPED (Low Conf/Bad Value): {skipped_bets}")
print(f"✅ Bets PLAYED: {total_bets}")
if total_bets > 0:
win_rate = (winning_bets / total_bets) * 100
roi = (total_profit / total_bets) * 100
print(f"🏆 Winning Bets: {winning_bets}")
print(f"💀 Losing Bets: {total_bets - winning_bets}")
print("-" * 40)
print(f" Win Rate: {win_rate:.2f}%")
print(f"💰 Total Profit (Units): {total_profit:.2f}")
print(f"📊 ROI: {roi:.2f}%")
if roi > 0:
print("🟢 STRATEGY IS PROFITABLE!")
else:
print("🔴 STRATEGY IS LOSING")
else:
print("⚠️ No bets were played. Thresholds might be too high or no suitable matches found.")
cur.close()
conn.close()
if __name__ == "__main__":
run_backtest()
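The skip logic and the flat-stake accounting used in `run_backtest` can be separated from the database code as two small pure functions. The sketch below is an assumption-laden restatement, not code from the commit: the names `should_skip` and `flat_stake_roi` are hypothetical, while the 45.0 default threshold and the -0.03 edge floor are the values the script itself uses.

```python
from typing import Iterable, Tuple


def should_skip(confidence: float, odds: float,
                threshold: float = 45.0, min_edge: float = -0.03) -> bool:
    """Return True when a pick fails the confidence gate or the value gate."""
    # 1. Confidence gate: confidence is a percentage (0-100).
    if confidence < threshold:
        return True
    # 2. Value gate: only applied when odds are available (> 0).
    if odds > 0:
        edge = confidence / 100.0 - 1.0 / odds
        if edge < min_edge:
            return True
    return False


def flat_stake_roi(bets: Iterable[Tuple[float, bool]]) -> float:
    """ROI in percent for one-unit stakes; each bet is (decimal_odds, won)."""
    profit = 0.0
    count = 0
    for odds, won in bets:
        profit += (odds - 1.0) if won else -1.0
        count += 1
    return (profit / count) * 100 if count else 0.0
```

Note that a pick without odds (`odds == 0`) passes the value gate in both the script and this sketch, and a win at zero odds is then booked as a one-unit loss; filtering such picks out before staking may be worth considering.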
+240
@@ -0,0 +1,240 @@
"""
Detailed Backtest with 50 Top League Matches
============================================
Runs AI Engine predictions on 50 real historical matches and shows
exactly which predictions were correct and which were skipped.
Usage:
python ai-engine/scripts/backtest_50_detailed.py
"""
import os
import sys
import json
import time
import psycopg2
from psycopg2.extras import RealDictCursor
# Add paths
AI_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(AI_DIR)
sys.path.insert(0, ROOT_DIR)
if "scripts" in os.path.basename(AI_DIR):
ROOT_DIR = os.path.dirname(ROOT_DIR)
from services.single_match_orchestrator import get_single_match_orchestrator
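def get_clean_dsn() -> str:
    # Read the DSN from the environment rather than hardcoding credentials in the repo.
    dsn = os.getenv("DATABASE_URL", "")
    if not dsn:
        raise ValueError("DATABASE_URL not set")
    # Drop a Prisma-style "?schema=" suffix, mirroring get_conn() in the calibration backfill script.
    return dsn.split("?schema=")[0]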
# 50 Match IDs from the query
MATCH_IDS = [
"v2ljcst50nk37x04xwimpi50", "7gz0bhb5yvdssazl3y5946kno", "7ftj7kbu4rzpewxravf3luuc4",
"7f1z4e8ch1dm5q677644cky6s", "7ffq3aq3so22iymfdzch63nys", "rrkmeuymz7gzvoz8mplikzdg",
"7hegc9covicy699bxsi81xkb8", "7gl7rpr1hjayk3e5ut0gr613o", "7g7d86i3738287xfvyfeffcwk",
"7hs4boe4hv80muawocevvx2j8", "7ijhsloieg4t9yp5cxp0duln8", "7ixaiiptli5ek32kuybuni4gk",
"7i5sfh41cjpwg4l972dm487x0", "eo7g4wunxxxr8uv45q8p5x638", "7dinds2937w4645wva2rddlas",
"7b5ukdhvqh62wtndeqfg01ixg", "7bjptsj24gndoydn7n0202g44", "7cqxf3vo58ewrwmoom5xiyexg",
"7bxjl9h2hnf165rlp3o1vfztg", "7eo8zrez08c342rqsezpvq39w", "7as1muhs98vdarlhsean4bspg",
"7dwhj8cfxv6v6bzxpu5e3h05w", "7d4vq4417ps84yjzh95bnvvv8", "7ea9z501jgp9kxw3gay4myrkk",
"7cd3401itlty6ded7c1wct0yc", "ebgpz9mcije2snv986n6587pw", "i7ar1dkhvcwpxmkyks65ib6c",
"lyek7tyy6qk2xjs9vblucnx0", "hdn9qtyn3ysjwbc3i2trantg", "3y2bnssfqlajosiz2gpkn6xhw",
"40pehd14s9djjtycujavbex3o", "3xnbfjznzmnwml20akbgnis5w", "2eovi2rcc2l4ha7fpb2w7e1hw",
"2bwuikdjyyuithhru8ka8o00k", "2d3pcd76ya9ihi9yotxc553is", "1e9it04z4epy2etdxsffe7m6s",
"7af49jgo4iulv1k8cplj9smj8", "5k3vrz619hdu9nx4rnx6uim1g", "amjppgpetnyr0iisi241kgkyc",
"coqrhq09kxd16iejvgtzj3mz8", "d8ysan1qdctmkvjaz2adw7aqc", "9ttciz0gtb0z09ev1q5fe0ro4",
"9u720o37yaddqu1w6hlszpnh0", "7ijezdjp8t0rjti91ac63hyxg", "72gvdvztbb3dn79jidzzxzcb8",
"6uof1v2s6vrpieeml2bwo9tlg", "91dd8ia3m0bxoqzjgyo3ptsk", "3tj1nt3udsbvb9soqn2cs6gpg",
"1br5g88o5idtjxka1fr6zg4k4", "akuesquthbmxlzckvnqmgles4"
]
def run_detailed_backtest():
print("🚀 DETAILED BACKTEST: 50 Top League Matches")
print("🧠 Engine: V30 Ensemble (V20+V25) + Skip Logic")
print("="*80)
dsn = get_clean_dsn()
conn = psycopg2.connect(dsn)
cur = conn.cursor(cursor_factory=RealDictCursor)
# Fetch match details with odds
placeholders = ','.join(['%s'] * len(MATCH_IDS))
cur.execute(f"""
SELECT m.id, m.match_name, m.home_team_id, m.away_team_id,
m.score_home, m.score_away, m.league_id,
t1.name as home_team, t2.name as away_team,
l.name as league_name
FROM matches m
LEFT JOIN teams t1 ON m.home_team_id = t1.id
LEFT JOIN teams t2 ON m.away_team_id = t2.id
LEFT JOIN leagues l ON m.league_id = l.id
WHERE m.id IN ({placeholders})
AND m.status = 'FT'
ORDER BY m.mst_utc DESC
""", MATCH_IDS)
rows = cur.fetchall()
print(f"📊 Found {len(rows)} matches. Starting AI Analysis...")
if not rows:
print("⚠️ No matches found.")
cur.close()
conn.close()
return
# Initialize AI Engine
try:
orchestrator = get_single_match_orchestrator()
print("✅ AI Engine Loaded.\n")
except Exception as e:
print(f"❌ Failed to load AI Engine: {e}")
cur.close()
conn.close()
return
# ─── Backtest Loop ───
results = []
total_skipped = 0
total_played = 0
total_won = 0
total_profit = 0.0
MIN_CONF = 45.0
start_time = time.time()
for i, row in enumerate(rows):
match_id = str(row['id'])
home_team = row['home_team'] or "Unknown"
away_team = row['away_team'] or "Unknown"
league = row['league_name'] or "Unknown"
home_score = row['score_home'] or 0
away_score = row['score_away'] or 0
total_goals = home_score + away_score
print(f"[{i+1}/{len(rows)}] {home_team} vs {away_team} ({league}) ... ", end="", flush=True)
try:
prediction = orchestrator.analyze_match(match_id)
if not prediction:
print("⚠️ No prediction")
continue
# Extract Main Pick
main_pick = prediction.get("main_pick") or {}
pick_name = main_pick.get("pick", "")
confidence = main_pick.get("confidence", 0)
odds = main_pick.get("odds", 0)
# Apply Skip Logic
if confidence < MIN_CONF:
print(f"🚫 SKIP (Conf {confidence:.0f}%)")
total_skipped += 1
results.append({"match": f"{home_team} vs {away_team}", "pick": pick_name,
"conf": confidence, "odds": odds, "result": "SKIPPED", "profit": 0})
continue
if odds > 0:
implied_prob = 1.0 / odds
my_prob = confidence / 100.0
if my_prob - implied_prob < -0.03:
print(f"🚫 SKIP (Bad Value)")
total_skipped += 1
results.append({"match": f"{home_team} vs {away_team}", "pick": pick_name,
"conf": confidence, "odds": odds, "result": "SKIPPED", "profit": 0})
continue
# Bet Played
total_played += 1
won = False
# Resolve
pick_clean = str(pick_name).upper()
if pick_clean in ["1", "MS 1", "İY 1"] and home_score > away_score: won = True
elif pick_clean in ["X", "MS X", "İY X"] and home_score == away_score: won = True
elif pick_clean in ["2", "MS 2", "İY 2"] and away_score > home_score: won = True
elif "1X" in pick_clean or "X2" in pick_clean:
if "1X" in pick_clean and home_score >= away_score: won = True
elif "X2" in pick_clean and away_score >= home_score: won = True
elif pick_clean in ["12"] and home_score != away_score: won = True
elif "ÜST" in pick_clean or "OVER" in pick_clean:
line = 2.5
if "1.5" in pick_clean: line = 1.5
elif "3.5" in pick_clean: line = 3.5
if total_goals > line: won = True
elif "ALT" in pick_clean or "UNDER" in pick_clean:
line = 2.5
if "1.5" in pick_clean: line = 1.5
elif "3.5" in pick_clean: line = 3.5
if total_goals < line: won = True
elif "VAR" in pick_clean and home_score > 0 and away_score > 0: won = True
elif "YOK" in pick_clean and (home_score == 0 or away_score == 0): won = True
if won:
total_won += 1
profit = odds - 1.0
print(f"✅ WON ({pick_name} @ {odds:.2f}, +{profit:.2f})")
else:
profit = -1.0
print(f"❌ LOST ({pick_name} @ {odds:.2f})")
total_profit += profit
results.append({"match": f"{home_team} vs {away_team}", "pick": pick_name,
"conf": confidence, "odds": odds,
"result": "WON" if won else "LOST", "profit": profit,
"score": f"{home_score}-{away_score}"})
except Exception as e:
print(f"💥 Error: {e}")
elapsed = time.time() - start_time
# ─── DETAILED REPORT ───
print("\n" + "="*80)
print("📈 DETAILED BACKTEST RESULTS")
print(f"⏱️ Time: {elapsed:.1f}s")
print("="*80)
print(f"📊 Total Matches: {len(rows)}")
print(f"🚫 Skipped: {total_skipped}")
print(f"🎲 Played: {total_played}")
print(f"✅ Won: {total_won}")
print(f"💀 Lost: {total_played - total_won}")
print(f"💰 Profit: {total_profit:+.2f} units")
if total_played > 0:
win_rate = (total_won / total_played) * 100
roi = (total_profit / total_played) * 100
print(f"📊 Win Rate: {win_rate:.1f}%")
print(f"📊 ROI: {roi:.1f}%")
if roi > 0:
print("🟢 STRATEGY IS PROFITABLE!")
else:
print("🔴 STRATEGY IS LOSING")
# ─── TABLE OF ALL RESULTS ───
print("\n" + "="*80)
print("📋 DETAILED MATCH RESULTS")
print("="*80)
print(f"{'Match':<40} {'Pick':<15} {'Conf':<6} {'Odds':<6} {'Result':<8} {'Score':<6}")
print("-"*80)
for r in results:
match_str = r['match'][:38]
pick_str = str(r['pick'])[:13]
conf_str = f"{r['conf']:.0f}%"
odds_str = f"{r['odds']:.2f}" if r['odds'] > 0 else "N/A"
res_str = r['result']
score_str = r.get('score', '')
# Color coding
if res_str == "WON": res_display = f"{res_str}"
elif res_str == "LOST": res_display = f"{res_str}"
else: res_display = f"🚫 {res_str}"
print(f"{match_str:<40} {pick_str:<15} {conf_str:<6} {odds_str:<6} {res_display:<12} {score_str:<6}")
cur.close()
conn.close()
if __name__ == "__main__":
run_detailed_backtest()
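The backtest scripts in this commit repeat nearly the same chain of `if`/`elif` checks to decide whether a pick won. A single shared resolver could look like the sketch below. It is an illustration only: `resolve_pick` is a hypothetical name, it covers just the full-time labels seen in the scripts, and it returns `None` for half-time labels such as "İY 1" because resolving those would need the half-time score, which these scripts do not fetch.

```python
from typing import Optional


def resolve_pick(pick: str, home: int, away: int) -> Optional[bool]:
    """Return True/False for a recognized full-time pick, None otherwise."""
    p = str(pick).upper().strip()
    total = home + away
    # Double chance is checked first so "1X" is never read as a plain "1".
    if "1X" in p:
        return home >= away
    if "X2" in p:
        return away >= home
    if p == "12":
        return home != away
    # Match result (MS).
    if p in {"1", "MS 1"}:
        return home > away
    if p in {"X", "MS X"}:
        return home == away
    if p in {"2", "MS 2"}:
        return away > home
    # Over/under: the line defaults to 2.5 unless 1.5 or 3.5 is named in the label.
    is_over = "ÜST" in p or "OVER" in p
    is_under = "ALT" in p or "UNDER" in p
    if is_over or is_under:
        line = 1.5 if "1.5" in p else 3.5 if "3.5" in p else 2.5
        return total > line if is_over else total < line
    # Both teams to score.
    if "VAR" in p:
        return home > 0 and away > 0
    if "YOK" in p:
        return home == 0 or away == 0
    return None
```

Returning `None` for unknown labels lets a caller exclude the bet from the statistics instead of silently counting it as a loss.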
+191
@@ -0,0 +1,191 @@
"""
Adaptive 500 Match Backtest
=============================
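Evaluates every fetched match (the query only returns matches that have odds).
Considers every recommended pick (main pick, safe alternative, value picks) and bets on the
highest-confidence one that passes the confidence, odds, and edge filters; a match with no
qualifying pick is skipped.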
"""
import os
import sys
import json
import time
import psycopg2
from psycopg2.extras import RealDictCursor
AI_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(AI_DIR)
sys.path.insert(0, ROOT_DIR)
if "scripts" in os.path.basename(AI_DIR):
ROOT_DIR = os.path.dirname(ROOT_DIR)
from services.single_match_orchestrator import get_single_match_orchestrator
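def get_clean_dsn() -> str:
    # Read the DSN from the environment rather than hardcoding credentials in the repo.
    dsn = os.getenv("DATABASE_URL", "")
    if not dsn:
        raise ValueError("DATABASE_URL not set")
    # Drop a Prisma-style "?schema=" suffix, mirroring get_conn() in the calibration backfill script.
    return dsn.split("?schema=")[0]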
def run_adaptive_backtest():
print("🔄 ADAPTIVE 500 MATCH BACKTEST")
print("="*60)
# 1. Load Top Leagues
leagues_path = os.path.join(ROOT_DIR, "top_leagues.json")
with open(leagues_path, 'r') as f:
top_leagues = json.load(f)
league_ids = tuple(str(lid) for lid in top_leagues)
dsn = get_clean_dsn()
conn = psycopg2.connect(dsn)
cur = conn.cursor(cursor_factory=RealDictCursor)
# 2. Fetch 500 Finished Matches with Odds
cur.execute("""
SELECT m.id, m.match_name, m.home_team_id, m.away_team_id,
m.score_home, m.score_away, m.league_id,
t1.name as home_team, t2.name as away_team
FROM matches m
LEFT JOIN teams t1 ON m.home_team_id = t1.id
LEFT JOIN teams t2 ON m.away_team_id = t2.id
WHERE m.league_id IN %s
AND m.status = 'FT'
AND m.score_home IS NOT NULL
AND EXISTS (SELECT 1 FROM odd_categories oc WHERE oc.match_id = m.id)
ORDER BY m.mst_utc DESC
LIMIT 500
""", (league_ids,))
rows = cur.fetchall()
print(f"📊 Found {len(rows)} matches. Analyzing...\n")
if not rows:
print("⚠️ No matches found.")
cur.close()
conn.close()
return
try: orchestrator = get_single_match_orchestrator()
except Exception as e:
print(f"❌ AI Error: {e}")
cur.close()
conn.close()
return
# Stats
total_evaluated = 0
total_bet = 0
total_won = 0
total_profit = 0.0
skipped_count = 0
for i, row in enumerate(rows):
match_id = str(row['id'])
home = row['home_team'] or "?"
away = row['away_team'] or "?"
h_score = row['score_home'] or 0
a_score = row['score_away'] or 0
total_evaluated += 1
# print(f"[{i+1}] {home} vs {away} ... ", end="", flush=True)
try:
pred = orchestrator.analyze_match(match_id)
if not pred:
# print("⚠️ No Data")
continue
# ─── ADAPTIVE PICKING ───
# Check ALL recommendations (Expert or Standard) to find the BEST option
candidates = []
# Add main picks
if pred.get("expert_recommendation"):
rec = pred["expert_recommendation"]
if rec.get("main_pick"): candidates.append(rec["main_pick"])
if rec.get("safe_alternative"): candidates.append(rec["safe_alternative"])
if rec.get("value_picks"): candidates.extend(rec["value_picks"])
elif pred.get("main_pick"):
candidates.append(pred["main_pick"])
best_bet = None
for c in candidates:
if not c: continue
conf = c.get("confidence", 0)
odds = c.get("odds", 0)
pick = c.get("pick")
# Flexible Criteria:
# 1. Confidence >= 60%
# 2. Odds > 1.10 (Not "free" odds like 1.00)
# 3. Edge > -2% (Slightly tolerant)
if conf >= 60 and odds > 1.10:
implied = 1.0 / odds
edge = ((conf/100) - implied) * 100
# Prioritize positive edge, but accept small negative if confidence is high
if edge > -2.0:
if best_bet is None or (conf > best_bet.get("confidence", 0)):
best_bet = c
if best_bet:
pick = str(best_bet.get("pick")).upper()
conf = best_bet.get("confidence")
odds = best_bet.get("odds")
# Resolution Logic
won = False
if pick in ["1", "MS 1", "İY 1"] and h_score > a_score: won = True
elif pick in ["X", "MS X", "İY X"] and h_score == a_score: won = True
elif pick in ["2", "MS 2", "İY 2"] and a_score > h_score: won = True
elif pick in ["1X", "X2"]:
if "1X" in pick and h_score >= a_score: won = True
elif "X2" in pick and a_score >= h_score: won = True
elif pick == "12" and h_score != a_score: won = True
elif "ÜST" in pick or "OVER" in pick:
line = 2.5
if "1.5" in pick: line = 1.5
elif "3.5" in pick: line = 3.5
if (h_score + a_score) > line: won = True
elif "ALT" in pick or "UNDER" in pick:
line = 2.5
if "1.5" in pick: line = 1.5
elif "3.5" in pick: line = 3.5
if (h_score + a_score) < line: won = True
elif "VAR" in pick and h_score > 0 and a_score > 0: won = True
elif "YOK" in pick and (h_score == 0 or a_score == 0): won = True
total_bet += 1
if won:
total_won += 1
profit = odds - 1.0
total_profit += profit
# print(f"✅ WON (+{profit:.2f}) | {pick}")
else:
total_profit -= 1.0
# print(f"❌ LOST ({pick} @ {odds:.2f})")
else:
skipped_count += 1
# print(f"🚫 SKIP (No Value)")
except Exception as e:
# print(f"💥 Error: {e}")
pass
print("\n" + "="*60)
print("🔄 ADAPTIVE BACKTEST RESULTS (500 Matches)")
print("="*60)
print(f"📊 Evaluated: {total_evaluated}")
print(f"🎲 Played: {total_bet}")
print(f"🚫 Skipped: {skipped_count}")
print(f"✅ Won: {total_won}")
if total_bet > 0:
win_rate = (total_won / total_bet) * 100
roi = (total_profit / total_bet) * 100
print(f"📈 Win Rate: {win_rate:.2f}%")
print(f"💰 Total Profit: {total_profit:.2f} Units")
print(f"📊 ROI: {roi:.2f}%")
if total_profit > 0: print("🟢 PROFITABLE STRATEGY")
else: print("🔴 LOSING STRATEGY")
else:
print("⚠️ No bets were placed. Data quality is too low.")
cur.close()
conn.close()
if __name__ == "__main__":
run_adaptive_backtest()
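The adaptive picking step in `run_adaptive_backtest` (filter candidates by confidence, odds, and edge, then keep the most confident one) can be expressed as a standalone function. The sketch below assumes candidates are dictionaries with `pick`, `confidence`, and `odds` keys, as the script reads them; the name `pick_best_bet` and the sample candidates are hypothetical.

```python
from typing import Dict, List, Optional


def pick_best_bet(candidates: List[Optional[Dict]], min_conf: float = 60.0,
                  min_odds: float = 1.10, min_edge_pct: float = -2.0) -> Optional[Dict]:
    """Return the highest-confidence candidate that passes all three filters."""
    best = None
    for c in candidates:
        if not c:
            continue
        conf = c.get("confidence") or 0
        odds = c.get("odds") or 0
        # Filters 1 and 2: confidence >= 60% and odds above 1.10.
        if conf < min_conf or odds <= min_odds:
            continue
        # Filter 3: edge in percentage points must be above -2.
        edge_pct = (conf / 100.0 - 1.0 / odds) * 100.0
        if edge_pct <= min_edge_pct:
            continue
        if best is None or conf > best["confidence"]:
            best = c
    return best


# Sample candidates invented for the example.
sample = [
    {"pick": "1", "confidence": 65, "odds": 1.60},
    {"pick": "2.5 Üst", "confidence": 70, "odds": 1.50},
    {"pick": "X", "confidence": 80, "odds": 1.05},  # rejected: odds too short
    None,
]
chosen = pick_best_bet(sample)
```

Because the tie-breaker is confidence rather than edge, a candidate with a slightly negative edge can win over one with a positive edge; ranking by `edge_pct` instead would be a one-line change if value should take priority.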
+352
@@ -0,0 +1,352 @@
"""
Consistency-Based Backtest
==========================
Measures disagreement between the models and computes what the ROI would
have been if bets had been placed only on consistent matches.
Logic:
- Detect contradictions between markets for each match
- Filter out inconsistent matches
- Compute hit rate and ROI on the consistent matches
Usage:
python scripts/backtest_consistency.py
"""
import os, sys, json
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
DATA_PATH = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
'data', 'training_data.csv')
MODELS_DIR = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
'models', 'v25')
SKIP_COLS = {
'match_id','home_team_id','away_team_id','league_id','mst_utc',
'score_home','score_away','total_goals','ht_score_home','ht_score_away','ht_total_goals',
'label_ms','label_ou05','label_ou15','label_ou25','label_ou35','label_btts',
'label_ht_result','label_ht_ou05','label_ht_ou15','label_ht_ft',
'label_odd_even','label_yellow_cards','label_cards_ou45','label_handicap_ms',
}
def load_model(market: str):
    # Load the XGBoost booster for one market; returns None if the file is missing.
    path = os.path.join(MODELS_DIR, f'xgb_v25_{market}.json')
    if not os.path.exists(path):
        return None
    b = xgb.Booster()
    b.load_model(path)
    return b


def predict_proba(model, X: np.ndarray, feature_cols: list, n_class: int):
    dmat = xgb.DMatrix(pd.DataFrame(X, columns=feature_cols))
    raw = model.predict(dmat)
    if n_class > 2:
        return raw.reshape(-1, n_class)
    # Binary model: expand P(class 1) into two columns [P(class 0), P(class 1)].
    return np.column_stack([1 - raw, raw])
def consistency_score(probs: dict) -> tuple[float, list]:
    """
    Compute the inconsistency between markets.
    0 = fully consistent, 1 = fully contradictory.

    Checks performed:
    1. Over/under hierarchy: P(over 3.5) <= P(over 2.5) <= P(over 1.5) must hold;
       a violation beyond a 0.05 tolerance is impossible and is weighted double.
    2. HT_OU05 over high while HT draw is also high -> conflict.
    3. HT over 1.5 above HT over 0.5 -> impossible, weighted double.
    4. Strong MS favourite together with high BTTS yes -> suspicious.
    5. BTTS yes high but OU25 over low -> suspicious.
    6. OU35 over high but BTTS yes low -> suspicious.
    """
    conflicts = []
    total_weight = 0
    total_conflict = 0

    # OU consistency: P(over 2.5) <= P(over 1.5) is a mathematical necessity
    ou15_over = probs.get('ou15_over', 0.5)
    ou25_over = probs.get('ou25_over', 0.5)
    ou35_over = probs.get('ou35_over', 0.5)

    # OU hierarchy: ou35 <= ou25 <= ou15 must hold
    if ou25_over > ou15_over + 0.05:
        gap = ou25_over - ou15_over
        conflicts.append(f'OU25>{ou25_over:.0%} > OU15>{ou15_over:.0%} (impossible)')
        total_conflict += gap * 2
        total_weight += 1
    if ou35_over > ou25_over + 0.05:
        gap = ou35_over - ou25_over
        conflicts.append(f'OU35>{ou35_over:.0%} > OU25>{ou25_over:.0%} (impossible)')
        total_conflict += gap * 2
        total_weight += 1

    # HT_OU05 vs. half-time result consistency
    ht_ou05_over = probs.get('ht_ou05_over', 0.5)
    ht_draw_prob = probs.get('ht_draw', 0.34)
    # A first-half goal is expected, yet a half-time draw is also expected.
    # HT_OU05 over > 70% but HT draw > 50% -> contradictory (a half-time draw is mostly 0-0)
    if ht_ou05_over > 0.70 and ht_draw_prob > 0.50:
        conflict = min(ht_ou05_over - 0.5, ht_draw_prob - 0.4)
        conflicts.append(f'HT_OU05>{ht_ou05_over:.0%} but HT_Draw>{ht_draw_prob:.0%}')
        total_conflict += conflict
        total_weight += 1

    # HT_OU05 vs. HT_OU15 consistency
    ht_ou15_over = probs.get('ht_ou15_over', 0.3)
    if ht_ou15_over > ht_ou05_over + 0.05:
        gap = ht_ou15_over - ht_ou05_over
        conflicts.append(f'HT_OU15>{ht_ou15_over:.0%} > HT_OU05>{ht_ou05_over:.0%} (impossible)')
        total_conflict += gap * 2
        total_weight += 1

    # MS vs. BTTS consistency
    ms_home = probs.get('ms_home', 0.33)
    ms_away = probs.get('ms_away', 0.33)
    btts_yes = probs.get('btts_yes', 0.5)
    # A strong single-team win together with high BTTS -> suspicious
    dominant = max(ms_home, ms_away)
    if dominant > 0.65 and btts_yes > 0.65:
        conflict = (dominant - 0.5) * (btts_yes - 0.5)
        conflicts.append(f'MS dominant>{dominant:.0%} but BTTS_Yes>{btts_yes:.0%}')
        total_conflict += conflict * 0.5
        total_weight += 1

    # OU25 vs. BTTS consistency
    # High BTTS implies at least 2 goals, so over 2.5 should not be low.
    # This is a heuristic, not a hard rule: a 1-1 score is BTTS yes with under 2.5.
    if btts_yes > 0.65 and ou25_over < 0.45:
        conflict = btts_yes - ou25_over
        conflicts.append(f'BTTS_Yes>{btts_yes:.0%} but OU25>{ou25_over:.0%} low')
        total_conflict += conflict
        total_weight += 1

    # OU35 over high but BTTS low -> suspicious (3+ goals, all from one team?)
    if ou35_over > 0.45 and btts_yes < 0.40:
        conflict = (ou35_over - 0.35) * (0.5 - btts_yes)
        conflicts.append(f'OU35>{ou35_over:.0%} but BTTS_Yes<{btts_yes:.0%}')
        total_conflict += conflict
        total_weight += 1

    score = min(1.0, total_conflict / max(total_weight * 0.3, 0.1))
    return score, conflicts
def main():
    print('Loading data...')
    df = pd.read_csv(DATA_PATH, low_memory=False)

    # Last 20% = test set (chronological)
    df = df.sort_values('mst_utc')
    n_test = int(len(df) * 0.20)
    df_test = df.tail(n_test).copy()
    print(f'Test set: {len(df_test):,} matches')

    feature_cols = [c for c in df.columns if c not in SKIP_COLS]

    # Load the models
    print('Loading models...')
    models = {
        'ms': (load_model('ms'), 3),
        'ou15': (load_model('ou15'), 2),
        'ou25': (load_model('ou25'), 2),
        'ou35': (load_model('ou35'), 2),
        'btts': (load_model('btts'), 2),
        'ht_result': (load_model('ht_result'), 3),
        'ht_ou05': (load_model('ht_ou05'), 2),
        'ht_ou15': (load_model('ht_ou15'), 2),
    }
    models = {k: v for k, v in models.items() if v[0] is not None}
    print(f'Loaded models: {list(models.keys())}')

    X = df_test[feature_cols].fillna(0).values

    # Collect all predictions
    print('Running predictions...')
    preds = {}
    for mkey, (model, n_class) in models.items():
        p = predict_proba(model, X, feature_cols, n_class)
        preds[mkey] = p

    # Inconsistency score and pick decision for each match
    results = []
    for i in range(len(df_test)):
        row = df_test.iloc[i]

        # Gather the probabilities
        probs = {}
        if 'ms' in preds:
            probs['ms_home'] = preds['ms'][i][0]
            probs['ms_draw'] = preds['ms'][i][1]
            probs['ms_away'] = preds['ms'][i][2]
        if 'ou15' in preds:
            probs['ou15_over'] = preds['ou15'][i][1]
        if 'ou25' in preds:
            probs['ou25_over'] = preds['ou25'][i][1]
        if 'ou35' in preds:
            probs['ou35_over'] = preds['ou35'][i][1]
        if 'btts' in preds:
            probs['btts_yes'] = preds['btts'][i][1]
        if 'ht_result' in preds:
            probs['ht_home'] = preds['ht_result'][i][0]
            probs['ht_draw'] = preds['ht_result'][i][1]
            probs['ht_away'] = preds['ht_result'][i][2]
        if 'ht_ou05' in preds:
            probs['ht_ou05_over'] = preds['ht_ou05'][i][1]
        if 'ht_ou15' in preds:
            probs['ht_ou15_over'] = preds['ht_ou15'][i][1]

        c_score, conflicts = consistency_score(probs)

        # Actual outcomes (-1 marks a missing label; int(NaN) would raise)
        actual = {}
        for key in ('ms', 'ou15', 'ou25', 'ou35', 'btts'):
            value = row.get(f'label_{key}', -1)
            actual[key] = int(value) if pd.notna(value) else -1

        # Prediction and correctness for each market
        market_results = {}
        for mkt, label_key in [('ms', 'ms'), ('ou15', 'ou15'), ('ou25', 'ou25'),
                               ('ou35', 'ou35'), ('btts', 'btts')]:
            if mkt not in preds or actual[label_key] < 0:
                continue
            pred_class = int(np.argmax(preds[mkt][i]))
            correct = int(pred_class == actual[label_key])

            # Fair odds from the model's own probability (odds = 1/prob). These are
            # not bookmaker odds, so this ROI reflects calibration, not real profit.
            pred_prob = float(preds[mkt][i][pred_class])
            implied_odds = 1 / pred_prob if pred_prob > 0.01 else 10.0

            # ROI: 1-unit stake, a win pays (odds - 1), a loss costs 1
            roi = (implied_odds - 1) * correct - (1 - correct)
            market_results[mkt] = {
                'pred': pred_class,
                'actual': actual[label_key],
                'correct': correct,
                'prob': pred_prob,
                'roi': roi,
            }

        results.append({
            'idx': i,
            'consistency_score': c_score,
            'conflicts': conflicts,
            'probs': probs,
            'market_results': market_results,
        })

    df_results = pd.DataFrame([{
        'consistency_score': r['consistency_score'],
        'n_conflicts': len(r['conflicts']),
        **{f'{m}_correct': r['market_results'].get(m, {}).get('correct', None)
           for m in ['ms', 'ou15', 'ou25', 'ou35', 'btts']},
        **{f'{m}_roi': r['market_results'].get(m, {}).get('roi', None)
           for m in ['ms', 'ou15', 'ou25', 'ou35', 'btts']},
    } for r in results])

    # ── Analysis ────────────────────────────────────────────────────────
    print(f'\n{"="*70}')
    print('INCONSISTENCY ANALYSIS')
    print(f'{"="*70}')

    thresholds = [0.0, 0.1, 0.2, 0.3, 0.5]
    markets = ['ms', 'ou15', 'ou25', 'ou35', 'btts']

    for t in thresholds:
        mask = df_results['consistency_score'] <= t
        n = mask.sum()
        if n < 50:
            continue
        print(f'\n[Inconsistency <= {t:.1f}] → {n:,} matches ({n/len(df_results)*100:.0f}%)')
        print(f' {"Market":<8} {"HitRate":>8} {"ROI/bet":>10} {"Total ROI":>12}')
        print(f' {"-"*42}')
        for m in markets:
            col_c = f'{m}_correct'
            col_r = f'{m}_roi'
            if col_c not in df_results.columns:
                continue
            sub = df_results[mask][col_c].dropna()
            roi_sub = df_results[mask][col_r].dropna()
            if len(sub) < 20:
                continue
            hit = sub.mean()
            avg_roi = roi_sub.mean()
            total_roi = roi_sub.sum()
            print(f' {m:<8} {hit:>7.1%} {avg_roi:>+9.3f} {total_roi:>+11.1f}')

    # Breakdown by conflict type
    print(f'\n{"="*70}')
    print('MOST FREQUENT CONFLICTS')
    print(f'{"="*70}')
    all_conflicts = [c for r in results for c in r['conflicts']]
    from collections import Counter
    for conflict, cnt in Counter(all_conflicts).most_common(10):
        print(f' {cnt:>5}x {conflict}')

    # Inconsistency distribution
    print(f'\n{"="*70}')
    print('INCONSISTENCY DISTRIBUTION')
    print(f'{"="*70}')
    for label, lo, hi in [
        ('Fully consistent', 0.0, 0.05),
        ('Very consistent', 0.05, 0.15),
        ('Moderate', 0.15, 0.30),
        ('Inconsistent', 0.30, 0.50),
        ('Very inconsistent', 0.50, 1.01),
    ]:
        mask = (df_results['consistency_score'] >= lo) & (df_results['consistency_score'] < hi)
        n = mask.sum()
        ou25_hit = df_results[mask]['ou25_correct'].mean()
        ms_hit = df_results[mask]['ms_correct'].mean()
        print(f' {label:<20} {n:>6,} matches ({n/len(df_results)*100:>4.0f}%) | '
              f'MS={ms_hit:.0%} OU25={ou25_hit:.0%}')

    # Save the report
    report = {
        'total_test': len(df_results),
        'thresholds': {},
    }
    for t in thresholds:
        mask = df_results['consistency_score'] <= t
        n = mask.sum()
        report['thresholds'][str(t)] = {
            'n_matches': int(n),
            'pct': round(n/len(df_results)*100, 1),
            'markets': {},
        }
        for m in markets:
            col_c = f'{m}_correct'
            col_r = f'{m}_roi'
            if col_c not in df_results.columns:
                continue
            sub_c = df_results[mask][col_c].dropna()
            sub_r = df_results[mask][col_r].dropna()
            if len(sub_c) > 0:
                report['thresholds'][str(t)]['markets'][m] = {
                    'hit_rate': round(float(sub_c.mean()), 4),
                    'avg_roi': round(float(sub_r.mean()), 4),
                    'total_roi': round(float(sub_r.sum()), 2),
                }

    out_path = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
                            'reports', 'backtest_consistency.json')
    # Create the reports directory if it does not exist yet
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, 'w') as f:
        json.dump(report, f, indent=2)
    print(f'\nReport: {out_path}')


if __name__ == '__main__':
    main()
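The hard constraint that consistency_score enforces, P(over 3.5) <= P(over 2.5) <= P(over 1.5), can also be checked for a whole test set at once. The following is a minimal sketch, not part of the commit: the function name `ou_hierarchy_violations` and the `tol` parameter are hypothetical, and the 0.05 default only mirrors the tolerance used in the script above.

```python
import numpy as np


def ou_hierarchy_violations(ou15_over, ou25_over, ou35_over, tol=0.05):
    """Return a boolean mask of rows that break over 3.5 <= over 2.5 <= over 1.5.

    Hypothetical helper: inputs are per-match "over" probabilities, and `tol`
    is the slack allowed before a row counts as a violation.
    """
    ou15 = np.asarray(ou15_over, dtype=float)
    ou25 = np.asarray(ou25_over, dtype=float)
    ou35 = np.asarray(ou35_over, dtype=float)
    # A higher goal line can never be more likely than a lower one.
    return (ou25 > ou15 + tol) | (ou35 > ou25 + tol)


# Second match: over 2.5 (0.70) exceeds over 1.5 (0.60) by more than the tolerance.
mask = ou_hierarchy_violations([0.80, 0.60], [0.55, 0.70], [0.30, 0.40])
print(mask.tolist())
```

A mask like this could be used to count impossible rows per model version before running the full backtest, which separates outright model defects from the softer, heuristic conflicts.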
@@ -0,0 +1,145 @@
"""
Diagnostic Backtest - Which Market Is Bleeding?
===============================================
Analyses the most recent 500 finished matches to see which markets are losing money.
"""
import os
import sys
import json
import time
import psycopg2
from psycopg2.extras import RealDictCursor
from collections import defaultdict
AI_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(AI_DIR)
sys.path.insert(0, ROOT_DIR)
if "scripts" in os.path.basename(AI_DIR):
ROOT_DIR = os.path.dirname(ROOT_DIR)
from services.single_match_orchestrator import get_single_match_orchestrator
def get_clean_dsn() -> str:
return "postgresql://suggestbet:SuGGesT2026SecuRe@localhost:15432/boilerplate_db"
def run_diagnostic():
print("🔍 DIAGNOSTIC BACKTEST: WHERE DID WE LOSE?")
print("="*60)
leagues_path = os.path.join(ROOT_DIR, "top_leagues.json")
with open(leagues_path, 'r') as f:
top_leagues = json.load(f)
league_ids = tuple(str(lid) for lid in top_leagues)
dsn = get_clean_dsn()
conn = psycopg2.connect(dsn)
cur = conn.cursor(cursor_factory=RealDictCursor)
cur.execute("""
SELECT m.id, m.match_name, m.home_team_id, m.away_team_id,
m.score_home, m.score_away, m.league_id,
t1.name as home_team, t2.name as away_team
FROM matches m
LEFT JOIN teams t1 ON m.home_team_id = t1.id
LEFT JOIN teams t2 ON m.away_team_id = t2.id
WHERE m.league_id IN %s
AND m.status = 'FT'
AND m.score_home IS NOT NULL
AND EXISTS (SELECT 1 FROM odd_categories oc WHERE oc.match_id = m.id)
ORDER BY m.mst_utc DESC
LIMIT 500
""", (league_ids,))
rows = cur.fetchall()
print(f"📊 Analysing {len(rows)} matches...\n")
try: orchestrator = get_single_match_orchestrator()
except Exception as e:
print(f"❌ AI error: {e}")
return
# Market Stats: { "MS": {"won": 10, "lost": 20, "profit": -5.0}, ... }
market_stats = defaultdict(lambda: {"won": 0, "lost": 0, "profit": 0.0, "total": 0})
for i, row in enumerate(rows):
match_id = str(row['id'])
h_score = row['score_home'] or 0
a_score = row['score_away'] or 0
try:
pred = orchestrator.analyze_match(match_id)
if not pred: continue
candidates = []
if pred.get("expert_recommendation"):
rec = pred["expert_recommendation"]
if rec.get("main_pick"): candidates.append(rec["main_pick"])
if rec.get("value_picks"): candidates.extend(rec["value_picks"])
elif pred.get("main_pick"):
candidates.append(pred["main_pick"])
played_this = False
for c in candidates:
if not c: continue
conf = c.get("confidence", 0)
odds = c.get("odds", 0)
pick = str(c.get("pick")).upper()
market_type = c.get("market_type", "Unknown")
# Criteria
if conf >= 60 and odds > 1.10:
implied = 1.0 / odds
edge = ((conf/100) - implied) * 100
if edge > -2.0:
# Resolve
won = False
if pick in ["1", "MS 1"] and h_score > a_score: won = True
elif pick in ["X", "MS X"] and h_score == a_score: won = True
elif pick in ["2", "MS 2"] and a_score > h_score: won = True
elif pick in ["1X", "X2"]:
if "1X" in pick and h_score >= a_score: won = True
elif "X2" in pick and a_score >= h_score: won = True
elif pick == "12" and h_score != a_score: won = True
elif "ÜST" in pick or "OVER" in pick:
line = 2.5
if "1.5" in pick: line = 1.5
elif "3.5" in pick: line = 3.5
if (h_score + a_score) > line: won = True
elif "ALT" in pick or "UNDER" in pick:
line = 2.5
if "1.5" in pick: line = 1.5
elif "3.5" in pick: line = 3.5
if (h_score + a_score) < line: won = True
elif "VAR" in pick and h_score > 0 and a_score > 0: won = True
elif "YOK" in pick and (h_score == 0 or a_score == 0): won = True
market_stats[market_type]["total"] += 1
if won:
market_stats[market_type]["won"] += 1
market_stats[market_type]["profit"] += (odds - 1.0)
else:
market_stats[market_type]["lost"] += 1
market_stats[market_type]["profit"] -= 1.0
played_this = True
break # Only one bet per match
except Exception: pass  # skip matches the orchestrator cannot analyse
# Print Results
print("\n" + "="*60)
print("📊 PROFIT/LOSS BY MARKET")
print("="*60)
print(f"{'Market':<15} {'Played':<10} {'Won':<10} {'Win%':<8} {'Profit':<10}")
print("-" * 60)
for mkt, stats in sorted(market_stats.items(), key=lambda x: x[1]["profit"], reverse=True):
wr = (stats["won"] / stats["total"] * 100) if stats["total"] > 0 else 0
print(f"{mkt:<15} {stats['total']:<10} {stats['won']:<10} {wr:.1f}% {stats['profit']:+.2f} Units")
cur.close()
conn.close()
if __name__ == "__main__":
run_diagnostic()
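The stake filter in run_diagnostic compares the model's confidence with the probability implied by the bookmaker odds. A minimal sketch of that rule, pulled out into standalone functions: the names `edge_pct` and `is_playable` are hypothetical, while the default thresholds (confidence >= 60, odds > 1.10, edge > -2.0) are the ones used in the script above.

```python
def edge_pct(confidence_pct: float, odds: float) -> float:
    """Edge in percentage points: model confidence minus the implied probability."""
    implied = 1.0 / odds
    return (confidence_pct / 100.0 - implied) * 100.0


def is_playable(confidence_pct: float, odds: float,
                min_conf: float = 60.0, min_odds: float = 1.10,
                min_edge: float = -2.0) -> bool:
    """Hypothetical helper mirroring the three-part gate in run_diagnostic."""
    if confidence_pct < min_conf or odds <= min_odds:
        return False
    return edge_pct(confidence_pct, odds) > min_edge


# 60% confidence at odds 2.00: implied probability 50%, so roughly +10 points of edge.
print(is_playable(60, 2.0))
# 60% confidence at odds 1.50: implied probability 66.7%, edge about -6.7 points.
print(is_playable(60, 1.5))
```

Note that a minimum edge of -2.0 still admits bets whose implied probability is slightly higher than the model's confidence, so the gate tolerates a small negative expected value.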
@@ -0,0 +1,257 @@
"""
Multi-market hit-rate backtest.
Runs the orchestrator against historical finished matches and measures raw V25
pick accuracy per market independent of the "playable" gate. This isolates
model quality from the value-detection thresholds.
Usage:
python scripts/backtest_hitrate.py --start 2026-05-01 --end 2026-05-09 [--limit 500]
"""
from __future__ import annotations
import argparse
import json
import os
import sys
import time
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple
import psycopg2
from psycopg2.extras import RealDictCursor
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from data.db import get_clean_dsn
from services.single_match_orchestrator import SingleMatchOrchestrator
def fetch_matches(cur, start_date: str, end_date: str, limit: Optional[int]) -> List[Dict[str, Any]]:
cur.execute(
"""
SELECT m.id, m.score_home, m.score_away, m.ht_score_home, m.ht_score_away,
m.mst_utc, t1.name as home_name, t2.name as away_name
FROM matches m
LEFT JOIN teams t1 ON m.home_team_id = t1.id
LEFT JOIN teams t2 ON m.away_team_id = t2.id
WHERE m.status IN ('FT', 'AET', 'PEN')
AND m.sport = 'football'
AND to_timestamp(m.mst_utc / 1000.0)::date BETWEEN %s::date AND %s::date
AND m.score_home IS NOT NULL
AND m.score_away IS NOT NULL
ORDER BY m.mst_utc ASC
""" + (f" LIMIT {int(limit)}" if limit else ""),
(start_date, end_date),
)
return cur.fetchall()
def actual_ms(h: int, a: int) -> str:
return "1" if h > a else ("X" if h == a else "2")
def actual_ht(hh: Optional[int], ha: Optional[int]) -> Optional[str]:
if hh is None or ha is None:
return None
return "1" if hh > ha else ("X" if hh == ha else "2")
OVER_TOKENS = {"over", "üst", "ust"}
UNDER_TOKENS = {"under", "alt"}
YES_TOKENS = {"yes", "var", "kg var"}
NO_TOKENS = {"no", "yok", "kg yok"}
ODD_TOKENS = {"odd", "tek"}
EVEN_TOKENS = {"even", "çift", "cift"}
def _norm(s: str) -> str:
return str(s or "").strip().lower()
def score_pick(market: str, predicted: str, h: int, a: int, hh: Optional[int], ha: Optional[int]) -> Optional[bool]:
"""Return True/False for hit, or None if cannot evaluate."""
total = h + a
ht_total = (hh + ha) if hh is not None and ha is not None else None
p = _norm(predicted)
if market == "MS":
return p.upper() == actual_ms(h, a)
if market in ("OU15", "OU25", "OU35"):
line = {"OU15": 1.5, "OU25": 2.5, "OU35": 3.5}[market]
if p in OVER_TOKENS:
return total > line
if p in UNDER_TOKENS:
return total < line
return None
if market == "BTTS":
btts = h > 0 and a > 0
if p in YES_TOKENS:
return btts
if p in NO_TOKENS:
return not btts
return None
if market == "HT":
ht = actual_ht(hh, ha)
return None if ht is None else p.upper() == ht
if market in ("HT_OU05", "HT_OU15"):
if ht_total is None:
return None
line = 0.5 if market == "HT_OU05" else 1.5
if p in OVER_TOKENS:
return ht_total > line
if p in UNDER_TOKENS:
return ht_total < line
return None
if market == "HTFT":
ht = actual_ht(hh, ha)
if ht is None:
return None
full = actual_ms(h, a)
norm = p.replace(" ", "").upper().replace("0", "X")
return norm == f"{ht}/{full}"
if market == "OE":
odd = total % 2 == 1
if p in ODD_TOKENS:
return odd
if p in EVEN_TOKENS:
return not odd
return None
if market == "DC":
ms = actual_ms(h, a)
compact = p.replace("-", "").upper()
if compact == "1X":
return ms in ("1", "X")
if compact == "X2":
return ms in ("X", "2")
if compact == "12":
return ms in ("1", "2")
return None
# CARDS / HCAP cannot be scored without extra data
return None
def top_pick(probs: Dict[str, float]) -> Tuple[Optional[str], float]:
if not probs:
return None, 0.0
key = max(probs, key=lambda k: float(probs.get(k, 0) or 0))
return key, float(probs.get(key, 0) or 0)
def run(start_date: str, end_date: str, limit: Optional[int], out_path: Optional[str]) -> None:
dsn = get_clean_dsn()
print(f"DSN host={dsn.split('@')[-1].split('/')[0]}")
conn = psycopg2.connect(dsn)
cur = conn.cursor(cursor_factory=RealDictCursor)
matches = fetch_matches(cur, start_date, end_date, limit)
print(f"Found {len(matches)} matches between {start_date} and {end_date}")
if not matches:
return
orchestrator = SingleMatchOrchestrator()
market_stats: Dict[str, Dict[str, Any]] = defaultdict(lambda: {
"total": 0, "hits": 0, "skipped": 0,
"playable_total": 0, "playable_hits": 0,
"conf_sum": 0.0,
})
detailed_rows: List[Dict[str, Any]] = []
errors = 0
started = time.time()
for idx, m in enumerate(matches, 1):
try:
pkg = orchestrator.analyze_match(m["id"])
except Exception as e:
errors += 1
if errors <= 5:
print(f"[ERR] {m['id']}: {e}")
continue
if not pkg:
continue
board = pkg.get("market_board", {}) or {}
h = int(m["score_home"])
a = int(m["score_away"])
hh = m.get("ht_score_home")
ha = m.get("ht_score_away")
for market, entry in board.items():
if not isinstance(entry, dict):
continue
probs = entry.get("probs") or {}
pick, prob = top_pick(probs)
if pick is None:
continue
hit = score_pick(market, pick, h, a, hh, ha)
stats = market_stats[market]
if hit is None:
stats["skipped"] += 1
continue
stats["total"] += 1
stats["conf_sum"] += prob
if hit:
stats["hits"] += 1
if entry.get("playable") is True:
stats["playable_total"] += 1
if hit:
stats["playable_hits"] += 1
detailed_rows.append({
"match_id": m["id"],
"market": market,
"pick": pick,
"prob": round(prob, 4),
"hit": hit,
"playable": bool(entry.get("playable")),
"score": f"{h}-{a}",
"ht_score": f"{hh}-{ha}" if hh is not None else None,
})
if idx % 25 == 0:
elapsed = time.time() - started
print(f" ... processed {idx}/{len(matches)} ({elapsed:.1f}s)")
elapsed = time.time() - started
print("\n" + "=" * 72)
print(f"BACKTEST {start_date} .. {end_date} | matches={len(matches)} errors={errors} elapsed={elapsed:.1f}s")
print("=" * 72)
header = f"{'Market':<10} {'N':>5} {'Hit':>5} {'Rate':>7} {'AvgConf':>8} | {'PlayN':>6} {'PlayHit':>7} {'PlayRate':>8}"
print(header)
print("-" * 72)
for market in sorted(market_stats.keys()):
s = market_stats[market]
n = s["total"]
rate = (s["hits"] / n * 100) if n else 0.0
avg_conf = (s["conf_sum"] / n * 100) if n else 0.0
pn = s["playable_total"]
prate = (s["playable_hits"] / pn * 100) if pn else 0.0
print(f"{market:<10} {n:>5} {s['hits']:>5} {rate:>6.1f}% {avg_conf:>7.1f}% | {pn:>6} {s['playable_hits']:>7} {prate:>7.1f}%")
if out_path:
payload = {
"range": {"start": start_date, "end": end_date},
"match_count": len(matches),
"errors": errors,
"elapsed_sec": round(elapsed, 1),
"market_stats": {k: dict(v) for k, v in market_stats.items()},
"rows": detailed_rows,
}
with open(out_path, "w") as f:
json.dump(payload, f, indent=2, ensure_ascii=False)
print(f"\nSaved details to {out_path}")
def main() -> None:
p = argparse.ArgumentParser()
p.add_argument("--start", required=True, help="YYYY-MM-DD")
p.add_argument("--end", required=True, help="YYYY-MM-DD")
p.add_argument("--limit", type=int, default=None)
p.add_argument("--out", default=None, help="Optional JSON output path")
args = p.parse_args()
run(args.start, args.end, args.limit, args.out)
if __name__ == "__main__":
main()
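The JSON written via --out stores total, hits and conf_sum for each market, which is enough to compare the hit rate with the average predicted probability. A minimal sketch under that assumption: the function name `calibration_gaps` is hypothetical, and it only reads the three keys that run() populates in market_stats.

```python
from typing import Any, Dict


def calibration_gaps(market_stats: Dict[str, Dict[str, Any]]) -> Dict[str, float]:
    """Return hit rate minus average predicted probability for each market.

    A negative value means the model was more confident than its results justify.
    Markets with no scored picks are left out.
    """
    gaps: Dict[str, float] = {}
    for market, s in market_stats.items():
        n = s.get("total", 0)
        if not n:
            continue
        hit_rate = s["hits"] / n
        avg_conf = s["conf_sum"] / n
        gaps[market] = round(hit_rate - avg_conf, 4)
    return gaps


# Illustrative numbers only, not results from the repository.
example = {
    "MS": {"total": 100, "hits": 50, "conf_sum": 55.0},
    "HCAP": {"total": 0, "hits": 0, "conf_sum": 0.0},
}
print(calibration_gaps(example))
```

Loading the saved file with json.load and passing its "market_stats" entry to this function would give a quick overconfidence check per market.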
@@ -0,0 +1,310 @@
"""
League Model Backtest - Last 100+ Matches
=========================================
For each league, predicts the most recent 100-200 matches (a test set
independent of the training data) with the league-specific model and
compares the predictions with the actual results.

Usage:
    python scripts/backtest_league_models.py
    python scripts/backtest_league_models.py --min-matches 150
"""
import os, sys, json, warnings, argparse
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score
warnings.filterwarnings("ignore")
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from models.league_model import get_league_model_loader, MARKET_META, FILE_TO_SIGNAL
AI_ENGINE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
DATA_PATH = os.path.join(AI_ENGINE_DIR, "data", "training_data.csv")
REPORTS_DIR = os.path.join(AI_ENGINE_DIR, "reports")
QL_PATH = os.path.join(os.path.dirname(AI_ENGINE_DIR), "qualified_leagues.json")
# Actual label columns (from the CSV)
LABEL_COLS = {
"MS": "label_ms",
"OU15": "label_ou15",
"OU25": "label_ou25",
"OU35": "label_ou35",
"BTTS": "label_btts",
"HT": "label_ht_result",
"HT_OU05": "label_ht_ou05",
"HT_OU15": "label_ht_ou15",
"HTFT": "label_ht_ft",
"OE": "label_odd_even",
"CARDS": "label_cards_ou45",
"HCAP": "label_handicap_ms",
}
# Signal key → model file name mapping (inverse of FILE_TO_SIGNAL)
SIGNAL_TO_FILE = {v: k for k, v in FILE_TO_SIGNAL.items()}
SKIP_COLS = {
"match_id","home_team_id","away_team_id","league_id","mst_utc",
"score_home","score_away","total_goals","ht_score_home","ht_score_away","ht_total_goals",
"label_ms","label_ou05","label_ou15","label_ou25","label_ou35","label_btts",
"label_ht_result","label_ht_ou05","label_ht_ou15","label_ht_ft",
"label_odd_even","label_yellow_cards","label_cards_ou45","label_handicap_ms",
}
def backtest_league(
    league_id: str,
    df_league: pd.DataFrame,
    feature_cols: list,
    league_model,
    n_test: int,
) -> dict:
    """Backtest the last n_test matches and return the accuracy for each market."""
    df_sorted = df_league.sort_values("mst_utc")
    df_test = df_sorted.tail(n_test)

    results = {}
    for sig_key, mfile_key in SIGNAL_TO_FILE.items():
        label_col = LABEL_COLS.get(sig_key)
        if not label_col or label_col not in df_test.columns:
            continue
        # Keep only rows with a known label so predictions and labels stay aligned
        df_valid = df_test[df_test[label_col].notna()]
        if len(df_valid) < 30:
            continue

        # Use the league-specific model when one exists
        if league_model and league_model.has_market(mfile_key):
            labels = MARKET_META[mfile_key][1]
            preds = []
            y_valid = []
            for _, row in df_valid.iterrows():
                feat = row[feature_cols].fillna(0).to_dict()
                probs = league_model.predict_market(mfile_key, feat)
                if probs:
                    best = max(probs, key=probs.__getitem__)
                    preds.append(labels.index(best))
                    # Record the label of the same row the prediction came from
                    y_valid.append(row[label_col])
            if not preds:
                continue
            acc = accuracy_score(y_valid, preds)
            results[sig_key] = {
                "accuracy": round(acc, 4),
                "n": len(preds),
                "source": "league_specific",
            }
    return results
def backtest_with_general_v25(
df_test: pd.DataFrame,
feature_cols: list,
) -> dict:
"""Backtest with the general V25 model."""
try:
from models.v25_ensemble import get_v25_predictor
v25 = get_v25_predictor()
if not v25._loaded:
v25.load_models()
except Exception as e:
return {}
X = df_test[feature_cols].fillna(0)
results = {}
mkey_map = {
"MS": ("ms", {"1": 0, "X": 1, "2": 2}),
"OU15": ("ou15", {"Over": 0, "Under": 1}),
"OU25": ("ou25", {"Over": 0, "Under": 1}),
"OU35": ("ou35", {"Over": 0, "Under": 1}),
"BTTS": ("btts", {"Yes": 0, "No": 1}),
}
for sig_key, (mkey, label_to_idx) in mkey_map.items():
label_col = LABEL_COLS.get(sig_key)
if not label_col or label_col not in df_test.columns:
continue
y_true = df_test[label_col].dropna().values
if len(y_true) < 30 or not v25.has_market(mkey):
continue
try:
dmat = xgb.DMatrix(X.values, feature_names=feature_cols)
models_v25 = v25.models.get(mkey, {})
if "xgb" not in models_v25:
continue
raw = models_v25["xgb"].predict(dmat)
num_class = list(MARKET_META.get(mkey, (2,)))[0]
if num_class > 2:
raw = raw.reshape(-1, num_class)
preds = np.argmax(raw, axis=1)
else:
preds = (raw >= 0.5).astype(int)
acc = accuracy_score(y_true, preds)
results[sig_key] = {
"accuracy": round(acc, 4),
"n": len(preds),
"source": "general_v25",
}
except Exception:
continue
return results
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--min-matches", type=int, default=100)
parser.add_argument("--test-size", type=int, default=150,
help="Number of most recent matches to use for testing (min 100)")
args = parser.parse_args()
n_test = max(args.min_matches, args.test_size)
print(f"Loading training data ...")
df = pd.read_csv(DATA_PATH, low_memory=False)
feature_cols = [c for c in df.columns if c not in SKIP_COLS]
print(f" {len(df):,} matches | {len(feature_cols)} features")
qualified = json.load(open(QL_PATH)) if os.path.exists(QL_PATH) else []
loader = get_league_model_loader()
try:
import psycopg2
from data.db import get_clean_dsn
conn = psycopg2.connect(get_clean_dsn())
cur = conn.cursor()
cur.execute("SELECT id, name FROM leagues WHERE id = ANY(%s)", (qualified,))
league_names = {r[0]: r[1] for r in cur.fetchall()}
conn.close()
except Exception:
league_names = {}
counts = df[df["league_id"].isin(qualified)].groupby("league_id").size()
leagues_to_test = counts[counts >= n_test].index.tolist()
print(f"\nBacktest: {len(leagues_to_test)} leagues (>={n_test} matches) | using the last {n_test} matches\n")
all_results = []
markets_order = ["MS", "OU15", "OU25", "OU35", "BTTS", "HT", "HT_OU05", "HT_OU15", "HTFT", "OE", "CARDS", "HCAP"]
header = f"{'League':<35} {'Games':>5} | " + " | ".join(f"{m:>7}" for m in markets_order)
print(header)
print("-" * len(header))
for league_id in leagues_to_test:
df_league = df[df["league_id"] == league_id].copy()
name = league_names.get(league_id, league_id[:20])
league_model = loader.get(league_id)
if league_model and league_model.models:
# Batch predict from CSV features (fast)
df_test = df_league.sort_values("mst_utc").tail(n_test)
X = df_test[feature_cols].fillna(0)
mkt_results = {}
for mfile_key in list(league_model.models.keys()):
sig_key = FILE_TO_SIGNAL.get(mfile_key)
if not sig_key:
continue
label_col = LABEL_COLS.get(sig_key)
if not label_col or label_col not in df_test.columns:
continue
y_true = df_test[label_col].dropna().values
if len(y_true) < 30:
continue
try:
dmat = xgb.DMatrix(X.values, feature_names=feature_cols)
raw = league_model.models[mfile_key].predict(dmat)
nc = MARKET_META[mfile_key][0]
if nc > 2:
preds = np.argmax(raw.reshape(-1, nc), axis=1)
else:
preds = (raw >= 0.5).astype(int)
acc = accuracy_score(y_true[:len(preds)], preds[:len(y_true)])
mkt_results[sig_key] = {"accuracy": round(float(acc), 4), "n": len(preds), "source": "league_xgb"}
except Exception as e:
mkt_results[sig_key] = {"error": str(e)}
# Fill missing markets with general V25
missing_mkts_df = df_league.sort_values("mst_utc").tail(n_test)
gen_results = backtest_with_general_v25(missing_mkts_df, feature_cols)
for k, v in gen_results.items():
if k not in mkt_results:
mkt_results[k] = {**v, "source": "general_v25_fallback"}
else:
# No league model — use general V25
df_test = df_league.sort_values("mst_utc").tail(n_test)
mkt_results = backtest_with_general_v25(df_test, feature_cols)
for k in mkt_results:
mkt_results[k]["source"] = "general_v25"
n_used = min(n_test, len(df_league))
# Print row
accs = []
for m in markets_order:
r = mkt_results.get(m, {})
if "accuracy" in r:
accs.append(f"{r['accuracy']*100:>6.1f}%")
else:
accs.append(f"{'':>7}")
print(f"{name:<35} {n_used:>5} | " + " | ".join(accs))
all_results.append({
"league_id": league_id,
"league_name": name,
"n_tested": n_used,
"markets": mkt_results,
})
# ── Summary ───────────────────────────────────────────────────
print("\n" + "=" * len(header))
print("AVERAGE ACCURACY (all leagues):")
for m in markets_order:
accs = [r["markets"][m]["accuracy"] for r in all_results if m in r["markets"] and "accuracy" in r["markets"][m]]
if accs:
print(f" {m:<10}: {np.mean(accs)*100:.1f}% (min={min(accs)*100:.1f}% max={max(accs)*100:.1f}% n_leagues={len(accs)})")
# Best / worst leagues by MS accuracy
ms_sorted = sorted(
[(r["league_name"], r["markets"].get("MS",{}).get("accuracy",0), r["n_tested"])
for r in all_results if "MS" in r["markets"] and "accuracy" in r["markets"]["MS"]],
key=lambda x: x[1], reverse=True
)
print("\nBEST MS (Top 10):")
for name, acc, n in ms_sorted[:10]:
print(f" {name:<35} {acc*100:.1f}% ({n} matches)")
print("\nWORST MS (Bottom 10):")
for name, acc, n in ms_sorted[-10:]:
print(f" {name:<35} {acc*100:.1f}% ({n} matches)")
# Save
report = {"generated_at": pd.Timestamp.now().isoformat(), "n_test_per_league": n_test, "results": all_results}
out_path = os.path.join(REPORTS_DIR, "backtest_league_results.json")
with open(out_path, "w") as f:
json.dump(report, f, indent=2)
print(f"\nReport: {out_path}")
if __name__ == "__main__":
main()
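The batch path in main() drops missing labels with dropna() but predicts on every test row, then trims both arrays to a common length, which can pair a prediction with the label of a different match. A minimal sketch of one way to keep them aligned, assuming labels arrive as a numeric array with NaN for missing values: the function name `aligned_accuracy` is hypothetical and is not part of the commit.

```python
from typing import Optional

import numpy as np


def aligned_accuracy(labels, preds) -> Optional[float]:
    """Accuracy over rows that have a label, using one shared mask.

    `labels` may contain NaN for matches without a recorded outcome; `preds`
    holds one predicted class index per row, in the same row order.
    """
    labels = np.asarray(labels, dtype=float)
    preds = np.asarray(preds)
    mask = ~np.isnan(labels)
    if not mask.any():
        return None
    # The same mask filters both arrays, so row i is always compared with row i.
    return float((labels[mask].astype(int) == preds[mask]).mean())


# The second match has no label, so it is ignored instead of shifting later rows.
print(aligned_accuracy([1, np.nan, 0, 1], [1, 1, 0, 0]))
```

With length trimming, the same input would compare the labels [1, 0, 1] with the first three predictions [1, 1, 0] and report a different accuracy, which is the failure mode the shared mask avoids.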
@@ -0,0 +1,136 @@
"""
Real-Odds Backtest
==================
Compares the model probability with the actual bookmaker odds.
A bet is assumed to be placed whenever there is an edge, and the real ROI is computed.
"""
import os, sys, json
import numpy as np
import pandas as pd
import xgboost as xgb
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
DATA_PATH = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'data', 'training_data.csv')
MODELS_DIR = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'models', 'v25')
REPORT_DIR = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'reports')
SKIP_COLS = {
'match_id','home_team_id','away_team_id','league_id','mst_utc',
'score_home','score_away','total_goals','ht_score_home','ht_score_away','ht_total_goals',
'label_ms','label_ou05','label_ou15','label_ou25','label_ou35','label_btts',
'label_ht_result','label_ht_ou05','label_ht_ou15','label_ht_ft',
'label_odd_even','label_yellow_cards','label_cards_ou45','label_handicap_ms',
}
# (model_key, n_class, pred_class, label_col, odds_col, display name)
MARKETS = [
('ms', 3, 0, 'label_ms', 'odds_ms_h', 'MS-Ev'),
('ms', 3, 1, 'label_ms', 'odds_ms_d', 'MS-Ber'),
('ms', 3, 2, 'label_ms', 'odds_ms_a', 'MS-Dep'),
('ou15', 2, 1, 'label_ou15', 'odds_ou15_o', 'OU15-Ust'),
('ou15', 2, 0, 'label_ou15', 'odds_ou15_u', 'OU15-Alt'),
('ou25', 2, 1, 'label_ou25', 'odds_ou25_o', 'OU25-Ust'),
('ou25', 2, 0, 'label_ou25', 'odds_ou25_u', 'OU25-Alt'),
('ou35', 2, 1, 'label_ou35', 'odds_ou35_o', 'OU35-Ust'),
('ou35', 2, 0, 'label_ou35', 'odds_ou35_u', 'OU35-Alt'),
('btts', 2, 1, 'label_btts', 'odds_btts_y', 'BTTS-Var'),
('btts', 2, 0, 'label_btts', 'odds_btts_n', 'BTTS-Yok'),
]
MIN_ODDS = 1.10
MAX_ODDS = 10.0
def load_model(market):
    path = os.path.join(MODELS_DIR, f'xgb_v25_{market}.json')
    if not os.path.exists(path):
        return None
    b = xgb.Booster()
    b.load_model(path)
    return b


def main():
    print('Loading data...')
    df = pd.read_csv(DATA_PATH, low_memory=False)
    df = df.sort_values('mst_utc')
    # Chronological hold-out: the most recent 20% of matches form the test set.
    n_test = int(len(df) * 0.20)
    df_test = df.tail(n_test).copy().reset_index(drop=True)
    print(f'Test set: {len(df_test):,} matches')
    feature_cols = [c for c in df.columns if c not in SKIP_COLS]
    X = df_test[feature_cols].fillna(0).values

    # Load the models
    loaded = {}
    for mkey, n_class, *_ in MARKETS:
        if mkey not in loaded:
            m = load_model(mkey)
            if m is not None:
                loaded[mkey] = (m, n_class)
    print(f'Models: {list(loaded.keys())}')

    # Batch prediction: the feature matrix is the same for every model, so build it once.
    dmat = xgb.DMatrix(pd.DataFrame(X, columns=feature_cols))
    raw_preds = {}
    for mkey, (model, n_class) in loaded.items():
        raw = model.predict(dmat)
        # Multiclass output is reshaped to (n, n_class); binary output is
        # treated as P(class 1) and expanded to two columns.
        raw_preds[mkey] = raw.reshape(-1, n_class) if n_class > 2 else np.column_stack([1-raw, raw])

    # Backtest
    all_results = []
    print(f'\n{"Market":<12} {"Edge>=":>7} {"Bets":>7} {"Hit%":>7} {"AvgOdds":>9} {"ROI/b":>8} {"Total":>10}')
    print('-' * 65)
    for mkey, n_class, pred_cls, label_col, odds_col, isim in MARKETS:
        if mkey not in raw_preds or label_col not in df_test.columns or odds_col not in df_test.columns:
            continue
        mp = raw_preds[mkey][:, pred_cls]
        act = pd.to_numeric(df_test[label_col], errors='coerce').values
        bko = pd.to_numeric(df_test[odds_col], errors='coerce').values
        valid = (~np.isnan(act) & ~np.isnan(bko) &
                 (bko >= MIN_ODDS) & (bko <= MAX_ODDS))
        mp, act, bko = mp[valid], act[valid].astype(int), bko[valid]
        implied = 1.0 / bko
        edge = mp - implied
        print(f'\n{isim}:')
        for min_e in [0.02, 0.03, 0.05, 0.07, 0.10]:
            mask = edge >= min_e
            n = int(mask.sum())  # plain int: json.dump cannot serialize numpy.int64
            if n < 20:
                continue
            won = (act[mask] == pred_cls).astype(int)
            # Per-bet profit at unit stake: (odds - 1) on a win, -1 on a loss.
            roi = (bko[mask] - 1) * won - (1 - won)
            hit = won.mean()
            avg_roi = roi.mean()
            total = roi.sum()
            avg_odds = bko[mask].mean()
            sign = '+' if total > 0 else ''
            print(f' edge>={min_e:+.0%} n={n:>5,} hit={hit:.1%} odds={avg_odds:.2f} roi/b={avg_roi:+.3f} total={sign}{total:.1f}')
            all_results.append({'market': isim, 'min_edge': min_e, 'n': n,
                                'hit': round(hit, 4), 'avg_odds': round(avg_odds, 3),
                                'avg_roi': round(avg_roi, 4), 'total_roi': round(total, 2)})

    # Best combinations
    winners = sorted([r for r in all_results if r['total_roi'] > 0],
                     key=lambda x: x['avg_roi'], reverse=True)
    print(f'\n{"="*65}')
    print('PROFITABLE COMBINATIONS (total_roi > 0):')
    print(f'{"="*65}')
    for r in winners[:20]:
        print(f' {r["market"]:<12} edge>={r["min_edge"]:+.0%} | n={r["n"]:>5,} | '
              f'hit={r["hit"]:.0%} | roi/b={r["avg_roi"]:+.3f} | total={r["total_roi"]:+.1f}')
    os.makedirs(REPORT_DIR, exist_ok=True)
    with open(os.path.join(REPORT_DIR, 'backtest_real_odds.json'), 'w') as f:
        json.dump(all_results, f, indent=2)
    print('\nReport saved.')


if __name__ == '__main__':
    main()
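Review note: the threshold sweep in this script comes down to three array operations per market. The sketch below replays them on made-up numbers, assuming unit stakes and decimal odds. The `flat_stake_roi` name and the sample arrays are illustrative only and are not part of the commit.

```python
import numpy as np


def flat_stake_roi(model_prob, odds, won, min_edge):
    """Return (n_bets, hit_rate, roi_per_bet) for bets whose edge >= min_edge.

    Hypothetical helper for illustration. It mirrors the per-bet profit
    formula of the script: (odds - 1) on a win, -1 on a loss, unit stakes.
    """
    model_prob = np.asarray(model_prob, dtype=float)
    odds = np.asarray(odds, dtype=float)
    won = np.asarray(won, dtype=int)
    edge = model_prob - 1.0 / odds  # model probability minus implied probability
    mask = edge >= min_edge
    n = int(mask.sum())
    if n == 0:
        return 0, 0.0, 0.0
    profit = (odds[mask] - 1.0) * won[mask] - (1 - won[mask])
    return n, float(won[mask].mean()), float(profit.mean())


# Made-up example: four candidate bets at decimal odds.
probs = [0.60, 0.40, 0.55, 0.30]
odds = [2.50, 2.00, 1.50, 4.00]
won = [1, 0, 1, 0]
n, hit, roi = flat_stake_roi(probs, odds, won, min_edge=0.04)
print(n, hit, roi)
```

Only the first and last bets clear the edge threshold here, so the ROI is averaged over two bets rather than four. That is the same reason the script skips any threshold with fewer than 20 qualifying bets.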
+231
@@ -0,0 +1,231 @@
"""
Backtest ROI Engine
===================
Simulates the NEW "Skip Logic" on historical predictions.
Answers: "What if we only played the bets the model was confident about?"
Usage:
python ai-engine/scripts/backtest_roi.py
"""
import os
import sys
import json
import psycopg2
from psycopg2.extras import RealDictCursor
from typing import Dict, List, Any
from dotenv import load_dotenv
# Load .env from project root (2 levels up from this script)
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
load_dotenv(os.path.join(project_root, ".env"))
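def get_clean_dsn() -> str:
    """Return a psycopg2-compatible DSN from DATABASE_URL."""
    # Read the DSN from the environment (populated by load_dotenv above)
    # instead of committing database credentials to the repository.
    # Any query string (e.g. a Prisma-style ?schema=...) is dropped because
    # libpq rejects URI parameters it does not know.
    dsn = os.environ.get("DATABASE_URL")
    if not dsn:
        raise RuntimeError("DATABASE_URL is not set; export it or add it to .env")
    return dsn.split("?", 1)[0]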
# ─── Configuration (Matching the NEW BetRecommender Logic) ─────────
# Minimum confidence to even consider a bet (Hard Gate)
MIN_CONF_THRESHOLDS = {
"MS": 45.0,
"DC": 40.0,
"OU15": 50.0,
"OU25": 45.0,
"OU35": 45.0,
"BTTS": 45.0,
"HT": 40.0,
}
def get_market_type_from_key(key: str) -> str:
"""Map prediction keys to market types for thresholding."""
if key.startswith("ms_") or key in ["1", "X", "2"]: return "MS"
if key.startswith("dc_") or key in ["1X", "X2", "12"]: return "DC"
if key.startswith("ou15_") or key.startswith("1.5"): return "OU15"
if key.startswith("ou25_") or key.startswith("2.5"): return "OU25"
if key.startswith("ou35_") or key.startswith("3.5"): return "OU35"
if key.startswith("btts_") or key in ["Var", "Yok"]: return "BTTS"
if key.startswith("ht_") or key.startswith("İY"): return "HT"
return "MS"
def simulate_backtest():
print("🚀 Starting Backtest with NEW 'Skip Logic'...")
print("="*60)
dsn = get_clean_dsn()
conn = psycopg2.connect(dsn)
cur = conn.cursor(cursor_factory=RealDictCursor)
# 1. Fetch PREDICTIONS that have a confidence score
    # We limit to the last 2000 finished matches to keep it fast but representative
cur.execute("""
SELECT p.match_id, p.prediction_json,
m.score_home, m.score_away, m.status
FROM predictions p
JOIN matches m ON p.match_id = m.id
WHERE m.status = 'FT'
AND p.prediction_json IS NOT NULL
ORDER BY m.mst_utc DESC
LIMIT 2000
""")
predictions = cur.fetchall()
print(f"📊 Loaded {len(predictions)} historical predictions.")
total_bets = 0
winning_bets = 0
skipped_bets = 0
total_profit = 0.0 # Assuming unit stake of 1.0
# 2. Process each prediction
for pred_row in predictions:
match_id = pred_row['match_id']
data = pred_row['prediction_json']
if isinstance(data, str):
data = json.loads(data)
# Real result
home_score = pred_row['score_home'] or 0
away_score = pred_row['score_away'] or 0
total_goals = home_score + away_score
# Extract prediction details from the JSON structure
# The structure varies, but usually contains 'main_pick', 'bet_summary', or 'market_board'
# Try to get the main pick recommendation
main_pick = None
main_pick_conf = 0.0
main_pick_odds = 0.0
# Navigate the V20+ JSON structure
market_board = data.get("market_board", {})
# Check Main Pick
if "main_pick" in data:
mp = data["main_pick"]
if isinstance(mp, dict):
main_pick = mp.get("pick")
main_pick_conf = mp.get("confidence", 0.0)
main_pick_odds = mp.get("odds", 0.0)
# If no main pick, try bet_summary
if not main_pick and "bet_summary" in data:
summary = data["bet_summary"]
if isinstance(summary, list) and len(summary) > 0:
# Take the highest confidence one
best = max(summary, key=lambda x: x.get("confidence", 0))
main_pick = best.get("pick")
main_pick_conf = best.get("confidence", 0.0)
main_pick_odds = best.get("odds", 0.0)
if not main_pick or not main_pick_conf:
continue
# ─── NEW LOGIC: APPLY FILTERS ───
# 1. Determine Market Type
# Simple heuristic based on pick string
pick_str = str(main_pick).upper()
market_type = "MS"
if "1X" in pick_str or "X2" in pick_str or "12" in pick_str: market_type = "DC"
elif "ÜST" in pick_str or "ALT" in pick_str or "OVER" in pick_str or "UNDER" in pick_str:
if "1.5" in pick_str: market_type = "OU15"
elif "3.5" in pick_str: market_type = "OU35"
else: market_type = "OU25"
elif "VAR" in pick_str or "YOK" in pick_str or "BTTS" in pick_str: market_type = "BTTS"
threshold = MIN_CONF_THRESHOLDS.get(market_type, 45.0)
# 2. Check Confidence Gate
if main_pick_conf < threshold:
skipped_bets += 1
continue
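        # 3. Check Value Gate (Edge)
        # Picks without usable odds cannot be priced or settled, so skip them
        # rather than booking them as bets (a win at odds 0 would count as -1.0).
        if not main_pick_odds or main_pick_odds <= 1.0:
            skipped_bets += 1
            continue
        implied_prob = 1.0 / main_pick_odds
        my_prob = main_pick_conf / 100.0
        edge = my_prob - implied_prob
        if edge < -0.03:  # Tolerate at most 3 points of negative edge
            skipped_bets += 1
            continue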
# ─── BET IS PLAYED ───
total_bets += 1
# Determine if WON
is_won = False
# Resolve MS (1, X, 2)
if market_type == "MS":
if main_pick == "1" and home_score > away_score: is_won = True
elif main_pick == "X" and home_score == away_score: is_won = True
elif main_pick == "2" and away_score > home_score: is_won = True
elif main_pick == "MS 1" and home_score > away_score: is_won = True
elif main_pick == "MS X" and home_score == away_score: is_won = True
elif main_pick == "MS 2" and away_score > home_score: is_won = True
# Resolve OU (Over/Under)
elif market_type.startswith("OU"):
line = 2.5
if "1.5" in pick_str: line = 1.5
elif "3.5" in pick_str: line = 3.5
is_over = total_goals > line
is_under = total_goals < line # Simplification (usually line is X.5 so no draw)
if "ÜST" in pick_str or "OVER" in pick_str:
if is_over: is_won = True
elif "ALT" in pick_str or "UNDER" in pick_str:
if is_under: is_won = True
# Resolve BTTS
elif market_type == "BTTS":
if home_score > 0 and away_score > 0:
if "VAR" in pick_str: is_won = True
else:
if "YOK" in pick_str: is_won = True
# Resolve DC (Double Chance) - Simplified
elif market_type == "DC":
if "1X" in pick_str and (home_score >= away_score): is_won = True
elif "X2" in pick_str and (away_score >= home_score): is_won = True
elif "12" in pick_str and (home_score != away_score): is_won = True
if is_won:
winning_bets += 1
profit = main_pick_odds - 1.0
total_profit += profit
else:
total_profit -= 1.0
# ─── REPORT ───
print("\n" + "="*60)
print("📈 BACKTEST RESULTS (With NEW Skip Logic)")
print("="*60)
print(f"Total Historical Matches Analyzed: {len(predictions)}")
print(f"🚫 Bets SKIPPED (Low Conf/Bad Value): {skipped_bets}")
print(f"✅ Bets PLAYED: {total_bets}")
if total_bets > 0:
win_rate = (winning_bets / total_bets) * 100
roi = (total_profit / total_bets) * 100
print(f"🏆 Winning Bets: {winning_bets}")
print(f"💀 Losing Bets: {total_bets - winning_bets}")
print("-" * 40)
print(f" Win Rate: {win_rate:.2f}%")
print(f"💰 Total Profit (Units): {total_profit:.2f}")
print(f"📊 ROI: {roi:.2f}%")
if roi > 0:
print("🟢 STRATEGY IS PROFITABLE!")
else:
print("🔴 STRATEGY IS LOSING (Adjust thresholds!)")
else:
print("⚠️ No bets were played. Thresholds might be too high.")
cur.close()
conn.close()
if __name__ == "__main__":
simulate_backtest()
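Review note: the pick-settlement chain in this script is repeated, with small differences, in the other backtest scripts of this diff. One way to keep them consistent is a single pure function. The sketch below is a hypothetical `settle_pick` helper, not code from the commit. It assumes the pick labels seen in these scripts ("1", "MS X", "1X", "2.5 ÜST", "KG VAR" and so on).

```python
from typing import Optional


def settle_pick(pick: str, home: int, away: int) -> Optional[bool]:
    """Return True/False if the pick won/lost, or None if it is not recognized.

    Hypothetical helper; label conventions are assumed from the scripts.
    """
    p = str(pick).upper().strip()
    total = home + away

    # Over/under first, so "1.5 ÜST" is never read as a match-result pick.
    if any(k in p for k in ("ÜST", "OVER", "ALT", "UNDER")):
        line = 1.5 if "1.5" in p else 3.5 if "3.5" in p else 2.5
        is_over_pick = "ÜST" in p or "OVER" in p
        return total > line if is_over_pick else total < line

    # Both teams to score: VAR = yes, YOK = no.
    both_scored = home > 0 and away > 0
    if "VAR" in p:
        return both_scored
    if "YOK" in p:
        return not both_scored

    # Match result and double chance, with an optional "MS " prefix.
    if p.startswith("MS "):
        p = p[3:]
    outcome = "1" if home > away else "2" if away > home else "X"
    if p in ("1", "X", "2"):
        return p == outcome
    if p in ("1X", "X2", "12"):
        return outcome in p
    return None


print(settle_pick("MS 1", 2, 1), settle_pick("2.5 Üst", 1, 1), settle_pick("İY 1", 1, 0))
```

Returning `None` for an unrecognized label lets the caller skip the pick. In the scripts as written, an unrecognized pick falls through every branch and is booked as a lost bet, which would pull the reported ROI down.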
+164
@@ -0,0 +1,164 @@
"""
SNIPER Backtest
===============
Plays only the bets with the highest confidence and value.
"""
import os
import sys
import json
import time
import psycopg2
from psycopg2.extras import RealDictCursor
from datetime import datetime
AI_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(AI_DIR)
sys.path.insert(0, ROOT_DIR)
if "scripts" in os.path.basename(AI_DIR):
ROOT_DIR = os.path.dirname(ROOT_DIR)
from services.single_match_orchestrator import get_single_match_orchestrator
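def get_clean_dsn() -> str:
    # Read the DSN from the environment instead of committing credentials.
    # Any query string (e.g. a Prisma-style ?schema=...) is dropped because
    # libpq rejects URI parameters it does not know.
    dsn = os.environ.get("DATABASE_URL")
    if not dsn:
        raise RuntimeError("DATABASE_URL is not set")
    return dsn.split("?", 1)[0]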
MATCH_IDS = [
"v2ljcst50nk37x04xwimpi50", "7gz0bhb5yvdssazl3y5946kno", "7ftj7kbu4rzpewxravf3luuc4",
"7f1z4e8ch1dm5q677644cky6s", "7ffq3aq3so22iymfdzch63nys", "rrkmeuymz7gzvoz8mplikzdg",
"7hegc9covicy699bxsi81xkb8", "7gl7rpr1hjayk3e5ut0gr613o", "7g7d86i3738287xfvyfeffcwk",
"7hs4boe4hv80muawocevvx2j8", "7ijhsloieg4t9yp5cxp0duln8", "7ixaiiptli5ek32kuybuni4gk",
"7i5sfh41cjpwg4l972dm487x0", "eo7g4wunxxxr8uv45q8p5x638", "7dinds2937w4645wva2rddlas",
"7b5ukdhvqh62wtndeqfg01ixg", "7bjptsj24gndoydn7n0202g44", "7cqxf3vo58ewrwmoom5xiyexg",
"7bxjl9h2hnf165rlp3o1vfztg", "7eo8zrez08c342rqsezpvq39w", "7as1muhs98vdarlhsean4bspg",
"7dwhj8cfxv6v6bzxpu5e3h05w", "7d4vq4417ps84yjzh95bnvvv8", "7ea9z501jgp9kxw3gay4myrkk",
"7cd3401itlty6ded7c1wct0yc", "ebgpz9mcije2snv986n6587pw", "i7ar1dkhvcwpxmkyks65ib6c",
"lyek7tyy6qk2xjs9vblucnx0", "hdn9qtyn3ysjwbc3i2trantg", "3y2bnssfqlajosiz2gpkn6xhw",
"40pehd14s9djjtycujavbex3o", "3xnbfjznzmnwml20akbgnis5w", "2eovi2rcc2l4ha7fpb2w7e1hw",
"2bwuikdjyyuithhru8ka8o00k", "2d3pcd76ya9ihi9yotxc553is", "1e9it04z4epy2etdxsffe7m6s",
"7af49jgo4iulv1k8cplj9smj8", "5k3vrz619hdu9nx4rnx6uim1g", "amjppgpetnyr0iisi241kgkyc",
"coqrhq09kxd16iejvgtzj3mz8", "d8ysan1qdctmkvjaz2adw7aqc", "9ttciz0gtb0z09ev1q5fe0ro4",
"9u720o37yaddqu1w6hlszpnh0", "7ijezdjp8t0rjti91ac63hyxg", "72gvdvztbb3dn79jidzzxzcb8",
"6uof1v2s6vrpieeml2bwo9tlg", "91dd8ia3m0bxoqzjgyo3ptsk", "3tj1nt3udsbvb9soqn2cs6gpg",
"1br5g88o5idtjxka1fr6zg4k4", "akuesquthbmxlzckvnqmgles4"
]
def run_sniper_backtest():
    print("🎯 SNIPER BACKTEST: ONLY THE CLEAR ONES")
print("="*60)
dsn = get_clean_dsn()
conn = psycopg2.connect(dsn)
cur = conn.cursor(cursor_factory=RealDictCursor)
placeholders = ','.join(['%s'] * len(MATCH_IDS))
cur.execute(f"""
SELECT m.id, m.match_name, m.home_team_id, m.away_team_id,
m.score_home, m.score_away,
t1.name as home_team, t2.name as away_team,
l.name as league_name
FROM matches m
LEFT JOIN teams t1 ON m.home_team_id = t1.id
LEFT JOIN teams t2 ON m.away_team_id = t2.id
LEFT JOIN leagues l ON m.league_id = l.id
WHERE m.id IN ({placeholders}) AND m.status = 'FT'
""", MATCH_IDS)
rows = cur.fetchall()
    print(f"📊 {len(rows)} matches to analyze.\n")
    try:
        orchestrator = get_single_match_orchestrator()
    except Exception as e:
        print(f"❌ AI error: {e}")
        # Release the database handles before bailing out.
        cur.close()
        conn.close()
        return
total_bet = 0
total_won = 0
total_profit = 0.0
for i, row in enumerate(rows):
match_id = str(row['id'])
home = row['home_team'] or "?"
away = row['away_team'] or "?"
h_score = row['score_home'] or 0
a_score = row['score_away'] or 0
print(f"[{i+1}/{len(rows)}] {home} vs {away} ... ", end="", flush=True)
try:
pred = orchestrator.analyze_match(match_id)
if not pred:
                print("⚠️ No data")
continue
pick_data = pred.get("expert_recommendation", {}).get("main_pick") or pred.get("main_pick", {})
pick = pick_data.get("pick") or pick_data.get("market_type")
conf = pick_data.get("confidence", 0)
odds = pick_data.get("odds", 0)
            # SNIPER FILTERS
            if conf < 75:
                print(f"🚫 PASS (Conf: {conf:.0f}%)")
                continue
            if odds < 1.35:
                print(f"🚫 PASS (Odds: {odds:.2f} too low)")
                continue
            # Value control
            implied = 1.0 / odds
            if (conf/100) < implied:
                print("🚫 PASS (Negative value)")
                continue
            # PLAY
total_bet += 1
won = False
pick_clean = str(pick).upper()
if pick_clean in ["1", "MS 1"] and h_score > a_score: won = True
elif pick_clean in ["X", "MS X"] and h_score == a_score: won = True
elif pick_clean in ["2", "MS 2"] and a_score > h_score: won = True
elif "ÜST" in pick_clean or "OVER" in pick_clean:
line = 2.5
if "1.5" in pick_clean: line = 1.5
elif "3.5" in pick_clean: line = 3.5
if (h_score + a_score) > line: won = True
elif "ALT" in pick_clean or "UNDER" in pick_clean:
line = 2.5
if "1.5" in pick_clean: line = 1.5
elif "3.5" in pick_clean: line = 3.5
if (h_score + a_score) < line: won = True
elif "VAR" in pick_clean and h_score > 0 and a_score > 0: won = True
elif "YOK" in pick_clean and (h_score == 0 or a_score == 0): won = True
if won:
total_won += 1
profit = odds - 1.0
total_profit += profit
print(f"✅ WON! (+{profit:.2f})")
else:
total_profit -= 1.0
print(f"❌ LOST! ({pick} @ {odds:.2f})")
except Exception as e:
            print(f"💥 Error: {e}")
    print("\n" + "="*60)
    print("🎯 SNIPER RESULTS")
    print("="*60)
    print(f"Played: {total_bet}")
    print(f"Won: {total_won}")
    print(f"Win rate: {(total_won/total_bet)*100:.1f}%" if total_bet > 0 else "Win rate: N/A")
    print(f"Total profit: {total_profit:.2f} units")
    if total_profit > 0:
        print("🟢 WE MADE MONEY!")
    else:
        print("🔴 WE LOST MONEY!")
cur.close()
conn.close()
if __name__ == "__main__":
run_sniper_backtest()
+162
@@ -0,0 +1,162 @@
"""
Strict Sniper Backtest (Calibrated)
===================================
Plays only bets with confidence >= 75% and odds >= 1.30.
Built to filter out the model's inflated confidence.
"""
import os
import sys
import json
import time
import psycopg2
from psycopg2.extras import RealDictCursor
AI_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(AI_DIR)
sys.path.insert(0, ROOT_DIR)
if "scripts" in os.path.basename(AI_DIR):
ROOT_DIR = os.path.dirname(ROOT_DIR)
from services.single_match_orchestrator import get_single_match_orchestrator
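def get_clean_dsn() -> str:
    # Read the DSN from the environment instead of committing credentials.
    # Any query string (e.g. a Prisma-style ?schema=...) is dropped because
    # libpq rejects URI parameters it does not know.
    dsn = os.environ.get("DATABASE_URL")
    if not dsn:
        raise RuntimeError("DATABASE_URL is not set")
    return dsn.split("?", 1)[0]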
def run_strict_backtest():
print("🎯 STRICT SNIPER BACKTEST (Conf > 75%)")
print("="*60)
leagues_path = os.path.join(ROOT_DIR, "top_leagues.json")
with open(leagues_path, 'r') as f:
top_leagues = json.load(f)
league_ids = tuple(str(lid) for lid in top_leagues)
dsn = get_clean_dsn()
conn = psycopg2.connect(dsn)
cur = conn.cursor(cursor_factory=RealDictCursor)
cur.execute("""
SELECT m.id, m.match_name, m.home_team_id, m.away_team_id,
m.score_home, m.score_away,
t1.name as home_team, t2.name as away_team
FROM matches m
LEFT JOIN teams t1 ON m.home_team_id = t1.id
LEFT JOIN teams t2 ON m.away_team_id = t2.id
WHERE m.league_id IN %s
AND m.status = 'FT'
AND m.score_home IS NOT NULL
AND EXISTS (SELECT 1 FROM odd_categories oc WHERE oc.match_id = m.id)
ORDER BY m.mst_utc DESC
LIMIT 500
""", (league_ids,))
rows = cur.fetchall()
    print(f"📊 Scanning {len(rows)} matches. Only the CLEAR ones will be played...\n")
    try:
        orchestrator = get_single_match_orchestrator()
    except Exception as e:
        print(f"❌ AI error: {e}")
        # Release the database handles before bailing out.
        cur.close()
        conn.close()
        return
total_bet = 0
total_won = 0
total_profit = 0.0
for i, row in enumerate(rows):
match_id = str(row['id'])
home = row['home_team'] or "?"
away = row['away_team'] or "?"
h_score = row['score_home'] or 0
a_score = row['score_away'] or 0
try:
pred = orchestrator.analyze_match(match_id)
if not pred: continue
# Check all picks for a HIGH CONFIDENCE bet
candidates = []
if pred.get("expert_recommendation"):
rec = pred["expert_recommendation"]
if rec.get("main_pick"): candidates.append(rec["main_pick"])
if rec.get("value_picks"): candidates.extend(rec["value_picks"])
elif pred.get("main_pick"):
candidates.append(pred["main_pick"])
best_bet = None
for c in candidates:
if not c: continue
# Access attributes safely (Dict or Object)
conf = c.get("confidence", 0) if isinstance(c, dict) else getattr(c, 'confidence', 0)
odds = c.get("odds", 0) if isinstance(c, dict) else getattr(c, 'odds', 0)
pick = c.get("pick", "") if isinstance(c, dict) else getattr(c, 'pick', "")
# STRICT CRITERIA
if conf >= 75.0 and odds >= 1.30:
# Check Value (Edge)
implied = 1.0 / odds
edge = ((conf/100) - implied) * 100
if edge > -5.0: # Tolerant edge
if best_bet is None or (conf > (best_bet.get("confidence", 0) if isinstance(best_bet, dict) else getattr(best_bet, 'confidence', 0))):
best_bet = c
if best_bet:
pick = str(best_bet.get("pick") if isinstance(best_bet, dict) else getattr(best_bet, 'pick', "")).upper()
conf = best_bet.get("confidence", 0) if isinstance(best_bet, dict) else getattr(best_bet, 'confidence', 0)
odds = best_bet.get("odds", 0) if isinstance(best_bet, dict) else getattr(best_bet, 'odds', 0)
# Resolution
won = False
if pick in ["1", "MS 1"] and h_score > a_score: won = True
elif pick in ["X", "MS X"] and h_score == a_score: won = True
elif pick in ["2", "MS 2"] and a_score > h_score: won = True
elif pick in ["1X", "X2"]:
if "1X" in pick and h_score >= a_score: won = True
elif "X2" in pick and a_score >= h_score: won = True
elif "ÜST" in pick or "OVER" in pick:
line = 2.5
if "1.5" in pick: line = 1.5
elif "3.5" in pick: line = 3.5
if (h_score + a_score) > line: won = True
elif "ALT" in pick or "UNDER" in pick:
line = 2.5
if "1.5" in pick: line = 1.5
elif "3.5" in pick: line = 3.5
if (h_score + a_score) < line: won = True
elif "VAR" in pick and h_score > 0 and a_score > 0: won = True
elif "YOK" in pick and (h_score == 0 or a_score == 0): won = True
total_bet += 1
if won:
total_won += 1
profit = odds - 1.0
total_profit += profit
print(f"[{i+1}] ✅ {home} vs {away} | {pick} ({conf:.0f}%) -> WON (+{profit:.2f})")
else:
total_profit -= 1.0
print(f"[{i+1}] ❌ {home} vs {away} | {pick} ({conf:.0f}%) -> LOST")
        except Exception as e:
            # Report the failure instead of silently dropping the match.
            print(f"[{i+1}] 💥 Error: {e}")
    print("\n" + "="*60)
    print("🎯 STRICT SNIPER RESULTS")
    print("="*60)
    print(f"Bets played: {total_bet}")
    print(f"Won: {total_won}")
    if total_bet > 0:
        win_rate = (total_won / total_bet) * 100
        roi = (total_profit / total_bet) * 100
        print(f"Win rate: {win_rate:.2f}%")
        print(f"Total profit: {total_profit:.2f} units")
        print(f"ROI: {roi:.2f}%")
        if total_profit > 0: print("🟢 WE MADE MONEY!")
        else: print("🔴 WE LOST MONEY!")
    else:
        print("⚠️ Not enough CLEAR matches were found.")
cur.close()
conn.close()
if __name__ == "__main__":
run_strict_backtest()
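Review note: with the strict criteria in this script (`conf >= 75.0` and `odds >= 1.30`), the extra `edge > -5.0` check may never reject a candidate. A quick arithmetic check follows, assuming confidence is a percentage and odds are decimal, as the script treats them. `edge_pct` is an illustrative name, not a function from the commit.

```python
def edge_pct(conf: float, odds: float) -> float:
    """Edge in percentage points: model confidence minus implied probability."""
    return (conf / 100.0 - 1.0 / odds) * 100.0


# Edge shrinks as confidence falls and as odds fall, so the worst case that
# the strict criteria still admit is the corner conf = 75, odds = 1.30.
worst_case = edge_pct(75.0, 1.30)
print(f"{worst_case:.2f}")
```

If that reading is right, the worst admissible edge is about -1.92 points, which is above -5.0, so every candidate that passes the confidence and odds thresholds also passes the edge check. A threshold of 0 would be needed to require non-negative value.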
+230
@@ -0,0 +1,230 @@
"""
Backtest the live V2 predictor stack against recent finished football matches.
This script uses the same path as production:
database -> feature extractor -> betting predictor -> quant ranking.
"""
from __future__ import annotations
import argparse
import asyncio
import sys
from dataclasses import dataclass
from pathlib import Path
from sqlalchemy import text
ROOT_DIR = Path(__file__).resolve().parents[1]
if str(ROOT_DIR) not in sys.path:
sys.path.insert(0, str(ROOT_DIR))
from core.quant import MarketPick, analyze_market
from data.database import dispose_engine, get_session
from features.extractor import extract_features
from models.betting_engine import get_predictor
@dataclass
class BacktestStats:
sampled_matches: int = 0
analyzed_matches: int = 0
skipped_matches: int = 0
ms_correct: int = 0
ou25_correct: int = 0
btts_correct: int = 0
main_pick_count: int = 0
main_pick_correct: int = 0
playable_pick_count: int = 0
playable_pick_correct: int = 0
playable_units_staked: float = 0.0
playable_units_profit: float = 0.0
def _parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser()
parser.add_argument("--limit", type=int, default=50)
parser.add_argument("--days", type=int, default=45)
return parser.parse_args()
def _actual_ms(score_home: int, score_away: int) -> str:
if score_home > score_away:
return "1"
if score_home < score_away:
return "2"
return "X"
def _actual_ou25(score_home: int, score_away: int) -> str:
return "Over" if (score_home + score_away) > 2 else "Under"
def _actual_btts(score_home: int, score_away: int) -> str:
return "Yes" if score_home > 0 and score_away > 0 else "No"
def _odds_map_from_features(feats) -> dict[str, dict[str, float]]:
return {
"MS": {"1": feats.odds_home, "X": feats.odds_draw, "2": feats.odds_away},
"OU25": {"Under": feats.odds_under25, "Over": feats.odds_over25},
"BTTS": {"No": feats.odds_btts_no, "Yes": feats.odds_btts_yes},
}
def _best_pick(feats, all_probs: dict[str, dict[str, float]]) -> MarketPick | None:
odds_map = _odds_map_from_features(feats)
picks = [
analyze_market("MS", all_probs["MS"], odds_map["MS"], feats.data_quality_score),
analyze_market("OU25", all_probs["OU25"], odds_map["OU25"], feats.data_quality_score),
analyze_market("BTTS", all_probs["BTTS"], odds_map["BTTS"], feats.data_quality_score),
]
ranked = sorted(
[pick for pick in picks if pick.pick],
key=lambda pick: pick.play_score,
reverse=True,
)
return ranked[0] if ranked else None
def _pick_won(pick: MarketPick, actuals: dict[str, str]) -> bool:
return actuals.get(pick.market) == pick.pick
async def _load_match_rows(limit: int, days: int) -> list[dict[str, object]]:
min_mst_utc = days * 86400000
query = text("""
SELECT
m.id,
m.match_name,
m.score_home,
m.score_away,
m.mst_utc
FROM matches m
WHERE m.sport = 'football'
AND m.score_home IS NOT NULL
AND m.score_away IS NOT NULL
AND m.mst_utc >= (
EXTRACT(EPOCH FROM NOW()) * 1000 - :min_mst_utc
)
AND EXISTS (
SELECT 1
FROM odd_categories oc
WHERE oc.match_id = m.id
AND oc.name IN ('Maç Sonucu', '2,5 Alt/Üst', 'Karşılıklı Gol')
)
ORDER BY m.mst_utc DESC
LIMIT :limit
""")
async with get_session() as session:
result = await session.execute(
query,
{"limit": limit, "min_mst_utc": min_mst_utc},
)
rows = result.mappings().all()
return [dict(row) for row in rows]
async def _run(limit: int, days: int) -> BacktestStats:
stats = BacktestStats()
predictor = get_predictor()
rows = await _load_match_rows(limit, days)
stats.sampled_matches = len(rows)
async with get_session() as session:
for row in rows:
match_id = str(row["id"])
score_home = int(row["score_home"])
score_away = int(row["score_away"])
feats = await extract_features(session, match_id)
if feats is None:
stats.skipped_matches += 1
continue
if feats.data_quality_score <= 0.0:
stats.skipped_matches += 1
continue
all_probs = predictor.predict_all(feats.to_model_array(), feats)
stats.analyzed_matches += 1
actuals = {
"MS": _actual_ms(score_home, score_away),
"OU25": _actual_ou25(score_home, score_away),
"BTTS": _actual_btts(score_home, score_away),
}
if max(all_probs["MS"], key=all_probs["MS"].get) == actuals["MS"]:
stats.ms_correct += 1
if max(all_probs["OU25"], key=all_probs["OU25"].get) == actuals["OU25"]:
stats.ou25_correct += 1
if max(all_probs["BTTS"], key=all_probs["BTTS"].get) == actuals["BTTS"]:
stats.btts_correct += 1
best_pick = _best_pick(feats, all_probs)
if best_pick is None:
continue
stats.main_pick_count += 1
if _pick_won(best_pick, actuals):
stats.main_pick_correct += 1
if best_pick.playable:
stats.playable_pick_count += 1
stats.playable_units_staked += best_pick.stake_units
if _pick_won(best_pick, actuals):
stats.playable_pick_correct += 1
stats.playable_units_profit += best_pick.stake_units * (best_pick.odds - 1.0)
else:
stats.playable_units_profit -= best_pick.stake_units
return stats
def _pct(numerator: int, denominator: int) -> float:
if denominator <= 0:
return 0.0
return round((numerator / denominator) * 100.0, 2)
def _roi(profit: float, staked: float) -> float:
if staked <= 0:
return 0.0
return round((profit / staked) * 100.0, 2)
def _print_summary(stats: BacktestStats) -> None:
print("=== V2 Runtime Backtest ===")
print(f"Sampled matches : {stats.sampled_matches}")
print(f"Analyzed matches : {stats.analyzed_matches}")
print(f"Skipped matches : {stats.skipped_matches}")
print(f"MS accuracy : {_pct(stats.ms_correct, stats.analyzed_matches)}%")
print(f"OU2.5 accuracy : {_pct(stats.ou25_correct, stats.analyzed_matches)}%")
print(f"BTTS accuracy : {_pct(stats.btts_correct, stats.analyzed_matches)}%")
print(
"Main pick accuracy : "
f"{_pct(stats.main_pick_correct, stats.main_pick_count)}% "
f"({stats.main_pick_correct}/{stats.main_pick_count})"
)
print(
"Playable accuracy : "
f"{_pct(stats.playable_pick_correct, stats.playable_pick_count)}% "
f"({stats.playable_pick_correct}/{stats.playable_pick_count})"
)
print(f"Units staked : {stats.playable_units_staked:.2f}")
print(f"Units profit : {stats.playable_units_profit:.2f}")
print(f"ROI : {_roi(stats.playable_units_profit, stats.playable_units_staked)}%")
async def _main() -> None:
args = _parse_args()
try:
stats = await _run(args.limit, args.days)
_print_summary(stats)
finally:
await dispose_engine()
if __name__ == "__main__":
asyncio.run(_main())
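Review note: unlike the flat-stake scripts elsewhere in this diff, this backtest weights each bet by `stake_units` and divides profit by units staked rather than by bet count. The sketch below shows that accounting on two made-up bets. `Ledger` is a hypothetical stand-in for the `playable_units_*` fields of `BacktestStats`, not code from the commit.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    """Hypothetical stand-in for the playable_units_* fields of BacktestStats."""
    staked: float = 0.0
    profit: float = 0.0

    def settle(self, stake_units: float, odds: float, won: bool) -> None:
        # A win pays stake * (odds - 1); a loss costs the stake.
        self.staked += stake_units
        self.profit += stake_units * (odds - 1.0) if won else -stake_units

    def roi_pct(self) -> float:
        # Stake-weighted ROI: profit per unit staked, as a percentage.
        return round(self.profit / self.staked * 100.0, 2) if self.staked > 0 else 0.0


ledger = Ledger()
ledger.settle(stake_units=2.0, odds=1.80, won=True)   # wins 1.60 units
ledger.settle(stake_units=1.0, odds=2.50, won=False)  # loses 1.00 unit
print(ledger.staked, round(ledger.profit, 2), ledger.roi_pct())
```

With stake weighting, a larger stake on the winning bet lifts ROI above what the same two results would give at flat stakes, so figures from this script and from the flat-stake scripts are not directly comparable.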
+147
@@ -0,0 +1,147 @@
"""
Value Hunter Backtest
=====================
Plays only matches where the model beats the bookmaker (positive edge).
"""
import os, sys, json, time, psycopg2
from psycopg2.extras import RealDictCursor
AI_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(AI_DIR)
sys.path.insert(0, ROOT_DIR)
if "scripts" in os.path.basename(AI_DIR): ROOT_DIR = os.path.dirname(ROOT_DIR)
from services.single_match_orchestrator import get_single_match_orchestrator
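def get_clean_dsn() -> str:
    # Read the DSN from the environment instead of committing credentials.
    # Any query string (e.g. a Prisma-style ?schema=...) is dropped because
    # libpq rejects URI parameters it does not know.
    dsn = os.environ.get("DATABASE_URL")
    if not dsn:
        raise RuntimeError("DATABASE_URL is not set")
    return dsn.split("?", 1)[0]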
MATCH_IDS = [
"v2ljcst50nk37x04xwimpi50", "7gz0bhb5yvdssazl3y5946kno", "7ftj7kbu4rzpewxravf3luuc4",
"7f1z4e8ch1dm5q677644cky6s", "7ffq3aq3so22iymfdzch63nys", "rrkmeuymz7gzvoz8mplikzdg",
"7hegc9covicy699bxsi81xkb8", "7gl7rpr1hjayk3e5ut0gr613o", "7g7d86i3738287xfvyfeffcwk",
"7hs4boe4hv80muawocevvx2j8", "7ijhsloieg4t9yp5cxp0duln8", "7ixaiiptli5ek32kuybuni4gk",
"7i5sfh41cjpwg4l972dm487x0", "eo7g4wunxxxr8uv45q8p5x638", "7dinds2937w4645wva2rddlas",
"7b5ukdhvqh62wtndeqfg01ixg", "7bjptsj24gndoydn7n0202g44", "7cqxf3vo58ewrwmoom5xiyexg",
"7bxjl9h2hnf165rlp3o1vfztg", "7eo8zrez08c342rqsezpvq39w", "7as1muhs98vdarlhsean4bspg",
"7dwhj8cfxv6v6bzxpu5e3h05w", "7d4vq4417ps84yjzh95bnvvv8", "7ea9z501jgp9kxw3gay4myrkk",
"7cd3401itlty6ded7c1wct0yc", "ebgpz9mcije2snv986n6587pw", "i7ar1dkhvcwpxmkyks65ib6c",
"lyek7tyy6qk2xjs9vblucnx0", "hdn9qtyn3ysjwbc3i2trantg", "3y2bnssfqlajosiz2gpkn6xhw",
"40pehd14s9djjtycujavbex3o", "3xnbfjznzmnwml20akbgnis5w", "2eovi2rcc2l4ha7fpb2w7e1hw",
"2bwuikdjyyuithhru8ka8o00k", "2d3pcd76ya9ihi9yotxc553is", "1e9it04z4epy2etdxsffe7m6s",
"7af49jgo4iulv1k8cplj9smj8", "5k3vrz619hdu9nx4rnx6uim1g", "amjppgpetnyr0iisi241kgkyc",
"coqrhq09kxd16iejvgtzj3mz8", "d8ysan1qdctmkvjaz2adw7aqc", "9ttciz0gtb0z09ev1q5fe0ro4",
"9u720o37yaddqu1w6hlszpnh0", "7ijezdjp8t0rjti91ac63hyxg", "72gvdvztbb3dn79jidzzxzcb8",
"6uof1v2s6vrpieeml2bwo9tlg", "91dd8ia3m0bxoqzjgyo3ptsk", "3tj1nt3udsbvb9soqn2cs6gpg",
"1br5g88o5idtjxka1fr6zg4k4", "akuesquthbmxlzckvnqmgles4"
]
def run_value_hunter():
    print("💎 VALUE HUNTER: CATCH ONLY THE MISPRICED ODDS")
print("="*60)
dsn = get_clean_dsn()
conn = psycopg2.connect(dsn)
cur = conn.cursor(cursor_factory=RealDictCursor)
placeholders = ','.join(['%s'] * len(MATCH_IDS))
cur.execute(f"""
SELECT m.id, m.match_name, m.home_team_id, m.away_team_id,
m.score_home, m.score_away,
t1.name as home_team, t2.name as away_team
FROM matches m
LEFT JOIN teams t1 ON m.home_team_id = t1.id
LEFT JOIN teams t2 ON m.away_team_id = t2.id
WHERE m.id IN ({placeholders}) AND m.status = 'FT'
""", MATCH_IDS)
rows = cur.fetchall()
    print(f"📊 Scanning {len(rows)} matches...\n")
    try:
        orchestrator = get_single_match_orchestrator()
    except Exception as e:
        print(f"❌ AI error: {e}")
        # Release the database handles before bailing out.
        cur.close()
        conn.close()
        return
total_bet = 0
total_won = 0
total_profit = 0.0
total_edge_found = 0
for i, row in enumerate(rows):
match_id = str(row['id'])
home = row['home_team'] or "?"
away = row['away_team'] or "?"
h_score = row['score_home'] or 0
a_score = row['score_away'] or 0
try:
pred = orchestrator.analyze_match(match_id)
if not pred: continue
            # Check all recommendations
picks = pred.get("expert_recommendation", {}).get("value_picks", [])
if not picks: picks = [pred.get("expert_recommendation", {}).get("main_pick")]
played_this_match = False
for pick_data in picks:
if not pick_data: continue
pick = pick_data.get("pick")
conf = pick_data.get("confidence", 0)
odds = pick_data.get("odds", 0)
edge = pick_data.get("edge", 0)
                # VALUE RULE: the model must be at least 10% better than the bookmaker
if edge < 10: continue
if odds < 1.20: continue
total_bet += 1
total_edge_found += edge
won = False
pick_clean = str(pick).upper()
if pick_clean in ["1", "MS 1"] and h_score > a_score: won = True
elif pick_clean in ["X", "MS X"] and h_score == a_score: won = True
elif pick_clean in ["2", "MS 2"] and a_score > h_score: won = True
                elif "ÜST" in pick_clean or "OVER" in pick_clean:
                    line = 2.5
                    if "1.5" in pick_clean: line = 1.5
                    elif "3.5" in pick_clean: line = 3.5  # was missing: 3.5 picks were settled at 2.5
                    if (h_score + a_score) > line: won = True
                elif "ALT" in pick_clean or "UNDER" in pick_clean:
                    line = 2.5
                    if "1.5" in pick_clean: line = 1.5
                    elif "3.5" in pick_clean: line = 3.5
                    if (h_score + a_score) < line: won = True
elif "VAR" in pick_clean and h_score > 0 and a_score > 0: won = True
elif "YOK" in pick_clean and (h_score == 0 or a_score == 0): won = True
if won:
total_won += 1
profit = odds - 1.0
total_profit += profit
print(f"[{i+1}] ✅ {home} vs {away} | {pick} ({edge:.0f}% Edge) -> WON! (+{profit:.2f})")
else:
total_profit -= 1.0
print(f"[{i+1}] ❌ {home} vs {away} | {pick} ({edge:.0f}% Edge) -> LOST")
played_this_match = True
                break  # One bet per match
        except Exception as e:
            # Report the failure instead of silently dropping the match.
            print(f"[{i+1}] 💥 Error: {e}")
    print("\n" + "="*60)
    print("💎 VALUE HUNTER RESULTS")
    print("="*60)
    print(f"Total value bets found: {total_bet}")
    print(f"Average edge: {total_edge_found/total_bet:.1f}%" if total_bet > 0 else "Average edge: N/A")
    print(f"Won: {total_won}")
    print(f"Total profit: {total_profit:.2f} units")
    if total_profit > 0: print("🟢 WE MADE MONEY!")
    else: print("🔴 WE LOST MONEY!")
cur.close()
conn.close()
if __name__ == "__main__":
run_value_hunter()

Some files were not shown because too many files have changed in this diff