In modern sports analytics, data is king. Whether you are building a personal dashboard to track football statistics or training a machine learning model to analyze historical match outcomes, having a reliable, clean, and compliant data source is critical.
While scraping raw HTML from commercial sites can lead to IP bans and Terms of Service (ToS) violations, using a documented developer API helps keep your application compliant and stable. In this guide, we will build a production-ready Python data pipeline on The Odds API to fetch, parse, and structure real-time football (soccer) market data.
Architecture of a Compliant Data Pipeline
When dealing with third-party sports data, your script should always respect three engineering pillars:
Compliance: Only query authorized developer endpoints.
Resilience: Properly handle network timeouts and API rate limits.
Data Normalization: Transform nested JSON responses into flat relational structures (like Pandas DataFrames or CSV files).
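As a rough illustration of the third pillar, the sketch below flattens a hand-written nested record into one row per bookmaker and market, then serializes it as CSV using only the standard library. The sample payload, the bookmaker name, and the helpers `flatten_events` and `rows_to_csv` are made up for illustration; the field names mirror the ones used in the code later in this guide and are not a guaranteed API schema.

```python
import csv
import io

# Hand-written sample shaped like a nested API response (illustrative only).
SAMPLE_EVENTS = [
    {
        "id": "evt-001",
        "home_team": "Team A",
        "away_team": "Team B",
        "bookmakers": [
            {"title": "Example Book", "markets": [
                {"key": "h2h", "outcomes": [
                    {"name": "Team A", "price": 2.10},
                    {"name": "Team B", "price": 3.40},
                    {"name": "Draw", "price": 3.25},
                ]},
            ]},
        ],
    },
]


def flatten_events(events):
    """Return one flat dict per (event, bookmaker, market) combination."""
    rows = []
    for event in events:
        for bookmaker in event.get("bookmakers", []):
            for market in bookmaker.get("markets", []):
                # Index prices by outcome name so they can be looked up by team.
                prices = {o["name"]: o["price"] for o in market.get("outcomes", [])}
                rows.append({
                    "match_id": event.get("id"),
                    "provider": bookmaker.get("title"),
                    "market": market.get("key"),
                    "home_price": prices.get(event.get("home_team")),
                    "away_price": prices.get(event.get("away_team")),
                    "draw_price": prices.get("Draw"),
                })
    return rows


def rows_to_csv(rows):
    """Serialize flat rows to CSV text; returns an empty string for no rows."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


flat_rows = flatten_events(SAMPLE_EVENTS)
print(rows_to_csv(flat_rows))
```

Each nested level (event, bookmaker, market) becomes a plain column, so the result can be loaded into a spreadsheet, a DataFrame, or a database table without further unpacking.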
Let’s implement this step by step.
Prerequisites
We will use requests to call the API over HTTP and pandas to reshape the response into tabular form. Install both with pip:
Bash
pip install requests pandas
Python
import os
from datetime import datetime

import pandas as pd
import requests


class SportsDataPipeline:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.the-odds-api.com/v4/sports"

    def fetch_live_market_data(self, sport: str, region: str = "uk", market: str = "h2h") -> list:
        """
        Fetches structured match and odds metrics from a compliant API endpoint.
        Returns an empty list on any HTTP or connection error.
        """
        endpoint = f"{self.base_url}/{sport}/odds/"
        params = {
            'apiKey': self.api_key,
            'regions': region,
            'markets': market,
            'dateFormat': 'iso'
        }
        try:
            response = requests.get(endpoint, params=params, timeout=10)
            # Compliance check: monitor the remaining API quota via response headers
            remaining_requests = response.headers.get('x-requests-remaining')
            print(f"[INFO] API requests remaining for this month: {remaining_requests}")
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 401:
                print("[ERROR] Unauthorized: Please check your API key.")
                return []
            elif response.status_code == 429:
                # No retry happens here; the caller decides when to try again
                print("[ERROR] Rate limit exceeded. Returning no data; retry after a delay.")
                return []
            else:
                print(f"[ERROR] HTTP Error {response.status_code}")
                return []
        except requests.exceptions.RequestException as e:
            print(f"[CONNECTION ERROR] Failed to connect to data provider: {e}")
            return []

    def process_and_normalize(self, raw_json: list) -> pd.DataFrame:
        """
        Flattens complex nested JSON structures into a clean analytical DataFrame.
        """
        if not raw_json:
            return pd.DataFrame()
        normalized_records = []
        for match in raw_json:
            match_id = match.get('id')
            home_team = match.get('home_team')
            away_team = match.get('away_team')
            commence_time = match.get('commence_time')
            # Extract data from available bookmaker entities
            for bookmaker in match.get('bookmakers', []):
                provider_name = bookmaker.get('title')
                for market in bookmaker.get('markets', []):
                    if market.get('key') == 'h2h':
                        outcomes = market.get('outcomes', [])
                        # Map outcome prices into a dynamic dictionary
                        prices = {outcome['name']: outcome['price'] for outcome in outcomes}
                        normalized_records.append({
                            'Match_ID': match_id,
                            'Kickoff_Time': commence_time,
                            'Home_Team': home_team,
                            'Away_Team': away_team,
                            'Data_Provider': provider_name,
                            'Home_Win_Odds': prices.get(home_team),
                            'Away_Win_Odds': prices.get(away_team),
                            'Draw_Odds': prices.get('Draw')
                        })
        return pd.DataFrame(normalized_records)

# --- Execution Block ---
if __name__ == "__main__":
    # Read the API key from the environment; the fallback is a placeholder, not a working key
    API_KEY = os.getenv('SPORTS_API_KEY', 'YOUR_OFFICIAL_API_KEY')
    TARGET_SPORT = "soccer_epl"
    pipeline = SportsDataPipeline(api_key=API_KEY)
    print("Initiating data fetch...")
    raw_payload = pipeline.fetch_live_market_data(sport=TARGET_SPORT)
    if raw_payload:
        df_analytics = pipeline.process_and_normalize(raw_payload)
        # Save output for analytical processing
        output_filename = f"epl_market_data_{datetime.now().strftime('%Y%m%d')}.csv"
        df_analytics.to_csv(output_filename, index=False)
        print(f"[SUCCESS] Pipeline complete. Data saved to {output_filename}")
        print(df_analytics.head())
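
The fetch method above reports a 429 response but returns immediately without waiting. One way to add a real back-off is to wrap the call in a retry loop with exponentially growing delays. The sketch below is a minimal version under stated assumptions: `retry_with_backoff`, `RateLimited`, and the stub `flaky_fetch` are hypothetical names, and the delay values are arbitrary choices, not figures recommended by the data provider.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the provider answered with HTTP 429."""


def retry_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**attempt seconds between attempts
    (1s, 2s, 4s with the defaults) and re-raises after the last failure.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Stub standing in for a real API call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited("429 Too Many Requests")
    return [{"id": "evt-001"}]


# Record the requested delays in a list instead of actually sleeping.
delays = []
result = retry_with_backoff(flaky_fetch, sleep=delays.append)
print(result, delays)
```

To use this with the pipeline, the fetch method would need to raise an exception on a 429 rather than return an empty list; that change is left out here.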
Conclusion
By swapping fragile scrapers for structured, compliant APIs, you reduce your pipeline's exposure to layout changes and legal risks. From here, you can plug this pandas DataFrame into a visualization tool like Streamlit or save it into a PostgreSQL database for historical trend analysis.
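For the historical-storage idea, the sketch below appends flattened rows to a database table and tags each batch with a fetch timestamp. It uses SQLite from the standard library as a stand-in so that it runs without a server; with PostgreSQL you would go through a database driver and its own placeholder syntax. The table name `odds_snapshots`, the helper `store_snapshot`, and the sample row are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Sample rows shaped like the pipeline's normalized records (values are made up).
rows = [
    {"Match_ID": "evt-001", "Data_Provider": "Example Book",
     "Home_Win_Odds": 2.10, "Away_Win_Odds": 3.40, "Draw_Odds": 3.25},
]


def store_snapshot(conn, records, fetched_at):
    """Append records to odds_snapshots, tagging each with the fetch time."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS odds_snapshots (
               fetched_at TEXT, match_id TEXT, provider TEXT,
               home_odds REAL, away_odds REAL, draw_odds REAL)"""
    )
    conn.executemany(
        "INSERT INTO odds_snapshots VALUES (?, ?, ?, ?, ?, ?)",
        [(fetched_at, r["Match_ID"], r["Data_Provider"],
          r["Home_Win_Odds"], r["Away_Win_Odds"], r["Draw_Odds"]) for r in records],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # swap for a file path to persist between runs
store_snapshot(conn, rows, datetime.now(timezone.utc).isoformat())
store_snapshot(conn, rows, datetime.now(timezone.utc).isoformat())

# Each run adds a new snapshot, so a match's price history accumulates over time.
count = conn.execute("SELECT COUNT(*) FROM odds_snapshots").fetchone()[0]
print(count)
```

Appending rather than overwriting is what makes trend analysis possible: every scheduled run contributes one more observation per match and provider.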
Happy coding! If you have any questions regarding API data nesting, feel free to drop a comment below.