Background: Data Sources and API Comparison

Data is the foundation of quantitative trading. Choosing the right data source directly affects strategy development efficiency and live trading performance.


1. Major Data Sources Overview

| Data Source | Type | Price | Latency | Use Case |
|---|---|---|---|---|
| Binance API | Crypto | Free | Real-time | Crypto strategies |
| Yahoo Finance | Stocks/ETFs | Free | 15-20 min | Learning/backtesting |
| Alpha Vantage | Multi-asset | Free/Paid | 15 min | Prototyping |
| Polygon.io | US Stocks | $29-199/mo | Real-time | US stock live trading |
| Alpaca Markets | US Stocks | Free | Real-time | US stock trading |
| Bloomberg | All assets | $2,400+/mo | Real-time | Institutional |
| Refinitiv | All assets | $1,800+/mo | Real-time | Institutional |
| Nasdaq Data Link | Alternative | Custom pricing | Daily | Factor research |

2. Free Data Sources in Detail

2.1 Binance API

Pros:

  • Completely free, no payment required
  • Real-time data, minimal latency
  • WebSocket real-time streaming support
  • Complete historical data (years of history)
  • REST + WebSocket dual interfaces

Cons:

  • Crypto only
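  • Rate limits: REST limits are weight-based, not a flat request count (each endpoint has a cost; the per-minute REQUEST_WEIGHT cap, historically 1,200, is published in GET /api/v3/exchangeInfo); WebSocket connections have per-connection stream and message limits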
  • API may be unstable during high volatility
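Because of the per-connection limits, a system watching many symbols usually spreads its subscriptions across several connections. The sketch below builds Binance combined-stream URLs (`/stream?streams=a/b/...`) for kline streams and opens a new connection every 1,024 streams. The base URL, the stream-name format, and the 1,024 cap are taken from Binance's public WebSocket documentation at the time of writing and are assumptions to re-check against the current docs. It only builds URLs; connecting is left to whichever WebSocket client you use.

```python
# Assumed from Binance's spot WebSocket docs; verify before relying on them.
BASE_URL = "wss://stream.binance.com:9443"
MAX_STREAMS_PER_CONNECTION = 1024


def combined_stream_urls(symbols, interval="1m"):
    """Return one combined-stream URL per connection needed for kline streams."""
    # Stream names use lowercase symbols, e.g. "btcusdt@kline_1m"
    streams = [f"{symbol.lower()}@kline_{interval}" for symbol in symbols]
    urls = []
    for start in range(0, len(streams), MAX_STREAMS_PER_CONNECTION):
        chunk = streams[start:start + MAX_STREAMS_PER_CONNECTION]
        urls.append(f"{BASE_URL}/stream?streams=" + "/".join(chunk))
    return urls


print(combined_stream_urls(["BTCUSDT", "ETHUSDT"]))
# ['wss://stream.binance.com:9443/stream?streams=btcusdt@kline_1m/ethusdt@kline_1m']
```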

Rate Limit Handling:

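import time
from binance.exceptions import BinanceAPIException

def fetch_with_retry(func, max_retries=3, wait_seconds=60):
    """Call func(), retrying only on Binance's rate-limit error (code -1003)."""
    for attempt in range(max_retries):
        try:
            return func()
        except BinanceAPIException as e:
            # Re-raise non-rate-limit errors, and give up after the last attempt
            if e.code != -1003 or attempt == max_retries - 1:
                raise
            time.sleep(wait_seconds)  # back off before the next attempt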
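Sleeping a fixed 60 seconds after an error works, but it is cheaper not to hit the limit at all. Below is a minimal client-side sketch, assuming a sliding 60-second window and a budget of 1,200 weight (the historical figure; read the real value from exchangeInfo). The class name `WeightBudget` is hypothetical, and each endpoint's weight has to come from Binance's documentation. Binance also reports the weight already used in the `X-MBX-USED-WEIGHT-1M` response header, which is the authoritative number if the two disagree.

```python
import time
from collections import deque


class WeightBudget:
    """Hypothetical client-side tracker for a weight-based rate limit.

    limit and window are assumptions; take the real REQUEST_WEIGHT cap
    from Binance's exchangeInfo rather than hard-coding it.
    """

    def __init__(self, limit=1200, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._spent = deque()  # (timestamp, weight) of requests still in the window
        self._total = 0

    def _expire(self, now):
        # Drop requests that have left the sliding window
        while self._spent and now - self._spent[0][0] >= self.window:
            _, weight = self._spent.popleft()
            self._total -= weight

    def wait_time(self, weight):
        """Seconds until a request costing `weight` fits in the budget."""
        now = self.clock()
        self._expire(now)
        if self._total + weight <= self.limit:
            return 0.0
        freed = 0
        for timestamp, spent in self._spent:  # oldest first
            freed += spent
            if self._total - freed + weight <= self.limit:
                return timestamp + self.window - now
        raise ValueError("a single request exceeds the whole budget")

    def acquire(self, weight, sleep=time.sleep):
        """Block until the request fits, then record it."""
        delay = self.wait_time(weight)
        if delay > 0:
            sleep(delay)
        now = self.clock()
        self._expire(now)
        self._spent.append((now, weight))
        self._total += weight
```

In use, call `budget.acquire(weight)` before each REST request and keep `fetch_with_retry` as the fallback for the cases the local estimate gets wrong.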

Use Case: Crypto strategy development, 24/7 trading systems


2.2 Yahoo Finance (yfinance)

Pros:

  • Completely free
  • Covers global stocks, ETFs, indices
  • Rich historical data (decades of history)
  • Easy-to-use Python library

Cons:

  • 15-20 minute data delay
  • No official API: relies on Yahoo's unofficial, undocumented endpoints (requests may be rate-limited or blocked)
  • Data quality not guaranteed (occasional errors)
  • No Level-2 data

Usage Example:

import yfinance as yf

# Get stock data
aapl = yf.Ticker("AAPL")
df = aapl.history(period="1y", interval="1d")

# Batch download
data = yf.download(["AAPL", "GOOGL", "MSFT"], period="1mo")

Use Case: Learning, backtesting, daily frequency strategy research


2.3 Alpha Vantage

Pros:

  • Free tier available (25 requests/day)
  • Covers stocks, forex, crypto
  • Provides technical indicator APIs
  • Official support

Cons:

  • Free tier severely restricted (only 25 calls/day as of 2024)
  • Paid version starts at $49.99/month
  • 15-minute data delay
  • Limited historical depth

Rate Limit Comparison:

| Tier | Price | Request Limit |
|---|---|---|
| Free | $0 | 25/day (severely restricted) |
| Basic | $49.99/mo | 30/min |
| Premium | $249.99/mo | 120/min |
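With only 25 free calls a day, the cheapest call is the one you never repeat. A minimal sketch, assuming the standard `https://www.alphavantage.co/query` endpoint with `function`, `symbol`, and `apikey` query parameters: it caches responses by URL and refuses to go past a configured daily quota. `CachedClient` and its quota counter are hypothetical; both the cache and the counter live in memory, so a new process starts with a full quota even if the real one is spent, and a production system would persist them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.alphavantage.co/query"


def build_url(function, symbol, apikey, **extra):
    """Build an Alpha Vantage query URL (extra params, e.g. outputsize, pass through)."""
    params = {"function": function, "symbol": symbol, "apikey": apikey, **extra}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def http_get_json(url):
    """Default fetcher: plain HTTP GET, body decoded as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


class CachedClient:
    """Hypothetical wrapper: serve repeats from cache, stop at the daily quota."""

    def __init__(self, apikey, daily_quota=25, fetch=http_get_json):
        self.apikey = apikey
        self.remaining = daily_quota
        self.fetch = fetch
        self._cache = {}

    def get(self, function, symbol, **extra):
        url = build_url(function, symbol, self.apikey, **extra)
        if url in self._cache:  # repeated query: no API call spent
            return self._cache[url]
        if self.remaining <= 0:
            raise RuntimeError("daily quota used up; try again tomorrow")
        data = self.fetch(url)
        self.remaining -= 1
        self._cache[url] = data
        return data


print(build_url("TIME_SERIES_DAILY", "IBM", "demo"))
# https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=IBM&apikey=demo
```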

Use Case: Prototyping, multi-asset research


2.4 Alpaca Markets

Pros:

  • Free real-time US stock data (IEX feed only on the free plan; the full consolidated SIP feed requires a paid subscription)
  • Combined data + trading API (no separate data subscription needed)
  • WebSocket real-time streaming
  • Paper trading environment included
  • Commission-free stock trading
  • Supports fractional shares

Cons:

  • US stocks only
  • Requires account registration
  • Some features require funded account

Usage Example:

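from datetime import datetime, timedelta, timezone

from alpaca.data.enums import DataFeed
from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockBarsRequest
from alpaca.data.timeframe import TimeFrame

# Stock data requires API keys (free with an Alpaca account; crypto data does not)
client = StockHistoricalDataClient("YOUR_API_KEY", "YOUR_SECRET_KEY")

# Request the last 30 days of daily bars for two symbols
request = StockBarsRequest(
    symbol_or_symbols=["AAPL", "MSFT"],
    timeframe=TimeFrame.Day,
    start=datetime.now(timezone.utc) - timedelta(days=30),
    feed=DataFeed.IEX,  # the free plan covers the IEX feed; SIP is paid
)
bars = client.get_stock_bars(request)
df = bars.df  # pandas DataFrame indexed by (symbol, timestamp)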

Use Case: US stock strategy development, paper trading, live trading with same API


3. Paid Data Sources in Detail

3.1 Polygon.io

Pros:

  • US stock real-time data
  • Historical tick data
  • WebSocket real-time streaming
  • Reasonable pricing (starting $29/mo)

Cons:

  • US stocks only
  • Basic tier no Level-2
  • Requires US payment method

Pricing:

| Tier | Price | Features |
|---|---|---|
| Basic | $29/mo | Delayed data + Historical |
| Developer | $79/mo | Real-time data |
| Premium | $199/mo | Level-2 + Full features |

3.2 Bloomberg Terminal

Pros:

  • World's most comprehensive financial data
  • Real-time + Historical + News + Research
  • Level-2, order book data
  • Professional analysis tools

Cons:

  • Expensive ($2400+/month/terminal)
  • Requires dedicated hardware
  • Complex API usage

Use Case: Institutional investors, professional quant teams


3.3 Refinitiv (formerly Thomson Reuters, now part of LSEG)

Pros:

  • Global market coverage
  • High-quality tick data
  • Deep historical data
  • Easy-to-use Eikon API

Cons:

  • Expensive pricing
  • Complex contracts

Use Case: Institutional strategies, HFT


4. Alternative Data Sources

Alternative data providers such as Nasdaq Data Link supply non-traditional financial data, including:

  • Satellite imagery data
  • Sentiment analysis data
  • Macroeconomic data
  • Supply chain data

4.2 News and Sentiment Data

| Source | Type | Purpose |
|---|---|---|
| NewsAPI | News aggregation | Sentiment analysis |
| Twitter/X API | Social media | Market sentiment |
| Reddit API | Forums | Retail sentiment |
| SEC EDGAR | Regulatory filings | Fundamental analysis |
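SEC EDGAR offers free JSON endpoints that need no API key. A minimal sketch of building a request for one company's filing history, based on SEC's published conventions: the `data.sec.gov/submissions/CIK##########.json` endpoint takes a 10-digit, zero-padded CIK, and SEC asks automated clients to send a descriptive `User-Agent` with contact details. The function name `submissions_request` and the contact string are placeholders.

```python
import urllib.request

SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def submissions_request(cik, contact):
    """Build (but do not send) a request for one company's filing history.

    `contact` should identify you, e.g. "Your Name you@example.com".
    """
    url = SUBMISSIONS_URL.format(cik=int(cik))  # CIK zero-padded to 10 digits
    return urllib.request.Request(url, headers={"User-Agent": contact})


req = submissions_request("320193", "Example Research admin@example.com")
print(req.full_url)  # https://data.sec.gov/submissions/CIK0000320193.json
```

Sending it with `urllib.request.urlopen(req)` and parsing the body with `json.load` returns the company's metadata and recent filings. SEC's fair-access guidance caps automated traffic at 10 requests per second. CIK 320193 is Apple's, used here only as an example.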

5. Data Source Selection Decision Tree

What asset class are you trading?
│
├─ Cryptocurrency → Binance API (Free, Real-time)
│
├─ US Stocks
│  ├─ Learning/Backtesting → Yahoo Finance (Free)
│  ├─ Prototyping → Alpaca Markets (Free, Real-time)
│  ├─ Data + Trading Combined → Alpaca Markets (Free)
│  └─ Premium Data Only → Polygon.io ($29+/mo)
│
├─ A-Shares (China)
│  ├─ Learning/Backtesting → Tushare / AKShare (Free)
│  └─ Live Trading → Broker API / Wind
│
└─ Institutional Needs → Bloomberg / Refinitiv

6. Common Pitfalls

6.1 Data Quality Issues

  • Missing values: gaps from holidays and trading halts
  • Outliers: price jumps caused by unadjusted splits and dividends
  • Timezone issues: timestamps recorded in different exchange timezones (or a mix of local time and UTC)
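The first two checks are easy to automate before any backtest. The sketch below, in plain Python with no market-data library assumed, lists weekdays that have no bar and flags close-to-close moves larger than a threshold, which is how an unadjusted split typically shows up. Holidays will also appear as missing weekdays, so an exchange calendar is needed to separate them from halts or feed gaps. The function names and the 40% threshold are illustrative assumptions to tune per asset.

```python
from datetime import date, timedelta


def missing_weekdays(bar_dates):
    """Weekdays between the first and last bar that have no bar.

    Exchange holidays appear here as well; filter them with a trading
    calendar to isolate halts and data-feed gaps.
    """
    have = set(bar_dates)
    day, last = min(bar_dates), max(bar_dates)
    gaps = []
    while day <= last:
        if day.weekday() < 5 and day not in have:  # Mon=0 ... Fri=4
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def suspicious_jumps(closes, threshold=0.4):
    """Indices whose close-to-close change exceeds `threshold` (0.4 = 40%).

    In unadjusted data a 2-for-1 split looks like a ~50% one-day drop.
    """
    return [
        i for i in range(1, len(closes))
        if abs(closes[i] / closes[i - 1] - 1) > threshold
    ]


print(missing_weekdays([date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 5)]))
# [datetime.date(2024, 1, 4)]
print(suspicious_jumps([100.0, 101.0, 50.5, 51.0]))
# [2]
```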

6.2 Survivorship Bias

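Free data sources typically include only currently listed stocks and omit delisted ones. A backtest run on today's survivors silently excludes companies that went bankrupt or were delisted, so historical returns tend to look better than they really were.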
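One remedy is to select the universe point-in-time: on each backtest date, trade only the tickers that were actually listed on that date, including ones delisted since. A minimal sketch with a hypothetical listing table (the tickers and dates are made up); in practice the listing history has to come from a source that retains delisted securities.

```python
from datetime import date

# Hypothetical listing history: (ticker, first trading day, last trading day or None)
LISTINGS = [
    ("AAA", date(2010, 1, 4), None),
    ("BBB", date(2012, 6, 1), date(2018, 3, 30)),  # delisted in 2018
    ("CCC", date(2019, 9, 16), None),
]


def universe_on(listings, day):
    """Tickers that were listed on `day`, whether or not they still exist."""
    return sorted(
        ticker for ticker, first, last in listings
        if first <= day and (last is None or day <= last)
    )


print(universe_on(LISTINGS, date(2015, 1, 2)))  # ['AAA', 'BBB']
print(universe_on(LISTINGS, date(2020, 1, 2)))  # ['AAA', 'CCC']
```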

6.3 Look-Ahead Bias

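Some data sources' "historical data" contains later corrections (such as earnings restatements). A backtest that uses the corrected figure on dates before the correction was published is trading on information that was not yet available.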
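The usual guard is to store each value together with the date it became public and to query it "as of" the backtest date. A minimal sketch with hypothetical EPS figures for one fiscal quarter, showing an original report and a later restatement; the record layout and the `as_of` helper are illustrative assumptions.

```python
from datetime import date

# Hypothetical records: (fiscal period, EPS value, date the value became public)
EPS_HISTORY = [
    ("2022Q4", 1.20, date(2023, 2, 1)),   # originally reported
    ("2022Q4", 1.05, date(2023, 8, 15)),  # later restatement
]


def as_of(history, period, day):
    """Latest value for `period` published on or before `day`, else None."""
    known = [
        (published, value)
        for p, value, published in history
        if p == period and published <= day
    ]
    return max(known)[1] if known else None  # max picks the latest publication


print(as_of(EPS_HISTORY, "2022Q4", date(2023, 3, 1)))  # 1.2
print(as_of(EPS_HISTORY, "2022Q4", date(2023, 9, 1)))  # 1.05
```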


7. Practical Recommendations

  1. Starting out: Yahoo Finance + Binance API (Free)
  2. Prototyping: Alpaca Markets (Free real-time data + paper trading)
  3. US Stock Live: Alpaca Markets (Free) or Polygon.io ($29+/mo for premium features)
  4. Institutional: Bloomberg / Refinitiv (Comprehensive but expensive)
  5. Always validate: Compare multiple data sources, check data quality

Warning: IEX Cloud shut down their public API service in August 2024. If you have legacy code using IEX Cloud, migrate to Alpaca Markets or Polygon.io.


Core Principle: Data quality > Data quantity. It is better to train models on a small amount of high-quality data than on a large amount of problematic data.

Cite this chapter
Zhang, Wayland (2026). Background: Data Sources and API Comparison. In AI Quantitative Trading: From Zero to One. https://waylandz.com/quant-book-en/Data-Sources-and-API-Comparison
@incollection{zhang2026quant_Data_Sources_and_API_Comparison,
  author = {Zhang, Wayland},
  title = {Background: Data Sources and API Comparison},
  booktitle = {AI Quantitative Trading: From Zero to One},
  year = {2026},
  url = {https://waylandz.com/quant-book-en/Data-Sources-and-API-Comparison}
}