Apacen Trading – FAQ
What am I looking at on this dashboard?
This frontend sits on top of a custom data pipeline for Polymarket. Under the hood, three high-volume streams are being ingested:
- `market_quotes` – “top of the book” snapshots.
- `market_trades` – individual fills (every trade that prints).
- `market_features` – derived statistical signals built from quotes and trades.
Those three streams are stored in Postgres in hourly partitions, periodically archived to S3, and then used to drive:
- The stats at the top of the page.
- The “Live Market Events” pane (new markets, big jumps, extremes).
- The lag meter, which shows how far behind the ingest is from real time.
The long-term goal is to use this data to power and backtest trading strategies on Polymarket.
The three main tables
What is market_quotes?
Short version: Snapshots of the best bid/ask for each market at a given instant.
Think of it as: “What’s the current top quote in the order book?”
Each row corresponds to the Quote struct:
```go
type Quote struct {
	TokenID   string
	TS        time.Time
	BestBid   float64
	BestAsk   float64
	BidSize1  float64
	AskSize1  float64
	SpreadBps float64
	Mid       float64
}
```

Field-by-field:
- `TokenID` – the Polymarket token (outcome) this quote belongs to.
- `TS` – timestamp of the snapshot.
- `BestBid` – highest resting bid price.
- `BestAsk` – lowest resting ask price.
- `BidSize1` – size available at the best bid.
- `AskSize1` – size available at the best ask.
- `SpreadBps` – bid–ask spread in basis points: `spread = ((BestAsk - BestBid) / Mid) × 10,000`.
- `Mid` – midpoint of the book: `Mid = (BestBid + BestAsk) / 2`.

When is a market_quotes row emitted?
Whenever the top of book changes in a meaningful way – e.g.:
- Best bid moves.
- Best ask moves.
- Top level size changes enough to matter.
Busy markets can generate thousands of quote snapshots per minute. Quiet markets may sit unchanged for a while.
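As a sanity check, the two derived fields can be recomputed from the raw bid/ask. A minimal Go sketch (illustrative only, not the pipeline's actual code):

```go
package main

import "fmt"

// midAndSpreadBps recomputes Quote's two derived fields using the formulas
// above: Mid = (bid + ask) / 2 and SpreadBps = (ask - bid) / Mid × 10,000.
func midAndSpreadBps(bestBid, bestAsk float64) (mid, spreadBps float64) {
	mid = (bestBid + bestAsk) / 2
	if mid > 0 {
		spreadBps = (bestAsk - bestBid) / mid * 10000
	}
	return mid, spreadBps
}

func main() {
	mid, spread := midAndSpreadBps(0.48, 0.52)
	fmt.Printf("mid=%.2f spread=%.0fbps\n", mid, spread) // mid=0.50 spread=800bps
}
```

A 4-cent spread on a 0.50 mid is 800 bps, which is why wide prediction-market spreads look enormous in basis-point terms compared to traditional assets.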
What is market_trades?
Short version: Every individual execution that happens on Polymarket.
Think of it as: “The tape” – every trade print with price, size, and which side initiated it.
Each row corresponds to the Trade struct:
```go
type Trade struct {
	TokenID   string
	TS        time.Time
	Price     float64
	Size      float64
	Aggressor string // "buy" or "sell"
	TradeID   string // optional if available
}
```

Field-by-field:
- `TokenID` – the token the trade executed on.
- `TS` – timestamp of the execution.
- `Price` – execution price.
- `Size` – executed size.
- `Aggressor` – which side initiated the trade:
  - `"buy"`: a buyer lifted the ask (buy market order / taker).
  - `"sell"`: a seller hit the bid (sell market order / taker).
- `TradeID` – the exchange's trade identifier, when available.
When is a market_trades row emitted?
Whenever any trade prints on Polymarket for a tracked token. Trades are much rarer than quotes, but each trade is extremely informative: it’s where actual money changes hands.
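The `Aggressor` tag is what makes signed-flow features possible downstream. A small illustrative sketch of summing taker-signed size, following the `+buy -sell` convention used by `SignedFlow1m` (the feature engine's real implementation may differ):

```go
package main

import "fmt"

// Trade is a pared-down stand-in for the full struct above.
type Trade struct {
	Size      float64
	Aggressor string // "buy" or "sell"
}

// signedFlow sums taker-signed size: +Size for aggressive buys,
// -Size for aggressive sells.
func signedFlow(trades []Trade) float64 {
	var flow float64
	for _, t := range trades {
		switch t.Aggressor {
		case "buy":
			flow += t.Size
		case "sell":
			flow -= t.Size
		}
	}
	return flow
}

func main() {
	tape := []Trade{{100, "buy"}, {40, "sell"}, {10, "buy"}}
	fmt.Println(signedFlow(tape)) // 70
}
```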
What is market_features?
Short version: Rolling statistical signals built from the raw quotes and trades.
Think of it as: “Features you’d feed into a trading model.”
Each row corresponds to the FeatureUpdate struct:
```go
type FeatureUpdate struct {
	TokenID        string
	TS             time.Time
	Ret1m          float64
	Ret5m          float64
	Vol1m          float64
	AvgVol5m       float64
	Sigma5m        float64
	ZScore5m       float64
	ImbalanceTop   float64
	SpreadBps      float64
	BrokeHigh15m   bool
	BrokeLow15m    bool
	TimeToResolveH float64
	SignedFlow1m   float64 // +buy -sell
	MidNow         float64
	Mid1mAgo       float64
}
```

Field-by-field:
- `TokenID` – token identifier.
- `TS` – timestamp of the feature snapshot.
- `Ret1m` – 1-minute return: `(MidNow - Mid 1 minute ago) / (Mid 1 minute ago)`. A value of 0.05 ≈ +5% move in the last minute.
- `Ret5m` – like `Ret1m`, but over the last 5 minutes.
- `Vol1m` – traded volume over the last minute.
- `AvgVol5m` – average volume over the trailing 5 minutes.
- `Sigma5m` – rolling standard deviation of the mid price. Higher `Sigma5m` = price jittering around more violently.
- `ZScore5m` – how far the mid sits from its recent mean, in units of that volatility: `Z = (current mid - recent mean mid) / Sigma5m`
  - |Z| ≈ 1–2: normal noise.
  - |Z| ≈ 3–4: quite unusual.
  - |Z| ≈ 10: “state extreme” – price is miles from its recent average.
- `ImbalanceTop` – top-of-book size imbalance. Intuitively:
  - Positive values ≈ more aggressive bid size than ask size.
  - Negative values ≈ more aggressive ask size than bid size.
  - Near zero ≈ balanced.
- `SpreadBps` – same as in `market_quotes`: bid–ask spread in basis points. Wider spreads ≈ less liquidity / higher friction.
- `BrokeHigh15m` – `true` if the current mid price is breaking the 15-minute high (new short-term high).
- `BrokeLow15m` – `true` if the current mid price is breaking the 15-minute low (new short-term low).
- `TimeToResolveH` – hours until the market resolves:
  - Small value → event is soon.
  - Large value → long-dated / far in the future.
- `SignedFlow1m` – net aggressive flow over the last minute (+buy, −sell):
  - Positive = more aggressive buys than sells.
  - Negative = more aggressive sells than buys.
- `MidNow` – the current mid price (as in `market_quotes`).
- `Mid1mAgo` – the mid price one minute ago.
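For intuition, the z-score computation can be sketched over a window of recent mids. This is a self-contained illustration, not the feature engine's actual code:

```go
package main

import (
	"fmt"
	"math"
)

// zScore computes Z = (current mid - mean of recent mids) / sigma, where
// sigma is the standard deviation of the same window — matching the
// ZScore5m formula above.
func zScore(mids []float64, current float64) float64 {
	if len(mids) == 0 {
		return 0
	}
	var sum float64
	for _, m := range mids {
		sum += m
	}
	mean := sum / float64(len(mids))
	var ss float64
	for _, m := range mids {
		ss += (m - mean) * (m - mean)
	}
	sigma := math.Sqrt(ss / float64(len(mids)))
	if sigma == 0 {
		return 0 // flat window: no meaningful z-score
	}
	return (current - mean) / sigma
}

func main() {
	// A quiet window around 0.50, then the mid jumps to 0.57:
	window := []float64{0.50, 0.51, 0.49, 0.50}
	fmt.Printf("Z = %.1f\n", zScore(window, 0.57))
}
```

With the window above, a move to 0.57 produces a z-score near 10 — exactly the “state extreme” regime described earlier.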
When is a market_features row emitted?
The feature engine keeps a rolling window of recent quotes and trades in memory. It emits a new FeatureUpdate when:
- Enough new data has arrived to update the rolling window, and/or
- Something “interesting” happens (big move, high z-score, notable volume spike, etc.).
Because features can be recalculated as new data arrives, they’re:
- First streamed into temporary tables via `COPY`.
- Then merged into `market_features` via a set-based upsert, so only the latest feature per token/timestamp “sticks.”
The result is a dense, but not ridiculous, stream of signal snapshots.
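The merge step might look roughly like the following — a hypothetical sketch assuming a staging table named `features_staging` and a unique index on `(token_id, ts)`; the pipeline's real table and column names may differ:

```sql
-- Illustrative set-based upsert from a COPY-loaded staging table.
INSERT INTO market_features (token_id, ts, ret_1m, z_score_5m)
SELECT token_id, ts, ret_1m, z_score_5m
FROM features_staging
ON CONFLICT (token_id, ts) DO UPDATE
SET ret_1m     = EXCLUDED.ret_1m,
    z_score_5m = EXCLUDED.z_score_5m;
```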
Lag and performance
What is the “Stream lag” indicator on the front page?
The lag badges show, for each stream:
- Quotes lag – how many seconds ago the newest `market_quotes` row was ingested.
- Trades lag – same for `market_trades`.
- Features lag – same for `market_features`.
In other words:
“If the latest quote in the database is from 3 seconds ago, the quotes lag is ~3s.”
The frontend hits a lightweight `/stream-lag` endpoint that asks Postgres:

```sql
SELECT EXTRACT(EPOCH FROM (now() - max(ts))) AS lag_sec
FROM market_quotes -- or market_trades / market_features
WHERE ts > now() - interval '1 day';
```

and uses that to color the badges.
- Green (`ok`) – normal: typically 1–4 seconds.
- Amber (`warn`) – tens of seconds: usually a brief backlog or a small hiccup.
- Red (`bad`) – hundreds of seconds: something is putting the ingest under stress.
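The badge logic reduces to simple thresholding on `lag_sec`. A Go sketch — the exact cutoffs (10 s and 200 s here) are illustrative assumptions, not the frontend's actual values:

```go
package main

import "fmt"

// lagStatus maps a stream's lag in seconds onto the badge states above.
// Cutoffs are assumed for illustration.
func lagStatus(lagSec float64) string {
	switch {
	case lagSec < 10:
		return "ok" // green: normal, typically 1–4s
	case lagSec < 200:
		return "warn" // amber: brief backlog or small hiccup
	default:
		return "bad" // red: ingest under stress
	}
}

func main() {
	for _, lag := range []float64{3, 60, 470} {
		fmt.Printf("%.0fs -> %s\n", lag, lagStatus(lag))
	}
}
```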
Why have we seen lags as high as ~470 seconds?
A few reasons can cause temporary spikes without actually “breaking” the system:
- Heavy analytics queries (like stats / exploratory SQL)
  Big `WITH` queries scanning partitioned tables can:
  - Compete with the ingest pipeline for I/O.
  - Hold locks and slow down writes.
- Write-ahead log (WAL) and checkpoint pressure
  Under high write volume, Postgres sometimes needs to:
  - Flush WAL aggressively.
  - Run longer checkpoints.
- Archiver / janitor / healthmonitor interactions
  - The archiver streams old partitions to S3.
  - The janitor drops old partitions to free disk.
  - The healthmonitor dynamically adjusts their behavior.
- External factors
  Network hiccups, upstream slowdowns at Polymarket, or temporary resource contention on the EC2 instance can all contribute.
The important point: the pipeline is built to catch up. Short-lived spikes into the tens or even hundreds of seconds are acceptable as long as:
- They don’t persist.
- Data eventually “catches up” and lag returns to single-digit seconds.
Is lag of ~120 seconds OK? What about ~470 seconds?
Normal operating range: ~1–4 seconds.
Mild stress: 30–120 seconds – worth watching, but usually self-correcting.
Serious stress: 200+ seconds sustained – useful as a “debug me” signal.
A one-off spike to ~470 seconds during a big query or a maintenance task is not catastrophic; a persistent 400–500 seconds would be a sign that the ingest rate and database configuration need tuning.
Prices and discrepancies vs Polymarket’s UI
Which leg (“YES” vs “NO”) are these prices?
For binary markets, the system treats the stored price as the probability of the event happening — i.e. the “YES” leg. In other words, the value you see here should correspond to “YES”, and the implied “NO” price would be 1 − YES up to fees and microstructure noise. Internally, the backend is designed around YES-normalized prices for binaries, so strategies and features all speak a common language: “probability this resolves true.”
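The normalization itself is a one-liner. A hedged Go sketch of the convention (`yesPrice` is a hypothetical helper, not a function in the codebase; it ignores fees and microstructure noise):

```go
package main

import "fmt"

// yesPrice normalizes a binary-market price to the YES leg:
// a NO quote of 0.30 implies YES ≈ 0.70 (up to fees/noise).
func yesPrice(price float64, isNoLeg bool) bool64 // placeholder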
Why do the prices on this dashboard look “swingier” than on Polymarket?
There are a few reasons.
- We record every micro-move at the top of book
  `market_quotes` captures each change in best bid/ask and mid. If the top of book jitters from 0.51 → 0.52 → 0.50 → 0.53, we record all those tiny moves. Polymarket’s UI, by contrast, may emphasize:
  - Last trade.
  - Coarser refresh.
  - Some smoothing for charts.
- Mid price vs. displayed price
  The dashboard often uses mid price (`Mid`), or features derived from mid, whereas Polymarket may show:
  - Last executed price.
  - A VWAP-style value.
  - Or something closer to “yes” probability rounded for humans.
- Raw derived features (ret, z-scores, σ) are intentionally sensitive
  `Ret1m`, `Ret5m`, `Sigma5m`, `ZScore5m`, etc., are designed for strategy design, not for a calm human-facing chart. A strategy might care deeply about a 1–2% move that a human UI would barely highlight.
- Different time aggregation
  Polymarket charts often aggregate over candles (e.g. 1m, 5m) and may interpolate missing data. This dashboard shows point-in-time snapshots and derived stats without smoothing, so you see every wiggle.
Bottom line: if the dashboard looks “swingier,” that’s intentional — it’s closer to the raw microstructure of the market than the public UI.
Market events: new markets, price jumps, state extremes
What are “New markets” in the events panel?
These are new_market events recorded in market_events:
- The gatherer notices a new Polymarket market / outcome.
- It emits a `MarketEvent` of type `new_market` with metadata such as:
  - Market slug.
  - Question text.
  - Liquidity.
  - Volume.
  - Market age (hours since creation).
The frontend’s “New markets” tab shows the most recent markets detected by the system, ranked by detection time, not by popularity.
What are “Price jumps” / “state extremes”?
Internally, most of the “big price move” events use the state_extreme event type, which is based on:
- Very high `ZScore5m` (price far from its recent mean).
- Often a large |`Ret1m`| as well.
The frontend’s “Price jumps” tab focuses on “state extreme” events with large absolute returns (e.g. ≥ 5%) within the recent window. These are:
- Sharper moves.
- Often associated with news or sudden liquidity shifts.
Because these are filtered by both event type and minimum return, it’s possible for the “Price jumps” tab to briefly have fewer entries than the raw volume of state_extreme events would suggest.
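That double filter can be sketched in a few lines of Go (the `MarketEvent` fields here are pared down and illustrative, not the real event record):

```go
package main

import (
	"fmt"
	"math"
)

// MarketEvent is a simplified stand-in for the real event record.
type MarketEvent struct {
	Type  string
	Ret1m float64
}

// priceJumps keeps only state_extreme events whose absolute 1-minute
// return is at least minRet — the "Price jumps" tab's double filter.
func priceJumps(events []MarketEvent, minRet float64) []MarketEvent {
	var out []MarketEvent
	for _, e := range events {
		if e.Type == "state_extreme" && math.Abs(e.Ret1m) >= minRet {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	evs := []MarketEvent{
		{"state_extreme", 0.08},
		{"state_extreme", 0.01}, // extreme z-score but small return: filtered out
		{"new_market", 0.00},    // wrong event type: filtered out
	}
	fmt.Println(len(priceJumps(evs, 0.05))) // 1
}
```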
Data retention, archiving, and disk space
How do you avoid filling the disk?
The tables are:
- Partitioned hourly into tables like `market_quotes_pYYYYMMDDHH`, etc.
- Archived by an archiver daemon that:
  - Scans old partitions.
  - Writes them to S3 as compressed JSON.
  - Marks them as archived.
- Cleaned up by a janitor daemon that:
  - Drops archived partitions after a configurable retention window.
Current defaults (subject to tuning):
- Quotes kept in Postgres for ~2.4 hours.
- Trades for ~3.6 hours.
- Features for ~6 hours.
Beyond that, the history lives in S3 and can be re-hydrated for research and backtests.
Where this is going
What are the future goals of this project?
This data plane (gatherer + feature engine + persister + archiver + janitor + healthmonitor + API) is the foundation. The roadmap from here looks roughly like:
- Richer strategy prototyping (“paper trading”)
  Build out a dedicated strategies microservice that:
  - Reads `market_features`, `market_trades`, and `market_quotes`.
  - Simulates entries/exits with realistic costs and constraints.
  - Logs PnL, drawdowns, and risk metrics back into Postgres.
- Deeper analytics & tools
- Per-market dashboards with:
- Feature time series (ret, z, sigma, imbalance).
- Liquidity and spread histories.
- “Calculators” and visualizers for common trading concepts:
- Kelly sizing.
- Probability conversion & edge.
- Volatility / Sharpe-style metrics specialized for prediction markets.
- Per-market dashboards with:
- Long-horizon research & backtesting
- Use the S3 archive to:
- Reconstruct order-book and feature histories over months.
- Backtest complex strategies safely and repeatedly.
- Compare strategies across:
- Elections / sports / crypto / “weird” markets.
- Different time-to-resolution regimes.
- Use the S3 archive to:
- Live trading (once US access is permitted)
- Gradually move a subset of strategies from paper trading to live trading on Polymarket.
- Add risk controls, position limits, and live monitoring.
- Potential monetization paths (still speculative)
- Proprietary strategies: trade on own capital.
- Data service: provide archived microstructure data to third parties.
- Strategy subscriptions: allow others to allocate to specific strategies and take a performance fee.
Everything on this frontend today is meant to be:
- A window into the health and behavior of the ingest pipeline.
- A playground for understanding how Polymarket behaves at the microstructure level.
- A foundation for future work in automated or semi-automated trading.