Apacen Trading – FAQ

What am I looking at on this dashboard?

This frontend sits on top of a custom data pipeline for Polymarket. Under the hood, three high-volume streams are being ingested:

  • market_quotes – “top of the book” snapshots.
  • market_trades – individual fills (every trade that prints).
  • market_features – derived statistical signals built from quotes and trades.

Those three streams are stored in Postgres in hourly partitions, periodically archived to S3, and then used to drive:

  • The stats at the top of the page.
  • The “Live Market Events” pane (new markets, big jumps, extremes).
  • The lag meter, which shows how far behind the ingest is from real time.

The long-term goal is to use this data to power and backtest trading strategies on Polymarket.


The three main tables
What is market_quotes?

Short version: Snapshots of the best bid/ask for each market at a given instant.
Think of it as: “What’s the current top quote in the order book?”

Each row corresponds to the Quote struct:

type Quote struct {
  TokenID   string
  TS        time.Time
  BestBid   float64
  BestAsk   float64
  BidSize1  float64
  AskSize1  float64
  SpreadBps float64
  Mid       float64
}

Field-by-field:

TokenID
Polymarket’s internal ID for a specific “yes/no” outcome (a CLOB token), not the human-readable market title.
TS
Timestamp when this quote snapshot was observed.
BestBid
Highest price someone is currently willing to pay for 1 unit of the token.
BestAsk
Lowest price someone is currently willing to sell for.
BidSize1
Size available at the best bid (top level of the bid side).
AskSize1
Size available at the best ask.
SpreadBps
Bid–ask spread expressed in basis points (1 basis point = 0.01%). It’s basically:
SpreadBps = ((BestAsk - BestBid) / Mid) × 10,000
Mid
Midpoint between best bid and best ask:
Mid = (BestBid + BestAsk) / 2
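Putting the two formulas together, here is a minimal Go sketch (hypothetical helper name, not the pipeline’s actual code):

```go
package main

import "fmt"

// midAndSpreadBps derives the Mid and SpreadBps fields from the top of
// book, following the formulas above. Hypothetical helper, for illustration.
func midAndSpreadBps(bestBid, bestAsk float64) (mid, spreadBps float64) {
	mid = (bestBid + bestAsk) / 2
	spreadBps = (bestAsk - bestBid) / mid * 10000
	return mid, spreadBps
}

func main() {
	mid, spread := midAndSpreadBps(0.52, 0.54)
	fmt.Printf("mid=%.3f spreadBps=%.1f\n", mid, spread)
}
```

For a 0.52 bid against a 0.54 ask, this prints mid=0.530 spreadBps=377.4 — a 2¢ spread is wide in basis-point terms when prices live between 0 and 1.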

When is a market_quotes row emitted?

Whenever the top of book changes in a meaningful way – e.g.:

  • Best bid moves.
  • Best ask moves.
  • Top level size changes enough to matter.

Busy markets can generate thousands of quote snapshots per minute. Quiet markets may sit unchanged for a while.
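The exact emission rule isn’t spelled out here, but a change detector in that spirit might look like the sketch below. The 10% size-change threshold is an illustrative assumption, not the pipeline’s actual tuning:

```go
package main

import (
	"fmt"
	"math"
)

// topOfBook is the minimal state needed to decide whether a new
// snapshot is worth emitting.
type topOfBook struct {
	BestBid, BestAsk   float64
	BidSize1, AskSize1 float64
}

// shouldEmit reports whether the top of book changed "meaningfully":
// any price move at the top, or a top-level size change above a
// relative threshold. The 10% cutoff is an assumption for illustration.
func shouldEmit(prev, cur topOfBook) bool {
	if cur.BestBid != prev.BestBid || cur.BestAsk != prev.BestAsk {
		return true // any top-of-book price move
	}
	const sizeThreshold = 0.10
	bidDelta := math.Abs(cur.BidSize1 - prev.BidSize1)
	askDelta := math.Abs(cur.AskSize1 - prev.AskSize1)
	return bidDelta > sizeThreshold*prev.BidSize1 || askDelta > sizeThreshold*prev.AskSize1
}

func main() {
	prev := topOfBook{0.51, 0.53, 100, 120}
	fmt.Println(shouldEmit(prev, topOfBook{0.52, 0.53, 100, 120})) // bid moved: emit
	fmt.Println(shouldEmit(prev, topOfBook{0.51, 0.53, 101, 120})) // tiny size change: skip
}
```

The point of the threshold is exactly the busy-vs-quiet asymmetry above: without it, an active market’s top-level size flickering would multiply the quote volume for no informational gain.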

What is market_trades?

Short version: Every individual execution that happens on Polymarket.
Think of it as: “The tape” – every trade print with price, size, and which side initiated it.

Each row corresponds to the Trade struct:

type Trade struct {
  TokenID   string
  TS        time.Time
  Price     float64
  Size      float64
  Aggressor string // "buy" or "sell"
  TradeID   string // optional if available
}

Field-by-field:

TokenID
Same idea as in quotes: the internal ID for the outcome being traded.
TS
When this trade was executed.
Price
Transaction price for this fill (e.g. 0.43 = 43¢).
Size
Quantity traded in this fill (in Polymarket’s units for that token).
Aggressor
Indicates which side crossed the spread:
  • "buy": a buyer lifted the ask (buy market order / taker).
  • "sell": a seller hit the bid (sell market order / taker).
TradeID
Exchange-side identifier when available, useful for deduplication or cross-referencing.
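Since TradeID is optional, any dedup step has to tolerate missing IDs. A minimal sketch of that idea (hypothetical helper, not the actual persister code):

```go
package main

import "fmt"

type tradeRow struct {
	TradeID string // empty when the exchange didn't provide one
	Price   float64
	Size    float64
}

// dedupe keeps the first occurrence of each TradeID and passes through
// rows without an ID, since those can't be matched against anything.
func dedupe(rows []tradeRow) []tradeRow {
	seen := make(map[string]bool)
	var out []tradeRow
	for _, r := range rows {
		if r.TradeID != "" {
			if seen[r.TradeID] {
				continue // already ingested this fill
			}
			seen[r.TradeID] = true
		}
		out = append(out, r)
	}
	return out
}

func main() {
	rows := []tradeRow{
		{TradeID: "t1", Price: 0.43, Size: 10},
		{TradeID: "t1", Price: 0.43, Size: 10}, // duplicate delivery
		{TradeID: "", Price: 0.44, Size: 5},    // no ID: kept as-is
	}
	fmt.Println(len(dedupe(rows))) // 2 rows survive
}
```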

When is a market_trades row emitted?

Whenever any trade prints on Polymarket for a tracked token. Trades are much rarer than quotes, but each trade is extremely informative: it’s where actual money changes hands.

What is market_features?

Short version: Rolling statistical signals built from the raw quotes and trades.
Think of it as: “Features you’d feed into a trading model.”

Each row corresponds to the FeatureUpdate struct:

type FeatureUpdate struct {
  TokenID        string
  TS             time.Time
  Ret1m          float64
  Ret5m          float64
  Vol1m          float64
  AvgVol5m       float64
  Sigma5m        float64
  ZScore5m       float64
  ImbalanceTop   float64
  SpreadBps      float64
  BrokeHigh15m   bool
  BrokeLow15m    bool
  TimeToResolveH float64
  SignedFlow1m   float64 // +buy -sell
  MidNow         float64
  Mid1mAgo       float64
}

Field-by-field:

TokenID
Outcome identifier, as before.
TS
Time of this feature snapshot. Think of it as “the state of this market at this moment.”
Ret1m
1-minute return based on mid price: approximately
Ret1m = (MidNow - Mid1mAgo) / Mid1mAgo

A value of 0.05 ≈ +5% move in the last minute.
Ret5m
Same idea as Ret1m, but over the last 5 minutes.
Vol1m
Traded volume over the last 1 minute for this token (sum of sizes).
AvgVol5m
Average 1-minute volume over the last 5 minutes. Good for spotting whether the current minute is unusually active.
Sigma5m
Estimated volatility of mid price over the last 5 minutes (a rolling standard deviation). Higher Sigma5m = price jittering around more violently.
ZScore5m
A z-score of the current price relative to the recent 5-minute window:
Z = (current mid - recent mean mid) / Sigma5m

  • |Z| ≈ 1–2: normal noise.
  • |Z| ≈ 3–4: quite unusual.
  • |Z| ≈ 10: “state extreme” – price is miles from its recent average.
ImbalanceTop
Measure of order-book imbalance at the top level.
Intuitively:
  • Positive values ≈ more resting size on the bid than on the ask.
  • Negative values ≈ more resting size on the ask than on the bid.
  • Near zero ≈ balanced.
SpreadBps
Same as in market_quotes: bid–ask spread in basis points. Wider spreads ≈ less liquidity / higher friction.
BrokeHigh15m
true if the current mid price is breaking the 15-minute high (new short-term high).
BrokeLow15m
true if the current mid price is breaking the 15-minute low (new short-term low).
TimeToResolveH
Approximate hours until market resolution (based on known expiry time when available).
  • Small value → event is soon.
  • Large value → long-dated / far in the future.
Great for strategies that behave differently near expiry.
SignedFlow1m
Net signed order flow over 1 minute:
  • Positive = more aggressive buys than sells.
  • Negative = more aggressive sells than buys.
This tries to capture which side is “in control” of recent trading.
MidNow
Current mid price (same as in market_quotes).
Mid1mAgo
Mid price one minute ago, used internally to compute the returns, but also handy to see on its own.
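Three of the formulas above (returns, z-score, signed flow) are simple enough to sketch directly. These are hypothetical helpers mirroring the definitions, not the feature engine’s actual code:

```go
package main

import (
	"fmt"
	"math"
)

// ret is the simple mid-price return, as used for Ret1m and Ret5m.
func ret(midNow, midThen float64) float64 {
	return (midNow - midThen) / midThen
}

// zScore computes (last mid - window mean) / window stddev, mirroring
// the ZScore5m definition, where the stddev plays the role of Sigma5m.
func zScore(mids []float64) float64 {
	var sum float64
	for _, m := range mids {
		sum += m
	}
	mean := sum / float64(len(mids))
	var ss float64
	for _, m := range mids {
		ss += (m - mean) * (m - mean)
	}
	sigma := math.Sqrt(ss / float64(len(mids)))
	if sigma == 0 {
		return 0 // flat window: no meaningful z-score
	}
	return (mids[len(mids)-1] - mean) / sigma
}

// signedFlow nets buy-aggressor size against sell-aggressor size,
// as in SignedFlow1m (+buy, -sell).
func signedFlow(sizes []float64, aggressors []string) float64 {
	var flow float64
	for i, s := range sizes {
		if aggressors[i] == "buy" {
			flow += s
		} else {
			flow -= s
		}
	}
	return flow
}

func main() {
	fmt.Printf("Ret1m=%.2f\n", ret(0.55, 0.50)) // a +10% move over the window
	fmt.Printf("Flow=%.0f\n", signedFlow([]float64{10, 4, 3}, []string{"buy", "sell", "buy"}))
	fmt.Printf("Z=%.2f\n", zScore([]float64{0.50, 0.50, 0.51, 0.49, 0.60}))
}
```

The last mid in that window (0.60 after hovering near 0.50) produces a z-score near 2: unusual but not extreme, per the rough scale above.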

When is a market_features row emitted?

The feature engine keeps a rolling window of recent quotes and trades in memory. It emits a new FeatureUpdate when:

  • Enough new data has arrived to update the rolling window, and/or
  • Something “interesting” happens (big move, high z-score, notable volume spike, etc.).

Because features can be recalculated as new data arrives, they’re:

  • First streamed into temporary tables via COPY.
  • Then merged into market_features via a set-based upsert, so only the latest feature per token/timestamp “sticks.”

The result is a dense, but not ridiculous, stream of signal snapshots.
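The staging-then-merge step can be pictured as a standard Postgres staged upsert. The table and column names below are assumptions based on the struct fields (not the pipeline’s actual DDL), and the pattern assumes a unique index on (token_id, ts):

```sql
-- Illustrative sketch, not the pipeline's actual schema.
-- 1. Stream rows into a temp staging table via COPY.
CREATE TEMP TABLE features_staging (LIKE market_features INCLUDING DEFAULTS);
-- COPY features_staging FROM STDIN (FORMAT csv);

-- 2. Set-based upsert: the latest feature per (token_id, ts) wins.
INSERT INTO market_features
SELECT * FROM features_staging
ON CONFLICT (token_id, ts)
DO UPDATE SET mid_now   = EXCLUDED.mid_now,
              z_score5m = EXCLUDED.z_score5m;  -- ...plus the remaining columns
```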


Lag and performance

What is the “Stream lag” indicator on the front page?

The lag badges show, for each stream:

  • Quotes lag – how many seconds ago the newest market_quotes row was ingested.
  • Trades lag – same for market_trades.
  • Features lag – same for market_features.

In other words:
“If the latest quote in the database is from 3 seconds ago, the quotes lag is ~3s.”

The frontend hits a lightweight /stream-lag endpoint that asks Postgres:

SELECT EXTRACT(EPOCH FROM (now() - max(ts))) AS lag_sec
FROM market_quotes  -- or market_trades / market_features
WHERE ts > now() - interval '1 day';

and uses that to color the badges.

  • Green (ok) – normal: typically 1–4 seconds.
  • Amber (warn) – tens of seconds: usually a brief backlog or a small hiccup.
  • Red (bad) – hundreds of seconds: something is putting the ingest under stress.
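The badge coloring reduces to a simple classification. The cutoffs below mirror the ranges described above, but the exact values the frontend uses are an assumption:

```go
package main

import "fmt"

// lagStatus maps a stream lag in seconds to a badge color.
// Thresholds are illustrative, chosen to match the ranges in the text.
func lagStatus(lagSec float64) string {
	switch {
	case lagSec < 10:
		return "ok" // green: normal, single-digit seconds
	case lagSec < 120:
		return "warn" // amber: tens of seconds, likely a brief backlog
	default:
		return "bad" // red: hundreds of seconds, ingest under stress
	}
}

func main() {
	for _, lag := range []float64{3, 45, 470} {
		fmt.Printf("%.0fs -> %s\n", lag, lagStatus(lag))
	}
}
```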

Why have we seen lags as high as ~470 seconds?

A few reasons can cause temporary spikes without actually “breaking” the system:

  1. Heavy analytics queries (like stats / exploratory SQL)
    Big WITH queries scanning partitioned tables can:
    • Compete with the ingest pipeline for I/O.
    • Hold locks and slow down writes.
    This doesn’t stop the gatherer from reading the websocket; it just means it takes longer for data to land in Postgres.
  2. Write-ahead log (WAL) and checkpoint pressure
    Under high write volume, Postgres sometimes needs to:
    • Flush WAL aggressively.
    • Run longer checkpoints.
    Both can briefly slow writes, which shows up as lag.
  3. Archiver / janitor / healthmonitor interactions
    • The archiver streams old partitions to S3.
    • The janitor drops old partitions to free disk.
    • The healthmonitor dynamically adjusts them.
    If they all wake up at once during a high-activity period, writes can momentarily fall behind.
  4. External factors
    Network hiccups, upstream slowdowns at Polymarket, or temporary resource contention on the EC2 instance can all contribute.

The important point: the pipeline is built to catch up. Short-lived spikes into the tens or even hundreds of seconds are acceptable as long as:

  • They don’t persist.
  • Data eventually “catches up” and lag returns to single-digit seconds.

Is lag of ~120 seconds OK? What about ~470 seconds?

Normal operating range: ~1–4 seconds.
Mild stress: 30–120 seconds – worth watching, but usually self-correcting.
Serious stress: 200+ seconds sustained – useful as a “debug me” signal.

A one-off spike to ~470 seconds during a big query or a maintenance task is not catastrophic; a persistent 400–500 seconds would be a sign that the ingest rate and database configuration need tuning.


Prices and discrepancies vs Polymarket’s UI

Which leg (“YES” vs “NO”) are these prices?

For binary markets, the system treats the stored price as the probability of the event happening — i.e. the “YES” leg. In other words, the value you see here should correspond to “YES”, and the implied “NO” price would be 1 − YES up to fees and microstructure noise. Internally, the backend is designed around YES-normalized prices for binaries, so strategies and features all speak a common language: “probability this resolves true.”
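YES-normalization is just the complement rule for binary outcomes. A tiny sketch (hypothetical helper name; real handling would also account for fees and rounding):

```go
package main

import "fmt"

// yesPrice normalizes a binary-market price to the YES leg.
// For a NO-leg quote, the implied YES price is 1 - p, ignoring fees
// and microstructure noise. Hypothetical helper for illustration.
func yesPrice(p float64, leg string) float64 {
	if leg == "no" {
		return 1 - p
	}
	return p
}

func main() {
	// A 43¢ NO quote implies a 57¢ YES probability.
	fmt.Printf("%.2f\n", yesPrice(0.43, "no"))
}
```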

Why do the prices on this dashboard look “swingier” than on Polymarket?

There are a few reasons.

  1. We record every micro-move at the top of book
    market_quotes captures each change in best bid/ask and mid. If the top of book jitters from 0.51 → 0.52 → 0.50 → 0.53, we record all those tiny moves.
    Polymarket’s UI, by contrast, may emphasize:
    • Last trade.
    • Coarser refresh.
    • Some smoothing for charts.
  2. Mid price vs. displayed price
    The dashboard often uses mid price (Mid), or features derived from mid, whereas Polymarket may show:
    • Last executed price.
    • A VWAP-style value.
    • Or something closer to “yes” probability rounded for humans.
    Mid price moves whenever either side of the order book moves, so it looks more “twitchy.”
  3. Raw derived features (ret, z-scores, σ) are intentionally sensitive
    Ret1m, Ret5m, Sigma5m, ZScore5m, etc., are designed for strategy design, not for a calm human-facing chart.
    A strategy might care deeply about a 1–2% move that a human UI would barely highlight.
  4. Different time aggregation
    Polymarket charts often aggregate over candles (e.g. 1m, 5m) and may interpolate missing data.
    This dashboard shows point-in-time snapshots and derived stats without smoothing, so you see every wiggle.

Bottom line: if the dashboard looks “swingier,” that’s intentional — it’s closer to the raw microstructure of the market than the public UI.


Market events: new markets, price jumps, state extremes

What are “New markets” in the events panel?

These are new_market events recorded in market_events:

  • The gatherer notices a new Polymarket market / outcome.
  • It emits a MarketEvent of type new_market with metadata such as:
    • Market slug.
    • Question text.
    • Liquidity.
    • Volume.
    • Market age (hours since creation).

The frontend’s “New markets” tab shows the most recent markets detected by the system, ranked by detection time, not by popularity.

What are “Price jumps” / “state extremes”?

Internally, most of the “big price move” events use the state_extreme event type, which is based on:

  • Very high ZScore5m (price far from its recent mean).
  • Often a large |Ret1m| as well.

The frontend’s “Price jumps” tab focuses on “state extreme” events with large absolute returns (e.g. ≥ 5%) within the recent window. These are:

  • Sharper moves.
  • Often associated with news or sudden liquidity shifts.

Because these are filtered by both event type and minimum return, it’s possible for the “Price jumps” tab to briefly have fewer entries than the raw volume of state_extreme events would suggest.
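The two-part filter described above can be sketched as a single predicate. The 5% cutoff matches the example in the text; treat the exact value as tunable:

```go
package main

import (
	"fmt"
	"math"
)

type marketEvent struct {
	Type  string
	Ret1m float64
}

// isPriceJump applies the "Price jumps" tab filter: a state_extreme
// event whose absolute 1-minute return clears a minimum threshold.
func isPriceJump(e marketEvent) bool {
	return e.Type == "state_extreme" && math.Abs(e.Ret1m) >= 0.05
}

func main() {
	fmt.Println(isPriceJump(marketEvent{"state_extreme", -0.08})) // big down move: shown
	fmt.Println(isPriceJump(marketEvent{"state_extreme", 0.01}))  // extreme z, small return: hidden
	fmt.Println(isPriceJump(marketEvent{"new_market", 0.10}))     // wrong event type: hidden
}
```

The middle case is exactly why the tab can look sparser than the raw state_extreme volume: an extreme z-score with a small absolute return doesn’t qualify.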


Data retention, archiving, and disk space

How do you avoid filling the disk?

The tables are:

  • Partitioned hourly into tables like market_quotes_pYYYYMMDDHH, etc.
  • Archived by an archiver daemon that:
    • Scans old partitions.
    • Writes them to S3 as compressed JSON.
    • Marks them as archived.
  • Cleaned up by a janitor daemon that:
    • Drops archived partitions after a configurable retention window.

Current defaults (subject to tuning):

  • Quotes kept in Postgres for ~2.4 hours.
  • Trades for ~3.6 hours.
  • Features for ~6 hours.

Beyond that, the history lives in S3 and can be re-hydrated for research and backtests.


Where this is going

What are the future goals of this project?

This data plane (gatherer + feature engine + persister + archiver + janitor + healthmonitor + API) is the foundation. The roadmap from here looks roughly like:

  1. Richer strategy prototyping (“paper trading”)
    Build out a dedicated strategies microservice that:
    • Reads market_features, market_trades, and market_quotes.
    • Simulates entries/exits with realistic costs and constraints.
    • Logs PnL, drawdowns, and risk metrics back into Postgres.
  2. Deeper analytics & tools
    • Per-market dashboards with:
      • Feature time series (ret, z, sigma, imbalance).
      • Liquidity and spread histories.
    • “Calculators” and visualizers for common trading concepts:
      • Kelly sizing.
      • Probability conversion & edge.
      • Volatility / Sharpe-style metrics specialized for prediction markets.
  3. Long-horizon research & backtesting
    • Use the S3 archive to:
      • Reconstruct order-book and feature histories over months.
      • Backtest complex strategies safely and repeatedly.
    • Compare strategies across:
      • Elections / sports / crypto / “weird” markets.
      • Different time-to-resolution regimes.
  4. Live trading (once US access is permitted)
    • Gradually move a subset of strategies from paper trading to live trading on Polymarket.
    • Add risk controls, position limits, and live monitoring.
  5. Potential monetization paths (still speculative)
    • Proprietary strategies: trade on own capital.
    • Data service: provide archived microstructure data to third parties.
    • Strategy subscriptions: allow others to allocate to specific strategies and take a performance fee.

Everything on this frontend today is meant to be:

  • A window into the health and behavior of the ingest pipeline.
  • A playground for understanding how Polymarket behaves at the microstructure level.
  • A foundation for future work in automated or semi-automated trading.