Feature Store

Formula

\[ \text{feature view} = (\text{entity id}, \text{timestamp}) \mapsto \text{feature vector} \]

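Parameters

entity id: the key of the entity the features describe (for example, a customer).
timestamp: the as-of time; the returned feature vector may only use data available at or before it.
Together, these entity/time keys define point-in-time feature retrieval.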
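One way to state the point-in-time constraint precisely, writing $e$ for the entity id, $t$ for the timestamp, $\mathcal{D}_e$ for the source events of entity $e$, $\tau(x)$ for the event time of $x$, and $g$ for the feature transformation (these symbols are introduced here for illustration):

\[ \text{feature view} = (e, t) \mapsto g\big(\{\, x \in \mathcal{D}_e : \tau(x) \le t \,\}\big) \]

For example, a 30-day spend feature could take $g$ to be a windowed sum of amounts:

\[ \text{spend}_{30\text{d}}(e, t) = \sum_{x \in \mathcal{D}_e,\; t - 30\,\text{d} < \tau(x) \le t} \text{amount}(x) \]

Whether each window boundary is open or closed is a per-feature choice; the rule that matters for correctness is $\tau(x) \le t$.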

What it means

A feature store manages reusable, versioned feature definitions and gives training and serving systems consistent access to the values those definitions produce.
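A minimal sketch of what reusable, versioned feature definitions can look like, assuming an in-memory registry keyed by `(name, version)`. `FeatureDefinition`, `FeatureRegistry`, and the `(event_time, amount)` event shape are hypothetical names for illustration, not the API of any particular feature store product:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: (event_time, amount) pairs for one entity.
Event = Tuple[float, float]


@dataclass(frozen=True)
class FeatureDefinition:
    """One named, versioned feature transformation (illustrative, not a real library type)."""
    name: str
    version: int
    entity: str  # entity key type, e.g. "customer_id"
    source: str  # upstream table or event stream the feature reads
    transform: Callable[[List[Event], float], float]  # (events, as_of) -> value


class FeatureRegistry:
    """Minimal in-memory registry keyed by (name, version)."""

    def __init__(self) -> None:
        self._defs: Dict[Tuple[str, int], FeatureDefinition] = {}

    def register(self, d: FeatureDefinition) -> None:
        key = (d.name, d.version)
        if key in self._defs:
            # Versions are immutable: changing the logic requires a new version number.
            raise ValueError(f"{d.name} v{d.version} already registered")
        self._defs[key] = d

    def get(self, name: str, version: int) -> FeatureDefinition:
        return self._defs[(name, version)]

    def latest(self, name: str) -> FeatureDefinition:
        versions = [v for (n, v) in self._defs if n == name]
        return self._defs[(name, max(versions))]
```

Rejecting re-registration of an existing version is one design choice that keeps training sets reproducible: a model trained on `v1` can always be traced back to the exact logic that produced its features.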

What it's used for

Reducing duplicated feature logic across teams.
Building point-in-time correct training data and keeping online and offline feature values consistent.

Key properties

Promotes feature lineage, reuse, and governance.
Often separates an offline path (historical feature values for backfills and training) from an online path (the latest values, served with low latency).
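A toy sketch of the offline/online split, assuming the offline store keeps every timestamped value per entity and a materialization step copies the newest eligible value into an online key-value map. `OfflineStore`, `OnlineStore`, and `materialize` are hypothetical names:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: the offline store keeps every (ts, value) per entity;
# the online store keeps only the latest (ts, value) per entity for fast lookups.


class OfflineStore:
    def __init__(self) -> None:
        self.history: Dict[str, List[Tuple[float, float]]] = defaultdict(list)

    def append(self, entity_id: str, ts: float, value: float) -> None:
        self.history[entity_id].append((ts, value))


class OnlineStore:
    def __init__(self) -> None:
        self.latest: Dict[str, Tuple[float, float]] = {}

    def get(self, entity_id: str) -> float:
        return self.latest[entity_id][1]


def materialize(offline: OfflineStore, online: OnlineStore, up_to: float) -> None:
    """Copy each entity's most recent value with ts <= up_to into the online store."""
    for entity_id, rows in offline.history.items():
        eligible = [r for r in rows if r[0] <= up_to]
        if eligible:
            online.latest[entity_id] = max(eligible, key=lambda r: r[0])
```

Real systems usually back the offline side with a warehouse or data lake and the online side with a low-latency key-value store; the dictionaries here only stand in for that split.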

Common gotchas

Without point-in-time joins, a training row can pick up feature values computed after its label timestamp, so historical training sets leak future data.
A feature store does not replace data quality monitoring.
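The leakage gotcha is easy to reproduce. This sketch, assuming a per-entity feature history sorted by timestamp (the layout and function names are hypothetical), contrasts a naive "latest value" join with an as-of join built on `bisect.bisect_right`, and adds a check that flags any joined value stamped after its label:

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical feature history: entity -> list of (feature_ts, value), sorted by feature_ts.
History = Dict[str, List[Tuple[float, float]]]


def naive_join(history: History, entity_id: str, label_ts: float) -> Tuple[float, float]:
    """Leaky: always returns the newest row, ignoring label_ts."""
    return history[entity_id][-1]


def as_of_join(history: History, entity_id: str, label_ts: float) -> Optional[Tuple[float, float]]:
    """Point-in-time: newest row with feature_ts <= label_ts, or None if none exists."""
    rows = history[entity_id]
    times = [ts for ts, _ in rows]
    i = bisect.bisect_right(times, label_ts)  # count of rows with ts <= label_ts
    return rows[i - 1] if i > 0 else None


def leaks(feature_row: Optional[Tuple[float, float]], label_ts: float) -> bool:
    """A joined row leaks if its feature timestamp is later than the label timestamp."""
    return feature_row is not None and feature_row[0] > label_ts
```

For brevity `as_of_join` rebuilds the timestamp list on every call, which is linear per lookup; a batch pipeline would instead use a sort-merge sweep or a library as-of join.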

Example

Define a "30-day spend" feature once and reuse it in both batch training and online scoring systems.
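A sketch of the define-once, reuse-everywhere idea, assuming raw events are available as `(event_time, amount)` pairs per customer. `spend_30d`, `build_training_rows`, and `score_request` are hypothetical names; the point is that both paths call the same function, so the 30-day window logic cannot drift between training and scoring:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical raw events: customer_id -> list of (event_time, amount).
Events = Dict[str, List[Tuple[datetime, float]]]

WINDOW = timedelta(days=30)


def spend_30d(events: Events, customer_id: str, as_of: datetime) -> float:
    """Single definition: sum of amounts with as_of - 30 days < event_time <= as_of."""
    return sum(
        (amount for ts, amount in events.get(customer_id, []) if as_of - WINDOW < ts <= as_of),
        0.0,
    )


def build_training_rows(
    events: Events, labels: List[Tuple[str, datetime, int]]
) -> List[Tuple[str, datetime, float, int]]:
    """Batch path: evaluate the feature as of each label's own timestamp."""
    return [(cid, ts, spend_30d(events, cid, ts), y) for cid, ts, y in labels]


def score_request(events: Events, customer_id: str, now: datetime) -> float:
    """Online path: the same function, evaluated at request time."""
    return spend_30d(events, customer_id, now)
```

A production system would typically precompute or incrementally maintain this value instead of scanning raw events on each request.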

How to Compute (Pseudocode)

Input: feature definitions, entities, timestamps, source tables/events
Output: point-in-time correct feature retrieval pipeline

define versioned feature transformations and their source dependencies
for training data generation:
  for each (entity id, timestamp) row, join the latest feature value whose timestamp is <= the row's timestamp
for online serving:
  compute or look up values from the same feature definitions for incoming requests
monitor freshness, lineage, and online/offline consistency
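The pseudocode can be sketched in Python as follows, assuming a single feature whose offline values are `(entity_id, feature_ts, value)` rows with numeric timestamps. The function names, the sort-merge strategy for the point-in-time join, and the `max_staleness` freshness threshold are illustrative assumptions rather than a prescribed implementation:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row shapes for one feature produced by one versioned definition.
FeatureRow = Tuple[str, float, float]  # (entity_id, feature_ts, value)
LabelRow = Tuple[str, float]           # (entity_id, label_ts)


def point_in_time_join(
    feature_rows: List[FeatureRow], label_rows: List[LabelRow]
) -> Dict[LabelRow, Optional[float]]:
    """Training path: each label gets the latest value with feature_ts <= label_ts (None if none)."""
    feats = sorted(feature_rows, key=lambda r: r[1])
    labels = sorted(label_rows, key=lambda r: r[1])
    latest: Dict[str, float] = {}  # entity_id -> newest value seen so far in the sweep
    out: Dict[LabelRow, Optional[float]] = {}
    i = 0
    for entity_id, label_ts in labels:
        # Consume every feature row stamped at or before this label's time.
        while i < len(feats) and feats[i][1] <= label_ts:
            e, _, value = feats[i]
            latest[e] = value
            i += 1
        out[(entity_id, label_ts)] = latest.get(entity_id)
    return out


def online_snapshot(feature_rows: List[FeatureRow]) -> Dict[str, Tuple[float, float]]:
    """Serving path: keep only the newest (feature_ts, value) per entity for request-time lookups."""
    snap: Dict[str, Tuple[float, float]] = {}
    for e, ts, value in feature_rows:
        if e not in snap or ts >= snap[e][0]:
            snap[e] = (ts, value)
    return snap


def check_consistency(
    feature_rows: List[FeatureRow],
    snap: Dict[str, Tuple[float, float]],
    now: float,
    max_staleness: float,
) -> List[str]:
    """Monitoring: flag online values that disagree with an offline as-of lookup at `now`, or are stale."""
    offline_now = point_in_time_join(feature_rows, [(e, now) for e in snap])
    problems: List[str] = []
    for e, (ts, value) in snap.items():
        if offline_now[(e, now)] != value:
            problems.append(f"{e}: online/offline mismatch")
        if now - ts > max_staleness:
            problems.append(f"{e}: stale by {now - ts}")
    return problems
```

Sorting dominates the join: O(n log n + m log m) for n feature rows and m label rows, followed by one linear sweep. Lineage tracking is omitted from this sketch.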

Complexity

Time: Depends on storage/query engines, feature definitions, and join/backfill workloads (often dominated by data processing infrastructure)
Space: Depends on offline feature tables, online caches, and retained lineage/version metadata
Assumptions: This card describes system workflow complexity; infrastructure and data volume dominate over local algorithmic costs