
Feature Store

Formula

\[ \text{feature view} = (\text{entity id}, \text{timestamp}) \mapsto \text{feature vector} \]

Parameters

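  • Entity id and timestamp are the lookup keys: retrieval returns the feature values that were valid for that entity as of that timestamp (point-in-time retrieval).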
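A minimal sketch of the mapping above, assuming an in-memory layout invented for illustration (`histories`, `feature_view`, and the feature names are hypothetical, not any feature store's API). Each feature keeps a timestamp-sorted history per entity, and the view returns, for each requested feature, the latest value recorded at or before the requested timestamp.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: feature name -> entity id -> [(timestamp, value)], sorted by timestamp.
Histories = Dict[str, Dict[str, List[Tuple[datetime, float]]]]


def feature_view(histories: Histories, names: List[str],
                 entity_id: str, ts: datetime) -> Tuple[Optional[float], ...]:
    """Map (entity id, timestamp) to a feature vector using as-of semantics."""
    vector = []
    for name in names:
        rows = histories[name].get(entity_id, [])
        # bisect_right counts rows with timestamp <= ts, so values recorded later are never used.
        i = bisect_right([t for t, _ in rows], ts)
        # None means the entity had no value for this feature yet at time ts.
        vector.append(rows[i - 1][1] if i > 0 else None)
    return tuple(vector)
```

For example, if `spend_30d` for `"u1"` was recorded as 10.0 on 2024-01-01 and 25.0 on 2024-01-05, and `num_orders` as 2.0 on 2024-01-03, then a lookup at 2024-01-04 yields `(10.0, 2.0)` and a lookup at 2024-01-02 yields `(10.0, None)`.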

What it means

A feature store manages reusable, versioned feature definitions and provides consistent access to them for both model training (offline, historical) and model serving (online, low latency).

What it's used for

  • Reducing duplicated feature logic across teams.
  • Producing point-in-time correct training data and keeping online and offline feature values consistent.

Key properties

  • Promotes feature lineage, reuse, and governance.
  • Often separates offline backfill and online serving paths.
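One common way the two paths relate is sketched below, assuming a toy representation (the row layout and `materialize_online` are hypothetical names for illustration). The offline store keeps the full history needed for point-in-time training joins, while the online store is materialized from it and keeps only the most recent value per entity for fast lookups.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical offline store: full history of (entity_id, feature_timestamp, value) rows.
OfflineRows = List[Tuple[str, datetime, float]]


def materialize_online(offline: OfflineRows) -> Dict[str, Tuple[datetime, float]]:
    """Keep only the newest (timestamp, value) per entity, as a key-value online store would."""
    online: Dict[str, Tuple[datetime, float]] = {}
    for entity_id, ts, value in offline:
        current = online.get(entity_id)
        # Rows may arrive in any order; replace only if this row is newer.
        if current is None or ts > current[0]:
            online[entity_id] = (ts, value)
    return online
```

The design choice is that history lives only offline: the online side trades completeness for a constant-time read per entity, which is why training sets are built from the offline path rather than from online snapshots.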

Common gotchas

  • Without point-in-time joins, historical training sets can leak future data: a label row may be joined with feature values recorded after the label event.
  • A feature store does not replace data quality monitoring.
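The leakage gotcha can be reproduced with a toy example. The history and function names below are hypothetical; the point is only the contrast between a join that ignores the label timestamp and one that respects it.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical feature history for one customer: (recorded_at, spend_30d).
history: List[Tuple[datetime, float]] = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 10), 99.0),  # recorded after the label event used below
]


def latest_value(rows: List[Tuple[datetime, float]]) -> Optional[float]:
    """Leaky join: ignores the label timestamp and takes the newest value."""
    return max(rows, key=lambda r: r[0])[1] if rows else None


def point_in_time_value(rows: List[Tuple[datetime, float]],
                        event_ts: datetime) -> Optional[float]:
    """Point-in-time join: newest value recorded at or before event_ts."""
    eligible = [r for r in rows if r[0] <= event_ts]
    return max(eligible, key=lambda r: r[0])[1] if eligible else None
```

For a label observed on 2024-01-05, `latest_value(history)` returns 99.0, a value the model could not have seen at that time, while `point_in_time_value(history, datetime(2024, 1, 5))` returns 10.0.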

Example

Define a "30-day spend" feature once and reuse it in both batch training and online scoring systems.
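A minimal sketch of "define once, use twice", assuming a plain in-memory transaction log and a window of (as_of - 30 days, as_of]; the window convention and all names (`spend_30d`, `build_training_rows`, `score_request`) are hypothetical choices for illustration. Both paths call the same function, so training and serving cannot drift apart in how the feature is computed.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical transaction log row: (customer_id, timestamp, amount).
Txn = Tuple[str, datetime, float]
# Hypothetical label row: (customer_id, label_timestamp, label).
Label = Tuple[str, datetime, int]


def spend_30d(txns: List[Txn], customer_id: str, as_of: datetime) -> float:
    """Single feature definition: total spend in the window (as_of - 30 days, as_of]."""
    start = as_of - timedelta(days=30)
    return sum(amount for cid, ts, amount in txns
               if cid == customer_id and start < ts <= as_of)


def build_training_rows(txns: List[Txn],
                        labels: List[Label]) -> List[Tuple[str, datetime, float, int]]:
    """Offline path: evaluate the feature at each label's own timestamp."""
    return [(cid, ts, spend_30d(txns, cid, ts), y) for cid, ts, y in labels]


def score_request(txns: List[Txn], customer_id: str, now: datetime) -> float:
    """Online path: evaluate the same definition at request time."""
    return spend_30d(txns, customer_id, now)
```

With purchases of 50.0 on 2024-01-01, 30.0 on 2024-01-20, and 5.0 on 2024-03-01, a training label at 2024-01-25 gets a feature value of 80.0, and an online request at 2024-03-02 gets 5.0. A production system would typically precompute or incrementally maintain the online value rather than scan the log per request.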

How to Compute (Pseudocode)

Input: feature definitions, entities, timestamps, source tables/events
Output: point-in-time correct feature retrieval pipeline

define versioned feature transformations and source dependencies
for training data generation:
  for each (entity, label timestamp), join the latest feature value recorded at or before that timestamp (point-in-time join)
for online serving:
  compute or look up values from the same feature definitions for incoming requests
monitor freshness, lineage, and online/offline consistency
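The final monitoring step could look roughly like the check below, assuming the online store exposes (updated_at, value) per entity and that the offline path can recompute the expected value for a sample of entities. The function name, thresholds, and message format are hypothetical; lineage tracking is not covered here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def check_online_offline(online: Dict[str, Tuple[datetime, float]],
                         offline_recomputed: Dict[str, float],
                         now: datetime,
                         max_staleness: timedelta,
                         tolerance: float = 1e-6) -> List[str]:
    """Return issues found: missing online values, stale values, or online/offline mismatches."""
    issues: List[str] = []
    for entity_id, expected in offline_recomputed.items():
        if entity_id not in online:
            issues.append(f"{entity_id}: missing from online store")
            continue
        updated_at, value = online[entity_id]
        # Freshness: how long since the online value was last written.
        if now - updated_at > max_staleness:
            issues.append(f"{entity_id}: stale ({now - updated_at})")
        # Consistency: online value should match the offline recomputation.
        if abs(value - expected) > tolerance:
            issues.append(f"{entity_id}: online={value} offline={expected}")
    return issues
```

In practice such a check would run on a sample of entities on a schedule, with alerts on the returned issues rather than on every single mismatch.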

Complexity

  • Time: Depends on storage/query engines, feature definitions, and join/backfill workloads (often dominated by data processing infrastructure)
  • Space: Depends on offline feature tables, online caches, and retained lineage/version metadata
  • Assumptions: This card describes system workflow complexity; infrastructure and data volume dominate over local algorithmic costs