Origins & Design: The Ultralight Example¶
How Groggy Started¶
Groggy began with a mission: create a high-performance Rust-based graph backend combined with a quick, intuitive Python frontend capable of handling:
- Lightweight graph diagrams
- Heavy-duty machine learning algorithms
- Real-time graph analytics
The journey started with the "ultralight example" - an attempt to distill Groggy's essence into the smallest possible implementation. This exploration led to the core architectural decisions that define Groggy today.
The Ultralight Example¶
The Core Insight: Separation of Structure and Attributes¶
The ultralight example revealed a fundamental truth:
Graph structure (topology) and graph data (attributes) should be stored separately.
┌─────────────────┐ ┌──────────────────┐
│ Graph Structure│ │ Graph Attributes│
│ (Topology) │ ←──────→│ (Signal) │
├─────────────────┤ ├──────────────────┤
│ Nodes: 0,1,2,3 │ │ name: "Alice" │
│ Edges: 0→1,1→2 │ │ age: 29 │
│ 2→3,3→0 │ │ club: "Blue" │
└─────────────────┘ └──────────────────┘
Why separate them?
- Efficient Bulk Operations: Attributes stored in columnar format enable SIMD and cache-friendly operations
- Dynamic Graphs: Track structural changes independently from attribute changes
- Version Control: Store deltas efficiently without duplicating structure
- Clear Mental Model: Reason about topology and data separately
The Key Objects¶
The ultralight example introduced five core objects that remain central to Groggy:
1. AttributeValues¶
Storage for any data type:
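As a rough illustration of "storage for any data type", here is a minimal Python sketch of a tagged value wrapper. The set of payload types and the `kind` property are assumptions made for this example, not Groggy's actual definition of `AttributeValue`.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical payload types; the real type may support more or different variants.
Payload = Union[int, float, str, bool, bytes, None]


@dataclass(frozen=True)
class AttributeValue:
    """One immutable attribute value that can report which kind of data it holds."""

    value: Payload

    @property
    def kind(self) -> str:
        # type(...).__name__ keeps bool distinct from int ("bool" vs "int").
        return "null" if self.value is None else type(self.value).__name__


name = AttributeValue("Alice")
age = AttributeValue(29)
missing = AttributeValue(None)
```

Wrapping every value in one type lets columns, deltas, and queries handle attributes uniformly without knowing the concrete type in advance.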
2. Delta Objects¶
Track changes over time (essential for dynamic graphs):
struct Delta {
    timestamp: u64,
    change_type: ChangeType,  // Add, Remove, Modify
    entity_id: usize,
    old_value: Option<AttributeValue>,
    new_value: Option<AttributeValue>,
}
Key insight: Every change is tracked, nothing is lost. This enables:
- Time-travel queries
- Audit trails
- Reproducible experiments
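To show how a delta log can support time-travel queries, here is a minimal Python sketch that replays deltas up to a chosen timestamp. The `state_at` helper, the string change types, and the single-value-per-entity model are simplifying assumptions for illustration, not Groggy's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Delta:
    timestamp: int
    change_type: str          # "add", "remove", or "modify"
    entity_id: int
    old_value: Optional[Any]
    new_value: Optional[Any]


def state_at(log: List[Delta], t: int) -> Dict[int, Any]:
    """Replay every delta with timestamp <= t to rebuild the values at time t."""
    state: Dict[int, Any] = {}
    for delta in sorted(log, key=lambda d: d.timestamp):
        if delta.timestamp > t:
            break
        if delta.change_type == "remove":
            state.pop(delta.entity_id, None)
        else:
            # "add" and "modify" both install the new value.
            state[delta.entity_id] = delta.new_value
    return state


log = [
    Delta(1, "add", 0, None, 29),
    Delta(2, "modify", 0, 29, 30),
    Delta(3, "remove", 0, 30, None),
]
```

Because each delta also keeps `old_value`, the same log could be walked backwards to undo changes; this sketch only replays forwards.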
3. GraphSpace¶
The active state of the graph—which nodes and edges are currently "alive":
struct GraphSpace {
    live_nodes: BitSet,  // Which nodes exist
    live_edges: BitSet,  // Which edges exist
    node_count: usize,
    edge_count: usize,
}
Design principle: Nodes/edges are never deleted, only marked as inactive. This enables:
- O(1) node/edge existence checks
- Efficient state restoration
- Version history without data loss
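The "mark inactive instead of deleting" idea can be sketched in Python with an integer used as a bitset. The method names here are hypothetical, and only nodes are tracked to keep the example short. A fixed-width bitset as in the Rust struct gives constant-time checks; a Python integer only approximates that.

```python
class GraphSpace:
    """Tracks which node ids are live; ids are never reused or erased."""

    def __init__(self) -> None:
        self.live_nodes = 0  # bit i set means node i is currently live
        self.next_id = 0     # next unused node id

    def add_node(self) -> int:
        node_id = self.next_id
        self.next_id += 1
        self.live_nodes |= 1 << node_id
        return node_id

    def remove_node(self, node_id: int) -> None:
        # Only the liveness bit is cleared; attribute data stays where it is.
        self.live_nodes &= ~(1 << node_id)

    def restore_node(self, node_id: int) -> None:
        if not 0 <= node_id < self.next_id:
            raise ValueError("unknown node id")
        self.live_nodes |= 1 << node_id

    def contains(self, node_id: int) -> bool:
        return ((self.live_nodes >> node_id) & 1) == 1

    @property
    def node_count(self) -> int:
        return bin(self.live_nodes).count("1")


space = GraphSpace()
first = space.add_node()
second = space.add_node()
space.remove_node(first)
```

Restoring `first` later is a single bit flip, which is what makes rolling back to an earlier state cheap.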
4. GraphPool¶
The flyweight pool containing all attributes:
struct GraphPool {
    node_attrs: ColumnarStorage,  // All node attributes
    edge_attrs: ColumnarStorage,  // All edge attributes
    attr_index: HashMap<String, ColumnId>,
}
Critical design pattern: Attributes are never stored inside nodes/edges. Nodes and edges only point to attributes.
Node {id: 0} ──→ GraphPool["name"][0] = "Alice"
──→ GraphPool["age"][0] = 29
──→ GraphPool["club"][0] = "Blue"
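The pointer diagram above can be sketched as a small columnar pool in Python. The class layout and the `set_node_attr` / `get_node_attr` names are assumptions for illustration; the node itself is just an integer id used as an index into each column.

```python
from typing import Any, Dict, List


class GraphPool:
    """Columnar attribute store: one list per attribute, indexed by node id."""

    def __init__(self) -> None:
        self.node_attrs: Dict[str, List[Any]] = {}

    def set_node_attr(self, name: str, node_id: int, value: Any) -> None:
        column = self.node_attrs.setdefault(name, [])
        # Pad with None so the column is long enough to hold this node id.
        if len(column) <= node_id:
            column.extend([None] * (node_id + 1 - len(column)))
        column[node_id] = value

    def get_node_attr(self, name: str, node_id: int) -> Any:
        column = self.node_attrs.get(name, [])
        return column[node_id] if node_id < len(column) else None


pool = GraphPool()
pool.set_node_attr("name", 0, "Alice")
pool.set_node_attr("age", 0, 29)
pool.set_node_attr("club", 0, "Blue")
pool.set_node_attr("age", 2, 31)
```

Since node 1 never received an age, its slot in the `age` column stays `None`, which keeps every column aligned by node id.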
5. HistoryForest¶
Git-like version control for graphs:
struct HistoryForest {
    commits: Vec<Commit>,
    branches: HashMap<String, BranchId>,
    current_branch: BranchId,
}

struct Commit {
    id: CommitId,
    parent: Option<CommitId>,
    deltas: Vec<Delta>,
    message: String,
    timestamp: u64,
}
This enables:
- Branching and merging graph states
- Time-travel queries
- A/B testing on graph structure
- Reproducible experiments
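A minimal Python sketch of the commit-and-branch idea follows. It assumes each branch name maps directly to its head commit id and uses plain strings as stand-ins for deltas. The method names are hypothetical, and merging is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Commit:
    id: int
    parent: Optional[int]
    deltas: List[str]  # stand-in for real Delta records
    message: str


@dataclass
class HistoryForest:
    commits: List[Commit] = field(default_factory=list)
    # Assumption: branch name -> head commit id (None before the first commit).
    branches: Dict[str, Optional[int]] = field(default_factory=lambda: {"main": None})
    current_branch: str = "main"

    def commit(self, deltas: List[str], message: str) -> int:
        parent = self.branches[self.current_branch]
        commit_id = len(self.commits)
        self.commits.append(Commit(commit_id, parent, list(deltas), message))
        self.branches[self.current_branch] = commit_id
        return commit_id

    def branch(self, name: str) -> None:
        # A new branch starts at the current head and becomes the active branch.
        self.branches[name] = self.branches[self.current_branch]
        self.current_branch = name

    def checkout(self, name: str) -> None:
        if name not in self.branches:
            raise KeyError(name)
        self.current_branch = name

    def deltas_to_head(self) -> List[str]:
        """Collect deltas from the root commit to the current head, oldest first."""
        chain = []
        commit_id = self.branches[self.current_branch]
        while commit_id is not None:
            chain.append(self.commits[commit_id])
            commit_id = self.commits[commit_id].parent
        return [d for c in reversed(chain) for d in c.deltas]


history = HistoryForest()
history.commit(["add node 0"], "init")
history.branch("experiment")
history.commit(["add edge 0->1"], "try an edge")
history.checkout("main")
history.commit(["set age=30"], "update age")
```

Both branches share the first commit, so the shared deltas are stored once and each branch adds only its own changes on top.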
The Columnar Architecture Decision¶
To support both graph operations and machine learning workflows, Groggy needed rectangular data. The solution: columnar storage.
Why Columnar?¶
Traditional graph libraries store attributes like this:
# Node-centric storage (inefficient for bulk ops)
node = {
    "id": 0,
    "name": "Alice",
    "age": 29,
    "club": "Blue"
}
Groggy stores them like this:
# Columnar storage (efficient for bulk ops)
node_pool = {
    "name": ["Alice", "Bob", "Carol", ...],   # Contiguous memory
    "age": [29, 55, 31, ...],                 # SIMD-friendly
    "club": ["Blue", "Purple", "Blue", ...]   # Cache-friendly
}
Benefits:
- Vectorized Operations: Process entire columns at once
- Cache Efficiency: Sequential memory access patterns
- Compression: Columnar data compresses better
- Analytics: Natural fit for data science workflows
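To make the two layouts concrete, here is a small Python sketch that pivots node-centric records into columns. The `to_columns` helper is hypothetical and exists only to illustrate the layout change; it is not part of Groggy's API.

```python
from typing import Any, Dict, List


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of per-node dicts into one list per attribute."""
    names = sorted({key for row in rows for key in row})
    # Missing attributes become None so every column has one slot per node.
    return {name: [row.get(name) for row in rows] for name in names}


rows = [
    {"name": "Alice", "age": 29, "club": "Blue"},
    {"name": "Bob", "age": 55, "club": "Purple"},
    {"name": "Carol", "age": 31, "club": "Blue"},
]
columns = to_columns(rows)
```

After the pivot, an operation over all ages touches a single list rather than every node record.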
Example: Mean Age Computation¶
Traditional (row-wise):
total = 0
for node in graph.nodes:
    total += node["age"]  # Memory scattered, cache misses
mean = total / len(graph.nodes)
Groggy (columnar):
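A minimal sketch of the columnar version, assuming the ages are held in one contiguous buffer. It uses only Python's standard library and is not Groggy's actual API; the `age_column` name and its contents are illustrative.

```python
from array import array

# Hypothetical "age" column: a single contiguous buffer of 64-bit integers.
age_column = array("q", [29, 55, 31])

# One pass over contiguous memory, with no per-node dictionary lookups.
mean_age = sum(age_column) / len(age_column)
```

The whole computation reads one column from start to end, which is the access pattern that vectorized and cache-friendly code relies on.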
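For bulk operations like this one, the columnar approach can be 10-100x faster, because the values are read sequentially from contiguous memory instead of being gathered from scattered node records.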
From Ultralight to Full Implementation¶
The ultralight example established the foundation. The full Groggy implementation expands it with:
Additional Components¶
- Display System: Rich formatting for notebooks and terminals
- Query Engine: Pandas-style filtering and selection
- Algorithm Library: Connected components, centrality, etc.
- Neural Module: Graph neural networks with autodiff
- Visualization: Real-time graph rendering
- I/O System: Parquet, CSV, bundles, pandas integration
Three-Tier Architecture¶
The ultralight concepts map to the three-tier architecture:
Python API Layer
├─ Graph, Subgraph, Table, Array, Matrix
└─ User-facing delegation chains
FFI Bridge (PyO3)
├─ Type conversions
└─ Safe error handling
Rust Core
├─ GraphSpace (active state)
├─ GraphPool (attribute storage)
├─ HistoryForest (version control)
├─ Delta tracking
└─ Algorithms
Design Principles from Ultralight¶
The ultralight example taught us these principles:
1. Separation of Concerns¶
Structure and attributes are independent. This makes each simpler to reason about and optimize.
2. Everything is Append-Only¶
Never delete, only mark as inactive. This enables:
- Efficient version control
- Time-travel queries
- Simpler concurrent access patterns
3. Columnar is Fundamental¶
Columnar storage is not an optimization; it is the core design. The structure is a graph, and the data is columnar.
4. Track All Changes¶
Deltas are first-class citizens. Every change is recorded, enabling reproducibility and audit trails.
5. Views, Not Copies¶
Create views into data rather than copying. Subgraphs, tables, and arrays are all views.
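The difference between a view and a copy can be sketched in Python as follows. `ColumnView` is a hypothetical class for illustration: it holds a reference to a shared column plus the selected row ids, so it never duplicates the underlying data.

```python
from typing import Any, List


class ColumnView:
    """Read-only view: a reference to a shared column plus selected row ids."""

    def __init__(self, column: List[Any], ids: List[int]) -> None:
        self._column = column   # shared with the owner, not copied
        self._ids = list(ids)   # only the small id list is stored per view

    def __len__(self) -> int:
        return len(self._ids)

    def __getitem__(self, position: int) -> Any:
        return self._column[self._ids[position]]

    def to_list(self) -> List[Any]:
        # Materialize a copy only when the caller explicitly asks for one.
        return [self._column[i] for i in self._ids]


ages = [29, 55, 31, 42]
view = ColumnView(ages, [0, 2])
ages[2] = 32  # an update to the shared column is visible through the view
```

Creating the view costs memory proportional to the number of selected ids, not to the size of the column, which is why subgraphs and tables built this way stay cheap.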
The Philosophy: Everything is a Graph¶
Even Groggy's architecture is a graph:
- Nodes = Core objects (GraphSpace, GraphPool, Delta, etc.)
- Edges = Dependencies and transformations
This recursive thinking influenced the API design where objects transform into each other via delegation chains.
Key Takeaways¶
The ultralight example established these enduring truths:
- Structure ≠ Signal: Separate topology from attributes
- Columnar is Key: Bulk operations are the common case
- Track Everything: Deltas enable time-travel and reproducibility
- Views > Copies: Immutable views are cheap and safe
- Graph Thinking: Apply graph concepts recursively
These principles guide every design decision in Groggy today.
Next Steps¶
- Architecture Deep Dive: Detailed look at the three-tier system
- Connected Views: Master object transformations
- User Guide: Start building with these concepts