Performance Optimization¶
Groggy is designed for high performance with a Rust core, but optimal performance requires understanding the library's design patterns. This guide covers benchmarking, optimization techniques, and scaling strategies.
Performance Philosophy¶
Groggy's performance comes from:
- Rust core: Zero-cost abstractions, memory safety without garbage collection
- Columnar storage: Cache-friendly data layout for bulk operations
- Lazy evaluation: Views and deferred computation where beneficial
- Parallel algorithms: Multi-threaded execution for large graphs
- Attribute-first design: Fast filtering and queries on node/edge attributes
- Explicit trait-backed delegation: Direct PyO3 methods (100ns FFI budget) instead of dynamic lookups
Trait-Backed Delegation (v0.5.1+)
Starting in v0.5.1, Groggy uses explicit PyO3 methods backed by Rust traits rather than dynamic __getattr__ delegation. This provides:
- 20x faster method calls: ~100ns per FFI call vs ~2000ns for dynamic lookup
- Better IDE support: All methods discoverable via autocomplete
- Clear stack traces: Direct method names in debugging
See Trait-Backed Delegation for architectural details.
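To get a rough feel for the per-call overhead quoted above on your own machine, time a cheap trait-backed call in a tight loop. This is a sketch rather than an official benchmark: absolute numbers vary with hardware and build, and node_count() is used only because it does almost no work per call.
import time
import groggy as gr
g = gr.generators.karate_club()
n_calls = 100_000
start = time.perf_counter()
for _ in range(n_calls):
    g.node_count()  # one trait-backed FFI call per iteration
elapsed = time.perf_counter() - start
print(f"~{elapsed / n_calls * 1e9:.0f} ns per call")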
Benchmarking¶
Basic Timing¶
Measure operation performance:
import groggy as gr
import time
g = gr.generators.erdos_renyi(n=10000, p=0.01)
# Time an operation
start = time.perf_counter()
components = g.connected_components()
elapsed = time.perf_counter() - start
print(f"Connected components: {elapsed*1000:.2f} ms")
Using timeit¶
For more accurate measurements:
import timeit
def benchmark():
g = gr.generators.erdos_renyi(n=10000, p=0.01)
return g.connected_components()
# Run multiple times
times = timeit.repeat(benchmark, number=10, repeat=3)
avg_time = min(times) / 10
print(f"Average: {avg_time*1000:.2f} ms")
Memory Profiling¶
Track memory usage:
import tracemalloc
tracemalloc.start()
g = gr.generators.erdos_renyi(n=100000, p=0.001)
components = g.connected_components()
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024**2:.1f} MB")
print(f"Peak: {peak / 1024**2:.1f} MB")
tracemalloc.stop()
Optimization Patterns¶
Pattern 1: Bulk Operations¶
Avoid: loops over individual operations
Prefer: bulk operations
# ❌ Slow: Individual operations
g = gr.Graph()
for i in range(10000):
    g.add_node(id=i, value=i*2)
# ✅ Fast: Bulk operations
nodes_data = [{"id": i, "value": i*2} for i in range(10000)]
g.add_nodes(nodes_data)
Why faster:
- Single FFI call instead of 10,000 FFI calls
- Rust can optimize batch processing
- Better cache locality
Pattern 2: Attribute-First Filtering¶
Avoid: fetching all data and then filtering in Python
Prefer: pushing filters down to Rust
# ❌ Slow: Filter in Python
all_nodes = g.nodes.table().to_pandas()
young = all_nodes[all_nodes['age'] < 30]
# ✅ Fast: Filter in Rust
young = g.nodes[g.nodes['age'] < 30]
Why faster:
- Filtering happens in Rust (compiled code)
- Only matching data crosses the FFI boundary
- Columnar storage is optimized for attribute queries
Pattern 3: Views vs Materialization¶
Understand when to materialize:
# Views are free
sub = g.nodes[g.nodes['age'] < 30] # O(1) view creation
# But repeated access on views can be costly
for _ in range(1000):
    count = sub.node_count()  # Recomputes the filter each time
# Materialize for repeated access
sub_graph = sub.to_graph() # O(n) materialization
for _ in range(1000):
    count = sub_graph.node_count()  # O(1) cached result
Guidelines:
- Use views for single-pass operations
- Materialize for repeated access
- Profile to find the crossover point (see the sketch below)
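The crossover depends on filter selectivity and how often the result is reused. One way to find it empirically is to time both paths, counting the one-time to_graph() cost against the materialized path (a sketch reusing g and the calls from the example above):
import time
def time_repeated(fn, repeats=1000):
    # Total wall-clock time for `repeats` calls
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start
sub = g.nodes[g.nodes['age'] < 30]  # view
view_cost = time_repeated(lambda: sub.node_count())
start = time.perf_counter()
sub_graph = sub.to_graph()  # one-time materialization
materialize_cost = time.perf_counter() - start
graph_cost = time_repeated(lambda: sub_graph.node_count())
print(f"view: {view_cost:.4f}s  materialize + reuse: {materialize_cost + graph_cost:.4f}s")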
Pattern 4: Avoid Unnecessary Conversions¶
# ❌ Slow: Multiple conversions
df = g.nodes.table().to_pandas()
ages = df['age'].tolist()
mean_age = sum(ages) / len(ages)
# ✅ Fast: Use native operations
mean_age = g.nodes['age'].mean()
Why faster:
- No DataFrame overhead
- No Python list creation
- Computation stays in Rust
Pattern 5: Sparse vs Dense Matrices¶
For large graphs, understand matrix representations:
# Large sparse graph
g = gr.generators.erdos_renyi(n=10000, p=0.001)
# Adjacency is sparse (fast)
A = g.adjacency_matrix() # ~100K edges, not 100M
# Operations respect sparsity
A2 = A.matmul(A) # Sparse matrix multiply
# Dense conversion only if needed
if A.is_sparse() and some_condition:
    A_dense = A.dense()  # Careful: 400MB for 10K nodes
Pattern 6: Parallel Algorithms¶
Groggy automatically parallelizes when beneficial:
# Automatically parallel for large graphs
g = gr.generators.erdos_renyi(n=100000, p=0.0001)
components = g.connected_components() # Uses multiple threads
No configuration needed:
- Algorithms choose parallelization based on graph size
- Rust's rayon manages the thread pool
- The GIL is released during computation (see the sketch below)
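Because the GIL is released inside these calls, independent computations can also be overlapped from Python threads. A minimal sketch, assuming (as stated above) that connected_components releases the GIL:
from concurrent.futures import ThreadPoolExecutor
import groggy as gr
graphs = [gr.generators.erdos_renyi(n=100000, p=0.0001) for _ in range(4)]
# Each call releases the GIL, so the four computations can run concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    all_components = list(pool.map(lambda g: g.connected_components(), graphs))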
Pattern 7: Incremental Updates¶
For evolving graphs, use incremental operations:
# ❌ Slow: Rebuild entire graph
for new_edge in stream:
    g_old = g
    g = gr.Graph()
    # Copy all nodes/edges + the new edge
# ✅ Fast: Incremental updates
for new_edge in stream:
    g.add_edge(*new_edge)  # O(1) append
Scaling Strategies¶
Small Graphs (< 1K nodes)¶
Characteristics:
- FFI overhead dominates
- Memory is not a concern
- Single-threaded is often faster
Recommendations:
# Minimize FFI crossings
data = prepare_all_data() # Python-side prep
g = gr.Graph()
g.add_nodes(data['nodes']) # Single bulk call
g.add_edges(data['edges'])
Medium Graphs (1K - 100K nodes)¶
Characteristics:
- Computation and memory are balanced
- Parallelization starts to help
- Attribute queries are very fast
Recommendations:
# Use bulk operations
g.connected_components(inplace=True, label='component')
# Leverage columnar storage
high_degree = g.nodes[g.nodes['degree'] > 10]
# Cache computed results
degrees = g.degree() # Compute once
for analysis in analyses:
    analysis.use_degrees(degrees)
Large Graphs (100K - 10M nodes)¶
Characteristics:
- Memory becomes critical
- Parallel algorithms are essential
- Sparse representations are required
Recommendations:
# Work with subgraphs
core = g.nodes[g.nodes['degree'] > 5]
core_analysis = core.connected_components()
# Stream processing for edges
# (if streaming API available)
# Use sparse matrices
A = g.adjacency_matrix()
assert A.is_sparse() # Verify sparse
Very Large Graphs (> 10M nodes)¶
Characteristics:
- Cannot fit everything in memory
- Sampling or partitioning is required
- Approximate algorithms are needed
Recommendations:
# Sample for analysis
sample = g.nodes.sample(100000)
sample_graph = sample.to_graph()
metrics = sample_graph.degree()
# Partition by components
components = g.connected_components()
for comp in components[:10]:  # Process largest 10
    analyze(comp.to_graph())
# Use approximate algorithms
# (when available)
# estimate = g.approximate_pagerank()
Common Performance Issues¶
Issue 1: Python Loops Over Graph Elements¶
Problem:
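An illustrative sketch of the anti-pattern, using the age attribute from the earlier examples:
# ❌ Slow: per-node access and Python arithmetic
total = 0
for node in g.nodes.ids():
    total += g.nodes[node]['age']
mean_age = total / g.node_count()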
Solution:
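The columnar equivalent keeps the arithmetic in Rust:
# ✅ Fast: single FFI call, computed in Rust
mean_age = g.nodes['age'].mean()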
Why:
- First version: N FFI calls, Python arithmetic
- Second version: 1 FFI call, Rust arithmetic
Issue 2: Repeated DataFrame Conversions¶
Problem:
for attribute in ['age', 'score', 'rank']:
    df = g.nodes.table().to_pandas()  # Slow conversion each time
    print(df[attribute].mean())
Solution:
# Convert once
df = g.nodes.table().to_pandas()
for attribute in ['age', 'score', 'rank']:
    print(df[attribute].mean())
# Or better: stay in Groggy
for attribute in ['age', 'score', 'rank']:
    print(g.nodes[attribute].mean())
Issue 3: Unnecessary Materialization¶
Problem:
# Materializes even though we only need count
filtered = g.nodes[g.nodes['age'] < 30].to_graph()
count = filtered.node_count()
Solution:
# View is sufficient
filtered = g.nodes[g.nodes['age'] < 30]
count = len(filtered) # or filtered.node_count()
Issue 4: Inefficient Attribute Access¶
Problem:
# Access pattern unfriendly to columnar storage
for node in g.nodes.ids():
    process(
        g.nodes[node]['age'],
        g.nodes[node]['score'],
        g.nodes[node]['rank']
    )
Solution:
# Columnar access
ages = g.nodes['age'].to_list()
scores = g.nodes['score'].to_list()
ranks = g.nodes['rank'].to_list()
for age, score, rank in zip(ages, scores, ranks):
    process(age, score, rank)
# Or even better: bulk operation
result = process_bulk(
    g.nodes['age'],
    g.nodes['score'],
    g.nodes['rank']
)
Profiling and Debugging¶
Finding Bottlenecks¶
Use Python profilers:
import cProfile
import pstats
def analysis():
    g = gr.generators.karate_club()
    g.connected_components(inplace=True)
    return g.nodes.table().to_pandas()
# Profile
profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()
# View results
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)
Rust-Level Profiling¶
For core Rust performance:
# Build with profiling
cargo build --release --features profiling
# Use system profilers
# macOS: Instruments
# Linux: perf, valgrind
Memory Leaks¶
Check for memory growth:
import gc
import tracemalloc
tracemalloc.start()
for i in range(100):
    g = gr.generators.erdos_renyi(n=10000, p=0.01)
    components = g.connected_components()
    if i % 10 == 0:
        gc.collect()
        current, peak = tracemalloc.get_traced_memory()
        print(f"Iteration {i}: {current / 1024**2:.1f} MB")
Performance Checklist¶
Before Optimization¶
- [ ] Profile to find actual bottleneck
- [ ] Measure baseline performance (see the helper sketch after this list)
- [ ] Identify the slowest 20% of code
- [ ] Understand data size and growth
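A small, library-agnostic timing helper (plain Python, not part of Groggy's API) makes the baseline measurements above easy to repeat:
import time
from contextlib import contextmanager
@contextmanager
def timed(label):
    # Print wall-clock time for the enclosed block
    start = time.perf_counter()
    yield
    print(f"{label}: {(time.perf_counter() - start) * 1000:.2f} ms")
# Example: baseline a single operation on the graph under test
with timed("connected_components"):
    g.connected_components()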
Graph Construction¶
- [ ] Use bulk add_nodes() and add_edges()
- [ ] Prepare data in Python, send in batches
- [ ] Avoid repeated small additions
- [ ] Set attributes during creation when possible
Graph Queries¶
- [ ] Use attribute-based filtering
- [ ] Keep filters in Rust (not Python)
- [ ] Materialize only when needed
- [ ] Cache frequently accessed results
Algorithms¶
- [ ] Let Groggy parallelize automatically
- [ ] Work on subgraphs when appropriate
- [ ] Use in-place operations when available
- [ ] Batch algorithm runs when possible
Data Export¶
- [ ] Convert to DataFrame once, not repeatedly
- [ ] Use appropriate format (CSV, Parquet, Bundle)
- [ ] Export only needed attributes
- [ ] Stream large exports if available
Real-World Examples¶
Example 1: Social Network Analysis¶
# Load large social network
g = gr.GraphTable.load_bundle("social_network.bundle")
# ✅ Efficient analysis
start = time.time()
# Filter in Rust
active_users = g.nodes[g.nodes['last_active'] > cutoff_date]
# Compute in bulk
degrees = active_users.degree()
# Materialize for multi-use
active_graph = active_users.to_graph()
components = active_graph.connected_components()
# Export efficiently
results = {
    'degrees': degrees.to_list(),
    'num_components': len(components)
}
print(f"Analysis: {time.time() - start:.2f}s")
Example 2: Real-Time Updates¶
# Streaming graph updates
g = gr.Graph()
batch = []
for event in event_stream:
    batch.append(event)
    # Batch updates
    if len(batch) >= 1000:
        edges = [(e['src'], e['tgt'], e['attrs']) for e in batch]
        g.add_edges(edges)
        batch.clear()
    # Periodic analysis
    if g.edge_count() % 10000 == 0:
        metrics = compute_metrics(g)
        emit_metrics(metrics)
Example 3: Large Graph Sampling¶
# Too large to process fully
g = gr.GraphTable.load_bundle("huge_graph.bundle")
# Sample and analyze
sample_size = 10000
sample = g.nodes.sample(sample_size)
sample_graph = sample.to_graph()
# Fast analysis on sample
degrees = sample_graph.degree()
clustering = sample_graph.clustering_coefficient()
# Extrapolate to the full graph: uniform node sampling keeps roughly a
# sample_size / N fraction of each node's edges in the induced subgraph,
# so scale the in-sample mean degree back up by N / sample_size
estimated_avg_degree = degrees.mean() * (g.node_count() / sample_size)
Performance Goals¶
Groggy targets these performance characteristics:
| Operation | Target Complexity | Notes |
|---|---|---|
| Add node | O(1) amortized | Append to columnar storage |
| Add edge | O(1) amortized | Append to edge list |
| Attribute query | O(n) | Full column scan |
| Attribute filter | O(n) | Vectorized in Rust |
| BFS/DFS | O(V + E) | Parallel for large graphs |
| Connected components | O(V + E) | Union-find with path compression |
| Degree computation | O(E) | Linear in edge count |
| Subgraph creation (view) | O(1) | Lazy view |
| Subgraph materialization | O(V + E) | Copy induced subgraph |
Quick Reference¶
Fast Operations¶
# These are optimized
g.add_nodes(data) # Bulk insert
g.nodes[g.nodes['age'] > 30] # Attribute filter
g.degree() # Degree computation
g.connected_components() # Parallel algorithm
A = g.adjacency_matrix() # Sparse matrix
g.nodes['age'].mean() # Columnar stats
Slow Patterns to Avoid¶
# Avoid these
for node in g.nodes.ids():        # Python loop
    g.nodes[node]['value']        # Individual access
df = g.to_pandas()                # Repeated conversions
for analysis in analyses:
    df = g.to_pandas()
sub_g = sub.to_graph()            # Unnecessary materialization
if sub_g.node_count() > 10:       # when a view would suffice
    ...
When to Materialize¶
# Keep as view
sub = g.nodes[filter]
count = len(sub) # One-time use
# Materialize
sub = g.nodes[filter]
sub_g = sub.to_graph()
for _ in range(100):  # Repeated access
    analysis(sub_g)
See Also¶
- Graph Core Guide: Understanding core operations
- Algorithms Guide: Algorithm complexity reference
- Arrays Guide: Columnar data operations
- Matrices Guide: Sparse vs dense matrices
- Integration Guide: Efficient data exchange
- Architecture Docs: System design