Tutorial 2: PageRank — Iterative Algorithms with builder.iterate()¶
Express PageRank with the current builder API and let the Batch Executor handle the loop. We’ll build a compact iteration that reuses neighbor aggregation each round.
What You’ll Learn¶
- Structuring loops with builder.iterate()
- Neighbor aggregation via map_nodes(...)
- Reassigning variables across iterations
- Normalizing ranks each pass
Prerequisites¶
- Tutorial 1: Hello World
- Familiarity with PageRank basics (damping, iterations)
Step 1: Builder Setup¶
builder.iterate() marks the loop so the Batch Executor can batch iterations when the body is compatible.
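Steps 2–4 below assume a builder handle named b plus damping and max_iter values are already in scope. A minimal sketch of that scaffolding (the function name build_pagerank is illustrative; the signature mirrors the tested reference implementation further down):

def build_pagerank(b, max_iter=100, damping=0.85):
    # Steps 2-4 fill in this body using b, damping, and max_iter
    ...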
Step 2: Initialize and Precompute¶
ranks = b.init_nodes(default=1.0) # start with uniform scores
degrees = b.node_degrees(ranks) # degrees per node
inv_deg = b.core.recip(degrees, epsilon=1e-9) # safe 1/deg
Degrees don’t change during PageRank, so we compute them once.
Step 3: Iterative Update¶
with b.iterate(max_iter):
    contrib = b.core.mul(ranks, inv_deg)  # rank / degree
    neighbor_sum = b.map_nodes(
        "sum(contrib[neighbors(node)])",
        inputs={"contrib": contrib},
    )
    # Damped neighbor influence + small teleport; normalize each pass
    ranks = b.core.add(b.core.mul(neighbor_sum, damping), 1 - damping)
    ranks = b.normalize(ranks, method="sum")
- map_nodes aggregates neighbor contributions each iteration.
- Reassigning ranks inside the loop carries the value forward.
- Normalizing each round keeps the vector stable.
Step 4: Attach and Build¶
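With the loop finished, attach the final ranks under the attribute name that Step 5 reads back. This mirrors the attach_as call in the tested reference implementation below; how the finished builder becomes the pagerank_algo object used in Step 5 depends on your setup (for example, the @algorithm decorator shown in the Complete Example).

# Expose the final ranks as the "pagerank" node attribute read in Step 5
b.attach_as("pagerank", ranks)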
Step 5: Run It¶
G = gr.generators.karate_club()
result = G.view().apply(pagerank_algo)
scores = result.nodes["pagerank"]
print(scores[:5])
The returned subgraph contains the pagerank attribute on nodes.
Optional: Sink Handling¶
To redistribute sink mass, zero out contributions where degree is 0 and add a uniform sink term. A simple approach is to mask sinks in the neighbor_sum expression (e.g., is_sink ? 0 : rank/deg) and normalize each iteration. For exact parity with a reference implementation, compute the sink mass explicitly each iteration; the tested reference below does this inside the loop with reduce_scalar, but you can also compute it on the Python side and feed a scalar into the builder.
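A sketch of the masking approach, reusing degrees, inv_deg, and the Step 3 loop, plus the compare/where/neighbor_agg ops from the reference implementation below:

is_sink = b.core.compare(degrees, "eq", 0.0)  # True where out-degree is 0

with b.iterate(max_iter):
    contrib = b.core.where(is_sink, 0.0, b.core.mul(ranks, inv_deg))  # sinks contribute 0
    neighbor_sum = b.core.neighbor_agg(contrib, agg="sum")
    ranks = b.core.add(b.core.mul(neighbor_sum, damping), 1 - damping)
    ranks = b.normalize(ranks, method="sum")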
Tested Reference Implementation (matches builder tests)¶
def build_pagerank_builder(builder, max_iter=100, damping=0.85):
    """
    Mirrors the native Rust PageRank (see tests/test_builder_pagerank.py).
    - Precomputes out-degrees and sink mask once
    - Applies damping, teleport, and sink redistribution each iteration
    """
    node_count = builder.graph_node_count()
    inv_n = builder.core.recip(node_count, epsilon=1e-9)

    ranks = builder.init_nodes(default=1.0)
    ranks = builder.var("ranks", builder.core.broadcast_scalar(inv_n, ranks))

    degrees = builder.node_degrees(ranks)
    inv_degrees = builder.core.recip(degrees, epsilon=1e-9)
    sink_mask = builder.core.compare(degrees, "eq", 0.0)

    inv_n_map = builder.core.broadcast_scalar(inv_n, degrees)
    teleport = builder.core.mul(inv_n_map, 1.0 - damping)

    with builder.iterate(max_iter):
        contrib = builder.core.mul(ranks, inv_degrees)
        contrib = builder.core.where(sink_mask, 0.0, contrib)
        neighbor_sum = builder.core.neighbor_agg(contrib, agg="sum")

        damped = builder.core.mul(neighbor_sum, damping)

        sink_ranks = builder.core.where(sink_mask, ranks, 0.0)
        sink_mass = builder.core.reduce_scalar(sink_ranks, op="sum")
        sink_share = builder.core.mul(inv_n_map, sink_mass)
        sink_share = builder.core.mul(sink_share, damping)

        next_ranks = builder.core.add(damped, teleport)
        next_ranks = builder.core.add(next_ranks, sink_share)
        ranks = builder.var("ranks", next_ranks)

    builder.attach_as("pagerank", ranks)
    return builder
Performance Note¶
This tutorial uses builder.iterate(), which enables the Batch Executor when the loop body is compatible. Iterative runs (PageRank/LPA) typically see 10–100x speedups compared to step-by-step execution.
Recap¶
- builder.iterate() enables batched execution for iterative algorithms.
- Use map_nodes("sum(...[neighbors(node)])", inputs=...) for neighbor aggregation.
- Normalize inside the loop to keep ranks bounded.
Next: Tutorial 3: Label Propagation for async updates with map_nodes(async_update=True).
Reduction¶
.reduce("sum") sums all values into a single scalar.
Complete Example¶
from groggy.builder import algorithm
import groggy as gg
@algorithm("pagerank")
def pagerank(G, damping=0.85, max_iter=100):
    """
    Compute PageRank scores for all nodes.

    Args:
        G: Graph handle (injected by decorator)
        damping: Damping factor (default: 0.85)
        max_iter: Maximum iterations (default: 100)

    Returns:
        Normalized PageRank scores
    """
    # Initialize ranks uniformly
    ranks = G.nodes(1.0 / G.N)

    # Precompute degrees and identify sinks
    degrees = ranks.degrees()
    inv_degrees = 1.0 / (degrees + 1e-9)
    is_sink = (degrees == 0.0)

    # Iterate until convergence (or max_iter)
    with G.builder.iter.loop(max_iter):
        # Compute contributions (sinks contribute 0)
        contrib = is_sink.where(0.0, ranks * inv_degrees)

        # Aggregate neighbor contributions
        neighbor_sum = G @ contrib

        # Redistribute sink mass uniformly
        sink_mass = is_sink.where(ranks, 0.0).reduce("sum")

        # PageRank formula: damped neighbors + teleport + sink redistribution
        new_ranks = (
            damping * neighbor_sum +
            (1 - damping) / G.N +
            damping * sink_mass / G.N
        )

        # Update ranks for next iteration
        ranks = G.builder.var("ranks", new_ranks)

    # Normalize to sum to 1.0
    return ranks.normalize()
# Example usage
if __name__ == "__main__":
    # Create a simple web graph
    G = gg.Graph(directed=True)
    G.add_edges_from([
        (1, 2), (1, 3),  # Page 1 links to 2 and 3
        (2, 3), (2, 4),  # Page 2 links to 3 and 4
        (3, 1),          # Page 3 links back to 1
        (4, 3),          # Page 4 links to 3
    ])

    # Compute PageRank
    pr_algo = pagerank(damping=0.85, max_iter=50)
    result = G.all().apply(pr_algo)

    # Print results
    ranks = result.nodes()["pagerank"]
    for node, rank in sorted(ranks.items(), key=lambda x: -x[1]):
        print(f"Node {node}: {rank:.4f}")
Expected output: the nodes print in descending PageRank order, with Node 3 ranked highest because it's linked to by multiple pages (1, 2, and 4).
Understanding the Flow¶
Let's trace what happens in one iteration:
Before iteration:
Node 1: rank = 0.25, degree = 2
Node 2: rank = 0.25, degree = 2
Node 3: rank = 0.25, degree = 1
Node 4: rank = 0.25, degree = 1
Step 1: Compute contributions
Node 1 contributes: 0.25 / 2 = 0.125 to each neighbor (2, 3)
Node 2 contributes: 0.25 / 2 = 0.125 to each neighbor (3, 4)
Node 3 contributes: 0.25 / 1 = 0.250 to neighbor (1)
Node 4 contributes: 0.25 / 1 = 0.250 to neighbor (3)
Step 2: Aggregate neighbors
Node 1 receives: 0.250 from node 3
Node 2 receives: 0.125 from node 1
Node 3 receives: 0.125 + 0.125 + 0.250 = 0.500 from nodes 1, 2, 4
Node 4 receives: 0.125 from node 2
Step 3: Apply damping
Node 1: 0.85 * 0.250 + 0.15 * 0.25 = 0.250
Node 2: 0.85 * 0.125 + 0.15 * 0.25 = 0.144
Node 3: 0.85 * 0.500 + 0.15 * 0.25 = 0.463
Node 4: 0.85 * 0.125 + 0.15 * 0.25 = 0.144
After many iterations, these values converge to the final PageRank!
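This arithmetic can be replayed in plain Python, independent of the builder; a quick sanity check on the four-node example graph:

# Replay one PageRank iteration on the example graph in plain Python
damping = 0.85
ranks = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 1), (4, 3)]
out_deg = {n: sum(1 for src, _ in edges if src == n) for n in ranks}

# Steps 1-2: each node sends rank / out_degree along its out-edges
neighbor_sum = {n: 0.0 for n in ranks}
for src, dst in edges:
    neighbor_sum[dst] += ranks[src] / out_deg[src]

# Step 3: damped neighbor influence plus teleport
new_ranks = {n: damping * neighbor_sum[n] + (1 - damping) / len(ranks) for n in ranks}
print(new_ranks)  # approximately {1: 0.250, 2: 0.144, 3: 0.463, 4: 0.144}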
Exercises¶
- Personalized PageRank: Start with non-uniform initial ranks
- Early stopping: Stop when ranks change by less than a threshold
  - This requires convergence detection (future feature)
  - For now, try different max_iter values
- Weighted PageRank: Use edge weights if available
- Compare with NetworkX: Verify your results match NetworkX's PageRank
Key Takeaways¶
✅ Use G.builder.iter.loop(n) for fixed iteration loops
✅ Use G.builder.var("name", value) to carry values between iterations
✅ Use G @ values to aggregate neighbor values
✅ Use mask.where(if_true, if_false) for conditional logic
✅ Use .reduce("sum") to aggregate all values to a scalar
✅ Use .normalize() to normalize values to sum to 1.0
✅ Handle edge cases (sinks via masks, zero degrees via epsilon terms)
Common Mistakes¶
❌ Forgetting to reassign loop variables
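For contrast, a sketch of the broken version (same names as the correct snippet below): new_ranks is computed but never registered, so the next iteration still sees the old ranks.

# Wrong - new_ranks is never bound with builder.var, so it is not carried forward
with G.builder.iter.loop(10):
    new_ranks = damping * neighbor_sum
    # missing: ranks = G.builder.var("ranks", new_ranks)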
✅ Correct
with G.builder.iter.loop(10):
    new_ranks = damping * neighbor_sum
    ranks = G.builder.var("ranks", new_ranks)  # Explicitly reassign
❌ Using Python's built-in operators on scalars
# Wrong - G.N returns a VarHandle, not a Python number
if G.N > 0:  # This won't work as expected!
    ...
✅ Use comparison operators that return masks
# Use mask.where() for conditional logic
non_empty = (G.N > 0)
result = non_empty.where(compute_value(), 0.0)
❌ Modifying loop-invariant variables inside the loop
✅ Compute constants outside the loop
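A sketch using names from the Complete Example: degrees depends only on the graph structure, so it is computed once before the loop and only reused inside it.

# Correct: degrees never change during PageRank, so compute them once up front
degrees = ranks.degrees()
inv_degrees = 1.0 / (degrees + 1e-9)

with G.builder.iter.loop(max_iter):
    contrib = ranks * inv_degrees  # reuse the loop-invariant value
    ...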
Next Steps¶
- Tutorial 3: Label Propagation - Learn about async updates and mode aggregation
- Tutorial 4: Custom Metrics - Build complex node/edge metrics
- API Reference: IterOps - Learn about iteration control
- API Reference: GraphOps - Learn about neighbor operations
Ready for asynchronous updates? Continue to Tutorial 3: Label Propagation!