Tutorial 2: PageRank — Iterative Algorithms with builder.iterate()¶
Express PageRank with the current builder API and let the Batch Executor handle the loop. We’ll build a compact iteration that reuses neighbor aggregation each round.
What You’ll Learn¶
- Structuring loops with builder.iterate()
- Neighbor aggregation via map_nodes(...)
- Reassigning variables across iterations
- Normalizing ranks each pass
Prerequisites¶
- Tutorial 1: Hello World
- Familiarity with PageRank basics (damping, iterations)
Step 1: Builder Setup¶
builder.iterate() marks the loop so the Batch Executor can batch iterations when the body is compatible.
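Steps 2–4 below assume a builder handle named b plus damping and max_iter values are already in scope. A minimal sketch of that scaffolding (the function name build_pagerank is illustrative; the signature mirrors the tested reference implementation further down):

def build_pagerank(b, max_iter=100, damping=0.85):
    # Steps 2-4 fill in this body using b, damping, and max_iter
    ...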
Step 2: Initialize and Precompute¶
ranks = b.init_nodes(default=1.0) # start with uniform scores
degrees = b.node_degrees(ranks) # degrees per node
inv_deg = b.core.recip(degrees, epsilon=1e-9) # safe 1/deg
Degrees don’t change during PageRank, so we compute them once.
Step 3: Iterative Update¶
with b.iterate(max_iter):
    contrib = b.core.mul(ranks, inv_deg)  # rank / degree
    neighbor_sum = b.map_nodes(
        "sum(contrib[neighbors(node)])",
        inputs={"contrib": contrib},
    )
    # Damped neighbor influence + small teleport; normalize each pass
    ranks = b.core.add(b.core.mul(neighbor_sum, damping), 1 - damping)
    ranks = b.normalize(ranks, method="sum")
- map_nodes aggregates neighbor contributions each iteration.
- Reassigning ranks inside the loop carries the value forward.
- Normalizing each round keeps the vector stable.
Step 4: Attach and Build¶
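With the loop finished, attach the final ranks under the attribute name that Step 5 reads back. This mirrors the attach_as call in the tested reference implementation below; how the finished builder becomes the pagerank_algo object used in Step 5 depends on your setup (for example, the @algorithm decorator shown in the Complete Example).

# Expose the final ranks as the "pagerank" node attribute read in Step 5
b.attach_as("pagerank", ranks)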
Step 5: Run It¶
G = gr.generators.karate_club()
result = G.view().apply(pagerank_algo)
scores = result.nodes["pagerank"]
print(scores[:5])
The returned subgraph contains the pagerank attribute on nodes.
Optional: Sink Handling¶
To redistribute sink mass, zero out contributions where degree is 0 and add a uniform sink term. A simple approach is to mask sinks in the neighbor_sum expression (e.g., is_sink ? 0 : rank/deg) and normalize each iteration. For exact parity with a reference implementation, compute the sink mass explicitly each iteration; the tested reference below does this inside the loop with reduce_scalar, but you can also compute it on the Python side and feed a scalar into the builder.
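A sketch of the masking approach, reusing degrees, inv_deg, and the Step 3 loop, plus the compare/where/neighbor_agg ops from the reference implementation below:

is_sink = b.core.compare(degrees, "eq", 0.0)  # True where out-degree is 0

with b.iterate(max_iter):
    contrib = b.core.where(is_sink, 0.0, b.core.mul(ranks, inv_deg))  # sinks contribute 0
    neighbor_sum = b.core.neighbor_agg(contrib, agg="sum")
    ranks = b.core.add(b.core.mul(neighbor_sum, damping), 1 - damping)
    ranks = b.normalize(ranks, method="sum")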
Tested Reference Implementation (matches builder tests)¶
def build_pagerank_builder(builder, max_iter=100, damping=0.85):
    """
    Mirrors the native Rust PageRank (see tests/test_builder_pagerank.py).
    - Precomputes out-degrees and sink mask once
    - Applies damping, teleport, and sink redistribution each iteration
    """
    node_count = builder.graph_node_count()
    inv_n = builder.core.recip(node_count, epsilon=1e-9)

    ranks = builder.init_nodes(default=1.0)
    ranks = builder.var("ranks", builder.core.broadcast_scalar(inv_n, ranks))

    degrees = builder.node_degrees(ranks)
    inv_degrees = builder.core.recip(degrees, epsilon=1e-9)
    sink_mask = builder.core.compare(degrees, "eq", 0.0)

    inv_n_map = builder.core.broadcast_scalar(inv_n, degrees)
    teleport = builder.core.mul(inv_n_map, 1.0 - damping)

    with builder.iterate(max_iter):
        contrib = builder.core.mul(ranks, inv_degrees)
        contrib = builder.core.where(sink_mask, 0.0, contrib)
        neighbor_sum = builder.core.neighbor_agg(contrib, agg="sum")

        damped = builder.core.mul(neighbor_sum, damping)

        sink_ranks = builder.core.where(sink_mask, ranks, 0.0)
        sink_mass = builder.core.reduce_scalar(sink_ranks, op="sum")
        sink_share = builder.core.mul(inv_n_map, sink_mass)
        sink_share = builder.core.mul(sink_share, damping)

        next_ranks = builder.core.add(damped, teleport)
        next_ranks = builder.core.add(next_ranks, sink_share)
        ranks = builder.var("ranks", next_ranks)

    builder.attach_as("pagerank", ranks)
    return builder
Performance Note¶
This tutorial uses builder.iterate(), which enables the Batch Executor when the loop body is compatible. Iterative runs (PageRank/LPA) typically see 10–100x speedups compared to step-by-step execution.
Recap¶
- builder.iterate() enables batched execution for iterative algorithms.
- Use map_nodes("sum(...[neighbors(node)])", inputs=...) for neighbor aggregation.
- Normalize inside the loop to keep ranks bounded.
Next: Tutorial 3: Label Propagation for async updates with map_nodes(async_update=True).
Reduction¶
.reduce("sum") sums all values into a single scalar.
Complete Example¶
from groggy.builder import algorithm
import groggy as gg
@algorithm("pagerank")
def pagerank(G, damping=0.85, max_iter=100):
    """
    Compute PageRank scores for all nodes.

    Args:
        G: Graph handle (injected by decorator)
        damping: Damping factor (default: 0.85)
        max_iter: Maximum iterations (default: 100)

    Returns:
        Normalized PageRank scores
    """
    # Initialize ranks uniformly
    ranks = G.nodes(1.0 / G.N)

    # Precompute degrees and identify sinks
    degrees = ranks.degrees()
    inv_degrees = 1.0 / (degrees + 1e-9)
    is_sink = (degrees == 0.0)

    # Iterate until convergence (or max_iter)
    with G.builder.iter.loop(max_iter):
        # Compute contributions (sinks contribute 0)
        contrib = is_sink.where(0.0, ranks * inv_degrees)

        # Aggregate neighbor contributions
        neighbor_sum = G @ contrib

        # Redistribute sink mass uniformly
        sink_mass = is_sink.where(ranks, 0.0).reduce("sum")

        # PageRank formula: damped neighbors + teleport + sink redistribution
        new_ranks = (
            damping * neighbor_sum +
            (1 - damping) / G.N +
            damping * sink_mass / G.N
        )

        # Update ranks for next iteration
        ranks = G.builder.var("ranks", new_ranks)

    # Normalize to sum to 1.0
    return ranks.normalize()
# Example usage
if __name__ == "__main__":
    # Create a simple web graph
    G = gg.Graph(directed=True)
    G.add_edges_from([
        (1, 2), (1, 3),  # Page 1 links to 2 and 3
        (2, 3), (2, 4),  # Page 2 links to 3 and 4
        (3, 1),          # Page 3 links back to 1
        (4, 3),          # Page 4 links to 3
    ])

    # Compute PageRank
    pr_algo = pagerank(damping=0.85, max_iter=50)
    result = G.all().apply(pr_algo)

    # Print results
    ranks = result.nodes()["pagerank"]
    for node, rank in sorted(ranks.items(), key=lambda x: -x[1]):
        print(f"Node {node}: {rank:.4f}")
Expected output: the nodes print in descending PageRank order, with Node 3 ranked highest because it's linked to by multiple pages (1, 2, and 4).
Understanding the Flow¶
Let's trace what happens in one iteration:
Before iteration:
Node 1: rank = 0.25, degree = 2
Node 2: rank = 0.25, degree = 2
Node 3: rank = 0.25, degree = 1
Node 4: rank = 0.25, degree = 1
Step 1: Compute contributions
Node 1 contributes: 0.25 / 2 = 0.125 to each neighbor (2, 3)
Node 2 contributes: 0.25 / 2 = 0.125 to each neighbor (3, 4)
Node 3 contributes: 0.25 / 1 = 0.250 to neighbor (1)
Node 4 contributes: 0.25 / 1 = 0.250 to neighbor (3)
Step 2: Aggregate neighbors
Node 1 receives: 0.250 from node 3
Node 2 receives: 0.125 from node 1
Node 3 receives: 0.125 + 0.125 + 0.250 = 0.500 from nodes 1, 2, 4
Node 4 receives: 0.125 from node 2
Step 3: Apply damping
Node 1: 0.85 * 0.250 + 0.15 * 0.25 = 0.250
Node 2: 0.85 * 0.125 + 0.15 * 0.25 = 0.144
Node 3: 0.85 * 0.500 + 0.15 * 0.25 = 0.463
Node 4: 0.85 * 0.125 + 0.15 * 0.25 = 0.144
After many iterations, these values converge to the final PageRank!
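This arithmetic can be replayed in plain Python, independent of the builder; a quick sanity check on the four-node example graph:

# Replay one PageRank iteration on the example graph in plain Python
damping = 0.85
ranks = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 1), (4, 3)]
out_deg = {n: sum(1 for src, _ in edges if src == n) for n in ranks}

# Steps 1-2: each node sends rank / out_degree along its out-edges
neighbor_sum = {n: 0.0 for n in ranks}
for src, dst in edges:
    neighbor_sum[dst] += ranks[src] / out_deg[src]

# Step 3: damped neighbor influence plus teleport
new_ranks = {n: damping * neighbor_sum[n] + (1 - damping) / len(ranks) for n in ranks}
print(new_ranks)  # approximately {1: 0.250, 2: 0.144, 3: 0.463, 4: 0.144}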
Exercises¶
- Personalized PageRank: Start with non-uniform initial ranks
- Early stopping: Stop when ranks change by less than a threshold
  - This requires convergence detection (future feature)
  - For now, try different max_iter values
- Weighted PageRank: Use edge weights if available
- Compare with NetworkX: Verify your results match NetworkX's PageRank
Key Takeaways¶
✅ Use G.builder.iter.loop(n) for fixed iteration loops
✅ Use G.builder.var("name", value) to carry values between iterations
✅ Use G @ values to aggregate neighbor values
✅ Use mask.where(if_true, if_false) for conditional logic
✅ Use .reduce("sum") to aggregate all values to a scalar
✅ Use .normalize() to normalize values to sum to 1.0
✅ Handle edge cases (sinks via masks, zero degrees via epsilon terms)
Common Mistakes¶
❌ Forgetting to reassign loop variables
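For contrast, a sketch of the broken version (same names as the correct snippet below): new_ranks is computed but never registered, so the next iteration still sees the old ranks.

# Wrong - new_ranks is never bound with builder.var, so it is not carried forward
with G.builder.iter.loop(10):
    new_ranks = damping * neighbor_sum
    # missing: ranks = G.builder.var("ranks", new_ranks)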
✅ Correct
with G.builder.iter.loop(10):
    new_ranks = damping * neighbor_sum
    ranks = G.builder.var("ranks", new_ranks)  # Explicitly reassign
❌ Using Python's built-in operators on scalars
# Wrong - G.N returns a VarHandle, not a Python number
if G.N > 0:  # This won't work as expected!
    ...
✅ Use comparison operators that return masks
# Use mask.where() for conditional logic
non_empty = (G.N > 0)
result = non_empty.where(compute_value(), 0.0)
❌ Modifying loop-invariant variables inside the loop
✅ Compute constants outside the loop
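A sketch using names from the Complete Example: degrees depends only on the graph structure, so it is computed once before the loop and only reused inside it.

# Correct: degrees never change during PageRank, so compute them once up front
degrees = ranks.degrees()
inv_degrees = 1.0 / (degrees + 1e-9)

with G.builder.iter.loop(max_iter):
    contrib = ranks * inv_degrees  # reuse the loop-invariant value
    ...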
Next Steps¶
- Tutorial 3: Label Propagation - Learn about async updates and mode aggregation
- Tutorial 4: Custom Metrics - Build complex node/edge metrics
- API Reference: IterOps - Learn about iteration control
- API Reference: GraphOps - Learn about neighbor operations
Ready for asynchronous updates? Continue to Tutorial 3: Label Propagation!