Working with Subgraphs¶

A Subgraph is an immutable view into a portion of a Graph. It doesn't copy data; it only tracks which nodes and edges belong to the view.

What is a Subgraph?¶

Think of a Subgraph as a window into your graph:

import groggy as gr

# Full graph
g = gr.generators.karate_club()
print(f"Full graph: {g.node_count()} nodes")

# Subgraph - just a view
sub = g.nodes[:10]  # First 10 nodes
print(f"Subgraph: {sub.node_count()} nodes")
# No data was copied!

Key characteristics:

- View, not copy: References parent graph, no data duplication
- Immutable: Cannot modify the subgraph view directly
- Cheap to create: O(1) to create, stores only node/edge IDs
- Transformable: Can convert to Graph, Table, Array, or Matrix
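To make the "view, not copy" idea concrete, here is a minimal sketch in plain Python. It does not use groggy and is not how groggy implements Subgraph; `GraphSketch` and `ViewSketch` are hypothetical names, and the assumption that a view reflects later changes made through the parent is part of the illustration.

```python
class GraphSketch:
    """Hypothetical stand-in for a graph: node ID -> attribute dict."""

    def __init__(self):
        self.attrs = {}

    def add_node(self, **attrs):
        node_id = len(self.attrs)
        self.attrs[node_id] = attrs
        return node_id


class ViewSketch:
    """Hypothetical view: keeps a parent reference and a set of node IDs."""

    def __init__(self, parent, node_ids):
        self.parent = parent
        self.node_ids = frozenset(node_ids)  # immutable ID set, no attribute data

    def node_count(self):
        return len(self.node_ids)

    def attr(self, node_id, key):
        if node_id not in self.node_ids:
            raise KeyError(node_id)
        # Attribute lookups are delegated to the parent; nothing is stored here.
        return self.parent.attrs[node_id][key]


g = GraphSketch()
alice = g.add_node(name="Alice", age=29)
bob = g.add_node(name="Bob", age=55)

view = ViewSketch(g, [alice])
g.attrs[alice]["age"] = 30      # change made through the parent
print(view.node_count())        # 1
print(view.attr(alice, "age"))  # 30
```

The view holds only IDs and a reference, so creating it does not duplicate any attribute values.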

Creating Subgraphs¶

Via Slicing¶

Use Python slice notation on nodes or edges:

g = gr.generators.karate_club()

# First 10 nodes
first_ten = g.nodes[:10]

# Nodes 5 through 14 (the slice end is exclusive, as in Python lists)
middle = g.nodes[5:15]

# Every other node
every_other = g.nodes[::2]

# First 5 edges
first_edges = g.edges[:5]
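Assuming node slices follow standard Python slice semantics (start inclusive, end exclusive), the selected positions can be checked on an ordinary list. This sketch uses a plain list of hypothetical node IDs rather than groggy itself; the count of 34 matches the size of the karate club graph.

```python
# Hypothetical ordered node IDs for a 34-node graph.
node_ids = list(range(34))

first_ten = node_ids[:10]    # positions 0..9
middle = node_ids[5:15]      # positions 5..14, end index excluded
every_other = node_ids[::2]  # positions 0, 2, 4, ...

print(first_ten[-1])          # 9
print(middle[0], middle[-1])  # 5 14
print(len(middle))            # 10
print(len(every_other))       # 17
```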

Via Filtering¶

Filter by attribute conditions:

g = gr.Graph()
alice = g.add_node(name="Alice", age=29, role="Engineer")
bob = g.add_node(name="Bob", age=55, role="Manager")
carol = g.add_node(name="Carol", age=31, role="Engineer")

# Engineers only
engineers = g.nodes[g.nodes["role"] == "Engineer"]
print(f"Engineers: {engineers.node_count()}")  # 2

# People over 30
older = g.nodes[g.nodes["age"] > 30]
print(f"Over 30: {older.node_count()}")  # 2

Boolean operations:

# Combine conditions with & (and) or | (or)
young_engineers = g.nodes[
    (g.nodes["role"] == "Engineer") & (g.nodes["age"] < 30)
]

# Note: Use & and | for boolean arrays, not 'and'/'or'
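The reason `&` and `|` are needed is that Python lets a class overload them element-wise, while `and`/`or` cannot be overloaded. The sketch below shows this with a hypothetical `MaskSketch` class; it stands in for a boolean array and is not groggy's array type.

```python
class MaskSketch:
    """Hypothetical element-wise boolean mask (not groggy's array type)."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        return MaskSketch(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):
        return MaskSketch(a or b for a, b in zip(self.values, other.values))


roles = ["Engineer", "Manager", "Engineer"]
ages = [29, 55, 31]

is_engineer = MaskSketch(r == "Engineer" for r in roles)
under_30 = MaskSketch(a < 30 for a in ages)

both = is_engineer & under_30
either = is_engineer | under_30
print(both.values)    # [True, False, False]
print(either.values)  # [True, False, True]

# `and` only tests the truthiness of the left operand and then returns
# one of the two operands unchanged, so no element-wise combination happens.
wrong = is_engineer and under_30
print(wrong is under_30)  # True
```

Some array libraries raise an error when an array is used with `and`/`or`; others silently return one operand as in the last line, which is why the operator form is the safe habit.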

Via Specific IDs¶

Select nodes or edges by their IDs:

# Specific node IDs
sub = g.nodes[[0, 5, 10, 15]]

# Specific edge IDs
edge_sub = g.edges[[0, 1, 2]]

Via Explicit Subgraph Method¶

Use the subgraph() method for full control:

# Specify both nodes and edges
sub = g.subgraph(
    nodes=[0, 1, 2, 3],
    edges=[0, 1]  # Only include specific edges
)

Induced subgraphs (default):

By default, selecting nodes automatically includes all edges between those nodes:

# Selects nodes 0, 1, 2 and ALL edges between them
sub = g.nodes[[0, 1, 2]]

# This is an "induced" subgraph
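An induced subgraph keeps exactly those edges whose two endpoints are both in the selected node set. A minimal sketch of that rule in plain Python follows; the `induced_edges` helper and the edge-ID-to-endpoints mapping are hypothetical and only illustrate the definition.

```python
def induced_edges(edges, node_ids):
    """Return IDs of edges whose endpoints are both in node_ids.

    `edges` maps a hypothetical edge ID to a (source, target) pair.
    """
    selected = set(node_ids)
    return [eid for eid, (u, v) in edges.items() if u in selected and v in selected]


edges = {0: (0, 1), 1: (1, 2), 2: (2, 3), 3: (0, 2)}

# Edge 2 touches node 3, which is not selected, so it is left out.
print(induced_edges(edges, [0, 1, 2]))  # [0, 1, 3]
```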

Working with Subgraphs¶

Inspecting Subgraphs¶

Get basic information:

sub = g.nodes[:100]

# Counts
print(f"Nodes: {sub.node_count()}")
print(f"Edges: {sub.edge_count()}")

# IDs
node_ids = sub.node_ids()  # NumArray
edge_ids = sub.edge_ids()  # NumArray

# Check if empty
if sub.is_empty():
    print("No nodes in subgraph")

# Check connectivity
if sub.is_connected():
    print("Subgraph is connected")

Accessing Attributes¶

Access attributes just like on a Graph:

# Via attribute name
names = sub["name"]  # BaseArray
ages = sub["age"]    # NumArray (if numeric)

# Via accessors
names = sub.nodes["name"]
weights = sub.edges["weight"]

# Statistical operations on numeric attributes
mean_age = sub.nodes["age"].mean()
max_weight = sub.edges["weight"].max()

Graph Properties¶

Subgraphs support many graph analysis methods:

# Degree
degrees = sub.degree()      # NumArray
in_deg = sub.in_degree()
out_deg = sub.out_degree()

# Density
density = sub.density()
print(f"Edge density: {density:.3f}")

# Adjacency
adj_matrix = sub.adjacency_matrix()  # GraphMatrix
adj_list = sub.adjacency_list()      # dict
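For reference, edge density is usually defined as the number of edges divided by the number of possible edges between distinct nodes. The sketch below computes that textbook definition; which convention groggy's `density()` applies (directed or undirected, self-loops or not) is not stated here, so treat the `directed` flag as an assumption.

```python
def density(node_count, edge_count, directed=False):
    """Textbook edge density for a simple graph without self-loops."""
    if node_count < 2:
        return 0.0
    possible = node_count * (node_count - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2  # unordered pairs
    return edge_count / possible


print(density(4, 3))                 # 0.5
print(density(4, 3, directed=True))  # 0.25
```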

Transforming Subgraphs¶

Subgraph → Graph¶

Materialize the view as an independent graph:

sub = g.nodes[:100]

# Convert to full Graph (copies data)
new_graph = sub.to_graph()

# Now you can modify it
new_graph.add_node(name="NewPerson")
new_graph.commit("Added new person")

When to materialize:

- Need to modify the subset
- Want to persist it independently
- Need version control on the subset

Subgraph → Table¶

Convert to tabular representation:

sub = g.nodes[g.nodes["active"] == True]

# Get as table
table = sub.table()  # GraphTable

# Access node/edge tables
nodes_df = table.nodes.to_pandas()
edges_df = table.edges.to_pandas()

# Or directly
nodes_table = sub.nodes.table()  # NodesTable

When to use tables:

- Exporting to CSV/Parquet
- Analysis with pandas
- Tabular aggregations

Subgraph → Matrix¶

Get matrix representations:

# Adjacency matrix
A = sub.adjacency_matrix()
# or
A = sub.to_matrix()

# Adjacency list
adj = sub.adjacency_list()
print(adj[0])  # Neighbors of node 0

Subgraph → Arrays¶

Extract specific data as arrays:

# Node/edge IDs
node_ids = sub.node_ids()  # NumArray
edge_ids = sub.edge_ids()  # NumArray

# Attributes
ages = sub.nodes["age"]    # NumArray
names = sub.nodes["name"]  # BaseArray

# Accessors
nodes_accessor = sub.nodes  # NodesAccessor
edges_accessor = sub.edges  # EdgesAccessor

Running Algorithms on Subgraphs¶

Subgraphs support graph algorithms:

Connected Components¶

# Find components within the subgraph
components = sub.connected_components()  # SubgraphArray

# Check if connected
if sub.is_connected():
    print("Subgraph is a single component")
else:
    print(f"Found {len(components)} components")
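Components found within a subgraph can differ from components of the parent, because paths through nodes outside the view do not count. The following plain-Python sketch illustrates this with a breadth-first search restricted to the selected nodes; `components_within` and the adjacency dict are hypothetical and do not reflect groggy's internals.

```python
from collections import deque


def components_within(adjacency, node_ids):
    """Connected components using only nodes in node_ids (undirected adjacency)."""
    allowed = set(node_ids)
    seen = set()
    components = []
    for start in sorted(allowed):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nbr in adjacency.get(node, ()):
                # Skip neighbors that lie outside the view.
                if nbr in allowed and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(sorted(component))
    return components


# Path 0-1-2-3 plus a separate edge 4-5.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}

# Node 2 is excluded, so node 3 is cut off from 0 and 1 inside the view.
print(components_within(adjacency, [0, 1, 3, 4, 5]))  # [[0, 1], [3], [4, 5]]
```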

Sampling¶

# Random sample of nodes
sample = sub.sample(n=50)  # Returns Subgraph

# Sample gives you another subgraph view
print(f"Sampled {sample.node_count()} nodes")

Neighborhood Expansion¶

# Expand to k-hop neighborhood
expanded = sub.neighborhood(depth=2)  # SubgraphArray

# This returns an array of neighborhoods around each node
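A k-hop neighborhood is the set of nodes reachable from a seed within k edges. The sketch below computes one neighborhood per seed with a depth-limited breadth-first search. It is an illustration in plain Python: the `k_hop` helper is hypothetical, and whether groggy includes the seed itself or expands through the parent graph is an assumption here.

```python
from collections import deque


def k_hop(adjacency, seed, depth):
    """Return the set of nodes within `depth` hops of `seed`, seed included."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # do not expand beyond the hop limit
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


# Path graph 0-1-2-3-4.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# One neighborhood per seed node, mirroring an array of neighborhoods.
neighborhoods = [sorted(k_hop(adjacency, seed, 2)) for seed in [0, 4]]
print(neighborhoods)  # [[0, 1, 2], [2, 3, 4]]
```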

Common Patterns¶

Pattern 1: Filter → Analyze → Export¶

# Filter to subset
active_users = g.nodes[g.nodes["active"] == True]

# Analyze
mean_age = active_users["age"].mean()
num_users = active_users.node_count()
print(f"Active users: {num_users}, mean age: {mean_age:.1f}")

# Export
active_users.table().to_csv("active_users.csv")

Pattern 2: Slice → Check → Expand¶

# Start with specific nodes
seed_nodes = g.nodes[[0, 1, 2]]

# Check properties
if seed_nodes.is_connected():
    print("Seeds are connected")

# Expand to neighborhood
expanded = seed_nodes.neighborhood(depth=2)

Pattern 3: Chain Multiple Filters¶

# Progressive filtering
result = (
    g.nodes[g.nodes["active"] == True]   # Filter 1
     .nodes[g.nodes["age"] > 25]         # Filter 2 (Note: re-filter on g.nodes)
     .nodes[g.nodes["role"] == "Engineer"]  # Filter 3
)

# Alternative: combine conditions
result = g.nodes[
    (g.nodes["active"] == True) &
    (g.nodes["age"] > 25) &
    (g.nodes["role"] == "Engineer")
]

Pattern 4: Compare Subgraphs¶

# Create two views
engineers = g.nodes[g.nodes["role"] == "Engineer"]
senior = g.nodes[g.nodes["age"] > 40]

# Compare sizes
print(f"Engineers: {engineers.node_count()}")
print(f"Senior: {senior.node_count()}")

# Get stats
print(f"Engineers mean age: {engineers['age'].mean():.1f}")
print(f"Senior mean age: {senior['age'].mean():.1f}")

Pattern 5: Temporary Working Set¶

# Create temporary view for analysis
temp = g.nodes[:1000]  # First 1000 nodes

# Do expensive computation on subset
components = temp.connected_components()
density = temp.density()

# Discard when done (temp is just a view)
# No cleanup needed!

Performance Considerations¶

Time Complexity¶

| Operation | Complexity | Notes |
|---|---|---|
| Create subgraph | O(1) | Just stores node/edge IDs |
| `node_count()` | O(1) | BitSet count |
| `contains_node()` | O(1) | BitSet lookup |
| `to_graph()` | O(V + E + A) | Full copy, expensive |
| `table()` | O(1) | Creates view |
| Access attributes | O(1) per access | Via parent graph |

Memory¶

Subgraph storage: ~40 bytes + 2 BitSets
BitSet overhead: ~(num_entities / 8) bytes
No attribute duplication: Attributes stay in parent GraphPool
Materialization: to_graph() creates full copy
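As a rough worked example of these figures, the sketch below estimates a view's footprint from the numbers quoted above: about 40 bytes of fixed overhead plus one bit per entity in each of two BitSets. The helper names are hypothetical, and sizing each BitSet by the entity count passed in is an assumption.

```python
import math


def bitset_bytes(num_entities):
    """One bit per entity, rounded up to whole bytes."""
    return math.ceil(num_entities / 8)


def view_bytes(num_nodes, num_edges, fixed_overhead=40):
    """Approximate view size: fixed overhead plus node and edge BitSets."""
    return fixed_overhead + bitset_bytes(num_nodes) + bitset_bytes(num_edges)


print(bitset_bytes(1_000_000))           # 125000
print(view_bytes(1_000_000, 5_000_000))  # 750040
```

Under these assumptions, a view over a graph with a million nodes and five million edges costs well under a megabyte, regardless of how many attributes the nodes carry.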

Optimization Tips¶

1. Delay materialization:

# ✓ Stay as view
sub = g.nodes[:1000]
result = sub.table().agg({"age": "mean"})

# ✗ Unnecessary copy
graph = sub.to_graph()
result = graph.table().agg({"age": "mean"})

2. Chain filters efficiently:

# Works - but each step creates an intermediate view
result = g.nodes[cond1].nodes[cond2].nodes[cond3]

# Better - combine conditions
result = g.nodes[cond1 & cond2 & cond3]

3. Bulk operations over loops:

# ✓ Fast
ages = sub.nodes["age"]
mean_age = ages.mean()

# ✗ Slow
total = sum(sub.nodes[n]["age"] for n in sub.node_ids())

Subgraph Limitations¶

Cannot Modify Directly¶

Subgraphs are immutable views:

sub = g.nodes[:10]

# ✗ Cannot modify subgraph
# sub.add_node()  # No such method

# ✓ Materialize first, then modify
new_graph = sub.to_graph()
new_graph.add_node(name="NewPerson")

Must Access Parent Graph for Modifications¶

To modify nodes in a subgraph, work through the parent:

sub = g.nodes[g.nodes["active"] == True]

# Get IDs from subgraph
node_ids = sub.node_ids()

# Modify via parent graph
for nid in node_ids:
    g.nodes.set_attrs({nid: {"processed": True}})

# Or use bulk operations
g.nodes.set_attrs({nid: {"processed": True} for nid in node_ids})

Parent Graph Must Stay Alive¶

The parent graph must not be deleted while a subgraph that references it still exists:

def create_subgraph():
    g = gr.Graph()
    g.add_node(name="Alice")
    return g.nodes[:1]  # Returns subgraph

# ✗ Dangerous - g goes out of scope
# sub = create_subgraph()  # Parent graph deleted!

# ✓ Keep parent alive
g = gr.Graph()
g.add_node(name="Alice")
sub = g.nodes[:1]  # Safe - g still in scope
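One way this kind of hazard can arise is when a view holds only a weak reference to its parent. The sketch below shows that failure mode in plain Python with the standard `weakref` module. `ParentSketch` and `WeakViewSketch` are hypothetical, and whether groggy's Subgraph references its parent this way is an assumption, not something this page confirms.

```python
import gc
import weakref


class ParentSketch:
    """Hypothetical parent graph holding node names."""

    def __init__(self):
        self.names = {0: "Alice"}


class WeakViewSketch:
    """Hypothetical view that does NOT keep its parent alive."""

    def __init__(self, parent, node_ids):
        self._parent = weakref.ref(parent)  # weak: does not extend the parent's lifetime
        self.node_ids = list(node_ids)

    def parent_alive(self):
        return self._parent() is not None


def create_view():
    g = ParentSketch()
    return WeakViewSketch(g, [0])  # g becomes unreachable on return


dangling = create_view()
gc.collect()  # make collection explicit on interpreters without reference counting
print(dangling.parent_alive())  # False

g = ParentSketch()
safe = WeakViewSketch(g, [0])
print(safe.parent_alive())  # True
```

Keeping the parent bound to a name for as long as the view is in use avoids the problem in this model.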

When to Use Subgraphs¶

Use Subgraphs when:

- ✅ Filtering by conditions
- ✅ Working with a portion of a large graph
- ✅ Temporary analysis without modifications
- ✅ Building delegation chains
- ✅ Memory efficiency matters

Use Graph when:

- ❌ Need to modify structure
- ❌ Need version control
- ❌ Persisting for later use
- ❌ Independent lifecycle required
