Working with Subgraphs
A Subgraph is an immutable view into a portion of a Graph. It doesn't copy data; it only tracks which nodes and edges belong to the view.
What is a Subgraph?
Think of a Subgraph as a window into your graph:
import groggy as gr
# Full graph
g = gr.generators.karate_club()
print(f"Full graph: {g.node_count()} nodes")
# Subgraph - just a view
sub = g.nodes[:10] # First 10 nodes
print(f"Subgraph: {sub.node_count()} nodes")
# No data was copied!
Key characteristics:
- View, not copy: references the parent graph, so no data is duplicated
- Immutable: the subgraph view cannot be modified directly
- Cheap to create: O(1) to create; stores only node/edge IDs
- Transformable: can be converted to a Graph, Table, Array, or Matrix
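To make "view, not copy" concrete, here is a plain-Python toy model. It is not groggy's implementation: `ToyGraph` and `ToyView` are hypothetical names, and the view simply holds a reference to its parent plus a frozen set of node IDs, looking attribute values up in the parent whenever they are requested.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph: one attribute dict per node."""

    def __init__(self):
        self.node_attrs = {}  # node ID -> {attribute name: value}

    def add_node(self, **attrs):
        nid = len(self.node_attrs)  # IDs are assigned 0, 1, 2, ...
        self.node_attrs[nid] = attrs
        return nid


class ToyView:
    """Hypothetical view: a reference to the parent plus a frozen set of node IDs."""

    def __init__(self, parent, node_ids):
        self._parent = parent            # shared, not copied
        self._ids = frozenset(node_ids)  # immutable: the view cannot change

    def node_count(self):
        return len(self._ids)

    def __getitem__(self, attr):
        # Values are looked up in the parent at access time
        return [self._parent.node_attrs[n][attr] for n in sorted(self._ids)]


g = ToyGraph()
for name in ["Alice", "Bob", "Carol"]:
    g.add_node(name=name)

view = ToyView(g, [0, 2])
print(view.node_count(), view["name"])  # 2 ['Alice', 'Carol']
```

The only per-view state is the ID set; every attribute value still lives in one place, the parent.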
Creating Subgraphs
Via Slicing
Use Python slice notation on nodes or edges:
g = gr.generators.karate_club()
# First 10 nodes
first_ten = g.nodes[:10]
# Nodes 5 through 15
middle = g.nodes[5:15]
# Every other node
every_other = g.nodes[::2]
# First 5 edges
first_edges = g.edges[:5]
Via Filtering
Filter by attribute conditions:
g = gr.Graph()
alice = g.add_node(name="Alice", age=29, role="Engineer")
bob = g.add_node(name="Bob", age=55, role="Manager")
carol = g.add_node(name="Carol", age=31, role="Engineer")
# Engineers only
engineers = g.nodes[g.nodes["role"] == "Engineer"]
print(f"Engineers: {engineers.node_count()}") # 2
# People over 30
older = g.nodes[g.nodes["age"] > 30]
print(f"Over 30: {older.node_count()}") # 2
Boolean operations:
# Combine conditions with & (and) or | (or)
young_engineers = g.nodes[
(g.nodes["role"] == "Engineer") & (g.nodes["age"] < 30)
]
# Note: Use & and | for boolean arrays, not 'and'/'or'
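Why `&` rather than `and`? Python's `and`/`or` call `bool()` on their operands and cannot be overloaded, while `&` and `|` dispatch to `__and__`/`__or__`, which a mask type can implement element-wise. The hypothetical `Mask` class below (not groggy's type) mimics how NumPy and pandas boolean arrays behave, including raising an error when asked for a single truth value.

```python
class Mask:
    """Hypothetical element-wise boolean mask, in the style of NumPy/pandas."""

    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def __and__(self, other):
        return Mask(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):
        return Mask(a or b for a, b in zip(self.values, other.values))

    def __bool__(self):
        # Refuse to collapse many booleans into one, as NumPy does
        raise ValueError("truth value of a mask is ambiguous; use & or |")


roles = ["Engineer", "Manager", "Engineer"]
ages = [29, 55, 31]

is_engineer = Mask(r == "Engineer" for r in roles)
is_young = Mask(a < 30 for a in ages)

print((is_engineer & is_young).values)  # [True, False, False]
print((is_engineer | is_young).values)  # [True, False, True]

try:
    is_engineer and is_young  # `and` calls bool() on the left operand
except ValueError as exc:
    print("error:", exc)
```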
Via Specific IDs
Select nodes (or edges) by passing a list of their IDs:
# Nodes 0, 1 and 2, plus the edges among them
specific = g.nodes[[0, 1, 2]]
Via Explicit Subgraph Method
Use the subgraph() method for full control:
# Specify both nodes and edges
sub = g.subgraph(
nodes=[0, 1, 2, 3],
edges=[0, 1] # Only include specific edges
)
Induced subgraphs (default):
By default, selecting nodes automatically includes all edges between those nodes:
# Selects nodes 0, 1, 2 and ALL edges between them
sub = g.nodes[[0, 1, 2]]
# This is an "induced" subgraph
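An induced subgraph keeps exactly those edges whose two endpoints are both selected. A minimal plain-Python sketch of that rule, assuming edges are stored as a dict from edge ID to `(source, target)` (a hypothetical layout, not groggy's storage):

```python
def induced_edges(edges, node_ids):
    """Return the IDs of edges whose endpoints both lie in node_ids."""
    selected = set(node_ids)
    return sorted(
        eid for eid, (u, v) in edges.items() if u in selected and v in selected
    )


edges = {0: (0, 1), 1: (1, 2), 2: (2, 3), 3: (0, 2)}
print(induced_edges(edges, [0, 1, 2]))  # [0, 1, 3]
```

Edge 2 is dropped because node 3 is not selected. By contrast, the explicit `g.subgraph(nodes=..., edges=...)` form shown earlier lets you keep only a chosen subset of such edges.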
Working with Subgraphs
Inspecting Subgraphs
Get basic information:
sub = g.nodes[:100]
# Counts
print(f"Nodes: {sub.node_count()}")
print(f"Edges: {sub.edge_count()}")
# IDs
node_ids = sub.node_ids() # NumArray
edge_ids = sub.edge_ids() # NumArray
# Check if empty
if sub.is_empty():
print("No nodes in subgraph")
# Check connectivity
if sub.is_connected():
print("Subgraph is connected")
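`is_connected()` asks whether every node in the view can reach every other node using only the view's edges. A minimal sketch of that check as a breadth-first search over an adjacency dict (an assumed input format; groggy's own algorithm may differ). Reporting an empty view as not connected is a choice made for this sketch, not documented groggy behavior.

```python
from collections import deque


def is_connected(adjacency):
    """BFS from an arbitrary node; connected if every node is reached.

    adjacency maps each node ID to the IDs of its neighbors within the view.
    """
    if not adjacency:
        return False  # sketch's convention for an empty view
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(adjacency)


path = {0: [1], 1: [0, 2], 2: [1]}
split = {0: [1], 1: [0], 2: []}
print(is_connected(path), is_connected(split))  # True False
```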
Accessing Attributes
Access attributes just like on a Graph:
# Via attribute name
names = sub["name"] # BaseArray
ages = sub["age"] # NumArray (if numeric)
# Via accessors
names = sub.nodes["name"]
weights = sub.edges["weight"]
# Statistical operations on numeric attributes
mean_age = sub.nodes["age"].mean()
max_weight = sub.edges["weight"].max()
Graph Properties
Subgraphs support many graph analysis methods:
# Degree
degrees = sub.degree() # NumArray
in_deg = sub.in_degree()
out_deg = sub.out_degree()
# Density
density = sub.density()
print(f"Edge density: {density:.3f}")
# Adjacency
adj_matrix = sub.adjacency_matrix() # GraphMatrix
adj_list = sub.adjacency_list() # dict
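For an undirected simple graph, density is the fraction of possible edges that are present: 2E / (V(V - 1)). The sketch below computes degrees and density for a subgraph from its node IDs and induced edge list. It assumes an undirected graph without self-loops; for directed graphs the usual formula is E / (V(V - 1)), and which convention `density()` applies is not stated here.

```python
def degrees(node_ids, edges):
    """Count, for each node, the edges in the view that touch it (undirected)."""
    deg = {n: 0 for n in node_ids}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def density(node_ids, edges):
    """Undirected density: edges present divided by V * (V - 1) / 2 possible."""
    v = len(node_ids)
    if v < 2:
        return 0.0
    return 2 * len(edges) / (v * (v - 1))


nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 0)]
print(degrees(nodes, edges))           # {0: 2, 1: 2, 2: 2, 3: 0}
print(f"{density(nodes, edges):.3f}")  # 0.500
```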
Transforming Subgraphs
Subgraph → Graph
Materialize the view as an independent graph:
sub = g.nodes[:100]
# Convert to full Graph (copies data)
new_graph = sub.to_graph()
# Now you can modify it
new_graph.add_node(name="NewPerson")
new_graph.commit("Added new person")
When to materialize:
- You need to modify the subset
- You want to persist it independently
- You need version control on the subset
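The difference between a view and a materialized copy can be seen with plain dictionaries: a view reads through to shared data, while a materialized copy owns its data, so later changes on either side do not leak to the other. This is an analogy built on `copy.deepcopy`, not a description of how `to_graph()` is implemented.

```python
import copy

# Parent attribute store: node ID -> attributes
parent = {0: {"name": "Alice"}, 1: {"name": "Bob"}, 2: {"name": "Carol"}}
selected = [0, 1]


def view_names():
    """'View' behavior: read the parent's current values on demand."""
    return [parent[n]["name"] for n in selected]


# 'Materialized' behavior: an independent deep copy of the selected nodes
materialized = copy.deepcopy({n: parent[n] for n in selected})

materialized[3] = {"name": "NewPerson"}  # add a node to the copy only
parent[0]["name"] = "Alicia"             # edit the parent after copying

print(view_names())             # ['Alicia', 'Bob']
print(materialized[0]["name"])  # Alice
print(3 in parent)              # False
```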
Subgraph → Table
Convert to tabular representation:
sub = g.nodes[g.nodes["active"] == True]
# Get as table
table = sub.table() # GraphTable
# Access node/edge tables
nodes_df = table.nodes.to_pandas()
edges_df = table.edges.to_pandas()
# Or directly
nodes_table = sub.nodes.table() # NodesTable
When to use tables:
- Exporting to CSV/Parquet
- Analysis with pandas
- Tabular aggregations
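A graph table is essentially two flat tables: one row per node and one row per edge, with one column per attribute. As a rough stdlib-only sketch (hypothetical data and helper; the real `GraphTable` has its own export methods), the node half of a subgraph can be flattened and written as CSV like this:

```python
import csv
import io


def node_rows(node_attrs, node_ids):
    """One dict per node: the node ID plus its attributes."""
    return [{"node_id": n, **node_attrs[n]} for n in sorted(node_ids)]


node_attrs = {
    0: {"name": "Alice", "age": 29},
    1: {"name": "Bob", "age": 55},
    2: {"name": "Carol", "age": 31},
}
rows = node_rows(node_attrs, {0, 2})  # the "subgraph" holds nodes 0 and 2

buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.DictWriter(buffer, fieldnames=["node_id", "name", "age"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```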
Subgraph → Matrix
Get matrix representations:
# Adjacency matrix
A = sub.adjacency_matrix()
# or
A = sub.to_matrix()
# Adjacency list
adj = sub.adjacency_list()
print(adj[0]) # Neighbors of node 0
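In a subgraph, node IDs are usually not 0..n-1 (a filtered view might hold nodes 3, 7 and 12), so a matrix needs a mapping from node ID to row/column index. Whether `adjacency_list()` is keyed by node ID or by position is not stated on this page; the plain-Python sketch below (hypothetical helper) keys the list by node ID and builds the matrix through an explicit ID-to-index map.

```python
def adjacency(node_ids, edges):
    """Undirected adjacency list (keyed by node ID) and dense 0/1 matrix."""
    order = sorted(node_ids)
    index = {nid: i for i, nid in enumerate(order)}  # node ID -> row/column
    adj_list = {nid: [] for nid in order}
    matrix = [[0] * len(order) for _ in order]
    for u, v in edges:
        adj_list[u].append(v)
        adj_list[v].append(u)
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return order, adj_list, matrix


order, adj_list, matrix = adjacency([3, 7, 12], [(3, 7), (7, 12)])
print(order)        # [3, 7, 12]
print(adj_list[7])  # [3, 12]
print(matrix)       # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```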
Subgraph → Arrays
Extract specific data as arrays:
# Node/edge IDs
node_ids = sub.node_ids() # NumArray
edge_ids = sub.edge_ids() # NumArray
# Attributes
ages = sub.nodes["age"] # NumArray
names = sub.nodes["name"] # BaseArray
# Accessors
nodes_accessor = sub.nodes # NodesAccessor
edges_accessor = sub.edges # EdgesAccessor
Running Algorithms on Subgraphs
Subgraphs support graph algorithms:
Connected Components
# Find components within the subgraph
components = sub.connected_components() # SubgraphArray
# Check if connected
if sub.is_connected():
print("Subgraph is a single component")
else:
print(f"Found {len(components)} components")
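Connected components can be found with a union-find (disjoint-set) structure: start with every node in its own set and merge the two sets joined by each edge. This is one standard technique, shown here in plain Python as a sketch; it is not necessarily what `connected_components()` uses internally.

```python
def connected_components(node_ids, edges):
    """Group node IDs into components using union-find with path halving."""
    parent = {n: n for n in node_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two sets

    groups = {}
    for n in node_ids:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in groups.values())


comps = connected_components([0, 1, 2, 3, 4], [(0, 1), (1, 2), (3, 4)])
print(comps)  # [[0, 1, 2], [3, 4]]
print("connected" if len(comps) == 1 else f"{len(comps)} components")  # 2 components
```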
Sampling
# Random sample of nodes
sample = sub.sample(n=50) # Returns Subgraph
# Sample gives you another subgraph view
print(f"Sampled {sample.node_count()} nodes")
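Node sampling can be pictured as drawing `n` distinct node IDs without replacement and then keeping the edges among them. A sketch with the standard `random` module; the seed parameter and the induced-edge rule are assumptions of this sketch, since how `sample()` chooses nodes and edges is not specified on this page.

```python
import random


def sample_nodes(node_ids, edges, n, seed=None):
    """Pick n distinct nodes uniformly at random and keep the edges among them."""
    rng = random.Random(seed)  # seeded generator for reproducible samples
    chosen = set(rng.sample(sorted(node_ids), n))  # ValueError if n is too large
    kept = [(u, v) for u, v in edges if u in chosen and v in chosen]
    return chosen, kept


nodes = range(100)
edges = [(i, i + 1) for i in range(99)]  # a path 0-1-2-...-99
chosen, kept = sample_nodes(nodes, edges, n=50, seed=42)
print(len(chosen), len(kept))
```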
Neighborhood Expansion
# Expand to k-hop neighborhood
expanded = sub.neighborhood(depth=2) # SubgraphArray
# This returns an array of neighborhoods around each node
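A depth-k neighborhood around a node is everything reachable in at most k hops, which a breadth-first search that stops expanding at depth k computes directly. The sketch below builds one node set per seed, mirroring the "one neighborhood per node" result described above; it is plain Python over an assumed adjacency dict, not groggy's implementation.

```python
from collections import deque


def k_hop(adjacency, seed, depth):
    """Return all nodes within `depth` hops of `seed` (including the seed)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # do not expand past the requested depth
        for nbr in adjacency.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


# Path graph 0-1-2-3-4
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
neighborhoods = {s: k_hop(adjacency, s, depth=2) for s in [0, 2]}
print(neighborhoods)  # {0: {0, 1, 2}, 2: {0, 1, 2, 3, 4}}
```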
Common Patterns
Pattern 1: Filter → Analyze → Export
# Filter to subset
active_users = g.nodes[g.nodes["active"] == True]
# Analyze
mean_age = active_users["age"].mean()
num_users = active_users.node_count()
print(f"Active users: {num_users}, mean age: {mean_age:.1f}")
# Export
active_users.table().to_csv("active_users.csv")
Pattern 2: Slice → Check → Expand
# Start with specific nodes
seed_nodes = g.nodes[[0, 1, 2]]
# Check properties
if seed_nodes.is_connected():
print("Seeds are connected")
# Expand to neighborhood
expanded = seed_nodes.neighborhood(depth=2)
Pattern 3: Chain Multiple Filters
# Progressive filtering: build each mask from the view being filtered
active = g.nodes[g.nodes["active"] == True]                  # Filter 1
over_25 = active.nodes[active.nodes["age"] > 25]             # Filter 2
result = over_25.nodes[over_25.nodes["role"] == "Engineer"]  # Filter 3
# Alternative: combine conditions
result = g.nodes[
(g.nodes["active"] == True) &
(g.nodes["age"] > 25) &
(g.nodes["role"] == "Engineer")
]
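Chaining filters and combining them with `&` select the same nodes, because each filter only narrows the set of IDs that survive. A plain-Python check of that equivalence on a toy attribute table (hypothetical data and helper); the combined form simply does the work in one pass instead of producing an intermediate result per step.

```python
people = {
    0: {"active": True, "age": 29, "role": "Engineer"},
    1: {"active": True, "age": 55, "role": "Manager"},
    2: {"active": False, "age": 31, "role": "Engineer"},
    3: {"active": True, "age": 24, "role": "Engineer"},
}


def where(ids, predicate):
    """Keep the IDs whose attributes satisfy the predicate (one filter step)."""
    return [n for n in ids if predicate(people[n])]


# Chained: each step filters the survivors of the previous step
step1 = where(people, lambda p: p["active"])
step2 = where(step1, lambda p: p["age"] > 25)
chained = where(step2, lambda p: p["role"] == "Engineer")

# Combined: one pass with all conditions joined
combined = where(
    people,
    lambda p: p["active"] and p["age"] > 25 and p["role"] == "Engineer",
)

print(chained, combined)  # [0] [0]
```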
Pattern 4: Compare Subgraphs
# Create two views
engineers = g.nodes[g.nodes["role"] == "Engineer"]
senior = g.nodes[g.nodes["age"] > 40]
# Compare sizes
print(f"Engineers: {engineers.node_count()}")
print(f"Senior: {senior.node_count()}")
# Get stats
print(f"Engineers mean age: {engineers['age'].mean():.1f}")
print(f"Senior mean age: {senior['age'].mean():.1f}")
Pattern 5: Temporary Working Set
# Create temporary view for analysis
temp = g.nodes[:1000] # First 1000 nodes
# Do expensive computation on subset
components = temp.connected_components()
density = temp.density()
# Discard when done (temp is just a view)
# No cleanup needed!
Performance Considerations
Time Complexity
| Operation | Complexity | Notes |
|---|---|---|
| Create subgraph | O(1) | Just stores node/edge IDs |
| node_count() | O(1) | BitSet count |
| contains_node() | O(1) | BitSet lookup |
| to_graph() | O(V + E + A) | Full copy, expensive |
| table() | O(1) | Creates view |
| Access attributes | O(1) per access | Via parent graph |
Here V, E and A are the numbers of nodes, edges and attribute values in the subgraph.
Memory
- Subgraph storage: ~40 bytes + 2 BitSets
- BitSet overhead: ~(num_entities / 8) bytes
- No attribute duplication: Attributes stay in parent GraphPool
- Materialization: to_graph() creates a full copy
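A BitSet spends one bit per possible node or edge, so membership and insertion are a single bit test or bit set, and the memory cost is about num_entities / 8 bytes regardless of how many entities are actually selected. A minimal `bytearray`-backed sketch of the idea (illustrative only, not groggy's BitSet type):

```python
class BitSet:
    """Fixed-size set of integers 0..size-1, one bit per possible member."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # round up to whole bytes

    def add(self, i):
        self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def count(self):
        # Linear scan over the bytes; an implementation could cache this instead
        return sum(bin(byte).count("1") for byte in self.bits)


nodes = BitSet(1_000_000)  # room for a million node IDs
for nid in (0, 10, 999_999):
    nodes.add(nid)

print(len(nodes.bits))           # 125000 (bytes, i.e. 1,000,000 / 8)
print(10 in nodes, 11 in nodes)  # True False
print(nodes.count())             # 3
```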
Optimization Tips
1. Delay materialization:
# ✓ Stay as view
sub = g.nodes[:1000]
result = sub.table().agg({"age": "mean"})
# ✗ Unnecessary copy
graph = sub.to_graph()
result = graph.table().agg({"age": "mean"})
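2. Combine filters instead of chaining them:
# Works, but builds an intermediate view at every step
# (cond1, cond2, cond3 stand for boolean masks such as g.nodes["age"] > 25)
result = g.nodes[cond1].nodes[cond2].nodes[cond3]
# Better - one combined mask, one view
result = g.nodes[cond1 & cond2 & cond3]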
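3. Prefer bulk operations over loops:
# ✓ Fast: one bulk call over the whole attribute column
ages = sub.nodes["age"]
mean_age = ages.mean()
# ✗ Slow: one Python-level lookup per node
mean_age = sum(sub.nodes[n]["age"] for n in sub.node_ids()) / sub.node_count()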
Subgraph Limitations
Cannot Modify Directly
Subgraphs are immutable views:
sub = g.nodes[:10]
# ✗ Cannot modify subgraph
# sub.add_node() # No such method
# ✓ Materialize first, then modify
new_graph = sub.to_graph()
new_graph.add_node(name="NewPerson")
Must Access Parent Graph for Modifications
To modify nodes in a subgraph, work through the parent:
sub = g.nodes[g.nodes["active"] == True]
# Get IDs from subgraph
node_ids = sub.node_ids()
# Modify via parent graph
for nid in node_ids:
g.nodes.set_attrs({nid: {"processed": True}})
# Or use bulk operations
g.nodes.set_attrs({nid: {"processed": True} for nid in node_ids})
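Because a subgraph only holds IDs, writes made through the parent are immediately visible through any view that contains those nodes. A toy illustration with a dict standing in for the parent's attribute store (hypothetical names); the bulk update uses the same `{node_id: {attr: value}}` shape as the `set_attrs` call above.

```python
parent_attrs = {
    0: {"name": "Alice", "active": True},
    1: {"name": "Bob", "active": False},
    2: {"name": "Carol", "active": True},
}

# The "view": just the IDs of active nodes
active_ids = [n for n, attrs in parent_attrs.items() if attrs["active"]]

# Bulk update expressed as {node_id: {attr: value}}, applied to the parent
updates = {nid: {"processed": True} for nid in active_ids}
for nid, attrs in updates.items():
    parent_attrs[nid].update(attrs)

# Reading through the view's IDs sees the new attribute
print([parent_attrs[n].get("processed", False) for n in active_ids])  # [True, True]
print(parent_attrs[1].get("processed", False))                        # False
```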
Parent Graph Must Stay Alive
The parent graph must not be deleted while a subgraph still refers to it:
def create_subgraph():
g = gr.Graph()
g.add_node(name="Alice")
return g.nodes[:1] # Returns subgraph
# ✗ Dangerous - g goes out of scope
# sub = create_subgraph() # Parent graph deleted!
# ✓ Keep parent alive
g = gr.Graph()
g.add_node(name="Alice")
sub = g.nodes[:1] # Safe - g still in scope
When to Use Subgraphs
Use Subgraphs when:
- ✅ Filtering by conditions
- ✅ Working with a portion of a large graph
- ✅ Running temporary analysis without modifications
- ✅ Building delegation chains
- ✅ Memory efficiency matters
Use Graph when:
- ❌ You need to modify structure
- ❌ You need version control
- ❌ Persisting for later use
- ❌ An independent lifecycle is required
See Also
- Subgraph API Reference: Complete method reference
- SubgraphArray Guide: Working with collections of subgraphs
- Graph Core Guide: Parent graph operations
- Accessors Guide: NodesAccessor and EdgesAccessor details
- Object Transformations: Delegation chains