Commit 72b97be

Introduce UtilizationMonitor. (#1)
1 parent 43973c2 commit 72b97be

20 files changed

Lines changed: 1455 additions & 14 deletions

async-service-supervisor.gemspec

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ Gem::Specification.new do |spec|
 
 	spec.add_dependency "async-bus"
 	spec.add_dependency "async-service", "~> 0.15"
+	spec.add_dependency "async-utilization", "~> 0.3"
 	spec.add_dependency "io-endpoint"
 	spec.add_dependency "memory", "~> 0.7"
 	spec.add_dependency "memory-leak", "~> 0.10"

context/getting-started.md

Lines changed: 5 additions & 2 deletions
@@ -133,14 +133,17 @@ service "supervisor" do
 			Async::Service::Supervisor::MemoryMonitor.new(
 				interval: 10,
 				maximum_size_limit: 1024 * 1024 * 500 # 500MB limit per process
+			),
+
+			# Aggregate application-level metrics (connections, requests) from workers:
+			Async::Service::Supervisor::UtilizationMonitor.new(
+				interval: 10
 			)
 		]
 	end
 end
 ```
 
-See the {ruby Async::Service::Supervisor::MemoryMonitor Memory Monitor} and {ruby Async::Service::Supervisor::ProcessMonitor Process Monitor} guides for detailed configuration options and best practices.
-
 ### Collecting Diagnostics
 
 The supervisor can collect various diagnostics from workers on demand:

context/index.yaml

Lines changed: 5 additions & 0 deletions
@@ -23,3 +23,8 @@ files:
     title: Process Monitor
     description: This guide explains how to use the <code class="language-ruby">Async::Service::Supervisor::ProcessMonitor</code>
       to log CPU and memory metrics for your worker processes.
+  - path: utilization-monitor.md
+    title: Utilization Monitor
+    description: This guide explains how to use the <code class="language-ruby">Async::Service::Supervisor::UtilizationMonitor</code>
+      to collect and aggregate application-level utilization metrics from your worker
+      processes.

context/utilization-monitor.md

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
# Utilization Monitor

This guide explains how to use the {ruby Async::Service::Supervisor::UtilizationMonitor} to collect and aggregate application-level utilization metrics from your worker processes.

## Overview

While the {ruby Async::Service::Supervisor::ProcessMonitor} captures OS-level metrics (CPU, memory) and the {ruby Async::Service::Supervisor::MemoryMonitor} takes action when limits are exceeded, the `UtilizationMonitor` focuses on **application-level metrics**: connections, requests, queue depths, and other business-specific utilization data. Without it, you can't easily answer questions like "How many active connections do my workers have?" or "What is the total request throughput across all workers?"

The `UtilizationMonitor` solves this by using shared memory to efficiently collect metrics from workers and aggregate them by service name. Workers write metrics to a shared memory segment; the supervisor periodically reads and aggregates them without any IPC overhead during collection.

Use the `UtilizationMonitor` when you need:

- **Application observability**: Track connections, requests, queue depths, or custom metrics across workers.
- **Service-level aggregation**: See totals per service (e.g., "echo" service: 42 connections, 1000 messages).
- **Lightweight collection**: Avoid IPC or network calls; metrics are read directly from shared memory.
- **Integration with logging**: Emit aggregated metrics to your logging pipeline for dashboards and alerts.

The monitor uses the `async-utilization` gem for schema definition and shared memory layout. Workers must include {ruby Async::Service::Supervisor::Supervised} and define a `utilization_schema` to participate.

## Usage

### Supervisor Configuration

Add a utilization monitor to your supervisor service:

```ruby
service "supervisor" do
	include Async::Service::Supervisor::Environment

	monitors do
		[
			Async::Service::Supervisor::UtilizationMonitor.new(
				path: File.expand_path("utilization.shm", root),
				interval: 10 # Aggregate and emit metrics every 10 seconds
			)
		]
	end
end
```

### Worker Configuration

Workers must include {ruby Async::Service::Supervisor::Supervised} and define a `utilization_schema` that describes the metrics they expose:

```ruby
service "echo" do
	include Async::Service::Managed::Environment
	include Async::Service::Supervisor::Supervised

	service_class EchoService

	utilization_schema do
		{
			connections_total: :u64,
			connections_active: :u32,
			messages_total: :u64
		}
	end
end
```

### Emitting Metrics from Workers

Workers obtain a utilization registry from the evaluator and use it to update metrics:

```ruby
def run(instance, evaluator)
	evaluator.prepare!(instance)
	instance.ready!

	registry = evaluator.utilization_registry
	connections_total = registry.metric(:connections_total)
	connections_active = registry.metric(:connections_active)
	messages_total = registry.metric(:messages_total)

	@bound_endpoint.accept do |peer|
		connections_total.increment
		connections_active.track do
			peer.each_line do |line|
				messages_total.increment
				peer.write(line)
			end
		end
	end
end
```

The supervisor aggregates these metrics by service name and emits them at the configured interval. For example:

```json
{
	"echo": {
		"connections_total": 150,
		"connections_active": 12,
		"messages_total": 45000
	}
}
```

## Configuration Options

### `path`

Path to the shared memory file used for worker metrics. Default: `"utilization.shm"` (relative to current working directory).

Be explicit about the path when using {ruby Async::Service::Supervisor::Environment} so supervisor and workers resolve the same file regardless of working directory:

```ruby
monitors do
	[
		Async::Service::Supervisor::UtilizationMonitor.new(
			path: File.expand_path("utilization.shm", root),
			interval: 10
		)
	]
end
```
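
As a small illustration of why an explicit path helps, the sketch below resolves the same relative file name against two different base directories using only `File.expand_path` from Ruby's core library. The directory names are hypothetical, and the results shown assume a POSIX system.

```ruby
# Hypothetical service root and an unrelated working directory.
root = "/srv/app"
other_directory = "/tmp"

# A relative path resolves differently depending on the base directory:
from_root = File.expand_path("utilization.shm", root) # "/srv/app/utilization.shm"
from_other = File.expand_path("utilization.shm", other_directory) # "/tmp/utilization.shm"

# An absolute path ignores the base directory, so every process resolves the same file:
absolute = File.expand_path(from_root, other_directory) # "/srv/app/utilization.shm"
```

Passing an already-expanded path to the monitor therefore avoids any dependence on the directory from which a process happens to be started.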

### `interval`

The interval (in seconds) at which to aggregate and emit utilization metrics. Default: `10` seconds.

```ruby
# Emit every second for high-frequency monitoring
Async::Service::Supervisor::UtilizationMonitor.new(interval: 1)

# Emit every 5 minutes for low-overhead monitoring
Async::Service::Supervisor::UtilizationMonitor.new(interval: 300)
```

### `size`

Total size of the shared memory buffer. Default: `IO::Buffer::PAGE_SIZE * 8`. The buffer grows automatically when more workers are registered than segments available.

```ruby
Async::Service::Supervisor::UtilizationMonitor.new(
	size: IO::Buffer::PAGE_SIZE * 32 # Larger initial buffer for many workers
)
```

### `segment_size`

Size of each allocation segment per worker. Default: `512` bytes. Must accommodate your schema; the `async-utilization` gem lays out fields according to type (e.g., `u64` = 8 bytes, `u32` = 4 bytes).

```ruby
Async::Service::Supervisor::UtilizationMonitor.new(
	segment_size: 256 # Smaller segments if schema is compact
)
```
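
When choosing `size` and `segment_size`, a rough back-of-the-envelope estimate can help. The sketch below is not part of the `async-utilization` API: the helper names are hypothetical, and it assumes fields are packed with no padding or per-segment header, so it only gives a lower bound for the schema size and an upper bound for the number of segments.

```ruby
# Byte sizes of the schema types, as listed in this guide.
TYPE_SIZES = {
	u32: 4, u64: 8,
	i32: 4, i64: 8,
	f32: 4, f64: 8
}.freeze

# Lower bound on the bytes needed for one worker's metrics (ignores padding and headers).
def minimum_schema_size(schema)
	schema.sum { |_name, type| TYPE_SIZES.fetch(type) }
end

# Upper bound on how many segments fit into a buffer of the given size.
def maximum_segments(size, segment_size)
	size / segment_size
end

schema = {
	connections_total: :u64,
	connections_active: :u32,
	messages_total: :u64
}

schema_bytes = minimum_schema_size(schema) # 8 + 4 + 8 = 20 bytes
segments = maximum_segments(IO::Buffer::PAGE_SIZE * 8, 512) # depends on the platform page size
```

Under these assumptions the echo schema fits comfortably within the default `512` byte segment, and the default buffer holds 64 segments on a platform with 4096-byte pages.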

## Schema Types

The `utilization_schema` maps metric names to types supported by {ruby IO::Buffer}:

| Type | Size | Use case |
|------|------|----------|
| `:u32` | 4 bytes | Gauges and values that may decrease (e.g., `connections_active`) |
| `:u64` | 8 bytes | Monotonically increasing counters (e.g., `requests_total`) |
| `:i32` | 4 bytes | Signed 32-bit values |
| `:i64` | 8 bytes | Signed 64-bit values |
| `:f32` | 4 bytes | Single-precision floats |
| `:f64` | 8 bytes | Double-precision floats |

Prefer `:u64` for totals that only increase; use `:u32` for gauges or values that may decrease.
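
To make the type sizes concrete, here is a minimal sketch that writes and reads two fields with Ruby's core `IO::Buffer` (available from Ruby 3.1, where it may print an experimental-feature warning). The offsets are chosen by hand for illustration; the actual layout used by `async-utilization` is an assumption left out of this example.

```ruby
# Allocate a small buffer, large enough for one u64 field and one u32 field.
buffer = IO::Buffer.new(16)

# Write a 64-bit total at offset 0 and a 32-bit gauge at offset 8.
buffer.set_value(:u64, 0, 150)
buffer.set_value(:u32, 8, 12)

# Read the values back using the same types and offsets.
connections_total = buffer.get_value(:u64, 0)
connections_active = buffer.get_value(:u32, 8)
```

The reader and the writer must agree on both the type and the offset of each field, which is what a shared schema provides.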

## Default Schema

The {ruby Async::Service::Supervisor::Supervised} mixin provides a default schema if you don't override `utilization_schema`:

```ruby
{
	connections_active: :u32,
	connections_total: :u64,
	requests_active: :u32,
	requests_total: :u64
}
```

Override it when your service has different metrics:

```ruby
utilization_schema do
	{
		connections_active: :u32,
		connections_total: :u64,
		messages_total: :u64,
		queue_depth: :u32
	}
end
```

## Metric API

Metrics obtained from the utilization registry via `registry.metric(name)` provide methods to update their values:

- **`increment`**: Increment a counter by 1.
- **`set(value)`**: Set a gauge to a specific value.
- **`track { ... }`**: Execute a block and increment/decrement a gauge around it (e.g., `connections_active` while handling a connection).

```ruby
connections_total = registry.metric(:connections_total)
connections_active = registry.metric(:connections_active)

# Increment total connections when a client connects
connections_total.increment

# Track active connections for the duration of the block
connections_active.track do
	handle_client(peer)
end
```
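
The semantics of `track` can be sketched with a plain in-memory object. `SketchGauge` below is a hypothetical stand-in, not the class provided by `async-utilization`, and the use of `ensure` to decrement even when the block raises is an assumption about how such a method would typically be written.

```ruby
# Hypothetical in-memory gauge; the real metric writes to shared memory instead.
class SketchGauge
	attr_reader :value

	def initialize
		@value = 0
	end

	def increment
		@value += 1
	end

	def decrement
		@value -= 1
	end

	def set(value)
		@value = value
	end

	# Increment before the block and always decrement afterwards, even if the block raises.
	def track
		increment
		yield
	ensure
		decrement
	end
end

gauge = SketchGauge.new
inside = nil

gauge.track do
	inside = gauge.value # 1 while the block is running
end

after = gauge.value # back to 0 once the block has finished
```

Wrapping the whole connection handler in `track` keeps the gauge accurate without having to pair every increment with a manual decrement.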

## Aggregation Behavior

Metrics are aggregated by service name (from `supervisor_worker_state[:name]`). Values are summed across workers of the same service. For example, with 4 workers each reporting `connections_active: 3`, the aggregated value is `12`.
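
The summation described above can be sketched in plain Ruby. The snapshot structure and the `aggregate` helper are hypothetical and exist only for illustration; the monitor reads the real values from shared memory.

```ruby
# Hypothetical per-worker snapshots as [service name, metrics] pairs.
snapshots = [
	["echo", {connections_active: 3, connections_total: 40}],
	["echo", {connections_active: 3, connections_total: 35}],
	["echo", {connections_active: 3, connections_total: 50}],
	["echo", {connections_active: 3, connections_total: 25}],
	["api", {connections_active: 7, connections_total: 90}]
]

# Sum every metric across workers that share a service name.
def aggregate(snapshots)
	totals = Hash.new { |hash, name| hash[name] = Hash.new(0) }

	snapshots.each do |name, metrics|
		metrics.each do |key, value|
			totals[name][key] += value
		end
	end

	totals
end

totals = aggregate(snapshots)
echo_active = totals["echo"][:connections_active] # 3 + 3 + 3 + 3 = 12
```

In this sketch a worker that omits a metric simply contributes nothing to that total, which is one way mismatched schemas can lead to partial results.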

## Best Practices

- **Define a minimal schema**: Only include metrics you need; each field consumes shared memory.
- **Use appropriate types**: `u64` for ever-increasing counters; `u32` for gauges.
- **Match schema across workers**: All workers of the same service should use the same schema for consistent aggregation.
- **Combine with other monitors**: Use `UtilizationMonitor` alongside `ProcessMonitor` and `MemoryMonitor` for full observability.

## Common Pitfalls

- **Workers without schema**: Workers that don't define `utilization_schema` (or return `nil`) are not registered. They won't contribute to utilization metrics.
- **Schema mismatch**: If workers of the same service use different schemas, aggregation may produce incorrect or partial results.
- **Path permissions**: Ensure the shared memory path is accessible to all worker processes (e.g., same user, or appropriate permissions).
- **Segment size**: If your schema is large, increase `segment_size` to avoid allocation failures.

examples/echo/service.rb

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
#!/usr/bin/env async-service
# frozen_string_literal: true

# Released under the MIT License.
# Copyright, 2025, by Samuel Williams.

require "async/service/supervisor"
require "async/service/managed/service"
require "async/utilization"
require "io/endpoint/host_endpoint"

class EchoService < Async::Service::Managed::Service
	def initialize(...)
		super

		@bound_endpoint = nil
		@endpoint = nil
	end

	# Prepare the bound endpoint for the server.
	def start
		@endpoint = IO::Endpoint.tcp("127.0.0.1", 8080)

		Sync do
			@bound_endpoint = @endpoint.bound
		end

		Console.info(self, "Starting echo server on #{@endpoint}")

		super
	end

	def run(instance, evaluator)
		evaluator.prepare!(instance)

		instance.ready!

		registry = evaluator.utilization_registry
		connections_total = registry.metric(:connections_total)
		connections_active = registry.metric(:connections_active)
		messages_total = registry.metric(:messages_total)

		Async do |task|
			@bound_endpoint.accept do |peer|
				connections_total.increment
				connections_active.track do
					Console.info(self, "Client connected", peer: peer)

					peer.each_line do |line|
						messages_total.increment
						peer.write(line)
					end

					Console.info(self, "Client disconnected", peer: peer)
				end
			end
		end

		# Return the bound endpoint for health checking
		@bound_endpoint
	end

	# Close the bound endpoint.
	def stop(...)
		if @bound_endpoint
			@bound_endpoint.close
			@bound_endpoint = nil
		end

		@endpoint = nil

		super
	end
end

service "echo" do
	include Async::Service::Managed::Environment
	include Async::Service::Supervisor::Supervised

	service_class EchoService

	utilization_schema do
		{
			connections_total: :u64,
			connections_active: :u32,
			messages_total: :u64
		}
	end
end

service "supervisor" do
	include Async::Service::Supervisor::Environment

	monitors do
		[
			Async::Service::Supervisor::UtilizationMonitor.new(
				path: "utilization.shm",
				interval: 1
			)
		]
	end
end

examples/echo/utilization.shm

128 KB
Binary file not shown.

guides/getting-started/readme.md

Lines changed: 5 additions & 2 deletions
@@ -133,14 +133,17 @@ service "supervisor" do
 			Async::Service::Supervisor::MemoryMonitor.new(
 				interval: 10,
 				maximum_size_limit: 1024 * 1024 * 500 # 500MB limit per process
+			),
+
+			# Aggregate application-level metrics (connections, requests) from workers:
+			Async::Service::Supervisor::UtilizationMonitor.new(
+				interval: 10
 			)
 		]
 	end
 end
 ```
 
-See the {ruby Async::Service::Supervisor::MemoryMonitor Memory Monitor} and {ruby Async::Service::Supervisor::ProcessMonitor Process Monitor} guides for detailed configuration options and best practices.
-
 ### Collecting Diagnostics
 
 The supervisor can collect various diagnostics from workers on demand:

guides/links.yaml

Lines changed: 3 additions & 0 deletions
@@ -9,3 +9,6 @@ memory-monitor:
 
 process-monitor:
   order: 4
+
+utilization-monitor:
+  order: 5
