feat(telemetry): integrate OpenTelemetry observability stack with health metrics
- Add OpenTelemetry SDK, OTLP exporter, Prometheus integration
- Implement connection tracking with active/total/disconnection metrics
- Add health endpoint with uptime and connection counts
- Integrate tracing spans for socket events and engine messages
- Add metrics collection for event handling duration
- Update health endpoint to include live runtime state
- Add graceful telemetry shutdown in main function
- Implement engine session active metrics tracking
- Add namespace-specific attributes to connection metrics
- Introduce message edit history retrieval endpoint
- Add scheduled message CRUD operations and dispatcher
- Update Socket.IO event registration with observability
- Refactor component update to remove dead code allowance
- Add comprehensive environment variables documentation
- Implement detailed development guidelines in AGENTS.md
@@ -0,0 +1,71 @@
|
|||||||
|
# =============================================================================
|
||||||
|
# imks — IM 实时消息服务 环境变量配置
|
||||||
|
# 复制此文件为 .env 并修改相应值
|
||||||
|
# =============================================================================
|
||||||
|
|
||||||
|
# --- 部署模式 ---
|
||||||
|
# Adapter 模式: "local" (单节点) | "redis" | "nats"
|
||||||
|
IMKS_ADAPTER=local
|
||||||
|
|
||||||
|
# 当前节点唯一标识(默认取主机名)
|
||||||
|
# IMKS_SERVER_ID=imks-node-1
|
||||||
|
|
||||||
|
# Redis 连接(IMKS_ADAPTER=redis 时必需)
|
||||||
|
# IMKS_REDIS_URL=redis://localhost:6379
|
||||||
|
|
||||||
|
# NATS 连接(IMKS_ADAPTER=nats 时必需)
|
||||||
|
# IMKS_NATS_URL=nats://localhost:4222
|
||||||
|
|
||||||
|
# --- WebTransport (QUIC) ---
|
||||||
|
# 启用 WebTransport 服务(需要 TLS 证书)
|
||||||
|
# IMKS_WT_ENABLED=false
|
||||||
|
# IMKS_WT_PORT=3001
|
||||||
|
# IMKS_WT_CERT_PATH=/path/to/cert.pem
|
||||||
|
# IMKS_WT_KEY_PATH=/path/to/key.pem
|
||||||
|
|
||||||
|
# --- 数据库 ---
|
||||||
|
# PostgreSQL 连接字符串
|
||||||
|
# DATABASE_URL=postgres://imks:password@localhost:5432/imks
|
||||||
|
DATABASE_URL=postgres://localhost/imks
|
||||||
|
|
||||||
|
# 连接池配置
|
||||||
|
# DATABASE_MAX_CONNECTIONS=10
|
||||||
|
# DATABASE_MIN_CONNECTIONS=2
|
||||||
|
# DATABASE_CONNECT_TIMEOUT=30
|
||||||
|
# DATABASE_IDLE_TIMEOUT=600
|
||||||
|
|
||||||
|
# --- appks gRPC 连接 ---
|
||||||
|
# appks 核心服务地址
|
||||||
|
# APPKS_GRPC_ADDR=http://localhost:50051
|
||||||
|
|
||||||
|
# 连接超时(秒)
|
||||||
|
# APPKS_GRPC_TIMEOUT=10
|
||||||
|
|
||||||
|
# mTLS 配置(生产环境必需)
|
||||||
|
# APPKS_GRPC_TLS_CA_CERT=/path/to/ca.pem
|
||||||
|
# APPKS_GRPC_TLS_CLIENT_CERT=/path/to/client.pem
|
||||||
|
# APPKS_GRPC_TLS_CLIENT_KEY=/path/to/client-key.pem
|
||||||
|
# APPKS_GRPC_TLS_DOMAIN=appks.internal
|
||||||
|
|
||||||
|
# --- OpenTelemetry 可观测性 ---
|
||||||
|
# 服务名
|
||||||
|
# OTEL_SERVICE_NAME=imks
|
||||||
|
# OTEL_SERVICE_VERSION=0.1.0
|
||||||
|
|
||||||
|
# OTLP 收集器地址
|
||||||
|
# OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
|
||||||
|
# 协议: grpc | http/protobuf
|
||||||
|
# OTEL_EXPORTER_OTLP_PROTOCOL=grpc
|
||||||
|
|
||||||
|
# 启用/禁用 telemetry
|
||||||
|
# OTEL_TRACES_ENABLED=true
|
||||||
|
# OTEL_METRICS_ENABLED=true
|
||||||
|
# OTEL_LOGS_ENABLED=true
|
||||||
|
|
||||||
|
# 日志级别: trace | debug | info | warn | error
|
||||||
|
RUST_LOG=info
|
||||||
|
# 日志格式: json | pretty
|
||||||
|
# LOG_FORMAT=json
|
||||||
|
|
||||||
|
# 部署环境标识
|
||||||
|
# OTEL_RESOURCE_ATTRIBUTES_DEPLOYMENT=development
|
||||||
@@ -0,0 +1,804 @@
|
|||||||
|
# AGENTS.md — 开发规范 / Development Guidelines
|
||||||
|
|
||||||
|
> 本文件为所有 AI 编码助手(Claude Code、pi、Cursor 等)提供统一的开发指导。
|
||||||
|
> This file provides unified development guidelines for all AI coding assistants.
|
||||||
|
|
||||||
|
**最后更新 / Last Updated**: 2026-06-11
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 目录 / Table of Contents
|
||||||
|
|
||||||
|
1. [语言 / Language](#1-语言--language)
|
||||||
|
2. [代码风格 / Code Style](#2-代码风格--code-style)
|
||||||
|
3. [禁止模式 / Forbidden Patterns](#3-禁止模式--forbidden-patterns)
|
||||||
|
4. [错误处理 / Error Handling](#4-错误处理--error-handling)
|
||||||
|
5. [安全规范 / Security](#5-安全规范--security)
|
||||||
|
6. [数据库规范 / Database](#6-数据库规范--database)
|
||||||
|
7. [Socket.IO 事件规范 / Socket.IO Event Conventions](#7-socketio-事件规范--socketio-event-conventions)
|
||||||
|
8. [日志与可观测性 / Logging & Observability](#8-日志与可观测性--logging--observability)
|
||||||
|
9. [性能规范 / Performance](#9-性能规范--performance)
|
||||||
|
10. [测试规范 / Testing](#10-测试规范--testing)
|
||||||
|
11. [Git 规范 / Git Workflow](#11-git-规范--git-workflow)
|
||||||
|
12. [工作流程 / Workflow](#12-工作流程--workflow)
|
||||||
|
13. [架构决策记录 / ADR](#13-架构决策记录--adr)
|
||||||
|
14. [审查清单 / Review Checklist](#14-审查清单--review-checklist)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. 语言 / Language
|
||||||
|
|
||||||
|
**Always respond in Chinese (中文).** Use the user's language for all conversations and explanations. Code, commands, and technical terms can remain in English.
|
||||||
|
|
||||||
|
始终使用中文回复。代码、命令和技术术语可以保留英文。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. 代码风格 / Code Style
|
||||||
|
|
||||||
|
### 2.1 基本原则 / Basic Principles
|
||||||
|
|
||||||
|
| 规则 | 说明 |
|
||||||
|
|---|---|
|
||||||
|
| 遵循现有风格 | Follow existing project conventions |
|
||||||
|
| 有意义命名 | Use meaningful variable names; avoid single-letter names except loop counters |
|
||||||
|
| 函数长度 | Keep functions under **50 lines**; split complex logic into smaller functions |
|
||||||
|
| 嵌套深度 | Maximum nesting depth: **3 levels**; use early returns to flatten logic |
|
||||||
|
| 注释 | Add comments for complex logic only; prefer self-documenting code |
|
||||||
|
| 文档注释 | Public items must have `///` doc comments; private items only when logic is non-obvious |
|
||||||
|
|
||||||
|
### 2.2 Rust 最佳实践 / Rust Best Practices
|
||||||
|
|
||||||
|
```rust
// ✅ Correct
async fn get_message(id: Uuid) -> Result<Message, sqlx::Error> {
    let msg = db.find_message(id).await?; // propagate the error with `?`
    Ok(msg)
}

// ❌ Incorrect
async fn get_message(id: Uuid) -> Message {
    db.find_message(id).await.unwrap() // `unwrap()` is forbidden
}
```
|
||||||
|
|
||||||
|
| 规则 | 说明 |
|
||||||
|
|---|---|
|
||||||
|
| 错误传播 | Use `?` operator for error propagation; never use `unwrap()` or `expect()` in non-test code |
|
||||||
|
| `unsafe` | Avoid `unsafe` blocks; if necessary, add a `// SAFETY:` comment explaining why |
|
||||||
|
| `clone()` | Minimize `clone()` usage; prefer references or `Arc` for shared ownership |
|
||||||
|
| 魔法数字 | No magic numbers; define named constants with `const` |
|
||||||
|
| 硬编码字符串 | No hardcoded strings for config/status; use enums or constants |
|
||||||
|
| 死代码 | Remove dead code; don't leave commented-out code blocks |
|
||||||
|
| 未完成代码 | Don't commit `unimplemented!()`, `todo!()`, or `FIXME` without a tracking issue |
|
||||||
|
|
||||||
|
### 2.3 导入规范 / Import Guidelines
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// 标准库 → 第三方 crate → 本地模块
|
||||||
|
// stdlib → third-party crates → local modules
|
||||||
|
use std::sync::Arc;
|
||||||
|
|
||||||
|
use chrono::{DateTime, Utc};
|
||||||
|
use serde::{Deserialize, Serialize};
|
||||||
|
use uuid::Uuid;
|
||||||
|
|
||||||
|
use crate::models::message::Message;
|
||||||
|
use crate::socket::packet::Packet;
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2.4 模型设计规范 / Model Design Guidelines
|
||||||
|
|
||||||
|
imks 的 models 层采用 **一文件一实体** 的拆分策略:
|
||||||
|
|
||||||
|
```
|
||||||
|
models/
|
||||||
|
├── mod.rs # 模块声明 + 公开 re-export
|
||||||
|
├── message.rs # 核心 Message + MessageDetail + AuthorInfo + MessageType
|
||||||
|
├── message_attachment.rs # 附件
|
||||||
|
├── message_bookmark.rs # 书签/收藏
|
||||||
|
├── message_draft.rs # 草稿
|
||||||
|
├── message_edit.rs # 编辑历史
|
||||||
|
├── message_embed.rs # 富媒体嵌入 + EmbedField
|
||||||
|
├── message_mention.rs # @提及
|
||||||
|
├── message_pin.rs # 置顶消息
|
||||||
|
├── message_poll.rs # 投票 + Option + Vote
|
||||||
|
├── message_reaction.rs # 表情反应
|
||||||
|
├── message_read_state.rs # 已读状态
|
||||||
|
└── message_thread.rs # 消息线程
|
||||||
|
```
|
||||||
|
|
||||||
|
Each model file contains:
- A row struct (`sqlx::FromRow`)
- A summary/detail struct for API responses, with `From` conversions
- Query SQL constants using `$1, $2...` placeholders
- `CREATE_TABLE_SQL` migration DDL plus indexes
- `#[cfg(test)]` serialization/conversion tests
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. 禁止模式 / Forbidden Patterns
|
||||||
|
|
||||||
|
以下代码模式在项目中严格禁止:
|
||||||
|
|
||||||
|
| 禁止项 | 说明 |
|
||||||
|
|-------------------------------|------------------------------------------------|
|
||||||
|
| `// ── xxxx ──────────` | 禁止使用此类分隔线注释 |
|
||||||
|
| `unwrap()` / `expect()` (非测试) | 在非测试代码中禁止使用;使用 `?` 或 `unwrap_or` 等安全替代 |
|
||||||
|
| `panic!()` / `unreachable!()` | 除极少数不可能到达的分支外禁止使用 |
|
||||||
|
| 未处理的 `todo!()` | 不得提交包含 `todo!()` 的代码,除非有对应的 issue 追踪 |
|
||||||
|
| 注释掉的代码 | 不得提交被注释的代码块;使用 Git 历史追溯 |
|
||||||
|
| 过深嵌套 (≥4层) | 使用 early return、`match`、`map`/`and_then` 扁平化逻辑 |
|
||||||
|
| 过长函数 (>50行) | 拆分为更小的、职责单一的函数 |
|
||||||
|
| 魔法数字 | 使用 `const` 定义命名常量 |
|
||||||
|
| 硬编码字符串 | 使用枚举或常量定义配置值/状态值 |
|
||||||
|
| 死代码 | 删除未使用的代码、导入和变量 |
|
||||||
|
| `Box<dyn Error>` 在公共 API | 使用具体错误类型替代 trait object |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. 错误处理 / Error Handling
|
||||||
|
|
||||||
|
### 4.1 错误类型体系 / Error Type System
|
||||||
|
|
||||||
|
imks uses a single `ImksError` enum and an `ImksResult<T>` type alias, mirroring the style of `AppError`/`AppResult` in appks.
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// error.rs — 统一错误类型
|
||||||
|
use std::string::FromUtf8Error;

use thiserror::Error;
|
||||||
|
|
||||||
|
#[derive(Debug, Error)]
|
||||||
|
pub enum ImksError {
|
||||||
|
// Protocol layer (engine)
|
||||||
|
#[error("invalid engine packet type: {0}")]
|
||||||
|
InvalidEnginePacketType(u8),
|
||||||
|
#[error("invalid engine packet type char: {0}")]
|
||||||
|
InvalidEnginePacketTypeChar(char),
|
||||||
|
#[error("empty engine packet")]
|
||||||
|
EmptyEnginePacket,
|
||||||
|
#[error("invalid base64: {0}")]
|
||||||
|
InvalidBase64(#[from] base64::DecodeError),
|
||||||
|
#[error("invalid utf8 in packet: {0}")]
|
||||||
|
InvalidPacketUtf8(#[from] FromUtf8Error),
|
||||||
|
#[error("engine serialization error: {0}")]
|
||||||
|
EngineSerialization(String),
|
||||||
|
|
||||||
|
// Transport upgrade
|
||||||
|
#[error("session not found for upgrade")]
|
||||||
|
UpgradeSessionNotFound,
|
||||||
|
#[error("session already closed, cannot upgrade")]
|
||||||
|
UpgradeSessionClosed,
|
||||||
|
#[error("invalid session state for upgrade")]
|
||||||
|
UpgradeInvalidState,
|
||||||
|
|
||||||
|
// Socket.IO layer
|
||||||
|
#[error("invalid socket packet type: {0}")]
|
||||||
|
InvalidSocketPacketType(u8),
|
||||||
|
#[error("invalid socket packet type char: {0}")]
|
||||||
|
InvalidSocketPacketTypeChar(char),
|
||||||
|
#[error("empty socket packet")]
|
||||||
|
EmptySocketPacket,
|
||||||
|
#[error("invalid socket packet format: {0}")]
|
||||||
|
InvalidSocketPacketFormat(String),
|
||||||
|
#[error("missing namespace in socket packet")]
|
||||||
|
MissingNamespace,
|
||||||
|
#[error("invalid attachment count in binary event")]
|
||||||
|
InvalidAttachmentCount,
|
||||||
|
|
||||||
|
// Socket namespace
|
||||||
|
#[error("namespace error: {0}")]
|
||||||
|
Namespace(String),
|
||||||
|
#[error("socket not found: {0}")]
|
||||||
|
SocketNotFound(String),
|
||||||
|
#[error("failed to send packet to socket: channel full")]
|
||||||
|
SocketSendFull,
|
||||||
|
|
||||||
|
// Adapter layer
|
||||||
|
#[error("adapter redis error: {0}")]
|
||||||
|
AdapterRedis(String),
|
||||||
|
#[error("adapter nats error: {0}")]
|
||||||
|
AdapterNats(String),
|
||||||
|
#[error("adapter message bus error: {0}")]
|
||||||
|
AdapterMessageBus(String),
|
||||||
|
#[error("adapter serialization error: {0}")]
|
||||||
|
AdapterSerialization(String),
|
||||||
|
#[error("adapter room error: {0}")]
|
||||||
|
AdapterRoom(String),
|
||||||
|
|
||||||
|
// Database
|
||||||
|
#[error("database error: {0}")]
|
||||||
|
Database(#[from] sqlx::Error),
|
||||||
|
|
||||||
|
// gRPC
|
||||||
|
#[error("gRPC error: {0}")]
|
||||||
|
GrpcStatus(#[from] tonic::Status),
|
||||||
|
#[error("gRPC transport error: {0}")]
|
||||||
|
GrpcTransport(#[from] tonic::transport::Error),
|
||||||
|
|
||||||
|
// Serialization
|
||||||
|
#[error("JSON error: {0}")]
|
||||||
|
Json(#[from] serde_json::Error),
|
||||||
|
|
||||||
|
// Auth
|
||||||
|
#[error("auth error: {0}")]
|
||||||
|
Auth(String),
|
||||||
|
#[error("token expired")]
|
||||||
|
TokenExpired,
|
||||||
|
|
||||||
|
// General
|
||||||
|
#[error("not found: {0}")]
|
||||||
|
NotFound(String),
|
||||||
|
#[error("invalid input: {0}")]
|
||||||
|
InvalidInput(String),
|
||||||
|
#[error("internal error: {0}")]
|
||||||
|
Internal(String),
|
||||||
|
}
|
||||||
|
|
||||||
|
pub type ImksResult<T> = Result<T, ImksError>;
|
||||||
|
```
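For readers unfamiliar with what `#[from]` generates, the following is a hand-written, standard-library-only sketch of the same idea. `MiniError`, `MiniResult`, and `decode_text` are hypothetical names used only for illustration; the real `ImksError` relies on `thiserror` to derive the equivalent impls.

```rust
use std::fmt;
use std::string::FromUtf8Error;

/// Hypothetical two-variant stand-in for `ImksError` (illustration only).
#[derive(Debug)]
pub enum MiniError {
    EmptyPacket,
    InvalidPacketUtf8(FromUtf8Error),
}

impl fmt::Display for MiniError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MiniError::EmptyPacket => write!(f, "empty engine packet"),
            MiniError::InvalidPacketUtf8(e) => write!(f, "invalid utf8 in packet: {e}"),
        }
    }
}

impl std::error::Error for MiniError {}

// Roughly what `#[from]` derives: it lets `?` convert the library error.
impl From<FromUtf8Error> for MiniError {
    fn from(err: FromUtf8Error) -> Self {
        MiniError::InvalidPacketUtf8(err)
    }
}

pub type MiniResult<T> = Result<T, MiniError>;

/// Decodes a raw payload into text, propagating errors with `?`.
pub fn decode_text(raw: Vec<u8>) -> MiniResult<String> {
    if raw.is_empty() {
        return Err(MiniError::EmptyPacket);
    }
    let text = String::from_utf8(raw)?; // FromUtf8Error -> MiniError via From
    Ok(text)
}
```

The caller never writes `map_err`: the `From` impl is what allows `?` to lift the library error into the unified type.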
|
||||||
|
|
||||||
|
### 4.2 错误处理原则 / Error Handling Principles
|
||||||
|
|
||||||
|
| 原则 | 说明 |
|
||||||
|
|-----------|-------------------------------------------------------------------|
|
||||||
|
| 统一类型 | 所有公共 API 返回 `ImksResult<T>`,不暴露子模块错误 |
|
||||||
|
| 显式处理 | Handle all errors explicitly; no silent failures |
|
||||||
|
| `From` 转换 | 外部库错误通过 `#[from]` 自动转换,减少 `map_err` 样板 |
|
||||||
|
| 异步传播 | Use `?` operator; don't suppress errors in spawned tasks |
|
||||||
|
| Channel full | Handle `mpsc::TrySendError` gracefully (buffer or log); don't panic |
|
||||||
|
| gRPC errors | Map `tonic::Status` → `ImksError::GrpcStatus` |
|
||||||
|
|
||||||
|
### 4.3 错误日志格式 / Error Logging Format
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// 记录错误时包含完整上下文
|
||||||
|
tracing::error!(
|
||||||
|
error = %err,
|
||||||
|
socket_sid = %sid,
|
||||||
|
event = %event_name,
|
||||||
|
"Failed to handle socket event"
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4.4 现有子模块迁移 / Migration from Submodule Errors
|
||||||
|
|
||||||
|
当前部分子模块使用独立的 `thiserror` enum(`PacketError`、`AdapterError`),需要逐步迁移到 `ImksError`:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// Old
pub enum PacketError { ... }
pub fn decode(data: &str) -> Result<Packet, PacketError> { ... }

// New
pub fn decode(data: &str) -> ImksResult<Packet> {
    // Submodules may keep a private thiserror enum; convert to ImksError at the boundary
    inner_decode(data).map_err(|e| ImksError::InvalidSocketPacketFormat(e.to_string()))
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. 安全规范 / Security
|
||||||
|
|
||||||
|
> 用户认证、授权、密码管理、2FA 等企业级安全由 appks 统一处理。
|
||||||
|
> imks 作为内部消息服务,仅负责消息层面的安全。
|
||||||
|
|
||||||
|
### 5.1 基础安全 / Basic Security
|
||||||
|
|
||||||
|
| 规则 | 说明 |
|
||||||
|
|---------|-----------------------------------------------------------------|
|
||||||
|
| 密钥管理 | Never hardcode secrets or API keys; use environment variables |
|
||||||
|
| Input validation | Always validate and sanitize user input (message body, event name, namespace path) |
| SQL injection | Use parameterized queries with sqlx bind parameters (`$1`, `$2`, ...); never build SQL by string interpolation |
|
||||||
|
| gRPC 安全 | appks ↔ imks 的 gRPC 连接应使用 mTLS,防止密钥在传输中被截获 |
|
||||||
|
|
||||||
|
### 5.2 JWT 验证双模式 / Dual-Mode JWT Verification
|
||||||
|
|
||||||
|
imks 支持两种 JWT 验证模式(详见 `rpc.md`):
|
||||||
|
|
||||||
|
| 模式 | 方式 | 延迟 | 适用场景 |
|
||||||
|
|------------|---------------------------------------|-----|------------|
|
||||||
|
| **RPC 验证** | 调用 appks `TokenService.VerifyToken()` | 实时 | 敏感操作 |
|
||||||
|
| **本地验证** | 启动时拉取 `GetSigningKeys()` 缓存到本地 | 零延迟 | 高频操作(消息收发) |
|
||||||
|
|
||||||
|
Recommended strategy: verify locally for ordinary operations (sending messages, reading channels) and via RPC for sensitive operations.
|
||||||
|
|
||||||
|
### 5.3 连接安全 / Connection Security
|
||||||
|
|
||||||
|
| 要求 | 说明 |
|
||||||
|
|--------|------------------------------------------------|
|
||||||
|
| 命名空间验证 | 验证 namespace path(`/` 开头,≤256 字符,无控制字符),防止 DoS |
|
||||||
|
| 消息体大小 | 限制单条消息大小(`EngineConfig.max_payload`) |
|
||||||
|
| 速率限制 | 按 socket 限制消息发送频率 |
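The namespace path rule in the table above can be sketched as a small validator. This is a minimal illustration, not the project's implementation: `validate_namespace_path`, `NamespacePathError`, and `MAX_NAMESPACE_PATH_CHARS` are hypothetical names, and it assumes the 256 limit counts Unicode characters rather than bytes.

```rust
/// Maximum namespace path length (assumed to be counted in characters).
const MAX_NAMESPACE_PATH_CHARS: usize = 256;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum NamespacePathError {
    MissingLeadingSlash,
    TooLong,
    ControlCharacter,
}

/// Checks the three rules: leading `/`, length limit, no control characters.
pub fn validate_namespace_path(path: &str) -> Result<(), NamespacePathError> {
    if !path.starts_with('/') {
        return Err(NamespacePathError::MissingLeadingSlash);
    }
    if path.chars().count() > MAX_NAMESPACE_PATH_CHARS {
        return Err(NamespacePathError::TooLong);
    }
    if path.chars().any(char::is_control) {
        return Err(NamespacePathError::ControlCharacter);
    }
    Ok(())
}
```

Early returns keep the nesting flat, and the named constant avoids a magic number, in line with the code style rules earlier in this file.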
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. 数据库规范 / Database
|
||||||
|
|
||||||
|
### 6.1 基础规范 / Basic Rules
|
||||||
|
|
||||||
|
| 规则 | 说明 |
|
||||||
|
|---------|------------------------------------------------------------------|
|
||||||
|
| Parameterized queries | Always bind values as query parameters (`$1`, `$2`, ...); sqlx does not protect SQL assembled with `format!` |
|
||||||
|
| 迁移规范 | All schema changes must go through migration files in `migrate/` |
|
||||||
|
| UUID v7 | 所有主键使用 UUID v7(时间有序),便于索引和游标分页 |
|
||||||
|
| 软删除 | 使用 `deleted_at` 字段进行软删除,不用硬删除 |
|
||||||
|
| 去规范化 | 对于高频读取的聚合字段(如 `total_votes`、`unread_count`),可在表中冗余存储 |
|
||||||
|
|
||||||
|
### 6.2 imks 管理的数据库表 / imks-Managed Tables
|
||||||
|
|
||||||
|
imks 仅管理消息相关表,用户/频道/成员/权限等由 appks 核心服务管理。
|
||||||
|
|
||||||
|
| 表 | 对应模型文件 | 说明 |
|
||||||
|
|-----------------------|--------------------------------|-------------------|
|
||||||
|
| `message` | `models/message.rs` | 核心消息表(由 appks 创建) |
|
||||||
|
| `message_attachment` | `models/message_attachment.rs` | 文件附件 |
|
||||||
|
| `message_embed` | `models/message_embed.rs` | 富媒体嵌入 |
|
||||||
|
| `message_embed_field` | `models/message_embed.rs` | 嵌入字段 |
|
||||||
|
| `message_poll` | `models/message_poll.rs` | 投票 |
|
||||||
|
| `message_poll_option` | `models/message_poll.rs` | 投票选项 |
|
||||||
|
| `message_poll_vote` | `models/message_poll.rs` | 投票记录 |
|
||||||
|
| `message_pin` | `models/message_pin.rs` | 置顶消息 |
|
||||||
|
| `message_read_state` | `models/message_read_state.rs` | 已读状态 |
|
||||||
|
| `message_draft` | `models/message_draft.rs` | 草稿 |
|
||||||
|
| `message_edit` | `models/message_edit.rs` | 编辑历史 |
|
||||||
|
|
||||||
|
### 6.3 性能优化 / Performance Optimization
|
||||||
|
|
||||||
|
| 规则 | 说明 |
|
||||||
|
|-------------|---------------------------------------------------------------------------------|
|
||||||
|
| N+1 防护 | Use `JOIN` or batch queries instead of N+1 patterns |
|
||||||
|
| Cursor pagination | Use UUID v7 cursor-based pagination (`WHERE id < $3 ORDER BY id DESC`) instead of `OFFSET` |
|
||||||
|
| 索引规范 | Add indexes for frequently queried columns; document index rationale |
|
||||||
|
| ON CONFLICT | Use `INSERT ... ON CONFLICT` for upsert patterns(draft、read_state) |
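To make the cursor pagination rule concrete, here is an in-memory model of the `WHERE id < cursor ORDER BY id DESC LIMIT n` pattern. It is only a sketch: `page_before` and `next_cursor` are hypothetical helpers, and plain `u128` values stand in for time-ordered UUID v7 keys.

```rust
/// Returns up to `limit` ids strictly smaller than `cursor`, newest first.
/// `cursor = None` means "start from the newest row".
pub fn page_before(ids: &[u128], cursor: Option<u128>, limit: usize) -> Vec<u128> {
    let mut page: Vec<u128> = ids
        .iter()
        .copied()
        .filter(|id| cursor.map_or(true, |c| *id < c)) // WHERE id < cursor
        .collect();
    page.sort_unstable_by(|a, b| b.cmp(a)); // ORDER BY id DESC
    page.truncate(limit); // LIMIT
    page
}

/// The cursor for the next page is the last (smallest) id of the current page.
pub fn next_cursor(page: &[u128]) -> Option<u128> {
    page.last().copied()
}
```

Unlike `OFFSET`, the database never has to skip rows it already returned: each page starts from an indexed key comparison, so the cost of a page does not grow with its position.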
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Socket.IO 事件规范 / Socket.IO Event Conventions
|
||||||
|
|
||||||
|
### 7.1 事件命名 / Event Naming
|
||||||
|
|
||||||
|
imks 使用 Socket.IO 协议进行实时通信,事件名遵循以下约定:
|
||||||
|
|
||||||
|
```
|
||||||
|
// 客户端 → 服务端(发送操作)
|
||||||
|
"message:send" // 发送消息
|
||||||
|
"message:edit" // 编辑消息
|
||||||
|
"message:delete" // 删除消息
|
||||||
|
"typing:start" // 开始输入
|
||||||
|
"typing:stop" // 停止输入
|
||||||
|
"reaction:add" // 添加反应
|
||||||
|
"reaction:remove" // 移除反应
|
||||||
|
|
||||||
|
// 服务端 → 客户端(推送事件)
|
||||||
|
"message:new" // 新消息
|
||||||
|
"message:updated" // 消息已更新
|
||||||
|
"message:deleted" // 消息已删除
|
||||||
|
"typing" // 用户输入状态
|
||||||
|
"reaction:updated" // 反应已更新
|
||||||
|
"presence:update" // 在线状态变更
|
||||||
|
```
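Since this file forbids hardcoded status strings, one way to carry the client-to-server names in code is an enum with a single mapping to the wire name. The sketch below is an assumption about how that could look; `ClientEvent`, `as_str`, and `from_name` are hypothetical and not taken from the imks source.

```rust
/// Client-to-server event names from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClientEvent {
    MessageSend,
    MessageEdit,
    MessageDelete,
    TypingStart,
    TypingStop,
    ReactionAdd,
    ReactionRemove,
}

impl ClientEvent {
    /// Every variant, used for lookup by wire name.
    const ALL: [ClientEvent; 7] = [
        ClientEvent::MessageSend,
        ClientEvent::MessageEdit,
        ClientEvent::MessageDelete,
        ClientEvent::TypingStart,
        ClientEvent::TypingStop,
        ClientEvent::ReactionAdd,
        ClientEvent::ReactionRemove,
    ];

    /// Wire name of the event.
    pub const fn as_str(self) -> &'static str {
        match self {
            ClientEvent::MessageSend => "message:send",
            ClientEvent::MessageEdit => "message:edit",
            ClientEvent::MessageDelete => "message:delete",
            ClientEvent::TypingStart => "typing:start",
            ClientEvent::TypingStop => "typing:stop",
            ClientEvent::ReactionAdd => "reaction:add",
            ClientEvent::ReactionRemove => "reaction:remove",
        }
    }

    /// Parses a wire name; unknown names yield `None` instead of panicking.
    pub fn from_name(name: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|event| event.as_str() == name)
    }
}
```

Server-to-client names such as `message:new` are deliberately rejected here, so a client cannot trigger a push-only event by sending its name.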
|
||||||
|
|
||||||
|
### 7.2 事件数据结构 / Event Data Structure
|
||||||
|
|
||||||
|
```json
|
||||||
|
// 客户端发送消息事件
|
||||||
|
// Client sends: "message:send"
|
||||||
|
{
|
||||||
|
"channel_id": "01909a...",
|
||||||
|
"body": "hello world",
|
||||||
|
"thread_id": null,
|
||||||
|
"reply_to_message_id": null
|
||||||
|
}
|
||||||
|
|
||||||
|
// 服务端广播新消息
|
||||||
|
// Server broadcasts: "message:new"
|
||||||
|
{
|
||||||
|
"id": "01909b...",
|
||||||
|
"channel_id": "01909a...",
|
||||||
|
"author": { "id": "...", "username": "alice" },
|
||||||
|
"body": "hello world",
|
||||||
|
"created_at": "2026-06-11T10:00:00Z",
|
||||||
|
"reactions": {},
|
||||||
|
"attachment_count": 0
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### 7.3 房间(Room)机制 / Room Mechanism
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// channel_id 作为房间名,频道内消息只广播给加入该房间的 sockets
|
||||||
|
// Use channel_id as room name; messages broadcast only to sockets in that room
|
||||||
|
namespace.emit_to_room(&channel_id, "message:new", message_data).await;
|
||||||
|
```
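The room mechanism can be pictured as a map from room name to the set of member socket ids. The following is a single-node, in-memory sketch under that assumption; `RoomRegistry` and its methods are hypothetical and do not describe how `emit_to_room` is actually implemented.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical registry: room name (channel_id) -> member socket ids.
#[derive(Default)]
pub struct RoomRegistry {
    rooms: HashMap<String, HashSet<String>>,
}

impl RoomRegistry {
    /// Adds a socket to a room, creating the room on first join.
    pub fn join(&mut self, room: &str, socket_sid: &str) {
        self.rooms
            .entry(room.to_owned())
            .or_default()
            .insert(socket_sid.to_owned());
    }

    /// Removes a socket from a room and drops the room once it is empty.
    pub fn leave(&mut self, room: &str, socket_sid: &str) {
        let now_empty = match self.rooms.get_mut(room) {
            Some(members) => {
                members.remove(socket_sid);
                members.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.rooms.remove(room);
        }
    }

    /// Socket ids that would receive a broadcast, sorted for determinism.
    pub fn recipients(&self, room: &str) -> Vec<String> {
        let mut sids: Vec<String> = self
            .rooms
            .get(room)
            .map(|members| members.iter().cloned().collect())
            .unwrap_or_default();
        sids.sort();
        sids
    }
}
```

A broadcast to an unknown room simply has no recipients, which keeps the emit path free of error handling for the common "nobody is listening" case.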
|
||||||
|
|
||||||
|
### 7.4 适配器(Adapter)模式 / Adapter Pattern
|
||||||
|
|
||||||
|
imks scales horizontally through the `Adapter` trait, which broadcasts Socket.IO events across nodes:
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────┐ ┌─────────┐ ┌─────────┐
|
||||||
|
│ Node 1 │────→│ NATS / │←────│ Node 2 │
|
||||||
|
│ (imks) │ │ Redis │ │ (imks) │
|
||||||
|
└─────────┘ └─────────┘ └─────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
| Adapter | 适用场景 |
|
||||||
|
|---|---|
|
||||||
|
| `LocalAdapter` | 单节点开发/测试 |
|
||||||
|
| `RedisAdapter` | 生产环境 Redis Pub/Sub |
|
||||||
|
| `NatsAdapter` | 生产环境 NATS(更低延迟) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 8. 日志与可观测性 / Logging & Observability
|
||||||
|
|
||||||
|
### 8.1 日志规范 / Logging Standards
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// 使用 tracing crate 进行结构化日志
|
||||||
|
use tracing::{info, warn, error, debug};
|
||||||
|
|
||||||
|
info!(
|
||||||
|
socket_sid = %socket.sid,
|
||||||
|
engine_sid = %socket.engine_sid,
|
||||||
|
namespace = %namespace.path,
|
||||||
|
"Socket connected"
|
||||||
|
);
|
||||||
|
|
||||||
|
warn!(
    error = %e,
    socket_sid = %sid,
    "Adapter register failed"
);
|
||||||
|
|
||||||
|
error!(
|
||||||
|
error = %err,
|
||||||
|
engine_sid = %sid,
|
||||||
|
"Failed to handle engine message"
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
| 级别 | 用途 |
|
||||||
|
|---------|-------------------------------------|
|
||||||
|
| `error` | 错误需要立即关注(连接失败、数据库错误) |
|
||||||
|
| `warn` | 异常但可恢复的情况(adapter 注册失败、socket 发送失败) |
|
||||||
|
| `info` | 关键业务操作记录(连接/断开、命名空间创建) |
|
||||||
|
| `debug` | 开发调试信息(数据包收发详情) |
|
||||||
|
| `trace` | 详细执行路径 |
|
||||||
|
|
||||||
|
### 8.2 关键指标 / Key Metrics
|
||||||
|
|
||||||
|
| 指标 | 说明 |
|
||||||
|
|------------|-----------------------------------------------------|
|
||||||
|
| 活跃连接数 | Active WebSocket + Polling + WebTransport sessions |
|
||||||
|
| 消息吞吐量 | Messages sent/received per second |
|
||||||
|
| 广播延迟 | P50/P95/P99 broadcast latency across nodes |
|
||||||
|
| 事件处理延迟 | Event handling time (receive → broadcast → deliver) |
|
||||||
|
| Adapter 延迟 | NATS/Redis broadcast delay between nodes |
|
||||||
|
| 连接错误率 | Connection failure rate |
|
||||||
|
| 数据库查询延迟 | Message insert/select latency |
|
||||||
|
|
||||||
|
### 8.3 健康检查 / Health Check
|
||||||
|
|
||||||
|
```json
|
||||||
|
// GET /health
|
||||||
|
{
|
||||||
|
"status": "healthy",
|
||||||
|
"version": "0.1.0",
|
||||||
|
"uptime": 3600,
|
||||||
|
"checks": {
|
||||||
|
"postgres": { "status": "up", "latency_ms": 5 },
|
||||||
|
"redis": { "status": "up", "latency_ms": 2 },
|
||||||
|
"nats": { "status": "up", "latency_ms": 1 },
|
||||||
|
"appks_grpc": { "status": "up", "latency_ms": 3 }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 9. 性能规范 / Performance
|
||||||
|
|
||||||
|
### 9.1 实时消息 SLA / Real-time Messaging SLA
|
||||||
|
|
||||||
|
| 指标 | 目标 |
|
||||||
|
|--------------|---------------|
|
||||||
|
| 消息端到端延迟(P50) | <50ms |
|
||||||
|
| 消息端到端延迟(P95) | <200ms |
|
||||||
|
| 消息端到端延迟(P99) | <500ms |
|
||||||
|
| 连接建立时间 | <100ms |
|
||||||
|
| 消息吞吐量(单节点) | >10,000 msg/s |
|
||||||
|
| 错误率 | <0.1% |
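To check measured latencies against the P50/P95/P99 targets above, a percentile has to be computed from samples. The sketch below uses the nearest-rank method, which is one common definition and an assumption here, since this file does not say how percentiles are measured. `percentile_ms`, `meets_latency_sla`, and the constant names are hypothetical.

```rust
/// Latency targets from the SLA table, in milliseconds.
const P50_TARGET_MS: u64 = 50;
const P95_TARGET_MS: u64 = 200;
const P99_TARGET_MS: u64 = 500;

/// Nearest-rank percentile over latency samples in milliseconds.
/// Returns `None` for an empty sample set or a percentile outside 1..=100.
pub fn percentile_ms(samples: &[u64], percentile: u32) -> Option<u64> {
    if samples.is_empty() || percentile == 0 || percentile > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(percentile / 100 * n), computed with integer arithmetic
    let rank = (percentile as usize * sorted.len() + 99) / 100;
    sorted.get(rank - 1).copied()
}

/// True when all three percentiles are strictly below their targets.
pub fn meets_latency_sla(samples: &[u64]) -> bool {
    let below = |percentile: u32, target: u64| {
        percentile_ms(samples, percentile).map_or(false, |value| value < target)
    };
    below(50, P50_TARGET_MS) && below(95, P95_TARGET_MS) && below(99, P99_TARGET_MS)
}
```

With samples of 1 ms through 100 ms, the nearest-rank P50 is exactly 50 ms, which fails the strict `<50ms` target even though P95 and P99 pass comfortably.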
|
||||||
|
|
||||||
|
### 9.2 性能原则 / Performance Principles
|
||||||
|
|
||||||
|
| 原则 | 说明 |
|
||||||
|
|------|---------------------------------------------------------------------|
|
||||||
|
| 零拷贝 | Minimize data copying; use references where possible |
|
||||||
|
| 批量操作 | Batch adapter broadcasts; use pipelining for Redis |
|
||||||
|
| Prefer low contention | Use `DashMap` (sharded locks, low contention) for hot-path data; `RwLock` for cold-path |
|
||||||
|
| 背压控制 | Use bounded `mpsc::channel` (256) to prevent memory blowout |
|
||||||
|
| 连接复用 | Reuse gRPC channels to appks |
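The backpressure rule can be illustrated with a bounded channel whose sender never blocks. This sketch uses `std::sync::mpsc::sync_channel` as a stand-in so it runs without an async runtime; the project's own channel type is presumably an async `mpsc`, so this is an assumption made for the example. `SendOutcome`, `try_deliver`, `socket_channel`, and `SOCKET_CHANNEL_CAPACITY` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Per-socket buffer size from the table above.
pub const SOCKET_CHANNEL_CAPACITY: usize = 256;

/// Result of a non-blocking delivery attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum SendOutcome {
    /// Packet was placed in the buffer.
    Queued,
    /// Receiver is slow: apply backpressure (drop, buffer elsewhere, or log).
    Full,
    /// Receiver is gone: the socket should be cleaned up.
    Disconnected,
}

/// Creates a bounded channel for one socket.
pub fn socket_channel(capacity: usize) -> (SyncSender<String>, Receiver<String>) {
    sync_channel(capacity)
}

/// Tries to enqueue a packet without blocking and without panicking.
pub fn try_deliver(tx: &SyncSender<String>, packet: String) -> SendOutcome {
    match tx.try_send(packet) {
        Ok(()) => SendOutcome::Queued,
        Err(TrySendError::Full(_)) => SendOutcome::Full,
        Err(TrySendError::Disconnected(_)) => SendOutcome::Disconnected,
    }
}
```

Because the buffer is bounded, a slow client costs at most `SOCKET_CHANNEL_CAPACITY` queued packets of memory, and the `Full` case maps naturally onto an error such as `ImksError::SocketSendFull`.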
|
||||||
|
|
||||||
|
### 9.3 优化策略 / Optimization Strategies
|
||||||
|
|
||||||
|
| 场景 | 策略 |
|
||||||
|
|---------|----------------------------------|
|
||||||
|
| 跨节点广播 | NATS(<1ms P50)优于 Redis(~2ms P50) |
|
||||||
|
| 消息持久化 | 异步写入 + 批量 COMMIT |
|
||||||
|
| Session lookup | Direct `DashMap` lookup; sharded locking keeps contention low |
|
||||||
|
| gRPC 调用 | 连接池 + 本地密钥缓存减少 RPC 往返 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 10. 测试规范 / Testing
|
||||||
|
|
||||||
|
### 10.1 基础要求 / Basic Requirements
|
||||||
|
|
||||||
|
| 规则 | 说明 |
|
||||||
|
|--------|-------------------------------------------------------------|
|
||||||
|
| 新功能 | All new features must have unit tests |
|
||||||
|
| Bug 修复 | Bug fixes must include regression tests |
|
||||||
|
| 模型测试 | All model files must have serialization/conversion tests |
|
||||||
|
| 测试隔离 | Tests must be independent and not depend on execution order |
|
||||||
|
|
||||||
|
### 10.2 测试命令 / Test Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cargo test # 运行所有测试
|
||||||
|
cargo test --lib # 仅运行 lib 测试
|
||||||
|
cargo test models:: # 运行 models 模块测试
|
||||||
|
cargo test socket::parser:: # 运行 socket parser 测试
|
||||||
|
cargo test -- --nocapture # 显示输出
|
||||||
|
```
|
||||||
|
|
||||||
|
### 10.3 测试文件组织 / Test File Organization
|
||||||
|
|
||||||
|
```
|
||||||
|
tests/
|
||||||
|
├── engine_io_tests.rs # Engine.IO 协议测试
|
||||||
|
├── socket_io_tests.rs # Socket.IO 协议测试
|
||||||
|
├── adapter_tests.rs # Adapter 测试
|
||||||
|
└── session_tests.rs # 会话管理测试
|
||||||
|
|
||||||
|
# 单元测试 (inline)
|
||||||
|
models/message.rs → #[cfg(test)] mod tests { ... }
|
||||||
|
models/message_poll.rs → #[cfg(test)] mod tests { ... }
|
||||||
|
socket/parser.rs → #[cfg(test)] mod tests { ... }
|
||||||
|
engine/codec.rs → #[cfg(test)] mod tests { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 11. Git 规范 / Git Workflow
|
||||||
|
|
||||||
|
### 11.1 提交信息格式 / Commit Message Format
|
||||||
|
|
||||||
|
使用 Angular 风格,全部英文:
|
||||||
|
|
||||||
|
```
|
||||||
|
<type>(<scope>): <subject>
|
||||||
|
|
||||||
|
[optional body]
|
||||||
|
|
||||||
|
[optional footer]
|
||||||
|
```
|
||||||
|
|
||||||
|
| Type | 说明 |
|
||||||
|
|------------|--------|
|
||||||
|
| `feat` | 新功能 |
|
||||||
|
| `fix` | Bug 修复 |
|
||||||
|
| `refactor` | 重构 |
|
||||||
|
| `docs` | 文档 |
|
||||||
|
| `test` | 测试 |
|
||||||
|
| `chore` | 构建/工具 |
|
||||||
|
| `perf` | 性能优化 |
|
||||||
|
| `style` | 代码格式 |
|
||||||
|
|
||||||
|
**示例 / Examples:**
|
||||||
|
```
|
||||||
|
feat(models): add MessagePin, MessageReadState, MessageDraft, MessageEdit models
|
||||||
|
fix(socket): handle namespace validation on connect
|
||||||
|
refactor(engine): extract session store to dedicated module
|
||||||
|
docs(readme): add architecture overview
|
||||||
|
test(parser): add edge cases for binary event decoding
|
||||||
|
chore(deps): update tonic to 0.14
|
||||||
|
```
|
||||||
|
|
||||||
|
### 11.2 提交原则 / Commit Principles
|
||||||
|
|
||||||
|
| 原则 | 说明 |
|
||||||
|
|--------|----------------------------------------------------------|
|
||||||
|
| 原子提交 | Each commit should address one concern |
|
||||||
|
| 完整性 | Each commit should leave the codebase in a working state |
|
||||||
|
| 禁止强制推送 | Never force push to main branch |
|
||||||
|
| 提交前检查 | Run `cargo check` and `cargo test` before committing |
|
||||||
|
|
||||||
|
### 11.3 分支策略 / Branch Strategy
|
||||||
|
|
||||||
|
| 分支 | 用途 |
|
||||||
|
|-------------|--------|
|
||||||
|
| `main` | 生产就绪代码 |
|
||||||
|
| `feat/*` | 功能开发 |
|
||||||
|
| `fix/*` | Bug 修复 |
|
||||||
|
| `release/*` | 发布准备 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 12. 工作流程 / Workflow
|
||||||
|
|
||||||
|
### 12.1 开发流程 / Development Process
|
||||||
|
|
||||||
|
1. **理解先于编写** — Read before write; understand context first
|
||||||
|
2. **最小变更** — Minimal changes; don't refactor unrelated code
|
||||||
|
3. **验证变更** — Verify after changes; run tests or check output
|
||||||
|
4. **文档同步** — Update documentation when changing public APIs
|
||||||
|
|
||||||
|
### 12.2 AI 助手工作规范 / AI Assistant Guidelines
|
||||||
|
|
||||||
|
| 规则 | 说明 |
|
||||||
|
|--------|-----------------------------------------------------|
|
||||||
|
| 先读后写 | Always read existing code before making changes |
|
||||||
|
| 最小侵入 | Make minimal changes; don't refactor unrelated code |
|
||||||
|
| 验证结果 | Run `cargo check` or `cargo test` after changes |
|
||||||
|
| 解释变更 | Explain what you changed and why |
|
||||||
|
| 询问不确定 | Ask when unsure about requirements |
|
||||||
|
| 遵守禁止模式 | Never use `// ── xxxx ──────────` style comments |
|
||||||
|
|
||||||
|
### 12.3 常用命令 / Common Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cargo build # 构建
|
||||||
|
cargo check # 快速检查(推荐开发时使用)
|
||||||
|
cargo test # 运行测试
|
||||||
|
cargo test --lib # 仅运行 lib 测试
|
||||||
|
cargo clippy # Lint 检查
|
||||||
|
cargo fmt # 格式化
|
||||||
|
cargo doc --no-deps # 生成文档
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 13. 架构决策记录 / ADR
|
||||||
|
|
||||||
|
架构决策记录存放在 `docs/adr/` 目录下,使用 Markdown 格式。
|
||||||
|
|
||||||
|
### 当前决策 / Current Decisions
|
||||||
|
|
||||||
|
| ADR | 标题 | 状态 |
|
||||||
|
|---|---|---|
|
||||||
|
| — | Socket.IO 作为实时通信协议 | Accepted |
|
||||||
|
| — | UUID v7 作为主键实现游标分页 | Accepted |
|
||||||
|
| — | 双模式 JWT 验证(本地 + RPC) | Accepted |
|
||||||
|
| — | Adapter 模式支持多节点水平扩展 | Accepted |
|
||||||
|
| — | 消息表由 appks 管理,imks 仅扩展富内容表 | Accepted |
|
||||||
|
|
||||||
|
### ADR 模板 / ADR Template
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
# ADR-NNN: 标题
|
||||||
|
|
||||||
|
## 状态
|
||||||
|
Accepted | Superseded | Deprecated
|
||||||
|
|
||||||
|
## 背景
|
||||||
|
描述问题背景
|
||||||
|
|
||||||
|
## 决策
|
||||||
|
描述做出的决策
|
||||||
|
|
||||||
|
## 后果
|
||||||
|
描述正面和负面影响
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 14. 审查清单 / Review Checklist
|
||||||
|
|
||||||
|
### 代码审查 / Code Review
|
||||||
|
|
||||||
|
- [ ] 代码风格符合项目规范(无 `// ──` 分隔线)
|
||||||
|
- [ ] 没有使用禁止模式(unwrap、panic、todo 等)
|
||||||
|
- [ ] 错误处理完整(?传播、具体类型)
|
||||||
|
- [ ] 安全考虑已处理(JWT 验证模式选择正确)
|
||||||
|
- [ ] 性能影响已评估(无 N+1、无阻塞调用)
|
||||||
|
- [ ] 测试已添加(model 文件必须有测试)
|
||||||
|
- [ ] 文档已更新(新 struct 有 doc comment)
|
||||||
|
|
||||||
|
### PR 审查 / PR Review
|
||||||
|
|
||||||
|
- [ ] 提交信息符合 Angular 风格
|
||||||
|
- [ ] 每个提交只关注一个问题
|
||||||
|
- [ ] 变更范围合理
|
||||||
|
- [ ] 没有遗留的 TODO/FIXME
|
||||||
|
- [ ] `cargo check` 和 `cargo test` 通过
|
||||||
|
|
||||||
|
### 发布前审查 / Pre-release Review
|
||||||
|
|
||||||
|
- [ ] 所有测试通过
|
||||||
|
- [ ] 无 clippy warning
|
||||||
|
- [ ] 迁移 SQL 已包含在 `migrate/` 中
|
||||||
|
- [ ] 依赖安全审计通过(`cargo audit`)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 附录 / Appendix
|
||||||
|
|
||||||
|
### 项目架构速查 / Quick Architecture Reference
|
||||||
|
|
||||||
|
```
|
||||||
|
imks — IM 实时消息服务 / Real-time Messaging Service
|
||||||
|
|
||||||
|
┌──────────────────────────────────────────────┐
|
||||||
|
│ imks │
|
||||||
|
│ │
|
||||||
|
│ ┌─────────────┐ ┌─────────────────────────┐ │
|
||||||
|
│ │ engine/ │ │ socket/ │ │
|
||||||
|
│ │ │ │ │ │
|
||||||
|
│ │ • websocket │ │ • server (Socket.IO) │ │
|
||||||
|
│ │ • webtransport│ │ • namespace (rooms) │ │
|
||||||
|
│ │ • polling │ │ • parser (protocol) │ │
|
||||||
|
│ │ • session │ │ • adapter (scale-out) │ │
|
||||||
|
│ │ • packet │ │ ├─ local │ │
|
||||||
|
│ │ • codec │ │ ├─ redis │ │
|
||||||
|
│ │ • heartbeat │ │ └─ nats │ │
|
||||||
|
│ │ • server │ │ • message_bus │ │
|
||||||
|
│ └─────────────┘ │ • session_store │ │
|
||||||
|
│ └─────────────────────────┘ │
|
||||||
|
│ ┌─────────────┐ ┌─────────────────────────┐ │
|
||||||
|
│ │ models/ │ │ pb/ │ │
|
||||||
|
│ │ │ │ │ │
|
||||||
|
│ │ 12 个消息 │ │ gRPC client stubs │ │
|
||||||
|
│ │ 领域模型 │ │ → appks TokenService │ │
|
||||||
|
│ │ │ │ → appks ChannelService │ │
|
||||||
|
│ │ message │ │ → appks MemberService │ │
|
||||||
|
│ │ attachment │ │ → appks PermissionService│ │
|
||||||
|
│ │ embed/poll │ │ │ │
|
||||||
|
│ │ reaction │ │ │ │
|
||||||
|
│ │ thread/pin │ │ │ │
|
||||||
|
│ │ draft/edit │ │ │ │
|
||||||
|
│ │ mention │ │ │ │
|
||||||
|
│ │ bookmark │ │ │ │
|
||||||
|
│ │ read_state │ │ │ │
|
||||||
|
│ └─────────────┘ └─────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌─────────────┐ │
|
||||||
|
│ │ migrate/ │ │
|
||||||
|
│ │ │ │
|
||||||
|
│ │ SQL 迁移 │ │
|
||||||
|
│ └─────────────┘ │
|
||||||
|
└─────────────────────┬────────────────────────┘
|
||||||
|
│ gRPC
|
||||||
|
▼
|
||||||
|
┌──────────────────────────────────────────────┐
|
||||||
|
│ appks (core) │
|
||||||
|
│ │
|
||||||
|
│ TokenService │ ChannelService │ MemberSvc │
|
||||||
|
│ PermissionSvc │ WebhookSvc │ EmojiSvc │
|
||||||
|
│ │
|
||||||
|
│ Postgres (users, channels, members, ...) │
|
||||||
|
│ Redis (JWT keys, sessions, rate limiting) │
|
||||||
|
└──────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
### Infrastructure Quick Reference

| Service | Purpose | Protocol/Library |
|---|---|---|
| Postgres | Message data persistence | sqlx |
| Redis | Adapter broadcast / session storage | fred |
| NATS | Adapter broadcast (low-latency alternative) | async-nats |
| appks gRPC | JWT verification / channel, member, and permission queries | tonic |
|
||||||
|
|
||||||
|
### Transport Comparison

| Transport | When to use | Characteristics |
|---|---|---|
| **Polling** | Fallback when the browser does not support WebSocket | Best compatibility, high latency |
| **WebSocket** | Mainstream browsers / mobile clients | Full duplex, low latency |
| **WebTransport** | Modern browsers (Chrome 97+) | QUIC-based, multiplexed, no head-of-line blocking |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*This document is maintained by the development team. For questions or suggestions, please open an issue.*
|
||||||
Generated
+265
@@ -1449,13 +1449,16 @@ version = "0.1.20"
|
|||||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
checksum = "96547c2556ec9d12fb1578c4eaf448b04993e7fb79cbaad930a656880a6bdfa0"
|
checksum = "96547c2556ec9d12fb1578c4eaf448b04993e7fb79cbaad930a656880a6bdfa0"
|
||||||
dependencies = [
|
dependencies = [
|
||||||
|
"base64",
|
||||||
"bytes",
|
"bytes",
|
||||||
"futures-channel",
|
"futures-channel",
|
||||||
"futures-util",
|
"futures-util",
|
||||||
"http 1.4.2",
|
"http 1.4.2",
|
||||||
"http-body",
|
"http-body",
|
||||||
"hyper",
|
"hyper",
|
||||||
|
"ipnet",
|
||||||
"libc",
|
"libc",
|
||||||
|
"percent-encoding",
|
||||||
"pin-project-lite",
|
"pin-project-lite",
|
||||||
"socket2 0.6.4",
|
"socket2 0.6.4",
|
||||||
"tokio",
|
"tokio",
|
||||||
@@ -1612,6 +1615,12 @@ dependencies = [
|
|||||||
"fred",
|
"fred",
|
||||||
"futures-util",
|
"futures-util",
|
||||||
"jsonwebtoken",
|
"jsonwebtoken",
|
||||||
|
"opentelemetry",
|
||||||
|
"opentelemetry-appender-tracing",
|
||||||
|
"opentelemetry-otlp",
|
||||||
|
"opentelemetry-prometheus",
|
||||||
|
"opentelemetry_sdk",
|
||||||
|
"prometheus",
|
||||||
"prost",
|
"prost",
|
||||||
"prost-types",
|
"prost-types",
|
||||||
"rand 0.9.4",
|
"rand 0.9.4",
|
||||||
@@ -1626,6 +1635,7 @@ dependencies = [
|
|||||||
"tonic-prost",
|
"tonic-prost",
|
||||||
"tonic-prost-build",
|
"tonic-prost-build",
|
||||||
"tracing",
|
"tracing",
|
||||||
|
"tracing-opentelemetry",
|
||||||
"tracing-subscriber",
|
"tracing-subscriber",
|
||||||
"uuid",
|
"uuid",
|
||||||
"walkdir",
|
"walkdir",
|
||||||
@@ -1650,6 +1660,12 @@ dependencies = [
|
|||||||
"serde_core",
|
"serde_core",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "ipnet"
|
||||||
|
version = "2.12.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "d98f6fed1fde3f8c21bc40a1abb88dd75e67924f9cffc3ef95607bad8017f8e2"
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "itertools"
|
name = "itertools"
|
||||||
version = "0.14.0"
|
version = "0.14.0"
|
||||||
@@ -1966,6 +1982,108 @@ version = "0.2.1"
|
|||||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
checksum = "7c87def4c32ab89d880effc9e097653c8da5d6ef28e6b539d313baaacfbafcbe"
|
checksum = "7c87def4c32ab89d880effc9e097653c8da5d6ef28e6b539d313baaacfbafcbe"
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "opentelemetry"
|
||||||
|
version = "0.32.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "b0142c63252a9e054e68a4c61a5778f7b14f576274d593f8ce883d191a099682"
|
||||||
|
dependencies = [
|
||||||
|
"futures-core",
|
||||||
|
"futures-sink",
|
||||||
|
"js-sys",
|
||||||
|
"pin-project-lite",
|
||||||
|
"thiserror 2.0.18",
|
||||||
|
"tracing",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "opentelemetry-appender-tracing"
|
||||||
|
version = "0.32.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "2c0080f0dc1d7c786f467cd85a4e395fcab11ee852004f39a29a18ab7c25d837"
|
||||||
|
dependencies = [
|
||||||
|
"opentelemetry",
|
||||||
|
"tracing",
|
||||||
|
"tracing-core",
|
||||||
|
"tracing-subscriber",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "opentelemetry-http"
|
||||||
|
version = "0.32.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "5683015d09e2df236ef005b17f6f196f0d5f6313c4fa43a7b6a53b52776e4331"
|
||||||
|
dependencies = [
|
||||||
|
"async-trait",
|
||||||
|
"bytes",
|
||||||
|
"http 1.4.2",
|
||||||
|
"opentelemetry",
|
||||||
|
"reqwest",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "opentelemetry-otlp"
|
||||||
|
version = "0.32.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "9966929966d17620d7c316c643ba62631826e10021409357772d5eea84f62c35"
|
||||||
|
dependencies = [
|
||||||
|
"http 1.4.2",
|
||||||
|
"opentelemetry",
|
||||||
|
"opentelemetry-http",
|
||||||
|
"opentelemetry-proto",
|
||||||
|
"opentelemetry_sdk",
|
||||||
|
"prost",
|
||||||
|
"reqwest",
|
||||||
|
"thiserror 2.0.18",
|
||||||
|
"tokio",
|
||||||
|
"tonic",
|
||||||
|
"tonic-types",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "opentelemetry-prometheus"
|
||||||
|
version = "0.32.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "2c0359983e7f79cf33c9abd89e5d7ddf67c46c419d0148598022d70e70c01aba"
|
||||||
|
dependencies = [
|
||||||
|
"once_cell",
|
||||||
|
"opentelemetry",
|
||||||
|
"opentelemetry_sdk",
|
||||||
|
"prometheus",
|
||||||
|
"tracing",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "opentelemetry-proto"
|
||||||
|
version = "0.32.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "56d658ba1faf63f7b9c492cfbe6e0ec365440a16132d3270c1065f7b33f1b638"
|
||||||
|
dependencies = [
|
||||||
|
"opentelemetry",
|
||||||
|
"opentelemetry_sdk",
|
||||||
|
"prost",
|
||||||
|
"tonic",
|
||||||
|
"tonic-prost",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "opentelemetry_sdk"
|
||||||
|
version = "0.32.1"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "9b59f80e1ac4d5ff7a2db8fb6c80badb7f0f3f858211fba08dd9aaec750894f9"
|
||||||
|
dependencies = [
|
||||||
|
"futures-channel",
|
||||||
|
"futures-executor",
|
||||||
|
"futures-util",
|
||||||
|
"opentelemetry",
|
||||||
|
"percent-encoding",
|
||||||
|
"portable-atomic",
|
||||||
|
"rand 0.9.4",
|
||||||
|
"thiserror 2.0.18",
|
||||||
|
"tokio",
|
||||||
|
"tokio-stream",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "parking"
|
name = "parking"
|
||||||
version = "2.2.1"
|
version = "2.2.1"
|
||||||
@@ -2122,6 +2240,21 @@ dependencies = [
|
|||||||
"unicode-ident",
|
"unicode-ident",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "prometheus"
|
||||||
|
version = "0.14.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "3ca5326d8d0b950a9acd87e6a3f94745394f62e4dae1b1ee22b2bc0c394af43a"
|
||||||
|
dependencies = [
|
||||||
|
"cfg-if",
|
||||||
|
"fnv",
|
||||||
|
"lazy_static",
|
||||||
|
"memchr",
|
||||||
|
"parking_lot",
|
||||||
|
"protobuf",
|
||||||
|
"thiserror 2.0.18",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "prost"
|
name = "prost"
|
||||||
version = "0.14.4"
|
version = "0.14.4"
|
||||||
@@ -2175,6 +2308,26 @@ dependencies = [
|
|||||||
"prost",
|
"prost",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "protobuf"
|
||||||
|
version = "3.7.2"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "d65a1d4ddae7d8b5de68153b48f6aa3bba8cb002b243dbdbc55a5afbc98f99f4"
|
||||||
|
dependencies = [
|
||||||
|
"once_cell",
|
||||||
|
"protobuf-support",
|
||||||
|
"thiserror 1.0.69",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "protobuf-support"
|
||||||
|
version = "3.7.2"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "3e36c2f31e0a47f9280fb347ef5e461ffcd2c52dd520d8e216b52f93b0b0d7d6"
|
||||||
|
dependencies = [
|
||||||
|
"thiserror 1.0.69",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "pulldown-cmark"
|
name = "pulldown-cmark"
|
||||||
version = "0.13.4"
|
version = "0.13.4"
|
||||||
@@ -2418,6 +2571,37 @@ version = "0.8.11"
|
|||||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
checksum = "d6f6ff9a378485b298a5286656da665ba74413d36db0979633275d2e708145d4"
|
checksum = "d6f6ff9a378485b298a5286656da665ba74413d36db0979633275d2e708145d4"
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "reqwest"
|
||||||
|
version = "0.13.4"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "219c5811de6525e5416c7d5d53bb656d3afdbc6c5af816e0802bcfa42dbdc1c3"
|
||||||
|
dependencies = [
|
||||||
|
"base64",
|
||||||
|
"bytes",
|
||||||
|
"futures-channel",
|
||||||
|
"futures-core",
|
||||||
|
"futures-util",
|
||||||
|
"http 1.4.2",
|
||||||
|
"http-body",
|
||||||
|
"http-body-util",
|
||||||
|
"hyper",
|
||||||
|
"hyper-util",
|
||||||
|
"js-sys",
|
||||||
|
"log",
|
||||||
|
"percent-encoding",
|
||||||
|
"pin-project-lite",
|
||||||
|
"sync_wrapper",
|
||||||
|
"tokio",
|
||||||
|
"tower",
|
||||||
|
"tower-http",
|
||||||
|
"tower-service",
|
||||||
|
"url",
|
||||||
|
"wasm-bindgen",
|
||||||
|
"wasm-bindgen-futures",
|
||||||
|
"web-sys",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "ring"
|
name = "ring"
|
||||||
version = "0.17.14"
|
version = "0.17.14"
|
||||||
@@ -3073,6 +3257,9 @@ name = "sync_wrapper"
|
|||||||
version = "1.0.2"
|
version = "1.0.2"
|
||||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
checksum = "0bf256ce5efdfa370213c1dabab5935a12e49f2c58d15e9eac2870d3b4f27263"
|
checksum = "0bf256ce5efdfa370213c1dabab5935a12e49f2c58d15e9eac2870d3b4f27263"
|
||||||
|
dependencies = [
|
||||||
|
"futures-core",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "synstructure"
|
name = "synstructure"
|
||||||
@@ -3370,6 +3557,17 @@ dependencies = [
|
|||||||
"tonic-build",
|
"tonic-build",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "tonic-types"
|
||||||
|
version = "0.14.6"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "73ab1b02061f83d519bba3caa167f88f261ef05720ab8ebc954ade70de3348e8"
|
||||||
|
dependencies = [
|
||||||
|
"prost",
|
||||||
|
"prost-types",
|
||||||
|
"tonic",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "tower"
|
name = "tower"
|
||||||
version = "0.5.3"
|
version = "0.5.3"
|
||||||
@@ -3389,6 +3587,24 @@ dependencies = [
|
|||||||
"tracing",
|
"tracing",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "tower-http"
|
||||||
|
version = "0.6.11"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "4cfcf7e2740e6fc6d4d688b4ef00650406bb94adf4731e43c096c3a19fe40840"
|
||||||
|
dependencies = [
|
||||||
|
"bitflags",
|
||||||
|
"bytes",
|
||||||
|
"futures-util",
|
||||||
|
"http 1.4.2",
|
||||||
|
"http-body",
|
||||||
|
"pin-project-lite",
|
||||||
|
"tower",
|
||||||
|
"tower-layer",
|
||||||
|
"tower-service",
|
||||||
|
"url",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "tower-layer"
|
name = "tower-layer"
|
||||||
version = "0.3.3"
|
version = "0.3.3"
|
||||||
@@ -3445,6 +3661,32 @@ dependencies = [
|
|||||||
"tracing-core",
|
"tracing-core",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "tracing-opentelemetry"
|
||||||
|
version = "0.33.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "adbc64cba7137545b8044cb1fe9814f7aacf3c6b5f9b45be8bb5db538befdb26"
|
||||||
|
dependencies = [
|
||||||
|
"js-sys",
|
||||||
|
"opentelemetry",
|
||||||
|
"smallvec",
|
||||||
|
"tracing",
|
||||||
|
"tracing-core",
|
||||||
|
"tracing-log",
|
||||||
|
"tracing-subscriber",
|
||||||
|
"web-time",
|
||||||
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "tracing-serde"
|
||||||
|
version = "0.2.0"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "704b1aeb7be0d0a84fc9828cae51dab5970fee5088f83d1dd7ee6f6246fc6ff1"
|
||||||
|
dependencies = [
|
||||||
|
"serde",
|
||||||
|
"tracing-core",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "tracing-subscriber"
|
name = "tracing-subscriber"
|
||||||
version = "0.3.23"
|
version = "0.3.23"
|
||||||
@@ -3455,12 +3697,15 @@ dependencies = [
|
|||||||
"nu-ansi-term",
|
"nu-ansi-term",
|
||||||
"once_cell",
|
"once_cell",
|
||||||
"regex-automata",
|
"regex-automata",
|
||||||
|
"serde",
|
||||||
|
"serde_json",
|
||||||
"sharded-slab",
|
"sharded-slab",
|
||||||
"smallvec",
|
"smallvec",
|
||||||
"thread_local",
|
"thread_local",
|
||||||
"tracing",
|
"tracing",
|
||||||
"tracing-core",
|
"tracing-core",
|
||||||
"tracing-log",
|
"tracing-log",
|
||||||
|
"tracing-serde",
|
||||||
]
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
@@ -3646,6 +3891,16 @@ dependencies = [
|
|||||||
"wasm-bindgen-shared",
|
"wasm-bindgen-shared",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "wasm-bindgen-futures"
|
||||||
|
version = "0.4.73"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "54568702fabf5d4849ce2b90fadfa64168a097eaf4b351ce9df8b687a0086aaf"
|
||||||
|
dependencies = [
|
||||||
|
"js-sys",
|
||||||
|
"wasm-bindgen",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "wasm-bindgen-macro"
|
name = "wasm-bindgen-macro"
|
||||||
version = "0.2.123"
|
version = "0.2.123"
|
||||||
@@ -3712,6 +3967,16 @@ dependencies = [
|
|||||||
"semver",
|
"semver",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[[package]]
|
||||||
|
name = "web-sys"
|
||||||
|
version = "0.3.100"
|
||||||
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||||
|
checksum = "6e0871acf327f283dc6da28a1696cdc64fb355ba9f935d052021fa77f35cce69"
|
||||||
|
dependencies = [
|
||||||
|
"js-sys",
|
||||||
|
"wasm-bindgen",
|
||||||
|
]
|
||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "web-time"
|
name = "web-time"
|
||||||
version = "1.1.0"
|
version = "1.1.0"
|
||||||
|
|||||||
+8
-1
@@ -36,7 +36,14 @@ dashmap = "6"
|
|||||||
thiserror = "2"
|
thiserror = "2"
|
||||||
async-trait = "0.1"
|
async-trait = "0.1"
|
||||||
tracing = "0.1"
|
tracing = "0.1"
|
||||||
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
|
tracing-subscriber = { version = "0.3", features = ["env-filter", "json", "fmt", "registry"] }
|
||||||
|
opentelemetry = { version = "0.32", features = ["trace", "metrics", "logs"] }
|
||||||
|
opentelemetry_sdk = { version = "0.32", features = ["trace", "metrics", "logs", "rt-tokio"] }
|
||||||
|
opentelemetry-otlp = { version = "0.32", features = ["trace", "metrics", "logs", "grpc-tonic", "http-proto", "tls-ring"] }
|
||||||
|
tracing-opentelemetry = "0.33"
|
||||||
|
opentelemetry-appender-tracing = "0.32"
|
||||||
|
opentelemetry-prometheus = "0.32"
|
||||||
|
prometheus = "0.14"
|
||||||
fred = { version = "10", features = ["subscriber-client"] }
|
fred = { version = "10", features = ["subscriber-client"] }
|
||||||
async-nats = "0.38"
|
async-nats = "0.38"
|
||||||
futures-util = "0.3"
|
futures-util = "0.3"
|
||||||
|
|||||||
@@ -0,0 +1,126 @@
|
|||||||
|
# imks — IM 实时消息服务
|
||||||
|
|
||||||
|
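A real-time instant messaging (IM) service built on the **Engine.IO + Socket.IO** protocols. It supports WebSocket, WebTransport, and HTTP long-polling transports, and integrates with the [appks](https://github.com/your-org/appks) core service over gRPC to provide authentication, permissions, message persistence, and cross-node broadcast.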
|
||||||
|
|
||||||
|
## 架构
|
||||||
|
|
||||||
|
```
|
||||||
|
Client (Browser/App)
|
||||||
|
│ Socket.IO over WebSocket / WebTransport / Polling
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────┐
|
||||||
|
│ imks │
|
||||||
|
│ │
|
||||||
|
│ engine/ socket/ │
|
||||||
|
│ • WS/WT/ • Socket.IO Server │
|
||||||
|
│ Polling • Namespace/Room │
|
||||||
|
│ • Session • Adapter (Redis/NATS)│
|
||||||
|
│ • Heartbeat • Message Bus │
|
||||||
|
│ │
|
||||||
|
│ models/ repo/ svc/ │
|
||||||
|
│ • 20+ 消息 • SQL CRUD • 业务 │
|
||||||
|
│ 领域模型 • 分页查询 逻辑层 │
|
||||||
|
│ │
|
||||||
|
│ auth/ rpc/ │
|
||||||
|
│ • JWT 双模 • gRPC Stubs │
|
||||||
|
│ 验证 • Token/Channel/ │
|
||||||
|
│ • 密钥缓存 Member/Permission │
|
||||||
|
└──────────────┬──────────────────────┘
|
||||||
|
│ gRPC (mTLS)
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────┐
|
||||||
|
│ appks (core) │
|
||||||
|
│ Token │ Channel │ Member │ ... │
|
||||||
|
│ Postgres • Redis • NATS │
|
||||||
|
└─────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
## 快速开始
|
||||||
|
|
||||||
|
### 前置依赖
|
||||||
|
|
||||||
|
- **Rust** 1.85+ (edition 2024)
|
||||||
|
- **PostgreSQL** 16+ (消息持久化)
|
||||||
|
- **appks** gRPC 服务 (认证 & 权限)
|
||||||
|
- **Redis** (可选, 多节点广播)
|
||||||
|
- **NATS** (可选, 低延迟多节点广播)
|
||||||
|
|
||||||
|
### 安装 & 运行
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 克隆仓库
|
||||||
|
git clone https://github.com/your-org/imks.git
|
||||||
|
cd imks
|
||||||
|
|
||||||
|
# 配置环境变量
|
||||||
|
cp .env.example .env
|
||||||
|
# 编辑 .env,至少设置 DATABASE_URL 和 APPKS_GRPC_ADDR
|
||||||
|
|
||||||
|
# 数据库迁移(自动执行)
|
||||||
|
# 首次启动会自动运行 migrate/ 下的 SQL 迁移
|
||||||
|
|
||||||
|
# 编译
|
||||||
|
cargo build --release
|
||||||
|
|
||||||
|
# 运行
|
||||||
|
cargo run --release
|
||||||
|
# 默认监听 http://0.0.0.0:3000
|
||||||
|
```
|
||||||
|
|
||||||
|
### Endpoints

| Endpoint | Description |
|---|---|
| `GET /engine.io/` | Engine.IO handshake & WebSocket upgrade |
| `POST /engine.io/` | Engine.IO HTTP long-polling |
| `GET /health` | Health check (connection count, session count, dependency checks) |
| `GET /metrics` | Metrics in Prometheus format |
|
||||||
|
|
||||||
|
### 健康检查示例
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"status": "healthy",
|
||||||
|
"version": "0.1.0",
|
||||||
|
"timestamp": "2026-06-11T10:00:00Z",
|
||||||
|
"uptime_secs": 3600,
|
||||||
|
"connections_active": 42,
|
||||||
|
"sessions_count": 42,
|
||||||
|
"checks": {
|
||||||
|
"postgres": { "status": "up", "latency_ms": 3 },
|
||||||
|
"redis": { "status": "up", "latency_ms": 1 }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## 环境变量
|
||||||
|
|
||||||
|
完整列表见 [`.env.example`](./.env.example)。
|
||||||
|
|
||||||
|
### Core Configuration

| Variable | Default | Description |
|---|---|---|
| `IMKS_ADAPTER` | `local` | `local` \| `redis` \| `nats` |
| `DATABASE_URL` | `postgres://localhost/imks` | PostgreSQL connection string |
| `APPKS_GRPC_ADDR` | `http://localhost:50051` | appks gRPC address |
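
The defaults in the table above could be applied at startup roughly as follows. This is a minimal sketch, not the actual imks configuration code: `Adapter`, `CoreConfig`, `parse_adapter`, `build_config`, and `config_from_env` are hypothetical names, and rejecting unknown adapter values with an error is an assumption.

```rust
use std::env;

/// Broadcast adapter selected via IMKS_ADAPTER (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Adapter {
    Local,
    Redis,
    Nats,
}

/// Hypothetical holder for the three core settings.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CoreConfig {
    pub adapter: Adapter,
    pub database_url: String,
    pub appks_grpc_addr: String,
}

/// Maps the raw IMKS_ADAPTER value to an adapter; unset falls back to `local`.
pub fn parse_adapter(raw: Option<&str>) -> Result<Adapter, String> {
    match raw.map(str::trim) {
        None | Some("local") => Ok(Adapter::Local),
        Some("redis") => Ok(Adapter::Redis),
        Some("nats") => Ok(Adapter::Nats),
        // Assumption: unknown values are a startup error, not a silent fallback.
        Some(other) => Err(format!("unsupported IMKS_ADAPTER value: {other}")),
    }
}

/// Builds the config from any variable lookup, so tests need not touch the process env.
pub fn build_config(lookup: impl Fn(&str) -> Option<String>) -> Result<CoreConfig, String> {
    let adapter = parse_adapter(lookup("IMKS_ADAPTER").as_deref())?;
    Ok(CoreConfig {
        adapter,
        database_url: lookup("DATABASE_URL")
            .unwrap_or_else(|| "postgres://localhost/imks".to_string()),
        appks_grpc_addr: lookup("APPKS_GRPC_ADDR")
            .unwrap_or_else(|| "http://localhost:50051".to_string()),
    })
}

/// Reads the same settings from the real process environment.
pub fn config_from_env() -> Result<CoreConfig, String> {
    build_config(|name| env::var(name).ok())
}
```

Taking the lookup as a closure keeps the defaulting logic testable without mutating environment variables.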
|
||||||
|
|
||||||
|
## 开发
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cargo check # 快速检查语法
|
||||||
|
cargo test # 运行所有测试(111 个)
|
||||||
|
cargo test --lib # 仅库测试(91 个)
|
||||||
|
cargo clippy # Lint 检查
|
||||||
|
cargo fmt # 格式化
|
||||||
|
```
|
||||||
|
|
||||||
|
## 文档
|
||||||
|
|
||||||
|
- [AGENTS.md](./AGENTS.md) — 开发规范
|
||||||
|
- [rpc.md](docs/rpc.md) — 认证方案 & Proto 契约
|
||||||
|
- [migrate/](./migrate/) — 数据库迁移脚本
|
||||||
|
|
||||||
|
## 许可证
|
||||||
|
|
||||||
|
[待定]
|
||||||
+718
@@ -0,0 +1,718 @@
|
|||||||
|
# Auth 认证方案
|
||||||
|
|
||||||
|
## 架构总览
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────┐ ┌──────────┐ ┌──────────┐
|
||||||
|
│ Client │ │ appks │ │ imks │
|
||||||
|
│ (浏览器/ │ │ (core) │ │ (IM服务) │
|
||||||
|
│ APP) │ │ │ │ │
|
||||||
|
└────┬─────┘ └────┬─────┘ └────┬─────┘
|
||||||
|
│ │ │
|
||||||
|
│ 1. POST /api/v1/auth/login │
|
||||||
|
│───────────────────▶│ │
|
||||||
|
│ 2. {access_token, refresh_token} │
|
||||||
|
│◀───────────────────│ │
|
||||||
|
│ │ │
|
||||||
|
│ 3. WS/gRPC/HTTP 携带 JWT │
|
||||||
|
│──────────────────────────────────────▶│
|
||||||
|
│ │ │
|
||||||
|
│ │ 4a. VerifyToken RPC (RPC模式)
|
||||||
|
│ │◀─────────────────│
|
||||||
|
│ │ 4b. GetSigningKeys (本地模式)
|
||||||
|
│ │◀─────────────────│
|
||||||
|
│ │ │
|
||||||
|
│ │ 5. TokenClaims / SigningKeys
|
||||||
|
│ │─────────────────▶│
|
||||||
|
│ │ │
|
||||||
|
│ 6. 业务响应 │ │
|
||||||
|
│◀─────────────────────────────────────│
|
||||||
|
```
|
||||||
|
|
||||||
|
**角色分工:**
|
||||||
|
|
||||||
|
| 服务 | 职责 |
|
||||||
|
|------------------|----------------------------------------------------|
|
||||||
|
| **appks** (core) | 颁发 JWT、刷新 JWT、撤销 JWT、管理签名密钥、提供 `TokenService` gRPC |
|
||||||
|
| **imks** (IM) | 接收客户端 JWT,通过 RPC 或本地密钥验证用户身份 |
|
||||||
|
|
||||||
|
## Proto 契约
|
||||||
|
|
||||||
|
定义在 `proto/core/auth.proto`,package `appks.core.v1`。
|
||||||
|
|
||||||
|
appks 和 imks 各自维护一份相同的 proto 文件:
|
||||||
|
- appks 编译为 **server** stub(提供服务)
|
||||||
|
- imks 编译为 **client** stub(调用服务)
|
||||||
|
|
||||||
|
### TokenService RPC
|
||||||
|
|
||||||
|
```protobuf
|
||||||
|
service TokenService {
|
||||||
|
// 令牌生命周期 (appks 内部调用)
|
||||||
|
rpc IssueToken(IssueTokenRequest) returns (IssueTokenResponse);
|
||||||
|
rpc RefreshToken(RefreshTokenRequest) returns (RefreshTokenResponse);
|
||||||
|
rpc RevokeToken(RevokeTokenRequest) returns (RevokeTokenResponse);
|
||||||
|
|
||||||
|
// imks 验证 (RPC 模式)
|
||||||
|
rpc VerifyToken(VerifyTokenRequest) returns (VerifyTokenResponse);
|
||||||
|
|
||||||
|
// imks 密钥拉取 (本地验证模式)
|
||||||
|
rpc GetSigningKeys(GetSigningKeysRequest) returns (GetSigningKeysResponse);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## JWT 令牌
|
||||||
|
|
||||||
|
### 结构
|
||||||
|
|
||||||
|
JWT Header:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"alg": "HS256",
|
||||||
|
"typ": "JWT",
|
||||||
|
"kid": "01909a..." // 签名密钥 ID,用于匹配 SigningKey
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
JWT Payload (`TokenClaims`):
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"sub": "user-uuid",
|
||||||
|
"iss": "appks",
|
||||||
|
"iat": 1718000000,
|
||||||
|
"exp": 1718003600,
|
||||||
|
"jti": "01909b...",
|
||||||
|
"scope": "im:read im:write",
|
||||||
|
"extra": {
|
||||||
|
"workspace_id": "..."
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### 令牌类型
|
||||||
|
|
||||||
|
| 类型 | 格式 | 存储 | 用途 |
|
||||||
|
|-------------------|----------------------|----------------------------------------------|---------------------------------------|
|
||||||
|
| **access_token** | JWT (HS256) | 无状态,客户端持有 | 每次请求携带,验证用户身份 |
|
||||||
|
| **refresh_token** | `rt_{UUIDv7}` 不透明字符串 | Redis `core:token:refresh:{token}` → user_id | 换取新的 access_token + refresh_token(旋转) |
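
As a rough illustration of the `rt_{UUIDv7}` format and its Redis key from the table above, a shape check might look like the sketch below. It assumes the UUID is written in the usual hyphenated 36-character form, which the source does not state; both function names are hypothetical.

```rust
/// Returns true if `s` looks like `rt_` followed by a hyphenated UUIDv7.
/// Assumption: the UUID uses the canonical 8-4-4-4-12 textual layout.
pub fn looks_like_refresh_token(s: &str) -> bool {
    let Some(uuid) = s.strip_prefix("rt_") else {
        return false;
    };
    let bytes = uuid.as_bytes();
    if bytes.len() != 36 {
        return false;
    }
    for (i, &b) in bytes.iter().enumerate() {
        let ok = match i {
            8 | 13 | 18 | 23 => b == b'-', // group separators
            14 => b == b'7',               // version nibble marks UUIDv7
            _ => b.is_ascii_hexdigit(),
        };
        if !ok {
            return false;
        }
    }
    true
}

/// Builds the Redis key under which the refresh token maps to a user_id.
pub fn refresh_token_key(token: &str) -> String {
    format!("core:token:refresh:{token}")
}
```

A check like this only rejects malformed input early; whether a token is actually valid is still decided by the Redis lookup.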
|
||||||
|
|
||||||
|
## 双模式验证
|
||||||
|
|
||||||
|
imks 可选择以下任一模式验证客户端 JWT:
|
||||||
|
|
||||||
|
### 模式 A:RPC 验证(`VerifyToken`)
|
||||||
|
|
||||||
|
```
|
||||||
|
imks → appks TokenService.VerifyToken(jwt) → {valid, claims}
|
||||||
|
```
|
||||||
|
|
||||||
|
- **优点**:实时权威,能感知撤销
|
||||||
|
- **缺点**:每次请求增加一次 RPC 往返
|
||||||
|
- **适用场景**:高安全要求操作(管理员操作、敏感数据)
|
||||||
|
|
||||||
|
### 模式 B:本地验证(`GetSigningKeys`)
|
||||||
|
|
||||||
|
```
|
||||||
|
imks 启动时 → appks TokenService.GetSigningKeys() → 缓存密钥到本地
|
||||||
|
后续请求 → imks 用本地密钥解码 JWT(HS256 验签)
|
||||||
|
定期刷新 → 根据 next_rotation_at 拉取新密钥
|
||||||
|
```
|
||||||
|
|
||||||
|
- **优点**:零 RPC 延迟,appks 不可用时仍能验证
|
||||||
|
- **缺点**:撤销有最多一个密钥窗口(3h)的延迟
|
||||||
|
- **适用场景**:高频低延迟操作(消息收发、实时通信)
|
||||||
|
|
||||||
|
### 推荐策略
|
||||||
|
|
||||||
|
混合使用:
|
||||||
|
- 普通操作(发消息、读频道)→ 本地验证
|
||||||
|
- 敏感操作(踢人、删频道、改权限)→ RPC 验证
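
The hybrid strategy above can be sketched as a simple dispatch from operation to verification mode. The `Operation` and `VerifyMode` enums and `verify_mode_for` are hypothetical names for illustration; only the five operations named in the list are modeled.

```rust
/// Operations named in the recommended strategy (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    SendMessage,
    ReadChannel,
    KickMember,
    DeleteChannel,
    ChangePermissions,
}

/// Which verification path a request should take.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VerifyMode {
    /// Mode B: verify with locally cached signing keys.
    Local,
    /// Mode A: call TokenService.VerifyToken on appks.
    Rpc,
}

/// Routes ordinary operations to local verification and sensitive ones to RPC.
pub fn verify_mode_for(op: Operation) -> VerifyMode {
    match op {
        Operation::SendMessage | Operation::ReadChannel => VerifyMode::Local,
        Operation::KickMember | Operation::DeleteChannel | Operation::ChangePermissions => {
            VerifyMode::Rpc
        }
    }
}
```

Because the `match` is exhaustive, adding a new operation forces an explicit decision about which path it takes.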
|
||||||
|
|
||||||
|
## 签名密钥管理
|
||||||
|
|
||||||
|
### 密钥窗口
|
||||||
|
|
||||||
|
```
|
||||||
|
时间轴:
|
||||||
|
─────────┬──────────┬──────────┬────────
|
||||||
|
│ key A │ key B │ key C
|
||||||
|
│ (过期) │ (活跃) │ (未来)
|
||||||
|
└──────────┴──────────┴────────
|
||||||
|
issued_at issued_at issued_at
|
||||||
|
+3h +3h +3h
|
||||||
|
```
|
||||||
|
|
||||||
|
- 每个签名密钥有效期 **3 小时**
|
||||||
|
- 同一时刻可能有 **2 个有效密钥**(滚动窗口,平滑过渡)
|
||||||
|
- JWT header 的 `kid` 字段标识使用哪个密钥签名
|
||||||
|
|
||||||
|
### 密钥轮换流程
|
||||||
|
|
||||||
|
```
|
||||||
|
1. 当前密钥到达 3h → TokenService.rotate_if_needed()
|
||||||
|
2. Redis 分布式锁 (core:token:rotation_lock, 10s TTL) 防止多实例竞争
|
||||||
|
3. 旧密钥标记 active=false,仍保留在 Redis 用于验证旧 token
|
||||||
|
4. 生成新密钥,active=true
|
||||||
|
5. ArcSwap 原子替换当前签名密钥
|
||||||
|
6. 旧密钥 TTL = 6h (2× window) 后从 Redis 自动清除
|
||||||
|
```
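
The timing rules in the rotation flow can be written down as a small sketch. It assumes that the 6h retention is counted from `issued_at` (3h active plus 3h verify-only), which is one reading of the steps above rather than something the source states; `KeyWindow` and its methods are hypothetical.

```rust
/// Active signing window: 3 hours.
pub const KEY_WINDOW_SECS: i64 = 3 * 60 * 60;
/// Retention for verification: 2x the window.
pub const KEY_RETENTION_SECS: i64 = 2 * KEY_WINDOW_SECS;

/// Minimal view of a signing key for timing purposes (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KeyWindow {
    pub kid: String,
    pub issued_at: i64, // Unix seconds
}

impl KeyWindow {
    /// End of the active signing window.
    pub fn expires_at(&self) -> i64 {
        self.issued_at + KEY_WINDOW_SECS
    }

    /// Step 1: the current key has reached 3h and should be rotated.
    pub fn needs_rotation(&self, now: i64) -> bool {
        now >= self.expires_at()
    }

    /// Steps 3 and 6: the key can still verify tokens until retention ends.
    /// Assumption: retention is measured from issued_at.
    pub fn can_verify(&self, now: i64) -> bool {
        now >= self.issued_at && now < self.issued_at + KEY_RETENTION_SECS
    }
}

/// Returns the kids that may still verify tokens at `now`.
pub fn verifiable_kids(keys: &[KeyWindow], now: i64) -> Vec<&str> {
    keys.iter()
        .filter(|k| k.can_verify(now))
        .map(|k| k.kid.as_str())
        .collect()
}
```

Under this reading, a key issued at `t` and its successor issued at `t + 3h` are both usable for verification during the successor's active window, which matches the two-valid-keys overlap described earlier.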
|
||||||
|
|
||||||
|
### 密钥存储(Redis)
|
||||||
|
|
||||||
|
```
|
||||||
|
core:token:active_key → kid (当前活跃密钥 ID)
|
||||||
|
core:token:key:{kid} → SigningKey JSON (TTL = 6h)
|
||||||
|
core:token:rotation_lock → "1" (TTL = 10s, 分布式锁)
|
||||||
|
```
|
||||||
|
|
||||||
|
### SigningKey 结构
|
||||||
|
|
||||||
|
```rust
|
||||||
|
pub struct SigningKeyInfo {
|
||||||
|
pub kid: String, // UUIDv7
|
||||||
|
pub algorithm: String, // "HS256"
|
||||||
|
pub key_material: String, // base64(32 bytes random)
|
||||||
|
pub issued_at: i64,
|
||||||
|
pub expires_at: i64, // issued_at + 3h
|
||||||
|
pub active: bool,
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## 撤销机制
|
||||||
|
|
||||||
|
### Redis 布局
|
||||||
|
|
||||||
|
```
|
||||||
|
core:token:revoked:{jti} → "1" (TTL = token 剩余有效期)
|
||||||
|
core:token:refresh:{token} → user_id (TTL = 7d)
|
||||||
|
```
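
Since the revocation entry's TTL equals the token's remaining validity, the TTL can be derived from the `exp` claim. The sketch below shows that arithmetic only; the function names are hypothetical and the Redis write itself is left out.

```rust
/// Builds the Redis key that marks a token (by jti) as revoked.
pub fn revoked_key(jti: &str) -> String {
    format!("core:token:revoked:{jti}")
}

/// Remaining validity in seconds, used as the TTL of the revocation entry.
/// Returns None when the token has already expired, in which case no
/// revocation record is needed.
pub fn revocation_ttl_secs(exp: i64, now: i64) -> Option<u64> {
    let remaining = exp.checked_sub(now)?;
    if remaining > 0 {
        Some(remaining as u64)
    } else {
        None
    }
}
```

With the example claims shown earlier (`iat` 1718000000, `exp` 1718003600), revoking the token right after issuance yields a TTL of 3600 seconds.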
|
||||||
|
|
||||||
|
### 撤销方式
|
||||||
|
|
||||||
|
| 操作 | RPC | 效果 |
|
||||||
|
|--------------|------------------------|-----------------------|
|
||||||
|
| 撤销单个 token | `RevokeToken(jti)` | 将 jti 加入撤销列表 |
|
||||||
|
| 撤销用户所有 token | `RevokeToken(user_id)` | 删除该用户所有 refresh token |
|
||||||
|
|
||||||
|
### 撤销感知延迟
|
||||||
|
|
||||||
|
| 验证模式 | 延迟 |
|
||||||
|
|-----------------------|------------------------------|
|
||||||
|
| RPC (`VerifyToken`) | **实时** — 每次检查撤销列表 |
|
||||||
|
| 本地 (`GetSigningKeys`) | **最多 3h** — 密钥过期前无法感知 jti 撤销 |
|
||||||
|
|
||||||
|
## appks 实现
|
||||||
|
|
||||||
|
### 模块结构
|
||||||
|
|
||||||
|
```
|
||||||
|
service/internal_auth.rs → TokenService (业务逻辑)
|
||||||
|
grpc/auth.rs → TokenGrpcService (gRPC handler)
|
||||||
|
grpc/mod.rs → TokenServiceServer 注册到 tonic server
|
||||||
|
api/internal/issue_api_key.rs → REST: POST /api/v1/internal/tokens
|
||||||
|
```
|
||||||
|
|
||||||
|
### TokenService 核心
|
||||||
|
|
||||||
|
```rust
|
||||||
|
pub struct TokenService {
|
||||||
|
redis: AppRedis,
|
||||||
|
current_key: Arc<ArcSwap<SigningKeyInfo>>, // 无锁读
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
- 启动时从 Redis 加载活跃密钥,无则生成
|
||||||
|
- 签名使用 `jsonwebtoken` crate (HS256)
|
||||||
|
- 密钥轮换使用 Redis 分布式锁,支持多实例部署
|
||||||
|
- `ArcSwap` 保证签名密钥读取无锁、写入原子
|
||||||
|
|
||||||
|
## imks 实现指南
|
||||||
|
|
||||||
|
### 启动流程
|
||||||
|
|
||||||
|
```rust
// 1. Connect to the appks TokenService
let mut token_client = TokenServiceClient::connect(appks_addr).await?;

// 2. Fetch the signing keys
let resp = token_client
    .get_signing_keys(GetSigningKeysRequest { kid: String::new() })
    .await?
    .into_inner(); // tonic wraps the message in tonic::Response
let mut next_rotation = resp.next_rotation_at;

// 3. Cache the keys locally (HashMap<kid, SigningKey>)
key_store.insert_all(resp.keys);

// 4. Schedule periodic refresh
tokio::spawn(async move {
    loop {
        // Clamp to at least 1s so a timestamp in the past cannot cause a busy loop
        let delay = (next_rotation - now()).max(1);
        tokio::time::sleep(Duration::from_secs(delay as u64)).await;
        match token_client
            .get_signing_keys(GetSigningKeysRequest { kid: String::new() })
            .await
        {
            Ok(resp) => {
                let resp = resp.into_inner();
                next_rotation = resp.next_rotation_at;
                key_store.update(resp.keys);
            }
            // Keep the cached keys and retry on the next iteration
            Err(e) => tracing::warn!(error = %e, "failed to refresh signing keys"),
        }
    }
});
```
|
||||||
|
|
||||||
|
### 连接时验证
|
||||||
|
|
||||||
|
```rust
// The client presents its JWT when opening a WebSocket/gRPC connection.
// Assumes `headers` is an `http::HeaderMap` and `key_store` maps kid -> SigningKeyInfo.
fn on_connect(headers: &HeaderMap) -> Result<TokenClaims, AuthError> {
    let token = headers
        .get("Authorization")
        .and_then(|v| v.to_str().ok())
        .and_then(|v| v.strip_prefix("Bearer "))
        .ok_or(AuthError::MissingToken)?;

    // Local verification (fast path)
    let header = decode_header(token)?;
    let kid = header.kid.ok_or(AuthError::MissingKid)?;
    let key = key_store.get(&kid).ok_or(AuthError::UnknownKey)?;
    // key_material is base64-encoded, so decode it into an HS256 secret
    let decoding_key = DecodingKey::from_base64_secret(&key.key_material)?;

    let mut validation = Validation::new(Algorithm::HS256);
    validation.set_issuer(&["appks"]);
    validation.validate_exp = true;

    let data = decode::<TokenClaims>(token, &decoding_key, &validation)?;
    Ok(data.claims)
}
```
|
||||||
|
|
||||||
|
### 敏感操作验证
|
||||||
|
|
||||||
|
```rust
// Sensitive operations go through RPC verification (authoritative path)
async fn on_sensitive_action(token: &str) -> Result<TokenClaims, AuthError> {
    let resp = token_client
        .verify_token(VerifyTokenRequest { token: token.to_string() })
        .await?
        .into_inner(); // unwrap tonic::Response to reach the message

    if resp.valid {
        // Avoid panicking if a valid response carries no claims;
        // `MissingClaims` is an illustrative variant name.
        resp.claims.ok_or(AuthError::MissingClaims)
    } else {
        Err(AuthError::from(resp.reason))
    }
}
```
|
||||||
|
|
||||||
|
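## Security Considerations

1. **Key transport**: The appks → imks gRPC connection should use mTLS so that keys cannot be intercepted in transit.
2. **Key lifetime**: The 3h window balances security and availability; a shorter window reduces revocation delay but increases rotation frequency.
3. **HS256 vs. asymmetric**: HS256 (a symmetric key) is currently used, so the key imks receives can also be used to forge tokens. If imks cannot be fully trusted, switch to RS256/EdDSA so that imks holds only the public key.
4. **Refresh token security**: Every refresh rotates the token (the old token is invalidated immediately), which prevents replay.
5. **Revocation list TTL**: Aligned with the token's remaining validity; expired tokens need no revocation record.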
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
# IM 服务 Proto 说明书
|
||||||
|
|
||||||
|
以下是 imks 侧 `proto/core/` 下各 gRPC 服务的完整说明。所有 IM 服务定义在 `appks.im.v1` 包下,由 appks 提供 server 端,imks 消费 client 端。
|
||||||
|
|
||||||
|
## 服务总览
|
||||||
|
|
||||||
|
| Proto 文件 | 服务 | RPC 数量 | 职责 |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `auth.proto` | TokenService | 5 | JWT 令牌生命周期 + 验证 + 密钥分发 |
|
||||||
|
| `channel.proto` | ChannelService | 10 | 频道/分类 CRUD + 统计 |
|
||||||
|
| `member.proto` | MemberService | 7 | 成员邀请/踢出/加入/离开/查询 |
|
||||||
|
| `permission.proto` | PermissionService | 7 | 权限检查 + 覆盖规则 + 频道解析 |
|
||||||
|
| `channel_settings.proto` | ChannelRoleService | 4 | 频道自定义角色 |
|
||||||
|
| | ChannelInvitationService | 4 | 邀请生命周期 |
|
||||||
|
| | ChannelWebhookService | 4 | Webhook CRUD |
|
||||||
|
| | ChannelSlashCommandService | 4 | 斜杠命令注册 |
|
||||||
|
| | ChannelRepoLinkService | 3 | 频道 ↔ 代码仓库关联 |
|
||||||
|
| | ImIntegrationService | 4 | 外部平台集成(Slack/Discord 等) |
|
||||||
|
| | CustomEmojiService | 3 | 工作区自定义表情 |
|
||||||
|
| | ForumTagService | 4 | 论坛频道标签 |
|
||||||
|
| | VoiceService | 2 | 语音频道参与者状态 |
|
||||||
|
| | StageService | 4 | 舞台频道管理 |
|
||||||
|
| | ChannelAuditService | 1 | 频道审计日志查询 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ChannelService (`channel.proto`)

CRUD management for channels and categories, plus channel statistics.

### Enums

**ChannelType** (channel type):

| Value | Meaning |
|---|---|
| `PUBLIC` | Public channel, visible to everyone in the workspace |
| `PRIVATE` | Private channel, visible only to invited members |
| `DIRECT` | Direct message (one-to-one) |
| `GROUP` | Group chat (multi-person private conversation) |
| `REPO` | Repository-linked channel (automatically bound to a git repo) |
| `SYSTEM` | System channel (announcements, notifications, etc.; read-only) |

**ChannelKind** (channel kind):

| Value | Meaning |
|---|---|
| `TEXT` | Text channel |
| `VOICE` | Voice channel |
| `STAGE` | Stage channel (host + audience mode) |
| `FORUM` | Forum channel (post/topic-based discussion) |
| `ANNOUNCEMENT` | Announcement channel (only admins can post) |

**Visibility** (visibility level, ordered from low to high):

| Value | Meaning |
|---|---|
| `PUBLIC` | Visible to everyone (including unauthenticated users) |
| `WORKSPACE` | Visible to workspace members |
| `INTERNAL` | Internally visible (organization members) |
| `PRIVATE` | Visible only to channel members |
| `PROTECTED` | Protected (cannot be searched or indexed) |
| `HIDDEN` | Hidden (not shown in the channel list) |
| `SECRET` | Secret (accessible only via direct link) |

### RPC List

```
GetChannel(channel_id)            → Channel             Get channel details
ListChannels(workspace, ...)      → [Channel], total    List channels (filter by category/type/kind)
CreateChannel(workspace, name)    → Channel             Create a channel
UpdateChannel(channel_id, ...)    → Channel             Update channel attributes
DeleteChannel(channel_id)         → {}                  Delete a channel
GetChannelStats(channel_id)       → ChannelStats        Get channel statistics (member/message/thread/reaction counts)

ListCategories(workspace)         → [ChannelCategory]   List categories
CreateCategory(workspace, name)   → ChannelCategory     Create a category
UpdateCategory(category_id, ...)  → ChannelCategory     Update a category
DeleteCategory(category_id)       → {}                  Delete a category
```

### Core Messages

**Channel** (the channel entity):

| Field | Type | Description |
|---|---|---|
| `id` | UUID | Channel ID |
| `workspace_id` | UUID | Owning workspace |
| `category_id` | UUID? | Owning category (optional) |
| `parent_channel_id` | UUID? | Parent channel (for sub-channels/threads) |
| `name` | string | Channel name |
| `topic` / `description` | string? | Topic / description |
| `channel_type` | ChannelType | Channel type |
| `channel_kind` | ChannelKind | Channel kind |
| `visibility` | Visibility | Visibility |
| `position` | int32 | Sort position |
| `nsfw` | bool | NSFW flag |
| `read_only` | bool | Read-only (only admins can post) |
| `archived` | bool | Archived |
| `rate_limit_per_user` | int32? | Slow mode (seconds per message) |
| `last_message_id` / `last_message_at` | - | Last message info |
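
As an illustration of the `rate_limit_per_user` field, the sketch below shows one way a slow-mode check could be computed. It is only a sketch under stated assumptions: the function name `slow_mode_wait` is hypothetical, a missing or non-positive value is assumed to mean slow mode is off, and the document does not say whether imks or appks enforces the limit.

```rust
use std::time::{Duration, Instant};

/// Returns how long the user must still wait before sending again,
/// or `None` if sending is allowed right now.
///
/// Assumption: `None` or a non-positive `rate_limit_per_user` disables slow mode.
fn slow_mode_wait(
    rate_limit_per_user: Option<i32>,
    last_sent: Option<Instant>,
    now: Instant,
) -> Option<Duration> {
    let secs = match rate_limit_per_user {
        Some(s) if s > 0 => s as u64,
        _ => return None, // slow mode is off
    };
    let last = last_sent?; // user has not sent anything yet
    let next_allowed = last + Duration::from_secs(secs);
    // `checked_duration_since` yields `None` once `now` is past `next_allowed`.
    next_allowed
        .checked_duration_since(now)
        .filter(|wait| !wait.is_zero())
}
```

A caller would reject the message (or return the remaining wait to the client) whenever the result is `Some(_)`.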

**ChannelStats** (channel statistics):

| Field | Description |
|---|---|
| `members_count` | Number of members |
| `messages_count` | Number of messages |
| `threads_count` | Number of threads |
| `reactions_count` | Number of reactions |
| `mentions_count` | Number of @mentions |
| `files_count` | Number of files |
| `last_activity_at` | Time of last activity |

---

## MemberService (`member.proto`)

Channel member management.

### Enums

**Role** (role hierarchy, from highest to lowest):

| Value | Meaning |
|---|---|
| `OWNER` | Channel owner |
| `ADMIN` | Administrator (all permissions) |
| `MAINTAINER` | Maintainer (manages channel settings and members) |
| `MODERATOR` | Moderator (manages messages, kicks members) |
| `MEMBER` | Regular member |
| `CONTRIBUTOR` | Contributor (can post, with some restrictions) |
| `VIEWER` | Viewer (read-only) |
| `GUEST` | Guest (temporary access) |
| `BOT` | Bot |
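
Because `ChannelMember.role` is carried as an enum value string, a client may want to map it onto an ordered type before comparing roles. The following is a minimal sketch, not the generated proto code: the `Role` type, `parse`, and `at_least` are hypothetical names, and treating `BOT` as the lowest rank is an assumption taken only from its position in the table.

```rust
/// Channel roles, declared from highest to lowest as in the table above.
/// Assumption: the declaration order is the rank order, with BOT last.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Owner,
    Admin,
    Maintainer,
    Moderator,
    Member,
    Contributor,
    Viewer,
    Guest,
    Bot,
}

impl Role {
    /// Parses an enum value string such as "ADMIN".
    fn parse(s: &str) -> Option<Role> {
        Some(match s {
            "OWNER" => Role::Owner,
            "ADMIN" => Role::Admin,
            "MAINTAINER" => Role::Maintainer,
            "MODERATOR" => Role::Moderator,
            "MEMBER" => Role::Member,
            "CONTRIBUTOR" => Role::Contributor,
            "VIEWER" => Role::Viewer,
            "GUEST" => Role::Guest,
            "BOT" => Role::Bot,
            _ => return None,
        })
    }

    /// Smaller rank means a higher role (OWNER is 0).
    fn rank(self) -> u8 {
        self as u8
    }

    /// True when `self` is the same as or higher than `required`.
    fn at_least(self, required: Role) -> bool {
        self.rank() <= required.rank()
    }
}
```

For example, `Role::parse("ADMIN")` yields `Some(Role::Admin)`, and `Role::Admin.at_least(Role::Moderator)` is true while `Role::Guest.at_least(Role::Member)` is false.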

**MemberStatus** (member status):

| Value | Meaning |
|---|---|
| `ACTIVE` | Active member |
| `INVITED` | Invited (not yet joined) |
| `LEFT` | Has left |
| `KICKED` | Kicked |
| `BANNED` | Banned |

### RPC List

```
ListMembers(channel_id, ...)       → [ChannelMember], total   List members (supports status filter)
InviteMember(channel_id, user_id)  → ChannelMember            Invite a user to the channel
UpdateMember(channel_id, user_id)  → ChannelMember            Update a member (role/mute/pin)
KickMember(channel_id, user_id)    → {}                       Kick a member
JoinChannel(channel_id, user_id)   → ChannelMember            User joins voluntarily
LeaveChannel(channel_id, user_id)  → {}                       User leaves voluntarily
IsMember(channel_id, user_id)      → is_member, role          Check membership
```

### Core Messages

**ChannelMember** (channel member):

| Field | Description |
|---|---|
| `channel_id` / `user_id` | Channel + user |
| `role` | Role (Role enum value as a string) |
| `status` | Status (MemberStatus enum value as a string) |
| `muted` | Whether the member is muted |
| `pinned` | Whether pinned (channel-side flag) |
| `last_read_message_id` / `last_read_at` | Read progress |
| `joined_at` / `left_at` | Join/leave time |

---

## PermissionService (`permission.proto`)

Channel-level permission system, independent of the general workspace/repo permissions.

### Permission Enum (ImPermission)

| Permission | Description |
|---|---|
| `READ_CHANNEL` | View the channel |
| `SEND_MESSAGE` | Send messages |
| `MANAGE_THREADS` | Manage threads |
| `MANAGE_REACTIONS` | Manage reactions |
| `MANAGE_PINS` | Manage pinned messages |
| `INVITE_MEMBERS` | Invite members |
| `KICK_MEMBERS` | Kick members |
| `MANAGE_CHANNEL` | Manage channel settings |
| `MANAGE_ROLES` | Manage roles |
| `MANAGE_WEBHOOKS` | Manage webhooks |
| `MANAGE_EMOJIS` | Manage custom emoji |
| `VIEW_AUDIT_LOG` | View the audit log |
| `MANAGE_INTEGRATIONS` | Manage external integrations |
| `SEND_TTS` | Send TTS messages |
| `USE_SLASH_COMMANDS` | Use slash commands |
| `ATTACH_FILES` | Upload files |
| `MENTION_EVERYONE` | Mention everyone (@everyone) |
| `MANAGE_MESSAGES` | Manage messages (delete other users' messages) |
| `ADMIN` | Administrator (has all permissions) |

### RPC List

```
CheckPermission(channel, user, perm)        → allowed, role      Check a single permission
GetPermissions(channel, user)               → [ImPermission]     Get all of a user's permissions
SetPermissionOverwrite(channel, target)     → Overwrite          Set a permission overwrite
GetPermissionOverwrites(channel)            → [Overwrite]        List overwrites
DeletePermissionOverwrite(channel, target)  → {}                 Delete an overwrite

ResolveChannel(channel_id)                  → channel summary    Resolve channel metadata
EnsureReadable(channel, user)               → allowed            Ensure the user can read (fast check)
```

### Permission Overwrites (PermissionOverwrite)

A permission overwrite replaces the default permissions for a specific user or role in a specific channel:

| Field | Description |
|---|---|
| `target_type` | `"user"` or `"role"` |
| `target_id` | User ID or role ID |
| `allow` | List of explicitly allowed permissions |
| `deny` | List of explicitly denied permissions |

Permission resolution priority: `deny overwrite > allow overwrite > role permissions`
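
The resolution priority above can be sketched as a small function. This is an illustrative sketch rather than the appks implementation: the struct and function names are hypothetical, permissions are modelled as plain strings, and the caller is assumed to have already selected the overwrites that apply to the user (by user ID or by role).

```rust
use std::collections::HashSet;

/// Simplified stand-in for a PermissionOverwrite (hypothetical type).
struct PermissionOverwrite {
    allow: HashSet<&'static str>,
    deny: HashSet<&'static str>,
}

/// Resolves one permission with the priority
/// deny overwrite > allow overwrite > role permissions.
///
/// `overwrites` must contain only the overwrites that target this user
/// or one of their roles (assumption: filtering happens in the caller).
fn is_allowed(
    perm: &str,
    role_perms: &HashSet<&'static str>,
    overwrites: &[PermissionOverwrite],
) -> bool {
    // 1. Any explicit deny wins.
    if overwrites.iter().any(|o| o.deny.contains(perm)) {
        return false;
    }
    // 2. Otherwise an explicit allow grants the permission.
    if overwrites.iter().any(|o| o.allow.contains(perm)) {
        return true;
    }
    // 3. Fall back to what the role grants by default.
    role_perms.contains(perm)
}
```

With a role that grants `READ_CHANNEL` and `SEND_MESSAGE`, and one overwrite that allows `ATTACH_FILES` but denies `SEND_MESSAGE`, the user can read and attach files but cannot send messages.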

---

## ChannelSettings Service Group (`channel_settings.proto`)

All channel-configuration services are defined in a single proto file.

### ChannelRoleService: Per-Channel Custom Roles

Channel-level custom roles (distinct from the global Role enum in `member.proto`).

```
ListChannelRoles(channel_id)       → [ChannelRole]
CreateChannelRole(channel, name)   → ChannelRole
UpdateChannelRole(role_id, ...)    → ChannelRole
DeleteChannelRole(role_id)         → {}
```

**ChannelRole** fields: `name`, `permissions[]` (list of ImPermission strings), `assignable` (whether it can be assigned by regular members)

### ChannelInvitationService: Invitation Management

```
ListInvitations(channel_id)        → [ChannelInvitation]
CreateInvitation(channel, user)    → ChannelInvitation
AcceptInvitation(invitation_id)    → ChannelInvitation
RevokeInvitation(invitation_id)    → {}
```

**ChannelInvitation** fields: `invited_by`, `invited_user_id`, `role` (preset role), `status`

### ChannelWebhookService: Webhook Management

```
ListWebhooks(channel_id)           → [ChannelWebhook]
CreateWebhook(channel, name, url)  → ChannelWebhook
UpdateWebhook(webhook_id, ...)     → ChannelWebhook
DeleteWebhook(webhook_id)          → {}
```

**ChannelWebhook** fields: `name`, `url`, `secret` (used for signature verification), `events[]` (list of subscribed events), `active`

### ChannelSlashCommandService: Slash Command Registration

```
ListSlashCommands(channel_id)          → [ChannelSlashCommand]
CreateSlashCommand(channel, cmd, url)  → ChannelSlashCommand
UpdateSlashCommand(command_id, ...)    → ChannelSlashCommand
DeleteSlashCommand(command_id)         → {}
```

**ChannelSlashCommand** fields: `command` (command name, e.g. `/deploy`), `description`, `request_url` (callback URL), `scopes[]`

### ChannelRepoLinkService: Repository Links

Links a channel to a code repository and automatically pushes repository events to the channel.

```
ListRepoLinks(channel_id)              → [ChannelRepoLink]
CreateRepoLink(channel, repo, type)    → ChannelRepoLink
DeleteRepoLink(link_id)                → {}
```

**ChannelRepoLink** fields: `repo_id`, `link_type`, `events[]` (subscribed repository events: push, pr, issue, etc.)

### ImIntegrationService: External Platform Integrations

Message synchronization with external platforms such as Slack and Discord.

```
ListIntegrations(channel_id)               → [ImIntegration]
CreateIntegration(channel, provider, ...)  → ImIntegration
UpdateIntegration(integration_id, ...)     → ImIntegration
DeleteIntegration(integration_id)          → {}
```

**ImIntegration** fields: `provider` (platform name), `external_channel_id` (external channel ID), `sync_direction` (`inbound`/`outbound`/`bidirectional`), `active`
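
Since `sync_direction` is a string with three documented values, a consumer may prefer to parse it into a closed type up front. The sketch below is one possible shape, not generated proto code: `SyncDirection` and its helper methods are hypothetical, and reading `inbound` as "external platform to channel" and `outbound` as "channel to external platform" is an assumption that the source does not spell out.

```rust
use std::str::FromStr;

/// The three documented values of `sync_direction`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncDirection {
    Inbound,
    Outbound,
    Bidirectional,
}

impl FromStr for SyncDirection {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "inbound" => Ok(SyncDirection::Inbound),
            "outbound" => Ok(SyncDirection::Outbound),
            "bidirectional" => Ok(SyncDirection::Bidirectional),
            other => Err(format!("unknown sync_direction: {other}")),
        }
    }
}

impl SyncDirection {
    /// Assumption: outbound means channel messages are pushed to the external platform.
    fn forwards_to_external(self) -> bool {
        matches!(self, SyncDirection::Outbound | SyncDirection::Bidirectional)
    }

    /// Assumption: inbound means external messages are mirrored into the channel.
    fn accepts_from_external(self) -> bool {
        matches!(self, SyncDirection::Inbound | SyncDirection::Bidirectional)
    }
}
```

Parsing early means an unexpected value surfaces as an error at the boundary instead of being silently ignored deeper in the sync path.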

### CustomEmojiService: Custom Emoji

Workspace-level custom emoji management.

```
ListCustomEmojis(workspace_id)            → [CustomEmoji]
CreateCustomEmoji(workspace, name, url)   → CustomEmoji
DeleteCustomEmoji(emoji_id)               → {}
```

**CustomEmoji** fields: `workspace_id`, `name` (emoji name, e.g. `:appks:`), `image_url`

### ForumTagService: Forum Tags

Post category tags for forum channels (`ChannelKind::FORUM`).

```
ListForumTags(channel_id)             → [ForumTag]
CreateForumTag(channel, name, ...)    → ForumTag
UpdateForumTag(tag_id, ...)           → ForumTag
DeleteForumTag(tag_id)                → {}
```

**ForumTag** fields: `name`, `moderated` (whether admin review is required), `position`

### VoiceService: Voice Channels

Participant state management for voice channels.

```
ListVoiceParticipants(channel_id)     → [VoiceParticipant]
UpdateVoiceState(channel, user, ...)  → VoiceParticipant
```

**VoiceParticipant** fields: `user_id`, `muted` (microphone muted), `deafened` (audio blocked), `joined_at`

### StageService: Stage Channels

Management of stage channels (`ChannelKind::STAGE`). Hosts speak; the audience listens.

```
GetStage(channel_id)                  → Stage
CreateStage(channel, topic, ...)      → Stage
UpdateStage(stage_id, ...)            → Stage
DeleteStage(stage_id)                 → {}
```

**Stage** fields: `topic` (current topic), `privacy_level`, `discoverable` (whether it can be discovered), `started_at` / `ended_at`

### ChannelAuditService: Audit Log

Audit log queries for channel operations (read-only).

```
ListChannelEvents(channel_id, ...)    → [ChannelAuditEvent], total
```

**ChannelAuditEvent** fields: `actor_id` (actor), `event_type` (event type string), `target_type` / `target_id` (target of the operation), `old_value` / `new_value` (values before and after the change)

---

## Call Relationship Between imks and appks

```
┌────────────────────────────────────────────────────────────────────┐
│                                imks                                │
│                                                                    │
│  Socket.IO / WebSocket / WebTransport                              │
│       │                                                            │
│       ▼                                                            │
│  Handshake     ──→ TokenService.VerifyToken() or local key check   │
│       │                                                            │
│       ▼                                                            │
│  Messaging     ──→ ChannelService + MemberService                  │
│       │            PermissionService.EnsureReadable()              │
│       │                                                            │
│       ▼                                                            │
│  Channel mgmt  ──→ ChannelService CRUD                             │
│       │            ChannelRoleService                              │
│       │            ChannelInvitationService                        │
│       │                                                            │
│       ▼                                                            │
│  Voice/stage   ──→ VoiceService + StageService                     │
│                                                                    │
│  Integrations  ──→ WebhookService + SlashCommandService            │
│                    RepoLinkService + ImIntegrationService          │
│                                                                    │
│  Audit queries ──→ ChannelAuditService                             │
└──────────────────────────────────┬─────────────────────────────────┘
                                   │ gRPC
                                   ▼
┌────────────────────────────────────────────────────────────────────┐
│                               appks                                │
│  TokenService server  │  Channel/Member/Permission server          │
│  Redis (JWT keys)     │  Postgres (channel data)                   │
└────────────────────────────────────────────────────────────────────┘
```

### Local Cache Recommendations for imks

| Data | Cache strategy | Refresh trigger |
|---|---|---|
| Signing keys (`SigningKey[]`) | In-memory HashMap | Fetch when `next_rotation_at` is reached |
| Channel info (`Channel`) | LRU + TTL | Channel update events (NATS) |
| Member list (`ChannelMember[]`) | LRU + TTL | Member change events (NATS) |
| Permission cache | Short TTL (30s) | Permission change events (NATS) |
| Custom emoji | Full load + incremental events | Emoji add/remove events (NATS) |
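
The short-TTL permission cache from the table could look roughly like the sketch below. It is a deliberately small illustration using only the standard library: `TtlCache` and its methods are hypothetical names, LRU eviction is left out, and the NATS event handler is assumed to call `invalidate` when a change event arrives.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// A minimal TTL cache (no LRU eviction; a sketch, not the imks cache).
struct TtlCache<K, V> {
    ttl: Duration,
    entries: HashMap<K, (Instant, V)>, // value stored with its expiry time
}

impl<K: Hash + Eq, V: Clone> TtlCache<K, V> {
    fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entries: HashMap::new() }
    }

    /// Stores `value`, valid until `now + ttl`.
    fn insert_at(&mut self, key: K, value: V, now: Instant) {
        self.entries.insert(key, (now + self.ttl, value));
    }

    /// Returns the cached value if it has not expired; drops stale entries.
    fn get_at(&mut self, key: &K, now: Instant) -> Option<V> {
        let fresh = match self.entries.get(key) {
            Some((expires_at, value)) if now < *expires_at => Some(value.clone()),
            _ => None,
        };
        if fresh.is_none() {
            self.entries.remove(key);
        }
        fresh
    }

    /// Called when a change event arrives (for example from a NATS subscription).
    fn invalidate(&mut self, key: &K) {
        self.entries.remove(key);
    }
}
```

A permission lookup would first call `get_at`, fall back to `PermissionService.CheckPermission` on a miss, and store the answer with `insert_at`; passing `now` explicitly keeps the expiry logic testable without sleeping.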
|
||||||
+12
-6
@@ -1,26 +1,32 @@
|
|||||||
//! Health check endpoint for the imks server.
|
//! Health check endpoint for the imks server.
|
||||||
//!
|
//!
|
||||||
//! Returns JSON with server status, version, and upstream connectivity.
|
//! Returns JSON with server status, version, uptime, and connection counts
|
||||||
|
//! sourced from live runtime state (session store + atomic counter).
|
||||||
|
|
||||||
use actix_web::HttpResponse;
|
use actix_web::{HttpResponse, web};
|
||||||
use serde::Serialize;
|
use serde::Serialize;
|
||||||
|
|
||||||
|
use crate::engine::session::SessionStore;
|
||||||
|
use crate::telemetry;
|
||||||
|
|
||||||
#[derive(Serialize)]
|
#[derive(Serialize)]
|
||||||
struct HealthResponse {
|
struct HealthResponse {
|
||||||
status: String,
|
status: String,
|
||||||
version: String,
|
version: String,
|
||||||
timestamp: String,
|
timestamp: String,
|
||||||
uptime_secs: u64,
|
uptime_secs: u64,
|
||||||
|
connections_active: u64,
|
||||||
sessions_count: usize,
|
sessions_count: usize,
|
||||||
}
|
}
|
||||||
|
|
||||||
/// GET /health — returns server health status.
|
/// GET /health — returns server health status with live connection metrics.
|
||||||
pub async fn health_check() -> HttpResponse {
|
pub async fn health_check(store: web::Data<SessionStore>) -> HttpResponse {
|
||||||
HttpResponse::Ok().json(HealthResponse {
|
HttpResponse::Ok().json(HealthResponse {
|
||||||
status: "healthy".into(),
|
status: "healthy".into(),
|
||||||
version: env!("CARGO_PKG_VERSION").into(),
|
version: env!("CARGO_PKG_VERSION").into(),
|
||||||
timestamp: chrono::Utc::now().to_rfc3339(),
|
timestamp: chrono::Utc::now().to_rfc3339(),
|
||||||
uptime_secs: 0,
|
uptime_secs: telemetry::health::uptime_secs(),
|
||||||
sessions_count: 0,
|
connections_active: telemetry::health::connections_active_count(),
|
||||||
|
sessions_count: store.len(),
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|||||||
+10
-1
@@ -115,17 +115,26 @@ impl EngineServer {
|
|||||||
));
|
));
|
||||||
let heartbeat_handle = heartbeat.start();
|
let heartbeat_handle = heartbeat.start();
|
||||||
|
|
||||||
tracing::info!("Engine.IO HTTP server listening on {}", addr);
|
tracing::info!(
|
||||||
|
endpoint = %addr,
|
||||||
|
"Engine.IO HTTP server listening, /health and /metrics available"
|
||||||
|
);
|
||||||
|
|
||||||
let result = HttpServer::new(move || {
|
let result = HttpServer::new(move || {
|
||||||
App::new()
|
App::new()
|
||||||
.app_data(web::Data::new(store.clone()))
|
.app_data(web::Data::new(store.clone()))
|
||||||
.app_data(web::Data::new(config.clone()))
|
.app_data(web::Data::new(config.clone()))
|
||||||
.app_data(web::Data::new(on_message.clone()))
|
.app_data(web::Data::new(on_message.clone()))
|
||||||
|
// Health check with connection metrics
|
||||||
.route(
|
.route(
|
||||||
"/health",
|
"/health",
|
||||||
web::get().to(crate::engine::health::health_check),
|
web::get().to(crate::engine::health::health_check),
|
||||||
)
|
)
|
||||||
|
// Prometheus metrics endpoint
|
||||||
|
.route(
|
||||||
|
"/metrics",
|
||||||
|
web::get().to(crate::telemetry::metrics::metrics_handler),
|
||||||
|
)
|
||||||
.route("/engine.io/", web::get().to(engine_get))
|
.route("/engine.io/", web::get().to(engine_get))
|
||||||
.route(
|
.route(
|
||||||
"/engine.io/",
|
"/engine.io/",
|
||||||
|
|||||||
+11
-1
@@ -129,6 +129,12 @@ impl SessionStore {
|
|||||||
sid
|
sid
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
if let Some(m) = crate::telemetry::metrics::try_get() {
|
||||||
|
m.engine_sessions_active.add(
|
||||||
|
1,
|
||||||
|
&[opentelemetry::KeyValue::new("transport", transport.as_str())],
|
||||||
|
);
|
||||||
|
}
|
||||||
rx
|
rx
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -137,7 +143,11 @@ impl SessionStore {
|
|||||||
}
|
}
|
||||||
|
|
||||||
pub fn remove(&self, sid: &str) {
|
pub fn remove(&self, sid: &str) {
|
||||||
self.sessions.remove(sid);
|
if self.sessions.remove(sid).is_some()
|
||||||
|
&& let Some(m) = crate::telemetry::metrics::try_get()
|
||||||
|
{
|
||||||
|
m.engine_sessions_active.add(-1, &[]);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
pub fn exists(&self, sid: &str) -> bool {
|
pub fn exists(&self, sid: &str) -> bool {
|
||||||
|
|||||||
@@ -8,5 +8,6 @@ pub mod repo;
|
|||||||
pub mod rpc;
|
pub mod rpc;
|
||||||
pub mod socket;
|
pub mod socket;
|
||||||
pub mod svc;
|
pub mod svc;
|
||||||
|
pub mod telemetry;
|
||||||
|
|
||||||
pub use error::{ImksError, ImksResult};
|
pub use error::{ImksError, ImksResult};
|
||||||
|
|||||||
@@ -9,14 +9,12 @@ use imks::socket::message_bus::{NatsMessageBus, RedisMessageBus};
|
|||||||
|
|
||||||
use imks::socket::server::SocketServerBuilder;
|
use imks::socket::server::SocketServerBuilder;
|
||||||
use imks::svc::{DeployConfig, MessageService};
|
use imks::svc::{DeployConfig, MessageService};
|
||||||
|
use imks::telemetry;
|
||||||
|
|
||||||
fn main() -> Result<(), Box<dyn std::error::Error>> {
|
fn main() -> Result<(), Box<dyn std::error::Error>> {
|
||||||
tracing_subscriber::fmt()
|
// Initialize observability stack (traces, metrics, logs, health)
|
||||||
.with_env_filter(
|
let telemetry_guard = telemetry::init();
|
||||||
tracing_subscriber::EnvFilter::try_from_default_env()
|
telemetry::health::init_counters();
|
||||||
.unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info")),
|
|
||||||
)
|
|
||||||
.init();
|
|
||||||
|
|
||||||
let deploy = DeployConfig::from_env();
|
let deploy = DeployConfig::from_env();
|
||||||
tracing::info!(
|
tracing::info!(
|
||||||
@@ -37,7 +35,6 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
|
|||||||
Arc::new(OnceLock::new());
|
Arc::new(OnceLock::new());
|
||||||
|
|
||||||
// Pre-configure adapter for Redis/NATS mode.
|
// Pre-configure adapter for Redis/NATS mode.
|
||||||
// The callback resolves namespaces after SocketServer is built.
|
|
||||||
match deploy.adapter_mode.as_str() {
|
match deploy.adapter_mode.as_str() {
|
||||||
"redis" => {
|
"redis" => {
|
||||||
let message_bus = Arc::new(
|
let message_bus = Arc::new(
|
||||||
@@ -130,27 +127,58 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
|
|||||||
.map_err(|e| e.to_string())?;
|
.map_err(|e| e.to_string())?;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Increment connection metrics
|
||||||
|
let m = telemetry::metrics::get();
|
||||||
|
m.connections_active.add(
|
||||||
|
1,
|
||||||
|
&telemetry::MetricsInstruments::namespace_attrs(&socket.namespace),
|
||||||
|
);
|
||||||
|
m.connections_total.add(
|
||||||
|
1,
|
||||||
|
&telemetry::MetricsInstruments::namespace_attrs(&socket.namespace),
|
||||||
|
);
|
||||||
|
telemetry::health::connection_connected();
|
||||||
|
|
||||||
tracing::info!(
|
tracing::info!(
|
||||||
"Socket {} connected (engine: {})",
|
socket_sid = %socket.sid,
|
||||||
socket.sid,
|
engine_sid = %socket.engine_sid,
|
||||||
socket.engine_sid
|
namespace = %socket.namespace,
|
||||||
|
"Socket connected"
|
||||||
);
|
);
|
||||||
Ok(())
|
Ok(())
|
||||||
})
|
})
|
||||||
.await;
|
.await;
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
// Register Socket.IO event handlers
|
// Register Socket.IO event handlers
|
||||||
if let Some(ref svc) = service {
|
if let Some(ref svc) = service {
|
||||||
macro_rules! register_event {
|
macro_rules! register_event {
|
||||||
($svc:expr, $ns:expr, $event:expr, $method:ident) => {
|
($svc:expr, $ns:expr, $event:expr, $method:ident) => {
|
||||||
let s = $svc.clone();
|
let s = $svc.clone();
|
||||||
|
let event_name = $event.to_string();
|
||||||
$ns.on_event($event, Arc::new(move |socket, data| {
|
$ns.on_event($event, Arc::new(move |socket, data| {
|
||||||
let s = s.clone();
|
let s = s.clone();
|
||||||
let data = data.clone();
|
let data = data.clone();
|
||||||
|
let event = event_name.clone();
|
||||||
tokio::spawn(async move {
|
tokio::spawn(async move {
|
||||||
|
let _span = tracing::info_span!(
|
||||||
|
"socket_event",
|
||||||
|
otel.name = format!("handle {event}"),
|
||||||
|
event = %event,
|
||||||
|
socket_sid = %socket.sid,
|
||||||
|
);
|
||||||
|
let _enter = _span.enter();
|
||||||
|
|
||||||
|
let start = std::time::Instant::now();
|
||||||
if let Err(e) = s.$method(socket, &data).await {
|
if let Err(e) = s.$method(socket, &data).await {
|
||||||
tracing::error!(event = $event, error = %e, "Event handler failed");
|
tracing::error!(event = %event, error = %e, "Event handler failed");
|
||||||
}
|
}
|
||||||
|
let elapsed = start.elapsed().as_secs_f64();
|
||||||
|
telemetry::metrics::get().event_handling_duration.record(
|
||||||
|
elapsed,
|
||||||
|
&telemetry::MetricsInstruments::event_attrs(&event),
|
||||||
|
);
|
||||||
});
|
});
|
||||||
})).await;
|
})).await;
|
||||||
};
|
};
|
||||||
@@ -200,11 +228,12 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
|
|||||||
register_event!(svc, namespace, "article:list", list_articles);
|
register_event!(svc, namespace, "article:list", list_articles);
|
||||||
register_event!(svc, namespace, "article:delete", delete_article);
|
register_event!(svc, namespace, "article:delete", delete_article);
|
||||||
register_event!(svc, namespace, "component:interact", interact_component);
|
register_event!(svc, namespace, "component:interact", interact_component);
|
||||||
|
register_event!(svc, namespace, "component:update", update_component);
|
||||||
|
|
||||||
// Start scheduled message dispatcher (background task)
|
// Start scheduled message dispatcher (background task)
|
||||||
svc.clone().start_scheduled_dispatcher();
|
svc.clone().start_scheduled_dispatcher();
|
||||||
|
|
||||||
tracing::info!("Registered Socket.IO event handlers");
|
tracing::info!("Registered Socket.IO event handlers with observability instrumentation");
|
||||||
}
|
}
|
||||||
|
|
||||||
// Start servers
|
// Start servers
|
||||||
@@ -233,6 +262,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
|
|||||||
Ok::<(), Box<dyn std::error::Error>>(())
|
Ok::<(), Box<dyn std::error::Error>>(())
|
||||||
})?;
|
})?;
|
||||||
|
|
||||||
|
// Graceful telemetry shutdown
|
||||||
|
telemetry_guard.shutdown();
|
||||||
|
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
+11
-3
@@ -75,7 +75,10 @@ impl Namespace {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Remove a socket by its socket SID.
|
/// Remove a socket by its socket SID.
|
||||||
pub async fn remove_socket_by_sid(&self, socket_sid: &str) {
|
///
|
||||||
|
/// Returns `true` if a socket was actually removed, `false` if the SID
|
||||||
|
/// was not found (already removed or never existed).
|
||||||
|
pub async fn remove_socket_by_sid(&self, socket_sid: &str) -> bool {
|
||||||
if let Some((_, socket)) = self.sockets.remove(socket_sid) {
|
if let Some((_, socket)) = self.sockets.remove(socket_sid) {
|
||||||
self.engine_to_socket.remove(&socket.engine_sid);
|
self.engine_to_socket.remove(&socket.engine_sid);
|
||||||
self.remove_socket_from_local_rooms(socket_sid);
|
self.remove_socket_from_local_rooms(socket_sid);
|
||||||
@@ -86,14 +89,19 @@ impl Namespace {
|
|||||||
{
|
{
|
||||||
tracing::warn!("Adapter del_all error for socket {}: {}", socket_sid, e);
|
tracing::warn!("Adapter del_all error for socket {}: {}", socket_sid, e);
|
||||||
}
|
}
|
||||||
|
true
|
||||||
|
} else {
|
||||||
|
false
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Remove a socket by its engine SID (for engine-level disconnections).
|
/// Remove a socket by its engine SID (for engine-level disconnections).
|
||||||
pub async fn remove_socket(&self, engine_sid: &str) {
|
/// Returns `true` if a socket was actually removed.
|
||||||
|
pub async fn remove_socket(&self, engine_sid: &str) -> bool {
|
||||||
if let Some((_, socket_sid)) = self.engine_to_socket.remove(engine_sid) {
|
if let Some((_, socket_sid)) = self.engine_to_socket.remove(engine_sid) {
|
||||||
self.remove_socket_by_sid(&socket_sid).await;
|
return self.remove_socket_by_sid(&socket_sid).await;
|
||||||
}
|
}
|
||||||
|
false
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Look up a socket by its socket SID.
|
/// Look up a socket by its socket SID.
|
||||||
|
|||||||
+67
-34
@@ -143,29 +143,40 @@ async fn handle_engine_message(
|
|||||||
) {
|
) {
|
||||||
if let EnginePacketData::Text(ref text) = engine_packet.data {
|
if let EnginePacketData::Text(ref text) = engine_packet.data {
|
||||||
match parser::decode(text) {
|
match parser::decode(text) {
|
||||||
Ok(socket_packet) => match socket_packet.packet_type {
|
Ok(socket_packet) => {
|
||||||
PacketType::Connect => {
|
let packet_type = format!("{:?}", socket_packet.packet_type);
|
||||||
handle_connect(
|
let _span = tracing::debug_span!(
|
||||||
&engine_sid,
|
"engine_message",
|
||||||
&socket_packet,
|
engine_sid = %engine_sid,
|
||||||
namespaces,
|
packet_type = %packet_type,
|
||||||
socket_txs,
|
namespace = %socket_packet.namespace,
|
||||||
engine_store,
|
);
|
||||||
adapter,
|
let _enter = _span.enter();
|
||||||
)
|
|
||||||
.await;
|
match socket_packet.packet_type {
|
||||||
|
PacketType::Connect => {
|
||||||
|
handle_connect(
|
||||||
|
&engine_sid,
|
||||||
|
&socket_packet,
|
||||||
|
namespaces,
|
||||||
|
socket_txs,
|
||||||
|
engine_store,
|
||||||
|
adapter,
|
||||||
|
)
|
||||||
|
.await;
|
||||||
|
}
|
||||||
|
PacketType::Disconnect => {
|
||||||
|
handle_disconnect(&engine_sid, &socket_packet, namespaces, socket_txs);
|
||||||
|
}
|
||||||
|
PacketType::Event => {
|
||||||
|
handle_event(&engine_sid, &socket_packet, namespaces);
|
||||||
|
}
|
||||||
|
PacketType::Ack => {
|
||||||
|
handle_ack(&engine_sid, &socket_packet);
|
||||||
|
}
|
||||||
|
_ => {}
|
||||||
}
|
}
|
||||||
PacketType::Disconnect => {
|
}
|
||||||
handle_disconnect(&engine_sid, &socket_packet, namespaces, socket_txs);
|
|
||||||
}
|
|
||||||
PacketType::Event => {
|
|
||||||
handle_event(&engine_sid, &socket_packet, namespaces);
|
|
||||||
}
|
|
||||||
PacketType::Ack => {
|
|
||||||
handle_ack(&engine_sid, &socket_packet);
|
|
||||||
}
|
|
||||||
_ => {}
|
|
||||||
},
|
|
||||||
Err(e) => {
|
Err(e) => {
|
||||||
tracing::warn!(engine_sid = %engine_sid, error = %e, "Invalid Socket.IO packet");
|
tracing::warn!(engine_sid = %engine_sid, error = %e, "Invalid Socket.IO packet");
|
||||||
}
|
}
|
||||||
@@ -181,6 +192,13 @@ async fn handle_connect(
|
|||||||
engine_store: &SessionStore,
|
engine_store: &SessionStore,
|
||||||
adapter: &Arc<dyn Adapter>,
|
adapter: &Arc<dyn Adapter>,
|
||||||
) {
|
) {
|
||||||
|
let _span = tracing::info_span!(
|
||||||
|
"socket_connect",
|
||||||
|
engine_sid = %engine_sid,
|
||||||
|
namespace = %packet.namespace,
|
||||||
|
);
|
||||||
|
let _enter = _span.enter();
|
||||||
|
|
||||||
// Validate namespace path to prevent DoS via arbitrary namespace creation
|
// Validate namespace path to prevent DoS via arbitrary namespace creation
|
||||||
if !crate::socket::namespace::is_valid_namespace(&packet.namespace) {
|
if !crate::socket::namespace::is_valid_namespace(&packet.namespace) {
|
||||||
tracing::warn!(
|
tracing::warn!(
|
||||||
@@ -244,11 +262,16 @@ async fn handle_connect(
|
|||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
// Forwarding task ended — ensure socket is cleaned up from namespace
|
// Forwarding task ended — ensure socket is cleaned up from namespace.
|
||||||
|
// If the socket was still registered (session expiry / engine disconnect
|
||||||
|
// without Socket.IO disconnect packet), also update the connection counter.
|
||||||
socket_txs_clone.remove(&socket_sid_clone);
|
socket_txs_clone.remove(&socket_sid_clone);
|
||||||
namespace_clone
|
let was_removed = namespace_clone
|
||||||
.remove_socket_by_sid(&socket_sid_clone)
|
.remove_socket_by_sid(&socket_sid_clone)
|
||||||
.await;
|
.await;
|
||||||
|
if was_removed {
|
||||||
|
crate::telemetry::health::connection_disconnected();
|
||||||
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
// Send Connect response (only after handler passed)
|
// Send Connect response (only after handler passed)
|
||||||
@@ -268,16 +291,26 @@ fn handle_disconnect(
|
|||||||
namespaces: &Arc<NamespaceManager>,
|
namespaces: &Arc<NamespaceManager>,
|
||||||
socket_txs: &Arc<DashMap<String, mpsc::Sender<Packet>>>,
|
socket_txs: &Arc<DashMap<String, mpsc::Sender<Packet>>>,
|
||||||
) {
|
) {
|
||||||
if let Some(namespace) = namespaces.get_namespace(&packet.namespace) {
|
if let Some(namespace) = namespaces.get_namespace(&packet.namespace)
|
||||||
// Look up socket by engine_sid, then remove by socket_sid
|
&& let Some(socket) = namespace.get_socket_by_engine_sid(engine_sid)
|
||||||
if let Some(socket) = namespace.get_socket_by_engine_sid(engine_sid) {
|
{
|
||||||
socket_txs.remove(&socket.sid);
|
let m = crate::telemetry::metrics::get();
|
||||||
let socket_sid = socket.sid.clone();
|
m.connections_active.add(
|
||||||
let ns_clone = namespace.clone();
|
-1,
|
||||||
tokio::spawn(async move {
|
&crate::telemetry::MetricsInstruments::namespace_attrs(&socket.namespace),
|
||||||
ns_clone.remove_socket_by_sid(&socket_sid).await;
|
);
|
||||||
});
|
m.disconnections_total.add(
|
||||||
}
|
1,
|
||||||
|
&crate::telemetry::MetricsInstruments::namespace_attrs(&socket.namespace),
|
||||||
|
);
|
||||||
|
crate::telemetry::health::connection_disconnected();
|
||||||
|
|
||||||
|
socket_txs.remove(&socket.sid);
|
||||||
|
let socket_sid = socket.sid.clone();
|
||||||
|
let ns_clone = namespace.clone();
|
||||||
|
tokio::spawn(async move {
|
||||||
|
ns_clone.remove_socket_by_sid(&socket_sid).await;
|
||||||
|
});
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -64,7 +64,6 @@ impl MessageService {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Handle `component:update` — update a component's state (e.g., disable after interaction).
|
/// Handle `component:update` — update a component's state (e.g., disable after interaction).
|
||||||
#[allow(dead_code)]
|
|
||||||
pub async fn update_component(
|
pub async fn update_component(
|
||||||
&self,
|
&self,
|
||||||
socket: Arc<Socket>,
|
socket: Arc<Socket>,
|
||||||
|
|||||||
+38
-1
@@ -296,6 +296,43 @@ impl MessageService {
|
|||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Handle `message:edit_history` — retrieve the edit history for a message.
|
||||||
|
pub async fn get_edit_history(
|
||||||
|
&self,
|
||||||
|
socket: Arc<Socket>,
|
||||||
|
data: &serde_json::Value,
|
||||||
|
) -> ImksResult<()> {
|
||||||
|
let user_id = self.user_id(&socket)?;
|
||||||
|
let payload = Self::first_payload(data)?;
|
||||||
|
let message_id: Uuid = Self::parse_field(payload, "message_id")?;
|
||||||
|
|
||||||
|
let message = self
|
||||||
|
.repo
|
||||||
|
.get(message_id)
|
||||||
|
.await?
|
||||||
|
.ok_or_else(|| ImksError::NotFound(format!("message {message_id}")))?;
|
||||||
|
|
||||||
|
let channel_id_str = message.channel_id.to_string();
|
||||||
|
let user_id_str = user_id.to_string();
|
||||||
|
|
||||||
|
self.ensure_readable(&channel_id_str, &user_id_str).await?;
|
||||||
|
|
||||||
|
let history = self.repo.get_edit_history(message_id).await?;
|
||||||
|
let summary = self.repo.get_edit_summary(message_id).await?;
|
||||||
|
|
||||||
|
let _ = socket.emit(
|
||||||
|
"message:edit_history",
|
||||||
|
serde_json::json!({
|
||||||
|
"message_id": message_id.to_string(),
|
||||||
|
"edits": history,
|
||||||
|
"edit_count": summary.edit_count,
|
||||||
|
"last_edited_at": summary.last_edited_at,
|
||||||
|
"last_edited_by": summary.last_edited_by,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
// Permission validation helpers
|
// Permission validation helpers
|
||||||
|
|
||||||
/// Full write-access gate: resolve channel + readability + membership + SEND_MESSAGE.
|
/// Full write-access gate: resolve channel + readability + membership + SEND_MESSAGE.
|
||||||
@@ -481,7 +518,7 @@ impl MessageService {
         }
     }
 
-    fn validate_body_size(&self, body: &str) -> ImksResult<()> {
+    pub(crate) fn validate_body_size(&self, body: &str) -> ImksResult<()> {
         if body.len() > self.max_body_size {
             return Err(ImksError::InvalidInput(format!(
                 "Message body exceeds max size of {} bytes (got {})",
||||||
|
|||||||
+120
-4
@@ -1,15 +1,132 @@
|
|||||||
//! Scheduled message dispatcher on `MessageService`.
|
//! Scheduled message handler on `MessageService`.
|
||||||
//!
|
//!
|
||||||
//! A background task that periodically scans for due scheduled messages
|
//! Provides:
|
||||||
//! and sends them through the normal message path.
|
//! - Client-facing CRUD: schedule, cancel, list pending scheduled messages
|
||||||
|
//! - Background dispatcher: periodically scans for due scheduled messages
|
||||||
|
//! and sends them through the normal message path.
|
||||||
|
|
||||||
|
use std::sync::Arc;
|
||||||
use std::time::Duration;
|
use std::time::Duration;
|
||||||
|
|
||||||
|
use chrono::{DateTime, Utc};
|
||||||
|
use uuid::Uuid;
|
||||||
|
|
||||||
use crate::repo::CreateMessageInput;
|
use crate::repo::CreateMessageInput;
|
||||||
|
use crate::socket::socket::Socket;
|
||||||
|
use crate::{ImksError, ImksResult};
|
||||||
|
|
||||||
use super::message::MessageService;
|
use super::message::MessageService;
|
||||||
|
|
||||||
impl MessageService {
|
impl MessageService {
|
||||||
|
// ── Client-facing scheduled message CRUD ──
|
||||||
|
|
||||||
|
/// Handle `message:schedule` — schedule a message to be sent at a future time.
|
||||||
|
pub async fn schedule_message(
|
||||||
|
&self,
|
||||||
|
socket: Arc<Socket>,
|
||||||
|
data: &serde_json::Value,
|
||||||
|
) -> ImksResult<()> {
|
||||||
|
let user_id = self.user_id(&socket)?;
|
||||||
|
let payload = Self::first_payload(data)?;
|
||||||
|
|
||||||
|
let channel_id: Uuid = Self::parse_field(payload, "channel_id")?;
|
||||||
|
let body: String = Self::parse_field(payload, "body")?;
|
||||||
|
let thread_id: Option<Uuid> = Self::parse_optional(payload, "thread_id")?;
|
||||||
|
let reply_to_message_id: Option<Uuid> =
|
||||||
|
Self::parse_optional(payload, "reply_to_message_id")?;
|
||||||
|
let metadata: Option<serde_json::Value> =
|
||||||
|
Self::parse_optional(payload, "metadata")?;
|
||||||
|
let scheduled_at_str: String = Self::parse_field(payload, "scheduled_at")?;
|
||||||
|
|
||||||
|
let scheduled_at: DateTime<Utc> = chrono::DateTime::parse_from_rfc3339(&scheduled_at_str)
|
||||||
|
.map_err(|e| ImksError::InvalidInput(format!("Invalid scheduled_at: {e}")))?
|
||||||
|
.into();
|
||||||
|
|
||||||
|
let channel_id_str = channel_id.to_string();
|
||||||
|
let user_id_str = user_id.to_string();
|
||||||
|
|
||||||
|
self.validate_body_size(&body)?;
|
||||||
|
self.ensure_readable(&channel_id_str, &user_id_str).await?;
|
||||||
|
self.ensure_member(&channel_id_str, &user_id_str).await?;
|
||||||
|
|
||||||
|
// Validate scheduled_at is in the future
|
||||||
|
if scheduled_at <= Utc::now() {
|
||||||
|
return Err(ImksError::InvalidInput(
|
||||||
|
"scheduled_at must be in the future".into(),
|
||||||
|
));
|
||||||
|
}
|
||||||
|
|
||||||
|
let scheduled = self
|
||||||
|
.repo
|
||||||
|
.schedule_message(
|
||||||
|
channel_id,
|
||||||
|
user_id,
|
||||||
|
thread_id,
|
||||||
|
reply_to_message_id,
|
||||||
|
&body,
|
||||||
|
metadata,
|
||||||
|
scheduled_at,
|
||||||
|
)
|
||||||
|
.await?;
|
||||||
|
|
||||||
|
tracing::info!(
|
||||||
|
scheduled_id = %scheduled.id,
|
||||||
|
channel_id = %channel_id,
|
||||||
|
user_id = %user_id,
|
||||||
|
scheduled_at = %scheduled_at,
|
||||||
|
"Message scheduled"
|
||||||
|
);
|
||||||
|
Ok(())
|
||||||
|
}
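The future-time guard in `schedule_message` needs no I/O, so it can be exercised in isolation. The following is a minimal sketch, not the commit's code: `ensure_future` is a hypothetical helper, and it uses `std::time::SystemTime` in place of chrono's `DateTime<Utc>` so that it compiles without external crates. The current time is passed in as a parameter to keep the check deterministic.

```rust
use std::time::SystemTime;

/// Hypothetical stand-in for the `scheduled_at <= Utc::now()` check above.
/// `now` is injected so tests do not depend on the wall clock.
fn ensure_future(scheduled_at: SystemTime, now: SystemTime) -> Result<(), String> {
    // A timestamp equal to `now` is rejected, matching the `<=` comparison.
    if scheduled_at <= now {
        return Err("scheduled_at must be in the future".to_string());
    }
    Ok(())
}
```

Because this check is pure, it could also run before the `ensure_readable` and `ensure_member` calls, which would reject a stale timestamp without touching the permission layer. Whether that ordering is preferable is a design choice for the author.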
|
||||||
|
|
||||||
|
/// Handle `message:cancel_scheduled` — cancel a pending scheduled message.
|
||||||
|
pub async fn cancel_scheduled(
|
||||||
|
&self,
|
||||||
|
socket: Arc<Socket>,
|
||||||
|
data: &serde_json::Value,
|
||||||
|
) -> ImksResult<()> {
|
||||||
|
let user_id = self.user_id(&socket)?;
|
||||||
|
let payload = Self::first_payload(data)?;
|
||||||
|
let scheduled_id: Uuid = Self::parse_field(payload, "scheduled_id")?;
|
||||||
|
|
||||||
|
let cancelled = self.repo.cancel_scheduled(scheduled_id).await?;
|
||||||
|
|
||||||
|
if !cancelled {
|
||||||
|
return Err(ImksError::NotFound(format!(
|
||||||
|
"scheduled message {scheduled_id} not found or already processed"
|
||||||
|
)));
|
||||||
|
}
|
||||||
|
|
||||||
|
tracing::info!(%scheduled_id, %user_id, "Scheduled message cancelled");
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Handle `message:list_scheduled` — list pending scheduled messages for a channel.
|
||||||
|
pub async fn list_scheduled(
|
||||||
|
&self,
|
||||||
|
socket: Arc<Socket>,
|
||||||
|
data: &serde_json::Value,
|
||||||
|
) -> ImksResult<()> {
|
||||||
|
let user_id = self.user_id(&socket)?;
|
||||||
|
let payload = Self::first_payload(data)?;
|
||||||
|
let channel_id: Uuid = Self::parse_field(payload, "channel_id")?;
|
||||||
|
|
||||||
|
let channel_id_str = channel_id.to_string();
|
||||||
|
let user_id_str = user_id.to_string();
|
||||||
|
|
||||||
|
self.ensure_readable(&channel_id_str, &user_id_str).await?;
|
||||||
|
|
||||||
|
let scheduled = self.repo.list_scheduled(channel_id, user_id).await?;
|
||||||
|
|
||||||
|
let _ = socket.emit(
|
||||||
|
"scheduled:loaded",
|
||||||
|
serde_json::to_value(&scheduled).unwrap_or_default(),
|
||||||
|
);
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Background dispatcher ──
|
||||||
|
|
||||||
/// Start the background scheduled-message dispatcher.
|
/// Start the background scheduled-message dispatcher.
|
||||||
/// Scans every 30 seconds for pending messages whose `scheduled_at` has passed.
|
/// Scans every 30 seconds for pending messages whose `scheduled_at` has passed.
|
||||||
pub fn start_scheduled_dispatcher(self: std::sync::Arc<Self>) {
|
pub fn start_scheduled_dispatcher(self: std::sync::Arc<Self>) {
|
||||||
@@ -55,7 +172,6 @@ impl MessageService {
             .mark_scheduled_sent(scheduled.id, message.id)
             .await?;
 
-        // Broadcast to channel
         if let Some(ns) = self.namespaces.get_namespace("/") {
             ns.emit_to_room(
                 &scheduled.channel_id.to_string(),
||||||
|
|||||||
@@ -0,0 +1,85 @@
|
|||||||
|
/// Telemetry configuration, populated from environment variables.
|
||||||
|
///
|
||||||
|
/// Follows the OpenTelemetry environment variable specification:
|
||||||
|
/// <https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/>
|
||||||
|
#[derive(Debug, Clone)]
|
||||||
|
pub struct TelemetryConfig {
|
||||||
|
pub service_name: String,
|
||||||
|
pub service_version: String,
|
||||||
|
pub otlp_endpoint: String,
|
||||||
|
pub otlp_protocol: OtlpProtocol,
|
||||||
|
pub traces_enabled: bool,
|
||||||
|
pub metrics_enabled: bool,
|
||||||
|
pub logs_enabled: bool,
|
||||||
|
pub log_format: LogFormat,
|
||||||
|
pub log_level: String,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, PartialEq)]
|
||||||
|
pub enum OtlpProtocol {
|
||||||
|
Grpc,
|
||||||
|
HttpProtobuf,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, PartialEq)]
|
||||||
|
pub enum LogFormat {
|
||||||
|
Json,
|
||||||
|
Pretty,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl Default for TelemetryConfig {
|
||||||
|
fn default() -> Self {
|
||||||
|
Self {
|
||||||
|
service_name: env_or("OTEL_SERVICE_NAME", "imks"),
|
||||||
|
service_version: env_or("OTEL_SERVICE_VERSION", env!("CARGO_PKG_VERSION")),
|
||||||
|
otlp_endpoint: env_or(
|
||||||
|
"OTEL_EXPORTER_OTLP_ENDPOINT",
|
||||||
|
"http://localhost:4317",
|
||||||
|
),
|
||||||
|
otlp_protocol: detect_otlp_protocol(),
|
||||||
|
traces_enabled: env_bool("OTEL_TRACES_ENABLED", true),
|
||||||
|
metrics_enabled: env_bool("OTEL_METRICS_ENABLED", true),
|
||||||
|
logs_enabled: env_bool("OTEL_LOGS_ENABLED", true),
|
||||||
|
log_format: detect_log_format(),
|
||||||
|
log_level: env_or("RUST_LOG", "info"),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl TelemetryConfig {
|
||||||
|
pub fn from_env() -> Self {
|
||||||
|
Self::default()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn env_or(key: &str, default: &str) -> String {
|
||||||
|
std::env::var(key).unwrap_or_else(|_| default.to_string())
|
||||||
|
}
|
||||||
|
|
||||||
|
fn env_bool(key: &str, default: bool) -> bool {
|
||||||
|
std::env::var(key)
|
||||||
|
.map(|v| matches!(v.to_lowercase().as_str(), "true" | "1" | "yes" | "on"))
|
||||||
|
.unwrap_or(default)
|
||||||
|
}
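One detail of `env_bool` is easy to miss: the default applies only when the variable is unset. A variable that is set to an unrecognized value evaluates to `false`, even if the default is `true`. The sketch below mirrors that matching with the environment lookup factored out; `parse_bool` is a hypothetical helper introduced here for illustration, not part of the commit.

```rust
/// Hypothetical helper mirroring `env_bool`, with the environment lookup removed.
/// `raw` is `None` when the variable is unset.
fn parse_bool(raw: Option<&str>, default: bool) -> bool {
    match raw {
        // A set value is truthy only if it matches one of the accepted spellings.
        Some(v) => matches!(v.to_lowercase().as_str(), "true" | "1" | "yes" | "on"),
        // An unset variable falls back to the default.
        None => default,
    }
}
```

Under this rule, a value such as `OTEL_TRACES_ENABLED=enabled` would turn tracing off, not leave it at its default of `true`.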
|
||||||
|
|
||||||
|
fn detect_otlp_protocol() -> OtlpProtocol {
|
||||||
|
match std::env::var("OTEL_EXPORTER_OTLP_PROTOCOL")
|
||||||
|
.unwrap_or_default()
|
||||||
|
.to_lowercase()
|
||||||
|
.as_str()
|
||||||
|
{
|
||||||
|
"http/protobuf" | "http/binary" => OtlpProtocol::HttpProtobuf,
|
||||||
|
_ => OtlpProtocol::Grpc, // default to gRPC as project already depends on tonic
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn detect_log_format() -> LogFormat {
|
||||||
|
match std::env::var("LOG_FORMAT")
|
||||||
|
.unwrap_or_else(|_| "json".to_string())
|
||||||
|
.to_lowercase()
|
||||||
|
.as_str()
|
||||||
|
{
|
||||||
|
"pretty" | "text" | "console" => LogFormat::Pretty,
|
||||||
|
_ => LogFormat::Json, // default to JSON for structured logging
|
||||||
|
}
|
||||||
|
}
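The format detection above accepts three spellings for the pretty format and maps everything else to JSON. A minimal sketch of the same mapping as a pure function follows; `log_format_from` is a hypothetical name, and the enum is redeclared locally so the block stands alone.

```rust
#[derive(Debug, Clone, PartialEq)]
enum LogFormat {
    Json,
    Pretty,
}

/// Hypothetical pure variant of `detect_log_format`.
/// `raw` is the value of `LOG_FORMAT`, or `None` when it is unset.
fn log_format_from(raw: Option<&str>) -> LogFormat {
    match raw.unwrap_or("json").to_lowercase().as_str() {
        "pretty" | "text" | "console" => LogFormat::Pretty,
        // Any other value, including unknown ones, selects JSON.
        _ => LogFormat::Json,
    }
}
```

An unrecognized value such as `both` falls through the wildcard arm and selects JSON silently, so a warning on unknown values may be worth considering.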
|
||||||
@@ -0,0 +1,170 @@
|
|||||||
|
//! Enhanced health check endpoint with upstream dependency checks.
|
||||||
|
//!
|
||||||
|
//! Returns JSON with server status, version, uptime, connection counts,
|
||||||
|
//! and optional health checks for PostgreSQL, Redis, NATS, and gRPC.
|
||||||
|
|
||||||
|
use std::sync::Arc;
|
||||||
|
use std::sync::OnceLock;
|
||||||
|
use std::sync::atomic::{AtomicU64, Ordering};
|
||||||
|
use std::time::Instant;
|
||||||
|
|
||||||
|
use actix_web::HttpResponse;
|
||||||
|
use serde::Serialize;
|
||||||
|
|
||||||
|
/// Server start time captured at init.
|
||||||
|
static START_TIME: OnceLock<Instant> = OnceLock::new();
|
||||||
|
|
||||||
|
/// Live connection counter shared across the process.
|
||||||
|
/// Updated by the socket layer on connect / disconnect.
|
||||||
|
static CONNECTIONS_ACTIVE: OnceLock<AtomicU64> = OnceLock::new();
|
||||||
|
|
||||||
|
/// Initializes the start time (call once during startup).
|
||||||
|
pub fn record_start_time() {
|
||||||
|
START_TIME.set(Instant::now()).ok();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Initialize shared health counters (call once during startup).
|
||||||
|
pub fn init_counters() {
|
||||||
|
CONNECTIONS_ACTIVE.set(AtomicU64::new(0)).ok();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Signal that a new socket connection was established.
|
||||||
|
pub fn connection_connected() {
|
||||||
|
if let Some(c) = CONNECTIONS_ACTIVE.get() {
|
||||||
|
c.fetch_add(1, Ordering::Relaxed);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Signal that a socket connection was closed.
|
||||||
|
pub fn connection_disconnected() {
|
||||||
|
if let Some(c) = CONNECTIONS_ACTIVE.get() {
|
||||||
|
c.fetch_sub(1, Ordering::Relaxed);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Return the current number of active socket connections.
|
||||||
|
pub fn connections_active_count() -> u64 {
|
||||||
|
CONNECTIONS_ACTIVE
|
||||||
|
.get()
|
||||||
|
.map(|c| c.load(Ordering::Relaxed))
|
||||||
|
.unwrap_or(0)
|
||||||
|
}
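`connection_disconnected` uses `fetch_sub` on an `AtomicU64`, which wraps around to `u64::MAX` if a disconnect is ever recorded without a matching connect. As a hedged alternative, the sketch below decrements with `fetch_update` and `saturating_sub` so the counter stops at zero. `saturating_decrement` is a hypothetical helper, not part of the commit.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Decrement `counter` by one without wrapping below zero.
/// Returns the value observed before the update.
fn saturating_decrement(counter: &AtomicU64) -> u64 {
    counter
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| {
            Some(n.saturating_sub(1))
        })
        // The closure always returns `Some`, so the update cannot fail.
        .unwrap_or(0)
}
```

The trade-off is a compare-and-swap loop in place of a single atomic subtraction, which should be negligible at connection-event frequency.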
|
||||||
|
|
||||||
|
/// Returns the server uptime in seconds.
|
||||||
|
pub fn uptime_secs() -> u64 {
|
||||||
|
START_TIME
|
||||||
|
.get()
|
||||||
|
.map(|t| t.elapsed().as_secs())
|
||||||
|
.unwrap_or(0)
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct HealthResponse {
|
||||||
|
pub status: String,
|
||||||
|
pub version: String,
|
||||||
|
pub timestamp: String,
|
||||||
|
pub uptime_secs: u64,
|
||||||
|
pub connections_active: u64,
|
||||||
|
pub sessions_count: u64,
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub checks: Option<HealthChecks>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct HealthChecks {
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub postgres: Option<CheckResult>,
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub redis: Option<CheckResult>,
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub nats: Option<CheckResult>,
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub grpc: Option<CheckResult>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Serialize)]
|
||||||
|
pub struct CheckResult {
|
||||||
|
pub status: String,
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub latency_ms: Option<u64>,
|
||||||
|
#[serde(skip_serializing_if = "Option::is_none")]
|
||||||
|
pub error: Option<String>,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Optional external check functions.
|
||||||
|
/// Each returns `Some(CheckResult)` if the service is configured, `None` otherwise.
|
||||||
|
#[derive(Default)]
|
||||||
|
pub struct HealthCheckFns {
|
||||||
|
pub check_postgres: Option<Arc<dyn Fn() -> CheckResult + Send + Sync>>,
|
||||||
|
pub check_redis: Option<Arc<dyn Fn() -> CheckResult + Send + Sync>>,
|
||||||
|
pub check_nats: Option<Arc<dyn Fn() -> CheckResult + Send + Sync>>,
|
||||||
|
pub check_grpc: Option<Arc<dyn Fn() -> CheckResult + Send + Sync>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl HealthCheckFns {
|
||||||
|
pub fn with_postgres(mut self, f: impl Fn() -> CheckResult + Send + Sync + 'static) -> Self {
|
||||||
|
self.check_postgres = Some(Arc::new(f));
|
||||||
|
self
|
||||||
|
}
|
||||||
|
|
||||||
|
pub fn with_redis(mut self, f: impl Fn() -> CheckResult + Send + Sync + 'static) -> Self {
|
||||||
|
self.check_redis = Some(Arc::new(f));
|
||||||
|
self
|
||||||
|
}
|
||||||
|
|
||||||
|
pub fn with_nats(mut self, f: impl Fn() -> CheckResult + Send + Sync + 'static) -> Self {
|
||||||
|
self.check_nats = Some(Arc::new(f));
|
||||||
|
self
|
||||||
|
}
|
||||||
|
|
||||||
|
pub fn with_grpc(mut self, f: impl Fn() -> CheckResult + Send + Sync + 'static) -> Self {
|
||||||
|
self.check_grpc = Some(Arc::new(f));
|
||||||
|
self
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// GET /health handler with dependency checks.
|
||||||
|
pub async fn health_check(checks: actix_web::web::Data<Arc<HealthCheckFns>>) -> HttpResponse {
|
||||||
|
let checks = checks.get_ref();
|
||||||
|
|
||||||
|
let health_checks = if checks.check_postgres.is_some()
|
||||||
|
|| checks.check_redis.is_some()
|
||||||
|
|| checks.check_nats.is_some()
|
||||||
|
|| checks.check_grpc.is_some()
|
||||||
|
{
|
||||||
|
Some(HealthChecks {
|
||||||
|
postgres: checks.check_postgres.as_ref().map(|f| f()),
|
||||||
|
redis: checks.check_redis.as_ref().map(|f| f()),
|
||||||
|
nats: checks.check_nats.as_ref().map(|f| f()),
|
||||||
|
grpc: checks.check_grpc.as_ref().map(|f| f()),
|
||||||
|
})
|
||||||
|
} else {
|
||||||
|
None
|
||||||
|
};
|
||||||
|
|
||||||
|
let overall_status = if let Some(ref hc) = health_checks {
|
||||||
|
let all_up = [&hc.postgres, &hc.redis, &hc.nats, &hc.grpc]
|
||||||
|
.iter()
|
||||||
|
.filter_map(|c| c.as_ref())
|
||||||
|
.all(|c| c.status == "up");
|
||||||
|
if all_up {
|
||||||
|
"healthy"
|
||||||
|
} else {
|
||||||
|
"degraded"
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
"healthy"
|
||||||
|
};
|
||||||
|
|
||||||
|
let response = HealthResponse {
|
||||||
|
status: overall_status.to_string(),
|
||||||
|
version: env!("CARGO_PKG_VERSION").to_string(),
|
||||||
|
timestamp: chrono::Utc::now().to_rfc3339(),
|
||||||
|
uptime_secs: uptime_secs(),
|
||||||
|
connections_active: connections_active_count(),
|
||||||
|
sessions_count: 0,
|
||||||
|
checks: health_checks,
|
||||||
|
};
|
||||||
|
|
||||||
|
HttpResponse::Ok().json(response)
|
||||||
|
}
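The status roll-up in `health_check` can be read as a small pure function: dependencies that are not configured are ignored, and every configured one must report `up` for the overall status to be `healthy`. The sketch below restates that rule; `overall_status` is a hypothetical name, and plain string slices stand in for `CheckResult`.

```rust
/// Hypothetical pure version of the status roll-up in `health_check`.
/// Each entry is `Some(status)` for a configured dependency, `None` otherwise.
fn overall_status(checks: &[Option<&str>]) -> &'static str {
    // `flatten` skips the unconfigured (`None`) entries.
    let all_up = checks.iter().flatten().all(|status| *status == "up");
    if all_up {
        "healthy"
    } else {
        "degraded"
    }
}
```

Note that the handler above responds with HTTP 200 in both cases, so a probe that looks only at the status code will not see a `degraded` state.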
|
||||||
@@ -0,0 +1,129 @@
|
|||||||
|
//! Log export: JSON console output + OpenTelemetry log bridge (OTLP).
|
||||||
|
|
||||||
|
use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
|
||||||
|
use opentelemetry_otlp::{LogExporter, Protocol, WithExportConfig};
|
||||||
|
use opentelemetry_sdk::logs::SdkLoggerProvider;
|
||||||
|
use opentelemetry_sdk::Resource;
|
||||||
|
use tracing_subscriber::fmt::format::FmtSpan;
|
||||||
|
use tracing_subscriber::layer::SubscriberExt;
|
||||||
|
use tracing_subscriber::EnvFilter;
|
||||||
|
use tracing_subscriber::Registry;
|
||||||
|
|
||||||
|
use super::config::{OtlpProtocol, TelemetryConfig};
|
||||||
|
use crate::ImksResult;
|
||||||
|
|
||||||
|
/// Initialize the tracing subscriber.
|
||||||
|
///
|
||||||
|
/// Layer order (critical for OpenTelemetry compatibility):
|
||||||
|
/// 1. Registry
|
||||||
|
/// 2. OpenTelemetry trace layer (must be attached directly to the Registry, because its type is `OpenTelemetryLayer<Registry, _>`)
|
||||||
|
/// 3. EnvFilter
|
||||||
|
/// 4. Console formatting layer (JSON)
|
||||||
|
/// 5. OpenTelemetry log bridge
|
||||||
|
///
|
||||||
|
/// Returns the SdkLoggerProvider for graceful shutdown.
|
||||||
|
pub fn init_subscriber(
|
||||||
|
config: &TelemetryConfig,
|
||||||
|
resource: Option<&Resource>,
|
||||||
|
otel_trace_layer: Option<
|
||||||
|
tracing_opentelemetry::OpenTelemetryLayer<Registry, opentelemetry_sdk::trace::Tracer>,
|
||||||
|
>,
|
||||||
|
) -> ImksResult<SdkLoggerProvider> {
|
||||||
|
let env_filter =
|
||||||
|
EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new(&config.log_level));
|
||||||
|
|
||||||
|
let (logger_provider, log_bridge_layer) = if config.logs_enabled {
|
||||||
|
let exporter = build_log_exporter(config)?;
|
||||||
|
|
||||||
|
let resource = resource.cloned().unwrap_or_else(|| Resource::builder().build());
|
||||||
|
|
||||||
|
let provider = SdkLoggerProvider::builder()
|
||||||
|
.with_resource(resource)
|
||||||
|
.with_batch_exporter(exporter)
|
||||||
|
.build();
|
||||||
|
|
||||||
|
let bridge = OpenTelemetryTracingBridge::new(&provider);
|
||||||
|
(Some(provider), Some(bridge))
|
||||||
|
} else {
|
||||||
|
(None, None)
|
||||||
|
};
|
||||||
|
|
||||||
|
match (otel_trace_layer, log_bridge_layer) {
|
||||||
|
(Some(trace_layer), Some(log_layer)) => {
|
||||||
|
let subscriber = Registry::default()
|
||||||
|
.with(trace_layer)
|
||||||
|
.with(env_filter)
|
||||||
|
.with(make_json_fmt())
|
||||||
|
.with(log_layer);
|
||||||
|
set_subscriber(subscriber);
|
||||||
|
}
|
||||||
|
(Some(trace_layer), None) => {
|
||||||
|
let subscriber = Registry::default()
|
||||||
|
.with(trace_layer)
|
||||||
|
.with(env_filter)
|
||||||
|
.with(make_json_fmt());
|
||||||
|
set_subscriber(subscriber);
|
||||||
|
}
|
||||||
|
(None, Some(log_layer)) => {
|
||||||
|
let subscriber = Registry::default()
|
||||||
|
.with(env_filter)
|
||||||
|
.with(make_json_fmt())
|
||||||
|
.with(log_layer);
|
||||||
|
set_subscriber(subscriber);
|
||||||
|
}
|
||||||
|
(None, None) => {
|
||||||
|
let subscriber = Registry::default()
|
||||||
|
.with(env_filter)
|
||||||
|
.with(make_json_fmt());
|
||||||
|
set_subscriber(subscriber);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let logger_provider = logger_provider.unwrap_or_else(|| SdkLoggerProvider::builder().build());
|
||||||
|
|
||||||
|
Ok(logger_provider)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Create the JSON fmt layer with span context.
|
||||||
|
fn make_json_fmt<S>() -> tracing_subscriber::fmt::Layer<
|
||||||
|
S,
|
||||||
|
tracing_subscriber::fmt::format::JsonFields,
|
||||||
|
tracing_subscriber::fmt::format::Format<tracing_subscriber::fmt::format::Json>,
|
||||||
|
>
|
||||||
|
where
|
||||||
|
S: tracing::Subscriber + for<'a> tracing_subscriber::registry::LookupSpan<'a>,
|
||||||
|
{
|
||||||
|
tracing_subscriber::fmt::layer()
|
||||||
|
.json()
|
||||||
|
.with_span_events(FmtSpan::CLOSE)
|
||||||
|
.with_current_span(true)
|
||||||
|
.with_span_list(true)
|
||||||
|
}
|
||||||
|
|
||||||
|
fn set_subscriber<S>(subscriber: S)
|
||||||
|
where
|
||||||
|
S: tracing::Subscriber + Send + Sync + 'static,
|
||||||
|
{
|
||||||
|
match tracing::subscriber::set_global_default(subscriber) {
|
||||||
|
Ok(()) => {}
|
||||||
|
Err(e) => {
|
||||||
|
tracing::warn!("Could not set global tracing subscriber: {e}");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn build_log_exporter(config: &TelemetryConfig) -> ImksResult<LogExporter> {
|
||||||
|
match config.otlp_protocol {
|
||||||
|
OtlpProtocol::Grpc => LogExporter::builder()
|
||||||
|
.with_tonic()
|
||||||
|
.with_endpoint(&config.otlp_endpoint)
|
||||||
|
.build()
|
||||||
|
.map_err(|e| crate::ImksError::Internal(format!("OTLP gRPC log exporter: {e}"))),
|
||||||
|
OtlpProtocol::HttpProtobuf => LogExporter::builder()
|
||||||
|
.with_http()
|
||||||
|
.with_protocol(Protocol::HttpBinary)
|
||||||
|
.with_endpoint(&config.otlp_endpoint)
|
||||||
|
.build()
|
||||||
|
.map_err(|e| crate::ImksError::Internal(format!("OTLP HTTP log exporter: {e}"))),
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,168 @@
|
|||||||
|
//! Prometheus metrics: global meter provider, registry, and the /metrics actix-web handler.
|
||||||
|
|
||||||
|
use std::sync::OnceLock;
|
||||||
|
|
||||||
|
use opentelemetry::global;
|
||||||
|
use opentelemetry::metrics::{Counter, Histogram, Meter, UpDownCounter};
|
||||||
|
use opentelemetry::KeyValue;
|
||||||
|
use opentelemetry_sdk::metrics::SdkMeterProvider;
|
||||||
|
use opentelemetry_sdk::Resource;
|
||||||
|
use prometheus::{Encoder, Registry, TextEncoder};
|
||||||
|
|
||||||
|
use crate::ImksResult;
|
||||||
|
|
||||||
|
/// Shared Prometheus registry, lazily initialized.
|
||||||
|
static PROMETHEUS_REGISTRY: OnceLock<Registry> = OnceLock::new();
|
||||||
|
|
||||||
|
/// Global metrics instruments, initialized once at startup.
|
||||||
|
static METRICS: OnceLock<MetricsInstruments> = OnceLock::new();
|
||||||
|
|
||||||
|
/// All application metrics instruments.
|
||||||
|
#[derive(Debug, Clone)]
|
||||||
|
pub struct MetricsInstruments {
|
||||||
|
pub connections_active: UpDownCounter<i64>,
|
||||||
|
pub connections_total: Counter<u64>,
|
||||||
|
pub disconnections_total: Counter<u64>,
|
||||||
|
pub messages_received_total: Counter<u64>,
|
||||||
|
pub messages_sent_total: Counter<u64>,
|
||||||
|
pub event_handling_duration: Histogram<f64>,
|
||||||
|
pub db_query_duration: Histogram<f64>,
|
||||||
|
pub engine_sessions_active: UpDownCounter<i64>,
|
||||||
|
pub namespaces_active: UpDownCounter<i64>,
|
||||||
|
pub gprc_calls_total: Counter<u64>,
|
||||||
|
pub gprc_call_errors_total: Counter<u64>,
|
||||||
|
pub adapter_broadcasts_total: Counter<u64>,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Initialize the Prometheus meter provider and create all metric instruments.
|
||||||
|
pub fn init_metrics(
|
||||||
|
_config: &super::config::TelemetryConfig,
|
||||||
|
resource: &Resource,
|
||||||
|
) -> ImksResult<(SdkMeterProvider, MetricsInstruments)> {
|
||||||
|
let registry = Registry::new();
|
||||||
|
PROMETHEUS_REGISTRY
|
||||||
|
.set(registry.clone())
|
||||||
|
.expect("Prometheus registry already initialized");
|
||||||
|
|
||||||
|
let exporter = opentelemetry_prometheus::exporter()
|
||||||
|
.with_registry(registry)
|
||||||
|
.build()
|
||||||
|
.map_err(|e| crate::ImksError::Internal(format!("failed to build Prometheus exporter: {e}")))?;
|
||||||
|
|
||||||
|
let provider = SdkMeterProvider::builder()
|
||||||
|
.with_resource(resource.clone())
|
||||||
|
.with_reader(exporter)
|
||||||
|
.build();
|
||||||
|
|
||||||
|
global::set_meter_provider(provider.clone());
|
||||||
|
|
||||||
|
let meter = global::meter_with_scope(
|
||||||
|
opentelemetry::InstrumentationScope::builder("imks")
|
||||||
|
.with_version(env!("CARGO_PKG_VERSION"))
|
||||||
|
.build(),
|
||||||
|
);
|
||||||
|
|
||||||
|
let instruments = MetricsInstruments::new(&meter);
|
||||||
|
METRICS
|
||||||
|
.set(instruments.clone())
|
||||||
|
.expect("Metrics instruments already initialized");
|
||||||
|
|
||||||
|
Ok((provider, instruments))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Obtain the globally initialized metrics. Panics if not initialized.
|
||||||
|
pub fn get() -> MetricsInstruments {
|
||||||
|
METRICS
|
||||||
|
.get()
|
||||||
|
.expect("Metrics not initialized — call init_metrics first")
|
||||||
|
.clone()
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Obtain the globally initialized metrics, returning `None` if not initialized.
|
||||||
|
/// Prefer this in library code that may run before metrics are set up (e.g., tests).
|
||||||
|
pub fn try_get() -> Option<MetricsInstruments> {
|
||||||
|
METRICS.get().cloned()
|
||||||
|
}
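The `get` / `try_get` pair relies on `OnceLock` semantics: the first `set` wins and later ones are rejected. The sketch below shows the pattern with only the standard library. `Instruments`, `INSTRUMENTS` and `init` are hypothetical stand-ins, and `init` reports a repeated call through its return value, whereas `init_metrics` above panics via `expect`.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for `MetricsInstruments`.
#[derive(Debug, Clone, PartialEq)]
struct Instruments {
    name: &'static str,
}

static INSTRUMENTS: OnceLock<Instruments> = OnceLock::new();

/// Non-panicking accessor: `None` until `init` has run.
fn try_get() -> Option<Instruments> {
    INSTRUMENTS.get().cloned()
}

/// Initialize once; returns `false` if a value was already set.
fn init(name: &'static str) -> bool {
    INSTRUMENTS.set(Instruments { name }).is_ok()
}
```

Call sites that may run before startup completes, such as unit tests, can then guard each measurement with `if let Some(m) = try_get() { ... }`.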
|
||||||
|
|
||||||
|
impl MetricsInstruments {
|
||||||
|
fn new(meter: &Meter) -> Self {
|
||||||
|
Self {
|
||||||
|
connections_active: meter
|
||||||
|
.i64_up_down_counter("imks_connections_active")
|
||||||
|
.with_description("Number of active Socket.IO connections")
|
||||||
|
.build(),
|
||||||
|
connections_total: meter
|
||||||
|
.u64_counter("imks_connections_total")
|
||||||
|
.with_description("Total number of socket connections since start")
|
||||||
|
.build(),
|
||||||
|
disconnections_total: meter
|
||||||
|
.u64_counter("imks_disconnections_total")
|
||||||
|
.with_description("Total number of socket disconnections since start")
|
||||||
|
.build(),
|
||||||
|
messages_received_total: meter
|
||||||
|
.u64_counter("imks_messages_received_total")
|
||||||
|
.with_description("Total number of messages received from clients")
|
||||||
|
.build(),
|
||||||
|
messages_sent_total: meter
|
||||||
|
.u64_counter("imks_messages_sent_total")
|
||||||
|
.with_description("Total number of messages sent to clients")
|
||||||
|
.build(),
|
||||||
|
event_handling_duration: meter
|
||||||
|
.f64_histogram("imks_event_handling_duration_seconds")
|
||||||
|
.with_description("Socket.IO event handling latency in seconds")
|
||||||
|
.build(),
|
||||||
|
db_query_duration: meter
|
||||||
|
.f64_histogram("imks_db_query_duration_seconds")
|
||||||
|
.with_description("Database query duration in seconds")
|
||||||
|
.build(),
|
||||||
|
engine_sessions_active: meter
|
||||||
|
.i64_up_down_counter("imks_engine_sessions_active")
|
||||||
|
.with_description("Number of active Engine.IO sessions")
|
||||||
|
.build(),
|
||||||
|
namespaces_active: meter
|
||||||
|
.i64_up_down_counter("imks_namespaces_active")
|
||||||
|
.with_description("Number of active Socket.IO namespaces")
|
||||||
|
.build(),
|
||||||
|
gprc_calls_total: meter
|
||||||
|
.u64_counter("imks_grpc_calls_total")
|
||||||
|
.with_description("Total number of gRPC calls to appks")
|
||||||
|
.build(),
|
||||||
|
gprc_call_errors_total: meter
|
||||||
|
.u64_counter("imks_grpc_call_errors_total")
|
||||||
|
.with_description("Total number of failed gRPC calls to appks")
|
||||||
|
.build(),
|
||||||
|
adapter_broadcasts_total: meter
|
||||||
|
.u64_counter("imks_adapter_broadcasts_total")
|
||||||
|
.with_description("Total number of cross-node adapter broadcasts")
|
||||||
|
.build(),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Helper: create KV attributes for an event.
|
||||||
|
pub fn event_attrs(event: &str) -> [KeyValue; 1] {
|
||||||
|
[KeyValue::new("event", event.to_string())]
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Helper: create KV attributes for a namespace.
|
||||||
|
pub fn namespace_attrs(ns: &str) -> [KeyValue; 1] {
|
||||||
|
[KeyValue::new("namespace", ns.to_string())]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Actix-web handler for `GET /metrics`.
|
||||||
|
///
|
||||||
|
/// Encodes the Prometheus text format from the shared registry.
|
||||||
|
pub async fn metrics_handler() -> actix_web::HttpResponse {
|
||||||
|
let registry = PROMETHEUS_REGISTRY.get().expect("Prometheus registry not initialized");
|
||||||
|
|
||||||
|
let metric_families = registry.gather();
|
||||||
|
let encoder = TextEncoder::new();
|
||||||
|
let mut buffer = Vec::new();
|
||||||
|
if encoder.encode(&metric_families, &mut buffer).is_err() {
|
||||||
|
return actix_web::HttpResponse::InternalServerError().body("failed to encode metrics");
|
||||||
|
}
|
||||||
|
|
||||||
|
actix_web::HttpResponse::Ok()
|
||||||
|
.content_type("text/plain; version=0.0.4")
|
||||||
|
.body(buffer)
|
||||||
|
}
|
||||||
@@ -0,0 +1,203 @@
|
|||||||
|
//! Telemetry module — OpenTelemetry-compatible observability stack.
|
||||||
|
//!
|
||||||
|
//! Provides:
|
||||||
|
//! - **Traces**: distributed tracing via OTLP (gRPC or HTTP) with W3C TraceContext propagation
|
||||||
|
//! - **Metrics**: Prometheus-compatible metrics exposed at `/metrics`
|
||||||
|
//! - **Logs**: JSON console output, plus OTLP log export bridge
|
||||||
|
//! - **Health**: enhanced `/health` endpoint with upstream dependency checks
|
||||||
|
//!
|
||||||
|
//! # Quick start
|
||||||
|
//!
|
||||||
|
//! ```ignore
|
||||||
|
//! let guard = telemetry::init();
|
||||||
|
//! // ... application runs ...
|
||||||
|
//! guard.shutdown(); // graceful shutdown, flushes all pending telemetry
|
||||||
|
//! ```
|
||||||
|
//!
|
||||||
|
//! # Environment variables
|
||||||
|
//!
|
||||||
|
//! | Variable | Default | Description |
|
||||||
|
//! |---|---|---|
|
||||||
|
//! | `OTEL_SERVICE_NAME` | `imks` | Service name in traces/metrics/logs |
|
||||||
|
//! | `OTEL_SERVICE_VERSION` | Cargo version | Service version |
|
||||||
|
//! | `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://localhost:4317` | OTLP collector endpoint |
|
||||||
|
//! | `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | `grpc` or `http/protobuf` |
|
||||||
|
//! | `OTEL_TRACES_ENABLED` | `true` | Enable distributed tracing |
|
||||||
|
//! | `OTEL_METRICS_ENABLED` | `true` | Enable Prometheus metrics |
|
||||||
|
//! | `OTEL_LOGS_ENABLED` | `true` | Enable OTLP log export |
|
||||||
|
//! | `LOG_FORMAT` | `json` | `json` or `pretty` (aliases: `text`, `console`) |
|
||||||
|
//! | `RUST_LOG` | `info` | Log level filter |
|
||||||
|
|
||||||
|
pub mod config;
|
||||||
|
pub mod health;
|
||||||
|
pub mod logs;
|
||||||
|
pub mod metrics;
|
||||||
|
pub mod traces;
|
||||||
|
|
||||||
|
use opentelemetry_sdk::Resource;
|
||||||
|
|
||||||
|
pub use config::TelemetryConfig;
|
||||||
|
pub use health::{HealthCheckFns, health_check};
|
||||||
|
pub use metrics::{MetricsInstruments, get as metrics, try_get as try_metrics};
|
||||||
|
|
||||||
|
/// Holds all telemetry providers for graceful shutdown.
|
||||||
|
///
|
||||||
|
/// When `shutdown()` is called, flushes and shuts down all providers in order:
|
||||||
|
/// tracer → meter → logger.
|
||||||
|
pub struct TelemetryGuard {
|
||||||
|
tracer_provider: Option<opentelemetry_sdk::trace::SdkTracerProvider>,
|
||||||
|
meter_provider: Option<opentelemetry_sdk::metrics::SdkMeterProvider>,
|
||||||
|
logger_provider: Option<opentelemetry_sdk::logs::SdkLoggerProvider>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl TelemetryGuard {
    /// Flush all pending telemetry and shut down providers.
    ///
    /// Call this before process exit to avoid data loss. Each provider is shut
    /// down on a freshly created Tokio runtime via `block_on`, so this must be
    /// called from a synchronous context: `block_on` panics when invoked from
    /// inside an async task. Shutdown errors are ignored.
    pub fn shutdown(mut self) {
        if let Some(tp) = self.tracer_provider.take()
            && let Ok(rt) = tokio::runtime::Runtime::new()
        {
            rt.block_on(async {
                tp.shutdown().unwrap_or_default();
            });
        }
        if let Some(mp) = self.meter_provider.take()
            && let Ok(rt) = tokio::runtime::Runtime::new()
        {
            rt.block_on(async {
                mp.shutdown().unwrap_or_default();
            });
        }
        if let Some(lp) = self.logger_provider.take()
            && let Ok(rt) = tokio::runtime::Runtime::new()
        {
            rt.block_on(async {
                lp.shutdown().unwrap_or_default();
            });
        }
    }

    /// Force-flush all pending trace spans (blocking, best-effort; errors are ignored).
    pub fn flush_traces(&self) {
        if let Some(ref tp) = self.tracer_provider
            && let Ok(rt) = tokio::runtime::Runtime::new()
        {
            rt.block_on(async {
                tp.force_flush().unwrap_or_default();
            });
        }
    }

    /// Force-flush all pending metrics (blocking, best-effort; errors are ignored).
    pub fn flush_metrics(&self) {
        if let Some(ref mp) = self.meter_provider
            && let Ok(rt) = tokio::runtime::Runtime::new()
        {
            rt.block_on(async {
                mp.force_flush().unwrap_or_default();
            });
        }
    }
}

impl Drop for TelemetryGuard {
    fn drop(&mut self) {
        // Intentionally a no-op: dropping the guard does not flush or shut down
        // any provider. Call `shutdown()` explicitly before process exit.
    }
}

/// Initialize the full telemetry stack.
///
/// 1. Creates the OTel Resource (service name, version, deployment environment)
/// 2. Sets up tracing subscriber with console + JSON + OTel layers
/// 3. Initializes Prometheus metrics
/// 4. Records server start time for uptime tracking
///
/// Returns a `TelemetryGuard` that should be held until process exit. Call
/// `shutdown()` on it before exiting; dropping the guard does not flush.
pub fn init() -> TelemetryGuard {
    let config = TelemetryConfig::from_env();

    // 1. Resource shared by traces, metrics, and logs
    let resource = Resource::builder()
        .with_service_name(config.service_name.clone())
        .with_attribute(opentelemetry::KeyValue::new(
            "service.version",
            config.service_version.clone(),
        ))
        .with_attribute(opentelemetry::KeyValue::new(
            "deployment.environment",
            std::env::var("OTEL_RESOURCE_ATTRIBUTES_DEPLOYMENT")
                .unwrap_or_else(|_| "development".to_string()),
        ))
        .build();

    // 2. Set up tracing (traces + subscriber)
    let (tracer_provider, logger_provider) = if config.traces_enabled {
        match traces::init_tracing(&config, &resource) {
            Ok((provider, otel_layer)) => {
                match logs::init_subscriber(&config, Some(&resource), Some(otel_layer)) {
                    Ok(logger_provider) => {
                        tracing::info!(
                            service = %config.service_name,
                            endpoint = %config.otlp_endpoint,
                            protocol = ?config.otlp_protocol,
                            "OpenTelemetry tracing initialized"
                        );
                        (Some(provider), Some(logger_provider))
                    }
                    Err(e) => {
                        tracing::warn!(
                            "Failed to initialize log bridge: {e}. Tracing still active."
                        );
                        (Some(provider), None)
                    }
                }
            }
            Err(e) => {
                // Install the fallback subscriber first: a warning emitted
                // before any subscriber exists would be silently dropped.
                let providers = match logs::init_subscriber(&config, Some(&resource), None) {
                    Ok(lp) => (None, Some(lp)),
                    Err(_) => {
                        tracing_subscriber::fmt().init();
                        (None, None)
                    }
                };
                tracing::warn!(
                    "Failed to initialize OTLP tracing: {e}. Using console-only logging."
                );
                providers
            }
        }
    } else {
        match logs::init_subscriber(&config, Some(&resource), None) {
            Ok(lp) => (None, Some(lp)),
            Err(_) => {
                tracing_subscriber::fmt().init();
                (None, None)
            }
        }
    };

    // 3. Metrics
    let meter_provider = if config.metrics_enabled {
        match metrics::init_metrics(&config, &resource) {
            Ok((provider, _instruments)) => {
                tracing::info!("Prometheus metrics initialized (available at /metrics)");
                Some(provider)
            }
            Err(e) => {
                tracing::warn!("Failed to initialize Prometheus metrics: {e}");
                None
            }
        }
    } else {
        None
    };

    // 4. Record start time for uptime
    health::record_start_time();

    TelemetryGuard {
        tracer_provider,
        meter_provider,
        logger_provider,
    }
}
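
The guard's lifecycle (ordered shutdown, each provider taken out of its `Option` so it is shut down at most once) can be sketched with the standard library alone. The `Provider` trait, `Recording`, `Guard`, and `recording` names below are hypothetical stand-ins, not types from this commit or from the OpenTelemetry SDK. Unlike the `TelemetryGuard` in this file, whose `Drop` does nothing, the sketch falls back to shutting down on drop; that is one possible design, shown only for comparison.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for an SDK provider; only `shutdown` is modelled.
trait Provider {
    fn shutdown(&mut self);
}

/// Test double that appends its name to a shared log when shut down.
struct Recording {
    name: &'static str,
    log: Arc<Mutex<Vec<&'static str>>>,
}

impl Provider for Recording {
    fn shutdown(&mut self) {
        self.log.lock().unwrap().push(self.name);
    }
}

/// Convenience constructor for a boxed recording provider.
fn recording(name: &'static str, log: &Arc<Mutex<Vec<&'static str>>>) -> Box<dyn Provider> {
    Box::new(Recording { name, log: Arc::clone(log) })
}

/// Guard that shuts providers down in insertion order, each at most once.
struct Guard {
    providers: Vec<Option<Box<dyn Provider>>>,
}

impl Guard {
    fn new(providers: Vec<Box<dyn Provider>>) -> Self {
        Guard { providers: providers.into_iter().map(Some).collect() }
    }

    /// Explicit shutdown. `take()` leaves `None` in every slot, so the
    /// `Drop` impl that runs afterwards finds nothing left to do.
    fn shutdown(mut self) {
        self.shutdown_all();
    }

    fn shutdown_all(&mut self) {
        for slot in &mut self.providers {
            if let Some(mut provider) = slot.take() {
                provider.shutdown();
            }
        }
    }
}

impl Drop for Guard {
    /// Fallback for callers that forget to call `shutdown()`.
    fn drop(&mut self) {
        self.shutdown_all();
    }
}
```

With this shape, `Guard::new(vec![tracer, meter, logger]).shutdown()` shuts the three down in that order, and a guard that merely goes out of scope still shuts its providers down once.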
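
`TelemetryConfig::from_env()` is not part of this hunk, so the following is only a guess at how the defaults in the module's environment-variable table could be resolved. `Settings`, `Protocol`, `parse_bool`, `parse_protocol`, and `load` are hypothetical names, and the accepted boolean spellings (`true`/`false`/`1`/`0`) are an assumption rather than documented behaviour. Taking the lookup as a closure keeps the parsing testable without mutating the process environment.

```rust
/// Hypothetical mirror of the two protocol values listed in the table.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Protocol {
    Grpc,
    HttpProtobuf,
}

/// Hypothetical subset of the telemetry settings.
#[derive(Debug, PartialEq)]
struct Settings {
    service_name: String,
    endpoint: String,
    protocol: Protocol,
    traces_enabled: bool,
}

/// Parse a boolean flag; unset or unrecognised values fall back to `default`.
fn parse_bool(raw: Option<String>, default: bool) -> bool {
    match raw.as_deref().map(str::trim) {
        Some(v) if v.eq_ignore_ascii_case("true") || v == "1" => true,
        Some(v) if v.eq_ignore_ascii_case("false") || v == "0" => false,
        _ => default,
    }
}

/// Parse the protocol; anything other than `http/protobuf` falls back to gRPC.
fn parse_protocol(raw: Option<String>) -> Protocol {
    match raw.as_deref().map(str::trim) {
        Some("http/protobuf") => Protocol::HttpProtobuf,
        _ => Protocol::Grpc,
    }
}

/// Resolve settings from any key lookup, applying the table's defaults.
fn load(lookup: impl Fn(&str) -> Option<String>) -> Settings {
    Settings {
        service_name: lookup("OTEL_SERVICE_NAME").unwrap_or_else(|| "imks".to_string()),
        endpoint: lookup("OTEL_EXPORTER_OTLP_ENDPOINT")
            .unwrap_or_else(|| "http://localhost:4317".to_string()),
        protocol: parse_protocol(lookup("OTEL_EXPORTER_OTLP_PROTOCOL")),
        traces_enabled: parse_bool(lookup("OTEL_TRACES_ENABLED"), true),
    }
}

/// Production entry point: read from the real process environment.
fn load_from_process_env() -> Settings {
    load(|key| std::env::var(key).ok())
}
```

Calling `load(|_| None)` yields the table's defaults, which makes the fallback behaviour easy to check in isolation.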
@@ -0,0 +1,55 @@
//! OpenTelemetry distributed tracing: OTLP exporter + tracing-opentelemetry bridge.

use opentelemetry::trace::TracerProvider as _;
use opentelemetry_otlp::{Protocol, SpanExporter, WithExportConfig};
use opentelemetry_sdk::propagation::TraceContextPropagator;
use opentelemetry_sdk::trace::{SdkTracerProvider, Tracer};
use opentelemetry_sdk::Resource;
use tracing_opentelemetry::OpenTelemetryLayer;
use tracing_subscriber::Registry;

use super::config::{OtlpProtocol, TelemetryConfig};
use crate::ImksResult;

/// Build an OTLP SpanExporter based on the configured protocol.
fn build_span_exporter(config: &TelemetryConfig) -> ImksResult<SpanExporter> {
    match config.otlp_protocol {
        OtlpProtocol::Grpc => SpanExporter::builder()
            .with_tonic()
            .with_endpoint(&config.otlp_endpoint)
            .build()
            .map_err(|e| crate::ImksError::Internal(format!("OTLP gRPC span exporter: {e}"))),
        OtlpProtocol::HttpProtobuf => SpanExporter::builder()
            .with_http()
            .with_protocol(Protocol::HttpBinary)
            .with_endpoint(&config.otlp_endpoint)
            .build()
            .map_err(|e| {
                crate::ImksError::Internal(format!("OTLP HTTP span exporter: {e}"))
            }),
    }
}

/// Initialize the tracing pipeline: OTel tracer provider + tracing-opentelemetry layer.
///
/// Returns `(SdkTracerProvider, OpenTelemetryLayer)`.
pub fn init_tracing(
    config: &TelemetryConfig,
    resource: &Resource,
) -> ImksResult<(SdkTracerProvider, OpenTelemetryLayer<Registry, Tracer>)> {
    // Set global propagator for W3C TraceContext extraction/injection
    opentelemetry::global::set_text_map_propagator(TraceContextPropagator::new());

    let exporter = build_span_exporter(config)?;

    let provider = SdkTracerProvider::builder()
        .with_resource(resource.clone())
        .with_batch_exporter(exporter)
        .build();

    let tracer = provider.tracer("imks");

    let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);

    Ok((provider, otel_layer))
}
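
The propagator installed in `init_tracing` reads and writes the W3C `traceparent` header. For readers unfamiliar with that wire format, here is a standalone sketch of version `00` of the header: `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`, lowercase hex, with all-zero ids rejected. `TraceParent`, `parse_traceparent`, and `format_traceparent` are hypothetical helpers written for illustration; they are not the SDK's implementation, and `tracestate` handling is left out.

```rust
/// Parsed fields of a W3C `traceparent` header (version 00 only).
#[derive(Debug, PartialEq)]
struct TraceParent {
    trace_id: u128,
    parent_id: u64,
    sampled: bool,
}

/// True if every byte is a lowercase hexadecimal digit.
fn is_lower_hex(s: &str) -> bool {
    s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse a version-00 `traceparent` value; returns `None` if it is malformed.
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let mut parts = header.trim().split('-');
    let version = parts.next()?;
    let trace_id = parts.next()?;
    let parent_id = parts.next()?;
    let flags = parts.next()?;

    // Version 00 has exactly four fields with fixed widths.
    if parts.next().is_some() || version != "00" {
        return None;
    }
    if trace_id.len() != 32 || parent_id.len() != 16 || flags.len() != 2 {
        return None;
    }
    if !(is_lower_hex(trace_id) && is_lower_hex(parent_id) && is_lower_hex(flags)) {
        return None;
    }

    let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
    let parent_id = u64::from_str_radix(parent_id, 16).ok()?;
    let flags = u8::from_str_radix(flags, 16).ok()?;

    // All-zero ids are invalid.
    if trace_id == 0 || parent_id == 0 {
        return None;
    }

    // Bit 0 of the flags byte is the "sampled" flag.
    Some(TraceParent { trace_id, parent_id, sampled: flags & 0x01 == 0x01 })
}

/// Render the header value; only the sampled bit is written into the flags.
fn format_traceparent(tp: &TraceParent) -> String {
    format!(
        "00-{:032x}-{:016x}-{:02x}",
        tp.trace_id,
        tp.parent_id,
        u8::from(tp.sampled)
    )
}
```

A value such as `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` parses and formats back to the same string, while uppercase hex or an all-zero trace id is rejected.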