# Knowledge Base (RAGFlow) Go-Live Steps

> **Scope**: Production / UAT / local  
> **Deploy script**: `bash scripts/deploy/deploy.sh <env> deploy`  
> **Dependencies**: RAGFlow + ES + MySQL + MinIO + Redis (Valkey)

---

## 1. Environment Variable Configuration

### UAT environment (ports 70xx)

```env
# SharePoint sync
KB_SP_SITE_URL="https://faradayandfuture.sharepoint.com/UP2U Subcommitee"
KB_SP_INCLUDED_PATHS="/UP2U Functional Units/01. Branding & MarCom Sales/Communication Planning,/UP2U Functional Units/01. Branding & MarCom Sales/Event,/UP2U Functional Units/01. Branding & MarCom Sales/Marcom Strategy & Planning,/UP2U Functional Units/01. Branding & MarCom Sales/Public Relations,/UP2U Functional Units/01. Branding & MarCom Sales/Social"
KB_SP_MAX_FILE_SIZE_MB=128

# RAGFlow basic configuration
RAGFLOW_IMAGE=infiniflow/ragflow:v0.23.1
RAGFLOW_BASE_URL=http://localhost:7098
RAGFLOW_API_KEY=GyF2xp7htmwZcM3cs1QZWq82v0Pdcu4-7_8u-CFoPyU
RAGFLOW_DATASET_NAME=ffoa-knowledge-base
RAGFLOW_DATASET_DESCRIPTION=FFOA Knowledge Base
RAGFLOW_CHAT_NAME=ffoa-knowledge-assistant

# Ports (UAT)
RAGFLOW_WEB_HTTP_PORT=7095
RAGFLOW_WEB_HTTPS_PORT=7443
RAGFLOW_HTTP_PORT=7098
RAGFLOW_ADMIN_HTTP_PORT=7099

# Storage / engine
RAGFLOW_DOC_ENGINE=elasticsearch
RAGFLOW_DEVICE=cpu
RAGFLOW_MEM_LIMIT=4g
RAGFLOW_MAX_CONTENT_LENGTH=134217728
RAGFLOW_MYSQL_HOST_PORT=7092
RAGFLOW_MYSQL_PORT=3306
RAGFLOW_MYSQL_DBNAME=rag_flow
RAGFLOW_MYSQL_USER=root
RAGFLOW_MYSQL_PASSWORD=infini_rag_flow
RAGFLOW_MYSQL_MAX_PACKET=1073741824
RAGFLOW_MYSQL_CHARSET=utf8mb4
RAGFLOW_MYSQL_COLLATION=utf8mb4_0900_ai_ci
RAGFLOW_MINIO_PORT=7093
RAGFLOW_MINIO_CONSOLE_PORT=7094
RAGFLOW_MINIO_USER=rag_flow
RAGFLOW_MINIO_PASSWORD=infini_rag_flow
RAGFLOW_MINIO_BUCKET=ragflow
RAGFLOW_MINIO_PREFIX_PATH=
RAGFLOW_REDIS_PORT=7097
RAGFLOW_REDIS_PASSWORD=infini_rag_flow
RAGFLOW_ES_PORT=7096
RAGFLOW_ES_STACK_VERSION=8.11.3
RAGFLOW_ES_JAVA_OPTS=-Xms1g -Xmx1g
RAGFLOW_ELASTIC_PASSWORD=infini_rag_flow

# Embedding/TEI
RAGFLOW_EMBEDDING_MODEL=text-embedding-v3@Tongyi-Qianwen
RAGFLOW_EMBEDDING_RESET_ON_CHANGE=true
RAGFLOW_ASR_MODEL=
RAGFLOW_ASR_FACTORY=
RAGFLOW_ASR_API_KEY=
RAGFLOW_ASR_BASE_URL=
RAGFLOW_IMAGE2TEXT_MODEL=
RAGFLOW_IMAGE2TEXT_FACTORY=
RAGFLOW_IMAGE2TEXT_API_KEY=
RAGFLOW_IMAGE2TEXT_BASE_URL=
RAGFLOW_LOCAL_OCR_ONLY=true
# Enable local TEI only when a local embedding model is needed (not started by default)
# TEI_IMAGE_CPU=ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
# TEI_MODEL=/data/models/bge-m3
# TEI_PORT=7088
# TEI_HOST=ragflow-tei
# RAGFLOW_TEI_HOST=ragflow-tei

# DashScope (Tongyi) key (shared by the backend and RAGFlow)
DASHSCOPE_API_KEY=change_me_dashscope_api_key

# Sync task timeout settings
KB_SYNC_TIMEOUT_MINUTES=120
KB_SYNC_PROGRESS_TIMEOUT_MINUTES=30
```

### Production environment (ports 60xx)

```env
# SharePoint sync
KB_SP_SITE_URL="https://faradayandfuture.sharepoint.com/UP2U Subcommitee"
KB_SP_INCLUDED_PATHS="/UP2U Functional Units/01. Branding & MarCom Sales/Communication Planning,/UP2U Functional Units/01. Branding & MarCom Sales/Event,/UP2U Functional Units/01. Branding & MarCom Sales/Marcom Strategy & Planning,/UP2U Functional Units/01. Branding & MarCom Sales/Public Relations,/UP2U Functional Units/01. Branding & MarCom Sales/Social"
KB_SP_MAX_FILE_SIZE_MB=128

# RAGFlow basic configuration
RAGFLOW_IMAGE=infiniflow/ragflow:v0.23.1
RAGFLOW_BASE_URL=http://localhost:6098
RAGFLOW_API_KEY=GyF2xp7htmwZcM3cs1QZWq82v0Pdcu4-7_8u-CFoPyU
RAGFLOW_DATASET_NAME=ffoa-knowledge-base
RAGFLOW_DATASET_DESCRIPTION=FFOA Knowledge Base
RAGFLOW_CHAT_NAME=ffoa-knowledge-assistant

# Ports (Production)
RAGFLOW_WEB_HTTP_PORT=6095
RAGFLOW_WEB_HTTPS_PORT=6443
RAGFLOW_HTTP_PORT=6098
RAGFLOW_ADMIN_HTTP_PORT=6099

# Storage / engine
RAGFLOW_DOC_ENGINE=elasticsearch
RAGFLOW_DEVICE=cpu
RAGFLOW_MEM_LIMIT=4g
RAGFLOW_MAX_CONTENT_LENGTH=134217728
RAGFLOW_MYSQL_HOST_PORT=6092
RAGFLOW_MYSQL_PORT=3306
RAGFLOW_MYSQL_DBNAME=rag_flow
RAGFLOW_MYSQL_USER=root
RAGFLOW_MYSQL_PASSWORD=infini_rag_flow
RAGFLOW_MYSQL_MAX_PACKET=1073741824
RAGFLOW_MYSQL_CHARSET=utf8mb4
RAGFLOW_MYSQL_COLLATION=utf8mb4_0900_ai_ci
RAGFLOW_MINIO_PORT=6093
RAGFLOW_MINIO_CONSOLE_PORT=6094
RAGFLOW_MINIO_USER=rag_flow
RAGFLOW_MINIO_PASSWORD=infini_rag_flow
RAGFLOW_MINIO_BUCKET=ragflow
RAGFLOW_MINIO_PREFIX_PATH=
RAGFLOW_REDIS_PORT=6097
RAGFLOW_REDIS_PASSWORD=infini_rag_flow
RAGFLOW_ES_PORT=6096
RAGFLOW_ES_STACK_VERSION=8.11.3
RAGFLOW_ES_JAVA_OPTS=-Xms1g -Xmx1g
RAGFLOW_ELASTIC_PASSWORD=infini_rag_flow

# Embedding/TEI
RAGFLOW_EMBEDDING_MODEL=text-embedding-v3@Tongyi-Qianwen
RAGFLOW_EMBEDDING_RESET_ON_CHANGE=true
RAGFLOW_ASR_MODEL=
RAGFLOW_ASR_FACTORY=
RAGFLOW_ASR_API_KEY=
RAGFLOW_ASR_BASE_URL=
RAGFLOW_IMAGE2TEXT_MODEL=
RAGFLOW_IMAGE2TEXT_FACTORY=
RAGFLOW_IMAGE2TEXT_API_KEY=
RAGFLOW_IMAGE2TEXT_BASE_URL=
RAGFLOW_LOCAL_OCR_ONLY=true
# Enable local TEI only when a local embedding model is needed (not started by default)
# TEI_IMAGE_CPU=ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
# TEI_MODEL=/data/models/bge-m3
# TEI_PORT=6088
# TEI_HOST=ragflow-tei
# RAGFLOW_TEI_HOST=ragflow-tei

# DashScope (Tongyi) key (shared by the backend and RAGFlow)
DASHSCOPE_API_KEY=change_me_dashscope_api_key

# Sync task timeout settings
KB_SYNC_TIMEOUT_MINUTES=120
KB_SYNC_PROGRESS_TIMEOUT_MINUTES=30
```
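The two blocks above differ only in their host ports, so a quick consistency check can catch a value copied from the wrong environment. The sketch below is illustrative and not part of the deploy script: `parse_env` and `check_ports` are hypothetical helpers, the leading-digit rule (7 for UAT, 6 for Production) is inferred from the sample values, and treating `RAGFLOW_MYSQL_PORT` as a container-internal port is an assumption.

```python
from urllib.parse import urlparse

# Assumption: this key is the container-internal MySQL port, not a host port.
INTERNAL_PORT_KEYS = {"RAGFLOW_MYSQL_PORT"}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def check_ports(env: dict[str, str], leading_digit: str) -> list[str]:
    """Return a list of problems; an empty list means the ports look consistent."""
    problems = []
    for key, value in env.items():
        if key.endswith("_PORT") and key not in INTERNAL_PORT_KEYS:
            if not value.startswith(leading_digit):
                problems.append(f"{key}={value} does not start with {leading_digit}")
    # RAGFLOW_BASE_URL should point at the same port as RAGFLOW_HTTP_PORT.
    base_port = urlparse(env.get("RAGFLOW_BASE_URL", "")).port
    http_port = env.get("RAGFLOW_HTTP_PORT", "")
    if str(base_port) != http_port:
        problems.append(
            f"RAGFLOW_BASE_URL port {base_port} != RAGFLOW_HTTP_PORT {http_port}"
        )
    return problems


# A UAT-style fragment with one port accidentally left at its Production value.
sample = """
RAGFLOW_BASE_URL=http://localhost:7098
RAGFLOW_HTTP_PORT=7098
RAGFLOW_MYSQL_HOST_PORT=7092
RAGFLOW_MYSQL_PORT=3306
RAGFLOW_ES_PORT=6096
"""
for problem in check_ports(parse_env(sample), "7"):
    print(problem)
```

For a real file, the same check could be run on the file's contents, with `"6"` passed for Production.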

---

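## 2. Key Items to Confirm or Adjust

- `KB_SP_SITE_URL` / `KB_SP_INCLUDED_PATHS` / `KB_SP_MAX_FILE_SIZE_MB` (SharePoint allowlist and file size limit)
- `RAGFLOW_BASE_URL` and the port variables: confirm they match the port plan for the environment
- `RAGFLOW_API_KEY` (must be replaced)
- `RAGFLOW_EMBEDDING_MODEL`
- `RAGFLOW_EMBEDDING_RESET_ON_CHANGE` (whether to automatically wipe and rebuild when the embedding model changes; `false` is recommended for production, although the sample block in section 1 ships with `true`)
- `RAGFLOW_MAX_CONTENT_LENGTH` (upload and parse size limit; must stay consistent with the Nginx `client_max_body_size`)
- `DASHSCOPE_API_KEY` (Tongyi model key; RAGFlow and the backend must use the same value)
  - After changing this value, **restart the RAGFlow container** for it to take effect (the script only updates the database and does not change the container's environment variables)
- `ENV_FILE_PATH` (injected automatically by the deploy script; no manual configuration needed)
- **TEI does not start with `knowledge` by default** (if you need a local embedding model, use the `knowledge-tei` profile)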
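Several of the items in this checklist can be checked mechanically before a deploy. The following is a minimal sketch under stated assumptions: `review_key_items` is a hypothetical helper, it takes the already-parsed variables as a dictionary, and it treats any value starting with `change_me` as a placeholder because that is the pattern the sample `DASHSCOPE_API_KEY` uses.

```python
def review_key_items(env: dict[str, str], is_production: bool) -> list[str]:
    """Return human-readable warnings for the checklist items that can be automated."""
    warnings = []

    # Keys that must hold a real secret rather than an empty or placeholder value.
    for key in ("RAGFLOW_API_KEY", "DASHSCOPE_API_KEY"):
        value = env.get(key, "")
        if not value or value.startswith("change_me"):
            warnings.append(f"{key} is empty or still a placeholder")

    if not env.get("RAGFLOW_EMBEDDING_MODEL"):
        warnings.append("RAGFLOW_EMBEDDING_MODEL is not set")

    # The checklist recommends disabling automatic reset in production.
    reset = env.get("RAGFLOW_EMBEDDING_RESET_ON_CHANGE", "").lower()
    if is_production and reset == "true":
        warnings.append("RAGFLOW_EMBEDDING_RESET_ON_CHANGE=true in production")

    return warnings


example = {
    "RAGFLOW_API_KEY": "some-real-token",
    "DASHSCOPE_API_KEY": "change_me_dashscope_api_key",
    "RAGFLOW_EMBEDDING_MODEL": "text-embedding-v3@Tongyi-Qianwen",
    "RAGFLOW_EMBEDDING_RESET_ON_CHANGE": "true",
}
for warning in review_key_items(example, is_production=True):
    print(warning)
```

The check only reports warnings; whether a warning should block a deploy is left to the operator.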

---

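## 3. Upload Size Limit (128M)

The RAGFlow upload and parse size limit must be set **in two places at once**:

1. `RAGFLOW_MAX_CONTENT_LENGTH` in `.env` (bytes)
2. `client_max_body_size` in `docker/ragflow/nginx/nginx.conf`

This project already sets `client_max_body_size` to `128M` (128 × 1024 × 1024 = 134217728 bytes), so make sure `.env` matches: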

```env
RAGFLOW_MAX_CONTENT_LENGTH=134217728
```
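The relationship between the two settings is a unit conversion, which can be sketched as follows. `nginx_size_to_bytes` and `limits_match` are hypothetical helpers; the parser handles only plain byte counts and the `k`/`K` and `m`/`M` suffixes.

```python
import re

# Multipliers for the size suffixes handled by this sketch.
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2}


def nginx_size_to_bytes(size: str) -> int:
    """Convert an Nginx-style size such as '128M' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([kKmM]?)", size.strip())
    if match is None:
        raise ValueError(f"unsupported size value: {size!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit.lower()]


def limits_match(client_max_body_size: str, max_content_length: str) -> bool:
    """True when the Nginx limit and RAGFLOW_MAX_CONTENT_LENGTH are the same size."""
    return nginx_size_to_bytes(client_max_body_size) == int(max_content_length)


print(nginx_size_to_bytes("128M"))          # 134217728
print(limits_match("128M", "134217728"))    # True
```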

---

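## 4. Multimodal Model Configuration (Images/Audio)

Image OCR, image captioning, and audio transcription depend on RAGFlow's multimodal model configuration. If it is not configured, the error `Model(@None) not authorized` appears.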

```env
RAGFLOW_ASR_MODEL=
RAGFLOW_ASR_FACTORY=
RAGFLOW_ASR_API_KEY=
RAGFLOW_ASR_BASE_URL=
RAGFLOW_IMAGE2TEXT_MODEL=
RAGFLOW_IMAGE2TEXT_FACTORY=
RAGFLOW_IMAGE2TEXT_API_KEY=
RAGFLOW_IMAGE2TEXT_BASE_URL=
RAGFLOW_LOCAL_OCR_ONLY=true
```

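> If models have already been configured in the RAGFlow UI, the UI configuration overrides this file. Treat the UI as the source of truth and keep `.env` consistent with it.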
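Because both variable groups ship empty, it can help to report which ones are still unset or only half filled before looking for the cause of `Model(@None) not authorized`. The sketch below covers `.env` only and cannot see models configured in the UI. `multimodal_status` is a hypothetical helper, and the rule that `MODEL`, `FACTORY`, and `API_KEY` are required while `BASE_URL` is optional is an assumption, not documented behavior.

```python
# Assumption: these three suffixes are required for a usable model entry;
# *_BASE_URL is treated as optional.
REQUIRED_SUFFIXES = ("MODEL", "FACTORY", "API_KEY")
GROUPS = ("RAGFLOW_ASR", "RAGFLOW_IMAGE2TEXT")


def multimodal_status(env: dict[str, str]) -> dict[str, str]:
    """Classify each multimodal group as 'unset', 'partial', or 'configured'."""
    status = {}
    for group in GROUPS:
        filled = [
            suffix
            for suffix in REQUIRED_SUFFIXES
            if env.get(f"{group}_{suffix}", "").strip()
        ]
        if not filled:
            status[group] = "unset"
        elif len(filled) == len(REQUIRED_SUFFIXES):
            status[group] = "configured"
        else:
            status[group] = "partial"
    return status


# Placeholder values for illustration only.
example = {
    "RAGFLOW_ASR_MODEL": "",
    "RAGFLOW_IMAGE2TEXT_MODEL": "example-model",
    "RAGFLOW_IMAGE2TEXT_FACTORY": "example-factory",
}
print(multimodal_status(example))
```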

---

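## 5. DashScope Key Normalization Rules

RAGFlow stores its model configuration in its own MySQL database (the `tenant_llm` table), so updating `.env` alone does not overwrite an old key that has already been written.

- The database stores only the raw token, **without a prefix**
- The `RAGFLOW_API_KEY` environment variable also uses the raw token
- The script normalizes the value automatically: a key may be supplied with or without the `ragflow-` prefix, and the script strips the prefix
- After changing `DASHSCOPE_API_KEY`, restart the RAGFlow container for the change to take effect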
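The prefix rule above amounts to a small string transformation. The deploy script's actual code is not shown in this document, so the following is only a sketch of the described behavior, with `normalize_key` as a hypothetical name.

```python
PREFIX = "ragflow-"


def normalize_key(value: str) -> str:
    """Return the raw token: trim whitespace and drop one leading 'ragflow-' prefix."""
    return value.strip().removeprefix(PREFIX)


# Both spellings normalize to the same raw token.
print(normalize_key("ragflow-abc123"))  # abc123
print(normalize_key("abc123"))          # abc123
```

`str.removeprefix` requires Python 3.9 or later.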

---

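## 6. Admin Console

Full and incremental sync, task status, and failed/skipped item details **can all be viewed on the admin page** without calling any API. The APIs are only for troubleshooting and scripted verification.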

---

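## 7. Decision to Retire the Legacy Vector Service (Qdrant)

If a legacy environment is still running Qdrant, stop and remove it. If the environment uses a container prefix (such as `ffws` / `ffws-uat`), substitute the actual container names.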
