Skip to content

Version 0.6

Latest
Compare
Choose a tag to compare
@andylau-55 andylau-55 released this 08 Jan 03:50
· 18 commits to master since this release
1131dbb

Version 0.6 (2025-01-07)

On January 7, 2025, OpenSPG officially released version 0.6, bringing updates across multiple areas, including domain knowledge mounting, vertical domain schema management, visual knowledge exploration, and support for summary generation tasks. In terms of user experience, it offers a mechanism for resuming knowledge base tasks from breakpoints, introduces a user login and permission system, and optimizes task scheduling for building processes. In developer mode, it supports configuring different models for different stages and enables schema-constraint mode for extraction, significantly enhancing the system's flexibility, usability, performance, and security. This release provides users with a more powerful knowledge management platform that adapts to diverse application scenarios.


🌟 New Features

  1. Support for Summary Generation Tasks

    • Native support for abstractive summarization tasks without sacrificing multi-hop factual reasoning accuracy. On the CSQA dataset, while comprehensiveness, diversity, and empowerment metrics are slightly lower than LightRAG (-1.2/10), the factual accuracy metric is better than LightRAG (+0.1/10). On multi-hop question answering datasets such as HotpotQA, TwoWiki, and MuSiQue, since LightRAG and GraphRAG do not provide a factual QA evaluation entry, the EM metric using the default entry is close to 0. For quantitative evaluation results, please refer to the KAG code repository under examples/csqa/README.md and follow the steps to reproduce.
  2. Domain Schema Management

    • The product provides SPG schema management capabilities, allowing users to optimize knowledge base construction and inference Q&A performance by customizing schemas.
  3. Knowledge Exploration

    • Added a knowledge exploration feature to enable visual query and analysis of knowledge base data, and provided an HTTP API for integration with other systems.
  4. Support for Mounting Domain Knowledge in KAG-Builder(Developer Mode)

    • In developer mode, the system supports injecting domain knowledge (domain vocabulary, relationships between terms) into the knowledge base, which can significantly improve knowledge base construction and inference Q&A performance (with a 10%+ improvement in the medical domain).
  5. Adding Knowledge Alignment Component to the KAG-Builder Pipeline

    • Kag-Builder provides a default knowledge alignment component that includes features such as filtering out invalid data and linking similar entities. This optimizes the structure and data quality of the graph.

⚙️ User Experience Optimizations

  1. Resumable Tasks

    • Provide resumable capabilities for knowledge base construction tasks at the file level and chunk level in both product mode and developer mode, to reduce the time and token consumption caused by full re-runs after task failures.
  2. User Login & Permission System

    • Implement a user login and permission system to prevent unauthorized access and operations on the knowledge base data.
  3. Optimized Knowledge Base Construction Task Scheduling

    • Provide database-based knowledge base construction task scheduling to avoid task anomalies or interruptions after container restarts.
  4. Support for Configuring Different Models at Different Stages (Developer Mode)

    • The system provides a component management mechanism based on a registry, allowing users to instantiate component objects via configuration files. This supports users in developing and embedding custom components into the KAG-Builder and KAG-Solver workflows. Additionally, it enables the configuration of different-sized models at different stages of the workflow, thereby enhancing the overall reasoning and question-answering performance.
  5. Optimization of Layout Analysis for Markdown, PDF, and Word Files

    • For Markdown, PDF, and Word files, the system prioritizes dividing the content into chunks based on the file's sections. This ensures that the content within each chunk is more cohesive.**
  6. Global Configuration and Knowledge Base Configuration

    • Provide global configuration for the knowledge base, allowing unified settings for storage engines, generation models, and representation model access information.
  7. Support for Schema-Constrained Extraction and Linking (Developer Mode)

    • Provide a schema-constraint mode that strictly adheres to schema definitions during the knowledge base construction phase, enabling finer-grained and more complex knowledge extraction.

Version 0.6 (2025-01-07)

2025 年 1 月 7 日,OpenSPG 正式发布 v0.6 版本,此次发布带来多方面更新,包括领域知识挂载、垂域schema 管理、可视化知识探查、摘要生成类任务支持等;用户体验上,提供知识库任务的断点续跑机制,新增用户登录与权限体系、优化构建任务调度;开发者模式下支持不同阶段配置不同模型、支持 schema-constraint 模式抽取等,极大地提升了系统的灵活性、易用性、性能和安全性,为用户提供了一个更加强大且适应多样化应用场景的知识管理平台。


🌟 新增功能

  1. 摘要生成类任务支持

    • 不牺牲多跳事实推理精度的情况下,原生支持摘要生成任务。
      在CSQA 数据集上,全面性、多样性、赋权性 等指标弱于LightRAG (-1.2/10)情况下,事实性指标优于 LightRAG(+0.1/10);在hotpotqa, twowiki, musique 等多跳问答数据集上,鉴于LightRAG & GraphRAG均未提供事实问答的测评入口,使用默认入口测试EM指标接近0。
      KAG 量化评测结果,可参考 KAG 代码仓库 examples/csqa/READEME.md 按步骤复现。
  2. 领域 Schema 管理

    • 产品侧提供spg schema 管理能力,支持用户根据通过自定义schema 以优化知识库构建&推理问答的效果。
  3. 知识探查

    • 新增知识探查功能,实现知识库数据的可视化查询分析,并提供HttpAPI 与其它系统对接。
  4. 知识库构建支持挂载领域知识 (开发者模式)

    • 开发者模式下,支持将领域知识(领域词汇、词条间关系)注入知识库中,可显著提升知识库构建、推理问答效果(医疗场景下有10%+ 的提升)。
  5. 构建链路增加知识对齐组件

    • Kag-Builder 提供默认的知识对齐组件,并内嵌无效数据过滤、相似实体链指等功能,以优化图谱的结构和数据质量。

⚙️ 用户体验优化

  1. 断点续跑

    • 产品模式、开发者模式下,分别提供文件级别、Chunk 级别的知识库构建任务的断点续跑能力,以降低任务失败后全量重跑所带来的时间和tokens 消耗。
  2. 用户登录&权限体系

    • 提供 用户登录&权限体系,防止未经授权的知识库数据访问和操作。
  3. 知识库构建任务调度优化

    • 提供基于数据库的知识库构建任务调度能力,避免容器重启后任务异常或者中断。
  4. 支持不同阶段配置不同模型(开发者模式)

    • 提供基于注册器的组件管理机制,允许用户通过配置文件实例化组件对象,支持用户开发&嵌入自定义组件到KAG-Builder、KAG-Solver 工作流 中,同时在工作流的不同阶段配置不同规模的大模型,以提升整体的推理问答性能。
  5. Markdown、PDF、Word 文件版面分析优化

    • Markdown、pdf、word 等文件优先根据文件章节划分Chunk,以实现同一chunk 的内容更内聚。
  6. 项目全局配置及知识库配置

    • 提供知识库全局配置功能,统一设置存储引擎、生成模型、表示模型的访问信息。
  7. 支持 schema-constraint 模式的抽取链接(开发者模式)

    • 提供schema-constraint 模式,知识库构建阶段,严格按照 Schema 的定义进行操作,从而实现更细粒度和更复杂的知识抽取。