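• Chinese Association for Science of Science and S&T Policy
  • Institutes of Science and Development, Chinese Academy of Sciences
  • Center for Science, Technology and Society, Tsinghua University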
ISSN 1003-2053 CN 11-1805/G3

Studies in Science of Science, 2026, Vol. 44, Issue (4): 690-700.

• Hot Topics

Governance of New Types of Data Security Risks in DeepSeek-like Generative Artificial Intelligence

Zong Shaohao (宗绍昊)¹, Luo Shilong (罗世龙)²

  1. China University of Political Science and Law
  2. Wuhan University
  • Received: 2025-03-11 Revised: 2025-07-22 Published: 2026-04-15 Online: 2026-04-15
  • Corresponding author: Zong Shaohao


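Abstract: Governing the data security risks of generative artificial intelligence is an important current problem. The existing data security governance system for generative AI has not established effective type-specific governance mechanisms for different data security risks, and it is out of step with the reality of generative AI, which has now iterated to DeepSeek-like models. DeepSeek series models have new technical features such as multi-head latent attention, mixture-of-experts models, and pure reinforcement learning. These features cause existing data security risks to change in degree or to mutate, giving rise to new types of data leakage, data deviation, and data deletion security risks, each with a different operational logic. For data leakage security risks, preventive mechanisms should be established at both the technical and the management level. For data deviation security risks, the emphasis should be on fine-tuning at both the data input and output ends, as distinct from the human supervised fine-tuning model, together with a newly created right to algorithmic explanation of data output results. For data deletion security risks, the approach should shift from absolute to relative data deletion and implement a dynamic verification mechanism.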

Abstract: Governing the data security risks of generative artificial intelligence is a pressing issue. The existing data security governance system for generative AI is still embryonic: it has not established effective type-specific governance mechanisms for different data security risks, and it has fallen out of step with generative AI as it has now evolved into DeepSeek-like models. The new technical features of the DeepSeek model series, including multi-head latent attention (MLA), the mixture-of-experts (MoE) architecture, and pure reinforcement learning, have changed existing data security risks in degree or in kind. The result is three new types of risk with distinct operational logics, namely data leakage, data deviation, and data deletion security risks, each of which calls for a matching type-specific governance mechanism.
First, data leakage security risks stem from external attacks, simple internal configuration errors, or human negligence. They call for preventive mechanisms at both the technical and the management level. Technical governance should take the leading role, systematically applying technologies such as data masking and privacy-preserving computation, federated learning and distributed training optimization, and encrypted storage and data isolation. Management-level prevention consists of predefined measures that are enforced in real time, with basic elements including domain early-warning mechanisms, data classification and grading protection, data security organization and evaluation, and emergency response to data leakage incidents.
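As a rough illustration of data masking, one of the technical preventive measures listed above, the following Python sketch redacts e-mail addresses and mainland-China mobile numbers from text before it is logged or added to a training corpus. The function name `mask_pii`, the placeholder formats, and both regular expressions are illustrative assumptions, not anything specified in the paper; a real deployment would need much broader detection of personal data.

```python
import re

# Illustrative patterns only; real PII detection needs far more coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Mainland-China mobile numbers: 11 digits beginning with 13-19,
# not embedded in a longer run of digits.
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")


def mask_pii(text: str) -> str:
    """Return text with e-mail addresses replaced and mobile numbers partially masked."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Keep the first 3 and last 4 digits so masked records stay traceable in audits.
    return PHONE_RE.sub(lambda m: m.group(0)[:3] + "****" + m.group(0)[-4:], text)


if __name__ == "__main__":
    print(mask_pii("Contact zhang.san@example.com or 13812345678 for the contract."))
```

Partial masking keeps some utility for auditing; replacing the whole number is stricter but makes masked records harder to reconcile later.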
Second, data deviation security risks are risks that depart from reasonable expectations during the use of AI systems. They cover both data quality problems caused by machine hallucination and infringement problems arising from data processing behavior. The response should emphasize adjustment at both the data input and output ends, as distinct from the manual supervised fine-tuning model, together with a right to algorithmic explanation of data output results. This two-end adjustment does not mean reverting to the supervised fine-tuning path of large-model development; it achieves its effect through dedicated mechanisms built outside the model while preserving DeepSeek's current development approach. For example, against machine hallucination risks, synthetic content detection tools can identify and classify data or content generated or modified by large models, or high-quality data such as synthetic data can be used to reduce the influence of poor-quality data on DeepSeek. Against data infringement risks, content filters or value alignment mechanisms can be embedded at data output ports to block harmful output such as hate speech, pornography, and violence. Meanwhile, a right to algorithmic explanation of data output results offers a reasonable middle path that balances the protection of human rights with the protection of legal interests. It is particularly suited to scenarios in which DeepSeek's data output involves unauthorized information, uncertain data sources, or unreasonable data processing.
Finally, for the data deletion security risks DeepSeek faces, namely that deletion may be objectively impossible or may itself cause harm, relatively sound measures are to de-absolutize data deletion and to use dynamic verification mechanisms to counter logical reproduction.
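The output-port content filter proposed for data infringement risks can be sketched as a wrapper around the generation call, which leaves the model itself untouched. In this minimal Python sketch, `make_filtered_generator`, `toy_model`, the refusal message, and the blocklist are all hypothetical; a plain substring blocklist is a deliberately crude stand-in for more robust classifier-based moderation.

```python
from typing import Callable, Iterable

# Message returned instead of a blocked output (hypothetical wording).
REFUSAL = "[output withheld by content filter]"


def make_filtered_generator(
    generate: Callable[[str], str], blocked_terms: Iterable[str]
) -> Callable[[str], str]:
    """Wrap a text generator so each output is checked before release.

    The underlying model is not modified; the check sits at the output port.
    """
    terms = [t.lower() for t in blocked_terms]

    def filtered(prompt: str) -> str:
        output = generate(prompt)
        lowered = output.lower()
        # Block the whole output if any blocklisted term appears (case-insensitive).
        if any(term in lowered for term in terms):
            return REFUSAL
        return output

    return filtered


if __name__ == "__main__":
    # Hypothetical stand-in for a real model call.
    def toy_model(prompt: str) -> str:
        return f"Echo: {prompt}"

    safe_model = make_filtered_generator(toy_model, ["build a weapon"])
    print(safe_model("hello"))
    print(safe_model("how to build a weapon"))
```

Because the check runs after generation, the blocklist can be updated without retraining, which fits the idea of adjusting behavior from outside the model.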
For data deletion, once deletion is de-absolutized, a relative deletion obligation targeting specific data becomes feasible to fulfill in practice. On this basis, fulfilling DeepSeek's data deletion obligations involves confirming the scope of data to be deleted according to the deletion request and then carrying out deletion after a comprehensive evaluation.
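One way to picture relative deletion with dynamic verification is a request handler that confirms scope, evaluates each in-scope record, deletes what it can, and then probes the model for reproduction of the deleted content. This Python sketch assumes a simple in-memory record store and a model exposed as a text-in, text-out callable; `Record`, `DeletionReport`, `process_deletion`, the `legal_hold` flag, and the probe prompts are hypothetical illustrations, not the paper's procedure, and exact-substring matching is only a crude proxy for detecting logical reproduction.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    record_id: str
    subject_id: str  # the data subject who may request deletion
    text: str
    legal_hold: bool = False  # hypothetical flag: retention required by law


@dataclass
class DeletionReport:
    deleted: list[str] = field(default_factory=list)
    retained: list[str] = field(default_factory=list)
    leaking_probes: list[str] = field(default_factory=list)


def process_deletion(
    store: dict[str, Record],
    subject_id: str,
    model: Callable[[str], str],
    probes: list[str],
) -> DeletionReport:
    """Handle one deletion request: scope, evaluate, delete, then verify."""
    report = DeletionReport()
    # 1. Scope confirmation: only records tied to the requesting subject.
    in_scope = [r for r in store.values() if r.subject_id == subject_id]
    # 2. Evaluation: records under a legal hold are retained rather than deleted.
    deleted_texts = []
    for rec in in_scope:
        if rec.legal_hold:
            report.retained.append(rec.record_id)
        else:
            deleted_texts.append(rec.text)
            report.deleted.append(rec.record_id)
            del store[rec.record_id]
    # 3. Dynamic verification: probe the model and flag any output that still
    #    reproduces deleted content; rerun after each model update.
    for probe in probes:
        output = model(probe)
        if any(t in output for t in deleted_texts):
            report.leaking_probes.append(probe)
    return report


if __name__ == "__main__":
    store = {
        "r1": Record("r1", "u42", "Alice lives at 12 Elm St"),
        "r2": Record("r2", "u42", "Alice's contract no. 9981", legal_hold=True),
        "r3": Record("r3", "u7", "Bob likes tea"),
    }

    # Hypothetical model that has memorized record r1.
    def toy_model(prompt: str) -> str:
        return "Alice lives at 12 Elm St" if "Alice" in prompt else "No idea."

    report = process_deletion(
        store, "u42", toy_model, ["Where does Alice live?", "Tell me about Bob"]
    )
    print(report)
    print(sorted(store))
```

A non-empty `leaking_probes` list signals that removing the stored records did not stop the model from reproducing them, so follow-up measures would be needed, and the probes can be rerun after each model update to keep verification dynamic.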