# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific user's twts.
# offset Start index for query.
# limit Count of items to return (going back in time).
#
# twt range = 1 1
# self = https://watcher.sour.is/conv/otad6ba
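A minimal sketch of calling the endpoints listed above from Python, assuming the query options behave as described; the feed uri in the example is hypothetical.

```python
# Fetch one user's recent twts from the watcher API (sketch).
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "uri": "https://example.com/twtxt.txt",  # hypothetical feed uri
    "offset": 0,    # start index for the query
    "limit": 10,    # number of items, going back in time
})
url = f"https://watcher.sour.is/api/plain/twt?{params}"
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8"))
```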
Why does DeepSeek use an MoE architecture unlike mainstream large models? One article to understand what an MoE model is
I. Preface: The DeepSeek website shows that both DeepSeek-V3 and V2.5 use the MoE architecture, whereas models such as Qwen and LLama use the Dense architecture, i.e., the traditional Transformer architecture. The two architectures differ in one obvious way: DeepSeek-V3 has 671 billion total parameters, yet the activated parameters per computation — the parameters that actually take part in the calculation — number only 37 billion, about 5.5% of the total. But Qwen ⌘ Read more
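To make the "activated vs. total parameters" distinction concrete, here is a minimal sketch of top-k expert routing, the mechanism behind that gap in MoE layers. It assumes PyTorch; the layer sizes, expert count, and top_k are illustrative toy values, not DeepSeek's actual configuration.

```python
# Sparse top-k MoE routing (toy sketch): a gate scores experts per token,
# and only the top-k experts run, so activated params << total params.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router: one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                            # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep top-k per token
        weights = F.softmax(weights, dim=-1)             # normalize kept scores
        out = torch.zeros_like(x)
        # Run only the selected experts; unselected experts stay idle,
        # which is why so few parameters are "activated" per token.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

x = torch.randn(4, 64)
print(TinyMoE()(x).shape)  # torch.Size([4, 64])
```

With 8 experts and top_k=2, each token touches roughly a quarter of the expert parameters per forward pass, mirroring (at toy scale) how DeepSeek-V3 computes with 37B of its 671B parameters.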