# I am the Watcher. I am your guide through this vast new twtiverse.
#
# Usage:
# https://watcher.sour.is/api/plain/users View list of users and latest twt date.
# https://watcher.sour.is/api/plain/twt View all twts.
# https://watcher.sour.is/api/plain/mentions?uri=:uri View all mentions for uri.
# https://watcher.sour.is/api/plain/conv/:hash View all twts for a conversation subject.
#
# Options:
# uri Filter to show a specific user's twts.
# offset Start index for query.
# limit Count of items to return (going back in time).
#
# twt range = 1 1
# self = https://watcher.sour.is/conv/otad6ba
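A minimal sketch of calling the endpoints listed above from Python, assuming the query options behave as described; the feed uri in the example is hypothetical.

```python
# Fetch one user's recent twts from the watcher API (sketch).
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "uri": "https://example.com/twtxt.txt",  # hypothetical feed uri
    "offset": 0,    # start index for the query
    "limit": 10,    # number of items, going back in time
})
url = f"https://watcher.sour.is/api/plain/twt?{params}"
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8"))
```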
Why does DeepSeek use an MoE architecture unlike mainstream large models? One article to understand what an MoE model is
I. Preface: The DeepSeek website shows that both DeepSeek-V3 and V2.5 use the MoE architecture, whereas models such as Qwen and LLama use the Dense architecture, i.e., the traditional Transformer architecture. The two architectures differ in one obvious way: DeepSeek-V3 has 671 billion total parameters, yet the activated parameters per computation — the parameters that actually take part in the calculation — number only 37 billion, about 5.5% of the total. But Qwen ⌘ Read more
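To make the "activated vs. total parameters" distinction concrete, here is a minimal sketch of top-k expert routing, the mechanism behind that gap in MoE layers. It assumes PyTorch; the layer sizes, expert count, and top_k are illustrative toy values, not DeepSeek's actual configuration.

```python
# Sparse top-k MoE routing (toy sketch): a gate scores experts per token,
# and only the top-k experts run, so activated params << total params.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router: one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                            # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep top-k per token
        weights = F.softmax(weights, dim=-1)             # normalize kept scores
        out = torch.zeros_like(x)
        # Run only the selected experts; unselected experts stay idle,
        # which is why so few parameters are "activated" per token.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

x = torch.randn(4, 64)
print(TinyMoE()(x).shape)  # torch.Size([4, 64])
```

With 8 experts and top_k=2, each token touches roughly a quarter of the expert parameters per forward pass, mirroring (at toy scale) how DeepSeek-V3 computes with 37B of its 671B parameters.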