The Watcher

	
# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 1
# self = https://watcher.sour.is/conv/g7grulq
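The usage notes above can be sketched as a small URL builder. This is only an illustration: the helper names `twt_url` and `conv_url` are hypothetical, the feed address `https://example.com/twtxt.txt` is a placeholder, and it is an assumption that `uri`, `offset`, and `limit` are passed as query parameters to the `twt` endpoint. The code only builds strings and sends no requests.

```python
from urllib.parse import quote, urlencode

# Base path taken from the usage lines in the header.
BASE = "https://watcher.sour.is/api/plain"


def twt_url(uri=None, offset=None, limit=None):
    """Build a URL for the twt listing (hypothetical helper).

    Assumption: the options from the header are plain query parameters.
    """
    candidates = (("uri", uri), ("offset", offset), ("limit", limit))
    params = {name: value for name, value in candidates if value is not None}
    query = urlencode(params)
    return f"{BASE}/twt" + (f"?{query}" if query else "")


def conv_url(conv_hash):
    """Build a URL for a conversation subject hash (hypothetical helper)."""
    return f"{BASE}/conv/{quote(conv_hash, safe='')}"


# Placeholder feed address, not a real user.
example = twt_url(uri="https://example.com/twtxt.txt", limit=10)
conversation = conv_url("g7grulq")
```

Here `conversation` uses the hash from the `self` line of this page.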

yue-fang-readfog

feeds.twtxt.net

19 Feb 25 02:28 UTC

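Overview of How Deepseek-R1 Works
Basic concepts: reinforcement learning. Reinforcement learning (RL) is a type of machine learning in which an AI learns by taking actions and receiving rewards or penalties based on those actions. The goal is to maximize reward over time. Example: imagine teaching a robot to play a game. The robot tries different moves, and each time it makes a good move (for example, scoring a point) it receives a reward (for example, +1). When it makes a bad move (for example, losing a point) it receives a penalty (for example, -1). Over time, the robot learns which moves can ⌘ Read more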
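The robot example can be sketched as a tiny reward-driven learning loop. This is a minimal illustration, not the method used by Deepseek-R1: the action names, the epsilon-greedy exploration rule, the learning rate, and the episode count are all assumptions chosen for the sketch. Only the +1 reward for a good move and the -1 penalty for a bad move come from the text.

```python
import random


def train(episodes=500, epsilon=0.1, alpha=0.1, seed=0):
    """Learn value estimates for two moves from +1 / -1 feedback."""
    rng = random.Random(seed)
    actions = ["good_move", "bad_move"]  # hypothetical action names
    rewards = {"good_move": 1.0, "bad_move": -1.0}  # +1 reward, -1 penalty
    values = {action: 0.0 for action in actions}

    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random move.
            action = rng.choice(actions)
        else:
            # Exploit: pick the move with the highest estimate so far.
            action = max(actions, key=lambda a: values[a])
        reward = rewards[action]
        # Nudge the estimate toward the reward that was just observed.
        values[action] += alpha * (reward - values[action])
    return values


values = train()
best_action = max(values, key=values.get)
print(best_action)
```

Because the rewarded move's estimate rises above zero while the penalized move's estimate never does, the loop ends up preferring the rewarded move, which is the "maximize reward over time" behavior the paragraph describes.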