# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 7
# self = https://watcher.sour.is/conv/fzmmn2q
is there consensus on what characters should(n't) be allowed in nicks? i remember reading somewhere whitespace should not be allowed, but i don't see it in the spec on twtxt.dev — in fact, are there any other resources on twtxt extensions outside of twtxt.dev?
@zvava Good question. This is the spec, I think:

https://twtxt.dev/exts/metadata.html#nick

It doesn’t say much. 🤔

In the wild, I’ve only seen “traditional” nick names, i.e. ASCII 0x21 thru 0x7E.

My client removes anything but r'[a-zA-Z0-9]' from nick names.
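
A minimal sketch of that kind of stripping, assuming "removes" means deleting every character outside the class. The function name `sanitize_nick` is made up for illustration and is not taken from any client:

```python
import re

# Assumption: every character outside [a-zA-Z0-9] is simply deleted.
_NOT_ASCII_ALNUM = re.compile(r'[^a-zA-Z0-9]')


def sanitize_nick(nick: str) -> str:
    """Return the nick with everything but ASCII letters and digits removed."""
    return _NOT_ASCII_ALNUM.sub('', nick)
```

Note that this also drops non-ASCII letters, so a nick like "zväva" would lose its umlaut.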
@zvava @movq I'm not entirely sure about the spaces, but maybe they were omitted to simplify parsing of mentions in the form of @<nick url>. If the next token after the @<nick does not look like a URL, it's not a mention but regular text. This is just wild guessing, though.

Looking at the regex and tests in the original twtxt reference implementation seems to confirm that theory, in that it relies on whitespace as the delimiter:

https://lyse.isobeef.org/tmp/screenshot-2025-09-17-21-30-25.png

Another thing about nicks is that the original twtxt reference implementation converts nicks to all lowercase:

https://lyse.isobeef.org/tmp/screenshot-2025-09-17-21-20-39.png

You probably know this already, the original twtxt file format specification can be found here: https://twtxt.readthedocs.io/en/latest/user/twtxtfile.html

As for extensions, I don't know of anything outside of twtxt.dev that has actually been (partially) implemented. However, there is also the issue tracker of the official reference implementation. You might wanna dig through that. For example, there is an alternative suggestion for multiline messages: https://github.com/buckket/twtxt/issues/157
@lyse @movq bbycll's nickname regex is /^([-_\p{N}\p{L}])+$/iu because i don't like how english-centric only allowing ascii letters/numbers is. this only applies to local users as of now, though: currently all nicknames are tolerated when parsing remote feeds, and i just do mentions how yarn does (just the feed url)

in the wild, i've noticed a texedus feed with spaces in the nick (where its spec explicitly disallows whitespace in the nick) and feeds with other symbols in the nick too. honestly, i think we should just tolerate arbitrary nicknames for the sake of user expression (while stripping or converting unreasonable characters) and just leave them out of mentions
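
a rough python approximation of the local-nick rule quoted above, for anyone who wants to experiment. python's built-in `re` has no `\p{L}` / `\p{N}` classes, so this sketch assumes `\w` (unicode letters, digits and underscore) is a close enough stand-in; the name `is_valid_local_nick` is made up and is not bbycll code:

```python
import re

# Approximation of /^([-_\p{N}\p{L}])+$/iu.
# Assumption: \w in Python's Unicode mode (letters, digits, "_") stands in
# for \p{L}, \p{N} and the literal underscore; "-" is added explicitly.
_LOCAL_NICK = re.compile(r'[-\w]+')


def is_valid_local_nick(nick: str) -> bool:
    """True if the whole nick consists of letters, digits, "-" or "_"."""
    return _LOCAL_NICK.fullmatch(nick) is not None
```

the exact character sets of `\w` and `\p{L}\p{N}` may differ at the edges, so treat this as a sketch rather than an equivalent.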
@zvava @lyse @movq I also was wondering how to handle this.

Currently my regex is like this: /@<((?<nick>[^\s]+)\s)?(?<url>\w+:\/\/[^>]+)>/g

It takes everything up to the first whitespace as the nick, and the nick is optional.
@zvava In tt, I recognize umlauts in nicks, but they cannot include whitespace, @, !, #, (, ), [, ], <, >, " (but ' is okay). Whitespace also acts as a separator between nick and URL. @<Hello World http://example.com> ends up exactly like that and is not a mention.
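
A small sketch of the rule as described here, not tt's actual code. The names `FORBIDDEN` and `is_tt_style_nick` are hypothetical, and rejecting the empty string is an assumption:

```python
# Characters listed above as not allowed in a nick (whitespace is checked separately).
FORBIDDEN = set('@!#()[]<>"')


def is_tt_style_nick(nick: str) -> bool:
    """True if the nick is non-empty and has no whitespace or forbidden symbols."""
    # Assumption: an empty nick is treated as invalid.
    if not nick:
        return False
    return not any(ch.isspace() or ch in FORBIDDEN for ch in nick)
```

Umlauts and the apostrophe pass this check, while "Hello World" fails because of the space.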