# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 2
# self = https://watcher.sour.is/conv/ytcwwva
With a SHA1 encoding, the probability of a hash collision at various k (_number of twts_) becomes:


>>> import math
>>>
>>> def collision_probability(k, bits):
...     n = 2 ** bits  # Total unique hash values based on the number of bits
...     probability = 1 - math.exp(- (k ** 2) / (2 * n))
...     return probability * 100  # Return as percentage
...
>>> # Example usage:
>>> k_values = [100000, 1000000, 10000000]
>>> bits = 44  # Number of bits for the hash
>>>
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 44 bits: 0.0284%
Probability of collision for 1000000 hashes with 44 bits: 2.8022%
Probability of collision for 10000000 hashes with 44 bits: 94.1701%
>>> bits = 48
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 48 bits: 0.0018%
Probability of collision for 1000000 hashes with 48 bits: 0.1775%
Probability of collision for 10000000 hashes with 48 bits: 16.2753%
>>> bits = 52
>>> for k in k_values:
...     print(f"Probability of collision for {k} hashes with {bits} bits: {collision_probability(k, bits):.4f}%")
...
Probability of collision for 100000 hashes with 52 bits: 0.0001%
Probability of collision for 1000000 hashes with 52 bits: 0.0111%
Probability of collision for 10000000 hashes with 52 bits: 1.1041%
>>>


If we adopted this scheme, we might have to increase the number of characters (_first N_) from 11 to 12 and finally to 13 as the global number of Twts across the space grows large enough. I _think_ at the last full crawl/scrape it was around 500k (_maybe_)? https://search.twtxt.net/ says only ~99k
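
To get a feel for when each step from 11 to 12 to 13 characters would be needed, the same birthday-bound approximation can be inverted to estimate how many twts fit under a chosen collision probability. This is only a rough sketch: it assumes 4 bits per character (which matches 11 characters giving 44 bits above), and `bits_for_chars`, `max_twts` and the 1% threshold are names and values picked for illustration, not part of any spec.

```python
import math

# Assumption: each hash character carries 4 bits, so 11 chars -> 44 bits,
# 12 -> 48 and 13 -> 52, matching the bit sizes used above.
BITS_PER_CHAR = 4


def bits_for_chars(chars, bits_per_char=BITS_PER_CHAR):
    """Number of hash bits kept when truncating to `chars` characters."""
    return chars * bits_per_char


def collision_probability(k, bits):
    """Birthday-bound approximation (same formula as above), as a fraction."""
    n = 2 ** bits
    return 1 - math.exp(-(k ** 2) / (2 * n))


def max_twts(bits, p):
    """Approximate largest k whose collision probability stays at or below p."""
    n = 2 ** bits
    return math.floor(math.sqrt(2 * n * math.log(1 / (1 - p))))


# Hypothetical threshold: tolerate at most a 1% chance of any collision.
for chars in (11, 12, 13):
    bits = bits_for_chars(chars)
    print(f"{chars} chars ({bits} bits): ~{max_twts(bits, 0.01):,} twts before a 1% collision chance")
```

Each extra character multiplies the hash space by 16, so the number of twts that fit under a fixed threshold grows by roughly a factor of 4 per character.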