The Watcher

	
# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific users twts.
#     offset  Start index for quey.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 2
# self = https://watcher.sour.is/conv/k5bq7jq

prologic

twtxt.net

18 Sep 24 12:38 UTC

Current Twt Hash spec and probability of hash collision:

The probability of a Twt Hash collision depends on the size of the hash and the number of possible values it can take. For the Twt Hash, which uses a Blake2b 256-bit hash, Base32 encoding, and takes the last 7 characters, the space of possible hash values is significantly reduced.

### Breakdown:

1. Base32 encoding: Each character in the Base32 encoding represents 5 bits of information (since \( 2^5 = 32 \)).
2. 7 characters: With 7 characters, the total number of possible hashes is:
\[  32^7 = 3,518,437,208  \]
This gives about 3.5 billion possible hash values.

### Probability of Collision:

The probability of a hash collision depends on the number of hashes generated and can be estimated using the Birthday Paradox. The paradox tells us that collisions are more likely than expected when hashing a large number of items.

The approximate formula for the probability of at least one collision after generating n hashes is:
\[ P(\text{collision}) \approx 1 - e^{-\frac{n^2}{2M}}
\]
Where:
- \(n\) is the number of generated Twt Hashes.
- \(M = 32^7 = 3,518,437,208\) is the total number of possible hash values.

For practical purposes, here are some example probabilities for different numbers of hashes (n):

- For 1,000 hashes:
\[  P(\text{collision}) \approx 1 - e^{-\frac{1000^2}{2 \cdot 3,518,437,208}} \approx 0.00014 \, \text{(0.014%)}
\]
- For 10,000 hashes:
\[  P(\text{collision}) \approx 1 - e^{-\frac{10000^2}{2 \cdot 3,518,437,208}} \approx 0.14 \, \text{(14%)}
\]
- For 100,000 hashes:
\[  P(\text{collision}) \approx 1 - e^{-\frac{100000^2}{2 \cdot 3,518,437,208}} \approx 0.999 \, \text{(99.9%)}
\]

### Conclusion:

- For small to moderate numbers of hashes (up to around 1,000–10,000), the collision probability is quite low.
- However, as the number of Twts grows (above 100,000), the likelihood of a collision increases significantly due to the relatively small hash space (3.5 billion).=

prologic

twtxt.net

18 Sep 24 12:38 UTC

Current Twt Hash spec and probability of hash collision:

The probability of a Twt Hash collision depends on the size of the hash and the number of possible values it can take. For the Twt Hash, which uses a Blake2b 256-bit hash, Base32 encoding, and takes the last 7 characters, the space of possible hash values is significantly reduced.

### Breakdown:

1. Base32 encoding: Each character in the Base32 encoding represents 5 bits of information (since \( 2^5 = 32 \)).
2. 7 characters: With 7 characters, the total number of possible hashes is:
\[  32^7 = 3,518,437,208  \]
This gives about 3.5 billion possible hash values.

### Probability of Collision:

The probability of a hash collision depends on the number of hashes generated and can be estimated using the Birthday Paradox. The paradox tells us that collisions are more likely than expected when hashing a large number of items.

The approximate formula for the probability of at least one collision after generating n hashes is:
\[ P(\text{collision}) \approx 1 - e^{-\frac{n^2}{2M}}
\]
Where:
- \(n\) is the number of generated Twt Hashes.
- \(M = 32^7 = 3,518,437,208\) is the total number of possible hash values.

For practical purposes, here are some example probabilities for different numbers of hashes (n):

- For 1,000 hashes:
\[  P(\text{collision}) \approx 1 - e^{-\frac{1000^2}{2 \cdot 3,518,437,208}} \approx 0.00014 \, \text{(0.014%)}
\]
- For 10,000 hashes:
\[  P(\text{collision}) \approx 1 - e^{-\frac{10000^2}{2 \cdot 3,518,437,208}} \approx 0.14 \, \text{(14%)}
\]
- For 100,000 hashes:
\[  P(\text{collision}) \approx 1 - e^{-\frac{100000^2}{2 \cdot 3,518,437,208}} \approx 0.999 \, \text{(99.9%)}
\]

### Conclusion:

- For small to moderate numbers of hashes (up to around 1,000–10,000), the collision probability is quite low.
- However, as the number of Twts grows (above 100,000), the likelihood of a collision increases significantly due to the relatively small hash space (3.5 billion).=