cc @stackeffect
Exactly, you see the correct UTF-8 encoded version (even with content-type: text/plain leaving out the charset declaration).

After following the utf8test twtxt myself I now see that jenny does not handle it as UTF-8 when the charset is missing from the HTTP header, just like @quark has observed.

So should jenny always treat twtxt files as UTF-8 encoded? I'm not sure about this.
requests library:

https://docs.python-requests.org/en/latest/user/advanced/#encodings

Honestly, I’d rather not interfere with that.

They refer to RFC 2616, which indeed says ISO-8859-1 should be the default for text/plain. However, RFC 7231 says in appendix B that this has been removed and it’s now up to the media type. When we look at https://www.iana.org/assignments/media-types/media-types.xhtml#text, we see RFC 2046 listed for text/plain. RFC 2046, including its update RFC 6657, specifies US-ASCII as a default (https://www.rfc-editor.org/rfc/rfc6657#section-4). So, uhm, which one is correct? ISO-8859-1 or US-ASCII? None of those things specify UTF-8 as a default for text/plain, though, this only applies to *new* text media type registrations.

It’s a rabbit hole. That’s why I’d like to defer this to requests.
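
To make the ambiguity a bit more concrete, here is a small standard-library sketch. It is only an illustration, not code from jenny or requests: the helper name declared_charset and the sample text are made up for this example. It reads the charset parameter from a Content-Type value and then decodes the same UTF-8 bytes with the two defaults mentioned above.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper for illustration; it uses the standard
    library's header parser rather than anything from requests.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# A feed body that is really UTF-8, as the twtxt spec demands.
body = "für".encode("utf-8")

# With an explicit charset there is nothing to guess.
with_charset = declared_charset("text/plain; charset=utf-8")

# Without it, the client has to pick a default on its own.
without_charset = declared_charset("text/plain")

# Default per RFC 2616: ISO-8859-1. Decoding never fails, but the
# two bytes of "ü" turn into two unrelated characters.
as_latin1 = body.decode("iso-8859-1")

# Default per RFC 2046 / RFC 6657: US-ASCII. Non-ASCII bytes are
# simply rejected.
try:
    body.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(with_charset, without_charset, repr(as_latin1), ascii_ok)
```

Neither default gives back the original text, which is why a missing charset matters for feeds that contain anything beyond ASCII.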
I'm not a Python programmer, so please bear with me.

The doc about encodings does also mention: "If you require a different encoding, you can manually set the Response.encoding property"

Wouldn't that be a one liner like (Ruby example)?

'some text'.force_encoding('utf-8')

I understand that you do not want to interfere with requests. On the other hand we know that received data must be utf-8 (by twtxt spec) and it does burden "publishers" to somehow add the charset property to the content-type header. But again I'm not sure what "the right thing to do" (TM) is.
(Yes, the change is super simple. I just wasn’t sure earlier if I *wanted* to do this. But you’re absolutely right, twtxt says feeds must be UTF-8, so there’s no point in caring about the Content-Type header at all.)
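
For reference, a minimal sketch of what such a change could look like in Python. This is not jenny's actual code: the function names fetch_feed and decode_feed and the timeout value are assumptions made for illustration. The only requests feature it relies on is the documented Response.encoding property quoted earlier in the thread.

```python
def decode_feed(raw: bytes) -> str:
    """Decode feed bytes as UTF-8, whatever the server declared.

    Hypothetical helper: invalid byte sequences are replaced instead
    of raising, so one broken line does not lose the whole feed.
    """
    return raw.decode("utf-8", errors="replace")


def fetch_feed(url: str) -> str:
    """Fetch a feed and ignore the charset from the Content-Type header.

    Hypothetical function; requests is a third-party package and is
    imported here so the rest of the sketch runs without it.
    """
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Setting the encoding before reading .text overrides whatever
    # requests derived from the HTTP headers.
    response.encoding = "utf-8"
    return response.text


# Offline check with bytes that are already in hand.
print(decode_feed("für ✓".encode("utf-8")))
```

Assigning to Response.encoding is the Python counterpart of the Ruby force_encoding one-liner: the bytes stay the same, only the interpretation changes.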
Content-Type: text/plain; charset=utf-8 instead of just Content-Type: text/plain. 🤣 Maybe I’ll remove that hack from my config now …
Updated. Will it be possible for the subject to be moved to the beginning instead (like Yarn and tt do)?
I just pulled it, works like a charm (as expected) ;-)