# I am the Watcher. I am your guide through this vast new twtiverse.
# 
# Usage:
#     https://watcher.sour.is/api/plain/users              View list of users and latest twt date.
#     https://watcher.sour.is/api/plain/twt                View all twts.
#     https://watcher.sour.is/api/plain/mentions?uri=:uri  View all mentions for uri.
#     https://watcher.sour.is/api/plain/conv/:hash         View all twts for a conversation subject.
# 
# Options:
#     uri     Filter to show a specific user's twts.
#     offset  Start index for query.
#     limit   Count of items to return (going back in time).
# 
# twt range = 1 19
# self = https://watcher.sour.is/conv/otgsz2q
@abucci Can you explain the type of neural networks behind these GPT(s) and how they differ from more traditional ANNs? 🙏
@prologic Dude that is a big ask. I'm not sure I could describe the structure faithfully even if I had the time to do that, because OpenAI sucks and won't publish the structure. Here's the original GPT-3 paper: https://arxiv.org/pdf/2005.14165.pdf . You'll see it's scant on details, which is one of the million criticisms of OpenAI: they are not conducting science here, even though they pretend to be. It's some weird combination of marketing and big dick contest with them. The first 10-ish pages give a detail-free description of the neural network, and the remaining 65 pages are bragging about how great they are. I've never seen anything quite like it in all the tens of thousands of research articles I've encountered over the course of my career.

The best description I've heard of it is that it's an extremely complicated autocomplete. The way it (most likely) works is that it reads through the sequence of text a user enters (the prompt), and then begins generating the next words that it deems likely to follow the prompt text. Very much how autocomplete on a smartphone keyboard works. It's a generative model, which means the neural network is probably being trained to learn the mean, standard deviation, and possibly other statistics about some probabilistic generative model (undescribed by OpenAI to my knowledge). There were some advances in LSTM around the time GPT was becoming popular, so it's possible they use a variant of that.

Hope that suffices for now!
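To make the "learn the statistics of a generative model" idea above concrete, here's a toy sketch in Python. The data and the Gaussian model are entirely made up for illustration; "training" here is just estimating the mean and standard deviation, after which the fitted model can generate new samples resembling the data:

```python
import random
import statistics

# Toy "training data": observations from some unknown process (made-up values).
samples = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2]

# "Training" a simple generative model: estimate the statistics of the data.
mu = statistics.mean(samples)
sigma = statistics.stdev(samples)

# With those statistics learned, the model can "generate" new, similar samples.
new_sample = random.gauss(mu, sigma)
```

A real GPT-scale model learns vastly more parameters than two numbers, but the training-then-sampling shape is the same.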
@abucci Interestingly the Wikipedia article on GPT-3 describes it as:

> Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model released in 2020 that uses deep learning to produce human-like text. Given an initial text as prompt, it will produce text that continues the prompt.

Which is even more confusing to me, mostly because it doesn't speak of a neural network at all. Basically I was (on my short-lived holiday) doing some R&D on neural networks, evolutionary algorithms and other reading 😅
I _tried_ to read up on autoregressive language model(s) btw, and gave up. Way over my puny head 🤦‍♂️
@prologic geez, yes that's horrible. "Autoregressive" just means that the next token in a sequence is a function of previous ones, and "language model" here just means a probability distribution over sequences. "Autoregressive language model" is an infuriatingly obtuse way to describe autocomplete!

Like, if you type "The dog is", autocomplete will suggest some words for you that are likely to come next. Maybe "barking", "wet", "hungry", ... It'll rank those by how high a probability it rates each follow-up word. It'll probably not suggest words like "uranium" or "quickly", because you very rarely if ever encounter those words after "The dog is" in English sentences so their probability is very low.

👆 That's the "autoregressive" part.
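The ranking idea above can be sketched in a few lines of Python. The probabilities here are invented for the "The dog is" example; in a real system they'd come from a trained model:

```python
# Hypothetical next-word probabilities: P(next word | "The dog is").
# All numbers are made up for illustration.
next_word_probs = {
    "barking": 0.30,
    "hungry": 0.25,
    "wet": 0.15,
    "quickly": 0.0005,
    "uranium": 0.0001,
}

# Autocomplete ranks candidate words by probability, highest first.
suggestions = sorted(next_word_probs, key=next_word_probs.get, reverse=True)
print(suggestions[:3])  # → ['barking', 'hungry', 'wet']
```

"uranium" sits at the bottom of the ranking, exactly as described: it's not impossible, just extremely unlikely.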

It gets these probabilities from a "language model", which is a fancy way of saying a table of probabilities. A literal lookup table of the probabilities would be wayyyyy too big to be practical, so neural networks are often used as a representation of the lookup table, and deep learning (many-layered neural networks + a learning algorithm) is the hotness lately so they use that.

👆 That's the "language model" part.
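The "literal lookup table" version of a language model, before you replace it with a neural network, might look like this sketch (contexts and probabilities invented for illustration):

```python
# A "language model" as a literal lookup table: context -> next-word probabilities.
# A real table over all possible contexts would be astronomically large, which is
# why a neural network is used to approximate this mapping instead.
language_model = {
    ("the", "dog", "is"): {"barking": 0.30, "hungry": 0.25, "wet": 0.15},
    ("the", "cat", "is"): {"sleeping": 0.40, "purring": 0.20},
}

def next_word_distribution(context):
    # Look up the distribution for the last three words of the context;
    # unknown contexts get an empty distribution in this toy version.
    return language_model.get(tuple(context[-3:]), {})

dist = next_word_distribution(["the", "dog", "is"])
```

The neural network plays the role of `language_model` here: a compressed, generalizing stand-in for an impossibly large table.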

So, you enter a prompt for ChatGPT. It runs fancy autocomplete to pick a word that should come next. It runs fancy autocomplete again to see what word will come next *after the last word it predicted and some of your prompt words*. Repeat to generate as many words as needed. There's probably a heuristic or a special "END OF CHAT" token to indicate when it should stop generating and send its response to you. Uppercase and lowercase versions of the tokens are in there so it can generate those. Punctuation is in there so it can generate that. With a good enough "language model", it'll do nice stuff like close parens and quotes, correctly start a sentence with a capital letter, add paragraph breaks, and so on.

There's really not much more to it than that, aside from a crapton of engineering to make all that work at the scale they're doing it.
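The repeat-until-done loop described above can be sketched like this. The tiny model and the "<END>" stop token are both hypothetical; real systems have their own stop conventions:

```python
import random

# Hypothetical model: maps a context tuple to next-token probabilities.
# "<END>" is a made-up stop token for this sketch.
model = {
    ("the",): {"dog": 1.0},
    ("the", "dog"): {"barks": 0.8, "<END>": 0.2},
    ("the", "dog", "barks"): {"<END>": 1.0},
}

def generate(prompt, max_tokens=10):
    tokens = list(prompt)
    for _ in range(max_tokens):
        dist = model.get(tuple(tokens), {"<END>": 1.0})
        # Pick the next token weighted by its probability.
        next_tok = random.choices(list(dist), weights=list(dist.values()))[0]
        if next_tok == "<END>":
            break  # stop generating and "send the response"
        tokens.append(next_tok)
    return tokens

result = generate(["the"])
```

Each pass through the loop is one run of the "fancy autocomplete"; the output so far becomes part of the context for the next prediction.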
(not to imply the engineering parts, including the data acquisition and cleanup, are easy, and not to imply there aren't a million tricks in there to make sure this all works nicely. it's a hell of a feat of engineering and those two twts I wrote only outline at a very high level one way it might work).
@prologic oh, nice. Did you stumble on recurrent neural networks?
@abucci Noice! 👌 Between you and my reading I have a much deeper understanding of this shit 🙇‍♂️

Sadly I didn't come across RNNs though 😆 But that doesn't matter 🤔