How ChatGPT works and why it will not replace good content authors
ChatGPT vs Human. Meta’s LLaMA vs GPT-3.
There have been many articles about ChatGPT and GPT-3 by OpenAI, with some claiming that these models will replace content creators. However, I believe this is not entirely true. These large language models will only replace creators of poor-quality content. To demonstrate this, I will provide a high-level explanation of how such models work.
How ChatGPT/GPT-3 generates text
Let us look at what the prediction process in a large language model looks like. First, there is no brain inside the model. It doesn’t understand content the way humans do. Every model is just a function, like the mathematical function y = 3x + 5, only much more complicated.
We know that ChatGPT and GPT-3 generate text based on a prompt. Generation is sequential: the network predicts one word after another. Let us consider an example.
We start with a simple prompt, “I like to go”, and want to generate its continuation. In the first step, the model splits the prompt into so-called tokens. Though it is not 100% correct, for simplicity you can think of tokens as words. In each of the following steps, the model predicts exactly one token:
“I like to go” → “I like to go for” → “I like to go for a” → “I like to go for a walk”
Once a token is predicted, it is appended to the sentence, and the result is used as input for the next prediction. There are other tokens besides words, for example punctuation marks and a special end-of-text token. If the model predicts a full stop, the sentence ends. If it predicts the end-of-text token, generation stops.
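The loop described above can be sketched in a few lines of Python. This is only an illustration: `TOY_MODEL`, `predict_next_token` and `generate` are hypothetical names, and the lookup table stands in for a real neural network, which would compute its prediction rather than look it up.

```python
# Hypothetical stand-in for a trained model: a lookup table that maps
# the text so far to the next token. A real model computes this.
END_OF_TEXT = "<end_of_text>"

TOY_MODEL = {
    "I like to go": "for",
    "I like to go for": "a",
    "I like to go for a": "walk",
    "I like to go for a walk": END_OF_TEXT,
}


def predict_next_token(text):
    """Return the next token for the text so far (toy lookup, not a real model)."""
    return TOY_MODEL.get(text, END_OF_TEXT)


def generate(prompt, max_tokens=20):
    """Predict one token per step and append it until end-of-text is predicted."""
    text = prompt
    for _ in range(max_tokens):
        token = predict_next_token(text)
        if token == END_OF_TEXT:
            break  # the model signalled that the text is finished
        text = text + " " + token  # the output becomes the next input
    return text


print(generate("I like to go"))  # I like to go for a walk
```

The important part is the last line of the loop: each predicted token is fed back in as part of the input for the next step.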
How does the model decide what token should be next?
Well, imagine a dictionary of all the words in the English language. The model’s aim is to pick one word out of all those available. One can say that the model classifies the existing text, where the class is the next word (or a special token).
The model can work in two modes. In the first mode, it always picks the most probable token at each step for the given text. In this case, the same input text always produces the same prediction.
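As a minimal sketch of the first mode, assume the model has already scored every candidate token for the text “I like to go for a”. The numbers and the `pick_most_probable` helper below are invented for illustration.

```python
# Hypothetical probabilities for the token that follows "I like to go for a".
next_token_probs = {"walk": 0.6, "run": 0.3, "swim": 0.1}


def pick_most_probable(probs):
    """First mode: always return the token with the highest probability."""
    return max(probs, key=probs.get)


# Calling it repeatedly on the same input gives the same answer every time.
print(pick_most_probable(next_token_probs))  # walk
print(pick_most_probable(next_token_probs))  # walk
```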
In the second mode, each token in our English dictionary is assigned a probability. For most of the tokens, the probability will be close to 0, while for some of them, it will be higher. For example, for the text “I like to go for a” we can see the following predictions:
walk: 0.6
run: 0.3
everything else: 0.1
In this second mode, the model samples the next word according to these probabilities. That is, it selects walk with probability 0.6, run with probability 0.3, and something else with the remaining probability of 0.1. The second mode makes the model more random and avoids the same prediction every time. The user controls this behaviour once the model has been trained.
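The second mode can be sketched with Python's standard `random.choices`, using the probabilities from the example above. The `<other>` entry is a placeholder for “everything else”, and `sample_next_token` is a hypothetical helper, not part of any real model's API.

```python
import random

# Probabilities from the example above; "<other>" stands for everything else.
tokens = ["walk", "run", "<other>"]
weights = [0.6, 0.3, 0.1]


def sample_next_token(rng):
    """Second mode: draw one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the experiment is repeatable
samples = [sample_next_token(rng) for _ in range(10_000)]

# The observed shares should be close to 0.6, 0.3 and 0.1.
for token in tokens:
    print(token, samples.count(token) / len(samples))
```

Over many draws the shares approach the assigned probabilities, but any single draw can come out differently, which is why the same prompt can produce different continuations.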
How does this relate to the topic we are discussing? Well, each predicted word is usually the most probable one. This means that the more often the model saw a sequence of words in its training data, the more likely it is to use that sequence in its predictions.
And now my question to you: what do you see more of on the internet, well-written content by specialists or SEO-optimised articles written by copywriters? I would assume there is much more of the second type. And all this content is used for training the models.
That is the main reason why ChatGPT, GPT-3 and other language models write quite boring text. They were trained on a lot of badly written content, and they predict such content as the most probable.
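To make the link between training data and probability concrete, here is a deliberately simplified sketch that just counts which word follows a given context. Real language models are neural networks and do not count word sequences like this. The four training sentences and the `next_word_probabilities` helper are invented for illustration.

```python
from collections import Counter

# Invented miniature "training data": one continuation is far more common.
training_sentences = [
    "I like to go for a walk",
    "I like to go for a walk",
    "I like to go for a walk",
    "I like to go for a run",
]


def next_word_probabilities(sentences, context):
    """Estimate next-word probabilities by counting what follows `context`."""
    context_words = context.split()
    n = len(context_words)
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - n):
            if words[i:i + n] == context_words:
                counts[words[i + n]] += 1
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


print(next_word_probabilities(training_sentences, "for a"))
# {'walk': 0.75, 'run': 0.25}
```

The continuation that appears three times out of four in the data ends up with three quarters of the probability, so it is the one that gets predicted most often.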
Despite everything I’ve said, I think there is still great potential for high-quality content written by neural networks. I think we will see it when a model is adapted to a specific writer: for example, if a service could analyse all my previous writing and fine-tune a model to write more like me. For now, I still have to write this text myself, without the help of AI :-)
For now, I would say that ChatGPT will definitely replace creators of boring content, but not creators of high-quality content.
News of the week
Meta presented their new large language model, LLaMA. In their paper, the authors state that the model outperforms GPT-3 on many benchmarks despite being 10x smaller.
The LLaMA model is available for free for research purposes. This is exciting, since it puts the model within reach of many research groups around the world. Moreover, its relatively small size makes it possible for small groups to use it and build their own models on top of it. By comparison, the code for GPT-3 is not available at all, and training such a model would cost $3-5 million.
Thank you for reading my newsletter. I would greatly appreciate any feedback you have. Additionally, if there are any topics you would like to see covered, please let me know. You can reply to this email.