GPT-4 Is Coming Soon. Here’s What We Know So Far (GPT, GPT-2, GPT-3)

Aryan Bajaj
8 min read · Nov 20, 2022

Discover everything we currently know about GPT-4, including my assumptions and forecasts based on AI trends and OpenAI data.

Introduction

Source: Created By Artificial Intelligence

The AI landscape is shifting rapidly as new models keep arriving. OpenAI released DALL-E 2, a cutting-edge text-to-image model, in July 2022, and its API is now available. A few weeks later, Stability AI released Stable Diffusion, an open-source text-to-image model comparable to DALL-E 2. Both models are popular and have shown promising results in output quality and in how well they interpret prompts. A few months ago, OpenAI also released Whisper, an Automatic Speech Recognition (ASR) model that outperformed previous models in robustness and accuracy.

Based on this trend, OpenAI is expected to release GPT-4 in the coming months. There is huge market demand for large language models, and the success of GPT-3 has set expectations: users anticipate GPT-4 to be more accurate, more compute-efficient, less biased, and safer.

Although OpenAI has said nothing about the launch or its features, in this blog we will make some assumptions and predictions about GPT-4 based on AI trends and data published by OpenAI. We will also learn about large language models and their applications.

What is GPT?

Source: Hugging Face

Generative Pre-trained Transformer (GPT) is a text-generation deep learning model trained on data available on the internet. It is used for question answering, text summarization, machine translation, classification, code generation, and conversational AI.
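To make that task list concrete, here is a minimal sketch of generating text with a GPT-style model. It assumes the Hugging Face transformers library and uses the openly released GPT-2 checkpoint as a stand-in, since GPT-3 and later models are only accessible through OpenAI's hosted API.

```python
# Minimal sketch: text generation with a GPT-style model via Hugging Face
# transformers, using the public GPT-2 checkpoint as a stand-in for GPT-3+.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Question: What is a transformer model?\nAnswer:",
    max_length=60,            # total tokens: prompt + generated continuation
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```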

By reading my blogs, you can learn how to build your own deep learning, machine learning, or AI models. You will explore the fundamentals of deep learning, get an introduction to the TensorFlow and Keras frameworks, and build multiple-input and multiple-output models using different algorithms. Link to my blogs — ARYAN BAJAJ

Prior to GPT

Prior to GPT-1, most Natural Language Processing (NLP) models were trained for specific tasks such as classification, translation, and so on, using supervised learning. This form of learning has two drawbacks: a lack of labelled data and an inability to generalize across tasks.

Source: The Gradient

Generative Pre-trained Transformer — 1 (GPT-1)

Source: Velog

The initial iteration, GPT-1, was developed by OpenAI. It was trained on the BookCorpus dataset, which contains over 7,000 unpublished books. The Transformer architecture served as the foundation for the GPT model, which was built from decoder blocks stacked on top of one another (12 decoder layers). In that respect it resembles BERT, which is also built on the Transformer architecture; the difference is that BERT stacks encoder layers instead. The GPT model is autoregressive, similar to the approach used in RNNs: the previous output becomes part of the current input.
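To illustrate the autoregressive idea, here is a minimal sketch of greedy token-by-token decoding, where each predicted token is appended to the input before the next prediction. It uses the openly available GPT-2 checkpoint from Hugging Face as a stand-in, since the point is the mechanism rather than GPT-1 specifically.

```python
# Sketch of autoregressive decoding: each newly predicted token is appended
# to the input before predicting the next one (greedy decoding, no sampling).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer.encode("The Transformer decoder", return_tensors="pt")

for _ in range(20):                                   # generate 20 tokens
    with torch.no_grad():
        logits = model(input_ids).logits              # (1, seq_len, vocab_size)
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    input_ids = torch.cat([input_ids, next_token], dim=-1)  # output -> next input

print(tokenizer.decode(input_ids[0]))
```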

Generative Pre-trained Transformer — 2 (GPT-2)

Source: ResearchGate

Generative Pre-trained Transformer 2 (GPT-2) is an open-source language model released by OpenAI in February 2019. GPT-2 interprets text, answers questions, summarizes passages, and generates text at a level that is occasionally indistinguishable from human writing, although it can become repetitive or nonsensical over longer passages. It is a general-purpose learner: it was not explicitly trained to perform any of these tasks, and its ability to do so is an extension of its general capacity to accurately predict the next item in an arbitrary sequence. GPT-2 was developed as a “direct scale-up” of OpenAI’s 2018 GPT model, with a roughly tenfold increase in both parameter count and training dataset size.

The GPT design uses attention instead of the earlier recurrence- and convolution-based architectures to build a deep neural network, specifically a transformer model. The model's attention mechanism allows it to selectively focus on the parts of the input text it predicts will be most relevant. This design significantly improves parallelization and outperforms earlier RNN/CNN/LSTM-based models on benchmarks.
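For reference, here is a minimal sketch of scaled dot-product attention, the core operation behind that selective focus (simplified to a single head with no causal masking, which GPT models add in practice).

```python
# Minimal sketch of scaled dot-product attention (single head, no masking).
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)            # how much to "focus" on each position
    return weights @ v                             # weighted sum of the values

q = k = v = torch.randn(1, 8, 64)   # (batch, sequence length, dimension)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                    # torch.Size([1, 8, 64])
```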

Generative Pre-trained Transformer — 3 (GPT-3)

Source: ZDNET

GPT-3 (175B parameters) was introduced in a paper published in 2020. The model contains over 100 times as many parameters as GPT-2 and was trained on an even bigger dataset to achieve good performance on downstream tasks. Its human-like story writing, SQL query and Python script generation, language translation, and summarization astonished the world. It achieved state-of-the-art results through in-context learning in few-shot, one-shot, and zero-shot settings.
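As an illustration of few-shot in-context learning, here is a hedged sketch of how a GPT-3 completion request looked with the OpenAI Python client of that era. The model name and client interface are assumptions and have changed over time; only the prompt structure (a few examples followed by a new query) is the point.

```python
# Hedged sketch of few-shot in-context learning with the OpenAI API as it
# existed around GPT-3; the model name and client interface may differ today.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

few_shot_prompt = """Translate English to French.

English: Where is the library?
French: Où est la bibliothèque ?

English: I would like a coffee, please.
French: Je voudrais un café, s'il vous plaît.

English: The weather is nice today.
French:"""

response = openai.Completion.create(
    model="text-davinci-002",   # a GPT-3-family model available in 2022
    prompt=few_shot_prompt,
    max_tokens=30,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
```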

Generative Pre-trained Transformer — 4 (GPT-4)

Source: Author

Now for the most awaited part: GPT-4. Having read about GPT so far, you should have a good grasp of the concept, so let's not waste time and discuss a few expected features in this section.

Sam Altman, the CEO of OpenAI, confirmed the speculation about the upcoming GPT-4 model during a question-and-answer session at the AC10 online event.

In this part, we'll combine that information with current trends to forecast the model's size, optimal parameters and compute, multimodality, sparsity, and performance.

Source: Reddit

Size

GPT-4, according to Altman, will not be substantially larger than GPT-3. As a result, we can expect it to have 175B-280B parameters, similar to DeepMind's language model Gopher.

Megatron-Turing NLG, at 530B parameters, is about three times the size of GPT-3 yet performs similarly, and smaller models released since have outperformed it. Simply put, larger size does not imply better performance.

Altman stated that they are concentrating on making smaller models perform better. Large language models have required big datasets, vast computational resources, and complicated implementations; for many businesses, even deploying such models is too expensive.

Parameterization

The majority of large models are under-optimized. Training a model is costly, and organizations must trade accuracy against expense. GPT-3, for example, was trained only once, despite errors along the way; researchers could not perform hyperparameter tuning because of the prohibitive cost. OpenAI showed that GPT-3 could be improved by training it with better hyperparameters: a 6.7B GPT-3 model with tuned hyperparameters outperformed the untuned 13B GPT-3 model. They developed a new parameterization (μP), under which the optimal hyperparameters for a smaller model are also optimal for larger models of the same architecture. This lets researchers optimize big models at a tenth of the usual cost.
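To make the idea concrete, here is a purely conceptual sketch of hyperparameter transfer: tune on a small proxy model, then reuse the winning settings on the large one. The train_and_evaluate function and the search space are hypothetical placeholders, not OpenAI's actual tuning code.

```python
# Conceptual sketch of μP-style hyperparameter transfer: tune on a small
# proxy model, then reuse the best hyperparameters on the full-size model.
# train_and_evaluate is a hypothetical placeholder for illustration only.
import itertools

def tune_on_small_model(search_space, train_and_evaluate, small_width=256):
    best_score, best_hparams = float("-inf"), None
    for lr, init_scale in itertools.product(search_space["lr"], search_space["init_scale"]):
        score = train_and_evaluate(width=small_width, lr=lr, init_scale=init_scale)
        if score > best_score:
            best_score, best_hparams = score, {"lr": lr, "init_scale": init_scale}
    return best_hparams

# Under μP, the same hyperparameters are then applied to the large model:
# best = tune_on_small_model(space, train_and_evaluate)
# train_and_evaluate(width=8192, **best)
```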

Computation

DeepMind found that the number of training tokens influences model performance as much as model size does. They demonstrated this by training Chinchilla, a 70B model trained on four times more data than the much larger Gopher, which it nevertheless outperformed, along with other large language models trained since GPT-3. We can reasonably anticipate that OpenAI will increase training tokens to around 5 trillion for a compute-optimal model, which means training the model to minimum loss will require roughly 10–20X the FLOPs of GPT-3.
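As a rough sanity check on that 10–20X figure, here is a back-of-the-envelope calculation using the common approximation that training compute is about 6 × parameters × tokens. The roughly 300B-token training set for GPT-3 and the 5-trillion-token guess for a compute-optimal successor are the assumptions here.

```python
# Back-of-the-envelope check of the "10-20X the FLOPs of GPT-3" claim,
# using the common approximation training_FLOPs ≈ 6 * parameters * tokens.
def training_flops(params, tokens):
    return 6 * params * tokens

gpt3_flops = training_flops(175e9, 300e9)     # GPT-3: 175B params, ~300B tokens
optimal_flops = training_flops(175e9, 5e12)   # same size, ~5T tokens (assumed)

print(f"GPT-3:           {gpt3_flops:.2e} FLOPs")
print(f"Compute-optimal: {optimal_flops:.2e} FLOPs")
print(f"Ratio:           {optimal_flops / gpt3_flops:.1f}x")  # ≈ 16.7x
```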

The GPT-4 will be a text-only Model

Altman stated during the Q&A that GPT-4 will not be multimodal like DALL-E; it will be a text-only model. Why? A good multimodal model is more difficult to build than a language-only or vision-only one, because combining textual and visual information is hard. Such a model would also need to perform better than both GPT-3 and DALL-E 2 to be worthwhile.

So don’t anticipate anything spectacular with GPT-4.

Sparsity

Sparse models use conditional computation to reduce computational costs: a model can readily grow beyond 1 trillion parameters without incurring proportionally large processing costs, which helps in training huge language models with fewer resources. GPT-4, however, will not be a sparse model. Why? OpenAI has always relied on dense language models, and it is not planning to expand the model's size.
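For intuition, here is a minimal sketch of conditional computation in the style of a mixture-of-experts layer, where each token is routed to a single expert so per-token compute stays roughly constant as total parameters grow. This is a toy illustration, not how any specific OpenAI model is implemented.

```python
# Toy mixture-of-experts layer: each token is routed to exactly one expert,
# so only a fraction of the parameters is used for any given token.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):                        # x: (tokens, dim)
        choice = self.router(x).argmax(dim=-1)   # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                out[mask] = expert(x[mask])      # only the chosen expert runs
        return out

moe = TinyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```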

Artificial Intelligence Alignment

GPT-4 will be better aligned than GPT-3. OpenAI is still grappling with AI alignment: it wants its language models to follow our intentions and reflect our values.

They have taken a first step by training InstructGPT, a GPT-3 model fine-tuned to follow instructions using human feedback. Human judges rated its outputs above GPT-3's, even though it did not always score better on standard language benchmarks.
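For a flavour of how human feedback enters the training signal, here is a hedged sketch of the pairwise preference loss commonly used to train a reward model from human comparisons; the scalar rewards below are dummy values for illustration, not anything from OpenAI's training runs.

```python
# Sketch of the pairwise preference loss used to train a reward model from
# human comparisons: push the reward of the preferred response above the
# reward of the rejected one via -log(sigmoid(r_chosen - r_rejected)).
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Dummy scalar rewards for a batch of 3 human comparisons:
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.5, 1.1])
print(preference_loss(chosen, rejected))
```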

Release date

The GPT-4 release date is still unknown, and we can presume that the company is currently concentrating its efforts on other technologies such as text-to-image and speech recognition. As a result, it may arrive next month or next year; we don't know for sure. What we can be reasonably confident of is that the next version will address the issues of the previous one and deliver better results.

Conclusion

GPT-4 will be a text-only large language model with performance comparable to GPT-3. It will also be better aligned with human instructions and values.

You may hear conflicting claims about GPT-4, such as that it will have 100 trillion parameters or focus solely on code generation.

However, they are all speculative at this stage.

We don’t know much, and OpenAI hasn’t published anything definite regarding the release date, model design, size, or dataset.

GPT-4, like GPT-3, will be used for a variety of language applications, including code generation, text summarization, language translation, classification, chatbots, and grammar correction.

The new model is expected to be safer, less biased, more accurate, and better aligned, as well as more cost-effective and robust.

If you have questions, leave a comment or email me at aryanbajaj104@gmail.com

ABOUT THE AUTHOR

Passionate about studying how to improve performance and automate tasks. Seeking to leverage data analytics, machine learning and artificial intelligence skills to improve corporate performance by optimum utilization of available resources.

Website — acumenfinalysis.com (CHECK THIS OUT)

CONTACTS:

If you have any questions or suggestions on what my next article should be about, please write to me at aryanbajaj104@gmail.com.

If you want to keep updated with my latest articles and projects, follow me on Medium.

Subscribe to my Medium Account

Click Here to Subscribe

CONNECT WITH ME VIA:

LinkedIn
