Large Language Models
It's not possible to summarise 2023 without saying something about LLMs. At the heart of every LLM is the transformer architecture, introduced in 2017. One of the first widely used transformer-based language models was BERT (Bidirectional Encoder Representations from Transformers), an encoder model for generating embeddings. The power of these models lies in an architecture that lets them process and learn from huge datasets and text sources. Trained on massive, diverse corpora of text, they learn patterns, grammar and contextual relationships, and they have sparked a real revolution in generating human-like text. The most common uses are text generation, summarisation, translation and question answering, all of which can boost many businesses.
When reviewing the topic of LLMs, we also need to consider related aspects such as prompting and adaptation. To communicate with the model, users provide a prompt, which instructs the model what to do. As more and more people started using ChatGPT, it turned out that some ways of writing prompts work better than others, so users began structuring their commands, a practice now well known as prompt engineering. There are many courses and tips on how to get the best possible answers from ChatGPT by crafting the best possible prompts. In some applications a single plain prompt is not enough, especially when an explanation is needed or when solving mathematical or logical tasks. It turns out that a different strategy, Chain of Thought prompting, in which the model is asked to reason through intermediate steps, can be one solution for such tasks.
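As a small illustration of the idea, a chain-of-thought prompt can be as simple as appending a reasoning cue to an ordinary prompt. The helper name and prompt wording below are my own sketch, not any particular library's API:

```python
def build_prompt(question: str, chain_of_thought: bool = False) -> str:
    """Build a prompt for an LLM; optionally add a chain-of-thought cue."""
    if chain_of_thought:
        # The extra instruction nudges the model to emit intermediate
        # reasoning steps, which tends to help on maths and logic tasks.
        return f"Question: {question}\nLet's think step by step.\nAnswer:"
    return f"Question: {question}\nAnswer:"

plain = build_prompt("If a train travels 60 km in 45 minutes, what is its speed in km/h?")
cot = build_prompt("If a train travels 60 km in 45 minutes, what is its speed in km/h?",
                   chain_of_thought=True)
```

The resulting string would then be sent to the model via whatever chat or completion API you use; only the prompt text changes between the two variants.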
As LLMs were trained on 'standard' forms of language, certain specialised contexts, such as Law or Medicine, require adding more specialist knowledge. With small models it's possible to carry out fine-tuning, but LLMs are huge or only accessible via APIs, which makes it impossible to create a new version of a model on, for example, a laptop or local machine. One alternative to fine-tuning is Retrieval-Augmented Generation (RAG), where relevant documents are retrieved and injected into the prompt, so the prompt itself supplies the new knowledge to the model. Note that OpenAI does offer fine-tuning on its own instances of the model, but you need to prepare a good-quality dataset for training, and training on a big dataset is costly.
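A minimal sketch of the RAG idea, using a toy word-overlap retriever as a stand-in for a real embedding index or vector database (all function names and documents here are hypothetical):

```python
import re

def _words(text: str) -> set[str]:
    """Lowercased word set; a crude proxy for semantic similarity."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and return the top k."""
    query_words = _words(query)
    scored = sorted(documents,
                    key=lambda d: len(query_words & _words(d)),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved passages so the model answers from them."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Article 5 of the contract covers early termination fees.",
    "The cafeteria menu changes every Monday.",
    "Termination requires 30 days of written notice.",
]
prompt = build_rag_prompt("What notice is required for termination?", docs)
```

In a real system the retriever would use embeddings and a vector store, but the structure is the same: retrieve, stuff the context into the prompt, then call the LLM.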
Another topic around LLMs worth mentioning is LoRA (Low-Rank Adaptation of Large Language Models). It is a training technique that significantly reduces the number of trainable parameters: it inserts a small number of new weights into the model, and only these weights are trained. This makes training with LoRA much faster and more memory-efficient, and it produces much smaller adapter weights (a few hundred MB), which are easier to store and share.
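A toy numpy sketch of the low-rank update (shapes and names are illustrative, not any library's API): instead of training a full d_out × d_in weight matrix W, LoRA freezes W and learns two small matrices, B (d_out × r) and A (r × d_in), so the layer computes W·x + B·A·x:

```python
import numpy as np

d_in, d_out, r = 512, 512, 8          # r << d_in is the low rank

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # trainable; zero init means
                                            # no change at the start of training

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Frozen base output plus the low-rank adaptation B @ (A @ x)."""
    return W @ x + B @ (A @ x)

full_params = W.size            # 262,144 parameters in the full matrix
lora_params = A.size + B.size   # 8,192 trainable parameters, ~3% of the full matrix
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen model, and only A and B are updated during training, which is where the memory and storage savings come from.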