A large language model (LLM) is a language model characterized by its large size. Its size is enabled by AI accelerators, which are able to process vast amounts of text data, mostly scraped from the Internet. LLMs are artificial neural networks that can contain from a billion to a trillion weights, and are (pre-)trained using self-supervised and semi-supervised learning. The transformer architecture contributed to faster training. Alternative architectures include the mixture of experts (MoE), which was proposed by Google in 2017.
As language models, they work by taking an input text and repeatedly predicting the next token or word. Up to 2020, fine-tuning was the only way a model could be adapted to accomplish specific tasks. Larger models, such as GPT-3, however, can be prompt-engineered to achieve similar results. They are thought to acquire embodied knowledge about syntax, semantics and “ontology” inherent in human language corpora, but also inaccuracies and biases present in the corpora.
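The repeated next-token prediction described above can be sketched with a toy stand-in for the model. This is a minimal illustration, not a real LLM: the `toy_model` dictionary and whitespace tokenizer are assumptions standing in for a neural network that scores every token in its vocabulary, but the generation loop has the same shape.

```python
def generate(model, prompt, max_new_tokens):
    """Repeatedly predict the next token and append it to the sequence."""
    tokens = prompt.split()  # trivial whitespace "tokenizer" (an assumption)
    for _ in range(max_new_tokens):
        context = tokens[-1]             # toy bigram: condition on the last token only
        next_token = model.get(context)  # greedy: take the single stored prediction
        if next_token is None:           # stop when the model has no continuation
            break
        tokens.append(next_token)
    return " ".join(tokens)

# Hypothetical bigram "model": the most likely next token for each token.
toy_model = {"the": "cat", "cat": "sat", "sat": "down"}
print(generate(toy_model, "the", 5))  # → "the cat sat down"
```

A real model replaces the dictionary lookup with a forward pass producing a probability distribution over the vocabulary, from which the next token is sampled or chosen greedily; everything else is the same append-and-repeat loop.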
Notable examples include OpenAI’s GPT models (e.g., GPT-3.5 and GPT-4, used in ChatGPT), Google’s PaLM (used in Bard), and Meta’s LLaMA, as well as BLOOM, Ernie 3.0 Titan, and Anthropic’s Claude 2. —Wikipedia, “Large language model”