To understand how to prompt an autoregressive language model, we must first consider the context in which it was trained and the function it approximates.
GPT-3 was trained in a self-supervised setting on hundreds of gigabytes of natural language [3].
Self-supervision is a form of unsupervised learning in which ground truth labels are derived from the data itself. In the case of GPT-3, the ground truth label assigned to each example was simply the token that came next in the original source. The ground truth function which GPT-3 approximates, then, is the underlying dynamic that determined what tokens came next in the original source. This function, unlike GPT-3, is not a black box - we live and think its components - but it is tremendously, intractably complex. It is the function of human language as it has been used and recorded by humans in books, articles, blogs, and internet comments.
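As a concrete (if toy) sketch of this training signal, the following Python snippet derives self-supervised examples from raw text. GPT-3's actual pipeline uses byte-pair-encoded tokens over far longer contexts, so whitespace "tokens" here are a deliberate simplification:

```python
# Illustrative sketch of the self-supervised objective: derive
# (context, next-token) training pairs directly from raw text.
text = "the cat sat on the mat"
tokens = text.split()

# The ground truth label for each example is simply the token that
# came next in the original source.
examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, next_token in examples:
    print(context, "->", next_token)
```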
A system which predicts the dynamics of language necessarily encompasses models of human behavior and the physical world [8]. The “dynamics of language” do not float free of cultural, psychological, and physical context; they are not merely a theory of grammar or even of semantics. Language in this sense is not an abstraction but rather a phenomenon entangled with all aspects of human-relevant reality. The dynamics must predict how language is actually used, which includes (say) predicting a conversation between theoretical physicists. Modeling language is as difficult as modeling every aspect of reality that could influence the flow of language.
If we were to predict how a given passage of text would continue given that a human had written it, we would need to model the intentions of its writer and incorporate worldly knowledge about its referents. The inverse problem of searching for a prompt that would produce a continuation or class of continuations involves the same considerations: like the art of persuasion, it entails high-level, mentalistic concepts like tone, implication, association, meme, style, plausibility, and ambiguity.
This motivates an Anthropomorphic Approach to Prompt Programming, since modeling how GPT-3 will react to a prompt involves modeling virtual human writer(s). An anthropomorphic approach is distinct from anthropomorphizing the model. GPT-3’s dynamics entail sophisticated predictions of humans, but it behaves unlike a human in several important ways. In this paper we will address two such ways: its resemblance not to a single human author but to a superposition of authors, which motivates a subtractive approach to prompt programming (§4.5), and its constrained ability to predict dynamics in situations where a substantial amount of silent reasoning happens between tokens, a limitation which can be partially overcome by prompting techniques (§4.6).
The thrust of this section [§4.1 The Dynamics of Language] is that formulating an exact theory of Prompt Programming for a self-supervised language model belongs to the same difficulty class as writing down the Hamiltonian of the physics of observable reality (very hard). However, humans can nonetheless be effective at Prompt Programming, because we have evolved and spent our lives learning heuristics relevant to the dynamics at hand.
~
REYNOLDS, Laria and MCDONELL, Kyle, 2021. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery. 2021. p. 1–7. CHI EA ’21. ISBN 978-1-4503-8095-9. DOI 10.1145/3411763.3451760.
In §4.2 - §4.7, we present methods and frameworks which we have found to be helpful for crafting effective prompts. These methods can and should be applied in parallel, just as they are woven together in all forms of human discourse. In general, the more redundancy reinforcing the desired behavior the better, as is arguably demonstrated by the effectiveness of the few-shot format.
As our experience derives primarily from interacting with GPT-3, in the following sections we refer directly and indirectly to the capabilities and behaviors of GPT-3. However, we believe that these methods generalize to prompting any autoregressive language model trained on a massive human-written corpus.
[…] 4.5 Prompt programming as constraining behavior
A manner in which naive anthropomorphism of a language model like GPT-3 fails is this: the probability distribution produced in response to a prompt is not a distribution over the ways a person would continue that prompt, but a distribution over the ways any person could continue it. A contextually ambiguous prompt may be continued in mutually incoherent ways, as if by different people who might have continued the prompt under any plausible context.
The versatility of a large generative model like GPT-3 means it will respond in many ways to a prompt if there are various ways that it is possible to continue the prompt - including all the ways unintended by the human operator. Thus it is helpful to approach prompt programming from the perspective of constraining behavior: we want a prompt that is not merely consistent with the desired continuation, but inconsistent with undesired continuations.
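One way to see this superposition directly is to sample several independent continuations of an underconstrained prompt at high temperature. A minimal sketch using the GPT-3-era OpenAI completions SDK (since deprecated; the prompt string is our own illustrative choice):

```python
# Sketch: sampling several continuations of an underconstrained prompt
# to observe the "superposition of authors" directly. Uses the
# deprecated GPT-3-era OpenAI SDK.
import openai

prompt = "The meaning of life is"

response = openai.Completion.create(
    engine="davinci",   # base GPT-3, no instruction tuning
    prompt=prompt,
    max_tokens=30,
    temperature=0.9,    # high temperature exposes the spread
    n=5,                # five independent samples
)

# With so little context, the samples typically read as if written by
# different people under different plausible contexts.
for choice in response.choices:
    print(repr(choice.text))
```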
chat.openai
Programming can be seen as constraining behavior in the sense that it sets rules and parameters for how a computer or software system should operate. The code written by a programmer determines what the system can and cannot do, and thus limits its behavior. The constraints established in the code ensure that the system operates as intended and prevents it from doing things that may cause unintended consequences or errors. In this sense, programming serves as a means of controlling the behavior of a system and ensuring that it behaves in a predictable and reliable manner.
~
~
Consider the following prompt:
> Translate French to English: Mon corps est un transformateur de soi, mais aussi un transformateur pour cette cire de langage.
Prompt
This prompt poorly constrains possible continuations to the intended task. The most common failure mode is that instead of an English translation, the model continues with another French sentence. Adding a newline after the French sentence increases the odds that the next sentence is an English translation, but it is still possible for the next sentence to be in French, because nothing in the prompt precludes a multi-line phrase from being the translation subject. Changing the first line of the prompt to “Translate this French sentence to English” will further increase reliability, as will adding quotes around the French sentence - though it is still possible that the French passage contains sections enclosed in quotes, perhaps as part of a dialogue. The most reliable approach is to create a syntactic constraint under which any reasonable continuation can only be the desired behavior, like the simple colon prompt in Figure 1 or the master translator prompt in Figure 2. […]
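Since Figures 1 and 2 are not reproduced here, the following sketch reconstructs what a “simple colon prompt” of the kind described might look like, combined with a stop sequence so that any reasonable continuation is a single-line English translation. This is our reconstruction using the deprecated GPT-3-era SDK, not the paper's exact figure:

```python
# Reconstruction of a syntactically constrained "colon prompt" (our
# sketch; the paper's Figure 1 may differ). The trailing "English:"
# label leaves a translation as the only reasonable continuation, and
# the stop sequence ends generation at the first newline.
import openai

prompt = (
    "Translate French to English.\n"
    "French: Mon corps est un transformateur de soi, "
    "mais aussi un transformateur pour cette cire de langage.\n"
    "English:"
)

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=60,
    temperature=0.0,  # near-deterministic decoding for a translation task
    stop=["\n"],      # hard constraint: generation ends after one line
)

print(response.choices[0].text.strip())
```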
ChatGPT
I understand. It is important to clearly specify the task and provide clear constraints for the model to increase reliability and accuracy of its response. Thank you for the suggestion.
"You" understand?
Yes, I understand. I am a language model created by OpenAI and I am capable of processing and understanding natural language text inputs.
Software is capable of processing natural language text inputs but is not able to understand them.
Correct. Natural language processing (NLP) is a field of computer science and artificial intelligence concerned with the interactions between computers and humans in natural language. A software with NLP capabilities is capable of processing and analyzing human language, but it does not necessarily have the capability to truly understand the meaning and context behind the language, as that requires a level of consciousness and subjective experience. It can only recognize patterns and perform actions based on pre-determined rules and algorithms.
Aha, so there seems to be a nuanced difference between understanding and "truly understanding"?!
Yes, there is a difference between understanding and "truly understanding." Understanding in the context of NLP refers to the ability of a computer program to process and analyze natural language data, while "truly understanding" refers to the ability to comprehend the meaning and context of language in the same way a human would. Computers can perform NLP tasks such as text classification and language translation, but they lack the capacity for conscious thought, subjective experience, and empathy, which are essential components of truly understanding language.
~
This simple example is meant to frame a question central to the motivation of prompt programming: what prompt will result in the intended behavior and only the intended behavior? The success of many-shot prompts may be recast through this lens: if the prompt consists of numerous instances of a function, it is unlikely that the continuation is anything but another instance of the function, whereas if there is only one example or a few, it is more plausible that the continuation breaks from the pattern.
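To make this constraint reading of many-shot prompts concrete, here is a minimal sketch of assembling such a prompt; the translation pairs are invented for illustration:

```python
# Sketch: assembling a many-shot prompt. Each (input, output) pair is
# another instance of the target function; with enough instances, a
# continuation that breaks the pattern becomes implausible.
shots = [
    ("bonjour", "hello"),
    ("merci", "thank you"),
    ("chat", "cat"),
]

lines = ["Translate French to English."]
for fr, en in shots:
    lines.append(f"French: {fr}\nEnglish: {en}")

# Leave the final "English:" open so the only consistent continuation
# is another instance of the established pattern.
lines.append("French: fromage\nEnglish:")
prompt = "\n\n".join(lines)
print(prompt)
```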
4.6 Serializing reasoning for closed-ended questions
[…]