Today I want to discuss a topic that has been on my mind: the reasoning abilities of Large Language Models (LLMs). You might already know that LLMs with 100B or more parameters perform impressively well on tasks like sentiment analysis and machine translation. However, they still struggle with certain multi-step reasoning tasks, such as solving math word problems or commonsense reasoning.
But what if we could improve this? One method for doing so is called ‘Chain of Thought Prompting’. It enables a model to break a complex problem down into intermediate steps, essentially mimicking an intuitive thought process as it works through the problem.
Here is an example:
📌 Standard Prompting
Input Prompt
Q: Charlie has 5 chocolates. He buys 2 more packs of chocolates. Each pack has 3 chocolates. How many chocolates does he have now?
A: The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
Model Output
A: The answer is 27.
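To make the format concrete, here is a minimal Python sketch that assembles the standard few-shot prompt above as a plain string. The helper name `build_standard_prompt` and the `(question, answer)` tuple layout are illustrative assumptions, not part of any particular LLM API; the resulting text would simply be sent to whichever model you are using.

```python
# Minimal sketch: build a standard few-shot prompt from (question, answer) pairs.
# build_standard_prompt is a hypothetical helper, not part of any library.


def build_standard_prompt(exemplars: list[tuple[str, str]], question: str) -> str:
    """Exemplars show only the final answer, with no intermediate reasoning."""
    lines = []
    for q, answer in exemplars:
        lines.append(f"Q: {q}")
        lines.append(f"A: The answer is {answer}.")
    # The test question comes last; the model is expected to continue with "A: ...".
    lines.append(f"Q: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    exemplars = [(
        "Charlie has 5 chocolates. He buys 2 more packs of chocolates. "
        "Each pack has 3 chocolates. How many chocolates does he have now?",
        "11",
    )]
    question = ("The cafeteria had 23 apples. If they used 20 to make lunch "
                "and bought 6 more, how many apples do they have?")
    print(build_standard_prompt(exemplars, question))
```

Note that the exemplar answer jumps straight to "The answer is 11." with nothing in between, which is exactly what the chain of thought version changes.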
This answer is wrong: it should be 9. Let’s see what happens when the example in the prompt breaks the problem down into multi-step reasoning.
📌 Chain of Thought Prompting
Input Prompt
Q: Charlie has 5 chocolates. He buys 2 more packs of chocolates. Each pack has 3 chocolates. How many chocolates does he have now?
A: Charlie started with 5 chocolates. 2 packs containing 3 chocolates each is 6 chocolates. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
Model Output
A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.
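Because both exemplar answers end with the fixed phrase "The answer is N.", the final answer can be pulled out of the model’s reasoning with a simple pattern match. The sketch below assumes the model follows that convention and that the answer is an integer; `extract_answer` is a hypothetical helper, and a model that phrases its conclusion differently would need a more forgiving parser.

```python
import re
from typing import Optional

# Matches the closing phrase the exemplars teach the model to use, e.g. "The answer is 9."
# Assumption: the answer is an integer (possibly negative); other formats are not handled.
ANSWER_PATTERN = re.compile(r"The answer is\s+(-?\d+)")


def extract_answer(model_output: str) -> Optional[int]:
    """Return the integer after the last 'The answer is', or None if absent."""
    matches = ANSWER_PATTERN.findall(model_output)
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    cot_output = ("A: The cafeteria had 23 apples originally. They used 20 to make lunch. "
                  "So they had 23 - 20 = 3. They bought 6 more apples, so they have "
                  "3 + 6 = 9. The answer is 9.")
    standard_output = "A: The answer is 27."
    print(extract_answer(cot_output))       # 9
    print(extract_answer(standard_output))  # 27
    # Check against the value computed directly from the question: 23 - 20 + 6
    print(extract_answer(cot_output) == 23 - 20 + 6)  # True
```

Taking the last match, rather than the first, is a small safeguard in case the model restates an intermediate "answer" before its final one.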
In standard prompting, the model is given examples of input-output pairs and asked to predict the answer for a new test-time example. Chain of thought prompting instead asks the model to produce intermediate reasoning steps before giving the final answer.
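The only thing that changes between the two styles is what the exemplar answers contain. The Python sketch below uses a hypothetical `Exemplar` dataclass and `build_prompt` helper (neither comes from the Google Research post or any library) to render the same exemplar both ways: with `chain_of_thought=False` it produces the standard prompt, and with `chain_of_thought=True` it places the reasoning steps before the final answer.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    rationale: str  # intermediate reasoning steps, written in natural language
    answer: str


def build_prompt(exemplars: list[Exemplar], question: str, chain_of_thought: bool) -> str:
    """Render few-shot exemplars plus a test question in the Q:/A: format."""
    lines = []
    for ex in exemplars:
        lines.append(f"Q: {ex.question}")
        if chain_of_thought:
            # Chain of thought: show the reasoning first, then the final answer.
            lines.append(f"A: {ex.rationale} The answer is {ex.answer}.")
        else:
            # Standard prompting: show only the final answer.
            lines.append(f"A: The answer is {ex.answer}.")
    lines.append(f"Q: {question}")
    return "\n".join(lines)


charlie = Exemplar(
    question=("Charlie has 5 chocolates. He buys 2 more packs of chocolates. "
              "Each pack has 3 chocolates. How many chocolates does he have now?"),
    rationale=("Charlie started with 5 chocolates. 2 packs containing 3 chocolates "
               "each is 6 chocolates. 5 + 6 = 11."),
    answer="11",
)
cafeteria = ("The cafeteria had 23 apples. If they used 20 to make lunch "
             "and bought 6 more, how many apples do they have?")

if __name__ == "__main__":
    print(build_prompt([charlie], cafeteria, chain_of_thought=True))
```

Keeping the rationale as a separate field makes it easy to compare both prompting styles on exactly the same exemplars and test questions.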
The beauty of this approach is that, because it is language-based, it can be applied to any task that can be solved via language. Experiments reported by Google Research show that chain of thought prompting improves performance on reasoning tasks such as math word problems and commonsense reasoning.
In conclusion, successful chain of thought reasoning seems to depend largely on model scale: the benefits appear only once a model has a sufficient number of parameters (around 100B).
Looking forward to hearing your thoughts on this!
References:
- Google Research: https://blog.research.google/2022/05/language-models-perform-reasoning-via.html
- Featured image was generated with the help of DALL·E