Ways to reduce token use
Long prompts are not always better. You can often reduce tokens by removing greetings, repeated instructions, vague filler, and background details that do not affect the answer.
Keep the task, context, constraints, and output format. Cut phrases such as “please,” “can you,” “really,” and repeated explanations once the request is already clear.
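The filler-cutting step above can be partly automated. The following is a minimal sketch, not a definitive tool: the function name `strip_filler` and the `FILLER_PATTERNS` list are hypothetical, and the list only contains the phrases named above.

```python
import re

# Hypothetical filler list built from the phrases named above; extend as needed.
FILLER_PATTERNS = [r"\bplease\b", r"\bcan you\b", r"\breally\b"]


def strip_filler(prompt: str) -> str:
    """Remove filler phrases, then tidy the whitespace they leave behind."""
    for pattern in FILLER_PATTERNS:
        prompt = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    # Collapse runs of spaces or tabs left by the removals.
    prompt = re.sub(r"[ \t]{2,}", " ", prompt)
    # Drop a space stranded before punctuation, e.g. "text , more".
    prompt = re.sub(r" ([,.?!])", r"\1", prompt)
    return prompt.strip()


print(strip_filler("Can you please summarize this report?"))
```

Blind removal can leave awkward grammar or lowercase sentence starts, so read the result before using it.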
After reducing the prompt, count tokens again and compare the shorter version with the original, checking both the token count and whether the model's answer is still as good. The best prompt is usually the shortest one that still gives the model enough context.
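The before-and-after comparison can be sketched as below. This is an assumption-heavy illustration: `approx_token_count` is a hypothetical stand-in that counts words and punctuation marks, so its numbers will differ from a real model tokenizer. Swap in the tokenizer for your model when exact counts matter.

```python
import re


def approx_token_count(text: str) -> int:
    """Rough stand-in for a tokenizer: counts words and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def token_savings(original: str, shortened: str) -> dict:
    """Compare approximate token counts of two versions of a prompt."""
    before = approx_token_count(original)
    after = approx_token_count(shortened)
    saved = before - after
    # Guard against an empty original to avoid dividing by zero.
    percent = round(100 * saved / before, 1) if before else 0.0
    return {"before": before, "after": after, "saved": saved, "percent": percent}


original = "Hello! Can you please summarize this report for me?"
shortened = "Summarize this report."
print(token_savings(original, shortened))
```

The count only shows how much was saved. Whether the shorter prompt still works has to be judged from the model's answers.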
Quick checklist
- Remove duplicate lines and repeated instructions.
- Replace long phrases with direct verbs, for example "summarize" instead of "provide a summary of."
- Keep examples only when they change the expected answer.
- Use bullets instead of long paragraphs for constraints.
- Count tokens before and after reducing the prompt.
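
The first checklist item, removing duplicate lines, is easy to script. Here is a minimal sketch, assuming that lines differing only in case or spacing count as duplicates; `dedupe_lines` is a hypothetical helper name, and it will not catch instructions that are repeated in different words.

```python
def dedupe_lines(prompt: str) -> str:
    """Drop repeated lines, keeping the first occurrence and the original order."""
    seen = set()
    kept = []
    for line in prompt.splitlines():
        # Normalize spacing and case so near-identical lines match.
        key = " ".join(line.split()).lower()
        if key and key in seen:
            continue  # repeated instruction: skip it
        seen.add(key)
        kept.append(line)  # blank lines are always kept as separators
    return "\n".join(kept)


prompt = "Use JSON.\nBe concise.\nuse json.\n\nBe  concise."
print(dedupe_lines(prompt))
```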