๐ช Fit Strategies
Three ways to truncate a chat history into a token budget. agentfit ships all three. Pick the one that matches your use case.
In every example, the chat has 8 messages: 1 system + 7 conversation turns. Imagine the budget only allows 4. Which 4 do you keep?
1. drop-oldest
Drop from the front. Keep the system message + most recent messages. Simple, predictable.
Best when: you only care about the recent thread (chat assistants, customer support).
2. drop-middle
Keep the system message + first message + last N. Drop the middle. Preserves both intent and recency.
Best when: the original ask matters (long-running tasks, agent loops where the task spec was set early).
3. priority
Each message can carry a priority score. Drop lowest-priority first. Most flexible, requires extra metadata.
p=1
p=9
p=2
p=3
p=8
p=7
p=4
Best when: you can tag messages (tool results vs. chit-chat), or you have a relevance score from a retrieval step.
One-liner usage
from agentfit import fit
result = fit(
messages,
max_tokens=4000,
model="claude-sonnet-4-6",
strategy="drop-middle", # or "drop-oldest" / "priority"
preserve_system=True,
preserve_last_n=3,
on_over_budget="return-partial",
)
print(result.fit, result.tokens.before, "โ", result.tokens.after)
Try it live in the agent-stack-demo Space.