๐ŸชŸ Fit Strategies

Three ways to truncate a chat history into a token budget. agentfit ships all three. Pick the one that matches your use case.

In every example, the chat has 8 messages: 1 system + 7 conversation turns. Imagine the budget only allows 4. Which 4 do you keep?

1. drop-oldest

Drop from the front. Keep the system message + most recent messages. Simple, predictable.

sys
u1
a1
u2
a2
u3
a3
u4

Best when: you only care about the recent thread (chat assistants, customer support).

2. drop-middle

Keep the system message + first message + last N. Drop the middle. Preserves both intent and recency.

sys
u1
a1
u2
a2
u3
a3
u4

Best when: the original ask matters (long-running tasks, agent loops where the task spec was set early).

3. priority

Each message can carry a priority score. Drop lowest-priority first. Most flexible, requires extra metadata.

sys
u1
p=1
a1
p=9
u2
p=2
a2
p=3
u3
p=8
a3
p=7
u4
p=4

Best when: you can tag messages (tool results vs. chit-chat), or you have a relevance score from a retrieval step.

system (always preserved) kept dropped

One-liner usage

from agentfit import fit

result = fit(
    messages,
    max_tokens=4000,
    model="claude-sonnet-4-6",
    strategy="drop-middle",     # or "drop-oldest" / "priority"
    preserve_system=True,
    preserve_last_n=3,
    on_over_budget="return-partial",
)
print(result.fit, result.tokens.before, "โ†’", result.tokens.after)

Try it live in the agent-stack-demo Space.