Best practices to handle prompts that are too long for LLM APIs (e.g., Anthropic, OpenAI)?

brando90 - Aug 2 - Dev Community

I am working with the Anthropic API to process text prompts, but I keep encountering the following error when my prompt exceeds the maximum token limit:

Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 200936 tokens > 199999 maximum'}}

I need to ensure my prompts are within the 199,999 token limit before sending them to the API. Here's what I have so far:

  • I generate a long prompt with approximately 150K words.
  • I use the count_tokens method to check the token count.
  • If the token count exceeds the limit, I trim the prompt and retry. Here's the code I'm using:
from anthropic_bedrock import AnthropicBedrock
import anthropic
import random
import string

# Function to generate a random word
def generate_random_word(length):
    return ''.join(random.choices(string.ascii_lowercase, k=length))

# Generate ~150K words
words = [generate_random_word(random.randint(3, 10)) for _ in range(150000)]
print(f'Number of words: {len(words)}')

test_prompt = ' '.join(words)

# Function to count the number of tokens
def count_number_tokens(prompt: str, verbose: bool = False) -> tuple[int, int]:
    bedrock_client = AnthropicBedrock()
    anthropic_client = anthropic.Client()

    try:
        token_count_bedrock = bedrock_client.count_tokens(prompt)
    except Exception as e:
        token_count_bedrock = -1
        if verbose:
            print(f"Error counting tokens with Bedrock: {e}")

    try:
        token_count_anthropic = anthropic_client.count_tokens(prompt)
    except Exception as e:
        token_count_anthropic = -1
        if verbose:
            print(f"Error counting tokens with Anthropic: {e}")

    if verbose:
        print(f'token_count_bedrock={token_count_bedrock}, token_count_anthropic={token_count_anthropic}')
    return token_count_bedrock, token_count_anthropic

# Maximum token limit
max_tokens = 199_999

# Function to trim the prompt until it fits under the token limit
def trim_prompt(prompt: str, max_tokens: int) -> str:
    while True:
        _, token_count = count_number_tokens(prompt)
        if token_count <= max_tokens:
            break
        # Snapshot the length each iteration so the guard below can detect
        # when trimming stops making progress
        previous_length = len(prompt)
        # Reduce the size of the prompt by a fixed chunk of characters
        prompt = prompt[:len(prompt) - 1000]
        if len(prompt) == previous_length:
            # Avoid an infinite loop in case the prompt length stops shrinking
            prompt = prompt[:len(prompt) // 2]
    return prompt

# Trim the prompt to fit within the token limit
trimmed_prompt = trim_prompt(test_prompt, max_tokens)

# Final check
final_token_count_bedrock, final_token_count_anthropic = count_number_tokens(trimmed_prompt, verbose=True)
print(f'Final prompt length: {len(trimmed_prompt)} characters')
print(f'Final token count (Bedrock): {final_token_count_bedrock}')
print(f'Final token count (Anthropic): {final_token_count_anthropic}')

Questions:

  1. Is there a more efficient way to handle prompts that are too long for the Anthropic API?
  2. Are there any best practices or recommended approaches for trimming prompts to fit within token limits?
  3. How can I ensure my approach does not inadvertently lead to an infinite loop or excessive API calls?

Any guidance or suggestions would be greatly appreciated!

Note: as far as I know, it's basically impossible to deduce the exact character index at which to truncate the string for a given token count, since these providers don't return token-to-character offsets.
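
One workaround I'm considering (just a sketch on my end, not something from the Anthropic docs) is to binary-search over the character length instead of chopping a fixed chunk per call: the token count is roughly monotonic in the prefix length, so the longest prefix that fits can be found with O(log n) counting calls. It reuses the count_number_tokens helper above; binary_search_trim is a hypothetical name.

# Sketch: binary search over the character length, assuming only a
# count_tokens-style call is available (no token-to-character offsets).
def binary_search_trim(prompt: str, max_tokens: int) -> str:
    lo, hi = 0, len(prompt)  # invariant: prompt[:lo] fits; prompt[:hi] may not
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so every iteration makes progress
        _, token_count = count_number_tokens(prompt[:mid])
        if 0 <= token_count <= max_tokens:
            lo = mid  # a prefix of length mid fits; try a longer one
        else:
            hi = mid - 1  # too long (or counting failed); try a shorter one
    return prompt[:lo]

# Hypothetical usage:
# trimmed_prompt = binary_search_trim(test_prompt, max_tokens)

For a ~200K-token prompt this should be on the order of 20 counting calls rather than hundreds, and it can't loop forever because the search interval shrinks on every iteration.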
