Subscription Tiers

The CTGT API offers two tiers designed for different needs:

Free Tier

Perfect for getting started
  • 3 AI models
  • 20 req/min, 100 req/hour
  • 500 req/day
  • 100K tokens/day
  • Pay-as-you-go
  • No credit card required

Paid Tier

For production applications
  • All 10 AI models
  • 100 req/min, 1,000 req/hour
  • 10,000 req/day
  • 10M tokens/day
  • Pay-as-you-go
  • Priority support

Rate Limits Comparison

Limit                 Free Tier   Paid Tier    Increase
Requests per minute   20          100          5x
Requests per hour     100         1,000        10x
Requests per day      500         10,000       20x
Tokens per day        100,000     10,000,000   100x
Available models      3           10           3.3x
Rate limits reset at the start of each time period (minute, hour, day).

Rate Limit Headers

Every API response includes rate limit information in the headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1702847523
Headers explained:
  • X-RateLimit-Limit: Maximum requests allowed in the current window
  • X-RateLimit-Remaining: Requests remaining in current window
  • X-RateLimit-Reset: Unix timestamp when the limit resets

Example: Checking Rate Limits

import requests

response = requests.post(
    "https://api.ctgt.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"model": "gemini-2.5-flash", "messages": [...]}
)

print(f"Limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Resets at: {response.headers.get('X-RateLimit-Reset')}")

Handling Rate Limits

When you exceed your rate limits, the API responds with:

Status Code: 429 Too Many Requests

Response:
{
  "detail": "Rate limit exceeded. Please try again later."
}

Best Practices

Retry with exponential backoff when a request returns 429 Too Many Requests:

import time
import requests

def make_request_with_retry(url, headers, data, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)

        if response.status_code == 429:
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
            continue

        return response

    raise Exception("Max retries exceeded")
Monitor the rate limit headers and pause before the limit is exhausted:

def monitor_rate_limits(response):
    remaining = int(response.headers.get('X-RateLimit-Remaining', 0))

    if remaining < 10:
        print(f"Warning: Only {remaining} requests remaining!")

    if remaining == 0:
        reset_time = int(response.headers.get('X-RateLimit-Reset', 0))
        wait_seconds = max(0, reset_time - time.time())  # never sleep a negative duration
        print(f"Rate limit exhausted. Waiting {wait_seconds:.0f}s")
        time.sleep(wait_seconds)
Cache responses so repeated prompts don't consume requests:

from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_api_call(prompt):
    # Strings are hashable, so the prompt itself serves as the cache key
    response = requests.post(...)  # make the actual API call using `prompt`
    return response.json()

def get_completion(prompt):
    # Repeated prompts are answered from the cache instead of the API
    return cached_api_call(prompt)
Batch multiple items into fewer requests:

def chunks(seq, size):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# Instead of 100 individual requests:
for item in items:
    result = api_call(item)

# Batch into fewer requests with multiple items:
for batch in chunks(items, 10):
    results = api_call_batch(batch)

Pro tip: If you’re consistently hitting limits, upgrade to the paid tier for 5-100x higher rate limits.

Optimize Costs

Choose the Right Model

Use cheaper models for simple tasks:
  • Gemini Flash Lite: $0.30 input per 1M tokens
  • GPT-5 Nano: $0.25 input per 1M tokens

Control Token Limits

Set max_tokens to limit response length:
{"max_tokens": 500}
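As a sketch, the cap is just one field in the request body; the model, message, and endpoint below follow the earlier examples on this page:

```python
# Illustrative request body with a max_tokens cap; send it with requests.post
# to the /v1/chat/completions endpoint as shown in the earlier example.
payload = {
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Summarize rate limiting briefly."}],
    "max_tokens": 500,  # at most 500 output tokens are generated
}
```

Shorter caps bound the worst-case output cost of every request, since you pay per output token.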

Optimize Prompts

Shorter prompts = lower costs:
  • Be concise
  • Remove unnecessary context
  • Keep messages focused

Cache Common Responses

Store and reuse responses for:
  • FAQ answers
  • Common queries
  • Static content
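A minimal sketch of such a store is a dict with a time-to-live, so stale answers eventually refresh; `fetch` and the class name here are illustrative, not part of the API:

```python
import time

class ResponseCache:
    """Illustrative TTL cache for FAQ answers and other static responses."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # prompt -> (timestamp, response)

    def get(self, prompt, fetch):
        entry = self.store.get(prompt)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]  # fresh cached response, no API call
        response = fetch(prompt)  # fetch stands in for the real API call
        self.store[prompt] = (time.time(), response)
        return response
```

Every cache hit saves both a request (rate limit) and the tokens that request would have consumed (cost).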

Pricing Summary

Pay-as-you-go Pricing

Both tiers pay for token usage at the same rates:
Model Category     Input (per 1M)    Output (per 1M)
Most Affordable    $0.25 - $0.50     $0.60 - $2.70
Mid-Range          $1.20 - $4.00     $5.20 - $14.00
Premium            $5.00 - $10.00    $17.00 - $30.00
See the Models & Pricing page for complete pricing details.

Example Cost Scenarios

Scenario 1: Small Project (Free Tier)

Usage:
  • 500 requests/day
  • Average 100 input + 300 output tokens per request
  • Using Gemini 2.5 Flash
Monthly Cost:
  • Input: 500 × 30 × 100 tokens = 1.5M tokens = $0.75
  • Output: 500 × 30 × 300 tokens = 4.5M tokens = $12.15
  • Total: $12.90/month
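The arithmetic above can be reproduced directly; the per-1M rates are inferred from the figures in this scenario ($0.50 input, $2.70 output for Gemini 2.5 Flash) rather than quoted from a rate card:

```python
# Scenario 1 cost check; per-1M rates inferred from the totals above.
requests_per_day = 500
days = 30
input_tokens = requests_per_day * days * 100    # 1,500,000
output_tokens = requests_per_day * days * 300   # 4,500,000

input_cost = input_tokens / 1_000_000 * 0.50    # $0.75
output_cost = output_tokens / 1_000_000 * 2.70  # $12.15
total = input_cost + output_cost                # $12.90
print(f"${total:.2f}/month")
```

Swapping in your own request volume, token averages, and model rates gives a quick monthly estimate before you commit to a tier.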

Scenario 2: Medium Project (Paid Tier)

Usage:
  • 5,000 requests/day
  • Average 200 input + 500 output tokens per request
  • Mix of Gemini Flash and GPT-5
Monthly Cost:
  • Usage: ~$150-200
  • Total: $150-200/month

Scenario 3: Large Project (Paid Tier)

Usage:
  • 50,000 requests/day
  • Using advanced models (Claude Sonnet, GPT-5.2)
  • Complex queries with higher token counts
Monthly Cost:
  • Usage: ~$1,500-2,500
  • Total: $1,500-2,500/month
All scenarios assume normal usage patterns. Your costs may vary based on actual token consumption.

Next Steps