## Overview

The CTGT API offers advanced features to customize and optimize your AI interactions:

- **Streaming Responses**: real-time, token-by-token output
- **Token Limits**: control response length and costs
## Streaming Responses

Get responses in real time as they’re generated, similar to ChatGPT’s typing effect.

### Enable Streaming

Set `"stream": true` in your request:
```bash
curl -X POST https://api.ctgt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ctgt-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Write a short story"}
    ],
    "stream": true
  }'
```
The response arrives as Server-Sent Events (SSE):

```text
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1702847123,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1702847123,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1702847123,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}
...
data: [DONE]
```
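To consume the stream, read the body line by line, strip the `data: ` prefix, and stop at the `[DONE]` sentinel. A minimal sketch using `requests` (the `stream_chat` helper name is ours; the fuller example under Combining Features below adds conversation history):

```python
import json
import requests

def stream_chat(api_key: str, prompt: str) -> str:
    """Minimal SSE consumer: prints tokens as they arrive."""
    response = requests.post(
        "https://api.ctgt.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "gemini-2.5-flash",
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        },
        stream=True,  # let requests yield the body incrementally
        timeout=30,
    )
    text = ""
    for line in response.iter_lines():
        line = line.decode("utf-8")
        if not line.startswith("data: "):
            continue
        payload = line[6:]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        token = chunk["choices"][0]["delta"].get("content", "")
        print(token, end="", flush=True)
        text += token
    return text
```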
### When to Use Streaming

- **Interactive applications**: chat interfaces, chatbots, conversational UIs
- **Long-form content**: articles, reports, stories, documentation
- **Better UX**: show progress and reduce perceived latency
- **Real-time feedback**: users see responses immediately

Streaming is ideal for user-facing applications where showing incremental progress improves the experience.
## Token Limits

Control response length and manage costs with `max_tokens`.

### Setting Token Limits
```bash
curl -X POST https://api.ctgt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ctgt-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Summarize quantum physics"}
    ],
    "max_tokens": 100
  }'
```
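If the model hits the cap mid-answer, OpenAI-compatible APIs mark the choice with `"finish_reason": "length"`; checking for it lets you detect truncated output. A short sketch, assuming the CTGT API follows that convention:

```python
import requests

response = requests.post(
    "https://api.ctgt.ai/v1/chat/completions",
    headers={"Authorization": "Bearer sk-ctgt-YOUR_API_KEY"},
    json={
        "model": "gemini-2.5-flash",
        "messages": [{"role": "user", "content": "Summarize quantum physics"}],
        "max_tokens": 100,
    },
    timeout=30,
)
choice = response.json()["choices"][0]

# "length" means the response was cut off at max_tokens; consider
# raising the limit or asking for a shorter answer.
if choice["finish_reason"] == "length":
    print("Warning: response was truncated at max_tokens")
print(choice["message"]["content"])
```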
### Token Guidelines

| Tokens | Approximate Length | Best For |
|---|---|---|
| 50-100 | 1-2 short paragraphs | Quick answers, summaries |
| 200-500 | 1-2 medium paragraphs | Explanations, descriptions |
| 500-1000 | 1-2 pages | Detailed responses, articles |
| 1000-2000 | 2-4 pages | Long-form content, essays |
| 2000+ | Multiple pages | Reports, documentation |
1 token ≈ 4 characters or ¾ of a word in English.
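That rule of thumb is easy to encode for a quick budget check before sending a request. A rough sketch (real tokenizer counts vary by model, language, and content, so treat this as a heuristic, not an exact count):

```python
# Rough token estimate from the ~4 characters/token rule of thumb above.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

print(estimate_tokens("Summarize quantum physics"))  # ~6 tokens
```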
### Cost Impact

```python
# Example: cost of a 1000-token response
# Gemini 2.5 Flash:  $0.0027
# Claude Sonnet 4.5: $0.017
# Claude Opus 4.5:   $0.030

# Setting max_tokens=200 reduces output costs by 80%
```
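To compare models or budget ahead of time, you can fold those figures into a small helper. A sketch using the example prices above (the model keys here are illustrative, and per-token prices change, so verify against current pricing):

```python
# USD per 1,000 output tokens, taken from the example above.
PRICE_PER_1K_OUTPUT = {
    "gemini-2.5-flash": 0.0027,
    "claude-sonnet-4.5": 0.017,
    "claude-opus-4.5": 0.030,
}

def output_cost(model: str, output_tokens: int) -> float:
    """Estimate the output-token cost of a single response."""
    return PRICE_PER_1K_OUTPUT[model] * output_tokens / 1000

# Capping max_tokens at 200 instead of 1000 cuts output cost by 80%:
print(output_cost("gemini-2.5-flash", 1000))  # 0.0027
print(output_cost("gemini-2.5-flash", 200))   # 0.00054
```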
## Combining Features

### Example: Production Chat Application
```python
import requests
import json


def chat_completion(
    api_key: str,
    user_message: str,
    conversation_history: list = None,
    model: str = "gemini-2.5-flash",
    streaming: bool = True,
    temperature: float = 0.7,
    max_tokens: int = 500
):
    """
    Complete chat request with all advanced features.
    """
    url = "https://api.ctgt.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    # Build messages with history
    messages = []
    if conversation_history:
        messages.extend(conversation_history)
    messages.append({
        "role": "user",
        "content": user_message
    })

    # Request payload
    data = {
        "model": model,
        "messages": messages,
        "stream": streaming,
        "temperature": temperature,
        "max_tokens": max_tokens
    }

    response = requests.post(url, headers=headers, json=data, stream=streaming)

    if streaming:
        # Handle streaming response: accumulate tokens as they arrive
        full_response = ""
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data_str = line[6:]
                    if data_str == '[DONE]':
                        break
                    try:
                        chunk = json.loads(data_str)
                        content = chunk['choices'][0]['delta'].get('content', '')
                        full_response += content
                        print(content, end='', flush=True)
                    except json.JSONDecodeError:
                        pass
        print()
        return full_response
    else:
        # Handle non-streaming response
        result = response.json()
        return result['choices'][0]['message']['content']


# Usage
conversation = []

response1 = chat_completion(
    api_key="sk-ctgt-YOUR_API_KEY",
    user_message="What is machine learning?",
    conversation_history=conversation,
    streaming=True
)
conversation.append({"role": "user", "content": "What is machine learning?"})
conversation.append({"role": "assistant", "content": response1})

response2 = chat_completion(
    api_key="sk-ctgt-YOUR_API_KEY",
    user_message="Give me a code example",
    conversation_history=conversation,
    streaming=True
)
```
## Best Practices

**Set Token Limits**

- Prevent excessive costs
- Control response length
- Match your UI constraints

**Enable Streaming**

- Better user experience
- Show progress in real time
- Ideal for chat interfaces

**Monitor Usage**

- Track token consumption
- Set budget alerts
- Optimize costs regularly

**Handle Errors**

- Implement retry logic
- Use exponential backoff
- Log all failures
## Error Handling

```python
import time
import requests


def safe_api_call(api_key, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://api.ctgt.ai/v1/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "model": "gemini-2.5-flash",
                    "messages": messages,
                    "max_tokens": 500
                },
                timeout=30
            )

            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                # Rate limited: back off exponentially, then retry
                time.sleep(2 ** attempt)
                continue
            else:
                print(f"Error: {response.status_code}")
                return None

        except requests.exceptions.Timeout:
            print(f"Timeout on attempt {attempt + 1}")
            if attempt < max_retries - 1:
                continue
        except Exception as e:
            print(f"Error: {e}")
            return None

    return None
```
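The `2 ** attempt` backoff doubles the wait after each rate-limit response (1s, 2s, 4s); adding random jitter on top is a common refinement so that concurrent clients don't retry in lockstep. Example usage:

```python
result = safe_api_call(
    api_key="sk-ctgt-YOUR_API_KEY",
    messages=[{"role": "user", "content": "Hello!"}]
)
if result:
    print(result["choices"][0]["message"]["content"])
else:
    print("Request failed after retries")
```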
## Next Steps