
Overview

The CTGT API offers advanced features to customize and optimize your AI interactions:

  • Streaming Responses: real-time, token-by-token output
  • Temperature Control: adjust creativity and randomness
  • System Prompts: define AI personality and behavior
  • Token Limits: control response length and costs

Streaming Responses

Get responses in real-time as they’re generated, similar to ChatGPT’s typing effect.

Enable Streaming

Set stream: true in your request:
curl -X POST https://api.ctgt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ctgt-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Write a short story"}
    ],
    "stream": true
  }'

Streaming Response Format

Server-Sent Events (SSE):
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1702847123,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1702847123,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1702847123,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}

...

data: [DONE]
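The chunks above can be assembled client-side by concatenating each delta's content until the [DONE] sentinel. A minimal sketch of such a parser, run offline against the sample chunks shown (the function name is our own; field names follow the chunk format above):

```python
import json

def parse_sse_stream(lines):
    """Assemble the full message text from SSE 'data:' lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel marking end of stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}',
    "data: [DONE]",
]
print(parse_sse_stream(sample))  # Once upon a
```

In a live integration the lines would come from an HTTP response read incrementally (e.g. `requests` with `stream=True`), as shown in the combined example later on this page.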

When to Use Streaming

  • Interactive Applications: chat interfaces, chatbots, conversational UIs
  • Long-Form Content: articles, reports, stories, documentation
  • Better UX: show progress, reduce perceived latency
  • Real-Time Feedback: users see responses immediately

Streaming is ideal for user-facing applications where showing incremental progress improves the experience.

Temperature Control

Adjust the randomness and creativity of AI responses.

Temperature Scale

0.0 ←─────────── 1.0 ────────────→ 2.0
Deterministic    Balanced     Creative
   Focused                      Random

Setting Temperature

curl -X POST https://api.ctgt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ctgt-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "What is 2+2?"}
    ],
    "temperature": 0.0
  }'

Temperature Use Cases

Temperature   Best For                 Example Use Cases
0.0 - 0.3     Factual, deterministic   Math, code, data extraction, classification
0.4 - 0.7     Balanced responses       General Q&A, summarization, translation
0.8 - 1.2     Creative variation       Content writing, brainstorming
1.3 - 2.0     Maximum creativity       Poetry, fiction, artistic content

Higher temperatures increase randomness and may reduce accuracy. Use lower temperatures for factual tasks.
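Since out-of-range temperatures are rejected by the API, it can help to validate before sending. A sketch of a request builder enforcing the 0.0-2.0 scale shown above (the helper name is our own):

```python
def build_chat_payload(model, messages, temperature=0.7):
    """Build a chat-completions payload, rejecting out-of-range temperatures."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be in [0.0, 2.0], got {temperature}")
    return {"model": model, "messages": messages, "temperature": temperature}

payload = build_chat_payload(
    "gemini-2.5-flash",
    [{"role": "user", "content": "What is 2+2?"}],
    temperature=0.0,  # deterministic: good for math and factual tasks
)
```

The resulting dict can be passed directly as the JSON body of the request shown above.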

System Prompts

Define the AI’s personality, role, and behavior using system messages.

Basic System Prompt

import requests

api_key = "sk-ctgt-YOUR_API_KEY"
url = "https://api.ctgt.ai/v1/chat/completions"

data = {
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
        {
            "role": "system",
            "content": "You are an expert financial analyst specializing in tech stocks. Provide detailed, data-driven analysis."
        },
        {
            "role": "user",
            "content": "Should I invest in AI companies?"
        }
    ]
}

response = requests.post(url, headers={"Authorization": f"Bearer {api_key}"}, json=data)
print(response.json()['choices'][0]['message']['content'])

System Prompt Examples

Technical expert:
{
  "role": "system",
  "content": "You are a senior software engineer with 15 years of experience. Provide detailed technical explanations with code examples. Focus on best practices, performance, and maintainability."
}

Creative writer:
{
  "role": "system",
  "content": "You are a bestselling fiction author. Write engaging, descriptive prose with vivid imagery. Use literary techniques like metaphors, foreshadowing, and character development."
}

Code reviewer:
{
  "role": "system",
  "content": "You are a meticulous code reviewer. Analyze code for bugs, security issues, performance problems, and style violations. Provide constructive feedback with specific suggestions."
}

Customer support:
{
  "role": "system",
  "content": "You are a friendly and helpful customer support representative. Be empathetic, patient, and solution-oriented. Always maintain a positive tone and provide clear, actionable steps."
}

Data analyst:
{
  "role": "system",
  "content": "You are a data analyst expert in statistics and visualization. Provide insights with numbers, trends, and data-driven recommendations. Explain complex concepts clearly."
}

Multi-Turn Conversations

Maintain context across multiple messages:
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate fibonacci"},
    {"role": "assistant", "content": "def fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)"},
    {"role": "user", "content": "Now optimize it with memoization"}
]

response = requests.post(url, headers=headers, json={
    "model": "gpt-5",
    "messages": messages
})
System prompts are processed first and set the behavior for the entire conversation. They’re ideal for defining roles, constraints, and output formats.
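The pattern above, appending each user turn and the assistant's reply before the next request, can be wrapped in a small helper. A sketch (the class name is our own):

```python
class Conversation:
    """Accumulates messages so each request carries the full history."""

    def __init__(self, system_prompt):
        # System prompt goes first; it sets behavior for every turn
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        self.messages.append({"role": "assistant", "content": content})

convo = Conversation("You are a helpful coding assistant.")
convo.add_user("Write a Python function to calculate fibonacci")
convo.add_assistant("def fibonacci(n): ...")
convo.add_user("Now optimize it with memoization")
# convo.messages is now ready to send as the "messages" field
```

Remember to call add_assistant with each response before the next request, or the model will lose context.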

Token Limits

Control response length and manage costs with max_tokens.

Setting Token Limits

curl -X POST https://api.ctgt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ctgt-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Summarize quantum physics"}
    ],
    "max_tokens": 100
  }'

Token Guidelines

Tokens        Approximate Length       Best For
50-100        1-2 short paragraphs     Quick answers, summaries
200-500       1-2 medium paragraphs    Explanations, descriptions
500-1000      1-2 pages                Detailed responses, articles
1000-2000     2-4 pages                Long-form content, essays
2000+         Multiple pages           Reports, documentation

1 token ≈ 4 characters or ¾ of a word in English.
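The 1 token ≈ 4 characters rule gives a quick pre-flight estimate (a rough approximation only; actual tokenization varies by model and language):

```python
def estimate_tokens(text):
    """Rough estimate: 1 token is about 4 characters of English text."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Summarize quantum physics"))  # 6
```

Useful for sanity-checking that a prompt plus max_tokens fits within a model's context window before sending.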

Cost Impact

# Example: 1000-token response
# Gemini 2.5 Flash: $0.0027
# Claude Sonnet 4.5: $0.017
# Claude Opus 4.5: $0.030

# Capping max_tokens at 200 cuts the output cost of a 1000-token response by 80%
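The per-1000-token figures above can be turned into a small cost estimator (the prices are taken from the example comments and treated as illustrative; the model keys and function name are our own):

```python
# Per-1000-output-token prices from the example above (illustrative)
PRICE_PER_1K = {
    "gemini-2.5-flash": 0.0027,
    "claude-sonnet-4-5": 0.017,
    "claude-opus-4-5": 0.030,
}

def estimate_cost(model, output_tokens):
    """Estimate output cost in dollars for a given token count."""
    return PRICE_PER_1K[model] * output_tokens / 1000

print(f"{estimate_cost('gemini-2.5-flash', 1000):.4f}")  # 0.0027
```

Check current pricing before relying on these numbers in production; rates change.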

Advanced Parameters

Top P (Nucleus Sampling)

Alternative to temperature for controlling randomness:
{
  "model": "gemini-2.5-flash",
  "messages": [...],
  "top_p": 0.9
}
  • top_p: 1.0 - Consider all tokens (default)
  • top_p: 0.9 - Consider top 90% probability mass
  • top_p: 0.1 - Only most likely tokens

Presence Penalty

Encourage topic diversity:
{
  "model": "gpt-5",
  "messages": [...],
  "presence_penalty": 0.6
}
  • Range: -2.0 to 2.0
  • Positive values encourage new topics
  • Negative values encourage repetition

Frequency Penalty

Reduce repetition:
{
  "model": "gpt-5",
  "messages": [...],
  "frequency_penalty": 0.5
}
  • Range: -2.0 to 2.0
  • Positive values reduce word repetition
  • Higher values = more diverse vocabulary
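Since top_p and both penalties have fixed valid ranges, a payload helper can range-check them together before sending (a sketch; the function name is our own):

```python
def add_sampling_params(payload, top_p=1.0,
                        presence_penalty=0.0, frequency_penalty=0.0):
    """Attach sampling parameters after range-checking each one."""
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    for name, value in (("presence_penalty", presence_penalty),
                        ("frequency_penalty", frequency_penalty)):
        if not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be in [-2.0, 2.0]")
    payload.update(top_p=top_p, presence_penalty=presence_penalty,
                   frequency_penalty=frequency_penalty)
    return payload

p = add_sampling_params({"model": "gpt-5", "messages": []},
                        top_p=0.9, presence_penalty=0.6, frequency_penalty=0.5)
```

Adjust either temperature or top_p, not both at once, so you can attribute changes in output behavior to a single knob.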

Combining Features

Example: Production Chat Application

import requests
import json

def chat_completion(
    api_key: str,
    user_message: str,
    conversation_history: list = None,
    model: str = "gemini-2.5-flash",
    streaming: bool = True,
    temperature: float = 0.7,
    max_tokens: int = 500
):
    """
    Complete chat request with all advanced features
    """
    url = "https://api.ctgt.ai/v1/chat/completions"
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    # Build messages with history
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant. Be concise and accurate."
        }
    ]
    
    if conversation_history:
        messages.extend(conversation_history)
    
    messages.append({
        "role": "user",
        "content": user_message
    })
    
    # Request payload
    data = {
        "model": model,
        "messages": messages,
        "stream": streaming,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "presence_penalty": 0.1,
        "frequency_penalty": 0.1
    }
    
    response = requests.post(url, headers=headers, json=data, stream=streaming)
    
    if streaming:
        # Handle streaming response
        full_response = ""
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data_str = line[6:]
                    if data_str == '[DONE]':
                        break
                    try:
                        chunk = json.loads(data_str)
                        content = chunk['choices'][0]['delta'].get('content', '')
                        full_response += content
                        print(content, end='', flush=True)
                    except json.JSONDecodeError:
                        pass
        print()
        return full_response
    else:
        # Handle non-streaming response
        result = response.json()
        return result['choices'][0]['message']['content']

# Usage
conversation = []

response1 = chat_completion(
    api_key="sk-ctgt-YOUR_API_KEY",
    user_message="What is machine learning?",
    conversation_history=conversation,
    streaming=True,
    temperature=0.7
)

conversation.append({"role": "user", "content": "What is machine learning?"})
conversation.append({"role": "assistant", "content": response1})

response2 = chat_completion(
    api_key="sk-ctgt-YOUR_API_KEY",
    user_message="Give me a code example",
    conversation_history=conversation,
    streaming=True,
    temperature=0.3  # Lower for code
)

Best Practices

Optimize Temperature

  • Use 0.0-0.3 for factual tasks
  • Use 0.7-1.0 for general content
  • Use 1.5+ for creative writing

Set Token Limits

  • Prevent excessive costs
  • Control response length
  • Match your UI constraints

Use System Prompts

  • Define clear roles
  • Set output formats
  • Establish constraints

Enable Streaming

  • Better user experience
  • Show progress in real-time
  • Ideal for chat interfaces
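The temperature guidance above can be encoded as a simple lookup so callers pick by task type instead of remembering numbers (the category names, default values, and function name are our own):

```python
# Midpoint values drawn from the guidance above (illustrative)
RECOMMENDED_TEMPERATURE = {
    "factual": 0.2,    # 0.0-0.3: math, code, extraction
    "general": 0.7,    # 0.7-1.0: general content
    "creative": 1.5,   # 1.5+: creative writing
}

def pick_temperature(task_type):
    """Map a task category to a recommended temperature (default: balanced)."""
    return RECOMMENDED_TEMPERATURE.get(task_type, 0.7)

print(pick_temperature("factual"))  # 0.2
```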

Error Handling

import time
import requests

def safe_api_call(api_key, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://api.ctgt.ai/v1/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "model": "gemini-2.5-flash",
                    "messages": messages,
                    "max_tokens": 500,
                    "temperature": 0.7
                },
                timeout=30
            )
            
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                # Rate limit - wait and retry
                time.sleep(2 ** attempt)
                continue
            else:
                print(f"Error: {response.status_code}")
                return None
                
        except requests.exceptions.Timeout:
            print(f"Timeout on attempt {attempt + 1}")
            if attempt < max_retries - 1:
                continue
        except Exception as e:
            print(f"Error: {e}")
            return None
    
    return None

Next Steps