Overview
The CTGT API offers advanced features to customize and optimize your AI interactions:
- Streaming Responses: real-time, token-by-token output
- Temperature Control: adjust creativity and randomness
- System Prompts: define the AI's personality and behavior
- Token Limits: control response length and costs
Streaming Responses
Get responses in real-time as they’re generated, similar to ChatGPT’s typing effect.
Enable Streaming
Set stream: true in your request:
curl -X POST https://api.ctgt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ctgt-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Write a short story"}
    ],
    "stream": true
  }'
The response arrives as Server-Sent Events (SSE):
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1702847123,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1702847123,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1702847123,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}
...
data: [DONE]
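Each `data:` line carries one JSON chunk whose `delta.content` holds the next slice of text. A minimal sketch of parsing these lines in Python (the helper name `parse_sse_line` is ours, not part of the API):

```python
import json

def parse_sse_line(raw: bytes):
    """Extract the delta text from one SSE line, or None if it carries no content."""
    line = raw.decode("utf-8").strip()
    if not line.startswith("data: "):
        return None  # blank keep-alive lines, comments, etc.
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content", "")
```

In practice you would feed it each line from `response.iter_lines()` on a `requests.post(..., stream=True)` call and concatenate the non-None results, as the full example later in this page does.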
When to Use Streaming
- Interactive applications: chat interfaces, chatbots, conversational UIs
- Long-form content: articles, reports, stories, documentation
- Better UX: show progress and reduce perceived latency
- Real-time feedback: users see responses immediately
Streaming is ideal for user-facing applications where showing incremental progress improves the experience.
Temperature Control
Adjust the randomness and creativity of AI responses.
Temperature Scale
Temperature runs from 0.0 to 2.0:
- 0.0: deterministic, focused
- 1.0: balanced
- 2.0: creative, random
Setting Temperature
Low Temperature (0.0-0.3), best for factual, deterministic tasks. For example:
curl -X POST https://api.ctgt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ctgt-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "What is 2+2?"}
    ],
    "temperature": 0.0
  }'
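To compare settings, you can send the same prompt at several temperatures. A small Python sketch (the `build_request` helper is illustrative, not part of the API):

```python
def build_request(prompt: str, temperature: float) -> dict:
    """Build a chat payload with an explicit temperature."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    return {
        "model": "gemini-2.5-flash",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Same question at deterministic, balanced, and creative settings
for temp in (0.0, 0.7, 1.5):
    payload = build_request("Describe the ocean in one sentence", temp)
    # requests.post("https://api.ctgt.ai/v1/chat/completions",
    #               headers={"Authorization": "Bearer sk-ctgt-YOUR_API_KEY"},
    #               json=payload)
```

At 0.0 the three runs should return near-identical text; at 1.5 each run will phrase the answer differently.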
Temperature Use Cases
| Temperature | Best For | Example Use Cases |
|---|---|---|
| 0.0 - 0.3 | Factual, deterministic | Math, code, data extraction, classification |
| 0.4 - 0.7 | Balanced responses | General Q&A, summarization, translation |
| 0.8 - 1.2 | Creative variation | Content writing, brainstorming |
| 1.3 - 2.0 | Maximum creativity | Poetry, fiction, artistic content |
Higher temperatures increase randomness and may reduce accuracy. Use lower temperatures for factual tasks.
System Prompts
Define the AI’s personality, role, and behavior using system messages.
Basic System Prompt
import requests

api_key = "sk-ctgt-YOUR_API_KEY"
url = "https://api.ctgt.ai/v1/chat/completions"

data = {
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
        {
            "role": "system",
            "content": "You are an expert financial analyst specializing in tech stocks. Provide detailed, data-driven analysis."
        },
        {
            "role": "user",
            "content": "Should I invest in AI companies?"
        }
    ]
}

response = requests.post(url, headers={"Authorization": f"Bearer {api_key}"}, json=data)
print(response.json()['choices'][0]['message']['content'])
System Prompt Examples
{
  "role": "system",
  "content": "You are a senior software engineer with 15 years of experience. Provide detailed technical explanations with code examples. Focus on best practices, performance, and maintainability."
}
{
  "role": "system",
  "content": "You are a bestselling fiction author. Write engaging, descriptive prose with vivid imagery. Use literary techniques like metaphors, foreshadowing, and character development."
}
{
  "role": "system",
  "content": "You are a meticulous code reviewer. Analyze code for bugs, security issues, performance problems, and style violations. Provide constructive feedback with specific suggestions."
}
{
  "role": "system",
  "content": "You are a friendly and helpful customer support representative. Be empathetic, patient, and solution-oriented. Always maintain a positive tone and provide clear, actionable steps."
}
{
  "role": "system",
  "content": "You are a data analyst expert in statistics and visualization. Provide insights with numbers, trends, and data-driven recommendations. Explain complex concepts clearly."
}
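Prompts like these can be kept in a lookup table and swapped in per request. A sketch (the `PERSONAS` dict and `make_messages` helper are illustrative, not part of the API):

```python
# System prompts from the examples above, keyed for reuse.
PERSONAS = {
    "engineer": (
        "You are a senior software engineer with 15 years of experience. "
        "Provide detailed technical explanations with code examples. "
        "Focus on best practices, performance, and maintainability."
    ),
    "support": (
        "You are a friendly and helpful customer support representative. "
        "Be empathetic, patient, and solution-oriented. Always maintain a "
        "positive tone and provide clear, actionable steps."
    ),
}

def make_messages(persona: str, user_message: str) -> list:
    """Prepend the chosen persona's system prompt to a user message."""
    return [
        {"role": "system", "content": PERSONAS[persona]},
        {"role": "user", "content": user_message},
    ]
```

The resulting list drops straight into the `messages` field of a request payload.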
Multi-Turn Conversations
Maintain context across multiple messages:
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate fibonacci"},
    {"role": "assistant", "content": "def fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)"},
    {"role": "user", "content": "Now optimize it with memoization"}
]

response = requests.post(url, headers=headers, json={
    "model": "gpt-5",
    "messages": messages
})
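Because the assistant's earlier reply is in the history, the model knows exactly which function to optimize. One answer it might plausibly return is the standard `lru_cache` version:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n: int) -> int:
    """Memoized Fibonacci: each value is computed once and cached."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```

Caching turns the naive exponential-time recursion into a linear-time computation.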
System prompts are processed first and set the behavior for the entire conversation. They’re ideal for defining roles, constraints, and output formats.
Token Limits
Control response length and manage costs with max_tokens.
Setting Token Limits
Short Response (100 tokens), for quick answers and summaries. For example:
curl -X POST https://api.ctgt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ctgt-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "Summarize quantum physics"}
    ],
    "max_tokens": 100
  }'
Token Guidelines
| Tokens | Approximate Length | Best For |
|---|---|---|
| 50-100 | 1-2 short paragraphs | Quick answers, summaries |
| 200-500 | 1-2 medium paragraphs | Explanations, descriptions |
| 500-1000 | 1-2 pages | Detailed responses, articles |
| 1000-2000 | 2-4 pages | Long-form content, essays |
| 2000+ | Multiple pages | Reports, documentation |
1 token ≈ 4 characters or ¾ of a word in English.
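That rule of thumb is enough for a rough budget check before sending a request; for exact counts you need the model's own tokenizer. A sketch (the `estimate_tokens` helper is ours, and the 4-characters-per-token ratio is an approximation that varies by language and content):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token rule of thumb."""
    return max(1, round(len(text) / 4))
```

For example, a 400-character prompt is estimated at about 100 tokens.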
Cost Impact
# Example: 1000-token response
# Gemini 2.5 Flash: $0.0027
# Claude Sonnet 4.5: $0.017
# Claude Opus 4.5: $0.030
# Setting max_tokens=200 cuts output-token costs by 80% versus a 1000-token response
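These per-response figures can be turned into a quick estimator. A sketch using the prices listed above (the model keys in `PRICE_PER_1K` are illustrative shorthand, and prices may change):

```python
# Output prices in dollars per 1,000 tokens, taken from the example above.
PRICE_PER_1K = {
    "gemini-2.5-flash": 0.0027,
    "claude-sonnet-4-5": 0.017,
    "claude-opus-4-5": 0.030,
}

def output_cost(model: str, tokens: int) -> float:
    """Estimated output cost in dollars for a response of `tokens` tokens."""
    return PRICE_PER_1K[model] * tokens / 1000
```

Capping a Gemini 2.5 Flash response at 200 tokens instead of 1000 drops the output cost from $0.0027 to $0.00054.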
Advanced Parameters
Top P (Nucleus Sampling)
Alternative to temperature for controlling randomness:
{
  "model": "gemini-2.5-flash",
  "messages": [...],
  "top_p": 0.9
}
- top_p: 1.0 - consider all tokens (default)
- top_p: 0.9 - consider only the top 90% of probability mass
- top_p: 0.1 - consider only the most likely tokens
Presence Penalty
Encourage topic diversity:
{
  "model": "gpt-5",
  "messages": [...],
  "presence_penalty": 0.6
}
- Range: -2.0 to 2.0
- Positive values encourage new topics
- Negative values encourage repetition
Frequency Penalty
Reduce repetition:
{
  "model": "gpt-5",
  "messages": [...],
  "frequency_penalty": 0.5
}
- Range: -2.0 to 2.0
- Positive values reduce word repetition
- Higher values = more diverse vocabulary
Combining Features
Example: Production Chat Application
import requests
import json

def chat_completion(
    api_key: str,
    user_message: str,
    conversation_history: list = None,
    model: str = "gemini-2.5-flash",
    streaming: bool = True,
    temperature: float = 0.7,
    max_tokens: int = 500
):
    """
    Complete chat request with all advanced features
    """
    url = "https://api.ctgt.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    # Build messages with history
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant. Be concise and accurate."
        }
    ]
    if conversation_history:
        messages.extend(conversation_history)
    messages.append({
        "role": "user",
        "content": user_message
    })

    # Request payload
    data = {
        "model": model,
        "messages": messages,
        "stream": streaming,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "presence_penalty": 0.1,
        "frequency_penalty": 0.1
    }

    response = requests.post(url, headers=headers, json=data, stream=streaming)

    if streaming:
        # Handle streaming response
        full_response = ""
        for line in response.iter_lines():
            if line:
                line = line.decode('utf-8')
                if line.startswith('data: '):
                    data_str = line[6:]
                    if data_str == '[DONE]':
                        break
                    try:
                        chunk = json.loads(data_str)
                        content = chunk['choices'][0]['delta'].get('content', '')
                        full_response += content
                        print(content, end='', flush=True)
                    except json.JSONDecodeError:
                        pass
        print()
        return full_response
    else:
        # Handle non-streaming response
        result = response.json()
        return result['choices'][0]['message']['content']
# Usage
conversation = []

response1 = chat_completion(
    api_key="sk-ctgt-YOUR_API_KEY",
    user_message="What is machine learning?",
    conversation_history=conversation,
    streaming=True,
    temperature=0.7
)

conversation.append({"role": "user", "content": "What is machine learning?"})
conversation.append({"role": "assistant", "content": response1})

response2 = chat_completion(
    api_key="sk-ctgt-YOUR_API_KEY",
    user_message="Give me a code example",
    conversation_history=conversation,
    streaming=True,
    temperature=0.3  # Lower for code
)
Best Practices
Optimize Temperature
- Use 0.0-0.3 for factual tasks
- Use 0.7-1.0 for general content
- Use 1.5+ for creative writing

Set Token Limits
- Prevent excessive costs
- Control response length
- Match your UI constraints

Use System Prompts
- Define clear roles
- Set output formats
- Establish constraints

Enable Streaming
- Better user experience
- Show progress in real-time
- Ideal for chat interfaces
Error Handling
import time
import requests

def safe_api_call(api_key, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://api.ctgt.ai/v1/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "model": "gemini-2.5-flash",
                    "messages": messages,
                    "max_tokens": 500,
                    "temperature": 0.7
                },
                timeout=30
            )
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                # Rate limit - wait with exponential backoff, then retry
                time.sleep(2 ** attempt)
                continue
            else:
                print(f"Error: {response.status_code}")
                return None
        except requests.exceptions.Timeout:
            print(f"Timeout on attempt {attempt + 1}")
            if attempt < max_retries - 1:
                continue
        except Exception as e:
            print(f"Error: {e}")
            return None
    return None
Next Steps