Text to Speech API
Convert any text into natural-sounding speech using our advanced neural TTS models.
Endpoint
POST https://api.speechgen.com/v1/text-to-speech
Request
Headers
| Header | Value | Required |
|---|---|---|
| Authorization | Bearer YOUR_API_KEY | Yes |
| Content-Type | application/json | Yes |
Body Parameters
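| Prop | Type | Description |
|---|---|---|
| text | string | Plain text or SSML to synthesize |
| voice | string | Voice ID, for example en-US-Neural2-A |
| format | string | Output audio format: mp3, wav, or ogg |
| speed | number | Speaking rate (1.0 in the example request) |
| pitch | number | Pitch adjustment (0 in the example request) |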
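A minimal sketch of assembling this request with only Python's standard library, useful for checking headers and body before anything is sent. `build_tts_request` and `ALLOWED_FORMATS` are hypothetical names, the format list is taken from the content types under the Success (200) response, and the `speed`/`pitch` defaults simply mirror the example request rather than any documented API default.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.speechgen.com/v1/text-to-speech"
# Formats inferred from the content types listed under the Success (200) response.
ALLOWED_FORMATS = {"mp3", "wav", "ogg"}


def build_tts_request(
    text: str,
    voice: str,
    fmt: str = "mp3",
    speed: float = 1.0,
    pitch: float = 0,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Assemble, but do not send, a POST request to the text-to-speech endpoint.

    The speed/pitch defaults mirror the example request, not documented API defaults.
    """
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(ALLOWED_FORMATS)}")
    # Fall back to the SPEECHGEN_API_KEY environment variable used in the code examples.
    key = api_key if api_key is not None else os.environ["SPEECHGEN_API_KEY"]
    body = {"text": text, "voice": voice, "format": fmt, "speed": speed, "pitch": pitch}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of passing the request to `urllib.request.urlopen` and reading the binary body.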
Example Request
curl -X POST https://api.speechgen.com/v1/text-to-speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "text": "Welcome to Speechgen. This is a demonstration of our text-to-speech capabilities.",
    "voice": "en-US-Neural2-A",
    "format": "mp3",
    "speed": 1.0,
    "pitch": 0
  }'
Response
Success (200)
Returns binary audio data whose Content-Type matches the requested format:
- audio/mpeg for MP3
- audio/wav for WAV
- audio/ogg for OGG
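When saving the audio, the file extension can be derived from the Content-Type, and the metadata headers listed next can be read alongside it. A sketch under stated assumptions: `describe_audio_response` and `AudioInfo` are hypothetical names, and treating X-Audio-Duration as seconds is an assumption this page does not state.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Content types documented for successful responses.
EXTENSION_BY_CONTENT_TYPE = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/ogg": ".ogg",
}


@dataclass
class AudioInfo:
    extension: str
    duration: Optional[float]  # X-Audio-Duration, assumed to be in seconds
    characters_used: Optional[int]  # X-Characters-Used


def describe_audio_response(headers: Mapping[str, str]) -> AudioInfo:
    """Summarize a successful response from its headers (names matched case-insensitively)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop any parameters such as "; charset=..." before the lookup.
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in EXTENSION_BY_CONTENT_TYPE:
        raise ValueError(f"unexpected content type: {content_type!r}")
    duration = lowered.get("x-audio-duration")
    chars = lowered.get("x-characters-used")
    return AudioInfo(
        extension=EXTENSION_BY_CONTENT_TYPE[content_type],
        duration=float(duration) if duration is not None else None,
        characters_used=int(chars) if chars is not None else None,
    )
```

Missing metadata headers come back as `None` rather than raising, so the helper also works if a response omits them.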
Example response headers:
Content-Type: audio/mpeg
Content-Length: 45678
X-Audio-Duration: 4.5
X-Characters-Used: 87
Error Response
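An error response carries a JSON body whose `error` object holds a machine-readable `code`, a human-readable `message`, and the offending `param`, as in the example that follows. A minimal sketch of turning that body into a Python exception; `SpeechgenAPIError` and `raise_for_api_error` are hypothetical names, and treating every non-2xx status as an error is an assumption, since this page does not list the error status codes.

```python
import json
from typing import Optional


class SpeechgenAPIError(Exception):
    """Hypothetical exception carrying the fields of the API's error object."""

    def __init__(self, status: int, code: str, message: str, param: Optional[str] = None):
        super().__init__(f"[{status}] {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.param = param


def raise_for_api_error(status: int, body: bytes) -> None:
    """Raise SpeechgenAPIError for non-2xx responses; do nothing on success."""
    if 200 <= status < 300:
        return
    try:
        error = json.loads(body)["error"]
        code, message = error["code"], error["message"]
    except (ValueError, KeyError, TypeError):
        # Body is not the documented JSON shape; keep the raw text for debugging.
        raise SpeechgenAPIError(status, "unknown_error", body.decode("utf-8", "replace")) from None
    raise SpeechgenAPIError(status, code, message, error.get("param"))
```

Keeping `code` separate from `message` lets callers branch on values such as `invalid_voice` without parsing English text.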
{
  "error": {
    "code": "invalid_voice",
    "message": "The specified voice 'invalid-voice' does not exist.",
    "param": "voice"
  }
}
SSML Support
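Our API supports Speech Synthesis Markup Language (SSML) for fine-grained control over pauses, speaking rate, pitch, and emphasis. Pass the SSML document, wrapped in a `<speak>` root element, as the text value: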
{
  "text": "<speak><prosody rate=\"slow\">Hello</prosody>, <break time=\"500ms\"/> welcome to <emphasis level=\"strong\">Speechgen</emphasis>!</speak>",
  "voice": "en-US-Neural2-A",
  "format": "mp3"
}
Supported SSML Tags
| Tag | Description | Example |
|---|---|---|
| `<speak>` | Root element that wraps the whole SSML document | `<speak>Hello</speak>` |
| `<break>` | Insert a pause | `<break time="500ms"/>` |
| `<prosody>` | Control speaking rate, pitch, and volume | `<prosody rate="slow">...</prosody>` |
| `<emphasis>` | Add emphasis | `<emphasis level="strong">...</emphasis>` |
| `<say-as>` | Specify how text is interpreted (for example, as a date) | `<say-as interpret-as="date">2024-01-15</say-as>` |
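Because SSML is XML, characters such as `&` and `<` in the spoken text must be escaped, and it can be worth checking markup against this table before sending it. A minimal sketch using Python's standard library; `ssml_with_pauses` and `unsupported_tags` are hypothetical helpers, not part of the API.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Tag names from the "Supported SSML Tags" table.
SUPPORTED_TAGS = {"speak", "break", "prosody", "emphasis", "say-as"}


def ssml_with_pauses(paragraphs: list[str], pause_ms: int = 500) -> str:
    """Join plain-text paragraphs into one <speak> document with a <break> between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    # escape() turns &, < and > into entities so user text cannot break the markup.
    return "<speak>" + pause.join(escape(p) for p in paragraphs) + "</speak>"


def unsupported_tags(ssml: str) -> set[str]:
    """Return tag names used in `ssml` that are missing from SUPPORTED_TAGS.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
    """
    root = ET.fromstring(ssml)
    # Drop any "{namespace}" prefix so <speak xmlns="..."> is still recognised.
    names = {element.tag.rsplit("}", 1)[-1] for element in root.iter()}
    return names - SUPPORTED_TAGS
```

Parsing with `ET.fromstring` also catches unbalanced tags locally, before the request spends any characters.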
SSML markup counts toward your character limit: the tags are counted along with the spoken text.
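Since the tags are counted, the markup overhead can be measured before sending a request. A small sketch comparing the counted length of an SSML string with the length of its spoken text; `ssml_character_counts` is a hypothetical helper, and it assumes the API counts characters the way Python's `len()` does (Unicode code points), which this page does not specify.

```python
import xml.etree.ElementTree as ET


def ssml_character_counts(ssml: str) -> tuple[int, int]:
    """Return (counted, spoken) character counts for an SSML string.

    Assumption: the counted total is len() of the submitted string, tags included;
    the spoken count covers only the text content between the tags.
    """
    spoken_text = "".join(ET.fromstring(ssml).itertext())
    return len(ssml), len(spoken_text)


# The SSML example from this page
example = ('<speak><prosody rate="slow">Hello</prosody>, <break time="500ms"/> welcome to '
           '<emphasis level="strong">Speechgen</emphasis>!</speak>')
counted, spoken = ssml_character_counts(example)
print(f"{counted} characters counted for {spoken} spoken characters")
```

For short phrases like this one the markup can easily outweigh the spoken text, which is worth keeping in mind when adding tags.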
Code Examples
JavaScript/TypeScript
async function textToSpeech(text: string, voice: string): Promise<Blob> {
  const response = await fetch("https://api.speechgen.com/v1/text-to-speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.SPEECHGEN_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text,
      voice,
      format: "mp3",
    }),
  });

  if (!response.ok) {
    // Error responses carry a JSON body: { error: { code, message, param } }
    const error = await response.json();
    throw new Error(error.error.message);
  }

  // Successful responses are binary audio (audio/mpeg for MP3)
  return response.blob();
}

// Usage (top-level await requires an ES module)
const audio = await textToSpeech("Hello world!", "en-US-Neural2-A");
Python
import os

import requests


def text_to_speech(text: str, voice: str) -> bytes:
    response = requests.post(
        'https://api.speechgen.com/v1/text-to-speech',
        headers={
            'Authorization': f'Bearer {os.environ["SPEECHGEN_API_KEY"]}',
            'Content-Type': 'application/json',
        },
        json={
            'text': text,
            'voice': voice,
            'format': 'mp3',
        },
        timeout=30,  # client-side timeout in seconds; adjust for long texts
    )
    # Raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.content  # raw MP3 bytes


# Usage
audio = text_to_speech('Hello world!', 'en-US-Neural2-A')
with open('output.mp3', 'wb') as f:
    f.write(audio)
Best Practices
- Batch requests: Combine multiple short texts into single requests when possible
- Cache audio: Store generated audio to avoid regenerating the same content
- Use streaming: For long texts, consider using our streaming endpoint
- Choose appropriate format: Use MP3 for web, WAV for professional editing
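The caching advice can be as simple as keying stored files on a hash of every parameter that changes the output. A minimal sketch, assuming a local directory is an acceptable cache; `cache_key` and `cached_tts` are hypothetical helpers, and the `synthesize` argument stands in for a function such as `text_to_speech` from the Python example.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice: str, fmt: str = "mp3", speed: float = 1.0, pitch: float = 0) -> str:
    """Hash every request parameter that affects the audio, so different settings never collide."""
    params = {"text": text, "voice": voice, "format": fmt, "speed": speed, "pitch": pitch}
    canonical = json.dumps(params, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_tts(
    synthesize: Callable[[str, str], bytes],
    text: str,
    voice: str,
    cache_dir: Path,
) -> bytes:
    """Return MP3 audio for (text, voice), calling `synthesize` only on a cache miss.

    `synthesize` is expected to behave like text_to_speech() from the Python example,
    which always requests MP3 and does not set speed or pitch.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(text, voice)}.mp3"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text, voice)
    # Write to a temporary file first so an interrupted write never leaves a truncated entry.
    tmp_path = path.with_suffix(".tmp")
    tmp_path.write_bytes(audio)
    tmp_path.replace(path)
    return audio
```

For example, `cached_tts(text_to_speech, "Hello world!", "en-US-Neural2-A", Path("tts-cache"))` makes a network call only the first time it sees that text and voice.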