Voice Cloning
Create custom voice models from audio samples
Voice Cloning API
Create custom voice models that sound like any voice using our advanced voice cloning technology.
Endpoint
POST https://api.speechgen.com/v1/voice-clone
Content-Type: multipart/form-data
Request
Headers
| Header | Value | Required |
|---|---|---|
| Authorization | Bearer YOUR_API_KEY | Yes |
| Content-Type | multipart/form-data | Yes |
Form Data Parameters
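The example request below sends these fields:
| Prop | Type |
|---|---|
| audio | file |
| name | string |
| description | string |
| enhance_quality | boolean |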
Supported Audio Formats
- MP3 (recommended)
- WAV
- M4A
- FLAC
- OGG
Audio Requirements:
- Duration: 10 seconds to 5 minutes
- Single speaker only
- Clear audio with minimal background noise
- File size: Maximum 10 MB
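The file-level requirements above (format and size) can be checked locally before uploading, which avoids a wasted request. The following is a minimal sketch, not part of the API: the function name `validate_audio_sample` is hypothetical, the format check looks only at the file extension, and "10 MB" is assumed to mean 10 × 1024 × 1024 bytes. Duration, speaker count, and noise level cannot be verified without decoding the audio, so they are not checked here.

```python
from pathlib import Path

# Limits copied from the lists above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 MiB


def validate_audio_sample(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    file = Path(path)
    if not file.is_file():
        return [f"{path} does not exist or is not a file"]

    problems = []
    # Extension check only; the file contents are not inspected.
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file.suffix or '(none)'}")

    size = file.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is {size} bytes; the limit is {MAX_FILE_SIZE_BYTES}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with a sample in a single pass.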
Example Request
curl -X POST https://api.speechgen.com/v1/voice-clone \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@voice_sample.mp3" \
  -F "name=My Custom Voice" \
  -F "description=Professional narrator voice" \
  -F "enhance_quality=true"
Response
Success (201)
{
  "model_id": "vm_abc123xyz",
  "name": "My Custom Voice",
  "description": "Professional narrator voice",
  "status": "processing",
  "created_at": "2024-01-15T10:30:00Z",
  "estimated_completion": "2024-01-15T10:31:00Z",
  "samples": [
    {
      "title": "Default Sample",
      "text": "This is a demonstration of the voice clone...",
      "audio_url": "https://cdn.speechgen.com/samples/abc123.mp3"
    }
  ]
}
Status Values
| Status | Description |
|---|---|
| processing | Voice model is being created |
| ready | Voice model is ready to use |
| failed | Voice creation failed |
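Because a model starts in processing and ends as either ready or failed, client code usually polls until one of the two terminal values appears. The sketch below bounds that wait with a deadline. It makes several assumptions: `get_status` is a hypothetical callable wrapping the status request (see Check Model Status), the 300-second timeout and 5-second interval are illustrative defaults rather than documented values, and any status outside the table is treated as an error.

```python
import time
from typing import Callable

# Terminal values taken from the status table above.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_until_terminal(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,  # assumed limit; the API does not document one
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns "ready" or "failed", or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if status != "processing":
            raise ValueError(f"unexpected status: {status!r}")
        # Give up if the next wait would carry us past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"model still processing after {timeout_s} seconds")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the function testable without real delays; in normal use the default `time.sleep` applies.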
Check Model Status
GET https://api.speechgen.com/v1/models/{model_id}
Response:
{
  "model_id": "vm_abc123xyz",
  "name": "My Custom Voice",
  "status": "ready",
  "voice_id": "custom-abc123",
  "created_at": "2024-01-15T10:30:00Z",
  "usage_count": 0
}
Using Your Cloned Voice
Once the model status is ready, use the voice_id in text-to-speech requests:
curl -X POST https://api.speechgen.com/v1/text-to-speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is my cloned voice speaking!",
    "voice": "custom-abc123",
    "format": "mp3"
  }'
Code Examples
JavaScript
async function cloneVoice(audioFile, name, description) {
  // Build the multipart body; fetch sets the Content-Type boundary itself.
  const formData = new FormData();
  formData.append("audio", audioFile);
  formData.append("name", name);
  formData.append("description", description);
  formData.append("enhance_quality", "true");

  const response = await fetch("https://api.speechgen.com/v1/voice-clone", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.SPEECHGEN_API_KEY}`,
    },
    body: formData,
  });

  if (!response.ok) {
    // Assumes error responses carry a JSON body shaped like { error: { message } }.
    const error = await response.json();
    throw new Error(error.error.message);
  }
  return response.json();
}

// Poll for completion
async function waitForModel(modelId) {
  while (true) {
    const response = await fetch(
      `https://api.speechgen.com/v1/models/${modelId}`,
      {
        headers: {
          Authorization: `Bearer ${process.env.SPEECHGEN_API_KEY}`,
        },
      }
    );
    // Stop on HTTP errors instead of polling forever on a missing status field.
    if (!response.ok) {
      throw new Error(`Status check failed with HTTP ${response.status}`);
    }

    const model = await response.json();
    if (model.status === "ready") return model;
    if (model.status === "failed") throw new Error("Voice cloning failed");
    await new Promise((r) => setTimeout(r, 5000)); // Wait 5 seconds
  }
}
Python
import os
import time

import requests


def clone_voice(audio_path: str, name: str, description: str = "") -> dict:
    # requests builds the multipart body and its Content-Type header from files/data.
    with open(audio_path, 'rb') as audio_file:
        response = requests.post(
            'https://api.speechgen.com/v1/voice-clone',
            headers={
                'Authorization': f'Bearer {os.environ["SPEECHGEN_API_KEY"]}',
            },
            files={
                'audio': audio_file,
            },
            data={
                'name': name,
                'description': description,
                'enhance_quality': 'true',
            }
        )
    response.raise_for_status()
    return response.json()


def wait_for_model(model_id: str) -> dict:
    # Poll every 5 seconds until the model is ready or has failed.
    while True:
        response = requests.get(
            f'https://api.speechgen.com/v1/models/{model_id}',
            headers={
                'Authorization': f'Bearer {os.environ["SPEECHGEN_API_KEY"]}',
            }
        )
        # Stop on HTTP errors instead of failing later on a missing 'status' key.
        response.raise_for_status()
        model = response.json()
        if model['status'] == 'ready':
            return model
        if model['status'] == 'failed':
            raise Exception('Voice cloning failed')
        time.sleep(5)
Tips for Best Results
Recording Tips:
1. Record in a quiet environment with minimal echo
2. Speak naturally at a consistent pace
3. Keep a consistent distance from the microphone
4. Avoid background music or noise
5. Record at least 30 seconds for best quality
Delete a Voice Model
DELETE https://api.speechgen.com/v1/models/{model_id}
Response: 204 No Content
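The delete call can be sketched in Python as follows. This is an illustration under stated assumptions, not a documented client: the helper names `build_delete_request` and `delete_model` are hypothetical, the endpoint is assumed to take the same Bearer token as the other endpoints, and the standard-library `urllib` is used in place of `requests` only to keep the sketch free of dependencies.

```python
import os
import urllib.parse
import urllib.request

API_BASE = "https://api.speechgen.com/v1"


def build_delete_request(model_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one voice model."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    safe_id = urllib.parse.quote(model_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/models/{safe_id}",
        # Assumption: DELETE uses the same Authorization header as other calls.
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )


def delete_model(model_id: str) -> None:
    """Send the request; raise unless the API answers 204 No Content."""
    request = build_delete_request(model_id, os.environ["SPEECHGEN_API_KEY"])
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        if response.status != 204:
            raise RuntimeError(f"expected 204, got {response.status}")
```

Separating request construction from sending makes the URL and headers easy to inspect before anything is deleted, for example `build_delete_request("vm_abc123xyz", "test-key").full_url`.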