Voice & Speech
Browse, buy, and sell voice & speech datasets in the Audio category, and connect with buyers and sellers of voice & speech data on FileYield.
Available Now · 3 listings
Spanish-English Bilingual Support Transcripts — 480K Conversations, Code-Switching Annotated
Transcribed customer support conversations in which agents and callers switch between Spanish and English. Each segment is language-tagged at the sentence level, with code-switching points annotated. Sourced from insurance and healthcare support lines. Suited to training multilingual NLP models and bilingual virtual agents.
Customer Service Call Recordings — 1.2M Hours, Sentiment-Labeled, Multi-Industry
Inbound and outbound customer service calls from telecom, utilities, and financial services contact centers. Each call is transcribed, sentiment-scored at the utterance level, and tagged with call disposition codes. Includes hold times, transfer events, and CSAT survey responses where available. Built for conversational AI training, agent coaching models, and IVR optimization.
Podcast Transcription Corpus — 890K Episodes, Speaker-Diarized, Topic-Classified
Full transcriptions of 890K English-language podcast episodes across 14 genres, including true crime, business, technology, health, comedy, and politics. Each episode is speaker-diarized, topic-modeled, and sentiment-scored at the segment level. Built for podcast search engines, content recommendation systems, and long-form audio AI.
Subtypes