Audio

Buy and sell audio data — call center recordings, voicemail, podcast raw audio, courtroom proceedings, emergency dispatch, and environmental sound. Speech AI and voice recognition companies need diverse real-world audio datasets.

103 subtypes · 9 groups

Available Now · 4 listings

Spanish-English Bilingual Support Transcripts — 480K Conversations, Code-Switching Annotated

Transcribed customer support conversations where agents and callers switch between Spanish and English. Each segment is language-tagged at the sentence level with code-switching points annotated. Sourced from insurance and healthcare support lines. Critical for training multilingual NLP models and bilingual virtual agents.

480K conversations (~62K hours) · listed

Customer Service Call Recordings — 1.2M Hours, Sentiment-Labeled, Multi-Industry

Inbound and outbound customer service calls from telecom, utilities, and financial services contact centers. Each call is transcribed, sentiment-scored at utterance level, and tagged with call disposition codes. Includes hold times, transfer events, and CSAT survey responses where available. Built for conversational AI training, agent coaching models, and IVR optimization.

1.2M hours from 340 contact centers · listed

Emergency Room Triage Call Recordings — 340K Hours, De-identified, HIPAA Compliant

Audio recordings from 14 Level I trauma center emergency departments spanning 2018-2026. Each call is transcribed, speaker-diarized, and tagged with chief complaint codes (ICD-10). Ideal for training medical triage AI, clinical NLP models, and patient routing systems.

340,000 hours (~4.2 PB raw audio) · make offer

Podcast Transcription Corpus — 890K Episodes, Speaker-Diarized, Topic-Classified

Full transcriptions of 890K English-language podcast episodes across 14 genres (true crime, business, technology, health, comedy, politics, etc.). Each episode is speaker-diarized, topic-modeled, and sentiment-scored at the segment level. Powers podcast search engines, content recommendation systems, and long-form audio AI.

890K episodes (~1.4M hours) · listed

All Subtypes

Every data type.