Data Catalog

Documents

Buy and sell document data — legal filings, contracts, patents, medical charts, inspection reports, and corporate filings. NLP companies need millions of real documents to train extraction, classification, and summarization models.

100 subtypes · 11 groups

Available Now · 6 listings

Enterprise Codebase Migration Artifacts — 2,400 Java-to-Kotlin Conversions with Test Suites

Paired Java and Kotlin source files from 2,400 real enterprise migration projects, each with corresponding unit test suites and migration notes. Includes build configs, dependency changes, and API compatibility annotations. Powers code translation AI, automated refactoring tools, and migration planning assistants.

2,400 projects, 18M lines of code · listed

Federal Court Docket Filings — 3.2M Cases, PACER-Sourced, Structured + Full Text

Complete federal court docket entries from all 94 district courts and 13 circuit courts of appeals. Includes case metadata (parties, judges, case type, disposition), full-text filings, and motion outcomes. Built for litigation analytics, judicial prediction models, and legal research AI.

3.2M cases, 28M individual docket entries · listed

Clinical Radiology Reports — 8.4M Structured Reports with Matched DICOM Studies

Radiology dictation reports from a 12-hospital network paired with their source imaging studies (CT, MRI, X-ray). Reports are NLP-parsed into structured findings, impressions, and follow-up recommendations. Powers radiology AI copilots and automated report generation.

8.4M reports + matched imaging · contact

Commercial Real Estate Lease Agreements — 47K Contracts, 2015-2026, OCR-Processed, Entity-Tagged

Full-text commercial lease agreements from office, retail, and industrial properties across 38 US states. Each contract is OCR-processed, clause-segmented, and entity-tagged (landlord, tenant, guarantor, square footage, escalation terms, CAM provisions). Powers legal AI contract review and lease abstraction tools.

47,000 contracts (~1.2M pages) · listed

News Article Archive — 18M Articles, 4,200 Sources, Political Bias Scored

Full-text news articles from 4,200 English-language sources (national papers, local outlets, digital-native publications) with political bias ratings, topic tags, and named entity extraction. Each article is scored on a 7-point bias scale validated against AllSides and Media Bias/Fact Check. Built for misinformation detection, media monitoring AI, and balanced content curation.

18M articles, 4,200 sources · listed

Open Source Vulnerability Patches — 47K CVEs with Before/After Code Diffs

Curated dataset of 47,000 CVE-linked vulnerability patches across Python, JavaScript, Java, Go, and C/C++ open source projects. Each entry includes the vulnerable code, the patch diff, CVE severity score, CWE classification, and exploit proof-of-concept where publicly available. Essential for training AI-powered code security scanners and automated patching systems.

47K CVEs, 128K affected files · listed

Groups

Browse by group.

Financial Documents

Buy and sell financial documents data. Browse financial documents datasets in the Documents category. Find buyers and sellers of financial documents data on FileYield.

8 subtypes

HR & Employment Documents

Buy and sell HR & employment documents data. Browse HR & employment documents datasets in the Documents category. Find buyers and sellers of HR & employment documents data on FileYield.

6 subtypes

Technical Documents

Buy and sell technical documents data. Browse technical documents datasets in the Documents category. Find buyers and sellers of technical documents data on FileYield.

7 subtypes

Education & Research

Buy and sell education & research data. Browse education & research datasets in the Documents category. Find buyers and sellers of education & research data on FileYield.

5 subtypes

Real Estate Documents

Buy and sell real estate documents data. Browse real estate documents datasets in the Documents category. Find buyers and sellers of real estate documents data on FileYield.

7 subtypes

Supply Chain & Logistics

Buy and sell supply chain & logistics data. Browse supply chain & logistics datasets in the Documents category. Find buyers and sellers of supply chain & logistics data on FileYield.

5 subtypes

Compliance & Safety

Buy and sell compliance & safety data. Browse compliance & safety datasets in the Documents category. Find buyers and sellers of compliance & safety data on FileYield.

5 subtypes

Government & Public Records

Buy and sell government & public records data. Browse government & public records datasets in the Documents category. Find buyers and sellers of government & public records data on FileYield.

7 subtypes

Legal Documents

Buy and sell legal documents data. Browse legal documents datasets in the Documents category. Find buyers and sellers of legal documents data on FileYield.

10 subtypes

Healthcare Documents

Buy and sell healthcare documents data. Browse healthcare documents datasets in the Documents category. Find buyers and sellers of healthcare documents data on FileYield.

7 subtypes

Insurance Documents

Buy and sell insurance documents data. Browse insurance documents datasets in the Documents category. Find buyers and sellers of insurance documents data on FileYield.

5 subtypes

All Subtypes

Every data type.

Maintenance & Repair Logs

Tax Returns

Title Documents

Product Manuals

Mining Claims & Mineral Rights

Legal Briefs & Memos

Product Reviews

Shipping Labels & Waybills

Trademark Applications

Weather & Climate Reports

Campaign Finance Records

Power of Attorney Documents

Immigration Forms & Applications

Lease Agreements

Deposition Transcripts

Workplace Incident Reports

Standard Operating Procedures

Job Descriptions

Material Test Certificates

Customs Declarations

Lab Results

Engineering Drawings & CAD Files

Medical Billing & Coding Data

Nonprofit 990 Tax Filings

Loan Applications

Lobbying Disclosures

Appraisal Reports

Warranty & RMA Records

Medical Records

Actuarial Tables & Models

Discharge Summaries

Bank Statements

Building Permits

Regulatory Filings

Loss Run Reports

Inspection Reports

Audit Reports

Business Registrations

Resumes & CVs

Performance Reviews

Claims Adjuster Notes

Shipping Manifests

Restaurant Health Inspections

Insurance Claims

Vehicle Title & Registration Data

Software License Agreements

Expense Reports

Court Filings

Warehouse & Inventory Records

Bills of Lading

Prescription Records

Patent Filings

Public Meeting Minutes

Academic Papers

Fire Safety Inspections

Grant Proposals

Contracts & Agreements

Employee Handbooks

Corporate Bylaws & Charters

Privacy Policies & TOS

FOIA Responses

Corporate Filings (10-K, 10-Q, 8-K)

Invoices & Receipts

AML & KYC Records

Insurance Policies

Zoning & Land Use Records