unstructured
Unstructured-IO/unstructured
Intermediate
Good First Issues Available
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
Repository Metrics
14.8k Stars
1.2k Forks
244 Open Issues
74 Watchers
Project Info
- Language: HTML
- License: Apache License 2.0
- Last Pushed: May 22, 2026
Topics
data-pipelines, deep-learning, document-image-analysis, document-image-processing, document-parser, document-parsing, docx, donut, information-retrieval, langchain, llm, machine-learning, ml, natural-language-processing, nlp, ocr, pdf, pdf-to-json, pdf-to-text, preprocessing