
Kreuzberg: A Versatile Document Intelligence Framework

Kreuzberg is a powerful document intelligence framework that extracts data from various formats, making it ideal for developers across multiple languages.


vmain· Other· ⭐ 8.4K stars · Updated Jun 6, 2026

Extracting useful information from documents is a routine problem for businesses and developers alike. Whether the input is PDFs, Office documents, images, or other formats, the hard part is usually parsing them efficiently. Kreuzberg, a polyglot document intelligence framework with a Rust core, addresses this by supporting more than 97 file formats and extracting text, metadata, images, and structured information from them.

What Is Kreuzberg?

Kreuzberg is a document intelligence framework that lets developers extract and process data from many document types. Its core is written in Rust for performance and memory safety, which makes it a candidate for high-demand applications. It can be used from Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript, as well as through a CLI or a REST API, so the same extraction engine is available regardless of the stack your application runs on.

Key Features

  • Multi-Language Support: Kreuzberg supports a wide range of programming languages, including Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript, providing flexibility for developers.
  • Comprehensive Format Coverage: Extracts data from more than 97 formats, including PDFs, Office documents, and images, which makes it versatile across use cases.
  • CLI and API Access: Use Kreuzberg through a command-line interface or integrate via REST API for seamless automation in your workflows.
  • Structured Data Extraction: Not just text, but also metadata and images can be extracted in a structured format, enhancing data usability.
  • Performance Optimized: With a Rust core, Kreuzberg is designed for speed and efficiency, including when processing large documents.
  • Community-Driven Development: The project has more than 8,000 stars on GitHub, which suggests an active community behind its ongoing development and support.
  • Docker Support: For easy deployment, Kreuzberg can be run in a Docker container, making it simple to integrate into existing systems.

Installation & Setup

Installing Kreuzberg is straightforward, and the method depends on your chosen programming language. Below are examples for a few popular languages:

Rust

CODE
cargo add kreuzberg

Python

CODE
pip install kreuzberg

Node.js

CODE
npm install @kreuzberg/node

Java

CODE
<dependency>
    <groupId>dev.kreuzberg</groupId>
    <artifactId>kreuzberg</artifactId>
    <!-- Check the official repository for the current release and substitute it here -->
    <version>1.0.0</version>
</dependency>

For other languages and more detailed installation instructions, refer to the official Kreuzberg repository.
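
After installing, it can help to confirm that the package is visible to the interpreter you plan to use. The following is a minimal sketch for the Python package that relies only on the standard library. The helper name `installed_version` is hypothetical, and the distribution name `kreuzberg` is taken from the pip command shown earlier.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "kreuzberg" is the name used in the pip command in this article.
    version = installed_version("kreuzberg")
    if version is None:
        print("kreuzberg is not installed in this environment")
    else:
        print(f"kreuzberg {version} is installed")
```

Running this inside the same virtual environment as your application avoids the common case where the package was installed for a different interpreter.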

How to Use It

Here’s a practical example of how to extract text from a PDF using Kreuzberg in Python:

CODE
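from kreuzberg import extract_file_sync

# Path to the PDF to process
pdf_file = "sample.pdf"

# Run a synchronous extraction; the result object carries the extracted text.
# Check the official repository if your installed version exposes a different API.
result = extract_file_sync(pdf_file)

# The extracted text is available on the result's `content` attribute
print(result.content)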

This snippet shows the shortest path from a file to its text. Images and metadata can be extracted in a similar fashion using additional methods provided by the library.
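
To illustrate handling text and metadata together, here is a small sketch that turns an extraction result into a JSON-friendly record. It assumes the result object exposes `content` (text) and `metadata` (a mapping) attributes, which this article does not confirm. The extractor is passed in as a parameter so the helper does not depend on any particular Kreuzberg function name; `describe_document`, `FakeResult`, and `fake_extract` are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class FakeResult:
    """Stand-in for an extraction result (assumed shape: text plus metadata)."""
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def describe_document(path: str, extract: Callable[[str], Any]) -> Dict[str, Any]:
    """Run `extract` on `path` and return text statistics plus metadata."""
    result = extract(path)
    # Fall back to empty values if the result lacks the assumed attributes.
    text = getattr(result, "content", "") or ""
    meta = getattr(result, "metadata", None) or {}
    return {
        "path": path,
        "characters": len(text),
        "words": len(text.split()),
        "metadata": dict(meta),
    }


def fake_extract(path: str) -> FakeResult:
    # Placeholder extractor so the sketch runs without Kreuzberg installed.
    return FakeResult(content="Hello from a sample PDF", metadata={"title": "Sample"})


if __name__ == "__main__":
    print(json.dumps(describe_document("sample.pdf", fake_extract), indent=2))
```

With Kreuzberg installed, the library's own extraction function would be passed in place of `fake_extract`, provided its result matches the assumed shape.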

Who Should Use Kreuzberg?

Kreuzberg is ideal for developers and companies that need to handle a variety of document formats and require a robust solution for data extraction. It’s particularly beneficial for:

  • Data analysts looking to automate data extraction from reports and documents.
  • Software developers integrating document processing capabilities into their applications.
  • Businesses processing large volumes of documents that demand efficiency and accuracy.
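
For the large-volume case, one possible pattern is to run extractions concurrently and keep failures separate from successes, so a single bad file does not stop the batch. The sketch below uses only the Python standard library. `extract_many` and `demo_extract` are hypothetical names, and the extractor is injected as a parameter rather than tied to a specific Kreuzberg call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def extract_many(
    paths: Iterable[str],
    extract: Callable[[str], Any],
    max_workers: int = 4,
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Apply `extract` to every path; return (results, errors) keyed by path."""
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    path_list: List[str] = list(paths)

    def safe_extract(path: str) -> Tuple[str, Any, Optional[str]]:
        # Capture per-file failures instead of letting them abort the batch.
        try:
            return path, extract(path), None
        except Exception as exc:
            return path, None, f"{type(exc).__name__}: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for path, value, error in pool.map(safe_extract, path_list):
            if error is None:
                results[path] = value
            else:
                errors[path] = error
    return results, errors


def demo_extract(path: str) -> str:
    # Placeholder extractor (hypothetical): rejects anything that is not a PDF.
    if not path.endswith(".pdf"):
        raise ValueError("unsupported file in this demo")
    return f"text of {path}"


if __name__ == "__main__":
    ok, failed = extract_many(["a.pdf", "b.txt", "c.pdf"], demo_extract)
    print(ok)
    print(failed)
```

Whether threads actually speed things up depends on the extractor: the gain is largest when the underlying call spends its time in native code or I/O rather than in Python itself.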

Final Thoughts

Kreuzberg stands out as a powerful document intelligence framework that simplifies the complex task of data extraction across various formats. Its multi-language support and solid Rust core make it a reliable choice for developers. Whether you’re building a new application or enhancing existing workflows, Kreuzberg is worth considering for its performance and ease of use.

ScriptForge Admin

Senior developer and curator of the ScriptForge platform. Specializing in PHP, Laravel, and full-stack JavaScript development.
