Docling CLI to parse PDFs and export them to multiple formats

Source: DEV Community
What is Docling? Docling is an open-source document processing library that converts various document formats into structured outputs. It can play an important part in a RAG pipeline, where documents have to be turned into structured text before they can be indexed. I'll be taking you through the process of parsing PDFs into structured formats.

Step 1: Set up

Create the project structure in your terminal:

    mkdir docling_cli
    cd docling_cli

Create your virtual environment and activate it. The exact commands differ between Fedora and Windows.

Step 2: Installing Docling

Install Docling with pip, then check the installed version (on both Fedora and Windows):

    pip install docling
    docling --version

Step 3: Creating input and output folders

Create a folder called data, where you will store the PDFs you want to convert. Create another folder named outputs, and inside it create three folders: markdown_outputs, html_outputs and json_outputs.

Step 4: Converting the PDFs to HTML

Run this from the folder that holds your PDFs, so that *.pdf matches them:

    docling --to html *.pdf --output ~/Documents/docling_cli/outputs/html_outputs

Step 5: Converting the PDFs to other formats

1. Markdown
2. JSON
3. Plain text
4. YAML
5.
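
The folder layout from Step 3 and the conversions from Steps 4 and 5 can be combined into one small script. This is only a sketch: the PROJECT_DIR and RUN variables are hypothetical names introduced here, and the values passed to --to (html, md, json) are assumptions that should be checked against the output of docling --help for your installed version. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: create the folders from Step 3 and build one docling command
# per export format. Dry-run by default; set RUN=1 to execute for real.
set -eu

# Hypothetical project root; adjust to wherever docling_cli lives.
PROJECT_DIR="${PROJECT_DIR:-$PWD/docling_cli}"
RUN="${RUN:-0}"

# Each pair is "<--to value>:<output folder>".
# The --to values are assumptions; confirm them with `docling --help`.
for pair in html:html_outputs md:markdown_outputs json:json_outputs; do
    fmt="${pair%%:*}"        # text before the colon, e.g. "html"
    folder="${pair##*:}"     # text after the colon, e.g. "html_outputs"
    out_dir="$PROJECT_DIR/outputs/$folder"

    # Make sure the input and output folders exist.
    mkdir -p "$PROJECT_DIR/data" "$out_dir"

    if [ "$RUN" = "1" ]; then
        # Convert every PDF in data/ to the current format.
        docling --to "$fmt" "$PROJECT_DIR"/data/*.pdf --output "$out_dir"
    else
        # Dry run: show the command without needing docling installed.
        echo "docling --to $fmt $PROJECT_DIR/data/*.pdf --output $out_dir"
    fi
done
```

Running the script as-is creates the folders and prints three commands. Running it with RUN=1 in the environment (and with Docling installed and PDFs present in data) executes them instead.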