Extract data from image-based PDF files into JSON files using OCR

pyxploiter/PDF2JSON

Image-based PDF to JSON Data Extraction using OCR

Getting Started

Setup

$ git clone https://github.com/pyxploiter/PDF2JSON.git
$ cd PDF2JSON
$ pip install -r requirements.txt

Usage

usage: pdf2json.py [-h] -i PDF_FILE [-s] [-d]

optional arguments:
  -h, --help         show this help message and exit
  -i PDF_FILE        enter the path to pdf file
  -s, --save_images  save images of pdf file
  -d, --debug        turn on the debug mode
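
The help text above is consistent with a standard `argparse` setup. As a minimal sketch (not the repository's actual source, which is not shown here), a parser along these lines would accept the same flags. The function name `build_parser` and the attribute name `pdf_file` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage text of pdf2json.py."""
    parser = argparse.ArgumentParser(prog="pdf2json.py")
    # -i is required; the destination name "pdf_file" is an assumption
    # chosen so that the metavar renders as PDF_FILE in the usage line.
    parser.add_argument("-i", dest="pdf_file", required=True,
                        help="enter the path to pdf file")
    # Boolean switches: absent means False, present means True.
    parser.add_argument("-s", "--save_images", action="store_true",
                        help="save images of pdf file")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="turn on the debug mode")
    return parser
```

With this sketch, `build_parser().parse_args(["-i", "tests/example1.pdf", "-s"])` yields a namespace in which `pdf_file` holds the path, `save_images` is true, and `debug` is false.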

Run on a single PDF file

example: python pdf2json.py -i tests/example1.pdf

Run on a folder containing PDF files

usage: ./run_for_folder.sh PATH/TO/FOLDER [-s] [-d]
example: ./run_for_folder.sh tests -s -d
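
Where a shell is not available, the folder run can be approximated from Python. The sketch below is not a port of `run_for_folder.sh`, whose contents are not shown here. It assumes the script simply calls `pdf2json.py -i FILE` once per PDF in the top level of the folder and forwards `-s` and `-d` unchanged. The helper names `build_commands` and `run_folder` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(folder, save_images=False, debug=False):
    """Return one pdf2json.py command line per PDF found in `folder`.

    Assumption: only the top level of the folder is scanned (no recursion).
    """
    flags = []
    if save_images:
        flags.append("-s")
    if debug:
        flags.append("-d")
    pdfs = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    # sys.executable reuses the interpreter that is running this sketch.
    return [[sys.executable, "pdf2json.py", "-i", str(p), *flags] for p in pdfs]


def run_folder(folder, save_images=False, debug=False):
    """Run each command in turn; a failure on one file does not stop the rest."""
    for cmd in build_commands(folder, save_images, debug):
        subprocess.run(cmd, check=False)
```

Calling `run_folder("tests", save_images=True, debug=True)` from the repository root would then correspond to the shell example above, under the stated assumptions.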
