This repository has been archived by the owner on Dec 13, 2021. It is now read-only.
forked from ashutoshvarma/pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources


pyxpdf

pyxpdf is a fast, memory-efficient Python module for parsing PDF documents, based on the xpdf reader sources.


Features

  • Almost 20 times faster than pure-Python PDF parsers (see Speed Comparison)
  • Extracts text while preserving the original document layout as closely as possible
  • Supports almost all PDF encodings, CMaps and predefined CMaps
  • Extracts LZW, RLE, CCITTFax, DCT, JBIG2 and JPX compressed images and image masks, along with their bounding boxes (BBox)
  • Renders PDF pages as images, with support for the '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes
  • No explicit dependencies (except optional ones, see Installation)
  • Thread Safe
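
A speed claim like the one in the first bullet can be checked on your own documents with a small timing harness. The following is only a minimal sketch using the standard library: it does not call pyxpdf or any other parser, and the names `best_time`, `speedup_table`, `slow_parser` and `fast_parser` are hypothetical placeholders. To use it, wrap each parser's text-extraction call in a zero-argument function and pass those functions in.

```python
import time
from typing import Callable, Dict


def best_time(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup_table(
    candidates: Dict[str, Callable[[], object]],
    baseline: str,
    repeats: int = 5,
) -> Dict[str, float]:
    """Map each candidate name to baseline_time / candidate_time."""
    times = {name: best_time(func, repeats) for name, func in candidates.items()}
    # Guard against a zero reading from the timer on extremely fast calls.
    times = {name: max(t, 1e-12) for name, t in times.items()}
    return {name: times[baseline] / t for name, t in times.items()}


if __name__ == "__main__":
    # Placeholder workloads standing in for real parser calls (assumption:
    # replace these with functions that parse the same PDF with each library).
    def slow_parser() -> int:
        return sum(i * i for i in range(200_000))

    def fast_parser() -> int:
        return sum(range(200_000))

    table = speedup_table(
        {"slow_parser": slow_parser, "fast_parser": fast_parser},
        baseline="slow_parser",
    )
    for name, ratio in table.items():
        print(f"{name}: {ratio:.1f}x relative to baseline")
```

Taking the minimum over several repeats reduces noise from other processes; results still depend heavily on the documents and hardware used.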

More Information

License

pyxpdf is licensed under the GNU General Public License (GPL), version 2 or 3. See the LICENSE

Credits
