PyVisionAI
If you deal with documents and images and want to save time on parsing, analyzing, or describing them, PyVisionAI is for you. It unifies multiple Vision LLMs (GPT-4 Vision, Claude Vision, or local Llama2-based models) under one workflow, so you can extract text and images from PDF, DOCX, PPTX, and HTML—even capturing fully rendered web pages—and generate human-like explanations for images or diagrams.
Install on macOS via Homebrew:

```bash
brew tap mdgrey33/pyvisionai
brew install pyvisionai

# Optional: needed for dynamic HTML extraction
playwright install chromium

# Optional: for Office documents (DOCX, PPTX)
brew install --cask libreoffice
```
The Homebrew formula pulls in Python 3.11+ automatically. On Windows or Linux, install with `pip install pyvisionai` (requires Python 3.8+).
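The cloud models need API credentials before anything will run. A minimal sketch, assuming PyVisionAI follows the usual convention of reading `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` from the environment (check the project README for the exact variable names):

```python
import os

# Assumed conventional variable names; verify against the PyVisionAI docs.
os.environ["OPENAI_API_KEY"] = "sk-..."         # for model="gpt4"
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # for model="claude"
```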
You can drive it from the command line or from Python:

- CLI: `file-extract` for documents, `describe-image` for images.
- Python API: `create_extractor(...)` to handle large sets of files; `describe_image_*` functions for quick calls in code.

```python
from pyvisionai import create_extractor, describe_image_claude

# 1. Extract content from PDFs
extractor = create_extractor("pdf", model="gpt4")  # or "claude", "llama"
extractor.extract("quarterly_reports/", "analysis_out/")

# 2. Describe an image or diagram
desc = describe_image_claude(
    "circuit.jpg",
    prompt="Explain what this circuit does, focusing on the components",
)
print(desc)
```
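The same two-call pattern should extend to the other formats and to the local model. A sketch under assumptions: I'm inferring from the example above that `create_extractor` also accepts `"docx"` and `"html"` as file types (matching the formats listed in the intro) and `"llama"` as the model string; the directory names are made up.

```python
from pyvisionai import create_extractor

# Assumption: "docx" is a valid file type, mirroring "pdf" above.
# model="llama" uses the local Llama-based model, so nothing leaves your machine.
docx_extractor = create_extractor("docx", model="llama")
docx_extractor.extract("contracts/", "contracts_out/")  # hypothetical paths

# Assumption: "html" is a valid file type; per the install notes,
# dynamic pages need `playwright install chromium` first.
html_extractor = create_extractor("html", model="gpt4")
html_extractor.extract("saved_pages/", "pages_out/")  # hypothetical paths
```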
If there’s a feature you need—maybe specialized document parsing, new prompt templates, or deeper local model integration—please ask or open a feature request on GitHub. I want PyVisionAI to fit right into your workflow, whether you’re doing academic research, business analysis, or general-purpose data wrangling.
Give it a try and share your ideas! I’d love to know how PyVisionAI can make your work easier.