
Releases: ymrohit/openscenesense-ollama

OpenSceneSense-Ollama v1.0.2

05 Nov 10:47

Fixed the import errors

OpenSceneSense-Ollama v1.0.1

05 Nov 01:14

Fixed the default-model bug and bumped the version number

OpenSceneSense-Ollama v1.0.0

05 Nov 00:07

I'm excited to announce the initial release of OpenSceneSense Ollama, a powerful Python package for local video analysis using Ollama's models!

🌟 Major Features

Local Video Analysis

  • Frame Analysis Engine powered by Ollama's vision models
  • Audio Transcription using local Whisper models
  • Dynamic Frame Selection for optimal scene coverage
  • Comprehensive Video Summaries integrating visual and audio elements
  • Metadata Extraction for detailed video information

Privacy & Control

  • 🔒 Fully local processing - no cloud dependencies
  • 🛠️ Customizable analysis pipelines
  • 💪 GPU acceleration support
  • 🎯 Fine-tuning capabilities for specific use cases

⚙️ Technical Features

Core Components

  • Modular architecture supporting custom components
  • Flexible frame selection strategies
  • Configurable model selection for different analysis tasks
  • Extensible prompt system for customized analysis
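
The extensible prompt system might look something like the following. This is a hypothetical sketch only: the field names, defaults, and `render_summary` helper are invented for illustration, and the actual `AnalysisPrompts` interface may differ.

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisPrompts:
    # Hypothetical defaults; the package's real prompt text will differ.
    frame_analysis: str = "Describe what is happening in this frame."
    summary: str = "Summarize the video from these frame notes: {notes}"
    extra: dict = field(default_factory=dict)

    def render_summary(self, notes: str) -> str:
        # Fill the summary template with the accumulated frame notes.
        return self.summary.format(notes=notes)

prompts = AnalysisPrompts(frame_analysis="Focus on people and their actions.")
print(prompts.render_summary("frame 1: a dog; frame 2: a park"))
```

The point of a prompt object like this is that every stage of the analysis reads its instructions from one place, so overriding a single field customizes that stage without touching the pipeline.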

Performance

  • Optimized frame processing pipeline
  • GPU acceleration support with CUDA 12.1
  • Memory-efficient frame selection
  • Configurable processing parameters

Integration

  • FFmpeg integration for robust video handling
  • PyTorch backend for ML operations
  • Whisper integration for audio processing
  • Compatible with all Ollama vision models
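
Metadata extraction over FFmpeg typically shells out to `ffprobe`. A minimal standalone sketch (not the package's internal code; the helper names here are invented):

```python
import json
import subprocess

def build_ffprobe_cmd(path: str) -> list[str]:
    # Standard ffprobe invocation that dumps container and stream
    # metadata (duration, codecs, resolution, frame rate) as JSON.
    return [
        "ffprobe", "-v", "quiet",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]

def probe(path: str) -> dict:
    result = subprocess.run(build_ffprobe_cmd(path),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# probe("video.mp4")["format"]["duration"] would return the clip length
# as a string of seconds, assuming ffprobe is on PATH.
```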

📋 Requirements

Minimum Requirements

  • Python 3.10+
  • FFmpeg
  • Ollama installed and running
  • 8GB RAM
  • 4GB storage space

Recommended Specifications

  • NVIDIA GPU with CUDA 12.1+
  • 16GB RAM
  • SSD storage
  • 8-core CPU

🛠️ Configuration Options

Models

  • Support for multiple Ollama vision models:
    • llava (default)
    • minicpm-v
    • bakllava
  • Configurable summary models:
    • llama3.2
    • mistral
    • claude-3-haiku (default)
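
In configuration terms, the model options above boil down to two knobs. The key names below are hypothetical; only the model names and defaults come from this release note.

```python
# Hypothetical configuration mapping; the package's actual parameter
# names may differ.
config = {
    "vision_model": "llava",           # or "minicpm-v", "bakllava"
    "summary_model": "claude-3-haiku", # or "llama3.2", "mistral"
}
```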

Frame Selection

  • Adjustable frame rate (default: 4.0 fps)
  • Min frames: 8 (configurable)
  • Max frames: 64 (configurable)
  • Multiple selection strategies:
    • Dynamic (scene-aware)
    • Uniform
    • Content-aware
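
The uniform strategy is simple enough to illustrate in plain Python: sample at the configured rate, then clamp the frame budget to the min/max bounds above. This is a standalone sketch, not the package's implementation.

```python
def uniform_timestamps(duration_s: float, fps: float = 4.0,
                       min_frames: int = 8, max_frames: int = 64) -> list[float]:
    # Target frame count from the sampling rate, clamped to the
    # configured bounds (defaults mirror these release notes).
    n = max(min_frames, min(max_frames, int(duration_s * fps)))
    # Evenly spaced timestamps: the midpoint of each of n equal slots.
    step = duration_s / n
    return [step * (i + 0.5) for i in range(n)]

timestamps = uniform_timestamps(10.0)  # 10 s clip at 4 fps → 40 frames
```

The dynamic and content-aware strategies would replace the even spacing with scene-change or content scores, but the same clamping to `min_frames`/`max_frames` bounds memory use.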

Audio Processing

  • Whisper model selection
  • GPU acceleration support
  • Multiple output formats
  • Timestamp alignment
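
Timestamp alignment pairs each transcript segment with the nearest selected frame. One possible sketch, assuming Whisper-style segments with `start`/`end` fields (the `align` helper is invented for illustration):

```python
def align(segments: list[dict], frame_times: list[float]) -> list[tuple[str, float]]:
    # For each transcript segment, find the selected frame whose
    # timestamp is closest to the segment's midpoint.
    aligned = []
    for seg in segments:
        mid = (seg["start"] + seg["end"]) / 2
        nearest = min(frame_times, key=lambda t: abs(t - mid))
        aligned.append((seg["text"], nearest))
    return aligned

segments = [{"start": 0.0, "end": 2.0, "text": "hello"},
            {"start": 5.0, "end": 7.0, "text": "world"}]
pairs = align(segments, [1.0, 4.0, 6.5])
```

Pairing speech with frames this way is what lets a summary model reason about what was said while a given scene was on screen.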

🔧 API Improvements

New Classes

  • OllamaVideoAnalyzer: Main analysis pipeline
  • WhisperTranscriber: Audio processing
  • DynamicFrameSelector: Smart frame selection
  • AnalysisPrompts: Customizable prompts
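
The four classes compose into a pipeline roughly like this. These are stand-in skeletons with invented method signatures, meant only to show the data flow; the real classes require a running Ollama instance and differ in their interfaces.

```python
class DynamicFrameSelector:
    def select(self, video_path: str) -> list[float]:
        # Stand-in: pretend we picked three scene-change timestamps.
        return [0.5, 3.2, 7.8]

class WhisperTranscriber:
    def transcribe(self, video_path: str) -> str:
        # Stand-in: the real class runs a local Whisper model.
        return "(transcript)"

class OllamaVideoAnalyzer:
    def __init__(self, selector, transcriber):
        self.selector = selector
        self.transcriber = transcriber

    def analyze(self, video_path: str) -> dict:
        # Select frames and transcribe audio; the real package would
        # then send frames to an Ollama vision model and summarize.
        return {
            "frames": self.selector.select(video_path),
            "transcript": self.transcriber.transcribe(video_path),
        }

result = OllamaVideoAnalyzer(DynamicFrameSelector(), WhisperTranscriber()).analyze("clip.mp4")
```

Because the analyzer takes its selector and transcriber as constructor arguments, any component can be swapped for a custom one, which is what the modular architecture bullets above describe.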

Enhanced Configuration

  • Flexible host configuration
  • Custom frame processors
  • Configurable logging levels
  • Modular component architecture

📝 Documentation

  • Comprehensive README
  • Detailed API documentation
  • Example scripts and notebooks
  • Configuration guides
  • Best practices documentation

🐛 Known Issues

  1. High memory usage with large frame counts
  2. Potential GPU memory issues with 4GB cards
  3. Limited support for some video codecs

🚀 Next Steps

We're already working on:

  1. Memory optimization
  2. Additional frame selection strategies
  3. Enhanced error handling
  4. More example notebooks
  5. Performance improvements

🙏 Acknowledgments

Special thanks to:

  • The Ollama team for their amazing models
  • OpenAI for Whisper
  • The open-source community for valuable feedback

📦 Installation

pip install openscenesense-ollama

📄 License

MIT License - See LICENSE file for details