Releases: ymrohit/openscenesense-ollama
OpenSceneSense-Ollama v1.0.2
OpenSceneSense-Ollama v1.0.1
Fixed a default-model selection bug and bumped the package version
OpenSceneSense-Ollama v1.0.0
I'm excited to announce the initial release of OpenSceneSense Ollama, a powerful Python package for local video analysis using Ollama's models!
🌟 Major Features
Local Video Analysis
- Frame Analysis Engine powered by Ollama's vision models
- Audio Transcription using local Whisper models
- Dynamic Frame Selection for optimal scene coverage
- Comprehensive Video Summaries integrating visual and audio elements
- Metadata Extraction for detailed video information
Privacy & Control
- 🔒 Fully local processing - no cloud dependencies
- 🛠️ Customizable analysis pipelines
- 💪 GPU acceleration support
- 🎯 Fine-tuning capabilities for specific use cases
⚙️ Technical Features
Core Components
- Modular architecture supporting custom components
- Flexible frame selection strategies
- Configurable model selection for different analysis tasks
- Extensible prompt system for customized analysis
Performance
- Optimized frame processing pipeline
- GPU acceleration support with CUDA 12.1
- Memory-efficient frame selection
- Configurable processing parameters
Integration
- FFmpeg integration for robust video handling
- PyTorch backend for ML operations
- Whisper integration for audio processing
- Compatible with all Ollama vision models
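The metadata-extraction step builds on FFmpeg's `ffprobe`. As a rough illustration of the kind of parsing involved, here is a minimal stdlib sketch that pulls basic video info out of `ffprobe -print_format json` output; the helper name and returned fields are illustrative, not the package's actual API:

```python
import json

def parse_ffprobe_output(ffprobe_json: str) -> dict:
    """Extract basic video metadata from `ffprobe -print_format json` output."""
    data = json.loads(ffprobe_json)
    # Take the first video stream; real files may also carry audio/subtitle streams.
    stream = next(s for s in data["streams"] if s["codec_type"] == "video")
    return {
        "codec": stream["codec_name"],
        "width": stream["width"],
        "height": stream["height"],
        "duration_s": float(data["format"]["duration"]),
    }

# Example ffprobe-style JSON (abbreviated) and the parsed result
sample = '''{
  "streams": [{"codec_type": "video", "codec_name": "h264",
               "width": 1920, "height": 1080}],
  "format": {"duration": "12.5"}
}'''
meta = parse_ffprobe_output(sample)
```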
📋 Requirements
Minimum Requirements
- Python 3.10+
- FFmpeg
- Ollama installed and running
- 8GB RAM
- 4GB storage space
Recommended Specifications
- NVIDIA GPU with CUDA 12.1+
- 16GB RAM
- SSD storage
- 8-core CPU
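Before installing, it can help to verify the minimum requirements are in place. The check below is a stand-alone sketch (not part of the package) that confirms the Python version and looks for the `ffmpeg` and `ollama` binaries on PATH:

```python
import shutil
import sys

def check_requirements() -> dict:
    """Report which of the minimum requirements are met on this machine."""
    return {
        "python_3_10_plus": sys.version_info >= (3, 10),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "ollama_on_path": shutil.which("ollama") is not None,
    }

status = check_requirements()
```

Note this only checks for the binaries; Ollama must also be running (`ollama serve`) before analysis can start.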
🛠️ Configuration Options
Models
- Support for multiple Ollama vision models:
- llava (default)
- minicpm-v
- bakllava
- Configurable summary models:
- llama3.2
- mistral
- claude-3-haiku (default)
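Model selection could be expressed as a plain configuration mapping like the one below. The model names come from the lists above, but the key names are assumptions; the package's actual parameter names may differ:

```python
# Illustrative configuration; values are taken from the release notes,
# key names are hypothetical.
DEFAULT_CONFIG = {
    "vision_model": "llava",            # default vision model
    "summary_model": "claude-3-haiku",  # default summary model
    "alternatives": {
        "vision": ["minicpm-v", "bakllava"],
        "summary": ["llama3.2", "mistral"],
    },
}
```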
Frame Selection
- Adjustable frame rate (default: 4.0 fps)
- Min frames: 8 (configurable)
- Max frames: 64 (configurable)
- Multiple selection strategies:
- Dynamic (scene-aware)
- Uniform
- Content-aware
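The uniform strategy above can be sketched as picking evenly spaced frame indices at the target frame rate, clamped to the configured min/max counts. This is a simplified stand-in for the package's selector, using the defaults listed (4.0 fps, 8–64 frames):

```python
def select_uniform_frames(total_frames: int, video_fps: float,
                          target_fps: float = 4.0,
                          min_frames: int = 8, max_frames: int = 64) -> list[int]:
    """Evenly spaced frame indices at roughly target_fps, clamped to [min, max]."""
    duration_s = total_frames / video_fps
    wanted = int(duration_s * target_fps)
    count = max(min_frames, min(max_frames, wanted))
    count = min(count, total_frames)  # never request more frames than exist
    step = total_frames / count
    return [int(i * step) for i in range(count)]

# A 10 s clip at 30 fps sampled at the default 4.0 fps yields 40 frames
indices = select_uniform_frames(total_frames=300, video_fps=30.0)
```

The dynamic and content-aware strategies would replace the even spacing with scene-change scoring, but the min/max clamping logic stays the same.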
Audio Processing
- Whisper model selection
- GPU acceleration support
- Multiple output formats
- Timestamp alignment
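Timestamp alignment pairs each transcript segment with the nearest analyzed frame so visual and audio context can be merged. A minimal sketch (the segment dict shape is assumed, though Whisper segments do expose `start` and `text` fields):

```python
def align_segments_to_frames(segments: list[dict],
                             frame_times: list[float]) -> list[dict]:
    """Pair each transcript segment with the analyzed frame closest to its start."""
    aligned = []
    for seg in segments:
        nearest = min(frame_times, key=lambda t: abs(t - seg["start"]))
        aligned.append({"text": seg["text"],
                        "start": seg["start"],
                        "frame_time": nearest})
    return aligned

segments = [{"start": 0.4, "text": "hello"}, {"start": 3.1, "text": "world"}]
frame_times = [0.0, 1.0, 2.0, 3.0]
pairs = align_segments_to_frames(segments, frame_times)
```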
🔧 API Improvements
New Classes
- `OllamaVideoAnalyzer`: Main analysis pipeline
- `WhisperTranscriber`: Audio processing
- `DynamicFrameSelector`: Smart frame selection
- `AnalysisPrompts`: Customizable prompts
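Putting the classes together might look like the sketch below. The class names come from this release, but the module path, constructor arguments, and method names are assumptions; check the README for the real signatures:

```python
def analyze_video(video_path: str):
    """Hypothetical end-to-end call; argument and method names are assumptions."""
    # Imported inside the function so the sketch reads without the package installed.
    from openscenesense_ollama import (  # assumed module name
        OllamaVideoAnalyzer,
        DynamicFrameSelector,
    )
    analyzer = OllamaVideoAnalyzer(
        vision_model="llava",            # default vision model
        summary_model="claude-3-haiku",  # default summary model
        frame_selector=DynamicFrameSelector(min_frames=8, max_frames=64),
    )
    return analyzer.analyze(video_path)
```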
Enhanced Configuration
- Flexible host configuration
- Custom frame processors
- Configurable logging levels
- Modular component architecture
📝 Documentation
- Comprehensive README
- Detailed API documentation
- Example scripts and notebooks
- Configuration guides
- Best practices documentation
🐛 Known Issues
- High memory usage with large frame counts
- Potential GPU memory issues with 4GB cards
- Limited support for some video codecs
🚀 Next Steps
We're already working on:
- Memory optimization
- Additional frame selection strategies
- Enhanced error handling
- More example notebooks
- Performance improvements
🙏 Acknowledgments
Special thanks to:
- The Ollama team for their amazing models
- OpenAI for Whisper
- The open-source community for valuable feedback
📦 Installation
pip install openscenesense-ollama
📄 License
MIT License - See LICENSE file for details