more flexibility on Audio Transcriber #1

ohjho · 2025-01-13T15:39:53Z

first of all, just want to say this is a great package! 💯 Thanks for putting this out there ❤️

on to the audio transcription issue...

Have you thought about supporting a "local" transcriber, like the one in openscenesense-ollama?
Alternatively, Whisper could be loaded through the Hugging Face serverless Inference API (see the example below).
An option to turn off audio transcription would also be useful, for example when the video has no audio track.

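import requests

# Hosted Whisper model on the Hugging Face serverless Inference API
API_URL = "https://api-inference.huggingface.co/models/openai/whisper-small"
# Placeholder token; replace with your own Hugging Face access token
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}

def query(filename):
    # Send the raw audio bytes as the request body
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.post(API_URL, headers=headers, data=data, timeout=60)
    # Fail loudly on HTTP errors (e.g. a bad token) instead of parsing an error body
    response.raise_for_status()
    return response.json()

output = query("sample1.flac")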


ymrohit · 2025-01-13T16:12:25Z

Hi @ohjho, thank you for the feedback and suggestions. I think the Hugging Face serverless Inference API could be implemented and would help this library a lot, along with the option to turn off audio transcription. I considered adding local inference, but I didn't want to weigh the library down with heavy dependencies. The main goal is for it to be lightweight and fast so that it can run on any device. I'm open to further feedback, so please let me know, and feel free to contribute any new ideas.
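For anyone picking this up, here is a rough sketch of how the two ideas agreed on above (a hosted Whisper backend and an off switch) might fit behind one small interface. None of these names come from the library: `Transcriber`, `NullTranscriber`, `HFInferenceTranscriber`, `make_transcriber`, and the `HF_TOKEN` environment variable are all hypothetical. It also assumes the API answers with a JSON object containing a `text` field. `requests` is imported lazily so that the no-audio path adds no dependency.

```python
import os
from typing import Optional, Protocol


class Transcriber(Protocol):
    """Hypothetical interface: anything that turns an audio file into text."""

    def transcribe(self, audio_path: str) -> str: ...


class NullTranscriber:
    """Turns transcription off, e.g. for videos with no audio track."""

    def transcribe(self, audio_path: str) -> str:
        # The file is never opened; an empty transcript is returned.
        return ""


class HFInferenceTranscriber:
    """Posts raw audio bytes to a hosted Whisper model, like the snippet above."""

    def __init__(
        self,
        token: Optional[str] = None,
        model: str = "openai/whisper-small",
        timeout: float = 60.0,
    ) -> None:
        self.url = f"https://api-inference.huggingface.co/models/{model}"
        # Assumption: the token is supplied via an HF_TOKEN environment variable.
        self.token = token or os.environ.get("HF_TOKEN", "")
        self.timeout = timeout

    def transcribe(self, audio_path: str) -> str:
        # Lazy import: only this backend needs `requests`.
        import requests

        with open(audio_path, "rb") as f:
            data = f.read()
        response = requests.post(
            self.url,
            headers={"Authorization": f"Bearer {self.token}"},
            data=data,
            timeout=self.timeout,
        )
        response.raise_for_status()
        # Assumption: the response body looks like {"text": "..."}.
        return response.json().get("text", "")


def make_transcriber(kind: str = "none") -> Transcriber:
    """Pick a backend by name: "none" disables audio, "hf" uses the hosted API."""
    if kind == "none":
        return NullTranscriber()
    if kind == "hf":
        return HFInferenceTranscriber()
    raise ValueError(f"unknown transcriber kind: {kind!r}")
```

With something like this, `make_transcriber("none").transcribe("clip.wav")` returns an empty string without touching the file, and a local backend could be added later as an optional extra without changing callers.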
