Token streaming doesn't work #517
For some reason, token streaming doesn't work. It's enabled, and the server's terminal output updates on every token, but no messages are sent over the websocket to the UI, so nothing can be displayed until the response is complete. I have no idea what is going on.
I'm on the latest United commit 1e985ed.
kobold_debug.json
Comments
What kind of model / API was it hooked up to?
I thought that would be in the debug JSON, but I've tried both LLaMA 2 and Mixtral 8x7B in GGUF format, running on KoboldCPP (with cuBLAS and full offload to a 3090). I'm using the KoboldAI United UI (localhost:5000, not Lite).
United can't stream over the API; that's why streaming is missing.
What do you mean it can't stream over the API? So it can't stream at all?
It can stream when you use Huggingface-based models in the main UI.
So I can't use my 3090 to run models? Or I can't use GGUF files?
You can't use GGUF models with United and have streaming at the same time.
OK, so the solution is to not use GGUF, then? The Lite UI is mostly unusable for me (it works fine, it just has an awful user experience).
Yes, the backends built into KoboldAI United (Huggingface, exllama2) should work.