When I uploaded a Chinese document and ran question-and-answer over it, the model's replies had no connection to the document; they read like randomly generated text. When I checked /local_data/docstore.json, I found the stored Chinese text was garbled. Can anyone suggest a fix? It's urgent. Plain chat (without a document) works fine.
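A first sanity check for this symptom (a minimal sketch, not specific to privateGPT): Python's `json` module escapes non-ASCII characters as `\uXXXX` by default, so perfectly intact Chinese text in docstore.json can look "garbled" when the file is viewed as raw text.

```python
import json

# json.dumps escapes non-ASCII by default, so Chinese text appears
# as \uXXXX sequences when the file is viewed raw.
raw = json.dumps({"text": "中文测试"})
print(raw)  # {"text": "\u4e2d\u6587\u6d4b\u8bd5"}

# Loading it back restores the original characters; if this round-trip
# works on docstore.json, the stored text is not actually corrupt.
assert json.loads(raw)["text"] == "中文测试"
```

If `json.load` on docstore.json returns readable Chinese, the store is fine and the retrieval problem lies elsewhere (e.g. the embedding model or prompt style).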
my settings.yaml:
# The default configuration file.
# More information about configuration can be found in the documentation: https://docs.privategpt.dev/
# Syntax in `private_gpt/settings/settings.py`
server:
  env_name: ${APP_ENV:prod}
  port: ${PORT:8001}
  cors:
    enabled: false
    allow_origins: ["*"]
    allow_methods: ["*"]
    allow_headers: ["*"]
  auth:
    enabled: false
    # python -c 'import base64; print("Basic " + base64.b64encode("secret:key".encode()).decode())'
    # 'secret' is the username and 'key' is the password for basic auth by default
    # If the auth is enabled, this value must be set in the "Authorization" header of the request.
    secret: "Basic c2VjcmV0OmtleQ=="

data:
  local_data_folder: local_data/

ui:
  enabled: true
  path: /
  default_chat_system_prompt: "You are a helpful assistant. 你是一个乐于助人的助手。"
  default_query_system_prompt: >
    You can only answer questions about the provided context.
    If you know the answer but it is not based in the provided context, don't provide
    the answer, just state the answer is not in the context provided.

llm:
  mode: local
  # Should be matching the selected model
  max_new_tokens: 512
  context_window: 4096
  tokenizer: hfl/chinese-alpaca-2-7b

embedding:
  # Should be matching the value above in most cases
  mode: local
  ingest_mode: simple

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

local:
  prompt_style: llama2
  llm_hf_repo_id: hfl/chinese-alpaca-2-7b-gguf
  llm_hf_model_file: ggml-model-q4_k.gguf
  embedding_hf_model_name: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

sagemaker:
  llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
  embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479

openai:
  api_key: ${OPENAI_API_KEY:}
  model: gpt-3.5-turbo
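If the stored text is genuinely mojibake rather than `\uXXXX` escapes, a likely cause is the source file's encoding: a Chinese .txt saved as GBK/GB18030 but read as UTF-8 during ingestion produces garbage. A hedged sketch of detecting this and falling back to the Chinese codec before uploading (the simulated bytes stand in for a real file's contents):

```python
# Simulate a document saved with a Chinese codec (GB18030 is a superset of GBK).
gbk_bytes = "中文测试".encode("gb18030")

# Reading such bytes as UTF-8 fails, which is how ingestion can garble text;
# falling back to gb18030 and re-saving the file as UTF-8 avoids the problem.
try:
    text = gbk_bytes.decode("utf-8")
except UnicodeDecodeError:
    text = gbk_bytes.decode("gb18030")

print(text)  # 中文测试
```

Converting the document to UTF-8 once, before upload, is simpler than patching the ingestion pipeline.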