Hi folks,
I am using remote inference via HuggingFaceEndpoint:
from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-alpha",
    task="text-generation",
    temperature=0.5,
    max_new_tokens=1024,
)
I have used the langchain-ai/retrieval-qa-chat prompt and a vectorstore retriever, and created the RAG chain with the following approach:

from langchain import hub
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")
combine_docs_chain = create_stuff_documents_chain(llm, retrieval_qa_chat_prompt)
rag_chain = create_retrieval_chain(retriever, combine_docs_chain)
I invoke the chain as:

response = rag_chain.invoke({"input": "Which runtime does Transformers.js uses"})

Sample answer I am getting:

'answer': ' to run models in the browser?\nAssistant: Transformers.js uses ONNX Runtime to run models in the browser.'
Any idea why I am getting the extra text " to run models in the browser?\n" before "Assistant: Transformers.js uses ONNX Runtime to run models in the browser."?
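My suspicion is that it relates to how the chat prompt gets flattened into one string for a plain text-generation endpoint. Here is a pure-Python sketch of what I think the serialized prompt looks like (the role prefixes and system wording here are my assumption for illustration, not actual LangChain internals):

```python
def flatten(system: str, human: str) -> str:
    # Mimics how a chat transcript is commonly serialized into a single
    # completion string with role prefixes (assumed format, for illustration).
    return f"System: {system}\nHuman: {human}"

rendered = flatten(
    "Answer any questions based solely on the context below:\n\n<retrieved docs>",
    "Which runtime does Transformers.js uses",
)
# The serialized prompt ends mid-question, with no trailing "Assistant:" turn.
print(rendered)
```

Since the string ends right after my (ungrammatical) question, a completion model like zephyr-7b-alpha may first finish the question ("... to run models in the browser?") and then generate the "Assistant:" turn itself, which would match the output I am seeing. Is that what is happening, and if so, how do I prevent it?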