Literature Reading Notes:

Nvidia ChatQA

ChatRAG Bench: 10 datasets.

Introduction: mentions GPT-4's strengths: 1. easy interaction with follow-up questions; 2. handling provided context that is longer than the model's context window; 3. zero-shot accuracy on par with fine-tuned models.

Proposes a two-stage instruction tuning method and designs a dataset for it.
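
A minimal sketch of how training examples might be formatted for the two stages, assuming stage 1 is plain supervised fine-tuning on (instruction, response) pairs and stage 2 is context-enhanced instruction tuning on multi-turn QA grounded in a provided context. The prompt templates and system messages are my own illustrative choices, not the paper's exact format.

```python
# Illustrative formatting of training examples for the two tuning stages
# (templates are assumptions, not the paper's exact prompts).

def format_stage1_example(instruction: str, response: str) -> str:
    """Stage 1: supervised fine-tuning on (instruction, response) pairs."""
    return (
        "System: This is a chat between a user and an assistant.\n"
        f"User: {instruction}\n"
        f"Assistant: {response}"
    )

def format_stage2_example(context: str, turns: list[tuple[str, str]]) -> str:
    """Stage 2: context-enhanced tuning -- prepend the provided context to a
    multi-turn conversation so the model learns to answer grounded in it."""
    lines = [
        "System: Answer using only the given context. "
        "If the answer is not in the context, say you cannot answer.",
        f"Context: {context}",
    ]
    for user_msg, assistant_msg in turns:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {assistant_msg}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_stage1_example(
        "Summarize RAG in one sentence.",
        "RAG retrieves documents and conditions generation on them."))
    print(format_stage2_example(
        "ChatQA uses two-stage instruction tuning.",
        [("What method does ChatQA use?", "A two-stage instruction tuning method.")]))
```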

Utilizes synthetic data generation to train a customized retriever.
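
A hedged sketch of the synthetic-data idea: use an LLM to invent a short conversation whose final question is answered by a given document chunk, then treat each (conversation, chunk) pair as a positive example for retriever fine-tuning. `call_llm`, the prompt wording, and the pair format are hypothetical placeholders, not the paper's actual pipeline.

```python
# Sketch: generate synthetic (multi-turn query, passage) pairs for retriever training.
from typing import Callable

def build_synthetic_pairs(document_chunks: list[str],
                          call_llm: Callable[[str], str]) -> list[dict]:
    """For each chunk, ask an LLM (hypothetical `call_llm`) to write a short
    conversation whose final user question is answerable from that chunk."""
    pairs = []
    for chunk in document_chunks:
        prompt = (
            "Given the passage below, write a short two-turn conversation "
            "where the final user question is answered by the passage.\n\n"
            f"Passage: {chunk}"
        )
        conversation = call_llm(prompt)
        # The conversation acts as the query; the chunk is its positive passage.
        pairs.append({"query": conversation, "positive_passage": chunk})
    return pairs
```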

ChatRAG Bench: 10 conversational QA datasets, 5 with long documents, 3 with tabular data and arithmetic calculations.

"Unanswerable scenario": the model should say it cannot find the answer instead of hallucinating one.

Related Works:

  1. Conversational QA and RAG: the conversational format improves user experience and lets the model ask clarifying questions when needed.
  2. Retrieval for multi-turn QA: dense retrievers are trained to retrieve the top-k relevant chunks given a single question.
  3. Conversational query rewriting: reinforcement learning methods, few-shot generative models such as GPT-2, and instruction-tuned GPT-3.5-Turbo as the rewriter (see the rewrite-then-retrieve sketch after this list).
  4. Fine-tuning a retriever for multi-turn QA: fine-tune a single-turn query retriever on a high-quality multi-turn dataset, then evaluate the fine-tuned retriever zero-shot on 5 benchmark datasets.
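
A rough sketch of the rewrite-then-retrieve pattern referenced in item 3: rewrite the conversational question into a standalone query, then run standard dense top-k retrieval. `rewrite_llm` and `embed` are hypothetical callables standing in for an instruction-tuned rewriter and a dense encoder; the cosine-similarity search itself is the usual recipe.

```python
# Sketch: query rewriting followed by dense top-k retrieval.
import numpy as np
from typing import Callable

def rewrite_then_retrieve(history: list[str],
                          question: str,
                          chunks: list[str],
                          rewrite_llm: Callable[[str], str],
                          embed: Callable[[str], np.ndarray],
                          k: int = 5) -> list[str]:
    # 1. Rewrite the conversational question into a self-contained query.
    prompt = ("Rewrite the last question so it is self-contained.\n"
              "History:\n" + "\n".join(history) +
              f"\nQuestion: {question}")
    standalone = rewrite_llm(prompt)

    # 2. Dense retrieval: cosine similarity between query and chunk embeddings.
    q = embed(standalone)
    q = q / np.linalg.norm(q)
    chunk_vecs = np.stack([embed(c) for c in chunks])
    chunk_vecs = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = chunk_vecs @ q
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]
```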

Note: a "turn" in a conversation is one back-and-forth interaction (a user message plus the assistant's response).

  5. Instruction tuning: high-quality instruction-tuning datasets include FLAN, Self-Instruct, Unnatural Instructions, Dolly, and OpenAssistant.

Adding unanswerable samples reduces hallucinations.
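
An illustrative sketch of one way to construct such samples: pair a question with an unrelated context and relabel the target as a refusal, so the model learns to decline when the answer is not in the context. The refusal string and the random-pairing strategy are assumptions for illustration, not the paper's exact recipe.

```python
# Sketch: augment a QA training set with unanswerable (context, question) pairs.
import random

REFUSAL = "Sorry, I cannot find the answer based on the given context."

def add_unanswerable_samples(samples: list[dict], ratio: float = 0.1,
                             seed: int = 0) -> list[dict]:
    """samples: [{'context': ..., 'question': ..., 'answer': ...}, ...].
    For a fraction of questions, swap in a randomly chosen *other* context
    and set the target answer to a refusal."""
    rng = random.Random(seed)
    augmented = list(samples)
    if len(samples) < 2:
        return augmented  # need at least two contexts to mismatch
    for s in rng.sample(samples, max(1, int(len(samples) * ratio))):
        other = rng.choice([x for x in samples if x is not s])
        augmented.append({"context": other["context"],
                          "question": s["question"],
                          "answer": REFUSAL})
    return augmented
```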

ChatQA two-stage instruction tuning method