NovAScore: A New Automated Metric for Evaluating Document Level Novelty
January 2025
We introduce NovAScore (Novelty Evaluation in Atomicity Score), an automated metric for evaluating document-level novelty. NovAScore aggregates the novelty and salience scores of atomic information, providing high interpretability and a detailed analysis of a document's novelty. With its dynamic weight adjustment scheme, NovAScore offers enhanced flexibility and an additional dimension to assess both the novelty level and the importance of information within a document.
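To make the aggregation idea concrete, here is a minimal sketch in Python. It assumes each atomic content unit carries a binary novelty score and a salience flag, and it replaces the paper's dynamic weight adjustment with fixed illustrative weights; `AtomicUnit`, `nova_score`, `w_salient`, and `w_nonsalient` are hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class AtomicUnit:
    text: str
    novelty: float   # 1.0 if absent from prior documents, else 0.0
    salient: bool    # whether the unit carries important information

def nova_score(units: list[AtomicUnit],
               w_salient: float = 1.0,
               w_nonsalient: float = 0.5) -> float:
    """Salience-weighted average of atomic novelty scores.

    Fixed weights are illustrative only; the paper adjusts
    weights dynamically.
    """
    weights = [w_salient if u.salient else w_nonsalient for u in units]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * u.novelty for w, u in zip(weights, units)) / total

# Example: two salient novel units, one non-salient known unit
units = [
    AtomicUnit("Company X acquired Y for $2B.", novelty=1.0, salient=True),
    AtomicUnit("The deal closes in Q3.", novelty=1.0, salient=True),
    AtomicUnit("Company X is based in Austin.", novelty=0.0, salient=False),
]
print(f"NovAScore = {nova_score(units):.2f}")  # weighted toward salient units
```

Weighting salient units more heavily is what lets the metric distinguish a document that repeats known facts with one important new claim from one whose only novelty is peripheral detail.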
An Exploratory Framework for LLM-assisted Human Annotation of Speech Datasets
August 2025
We introduce a framework for LLM-based human-in-the-loop ASR designed to enhance the quality of ASR transcripts, with a particular focus on accurately capturing named entities. A key contribution of this work is demonstrating that when LLMs are provided with high-quality, human-annotated transcript examples, even a small set can significantly reduce the word error rate (WER) and improve named-entity recall.
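A rough illustration of how such few-shot correction prompting might look, assuming a small set of human-corrected transcript pairs; `build_correction_prompt` and the commented-out `llm_complete` call are hypothetical stand-ins, not the framework's actual API.

```python
def build_correction_prompt(examples: list[tuple[str, str]],
                            raw_transcript: str) -> str:
    """Few-shot prompt pairing raw ASR output with human corrections.

    The examples are assumed to come from a small, high-quality
    human-annotated set, as described above.
    """
    shots = "\n\n".join(
        f"ASR output: {raw}\nCorrected: {gold}" for raw, gold in examples
    )
    return (
        "Correct the ASR transcript below. Pay special attention to "
        "named entities (people, places, organizations).\n\n"
        f"{shots}\n\nASR output: {raw_transcript}\nCorrected:"
    )

# Two human-annotated examples anchor entity spellings for the model
examples = [
    ("meet me at the louver museum", "Meet me at the Louvre museum."),
    ("i spoke with doctor smyth", "I spoke with Dr. Smith."),
]
prompt = build_correction_prompt(examples, "she works at open a eye")
# corrected = llm_complete(prompt)  # hypothetical call to any LLM API
```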
Comparison-Based Automatic Evaluation for Meeting Summarization
August 2025
We introduce CREAM (Comparison-based Reference-free Elo-ranked Automatic evaluation for Meeting summarization), a novel framework that addresses the unique challenges of evaluating meeting summaries. CREAM leverages a combination of chain-of-thought reasoning and key facts alignment to assess the conciseness and completeness of model-generated summaries without requiring reference summaries.
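A minimal sketch of the Elo-ranking step, assuming pairwise LLM judgments between candidate summaries; `elo_update`, `rank_summaries`, the `judge` callback, and the K-factor of 32 are illustrative assumptions, not the paper's implementation.

```python
import itertools

def elo_update(r_a: float, r_b: float, a_wins: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update after one pairwise comparison."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    return (r_a + k * (score_a - expected_a),
            r_b + k * ((1.0 - score_a) - (1.0 - expected_a)))

def rank_summaries(summaries: dict[str, str], judge) -> dict[str, float]:
    """All-pairs Elo tournament over candidate summaries.

    `judge(summary_a, summary_b)` is a stand-in for an LLM comparison
    (CREAM's uses chain-of-thought reasoning and key-facts alignment);
    it returns True when the first summary is preferred.
    """
    ratings = {name: 1000.0 for name in summaries}
    for a, b in itertools.combinations(summaries, 2):
        ratings[a], ratings[b] = elo_update(
            ratings[a], ratings[b], judge(summaries[a], summaries[b])
        )
    return ratings
```

Because every judgment is a relative comparison, the resulting ratings order systems without ever needing a gold reference summary, which is the point of the reference-free design.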