This paper introduces a novel approach to creating high-quality diagram descriptions for blind and low-vision (BLV) users by leveraging sighted user feedback on VLM-generated descriptions, rather than asking sighted annotators to write descriptions from scratch.
The key insight is that sighted users can evaluate descriptions effectively even if they aren't skilled at writing BLV-optimized descriptions themselves.
Key Technical Contributions:
Annotation protocol: Designed an efficient protocol for collecting sighted user evaluations of VLM-generated diagram descriptions (a rough sketch of how such feedback records might be structured appears below).
Dataset creation: Released 5 datasets (137K samples across 5K diagrams).
Evaluation: BLV educators rated the descriptions produced with sighted feedback as comparable to or better than expert-written ones in terms of content coverage, sequence, and additional information.
Fine-tuning results: Models fine-tuned on the Sightation datasets showed significant improvements; a minimal fine-tuning sketch is also included below.
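To make the feedback-to-data idea concrete, here is a rough Python sketch of what a sighted-rater feedback record and its conversion into preference pairs could look like. This is my own illustration, not the paper's actual schema: the field names, rating dimensions, scales, and pairing rule are all assumptions.

```python
# Illustrative sketch only -- field names, rating scales, and the pairing rule
# are assumptions, not the paper's actual annotation schema.
from dataclasses import dataclass

@dataclass
class DescriptionFeedback:
    diagram_id: str
    description: str   # VLM-generated description shown to the sighted rater
    coverage: int      # hypothetical 1-5 rating of content coverage
    sequence: int      # hypothetical 1-5 rating of reading order
    overall: int       # hypothetical 1-5 overall usefulness rating

def to_preference_pairs(records):
    """Group feedback by diagram and pair higher-rated descriptions (chosen)
    with lower-rated ones (rejected) for preference tuning."""
    by_diagram = {}
    for r in records:
        by_diagram.setdefault(r.diagram_id, []).append(r)

    pairs = []
    for descs in by_diagram.values():
        ranked = sorted(descs, key=lambda r: r.overall, reverse=True)
        for better, worse in zip(ranked, ranked[1:]):
            if better.overall > worse.overall:
                pairs.append({"chosen": better.description,
                              "rejected": worse.description})
    return pairs
```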
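And here is a minimal sketch of how such preference pairs could be used for fine-tuning with TRL's DPOTrainer. This is not the paper's training setup: the model name and data file are placeholders, the sketch is text-only (the paper works with vision-language models and diagram images), and the processing_class argument assumes a recent TRL release (older versions use tokenizer instead).

```python
# Minimal DPO fine-tuning sketch, assuming preference pairs saved as JSON with
# "prompt"/"chosen"/"rejected" fields and a recent TRL release. Model and file
# names are placeholders, not the models or data used in the paper.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each row: {"prompt": ..., "chosen": ..., "rejected": ...}
pairs = load_dataset("json", data_files="diagram_description_pairs.json",
                     split="train")

args = DPOConfig(output_dir="diagram-desc-dpo", per_device_train_batch_size=2)
trainer = DPOTrainer(model=model, args=args, train_dataset=pairs,
                     processing_class=tokenizer)
trainer.train()
```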
I think this approach could be a game-changer for accessibility. Rather than relying on expensive BLV expert annotations or settling for lower-quality direct annotations from sighted users, this feedback-based approach produces high-quality descriptions at scale. The methodology could extend beyond diagrams to other visual accessibility challenges where the consumer and producer of descriptions have different visual abilities.
TLDR: The researchers created a method and datasets that use sighted user feedback on AI-generated diagram descriptions to create high-quality, BLV-aligned content. Models fine-tuned on these datasets produce significantly better descriptions for visually impaired users.
Full summary is here. Paper here.