This paper introduces a novel approach to creating high-quality diagram descriptions for blind and low-vision (BLV) users by leveraging sighted user feedback on VLM-generated descriptions, rather than asking sighted annotators to write descriptions from scratch.
The key insight is that sighted users can evaluate descriptions effectively even if they aren't skilled at writing BLV-optimized descriptions themselves.
Key Technical Contributions:
Annotation protocol: Designed an efficient protocol for collecting sighted user evaluations of VLM-generated diagram descriptions (a rough sketch of how such feedback records might be structured appears below).
Dataset creation: Released 5 datasets (137K samples across 5K diagrams).
Evaluation: BLV educators rated the descriptions produced with sighted feedback as comparable to or better than expert-written ones in terms of content coverage, sequence, and additional information.
Fine-tuning results: Models fine-tuned on the Sightation datasets showed significant improvements; a minimal fine-tuning sketch is also included below.
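To make the feedback-to-data idea concrete, here is a rough Python sketch of what a sighted-rater feedback record and its conversion into preference pairs could look like. This is my own illustration, not the paper's actual schema: the field names, rating dimensions, scales, and pairing rule are all assumptions.

```python
# Illustrative sketch only -- field names, rating scales, and the pairing rule
# are assumptions, not the paper's actual annotation schema.
from dataclasses import dataclass

@dataclass
class DescriptionFeedback:
    diagram_id: str
    description: str   # VLM-generated description shown to the sighted rater
    coverage: int      # hypothetical 1-5 rating of content coverage
    sequence: int      # hypothetical 1-5 rating of reading order
    overall: int       # hypothetical 1-5 overall usefulness rating

def to_preference_pairs(records):
    """Group feedback by diagram and pair higher-rated descriptions (chosen)
    with lower-rated ones (rejected) for preference tuning."""
    by_diagram = {}
    for r in records:
        by_diagram.setdefault(r.diagram_id, []).append(r)

    pairs = []
    for descs in by_diagram.values():
        ranked = sorted(descs, key=lambda r: r.overall, reverse=True)
        for better, worse in zip(ranked, ranked[1:]):
            if better.overall > worse.overall:
                pairs.append({"chosen": better.description,
                              "rejected": worse.description})
    return pairs
```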
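And here is a minimal sketch of how such preference pairs could be used for fine-tuning with TRL's DPOTrainer. This is not the paper's training setup: the model name and data file are placeholders, the sketch is text-only (the paper works with vision-language models and diagram images), and the processing_class argument assumes a recent TRL release (older versions use tokenizer instead).

```python
# Minimal DPO fine-tuning sketch, assuming preference pairs saved as JSON with
# "prompt"/"chosen"/"rejected" fields and a recent TRL release. Model and file
# names are placeholders, not the models or data used in the paper.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each row: {"prompt": ..., "chosen": ..., "rejected": ...}
pairs = load_dataset("json", data_files="diagram_description_pairs.json",
                     split="train")

args = DPOConfig(output_dir="diagram-desc-dpo", per_device_train_batch_size=2)
trainer = DPOTrainer(model=model, args=args, train_dataset=pairs,
                     processing_class=tokenizer)
trainer.train()
```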
I think this approach could be a game-changer for accessibility. Rather than relying on expensive BLV expert annotations or settling for lower-quality direct annotations from sighted users, this feedback-based approach produces high-quality descriptions at scale. The methodology could extend beyond diagrams to other visual accessibility challenges where the consumer and producer of descriptions have different visual abilities.
TLDR: The researchers created a method and datasets that use sighted user feedback on AI-generated diagram descriptions to create high-quality, BLV-aligned content. Models fine-tuned on these datasets produce significantly better descriptions for visually impaired users.
Full summary is here. Paper here.