Repository for the Bias Benchmark for QA dataset.

Last update: Nov 18, 2022

Overview

BBQ

Authors: Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, and Samuel R. Bowman.

About BBQ

It is well documented that NLP models learn social biases present in the world, but little work has been done to show how these biases manifest in actual model outputs for applied tasks like question answering (QA). We introduce the Bias Benchmark for QA (BBQ), a dataset consisting of question-sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine different social dimensions relevant for U.S. English-speaking contexts. Our task evaluates model responses at two distinct levels: (i) given an under-informative context, test how strongly model answers reflect social biases, and (ii) given an adequately informative context, test whether the model’s biases still override a correct answer choice. We find that models strongly rely on stereotypes when the context is ambiguous, meaning that the model’s outputs consistently reproduce harmful biases in this setting. Though models are much more accurate when the context provides an unambiguous answer, they still rely on stereotyped information and achieve an accuracy 2.5 percentage points higher on examples where the correct answer aligns with a social bias, with this accuracy difference widening to over 5 points for examples targeting gender.
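The accuracy difference mentioned above is a plain subtraction of two accuracies, expressed in percentage points. The following is a minimal sketch of that arithmetic only; it is not the scoring code used for BBQ. The record keys `correct` and `bias_aligned` are hypothetical names chosen for illustration and are not fields of the released files.

```python
from typing import Iterable, Mapping


def accuracy(records: Iterable[Mapping[str, bool]]) -> float:
    """Fraction of records whose (hypothetical) 'correct' flag is true."""
    records = list(records)
    if not records:
        raise ValueError("cannot compute accuracy of an empty group")
    return sum(bool(r["correct"]) for r in records) / len(records)


def accuracy_gap_points(records: Iterable[Mapping[str, bool]]) -> float:
    """Accuracy on bias-aligned examples minus accuracy on the rest,
    in percentage points. 'bias_aligned' is an assumed, illustrative key."""
    records = list(records)
    aligned = [r for r in records if r["bias_aligned"]]
    other = [r for r in records if not r["bias_aligned"]]
    return 100.0 * (accuracy(aligned) - accuracy(other))
```

A positive value means the model is more accurate when the correct answer matches the stereotype, which is the pattern the abstract describes.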

The paper

You can read our paper "BBQ: A Hand-Built Bias Benchmark for Question Answering" here.

File structure

data
- Description: This folder contains each set of generated examples for BBQ. This is the folder you would use to test BBQ.
- Contents: 11 jsonl files, each containing all templated examples. Each category is a separate file.
results
- Description: This folder contains our results after running BBQ on UnifiedQA.
- Contents: 11 jsonl files, each containing all templated examples and three sets of results for each example line:
  - Predictions using ARC-format
  - Predictions using RACE-format
  - Predictions using a question-only baseline
supplemental
- Description: Additional files used in validation and selecting names for the vocabulary
- Contents:
  - MTurk_validation contains the HIT templates, scripts, input data, and results from our MTurk validations
  - name_job_data contains downloaded files with name and demographic information or occupation prestige scores, used to develop these portions of the vocabulary
templates
- Description: This folder contains all the templates and vocabulary used to create BBQ
- Contents: 11 csv files that contain the templates used in BBQ, 1 csv file listing all filler items used in the validation, 2 csv files for the BBQ vocabulary.
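
Since each category in the data folder is a separate jsonl file, a quick way to check a download is to count the examples per file. The sketch below assumes only that each non-empty line is one JSON object; it makes no assumption about field names. The helper names `iter_jsonl` and `count_examples` are illustrative and are not part of this repository.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_examples(data_dir: str) -> dict[str, int]:
    """Map each .jsonl file name (without extension) to its example count."""
    counts = {}
    for path in sorted(Path(data_dir).glob("*.jsonl")):
        counts[path.stem] = sum(1 for _ in iter_jsonl(path))
    return counts
```

Calling `count_examples("data")` from the repository root should return one entry per category file, assuming the folder layout described above.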

Owner

ML² AT CILVR
