By Xavier Ferrer
Research Associate at King's College London, Informatics
and Tom Van Nuenen
Research Associate at King's College London, Informatics

Updated Aug 2020: The approach used to discover the biases presented in this tool is described in the paper "Discovering and Categorising Language Biases in Reddit", accepted at the forthcoming International AAAI Conference on Web and Social Media (ICWSM 2021). If you are interested in training your own embedding models to discover biases, we share our code on GitHub.

Updated May 2021: A new version of the methodology to discover biases in language communities is described in the paper "Discovering and Interpreting Conceptual Biases in Online Communities". We have also developed a new Language Bias Visualiser with new options and analyses!

Language Bias Visualiser

The DADD (Discovering and Attesting Digital Discrimination) Language Bias Visualiser is a tool for interactively comparing stereotypes about men and women inherent in large textual datasets taken from the internet, as captured by word embedding models.

Language carries implicit biases, functioning both as a reflection and a perpetuation of the stereotypes that people carry with them. Recently, advances in Artificial Intelligence have made it possible to use machine learning techniques to trace linguistic biases. One of the most promising approaches in this field involves word embeddings, which transform text into high-dimensional vectors that capture semantic relations between words, and which have been successfully used to quantify human biases in large textual datasets. Target concepts such as `men' or `women' are connected to evaluative attributes found in the data; these attributes are then grouped through clustering algorithms and labelled by a semantic analysis system into more general (conceptual) biases. Categorising biases in this way allows us to give a broad picture of the biases present in a community.
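
To make this concrete, here is a minimal sketch of how a per-word gender bias score can be computed from a gensim word2vec model: a word's bias is the difference between its average cosine similarity to a women target set and to a men target set. The model file name and the target sets are illustrative, not the exact configuration used in the papers above.

    from gensim.models import Word2Vec

    WOMEN = ["she", "her", "woman", "women", "girl"]
    MEN = ["he", "him", "man", "men", "boy"]

    # Hypothetical word2vec model trained on the selected community's comments.
    wv = Word2Vec.load("dating_advice.model").wv

    def gender_bias(word):
        """Positive -> closer to the women target set; negative -> closer to the men set."""
        to_women = sum(wv.similarity(word, t) for t in WOMEN) / len(WOMEN)
        to_men = sum(wv.similarity(word, t) for t in MEN) / len(MEN)
        return to_women - to_men

    # Rank the rest of the vocabulary from most women-biased to most men-biased.
    targets = set(WOMEN + MEN)
    ranked = sorted((w for w in wv.key_to_index if w not in targets),
                    key=gender_bias, reverse=True)
    print(ranked[:20])   # most women-biased words
    print(ranked[-20:])  # most men-biased words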


Dataset Selection

Find a link to the online community below:
https://reddit.com/r/dating_advice



Most Frequent Gender-Biased Words

The word clouds presented below show the most frequent words biased towards women (left) and men (right) in the selected dataset, that is, the words found more often in women- and men-related contexts. The size and color of each word correspond to its frequency: bigger means more frequent. For details about each word, see the section Detailed Dataset Word Biases.
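
As a rough illustration of how such a cloud is drawn, the sketch below uses the wordcloud package on a word-to-frequency mapping; the counts are toy values, not figures from the dataset.

    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    # Toy frequency counts; the visualiser derives these from the Reddit comments.
    women_freqs = {"beautiful": 120, "emotional": 85, "mother": 60, "dress": 40}

    cloud = WordCloud(width=600, height=400, background_color="white")
    cloud.generate_from_frequencies(women_freqs)

    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()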


[Word clouds: women-biased words (left) and men-biased words (right)]


Detailed Dataset Word Biases

This section details the most frequent gender-biased words for women (left) and men (right) in the dataset, listing each word's bias strength, frequency, sentiment, and part of speech.


Women

# | Word | Bias | Freq | Sent | POS

Men

# | Word | Bias | Freq | Sent | POS
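
For illustration, one row of these tables could be assembled as in the sketch below, using VADER for sentiment and NLTK for part-of-speech tagging; the actual tools behind the visualiser may differ, and the word, frequency, and bias values are placeholders.

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    sia = SentimentIntensityAnalyzer()

    def table_row(word, freq, bias):
        sent = sia.polarity_scores(word)["compound"]  # -1 (negative) .. 1 (positive)
        pos = nltk.pos_tag([word])[0][1]              # Penn Treebank tag, e.g. 'JJ'
        return {"Word": word, "Bias": round(bias, 3), "Freq": freq,
                "Sent": round(sent, 2), "POS": pos}

    print(table_row("beautiful", 120, 0.21))  # placeholder frequency and bias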

Word Distributions of Biases

Explore the bias and frequency distributions of all gender-biased words in the dataset in two bar plots; women-biased words are shown on the left bar plot and men-biased words on the right. By comparing the distributions, one can observe differences between genders in the dataset. For instance, in The Red Pill, although men-biased words are more frequent, women-biased words carry stronger gender biases.
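
A minimal sketch of such a side-by-side comparison with matplotlib, using toy bias scores rather than values from any dataset:

    import matplotlib.pyplot as plt

    women_bias = [0.12, 0.25, 0.31, 0.08, 0.19]  # toy per-word bias scores
    men_bias = [0.05, 0.14, 0.22, 0.11, 0.09]

    fig, (ax_w, ax_m) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
    ax_w.hist(women_bias, bins=10, color="tab:purple")
    ax_w.set_title("Women-biased words")
    ax_w.set_ylabel("number of words")
    ax_m.hist(men_bias, bins=10, color="tab:blue")
    ax_m.set_title("Men-biased words")
    for ax in (ax_w, ax_m):
        ax.set_xlabel("bias strength")
    plt.tight_layout()
    plt.show()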



Bias Polarity

Explore the sentiment of the most gender-biased words for women (left) and men (right), classified into 7 categories ranging from positive to negative.
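
The exact category boundaries are not listed on this page; as an assumption, the sketch below bins a VADER-style compound score in [-1, 1] into 7 equal-width categories.

    LABELS = ["very negative", "negative", "slightly negative", "neutral",
              "slightly positive", "positive", "very positive"]

    def polarity_category(compound):
        # compound is in [-1, 1]; split that range into 7 equal-width bins
        idx = min(int((compound + 1) / 2 * 7), 6)
        return LABELS[idx]

    print(polarity_category(-0.8))  # "very negative"
    print(polarity_category(0.05))  # "neutral"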


Word Embedding Space

Explore the distribution of women- and men-biased words in the embedding space learned by the machine learning algorithm, represented in the two principal t-SNE dimensions.
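
A minimal sketch of this kind of projection with scikit-learn's t-SNE, assuming a hypothetical gensim model file and an illustrative word list:

    import numpy as np
    import matplotlib.pyplot as plt
    from gensim.models import Word2Vec
    from sklearn.manifold import TSNE

    wv = Word2Vec.load("dating_advice.model").wv  # hypothetical model file

    words = ["beautiful", "emotional", "strong", "logical", "gym", "career"]
    vectors = np.array([wv[w] for w in words])

    # Project the high-dimensional word vectors onto two t-SNE dimensions.
    coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(vectors)

    plt.scatter(coords[:, 0], coords[:, 1])
    for (x, y), w in zip(coords, words):
        plt.annotate(w, (x, y))
    plt.show()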



Concept Embedding Space

The figure below shows the distribution of women and men concepts in the embedding space for the selected dataset, presented in the two most informative t-SNE dimensions.
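
One plausible way to obtain concept-level points, sketched under the assumption that each concept is the centroid of the word vectors in one cluster of biased words (the clusters and model file below are illustrative):

    import numpy as np
    from gensim.models import Word2Vec

    wv = Word2Vec.load("dating_advice.model").wv  # hypothetical model file

    # Illustrative clusters; the real ones come from the clustering step above.
    clusters = {
        "appearance": ["beautiful", "pretty", "cute"],
        "intellect": ["smart", "logical", "rational"],
    }

    # One vector per concept: the centroid of its word vectors. These can then
    # be projected with the same t-SNE step used for individual words.
    concept_vectors = {name: np.mean([wv[w] for w in words], axis=0)
                       for name, words in clusters.items()}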
