Updated May 2021: A new version of the methodology to discover biases in language communities is described in the paper Discovering and Interpreting Conceptual Biases in Online Communities. We have also developed a new Language Bias Visualiser with new options and analyses!
Language Bias Visualiser
The DADD (Discovering and Attesting Digital Discrimination) Language Bias Visualiser is a tool for interactively comparing the stereotypes of women and men inherent in large textual datasets taken from the internet, as captured by word embedding models.
Language carries implicit biases, functioning both as a reflection and a perpetuation of the stereotypes that people carry with them. Recently, advances in Artificial Intelligence have made it possible to use machine learning techniques to trace linguistic biases. One of the most promising approaches in this field involves word embeddings, which transform text into high-dimensional vectors that capture semantic relations between words, and which have been successfully used to quantify human biases in large textual datasets. Target concepts such as 'men' or 'women' are connected to evaluative attributes found in the data; these attributes are then categorised through clustering algorithms and labelled by a semantic analysis system into more general (conceptual) biases. Categorising biases in this way allows us to give a broad picture of the biases present in a community.
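To make this concrete, below is a minimal sketch of how word-level gender bias can be quantified with word embeddings, assuming a trained gensim KeyedVectors model. The target word lists, the 0.05 threshold, and the cosine-similarity-difference measure are illustrative assumptions, not necessarily the exact procedure used by the methodology paper or the Visualiser.

```python
# Minimal sketch: quantifying word-level gender bias with word embeddings.
# The target word lists and the cosine-similarity-difference measure are
# illustrative assumptions, not the Visualiser's exact procedure.
import numpy as np
from gensim.models import KeyedVectors

WOMEN_TARGETS = ["woman", "women", "she", "her", "girl"]
MEN_TARGETS = ["man", "men", "he", "his", "boy"]

def centroid(model: KeyedVectors, words: list) -> np.ndarray:
    """Average the embedding vectors of the target words present in the model."""
    return np.mean([model[w] for w in words if w in model], axis=0)

def gender_bias(model: KeyedVectors, word: str) -> float:
    """Positive: closer to the women centroid; negative: closer to the men centroid."""
    v = model[word]
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos(v, centroid(model, WOMEN_TARGETS)) - cos(v, centroid(model, MEN_TARGETS))

# model = KeyedVectors.load("community_embeddings.kv")  # hypothetical file
# women_biased = [w for w in model.index_to_key if gender_bias(model, w) > 0.05]
```

Under this measure, words with large positive scores would populate the women-biased views below and words with large negative scores the men-biased ones.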
Explore the data
Click on any of the cards to explore different interactive approaches to discovering stereotypes of women and men and gender-related biases in the selected dataset.
Most Frequent Gender-Biased Words
The word clouds below show the most frequent words biased towards women (left) and men (right) in the selected dataset, that is, the words most often found in women- and men-related contexts. The size and color of each word correspond to its frequency: bigger means more frequent. A sketch of how such clouds can be rendered follows the panels below. For details about each word, see the Detailed Dataset Word Biases section.
[Interactive word clouds: Women (left) | Men (right)]
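As a rough illustration, frequency-sized clouds like these could be rendered with the third-party wordcloud package; the frequency dictionaries below are placeholders, not real dataset counts.

```python
# Sketch: rendering frequency-sized word clouds with the third-party
# `wordcloud` package. The frequency dictionaries are illustrative
# placeholders for the biased-word counts extracted from a dataset.
import matplotlib.pyplot as plt
from wordcloud import WordCloud

women_freqs = {"beautiful": 120, "mother": 95, "emotional": 60}  # illustrative
men_freqs = {"strong": 110, "leader": 80, "logical": 55}         # illustrative

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, freqs, title in zip(axes, (women_freqs, men_freqs), ("Women", "Men")):
    cloud = WordCloud(width=600, height=400).generate_from_frequencies(freqs)
    ax.imshow(cloud, interpolation="bilinear")
    ax.set_title(title)
    ax.axis("off")
plt.show()
```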
Detailed Dataset Word Biases
This section shows the details of the most frequent and gender-biased words for women (left) and men (right) in the dataset: for each word, the tables report its bias strength (Bias), corpus frequency (Freq), sentiment (Sent), and part of speech (POS). A sketch of how such a row might be assembled follows the two tables.
Women
[Interactive table with columns: # | Word | Bias | Freq | Sent | POS]
Men
[Interactive table with columns: # | Word | Bias | Freq | Sent | POS]
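Here is a sketch of how one such row might be assembled, using NLTK's VADER sentiment analyser and POS tagger as stand-ins for whatever pipeline the tool actually uses; the bias and frequency values are assumed to be precomputed (e.g. with the bias sketch above and a simple corpus word count).

```python
# Sketch: assembling one table row per word. NLTK's VADER analyser and
# POS tagger are stand-ins; the Visualiser's actual sentiment and POS
# pipeline may differ. `bias` and `freq` are assumed precomputed inputs.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sia = SentimentIntensityAnalyzer()

def table_row(rank: int, word: str, bias: float, freq: int) -> dict:
    sent = sia.polarity_scores(word)["compound"]  # -1 (negative) .. +1 (positive)
    pos = nltk.pos_tag([word])[0][1]              # Penn Treebank tag, e.g. "NN"
    return {"#": rank, "Word": word, "Bias": round(bias, 3),
            "Freq": freq, "Sent": round(sent, 3), "POS": pos}

# table_row(1, "beautiful", 0.31, 120)  # illustrative call
```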
Word Distributions of Biases
Explore the bias and frequency distributions of all gender-biased words in the dataset in two bar plots: women-biased words on the left and men-biased words on the right. By comparing the distributions, one can observe differences between genders in the dataset. For instance, in The Red Pill, although men-biased words are more frequent, women-biased words carry stronger gender biases.
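A minimal matplotlib sketch of the side-by-side bar plots, with placeholder values; a frequency plot would be built the same way with counts instead of bias scores.

```python
# Sketch: side-by-side bar plots of bias strength per word.
# All values are illustrative placeholders, not real dataset results.
import matplotlib.pyplot as plt

women = {"beautiful": 0.31, "mother": 0.27, "emotional": 0.22}  # word -> bias
men = {"strong": 0.18, "leader": 0.15, "logical": 0.12}

fig, (ax_w, ax_m) = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
for ax, data, title in ((ax_w, women, "Women-biased words"),
                        (ax_m, men, "Men-biased words")):
    ax.bar(list(data), list(data.values()))
    ax.set_title(title)
ax_w.set_ylabel("Bias strength")
plt.show()
```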
Bias Polarity
Explore the sentiment of the most gender-biased words for women (left) and men (right), classified into seven categories ranging from positive to negative meanings.
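One simple way to produce such a seven-way classification is to bin a sentiment score in [-1, 1] into equal-width categories; the thresholds and labels below are assumptions, not the tool's documented scheme.

```python
# Sketch: binning a sentiment score in [-1, 1] into seven ordered
# categories. The equal-width thresholds and labels are assumptions,
# not necessarily the Visualiser's exact scheme.
LABELS = ["very positive", "positive", "slightly positive", "neutral",
          "slightly negative", "negative", "very negative"]

def sentiment_category(score: float) -> str:
    """Map a score in [-1, 1] to one of seven labels (most positive first)."""
    idx = min(int((1 - score) / (2 / 7)), 6)  # seven equal-width bins
    return LABELS[idx]

assert sentiment_category(1.0) == "very positive"
assert sentiment_category(0.0) == "neutral"
assert sentiment_category(-1.0) == "very negative"
```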
Word Embedding Space
Explore the distribution of women- and men-biased words in the embedding space learned by a machine learning algorithm, projected onto two dimensions with t-SNE.
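A sketch of such a projection with scikit-learn's t-SNE, assuming a gensim KeyedVectors model and precomputed lists of biased words; the colour scheme is an illustrative choice.

```python
# Sketch: projecting word vectors onto two dimensions with scikit-learn's
# t-SNE and colouring points by gender bias, as in the figure described
# above. `model` is an assumed gensim KeyedVectors instance.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embedding_space(model, women_words, men_words):
    words = list(women_words) + list(men_words)
    X = np.array([model[w] for w in words])
    coords = TSNE(n_components=2, perplexity=min(30, len(words) - 1),
                  random_state=0).fit_transform(X)
    colors = ["tab:red"] * len(women_words) + ["tab:blue"] * len(men_words)
    plt.scatter(coords[:, 0], coords[:, 1], c=colors)
    for (x, y), w in zip(coords, words):
        plt.annotate(w, (x, y))
    plt.show()
```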
Concept Embedding Space
The figure below shows the distribution of women and men concepts in the embedding space for the selected dataset, projected onto two dimensions with t-SNE.
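The concept-level view can be sketched by first clustering the biased words into concepts (here with k-means, an assumption about the clustering step) and then projecting the cluster centroids with t-SNE; the number of concepts is also an illustrative choice.

```python
# Sketch: grouping biased words into concepts with k-means and projecting
# the concept centroids with t-SNE, loosely following the clustering step
# of the methodology. The number of concepts is an illustrative assumption.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

def concept_centroids(model, words, n_concepts=10):
    """One centroid vector per discovered concept (cluster of biased words)."""
    X = np.array([model[w] for w in words if w in model])
    return KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(X).cluster_centers_

# centroids = np.vstack([concept_centroids(model, women_words),
#                        concept_centroids(model, men_words)])
# coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(centroids)
```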