Lipstick on a Pig

Unfortunately, there's a big flaw in the linear projection trick.

We will repeat the steps in this guide. First, you'll need to download the two word-lists found here.

import pathlib
from whatlies.transformers import Pca, Umap
from whatlies.language import SpacyLanguage, FasttextLanguage

# Read the word-lists, one word per line.
male_word = pathlib.Path("male-words.txt").read_text().split("\n")
female_word = pathlib.Path("female-words.txt").read_text().split("\n")

# FastText vectors trained on Common Crawl, 300 dimensions.
lang = FasttextLanguage("cc.en.300.bin")

# Attach a "group" property so we can recover the labels later on.
e1 = lang[male_word].add_property("group", lambda d: "male")
e2 = lang[female_word].add_property("group", lambda d: "female")

# The `|` operator removes the 'man' - 'woman' axis from every embedding.
emb_debias = e1.merge(e2) | (lang['man'] - lang['woman'])
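The `|` operator on the last line subtracts, from every embedding, its component along the `lang['man'] - lang['woman']` axis. A minimal numpy sketch of that projection-removal step (an illustration of the idea, not the whatlies internals):

```python
import numpy as np

def reject(v, axis):
    """Remove the component of v that lies along axis."""
    axis = axis / np.linalg.norm(axis)
    return v - np.dot(v, axis) * axis

gender_axis = np.array([1.0, 0.0])  # stand-in for lang['man'] - lang['woman']
word_vec = np.array([0.6, 0.8])     # stand-in for a single word embedding
debiased = reject(word_vec, gender_axis)
print(debiased)  # the component along the axis is gone: [0.  0.8]
```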

We now have a debiased embeddingset emb_debias that we can use in a scikit-learn pipeline.

from sklearn.svm import SVC
from sklearn.pipeline import Pipeline

# There is overlap in the word-lists, which we remove via `set`.
words = list(male_word) + list(female_word)
words = list(set(words))
labels = [w in male_word for w in words]

# We use our language backend as a transformer in scikit-learn.
pipe = Pipeline([
    ("embed", lang),
    ("model", SVC())
])

Method I: Biased Embedding, Biased Model

To train and evaluate this standard model without any debiasing:

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(words, labels)
y_pred = pipe.fit(X_train, y_train).predict(X_test)

print(classification_report(y_pred, y_test))

This gives us the following results.

              precision    recall  f1-score   support

       False       0.87      0.92      0.90        93
        True       0.94      0.89      0.91       116

    accuracy                           0.90       209
   macro avg       0.90      0.91      0.90       209
weighted avg       0.91      0.90      0.90       209

Method II: Unbiased Embedding, Biased Model

If we now apply debiasing to the vectors, one might expect the old model to no longer be able to predict gender.

# Fetch the debiased vectors together with the "group" labels we attached earlier.
X, y = emb_debias.to_X_y('group')
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Re-use the SVC that was fitted on the *biased* vectors in Method I.
y_pred = pipe.steps[1][1].predict(X_test)
print(classification_report(y_pred, y_test == 'male'))
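Note the `pipe.steps[1][1]` indexing: it pulls the already-fitted SVC out of the pipeline, so the debiased vectors go straight into the old model without being re-embedded. On any fitted scikit-learn `Pipeline` you can retrieve a step like this (a self-contained sketch with toy data, not the fastText setup above):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([("scale", StandardScaler()), ("model", SVC())])
pipe.fit(X, y)

# Two equivalent ways to grab the fitted second step:
svc = pipe.steps[1][1]
assert svc is pipe.named_steps["model"]

# Calling the step directly bypasses the earlier transform steps,
# so the input is interpreted as if it were already scaled.
print(svc.predict(np.array([[-1.0], [1.0]])))  # [0 1]
```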

This gives us the following result.

              precision    recall  f1-score   support

       False       0.97      0.73      0.83       131
        True       0.68      0.96      0.79        78

    accuracy                           0.81       209
   macro avg       0.82      0.84      0.81       209
weighted avg       0.86      0.81      0.82       209

Method III: Unbiased Embedding, Unbiased Model

We can also try to create a model that is both trained and applied on the unbiased vectors.

# Train a fresh SVC directly on the debiased vectors.
y_pred = SVC().fit(X_train, y_train).predict(X_test)
print(classification_report(y_pred, y_test))

This gives us the following result.

              precision    recall  f1-score   support

      female       0.80      0.83      0.81        94
        male       0.86      0.83      0.84       115

    accuracy                           0.83       209
   macro avg       0.83      0.83      0.83       209
weighted avg       0.83      0.83      0.83       209


Cosine distance suggests that we're able to remove the gender "direction" from our embeddings by using linear projections as a debiasing technique. However, if we use the debiased embeddings to predict gender, it turns out we still retain plenty of predictive power.
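This effect is easy to reproduce with synthetic data: project out the single axis along which two groups differ on average, and a non-linear classifier can still separate the groups using the geometry that remains. A toy sketch (hypothetical 2D clusters, not the fastText experiment above):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two clusters that differ along a "bias" axis *and* in how spread out they are.
bias = np.array([1.0, 0.0])
a = rng.normal([ 2.0, 0.0], [0.3, 0.3], size=(200, 2))
b = rng.normal([-2.0, 0.0], [1.0, 1.0], size=(200, 2))
X = np.vstack([a, b])
y = np.array([1] * 200 + [0] * 200)

# Project out the bias axis: the groups no longer differ along it ...
X_debiased = X - np.outer(X @ bias, bias)

# ... yet an SVC still separates them via the difference in spread.
acc = SVC().fit(X_debiased, y).score(X_debiased, y)
print(f"accuracy after debiasing: {acc:.2f}")
```

The average difference between the groups is gone, but second-order structure (here, the variance) survives the projection, which is exactly the kind of residual signal a kernel SVM can exploit.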

That's why we cannot say that this debiasing technique is enough! There are plenty of reasons to remain careful and critical when applying word embeddings in practice.


Try to answer the following questions to test your knowledge.

  1. Can you think of a reason why our projection trick doesn't seem to work, even though we do see an effect in cosine similarity?

2016-2022 © Rasa.