Fairness-Aware Data Mining | Software and Data Sets

Author : Toshihiro Kamishima
Copyright : Copyright © 2012 Toshihiro Kamishima all rights reserved
License : MIT License

The goal of fairness-aware classification is to learn a classifier from a potentially unfair training data set such that the specified sensitive features do not influence the classification outcome.
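
As a rough illustration of what "influence" can mean here, one simple check is to compare the rate of positive predictions across the groups defined by a sensitive feature. The following is a minimal sketch, assuming binary predictions and a binary sensitive feature; the function names and the toy data are hypothetical and are not part of the released code.

```python
def positive_rate(predictions, sensitive, group):
    """Fraction of positive predictions among samples in the given group."""
    selected = [p for p, s in zip(predictions, sensitive) if s == group]
    if not selected:
        raise ValueError("no samples for group %r" % (group,))
    return sum(selected) / float(len(selected))


def rate_difference(predictions, sensitive, group_a=0, group_b=1):
    """Difference in positive-prediction rates between two sensitive groups.

    A value near 0 means the predicted outcome is roughly independent of
    the sensitive feature on this sample.
    """
    return (positive_rate(predictions, sensitive, group_a)
            - positive_rate(predictions, sensitive, group_b))


# Toy data, made up for illustration: 1 = positive class.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
gap = rate_difference(predictions, sensitive)
```

A gap far from zero on held-out data would suggest that the sensitive feature still affects the outcome, directly or through correlated features.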

If you use these program codes or data sets, please acknowledge them in your publications by citing one of our related publications.

Software

Program code used to obtain the experimental results.

Release 1.0.0

This is test code for fairness-aware classification. It produces the results of Table 1 and Figure 1 in [ECMLPKDD12].

This software is written in Python. We tested it in the following environment: Python 2.7, NumPy 1.6, and SciPy 0.10. The readme.rst file in the archive below contains detailed instructions. The script is designed to process the “Discretized US Census Income / Adult Data set” described in the Data Sets section below.

Download : 2012ecmlpkdd.tgz
kamfadm@GitHub
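
To find the readme.rst file once the archive has been downloaded, something like the following sketch may help. It assumes the archive is a gzip-compressed tar file saved locally; the helper names are hypothetical, and the location of readme.rst inside the archive is not assumed but searched for.

```python
import tarfile


def find_readme(archive_path, name="readme.rst"):
    """Return paths of regular files in the archive whose file name matches.

    The comparison is case-insensitive and ignores the directory part.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        return [m.name for m in archive.getmembers()
                if m.isfile() and m.name.split("/")[-1].lower() == name]


def read_member(archive_path, member_name):
    """Return the text of one archive member, decoded as UTF-8."""
    with tarfile.open(archive_path, "r:gz") as archive:
        handle = archive.extractfile(member_name)
        return handle.read().decode("utf-8")
```

For example, with the archive saved in the working directory, `find_readme("2012ecmlpkdd.tgz")` lists the candidate paths, and `read_member` returns the text of one of them.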

Data Sets

Scripts to convert public data sets and to generate synthetic data sets.

Discretized US Census Income / Adult Data set

The adultd data set is generated by discretizing the US Census Income / Adult data set. The discretization procedure is described in:

[Calders+2010] T. Calders and S. Verwer, “Three naive Bayes approaches for discrimination-free classification,” Data Mining and Knowledge Discovery, vol. 21 (2010)

Download : data-adultd.tgz
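
Discretization of a numeric attribute, as mentioned above, can be sketched as mapping each value to the index of the interval that contains it. This is only an illustration: the cut points below are hypothetical, and the bins actually used for adultd are those defined in [Calders+2010].

```python
from bisect import bisect_right


def discretize(value, edges):
    """Return the index of the bin containing value.

    edges must be sorted in ascending order. Bin 0 holds values below
    edges[0]; bin len(edges) holds values >= edges[-1].
    """
    return bisect_right(edges, value)


# Hypothetical cut points for an age-like attribute (not the adultd bins).
age_edges = [25, 45, 65]
ages = [19, 25, 44, 45, 70]
age_bins = [discretize(a, age_edges) for a in ages]
```

Each numeric value is replaced by a small integer code, which is the form of input that classifiers for categorical features, such as naive Bayes, expect.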

Kamishima’s Synthetic Data Set 1

The sdata_kam1 data set is a synthetic data set that was used to obtain the results of Fig. 2 and Table 2 in [ECMLPKDD12].

Download : sdata_kam1.tgz