Topic available: yes_one_instance

Topic available also for a group: no

Topic #10: Missing Data Imputation for Supervised Learning with Variational Autoencoder (VAE)

Background

Many real-world datasets contain missing values for various reasons. However, missing values are usually handled crudely, e.g., by removing the samples or features that contain them or by imputing them with a constant. Such treatment may lose crucial information, especially when the data are missing not at random. For example, in electronic health records, a patient with many missing values is likely to be healthier than the rest, because such a patient may not go for lab tests often. The main objectives are: 1. to develop more sophisticated missing data imputation algorithms that use a variational autoencoder to take both label and feature information into account; 2. to study the identifiability of the missingness mechanism. You will work on real-world datasets such as MIMIC-III or the UCI datasets, and aim to publish an article together with collaborating researchers.
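
To make the health-record example concrete, here is a minimal simulation sketch. It is not part of the topic's method. Everything in it is an assumption chosen for illustration: a single hypothetical lab value `x` (higher means sicker), a noisy binary label `y`, and missingness probabilities of 0.8 for low values and 0.2 for high values. It compares the mean of the observed values with the true mean, and the rate of sick patients among rows with and without a missing value.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Simulate (lab value, label, observed flag) rows.

    All distributions and probabilities are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                       # hypothetical lab value, higher = sicker
        y = 1 if x + rng.gauss(0.0, 0.5) > 0 else 0   # noisy label: 1 = sick, 0 = healthy
        # Missingness depends on the value itself (missing not at random):
        # low (healthy) values are rarely measured.
        p_missing = 0.8 if x < 0 else 0.2
        observed = rng.random() >= p_missing
        rows.append((x, y, observed))
    return rows


rows = simulate()

# The true mean uses every value, including those that would be hidden in practice.
true_mean = statistics.fmean(x for x, _, _ in rows)

# Dropping incomplete rows (or imputing with the observed mean) only ever
# sees the observed values, so both rely on this estimate.
observed_mean = statistics.fmean(x for x, _, obs in rows if obs)

# The missingness pattern itself carries information about the label.
sick_rate_missing = statistics.fmean(y for _, y, obs in rows if not obs)
sick_rate_observed = statistics.fmean(y for _, y, obs in rows if obs)

print(f"true mean of x:            {true_mean:.3f}")
print(f"mean of observed x only:   {observed_mean:.3f}")
print(f"sick rate, x missing:      {sick_rate_missing:.3f}")
print(f"sick rate, x observed:     {sick_rate_observed:.3f}")
```

Under these assumptions the observed-only mean overestimates the true mean, and rows with a missing value have a lower rate of sick patients. A method that discards the missingness pattern therefore loses a signal that is informative about the label.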

Related materials:

  1. How to deal with missing data in supervised deep learning? (https://openreview.net/forum?id=jEXxzPUMYVZ)
  2. MIWAE: deep generative modelling and imputation of incomplete data sets (http://proceedings.mlr.press/v97/mattei19a.html)
  3. not-MIWAE: Deep Generative Modelling with Missing not at Random Data (https://openreview.net/forum?id=tu29GQT0JFy)

Prerequisites: 1) familiarity with latent variable models and Bayesian inference (e.g., CS-E4820); 2) knowledge of deep learning; 3) programming skills with a deep learning framework (e.g., PyTorch)

Instructors: Tianyu Cui (tianyu.cui@aalto.fi, contact person), Zhiheng Qian (zhiheng.qian@aalto.fi)

Topic available: yes_one_instance

Topic available also for a group: no