Semi-supervised learning using data augmentation and consensus labelling

Speaker: Daniel Ahfock
Affiliation: University of Queensland

Abstract

A popular strategy for semi-supervised learning is to train a classification model by combining a supervised loss on labelled data with a regularisation term on unlabelled data. Data augmentation generates artificial unlabelled data by perturbing the available data. It is frequently combined with consistency regularisation, whereby the classification model is encouraged to make similar predictions on the original and perturbed data. We propose a semi-supervised learning technique that uses data augmentation and a consistency regularisation penalty based on the Kullback-Leibler divergence. An EM algorithm is developed for maximisation of the penalised likelihood. The E-step can be interpreted as imputing the missing labels through a consensus labelling procedure, given the current model predictions on the original and perturbed data. The M-step can be viewed as a supervised classification task using the imputed labels. The proposed technique can be used for semi-supervised learning with image, text and audio datasets.
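
As an illustration only, the alternation between consensus labelling and supervised fitting described in the abstract might be sketched as follows. The abstract does not give the exact update rules, so everything specific here is an assumption: the consensus label is taken to be the normalised geometric mean of the two predictions (the distribution q that minimises KL(q || p_orig) + KL(q || p_aug)), the classifier is a plain softmax regression, the perturbation is additive Gaussian noise, and all names (consensus_labels, m_step, X_lab, X_unlab, X_aug) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def consensus_labels(p_orig, p_aug, eps=1e-12):
    """E-step sketch: merge predictions on original and perturbed data.

    Assumption (not from the talk): use the normalised geometric mean,
    which minimises KL(q || p_orig) + KL(q || p_aug) over distributions q.
    """
    log_q = 0.5 * (np.log(p_orig + eps) + np.log(p_aug + eps))
    log_q -= log_q.max(axis=1, keepdims=True)
    q = np.exp(log_q)
    return q / q.sum(axis=1, keepdims=True)

def m_step(W, X, Q, lr=0.5, steps=50):
    """M-step sketch: gradient descent on cross-entropy with (soft) labels Q."""
    for _ in range(steps):
        P = softmax(X @ W)
        W = W - lr * X.T @ (P - Q) / len(X)
    return W

rng = np.random.default_rng(0)

# Toy data: a few labelled points and a larger unlabelled pool.
X_lab = rng.normal(size=(6, 3))
Y_lab = np.eye(2)[np.array([0, 1, 0, 1, 0, 1])]          # one-hot labels
X_unlab = rng.normal(size=(20, 3))
X_aug = X_unlab + 0.1 * rng.normal(size=X_unlab.shape)   # hypothetical perturbation

W = rng.normal(size=(3, 2))
for _ in range(5):
    # E-step: impute soft labels for the unlabelled points by consensus.
    Q = consensus_labels(softmax(X_unlab @ W), softmax(X_aug @ W))
    # M-step: supervised fit on true labels plus imputed labels.
    X_all = np.vstack([X_lab, X_unlab, X_aug])
    Y_all = np.vstack([Y_lab, Q, Q])
    W = m_step(W, X_all, Y_all)
```

With this choice of consensus rule, two predictions that already agree are returned unchanged, while disagreeing predictions are pulled towards a shared soft label that both the original and the perturbed point are then trained on.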

About Statistics, modelling and operations research seminars

Students, staff and visitors to UQ are welcome to attend our regular seminars.

The events are jointly run by our Operations research and Statistics and probability research groups.

The Statistics, modelling and operations research (SMOR) Seminar series seeks to celebrate and disseminate research and developments across the broad spectrum of quantitative sciences. The SMOR series provides a platform for communication of both theoretical and practical developments, as well as interdisciplinary topics relating to applied mathematics and statistics.

Venue

Priestley Building (67)
Room: 348 (and via Zoom: https://uqz.zoom.us/j/85172010876)