On universal transfer learning

M. M. Hassan Mahmud*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    17 Citations (Scopus)

    Abstract

    In transfer learning the aim is to solve new learning tasks using fewer examples by using information gained from solving related tasks. Existing transfer learning methods have been used successfully in practice, and PAC analyses of these methods have been developed. But the key notion of relatedness between tasks has not yet been defined clearly, which makes it difficult to understand, let alone answer, questions that naturally arise in the context of transfer, such as how much information to transfer, whether to transfer information, and how to transfer information across tasks. In this paper, we look at transfer learning from the perspective of Algorithmic Information Theory (AIT)/Kolmogorov complexity theory, and formally solve these problems in the same sense that Solomonoff Induction solves the problem of inductive inference. We define universal measures of relatedness between tasks, and use these measures to develop universally optimal Bayesian transfer learning methods. We also derive results in AIT that are interesting in themselves. To address a concern that arises from the theory, we also briefly look at the notion of Kolmogorov complexity of probability measures. Finally, we present a simple practical approximation to the theory to do transfer learning and show that even this is quite effective, allowing us to transfer across tasks that are superficially unrelated. The latter is an experimental feat that has not been achieved before, and thus shows that the theory is also useful in constructing practical transfer algorithms.

    Original language: English
    Pages (from-to): 1826-1846
    Number of pages: 21
    Journal: Theoretical Computer Science
    Volume: 410
    Issue number: 19
    Publication status: Published - 28 Apr 2009
