Locally Masked Convolution for Autoregressive Models

Ajay Jain
UC Berkeley
Pieter Abbeel
UC Berkeley
Deepak Pathak
Conference on Uncertainty in AI (UAI), 2020
[GitHub Code]

Summary: Our Locally Masked PixelCNN generates natural images in customizable orders like zig-zags and Hilbert Curves. We train a single PixelCNN++ to support 8 generation orders simultaneously, outperforming PixelCNN++ on distribution estimation and allowing globally coherent image completions on CIFAR10, CelebA-HQ and MNIST. We control the order with our proposed locally masked convolution operation, which is efficient and easy to implement via matrix multiplication.

Abstract: There is substantial interest in modeling high dimensional data distributions such as images, with applications including compression, multimedia generation, anomaly detection, and data completion. State-of-the-art density estimators for natural images are autoregressive, decomposing the joint distribution over pixels into a product of conditionals. The conditional distributions are parameterized by an expressive deep neural network, e.g. a convolutional neural network such as the PixelCNN. However, convolutional autoregressive models can only model a single decomposition of the joint where only a single generation order is efficient. For tasks such as missing data completion, these models are unable to use much of the observed context. To generate data in arbitrary orders, we introduce LMConv: a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image. Using LMConv, we learn an ensemble of density estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation as well as globally coherent image completions.

Locally Masked Convolution

Autoregressive models generate images one pixel at a time, but are typically trained with a single generation order. We train a PixelCNN that can decompose the joint distribution over image dimensions with arbitrary autoregressive orderings. At test time, we evaluate more accurate likelihoods by ensembling across multiple orders (with shared parameters), or complete missing image dimensions by choosing a favorable order. To implement these models on images, we propose locally masked convolutions, which efficiently apply a different mask to the input at each convolution filter location, allowing full control over the order.
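As a concrete illustration of how a generation order determines the masks, the sketch below builds a per-location mask for a given pixel ordering: each spatial position may only attend to neighbors that were generated earlier in the sequence. This is a minimal NumPy sketch under our own assumptions (the function name `order_to_masks` and the `(k*k, H*W)` mask layout are illustrative, not the paper's exact implementation).

```python
import numpy as np

def order_to_masks(order, H, W, k=3):
    """Build per-location masks from a generation order.

    order: list of (row, col) pixel coordinates in generation sequence.
    Returns a (k*k, H*W) array: column j holds the mask for the k x k
    patch centered at pixel j (raster index), with 1s at neighbors
    that appear earlier in `order` than the center pixel.
    """
    rank = {pos: t for t, pos in enumerate(order)}
    masks = np.zeros((k * k, H * W))
    p = k // 2
    for col_idx in range(H * W):
        i, j = divmod(col_idx, W)
        for a in range(k):
            for b in range(k):
                ni, nj = i + a - p, j + b - p
                in_bounds = 0 <= ni < H and 0 <= nj < W
                if in_bounds and rank[(ni, nj)] < rank[(i, j)]:
                    masks[a * k + b, col_idx] = 1.0
    return masks
```

For a raster-scan order, the first pixel's mask is all zeros (nothing is observed yet), while later pixels see their above and left neighbors; swapping in a different `order` list, such as a zig-zag, changes the masks without touching the network weights.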

Customized Order along Hilbert Curve

Since Locally Masked PixelCNN supports arbitrary orders, we trained it to generate binary MNIST digits along Hilbert space-filling curves. With this order, consecutively generated pixels lie in the same spatial neighborhood of the image, regardless of its resolution.
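A Hilbert-curve order can be produced with the standard iterative index-to-coordinate conversion. The sketch below (a generic Hilbert curve routine, not code from the paper's repository) maps a 1D position `d` along the curve to 2D pixel coordinates on an `n x n` grid, where `n` is a power of two:

```python
def d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_order(n):
    """Generation order visiting every pixel of an n x n image once."""
    return [d2xy(n, d) for d in range(n * n)]
```

Each step of the resulting order moves to an adjacent pixel, which is why consecutively generated pixels stay in the same local region of the image.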

Overview of the Algorithm

Locally masked convolutions mask feature patches, not CNN filter weights. This can be implemented efficiently with the im2col algorithm, which computes convolutions via matrix multiplication.
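The idea can be sketched in a few lines of NumPy: im2col unrolls each k x k patch into a column, the per-location masks are applied to these columns (rather than to the shared filter weights), and a single matrix multiply produces the output. This is a minimal single-channel sketch under our own assumptions, not the paper's memory-efficient PyTorch implementation.

```python
import numpy as np

def im2col(x, k):
    """Unroll zero-padded k x k patches of a 2D array into columns."""
    H, W = x.shape
    p = k // 2
    xp = np.pad(x, p)
    cols = np.empty((k * k, H * W))
    for idx in range(H * W):
        i, j = divmod(idx, W)
        cols[:, idx] = xp[i:i + k, j:j + k].ravel()
    return cols

def locally_masked_conv(x, weight, mask):
    """LMConv sketch: mask the patches, then convolve by matmul.

    x:      (H, W) input feature map.
    weight: (k*k,) shared filter weights.
    mask:   (k*k, H*W) binary mask, one column per spatial location.
    """
    k = int(np.sqrt(mask.shape[0]))
    cols = im2col(x, k) * mask   # different mask at every location
    out = weight @ cols          # convolution as matrix multiplication
    return out.reshape(x.shape)
```

Because the mask multiplies the unrolled patches, changing the generation order only changes the `mask` argument; the filter weights stay shared across all orders.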

Source Code

PyTorch code for our paper is open-source and available on GitHub. We include a memory-efficient, pure Python implementation of the locally masked convolution, as well as training and evaluation code.

Paper and Bibtex


Ajay Jain, Pieter Abbeel, Deepak Pathak. Locally Masked Convolution for Autoregressive Models. In Conference on Uncertainty in Artificial Intelligence (UAI), 2020.

@inproceedings{jain2020locally,
    title={Locally Masked Convolution for Autoregressive Models},
    author={Ajay Jain and Pieter Abbeel and Deepak Pathak},
    booktitle={Conference on Uncertainty in Artificial Intelligence (UAI)},
    year={2020}
}
