Traditionally, statistics on physical capital (e.g. buildings, cars, and roads) have been derived from ground-based surveys, which are expensive and time-consuming to conduct. For example, the Demographic and Health Surveys (DHS) collect population-related statistics for about 90 countries at a cost of 1.9 million dollars over a five-year interval [1].

Figure 1. Exhaustive approach vs. IS-Count | Source: https://arxiv.org/pdf/2112.09126.pdf

A common detection-based pipeline [2, 4] for collecting object count statistics over a large region works exhaustively: it downloads all satellite images covering the target region, counts the objects in each image using a trained detection model, and sums the counts over all images to produce a total count.

Nevertheless, such an exhaustive approach scales and generalizes poorly because of its cost, its need for labeled training data, and its sensitivity to domain shift.

**Story Highlights**

**The quantities of physical capital, or object counts, provide important insights into human activities and the socio-economic development of a region. For example, the number of buildings reflects the level of urbanization in a region; the number of brick kilns is related to the level of air pollution; and the number of cars correlates with the poverty level of a region.**

**Recently, object detection in high-resolution satellite imagery has emerged as an alternative to ground-based survey data collection in socioeconomic monitoring tasks like counting brick kilns in Bangladesh [2] and counting solar panels in the U.S. [3].**

First, applying detection-based methods to real-world scenarios is often prohibitively expensive because of the high cost of purchasing high-resolution satellite images and the large amount of computation required to apply detection models at scale. Second, training the detection model requires a large number of labeled images in the first place, which can be even more expensive to obtain. Third, pre-trained detection models generalize poorly under domain shift, which prevents us from directly applying a model to different target regions and objects.

Using Sampling for Estimating Object Count

To avoid the high cost of the exhaustive approach, we use sampling to estimate the total object count over a large geographic region. However, applying sampling to real-world object counting is challenging, as the objects of interest often have uneven spatial distributions. For example, buildings are concentrated in towns and cities, which take up a tiny proportion of the Earth’s surface, and have close-to-zero density in regions like deserts and forests. In this case, if we directly apply uniform sampling, most of the samples are likely to have zero object counts. As a result, uniform sampling has high variance and requires a large number of samples to perform well (Figure 2).
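To make the variance problem concrete, the following is a minimal simulation sketch, not the authors' code. It assumes a hypothetical region split into 10,000 cells of which only 1% contain any objects; the cell counts, the function name `uniform_estimate`, and all sizes are made up for illustration.

```python
import numpy as np

def uniform_estimate(counts, n_samples, rng):
    """Estimate counts.sum() from cells drawn uniformly at random (with replacement)."""
    idx = rng.integers(0, counts.size, size=n_samples)
    # Scale the sample mean by the number of cells to get a total.
    return counts.size * counts[idx].mean()

rng = np.random.default_rng(0)

# Hypothetical sparse region: 10,000 cells, only 100 of them occupied.
counts = np.zeros(10_000)
occupied = rng.choice(counts.size, size=100, replace=False)
counts[occupied] = rng.integers(50, 500, size=100)
true_total = counts.sum()

# Repeat the 100-sample estimate many times to see how much it fluctuates.
estimates = np.array([uniform_estimate(counts, 100, rng) for _ in range(1000)])

rel_std = estimates.std() / true_total       # spread relative to the true total
fraction_empty = (estimates == 0).mean()     # runs that never hit an occupied cell

print(f"true total: {true_total:.0f}")
print(f"mean estimate: {estimates.mean():.0f}")
print(f"relative std: {rel_std:.2f}")
print(f"runs with an estimate of zero: {fraction_empty:.0%}")
```

The estimator is unbiased on average, but with 100 draws from a region that is 99% empty, the chance that a run sees no occupied cell at all is 0.99^100, or about 37%, so individual estimates swing widely around the true total.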

To address this problem, we adopt importance sampling (IS), which samples “important” regions (i.e. those that have non-zero object counts) more frequently by drawing from a “biased”, or non-uniform, proposal distribution, and hence reduces the variance relative to uniform sampling (Figure 2). Finally, the total count is estimated by reweighting each sample by the inverse of its sampling probability, which ensures that the estimator is unbiased.

Figure 2. Why does sampling from a “biased” distribution – i.e. importance sampling – help?
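The reweighting argument can be written out with the standard importance sampling identity. The notation below is an assumption introduced for illustration, not taken from the post: the region is split into N cells c_1, ..., c_N, f(c) is the object count in cell c, C is the true total, q is the proposal distribution over cells, and X_1, ..., X_n are cells drawn independently from q.

```latex
\begin{align}
C &= \sum_{j=1}^{N} f(c_j),
\qquad
\hat{C}_n = \frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i)}{q(X_i)},
\quad X_i \overset{\text{i.i.d.}}{\sim} q \\[4pt]
\mathbb{E}_{q}\!\left[\frac{f(X)}{q(X)}\right]
  &= \sum_{j=1}^{N} q(c_j)\,\frac{f(c_j)}{q(c_j)}
   = \sum_{j=1}^{N} f(c_j) = C \\[4pt]
\operatorname{Var}\!\left(\hat{C}_n\right)
  &= \frac{1}{n}\left(\sum_{j=1}^{N} \frac{f(c_j)^2}{q(c_j)} - C^2\right)
\end{align}
```

The second line holds as long as q(c_j) > 0 wherever f(c_j) > 0, which is why the estimator stays unbiased for any such proposal. Setting q(c_j) = 1/N recovers uniform sampling, while a proposal proportional to f makes every ratio f/q equal to C and drives the variance to zero, which is the sense in which a proposal close to the object density is preferable.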

IS-Count: Combining Importance Sampling and Machine Learning

Our method, IS-Count, estimates the total object count from a small number of representative areas by 1) sampling those areas from a learnable proposal distribution and 2) reweighting their counts according to importance sampling.

The first step is to construct the proposal distribution from which we will sample representative locations. Ideally, the proposal distribution should be as close to the target object density as possible, because this yields the lowest estimation variance. Observing that publicly available socioeconomic indicators like nightlight intensity (NL) and population density are highly correlated with the target object counts, we use these covariates (i.e. NL, population) as the basis for designing the proposal distribution. The intuition is that the covariates provide good prior knowledge for building a proposal distribution that is close to the target object density. The proposal distribution is then learned from the covariates.

population), (b) the actual proposal distribution we sample from, and (c) the building density map from the Google Open Buildings dataset in Africa. As we observe, the chosen covariate (e.g. population) is close to the target object density, and the proposal distribution learned from it resembles the target even more closely.

After constructing the proposal distribution, we sample a small number of informative areas from it. The high-resolution satellite imagery corresponding to these representative samples is downloaded and labeled by human annotators to estimate the total count.
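The sketch below illustrates the idea of a covariate-based proposal on synthetic data. It is a simplification under stated assumptions, not the IS-Count implementation: the proposal is just the normalized covariate mixed with a small uniform floor (no learning step), the covariate is a made-up noisy proxy for the counts, and `is_count_estimate`, `eps`, and all sizes are hypothetical.

```python
import numpy as np

def is_count_estimate(counts, covariate, n_samples, rng, eps=1e-3):
    """Importance-sampling estimate of counts.sum() with a covariate-based proposal."""
    # Assumption: use the normalized covariate directly as the proposal,
    # mixed with a uniform floor so that every cell has q > 0.
    q = covariate / covariate.sum()
    q = (1 - eps) * q + eps / counts.size
    idx = rng.choice(counts.size, size=n_samples, p=q)
    # In practice counts[idx] would come from labeling the sampled image tiles.
    return np.mean(counts[idx] / q[idx])

rng = np.random.default_rng(0)
n_cells = 10_000

# Hypothetical sparse region: only 100 of 10,000 cells contain objects.
counts = np.zeros(n_cells)
occupied = rng.choice(n_cells, size=100, replace=False)
counts[occupied] = rng.integers(50, 500, size=100)
true_total = counts.sum()

# Made-up covariate standing in for e.g. population: correlated with the
# counts but noisy, with a small non-zero background everywhere.
covariate = counts * rng.uniform(0.5, 1.5, size=n_cells) + rng.uniform(0.0, 1.0, size=n_cells)

# Compare repeated 100-sample estimates from both sampling schemes.
is_runs = np.array([is_count_estimate(counts, covariate, 100, rng) for _ in range(1000)])
uni_runs = np.array([n_cells * counts[rng.integers(0, n_cells, size=100)].mean()
                     for _ in range(1000)])

print(f"true total: {true_total:.0f}")
print(f"importance sampling: mean {is_runs.mean():.0f}, std {is_runs.std():.0f}")
print(f"uniform sampling:    mean {uni_runs.mean():.0f}, std {uni_runs.std():.0f}")
```

Both estimators target the same total, but because the proposal concentrates samples on cells where the covariate is large, the importance-sampling runs should cluster much more tightly around it for the same labeling budget. How large the gain is depends entirely on how well the covariate tracks the true density.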