Analysis of living population density per country

By definition, population density is a measurement of population per unit area. The classical way to calculate the population density of a country is to divide its entire population count by its entire area.

But that simple approach has several shortcomings, and a number of alternative methods for measuring population density exist. The measurement analyzed in this article is one closely related to the level of urbanization:

Living population density – density metric which measures the density at which the average person lives.

The final result of this analysis is presented in the table and interactive map below, based on NASA SEDAC/CIESIN 30 arc-second gridded world population data for 2020, adjusted to UN WPP 2015 population counts. The rest of this article explains the methodology in detail and includes the resources needed to recalculate the data.

(*) A star next to a country name in the table above marks a group of countries; see below for their definitions.



Wikipedia mentions a few approaches for calculating a similar type of density:

  1. Median density – a density metric which measures the density at which the average person lives. It is determined by ranking the census tracts by population density, and taking the density at which fifty percent of the population lives at a higher density and fifty percent lives at a lower density.
  2. Population-weighted density – a density metric which measures the density at which the average person lives. It is determined by calculating the standard density of each census tract, assigning each a weight equal to its share of the total population, and then adding the segments.
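
The two definitions above can be compared on a toy example. This is a minimal sketch, assuming each census tract is given as a hypothetical (population, area in km2) pair with a positive area; the function names and the sample numbers are illustrative and do not come from any census data.

```python
def weighted_density(tracts):
    """Population-weighted density: sum of (population share * tract density)."""
    total = sum(p for p, _ in tracts)
    return sum((p / total) * (p / a) for p, a in tracts)


def median_density(tracts):
    """Density at which half of the population lives at a lower-or-equal density."""
    total = sum(p for p, _ in tracts)
    ranked = sorted((p / a, p) for p, a in tracts)  # rank tracts by density
    running = 0
    for density, pop in ranked:
        running += pop
        if running >= total / 2:
            return density
    return None  # no tracts given


# Three made-up tracts: (population, area_km2)
tracts = [(1000, 1.0), (3000, 1.0), (6000, 2.0)]
print(weighted_density(tracts))  # about 2800 ppl/km2
print(median_density(tracts))    # density of the tract where the 50% mark falls
```

Both numbers are driven by where people actually live, while the simple density of the same three tracts would be 10,000 people / 4 km2 = 2,500 ppl/km2.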

While both methods have an advantage over the simple population density definition, they also have a few shortcomings when we are talking about ‘living population density per world country’:

  • they are focused on densities of urban areas and are thus not suitable for calculating the density of an entire country
  • they rely on ‘census tract’ population data, which is available in the US but is not easily available for most countries in the world

For the purposes of this analysis I implemented a method that is similar to ‘population-weighted density’, but instead of US census tract data it uses population and area data for each 30 arc-second grid cell in the world (from NASA SEDAC). That allows it to avoid the above issues, and it can be described as:

  • Living density of a country: a density metric which measures the density at which the average citizen of a country lives. It is determined by calculating the population density of each 30 arc-second cell of the country, assigning each a weight equal to its share of the total country population, and then adding up the weighted values.

“Classical” average population density per world country

To quickly check how well classical population density matches the urbanization levels of world countries, we can use the interactive map below (which looks similar to the map in the Wikipedia entry about population density):

That map immediately demonstrates that “classic” population density does not relate well to expected “urbanization” levels. For example, Canada has a classic population density 10 times lower than the US, but even a cursory internet search will show that Canada has 70%-80% of its people living in urban/metropolitan areas, about the same as the US! So while urbanization levels are similar between the US and Canada, the classic population density of the US is an order of magnitude higher than that of Canada. Similar results would be shown for Russia, Australia, Brazil and many other countries: their urbanization levels do not match their classical population density.

“Classic” population density is obviously not a good indicator of urbanization level. The main reason is that it shows a density that would be realistic only if the population were evenly spread across the entire country. That is not true in any country, but especially not in countries like Canada (or Australia, or …) where most people are concentrated in several big cities and the rest of the land is mostly uninhabited.

To demonstrate, let's imagine a country that consists of a single city, for example Singapore. Imagine that it has an area of 100 km2, with a population of 1 million evenly spread across the city. Its classic population density would be 10,000 ppl/km2, and it would accurately reflect the situation in which every citizen of that city-state lives.

Now imagine that Singapore buys an additional 9,900 km2 of empty land adjacent to it, expanding its country area to a total of 10,000 km2, while the same 1M people still live in the original 100 km2 city. Classic population density would now show that Singapore has a density of just 100 ppl/km2, 100 times lower than before! And yet every citizen of Singapore still lives in the same city as before, under the same conditions, which means every citizen still lives surrounded by 10,000 people per square kilometer on average. This demonstrates that “classic” population density is a bad indicator of the average population density as seen by the average citizen.

That last sentence is the key to the problem: we need a “living” population density that shows the average situation in which citizens of a country live, instead of a “classical” density which simply shows total population over total country area. In the imaginary ‘Singapore’ example above, we need a number that shows livingDensity = 10,000 ppl/km2 even when Singapore has 1M people over 10,000 km2 (because all of them live in just 100 km2 of city area), instead of a number that shows classicDensity = 100 ppl/km2.

Mathematically, the formula for such a density would be:

(1)   \begin{equation*}  living\;density= \frac{\sum\limits_{for\;each\,citizen}density\;where\;that\,citizen\;lives}{Total\;country\;population}   \end{equation*}

It is identical to formula (2) below, where population%(A) represents the fraction of the total country population living in area A (in that square kilometer), and density(A) represents the population density in that area A:

(2)   \begin{equation*}  living\;density= \sum\limits_{A=for\;each\,km^2}population\%(A) * density(A)  \end{equation*}

In our Singapore example, that would be equal to the sum over 100 km2 of city area, each km2 with population% = 10,000 ppl / 1M ppl = 1% and density = 10,000 ppl/km2, plus the sum over the remaining 9,900 km2 of uninhabited area, each km2 there with population% = 0 and density = 0. In total, imaginary Singapore living density = 100*(1%*10,000 ppl/km2) + 9900*(0%*0) = 100*100 ppl/km2 = 10,000 ppl/km2 … exactly what we wanted! It shows that the average citizen of imaginary Singapore lives at a density of 10,000 ppl/km2, both before and after the expansion into empty area.
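
The imaginary Singapore arithmetic can be checked with a few lines of code. This is only a sketch of formula (2), assuming every cell is exactly 1 km2 so that the density of a cell equals its population; the function and variable names are made up for illustration.

```python
def living_density_per_km2(cell_populations):
    """Formula (2): sum over 1 km2 cells of population share * cell density."""
    total = sum(cell_populations)
    # Each cell is assumed to be exactly 1 km2, so its density equals its population.
    return sum((p / total) * p for p in cell_populations)


city = [10_000] * 100   # 100 km2 of city, 10,000 people in each km2
empty = [0] * 9_900     # 9,900 km2 of purchased empty land

before = living_density_per_km2(city)
after = living_density_per_km2(city + empty)
classic_after = sum(city) / (len(city) + len(empty))

print(before, after, classic_after)
```

Both living density values come out at about 10,000 ppl/km2, while the classic density after the expansion drops to 100 ppl/km2.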

Living population density per world country

Living population density (henceforth just “living density”, while the other one will be called “classic density”) is a value that should represent the urbanization of countries much better, as mentioned previously.

To calculate living density values for each country in the world, I used NASA gridded world data from:

NASA SEDAC (Socioeconomic Data and Applications Center)

They provide Earth population data in different formats, but the most suitable for the above calculation are GIS data in GeoTIFF format at 30 arc-second resolution, because 30 arc-seconds corresponds closely to 1 km: there are 2 such steps in an arc-minute and 60*2 = 120 in a degree, so 360*60*2 = 43,200 around the entire 40,000 km of Earth's circumference. That gives square-ish cells with sides of around 40,000 km / 43,200 = 0.93 km, so around 0.86 km2 at the equator (less than 1 km2 each, and smaller still at higher latitudes, where the east-west side shrinks). Also, 30 arc-seconds is the highest resolution (most detailed data) available.

Population distribution based on the latest SEDAC data (for 2020, but adjusted to UN WPP 2015 counts) is visualized in the image below:

Image resolution is 10800 x 4500. Each pixel represents an area smaller than 4x4 km. Red marks cities, brown urban areas, and green rural areas.

That data presents a certain problem for the previous ‘living density’ formula (2), because that formula is fixed to exact 1 km2 units, while the NASA geo data is given for cells of “almost” 1 km2, but not exactly: as explained above, a cell is closer to 0.9 km2 and, more importantly, the area is not the same for every cell.
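
To see how much the cell area varies, here is a rough sketch that computes the area of one 30 arc-second cell at a given latitude. It assumes a perfectly spherical Earth with a mean radius of 6371 km, which is a simplification; the NASA land area grid should be preferred for real calculations, since it also excludes water.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumption: spherical Earth, mean radius
CELL_DEG = 30 / 3600       # 30 arc-seconds expressed in degrees


def cell_area_km2(lat_south_deg):
    """Area of a 30 arc-second cell whose southern edge is at lat_south_deg."""
    lat1 = math.radians(lat_south_deg)
    lat2 = math.radians(lat_south_deg + CELL_DEG)
    dlon = math.radians(CELL_DEG)
    # Area of a latitude/longitude rectangle on a sphere.
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat2) - math.sin(lat1))


for lat in (0, 30, 45, 60):
    print(lat, round(cell_area_km2(lat), 3))
```

Under this assumption a cell is about 0.86 km2 at the equator and roughly half of that at 60 degrees of latitude, which is why the formula needs to handle cells of different sizes.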

So, to generalize the previous formula and make it more suitable, we observe that:

(3)   \begin{equation*}  population\%(A) =  population(A)/Total\;Population  \end{equation*}

(4)   \begin{equation*}  density(A) =  population(A)/area(A)  \end{equation*}

Therefore, if we substitute (3) and (4) in (2), we get:

(5)   \begin{equation*}   population\%(A) * density(A) = population(A)^2/area(A)/Total\;Population \end{equation*}

And since TotalPopulation is a constant that does not depend on the selected area, we can write the previous formula (2) for living density as:

(6)   \begin{equation*}  \textbf{Density = living\;density =  } \frac{\sum\limits_{A=for\;each\,area}\frac{population(A)^2}{area(A)} }{Total\;Population}  \end{equation*}

For the above formula to be correct, areas can be of different sizes, but each area must be evenly populated (homogeneous), regardless of its size. Also, to really reflect ‘living’ density, it should only include land area, excluding bodies of water.

Applying this formula to our previous ‘Singapore’ example shows that it simplifies the calculation, since we only consider two ‘areas’, the 100 km2 city area and the 9,900 km2 of outside area: living density = (population_city^2/city_area + population_outside^2/outside_area)/TotalPopulation = (1M^2/100km2 + 0^2/9900km2)/1M = 1M^2/100km2/1M = 1M/100km2 = 10,000 ppl/km2 … the same correct result as before.

The new formula is especially suitable for the already mentioned NASA geo data, since it allows summing over areas of different sizes. And because each cell there has an area under 1 km2, we can reasonably assume that within such a small area the population is evenly (homogeneously) distributed. The calculation needs only three data sources, all at 30 arc-second resolution:
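
Formula (6) translates almost directly into code. The sketch below is an illustration and not the code of the application described later; it assumes areas are given as hypothetical (population, land area in km2) pairs and simply skips areas without land.

```python
def living_density(areas):
    """Formula (6): sum(population^2 / area) / total population."""
    areas = [(p, a) for p, a in areas if a > 0]  # ignore areas with no land
    total = sum(p for p, _ in areas)
    if total == 0:
        return 0.0
    return sum(p * p / a for p, a in areas) / total


# Imaginary Singapore as two areas: the city and the empty land around it.
two_areas = living_density([(1_000_000, 100), (0, 9_900)])

# Splitting the homogeneous city into two halves must not change the result.
split_city = living_density([(500_000, 50), (500_000, 50), (0, 9_900)])

print(two_areas, split_city)
```

Both calls give 10,000 ppl/km2, which illustrates the homogeneity condition: as long as each area is evenly populated, the way the land is cut into areas does not matter.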

  1. population count – how many people for each cell
  2. land area – actual land area for each cell
  3. national grid – to which country each cell belongs
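
As an illustration of how these three grids combine, the sketch below accumulates the sums needed by formula (6) per country code. It assumes the grids are already loaded as equally sized lists of rows; the tiny 2x2 grids and the country codes 1 and 2 are made up, and reading the real GeoTIFF files is left out.

```python
from collections import defaultdict


def country_densities(pop_grid, area_grid, nation_grid):
    """Return {country_code: (living_density, classic_density)}."""
    # Per country: [total population, total land area, sum of p^2 / a]
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for pop_row, area_row, nation_row in zip(pop_grid, area_grid, nation_grid):
        for p, a, code in zip(pop_row, area_row, nation_row):
            if a <= 0:
                continue  # no land in this cell
            s = sums[code]
            s[0] += p
            s[1] += a
            s[2] += p * p / a
    result = {}
    for code, (pop, area, sq) in sums.items():
        living = sq / pop if pop > 0 else 0.0
        result[code] = (living, pop / area)
    return result


# Hypothetical 2x2 world: the top row is country 1, the bottom row is country 2.
pop_grid = [[100, 0], [50, 50]]
area_grid = [[1.0, 1.0], [0.5, 0.5]]
nation_grid = [[1, 1], [2, 2]]

densities = country_densities(pop_grid, area_grid, nation_grid)
print(densities)
```

In this toy world both countries have a living density of 100 ppl/km2, but country 1 has a classic density of only 50 ppl/km2 because half of its land is empty.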

Technical difficulties in calculating and showing “living density”

Processing the NASA geo data sets involved several technical issues that needed to be overcome:

  • GIS data was too large for normal arrays: 43200 x 21600 float numbers, resulting in almost 1 billion array elements with a size of almost 4 GB. The solution was to use gcAllowVeryLargeObjects enabled="true" in C#
  • Data needed too much RAM: even though the above allows C# to handle such arrays, they took too much RAM (especially since 3 or 4 of the geo data arrays needed to be processed at the same time, as listed above: population, area, nations…). The solution was to process the data in configurable bands, for example 6000 lines per band, so around 4 bands for the total data
  • GeoTIFF data needs decoding: to avoid the tangential effort of decoding the GeoTIFF format, I used the OSGeo.GDAL NuGet package from https://gdal.org
  • There were errors in the geo data: negative populations, land areas, etc. The solution was to detect cases where they stood for ‘uninhabited land’ (like deserts and ice) or ‘no land’ (like lakes and seas)
  • There were statistical errors in the geo data: some countries (like Romania) added up to twice their real population in the 2020 data. The solution was to use the “UN adjusted” datasets
  • Final result visualization: while I made my own visualization maps and sortable tables (in the same app used to process the geo data), for embedding in HTML posts I used datawrapper.de
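
The band processing and the handling of invalid values can be sketched as follows. This is an illustration in Python rather than the C# used by the application, and several things are assumptions: `read_band` is a hypothetical callback standing in for the GDAL reading code, -9999 is a made-up placeholder for a negative "no data" value, and all negative values are simply treated as zero.

```python
def band_ranges(total_rows, band_rows):
    """Yield (start, stop) row ranges that cover the grid band by band."""
    for start in range(0, total_rows, band_rows):
        yield start, min(start + band_rows, total_rows)


def clean(value):
    """Treat negative (assumed 'no data') values as zero."""
    return value if value > 0 else 0.0


def living_density_banded(read_band, total_rows, band_rows):
    """Formula (6) accumulated band by band, so only one band is in memory."""
    total_pop = 0.0
    weighted = 0.0
    for start, stop in band_ranges(total_rows, band_rows):
        pop_rows, area_rows = read_band(start, stop)
        for pop_row, area_row in zip(pop_rows, area_rows):
            for p, a in zip(pop_row, area_row):
                p, a = clean(p), clean(a)
                if a == 0.0:
                    continue  # sea, lake or missing data
                total_pop += p
                weighted += p * p / a
    return weighted / total_pop if total_pop else 0.0


# Tiny in-memory stand-in for the real files.
pop = [[100.0, -9999.0], [50.0, 50.0], [0.0, 0.0]]
area = [[1.0, -9999.0], [0.5, 0.5], [1.0, 1.0]]


def read_band(start, stop):
    return pop[start:stop], area[start:stop]


result = living_density_banded(read_band, total_rows=3, band_rows=2)
print(list(band_ranges(21600, 6000)), result)
```

Because formula (6) only needs running sums, the result does not depend on the band size; with 21600 rows and 6000 rows per band the grid is covered in four bands, the last one shorter than the others.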

Analysis of “Living” population density per world country

Applying the above method to process the geo data and get the living population density for each country in the world resulted in an exported CSV file that (in addition to country codes, population and area) includes three calculated values for each country:

  • living density – average population density where citizens live
  • classic density – simple total country population divided by total country area
  • concentration index – ratio of living over classic densities
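
For illustration, one row of such a file could be produced as sketched below. The column names and rounding are assumptions and may not match the real exportedCountries.csv layout; the US population, area and living density are taken from the comparison table later in this article.

```python
import csv
import io

# Hypothetical column names, chosen only for this example.
FIELDS = ["code", "population", "area_km2",
          "living_density", "classic_density", "concentration_index"]


def country_record(code, population, area_km2, living_density):
    """Derive classic density and concentration index for one country."""
    classic = population / area_km2
    return {
        "code": code,
        "population": population,
        "area_km2": area_km2,
        "living_density": round(living_density),
        "classic_density": round(classic, 1),
        "concentration_index": round(living_density / classic, 1),
    }


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


us = country_record("US", 333_421_581, 9_090_390, 2_244)
print(to_csv([us]))
```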

The resulting CSV file can be downloaded from the ‘get the data’ link under each map, or in the download section at the end of this article (bundled with my application for processing the original NASA data).

The map below shows the resulting living densities for world countries:

Differences from classical densities are immediately visible, especially if we look at countries like Canada or Australia. Now they have a similar (even slightly higher) average population density to the US, indicating that fewer people live in small rural areas and more people are concentrated in cities.

Most countries around the world have a living density in the range of 1500-4000 ppl/km2.

Exceptions are some countries with higher living density, like China (5900), Brazil (6100), Egypt (12,500) and especially Mexico (over 14,000 ppl/km2). While Egypt was expected to have a high living density (most people are forced to live close to the river Nile), Mexico was less expected. Presumably, countries that have large inhospitable areas tend to have more of the population concentrated in cities and fewer people in those (inhospitable) rural areas. Examples are countries with deserts (Morocco, Egypt), jungle (Brazil), or in general a lot of barren/infertile land (China, Mexico).

There are also countries with lower living density, like Germany (1030 ppl/km2), Poland, and a number of other European countries.

For some countries, the reason for low living density could be lower quality of the NASA geo data. Some of those countries (like Bulgaria, North Macedonia, Moldova) appear to be missing city areas in the NASA data set. Instead, their city population is spread evenly over larger ‘regional’ areas, so they appear as lower density while still keeping the same population. That could be a result of census data for those countries being available only at the regional level, as opposed to smaller areas.

It must be noted that the density numbers presented here depend on the accuracy of the underlying NASA geo data, more specifically on data resolution. If the resolution of the data is worse than 30 arc-seconds (~1 km2) for some countries, they can still have an accurate total population, but their cities may be shown as larger low-density areas instead of smaller high-density areas, and their living density will show as lower than actual. But those countries are in the minority and can be visually detected on the 10800 x 4500 map above, or in the application from the download section: they will be missing red urban areas at the positions of their cities and will instead have evenly spread brown or green population areas, often within inner region/county borders. For most countries, the NASA geo data appears to be valid for population and density distribution at each km2.

We can see that the US has a similar living population density (around 2250 ppl/km2) to many European countries, but not all. There is quite a difference among European countries, as mentioned before, even when comparing countries with similar population, economic development level and quality of geo data, like the UK, France and Germany, which have living population densities of 4180, 2800 and 1000 ppl/km2 respectively. It is almost certain that individual US states would also have different living densities, so the best way to compare the US to Europe is to aggregate all European countries, which is presented below. Here EU is the 27 countries of the European Union (without the UK), Europe consists of countries entirely on the continent (44 countries and 7 smaller territories), Europe+ is the wiki definition of Europe (with Russia, Turkey, Azerbaijan, Armenia, Kazakhstan and Georgia), and NA refers to Northern America, which contains the US, Canada, Greenland and a few small countries:

Country    Population     Area [km2]    Classic density [ppl/km2]    Living density [ppl/km2]
US         333,421,581     9,090,390                           37                       2,244
EU         441,176,165     4,039,020                          109                       2,161
Europe     597,383,727     5,742,315                          104                       2,296

Europe+    859,073,201    25,627,288                           34                       2,564
NA         371,143,896    20,467,094                           18                       2,345

It demonstrates that while the US has a significantly lower ‘classic’ density than EU/Europe (three times lower, due to a smaller population over a larger area), they have practically the same living population density, around 2200 ppl/km2. The extended European definition, which adds large countries like Russia and Kazakhstan, results in a huge area (2-3 times larger than the US), but with its larger population it amounts to about the same classic density as the US, while still having living density at a similar level (around 2500 ppl/km2). The case is similar for Northern America, which adds two large and mostly empty countries (Canada and Greenland) to the US, resulting in a two times lower classical density, but even there living density remains similar (2345 ppl/km2).

This indicates that on average the US and Europe have similarly high levels of urbanization (although differences certainly exist between individual US states or European countries). It also demonstrates that, whenever most of the population is concentrated in cities, it does not matter how empty or large the rest of the country is: living density (density as seen by the average citizen) will usually be close to the average city density.
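
One way to see why aggregates behave like this: for each country, the sum in formula (6) equals its living density times its population, so the living density of a group is the population-weighted mean of its members' living densities, while the classic density is total population over total area. The sketch below assumes that the aggregates are meant to be computed this way, which the formula implies but which has not been checked against the application; the two countries in the example are entirely made up.

```python
def aggregate(members):
    """members: list of (population, area_km2, living_density) per country.

    Returns (population, area_km2, classic_density, living_density) of the group.
    """
    population = sum(p for p, _, _ in members)
    area = sum(a for _, a, _ in members)
    # sum(p^2 / a) over a country's cells equals living_density * population,
    # so the group value is a population-weighted mean of member values.
    living = sum(p * d for p, _, d in members) / population
    return population, area, population / area, living


# Hypothetical pair: a populous compact country and a huge, nearly empty one.
compact = (9_000_000, 1_000_000, 2000)
empty_giant = (1_000_000, 9_000_000, 3000)

group = aggregate([compact, empty_giant])
print(group)
```

Adding the nearly empty giant cuts the classic density from 9 to 1 ppl/km2, while the living density of the group only moves from 2000 to 2100 ppl/km2.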

Uneven concentration of population

When looking at both “classic” and “living” population density, some countries show a much bigger difference than others.

In fact, the ratio of living density to classic density is a direct indicator of how “uneven” the population concentration in a country is. In a hypothetical country where the population is ideally evenly distributed across the entire country area, those two densities would be the same (for example, in our hypothetical Singapore example while the entire area was just the 100 km2 of the city). But when a country has most of its population crammed into several cities, with large uninhabited areas, living density becomes much higher than classic density. Examples are Canada and Australia: both have a less evenly spread population across the country than, for example, the US.

Therefore, we could state the formula (this ratio is the ‘concentration index’ from the exported CSV file):

(7)   \begin{equation*}  uneven\;index =  \frac{living\;density}{classic\;density}   \end{equation*}

So I made a third map, showing the above-mentioned ‘uneven index’ as “ratio of population density”, which is a measure of how homogeneous (even spread of population, low ratio index) or non-homogeneous (uneven spread of population, high ratio index) the population of each country is:

Countries that are especially uneven are some large countries with small populations concentrated in a few cities (like Canada, Australia, Mongolia), barely populated countries like Greenland, or some presumably desert countries like Mauritania and Namibia.

But more interesting, and surprising, are the “most even” countries, a quite different mix: several central European countries (Germany, Poland, low lands), south-east European countries (Bulgaria, Croatia, Bosnia,…), India, some African countries (South Sudan, Uganda), etc. They are very different countries in development level, size and, most interestingly, in population densities (living and classical). Yet all of them share the same trait: they have a more evenly distributed population across the country than most other countries.

Some fun/interesting questions related to concentration levels :

Q1: What do Germany, India and North Korea have in common?

A1: They have a more evenly spread population across the country than most other countries.


Q2: If we know that an uncontrolled reentry of large space junk will hit a certain country, but we do not know where, what is the probability that it will endanger some citizens of that country?

A2: It is inversely proportional to the ‘concentration index’ of that country. So the US would have 1 in 60 (under 2%), Canada 1 in 800 (around 0.1%) and San Marino 1 in 1 (100%). Basically, darker colored countries on the “concentration” map would have a lower chance of some citizen being hit by space debris (under the assumption that we somehow know which country will be hit, but not where).
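
The "1 in N" numbers follow from a simplified picture: if everyone lived at exactly the living density, the inhabited part of the country would cover classic density / living density of its area, which is 1 / concentration index. The sketch below only illustrates that reading and uses the rounded US values from the table above.

```python
def concentration_index(living_density, classic_density):
    """Ratio of living density to classic density (formula 7)."""
    return living_density / classic_density


def inhabited_fraction(living_density, classic_density):
    """Share of the area that would be inhabited if all people lived
    at exactly the living density (a simplifying assumption)."""
    return classic_density / living_density


us_index = concentration_index(2244, 37)
us_fraction = inhabited_fraction(2244, 37)
print(round(us_index, 1), round(us_fraction * 100, 2))
```

For the US this gives an index of about 60 and an inhabited fraction under 2%; a perfectly even country would have an index of 1 and a fraction of 100%.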

Downloadable resources

In order to process the NASA geo data and export the summary country CSV file, I made an application that can be downloaded in ZIP form from :

Since it is a C# application, it requires .NET Framework 4.7.2 (which should be included in Windows 10 April 2018 Update, version 1803, and later, or can be installed independently).

Once it is unzipped to its folder, notable files are :

  • GeoTiff.exe – main executable
  • saved_*.* files : cached pre-calculated files from the latest NASA data, which were used for this article and for the linked maps
  • predef_*.* files : used in case of ‘Recalc’ with new NASA data ( contain names/codes for countries and cities )
  • exportedCountries.csv : summary file with country data, used to import for maps

While the main purpose of this application was to process geo data (calculating population density) and make export files, it also has limited visualization capabilities. Both countries and cities can be explored on the geo map shown within the application, sorted by population/area/density, searched by name, and visualized on the map (by double-clicking a city or country row in the tables, or right-clicking on the map). The main map is made from NASA geo data directly, linked to a smaller embedded Google map.

In addition to standard UN countries, the application also calculates aggregated data for Northern America and for Europe (in three variants, since “Europe” is not exactly a well-defined term):

I also made the application detect the largest connected cities in the world. Cities are listed in a separate tab, with their “connected metropolitan” area and population. Those numbers depend on configurable parameters: ‘city density’ (default 2000 ppl/km2) and ‘range’ (default max 6 km of non-city ‘jump’ allowed). Any change of those parameters requires a new recalculation (using NASA geo files). An example of the largest “connected city” in the world under default parameters is :

Note that this is not a production-level application: it does not have a polished UI, and performance is not optimized for visualization (only for data processing). The only reason that it has visualization at all is the lack of 3rd party visualization tools for cities or arbitrary areas. For countries, 3rd party tools like datawrapper are good for visualization and I have used them for the maps in this article. But for cities I was forced to make my own solution in this application.

Optional data files are needed for a new recalculation, and a warning with download instructions will be displayed if ‘Recalculate’ is attempted without them. Those files can be downloaded from the NASA SEDAC site:
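
The exact algorithm used by the application is not described here, so the following is only one possible reading of the two parameters: a cell counts as ‘city’ when its density reaches the threshold, and two city cells belong to the same connected city when at most `max_gap` non-city cells lie between them. Treating one cell as roughly 1 km, using a square neighbourhood, and all function and parameter names are assumptions of this sketch.

```python
from collections import deque


def connected_cities(density, cell_pop, city_density=2000.0, max_gap=6):
    """Group city cells into connected cities.

    Returns a list of (population, cell_count), largest population first.
    """
    rows, cols = len(density), len(density[0])
    is_city = [[density[r][c] >= city_density for c in range(cols)]
               for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    reach = max_gap + 1  # how far the next city cell may be, in cells
    cities = []
    for r in range(rows):
        for c in range(cols):
            if not is_city[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            cells, population = 0, 0.0
            while queue:  # breadth-first search over nearby city cells
                y, x = queue.popleft()
                cells += 1
                population += cell_pop[y][x]
                for ny in range(max(0, y - reach), min(rows, y + reach + 1)):
                    for nx in range(max(0, x - reach), min(cols, x + reach + 1)):
                        if is_city[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            cities.append((population, cells))
    return sorted(cities, reverse=True)


# One row of 12 cells: city cells at positions 0, 3 and 11.
row = [3000, 0, 0, 3000, 0, 0, 0, 0, 0, 0, 0, 3000]
narrow = connected_cities([row], [row], max_gap=2)
wide = connected_cities([row], [row], max_gap=7)
print(narrow, wide)
```

With a small allowed gap the row splits into two cities, and with a larger gap all three city cells merge into one, which shows why changing the parameters requires a full recalculation.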

  1. population count – how many people for each cell
  2. land area – actual land area for each cell
  3. national grid – to which country each cell belongs