Analysis of living population density per country

By definition, population density is a measurement of population per unit area. The classical way to calculate the population density of a country is to divide its entire population count by its entire area.

But that simple approach has several shortcomings, and a number of alternative methods for measuring population density exist. The measurement analyzed in this article is one closely related to the level of urbanization:

Living population density – a density metric which measures the density at which the average person lives.

The final result of this analysis is presented in the table and interactive map below, based on NASA SEDAC/CIESIN 30 arc-second gridded world population data for 2020, adjusted to the UN WPP 2015 population count. The rest of this article explains the methodology in detail and includes the resources needed to recalculate the data.

(*) a star next to a country name in the above table marks a group of countries; see below for their definition



On Wikipedia there are a few approaches mentioned for calculating a similar type of density:

  1. Median density – a density metric which measures the density at which the average person lives. It is determined by ranking the census tracts by population density, and taking the density at which fifty percent of the population lives at a higher density and fifty percent lives at a lower density.
  2. Population-weighted density – a density metric which measures the density at which the average person lives. It is determined by calculating the standard density of each census tract, assigning each a weight equal to its share of the total population, and then adding the segments.

While both methods have an advantage over the simple population density definition, they also have a few shortcomings when we are talking about 'living population density per world country':

  • they are focused on densities of urban areas and are thus not suitable for calculating the density of an entire country
  • they rely on 'census tract' population data, which is available in the US but is not easily available for most countries in the world

For the purposes of this analysis I implemented a method that is similar to 'population-weighted density', but instead of US census tract data it uses population and area data for each 30 arc-second grid cell in the world ( from NASA SEDAC ). That allows it to avoid the above issues, and it can be described as :

  • Living density of a country – a density metric which measures the density at which the average citizen of a country lives. It is determined by calculating the population density of each 30 arc-second cell of the country and assigning each a weight equal to its share of the total country population.

“Classical” average population density per world country

To quickly check how well classical population density matches the urbanization levels of world countries, we can use the interactive map below ( which looks similar to the map in the Wikipedia entry about population density ) :

That map immediately demonstrates that "classic" population density does not relate well to expected "urbanization" levels. For example, Canada has a classic population density 10 times lower than the US, but even a cursory internet search will show that Canada has 70%-80% of its people living in urban/metropolitan areas, about the same as the US! So while urbanization levels are similar between the US and Canada, the classic population density of the US is an order of magnitude higher than Canada's. Similar results would be shown for Russia, Australia, Brazil and many other countries – their urbanization levels do not match their classical population densities.

"Classic" population density is obviously not a good indicator of urbanization level. The main reason is that it shows a density that would be realistic only if the population were evenly spread across the entire country. And that is not true in any country, but especially not in countries like Canada ( or Australia, or … ) where most people are concentrated in several big cities, and the rest of the land is mostly uninhabited.

To demonstrate, let's imagine a country that consists of a single city – for example, Singapore. Imagine that it has a 100km2 area, with a population of 1 million evenly spread across the city – the classic population density would be 10,000 ppl/km2, and it would accurately reflect the situation in which every citizen of that city-state lives.

Now imagine that Singapore buys an additional 9,900km2 of empty land adjacent to it, expanding its country area to a total of 10,000km2, while still having the same 1M people living in the original 100km2 city. Classic population density would now show that Singapore has just 100 ppl/km2 … 100 times lower than before! And yet, every citizen of Singapore still lives in the same city as before, under the same conditions – which means every citizen still lives surrounded by 10,000 people per square kilometer on average. This demonstrates that "classic" population density is a bad indicator of the average population density as seen by the average citizen.

That last sentence is the key to the problem – we need a "living" population density that shows the average situation in which citizens of a country live, instead of the "classical" density which simply shows total population over total country area. In the imaginary 'Singapore' example above, we need a number that shows livingDensity = 10,000 ppl/km2 even when Singapore has 1M people over 10,000km2 ( because all of them live in just the 100km2 city area ), instead of a number that shows classicDensity = 100 ppl/km2.

Mathematically, the formula for such a density would be:

(1)   \begin{equation*}  living\;density= \frac{\sum\limits_{for\;each\,citizen}density\;where\;that\,citizen\;lives}{Total\;country\;population}   \end{equation*}

It is identical to formula (2) below, where population%(A) represents the fraction of the total country population living in area A ( in that square kilometer ), and density(A) represents the population density in that area A :

(2)   \begin{equation*}  living\;density= \sum\limits_{A=for\;each\,km^2}population\%(A) * density(A)  \end{equation*}

In our Singapore example, that would be equal to a sum over the 100km2 of city area, each km2 with population% = 10,000ppl/1Mppl = 1% and density = 10,000 ppl/km2, plus a sum over the remaining 9,900km2 of uninhabited area, each km2 there with population% = 0 and density = 0. In total, imaginary Singapore living density = 100*(1%*10,000 ppl/km2) + 9900*(0%*0) = 100*100 ppl/km2 = 10,000 ppl/km2 … exactly what we wanted! It shows that the average imaginary Singapore citizen lives at a 10,000 ppl/km2 density, both before and after expanding into the empty area.

Living population density per world country

Living population density ( henceforth just "living density", while the other one will be called "classic density" ) is a value that should much better represent the urbanization of countries, as mentioned previously.

To calculate living density values for each country in the world, I used NASA statistical data for the world grid from :

NASA SEDAC (Socioeconomic Data and Applications Center)

They provide earth population data in different formats, but the most suitable for the above calculation is the GIS data in GeoTIFF format at 30 arc-second resolution, because 30 arc-seconds corresponds closely to 1km2 : there are 2 of those in a minute, and 60*2 in a degree, so 360*60*2 = 43,200 around the entire 40,000km of Earth's circumference – which averages to square-ish areas with around 40,000km/43,200 = 0.93km sides, so around 0.86 km2 on average ( less than 1km2 each ). Also, 30 arc-seconds is the highest resolution ( most detailed data ) available.

The population distribution based on the latest SEDAC data ( for 2020, but adjusted to UN WPP 2015 counts ) is visualized in the image below:

The image resolution is 10800 x 4500, so each pixel represents an area smaller than 4x4km. Red is used for cities, brown for urban areas, green for rural areas.

That data presents a certain problem for the previous 'living density' formula (2), because that formula is fixed to exact 1km2 units, while the NASA GEO data is given for "almost" each 1km2, but not exactly – as explained above, it is closer to 0.9km2 and, more importantly, it is not the same area for each cell.

So, to generalize the previous formula and make it more suitable, we observe that:

(3)   \begin{equation*}  population\%(A) =  population(A)/Total\;Population  \end{equation*}

(4)   \begin{equation*}  density(A) =  population(A)/area(A)  \end{equation*}

Therefore, if we substitute (3) and (4) into (2), we get:

(5)   \begin{equation*}   population\%(A) * density(A) = population(A)^2/area(A)/Total\;Population \end{equation*}

And since Total Population is a constant that does not depend on the selected area, we can rewrite the previous formula (2) for living density as:

(6)   \begin{equation*}  living\;density =  \frac{\sum\limits_{A=for\;each\,area}\frac{population(A)^2}{area(A)} }{Total\;Population}  \end{equation*}

For the above formula to be correct, the areas can be of different sizes but each area must be evenly populated ( homogeneous ), regardless of its size. Also, to really reflect 'living' density, it should only include land area, excluding bodies of water.

Applying this formula to our previous 'Singapore' example shows that it simplifies the calculation – we only consider two 'areas' : the 100km2 city area and the 9,900km2 outside area : living density = ( population_city^2/city_area + population_outside^2/outside_area )/TotalPopulation = ( 1M^2/100km2 + 0^2/9900km2 )/1M = 1M^2/100km2/1M = 1M/100km2 = 10,000 ppl/km2 … the same correct result as before.

The new formula is especially suitable when used on the already mentioned NASA GEO data, since it allows summing over areas of different sizes. And because each cell there has under 1km2 of area, we can safely assume that within such a small area the population is evenly/homogeneously distributed. The calculation needs only three data sources, all at 30 arc-second resolution ( a minimal code sketch applying the formula to them is shown after this list ) :

  1. population count – how many people for each cell
  2. land area – actual land area for each cell
  3. national grid – to which country each cell belongs
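
As an illustration of how formula (6) is applied to those three grids – this is not the actual C# application used for this article, just a minimal Python/numpy sketch, assuming the three GeoTIFFs are already loaded as equally shaped 2D arrays:

import numpy as np

def living_densities(population, land_km2, nation_id):
    # formula (6): per country, sum( population(A)^2 / area(A) ) / total population
    # population : people per 30 arc-second cell
    # land_km2   : land area of each cell in km2
    # nation_id  : integer country code of each cell ( 0 = no country )
    valid = (land_km2 > 0) & (population > 0) & (nation_id > 0)
    pop, area, nat = population[valid], land_km2[valid], nation_id[valid]
    result = {}
    for country in np.unique(nat):
        mask = nat == country
        total_pop = pop[mask].sum()
        # each cell contributes population^2/area, per formula (6)
        result[country] = (pop[mask] ** 2 / area[mask]).sum() / total_pop
    return result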

Technical difficulties in calculating and showing “living density”

Processing the NASA geo data sets involved several technical issues that needed to be overcome:

  • GIS data was too large for normal arrays : 43200 x 21600 float numbers, resulting in almost 1 billion array elements, almost 4GB in size. The solution was to enable gcAllowVeryLargeObjects in C# ( see the config sketch after this list )
  • Data needed too much RAM : even when C# can handle such arrays, they took too much RAM ( especially since 3 or 4 GEO data arrays needed to be processed at the same time, as listed above : population, area, nations… ). The solution was to process data in configurable bands – for example, 6000 lines per band, so around 4 bands for the total data.
  • GeoTIFF data needs decoding : to avoid the tangential effort of decoding the GeoTIFF format, I used the OSGeo.GDAL nuget from https://gdal.org
  • there were errors in the GEO data : negative populations, land areas etc… The solution was to detect the cases where they occurred on 'uninhabited land' ( like deserts and ice ) or 'no land' ( like lakes and seas )
  • there were statistical errors in the GEO data : some countries showed up to twice their real population in the 2020 data ( like Romania ). The solution was to use the "UN adjusted" datasets
  • final result visualization : while I made my own visualization maps and sortable tables ( in the same app used to process the geo data ), for embedding in HTML posts I used datawrapper.de
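
For reference, the gcAllowVeryLargeObjects setting from the first point is a standard .NET Framework runtime option in app.config – a minimal sketch:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- allow single arrays larger than 2GB on 64-bit platforms -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>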

Analysis of “Living” population density per world country

Applying the above method to process the geo data and get the living population density for each country in the world resulted in an exported CSV file that ( in addition to country codes, population and area ) included three calculated values for each world country:

  • living density – average population density where citizens live
  • classic density – simple total country population divided by total country area
  • concentration index – ratio of living over classic densities

That resulting CSV file can be downloaded from the 'get the data' link under each map, or in the download section at the end of this article ( bundled with my application for processing the original NASA data ).
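
For a quick start with that exported file, a minimal Python/pandas sketch is shown below; note that the column names here are an assumption for illustration – check the actual header of the downloaded exportedCountries.csv:

import pandas as pd

df = pd.read_csv("exportedCountries.csv")  # column names below are assumed
# concentration index is the ratio of living to classic density, formula (7) below
df["concentration"] = df["livingDensity"] / df["classicDensity"]
# ten most 'unevenly' populated countries
print(df.sort_values("concentration", ascending=False).head(10))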

The map below demonstrates the resulting living densities for world countries:

Differences from the classical densities are immediately visible – especially if we look at countries like Canada or Australia. Now they have a similar ( even slightly higher ) living density than the US – indicating that fewer people live in small rural areas, and more people are concentrated in cities.

Most countries around the world have a living population density in the range of 1,500-4,000 ppl/km2.

Exceptions are some countries with higher living density, like China ( 5,900 ), Brazil ( 6,100 ), Egypt ( 12,500 ) and especially Mexico ( over 14,000 ppl/km2 ). While Egypt was expected to have a high living density ( most people are forced to live close to the river Nile ), Mexico was less expected – but presumably countries that have large inhospitable areas will tend to have more of their population concentrated in cities and fewer people in those ( inhospitable ) rural areas. Examples are countries with deserts ( Morocco, Egypt ), jungle ( Brazil ), or in general a lot of barren/infertile land ( China, Mexico ).

There are also countries with a lower living density, like Germany ( 1,030 ppl/km2 ), Poland, and a number of other European countries.

For some countries, the reason for a low living density could be the lower quality of the NASA geo data. Some of those countries ( like Bulgaria, North Macedonia, Moldova ) appear to be missing city areas in the NASA data set – instead they have their city population spread evenly over larger 'regional' areas, so they appear as lower density while still keeping the same population. That could be a result of census data for those countries being available only at the regional level, as opposed to smaller areas. It must be noted that the density numbers presented here depend on the accuracy of the underlying NASA geo data, more specifically on the data resolution. If the resolution of the data is worse than 30 arc-seconds (~1km2) for some countries, they can still have an accurate total population, but their cities may be shown as larger low-density areas instead of smaller high-density areas, and their living density will show as lower than actual. But those countries are in the minority and can be visually detected on the 10800 x 4500 map above, or in the application from the download section – those countries will be missing red urban areas at the positions of their cities and will instead have evenly spread brown or green population areas, often within inner region/county borders. For most countries, the NASA geo data appears to be valid for population and density distribution at each km2.

We can see that the US has a similar living population density ( around 2,250 ppl/km2 ) to many European countries, but not all – because there is quite a difference among European countries, as mentioned before, even comparing countries with similar population, economic development levels and quality of geo data, like the UK, France and Germany – which have 4,180, 2,800 and 1,000 ppl/km2 living population densities respectively. But it is almost certain that individual US states would also have different living densities, so the best way to compare the US to Europe is to aggregate all European countries, which is presented below – where EU is the 27 countries of the European Union ( without the UK ), Europe consists of countries entirely on the continent ( 44 countries and 7 smaller territories ), Europe+ is the wiki definition of Europe ( with Russia, Turkey, Azerbaijan, Armenia, Kazakhstan and Georgia ), and NA refers to Northern America, which contains the US, Canada, Greenland and a few small countries :

Country    Population      Area [km2]    Classic [ppl/km2]    Living [ppl/km2]
US         333,421,581      9,090,390           37                 2,244
EU         441,176,165      4,039,020          109                 2,161
Europe     597,383,727      5,742,315          104                 2,296
Europe+    859,073,201     25,627,288           34                 2,564
NA         371,143,896     20,467,094           18                 2,345

This demonstrates that while the US has a significantly lower 'classic' density than the EU/Europe ( three times lower, due to a smaller population over a larger area ), they have practically the same living population density, around 2,200 ppl/km2. The extended European definition, which adds large countries like Russia and Kazakhstan, results in a huge area ( 2-3 times larger than the US ) but, with its larger population, amounts to about the same classic density as the US – while still having a living density at similar levels ( around 2,500 ppl/km2 ). A similar case is Northern America, which adds two large and mostly empty countries ( Canada and Greenland ) to the US, resulting in a two times lower classical density – but even there, living density remains similar ( 2,345 ppl/km2 ).

This indicates that on average the US and Europe have similarly high levels of urbanization ( while differences certainly exist between individual US states or European countries ). It also demonstrates that, whenever most of the population is concentrated in cities, it does not matter how empty or large the rest of the country is – living density ( density as seen by the average citizen ) will usually be close to the average city density.

Uneven concentration of population

When looking at both the "classic" population density and the "living" one, some countries have a much higher difference than others.

In fact, the ratio of living density to classic density is a direct indicator of how "unevenly" the population is concentrated in a country. In a hypothetical country where the population is ideally evenly distributed across the entire country area, those two densities would be the same ( as in our hypothetical Singapore example while the entire area was just the 100km2 of the city ). But when a country has most of its population crammed into several cities, with large uninhabited areas, living density becomes much higher than classic density. Examples are Canada or Australia – they both have a less evenly spread population than, for example, the US.

Therefore, we can state the formula:

(7)   \begin{equation*}  uneven\;index =  \frac{living\;density}{classic\;density}   \end{equation*}

So I made a third map, showing the above mentioned 'uneven index' as a "ratio of population densities", which is a measure of how homogeneous ( evenly spread population, low ratio index ) or non-homogeneous ( unevenly spread population, high ratio index ) the population of each country is:

Countries that are especially uneven are some large countries with small populations concentrated in a few cities ( like Canada, Australia, Mongolia ), barely populated countries like Greenland, or some mostly desert countries like Mauritania and Namibia.

But more interesting, and surprising, are the "most even" countries: quite a different mix, like several central European countries ( Germany, Poland, the Low Countries ), south-east European countries ( Bulgaria, Croatia, Bosnia, … ), India, some African countries ( South Sudan, Uganda ) etc. Very different countries, in development level, in size and, most interestingly, in population densities ( living and classical ). Yet all of them share the same trait: their population is more evenly distributed across the country than in most other countries.

Some fun/interesting questions related to concentration levels :

Q1: What do Germany, India and North Korea have in common?

A1: They all have a more evenly spread population across the country than most other countries.


Q2: If we know that an uncontrolled reentry of large space junk will hit a certain country, but we do not know where, what is the probability that it will endanger some citizens of that country?

A2: Inversely proportional to the 'concentration index' of that country ( since the fraction of a country's area that is inhabited is roughly classic density / living density, i.e. 1 / concentration index ). So the US would have 1 in 60 ( under 2% ), Canada 1 in 800 ( around 0.1% ) and San Marino 1 in 1 ( 100% ). Basically, darker colored countries on the "concentration" map would have a lower chance of some citizen being hit by space debris ( under the assumption that we somehow know which country will be hit, but not where ).

Downloadable resources

In order to process the NASA geo data and export the summary country CSV file, I made an application that can be downloaded in ZIP form from :

Since it is a C# application, it requires .NET Framework 4.7.2 ( which is included in Windows 10 April 2018 Update, version 1803 and later, or can be installed independently ).

Once it is unzipped to its folder, the notable files are :

  • GeoTiff.exe – main executable
  • saved_*.* files : cached pre-calculated files from the latest NASA data, used for this article and for the linked maps
  • predef_*.* files : used in case of 'Recalc' with new NASA data ( they contain names/codes for countries and cities )
  • exportedCountries.csv : summary file with country data, used for importing into the maps

While the main purpose of this application was to process the GEO data ( calculating population density ) and make the export files, it also has limited visualization capabilities. Both countries and cities can be explored on the geo map shown within the application, sorted by population/area/density, searched by name, and visualized on the map ( by double-clicking a city or country row in the tables, or right-clicking on the map ). The main map is made from the NASA geo data directly, linked to a smaller embedded Google map.

In addition to the standard UN countries, the application also calculates aggregated data for Northern America and for Europe ( in three variants, since "Europe" is not an exactly well defined term ) :

As mentioned before, I made this application to also detect the largest connected cities in the world. Cities are listed in a separate tab, with their "connected metropolitan" area and population. Those numbers depend on configurable parameters : 'city density' ( default 2,000 ppl/km2 ) and 'range' ( default max 6km of non-city 'jump' allowed ). Any change of those parameters requires a new recalculation ( using the NASA geo files ). An example of the largest "connected city" in the world under the default parameters is :

Note that this is not a production level application – it does not have a polished UI, and performance is optimized only for data processing, not for visualization. The only reason it has visualization at all is the lack of 3rd party visualization tools for cities or arbitrary areas. For countries, 3rd party tools like datawrapper are good for visualization and I have used them for the maps in this article. But for cities I was forced to make my own solution in this application.

Optional data files are needed for a new recalculation, and a warning with download instructions will be displayed if 'Recalculate' is attempted without them. Those files can be downloaded from the NASA SEDAC site :

  1. population count – how many people for each cell
  2. land area – actual land area for each cell
  3. national grid – to which country each cell belongs

Truel problem – solved with Jupyter / Python

About a year ago I decided to evaluate the usability of Jupyter notebook documents with Python code. Since both Python and Jupyter were new to me at that time, I selected a real world problem to solve using them – specifically, the "Truel" problem :

Several people are fighting a duel. Given their probabilities to hit, what is the probability of each of them winning, and whom should each choose as the optimal initial target?

The resulting solution, a static HTML page showing the results of the Truel analysis using Python functions, can be seen at the link below :

Truel solver as Python based Jupyter notebook

Of course, the main point of using Jupyter was to have an interactive document. That document ( including both the Python source and the Truel problem analysis ) is available in a GitHub repository, and also in ZIP form on this site ( the zip also contains an already precalculated cache file, to save some 45 min of initial calculation time ). The document is best used with JupyterLab.

While the above link demonstrates how that solution was used to analyse Truel cases, the point of this blog post is to give my summary of the usability of Python and Jupyter notebooks – which was the initial reason why I decided to solve the "Truel" problem.

The shortest possible summary would be:

The Jupyter/Python/numba combination was an excellent match for this problem

Especially suitable was the Jupyter document, because it allows interactivity and easy analysis of different cases, while still resulting in a visually good looking document. A great thing about Jupyter notebooks is that they do not recalculate the entire document when one cell is calculated – they remember already calculated variables and compiled parts. This is in contrast to running the same code in Visual Studio – where each small change required execution of the entire Python code.

Python itself was not such an excellent match out of the box for this problem, because the problem is very computationally intensive – especially for the 2D analysis, where Python functions need to be solved millions of times for 1000×1000 images. And Python, by default, was much slower than a solution in C# for example. In a situation where interactivity is important, it was not acceptable to wait 10+ minutes for every analysis image. But, apart from speed, Python was a good match due to its simplicity of coding and especially due to great modules like numpy ( for array/matrix operations ) and matplotlib ( for 2D visualization ).

The Python performance issues gave me a reason to explore numba – a Python module that allows 'just in time' compilation of Python code. Eventually that proved to be the right combination – numba accelerated Python functions were fast enough to produce 2D solutions in seconds on average, which was acceptable from the interactivity point of view.
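
A minimal sketch of that approach is shown below – not the actual Truel solver, just an illustration of a numba accelerated function over a 1000×1000 grid, using the known closed-form solution for a two-person duel ( the first shooter with hit probability a wins with probability a / (a + b - a*b) ):

import numpy as np
from numba import njit

@njit  # compiled to machine code on first call, near-C speed afterwards
def duel_win_grid(n):
    # P( first shooter wins ) for an n x n grid of hit probabilities (a, b)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            a = (i + 1) / n   # hit probability of the first shooter
            b = (j + 1) / n   # hit probability of the second shooter
            # first shooter wins: hits now, or both miss and the duel repeats
            out[i, j] = a / (a + b - a * b)
    return out

grid = duel_win_grid(1000)  # seconds with numba, far slower in plain Python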

Problems and shortcomings of Jupyter/Python/numba ( and workarounds )

While this eventually proved to be a good match, each of those technologies had some problems or limitations – some of them were overcome in this solution, while some remain:

  • python is slow – standard Python is slow when millions of complex calculations are needed. But this can be overcome by using numba
  • numba often requires rewriting python code – mostly due to type ambiguities, but also because some Python features are not supported in numba. This can not exactly be overcome, but it is easy to comply with when writing numba code from the start. Modifying old Python code for numba is also usually not hard – but can be tricky in some cases.
  • Jupyter notebook does not have a debug option – some bugs are hard to detect without it. This can be overcome by running the same code in Visual Studio and debugging there. Not an ideal option, since it may require slight code rearrangement – and it also does not support numba debugging ( solvable by temporarily marking functions as non-numba, since numba code is also valid Python code ).
  • Jupyter notebook often requires 'run all cells' – and that can result in 30 min of computation for the entire Truel document, which has many complex 2D comparisons ( most needing just a few seconds, but some needing a few minutes each ). I solved this problem by introducing a cache for large results ( e.g. 2D analysis data ), so running the code again without forcing recalculation simply retrieves the last result from the cache – resulting in a 30x faster 'run all cells' ( with the majority of the remaining time spent recompiling all numba functions ). A minimal sketch of such a cache is shown after this list.
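
The caching idea from the last point, sketched with pickle ( the file name and keys here are illustrative, not the actual cache file from the download ):

import os, pickle

CACHE_FILE = "truel_cache.pkl"
_cache = pickle.load(open(CACHE_FILE, "rb")) if os.path.exists(CACHE_FILE) else {}

def cached(key, compute, force=False):
    # return the cached result for key; compute and store it only when missing or forced
    if force or key not in _cache:
        _cache[key] = compute()
        with open(CACHE_FILE, "wb") as f:
            pickle.dump(_cache, f)
    return _cache[key]

# usage: grid = cached("2D_case_A", lambda: duel_win_grid(1000))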

Conclusion:

A Jupyter notebook document, based on Python with numba accelerated functions and matplotlib visualizations, was a great match for this problem – and is likely to be a good match for any similar problem that requires interactivity and visualization.

Publishing Visual Studio dotnet app to Linux


Initially I published my Blazor proof-of-concept projects to Azure, and that is fairly straightforward to set up in Visual Studio ( after the initial, fairly complicated Azure site setup ). But I decided to use Linux as the publish target for two reasons:

  • to reuse the same DigitalOcean droplet used for this WordPress site
  • to have the sites under my gmnenad.com domain, at a reasonable price

Azure allows using your own domain instead of appName.azurewebsites.net, but if you also want SSL on those custom domains ( which you must have in order to install PWA apps, like https://orao.gmnenad.com ), then Azure requires moving to at least the B1 tier – which is both more expensive ( over $50/month, compared to $0 for F1 or under $10/month for D1 ) and has worse performance ( just 1 CPU, compared to multiple CPUs on the shared F1/D1 plans ).

But while the reasons for publishing dotnet apps on Linux hosts instead of Azure will be different and subjective for most people, the problem remains the same :

How to publish a dotnet core app from Visual Studio to your Linux host in the most efficient way?


Visual Studio Publish options

While manually publishing files to any host is always possible, what I needed was a "one-click" publish integrated into the standard Visual Studio publish process ( right-click the VS project, 'Publish' ), and the currently supported options are:

  • Azure
  • Docker Container Registry
  • Folder
  • FTP/FTPS Server
  • Web Server (IIS)

Obviously, for publishing directly to a Linux host in order to be served by the same web server ( Apache or Nginx ) as the WordPress site, I had to ignore the Azure and IIS options, and even Docker. The logical choice was therefore FTP/FTPS Server. But …

FTP/FTPS Server was a bad option

FTP was not installed by default on the Linux droplets that I used, and furthermore I consider plain FTP too insecure, so I installed FTPS, which ( in the short version ) included :

  • installing vsftpd
  • creating ftpuser and linking his /home/ftpuser/ftp/www to /var/www
  • using mount ( added to /etc/fstab ), since FTPS does not work with symlinks
  • creating an openssl certificate for vsftpd.pem
  • significantly changing the default vsftpd.conf ( ssl options, chroot, userlist, passive… )
  • allowing FTP direct and passive ports ( 20, 21, 11000-12000 ) at the firewall ( ufw )
  • in Visual Studio, adding an FTP publish profile ( to /ftp/www/appName )

This “almost” works and allows standard “one-click” publish from Visual Studio , but has significant drawbacks:

  • complicated to setup
  • security implications ( another user, more open ports …)
  • Visual Studio reports a 426 error for every copied file, and reports 'Failed Publish' at the end
  • slow transfer ( maybe partially due to all those reported errors )
  • no automatic restart of the dotnet service on Linux

The reason why I said it "almost" works, even with VS reporting failure, is that all files do end up transferred to Linux – the reported error comes from a difference in how Windows and vsftpd think they should do FTPS : vsftpd expects the other side to confirm ( with code 3 ) when it ends the SSL session for an uploaded file, and when Windows does not send that code, vsftpd sends a 426 error back. Note that it is not a Windows vs Linux issue, since I tested curl on Linux, and it has the same problem with vsftpd.

But while I could ignore the error(s) reported by VS, the main issue was that after an FTPS publish I still had to manually SSH to the Linux box to restart the service for that dotnet app before the change became visible in the browser.

The end result is that FTPS was not a good option, which is also the reason why I didn't give details here about the specific steps listed above. Instead, I moved to the right option:


Folder publish is the right option

Of course, folder publish on its own will only publish locally, so it had to go in tandem with some app that supports file transfer. Initially I tried FileZilla, but it does not have scripting support, so a much better option was WinSCP – it does support scripts, and is a very good choice even for other file operations between Windows and Linux ( unrelated to publishing ).

The short version of what is needed with this approach:

  1. install WinSCP and make it work with your Linux box using SSH keys as wwwuser
  2. create a Visual Studio Folder publish profile
  3. create a WinSCP script and modify the Folder publish profile to call that script


The first step is a standard one and not related specifically to Visual Studio. While it is mostly straightforward, here is a detailed description of the Linux steps to create wwwuser, a pair of SSH keys, and to allow that user to SSH using those keys. It was done on Ubuntu 18.04 ( Bionic ).

# assume those commands are run as root in terminal

# add new wwwuser ( in www-data Apache group, optional ) and set his password 
adduser --ingroup www-data wwwuser 
echo 'wwwuser:*somepassword*' | chpasswd

# switch to wwwuser, so keygen generate folder in his home
su - wwwuser
# create new SSH pair of keys in /home/wwwuser/.ssh folder
ssh-keygen
# insert public key to allow wwwuser to SSH with its private key
cat /home/wwwuser/.ssh/id_rsa.pub >> /home/wwwuser/.ssh/authorized_keys
# set ReadOnly to authorized_keys
chmod 644 /home/wwwuser/.ssh/authorized_keys

# MOVE KEYS FROM /home/wwwuser/.ssh , leave only authorized_keys

The above should allow you to connect from WinSCP to the Linux box as wwwuser. Those keys should then be moved off the Linux box – they are not needed there anymore, and the private key ( id_rsa ) will be needed on the Windows box for WinSCP. To test it : install WinSCP, open "New Session", enter your server IP or domain, then press the "Advanced" button and enter the path to the private key in the SSH/Authentication section, as shown below:

Allow WinSCP to automatically convert that private key from the Linux format to its own format and save it as a PuTTY *.ppk file in the same folder. That should be enough for "Login" to work – after which you can save the session for further use.

The second step is also a standard one, not related to Linux – creating a Visual Studio "Folder" publish profile.

Right click on the project in Visual Studio, select "Publish" and then "New" if this is not the first publish profile. Select the "Folder" option and, after Next, leave the options at their defaults ( it offers "bin\Release\netcoreapp3.1\publish\" as the location ) and just finish the creation. You can select "Edit" to change a few options that were not available at the creation step, but I tend to leave those at their defaults too. This creates a new "FolderProfile.pubxml" in the VS project under Properties/PublishProfiles.

To test it, just right-click on the VS project, select Publish and click the "Publish" button – it should build your dotnet core app and store it in the "publish" folder from above.

The third step is where we integrate the publish process with WinSCP, to automatically transfer the published files to Linux.

It had two challenges:

  • make a WinSCP script that non-interactively copies files AND restarts the app service
  • find the correct place in the VS publish process ( AfterTargets="???" )

While the previous two steps are agnostic about the type and location of the actual dotnet app on the Linux box, in this step the script needs to know both. In my case, I had the following assumptions :

  • app type: a dotnet app hosted by Kestrel ( which was set up as a Linux service: systemctl start appName )
  • app location: /var/www/appName
  • the name of the systemctl service is the same as the name of the folder under /var/www : "appName"
  • the Linux script "restart_app.sh" was copied to /var/linuxVM

The location is the usual one for web apps, and hosting a dotnet app as a systemctl service which runs the Kestrel local web server with an Apache/Nginx proxy in front is the standard "type" of hosting for both Nginx and Apache web servers.
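
For completeness, a typical systemd unit for such a Kestrel hosted app could look like the sketch below ( saved as /etc/systemd/system/appName.service ; the port, user and paths here are assumptions, adjust them to your app ):

[Unit]
Description=appName dotnet web app

[Service]
WorkingDirectory=/var/www/appName
ExecStart=/usr/bin/dotnet /var/www/appName/appName.dll
Restart=always
User=www-data
Environment=ASPNETCORE_ENVIRONMENT=Production
Environment=ASPNETCORE_URLS=http://localhost:5000

[Install]
WantedBy=multi-user.target

After creating it, the service is enabled once with systemctl enable appName, and restart_app.sh can then stop/start it by name.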

For the WinSCP script, I created the file "publishLinux.sh" in the VS project root:

# open sftp session with wwwuser SSH key
open sftp://wwwuser@yourDomain.com/ -hostkey="ssh-ed25519..yourHostKey=" -privatekey="C:\path\to\private\key\user.ppk"

# create /var/www/appName folder ( but ignore error if it already exists, with batch continue )
cd "/var/www"
option batch continue
mkdir "%2%"
option batch off

# go to appName folder and delete all old files. CD should throw error if folder does not exist
cd "%2%"
rm *.*

# on Windows, go to publish folder, and copy all from it to www/appName on Linux
lcd "%1%"
put * 

# restart app service on Linux
call sudo /var/linuxVM/restart_app.sh "%2%"

# finish script with OK 
exit

This script uses two parameters:

  • %1% : the first parameter is the location of the published files on the Visual Studio machine
  • %2% : the second parameter is the name of my app ( one word )

For the script to work, you need to copy the previously created private SSH key to the Visual Studio machine. If you manually opened the WinSCP session as mentioned at the end of the first step, the easiest way to get the correct values is to select any file in the left side panel of WinSCP ( on the Windows side ) and click the "Upload" button. That will open the Upload dialog, where you should expand the "Transfer Settings" combo box/button and select "Generate Code…". That will show the commands in "Script File" format, and you only need to copy the first 'open sftp:…' line ( which has the correct host key and path to the private key ) over to the above script.

For restarting our Linux app after the publish is done, the WinSCP script relies on the "restart_app.sh" bash script previously copied to the /var/linuxVM folder :

#!/usr/bin/env bash
if [ ! -z "$1" ]; then
	# restart app service
	# instead of systemctl restart, since this will start even if it was stopped before
	sudo systemctl stop "$1" 2>/dev/null
	sudo systemctl start "$1" 2>/dev/null
	# reload proxy server too, may be needed by some indexed apps
	sudo systemctl reload apache2 2>/dev/null
fi

This restart_app.sh script was made with a few assumptions:

  • we use Apache as the proxy server. The alternative last line for an Nginx reload would be: sudo nginx -s reload
  • we can have both "proxied" apps ( with an Apache/Nginx "ProxyPass" to http://localhost:500x hosted by dotnet Kestrel ) and "indexed" apps ( where the app's index.html is served directly by Apache or Nginx, an option for Blazor WASM apps )
  • we added to /etc/sudoers : wwwuser ALL=NOPASSWD: /var/linuxVM/restart_app.sh *

If we only use "proxied" apps ( since Blazor WASM apps can also be used that way ), we may not need to reload the web server, so the "reload Apache" line would not be needed in "restart_app.sh". Also, in order to allow 'call sudo /var/linuxVM/restart_app.sh "%2%"' from the WinSCP script without being asked for a sudo password, we need to add our "restart_app.sh" script to the "/etc/sudoers" file ( the wildcard * at the end allows us to supply any parameter ). In theory it would be possible to call the above commands directly from the "publishLinux.sh" WinSCP script using call, and skip "restart_app.sh" – but that would require changes on all our Visual Studio installations if we change from Apache to Nginx and, more importantly, would require giving wwwuser sudoers rights for an unrestricted "systemctl *", which is not good security practice. Using our "restart_app.sh" script also allows us to further check whether the supplied appName is one of ours ( if we want more security ).
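
For reference, the "proxied" variant mentioned above corresponds to an Apache virtual host section similar to this sketch ( the port must match the Kestrel URL of the app, and mod_proxy / mod_proxy_http must be enabled; both values here are assumptions ):

<VirtualHost *:80>
    ServerName appName.gmnenad.com
    ProxyPreserveHost On
    # forward all requests to the Kestrel server hosting the dotnet app
    ProxyPass / http://localhost:5000/
    ProxyPassReverse / http://localhost:5000/
</VirtualHost>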

The last part is calling "publishLinux.sh" from the Visual Studio publish profile. To do that, open Properties / PublishProfiles / "FolderProfile.pubxml" in Visual Studio ( that is the Properties folder under the project root, not the project options ) and add a new <Target> section after the last </PropertyGroup>, so that the modified profile looks like this:

<?xml version="1.0" encoding="utf-8"?>
<!--
https://go.microsoft.com/fwlink/?LinkID=208121. 
-->
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <DeleteExistingFiles>True</DeleteExistingFiles>
    <ExcludeApp_Data>False</ExcludeApp_Data>
    <LaunchSiteAfterPublish>True</LaunchSiteAfterPublish>
    <LastUsedBuildConfiguration>Release</LastUsedBuildConfiguration>
    <LastUsedPlatform>Any CPU</LastUsedPlatform>
    <PublishProvider>FileSystem</PublishProvider>
    <PublishUrl>bin\Release\netcoreapp3.1\publish\</PublishUrl>
    <WebPublishMethod>FileSystem</WebPublishMethod>
    <SiteUrlToLaunchAfterPublish />
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <ProjectGuid>20245b63-e767-4b2a-8261-312f840e8213</ProjectGuid>
    <SelfContained>false</SelfContained>
  </PropertyGroup>

  <Target Name="LinuxPublish" AfterTargets="FileSystemPublish">
    <Message Importance="high" Text="*** Linux Publish             ... copying to LinuxVM ... " />
    <Exec Command="call &quot;C:\Program Files (x86)\WinSCP\WinSCP.exe&quot; /ini=nul /script=publishLinux.sh /parameter // &quot;$(PublishUrl)&quot; appName " />
  </Target>

</Project>

As mentioned above, finding the correct place in the VS publish process to insert our call is important. Here I had to do a few trials and errors until I found AfterTargets="FileSystemPublish" to be suitable ( it is called after the publish folder is complete, and regardless of whether a rebuild was done or not ). Since this may change in the future, if Microsoft changes the publish process order, one way to find the best AfterTargets is to set the VS [ Tools-> Options-> Projects and Solutions-> Build and Run-> MSBuild project build output verbosity ] option from the default "Minimal" to "Diagnostic", then run publish and find in the output the last 'Done building target "XYZ"' or similar message mentioning the completion of some target, and use that last mentioned target name.

The only thing that needs changing in the above FolderProfile.pubxml is 'appName' at the end of the Exec command, which defines both the folder on Linux to copy to and the Linux service to restart. The other parameter ( the publish folder location ) is set automatically via $(PublishUrl). In case the publish is failing, you can add "/log=WinSCP.log" before "/script=" in the same Exec command, as a debug option.

That means each dotnet project will have its own FolderProfile.pubxml ( with its own publish folder and appName ), but they can all call the same publishLinux.sh.

A good thing about this approach is that the publish process will wait until the file transfer is done, and correctly report success ( or failure if something was not copied ), with output similar to:

...
*** Linux Publish             ... copying to LinuxVM ... 
call "C:\Program Files (x86)\WinSCP\WinSCP.exe" /ini=nul /script=publishLinux.sh /parameter // "bin\Release\netcoreapp3.1\publish\" appName 
Web App was published successfully file:///E:/sourcePath/bin/Release/netcoreapp3.1/publish/

========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
========== Publish: 1 succeeded, 0 failed, 0 skipped ==========

Benefits of this Folder option over the FTPS option :

  • easier to setup
  • no additional services and open ports on Linux
  • Visual Studio correctly reports success or error
  • faster transfer
  • automatic restart of the dotnet service on Linux

The end result is a real "one-click" publish of a dotnet app from Visual Studio to a Linux host.

Starting Blog

As mentioned previously, blogging is not the primary goal of this site, but it will still be used for interesting issues related to the projects mentioned on this site, the technologies used, or just some general topics.

A few blog posts that are "soon to come" will be related to issues of hosting Blazor apps on a Linux machine like this one.