Building Azure Functions with Postgres

Cloud Architect
Oct 23, 2022


Microsoft makes it very straightforward to build services that access PostgreSQL on Azure; however, there are a few learnings for someone (such as me) who has previously done this with ASP.NET, IIS and SQL Server.

Performance & Load Testing

A single Azure Function calling Postgres may perform really well. However, if it locks tables and indexes, it may not scale well at all. In fact, your awesome query that runs in a few milliseconds may take many seconds when competing for database resources against other queries running at the same time.

Perform a load test with something like JMeter, simulating 20 users making constant function calls for 10 minutes, and you will likely discover performance issues that you missed.

Close That Connection

Npgsql is the recommended driver for .NET, but I discovered that it will soon fail if you keep a single connection open as a Singleton service. It is best to register the connection as Transient, pull one from the pool on each Function call, and close it at the end of the call.
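As a rough sketch, assuming the in-process Functions model and a hypothetical PostgresConnection app setting, the registration might look like this:

```csharp
// Startup.cs -- a minimal sketch for the in-process Azure Functions model.
// "PostgresConnection" is a hypothetical app setting name.
using System;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection;
using Npgsql;

[assembly: FunctionsStartup(typeof(MyApp.Startup))]

namespace MyApp
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            // Transient: each resolution hands out a fresh NpgsqlConnection.
            // This is cheap because Npgsql pools the physical connections
            // underneath; Open() just borrows one from the pool.
            builder.Services.AddTransient(_ =>
                new NpgsqlConnection(
                    Environment.GetEnvironmentVariable("PostgresConnection")));
        }
    }
}
```

Inside the Function, wrap the injected connection in a using (or await using) block so that disposal returns it to the pool at the end of every call.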

Speed up text search with GIN Indexes

A GIN index, paired with the pg_trgm extension, is a great performance tool for text searching, including LIKE and similarity operations. It can dramatically reduce your query time over millions of records, and there is no equivalent in SQL Server.
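As a sketch, with hypothetical table and column names:

```sql
-- Enable the trigram extension, which lets GIN index LIKE/similarity.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- "products" and "name" are hypothetical names.
CREATE INDEX idx_products_name_trgm
    ON products USING GIN (name gin_trgm_ops);

-- This leading-wildcard search can now use the index
-- instead of scanning every row:
SELECT * FROM products WHERE name ILIKE '%widget%';
```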

Get familiar with pgAdmin

I don’t love the user interface of pgAdmin at all; I usually create my queries in VS Code and paste them in. However, it is fully featured, allowing you to view and manage multiple servers, and it provides graphical output for your EXPLAIN statements. There are other commercial tools that may be worth considering if you are using this every day.

Avoid ‘IN’ or ‘ANY’ for large result sets

I really like the simplicity of ‘IN’ or ‘ANY’ in a SQL statement; however, for a large result set they can have a significant performance impact. I found that materializing the subquery into an array first, such as ‘WHERE myID = ANY (ARRAY(SELECT myId FROM myTable WHERE myCond = @Param1))’, will speed things up noticeably. Even better would be to use a proper table join.
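A sketch of the three variants, using hypothetical tables:

```sql
-- 1. Plain IN with a subquery: simple, but this is the form
--    that proved slow over large result sets.
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE region = 'EU');

-- 2. ARRAY(...) materializes the subquery once into an in-memory array:
SELECT * FROM orders
WHERE customer_id = ANY (ARRAY(SELECT id FROM customers WHERE region = 'EU'));

-- 3. Usually best: a join lets the planner pick the strategy.
SELECT o.*
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU';
```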

Handle all sensitive data in Postgres itself

Your Azure Function should be secure, but if you can perform a query completely in Postgres itself, that is better from a security point of view. For example, if you want the total dollar value of all accounts, that can be computed entirely within Postgres, with just the sum returned to the Function. Even if you need to return sensitive data as part of the Function response, ensure that you are dealing with the minimum data set required.
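For instance, a hypothetical accounts table can be totalled entirely server-side:

```sql
-- Only the single aggregate value leaves the database;
-- the individual balances never cross the wire.
SELECT SUM(balance) AS total_balance
FROM accounts;
```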

Learn Stored Procedures in Postgres

If you are coming from SQL Server, then the Postgres implementation of Stored Procedures requires some new learning: LINK
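As a taste of the differences (hypothetical names throughout): procedures arrived in Postgres 11, are typically written in PL/pgSQL, and are invoked with CALL rather than EXEC:

```sql
CREATE OR REPLACE PROCEDURE archive_old_orders(cutoff date)
LANGUAGE plpgsql
AS $$
BEGIN
    -- Move closed-out rows into an archive table, then remove them.
    INSERT INTO orders_archive
        SELECT * FROM orders WHERE created_at < cutoff;
    DELETE FROM orders WHERE created_at < cutoff;
END;
$$;

-- CALL, not EXEC as in SQL Server:
CALL archive_old_orders('2022-01-01');
```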

Don’t throw more iron at the problem until the end

If you are experiencing poor query performance, do not reach for the Azure settings and increase the resources for your Postgres database. All this does is provide marginally better performance while doing nothing to solve the underlying query inefficiencies. Work with EXPLAIN and see where most of the work is happening. One well-placed index (for $0) will provide a bigger performance increase than $4,000/month of Azure resources.
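A sketch of that workflow, with hypothetical names:

```sql
-- Step 1: see where the time goes.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- A "Seq Scan on orders" node with a large actual time is the
-- classic sign of a missing index.

-- Step 2: add the $0 index.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Step 3: re-run the EXPLAIN ANALYZE; the plan should now show
-- an Index Scan and a much lower execution time.
```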

After development has completed, you can run load-performance comparison tests between Postgres machine sizes.

Experiment with work_mem (working memory)

Queries have a limit to how much memory they can use before spilling to disk, and disk writes are extremely expensive in terms of performance. You can check the EXPLAIN ANALYZE output in pgAdmin to see if this is happening. Look at the server-level work_mem, which is set to 4MB by default (see Server Parameters in the Azure Portal). You can also set work_mem at the beginning of a query session, but remember to reset it afterwards. LINK
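A sketch of the per-session pattern, with a hypothetical query:

```sql
-- Raise work_mem for one heavy query only...
SET work_mem = '32MB';

SELECT user_id, COUNT(*) AS hits
FROM events
GROUP BY user_id
ORDER BY hits DESC;

-- ...then put it back so other work on this connection is unaffected.
RESET work_mem;

-- Tip: EXPLAIN (ANALYZE, BUFFERS) prints
-- "Sort Method: external merge  Disk: ..." when a sort spills to disk.
```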

It is better to optimize your queries with less memory, so the inefficiencies show themselves; allocating too much memory also restricts how many queries can run at the same time. Nevertheless, for some queries more working memory is an obvious help, and increasing from 4MB to 32MB (for example) is fairly safe on the server configurations we have in 2022. Specific Microsoft guidance on this parameter is here: LINK

Note that Hyperscale (covered below) sets work_mem automatically, so there is no need to touch it in that scenario.

Using Hyperscale (advanced)

If you really need high performance over millions or billions of records, then take a look at Microsoft’s Hyperscale (Citus) for PostgreSQL, which can split up your tables and run your query over different parts of them in parallel: Overview of Azure Database for PostgreSQL — Hyperscale (Citus) | Microsoft Learn
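As a sketch, sharding a hypothetical events table uses the standard Citus function create_distributed_table():

```sql
-- Shard "events" across the worker nodes on tenant_id
-- (both names are hypothetical).
SELECT create_distributed_table('events', 'tenant_id');

-- Queries filtered on the distribution column route to a single shard;
-- other queries fan out and run across the shards in parallel.
SELECT COUNT(*) FROM events WHERE tenant_id = 42;
```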

Again, this performance feature should be used to improve queries that are already highly efficient.
