Now that we've covered an introduction to PostgreSQL 13, we want to focus our attention on the next important topic. Locking is a vital concept for any kind of database. It is not enough to understand just how it works to write proper or better applications—it is also essential from a performance point of view. Without handling locks properly, your applications might not only be slow; they might also behave in very unexpected ways. In my opinion, locking is the key to performance, and having a good overview of this will certainly help. Therefore, understanding locking and transactions is important for administrators and developers alike. In this chapter, you will learn about the following topics:
- Working with PostgreSQL transactions
- Understanding basic locking
- Making use of FOR SHARE and FOR UPDATE
- Understanding transaction isolation...
Working with PostgreSQL transactions
PostgreSQL provides you with highly advanced transaction machinery that offers countless features to developers and administrators alike. In this section, we will look at the basic concept of transactions.
The first important thing to know is that, in PostgreSQL, everything is a transaction. If you send a simple query to the server, it is already a transaction. Here is an example:
test=# SELECT now(), now();
now | now
-------------------------------+-------------------------------
2020-08-13 11:03:17.741316+02 | 2020-08-13 11:03:17.741316+02
(1 row)
In this case, the SELECT statement will be a separate transaction. If the same command is executed again, different timestamps will be returned.
Understanding basic locking
In this section, you will learn about basic locking mechanisms. The goal is to understand how locking works in general and how to get simple applications right.
To show you how things work, we will create a simple table. For demonstrative purposes, I will add one row to the table using a simple INSERT command:
test=# CREATE TABLE t_test (id int); CREATE TABLE
test=# INSERT INTO t_test VALUES (0); INSERT 0 1
The first important thing is that tables can be read concurrently. Many users reading the same data at the same time won't block each other. This allows PostgreSQL to handle thousands of users without any problems.
The question now is what happens if reads and writes occur at the same time? Here is an example. Let's assume that the table contains one row and its id = 0:
Transaction 1 |
Transaction 2 |
BEGIN; |
BEGIN; |
UPDATE t_test SET id = id + 1 RETURNING *; |
|
User will see 1 |
SELECT * FROM t_test; |
User will see 0 |
|
... |
Making use of FOR SHARE and FOR UPDATE
Sometimes, data is selected from the database, then some processing happens in the application, and finally, some changes are made back on the database side. This is a classic example of SELECT FOR UPDATE.
Here is an example that shows the way SELECT is often executed in the wrong way:
BEGIN;
SELECT * FROM invoice WHERE processed = false; ** application magic will happen here ** UPDATE invoice SET processed = true ... COMMIT;
The problem here is that two people might select the same unprocessed data. Changes that are made to these processed rows will then be overwritten. In short, a race condition will occur.
To solve this problem, developers can make use of SELECT FOR UPDATE. Here's how it can be used. The following example will show a typical scenario:
BEGIN; SELECT * FROM invoice WHERE processed = false FOR UPDATE; ** application magic will happen here ** UPDATE invoice SET processed = true ... COMMIT;
SELECT FOR UPDATE...
Understanding transaction isolation levels
Up until now, you have seen how to handle locking, as well as some basic concurrency. In this section, you will learn about transaction isolation. To me, this is one of the most neglected topics in modern software development. Only a small fraction of software developers are actually aware of this issue, which in turn leads to mind-boggling bugs.
Here is an example of what can happen:
Transaction 1 |
Transaction 2 |
BEGIN; |
|
SELECT sum(balance) FROM t_account; |
|
User will see 300 |
BEGIN; |
INSERT INTO t_account (balance) VALUES (100); |
|
COMMIT; |
|
SELECT sum(balance) FROM t_account; |
|
User will see 400 |
|
COMMIT; |
Most users would actually expect the first transaction to always return 300, regardless of the second transaction. However, this isn't true. By default, PostgreSQL runs in the READ COMMITTED transaction isolation mode. This means that every statement inside a transaction will get a new snapshot of...
Observing deadlocks and similar issues
Deadlocks are an important issue and can happen in every database. Basically, a deadlock will happen if two transactions have to wait on each other.
In this section, you will see how this can happen. Let's suppose we have a table containing two rows:
CREATE TABLE t_deadlock (id int); INSERT INTO t_deadlock VALUES (1), (2);
The following example shows what can happen:
Transaction 1 |
Transaction 2 |
BEGIN; |
BEGIN; |
UPDATE t_deadlock SET id = id * 10 WHERE id = 1; |
UPDATE t_deadlock SET id = id * 10 WHERE id = 2; |
UPDATE t_deadlock SET id = id * 10 WHERE id = 2; |
|
Waiting on transaction 2 |
UPDATE t_deadlock SET id = id * 10 WHERE id = 1; |
Waiting on transaction 2 |
Waiting on transaction 1 |
Deadlock will be resolved after 1 second (deadlock_timeout) |
|
COMMIT; |
ROLLBACK; |
As soon as the deadlock is detected, the following error message will show up:
psql: ERROR: deadlock detected
DETAIL: Process 91521 waits for ShareLock...
Utilizing advisory locks
PostgreSQL has highly efficient and sophisticated transaction machinery that is capable of handling locks in a really fine-grained and efficient way. A few years ago, people came up with the idea of using this code to synchronize applications with each other. Thus, advisory locks were born.
When using advisory locks, it is important to mention that they won't go away on COMMIT as normal locks do. Therefore, it is really important to make sure that unlocking is done properly and in a totally reliable way.
If you decide to use an advisory lock, what you really lock is a number. So, this isn't about rows or data; it is really just a number. Here's how it works:
Session 1 |
Session 2 |
BEGIN; |
|
SELECT pg_advisory_lock(15); |
|
SELECT pg_advisory_lock(15); |
|
It has to wait |
|
COMMIT; |
It still has to wait |
SELECT pg_advisory_unlock(15); |
It is still waiting |
Lock is taken |
The first transaction will lock 15. The second transaction...
Optimizing storage and managing cleanup
Transactions are an integral part of the PostgreSQL system. However, transactions come with a small price tag attached. As we've already shown in this chapter, sometimes, concurrent users will be presented with different data. Not everybody will get the same data returned by a query. In addition to this, DELETE and UPDATE are not allowed to actually overwrite data since ROLLBACK would not work. If you happen to be in the middle of a large DELETE operation, you cannot be sure whether you will be able to COMMIT or not. In addition to this, data is still visible while you perform the DELETE operation, and sometimes data is even visible once your modification has long since finished.
Consequently, this means that cleanup has to happen asynchronously. A transaction cannot clean up its own mess and any COMMIT/ROLLBACK might be too early to take care of dead rows.
The solution to this problem is VACUUM. The following code block provides you with a...
Summary
In this chapter, you learned about transactions, locking and its logical implications, and the general architecture the PostgreSQL transaction machinery can have for storage, concurrency, and administration. You saw how rows are locked and some of the features that are available in PostgreSQL.
In Chapter 3, Making Use of Indexes, you will learn about one of the most important topics in database work: indexing. You will also learn about the PostgreSQL query optimizer, as well as various types of indexes and their behavior.
Questions
- What is the purpose of a transaction?
- How long can a transaction in PostgreSQL be?
- What is transaction isolation?
- Should we avoid table locks?
- What do transactions have to do with VACUUM?
Answers to these questions can be found in the GitHub repository (https://github.com/PacktPublishing/Mastering-PostgreSQL-13-Fourth-Edition)