Why use PostgreSQL
MySQL has been the king of databases for a long time, but these days there are relational DBs, document DBs, graph DBs, and K/SV stores. Within each category, there's a number of different databases. The question is how can we make the best choice from the various options?
Database Popularity Evolution from StackOverflow(2017-2022)
A research report published by StackOverflow gives us an answer! PostgreSQL has become the most popular DB technology in 2022. For the first time, PostgreSQL (46.48%) topped MySQL (45.68%) in the popularity rankings for the professional developer community. Its popularity is growing rapidly in the past few years, the report ranking it as the "Most Wanted" database and DB-Engines also declares PostgreSQL as the DBMS of the Year 2020.
Why is PostgreSQL so popular? Why we use PostgreSQL? This article will give you the answer, and help you choose the best option for your next project.
What is PostgreSQL?
PostgreSQL is a full-featured object-relational database management system (ORDBMS), which was started in 1986 under the direction of Michael Stonebreaker, a professor at the University of California, Berkeley. It is a powerful cross-platform open source ORDBMS that runs on many operating systems such as Linux, FreeBSD, OS X, Solaris, and Microsoft Windows. It has excellent transaction processing capability and powerful data analysis capability. In terms of application scenarios, PostgreSQL is suitable for rigorous enterprise application scenarios (e.g. finance, telecom, ERP, CRM).
PostgreSQL supports highly transactional, mission-critical applications, it is also well documented with a large number of free online manuals, and archived reference manuals for older versions.
PostgreSQL's main benefit is it's high data consistency and integrity, but it also encapsulates four important features: Atomicity, Consistency, Isolation, Durability (known as ACID), which ensure the transactions remain correct and reliable in DBMS during the process of writing or updating data.
What is ACID in PostgreSQL
In a database management system (DBMS), there are four characteristics that must be present to ensure that transactions are correct and reliable when writing or updating data: atomicity, consistency, isolation, and durability.
A transaction is a sequence of one or more operations. For a database, this sequence of operations is either all executed or none executed, and is an inseparable unit of work for the database.
Transactions usually consist of multiple statements. Atomicity guarantees that each transaction is treated as a single unit, and that the transaction either succeeds completely or fails completely. In other words, if any statement in a transaction fails to complete, the entire transaction fails and the data entries in the incomplete transaction remain unchanged in the database. An atomic system must guarantee atomicity under all circumstances, including power failures, database errors, and instance runs. At the same time, atomicity prevents the occurrence of partial data updates in the database.
Consistency ensures that transactions can only change the database from one valid state to another, guarantees that the final state of the database data is consistent. Consistency is mainly ensured by defining rules such as constraints on table objects, triggers, cascades or a combination of rules between them. These rules are used to prevent abnormal transactions in the database.
Isolation guarantees that multiple transactions can occur simultaneously. The simplest example is that multiple transactions can read and write to a table at the same time. Isolation ensures that multiple transactions can proceed simultaneously and not affect each other.
Durability is to ensure that once a transaction is committed, even in the event of a system exception (e.g., power failure, instance crash), the incomplete transaction can be rolled back, and the completed transaction can be automatically committed and recorded on a permanent storage medium.
ACID in PostgreSQL
PostgreSQL mainly uses two technologies, Multi-Version Concurrency Control (MVCC) and Write-Ahead Logging (WAL), to implement ACID features, ensuring the validity of the data in case of exceptions.
- In PostgreSQL, the transaction ID is left on the data operated by add, delete and change, so that the batch operation can be easily committed or completely undone, thus ensuring the atomicity of the transaction.
- Using MVCC, read operations do not block write operations, and write operations do not block read operations, improving performance under concurrent access.
- The rollback of a transaction can be done immediately, regardless of the number of operations performed by the transaction.
- Data can be updated in large quantities and will not need to ensure that the rollback segment is not exhausted as in MySQL, InnoDB and Oracle.
Benefits of PostgreSQL
The multi-process architecture makes PostgreSQL more stable under abnormal conditions.
- A subprocess will not affect the operation of other processes even if it crashes.
- Under high concurrent reads and writes, PostgreSQL's performance metrics can maintain a hyperbolic or even logarithmic curve with no drop after the peak, while MySQL will clearly dip after the wave.
The scalability of a database system is directly dependent on the ability to compress data. Ideally, database systems must have advanced, off-the-shelf compression techniques. In some database systems, developers have to compress manually, which is not only time-consuming but also inefficient. PostgreSQL provides it for free and the whole process is automatic.
PostgreSQL basically releases a patch version every 3 months, which means that known bugs are quickly fixed and needs are met in a timely manner. One major version update per year, keeping it fresh and the more powerful it is.
- In terms of tables and views, PostgreSQL supports temporary tables and can use PL/pgSQL (Procedural Language/ Postgres SQL), PL/Perl, PL/Python to materialize views.
- For indexes, it fully supports R-/R+tree indexes, full-text search, bitmap indexes, hash indexes, reverse indexes, partial indexes, expression indexes, Generalized Search Trees (GiST).
- For other objects, PostgreSQL supports data fields, stored procedures, triggers, functions, external calls, and
- In terms of data table partitioning, it supports 4 kinds of partitioning, range partitioning, hash partitioning, hybrid partitioning, list partitioning.
- In terms of stored procedures, PostgreSQL supports stored procedures. This point can avoid the transmission of a large number of raw SQL statements on the network.
- In terms of extensions of user-defined functions, PostgreSQL can be more easily extended using User Defined Functions (UDF).
- Point-In-Time-Recover (PITR), a feature introduced in PostgreSQL starting from version 8.0, allows database clusters to be recovered to any point in time using base backups and continuous archived logs.
- Provide high availability services through asynchronous or synchronous replication methods across servers.
Ease of operation and maintenance
- Data Definition Language (DDL) can be placed in a transaction, delete tables, truncate, create functions, indexes, can all be placed in a transaction to take effect atomically, or rolled back. This can do a lot of cool things, like changing two tables via RENAME in one transaction.
- Can concurrently create and delete indexes, add non-null fields, reorganize indexes and tables. This means that major schema changes can be made online at any time, and indexes can be optimized as needed.
- Various replication methods: segment replication, stream replication, trigger replication, logical replication, plug-in replication, etc. It makes it very easy for services to migrate data: you can directly copy, re-read, and re-write.
- Various submission methods: asynchronous submission, synchronous submission, and quorum synchronous submission. This means that PostgreSQL allows trade-offs and choices between C and A, such as using synchronous commit for transaction libraries and asynchronous commit for normal libraries.
- The system view is complete and easy to monitor.
- The existence of Foreign Data Wrappers (FDW) makes Extract-Transform-Load (ETL) extremely simple. FDW makes it easy for one instance to access data or metadata of other instances. It is useful in cross-partition operations, database monitoring metrics collection, data migration and other scenarios. It can also interface with many heterogeneous data systems.
SQL Battle: PostgreSQL vs MySQL
Both PostgreSQL and MySQL are relational databases that organise data into tables. These tables can be joined or related based on data shared by each table. Relational databases enable your business to better understand the relationships between available data and help gain new insights to make better decisions or discover new opportunities.
Comparison of PostgreSQL and MySQL
- PostgreSQL is a free and open source system.
- MySQL is an open source system developed by Oracle and offers several paid versions for users to access.
- PostgreSQL is suitable for use in large-scale systems that require high read and write speeds.
- MySQL is mainly used for web applications which only need a database to perform data transactions.
- PostgreSQL fully adheres to ACID principles and ensures all requirements are met.
- MySQL is only ACID-compliant when using InnoDB and the NDB cluster storage engine.
- PostgreSQL can run on Linux, Windows (Win2000 SP4 and above), FreeBSD, OpenBSD, NetBSD, Mac OS X, AIX, IRIX, Solaris, Tu64, HP-UX OS, and the open source Unix OS.
- MySQL can run on Oracle Solaris, Microsoft Windows, Linux Mac OS X, and open source FreeBSD OS.
Programming Language Support
- MySQL is written in C and C++. It supports C/C++, Erlang, PHP, Lisp (ListProcessing), Golang, Perl, Java, Delphi, R, and Node.js.
- PostgreSQL supports materialized views.
- MySQL does not support materialized views.
- PostgreSQL supports master-standby replication and can also handle other types of replication by implementing third-party extensions.
- MySQL supports master-standby replication, where each node is the master and has the right to update data.
- PostgreSQL is highly scalable, can add and have data types, operators, index types and functional languages.
- MySQL does not support Scalability.
- PostgreSQL has an active community of support that helps improve existing features, and its creative committers do their best to ensure that the database stays up-to-date with the latest features and maximum security, making it the most advanced database available.
- MySQL also has a large community of followers, while these community contributors, especially after its acquisition by Oracle, will only occasionally focus on new features as they emerge and maintain existing features.
Advantages of PostgreSQL over MySQL
- PostgreSQL is under the BSD (Berkeley Software Distribution) open source protocol, which allows the development and sale of commercial versions based on PostgreSQL, which has more advantages in commercial scenarios. MySQL is under the GPL (General Public License) agreement, which means if a software uses the GPL agreement, then the software must also be open source, which is a strict limitation.
- MySQL's storage engine plug-in mechanism has the problem that the lock mechanism is complex and affects concurrency, while PostgreSQL does not exist.
- PostgreSQL uses physical replication. Compared with MySQL's binlog (binary log: record write operations: add, delete, change, exclude query in a MySQL database) based logical replication, data consistency is more reliable, replication performance is higher, and the impact on host performance is smaller.
- From an enterprise user's perspective, MySQL is a single functional component dedicated to OLTP and often requires ES, Redis, Mongo, and others to work together to meet complete data storage needs, whereas PostgreSQL basically doesn't have this problem.
- MySQL generally leaves the data compliance verification to the client; PostgreSQL does a stricter job in terms of legality difficulties. For example, if you insert the time "2012-02-30" in MySQL, it will succeed, but the result will be "0000-00-00"; PostgreSQL does not allow inserting this value.
- Better performance. PostgreSQL performs nearly 50 times faster than MySQL when it reads 10,000 records at random after inserting 10 million data. (Test conditions: same database structure, table structure, insertion of 10 million random data.)
6.1. Time for a random query of 10,000 records out of 10 million records using PostgreSQL.
6.2. Time for a random query of 10,000 records out of 10 million records using MySQL.
Challenges of using PostgreSQL
There are some several disadvantages of PostgreSQL:
- Because of using Multiversion Concurrency Control (MVCC), the database needs to be VACUUM (In PostgreSQL operations, tuples that are deleted or deprecated by updates are not physically removed from their tables, they will remain until a VACUUM is performed) regularly, and tables and indexes need to be maintained regularly to avoid performance degradation.
- The slow query log and the general log are mixed together, need to be parsed and processed.
- The official PostgreSQL does not have a very useful column store.
MySQL is user-oriented, answering the question "what problem do you want to solve"; while PostgreSQL tends to be theoretical, answering the question "how should the database solve the problem", which leads its sustainable development, with rigorous transaction processing capabilities and powerful data analysis capabilities. In terms of application scenarios, PostgreSQL is more suitable for strict enterprise application scenarios (e.g. finance, telecom, ERP, CRM), while MySQL is more suitable for Internet scenarios with relatively simple business logic and low data reliability requirements.
Frequently Asked Questions:
Q: MongoDB vs PostgreSQL - which is better for most applications?
A: MongoDB and PostgreSQL are stored in different data structures. PostgreSQL is preferred when a high level of security is required and the number of transactions or queries is large. MongoDB is used for unstructured data storage and is not suitable for highly secure applications.
Q: What are the main differences between SQL vs NoSQL?
A: We've summarised the main differences between the two databases in the image below.
Tim is the face of the company. When you want to kick off a new project, or an update on your existing project, Tim is your man!