PostgreSQL Tutorial

PostgreSQL also known as Postgres, was developed by Michael Stonebraker of the University of California, Berkley. It started as the Ingres Project and later evolved into Postgresql as we know today. In the year 1982, Michael Stonebraker started a post-Ingres project to address the problems with contemporary database systems. He was awarded the Turing Award in the year 2014 for the projects and techniques pioneered in them. The POSTGRES project aimed at adding fewest features like the ability to define various data types and to fully describe relationships – something used widely, but maintained completely by the end-user. POSTGRES used various ideas of Ingres, but had its unique source code. The initial version of PostgreSQL was designed to run on UNIX-like platforms. However, it was then evolved to be mobile so that it could run on other platforms such as Mac OS X, Solaris, and Windows.

What is PostgreSQL?

PostgreSQL is one of the most advanced general-purpose object-relational database management system and is open-source. Being an open-source software, its source code is available under PostgreSQL license, a liberal open source license. Anyone with the right skills is free to use, modify, and distribute PostgreSQL in any form. As it is highly stable, very low effort is required to maintain this DBMS.

Why PostgreSQL is best for you?

The key features that make PostgreSQL a reliable and user-friendly are listed below:

User-defined types
Table inheritance
Sophisticated locking mechanism
Foreign key referential integrity
Views, rules, subquery
Nested transactions (savepoints)
Multi-version concurrency control (MVCC)
Asynchronous replication
Native Microsoft Windows Server version
Tablespaces
Point-in-time recovery

Why PostgreSQL is unique in itself?

Below is the list of features that makes PostgreSQL stand-out among other DBMS:

The multi-version concurrency control (MVCC) feature was first implemented by PostgreSQL.
Custom functions developed in languages like C/C++, Python, Java, etc can be added to PostgreSQL.
As PostgreSQL is designed to be extensible, one can define their own data types, index types, functional languages, etc.
If one intends to remove any part of the system, one can always develop a custom plugin to enhance it to meet their specific requirements.

Large Scale users of PostgreSQL

A number of companies have built products and solutions using PostgreSQL. Few of those companies are Apple, Fujitsu, Red Hat, Cisco, Juniper Network, etc.

PostgreSQL – System Architecture

PostgreSQL is an open-source Database Management System that has an object-relational nature. PostgreSQL is a successor of one of the earliest systems i.e. POSTGRES system. It is one of the most widely used open-source database management systems.

PostgreSQL has a Client-server model of architecture. In the simplest term a PostgreSQL service has 2 processes:

Server-side process: This is the “Postgres” application that manages connections, operations, and static & dynamic assets.
Client-side process(Front-end applications): These are the applications that users use to interact with the database. It generally has a simple UI and is used to communicate between the user and the database generally through APIs.

Client-side Process :

When the user runs queries on PostgreSQL, the Client Application can connect to the PostgreSQL server (Postmaster Daemon Process) and submit queries through one of many Database Client Application program interface supported by PostgreSQL like JDBC, Perl DBD, ODBC, etc. that helps to provide client-side libraries. In the Client Process, the Communication between Client Application and Client Application library occurs with the help of Library API as shown in the figure below:

PostgreSQL System Architecture

1. Postmaster Daemon Process :

The system architecture of PostgreSQL is based on Process-Per-Transaction Model(Client/Server Model). A running PostgreSQL site is managed by Postmaster which is a central coordinating process. It is also known as Server Process.

The postmaster daemon process is responsible for :

Initializing the server
Shutting Down the server
Handling Connection requests from new clients.
Perform Recovery.
Run Background Processes.

Shared Memory: Shared memory is the memory that is simultaneously accessed by multiple programs in order to provide fast and efficient results with less redundancy. This is the memory that is reserved for Database Caching and transactional log caching. In PostgreSQL shared Disk Buffer and Shared Tables are Used whose working is explained below:

Shared Disk Buffer: The purpose of the shared disk buffer is to minimize the disk Input/Output. If it is not used, then the Disk Input/Output takes more time which causes redundancy and an inefficient system. The advantages of using a shared buffer are:

Reduce time.
Can Access a very large amount of Data Easily.
Minimize heating when multiple users is accessing at the same time.

Shared Tables: This approach involves using the same set of tables to host multiple client data. The main advantages of using this approach are:

The Lowest Hardware Cost
The Lowest Backup Cost
It allows working with large data in a single database.

UNIX System: In UNIX System Kernel Disk Buffer maintain a Memory buffer and provide physical Storage to Data in Disk Storage. Also, the command of PostgreSQL is verified that the syntax is written is correct and provide an error message with the reason that what is missing in the command, etc.

2. Back-end process:

The Postmaster is responsible for handling initial client connections. For this, it constantly listens for new connections as a known port. After Performing an initialization process such as authentication of the user, the postmaster will give rise to a new backend server process to handle the new client. The client interacts only with the Backend server process like submitting queries and receiving queries result. This will show that PostgreSQL actually uses Process- per-transaction model.

The Backend Server is responsible for Executing queries submitted by the client by performing specific operations. Each backend server will handle only a single query at a time. At a time, multiple clients are connected to the system hence multiple backend servers executing queries Concurrently. The back-end server access data from the main-memory buffer pool which is placed in shared memory.

After that, the result obtained is provided to the Client Process by Back-end Process.

WAL (Write Ahead Log) Writer	This process Write and flushes WAL Data on WAL buffer.
logging collector	This process is also called logger. It will write an error message to the log file.
Auto vacuum launcher	When auto vacuum is enabled, this process has the responsibility of the auto vacuum daemon to carry vacuum operations on bloated tables. This process relies on the stats collector process for perfect table analysis.
Archiver	When Achiever is enabled, the process has the responsibility to copy WAL log file to the specified directory.
stats collector	In this statistics information like pg_stat_activity and pg_stat_all_tables are collected.
checkpointer	When a checkpoint occurs, the dirty buffer written to the file.
writer	It will periodically write the dirty buffer to the file.

3. Shared Pool:

The Shared pool is a RAM area within the RAM Heap that is created during the starting time. A shared pool is a component of SGA (System Global Area). If Shared Pool is not available in RAM or it is not used then it results in high library cache reloads, high row cache reloads.

Why PostgreSQL didn’t use Shared Pool?

PostgreSQL doesn’t provide a shared pool although most of the Database Systems like Oracle, the Shared pool is an important component of its structure. It doesn’t have because PostgreSQL will provide a feature to share SQL information at the process level as compared to Shared Pool. Simply, if the user will execute the same SQL query several times in one process, it will hard-parse only once which is advantageous over other Database systems because, in another database system that uses a shared pool, the hard-parse occurs for a single SQL statement that is loaded from Shared pool. If the user executes simultaneously a single SQL query several times then it will cause more load.

4. OID in PostgreSQL :

OID stands for Object Identifier types. OID is used by PostgreSQL as a Primary Key for various system tables. It is implemented as an unsigned four bytes integer. We can also have an option to use OID in the user-defined table as” WITH OIDS ” but is discouraged to use because it is not large enough to provide uniqueness in a Large User-Defined Table. OID’s usually fits best for system tables. It basically gives a built id for every row, contained in the system column.

In PostgreSQL 12 version, the feature of OID for User tables has been removed indirectly i.e. we can use OID explicitly.

Merits of PostgreSQL:

PostgreSQL is a highly risk-tolerant Database and requires low maintenance cost.
It uses LAMP (Linux, Apache, MySQL, PHP) Stack to execute dynamic website and web-application.

Demerits of PostgreSQL:

It is a bit slow as compared to a commercial Database.
It doesn’t support the various open-source applications as compared to MYSQL.

Search This Blog

PostgresSQL