So Lets start with some basic first.
Atomicity once a transaction is not committed it must be rolled back.
Consistency while making data changes the rules of the data must be followed.
Isolation every transaction must run seperately from each other
durability for all transactions proper logging using transaction logs should happen to solve cases of disaster.
Transaction : A transaction is a unit of work in a database that typically contains several commands that read
from and write to the database.
Types of transaction :
Implicit : just SQL server engine processes the transaction query
Explicit : these transactions start with BEGIN tran and end with either COMMIT or ROLLBACK
SQL server architecture :
Main components :
1. Relational engine
2. Storage engine
3. Buffer Pool
4. Data files & transaction log
5. Network
Relational Engine : It parses the query with Query Parser for syntax . Further the optimizer omtimizes the query with best execution plan. and finally query executor executes the query.
Storage engine : responsible for I/O of data . Using access methods it manages access over various views , idexes , page allocations etc. Transaction manager manages the transactions are stored in the transaction logs. and the most important is the buffer manager which deals with sql server main memory consumer the buffer pool.
SQL server network interface : the client can contact the sql server engine via SSNI . that has to happen via various protocols, namely :
1. Shared memory ( Protocol used on a single machine
2. Named Pipes within a WAN
3. TCPIP enables to connect sql server using IP and port. default port is 1433
TDS (Tabular Data Stream) : a protocol used to interact with SQL server . TDS end point is created between client and server over the network protocol. So now a sql statement is sent to sql server in a TDS packet
Protocol Layer : shows where our query has gone so far. At the client, the statement was wrapped in a TDS packet by the SQL Server Network Interface and sent to the protocol layer on the SQL Server where it was unwrapped, identified as a SQL Command, and the code sent to the Command Parser by the SNI.
Command Parser : The Command Parser’s role is to handle T-SQL language events. It first checks the syntax and returns any errors back to the protocol layer to send to the client. If the syntax is valid, then the next step is to generate a query plan or fi nd an existing plan. A query plan contains the details about how SQL Server is going to execute a piece of code. It is commonly referred to as an execution plan . To check for a query plan, the Command Parser generates a hash of the T-SQL and checks it against
the plan cache to determine whether a suitable plan already exists. The plan cache is an area in the buffer pool used to cache query plans. If it fi nds a match, then the plan is read from cache and passed on to the Query Executor for execution.
Plan Cache : Plan cache stores a existing execution plan in buffer pool. if there is no matching execution plan then a query tree is created based on TSQL and sent to query optimizer by the parser.
Query Optimizer :
First stage is the Pre-optimization where the process drops out if query is too simple.
After this starts the main phases :
Phase 0 : here optimizer looks at nested loop joins etc. optimizer stops here if it gets cost <0.2 Plan generated at this phase is called TP plan (transaction phase plan)
Phase 1: uses subset of optimization rules which were used before for similar scenarios. optimizer stops here if it get <0.2
Phase 2 : uses all available optimization .
Query Executor : As name says it will execute the query based on the query plan generated by optimizer.
Access Methods : Access methods is group of code which provide interface for the data which is retrieved or modified. It asks the buffer manager to retrieve the page and convert into OLE DB rowset to pass back to relational engine.
Buffer Manager
The Buffer Manager, as its name suggests, manages the buffer pool, which represents the majority of SQL Server’s memory usage. If you need to read some rows from a page (you’ll look at writes when we look at an UPDATE query), the Buffer Manager checks the data cache in the buffer pool to see if it already has the page cached in memory. If the page is already cached, then the results are passed back to the Access Methods. If the page isn’t already in cache, then the Buffer Manager gets the page from the database on disk, puts it in the data cache, and passes the results to the Access Methods.
The key point to take away from this is that you only ever work with data in memory. Every new
data read that you request is first read from disk and then written to memory (the data cache) before being returned as a result set. This is why SQL Server needs to maintain a minimum level of free pages in memory; you wouldn’t be able to read any new data if there were no space in cache to put it first. The Access Methods code determined that the SELECT query needed a single page, so it asked the Buffer Manager to get it. The Buffer Manager checked whether it already had it in the data cache, and then loaded it from disk into the cache when it couldn’t find it.
Data Cache :
The data cache is usually the largest part of the buffer pool; therefore, it’s the largest memory consumer within SQL Server. It is here that every data page that is read from disk is written to before being used.The sys.dm_os_buffer_descriptors DMV contains one row for every data page currently held in cache. You can use this script to see how much space each database is using in the data cache:
SELECT count(*)*8/1024 AS 'Cached Size (MB)'
,CASE database_id
WHEN 32767 THEN 'ResourceDb'
ELSE db_name(database_id)
END AS 'Database'
FROM sys.dm_os_buffer_descriptors
GROUP BY db_name(database_id),database_id
ORDER BY 'Cached Size (MB)' DESC
How a Update Query Works : the process is exactly the same as the process for the SELECT statement you just looked at until you get to the Access Methods. The Access Methods need to make a data modification this time, so before the I/O request is passed on, the details of the change need to be persisted to disk. That is the job of the Transaction Manager.
Transaction Manager :
The Transaction Manager has two components that are of interest here: a Lock Manager and a Log Manager. The Lock Manager is responsible for providing concurrency to the data, and it delivers the configured level of isolation by using locks. NOTE The Lock Manager is also employed during the SELECT query life cycle covered earlier, The real item of interest here is actually the Log Manager. The Access Methods code requests that the changes it wants to make are logged, and the Log Manager writes the changes to the transaction log. This is called write-ahead logging (WAL). Writing to the transaction log is the only part of a data modification transaction that always needs a physical write to disk because SQL Server depends on being able to reread that change in the event of system failure (you’ll learn more about this in the “Recovery” section coming up). What’s actually stored in the transaction log isn’t a list of modifi cation statements but only details of the page changes that occurred as the result of a modifi cation statement. This is all that SQL Server needs in order to undo any change, and why it’s so difficult to read the contents of a transaction log in any meaningful way, although you can buy a third-party tool to help. Getting back to the UPDATE query life cycle, the update operation has now been logged. The actual data modification can only be performed when confirmation is received that the operation has been physically written to the transaction log. This is why transaction log performance is so crucial. Once confirmation is received by the Access Methods, it passes the modification request on to the Buffer Manager to complete.
Buffer Manager
The page that needs to be modified is already in cache, so all the Buffer Manager needs to do is modify the page required by the update as requested by the Access Methods. The page is modified in the cache, and confirmation is sent back to Access Methods and ultimately to the client. The key point here (and it’s a big one) is that the UPDATE statement has changed the data in the data cache, not in the actual database file on disk. This is done for performance reasons, and the page is now what’s called a dirty page because it’s different in memory from what’s on disk. It doesn’t compromise the durability of the modification as defined in the ACID properties because you can re-create the change using the transaction log if, for example, you suddenly lost power to the server, and therefore anything in physical RAM (i.e., the data cache).
so finally this happens as we can see in the fig:
Dirty Pages
When a page is read from disk into memory it is regarded as a clean page because it’s exactly the same as its counterpart on the disk. However, once the page has been modifi ed in memory it is marked as a dirty page. Clean pages can be fl ushed from cache using dbcc dropcleanbuffers, which can be handy when you’re troubleshooting development and test environments because it forces subsequent reads to be fulfi lled from disk, rather than cache, but doesn’t touch any dirty pages.
Lazy Writer
The lazy writer is a thread that periodically checks the size of the free buffer list. When it’s low, it
scans the whole data cache to age-out any pages that haven’t been used for a while. If it fi nds any dirty pages that haven’t been used for a while, they are fl ushed to disk before being marked as free in memory. The lazy writer also monitors the free physical memory on the server and will release memory from the free buffer list back to Windows in very low memory conditions. When SQL Server is busy, it will also grow the size of the free buffer list to meet demand (and therefore the buffer pool) when there is free physical memory and the confi gured Max Server Memory threshold hasn’t been reached.
Checkpoint Process
A checkpoint is a point in time created by the checkpoint process at which SQL Server can be sure that any committed transactions have had all their changes written to disk. This checkpoint then becomes the marker from which database recovery can start. The checkpoint process ensures that any dirty pages associated with a committed transaction will be flushed to disk. It can also fl ush uncommitted dirty pages to disk to make effi cient use of writes but unlike the lazy writer, a checkpoint does not remove the page from cache; it ensures the dirty page is written to disk and then marks the cached paged as clean in the page header.
Comparison between Checkpoint & Lazy writer
Checkpoint | Lazy writer | |
1 | Checkpoint is used by sql engine to keep database recovery time in check | Lazy writer is used by SQL engine only to make sure there is enough memory left in sql buffer pool to accommodate new pages |
2 | Check point always mark entry in T-log before it executes either sql engine or manually | Lazy writer doesn’t mark any entry in T-log |
3 | To check occurrence of checkpoint , we can use below query select * from ::fn_dblog(null,null) WHERE [Operation] like ‘%CKPT’ | To check occurrence of lazy writer we can use performance monitor SQL Server Buffer Manager Lazy writes/sec |
4 | Checkpoint only check if page is dirty or not | Lazy writer clears any page from memory when it satisfies all of 3 conditions.
1. Memory is required by any object and available memory is full
2. Cost factor of page is zero
3. Page is not currently reference by any connection
|
5 | Checkpoint is affected by two parameters
1. Checkpoint duration: is how long the checkpoint can run for.
2. Recovery interval: affects how often it runs.
| Lazy writer is affected by
1. Memory pressure
2. Reference counter of page in memory
|
6 | Check point should not be very low , it can cause increasing recovery time of database | No. of times lazy writer is executing per second should always be low else it will show memory pressure |
7 | Checkpoint will run as per defined frequency | No memory pressure, no lazy writer |
8 | Checkpoint tries to write as many pages as fast as possible | Lazy writer tries to write as few as necessary |
9 | checkpoint process does not put the buffer page back on the free list | Lazy writer scans the buffer cache and reclaim unused pages and put it n free list |
10 | We can find last run entry of checkpoint in Boot page | Lazy writer doesn’t update boot page |
11 | Checkpoint can be executed by user manually or by SQL engine | Lazy writer cant be controlled by user |
12 | It keeps no. of dirty pages in memory to minimum | It helps to reduce paging |
13 | Auto frequency can be controlled using recovery interval in sp_configure | Works only @ memory pressure , It uses clock algorithm for cleaning buffer cache |
14 | It will be automatically executed before every sql statement which requires consistent view of database to perform task like (Alter, backup, checkdb, snapshot …..) | It kicks pages out of memory when reference counter of page reaches to zero |
15 | Command : Checkpoint | No command available |
16 | It comes in picture to find min lsn whenever t-log truncates | No entry in T-log |
17 | Checkpoint is affected by Database recovery model | Lazy writer doesn’t get impacted with recovery model of database |
18 | To get checkpoint entry in error log DBCC TRACEON(3502, -1) | Not Applied |
19 | Members of the SYSADMIN, DB_OWNER and DB_BACKUPOPERATOR can execute checkpoint maually | Not Applied |