Compression and its effects on performance

One of the many new features introduced back in SQL Server 2008 was Data Compression. Compression at either the row or page level provides an opportunity to save disk space, with the trade-off of requiring a bit more CPU to compress and decompress the data. It’s frequently argued that the majority of systems are IO-bound, not CPU-bound, so the trade-off is worth it. The catch? You had to be on Enterprise Edition to use Data Compression. With the release of SQL Server 2016 SP1, that has changed! If you’re running Standard Edition of SQL Server 2016 SP1 or higher, you can now use Data Compression.

There’s also a new built-in function for compression, COMPRESS (and its counterpart DECOMPRESS). Data Compression does not work on off-row data, so if you have a column like NVARCHAR(MAX) in your table with values typically more than 8000 bytes in size, that data won’t be compressed (thanks Adam Machanic for that reminder).
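To make the two functions concrete, here is a minimal round-trip sketch. The variable names and the sample string are made up for illustration; the only assumption about the environment is SQL Server 2016 or later, where COMPRESS and DECOMPRESS are available.

```sql
-- Build a value large enough to be stored off-row (well over 8000 bytes).
DECLARE @original NVARCHAR(MAX) =
    REPLICATE(CAST(N'Compression and its effects on performance. ' AS NVARCHAR(MAX)), 1000);

-- COMPRESS returns VARBINARY(MAX) holding the GZIP-compressed bytes.
DECLARE @compressed VARBINARY(MAX) = COMPRESS(@original);

-- DECOMPRESS also returns VARBINARY(MAX), so cast back to the source type.
DECLARE @restored NVARCHAR(MAX) = CAST(DECOMPRESS(@compressed) AS NVARCHAR(MAX));

SELECT
    DATALENGTH(@original)   AS OriginalBytes,
    DATALENGTH(@compressed) AS CompressedBytes,
    CASE WHEN @restored = @original THEN 1 ELSE 0 END AS RoundTripMatches;
```

The cast after DECOMPRESS has to match the type that was compressed; casting NVARCHAR-sourced bytes to VARCHAR would produce unreadable text.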

The COMPRESS function solves the off-row problem: it compresses data up to 2GB in size. Moreover, while I’d argue that the function should only be used for large, off-row data, I thought comparing it directly against row and page compression was a worthwhile experiment.

SETUP

For test data, I’m working from a script Aaron Bertrand has used previously, but I’ve made some tweaks. I created a separate database for testing, but you can use tempdb or another sample database. I started with a Customers table that has three NVARCHAR columns. I considered creating larger columns and populating them with strings of repeating letters, but readable text gives a more realistic sample and therefore more representative results.

Note: If you’re interested in implementing compression and want to know how it will affect storage and performance in your environment, I HIGHLY RECOMMEND THAT YOU TEST IT. I’m giving you the methodology with sample data; implementing this in your environment shouldn’t involve additional work.
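As one possible starting point for that kind of testing, SQL Server ships a system procedure that estimates row or page compression savings by sampling an existing table. This is not part of the methodology in this post, only a hedged sketch; the schema and table names below stand in for your own objects, and availability by edition and version is worth confirming for your instance.

```sql
-- Estimate PAGE compression savings for every index and partition of dbo.Customers.
-- Swap N'PAGE' for N'ROW' (or N'NONE') to compare settings.
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'Customers',  -- placeholder: use your own table
    @index_id         = NULL,          -- NULL = all indexes
    @partition_number = NULL,          -- NULL = all partitions
    @data_compression = N'PAGE';
```

The estimate covers storage only; the CPU cost still has to be measured under a real workload.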

You’ll note below that after creating the database we’re enabling Query Store. Why create a separate table to try and track our performance metrics when we can just use functionality built-in to SQL Server?!

USE [master];
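A minimal sketch of that step follows. The database name CompressionTest is hypothetical (the name used in the original script is not shown here), and the Query Store options are one reasonable configuration rather than the author's exact settings.

```sql
USE [master];
GO

-- Hypothetical database name, chosen only for this sketch.
CREATE DATABASE [CompressionTest];
GO

-- Turn Query Store on, then make sure it records every query.
ALTER DATABASE [CompressionTest] SET QUERY_STORE = ON;
GO

ALTER DATABASE [CompressionTest] SET QUERY_STORE
(
    OPERATION_MODE     = READ_WRITE,
    QUERY_CAPTURE_MODE = ALL
);
GO
```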

With the table created, we’ll add some data, but we’re adding 5 million rows instead of 1 million. This takes about eight minutes to run on my laptop.

INSERT dbo.Customers WITH (TABLOCKX)

Now we’ll create three more tables: one for row compression, one for page compression, and one for the COMPRESS function. Note that with the COMPRESS function, you must create the columns as VARBINARY data types. As a result, there are no nonclustered indexes on the table: a VARBINARY(MAX) column cannot be an index key, and an index on compressed bytes would not help searches on the original text anyway.

CREATE TABLE [dbo].[Customers_Page]
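A sketch of what the COMPRESS table could look like is below. The column list is an assumption: [FirstName] and [LastName] are named in this post, while [CustomerID], [EMail], and the constraint name are hypothetical stand-ins for the rest of the original definition.

```sql
-- Every formerly-NVARCHAR column becomes VARBINARY(MAX), because that is
-- what COMPRESS returns. Only the integer key is indexed.
CREATE TABLE [dbo].[Customers_Compress]
(
    [CustomerID] INT            NOT NULL,  -- hypothetical key column
    [FirstName]  VARBINARY(MAX) NOT NULL,
    [LastName]   VARBINARY(MAX) NOT NULL,
    [EMail]      VARBINARY(MAX) NOT NULL,  -- hypothetical third text column
    CONSTRAINT [PK_Customers_Compress] PRIMARY KEY CLUSTERED ([CustomerID])
);
```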

Next we’ll copy the data from [dbo].[Customers] to the other three tables. This is a straight INSERT for our page and row tables and takes about two to three minutes for each INSERT, but there’s a scalability issue with the COMPRESS function: trying to insert 5 million rows in one fell swoop just isn’t reasonable. The script below inserts rows in batches of 50,000, and only inserts 1 million rows instead of 5 million. I know, that means we’re not truly apples-to-apples here for comparison, but I’m ok with that. Inserting 1 million rows takes 10 minutes on my machine; feel free to tweak the script and insert 5 million rows for your own tests.

INSERT dbo.Customers_Page WITH (TABLOCKX)
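The batching idea can be sketched as follows. This is not the original script: it assumes a contiguous integer [CustomerID] key starting at 1 and a hypothetical [EMail] column, and it walks the key range rather than counting rows.

```sql
SET NOCOUNT ON;

DECLARE @BatchSize INT = 50000;    -- rows per batch, as described above
DECLARE @MaxRows   INT = 1000000;  -- stop after 1 million rows
DECLARE @StartID   INT = 1;        -- assumes CustomerID starts at 1 with no gaps

WHILE @StartID <= @MaxRows
BEGIN
    -- Each iteration compresses and inserts one key range.
    INSERT [dbo].[Customers_Compress] ([CustomerID], [FirstName], [LastName], [EMail])
    SELECT [CustomerID],
           COMPRESS([FirstName]),
           COMPRESS([LastName]),
           COMPRESS([EMail])
    FROM [dbo].[Customers]
    WHERE [CustomerID] >= @StartID
      AND [CustomerID] <  @StartID + @BatchSize;

    SET @StartID += @BatchSize;
END;
```

Raising @MaxRows to 5000000 would load the full data set for a closer comparison.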

With all our tables populated, we can do a check of size. At this point, we have not implemented ROW or PAGE compression, but the COMPRESS function has been used:

SELECT [o].[name], [i].[index_id], [i].[name], [p].[rows],
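One way to run such a size check is sketched below. It uses sys.dm_db_partition_stats rather than the catalog views the original query starts from, so the column names and shape of the output are assumptions, not the author's exact result set.

```sql
-- Rows, compression setting, and reserved space per index for the test tables.
SELECT
    OBJECT_NAME([ps].[object_id])          AS TableName,
    [i].[index_id],
    [i].[name]                             AS IndexName,
    [ps].[row_count],
    [p].[data_compression_desc],
    [ps].[reserved_page_count] * 8 / 1024  AS ReservedMB  -- pages are 8 KB
FROM [sys].[dm_db_partition_stats] AS [ps]
JOIN [sys].[indexes] AS [i]
    ON  [ps].[object_id] = [i].[object_id]
    AND [ps].[index_id]  = [i].[index_id]
JOIN [sys].[partitions] AS [p]
    ON  [ps].[partition_id] = [p].[partition_id]
WHERE OBJECT_NAME([ps].[object_id]) LIKE N'Customers%'
ORDER BY TableName, [i].[index_id];
```

Running it before and after the index rebuilds gives a direct before/after comparison, and data_compression_desc confirms which setting each index actually has.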

As expected, all tables except Customers_Compress are about the same size. Now we’ll rebuild indexes on all tables, implementing row and page compression on Customers_Row and Customers_Page, respectively.

ALTER INDEX ALL ON dbo.Customers REBUILD;
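The rebuilds described above can be sketched as four statements, assuming the table names used in this post. Only the row and page tables get a DATA_COMPRESSION option; the other two are rebuilt as-is so that all four are compared after the same maintenance.

```sql
-- Baseline: rebuild without changing compression.
ALTER INDEX ALL ON [dbo].[Customers] REBUILD;

-- Apply row compression to every index on the table.
ALTER INDEX ALL ON [dbo].[Customers_Row] REBUILD WITH (DATA_COMPRESSION = ROW);

-- Apply page compression to every index on the table.
ALTER INDEX ALL ON [dbo].[Customers_Page] REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Data already compressed by the COMPRESS function; plain rebuild only.
ALTER INDEX ALL ON [dbo].[Customers_Compress] REBUILD;
```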

As expected, the row and page compression significantly decreases the size of the table and its indexes. The COMPRESS function saved us the most space: the clustered index is one quarter the size of the original table.

EXAMINING QUERY PERFORMANCE

Before we test query performance, note that we can use Query Store to look at INSERT and REBUILD performance:

SELECT [q].[query_id], [qt].[query_sql_text],
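A hedged sketch of such a Query Store query is below. The catalog views and columns are standard Query Store objects, but the selected columns, the text filter, and the unit conversions are choices made for this sketch, not the original query.

```sql
-- Executions, duration, CPU, and logical reads per captured query.
-- Query Store reports avg_duration and avg_cpu_time in microseconds.
SELECT
    [q].[query_id],
    [qt].[query_sql_text],
    [rs].[count_executions],
    [rs].[avg_duration] / 1000.0  AS AvgDurationMs,
    [rs].[avg_cpu_time] / 1000.0  AS AvgCpuMs,
    [rs].[avg_logical_io_reads]
FROM [sys].[query_store_query] AS [q]
JOIN [sys].[query_store_query_text] AS [qt]
    ON [q].[query_text_id] = [qt].[query_text_id]
JOIN [sys].[query_store_plan] AS [pl]
    ON [q].[query_id] = [pl].[query_id]
JOIN [sys].[query_store_runtime_stats] AS [rs]
    ON [pl].[plan_id] = [rs].[plan_id]
WHERE [qt].[query_sql_text] LIKE N'%Customers%'  -- assumed filter for the test tables
ORDER BY [q].[query_id];
```

Each row is one plan in one runtime-stats interval, so a query that ran across several intervals appears more than once unless the results are aggregated.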

While this data is interesting, I’m more curious about how compression affects my everyday SELECT queries. I have a set of three stored procedures that each have one SELECT query, so that each index is used. I created these procedures for each table, and then wrote a script to pull values for first and last names to use for testing. Here is the script to create the procedures.
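To illustrate the difference between the plain and COMPRESS variants, here is a sketch of the singleton-lookup procedure for two of the tables. The procedure names follow the suffix convention used in this post, but the parameter name, the column list, and the NVARCHAR(MAX) casts are assumptions.

```sql
-- Non-compressed table: a straightforward clustered index seek.
CREATE PROCEDURE [dbo].[usp_FindSpecificCustomer_C]
    @CustomerID INT  -- hypothetical parameter name
AS
BEGIN
    SET NOCOUNT ON;

    SELECT [CustomerID], [FirstName], [LastName], [EMail]
    FROM [dbo].[Customers]
    WHERE [CustomerID] = @CustomerID;
END;
GO

-- COMPRESS table: same seek, but every text column must be decompressed
-- and cast back to NVARCHAR before it is readable.
CREATE PROCEDURE [dbo].[usp_FindSpecificCustomer_CS]
    @CustomerID INT  -- hypothetical parameter name
AS
BEGIN
    SET NOCOUNT ON;

    SELECT [CustomerID],
           CAST(DECOMPRESS([FirstName]) AS NVARCHAR(MAX)) AS [FirstName],
           CAST(DECOMPRESS([LastName])  AS NVARCHAR(MAX)) AS [LastName],
           CAST(DECOMPRESS([EMail])     AS NVARCHAR(MAX)) AS [EMail]
    FROM [dbo].[Customers_Compress]
    WHERE [CustomerID] = @CustomerID;
END;
GO
```

A search by name against the COMPRESS table would need DECOMPRESS in the WHERE clause for every row, which is why those variants have to scan.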

Once we have the stored procedures created, we can run the script below to call them. Kick this off and then wait a couple minutes…

SET NOCOUNT ON;
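A simplified workload loop might look like the sketch below. It only exercises the singleton-lookup procedures, assumes each takes a hypothetical @CustomerID parameter, and picks keys at random from the first million rows so that every table (including the COMPRESS one) can find a match.

```sql
SET NOCOUNT ON;

DECLARE @i INT = 1;
DECLARE @CustomerID INT;

WHILE @i <= 20
BEGIN
    -- Random key between 1 and 1,000,000 (modulo first, so ABS cannot overflow).
    SET @CustomerID = ABS(CHECKSUM(NEWID()) % 1000000) + 1;

    -- Same lookup against each table so Query Store can compare them.
    EXEC [dbo].[usp_FindSpecificCustomer_C]  @CustomerID = @CustomerID;
    EXEC [dbo].[usp_FindSpecificCustomer_R]  @CustomerID = @CustomerID;
    EXEC [dbo].[usp_FindSpecificCustomer_P]  @CustomerID = @CustomerID;
    EXEC [dbo].[usp_FindSpecificCustomer_CS] @CustomerID = @CustomerID;

    SET @i += 1;
END;
```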

You’ll see that most stored procedures have executed only 20 times because two procedures against [dbo].[Customers_Compress] are really slow. This is not a surprise; neither [FirstName] nor [LastName] is indexed, so any query will have to scan the table. I don’t want those two queries to slow down my testing, so I’m going to modify the workload, comment out EXEC [dbo].[usp_FindActiveCustomer_CS] and EXEC [dbo].[usp_FindAnyCustomer_CS], and then start it again. This time, I’ll let it run for about 10 minutes, and when I look at the Query Store output again, now I have some good data. Raw numbers come first, followed by the manager-favorite graphs.

Reminder: All stored procedures that end with _C are from the non-compressed table. The procedures ending with _R are from the row compressed table, those ending with _P are from the page compressed table, and those ending with _CS use the COMPRESS function (I removed the results for usp_FindAnyCustomer_CS and usp_FindActiveCustomer_CS because they skewed the graph so much that we lost the differences in the rest of the data). The usp_FindAnyCustomer_* and usp_FindActiveCustomer_* procedures used nonclustered indexes and returned thousands of rows for each execution.

I expected duration to be higher for the usp_FindAnyCustomer_* and usp_FindActiveCustomer_* procedures against row and page compressed tables, compared to the non-compressed table, because of the overhead of decompressing the data. The Query Store data does not support my expectation: the duration for those two stored procedures is roughly the same (or less in one case!) across those three tables. The logical IO for the queries was nearly the same across the non-compressed and page and row compressed tables.

In terms of CPU, in the usp_FindActiveCustomer and usp_FindAnyCustomer stored procedures it was always higher for the compressed tables. CPU was comparable for the usp_FindSpecificCustomer procedure, which was always a singleton lookup against the clustered index. Note the high CPU (but relatively low duration) for the usp_FindSpecificCustomer procedure against the [dbo].[Customers_Compress] table, which required the DECOMPRESS function to display data in readable format.

SUMMARY

The additional CPU required to retrieve compressed data exists and can be measured using Query Store or traditional baselining methods. Based on this initial testing, CPU is comparable for singleton lookups, but increases with more data. I wanted to force SQL Server to decompress more than just 10 pages; I wanted 100 at least. I executed variations of this script, where tens of thousands of rows were returned, and findings were consistent with what you see here.

My expectation is that to see significant differences in duration due to the time to decompress the data, queries would need to return hundreds of thousands, or millions of rows. If you’re in an OLTP system, you don’t want to return that many rows, so the tests here should give you an idea of how compression may affect performance. If you’re in a data warehouse, then you will probably see higher duration along with the higher CPU when returning large data sets.

While the COMPRESS function provides significant space savings compared to page and row compression, the performance hit in terms of CPU, and the inability to index the compressed columns due to their data type, make it viable only for large volumes of data that will not be searched.