Problem:
How much of an improvement do statistics make in terms of query performance?
In my last few posts, I went into the details of how to create, update and check the statistics of a table or indexed view's columns.
DBCC SHOW_STATISTICS displays the current query optimization statistics for a table or indexed view. The query optimizer uses statistics to estimate the cardinality, or number of rows, in the query result, which enables it to create a high-quality query plan. For example, the query optimizer could use cardinality estimates to choose the index seek operator instead of the index scan operator, improving query performance by avoiding a resource-intensive index scan.
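As a quick recap from the earlier posts, the command can return the header, the density vector and the histogram separately (a small sketch, assuming the dbo.Employees table and the 'employeeid' statistics object used later in this post):
-- Header: rows, rows sampled, last updated
DBCC SHOW_STATISTICS ('dbo.Employees', 'employeeid') WITH STAT_HEADER;
-- Density per column prefix
DBCC SHOW_STATISTICS ('dbo.Employees', 'employeeid') WITH DENSITY_VECTOR;
-- Histogram, up to 200 steps
DBCC SHOW_STATISTICS ('dbo.Employees', 'employeeid') WITH HISTOGRAM;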
Let’s see what improves and what difference it makes after we update the statistics on a table or column.
Example 1:
In this example, I would like to show how we can make use of statistics to improve performance and what happens if we keep the AUTO_UPDATE_STATISTICS option set to OFF.
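For reference, here is a minimal sketch of the table the steps below assume; the original definition is not shown in this post, so the column types and index definition are assumptions:
-- Hypothetical definition of the test table used in the examples.
CREATE TABLE dbo.Employees
(
    EmployeeId  INT IDENTITY(1,1) PRIMARY KEY,
    [Name]      VARCHAR(50),
    PhoneNumber INT,
    Company_Id  INT
);
-- Example 2 also references an index named ix_name_phonenumber;
-- its key columns are assumed here.
CREATE INDEX ix_name_phonenumber ON dbo.Employees ([Name], PhoneNumber);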
Step 1: Turn off automatic statistics creation and updates on the database.
ALTER DATABASE DB SET AUTO_CREATE_STATISTICS OFF
ALTER DATABASE DB SET AUTO_UPDATE_STATISTICS OFF
ALTER DATABASE DB SET AUTO_UPDATE_STATISTICS_ASYNC OFF
If you go to the database properties now, we can see these options are set to False, confirming that automatic statistics updates are turned off.
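The same settings can also be checked with a query instead of the properties dialog (a small sketch against sys.databases; replace DB with your database name):
select name,
       is_auto_create_stats_on,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
from sys.databases
where name = 'DB';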
Step 2: Insert a large number of rows into the Employees table and check again whether the density and ranges in the statistics are still correct.
-- Each execution inserts one row with a random name, a phone number picked at random
-- from a fixed list, and a company id picked at random from master..spt_values.
-- GO with a count repeats the batch 100,000 times (one row per execution).
insert into dbo.Employees ([Name], [PhoneNumber], Company_Id)
select
    substring(cast(newid() as varchar(68)), 1, 6) as [Name]
  , (
        select top (1) cast(c1 as int)
        from (values ('1234564'), ('2456464'), ('464646'), ('4644949'), ('74997979')) as T1(c1)
        order by abs(checksum(newid()))
    ) as Ph
  , (
        select top (1) cast(c1 as int)
        from (select number from master..spt_values) as T1(c1)
        order by abs(checksum(newid()))
    ) as Company_Id
GO 100000
DBCC SHOW_STATISTICS(Employees,'employeeid')
This is how it looks after inserting the rows above: the new rows still don't show up in the statistics, because automatic statistics updates are switched off.
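One way to confirm this from T-SQL is sys.dm_db_stats_properties, which shows the row count the statistics were built on and how many modifications have happened since the last update (a sketch, assuming SQL Server 2008 R2 SP2 / 2012 SP1 or later):
select s.name as stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter   -- rows changed since the statistics were last updated
from sys.stats as s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
where s.object_id = object_id('dbo.Employees');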
Step 3: Update Statistics to refresh the index statistics
on the database.
exec sp_updatestats
OUTPUT:
Updating [dbo].[Employees]
[_WA_Sys_00000001_2057CCD0], update is not necessary...
[_WA_Sys_00000004_2057CCD0] has been updated...
[_WA_Sys_00000002_2057CCD0] has been updated...
[_WA_Sys_00000005_2057CCD0] has been updated...
[_WA_Sys_00000003_2057CCD0] has been updated...
4 index(es)/statistic(s) have been updated, 1 did not require update.
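Before rechecking the histogram, STATS_DATE can confirm when each statistics object on the table was last refreshed (a quick sketch on the same dbo.Employees table):
select name as stats_name,
       stats_date(object_id, stats_id) as last_updated
from sys.stats
where object_id = object_id('dbo.Employees');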
Step 4: Check to see if there are any improvements in the density and
ranges.
DBCC SHOW_STATISTICS(Employees,'employeeid')
Step 5: Time to test some query performance. I chose to run a query with a range predicate so that it falls within the histogram ranges and we can see how the rows are scanned.
select * from employees
where employeeid >= 100021 and employeeid <= 102380
I would like to compare the row counts in the query cost, i.e. estimated rows vs. the actual number of rows, as this makes a lot of difference in the query plans.
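One way to capture both numbers without the graphical plan is SET STATISTICS PROFILE, whose output includes the actual (Rows) and estimated (EstimateRows) row counts per operator (a sketch; run it before and after the statistics update and compare):
set statistics profile on;

select * from dbo.Employees
where employeeid >= 100021 and employeeid <= 102380;

set statistics profile off;
-- Compare the Rows (actual) and EstimateRows columns for the seek/scan operator.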
After comparing the execution plans of the same query before and after updating the statistics, the actual vs. estimated row counts inside the scan showed a huge difference. Updating the statistics at regular intervals on an index or table column that is used as a predicate in a number of queries keeps the execution plans healthy.
Example 2:
DBCC SHOW_STATISTICS (Employees, 'ix_name_phonenumber') WITH HISTOGRAM;
RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS  | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS
1981         | 26767.57   | 592.936  | 45                  | 594.835
1997         | 8255.367   | 587.2891 | 15                  | 550.3578
2042         | 26438.72   | 525.1719 | 44                  | 600.88
4096         | 8351.755   | 4833.841 | 10                  | 832.2198
8275         | 8680.609   | 1129.402 | 7                   | 1235.528
16389        | 8068.261   | 649.4062 | 8                   | 1004.882
17222        | 8374.435   | 1157.637 | 9                   | 927.1665
17985        | 8062.591   | 1174.578 | 7                   | 1147.564
18259        | 8436.804   | 1253.636 | 7                   | 1200.827
19523        | 8476.493   | 1112.461 | 7                   | 1206.476
20047        | 9423.365   | 1067.285 | 8                   | 1173.657
20816        | 8334.746   | 632.4651 | 8                   | 1038.073
21318        | 9661.501   | 1287.518 | 9                   | 1069.663
21581        | 7682.708   | 1129.402 | 7                   | 1093.495
22099        | 9814.588   | 1168.931 | 8                   | 1222.383
8388608      | 8674.939   | 1208.46  | 11                  | 785.8628
1073741824   | 3498.325   | 1293.165 | 5                   | 696.9646
1469283328   | 2.392718   | 643.7592 | 0                   | 741.0588
Here is a small visualization of what a histogram means in terms of the RANGES and rows.
Let’s compare querying a RANGE_HI_KEY where we have fewer distinct range rows against one where we have more, and see how the execution costs of the same query differ between the execution plans.
select * from employees where company_id = 1469283328
OPTION (RECOMPILE);
As we can see, the cost is 2.02; based on that RANGE_HI_KEY there are only 2.39 range rows, which keeps the cost of execution low.
select * from employees where company_id = 4096
OPTION (RECOMPILE);
As we can see, the cost is 5.4817; based on that RANGE_HI_KEY there are 8351 range rows, which drives up the cost of execution.
So depending on the range the predicate values fall into, the cost of a query becomes cheaper or more expensive. Statistics play a vital role in how the query optimizer estimates cost and builds the execution plan.
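A minimal sketch of such a regular refresh, assuming the predicate column is covered by the statistics behind the index from Example 2; in practice this would be wrapped in a SQL Agent job or a maintenance plan:
-- Refresh only the statistics behind one index.
update statistics dbo.Employees ix_name_phonenumber with fullscan;

-- Or refresh every statistics object on the table.
update statistics dbo.Employees with fullscan;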