Performance in SQL Server 2012 degrades non-linearly as SELECT clauses are added when using ColumnStore

When adding additional SELECT clauses to an aggregating query over a large fact table and using ColumnStore, the performance degrades in a step-wise linear fashion with large steps.  It may be quicker to execute several less-complex queries rather than a single complex query.

I’ve submitted a Microsoft Connect bug report here: https://connect.microsoft.com/SQLServer/feedback/details/761895/.

Continue reading for full details, or download this attachment: https://www.box.com/s/5hp9f0ditg0fspghu506 [PDF file also available via Microsoft Connect].  Actual test results, steps to reproduce, etc., as well as more pretty graphs :), are included.

Elapsed time by SUM measure count (base query)

Business context

We have a Kimball-style star schema data mart. We have tens of millions of rows with a variety of integer and decimal data types. We are performing aggregating SELECT queries either over the entire data in the table or over a significant slice of data. We aggregate the data according to the dimensionality and calculate aggregated values. All data is pre-loaded into the buffer cache. We are not using SQL Server Analysis Services, only the relational engine.

Expected behaviour

We expect the performance of the system to degrade as additional calculations are added, and as more result rows are required, however we also expect that it is cheaper to include additional SELECT clauses in a single statement versus separating the requests into multiple statements.

Actual behaviour

As the number of SELECT clauses is increased, we see a mostly linear progression in calculation time. However, at a certain point the performance degrades substantially such that it may be cheaper to execute multiple queries rather than a single execution of the more complex query.

Workaround

Separating the SELECT clauses into multiple queries and executing them independently may be beneficial, depending on the cost of correlating the results (if that is required).

Demonstration

We built a star schema model using an adaptation of the tool here https://github.com/electrum/ssb-dbgen which is based on the Star Schema Benchmark paper http://www.cs.umb.edu/~poneil/StarSchemaB.PDF [O’Neil, O’Neil & Chen, 2009]. We used a scale factor of 10 and then truncated the table to 50 million rows for the following test. We also added additional padding to the table to increase the overall width of a row.

Next, we rebuilt the table with a clustered primary key index and then applied a ColumnStore index to all supported columns.

We compared the performance of a query when all data was in the buffer cache. We examined how the performance changed as additional SELECT clauses were introduced, using different grouping strategies.

To evaluate whether the behaviour was data-dependent, we varied our tests with the following manipulations:

  • Different choice of columns from the original fact table
  • Different order of columns in the SELECT clause
  • Different grouping choices
  • Replacement fact table, where
  • All numeric data was strongly randomised
  • Additional decimal type columns were introduced to remove data type dependence

We also tested various hardware/operating-system configurations as follows:

  • 2x AMD 12-core (NUMA-visible)
  • 2x Intel 10-core (NUMA-visible, Hyperthreading enabled)
  • 2x Intel 10-core (NUMA-visible, Hyperthreading disabled)
  • 2x Intel 10-core (NUMA-hidden, Hyperthreading enabled)
  • 2x Intel 10-core (NUMA-hidden, Hyperthreading disabled)
  • 4x Intel 10-core (NUMA-visible, Hyperthreading enabled)
  • 4x Intel 10-core (NUMA-visible, Hyperthreading disabled)
  • 4x Intel 10-core (NUMA-hidden, Hyperthreading enabled)
  • 4x Intel 10-core (NUMA-hidden, Hyperthreading disabled)

The results correlate with experiences we have had with different CPU manufacturers and types. Hardware details are included at the end of this document.

The following results describe the behaviour on the randomised, extended table as this is easiest to evaluate.

Primary test query (20,000 result rows)

SET NOCOUNT ON;
SET STATISTICS PROFILE OFF
SET STATISTICS IO ON
SET STATISTICS TIME ON

DROP TABLE #results;

SELECT
	   LO_SuppKey
	 , SUM(LO_ExtendedPrice) AS measure1
	 , SUM(LO_OrdTotalPrice) AS measure2
	 , SUM(LO_Discount) AS measure3
	 , SUM(LO_Revenue) AS measure4
	 , SUM(LO_SupplyCost) AS measure5
	 , SUM(LO_ExtendedPrice2) AS measure6
	 , SUM(LO_OrdTotalPrice2) AS measure7
	 , SUM(LO_Discount2) AS measure8
	 , SUM(LO_Revenue2) AS measure9
	 , SUM(LO_SupplyCost2) AS measure10
  INTO #results
  FROM LineOrderAlternate
  GROUP BY
	LO_SuppKey
  OPTION (RECOMPILE)

The charts below shows how performance changes as additional SUM clauses are introduced:

Elapsed time by SUM measure count (base query)

CPU time by SUM measure count (base query)

Alternative query shapes

Altering the order in which columns are considered

SET NOCOUNT ON;
SET STATISTICS PROFILE OFF
SET STATISTICS IO ON
SET STATISTICS TIME ON

DROP TABLE #results;

SELECT 
	   LO_SuppKey
	 , SUM(LO_Discount2) AS measure1
	 , SUM(LO_Revenue2) AS measure2
	 , SUM(LO_Revenue) AS measure3
	 , SUM(LO_SupplyCost) AS measure4
	 , SUM(LO_ExtendedPrice) AS measure5
	 , SUM(LO_OrdTotalPrice) AS measure6
	 , SUM(LO_Discount) AS measure7
	 , SUM(LO_ExtendedPrice2) AS measure8
	 , SUM(LO_OrdTotalPrice2) AS measure9
	 , SUM(LO_SupplyCost2) AS measure10
  INTO #results
  FROM LineOrderAlternate
  GROUP BY
	LO_SuppKey
  OPTION (RECOMPILE)

Elapsed time by SUM measure count (SELECT clause order changed)

CPU time by SUM measure count (SELECT clause order changed)

Changing the level of aggregation (800,000 result rows)

SET NOCOUNT ON;
SET STATISTICS PROFILE OFF
SET STATISTICS IO ON
SET STATISTICS TIME ON

DROP TABLE #results;

SELECT 
       LO_PartKey
	 , SUM(LO_ExtendedPrice) AS measure1
	 , SUM(LO_OrdTotalPrice) AS measure2
	 , SUM(LO_Discount) AS measure3
	 , SUM(LO_Revenue) AS measure4
	 , SUM(LO_SupplyCost) AS measure5
	 , SUM(LO_ExtendedPrice2) AS measure6
	 , SUM(LO_OrdTotalPrice2) AS measure7
	 , SUM(LO_Discount2) AS measure8
	 , SUM(LO_Revenue2) AS measure9
	 , SUM(LO_SupplyCost2) AS measure10
  INTO #results
  FROM LineOrderAlternate
  GROUP BY
	LO_PartKey
  OPTION (RECOMPILE)

Note the different scales on these charts!

Elapsed time by SUM measure count (aggregation level changed)

CPU time by SUM measure count (aggregation level changed)

The step-wise behaviour is still noticeable when performing queries that return a large number of results, although it is less pronounced. The inflection at eight measure results clearly adds an overhead of about 40-50% on the query, and it is clearly cheaper to do a single additional query of one measure (subject to result correlation).

Grouping on multiple dimensions (10,000 result rows)

SET NOCOUNT ON;
SET STATISTICS PROFILE OFF
SET STATISTICS IO ON
SET STATISTICS TIME ON

DROP TABLE #results;

SELECT 
	   S_Nation, P_MFGR, D_YearMonth
	 , SUM(LO_ExtendedPrice) AS measure1
	 , SUM(LO_OrdTotalPrice) AS measure2
	 , SUM(LO_Discount) AS measure3
	 , SUM(LO_Revenue) AS measure4
	 , SUM(LO_SupplyCost) AS measure5
	 , SUM(LO_ExtendedPrice2) AS measure6
	 , SUM(LO_OrdTotalPrice2) AS measure7
	 , SUM(LO_Discount2) AS measure8
	 , SUM(LO_Revenue2) AS measure9
	 , SUM(LO_SupplyCost2) AS measure10
  INTO #results
  FROM LineOrderAlternate
	INNER JOIN Part ON P_PartKey = LO_PartKey
	INNER JOIN Supplier ON S_SuppKey = LO_SuppKey
	INNER JOIN Date ON D_DateKey = LO_OrderDateKey
  GROUP BY
	S_Nation, P_MFGR, D_YearMonth
  OPTION (RECOMPILE)

Note the different scales on these charts!

Elapsed time by SUM measure count (grouping on multiple dimensions)

CPU time by SUM measure count (grouping on multiple dimensions)

Additional observations

We originally suspected that the behaviour was linked to hardware factors, such as:

  • Physical NUMA characteristics of the system
  • Number of logical cores/threads
  • Physical caching capabilities
  • Disk I/O

However, the same queries appear to have the same behaviour on different CPU manufacturers and hardware architectures (see an exception below).  We have also evaluated the performance in memory so as to avoid disk I/O effects.   We have also compared query plans to try to identify differences but have been unable to do so.

Note that with a 40-core, 80-thread Intel system we did not notice the same inflection points.  We do not know why that is but suspect that the inflection points have moved.

We suspect that there is a behaviour in the logical architecture of the SQL Server engine that causes the observed behaviour.

Table structure

CREATE TABLE [dbo].[LineOrderAlternate](
	[LO_OrderKey] [bigint] NOT NULL,
	[LO_LineNumber] [tinyint] NOT NULL,
	[LO_CustKey] [bigint] NOT NULL,
	[LO_PartKey] [bigint] NOT NULL,
	[LO_SuppKey] [int] NOT NULL,
	[LO_OrderDateKey] [bigint] NOT NULL,
	[LO_OrderPriority] [char](15) NULL,
	[LO_ShipPriority] [char](1) NULL,
	[LO_Quantity] [tinyint] NULL,
	[LO_ExtendedPrice] [decimal](18, 0) NULL,
	[LO_OrdTotalPrice] [decimal](18, 0) NULL,
	[LO_Discount] [decimal](18, 0) NULL,
	[LO_Revenue] [decimal](18, 0) NULL,
	[LO_SupplyCost] [decimal](18, 0) NULL,
	[LO_ExtendedPrice2] [decimal](18, 0) NULL,
	[LO_OrdTotalPrice2] [decimal](18, 0) NULL,
	[LO_Discount2] [decimal](18, 0) NULL,
	[LO_Revenue2] [decimal](18, 0) NULL,
	[LO_SupplyCost2] [decimal](18, 0) NULL,
	[LO_Tax] [tinyint] NULL,
	[LO_CommitDateKey] [bigint] NOT NULL,
	[LO_ShipMode] [char](10) NULL,
	[LO_Padding] [binary](897) NOT NULL,
 CONSTRAINT [PK_LineOrderAlternate] PRIMARY KEY CLUSTERED 
(
	[LO_OrderKey] ASC,
	[LO_LineNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[LineOrderAlternate] ADD  DEFAULT (CONVERT([binary](897),N'00')) FOR [LO_Padding]
GO

 
CREATE NONCLUSTERED COLUMNSTORE INDEX [NonClusteredColumnStoreIndex-20120628-034310] ON [dbo].[LineOrderAlternate]
(
	[LO_OrderKey],
	[LO_LineNumber],
	[LO_CustKey],
	[LO_PartKey],
	[LO_SuppKey],
	[LO_OrderDateKey],
	[LO_OrderPriority],
	[LO_ShipPriority],
	[LO_Quantity],
	[LO_ExtendedPrice],
	[LO_OrdTotalPrice],
	[LO_Discount],
	[LO_Revenue],
	[LO_SupplyCost],
	[LO_ExtendedPrice2],
	[LO_OrdTotalPrice2],
	[LO_Discount2],
	[LO_Revenue2],
	[LO_SupplyCost2],
	[LO_Tax],
	[LO_CommitDateKey],
	[LO_ShipMode]
)WITH (DROP_EXISTING = OFF) ON [PRIMARY]
GO

Page compression was also used, although it is not shown in this script.

Note that SQL Server created a number of system statistics on this table. All queries were executed while the entire table data was in the buffer cache and this was validated using SET STATISTICS IO ON.

Data randomisation process

BEGIN
	DECLARE @minimumCustKey bigint, @maximumCustKey bigint;
	SELECT @minimumCustKey = MIN(C_CustomerKey), @maximumCustKey = MAX(C_CustomerKey)
		FROM dbo.Customer;

	DECLARE @minimumSuppKey bigint, @maximumSuppKey bigint;
	SELECT @minimumSuppKey = MIN(S_SuppKey), @maximumSuppKey = MAX(S_SuppKey)
		FROM dbo.Supplier;

	DECLARE @minimumPartKey bigint
	      , @maximumPartKey bigint;
	 SELECT @minimumPartKey = MIN(P_PartKey)
	      , @maximumPartKey = MAX(P_PartKey)
		FROM dbo.Part;

	DECLARE @minimumDateKey bigint
	      , @maximumDateKey bigint;
	 SELECT @minimumDateKey = MIN(D_DateKey)
	      , @maximumDateKey = MAX(D_DateKey)
		FROM dbo.Date;

	CREATE TABLE #randomSeeds
	(
		idx int,
		seed1 bigint,
		seed2 bigint,
		seed3 bigint,
		seed4 bigint,
		seed5 bigint,
		seed6 bigint
	);

	TRUNCATE TABLE #randomSeeds;

	WITH 
	numsA AS (
		SELECT 1 as n
		  FROM (SELECT 1 as n0
				UNION ALL SELECT 2
				UNION ALL SELECT 3
				UNION ALL SELECT 4
				UNION ALL SELECT 5
				UNION ALL SELECT 6
				UNION ALL SELECT 7
				UNION ALL SELECT 8
			   ) AS n0
			   ,
			   (SELECT 1 as n0
				UNION ALL SELECT 2
				UNION ALL SELECT 3
				UNION ALL SELECT 4
				UNION ALL SELECT 5
				UNION ALL SELECT 6
				UNION ALL SELECT 7
				UNION ALL SELECT 8
			   ) AS n1
			   ,
			   (SELECT 1 as n0
				UNION ALL SELECT 2
				UNION ALL SELECT 3
				UNION ALL SELECT 4
				UNION ALL SELECT 5
				UNION ALL SELECT 6
				UNION ALL SELECT 7
				UNION ALL SELECT 8
			   ) AS n2
			   ,
			   (SELECT 1 as n0
				UNION ALL SELECT 2
				UNION ALL SELECT 3
				UNION ALL SELECT 4
				UNION ALL SELECT 5
				UNION ALL SELECT 6
				UNION ALL SELECT 7
				UNION ALL SELECT 8
			   ) AS n3
			   ,
			   (SELECT 1 as n0
				UNION ALL SELECT 2
				UNION ALL SELECT 3
				UNION ALL SELECT 4
				UNION ALL SELECT 5
				UNION ALL SELECT 6
				UNION ALL SELECT 7
				UNION ALL SELECT 8
			   ) AS n4
			   
	)
	, numsB AS
	(
		SELECT 1 as n
		  FROM numsA n0, numsA n1
	)
	, baseData AS 
	(
		SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as idx
		     , CRYPT_GEN_RANDOM(128) as seed
		  FROM numsB
	)
	INSERT INTO #randomSeeds
	SELECT  TOP(50000000)
	        idx
		  , CAST(SUBSTRING(seed, 0, 8) as bigint) as seed1
		  , CAST(SUBSTRING(seed, 8, 8) as bigint) as seed2
		  , CAST(SUBSTRING(seed, 16, 8) as bigint) as seed3
		  , CAST(SUBSTRING(seed, 24, 8) as bigint) as seed4
		  , CAST(SUBSTRING(seed, 32, 8) as bigint) as seed5
		  , CAST(SUBSTRING(seed, 40, 8) as bigint) as seed6
	  FROM baseData;
	   
	

	WITH Base AS (
		SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as idx
			 , LO_OrderKey, LO_LineNumber, LO_OrderPriority, LO_ShipPriority, LO_ShipMode, LO_Tax
		  FROM dbo.LineOrder
    ),
	Computed AS (
	  SELECT Base.idx, Base.LO_OrderKey, Base.LO_LineNumber
	       , LO_OrderPriority, LO_ShipPriority, LO_ShipMode, LO_Tax
	       , RNG.seed1, RNG.seed2, RNG.seed3, RNG.seed4, RNG.seed5, RNG.seed6   
	    FROM Base INNER JOIN #randomSeeds AS RNG ON RNG.idx = Base.idx
	)
	INSERT INTO dbo.LineOrderAlternate WITH (TABLOCKX)
		   ([LO_OrderKey]
           ,[LO_LineNumber]
           ,[LO_CustKey]
           ,[LO_PartKey]
           ,[LO_SuppKey]
           ,[LO_OrderDateKey]
           ,[LO_CommitDateKey]
           ,[LO_OrderPriority]
           ,[LO_ShipPriority]
           ,[LO_ShipMode]
           ,[LO_Tax]
           ,[LO_Quantity]
           ,[LO_ExtendedPrice]
           ,[LO_OrdTotalPrice]
           ,[LO_Discount]
           ,[LO_Revenue]
           ,[LO_SupplyCost]
           ,[LO_ExtendedPrice2]
           ,[LO_OrdTotalPrice2]
           ,[LO_Discount2]
           ,[LO_Revenue2]
           ,[LO_SupplyCost2]
           )
		   
	SELECT LO_OrderKey, LO_LineNumber,
		   ABS(seed1) % (@maximumCustKey - @minimumCustKey + 1) + @minimumCustKey AS LO_CustKey,
		   (5419 + ABS(seed1)) % (@maximumPartKey - @minimumPartKey + 1) + @minimumPartKey AS LO_PartKey,
		   (6011 + ABS(seed1)) % (@maximumSuppKey - @minimumSuppKey + 1) + @minimumSuppKey AS LO_SuppKey,
		   (SELECT TOP 1 D_DateKey FROM dbo.Date WHERE D_DateKey >= (
				((7717 + ABS(seed1)) % (@maximumDateKey - @minimumDateKey + 1)) + @minimumDateKey)
				ORDER BY NEWID() ASC) AS LO_OrderDateKey,
		   (SELECT TOP 1 D_DateKey FROM dbo.Date 
				WHERE D_DateKey >= (
					((3119 + ABS(seed2)) % (@maximumDateKey - @minimumDateKey + 1)) + @minimumDateKey)
				  AND D_DateKey <= (00010000 +
					((3119 + ABS(seed2)) % (@maximumDateKey - @minimumDateKey + 1)) + @minimumDateKey)
				ORDER BY NEWID() ASC) AS LO_CommitDateKey,
		   LO_OrderPriority, LO_ShipPriority, LO_ShipMode, LO_Tax,
		   ABS(seed2) % 200 + 1 AS LO_Quantity,
		   (ABS(seed3 % 178129182543) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_ExtendedPrice,
		   (ABS(seed4% 078324182543) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_OrdTotalPrice,
		   (ABS(seed5% 338122182543) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_Discount,
		   (ABS(seed6% 178862182543) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_Revenue,
		   (ABS(seed1% 178129282543) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_SupplyCost,
		   (ABS(seed2% 178191265262) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_ExtendedPrice2,
		   (ABS(seed3% 178945673543) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_OrdTotalPrice2,
		   (ABS(seed4% 326328632953) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_Discount2,
		   (ABS(seed5% 417618218343) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_Revenue2,
		   (ABS(seed6% 729264638343) / 100000000.0) * (1.0 + (0.01 * (seed4 % 1000))) AS LO_SupplyCost2

	  FROM Computed
	  ;


	DROP TABLE #randomSeeds;
END;
GO

Server platform

  • Windows Server 2008 R2 Enterprise SP1
  • SQL Server 2012 RTM + CU1
  • Trace flags 4199, 834, 845, 2301
  • Service account has Lock Pages In Memory privilege
  • 2x 10 core Intel Xeon E7 2860, hyperthreading enabled and NUMA visible to O/S
  • 256 GB RAM, 240 GB assigned at startup to SQL Server
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s