Performance in SQL Server 2012 degrades when using ColumnStore and no GROUP BY clause

Essentially, the performance of a non-grouping SQL SELECT query degrades when applied to a ColumnStore index.  This has been tested with SQL Server 2012 RTM CU1.  Performing partial aggregation can result in a 15x performance improvement in some of the cases I observed.

See the full details in my Microsoft Connect submission here: https://connect.microsoft.com/SQLServer/feedback/details/761469/. Or download the details here: https://www.box.com/s/frf7imhyclb2efz2tfvb.

Business context

We have a Kimball-style star schema data mart. We have tens of millions of rows with a variety of integer and decimal data types. We are performing aggregating SELECT queries either over the entire data in the table or over a significant slice of data. Only a single result row is expected as there is no GROUP BY clause. We are not using SQL Server Analysis Services, only the relational engine.

Expected behaviour

We expect the performance of the query to be similar or faster than when a GROUP BY clause is applied because less processing is required.

Actual behaviour

Performance is disproportionately slow (observed at 15x slower than after applying workaround). Performance appears to degrade as more columns are added.

Workaround

Introducing an intermediate GROUP BY clause on an arbitrary (but relatively low cardinality) dimension and then further aggregating to the result improves performance to that of other queries.

Demonstration

We built a star schema model using an adaptation of the tool here https://github.com/electrum/ssb-dbgen which is based on the Star Schema Benchmark paper http://www.cs.umb.edu/~poneil/StarSchemaB.PDF [O’Neil, O’Neil & Chen, 2009]. We used a scale factor of 10 and then truncated the table to 50 million rows for the following test. We also added additional padding to the table to increase the overall width of a row.

Next, we rebuilt the table with a clustered primary key index and then applied a ColumnStore index to all supported columns.

We compared the behaviour of the following two queries when all data was in the buffer cache.

The results correlate with experiences we have had with different CPU manufacturers and types. Hardware details are included at the end of this document.

Query 1 (poor performance)

SELECT SUM(LO_ExtendedPrice)
     , SUM(LO_OrdTotalPrice)
     , SUM(LO_Revenue)
  FROM dbo.LineOrder;

We note that the execution mode of the ColumnStore index scan operation is row, not batch as we would expect.

SQL Server Execution Times:

CPU time = 59605 ms, elapsed time = 2013 ms.

Query 2 (good performance)

WITH base AS (
SELECT SUM(LO_ExtendedPrice) AS LO_ExtendedPrice
, SUM(LO_OrdTotalPrice) AS LO_OrdTotalPrice
, SUM(LO_Revenue) AS LO_Revenue
FROM dbo.LineOrder
GROUP BY LO_SuppKey
)
SELECT SUM(LO_ExtendedPrice) AS LO_ExtendedPrice
, SUM(LO_OrdTotalPrice) AS LO_OrdTotalPrice
, SUM(LO_Revenue) AS LO_Revenue
FROM base;

Here we see that the execution mode of the ColumnStore index scan is batch, as expected.

SQL Server Execution Times:

CPU time = 8360 ms, elapsed time = 395 ms.

This query is five times faster in elapsed time and about seven times faster in CPU time!

Additional observations

We have seen this behaviour on production systems with various star schema data sets with similar overall characteristics (indexing, data types, row count, etc.), and we have seen the performance difference reaching more than fifteen times. We intend to implement a solution while waiting for a fix from Microsoft.

We have also seen a similar behaviour in other non-grouping queries which include more complexity in their design (CASTing, arithmetic operations, joins, where clauses, CTEs). It has not always been clear whether the issue relates to the batch versus row execution mode, but in all observed cases the intermediate aggregation solution works.

We have also noticed that using a high-cardinality intermediate grouping causes performance problems.

We suspect that similar performance problems may exist when very minimal grouping is involved, but we have not confirmed this hypothesis.

Table structure

CREATE TABLE [dbo].[LineOrder](
   [LO_OrderKey] [bigint] NOT NULL,
   [LO_LineNumber] [tinyint] NOT NULL,
   [LO_CustKey] [bigint] NOT NULL,
   [LO_PartKey] [bigint] NOT NULL,
   [LO_SuppKey] [int] NOT NULL,
   [LO_OrderDateKey] [bigint] NOT NULL,
   [LO_OrderPriority] [char](15) NULL,
   [LO_ShipPriority] [char](1) NULL,
   [LO_Quantity] [tinyint] NULL,
   [LO_ExtendedPrice] [decimal](18, 0) NULL,
   [LO_OrdTotalPrice] [decimal](18, 0) NULL,
   [LO_Discount] [decimal](18, 0) NULL,
   [LO_Revenue] [decimal](18, 0) NULL,
   [LO_SupplyCost] [decimal](18, 0) NULL,
   [LO_Tax] [tinyint] NULL,
   [LO_CommitDateKey] [bigint] NOT NULL,
   [LO_ShipMode] [char](10) NULL,
   [LO_Padding] [binary](897) NOT NULL,
   CONSTRAINT [PK_LineOrder] PRIMARY KEY CLUSTERED
   (
      [LO_OrderKey] ASC,
      [LO_LineNumber] ASC
   )
   WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX [CS_LineOrder] ON [dbo].[LineOrder]
(
   [LO_CustKey],
   [LO_PartKey],
   [LO_SuppKey],
   [LO_OrderDateKey],
   [LO_OrderPriority],
   [LO_ShipPriority],
   [LO_Quantity],
   [LO_ExtendedPrice],
   [LO_OrdTotalPrice],
   [LO_Discount],
   [LO_Revenue],
   [LO_SupplyCost],
   [LO_Tax],
   [LO_CommitDateKey],
   [LO_ShipMode]
)
WITH (DROP_EXISTING = OFF) ON [PRIMARY]
GO

Note that SQL Server created a number of system statistics on this table. All queries were executed while the entire table data was in the buffer cache and this was validated using SET STATISTICS IO ON.

Server platform

  • Windows Server 2008 R2 Enterprise SP1
  • SQL Server 2012 RTM + CU1
  • Trace flags 4199, 834, 845, 2301
  • Service account has Lock Pages In Memory privilege
  • 2x 10 core Intel Xeon E7 2860, hyperthreading enabled and NUMA visible to O/S
  • 256 GB RAM, 240 GB assigned at startup to SQL Server
Advertisements

2 thoughts on “Performance in SQL Server 2012 degrades when using ColumnStore and no GROUP BY clause

  1. Lonny Niederstadt

    Hello Steve!
    256 GB of RAM, trace flag 834, and SQL 2012 Columnstore Indexes. Do you know the reasoning of the Microsoft recommendation below?
    “If you are using the Column Store Index feature of SQL Server 2012, we do not recommend turning on trace flag 834.”
    http://support.microsoft.com/kb/920093
    I also commented at the connect item linked above – since the connect is closed as fixed I wonder if the recommendation to NOT turn on columnstore is lifted in SQL Server 2014?
    Thanks – these are great details to have!

    Reply
    1. Steve Horsfield Post author

      I know that they have different memory allocation strategies with C/S and my understanding is that this doesn’t play well with T834. I was also surprised as it undermines the reasoning for T834 to exist, which still makes logical sense, in theory at least.

      I believe we also found some separate issues with T834 with C/S, which were resolved once we followed this advice.

      None of my Connect cases were resolved by removing T834, to my knowledge. In each case, Microsoft discovered a specific underlying cause.

      Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s