Category Archives: Software Development

Why Big Data Analytics is harder than you think: numerical precision matters!

How big is your Big Data data set? If it is not measured in petabytes then perhaps it doesn’t qualify. In any case, as the amount of data grows the challenges grow disproportionately and in areas that perhaps you don’t expect. In this post, I am raising a number of challenges and concerns that you should be thinking about, especially if you are interested in business analytics on large data sets.

I recommend that if you are serious about large data sets and numerical precision, you have no choice but to adopt 128-bit arithmetic in most scenarios, especially financial.

Continue reading

Performance in SQL Server 2012 degrades non-linearly as SELECT clauses are added when using ColumnStore

When adding additional SELECT clauses to an aggregating query over a large fact table and using ColumnStore, the performance degrades in a step-wise linear fashion with large steps.  It may be quicker to execute several less-complex queries rather than a single complex query.

I’ve submitted a Microsoft Connect bug report here: https://connect.microsoft.com/SQLServer/feedback/details/761895/.

Continue reading for full details, or download this attachment: https://www.box.com/s/5hp9f0ditg0fspghu506 [PDF file also available via Microsoft Connect].  Actual test results, steps to reproduce, etc., as well as more pretty graphs :), are included.

Elapsed time by SUM measure count (base query)

Continue reading

Performance in SQL Server 2012 degrades when using ColumnStore and no GROUP BY clause

Essentially, the performance of a non-grouping SQL SELECT query degrades when applied to a ColumnStore index.  This has been tested with SQL Server 2012 RTM CU1.  Performing partial aggregation can result in a 15x performance improvement in some of the cases I observed.

See the full details in my Microsoft Connect submission here: https://connect.microsoft.com/SQLServer/feedback/details/761469/. Or download the details here: https://www.box.com/s/frf7imhyclb2efz2tfvb.

Continue reading

Why software developers need to look beyond frameworks and languages

My career as a software developer has been fairly unusual as I spent about six years as an IT manager for two companies. It was not an intentional change of focus but happened unexpectedly after taking a job as an Oracle database developer.  That job changed shape immediately, in the form of a job offer for a job I was not interviewed for. It was a big undertaking for both my employer and for myself, but in the process it exposed me to many challenges that most developers have no experience of.

It is my belief that my current software development skills have been enriched as a result. I also believe that other software developers would be able to act more effectively with a little investigation around the edges of their own areas of expertise. In this brief article, I hope to encourage you, my readers, to swim a little further out to sea. I hope you find it useful and can make your way back against the tide!

Testing for correctness

Have you heard of integration testing? This is where you test how the software that you and your team have developed interacts with the surrounding eco-system. Perhaps you already do a lot of integration testing. If not, I certainly encourage you to do it, at least in some form. However, if you are performing integration testing without a consideration of the wider eco-system— networks, storage devices, processing platform, operating system options and configuration, drivers, patch level, etc.—then you are leaving open a multitude of untested areas in your product. Whether or not these risks are considered acceptable or not to your business is a matter of judgement, but wouldn’t it be nice to at least smoke test these? Or alternatively, wouldn’t it be nice to know and understand how the test environment is constrained similarly and differently from the real world environment? Not being able to answer these questions means that you won’t be able to tell the IT manager engaged in deploying your software the answers to his justifiable and reasonable questions.

Product ownership and leadership

One of the major failings in the modern world (I have seen it in governments, commercial businesses, charities, churches and social groups) is abdication of ownership.  More generally, I see this as a failing in leadership, but that is a discussion that is probably best conducted in another post.  A failure in ownership means, typically, that important things are left undone, because there is no one that considers it to be their job to ensure that it is done.  Note how I have phrased the last sentence, “to ensure that it is done”.  It is not always the case that the person taking ownership of an activity is the same person responsible for performing the activity.  Nevertheless, not having someone who is willing, able and empowered to take ownership of the activity defeats success almost every time.  We need more people to take ownership of the things for which they have been deemed responsible, or at least been given responsibility for.

In relation to software development, ownership leads to a need for a product team to understand the eco-system in which their product will be deployed.  I mean, to really understand.  I mean to understand to the degree necessary to be able to describe the entirety of the product’s function for which they are responsible.

This means that they should know how to deploy it.  They should know how it has been tested.  They should know how it has been designed.  Importantly, they must, they absolutely must, be able to describe how to use it, in all of its uses, to their user community.  Finally, they must be able to maintain it and support it.

I cannot see that any of this is possible without the team taking a high level of ownership of the entire product for which they are responsible.

Owning a deployment platform

If this level of ownership is to be achieved, then software developers must aim for at least level 3 of the Capability Maturity Model: Defined—a standard business process.  In other words, they must be able to correctly deliver an equivalent configuration based on a repeatable and well-described process.  This is not possible without understanding the configuration of the operating system, and equally of related functional areas.

For an example, a very simple but typical two tier web application deployed on self-hosted hardware would need:

  • Two physical machines
    • One, single disk system with minimal RAM, CPU and memory
    • One, RAID-1 operating system and database log disk; one, RAID-5 array for all other data files
  • One network switch
  • Two operating system installations
  • One web server installation
  • One database server installation
  • One database schema deployment
  • One web application deployment
  • Two firewall configurations
  • Two security configurations
  • etc.

The complexity quickly builds up.  Is there a SAN?  Is there a central network authentication system?  What protocols are being used? UDP? TCP? IPsec? LDAP? Kerberos? SOAP? XMLA?

Taking ownership of a product means taking ownership of the aspects of this that matter to the product’s correct function.  At the very least, it means knowing how to deploy the software and how to configure each part of the overall solution for correct behaviour.  It does not mean understanding everything; it just means understanding enough.

We need software developers to recognise that they develop products deployed into eco-systems, not products acting in isolation (well, at least that is true for most of us).  Therefore, it is critically important that we take responsibility and ownership for what we know and for what we require to be controlled.  We need to improve the quality of software development in the industry.  Would it not help if we at least could describe how the software we were responsible for creating works?

Ten tips for using and configuring Kerberos authentication on Windows

Lately, I’ve been having some fun with Kerberos in Windows/Active Directory. Fun might not be the best way to describe it, but I thought I’d spend a few moments capturing some of what I’ve learnt in the past few days.

Tip 1. Debugging Kerberos issues is very hard. I recommend that you don’t change anything without making a note of what you did and also what side-effects it might cause. Also keep track of whether you restarted any services or servers, whether you emptied any caches, etc. Otherwise, you might not be able to interpret your results.

Tip 2. Premature success is evil. If you’ve changed something and you are testing whether the configuration is working, you had better make sure that your test results aren’t due to the previous behaviour being cached. This is much worse than having a step that fails. So, if you think something is working, test thoroughly before moving on to the next step or declaring victory!

Tip 3. Using custom service accounts is a common trouble spot. In theory, a correctly configured service account should work just like a computer account. My experience is that sometimes they don’t. At the time of writing, I don’t know why not. Everything I can think of has been checked. One major consideration is the distinction between kernel-mode and user-mode code execution.

Tip 4. Capture network traces. It can be useful to see whether a Kerberos negotiation actually takes place, or if the client abandons Kerberos in favour of NTLM authentication. Sometimes, this can be caused by the Kerberos token cache on the client machine answering the request. This may be fine, or it may have an old configuration cached. Execute klist purge using an elevated administrator account. Both WireShark and NetMonitor are good tools for this. Use your preferred tool but make sure you learn how to use it effectively. Both tools can help you identify communication sessions or filter the trace to a set of protocols or addresses.

Tip 5. Make sure your DNS configuration is correct. I’ve often seen Windows clients set to load-balance between public and corporate DNS servers. This is an incorrect configuration. The Windows DNS client only uses the alternate server if the former cannot respond to a query. It is assumed that both would provide identical results. In a recent case, I saw a public DNS providing records for a the DC’s own test domain that wasn’t intended to be public (because there was a real public registration for the FQDN). Use .local domains unless you need Apple Mac integration (the Rendezvous service had problems with this in the past). The DNS specification lists .local addresses as private registrations. This is the DNS equivalent of private IP ranges. Note that Windows clients use DNS to identify the appropriate Kerberos servers.

Tip 6. Don’t just restart application pools in IIS. Restarting an application pool is a quick way of restarting a web-site. However, it is flawed. Restarting an application pool does not restart the entire user-mode stack. In particular, you need to pay attention to Windows Activation Services (WAS). Make sure this service is restarted when testing. Don’t forget klist purge, either.

Tip 7. Check your SPNs whenever a configuration is changed. In some cases, I believe, IIS configures SPNs for you. However, sometimes these can become out of sync. So check. Use setspn.exe -L [accountname] to review.

Tip 8. Check your Allowed-To-Delegate-To configuration. In Windows 2008 R2, these views in Active Directory Users & Computers show you whether the account supports delegation, whether it is constrained and whether any protocol can be used.

Tip 9. Know your abbrebiations! If you don’t know the abbreviations, you can’t search effectively. S4U (the ‘Services for User’ Kerberos extension) is ‘Protocol Transition’. S4U2proxy (the ‘Services for User to Proxy’ Kerberos extension) is ‘Constrained Delegation’, also look for blog entries with the incorrect S4Uproxy abbreviation, missing the numeral ’2′).

Tip 10. Don’t forget the rest. Unfortunately, ten tips isn’t enough to cover all the things you need to be aware of. Here are a few of the other things to consider:

  • Account option ‘Do not require Kerberos preauthentication’. You shouldn’t need to use this in a Windows environment. Kerberos protocol errors referring to KRB5KDC_ERR_PREAUTH_REQUIRED can usually be ignored. You should see a normal Kerberos negotiation following. Kerberos pre-authentication is used to validate the calling user’s identity.
  • Account option ‘This account is sensitive and cannot be delegated’. This will prevent delegation. It can be configured on service accounts, unless the service account needs to act as itself on a delegated service. If you are using impersonation, you may want this enabled because it will help to avoid false-positives.
  • IIS 7.5 authentication. There are new options to specify the protocols and other behaviours for Windows authentication. Make sure you review them. There is more information in the links below.
  • Try to test several different approaches. You may find that delegation to a file share is working but delegation to a web server is not. Don’t just follow one path. If things are working correctly then both approaches should work easily.
  • Windows servers use IPsec between servers and especially between domain controllers. I have no idea whether this can affect the success or failure of Kerberos interactions when running as a user account.
  • This is not a definitive guide! Sorry, but you are going to have to investigate and try things out. I recommend that you build an entirely clean, virtual environment to test your configuration. Also, try not to use it as an experimentation platform. Assume it is production and script or document everything. You need it to be reproducible.

References:

Finally, don’t forget that Kerberos relies on near-synchronisation of computer clocks. See my previous post Windows: The Windows Time Service.

Thanks for this article have to also go to several Microsoft engineers who have helped me to understand more about the implementation of Kerberos on Windows. You know who you are!

A continuous thread of execution it isn’t!

I have to admit it, I was really surprised this week. While investigating a mysterious issue I discovered that I knew less about the hosting platform of ASP.NET and IIS than I thought I did. What I found makes sense, but it was surprising nonetheless.

What I found has made me believe more strongly what I have recently been advocating. Affinity is dangerous. The model of pure functions in functional languages is much easier to understand and thus reason about. Whenever affinity is used as a back-door to rely on some previously established state, you are essentially adding input to your function, and when you do so you had better understand the immutability or otherwise of that information. The problem? Something believed to be immutable was not in fact immutable and thus the correctness of the code was gone.

Now this is all quite mysterious, so I’d probably get to telling you about what it is that I found.

I had a HttpModule that was impersonating a user, and therefore changing the return value of WindowsIdentity.GetCurrent(). I also caused a change in Thread.CurrentPrincipal because I wanted any .NET code in the ASP.NET pipeline to consider this account to be the current account. I thought everthing was fine! (I should point out that I am dubious about the quality and purpose of this code, it is just what I had when I was investigating. I suspect a rewrite is due…)

In fact, ASP.NET interleaves request tasks (note: request tasks, not just whole requests) on the same thread and therefore has logic to switch the current thread identity and impersonation behaviour. It only needs this because it interleaves request processing, otherwise it could have just left the identity as it was. The problem is the following: it does not determine the behaviour based on Thread.CurrentPrincipal or WindowsIdentity.GetCurrent(). Instead, the request’s execution context is represented by the HttpContext class and HttpContext.Current instance. The User property of HttpContext is actually an instance of IPrincipal and ASP.NET will undo impersonation before switching to a new task. Without setting the HttpContext.Current.User property, this impersonation approach is not going to work!

The solution is clearly trivial: set the HttpContext.Current.User property. However, that misses the point. Server-side code oftens requires the splitting up of work into smaller units. When this happens, each of these units of work may be executed on the same thread without interruption, on the same thread with infrastructure interruption and then an immediate resumption, on the same thread with the interleaving of an alternate unit of work or on another thread. Modern systems have a large amount of co-operative multi-tasking on the same thread. This is true for ASP.NET, WCF and the TPL. It also means that, when traversing threads intentionally, you have to take responsibility for taking this state with you.

The large and complex subsystems of .NET include several examples of this. The ExecutionContext manages the CLR state as it reuses different operating system threads. WCF has the OperationContext and ServiceSecurityContext classes. ASP.NET has the HttpContext. Of course, you’ve probably also used the SynchronisationContext to interact with a UI thread that has its own thread affinity.

In retrospect, a lot of this looks obvious. I knew that ASP.NET supported asynchronous page execution, and of course it may need to load the page from disk and even compile it in some cases, so an asynchronous approach seems obvious. Similarly, I’ve coded custom WCF bindings and so I know that they are also an asynchronous design. Nevertheless, it is all to easy to make the incorrect assumption that these methods and events are just executed as a monolithic block of code with the infrastructure providing the simplest of glue. The reality is far more complex.

Thanks go to Scott Hanselman for a nice blog post on some of this: System.Threading.Thread.CurrentPrincipal vs. System.Web.HttpContext.Current.User or why FormsAuthentication can be subtle.  The Microsoft Patterns & Practices team also have a detailed description of ASP.NET authentication, although the article is quite old now: Explained: Windows Authentication in ASP.NET 2.0.

NHibernate: How to filter on primitive collections

I am using NHibernate with a client and I keep hitting the same issue. I have entities with basic collections of strings. I want to search for entities on the basis of filtering criteria expressed against the elements of the collection.

The easy solution is to treat the collection elements as entities but this is not ideal. That really complicates the domain. In some cases, the elements are simply references to foreign entities outside the scope of NHibernate, for example in a remote service or configuration file.

In SQL, I can pose the query as a correlated subquery or as an (outer) join. This has the advantage of being efficient and does not result in the loading of the collection.

An example would be searching for a Cat that is only black when each Cat has a collection of Colours, perhaps represented by an RGB triad. It would be true normal form to extract the colours into their own table but it would also be ridiculous to do so because a foreign key already exists – the RGB triad! Another example would be finding all Cats that are partly black or partly white. There is no reason why the criteria cannot be arbritrarily complex.

I have not found a way in HQL or the Criteria API and my scenarios require the filtering to take place in the database. The result is that I am using SQL directly with NHibernate’s ISession.

Any better solutions? Does ADO.NET Entity Framework also lack this concept?

Introducing SQL Service Broker

I have been investigating SQL Server’s Service Broker component as a possible choice for enabling messaging solutions in a .Net environment for a client and I have been impressed by the feature set.

The architecture does require a bit of a mind shift as it encourages the separation of processing logic from the reception of messages. Sending messages should of course still be part of the process logic.

Why this distinction? This approach allows for rapid verification of message transfer and controlled processing of message content. It encourages developers to recognise that transfer of messages is a different task to responding to a queue of received requests. Messaging is about sending messages and reliably receiving them, not about any particular business process.

If you do not apply this design then you are in danger of causing rollbacks during message reception and this may stop a queue if it occurs five times in a row.

Do you agree with the design? What are your experiences?

F#: Generic Fold

The fold functions in functional languages are very powerful constructs, but how about when you want a different function applied to the first element, or to the last? What about when the sequence is empty? Rather than solving this problem repeatedly, I have written a few functions to cover the range of cases. As I wrote these, I found it interesting that the solutions for Array, Seq and List all look very different.

Continue reading