Chrome almost supports SSO in Windows Kerberos environments

I was pleasantly surprised to find that Google Chrome has support for SSO and the Negotiate algorithm. Indeed it also has support for NTLM. So why the need for this post? I think the implementation could do with a little refinement.

Here’s my assumption: credential delegation in a Kerberos environment is managed by the Kerberos system and its configuration, and clients should not attempt to interfere with it. However, Google Chrome disallows ticket forwarding by default, effectively preventing delegation (constrained or otherwise). You can change this with a command-line option, but that means you have to know the option exists and plan to change it for every user of your web site. That seems the wrong way round to me. This default means that, out of the box, most web sites of any complexity will not operate as designed.
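As a rough sketch of what that opt-in looks like, the snippet below assembles a Chrome command line that permits delegation to named servers. The switch names are my reading of the Chromium HTTP authentication documentation (older builds used a "whitelist" suffix rather than "allowlist"), so treat them as an assumption and check them against your Chrome version. The executable name and the host pattern are hypothetical.

```python
def chrome_kerberos_args(delegate_to, chrome="chrome"):
    """Build a Chrome command line that allows Kerberos delegation.

    Assumption: switch names follow the Chromium HTTP authentication page;
    verify them for the Chrome version you deploy.
    """
    patterns = ",".join(delegate_to)
    return [
        chrome,  # hypothetical executable name or path
        # Servers that may receive integrated (Negotiate/NTLM) authentication.
        f"--auth-server-allowlist={patterns}",
        # Servers to which Chrome may forward (delegate) the user's ticket.
        f"--auth-negotiate-delegate-allowlist={patterns}",
    ]


args = chrome_kerberos_args(["*.corp.example.com"])
print(" ".join(args))
```

Every user's shortcut, or an equivalent managed policy, would need to carry these settings, which is exactly the deployment burden described above.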

Secondly, the default SPN behaviour is incorrect for Windows platforms. The Kerberos specification does not say much about SPNs, but they do at least have several parts: the service type, the host and port, and optionally an additional service identifier. Including the port is standard, but Chrome doesn’t do this by default. In addition, Chrome’s default behaviour is to resolve DNS CNAME records to A records and use the result for the host part. I can’t fault Google for this approach, but it does differ from the widely documented Windows approach of building the SPN from the host header (i.e. before CNAME resolution). (As an aside, if you take Chrome’s approach, why not go further and use the IPv4 address, or the IPv6 address, and what if the machine is multi-homed?) It also interferes with the ability of a host to provide multiple independent services, because with the Google approach they all have identical SPNs. In Chrome’s defence, these options can also be controlled via the command line.
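To make the difference concrete, here is a minimal sketch of how the candidate SPNs diverge. It is only string assembly, not how either Chrome or Windows is actually implemented, and the host names and port are hypothetical.

```python
def build_spn(service, host, port=None):
    """Return an SPN of the form service/host or service/host:port."""
    spn = f"{service}/{host}"
    if port is not None:
        spn += f":{port}"
    return spn


# Hypothetical names: the user browses to an alias that is a CNAME for web01.
requested_host = "intranet.example.com"  # host header, before CNAME resolution
canonical_host = "web01.example.com"     # name after CNAME resolution

from_host_header = build_spn("HTTP", requested_host)
from_canonical_name = build_spn("HTTP", canonical_host)
with_port = build_spn("HTTP", requested_host, 8080)

print(from_host_header)     # HTTP/intranet.example.com
print(from_canonical_name)  # HTTP/web01.example.com
print(with_port)            # HTTP/intranet.example.com:8080
```

Two sites hosted on web01 under different aliases get distinct SPNs in the first form but collapse to one SPN in the second, which is the multiple-services problem mentioned above.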

Finally, note that NTLMv2 is only available on Windows platforms. Chrome supports NTLMv1 on other platforms but that is horrendously insecure! This is not intended as a negative comment on Chrome, just something to be aware of.

It is great to see other browsers finally supporting SSO, Negotiate, NTLM and Kerberos. I just hope that interoperability is considered a desirable end goal. Without it these are just more competing proprietary solutions, and that would be a shame.

Material about Google Chrome was taken from here: HTTP authentication [The Chromium Projects]. See my recent post about Kerberos in Windows for links to supporting Windows implementation materials.

Ten tips for using and configuring Kerberos authentication on Windows

Lately, I’ve been having some fun with Kerberos in Windows/Active Directory. Fun might not be the best way to describe it, but I thought I’d spend a few moments capturing some of what I’ve learnt in the past few days.

Tip 1. Debugging Kerberos issues is very hard. I recommend that you don’t change anything without making a note of what you did and also what side-effects it might cause. Also keep track of whether you restarted any services or servers, whether you emptied any caches, etc. Otherwise, you might not be able to interpret your results.

Tip 2. Premature success is evil. If you’ve changed something and you are testing whether the configuration is working, you had better make sure that your test results aren’t due to the previous behaviour being cached. This is much worse than having a step that fails. So, if you think something is working, test thoroughly before moving on to the next step or declaring victory!

Tip 3. Using custom service accounts is a common trouble spot. In theory, a correctly configured service account should work just like a computer account. My experience is that sometimes they don’t. At the time of writing, I don’t know why not. Everything I can think of has been checked. One major consideration is the distinction between kernel-mode and user-mode code execution.

Tip 4. Capture network traces. It can be useful to see whether a Kerberos negotiation actually takes place, or whether the client abandons Kerberos in favour of NTLM authentication. Sometimes no negotiation appears because the Kerberos ticket cache on the client machine answers the request. This may be fine, or the cache may hold tickets from an old configuration; to rule that out, execute klist purge using an elevated administrator account. Both Wireshark and Network Monitor are good tools for capturing traces. Use your preferred tool but make sure you learn how to use it effectively. Both tools can help you identify communication sessions or filter the trace to a set of protocols or addresses.

Tip 5. Make sure your DNS configuration is correct. I’ve often seen Windows clients set up as if to load-balance between public and corporate DNS servers. This is an incorrect configuration. The Windows DNS client only uses the alternate server if the preferred server cannot respond to a query; it is assumed that both would provide identical results. In a recent case, I saw a public DNS server providing records for the DC’s own test domain, which wasn’t intended to be public (because there was a real public registration for the FQDN). Use .local domains unless you need Apple Mac integration (the Rendezvous service had problems with this in the past). The DNS specification lists .local addresses as private registrations. This is the DNS equivalent of private IP ranges. Note that Windows clients use DNS to identify the appropriate Kerberos servers.

Tip 6. Don’t just restart application pools in IIS. Restarting an application pool is a quick way of restarting a web site. However, it is flawed. Restarting an application pool does not restart the entire user-mode stack. In particular, you need to pay attention to the Windows Process Activation Service (WAS). Make sure this service is restarted when testing. Don’t forget klist purge, either.
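One way to make the reset repeatable is to script it. The sketch below is only an illustration: the helper name is mine, the commands are the standard Windows ones as far as I know (net stop was /y also stops the dependent W3SVC service, and net start w3svc starts both again), and nothing is executed unless you explicitly ask for it on a Windows machine. Remember that klist purge clears the cache of the logon session it runs in, so the client needs its own purge.

```python
import platform
import subprocess

# Commands to run, in order, before each test pass.
RESET_COMMANDS = [
    ["net", "stop", "was", "/y"],  # stops WAS and, with /y, its dependants
    ["net", "start", "w3svc"],     # starts W3SVC, which starts WAS again
    ["klist", "purge"],            # empties this session's Kerberos ticket cache
]


def reset_for_retest(run=False):
    """Return the reset commands; execute them only if run=True on Windows."""
    if run and platform.system() == "Windows":
        for command in RESET_COMMANDS:
            subprocess.run(command, check=True)
    return RESET_COMMANDS


for command in reset_for_retest():
    print(" ".join(command))
```

Run it from an elevated prompt with run=True, and note in your change log (Tip 1) that you did so.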

Tip 7. Check your SPNs whenever a configuration is changed. In some cases, I believe, IIS configures SPNs for you. However, sometimes these can become out of sync. So check. Use setspn.exe -L [accountname] to review.

Tip 8. Check your Allowed-To-Delegate-To configuration. In Windows Server 2008 R2, the Delegation tab of the account’s properties in Active Directory Users & Computers shows you whether the account supports delegation, whether it is constrained and whether any protocol can be used.

Tip 9. Know your abbreviations! If you don’t know the abbreviations, you can’t search effectively. S4U (the ‘Service for User’ Kerberos extensions) covers two mechanisms: S4U2self (‘Service for User to Self’) is ‘Protocol Transition’, and S4U2proxy (‘Service for User to Proxy’) is ‘Constrained Delegation’. Also look for blog entries that use the incorrect S4Uproxy abbreviation (missing the numeral ‘2’).

Tip 10. Don’t forget the rest. Unfortunately, ten tips isn’t enough to cover all the things you need to be aware of. Here are a few of the other things to consider:

  • Account option ‘Do not require Kerberos preauthentication’. You shouldn’t need to use this in a Windows environment. Kerberos protocol errors referring to KRB5KDC_ERR_PREAUTH_REQUIRED can usually be ignored. You should see a normal Kerberos negotiation following. Kerberos pre-authentication is used to validate the calling user’s identity.
  • Account option ‘This account is sensitive and cannot be delegated’. This will prevent delegation. It can be configured on service accounts, unless the service account needs to act as itself on a delegated service. If you are using impersonation, you may want this enabled because it will help to avoid false-positives.
  • IIS 7.5 authentication. There are new options to specify the protocols and other behaviours for Windows authentication. Make sure you review them. There is more information in the links below.
  • Try to test several different approaches. You may find that delegation to a file share is working but delegation to a web server is not. Don’t just follow one path. If things are working correctly then both approaches should work easily.
  • Windows servers use IPsec between servers and especially between domain controllers. I have no idea whether this can affect the success or failure of Kerberos interactions when running as a user account.
  • This is not a definitive guide! Sorry, but you are going to have to investigate and try things out. I recommend that you build an entirely clean, virtual environment to test your configuration. Also, try not to use it as an experimentation platform. Assume it is production and script or document everything. You need it to be reproducible.

References:

Finally, don’t forget that Kerberos relies on near-synchronisation of computer clocks. See my previous post Windows: The Windows Time Service.
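As a small illustration of what "near-synchronisation" means in practice, the sketch below checks two timestamps against a five-minute tolerance, which is the default maximum clock skew in Active Directory. The function name and the sample times are hypothetical; in a real check you would obtain the KDC's time from the domain controller.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # Active Directory default tolerance


def within_kerberos_skew(client_time, kdc_time, max_skew=MAX_SKEW):
    """Return True if the two clocks are close enough for Kerberos."""
    return abs(client_time - kdc_time) <= max_skew


kdc = datetime(2011, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
ok_client = kdc + timedelta(minutes=4)
drifted_client = kdc - timedelta(minutes=7)

print(within_kerberos_skew(ok_client, kdc))       # True
print(within_kerberos_skew(drifted_client, kdc))  # False
```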

Thanks for this article also go to several Microsoft engineers who have helped me to understand more about the implementation of Kerberos on Windows. You know who you are!

A continuous thread of execution it isn’t!

I have to admit it, I was really surprised this week. While investigating a mysterious issue I discovered that I knew less about the hosting platform of ASP.NET and IIS than I thought I did. What I found makes sense, but it was surprising nonetheless.

What I found has made me believe more strongly what I have recently been advocating. Affinity is dangerous. The model of pure functions in functional languages is much easier to understand and thus reason about. Whenever affinity is used as a back-door to rely on some previously established state, you are essentially adding input to your function, and when you do so you had better understand the immutability or otherwise of that information. The problem? Something believed to be immutable was not in fact immutable and thus the correctness of the code was gone.

Now this is all quite mysterious, so I’d better get to telling you what it is that I found.

I had an HttpModule that was impersonating a user, and therefore changing the return value of WindowsIdentity.GetCurrent(). I also caused a change in Thread.CurrentPrincipal because I wanted any .NET code in the ASP.NET pipeline to consider this account to be the current account. I thought everything was fine! (I should point out that I am dubious about the quality and purpose of this code; it is just what I had when I was investigating. I suspect a rewrite is due…)

In fact, ASP.NET interleaves request tasks (note: request tasks, not just whole requests) on the same thread and therefore has logic to switch the current thread identity and impersonation behaviour. It only needs this because it interleaves request processing, otherwise it could have just left the identity as it was. The problem is the following: it does not determine the behaviour based on Thread.CurrentPrincipal or WindowsIdentity.GetCurrent(). Instead, the request’s execution context is represented by the HttpContext class and HttpContext.Current instance. The User property of HttpContext is actually an instance of IPrincipal and ASP.NET will undo impersonation before switching to a new task. Without setting the HttpContext.Current.User property, this impersonation approach is not going to work!

The solution is clearly trivial: set the HttpContext.Current.User property. However, that misses the point. Server-side code often requires the splitting up of work into smaller units. When this happens, each of these units of work may be executed on the same thread without interruption, on the same thread with infrastructure interruption and then an immediate resumption, on the same thread with the interleaving of an alternate unit of work, or on another thread. Modern systems have a large amount of co-operative multi-tasking on the same thread. This is true for ASP.NET, WCF and the TPL. It also means that, when traversing threads intentionally, you have to take responsibility for taking this state with you.
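The hazard is not specific to .NET. As a loose analogy in Python (this is not ASP.NET code, and the handler and user names are invented), the sketch below interleaves two request handlers on one thread. threading.local stands in for thread-affine state such as Thread.CurrentPrincipal, and contextvars.ContextVar stands in for a per-request context such as HttpContext.

```python
import asyncio
import contextvars
import threading

thread_state = threading.local()  # thread-affine state
request_user = contextvars.ContextVar("request_user", default=None)  # per-task state


async def handle(user, results):
    thread_state.user = user  # visible to everything running on this thread
    request_user.set(user)    # visible only within this task's context
    await asyncio.sleep(0)    # yield, so the other handler runs in between
    results.append((user, thread_state.user, request_user.get()))


async def main():
    results = []
    await asyncio.gather(handle("alice", results), handle("bob", results))
    return results


results = asyncio.run(main())
for user, seen_on_thread, seen_in_context in results:
    print(user, seen_on_thread, seen_in_context)
# alice bob alice
# bob bob bob
```

After the interleaving, the first handler finds the second handler's user in the thread-affine slot, while the value carried by its own context is intact. That is the same shape of bug as relying on the thread identity surviving a task switch.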

The large and complex subsystems of .NET include several examples of this. The ExecutionContext manages the CLR state as it reuses different operating system threads. WCF has the OperationContext and ServiceSecurityContext classes. ASP.NET has the HttpContext. Of course, you’ve probably also used the SynchronizationContext to interact with a UI thread that has its own thread affinity.

In retrospect, a lot of this looks obvious. I knew that ASP.NET supported asynchronous page execution, and of course it may need to load the page from disk and even compile it in some cases, so an asynchronous approach seems obvious. Similarly, I’ve coded custom WCF bindings and so I know that they are also an asynchronous design. Nevertheless, it is all too easy to make the incorrect assumption that these methods and events are just executed as a monolithic block of code with the infrastructure providing the simplest of glue. The reality is far more complex.

Thanks go to Scott Hanselman for a nice blog post on some of this: System.Threading.Thread.CurrentPrincipal vs. System.Web.HttpContext.Current.User or why FormsAuthentication can be subtle.  The Microsoft Patterns & Practices team also have a detailed description of ASP.NET authentication, although the article is quite old now: Explained: Windows Authentication in ASP.NET 2.0.

Developers can be good communicators

As developers, we can often find it difficult to express compelling arguments to managers, colleagues in other departments and business leaders. This challenge is not insurmountable.

To communicate effectively, we developers (like everyone else) must address the needs of our audience.

Firstly, we need to identify our audience. Who are they? What do they already know? What is their concern with our current subject?

Secondly, we need to determine their degree of engagement with our subject. The ACCA framework (Awareness, Comprehension, Conviction, Action) is useful for this.

Thirdly, we need to answer a question that they have. To do this, we must direct them to an appropriate question that we can answer. Only then can we answer that question (and not anything else!). Barbara Minto’s Pyramid Principle describes this in detail.

All too often, we ask our audience to commit to some action before they understand there is a problem, or else answer a question that they are not asking. We can become good communicators. If we are unable to communicate effectively it is our fault (mostly!). Will we become good communicators? Will you?

NHibernate: How to filter on primitive collections

I am using NHibernate with a client and I keep hitting the same issue. I have entities with basic collections of strings. I want to search for entities on the basis of filtering criteria expressed against the elements of the collection.

The easy solution is to treat the collection elements as entities but this is not ideal. That really complicates the domain. In some cases, the elements are simply references to foreign entities outside the scope of NHibernate, for example in a remote service or configuration file.

In SQL, I can pose the query as a correlated subquery or as an (outer) join. This has the advantage of being efficient and does not result in the loading of the collection.

An example would be searching for a Cat that is only black when each Cat has a collection of Colours, perhaps represented by an RGB triad. It would be true normal form to extract the colours into their own table but it would also be ridiculous to do so because a foreign key already exists – the RGB triad! Another example would be finding all Cats that are partly black or partly white. There is no reason why the criteria cannot be arbitrarily complex.

I have not found a way in HQL or the Criteria API and my scenarios require the filtering to take place in the database. The result is that I am using SQL directly with NHibernate’s ISession.
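For illustration, here is the kind of SQL I mean, sketched against an in-memory SQLite database from Python rather than through NHibernate. The table and column names (cat, cat_colour, rgb) and the sample data are hypothetical; the point is only the shape of the correlated subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cat (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE cat_colour (
        cat_id INTEGER NOT NULL REFERENCES cat(id),
        rgb    TEXT NOT NULL
    );
    INSERT INTO cat VALUES (1, 'Felix'), (2, 'Salem'), (3, 'Snowy'), (4, 'Ginger');
    INSERT INTO cat_colour VALUES
        (1, '000000'), (1, 'FFFFFF'),
        (2, '000000'),
        (3, 'FFFFFF'),
        (4, 'FF8000');
""")

# Cats whose colours are black and nothing else.
ONLY_COLOUR = """
    SELECT c.name FROM cat c
    WHERE EXISTS (SELECT 1 FROM cat_colour cc
                  WHERE cc.cat_id = c.id AND cc.rgb = ?)
      AND NOT EXISTS (SELECT 1 FROM cat_colour cc
                      WHERE cc.cat_id = c.id AND cc.rgb <> ?)
    ORDER BY c.name
"""

# Cats that have at least one of two colours.
ANY_OF_TWO = """
    SELECT c.name FROM cat c
    WHERE EXISTS (SELECT 1 FROM cat_colour cc
                  WHERE cc.cat_id = c.id AND cc.rgb IN (?, ?))
    ORDER BY c.name
"""


def names(sql, *params):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]


only_black = names(ONLY_COLOUR, "000000", "000000")
black_or_white = names(ANY_OF_TWO, "000000", "FFFFFF")
print(only_black)      # ['Salem']
print(black_or_white)  # ['Felix', 'Salem', 'Snowy']
```

The filtering happens entirely in the database and the colour collection is never loaded, which is the property I want from an HQL or Criteria equivalent.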

Any better solutions? Does ADO.NET Entity Framework also lack this concept?

Introducing SQL Service Broker

I have been investigating SQL Server’s Service Broker component as a possible choice for enabling messaging solutions in a .NET environment for a client and I have been impressed by the feature set.

The architecture does require a bit of a mind shift as it encourages the separation of processing logic from the reception of messages. Sending messages should of course still be part of the process logic.

Why this distinction? This approach allows for rapid verification of message transfer and controlled processing of message content. It encourages developers to recognise that transfer of messages is a different task to responding to a queue of received requests. Messaging is about sending messages and reliably receiving them, not about any particular business process.

If you do not apply this design then you are in danger of causing rollbacks during message reception, and Service Broker will disable a queue if this occurs five times in a row (its protection against poison messages).
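The effect can be simulated in a few lines. This is a toy model in Python, not the Service Broker API: the class, the handler and the messages are invented, and the only behaviour borrowed from Service Broker is that five consecutive rollbacks disable the queue.

```python
from collections import deque

POISON_THRESHOLD = 5  # consecutive rollbacks before the queue is disabled


class SimulatedQueue:
    def __init__(self, messages):
        self.messages = deque(messages)
        self.enabled = True
        self.consecutive_rollbacks = 0

    def receive(self, handler):
        """Hand the head message to handler inside a simulated transaction."""
        if not self.enabled or not self.messages:
            return False
        try:
            handler(self.messages[0])
        except Exception:
            # Rollback: the message stays at the head of the queue.
            self.consecutive_rollbacks += 1
            if self.consecutive_rollbacks >= POISON_THRESHOLD:
                self.enabled = False
            return False
        self.messages.popleft()  # commit
        self.consecutive_rollbacks = 0
        return True


def process(message):
    """Hypothetical business logic that cannot handle one message."""
    if message == "bad":
        raise ValueError("cannot process")
    return message.upper()


# Coupled: business logic runs inside the receive transaction.
coupled = SimulatedQueue(["ok-1", "bad", "ok-2"])
for _ in range(10):
    coupled.receive(process)

# Separated: reception only stores the message; processing happens afterwards.
separated = SimulatedQueue(["ok-1", "bad", "ok-2"])
inbox, processed, failed = [], [], []
while separated.receive(inbox.append):
    pass
for message in inbox:
    try:
        processed.append(process(message))
    except ValueError:
        failed.append(message)

print(coupled.enabled, list(coupled.messages))  # False ['bad', 'ok-2']
print(separated.enabled, processed, failed)     # True ['OK-1', 'OK-2'] ['bad']
```

In the coupled version one unprocessable message stops the queue and strands the good message behind it. In the separated version reception always commits, and the failure is handled as an ordinary processing outcome.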

Do you agree with the design? What are your experiences?