Visual Studio 2010 is littering

The new Visual Studio 2010 is great, especially the C++ IntelliSense that actually works properly, but it has a problem: it litters in the project folder. The old .ncb IntelliSense files are gone, replaced by .sdf files, but now there's a whole new 'ipch' subfolder added to the project, containing even more IntelliSense cache files.

But wait, don't update your source control ignore lists just yet! VS2010 supports a fallback location for the IntelliSense files, designed to be used when the project is on a network share and VS doesn't have permission to write to the project folder. The good part is that you can tell VS to always use the fallback location, even for local projects. The fallback options are buried deep in the Text Editor/C/C++/Advanced category; set them once and enjoy your cleaner project folders.

Dependency injection for beginners (like me)

Since starting to practise proper TDD, I started to learn dependency injection too. It's a simple pattern that makes a world of difference. The following are some resources which I found helpful when getting to grips with it:

One thing I found confusing at first (but is now obvious) is how to inject into the dynamic/transient objects created at runtime. The static wiring is simple - with Ninject it's just a single call to kernel.Get<Application>(), and the entire application will be wired up. But when creating an object at runtime, it's not quite so easy. At first I tried to use the same pattern:

class ShoppingCartItem
{
   private ILogger logger;
   private string name;

   public ShoppingCartItem(ILogger logger, string name)
   {
      this.logger = logger;
      this.name = name;
   }
}

class ShoppingCart
{
   private IKernel kernel;

   public ShoppingCart(IKernel kernel)
   {
      this.kernel = kernel;
   }

   public void AddItem(string name)
   {
      ShoppingCartItem item = kernel.Get<ShoppingCartItem>(
         new ConstructorArgument("name", name));
      ...
   }
}

This works, but it is wrong! Ninject will inject itself (the kernel) into the ShoppingCart, and then inject the ILogger into the ShoppingCartItem. But this is bad for a couple of reasons:

  • The code is now coupled to Ninject. Changing the dependency injection framework now requires all the code which creates objects to be changed.
  • Passing extra non-injected parameters to the ShoppingCartItem constructor is now difficult. It's possible in Ninject, but the syntax is horrible and brittle.

A much better way is to use a factory:

class ShoppingCartItemFactory
{
   private ILogger logger;

   public ShoppingCartItemFactory(ILogger logger)
   {
      this.logger = logger;
   }

   public ShoppingCartItem CreateItem(string name)
   {
      return new ShoppingCartItem(logger, name);
   }
}

class ShoppingCart
{
   private ShoppingCartItemFactory itemFactory;

   public ShoppingCart(ShoppingCartItemFactory itemFactory)
   {
      this.itemFactory = itemFactory;
   }

   public void AddItem(string name)
   {
      ShoppingCartItem item = itemFactory.CreateItem(name);
      ...
   }
}

Now everything is decoupled, the syntax is clean, and we still have the ability to inject mock items by injecting a different factory implementation.

Email obfuscation

Finally added my email address to the site, and spent way too long investigating different obfuscation techniques. From a quick survey of the blogs in my RSS reader, it seems that most people go with ASCII encoding or plain old-fashioned munging. However I found an article which suggests that several email harvesters will decode those just fine, and since the article is already over 2 years old I dread to think how much better the harvesters are today. Went with a JavaScript obfuscator in the end, as it should be transparent to nearly all users, and seems like it should be sufficient.

Hash collisions

At least in the game industry, hash functions are often used to generate unique identifiers for assets such as textures and models. This is a little dangerous in theory, as collisions, while unlikely, are always possible, so the correct approach is to use the hashes only for fast lookup and then do a full equality comparison on the returned values. But what's the actual probability of a collision?

It's more likely than you might think, as this is an example of the birthday paradox. The approximate formula for the probability of a collision is:

p = 1 - e^[-n^2/(2*N)]

where N is the size of the hash space, and n is the number of values used.

With 10,000 values in a 32-bit space, plugging the numbers into the formula gives a collision probability of 1.1%. So fairly unlikely, but it wouldn't be too surprising.

With 100,000 values in a 32-bit space, the probability jumps to 68%. Much higher than you might intuitively expect, given the space of 4 billion possible values.

Changing to a 64-bit hash with 100,000 values, the probability of a collision is 2.7e-8%, which is much more comfortable.

Even with 10,000,000 values in the 64-bit space the probability is still only 0.00027%.

With 1,000,000,000 values in the 64-bit space, the probability is 2.6%... starting to become a more serious concern, but if you have a billion assets then you have bigger problems than hash collisions.

With a 128-bit hash (e.g. md5) and 1,000,000,000 values, the probability is 1.4e-19%... cosmic rays flipping bits in your memory chips should be a bigger concern at this point.

RockScroll

This is very cool - a plug-in for Visual Studio which replaces the standard scrollbar with a zoomed-out image of the current source file, complete with syntax highlighting. One click on the image jumps straight to that part of the code. It also highlights all breakpoints and bookmarks. But the best feature is the search: just double-click a word and it instantly highlights all occurrences of that word.

MFC Woes

While working on some old MFC code, I made an unpleasant discovery - the standard CTreeCtrl class doesn't support multiple selection. This was a little hard to believe at first; it's not exactly an advanced feature. Even worse, all the free implementations I could find on the web had bugs. One had such funky message handling that it worked on XP but not Vista. Another didn't implement keyboard handling, and so broke as soon as the user touched the keyboard.

Working with legacy code is so much more difficult after being spoiled with new modern frameworks.

Strict Aliasing

Strict aliasing, or type based alias analysis as it is also known, is enabled by default in new versions of GCC (at least when optimizing with -O2), but unfortunately most programmers don't even know about the existence of the strict aliasing rules. To the uninitiated it seems that GCC is producing buggy code, so perhaps Microsoft is smart not to have implemented strict aliasing in Visual C++ yet.

The best explanation of strict aliasing I have seen is Understanding Strict Aliasing by Mike Acton. He also has a related article about the restrict keyword, Demystifying the Restrict Keyword.

Peterson's Algorithm

Peterson's Algorithm is a simple mutual exclusion algorithm for two threads. It uses only reads and writes, so I figured that if there were only two threads, maybe it would be faster than using a regular lock with a compare-and-swap or other interlocked memory operations.

class PetersonLock
{
public:
   PetersonLock()
   {
      m_flag[0] = m_flag[1] = 0;
   }
   void lock(int id)
   {
      assert((id==0) || (id==1));
      int otherId = 1 - id;

      m_flag[id] = true;
      m_victim = id;
      while( m_flag[otherId] && (m_victim == id) )      { }
   }
   void unlock(int id)
   {
      assert((id==0) || (id==1));
      m_flag[id] = false;
   }
private:
   Atomic<bool> m_flag[2];
   Atomic<int> m_victim;
};

The Atomic<bool> and Atomic<int> classes implement reads with acquire semantics and writes with release semantics, which I thought would give the correct memory ordering. But it turns out that this is insufficient, as in the middle of the lock function we have this sequence:

   m_victim = id;      //store with release
   while( m_flag[otherId] ... )      //load with acquire

There is nothing to prevent these being re-ordered: performing the load before the store violates neither the acquire nor the release semantics. And it turns out that this re-ordering does happen, even on the relatively strong memory model of x86 processors (stores are ordered with respect to other stores, and loads with respect to other loads, but there is no ordering between a store and a subsequent load), and it breaks the mutual exclusion property of the lock.

To fix it we need to add a memory fence between these instructions, which is just as slow as an interlocked instruction, so there goes any speed advantage.

C++ FAQ Lite Riposte

I discovered the C++ FQA recently; it's a nice summary of all the things that suck and are broken in C++. It uses the C++ FAQ Lite as the starting point for its attacks, which is somewhat of an easy target, as I have always found the C++ FAQ Lite to be an excellent resource for demonstrating just how horribly complicated C++ actually is.

Memory ordering, barriers, acquire and release semantics

The problem

Multithreading is difficult, and multithreading without locks is even more difficult. This post is about one of the problems: out-of-order memory accesses.

Here's an example of the unexpected behaviour caused by out-of-order processors (assume variables are initialized to 0 in all the following examples):

Example 1
   Thread 1             Thread 2
   a = 1
   b = 2
                        x = b
                        y = a

After running this code, if x is 2, then is y guaranteed to be 1? Unfortunately the answer is no, x==2 and y==0 is a possible result. This can happen because the writes in thread 1 can be re-ordered:

Example 1, re-ordered execution
   Thread 1             Thread 2
   b = 2
                        x = b
                        y = a
   a = 1

Here's another example:

Example 2
   Thread 1             Thread 2
   x = 1                y = 1
   a = y                b = x

We would assume that after running this code then either a or b must be set to 1. Again, this is not the case, as the instructions can be re-ordered:

Example 2, re-ordered execution
   Thread 1             Thread 2
   a = y                b = x
   x = 1                y = 1

What on earth is actually going on here?

So why are we seeing all these instructions being re-ordered? We don't see this when writing a program without multithreading, so why does it start happening in multithreaded code? The answer is that it always happens - it's just that the processor is nice enough to hide it from us when we're running in a single thread of execution. Processors have separate caches, memory writes can be buffered, the processor can re-order instructions, all of which will affect when the memory accesses actually happen. The processor hides this from single threaded code using careful cache protocols, tracking dependent instructions, etc.

For these reasons, memory ordering is not an issue when running multiple threads on a single core; these problems only show up when running on a machine with multiple processors or a multicore processor. (This is not entirely true: the compiler can re-order memory accesses too, which can break things even on single-core machines; see below.)

Traditional locking methods for writing multithreaded applications handle this automatically. The locks will include the necessary memory barriers for the platform (see acquire and release semantics below) to ensure that memory appears consistent to the other processors. Sometimes programmers writing multithreaded code try to be clever, and avoid locking when doing something simple like just setting a flag:

Example 3
   Thread 1             Thread 2
   a = 100
   isReady = true
                        while( !isReady ) { }   //wait for flag
                        x = a                   //do something with 'a'

And this code will then fail horribly when run on a multicore CPU. I've written code like this many times, before I knew better. When writing lock-based code the rule is simple: always protect shared data with a lock. Trying to be cute is a bad idea.

Compiler re-ordering

Unfortunately it's not just the processor which can re-order memory accesses. The compiler will also re-order instructions as part of its optimizing pass. This means we can still see memory ordering issues on single-core machines, and on machines which otherwise have strong memory consistency guarantees (e.g. x86). Compiler intrinsics such as _ReadWriteBarrier can be used to stop the compiler re-ordering instructions, and the volatile keyword can have a similar effect. Sometimes the barrier is implicit: an inline assembly block acts as a compiler barrier on most compilers, so memory barriers implemented as an inline assembly block around the barrier opcode usually act as compiler barriers for free.

Acquire and release semantics

A refinement of the full memory barrier model is acquire and release semantics. This is often referred to as 'load-with-acquire' and 'store-with-release'. The naming comes from the types of barriers needed when implementing locks. Acquiring a lock is always associated with a load instruction, to confirm that the lock has been successfully acquired (often a compare-and-swap). Releasing a lock is likewise always associated with a store instruction.

The fences required for a lock to be consistent must ensure that all memory accesses between the acquire and the release stay inside the lock; they cannot be moved before the acquire or after the release. Moving instructions from outside the lock to the inside is allowed, however, as this does not affect the locking property. Therefore a load-with-acquire is a one-way fence which allows movement downwards but not upwards, and a store-with-release is a one-way fence which allows movement upwards but not downwards.

In C# and Microsoft's Visual C++ 2005, volatile reads have acquire semantics and volatile writes have release semantics. This actually makes volatile useful again in C++ (see the first reference below for why it is otherwise useless). However it looks like the C++ standard will take a different approach: C++0x defines atomic types which have acquire and release semantics.

References

  1. Volatile: Almost Useless for Multi-Threaded Programming
    A discussion of why the C++ volatile keyword is insufficient for multi-threaded programs, and why memory fences are needed instead.
  2. Memory Ordering in Modern Microprocessors, Part 1 and Part 2
    Explanation of re-ordering and memory barriers, from a Linux viewpoint. Part 2 describes the memory models of many processor types.
  3. Blog post about the .NET memory model
    Explains acquire/release, and how the .NET memory model compares to some real processors. Somewhat out of date now, the .NET framework 2.0 has a stronger model.
  4. Lockless Programming Considerations for Xbox 360 and Microsoft Windows
    Good article on lockless programming, with comparisons between Xbox 360 and x86.
  5. Trip Report: Ad-Hoc Meeting on Threads in C++
    The memory model and threading library for the future C++ standard.
  6. Understand the Impact of Low-Lock Techniques in Multithreaded Apps
    Good article from Vance Morrison explaining memory models, also in relation to the .NET memory model. See also an older blog post from him.
  7. Blog post from Kang Su
    Explains why explicit memory fence instructions are not usually needed on x86.