Peterson's Algorithm is a simple mutual exclusion algorithm for two threads. It uses only reads and writes, so I figured that if there were only two threads, maybe it would be faster than using a regular lock with a compare-and-swap or other interlocked memory operations.
class PetersonLock
{
public:
PetersonLock()
{
m_flag[0] = m_flag[1] = 0;
}
void lock(int id)
{
assert((id==0) || (id==1));
int otherId = 1 - id;
m_flag[id] = true;
m_victim = id;
while( m_flag[otherId] && (m_victim == id) ) { }
}
void unlock(int id)
{
assert((id==0) || (id==1));
m_flag[id] = false;
}
private:
Atomic<bool> m_flag[2];
Atomic<int> m_victim;
};
The Atomic
m_victim = id; //store with release
while( m_flag[otherId] ... ) //load with acquireThere is nothing to prevent these being re-ordered, doing the load before the store does not violate the acquire and release semantics. And it turns out that this does happen, even on the relatively strong memory model of x86 processors (stores are ordered with respect to other stores, and loads are ordered with respect to other loads, but unfortunately there is no ordering between loads and stores) which breaks the mutual exclusion property of the lock.
To fix it we need to add a memory fence between these instructions, which is just as slow as an interlocked instruction, so there goes any speed advantage.