One of the projects in CSE 6431 (Operating Systems) this past semester had us write a multithreaded restaurant simulation. I read the spec, thought “producer-consumer with extra steps,” and then spent three nights debugging a deadlock that only showed up with exactly four diners and two cooks. The sort of bug where you stare at the terminal, the program just hangs, and you start questioning whether you actually understand what a mutex does.

I want to write this up because the problem turned out to be a surprisingly good way to learn synchronization primitives. The resources you’re coordinating are physical things you can picture: tables that fill up, cooks that get busy, a single burger machine two cooks are fighting over. And the path from “I have no idea how to synchronize this” to a working solution went through several dead ends that were as instructive as the final code.

The implementation is in C++14 using the standard threading library. The full source is on GitHub.

The problem

Restaurant 6431 has three kinds of constrained resources. Tables limit how many diners can eat at once; if all tables are occupied, arriving diners wait outside. Cooks each handle one order start to finish, picking up the next order only after delivering the current one. And there are three machines (burger, fries, soda), each a single instance: the burger machine takes 5 minutes per burger, the fries machine 3 minutes per serving of fries, and the soda machine 1 minute per coke. Only one cook can use a given machine at a time.

Diners show up over a 120-minute window. They get seated, place an order (at least one burger, some fries maybe, optionally a coke), and then wait for their food. A cook grabs the order, uses each machine in sequence to prepare the items, and delivers everything at once. The diner eats for exactly 30 minutes and leaves, freeing the table.

Input looks like this:

5        <- number of diners
3        <- number of tables
2        <- number of cooks
0 2 1 1  <- diner 0: arrives at t=0, orders 2 burgers, 1 fries, 1 coke
10 1 0 0 <- diner 1: arrives at t=10, orders 1 burger, no fries, no coke
20 1 1 1 <- diner 2: arrives at t=20
30 3 2 0 <- diner 3: arrives at t=30
45 1 1 1 <- diner 4: arrives at t=45
plaintext

Decomposing it into threads

I went with the most direct mapping: one thread per diner, one thread per cook, main thread as orchestrator. A diner thread sleeps until its arrival time, waits for a table, drops an order on the queue, waits for the cook to signal that the food is done, eats for 30 minutes, frees the table, exits. A cook thread loops forever, pulling orders off a shared queue and using the machines to fill them.

You could do this with a thread pool or one thread per table, but for an assignment about learning OS primitives, having each entity be its own thread makes the synchronization problems more visible. And the thread counts here are small enough (maybe 20-50 diners, a handful of cooks) that the overhead doesn’t matter.

The relationship between diners and cooks is producer-consumer. Diners produce orders, cooks consume them. The shared buffer is the order queue. But there’s a second communication channel going the other direction: after a cook finishes an order, it needs to tell the specific diner “your food is ready.” That part took me a while to get right.

┌─────────┐     ┌──────────────┐     ┌────────┐     ┌──────────┐
│  Diner  │────>│  Order Queue │────>│  Cook  │────>│ Machines │
│ threads │     │  (bounded)   │     │ threads│     │ (shared) │
└─────────┘     └──────────────┘     └────────┘     └──────────┘
     │                                    │
     │          ┌──────────────┐          │
     └──────────│  DinerSlot   │<─────────┘
    (wait food) │  (per-diner) │ (notify food ready)
                └──────────────┘
plaintext

Synchronization

I ended up with four distinct synchronization mechanisms. Each one protects a different resource, and each one uses a different primitive (or combination of primitives) for a reason.

Tables: a counting semaphore

Tables are identical and interchangeable. You have K of them, and N diners competing for them. A diner grabs one (the P operation), holds it while eating, and releases it (the V operation). This is exactly what counting semaphores are for.

C++14 doesn’t have std::counting_semaphore. That showed up in C++20. I tried using the POSIX sem_t first, but sem_init and sem_destroy are deprecated on macOS and the compiler warnings were annoying me. So I wrote one from scratch, which turned out to be a useful exercise anyway since the implementation is basically the textbook definition of a semaphore:

The lambda in wait() is the predicate. I learned the hard way that you should never call cv.wait(lock) without one. The OS is allowed to wake your thread even when nobody called notify (these are called spurious wakeups, and they really do happen). The predicate form re-checks the condition each time the thread wakes up, so spurious wakeups just cause a harmless re-check and the thread goes back to sleep.

One thing I want to point out about signal(): the notify_one() call happens after the lock is released, not inside the locked scope. If you notify while holding the lock, the woken thread tries to grab the same mutex immediately and just blocks again. People call this “hurry up and wait.” It’s not incorrect, but it’s a wasted context switch.

Usage in the diner thread is simple:

table_sem->wait();       // block until a table is free
// ... sit, order, eat for 30 min ...
table_sem->signal();     // free the table
cpp

The order queue: producer-consumer with shutdown

The order queue needs a std::mutex for exclusive access and a std::condition_variable so cooks can sleep when there’s nothing to do and get woken up when a diner places an order.

Diner side (producer):

{
    std::lock_guard<std::mutex> lock(order_queue_mtx);
    order_queue.push(Order{id, burgers, fries, coke});
}
order_queue_cv.notify_one();
cpp

Cook side (consumer):

{
    std::unique_lock<std::mutex> lock(order_queue_mtx);
    order_queue_cv.wait(lock, [] {
        return !order_queue.empty() || restaurant_closed;
    });

    if (order_queue.empty() && restaurant_closed) {
        return;  // no more orders, shut down
    }

    order = order_queue.front();
    order_queue.pop();
}
cpp

The diner uses std::lock_guard and the cook uses std::unique_lock, and the reason isn’t stylistic. condition_variable::wait() needs to release the mutex while the thread is sleeping and re-acquire it when it wakes up. lock_guard doesn’t support that; it locks in the constructor and unlocks in the destructor, nothing else. unique_lock exposes lock() and unlock() so the condition variable can manipulate it. In C++14 this is the rule: lock_guard for straightforward lock-unlock scopes, unique_lock when you need a condition variable or deferred locking.

The restaurant_closed flag in the predicate is how cooks know to exit. Without it, after the last diner finishes, cooks would sit there waiting for orders forever. The main thread sets this flag after joining all diner threads, then wakes all the cooks with notify_all(). I’ll come back to the subtleties of this shutdown sequence later.

I use notify_one() on the diner side because only one cook needs to wake up per order. notify_all() would wake every cook, they’d all race for the lock, one would grab the order, and the rest would find the queue empty and go back to sleep. Not a correctness problem, but pointless work.

Machines: three independent mutexes

Each machine gets its own std::mutex. When a cook needs to make a burger, they lock the burger machine mutex, sleep for 5 simulation minutes, and release it.

static std::mutex burger_machine_mtx;
static std::mutex fries_machine_mtx;
static std::mutex soda_machine_mtx;
cpp
// cook preparing burgers
for (int b = 0; b < order.burgers; ++b) {
    std::lock_guard<std::mutex> lock(burger_machine_mtx);
    sim_sleep(5);
}
cpp

I initially locked the burger machine for all the burgers at once. If someone ordered 3 burgers, that was 15 minutes with the machine locked. The other cook couldn’t use it at all during that time, even between burgers. The fix was locking per burger and releasing between each one. Whether the OS actually schedules the other cook in that tiny gap between lock release and re-acquire depends on timing, but at least the opportunity is there.

One thing worth noting: a cook uses the burger machine, then fries, then soda, always in that order, and always releases one before acquiring the next. No cook ever holds two machine locks at the same time. I’ll come back to why that matters in the dead ends section.

Food delivery: per-diner condition variables

Once a cook finishes all items for an order, the diner needs to know their food is ready. I gave each diner its own slot with a mutex, a condition variable, and a boolean:

struct DinerSlot {
    std::mutex mtx;
    std::condition_variable cv;
    bool food_ready = false;
};
cpp

Cook signals the diner:

{
    std::lock_guard<std::mutex> lock(diner_slots[order.diner_id]->mtx);
    diner_slots[order.diner_id]->food_ready = true;
}
diner_slots[order.diner_id]->cv.notify_one();
cpp

Diner waits for the signal:

{
    std::unique_lock<std::mutex> lock(diner_slots[id]->mtx);
    diner_slots[id]->cv.wait(lock,
                             [&] { return diner_slots[id]->food_ready; });
}
cpp

I considered using a single shared condition variable for all diners instead, but that has problems I’ll get into later. The short version: notify_one() on a shared CV wakes an arbitrary waiter, not necessarily the one whose food is ready.

DinerSlot caused a compile error that confused me for a while. std::mutex and std::condition_variable are neither copyable nor movable, which means you can’t put DinerSlot objects in a std::vector and call resize(). The vector needs to move elements during reallocation, and it can’t. The fix was heap allocation through unique_ptr:

std::vector<std::unique_ptr<DinerSlot>> diner_slots;

for (int i = 0; i < num_diners; ++i) {
    diner_slots.push_back(
        std::unique_ptr<DinerSlot>(new DinerSlot()));
}
cpp

I used new DinerSlot() instead of std::make_unique<DinerSlot>(). make_unique is technically C++14, but I ran into issues with an older GCC version on one of the lab machines. new just works everywhere.

Timing and logging

Running the simulation in real time would take two hours. I scaled it so 1 simulation minute = 100 real milliseconds:

static constexpr int TIME_SCALE_MS = 100;

static void sim_sleep(int minutes) {
    std::this_thread::sleep_for(
        std::chrono::milliseconds(minutes * TIME_SCALE_MS));
}
cpp

All threads share a common sim_start time point set right before any threads launch:

static int sim_elapsed() {
    auto now = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  now - sim_start).count();
    return static_cast<int>(ms / TIME_SCALE_MS);
}
cpp

I used steady_clock instead of system_clock because system_clock can jump backwards. NTP sync, DST transitions, the user changing their clock. steady_clock is guaranteed monotonic. Seems pedantic until your simulation produces negative elapsed times because your laptop decided to sync with a time server halfway through a run.

For logging, I wrapped std::cout in a mutex. Without it, output from different threads interleaves character by character and you get garbage. The logging function uses a variadic template:

template <typename... Args>
static void sim_log(Args&&... args) {
    std::lock_guard<std::mutex> lock(cout_mtx);
    std::cout << "[t=" << sim_elapsed() << "] ";
    using expander = int[];
    (void)expander{0, (void(std::cout << std::forward<Args>(args)), 0)...};
    std::cout << std::endl;
}
cpp

The expander trick is a C++14 workaround for the lack of fold expressions (those arrived in C++17 as (std::cout << ... << args)). You create an array initialization where each element triggers a side effect. The void(...) cast discards operator<<’s return value, the , 0 coerces each element to int for the array type, and the leading 0 handles the zero-arguments case. I found this pattern on Stack Overflow and it took me a while to understand why it works. It’s not pretty but it’s the standard approach in C++14.

The shutdown race

Getting the shutdown sequence right took two attempts. The main thread needs to join all diner threads first (so all orders are placed and all food is eaten), then tell the cooks to stop, then join the cook threads.

My first version set restaurant_closed = true without holding the order queue mutex. This was a race condition. Here’s the scenario: a cook checks the wait predicate, sees restaurant_closed is false and the queue is empty, so the predicate returns false and the cook is about to go to sleep. But before it actually calls wait(), the main thread sets restaurant_closed = true and calls notify_all(). The notification fires before the cook is waiting, so the cook misses it. Then the cook enters wait() and sleeps forever. Nobody wakes it up because the main thread already sent its notification.

Setting the flag under the same mutex the condition variable uses eliminates this. The cook either sees the flag before it waits (and exits), or it’s already waiting when the flag is set and the notify_all() wakes it. There’s no in-between state.

Also, notify_all() here instead of notify_one(). Every idle cook needs to wake up and see the shutdown flag. If you only wake one, the rest hang.
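Putting the pieces together, the whole pattern can be sketched end to end. This is a self-contained toy, not the project code: the cook loop is reduced to popping integers, and all the names are illustrative.

```cpp
// Shutdown pattern: set the flag under the queue mutex, then notify_all.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::mutex order_queue_mtx;
std::condition_variable order_queue_cv;
std::queue<int> order_queue;      // stand-in for the real Order queue
bool restaurant_closed = false;
int orders_handled = 0;           // touched only under order_queue_mtx

void cook() {
    for (;;) {
        std::unique_lock<std::mutex> lock(order_queue_mtx);
        order_queue_cv.wait(lock, [] {
            return !order_queue.empty() || restaurant_closed;
        });
        if (order_queue.empty() && restaurant_closed) return;  // drained
        order_queue.pop();
        ++orders_handled;
    }
}

void run_and_shutdown(int num_cooks, int num_orders) {
    std::vector<std::thread> cooks;
    for (int i = 0; i < num_cooks; ++i) cooks.emplace_back(cook);

    for (int i = 0; i < num_orders; ++i) {
        {
            std::lock_guard<std::mutex> lock(order_queue_mtx);
            order_queue.push(i);
        }
        order_queue_cv.notify_one();
    }

    // The fix: set the flag under the SAME mutex the CV waits on,
    // so no cook can check the predicate and then miss the notification.
    {
        std::lock_guard<std::mutex> lock(order_queue_mtx);
        restaurant_closed = true;
    }
    order_queue_cv.notify_all();  // every idle cook must wake up and exit
    for (auto& c : cooks) c.join();
}
```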

Running it

With the sample input (5 diners, 3 tables, 2 cooks), the output looks roughly like:

[t=0]  Diner 0 arrives, is seated, places order (2 burgers, 1 fries, 1 coke)
[t=0]  Cook 0 starts preparing order for Diner 0
[t=10] Diner 1 arrives, Cook 1 handles their order
[t=10] Cook 0 still working on Diner 0's burgers (burger machine contention)
[t=14] Cook 0 finishes Diner 0's order, delivers food
[t=30] Diner 3 arrives but all 3 tables are occupied — waits
[t=44] Diner 0 finishes eating (30 min), frees table — Diner 3 seated
[t=99] Last diner leaves, cooks clock out
plaintext

The two interesting moments: at t=30 Diner 3 can’t get a table because all three are occupied, and has to wait 14 minutes until Diner 0 finishes eating. And at t=10 Cook 1 wants the burger machine but Cook 0 is still using it. These are the contention scenarios the whole assignment is built around, and it’s satisfying to see them actually appear in the output.

Dead ends

The final design didn’t come together on the first try. Or the second. I went through a few architectures that either didn’t work at all or worked badly enough that I had to rethink things. These are worth talking about because the failures taught me more than the working code did.

The global lock

My very first attempt was embarrassingly naive. I created one std::mutex for the entire simulation and locked it around every shared operation. Every time a diner wanted a table, every time a cook grabbed an order, every time anyone touched a machine, it went through the same lock.

static std::mutex global_mtx;

// diner thread
{
    std::lock_guard<std::mutex> lock(global_mtx);
    // check for free table, sit down, place order...
}

// cook thread
{
    std::lock_guard<std::mutex> lock(global_mtx);
    // grab order, use burger machine, use fries machine...
}
cpp

This compiles. It doesn’t deadlock. It doesn’t have data races. It is technically correct. And it completely defeats the purpose of the assignment.

The problem is that a global lock serializes everything. Cook 0 can’t use the burger machine while Cook 1 uses the fries machine, because both operations require the same lock. Diner 5 can’t sit down at an empty table while Cook 0 is making a burger, because again, same lock. The simulation runs, but it runs as if there were only one thread. The output is deterministic and boring. There’s no contention anywhere because nothing is actually concurrent.

I think this is the instinct everyone has at first: “shared state is dangerous, so protect all of it with one lock and nothing can go wrong.” And that’s true in the narrow sense that you won’t get data races. But you’ve also eliminated all parallelism, which in a simulation about resource contention is like solving the problem by defining it away. The whole point is that the burger machine and the fries machine are independent resources. Two cooks should be able to use them at the same time. A global lock makes them artificially dependent.

This is what people mean by fine-grained vs. coarse-grained locking. Coarse-grained (one lock for everything) is easy to reason about but limits concurrency. Fine-grained (one lock per resource) is harder to get right but lets independent operations proceed in parallel. The restaurant problem is specifically designed to punish the coarse-grained approach because the interesting behavior only emerges when resources are independently contended.

I scrapped this and started over with per-resource synchronization, which is what the rest of the post describes.

Busy-waiting instead of condition variables

Before I understood condition variables, I used spin loops. The idea was simple: if a cook needs to wait for an order, just keep checking in a loop.

And for diners waiting for food:

// diner thread - busy waiting for food
while (true) {
    {
        std::lock_guard<std::mutex> lock(diner_slots[id]->mtx);
        if (diner_slots[id]->food_ready) break;
    }
    // spin...
}
cpp

No condition variable, no wait(), just a tight loop that grabs the lock, checks a flag, releases the lock, and does it again. Thousands of times per second. Per thread.

It works in the sense that the simulation produces correct output. But running top while it was executing showed every core pegged at 100%. With 10 diners waiting for food and 2 cooks checking the order queue, that’s 12 threads all hammering the CPU doing nothing useful. The machine’s fans spun up. On a system with fewer cores than threads (which is the common case when N is 20+), the spin-waiting threads are actively stealing CPU time from the cook threads that are doing real work. The simulation actually ran slower than it would have with proper blocking, because the OS scheduler kept giving time slices to threads that just checked a flag and looped again.

This is the fundamental problem with busy-waiting: a spinning thread consumes CPU without making progress, and it competes for scheduling with threads that could make progress. It’s the difference between standing at someone’s desk asking “are you done yet?” every two seconds vs. telling them “email me when you’re done” and doing something else in the meantime.

Condition variables solve this by asking the OS to deschedule the thread entirely. A thread that calls cv.wait() gets taken off the CPU. It doesn’t get scheduled again until another thread calls notify_one() or notify_all(). Zero CPU usage while waiting. The OS can give those cycles to threads that actually have work to do.

There’s a middle ground called std::this_thread::yield() that you can put inside the spin loop:

if (!got_one) {
    std::this_thread::yield();
    continue;
}
cpp

This at least tells the OS “I don’t have useful work right now, give my time slice to someone else.” It’s better than a bare spin, but it’s still polling. The thread gets rescheduled eventually and checks the flag again. On a lightly loaded system the difference between yield() and a proper CV isn’t huge, but under load it still wastes scheduling overhead. Condition variables are the right tool here.

The one case where spin-waiting legitimately beats condition variables is when the wait duration is extremely short, shorter than the overhead of a kernel context switch (typically a few microseconds). In those cases, the thread would spend more time going to sleep and waking up than it would spend spinning. Some lock-free data structures use this. For a restaurant simulation where waits are measured in simulated minutes, we’re nowhere near that territory.

Shared condition variable for food delivery

This one was closer to correct. Instead of giving each diner its own condition variable, I used a single shared one. All diners waiting for food would call wait() on the same CV. When a cook finished an order, it would set that diner’s food_ready flag and call notify_all(). Every sleeping diner would wake up, check their flag, and either proceed (if their food was ready) or go back to sleep.

This is correct. No data races, no missed notifications, every diner eventually gets their food. I ran it and the output matched the expected behavior.

The issue is performance. Every time a cook finishes one order, notify_all() wakes up every diner that’s currently waiting for food. If there are 15 diners seated and waiting, all 15 wake up, acquire the mutex one by one, check their boolean, and 14 of them find it’s still false and go back to sleep. That’s 14 unnecessary context switches, 14 unnecessary mutex acquisitions, 14 unnecessary predicate checks. Per order completion.

With 5 diners, you don’t notice. I measured it with 50 diners and it was fine too, honestly. The overhead of a few dozen spurious wakeups is small in absolute terms. But it’s O(N) work per food delivery when it should be O(1), and on a real system under load those unnecessary context switches and mutex contentions add up.

You might think “just use notify_one() instead of notify_all().” But notify_one() wakes an arbitrary waiter. There’s no guarantee it wakes the diner whose food is actually ready. If Cook 0 finishes Diner 3’s order and calls notify_one(), the OS might wake Diner 7. Diner 7 checks its flag, sees false, goes back to sleep. Diner 3 never wakes up. The cook has moved on to the next order. Eventually some other notify_one() call might wake Diner 3, but the timing is unpredictable and you can get diners stuck waiting long after their food is done.

Per-diner condition variables give you targeted notification. The cook knows which diner’s order it just finished, calls notify_one() on that specific diner’s CV, and exactly one thread wakes up with a guarantee that its flag is true. No thundering herd, no wasted wakeups.

Pipelining machine access

This was the most seductive wrong idea. A cook needs the burger machine, then the fries machine, then the soda machine, in sequence. What if we overlapped the end of one machine’s use with the start of the next? The cook could lock the fries machine before releasing the burger machine, so there’s no gap where another cook could steal the fries machine and force our cook to wait.

The code looked something like this:
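Reconstructed, it was along these lines (a sketch; prepare_order_pipelined and the stubbed sim_sleep are stand-ins, not the original code):

```cpp
// Hand-off locking: acquire the next machine BEFORE releasing the
// previous one. sim_sleep is stubbed out; names are illustrative.
#include <mutex>

std::mutex burger_machine_mtx;
std::mutex fries_machine_mtx;
std::mutex soda_machine_mtx;

void sim_sleep(int /*minutes*/) {}  // real version sleeps scaled time

void prepare_order_pipelined(int burgers, int fries, int cokes) {
    burger_machine_mtx.lock();
    for (int b = 0; b < burgers; ++b) sim_sleep(5);

    fries_machine_mtx.lock();       // grab fries FIRST...
    burger_machine_mtx.unlock();    // ...so there is never a gap
    for (int f = 0; f < fries; ++f) sim_sleep(3);

    soda_machine_mtx.lock();        // same hand-off into the soda stage
    fries_machine_mtx.unlock();
    for (int c = 0; c < cokes; ++c) sim_sleep(1);
    soda_machine_mtx.unlock();
}
```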

The idea is to “hand off” between machines without a gap, so no other cook can slip in and claim the next machine between stages, in theory reducing total order time.

I walked through the traces on paper and initially convinced myself it was fine. Both cooks always acquire machines in the same order (burger -> fries -> soda), so classic circular-wait deadlock can’t happen, right? And in the simple case it does work:

t=0:  Cook 0 locks burger_machine, starts on 2 burgers
t=0:  Cook 1 tries burger_machine... blocked
t=10: Cook 0 finishes burgers, locks fries_machine, THEN releases burger_machine
t=10: Cook 1 gets burger_machine, starts its 1 burger
t=15: Cook 1 finishes its burger, tries fries_machine... blocked (Cook 0 still has it)
t=16: Cook 0 finishes fries (2 servings), locks soda_machine, THEN releases fries_machine
t=16: Cook 1 gets fries_machine
plaintext

No deadlock. But the trace reveals the real problem even in the non-deadlock case. At t=10, Cook 0 holds both the burger and fries locks simultaneously, even if only for the instant between acquiring fries and releasing burger. During that instant, Cook 1 can’t acquire the burger machine. More importantly, at t=15 Cook 1 finishes its burgers and tries to lock the fries machine while still holding the burger lock (that’s the whole point of pipelining). So Cook 1 is now blocking on fries while holding burger, and a hypothetical Cook 2 can’t even start on burgers.

This is the global lock problem in miniature. You’re creating artificial dependencies between resources that should be independent. The burger machine and the fries machine have nothing to do with each other, but by holding both locks you’ve coupled them.

The deadlock risk is real too, even if it didn’t show up in my traces. In this specific problem both cooks acquire in the same order, which prevents circular wait. But the code is fragile. If order compositions ever caused different cooks to skip machines in different sequences (say one cook goes burger->soda while another goes burger->fries), you could get acquisition order inversions. And if you or a future student modifies the code to handle a new machine type or a different preparation order, the deadlock potential is there waiting. I spent enough time debugging the shutdown race that I wasn’t eager to introduce another class of timing-dependent bugs.

The sequential approach (lock burger, use it, release it, then lock fries, use it, release it) is simpler and gives better throughput. Yes, there’s a gap between releasing the burger machine and acquiring the fries machine where another cook could grab the fries machine first. But that’s correct behavior. The fries machine is a shared resource and other cooks have a legitimate claim to it. Trying to prevent that interleaving is trying to add atomicity where it doesn’t belong.

Debugging tools

One thing I want to mention separately because it saved me hours: ThreadSanitizer. You compile with -fsanitize=thread and run the program normally. It instruments all memory accesses and lock operations and tells you exactly which locations have unsynchronized access from multiple threads, with full stack traces. It caught the restaurant_closed shutdown race immediately, something I might have chased for days with print-statement debugging. The program runs 5-10x slower under ThreadSanitizer so you wouldn’t leave it on for normal runs, but for finding data races it’s the best tool I’ve used.

g++ -std=c++14 -pthread -fsanitize=thread -g -o restaurant restaurant.cpp
./restaurant < input.txt
bash

If it finds a race, the output looks something like “WARNING: ThreadSanitizer: data race” followed by two stack traces showing the conflicting accesses. The specificity is what makes it useful. Instead of “something is wrong with my shutdown logic,” you get “this write on line 147 conflicts with this read on line 203, and neither is protected by a lock.”

What I actually learned

The biggest thing that stuck with me from this project was that the synchronization design fell into place once I stopped thinking about threads and started thinking about resources. “What are the shared things, and what does each one need?” Tables need a counter with blocking. The order queue needs mutual exclusion with sleep/wake. Machines need mutual exclusion. Food delivery needs point-to-point notification. Once I had that list, choosing the primitives was almost mechanical. The dead ends I described above all came from thinking about the problem the wrong way around, starting with “how do I protect things” instead of “what are the things.”

The other takeaway is more practical: get comfortable with the toolchain. ThreadSanitizer found bugs I wouldn’t have caught through testing. steady_clock instead of system_clock prevented a class of timing bugs. The predicate form of cv.wait() prevented spurious wakeup bugs. None of these are deep insights. They’re just the kind of defaults you build up by getting burned once.

Compile with g++ -std=c++14 -pthread -o restaurant restaurant.cpp and run with ./restaurant < input.txt.

Multithreaded Restaurant Simulation in C++14
https://theanig.dev/blog/restaurant-6431-concurrency
Author: Anirudh Ganesh
Published January 17, 2019