

#### Memory barriers in C

Sergey Vojtovich Software Engineer @ MariaDB Foundation





- Normal: overview, problem, Relaxed
- Advanced: Acquire, Release
- Nightmare: Acquire\_release, Consume
- Hell: Sequentially consistent
- Summoning Cthulhu: Atomic thread fence



**Abbreviations** 

#define RELAXED MY\_MEMORY\_ORDER\_RELAXED #define CONSUME MY\_MEMORY\_ORDER\_CONSUME #define ACQUIRE MY\_MEMORY\_ORDER\_ACQUIRE #define RELEASE MY\_MEMORY\_ORDER\_RELEASE #define ACQ\_REL MY\_MEMORY\_ORDER\_ACQ\_REL #define SEQ\_CST MY\_MEMORY\_ORDER\_SEQ\_CST

#define load my\_atomic\_load32\_explicit
#define store my\_atomic\_store32\_explicit
#define fas my\_atomic\_fas32\_explicit
#define add my\_atomic\_add32\_explicit
#define cas my\_atomic\_cas32\_strong\_explicit

#define fence std::atomic\_thread\_fence

```
/* Global variables */
uint32_t a= 0, b= 0, c= 0, d= 0, result= 0, ready= 0, stage= 0;
char *str= NULL;
```

```
/* Thread variables */
uint32_t v1, v2, o;
```



The problem

#### Code

a= 1; v1= b; c= 2; v2= d;



The problem

# Code Compiler a= 1; v2= d; v1= b; v1= b; c= 2; a= 1; v2= d; c= 2;



The problem -d

| Code   | Compiler          | CPU               |
|--------|-------------------|-------------------|
| a= 1;  | <pre>v2= d;</pre> | <pre>v2= d;</pre> |
| v1= b; | v1= b;            | c= 2;             |
| c= 2;  | a= 1;             | a= 1;             |
| v2= d; | c= 2;             | v1= b;            |













| Thread 1    | Thread 2                                             |
|-------------|------------------------------------------------------|
|             |                                                      |
| ready= 1;   |                                                      |
|             | <pre>while (ready != 1); assert(result == 42);</pre> |
| result= 42; |                                                      |
|             |                                                      |











#### Thread 1

#### Thread 2

| result= 42;<br>ready= 1; | <pre>assert(result == 42);</pre> |
|--------------------------|----------------------------------|
|                          | while (ready != 1);              |









| Thread 1    | Thread 2                         |
|-------------|----------------------------------|
|             | <pre>assert(result == 42);</pre> |
| ready= 1;   |                                  |
| result= 42; | while (ready != 1);              |
|             |                                  |
|             |                                  |
|             |                                  |



Rationale

### Memory barriers (jointly with atomic operations) are intended to make data changes visible in concurrent threads.





#### Memory barrier can be issued along with atomic op

my\_atomic\_store32\_explicit(&a, 0, MY\_MEMORY\_ORDER\_RELAXED);

#### or on its own (not available in MariaDB API)

std::atomic\_thread\_fence(std::memory\_order\_relaxed);

Note: thread fence is not supposed to be used alone, it must be accompanied by appropriate atomic operation.





- relaxed
- consume
- acquire
- release
- acquire\_release
- sequentially consistent (default)



### Default memory order

```
#define my_atomic_load32(a)
my_atomic_load32_explicit(a, MY_MEMORY_ORDER_SEQ_CST)
#define my_atomic_store32(a, b)
my_atomic_fas32(a, b)
my_atomic_fas32(a, b)
my_atomic_fas32_explicit(a, b, MY_MEMORY_ORDER_SEQ_CST)
#define my_atomic_add32(a, b)
my_atomic_add32_explicit(a, b, MY_MEMORY_ORDER_SEQ_CST)
#define my_atomic_cas32(a, b, c)
my_atomic_cas32_strong_explicit(a, b, c, MY_MEMORY_ORDER_SEQ_CST,
MY_MEMORY_ORDER_SEQ_CST)
```





### 1. sequentially consistent

#### 2. acquire\_release

3.1 acquire 3. release 3.2 consume

4. relaxed





c= 1; v2= d;

Atomic operation with **Relaxed** memory barrier guarantees atomicity, but doesn't impose any synchronization or ordering constraints on other loads or stores.



#### Valid with any atomic operation

```
b= load(&a, RELAXED);
store(&a, 1, RELAXED);
b= fas(&a, 1, RELAXED);
b= add(&a, 1, RELAXED);
b= cas(&a, &o, 1, RELAXED, RELAXED);
```

fence(RELAXED); // no-op





Example

#### Example

while (load(&a, RELAXED) != 1);
fence(ACQUIRE);

#### Example

cas(&a, &o, 1, ACQUIRE, RELAXED);





#### **Release barrier**



Loads and stores before **Release** can not be reordered

#### after Release.

Loads and stores after **Release** can be reordered before **Release**.

MariaDB





### Not same as write barrier!





## Meaningless alone!

assert(result == 42);





#### Thread 1



result= 42; store(&ready, 1, RELEASE);

assert(result == 42);

while (ready != 1);

### Meaningless alone!





#### Valid with atomic store or atomic read-modify-write

```
store(&a, 1, RELEASE);
b= fas(&a, 1, RELEASE);
b= add(&a, 1, RELEASE);
b= cas(&a, &o, 1, RELEASE, RELEASE);
```

fence(RELEASE); // must be followed by RELAXED atomic store or RMW

#### Not valid with atomic load

b= load(&a, RELEASE); // undefined, may become RELAXED





#### Acquire barrier



Loads and stores after Acquire can not be reordered

#### before Acquire.

Loads and stores before Acquire can be reordered

after Acquire.

MariaDB





### Not same as read barrier!





### Meaningless alone!







### Meaningless alone!





#### Valid with atomic load or atomic read-modify-write

b= load(&a, ACQUIRE); b= fas(&a, 1, ACQUIRE); b= add(&a, 1, ACQUIRE); b= cas(&a, &o, 1, ACQUIRE, ACQUIRE);

fence(ACQUIRE); // must be preceded by RELAXED atomic load or RMW

#### Not valid with atomic store

store(&a, 1, ACQUIRE); // undefined, may become RELAXED





Acquire must be always paired with Release (or

stronger). Only then all stores before Release in Thread

1 become visible after **Acquire** in Thread 2.





#### Acquire\_release barrier



Loads and stores after Acquire\_release can not be

reordered before **Acquire\_release**.

Loads and stores before Acquire\_release can not be

reordered after Acquire\_release.

Maria

### Acquire\_release memory order

#### Valid with atomic read-modify-write

```
b= fas(&a, 1, ACQ_REL);
b= add(&a, 1, ACQ_REL);
b= cas(&a, &o, 1, ACQ_REL, ACQ_REL);
```

#### Not valid with atomic load and store

b= load(&a, ACQ\_REL); // undefined, may become ACQUIRE
store(&a, 1, ACQ\_REL); // undefined, may become RELEASE



### Acquire\_release memory order

#### Thread 1 Thread 2 a= 1; stage= 1; while (stage != 2); assert(b == 1); Thread 2 b= 1; while (stage != 1); assert(a == 1);



### Acquire\_release memory order

#### Thread 1

#### Thread 2






#### Thread 1

#### Thread 2







#### Consume barrier



Consume is a weaker form of Acquire: loads and

stores, dependent on the value currently loaded, that

happen after Consume can not be reordered before

#### Consume.

MariaDB



#### Valid with atomic load or atomic read-modify-write

b= load(&a, CONSUME); b= fas(&a, 1, CONSUME); b= add(&a, 1, CONSUME); b= cas(&a, &o, 1, CONSUME, CONSUME);

fence(CONSUME); // must be preceded by RELAXED atomic load or RMW

#### Not valid with atomic store

store(&a, 1, CONSUME); // undefined, may become RELAXED





Consume must be always paired with Release (or

stronger). Only then all dependent stores before

Release in Thread 1 become visible after Consume in

Thread 2.

MariaDB



The specification of release-consume ordering is being

revised, and the use of memory\_order\_consume is temporarily discouraged.

Note that as of February 2015 no known production

compilers track dependency chains: consume

operations are lifted to acquire operations.





#### Sequentially consistent



Loads and stores after Sequentially\_consistent can

not be reordered before Sequentially\_consistent.

Loads and stores before **Sequentially\_consistent** can

not be reordered after **Sequentially\_consistent**.



# Sequentially consistent memory order

#### Valid with any atomic operation...

```
b= fas(&a, 1, SEQ_CST);
b= add(&a, 1, SEQ_CST);
b= cas(&a, &o, 1, SEQ_CST, SEQ_CST);
fence(SEQ_CST);
```

### ...but there are traps

b= load(&a, SEQ\_CST); // may become ACQUIRE + sync store(&a, 1, SEQ\_CST); // may become RELEASE + sync













MariaDB®





































• It is possible to issue memory barrier without an

associated atomic operation

- it is very advanced technology
- frequently misunderstood
- generally slower than memory barriers associated

with an atomic operation





• non-atomic and Relaxed operations cannot be

re-ordered after **Release** (first store)

- non-atomic and Relaxed operations cannot be re-ordered before Acquire (last load)
- still requires atomic operations to work as defined
- not implemented in MariaDB API.





```
#define fence __atomic_thread_fence
#define RELEASE __ATOMIC_RELEASE
#define ACQUIRE __ATOMIC_ACQUIRE
```

```
uint32_t a= 0, b= 0;
```

#### Thread 1

```
Thread 2
```

```
a= 1;
fence(RELEASE);
b= 1;
```

```
la= a;
fence(ACQUIRE);
lb= b;
```

#### MariaDB®



```
#define fence __atomic_thread_fence
#define RELEASE __ATOMIC_RELEASE
#define ACQUIRE __ATOMIC_ACQUIRE
uint32_t a= 0, b= 0;
```

#### Thread 1

#### Thread 2

```
a= 1;
b= 1;
```

```
la= a;
fence(ACQUIRE);
lb= b;
```

#### MariaDB®



```
#define fence __atomic_thread_fence
#define RELEASE __ATOMIC_RELEASE
#define ACQUIRE __ATOMIC_ACQUIRE
uint32_t a= 0, b= 0;
```

#### Thread 1

```
a= 1;
fence(RELEASE);
b= 1;
```

### Thread 2













Possible synchronizations:

- Fence-Atomic
- Atomic-Fence
- Fence-Fence





A release fence F in thread A synchronizes-with atomic

acquire operation Y in thread B, if...

| Thread A             | Thread B                |
|----------------------|-------------------------|
| fence(RELEASE); // F | load(&a, ACQUIRE); // Y |





- there exists an atomic store X (any memory order)
- Y reads the value written by X
- F is sequenced-before X in thread A

| Thread A                                            | Thread B                |
|-----------------------------------------------------|-------------------------|
| fence(RELEASE); // F<br>store(&a, 1, RELAXED); // X | load(&a, ACQUIRE); // Y |



# Fence-Atomic synchronization

In this case, all non-atomic and relaxed atomic stores

that happen-before X in thread A will be

synchronized-with all non-atomic and relaxed atomic

loads from the same locations made in thread B after F.

| Thread A                                                                      | Thread B                                                                       |
|-------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| <pre>b= 1;<br/>fence(RELEASE); // F<br/>store(&amp;a, 1, RELAXED); // X</pre> |                                                                                |
|                                                                               | <pre>if (load(&amp;a, ACQUIRE) == 1) // Y assert(b == 1); // never fires</pre> |

MariaDB



An atomic release operation X in thread A

synchronizes-with an acquire fence F in thread B, if ...

| Thread A                    | Thread B             |
|-----------------------------|----------------------|
| store(&a, 1, RELEASE); // X | fence(ACQUIRE); // F |





- there exists an atomic read Y (any memory order)
- Y reads the value written by X
- Y is sequenced-before F in thread B

| Thread A                    | Thread B                                        |
|-----------------------------|-------------------------------------------------|
| store(&a, 1, RELEASE); // X | load(&a, RELAXED); // Y<br>fence(ACQUIRE); // F |



# Atomic-Fence synchronization

In this case, all non-atomic and relaxed atomic stores

that happen-before X in thread A will be

synchronized-with all non-atomic and relaxed atomic

loads from the same locations made in thread B after F.

| Thread A                             | Thread B                                                                                                    |
|--------------------------------------|-------------------------------------------------------------------------------------------------------------|
| b= 1;<br>store(&a, 1, RELEASE); // X |                                                                                                             |
|                                      | <pre>if (load(&amp;a, RELAXED) == 1) { // Y   fence(ACQUIRE); // F   assert(b == 1); // never fires }</pre> |

MariaDB



A release fence FA in thread A synchronizes-with an

acquire fence FB in thread B, if ...

| Thread A              | Thread B              |
|-----------------------|-----------------------|
| fence(RELEASE); // FA | fence(ACQUIRE); // FB |





- there exists an atomic store X (any memory order)
- FA is sequenced-before X in thread A

| Thread A                                             | Thread B              |
|------------------------------------------------------|-----------------------|
| fence(RELEASE); // FA<br>store(&a, 1, RELAXED); // X | fence(ACQUIRE); // FB |





- there exists an atomic read Y (any memory order)
- Y reads the value written by X
- Y is sequenced-before FB in thread B

| Thread A                    | Thread B                |
|-----------------------------|-------------------------|
| fence(RELEASE); // FA       | load(&a, RELAXED); // Y |
| store(&a, 1, RELAXED); // X | fence(ACQUIRE); // FB   |



### Fence-Fence synchronization

In this case, all non-atomic and relaxed atomic stores that happen-before FA in thread A will be synchronized-with all non-atomic and relaxed atomic loads from the same locations made in thread B after FB.

| Thread A                                                                       | Thread B                                                                                                     |
|--------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|
| <pre>b= 1;<br/>fence(RELEASE); // FA<br/>store(&amp;a, 1, RELAXED); // X</pre> |                                                                                                              |
|                                                                                | <pre>if (load(&amp;a, RELAXED) == 1) { // Y   fence(ACQUIRE); // FB   assert(b == 1); // never fires }</pre> |

MariaDB



### Example

```
char *data[10];
void producer()
{
  for (int i= 0; i < 10; i++)
    data[i]= strdup("some long string");
}
void consumer()
{
  for (int i= 0; i < 10; i++)
    puts(data[i]);
}
```



# Fence-Fence synchronization

### Example

```
char *data[10];
uint32_t ready[10]= { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
void producer()
{
  for (int i= 0; i < 10; i++)
    data[i]= strdup("some long string");
    my_atomic_store32_explicit(&ready[i], 1, MY_MEMORY_ORDER_RELEASE);
 }
}
void consumer()
{
  for (int i= 0; i < 10; i++)
  {
    if (my_atomic_load32_explicit(&ready[i], MY_MEMORY_ORDER_ACQUIRE) == 1)
      puts(data[i]);
  }
}
```



# Fence-Fence synchronization

### Example

```
char *data[10];
uint32_t ready[10]= { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
void producer()
{
  for (int i = 0; i < 10; i++)
    data[i]= strdup("some long string");
  fence(MY MEMORY ORDER RELEASE);
  for (int i = 0; i < 10; i++)
    my_atomic_store32_explicit(&ready[i], 1, MY_MEMORY_ORDER_RELAXED);
}
void consumer()
{
  for (int i = 0; i < 10; i++)
  {
    if (my atomic load32 explicit(&ready[i], MY MEMORY ORDER ACQUIRE) == 1)
      puts(data[i]);
}
```
# Fence-Fence synchronization

### Example

```
char *data[10];
uint32_t ready[10]= { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
void producer()
{
  for (int i= 0; i < 10; i++)
    data[i]= strdup("some long string");
    my_atomic_store32_explicit(&ready[i], 1, MY_MEMORY_ORDER_RELEASE);
 }
}
void consumer()
{
  uint32_t tmp[10];
  for (int i = 0; i < 10; i++)
    tmp[i]= my_atomic_load32_explicit(&ready[i], MY_MEMORY_ORDER_RELAXED);
  fence(MY_MEMORY_ORDER_ACQUIRE);
  for (int i= 0; i < 10; i++)
    if (tmp[i] == 1)
      puts(data[i]);
}
```

#### MariaDB®

# Fence-Fence synchronization

### Example

```
char *data[10];
uint32_t ready[10]= { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
void producer()
{
  for (int i = 0; i < 10; i++)
    data[i]= strdup("some long string");
  fence(MY MEMORY ORDER RELEASE);
  for (int i = 0; i < 10; i++)
    my_atomic_store32_explicit(&ready[i], 1, MY_MEMORY_ORDER_RELAXED);
}
void consumer()
{
  uint32_t tmp[10];
  for (int i = 0; i < 10; i++)
    tmp[i]= my_atomic_load32_explicit(&ready[i], MY_MEMORY_ORDER_RELAXED);
  fence(MY_MEMORY_ORDER_ACQUIRE);
  for (int i = 0; i < 10; i++)
    if (tmp[i] == 1)
      puts(data[i]);
}
```

#### MariaDB®

References

http://en.cppreference.com/w/cpp/atomic/memory\_order

https://en.wikipedia.org/wiki/Memory\_ordering

http://preshing.com/20140709/the-purpose-of-memory\_order\_consume-in-cpp11/



© 2017 MariaDB Foundation