3rd edition: chap. 5 Concurrency and Race Conditions

old world: interrupts were mostly the problem.
new world: SMP means TH vs TH, BH vs BH, and TH vs BH (TH = top half, BH = bottom half).
and kernel preemption ... (which is TH vs TH)

pitfalls in scull:
-----------------
see code, p. 107. you can't assign a ptr without serialized access.
if A assigns first, B will overwrite it, hence we have a memory leak.
this is a classic race condition.

concurrency and its management
------------------------------
race conditions occur because of shared access to "resources"
(usually memory, but hw resources are another possibility: if you have one
buffer in a network card, you can't have two "threads" sucking a packet out
at the same time).
if we avoid global memory, we might be able to minimize it,
BUT WE HAVE A SMALL STACK.
we need serialization mechanisms (mutex/locking ... whatever you call it).

semaphores and mutexes
----------------------
add locking to scull ... with semaphores.
TH can "sleep" (or block) and give up the CPU.
BH never sleeps. (it can however delay execution, but never mind.)
semaphores CAN block.

standard semaphore "protocol" for a TH:
    init to 1
    P (may sleep) to get in
    serialized access
    V to get out

The Linux Semaphore Implementation
----------------------------------
p. 110. functions: sema_init(), down*(), up() are the real functions.
macros packaging those functions:
    static declaration:
        DECLARE_MUTEX(name)         <--- init to 1
        DECLARE_MUTEX_LOCKED(name)  <--- init to 0
    dynamic declaration:
        init_MUTEX()
        init_MUTEX_LOCKED()
note down() vs down_interruptible(). you probably want down_interruptible()
(the user can Ctrl-C out of it).

Using Semaphores in scull
-------------------------
note that scull_dev has a semaphore in it, thus one semaphore per minor
device. we could have used 1 global sem., but there is no reason to make
process A block while process B is using the *other* device.
see init code: p. 112

Reader/writer semaphores
------------------------
semaphores come with various usage patterns: init to 1, init to 0 and have
the bottom half wake you up, etc. reader/writer is one such pattern.
we can allow multiple readers access without restriction as long as they are
not CHANGING the data. RO tasks can run in parallel.
the point here: readers do not lock out readers. readers lock out writers.
writers lock out readers. if writers and readers contend, writers get in
first. if there are many writers you may not want to use this, as it can
lead to reader starvation.
linux has a special sem called "rwsem", or reader/writer semaphore, for
this. see functions p. 113-114.

Completions
-----------
common programming pattern:
    1. initiate some activity outside of the current thread
    2. wait for completion (block) of that activity
e.g., create a new thread or user-space process, or start some
hardware-based action.
you could use a semaphore as on p. 114, BUT the problem with that approach
is that often the event you are waiting for arrives faster than you expect,
and other problems may exist. completions are a lightweight (compared to
sems) mechanism:
    top-half calls: wait_for_completion()
    bottom-half calls: complete() or complete_all(), see p. 115
see code example p. 116; note the read/write top-half interaction.

spinlocks:
-----------------------------
semaphores block. you cannot block in BH. semaphores are slow because they
may block. you may want finer time granularity.
a spinlock is similar to an atomic variable with test-and-set semantics
(how it is implemented is arch. dependent; it typically needs a machine
instruction or two). conceptually:
    lock:   while (x) ;  x = 1;
            do something to the data
    unlock: x = 0
cons: note the spin in the lock. what happens if we naively execute that on
a single CPU?
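to make the busy-wait pseudocode concrete, here is a minimal user-space C11
sketch of the same test-and-set idea. it is NOT the kernel's spin_lock()
(that is arch-specific and adds preemption/interrupt handling); the names
toy_lock and toy_spin_* are invented for this illustration.

#include <stdatomic.h>

/* one-word lock: clear = free, set = held */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;

static void toy_spin_lock(void)
{
    /* atomically test-and-set: keep spinning while the old value was
     * already set, i.e. while someone else holds the lock */
    while (atomic_flag_test_and_set_explicit(&toy_lock,
                                             memory_order_acquire))
        ;  /* burn CPU until the holder clears the flag */
}

static void toy_spin_unlock(void)
{
    atomic_flag_clear_explicit(&toy_lock, memory_order_release);
}

answering the question above: on a single CPU, if the spinning thread cannot
be preempted, the lock holder never runs again and the flag is never
cleared, so the wait lasts forever. that is why uniprocessor kernel builds
reduce spinlocks to little more than preemption control, and why code
holding a spinlock must never sleep (more below).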
introduction to spinlock API
-----------------------------
see p. 117
init: static or dynamic
spin_lock()
spin_unlock()

spinlocks and atomic context
-----------------------------
example:
    lock
    copy_from_user
    unlock
what could go wrong here?
    1. you block in copy_from_user
    2. a higher-priority process kicks you out after you get the lock
    3. another scenario: you take out a lock, your interrupt handler
       interrupts and takes out the same lock ... the TH cannot run to
       free the lock
two possible outcomes:
    1. delay (likely)
    2. deadlock (possible)
rule: code holding a spinlock should be atomic, should not sleep, and should
be minimal ...
background: if you hold a spinlock, your CPU will not be preempted.
it is hard to avoid sleeping ... you must have intimate knowledge of the
functions the TH calls.
you must disable interrupts on the local CPU if a BH contends for that lock.
keep spinlock hold times short.

the spinlock functions
-----------------------------
spin_lock
spin_lock_irqsave - disables interrupts on the local CPU only, saving the
    previous interrupt state
spin_lock_irq - same, but doesn't save the state
spin_lock_bh - disables software interrupts before taking the lock, but
    leaves hw interrupts alone
if tasklet code can take the spinlock, use spin_lock_bh.
if hw interrupt code can take the spinlock, use spin_lock_irqsave as the
1st choice.

reader/writer spinlocks
-----------------------------
similar to r/w semaphores. situation: > 1 reader can get into the crit.
section, but only 1 writer can get in. a busy writer can starve readers.
p. 120

locking traps
-----------------------------
lots of experience about what can go wrong ...

ambiguous rules
-----------------------------
be clear and explicit (and comment...) about how spinlocks are to be used.
e.g., functionA takes lock Z, then functionB is called and wants lock Z
too ...
scull design: functions invoked directly from a system call acquire the sem
that locks the device structure; all internal functions can then assume they
have access to the device structure.

lock ordering rules
-----------------------------
in general, the fewer locks the better. danger comes with > 1 lock.
take the locks in the same order:
    take lock 1
    take lock 2
avoid this deadlock:
    thread 1 takes lock 1
    thread 2 takes lock 2
    neither can proceed
if you have 2 locks, where lock 1 is your data structure lock and lock 2 is
a global lock, take the local lock first.
don't do a down on a semaphore when you hold a spinlock.

fine versus coarse-grained locking
-----------------------------
the kernel started out with the big kernel lock. don't use it.
locking keeps getting finer and finer granularity, which means the kernel
becomes more and more complex.
lockmeter: http://oss.sgi.com/projects/lockmeter
measures how much time is spent waiting for locks.

alternatives to locking
-----------------------------
algorithms can exist that provide a different way to access data and avoid
spinlocks.

lock-free algorithms
-----------------------------
e.g., circular buffer: the writer writes at one end, the reader reads at the
other end. the trick here is that the threads modify only certain variables
and do not share those variables.
could we have a circular buffer with minimized locking, 1 writer, and many
readers? how would you do that?
note: the kernel provides a generic circular buffer implementation.

atomic variables
-----------------------------
n_op++ may not be atomic. (it is on some CPUs.)
lock / access / unlock is not efficient. we need an arch-independent way to
do this: see the atomic_t primitives, p. 125.
p. 126 top is important: if you have operations requiring multiple atomic_t
variables, you still need locking.

bit operations
-----------------------------
p. 126
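a hedged illustration of the atomic_t and bit-operation primitives above
(pp. 125-126): the names sketch_open, scull_open_count, and SCULL_BUSY_BIT
are invented, and the header locations have moved between kernel versions
(<asm/atomic.h>/<asm/bitops.h> in LDD3-era kernels, <linux/atomic.h>/
<linux/bitops.h> later); the primitives themselves are the real kernel API.

#include <asm/atomic.h>   /* <linux/atomic.h> on newer kernels */
#include <asm/bitops.h>   /* <linux/bitops.h> on newer kernels */
#include <linux/errno.h>

static atomic_t scull_open_count = ATOMIC_INIT(0); /* arch-independent counter */
static unsigned long scull_flags;                  /* word holding atomic bits */
#define SCULL_BUSY_BIT 0

static int sketch_open(void)
{
    /* the safe version of n_op++: no lock needed for a lone counter */
    atomic_inc(&scull_open_count);

    /* test_and_set_bit() atomically sets the bit and returns its old
     * value: nonzero means someone else already marked us busy */
    if (test_and_set_bit(SCULL_BUSY_BIT, &scull_flags)) {
        atomic_dec(&scull_open_count);
        return -EBUSY;
    }
    return 0;
}

static void sketch_release(void)
{
    clear_bit(SCULL_BUSY_BIT, &scull_flags);
    atomic_dec(&scull_open_count);
}

note that the counter and the flag bit are each updated atomically, but the
pair is not updated as one atomic step; per the warning at the top of
p. 126, any invariant spanning both would still need a real lock.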
seqlocks
-----------------------------
1. another mechanism for fast, lockless access to a small, simple,
   frequently accessed shared resource
2. write access should be rare. readers get free access but must check for
   collisions with writers, and if a collision occurred, retry.
cannot be used with pointers ...
see read code example p. 128

read/copy/update (RCU)
-----------------------------
another way to avoid spinlocks. it places constraints on the types of data
structure it can protect.
optimal use:
    1. reads are common / writes are rare
    2. a pointer should be how the resource is accessed
    3. the resource must only be held by atomic code
when the data structure needs to be changed, the writing thread makes a
copy, changes the copy, then aims the relevant pointer at the new version.
when the kernel is sure no references to the old version remain, it can be
freed.
how to do that: we know that code holding the data structure is atomic, so
once every CPU on the system has been scheduled at least once, all
references are gone. RCU schedules a callback function, and the callback
frees the old storage.
read code on p. 129.
write code on p. 130; note the callback function passed in.
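a hedged sketch of the copy / update / free-later sequence just described.
the struct, the global pointer, and the function names (my_conf, cur_conf,
conf_set_threshold, ...) are invented for illustration; rcu_read_lock(),
rcu_dereference(), rcu_assign_pointer(), and call_rcu() are the real
primitives, though call_rcu()'s exact signature has varied across kernel
versions (this sketch uses the current two-argument form).

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/errno.h>

struct my_conf {
    int threshold;
    int enabled;
    struct rcu_head rcu;      /* lets call_rcu() defer the kfree */
};

static struct my_conf *cur_conf;           /* read only via RCU */
static DEFINE_SPINLOCK(conf_update_lock);  /* serializes writers only */

/* reader: cheap, never blocks, may see either the old or the new copy */
static int conf_get_threshold(void)
{
    struct my_conf *c;
    int val = 0;

    rcu_read_lock();
    c = rcu_dereference(cur_conf);
    if (c)
        val = c->threshold;
    rcu_read_unlock();
    return val;
}

/* callback: runs once no CPU can still reference the old copy */
static void conf_free_rcu(struct rcu_head *head)
{
    kfree(container_of(head, struct my_conf, rcu));
}

/* writer: copy, change the copy, swing the pointer, free the old later */
static int conf_set_threshold(int threshold)
{
    struct my_conf *newc, *oldc;

    newc = kmalloc(sizeof(*newc), GFP_KERNEL);
    if (!newc)
        return -ENOMEM;

    spin_lock(&conf_update_lock);
    oldc = cur_conf;              /* safe: writers serialized by the lock */
    if (oldc)
        *newc = *oldc;            /* start from a copy of the old version */
    else
        memset(newc, 0, sizeof(*newc));
    newc->threshold = threshold;
    rcu_assign_pointer(cur_conf, newc);   /* publish the new version */
    spin_unlock(&conf_update_lock);

    if (oldc)
        call_rcu(&oldc->rcu, conf_free_rcu);  /* freed after grace period */
    return 0;
}

the writer still takes an ordinary spinlock, but only against other writers;
readers never touch it, which is the whole point of the pattern.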