CS 185 C Class Notes

 

Chapter 1    Introduction

 

Motivation

This class has several purposes.

 

The class emphasizes the details of using the C language in embedded programs.  The optional text, Programming Embedded Systems, gives a broad and complete overview of embedded programming in C.

 

Class Prerequisites

This class assumes you know the basic architecture of computers, CPU, stored program in memory, integer variables, etc.; and you know how to program, either in C or C++, or have a basic understanding of C and the ability to program in another procedural language like Java or Pascal.

 

Texts for Course: These Notes and K&R

The lectures will not follow a textbook.  These notes are provided to students for the purpose of reviewing what is taught in the lectures.

 

These notes review only selected special topics regarding the C language.  Depending on the need of the students, more elements of the C language will be covered in class.  Please use the required text, C Programming Language (2nd Edition), by Kernighan & Ritchie, as a reference.

 

No matter how well you know C, it will be useful in this class for you to know it better.  If you know C well, please read K&R at the beginning of the course regardless.  If not, read it at the beginning and go back over it until you understand it.  Questions about C, no matter how basic, are welcome in class.

 

You should carefully read chapters 1 through 6 and appendix A (except A13).  Appendix B is less important.  Chapters 7 and 8 are not necessary for this class.

 

Pay particular attention to the following advanced topics:  Declarations (Appendix A4, A6 and A8), Pointers (Chapter 5), and Structures (Chapter 6).

 

Optional Text

Programming Embedded Systems, Second Edition, by Barr and Massa, covers the subject of the class exactly.  This class goes through specific examples and details which are not in the book, so it cannot be used as a text for this class.

 

The book is an excellent text for the subject of embedded programming in C.  Anyone who intends to continue learning embedded programming should read it.  At the general level, it covers the subject much better than the class notes.  It also covers embedded operating systems which we will not do in class.

 

Chapter 4 explains in detail the GNU tool-chain, which we will use in class.

 

Format of These Notes

Each chapter of the class notes has sections for the discussion of hardware, programming and examples.  The hardware sections discuss the operation of the processor, its built-in peripheral devices and the external hardware devices included on the Development Board you will use in this class.  The programming sections discuss the C language, details of how the machine code produced from C language code controls the processor, how to use the C language to control specific features of the processor, algorithms specific to low level programming, and tips, tricks and techniques for short, effective and maintainable code.

 

 

Hardware section:

 

Embedded system

An embedded system is a computer system with a fixed purpose, unlike a personal computer.  Examples range from a lawn sprinkler controller or a microwave oven with only timer and on-off switching functions to network routers which typically run a full Linux operating system.

 

This limitation of purpose allows the included software to be fixed, never changing except for possible updates controlled by the manufacturer.  Most embedded computer devices do not include disk drives.  The software is stored in read-only memory or in flash memory, which can be modified, but only slowly and a limited number of times.

 

The fixed nature of embedded software has led to the term "firmware" - more difficult to change than software but easier to change than hardware.

 

Writing firmware for full sized embedded operating systems (like Linux or Unix-like commercial operating systems) resembles writing software for personal computers more than it resembles writing firmware for small embedded systems.  Large embedded systems combine a few elements of small embedded systems, like the lack of keyboard, mouse and display, and the need to interact directly with the processor or custom hardware, with mainstream software issues like network protocols, high level operating system services and programming in Java and scripting languages.

 

Small embedded system firmware

Small embedded systems often have no operating system.  In this case, the only operations the processor executes are those specified by the C code you write (except for a small block of assembly language code which runs before the C code to initialize global variables and set up the stack pointer).

 

We will study a small embedded system, learning the issues specific to embedded systems without the distracting complexity of larger systems.  For the largest embedded systems, this knowledge needs to be combined with knowledge of high level software issues and techniques, and details of the services of full sized operating systems.

 

Small embedded system hardware

Small embedded systems can be built using nothing more than a single microcontroller integrated circuit.  A practical product would include a robust power supply, hardware for user interface like buttons, an LCD display and maybe sound, and probably electronics to convert the low power digital inputs and outputs of the microcontroller to whatever is being controlled.

 

Microcontroller

The microcontroller we will use contains flash memory that holds the firmware, RAM memory used to hold operating variables, the required clock and reset generation functionality as well as a complete processor.  Once programmed this chip can be connected to DC power (3 to 5 volts) and will begin running the programmed firmware.

Connecting to one or more of its I/O pins allows it to do real work.

 

Development Board for this class

We will use a development board which connects to a USB port on a Windows PC.  This connection provides power and allows the PC to download programs to the microcontroller as well as emulate a terminal connected to its serial port.  The board has two buttons and four LEDs connected to the microcontroller I/O pins and two peripheral chips connected to the microcontroller's SPI bus.  It has a loudspeaker to output sound from either a digital output of the microcontroller or the DAC (digital to analog converter) peripheral.

 

 

Processor Architecture Section:

 

Here, in the introduction, we will review computer processor architecture adding comments related to small microcontrollers.  Later, we will learn the specifics of the core architecture of the particular microcontroller that we will be using as well as that of its peripheral parts.

 

Basics of a processor

Most readers will already know these general facts about processors:

·        The processor fetches and then executes instructions from memory.

·        The order of execution is sequential except when specifically modified by an executing instruction.

·        Instructions can read data from and write data into specific memory locations.

·        Instructions can also read and write data in a small number of registers.

·        Instructions can do operations (arithmetic and bit manipulation) on data.
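The facts listed above can be illustrated with a toy processor simulated in C.  This is only a sketch: the instruction codes, the two-byte instruction format and the function toy_run are invented for illustration and do not correspond to any real processor.

```c
#include <stdint.h>

/* Hypothetical instruction codes for a toy processor (not a real instruction set). */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3, OP_JUMP = 4 };

/* Runs a program made of two-byte instructions (opcode, operand).
   data is the data memory.  Returns the number of instructions executed. */
unsigned toy_run(const uint8_t *program, uint8_t *data, unsigned max_steps)
{
    uint8_t pc = 0;        /* program counter: number of the next instruction */
    uint8_t acc = 0;       /* the only register of this toy processor */
    unsigned steps = 0;

    while (steps < max_steps) {
        uint8_t opcode  = program[2 * pc];        /* fetch */
        uint8_t operand = program[2 * pc + 1];
        ++pc;                                     /* default: sequential execution */
        ++steps;
        switch (opcode) {                         /* execute */
        case OP_LOAD:  acc = data[operand];                  break;
        case OP_ADD:   acc = (uint8_t)(acc + data[operand]); break;
        case OP_STORE: data[operand] = acc;                  break;
        case OP_JUMP:  pc = operand;                         break;
        default:       return steps;              /* OP_HALT or unknown code: stop */
        }
    }
    return steps;
}
```

A program consisting of OP_LOAD 0, OP_ADD 1, OP_STORE 2, OP_HALT adds the bytes at data locations 0 and 1 and stores the sum at location 2.  Only OP_JUMP changes the otherwise sequential order of execution.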

 

Microcontrollers in particular

Microcontrollers differ from ordinary processors in that the memory the processor uses is on the same chip.  In most cases this is very much less memory than a normal PC has.

 

Separate memory areas for program and data

The type of memory for instructions is physically different than that used for data because the instruction memory must be preserved when power is removed and later restored while data memory must be writable as easily as it is read.  In this case, it is usual to number the memory locations in each type of memory separately.  In other words, the memory at any address, 0x0004 for example, in program memory is not related to the memory at address 0x0004 in data memory.  It is said that the program and data address spaces are separate.  (This is called Harvard architecture.  The single address space of larger processors where the contents at an address can be treated as an instruction or as data is called Von Neumann architecture.)
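As a rough model of separate address spaces, two unrelated arrays can stand in for the two memories.  The array sizes, the stored values and the function names here are arbitrary choices for illustration, not properties of any real microcontroller.

```c
#include <stdint.h>

/* Two separate arrays stand in for the two address spaces.
   Sizes and contents are arbitrary illustration values. */
static const uint16_t program_memory[8] = {
    0x1000, 0x1001, 0x1002, 0x1003, 0x1004, 0x1005, 0x1006, 0x1007
};
static uint8_t data_memory[8];

/* The same address number selects unrelated locations,
   depending on which address space is being accessed. */
uint16_t read_program(uint16_t address)          { return program_memory[address]; }
uint8_t  read_data(uint16_t address)             { return data_memory[address]; }
void     write_data(uint16_t address, uint8_t v) { data_memory[address] = v; }
```

Writing to address 0x0004 with write_data changes what read_data returns for 0x0004, while read_program for 0x0004 still returns the value stored in the other array.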

 

Microcontrollers with separate address spaces have a special instruction to read data from program memory.  Although it is not useful for a program to be able to read its own instructions, it is very useful to store constant data in program memory and then allow an executing program to read them as needed.

 

Nonvolatile memory

Where flash memory is used as program memory, there is usually a way for an executing program to write to it.  Almost always, this is done only to install a new firmware version rather than as a part of normal operation because it delays program execution longer than is usually acceptable and because the number of possible rewrites of flash memory is limited.

 

Many microcontrollers have EEPROM (writable non-volatile) storage that is separate from both data and program address spaces.  It is not treated as data memory because its timing is different, and writing is so slow that software must continue to execute after a write operation is initiated and return later when it is complete.

 

Lack of features for speed and security

Microcontrollers are designed to favor simplicity over speed (with rare exceptions).  They do not use cache and so operate at relatively low speed: typically ten to forty million instructions per second, rarely up to one hundred million.  Instructions that use data memory are often slower.  Memory mapping (MMU) and hardware data transfer (DMA) are absent.  Hardware for operations that can be done, more slowly, in software, such as division and floating point arithmetic, and often even multiplication, is absent.  Protection features like privileged mode, memory protection and error exceptions are absent.

 

Extra features for use in electronic devices and circuits

Unlike large processors, microcontrollers are intended to reduce the size and complexity of a product, so they include as much peripheral hardware functionality as feasible, such as multiple counters and timers, pulse generators, various serial port types and analog converters, as well as functionality needed to support the processor such as clock and reset generation.  Many microcontrollers are designed for battery powered applications with minimized power consumption and special sleep modes.

 

Complex peripherals like Ethernet and USB have recently become available integrated in microcontrollers but at extra cost.

 

Many embedded products use more powerful processors

These comments apply to microcontrollers which contain their own memory and are complete systems in themselves.  Many embedded products contain more powerful processors with large amounts of separate memory chips and resemble full computers more than they resemble the microcontrollers we will study.

 

Upcoming

The next chapter will examine in detail the particular microcontroller we will use.

 

 

C Language Section:

 

High level features without blocking access to low level details

The C language, and its successor C++, are unique in providing abstraction and independence from low level machine details without making these details inaccessible. 

 

Most high level languages present a closed, self-sufficient programming environment in which the programmer needs to know nothing about the underlying hardware details.  People who write code that owns the whole system (embedded systems and operating systems) need more: they need to be able to control every aspect of the processor, not just move data and do computations.

 

Assembly language = machine instructions

Every processor has an assembly language in which the programmer codes the native instructions that the processor executes.  Assembly language therefore allows the programmer to control every operation the processor is designed to do.  Assembly language programming is tedious and difficult, and it differs radically between the many processor types.  Many lines of assembly code are needed for what can be done in each line of high level code, making it much more prone to errors and difficult to maintain.

 

High level features automate rote tasks

High level languages hide the differences and automate tasks such as arithmetic on variables larger than the size supported by the processor, choosing locations in memory to store variables and converting complex algebraic expressions into a linear sequence of machine instructions.
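As an illustration of the first of these tasks, here is a sketch of how a 32 bit addition can be built from 8 bit additions with a carry, roughly the kind of work a compiler does for an 8 bit processor.  The function name and the byte order (least significant byte first) are assumptions made for this example; the code a real compiler generates will differ.

```c
#include <stdint.h>

/* Adds two 32 bit numbers, each stored as four bytes with the least
   significant byte first, using only 8 bit additions plus a carry. */
void add32_bytewise(const uint8_t a[4], const uint8_t b[4], uint8_t sum[4])
{
    uint8_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint16_t partial = (uint16_t)(a[i] + b[i] + carry);  /* at most 0x1FF */
        sum[i] = (uint8_t)(partial & 0xFF);   /* low 8 bits are the result byte */
        carry  = (uint8_t)(partial >> 8);     /* bit 8 carries into the next byte */
    }
}
```

In C, all of this is written simply as a + b with variables of type long.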

 

C Language is intended for both abstraction and machine level control

C was originally designed as a high level language with which to rewrite an early version of UNIX previously written in assembly language.  It intentionally combines the advantages of high level abstraction with the ability to override these abstractions.

 

Pointers are necessary

Support for the pointer data type is avoided by some languages because its use makes programs harder to verify and more prone to coding errors.  Pointers in C parallel the manipulation of addresses at the machine instruction level.  At the C level, pointers allow operations that, although unnecessary for computation and potentially dangerous, can be done just as in assembly language, but concisely, clearly and conveniently.

 

Programs have meaning to humans as well as the machine

Well written programs can be read at two levels - as instructions to be executed precisely according to the specifications of the computer language - and as a human readable description of how the program operates.  Similarly, a single piece of C code can be read both as a high level program hiding unnecessary details and, where needed, as a tool to manipulate data at the level of bytes and words stored at specific memory locations.

 

You need to know details of the machine and how C uses them

Obviously, to be able to use C for hardware level manipulation of data, it is necessary to understand how to manipulate data at that level.  Therefore, we will need to study the architecture of the processor, especially how memory is used.

 

An important related subject is the details of operation of the machine instruction (assembly language) version of a C program which is produced by the C compiler to execute on the processor.  In particular, knowledge of its usage of the stack is needed in the embedded case where the memory size is limited.  (The entire memory of the microcontroller we will use is a fraction of the amount a Windows program reserves for the stack of a single thread.)  This knowledge is also necessary to interface sections of assembly language code with C, and it is very useful when debugging a program.

 

Summary -> Understand C thoroughly to use it for embedded programming

In summary, the use of C (or C++) is almost mandatory in small embedded systems, and understanding the C language completely and thoroughly is greatly useful.  It is necessary to understand assembly language programming, but preferable to avoid using it wherever possible.

 

Upcoming

The upcoming chapters will review the C language emphasizing techniques used for embedded programming along with general tips for organization and maintainable code.  Features of the GNU C compiler and related parts like the linker will be presented.

 

 

Program Example Section:

 

The upcoming chapters have program examples that run on the development board and demonstrate features of the microcontroller and programming that have been presented.  Each chapter has a Program Example Section which presents and discusses the examples related to that chapter.

 


 

Chapter 2    Memory

 

This chapter goes over the memory included on the microcontroller chip.

 

Although the first chapter described microcontrollers in general, starting now, we will be discussing the particular microcontroller that we are going to use, the Atmel AVR ATmega168.  Almost everything will apply to all microcontrollers in the Atmel AVR family.  All the principles we will learn apply to all microcontrollers.

 

Program and data address spaces

The most important memories are program memory and data memory.  As already mentioned, in this microcontroller, addresses that refer to memory locations are understood to be separate for program and data memory.  Address 0x0004 might refer to an instruction stored in program memory, or a separate location in data memory, depending on the context in which it is used.  This is different from the large processors which may be familiar to you, which have only one addressing space for both instructions and data.

 

 

Hardware section:

 

The ATmega168 has 1 Kbytes of data memory and 16 Kbytes of program memory.  The data memory is SRAM which can be read and written very simply.  The program memory is flash memory for which reading is simple, but writing is slow and cumbersome.  This is appropriate because the program is written only once, when the microcontroller is programmed.  The flash is nonvolatile – the program remains when power is turned off.

 

Program memory

Program memory addresses go from 0x0000 to 0x1FFF.  This range is 8 K addresses.  Each instruction takes a multiple of two bytes – most are 16 bits (2 bytes), a few are 32 bits (4 bytes).  Unlike single address space processors, where every byte has an address, program space addresses refer to 16 bit memory locations.  (Note that the GNU linker, which is designed mainly for byte addresses, uses addresses that count bytes.  Therefore all address numbers it prints out are twice the number seen and used in the program and by the processor.)
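The relation between the two kinds of program address is a factor of two.  The following helper functions are a small sketch of the conversion; their names are made up for this example.

```c
#include <stdint.h>

/* Processor view (16 bit word address) to linker view (byte address). */
uint32_t word_to_byte_address(uint16_t word_address)
{
    return (uint32_t)word_address * 2u;
}

/* Linker view (byte address) back to processor view (word address). */
uint16_t byte_to_word_address(uint32_t byte_address)
{
    return (uint16_t)(byte_address / 2u);
}
```

For example, the last word address, 0x1FFF, corresponds to byte address 0x3FFE, and its two bytes end at 0x3FFF, the last of the 16 Kbytes.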

 

Data memory

Data memory addresses go from 0x0100 to 0x04FF.  Each byte is addressed normally.

 

Addresses in the range 0x0000 to 0x00FF represent registers.  These include the 32 general purpose registers (which are normally used in instructions without referencing their addresses), a few other registers used by the processor like the stack pointer, and hardware peripheral control registers which the program writes and reads to control the on-chip peripherals.  Not all locations in the range 0x0000 to 0x00FF are used.

 

There are two other memories in the microcontroller: EEPROM and Fuse Bits.

 

EEPROM

The EEPROM is physically similar to the flash:  Both are nonvolatile.  Reading happens at the normal speed of program execution, writing takes several milliseconds – 10,000 times slower.  Both must be erased before writing.  The EEPROM is intended to be used for storing data.  In the EEPROM, individual bytes can be erased and rewritten; in the flash, large sections must be erased at one time.  The EEPROM can be rewritten 100,000 times; the flash only 10,000 times.  Most importantly EEPROM can be rewritten while the program is running; the flash can be rewritten during program execution only in a limited way.  The addresses in the EEPROM are not in either address space.  It is not known to the C Compiler.  It is accessed by reading and writing several

 

Fuse bits

The Fuse Bits consist of three bytes which control aspects of the operation that never change – mostly electrical characteristics.  They cannot be changed by the program.

 

Initial programming

The program memory, the fuse bits and optionally the EEPROM need to be programmed when the product is manufactured.  The EEPROM can be changed at any time under control of the program.  The program memory can be changed under program control with limitations, normally only to update the firmware after the embedded system is already in use.

 

Processor Architecture Section:

 

In this section, we examine the data memory area from the program point of view.

 

Data Memory Area

Memory locations 0x0100 to 0x04FF are ordinary SRAM.  They just retain whatever value is stored.  Locations 0x0020 to 0x00FF are control registers for the processor and for on-chip peripheral devices.  For these, reading does not necessarily give what was previously written.
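The layout of the data address space can be summarized as a small classification function.  This is a sketch for orientation only; the type and function names are invented here, and the first range assumes that the 32 general purpose registers occupy the addresses below 0x0020.

```c
#include <stdint.h>

typedef enum {
    AREA_GENERAL_REGISTER,   /* 0x0000 to 0x001F: the 32 general purpose registers */
    AREA_CONTROL_REGISTER,   /* 0x0020 to 0x00FF: processor and peripheral control */
    AREA_SRAM,               /* 0x0100 to 0x04FF: ordinary read/write memory */
    AREA_NONE                /* above 0x04FF: nothing there on this chip */
} data_area_t;

/* Reports which part of the data address space an address falls in. */
data_area_t classify_data_address(uint16_t address)
{
    if (address <= 0x001F) return AREA_GENERAL_REGISTER;
    if (address <= 0x00FF) return AREA_CONTROL_REGISTER;
    if (address <= 0x04FF) return AREA_SRAM;
    return AREA_NONE;
}
```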

 

Peripheral Control Registers – not exactly memory

Some of these locations are read-only: reading gives information that may change from time to time, and writing has no effect.  For example, the Port C Input register, named PINC, gives the digital state of the voltage levels on the “Port C” pins – 6 of the 28 pins on the microcontroller.

 

If you want to force the voltage on the pins to digital one or zero, you write the desired bit pattern to a different register, the Port C Output register, named PORTC.  If you write 0x00 to PORTC, to pull all the pins to low voltage, but a stronger external device forces it high, reading PINC will show the actual voltage level.

 

Most writable registers can also be read and show the same value that was previously written.  In this way, they are the same as ordinary memory.  The difference is that writing has a side effect – in this case, the voltages on the pins change.  The purpose of allowing such registers to be read as well as written is to allow the program to see what value is in the register.  The program could keep a copy in normal memory and update it each time the control register was changed.  Allowing the register to be read makes that unnecessary.

 

In the example above where PORTC is set to zero, but a pin forced high, reading PORTC will show it low and reading PINC will show it high.  (Also, the microcontroller will overheat.)
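The difference between the two registers can be captured in a very simplified software model.  This is a hypothetical sketch, not a description of the actual pin circuitry: it assumes all pins are outputs and that an external device, where present, always wins.  The function name and parameters are invented for the example.

```c
#include <stdint.h>

/* Hypothetical model of what reading PINC returns.
   portc:      the value the program last wrote to PORTC
   ext_mask:   1 bits mark pins that a stronger external device is driving
   ext_levels: the levels that device forces on those pins */
uint8_t model_pinc(uint8_t portc, uint8_t ext_mask, uint8_t ext_levels)
{
    /* Undriven pins follow PORTC; externally driven pins follow the device. */
    return (uint8_t)((portc & (uint8_t)~ext_mask) | (ext_levels & ext_mask));
}
```

With PORTC written as 0x00 and pin 0 forced high externally, the model returns 0x01 for PINC, while the value held in PORTC is still 0x00.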

 

Port C can also be used in input mode where the pins can be driven freely by external devices without conflict.  The Port C Data Direction register determines whether

 

There are a few cases of read/write registers where the value written and read are less related or not related at all.  There are interrupt flag bits in several registers which show as 1 if the hardware is requesting an interrupt, 0 if not.  The program can clear that flag by writing 1.  Writing 1 to that bit, when it reads as 1, causes it to read as 0 afterwards.
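The behavior of such a flag bit can be modeled in ordinary C to make it concrete.  The structure and function names below are hypothetical; on the real chip this logic is in the hardware, and the program only reads and writes the register.

```c
#include <stdint.h>

/* Hypothetical software model of a register holding interrupt flag bits. */
typedef struct { uint8_t flags; } flag_register_t;

/* Hardware side: an interrupt request sets the corresponding flag bits. */
void hw_request(flag_register_t *r, uint8_t mask) { r->flags |= mask; }

/* Program side: reading shows which flags are currently set. */
uint8_t flag_read(const flag_register_t *r) { return r->flags; }

/* Program side: each 1 bit written clears that flag;
   each 0 bit written leaves that flag unchanged. */
void flag_write(flag_register_t *r, uint8_t value) { r->flags &= (uint8_t)~value; }
```

Note that in this model writing 0x00 changes nothing, which is the opposite of what happens when 0x00 is written to ordinary memory.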

 

Writing to the UART data register, UDR0, causes 8 bits to be transmitted serially on a microcontroller pin.  Reading the same register gets 8 bits previously received on a different pin.

 

Despite these exceptions, most writable control registers act like memory – reading gives the value most recently written there.

 

Use Register Names

PINC is read as location 0x0026.  You will not need to know the actual locations except perhaps when you are debugging a program.  Programs always use a symbolic name for such locations, PINC, in this case.  In C language, there is a header file where PINC is defined as (*(volatile uint8_t *)(0x26)) – it is a number that is specified to be a memory location.  Related but different microcontrollers, in the same family, also have a PINC register, but it might be at a different location in the memory area.  Using the name, PINC, in a program reduces the number of changes needed to make the program run on a different processor.  It also, obviously, makes the program easier to understand.

 

Implications for C

The point of the preceding discussion is that the C compiler does not differentiate control registers from ordinary data memory.  Accessing control registers in C will be discussed later in the C language section below.

 

C Language Section:

 

Pointers are variables which contain the address (location in memory) of variables.  The type of the pointer specifies how the contents of the memory are to be interpreted. Dereferencing such a pointer in a C program causes the compiler to produce machine instructions to fetch from (or write to) the location in data memory given by the value of the pointer.  C also supports pointers to functions.  In that case, the address is understood to be in program memory.  To dereference a pointer to function, the compiler produces a machine instruction that calls a function.  The location called is similarly given by the value of the pointer variable, but it is in program memory. 
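A short sketch of both kinds of pointer follows.  The function names are invented for the example, and the code runs on any C compiler, not only on the microcontroller.

```c
#include <stdint.h>

/* Dereferencing a pointer to data fetches from the address it holds. */
uint8_t read_through(const uint8_t *p) { return *p; }

/* Two small functions with the same signature. */
uint8_t invert(uint8_t v)         { return (uint8_t)~v; }
uint8_t toggle_low_bit(uint8_t v) { return (uint8_t)(v ^ 0x01); }

/* op is a pointer to function.  Calling through it makes the compiler
   produce a call to whatever address the pointer holds. */
uint8_t apply(uint8_t (*op)(uint8_t), uint8_t value) { return op(value); }
```

For example, apply(invert, 0x0F) calls invert and gives 0xF0, while apply(toggle_low_bit, 0x00) calls the other function and gives 0x01.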

 

Casting Pointers

Casting pointers is a feature of C that should never be used in high level programs because it can make programs non portable by exposing features of the hardware that change from system to system.  For programs that interact directly with the hardware, this exposure is crucial.

 

Here is a brief summary of casting for C programmers unclear on it.  First, understand that the sense of the word, cast, is not the meaning, to throw, rather it is the meaning, to melt a substance and pour it into a container in order to give it a particular shape.  So, casting a variable means changing its “shape” without changing its “substance” – specifically changing its type without changing its value.

 

You seldom need to cast variables because C automatically translates types.  For example

   long x;

   float f;

   f = 1.;

   x = (long)f;

In the last line the float value 1 (internal representation 0x3f800000) gets converted to the integer type before it is stored in x.  In this case, its new internal representation (now 0x00000001) is obtained by a rather complicated set of steps, but its value is still the number 1.  It is not necessary to use the cast in this case because C knows that this is a normal and useful type of conversion.

 

When you cast a pointer, the value never changes (the address remains the same address), but the meaning changes.  When the new pointer is dereferenced (the value is read from memory as specified by the address in the pointer), the same stored data (at the same address) is interpreted differently.  In the following example, the first pointer points to memory containing data representing a variable of type float.  The program creates a second pointer whose data type is long, and copies the address in the first pointer into the second using a cast.  The data in memory has not changed, but when dereferencing reads it into a variable of type long, the data has a different meaning.

   long x;

   float f;

   long * px;

   float * pf;

 

   f = 1.;

   pf = &f;

   px = (long *)pf;

   x = *px;

In the line, pf = &f, pf is assigned the address of the memory location (say 0x04dc for example) which holds the first of the four bytes of the value of f.  Casting pf to type (long *) in the line, px = (long *)pf, leaves its value (0x04dc) the same but makes it a pointer to long and stores it in px.  In the line, x = *px, the value of the long at that location is copied to x.  The value of x is now 1065353216.  Why?  In the last line x gets the four byte value at location 0x04dc, which is 0x3f800000.  That binary bit pattern, in a long, is 1065353216 in decimal, 0x3f800000 in hexadecimal.
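The same result can be checked on a desktop computer with the sketch below.  It deliberately differs from the example above in two ways: it uses uint32_t because long is 64 bits on many desktop systems, and it copies the bytes with memcpy instead of dereferencing a cast pointer, because desktop compilers may optimize on the assumption that a float is never accessed through a pointer to an integer type.  The function names are made up for the example, and it assumes the usual IEEE 754 representation of float.

```c
#include <stdint.h>
#include <string.h>

/* Returns the stored 32 bit pattern of f, unchanged (same "substance"). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the four stored bytes as they are */
    return bits;
}

/* For comparison: the ordinary conversion, as in x = (long)f,
   which keeps the value and changes the representation. */
int32_t float_value(float f) { return (int32_t)f; }
```

For the float value 1, float_bits gives 0x3f800000 (1065353216 in decimal) and float_value gives 1.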

 

The explicit cast in the second to last line is necessary.  Without it, the statement, px = pf, is an error.   The compiler knows that converting a pointer-to-float to a pointer-to-long leads to a result that is normally not desired or correct.  Therefore, it does not make the conversion automatically.  The explicit cast is a direct instruction to the compiler to make the conversion.  Since the programmer clearly intends this conversion, the compiler does not treat it as an error.  (In C++, casting is separated into four types.  The type we are using here is called the reinterpret_cast<>, in C++.)

 

This is an extreme example.  Even hardware aware programs do not normally read float data as an integer.  Mostly, we will cast integers which represent memory addresses into pointers.  Sometimes we will cast a pointer to a long integer into a pointer to a single byte.  Then we could, for instance, change the upper byte without changing the rest.  (We could do the same thing using shift and mask operations.  Sometimes, casting produces more efficient code.)
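Here is a sketch of changing the upper byte both ways.  The function names are invented for the example.  The cast version assumes the least significant byte is stored at the lowest address (little-endian), which is how AVR-GCC and most desktop compilers store multi-byte integers; the shift and mask version does not depend on byte order.

```c
#include <stdint.h>

/* Replaces the most significant byte through a pointer to byte.
   Assumes little-endian storage: byte 3 is the upper byte. */
void set_upper_byte_cast(uint32_t *value, uint8_t upper)
{
    unsigned char *bytes = (unsigned char *)value;  /* same address, viewed as bytes */
    bytes[3] = upper;
}

/* Same result using shift and mask operations. */
void set_upper_byte_mask(uint32_t *value, uint8_t upper)
{
    *value = (*value & 0x00FFFFFFu) | ((uint32_t)upper << 24);
}

/* Returns 1 if this machine stores the least significant byte first. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    return *(unsigned char *)&probe == 1;
}
```

Starting from 0x12345678, either function called with 0xAB leaves 0xAB345678 on a little-endian machine.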

 

Casting an Integer Address to a Pointer – to Read a Specific Memory Location

Now consider the code which reads the Port C pins and puts the bits in x. 

   unsigned char x;

   x = ( *(unsigned char volatile *)(0x26) );

This is a shorter way of saying

   short addr;

   unsigned char volatile * px;

   unsigned char x;

 

   addr = 0x26;

   px = (unsigned char volatile *)addr;

   x = *px;

The 16 bit variable addr is assigned the value 0x0026.  Then the pointer px is given that value.  The variable addr must be cast explicitly because it is unsafe to convert an integer to a pointer.  The last line reads the value of the memory location at address 0x0026, which is actually the Port C Input Register rather than memory, but that does not matter to the compiler.

 

The short version of the program uses internal temporary variables instead of addr and px in the same way that

   x = (a + b) * (c + d);

is an abbreviation of

   t1 = a + b;

   t2 = c + d;

   x = t1*t2;

 

Once you understand casting to pointers, you can do many operations related to talking to hardware registers in the data address space.
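As an example of such operations, here is a sketch of three helper functions that set, clear and test bits through a pointer to a volatile byte.  The function names are assumptions made for this example and are not part of any standard header.

```c
#include <stdint.h>

/* Sets the bits given by mask, leaving the other bits unchanged. */
void reg_set_bits(volatile uint8_t *reg, uint8_t mask)
{
    *reg = (uint8_t)(*reg | mask);
}

/* Clears the bits given by mask, leaving the other bits unchanged. */
void reg_clear_bits(volatile uint8_t *reg, uint8_t mask)
{
    *reg = (uint8_t)(*reg & (uint8_t)~mask);
}

/* Returns 1 if any of the bits given by mask reads as 1, otherwise 0. */
int reg_test_bits(volatile const uint8_t *reg, uint8_t mask)
{
    return (*reg & mask) != 0;
}
```

On the microcontroller, reg_test_bits((volatile uint8_t *)0x26, 0x01) would test bit 0 of the Port C Input Register.  On a desktop computer the same functions can be tried by passing the address of an ordinary variable.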

 

Volatile Keyword

The meaning of the keyword, volatile, in ( *(unsigned char volatile *)(0x26) ) is somewhat subtle, but very important in embedded programming.  Syntactically, it is a type qualifier, a modifier like the keyword const.  The keyword, const, applied to a variable, tells the compiler that it should not allow the program to modify that variable.  The keyword, volatile, tells the compiler that the variable could unexpectedly be changed.

 

You can always write a working program without using the const keyword – it just helps the compiler to find errors and makes the program easier to understand.  There are cases where the program will run incorrectly if the volatile keyword is lacking.

 

In C, the variables normally are kept in specific locations in memory.  The machine code produced by the compiler often moves the value of a variable into a register before it can use it.  If the value is used several times, the compiler produces code that reuses the value in the register rather than reading it out of memory a second time.

 

This makes the code smaller and faster.  It normally works because the compiler normally knows whether it has produced code that could have changed the value of the variable.  The volatile keyword forces the compiler to read the variable from memory every time it is used.

 

In the case above where the variable is located at the address of a register, the value does change spontaneously, every time the inputs to port C change.  Without the volatile keyword, the compiler would likely assume the value never changes and produce code that would read it only once, save that value and reuse it.
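A typical place where this matters is a loop that waits for an input bit to change.  The following sketch uses a hypothetical function name and a poll limit chosen for the example.  Because the pointed-to byte is declared volatile, the compiler must read it again on every pass through the loop.

```c
#include <stdint.h>

/* Reads *reg repeatedly until any bit in mask reads as 1, or until
   max_polls reads have been made.  Returns the number of reads that
   were needed, or 0 if the bit never appeared. */
uint32_t wait_for_bits(volatile const uint8_t *reg, uint8_t mask, uint32_t max_polls)
{
    for (uint32_t polls = 1; polls <= max_polls; ++polls) {
        if (*reg & mask)          /* volatile: a fresh read on every pass */
            return polls;
    }
    return 0;
}
```

If reg were declared without volatile, the compiler would be allowed to read the location once before the loop and test that saved copy on every pass, so a change on the pin during the loop might never be seen.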

 

Location of Modifier Keyword in a Pointer Declaration

The location of a modifier in the declaration of a pointer is important.  The following declarations are both syntactically correct, but they cause different operations in programs.

   unsigned char volatile * p1;

   unsigned char * volatile p2;

 

p1 is a pointer to a volatile byte.  If the value of p1 is 0x0026, then the compiler must reread memory location 0x0026 each time *p1 is evaluated, in case the content of that location has changed.  The volatile keyword is next to unsigned char; it refers to the unsigned char to which the pointer points.

 

p2 is a volatile pointer to a byte.  If the value of p2 was 0x0026, then the compiler must reread the value of p2 in case the value is no longer 0x0026. The volatile keyword is next to p2; it refers to p2, the pointer itself.

 

Usually it is the first case: the referred location is volatile, not the value of the pointer.

 

 

Program Example Section:

 

Here is our first program.  It blinks an LED on the development board.

 

The program first sets the Port C Data Direction register to 0x0F.  This makes the four lowest port C pins (PC0, PC1, PC2, PC3) into outputs.  The other two remain inputs.

 

The variable val will control the LEDs; it will alternate between 0x00 and 0x01.  Decrementing the variable count provides the delay between the two states of the LED.  Otherwise the blink would be too fast to see.

 

A Program to Blink an LED

#include <inttypes.h>

#include <avr/io.h>

 

void main()

{

    int32_t volatile count;

    uint8_t val;

 

    DDRC = 0x0F;

    val = 0x00;

     

    for(;;)

    {

        for( count=0x100000; count>=0; --count ) continue;

        val = val ^ 0x01;

        PORTC = val;

    }

}

 

The symbols, DDRC and PORTC, are macros defined in <avr/io.h> as something like (*(unsigned char volatile *)(0x26)).  The symbols, int32_t and uint8_t, are typedefs defined in <inttypes.h> as long and unsigned char.

 

Volatile Keyword Again

Note a different use of the volatile keyword in the declaration of count.  Without it, the compiler would (correctly) decide that the line

    for( count=0x100000; count>=0; --count ) continue;

is irrelevant to every calculation in the program, and eliminate it in the interest of execution speed and code size.  That would eliminate the delay that we intended.  Declaring count to be volatile forces the compiler to decrement and test it regardless (as well as forcing it to use the real variable in memory rather than a copy in a register).

 

The Program to Blink an LED, in Assembly Language

Before we get into C tools, we will use a version of this program written entirely in assembly language.  All subsequent programs will be written entirely or almost entirely in C.  In this case only, we do not need to use the C compiler.

 

Instead of C variables in memory, we use several of the many general purpose registers in this processor.  R17 is used as val above.  R20, r21 and r22 are used as count.  (Even though count is four bytes long, we only use three here because the top byte is always zero.  If we wanted very long delays, we would use a fourth.)  R16 first holds the value for the Data Direction register and then holds the constant 1 that is used in the line

        val = val ^ 0x01;

 

Here is the same functionality in assembly language.

 

    .equ DDRC = 0x07   

    .equ PORTC = 0x08

 

    .org 0          ; program starts at location 0 of program memory

 

    ldi r16, 0x0F   ; put value 0x0F in r16

    out DDRC, r16   ; port C bit 0-3 are output

 

    ldi r17, 0x00   ; initial value to output on port C

    out PORTC, r17  ; output it

    ldi r16, 0x01   ; put value 0x01 in r16

 

loop2:

    ldi r20, 0x00

    ldi r21, 0x00

    ldi r22, 0x10   ; load 24 bit counter with 0x100000

 

loop1:

    subi r20, 1

    sbci r21, 0

    sbci r22, 0     ; subtract 0x000001 from counter

    brcc loop1      ; loop if not negative

 

    eor r17, r16    ; change bit 0 of value to output

    out PORTC, r17  ; output value to port C pin

    rjmp loop2

 

Readers who are familiar with Intel x86 assembly language will be misled by the out instructions.  There is no separate I/O address space.  The out instruction causes a value to be stored in data memory, but it is optimized to address only locations 0x0020 to 0x005F.  Its operand is the I/O address, which is the data memory address minus 0x20, so PORTC = 0x08 refers to data memory location 0x0028.  The instruction,

    sts PORTC + 0x20, r17

would also work.  This instruction takes a full data memory address and can address all locations 0x0000 to 0xFFFF, but it occupies 32 bits, whereas the out instruction occupies only 16 bits.

 

 

 

Chapter 3    Data Representation

 

This chapter describes how the data that represents the value of variables is stored in memory.

 

Hardware Section:

 

The hardware components involved in data storage are the processor and the data memory.  These are included on the microcontroller chip.

 

The processor contains a computation unit (ALU) and 32 temporary storage registers which hold eight bits each.

 

The data memory has 1024 storage locations which hold eight bits each.

 

 

Processor Architecture Section:

 

Data Width is 8 Bits

Data memory is eight bits wide.  This means that each of the memory locations having a unique address holds 8 bits.  In almost all modern processors, each unique address holds 8 bits, even though many read and write a larger number of bits in each operation.  These 8 bits, when considered as a unit, are called a byte.

 

The processor we use is an eight bit processor.  Most of the operations it does act on groups of 8 bits, one byte.  In particular, all memory operations always read or write single bytes and almost all computation operations use single byte inputs and give a single byte output.

 

The size of data used by a processor is called the machine word size.  Other processors use machine word size of two, four or eight bytes. 

 

Hexadecimal Notation

Hexadecimal notation is useful to describe data.  It uses sixteen possible values for each digit instead of ten.  Traditionally, 0x is prefixed to distinguish it from normal (decimal) notation.  One hexadecimal digit represents the numbers zero through fifteen (15 = 2^4 - 1), two digits 0 to 255 (255 = 2^8 - 1), four digits 0 to 65535 (65535 = 2^16 - 1), etc.  Therefore, we use two hexadecimal digits to represent each byte.  The possible values of a byte go from 0x00 to 0xFF.

 

If we want to write out the contents of a series of memory locations, we would write a series of groups of two digits, for example:

11 24 1F BE 8F EF 94 E0

(A 32 bit processor would have groups of eight digits.)

 

Multiple Word Operations

C language variables are 8, 16 or 32 bits wide.  Longer variables require the processor to do operations like addition in pieces.

 

We will look at how to do arithmetic on variables that are multiple machine words in length.

 

The first issue is how to store multi-word variables in memory.  These variables are stored as a sequence of words in consecutive addresses.  In the case of an eight bit processor, these words are bytes.  In other processors, each word is several bytes, but a C variable could still be several words.

 

Consider the C type, long (32 bits = 4 bytes), with a value of one thousand decimal.  1000 = 3*2^8 + 14*2^4 + 8*2^0, therefore 0x000003E8.  The five leading zeros make the length 8 hexadecimal digits to show that it is a 32 bit number, rather than 0x03E8 which would indicate a 16 bit number.  Divided into bytes, this is 0x00*2^24 + 0x00*2^16 + 0x03*2^8 + 0xE8*2^0.  This number is stored in memory as:

E8 03 00 00

The least significant byte (LSB) is stored first.  Least significant means the 2^0 byte.  First means stored at the lowest consecutive address.

 

Not all processors store the least significant byte first.  Others store 0x000003E8 as

00 00 03 E8 – most significant byte (MSB) first.
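The two storage orders can be sketched in C by writing the bytes of a 32 bit value into an array explicitly, so the result does not depend on the processor that runs it.  The function names are made up for this illustration.

```c
#include <stdint.h>

/* Store value into out[0..3], least significant byte first. */
void store_lsb_first(uint32_t value, uint8_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        /* shift the wanted byte down to bits 0-7 and keep only those bits */
        out[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Store value into out[0..3], most significant byte first. */
void store_msb_first(uint32_t value, uint8_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        out[i] = (uint8_t)(value >> (8 * (3 - i)));
    }
}
```

For the value 0x12345678, the first function fills the array with 78 56 34 12 and the second with 12 34 56 78.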

 

Endian-ness

As long as the variables stored in memory are used only by code running on the same kind of processor, the order of storage is not a concern, since the machine instructions handle it properly.

 

In cases where the contents of memory are moved somewhere else, like a binary disk file, or sent over Ethernet, problems do occur.  Because programmers need to worry about this problem, the two different storage schemes have names: “Big Endian” and “Little Endian”, referring to MSB first and LSB first respectively.

 

The 8 bit processor that we use in this class has a machine word size of one byte.  In that sense, the processor does not have an Endian type – all stores and fetches move only one byte, there is no order of bytes.

 

That is not the case for the code produced by the C compiler.  The machine code produced by the compiler must store two and four byte variables if the program specifies.  If the processor has an Endian type, the compiler will conform to it.  For example, if a 16 bit processor stores two byte words in Little Endian order, the C compiler will store a four byte variable using two two-byte store operations with the less significant two bytes coming before the more significant two bytes.

 

Although our 8 bit processor does not determine an Endian ordering, the C compiler we use uses Little Endian format.

 

There is one case where our processor does a two byte store: when it saves a return address on the stack, either when a CALL instruction is executed or when an interrupt occurs.  Although it is not stated in the documentation, the two-byte return address is saved in Big Endian format.  This is not a problem since C programs do not directly access the saved return address.

 

 

C Language Section:

 

Endian-ness

Here is an example of the relevance of Endian-ness in C code.

 

The following C code is from the Ethernet handling part of the Linux kernel:

struct ethhdr {

    unsigned char   h_dest[ETH_ALEN];       /* destination eth addr */

    unsigned char   h_source[ETH_ALEN];     /* source ether addr    */

    __be16          h_proto;                /* packet type ID field */

} __attribute__((packed));

 

………

 

    if (type != ETH_P_802_3)

        eth->h_proto = htons(type);

    else

        eth->h_proto = htons(len);

When the kernel needs to send an Ethernet packet, it fills in a structure in memory and gives a pointer to the structure to the driver for the particular Ethernet controller used in the system.  The driver does something equivalent to casting the pointer to (unsigned char *) and then using that pointer to sequentially read bytes from memory and send them on the wire.  The receiving Ethernet driver, on a different computer does the reverse, filling in the structure byte by byte.

 

There is a problem in the member variable, h_proto, which holds either the packet type or, for 802.3 packets, the length of the packet.  It is two bytes long, so its layout in memory depends on whether the processor uses Big Endian or Little Endian storage.  The Ethernet standard specifies that all multi-byte numbers must be in Big Endian format.  A processor that uses Little Endian format is required to reverse the order of the bytes before storing into h_proto.  The typedef, __be16, is just unsigned short, but its name indicates that it should be in Big Endian format, regardless of the format used by the processor.

 

Since Linux is compiled and run on both Little Endian processors like Intel x86 and Big Endian processors like Power PC, Linux code uses the macro, htons(), to convert a short from host to network format.  On Big Endian processors, htons() does nothing.  On Little Endian processors, htons() swaps the two bytes.
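The behavior described above can be sketched as follows.  This is not the kernel's implementation; the names host_is_little_endian and my_htons are made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 on a Little Endian host, 0 on Big Endian.
   It looks at the byte stored at the lowest address of a two byte variable. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    return *(uint8_t *)&probe == 1;
}

/* Sketch of htons(): convert a short from host order to network
   (Big Endian) order.  Swaps the bytes only on a Little Endian host. */
uint16_t my_htons(uint16_t host_value)
{
    if (host_is_little_endian()) {
        return (uint16_t)((host_value << 8) | (host_value >> 8));
    }
    return host_value;
}
```

After n = my_htons(0x0800), the byte at the lowest address of n is 0x08 on either kind of processor, which is what the Ethernet driver needs when it sends memory byte by byte.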

 

You can try compiling and running the following C program on a computer:

#include <stdio.h>

int main()

{

    unsigned short x = 1000;

    unsigned char * p = (unsigned char *)&x;

    printf( "%02X %02X\n", p[0], p[1] );

    return 0;

}

If the output is 03 E8, the processor is Big Endian.  If the output is E8 03, the processor is Little Endian.  You should notice the pointer cast.  Without a pointer cast (or a union, which does the same thing) every program will give the same result regardless of the processor (excluding errors such as reading beyond the end of an array, etc.).
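Here is a sketch of the union alternative mentioned above.  The union short_bytes and the function byte_at are made up for this illustration.

```c
#include <stdint.h>

/* The two members share the same two bytes of memory. */
union short_bytes {
    uint16_t value;
    uint8_t  bytes[2];
};

/* Returns the byte of x stored at the given offset (0 or 1) in memory. */
uint8_t byte_at(uint16_t x, int offset)
{
    union short_bytes u;
    u.value = x;              /* write through one member ...       */
    return u.bytes[offset];   /* ... and read through the other one */
}
```

byte_at(x, 0) and byte_at(x, 1) return the same values as p[0] and p[1] in the program above, so the result again depends on the processor.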

 

 

Program Example Section:

We will run the Endian test program on our system with a few changes.

 

#include "blcalls.h"

 

void __attribute__((noreturn)) main()

{

    unsigned short x = 1000;

    unsigned char * p = (unsigned char *)&x;

    DbgPrtByte( p[0] );

    DbgPrtByteNL( p[1] );

 

    DbgPrtStack( 8 );

 

    // stop execution here

    for(;;) continue;

}

 

This is the output:

E8 03

04F8 FA

04F9 04

04FA E8

04FB 03

04FC FF

04FD 04

04FE 00

04FF 4F

 

Endian-ness

The first line is the expected E8 03, showing unsigned short x has been stored in Little Endian format.  The rest is a listing of the content of the stack, produced by the function, DbgPrtStack().  The above stack listing result depends on setting the compiler optimization level to 0.  This causes the compiler to store all local variables on the stack rather than keeping them in registers.

 

We will study the stack later.  For now, just take the following information:

Addresses 04F8 and 04F9       unsigned char * p

Addresses 04FA and 04FB       unsigned short x

Address 04FC                  saved register R28

Address 04FD                  saved register R29

Addresses 04FE and 04FF       return address of call to main()

 

You can see that x is 0x03E8 and p is 0x04FA.  Note that p does indeed point to the address of x.

 

Return Address on Stack

This is the assembly language instruction that called main() and the next instruction.

+0000004D:   940E0053    CALL    0x00000053

+0000004F:   940C0074    JMP     0x00000074

The return address for the call will be the instruction after the CALL, at address 0x004F.  See that the two-byte return address, stored by the CALL instruction at address 0x04FE, is in Big Endian format.  Remember that the return address 0x004F refers to program memory space, but the other addresses, like the value of pointer p=0x04FA, refer to data memory space.

 

Bootloader Debug Functions

Finally, here is an explanation of the functions DbgPrtByte() etc.  These functions are defined in the file, "blcalls.h", for example:

#define DbgPrtByte ( * ( void ( * )( unsigned char x ) ) 0x1FF2  )

 

The definition, *( void ( * )( unsigned char x ) ) 0x1FF2, is the constant, 0x1FF2, cast to be a pointer to a function taking one argument, unsigned char x, and returning void; the * at the beginning dereferences the pointer, giving the function itself, which is called when an argument list is appended.

 

Calls to functions defined outside of a program are normally resolved by the Linker after it finds code for the function.  This is different.  It is another example of casting a pointer to make C do something unusual.
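The same technique can be tried on an ordinary computer, where we do not know a fixed address in advance.  In this sketch the address is taken from a real function and carried around as a plain number; fake_debug_print, CALL_AT and call_through_cast are made up for this illustration, and converting between function pointers and integers is implementation-defined, although it works on common compilers.

```c
#include <stdint.h>

/* Records the last argument, so the call can be observed (illustration only). */
static uint8_t last_byte_printed;

/* Stand-in for a function that already exists somewhere in memory. */
static void fake_debug_print(unsigned char x)
{
    last_byte_printed = x;
}

/* Same shape as the DbgPrtByte macro, but the address is a parameter. */
#define CALL_AT(address) ( * ( void ( * )( unsigned char ) ) (address) )

/* Calls the function found at a numeric address and returns what it saw. */
uint8_t call_through_cast(unsigned char x)
{
    uintptr_t address = (uintptr_t)&fake_debug_print;  /* address as a number */
    CALL_AT(address)(x);                                /* cast back and call  */
    return last_byte_printed;
}
```

The Linker never sees a symbol for the function that is called through CALL_AT; the compiler simply emits a call to whatever address the number holds.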

 

There is machine code for a function already at address 0x1FF2 because we are using a bootloader program which is already programmed into the top 1K bytes of the microcontroller.  The main purpose of the bootloader is to upload new programs into the rest of memory.  A secondary feature is the existence of several debug functions already in program memory.  The bootloader was programmed into the microcontroller when the demo board was manufactured.

 


 

 

 

Chapter 4    Data Representation, part 2

 

This chapter discusses two’s complement and arithmetic on multi-word variables.

 

Hardware Section:

 

The hardware components to do arithmetic are the computation unit (ALU), the general purpose registers and the Status Register.  These are included on the microcontroller chip.

 

Status Register and Carry Flag

The computation unit takes one or two inputs from general registers and calculates a result which it puts into one of the input registers.  Additionally, it sets certain bits in the Status Register according to the result of the computation.  Some computation operations also take input from the carry flag bit in the Status Register.

 

Two’s Complement

Because of the way negative numbers are represented as a bit pattern in two’s complement arithmetic, the operation of the hardware for addition and subtraction is the same regardless of whether the bit patterns of the inputs represent signed or unsigned (or mixed) variables.  The processor does not have separate instructions for the various cases.  Multiplication is different: the processor needs different instructions for each case.

 

Processor Architecture Section:

 

Carry Flag for Add

The carry flag gets its name from the concept of carry in addition where the result of the sum of two digits is greater than a single digit and the number one must be added to the next higher digits.  The same thing can happen between words when adding multi-word integers.  Here is an example:

0x12345678 + 0x9ABCDEF0

 

(     1  1    carry)

  12 34 56 78

+ 9A BC DE F0

-------------

  AC F1 35 68

 

0x78 + 0xF0 = 0x168

0x56 + 0xDE +1 = 0x135

0x34 + 0xBC +1 = 0xF1

0x12 + 0x9A = 0xAC

 

After each operation that adds one byte to one byte, the low eight bits of the possibly nine bit result are saved in a register and the carry flag is set to one if and only if the ninth bit is one.  For every byte after the lowest, the current value of the carry flag is added along with the two input bytes.
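The byte by byte addition above can be modelled in C as follows.  This is only a sketch of what the hardware does; the function name is made up, and the arrays hold the least significant byte first, so they are in the opposite order from the way the example is written on paper.

```c
#include <stdint.h>

/* Adds two 4 byte numbers stored least significant byte first.
   Writes the 4 byte sum and returns the final carry (0 or 1). */
uint8_t add_bytes_with_carry(const uint8_t a[4], const uint8_t b[4],
                             uint8_t sum[4])
{
    uint8_t carry = 0;   /* no carry goes into the lowest byte */

    for (int i = 0; i < 4; ++i) {
        /* the result of adding two bytes and a carry needs up to nine bits */
        uint16_t nine_bit = (uint16_t)(a[i] + b[i] + carry);

        sum[i] = (uint8_t)(nine_bit & 0xFF);   /* low eight bits are saved  */
        carry  = (uint8_t)(nine_bit >> 8);     /* ninth bit is the new carry */
    }
    return carry;
}
```

With a = {0x78, 0x56, 0x34, 0x12} and b = {0xF0, 0xDE, 0xBC, 0x9A}, the sum is {0x68, 0x35, 0xF1, 0xAC} and the final carry is zero, which agrees with the example.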

 

Two’s Complement

Two’s complement arithmetic is defined by giving the highest bit in signed numbers a negative weight.  Here is an example for the bit pattern 0xF4 = 11110100 (one byte variable).

Position    bit value    unsigned    signed
2^7         1            128         -128
2^6         1            64          64
2^5         1            32          32
2^4         1            16          16
2^3         0            .           .
2^2         1            4           4
2^1         0            .           .
2^0         0            .           .
                         -----       -----
Value of variable        244         -12

 

Here are two examples of addition: adding 4 and adding 32.  The bit patterns are the same while the interpretations are different:

Unsigned:   0xF4 + 0x04 = 0xF8      244 + 4 = 248

Unsigned:   0xF4 + 0x20 = 0x14      244 + 32 = 20 (overflow)

Signed:     0xF4 + 0x04 = 0xF8      -12 + 4 = -8

Signed:     0xF4 + 0x20 = 0x14      -12 + 32 = 20

 

The case 244 + 32 = 20 is a case of overflow.  The operation produced a carry (equal to 2^8 = 256) which was discarded.

 

Adding negative numbers can also produce an overflow:

0xF4 + 0x88 = 0x7C       -12 + -120 = 124 (overflow)

In both cases of overflow, the result is off by 256:

244 + 32 = 276 = 20 + 256

-12 + -120 = -132 = 124 - 256

Overflow errors are always multiples of 256 because the bit position of discarded carries is 2^8 = 256.

 

Mathematically, this is modular arithmetic: modulo 2^8 = 256 for one byte char, modulo 2^16 = 65536 for two byte short, and modulo 2^32 = 4294967296 for four byte long.

 

The modulus, 2^n, is also the difference between the signed and unsigned interpretation of numbers where the high bit is on.  This is because of the difference in the interpretation of the high bit: the difference between 2^(n-1) and -2^(n-1) is 2^n.

 

The fact that the signed and unsigned interpretations differ only by 0 or 2^n is the reason both work using the same operation at the bit level: any possible difference between the results is also a multiple of 2^n, which the modular arithmetic discards.
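The examples in this section can be checked with a short sketch.  One function does the addition on bit patterns only; the other gives the signed interpretation of a bit pattern by subtracting 2^8 when the high bit is on.  Both function names are made up for this illustration.

```c
#include <stdint.h>

/* Adds two bytes as bit patterns and discards the carry (modulo 256). */
uint8_t add_bits(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Signed (two's complement) interpretation of a one byte bit pattern:
   the same as the unsigned value, minus 256 when the high bit is on. */
int16_t as_signed(uint8_t bits)
{
    return (bits & 0x80) ? (int16_t)(bits - 256) : (int16_t)bits;
}
```

add_bits(0xF4, 0x04) gives 0xF8, which reads as 248 unsigned and as_signed(0xF8) = -8 signed.  The single addition serves both interpretations.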

 

Signed and Unsigned Comparisons

In the preceding section, we saw that addition and subtraction work without the need to distinguish whether the bytes represent unsigned or signed variables.  The same is true when testing whether two variables are equal:

(signed int)x == (signed int)y is true if and only if

(unsigned int)x == (unsigned int)y is true.

This is not true for tests of greater and less than:

(signed char)0xFF < (signed char)0x01 is true, but

(unsigned char)0xFF < (unsigned char)0x01 is not true.
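These comparisons can be tried with the following sketch.  The function names are made up; the cast of a value above 127 to int8_t is implementation-defined in C, and the sketch assumes the usual two's complement result.

```c
#include <stdint.h>

/* Compare two bit patterns as unsigned bytes. */
int less_unsigned(uint8_t a, uint8_t b)
{
    return a < b;
}

/* Compare the same two bit patterns as signed bytes. */
int less_signed(uint8_t a, uint8_t b)
{
    return (int8_t)a < (int8_t)b;
}

/* Equality gives the same answer under both interpretations. */
int equal_both_ways(uint8_t a, uint8_t b)
{
    return (a == b) == ((int8_t)a == (int8_t)b);
}
```

less_signed(0xFF, 0x01) is true because 0xFF is -1 as a signed byte, while less_unsigned(0xFF, 0x01) is false because 0xFF is 255 as an unsigned byte.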

 

Status Register and Comparisons

There are six flags in the Status Register that are set along with the numerical result of arithmetic operations.  We have already seen the carry flag, which is one of those six.  (There are two additional flags in the Status Register which can be changed by instructions explicitly, but do not change as a by-product of operations.)

 

As mentioned, the carry flag is used as an input to some operations, for instance when it is added along with two input bytes.  The other five flags are not used in calculation operations.  They are used to control branching as the result of comparisons.

 

Zero Flag

The simplest flag is the zero flag, which is set to one when the result of an operation is zero.  Branching depending on the zero flag after a compare (subtraction) operation, tests for equality.  A test for equality to zero is done by a branch instruction depending on the zero flag, preceded by a test instruction.

 

Negative Flag

The negative flag is set when the high bit in a result is set.  A test for a signed variable less than zero is done by a branch instruction depending on the negative flag, preceded by a test instruction.

 

Overflow Flag

The overflow flag is set when there was an overflow in a signed operation.  For add, it is set when both inputs are negative (high bit set) or both are positive (high bit clear) and the result has the opposite sign.  For subtract, it is set when the result has the same sign as the number being subtracted and the number from which it is subtracted has the opposite sign, i.e.

A – B = C is equivalent to B + C = A, then apply the rule for addition.

 

Carry Flag

The carry flag is set according to the rules of addition: it is set if the sum of the two high bits and the internal carry from adding the next lower bits is greater than one (it can be up to three counting the carry).  In the case of subtraction, A – B = C, it is set as it would be when adding B back into the result, C, so it acts as a borrow.

For subtraction of unsigned variables, carry is set when B is greater than A.

 

S Flag

The S flag is used for comparison of signed variables.  It is set as the exclusive-or of the negative and overflow flags.  After a compare (subtraction) of A and B, the S flag is set when A is less than B as signed variables, so branching depending on the S flag tests for less than, or for greater than or equal.

 

Half-Carry Flag

The half-carry flag is seldom used.  It is set according to the internal carry from bit 2^3 to bit 2^4.  Its use is in BCD arithmetic.  In BCD, each nibble (half byte) may contain codes from 0x0 to 0x9 (A through F forbidden) representing a decimal digit.  Thus, the decimal number 1024 would be represented as 0x1024.  (The decimal value of the representation normally would be 4132, but that is irrelevant.  It is intended to represent the value 1024 decimal.)  After performing a normal addition of such numbers, the result needs to be corrected for the appearance of nibbles 0xA through 0xF.

 

Carry Flag for Multi Word Operations

We have seen how the carry flag holds the value of the carry resulting from the addition of one pair of bytes, to be used in the addition of the next.

 

For subtraction, A – B = C, the carry flag (and overflow flag) are set as they would be when adding B back into the result, C. 
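The rules for the flags after an 8 bit addition can be collected in a small software model.  This is only a sketch written from the descriptions above, not a copy of the hardware; the function name and the FLAG_ constants are made up, with the bit positions chosen in the order carry, zero, negative, overflow, S, half-carry.

```c
#include <stdint.h>

#define FLAG_C 0x01   /* carry      */
#define FLAG_Z 0x02   /* zero       */
#define FLAG_N 0x04   /* negative   */
#define FLAG_V 0x08   /* overflow   */
#define FLAG_S 0x10   /* S = N xor V */
#define FLAG_H 0x20   /* half-carry */

/* Model of an 8 bit add: stores the result and returns the flags. */
uint8_t add_flags_model(uint8_t a, uint8_t b, uint8_t *result)
{
    uint16_t wide  = (uint16_t)(a + b);   /* up to nine bits */
    uint8_t  r     = (uint8_t)wide;       /* low eight bits  */
    uint8_t  flags = 0;

    if (wide > 0xFF) flags |= FLAG_C;     /* ninth bit is one */
    if (r == 0)      flags |= FLAG_Z;     /* result is zero   */
    if (r & 0x80)    flags |= FLAG_N;     /* high bit is set  */

    /* overflow: inputs have the same sign, result has the opposite sign */
    if (~(a ^ b) & (a ^ r) & 0x80) flags |= FLAG_V;

    /* S is the exclusive-or of negative and overflow */
    if (((flags & FLAG_N) != 0) != ((flags & FLAG_V) != 0)) flags |= FLAG_S;

    /* half-carry: carry out of the low nibble */
    if (((a & 0x0F) + (b & 0x0F)) > 0x0F) flags |= FLAG_H;

    *result = r;
    return flags;
}
```

For example, adding 0x7F and 0x01 gives 0x80 with the negative, overflow and half-carry flags set, and the S flag clear because negative and overflow are both one.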

 

C Language Section:

 

Review of Variable Declaration

This section is a short review of the properties of variables in C.  See K&R for details.  The keywords used to declare variables are presented.  The keywords are grouped according to similar functionality.

 

Type specifier category:

Type specifier signed modifier category:

 

These keywords determine the size and representation type (integer or float).  Type, void, cannot be used to declare variables.  It is used to declare pointers (of unspecified type), function return value (no value returned) and an argument list (indicating no arguments).

 

Note to Java programmers: In Java, short is the same as in C, but char is 16 bit (the 8 bit type is byte), int is always 32 bit, and long is 64 bit.

 

The type, int, is intended to be the “natural machine size”, which probably means the word size of the machine.  The intention may have been to provide a size that works most efficiently on a given machine.  Sizes greater than the machine word size require multiple operations and are obviously slower.  Sizes smaller than the machine word size could in theory require extra operations.  Usually, processors with a machine word size greater than one byte provide extra hardware to allow computations on smaller sizes at full speed.

 

The floating point types, float and double, also do not have a standard size.

 

Here are the actual sizes, in bits, of these types used by the GNU compiler for the 8 bit AVR and the 32 bit X86 processors:

type    int    float    double
AVR     16     32       32
X86     32     32       64

 

Most environments have an include file that uses typedef to provide more sensible names.  The include file, stdint.h, defines the following types:

 

This file also provides some types that improve on the intention of int.  These types define integers, with sizes that could change for different processors, but where the minimum size is specified. For example if you have a loop counter that goes from zero to nine, an 8 bit variable will work.  Using a 16 or 32 bit variable on an 8 bit machine would be inefficient.  Using an 8 bit variable on a 32 bit machine would work correctly although possibly less efficiently, but it would appear strange and force someone reading the program to pause and wonder why.  Using uint_least8_t (in stdint.h) allows the compiler to use a longer size (when it is better on some particular processor).  It also informs the human reader of the intention of the writer of the program.
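Here is a sketch of the loop counter case described above.  The function names are made up for this illustration; uint_least8_t itself is a standard type from stdint.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds the numbers zero to nine.  The counter needs only 8 bits, but the
   compiler may choose a longer type if that suits the processor better. */
unsigned sum_zero_to_nine(void)
{
    unsigned total = 0;

    for (uint_least8_t i = 0; i < 10; ++i) {
        total += i;
    }
    return total;
}

/* Reports how many bytes the compiler actually uses for the counter type. */
size_t counter_size_in_bytes(void)
{
    return sizeof(uint_least8_t);
}
```

The result of sum_zero_to_nine() is 45 on every processor; only the value returned by counter_size_in_bytes() is allowed to differ.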

 

The other groups of keywords related to variables describe other qualities of the variable.

 

Type qualifier category:

 

Storage-class specifier category:

static      Function: name is not visible outside current file

            Local variable: opposite of auto, there is one global instance,

            but it is not visible outside the function

auto        can be used only in function local scope (since this is the default,

            it is never necessary, therefore auto is almost never used)

register    in a processor register instead of the stack; used only for optimization, the compiler may ignore the request.

extern      declared elsewhere, e.g. a different file or below in same file

for a variable

 

Program Example Section:

 

Status Register Flags

Here is a program that allows us to examine the values of the flags in the Status Register after addition and subtraction operations.  It depends on a special feature of the AVR processor, that the Status Register is available as a memory mapped control register, so it can be read in the same way as registers of peripheral devices.

 

C Only Version

The functions, AddFlags and SubFlags, do an 8 bit addition or subtraction, and the Status Register is read immediately thereafter.  (The statement, return SREG, specifies that the value of SREG must be returned.  This causes it to be read.)

 

Although this program works, it has a serious problem.  The C compiler is guaranteed to produce an executable program that does what the C code specifies.  There are no guarantees how it does that.  The code below tells the compiler to add two bytes and to return the value of SREG.

 

To the compiler, SREG is just a memory location.  It does not know that we care about the effect of an add or sub machine instruction on the flags.  It might read SREG before doing the add instruction since it knows of no connection between the two operations.  With default optimization, -Os, without the volatile keyword in AddFlags and SubFlags, this will indeed happen.

 

There is also no reason the compiler needs to use an add instruction to calculate the sum of the bytes.  For example, to calculate a + b, it might have the value (-b) already in a register from an earlier operation, and it would be more efficient to subtract (-b) from a.

 

We have not told the compiler, and there is no way in the C language to tell it, to use a particular machine instruction or that the instruction has a side effect.

 

After we run this program, we will learn the correct way.

 

#include "blcalls.h"

#include <stdint.h>

#include <avr/io.h>

 

#define TST_BIT( r, n ) ( (r) & (1<<(n)) )

 

// some bytes to try

#define NBYTES 6

uint8_t bytes[NBYTES] = {0x80, 0x81, 0xFF, 0x00, 0x01, 0x7F };

 

char flagNames[] = "cznvshti";  // names of flags in Status Reg

 

 

 

uint8_t AddFlags( uint8_t aa, uint8_t bb, uint8_t * pc );

uint8_t SubFlags( uint8_t aa, uint8_t bb, uint8_t * pc );

 

 

void __attribute__((noreturn)) main()

{

   uint8_t t, i, j, a, b, c, f, k;

   char ch;

 

   for( t=0; t<2; ++t )

   {  // t selects + or -

      for( i=0; i<NBYTES; ++i )

      {  // i selects first byte

         a = bytes[i];

 

         for( j=0; j<NBYTES; ++j )

         {  // j selects second byte

            b = bytes[j];

 

            f = (t ? AddFlags : SubFlags)( a, b, &c );

 

            DbgPrtByte( a );

            DbgPrtChar( t ? '+' : '-' );

            DbgPrtByte( b );

            DbgPrtChar( '=' );

            DbgPrtByte( c );

            for( k=0; k<8; ++k )

            {  // k selects which flag to display

               ch = flagNames[k];

               if( TST_BIT(f, k) ) ch -= 0x20;

                  // -= 0x20 converts lower to upper

               DbgPrtChar( ch );

            }

            DbgPrtCharNL( ' ' );

         }

      }

      DbgPrtCharNL( ' ' );

   }

   // stop execution here

   for(;;) continue;

}

 

 

 

uint8_t AddFlags( uint8_t a, uint8_t b, uint8_t * pc )

{

   uint8_t volatile c;

   uint8_t flg;

 

   c = a + b;

   flg = SREG;

   *pc = c;

   return flg;

}

 

 

 

uint8_t SubFlags( uint8_t aa, uint8_t bb, uint8_t * pc )

{

   uint8_t volatile cc;

   uint8_t flg;

 

   cc = aa - bb;

   flg = SREG;

   *pc = cc;

   return flg;

}

 

 

 

Here are a few comments about the program:

The expression (t ? AddFlags : SubFlags) is the address of a function.

Appending ( a, b, &c ) causes the function to be called with those arguments.

 

ch = flagNames[k];

Remember that a text string is an array of char terminated with a zero byte.  Therefore "cznvshti" is the same as

      { 'c', 'z', 'n', 'v', 's', 'h', 't', 'i', '\0' }

What we wanted was the eight characters without the zero byte.  Using a text string, which has the zero byte, wastes one memory location.  In this case, the programmer wasted a byte to save typing and to make the code easier to read.
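The difference of one byte can be seen with sizeof.  In this sketch, flag_string, flag_chars and the three functions are made up for this illustration; flag_letter does the same lower to upper conversion as the program above.

```c
#include <stddef.h>

/* Text string: eight characters plus the terminating zero byte. */
static const char flag_string[] = "cznvshti";

/* Character list: exactly eight bytes, no zero byte. */
static const char flag_chars[8] = { 'c', 'z', 'n', 'v', 's', 'h', 't', 'i' };

size_t string_size(void) { return sizeof flag_string; }
size_t chars_size(void)  { return sizeof flag_chars; }

/* Returns the name of flag k (0 to 7), in upper case if the flag is set. */
char flag_letter(unsigned k, int is_set)
{
    char ch = flag_chars[k];

    if (is_set) ch -= 0x20;   /* -= 0x20 converts lower to upper */
    return ch;
}
```

string_size() is 9 and chars_size() is 8; the extra byte is the price of the shorter and more readable initializer.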

 

 

Inline Assembly Language Version

The only dependably correct way to make the program use the add instruction and then to read the Status Register without any intervening instructions that change the Status Register, is to write that section of code in assembly language.

 

The GNU C compiler supports inline assembly language, which allows mixing assembly language into C code.  Without this feature, the functions, AddFlags and SubFlags, would need to be written entirely in assembly language.

 

Here is one way to do it using assembly language.  (The syntax is complicated.  Please read C:/WinAVR/doc/avr-libc/avr-libc-user-manual/inline_asm.html).  Note that putting both assembly language instructions in one statement prevents the compiler from separating them or switching their order.

 

uint8_t AddFlags( uint8_t a, uint8_t b, uint8_t * pc )

{

   uint8_t flg;

 

 

   // a = a + b;

   // flg = SREG;

   asm volatile ("add %0, %2 \n\t"

                 "in %1, %3"

                : "+r" (a), "=r" (flg)

                : "r" (b), "I" (_SFR_IO_ADDR(SREG)) );

 

   *pc = a;

   return flg;

}

Here is the assembly language produced.  (To get the assembly language output of the compiler, choose menu item Project->Configuration Options, then select the Custom Options pane, type -save-temps in the area next to the Add button and click the Add button.  This causes two temporary files to be kept after the compilation is finished: the preprocessor output (extension .i) and the compiler assembly language output (extension .s).)

 

AddFlags:

      movw r30,r20

      add r24, r22

      in r25, 63

      st Z,r24

      mov r24,r25

      ret

 

Functions Entirely in Assembly Language

Finally, we will look at writing the functions, AddFlags and SubFlags, entirely in assembly language.  Reading FAQ.html#faq_reg_usage tells us that the arguments, uint8_t a  and uint8_t b are found in registers, r24 and r22 respectively, uint8_t * pc (two byte pointer) is in r20 and r21, and the uint8_t return value needs to be in r24.

 

We don't want to be experts in assembly language.  It is often useful to start with the assembly language produced by the compiler, then modify it as needed.  Here is the C code that worked:

uint8_t AddFlags( uint8_t a, uint8_t b, uint8_t * pc )

{

   uint8_t volatile c;

   uint8_t flg;

 

   c = a + b;

   flg = SREG;

   *pc = c;

   return flg;

}

 

This is the assembly language produced:

 

.global     AddFlags

      .type AddFlags, @function

AddFlags:

      push r29

      push r28

      push __tmp_reg__

      in r28,__SP_L__

      in r29,__SP_H__

      movw r30,r20

      add r22,r24

      std Y+1,r22

      in r24,__SREG__

      ldd r25,Y+1

      st Z,r25

      pop __tmp_reg__

      pop r28

      pop r29

      ret

 

The push and in instructions are used to set up a frame on the stack for the local variable, uint8_t volatile c;.  We could use this code, but it is more complicated than we need.

 

Now, let's try allowing (and requesting) that a register be used for c.

 

uint8_t AddFlags( uint8_t a, uint8_t b, uint8_t * pc )

{

   uint8_t register c;

   uint8_t flg;

 

   c = a + b;

   flg = SREG;

   *pc = c;

   return flg;

}

 

This is better, except that it is wrong.  The in instruction should come after the add instruction.

 

.global     AddFlags

      .type AddFlags, @function

AddFlags:

      movw r30,r20

      in r25,__SREG__

      add r22,r24

      st Z,r22

      mov r24,r25

      ret

 

We can fix that by hand:

 

.global     AddFlags

      .type AddFlags, @function

AddFlags:

      movw r30,r20

      add r22,r24

      in r25,__SREG__

      st Z,r22

      mov r24,r25

      ret

Before, the compiler read the Status Register into r25 then moved it to r24 after it used the value of a in r24.  Now, the value of a has already been used and we can read the Status Register directly into r24.

 

.global     AddFlags

      .type AddFlags, @function

AddFlags:

      movw r30,r20

      add r22,r24

      in r24,__SREG__

      st Z,r22

      ret

 

Here is the final file.  We named it addsub.s and added it to the project.  We had to add the line,

#include <avr/io.h>

and change __SREG__ to _SFR_IO_ADDR(SREG) (since __SREG__ is apparently defined in a different include file).  The line

      .section .text

specifies program memory, which is default even without this line.

 

#include <avr/io.h>

      .section .text

      .global     AddFlags

      .type AddFlags, @function

AddFlags:

      movw r30,r20

      add r22,r24

      in r24,_SFR_IO_ADDR(SREG)

      st Z,r22

      ret

 

.global     SubFlags

      .type SubFlags, @function

SubFlags:

      movw r30,r20

      sub r22,r24

      in r24,_SFR_IO_ADDR(SREG)

      st Z,r22

      ret
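
When testing AddFlags, it helps to know what value to expect in the returned Status Register byte.  The following is only a sketch of a C model that can be compiled on a PC.  The name AddFlagsModel and the FLAG_ constants are hypothetical, and the I and T bits (which the ADD instruction does not change) are simply returned as zero.

```c
#include <stdint.h>

/* Bit masks for the flags in the AVR Status Register (SREG). */
#define FLAG_C (1 << 0)   /* carry */
#define FLAG_Z (1 << 1)   /* zero */
#define FLAG_N (1 << 2)   /* negative (bit 7 of result) */
#define FLAG_V (1 << 3)   /* two's complement overflow */
#define FLAG_S (1 << 4)   /* sign, N xor V */
#define FLAG_H (1 << 5)   /* half carry (carry out of bit 3) */

/* Hypothetical PC-side model of AddFlags: stores a + b through pc and
   returns the flag bits an 8-bit ADD would produce.
   The I and T bits are not modeled and are returned as 0. */
uint8_t AddFlagsModel( uint8_t a, uint8_t b, uint8_t * pc )
{
   uint8_t sum = (uint8_t)( a + b );
   uint8_t flg = 0;

   if( ( (unsigned)a + b ) > 0xFF )             flg |= FLAG_C;
   if( sum == 0 )                               flg |= FLAG_Z;
   if( sum & 0x80 )                             flg |= FLAG_N;
   /* overflow: operands have the same sign, result has the other sign */
   if( ( ~( a ^ b ) & ( a ^ sum ) ) & 0x80 )    flg |= FLAG_V;
   if( ( ( a & 0x0F ) + ( b & 0x0F ) ) > 0x0F ) flg |= FLAG_H;
   if( !( flg & FLAG_N ) != !( flg & FLAG_V ) ) flg |= FLAG_S;

   *pc = sum;
   return flg;
}
```

Comparing the value returned by the real AddFlags (with the I and T bits masked off) against this model for a few inputs is one way to check the assembly language file.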


Chapter 5    First C Program

 

This chapter discusses the Digital I/O port peripheral hardware of the microcontroller and the GNU tool-chain.  We have used both already.

 

Hardware section:

 

The microcontroller we are using has 28 electrical pins.  Five pins are used for power.  The remaining 23 pins are divided into Port B with 8 pins, Port C with 7 pins and Port D with 8 pins.  Four of the port pins can be programmed as dedicated hardware signals related to clock and reset.  This is done by an external programmer changing the Fuse bits.  Most of the Fuse bits cannot be changed by the program running on the microcontroller.  The microcontroller on the development board we are using has been programmed to have a dedicated external reset input, which reduces the number of available Port C pins from seven to six.

 

Most of the 22 pins plus the reset input are connected to parts on the development board.   Although all can be used as digital inputs and outputs, seven are connected to parts that are intended to be used this way:

Port B:             pin PB1                                    as output          speaker

Port C:             pins PC0, PC1, PC2, PC3       as outputs         LEDs

Port D:             pins PD2, PD3                         as inputs           pushbuttons

 

Processor Architecture Section:

 

Digital I/O Ports

The Digital I/O port peripheral hardware gives the program running in the microcontroller the ability to input and output digital signals on the 23 port pins.  The microcontroller has several more complex peripheral devices that also use these port pins.  When any of those peripheral devices are enabled, whichever pins they use are no longer used as Digital I/O ports.  The Digital I/O ports and all the other peripheral devices can be enabled, disabled and programmed at any time by the running program.

 

Each of the three digital I/O ports has three control registers.  Each of the eight bits of each register corresponds to one electrical pin.  For example, the lowest bit (bit 0, corresponding to 2^0 = 0x01) in the three registers for Port B controls pin PB0, and bit 1 (corresponding to 2^1 = 0x02) in these same registers controls pin PB1, which is connected to the speaker.

 

The three registers that control Port B are called PINB, PORTB and DDRB.  There are similar registers, PINC, PORTC, DDRC, PIND, PORTD and DDRD, for Ports C and D.

 

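Reading a PIN register gets eight bits that correspond to the voltage level on the microcontroller pins associated with that port; 1 for high voltage level, 0 for low voltage level.  (Writing to a PIN register does not set the pin level directly.  On this microcontroller, writing a 1 to a bit of a PIN register toggles the corresponding bit of the PORT register.)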

 

The PORT and DDR registers control how the pin is driven.  If the bit for the pin in the DDR register is 1, the pin is driven to a low or high voltage level depending on the bit in the PORT register (0 for low, 1 for high).  DDR means Data Direction Register, with a bit value 1 meaning output and 0 meaning input.  Reading either register gives whatever value was previously written, regardless of the voltage on the pin.

 

When the bit for the pin in the DDR register is 1, the pin is strongly driven high or low.  In this case, the pin can drive external circuitry, and forcing the pin to the opposite level from outside draws excessive current, which can overheat the microcontroller.

 

When the bit for the pin in the DDR register is 0, the pin is either not driven or weakly driven to the high level.  In either case, the pin can be driven either high or low by external circuitry.  The weak drive is useful when the pin is connected to a switch connected to ground (low voltage level).  When the switch is closed, the connection overcomes the weak drive to high and reading PIN gives 0.  When the switch is open, the weak drive pulls the pin to high level and reading PIN gives 1.  Without the weak high drive, the voltage on the pin would float freely when the switch is open, and reading PIN could give either level.

 

When the bit for the pin in the DDR register is 0, the pin is not driven (high impedance) when the bit in PORT is 0, and the pin is weakly driven high when the bit in PORT is 1.

 

DDR         PORT         drive

0                0          none

0                1          weak high

1                0          strong low

1                1          strong high
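
The table above can be written as a small C function.  This is only an illustrative sketch for checking one's understanding on a PC; the names PinDrive and pin_drive_t are hypothetical, and the register values are passed in as parameters rather than read from the hardware.

```c
#include <stdint.h>

typedef enum
{
   DRIVE_NONE,         /* DDR = 0, PORT = 0: high impedance */
   DRIVE_WEAK_HIGH,    /* DDR = 0, PORT = 1: weak pull up   */
   DRIVE_STRONG_LOW,   /* DDR = 1, PORT = 0                 */
   DRIVE_STRONG_HIGH   /* DDR = 1, PORT = 1                 */
} pin_drive_t;

/* Returns how pin number bit (0-7) is driven, given the values
   written to the DDR and PORT registers of its port. */
pin_drive_t PinDrive( uint8_t ddr, uint8_t port, uint8_t bit )
{
   uint8_t mask = (uint8_t)( 1 << bit );

   if( ddr & mask )
      return ( port & mask ) ? DRIVE_STRONG_HIGH : DRIVE_STRONG_LOW;

   return ( port & mask ) ? DRIVE_WEAK_HIGH : DRIVE_NONE;
}
```

For example, with DDRC = 0x0F and PORTC = 0xF5, pin PC0 is driven strongly high, PC1 strongly low, and PC4 weakly high.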

 

Development Board: LEDs and Pushbuttons

For the development board, Port C bits 0-3 should be set as outputs:

      DDRC = 0x0F;

The LEDs can be turned off or on by writing the PORTC register.

      PORTC = value | 0xF0;

The 0xF0 causes the unused pins to be driven weakly high.  This is the safest drive for unconnected pins.  If they are driven strongly, the pins are subject to over current when accidentally shorted.  If they are floating, they can go to an intermediate voltage which unnecessarily increases the power used by the microcontroller.

 

For the development board, Port D bits 2-3 should be set as inputs with weak pull up (high drive):

      DDRD = 0x00;

      PORTD = 0x0C;

The push buttons can be read by reading the PIND register:

      value = PIND;

Because pushing the button closes the circuit to ground, pushed reads as 0 and not pushed as 1.  This is the opposite of what we expect, i.e. non-zero = TRUE.  For example, in the code:

#define BUTTON_D2 (1<<2)

 

    if( PIND & BUTTON_D2 )

    {

      ...

    }

the true condition of the if would be taken when the button is not pushed.  One nice way to correct this is to use the C bitwise complement (NOT) operator, which just reverses the sense of all the bits:

    if( (~PIND) & BUTTON_D2 )

(The extra parentheses are not necessary in this case, but it is much better to use parentheses when unnecessary than to omit them when necessary.  They also make clearer to a human reader what is to be done.)
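
The inversion can be tried on a PC by passing the value read from PIND as a parameter.  This is a sketch only; the function name ButtonsDown is hypothetical and is not part of the program in this chapter.

```c
#include <stdint.h>

#define BUTTON_D2  (1<<2)
#define BUTTON_D3  (1<<3)
#define BUTTON_ALL (BUTTON_D2 | BUTTON_D3)

/* pind is the value read from the PIND register.
   Returns a mask with a 1 bit for each button that is pushed
   (a pushed button reads as 0 in PIND, so the bits are inverted). */
uint8_t ButtonsDown( uint8_t pind )
{
   return (uint8_t)( (~pind) & BUTTON_ALL );
}
```

With this form, if( ButtonsDown( PIND ) & BUTTON_D2 ) is true when the button on PD2 is pushed, which matches the usual convention that non-zero means TRUE.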

 

 

C Language Section:

 

In this chapter we will learn about the GNU C compiler rather than C language.

 

The purpose of the compiler is to translate source code written in C into an executable program that can be put into the microcontroller.

 

Everything presented below is handled automatically by the IDE (integrated development environment), AVR Studio.  You may need to know these details if you use the GNU tools with a different processor (with a different IDE or no IDE) and if you need to specify non standard options to the programs that perform the steps of compilation.

 

GNU C Compiler

The file name for the GNU C compiler is gcc.exe.  For the version we use, it has been changed to avr-gcc.exe in order not to conflict with the native Linux GNU C compiler.  The file names of all the other tools (cpp.exe, as.exe, ld.exe, etc.) also have the avr- prefix.

 

These are the steps that convert source code written in C into an executable program:

            Step                             What it does                                       Reference

code files, resolves inter file references

and decides locations in memory

 

The compiler, gcc, is capable of executing the first four steps, which is sufficient for a program to be run on a PC.  For an embedded processor, the final two steps convert the executable code into an ASCII format that is used by hardware device programmers.

 

In the case where more than one file is compiled and linked together, the Compiler, gcc is used to produce object code (first three steps) for each file and the Linker, ld, links them.

 

Besides using C language files, assembly language files can also be converted to object files (step 3, Assembler) and previously created object files can be directly taken as input to the Linker.

 

Previously created object files can be combined into library files.  The linker can search library files and select only those object files it needs.  The GNU tool, ar, can be used to create and examine library files.

 

The GNU tool, nm, can be used to examine object files.  Both ar and nm are documented in Binutils.

 

Look in directory C:/WinAVR/doc and subdirectories for all GNU documentation.

 

Non Standard Options

Some options can be changed in the AVR Studio IDE.  In the IDE, choose menu item Project->Configuration Options.  For other options, you must modify the makefile; see the Linker Options section below.

 

Optimization Level

In the "General" pane, you can change the compiler optimization level.  Optimization level -O0 (minus, uppercase O, zero) turns off all optimization.  This is useful when tracing the execution of a program, especially at the assembly language level.  Optimization level -Os is usually the best for a smaller, faster program, but it is hard to trace because statements can be rearranged and variable values can be left in registers where they are harder to find.

 

-save-temps

One useful option is -save-temps, mentioned above.  In the IDE, choose menu item Project->Configuration Options, select the Custom Options pane, then type -save-temps in the area next to the Add button and click the Add button.

 

The -save-temps option causes two temporary files to be preserved, which allows you to examine them:  the C language output of the preprocessor (extension .i), and the assembly language output of the C compiler (extension .s).

 

If there is an error in inline assembly language, it is detected at the assembler stage.  Without specifying -save-temps, the IDE cannot show you the assembly language line that contains the error.

 

If you have some knowledge of assembly language, reading the assembly language produced from complicated C code can help you understand what it meant to the compiler.  Reading the assembly language can help when you need to optimize a section of code to run faster.  Sometimes this optimization might involve changing the C code, other times it might require inline assembly language.  Often you will see that the compiler has already done the best job possible.

 

The preprocessor does text substitution for #define macros and inserts the text of #include files into the main file.  When there is an error, the C compiler reports the problem it sees in the file after this substitution.  It can be difficult to figure out exactly what is happening, especially when included files conditionally include other files and macros involve other macros.  Reading the output of the preprocessor allows you to see exactly what the compiler sees.

 

Linker Options

There are two methods to send special instructions to the linker: passing command line flags and creating a script file.  Some instructions are done using one method, some using the other.

 

In both cases, you will need to modify the makefile.  In the AVR Studio IDE, choose Build->Export Makefile.  Then choose Project->Configuration Options.  Select the "General" button, check "Use External Makefile", then click the "..." button and select the makefile that you exported (default file name Makefile).  On the left pane, right click "Other Files", select "Add Existing File" and select the makefile.  Open the makefile (double click the name), and find the line similar to

LDFLAGS +=  -Wl,-Map=projname.map

Edit the line adding the desired linker command line flags, prefixed with -Wl, (minus, uppercase W, lowercase L, comma, with no space after the comma).  For example

LDFLAGS += -Wl,-T,linker.scp -Wl,-Map=projname.map

adds the linker command line flag -T,linker.scp which tells the linker to use a script file named linker.scp. 

 

The prefix -Wl, is needed because the makefile invokes the compiler, avr-gcc.exe, rather than avr-ld.exe.  The compiler then runs the linker.  The prefix -Wl, is a command line flag for the compiler to pass the rest of the flag to the linker when it runs the linker.  It's unpleasantly complicated, but that is what you need to do.

 

Look at the linker documentation at

C:/WinAVR-20080610/doc/binutils/ld.html/index.html or enter

avr-ld.exe --help at the command prompt.

 

Linker Script

The best way to create a linker script is to modify the default linker script.  You can get the default script by entering at the command prompt

avr-ld.exe --verbose >linker.scp

Then edit it discarding the lines

==================================================

and the small amount of text outside them.

 

Edit linker.scp as desired and modify

LDFLAGS += -Wl,-T,linker.scp -Wl,-Map=projname.map

as explained above.

 

If you want only to append to the default linker script, you should include the script to append as an input file to the linker rather than using the -T command line flag.  Find and modify the line in the makefile

LINKONLYOBJECTS =

to

LINKONLYOBJECTS = append.scp

It is still useful to dump and examine the default linker script.

 

Program Example Section:

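We will write a program that uses the buttons and the LEDs on the development board.  Just as an example, we specify that the program should do the following (as implemented in the listing at the end of this section):

Display a four-bit number (0 to 15) on the four LEDs.

Increment the number each time the button on PD3 is pushed.

Decrement the number each time the button on PD2 is pushed.

Reset the number to zero when both buttons are down.

(15 wraps to 0 and the reverse.)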

 

There are a few issues that are not obvious.

 

The program is a loop.  If we increment the number every time we see the button down, it will change hundreds of thousands of times per second while the button is down.  Therefore, we increment the number only the first time we see the button down, and not again until we have seen it released.  The following code accomplishes this for both buttons:

      buttons = (~PIND) & BUTTON_ALL;

      if( buttons == oldButtons ) continue;

      oldButtons = buttons;

      ...

The program only reaches the line ..., if the buttons have changed.  Then the program should increment the number if the button has gone from released to down but not the reverse.

 

The next problem is resetting when both buttons are down.  Normally the user will not release both buttons at exactly the same time.  So the number would be incremented or decremented by one after reset, depending on which button was released second.  Adding the variable, isReset, allows us to block increment and decrement until both buttons are released.
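
The counting and reset rules can be separated from the hardware and tried on a PC.  The following sketch mirrors the switch statement in the program listing of this section; the type counter_t and the function ButtonsChanged are hypothetical names introduced only for this illustration, and the mask with LED_ALL is applied to the stored number rather than at the PORTC write.

```c
#include <stdint.h>

#define BUTTON_D2 (1<<2)
#define BUTTON_D3 (1<<3)
#define LED_ALL   0x0F

typedef struct
{
   uint8_t leds;      /* four-bit number shown on the LEDs */
   uint8_t isReset;   /* non-zero from reset until both buttons are up */
} counter_t;

/* Call once for each change in the button state.
   buttons has a 1 bit for each button that is down. */
void ButtonsChanged( counter_t * ct, uint8_t buttons )
{
   switch( buttons )
   {
   case 0:                          /* both buttons up */
      ct->isReset = 0;
      break;

   case BUTTON_D2:                  /* count down unless blocked */
      if( !ct->isReset ) --ct->leds;
      break;

   case BUTTON_D3:                  /* count up unless blocked */
      if( !ct->isReset ) ++ct->leds;
      break;

   case BUTTON_D2 | BUTTON_D3:      /* both down: reset and block */
      ct->isReset = 1;
      ct->leds = 0;
      break;
   }
   ct->leds &= LED_ALL;             /* 15 wraps to 0 and the reverse */
}
```

Feeding this function the sequence "both down, then only D3 down, then both up" leaves the number at zero, which is the behavior that isReset is meant to guarantee.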

 

Finally there is the contact bounce problem.  Contact bounce is an undesired property of mechanical switches, due to the fact that the moving part often bounces for a short time after the first time that it makes the contact.  Since our program counts the number of times the contact is made (and it is fast enough to see all the bounces), contact bounce can cause multiple counts each time the button is pressed.  This is not what we want.

 

This problem can be avoided by adding external capacitors and resistors, but these add to the cost of the parts and of manufacture.  It is usually better to use a software solution.

 

After the first change in the state of the two buttons is detected, the code delays long enough for the bouncing to have stopped and checks again.  Every time there is a difference, it waits and tries again.  If the delay is long enough, the state is the same after the delay and the program goes on to act on the new state.
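
The debounce rule can also be tried on a PC by replacing the reads of PIND with an array of readings taken one delay apart.  This is a sketch under that assumption; the function name SettledState and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* firstReading is the button state seen when the change was detected.
   samples[0..n-1] are the readings taken after each following delay.
   Returns the first reading that matches the reading before it, or
   the last reading taken if the samples run out first. */
uint8_t SettledState( uint8_t firstReading, const uint8_t * samples, size_t n )
{
   uint8_t oldButtons = firstReading;
   size_t i;

   for( i = 0; i < n; ++i )
   {
      uint8_t buttons = samples[i];
      if( buttons == oldButtons ) break;   /* unchanged for one delay */
      oldButtons = buttons;                /* still bouncing, try again */
   }
   return oldButtons;
}
```

A bouncing press such as down, up, down, down settles on "down" at the second consecutive matching reading.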

 

You can observe contact bounce by removing this code.  Just add the line

#define NO_DEBOUNCE

at the beginning to temporarily eliminate the debounce code.  The buttons on the development board don't always bounce, but you should be able to see a problem after a few tries.  You can also see counts due to bounce when the button is released.

 

The function, void Delay( uint32_t dd ), is written using inline assembly language.  It could have been written in C, provided the count variable was declared volatile as in the program example in Chapter 2.  The assembly language loop is many times faster which allows better resolution and shorter possible delays.  (The C version is slower because the volatile keyword forces four bytes to be loaded into registers, decremented and then stored back each time through the loop.  The assembler version only decrements one byte per loop except when there is a borrow, and it keeps everything in registers.)

 

 

#include <inttypes.h>

#include <avr/io.h>

 

#define PORTB_UNUSED ( 0xFF )

 

#define LED_C0 (1<<0)

#define LED_C1 (1<<1)

#define LED_C2 (1<<2)

#define LED_C3 (1<<3)

#define LED_ALL (LED_C0 | LED_C1 | LED_C2 | LED_C3 )

#define PORTC_UNUSED ( 0xF0 )

 

#define BUTTON_D2 (1<<2)

#define BUTTON_D3 (1<<3)

#define BUTTON_ALL (BUTTON_D2 | BUTTON_D3)

#define PORTD_UNUSED ( 0x30 )

 

void Delay( uint32_t dd );

 

int main( void )

{

   uint8_t leds, buttons, oldButtons, isReset;

 

   DDRB = 0x00;

   PORTB = PORTB_UNUSED;

   DDRC = LED_ALL;

   PORTC = PORTC_UNUSED;

   DDRD = 0x00;

   PORTD = PORTD_UNUSED | BUTTON_ALL;

 

   leds  = 0;

   oldButtons = 0;

   isReset = 0;

 

   for(;;)

   {

      // wait for buttons to change

      buttons = (~PIND) & BUTTON_ALL;

      if( buttons == oldButtons ) continue;

      oldButtons = buttons;

 

#ifndef NO_DEBOUNCE

      // wait for buttons not to change for some time

      for(;;)

      {

         Delay( 100000 );

         buttons = (~PIND) & BUTTON_ALL;

         if( buttons == oldButtons ) break;

         oldButtons = buttons;

      }

#endif

 

      switch( buttons )

      {

      case 0:

         isReset = 0;

         break;

 

         // once reset is initiated, don't count

         // until both buttons up

      case BUTTON_D2:

         if( !isReset ) --leds;

         break;

 

      case BUTTON_D3:

         if( !isReset ) ++leds;

         break;

 

      case BUTTON_D2 | BUTTON_D3:

         isReset = 1;

         leds = 0;

         break;

      }

      PORTC = PORTC_UNUSED | ( leds & LED_ALL );

   }

}

 

 

 

 

void Delay( uint32_t dd )

{  // do not use dd = 0

   asm volatile ("1: \n\t"

                 "subi %A0,1 \n\t"

                 "brcc 1b \n\t"

                 "subi %B0,1 \n\t"

                 "brcc 1b \n\t"

                 "subi %C0,1 \n\t"

                 "brcc 1b \n\t"

                 "subi %D0,1 \n\t"

                 "brcc 1b" : "=d" (dd) : "0" (dd) );

}
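
The way the four subi/brcc pairs in Delay work together can be checked on a PC with a C model that decrements one byte at a time and moves on to the next byte only when there is a borrow.  This is a sketch for understanding only; the function name DelayLoopPasses is hypothetical, and it counts passes through the first subi instruction rather than measuring time.

```c
#include <stdint.h>

/* Models the loop in Delay.  Returns how many times the first
   instruction, "subi %A0,1", is executed for a given dd. */
uint32_t DelayLoopPasses( uint32_t dd )
{
   uint8_t b[4];
   uint32_t passes = 0;
   int i;

   b[0] = (uint8_t)dd;            /* %A0, low byte  */
   b[1] = (uint8_t)( dd >> 8 );   /* %B0            */
   b[2] = (uint8_t)( dd >> 16 );  /* %C0            */
   b[3] = (uint8_t)( dd >> 24 );  /* %D0, high byte */

   for(;;)
   {
      ++passes;
      for( i = 0; i < 4; ++i )
      {
         uint8_t before = b[i];
         b[i] = (uint8_t)( before - 1 );   /* subi byte,1 */
         if( before != 0 ) break;          /* carry clear: brcc 1b taken */
      }
      if( i == 4 ) return passes;          /* all four bytes borrowed */
   }
}
```

In this model, the number of passes comes out as dd + 1 for the values tried, and most passes execute only the first subi and brcc, which is why the loop is fast.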

 

 

 

 

 

 

 

 


 

 

Chapter 6    Interrupts

 

This chapter discusses interrupts and interrupt processing.  Although interrupt support is not necessary for a processor to do useful work, it is important enough that all common processors, from the smallest microcontroller to the latest billion transistor CPUs, support interrupts.

 

At a minimum, interrupt support is a mechanism to interrupt the normal sequence of program instruction processing in response to an event that is not caused by the program instruction processing.  Interrupt support also allows resuming the interrupted program.

 

Interrupts are used to initiate processing of unpredictable external events quickly and efficiently.  They allow external conditions to be monitored by hardware without the need for the program to spend any time, until the condition actually occurs.  When the condition occurs, there is no delay waiting until the program gets to the part that checks the condition.

 

Sometimes, predictable events are also used to cause interrupts.  In particular, a regularly occurring timer interrupt is used to trigger the switching of tasks in preemptive multitasking.

 

Many processors also support exceptions which also involve interrupting the normal sequence of program instruction processing.  Exceptions differ from interrupts in that an exception is caused by the execution of a program instruction.  One of the most important types of exception is used by large processors to trigger swapping in unmapped data in virtual memory operating systems.  Most small microcontroller processors do not support exceptions.

 

Hardware section:

 

The AVR microcontrollers have almost the minimum possible hardware support for interrupts.  Hardware support for interrupts consists of:

And when an interrupt occurs:

Global Interrupt Flag

The global interrupt enable flag in the Status Register can be set or cleared under program control.  The flag is clear at Reset when operation starts.  This prevents all interrupts and allows the program to make whatever preparations are needed to successfully handle interrupts.

 

Edge Triggered Interrupt Requests

The various peripherals which can request interrupts can each be programmed to allow or not allow interrupts.  Most interrupts are event (edge) triggered.  This means that the external event sets a flag which then requests the interrupt.  The request will remain until cleared by the initiation of processing the interrupt (or cleared by action of the program).  If interrupts are disabled for the peripheral or globally by the interrupt enable bit in the Status Register, the request flag remains and can cause an interrupt later when the interrupt is enabled.

 

Level Triggered Interrupt Requests

A few interrupt sources are level triggered.  In this case, the interrupt signal is not latched in a flag; it acts directly as an interrupt request.  If the signal returns to its inactive state before the interrupt processing can begin (because interrupts are disabled or a higher priority interrupt request exists), no interrupt happens.  If the signal is still active after the interrupt processing is finished, another interrupt is requested and can occur.
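
The difference between the two kinds of request can be shown with a small model that runs on a PC.  This is a simplified sketch, not a description of the actual hardware logic; the names edge_request_t, EdgeEvent, EdgeService and LevelService are hypothetical.

```c
#include <stdbool.h>

/* Edge (event) triggered source: the event is latched in a flag. */
typedef struct
{
   bool flag;
} edge_request_t;

/* The external event occurs: the request flag is set and stays set. */
void EdgeEvent( edge_request_t * rq )
{
   rq->flag = true;
}

/* Returns true if the interrupt is taken.  Starting to process the
   interrupt clears the flag; a disabled interrupt leaves it pending. */
bool EdgeService( edge_request_t * rq, bool enabled )
{
   if( enabled && rq->flag )
   {
      rq->flag = false;
      return true;
   }
   return false;
}

/* Level triggered source: nothing is latched, so the interrupt is
   taken only if the signal is active at the moment it is enabled. */
bool LevelService( bool signalActive, bool enabled )
{
   return enabled && signalActive;
}
```

An event that arrives while interrupts are disabled is still serviced later in the edge triggered case, but is lost in the level triggered case if the signal has gone inactive by then.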

 

Interrupt Vector Table

The ATMega168 has 25 separate interrupt sources.  Each of these can cause the start of execution of a different block of code.  Reset of the microcontroller is treated similarly, in that it also causes the start of execution of a block of code.  Counting Reset, there are 26 possible addresses for the start of code to handle interrupts or the start of the program at reset. The first 52 locations in program memory are used for this purpose.

 

Reset causes execution to start at location 0x0000, the highest priority interrupt (called INT0) causes execution to start at location 0x0002, and so on to the lowest priority interrupt (called SPM READY), which causes execution to start at location 0x0032 (decimal 50).

 

Two memory locations is just enough space for a single assembly language Jump instruction.  Normally, the first 52 locations in program memory hold 26 Jump instructions to different blocks of code, the first being the start of the program and the rest to handle interrupts. A program that never allows interrupts could simply begin at location 0x0000, but the code normally created by the C compiler always includes this block of Jump instructions.  This block of Jump instructions is called the Interrupt Vector Table.
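
The arithmetic behind these addresses is simple enough to write down.  The following sketch uses hypothetical names (VectorAddress, VectorTableSize) and takes the figures of 26 vectors and two program memory locations per vector from the text above.

```c
#include <stdint.h>

#define VECTOR_COUNT          26   /* Reset plus 25 interrupt sources */
#define LOCATIONS_PER_VECTOR   2   /* room for one Jump instruction   */

/* Program memory location where execution starts for a vector
   number (0 = Reset, 1 = INT0, ... 25 = SPM READY). */
uint16_t VectorAddress( uint8_t vectorNumber )
{
   return (uint16_t)( vectorNumber * LOCATIONS_PER_VECTOR );
}

/* Number of program memory locations used by the whole table. */
uint16_t VectorTableSize( void )
{
   return VECTOR_COUNT * LOCATIONS_PER_VECTOR;
}
```

For example, vector number 25 (SPM READY) gives location 0x0032, and the whole table occupies 52 locations.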

 

Processor Architecture Section:

 

Interrupt Processing

When one or more interrupts are enabled, their requests are active, and interrupts are enabled globally, the following steps occur:

 

Restoring Context

At this point the main program should continue to execute as it would if the interrupt had not occurred.  In order for the program to continue unaffected, the interrupt handler code must restore the state of all the registers that it modified during its operation (assuming these registers are used anywhere in the main program also) to their original state.

 

Code produced by the C compiler uses all 32 of the general purpose registers.  Therefore (unless the main program is not written in C), interrupt handlers must save and restore every register they use.  In addition, they must save and restore the Status Register since its flags temporarily hold the status result of certain instructions.

 

In rare instances, it is possible to write interrupt handlers in assembly language that do not change the Status Register or any general purpose registers.  Otherwise, interrupt handler code always saves the Status Register and whichever general purpose registers will be used at the beginning, and restores them at the end, finally executing a RETI (Return From Interrupt) instruction.

 

C Language Section:

 

The C language, by itself, has no support for interrupt handler code.  The GNU C compiler provides a language extension to support interrupt handler code.  Without this extension, all interrupt handlers would need assembly language code for saving and restoring context, for ending it with a RETI instruction, and for inserting the proper Jump instruction in the Interrupt Vector Table.

 

The GNU C compiler extension, however, does all of this, and allows us to write programs that include interrupt handlers entirely in C, except for the special keywords used to declare interrupt handlers.

 

Declaring Interrupt Handlers

Here is the definition of an interrupt handler in GNU C, using definitions in the header file, <avr/interrupt.h>:

ISR( INT0_vect, ISR_BLOCK )

{

      ++gbl.intrCount;

}

 

The name, INT0_vect, is defined as __vector_1 in the include file, <avr/iomx8.h>, which defines constants for microcontrollers in the ATMega48 family.  This file is included by <avr/io.h>, which is included in any C program that uses these constants.

 

Here is a list of the Interrupt Vector Table and the interrupt names from <avr/iomx8.h>:

                        /* reset - not an interrupt */            0

INT0_vect               /* External Interrupt Request 0 */        1

INT1_vect               /* External Interrupt Request 1 */        2

PCINT0_vect            /* Pin Change Interrupt Request 0 */      3

PCINT1_vect            /* Pin Change Interrupt Request 1 */      4

PCINT2_vect             /* Pin Change Interrupt Request 2 */      5

WDT_vect                /* Watchdog Time-out Interrupt */         6

TIMER2_COMPA_vect       /* Timer/Counter2 Compare Match A */      7

TIMER2_COMPB_vect       /* Timer/Counter2 Compare Match B */      8

TIMER2_OVF_vect         /* Timer/Counter2 Overflow */             9

TIMER1_CAPT_vect       /* Timer/Counter1 Capture Event */        10

TIMER1_COMPA_vect       /* Timer/Counter1 Compare Match A */      11

TIMER1_COMPB_vect       /* Timer/Counter1 Compare Match B */      12

TIMER1_OVF_vect         /* Timer/Counter1 Overflow */             13

TIMER0_COMPA_vect       /* Timer/Counter0 Compare Match A */      14

TIMER0_COMPB_vect       /* Timer/Counter0 Compare Match B */      15

TIMER0_OVF_vect         /* Timer/Counter0 Overflow */             16

SPI_STC_vect            /* SPI Serial Transfer Complete */        17

USART_RX_vect           /* USART Rx Complete */                   18

USART_UDRE_vect         /* USART, Data Register Empty */          19

USART_TX_vect           /* USART Tx Complete */                   20

ADC_vect                /* ADC Conversion Complete */             21

EE_READY_vect           /* EEPROM Ready */                        22

ANALOG_COMP_vect       /* Analog Comparator */                   23

TWI_vect                /* Two-wire Serial Interface */           24

SPM_READY_vect          /* Store Program Memory Ready */          25

 

Interrupt Handler Attributes

The attribute, ISR_BLOCK, is defined as nothing; it denotes the default behavior, which is for the interrupt handler not to enable interrupts after the hardware disables them at its beginning.  Three attributes are available:

 

The attribute, ISR_NOBLOCK, causes interrupts to be re-enabled immediately in the interrupt handler.  This can be useful since using the first C statement to re-enable interrupts involves a significant delay needed to save registers etc.  In practice, interrupt handlers are usually designed to finish so quickly that there is little need to re-enable interrupts sooner.  Allowing nested interrupts has the disadvantage of doubling the considerable amount of space on the stack used to hold the saved context.

 

The attribute, ISR_NAKED, causes the compiler not to create any prolog or epilog.  No code to save and restore context is created and no RETI instruction is put at the end.  The first object code of the interrupt handler is the first line of C code.  In this case, there must be inline assembly language code at the beginning and end of the handler to save and restore context and to execute the RETI instruction.

 

When we look at multithreading, where each thread has a stack, we will use naked interrupt handlers to save most of the context in a fixed area rather than the stack.  Otherwise the space must be reserved in each stack (since the interrupt can come at any time), which multiplies the amount of reserved space by the number of tasks.

 

This is the location of the documentation for the header, <avr/interrupt.h>:

C:/WinAVR-20080610/

doc/avr-libc/avr-libc-user-manual/group__avr__interrupts.html

 

Structure of Interrupt Handler Code

We can look at the output of the preprocessor to see what really is given to the compiler:

void __vector_1(void) __attribute__((signal,used,externally_visible));

 

void __vector_1(void)

{

 ++gbl.intrCount;

}

 

The interrupt handler is a function with a special name and some special attributes.  The attributes are extensions to the C language.  In particular, the attribute, signal, causes the function to save and restore the Status Register and any general purpose registers used, to set r1 equal to zero (a condition assumed by the compiler), and to end with a RETI instruction rather than a RET instruction.

 

This is the documentation for the attribute extension to the C language:

C:/WinAVR-20080610/doc/gcc/HTML/gcc-4.3.0/gcc/Function-Attributes.html

 

Finally, we examine the assembly language code produced for the interrupt handler listed above, with comments added:

.global     __vector_1

      .type __vector_1, @function

__vector_1:

      push r1     /* save r1 – it will be set to zero*/

      push r0     /* save r0 – it is used below and possibly C code*/

      in r0,__SREG__    /* get Status Register */

      push r0           /* and save its value */

      clr r1      /* make sure r1 is zero, in case C code uses it */

      push r24    /* save r24 – it is used by C code below */

      push r25    /* save r25 – it is used by C code below */

/* C code starts here */

      /*    ++gbl.intrCount; */

      lds r24,gbl       /* load low byte of gbl.intrCount to r24*/

      lds r25,(gbl)+1   /* load high byte of gbl.intrCount to r25 */

      adiw r24,1        /* add one to register pair r24,r25 */

      sts (gbl)+1,r25   /* store r25 to high byte of gbl.intrCount */

      sts gbl,r24       /* store r24 to low byte of gbl.intrCount */

/* C code ends here */

      pop r25     /* restore r25 from stack */

      pop r24     /* restore r24 from stack */

      pop r0      /* get old value of Status Register from stack */

      out __SREG__,r0   /* and restore it to Status Register */

      pop r0      /* restore r0 from stack */

      pop r1      /* restore r1 from stack */

      reti        /* return to interrupted code, enable interrupts */

 

Interrupt Handler Stack Usage

Processing this interrupt temporarily uses seven bytes on the stack: two bytes for the return address saved by hardware, one for the Status Register, two for r0 and r1 which are sometimes used by C code, and two for r24 and r25 which were used by this particular C code.

 

Calling a C function that did the same thing would have used only two bytes on the stack: the two bytes for the return address saved by the CALL instruction.  Although r24 and r25 would still be modified, the compiler never expects those particular registers to be preserved after the call.  Therefore, normal functions do not need to preserve them.

 

Avoid Function Calls from Interrupt Handler

The worst thing for stack usage is when an interrupt handler calls a separate function.  Twelve registers total (including r24 and r25) are expected not to be preserved by the called function.  If our interrupt handler had called a function, the compiler would have been forced to assume any of these might have been modified.  (The compiler never looks outside the function it is compiling.  Often the called functions are not compiled at the same time.)  Therefore the interrupt handler would need to save all twelve.

 

Sometimes an interrupt handler and the main line code need to execute the same block of code.  Normally, that block of code should be a function, called from both places, but we want to avoid that in the case of the interrupt handler.  On the other hand, copying a block of code to two places is bad practice because it makes the code hard to maintain.  (If you change the code someday, how do you know that there is another copy and where it is?)

 

extern inline Functions

The solution to this dilemma is to use an inline function.  Please see the Compiler documentation: C:/WinAVR-20080610/doc/gcc/HTML/gcc-4.3.0/gcc/Inline.html

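Declaring a function extern inline tells the compiler not to generate separate object code that could be called.  (This describes the traditional GNU C rules, which gcc applies in its gnu89 mode; under the C99 rules extern inline has a different meaning, so check which mode the compiler is using.)  This is particularly useful because it allows the function to be defined in multiple files without creating a linker error.  This way, the function can be defined in a header file and included in several C files.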
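As a minimal sketch of the idea, the fragment below shows one small helper used from both a stand-in for an interrupt handler and a stand-in for main line code.  It is written with static inline rather than extern inline, a substitution made here because static inline behaves the same under both the GNU C and the C99 inline rules.  The names SatInc8, HandlerBody, MainLineBody and InlineSelfCheck are hypothetical, and the code is plain C so it can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Would normally live in a header file included by several C files.
   static inline (an assumption; the notes use extern inline) gives each
   C file its own private copy, so no linker conflict arises. */
static inline uint8_t SatInc8( uint8_t v )
{
   /* increment, but stop at 255 instead of wrapping to 0 */
   return (uint8_t)( v < UINT8_MAX ? v + 1 : v );
}

/* hypothetical stand-in for the body of an interrupt handler */
static uint8_t HandlerBody( uint8_t count )
{
   return SatInc8( count );   /* can be expanded in place, no call */
}

/* hypothetical stand-in for main line code sharing the same block */
static uint8_t MainLineBody( uint8_t count )
{
   return SatInc8( SatInc8( count ) );
}

/* small self-check: both callers share the single definition */
static void InlineSelfCheck( void )
{
   assert( HandlerBody( 254 ) == 255 );
   assert( MainLineBody( 254 ) == 255 );   /* saturates, does not wrap */
}
```

Because the helper is defined in only one place, a later change to it reaches both the handler and the main line code.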

 

Program Example Section:

 

External Interrupts

This program uses External Interrupts to demonstrate level triggered and edge triggered interrupt requests.  External Interrupts are interrupts that can be generated by the Digital I/O Port inputs.

 

There are two types of External Interrupts: the original more versatile External Interrupts which are available on only two pins, and the Pin Change Interrupts which are available on all pins.

 

Pin Change Interrupts

The Pin Change Interrupts are always edge triggered and occur on both the rising and falling edges.  They are grouped so that a change on any pin of Port B creates the same interrupt, PCINT0.  Similarly for Port C with PCINT1 and Port D with PCINT2, so there are a total of three separate Pin Change Interrupts.

 

INT0 and INT1 Interrupts

The more versatile External Interrupts are INT0 associated with the bit 2 pin of Port D, and INT1 associated with the bit 3 pin of Port D.  INT0 and INT1 can be programmed to interrupt in one of four cases:

 

The program below uses INT0, which is connected to the D2 button of the development board, in either mode 0 or mode 2.
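The four cases for INT0 and INT1 are selected by a two bit field in the EICRA register.  The sketch below lists the field values as given in the ATmega48/88/168 data sheet (an assumption about the processor on the development board; modes 0 and 2 match INT0_LEVEL and INT0_EDGE defined in the program).  The enum names and the helper SetSense() are hypothetical, and the register is modeled as a plain byte so the fragment can be compiled and tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Sense control values for the two bit ISCx1:ISCx0 field, as listed in
   the ATmega48/88/168 data sheet (assumed to be the processor in use). */
typedef enum
{
   SENSE_LOW_LEVEL    = 0,   /* interrupt while the pin is low  */
   SENSE_ANY_CHANGE   = 1,   /* interrupt on any logical change */
   SENSE_FALLING_EDGE = 2,   /* interrupt on falling edge       */
   SENSE_RISING_EDGE  = 3    /* interrupt on rising edge        */
} t_Sense;

#define ISC_FIELD_MASK 0x03u
#define INT0_FIELD_POS 0      /* ISC01:ISC00 are bits 1:0 of EICRA */
#define INT1_FIELD_POS 2      /* ISC11:ISC10 are bits 3:2 of EICRA */

/* Returns a new EICRA value with one field replaced and the other
   field left unchanged.  eicra is a plain byte standing in for the
   real register. */
static uint8_t SetSense( uint8_t eicra, uint8_t fieldPos, t_Sense mode )
{
   assert( fieldPos == INT0_FIELD_POS || fieldPos == INT1_FIELD_POS );
   eicra &= (uint8_t)~( ISC_FIELD_MASK << fieldPos );
   eicra |= (uint8_t)( ( (uint8_t)mode & ISC_FIELD_MASK ) << fieldPos );
   return eicra;
}
```

On the real processor the result would be written back to the register, for example EICRA = SetSense( EICRA, INT0_FIELD_POS, SENSE_FALLING_EDGE ).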

 

The other button, D3, selects one of four states:

 

The program demonstrates the following properties:

 

Here are some comments about the program.

 

Grouping Global Variables in a Structure

The interrupt handler can act only on global variables.  We put all the global variables into a single structure.  (In this case, there is only one.)  This simply helps keep track of them.  Global variables are generally bad for maintainability and should be closely watched.

 

Avoiding Static Variables

The function, CheckState(), uses several parameters which are preserved between calls, therefore they cannot be saved in local storage.  There are three places they could have been kept:

We use the third option.  In larger programs the first two options tend to reduce maintainability.  Although it would not be bad here, it is a good habit to avoid them whenever possible, even though it makes the code a bit more complicated.
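The approach used by CheckState(), where the caller owns a structure holding the preserved values and passes a pointer to it, can be sketched with a simplified example.  This is a hypothetical edge counter (the names t_EdgeState, EdgeInit and EdgeSeen are not from the program) written in plain C so it can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* values that must be preserved between calls, owned by the caller */
typedef struct
{
   uint8_t oldLevel;   /* level seen on the previous call */
   uint8_t count;      /* number of 0 -> 1 changes seen   */
} t_EdgeState;

static void EdgeInit( t_EdgeState * p )
{
   assert( p != 0 );
   p->oldLevel = 0;
   p->count = 0;
}

/* returns 1 if level changed from 0 to non-zero since the last call */
static uint8_t EdgeSeen( t_EdgeState * p, uint8_t level )
{
   uint8_t rising;

   assert( p != 0 );
   rising = (uint8_t)( level != 0 && p->oldLevel == 0 );
   p->oldLevel = (uint8_t)( level != 0 );
   if( rising ) ++p->count;
   return rising;
}
```

Because the state is passed in by pointer, two independent inputs can each have their own t_EdgeState and share the one function, which a version using static variables inside the function could not do.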

 

Similarly to the previous example program, the function, CheckState(), has code to debounce button D3.

 

In the code where the program prints gbl.intrCount and then sets it to zero, interrupts are temporarily blocked.  The reason for this will be explained in the next chapter.  Normally, interrupts are blocked by calling cli(), and enabled by calling sei().  (These are actually not functions but macros that insert a single inline assembly language instruction, CLI or SEI.)  In this case, it is not appropriate to use sei() to unblock interrupts because in two of the four test states, interrupts were blocked already.  What we want is:

The functions SaveAndCli() and IntrRestore() do this.  SaveAndCli() returns the original 8 bit value of the Status Register and IntrRestore() restores it.  The fact that the other seven bits of the Status Register are also restored is of no consequence.

 

Disabling Interrupts for a Block of Code

Using functions like SaveAndCli() and IntrRestore() is the only correct way to block interrupts in a section of code unless you are sure that interrupts will always be enabled when the code is called.  In a large program, you usually do not know.
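Why restoring the saved Status Register is safer than calling sei() can be shown with a small simulation that runs on a PC.  Here the Status Register is modeled by an ordinary variable, simSreg, and SimSaveAndCli() and SimIntrRestore() are hypothetical stand-ins for SaveAndCli() and IntrRestore().  Bit 7 of the AVR Status Register is the global interrupt enable bit.

```c
#include <assert.h>
#include <stdint.h>

#define SREG_I 0x80u           /* global interrupt enable, bit 7 of SREG */

static uint8_t simSreg;        /* simulated Status Register (assumption) */

static uint8_t SimSaveAndCli( void )
{
   uint8_t old = simSreg;
   simSreg &= (uint8_t)~SREG_I;   /* what cli() does */
   return old;
}

static void SimIntrRestore( uint8_t old )
{
   simSreg = old;
}

/* a block of code that must not be interrupted; it may be called with
   interrupts either enabled or already blocked */
static void Inner( void )
{
   uint8_t save = SimSaveAndCli();
   /* ... protected work would go here ... */
   SimIntrRestore( save );
}

/* returns the enable bit seen just after Inner() returns, while the
   caller still expects interrupts to be blocked */
static uint8_t OuterCallsInner( void )
{
   uint8_t save = SimSaveAndCli();
   uint8_t seen;

   Inner();
   seen = (uint8_t)( simSreg & SREG_I );   /* must still be 0 here */
   SimIntrRestore( save );
   return seen;
}
```

Had Inner() ended with sei() instead of restoring the saved value, the enable bit would be set inside OuterCallsInner() while that function still expects interrupts to be blocked.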

 

 

#include <inttypes.h>

#include <avr/io.h>

#include <avr/interrupt.h>

#include "blcalls.h"

 

// Defines for bits within bytes

#define BIT(n) (1<<(n))

// BITS puts the value v into the bit field hi..lo

#define BITS( v, hi, lo ) ( ( (v) & ( (1<<((hi)-(lo)+1)) -1 ) ) << (lo) )

#define SET_BIT( r, n ) { (r) |= BIT(n); }

#define CLR_BIT( r, n ) { (r) &= ~BIT(n); }

#define TST_BIT( r, n ) ( (r) & BIT(n) )

 

 

// Defines for development board

#define PORTB_UNUSED ( 0xFF )

 

#define LED_C0 (1<<0)

#define LED_C1 (1<<1)

#define LED_C2 (1<<2)

#define LED_C3 (1<<3)

#define LED_ALL (LED_C0 | LED_C1 | LED_C2 | LED_C3 )

#define PORTC_UNUSED ( 0xF0 )

 

#define BUTTON_D2 (1<<2)

#define BUTTON_D3 (1<<3)

#define BUTTON_ALL (BUTTON_D2 | BUTTON_D3)

#define PORTD_UNUSED ( 0x30 )

 

 

 

 

 

#define INT0_LEVEL 0x00  // INT0 level triggered on low level

#define INT0_EDGE 0x02   // INT0 edge triggered on falling edge

 

// Defines for meaning of gbl.state

#define STATE_INTR_ENABLE 0x01

#define STATE_INT0_EDGE 0x02

 

 

// put all globals in one structure

typedef struct

{

   uint16_t intrCount;  // set by interrupt handler

} t_Gbl;

 

t_Gbl volatile gbl;

 

typedef struct

{

   uint8_t init;

   uint8_t oldButton;

   uint8_t state;

} t_CheckState;

 

 

uint8_t SaveAndCli(void);

void IntrRestore( uint8_t oldSreg );

uint8_t CheckState( t_CheckState * p );

void Delay( uint32_t dd );

 

 

 

 

void __attribute__((noreturn)) main()

{

   t_CheckState csVars;

 

   // set up ports

   DDRB = 0x00;

   PORTB = PORTB_UNUSED;

   DDRC = LED_ALL;

   PORTC = PORTC_UNUSED;

   DDRD = 0x00;

   PORTD = PORTD_UNUSED | BUTTON_ALL;

 

   // enable INT0

   EIMSK = BIT( INT0 );

 

   gbl.intrCount = 0;

   csVars.init = 1;

   for( ;; )      // forever loop

   {

      // check for button D3 changing state

      if( CheckState( &csVars ) )

        {

         char * stateNames[4] =

         {

            "Level Trig - Intr Disable",

            "Level Trig - Intr Enable",

            "Edge Trig - Intr Disable",

            "Edge Trig - Intr Enable"

          };

 

         // print current value of INTF0

         DbgPrtStr( "INTF0 was" );

         DbgPrtByteNL( (EIFR >>INTF0) & 1 );

 

         DbgPrtStrNL( "" );

         DbgPrtStrNL( stateNames[csVars.state] );

 

         // set global interrupts on or off

         if( csVars.state & STATE_INTR_ENABLE ) sei();

             else cli();

 

         // select edge or level triggering for INT0

         if( csVars.state & STATE_INT0_EDGE )

            EICRA = BITS( INT0_EDGE,  ISC01, ISC00 );

         else EICRA = BITS( INT0_LEVEL,  ISC01, ISC00 );

      }

 

      // Print interrupt count, if non zero

        if( gbl.intrCount > 0 )

        {

           uint8_t intrSave;

           uint16_t temp;

 

             // block interrupts while copy and clear count

             intrSave = SaveAndCli();

             temp = gbl.intrCount;

             gbl.intrCount = 0;

             IntrRestore( intrSave );

 

             DbgPrtWordNL( temp );

      }

   }

}

 

 

 

 

 

ISR( INT0_vect, ISR_BLOCK )

{

   ++gbl.intrCount;

}

 

 

 

 

// returns 1 if state is changed, 0 otherwise

uint8_t CheckState( t_CheckState * p )

{

   uint8_t button;

 

   if( p->init )

   {  // initialization

      p->init = 0;

      p->state = 0;

      p->oldButton = 0;

      return 1;

   }

 

   button = ~PIND & BUTTON_D3;

   if( button == p->oldButton ) return 0;

   Delay( 100000 );

   button = ~PIND & BUTTON_D3;

   if( button == p->oldButton ) return 0;

   p->oldButton = button;

  

   if( !button ) return 0;

 

   p->state = (p->state +1) & 0x03;

   return 1;

}

 

 

 

 

 

 

void Delay( uint32_t dd )

{  // do not use dd = 0

   asm volatile ("1: \n\t"

                 "subi %A0,1 \n\t"

                 "brcc 1b \n\t"

                 "subi %B0,1 \n\t"

                 "brcc 1b \n\t"

                 "subi %C0,1 \n\t"

                 "brcc 1b \n\t"

                 "subi %D0,1 \n\t"

                 "brcc 1b" : "=d" (dd) : "0" (dd) );

}

 

 

 

 

 

// returns flags (interrupt enable in particular)

// and disables interrupts

uint8_t SaveAndCli(void)

{

   uint8_t ret;

 

   ret = SREG;

   cli();

   return ret;

}

 

 

 

 

 

 

// restores flags (interrupt enable in particular)

void IntrRestore( uint8_t oldSreg )

{

   SREG = oldSreg;

}

 


Chapter 7    Concurrency Issues with Interrupts

 

The fact that interrupts can come at any time sometimes has unpleasantly surprising consequences.  One often assumes that programs flow from statement to statement as written and that the statements do what they say they will do.  Neither of these assumptions is necessarily true when interrupts affect the same variables that the main line program is using.

 

Basically, C code operates on the assumption that nothing else is happening while it is running.  C code can be and is used when that is not the case, but some precautions and a lot of extra thought are needed.

 

Hardware section:

 

Nothing new related to hardware needs to be presented in this chapter.

 

Processor Architecture Section:

 

There are two ways interactions between main line code and interrupt code cause problems:  when the interrupt occurs between the blocks of object code corresponding to C statements and when it occurs within a block.

 

Non-atomic C Statements

We will examine the case where the interrupt comes in the middle of a C statement first.  Consider the operation of incrementing a two byte variable.  The increment requires an add operation for each byte and the interrupt can come between them.  The C statement is called non-atomic because it can be interrupted in the middle.  For example:

uint16_t volatile v;

 

int main(void)

{

   ...

   ...

   v = 0x00FF;

   for(;;)

   {

      ++v;

      if( v > 0x0101 ) v = 0x00FF;

   }

}

If an interrupt handler reads the value of v at random times, what values are possible?  The naive answer is 0x00FF, 0x0100, 0x0101, and 0x0102.  In reality, the values 0x01FF and 0x0002 will also be seen.

 

When the value is 0x00FF and ++v is executed, first the high byte of the incremented value, 0x01, is stored back to the variable, v, and then the low byte, 0x00.  Reading the value of v between the two store operations gives 0x01FF.  (If the bytes were set in the opposite order, 0x0000 would be seen instead.)

 

  

When the value is 0x0102 and v = 0x00FF is executed, the high byte is set to 0x00 and then the low byte is set to 0xFF.  Reading the value of v in between gives 0x0002.
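The two store operations can be imitated on a PC to confirm these values.  The sketch below overwrites a 16 bit value one byte at a time, high byte first as described above, and returns what a reader would see between the two stores.  The function names TornValue and TornSelfCheck are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the value visible between the two byte stores when a two
   byte variable holding oldVal is overwritten with newVal, storing
   the high byte first and the low byte second. */
static uint16_t TornValue( uint16_t oldVal, uint16_t newVal )
{
   uint8_t lo = (uint8_t)( oldVal & 0xFF );   /* low byte not yet stored  */
   uint8_t hi = (uint8_t)( newVal >> 8 );     /* high byte already stored */
   return (uint16_t)( ( (uint16_t)hi << 8 ) | lo );
}

/* the two transitions discussed in the text */
static void TornSelfCheck( void )
{
   assert( TornValue( 0x00FF, 0x0100 ) == 0x01FF );   /* ++v        */
   assert( TornValue( 0x0102, 0x00FF ) == 0x0002 );   /* v = 0x00FF */
}
```

The other transitions, such as 0x0100 to 0x0101, change only the low byte, so the value seen between the stores is one of the expected ones.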

 

C Language Section:

 

Concurrency Problems

The other type of problem occurs when the interrupt changes a variable between C statements.  In the following code, the interrupt handler sets a variable to indicate the interrupt occurred.   The main line code counts the interrupts.

uint8_t volatile intrFlag;

 

ISR( ... )

{

   intrFlag = 1;

}

 

int main(void)

{

   uint32_t intrCount;

   ...

   ...

    intrCount = 0;

   for(;;)

   {

      if( intrFlag )

      {

         intrFlag = 0;

         ++intrCount;

         ...

      }

   }

}

This works, assuming that the for loop runs quickly enough to count each interrupt before the next one comes.

 

If this is not the case, we can improve the program by allowing the variable, intrFlag, to count higher than one.

uint8_t volatile intrFlag;

 

ISR( ... )

{

   ++intrFlag;

}

 

int main(void)

{

   uint32_t intrCount;

   ...

   ...

    intrCount = 0;

   for(;;)

   {

      if( intrFlag > 0 )

      {

         intrCount += intrFlag;

         intrFlag = 0;        // error, potential race condition

         ...

      }

   }

}

This allows up to 255 interrupts to accumulate before the main line loop must read and reset intrFlag.  There is a problem though.  When intrFlag > 0, and intrCount is increased by the value of intrFlag, it is possible that another interrupt occurs before intrFlag is reset to zero.  In this case, the last interrupt is never counted.

 

This is corrected by disabling interrupts around these two statements as demonstrated in the previous chapter.
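The lost count can be reproduced deterministically on a PC by calling a stand-in for the interrupt handler at the exact point where the race occurs.  In this sketch FakeIsr(), RacyCollect(), SafeCollect() and RunScenario() are hypothetical names, and the comments in SafeCollect() mark where interrupts would be blocked and restored on the real processor.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t  intrFlag;    /* incremented by the (simulated) handler */
static uint32_t intrCount;   /* running total kept by main line code   */

static void FakeIsr( void ) { ++intrFlag; }

/* Main line code with the race: an interrupt arrives after intrFlag
   has been added to intrCount but before it is reset to zero. */
static void RacyCollect( void )
{
   intrCount += intrFlag;
   FakeIsr();                /* interrupt lands in the gap */
   intrFlag = 0;             /* the new count is wiped out */
}

/* Corrected version: with interrupts blocked around both statements,
   the same interrupt is held off until after intrFlag is reset. */
static void SafeCollect( void )
{
   /* interrupts blocked here on the real processor */
   intrCount += intrFlag;
   intrFlag = 0;
   /* interrupts restored here; the pending interrupt now runs */
   FakeIsr();
}

/* Runs one scenario: two interrupts before collecting, one during,
   then a final collection.  Returns the total counted. */
static uint32_t RunScenario( void (*collect)( void ) )
{
   assert( collect != 0 );
   intrFlag = 0;
   intrCount = 0;
   FakeIsr();
   FakeIsr();
   collect();                /* third interrupt arrives in here */
   intrCount += intrFlag;    /* final collection, no interrupt  */
   intrFlag = 0;
   return intrCount;
}
```

Three interrupts occur in each scenario.  RunScenario( RacyCollect ) counts only two of them, while RunScenario( SafeCollect ) counts all three.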

 

Interrupt Latency

Disabling interrupts temporarily is undesirable because it increases interrupt latency, the maximum time an interrupt might be delayed before processing begins.  As we saw in the previous chapter, the time used by other interrupt handlers also creates latency, so it is good to make interrupt handlers as short as possible.  This is the reason that we increment a single byte variable, intrFlag, instead of incrementing the four byte variable, intrCount, immediately inside the interrupt handler.

 

The maximum interrupt latency is usually a bigger concern than the average percentage of time that interrupts are blocked.

 

Examining Interrupt Latency

Finally, we will assume it is important to reduce the interrupt latency of this program.

#include <inttypes.h>

#include <avr/io.h>

#include <avr/interrupt.h>

 

uint8_t volatile intrFlag;

 

extern void DoSomething( uint32_t t );

 

ISR( INT0_vect, ISR_BLOCK )

{

   ++intrFlag;

}

 

void __attribute__((noreturn)) main()

{

   uint32_t intrCount;

 

   intrCount = 0;

   for(;;)

   {

      if( intrFlag > 0 )

      {

         cli();

         intrCount += intrFlag;

         intrFlag = 0;

         sei();

         DoSomething( intrCount );

      }

   }

}

We need to add the call to DoSomething in order to prevent the compiler from seeing that the variable, intrCount, is not used and omitting the calculation, intrCount += intrFlag, as an optimization.  In a real program, this would not be a problem because we would not have calculated intrCount if we did not use it somewhere.

 

Here is the assembly language produced with optimization on (-Os).  We will count the clock cycles where interrupts are disabled.  Carefully reading the data sheet for the processor shows that the following instructions (of those executed while interrupts are disabled) take more than one clock cycle:

The data sheet also says:

 

 

00000000

<__vectors>:

      jmp   <__ctors_end>

      jmp   <__vector_1>

      ...

 

__vector_1:

      push __zero_reg__

      push r0

      in r0,__SREG__

      push r0

      clr __zero_reg__

      push r24

      lds r24,intrFlag

      subi r24,lo8(-(1))

      sts intrFlag,r24

      pop r24

      pop r0

      out __SREG__,r0

      pop r0

      pop __zero_reg__

      reti

main:

      clr r14

      clr r15

      movw r16,r14

.L8:

      lds r24,intrFlag

      tst r24

      breq .L8

      cli

      lds r24,intrFlag

      add r14,r24

      adc r15,__zero_reg__

      adc r16,__zero_reg__

      adc r17,__zero_reg__

      sts intrFlag,__zero_reg__

      sei

      movw r24,r16

      movw r22,r14

      call DoSomething

      rjmp .L8

With this information, we find the following latencies:

 

Allowing Nested Interrupts

We will try to reduce the interrupt handler latency.  First, we will try allowing the handler to be interrupted:

ISR( INT0_vect, ISR_NOBLOCK )

{

   ++intrFlag;

}

__vector_1:

      sei

      push __zero_reg__

      push r0

      in r0,__SREG__

      push r0

      clr __zero_reg__

      push r24

      lds r24,intrFlag

      subi r24,lo8(-(1))  /* subtract -1 == add 1 */

      sts intrFlag,r24

      pop r24

      pop r0

      out __SREG__,r0

      pop r0

      pop __zero_reg__

      reti

Here is the assembly language produced.  The only difference is the sei instruction inserted at the beginning.  This reduces the latency from 39 to 10 clock cycles.  This includes the two cycles needed by the push instruction after the sei.  Putting a nop instruction, which takes only one cycle and does nothing, in between would reduce this to 9.

 

Adding the ISR_NOBLOCK attribute has the following dangers.

 

If this interrupt is level triggered, it will recur immediately, unless the source has changed back to inactive very soon after it became active.  In this case, the interrupt handler will never complete.  Rather, the hardware push of the two byte return address onto the stack followed by the execution of the sei and push instructions will occur repeatedly every 10 clock cycles.  Three bytes of the stack space will be used each time and the stack will overflow the space allocated to it and crash the system.  This would take about 150 microseconds if all of memory were allocated for the stack.

 

Therefore, avoid using the ISR_NOBLOCK attribute for level triggered interrupts.

 

Even if the interrupt is edge triggered, it must not occur more often than the 40 clock cycles required by the interrupt handler to finish, or else the stack will grow.  If this happens occasionally, the stack could tolerate it, but then there is a concurrency problem:  If the second interrupt occurs between the instructions

      lds r24,intrFlag

      subi r24,lo8(-(1))

      sts intrFlag,r24

the second interrupt will have incremented intrFlag before the first stores back its incremented value.  The net effect is that one count is lost.
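This lost update can be imitated on a PC by writing out the three instructions as separate steps and running a second, complete handler between them.  The function names below are hypothetical, and the variable r24 merely stands in for the processor register (which each nested handler saves and restores).

```c
#include <assert.h>
#include <stdint.h>

static uint8_t intrFlag;

/* a complete, uninterrupted handler: load, add one, store */
static void WholeHandler( void )
{
   uint8_t r24 = intrFlag;          /* lds r24,intrFlag   */
   r24 = (uint8_t)( r24 + 1 );      /* subi r24,lo8(-(1)) */
   intrFlag = r24;                  /* sts intrFlag,r24   */
}

/* a handler that is itself interrupted just after its load; nested
   says how many complete handlers run in the gap */
static void InterruptedHandler( uint8_t nested )
{
   uint8_t r24 = intrFlag;          /* lds r24,intrFlag   */

   while( nested-- > 0 ) WholeHandler();   /* nested interrupts */

   r24 = (uint8_t)( r24 + 1 );      /* subi r24,lo8(-(1)) */
   intrFlag = r24;                  /* overwrites the nested results */
}

static void NestSelfCheck( void )
{
   intrFlag = 0;
   InterruptedHandler( 0 );
   assert( intrFlag == 1 );         /* no nesting: count is right */

   intrFlag = 0;
   InterruptedHandler( 1 );
   assert( intrFlag == 1 );         /* two interrupts, one counted */
}
```

However many handlers run in the gap, the outer handler stores its own stale value plus one, so all of the nested counts are lost.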

 

Blocking a Single Type of Interrupt

There is a better way to improve latency without allowing the same interrupt to nest.  The basic idea will be to block only the INT0 interrupt, without blocking interrupts globally.  To do this, we must temporarily set the INT0 bit in the EIMSK register to zero.

 

ISR( INT0_vect, ISR_BLOCK )

{

   EIMSK &= ~(1<< INT0);

   sei();

   ++intrFlag;

   cli();

   EIMSK |= (1<< INT0);

}

 

Here is the output of the compiler.  The latency is now 23 clock cycles for the first half and 21 for the second half.

__vector_1:

      push __zero_reg__

      push r0

      in r0,__SREG__

      push r0

      clr __zero_reg__

      push r24

 

/*   EIMSK &= ~(1<< INT0);    */

      in r24,61-32

      andi r24,lo8(-2)

      out 61-32,r24

 

/*   sei(); */

      sei

 

/*   ++intrFlag; */

      lds r24,intrFlag

      subi r24,lo8(-(1))

      sts intrFlag,r24

 

/*   cli(); */

      cli

 

/*   EIMSK |= (1<< INT0);     */

      in r24,61-32

      ori r24,lo8(1)

      out 61-32,r24

 

      pop r24

      pop r0

      out __SREG__,r0

      pop r0

      pop __zero_reg__

      reti

 

ISR_NAKED and Inline Assembly to Improve Latency

What else can we do to improve it?  Next we will try using the ISR_NAKED attribute, which will allow us more detailed control using inline assembly language.  We will start out with the assembly language output from the C code above, and then remove unnecessary instructions.

 

We do not change the registers r0 or r1(__zero_reg__), so we do not need to save them.  The subi instruction changes the Status register, but we can wait to save it until after global interrupts are enabled.  Here is the improved version:

__vector_1:

      push r24

 

/*   EIMSK &= ~(1<< INT0);    */

      in r24,61-32

      andi r24,lo8(-2)

      out 61-32,r24

 

/*   sei(); */

      sei

 

      in r24,__SREG__

      push r24

 

/*   ++intrFlag; */

      lds r24,intrFlag

      subi r24,lo8(-(1))

      sts intrFlag,r24

 

      pop r24

      out __SREG__, r24

 

/*   cli(); */

      cli

 

/*   EIMSK |= (1<< INT0);     */

      in r24,61-32

      ori r24,lo8(1)

      out 61-32,r24

 

      pop r24

      reti

The latency is now 14 clock cycles for the first half and 14 for the second half.  Written in C, with symbolic names for EIMSK etc., it looks as follows:

ISR( INT0_vect, ISR_NAKED )

{

   asm volatile ( "push r24" );

 

/*   EIMSK &= ~(1<< INT0);    */

   asm volatile ( "in r24, %0"   : : "I" ( _SFR_IO_ADDR(EIMSK) ) );

   asm volatile ( "andi r24, %0" : : "M" ( (~(1<< INT0)) & 0xFF ) );

   asm volatile ( "out %0, r24"  : : "I" ( _SFR_IO_ADDR(EIMSK) ) );

 

/*   sei(); */

   asm volatile ( "sei" );

 

   asm volatile ( "in r24, __SREG__" );

   asm volatile ( "push r24" );

 

/*   ++intrFlag; */

   asm volatile ( "lds r24, intrFlag" );

   asm volatile ( "subi r24, %0" : : "M" ( (-1) & 0xFF ) );

   asm volatile ( "sts intrFlag, r24" );

 

   asm volatile ( "pop r24" );

   asm volatile ( "out __SREG__, r24" );

 

/*   cli(); */

   asm volatile ( "cli" );

 

/*   EIMSK |= (1<< INT0);     */

   asm volatile ( "in r24, %0"   : : "I" ( _SFR_IO_ADDR(EIMSK) ) );

   asm volatile ( "ori r24, %0"  : : "M" ( (1<< INT0) & 0xFF ) );

   asm volatile ( "out %0, r24"  : : "I" ( _SFR_IO_ADDR(EIMSK) ) );

 

   asm volatile ( "pop r24" );

   asm volatile ( "reti" );

}

It took a few tries to discover that the macro, lo8(), does not work in this context and that the constants, -1 and ~(1<< INT0), need to be reduced to 8 bits by adding & 0xFF.  It also took a few tries to discover that the I/O register number in the out instruction needs to be treated as an input operand (because the instruction does not modify the number itself, but rather the register at that address).

 

Writing and maintaining inline assembly language is probably ten times more difficult than C code, and then only after you are somewhat familiar with the assembly language.  Usually only a few lines are needed.  In this case, it is justified by the (presumed) need to reduce the interrupt latency.

 

The 14 clock cycle latency of the interrupt handler is still more than the 11 cycle latency of the main line code, so it is not necessary to try to reduce the latter:

      if( intrFlag > 0 )

      {

         cli();

         intrCount += intrFlag;

         intrFlag = 0;

         sei();

However, we can reduce it to zero by again blocking only the INT0 interrupt, without blocking interrupts globally:

      if( intrFlag > 0 )

      {

         EIMSK &= ~(1<< INT0);

         intrCount += intrFlag;

         intrFlag = 0;

         EIMSK |= (1<< INT0);

 

Program Example Section:

 

#include <inttypes.h>

#include <avr/io.h>

#include <avr/interrupt.h>

#include "devbd.h"

#include "blcalls.h"

 

// put all globals in one structure

typedef struct

{

   uint16_t curVal;     // read by interrupt handler

   uint16_t seenVal;    // set by interrupt handler

} t_Gbl;

 

t_Gbl volatile gbl;

 

 

ISR( INT0_vect, ISR_BLOCK )

{

   gbl.seenVal = gbl.curVal;

}

 

 

void __attribute__((noreturn)) main()

{

   // set up ports

   DDRB = 0x00;

   PORTB = PORTB_UNUSED;

   DDRC = LED_ALL;

   PORTC = PORTC_UNUSED;

   DDRD = 0x00;

   PORTD = PORTD_UNUSED | BUTTON_ALL;

 

   // enable INT0

   EIMSK = BIT( INT0 );

   EICRA = BITS( INT0_EDGE,  ISC01, ISC00 );

 

   gbl.curVal = 0x00FF;

   gbl.seenVal = 0xAAAA;

 

   sei();

   for(;;)

   {

      // gbl.curVal goes between 0x00FF and 0x0101 and is briefly 0x0102

      ++gbl.curVal;

      if( gbl.curVal > 0x0101 ) gbl.curVal = 0x00FF;

 

      // gbl.seenVal is set to a sample of gbl.curVal

      // every time INT0 comes

      // This prints it when that happens

      if( gbl.seenVal != 0xAAAA )

      {

         DbgPrtWord( gbl.seenVal );

         if( gbl.seenVal < 0x00FF  ||  gbl.seenVal > 0x0102 )

            DbgPrtStrNL( "***" );

         else DbgPrtStrNL( "" );

         gbl.seenVal = 0xAAAA;

      }

   }

}

 

Button D2 is connected to INT0.  Push it repeatedly and see occasional failures due to the non-atomic nature of the statement, ++gbl.curVal.

 

The rate of failure is relatively high because the loop is short and executed very often.  In real code, the failure rate might be a few times per year.  In that case, it might not be found until the design was shipped to thousands of customers and it might take most of a year to reproduce it in a debug setting.  The faster alternative might be for someone (you?) to carefully read a hundred thousand lines of code.  Be careful not to make concurrency mistakes and do not expect them to be found during testing.  Such bugs can be costly to a company (and to your career).