CS 185 C Class Notes
Chapter 1 Introduction
Motivation
This class has several purposes.
The class emphasizes the details of using C language in embedded programs. The optional text, Programming Embedded Systems, gives a broad and complete overview of embedded programming in C.
Class Prerequisites
This class assumes you know the basic architecture of computers, CPU, stored program in memory, integer variables, etc.; and you know how to program, either in C or C++, or have a basic understanding of C and the ability to program in another procedural language like Java or Pascal.
Texts for Course: These Notes and K&R
The lectures will not follow a textbook. These notes are provided to students for the purpose of reviewing what is taught in the lectures.
These notes review only selected special topics regarding the C language. Depending on the need of the students, more elements of the C language will be covered in class. Please use the required text, C Programming Language (2nd Edition), by Kernighan & Ritchie, as a reference.
No matter how well you know C, it will be useful in this class for you to know it better. If you know C well, please read K&R at the beginning of the course regardless. If not, read it at the beginning and go back over it until you understand it. Questions about C, no matter how basic, are welcome in class.
You should carefully read chapters 1 through 6 and appendix A (except A13). Appendix B is less important. Chapters 7 and 8 are not necessary for this class.
Pay particular attention to the following advanced topics: Declarations (Appendix A4, A6 and A8), Pointers (Chapter 5), and Structures (Chapter 6).
Optional Text
Programming Embedded Systems, Second Edition, by Barr and Massa, covers the subject of the class exactly. This class goes through specific examples and details which are not in the book, so it cannot be used as a text for this class.
The book is an excellent text for the subject of embedded programming in C. Anyone who intends to continue learning embedded programming should read it. At the general level, it covers the subject much better than the class notes. It also covers embedded operating systems which we will not do in class.
Chapter 4 explains in detail the GNU tool-chain, which we will use in class.
Format of These Notes
Each chapter of the class notes has sections for the discussion of hardware, programming and examples. The hardware sections discuss the operation of the processor, its built-in peripheral devices, or external hardware devices included on the Development Board you will use in this class. The programming sections discuss the C language, details of how the machine code produced from C language code controls the processor, how to use C language to control specific features of the processor, algorithms specific to low level programming, and tips, tricks and techniques for short, effective and maintainable code.
Hardware section:
Embedded system
An embedded system is a computer system with a fixed purpose, unlike a personal computer. Examples range from a lawn sprinkler controller or a microwave oven with only timer and on-off switching functions to network routers which typically run a full Linux operating system.
This limitation of purpose allows the included software to be fixed, never changing except for possible updates controlled by the manufacturer. Most embedded computer devices do not include disk drives. The software is stored in read-only memory or in flash memory, which can be modified, but only slowly and a limited number of times.
The fixed nature of embedded software has led to the term "firmware" - more difficult to change than software but easier to change than hardware.
Writing firmware for full sized embedded operating systems (like Linux or Unix-like commercial operating systems) resembles writing software for personal computers more than it resembles writing firmware for small embedded systems. Large embedded systems combine a few elements of small embedded systems, like the lack of keyboard, mouse and display, and the need to interact directly with the processor or custom hardware, with mainstream software issues like network protocols, high level operating system services and programming in Java and scripting languages.
Small embedded system firmware
Small embedded systems often have no operating system. In this case, the only operations the processor executes are those specified by the C code you write (except for a small block of assembly language code which runs before the C code to initialize global variables and set up the stack pointer).
We will study a small embedded system, learning the issues specific to embedded systems without the distracting complexity of larger systems. For the largest embedded systems, this knowledge needs to be combined with knowledge of high level software issues and techniques, and details of the services of full sized operating systems.
Small embedded system hardware
Small embedded systems can be built using nothing more than a single microcontroller integrated circuit. A practical product would include a robust power supply, hardware for user interface like buttons, an LCD display and maybe sound, and probably electronics to convert the low power digital inputs and outputs of the microcontroller to whatever is being controlled.
Microcontroller
The microcontroller we will use contains flash memory that holds the firmware, RAM memory used to hold operating variables, the required clock and reset generation functionality as well as a complete processor. Once programmed this chip can be connected to DC power (3 to 5 volts) and will begin running the programmed firmware.
Connecting to one or more of its I/O pins allows it to do real work.
Development Board for this class
We will use a development board which connects to a USB port on a Windows PC. This connection provides power and allows the PC to download programs to the microcontroller as well as emulate a terminal connected to its serial port. The board has two buttons and four LEDs connected to the microcontroller I/O pins and two peripheral chips connected to the microcontroller's SPI bus. It has a loudspeaker to output sound from either a digital output of the microcontroller or the DAC (digital to analog converter) peripheral.
Processor Architecture Section:
Here, in the introduction, we will review computer processor architecture adding comments related to small microcontrollers. Later, we will learn the specifics of the core architecture of the particular microcontroller that we will be using as well as that of its peripheral parts.
Basics of a processor
Most readers will already know these general facts about processors:
· The processor fetches and then executes instructions from memory.
· The order of execution is sequential except when specifically modified by an executing instruction.
· Instructions can read data from and write data into specific memory locations.
· Instructions can also read and write data in a small number of registers.
· Instructions can do operations (arithmetic and bit manipulation) on data.
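The facts listed above can be sketched as a toy processor simulated in C. The instruction set here (OP_LOADI, OP_ADD, OP_STORE, OP_JUMP, OP_HALT), the Instr structure and the function run are all invented for illustration; they do not correspond to any real processor.

```c
#include <stdint.h>

/* Hypothetical toy instruction set, invented for illustration only. */
enum { OP_HALT, OP_LOADI, OP_ADD, OP_STORE, OP_JUMP };

typedef struct {
    uint8_t op;   /* operation code */
    uint8_t a;    /* first operand: register number or address */
    uint8_t b;    /* second operand: register number or immediate value */
} Instr;

/* Runs a program until OP_HALT or until max_steps instructions have run.
   Returns the number of instructions executed. */
unsigned run(const Instr *program, uint8_t *data, unsigned max_steps)
{
    uint8_t reg[4] = {0};   /* a small number of registers */
    uint8_t pc = 0;         /* program counter */
    unsigned steps = 0;

    while (steps < max_steps) {
        Instr in = program[pc++];   /* fetch, then advance sequentially */
        steps++;
        switch (in.op) {            /* execute */
        case OP_LOADI: reg[in.a & 3] = in.b; break;
        case OP_ADD:   reg[in.a & 3] = (uint8_t)(reg[in.a & 3] + reg[in.b & 3]); break;
        case OP_STORE: data[in.a] = reg[in.b & 3]; break;   /* write to data memory */
        case OP_JUMP:  pc = in.a; break;   /* an instruction changes the order */
        default:       return steps;       /* OP_HALT or unknown: stop */
        }
    }
    return steps;
}
```

A five instruction program that loads 2 and 3 into two registers, adds them, stores the sum at data address 4 and halts would leave 5 in that data location.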
Microcontrollers in particular
Microcontrollers differ from ordinary processors in that they have the memory the processor uses on the same chip. In most cases this is very much less memory than a normal PC has.
Separate memory areas for program and data
The type of memory for instructions is physically different than that used for data because the instruction memory must be preserved when power is removed and later restored while data memory must be writable as easily as it is read. In this case, it is usual to number the memory locations in each type of memory separately. In other words, the memory at any address, 0x0004 for example, in program memory is not related to the memory at address 0x0004 in data memory. It is said that the program and data address spaces are separate. (This is called Harvard architecture. The single address space of larger processors where the contents at an address can be treated as an instruction or as data is called Von Neumann architecture.)
Microcontrollers with separate address spaces have a special instruction to read data from program memory. Although it is not useful for a program to be able to read its own instructions, it is very useful to store constant data in program memory and then allow an executing program to read them as needed.
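As a rough model of separate address spaces, the sketch below uses two independent C arrays. The array sizes, the stored value 0xABCD and the function names are assumptions chosen for illustration; on a real microcontroller the separation is done by the hardware, not by the program.

```c
#include <stdint.h>

/* Toy model: two independent arrays stand in for the two address spaces.
   Valid addresses in this model are 0 to 7. */
static const uint16_t program_memory[8] = {   /* 16 bit words, read-only */
    0x0000, 0x0000, 0x0000, 0x0000, 0xABCD, 0x0000, 0x0000, 0x0000
};
static uint8_t data_memory[8];                /* bytes, freely writable */

/* Stand-in for the special instruction that reads program memory. */
uint16_t read_program_word(uint16_t address)
{
    return program_memory[address];
}

uint8_t read_data_byte(uint16_t address)
{
    return data_memory[address];
}

void write_data_byte(uint16_t address, uint8_t value)
{
    data_memory[address] = value;
}
```

Writing to data address 0x0004 in this model leaves the constant at program address 0x0004 untouched, because the same number selects a different location in each space.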
Nonvolatile memory
Where flash memory is used as program memory, there is usually a way for an executing program to write to it. Almost always, this is done only to install a new firmware version rather than as a part of normal operation because it delays program execution longer than is usually acceptable and because the number of possible rewrites of flash memory is limited.
Many microcontrollers have EEPROM (writable non-volatile) storage that is separate from both data and program address spaces. It is not treated as data memory because its timing is different, and writing is so slow that software must continue to execute after a write operation is initiated and return later when it is complete.
Lack of features for speed and security
Microcontrollers are designed to favor simplicity over speed (with rare exceptions). They do not use cache and so operate at relatively low speed, ten to forty, rarely up to one hundred, million instructions per second, and instructions that use data memory are often slower. Memory mapping (MMU) and hardware data transfer (DMA) are absent. Hardware implementations of operations that can be done more slowly in software, such as division and floating point arithmetic, and often even multiplication, are absent. Protection features like privileged mode, memory protection and error exceptions are absent.
Extra features for use in electronic devices and circuits
Unlike large processors, microcontrollers are intended to reduce the size and complexity of a product, so they include as much peripheral hardware functionality as feasible, such as multiple counters and timers, pulse generators, various serial port types and analog converters, as well as functionality needed to support the processor such as clock and reset generation. Many microcontrollers are designed for battery powered applications with minimized power consumption and special sleep modes.
Complex peripherals like Ethernet and USB have recently become available integrated in microcontrollers but at extra cost.
Many embedded products use more powerful processors
These comments apply to microcontrollers which contain their own memory and are complete systems in themselves. Many embedded products contain more powerful processors with large amounts of separate memory chips and resemble full computers more than they resemble the microcontrollers we will study.
Upcoming
The next chapter will examine in detail the particular microcontroller we will use.
C Language Section:
High level features without blocking access to low level details
The C language, and its successor C++, are unique in providing abstraction and independence from low level machine details without making these details inaccessible.
Most high level languages present a closed, self-sufficient programming environment in which the programmer needs to know nothing about the underlying hardware details. People who write code that owns the whole system, as in embedded systems and operating systems, need more: they need to be able to control every aspect of the processor, not just move data and do computations.
Assembly language = machine instructions
Every processor has an assembly language in which the programmer codes the native instructions that the processor executes. Assembly language therefore allows the programmer to control every operation the processor is designed to do. Assembly language programming is tedious and difficult, and it differs radically among the many processor types. Many lines of assembly code are needed for what can be done in each line of high level code, making it much more prone to errors and difficult to maintain.
High level features automate rote tasks
High level languages hide the differences and automate tasks such as arithmetic on variables larger than the size supported by the processor, choosing locations in memory to store variables and converting complex algebraic expressions into a linear sequence of machine instructions.
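To give a feel for one of these automated tasks, here is a sketch of how a 32 bit addition can be built from 8 bit additions with a carry, roughly what a compiler must arrange on an 8 bit processor. The function name add32_bytewise is made up for this example, and a real compiler would emit a few machine instructions rather than a loop.

```c
#include <stdint.h>

/* Adds two 32 bit values one byte at a time, least significant byte first,
   carrying into the next byte the way an 8 bit processor would. */
uint32_t add32_bytewise(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    unsigned carry = 0;

    for (int i = 0; i < 4; i++) {
        unsigned byte_a = (a >> (8 * i)) & 0xFFu;
        unsigned byte_b = (b >> (8 * i)) & 0xFFu;
        unsigned sum = byte_a + byte_b + carry;        /* at most 0x1FF */
        result |= (uint32_t)(sum & 0xFFu) << (8 * i);  /* keep the low 8 bits */
        carry = sum >> 8;                              /* 0 or 1 */
    }
    return result;   /* a carry out of the top byte is discarded */
}
```

In C the programmer simply writes a + b on two 32 bit variables and never sees these steps.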
C Language is intended for both abstraction and machine level control
C was originally designed as a high level language with which to rewrite an early version of UNIX previously written in assembly language. It intentionally combines the advantages of high level abstraction with the ability to override these abstractions.
Pointers are necessary
Support for the pointer data type is avoided by some languages because its use makes programs harder to verify and more prone to coding errors. Pointers in C parallel the manipulation of addresses at the machine instruction level. At the C level, pointers allow operations that, although unnecessary for computation and potentially dangerous, can be done just as in assembly language, but concisely, clearly and conveniently.
Programs have meaning to humans as well as the machine
Well written programs can be read at two levels - as instructions to be executed precisely according to the specifications of the computer language - and as a human readable description of how the program operates. Similarly, a single piece of C code can be read both as a high level program hiding unnecessary details and, where needed, as a tool to manipulate data at the level of bytes and words stored at specific memory locations.
You need to know details of the machine and how C uses them
Obviously, to be able to use C for hardware level manipulation of data, it is necessary to understand how to manipulate data at that level. Therefore, we will need to study the architecture of the processor, especially how memory is used.
An important related subject is the details of operation of the machine instruction (assembly language) version of a C program which is produced by the C compiler to execute on the processor. Particularly, knowledge of its usage of the stack is needed in the embedded case where the memory size is limited. (The entire memory of the microcontroller we will use is a fraction of the amount a Windows program reserves for the stack of a single thread.) This knowledge is necessary also to interface sections of assembly language code with C and very useful when debugging a program.
Summary -> Understand C thoroughly to use it for embedded programming
In summary, the use of C (or C++) is almost mandatory in small embedded systems; understanding the C language completely and thoroughly is greatly useful; and it is necessary to understand assembly language programming, but preferable to avoid using it wherever possible.
Upcoming
The upcoming chapters will review the C language emphasizing techniques used for embedded programming along with general tips for organization and maintainable code. Features of the GNU C compiler and related parts like the linker will be presented.
Program Example Section:
The upcoming chapters have program examples that run on the development board and demonstrate features of the microcontroller and programming that have been presented. Each chapter has a Program Example Section which presents and discusses the examples related to that chapter.
Chapter 2 Memory
This chapter goes over the memory included on the microcontroller chip.
Although the first chapter described microcontrollers in general, starting now, we will be discussing the particular microcontroller that we are going to use, the Atmel AVR ATmega168. Almost everything will apply to all microcontrollers in the Atmel AVR family. All the principles we will learn apply to all microcontrollers.
Program and data address spaces
The most important memories are program memory and data memory. As already mentioned, in this microcontroller, addresses that refer to memory locations are understood to be separate for program and data memory. Address 0x0004 might refer to an instruction stored in program memory, or a separate location in data memory, depending on the context in which it is used. This is different from the large processors which may be familiar to you, which have only one addressing space for both instructions and data.
Hardware section:
The ATmega168 has 1 Kbytes of data memory and 16 Kbytes of program memory. The data memory is SRAM which can be read and written very simply. The program memory is flash memory for which reading is simple, but writing is slow and cumbersome. This is appropriate because the program is written only once, when the microcontroller is programmed. The flash is nonvolatile – the program remains when power is turned off.
Program memory
Program memory addresses go from 0x0000 to 0x1FFF. This range is 8 K addresses. Each instruction takes a multiple of two bytes – most are 16 bits (2 bytes), a few are 32 bits (4 bytes). Unlike single address space processors, where every byte has an address, program space addresses refer to 16 bit memory locations. (Note that the GNU linker, which is designed mainly for byte addresses, uses addresses that count bytes. Therefore all address numbers it prints out are twice the number seen and used in the program and by the processor.)
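The factor of two between the processor's word addresses and the byte addresses printed by byte-oriented tools can be written out as a small sketch. The function names are hypothetical helpers, not part of any tool.

```c
#include <stdint.h>

/* Program memory is addressed in 16 bit words; byte-oriented tools
   report byte addresses, which are twice the word address. */
uint32_t word_to_byte_address(uint16_t word_address)
{
    return (uint32_t)word_address * 2u;
}

uint16_t byte_to_word_address(uint32_t byte_address)
{
    return (uint16_t)(byte_address / 2u);
}
```

For example, the last word address, 0x1FFF, corresponds to byte address 0x3FFE, which is the second to last byte of the 16 Kbytes (0x4000 bytes) of program memory.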
Data memory
Data memory addresses go from 0x0100 to 0x04FF. Each byte is addressed normally.
Addresses in the range 0x0000 to 0x00FF represent registers. These include the 32 general purpose registers (which are normally used in instructions without referencing their addresses), a few other registers used by the processor like the stack pointer, and hardware peripheral control registers which the program writes and reads to control the on-chip peripherals. Not all locations in the range 0x0000 to 0x00FF are used.
There are two other memories in the microcontroller: EEPROM and Fuse Bits.
EEPROM
The EEPROM is physically similar to the flash: Both are nonvolatile. Reading happens at the normal speed of program execution, writing takes several milliseconds – 10,000 times slower. Both must be erased before writing. The EEPROM is intended to be used for storing data. In the EEPROM, individual bytes can be erased and rewritten; in the flash, large sections must be erased at one time. The EEPROM can be rewritten 100,000 times; the flash only 10,000 times. Most importantly EEPROM can be rewritten while the program is running; the flash can be rewritten during program execution only in a limited way. The addresses in the EEPROM are not in either address space. It is not known to the C Compiler. It is accessed by reading and writing several
Fuse bits
The Fuse Bits consist of three bytes which control aspects of the operation that never change – mostly electrical characteristics. They cannot be changed by the program.
Initial programming
The program memory, the fuse bits and optionally the EEPROM need to be programmed when the product is manufactured. The EEPROM can be changed at any time under control of the program. The program memory can be changed under program control with limitations, normally only to update the firmware after the embedded system is already in use.
Processor Architecture Section:
In this section, we examine the data memory area from the program point of view.
Data Memory Area
Memory locations 0x0100 to 0x04FF are ordinary SRAM. They just retain whatever value is stored. Locations 0x0020 to 0x00FF are control registers for the processor and for on-chip peripheral devices. For these, reading does not necessarily give what was previously written.
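The layout of the data address space described so far can be summarized as a lookup function. This is only a sketch: the Region type and the function name are invented here, and placing the 32 general purpose registers at 0x0000 to 0x001F follows from their count and the start of the control registers at 0x0020.

```c
#include <stdint.h>

typedef enum {
    REGION_REGISTERS,   /* general purpose registers */
    REGION_CONTROL,     /* processor and peripheral control registers */
    REGION_SRAM,        /* ordinary data memory */
    REGION_UNUSED       /* nothing at this address */
} Region;

/* Classifies an ATmega168 data address using the ranges given in the text. */
Region classify_data_address(uint16_t address)
{
    if (address <= 0x001F) return REGION_REGISTERS;
    if (address <= 0x00FF) return REGION_CONTROL;
    if (address <= 0x04FF) return REGION_SRAM;
    return REGION_UNUSED;
}
```

With this model, address 0x0026 falls in the control register range and address 0x0100 is the first byte of SRAM.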
Peripheral Control Registers – not exactly memory
Some of these locations are read-only, reading gives information that may change from time to time, writing has no effect. For example, the Port C Input register, named PINC, gives the digital state of the voltage levels on the “Port C” pins – 6 of the 28 pins on the microcontroller.
If you want to force the voltage on the pins to digital one or zero, you write the desired bit pattern to a different register, the Port C Output register, named PORTC. If you write 0x00 to PORTC, to pull all the pins to low voltage, but a stronger external device forces it high, reading PINC will show the actual voltage level.
Most writable registers can also be read and show the same value that was previously written. In this way, they are the same as ordinary memory. The difference is that there is the side effect – the voltages on the pins change in this case. The purpose of allowing such registers to be read as well as written is to allow the program to see what value is in the register. The program could keep a copy in normal memory and update it each time the control register were changed. Allowing the register to be read makes that unnecessary.
In the example above where PORTC is set to zero, but a pin forced high, reading PORTC will show it low and reading PINC will show it high. (Also, the microcontroller will overheat.)
Port C can also be used in input mode where the pins can be driven freely by external devices without conflict. The Port C Data Direction register determines whether
There are a few cases of read/write registers where the value written and read are less related or not related at all. There are interrupt flag bits in several registers which show as 1 if the hardware is requesting an interrupt, 0 if not. The program can clear that flag by writing 1. Writing 1 to that bit, when it reads as 1, causes it to read as 0 afterwards.
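The write-1-to-clear behavior of interrupt flag bits can be modeled in plain C. The FlagRegister structure and the three functions below are a toy model invented for illustration; in the real microcontroller this logic is in the hardware and the program only reads and writes the register.

```c
#include <stdint.h>

/* Toy model of a flag register: the hardware side sets bits,
   the program side clears them by writing 1. */
typedef struct { uint8_t flags; } FlagRegister;

/* Hardware requests an interrupt: the matching flag bits become 1. */
void hardware_request(FlagRegister *r, uint8_t mask)
{
    r->flags |= mask;
}

/* What the program sees when it reads the register. */
uint8_t program_read(const FlagRegister *r)
{
    return r->flags;
}

/* Each 1 bit written clears the matching flag; 0 bits leave flags alone. */
void program_write(FlagRegister *r, uint8_t value)
{
    r->flags &= (uint8_t)~value;
}
```

In this model, if flags 0x05 are pending and the program writes 0x01, a following read gives 0x04. Writing 0x00 changes nothing, which is quite unlike ordinary memory.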
Writing to the UART data register, UDR0, causes 8 bits to be transmitted serially on a microcontroller pin. Reading the same register gets 8 bits previously received on a different pin.
Despite these exceptions, most writable control registers act like memory – reading gives the value most recently written there.
Use Register Names
PINC is read as location 0x0026. You will not need to know the actual locations except perhaps when you are debugging a program. Programs always use a symbolic name for such locations, PINC, in this case. In C language, there is a header file where PINC is defined as (*(volatile uint8_t *)(0x26)); that is, it is a number that is specified to be a memory location. Related but different microcontrollers, in the same family, also have a PINC register, but it might be a different location in the memory area. Using the name, PINC, in a program reduces the number of changes needed to make the program run on a different processor. It also, obviously, makes the program easier to understand.
Implications for C
The point of the preceding discussion is that the C compiler does not differentiate control registers from ordinary data memory. Accessing control registers in C will be discussed later in the C language section below.
C Language Section:
Pointers are variables which contain the address (location in memory) of variables. The type of the pointer specifies how the contents of the memory are to be interpreted. Dereferencing such a pointer in a C program causes the compiler to produce machine instructions to fetch from (or write to) the location in data memory given by the value of the pointer. C also supports pointers to functions. In that case, the address is understood to be in program memory. To dereference a pointer to function, the compiler produces a machine instruction that calls a function. The location called is similarly given by the value of the pointer variable, but it is in program memory.
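Both kinds of pointer can be seen side by side in a short sketch. The names led_state, led_on, led_off and demo are hypothetical; on a PC both addresses live in one address space, while on the AVR the data pointer would hold a data memory address and the function pointer a program memory address.

```c
#include <stdint.h>

static uint8_t led_state;    /* hypothetical variable standing in for an output */

void led_on(void)  { led_state = 1; }
void led_off(void) { led_state = 0; }

uint8_t demo(void)
{
    uint8_t *pdata = &led_state;      /* pointer to data */
    void (*action)(void) = led_on;    /* pointer to function */

    action();                         /* dereference: calls the function */
    return *pdata;                    /* dereference: reads the byte */
}
```

Assigning led_off to action instead would make the same call statement run the other function, which is the usual reason for using a pointer to function.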
Casting Pointers
Casting pointers is a feature of C that should never be used in high level programs because it can make programs non portable by exposing features of the hardware that change from system to system. For programs that interact directly with the hardware, this exposure is crucial.
Here is a brief summary of casting for C programmers unclear on it. First, understand that the sense of the word, cast, is not the meaning, to throw, rather it is the meaning, to melt a substance and pour it into a container in order to give it a particular shape. So, casting a variable means changing its “shape” without changing its “substance” – specifically changing its type without changing its value.
You seldom need to cast variables because C automatically translates types. For example
long x;
float f;
f = 1.;
x = (long)f;
In the last line the float value 1 (internal representation 0x3f800000) gets converted to the integer type before it is stored in x. In this case, its new internal representation (now 0x00000001) is obtained by a rather complicated set of steps, but its value is still the number 1. It is not necessary to use the cast in this case because C knows that this is a normal and useful type of conversion.
When you cast a pointer, the value never changes (the address remains the same address), but the meaning changes. When the new pointer is dereferenced (the value is read from memory as specified by the address in the pointer), the same stored data (at the same address) is interpreted differently. In the following example, the first pointer points to memory containing data representing a variable of type float. The program creates a second pointer, of type pointer to long, and copies the address in the first pointer into the second using a cast. The data in memory has not changed, but when dereferencing reads it into a variable of type long, the data has a different meaning.
long x;
float f;
long * px;
float * pf;
f = 1.;
pf = &f;
px = (long *)pf;
x = *px;
In the line, pf = &f, pf is assigned the address of the memory location (say 0x04dc for example) which holds the first of the four bytes of the value of f. Casting pf to type (long *) in the line, px = (long *)pf, leaves its value (0x04dc) the same but makes it a pointer to long and stores it in px. In the line, x = *px, the value of the long at that location is copied to x. The value of x is now 1065353216. Why? In the last line x gets the four byte value at location 0x04dc, which is 0x3f800000. That binary bit pattern, in a long, is 1065353216 in decimal, 0x3f800000 in hexadecimal.
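The numbers in this explanation can be checked on a PC with the sketch below. It assumes float is a 32 bit IEEE 754 value, and it swaps in memcpy for the pointer cast: memcpy copies the same four bytes without relying on how the compiler treats a pointer of the wrong type. The function names are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns the stored bit pattern of a float (assumes a 32 bit float). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copies the bytes; no value conversion */
    return bits;
}

/* Ordinary cast: converts the value, producing a new representation. */
long float_value(float f)
{
    return (long)f;
}
```

For the value 1, float_bits gives 0x3f800000 (1065353216 in decimal), while float_value gives 1. The first reinterprets the stored data; the second converts the value.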
The explicit cast in the second to last line is necessary. Without it, the statement, px = pf, is an error. The compiler knows that converting a pointer-to-float to a pointer-to-long leads to a result that is normally not desired or correct. Therefore, it does not make the conversion automatically. The explicit cast is a direct instruction to the compiler to make the conversion. Since the programmer clearly intends this conversion, the compiler does not treat it as an error. (In C++, casting is separated into four types. The type we are using here is called the reinterpret_cast<>, in C++.)
This is an extreme example. Even hardware aware programs do not normally read float data as an integer. Mostly, we will cast integers which represent memory addresses into pointers. Sometimes we will cast a pointer to a long integer into a pointer to a single byte. Then we could, for instance, change the upper byte without changing the rest. (We could do the same thing using shift and mask operations. Sometimes, casting produces more efficient code.)
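The two ways of changing the upper byte mentioned above might look like the following. This is a sketch with hypothetical function names. The pointer version assumes the least significant byte is stored first (little-endian), as on the AVR and on most PCs, so the upper byte is at index 3.

```c
#include <stdint.h>

/* Shift and mask version: works on the value. */
uint32_t set_upper_byte_mask(uint32_t value, uint8_t upper)
{
    return (value & 0x00FFFFFFu) | ((uint32_t)upper << 24);
}

/* Pointer cast version: writes one byte in place.
   Index 3 is the upper byte only on a little-endian machine (assumption). */
void set_upper_byte_cast(uint32_t *p, uint8_t upper)
{
    unsigned char *bytes = (unsigned char *)p;   /* same address, new meaning */
    bytes[3] = upper;
}
```

Starting from 0x11223344, either version with the byte 0xAA produces 0xAA223344. The cast version touches a single byte of memory, which is why it can compile to less code on an 8 bit processor.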
Casting an Integer Address to a Pointer – to Read a Specific Memory Location
Now consider the code which reads the Port C pins and puts the bits in x.
unsigned char x;
x = (*(unsigned char volatile *)(0x26));
This is a shorter way of saying
short addr;
unsigned char volatile * px;
unsigned char x;
addr = 0x26;
px = (unsigned char volatile *)addr;
x = *px;
The 16 bit variable addr is assigned the value 0x0026. Then the pointer px is given that value. The variable addr must be cast explicitly because it is unsafe to convert an integer to a pointer. The last line reads the value of the memory location at address 0x0026, which is actually the Port C Input Register rather than memory, but that does not matter to the compiler.
The short version of the program uses internal temporary variables instead of addr and px in the same way that
x = (a + b) * (c + d);
is an abbreviation of
t1 = a + b;
t2 = c + d;
x = t1*t2;
Once you understand casting to pointers, you can do many operations related to talking to hardware registers in the data address space.
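The same technique can be tried on a PC, where address 0x26 is not readable, by using the address of an ordinary variable as the integer. In this sketch the function names are invented, and uintptr_t (from <stdint.h>) is used as an integer type wide enough to hold an address on the machine running the code.

```c
#include <stdint.h>

/* Reads one byte from the location whose address is given as a plain integer. */
unsigned char read_byte_at(uintptr_t address)
{
    unsigned char volatile *p = (unsigned char volatile *)address;   /* explicit cast */
    return *p;
}

/* Writes one byte to the location whose address is given as a plain integer. */
void write_byte_at(uintptr_t address, unsigned char value)
{
    *(unsigned char volatile *)address = value;
}
```

On a PC, a caller would obtain the integer with something like (uintptr_t)&some_variable. On the microcontroller the integer would instead be a register address taken from the data sheet, such as 0x26.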
Volatile Keyword
The meaning of the keyword, volatile, in (*(unsigned char volatile *)(0x26)) is somewhat subtle, but very important in embedded programming. Syntactically, it is a type qualifier (a modifier of the variable's type), like the keyword const. The keyword, const, applied to a variable, tells the compiler that it should not allow the program to modify that variable. The keyword, volatile, tells the compiler that the variable could unexpectedly be changed.
You can always write a working program without using the const keyword – it just helps the compiler to find errors and makes the program easier to understand. There are cases where the program will run incorrectly if the volatile keyword is lacking.
In C, the variables normally are kept in specific locations in memory. The machine code produced by the compiler often moves the value of a variable into a register before it can use it. If the value is used several times, the compiler produces code that reuses the value in the register rather than reading it out of memory a second time.
This makes the code smaller and faster. It normally works because the compiler normally knows whether it has produced code that could have changed the value of the variable. The volatile keyword forces the compiler to read the variable from memory every time it is used.
In the case above where the variable is located at the address of a register, the value does change spontaneously, every time the inputs to port C change. Without the volatile keyword, the compiler would likely assume the value never changes and produce code that would read it only once, save that value and reuse it.
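A typical place where this matters is a loop that waits for an input to change. The sketch below uses a hypothetical helper, wait_for_bits, which takes a pointer to a volatile byte so that each pass of the loop performs a real read. The max_polls limit is an addition so that the function always returns.

```c
#include <stdint.h>

/* Polls *reg until any bit in mask is set, or until max_polls reads have
   been made. Returns the number of reads performed. */
unsigned wait_for_bits(uint8_t volatile *reg, uint8_t mask, unsigned max_polls)
{
    unsigned polls = 0;

    while (polls < max_polls) {
        polls++;
        if (*reg & mask)   /* volatile: a fresh read on every pass */
            break;
    }
    return polls;
}
```

If reg were declared without volatile, the compiler would be free to read the location once before the loop and test the saved copy on every pass, so a change on the input pins might never be seen.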
Location of Modifier Keyword in a Pointer Declaration
The location of a modifier in the declaration of a pointer is important. The following declarations are both syntactically correct, but they cause different operations in programs.
unsigned char volatile * p1;
unsigned char * volatile p2;
p1 is a pointer to a volatile byte. If the value of p1 is 0x0026, then the compiler must reread memory location 0x0026 each time *p1 is evaluated, in case the content of that location has changed. The volatile keyword is next to unsigned char; it refers to the unsigned char to which the pointer points.
p2 is a volatile pointer to a byte. If the value of p2 was 0x0026, then the compiler must reread the value of p2 in case the value is no longer 0x0026. The volatile keyword is next to p2; it refers to p2, the pointer itself.
Usually it is the first case: the location referred to is volatile, not the value of the pointer.
Program Example Section:
Here is our first program. It blinks an LED on the development board.
The program first sets the Port C Data Direction register to 0x0F. This makes the four lowest port C pins (PC0, PC1, PC2, PC3) into outputs. The other two remain inputs.
The variable val controls the LEDs; it alternates between 0x00 and 0x01. The loop that decrements the variable count provides a delay between the two states of the LED. Otherwise the blink would be too fast to see.
A Program to Blink an LED
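#include <inttypes.h>
#include <avr/io.h>

int main(void)
{
    int32_t volatile count;   /* delay counter */
    uint8_t val;              /* value written to Port C */

    DDRC = 0x0F;              /* PC0 to PC3 are outputs */
    val = 0x00;
    for(;;)
    {
        /* delay so the blink is slow enough to see */
        for( count=0x100000; count>=0; --count ) continue;
        val = val ^ 0x01;     /* toggle bit 0 */
        PORTC = val;
    }
}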
The symbols, DDRC and PORTC, are macros defined in <avr/io.h> as something like (*(unsigned char volatile *)(0x27)) and (*(unsigned char volatile *)(0x28)). The symbols, int32_t and uint8_t, are typedefs defined in <inttypes.h> as long and unsigned char.
Volatile Keyword Again
Note a different use of the volatile keyword in the declaration of count. Without it, the compiler would (correctly) decide that the line
for( count=0x100000; count>=0; --count ) continue;
is irrelevant to every calculation in the program, and eliminate it in the interest of execution speed and code size. That would eliminate the delay that we intended. Declaring count to be volatile forces the compiler to decrement and test it regardless (as well as forcing it to use the real variable in memory rather than a copy in a register).
The Program to Blink an LED, in Assembly Language
Before we get into C tools, we will use a version of this program written entirely in assembly language. All subsequent programs will be written entirely or almost entirely in C. In this case only, we do not need to use the C compiler.
Instead of C variables in memory, we use several of the many general purpose registers in this processor. R17 is used as val above. R20, R21 and R22 are used as count. (Even though count is four bytes long, we only use three here because the top byte is always zero. If we wanted very long delays, we would use a fourth.) R16 is used to hold the constant 1 that is used in the line
val = val ^ 0x01;
Here is the same functionality in assembly language.
.equ DDRC = 0x07
.equ PORTC = 0x08
.org 0              ; program starts at location 0 of program memory
    ldi r16, 0x0F   ; put value 0x0F in r16
    out DDRC, r16   ; port C bit 0-3 are output
    ldi r17, 0x00   ; value to output on port C
    out PORTC, r17  ; output it
    ldi r16, 0x01   ; put value 0x01 in r16
loop2:
    ldi r20, 0x00
    ldi r21, 0x00
    ldi r22, 0x10   ; load 24 bit counter with 0x100000
loop1:
    subi r20, 1
    sbci r21, 0
    sbci r22, 0     ; subtract 0x000001 from counter
    brcc loop1      ; loop if not negative
    eor r17, r16    ; change bit 0 of value to output
    out PORTC, r17  ; output value to port C pin
    rjmp loop2
Readers who are familiar with Intel x86 assembly language may be misled by the out instructions. There is no separate I/O address space. The out instruction causes a value to be stored in memory, but it is optimized to address locations 0x0020 to 0x005F (I/O addresses 0x00 to 0x3F). The instruction,
sts PORTC+0x20, r17
would also work. (The sts instruction takes the data memory address, which is the I/O address plus 0x20.) This instruction can address all data memory locations 0x0000 to 0xFFFF, but the instruction occupies 32 bits, whereas the out instruction occupies only 16 bits.
Chapter 3 Data Representation
This chapter describes how the data that represents the value of variables is stored in memory.
Hardware Section:
The hardware components involved in data storage are the processor and the data memory. These are included on the microcontroller chip.
The processor contains a computation unit (ALU) and 32 temporary storage registers which hold eight bits each.
The data memory has 1024 storage locations which hold eight bits each.
Processor Architecture Section:
Data Width is 8 Bits
Data memory is eight bits wide. This means that each memory location, having a unique address, holds 8 bits. In almost all modern processors, each unique address holds 8 bits, even though many read and write a larger number of bits in each operation. These 8 bits, when considered as a unit, are called a byte.
The processor we use is an eight bit processor. Most of the operations it does act on groups of 8 bits, one byte. In particular, all memory operations always read or write single bytes and almost all computation operations use single byte inputs and give a single byte output.
The size of data used by a processor is called the machine word size. Other processors use machine word size of two, four or eight bytes.
Hexadecimal Notation
Hexadecimal notation is useful to describe data. It uses sixteen possible values for each digit instead of ten. Traditionally, 0x is prefixed to distinguish it from normal (decimal) notation. One hexadecimal digit represents the numbers zero through fifteen (15 = 2^4 - 1), two digits 0 to 255 (255 = 2^8 - 1), four digits 0 to 65535 (65535 = 2^16 - 1), etc. Therefore, we use two hexadecimal digits to represent each byte. The possible values of a byte go from 0x00 to 0xFF.
If we want to write out the contents of a series of memory locations, we would write a series of groups of two digits, for example:
11 24 1F BE 8F EF 94 E0
(A 32 bit processor would have groups of eight digits.)
Multiple Word Operations
C language variables are 8, 16 or 32 bits wide. Longer variables require the processor to do operations like addition in pieces.
We will look at how to do arithmetic on variables that are multiple machine words in length.
The first issue is how to store multi-word variables in memory. These variables are stored as a sequence of words in consecutive addresses. In the case of an eight bit processor, these words are bytes. In other processors, each word is several bytes, but a C variable could still be several words.
Consider the C type, long (32 bits = 4 bytes), with a value of one thousand decimal. 1000 = 3*2^8 + 14*2^4 + 8*2^0, therefore 0x000003E8. The five leading zeros make the length 8 hexadecimal digits to show that it is a 32 bit number, rather than 0x03E8 which would indicate a 16 bit number. Divided into bytes, this is 0x00*2^24 + 0x00*2^16 + 0x03*2^8 + 0xE8*2^0. This number is stored in memory as:
E8 03 00 00
The least significant byte (LSB) is stored first. Least significant means the 2^0 byte. First means stored at the lowest consecutive address.
Not all processors store the least significant byte first. Others store 0x000003E8 as
00 00 03 E8
with the most significant byte (MSB) first.
Endian-ness
As long as the variables stored in memory are used only by code running on the same kind of processor, the order of storage is not a concern, since the machine instructions handle it properly.
In cases where the contents of memory are moved somewhere else, like a binary disk file, or sent over Ethernet, problems do occur. Because programmers need to worry about this problem, the two different storage schemes have names: “Big Endian” and “Little Endian”, referring to MSB first and LSB first respectively.
The 8 bit processor that we use in this class has a machine word size of one byte. In that sense, the processor does not have an Endian type – all stores and fetches move only one byte, there is no order of bytes.
That is not the case for the code produced by the C compiler. The machine code produced by the compiler must store two and four byte variables if the program specifies. If the processor has an Endian type, the compiler will conform to it. For example, if a 16 bit processor stores two byte words in Little Endian order, the C compiler will store a four byte variable using two two-byte store operations with the less significant two bytes coming before the more significant two bytes.
Although our 8 bit processor does not determine an Endian ordering, the C compiler we use uses Little Endian format.
There is one case where our processor does a two byte store: when it saves a return address on the stack, either when a CALL instruction is executed or when an interrupt occurs. Although it is not stated in the documentation, the two-byte return address is saved in Big Endian format. This is not a problem since C programs do not directly access the saved return address.
C Language Section:
Endian-ness
Here is an example of the relevance of Endian-ness in C code.
The following C code is from the Ethernet handling part of the Linux kernel:
struct ethhdr
{
    unsigned char h_dest[ETH_ALEN];   /* destination eth addr */
    unsigned char h_source[ETH_ALEN]; /* source ether addr */
    __be16 h_proto;                   /* packet type ID field */
} __attribute__((packed));
………
if (type != ETH_P_802_3)
    eth->h_proto = htons(type);
else
    eth->h_proto = htons(len);
When the kernel needs to send an Ethernet packet, it fills in a structure in memory and gives a pointer to the structure to the driver for the particular Ethernet controller used in the system. The driver does something equivalent to casting the pointer to (unsigned char *) and then using that pointer to sequentially read bytes from memory and send them on the wire. The receiving Ethernet driver, on a different computer does the reverse, filling in the structure byte by byte.
There is a problem in the member variable, h_proto, which holds either the packet type or the length of the packet. It is two bytes long, so its bytes in memory will be in a different order depending on whether the processor uses Big Endian or Little Endian storage. The Ethernet standard specifies that all multi-byte numbers must be in Big Endian format. A processor that uses Little Endian format is required to reverse the order of the bytes before storing into h_proto. The typedef, __be16, is just unsigned short, but its name indicates that it should be in Big Endian format, regardless of the format used by the processor.
Since Linux is compiled and run on both Little Endian processors like Intel x86 and Big Endian processors like Power PC, Linux code uses the macro, htons(), to convert a short from host to network format. On Big Endian processors, htons() does nothing. On Little Endian processors, htons() swaps the two bytes.
You can try compiling and running the following C program on a computer:
#include <stdio.h>
int main()
{
    unsigned short x = 1000;
    unsigned char * p = (unsigned char *)&x;
    printf( "%02X %02X\n", p[0], p[1] );
    return 0;
}
If the output is 03 E8, the processor is Big Endian. If the output is E8 03, the processor is Little Endian. You should notice the pointer cast. Without a pointer cast (or a union, which does the same thing) every program will give the same result regardless of the processor (excluding errors such as reading beyond the end of an array, etc.).
Program Example Section:
We will run the Endian test program on our system with a few changes.
#include "blcalls.h"
void __attribute__((noreturn)) main()
{
    unsigned short x = 1000;
    unsigned char * p = (unsigned char *)&x;
    DbgPrtByte( p[0] );
    DbgPrtByteNL( p[1] );
    DbgPrtStack( 8 );
    // stop execution here
    for(;;) continue;
}
This is the output:
E8 03
04F8 FA
04F9 04
04FA E8
04FB 03
04FC FF
04FD 04
04FE 00
04FF 4F
Endian-ness
The first line is the expected E8 03, showing unsigned short x has been stored in Little Endian format. The rest is a listing of the content of the stack, produced by the function, DbgPrtStack(). The above stack listing result depends on setting the compiler optimization level to 0. This causes the compiler to store all local variables on the stack rather than keeping them in registers.
We will study the stack later, for now just take the following information:
Addresses 04F8 and 04F9    unsigned char * p
Addresses 04FA and 04FB    unsigned short x
Address 04FC               saved register R28
Address 04FD               saved register R29
Addresses 04FE and 04FF    return address of call to main()
You can see that x is 0x03E8 and p is 0x04FA. Note that p does indeed point to the address of x.
Return Address on Stack
This is the assembly language instruction that called main() and the next instruction.
+0000004D: 940E0053 CALL 0x00000053
+0000004F: 940C0074 JMP 0x00000074
The return address for the call will be the instruction after the CALL, at address 0x004F. See that the two-byte return address, stored by the CALL instruction at address 0x04FE, is in Big Endian format. Remember that the return address 0x004F refers to program memory space but the other addresses, like the value of pointer p=0x04FA, refer to data memory space.
Bootloader Debug Functions
Finally, here is an explanation of the functions DbgPrtByte() etc. These functions are defined in the file, "blcalls.h", for example:
#define DbgPrtByte ( * ( void ( * )( unsigned char x ) ) 0x1FF2 )
The definition, *( void ( * )( unsigned char x ) ) 0x1FF2, is the constant, 0x1FF2, cast to be a pointer to a function taking one argument, unsigned char x, and returning void; the * at the beginning dereferences the pointer, giving the function itself, which is called when an argument list follows, as in DbgPrtByte( p[0] ).
Calls to functions defined outside of a program are normally resolved by the Linker after it finds code for the function. This is different. It is another example of casting a pointer to make C do something unusual.
There is machine code for a function already at address 0x1FF2 because we are using a bootloader program which is already programmed into the top 1K bytes of the microcontroller. The main purpose of the bootloader is to upload new programs into the rest of memory. A secondary feature is the existence of several debug functions already in program memory. The bootloader was programmed into the microcontroller when the demo board was manufactured.
Chapter 4 Data Representation, part 2
This chapter discusses two's complement and arithmetic on multi-word variables.
Hardware Section:
The hardware components to do arithmetic are the computation unit (ALU), the general purpose registers and the Status Register. These are included on the microcontroller chip.
Status Register and Carry Flag
The computation unit takes one or two inputs from general registers and calculates a result which it puts into one of the input registers. Additionally, it sets certain bits in the Status Register according to the result of the computation. Some computation operations also take input from the carry flag bit in the Status Register.
Two's Complement
Because of the way negative numbers are represented as bit patterns in two's complement arithmetic, the hardware operation for addition and subtraction is the same regardless of whether the input bit patterns represent signed or unsigned (or mixed) variables. The processor does not need separate instructions for the various cases. Multiplication is different: there the processor needs a different instruction for each case.
Processor Architecture Section:
Carry Flag for Add
The carry flag gets its name from the concept of carry in addition where the result of the sum of two digits is greater than a single digit and the number one must be added to the next higher digits. The same thing can happen between words when adding multi-word integers. Here is an example:
0x12345678 + 0x9ABCDEF0

     1  1       (carry)
  12 34 56 78
+ 9A BC DE F0
-------------
  AC F1 35 68

0x78 + 0xF0     = 0x168
0x56 + 0xDE + 1 = 0x135
0x34 + 0xBC + 1 = 0xF1
0x12 + 0x9A     = 0xAC
After each operation that adds one byte to one byte, the low eight bits of the possibly nine bit result is saved in a register and the carry flag is set to one if and only if the ninth bit is one. After the addition of the lowest bytes, the new value of the carry flag is added along with the two input bytes.
Two's Complement
Two's complement arithmetic is defined by interpreting the highest bit in signed numbers as negative. Here is an example for the bit pattern 0xF4 = 11110100 (one byte variable).
Position   bit value   unsigned   signed
2^7        1           128        -128
2^6        1            64          64
2^5        1            32          32
2^4        1            16          16
2^3        0             .           .
2^2        1             4           4
2^1        0             .           .
2^0        0             .           .
                       -----      -----
Value of variable       244        -12
Here are two examples of addition: adding 4 and adding 32. The bit patterns are the same while the interpretations are different:
Unsigned: 0xF4 + 0x04 = 0xF8 244 + 4 = 248
Unsigned: 0xF4 + 0x20 = 0x14 244 + 32 = 20 (overflow)
Signed: 0xF4 + 0x04 = 0xF8 -12 + 4 = -8
Signed: 0xF4 + 0x20 = 0x14 -12 + 32 = 20
The case 244 + 32 = 20 is a case of overflow. The operation produced a carry (equal to 2^8 = 256) which was discarded.
Adding negative numbers can also produce an overflow:
0xF4 + 0x88 = 0x7C -12 + -120 = 124 (overflow)
In both cases of overflow, the result is off by 256:
244 + 32 = 276 = 20 + 256
-12 + -120 = -132 = 124 - 256
Overflow errors are always multiples of 256 because the bit position of discarded carries is 2^8 = 256.
Mathematically, this is modular arithmetic: modulo 2^8 = 256 for one byte char, modulo 2^16 = 65536 for two byte short, and modulo 2^32 = 4294967296 for four byte long.
The modulus, 2^n, is also the difference between the signed and unsigned interpretation of numbers where the high bit is on. This is because of the difference in the interpretation of the high bit: the difference between 2^(n-1) and -2^(n-1) is 2^n.
The fact that the signed and unsigned interpretations differ only by 0 or 2^n is the reason both work using the same operation at the bit level: any possible difference between the results also differs by 2^n, which is already the case.
Signed and Unsigned Comparisons
In the preceding section, we saw that addition and subtraction work without the need to distinguish whether the bytes represent unsigned or signed variables. The same is true when testing whether two variables are equal. If and only if
(signed int)x == (signed int)y is true, then
(unsigned int)x == (unsigned int)y is true.
This is not true for tests of greater and less than:
(signed char)0xFF < (signed char)0x01 is true,
but
(unsigned char)0xFF < (unsigned char)0x01 is not true.
Status Register and Comparisons
There are six flags in the Status Register that are set along with the numerical result of arithmetic operations. We have already seen the carry flag, which is one of those six. (There are two additional flags in the Status Register which can be changed by instructions explicitly, but do not change as a by-product of operations.)
As mentioned, the carry flag is used as an input to some operations, for instance when it is added along with two input bytes. The other five flags are not used in calculation operations. They are used to control branching as the result of comparisons.
Zero Flag
The simplest flag is the zero flag, which is set to one when the result of an operation is zero. Branching depending on the zero flag after a compare (subtraction) operation, tests for equality. A test for equality to zero is done by a branch instruction depending on the zero flag, preceded by a test instruction.
Negative Flag
The negative flag is set when the high bit in a result is set. A test for a signed variable less than zero is done by a branch instruction depending on the negative flag, preceded by a test instruction.
Overflow Flag
The overflow flag is set when there was an overflow in a signed operation. For add, it is set when both inputs are either negative (high bit set) or positive (high bit clear) and the result is the opposite way. For subtract, A - B = C, it is set when the result C has the same sign as the number being subtracted, B, and the number from which it is subtracted, A, has the opposite sign, i.e.
A - B = C is equivalent to B + C = A, then apply the rule for addition.
Carry Flag
The carry flag is set according to the rules of addition: it is set if the sum of the high bits and the internal carry from adding the next lower bits is greater than one (it can be up to three counting the carry). In the case of subtraction, A - B = C, the carry flag acts as a borrow: it is set as it would be when adding B back into the result, C (B + C = A).
For subtraction of unsigned variables, carry is set when B is greater than A.
S Flag
The S (sign) flag is used for comparison of signed variables. It is set as the exclusive-or of the negative and overflow flags. Branching depending on the S flag after a compare (subtraction) operation tests for less than (or for greater than or equal) in signed variables.
Half-Carry Flag
The half-carry flag is seldom used. It is set according to the internal carry from bit 3 to bit 4. Its use is in BCD arithmetic. In BCD, each nibble (half byte) may contain codes from 0x0 to 0x9 (A through F forbidden) representing a decimal digit. Thus, the decimal number 1024 would be represented as 0x1024. (The decimal value of the representation normally would be 4132, but that is irrelevant. It is intended to represent the value 1024 decimal.) After performing a normal addition of such numbers, the result needs to be corrected for the appearance of nibbles 0xA through 0xF.
Carry Flag for Multi Word Operations
We have seen how the carry flag holds the value of the carry resulting from the addition of one pair of bytes, to be used in the addition of the next.
For subtraction, A - B = C, the carry flag (and overflow flag) are set as they would be when adding B back into the result, C.
C Language Section:
Review of Variable Declaration
This section is a short review of the properties of variables in C. See K&R for details. The keywords used to declare variables are presented. The keywords are grouped according to similar functionality.
Type specifier category:
Type specifier signed modifier category:
These keywords determine the size and representation type (integer or float). Type, void, cannot be used to declare variables. It is used to declare pointers (of unspecified type), function return value (no value returned) and an argument list (indicating no arguments).
Note to Java programmers: In Java, short is 16 bits as in most C compilers, but char is 16 bits (the 8 bit type is byte), int is always 32 bits, and long is always 64 bits.
The type, int, is intended to be the “natural machine size”, which probably means the word size of the machine. The intention may have been to provide a size that works most efficiently on a given machine. Sizes greater than the machine word size require multiple operations and are obviously slower. Sizes smaller than the machine word size could in theory require extra operations. Usually processors with machine word size greater than one byte provide extra hardware to allow computations on smaller sizes at full speed.
The floating point types, float and double, also do not have a standard size.
Here are the actual sizes for these types used by the GNU compiler for the 8 bit AVR and the 32 bit X86 processors:
type int float double
AVR 16 32 32
X86 32 32 64
Most environments have an include file that uses typedef to provide more sensible names. The include file, stdint.h, defines the following types:
This file also provides some types that improve on the intention of int. These types define integers, with sizes that could change for different processors, but where the minimum size is specified. For example if you have a loop counter that goes from zero to nine, an 8 bit variable will work. Using a 16 or 32 bit variable on an 8 bit machine would be inefficient. Using an 8 bit variable on a 32 bit machine would work correctly although possibly less efficiently, but it would appear strange and force someone reading the program to pause and wonder why. Using uint_least8_t (in stdint.h) allows the compiler to use a longer size (when it is better on some particular processor). It also informs the human reader of the intention of the writer of the program.
The other groups of keywords related to variables describe other qualities of the variable.
Type qualifier category:
Storage-class specifier category:
static: Function – name is not visible outside current file. Local – opposite of auto, there is one global instance, but it is not visible outside the function.
auto: can be used only in function local scope (since this is the default, it is never necessary, therefore auto is almost never used).
register: in a processor register instead of the stack – used only for optimization, the compiler may ignore the request.
extern: declared elsewhere, e.g. a different file or below in same file.
for a variable
Program Example Section:
Status Register Flags
Here is a program that allows us to examine the values of the flags in the Status Register after addition and subtraction operations. It depends on a special feature of the AVR processor, that the Status Register is available as a memory mapped control register, so it can be read in the same way as registers of peripheral devices.
C Only Version
The functions, AddFlags and SubFlags, do an 8 bit addition or subtraction, and the Status Register is read immediately thereafter. (The statement, return SREG, specifies that the value of SREG must be returned. This causes it to be read.)
Although this program works, it has a serious problem. The C compiler is guaranteed to produce an executable program that does what the C code specifies. There are no guarantees how it does that. The code below tells the compiler to add two bytes and to return the value of SREG.
To the compiler, SREG is just a memory location, it does not know that we care about the effect of an add or sub machine instruction on the flags. It might read SREG before doing the add instruction since it knows of no connection between the two operations. With default optimization, -Os, without the volatile keyword in AddFlags and SubFlags, this will indeed happen.
There is also no reason the compiler needs to use an add instruction to calculate the sum of the bytes. For example, to calculate a + b, it might have the value (-b) already in a register from an earlier operation, and it would be more efficient to subtract (-b) from a.
We have not told the compiler, and there is no way in the C language to tell it, to use a particular machine instruction or that the instruction has a side effect.
After we run this program, we will learn the correct way.
#include "blcalls.h"
#include <stdint.h>
#include <avr/io.h>

#define TST_BIT( r, n ) ( (r) & (1<<(n)) )

// some bytes to try
#define NBYTES 6
uint8_t bytes[NBYTES] = {0x80, 0x81, 0xFF, 0x00, 0x01, 0x7F };

char flagNames[] = "cznvshti"; // names of flags in Status Reg

uint8_t AddFlags( uint8_t aa, uint8_t bb, uint8_t * pc );
uint8_t SubFlags( uint8_t aa, uint8_t bb, uint8_t * pc );

void __attribute__((noreturn)) main()
{
    uint8_t t, i, j, a, b, c, f, k;
    char ch;
    for( t=0; t<2; ++t )
    { // t selects + or -
        for( i=0; i<NBYTES; ++i )
        { // i selects first byte
            a = bytes[i];
            for( j=0; j<NBYTES; ++j )
            { // j selects second byte
                b = bytes[j];
                f = (t ? AddFlags : SubFlags)( a, b, &c );
                DbgPrtByte( a );
                DbgPrtChar( t ? '+' : '-' );
                DbgPrtByte( b );
                DbgPrtChar( '=' );
                DbgPrtByte( c );
                for( k=0; k<8; ++k )
                { // k selects which flag to display
                    ch = flagNames[k];
                    if( TST_BIT(f, k) ) ch -= 0x20; // -= 0x20 converts lower to upper
                    DbgPrtChar( ch );
                }
                DbgPrtCharNL( ' ' );
            }
        }
        DbgPrtCharNL( ' ' );
    }
    // stop execution here
    for(;;) continue;
}

uint8_t AddFlags( uint8_t a, uint8_t b, uint8_t * pc )
{
    uint8_t volatile c;
    uint8_t flg;
    c = a + b;
    flg = SREG;
    *pc = c;
    return flg;
}

uint8_t SubFlags( uint8_t aa, uint8_t bb, uint8_t * pc )
{
    uint8_t volatile cc;
    uint8_t flg;
    cc = aa - bb;
    flg = SREG;
    *pc = cc;
    return flg;
}
Here are a few comments about the program:
The expression (t ? AddFlags : SubFlags) is the address of a function. Appending ( a, b, &c ) causes the function to be called with those arguments.
ch = flagNames[k];
Remember that a text string is an array of char terminated with a zero byte. Therefore "cznvshti" is the same as
{ 'c', 'z', 'n', 'v', 's', 'h', 't', 'i', '\0' }
What we wanted was the eight characters without the zero byte. Using a text string, which has the zero byte, wastes one memory location. In this case, the programmer wasted a byte to save typing and to make the code easier to read.
Inline Assembly Language Version
The only dependably correct way to make the program use the add instruction and then to read the Status Register without any intervening instructions that change the Status Register, is to write that section of code in assembly language.
The GNU C compiler supports inline assembly language, which allows mixing assembly language into C code. Without this feature, the functions, AddFlags and SubFlags, would need to be written entirely in assembly language.
Here is one way to do it using assembly language. (The syntax is complicated. Please read C:/WinAVR/doc/avr-libc/avr-libc-user-manual/inline_asm.html). Note that putting both assembly language instructions in one statement prevents the compiler from separating them or switching their order.
uint8_t AddFlags( uint8_t a, uint8_t b, uint8_t * pc )
{
    uint8_t flg;
    // a = a + b;
    // flg = SREG;
    asm volatile ("add %0, %2 \n\t"
                  "in %1, %3"
                  : "+r" (a), "=r" (flg)
                  : "r" (b), "I" (_SFR_IO_ADDR(SREG))
                 );
    *pc = a;
    return flg;
}
Here is the assembly language produced. (To get the assembly language output of the compiler, choose menu item Project->Configuration Options, then select the Custom Options pane, type -save-temps in the area next to the Add button and click the Add button. This causes two temporary files to be kept after the compilation is finished: the preprocessor output (extension .i) and the compiler assembly language output (extension .s).)
AddFlags:
movw r30,r20
add r24, r22
in r25, 63
st Z,r24
mov r24,r25
ret
Functions Entirely in Assembly Language
Finally, we will look at writing the functions, AddFlags and SubFlags, entirely in assembly language. Reading FAQ.html#faq_reg_usage tells us that the arguments, uint8_t a and uint8_t b are found in registers, r24 and r22 respectively, uint8_t * pc (two byte pointer) is in r20 and r21, and the uint8_t return value needs to be in r24.
We don't want to be experts in assembly language. It is often useful to start with the assembly language produced by the compiler, then modify it as needed. Here is the C code that worked:
uint8_t AddFlags( uint8_t a, uint8_t b, uint8_t * pc )
{
    uint8_t volatile c;
    uint8_t flg;
    c = a + b;
    flg = SREG;
    *pc = c;
    return flg;
}
This is the assembly language produced:
.global AddFlags
.type AddFlags, @function
AddFlags:
    push r29
    push r28
    push __tmp_reg__
    in r28,__SP_L__
    in r29,__SP_H__
    movw r30,r20
    add r22,r24
    std Y+1,r22
    in r24,__SREG__
    ldd r25,Y+1
    st Z,r25
    pop __tmp_reg__
    pop r28
    pop r29
    ret
The push and in instructions are used to set up a frame on the stack for the local variable, uint8_t volatile c;. We could use this code, but it is more complicated than we need.
Now, let's try allowing (and requesting) that a register be used for c.
uint8_t AddFlags( uint8_t a, uint8_t b, uint8_t * pc )
{
    uint8_t register c;
    uint8_t flg;
    c = a + b;
    flg = SREG;
    *pc = c;
    return flg;
}
This is better, except that it is wrong. The in instruction should come after the add instruction.
.global AddFlags
.type AddFlags, @function
AddFlags:
    movw r30,r20
    in r25,__SREG__
    add r22,r24
    st Z,r22
    mov r24,r25
    ret
We can fix that by hand:
.global AddFlags
.type AddFlags, @function
AddFlags:
    movw r30,r20
    add r22,r24
    in r25,__SREG__
    st Z,r22
    mov r24,r25
    ret
Before, the compiler read the Status Register into r25 then moved it to r24 after it used the value of a in r24. Now, the value of a has already been used and we can read the Status Register directly into r24.
.global AddFlags
.type AddFlags, @function
AddFlags:
    movw r30,r20
    add r22,r24
    in r24,__SREG__
    st Z,r22
    ret
Here is the final file. We named it addsub.s and added it to the project. We had to add the line,
#include <avr/io.h>
and change __SREG__ to _SFR_IO_ADDR(SREG) (since __SREG__ is apparently defined in a different include file). The line
.section .text
specifies program memory, which is default even without this line.
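#include <avr/io.h>
.section .text
.global AddFlags
.type AddFlags, @function
AddFlags:
    movw r30,r20
    add r22,r24                  ; r22 = b + a
    in r24,_SFR_IO_ADDR(SREG)    ; return the flags
    st Z,r22                     ; store the sum
    ret
.global SubFlags
.type SubFlags, @function
SubFlags:
    movw r30,r20
    sub r24,r22                  ; r24 = a - b
    st Z,r24                     ; store the difference (st does not change the flags)
    in r24,_SFR_IO_ADDR(SREG)    ; return the flags
    ret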
Chapter 5 First C Program
This chapter discusses the Digital I/O port peripheral hardware of the microcontroller and the GNU tool-chain. We have used both already.
Hardware section:
The microcontroller we are using has 28 electrical pins. Five pins are used for power. The remaining 23 pins are divided into Port B with 8 pins, Port C with 7 pins and Port D with 8 pins. Four of the port pins can be programmed as dedicated hardware signals related to clock and reset. This is done by an external programmer changing the Fuse bits. Most of the Fuse bits cannot be changed by the program running on the microcontroller. The microcontroller on the development board we are using has been programmed to have a dedicated external reset input, which reduces the number of available Port C pins from seven to six.
Most of the 22 pins plus the reset input are connected to parts on the development board. Although all can be used as digital inputs and outputs, seven are connected to parts that are intended to be used this way:
Port B: pin PB1 as output speaker
Port C: pins PC0, PC1, PC2, PC3 as outputs LEDs
Port D: pins PD2, PD3 as inputs pushbuttons
Processor Architecture Section:
Digital I/O Ports
The Digital I/O port peripheral hardware gives the program running in the microcontroller the ability to input and output digital signals on the 23 port pins. The microcontroller has several more complex peripheral devices that also use these port pins. When any of those peripheral devices are enabled, whichever pins they use are no longer used as Digital I/O ports. The Digital I/O ports and all the other peripheral devices can be enabled, disabled and programmed at any time by the running program.
Each of the three digital I/O ports has three control registers. Each of the eight bits of each register corresponds to one electrical pin. For example, the lowest bit (bit 0 corresponding to 2^0 = 0x01) in the three registers for Port B controls pin PB0 and bit 1 (corresponding to 2^1 = 0x02) in these same registers controls pin PB1 which is connected to the speaker.
The three registers that control Port B are called PINB, PORTB and DDRB. There are similar registers, PINC, PORTC, DDRC, PIND, PORTD and DDRD, for Ports C and D.
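Reading a PIN register gets eight bits that correspond to the voltage level on the microcontroller pins associated with that port; 1 for high voltage level, 0 for low voltage level. (Writing to a PIN register does not set the pin level. On the ATmega48/88/168 family, writing a 1 to a PIN bit toggles the corresponding PORT bit, so avoid writing to PIN unless that is intended.)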
The PORT and DDR registers control how the pin is driven. If the bit for the pin in the DDR register is 1, the pin is driven to a low or high voltage level depending on the bit in the PORT register (0 for low, 1 for high). DDR means Data Direction Register, with a bit value 1 meaning output and 0 meaning input. Reading either register gives whatever value was previously written, regardless of the voltage on the pin.
When the bit for the pin in the DDR register is 1, the pin is strongly driven high or low. In this case, the pin can drive external circuitry. Forcing the pin to the opposite level from outside causes excessive current to flow, which can overheat the microcontroller.
When the bit for the pin in the DDR register is 0, the pin is either not driven or weakly driven to the high level. In either case, the pin can be driven either high or low by external circuitry. The weak drive is useful when the pin is connected to a switch connected to ground (low voltage level). When the switch is closed, the connection overcomes the weak drive to high and reading PIN gives 0. When the switch is open, the weak drive pulls the pin to high level and reading PIN gives 1. Without the weak high drive, the voltage on the pin would float freely when the switch is open, and it could read as either level.
When the bit for the pin in the DDR register is 0, the bit in PORT selects between the two cases: the pin is not driven (high impedance) when the bit in PORT is 0, and weakly driven high when the bit in PORT is 1.
DDR PORT drive
0 0 none
0 1 weak high
1 0 strong low
1 1 strong high
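The table above can be written as a small C function. This is only a sketch for checking your understanding on a PC; the names drive_t and pin_drive are made up for this illustration and are not part of any AVR header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, used only for this illustration. */
typedef enum {
    DRIVE_NONE,        /* DDR=0, PORT=0: high impedance */
    DRIVE_WEAK_HIGH,   /* DDR=0, PORT=1: weak pull up   */
    DRIVE_STRONG_LOW,  /* DDR=1, PORT=0                 */
    DRIVE_STRONG_HIGH  /* DDR=1, PORT=1                 */
} drive_t;

/* Return how pin number `bit` (0..7) is driven, given the values
   written to the DDR and PORT registers of its port. */
static drive_t pin_drive(uint8_t ddr, uint8_t port, uint8_t bit)
{
    assert(bit < 8);
    uint8_t mask = (uint8_t)(1u << bit);

    if (ddr & mask)
        return (port & mask) ? DRIVE_STRONG_HIGH : DRIVE_STRONG_LOW;
    return (port & mask) ? DRIVE_WEAK_HIGH : DRIVE_NONE;
}
```

For example, with DDRC = 0x0F and PORTC = 0xF0, pin PC0 is driven strongly low and pin PC4 is weakly pulled high.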
Development Board: LEDs and Pushbuttons
For the development board, Port C bits 0-3 should be set as outputs:
DDRC = 0x0F;
The LEDs can be turned off or on by writing the PORTC register.
PORTC = value | 0xF0;
The 0xF0 causes the unused pins to be driven weakly high. This is the safest drive for unconnected pins. If they are driven strongly, the pins are subject to over current when accidentally shorted. If they are floating, they can go to an intermediate voltage which unnecessarily increases the power used by the microcontroller.
For the development board, Port D bits 2-3 should be set as inputs with weak pull up (high drive):
DDRD = 0x00;
PORTD = 0x0C;
The push buttons can be read by reading the PIND register:
value = PIND;
Because pushing the button closes the circuit to ground, pushed reads as 0 and not pushed as 1. This is the opposite of what we expect, i.e. non-zero = TRUE. For example, in the code:
#define BUTTON_D2 (1<<2)
if( PIND & BUTTON_D2 )
{
    ...
}
the true condition of the if would be taken when the button is not pushed. One nice way to correct this is to use the C bitwise complement (NOT) operator, which just reverses the sense of all the bits:
if( (~PIND) & BUTTON_D2 )
(The extra parentheses are not necessary in this case, but it is much better to use parentheses when unnecessary than to omit them when necessary. They also make clearer to a human reader what is to be done.)
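The inversion can be tried on a PC by passing in a value as if it had been read from PIND. This is a minimal sketch; the helper name button_pushed is an assumption made for this example.

```c
#include <stdint.h>

#define BUTTON_D2 (1u << 2)
#define BUTTON_D3 (1u << 3)

/* pind is a value previously read from the PIND register.
   Returns non-zero when the button selected by mask is pushed.
   A pushed button connects its pin to ground, so the pin reads 0
   and the complement turns that into a 1 bit. */
static uint8_t button_pushed(uint8_t pind, uint8_t mask)
{
    return (uint8_t)((~pind) & mask);
}
```

With this helper, 0xFF (nothing pushed) gives 0 for either button, and 0xFB (bit 2 low) gives a non-zero result for BUTTON_D2 only.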
C Language Section:
In this chapter we will learn about the GNU C compiler rather than C language.
The purpose of the compiler is to translate source code written in C into an executable program that can be put into the microcontroller.
Everything presented below is handled automatically by the IDE (integrated development environment), AVR Studio. You may need to know these details if you use the GNU tools with a different processor (with a different IDE or no IDE) and if you need to specify non standard options to the programs that perform the steps of compilation.
GNU C Compiler
The file name for the GNU C compiler is gcc.exe. For the version we use, it has been changed to avr-gcc.exe in order not to conflict with the native Linux GNU C compiler. The file names of all the other tools (cpp.exe, as.exe, ld.exe, etc.) also have the avr- prefix.
These are the steps that convert source code written in C into an executable program:
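Step             What it does
1 Preprocessor   Substitutes #define macros and inserts the text of #include files
2 Compiler       Translates the preprocessed C code into assembly language
3 Assembler      Translates assembly language into object code
4 Linker         Combines object code files, resolves inter file references and decides locations in memory
5, 6             Convert the executable code into an ASCII format used by hardware device programmers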
The compiler, gcc, is capable of executing the first four steps, which is sufficient for a program to be run on a PC. For an embedded processor, the final two steps convert the executable code into an ASCII format that is used by hardware device programmers.
In the case where more than one file is compiled and linked together, the Compiler, gcc is used to produce object code (first three steps) for each file and the Linker, ld, links them.
Besides using C language files, assembly language files can also be converted to object files (step 3, Assembler) and previously created object files can be directly taken as input to the Linker.
Previously created object files can be combined into library files. The linker can search library files and select only those object files it needs. The GNU tool, ar, can be used to create and examine library files.
The GNU tool, nm, can be used to examine object files. Both ar and nm are documented in Binutils.
Look in directory C:/WinAVR/doc and subdirectories for all GNU documentation.
Non Standard Options
Some options can be changed in the AVR Studio IDE. In the IDE, choose menu item Project->Configuration Options. For other options, you must modify the makefile, see the Linker Options section below.
Optimization Level
In the "General" pane, you can change the compiler optimization level. Optimization level -O0 (minus, uppercase O, zero) turns off all optimization. This is useful when tracing the execution of a program, especially at the assembly language level. Optimization level -Os is usually the best for a smaller, faster program, but it is hard to trace because statements can be rearranged and variable values can be left in registers where they are harder to find.
-save-temps
One useful option is -save-temps, mentioned above. In the IDE, choose menu item Project->Configuration Options, then select the Custom Options pane, type -save-temps in the area next to the Add button and click the Add button.
-save-temps causes two temporary files to be preserved, which allows you to examine them: the C language output of the preprocessor (extension .i), and the assembly language output of the C compiler (extension .s).
If there is an error in inline assembly language, it is detected at the assembler stage. Without specifying -save-temps, the IDE cannot show you the assembly language line that contains the error.
If you have some knowledge of assembly language, reading the assembly language produced from complicated C code can help you understand what it meant to the compiler. Reading the assembly language can help when you need to optimize a section of code to run faster. Sometimes this optimization might involve changing the C code, other times it might require inline assembly language. Often you will see that the compiler has already done the best job possible.
The preprocessor does text substitution for #define macros and inserts the text of #include files into the main file. When there is an error, the C compiler reports the problem it sees in the file after this substitution. It can be difficult to figure out exactly what is happening, especially when included files conditionally include other files and macros involve other macros. Reading the output of the preprocessor allows you to see exactly what the compiler sees.
Linker Options
There are two methods to send special instructions to the linker: passing command line flags and creating a script file. Some instructions are done using one method, some using the other.
In both cases, you will need to modify the makefile. In the AVR Studio IDE, choose Build->Export Makefile. Then choose Project->Configuration Options. Select the "General" button, check "Use External Makefile", then click the "..." button and select the makefile that you exported (default file name Makefile). On the left pane, right click "Other Files", select "Add Existing File" and select the makefile. Open the makefile (double click the name), and find the line similar to
LDFLAGS += -Wl,-Map=projname.map
Edit the line, adding the desired linker command line flags, prefixed with -Wl, (minus, uppercase W, lowercase L, comma, with no space after the comma). For example
LDFLAGS += -Wl,-T,linker.scp -Wl,-Map=projname.map
adds the linker command line flag -T,linker.scp which tells the linker to use a script file named linker.scp.
The prefix -Wl, is needed because the makefile invokes the compiler, avr-gcc.exe, rather than avr-ld.exe. The compiler then runs the linker. The prefix -Wl, is a command line flag for the compiler to pass the rest of the flag to the linker when it runs the linker. It's unpleasantly complicated, but that is what you need to do.
Look at the linker documentation at C:/WinAVR-20080610/doc/binutils/ld.html/index.html or enter avr-ld.exe --help at the command prompt.
Linker Script
The best way to create a linker script is to modify the default linker script. You can get the default script by entering at the command prompt
avr-ld.exe --verbose >linker.scp
Then edit it, discarding the lines
==================================================
and the small amount of text outside them.
Edit linker.scp as desired and modify
LDFLAGS += -Wl,-T,linker.scp -Wl,-Map=projname.map
as explained above.
If you want only to append to the default linker script, you should include the script to append as an input file to the linker rather than using the -T command line flag. Find and modify the line in the makefile
LINKONLYOBJECTS =
to
LINKONLYOBJECTS = append.scp
It is still useful to dump and examine the default linker script.
Program Example Section:
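We will write a program that uses the buttons and the LEDs on the development board. Just as an example, we specify that the program should do the following:
Display a 4 bit number (0 to 15) on the four LEDs, starting at 0.
Increment the number each time button D3 is pushed.
Decrement the number each time button D2 is pushed.
Reset the number to 0 when both buttons are down.
(15 wraps to 0 and the reverse.)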
There are a few issues that are not obvious.
The program is a loop. If we increment the number every time we see the button down, it will change hundreds of thousands of times per second while the button is down. Therefore, we increment the number only the first time we see the button down, and not again until we have seen it released. The following code accomplishes this for both buttons:
buttons = (~PIND) & BUTTON_ALL;
if( buttons == oldButtons ) continue;
oldButtons = buttons;
...
The program only reaches the line ..., if the buttons have changed. Then the program should increment the number if the button has gone from released to down but not the reverse.
The next problem is resetting when both buttons are down. Normally the user will not release both buttons at exactly the same time. So the number would be incremented or decremented by one after reset, depending on which button was released second. Adding the variable, isReset, allows us to block increment and decrement until both buttons are released.
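The change detection and the isReset logic can be separated from the hardware and tried on a PC. The sketch below is one way to do that, not the form used in the class program: the type counter_t and the function counter_step are hypothetical names, and the buttons argument stands for the already inverted value (~PIND) & BUTTON_ALL.

```c
#include <stdint.h>

#define BUTTON_D2  (1u << 2)
#define BUTTON_D3  (1u << 3)
#define LED_ALL    0x0Fu

/* Hypothetical state record for this sketch. */
typedef struct {
    uint8_t leds;        /* 4 bit number shown on the LEDs */
    uint8_t old_buttons; /* button state seen on the previous pass */
    uint8_t is_reset;    /* set while a two button reset is in progress */
} counter_t;

/* Process one sample of the button bits (1 = pushed).
   Returns 1 if the button state changed, 0 otherwise. */
static int counter_step(counter_t *c, uint8_t buttons)
{
    if (buttons == c->old_buttons)
        return 0;                  /* no change: do not count again */
    c->old_buttons = buttons;

    switch (buttons) {
    case 0:                        /* both released: counting allowed again */
        c->is_reset = 0;
        break;
    case BUTTON_D2:
        if (!c->is_reset) c->leds = (uint8_t)((c->leds - 1) & LED_ALL);
        break;
    case BUTTON_D3:
        if (!c->is_reset) c->leds = (uint8_t)((c->leds + 1) & LED_ALL);
        break;
    case BUTTON_D2 | BUTTON_D3:    /* both down: reset and block counting */
        c->is_reset = 1;
        c->leds = 0;
        break;
    }
    return 1;
}
```

Feeding the sequence D3, 0, D2, 0, D2 into a zeroed counter_t gives 1, 1, 0, 0, 15, which shows both the single count per push and the wrap from 0 to 15.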
Finally there is the contact bounce problem. Contact bounce is an undesired property of mechanical switches, due to the fact that the moving part often bounces for a short time after the first time that it makes the contact. Since our program counts the number of times the contact is made (and it is fast enough to see all the bounces), contact bounce can cause multiple counts each time the button is pressed. This is not what we want.
This problem can be avoided by adding external capacitors and resistors, but these add to the cost of the parts and of manufacture. It is usually better to use a software solution.
After the first change in the state of the two buttons is detected, the code delays long enough for the bouncing to have stopped and checks again. Every time there is a difference, it waits and tries again. If the delay is long enough, the state is the same after the delay and the program goes on to act on the new state.
You can observe contact bounce by removing this code. Just add the line
#define NO_DEBOUNCE
at the beginning to temporarily eliminate the debounce code. The buttons on the development board don't always bounce, but you should be able to see a problem after a few tries. You can also see counts due to bounce when the button is released.
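The wait-and-compare idea can be tried on a PC by replacing the hardware read with a scripted sequence of readings. This is only a sketch under that assumption: debounce, scripted_read, no_wait and the script array are hypothetical names. On the target, the read function would return (~PIND) & BUTTON_ALL and the wait function would call Delay().

```c
#include <stdint.h>

typedef uint8_t (*read_fn)(void);   /* returns current button bits */
typedef void    (*delay_fn)(void);  /* waits long enough for bounce to end */

/* Starting from a first reading, wait and read again until two
   successive readings agree, then return the stable value. */
static uint8_t debounce(uint8_t first, read_fn read_buttons, delay_fn wait)
{
    uint8_t old = first;
    for (;;) {
        wait();
        uint8_t now = read_buttons();
        if (now == old)
            return now;
        old = now;
    }
}

/* Scripted input that bounces: down, up, down, then stays down. */
static const uint8_t script[] = { 0x04, 0x00, 0x04, 0x04, 0x04 };
static unsigned script_pos;

static uint8_t scripted_read(void)
{
    uint8_t v = script[script_pos];
    if (script_pos + 1 < sizeof script)
        script_pos++;               /* stay on the last value at the end */
    return v;
}

static void no_wait(void) { }       /* no real delay needed on the PC */
```

Calling debounce(scripted_read(), scripted_read, no_wait) with script_pos set to 0 skips over the bounce and returns 0x04 once two readings in a row agree.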
The function, void Delay( uint32_t dd ), is written using inline assembly language. It could have been written in C, provided the count variable was declared volatile as in the program example in Chapter 2. The assembly language loop is many times faster which allows better resolution and shorter possible delays. (The C version is slower because the volatile keyword forces four bytes to be loaded into registers, decremented and then stored back each time through the loop. The assembler version only decrements one byte per loop except when there is a borrow, and it keeps everything in registers.)
#include <inttypes.h>
#include <avr/io.h>

#define PORTB_UNUSED ( 0xFF )
#define LED_C0 (1<<0)
#define LED_C1 (1<<1)
#define LED_C2 (1<<2)
#define LED_C3 (1<<3)
#define LED_ALL (LED_C0 | LED_C1 | LED_C2 | LED_C3 )
#define PORTC_UNUSED ( 0xF0 )
#define BUTTON_D2 (1<<2)
#define BUTTON_D3 (1<<3)
#define BUTTON_ALL (BUTTON_D2 | BUTTON_D3)
#define PORTD_UNUSED ( 0x30 )

void Delay( uint32_t dd );
void main()
{
    uint8_t leds, buttons, oldButtons, isReset;

    // Port B: all inputs with weak pull up
    DDRB = 0x00;
    PORTB = PORTB_UNUSED;
    // Port C: LED pins are outputs, unused pins weak pull up
    DDRC = LED_ALL;
    PORTC = PORTC_UNUSED;
    // Port D: all inputs, weak pull up on buttons and unused pins
    DDRD = 0x00;
    PORTD = PORTD_UNUSED | BUTTON_ALL;

    leds = 0;
    oldButtons = 0;
    isReset = 0;
    for(;;)
    {
        // wait for buttons to change
        buttons = (~PIND) & BUTTON_ALL;
        if( buttons == oldButtons ) continue;
        oldButtons = buttons;
#ifndef NO_DEBOUNCE
        // wait for buttons not to change for some time
        for(;;)
        {
            Delay( 100000 );
            buttons = (~PIND) & BUTTON_ALL;
            if( buttons == oldButtons ) break;
            oldButtons = buttons;
        }
#endif
        switch( buttons )
        {
        case 0:
            isReset = 0;
            break;
        // once reset is initiated, don't count
        // until both buttons up
        case BUTTON_D2:
            if( !isReset ) --leds;
            break;
        case BUTTON_D3:
            if( !isReset ) ++leds;
            break;
        case BUTTON_D2 | BUTTON_D3:
            isReset = 1;
            leds = 0;
            break;
        }
        PORTC = PORTC_UNUSED | ( leds & LED_ALL );
    }
}
void Delay( uint32_t dd )
{   // do not use dd = 0
    asm volatile ("1: \n\t"
                  "subi %A0,1 \n\t"
                  "brcc 1b \n\t"
                  "subi %B0,1 \n\t"
                  "brcc 1b \n\t"
                  "subi %C0,1 \n\t"
                  "brcc 1b \n\t"
                  "subi %D0,1 \n\t"
                  "brcc 1b" : "=d" (dd) : "0" (dd) );
}
Chapter 6 Interrupts
This chapter discusses interrupts and interrupt processing. Although interrupt support is not necessary for a processor to do useful work, it is important enough that all common processors, from the smallest microcontroller to the latest billion transistor CPUs, support interrupts.
At a minimum, interrupt support is a mechanism to interrupt the normal sequence of program instruction processing in response to an event that is not caused by the program instruction processing. Interrupt support also allows resuming the interrupted program.
Interrupts are used to initiate processing of unpredictable external events quickly and efficiently. They allow external conditions to be monitored by hardware without the need for the program to spend any time, until the condition actually occurs. When the condition occurs, there is no delay waiting until the program gets to the part that checks the condition.
Sometimes, predictable events are also used to cause interrupts. In particular, a regularly occurring timer interrupt is used to trigger the switching of tasks in preemptive multitasking.
Many processors also support exceptions which also involve interrupting the normal sequence of program instruction processing. Exceptions differ from interrupts in that an exception is caused by the execution of a program instruction. One of the most important types of exception is used by large processors to trigger swapping in unmapped data in virtual memory operating systems. Most small microcontroller processors do not support exceptions.
Hardware section:
The AVR microcontrollers have almost the minimum possible hardware support for interrupts. Hardware support for interrupts consists of:
And when an interrupt occurs:
Global Interrupt Flag
The global interrupt enable flag in the Status Register can be set or cleared under program control. The flag is clear at Reset when operation starts. This prevents all interrupts and allows the program to make whatever preparations are needed to successfully handle interrupts.
Edge Triggered Interrupt Requests
Each of the peripherals that can request interrupts can be programmed individually to allow or not allow its interrupts. Most interrupts are event triggered. This means that the external event sets a flag which then requests the interrupt. The request will remain until cleared by the initiation of processing the interrupt (or cleared by action of the program). If interrupts are disabled for the peripheral, or globally by the interrupt enable bit in the Status Register, the request flag remains set and can cause an interrupt later when the interrupt is enabled.
Level Triggered Interrupt Requests
A few interrupt sources are level triggered. In this case, the interrupt signal is not latched in a flag; it acts directly as an interrupt request. If the signal returns to its inactive state before the interrupt processing can begin (because interrupts are disabled or a higher priority interrupt request exists), no interrupt happens. If the signal is still active after the interrupt processing is finished, another interrupt is requested and can occur.
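The difference between the two kinds of request can be modeled in a few lines of C that run on a PC. This is only a simplified model, not a description of the actual hardware logic, and all the names in it are made up for this illustration.

```c
#include <stdint.h>

/* Edge (event) triggered source: the event is latched in a flag. */
typedef struct { uint8_t flag; } edge_source_t;

/* The external event happens: the request flag is set and stays set. */
static void edge_event(edge_source_t *s)    { s->flag = 1; }

/* Interrupt processing begins: the request flag is cleared. */
static void edge_serviced(edge_source_t *s) { s->flag = 0; }

/* Is an interrupt requested right now? */
static int edge_requests(const edge_source_t *s, int enabled)
{
    return enabled && s->flag;
}

/* Level triggered source: there is no latch, only the present
   state of the signal counts. */
static int level_requests(int signal_active, int enabled)
{
    return enabled && signal_active;
}
```

If an event arrives while interrupts are disabled and the signal then goes away, edge_requests() still reports a request once interrupts are enabled, while level_requests() does not.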
Interrupt Vector Table
The ATMega168 has 25 separate interrupt sources. Each of these can cause the start of execution of a different block of code. Reset of the microcontroller is treated similarly, in that it also causes the start of execution of a block of code. Counting Reset, there are 26 possible addresses for the start of code to handle interrupts or the start of the program at reset. The first 52 locations in program memory are used for this purpose.
Reset causes execution to start at location 0x0000, the highest priority interrupt (called INT0) causes execution to start at location 0x0002, and so on to the lowest priority interrupt (called SPM READY), which causes execution to start at location 0x0032 (decimal 50).
Two memory locations is just enough space for a single assembly language Jump instruction. Normally, the first 52 locations in program memory hold 26 Jump instructions to different blocks of code, the first being the start of the program and the rest to handle interrupts. A program that never allows interrupts could simply begin at location 0x0000, but the code normally created by the C compiler always includes this block of Jump instructions. This block of Jump instructions is called the Interrupt Vector Table.
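The arithmetic behind these addresses is simple enough to check in C on a PC. In this sketch the names VECTOR_COUNT, WORDS_PER_SLOT and vector_address are made up for the illustration; the numbers come from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_COUNT   26u  /* Reset plus 25 interrupt sources */
#define WORDS_PER_SLOT 2u   /* room for one Jump instruction   */

/* Program memory location of the table entry for a vector number
   (0 = Reset, 1 = INT0, ..., 25 = SPM READY). */
static uint16_t vector_address(uint8_t vector)
{
    assert(vector < VECTOR_COUNT);
    return (uint16_t)(vector * WORDS_PER_SLOT);
}
```

vector_address(1) gives 0x0002 for INT0 and vector_address(25) gives 0x0032 for SPM READY. The whole table occupies VECTOR_COUNT * WORDS_PER_SLOT = 52 locations.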
Processor Architecture Section:
Interrupt Processing
When one or more interrupts are enabled, their requests are active, and interrupts are enabled globally, the following steps occur:
Restoring Context
At this point the main program should continue to execute as it would if the interrupt had not occurred. In order for the program to continue unaffected, the interrupt handler code must restore the state of all the registers that it modified during its operation (assuming these registers are used anywhere in the main program also) to their original state.
Code produced by the C compiler uses all 32 of the general purpose registers. Therefore (unless the main program is not written in C), interrupt handlers must save and restore every register they use. In addition, they must save and restore the Status Register since its flags temporarily hold the status result of certain instructions.
In rare instances, it is possible to write interrupt handlers in assembly language that do not change the Status Register or any general purpose registers. Otherwise, interrupt handler code always saves the Status Register and whichever general purpose registers will be used at the beginning, and restores them at the end, finally executing a RETI (Return From Interrupt) instruction.
C Language Section:
The C language, by itself, has no support for interrupt handler code. The GNU C compiler provides a language extension to support interrupt handler code. Without this extension, all interrupt handlers would need assembly language code for saving and restoring context, for ending it with a RETI instruction, and for inserting the proper Jump instruction in the Interrupt Vector Table.
The GNU C compiler extension, however, does all of this, and allows us to write programs that include interrupt handlers entirely in C, except for the special keywords used to declare interrupt handlers.
Declaring Interrupt Handlers
Here is the definition of an interrupt handler in GNU C, using definitions in the header file, <avr/interrupt.h>:
ISR( INT0_vect, ISR_BLOCK )
{
++gbl.intrCount;
}
The name, INT0_vect, is defined as __vector_1 in the include file, <avr/iomx8.h>, which defines constants for microcontrollers in the ATMega48 family. This file is included by <avr/io.h>, which is included in any C program that uses these constants.
Here is a list of the Interrupt Vector Table and the interrupt names from <avr/iomx8.h>:
                    /* reset - not an interrupt */           0
INT0_vect           /* External Interrupt Request 0 */       1
INT1_vect           /* External Interrupt Request 1 */       2
PCINT0_vect         /* Pin Change Interrupt Request 0 */     3
PCINT1_vect         /* Pin Change Interrupt Request 1 */     4
PCINT2_vect         /* Pin Change Interrupt Request 2 */     5
WDT_vect            /* Watchdog Time-out Interrupt */        6
TIMER2_COMPA_vect   /* Timer/Counter2 Compare Match A */     7
TIMER2_COMPB_vect   /* Timer/Counter2 Compare Match B */     8
TIMER2_OVF_vect     /* Timer/Counter2 Overflow */            9
TIMER1_CAPT_vect    /* Timer/Counter1 Capture Event */       10
TIMER1_COMPA_vect   /* Timer/Counter1 Compare Match A */     11
TIMER1_COMPB_vect   /* Timer/Counter1 Compare Match B */     12
TIMER1_OVF_vect     /* Timer/Counter1 Overflow */            13
TIMER0_COMPA_vect   /* Timer/Counter0 Compare Match A */     14
TIMER0_COMPB_vect   /* Timer/Counter0 Compare Match B */     15
TIMER0_OVF_vect     /* Timer/Counter0 Overflow */            16
SPI_STC_vect        /* SPI Serial Transfer Complete */       17
USART_RX_vect       /* USART Rx Complete */                  18
USART_UDRE_vect     /* USART, Data Register Empty */         19
USART_TX_vect       /* USART Tx Complete */                  20
ADC_vect            /* ADC Conversion Complete */            21
EE_READY_vect       /* EEPROM Ready */                       22
ANALOG_COMP_vect    /* Analog Comparator */                  23
TWI_vect            /* Two-wire Serial Interface */          24
SPM_READY_vect      /* Store Program Memory Read */          25
Interrupt Handler Attributes
The attribute, ISR_BLOCK, is defined as nothing – it denotes the default behavior, which is for the interrupt handler not to enable interrupts after the hardware disables them at its beginning. Three attributes are available: ISR_BLOCK, ISR_NOBLOCK and ISR_NAKED.
The attribute, ISR_NOBLOCK, causes interrupts to be re-enabled immediately in the interrupt handler. This can be useful since using the first C statement to re-enable interrupts involves a significant delay needed to save registers etc. In practice, interrupt handlers are usually designed to finish so quickly that there is little need to re-enable interrupts sooner. Allowing nested interrupts has the disadvantage of doubling the considerable amount of space on the stack used to hold the saved context.
The attribute, ISR_NAKED, causes the compiler not to create any prolog or epilog. No code to save and restore context is created and no RETI instruction is put at the end. The first object code of the interrupt handler is the first line of C code. In this case, there must be inline assembly language code at the beginning and end of the handler to save and restore context.
When we look at multithreading, where each thread has a stack, we will use naked interrupt handlers to save most of the context in a fixed area rather than the stack. Otherwise the space must be reserved in each stack (since the interrupt can come at any time), which multiplies the amount of reserved space by the number of tasks.
This is the location of the documentation for the header, <avr/interrupt.h>:
C:/WinAVR-20080610/doc/avr-libc/avr-libc-user-manual/group__avr__interrupts.html
Structure of Interrupt Handler Code
We can look at the output of the preprocessor to see what really is given to the compiler:
void __vector_1(void) __attribute__((signal,used,externally_visible));
void __vector_1(void)
{
++gbl.intrCount;
}
The interrupt handler is a function with a special name and some special attributes. The attributes are extensions to the C language. In particular, the attribute, signal, causes the function to save and restore the Status Register, any general purpose registers used, to set r1 equal to zero (a condition assumed by the compiler), and to end with a RETI instruction rather than a RET instruction.
This is the documentation for the attribute extension to the C language:
C:/WinAVR-20080610/doc/gcc/HTML/gcc-4.3.0/gcc/Function-Attributes.html
Finally, we examine the assembly language code produced for the interrupt handler listed above, with comments added:
        .global __vector_1
        .type   __vector_1, @function
__vector_1:
        push r1             /* save r1 – it will be set to zero */
        push r0             /* save r0 – it is used below and possibly C code */
        in   r0,__SREG__    /* get Status Register */
        push r0             /* and save its value */
        clr  r1             /* make sure r1 is zero, in case C code uses it */
        push r24            /* save r24 – it is used by C code below */
        push r25            /* save r25 – it is used by C code below */
        /* C code starts here */
        /* ++gbl.intrCount; */
        lds  r24,gbl        /* load low byte of gbl.intrCount to r24 */
        lds  r25,(gbl)+1    /* load high byte of gbl.intrCount to r25 */
        adiw r24,1          /* add one to register pair r24,r25 */
        sts  (gbl)+1,r25    /* store r25 to high byte of gbl.intrCount */
        sts  gbl,r24        /* store r24 to low byte of gbl.intrCount */
        /* C code ends here */
        pop  r25            /* restore r25 from stack */
        pop  r24            /* restore r24 from stack */
        pop  r0             /* get old value of Status Register from stack */
        out  __SREG__,r0    /* and restore it to Status Register */
        pop  r0             /* restore r0 from stack */
        pop  r1             /* restore r1 from stack */
        reti                /* return to interrupted code, enable interrupts */
Interrupt Handler Stack Usage
Processing this interrupt temporarily uses seven bytes on the stack: two bytes for the return address saved by hardware, one for the Status Register, two for r0 and r1 which are sometimes used by C code, and two for r24 and r25 which were used by this particular C code.
Calling a C function that did the same thing would have used only two bytes on the stack, for the return address saved by the CALL instruction. Although r24 and r25 would still be modified, the compiler never expects those particular registers to be preserved after the call. Therefore, normal functions do not need to preserve them.
Avoid Function Calls from Interrupt Handler
The worst thing for stack usage is when an interrupt handler calls a separate function. Twelve registers total (including r24 and r25) are expected not to be preserved by the called function. If our interrupt handler had called a function, the compiler would have been forced to assume any of these might have been modified. (The compiler never looks outside the function it is compiling. Often the called functions are not compiled at the same time.) Therefore the interrupt handler would need to save all twelve.
Sometimes an interrupt handler and the main line code need to execute the same block of code. Normally, that block of code should be a function, called from both places, but we want to avoid that in the case of the interrupt handler. On the other hand, copying a block of code to two places is bad practice because it makes the code hard to maintain. (If you change the code someday, how do you know that there is another copy and where it is?)
extern inline Functions
The solution to this dilemma is to use an inline function. Please see the Compiler documentation: C:/WinAVR-20080610/doc/gcc/HTML/gcc-4.3.0/gcc/Inline.html
Declaring a function extern inline prevents separate object code that could be called from being generated. This is particularly useful because it allows the function to be defined in multiple files without creating a linker error. This way, the function can be defined in a header file and included in several C files.
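Here is a small sketch of sharing one block of code between an interrupt handler and main line code without a function call. Note that it uses static inline rather than extern inline: this is a different spelling, chosen here because it behaves the same whether the compiler follows the older GNU rules described above or the C99 rules. The drawback is that the compiler may emit a private copy of the function in each file where it is not expanded in place. All names in the sketch are made up, and the two "fake" functions only stand in for a real handler and a real main loop.

```c
#include <stdint.h>

/* Would normally be placed in a header included by both files. */
static inline uint16_t bump_count(uint16_t count)
{
    return (uint16_t)(count + 1u);   /* wraps from 0xFFFF to 0 */
}

static uint16_t handler_count;
static uint16_t mainline_count;

/* Stand-in for an interrupt handler that needs the shared code. */
static void fake_handler(void)
{
    handler_count = bump_count(handler_count);
}

/* Stand-in for main line code that needs the same shared code. */
static void fake_mainline(void)
{
    mainline_count = bump_count(mainline_count);
}
```

When the compiler expands bump_count() in place, the handler contains no call, so it does not need to save the extra registers that a called function is allowed to change. Check the .s file produced by -save-temps to see whether the expansion actually happened at your optimization level.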
Program Example Section:
External Interrupts
This program uses External Interrupts to demonstrate level triggered and edge triggered interrupt requests. External Interrupts are interrupts that can be generated by the Digital I/O Port inputs.
There are two types of External Interrupts: the original more versatile External Interrupts which are available on only two pins, and the Pin Change Interrupts which are available on all pins.
Pin Change Interrupts
The Pin Change Interrupts are always edge triggered and occur on both the rising and falling edges. They are grouped so that a change on any pin of Port B creates the same interrupt, PCINT0. Similarly for Port C with PCINT1 and Port D with PCINT2, so there are a total of three separate Pin Change Interrupts.
INT0 and INT1 Interrupts
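The more versatile External Interrupts are INT0, associated with the bit 2 pin of Port D, and INT1, associated with the bit 3 pin of Port D. INT0 and INT1 can each be programmed to interrupt in one of four cases:
Mode 0: while the pin is at low level (level triggered)
Mode 1: on any change of the pin level
Mode 2: on the falling edge
Mode 3: on the rising edge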
The program below uses INT0, which is connected to the D2 button of the development board, in either mode 0 or mode 2.
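The other button, D3, selects one of four states:
State 0: level triggered, interrupts disabled
State 1: level triggered, interrupts enabled
State 2: edge triggered, interrupts disabled
State 3: edge triggered, interrupts enabled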
The program demonstrated the following properties:
Here are some comments about the program.
Grouping Global Variables in a Structure
The interrupt handler can act only on global variables. We put all the global variables into a single structure. (In this case, there is only one.) This simply helps keep track of them. Global variables are generally bad for maintainability and should be closely watched.
Avoiding Static Variables
The function, CheckState(), uses several parameters which are preserved between calls, therefore they cannot be saved in local storage. There are three places they could have been kept:
We use the third option. In larger programs the first two options tend to reduce maintainability. Although it would not be bad here, it is a good habit to avoid them whenever possible, even though it makes the code a bit more complicated.
Similarly to the previous example program, the function, CheckState(), has code to debounce button D3.
In the code where the program prints gbl.intrCount and then sets it to zero, interrupts are temporarily blocked. The reason for this will be explained in the next chapter. Normally, interrupts are blocked by calling cli(), and enabled by calling sei(). (These are actually not functions but macros that insert a single inline assembly language instruction, CLI or SEI.) In this case, it is not appropriate to use sei() to unblock interrupts because in two of the four test states, interrupts were blocked already. What we want is:
The functions SaveAndCli() and IntrRestore() do this. SaveAndCli() returns the original 8 bit value of the Status Register and IntrRestore() restores it. The fact that the other seven bits of the Status Register are also restored is of no consequence.
Disabling Interrupts for a Block of Code
Using functions like SaveAndCli() and IntrRestore() is the only correct way to block interrupts in a section of code unless you are sure that interrupts will always be enabled when the code is called. In a large program, you usually do not know.
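The save and restore pattern can be tried on a PC with an ordinary variable standing in for the Status Register. This is only a simulation under that assumption: fake_sreg, SaveAndCli_sim and IntrRestore_sim are hypothetical names, not the functions used in the class program. On the AVR, the global interrupt enable flag is bit 7 of the Status Register.

```c
#include <stdint.h>

#define SREG_I 0x80u                 /* global interrupt enable, bit 7 */

static volatile uint8_t fake_sreg;   /* stand-in for the Status Register */

/* Remember the whole register, then clear the interrupt enable bit
   (the equivalent of the CLI instruction). */
static uint8_t SaveAndCli_sim(void)
{
    uint8_t old = fake_sreg;
    fake_sreg = (uint8_t)(old & ~SREG_I);
    return old;
}

/* Put the register back, so the interrupt enable bit returns to
   whatever it was before SaveAndCli_sim() was called. */
static void IntrRestore_sim(uint8_t old)
{
    fake_sreg = old;
}
```

If interrupts were enabled before the call, they are enabled again after the restore. If they were already blocked, they stay blocked, which is exactly what a plain sei() would get wrong.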
#include <inttypes.h>
#include <avr/io.h>
#include <avr/interrupt.h>
#include "blcalls.h"

// Defines for bits within bytes
#define BIT(n) (1<<n)
#define BITS( v, hi, lo ) ( v & ( (1<<(hi-lo+1)) -1 ) << lo )
#define SET_BIT( r, n ) { r |= BIT(n); }
#define CLR_BIT( r, n ) { r &= ~BIT(n); }
#define TST_BIT( r, n ) ( r & BIT(n) )

// Defines for development board
#define PORTB_UNUSED ( 0xFF )
#define LED_C0 (1<<0)
#define LED_C1 (1<<1)
#define LED_C2 (1<<2)
#define LED_C3 (1<<3)
#define LED_ALL (LED_C0 | LED_C1 | LED_C2 | LED_C3 )
#define PORTC_UNUSED ( 0xF0 )
#define BUTTON_D2 (1<<2)
#define BUTTON_D3 (1<<3)
#define BUTTON_ALL (BUTTON_D2 | BUTTON_D3)
#define PORTD_UNUSED ( 0x30 )
#define INT0_LEVEL 0x00 // INT0 level triggered on low level
#define INT0_EDGE 0x02 // INT0 edge triggered on falling edge

// Defines for meaning of gbl.state
#define STATE_INTR_ENABLE 0x01
#define STATE_INT0_EDGE 0x02

// put all globals in one structure
typedef struct
{
    uint16_t intrCount; // set by interrupt handler
} t_Gbl;
t_Gbl volatile gbl;

typedef struct
{
    uint8_t init;
    uint8_t oldButton;
    uint8_t state;
} t_CheckState;

uint8_t SaveAndCli(void);
void IntrRestore( uint8_t oldSreg );
uint8_t CheckState( t_CheckState * p );
void Delay( uint32_t dd );
void __attribute__((noreturn)) main()
{
    t_CheckState csVars;

    // set up ports
    DDRB = 0x00;
    PORTB = PORTB_UNUSED;
    DDRC = LED_ALL;
    PORTC = PORTC_UNUSED;
    DDRD = 0x00;
    PORTD = PORTD_UNUSED | BUTTON_ALL;

    // enable INT0
    EIMSK = BIT( INT0 );
    gbl.intrCount = 0;
    csVars.init = 1;
    for( ;; ) // forever loop
    {
        // check for button D3 changing state
        if( CheckState( &csVars ) )
        {
            char * stateNames[4] =
            {
                "Level Trig - Intr Disable",
                "Level Trig - Intr Enable",
                "Edge Trig - Intr Disable",
                "Edge Trig - Intr Enable"
            };
            // print current value of INTF0
            DbgPrtStr( "INTF0 was" );
            DbgPrtByteNL( (EIFR >>INTF0) & 1 );
            DbgPrtStrNL( "" );
            DbgPrtStrNL( stateNames[csVars.state] );
            // set global interrupts on or off
            if( csVars.state & STATE_INTR_ENABLE ) sei();
            else cli();
            // set INT0 to edge or level triggered
            if( csVars.state & STATE_INT0_EDGE )
                EICRA = BITS( INT0_EDGE, ISC01, ISC00 );
            else EICRA = BITS( INT0_LEVEL, ISC01, ISC00 );
        }
        // Print interrupt count, if non zero
        if( gbl.intrCount > 0 )
        {
            uint8_t intrSave;
            uint16_t temp;
            // block interrupts while copy and clear count
            intrSave = SaveAndCli();
            temp = gbl.intrCount;
            gbl.intrCount = 0;
            IntrRestore( intrSave );
            DbgPrtWordNL( temp );
        }
    }
}
ISR( INT0_vect, ISR_BLOCK )
{
    ++gbl.intrCount;
}

// returns 1 if state is changed, 0 otherwise
uint8_t CheckState( t_CheckState * p )
{
    uint8_t button;
    if( p->init )
    { // initialization
        p->init = 0;
        p->state = 0;
        p->oldButton = 0;
        return 1;
    }
    button = ~PIND & BUTTON_D3;
    if( button == p->oldButton ) return 0;
    Delay( 100000 );
    button = ~PIND & BUTTON_D3;
    if( button == p->oldButton ) return 0;
    p->oldButton = button;
    if( !button ) return 0;
    p->state = (p->state +1) & 0x03;
    return 1;
}

void Delay( uint32_t dd )
{ // do not use dd = 0
    asm volatile ("1: \n\t"
        "subi %A0,1 \n\t"
        "brcc 1b \n\t"
        "subi %B0,1 \n\t"
        "brcc 1b \n\t"
        "subi %C0,1 \n\t"
        "brcc 1b \n\t"
        "subi %D0,1 \n\t"
        "brcc 1b" : "=d" (dd) : "0" (dd) );
}

// returns flags (interrupt enable in particular)
// and disables interrupts
uint8_t SaveAndCli(void)
{
    uint8_t ret;
    ret = SREG;
    cli();
    return ret;
}

// restores flags (interrupt enable in particular)
void IntrRestore( uint8_t oldSreg )
{
    SREG = oldSreg;
}
Chapter 7 Concurrency Issues with Interrupts
The fact that interrupts can come at any time sometimes has unpleasantly surprising consequences. One often assumes that programs flow from statement to statement as written and that the statements do what they say they will do. Neither of these assumptions is necessarily true when interrupts affect the same variables that the main line program is using.
Basically, C code operates on the assumption that nothing else is happening while it is running. C code can be and is used when that is not the case, but some precautions and a lot of extra thought are needed.
Hardware section:
Nothing new related to hardware needs to be presented in this chapter.
Processor Architecture Section:
There are two ways interactions between main line code and interrupt code cause problems: when the interrupt occurs between the blocks of object code corresponding to C statements and when it occurs within a block.
Non-atomic C Statements
We will first examine the case where the interrupt comes in the middle of a C statement. Consider the operation of incrementing a two byte variable. The increment requires an add operation for each byte, and the interrupt can come between them. The C statement is called non-atomic because it can be interrupted in the middle. For example:
uint16_t volatile v;

int main(void)
{
    ...
    ...
    v = 0x00FF;
    for(;;)
    {
        ++v;
        if( v > 0x0101 ) v = 0x00FF;
    }
}
If an interrupt handler reads the value of v at random times, what values are possible? The naive answer is 0x00FF, 0x0100, 0x0101, and 0x0102. In reality, with the store order described below, the values 0x01FF and 0x0002 will also be seen.
When the value is 0x00FF and ++v is executed, first the high byte of the incremented value, 0x01, is stored back to the variable, v, and then the low byte, 0x00. Reading the value of v between the two store operations gives 0x01FF. (If the bytes were set in the opposite order, 0x0000 would be seen instead.)
When the value is 0x0102 and v = 0x00FF is executed, the high byte is set to 0x00 and then the low byte is set to 0xFF. Reading the value of v in between gives 0x0002.
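The two-store behavior can be reproduced with a small host-side model. This is a sketch, not the compiler's actual output: it assumes the high byte is stored first, as described above, and the names v_lo, v_hi, seen, sample, store16 and run_loop are hypothetical. The function sample() plays the role of an interrupt handler that reads v after every byte store.

```c
#include <stdint.h>

/* Host-side model: v is held as two separate bytes, as on an 8-bit CPU. */
static uint8_t v_lo, v_hi;

/* seen[x] is set to 1 if a simulated interrupt ever observed the value x. */
static uint8_t seen[0x10000];

/* A simulated interrupt handler: sample the current two bytes of v. */
static void sample(void)
{
    seen[((uint16_t)v_hi << 8) | v_lo] = 1;
}

/* Store a 16-bit value high byte first, allowing an interrupt after
   every byte store. The store order is an assumption of this sketch. */
static void store16(uint16_t x)
{
    v_hi = (uint8_t)(x >> 8);
    sample();                      /* interrupt between the two stores */
    v_lo = (uint8_t)(x & 0xFF);
    sample();                      /* interrupt after the statement completes */
}

/* Run the loop from the text for a number of iterations. */
static void run_loop(int iterations)
{
    v_hi = 0x00;                   /* v = 0x00FF, before interrupts are sampled */
    v_lo = 0xFF;
    for (int i = 0; i < iterations; ++i) {
        uint16_t v = (uint16_t)(((uint16_t)v_hi << 8) | v_lo);
        store16((uint16_t)(v + 1));              /* ++v; */
        v = (uint16_t)(((uint16_t)v_hi << 8) | v_lo);
        if (v > 0x0101) store16(0x00FF);         /* v = 0x00FF; */
    }
}
```

After run_loop(10), the seen table is marked for the four expected values and also for 0x01FF and 0x0002, while 0x0000 is never marked under this store order.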
C Language Section:
Concurrency Problems
The other type of problem occurs when the interrupt changes a variable between C statements. In the following code, the interrupt handler sets a variable to indicate the interrupt occurred. The main line code counts the interrupts.
uint8_t volatile intrFlag;

ISR( ... )
{
    intrFlag = 1;
}

int main(void)
{
    uint32_t intrCount;
    ...
    ...
    intrCount = 0;
    for(;;)
    {
        if( intrFlag )
        {
            intrFlag = 0;
            ++intrCount;
            ...
        }
    }
}
This works, assuming that the for loop runs quickly enough to count each interrupt before the next one comes.
If this is not the case, we can improve the program by allowing the variable, intrFlag, to count higher than one.
uint8_t volatile intrFlag;

ISR( ... )
{
    ++intrFlag;
}

int main(void)
{
    uint32_t intrCount;
    ...
    ...
    intrCount = 0;
    for(;;)
    {
        if( intrFlag > 0 )
        {
            intrCount += intrFlag;
            intrFlag = 0; // error, potential race condition
            ...
        }
    }
}
This allows up to 255 interrupts to accumulate before the main line loop must read and reset intrFlag. There is a problem though. When intrFlag > 0, and intrCount is increased by the value of intrFlag, it is possible that another interrupt occurs before intrFlag is reset to zero. In this case, the last interrupt is never counted.
This is corrected by disabling interrupts around these two statements as demonstrated in the previous chapter.
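The lost count can be demonstrated on a host computer by injecting a simulated interrupt at the critical point. This is only a sketch: sim_isr, collect_unprotected, collect_protected and the intr_between switch are hypothetical names, and the protected version models blocked interrupts simply by letting the pending interrupt run after the two statements instead of between them.

```c
#include <stdint.h>

static volatile uint8_t intrFlag;   /* incremented by the simulated handler */
static uint32_t intrCount;          /* maintained by the main line code */

/* Stand-in for the interrupt handler. */
static void sim_isr(void) { ++intrFlag; }

/* Unprotected version: an interrupt may arrive between the two statements.
   intr_between is a hypothetical switch that injects one at that point. */
static void collect_unprotected(int intr_between)
{
    if (intrFlag > 0) {
        intrCount += intrFlag;
        if (intr_between) sim_isr();   /* arrives after the add */
        intrFlag = 0;                  /* wipes out the new count */
    }
}

/* Protected version: the interrupt is held off until both statements
   have completed, as it would be with interrupts blocked around them. */
static void collect_protected(int intr_between)
{
    if (intrFlag > 0) {
        /* interrupts would be blocked here */
        intrCount += intrFlag;
        intrFlag = 0;
        /* interrupts would be restored here; a pending one runs now */
        if (intr_between) sim_isr();
    }
}
```

With one interrupt before the call and one injected during it, the unprotected version ends with intrCount equal to 1 even after a second pass, while the protected version reaches 2 on the second pass.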
Interrupt Latency
Disabling interrupts temporarily is undesirable because it increases interrupt latency, the maximum time an interrupt might be delayed before processing begins. As we saw in the previous chapter, the time used by other interrupt handlers also creates latency, so it is good to make interrupt handlers as short as possible. This is the reason that we increment a single byte variable, intrFlag, instead of incrementing the four byte variable, intrCount, immediately inside the interrupt handler.
The maximum interrupt latency is usually a bigger concern than the average percentage of time that interrupts are blocked.
Examining Interrupt Latency
Finally, we will assume it is important to reduce the interrupt latency of this program.
#include <inttypes.h>
#include <avr/io.h>
#include <avr/interrupt.h>

uint8_t volatile intrFlag;

extern void DoSomething( uint32_t t );

ISR( INT0_vect, ISR_BLOCK )
{
    ++intrFlag;
}

void __attribute__((noreturn)) main()
{
    uint32_t intrCount;
    intrCount = 0;
    for(;;)
    {
        if( intrFlag > 0 )
        {
            cli();
            intrCount += intrFlag;
            intrFlag = 0;
            sei();
            DoSomething( intrCount );
        }
    }
}
We need to add the call to DoSomething in order to prevent the compiler from seeing that the variable, intrCount, is not used and omitting the calculation, intrCount += intrFlag, as an optimization. In a real program, this would not be a problem because we would not have calculated intrCount if we did not use it somewhere.
Here is the assembly language produced with optimization on (-Os). We will count the clock cycles during which interrupts are disabled. A careful reading of the data sheet for the processor shows that the following instructions (of those executed while interrupts are disabled) take more than one clock cycle:
The data sheet also says:
00000000 <__vectors>:
    jmp <__ctors_end>
    jmp <__vector_1>
    ...
__vector_1:
    push __zero_reg__
    push r0
    in r0,__SREG__
    push r0
    clr __zero_reg__
    push r24
    lds r24,intrFlag
    subi r24,lo8(-(1))
    sts intrFlag,r24
    pop r24
    pop r0
    out __SREG__,r0
    pop r0
    pop __zero_reg__
    reti
main:
    clr r14
    clr r15
    movw r16,r14
.L8:
    lds r24,intrFlag
    tst r24
    breq .L8
    cli
    lds r24,intrFlag
    add r14,r24
    adc r15,__zero_reg__
    adc r16,__zero_reg__
    adc r17,__zero_reg__
    sts intrFlag,__zero_reg__
    sei
    movw r24,r16
    movw r22,r14
    call DoSomething
    rjmp .L8
With this information, we find the following latencies:
Allowing Nested Interrupts
We will try to reduce the interrupt handler latency. First, we will try allowing the handler to be interrupted:
ISR( INT0_vect, ISR_NOBLOCK )
{
    ++intrFlag;
}

__vector_1:
    sei
    push __zero_reg__
    push r0
    in r0,__SREG__
    push r0
    clr __zero_reg__
    push r24
    lds r24,intrFlag
    subi r24,lo8(-(1)) /* subtract -1 == add 1 */
    sts intrFlag,r24
    pop r24
    pop r0
    out __SREG__,r0
    pop r0
    pop __zero_reg__
    reti
The assembly language produced is shown above. The only difference is the sei instruction inserted at the beginning. This reduces the latency from 39 to 10 clock cycles, which includes the two cycles needed by the push instruction after the sei. Putting a nop instruction, which takes only one cycle and does nothing, in between would reduce this to 9.
Adding the ISR_NOBLOCK attribute has the following dangers.
If this interrupt is level triggered, it will recur immediately, unless the source has changed back to inactive very soon after it became active. In this case, the interrupt handler will never complete. Rather, the hardware push of the two byte return address onto the stack, followed by the execution of the sei and push instructions, will occur repeatedly every 10 clock cycles. Three bytes of the stack space will be used each time, and the stack will overflow the space allocated to it and crash the system. This would take about 150 microseconds if all of memory were allocated for the stack.
Therefore, avoid using the ISR_NOBLOCK attribute for level triggered interrupts.
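The time to exhaust the stack is simple arithmetic: the number of three byte entries that fit, times the clock cycles per entry, divided by the clock rate. The sketch below is a hypothetical helper, stack_fill_time_us, and the 1024 bytes of memory and 20 MHz clock used to exercise it are assumptions chosen only for illustration; the memory size and clock rate behind the figure quoted above are not stated here.

```c
#include <stdint.h>

/* Estimated microseconds until the stack fills when every nested entry
   costs bytes_per_entry bytes and cycles_per_entry clock cycles.
   All four parameters are supplied by the caller; none come from the notes. */
static double stack_fill_time_us(uint32_t stack_bytes,
                                 uint32_t bytes_per_entry,
                                 uint32_t cycles_per_entry,
                                 double   cpu_hz)
{
    uint32_t entries = stack_bytes / bytes_per_entry;   /* whole entries that fit */
    double cycles = (double)entries * (double)cycles_per_entry;
    return cycles * 1.0e6 / cpu_hz;
}
```

For example, stack_fill_time_us( 1024, 3, 10, 20.0e6 ) works out to 341 entries and 3410 clock cycles, or about 170 microseconds, which is of the same order as the estimate above.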
Even if the interrupt is edge triggered, it must not occur more often than the 40 clock cycles required by the interrupt handler to finish, or else the stack will grow. If this happens occasionally, the stack could tolerate it, but then there is a concurrency problem: If the second interrupt occurs between the instructions
    lds r24,intrFlag
    subi r24,lo8(-(1))
    sts intrFlag,r24
the second interrupt will have incremented intrFlag before the first stores back its incremented value. The net effect is that one count is lost.
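This interleaving can be modeled on a host computer by splitting the handler into its three instructions. The sketch is an illustration only: sim_handler and nest_depth are hypothetical names, and the local variable r24 models the fact that a nested handler saves and restores the register, so the outer handler continues with its stale copy.

```c
#include <stdint.h>

static uint8_t intrFlag;   /* memory location shared by all handler instances */

/* One handler instance, split into the three instructions from the text.
   nest_depth > 0 makes a further interrupt arrive between lds and sts. */
static void sim_handler(int nest_depth)
{
    uint8_t r24 = intrFlag;            /* lds  r24,intrFlag   */
    if (nest_depth > 0)
        sim_handler(nest_depth - 1);   /* nested interrupt runs to completion */
    r24 = (uint8_t)(r24 + 1);          /* subi r24,lo8(-(1))  */
    intrFlag = r24;                    /* sts  intrFlag,r24   */
}
```

Two separate calls sim_handler( 0 ) leave intrFlag at 2, but a single call sim_handler( 1 ), which represents two interrupts with the second nested inside the first, leaves it at 1.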
Blocking a Single Type of Interrupt
There is a better way to improve latency without allowing the same interrupt to nest. The basic idea will be to block only the INT0 interrupt, without blocking interrupts globally. To do this, we must temporarily set the INT0 bit in the EIMSK register to zero.
ISR( INT0_vect, ISR_BLOCK )
{
    EIMSK &= ~(1<< INT0);
    sei();
    ++intrFlag;
    cli();
    EIMSK |= (1<< INT0);
}
Here is the output of the compiler. The latency is now 23 clock cycles for the first half and 21 for the second half.
__vector_1:
    push __zero_reg__
    push r0
    in r0,__SREG__
    push r0
    clr __zero_reg__
    push r24
    /* EIMSK &= ~(1<< INT0); */
    in r24,61-32
    andi r24,lo8(-2)
    out 61-32,r24
    /* sei(); */
    sei
    /* ++intrFlag; */
    lds r24,intrFlag
    subi r24,lo8(-(1))
    sts intrFlag,r24
    /* cli(); */
    cli
    /* EIMSK |= (1<< INT0); */
    in r24,61-32
    ori r24,lo8(1)
    out 61-32,r24
    pop r24
    pop r0
    out __SREG__,r0
    pop r0
    pop __zero_reg__
    reti
ISR_NAKED and Inline Assembly to Improve Latency
What else can we do to improve it? Next we will try using the ISR_NAKED attribute, which will allow us more detailed control using inline assembly language. We will start out with the assembly language output from the C code above, and then remove unnecessary instructions.
We do not change the registers r0 or r1 (__zero_reg__), so we do not need to save them. The subi instruction changes the Status Register, but we can wait to save it until after global interrupts are enabled. Here is the improved version:
__vector_1:
    push r24
    /* EIMSK &= ~(1<< INT0); */
    in r24,61-32
    andi r24,lo8(-2)
    out 61-32,r24
    /* sei(); */
    sei
    in r24,__SREG__
    push r24
    /* ++intrFlag; */
    lds r24,intrFlag
    subi r24,lo8(-(1))
    sts intrFlag,r24
    pop r24
    out __SREG__, r24
    /* cli(); */
    cli
    /* EIMSK |= (1<< INT0); */
    in r24,61-32
    ori r24,lo8(1)
    out 61-32,r24
    pop r24
    reti
The latency is now 14 clock cycles for the first half and 14 for the second half. Written in C, with symbolic names for EIMSK etc., it looks as follows:
ISR( INT0_vect, ISR_NAKED )
{
    asm volatile ( "push r24" );
    /* EIMSK &= ~(1<< INT0); */
    asm volatile ( "in r24, %0" : : "I" ( _SFR_IO_ADDR(EIMSK) ) );
    asm volatile ( "andi r24, %0" : : "M" ( (~(1<< INT0)) & 0xFF ) );
    asm volatile ( "out %0, r24" : : "I" ( _SFR_IO_ADDR(EIMSK) ) );
    /* sei(); */
    asm volatile ( "sei" );
    asm volatile ( "in r24, __SREG__" );
    asm volatile ( "push r24" );
    /* ++intrFlag; */
    asm volatile ( "lds r24, intrFlag" );
    asm volatile ( "subi r24, %0" : : "M" ( (-1) & 0xFF ) );
    asm volatile ( "sts intrFlag, r24" );
    asm volatile ( "pop r24" );
    asm volatile ( "out __SREG__, r24" );
    /* cli(); */
    asm volatile ( "cli" );
    /* EIMSK |= (1<< INT0); uses ori, matching the listing above */
    asm volatile ( "in r24, %0" : : "I" ( _SFR_IO_ADDR(EIMSK) ) );
    asm volatile ( "ori r24, %0" : : "M" ( (1<< INT0) & 0xFF ) );
    asm volatile ( "out %0, r24" : : "I" ( _SFR_IO_ADDR(EIMSK) ) );
    asm volatile ( "pop r24" );
    asm volatile ( "reti" );
}
It took a few tries to discover two things. First, the macro, lo8(), does not work in this context, so the constants, -1 and ~(1<< INT0), need to be reduced to 8 bits by adding & 0xFF. Second, the I/O register address in the out instruction needs to be treated as an input operand, because the instruction does not modify the address itself, but rather the register at that address.
Writing and maintaining inline assembly language is probably ten times more difficult than C code, and then only after you are somewhat familiar with the assembly language. Usually only a few lines are needed. In this case, it is justified by the (presumed) need to reduce the interrupt latency.
The 14 clock cycle latency is still more than the 11 cycle latency of the main line code, so it is not necessary to try to reduce the latency of the main line code:
if( intrFlag > 0 )
{
    cli();
    intrCount += intrFlag;
    intrFlag = 0;
    sei();
However, we can reduce it to zero by again blocking only the INT0 interrupt, without blocking interrupts globally:
if( intrFlag > 0 )
{
    EIMSK &= ~(1<< INT0);
    intrCount += intrFlag;
    intrFlag = 0;
    EIMSK |= (1<< INT0);
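Masking only INT0 does not lose an interrupt that arrives inside the masked window, provided the interrupt is edge triggered so that the hardware latches a pending flag (the INTF0 bit printed by the earlier example program). The host-side sketch below models this under that assumption. The names sim_mask_bit, sim_pending, sim_dispatch, sim_edge and collect_masked are hypothetical, and the model holds only one pending edge, so two edges inside the same window would still merge into one.

```c
#include <stdint.h>

/* Host-side model of one edge-triggered interrupt source. */
static uint8_t sim_mask_bit;   /* 1 = INT0 enabled in the mask register */
static uint8_t sim_pending;    /* 1 = an edge arrived and is waiting    */
static uint8_t intrFlag;       /* incremented by the handler            */
static uint32_t intrCount;     /* maintained by the main line code      */

/* Run the handler if an edge is pending and the source is unmasked. */
static void sim_dispatch(void)
{
    if (sim_pending && sim_mask_bit) {
        sim_pending = 0;
        ++intrFlag;
    }
}

/* An external edge arrives: latch it, then deliver it if allowed. */
static void sim_edge(void)
{
    sim_pending = 1;
    sim_dispatch();
}

/* Main line update with only this source masked. edge_inside is a
   hypothetical switch that makes an edge arrive in the masked window. */
static void collect_masked(int edge_inside)
{
    if (intrFlag > 0) {
        sim_mask_bit = 0;              /* EIMSK &= ~(1<< INT0); */
        intrCount += intrFlag;
        if (edge_inside) sim_edge();   /* latched, not delivered */
        intrFlag = 0;
        sim_mask_bit = 1;              /* EIMSK |= (1<< INT0);  */
        sim_dispatch();                /* pending edge is served now */
    }
}
```

Starting with the source unmasked and one edge already counted in intrFlag, a call collect_masked( 1 ) leaves intrFlag at 1 for the edge that arrived inside the window, and a following call collect_masked( 0 ) brings intrCount to 2.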
Program Example Section:
#include <inttypes.h>
#include <avr/io.h>
#include <avr/interrupt.h>
#include "devbd.h"
#include "blcalls.h"

// put all globals in one structure
typedef struct
{
    uint16_t curVal; // read by interrupt handler
    uint16_t seenVal; // set by interrupt handler
}
t_Gbl;
t_Gbl volatile gbl;

ISR( INT0_vect, ISR_BLOCK )
{
    gbl.seenVal = gbl.curVal;
}

void __attribute__((noreturn)) main()
{
    // set up ports
    DDRB = 0x00;
    PORTB = PORTB_UNUSED;
    DDRC = LED_ALL;
    PORTC = PORTC_UNUSED;
    DDRD = 0x00;
    PORTD = PORTD_UNUSED | BUTTON_ALL;

    // enable INT0
    EIMSK = BIT( INT0 );
    EICRA = BITS( INT0_EDGE, ISC01, ISC00 );
    gbl.curVal = 0x00FF;
    gbl.seenVal = 0xAAAA;
    sei();
    for(;;)
    {
        // gbl.curVal goes between 0x00FF and 0x0101 and is briefly 0x0102
        ++gbl.curVal;
        if( gbl.curVal > 0x0101 ) gbl.curVal = 0x00FF;
        // gbl.seenVal is set to a sample of gbl.curVal
        // every time INT0 comes
        // This prints it when that happens
        if( gbl.seenVal != 0xAAAA )
        {
            DbgPrtWord( gbl.seenVal );
            if( gbl.seenVal < 0x00FF || gbl.seenVal > 0x0102 )
                DbgPrtStrNL( "***" );
            else DbgPrtStrNL( "" );
            gbl.seenVal = 0xAAAA;
        }
    }
}
Button D2 is connected to INT0. Push it repeatedly and see occasional failures due to the non-atomic nature of the statement, ++gbl.curVal.
The rate of failure is relatively high because the loop is short and executed very often. In real code, the failure rate might be a few times per year. In that case, it might not be found until the design was shipped to thousands of customers and it might take most of a year to reproduce it in a debug setting. The faster alternative might be for someone (you?) to carefully read a hundred thousand lines of code. Be careful not to make concurrency mistakes and do not expect them to be found during testing. Such bugs can be costly to a company (and to your career).