C and C++

Source Code Analysis

using

CodeCheck

 

by

Loren Cobb, PhD.

 

CodeCheck is a product of Abraxas Software, Inc.

CodeCheck was designed & written by Loren Cobb.

 

 

 

 

For more information, contact:

Abraxas Software, Inc.

 

Phone:  503-232-0540

Fax:  206.309.0304

Email: support@abxsoft.com

www.abxsoft.com

 


Table of Contents

 


Preface............................................................................................................................................................. v

Acknowledgments................................................................................................................................. vi

Quick Start:

0.1     Installation........................................................................................................................................ 1

0.2 Command-Line Options.................................................................................................................... 1

0.3          CodeCheck File Names............................................................................................................ 8

0.4          How to Use CodeCheck........................................................................................................... 9

Introduction to CodeCheck:

1.1          The Elements of Style.......................................................................................................... 11

1.2          Why Programs Break............................................................................................................ 13

1.3          Why Programs Fail to Port.............................................................................................. 15

1.4          The Structure of CodeCheck.......................................................................................... 18

1.5          Debugging with CodeCheck............................................................................................ 19

1.6          Predefined Macro Constants.......................................................................................... 20

Checking Types:

3.1          How to Analyze a Type Declaration........................................................................... 24

3.2          How to Determine the Type of an Identifier........................................................ 26

3.3          How to Determine the Type of an Operand........................................................... 27

3.4          How to Detect Implicit Type Conversions............................................................. 27

Portable Style:

4.1          Lexical Issues in Portability........................................................................................... 29

4.2          Preprocessor Considerations....................................................................................... 39

4.3          Portability in Declarations............................................................................................ 48

4.4          Portability at the Expression Level......................................................................... 55

4.5          Portability of Functions................................................................................................... 56

4.6          C Compiler Limits..................................................................................................................... 58

Maintainable Style:

5.1          Lexical Issues in Program Maintenance................................................................. 62

5.2          Preprocessor Considerations....................................................................................... 78

5.3          Maintainability in Declarations................................................................................... 81

5.4          Maintainability at the Project Level....................................................................... 86

Software Metrics:

6.1          Program Size............................................................................................................................... 87

6.2          Logical Complexity............................................................................................................... 99

6.3          Code Density.............................................................................................................................. 105

CodeCheck Rule Sets:

7.1          Verifying POSIX.1 compliance....................................................................................... 112

7.2          Compliance with Coding Standards........................................................................ 118

7.3          Porting to ANSI C.................................................................................................................... 121

7.4          Porting to Strict K&R Compilers.............................................................................. 127

7.5          Measuring Code Complexity......................................................................................... 129

7.6          Verifying the Order of Module Elements.......................................................... 131

7.7          C++ Rules...................................................................................................................................... 133

7.8          Advanced C++ Rules.............................................................................................................. 135

Supporting Material:

8.1          Glossary....................................................................................................................................... 147

8.2          Bibliography............................................................................................................................ 154

8.3         Index................................................................................................................................................. 156


Preface

Producing accurate, reliable and flexible programs in C or C++ is a difficult task. Even experienced programmers need tools to aid in the program develop­ment process, but all too few tools exist in today’s market that can detect bugs in C and C++ source code and help the programmer to avoid problems.

CodeCheck is a powerful tool for analyzing C and C++ source code. Unlike other tools, Codecheck is itself fully programmable. It performs its primary task — analyzing and critiquing C and C++ source code — entirely under the direc­tion of a user-written control program.

CodeCheck is not a new version of that old C programmer’s standby, lint, although it can per­form some lint-like error detection. For example, Code­Check compares all declarations and macro definitions across all modules of a project, to detect inconsis­tencies. The main thrust of CodeCheck is to de­tect noncom­pliance with codified style standards, to detect maintenance or port­ability prob­lems within code which al­ready compiles perfectly on today’s compilers, and to compute cus­tomized quan­titative indicators of code size, com­plex­ity, and den­sity.

Stan­dards and mea­sures can be specified by the user for a tremendous number of fea­tures of C code that have an impact on portabil­ity, maintainability, and style. Code­Check is de­signed to enhance dra­matically the effec­tiveness and effi­ciency of project man­agement in com­mercial and indus­trial pro­gram­ming ef­forts.  A custom CodeCheck pro­gram specifying code stan­dard­s and measures can be written by a pro­ject leader using the CodeCheck language (actually a re­strict­ed subset of C itself). Code­Check can be pro­grammed to:

a.   Monitor compliance with standards for programming style, rules for type-encoded prefixes for identifiers, proper use of macros and typedefs, prototypes, etc.

b.   Identify code that is not portable to or from any particular envi­ronment (machine, com­­­piler, oper­ating system, or inter­face standard).

c.   Quantify code maintainability with user-defined measures at all levels: line, statement, func­tion, file, and project. Compute McCabe and Halstead complexity measures.

Sample CodeCheck programs are provided for a variety of problems, rang­ing from portability to complexity to compliance with style standards.


Acknowledgments

We gratefully acknowledge the invaluable help given to the CodeCheck pro­ject by the following individuals, who contributed suggestions and bug re­ports. We couldn't have done it without you!

 


Jan-Anders Åkerholm

Wendy Averdung

George Baker

Wahab Baldwin
Ed Batutis
Nasser Bazzi
John Benson
Bill Bentley
Dana Birkby
Dale Bremer
John Bradley

Mike Branson

Linda Brigham
Jeff Brown
Van Brollini

Thomas Brustbauer

Walt Buehring
Laura Burke
Bill Camp­bell
Pat Cappelare
Camille Carum

Rob Chambers
Alex Chervet

Tim Child

John Clinton
Patrick Conley

Darryl Cornish
Bill Costello
Kevin Coyle
Mike Curry

Mark De May

Matt Diamond
David Doerner
John Doggett
Bob Domitz

Tom Dropka
Ge­orge Entwhistle

Richard Evans

Aaron Fager
Brent Fairbanks

Bud Feuless

Steve Fine

Julianne Fontenoy

Keith Fulton
Greg Germano
Bud Feuless

Keith Fulton
Shawn Garbett

Jerry Garcia
Bonnie Gilmore

Dennis Glenn

Soo Hye Goh

David Gordon

Bruce Graham

Elaine Granoff
Jeff Johnson
Brett Halle
Esko Hannula

Othar Hansson
Tris Harkless
Bill Hazzard

John Her­bold

Alison Hine

Paul Hurley

Jim Jacobson

David Johnson
Mike Johnson
Darrell Jones

Arun Joshi

Ken Joyner

Ed Kirk

Ian Koenig

Tom Kohler

Hannu Kokko

Detlef Kowalewski
Ron Kuhn

Patrica Langer

Mark Lamer

Eric Lear

David Linsky

Alan Liu

Martin Lord

Tom Lucas
Frank Lusardi
Paul McGlashan

Bill McMahon

Terry McNulty

Eric Melbardis

Mike Muegel

Marcel Meyer
Deborah Miller
Steve Monett
Stephen Montgomery
Tom Moreaux

Peter Morse

Mike Muegel

Greg Munger

Rick Murnane

Hugh Njemanze
John Norby

Michael O’Leary
Ingmar Olsson

Lyle Parkyn
Bob Peterson
Steve Peterson

Greg Pilkington
Darrel Pinson
Th. Pfister

Karl Pingle

John Plocher

Alan Pope
Chris Prendergast

Steve Ray
Mike Reid

Steve Reynolds
Dan Richards

Robin Riley

Kay Roche
Jim Roskind

Stefan Roth
Florian Sachse
Richard Sargent

Jay Sarkar
Alan Sauls

Karl Schopmeyer

Rick Schuessler
Peter Schwaller
Roshin Sharma

Andrew Shebanow

Ursula Shelander
Hartmut Stein
Ur­sula Shelander

Jeffrey Smith

Cass Smith
Tim Southgate

Brian Stromquist

Dan Sullivan

Malcolm Sutter

Padma Talasila
Larry Thiel

Julie Tiemann

Esther Tong

Gino van den Bergen

Thomas Wik­shult

Clayton Wilkinson

David Williams

Roderick Williams
Tim Wint

Aaron Wohl
Matt Woodward
Heinz Wrosch

Cornelia Yoder

Robert Yu
Doug Zimmerman
Bruce Zimov


Chapter 0: Quick Start

 

 

 

0.1    Installation

 

1.    Copy the CodeCheck program into the directory in which you normally keep your programming tools.

2.    Create a new directory for the collection of Code­Check rule files that is sup­plied on the distribution disk. Copy these rule files to this new directory.

3.    Assign the pathname of the rule directory that you created in Step 2 to an envi­ron­mental variable named CCRULES. CodeCheck will use this vari­able to lo­cate rule files. Multiple pathnames may be specified in this environ­men­tal vari­able if this is desired. Separate each pathname with the charac­ter nor­mally used in your operating system (Unix: colon, DOS & OS/2: semi­colon, Macintosh: comma). On Unix systems only, if this variable is not defined then CodeCheck will look for rule files in a directory named /usr/CodeCheck.

4.    CodeCheck looks for header files in the paths listed in the INCLUDE en­vi­ron­mental variable. (On Macintosh systems it looks for CIncludes.) Make sure that this variable is correctly defined. This is im­portant: CodeCheck cannot function with­out access to the same header files that your C or C++ compiler uses. Multiple path­names may be specified in this envi­ronmental vari­able if necessary.

5.    If you have header files (e.g. system headers) that you wish CodeCheck to read but not check, then assign the pathnames of the directories containing these headers to an environmental variable named CCEXCLUDE. This step is not nec­es­sary for CodeCheck: it is useful only when you wish to apply rules to some but not all of the header files that are included in your C or C++ source files.

 

 

0.2    Command-Line Options

 

CodeCheck is invoked by means of a command line with either of these for­mats:

 

     check -options foo.c

     check foo.c -options

 

In this command line format foo.c refers to the name of the C source file to be analyzed. Any number of source files may be specified, arbitrarily intermixed with options.

The rules that are to be used to perform this analysis can be specified in the options list, as described below. If no rule file is speci­fied, Code­Check will look for a precompiled rule file named default.cco, first in the cur­rent directory and then in the directories specified in the CCRULES environment variable. If this file is not found, CodeCheck will per­form a simple syntactic scan of the source file without any user-defined rules. 

To analyze a multiple-file project with CodeCheck, either list all of the source filenames on the command line, or create a new file con­tain­ing the names of all of the source files (excluding the names of header files and li­braries). Give this pro­ject file the ex­tension “.ccp”. Then invoke CodeCheck, spec­ifying the pro­ject file in­stead of a source file:

   check -options myproject.ccp

CodeCheck will apply its rules to each source file named in myproject.ccp, and will apply project-level checking across all the files in the project.   The ccp ex­tension in­forms CodeCheck that the specified file is a project file rather than a C source file. This extension may be omitted in the command-line. Command-line options may also be specified in the project file, one per line. Every option placed in a project file applies to every source file in the project.

Command-line options are used to override default actions or conven­tions, or to indi­cate addi­tional actions that you want CodeCheck to perform. CodeCheck com­mand-line options are not case-sensitive. The available op­tions are:

–A       Reserved for CodeCheck expansion. Please do not use.

–B       Instruct CodeCheck that braces are on the same nesting level as material surrounded by the braces. If this option is not specified, then CodeCheck as­sumes that the braces are at the previous nest­ing level. This option only affects the prede­fined variable lin_nest_level.

–C       Reserved for CodeCheck expansion. Please do not use.

–D       Define a macro. The name of the macro must fol­low imme­di­ately. An optional macro definition can be specified af­ter an equal sign. The macro may not have any arguments. For ex­am­ple,

         check –DFOREVER=for(;;)

            has the same ef­fect as starting each source code file with

       

         #define FOREVER for(;;)

            If no macro definition is given, then CodeCheck assigns the value 1 to the macro by default.

–E       Do not ignore tokens that are derived from macro expan­sion when performing counts, e.g. of operators and operands. The de­fault (–E not specified) is for CodeCheck to ignore all macro-derived tokens when count­ing.

–F        Count to­kens, lines, operators, or operands when reading header files. The de­fault (–F not specified) is for CodeCheck not to count to­kens, lines, operators, or operands when reading header files.

–G       Do not read each header file more than once. Caution: Some header files are designed to be read multiple times, with con­di­tional access to different sections of the header.

–H       List all header files in the listing file. The –L option is as­sumed if this option is found. If –L is found without –H, then the listing file created by CodeCheck will not display the con­tents of header files.

–I         Specify a path to search when looking for header files. Use a sepa­rate –I for each path.  The pathname must follow –I, e.g.

        check -I/usr/myheaderpath src.c

            Header directory pathnames identified with the –I command-line option are searched before any directory paths listed in the the INCLUDE environmental variable. CodeCheck Unix only: the default header directory path is /usr/include.

–J        Suppress all CodeCheck-generated error messages, e.g. syntax warnings. This op­tion does not suppress warning messages generated by rules.

–Kn     Identify the dialect of C to be assumed for the source files. A digit should follow immediately, which identifies the di­alect. The dialects of C and C++ currently available are:

                      0      Strict K&R (1978) C

                      1      Strict ANSI standard C

                      2      K&R C with common extensions

                      3      ANSI C with common extensions (default)

                      4      AT&T C++  

                      5      Symantec C++ 

                      6      Borland C++ 

                      7      Microsoft C++ 

                      8      IBM Visual Age C++

                      9      Metrowerks CodeWarrior C/C++

                    10      DEC Vax C and HP/Apollo C.  

                    11      Metaware High C

            If this option is not specified, then CodeCheck will assume that the source code is ANSI with common exten­sions (–K3).

            If option –K is specified with no digit following, then Code­Check will assume that the user meant strict K&R C (–K0).

–L       Make a listing file for the source file or project, with Code­Check mes­sages inter­spersed at appro­priate points in the list­ing. The name of the listing file may follow im­me­diately. If no name is given then the listing file will be check.lst.  The listing file will be created in the current directory, unless a target directory is specified with the –Q option.

–M      List all macro expansions in the listing file. Each line contain­ing a macro is listed first as it is found in the source file, and second as it appears with all macros expanded. The –L op­tion is as­sumed if –M is found. If –L is found with­out –M, then the list­ing file created by CodeCheck will not ex­hibit macro expan­sions.

–N       Allow nested /* ... */ comments.  

–NEST         Allow C++ nested class definitions.

–O       Append all CodeCheck stderr output to the file stderr.out. This is useful for those using the MS-DOS operating system, which does not permit the redi­rec­tion of stderr output. .out; .out;

–P        Show progress of code checking. When this option is given, Code­Check will identify each file in the project as it is opened, and each function definition as it is parsed.

–Q       Specify a target directory. The pathname of the directory into which all CodeCheck output files are to be placed must follow immedi­ately, e.g.

         check -L -Q./temp mysource.c

            Ex­amples of such output files are the listing and prototype files. If this option is omitted CodeCheck will write its output files to the current working directory.

–R       Specify a rule file. The name of the rule file must follow im­me­diately, e.g. if the rule file name is foobar.cc and the C or C++ source filename is mysource.c:

         check -Rfoobar mysource.c

            Code­Check first looks for a object (i.e. compiled) rule file of this name (e.g. foobar.cco). If this file is out-of-date or not found, Code­Check will recom­pile the rule file (foobar.cc) into an object file (foobar.cco) before proceeding to apply these rules to the source file.  

            More than one –R file may be specified: in this case all the rules will be compiled together into an object file named temp.cco.

            If no –R file is specified, CodeCheck first looks for an object file named default.cco. If this file is found then it’s rules are used. If it is not found then checking proceeds with no user-defined rules.

–Sn      Apply rules while reading header files. A digit should follow immediately, which identifies the kinds of header files:

                      0      No header files (default).

                      1      Headers enclosed in double quotes.

                      2      Headers enclosed in angle brackets.

                      3      All header files.

            For example, suppose that these two lines are in a source file:

 

           #include <ctypes.h> //  A standard system header

           #include "project.h"      //  An application header

            When option –S1 is in effect, CodeCheck will apply it’s rules to project.h but not ctypes.h. Please note that CodeCheck must always read every header included in a source file — this op­tion only determines whether or not CodeCheck rules will be applied to the con­tents of the various headers.

            CodeCheck’s default behavior is not to apply its rules to the contents of any included header files.

            The environmental variable CCEXCLUDE, if it is used, takes prece­dence over this option. Rules are never applied to files that are found in directories listed in this variable.

–SQL  Enables embedded SQL code. Note: this option must be spelled in all uppercase. 

–T       Create a file of prototypes for all functions defined in a pro­ject. The name of the prototype file may follow im­me­diately. If no name is given then the name for the prototype file will be myprotos.h.     The prototye file will be created in the current directory, unless a target directory is specified with the –Q option.

–U       Undefine a macro constant. The name of the macro must fol­low im­mediately. Thus  check -UMSDOS foo.c  has the ef­fect of treat­ing foo.c as though it began with the pre­proces­sor direc­tive #undef MSDOS.

–V       For CodeCheck users. See Section 1.4 of the Reference Manual for usage suggestions.

–W      For CodeCheck users. See Section 1.4 of the Reference Manual for usage suggestions.

–X       For CodeCheck users. See Section 1.4 of the Reference Manual for usage suggestions.

–Y       For CodeCheck users. See Section 1.4 of the Reference Manual for usage suggestions.

Z        Suppress cross-module checking. Macro definitions and vari­able and function declarations will not be checked for con­sis­tency across the modules of a project.

Any letter of the alphabet may be used as a command-line option. Every op­tion is remem­bered by CodeCheck and passed to the rule interpreter. Code­Check rules can refer to and change these options by calling the functions op­tion, set_option, str_option, and set_str_option (see Sec­tions 1.3–1.5 of the Reference Manual for de­tails).  Option –X is recommended for users who wish to design custom rule files whose behavior is controlled by a com­mand-line option.


0.3    CodeCheck File Names 

The conventions used by CodeCheck for filename extensions are:

.cc        A CodeCheck rule file, containing a set of rules for com­pila­tion by Code­Check. These rules are written in a subset of the C language. Code­Check requires that this extension be used for rule filenames, though it may be omitted in the –R command-line option.

.cch     A CodeCheck header file, for inclusion in a CodeCheck rule file.

.cco     A CodeCheck object file, produced by the CodeCheck compiler. This file contains a compilation of the rules found in the rule file with the same prefix.

.ccp     A project file for CodeCheck. This file contains a simple list of the file­names of all of the source modules that comprise a pro­ject, one filename per line. Header files and li­braries should not be listed in this file.

 

Depending on command line options, the following files may be created by CodeCheck:

check.lst        The default filename for the listing file (–L option).

myprotos.h      The default filename for the prototype file (–T option).

stderr.out      The filename for stderr output (–O option).

temp.cco           The object file created by CodeCheck when more than one rule file is specified (–R option).


0.4    How to Use CodeCheck

 

 

0.4.1  A Single User with Prepackaged Rules

Let us suppose that you simply want to check your C source file foo.c for some of the common errors that are not usually detected by C compilers. You want to see the warning messages in context, in a listing file. The command is

   check –Rerror –L foo.c

When CodeCheck completes execution, open the listing file check.lst with your editor. Each warning will be shown under the line that caused the warn­ing, with a marker immediately under the token that was being scanned when the er­ror was detected.

This command made use of the prepackaged rule file error.cc, supplied by Abraxas. Some other prepackaged rule files that you may find helpful are:

Tutorial and example rule files:

 

•     dcl.cc.cc          Example rules that use the declarator variables.

•     cplist.cc           Lists and describes all classes in each module.

•     cplus.cc           Example rules for C++ style checking.

•     declare.cc        Interpret global declarations in ordinary “English”.

•     fcncalls.cc       Generate a list of functions called by each function.

•     forward.cc       Example rules that illustrate forward chaining.

•     lex.cc               Example rules that use the lexical variables.

•     nesting.cc        Example rules for measuring iteration nesting.

•     oometric.cc     Computes several object-oriented metrics.

•     order.cc           Check for standard ordering of file elements.

•     pp.cc               Example rules that use the preprocessor variables.

•     prefix.cc          Example rules for checking declarator prefixes.

•     sample.cc        Example rules for compliance with standards.

•     wrapper.cc      Detect headers and #includes that are not “wrapped”.

Production rule files:

•     ansi.cc Check for compatibility with the ANSI C standard.

•     BSD43.cc        Check for use of BSD 4.3 features that are not POSIX.

•     braces.cc         Check for consistent use of braces in high-level statements.

•     complex.cc      Measures of program complexity (McCabe, etc).

•     error.cc           Check for errors that compilers may not find.

•     fromHP.cc       Check for portability from HP/Apollo C.

•     fromVAX.cc    Check for portability from VAX C.

•     general.cc        Check for general portability.

•     indent.cc          Check for proper indentation.

•     Halstead.cc     Measures of program size developed by Halstead.

•     logical.cc        Check for if-conditions that are too complex.

•     maintain.cc      Check for general maintainability.

•     posix.cc           Check for violations of the POSIX namespaces.

•     size.cc Measures of program size based on lines & statements.

•     style.cc            Check for compliance with Comeau's C style standards.

•     SVID.cc           Check for use of SVID features that are not POSIX.

•     toIntel.cc         Check for portability to the Intel iC-386 V4.2 compiler.

•     toKR.cc           Check for portability to the 1978 K&R C standard.

•     toMPW.cc       Check for portability to Macintosh MPW C version 3.2.

•     toSuncpp.cc     Check for portability to Sun C++ version 2.1.

•     toVAX.cc         Check for portability to VAX C.

 

0.4.2  Multiple Users with Custom Rules

Large corporations with many programmers often have staff assigned to maintaining the tools used by these programmers, including tools like Code­Check which are used for quality assurance. It is often appropriate to assign to a single individual the responsibility to write and maintain a CodeCheck rule file that encodes the corporate standards for C style. This compiled rule file would be placed in a network directory with the name default.cco. Then each pro­grammer can check his or her code with a command like:

   check foo.c

Assuming that each programmer has defined an environmental variable CCRULES that points to the directory containing default.cco, this command will cause CodeCheck to apply the corporate rules to his or her source code.

Note: Please contact Abraxas for site license information.


Introduction to CodeCheck

 

 

1.1    The Elements of Style

 

Every programmer has a distinctive style of writing. This style is an ex­pres­sion of many things: the programmer’s sense of æsthetics, the demands of speed and effi­ciency, the requirements of the customer, the needs of main­ten­ance pro­grammers, and the possibility that the program will need to be ported to another computer or trans­lated into another language. Many of these ele­ments of style require careful value judgments by the programmer or project leader. Once the stylistic requirements are clearly defined, Code­Check can be an invaluable tool for monitoring each of these ele­ments of program style.

The principal elements of good programming style are the requirements of æs­thet­ics, maintenance, and portability. Fortunately, there are significant over­laps between these elements, as illustrated below. Can C programmers achieve a style that is at once portable, easily maintainable, and elegant? The answer is an emphatic “yes”, and CodeCheck can help any C or C++ pro­grammer to develop his or her personal style to­wards this goal.

Figure 1:  Overlap among the elements of good programming style.

 

There are several general principles that govern the overlap between these el­e­ments:

1.   Code that is technically “portable” but not easily main­tained is not truly portable. A nontrivial C program that is universally portable is a very rare ani­mal in­deed. There are so many dark and dusty corners in the syntax and seman­tics of C that universal portability is next to impossible. Therefore, programmers must make the purpose of their portable code evident — and this is the essence of main­tain­ability.

2.   Code that is elegant without being maintainable will have a short life. What good is elegant code if even its author cannot understand it one year later? The C grammar lends itself to code that is highly abstract and su­per­fi­cially ele­gant, espe­cially in the area of character processing, but mainte­nance program­mers may consider this style to be anything but æsthetically pleasing. What is wrong here is the equation of elegance with æsthetics: the two con­cepts are not identical.

3.   Simplicity lies at the heart of portability, maintainability, and æs­thetics. For reasons difficult to understand, many C programmers never learn this essen­tial truth. Quite possibly the fault lies not in the language itself, but in the cul­ture that has grown up around it. This culture seems to value complexity, density and abstractness over all other considerations, perhaps for the sheer fun of creat­ing puzzles that others find im­possible to solve. What­ever the reason, these val­ues militate against clean, simple and under­stand­able code.

 


1.2    Why Programs Break

 

Despite the C language’s reputation for portability, it is an unfortunate fact of life that an appar­ently flawless C program will usually fail when compiled with a different com­piler, or under a differ­ent operating system, or on an­other type of com­puter. The rea­sons for this fact of life are many, but there are three principal forces at work which underlie almost all such problems.

 

1.2.1  Force #1: Language Parochialism

First, most programmers become thoroughly versed in only one imple­menta­tion of C, on only one computer, under only one operating system. As they learn more about this programming envi­ronment, they unwittingly be­gin to use its many nonstandard features, and to take advantage of each of its quirks and foibles. These nonstandard features are typi­cally concentrated in the lowest lev­els of operation: the preprocessor and the lexical an­alyzer of the compiler, the file management routines of the operating system, and the bit-manipulation in­struc­tions of the machine. Inexperienced program­mers do not real­ize the extent to which computer languages are dependent on low-level con­ventions, and are wholly unaware of the implications of using non­stan­dard features. Unfortu­nately, these low-level nonstandard fea­tures tend to spawn an unending stream of the most amazingly mysterious bugs, often months or years after a program is first written.

 

1.2.2  Force #2: Programmer Machismo

The second force at work to make programs break is pro­grammer mach­ismo. Many programmers are young, smart, brash, fearless, and anxious to prove themselves to be wickedly clever. These are the program­mers who write macros like

#define put(x,p) (--(p)->cnt>=0?(*(p)->ptr++=(x)):flush(x,p))

and think that by getting all this power on one single completely un­docu­mented line they have achieved some­thing special. What they have actually created is a main­te­nance nightmare for someone else. (This delight­ful exam­ple is due to An­drew Koenig, who dis­sects a slightly different version on page 80 of C Traps and Pitfalls).

 

1.2.3  Force #3: Compiler Drift

The third program-destroying force operates not on application program­mers, but on compiler writers. This force is the almost ir­resist­ible temptation to in­clude a new feature or language ex­ten­sion (non­stan­dard, naturally) that will ease the life of pro­gram­mers and sell lots more compilers. The tempta­tion to in­clude these fea­tures is certainly not all bad, as it does gen­erate an endless stream of new ideas for compilers, but it feeds directly into the other two forces. The re­sult, if un­controlled by tough pro­ject management and pro­grammer self-dis­ci­pline, is code that is neither maintainable nor ­port­able.

 

1.2.4  CodeCheck can help!

For the C language programmer, to defend against all programming prac­tices that can threaten portability or maintainability is a task requiring both an ency­clopædic knowledge of C compilers and an almost superhuman level of self-dis­cipline and at­tention to detail. For the project leader, the task of enforc­ing uni­form standards for code structure and style is a severe test of the ability to read and cri­tique great vol­umes of dense code. CodeCheck is de­signed to automate these tasks:

• The C programmer can use CodeCheck to review his or her code at the end of the day, and to identify ques­tionable con­struc­tions that might have crept in. This daily Code­Check pro­gram can implement many lint-like code checking opera­tions, as well as checking for adherence to pro­ject style specifica­tions.

• The project leader can use a different CodeCheck program on a weekly ba­sis to verify the programmers’ adherence to the project style specifica­tions, to quantify the amount of code pro­duced, and to measure critical qualities of the code, e.g. den­sity and complexity.

• Software contractors are frequently required to certify that their code con­forms to published governmental and industrial standards for code com­plexity, among them the McCabe and Hal­stead measures. A CodeCheck program can be run at the con­clu­sion of a project to docu­ment these partic­ular measures, and many others too.


1.3    Why Programs Fail to Port

 

1.3.1  The Many Standards for the C Language

There seem to be no fewer than four “standards” for the C language, all of which are covered by CodeCheck. Figure 2 depicts the family tree for C stan­dards, with the earliest version on top:

Figure 2:  The Evolution of C Standards.

 

Each descendent of the original C has added significant extensions to the original language, while trying to remain true to the spirit of C.

◊    The K&R standard, as described in the first edition of Kernighan & Ritchie (1978). This is certainly the single most influential book in the history of C. The lan­guage was only loosely defined in this “stan­dard,” however, and it lacks many of the popu­lar features that are commonplace now (e.g. enum­er­ated constants, prototypes, the void type). Although obsolete, there are still many K&R compilers in daily use around the world.   

◊    The H&S standard, as described in the first edition of Harbison & Steele (1984). This was the first careful description of the K&R stan­dard, with many mod­ern ex­ten­sions included (e.g. the enum and void types). The H&S stan­dard represents a transi­tional phase be­tween K&R and ANSI. Most pre-ANSI compilers in use today are best described as adhering to the H&S standard.

◊    The ANSI C standard, as defined by the American National Stan­dards In­stitute and certified internationally as ISO/IEC 9899. This version represented a significant ad­vance in precision over H&S. It also introduced several significant innovations (e.g. the prepro­ces­sor paste operator).

◊    The POSIX standard, as defined by the American National Stan­dards In­stitute and certified internationally as ISO/IEC 9945. Part 1 of this standard includes and extends the ANSI C standard, and de­tails the interface and behavior of a standard library of op­erat­ing system services.

◊    The C++ 2.0 standard, as defined in “The Annotated C++ Refer­ence Manual,” by Ellis and Stroustrup (1990). This book is the base doc­u­ment for an ANSI committee that is now de­veloping an official standard for C++.

◊    The C++ 3.0 standard, as defined in “The C++ Programming Language Manual, 3rd Edition” by Bjarne Stroustrup (1997). This book is the base document of the pending ANSI C++ standard.

 

 

1.3.2  Two kinds of incompatibility

It is useful to break down a portation problem into two separate sources of in­compati­bility:

1.   The source environment will invariably have a vari­ety of idiosyn­crasies which are com­mon to no other, and which differ from the ostens­ible standard on which the envi­ronment is based. These differences are source portation problems.

2.   The target environment will differ somewhat from the stan­dard on which it was based. These differences are target portation problems.

The prepackaged CodeCheck rule files supplied by Abraxas with Code­Check include several that address source and target portation problems. The source portation rule files have names that begin with “from”, as in “fromVax.cc” , which detects special keywords and other peculiarities that are found only on Vax C compilers. The target portation rule files have names that begin with “to”, as in “toKR.cc”, which tests for non-K&R syn­tax and keywords.


1.4    The Structure of CodeCheck

 

A CodeCheck program looks just like a very simple C program. Indeed, CodeCheck programs are written using a small subset of the C gram­mar, so any­one who can read C can also read CodeCheck. A CodeCheck pro­gram is, in fact, just a collection of if-statements (called “rules”) and variable decla­rations. The CodeCheck interpreter translates this collection of rules into pseudocode, which is used during the analysis of a C source to control the code checking operation.

Figure 5: Actions of the two components of CodeCheck.

To analyze a C source file, the user has only to specify the name of the C source file and the name of the CodeCheck program. The Code­Check program will be compiled (if necessary), and then the C source file is analyzed in accor­dance with the CodeCheck rules. As depicted in Figure 5, Code­Check has two logically separate components — the Code Analyzer and the Rule Compiler.

 

A brief bibliographic note

For those who are interested in referring to original sources, this manual makes many references to the C literature. These are given in a compressed for­mat, as illus­trated below. Details (title, etc.) are given in the bibliography.

HS84:182              means              Harbison & Steele, 1984, page 182.

RJ88:52                 means              R. Jaeschke, 1988, page 52.

AK89:99               means              A. Koenig, 1989, page 99.


1.5    Debugging with CodeCheck

 

CodeCheck is, in addition to all of its other functions, a sensitive bug de­tector, ca­pable of identifying subtle bugs that many compilers miss. A pro­gram that com­piles without error may still fail to pass CodeCheck’s rigor­ous cross-module syntactic and semantic analyses. There are two common rea­sons for this: (a) your program de­viates from strict C in ways that your com­piler permits, or (b) your program actually has a fault whose presence has gone unnoticed. The former case is a mild problem — it only implies a lack of portability — but the latter case may be quite serious.

To use CodeCheck as a bug-detector, use the rule file error.cc for an effi­cient “once-over-lightly” check of your project or any one of its source files. Even with no rule file at all, CodeCheck still performs a tremendous variety of unusual syntac­tic and semantic checks on its input. Given an entire pro­ject, for exam­ple, Code­Check will compare ex­ternal declarations and macro defini­tions across files and will advise if any discrepan­cies are found — re­gardless of how many or how few rules are included in the rule file. Please note that Code­Check does not per­form many of the standard semantic checks that all C and C++ compilers do. CodeCheck is designed to provide error checking that complements the checking performed by your compiler.


1.6    Predefined Macro Constants

 

CodeCheck has a predefined macro constant, CODECHECK, which is designed to permit conditional checking of C code. This macro constant has the value 100*(CodeCheck version number). Thus in CodeCheck version 8.02 this con­stant will have the value 802.     

The CODECHECK macro can be used to hide code from Code­Check, so that it will not be checked. This is extremely useful, for example, when in-line as­sem­­bler code is intermixed with C code. Here is an example:

 

#ifndef CODECHECK

     •••

     •••   /* Code to be hidden from CodeCheck  */

     •••

#endif

The lint macro can be used in exactly the same way. CodeCheck pre­de­fines this macro with the value 2.

CodeCheck also has another predefined macro constant BETA, for a version in format x.xxBy, BETA has value of y.  For a version in format x.xx, the value of BETA is 0. Combined with macro CODECHECK, you can distinguish different minor releases.

Depending on the specific environment for which it is implemented, Code­Check will predefine certain additional macro constants. These con­stants (as of CodeCheck version 8.02 ) are listed in the following table. See the update notes that accompany Code­Check for the latest ad­ditions and changes. The table of macros is organized by operating system and compiler. Any of these may be changed by the user on the com­mand line (using the –D or –U op­tions) or from within rules (using the CodeCheck define and undefine functions).


Constant   Value   Comment        

 

CODECHECK                 802           Major Rev 802

BETA                      3             Minor Rev 3

lint                      2            

__STDC__                  1             Option -k2 only.

__STDC__                  0             Except option -k2.

__cplusplus               1             C++ only (-k4 through –k9).

cplusplus                 1             C++ only (-k4 through –k9).

__FILE__                  <file name>

__LINE__                  <line number>

__DATE__                  <date>

__TIME__                  <time>

Unix Operating System

unix                      1

__unix                    1

                         

Constant                                 Value                     Comment                                      

DOS Operating System

MSDOS                     1

M_I386                    1

M_I86                     1

M_I86LM                   1

__386__                   1            

__I386__                  1            

__MSDOS__                 1            

__LARGE__                 1             Except option -k7 or -k11    

__BORLANDC                0x0500        Except option -k7 or -k11

__TURBOC__                0x0500        Except option -k7 or -k11

_WIN32                    1

OS/2 Operating System

__OS2__                   1

__FLAT__                  1

__IBMC__                  200           Except option -k6

__IBMCPP__                200           Option -k4 only

__32BIT__                 1             Except option -k6

_M_I386                   1             Except option -k6

NT Operating System

i386                      1

MSDOS                     1

_M_IX86                   300

_MSDOS                    1

_X86_                     1

_WIN32                    1

VMS Operating System

vax                       1

vms                       1

vaxc                      1

vax11c                    1

VAX                       1

VMS                       1

VAXC                      1

CC$gfloat                 1

CC$parallel               1

Macintosh Operating System

applec                    1

MC68000                   1

mc68000                   1

m68k                      1

macintosh                 1

Borland C++

__BCPLUSPLUS__            0x0340

__TCPLUSPLUS__            0x0340

__CDECL__                 1

_Windows                  1

                         

Constant                                    Value                  Comment                                      

Borland C++ continued:

 

__TEMPLATES__             1

wchar_t                   short

Microsoft C++

__single_inheritance                   Expands to nothing.

__multiple_inheritance                 Expands to nothing.    

__virtual_inheritance                  Expands to nothing.

_M_I86                    1             Except Windows NT.

_M_I86LM                  1             Except Windows NT.

_M_IX86                   300

_MSC_VER                  1200

_MSDOS                    1

_X86_                     300

i386                      1

MSDOS                     1

_WIN32                    1

Metaware High C

__HIGHC__                 1

Symantec C++

__SC__                    700

IBM Visual Age C++

__IBMCPP__                350

Metrowerks CodeWarrior

__MWERKS__                1

 

Debugging your source code with the CodeCheck preprocessor can be greatly enhanced  by using the "-D?" switch, which will display the current state of CodeCheck internal symbol table for the pre-processor. If a particular intrinsic definition is non-desirable then the "-U" switch can be used to undefine the macro.

The CodeCheck product does not include the C/C++ system header files ( stdio.h, iostream.hpp, … ). These must be obtained from your compiler vendor, e.g. if source code to be analyzed by CodeCheck explicitly references stdio.h ( #include <stdio.h> ), then that header file must be available for CodeCheck to analyze. In summary for CodeCheck to analyze source code, all parts of the entire project to be analyzed must be present in order for the analysis to be successful.


Chapter 3: Checking Types

 

This section describes how to use CodeCheck to analyze type information with CodeCheck rules. These techniques are necessary for those who wish to write their own CodeCheck rules which detect conditions that depend on type information.

The ability to detect and analyze type information in declarations has been a part of CodeCheck since version 4.03. However, the ability to detect and analyze type information within executable code is relatively new to Code­Check, having been introduced for C in version 5.04, and for C++ in version 5.05. Please refer to the CodeCheck Reference Manual for the exact definitions of the CodeCheck variable and functions mentioned in this chapter, and the Abraxas Technical Note ( www.abxsoft.com/TechNotes.html ) series for recent changes and enhancements.

There are five broad categories of type-related rules that CodeCheck can en­force. Rules can be written that detect:

1.         a declaration of any specified type,

2.         a cast to or from any specified type,

3.         an implicit type conversion to or from any specified type,

4.         use of a variable or function of any specified type,

5.         use of an operand of any specified type for any specified opera­tor.

In addition, CodeCheck auto­mati­cally checks function argument types for com­patibility with the prototype for the function called, if one is in scope, and also checks the function return value for compatibility with the declared function return type.

 

3.1    How to Analyze a Type Declaration

Types in C and C++ are either simple or complex. A simple type consists of an unmodified base type, e.g. int or float, with possible qualifiers such as const, volatile, near, far, huge, export, etc. A complex type has a simple type as its base, and in addition has one or more additional levels, e.g. pointer to…, array of…, function returning…, or reference to… (the latter is allowed only in C++ and a few nonstandard C dialects). Each of these levels may have also have qualifiers (e.g. const, pascal, interrupt, etc.).

When an identifier is declared, CodeCheck set the variable dcl_levels to the number of levels in the type. Thus for simple variables dcl_levels will be zero. CodeCheck sets the variable dcl_base to an integer that identifies the base type of the identifier. The possible values of dcl_base are defined as manifest constants in the CodeCheck header file check.cch, which should be included in every rule file that makes use of type-checking services. Here are the first five base types from check.cch:

 

     #define VOID_TYPE            1

     #define CHAR_TYPE            2

     #define SHORT_TYPE           3

     #define INT_TYPE             4

     #define LONG_TYPE            5

As an example of the use of these CodeCheck variables to detect a specified type, here is a rule that will issue a message whenever a simple variable of type char is declared:

 

     if ( dcl_base == CHAR_TYPE )

        if ( dcl_levels == 0 )

           warn( 1234, "Variable %s is a char.", dcl_name() );

When the type in a declaration is complex, the function dcl_level() returns an integer that identifies the kind of each level. Here is an example rule that prints out the type of every global or local identifier that is declared:

 

     int  i, kind;

 

     if ( dcl_global || dcl_local )

        {

        printf( "Variable %s:  ", dcl_name() );

        i = 0;

        while ( i < dcl_levels )

           {

           kind = dcl_level( i++ );

           switch( kind )

              {

           case ARRAY:

              printf( "array of " );

              break;

           case POINTER:

              printf( "pointer to " );

              break;

           case REFERENCE:

              printf( "reference to " );

              break;

           case FUNCTION:

              printf( "function() returning " );

              break;

              }

           }

        printf( "%s\n", dcl_base_name() );

        }

The manifest constants ARRAY, POINTER, REFERENCE, and FUNCTION are de­fined in check.cch. In addition to dcl_level(), this rule also used the func­tions dcl_name() and dcl_base_name(). These functions return the declarator name and the name of the base type of the declaration, respectively. The vari­ables in the trigger for this rule are dcl_global and dcl_local, which Code­Check sets to 1 when a global or local identifier is declared, respectively. (In this context “global” means file scope and external linkage, while “local” means function or  block scope.)

To obtain the type qualifiers for each level of a type, including the base type level, use the function dcl_level_flags(). This function takes as its argument the level, just like dcl_level(), and returns an integer that has a bit set for each qualifier present. The header file check.cch contains manifest constants that can be used as masks to obtain each of these qualifier flags. For example, the fol­lowing rule can be used to detect declarations in which the first level has the const qualifier:

 

if ( dcl_level_flags(0) & CONST_FLAG )

     warn( 1234, "%s has been declared constant.", dcl_name() );

There are many other declarator variables and functions that are useful for analyzing declared types — refer to the CodeCheck Reference Manual for details.

 

 

3.2    How to Determine the Type of an Identifier

The CodeCheck variables and functions for determining the type of an identifier that is used within executable code are almost identical to those for declarations, except that they carry the prefix idn_ instead of dcl_.

The function idn_filename() and the variable idn_line can be used to determine the location (i.e. file name and line number) of the declaration that is currently in scope for the identifier as it is used in the executable code.

The variables idn_global, idn_local, idn_member, and idn_parameter also resemble their dcl_ counterparts: they are used to determine whether the identifier has global, local, or class scope, or is a function parameter, respectively.

 

 

3.3    How to Determine the Type of an Operand

The CodeCheck variables and functions for determining the types of all the operands of executable operators are similar to their counterparts for declara­tions and identifiers. The major difference is that they require an additional argument which specifies which operand to describe.

Operators may be unary (one operand, e.g.  ~ ), binary (two operands, e.g. += ), or ternary (three operands, e.g. ?: ). Functions have as many operands as they have arguments. Whenever an executable operator is encountered, Code­Check sets the appropriate op_ variables to indicate which operator was found, and sets op_operands to the number of operands taken by this operator.

The CodeCheck functions op_base(), op_levels(), op_level(), and op_level_flags() differ from their declarator and identifier counterparts in only one way: their first argument specifies which operand is to be described. The operands are indexed from right to left. Thus the rightmost operand is the operand 1, while the leftmost operand is the last (index given by op_operands). For example, in the expression x = a + b the first operand for op_add is b, and the second is a. For the operator op_assign, the first operand is the result of (a+b), and the second is x.

In the special case of the cast operator, the first operand is the type of the value to be cast, while the second operand is the result type after the cast has been performed. Here is an example rule that detects every cast of a pointer to a struct XYZ, to any result type, i.e. a cast that looks like (struct XYZ *):

 

     if ( op_cast )

        {

        if (  (op_levels(1) == 2)                &&

              (op_level(1,0) == POINTER          &&

              (op_base(1) == STRUCT_TYPE)        &&

              (strcmp(op_base_name(1),"XYZ") == 0) )

           warn( 1234, "Cast from (struct XYZ *) to anything." );

        }

 

 

3.4    How to Detect Implicit Type Conversions

Implicit type conversions can happen in three different contexts. First, when a value of one type is assigned to a value of another type, without an explicit cast, then an implicit type conversion takes place. Second, when the type of an argument that is passed to a function differs from the type of the formal parameter in the prototype for the function, then it must be implicitly converted. Third, an implicit type conversion occurs when type of the value in a return statement differs from the return type of the function.

The CodeCheck functions used for determining the types involved in all implicit type conversions are the same functions as used for operators. However, to detect an implicit type conversion, one of the cnv_ variables must be used as the trigger in the rule. When one of these variables is set, then CodeCheck uses the op_ functions as though a cast operator were present for the conversion.

Here is an example rule that detects every implicit conversion of anything to a pointer to a struct XYZ:

 

     if ( cnv_any_to_ptr || cnv_ptr_to_ptr )

        {

        if (  (op_levels(2) == 2)                &&

              (op_level(2,0) == POINTER)         &&

              (op_base(2) == STRUCT_TYPE)        &&

              (strcmp(op_base_name(2),"XYZ") == 0) )

           warn( 1234, "Implicit conversion to (struct XYZ *)." );

        }

 


Chapter 4: Portable Style

 

This section describes how to use CodeCheck to monitor style for port­abil­ity. Many of the rules described here are based on internal corporate standards for C coding that have been made available to Abraxas Software, and also on the rec­ommendations of two influential books: C Programming Guide­lines, Second Edition, by Thomas Plum, and Portability and the C Language, by Rex Jaeschke.  Jaeschke,Rex;

The guidelines and recommendations found in these sources were de­signed pri­marily to enforce both portability and maintainability. In the sub­sec­tions that follow, those guide­lines that primarily affect program portabil­ity are pre­sented. Each guideline is followed by a description of the relevant CodeCheck vari­ables that can be used to construct corresponding CodeCheck rules.

 

 

4.1    Lexical Issues in Portability

 

4.1.1  Lexical Rules for Variable Names

C programmers have evolved a great variety of lexical guide­lines for vari­ables and their dec­lara­tions. The guide­lines reported here are specifically in­tended to ensure portability. Additional guidelines that are intended to im­prove maintainability may be found in Chapter 4.

1.   Spell variable names in lower case only.

2.   Names with external linkage must be unique in their first 6 characters.

3.   Do not begin an identifier with an underscore character.

The predefined CodeCheck variables which are used to detect violations of the above guide­lines are, re­spectively:

Variable                                  Meaning

dcl_any_upper                 Set to 1 if an upper-case character is found in an iden­tifier name when it is declared.

dcl_extern_ambig          If two external identifiers have names that agree on the first 6 or more charac­ters, regardless of case, then this variable is set to the number of consecutive char­ac­ters on which they agree.

dcl_underscore              Set to 1 if the name of a declared identifier begins with an un­derscore character.

 

 

4.1.2  Nonstandard Characters

Many C compilers allow the use of characters that are not in the standard C char­acter set. Nonstandard characters are, by definition, not portable. The stan­dard char­acters are (HS88:6):

a b c d e f g h i j k l m n o p q r s t u v w x y z

a b c d e f g h i j k l m n o p q r s t u v w x y z

0 1 2 3 4 5 6 7 8 9

! # % ^ & * ( ) - _ + = ~ [ ] \ | ; : ' " { } , . < > / ?

blank      newline     backspace     horizontal-tab

vertical-tab     form-feed     carriage-return

 

Recommendation: For general portability, do not use nonstandard charac­ters. Note that the char­­ac­ters $ and @ are nonstandard. CodeCheck provides a prede­fined vari­able with which to detect nonstandard characters:

Variable                                  Meaning

lex_nonstandard            Whenever a char­acter is found that is not in the stan­dard C set, the value of this variable is set to the inte­ger rep­resen­ta­tion of the nonstandard.

To ignore nonstandard identifiers,  call  function skip_nonansi_ident().

Function                                 Meaning

skip_nonansi_ident( char ) Skip non-ANSI identifiers beginning with '@','$' or '`'. Char parameter of this function specifies the character which leads the identifier. The value of  the parameter only can be '@', '$' or '`'. The other characters have no effect for this function.

 

4.1.3  Trigraphs

Trigraphs are special 3-character sequences introduced in ANSI C. Tri­graphs are sig­nificant as a portability issue only to the extent that older pro­grams which unwit­tingly use trigraph sequences within literal strings will no longer compile correctly (RJ:28). ANSI C compilers translate trigraph se­quences into their cor­re­sponding sin­gle ASCII characters at a very early stage in the lexical analysis phase of compilation. The trigraph sequences and their corres­ponding ASCII characters are:

                  trigraph          meaning

              ??=         #

              ??(         [

              ??/         \

              ??)         ]

              ??'         ^

              ??<         {

              ??!         |

              ??>         }

              ??-         ~

Recommendation: Search old programs for the ?? symbol pair, and re­place ev­ery occurrence with ?\?. Older compilers will simply ignore the back­slash, while ANSI compilers will treat ?\? as a question mark fol­lowed by an es­cape se­quence, thus recoding the string to ??. CodeCheck provides two pre­de­fined variables for identifying inadvertent trigraphs:

Variable                                  Meaning

lex_trigraph                   Set to 1 if an ANSI trigraph is found.

lex_str_trigraph          Set to 1 if a trigraph is found within a string lit­eral.

 

 

4.1.4  Numeric Escape Codes

Numeric escape codes are permitted in C, but they are nor­mally used to re­fer to control characters in the ASCII character set (HS84:25). Such nu­meric es­cape codes will cause a program to fail when it is compiled and used in a non-ASCII (e.g. EBCDIC) envi­ronment.

There is a potentially confusing aspect of hexadecimal escape sequences. Unlike octal sequences, which may have at most three digits, the ANSI  C stan­dard specifies that a hexadecimal escape sequence may have any number of digits. Thus the string literal "/xabcd" contains only one character (because a, b, c, and d are all valid hex digits).

Recommendation: Do not use numeric escape codes unless it is abso­lutely nec­es­sary. Carefully document the meaning of each such usage in the source code, and use a manifest constant (#define) to make its meaning apparent. Code­Check provides two predefined variables for detec­ting numeric escape codes:

Variable                                  Meaning

lex_hex_escape              When a hexadecimal escape sequence is found, this variable is set to number of hexadecimal digits found.

lex_num_escape              When a non-zero numeric escape sequence is found, the value of this variable is set to the value of the es­cape se­quence.

lex_zero_escape            When a zero escape sequence is found (e.g. \0, \00, \x0, \0x0), the value of this variable is set to 1 if the context is a character literal, or 2 if the context is a string literal.

 

 

4.1.7  Escape Sequences in Character and String Literals

An escape sequence within a character or string literal is signaled by the back­slash char­acter: \. The unrestricted use of escape sequences is a fre­quent source of portability problems.

1. For reasons that shall remain forever mysterious, many pre-ANSI com­pil­ers al­low the digits 8 and 9 to appear within an octal escape sequence, as in \09. This usage has always been hopelessly confusing, has always been non-portable, and is now, fortunately, also ungrammatical.

2. Many pre-ANSI compilers do not support the hexadecimal escape se­quence, as in \xA3. This usage is therefore not portable except among ANSI compilers. The hex­adecimal escape sequence \0xA3 (with a zero appearing be­fore the x) is even rarer, and is forbidden by ANSI.

3. In the K&R dialect of C the only defined escape characters are in this set:

        \n   \b   \t   \r   \f   \\   \"   \'

In all other cases when a backslash is followed by a character, the backslash is ig­nored. The H&S dialect allows one more escape, \v (vertical tab). The, ANSI standard includes three further escape characters: \a (alert or bell), \x (hexadecimal), and \? (to disam­biguate trigraphs). Thus pre-ANSI programs which rely on the backslash being ignored before the let­ters a, v, or x are no longer portable.

4. The empty character constant formed by two successive single quote marks ('') is not con­sistently interpreted by C compilers, and is not allowed by many. In particular, it is not al­ways the same as the null character '\0'.

5. Many C compilers allow character con­stants to have more than one char­ac­ter. Even if all com­pilers did allow this usage, it would still be non-portable due to the in­famous NUXI problem. (“NUXI” refers to the worst-case scram­bling of the string “UNIX” that can result when porting encoding character strings).

To detect the above portability problems, CodeCheck provides the follow­ing six predefined variables:

Variable                                  Meaning

lex_big_octal                 Set to 8 or 9, respectively, when a numeric escape se­quence or octal in­teger con­tains the digits 8 or 9.

lex_hex_escape              When a hexadecimal escape sequence is found, this variable is set to number of hexadecimal digits found.

lex_ansi_escape            Set to 1 if an escape sequence con­tains one of the new ANSI escape char­acters: a, v, or ?.

lex_not_KR_escape       Whenever an escape character is found that is not de­fined by K&R (i.e. \n, \b, \t, \r, \f, \\, \", \') then this vari­able is set to the integer repre­sen­ta­tion of the char­ac­ter.

lex_char_empty              Set to 1 if an empty character con­stant is found (e.g. ''). This variable does not flag the null char­ac­ter con­stant ('\0').

lex_char_long                 Set to 1 if a character constant is longer than one char­ac­ter.

 

 

4.1.8  System Variables

A useful convention has evolved within the C community in which iden­ti­­fiers that are de­fined and used by the system (i.e. variables used by the com­piler, the linker, or standard sys­tem header files) are spelled with a lead­ing un­derscore character. If pro­grammers con­form to this convention by never spelling an iden­tifier in this way, then name conflicts are prevented. Unfor­tunately, some pro­grammers are unaware of this convention, and may in­ad­vertently spell an identi­fier with a leading under­score. Even if the pro­gram compiles without error, it will break as soon as it is compiled on an­other sys­tem that happens to use one of these names. This is, therefore, an obscure but significant portability problem.

Recommendation: Adhere to this convention religiously. Do not spell iden­ti­fiers with a leading underscore character, unless you are writing system code. CodeCheck predefined variable:

Variable                                  Meaning

dcl_underscore              Set to 1 if an identifier begins with an un­derscore charac­ter.

 

 

4.1.9  The Numeric Constant Suffixes U, F, and L

The ANSI standard and some pre-ANSI compilers allow a suffix of U (for un­signed) or F (for float) on numeric constants, in precisely the same way that the suffix L (for long) is allowed in K&R C. Needless to say, this is not a portable us­age among pre-ANSI com­pilers.   

The suffix L is generally portable, except when it is used on a floating con­s­tant. Only some non-ANSI compilers recognize the long float type, so this is a non-portable use of the L suffix.

Recommendation: The new suffixes U and F solve many problems of nu­meric am­biguity, and should be used on the grounds that they increase pro­gram clarity and make main­tenance easier. The cost in terms of lack of porta­bility is sub­stan­tially less than the benefit for pro­gram clarity.

CodeCheck provides four predefined variables for suffix detection:

Variable                                  Meaning

lex_float                          Set to 1 if a numeric constant is found with the suf­fix 'F' or 'f'.

lex_long_float              Set to 1 if a floating constant is found with the suf­fix 'L' or 'l'.

lex_suffix                        Set to 1 if a numeric constant is found with any suf­fix ('F', 'f', 'L', 'l', 'U', or 'u').

lex_unsigned                   Set to 1 if a numeric constant is found with the suf­fix 'U' or 'u'.

 

 

4.1.10  Octal and Hexadecimal Numeric Constants

Octal numeric constants have always been among the very worst features of the C lan­guage, and the ANSI standard does little to improve matters. That octal con­stants should be disting­uished from decimal constants sim­ply by adding a leading zero is just plain appalling. Uncounted thou­sands of hours of program­mer time have been invested over the years in finding er­rors caused by plausible but wrong numeric constants. This is a seri­ous and con­tinuing mainte­nance prob­lem.  

Hexadecimal numeric constants are in themselves problematic, but diffi­cul­ties arise because they are fre­quently used to refer to characters. This use of nu­meric constants is non portable because C is not tied to the ASCII character set. As one important example, bear in mind that IBM mainframes use the EBCDIC character set.

Recommendation: Avoid octal constants whenever feasible. If not feasible, add a comment that loudly proclaims that the constant is intended to be octal, or use a mani­fest constant (e.g. #define OCTALTEN 010 ) to make plain the octal na­ture of the number. Similarly, avoid the use of any numeric constant to refer to a character. CodeCheck provides a predefined variable for de­tecting constants of any radix:

Variable                                  Meaning

lex_radix                          Set to the radix of an integer constant (2, 8, 10, or 16).

 

 

4.1.11  The End-of-File Marker

The ANSI standard is quite clear on its specification of the appearance of the end of a file: the last character before the end-of-file marker in a non empty source file must be a newline character. This makes good sense, but some pre-ANSI compilers allow a file to terminate with any character. Con­sequent­ly, the ends of files must be checked for portabil­ity to ANSI. This may seem like a minor point, but some programmers have used this de­fi­ciency of pre-ANSI compilers to ac­complish some very bizarre and com­pletely un­necessary tricks.

Recommendation: It is good practice to end a file with a newline, and it is bad practice to play games with header files that end in the middle of macro defini­tions, declarations, or function definitions.

Variable                                  Meaning

lex_nl_eof                        Set to 1 if a non empty source file does not end with a newline.

 

 

4.1.12  String Concatenation

Under the ANSI C standard, consecutive string constants are con­catenated auto­mati­cally during the lexical analysis phase of compilation. Older com­pil­ers may not perform this operation, thus causing a problem for code that is to be ported from ANSI C to an older compiler. CodeCheck predefined vari­able:

Variable                                  Meaning

lex_str_concat              Set to 1 if two string constants are found, sepa­rated only by whitespace.

 

 

4.1.13  Embedded Assembler Code

Assembler code is, almost by definition, non portable. CodeCheck pro­vides a predefined variable with which large source files can be automatically scanned for embedded assembler.

Variable                                  Meaning

lex_assembler                 Set to 1 when assembler code is detected.

 

 

4.1.14  Continuation Lines

In ANSI standard C, a line can be continued with a backslash character that is followed immediately by a newline character. In older dialects of C this kind of line continuation marker was permitted only within macro defini­tions.  Code­Check provides a predefined variable for detecting the use of the continua­tion marker outside of macro definitions.

Variable                                  Meaning

lex_backslash                 Set to 1 if a backslash-newline pair is found at the end of a line that is not part of a macro definition.

 

 

4.1.15  Special Reserved Keywords and Identifiers

Many compilers have special keywords and identifiers which are shared by no other com­piler. For example, Microsoft uses the special keyword _based. Needless to say, these keywords are generally non portable. Code­Check pro­vides two func­tions for detecting specified identifier and keyword names.

Function                                 Meaning

identifier(char *)     Returns the integer value 1 whenever an identifier is found that matches the argument string.

keyword(char *)            Returns the integer value 1 whenever a keyword is found that matches the argument string.

 

 

4.1.16  Wide String and Character Literals

The ANSI standard provides a mechanism for using strings and characters that come from very large alphabets, such as Chinese. These “wide” strings and characters are lexically signaled with the prefix L immediately before the leading quote.  

CodeCheck provides a predefined variable for detecting wide literals:

Variable                                  Meaning

lex_wide                             Set to 1 if an ANSI wide string or character con­stant is found (prefix L).

 

 

4.1.17  Visibility of Nested Tag Names

ANSI C and all versions of C++ allow tag definitions to be nested within tag definitions. However, the visibility of identifiers associated with this tag depends on which dialect is in use. In ANSI C the nested tag is treated as though it is has global scope. In C++ the nested tag has class scope, but in versions prior to 3.0 the tag name and all of its enum constants are exported to the enclosing scope, to provide compatibility with C. In version 3.0 this com­patibility feature is removed for nested enum constants and applies to nested tag and typedef names only if they do not conflict with names in the enclosing scope. Needless to say, a porta­bility problem arises.

Recommendation: When writing C++ or porting C code to C++, all nested iden­tifiers should be explicitly scoped when used outside the enclosing scope. Do not depend on compatibility with ANSI C.

CodeCheck provides a predefined variable for detecting problematic un­scoped identi­fiers:

Variable                                  Meaning

lex_invisible                 Set to 1 when an unscoped identifier is visible to ANSI C and all ver­sions of C++ prior to 3.0, but is invisible to C++ 3.0.


4.2    Preprocessor Considerations

 

 

The preprocessor may very well be the most fertile source of porta­bility prob­lems in the C language. This has occurred because (a) until the ANSI standard, the prepro­cessor was never adequately standardized, and (b) it is a powerful tool with many quirks and foibles, just crying out for ex­ploitation by “clever” pro­grammers.

 

 

4.2.1  Whitespace within Preprocessor Directives

Some C preprocessors allow whitespace (space or tab characters) to pre­cede the # sym­bol, and to come between the # symbol and the preprocessor com­mand. Such use of whitespace is not portable to older compilers, although the trend is toward permitting it (HS88:27). The ANSI standard permits this use of white­s­pace.

Other forms of whitespace, e.g. vertical tabs, form-feeds, backspace charac­ters, etc, are not at all portable if used anywhere within a preprocessor direc­tive.

Recommendation: To enhance visibility, preprocessor lines should be clearly marked with the # symbol in the first position of the line. However, indentation of the preprocessor directive after the # symbol is desirable to in­dicate, for exam­ple, the nesting level of #if directives. CodeCheck pro­vides three pre­defined vari­ables for detecting whitespace within preprocessor direc­tives:

Variable                                  Meaning

pp_white_before            Set to the amount of whitespace (in charac­ters) that pre­cedes the # char­acter in a prepro­cessor di­rec­tive.

pp_white_after              Set to the amount of whitespace (in charac­ters) that is found after the # character and before the key­word in a prepro­cessor di­rective.

pp_bad_white                   Set to 1 if a non-space, non-tab whitespace char­ac­ter (e.g. vertical tab, form-feed, or backspace) is en­coun­tered within a preprocessor directive.

 

 

4.2.2  Leading Whitespace within Included File Names

Some C preprocessors do not automatically delete any whitespace charac­ters (spaces or tabs) that pre­cede the filename within #include directives. For ex­ample:

#include  <  wrong.h>

#include  "  wrong.h"

This is a portability issue, since many preprocessors do not perform this ser­vice.

Recommendation: Avoid leading whitespace in these filenames.

CodeCheck pro­vides a pre­defined vari­able for detecting leading whitespace within #include filenames:

Variable                                  Meaning

pp_include_white          Set to 1 if the filename in an #include directive has leading whitespace.

 

 

4.2.3  Preprocessor Arithmetic

Some preprocessors will not perform any arithmetic at all within prepro­ces­sor di­rectives, al­though they will do logical comparisons. Preprocessor arithmetic is there­fore not portable. Even pre­processors that do perform arith­metic (notably ones that conform to the ANSI standard) will still not evaluate the sizeof func­tion. This means that some apparently good code ideas, such as the following from a Symantec header, will fail:

1    #if sizeof(int) < 4

2          long     total;

3    #else

4          int      total;

5    #endif

Recommendations: (a) Avoid preprocessor arithmetic. (b) You may be able to use a defi­nition in the file limits.h (supplied with all ANSI com­pilers) to de­duce “sizeof” infor­mation. For example, if INT_MAX is defined in this file as 32767, then the size of an int must be 2 bytes.

CodeCheck provides two predefined variables for detecting these prob­lems:

Variable                                  Meaning

pp_arith                             Set to 1 if a preprocessor directive requires arith­metic calculation.

pp_sizeof                          Set to 1 if a preprocessor directive requires the evalua­tion of a sizeof function.

 

 

4.2.4  Macro Parameters

CodeCheck automatically checks for certain common problems that can occur with macro parameters. For example, a macro call with the wrong number of ar­gu­ments will evoke an automatic CodeCheck warn­ing message. Whenever this occurs CodeCheck will also set the predefined variable lex_bad_call so that the user can have his own custom-designed error message triggered. A few de­viant preprocessors allow a macro call to have fewer arguments than speci­fied, even though this is permitted in nei­ther K&R nor ANSI C.

A separate problem occurs when a variable has the same name as a macro func­tion. Most compilers will accept this usage, but some will treat it as an er­ror.

CodeCheck provides four predefined variables for detecting these and other similar prob­lems:

Variable                                  Meaning

lex_bad_call                   Set to the difference between the num­ber of ar­gu­ments found and the number of arguments expected when a macro function is expanded.

lex_null_arg                   Set to 1 if an actual argument is omitted in a macro call, e.g. XYZ(abc,,123).

pp_arg_count                   Set to the number of formal parameters found in a macro defi­nition.

pp_empty_arglist          Set to 1 if the definition of a macro “function” has no for­mal parameters.  

pp_overload                      Set to 1 if a variable name conflicts with a macro func­tion name.

 

 

4.2.5  Comments in Macro Definitions

Once upon a time there was a marvelously clever programmer who real­ized that by plac­ing a com­ment in a macro definition like this

#define PASTE(a,b)  a/**/b

he could cause the preprocessor to “paste” the actual arguments for a and b to­gether, so that the com­piler would see only the single token ab. For exam­ple, the macro call f(PASTE(i,j)) would be per­ceived by the com­piler as f(ij), re­gardless of the values of the variables i and j. From this little piece of clever­ness there sprang up a whole cot­tage industry of token pasting by pro­gram­mers ea­ger to exploit the very lat­est in pro­gram obfuscation. This ex­ample, which is dis­sected further in RJ88:37, vio­lates the cardinal principle of pro­gram maintain­abil­ity: the intent of the programmer must be fully evident on first reading. Further­more, not all compilers will be so ac­commodating as to ignore the em­bedded comment—all ANSI preprocessors will treat the comment as white­s­pace, and will therefore not paste any tokens together. Thus this mode of token pasting also presents a portability problem.

Fortunately, ANSI has come to the rescue with two badly needed provi­sions: (a) com­ments within macro definitions must be treated as whitespace, and (b) program­mers who feel an overwhelming need to paste tokens may use the new “paste” oper­ator (##) for the pre­processor.

Recommendation: Do not paste tokens unless you have a truly solid rea­son. CodeCheck provides two pre­de­fined variables for detecting these prob­lems:

Variable                                  Meaning

pp_comment                        Set to 1 if two tokens within a macro definition are sepa­rated only by a comment.  

pp_paste                             Set to 1 if the “paste” operator (##) is found in a macro definition.

In its normal (ANSI-conforming) mode of operation, CodeCheck treats any com­ment within a macro definition as whitespace. However, when the –K0 com­mand-line switch is active (indicating K&R [1978] C syntax), CodeCheck inter­prets an embedded comment as a paste operation. Thus even old programs that use this unfortunate feature can be scanned successfully with CodeCheck.

 

4.2.6  The #elif Directive

The ANSI standard and some pre-ANSI compilers allow the #elif pre­pro­cessor di­rective and the defined() preprocessor function. These useful ad­di­tions to the preprocessor permit clearer con­ditional compilation direc­tives, as this example illus­trates:

1    #if defined( macintosh )

2      •

3      •

4    #elif defined( MSDOS )

5      •

6      •

7    #elif defined( vms )

8      •

9      •

10   #endif

Unfortunately, not all compilers support these new constructions.

Recommendation: If maintainability is more important than portability, then by all means use these new constructions — the benefits are significant. Avoid these new con­structions only if you are writing for the most general kind of portability.

CodeCheck provides two predefined variables for detecting these usages:

Variable                                  Meaning

pp_defined                        Set to 1 if the defined pre­pro­ces­sor func­tion is en­coun­tered.   

pp_elif                               Set to 1 if the #elif preprocessor di­rective is en­coun­tered.   

 

 

4.2.7  Macro Names and Parameters Inside Strings

Some older compilers permit macro expansion within string literals. This practice is permitted neither in ANSI C nor in the majority of non-ANSI com­pil­ers. It can pose a serious portability problem for programs that rely on such ex­pansions.

Another problem occurs in macro definitions when a formal parameter of a macro is used inside a string literal. The Vax C compiler, among others, will per­form substitution for the formal parameter name inside the string lit­eral, but ANSI standard compilers will not.

CodeCheck provides two predefined variables for detecting these prob­lems:

Variable                                  Meaning

lex_str_macro                 Set to 1 when a macro identifier is found within a string constant.

pp_arg_string                 Set to 1 if a macro formal parameter is found inside a string literal in the macro definition.

 

 

4.2.8  New ANSI Preprocessor Features

In the process of standardizing and regularizing the C preprocessor, the ANSI standards committee elected to introduce a number of new features. Each of these features constitutes a portability problem for developers who wish to port their project from an ANSI conforming compiler to a strict K&R compiler. These new features include the following:

1.   Use of a macro that expands to a filename within an #include di­rective.

2.   The #error and #pragma preprocessor directives.

3.   The defined() preprocessor function.

4.   The paste and stringize preprocessor operators.

5.   The #elif conditional directive.

In addition to predefined variables which flag each new feature, Code­Check provides a single predefined variable, pp_ansi, for detecting all such features, and pp_unknown for flagging non-ANSI preprocessor directives.

Variable                                  Meaning

pp_ansi                               Set to 1 if a preprocessor feature is encountered that is new with the ANSI standard.

pp_defined                        Set to 1 if the defined pre­pro­ces­sor func­tion is en­coun­tered.

pp_elif                               Set to 1 if the #elif preprocessor di­rective is en­coun­tered.  

pp_error                             Set to 1 if the #error preprocessor directive is en­coun­tered.

pp_include                        Set to the following on an #include directive:

1:   filename is in quotes, and was obtained from a macro expan­sion,

2:   filename is in quotes, not from a macro,

3:   filename is in angle brackets, from a macro,

4:   filename is in angle brackets, not from a macro.

5:   filename is not enclosed (Metaware C only).

6:   filename is not enclosed (Vax C only).

pp_paste                             Set to 1 if the ANSI “paste” operator (##) is found in a macro definition.

pp_pragma                          Set to 1 if a #pragma preprocessor directive is en­coun­tered.

pp_stringize                   Set to 1 if the ANSI “stringize” operator (#) is found in a macro definition.

pp_unknown                        Set to 1 if a non-ANSI preprocessor directive is found.

 

 

4.2.9  Preprocessor Keyword Substitution

Some preprocessors allow macro names to be used after the # symbol, as in this example:

 

#define macro      define

#macro  square(x)  ((x)*(x))

Needless to say, this is not a portable use of macro substitution. CodeCheck pro­vides a predefined variable for detecting this usage:

Variable                                  Meaning

pp_sub_keyword              Set to 1 if the keyword in a preprocessor directive is itself a macro name.

 

 

4.2.10  Recursive Macros

A macro is recursive if its name appears as a token within its definition. C preprocessors vary in their permissiveness with respect to recursive macros. The ANSI standard specifies that the definition of a macro be “turned off” for the du­ration of the expansion of that macro, to prevent the preprocessor from plunging into an infinite recursion. The Microsoft C preprocessor, however, only turns off the definition of macro constants, thus allowing recursive macro functions. Other preprocessors only turn off the current macro being expanded, so that two mu­tually recursive macros can cause an infinite recur­sive death.

Recommendation:  Avoid recursive macros.

CodeCheck pro­vides a predefined variable for detecting recursive macro def­initions:

Variable                                  Meaning

pp_recursive                   Set to 1 if a recursive macro definition is found.

 

 

4.2.11  Macro Definition Length

Some preprocessors store each macro definition as a string, usually after white­space has been eliminated. Such preprocessors may have a maximum , e.g. 100 characters, for the length of any one macro definition. ANSI C, by con­trast, does not impose any length restrictions on macro definitions. Thus long macro definitions may cause a problem when porting to non-ANSI compil­ers.

Variable                                  Meaning

pp_length                          Set to the length (in characters) of the body of a macro definition, after redundant whitespace has been elim­inated.

 

 

4.2.12  Relative Pathnames for Header File Inclusion

An ambiguity arises when an #include directive within a header file contains a relative pathname. Here is a Unix example:

 

#include   <../../foobar.h>

Is the file foobar.h to be found in a location that is relative to the source file, or relative to the header file that contains the above #include directive ? Older K&R compilers always look in a location that is relative to the source file. The ANSI standard does not specify where to look. Many modern compilers look in a location that is relative to the header file that contains the #include directive. (CodeCheck looks in both places.) This ambiguity means that C and C++ programmers who use relative pathnames may encounter difficulties when changing to a different compiler, or when porting to a different system.

Recommendation:  Avoid relative pathnames in header #include directives.

CodeCheck pro­vides a predefined variable for detecting relative pathnames:

Variable                                  Meaning

pp_relative                      Set to 1 when an #include directive within a header file specifies a relative pathname.

 


4.3    Portability in Declarations

 

4.3.1  Array, Structure, and Union Initializers

Some new compilers permit initializers for automatic (i.e. not static and not external) ar­rays, structures, and unions. What they are actually doing is gen­erat­ing exe­cutable code that assigns initial values to these objects. Note that this is somewhat dif­ferent from true initial­ization, in which the initial values are part of the im­age of the program. In any event, since not all com­pilers permit this usage, it is not portable. 

Another difficulty is caused by union initializers in general: many compil­ers do not allow any sort of union initialization.

Recommendation: For the sake of portability, do not try to give initializers to automatic arrays, struc­tures, or unions. It is better either (a) to make such objects that need initializ­ers static, or (b) to use ex­plicit initialization. CodeCheck pro­vides two prede­fined variables for detect­ing initialization problems:

Variable                                  Meaning

dcl_auto_init                 Set to 1 if an initializer for an auto­matic ar­ray, struct, or union is found.

dcl_union_init              Set to 1 when a union initializer is found.

 

 

4.3.2  Enumerated Variables

Traditional K&R compilers do not support enumerated variables, but most modern H&S compilers do. Unfortunately, pre-ANSI C compilers are not fully consistent in their implementations. Because of this there is a mild portability problem with enu­merated vari­ables.

A greater difficulty is presented by compilers that allow computed values for enum­er­ated constants: these are seldom portable.

Recommendation: The advantages of enumerated variables for enhancing pro­gram clar­ity greatly outweigh the portability problems that they may cause. Use them, but don’t use computed explicit values for the enumeration con­stants.

 

1    if ( dcl_init_arith && (dcl_base == ENUM_TYPE) )

2       warn( 99, "Non portable computed initializer." );

Variable                                  Meaning

dcl_init_arith              Set to 1 when a computed initializer is found, or when a computed explicit value for an enumerated con­stant is found.

dcl_base                             Set to an integer which identifies the base type of the cur­rent declarator. The base types are defined as manifest con­stants in the header file check.cch. The manifest constant for an enu­m­er­ated type is ENUM_TYPE.

 

 

4.3.3  Bitfields

The incautious use of bitfields can lead to many portability prob­lems. First, an obvious problem: compilers differ with respect to the maximum size of the bit­fields that they allow. Thus a bitfield of 20 bits will compile without com­plaint on a 32-bit computer, but will cause a syntax error on a 16-bit com­puter.

Another portability problem can be caused by bitfields that are not ex­plic­itly de­clared unsigned. If such a bitfield is compiled by a compiler that con­siders bitfields to be unsigned, if the bitfield is initialized with a negative value, e.g. –1 (a natural way to fill all the bits), and if the value of this field is then as­signed to a signed int or long, then the result will be a small positive integer. However, a compiler that con­siders bitfields to be signed will obtain a nega­tive result in the same assignment. Harbison & Steele (HS88:106) recom­mend that no type other than unsigned ever be used for a bitfield. They point out that a bitfield of type int can be compiled in three different ways: signed, un­signed, and pseudo-un­signed (see Glossary for definition). The signedness of a bitfield for any given C compiler is usually (but not necessar­ily!) the same as the signedness of a char.

Two additional minor portability problems: first, ANSI C compilers now al­low bitfields to be members of unions, but most pre-ANSI compilers do not allow this use of bitfields. Second, some compilers do not allow anonymous (unnamed) bitfields.

CodeCheck provides four variables for detecting these problems.

Variable                                  Meaning

dcl_bitfield                   Set to 1 if a bitfield is found.

dcl_bitfield_anon       Set to 1 if an unnamed bitfield is found.

dcl_bitfield_arith     Set to 1 if a bitfield width requires calculation.

dcl_bitfield_size       Set to number of bits in a bitfield.

dcl_union_bits              Set to 1 if a bitfield is declared as a member of a union.

 

 

4.3.4  Empty Declarations

Empty declarations are allowed in K&R C, but most are not permitted in ANSI C. The exception is an empty declaration in which the type specifier is a forward reference to a tag that will be needed later. A CodeCheck rule can be constructed to detect non-ANSI empty declarations, as follows:

 

1    if ( dcl_empty )

2       {

3       if ( (dcl_base != ENUM_TYPE)

4                      && (dcl_base != UNION_TYPE)

5                      && (dcl_base != STRUCT_TYPE) )

6                      && (dcl_base != CLASS_TYPE) )

7          warn( 9999, "Empty declaration." );

8       }

The CodeCheck variables used in this rule are:

Variable                                  Meaning

dcl_empty                          Set to 1 if an empty declarator is found.

dcl_base                             Set to an integer which identifies the base type of the cur­rent declarator. The base types are defined as mani­fest con­stants in the CodeCheck header file check.cch.

 

 

4.3.5  Microsoft “Based” Pointers

Microsoft C version 6.0 introduced into C a new and completely non-portable pointer type: the “based” pointer. To implement this new concept, four new non-portable keywords (_segment, _segname, _based, _self) and one new op­erator (:>) were introduced. In its default mode of operation CodeCheck will recognize these constructs, and will check for improper grammatical usage just as it does for the standard keywords and operators. To disable these and all other extended keywords and operators, specify -K1 (for ANSI C) or -K0 (for strict K&R C) on the command line.

 

 

4.3.6  Pascal Functions

Many implementations of C allow the programmer to call and write func­tions that use the conventions for passing arguments and returning values. Un­fortu­nately, there is little agreement on how the pascal keyword should be used. Here are three representative cases:    

• Microsoft C:        pascal is a type modifier, similar to cdecl.  Microsoft C:pascal;

• MPW C:              pascal is a type specifier, similar to float.  MPW C:pascal;

• Think C:              pascal is a storage class specifier, similar to static. Think C:pascal;

As a result of these differences, it is difficult to use the pascal keyword in a fully portable way. One way to avoid the worst kind of trouble with this and similar key­words is never to declare more than one function per declarator list. Additionally, the CodeCheck function keyword() can be used to detect occur­rences of this troublesome keyword.

 

1    if ( dcl_function && (dcl_count > 1) )

2       warn( 98, "Use only one declarator here." );

3

4    if ( keyword("pascal") )

5       warn( 99, "This keyword is not fully portable." );

 

Variable/Function                 Meaning

dcl_count                          Set to the index of the current declarator within a comma-delimited declarator list. The first declara­tor has index 1, the second 2, etc, until a semicolon is found that marks the end of the list.

dcl_function                   Set to 1 if this is a function declaration.

dcl_level()                      See Section 3.2 of the Reference Manual for details.

keyword()                          Returns the integer value 1 whenever a keyword is found that matches the argument string.

 

 

4.3.7  Variable Numbers of Arguments

The ability to call functions with variable numbers and types of arguments is a powerful feature that differentiates C from most other high-level com­puter languages. Unfortunately, it also presents some portability problems. Some com­pilers, e.g. Microsoft C, allow users to indicate that a function takes a variable number of arguments by terminating the parameter list in the function declarator with a comma:

 

int main( argc, argv, )

This usage is not portable to ANSI standard C, nor to many other non-ANSI compilers. ANSI conforming compilers need to see an ellipsis (three dots …) fol­low the last comma, while non-ANSI compilers generally consider the last comma to be a syntax error.

By the same token, the ANSI ellipsis notation for functions with variable num­bers of arguments is equally non-portable to pre-ANSI compilers. Code­Check provides three variables for handling these problems:

Variable                                  Meaning

dcl_3dots                          Set to 1 whenever an ellipsis (...) is found.

dcl_need_3dots              Set to 1 when an ellipsis (...) is needed in a function pa­rameter list, but is not found.

dcl_oldstyle                   Set to 1 if an old-style (i.e. not prototyped) function is declared.

 

 

4.3.8  Type Definitions

Some C compilers, e.g. MPW C, do not permit typedef names to be de­clared more than once within a module. Many compilers do tolerate this practice, how­ever, so this becomes a portability problem for those program­mers who acquire the dubious habit of placing identical type definitions within many header files.

Recommendation: Take care to define each type in exactly one header file, and use conditional compilation switches to guarantee that the contents of each header is compiled only once per module.

Variable                                  Meaning

dcl_typedef_dup            Set to 1 whenever a duplicate type definition is found.

 

 

4.3.9  Function typedef names

Many C compilers will accept typedef names for function types, and some will accept such names in function definitions. For example:

1    typedef void weird( int );

2    weird myfunc( int x )

3    {

4       /*  What is the return type???  */

5    }

This usage is not permitted in ANSI C, because of the ambiguity involved in the return type of the function — is it a function returning void, or a func­tion re­turn­ing a function returning void? Compilers that permit this usage make the former assumption, because the latter is impossible in C.

Recommendation: Do not typedef function types, even if your compiler al­lows it. Here is a CodeCheck rule that will detect such type definitions:

1    if ( dcl_typedef )

2       if ( dcl_function )

3          warn( 99, “Do not typedef function types.” );

 

Variable                                  Meaning

dcl_function                   Set to 1 if this is a function declaration.

dcl_typedef                      Set to 1 if a typedef name has been declared.

 

 

4.3.10  Uninitialized static float or double variables

The C language has always required that static variables without explicit ini­tializers be initialized by the compiler to zero. However, a portability issue arises with static float, double, and long double variables: will the compiler use a zero bit-pattern for the initializer, or the floating-point representation of zero? The two are not necessarily the same. ANSI-conforming compilers are required to use the floating-point representation of zero, but many pre-ANSI compilers use the zero bit-pattern.

Recommendation: Always explicitly initialize static float, double, and long double variables. Here is a CodeCheck rule that will detect this problem:

1    if ( dcl_static )

2       if ( (dcl_simple || dcl_function )

3             && (! dcl_initializer)

4             && (dcl_base >= FLOAT_TYPE)

5             && (dcl_base <= LONG_DOUBLE_TYPE) )

6          warn( 99, "Need explicit initializer here." );

 

Variable                                  Meaning

dcl_base                             Set to an integer which identifies the base type of the current declarator. The base types are defined as mani­fest con­stants in the CodeCheck stan­dard header file check.cch.

dcl_function                   Set to 1 if this is a function declaration.

dcl_simple                        Set to 1 when a simple variable (i.e. neither pointer, array, reference, nor function) is declared.

dcl_initializer            Set to 1 when an initializer is found.

dcl_static                        Set to 1 when a non-local static identifier has been declared.


4.4    Portability at the Expression Level

 

4.4.1  Non-ANSI expressions

At least one popular compiler, Gnu C, allows parenthesized compound state­ments within expressions — an extension to C that is most certainly not part of the ANSI standard. For example, consider this macro defined in the Gnu header file ctype.h:

 

#define tolower(c)  ( { int _c=(c); isupper(_c) ? _tolower(_c) : _c; } )

The purpose of this odd construction is to guarantee that the argument of the macro will be referenced only once. While this is a laudable goal, there is also the temptation for programmers to use this extension in their own code. This will create an immediate portability problem. CodeCheck has a trigger that will de­tect this (and other) non-ANSI expressions:

Variable                                  Meaning

exp_not_ansi                   Set to 1 if a non-ANSI C expression is found. This vari­able does not trigger on C++ expressions that con­form to the ANSI base document for C++.

4.4.1  Word Length

C program­mers should never rely on data size or type casts to truncate ex­pres­sions to a specific number of bits. As Plum (TP84:20-21) and others have pointed out, there are C programming environments with chars that are 8, 9, 10, and 16 bits wide, shorts that are 16, 18, 20, 32, and 36 bits wide, and longs that are 32, 36, 40, and 64 bits wide! length:char; length:short; length:long; length:word;

A CodeCheck predefined variable can be used to detect some (but not all) vio­la­tions of this guideline:

Variable                                                                                                                                                      Meaning

exp_cast_narrow;                                                                                                                                                      Set to 1 if a narrowing cast is found.

An assignment or cast is “narrowing” whenever it causes a truncation. For widest possi­ble applicability in the definition of narrowing, CodeCheck as­sumes that the length order­ing of the integer types is:

                                                                                                                             long > int > short > char

Thus CodeCheck will flag as narrowing any cast or assignment that causes a wider length type to be converted to a smaller length type. Here is an example of a narrow­ing cast (the programmer’s intent was, apparently, to add only the low-order bits of z to y):

 

1                                                                           long                                                                           x, y, z;

2

3                                                                           x = y + (long)(short)z

And here is an example of a narrowing assignment (really just an implicit nar­row­ing cast):

 

1                                                                           long                                                                           x, y;

2

3                                                                           x = (short)y.

 

 

4.4.2  Byte Order

Which byte is the first byte in an integer? Computer architectures differ on this point, which results in some very subtle and hard-to-detect portability prob­lems (TP84:22-24). It is therefore very bad style to do things that are byte-order depen­dent, such as casting a “pointer-to-long” to a “pointer-to-char” (which character is being pointed to will vary depending on the archi­tecture of the com­pu­ter). CodeCheck cannot detect all byte-order depen­dencies, but it can detect the most common forms. The rel­evant guidelines are:

a.                                                                                                                                                      Be very careful with casting any pointer (other than a pointer-to-void) to a pointer-to-char.

b.                                                                                                                                                      Do not use character constants that are more than one character wide.

The CodeCheck variables which can be used to detect most violations of these guide­lines are as follows:

Variable                                                                                                                                                      Meaning

exp_cast_byte;                                                                                                                                                      Set to 1 whenever a cast to a byte-length object is found.

lex_char_long;                                                                                                                                                      Set to 1 if a character constant is longer than one char­ac­ter.

 

 

4.4.3  Right Shifted Signed Integers

The right shift operator (>>) should not be used on signed integers, be­cause com­puter architectures differ in their treatment of the sign bit while performing this opera­tion (TP84:38). Some will extend the sign bit, thus pre­serving the sign of the operand, and others will not.

Variable                                                                                                                                                      Meaning

exp_shift_right;                                                                                                                                                      Set to 1 if the right shift operator is used on a signed operand.

 

 

4.4.4  Increment and Decrement Operators

The increment and decrement operators (++ and --) can cause portability prob­lems when the variable to which they are applied appears more than once within an expression. Consider this example:

                                                                           d2 = x[--k] - 2.0*x[++k] + x[++k];

The portability problem occurs because the C language definition does not spec­ify which term on the right is to be evaluated first. The programmer in­tended ex­ecution to proceed from left to right, but some compilers may pro­ceed from right to left — a pre­scription for chaos.

The problem even extends across assignment operators. The following code frag­ment from TP84:40 looks innocuous enough, but it fails for the same reason:

                                                                           a[i] = i++;

Indeed, the ordering of operations is unspecified by C between so-called se­quence points. Unfortunately, pre-ANSI compilers implemented sequence points in vari­ous ways. The safe approach is this: when a variable is to be in­cremented or decremented with ++ or --, make sure that it is not referred to again within the same simple statement. increment; decrement; sequence point;

The same principle holds for a function call that may change the value of a variable: do not refer again to that variable within the same statement. This is called the “side effect” problem, because the changes to the variables in question are side effects; of the actual operation (variable reference or function call).

Variable                                                                                                                                                      Meaning

exp_side_effect;                                                                                                                                                      Set to 1 if an operator with a side-ef­fect is applied to a variable that ap­pears more than once between ANSI-de­fined sequence points.

 


4.5    Portability of Functions

 

4.5.1  Labels without Statements

Although no published grammar for C permits labels to exist without a state­ment that is to be labeled, many C compilers do permit this. For example, the la­bel “end” in the following code fragment does not label any statement, even though it is clear what the author intended. This use of labels is forbid­den in ANSI C, and should generate a syntax error when compiled by any strictly con­forming compiler.

 

1       while ( x == 0 )

2          {

3          /* ... */

4    end:

5          }

Recommendation:  For maximum portability, do not use labels without state­ments. The statement should be empty, i.e. a lone semicolon. CodeCheck has a predefined variable that will detect labels without statements:

Variable                                  Meaning

stm_bad_label                 Set to 1 whenever a label or list of labels is found that is not attached to any statement.

 

4.5.2  Hidden Parameters

Traditional K&R compilers allow an identifier declared in the compound statement of a func­tion (i.e. just after the first open brace) to have the same name as one of the function’s formal para­meters, thus effectively hiding the parameter from the function. This practice is disallowed in the ANSI stan­dard, and in the better H&S compilers. Here is an example of the problem:

1    void bomber( xptr )

2       char  *xptr;      /*  The formal parameter is     */

3       {                 /*  hidden by the declared      */

4       long  *xptr;      /*  automatic variable.         */

5

6       *xptr = 40000; /*  This can crash the system,  */

7       }                 /*  but the function compiles.  */

Recommendation:  Since the only plausible circumstance in which this might occur is as a result of programmer error, there is little to say other than “don’t do it”. It may be an interesting test of your com­piler to see if it can catch this bug. CodeCheck has a predefined variable that will find it:

Variable                                  Meaning

dcl_parm_hidden            Set to 1 if a function parameter has the same name as an identifier de­clared within the function’s com­pound state­ment.


4.6    C Compiler Limits

 

 

4.6.1  Identifier, String, and Line Lengths

ANSI conforming C compilers allow internal identifiers to have a signifi­cance of at least 31 characters, but strict K&R compilers use a significance of only 8 charac­ters. To further confuse the issue, the ANSI standard guarantees only 6 significant characters with no case sensitivity for external identifiers, due to the limitations of certain linkers. The variable dcl_extern_ambig should be used to detect portability problems caused by ambiguities in exter­nal identifiers, and the fol­lowing rule may be used to enforce length limita­tions on internal identi­fiers:

 

1    if ( dcl_ident_length > 31 )

2       warn( 1001, "Identifier length exceeds 31." );

Recommendation: Use long names for both external and internal identifiers. If your linker demands short ex­ter­nal names, then use the preprocessor to de­fine your long names as short 6-character en­coded names (as suggested in HS84:15). This will allow you to refer to variables by a long descriptive name, and still have short names for the linker. For example:

#define ptrRecentEventRecord   p32RER

 

/* ... intervening code ...  */

 

extern EventRecord   *ptrRecentEventRecord;

ANSI conforming compilers allow string literals up to at least 509 charac­ters in length, but the actual limits may vary widely. There is no guaranteed mini­mum among non-ANSI compilers. Recommendation: restricting string literals to 255 characters may be a safe, conservative policy.

 

1    if ( lex_str_length > 255 )

2       warn( 1002, "String too long for portability." );

 

C imposes no limit on line-length, but most compilers do have a limit, usu­ally in the 100-500 char­acter range. Lines longer than 80 characters may not be port­able. The POSIX.2 standard specifies a maximum line length of 2048 characters.

CodeCheck provides several predefined variables for de­tecting length viola­tions:

Variable                                  Meaning

dcl_extern                        Set to 1 if the extern storage class is ex­plicitly spec­i­fied in a declaration.

dcl_extern_ambig          If two external identifiers have names that agree on the first 6 or more charac­ters, regardless of case, then this variable is set to the number of consecutive char­ac­ters on which they agree.

dcl_global                        Set to 1 if an identifier with external linkage has been declared. This includes variable, func­tion, and type­def names.

dcl_ident_length          Set to the number of characters in the declared iden­ti­fier.

dcl_local                          Set to 1 if a local identifier  has been declared.

dcl_static                        Set to 1 when a non-local static identifier has been declared.

lex_str_length              Set to the length of a string literal (the terminating zero is not counted).

lin_length                        Set to the number of characters in the line (ex­clud­ing the newline character at the end of the line).

 

 

4.6.2  Preprocessor Limits

C compilers vary greatly with respect to limitations on the nesting of #if con­ditionals and #include directives. For example, ANSI conforming com­pilers must allow the nesting of #if conditionals to a depth of 8, but it is un­safe to as­sume that more than two or three will be portable (RJ89:195).

It is prudent to limit the depth of header file inclusion, because some pre­pro­cessors allow a nesting depth of only three or four.

The ANSI standard specifies that compilers must allow up to 31 formal para­meters in macros, but pre-ANSI compilers may allow only four or five (RJ88:182).

CodeCheck pro­vides three predefined variables for enforcing these pre­pro­cessor limits:

Variable                                  Meaning

pp_if_depth                      Set to the new depth of conditional compilation when­ever an #if, #ifdef, #ifndef, #else, #elif, or #endif directive is activated.

pp_include_depth          Set to the new depth of file inclusion whenever an #include directive is executed, or an end-of-file in a header file is encoun­tered. See also lin_source.

pp_arg_count                   Set to the number of formal parameters found in a macro defi­nition.


Chapter 5: Maintainable Style

 

Good C programming style is much more than simply indenting source code accord­ing to a specific formal system. Indeed, the indentation problem has be­come trivial as a result of the commercial publication of code “beauti­fiers”, which automatically adjust the indentation of a program to the specifi­cations of the programmer. As many authors have pointed out, writing C with a good style re­quires adhering to a well thought-out list of rules for such things as choosing variable names, declaring vari­ables, using pre­processor di­rectives, etc. In many cases these rules can be expressed as CodeCheck rules.

This section describes how to use CodeCheck to monitor style for main­tain­ability. The Code­Check rules given here are based on various corporate stan­dards for C coding that have been made available to Abraxas Software.

The C language is so large, its notation so compact, and its restrictions so few that stylistic con­ventions are absolutely mandatory for all C program­mers. The only ques­tion is: how tough should they be? Most successful com­panies that em­ploy teams of programmers use conventions and stan­dards that are very tough indeed — and this toughness almost certainly contributes materially to their suc­cess. Discipline in itself does not inhibit creativ­ity: this is just as true in the art of C programming as it is in any other art.

Because CodeCheck rules are written in a simplified form of C, the style rules can be edited and customized to accommodate almost any taste. A great number of CodeCheck variables have been predefined for an enormous range of stylistic issues, not just the ones identified by Thomas Plum. This means that project managers can design stylistic rule sets that correspond precisely to the issues that are of concern to them, and need not feel constrained to the style propounded by Plum.


5.1    Lexical Issues in Program Maintenance

 

5.1.1  Choosing Identifier Names

Programmers use a great variety of guide­lines for choosing names for vari­ables. One popular set of guidelines is para­­phrased here.

1.   Names should never be redefined in inner blocks. This practice con­fuses more than it helps, and should be avoided.

2.   Function and typedef names must begin with a capital letter, variable names must begin with a lowercase letter.

3.   Macro names must always be all uppercase, non-macro names must never be all uppercase.

  1. Variable, function, and typedef names should never conflict with la­bels.

5.   Class names must begin with a capital letter and has a corporate prefix, like Microsoft MFC class CEdit.

6.   Data members of a class must have a prefix.

 

Some of the CodeCheck predefined variables and functions which can be used to detect vio­la­tions of guide­lines similar to the preceding are:

Variable                                  Meaning

dcl_all_upper                 Set to 1 if only uppercase letters are found in an iden­tifier name when it is declared.

dcl_enum_hidden            Set to 1 when a declarator name hides an enumer­ated constant.

dcl_first_upper            Set to the number of initial uppercase letters in an identifier when it is declared.

dcl_hidden                        Set to 1 if an inner-block declaration hides an outer.

dcl_label_overload     Set to 1 if an inner-block declarator name matches a label within the same function.

dcl_typedef                      Set to 1 if a typedef name has been declared.

 

 

5.1.2  Hungarian Notation

Many companies have found that program maintenance can be sim­pli­fied by adopting a spelling convention for identifiers which encodes the type of the vari­able in lowercase prefixes. The actual name of the identifier is distin­guished from its prefixes simply by beginning the name with a capital letter. This is the so-called Hungarian notation.

Many variations on the Hungarian theme exist. One simple set of Hungar­ian prefixes is described below, together with a set of CodeCheck rules that test for compliance with these guidelines. This is an illustrative example only, and is not intended to be a complete specification for a Hungarian prefix scheme.

Variable names must begin with one or more lowercase prefixes that describe the type of the variable. These prefixes are, in the order in which they must be used:

1.      "p" if this is a pointer, "np" if  near pointer, "fp" if far pointer;

2.      "a" if this is an array,

3.      "c" if the base type is char, "s" if short, "i" if int, and "l" if long;

4.      "g" if the base type is float, "d" if double;

5.      "x" if the base type is struct, "w" if union, "en" if enum;

6.      "b" if the base is the typedef name "Boolean".

For example, suppose that we want to declare that Foo is an array of far pointers to Booleans. The declared name of the identi­fier should be afpbFoo and the dec­la­ration should read:

 

     Boolean  far * afpbFoo[] = {  /* initializers */  };

This spelling scheme can be enforced with one single CodeCheck rule:

 

 

 1   #include <check.cch>

 2  

 3   int   k;

 4  

 5   if ( dcl_global || dcl_static || dcl_local )

 6      {

 7      k = 0;

 8      while ( k < dcl_levels )

 9         {

10         switch ( dcl_level(k) )

11            {

12         case ARRAY:

13            if ( ! prefix("a") )

14               warn( 1001, "a prefix missing on array." );

15            break;

16         case POINTER:

17            switch ( dcl_level_flags(k) )

18               {

19            case NEAR_FLAG:

20               if ( ! prefix("n") )

21                  warn( 1002, "n prefix missing on near pointer." );

22               break;

23            case FAR_FLAG:

24               if ( ! prefix("f") )

25                  warn( 1001, "f prefix missing on far pointer." );

26               break;

27               }

28            if ( ! prefix("p") )

29               warn( 1001, "p prefix missing on pointer." );

30            break;

31            }

32         k++;

33         }

34      switch ( dcl_base )

35         {

36      case VOID_TYPE:

37         if ( ! prefix("v") )

38            warn( 1001, "v prefix missing on void." );

39         break;

40      case ENUM_TYPE:

41         if ( ! prefix("n") )

42            warn( 1001, "v prefix missing on enum." );

43         break;

44      case CHAR_TYPE:

45         if ( ! prefix("c") )

46            warn( 1001, "c prefix missing on char." );

47         break;

48      case SHORT_TYPE:

49         if ( ! prefix("s") )

50            warn( 1001, "s prefix missing on short." );

51         break;

52      case INT_TYPE:

53         if ( ! prefix("i") )

54            warn( 1001, "i prefix missing on int." );

55         break;

56      case LONG_TYPE:

57         if ( ! prefix("l") )

58            warn( 1001, "l prefix missing on long." );

59         break;

60      case FLOAT_TYPE:

61         if ( ! prefix("g") )

62            warn( 1001, "g prefix missing on float." );

63         break;

64      case DOUBLE_TYPE:

65         if ( ! prefix("d") )

66            warn( 1001, "d prefix missing on double." );

67         break;

68      case UNION_TYPE:

69         if ( ! prefix("w") )

70            warn( 1001, "w prefix missing on union." );

71         break;

72      case STRUCT_TYPE:

73         if ( ! prefix("x") )

74            warn( 1001, "x prefix missing on struct." );

75         break;

76      case DEFINED_TYPE:

77         if ( strcmp(dcl_base_name(),"Boolean") == 0 )

78            if ( ! prefix("b") )

79               warn( 1016, "b prefix needed on %s.", dcl_name() );

80         break;

81         }

82      }

 

Some of the CodeCheck predefined variables and functions which can be used to detect vio­la­tions of guide­lines similar to the preceding are listed be­low. The full list of declarator variables is given in the CodeCheck Reference Manual.

Variable                                  Meaning

dcl_base                             Set to a constant that identifies the base type of the de­clarator (see Reference Manual section 3.2 for details).

dcl_base_root       Type from which the type of dcl_base is derived. If the type of dcl_base is not a user-defined type,                    dcl_base_root has same value as dcl_base.

dcl_base_name()            Returns the name of the base type of the declarator.

dcl_base_name_root()The name of type from which type of dcl_base_name is derived. If the type of dcl_base_name is not a user-defined type, dcl_base_name_root() returns the same value as dcl_base_name().

dcl_explicit        Set to 1 when a declarator has specifier "explicit".

dcl_extern                        Set to 1 if the extern storage class is ex­plicitly used in a declaration.

dcl_from_macro              Set to 1 when declarator name is derived from a macro expansion.

dcl_global                        Set to 1 if an identifier with external linkage has been declared. This includes variable, func­tion, and type­def names.

dcl_hidden                        Set to 1 if an inner-block declaration hides an outer.

dcl_Hungarian                 Set to 1 if the Hungarian style is detected (a capital let­ter is immediately preceded by a lowercase letter).

dcl_levels                        Number of levels in the type of this declarator.

dcl_local                          Set to 1 if a local identifier has been declared.

dcl_member          1 when a union member identifier is declared,

2 when a struct member identifier is declared,

3 when a class member identifier is declared;

( C++ members may be: variables, functions, or typedef names ).

dcl_mutable         1 when an indentifier is declared 'mutable'.

dcl_scope_name()    The scope name of current declarator.

dcl_signed                        Set to 1 if the signed type specifier is ex­plicitly used in a declaration.

dcl_static                        Set to 1 when a non-local static identifier has been declared.

dcl_typedef                      Set to 1 if a typedef name has been declared.

dcl_unsigned                   Set to 1 when the type specifier unsigned is used in a declaration.

op_declarator       Any operator found within a declaration.

prefix()                             Set to 1 if the next prefix in the declarator name matches the argument string.

suffix()                             Set to 1 if the next suffix in the declarator name matches the argument string.

 

 

5.1.3  Manifest Constants

Plum recommends several guide­lines for de­fining manifest constants (TP84:18-19). A constant is “manifest” if its meaning is clearly apparent to the main­tenance pro­grammer. Manifest constants are used to give meaning to num­bers which other­wise would seem wholly capri­cious, as may be seen in these two contrasting exam­ples:  

Example 1 —  bad, 32 has no manifest meaning:

1    if ( index < 32 )

2       x[index++] = 0;

 

Example 2 —  good, 32 now has a clear meaning:

 

1    #define  TABLSIZE  32

2      •••

3      •••

4    if ( index < TABLSIZE )

5       x[index++] = 0;

 As Plum points out, manifest constants are neither manifest nor constant if their value is changed midway through a program (by means of an #undef and/or another #define that rede­fines the constant). To change a manifest con­stant in this manner is to create an immediate maintenance prob­lem. In­deed, there only three valid purposes for an #undef:

1.   to override a macro name defined in a standard header,

2.   to limit the lexical scope of a macro,

3.   to free up space in the com­piler’s macro table, if it threatens to over­flow.

The Plum guidelines for manifest constants that can be checked by Code­Check are para­phrased as follows:

a.   The names of manifest constants should be spelled in cap­ital letters.

b.   The value of manifest constants must never change.

c.   If a manifest constant is to be used in more than one file, then it must be de­fined in a single header file.

The predefined CodeCheck variables which are used to detect violations of these guide­lines are:

Variable                                  Meaning

lex_not_manifest          Set to 1 if a numeric constant other than 0 or 1 is used in any context other than a macro definition or a com­ment.

lex_initializer            Set to the following when an initializer is found:

                                                1 if the initializer is the integer zero,

                                                2 if the initializer is a non zero integer,

                                                3 if initializer is a boolean literal, true or false(C++ only).

                                                4 if the initializer is a float or double constant,

                                                5 if the initializer is a string literal, and

                                                6 if the initializer is anything else.

pp_lowercase                   Set to 1 if the macro name in a macro definition is de­fined with any let­ters that are lowercase.

pp_stack                             Set to 1 if a macro is multiply de­fined.

pp_undef                             Set to 1 whenever #undef is used.  

pp_macro_dup                   Set to 1 if a macro is defined in more than one file.

pp_macro_conflict       Set to 1 if a macro is defined differently in more than one file.

 

 

5.1.4  Lexical Rules for Spacing Operators

C code readability is greatly enhanced by rigorous adherence to a uniform system for placing spaces around C operators and punctuation marks. Elabo­rat­ing somewhat upon the spacing scheme recommended by Plum (TP84:30-32), CodeCheck recognizes six categories of tokens that need spacing:

1.   High precedence operators:  all unary operators, and the four mem­ber se­lec­tion operators ( .  –>  .*  –>* ). These to­kens should never have space separating them from their operands.

2.   Low precedence operators:  all assignment operators and the con­di­tion­al operator pair ( ? : ). These tokens should always have space separat­ing them from their operands.

3.   Commas, semicolons, and colons:  these tokens should not be pre­ceded by a space, and should be followed by a space or new­line.

4.   Function argument parentheses: the open parenthesis in a func­tion call should not be separated by whitespace from the func­tion iden­tifier. There should be consistency within a project as to whether or not the contents of the paren­thetical ex­pres­sion are separated by spaces from the parentheses.

5.   Subscript selection brackets:  the open bracket should not be sep­a­rated by whitespace from the array to which it refers. There should be consis­tency within a project as to whether or not the contents of the bracket ex­pres­sion are separated by spaces from the brackets.

6.   Other:  This category encompasses all other operators not listed above, including arithmetic, bit­wise, relational, and logical oper­a­tors. These to­kens are usu­ally but not necessarily sur­rounded by spaces.

The predefined CodeCheck variables with which these stylistic guidelines may be checked are given below.  In designing a set of lexical rules relating to spacing, the goal should be to enforce consistency to a simple and clearly-stated spacing scheme.

Variable                                  Meaning

lex_punct_after            Set to 1 if a comma or semicolon is not followed by a whitespace character, a comma, or a semicolon.

lex_punct_before          Set to 1 if a comma or semicolon is pre­ceded by a space.

lin_continues                 Set to 1 if a line ends before the end of the current ex­pression.

op_executable                 Set to 1 if the operator is within executable code, including initializers.

op_declarator                 Set to 1 if the operator is within a declaration, not including initializers.

op_low                                  Set to 1 for operators with low precedence.

op_medium                          Set to 1 for operators with medium precedence.

op_high                               Set to 1 for operators with high precedence.

op_prefix                          Set to 1 for unary prefix operators.

op_infix                             Set to 1 for binary infix operators.

op_postfix                        Set to 1 for unary postfix operators.

op_space_before            Set to 1 if an operator or punctuation mark is pre­ceded by a ­s­pace character.

op_space_after              Set to 1 if an operator or punctuation mark is fol­lowed by a ­s­pace character.

op_white_before            Set to 1 if an operator or punctuation mark is pre­ceded by white­s­pace.

op_white_after              Set to 1 if an operator or punctuation mark is fol­lowed by white­s­pace.

 

 

5.1.5  Indentation and the Placement of Braces

The purpose of indentation is to reveal the subordinate nature of blocks of code, thus greatly enhancing program readability. Plum has summarized the fundamental rule for indentation which underlies almost all popular formats (TP84:42):  

Each line which is part of the body of a C control structure (if, while, do-while, for, switch) is indented one tab stop from the margin of its control­ling line. The same rule applies to function, struct, or union defi­ni­tions, and aggre­gate initial­izers.

CodeCheck provides several predefined variables that allow indentation to be checked, using Plum’s general guideline as a basis for individualized schemes.

The fundamental rule stated above leaves unspecified how braces are to be treated. Oddly enough, the opinions of C programmers on this point seem to be held with a fervor approaching religious fanaticism. CodeCheck resolves the problem by providing a command-line option (–B) which, if set, informs Code­Check that braces are to be considered part of the body of the control struc­ture (this corresponds to Plum’s own preference, and to the convention used in Pas­cal). If –B is not set, then braces are not so considered (the Kerni­ghan & Ritchie style). The –B option only affects the way in which the pre­de­fined Code­Check variable lin_nest_level is incremented and decremented. Two ex­am­ples may serve to illustrate this difference:

Option -B not set (default)              lin_nest_level

1    if ( test > 0 )                 /*    1                 */

2    {                               /*    1: no indent      */

3       k += delta;                  /*    2                 */

4       printf( "%d\n", k );         /*    2                 */

5    }                               /*    1: no indent      */

6    printf( "Done" );               /*    1                 */

Option -B set                       lin_nest_level

1    if ( test > 0 )                 /*  1                   */

2       {                            /*  2: indented brace   */

3       k += delta;                  /*  2                   */

4       printf( "%d\n", k );         /*  2                   */

5       }                            /*  2: indented brace   */

6    printf( "Done" );               /*  1                   */

       

The CodeCheck predefined variable lin_nest_level not only counts the depth of control struc­tures, it is also incremented and decremented appropri­ately within function definitions, structure and union definitions, and aggre­gate ini­tializers.

Variable                                  Meaning

lin_nest_level              Set to the nesting level when the first non­white char­acter of a line is found. How braces are counted in the nesting level depends on the com­mand line op­tion –B.

lin_indent_space          Set to the number of leading space charac­ters found be­fore the first non-white non-comment char­ac­ter of a line.

lin_indent_tab              Set to the number of leading tab char­acters found be­fore the first non-white non-com­ment char­ac­ter of a line.

lin_continuation          Set to 1 if a line continues an expression or declara­tion list from the previous line.

Some standards ask programmers to use only space characters to indent, while others specify only tab characters. In these cases the CodeCheck vari­ables lin_indent_space and lin_indent_tab can be used to deter­mine the amount of indentation and to detect the use of the wrong character to create in­dentation.

Determining the actual width of the indentation used on any given line of code is problematic, unless the characteristics of the editor used to display the code are known. Some editors (e.g. Macintosh and Windows editors) use pro­por­tional-width characters, while older editors use single-width characters. Worse, editors vary greatly in how they treat tab characters: primitive editors simply generate 8 spaces when a tab is entered, better editors move forward to a fixed position on the page, while yet others behave in a context-dependent way. In the face of this chaos, it is simply not possible to find a general form­ula that calcu­lates the actual displayed indentation given a sequence of mixed spaces and tabs.

Here is an example CodeCheck rule that checks indentation within func­tion definitions, based on the number of leading tab characters in each line. This rule ignores the indentation of comments, preprocessor directives, and declarations.

 

 1   int  difference;

 2

 3   if ( lin_within_function )

 4      if (! (lin_is_comment || lin_preprocessor || lin_dcl_count) )

 5         {

 6         difference = lin_nest_level - lin_indent_tab;

 7         switch ( difference )

 8            {

 9         case -1:

10            warn( 1003, "Indent 1 fewer tab." );

11            break;

12         case 0:     // Indentation is correct.

13            break;

14         case 1:

15            warn( 1003, "Indent 1 more tab." );

16            break;

17         default:

18            warn( 1004, "Indent %d tabs.", difference );

19            break;

20         }

Most standards for the indentation of switch statements call for an appear­ance like this:

 

1       switch ( ijk )

2       {

3       case 1:

4          break;

5       default:

6          break;

7       }

However, some standards call for more indentation of the contents of the switch, as in this example:

 

1       switch ( ijk )

2       {

3          case 1:

4             break;

5          default:

6             break;

7       }

CodeCheck assumes that the former example is the norm. To enforce a stan­dard that calls for more indentation, like the latter example, the following rule can be used:

 

 1   #include <check.cch>

 2

 3   int   start_switch,  //  Flags beginning of a switch statement.

 4         switch_depth,  //  Measures the depth of switch nesting.

 5         diff;          //  Amount of error in nesting (in tabs).

 6

 7   if ( stm_cp_begin == SWITCH )

 8      start_switch = 1;            //  Detect start of switch

 9

10   if ( stm_is_comp == SWITCH )

11      --switch_depth;              //  Detect end of switch

12

13   if ( lin_within_function )

14      if ( ! (lin_is_comment || lin_preprocessor || lin_dcl_count) )

15         {

16         diff = lin_nest_level + switch_depth - lin_indent_tab;

17         switch ( diff )

18            {

19            case -1:

20               warn( 1003, "Indent 1 fewer tab." );

21               break;

22            case 0:     //   Indentation is correct.

23               break;

24            case 1:

25               warn( 1003, "Indent 1 more tab." );

26               break;

27            default:

28               warn( 1004, "Indent %d tabs.", difference );

29               break;

30            }

31         if ( start_switch )

32            {

33            ++switch_depth;

34            start_switch = 0;

35            }

36         }

The variables used in this example are:

Variable                                  Meaning

stm_cp_begin                   When the open curly brace of a compound statement has been found, this variable is set to the context of the compound statement (see Reference Manual sec­tion 3.12 for details). 

stm_is_comp                      When the close curly brace of a compound statement has been found, this variable is set to the context of the compound statement (IF through COMPOUND).

Some standards require that if, else, for, while and do statements contains compound statements even though the compound statements themselves contain only single statements or empty statements.

The variable for this purpose is

Variable                            Meaning

stm_need_comp                Set to 1 if a statement contained by if, else, while, do and for is not a compound statement.

 

5.1.6  Postfix Problems

For reasons that are impossible to understand, C has always allowed a long numeric constant to be indicated with a postfixed lower­case ‘el’, as in the constant pi: 3.1415926535l. It takes great concentration to discern that the last character in this constant is an ‘el’ and not a ‘one’.

As a matter of style, some project leaders do not permit the use of suf­fixes at all, preferring to see explicit casts. The predefined CodeCheck vari­able which will de­tect any suffix attached to a numeric constant is lex_suffix.

Recommendation: Avoid the lowercase ‘el’. Always use an explicit cast or the up­per­case ‘L’ postfix to in­dicate a long.

Variable                                  Meaning

lex_lc_long                      Set to 1 if a numeric constant ends with a lower-case ‘el’, in­di­cating a long type.

lex_suffix                        Set to 1 if a numeric constant is found with any suf­fix ('F', 'f', 'L', 'l', 'U', or 'u'), or combination of these.

 

 

5.1.7  Nonstandard Comments

A small minority of C compilers allow the nesting of comments within com­ments. This us­age is therefore not portable. CodeCheck can recognize both kinds of nonstandard comments, but treats a nested /* ... */ comment as a syntax error. To avoid the syntax error, set the -N option on the command-line that invokes CodeCheck, or have a rule that calls set_option(‘N’,1) at the start of each module.  

If you want CodeCheck to decide that nested comments are acceptable as soon as the first nested comment is found, use this rule:

1    if ( lin_nested_comment )

2       {

3       warn( 1234, “Nested comment.” );

4       set_option( ‘N’, 1 );

5       }

Some compilers have adop­ted the // comment of C++. These comments are terminated by the end of the line on which the // was found. Needless to say, these comments are not port­able either. CodeCheck accepts a // comment as syn­­tacti­cally correct, and excepts without complaint the nesting of // com­ments within /* ... */ comments and vice versa.

Recommendations:  (a) To block out a section of code that contains com­ments, bracket the section with #if 0 at the top and #endif at the bottom. This will serve to disable the entire section of code, even if it contains com­ments. (b) Do not use the // comment format unless you are actually writing C++ code.

Code­Check provides two predefined variables for detecting nonstandard com­­ments:

Variable                                  Meaning

lin_nested_comment     Set to 1 if a /*...*/ comment is found nested within an­other /*...*/ comment.

lex_cpp_comment            Set to 1 if a // comment is found.

 

 

5.1.8  Magic Numbers

A “magic number” is a number whose meaning is not apparent to the main­tenance pro­grammer. The example in section 5.1.3 illustrates the use of a magic number, in this case 32. There are three problems with magic num­bers:

1.   The meaning of a simple number is almost never apparent from context alone.

2.   It is very difficult to change all occurrences of a particular magic number when its value must be changed, because not all occurrences of a particular number will have the same meaning.

3.   A magic number adds to a program’s cognitive complexity because it forces the reader to wonder what the meaning of the number might be.

Recommendation:  Either define constants other than 0 and 1 as manifest con­stant macros (e.g. #define buffno 15 ) or use a constant declaration with an ini­tial­izer (e.g. const short buffno = 15; ). If you use a macro for the con­stant, then place its defini­tion in a single header file. Here is a Code­Check rule for de­tecting magic constants:

1    if ( lex_not_manifest )

2       if ( lex_initializer < 5 && lex_initializer != 3  )

3          warn( 99, "Use a const or macro for this magic number!" );

Variable                                  Meaning

lex_initializer            Set to the following when an initializer is found:

                                                1 if the initializer is the integer zero,

                                                2 if the initializer is a non zero integer,

                                                3 if initializer is a boolean literal, true or false ( C++ only ),

                                                4 if the initializer is a float or double constant,

                                                5 if the initializer is a string literal, and

                                                6 if the initializer is anything else.

lex_not_manifest          Set to 1 if a numeric constant other than 0 or 1 is used in any context other than a macro definition or a comment.

 

 

5.1.9  Escape Sequences in Character Constants and String Literals

An escape sequence within a character constant or string literal is signaled by the back­slash char­acter: \. The unrestricted use of escape sequences is a fre­quent cause of maintenance problems.

Recommendation: for maximum program clarity and maintainability, ob­serve these rules:

1.   To enhance program readability, use macros to encode character con­stants that contain escape se­quences, as in #define DBLQUOTE '\"'

2.   Avoid numeric escape codes if possible; if unavoidable then use a macro definition to encode the constant, so that the purpose is evident in the name of the macro.

CodeCheck provides a predefined variable for detecting numeric escapes:

Variable                                  Meaning

lex_num_escape              Whenever a non zero numeric escape sequence is found, the value of this variable is set to the value of the nu­meric es­cape se­quence.

 


5.2    Preprocessor Considerations

 

5.2.1  Silent Preprocessor Errors

The preprocessor permits programmers to make a variety of mistakes without com­plaint. These constitute maintenance problems as well, because programs contain­ing these errors may appear to compile and execute cor­rectly.

First, many compilers ignore code that appears on the same line as an #ifdef di­rective. In the fol­lowing example the desired debug message will never be printed:

1    #ifdef DEBUG printf( "Now entering recur­sion..." );

2    #endif

Second, even experienced programmers may sometimes place an assign­ment op­erator in a simple substitution macro, as in the fol­lowing code (which will compile, although not the way the programmer intended!).

3    #define TOOBIG = 99999

This is a case in which the permissive grammar of the preprocessor actu­ally en­cour­ages silent but deadly program errors. A similar error occurs when a pro­grammer inadvertently concludes a macro definition with a semicolon.

Third, some programmers forget (or do not realize) that it is important always to surround macro formal parameters with parentheses.

There are four predefined CodeCheck variables for detecting these com­mon prepro­cessor errors:

Variable                                  Meaning

pp_trailer                        Set to 1 if a preprocessor line con­tains any non­white characters after the end of the di­rective and before the end of the line.

pp_assign                          Set to 1 if a macro definition is a sim­ple assign­ment.

pp_semicolon                   Set to 1 if a macro definition ends with a semi­colon.

pp_arg_paren                   Set to 1 if a macro formal parameter is used without being surrounded by parentheses.

 

 

5.2.2  The #define/#undef Morass

Any constant which might need to be changed during program develop­ment or mainte­nance should be “manifest”, which means that it must be clearly appar­ent to the sight or understanding (TP84:18). Manifest constants are used to give meaning to numbers which otherwise would seem wholly capri­cious, as may be seen in the two contrasting examples in section 5.1.3.

A problem is caused by programmers who have learned that their com­piler treats the #define directives as though they were pushing defini­tions onto a stack, and #undef directives as though they were popping these defini­tions off the stack. Using #define and #undef in this way is not only an abuse of the preprocessor, it is also highly non portable. Not all preproces­sors are set up in this way, and the ANSI standard specifically forbids this be­hav­ior. The Code­Check preprocessor adheres to the ANSI stan­dard in this matter, and flags every apparent pop of the macro stack by setting the pre­de­fined variable pp_unstack (defined below).

There is another way in which #undef can cause tearing of hair and gnash­ing of teeth, as illustrated in this highly artificial example:

1    #define  BLEEP  3

2    #define  BLATZ  BLEEP + BLEEP

3    #undef   BLEEP

What will your compiler see if you now refer to BLATZ? The ANSI stan­dard is quite clear that BLATZ must be replaced by BLEEP+BLEEP, but some pre-ANSI com­pil­ers will re­ceive 3+3 from their preprocessors. CodeCheck flags this kind of #undef usage by set­ting the predefined variable pp_depend.

Lastly, there is the ANSI concept of “benign redefinition”. ANSI has blessed the idea that macros can be multiply defined (e.g. in several different header files) as long as the definitions are virtually identical. Two macro def­ini­tions are vir­tually iden­tical if they are exactly identical after each whites­pace se­quence has been replaced by a single space character. Any other redefi­nition of a macro will be flagged by an ANSI preprocessor as an error. The Code­Check pre­processor al­lows benign redefinition, but sets the vari­able pp_benign whenever a benign redefinition is encoun­tered.

Variable                                  Meaning

pp_benign                          Set to 1 if a macro is redefined to be vir­tually identi­cal to its previous defi­nition.

pp_depend                          Set to 1 if #undef is used on a macro that is used by other macros.

pp_stack                             Set to 1 if a macro is rede­fined within a module (0 if the redefinition is benign).

pp_undef                             Set to 1 whenever #undef is used.  

pp_unstack                        Set to 1 if #undef is used to un­stack mul­­tiply-de­fined macros.  

Recommendation: define your manifest constants once in a header file, and leave them de­fined. Do not depend upon benign redefinition, and do not use #undef unless absolutely required by overflow of the macro symbol table.

 


5.3    Maintainability in Declarations

 

5.3.1  Identifier Name Length

For program clarity and maintainability, it is desirable to use identifier names that are long enough to express the purpose to which they will be put. Thus iden­tifier length can and should be used as one indicator of main­tain­ability.

Recommendation: Give each variable a name that succinctly describes its pur­pose.

Variable                                  Meaning

dcl_ident_length          Set to the number of characters in the declared iden­ti­fier.

Here is a sample CodeCheck program for measuring the average identifier name length within each module of a project, and flagging any identifiers that are over 31 or under 3 characters in length. The assignment in line 9 is necessary because dcl_ident_length is not itself a statistic.

 1   statistic float   length;

 2

 3   if ( dcl_ident_length )

 4      {

 5      if ( dcl_ident_length> 31 )

 6         warn(5000, "Identifier exceeds 31 characters!");

 7      if ( dcl_ident_length< 3 )

 8         warn(5001, "Use a longer name (>3 characters)." );

 9      length = dcl_ident_length;   // save current length

10      }

11

12   if ( mod_begin )

13      reset(length);

14

15   if ( mod_end )

16      {

17      meanIDLength = mean( length );

18      printf("Mean identifier length = %g", meanIDLength);

19      }

 

5.3.2  Declaration Format

Plum (TP84:2) and many others strongly recommend some fairly stringent guide­lines for formatting declarations. The guide­lines reported here are those in­tended to im­prove the main­tainability of C programs.

1.   All variables, functions, and function parameters must have an ex­plicit type speci­fier.

2.   Declare only one variable, function, tag or member per line of source code.

3.   Place an explanatory comment at the end of every declaration line.

The predefined CodeCheck variables which are used to detect violations of the above guide­lines are:

Variable                                  Meaning

dcl_no_specifier          Set to 1 if a variable or function declarator has no ex­plicit type information.

dcl_not_declared          Set to 1 if an old-style function parameter is not de­clared.

dcl_parameter       Index of function parameter (1 for first, etc.)

lin_dcl_count                 Set to the number of identifiers declared on the cur­rent line (includes tag definitions and function pa­ra­meters).

lin_has_comment            Set to 1 if a line has a comment that contains text.

 

In this example we restrict one variable, function declaration per line, it is unnecessary to count function parameters. The following rule will exclude function parameters.

 

int parmCount;

if ( mod_begin ) paramCount = 0;      // reset

if ( dcl_parameter )   paramCount++;

if ( lin_end ) {

 

if ( lin_dcl_count – paramCount > 1 ) {

warn( 99, “Only use one declarator per line.” );

   }

paramCount = 0;          // reset count at EOL

}

 

5.3.3  Initialization of External Variables

The ambiguity in C between defining and referencing external declarations makes poss­ible an extremely bizarre  class of bugs, some of which do not man­ifest them­selves until an apparently stable program is substantially revised. From a pro­grammer’s point of view, the best defense against these bugs is strict ad­her­ence to two guidelines, para­phrased from Harbison & Steele (HS87:80):

1.   Define each external variable in only one source file. Indicate that this is a definition by omitting the extern storage class key­word from the declaration, and supplying an initializer.

2.   Every non-defining declaration of an external variable must use the extern storage class keyword, and must not have an initial­izer.

These two rules appear to be the only rules that will simultaneously pro­tect pro­grammers from bugs introduced by missing or conflicting initializers, and also ensure portability to non-ANSI compilers that use idiosyncratic methods for distinguishing defining from referencing declarations. Here is a Code­Check rule that checks for adherence to these guidelines:

 

1    if ( dcl_variable )

2       if ( dcl_global  )

3          {

4          if ( dcl_extern && dcl_initializer )

5             warn( 8001, "Initializer must be omitted." );

6          else if ( (! dcl_extern) && (! dcl_initializer) )

7             warn( 8002, "Initializer needed here." );

8          }

Variable                                  Meaning

dcl_extern                        Set to 1 if the extern storage class is ex­plicitly spec­i­fied in a declaration.

dcl_global                        Set to 1 if an identifier with external linkage has been declared. This includes variable, func­tion, and type­def names.

dcl_initializer            Set to 1 when an initializer is found.

dcl_template        Number of C++ function template parameters.

dcl_variable                   Set to 1 if a variable is declared.

 

5.3.4  Keywords const, and volatile as Type Modifiers

Many C compilers for DOS and OS/2 (e.g. Microsoft, Borland and Intel) have spe­cial keywords that modify the type of a declarator. These keywords include near, far, cdecl, pascal, and others. Programmers must be alert to the fact that these special non-ANSI type modifiers do not act like ordi­nary type speci­fiers: they only modify the type of the next declarator or pointer, and have no effect on any other declarator in the list. For example, in this declaration list                

 

     short far p, q;

the declarator p is a far short integer, but q is only a plain short integer. (Except when compiled by Metaware High C, for which near, far, and huge are type specifiers. The possibilities for confusion are truly endless.)

A serious ambiguity arises with const and volatile: some compilers ac­cept these keywords both as ANSI standard type qualifiers and as non-ANSI type modifiers, depending on context. Consider these four decla­ra­tion lists:

 

1    const short      a, b;          /*  ANSI, both are constant      */

2    short const      c, d;          /*  ANSI, both are constant      */

3    short far const  e, f;          /*  Not ANSI, f is NOT constant  */

4    short            g, const h;    /*  Not ANSI, h is constant      */

In the first two declaration lists const is used as a type specifier, which ap­plies to every variable in the declaration list. The third declaration list illus­trates a non-ANSI use of const that is allowed by the Microsoft C com­piler. Here the keyword const is a type modifier (modifying only the next declara­tor), hence only e is a far constant short integer, while f is a plain short inte­ger. The fourth declaration list illustrates another non-ANSI use of const that is allowed by Mi­crosoft and Intel com­pilers; in this list only h is constant.

Here is a simple rule that will detect these potentially troublesome usages:

 

1    if ( dcl_cv_modifier )

2       warn( 8026, "Non-ANSI usage of const or volatile." );

 

Variable                                  Meaning

dcl_cv_modifier            Set to 1 if the keyword const is used as a non-ANSI type modifier (similar to near, far, etc.) rather than as an ANSI type specifier. Set to 2 if the keyword volatile is used as a non-ANSI type modifier.

C++ has the following special keywords.

dcl_explicit        1 when a declarator has specifier explicit.

dcl_mutable         1 when an indentifier is declared mutable.

 


5.4    Maintainability at the Project Level

 

5.4.1  Macro Redefinition

Macros that are defined differently within the several files of a project are at best a source of great confusion for the maintenance programmer, and at worst can cause some of the most mysterious program behaviors ever en­countered.

Recommendation: If a macro is used in more than one source file of a pro­ject, then place its definition in a single header file. Each macro should have exactly one defini­tion per project. Be sure to document the definition with a comment.

CodeCheck normally compares each macro definition with every other macro of the same name defined in other modules. If the current definition differs in any significant way (i.e. it is not a “benign redefinition” in ANSI terminology), then CodeCheck emits the warning message C0008 (see Reference Manual, Section 5.2). The CodeCheck variable pp_macro_conflict is also set at this time, so that users can trigger their own customized error messages.

CodeCheck pro­vides three predefined variables for working with macro re­defini­tion problems:

Variable                                  Meaning

pp_macro_dup                   Set to 1 if a macro is defined in more than one file.

pp_macro_conflict       Set to 1 if a macro is defined differently in separate mod­ules of a project.

prj_conflicts                 Set to the number of conflicting macro definitions found in a project.

 


Chapter 6:  Software Metrics

 

Good management of the production and maintenance of computer soft­ware is at best very difficult. A major part of the problem is caused by difficul­ties in measuring such crucial quantities as programmer productivity, quality of code, and defect rate. Programmer productivity, for example, is often mea­sured in thousands of lines of code per month, as though all programmers wrote lines in roughly the same way. Unfortu­nately for this measure, some write with great expanses of whitespace, blank lines, and comments, while others produce dense agglomerations of turgid code. Line for line, the func­tional capability of the dense code may be ten times that of the expansive code. Should the programmer who writes dense code be penalized for this? On the other hand, the expansive code is almost always easier to maintain, because it is more comprehensible to the maintenance programmer, so clearly there are trade-offs that must be quanti­fied.

The emerging discipline of software engineering has produced many ex­peri­mental metrics which attempt to quantify various aspects of computer code. The literature on software metrics is still quite sparse compared to the more mature engineering disci­plines, but many good ideas have appeared. A good textbook on software measure­ment was finally published in 1986: Soft­ware Engineering Metrics and Models, by Conte, Dunsmore, and Shen. In ad­dition, Grady and Caswell have written an excellent case history of Hewlett-Packard’s efforts in software measure­ment, entitled Soft­ware Metrics: Estab­lishing a Company-Wide Program.

 

 

6.1    Program Size

Measures of program size are fundamental to the management of software engi­neering, because they provide the units on which many important mea­sures are based. Productivity, for example, is often expressed in thousand lines of code per programmer per month, and accuracy can be mea­sured in defects per hun­dred lines of code.

Software Engineers have attempted to resolve the problems of the “lines of code” measure in two ways:  (1) tinkering with the definition of a line of code, so that it (for example) excludes blank and comment lines, as discussed in section 6.1.2, and (2) defining other measures of program size, as discussed in section 6.1.3 (statements), 6.1.4 (tokens) and 6.1.5 (functions). There is, how­ever, a third way to deal with this problem: use distributions.

 

 

6.1.1  Distributions Provide Additional Information

An approach to the problem of defining program size which is both more meaningful and appro­priate from a statistical point of view, is to focus on the dis­tribution of kinds of lines of code. This approach has not yet appeared in the soft­ware engineering litera­ture, even though it is common enough in the context of general measurement tools for business management. As an ex­ample of this new focus, one could define the optimum distribution of kinds of lines of code as 10% blank, 30% comment, 30% declarations, and 30% exe­cutable code. Now a pro­gram can be described in two complementary ways: total lines and deviation from the opti­mum mix of kinds of lines. All of the measures of program size that can be calculated by CodeCheck can be consid­ered in this manner, by declaring the CodeCheck vari­able for the measure to be of the statistic storage class.

 

 

6.1.2  Lines of Code

Although there is no consensus within software engineering as to how exactly to define a line of code, the actual variations concern how to treat:  (1) blank lines, (2) comment lines, (3) lines derived from header files, (4) lines suppressed by the preprocessor, (5) multiple and/or fragmentary statements, and (6) non-exe­cutable statements. One popular and useful defini­tion (CDS86:35) reads as fol­lows:

A line of code is any line of program text that is not a comment or a blank line, regardless of the number of statements or frag­ments of state­ments on the line. This specifically includes all lines containing program headers, declarations, and executable and non-executable statements.

The above definition is silent with regard to whether lines are to be counted if they are suppressed or if they come from header files, and these are points on which reasonable men and women can and do differ.

CodeCheck is flexible enough to handle most variations of the measure for lines of code. There are 10 predefined variables that describe features of each line:

Line variables:

Variable                                  Meaning

lin_end                               Set to 1 when an end-of-line marker has been found (a newline character or the backslash-new­line pair).

lin_dcl_count                 Set to the number of identifiers declared on the cur­rent line (includes tag definitions and function pa­ra­meters).

lin_has_comment            Set to 1 if a line has a comment that contains text.

lin_has_code                   Set to 1 if a line contains C code.

lin_header                        Set to 1 if the current line was obtained from a pro­ject header file by means of  #include "file­name".  Set to 2 if the current line was ob­tained from a system header file by means of  #include <filename>.

lin_include_kind    1 if the line includes a project header,

2 if the line includes a system header.

lin_include_name()     The file name this line includes.

lin_is_comment              Set to 1 if a line has no C code and either con­tains a comment or is contained within a comment. The com­ment line must contain text to qualify as a real com­ment.

lin_is_exec                      Set to 1 if a line contains code that is executable.

lin_preprocessor          Set to the 1 if a line is a preprocessor direc­tive (i.e. be­gins with #).

lin_suppressed              Set to 1 if compilation of a line has been sup­pressed by the preprocessor.

lin_is_white                   Set to 1 if a line consists entirely of whitespace (tabs & spaces), or is a comment line without any text.

lin_number                        Set to the number of the current line, relative to the start of the current file.

lin_source                        Set to 1 if the current line was not obtained from a header file.

 

In addition to these, CodeCheck has predefined variables that compute to­tal lines of code at the end of every statement, function, module, and project.

Statement variables:

stm_lines                          Set to the number of lines in a statement.

Function variables:

Variable                                  Meaning

fcn_total_lines            Set to the total number of lines in the definition of a C function (sta­tis­tic).

fcn_com_lines                 Set to the number of pure comment lines in the defini­tion of a C function (sta­tis­tic).

fcn_white_lines            Set to the number of whitespace lines in the defini­tion of a C function (sta­tis­tic).

fcn_exec_lines              Set to the number of executable lines in the defini­tion of a C function (sta­tis­tic).

Module variables:

mod_total_lines            Set to the total number of lines in a module (sta­tis­tic).

mod_com_lines                 Set to the number of pure comment lines in a module (sta­tis­tic).

mod_white_lines            Set to the number of whitespace lines in a module (sta­tis­tic).

mod_exec_lines              Set to the number of executable lines in a module (sta­tis­tic).

Project variables:

prj_total_lines            Set to the total number of lines in a module.

prj_com_lines                 Set to the number of pure comment lines in a module.

prj_white_lines            Set to the number of whitespace lines in a module.

prj_exec_lines              Set to the number of executable lines in a module.

 

A CodeCheck rule which reports the size of a module as measured in lines of code, using the definition given at the beginning of this section, can be con­structed in this way:

 

1   if ( mod_begin

2     n = 0;

3

4   if ( mod_end )

5     {

6     n = mod_total_lines - mod_com_lines - mod_white_lines;

7     printf( "Lines of code in %s: %d", mod_name(), n );

8     }

 

 

6.1.3  Statements

A natural way to overcome some of the problems inherent in the defini­tion of a line of C code is to count executable statements instead. Statements have sev­eral advan­tages over lines: there is very little ambiguity as what con­stitutes a C statement, and every C statement is a conceptually complete piece of code. Fur­thermore, the state­ment is the smallest unit of C pro­gram­ming for which this can be said.

What little ambiguity there is in the definition of a “statement” concerns two mat­ters — how to count statements that include other statements, and whether (or how) to count such non-executable statement-like entities as type definitions, de­cla­ra­tions, initializers, and preprocessor directives. Kernighan & Ritchie (KR88:236) identify five kinds of exe­cutable C statements:

•  Expression               an expression followed by a semicolon.

•  Jump                        a break, continue, return, or goto statement.

•  Compound               a list of statements surrounded by braces.

•  Selection                  an if, if-else, or switch statement.

•  Iteration                  a for, while, or do statement.

•  Labeled                    any statement with a label.

Thus Kernighan & Ritchie do not consider type definitions, declarations, and prepro­cessor directives as statements at all, and would count this example

1    if ( x > y )

2       x = y;

3    else

4       {

5 skip:

6       x = 0;

7       y++;

8       }

as six statements (an if-else statement, a compound statement, three ex­pres­sion statements, and one labeled statement).

Unfortunately, the Kernighan & Ritchie statement categories are now some­what out-of-date because C++ has blurred the formerly clear distinction between declarations and executable statements. For our purposes it seems necessary to include declarations as a kind of statement. A simple and useful alternative set of statement categories is as follows:

•  Non-executable                 Local declarations, except local C++ declara­tions that have initializers (because they are executable).

•  Low-level                         Expression and jump statements, and local C++ decla­rations with initializers.

•  High-level                         Compound, selection, and iteration statement and try blocks ( C++ exception handling ).

 

In this scheme an else-clause is considered to be a separate high-level statement, and labeled statements are not counted twice. Incidentally, Code­Check con­siders a label without an accompanying statement (e.g. just be­fore the last brace of a function definition) to be a low-level empty state­ment, equivalent to a la­beled semicolon. Such statements are not allowed by the ANSI standard, but are per­mitted by virtually every compiler.

To provide sufficient flexibility for the definition of a great variety of size mea­sures based on counting statements, CodeCheck offers the following pre­de­fined vari­ables. Each variable is set to its appropriate value when the end of the statement (as defined by Kernighan & Ritchie) has been found.

Statement variables:

stm_cases                          Set to the number of case labels attached to this statement (includes the default label).

stm_catchs          Number of handlers in try block

stm_cp_begin                   When the open curly brace of a compound statement has been found, this variable is set to the context of the compound statement (see the Reference Manual, section 3.12 for details). 

stm_end                               Set to 1 when the end of a statement has been found.

stm_end_tryblock    1 when end of whole try-block is reached .

stm_is_expr                      Set to 1 if this is an expression statement.

stm_is_jump                      Set to 1 if this is a jump statement.

stm_is_comp                      When the close curly brace of a compound statement has been found, this variable is set to the context of the compound statement (see the Reference Manual, section 3.12 for details).

stm_is_select                 Set to 1 if this is a selection statement.

stm_is_iter                      Set to 1 if this is an iteration statement.

stm_is_low                        Set to 1 if this is an expression or jump state­ment.

stm_is_high                      Set to 1 if this is a compound, selection, or itera­tion state­­ment.

stm_is_nonexec              Set to 1 if this is a local declaration. This does not trig­ger on a local C++ declaration that has an ini­tializ­er.

stm_labels                        Set to the number of ordinary labels (not including case or default labels) attached to this statement.

In some cases, an exception handler may never be reached because the other handler(s) before it can catch the same exception.

stm_never_caught    1 if an exception handler will never be reached.

 

In addition to these, CodeCheck has predefined variables that compute to­tal num­bers of statements at the end of every function, module, and project:

Function variables:

fcn_low                               Set to the number of low-level statements found in the definition of a C function (sta­tis­tic).

fcn_high                             Set to the number of high-level statements found in the definition of a C function (sta­tis­tic).

fcn_nonexec                      Set to the number of non-execut­able statements found in the definition of a C function (sta­tis­tic).

Module variables:

mod_low                               Set to the number of low-level statements found in a module (sta­tis­tic).

mod_high                             Set to the number of high-level statements found in a module (sta­tis­tic).

mod_nonexec                      Set to the number of non-execut­able statements found in a module (sta­tis­tic).

Project variables:

prj_low                               Set to the number of low-level statements found in a project.

prj_high                             Set to the number of high-level statements found in a project.

prj_nonexec                      Set to the number of non-execut­able state­ments found in a project.

 

A CodeCheck rule which reports the size of a project as measured in state­ments, using the broadest definition of the term, can be constructed in this way:

1    if ( prj_end )

2       {

3       size = prj_low + prj_high + prj_nonexec;

4       printf( "Total project statements = %d", size );

5       }

 

 

6.1.4  Tokens

A third, natural way to overcome some of the problems inherent in mea­sur­ing pro­gram size in terms of lines of code is to count tokens. This is the method pioneered by the late Maurice Halstead under the name “Soft­ware Science” (MH77). A token is the smallest unit of text recognized by a compiler, e.g. the keyword “else”, the oper­ator “+=”, and the punctuation mark “;”. There are, however, two kinds of tokens in C: preprocessor tokens and lexical tokens. The two are not always the same, espe­cially in older versions of C. The ANSI stan­dard has gone a long way towards bringing the two into closer agreement, but some differences still remain (and will always re­main as long as C has a prepro­cessor). To confuse matters even further, Halstead’s tokens do not exactly coin­cide with the tokens of the C grammar, nor do all users of Halstead’s metrics agree on how his definitions are best imple­mented.

Fortunately, studies have shown that program size as measured by token counting is not sensitive to minor variations in the details of the definition of to­kens (CDS86:42). It is always important, of course, to be consistent when making compar­isons of program size, but the actual definition used seems to make very little differ­ence in the conclusions reached.

Halstead divided all tokens into two somewhat arbitrary classes: operators and operands. CodeCheck interprets this classification in the C and C++ con­texts as follows: every token that is an identifier, numeric con­stant, string lit­eral, char­acter literal, or label is classified as an operand. A “Halstead opera­tor” is any to­ken that is not an operand. Note that standard C operators, enumerated in Sec­tions 2.4 and 6.6, are slightly different. For example, every punctuation mark is a Halstead opera­tor but not a C operator. For those who need measures based on Halstead operators, Code­Check provides two ver­sions of each predefined vari­able for counting operators, one using Halstead’s definition, the other using the standard C definition. For our pur­poses the term standard operator will refer to the standard C opera­tor.

The tokens counted by CodeCheck in this context are preprocessor tokens, i.e. the to­kens seen by the preprocessor before macro expansion takes place. To in­clude tokens generated by macro expansion in all of these counts, specify –E on the command line.

CodeCheck provides 16 predefined variables with which line, function, mod­ule, and program size can be measured in terms of tokens.

Line variables:

lin_tokens                        Set to the number of tokens found in a line of code be­fore macro expansion.

lin_operators                 Set to the number of standard C operators found in a line of code, before macro expansion.

lin_operands                   Set to the number of operands found in a line of code, before macro expansion

Statement variables:

stm_operators                 Set to the total number of standard operators found in a state­ment, before macro expansion.

stm_operands                   Set to the total number of operands found in a state­ment, before macro expansion.

stm_tokens                        Set to the total number of tokens found in a state­ment, be­fore macro expansion.

Function variables:

fcn_H_operators            Set to the total number of Halstead operators found in a func­tion before macro expansion (sta­tis­tic).

fcn_uH_operators          Set to the number of unique Halstead operators found in a func­tion before macro expansion (sta­tis­tic).

fcn_tokens                        Set to the total number of tokens found in a func­tion be­fore macro expansion (sta­tis­tic).

fcn_operators                 Set to the total number of standard C operators found in a func­tion before macro expansion (sta­tis­tic).

fcn_operands                   Set to the total number of operands found in a func­tion, before macro expansion (sta­tis­tic).

fcn_u_operands              Set to the number of unique operands found in a func­tion, before macro expansion (sta­tis­tic).

 

Module variables:

mod_H_operators            Set to the total number of Halstead operators found in a mod­ule before macro expansion (sta­tis­tic).

mod_uH_operators          Set to the number of unique Halstead operators found in a mod­ule before macro expansion (sta­tis­tic).

mod_tokens                        Set to the total number of tokens found in a mod­ule be­fore macro expansion (sta­tis­tic).

mod_operators                 Set to the total number of standard C operators found in a module before macro expansion (sta­tis­tic).

mod_operands                   Set to the total number of operands found in a mod­ule, before macro expansion (sta­tis­tic).

mod_u_operands              Set to the number of unique operands found in a mod­ule, before macro expansion (sta­tis­tic).

Project variables:

prj_H_operators            Set to the total number of Halstead operators found in a pro­ject before macro expansion.

prj_uH_operators          Set to the number of unique Halstead operators found in a pro­ject before macro expansion.

prj_tokens                        Set to the total number of tokens found in a project be­fore macro expansion.

prj_operators                 Set to the total number of standard C operators found in a project before macro expansion.

prj_operands                   Set to the total number of operands found in a pro­ject, before macro expansion.

prj_u_operands              Set to the number of unique operands found in a pro­ject, before macro expansion.

 

6.1.5  Functions

A fourth, natural way to overcome some of the problems inherent in mea­sur­ing program size in terms of lines of code is to count functions and macros (CDS86:42). Of all the measures presented here, this is the simplest and least am­biguous.

CodeCheck provides two predefined variables for counting functions at the module and project level:

Variable                                  Meaning

mod_functions                 Set to the number of functions defined in a mod­ule (sta­tis­tic).

mod_macros                        Set to the number of macros defined in a module (sta­tistic).

prj_functions                 Set to the number of functions defined in a project.

prj_macros                        Set to the number of macros defined in a project.


6.2    Logical Complexity

Measures of logical complexity are extremely important tools for en­suring program maintainability, on the universally recognized principle that logi­c­ally com­plex code is very difficult to understand and maintain. Contrary to popular belief, complicated prob­lems do not need logically com­plex programs for their solution. Indeed, it can be ar­gued that logical complexity in programs is caused by lack of understanding of the problem on the part of the pro­grammer: the bet­ter he or she understands the problem, the simpler and more elegant the pro­gram produced to solve it. One can argue that ev­ery milestone in the his­tory of computer pro­gram­ming has been a technique or concept that reduces program complexity: symbolic variables, formula transla­tion, formatted I/O, structured control statements, recursive functions, struc­tured data, modular pro­grams, and object-oriented pro­gram­ming.

Oddly enough, most measures of logical complexity are not themselves complex. Popular metrics such as decision count (section 6.2.1) and mean log­ical depth (section 6.2.3) are easy to calculate, while McCabe’s cy­clomatic com­plexity number only looks complicated — it is actually eas­ily calculated from the deci­sion count.

 

 

6.2.1  Decision Count

The flow of control in a C function proceeds sequentially through its state­ments until it is interrupted by a goto statement, a function call, or a binary deci­sion point, a statement from which con­trol passes to one of two choices. Goto statements and function calls are not binary decision points, because each passes control to exactly one choice. A switch statement is not in itself a binary decision point, but each case statement is. Thus every C state­ment ei­ther leads to exactly one statement or is a bi­nary decision point (the switch statement leads to the first case in the switch).

A simple measure for the logical complexity of a function can be con­struc­ted by counting the number of binary decision points in the function. This is called the deci­sion count for the function.

CodeCheck provides predefined variables for counting decisions at the func­tion, module, and project levels.

Variable                                  Meaning

fcn_decisions                 Set to the number of binary decision points in a func­tion (statistic).

mod_decisions                 Set to the number of binary decision points in a mod­ule (statistic).

prj_decisions                 Set to the number of binary decision points in a pro­ject.

 

Here is an example set of CodeCheck rules that alerts the user to functions with more than 12 decisions, and calculates the decisions per function rate for the entire project:

 1   float dpf;  /* decisions per function */

 2

 3   if ( fcn_decisions > 12 )

 4      warn( "This function may be too complex." );

 5

 6   if ( prj_end )

 7      {

 8      dpf = (1.0*prj_decisions) / prj_functions;

 9      printf( "Decisions per function rate = %g\n", dpf );

10      }

 

 

6.2.2  McCabe Cyclomatic Complexity

The structure of an algorithm is often depicted by a directed graph called a flowchart or flowgraph. If the flowgraph is abbreviated so that it describes only test nodes and branches between nodes, then it represents the logic struc­ture of the algo­rithm (CDS86:60). Consider the algorithm for computing the greatest common divisor of two positive integers, for example. The fol­low­ing C code im­plements this simple al­gorithm:

 

 1   long gcd( long n, long d )

 2      {

 3      register long  temp;

 4

 5      while( d )

 6         {

 7         temp = d;

 8         d = n % d;

 9         n = temp;

10         }

11      return n;

12      }

The flowgraph for this algorithm looks like this:

However, three nodes of this flowgraph are not involved with branching. To ob­tain its logic structure we collapse these extraneous nodes, which yields a much more simple graph:

By constructing the logic structure graph for this function we have re­vealed its logical flow in its starkest, most visible form. This is the purpose of the logic structure graph. It is intuitively clear that the logical complexity of a function should be related in some way to the structural richness of this graph. In 1976 McCabe proposed a mea­sure for this structural richness that has been universally accepted.

McCabe’s cyclomatic complexity measure for a C function or program is de­rived from simple properties of the logic structure graph of the function. This measure is de­fined as follows. First connect all exit nodes to the entry node. Then calculate

edges – nodes + 1

where “edges” are lines that connect boxes, and “nodes” are boxes that emit or re­ceive edges.

In the greatest common divisor algorithm, described above, McCabe’s mea­sure is easily calculated from the logic structure graph in the above figure as:  4 edges – 3 nodes + 1 = 2. It is interesting to note that the same result is found if one calculates the McCabe measure on the flowgraph itself:  7 edges –6 nodes + 1 = 2. That this is true for any algo­rithm reflects a fundamental prin­ciple: the logical complexity of an al­gorithm does not change when non-deci­sion nodes are in­serted or removed.

The simplest possible algorithm has a McCabe measure of 1, and McCabe consid­ered that cyclomatic complexities in excess of 10 were an indication of overly complex code. One way to understand cyclomatic complexity is this: first imagine that all exit nodes are connected to the entry node. Then count the num­ber of edges that must be removed from the logic structure graph to eliminate all circuits (ignore arrow directions when look­ing for circuits). The number of edges removed is the cyclomatic complexity.

There is a remarkable relationship between McCabe’s cyclomatic com­plex­ity and the decision count metric defined in the previous section: McCabe’s number for any al­gorithm is exactly one more than its decision count. This occurs because each binary decision point contributes exactly two edges and one node, thus increas­ing the Mc­Cabe measure by one.

Because of the simple relationship between McCabe’s measure and de­ci­sion count, no additional CodeCheck variables are needed to calculate the McCabe mea­sure.

 

 

6.2.3  Logical Depth

The “logical depth” of a statement is its logical nesting level. Logical nest­ing re­sults from decisions, i.e. if, else, for, while, switch and do statements. The following code frag­ment, taken from a full-pivot matrix inversion routine, illus­trates a nest­ing depth of 5. Note that there are a total of 12 statements in these 14 lines, con­sisting of 5 ex­pression state­ments, 2 iteration statements, 3 selec­tion state­ments, and 2 com­pound statements.

                                                /*  DEPTH  */

 1   max = 0;                                    /*    0    */

 2   for ( j = 0; j < dim; j++ )                 /*    0    */

 3    if ( notused[j] )                         /*    1    */

 4      for ( k = 0; k < dim; k++ )              /*    2    */

 5       if ( notused[k] )                      /*    3    */

 6         {                /*    4    */

 7         temp = fabs( A[j,k] );                /*    4    */

 8         if ( max <= temp )                    /*    4    */

 9          {                                   /*    5    */

10          row = j;                            /*    5    */

11          col = k;                            /*    5    */

12          max = temp;                         /*    5    */

13          }                                   /*    5    */

14         }                /*    4    */

The complexity caused by nesting may be under­stood by trying to answer this ques­tion: when all the looping is done, what are row, col and max set to? Clearly, the greater the depth the harder it is to figure out the answer to such questions.

Functions, which consist of many statements, have several kinds of depth mea­sures. Two measures of obvious utility are maximum depth and average depth. Since these are statistical quantities, CodeCheck provides a predefined variable from which these and other statistical des­criptors can be derived.

Variable                                  Meaning

stm_depth                          Set to the logical depth of a statement, i.e. its nest­ing level within if-, for-, while-, and do-state­ments.

Here is an example of a set of CodeCheck rules that calculates a histogram of maximum depth for the functions in each module:

 1   statistic float   depth, maxdepth;

 2   int               max;

 3

 4   if ( stm_end )           /*  Needed because stm_depth  */

 5      depth = stm_depth;     /*  is not a statistic.       */

 6

 7   if ( fcn_begin )          /*  Initialize in   */

 8      reset( depth );        /*  every function  */

 9

10   if ( fcn_end )

11      maxdepth = maximum( depth );

12

13   if ( mod_begin )          /*  Initialize in   */

14      reset( maxdepth );     /*  every module    */

15

16   if ( mod_end )

17      {

18      max = maximum( maxdepth );

19      histogram( maxdepth, 0, max, max );

20      }


6.3    Code Density

Like the measures of logical complexity discussed in the previous section, mea­sures of code density are important tools for en­suring program main­tain­ability, on the realistic principle that dense code is very difficult to under­stand and maintain. Intu­itively speaking, computer code is dense if it con­tains a lot of material per unit of size. Size has been discussed in section 6.1 above, but what about “material”? There are many possibilities: number of operators, number of operands, tokens per line, opera­tors per hundred to­kens, etc., and these may be weighted according to their cognitive ambiguity or rarity (generally speaking, the more ambigu­ous or rare an object, e.g. a comma op­er­a­tor, the more it should contribute to the density measure). The subsec­tions that follow des­cribe these and other useful ideas for measur­ing code density.

Code density may be interesting as an overall measurement of a C pro­gram, but it can also serve another purpose, which will be even more useful for many program­mers. Just as indicators of code complexity can be used to alert the pro­grammer to functions that are overly complex, an indicator of code density can be used to identify lines or statements that are overly dense. Thus a kind of early warning system can be built by writing a CodeCheck rule file that contains a va­riety of alerts of this type.

 

 

6.3.1  Two Examples of Dense Code

This C function, adapted from KR88:49, illustrates the value of a warning based on density:

1    /* getbits:  get n bits from position p  */

2    unsigned getbits(unsigned x, int p, int n)

3       {

4       return (x>>(p+1-n))&~(~0<<n);

5       }

This routine has just one executable statement. How long will it take a main­te­nance programmer to understand how it works? Five minutes? Ten? Most of the difficulty comes from the uncommented use of so many opera­tors, five of which are rarely-seen bitwise operators. The cal­cu­la­tion should be broken down into in­termediate steps — this will not only help the reader to parse the expression cor­rectly, it will also provide space in which to place com­ments. CodeCheck rules for any of the following three excess-density conditions would have flagged this line:

            a)  operators per line > 5,

            b)  tokens per line > 15,

            c)  operators per operand > 4/3.

Here is a preferable version, which uses register variables to achieve a speed at least as good as the original, if not better (the actual speed will depend on the amount of optimization performed by the compiler):

 1   #define  RIGHTBITS(n)  (~(~0 << n))   /* Mask for right n bits */

 2

 3   /* getbits: get n bits from position p (counting p from right) */

 4

 5   unsigned getbits(unsigned x, int p, int n)

 6      {

 7      register unsigned margin,   /*  # unwanted bits on right  */

 8                        shifted,  /*  result after rt shifting  */

 9                        mask;     /*  mask for the desired bits */

10

11      margin = p + 1 - n;      /* size of the right-hand margin  */

12      shifted = x >> margin;   /* shift wanted bits to the right */

13      mask = RIGHTBITS(n);     /* create a mask for right n bits */

14      return (shifted & mask); /* mask out all unwanted bits     */

15      }

Each line of the new version has a lower density by almost any measure, and none would be flagged by the excess-density conditions listed above. In addition, the extra space gained at the end of the new lines allows room for brief com­ments, which, at least in this example, are crucial.

Here is another example of overly dense and abstract code, also adapted from K&R88. This routine implements the library function strcpy:

1    /*  strcpy:  copy string at t to s.  */

2

3    void strcpy(char *s, char *t)

4       {

5       while (*s++ = *t++)

6          ;

7       }

The apparent elegance of this routine is only superficial — as a programming puzzle it is superb, but as a working routine it exhibits almost every cognitive difficulty that a C routine can present. To fully com­pre­hend how it works, the reader must unnecessarily go through a long series of analytical steps, each re­quir­ing insight, perceptiveness, and pre­ci­sion. First, the reader must per­ceive that the condition has an assign­ment op­erator (=), not an equality test (==). Sec­ond, the reader must separate the lvalue (*s) from the increment op­erator (++), and figure out whether the in­crement is applied to *s or to s. Third, the rvalue must be un­derstood: it looks just like the lvalue, but it yields a charac­ter, not an address of a charac­ter, so it is actually different. Without perceiving this differ­ence no sense can be made of the expression. Fourth (and last), the reader must distinguish be­tween the action taken by the expression and the test for termina­tion of the while statement. The ter­mina­tion test is only un­derstood when the reader makes the identification be­tween logical false and the end-of-string marker.

Kernighan & Ritchie admit that this routine is indeed cryptic at first sight. How­ever, they sug­gest that it uses an idiom that should be mastered due to its notational con­veni­ence and frequent use (KR88:106). Is convenience a good sub­stitute for clar­ity? Very few quality control managers would agree.

 

 

6.3.2  Operational Density

Any metric for code density that uses an operator count in the numerator is a mea­sure of operational density.  The rationale for making operators the ba­sis for mea­suring code den­sity is that it is the operators that encode ac­tions: too many actions in too small a space is not good programming style. How­ever, there are many ways to use operator counts in the construction of den­sity met­rics. Code­Check offers a variety of predefined variables with which to calculate operator-based density. First, here are some of the many possible metrics for op­erational density:

operators per executable line            Used primarily to identify lines that ought to be broken down into several simpler lines.

operators per expression                   Used primarily to identify expressions that ought to be simplified.

operators per hundred tokens            Can be used to quantify the overall density of functions, modules, and projects.

operators per operand                       Can be used to quantify the “abstractness” of ex­pressions, lines, statements, and functions.

These and other metrics can be constructed with the following CodeCheck prede­fined variables.

Expression variables:

Variable                                  Meaning

exp_operators                 Set to the number of standard C operators found in an ex­pression, before macro expansion.

exp_operands                   Set to the number of operands found in an ex­pres­sion, before macro expansion.

exp_tokens                        Set to the number of tokens found in an expression, be­fore macro expansion.

Line variables:

lin_has_code                   Set to 1 if a line contains C code.

lin_is_exec                      Set to 1 if a line contains code that is executable.

lin_operators                 Set to the number of standard C operators found in a line of code, before macro expansion.

lin_operands                   Set to the number of operands found in a line of code, before macro expansion

lin_tokens                        Set to the number of tokens found in a line, before macro expansion.

Statement variables:

stm_operators                 Set to the number of standard C operators found in a state­ment, before macro expansion.

stm_operands                   Set to the number of operands found in a state­ment, before macro expansion.

stm_tokens                        Set to the number of tokens found in a statement, be­fore macro expansion.

Function variables:

fcn_operators                 Set to the number of standard C operators found in a func­tion before macro expansion (sta­tis­tic).

fcn_operands                   Set to the number of operands found in a func­tion, be­fore macro expansion (sta­tis­tic).

fcn_tokens                        Set to the number of tokens found in a function, be­fore macro expansion (sta­tis­tic).

Module variables:

mod_operators                 Set to the number of standard C operators found in a module before macro expansion (sta­tis­tic).

mod_operands                   Set to the number of operands found in a mod­ule, be­fore macro expansion (sta­tis­tic).

mod_tokens                        Set to the number of tokens found in a mod­ule be­fore macro expansion (sta­tis­tic).

Project variables:

prj_operators                 Set to the number of standard C operators found in a project before macro expansion.

prj_operands                   Set to the number of operands found in a project, be­fore macro expansion.

prj_tokens                        Set to the number of tokens found in a project be­fore macro expansion.

 

Here is a sample CodeCheck rule set that calculates five of the operator-based density metrics mentioned above. Note the use in line 18 of multiplica­tion by 1.0 to achieve the effect of casting to float (casts are not permitted in CodeCheck rules).

 1   float oo_ratio, ops_per_token;

 2

 3   if ( lin_is_exec )

 4      if ( lin_operators > 5 )

 5         warn( "This line is too dense." );

 6

 7   if ( exp_operators > 5 )

 8      warn( "This expression is too dense." );

 9

10   if ( lin_operands > 2 )

11      if ( 2*lin_operators > 3*lin_operands )

12         warn( "This line is too abstract." );

13

14   if ( fcn_end )

15      {

16      printf( "Characteristics of %s:\n\n", fcn_name() );

17

18      oo_ratio = (1.0*fcn_operators)/fcn_operands;

19      printf( " Operator/operand ratio = %g\n", oo_ratio );

20

21      ops_per_token = (100.0*fcn_operators)/fcn_tokens;

22      printf( " Operators per hundred tokens = %g\n, ops_per_token );

23      }

 

 

6.3.3  A Weighted Index for Operational Density

Not all operators are equal with respect to frequency of use and familiarity to pro­gram­mers. Many excellent C programmers can easily spend a career with­out ever us­ing the comma operator, for example. Furthermore, some operators, like % and >>, are unreliable, due to being some­what machine-spe­cific. Thus it makes sense to weight operators in an operational density metric accord­ing to general familiarity and reliabil­ity, on the grounds that the more the reader has to think about an operator and how it behaves, the more it contributes to the overall den­sity of the code.

A simple weighting scheme for operators can be constructed by assigning in­teger weights to each category of operator. Then every time an operator is found, add its weight to the operator count. The result is a weighted index in­stead of a count. For greater precision, floating-point weights can be used. The crucial step, however, is the initial choice of weights. Here is an example:

Operators                   Weight            Justification

+ – * / !         1      These are the best known operators.

[] .              1      Very common.

< <= > >=         1      Very common.

!= |=             2      Easily confused.

+= -= *= /=       2      Require a little additional thought.

++ --             3      Need thought, and they resemble + and -.

= ==              3      Notoriously ambiguous.

&& ||             2      Common, but resemble & and |.

& | ~ <<          3      Not often used.

>> >>= % %=       5      Machine-specific.

* & ->                            3               Pointer operations require thought.

sizeof                            2               Argument is an unevaluated expression.

( type name )                     2               Casts need abstract declarators.

^ ?: ^=           3      Requires thought.

,                                        5               The comma is very difficult to recognize.

 

Managers, software engineers, and researchers will differ on how to assign weights to operators. To allow individual weighting choices, CodeCheck pro­vides a separate predefined variable for every distinct operator and punc­tua­tion mark (as we have seen before, in the case of the Halstead metrics, one person’s punctuation mark may be another person’s operator). These vari­ables are set af­ter all macros have been expanded. For the complete list of all operator variables, see the CodeCheck Reference Manual, section 3.9.

Here is a sample CodeCheck rule set that calculates a weighted index for op­era­tional density at the module level, using the weights suggested earlier. It de­fines op­erational density as weighted operators per executable line.

 1   float index;      //  The weighted index  */

 2

 3   if ( mod_begin )  //  Initialize at start of each module

 4      index = 0;

 5

 6   if (op_add || op_subt || op_mul || op_div || op_log_not)

 7      index += 1.0;

 8

 9   if ( op_subscript || op_call || op_member )

10      index += 1.0;

11

12   if ( op_less || op_less_eq || op_more || op_more_eq )

13      index += 1.0

14

15   if ( op_not_eq || op_or_assign )

16      index += 2.0;

17

18   if ( op_add_assign || op_sub_assign )

19      index += 2.0;

20

21   if ( op_mul_assign || op_div_assign )

22      index += 2.0;

23

24   if ( op_pre_incr  || op_pre_decr

25                     || op_post_incr

26                     || op_post_decr )

27      index += 3.0;

28

29   •••   // more index calculations

30

31   if ( op_comma )

32      index += 5.0;

33

34   if ( mod_end ) /*  Print at end of each module  */

35      {

36      index /= mod_exec_lines;

37      printf( "Module %s:\n", mod_name() );

38      printf( "   operational density = %g\n, index );

39      }


Chapter 7: CodeCheck Rule Sets

 

 

7.1    Verifying POSIX.1 compliance

 

The rule file shown here is designed to flag non-POSIX features that are likely to occur in C programs written for Berkeley Unix (BSD version 4.3).

Rule file bsd43.cc

 

/*   BSD43.cc

 

     Copyright (c) 1992-95 by Abraxas Software. All rights reserved.

==============================================================================

     Purpose:  Flags BSD 4.3 features that are not POSIX.1 conforming.

     Author:      Loren Cobb.

     Revision: 12 October 1994.

                        16, March, 1994. Mask the message for the functions

                        which are class member functions and have the same

                        names in the list.

     Format:      Monospaced font with 4 spaces/tab.

==============================================================================

 

Abstract:

 

     These CodeCheck rules generate warning messages when BSD 4.3 features

     are used that are not POSIX conforming. If possible, these messages

     will suggest the appropriate POSIX feature to use.

    

     The features flagged in these rules are the ones that I know about as

     of the date on this rule file. If you know of others that ought to be

     flagged, please fax me the details at your earliest convenience. The

     Abraxas fax number is 503-244-8375. Many thanks in advance!

 

Warning Codes:

 

2000 Precede all headers with #define _POSIX_SOURCE.

2001 Replace function <BSD function> with <POSIX function>.

2002 Function <BSD function> has no POSIX equivalent.

2003 Function <BSD function> is not needed in POSIX.

2004 POSIX requires #include <POSIX header> for <function>.

2005 Replace <BSD header> with <POSIX header>.

2006 Replace <BSD macro> with <POSIX macro>.

2007 If you need an audible alarm, use \a instead of \07.

2008 Replace tag <BSD name> with <POSIX name>.

 

 

Suggested Actions:

 

2000 The macro _POSIX_SOURCE must be defined for POSIX headers to be

     read correctly. Define this macro at the top of every source file.

2001 Look up the replacement function in a POSIX reference - the return

     type and some arguments may differ from the BSD function, and a

     POSIX header may need to be included.

2002 You will need to hand-code a replacement function.

2003 No call to this function is needed in a standard POSIX environment.

2004 Insert the appropriate #include after #define _POSIX_SOURCE.

2005 Change the name of the BSD header to the appropriate POSIX header.

2006 Change the name of the BSD macro to the appropriate POSIX macro.

2007 If you need an audible alarm, change \07 to \a in the string.

2008 Change the name of the BSD tag to the appropriate POSIX tag.

 

 

Useful References:

 

Horton, Mark R. (1990) "Portable C Software." Published by Prentice-Hall,

        Englewood Cliffs, NJ 07632, USA.

 

IEEE (1988) "IEEE Standard Portable Operating System Interface for Computer

        Environments 1003.1-1988." Published by IEEE, 345 East 47th Street,

        New York, NY 10017, USA.

 

Lewine, Donald A. (1991) "POSIX Programmer's Guide."  Published by O'Reilly &

        Associates, 632 Petaluma Avenue, Sebastopol, CA 95472, USA.

 

Zlotnick, Fred (1991) "The POSIX.1 Standard." Published by Benjamin/Cummings,

        390 Bridge Parkway, Redwood City, CA 94065, USA.

 

==============================================================================

*/

 

#define     NEED_POSIX       2000

#define     REPLACE_FCN      2001

#define     NO_EQUIV         2002

#define     NOT_NEEDED       2003

#define     INCLUDE          2004

#define     REPLACE_HDR      2005

#define     REPLACE_CON      2006

#define     ALARM            2007

#define     REPLACE_TAG      2008

 

#define REPLACE(fname,gname)  if ( strcmp(idn_name(),fname) == 0 )       \

            {                                                                               \

            warn(REPLACE_FCN, "Replace " fname " with " gname ".");                  \

            ++items_flagged;                                                                \

            }

 

#define REWRITE(fname)  if ( strcmp(idn_name(),fname) == 0 )                         \

            {                                                                               \

            warn(NO_EQUIV, "Function " fname " has no POSIX equivalent.");      \

            ++items_flagged;                                                                \

            }

 

#define DELETE(fname)  if ( strcmp(idn_name(),fname) == 0 )                     \

            {                                                                               \

            warn(NOT_NEEDED, "Function " fname " is not needed in POSIX.");     \

            ++items_flagged;                                                                \

            }

 

#define CALLED(fname)  (strcmp(idn_name(),fname) == 0)

 

 

int     posix_needed,    //  1 if macro _POSIX_SOURCE has not yet been defined.

        unistd_needed,      //  1 if header unistd.h has not yet been included.

        items_flagged;      //  Number of non-POSIX features found in this module.

 

int     sys_dir_included,   //  1 if header sys/dir.h       has been #included.

        sys_time_included,  //  1 if header sys/time.h      has been #included.

        time_included,      //  1 if header time.h          has been #included.

        unistd_included;    //  1 if header unistd.h has been #included.

 

 

if ( mod_begin )

     {

     posix_needed     = 1;

     unistd_needed    = 1;

     items_flagged    = 0;

     sys_dir_included = 0;

     sys_time_included   = 0;

     time_included    = 0;

     unistd_included     = 0;

     }

 

if ( pp_macro )

     if ( strcmp(pp_name(), "_POSIX_SOURCE") == 0 )

        posix_needed = 0;

 

if ( header_name() )

     {

     if ( posix_needed )

        {

        warn( NEED_POSIX, "Precede all headers with #define _POSIX_SOURCE" );

        posix_needed = 0;

        ++items_flagged;

        }

 

     if ( unistd_needed )

        {

        if ( strcmp(header_name(), "unistd.h") != 0 )

            warn( INCLUDE, "POSIX recommends #include <unistd.h> before this line." );

        unistd_needed = 0;

        ++items_flagged;

        }

 

     if ( strcmp(header_name(),"sys/dir.h") == 0 )

        {

        warn( REPLACE_HDR, "Replace <sys/dir.h> with <dirent.h>." );     //     MRH:328

        ++items_flagged;

        }

     else if ( strcmp(header_name(),"sys/param.h") == 0 )

        {

        warn( REPLACE_HDR, "Replace <sys/param.h> with <unistd.h>." );

        ++items_flagged;

        }

     else if ( strcmp(header_name(),"sys/time.h") == 0 )

        {

        warn( REPLACE_HDR, "Replace <sys/time.h> with <time.h>." );

        ++items_flagged;

        sys_time_included = 1;

        }

     else if ( strcmp(header_name(),"unistd.h") == 0 )

        {

        unistd_included = 1;

        unistd_needed = 0;

        }

     else if ( strcmp(header_name(),"varargs.h") == 0 )

        {

        warn( REPLACE_HDR, "Replace <varargs.h> with <stdarg.h>." );

        ++items_flagged;

        }

     }

 

if (idn_function&&!idn_member)

     {

          REPLACE( "alloca"             , "malloc"                              )        //  DAL:566

     else REPLACE( "bcmp"               , "strncmp"                             )        //  DAL:566

     else REPLACE( "bcopy"              , "strncmp"                             )        //  DAL:566

     else REPLACE( "cuserid"            , "getlogin or getpwuid"   )            //  DAL:249

     else REPLACE( "ecvt"               , "sprintf"                             )        //  DAL:566

     else REPLACE( "fcvt"               , "sprintf"                             )        //  DAL:566

     else REPLACE( "flock"              , "fcntl"                               )        //  DAL:566

     else REPLACE( "gcvt"               , "sprintf"                             )        //  DAL:566

     else REPLACE( "getdtablesize"      , "sysconf"                             )        //  DAL:566

     else REPLACE( "getpw"              , "getpwent"                     )           //  DAL:566

     else REPLACE( "gettimeofday" , "localtime and time"           )            //  DAL:566

     else REPLACE( "getwd"              , "getcwd"                              )        //  DAL:566

     else REPLACE( "index"              , "strchr"                              )        //  DAL:566

     else REPLACE( "initstate"          , "srand"                               )        //  DAL:566

     else REPLACE( "ioctl"              , "[see a POSIX.1 book]"   )            //  DAL:566

     else REPLACE( "killpg"             , "kill"                                )        //  DAL:566

     else REPLACE( "mknod"              , "mkdir or mkfifo"              )           //  DAL:566

     else REPLACE( "mktemp"             , "tmpnam"                              )        //  DAL:487

     else REPLACE( "pclose"             , "close"                               )        //  DAL:566

     else REPLACE( "popen", "pipe, fdopen, fork, system, or wait"  )            //  DAL:566

     else REPLACE( "random"             , "rand"                                )        //  DAL:566

     else REPLACE( "rindex"             , "strrchr"                             )        //  DAL:567

     else REPLACE( "scandir"            , "readdir, malloc, qsort" )            //  DAL:567

     else REPLACE( "seekdir"            , "opendir, readdir"             )           //  DAL:567

     else REPLACE( "setbuffer"          , "setvbuf"                             )        //  DAL:567

     else REPLACE( "setitimer"          , "alarm"                               )        //  DAL:567

     else REPLACE( "setlinebuf"         , "setvbuf"                             )        //  DAL:567

     else REPLACE( "setregid"           , "setgid and setegid"           )           //  DAL:567

     else REPLACE( "setreuid"           , "setuid and setuegid"          )           //  DAL:567

     else REPLACE( "setstate"           , "srand"                               )        //  DAL:567

     else REPLACE( "sigblock"           , "sigprocmask"                         )        //  DAL:567

     else REPLACE( "signal"             , "sigaction"                    )           //  DAL:420

     else REPLACE( "sigpause"           , "sigsuspend"                          )        //  DAL:567

     else REPLACE( "sigsetmask"         , "sigprocmask"                         )        //  DAL:567

     else REPLACE( "sigvec"             , "sigpending"                          )        //  DAL:567

     else REPLACE( "srandom"            , "srand"                               )        //  DAL:567

     else REPLACE( "system"             , "[see a POSIX.2 book]"   )            //  DAL:470

     else REPLACE( "timezone"           , "localtime"                    )           //  DAL:567

     else REPLACE( "utimes"             , "utime"                               )        //  DAL:567

     else REPLACE( "valloc"             , "malloc"                              )        //  DAL:567

     else REPLACE( "vfork"              , "fork"                                )        //  DAL:567

     else REPLACE( "vhangup"            , "tcsetattr"                    )           //  DAL:567

     else REPLACE( "wait3"              , "waitpid"                             )        //  DAL:567

 

     else REWRITE( "bzero"              )            //  DAL:566

     else REWRITE( "cabs"               )            //  DAL:566

     else REWRITE( "ffs"                )            //  DAL:566

     else REWRITE( "gamma"              )            //  DAL:566

     else REWRITE( "getpass"            )            //  DAL:566

     else REWRITE( "hypot"              )            //  DAL:566

     else REWRITE( "insque"             )            //  DAL:566

     else REWRITE( "isascii"            )            //  DAL:566

     else REWRITE( "j0"                 )            //  DAL:566

     else REWRITE( "j1"                 )            //  DAL:566

     else REWRITE( "jn"                 )            //  DAL:566

     else REWRITE( "remque"             )            //  DAL:567

     else REWRITE( "y0"                 )            //  DAL:567

     else REWRITE( "y1"                 )            //  DAL:567

     else REWRITE( "yn"                 )            //  DAL:567

    

     else DELETE( "endgrent"            )            //  DAL:566

     else DELETE( "endpwent"            )            //  DAL:566

     else DELETE( "nice"                )            //  DAL:566

     else DELETE( "setgrent"            )            //  DAL:567

     else DELETE( "setpwent"            )            //  DAL:567

    

     if ( CALLED("asctime") && sys_time_included && (! time_included) )         //  DAL:217

        {

        warn( INCLUDE, "POSIX requires #include <time.h> for function asctime." );

        ++items_flagged;

        }

     }

 

if ( identifier("direct") ) //   MRH:328

     if ( sys_dir_included )

        {

        warn( REPLACE_TAG, "Replace tag \"direct\" with \"dirent\"." );

        ++items_flagged;

        }

 

if ( macro("L_INCR") )      //  DAL:351

     {

     warn( REPLACE_CON, "Replace L_INCR with SEEK_CUR." );

     ++items_flagged;

     }

 

if ( macro("L_SET") )    //  DAL:351

     {

     warn( REPLACE_CON, "Replace L_SET with SEEK_SET." );

     ++items_flagged;

     }

 

if ( macro("L_XTND") )      //  DAL:351

     {

     warn( REPLACE_CON, "Replace L_XTND with SEEK_END." );

     ++items_flagged;

     }

 

if ( macro("O_NDELAY") ) //  DAL:366

     {

     warn( REPLACE_CON, "Replace O_NDELAY with O_NONBLOCK." );

     ++items_flagged;

     }

 

if ( macro("SIGIOT") )      //  DAL:211

     {

     warn( REPLACE_CON, "Replace SIGIOT with SIGABRT." );

     ++items_flagged;

     }

 

if ( lex_num_escape == 7 )  //  DAL:377

     {

     warn( ALARM, "If you need an audible alarm, use \\a instead of \\07." );

     ++items_flagged;

     }

 

if ( mod_end )

     {

     printf( "\n*** %d non-POSIX BSD features were found in module %s ***\n\n", items_flagged, mod_name() );

     }


7.2    Compliance with Coding Standards

 

Almost every company that engages in C software development has its own internal standards for programmers. The rule file that follows encodes the actual stan­dards used by one such company. This rule file is used both by individual pro­gram­mers and by team leaders to monitor compliance with the company’s standards.

Rule file sample.cc

 

//   Copyright (c) 1988-93 by Abraxas Software. All rights reserved.

 

//  This file can be used by company programmers to monitor

//  their compliance with corporate standards.

 

//  The warnings are intended to be seen in the context of

//  a list file. Use:  check myproject.ccp -Rsample -L

 

//  Warning codes:  1000  lexical

//                  2000  keyword

//                  3000  preprocessor

//                  4000  declaration

//                  5000  general

 

//  These rules may serve as a basis for customizing your

//  own set of company standards for C code production.

 

 

#include <check.cch>

 

/**********  Lexical Rules  **********/

 

if ( lex_nonstandard )

     warn( 1001, "Nonstandard character." );

 

if ( lex_unsigned || lex_float )

     warn( 1002, "Do not use this suffix." );

 

if ( (lex_radix == 8) || (lex_radix == 16) )

     warn( 1003, "Do not use octal or hex constants." );

 

if ( lex_str_length > 509 )

     warn( 1005, "String literal too long for ANSI C." );

 

if ( lex_str_trigraph )

     warn( 1006, "ANSI C will recode the trigraph in this string." );

 

if ( lex_nested_comment )

     warn( 1007, "Do not nest comments." );

 

if ( op_low )

     {

     if ( ! op_white_before )

        warn( 1008, "Put space before operator %s.", token() );

     if ( ! op_white_after )

        warn( 1008, "Put space before operator %s.", token() );

     }

 

/**********  Keyword rules  **********/

 

if ( keyword("int") )

     if ( ! lex_macro_token )

        warn( 2001, "Use INTEGER, short, or long." );

 

if ( keyword("register") )

     if ( ! lex_macro_token )

        warn( 2002, "Use the REGISTER macro here." );

 

if ( keyword("volatile") )

     warn( 2003, "The \'volatile\' keyword is not portable." );

 

 

/**********  Preprocessor Rules  **********/

 

if ( pp_sub_keyword )

     warn( 3001, "Do not substitute preprocessor keywords." );

 

if ( pp_stack || pp_benign )

     warn( 3002, "Do not redefine macros without an #undef." );

 

if ( pp_include & 1 )

     warn( 3003, "#include macros are not allowed." );

 

if ( lex_not_manifest )

     warn( 3004, "Define this constant in a macro." );

 

if ( pp_arg_paren )

     warn( 3005, "Surround this argument with parentheses." );

 

if ( pp_comment )

     warn( 3006, "Surround this comment with space." );

 

if ( pp_white_before )

     warn( 3007, "The # must be in column 1 for portability." );

 

if ( pp_trailer )

     warn( 3008, "Place these tokens in a comment." );

 

if ( pp_if_depth > 8 )

     warn( 3009, "#if nesting > 3 is not be portable." );

 

if ( pp_include_depth > 8 )

     warn( 3010, "Include depth exceeds 8." );

 

if ( pp_pragma )

     warn( 3011, "Pragmas are generally not portable." );

 

if ( pp_arg_multiple )

     warn( 3012, "Possible undesired side-effects may occur here." );

 

if ( pp_assign )

     warn( 3013, "The \"=\" sign in this macro may be an error." );

 

if ( pp_keyword )

     warn( 3014, "Warning: this macro redefines a keyword." );

 

if ( pp_overload )

     warn( 3015, "This identifier conflicts with a macro function name." );

 

if ( pp_semicolon )

     warn( 3016, "Macro body ends with a semicolon." );

 

if ( pp_undef )

     warn( 3017, "Use #undef as seldom as possible." );

 

if ( pp_unstack )

     warn( 3018, "Undefining multiply-defined macros is not portable." );

 

 

/**********  Declaration Rules  **********/

 

if ( dcl_no_specifier )

     {

     if ( dcl_function )

        warn( 4001, "Explicit function return type required." );

     else

        warn( 4002, "Explicit type specifier required." );

     }

 

if ( dcl_bitfield_anon )

     warn( 4003, "All bitfields must have names." );

 

if ( dcl_union_init )

     warn( 4004, "Do not initialize unions." );

 

if ( dcl_auto_init )

     warn( 4005, "Do not initialize auto structs or arrays." );

 

if ( dcl_union_bits )

     warn( 4006, "Do not use bitfields in unions." );

 

if ( dcl_tag_def )

     if ( lin_source && (dcl_base == STRUCT_TYPE) )

        warn( 4007, "Define structures in header files only." );

 

if ( dcl_oldstyle )

     warn( 4008, "ALWAYS use prototypes." );

 

if ( idn_no_prototype )

     warn( 4009, "Function %s needs a prototype.", idn_name() );

 

if ( dcl_need_3dots )

     warn( 4010, "ANSI C requires 3 dots (...) here." );

 

if ( dcl_hidden )

     warn( 4011, "This declaration hides another." );

 

 

/**********  General Rules  **********/

 

if ( macro("offsetof") )

     warn( 5001, "Do not use offsets!" );


7.3    Porting to ANSI C

 

The differences between the de facto H&S standard of 1984-89 and the re­cently published ANSI standard have been described in a number of re­cent pub­li­ca­tions, the best of which are RJ88, HS88, and KR88. The rules given here for por­ta­tion from H&S to ANSI were developed from all of these sources. These rules are not complete, but they cover most of the major prob­lem areas.

Rule file ansi.cc

 

/*   ansi.cc

     Copyright (c) 1992-94 by Abraxas Software. All rights reserved.

==============================================================================

     Purpose:  Checks for compatibility with ANSI C standards.

     Author:      Loren Cobb.

     Revision: 12 October 1994.

     Format:      Monospaced font with 4 spaces/tab.

==============================================================================

 

Abstract:

 

     These CodeCheck rules check for compatibility with the ANSI C

     Standard. These rules are not comprehensive, but they do check

     for a great many of the troublesome areas of ANSI compliance.

    

     These rules are applied to all headers that are included in double

     quotes, e.g. #include "project.h". For proper use of these rules, be

     sure to include system headers in angle brackets: #include <ctypes.h>.

 

     DOS and OS/2 only: do not use the -K1 or -K0 options with these rules,

     as this will prevent CodeCheck from parsing (and detecting) non-ANSI

     keywords such as near, far, huge, pascal, etc.

 

Warning Codes:

 

8001 Rule file ansi.cc should not be run with option -K1 or -K2.

8002 Octal digits 8 and 9 are illegal in ANSI C.

8003 The long float type is illegal in ANSI C.

8004 ANSI C files must end with a newline character.

8005 String literal too long for ANSI C.

8006 ANSI C does not expand macros inside strings.

8007 ANSI C will recode the trigraph in this string.

8008 Too many parameters in this macro.

8009 This comment will not paste tokens in ANSI C.

8010 Header file nesting is too deep.

8011 This preprocessor usage is not allowed in ANSI C.

8012 ANSI C does not permit sizeof in directives.

8013 ANSI C does not parse tokens that follow a preprocessor directive.

8014 Not an ANSI preprocessor directive.

8015 This declaration is not valid.

8016 Only 31 chars are significant.

8017 Use double instead of long float.

8018 ANSI C requires 3 dots (...) here.

8019 Return type for this function should be specified.

8020 An explicit type is required in ANSI C declarations.

8021 Local static function declarations are not allowed in ANSI C.

8022 Zero-length arrays are illegal in ANSI C.

8023 This function has no prototype.

8024 Too many macros for some ANSI compilers.

8025 <specifier> is not an ANSI type specifier.

8026 For Apple "extended" type use ANSI "long double".

8027 Non-ANSI type modifier (pascal, cdecl, near, far, huge,

            interrupt, export, loadds, saveregs, based, or fastcall).

8028 Non-ANSI placement of const or volatile.

8029 Non-ANSI function type specifier (inline, virtual, pure, pascal,

            cdecl, interrupt, loadds, saveregs, fastcall, or export).

8030 Nested comments are not allowed in ANSI C.

8031 The preceding label is not attached to a statement.

8032 Array <name> must be declared extern.

8033 Binary constants are not permitted in ANSI C.

8034 <identifier>: Prefix <string> is reserved by ANSI.

8035 ANSI C does not use va_dcl.

8036 Macro va_start has two arguments in ANSI C.

8037 Replace header <header> with <header>.

8038 Replace function <name> with <name>.

8039 Replacement of preprocessor commands is not allowed in ANSI C.

8040 Empty initializers are not allowed in ANSI C.

 

==============================================================================

*/

 

#include <check.cch>

 

#define REPLACE_HDR(hdr1,hdr2)  if ( strcmp(header_name(),hdr1) == 0 )   \

            warn( 8037, "Replace header " hdr1 " with " hdr2 ".");

 

#define REPLACE_FCN(fname,gname)  if ( strcmp(op_function(),fname) == 0 ) \

            warn( 8038, "Replace " fname " with " gname ".");

 

int     ch,              //  Character that follows a prefix.

        k,               //  counter for dcl_level(k)

        non_ANSI_mod,    //  Non-ANSI type modifier flags

        varargs_included;   //   True if <varargs.h> was included.

 

 

if ( prj_begin )

     {

     /*

        Flags for non-ANSI type and pointer modifiers:

     */

     non_ANSI_mod = ~(CONST_FLAG + VOLATILE_FLAG);

 

     /*

        Make sure that extended keywords are allowed on DOS machines:

     */

#ifdef __MSDOS__

     if ( option('K') < 3 )

        {

        warn( 8001, "Rule file ansi.cc should not be run with -K1 or -K2." );

        }

#endif

     }

 

if ( lex_big_octal )

     warn( 8002, "Octal digits 8 and 9 are illegal in ANSI C." );

 

if ( lex_long_float )

     warn( 8003, "The long float type is illegal in ANSI C." );

 

if ( lex_nl_eof )

     warn( 8004, "ANSI C files must end with a newline character." );

 

if ( lex_str_length > 509 )

     warn( 8005, "String literal too long for ANSI C." );

 

if ( lex_str_macro )

     warn( 8006, "ANSI C does not expand macros inside strings." );

 

if ( lex_str_trigraph )

     warn( 8007, "ANSI C will recode the trigraph in this string." );

 

if ( pp_arg_count > 31 )

     warn( 8008, "Too many parameters in this macro." );

 

if ( pp_comment )

     warn( 8009, "This comment will not paste tokens in ANSI C." );

 

if ( pp_if_depth > 8 )

     warn( 8010, "Header file nesting is too deep." );

 

if ( pp_not_ansi )

     warn( 8011, "This preprocessor usage is not allowed in ANSI C." );

 

if ( pp_sizeof )

     warn( 8012, "ANSI C does not permit sizeof in directives." );

 

if ( pp_trailer )

     warn( 8013, "ANSI C does not parse tokens that follow a preprocessor directive." );

 

if ( pp_unknown )

     warn( 8014, "Not an ANSI preprocessor directive." );

 

if ( dcl_empty )

     if ( ! dcl_tag_def )

        warn( 8015, "This declaration is not valid." );

 

if ( dcl_ident_length > 31 )

     warn( 8016, "Only 31 chars are significant." );

 

if ( dcl_long_float )

     warn( 8017, "Use double instead of long float." );

 

if ( dcl_need_3dots )

     warn( 8018, "ANSI C requires 3 dots (...) here." );

 

if ( dcl_no_specifier )

     {

     if ( dcl_function )

        warn( 8019, "Return type for function %s should be specified.", dcl_name() );

     else

        warn( 8020, "An explicit type for %s is required in ANSI C.", dcl_name() );

     }

 

if ( dcl_static )

     if ( dcl_local && dcl_function )

        warn( 8021, "Local static function declarations are not allowed in ANSI C." );

 

if ( dcl_zero_array )

     warn( 8022, "Zero-length arrays are illegal in ANSI C." );

 

if ( idn_no_prototype )

     warn( 8023, "Function %s has no prototype.", idn_name() );

 

if ( mod_macros > 1024 )

     warn( 8024, "Too many macros for some ANSI compilers.");

 

if ( dcl_base == EXTRA_INT_TYPE )

     if ( lin_header != SYS_HEADER )

        warn( 8025, "non-ANSI integer type." );

 

if ( dcl_base == EXTRA_UINT_TYPE )

     if ( lin_header != SYS_HEADER )

        warn( 8025, "non-ANSI unsigned type." );

 

if ( dcl_base == EXTRA_FLOAT_TYPE )

     if ( lin_header != SYS_HEADER )

        warn( 8025, "non-ANSI float type." );

 

if ( dcl_base == EXTRA_PTR_TYPE )

     if ( lin_header != SYS_HEADER )

        warn( 8025, "non-ANSI pointer type." );

 

if ( dcl_storage_flags & GLOBAL_SC )

     if ( lin_header != SYS_HEADER )

        warn( 8025, "VAX globaldef and globalref are not ANSI type specifiers." );

 

if ( keyword("comp") )

     if ( lin_header != SYS_HEADER )

        warn( 8025, "Apple comp is not an ANSI type specifier." );

 

if ( keyword("extended") )

     if ( lin_header != SYS_HEADER )

        warn( 8026, "For Apple \"extended\" type use ANSI \"long double\"." );

 

if ( dcl_variable || dcl_function )

     if ( lin_header != SYS_HEADER )

        {

        k = 0;

        while ( k <= dcl_levels )      

            if ( dcl_level_flags(k++) & non_ANSI_mod )

               warn( 8027, "Non-ANSI type modifier." );

        }

 

if ( dcl_cv_modifier )

     if ( lin_header != SYS_HEADER )

        warn( 8028, "Non-ANSI placement of const or volatile." );

 

if ( dcl_function_flags )

     warn( 8029, "Non-ANSI function type specifier." );

 

if ( lex_nested_comment )

     {

     warn( 8030, "Nested comments are not allowed in ANSI C." );

 

 

//   If you want CodeCheck to assume that nested comments are okay as

//   soon as it finds the first such comment, then enable these 2 lines:

 

/*

     if ( option('N') == 0 )            //  Assume that the rest of this file

        set_option( 'N', 1 );           //  will contain nested comments too.

*/

     }

 

//  Although many modern compilers allow labels that are not attached

//  to any statement, e.g. at the end of a block, this is not allowed

//  by the ANSI standard.

 

if ( stm_bad_label )

     warn( 8031, "The preceding label is not attached to a statement." );

 

 

//  Detects local arrays that have no explicit dimension. Some

//  pre-ANSI compilers consider such arrays to be implicitly

//  external (i.e. the identifier has file scope and external

//  linkage). This interpretation is not allowed in ANSI C.

 

if ( dcl_level(0) == ARRAY )

     if ( dcl_local && (dcl_array_size == -1) && (! dcl_parameter) )

        if ( (! dcl_extern) && (! dcl_initializer) )

            warn( 8032, "Array %s must be declared extern.", dcl_name() );

 

 

//  Detect Zortech binary constants (e.g. 0b10101001):

 

if ( lex_radix == 2 )

     if ( lin_header != SYS_HEADER )

        warn( 8033, "Binary constants are not permitted in ANSI C." );

 

 

//  Check each external identifier for reserved prefixes:

 

if ( (dcl_global && ! dcl_static) || pp_macro )

     if ( lin_header != SYS_HEADER )

        {

        if ( prefix("E") )

            {

            ch = root()[0];

            if ( isdigit(ch) || isupper(ch) )

               warn( 8034, "%s: prefix E is reserved by ANSI.", dcl_name() );

            }

        if ( prefix("is") )

            {

            ch = root()[0];

            if ( islower(ch) )

               warn( 8034, "%s: prefix \"is\" is reserved by ANSI.", dcl_name() );

            }

        else if ( prefix("to") )

            {

            ch = root()[0];

            if ( islower(ch) )

               warn( 8034, "%s: prefix to is reserved by ANSI.", dcl_name() );

            }

        else if ( prefix("LC_") )

            {

            ch = root()[0];

            if ( isupper(ch) )

               warn( 8034, "%s: prefix LC_ is reserved by ANSI.", dcl_name() );

            }

        else if ( prefix("SIG") )

            {

            ch = root()[0];

            if ( isupper(ch) || (ch == '_') )

               warn( 8034, "%s: prefix SIG is reserved by ANSI.", dcl_name() );

            }

        else if ( prefix("mem") )

            {

            ch = root()[0];

            if ( islower(ch) )

               warn( 8034, "%s: prefix mem is reserved by ANSI.", dcl_name() );

            }

        else if ( prefix("str") )

            {

            ch = root()[0];

            if ( islower(ch) )

               warn( 8034, "%s: prefix str is reserved by ANSI.", dcl_name() );

            }

        else if ( prefix("wcs") )

            {

            ch = root()[0];

            if ( islower(ch) )

               warn( 8034, "%s: prefix wcs is reserved by ANSI.", dcl_name() );

            }

        }

 

if ( macro("va_dcl") )

     warn( 8035, "ANSI C does not use va_dcl." );

 

if ( macro("va_start") )

     if ( varargs_included )

        warn( 8036, "Macro va_start has two arguments in ANSI C." );

 

if ( header_name() )

     {

     if ( strcmp(header_name(),"varargs.h") == 0 )

        {

        warn( 8037, "Replace <varargs.h> with <stdargs.h>." );

        varargs_included = 1;

        }

     else REPLACE_HDR( "memory.h",      "string.h" )

     else REPLACE_HDR( "sys/times.h",   "time.h" )

     }

 

if ( op_call )

     {

         REPLACE_FCN( "cfree",          "free" )

     else REPLACE_FCN( "bcmp",          "strcmp" )

     else REPLACE_FCN( "bzero",         "memset" )

     else REPLACE_FCN( "strpos",        "strchr" )

     else REPLACE_FCN( "strrpos",       "strrchr" )

     else REPLACE_FCN( "mktemp",        "tmpnam" )

     }

 

if ( pp_sub_keyword )

     warn( 8039, "Replacement of preprocessor commands is not allowed in ANSI C." );

 

if ( exp_empty_initializer )

     warn( 8040, "Empty initializers are not allowed in ANSI C." );

 

 

 


7.4    Porting to Strict K&R Compilers

 

The differences between the de facto H&S standard of 1984-89 and the ori­g­inal Kernighan & Ritchie standard of 1978 were described by Har­bi­son & Steele in HS84. The rule set given here for porting from ANSI C to K&R C was de­veloped pri­marily from this source. These rules are not com­plete, but they cover most of the major problem areas.

Rule file toKR.cc

 

 

/*   Copyright (c) 1989-92 by Abraxas Software.

 *

 *   This is a collection of CodeCheck rules for

 *   testing for portability to strict K&R C compilers.

 */

 

 

#include <check.cch>

 

if ( identifier("entry") )

     warn( 7001, "\"entry\" is a reserved keyword in K&R." );

 

if ( lex_not_KR_escape )

     warn( 7002, "This escape sequence is not defined in K&R." );

 

if ( lex_backslash )

     warn( 7003, "This line continuation is not allowed in K&R." );

 

if ( lex_float || lex_unsigned || lex_long_float )

     warn( 7004, "This suffix is not allowed in K&R." );

 

if ( (lex_radix == 16) || lex_hex_escape )

     warn( 7005, "Hexadecimal numbers are not defined in K&R." );

 

if ( lex_str_macro )

     warn( 7006, "K&R compilers do not recognize macros in strings." );

 

if ( lex_str_concat )

     warn( 7007, "Implicit string concatenation not defined in K&R." );

 

if ( lex_trigraph )

     warn( 7008, "Trigraphs are not defined in K&R." );

 

if ( lex_wide )

     warn( 7009, "Wide chars and strings are not defined in K&R." );

 

if ( pp_error )

     warn( 7010, "The #error directive is not defined in K&R." );

 

if ( pp_paste )

     warn( 7011, "The # (paste) operator is not defined in K&R." );

 

if ( pp_pragma )

     warn( 7012, "The #pragma directive is not defined in K&R." );

 

if ( pp_stringize )

     warn( 7013, "The ## preprocessor operator not defined in K&R." );

 

if ( pp_unknown)

     warn( 7014, "This preprocessor directive is not defined in K&R." );

 

if ( pp_defined )

     warn( 7015, "The \"defined\" function is not defined in K&R." );

 

if ( pp_elif )

     warn( 7016, "The #elif directive is not defined in K&R." );

 

if ( keyword("const") )

     warn( 7017, "The \"const\" type is not defined in K&R." );

 

if ( keyword("volatile") )

     warn( 7018, "The \"volatile\" type is not defined in K&R." );

 

if ( keyword("signed") )

     warn( 7019, "The \"signed\" type is not defined in K&R." );

 

if ( dcl_function )

     if ( ! dcl_oldstyle )

        warn( 7020, "Prototypes are not allowed in K&R." );

 

if ( dcl_3dots )

     warn( 7021, "The 3-dot notation is not allowed in K&R." );

 

if ( dcl_need_3dots )

     warn( 7022, "Comma after argument list is not allowed in K&R." );

 

if ( keyword("enum") )

     warn( 7023, "The enumerated type is not permitted in K&R." );

 

if ( keyword("void") )

     warn( 7024, "The void type is not permitted in K&R." );

 

if ( dcl_union_init )

     warn( 7025, "Unions cannot be initialized in K&R." );

 

if ( lin_nested_comment )

     warn( 7026, "Nested comments are not permitted in K&R." );

 

if ( lex_cpp_comment )

     warn( 7027, "The // comment is not permitted in K&R." );

 

if ( dcl_extern_ambig )

     warn( 7028, "%s matches another name on 6 chars.", dcl_name() );


7.5    Measuring Code Complexity

 

The McCabe Cyclomatic Complexity measure, described in Section 6.2.2, is one of the most commonly used measures of code complexity. Here is a sam­ple CodeCheck rule file that calculates the McCabe and other measures of com­plex­ity for every function in a project.

Rule file complex.cc

 

/*

 *   Copyright (c) 1989-90 by Abraxas Software.

 *

 *   This is the start of a collection of CodeCheck rules

 *   for measuring function complexity and operator density.

 *

 *   NOTE: A CodeCheck bug (corrected in version 2.07) caused

 *   the McCabe measure calculated by these rules to be under-

 *   estimated: certain decision points were not counted.

 *

 */

 

 

statistic int  McCabe;

statistic float   density;

 

if ( prj_begin )

     {

     printf( "\n================== %s ================\n", prj_name() );

     printf( "\nDate: %s.\n\n", time_stamp() );

     printf( "Complexity => McCabe's Cyclomatic Complexity.\n" );

     printf( "Density    => Operators per executable line.\n" );

     printf( "Asterisks  => Function is too complex.\n" );

     }

 

if ( mod_begin )

     {

     printf( "\n\n--------------- %s ---------------\n\n", mod_name() );

     printf( "FUNCTION       Complexity    Density\n" );

     reset( fcn_decisions );

     reset( fcn_operators );

     reset( fcn_exec_lines );

     reset( McCabe );

     reset( density );

     }

 

if ( fcn_end )

     {

     McCabe = 1 + fcn_decisions;

     printf( "%-16s %3d ", fcn_name(), McCabe );

     if ( McCabe >= 30 )

        printf( "*** " );

     else if ( McCabe >= 20 )

        printf( "**  " );

     else if ( McCabe >= 10 )

        printf( "*   " );

     else

        printf( "    " );

    

     if ( fcn_exec_lines > 0 )

        density = (1.0*fcn_operators) / fcn_exec_lines;

     else

        density = 0.0;

     printf( "%9.1f\n", density );

     }

 

if ( mod_end )

     if ( ncases(fcn_exec_lines) > 0 )

        {

        printf("\nFunction Density (operators per executable line):\n");

        printf( "  Mean:    %6.2f\n", mean(density) );

        printf( "  Std.Dev: %6.2f\n", stdev(density) );

    

        printf( "\nFunction Complexity (McCabe):\n" );

        printf( "  Mean:    %6.2f\n", mean(McCabe) );

        printf( "  Std.Dev: %6.2f\n", stdev(McCabe) );

        printf( "  Maximum: %6.2f\n", maximum(McCabe) );

        printf( "  Histogram:\n" );

        histogram( McCabe, 0, 20, 21 );

        printf( "\n" );

        }

 

if ( prj_end )

     printf( "\n\n================== END ==================\n\n" );


7.6    Verifying the Order of Module Elements

 

Many software companies have as part of their coding standards a specific ordering for the elements of every source file, e.g. first external declarations, then macros, type definitions, static declarations, and finally, function defini­tions. It is not difficult to build a rule file that checks for violations of these orderings. The following rules illustrate one method in use at a major soft­ware house:

Rule file order.cc

 

 

//   These CodeCheck rules illustrate one way of enforcing

//   a standard order upon the elements of a C source file.

 

//   Copyright (c) 1991 by Abraxas Software. All rights reserved.

//   Author: Loren Cobb, Abraxas Software, February 1991.

 

//   This file detects violations of the following order:

 

//   1.  Header file #include directives.

//   2.  External variable declarations.

//   3.  Function declarations.

//   4.  Macro definitions.

//   5.  Type definitions.

//   6.  Public variable definitions (external linkage).

//   7.  Private variable definitions (static identifiers).

//   8.  Function definitions.

 

 

///////////////////////////////

//   Module Initialization:  //

///////////////////////////////

 

int  stage;        //  Module format stage (range: 0-8)

 

if ( mod_begin )

  stage = 0;       //  Reinitialize stage at start of every module.

 

 

/////////////////////////////

//   Module Format Rules:  //

/////////////////////////////

 

if ( pp_include )

  {

  if ( (stage > 1) && (stage != 4) )

     warn( 1001, "This #include of file %s is out of sequence.", header_name() );

  stage = 1;

  }

 

if ( dcl_global )

  if ( lin_source )

     {

     if ( dcl_function )

       {

       if ( dcl_definition )

          stage = 8;

       else

          {

          if ( stage > 3 )

            warn( 1003, "Function %s is out of sequence.", dcl_name() );

          stage = 3;

          }

       }

     else if ( dcl_extern )

       {

       if ( stage > 2 )

          warn( 1002,"Declaration of %s is out of sequence.",dcl_name());

       stage = 2;

       }

     else if ( dcl_static )

       {

       if ( stage > 7 )

          warn( 1007, "Variable %s is out of sequence.", dcl_name() );

       stage = 7;

       }

     else

       {

       if ( stage > 6 )

          warn( 1006, "Variable %s is out of sequence.", dcl_name() );

       stage = 6;

       }

     }

 

if ( pp_macro )

  if ( lin_source )

     {

     if ( stage > 4 )

       warn( 1004, "This macro definition is out of sequence." );

     stage = 4;

     }

 

if ( dcl_typedef )

  if ( lin_source )

     {

     if ( stage > 5 )

       warn( 1005, "Typename %s is out of sequence.", dcl_name() );

     stage = 5;

     }


7.7    C++ Rules

 

There have been several attempts to codify good C++ programming prac­tices in the form of standards for C++ source code, but the definitive work in this area is yet to be written. In the interim, here are a few rules for C++ code that can serve as a basis for further devel­opment:

Rule file cplus.cc

 

//  Copyright (c) 1992-93 by Abraxas Software.

 

//  This is a collection of CodeCheck rules

//  for elementary checking of C++ source code.

 

 

#include <check.cch>

 

if ( mod_begin )

     if ( (option('K') < 4) || (option('K') > 7) )

        {

        warn( 1000, "Use C++ for this file! (option -K4)." );

        }

 

//////////////////////////////////////////////////////////////////////////

//  1. Declare a virtual destructor in every base class that

//     has a virtual function. (Stroustrup & Ellis, p. 278)

 

int     has_virtual_function,

        has_virtual_destructor;

 

if ( tag_begin )

     {

     if ( tag_kind > 1 && ! tag_nested && ! tag_local )

        {

        has_virtual_function = 0;

        has_virtual_destructor = 0;

        }

     }

 

if ( dcl_virtual )

     {

     if ( dcl_base == DESTRUCTOR_TYPE )

        has_virtual_destructor = 1;

     else if ( dcl_base != CONSTRUCTOR_TYPE )

        has_virtual_function = 1;

     }

 

if ( tag_end )

     {

     if ( tag_kind > 1 && ! tag_nested && ! tag_local )

        {

        if ( has_virtual_function && ! has_virtual_destructor )

            warn( 2007, "This class needs a virtual destructor." );

        has_virtual_function = 0;

        has_virtual_destructor = 0;

        }

     }

 

//////////////////////////////////////////////////////////////////////////

//   2. Use const variables or enumerated constants

//      instead of preprocessor (#define) constants.

 

if ( pp_const )

     warn( 1001, "Do not use the preprocessor to define constants." );

 

//////////////////////////////////////////////////////////////////////////

//  3. A nested tag name should have an explicit scope.

 

if ( lex_invisible )

     warn( 1002, "Nested tag name %s should be scoped.", token() );

 

//////////////////////////////////////////////////////////////////////////

//  4. In C++ the const specifier implies internal linkage,

//     so either specify extern or initialize the constant.

 

if ( dcl_simple )

     if ( dcl_level_flags(0) & CONST_FLAG )

        if ( ! dcl_parameter && ! dcl_initializer && ! dcl_extern )

            warn( 2001, "Initializer or extern specifier needed." );

 

if ( dcl_level(0) == POINTER )

     if ( dcl_level_flags(0) & CONST_FLAG )

        if ( ! dcl_parameter && ! dcl_initializer && ! dcl_extern )

            warn( 2001, "Initializer or extern specifier needed." );

 

//////////////////////////////////////////////////////////////////////////

//  5. C++ variables should ALWAYS have an explicit type.

 

if ( dcl_no_specifier )

     {

     if ( dcl_function )

        warn( 2002, "%s: Explicit return type recommended.", dcl_name() );

     else

        warn( 2002, "%s: Explicit type specifier needed.", dcl_name() );

     }

 

//////////////////////////////////////////////////////////////////////////

//  6. Default function parameters are much safer than

//     variable argument lists, so use them!

 

if ( dcl_3dots )

     warn( 2003, "Use default parameters, not variable argument lists." );

 

//////////////////////////////////////////////////////////////////////////

//  7. Avoid gratuitous overloading of names in C++.

 

if ( dcl_hidden )

     warn( 2004, "Identifier %s is already in use.", dcl_name() );

 


 

 

7.8    Advanced C++ Rules

 

This rule represents sample in-house quality control standard for C++ that can be used by a typical software team.

Rule File: xyz.cc

 

/*   xyzrule.cc   //  Copyright (c) 1992-96 by Abraxas Software.

    

==============================================================================

     Purpose:  Checks for compatibility with XYZ C++ standards.

    

==============================================================================

 

Abstract:

 

     These CodeCheck rules check for compatibility with the XYZ C++

     Coding Rules.

 

     These rules are applied to all headers that are included in double

     quotes, e.g. #include "project.h". For proper use of these rules, be

     sure to include system headers in angle brackets: #include <ctypes.h>.

 

Warning Codes:

 

9111 Header filename <name> is not in DOS format.

9121 Class names must begin with XYZxxx_

9131 Private member name <name> must not begin in uppercase.

9131 Private member name <name> must end with an underscore.

9211 Definition of class <name> belongs in a header file.

9212 File <name> needs a leading comment block.

9213 Header file <name> should be wrapped in an #ifndef.

9213 <name> is not the correct wrapper name for this file.

9214 Definition of function <name> must NOT be in a header file.

9221 Public section must come first in class <name>.

9222 Data member <name> of class <name> must be private.

9231 Function <name> is too long to be inlined.

9232 Do NOT use inline within a class definition.

9251 Class <name> needs a default constructor.

9251 Class <name> needs a copy constructor.

9252 Class <name> needs an operator=().

9253 Class <name> has too many constructors (limit is 3).

9254 Class <name> needs a destructor.

9255 Destructor for class <name> must be virtual.

9261 Operator <name> should not be a friend.

9261 Do NOT declare friend functions.

9261 Do NOT declare friend classes.

9271 Do NOT use virtual base classes.

9281 Declare <name> using a typedef name, not a basic C type.

9282 Define typedef name <name> in a base class.

9293 Define enum <name> in a base class.

9311 Declare parameter <name> to be a reference to <type>.

9312 Operator <name> should not return an object.

9314 Function <name> should not return an object.

9315 Reference parameters must come first.

9316 Constant member functions should be avoided.

9317 Parameter <name> should not be const.

9411 Use // comments, not oldstyle C comments.

9421 Declare <name> as a const, not a macro.

9422 This enumeration needs an enum type name.

9431 Global constant <name> should be a class member.

9441 Function <name> needs an explicit return type.

 

==============================================================================

*/

 

#include <check.cch>

 

#define DOT       ('.')     //   period character

#define SLASH     ('/')     //   forward slash character

#define BACKSLASH ('\\')    //   backslash character

#define COLON     (':')     //   colon character

#define TILDE     ('~')     //   tilde character

#define UNDERSCORE ('_')     //   underscore character

 

#define PUBLIC    0

#define PROTECTED 1

#define PRIVATE       2

 

 

int     ch, j, k, okay, level,

        is_constant, is_object,

        lin_if_depth,    //  Holds latest value of pp_if_depth.

        comment_needed,     //   True until a header file's comment-block is found.

        define_needed,      //   True until a header file's wrapper macro is #defined.

        public_needed,      //   True when a class definition begins.

        one_liner_needed,   //   True when a function is explicitly inlined.

        no_wrap_message, //  True if a wrapper-needed message has NOT been given.

        class_has_virtual_function,

        destructor_is_virtual,

        non_ref_parm_found, //   True if a non-reference fcn parameter has been found.

        detect_virtual_base,

        bad_name,        //  True if the wrapper name is wrong.

        length, ch1, ch2;

 

 

 

// ---------- Rule 1.1.1 ----------

 

if ( header_name() )     // A header is about to be opened

     {

     if ( pp_include < 3 )  //   Header filename is in double quotes

        {

        j = 0;

        k = 0;

        ch = header_name()[k++];

        while ( ch != 0 )

            {

            ch = header_name()[k++];

            if ( ch == DOT )

               {

               break;    // beginning of extension found

               }

            else

               {

               j = k;    // count characters before dot

               }

            if ( ch == BACKSLASH || ch == COLON )

               {

               j = 0;    // DOS directory or disk marker

               }

            else if ( ch == SLASH || ch == TILDE )

               {

               j = 0;    // Unix directory or disk marker

               }

            }

        if ( j > 8 )

            {

            warn( 9111, "Header filename %s is not in DOS 8.3 format.",

                  header_name() );

            }

        else if ( ch == DOT )

            {

            j = 0;

            ch = header_name()[k+j];

            while ( ch != 0 )

               {

               j++;

               ch = header_name()[k+j]; //     count chars after dot.

               }

            if ( j > 3 )

               {

               warn( 9111, "Header filename %s is not in DOS 8.3 format.",

                      header_name() );

               }

            }

        }

     }

 

 

 

// ---------- Rule 1.2.1 ----------

 

if ( tag_begin )

     {

     // Apply this rule to global classes only:

 

     if ( tag_global && (tag_kind == CLASS_TAG) )

        {

        if ( ! prefix("XYZ") )

            warn( 9121, "Class names must begin with \"XYZ_\"" );

        else if ( strstr(tag_name(), "_") == 0 )

            warn( 9121, "Use an underscore after the class name prefix." );

        }

     }

 

 

 

// ---------- Rule 1.3.1 ----------

 

if ( dcl_member == 3 )

     {

     if ( dcl_access == PRIVATE )

        {

        if ( dcl_variable )

            {

            if ( isupper(dcl_name()[0]) )

               warn( 9131, "Private data member name %s must not begin in uppercase.",

                      dcl_name() );

            if ( ! suffix("_") )

               warn( 9131, "Private data member name %s must end with an underscore.",

                      dcl_name() );

            }

        else if ( dcl_function )

            {

            if ( isupper(dcl_name()[0]) )

               warn( 9131, "Private function name %s must not begin in uppercase.",

                      dcl_name() );

            if ( ! suffix("_") )

               warn( 9131, "Private function name %s must end with an underscore.",

                      dcl_name() );

            }

        }

     }

 

 

 

// ---------- Rule 2.1.1 ----------

 

if ( tag_begin )

     {

     if ( tag_global && (tag_kind == CLASS_TAG) )

        {

        if ( lin_source )

            warn( 9211, "Definition of class %s belongs in a header file.",

                  tag_name() );

        }

     }

 

 

 

// ---------- Rules 2.1.2 ----------

 

if ( mod_begin )

     {

     comment_needed = FALSE;

     }

 

if ( lin_end )

     {

     if ( lin_number == 1 )

        {

        if ( lin_is_comment )

            {

            comment_needed = FALSE;

            }

        else if ( lin_header )

            {

            comment_needed = TRUE;

            }

        }

       

     if ( comment_needed )

        {

        if ( lin_is_comment )

            {

            comment_needed = FALSE;

            }

        else if ( lin_preprocessor )

            {

            warn( 9212, "File %s needs a leading comment block.",

                  file_name() );

            comment_needed = FALSE;

            }

        else if ( lin_has_code )

            {

            warn( 9212, "File %s needs a leading comment block.",

                  file_name() );

            comment_needed = FALSE;

            }

        }

     }

 

 

 

// ---------- Rule 2.1.3 ----------

 

if ( pp_if_depth || pp_endif )

     {

     lin_if_depth = pp_if_depth;

     }

 

if ( pp_include )

     {

     if ( lin_header )

        {

        if ( no_wrap_message )

            {

            warn( 9213, "Header file %s should be wrapped in an #ifndef.",

                  file_name() );

            }

        }

 

     no_wrap_message = TRUE;

     define_needed = TRUE;

     }

 

if ( lin_dcl_count )

     {

     if ( lin_header && lin_if_depth == 0 )

        {

        if ( no_wrap_message )

            {

            warn( 9213, "Header file %s should be wrapped in an #ifndef.",

                  file_name() );

            no_wrap_message = FALSE;

            define_needed = FALSE;

            }

        }

     }

 

if ( pp_macro )

     {

     if ( lin_header )

        {

        if ( no_wrap_message && lin_if_depth == 0 )

               {

               warn( 9213, "Header file %s should be wrapped in an #ifndef.",

                      file_name() );

               no_wrap_message = FALSE;

               define_needed = FALSE;

               }

        if ( define_needed && lin_if_depth == 1 )

            {

            bad_name = FALSE;

            length = strlen( pp_name() );

            k = 0;

            while ( k < length )

               {

               ch1 = pp_name()[k];

               ch2 = file_name()[k];

               k++;

               if ( isalpha(ch2) )

                  {

                  ch2 = toupper( ch2 );

                  }

               else if ( ch2 == DOT )

                  {

                  ch2 = UNDERSCORE;

                  }

               if ( ch1 != ch2 )

                  {

                  warn( 9213, "%s is not the correct wrapper name for this file.",

                         pp_name() );

                  break;

                  }

               }

            }

        define_needed = FALSE;               

        }

     }

 

 

 

// ---------- Rule 2.1.4 ----------

 

if ( dcl_function )

     {

     if ( dcl_member && dcl_definition )

        {

        if ( ! lin_source )

            {

            warn( 9214, "Definition of function %s must NOT be in a header file.",

                  dcl_name() );

            }

 

        one_liner_needed = dcl_inline;        //     for Rule 2.3.2

        }

     }

 

 

 

// ---------- Rule 2.2.1 ----------

 

if ( tag_begin )

     {

     if ( tag_global && (tag_kind == CLASS_TAG) )

        {

        public_needed = TRUE;

        }

     }

 

if ( dcl_member )

     {

     if ( dcl_access )      //   true if protected or private member

        {

        if ( public_needed )

            {

            warn( 9221, "Public section must come first in class %s.",

                  class_name() );

            }

        }

     public_needed = FALSE;

     }

 

 

 

// ---------- Rule 2.2.2 ----------

 

if ( dcl_member )

     {

     if ( dcl_variable && (dcl_access != PRIVATE) )

        {

        warn( 9222, "Data member %s of class %s must be private.",

               dcl_name(), class_name() );

        }

     }

 

 

 

// ---------- Rule 2.3.1 ----------

 

if ( fcn_exec_lines > 1 )

     {

     if ( one_liner_needed )

        {

        warn( 9231, "Function %s is too long to be inlined.",

               fcn_name() );

        }

     }

 

 

 

// ---------- Rule 2.3.2 ----------

 

 

if ( dcl_inline )

     {

     if ( lin_within_class == 1 )

        {

        warn( 9232, "Do NOT use inline within a class definition. (%d)", lin_within_class );

        }

     }

 

 

 

// ---------- Rule 2.4.1 ----------

 

//   *** This needs a new trigger in CodeCheck: dcl_override. ***

 

 

 

// ---------- Rules 2.5.1, 2.5.2, and 2.5.4 ----------

 

if ( tag_end )

     {

     if ( tag_global && (tag_kind == CLASS_TAG) )

        {

        if ( ! tag_has_default )

            {

            warn( 9251, "Class %s needs a default constructor.", class_name() );

            }

        if ( ! tag_has_copy )

            {

            warn( 9251, "Class %s needs a copy constructor.", class_name() );

            }

        if ( ! tag_has_assign )

            {

            warn( 9252, "Class %s needs an operator=(%s&).", class_name(),

                  class_name() );

            }

        }

     }

 

 

 

// ---------- Rule 2.5.3 ----------

 

if ( tag_constructors > 3 )

     {

     warn( 9253, "Class %s has too many constructors (limit = 3).",

            class_name() );

     }

 

 

 

// ---------- Rules 2.5.4 and 2.5.5 ----------

 

if ( tag_begin )

     {

     if ( tag_global && (tag_kind == CLASS_TAG) )

        {

        class_has_virtual_function = FALSE;

        destructor_is_virtual = FALSE;

        }

     }

 

if ( dcl_function )

     {

     if ( dcl_virtual )

        {

        if ( dcl_base == DESTRUCTOR_TYPE )

            {

            destructor_is_virtual = TRUE;

            }

        else

            {

            class_has_virtual_function = TRUE;

            }

        }

     }

 

if ( tag_end )

     {

     if ( tag_global && (tag_kind == CLASS_TAG) )

        {

        if ( tag_has_destr )

            {

            if ( class_has_virtual_function )

               {

               if ( ! destructor_is_virtual )

                  {

                  warn( 9255, "Destructor %s::~%s() must be virtual.",

                         class_name(), class_name() );

                  }

               }

            }

        else

            {

            warn( 9254, "Class %s needs a destructor.", class_name() );

            }

        }

     }

 

 

 

// ---------- Rules 2.6.1 and 2.6.2 ----------

 

if ( dcl_friend )

     {

     if ( dcl_function )

        {

        if ( prefix("operator") )

            {

            if ( strequiv(root(), "=") )

               ;

            else if ( strequiv(root(), "+") )

               ;

            else if ( strequiv(root(), "-") )

               ;

            else if ( strequiv(root(), "*") )

               ;

            else if ( strequiv(root(), "/") )

               ;

            else if ( strequiv(root(), "%") )

               ;

            else if ( strequiv(root(), ">>") )

               ;

            else if ( strequiv(root(), "<<") )

               ;

            else

               warn( 9261, "%s should not be a friend.", dcl_name() );

            }

        else

            warn( 9261, "Do NOT declare friend functions." );

        }

     else

        {

        warn( 9262, "Do NOT declare friend classes." );

        }

     }

 

 

 

// ---------- Rule 2.7.1 ----------

 

if ( keyword("class") )

     {

     detect_virtual_base = TRUE;

     }

 

if ( keyword("virtual") )

     {

     if ( detect_virtual_base )

        {

        warn( 9271, "Do NOT use virtual base classes." );

        }

     }

 

if ( dcl_base == CLASS_TYPE )

     {

     detect_virtual_base = FALSE;

     }

 

 

 

// ---------- Rule 2.8.1 ----------

 

if ( dcl_member )

     {

     if ( dcl_variable && dcl_base != DEFINED_TYPE )

        {

        warn( 9281, "Declare %s using a typedef name, not a basic C type.",

               dcl_name() );

        }

     }

 

 

 

// ---------- Rule 2.8.2 ----------

 

if ( dcl_typedef )

     {

     if ( dcl_member == 0 )

        {

        warn( 9282, "Define typedef name %s in a base class.",

               dcl_name() );

        }

     }

 

 

 

// ---------- Rule 2.9.1 ----------

 

//   This rule is not currently enforceable with CodeCheck.

 

 

 

// ---------- Rule 2.9.2 ----------

 

//   This rule is redundant (covered by 3.1.2).

 

 

 

// ---------- Rule 2.9.3 ----------

 

if ( tag_kind == ENUM_TAG )

     {

     if ( tag_global )

        {

        warn( 9293, "Define enum %s in a base class.", tag_name() );

        }

     }

 

 

 

// ---------- Rule 3.1.1 ----------

 

if ( dcl_parameter )

     {

     is_object = ((dcl_base == CLASS_TYPE) || (dcl_base == STRUCT_TYPE));

     if ( is_object )

        {

        if ( (dcl_levels == 0) || ((dcl_levels == 1) && (dcl_level(0) == POINTER)) )

            {

            if ( dcl_abstract )

               {

               warn( 9311, "Declare parameter #%d to be a reference to %s.",

                      dcl_parameter, dcl_base_name() );

               }

            else

               {

               warn( 9311, "Declare parameter %s to be a reference to %s",

                      dcl_name(), dcl_base_name() );

               }

            }

        }

     }

 

 

 

// ---------- Rules 3.1.2 and 3.1.4 ----------

 

if ( dcl_function )

     {

     is_object = ((dcl_base == CLASS_TYPE) || (dcl_base == STRUCT_TYPE));

     if ( dcl_member && is_object )

        {

        if ( (dcl_levels == 1) || ((dcl_levels == 2) && (dcl_level(1) == REFERENCE)) )

            {

            if ( prefix("operator") )

               {

               if ( strequiv(root(), "=") )

                  ;

               else if ( strequiv(root(), "+") )

                  ;

               else if ( strequiv(root(), "-") )

                  ;

               else if ( strequiv(root(), "*") )

                  ;

               else if ( strequiv(root(), "/") )

                  ;

               else if ( strequiv(root(), "%") )

                  ;

               else if ( strequiv(root(), ">>") )

                  ;

               else if ( strequiv(root(), "<<") )

                  ;

               else

                  {

                  warn( 9312, "%s::%s() should not return an object.",

                         class_name(), dcl_name() );

                  }

               }

            else

               {

               warn( 9314, "Function %s::%s() should not return an object.",

                      class_name(), dcl_name() );

               }

            }

        }

     }

 

 

 

// ---------- Rule 3.1.3 ----------

 

//   This rule cannot be enforced with this version of CodeCheck

 

 

 

// ---------- Rule 3.1.5 ----------

 

if ( dcl_parameter )

     {

     if ( dcl_parameter == 1 )

        {

        non_ref_parm_found = FALSE;

        }

       

     if ( dcl_level(0) == REFERENCE )

        {

        if ( non_ref_parm_found )

            {

            warn( 9315, "Reference parameters must come first." );

            }

        }

     else   // a non-reference fcn parameter has been found

        {

        non_ref_parm_found = TRUE;

        }

     }

 

 

 

// ---------- Rule 3.1.6 ----------

 

if ( dcl_function )

     {

     is_constant = (dcl_level_flags(0) & CONST_FLAG);

     if ( dcl_member && is_constant )

        {

        warn( 9316, "Constant member functions should be avoided." );

        }

     }

 

 

 

// ---------- Rule 3.1.7 ----------

 

if ( dcl_parameter )

     {

     is_object = ((dcl_base == CLASS_TYPE) || (dcl_base == STRUCT_TYPE));

     if ( is_object && (dcl_levels == 1) )

        {

        is_constant = (dcl_level_flags(1) & CONST_FLAG);

        level = dcl_level( 0 );

        if ( is_constant && ((level == POINTER) || (level == REFERENCE)) )

            {

            if ( dcl_abstract )

               {

               warn( 9317, "Parameter #%d should not be const.",

                      dcl_parameter );

               }

            else

               {

               warn( 9317, "Parameter %s should not be const.", dcl_name() );

               }

            }

        }

     }

 

 

 

// ---------- Rule 4.1.1 ----------

 

if ( lex_c_comment )

     {

     warn( 9411, "Use /\/ comments, not \/\*...*\/ comments." );

     }

 

 

 

// ---------- Rule 4.2.1 ----------

 

if ( pp_const )

     {

     if ( lin_header && ! define_needed )

        {

        warn( 9421, "Declare %s as a const, not a macro.", pp_name() );

        }

     }

 

 

 

// ---------- Rule 4.2.2 ----------

 

if ( tag_anonymous )

     {

     if ( tag_global && tag_kind == ENUM_TAG )

        {

        warn( 9422, "This enumeration needs an enum type name." );

        }

     }

 

 

 

// ---------- Rule 4.3.1 ----------

 

if ( dcl_variable )

     {

     is_constant = (dcl_level_flags(dcl_levels) & CONST_FLAG);

     if ( dcl_global && is_constant )

        {

        warn( 9431, "Global constant %s should be a class member.",

               dcl_name() );

        }

     }

 

 

 

// ---------- Rule 4.4.1 ----------

 

if ( dcl_no_specifier )

     {

     if ( dcl_function )

        {

        warn( 9441, "Function %s needs an explicit return type.",

               dcl_name() );

        }

     }

 


Chapter 8:  Supporting Material

 

 

8.1    Glossary

 

abstract declarator         A declarator without an identifier. Abstract declarators are only used in two situations in C: in casts, and as the ar­gu­ment for sizeof.

ANSI                              American National Standards Institute.

ASCII                             American Standard Code for Information Interchange. A 7-bit cod­ing scheme for the binary representation of al­pha­betic, nu­meric, and page-for­matting infor­mation. ASCII is used by al­most all com­puters except IBM main­frames and their compat­ibles, which use EBCDIC (al­though IBM mi­crocomputers and their compatibles do not).

big endian                       A computer is “big endian” (as opposed to “little en­dian”) if the low-order (“end”) byte in a word has a larger address than the high-order byte. The Motorola 68000 family is big endian, while the VAX and Intel 80x86 fami­lies are not. Syn­onym: left-to-right.

bitwise                           Certain operators in C act in a “bitwise” fashion, mean­ing that the operator is separately applied to each bit of the operands.

BNF                               Either “Backus-Naur Form” or “Backus Normal Form”, de­pending on how much credit you want to give to Dr. P. Naur. This is a lan­guage designed for use in describing language grammars.

cast                                 A “cast” is an explicit type conversion. In C, a cast con­sists of an abstract declarator surrounded by parentheses.

conjunction                     The logical (i.e. not bitwise) “and” operation.

declarator                       A type specification. A declarator may or may not in­clude an identi­fier. For example, the cast (char *) is an ab­stract de­clar­a­tor without an identifier, while the ex­pres­sion  char *x is a declarator with an identifier (i.e. x).

dereference                    A pointer is “dereferenced” when the value is obtained from the ad­dress to which the pointer points. The origin of this awkward neol­ogism is obscure, but it seems to have been in­vented by analogy with the word “reference”: if a variable can be referenced by a pointer, then the poin­ter must be derefer­enced to obtain the value to which it points. It’s not English, but it com­putes.

disjunction                      The logical (i.e. not bitwise) “or” operation.

EBCDIC                         Extended binary coded decimal interchange code. An 8-bit coding scheme for the binary representation of alpha­betic, nu­meric, and page-formatting in­formation. EBCDIC was invented by IBM for use by its mainframe com­puters. Pro­nounced “ebsidick”.

enumeration tag              The name given to an enumeration. Once an enumera­tion has been tagged, it can be identified by tag when declaring other variables that refer to the same enumer­a­tion.

escape sequence             The C language permits the use of certain codes that stand for spe­cial char­acters (e.g. \b is the C escape se­quence for the backspace con­trol character).

exception                        An unexpected condition that the program encounters and cannot cope with. See C++ keywords try, catch, and throw.

extent                              The “extent” of a variable is the time during which it refers to stor­age. Vari­ables with static extent have storage allo­cated through­out execution of the program, while vari­ables of local extent are allo­cated storage only while ex­e­cution proceeds through the block in which they are de­clared. See “scope” and “visibility”.

hexadecimal                   Base 16 numbers. The digits are: {0, 1, … 8, 9, a, b, c, d, e, f}. Hexa­decimal constants in C are indicated by the prefix “0x”.

identifier                        A name for a variable or function.

indirection                      The “indirection” operator (*) dereferences a pointer. That is to say, it yields the value stored in the address to which the pointer points.

infix operator                 A binary operator is “infix” if it is written between its operands. For example, the logical and operator (&&) is in­fix. See also “prefix” and “postfix”.

initializer                        A declaration in the C language may include an initial value for the variable that is declared. This value is its “initializer”.

ISO                                 International Standards Organization.

lexical analysis              A compiler is performing “lexical analysis” of source code when it is read­ing characters and collecting them into to­kens (e.g. names, num­bers, strings, operators).

linkage                            An identifier has “external linkage” if it is supposed to re­fer to the same object in each module in which it is de­clared. An identifier has “internal linkage” if it refers to a different object in each module.

lint                                  This is the name given in the early days of C to a utility pro­gram that per­formed basic error-checking (i.e. lint-pick­ing) on C source files. As C com­pilers have evolved so­phisticated er­ror-checking abilities of their own, lint pro­grams have either become obsolete or have evolved into specialized error-check­ing niches.

little endian                    A computer is “little endian” (as opposed to “big en­dian”) if the low-order (“end”) byte in a word has a smaller ad­dress than the high-or­der byte. The VAX and Intel 80x86 families are little endian, while the  Motorola 68000 fami­ly is not. Synonym: right-to-left.

lvalue                             An “lvalue” (pronounced el-value) is an expression that refers to a region of memory that can be examined or al­tered. So-called because it is a value that can appear on the left side of an assignment.

macro                             A macro is an expression that, in form, resembles a con­s­tant or a func­tion call. However, the C preprocessor re­places every macro expression with its appropriate C ex­pansion before compilation.

manifest constant            A constant (in any computer language) is “manifest” if (a) its mean­ing is clearly apparent, and (b) its value never changes. Mani­fest constants are best created in C with the #define pre­pro­ces­sor direc­tive.

module                           The term “module” in C usually refers to a source file and all of its associ­ated include files. Each module of a C pro­gram is compiled independently.

nil                                   Empty. Frequently used by former Pascal programmers as a synonym for “null”.

null pointer                     A pointer whose value is not the address of any object. By convention, the address stored in a null pointer is zero. The fact that such a pointer ac­tually points to ad­dress 0 is pure­ly coincidental. Beware: in some machines the null pointer is not zero.

null statement                 C permits a statement to be empty — the null statement con­sists of a single semicolon.

null string                       An empty character string.

octal                               Base 8 numbers. The digits are: {0, 1, 2, 3, 4, 5, 6, 7}. In older versions of C the digits 8 and 9 are also allowed, for rea­sons that surpass under­standing. Warning: octal con­stants in C are indicated by the prefix digit “0”. Thus 010 is the same number as 8 (decimal).

overflow                        The result of an arithmetic operation is said to “overflow” when the result is too large to represent us­ing the speci­fied type.

overload                         An identifier has been “overloaded” if it is used for more than one purpose within a single block. Overload­ing is permitted (but not recom­mended) in certain spe­cific con­texts in C. In the worst case a single name can simulta­ne­ously refer to a macro, a statement label, a structure tag, a component name, and a variable name. In C++ the over­loading is carried to even greater ex­tremes.

PCC                                The Portable C Compiler. This C compiler from Bell Labs im­plemented a version of the K&R standard, and served as the starting point for many commercial C compilers.

pointer                            A variable whose value is an address.

postfix operator              A unary operator is “postfix” if it is written after its operand. The sub­script se­lection operator (square brack­ets) is postfix. See also “prefix” and “infix”.

pragma                           ANSI C allows the new preprocessor directive #pragma, which is used to supply implementation-de­fined infor­ma­tion to the compiler.

prefix operator               A unary operator is “prefix” if it is written before its oper­and. The nega­tion operator (–) is prefix. See also “post­fix” and “infix”.

preprocessor                  Unlike almost all other high-level languages, the C lan­guage has a built-in pre­processor which manipulates the source code in advance of actual com­pilation. The C pre­processor is primar­ily used for (a) expanding macro ex­pressions into true C code, (b) inserting other files into the compilation stream, (c) con­trolling conditional in­clusion of code, and (d) perpetrating sly programming tricks on poor unsuspecting com­pilers.

prototype                        A new and welcome addition to the C language, a func­tion pro­totype is a declaration which specifies the type of each of its formal parameters, and the type of its return value.

pseudo-unsigned             C compilers may treat the type char as either signed or unsigned. Unfortu­nately, compilers may also treat this type as “pseudo-un­signed”, meaning that its value is never negative but it is treated as though it were signed during type conver­sions. Ugly.

rule                                 If-statements are called “rules” in some computer lan­guages (and es­pecially in expert systems) if they satisfy these condi­tions: (a) the rules are not part of the pro­gram (i.e. they are data, not code), and (b) the rules are not or­dered.

scope                              The “scope” of a declaration is the set of statements through­out which the declaration is active. Note: a dec­la­ration may not be “visible” within all statements of its scope. See “visibility” and “extent”.

semantics                         A compiler is performing semantic processing when it is at­taching meaning to the grammatical forms that are de­tected during syntactic analysis.

signal                              A “signal” is an asynchronous event that requires spe­cial handling. Signals may be “raised” by the computer’s error de­tection mechan­ism, by the pro­gram itself, or by events external to the program.

signal handler                 A “signal handler” is a C function that is invoked when its signal is “raised”. For example, an arithmetic over­flow han­dler is a function that might allow a graceful exit to oc­cur from a numerical com­pu­ta­tion when an over­flow oc­curs.

storage class                   A declaration can be modified by giving it a storage class specifier. The C storage classes are: auto, extern, regis­ter, static, and typedef.  In C++ typedef is not a storage class.

syntactics                        A compiler is performing syntactic analysis when it is read­ing a se­quence of tokens and parsing them into gram­mati­cal pat­terns. Syn­tactic analysis fol­lows lexical analysis, and precedes semantics. 

token                               The smallest unit of text recognized by a compiler, e.g. the keyword “else”, the operator “+=”, and the semi­colon “;”.

trigraph                           For the sake of keyboard devices that do not have the full comple­ment of C characters, the ANSI standard al­lows the use of “trigraphs” to rep­resent the missing char­acters. Each tri­graph is a three-character sequence begin­ning with “??”.

unary                              An operation is “unary” if it operates on exactly one operand.

underflow                       The result of an arithmetic operation is said to “underflow” when the result is too small to represent us­ing the specified type.

visibility                         A declaration for an identifier is “visible” within a C pro­gram if the identifier is still associated with the decla­ra­tion. Visibility is lost when the declaration is hidden by another declaration of the identi­fier in an inner block. See “scope” and “extent”.

void                                The absence of a quantity (not the quantity zero). A func­tion which returns “void” is a function which returns no value at all. A “pointer to void” is the generic unde­fined pointer to data in ANSI C.

volatile                           In ANSI C, a variable may be declared “volatile”, mean­ing that it must be treated with great care by optimizing com­pilers.

whitespace                     Source characters that consist of blank, tab, and return charac­ters. In some contexts a comment also counts as whitespace.

xor                                  The exclusive “or” operation.

yacc                                “Yet Another Compiler Compiler”. This is a program that takes as input a BNF description of a language grammar and produces as output a parser for that lan­guage.


8.2    Bibliography

 

AN90        X3J11 Technical Committee (1990) American National Standard for the Programming Language C. American National Standards Insti­tute.

AN95        Doc No:X3J16/95-0087 (1995) Working Paper for Draft Proposed International Standard for Information Systems - Programming Language C++ CBEMA, 1250 Eye Street NW, Suite 200,  Washington DC 20005

AN96        Doc No:X3J16/96-0225 (1996) Working Paper for Draft Proposed  International Standard for Information Systems – Programming Language C++. ITIC, 1250 Eye Street NW, Suite 200, Washington DC 20005

CDS86      Conte, SD, Dunsmore, HE, and Shen, VY (1986) Software En­gin­eer­ing Metrics and Models. Benjamin.

DM87        DeMillo, RA, McCracken, WM, Martin, RJ and Passafiume, JF (1987) Software Testing and Evaluation. Benjamin/Cummings.

GS96         Glass, G and Schuchert B(1996) The STL <Primier>. Prentice Hall.

GC87         Grady, RB and Caswell, DL (1987) Software Metrics: Establishing a Com­pany-Wide Program. Prentice-Hall.

ES90         Ellis, MA, and Stroustrup, B (1990) The Annotated C++ Reference Manual. Addison-Wesly.

POSIX.1    IEEE Technical Committee on Operating Systems (1990) Infor­ma­tion Technology — Portable Operating System Interface — Part 1: System Application Program Interface (API) [C Lan­guage]. ISO/ IEC 9945-1; IEEE Std 1003.1-1990. Institute of Electrical and Elec­tronic Engineers.

MH77        Halstead, MH (1977) Elements of Software Science. Elsevier.

HS84         Harbison, SP and Steele, GL (1984) C: A Reference Man­ual. Prentice-Hall. .P.; .L.;

HS88         Harbison, SP and Steele, GL (1987) C: A Reference Man­ual, 2nd Edi­tion. Prentice-Hall. .P.; .L.;

HS91         Harbison, SP and Steele, GL (1991) C: A Reference Man­ual, 3rd Edi­tion. Prentice-Hall. .P.; .L.;

MRH90     Horton, MR (1990) Portable C Software. Prentice-Hall. .;

RJ88          Jaeschke, R (1988) Portability and the C Language. Hayden Books.

KR78         Kernighan, BW and Ritchie, DM (1978) The C Programming Lan­guage. Prentice-Hall. .W.;.M.;

KR88         Kernighan, BW and Ritchie, DM (1988) The C Programming Lan­guage, 2nd Edition. Prentice-Hall. .W.;.M.;

AK89        Koenig, A (1989) C Traps and Pitfalls. Addison-Wesley.

OL86         Lecarme, O and Gart, ML (1986) Software Portability. McGraw-Hill.

MSD92      Meyers, SD(1992) Effective C++. Addison-Wesley.

MSD96      Meyers, SD(1996) More Effective C++. Addison-Wesley.

OS95         ObjectSpace(1995) STL<ToolKit>. ObjectSpace

PS81          Perlis, A, Sayward, F and Shaw, M (1981) Software Metrics: An Anal­ysis and Evaluation. MIT Press.

TP84         Plum, T (1984) C Programming Guidelines. Prentice-Hall.

TP89         Plum, T (1989) C Programming Guidelines, 2nd Edition. Prentice-Hall.

RC90         Rabinowitz, H and Chaim, S (1990) Portable C. Prentice-Hall.

BS91         Stroustrup, B (1991) The C++ Programming Language, 2nd Edition. Ad­di­son-Wes­ley.

BS97         Stroustrup, B (1997) The C++ Programming Language, 3rd Edition, Addison-Wesley.

VW88        Vincent, J, Waters, A, and Sinclair, J (1988) Software Quality Assur­ance, Volume 1. Prentice-Hall.


8.3      Index

 


"-D, 22

analysis

lexical, 150

semantic, 153

syntactic, 153

ANSI, 148

array initializer, 48

ASCII, 148

assembler, 20, 36

AT&T C++, 4

backslash, 37

binary decision point, 99

bitfield, 49, 50

bitwise, 148

BNF, 148

boolean, 68

Borland, 4, 84

braces, 70

C++, i, v, 1, 4, 5, 9, 10, 11, 16, 19, 21, 22, 24, 38, 47, 55, 66, 68, 75, 76, 83, 84, 92, 93, 95, 133, 135, 149, 151, 153, 155, 156

dialects, 4

example rules, 133

cast, 148

CCEXCLUDE, 6

CCRULES, 2, 10

characters

constants, 32

nonstandard, 30

pseudo-unsigned, 49, 152

standard, 30

trigraph, 31

underscore, 29

wide, 38

cnv_any_to_ptr, 28

cnv_ptr_to_ptr, 28

CodeCheck

predefined constants, 20

CODECHECK, 20

CodeWarrior, 4

comma, 69

operator, 105, 110, 111

separator, 69

command line, 1

comment

//, 75

macro, 42

nested, 4, 75

Compiler Drift, 14

complexity, 99

Concatenation, 36

conjunction, 148

const, 84

constants

hexadecimal, 35

manifest, 32, 67, 151

octal, 35

predefined, 20

Continuation, 37

dcl_3dots, 52

dcl_all_upper, 62

dcl_any_upper, 29

dcl_auto_init, 48

dcl_base, 25, 49, 50, 54, 65

dcl_base_name(), 26, 65

dcl_bitfield, 50

dcl_bitfield_anon, 50

dcl_bitfield_arith, 50

dcl_bitfield_size, 50

dcl_count, 51

dcl_cv_modifier, 84

dcl_empty, 50

dcl_enum_hidden, 62

dcl_explicit, 85

dcl_extern, 59, 65, 83

dcl_extern_ambig, 30, 59

dcl_first_upper, 62

dcl_from_macro, 65

dcl_function, 52, 53, 54

dcl_global, 26, 59, 65, 83

dcl_hidden, 62, 65

dcl_Hungarian, 66

dcl_ident_length, 59, 81

dcl_init_arith, 49

dcl_initializer, 54, 83

dcl_label_overload, 62

dcl_level(), 25, 52

dcl_level_flags(), 26

dcl_levels, 25

dcl_local, 26, 59, 66

dcl_name(), 25

dcl_need_3dots, 52

dcl_no_specifier, 82

dcl_not_declared, 82

dcl_oldstyle, 52

dcl_parm_hidden, 57

dcl_signed, 66

dcl_simple, 54

dcl_static, 54, 59, 66

dcl_template, 83

dcl_typedef, 53, 63, 66

dcl_typedef_dup, 53

dcl_underscore, 30, 34

dcl_union_bits, 50

dcl_union_init, 48

dcl_unsigned, 66

dcl_variable, 83

decisions, 99

declarator, 148

abstract, 148

default.cco, 2, 5

defined, 43, 45

dereference, 149

dialect, 4

disjunction, 149

EBCDIC, 149

elif, 43

endian

big, 148

little, 150

end-of-file, 36

ENUM_TYPE, 49

enumerated constant, 49

environment

CCEXCLUDE, 1

CCRULES, 1

CIncludes, 1

INCLUDE, 1

escape, 149

escape codes, 31

Escape Sequences, 32

exception, 149

exception handler, 93

exp_not_ansi, 55

exp_operands, 108

exp_operators, 107

exp_tokens, 108

explicit, 85

extensions

Borland, 4, 84

C++, 4

CodeCheck, 8

HP/Apollo, 4

Intel, 84

Metaware, 4, 45, 84

Microsoft, 4, 51, 84

MPW C, 51

Symantec, 4

Think C, 51

Vax, 4, 45

extent, 149

extern, 59, 65, 83, 153

far, 63, 84

fcn_com_lines, 90

fcn_decisions, 100

fcn_exec_lines, 90

fcn_H_operators, 96

fcn_high, 94

fcn_low, 94

fcn_nonexec, 94

fcn_operands, 96, 108

fcn_operators, 96, 108

fcn_tokens, 96, 108

fcn_total_lines, 90

fcn_u_operands, 96

fcn_uH_operators, 96

fcn_white_lines, 90

file

listing, 4, 6

object, 2

project, 2

prototypes, 6

rule, 2, 5

stderr.out, 4

flowchart, 100

flowgraph, 100

Gnu C, 55

Hall, Mark R., 156

Halstead, Maurice, 95

Harbison, S.P., 155, 156

header files

search path, 3

suppress checking, 6

hexadecimal, 149

huge, 84

Hungarian prefixes, 63

identifier, 149

identifier(), 37

idn_global, 26

idn_local, 26

idn_member, 26

idn_parameter, 26

indentation, 70

indirection, 150

infix, 150

initializer, 150

initializers

array, 48

Installation, 1

Intel, 84

ISO, 16, 150

Jaeschke, Rex, 156

Kernighan, B.W., 156

keyword(), 37, 52

Koenig, Andrew, 13, 156

labels, 83

lex_ansi_escape, 33

lex_assembler, 37

lex_backslash, 37

lex_bad_call, 41

lex_big_octal, 33

lex_char_empty, 33

lex_char_long, 33

lex_cpp_comment, 75

lex_float, 35

lex_hex_escape, 32, 33

lex_initializer, 68, 76

lex_invisible, 38

lex_lc_long, 74

lex_long_float, 35

lex_nl_eof, 36

lex_nonstandard, 30

lex_not_KR_escape, 33

lex_not_manifest, 67, 76

lex_null_arg, 41

lex_num_escape, 32, 77

lex_punct_after, 69

lex_punct_before, 69

lex_radix, 35

lex_str_concat, 36

lex_str_length, 59

lex_str_macro, 44

lex_str_trigraph, 31

lex_suffix, 35, 74

lex_trigraph, 31

lex_unsigned, 35

lex_wide, 38

lex_zero_escape, 32

lexical, 150

lexical guidelines, 29

lin_continuation, 71

lin_continues, 69

lin_dcl_count, 82, 89

lin_end, 89

lin_has_code, 89, 108

lin_has_comment, 82, 89

lin_header, 89

lin_include_kind, 89

lin_include_name(), 89

lin_indent_space, 71

lin_indent_tab, 71

lin_is_comment, 89

lin_is_exec, 89, 108

lin_is_white, 89

lin_length, 59

lin_nest_level, 2, 71

lin_nested_comment, 75

lin_number, 90

lin_operands, 96, 108

lin_operators, 96, 108

lin_preprocessor, 89

lin_source, 90

lin_suppressed, 89

lin_tokens, 96, 108

linkage, 150

lint, v, 150

logic structure, 100

long float, 34

lvalue, 150

machismo, 13

macro, 150

macros

predefined, 20

maintainability, 99

manifest constants, 67

McCabe, 100

Metaware, 4, 84

Metrowerks, 4

Microsoft, 4, 51, 84

mod_com_lines, 90

mod_decisions, 100

mod_exec_lines, 90

mod_functions, 98

mod_H_operators, 97

mod_high, 94

mod_low, 94

mod_macros, 98

mod_nonexec, 94

mod_operands, 97, 109

mod_operators, 97, 109

mod_tokens, 97, 109

mod_total_lines, 90

mod_u_operands, 97

mod_uH_operators, 97

mod_white_lines, 90

module, 151

MPW C, 51

typedef names, 53

mutable, 85

near, 63, 84

nested

comments, 75

header files, 59

logic, 103

nested:. if directives

newline, 36

NUXI, 33

octal, 151

op_base(), 27

op_cast, 27

op_declarator, 69

op_executable, 69

op_high, 69

op_infix, 69

op_level(), 27

op_level_flags(), 27

op_levels(), 27

op_low, 69

op_medium, 69

op_operands, 27

op_postfix, 69

op_prefix, 69

op_space_after, 70

op_space_before, 70

op_white_after, 70

op_white_before, 70

options

–A, 2

–B, 2, 70, 71

–C, 2

–D, 2

–E, 3

embedded SQL, 6

–F, 3

–G, 3

–H, 3

–I, 3

–J, 3

–K, 4, 42

–L, 4

–M, 4

macros, 4

–N, 4

–NEST, 4

nested classes, 4

nested comments, 4

–O, 4

–P, 5

progress, 5

prototypes, 6

–Q, 5

–R, 5

rule file, 5

–S, 5

–SQL, 6

stderr.out, 4

–T, 6

–U, 6

user defined, 6

–V, 6

variables, 7

–W, 6

–X, 6

–Y, 6

Z, 6

options:. include files

overload, 151

Parochialism, 13

pascal, 84

Microsoft C, 51

MPW C, 51

Think C, 51

type modifier, 51

type specifier, 51

PCC, 151

Plum,Thomas, 29, 156

pointer, 152

null, 151

portation problem

source, 16

target, 16

postfix, 152

pp_ansi, 45

pp_arg_count, 41, 60

pp_arg_paren, 78

pp_arg_string, 44

pp_arith, 41

pp_assign, 78

pp_bad_white, 39

pp_benign, 79

pp_comment, 42

pp_defined, 43, 45

pp_depend, 80

pp_elif, 43, 45

pp_empty_arglist, 41

pp_error, 45

pp_if_depth, 60

pp_include, 45

pp_include_depth, 60

pp_include_white, 40

pp_length, 47

pp_lowercase, 68

pp_macro_conflict, 68, 86

pp_macro_dup, 68, 86

pp_overload, 42

pp_paste, 42, 45

pp_pragma, 45

pp_recursive, 46

pp_relative, 47

pp_semicolon, 78

pp_sizeof, 41

pp_stack, 68, 80

pp_stringize, 45

pp_sub_keyword, 46

pp_trailer, 78

pp_undef, 68, 80

pp_unknown, 45

pp_unstack, 80

pp_white_after, 39

pp_white_before, 39

pragma, 152

predefined macros, 20

prefix, 66, 152

preprocessor, 152

argument, 44, 78

arguments, 41, 60

arithmetic, 40

define, 68, 80

defined, 43, 45

elif, 43

if, 40

keywords, 46

semicolon, 78

sizeof, 40, 41

undef, 68, 80

whitespace, 39, 40

preprocessor:. elif

prj_com_lines, 91

prj_conflicts, 86

prj_decisions, 100

prj_exec_lines, 91

prj_functions, 98

prj_H_operators, 97

prj_high, 94

prj_low, 94

prj_macros, 98

prj_nonexec, 94

prj_operands, 97, 109

prj_operators, 97, 109

prj_tokens, 97, 109

prj_total_lines, 91

prj_u_operands, 97

prj_uH_operators, 97

prj_white_lines, 91

project, 2

prototype, 152

prototypes

creation, 6

recursive, 46

reserved keywords:, 37

Ritchie, D.M., 156

rule, 152

scope, 152

semantics, 153

signal, 153

signal handler, 153

signed, 66

sizeof, 40

specifier

storage class, 51, 153

type, 51, 84

SQL, 6

standard

K&R, 15

PCC, 15

statement

null, 151

Steele, G.L., 155, 156

stm_bad_label, 56

stm_cases, 92

stm_catchs, 93

stm_cp_begin, 73, 93

stm_depth, 103

stm_end, 93

stm_end_tryblock, 93

stm_is_comp, 74, 93

stm_is_expr, 93

stm_is_high, 93

stm_is_iter, 93

stm_is_jump, 93

stm_is_low, 93

stm_is_nonexec, 93

stm_is_select, 93

stm_labels, 93

stm_lines, 90

stm_never_caught, 93

stm_operands, 96, 108

stm_operators, 96, 108

stm_tokens, 96, 108

storage class, 153

storage classes

pascal, 51

string

null, 151

wide, 38

style, 11

suffix, 66

F, 34

L, 34

U, 34

Symantec, 4

syntax, 153

System Variables, 34

tab stops, 70

tag

enumeration, 149

Think C, 51

token, 95, 153

trigraph, 31, 153

try blocks, 92

type

modifier, 51, 84

specifier, 51, 84

-U, 22

unary, 153

undef, 68, 80

underflow, 153

visibility, 153

void, 154

volatile, 84, 154

whitespace, 39, 40, 154

xor, 154

yacc, 154