C and C++

Source Code Modification

using

CodeFix

 

by

Patrick Conley.

Codefix is a product of Abraxas Software, Inc.

Codefix was designed & written by Patrick Conley.

 

 

 

 

For more information, contact:

Abraxas Software, Inc.

Post Office Box 19586

Portland, OR 97280, USA

Phone:  503-802-0810

Fax:  206.309.0304

Email: support@abxsoft.com

Internet: www.abxsoft.com

 



Table of Contents

Preface

I. Introduction to Source Code Modification

C++ parsing complexity

II. Calendar related Symbol Identification

Collection of foreign/domestic Date related Symbols in a C/C++ program

Provided Symbol Tables

Collection Phase

A symbol collection example

Identifying Code Using Date Name Criteria

Automatic Detection of Date Symbols in C/C++

Automatic Code Modification for Date Programming Problems

III. Program Layout Modification

IV. Obfuscation and/or shrouding of C++ programs

V. Dynamic Testing: Insertion of code for runtime testing

VI. Database generation from C/C++ source code

VII. HTML Generation from C/C++

VIII. Comment Analysis & Generation

IX. References

Index

 


Preface

Maintaining programs in C or C++ is a difficult task. Even experienced programmers need tools to aid in the program development process, but all too few tools exist to detect bugs in C and C++ source code and help the programmer avoid problems.

Codefix is a powerful tool for modifying C and C++ source code. Unlike other tools, Codefix is itself fully programmable. It performs its primary task, analyzing and modifying C and C++ source code, entirely under the direction of a user-written control program.

 

 

Standards and measures can be specified by the user for a tremendous number of features of C++ code that have an impact on awareness, assessment, renovation, validation, and implementation. Codefix is designed to enhance dramatically the effectiveness and efficiency of project management in commercial and industrial programming efforts.

A custom Codefix program specifying code standards and measures can be written by a project leader using the Codefix language (actually a restricted subset of C itself).

Codefix can be pro­grammed to:

 

·        Analyze source code for date programming problems. This includes rules for date type-encoded identifiers, proper use of date related macros and typedefs, prototypes, etc. Year-2000 is not the only calendar related date problem. There will be many problems in 2038 and in coming leap years. CodeFix can be used in the future to find and fix date problems.

 

·        Modify code layout to improve readability. Most standards are supported for indentation and source program formatting. Generate 'pretty' C++.

 

·        Obfuscate or shroud code for distribution while still protecting proprietary trade secrets.

 

·        Dynamic Testing is available by inserting assertions at locations in the target code where possible conflicts are found. Add code for run-time testing & debugging.

 

·        Generate HTML Documentation from C/C++ programs.

 

·        Database Generation is provided where Oracle & Microsoft databases can be generated so management can analyze computer programs from their favorite database in the form of graphics. Generate databases from C++/C source code.

 

·        Commenting - Validating that C/C++ is correctly commented, and generating comment stubs for cases of missing comments.
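The two non-Y2K date problems named in the list above can be made concrete with a short C sketch. The leap-year test follows the Gregorian rule, and the second function shows why a signed 32-bit count of seconds since 1 January 1970 runs out in January 2038. The function names are chosen for this illustration only and are not part of CodeFix.

```c
#include <stdint.h>

/* Gregorian rule: every 4th year is a leap year, except century years
   that are not divisible by 400 (so 2000 is a leap year, 1900 is not). */
static int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Whole days that fit in a signed 32-bit count of seconds.
   Counting from 1 January 1970, this many days ends on 19 January 2038. */
static long days_in_int32_seconds(void)
{
    return (long)(INT32_MAX / 86400);
}
```

Code that treats every fourth year as a leap year gives the right answer for 2000 only by accident, and will be wrong for 2100.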

 


I. Introduction to Source Code Modification

 

Since 1982 Abraxas Software has been providing language solutions for all programming languages. We started ‘CodeCheck’ development in 1986 and released the product in 1990. Since that time many people have asked us, “Why not ‘fix’ the problems instead of just logging them?”

It had been our feeling that the automatic modification of source code is a dangerous proposition, because it takes intelligent people out of the loop.

Today we have identified five main areas where CodeFix can be used to support C/C++ programmers in source code modification.

 

1.)     Date/Calendar ( Y2K ) symbol identification, commenting, and correction. Both foreign and domestic calendar problems can be found.

 

2.)     Program Layout Modification.

 

3.)    Obfuscation and/or shrouding of C++ programs for public distribution.

 

4.)    Insertion of code for runtime testing.

 

5.)    Database generation from C/C++ source code.

 

C++ parsing complexity

The parsing of C++ is extremely complex, and we believe that our sixteen years' experience in this area lets us help professional programmers solve extremely difficult problems. Today, with the use of templates, namespaces, and other abstractions, it is impossible to identify YYMMDD symbol related problems using conventional tools that simply search for explicit symbols. Consider the following example:

class Date

{

public:

    Date ( int mon, int day, int year );     // constructor

    int getYear() const;

private:

    int month, day, year;                  // private data

};

inline int Date::getYear() const

{

    return year;

}

Date hired ( 1, 1, 98 );     // some Date object (values are illustrative)

int retire;

retire = hired.getYear();    // flag 'retire' as Date

 

In the above example most tools would not be aware that ‘retire’ holds a date value. Because CodeFix can follow the use of ‘retire’, it can find even the most complex date usage problems in C++.
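The idea of following a value from a date source into an innocently named variable can be sketched in C as a small propagation pass over assignments. This only illustrates the general technique and is not CodeFix's actual algorithm. The `assign` table, the helper functions, and the extra variable `pension_year` are assumptions made for this sketch.

```c
#include <string.h>

/* Hypothetical, simplified model of a program: each entry copies rhs into lhs. */
struct assign { const char *lhs; const char *rhs; };

#define MAX_SYMS 16
static const char *date_syms[MAX_SYMS];   /* symbols known to carry dates */
static int n_date_syms = 0;

/* Return 1 if name is already marked as date related. */
static int is_date(const char *name)
{
    for (int i = 0; i < n_date_syms; i++)
        if (strcmp(date_syms[i], name) == 0)
            return 1;
    return 0;
}

/* Mark name as date related; return 1 only if it was newly added. */
static int mark_date(const char *name)
{
    if (is_date(name) || n_date_syms >= MAX_SYMS)
        return 0;
    date_syms[n_date_syms++] = name;
    return 1;
}

/* Propagate date-ness across assignments until nothing changes.
   Returns the number of symbols newly marked. */
static int propagate(const struct assign *a, int n)
{
    int added = 0, changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (is_date(a[i].rhs) && mark_date(a[i].lhs)) {
                added++;
                changed = 1;
            }
        }
    }
    return added;
}

/* The assignment from the text, plus one hypothetical follow-on use. */
static const struct assign example[] = {
    { "pension_year", "retire" },
    { "retire", "Date::getYear" },
};
```

Seeding the table with `mark_date("Date::getYear")` and calling `propagate(example, 2)` marks both `retire` and `pension_year`, even though neither name contains a date keyword.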


II. Calendar related Symbol Identification

 

The core concept of date/calendar assessment using CodeFix is that of symbol identification. Symbol identification involves several passes to acquire the needed information: collection, documentation, analysis, and correction.

The scope of symbols can include:

1.) Simple 'C' data types, like

int year = 98;

Here we have a simple use of year being initialized with a two-digit date.

2.) Date/Time service routines, like

set_this_year( (int) 98 )

In this case we have a time handler setting the current year to a two-digit year.

3.) Sort routines, like

merge_table( emp_list, result_list, START_DATE, 98 )

Here we have a sort routine where the employee list is being merged by start_date from a two-digit date.

 

4.) Complex C++ template data types, like

class date {};

template <class T> class employee {};

employee<date> de;

In this typical C++ template problem "de" for 'employee date' has been instantiated with a date type.


 

Collection of foreign/domestic Date related Symbols in a C/C++ program

 

Collection involves building the calendar-name symbol tables that the expert system ‘CodeFix’ will use in the identification process.

A simple symbol table may be thought of as the following.

 

Begin

beg

bgn

mdy

mmddyy

mmyy

Month

mon

mo

mmm

ccyy

cyyddd

Cyyddm

cyymmdd

Curr

Current

date

Day

 

Figure 1.

 

As shown in figure 1, we have a collection of symbols, which in effect are just strings of common Year-2000 related names. These strings are known to represent time/date information, and experience has shown that they are the typical names that programmers have used historically for time/date data types.

 

The problem of course is that not all information representing time/date information uses these name combinations. History has shown that not only do programmers not use meaningful names in their programs, they may even use the name of their cat to represent the day of year!

 

The purpose of the CodeFix collection phase is to build the symbol table, i.e. a list of strings that represent dates, or are highly likely to represent dates, judged from the context of the program using expert system technology and advanced parsing techniques.
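The name-based side of this matching can be sketched in C as a case-insensitive substring test of an identifier against the strings of Figure 1. This is a simplified illustration, not the CodeFix implementation. The function names are hypothetical, and a real table would presumably be loaded from a file and not compiled in.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The strings of Figure 1, folded to lower case. */
static const char *const date_keywords[] = {
    "begin", "beg", "bgn", "mdy", "mmddyy", "mmyy", "month", "mon", "mo",
    "mmm", "ccyy", "cyyddd", "cyyddm", "cyymmdd", "curr", "current",
    "date", "day"
};

/* Case-insensitive substring search: returns 1 if needle occurs in hay. */
static int contains_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

/* Returns 1 if the identifier contains any keyword from the table. */
static int is_date_name(const char *identifier)
{
    size_t count = sizeof date_keywords / sizeof date_keywords[0];
    for (size_t k = 0; k < count; k++)
        if (contains_nocase(identifier, date_keywords[k]))
            return 1;
    return 0;
}
```

With this table `is_date_name("START_DATE")` succeeds and `is_date_name("fluffy")` fails, so the cat's name slips through. Short entries such as "mo" also match unrelated names like `memory`, which is one reason name matching alone is not enough and the generated tables need pruning.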

 

Provided Symbol Tables

 

Codefix comes with several pre-built symbol tables for checking source code. They include:

1.) Simple symbols

This table contains the classic symbols usually provided by most Year-2000 documentation. It includes about two dozen of the most common symbols used in programming for dates.

2.) Advanced symbols

This table is drawn from a large suite of 'date' related public code samples. It provides hundreds of symbols used for representing dates in the industry.

3.) Foreign symbols

This table provides a large set of strings used in computation for worldwide calendar sets outside of the USA.

 

Collection Phase

 

This section will discuss how to build your own CodeFix symbol table.

 

The generation of symbol tables requires extracting symbols from a large set of C/C++ code that is known to contain 'date' related computation in your organization. Most likely, after the initial symbol table is generated, some pruning will be required to reject base types that are not considered date related.

 

For example,

 

Check -rcollect.cc datecode.c

 

In the above example, expert system rule script symbol.cc contains the 'rules' for building the symbol table from the known date code in the example datecode.c.

 

The results will be written to the file symbol.tmp, by default. If a name other than symbol.tmp is desired then the file symbol.cc must be modified.

 

If there is more than one file to be included, the wild card option (*) may be used before the dot-c suffix.

 

A symbol collection example

 

What follows is a simple 'C' example of symbol table generation.

 

1: typedef struct DATE_INFO

  2: {

  3:    int year;

  4:    int month;

  5:    int day;

  6: } DI;

  7: DI dtglo;                  // global

  8: enum DATE { year=1900 };

  9: // simple 'date' example

 10: struct DATE_INFO bridge_2to4 (struct DATE_INFO *date)

 11: {

 12:    int y2, y4;

 13:    struct DATE_INFO ywd;   // local

 14:

 15:    y2 = date->year;

 16:

 17:    if ( y2 > 49 )

 18:            y4 = y2 + 1900;

 19:    else

 20:            y4 = y2 + 2000;

 21:

 22:    ywd.year = y4; // local usage

 23:

 24:    dtglo.year = y4; // global usage

 25:

 26:    return (ywd);

 27: }

 

For the above case the initial generated symbol table would appear as follows.

 

DM year 2 DATE_INFO     // data member 'year' from line 3

DM month 2 DATE_INFO

DM day 2 DATE_INFO

GT DATE_INFO 3          // GT - Global Tag 'DATE_INFO'

GD DI 26 26

GD dtglo 28 26          // GD - Global Definition

ED year 6               // Enum

GT DATE 1

LD date  26 26          // Local Definition

FD bridge_2to4 26 0     // function definition

LD y2 bridge_2to4 6 6

LD y4 bridge_2to4 6 6

LD ywd bridge_2to4 26 26

 

·        Note: file name, line number, author, and other information are kept internally.

The initial pass of CodeFix for Y2K collection is the generation of this intermediate symbol table containing the base information on all symbols defined. The constants shown are defined in the appendix; they encode type information, both current and base, for complex types.

The next step is to determine whether a date usage is found in the source example.

 

Identifying Code Using Date Name Criteria

 

Given input source code in the form of a single file or a complete project including many files, a symbol table is generated as shown in the previous section.

 

The basic concept of identification is finding all use of symbols that meet the criteria of the 'Y2K' keyword list, and then generating a subset symbol table of the original definitions meeting those criteria.

 

What follows is an intermediate form of the source example in this section. The first character of each record identifies the origin of the line:

 

* The file name

- Source Header File ( This data is not emitted to the final output )

+ Source from file

 

Note that before every use of a date symbol a line is inserted, identified by the control string '$DATE$'. All defined symbols that meet the Y2K keyword criteria are marked in this way prior to each use. This intermediate step helps identify potential Y2K usage of all symbols.

 

*//$DATE$ MN b.c

+

+#include "b.h"

+

+DI dtglo; // global

+

+// simple 'date' example

+struct DATE_INFO bridge_2to4 (struct DATE_INFO *date)

+{

+   int y2, y4;

+   struct DATE_INFO ywd;       // local

+

-//$DATE$ IL y2 int 6

-//$DATE$ IL date DATE_INFO 26

+   y2 = date->year;

+

-//$DATE$ IL y2 int 6

+   if ( y2 > 49 )

-//$DATE$ IL y4 int 6

-//$DATE$ IL y2 int 6

+       y4 = y2 + 1900;

+   else

-//$DATE$ IL y4 int 6

-//$DATE$ IL y2 int 6

+       y4 = y2 + 2000;

+

-//$DATE$ IL ywd DATE_INFO 26

-//$DATE$ IL y4 int 6

+   ywd.year = y4; // local usage

+

-//$DATE$ IG dtglo DI 28

-//$DATE$ IL y4 int 6

+   dtglo.year = y4; // global usage

+

-//$DATE$ IL ywd DATE_INFO 26

+   return (ywd);

+}
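The record legend above can be read as a simple dispatch on the first character of each line. The following C sketch applies the legend literally: only '+' records carry source text, and everything else is dropped. The enum and function names are hypothetical. The way CodeFix itself decides which '$DATE$' annotations survive into the final output is not described here, so this sketch does not attempt it.

```c
#include <stddef.h>

/* Record kinds, following the legend: '*' file name,
   '-' data not emitted to the final output, '+' source from file. */
enum record_kind { REC_FILE_NAME, REC_NOT_EMITTED, REC_SOURCE, REC_UNKNOWN };

/* Classify a record by its first character. */
static enum record_kind classify_record(const char *record)
{
    switch (record[0]) {
    case '*': return REC_FILE_NAME;
    case '-': return REC_NOT_EMITTED;
    case '+': return REC_SOURCE;
    default:  return REC_UNKNOWN;
    }
}

/* Return the source text carried by a '+' record (without the marker),
   or NULL for every other kind of record. */
static const char *source_text(const char *record)
{
    return classify_record(record) == REC_SOURCE ? record + 1 : NULL;
}
```

Running every record of the listing through `source_text` and printing the non-NULL results would reproduce the original source file without the annotations.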

 

Automatic Detection of Date Symbols in C/C++

 

Using our original example from the previous section, we have now collected the symbols and reduced them to the subset that are candidates for Y2K.

 

In this section we have emitted the original source with candidates documented.

 

1: typedef struct DATE_INFO

  2: {

  3:    int year;

  4:    int month;

  5:    int day;

  6: } DI;

  7: DI dtglo;                         // global

  8: enum DATE { year=1900 };

  9: // simple 'date' example

 10: struct DATE_INFO bridge_2to4 (struct DATE_INFO *date)

 11: {

 12:    int y2, y4;

 13:    struct DATE_INFO ywd;  // local

 14:

 15: //$DATE$ IL date DATE_INFO 26          // 26 - means 'struct' base type

 16:    y2 = date->year;

 17:

 18:    if ( y2 > 49 )

 19:           y4 = y2 + 1900;

 20:    else

 21:           y4 = y2 + 2000;

 22:

 23:    ywd.year = y4; // local usage

 24:

 25: //$DATE$ IG dtglo DI 28              // 28 means defined type by 'typedef'

 26:    dtglo.year = y4; // global usage

 27:

 28:    return (ywd);

 29: }

 

Note that on lines 15 & 25 we have marked our commented control string with "$DATE$". Following the control string and a two-letter code (IL or IG in this example) we have the name of the symbol, followed by the base name of the object that defined the symbol, followed by the base type. The constants are defined in the appendix.

Obviously this example is very simple: we are only matching those types that are explicitly declared with a date keyword in the symbol name. However, we could have extended the scope to that of the parent type, or, in the case of an assignment, we could consider the type of the left-value ( lvalue ), i.e. the type to the left of the equal sign.

 

Automatic Code Modification for Date Programming Problems

 

Finally we're at the goal of our problem. Let's use a simpler case here.

 


 

III. Program Layout Modification

There are many standards for readability. CodeFix provides templates for the three most common formats, which can be applied automatically to your C/C++ software.

Since the script sources are provided for the layout modification, any type of formatting can be applied to your source code.

 


 

IV. Obfuscation and/or shrouding of C++ programs

 

In selling or distributing software in today's marketplace it is essential to support all computer platforms. Given the large number of computer and operating system combinations, it is not possible for even the largest corporation to ship binary software for all platforms. Given the portability of C/C++, the shipment of source code is sometimes the only solution. CodeFix will apply the highest levels of code obfuscation to your source code so that you can deliver it to your customer with no loss of trade secrets. The generated source code, while compilable, is not meaningful to a recipient.

 

 


 

V. Dynamic Testing: Insertion of code for runtime testing

While the principal use of CodeFix is static analysis, it is also possible to apply it to runtime dynamic analysis.

 

A simple example will be provided in the case of the Year-2000 problem.

 

int date;

int year_4, year_2;

year_4 = year_2 + 1900;

The bridge patch replaces the last line with code such as:

 

   if (year_2 > 49)

      year_4 = year_2 + 1900;

   else

      year_4 = year_2 + 2000;

 

In the above case it would be desirable, for instance, to find all symbols that are determined to represent a date and then assert that the dates are always in the YYYY format.

In this case CodeFix would insert assertions prior to every use of the symbols 'year_2' and 'year_4'.

Here 'ASSERT_YYYY' is defined as:

#define ASSERT_YYYY(x) do { if ( !(x) ) fprintf( testfp, "Invalid YYYY usage in file=%s at line=%d\n", __FILE__, __LINE__ ); } while (0)

CodeFix can automatically add the assertion above in all cases where a symbol is determined to be an illegal date.

With the assertions included, the source code from the above case would appear as follows.

 

ASSERT_YYYY( year_4 > 1900 );

ASSERT_YYYY( year_2 > 1900 );

year_4 = year_2 + 1900;

 

As shown, the assertions are inserted prior to use in the generated source code. The code is then compiled and linked, and at runtime, if there is a case where an assertion fails, the message is written to the file channel testfp ( FILE * testfp ), from where it may be loaded into a database for future analysis.

 

Codefix can add such assertions as detailed above for both simple ( int ) and complex ( class, template ) data types.
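The pieces of this section can be combined into a small self-contained C sketch: a logging assertion in the spirit of ASSERT_YYYY, the bridge patch with its pivot of 49, and a check on a stored year. This is an illustration under stated assumptions, not code generated by CodeFix. The `failures` counter, the fallback to stderr when `testfp` is not set, and the function names `bridge_year` and `check_stored_year` are all additions made for this example.

```c
#include <stdio.h>

static FILE *testfp;     /* log channel named in the text; NULL means "use stderr" here */
static int failures;     /* hypothetical counter, added so failures can be inspected */

/* Logging assertion: records the failure and lets the program continue. */
#define ASSERT_YYYY(x) \
    do { \
        if ( !(x) ) { \
            failures++; \
            fprintf( testfp ? testfp : stderr, \
                     "Invalid YYYY usage in file=%s at line=%d\n", \
                     __FILE__, __LINE__ ); \
        } \
    } while (0)

/* The bridge patch: window a two-digit year using the pivot of 49. */
static int bridge_year(int year_2)
{
    int year_4;

    if (year_2 > 49)
        year_4 = year_2 + 1900;
    else
        year_4 = year_2 + 2000;

    ASSERT_YYYY( year_4 > 1900 );   /* holds for every two-digit input */
    return year_4;
}

/* Check a stored year that is supposed to be in YYYY form already. */
static void check_stored_year(int year)
{
    ASSERT_YYYY( year > 1900 );     /* a two-digit value such as 98 is logged */
}
```

Wrapping the macro body in `do { ... } while (0)` lets it be used as a single statement, for example inside an `if` without braces.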

 

 


 

VI. Database generation from C/C++ source code

Quite often it is simply impossible to analyze the results from source code analysis products because of the huge quantity of information. In this case it is essential to take a set of source code and generate a database that is compatible with Oracle or Microsoft Access so that management and/or programmers can analyze the results of source code analysis.

 

year_4 = year_2 + 1900;

 

In this case whenever Y2K symbols are found the information is written to a file in 80 byte card image format.

 

Symbol name    Type    File name    Line number    Scope

year_4         int     file.c       1              simple

 

In this case the generated record would appear as above, where the data contains the symbol name, the type, the file name, the line number, and the scope of the symbol. When this analysis is done on a large body of code, it is possible for a generic database to provide graphics and even keep track of all relevant information from the source, including information extracted from the comments.
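Writing such a record as an 80 byte card image can be sketched in C with fixed-width fields. The column widths below (32, 16, 20, 6, and 6 characters) are assumptions chosen only so that they add up to 80, and the function name is hypothetical. The actual layout used by CodeFix is not given in this section.

```c
#include <stdio.h>

#define CARD_WIDTH 80

/* Format one symbol record into exactly CARD_WIDTH characters.
   Assumed columns: name 32, type 16, file 20, line 6, scope 6.
   Over-long strings are truncated and short ones are padded with spaces.
   Returns the number of characters written, which is CARD_WIDTH. */
static int format_card(char out[CARD_WIDTH + 1],
                       const char *name, const char *type,
                       const char *file, int line, const char *scope)
{
    /* Keep the line number within its 6 columns. */
    if (line < 0)
        line = 0;
    if (line > 999999)
        line = 999999;

    return snprintf(out, CARD_WIDTH + 1, "%-32.32s%-16.16s%-20.20s%6d%-6.6s",
                    name, type, file, line, scope);
}
```

Fixed-width records of this kind can be loaded by most database import tools by declaring the column positions, with no need for a delimiter.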


VII. HTML Generation from C/C++

 

In this section we discuss automatically generating web documentation from C/C++, e.g. generating HTML from C/C++ to be used by an internet browser.

 


VIII. Comment Analysis & Generation

 

This section will discuss the requirements of source code commenting, validating, and generating correct comment blocks for documentation.

 


IX. References

 



Index


assert, 15

Layout, 4

Obfuscation, 4, 14

parsing, 4

runtime testing, 4

Y2K, 4