FORMULA BASED CODING

(from a proposal to a leading New York bank)

 

Overview:

The Bank is Calculating and Paying Agent for numerous trust funds consisting of mortgage loans for the benefit of holders of the respective Mortgage Pass Through certificates. Although each fund involves the accrual and payment of interest and principal to the Certificate holders, the various agreements giving rise to each trust vary widely from trust to trust. Consequently, software developed to administer the monthly distributions of these trusts must be tailor written for the most part. While a "template" approach to the problem is unlikely to produce much in the way of results, a structured approach to the development of these applications can lead to reduced development time for both the developer and the end user who writes the specifications and to enhanced user friendliness through the standardization of screens and their behavior.

Models used in the past to develop these applications are both bloated with respect to the size of the code and data tables and verbose in the use of memory variables. Many of these applications present serious maintenance problems causing both the enduser and developer to spend excessive hours trying to make the models function as they are supposed to function during each monthly production cycle.

The structured approach that follows was developed jointly with the Structured Finance Group in connection with the Mortgage Corporation Series 1993-2 trust, the most complex of any of these deals to date. The approach led to a reduction of development time by at least 50% and reduced debugging of the finished application to a few hours.

 

Data Structures:

For the sake of completeness, the data structures considered herein are based on those trusts in which the Servicer furnishes loan data and the individual loan level detail on a monthly basis. Deals vary in complexity. Some deals involve groups of loans differentiated, for example, by geography or other considerations. In addition, most deals will have multiple classes of certificate holders each sharing in the monthly distribution of funds in a different manner.

In cases where the Servicer furnishes the LOAN detail file, this detail must be summarized to the GROUP level (if any) and to the SERIES level. The GROUP file will consist of one record per month per group, while the SERIES file consists of one record per month. In addition to the loan summaries, the GROUP and SERIES records will contain fields for the input data furnished monthly by the loan Servicer. THE SERIES OR GROUP FILES SHOULD NOT INCLUDE FIELDS CONTAINING CALCULATED DATA.

The CLASS file contains one record per month containing the details of each monthly distribution for each Certificate Class. As is the case of the other files above, THE CLASS FILE SHOULD NOT INCLUDE FIELDS CONTAINING CALCULATED DATA.

Other files may be required depending on the trust agreements. Most typically they can include a reserve fund, details on realized losses or additional delinquency data.

The RULE excluding anything but Servicer supplied data from the individual fields of each table is a fundamental part of the Structured Approach. It has the advantage of maintaining strict data integrity and, perhaps more importantly, results in the most simple data structures. Finally, it is consistent with the concept of Formula Based Coding discussed below.

Indexes:

Except for the LOAN file, none of the data files will benefit that much from indexes, since the other files will always be rather small. In the case of LOAN detail files, a CDX file should be created containing tags for each field involved in any SELECT commands. In the most dramatic cases, this will reduce the execution for SELECT commands from minutes to a second or two at most. There may be a moderate increase in speed by creating CDXs for the GROUP and CLASS files on the distribution date and respectively the group or class designation. In any event, the order of any open CDX should be set to zero and left there. SELECT and LOCATE commands are permitted. SEEK commands are not permitted because they require the setting and manipulation of indexes which is simply not necessary when dealing with tables of this size. LOCATEs are usually faster than SEEKs in these circumstances when combined with Rushmore Optimization.

 

Common Functions:

BDAY() is passed the distribution date. The first business day is returned on or following the passed parameter.

FIRSTDOM() returns the date value for the fist day of the month of the date passed as a parameter. The distribution date is then determined by adding 24 to the returned value.

CURCLASS() or CURGROUP() moves the record pointer of the respective file to the matching record for the current distribution date.

YESNO() is used to present the enduser with a choice. Returns true or false.

POPTALK() is used to pass a message to the enduser, much like WAIT WINDOW.

TEXTFILE() is used to write the results of calculations to a text file.

WFDIST() Returns the funds remaining for distribution after a specific distribution has been made and the amount of the last distribution. It also updates the CalcFile (see Calculations and Waterfalls) for the status of the distribution.

WAIT WINDOW should never be used to obtain input from the user.

 

First Distributions:

Don't use an INITIAL or PROGRAM file to establish opening values for the first distribution. This simply adds an unneeded table to the database and requires additional coding to handle the difference between the first and subsequent distributions. The appropriate approach is to create a dummy distribution date one month prior to the first distribution. The CLASS and GROUP files are then populated for the dummy distribution date with the opening balances. Since this data is brought forward as the beginning balances of the First Distribution, there is no need to treat the first distribution any differently than any other distribution, thereby eliminating an unneeded level of complexity.

MAIN Program:

The MAIN program (the one used to build the Project Manager) does four things. Depending on the setting of an environmental variable (GETENV()), it checks for a valid password. Second, it reads the current distribution date from the last record in the SERIES file (mddate), determines the previous distribution date (LastDist) and the next distribution date (NextDist). Third it opens all data files. Fourth, it displays the system popup. This PRG file has two variations to accommodate applications that deal with loan detail files and those that don't. In most cases, the standard form of these two modules can be used with minimum modification.

Input Screens:

Usually input from the Servicer does not vary too much from one application to the next. It is probably quicker to adapt screens from a previous application than create the screen set from scratch. If no suitable screen set exists, create the screens using the quick screen feature of the Screen Builder (from SERIES file) and deleting fields not needed in the specific window. In order to preserve the screen set, the Project Manager should be used.

 

Calculations:

Conventional wisdom has always held that variables should be given names that describe in an abbreviated manner the nature of the variable. FORMULA BASED CODING follows a very strict variable naming convention that rejects the conventional approach for complex financial calculations.

Each calculation in the specifications, whether intermediate or final, follow a meaningful numbering convention such as: F1, F2, E1, E2, etc. The results are stored to a variable name composed of Calc plus the Calc number from the specifications, i.e. CalcF1 and so forth. When intermediate calculations are included in future calculations, they are included in this fashion. This requires that the specifications be written in the same manner. Calculations are never stored as data fields.

 

The advantages of Formula Based Coding over the traditional approach are many. While CalcF5 may not be meaningful on its own, it refers directly to a specific calculation in the specifications, which is certainly more meaningful than a variable called m.SenPrePPct or class.SnPrePct. The programmer does not need to remember what fanciful variable or field name he or she chose to call the calculation some three hundred or so lines of code back. Since the code is written directly to the specifications, the need for extensive commentary in the code is completely eliminated. The names say it all. Finally, in addition to a noticeable reduction in programming time, the effect on the debugging process is dramatic, perhaps the most dramatic advantage of Formula Based Coding. Since the code is both compact and inherently tied to the specifications, it becomes relatively easy. The debugging process should be reduced from days to a few hours.

An inviolable requirement of Formula Based Coding is the writing of the results of each calculation to a text file (Calc File) which becomes a permanent part of the archive for each distribution. This requirement is a major factor in the reduction of debugging time and provides the necessary audit trail without writing the results of calculations to data files (see Close Distributions and Waterfalls below).

One very obvious advantage of Formula Based Coding is that the programmer really doesn't need to understand very much about the model. While in other circumstances I would argue that the developer must understand the needs and requirements of the end user, in the case of complex financial calculations there is no way the programmer can assimilate the necessary knowledge without running the budget to Mars and beyond while obtaining a degree in Finance and passing the CPA and the State Bar exams. On the other hand, the structured approach does require that the individual writing the specifications have a some appreciation of how data flows through fields and records. In reality, this means understanding that the ending balances of the previous month are carried forward as the beginning balances of the current distribution and this is not a calculation from the programmer's point of view. It's data manipulation.

 

Waterfalls:

Waterfall refers to the formulas used to distribute available distribution amounts to the various investor classes within the trust fund. Since the process of distribution is dependent upon the results of the calculations, the waterfall(s) should be programmed using Formula Based Coding. The same arguments for using this approach with the calculations are equally valid here.

 

Close Distributions:

The current distribution date is always contained in the field DistDate in the last record of the SERIES file. Therefore, to close a distribution, a new record must be added to the SERIES file and the various date fields (including DistDate) updated for the next distribution. No other changes should be made to any of the data files.

In order to preserve an audit trail the Calc File and all data files except the loan detail file should be compacted and added to a "Zip" files whose name includes the month and year of the distribution. This should be done under program control at the time of closing the distribution..

 

Specifications:

In the last two applications no formal reports were produced by the system. Instead, the report data was exported directly to spreadsheets. Since formal reports require time to develop, continuing this practice should lead to higher productivity. Specifications should include which data is to be exported to a spreadsheet. Switching from FoxPro DOS and Lotus DOS to FoxPro for Windows and Excel with their Dynamic Data Exchange capability would further simplify this process.

Terminology such as "Program" or "Initial" files should be abandoned. This section of the specifications should be called "Beginning Balances" or something similar. As pointed out above, the population of a dummy distribution date with the initial values is a far more productive approach.

There is no need to include calculations for establishing beginning balances. The specifications should simply state which ending balances from the prior distribution should be carried forward to the current distribution.

Intermediate calculations should be grouped and indentified in the specifications in the same manner as these calc variables are used in the code: i.e. CalcF1, CalcF2, etc. This tying of the specifications to the code is the central concept of Formula Based Coding.

Care should be taken to avoid unnecessary calculations. A case in point was the original CMS2 specifications which called for several calculations for Remic I, whereas in fact Remic I was simply a regrouping of the classes from Remic II, a rather trivial task in FoxPro. Had the original specifications not been changed, at the very least the waterfall would have required an extra day to develop.

 

Conclusion:

The Formula Based Coding approach promises to reduce what has been in the past a major problem area to a normal, everyday getting the job done scenario. A few hours invested in teaching this approach to both consultants and employees should yield handsome dividends in the future. Even more importantly, it puts in stark relief the limitations of throwing programmers at problems for which no methodology or systems analysis exist. Most often five hours of thinking constructively saves fifty hours of coding and debugging. It's not only the programmer who pays the price. It's the enduser as well and, ultimately, the profit margins of the Bank.