0-Introduction
This paper describes an expert system, grounded on case-based reasoning, which predicts the yield of a banana plantation. The knowledge of the system resides in a collection of historic records, which correspond to weekly-assembled descriptions of actual weather conditions over the bunch maturation period. The records are organized in cohorts of parturition. Each record is made out of a label, which contains the identification of the week of the year in which parturition of the cohort occurred together with data on rain, sunshine, mean temperature, and mean humidity, and a vector with the yield during the harvesting weeks of the cohort. The records are concrete descriptions rather than frames or prototypes.
The cases in memory make possible the prediction of yield by simulation. In addition, they allow for the validation of the system by "inverse prediction": after each new record is assembled on the basis of statistics, its label is used to retrieve preexisting records, so that simulated yield can be checked against actual yield. The knowledge base of records grows monotonically with the following exception: if the label of a newly assembled record matches, within the margin of error, a preexisting record for the same week of the year, the two records are compacted by the weighted averaging described in Section 6. This process of compaction corresponds to the learning of the system, which, with time, enhances performance.
Section 1 describes the domain of application. Section 2 describes the methodology used. Section 3 explains the assemblage of historic records. Section 4 explains the execution of the simulation (assemblage of actual weather labels, retrieval and selection of relevant records, instantiation, and aggregation of records to form predicted total yield). Section 5 presents the validation of the system. Section 6 describes the self-correcting behavior (learning) of the system. Section 7 is the conclusion, and shows that the mechanisms described are generalizable over domains and performance elements.
1-The problem
Banana plantations produce bunches all year round. Every week many plants "burst" and give birth to a new bunch; this event is called parturition. Each bunch is marked with a colored ribbon that identifies its parturition cohort, i.e. all the plants that underwent parturition during a particular week; up to 12 different colors are used to differentiate the cohorts. A cohort becomes ready for harvesting in approximately 12 weeks, with some loss of bunches from parturition to harvest. The harvest occurs in three stages: in the 11th and 12th weeks only bunches with fingers of a standard caliber (46/32") are cut; in the 13th all bunches are harvested regardless of caliber. The Farm keeps the Shipper, with whom it has signed a marketing contract, informed of the number of 42-pound boxes it expects to deliver. That estimate becomes binding three weeks before cutting date, within an e% margin of error, where e is a small integer (most commonly 5) stipulated by contract. The profits of the Farm are tightly associated with good estimates, hence the importance of having an accurate expert judgment on the matter. The yield problem is that of making an acceptable estimate of the number of boxes the Farm will deliver each cutting week, on the basis of the number of parturitions of the relevant standing cohorts and the experience represented by records of previous cases.
The weather conditions for fruit growth are mainly amount of rain and hours of sunshine, roughly the more of both the better. More specifically: 1 inch of rain a week is ideal, but too much of it can be detrimental for nutrient absorption; every sunny hour is welcome, unless the humidity is too low and the temperature too high (which could produce plant dehydration); the plant thrives within a temperature range of 23 to 29 degrees Celsius; too high humidity provokes the attack of the dreadful black-sigatoka disease which drastically diminishes yield. Under the right combination of rain, sunshine, temperature, and humidity, a bunch increases the size of its fingers by 1 degree every three days (1 degree = 1/32").
2-Methodology
The methodology is based on the direct storing and consulting of records of particular cases, i.e. pairs of weather-condition descriptions and corresponding yield. No frames or prototypes are assumed. The methodology relies entirely on direct mechanisms for selecting, adjusting, and applying the stored knowledge of past cases to produce a simulation of the question at hand, namely "how many boxes should we expect to fill on cutting day three weeks from now?".
Each record contains the information associated with one parturition cohort. The record is made up of a label and a vector. The label contains the description of the weather during the first 10 weeks of maturation. It is composed of five fields. The first one is the ordinal number of the week in the year (1 to 52). Its purpose is to contribute an element of seasonal change in the weather conditions, as the calendar moves from the colder to the warmer, from the dryer to the more humid weeks or vice versa. The other four fields are hours of sunshine, inches of rain, mean temperature, and mean humidity during the first 10 weeks of the cohort. The vector is composed of three fields that contain the yield of the three harvesting weeks, expressed as the numbers of boxes filled on the 11th, 12th, and 13th weeks divided by the number of parturitions of the cohort.
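The record layout just described can be sketched as a small data structure. This is only an illustration: the names Label, Record, and make_record, the field names, and the example figures are hypothetical and do not come from the system itself, and the text does not say whether sunshine and rain are totals or weekly figures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Label:
    """Five-field label of a record (field names are illustrative)."""
    week: int              # ordinal week of the year of parturition, 1..52
    sunshine_hours: float  # hours of sunshine over the first 10 weeks
    rain_inches: float     # inches of rain over the first 10 weeks
    mean_temp_c: float     # mean temperature over the first 10 weeks
    mean_humidity: float   # mean humidity over the first 10 weeks


@dataclass(frozen=True)
class Record:
    label: Label
    # Boxes per parturition for harvesting weeks 11, 12, and 13.
    yield_vector: Tuple[float, float, float]


def make_record(label, boxes_by_week, parturitions):
    """Divide the boxes filled in weeks 11-13 by the cohort's parturitions."""
    if parturitions <= 0:
        raise ValueError("parturitions must be positive")
    if len(boxes_by_week) != 3:
        raise ValueError("expected boxes for exactly three harvesting weeks")
    return Record(label, tuple(b / parturitions for b in boxes_by_week))


# Invented figures, for illustration only.
example = make_record(Label(14, 420.0, 9.5, 26.0, 80.0), (100, 250, 150), 1000)
```

Storing the vector as boxes per parturition is what later allows a stored case to be applied to a cohort of any size.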
3-Assemblage of Historic Records
A historic record can be viewed as a particular format for representing weather, parturition, and yield statistics for different combinations of weekly weather conditions during a 13-week period. The record is assembled automatically every Saturday at midnight, using information provided by sensors strategically located all over the farm. It refers to the 13-week period of a cohort which ends at that time. The record is made up of a label and a predictive vector. The label subsumes the weather statistics for the first 10 weeks of the cohort (the weather during the cutting weeks is assumed not to be relevant). The predictive vector contains the yield numbers for the three cutting weeks. Once assembled, the new record is stored in the data base, label and predictive vector together, as is, except in the case described below in Section 6.
4-Execution of Simulation
The simulation has four stages:
4.1-Assemblage of Actual-Weather Labels
Twelve actual-weather labels have to be assembled, according to the actual weather conditions of the previous 12 weeks. Each of the assembled labels starts with the ordinal number of the first week of the respective cohort, and follows with the values for the four weather variables, computed as far as the system has data for the elapsed time since parturition. The support of each actual-weather label is thus full in the case of the first three labels, which rely on 10 weeks of data, but progressively weaker for the others. However, the prediction based on those labels is contractually binding only in the case of the first three, the others being only tentative.
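A minimal sketch of this assembly step follows. It rests on several assumptions not stated in the text: the function name assemble_labels and its parameters are hypothetical, each weather variable is summarized as a per-week mean so that labels with partial support stay comparable with full ones, and current_week denotes the week about to start.

```python
from statistics import mean


def assemble_labels(weekly_weather, current_week, n_cohorts=12, window=10):
    """Return one (label, support) pair per standing cohort, oldest first.

    weekly_weather: (sunshine, rain, temperature, humidity) tuples, one per
    elapsed week, oldest first; at least n_cohorts entries.
    current_week: ordinal (1..52) of the week about to start (assumption).
    support: number of weeks of data behind the label (at most `window`).
    """
    if len(weekly_weather) < n_cohorts:
        raise ValueError("need at least n_cohorts weeks of weather data")
    labels = []
    for age in range(n_cohorts, 0, -1):       # weeks elapsed since parturition
        start_week = (current_week - age - 1) % 52 + 1
        # Weeks since parturition, truncated to the first `window` of them.
        observed = weekly_weather[-age:][:window]
        fields = tuple(mean(column) for column in zip(*observed))
        labels.append(((start_week, *fields), len(observed)))
    return labels
```

With 12 weeks of data, the three oldest cohorts get a support of 10 weeks and the remaining nine get 9 down to 1, which mirrors the distinction between binding and tentative predictions.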
4.2-Selection of Predictive Vectors
Each of the actual-weather labels is used to retrieve predictive vectors whose labels lie within an e% margin for each of the variables in the labels (the e% margin is also applied to the first field, so that weeks closely following or preceding the one in the label may be considered in the matching process). If no match is forthcoming, the current value of e is relaxed by 1, and the matching process runs again. If multiple matches occur, the vectors in the matching set are averaged out to get a single vector for the purpose of instantiation. The result of this phase is the selection of 12 predictive vectors, directly retrieved or computed by averaging out.
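The retrieval rule can be sketched as follows. The names within_margin and select_vector are hypothetical, records are represented as plain (label, vector) tuples, and two points are assumptions of this sketch: the e% margin is taken relative to the value in the actual-weather label, and relaxation stops at an arbitrary ceiling (max_e) to guarantee termination.

```python
def within_margin(query, stored, e):
    """True if every stored field lies within e% of the query field."""
    return all(abs(s - q) <= abs(q) * e / 100.0 for q, s in zip(query, stored))


def select_vector(query_label, records, e=5, max_e=100):
    """Return (predictive_vector, e_used) for one actual-weather label.

    records: iterable of (label, vector) pairs; labels are 5-tuples
    (week, sunshine, rain, temperature, humidity), vectors are 3-tuples.
    """
    records = list(records)
    while e <= max_e:
        matches = [vec for label, vec in records
                   if within_margin(query_label, label, e)]
        if matches:
            n = len(matches)
            # Average the matching vectors field by field.
            averaged = tuple(sum(v[i] for v in matches) / n for i in range(3))
            return averaged, e
        e += 1  # no match: relax the margin by 1 and try again
    raise LookupError("no record matches even at the maximum margin")
```

The sketch treats the week field as an ordinary number, so week 52 and week 1 are not considered neighbors; a circular comparison would be needed to capture that.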
4.3-Instantiation
The parturition figures for each cohort are used to instantiate each predictive vector previously selected. The process of instantiation consists of multiplying the parturition figure by the predictive vector, the result being an instantiated vector. Since there are 12 predictive vectors, 12 instantiated vectors are forthcoming.
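As a small illustration, instantiation is a scalar multiplication of the per-parturition vector. The function name instantiate and the figures are hypothetical.

```python
def instantiate(parturitions, predictive_vector):
    """Scale a boxes-per-parturition vector by the cohort's parturitions."""
    return tuple(parturitions * y for y in predictive_vector)


# Invented cohort of 1000 parturitions: expected boxes in weeks 11, 12, 13.
boxes = instantiate(1000, (0.25, 0.5, 0.125))
```

For these invented figures the cohort is expected to fill 250, 500, and 125 boxes in its three harvesting weeks.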
4.4-Aggregation
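The instantiated vectors form a displaced matrix: each vector occupies one row, shifted one column (one week) with respect to the previous one. The kernel of the matrix, i.e. the columns to which three cohorts contribute, produces, by column addition, the prediction of yield for ten consecutive weeks, starting at current week + 2.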
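One way to read the displaced matrix is sketched below, assuming that the rows are ordered from the oldest cohort to the youngest and that the kernel is the set of columns receiving contributions from three cohorts. The function name aggregate is hypothetical.

```python
def aggregate(instantiated):
    """Column sums over the kernel of the displaced matrix.

    instantiated: list of 3-field vectors, oldest cohort first. Row i is
    displaced i columns to the right, so it fills columns i, i+1, i+2.
    """
    n = len(instantiated)
    totals = [0.0] * (n + 2)
    for i, vector in enumerate(instantiated):
        for j, boxes in enumerate(vector):
            totals[i + j] += boxes
    # Columns 2 .. n-1 are the only ones fed by three cohorts.
    return totals[2:n]
```

With 12 instantiated vectors the matrix has 14 columns, of which 10 receive three contributions, which agrees with the ten consecutive weeks of prediction.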

5-Validation
The automatic assemblage of vectors explained in Section 3 could be seen as an "inverse prediction," since what the system does is to start with actual figures for parturition and three types of yield (precocious, regular, and late), and end up producing the relative numbers that would be needed to make their prediction. That inverse prediction can be used to validate the system, by verifying to what extent its results lie within the contractual margin of error e of the predictive vectors retrievable with the label.
The validation process functions as follows: after running the process described in Section 3, a new record is produced. Its label is then used to retrieve a set of preexisting records, by the same procedure described in 4.2. Once a predictive vector is obtained or computed from preexisting records, the total yields of the two vectors (the sum of the three fields of each) are compared to see if they differ by more than the e% margin of error. If they do not, the system is validated.
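The comparison can be sketched as below. The function name validate is hypothetical, and measuring the discrepancy relative to the actual (newly assembled) total is an assumption, since the text does not say which total serves as the base.

```python
def validate(new_vector, retrieved_vector, e=5.0):
    """Compare total yields; return (is_valid, discrepancy_percent)."""
    actual = sum(new_vector)           # from the newly assembled record
    predicted = sum(retrieved_vector)  # from preexisting records
    if actual == 0:
        raise ValueError("actual total yield is zero")
    discrepancy = abs(predicted - actual) / actual * 100.0
    return discrepancy <= e, discrepancy
```

The second returned value, the discrepancy in percent, is the quantity the learning step in Section 6 relies on.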
6-Learning
The described process of validation can be used as a self-correcting mechanism for the system. If a preexisting record is found that matches the new record (the one just assembled) within e, and its label has a first field identical to the first field of the label of the new record, the new record is averaged with the preexisting record, rather than simply added to the data base. If the validation process has given a positive result (difference in yield lying within e%), there is no reason to prefer one of the two vectors to the other, so each is assigned a weight of 0.5. On the other hand, if the discrepancy is larger than e%, it seems advisable to give less weight to the new vector. I propose a weight of -0.01x + 0.5 for the new vector, where x is the excess discrepancy (curves more elaborate than this straight line could be decided upon by means of sophisticated statistical analysis, but I doubt the gain in accuracy of prediction will be worth the trouble). Finally, in cases of gross discrepancies (over 50%) it seems advisable to totally ignore the old vector and substitute the new one, in order to ensure that long-term changes in the quality of attention to the farm are always taken into account. One could say that this adaptive behavior of the system amounts to an elemental form of learning which should reflect itself in improved performance.
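The weighting scheme can be sketched as follows. The function names new_vector_weight and compact are hypothetical, and the sketch assumes that x is the discrepancy minus e, both expressed in percentage points. Only the predictive vectors are combined here; the text does not specify how the labels are merged.

```python
def new_vector_weight(discrepancy, e=5.0, gross=50.0):
    """Weight given to the new vector when compacting two records."""
    if discrepancy > gross:
        return 1.0                 # gross discrepancy: the new vector replaces the old
    if discrepancy <= e:
        return 0.5                 # validated: no reason to prefer either vector
    excess = discrepancy - e       # x, assumed to be in percentage points
    return -0.01 * excess + 0.5


def compact(old_vector, new_vector, discrepancy, e=5.0, gross=50.0):
    """Weighted average of the preexisting and the new predictive vectors."""
    w = new_vector_weight(discrepancy, e, gross)
    return tuple(w * new + (1.0 - w) * old
                 for old, new in zip(old_vector, new_vector))
```

Under these assumptions, with e = 5 the weight of the new vector falls linearly from 0.5 to 0.05 as the discrepancy rises from 5% to 50%, and jumps to 1 beyond that point.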
7-Conclusion
I have described an adaptive expert system grounded on case-based reasoning. The system has generic properties, i.e. it seems to be generalizable to different domains and different performance elements (the performance element in this case is a simulation, but it could be almost anything, from the running of a physical machine to the supervision of an electronic network, for instance). The mechanisms presented include the assemblage of statistics of categorizing conditions (in this particular system, weather conditions) and performance data (in this system, yield) in historic records. These records have two parts: a label, which is the handle of the record for retrieval purposes; and a predictive vector, which contains the performance figures (expressed in numbers relative to an independent or input variable, in this system number of parturitions). The fact that all contents of the records are numeric, including the one representing temporal order (first field), makes it possible to compare labels, and to average-out sets of records. The basic methodological idea is that, instead of pre-computing prototypes or frames for a domain, one should store real cases directly, and rely on large memory and very little computation at performance time. The recommended method of knowledge application is to retrieve records that match actual-data labels within a certain margin of tolerance, averaging the collection to produce input to the performance element. Finally, the suggestion is made that some form of compaction of historic records be applied, in selected circumstances, so that monotonic growth of the knowledge base be limited and at the same time the system become capable of self-corrective adaptation. The hope is that this kind of solution be applicable across domains and performance elements in the design of a whole class of memory-intensive expert systems that learn.