Moody and Darken[3] have proposed a multi-phase approach
to RBFNs. First, a pre-defined number of centers is distributed in
input space with a cluster method (e.g., LBG[4] or
k-means[5]). The width parameters of the Gaussians are
set by a local heuristic, e.g., setting the width
of each unit equal to
the distance to its nearest neighboring unit. Moreover, Moody and Darken
propose to use normalized activations according to (3). In
terms of fuzzy systems the steps just described correspond to the
identification of the IF-parts of a Sugeno fuzzy rule.
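
The first phase described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it uses a plain Lloyd-style k-means, takes the nearest-neighbor distance as the width, and assumes that the normalization in (3) divides each Gaussian activation by the sum of all activations. The function names, the toy data, and the choice of 8 units are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd-style k-means; returns k centers for the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers

def nearest_neighbor_widths(centers):
    """Width of each unit = distance to the nearest other center."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # ignore the zero distance to itself
    return d.min(axis=1)

def normalized_activations(X, centers, widths):
    """Gaussian activations, normalized to sum to 1 for every input
    (assumed form of the normalization referred to as (3))."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    g = np.exp(-sq / (2.0 * widths[None, :] ** 2))
    return g / g.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))  # toy 2-D inputs, no targets used
centers = kmeans(X, k=8)
widths = nearest_neighbor_widths(centers)
A = normalized_activations(X, centers, widths)  # one row per input
```

Note that no target values appear anywhere in this phase: centers and widths are determined by the input data alone.
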
The THEN-parts of the fuzzy rules (equivalently, the output weights of the RBFN) are set by pseudo-inverse computation such that the summed square error (4) for a given training data set is minimized. It is also possible, but usually offers no advantage, to compute the output weights iteratively by gradient descent on the error function[6].
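
As a rough sketch of this second phase, the snippet below solves for the output weights with NumPy's pseudo-inverse and, for comparison, runs batch gradient descent on the same error. It assumes that (4) is the sum of squared differences between network outputs and desired outputs. The activation matrix `A` is a random stand-in for the normalized unit activations, and the step size and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def fit_output_weights(A, D):
    """Least-squares output weights W minimizing ||A @ W - D||^2."""
    return np.linalg.pinv(A) @ D

def summed_square_error(A, W, D):
    """Sum of squared differences between outputs A @ W and targets D."""
    return float(((A @ W - D) ** 2).sum())

# Toy stand-in for the normalized activations of 5 units on 100 samples.
rng = np.random.default_rng(1)
G = rng.uniform(0.1, 1.0, size=(100, 5))
A = G / G.sum(axis=1, keepdims=True)
D = rng.normal(size=(100, 1))  # desired outputs (hypothetical data)

# One-shot solution via the pseudo-inverse.
W = fit_output_weights(A, D)
err_pinv = summed_square_error(A, W, D)

# Iterative alternative: batch gradient descent on the same error.
W_gd = np.zeros_like(W)
for _ in range(2000):
    grad = 2.0 * A.T @ (A @ W_gd - D)
    W_gd -= 0.01 * grad
err_gd = summed_square_error(A, W_gd, D)
```

Because the error is quadratic in the output weights, the pseudo-inverse solution is a global minimum, so `err_gd` can approach `err_pinv` but never fall below it.
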
This multi-phase approach is straightforward and is often reported to be much faster than, e.g., backpropagation training of multi-layer perceptrons on the same data. A possible problem of the approach, noted for example by Bishop[7], is that the clustering is completely unsupervised and does not take the desired output information (class labels or continuous output values) into account. Clustering methods usually try to minimize the mean distance between the centers they distribute and the given data (which is only the input part of the training data). This error, however, is of little relevance to many supervised learning problems. The resulting distribution of RBF centers (or rule patches) may therefore be poor for the classification or regression problem at hand. Fig. 6 shows an example where this is the case.
If one visually inspects the generated neuro-fuzzy system in Fig. 6, it becomes obvious that in several places there are rules covering neighboring areas and having basically the same output. Such rules could be combined into fewer rules, each covering a correspondingly larger area of the input space. This would free up resources that could be used where the system can benefit from them more, for example in the upper right corner of the displayed part of the input space.
Instead of first constructing a possibly poor neuro-fuzzy system and then improving it later on, it would be advantageous to build a good system for the problem at hand right away. This is the goal of the method described in the following section.
Bernd Fritzke