Authors: Wei, H. L.1; Billings, S. A.1
Source: International Journal of Control, Volume 82, Number 1, January 2009, pp. 27-42(16)
Publisher: Taylor and Francis Ltd
Abstract:
In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data that are used for model identification and the test data that are used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning method is convenient and the associated model identification procedure is in general easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may occasionally lack robustness and generalisation to represent future unseen data, because the performance of the identified model may be highly dependent on how the data partition is made. To overcome the drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or preferably unbiased models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets), using a K-fold random subsampling method. Secondly, detect significant model terms and identify a common model structure that fits all the K datasets using a newly proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.
Keywords: cross-validation; model structure/subset selection; non-linear system identification; parameter estimation; random resampling; split-sample
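The three-step procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details of the multiple orthogonal search algorithm or of the multifold parameter estimation method, so the sketch assumes (a) a greedy forward search that ranks candidate terms by an error reduction ratio averaged over the K training sets, and (b) a simple average of per-fold least-squares estimates. All function names, the 70/30 split and the toy data are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def random_subsamples(n, k, train_frac=0.7, seed=0):
    """Step 1: draw K random train/validation index splits of range(n)."""
    rng = random.Random(seed)
    cut = int(train_frac * n)
    splits = []
    for _ in range(k):
        idx = list(range(n))
        rng.shuffle(idx)
        splits.append((sorted(idx[:cut]), sorted(idx[cut:])))
    return splits

def select_common_terms(folds, n_terms):
    """Step 2 (assumed stand-in): greedy search for terms shared by all folds.

    folds is a list of (columns, y); columns[j] holds candidate term j on that fold.
    A candidate's score is its error reduction ratio averaged over the folds.
    """
    selected = []
    bases = [[] for _ in folds]  # orthogonalised selected terms, one list per fold
    for _ in range(n_terms):
        best = None  # (score, term index, orthogonalised vectors per fold)
        for j in range(len(folds[0][0])):
            if j in selected:
                continue
            score, vectors = 0.0, []
            for (columns, y), basis in zip(folds, bases):
                w = list(columns[j])
                for q in basis:  # Gram-Schmidt against already selected terms
                    c = dot(q, w) / dot(q, q)
                    w = [wi - c * qi for wi, qi in zip(w, q)]
                ww = dot(w, w)
                if ww < 1e-12:  # term is redundant on this fold, discard it
                    score = -1.0
                    break
                score += dot(w, y) ** 2 / (ww * dot(y, y)) / len(folds)
                vectors.append(w)
            if score >= 0 and (best is None or score > best[0]):
                best = (score, j, vectors)
        if best is None:
            break
        selected.append(best[1])
        for basis, w in zip(bases, best[2]):
            basis.append(w)
    return selected

def least_squares(columns, y):
    """Solve the normal equations by Gauss-Jordan elimination with pivoting."""
    p = len(columns)
    a = [[dot(ci, cj) for cj in columns] + [dot(ci, y)] for ci in columns]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        for r in range(p):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [x - f * z for x, z in zip(a[r], a[i])]
    return [a[i][p] / a[i][i] for i in range(p)]

def multifold_estimate(folds, selected):
    """Step 3 (assumed stand-in): average the per-fold least-squares estimates."""
    thetas = [least_squares([cols[j] for j in selected], y) for cols, y in folds]
    return [sum(t[i] for t in thetas) / len(thetas) for i in range(len(selected))]

# Toy data (hypothetical): y = 2*u1 - 1.5*u1*u2 + noise, with five candidate terms.
rng = random.Random(1)
n = 200
u1 = [rng.uniform(-1, 1) for _ in range(n)]
u2 = [rng.uniform(-1, 1) for _ in range(n)]
terms = [u1, u2, [a * b for a, b in zip(u1, u2)],
         [a * a for a in u1], [b * b for b in u2]]
y = [2.0 * a - 1.5 * a * b + rng.gauss(0, 0.05) for a, b in zip(u1, u2)]

folds = []
for train, _valid in random_subsamples(n, k=5):  # validation indices unused here
    folds.append(([[col[i] for i in train] for col in terms], [y[i] for i in train]))

selected = select_common_terms(folds, n_terms=2)
theta = multifold_estimate(folds, selected)
print(selected, [round(t, 2) for t in theta])
```

In this sketch the number of terms is fixed in advance; in practice the K validation sets would presumably be used to decide when to stop adding terms, a step the sketch leaves out.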
Document Type: Research article
DOI: 10.1080/00207170801955420
Affiliations: 1: Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield, S1 3JD, UK