Abstract
This paper studies nonparametric estimation of the discount curve, which should be decreasing and positive over the entire maturity domain. Very few papers explicitly impose these shape requirements for removing the possibility of obtaining a shape-violating estimation. No matter how small the approximating error is, a shape-violating discount curve can never be accepted by the financial industry. Since these shape requirements are continuously constrained and involve an infinite number of inequality constraints, it is hard to provide a necessary and sufficient implementation that is computationally tractable. Existing parametric and nonparametric methods fail to achieve universal flexibility and shape compliance simultaneously. This paper proposes a nonparametric method that approximates the discount curve with algebraic polynomials and ensures the discount function is decreasing and positive over the entire domain. This estimation problem can be reformulated equivalently as a semidefinite program that is convex and computationally tractable. The proposed method is the first one which not only has asymptotic universal fitting flexibility, but also fully complies with shape requirements. Experimental results on one artificial data, one US Gilt STRIPS data, and one US Treasury bonds data demonstrate its superiority over state-of-the-art methods in terms of both the compliance of shape requirements and out-of-sample fitting measures.
Keywords
Introduction
As an important topic in finance, refining term structure estimation has received considerable attention for many decades [1]. The term structure of interest rates is the relationship between interest rates and maturities. The term structure of interest rates can be expressed in the discount curve d (t), or the spot rate curve r (t), or the instantaneous forward rate curve f (t). They are related to each other as follows
In this paper the maturity domain is a finite interval [0, T] ⊂ [0, + ∞). A discount function is required to satisfy the following shape requirements:
Positive: d (t) ≥0, ∀ t ∈ [0, T]. For any maturity, the present value of $1 in the future cannot be negative.
Decreasing: d (t1) ≥ d (t2) , ∀0 ≤ t1 < t2 ≤ T. The present value of $1 in the future decreases with maturity. This requirement is equivalent to the positive requirement of the instantaneous forward rate function: f (t) ≥0, ∀ t ∈ [0, T].
Unity at time 0: d (0) =1. It means that the limit of the present value of $1 in the future is 1 as the tenor approaches 0.
Continuous: d (t) is continuous on [0, T]. Dis-continuousness of d (t) at any t ∈ [0, T] implies +∞ or -∞ instantaneous forward rate at t.
Given that the decreasingness of d (t) holds, the requirement of positiveness can be replaced with d (T) ≥0.
However, most of the existing models, including parametric and nonparametric, have the possibility of violating the above shape requirements. If one estimated discount curve
Three attempts have explicitly imposed shape requirements on cross-sectional term structure models: shape-restricted B-spline regression [2, 3] and shape-restricted Gaussian process regression [4]. The first two attempts approximate the discount function with B-splines, but they differ in implementing arbitrage-free requirements. In [2], the requirement of monotonicity over the entire domain is simplified as the requirement of monotonicity over the knots. [3] implements monotonicity with the gradual diminution of the weight sequence of B-splines. [4] estimates the term structure with a generalization of kriging models with linear equality constraints (market-fit requirement) and inequality constraints (monotone-shape requirement). Section 2 gives a detailed introduction on these three attempts.
All the above attempts, however, should be improved. If the order of B-splines is higher than 3, [2] provides a necessary, but not sufficient, implementation of these qualitative requirements, while [3] provides a sufficient, but not necessary, implementation. Insufficient implementation leads to under-restriction that fails to guarantee shape preservation, while unnecessary implementation leads to over-restriction that is harmful to fitting inflexibility. Despite the knots-constrained spline regression (order m = 3) [2], the parameters-constrained spline regression (order m = 3) [3] and the shape-restricted Gaussian process regression [4] provide a necessary and sufficient implementation of these qualitative requirements, the discount curve estimated by these models has a piece-wise linear derivative, which implies zigzag-like forward rate curves.
This paper proposes a sieve estimator of the discount function. This estimation approximates the discount function with algebraic polynomials under explicit shape requirements. The polynomial for the discount function is required to be unity at maturity 0, positive and decreasing over the maturity domain. The requirement of monotonicity involves an infinite number of inequality constraints and makes this estimation become a semi-infinite program. But it can be equivalently transformed into a semidefinite program, which is convex and computationally tractable.
The proposed model has five apparent advantages. It has universal flexibility and can approximate any continuous discount function with an arbitrary accuracy as the polynomial degree increases. It is a sufficient and necessary implementation of shape requirements. For any training data and any polynomial degree, the estimated term structure cannot include any negative discount factor and any negative forward rate. It can be solved with a semidefinite program, which is convex and computationally tractable. It can estimate the term structure of interest rates when the data include coupon bonds. This is a great advantage over many interpolation-based models that can only estimate the par-yield curve when the data include coupon bonds. In practice, spot rates with tenors longer than one year are not directly observable. Commonly bonds with maturities longer than one year are coupon bonds. The estimated discount curve is differentiable for any order.
Table 1 summarizes the advantages of the proposed method over others.
Summary of various methods
A function f is said to be of C m if the derivatives f′, f″, ⋯, f(m) exist and are continuous. For example, class C0 consists of all continuous functions, the class C1 consists of all differentiable functions whose derivative is continuous. A function f is said to be infinitely differentiable, or of C∞, if it has derivatives of all orders. Therefore, C0⊇ C1 ⊇ C2 ⊇ ⋯. In this table, for three methods, i.e., the knots-constrained spline regression (order m = 3) [2], the parameters-constrained spline regression (order m = 3) [3] and the shape-restricted Gaussian process regression [4], the function family is C2, the estimated discount function is of C2, and the estimated forward rate is piece-wise linear.
All vectors are column vectors written in boldface and lowercase letters, whereas all matrices are in boldface and uppercase letters. All elements are written in plain lowercase letters. For example, the i-th element of vector
The paper proceeds as follows. Section 2 introduces three closely related methods in detail. The estimation model with shape-restricted polynomial regression is presented in Section 3. Section 4 illustrates the novelty of the proposed model with artificial and real datasets. This paper is concluded by Section 5.
One can estimate the term structure of interest rates by fitting bond prices with parametric models. Popular parametric models are the Nelson-Siegel model [5] and the Svensson model [6]. It is easy to extend parametric models to time series of interest rates [20, 21]. The major shortcoming of parametric models is the lack of universal flexibility. Nonparametric models for the term structure can be classified into two categroies. One category is regression splines, including quadratic and cubic splines [7, 8], cubic ℓ1 splines [9], exponential splines [10], smooth splines [11–13], tension splines [14] and Bayesian splines [15]. Another category is polynomials, including algebraic polynomials [16], Bernstein polynomials [17] and Chebyshev polynomials [18, 19].
However, due to four shape restrictions, the estimation of the term structure of interest rates should be regarded as shape-restricted regression. Shape-restricted regression has a long history in statistical literature with seminal works dating back to half a century, such as [22] and [23]. Common shapes analyzed in nonparametric econometrics are monotone, convex (concave), supermodular and homogeneous [24–31]. Recently one special issue on nonparametric inference under shape constraints was published by Statistical Science [32].
The decreasing shape requirement is imposed on the entire maturity domain, which involves an infinite number of inequality constraints. Hence, the estimation problem usually becomes a semi-infinite program [33–35], which involves finite decision variables and infinite equality or inequality constraints. Generally, semi-infinite programs are computationally intractable, except for very few special structures. One can refer to [36] for a comprehensive survey on semi-infinite programs.
This shape requirement was neglected by many classical term structure models, including the most popular one [5]. Attempts have been proposed to improve in this direction. Among them, some models provide only a necessary, but not sufficient, implementation, in which infinitely many inequality constraints corresponding to uncountably many points t ∈ [0, T] are reduced to finitely many inequality constraints corresponding to finitely many tenors. For example in [37], an extended Nelson-Siegel model, the decreasing requirement is replaced with
Knots-constrained spline regression [2]
Assume the bond data consist of N zero-coupon bonds
When the B-splines is quadratic, i.e., m = 3, constrain (2b) is a necessary and sufficient condition for the requirement of monotonicity. For any x ∈ [x
i
, xi+1], i = m, ⋯ , K + m - 2,
However, when m ≥ 4, Eq. (3) doesn’t hold, and thus constrain(2b) is not sufficient for the requirement of monotonicity. In other words, provided that m ≤ 4, the monotonicity at finite knots doesn’t follow the monotonicity over the entire interval. Hence [2] may fail to obtain a universally decreasing discount curve, even though its derivative at all knots is negative.
[3]also applies spline regression to estimate the discount function. Provided that the set of knots is the same as [2], the coefficients of the estimator
The monotonicity of the estimated curve is ensured by the variation diminution property of the B-splines [54, pp. 138-142]de1978practical: the number of sign changes in d (·) is, at most, as large as in the sequence a1, a2, ⋯, aK+m-2. When m ≤ 3, this constraint is also necessary for the requirement of monotonicity. When m > 3, this constraint is no longer necessary, which implies over-restriction and loss of flexibility.
In this method, the domain [0, T] is discretized into a regular subdivision 0 = x0, x1, ⋯, x
K
= T with a constant mesh size δ = T/K. An associated set of basis functions φ
j
is defined as Equality constraints related to market consistence: Inequality constraints related to the requirement of monotonicity ξ0, ⋯ , ξ
K
≤ 0.
In the original [4], only the requirement of monotonicity is explicitly imposed. To free from arbitrage opportunities, the discount function is further required to satisfy d (0) =1 and d (T) >0. This paper improves the original method and considers the other two requirements. To achieve d (0) =1, one can impose an additional equality constraint η = 1. To achieve d (T) >0, one should impose an additional inequality constraint η + δξ0/2 + δξ1 + ⋯ + δξK-1 + δξ K /2 ≥0 .
Methodology
Assume that the current time is 0 and the term structure should be estimated from quoted prices of N bonds. Let q
n
be the n-th bond’s dirty (full) price, i.e., the sum of the quoted price and the accrued interest. Assume the n-th bond has M
n
determined future cash flows up to maturity: payment c
nm
occurring at time t
nm
, m = 1, 2, ⋯ , M
n
. For a given discount function d (t), the price of the n-th bond should be
This paper estimates the discount function d (t) in the following framework
This paper approximates the discount function with algebraic polynomials
Constraints (10b) corresponds to the monotonicity of functional family
Instead of solving the semi-infinite program by discretization-based heuristic algorithms, this paper proposes an equivalent semidefinite representation of constraint (10b). Thanks to the following lemma [39, Theorem 9, Theorem 10], this program can be transformed to a semidefinite program with finitely many decision variables and finitely many inequality constraints. Let
When k is odd, i.e., k = 2k1 - 1,
According to the Markov-Lukács theorem, the necessary and sufficient condition for p (t) to be positive on in case of even k, i.e., in case of odd k, i.e.,
for some sum-of-squares polynomials p1 of degree 2k1 and p2, p3 and p4 of degree 2k1 - 1. Moreover, the cone of sum-of-squares of polynomials
With a straightforward application of Lemma 1, the semi-infinite program (10) can be transformed to one of the following two programs according to the parity of k.
At the end of this section, it is necessary to emphasize again that, the extrapolation of this estimated discount function d (t) beyond the maturity domain [0, T] is highly unreliable, because the positive and monotone shape requirements are imposed only on [0, T]. T must be larger than the longest tenor of cash flows in the portfolio.
The proposed model is built on the CVX toolbox [40] for solving semidefinite programs. When T and k are very large, some variables may reach the maximum arithmetic representation of a computer. For example, given that T = 50 and k = 200, T k = 50200 will be treated as infinity because of overflow. Hence, all experiments scale the maturity domain [0, T] to the unit interval [0, 1] for avoiding possible overflow. Benchmark models, including the Nelson-Siegel model [5], the Nelson-Siegel-Svensson model [6] and the unrestricted spline regression [11], employ the built-in Financial Instruments Toolbox of Matlab 2021B with the default settings. Because the estimation of the Nelson-Siegel model [5] and the Nelson-Siegel-Svensson model [6] involves a non-convex optimization, it is a rather tough task to detect the number of local optima and search its global optimum. For simplicity, this paper relies on the default settings of the IRFitOptions in this toolbox.
This paper will not report the computational time, because semidefinite programming can be efficiently solved by CVX. According to some trial experiments, even when the number of coupons N = 500 and the polynomial degree k = 60, all semidefinite problems can be solved within 1 minute. In financial practices, few cases have so many bond quotations.
Monte Carlo analysis
Motivated by [41], this subsection generates an artificial dataset from a multi-factor Cox-Ingersoll-Ross short rate model [42]
This artificial dataset consists of 100 zero-coupon bonds with face value 100. The maturities were randomly drawn from [0, T] = [0, 50] with uniform distribution. Bond prices are obtained by the multi-factor Cox-Ingersoll-Ross model plus a multiplicative normal noise with standard deviation 0.5%. The discount curve estimated by the Nelson-Siegel model [5],

The estimated discount curve of the Nelson-Siegel model [5] on the artificial data.
Six shape-restricted term structure models are compared by illustrating figures: (a) the knots-constrained spline regression with order 3 [2]; (b) the knots-constrained spline regression with order 4 [2]; (c) the parameters-constrained spline regression with order 3 [3]; (d) the parameters-constrained spline regression with order 4 [3]; (e) the shape-restricted Gaussian process regression [4]; (f) the shape-restricted polynomial regression (this paper). Model (a) and (b) use L1 fidelity and L1 roughness. Model (c) and (d) use L2 fidelity and ℓ2-norm roughness. In all models (a) - (d), the number of internal notes is 13 and the trade-off parameter λ is 10-2. In model (e), the kernel is Gaussian with σ2 = 10-2 and K = 14. In the proposed method, the degree of the polynomial is 20 and the trade-off parameter λ is 10-8. The above parameter settings are arbitrary just for illustrative purposes.
Since each estimated nonparametric discount curve is quite close to the true, each estimated term structure is shown with its forward rate curve. The forward rate curve f (t) = - ∂d (t)/(d (t) ∂t) is the differential of the discount curve, and can disclose more detailed information.
As displayed in Fig. 2(b), the forward rate curve estimated by [2] with order 4 is not positive over the entire maturity domain. This confirms that, given that the order is higher than 3, constraint (2b) is not a sufficient condition for the monotonicity of the discount curve. All forward rate curves estimated by the other five models are positive, which confirms the sufficiency of their implementations.

Estimated forward rate curves by six shape-restricted models.
In three models, the knots-constrained spline regression with order 3, the parameters-constrained spline regression with order 3, and the shape-restricted Gaussian process regression, the forward rate curve is non-differentiable at internal knots. When the basis function is piece-wise quadratic, its derivative function between each segment is linear, and at each internal knot the left derivative and the right derivative are not necessarily equal. The knots-constrained spline regression with order 4 and the parameters-constrained spline regression with order 4 can obtain discount curves that are second-order differentiable because they are piece-wise cubic. The proposed method uses a simple polynomial, so that its estimated discount curve is infinite-order differentiable.
The experiment repeats the above estimation 104 times and report out-of-sample performance in Table 2. Three state-of-the-art models are also included: the Nelson-Siegel model [5, 6] and [11]. [4] uses the Gaussian kernel with kernel parameter σ2 = 2-2. [11] uses cubic B-spline regression with trade-off parameter 10-2.
Out-of-sample experimental results on the artificial data
Model comparison is based on the following three performance measures. d-RMSE. This RMSE measures the discrepancy between the true discount function and the estimated discount function f-RMSE. This RMSE measures the discrepancy between the true forward rate function f (t) = - d′ (t)/d (t) and the estimated forward rate function PVNR: Percent of Violating No-arbitrage Requirement. It is not easy to judge whether one estimated discount curve violates shape requirements. For simplicity, the judge of violation is based on the grid
In each cell like a ± b, a and b are the average and standard deviation of 104 RMSEs. Experimental results in Table 2 show that explicit imposition of shape restrictions can improve out-of-sample fitting performance. In terms of all three measures, the method proposed in this paper has an advantage over the other eight models.
The UK Gilt STRIPS data are downloaded from the website of UK Debt Management Office 1
. It consists of 427 563 daily closing quotes of 205 UK STRIPS between July 22, 2007 and July 21, 2017. The experiment will not analyze the evolvement of term structures, or study its time series prediction problem. In other words, discount curves at 2 528 trading days are estimated independently. Even for the same model, hyperparameter settings, such as the number of internal knots in splines models, the polynomial degree in the proposed model, are not necessarily the same at different trading days. For each trading day, T is the greatest maturity of all bonds.Unlike the above subsection on the artificial data, the underlying term structure at each trading day is never known. As a result, d-RMSE (20) or f-RMSE (21) are not available. Therefore the experiment has to measure model comparison with fitting performance on test data. For each trading day, 70% of samples are randomly chosen as the training data, and other samples as the test data. For all models, the training data are used to estimate
Five-fold cross-validation is used to determine hyper-parameters of all models. In five splines-based models, the candidate set for the number of internal knots is {3, 4, ⋯ , 30}. In the proposed method, the candidate set for the polynomial degree k is {8, 9, ⋯ , 50}. The candidate set for the weight λ is
Experimental results of nine models are listed in Table 3. In each a ± b cell, a and b are the average and standard deviation of 25 280 RMSEs. The six shape-restricted models have an obvious advantage over the three classical models. the shape-restricted polynomial regression (this paper), the knots-constrained spline regression (order m = 4) [2] and the parameters-constrained spline regression (order m = 4) [3] achieve the first, second and third best performances respectively. It verifies that the requirement of smoothness can improve out-of-sample fitting performance.
Out-of-sample results on the UK STRIPS data
This data set is taken from the CRSP government bond files. It consists of daily closing quotes for US Treasury bonds between January 2, 2007 and December 31, 2014. This period covers the 2007-2010 sub-prime mortgage crisis. All bonds that include contingent cash flows, i.e., callable bonds, flower bonds, and inflation-adjusted bonds, are eliminated from the data. For each trading day and each bond, the mean of the closing bid and asked prices is used as the quoted price. In this experiment, the knots-constrained spline regression [2] and the parameters-constrained spline regression [3] will not be compared, as they are incapable of estimating the term structure when the data include coupon-bonds.
Because the data cover 2003 trading days, i.e., 2003 yield curves, the experiment should conduct 2003 independent model comparisons. For each trading day, 70% of samples are randomly chosen as the training data, and other samples as the test data. To decrease the error from this random partition, the experiment conduct these steps 10 times and measure the model performance with the average of these 10 runs in each trading day. Other than PVNR, two performance measures are used for model comparison. q-RMSE. q-RMSE is the RMSE of fitting errors of bond prices. Assume the test data include N bonds, the n-th bond has cash flows
Hit rate. The hit rate is defined as the percent of fitted prices that fall within the corresponding bid-asked spreads.
Experimental results are presented in Table 4. As expected, the proposed model achieves zero percent of violating arbitrage-free requirements, because it is a sufficient implementation of shape requirements. It achieves the least average RMSE among the five methods. 23.642% fitting prices fall in their corresponding bid-asked spreads, which is the second-best score among the five models. The proposed model significantly outperforms the other four models. In this experiment, the two parametric methods achieve worse performance than the three non-parametric methods. This result indicates that parametric functions with four or five parameters are insufficient for describing the term structure of interest rates in practice.
Empirical results on the US treasury bonds data
Empirical results on the US treasury bonds data
This paper proposes a nonparametric estimation method for the term structure of interest rates under shape requirements. To free from arbitrage opportunities, a discount function d (·) is required to be continuous, decreasing, positive and d (0)=1. Because the requirement of monotonicity is imposed on every point of the maturity domain, it is continuously constrained and computationally intractable. Many parametric and nonparametric methods neglect shape requirements. Some necessary, but not sufficient, implementations cannot guarantee full conformance of shape requirements, while some sufficient, but not necessary, implementations are too over-restrictive to achieve universal flexibility.
The proposed method approximates the discount function with an algebraic polynomial and presents an equivalent implementation of arbitrage-free requirements. The decreasing shape requirement is implemented by requiring its derivative to be non-positive everywhere. Its estimation can be solved by one of the two semidefinite programs according to the parity of its degree k. Experimental results on artificial data, UK STRIPS data, and US Treasury bonds data clearly show that the proposed method has a great advantage over many state-of-the-art methods.
The main drawback of the model is the lack of parsimony and economic interpretation, which is the limitation of all nonparametric models. In the proposed model, the estimated discount function has k + 1 coefficients that do not represent any economic intuition. Therefore, it has limited potential in analyzing the dynamics and mechanisms of interest rates. However, the proposed model is mainly motivated by pricing and valuation. After all, pricing accuracy, instead of economic interpretation, is the core in pricing fixed-income securities. An interesting future research direction is to extend our model with long maturities. For example, in actuarial science insurance and reinsurance companies need to evaluate cash flows with very long maturities. The Solvency II framework released by the European Insurance and Occupational Pensions Authority (EIOPA) requires that the maturity domain of the estimated risk-free term structure should cover 60 years at least [43]. However, there are very few treasury bonds that have a maturity longer than 20 years. Therefore, some extrapolating techniques should be included in our model.
Footnotes
Acknowledgment
The work is supported by Zhejiang Natural Science Foundation (LY19G010001,LY20G010002) and National Natural Science Foundation of China (71571163).
