Improved model fitting approaches under ranked set sampling schemes with application to forest data

In this paper, we address the problem of fitting probabilistic models based on several sampling designs that originated from the ranked set sampling (RSS) scheme. Although these sampling designs have been proved to be efficient and/or economical alternatives to simple random sampling (SRS) and RSS, the problem of model fitting still needs to be properly studied for a complete understanding of their advantages. In this study, we investigated the performance of eight RSS-based designs, among them the recently proposed modified neoteric RSS (Taconeli & Cabral, 2019), combined with six estimation methods in model fitting. Through extensive simulations, we identified the combinations that provided higher efficiency. The simulated results showed that the RSS-based designs outperformed SRS in model fitting, whereas the Anderson-Darling and maximum product of spacings methods were the most efficient among the estimation methods. In addition, the Anderson-Darling and maximum product of spacings estimation, combined with the neoteric RSS design and its variants, allowed the highest efficiency under both perfect and imperfect ranking. Additional simulations based on a real data set from a forest inventory corroborated these findings.


Introduction
Ranked set sampling (RSS), proposed by McIntyre (1952), leads to more efficient estimation when the units to be sampled can be easily ranked before being actually measured. It is useful when measuring the variable of interest is difficult, costly, or requires destructive tests, but small sets of population units can be easily ranked. The statistical efficiency of RSS estimation over its simple random sampling (SRS) counterpart was first proved by Takahasi & Wakimoto (1968) for the case in which the population units are ranked within the sets without errors (perfect ranking), whereas Dell & Clutter (1972) verified that RSS still produces more efficient estimators under imperfect ranking. The ranking process can be based on the values of some accessible concomitant variable or on personal judgment. In recent decades, several studies have confirmed the superiority of RSS over SRS in different contexts. For a comprehensive review of the RSS theory and its applications, we recommend Chen et al. (2003). The RSS design can be described as follows.
1. Randomly select $n^2$ units from the target population and divide them randomly into n sets of size n;
2. Rank the units within each set according to some easy and inexpensive ranking criterion;
3. Select the unit ranked in position i from the ith set, for i = 1, 2, ..., n, to compose the final sample. Only these observations must be effectively measured for the variable of interest;
4. Steps 1-3 can be repeated m times to draw a sample of size mn.
The final sample is denoted by $\{y_{[i]j};\ i = 1, \ldots, n,\ j = 1, 2, \ldots, m\}$, such that $y_{[i]j}$ refers to the sample unit judged to be in position i in the ith set from the jth cycle. Similar notation may be adopted for all sampling designs we have considered in this study. When assuming a particular probabilistic model, these random variables are independent, as they are produced by sample units from disjoint sets of the population, but not identically distributed, due to the ranking process.
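The four RSS steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rss_sample` and `perfect_unit`, and the convention that each population unit is a pair (y, x) with x the ranking variable, are hypothetical and do not come from the paper.

```python
import random


def rss_sample(draw_unit, n, m, rng):
    """Draw a ranked set sample of size m * n.

    draw_unit(rng) returns a pair (y, x): y is the variable of interest and
    x is the value used for ranking (hypothetical interface, for illustration).
    Returns a list of (i, j, y): judged rank i, cycle j, measured value y.
    """
    sample = []
    for j in range(1, m + 1):          # step 4: repeat the cycle m times
        for i in range(1, n + 1):      # step 1: the i-th set of n fresh units
            units = [draw_unit(rng) for _ in range(n)]
            ranked_set = sorted(units, key=lambda u: u[1])  # step 2: rank by x
            y, _ = ranked_set[i - 1]   # step 3: keep the unit judged i-th
            sample.append((i, j, y))
    return sample


def perfect_unit(rng):
    """Toy population: uniform y, ranked on y itself (perfect ranking)."""
    y = rng.random()
    return (y, y)


rng = random.Random(2024)
sample = rss_sample(perfect_unit, n=3, m=200, rng=rng)
mean_rank1 = sum(y for i, _, y in sample if i == 1) / 200
mean_rank3 = sum(y for i, _, y in sample if i == 3) / 200
```

Each cycle uses n sets of n units, so $n^2$ units are identified but only n are measured. The two means illustrate why the observations are not identically distributed: units judged first tend to be smaller than units judged last.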
Several extensions of the original RSS design have been proposed in recent decades. These extensions may provide additional statistical efficiency and/or make the sampling process more economical. To highlight some of these extensions, Al-Saleh & Al-Kadiri (2000) introduced the double ranked set sampling (DRSS) technique; Muttlak (1996) proposed the paired ranked set sampling (PRSS); Haq et al. (2016), the paired-double RSS (PDRSS); Zamanzade & Al-Omari (2016), the neoteric ranked set sampling (NRSS); and Taconeli & Cabral (2019), the double NRSS. These sampling schemes will be described in detail later, as well as their pros and cons.
Although the existing literature encompasses many studies on RSS and its extensions, the scenario is somewhat different regarding parametric estimation and model fitting, as the majority of publications are limited to the original RSS scheme. Moreover, these studies often consider maximum likelihood estimation (MLE), where the likelihood function is based on the strong and very restrictive assumption of perfect ranking (see Akgül et al., 2018; Chen et al., 2017, 2019; Esemen & Gürler, 2018). Under the perfect ranking scenario, and assuming a particular probabilistic model, the RSS sample units produce a set of independent order statistics, and the likelihood function can be easily derived. Under imperfect ranking, however, the likelihood function is not so easily obtained. Perfect ranking is not common in practice, since ranking errors are always expected. In such cases, the efficiency of MLE quickly deteriorates with the ranking errors, as can be seen in Dey et al. (2017) and Consulin et al. (2018). Alternatives to MLE were recently considered by Taconeli & Bonat (2020) and Pedroso et al. (2020), but only for the original RSS design.
For some RSS-based schemes such as NRSS, PRSS, and their double-stage versions, the random variables are neither independent nor identically distributed, since some sample units are drawn from the same ranked set. In this case, MLE is challenging even under perfect ranking. This fact may explain the gap in the literature regarding parametric estimation for these sampling designs. More precisely, in our search we did not find studies on parametric estimation under NRSS, PRSS, and their extensions. Moreover, the number of studies is scarce even for the original RSS and the DRSS procedures.
Our practical motivation for this research was based on a native forest data set. Forest data usually involve performing measurements on a set of variables in the field, generally being a quite expensive activity. The tree total height is strongly correlated with the individual tree diameter (Sharma & Parton, 2007), which allows estimating the tree growth potential and the forest production over time. Due to overlapping treetops, total height is hard to measure directly within the forest, requiring adequate training of operators, besides high precision instruments for indirect measurements.
In practice, only a few selected trees have their total height measured during the forest inventory, while diameter can be easily obtained from all trees. This practice stems from the desire to reduce measurement costs and is sometimes due to poor visibility in the forest (Zhang et al., 2004). In this context, alternative sampling methodologies to simple random sampling become relevant, especially when they are associated with lower costs of data collection, such as the RSS-based designs.
Therefore, in this study we evaluated the performance of eight different RSS-based designs combined with six estimation methods in fitting probabilistic models. We have also performed additional simulations based on a data set obtained from a forest inventory carried out in native forests located in Brazil. The variable of interest was the height of trees, and their diameters were used to rank the units within each set. Our main objective was to identify methods meeting the following requirements: (i) flexibility to encompass non-identically distributed and/or non-independent sample units; (ii) robustness to imperfect rankings; and (iii) practical applicability, with particular interest in forest data. Our study was based on extensive simulations, where different probabilistic models, sample sizes, and levels of imperfect ranking were considered. Additionally, we proposed and assessed the performance of a new sampling design based on a modification of NRSS, which we named modified NRSS (NRSSm).
In Section 2 we present the sampling designs considered in this study. Section 3 describes the parametric estimation methods we have used. Monte Carlo simulation was used to assess the performance of the proposed methods, and the results are presented in Section 4. Simulation analyses based on a real data set from a forest inventory are discussed in Section 5. Our final remarks are given in Section 6.

RSS-based designs
In this section, we describe the RSS-based designs considered in this work. Although some designs are based on a greater number of initially identified elements, or two ranking stages, all of them produce a final sample of size N = mn. For additional information on the sampling designs we recommend consulting the original papers.

Paired ranked set sampling (PRSS)
PRSS is an economic alternative to RSS and other RSS-based designs, as fewer population units must be initially identified. The following steps describe the PRSS procedure.
1. Randomly select nk elements from the target population and allocate them randomly into k sets of size n. For even n, k = n/2, and, for odd n, k = [n/2] + 1, where [x] is the integer part of x;
2. Rank the sample units within each set according to some easy and inexpensive ranking criterion;
3. Select the units ranked in positions 1 and n from the 1st set, the units ranked in positions 2 and n - 1 from the 2nd set, and so on;
4. Steps 1-3 can be repeated m times to draw a sample of size mn.
Not all observations from a PRSS sample are independent, as the final sample contains pairs of sample units selected from the same ranked set (Muttlak, 1996). According to Zamanzade & Mahdizadeh (2018), there is a positive correlation among sample units provided by the same ranked set, such that statistical inference based on PRSS is expected to be less efficient than that based on RSS, for a fixed sample size.
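One PRSS cycle can be sketched as follows. The names `prss_cycle` and `toy_unit` are hypothetical, units are again pairs (y, x) ranked on x, and the handling of odd n (the last set contributes only its middle-ranked unit, since positions i and n + 1 - i coincide) is our reading of "and so on" rather than something stated explicitly above.

```python
import random


def prss_cycle(draw_unit, n, rng):
    """One PRSS cycle: returns n pairs (position, y) from k ranked sets.

    draw_unit(rng) returns (y, x), with x used for ranking (hypothetical
    interface). For odd n, the last set yields a single unit because the
    two selected positions coincide (assumption, see lead-in).
    """
    k = n // 2 if n % 2 == 0 else n // 2 + 1
    selected = []
    for i in range(1, k + 1):
        units = [draw_unit(rng) for _ in range(n)]
        ranked_set = sorted(units, key=lambda u: u[1])
        # Positions i and n + 1 - i; the set removes the duplicate when equal.
        for position in sorted({i, n + 1 - i}):
            selected.append((position, ranked_set[position - 1][0]))
    return selected


def toy_unit(rng):
    """Toy population: uniform y, ranked on y itself (perfect ranking)."""
    y = rng.random()
    return (y, y)


rng = random.Random(7)
cycle_even = prss_cycle(toy_unit, n=4, rng=rng)  # 2 sets, 8 units identified
cycle_odd = prss_cycle(toy_unit, n=5, rng=rng)   # 3 sets, 15 units identified
```

With n = 4, only 8 units are identified per cycle instead of the 16 that RSS requires, which is the economy the design trades against the within-set correlation discussed above.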

Neoteric ranked set sampling (NRSS)
The following steps describe NRSS: 1. Randomly select $n^2$ units from the target population;

Neoteric-neoteric ranked set sampling (NNRSS)
The NNRSS design is also a two-stage sampling scheme, with both stages based on NRSS. The following steps allow selecting an NNRSS sample:
1. Randomly select $n^3$ elements from the target population and divide them randomly into n sets of size $n^2$;
2. Apply the NRSS method in each set to obtain n NRSS samples of size n;
3. Apply the NRSS procedure to the pooled $n^2$ elements selected in step 2 to obtain an NNRSS sample of size n;
4. Steps 1-3 can be repeated m times to obtain a sample of size mn.
As with the other RSS-based designs, the double-stage NRSS provides more efficient inference than its single-stage counterpart.

Modified NNRSS (MNNRSS)
The MNNRSS procedure is similar to NNRSS, but the selection scheme in the first stage is different:
1. Randomly select $n^3$ elements from the target population and divide them randomly into n ranked sets of size $n^2$;
2. Select the units ranked in the positions {i + jn}, for j = 0, 1, ..., n - 1, from the ith set, i = 1, 2, ..., n;
3. Apply the NRSS procedure to the pooled $n^2$ elements from step 2 to obtain an MNNRSS sample of size n;
4. Steps 1-3 can be repeated m times to obtain a sample of size mn.
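The first-stage selection rule {i + jn} is easy to misread, so a small sketch may help. The function names and the toy labels are hypothetical; the second stage (NRSS applied to the pooled units) is not shown.

```python
def mnnrss_first_stage_positions(n):
    """Ranks kept from each first-stage set: i + j*n, j = 0, ..., n - 1."""
    return {i: [i + j * n for j in range(n)] for i in range(1, n + 1)}


def mnnrss_first_stage(ranked_sets):
    """Pool the units kept in the first stage.

    ranked_sets: list of n lists, each holding n**2 units already in
    ranked order (hypothetical input format, for illustration only).
    """
    n = len(ranked_sets)
    positions = mnnrss_first_stage_positions(n)
    pooled = []
    for i, ranked in enumerate(ranked_sets, start=1):
        pooled.extend(ranked[p - 1] for p in positions[i])
    return pooled


# Toy labels: unit 100*s + r is the r-th ranked unit of set s.
positions = mnnrss_first_stage_positions(3)
toy_sets = [[100 * s + r for r in range(1, 10)] for s in range(1, 4)]
pooled = mnnrss_first_stage(toy_sets)
```

For n = 3, the first set contributes ranks 1, 4, 7, the second ranks 2, 5, 8, and the third ranks 3, 6, 9, so the pooled $n^2$ = 9 units cover every rank from 1 to 9 exactly once.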
Similar to NNRSS, the MNNRSS procedure requires the initial identification of $n^3$ elements from the target population, and the final sample is also obtained after two ranking stages. Additional information on NNRSS and MNNRSS can be found in Taconeli & Cabral (2019). We should emphasize that NRSS and its two-stage versions can lead to logistical challenges depending on the available ranking criterion. Ranking larger sets based on visual inspection, for example, can be a very difficult task, whereas it is more feasible when some concomitant variable is used for this purpose. Hence, operational issues should also be taken into account when choosing the most appropriate RSS-based design.

Estimation methods
Among the available estimation methods, MLE stands out for its popularity and desirable asymptotic properties, such as unbiasedness, efficiency, consistency, and normality. However, MLE may not be feasible in some situations, particularly when correctly specifying the likelihood function becomes a hard task. Other estimation methods based on the likelihood function, such as the Bayesian approaches, suffer from the same drawback. This might be one of the main reasons why there are few studies on parametric estimation and model fitting for RSS-based designs. In this work, we address a variety of estimation methods applied to RSS-based designs, namely: maximum product of spacings, ordinary and weighted least squares, Cramér-von-Mises, Anderson-Darling, and right-tail Anderson-Darling. These methods were already considered as alternatives to MLE, under SRS, in several studies, as can be observed in Mazucheli
The following subsections describe the aforementioned estimation methods. For all of them, let y be a continuous random variable with cumulative distribution function (CDF) F(·|θ) and probability density function (PDF) f(·|θ), where θ ∈ Θ represents a vector of distribution parameters, and Θ the parameter space. Additionally, let $y_{(1)}, y_{(2)}, \ldots, y_{(N)}$ be the N = mn sample units arranged in ascending order of the measured values of the variable of interest.

Maximum product of spacings
The maximum product of spacings (MPS) method applies the probability integral transformation to the sample data. The parameter estimates are chosen so that the spacings between the ordered transformed observations are as close to uniform as possible (see Cheng & Amin, 1979). The uniform spacings are defined as:
$$D_i(\theta) = F(y_{(i)} \mid \theta) - F(y_{(i-1)} \mid \theta),$$
for i = 1, 2, ..., N + 1, where $F(y_{(0)} \mid \theta) = 0$ and $F(y_{(N+1)} \mid \theta) = 1$, such that $\sum_{i=1}^{N+1} D_i(\theta) = 1$. The MPS estimate of θ is obtained by maximizing the geometric mean of the spacings with respect to θ, given by:
$$G(\theta) = \left[ \prod_{i=1}^{N+1} D_i(\theta) \right]^{1/(N+1)}.$$

Ordinary and weighted least-squares
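Ordinary and weighted least squares were presented in Swain et al. (1988). Let $y_{(i)}$ be the ith order statistic from a random sample of size N generated from a continuous probability distribution. The following general results can be verified:
$$E\left[F(y_{(i)} \mid \theta)\right] = \frac{i}{N+1}, \qquad \mathrm{Var}\left[F(y_{(i)} \mid \theta)\right] = \frac{i(N-i+1)}{(N+1)^2(N+2)}.$$
The ordinary least squares (OLS) estimate for θ, denoted by $\hat{\theta}_{OLS}$, minimizes the following sum of squares:
$$\hat{\theta}_{OLS} = \arg\min_{\theta} \sum_{i=1}^{N} \left[ F(y_{(i)} \mid \theta) - \frac{i}{N+1} \right]^2.$$
We can weight the components of the sum of squares by the inverse of their respective variances, producing the weighted least squares (WLS) method. The WLS estimate, denoted by $\hat{\theta}_{WLS}$, is obtained as follows:
$$\hat{\theta}_{WLS} = \arg\min_{\theta} \sum_{i=1}^{N} \frac{(N+1)^2(N+2)}{i(N-i+1)} \left[ F(y_{(i)} \mid \theta) - \frac{i}{N+1} \right]^2.$$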

Cramér-von-Mises
The Cramér-von-Mises, Anderson-Darling, and right-tail Anderson-Darling methods are derived from well-known goodness-of-fit statistics. We start with the Cramér-von-Mises (CVM) method, detailed in Parr & Schucany (1980).
The CVM estimate for θ, denoted by $\hat{\theta}_{CVM}$, is obtained by minimizing the CVM test statistic:
$$C(\theta) = \frac{1}{12N} + \sum_{i=1}^{N} \left[ F(y_{(i)} \mid \theta) - \frac{2i-1}{2N} \right]^2.$$

Anderson-Darling and right-tail Anderson-Darling
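The Anderson-Darling (AD) estimate for θ, denoted by $\hat{\theta}_{AD}$, is obtained by minimizing the Anderson-Darling test statistic, proposed in Anderson & Darling (1952) and given by:
$$A(\theta) = -N - \frac{1}{N} \sum_{i=1}^{N} (2i-1) \left[ \log F(y_{(i)} \mid \theta) + \log\left(1 - F(y_{(N+1-i)} \mid \theta)\right) \right].$$
Finally, for the right-tail Anderson-Darling (RTAD) method, we must find the θ that minimizes the RTAD statistic:
$$R(\theta) = \frac{N}{2} - 2 \sum_{i=1}^{N} F(y_{(i)} \mid \theta) - \frac{1}{N} \sum_{i=1}^{N} (2i-1) \log\left(1 - F(y_{(N+1-i)} \mid \theta)\right).$$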

Estimation methods applied to RSS-based designs
Our proposal to apply the presented estimation methods to RSS-based designs is quite general. It consists of pooling all the N = mn measured sample units across the m cycles and ranking them in ascending order of the observed values of the variable of interest. Regardless of the sampling design, the pooled ranked sample can be denoted as $y_{(1)}, y_{(2)}, \ldots, y_{(N)}$, and all estimation methods discussed so far were applied to this pooled sample. Although this approach does not explicitly consider the sampling design, as the rankings provided by the ranked sets are not taken into account, it has been adopted in other studies due to its flexibility and efficiency under imperfect ranking (Alizadeh Noughabi, 2017; Pedroso et al., 2020; Taconeli & Bonat, 2020).
All suggested estimation methods require numerical optimization to compute the parameter estimates. We implemented the approaches using the optim() function of the statistical software R (R Core Team, 2019). The R scripts are available upon request from the authors.
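As a rough illustration of how the six criteria operate on a pooled ordered sample, the sketch below evaluates the standard forms of the MPS, OLS, WLS, CVM, AD, and RTAD criteria and minimizes each one for a one-parameter exponential model. It is not the authors' R implementation: the exponential model, the function names `criteria` and `fit_exponential_rate`, the clipping constant, and the plain grid search (used here in place of a numerical optimizer such as optim()) are all assumptions made to keep the example self-contained.

```python
import math
import random

EPS = 1e-12  # keeps logarithms finite; an implementation detail of this sketch


def criteria(u):
    """Criteria to MINIMIZE, given u[i-1] = F(y_(i) | theta) in ascending order."""
    N = len(u)
    u = [min(max(v, EPS), 1.0 - EPS) for v in u]
    spacings = [b - a for a, b in zip([0.0] + u, u + [1.0])]  # N + 1 spacings
    idx = range(1, N + 1)
    return {
        # Negative log of the geometric mean of the spacings.
        "MPS": -sum(math.log(max(d, EPS)) for d in spacings) / (N + 1),
        "OLS": sum((u[i - 1] - i / (N + 1)) ** 2 for i in idx),
        "WLS": sum((N + 1) ** 2 * (N + 2) / (i * (N - i + 1))
                   * (u[i - 1] - i / (N + 1)) ** 2 for i in idx),
        "CVM": 1 / (12 * N) + sum((u[i - 1] - (2 * i - 1) / (2 * N)) ** 2
                                  for i in idx),
        "AD": -N - sum((2 * i - 1) * (math.log(u[i - 1])
                                      + math.log(1.0 - u[N - i]))
                       for i in idx) / N,
        "RTAD": N / 2 - 2 * sum(u)
                - sum((2 * i - 1) * math.log(1.0 - u[N - i]) for i in idx) / N,
    }


def fit_exponential_rate(sample, rates):
    """Grid search: for each criterion, the rate on the grid that minimizes it."""
    y = sorted(sample)  # pooled sample in ascending order
    best = {}
    for rate in rates:
        u = [-math.expm1(-rate * v) for v in y]  # exponential CDF
        for name, value in criteria(u).items():
            if name not in best or value < best[name][0]:
                best[name] = (value, rate)
    return {name: rate for name, (_, rate) in best.items()}


random.seed(1)
sample = [random.expovariate(2.0) for _ in range(400)]  # true rate = 2
estimates = fit_exponential_rate(sample, [0.02 * k for k in range(1, 501)])
```

The same `criteria` function applies unchanged to a pooled sample from any of the RSS-based designs, since only the ordered values enter the criteria; for multi-parameter models, the grid would be replaced by a general-purpose optimizer.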
The chosen probabilistic models cover a wide variety of shapes for density functions, with different degrees of skewness and kurtosis. The adopted sample sizes are very useful for RSS-based designs, where, in general, the set sizes are small and larger sample sizes are obtained by increasing m. We have also evaluated the impact of imperfect ranking. This was done by first simulating data from a standard bivariate normal distribution with correlation ρ. In this case, we took one of these variables as the variable of interest while the other was considered a concomitant variable, used to rank the sets. In this way, we have perfect ranking when ρ = 1, while for ρ = 0 the sampling scheme becomes equivalent to SRS. We considered the following correlation levels: ρ = 0.5, 0.6, 0.7, 0.8, 0.9, and 1. Finally, the inverse probability integral transformation was applied to the variable of interest in order to generate samples from the target distribution. The smaller ρ, the lower the correlation between the concomitant and interest variables, increasing the amount of ranking errors induced in the simulation.
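The imperfect-ranking mechanism described above can be sketched as follows. The function name `draw_unit` and the exponential target distribution are assumptions chosen for illustration; the paragraph above does not tie the mechanism to any specific model.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_unit(rho, rate, rng):
    """Return (y, x): variable of interest and concomitant variable.

    (z, x) is standard bivariate normal with correlation rho; y is obtained
    from z by the inverse probability integral transformation, here with an
    exponential(rate) target (an assumption of this sketch).
    """
    z = rng.gauss(0.0, 1.0)
    e = rng.gauss(0.0, 1.0)
    x = rho * z + math.sqrt(1.0 - rho ** 2) * e   # corr(z, x) = rho
    u = min(STD_NORMAL.cdf(z), 1.0 - 1e-12)       # uniform on (0, 1)
    y = -math.log1p(-u) / rate                    # exponential quantile
    return y, x


rng = random.Random(123)
# rho = 1: the concomitant equals z, so ranking on x is perfect ranking on y.
perfect_set = [draw_unit(1.0, 2.0, rng) for _ in range(5)]
ranked_by_x = [y for y, _ in sorted(perfect_set, key=lambda unit: unit[1])]
# rho = 0.7: ranking on x may misorder the y values (imperfect ranking).
imperfect_set = [draw_unit(0.7, 2.0, rng) for _ in range(5)]
```

Because y is a monotone transformation of z, the correlation ρ controls only how well the concomitant x orders the units, not the marginal distribution of y.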
We randomly simulated 10,000 samples from each probabilistic model and for each combination of sampling design, sample size, set size, and correlation level. As we are interested in evaluating the sampling designs and estimation methods in model fitting, we adopted the mean integrated squared error (MISE) as a discrepancy measure between the true and fitted PDFs. Let $f(y; \theta)$ and $f(y; \hat{\theta})$ represent the true and fitted PDFs, respectively. MISE is defined as follows:
$$\mathrm{MISE}(\hat{\theta}) = E\left[ \int \left( f(y; \hat{\theta}) - f(y; \theta) \right)^2 dy \right].$$
MISE was estimated based on the 10,000 simulated samples for each simulated scenario, as described by:
$$\widehat{\mathrm{MISE}} = \frac{1}{10000} \sum_{b=1}^{10000} \int \left( f(y; \hat{\theta}^{(b)}) - f(y; \theta) \right)^2 dy,$$
where $\hat{\theta}^{(b)}$ refers to the estimate for θ produced by the bth simulated sample, for b = 1, 2, ..., 10,000.
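The integrated squared error inside the MISE generally has to be computed numerically. The sketch below uses a trapezoidal rule and an exponential model, for which the integral has the closed form a/2 + b/2 - 2ab/(a + b) when the true and fitted rates are a and b. The function names, the exponential model, and the integration limits are assumptions made for illustration.

```python
import math


def integrated_squared_error(pdf_true, pdf_fit, lower, upper, steps=20000):
    """Trapezoidal approximation of the integral of (pdf_fit - pdf_true)^2."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lower + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (pdf_fit(y) - pdf_true(y)) ** 2
    return total * h


def exp_pdf(rate):
    """PDF of the exponential distribution with the given rate."""
    return lambda y: rate * math.exp(-rate * y)


def estimated_mise(true_rate, fitted_rates, upper=40.0):
    """Average the integrated squared error over the fitted models."""
    errors = [integrated_squared_error(exp_pdf(true_rate), exp_pdf(r), 0.0, upper)
              for r in fitted_rates]
    return sum(errors) / len(errors)


a, b = 2.0, 2.5
numeric = integrated_squared_error(exp_pdf(a), exp_pdf(b), 0.0, 40.0)
closed_form = a / 2 + b / 2 - 2 * a * b / (a + b)
mise_hat = estimated_mise(2.0, [1.8, 2.0, 2.5])  # three made-up estimates
```

In a simulation study, `fitted_rates` would hold one estimate per simulated sample, so that `estimated_mise` plays the role of the Monte Carlo average above.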
The total number of simulated scenarios is too high to display the results for each of them. Therefore, we proceeded with a two-step strategy. First, we present a series of marginal evaluations, averaging the simulated MISEs over the sampling designs, estimation methods, sample sizes, and/or correlation levels. These results allowed us to initially explore the performances of the sampling designs and estimation methods in a wide range of scenarios. Then, we selected some sampling designs, in addition to the estimation methods that produced the best results in the first stage, to explore each scenario individually. At this stage, we did not average the MISEs over any other simulation parameter.
Figure 1 presents the estimated MISEs averaged over the sample sizes and sampling designs, for each probabilistic model and according to the levels of imperfect ranking. We may conclude that MPS and AD were the most efficient methods for almost all combinations of probabilistic model and ρ. The WLS method often appears in third position, while the other estimation methods showed worse performance. All estimation methods become less efficient as ρ gets smaller. In all plots, we represent the results for MLE based on SRS as a benchmark. In most simulated scenarios, the AD and MPS methods were more efficient than SRS-MLE, even for the smallest ρ values.
Figure 2 displays the results for the sampling designs, averaging over the sample sizes and estimation methods. Despite some overlapping curves, we can notice that the RSS-based designs outperformed SRS-MLE in most scenarios. Once again, higher MISEs were verified for lower ρ values. It was also possible to identify a general hierarchy in the results: NNRSS and MNNRSS produced the lowest MISEs, followed by NRSS and NRSSm; then DRSS and PDRSS; next, RSS and PRSS; finally, SRS provided the least efficient estimation. Within each pair (for example, NNRSS and MNNRSS), the differences were negligible.
Figures 3 and 4 allow us to analyze how each estimation method and sampling design performed for different sample sizes and probabilistic models. Here we averaged the MISEs over the correlation levels, and over either the sampling designs (Figure 3) or the estimation methods (Figure 4). We may observe from Figure 3 that the RSS-based estimators outperformed SRS-MLE in most scenarios for all sample sizes. Additionally, all estimators lose efficiency for small sample sizes and, for a fixed sample size, higher efficiency was reached by increasing n rather than m. The curves were almost parallel for all distributions, suggesting that the relative performances of the estimation methods did not substantially change with the different combinations of sample size and probabilistic model.
For a better visualization of the results, we omitted the following sampling designs from Figure 4: PRSS, PDRSS, NNRSS, and NRSSm, which performed very similarly to RSS, DRSS, MNNRSS, and NRSS, respectively. Once again, the sampling designs keep the performance hierarchy observed in Figure 2. The curves again showed a clear parallelism, indicating that their relative performances did not vary across the different sample sizes and probabilistic models.
For the pair of estimation methods with the best performance, AD and MPS, we also present their efficiencies relative to SRS-MLE ($\mathrm{RE}_1$), for each combination of sampling design, N, n, and ρ, averaging over the eight probabilistic models. $\mathrm{RE}_1$ was calculated as follows:
$$\mathrm{RE}_1 = \frac{1}{8} \sum_{d=1}^{8} \frac{\mathrm{MISE}_{\mathrm{MLE,SRS}}(d)}{\mathrm{MISE}_{\cdot\mathrm{RSS}}(d)}, \qquad (1)$$
where $\mathrm{MISE}_{\cdot\mathrm{RSS}}(d)$ represents the MISE for some combination of sampling design, N, n, and ρ under a specific probabilistic model (d), and $\mathrm{MISE}_{\mathrm{MLE,SRS}}(d)$ the MISE for SRS-MLE under the same specification, d = 1, 2, ..., 8. These results can be seen in Tables 1 and 2 for AD and MPS estimation, respectively. Some remarks follow:
• For both AD and MPS methods, RSS-based estimation allowed more efficient model fitting than SRS-MLE and than their SRS analogues in most cases;
• The relative efficiencies decreased for smaller ρ, and they were higher, for a fixed N, when n = 5 than when n = 3;
• The highest efficiency was verified using MNNRSS, followed by NNRSS;
• NRSS and NRSSm provided better model fitting than some double-stage RSS designs, such as DRSS and PDRSS, indicating that they are cost-effective alternatives;
• PRSS yielded results similar to RSS, and PDRSS results similar to DRSS. Thus, PRSS and PDRSS are cost-effective alternatives to RSS and DRSS, respectively;
• When combined with AD estimation, the proposed NRSSm design provided higher efficiency than its single-stage competitors, as well as DRSS and PDRSS. NRSS, on the other hand, outperformed the same set of sampling designs when combined with MPS estimation.
Therefore, NRSSm-AD and NRSS-MPS are recommended methods for model fitting in the RSS framework. Table 3 presents the efficiencies of AD and MPS estimation relative to the well-established RSS-MLE combination, calculated as:
$$\mathrm{RE}_2 = \frac{1}{8} \sum_{d=1}^{8} \frac{\mathrm{MISE}_{\mathrm{MLE,RSS}}(d)}{\mathrm{MISE}_{\cdot\mathrm{RSS}}(d)},$$
where $\mathrm{MISE}_{\cdot\mathrm{RSS}}(d)$ represents the MISE for some combination of sampling design, N, n, and ρ under a specific probabilistic model (d), and $\mathrm{MISE}_{\mathrm{MLE,RSS}}(d)$ the MISE for RSS-MLE under the same specification, d = 1, 2, ..., 8. We should remember that RSS-MLE is based on the distributions of order statistics, i.e., it is derived from the assumption of perfect ranking. In general, we highlight that:
• AD and MPS estimation provided higher efficiency than RSS-MLE when combined with NRSS, NRSSm, DRSS, PDRSS, NNRSS, and MNNRSS. The efficiency improvement was higher as N and ρ increased and, for a fixed N, when n = 5 than when n = 3;
• For RSS and PRSS, AD and MPS were still more efficient than RSS-MLE, but only for the lowest correlations (particularly for ρ = 0.7 and ρ = 0.5);
• AD was usually more efficient than MPS for almost all RSS-based sampling designs, except for NRSS, NNRSS, and MNNRSS.
As mentioned before, we now proceed with the analysis of some selected sampling designs and estimation methods under each probabilistic model, one at a time. Figures 8 to 15, available as online supplementary material, show the corresponding results. We chose to present the results for AD and MPS estimation, as they proved to be the most efficient estimation methods in our previous simulation. In addition, we opted to display the results of DRSS, MNNRSS, NRSS, and RSS, since they produced results similar to those of PDRSS, NNRSS, NRSSm, and PRSS, respectively. The results obtained for the different probabilistic models are somewhat similar, and the main findings are briefly summarized in the following remarks.
(Tables 1 to 3: relative efficiencies for each sampling design, with ranks in parentheses.)
1. All RSS-based designs outperformed their SRS counterparts and MLE based on SRS in fitting the probabilistic models;
2. The efficiency provided by the RSS-based designs decreased for lower ρ;
3. Higher efficiency was achieved for larger N, or by increasing n for fixed N;
4. MNNRSS (and NNRSS) generally performed the best among the RSS-based designs, followed by NRSS (and NRSSm); then DRSS (and PDRSS); and finally RSS (and PRSS);
5. In general, there was no remarkable difference between AD and MPS estimation, as they usually provided very similar performances. The only exception was the log-normal distribution, for which AD estimation was superior to MPS, especially when combined with the RSS and DRSS designs.

Simulation based on real forest data
Tree height is an important variable for predicting forest growth and yield. Especially in native forests, height is measured on only a few trees, since measuring it is difficult and costly. However, the tree diameter at breast height is easily accessible and positively correlated with total height. Therefore, RSS and its extensions are promising designs for estimating the distribution of tree heights, where the diameter may be considered a concomitant variable used to rank the elements within each set. In this section, we use total height (in meters) and diameter at breast height (in centimeters) data provided by a forest inventory project developed at the Forest Management Lab, Department of Forest Engineering of the Federal University of Jequitinhonha and Mucuri Valleys, Diamantina/Minas Gerais, Brazil. This data set contains heights and diameters at breast height for 12,295 trees from 172 native species, and it is available in the R package forestmangr under the name exfm20. For additional details about this data set, see Sollano Rabelo Braga et al. (2019).
The Pearson product-moment correlation coefficient calculated for this pair of variables is 0.632, while Spearman's rank correlation coefficient is 0.631. These values are relatively low for this type of study, but we are dealing with native forest data and a large number of different native tree species. Nevertheless, it is interesting to observe how the estimation methods are affected by this level of imperfect ranking in a real problem. Figure 5 displays the height distribution and a scatter plot of height against diameter. We may notice that the height distribution is slightly right-skewed, and that diameter and height show a quite common non-linear relationship (Cysneiros et al., 2020).
Next, we searched for the probabilistic model providing the best fit for the height distribution considering the entire data set. This step was necessary since we are interested in assessing how the sampling designs and estimation methods perform in fitting a probability distribution that properly describes the tree heights.
In our search, we fitted several probabilistic models with one, two, or three parameters using MLE, and compared them based on the Akaike information criterion (AIC). The two-parameter power Lindley distribution, an extension of the original one-parameter Lindley model, performed the best, yielding a lower AIC than all the other investigated models. Therefore, we considered the power Lindley distribution for this application. Ghitany et al. (2013) present this model, its mathematical properties, and associated inferences, while Taconeli & Giolo (2020) discuss MLE for the power Lindley distribution under RSS.
Based on this data set and the power Lindley distribution, a new simulation study was conducted. Once again, we considered the following settings: N = 15, 30, and 60; n = 3 and 5; AD and MPS estimation; and the eight RSS-based sampling designs, along with SRS. For each combination of N, n, estimation method, and sampling design, 10,000 samples were drawn. Samples were drawn with replacement from the exfm20 data set, and the power Lindley distribution was fitted to each selected sample. We used two different ranking schemes: in the first, the trees in each set were ranked directly on their heights, configuring the perfect ranking scenario; in the second, they were ranked according to their respective diameters at breast height, configuring the imperfect ranking scenario. Results are again summarized through MISE and the efficiencies relative to SRS-MLE, as described in (1), and they are presented in Figures 6 and 7 for perfect and imperfect ranking, respectively. We may observe that RSS-based estimation performed better than SRS-MLE for almost all sampling designs under both perfect and imperfect ranking scenarios. As expected, the efficiency gains were higher when the ranking was perfect, and lower as the sample size increased. In general, the designs based on NRSS provided higher efficiency than their competitors. Under imperfect ranking, RSS-MLE was strongly affected by ranking errors, to the point that the alternative estimation methods were more efficient in all simulated scenarios. Finally, we may observe that some alternatives to RSS-MLE showed better performance even under perfect ranking. This is the case of NRSSm combined with AD estimation, which produced higher RE than RSS-MLE for all combinations of N and n.

Final remarks
We have evaluated the performance of six estimation methods combined with eight different RSS-based designs in fitting probabilistic models. These procedures proved to be flexible, providing higher efficiency than their SRS-based counterparts, and they are also more robust than RSS-MLE under imperfect ranking, particularly AD and MPS. The simulated results allowed us to rank the RSS-based designs and estimation methods according to the efficiencies they provided. Based on the results, we recommend using RSS-based designs for model fitting purposes. AD and MPS estimation are preferable, since they usually produced lower MISEs. Furthermore, NRSS and its extensions are recommended to achieve the best results, but operational restrictions regarding the ranking of larger sets must be taken into account. The double-stage designs outperformed their respective single-stage versions, and they are also recommended, subject to the operational difficulties of selecting a greater number of population units to obtain a fixed sample size.
Our research showed that RSS and its extensions have great potential to be applied to forest data and to improve the efficiency of forest modeling. These methodologies are also important as alternatives for reducing the sample size, the time for collecting data, and the cost involved in forest inventory, which can directly impact forest management. In practical forest management, heights are widely used for simulating tree growth over time and predicting the individual tree volume for multiple timber products. The higher costs of collecting data lead to smaller sample sizes in practice, increasing the uncertainty in height estimation. Efficiency gains from using the RSS-based designs can improve the quality of forest inventories based on fewer sample units. A smaller sample associated with high efficiency has the potential to reduce costs and improve forest modeling.
We believe that this study may enable a wide range of new developments. As topics for future research, we may highlight the evaluation of other RSS-based designs and/or estimation methods, the situation of discrete random variables, such as that related to count data, and additional applications based on other real data sets. Furthermore, as our simulation methodology is consistent with the case of non-finite populations, where a single sample unit can only appear once in the final sample, studies considering finite and small to moderate population sizes are also recommended for future research.
Finally, additional research in forest sciences, involving other variables such as tree volume and biomass, is also recommended. It would be very useful to assess how the RSS-based designs could be combined with other traditional and well-established sampling schemes in this area. Stratified or clustered RSS-based designs, for example, could be considered as possible cost-effective sampling schemes for forest inventories.