FUNDAMENTAL CONCEPTS AND RECENT APPLICATIONS OF FACTORIAL STATISTICAL DESIGNS

ABSTRACT: Factorial designs have been increasingly used in scientific investigations and technological development. Through the use of matrices containing all the treatment combinations, these designs have been able to effectively characterize the relationships between the variables of multi-factor experiments, assess the experimental variability, and derive mathematical functions that represent the behavior of the responses. Factorial designs have also been fractionated, which has substantially reduced the number of treatments without the loss of relevant information. The addition of central and star points to the factorial arrays has given them orthogonality and rotatability, characteristics frequently used to fit models with curvature and identify critical regions of interest. Literature reports indicate that factorial designs, also called factorial experiments, have been successfully applied in different types of investigations, including cost evaluations and time-series studies. They have been able to estimate important features of the experiments, such as the individual and combined effects of factors and the magnitude of residuals, and additionally to express the relationships between the variables in polynomial equations, draw response surface and contour plots, and determine optimal combinations of parameters. In this review, the fundamental aspects of the Complete, Fractional, Central Composite Rotational and Asymmetrical factorial designs are conceptualized, and recent applications of these powerful tools are described.


Introduction
Experimentation is an essential part of the scientific methodology, fundamental for the production of knowledge, the progress of science and technological development (ZANCAN, 2000). The process of experimentation, initially defined as a "deliberate observation under conditions deliberately arranged by the observer", has been summarized as "cause-and-effect relationships in a system", an essential requirement to demonstrate theories or hypotheses of a functional structure (HINKELMANN and KEMPTHORNE, 2007; MONTGOMERY, 2017). The application of statistical designs in experimentation, which is now a common practice in academic laboratories to ensure valid conclusions in objective investigations, began in the 1930s, when obtaining products with better quality and processes with higher productivity became a goal in industry (MASON et al., 2003; MONTGOMERY, 2017). Since then, experimentation through statistical designs has been increasingly used not only to achieve products with better quality within the limits of desired specifications, but also to substantially reduce development time and cost, promote reliable and environmentally friendly processes, and conduct experimental analyses and data collection more efficiently (MASON et al., 2003; RODRIGUES and IEMMA, 2005; WEISSMAN and ANDERSON, 2015; MONTGOMERY, 2017).
The use of statistical designs, also called statistical plans, has enabled researchers to generate, within the rigor and adherence to the principles of science, the maximum amount of relevant information in the minimum possible number of experiments (MASON et al., 2003). They have been used to accurately draw inferences from observational data, characterize the relationships between different types of experimental variables, and economically assess the relevance and quantify the effects of factors (CHRISTENSEN, 1996; MASON et al., 2003). Unlike OFAT (one-factor-at-a-time) experiments, which only show how a response variable is affected by varying each factor while all the others are kept constant, well-planned statistical designs, like the factorial ones, have enabled the consideration of any possible interaction between the factors, a remarkable achievement in modern experimentation (MONTGOMERY, 2017). Factorial designs, in which the factors are varied together instead of one at a time, have been an important approach for scientists and designers to quantify synergistic or antagonistic effects of factors and estimate ideal combinations of variables without the excessive use of experimental resources and the waste caused by uneconomical methods (MASON et al., 2003; RODRIGUES and IEMMA, 2005).
Another great feature of statistical designs is that they have been able to measure the magnitude of errors, that is, random variations that can occur due to changes in the ambient conditions, experimental and measurement errors, or effects due to any other known or unknown influences (MASON et al., 2003). The possibility of distinguishing assignable causes of variation from random ones has been widely used to eliminate sources of bias (systematic differences) and ensure that the experiment provides precise information concerning the responses of interest (MASON et al., 2003). They have thus been able to properly measure the relationships between the variables and the influence of experimental factors (DRAPER and SMITH, 1998; HINKELMANN and KEMPTHORNE, 2007). Additionally, they have been able to derive simple and complex polynomial equations regarding the behavior of the experimental data, draw response surface and contour plots that simplify the interpretation of the empirical results, forecast future events based on previously observed values, and determine optimal operational conditions of different kinds of systems (MASON et al., 2003; RODRIGUES and IEMMA, 2005).
Due to the growing importance that statistical designs have been acquiring in modern research, as well as the great relevance that they have held in scientific and technological areas, this review article addresses the conceptualization of their fundamentals and describes the characteristics of the Complete, Fractional, Central Composite Rotational and Asymmetrical factorial designs. The article also addresses the sequential use of factorial statistical designs as an experimental optimization strategy and their employment in cost evaluations and time-series studies, and summarizes recent applications of their use in several scientific fields.

Methodology
The literature review was based on physical documents obtained in the library of the local University and on virtual databases indexed on the Web of Science, Scopus, ScienceDirect and Google Scholar platforms. The surveys were conducted until August 12th, 2021, and included both public open-access and institutionally available documents published in English or Portuguese. The terms used in the searches were "Fundaments of Statistics", "Concepts of Statistics", "Statistical Analysis", "Design of Experiment", "Statistical Inference", "Confidence Interval", "Prediction Interval", "Regression Analysis", "Factor Effects", "Determination Coefficient", "Analysis of Variance", "ANOVA", "Lack-of-Fit", "Outlier", "Pareto Diagram", "Response Surface", "Contour Plot", "Multi-factor Design", "Factorial Design", "Complete Factorial", "Full Factorial", "Fractional Factorial", "Central Composite Rotational", "Asymmetrical Factorial", "Mixed-Level Factorial", "Sequential Use of Factorial", "Plackett-Burman Design", "Taguchi Design", "Cost Evaluation", "Economic Analysis", "Time-Series" and "Time-Trend". After the documents were acquired, they were screened according to their titles, abstracts and contents in order to eliminate duplicates and verify their adequacy to the theme proposed in this article. Then, the documents were classified according to the following topics: (i) Fundaments of Statistics, (ii) Design of Experiments, (iii) Complete Factorial, (iv) Fractional Factorial, (v) Central Composite, (vi) Asymmetrical Factorial, (vii) Sequential use of Factorial Designs, (viii) Cost Analyses/Evaluations and (ix) Time-Series Studies.

Results
The literature survey found 216 documents that had adherence to the theme proposed by this review article. After the screening process mentioned in the previous item, 66 documents were classified as belonging to the topic (i), 28 as belonging to the topic (ii), 12 to the topic (iii), 15 to the topic (iv), 20 to the topic (v), 13 to the topic (vi), 16 to the topic (vii), 17 to topic (viii) and 29 to the topic (ix). Of these, 98 documents that presented great quality were chosen to be used as a theoretical basis in the preparation of this article and included in the references.

Principles of statistical designs
Analyzing and understanding a relevant set of data from a certain investigation are frequent practices in the research activity of different fields of study. Descriptive statistical values that summarize datasets, such as the simple arithmetic mean, mode and median, and dispersion measures, like the amplitude, deviations and variances, have been widely used to explore and simplify the interpretation of empirical observations (Equations 1, 2, 3, 4, 5 and 6) (PEARSON, 1894; YULE, 1897; FISHER, 1919; ZWILLINGER and KOKOSKA, 1999; ALTMAN and BLAND, 2005; SHESKIN, 2011). Regularities or patterns identified from the observations, in turn, have enabled researchers to transform the data into more useful information, a fundamental approach to draw valid inferences from the observational data and develop models that adequately describe them (MASON et al., 2003; HINKELMANN and KEMPTHORNE, 2007). Statistical analyses, in this context, have played a major role in judging the adequacy of scientific hypotheses, comparing results in the literature, and making strategic decisions (BROWNLEE, 1965; MORETTIN and BUSSAB, 2017).
The possibility of developing experimentation designs, that is, statistically designed experiments that allow an efficient measurement of the relationships among the variables of interest and the estimation of the magnitude of experimental errors, has enabled researchers to comprehend cause-and-effect relations in a system and, thus, to represent a theory and make accurate forecasts based on their results (MASON et al., 2003; KUTNER et al., 2005; MONTGOMERY, 2017). These designs, where the variables of interest are often controlled and fixed at predetermined values for each test run, have been named statistical designs, also known as statistical experiments (MASON et al., 2003). Statistical designs, besides being developed from empirical observations obtained in experimental studies, have also considered various types of statistical parameters, specifications and measurement errors (MASON et al., 2003). The designs' assumptions, which can take many forms and configurations, have been mainly confirmed through adequacy procedures, adjustment measures or, subjectively, by graphic techniques (CHRISTENSEN, 1996; MASON et al., 2003).
where $\bar{x}$ refers to the arithmetic mean, $x_i$ to the response of a random variable, n to the sample size, S to the sample standard deviation, σ to the population standard deviation, µ to the population mean, N to the population size, SE to the sample or populational standard error, $S^2$ to the sample variance, and $\sigma^2$ to the population variance.
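The descriptive measures defined above can be illustrated with a minimal sketch that uses only the Python standard library. The sample values are hypothetical and chosen only to give round results; the divisor n - 1 for the sample measures and N for the population measures is the standard convention and is assumed here, since the original equations are not reproduced in this text.

```python
import math
import statistics

# Hypothetical sample of n = 5 responses (illustrative values only).
x = [2.0, 4.0, 4.0, 5.0, 10.0]
n = len(x)

mean = statistics.fmean(x)         # arithmetic mean
median = statistics.median(x)      # central value of the ordered data
mode = statistics.mode(x)          # most frequent value
amplitude = max(x) - min(x)        # range of the observations

s2 = statistics.variance(x)        # sample variance S^2 (divisor n - 1)
s = statistics.stdev(x)            # sample standard deviation S
sigma2 = statistics.pvariance(x)   # population variance (divisor N)
sigma = statistics.pstdev(x)       # population standard deviation
se = s / math.sqrt(n)              # standard error of the mean

print(mean, median, mode, amplitude, s2, s, sigma2, se)
```

Treating the same five values as a whole population instead of a sample only changes the divisor, which is why the population variance is smaller than the sample variance.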

Statistical inferences
Statistical designs have enabled conclusions to be drawn about certain population characteristics and processes through measurements or observations made in a representative set, called the sampling group (CHRISTENSEN, 1996; MORETTIN and BUSSAB, 2017). The conclusions, also known as statements or inferences, have involved a degree of uncertainty due to uncontrollable experimental variations, and have been based on statistical models that represent the probability of the occurrence of events, named probabilistic frequency distributions (GALTON, 1889; MASON et al., 2003; ONYIAH, 2008). Most statistical designs have been assumed to follow a Normal or close to Normal frequency distribution, that is, one in which the data are symmetrically concentrated around a central value. Statistical designs with small sample sizes (n < 30), however, in order to account for the number of observations through the degrees of freedom, have commonly replaced the Normal distribution with the Student's t distribution, and the populational standard deviation with the sample standard deviation (RODRIGUES and IEMMA, 2005; ONYIAH, 2008).
The assumptions of the inferences have been assessed by the use of hypothesis tests, which use a validation criterion with two suppositions: a null (H0) and an alternative hypothesis (HA) (CHRISTENSEN, 1996). The null hypothesis has been commonly used to state that there is no relationship between two measured parameters, and the alternative hypothesis, that such a relationship exists. The null hypothesis is rejected in favor of the alternative only when the data provide sufficient evidence against it. The results of the tests have been compared to tabulated critical regions of the Normal or the Student's t distribution models according to determined levels of significance (Equations 7 and 8) (MORETTIN and BUSSAB, 2017). The level of significance, also called the α value, has been used to determine the probability of mistakenly rejecting the null hypothesis when it is true, that is, the probability of the occurrence of a false positive (KALINOWSKI and FIDLER, 2010). The test uses a measurement of evidence related to the probability of obtaining results at least as extreme as the observed ones under the null hypothesis, called p-value, probability value or descriptive level (BROWNLEE, 1965). The p-value has been interpreted as a decreasing index of reliability, that is, the higher its value, the less reliable the observed relationship between the parameters of interest.
where $Z$ and $t$ refer to the test statistics following the respective Normal or Student's t distributions, $\bar{x}$ to the sample mean, µ to the population mean, n to the number of independent observations (Student's t distribution with n - 1 degrees of freedom), σ to the population standard deviation, and S to the sample standard deviation.
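As an illustration of the test statistics defined above, the sketch below computes the usual one-sample forms, Z = (x̄ - µ)/(σ/√n) and t = (x̄ - µ)/(S/√n). These standard forms are assumed to correspond to Equations 7 and 8; the sample, the hypothesized mean `mu0` and the known `sigma` are hypothetical. The two-sided p-value is obtained only for the Normal case, because the standard library does not provide the Student's t distribution.

```python
import math
from statistics import NormalDist, fmean, stdev

def z_statistic(sample, mu0, sigma):
    """Z statistic for a sample mean when the population sigma is known."""
    n = len(sample)
    return (fmean(sample) - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu0):
    """t statistic (n - 1 degrees of freedom) using the sample deviation S."""
    n = len(sample)
    return (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(n))

# Hypothetical data: H0 states that the population mean equals 3.0.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
z = z_statistic(sample, mu0=3.0, sigma=3.0)
t = t_statistic(sample, mu0=3.0)

# Two-sided p-value under the standard Normal distribution.
p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
print(z, t, p_two_sided)
```

With α = 0.05, a p-value above α would not lead to the rejection of H0 in this hypothetical case; the t statistic would instead be compared with a tabulated critical value for n - 1 = 4 degrees of freedom.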

Confidence and prediction intervals
Confidence intervals, a type of estimate that conveys information regarding the precision of the estimators, have been used to represent repetition frequencies around estimates of uncertainties and to indicate the inaccuracy of the means (NAKAGAWA and CUTHILL, 2007; PATINO and FERREIRA, 2015). In statistical designs, the intervals are associated with a confidence level whose amplitude estimates the frequency with which the value of a parameter of interest would be included in infinite hypothetical repetitions (KALINOWSKI and FIDLER, 2010). Confidence intervals of 95%, for example, have indicated that the mean value of the parameter of interest would tend to be included within the calculated extremities 95% of the time if the procedure was repeated numerous times (COX and HINKLEY, 1979). The prediction interval has also been interpreted as an interval estimate calculated from observations; however, it has indicated the values within which a new observation would probably fall according to a certain probability. This type of interval has provided more accurate estimates in relation to the occurrences of events and experimental responses (CHRISTENSEN, 1996). The calculation of a 100(1-α)% confidence interval for a normally distributed sample of large size with a known populational standard deviation has been done as described in equation 9. The prediction interval has been calculated as indicated in equation 10 (ONYIAH, 2008).
where ̂ refers to the interval estimate, $\bar{x}$ to the sample mean, Z to the tabulated value of the Normal distribution, α to the probability of the confidence level, σ to the population standard deviation, and n to the number of independent observations.
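The two kinds of interval can be compared with a short sketch. It assumes the standard Normal-based forms, x̄ ± Z·σ/√n for the confidence interval of the mean and x̄ ± Z·σ·√(1 + 1/n) for the prediction interval of one new observation; these are assumed to correspond to Equations 9 and 10, which are not reproduced in this text. The sample and the known `sigma` are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def confidence_interval(sample, sigma, alpha=0.05):
    """100(1 - alpha)% interval for the mean, sigma known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # tabulated Z value
    half = z * sigma / math.sqrt(n)
    centre = fmean(sample)
    return centre - half, centre + half

def prediction_interval(sample, sigma, alpha=0.05):
    """100(1 - alpha)% interval for a single new observation, sigma known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * sigma * math.sqrt(1.0 + 1.0 / n)
    centre = fmean(sample)
    return centre - half, centre + half

sample = [2.0, 4.0, 4.0, 5.0, 10.0]   # hypothetical observations
ci = confidence_interval(sample, sigma=3.0)
pi = prediction_interval(sample, sigma=3.0)
print(ci, pi)
```

Under these assumptions the prediction interval is always wider than the confidence interval, because it adds the variability of the new observation to the uncertainty of the estimated mean.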

Regression analyses
Statistical designs, through regression methods, have enabled the characterization of the relationships between independent and dependent variables used in the experiments, so called predictor and response variables respectively (DRAPER and SMITH, 1998;MONTGOMERY, 2017). Regression techniques, widely used in statistical modeling, besides used to quantitatively estimate the contribution of independent variables in the prediction of the dependent ones, have been used to assess the adequacy of the assumptions, examine the influence of atypical observations (outliers), and also to identify the presence of linear effects between variables, called correlations (CHRISTENSEN, 1996;MASON et al., 2003). In general, the relationships between the variables have been expressed in mathematical functions calculated using the least squares method (DRAPER and SMITH, 1998;ZWILLINGER and KOKOSKA, 1999).
The least squares method has provided a numerical estimation of the vertical intercepts and the slopes of the regression equations, in other words, the values of the constants and the angular coefficients, also called gradients of the linear functions. The associations of the responses have been described through simple or complex linear models, that is, of first or higher orders (Equations 11 and 12). The regression models have been referred to as linear because the unknown coefficients appear in linear form, that is, as additive or multiplicative constants of the values of the predictor variables (MASON et al., 2003). Regression models have been diagnosed through adjustment measures, inference analyses, and visually through graphics, such as dispersion diagrams and scatterplots (CHRISTENSEN, 1996; MASON et al., 2003; MONTGOMERY, 2017).
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e \quad (11)$$
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + e \quad (12)$$
where y refers to the predicted response of the adjusted model, x... to the predictor variables, β0 to the intercept value, β... to the simple or quadratic (parabolic) linear coefficients, k to the number of predictor variables, and e to the adjustment error.
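For the simplest case of the first-order model with a single predictor (k = 1), the least squares estimates have a closed form: the slope is the ratio between the sum of cross-products and the sum of squares of the predictor, and the intercept follows from the means. The sketch below is a minimal illustration with hypothetical data; the function name `fit_line` is introduced only for this example.

```python
from statistics import fmean

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx               # slope (angular coefficient)
    b0 = y_bar - b1 * x_bar      # vertical intercept
    return b0, b1

# Hypothetical predictor levels and observed responses.
x = [-1.0, 0.0, 1.0, 2.0]
y = [1.1, 2.9, 5.2, 6.8]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, residuals)
```

Models with several predictors or second-order terms follow the same principle, but the coefficients are obtained by solving the normal equations in matrix form rather than by this closed-form shortcut.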

Factors' effects
In statistical designs, the independent variables, also referred to as the predictors or controllable variables, have been commonly called factors. As seen in regression models, the relationships between the factors and the responses have been quantitatively calculated and described in the form of polynomial functions (Equation 13) (BEZERRA et al., 2008). The effects of the factors have also been used in inference tests and interval estimates (MASON et al., 2003). The interactions and correlated effects, non-linear and linear relationships between the responses of three or more variables, have occurred when the results of one factor depended on the state of another factor. The interaction effects have also been measured and described in the form of mathematical equations in regression models (Equation 14) (CHRISTENSEN, 1996). In balanced statistical designs, that is, with equal numbers of repetitions for all the combinations of equally spaced factor values, the coefficients of the effects have been previously tabulated, which has simplified the derivation of polynomial functions of second or higher orders (Equation 15) (MASON et al., 2003). The values of the predictor variables have been used in standardized and coded forms, facilitating comparative analyses (Equation 16).
where y refers to the predicted response, k to the number of variables, $\beta_0$ to the constant term, $\beta_i$ to the simple linear coefficient, $\beta_{ij}$ to the interaction coefficient, $\beta_{ii}$ to the quadratic coefficient (second order), $x_i$ and $x_j$ to the predictor variables, and $e$ to the residue; in the coding of Equation 16, the standardized predictor variable is obtained from the sample mean $\bar{x}$ and the sample standard deviation $S$.
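The ideas of coded variables, main effects and interaction effects can be sketched with a hypothetical two-level, two-factor (2²) experiment. The example assumes the usual definitions: a predictor is standardized as (x - x̄)/S, a main effect is the difference between the mean responses at the high and low levels, and the interaction effect uses the sign of the product of the coded levels. The data and the helper names `code_levels` and `effect` are illustrative assumptions, not taken from the reviewed works.

```python
from statistics import fmean, stdev

def code_levels(values):
    """Standardize factor levels as (x - mean) / S."""
    m, s = fmean(values), stdev(values)
    return [(v - m) / s for v in values]

def effect(runs, sign):
    """Mean response where sign(a, b) > 0 minus mean response where it is < 0."""
    high = [y for a, b, y in runs if sign(a, b) > 0]
    low = [y for a, b, y in runs if sign(a, b) < 0]
    return fmean(high) - fmean(low)

# Three equally spaced natural levels become -1, 0 and +1 after coding.
coded = code_levels([50.0, 60.0, 70.0])

# Hypothetical 2^2 design: (coded level of A, coded level of B, response).
runs = [(-1, -1, 20.0), (1, -1, 30.0), (-1, 1, 25.0), (1, 1, 45.0)]

effect_a = effect(runs, lambda a, b: a)        # main effect of factor A
effect_b = effect(runs, lambda a, b: b)        # main effect of factor B
effect_ab = effect(runs, lambda a, b: a * b)   # A x B interaction effect
print(coded, effect_a, effect_b, effect_ab)
```

A non-zero interaction effect, as in this hypothetical case, indicates that the change in the response caused by factor A depends on the level of factor B.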

Coefficients of determination
The coefficient of determination, also called explanatory coefficient or $R^2$, has been interpreted as a measure of the proportion of the variability of the responses that is explained by the regression equations. In other words, the determination coefficient has quantified the degree of agreement between the observed values and the values proposed by the statistical design (RODRIGUES and IEMMA, 2005; NAKAGAWA and SCHIELZETH, 2013). The adjusted coefficient of determination ($R^2_a$), in turn, has been defined as an adjustment of the $R^2$, as it has also considered the number of independent observations and the degrees of freedom. The adjusted determination coefficient has been used to compare statistical designs with different numbers of predictor variables (CHRISTENSEN, 1996). The quality of the regression equation fit, according to the determination coefficient $R^2$, has been calculated as indicated in equation 17. The calculation of the adjusted $R^2$ has been done as demonstrated in equation 18. In general, $R^2$ and $R^2_a$ values have varied between 0 and 1, and have been expressed as percentages.
where $R^2$ refers to the determination coefficient, $R^2_a$ to the adjusted determination coefficient, $y_i$ to the experimental value observed in the response variable as a function of the level i of a predictor variable, $\bar{y}$ to the general mean of the observed values, $\hat{y}_i$ to the estimated or predicted value of the adjusted model for a response variable as a function of the level i of a predictor variable, n to the number of observations, and k to the number of predictor variables.
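A minimal sketch of both coefficients is given below. It assumes the common forms R² = 1 - Σ(yᵢ - ŷᵢ)²/Σ(yᵢ - ȳ)² and R²ₐ = 1 - (1 - R²)(n - 1)/(n - k - 1), which are taken here as equivalent to Equations 17 and 18 for a least-squares model with an intercept. The observed and predicted values are hypothetical.

```python
from statistics import fmean

def r_squared(y, y_hat):
    """Proportion of the total variation explained by the fitted model."""
    y_bar = fmean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / tss

def adjusted_r_squared(r2, n, k):
    """R^2 penalized by the number of predictors k for n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observations and the predictions of a one-predictor line.
y = [1.1, 2.9, 5.2, 6.8]
y_hat = [1.09, 3.03, 4.97, 6.91]

r2 = r_squared(y, y_hat)
r2_adj = adjusted_r_squared(r2, n=len(y), k=1)
print(round(100 * r2, 2), round(100 * r2_adj, 2))
```

The adjusted value is never larger than R², and the gap grows as predictors are added without a matching gain in explained variation.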

Analysis of variance
Through the use of analytical techniques based on the variances, statistical designs have been able to estimate significant differences between the means of treatment groups. The method, called ANOVA, has made it possible to separate or partition the observable variation of the designs into two components, assignable and uncontrollable causes of variation, and to test whether there are significant differences between them (MASON et al., 2003). The assignable causes have been considered as sources of variation that can be measured and controlled, while the uncontrollable ones are sources of random variation (RODRIGUES and IEMMA, 2005; MORETTIN and BUSSAB, 2017). In general, the variability decomposition of random and independent samples, with normally distributed frequencies and equal populational variances, has been done through the Total Sum of Squares (Equations 19 and 20). The Sum of Squares of the Error and the Sum of Squares of the Regression model, respectively called residue and adjusted model, have been calculated using equations 21 and 22.
where TSS refers to the sum of squares due to Total variation, SSR to the sum of squares due to parameters considered in the model (Regression), SSE to the sum of squares due to adjustment Errors (residue), $y_i$ to the value observed in the response variable as a function of the level i of a predictor variable, $\bar{y}$ to the general mean of the observed values, $\hat{y}_i$ to the predicted value of the adjusted model for a response variable as a function of the level i of a predictor variable, and n to the number of independent observations.
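The partition of the total variation can be checked numerically. The sketch below assumes the usual definitions TSS = Σ(yᵢ - ȳ)², SSR = Σ(ŷᵢ - ȳ)² and SSE = Σ(yᵢ - ŷᵢ)², for which TSS = SSR + SSE holds when the predictions come from a least-squares fit with an intercept. The data are hypothetical, and the predictions are those of a least-squares line fitted to `y` at the levels x = -1, 0, 1, 2.

```python
from statistics import fmean

def anova_sums(y, y_hat):
    """Return (TSS, SSR, SSE) for observed y and model predictions y_hat."""
    y_bar = fmean(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # explained by the model
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residue
    return tss, ssr, sse

y = [1.1, 2.9, 5.2, 6.8]            # hypothetical observations
y_hat = [1.09, 3.03, 4.97, 6.91]    # least-squares predictions for the same runs

tss, ssr, sse = anova_sums(y, y_hat)
print(tss, ssr, sse, ssr + sse)
```

If the predictions did not come from a least-squares fit, the two components would in general no longer add up to the total.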

ANOVA significance tests
In analyses of variance, the probability of obtaining conclusive statements about the differences among the means of treatment groups has, in general, been calculated through the ratio between the Mean Square of the Regression and the Mean Square of the Residue, called F test (Equation 23). The estimates of the test have been compared to critical regions of the Fisher-Snedecor distribution according to previously determined significance levels (HOAGLIN et al., 1991; CHRISTENSEN, 1996). The ANOVA test has been commonly used to measure the magnitude of the statistical significance, that is, the estimates related to the "degree of certainty" of the associations between the parameters. ANOVA procedures have been frequently used in multiple-variable analyses, where the F test has been done independently for each factor and interaction, including covariates, blocks and curvatures (SHESKIN, 2011).

$$F = \frac{MS_{reg}}{MS_{res}} \quad (23)$$
where $F$ refers to the value of the test statistic following the F distribution, MSreg to the mean square of the Regression, and MSres to the mean square of the Residue.
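The F ratio can be computed directly from the sums of squares. The sketch below assumes the usual degrees of freedom for a model with an intercept, k for the regression and n - k - 1 for the residue; the numerical inputs are hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """F ratio between the mean squares of the regression and of the residue."""
    ms_reg = ssr / k              # regression mean square, k degrees of freedom
    ms_res = sse / (n - k - 1)    # residual mean square, n - k - 1 degrees of freedom
    return ms_reg / ms_res

# Hypothetical sums of squares from a one-predictor model with four runs.
f_value = f_statistic(ssr=18.818, sse=0.082, n=4, k=1)
print(f_value)
```

The resulting value would then be compared with the tabulated critical value of the F distribution with (k, n - k - 1) degrees of freedom at the chosen significance level.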

Lack-of-fit tests
The residuals of statistical designs are important parameters of reliability. They have been commonly used in inference methods that allow the validation of the adequacy of the hypothetical models (RODRIGUES and IEMMA, 2005). The estimates have been made by the use of the variation between repetitions within each treatment, that is, random fluctuations between authentic replicates (DRAPER and SMITH, 1998; BEZERRA et al., 2008). The calculations have been made by partitioning the Sum of Squares of the Residues into two components that are useful for a hypothesis test according to certain levels of significance, called Pure Error and Lack of Fit (Equations 24 and 25). Statistically significant lack-of-fit hypothesis tests (p ≤ α) have, in general, indicated a lack-of-fit variability greater than the pure error variability (Equation 26).
Depending on the significance criterion, significant lack-of-fit tests have indicated a low degree of model adequacy, requiring, in some cases, the use of more complex designs to describe the behavior of the responses (RODRIGUES and IEMMA, 2005). Notably, the lack-of-fit test by the sum of squares has been highly sensitive to deviations from normality, large or small sample sizes, and the presence of influential observations, those with atypical values, called outliers (HOOPER et al., 2008). In recent years, remarkable research has been devoted to the development of new diagnostic techniques and structural transformations of statistical designs. Methods such as goodness-of-fit, incremental fit indices and parsimony fit indices, among others, have been proposed to validate the assumptions of different types of models (D'AGOSTINO, 1986; MULAIK et al., 1989; STEIGER, 1990; MILES and SHEVLIN, 2007; HOOPER et al., 2008; MONTGOMERY, 2017).
where SSPE refers to the sum of squares of the Pure Error, SSRes to the sum of squares of the Residue, SSLOF to the sum of squares of lack of fit, $F$ to the value of the test statistic following the F distribution, $y_{ij}$ to the observed value of the j repetition in the response variable as a function of the level i of a predictor variable, $\bar{y}_i$ to the mean value of the observations between the repetitions at the level i of a predictor variable, $\hat{y}_i$ to the value predicted through the adjusted model for a response variable as a function of the level i of a predictor variable, and n to the number of independent observations.
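The partition of the residual sum of squares can be illustrated with replicated runs. The sketch assumes the usual definitions: SSPE sums the squared deviations of each replicate from the mean of its level, SSLOF sums the squared deviations of each level mean from the model prediction (weighted by the number of replicates), and the F ratio uses m - p and n - m degrees of freedom, with m distinct levels and p fitted parameters. The data and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x + e."""
    x_bar, y_bar = fmean(x), fmean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

def lack_of_fit(x, y, predict, n_params):
    """Return (SSPE, SSLOF, F) from replicated observations."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)               # replicates grouped by level
    ss_pe = sum((yi - fmean(ys)) ** 2 for ys in groups.values() for yi in ys)
    ss_lof = sum(len(ys) * (fmean(ys) - predict(xi)) ** 2
                 for xi, ys in groups.items())
    n, m = len(y), len(groups)
    f_lof = (ss_lof / (m - n_params)) / (ss_pe / (n - m))
    return ss_pe, ss_lof, f_lof

# Hypothetical experiment: three levels, two authentic replicates each.
x = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
y = [1.0, 1.2, 2.9, 3.1, 4.0, 4.2]

b0, b1 = fit_line(x, y)
ss_pe, ss_lof, f_lof = lack_of_fit(x, y, lambda v: b0 + b1 * v, n_params=2)
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(ss_pe, ss_lof, ss_res, f_lof)
```

In this hypothetical case the straight line misses the curvature of the level means, so most of the residual variation is attributed to lack of fit rather than to pure error.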

Outliers
The presence of outliers, observations with atypical values in relation to other observations obtained under the same condition, has been seen to strongly affect the inference procedures and the regression models (MASON et al., 2003). Outliers, usually with a high residual value, have been visually identified through graphs and diagrams, such as box plots, residual plots and normal Quantile-Quantile plots, and also by diagnostic techniques, such as Grubbs' test, leverage values, Cook's distance, and standardized and studentized analyses of residues (GRUBBS et al., 1950; COOK, 1977; CHRISTENSEN, 1996; KUTNER et al., 2005; MONTGOMERY, 2017). Accommodation methods, such as the collection of more data, model reexpression, and the deletion, winsorizing and trimming of extreme observations, have often been used to mitigate the effects of outliers (TUKEY, 1962; MASON et al., 2003). The techniques, however, have been applied in a careful, justified and gradual way, based on stepwise selection methods, such as forward selection, backward elimination and stepwise iteration (CHRISTENSEN, 1996; DRAPER and SMITH, 1998; SHESKIN, 2011).
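As a simple illustration of the box-plot approach mentioned above, the sketch below flags observations that fall outside the conventional fences placed 1.5 interquartile ranges beyond the quartiles. The multiplier 1.5 is the customary box-plot convention and is an assumption here, as are the data; the other diagnostics cited (Grubbs' test, leverage, Cook's distance) are not implemented.

```python
from statistics import quantiles

def iqr_outliers(data, k=1.5):
    """Return the values lying beyond k interquartile ranges from the quartiles."""
    q1, _, q3 = quantiles(data, n=4)        # first, second and third quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr  # box-plot fences
    return [v for v in data if v < low or v > high]

# Hypothetical replicates obtained under the same condition.
replicates = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 14.5]
flagged = iqr_outliers(replicates)
print(flagged)
```

A flagged value is only a candidate for further inspection; whether it is deleted, winsorized or kept should follow the careful and justified procedure described in the text.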

Pareto diagrams
Pareto diagrams, graphs that combine bars and lines, have been used to order categories according to the frequencies of their occurrences. In statistical designs, Pareto diagrams have been used to order the individual and combined effects of the variables according to the magnitudes of their statistical significance. The description of the effects in absolute and standardized values has enabled the inclusion of reference lines that indicate the minimum magnitude of a statistically significant effect according to determined α levels of significance (RODRIGUES and IEMMA, 2005). In statistical modeling, Pareto diagrams have permitted the prioritization of the variables of greater importance in a simplified way, and they have also been frequently used in quality control (PORTER et al., 1997).
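The ordering behind a Pareto diagram can be sketched without any plotting. The example below sorts hypothetical effects by absolute magnitude and accumulates their relative contribution, which corresponds to the bars and to the cumulative line of the diagram. The effect values are assumptions, and the significance reference line is not computed, since it depends on the error estimate of each design.

```python
# Hypothetical effects of two factors (A, B) and their interaction (AB).
effects = {"A": 15.0, "B": -10.0, "AB": 5.0}

# Bars: effects ordered from the largest to the smallest absolute magnitude.
ordered = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)

# Line: cumulative percentage of the total absolute effect.
total = sum(abs(value) for value in effects.values())
cumulative = []
running = 0.0
for name, value in ordered:
    running += abs(value)
    cumulative.append((name, abs(value), 100.0 * running / total))

for name, magnitude, percent in cumulative:
    print(f"{name:>2}  {magnitude:6.1f}  {percent:6.1f}%")
```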

Response surface and contour plots
The relationship between two or more factors has been drawn on graphs with two or three dimensions, called contour or response surface plots. The graphs, constructed through mathematical functions regarding the relationship between the independent and dependent variables, have been used to simplify the interpretation of empirical results, mainly in experiments with multiple factors (MASON et al., 2003;HINKELMANN and KEMPTHORNE, 2007). In scientific investigations, contour and response surface graphs have been frequently used to evaluate important characteristics of statistical designs, such as the behavior of the results, sensitivity of the variables and effects of interactions. In the assessment of industrial processes, response surface and contour charts have been used to improve the quality of products and to estimate optimal operational conditions of the systems (NOORDIN et al., 2004;RODRIGUES and IEMMA, 2005).
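The use of a fitted function to locate optimal conditions can be sketched by evaluating a second-order polynomial in two coded factors over a grid, which is the same table of values from which a contour or response surface plot would be drawn. The coefficients below are hypothetical and the grid search is only one simple way of locating the optimum.

```python
def response(x1, x2, b):
    """Second-order polynomial in two coded factors."""
    return (b["b0"] + b["b1"] * x1 + b["b2"] * x2 + b["b12"] * x1 * x2
            + b["b11"] * x1 ** 2 + b["b22"] * x2 ** 2)

# Hypothetical coefficients of a fitted model with a maximum inside the region.
coeffs = {"b0": 80.0, "b1": 2.0, "b2": -1.0, "b12": 0.0, "b11": -4.0, "b22": -2.0}

# Coded levels from -1 to +1 in steps of 0.05.
grid = [(i - 20) / 20 for i in range(41)]

# Evaluate the surface on the grid and keep the combination with the best response.
best_y, best_x1, best_x2 = max(
    (response(x1, x2, coeffs), x1, x2) for x1 in grid for x2 in grid
)
print(best_x1, best_x2, best_y)
```

The coded optimum would then be converted back to the natural units of each factor before being tested experimentally.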

Designs' equations
The behavior of the experimental responses of statistical designs has been described in mathematical functions, expressed, in general, as demonstrated in equation 27 (MASON et al., 2003). In the equation, the smooth or regular part, called μ, refers to the predictable part of the statistical design. The e part, considered as important as the μ part, refers to the adjustment errors, also called residue (MORETTIN and BUSSAB, 2017). The residue is calculated from uncontrollable sources of variation, and has been interpreted as a measure of discrepancy between the observed values and the values proposed by the statistical designs. Residual analyses, which are usually done through adjustment measurements or by the use of graphs and diagrams, have been crucial to validate the adequacy of theoretical models (CHRISTENSEN, 1996; TSAI et al., 1998).
$$y = \mu + e \quad (27)$$
where y refers to the representation or the description of the observed data, μ to the statistical design, and e to the adjustment error (residual).

Multi-factor designs
Statistical designs have been increasingly used in experiments with multiple factors, also called multi-factor experiments. The application of appropriate statistical designs in this type of study has allowed an efficient characterization of the individual and the interaction effects of the factors, in addition to providing valuable estimates of the variability of adjustment errors (BEZERRA et al., 2008; ANDERSON and WHITCOMB, 2010). Statistical designs with multiple factors have been commonly used in the construction of mathematical equations that describe the behavior of experimental responses in certain regions of interest (MASON et al., 2003). The functions of first, second, third or higher orders, derived from multi-factor statistical designs, have been used in the construction of response surface graphs and contour plots. The approach has enabled a quantitative estimation of the factors' effects, the identification of critical regions, and the optimization of responses (RODRIGUES and IEMMA, 2005; WEISSMAN and ANDERSON, 2015).

Complete factorial designs
The choice between the various types of statistical designs has depended on the objective, the number of factors involved, the available resources, and especially, the number of required experiments (MASON et al., 2003; WEISSMAN and ANDERSON, 2015). The Complete Factorial (CF) design, a classic multi-factor statistical design, has covered all the treatment combinations through the use of a matrix with predetermined and equidistant coefficients (FISHER, 1936). CF designs, due to their broad applications in many research fields, have been increasingly used to characterize the individual and combined effects of different types of variables, estimate variations concerning the measurement errors, and also to derive graphs and mathematical functions that represent the behavior of experimental responses (HINKELMANN and KEMPTHORNE, 2007; SHESKIN, 2011; MONTGOMERY, 2017). Depending on the availability of time and experimental resources, the sequences of the factorial testing runs, that is, the assignments of each factor-level combination, can be minimally or maximally randomized by algorithms according to different criteria, which tend to decrease nonconstant variances, time trend effects and possible sources of bias (CHENG et al., 1998; ANGELOPOULOS et al., 2009; HILOW, 2013; BHOWMIK et al., 2017). The treatment points of the matrix of a CF design, also called Full Factorial design, with three factors and three levels ($3^3$) are represented in Figure 1. The second order polynomial function generated by the regression model of a $3^3$ CF design is described in equation 28.
According to reports, CF designs have been successfully used in the study of different types of processes, including the optimization of an eco-friendly biosorption process for a cotton dye, and of the extraction method of a pharmacologically active chemical compound from a plant's leaves (HENN et al., 2019; MOGHAZY et al., 2019). The authors stated that the CF designs were effectively used to determine which factors of the processes were statistically significant. According to the descriptions, the polynomial equations built from the CF designs were successfully used to evaluate the individual and combined influences of the parameters, and provided robust assessments of the variables' effects on the responses. CF designs were also used by Porto et al. (2019) and Lara et al. (2019) to evaluate the performance of anticorrosive alloys, and to optimize the dielectric properties of a magneto-dielectric composite. According to the authors, the CF designs enabled an efficient investigation of the parameters used to achieve the desired characteristics, and were successfully used to improve the performance of the systems. Ravindran et al. (2020) also used a CF design to screen for the significant constituents of a growth medium used for the cultivation of a freshwater microalga. The CF design was able to determine the multiple significant factors affecting the growth rate and the final biomass of the microorganism, as well as the existence of interactions between the variables.
$$y = \beta_0 + \beta_1 x_1 + \beta_{11} x_1^2 + \beta_2 x_2 + \beta_{22} x_2^2 + \beta_3 x_3 + \beta_{33} x_3^2 + \beta_{12} x_1 x_2 + e \qquad (28)$$
where $y$ refers to the predicted response of the adjusted design, $\beta_0$ to the value of the intercept, $x_1$, $x_2$ and $x_3$ to the predictor variables, the other $\beta$ terms to the partial regression coefficients, and $e$ to the adjustment error.
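The treatment matrix of a 3^3 Complete Factorial design, as discussed in this section, can be sketched in a few lines of Python. The snippet below is only an illustration: the function name `full_factorial`, the coded levels (-1, 0, +1) and the fixed random seed are assumptions made for the example and are not taken from the cited works.

```python
import itertools
import random


def full_factorial(levels, n_factors):
    """Return every combination of the coded levels for n_factors factors."""
    return [list(run) for run in itertools.product(levels, repeat=n_factors)]


# Coded, equidistant levels for a three-level factor (assumed coding).
LEVELS = (-1, 0, 1)

# 3^3 = 27 treatment combinations.
design = full_factorial(LEVELS, 3)

# Complete randomization of the run order; the seed is fixed only for reproducibility.
rng = random.Random(0)
run_order = design[:]
rng.shuffle(run_order)

print(len(design))     # number of treatments
print(run_order[:3])   # first three runs in the randomized order
```

Each row is one treatment; in practice the coded levels would be mapped back to the real units of each factor before the runs are executed.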

Fractional factorial designs
Due to limitations of time and resource availability, it has not always been possible to conduct Complete Factorial designs. In such cases, Fractional Factorial (FF) designs have been used, that is, designs that consist of a subset of the experimental points of a CF design, a fraction of the CF treatment runs (BOX and HUNTER, 1961). The fractionalization of factorial designs has relied on confounded effects, also called aliased effects, whose calculated values can only be attributed to the combined influence of two or more effects on the response, and not to each effect individually (MASON et al., 2003). The fractions of the FF matrices have been carefully chosen, ensuring that the effects of interest were not confounded with other effects of interest or, at least, were confounded only with effects of variables that did not have appreciable magnitudes (MASON et al., 2003). This type of application has enabled a significant reduction in the number of experiments and, even so, has provided a comprehensive investigation of the factors without the loss of relevant information (BOX and HUNTER, 1961; GUNST and MASON, 2009). The treatment points of the matrix of a one-third FF design with three factors and three levels (3^(3-1)) were represented in Figure 2. The second-order polynomial function generated by the regression model of the FF design was described in equation 29.
Reports indicated that FF designs have been successfully used to identify significant variables in screening experiments and industrial processes. Pan et al. (2019), who used an FF design to develop and optimize a fermentation medium for the production of a biopolymer by a lactic bacterium, indicated that the experimental design allowed them to simultaneously evaluate a large number of variables in a reduced number of experiments. Lim et al. (2020), in turn, used an FF design to optimize the combinations of drugs used in the therapy of cancer cell lines. The authors reported that the FF statistical design enabled an accurate identification and optimization of effective factor combinations. FF designs were also used in the investigation of the impacts that various constituents had on the microbial activity of a plant growth medium, and in the study of the effects that design parameters had on the performance of reinforced concrete bridge piers strengthened with steel-reinforced polymer composites (VAN GERREWEY et al., 2020; WAKJIRA et al., 2020). The FF design also allowed Rocha et al. (2021) to produce cost-effective and efficient magnetic activated carbon for the adsorptive removal of pharmaceuticals from aqueous media, and Al-Dawalibi et al. (2020) to select the best marketing strategy for increasing sales revenue. The authors stated that the FF designs, in a reduced number of trials, enabled the identification of the most important experimental variables and were also extremely useful to investigate the main effects and possible interactions among them.
$$y = \beta_0 + \beta_1 x_1 + \beta_{11} x_1^2 + \beta_2 x_2 + \beta_{22} x_2^2 + \beta_3 x_3 + \beta_{33} x_3^2 + \beta_{12} x_1 x_2 + \beta_{122} x_1 x_2^2 + e \qquad (29)$$
where $y$ refers to the predicted response of the adjusted design, $\beta_0$ to the value of the intercept, $x_1$, $x_2$ and $x_3$ to the predictor variables, the other $\beta$ terms to the partial regression coefficients, and $e$ to the adjustment error.
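A one-third fraction of a three-factor, three-level design can be sketched as follows. The generator used here, x3 = (x1 + x2) mod 3, is one common choice and is an assumption of this example, as is the function name `one_third_fraction`; the fraction actually shown in Figure 2 may have been built differently.

```python
import itertools


def one_third_fraction():
    """Build a 3^(3-1) design in which x3 is generated from x1 and x2 (modulo 3)."""
    runs = []
    for x1, x2 in itertools.product(range(3), repeat=2):
        # Assumed generator: the main effect of x3 becomes aliased with
        # a component of the x1-by-x2 interaction.
        x3 = (x1 + x2) % 3
        runs.append((x1, x2, x3))
    return runs


fraction = one_third_fraction()

# Recode the levels 0, 1, 2 to the coded values -1, 0, +1.
coded = [tuple(level - 1 for level in run) for run in fraction]

for run in coded:
    print(run)
```

The fraction keeps 9 of the 27 runs, and every pair of factors still shows each of its nine level combinations exactly once, which is what allows the main effects to be estimated from the reduced set of treatments.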

Central Composite Rotational designs
The Central Composite Rotational (CCR) design, developed by Box and Hunter (1957), consists of a factorial design with orthogonality and rotatability (rotationality) characteristics, acquired by the addition of star (alpha) and central points to the factorial arrangement. These conditions have provided independent estimates of the model coefficients and identical prediction variances at all points situated at the same distance from the center (CONAGIN, 1982). CCR designs have made it possible to obtain robust experimental information in a relatively modest number of treatments, including the quantification of individual and combined effects of the factors, the estimation of variations related to procedural errors, and the construction of equations that express responses with curvature, that is, second-order functions (MASON et al., 2003; ONYIAH, 2008). The adjusted models of CCR designs have often been used to economically construct contour plots and response surface graphs (RODRIGUES and IEMMA, 2005). The treatment points of the matrix of a CCR design with three factors, two levels and six star points were represented in Figure 3. The second-order polynomial function generated by the regression model of the CCR design was described in equation 30.
According to reports, a Central Composite design was successfully used to optimize the parameters of a process for the production of bioethanol by a yeast strain (ZANI et al., 2019). Other studies indicated that CCR designs were also effectively applied in the performance evaluation and parameter optimization of an anaerobic co-digestion of leachate and glycerol for renewable energy generation, and in the maximization of the phytoremediation of arsenic-contaminated water (TAKEDA et al., 2020). CCR designs were likewise able to optimize the parameters for the production of a halotolerant enzyme by a filamentous fungus grown under solid-state fermentation, and the key factors affecting hydrogen production from coffee waste (DAS NEVES et al., 2020; VILLA MONTOYA et al., 2020). In fertilization experiments, the CCR design was also successfully partitioned into blocks without losing its rotatability and orthogonality, a highly desirable characteristic (CONAGIN, 1982). According to the researchers, CCR designs were able to study different factors simultaneously, and effectively quantified their individual and combined effects. The mathematical functions constructed from the CCR designs showed a good fit to the experimental data, and efficiently enabled the construction of response surface and contour plots. The authors stated that CCR designs could be successfully used to control, forecast, and optimize scientific experiments and industrial processes.
$$y = \beta_0 + \beta_1 x_1 + \beta_{11} x_1^2 + \beta_2 x_2 + \beta_{22} x_2^2 + \beta_3 x_3 + \beta_{33} x_3^2 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 + e \qquad (30)$$
where $y$ refers to the predicted response of the adjusted design, $\beta_0$ to the value of the intercept, $x_1$, $x_2$ and $x_3$ to the predictor variables, the other $\beta$ terms to the partial regression coefficients, and $e$ to the adjustment error.
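The structure of a CCR design with three factors can be illustrated with the sketch below, which assembles the 2^3 factorial points, the six star points and a set of central points. The star distance is set by the usual rotatability condition, alpha equal to the fourth root of the number of factorial runs; the function name `central_composite` and the choice of three central points are assumptions made only for this example.

```python
import itertools


def central_composite(k, n_center):
    """Factorial, star (axial) and central points of a rotatable central composite design."""
    # Rotatability condition: alpha = (number of factorial runs) ** (1/4).
    alpha = (2 ** k) ** 0.25
    factorial = [list(p) for p in itertools.product((-1.0, 1.0), repeat=k)]
    star = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha   # one factor at +/- alpha, the others at the center
            star.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return alpha, factorial + star + center


# n_center = 3 is an arbitrary choice for the illustration.
alpha, ccr = central_composite(k=3, n_center=3)

print(round(alpha, 3))
print(len(ccr))
```

With k = 3 the design has 8 factorial, 6 star and 3 central runs, far fewer than would be needed to explore five levels per factor in a complete factorial arrangement.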

Asymmetrical designs
In some applications, due to the lack of uniform conditions, such as heterogeneous experimental units, restrictions on experimental procedures, factors that have more than two levels, or the addition of qualitative variables, such as a control group, it has not been possible to include all factor combinations in the factorial designs (MASON et al., 2003; SHESKIN, 2011; MONTGOMERY, 2017). In those situations, however, it has been possible to use factorial arrays whose factors have unequal numbers of levels, the so-called Asymmetrical Factorial (AF) designs (HINKELMANN and KEMPTHORNE, 2007). These designs, also named Mixed-Level factorial designs, have been defined as factorial experiments with two or more groups of factors with different numbers of levels, in which all factors in the same group have the same number of levels, for example 2^m x 3^n experiments, in which m factors have 2 levels each and n factors have 3 levels each (HINKELMANN and KEMPTHORNE, 2007). AF designs have been mainly evaluated by a factorial analysis of variance for a mixed design, in which one of the factors is analyzed as a between-subjects variable and the other as a within-subjects variable (SHESKIN, 2011). The method has been used to extract the variability as a separate sum of squares, which reduces the magnitude of the error, enables the simultaneous evaluation of the influence of the treatments, and allows the recognition of carryover effects (SHESKIN, 2011), that is, situations in which the effects of the treatments are not independent of each other.
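A mixed-level arrangement of the 2^m x 3^n type can be enumerated in the same way as a symmetrical one, with each factor contributing its own number of levels. The sketch below uses a 2^2 x 3^1 case; the function name `mixed_level_design` and the level labels 0, 1, 2 are illustrative assumptions.

```python
import itertools


def mixed_level_design(level_counts):
    """Enumerate all runs of a factorial whose factors have differing numbers of levels."""
    return list(itertools.product(*(range(n) for n in level_counts)))


# A 2^2 x 3^1 example: two two-level factors and one three-level factor.
design = mixed_level_design([2, 2, 3])

print(len(design))   # 2 * 2 * 3 runs
for run in design:
    print(run)
```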
AF designs, according to NAZIEF et al. (2020) and KUMAR et al. (2017), were successfully applied to evaluate the nutritional and cooking characteristics of brown rice in different storage structures, and to optimize the formulation of solid lipid nanoparticles that enhance the oral bioavailability of poorly water-soluble drugs. They were also effectively used to analyze the performance of a heat exchanger system, optimize the spherical agglomeration crystallization method of a pharmaceutical active agent, and investigate the effects that repeated exposure to anthropomorphic ads under multiple-media conditions had on an audience (GYULAI et al., 2018; AGRAWAL et al., 2020; CHANDRASEKARAN et al., 2021). According to the researchers, AF designs, besides being highly flexible due to their capacity to evaluate different types of variables with mixed numbers of levels, were also able to economically identify the important factors that contributed significantly to the system's responses, quantify the main effects and interactions, and optimize the operational parameters in order to obtain the best desired conditions.

Sequential experimentation
A comprehensive investigation, which is generally performed to answer specific questions, frequently involves a sequence of experiments (MASON et al., 2003). The experimental process, therefore, has been used to provide the knowledge needed to answer the stated questions and, mostly, to support decisions about further experimentation and extend the investigation (MASON et al., 2003; HINKELMANN and KEMPTHORNE, 2007). The first step of a sequence of experiments has conventionally been a screening experiment, used to identify the key variables, also termed dominant factors, that will be examined more comprehensively in subsequent experiments (MASON et al., 2003; RODRIGUES and IEMMA, 2005). Screening designs, defined as deliberately confounded multi-factor experiments that have the objective of filtering out important main effects, in other words, to "screen out" the factors that have a major impact on the dependent variable, have been developed to efficiently evaluate a large number of factors employing a minimal number of observations and limited experimental resources (SHESKIN, 2011; MONTGOMERY, 2017).
The Plackett-Burman (PB) designs (PLACKETT and BURMAN, 1946), for example, are Fractional Factorial designs that have been used to screen a large number of factors using the smallest possible number of combinations of experimental conditions. This type of design, in which each independent variable has two levels, allows the testing of all the main effects but none of their interactions (SHESKIN, 2011). After the screening process, more extensive investigations involving only the dominant factors have typically been conducted (MASON et al., 2003). The strategy has mainly involved the sequential use of FF designs, which allow the identification of strong two-factor interactions, followed by the use of response surface methods, such as CF or CCR designs, which enable the comprehensive evaluation of a few factors, the derivation of second-order models and the geometric representation of the responses (MASON et al., 2003). If the objective of the response surface study is to determine specific operational regions or optimal conditions, another sequential procedure, named "steepest ascent" (BOX and WILSON, 1951), is often conducted in order to improve the system until the desired response is reached (MONTGOMERY, 2017).
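As an illustration of a PB screening matrix, the sketch below builds the 12-run design for up to 11 two-level factors from its standard generating row, using cyclic shifts plus a final row of low levels, and then checks that every pair of columns is orthogonal. The function and variable names are assumptions of this example; only the construction itself follows the classical 12-run arrangement.

```python
# Generating row of the 12-run Plackett-Burman design (+1 = high level, -1 = low level).
GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """Eleven cyclic shifts of the generating row plus a final row of low levels."""
    n = len(GENERATOR)
    rows = [[GENERATOR[(j - shift) % n] for j in range(n)] for shift in range(n)]
    rows.append([-1] * n)
    return rows


def column_dot(matrix, a, b):
    """Inner product of columns a and b; zero means the two main effects are estimated independently."""
    return sum(row[a] * row[b] for row in matrix)


pb = plackett_burman_12()

print(len(pb), len(pb[0]))   # runs, factors
print(all(column_dot(pb, a, b) == 0 for a in range(11) for b in range(a + 1, 11)))
```

Eleven main effects are thus estimated from 12 runs, which is why interactions cannot be tested separately in this type of design.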
The use of PB designs as screening experiments, followed by the sequential use of Complete, Central Composite or Central Composite Rotational designs, was, for example, successfully employed to develop and evaluate the properties of an oil-based emulsion gel, optimize the production of an enzyme by an actinomycete in submerged fermentation, and develop a method for the extraction and determination of sterols and squalene in cyanobacteria (CÂMARA et al., 2020; FAGUNDES et al., 2021; PATEL et al., 2021). The initial use of FF designs, followed by the sequential use of CF and/or Central Composite designs, in turn, was adopted by Keijok et al. (2019) to optimize a method for the synthesis of metallic nanoparticles, by Gautério et al. (2020) to maximize the production of an enzyme by a yeast-like fungus using a by-product of rice grain milling, and by Lv et al. (2019) to optimize an ethanol-water distillation column. According to the authors, the results obtained in the initial screening designs were essential to identify, in an economical number of experiments, the significant factors affecting the system's response. The subsequent use of CF, Central Composite and CCR designs enabled the researchers to effectively evaluate the effects of the selected variables and their interactions, and to apply response surface methods to optimize the system's parameters in order to obtain ideal specifications.

Cost analyses
The use of factorial designs has been gaining increasing importance in cost analyses, also referred to as economic evaluations. Those designs have mainly focused on the development of cost-efficient processes and products by using particular criteria, such as the expenditures related to specific raw materials, machinery settings, the technology employed, and fabrication or assembly methods (ASKIN and GOLDBERG, 1988; TACK and VANDEBROEK, 2004). In addition, cost-related statistical designs have also aimed to meet all the functional requirements of the product or process, obtain the highest possible quality, and reduce negative environmental impacts (SIVAKUMAR et al., 2008; LABIDI et al., 2021). The Taguchi design (TAGUCHI, 1985), for example, is a Fractional Factorial design that was developed to aid the selection of parameter settings using a minimum number of experimental conditions, and to identify combinations of levels that produce high robustness and quality performance, and low variability in the presence of uncontrollable factors (SHESKIN, 2011). The design, first developed to improve the quality of manufactured products, has now been frequently applied to design processes that operate consistently and optimally over a variety of conditions and, thus, maximize efficiency and minimize costs (KARNA and SAHAI, 2012).
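A minimal sketch of how a Taguchi-type analysis might rank the runs of a small orthogonal array is given below. It assumes the L4 array (three two-level factors in four runs) and the "smaller-is-better" signal-to-noise ratio, a common choice when the response is a cost; the cost values are made-up numbers and the function name `sn_smaller_is_better` is hypothetical.

```python
import math

# L4 orthogonal array: three two-level factors in four runs (levels coded 1 and 2).
L4 = [
    (1, 1, 1),
    (1, 2, 2),
    (2, 1, 2),
    (2, 2, 1),
]


def sn_smaller_is_better(values):
    """Signal-to-noise ratio (dB) for a response that should be minimized, e.g. a cost."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))


# Hypothetical replicated cost measurements for each run (made-up numbers).
costs = [(10.0, 12.0), (8.0, 9.0), (15.0, 14.0), (11.0, 11.5)]

sn = [sn_smaller_is_better(c) for c in costs]

# The run with the highest signal-to-noise ratio is the most favorable one.
best_run = max(range(len(L4)), key=lambda i: sn[i])
print(L4[best_run])
```

Because the ratio penalizes both a high mean and a high spread of the replicated values, the selected settings tend to be low in cost and also consistent across repetitions.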
Taguchi designs, according to reports, were effectively applied to evaluate the economic aspects of the operational conditions of advanced oxidation pre-treatment processes of a reverse osmosis concentrate and the mechanical properties and associated costs of pervious concrete mixtures, as well as to analyze and optimize the design of a cost-effective planar waveguide solar concentrator (CAI et al., 2020; KANT et al., 2021; TAHERI and RAMEZANIANPOUR, 2021). Other designs, such as the FF, CF, Central Composite and Mixed ones, were used to evaluate economic and environmental aspects of a continuous electrocoagulation process for nitrate removal, analyze the economic sustainability of a methanol production plant using renewable energy sources, and optimize the formulation of cleaning products with better properties and cost (OCHOA et al., 2017; BELLOTTI et al., 2019; KARAMATI-NIARAGH et al., 2019). The authors stated that the factorial designs, in a minimum number of experiments, were able to identify the significant variables affecting the system's quality and economic value, estimate optimal parameter conditions, and successfully develop cost-effective products and processes.

Time-Series studies
Factorial designs have also been increasingly employed in time-series studies, regarded as designs with longitudinal data or repeated measures, as they present measurements repeated over a specific length of time (HINKELMANN and KEMPTHORNE, 2007). Time-series studies, also known as time-trend studies, have often been applied in medical, pharmaceutical, nutritional, agricultural or psychological research, where the aim is to study the influence of the treatments over a certain period of time (HINKELMANN and KEMPTHORNE, 2007). Moreover, they have been used in the evaluation of industrial processes, due to the naturally dynamic behavior and unstable statistical control of these processes (VANHATALO et al., 2013). The key aspect of a time-series design is that the time points have a natural temporal ordering and lack independence, since their errors are correlated (CHRISTENSEN, 1996; KUTNER et al., 2005). In some statistical models, moreover, the time points may not be equidistant, that is, they may be placed at unequally spaced intervals of time (HINKELMANN and KEMPTHORNE, 2007). Time-series factorial designs have usually been analyzed either by comparing the time points separately, where each time point is considered a separate experiment, or jointly, through the use of linear filter models, autoregressive integrated moving average (ARIMA) models and nonstationary models (STEINBERG, 1988; VANHATALO et al., 2013; BOX et al., 2015). The analyses have also been performed on summary measures, that is, a single measure that summarizes the entire set of time points for each treatment, such as the average response of each run (HINKELMANN and KEMPTHORNE, 2007; VANHATALO et al., 2013).
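The summary-measure approach mentioned above can be sketched for a small case. The example assumes a 2^2 factorial in coded levels in which each run is measured at four time points; the data are made-up numbers and the helper `main_effect` is a hypothetical name. Each series is first collapsed to its average, and the main effects are then computed from those averages as in an ordinary factorial analysis.

```python
from statistics import mean

# Hypothetical 2^2 factorial in coded levels, each run measured at four time points.
runs = {
    (-1, -1): [10.0, 11.0, 12.0, 13.0],
    (1, -1): [14.0, 15.0, 16.0, 17.0],
    (-1, 1): [12.0, 13.0, 14.0, 15.0],
    (1, 1): [18.0, 19.0, 20.0, 21.0],
}

# Summary measure: one average response per run replaces the correlated time series.
summary = {levels: mean(series) for levels, series in runs.items()}


def main_effect(summary, factor):
    """Mean response at the high level minus mean response at the low level of one factor."""
    high = mean(y for levels, y in summary.items() if levels[factor] == 1)
    low = mean(y for levels, y in summary.items() if levels[factor] == -1)
    return high - low


effect_a = main_effect(summary, 0)
effect_b = main_effect(summary, 1)
print(effect_a, effect_b)
```

This route avoids modeling the correlation between time points, at the price of discarding any information about how the response evolves over time.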
Factorial designs, for example, were used in a 24-day time-series analysis to investigate the effects and mechanisms that carbon and nitrogen amendment had on the mineralization of organic phosphorus in microcosm soils, in a 12-month longitudinal study to investigate the role that pancreatic enzyme supplementation had in patients after gastric surgery, and in a time-course study of gene expression related to bone aging in mice (CATARCI et al., 2018; MISE et al., 2020; WU et al., 2021). They were also effectively used to study the influence of fire rotation interval and overstory vegetation type on the daily variations and annual seasonal patterns of ambient soil temperatures in a pine forest, assess the effects of 30 years of nutrition intervention on total and cancer mortality in a population, and understand and forecast the short- and long-term demand of ports (WANG et al., 2018; WEISE et al., 2019; DA SILVA and BARBOSA, 2020). According to the researchers, factorial designs, when applied to time-series studies, besides being able to detect experimental time patterns and generate robust diagnoses regarding the dynamics of the systems under study, were also able to jointly estimate factorial and longitudinal effects, and accurately forecast future events based on previously observed values. Some recent applications of factorial statistical designs in different scientific fields published in the literature were described in Table 1.
Table 1: Fractional Factorial | Medical | Assess the effects of 30 years of nutrition intervention on total and cancer mortality in a population | Wang et al. (2018)

Conclusions
Through the use of matrices with all the possible treatment combinations, factorial statistical designs have been extremely useful in scientific experiments and research activities. They have been used to draw accurate inferences from the experimental results, estimate confidence and prediction intervals, characterize relationships among the variables, quantify the individual and combined effects of factors, distinguish assignable causes of variation from random ones, and assess the adequacy of the proposed models and validate them. The fractionalization of factorial designs has been successfully employed to considerably reduce the number of experiments without the loss of pertinent information. This property has been useful to screen for the variables of interest in experiments with a large number of factors. The orthogonality and rotatability (rotationality) characteristics were achieved by the addition of star (alpha) and central points to the designs' matrices. These conditions have made it possible to obtain robust estimates of the individual and combined effects of the variables, fit responses with curvature, and derive quadratic polynomial functions that describe the nature of the responses.
From reports found in the literature, it was observed that the Complete, Fractional, Central Composite Rotational and Asymmetrical factorial statistical designs were successfully applied in investigations related to different scientific fields. The designs were used to economically identify the variables that significantly affected the responses of the systems under study, and to quantify the magnitude of experimental errors, as well as the individual and combined influence of the factors. They were also used to derive mathematical functions that accurately represented the behavior of the responses, construct response surface and contour plots, and optimize the system's conditions in order to obtain the desired specifications. Moreover, the designs were able to, in a minimum number of experiments, develop cost-effective products and processes within specific quality standards, diagnose the dynamics of longitudinal investigations, and accurately forecast future events based on previously observed values. Due to their high efficiency and robustness, statistical factorial designs will most likely be increasingly used in the future. As addressed in this article, their great applicability, which allows them to be employed in many academic and industrial areas, makes them a powerful and important tool in modern research and technology development.