Essays on Forecasting with Many Predictors

Description
This dissertation studies how forecasting performance can be improved with big data. The first chapter, written with Seung C. Ahn, considers Partial Least Squares (PLS) estimation of a time-series forecasting model using data that contain a large number of time series observations on many predictors. In the model, a subset or the whole set of the latent common factors in the predictors determines a target variable. The asymptotic analysis yields two results. First, the optimal number of PLS factors for forecasting could be smaller than the number of common factors relevant for the target variable. Second, when more than the optimal number of PLS factors is used, the out-of-sample explanatory power of the factors could decrease while their in-sample power may increase. Monte Carlo simulation results confirm these asymptotic results. In addition, the simulation results indicate that the out-of-sample forecasting power of the PLS factors is often higher when fewer than the asymptotically optimal number of factors are used. Finally, the out-of-sample forecasting power of the PLS factors often decreases as the second, third, and further factors are added, even if the asymptotically optimal number of factors is greater than one. The second chapter comprehensively studies the predictive performance of various factor estimation methods. Big data consisting of major U.S. macroeconomic and finance variables are constructed, and 148 target variables are forecast using 7 factor estimation methods with 11 information criteria. First, the number of factors used in forecasting is important, and incorporating more factors does not always provide better forecasting performance. Second, using a consistently estimated number of factors does not necessarily improve predictive performance. The first PLS factor, which is not theoretically consistent, very often shows strong forecasting performance. Third, there is a large difference in forecasting performance across different information criteria, even when the same factor estimation method is used. Therefore, the choice of factor estimation method, as well as the information criterion, is crucial in forecasting practice. Finally, the first PLS factor yields forecasting performance very close to the best result among all combinations of the 7 factor estimation methods and 11 information criteria.
Date Created
2021
Agent

Hospital Observational Project Analysis: Doctor/Nurse Communication

Description
This paper comes from a consulting project that the consulting firm New Venture Group (NVG) carried out for a hospital in the southwest United States. The name of the hospital, as well as the names of its hospitalists and units, will be withheld for confidentiality reasons. The hospital will be referred to as the ‘client’ throughout this paper. New Venture Group is a management consulting firm associated with Arizona State University (ASU), the W.P. Carey School of Business, and The Barrett Honors College. NVG recruits its consultants directly from the upper-class student body. NVG takes on projects from a wide variety of clients to provide real-world solutions comparable to those of other management consulting firms in the industry.
The client wanted to look into ways to improve patient satisfaction. To that end, the consulting team performed research and collected data. The team reviewed the literature for possible improvements in technology, management procedures, and hospital operations protocols. The team then provided its findings and possible implementations to the client. The team also looked into communication between night shift hospitalists and nurses, and possible ways to improve it. In the winter of 2010, a data collection was conducted at the client hospital that measured several different metrics of hospitalist/nurse communication. In early 2011, an NVG team provided a descriptive statistics analysis of the results to the client. After the team’s first presentation, I joined NVG and the team working with this client. The client wanted to dig deeper into the data to find any patterns that were not immediately obvious from descriptive statistics. To do this, I built over 150 different regressions to extract as many patterns from the data as possible. Most of these regressions produced uninteresting results, but a few produced significant and interesting ones. A report was sent to the client with all the results found. This paper is structured differently from the one delivered to the client in that only the significant and interesting results are included, and the terminology is intended for an audience familiar with statistics and mathematics. The work in this paper is the combined result of the whole team. My most specific contribution to this project is the quantitative analysis section. The other parts of this paper are included so that the reader can see the full results of this consulting project.
Date Created
2012-05
Agent

Application of Bayesian methods to structural models and stochastic frontier production models

Description
This dissertation applies the Bayesian approach as a method to improve the estimation efficiency of existing econometric tools. The first chapter proposes the Continuous Choice Bayesian (CCB) estimator, which combines the Bayesian approach with the Continuous Choice (CC) estimator suggested by Imai and Keane (2004). Using a simulation study, I provide two important findings. First, the CC estimator clearly has better finite sample properties than the frequently used Discrete Choice (DC) estimator. Second, the CCB estimator has better estimation efficiency when the data size is relatively small, and it still retains the advantage of the CC estimator over the DC estimator. The second chapter estimates baseball's managerial efficiency using a stochastic frontier function with the Bayesian approach. When a stochastic frontier model is applied to baseball panel data, the difficulty is that the dataset often has a small number of periods, which results in large estimation variance. To overcome this problem, I apply the Bayesian approach to stochastic frontier analysis. I compare the confidence interval of efficiencies from the Bayesian estimator with the classical frequentist confidence interval. Simulation results show that the Bayesian approach achieves smaller estimation variance without losing any reliability in point estimation. I then apply the Bayesian stochastic frontier analysis to answer some interesting questions in baseball.
Date Created
2014
Agent

Dissertation on generalized empirical likelihood estimators

Description
Schennach (2007) has shown that the Empirical Likelihood (EL) estimator may not be asymptotically normal when a misspecified model is estimated. This problem occurs because the empirical probabilities of individual observations are restricted to be positive. I find that even the EL estimator computed without the restriction can fail to be asymptotically normal for misspecified models if the sample moments weighted by unrestricted empirical probabilities do not have finite population moments. As a remedy for this problem, I propose a group of alternative estimators which I refer to as modified EL (MEL) estimators. For correctly specified models, these estimators have the same higher-order asymptotic properties as the EL estimator. The MEL estimators are obtained by the Generalized Method of Moments (GMM) applied to an exactly identified model. The simulation results provide promising evidence for these estimators. In the second chapter, I introduce an alternative group of estimators to the Generalized Empirical Likelihood (GEL) family. The new group is constructed by employing demeaned moment functions in the objective function while using the original moment functions in the constraints. This construction modifies the higher-order properties of the estimators. I refer to these new estimators as Demeaned Generalized Empirical Likelihood (DGEL) estimators. Although Newey and Smith (2004) show that, within the GEL family, the EL estimator has fewer sources of bias and is higher-order efficient after bias correction, in the DGEL group it is the demeaned exponential tilting (DET) estimator that has those superior properties. In addition, if data are symmetrically distributed, every estimator in the DGEL family shares the same higher-order properties as the best member.
Date Created
2013
Agent

Essays in financial and international macroeconomics

Description
I study the importance of financial factors and real exchange rate shocks, which the literature has considered important non-technological factors, in explaining business cycle fluctuations. In the first chapter, I study the implications of fluctuations in corporate credit spreads for business cycle fluctuations. Motivated by the fact that corporate credit spreads are countercyclical, I build a simple model in which differences in default probabilities on corporate debt lead to a spread in the interest rates paid by firms. In the model, firms differ in the variance of firm-level productivity, which is in turn linked to the difference in default probability. The key mechanism is that an increase in the variance of productivity for risky firms relative to safe firms leads to a reallocation of capital away from risky firms toward safe firms and a decrease in aggregate output and productivity. I embed this mechanism into an otherwise standard growth model, calibrate it, and numerically solve for the equilibrium. In my benchmark case, I find that shocks to the variance of productivity for risky and safe firms account for about 66% of fluctuations in output and TFP in the U.S. economy. In the second chapter, I study the importance of shocks to the price of imports relative to the price of final goods, driven by real exchange rate shocks, in accounting for fluctuations in output and TFP in the Korean economy during the Asian crisis of 1997-98. Using Korean data, I calibrate a standard small open economy model with taxes and tariffs on imported goods, and simulate it. I find that shocks to the price of imports are an important source of fluctuations in Korea's output and TFP during the crisis episode. In particular, in my benchmark case, shocks to the price of imports account for about 55% of the output deviation (from trend), one third of the TFP deviation, and three quarters of the labor deviation in 1998.
Date Created
2011
Agent