Estimating Optimal Transformations for Multiple Regression Using the ACE Algorithm

Journal of Data Science 2(2004), 329-346

Duolao Wang1 and Michael Murphy2
1 London School of Hygiene and Tropical Medicine and 2 London School of Economics

Abstract: This paper introduces the alternating conditional expectation (ACE) algorithm of Breiman and Friedman (1985) for estimating the transformations of a response and a set of predictor variables in multiple regression that produce the maximum linear effect between the (transformed) independent variables and the (transformed) response variable. These transformations can give the data analyst insight into the relationships between these variables, so that the relationships can be best described and non-linear relationships can be uncovered. The power and usefulness of ACE-guided transformation in multivariate analysis are illustrated using a simulated data set as well as a real data set. The results from these examples clearly demonstrate that ACE is able to identify the correct functional forms, to reveal more accurate relationships, and to improve the model fit considerably compared to the conventional linear model.

Key words: Alternating conditional expectation (ACE) algorithm, nonparametric regression, transformation.

1. Introduction

In regression analysis, we try to explain the effect of one or more independent variables (predictors or covariates) on a dependent variable (response). The initial stages of data analysis often involve exploratory analysis. Instead of imposing preconceived models, we seek insight into the nature of relationships in the data set and, if possible, the underlying phenomena that might have produced the observed data values. Unfortunately, traditional multiple regression techniques are limited in this respect, since they usually require a priori assumptions about the functional forms...
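The idea summarized in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation. It is a minimal illustration of the general scheme of Breiman and Friedman (1985), in which conditional-expectation (smoothing) steps for the predictor transformations alternate with a step for the response transformation. The running-mean smoother, the function names `smooth` and `ace`, the iteration limits, and the simulated data are all assumptions introduced here for illustration. In particular, the running mean is a crude stand-in for the smoother used in the original algorithm.

```python
import numpy as np

def smooth(x, z, window=0.2):
    """Estimate E[z | x] by a running mean over neighbours ordered by x.

    Assumption: a simple stand-in smoother, chosen only for brevity.
    """
    n = len(x)
    order = np.argsort(x, kind="stable")
    half = max(1, int(window * n) // 2)
    # Prefix sums give every window mean without an explicit loop.
    csum = np.concatenate(([0.0], np.cumsum(z[order])))
    lo = np.clip(np.arange(n) - half, 0, n)
    hi = np.clip(np.arange(n) + half + 1, 0, n)
    fitted = np.empty(n)
    fitted[order] = (csum[hi] - csum[lo]) / (hi - lo)
    return fitted

def ace(X, y, n_outer=50, n_inner=5, tol=1e-6):
    """Return (theta, phi): transformed response and predictor transforms."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    theta = (y - y.mean()) / y.std()      # start from the standardized response
    phi = np.zeros((n, p))
    prev_err = np.inf
    for _ in range(n_outer):
        # Inner loop: update each phi_j given theta and the other phi_k.
        for _ in range(n_inner):
            for j in range(p):
                partial = theta - phi.sum(axis=1) + phi[:, j]
                phi[:, j] = smooth(X[:, j], partial)
                phi[:, j] -= phi[:, j].mean()
        # Outer step: update theta given the sum of the phi_j, then rescale
        # to zero mean and unit variance.
        theta = smooth(y, phi.sum(axis=1))
        theta = (theta - theta.mean()) / theta.std()
        err = np.mean((theta - phi.sum(axis=1)) ** 2)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return theta, phi

# Hypothetical simulated data: a linear relationship on the log scale.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = np.exp(2 * x + 0.3 * rng.standard_normal(500))
theta, phi = ace(x[:, None], y)
print("correlation of raw y and x:        ", np.corrcoef(x, y)[0, 1])
print("correlation of theta(y) and phi(x):", np.corrcoef(theta, phi[:, 0])[0, 1])
```

Plotting `theta` against `y` and `phi[:, 0]` against `x` is the usual way to read off the suggested functional forms. For data generated this way, one would expect the response transformation to look roughly logarithmic.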

