Implementing Portfolio Selection by Using Data Mining

In: Computers and Technology

Submitted By coolananddd
Words 10967
Pages 44
The Chinese University of Hong Kong
Department of Computer Science and Engineering

Final Year Project
Trading Strategy and Portfolio Management (LWC 1301)
Implementing Portfolio Selection by Using Data Mining

Tseng Ling Chun (1155005610)
Supervisor: Professor Chan Lai Wan
Marker: Professor Xu Lei

Table of Contents
1. Introduction
1.1 Financial Portfolios
1.2 Data Mining and Decision Trees
1.3 Flow of Report
2. Classification and Regression Trees (CART)
2.1 Detailed Description of CART
2.2 Tree Construction
2.2.1 Application of Impurity Function in CART
2.3 Splitting Rules
3. Optimizing Size of Tree
3.1 Parameterization of Trees
3.2 Cost-Complexity Function
3.3 V-Fold Cross-Validation
4. Iterative Dichotomiser 3 (ID3)
4.1 Entropy and Information Gain
5. Data Used
5.1 Platforms and Open Source Library
5.1.1 Testing and Development Environment
5.1.2…...

