Data Science (English reprint edition)
Authors: Schutt, O'Neil (USA)
Publication date: 2014

Summary
People now recognize that data can change the outcome of an election or reshape a business model, and data science as a profession keeps growing. But how do you get started in such a broad and intricate interdisciplinary field? Data Science (reprint edition) by Schutt and O'Neil tells you everything you need to know. The book is full of insight and is based on the lecture notes from Columbia University's data science course.

Contents
Preface
1. Introduction: What Is Data Science?
  Big Data and Data Science Hype
  Getting Past the Hype
  Why Now?
  Datafication
  The Current Landscape (with a Little History)
  Data Science Jobs
  A Data Science Profile
  Thought Experiment: Meta-Definition
  OK, So What Is a Data Scientist, Really?
  In Academia
  In Industry
2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process
  Statistical Thinking in the Age of Big Data
  Statistical Inference
  Populations and Samples
  Populations and Samples of Big Data
  Big Data Can Mean Big Assumptions
  Modeling
  Exploratory Data Analysis
  Philosophy of Exploratory Data Analysis
  Exercise: EDA
  The Data Science Process
  A Data Scientist's Role in This Process
  Thought Experiment: How Would You Simulate Chaos?
  Case Study: RealDirect
  How Does RealDirect Make Money?
  Exercise: RealDirect Data Strategy
3. Algorithms
  Machine Learning Algorithms
  Three Basic Algorithms
  Linear Regression
  k-Nearest Neighbors (k-NN)
  k-means
  Exercise: Basic Machine Learning Algorithms
  Solutions
  Summing It All Up
  Thought Experiment: Automated Statistician
4. Spam Filters, Naive Bayes, and Wrangling
  Thought Experiment: Learning by Example
  Why Won't Linear Regression Work for Filtering Spam?
  How About k-nearest Neighbors?
  Naive Bayes
  Bayes Law
  A Spam Filter for Individual Words
  A Spam Filter That Combines Words: Naive Bayes
  Fancy It Up: Laplace Smoothing
  Comparing Naive Bayes to k-NN
  Sample Code in bash
  Scraping the Web: APIs and Other Tools
  Jake's Exercise: Naive Bayes for Article Classification
  Sample R Code for Dealing with the NYT API
5. Logistic Regression
  Thought Experiments
  Classifiers
  Runtime
  You
  Interpretability
  Scalability
  M6D Logistic Regression Case Study
  Click Models
  The Underlying Math
6. Time Stamps and Financial Modeling
7. Extracting Meaning from Data
8. Recommendation Engines: Building a User-Facing Data Product at Scale
9. Data Visualization and Fraud Detection
10. Social Networks and Data Journalism
11. Causality
12. Epidemiology
13. Lessons Learned from Data Competitions: Data Leakage and Model Evaluation
14. Data Engineering: MapReduce, Pregel, and Hadoop
15. The Students Speak
16. Next-Generation Data Scientists, Hubris, and Ethics
Index