307 Curated List of Free Data Science Ebooks, Courses and Resources

The term “Data Science” was coined at the beginning of the 21st century. It is attributed to William S. Cleveland, who in 2001 wrote “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” Data science encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data, and then performing advanced data analysis. Analytic applications and data scientists can then review the results to uncover patterns and help business leaders draw informed insights. It is also a discipline that incorporates varying degrees of data engineering, the scientific method, math, statistics, advanced computing, visualization, a hacker mindset, and domain expertise.

Data Science and Artificial Intelligence are two of the most important technologies in the world today. Data Science makes use of Artificial Intelligence in its operations, but it does not completely represent AI, and although many treat contemporary Data Science and Artificial Intelligence as the same thing, they are not. Data science involves attempting to solve complex problems with data; AI consists of developing algorithms to find solutions to these problems. Data science is related to AI but is not a subset of it.

This list tries to cover everything from basic introductory material to deep analytical data science material. You’ll also find various curricula for learning Data Science. Foundational in both theory and technologies, the Open Source Data Science Masters (OSDSM) breaks down the core competencies necessary for making use of data. The list covers Python, machine learning, statistics, databases, linear algebra and linear programming, and more, through courses, tutorials, and a long list of blogs that you can follow and refer to.

Linear Algebra & Programming

  1. Linear Algebra – Khan Academy / Videos
  2. Linear Programming (Math 407) – University of Washington / Course
  3. An Intuitive Guide to Linear Algebra – Better Explained / Article
  4. A Programmer’s Intuition for Matrix Multiplication – Better Explained / Article
  5. Vector Calculus: Understanding the Cross Product – Better Explained / Article
  6. Vector Calculus: Understanding the Dot Product – Better Explained / Article

Convex Optimization

  1. Convex Optimization / Boyd – Stanford / Lectures

Statistics

  1. Think Stats: Probability and Statistics for Programmers – Digital
  2. Think Bayes – Digital

Computing

  1. Data Science Toolbox – Get your environment up and running.

Algorithms

  1. Algorithms Design & Analysis I – Stanford / Coursera

Distributed Computing Paradigms

  1. See Intro to Data Science – UW / Lectures on MapReduce
  2. Intro to Hadoop and MapReduce – Cloudera / Udacity Course. Includes select free excerpts of Hadoop: The Definitive Guide.

Databases

  1. Introduction to Databases – Stanford / Online Course
  2. SQL School – Mode Analytics / Tutorials
  3. SQL Tutorials – SQLZOO / Tutorials

Data Mining

  1. Introduction to Information Retrieval / Stanford – Digital

Data Design

  1. Tidy Data in Python – Focuses on one aspect of cleaning up data, tidying data: structuring datasets to facilitate analysis.
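
To make "tidying" concrete, here is a minimal sketch in plain Python of reshaping a wide table (one column per year) into a long one (one row per observation). The `melt` helper, the column names, and the sample values are all hypothetical and chosen only for illustration; pandas offers a comparable `melt` operation for real datasets.

```python
def melt(rows, id_column, var_name="variable", value_name="value"):
    """Turn wide rows (one column per variable) into long rows
    (one row per observation)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy.append({
                id_column: row[id_column],
                var_name: column,
                value_name: value,
            })
    return tidy


# Hypothetical "wide" table: the year is stored in the column headers.
wide = [
    {"city": "Oslo", "2019": 4.1, "2020": 4.6},
    {"city": "Lima", "2019": 19.2, "2020": 19.5},
]

tidy = melt(wide, id_column="city", var_name="year", value_name="temp_c")
for record in tidy:
    print(record)
```

Each output row now holds exactly one observation (city, year, temperature), which is the shape most analysis and plotting tools expect.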

Machine Learning

Foundational & Theoretical

  1. Machine Learning – Ng Stanford / Coursera
  2. A Course in Machine Learning – UMD / Digital Book
  3. The Elements of Statistical Learning / Stanford – Digital
  4. Machine Learning – Caltech / Edx

Practical

  1. Machine Learning for Hackers – IPYNB / digital book
  2. Intro to scikit-learn, SciPy2013 – Youtube tutorials

Probabilistic Modeling

  1. Probabilistic Programming and Bayesian Methods for Hackers – Github / Tutorials
  2. Probabilistic Graphical Models – Stanford / Coursera

Deep Learning (Neural Networks)

  1. Neural Networks – Andrej Karpathy / Python Walkthrough
  2. Deep Learning for Natural Language Processing CS224d – Stanford

Social Network & Graph Analysis

  1. Social and Economic Networks: Models and Analysis – Stanford / Coursera

Natural Language Processing

  1. From Languages to Information / Stanford CS124
  2. NLP with Python (NLTK library)
  3. How to Write a Spelling Corrector / Norvig (Tutorial)
  4. Big Data Analysis with Twitter – UC Berkeley / Lectures

Data Analysis in Python

  1. Data Analysis in Python – Tutorial

Theoretical Courses / Design & Visualization

  1. Data Visualization – University of Washington / Slides & Resources
  2. Rice University’s Data Viz class – Rice University / Slides

Practical Visualization Resources

  1. D3 Library / Scott Murray – Blog / Tutorials

Python (Learning)

  1. Python – Class / Google
  2. Think Python – Digital

Python (Libraries)

  1. Command Line Install Script – For Scientific Python Packages.
  2. numpy Tutorial / Stanford CS231N – This course expects that many of you will have some experience with Python and numpy; for the rest of you, this section will serve as a quick crash course on both the Python programming language and its use for scientific computing.
  3. Pandas Cookbook – Data structure library.

Data Structures & Analysis Packages

  1. Pandas Tutorials – Flexible and powerful data analysis / manipulation library with labeled data structures, statistical functions, etc.

Machine Learning Packages

  1. scikit-learn – Tools for Data Mining & Analysis.

Networks Packages

  1. networkx – Network Modeling & Viz

Statistical Packages

  1. PyMC – Bayesian Inference & Markov Chain Monte Carlo sampling toolkit.
  2. Statsmodels – Python module that allows users to explore data, estimate statistical models, and perform statistical tests.
  3. PyMVPA – Multivariate Pattern Analysis in Python.

Natural Language Processing & Understanding

  1. NLTK – Natural Language Toolkit.
  2. Gensim – Python library for topic modeling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

Data APIs

  1. twython – Python wrapper for the Twitter API.

Visualization Packages

  1. matplotlib – Well-integrated with analysis and data manipulation packages like numpy and pandas
  2. Seaborn – A high-level statistical visualization package built on top of matplotlib

IPython Data Science Notebooks

  1. Data Science in IPython Notebooks – Linear Regression, Logistic Regression, Random Forests, K-Means Clustering.

Capstone Project

  1. Capstone Analysis of Your Own Design – Quora’s Idea Compendium.
  2. Analyze your LinkedIn Network – Generate & Download Adjacency Matrix.
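
For readers unfamiliar with the term, an adjacency matrix records which pairs of people in a network are connected. The sketch below builds one from a list of connections in plain Python; the names and the `adjacency_matrix` helper are hypothetical, and the format of a real network export is not assumed here.

```python
def adjacency_matrix(edges):
    """Build a symmetric 0/1 matrix from undirected (a, b) connections."""
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # undirected: mirror the entry
    return nodes, matrix


# Hypothetical connections between four people.
edges = [("Ana", "Bo"), ("Bo", "Cy"), ("Ana", "Cy"), ("Cy", "Di")]
nodes, matrix = adjacency_matrix(edges)

# A row sum is that person's number of connections (their degree).
degree = {name: sum(row) for name, row in zip(nodes, matrix)}

for name, row in zip(nodes, matrix):
    print(name, row)
```

Row sums give a first, simple measure of who is most connected; a package such as networkx, listed above, provides richer analyses.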

Resources

Read

  1. DataTau – The “Hacker News” of Data Science.
  2. Wikipedia – The free encyclopedia.
  3. /r/MachineLearning – Machine learning subreddit.

Watch & Listen

  1. The Talking Machines – Podcast about Machine Learning
  2. What Data Science Is / Hilary Mason

Online Courses

  1. Metacademy – Search for a concept you want to learn.
  2. Coursera – Online university courses.
  3. Wolfram Alpha – The smart number and info cruncher.
  4. Khan Academy – High quality, free learning videos.

Lectures & Learning

Intro to Data Science

  1. https://www.youtube.com/watch?v=rpwZ_i-9U0o by Michael Manoochehri (Materials) – DataEDGE 2013
  2. https://www.youtube.com/watch?v=Zdh3p4EKLeQ by Buck Woody – The DRIVE/conference 2013

Machine Learning

  1. CS 229: Machine Learning by Dr. Andrew Ng (Materials) – Stanford Lecture series 2008
  2. Learning from Data by Dr. Yaser S. Abu-Mostafa (Materials) – Caltech
  3. Machine Learning with Scikit-Learn (I) by Jake VanderPlas (Materials) – PyCon 2015

Statistical Methods

  1. Statistical Thinking for Data Science by Chris Fonnesbeck – SciPy 2015

Intro on Hadoop Ecosystem

  1. Intro to Hadoop by Bill Graham (Materials) – Part of the Berkeley iSchool course Info290: Analyzing Big Data With Twitter, 2012

HDF5

  1. HDF5 is Eating the World by Andrew Collette – SciPy 2015
  2. Introduction to HDF5 by Quincey Koziol – 2014

Python-based analysis

  1. My Data Journey with Python by Wes McKinney (Materials) – SciPy 2015
  2. Hands-on Data Analysis with Python by Sarah Guido – PyCon 2015
  3. Analyzing and Manipulating Data with Pandas by Jonathan Rocher (Materials) – SciPy 2015 Tutorial
  4. Machine Learning with Scikit Learn / Part 2 by Andreas Mueller & Kyle Kastner (Materials) – SciPy 2015 Tutorial

R-based analysis

  1. Introduction to Data Science with R – Data Analysis by David Langer (Materials) – 2014

Data Science Blogs

  1. A Blog From a Human-engineer-being
  2. Adit Deshpande
  3. Advanced Analytics & R
  4. Adventures in Data Land
  5. Ahmed BESBES
  6. Ahmed El Deeb
  7. Airbnb Data blog
  8. Alex Perrier
  9. Algobeans | Data Analytics Tutorials & Experiments for the Layman
  10. Amazon AWS AI Blog
  11. Amit Chaudhary
  12. Analytics and Visualization in Big Data @ Sicara
  13. Analytics Vidhya
  14. Andreas Müller
  15. Andrej Karpathy blog
  16. Andrew Brooks
  17. Andrey Kurenkov
  18. Andrey Vasnetsov
  19. Anton Lebedevich’s Blog
  20. Arthur Juliani
  21. Audun M. Øygard
  22. Avi Singh
  23. Beautiful Data
  24. Beckerfuffle
  25. Becoming A Data Scientist
  26. Ben Frederickson
  27. Berkeley AI Research
  28. Big-Ish Data
  29. Blog on neural networks
  30. Blogistic Regression
  31. blogR | R tips and tricks from a scientist
  32. Brain of mat kelcey
  33. Brilliantly wrong thoughts on science and programming
  34. Bugra Akyildiz
  35. Carl Shan
  36. Casual Inference
  37. Chris Stucchio
  38. Christophe Bourguignat
  39. Christopher Nguyen
  40. cnvrg.io blog
  41. colah’s blog
  42. Daniel Forsyth
  43. Daniel Homola
  44. Data Blogger
  45. Data Double Confirm
  46. Data Miners Blog
  47. Data Mining Research
  48. Data Mining: Text Mining, Visualization and Social Media
  49. Data School
  50. Data Science @ Facebook
  51. Data Science 101
  52. Data Science Dojo Blog
  53. Data Science Insights
  54. Data Science Notebook
  55. Data Science Tutorials
  56. Dataaspirant
  57. Dataclysm
  58. DataGenetics
  59. Dataiku
  60. DataKind
  61. Datanice
  62. Dataquest Blog
  63. DataRobot
  64. Datascienceblog.net
  65. Datascope
  66. DatasFrame
  67. David Mimno
  68. David Robinson
  69. Dayne Batten
  70. Deep and Shallow
  71. Deepdish
  72. Delip Rao
  73. DENNY’S BLOG
  74. Dimensionless
  75. Distill
  76. District Data Labs
  77. Diving into data
  78. Domino Data Lab’s blog
  79. Dr. Randal S. Olson
  80. Drew Conway
  81. Dustin Tran
  82. Eder Santana
  83. Edwin Chen
  84. EFavDB
  85. Eigenfoo
  86. Emilio Ferrara, Ph.D.
  87. Entrepreneurial Geekiness
  88. Eric Jonas
  89. Eric Siegel
  90. Erik Bern
  91. ERIN SHELLMAN
  92. Ethan Rosenthal
  93. Eugenio Culurciello
  94. Fabian Pedregosa
  95. Fast Forward Labs
  96. FlowingData
  97. Full Stack ML
  98. Garbled Notes
  99. Grate News Everyone
  100. Greg Reda
  101. i am trask
  102. I Quant NY
  103. I’m a bandit
  104. inFERENCe
  105. Insight Data Science
  106. Ira Korshunova
  107. Jason Toy
  108. Java Machine Learning and DeepLearning
  109. jbencook
  110. Jesse Steinweg-Woods
  111. John Myles White
  112. Jonas Degrave
  113. Jovian
  114. Joy Of Data
  115. Julia Evans
  116. jWork.ORG.
  117. Kavita Ganesan’s NLP and Text Mining Blog
  118. KDnuggets
  119. Keeping Up With The Latest Techniques
  120. Kenny Bastani
  121. Kevin Davenport
  122. kevin frans
  123. korbonits | Math ∩ Data
  124. Large Scale Machine Learning
  125. LATERAL BLOG
  126. Lazy Programmer
  127. Learn Analytics Here
  128. LearnDataSci
  129. Learning With Data
  130. Life, Language, Learning
  131. Locke Data
  132. Louis Dorard
  133. M.E.Driscoll
  134. Machine Learning
  135. Machine Learning (Theory)
  136. Machine Learning Mastery
  137. Machine Learning, etc
  138. Machine Learning, Maths and Physics
  139. Machined Learnings
  140. MAPPING BABEL
  141. MAPR Blog
  142. MAREK REI
  143. MARGINALLY INTERESTING
  144. Mark White
  145. Math ∩ Programming
  146. Matthew Rocklin
  147. Mic Farris
  148. Mike Tyka
  149. Mirror Image
  150. Mitch Crowe
  151. MLWave
  152. MLWhiz
  153. Models are illuminating and wrong
  154. Moody Rd
  155. Natural language processing blog
  156. Neil Lawrence
  157. Neptune Blog: in-depth articles for machine learning practitioners
  158. Nikolai Janakiev
  159. no free hunch
  160. Nuit Blanche
  161. Number 2147483647
  162. On Machine Intelligence
  163. Opiate for the masses | Data is our religion.
  164. p-value.info
  165. Pete Warden’s blog
  166. Peter Laurinec – Time series data mining in R
  167. Plotly Blog
  168. Probably Overthinking It
  169. Prooffreader.com
  170. Publishable Stuff
  171. PyImageSearch
  172. Pythonic Perambulations
  173. ℚuantitative √ourney
  174. quintuitive
  175. R and Data Mining
  176. R-bloggers
  177. R2RT
  178. Ramiro Gómez
  179. Randy Zwitch
  180. RaRe Technologies
  181. Reinforcement Learning For Fun
  182. Revolutions
  183. Rinu Boney
  184. Robert Chang
  185. Rocket-Powered Data Science
  186. Sachin Joglekar’s blog
  187. samim
  188. Sebastian Raschka
  189. Sebastian Ruder
  190. Sebastian’s slow blog
  191. Simply Statistics
  192. Springboard Blog
  193. Statistical Modeling, Causal Inference, and Social Science
  194. Stitch Fix Tech Blog
  195. Stochastic R&D Notes
  196. StreamHacker
  197. Subconscious Musings
  198. TechnoCalifornia
  199. TEXT ANALYSIS BLOG | AYLIEN
  200. The Angry Statistician
  201. The Clever Machine
  202. The Data Camp Blog
  203. The Data Incubator
  204. The Data Science Lab
  205. The Data Science Swiss Army Knife
  206. THE ETZ-FILES
  207. The Science of Data
  208. The Shape of Data
  209. The unofficial Google data science Blog
  210. Tombone’s Computer Vision Blog
  211. Tommy Blanchard
  212. Towards Data Science
  213. Trevor Stephens
  214. Trey Causey
  215. UW Data Science Blog
  216. Victor Zhou
  217. Wellecks
  218. Wes McKinney
  219. While My MCMC Gently Samples
  220. WildML
  221. Will do stuff for stuff
  222. Will wolf
  223. WILL’S NOISE
  224. William Lyon
  225. Win-Vector Blog
  226. Yanir Seroussi
  227. Zac Stewart
  228. 大トロ