
Mining of Massive Datasets


At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to ‘train’ a machine-learning engine of some sort. The principal topics covered are:

  • Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
  • Similarity search, including the key techniques of minhashing and locality-sensitive hashing (a small minhash sketch appears after this list).
  • Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
  • The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
  • Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
  • Algorithms for clustering very large, high-dimensional datasets.
  • Two key problems for Web applications: managing advertising and recommendation systems.
  • Algorithms for analyzing and mining the structure of very large graphs, especially social-network graphs.
  • Techniques for obtaining the important properties of a large dataset by dimensionality reduction, including singular-value decomposition and latent semantic indexing.
  • Machine-learning algorithms that can be applied to very large data, such as perceptrons, support-vector machines, and gradient descent.
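
The list above only names the techniques; as a small illustration of the similarity-search item, here is a minimal minhash sketch in Python. It is not code from the book: the helper names, the random affine hash functions, and the choice of 128 signature rows are illustrative assumptions. It shows the core idea that the fraction of signature positions on which two sets agree estimates their Jaccard similarity.

    import random

    def make_hash_funcs(n, prime=2_147_483_647, seed=0):
        """Build n random affine hash functions h(x) = (a*hash(x) + b) mod prime."""
        rng = random.Random(seed)
        params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(n)]
        # Default arguments freeze a and b for each lambda.
        return [lambda x, a=a, b=b: (a * hash(x) + b) % prime for a, b in params]

    def minhash_signature(item_set, hash_funcs):
        """One signature row per hash function: the minimum hash value over the set."""
        return [min(h(x) for x in item_set) for h in hash_funcs]

    def estimate_jaccard(sig_a, sig_b):
        """Fraction of agreeing rows approximates the Jaccard similarity of the sets."""
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

    if __name__ == "__main__":
        # Two toy "documents" represented as sets of words (shingles in practice).
        docs = {
            "d1": {"the", "cat", "sat", "on", "mat"},
            "d2": {"the", "cat", "sat", "on", "hat"},
        }
        hashes = make_hash_funcs(128)  # hash() is consistent within a single run
        sigs = {name: minhash_signature(s, hashes) for name, s in docs.items()}
        true_j = len(docs["d1"] & docs["d2"]) / len(docs["d1"] | docs["d2"])
        print("true Jaccard:     ", true_j)
        print("minhash estimate: ", estimate_jaccard(sigs["d1"], sigs["d2"]))

Locality-sensitive hashing, the companion technique named in the same bullet, then bands these signatures so that only pairs agreeing in at least one band are compared directly, avoiding the quadratic cost of comparing all pairs.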

Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman (PDF, PPT, Videos) – 12 chapters