Machine Learning with Spark™ and Python®: Essential Techniques for Predictive Analytics Full eBook

Introduction to Machine Learning:

1. The Magic Black Box Approach:

Spark’s MLlib component lets you do machine learning. The term comes from English; its official French translation is "apprentissage automatique". There are three ways to approach the concept: the magic black box, mathematics and intuition. The first presents only the effects of an AI application: only the end results are shown. This sometimes feeds the idea that machine learning can do anything, and will either save us from everything or, on the contrary, make us all perish. The subject is often treated as white magic that will give us access to the best of all worlds. At other times, it is described as pure witchcraft, ready to plunge us into the abyss of an ugly and evil universe. Although things are starting to change, the field remains poorly understood and therefore arouses many fantasies. Presenting only the outcomes of machine learning is not a good way to understand what it is, and this is where misunderstandings arise. One example is the claim that machine learning could soon replace the people who write computer programs.


Machine learning can be seen as the set of techniques that allow a machine to learn to perform a task without being explicitly programmed for it.

Arthur Samuel.

Definition taken from an article by Antoine Gaudelas

Machine learning, or statistical learning, is an area of artificial intelligence concerned with the design, analysis, development and implementation of methods that allow a computer to evolve through a systematic process, and thus to perform tasks that are difficult or impractical to carry out by more traditional algorithmic means.
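To make the idea of learning a task without explicitly programming it more concrete, here is a minimal sketch in plain Python. Instead of hard-coding the Celsius-to-Fahrenheit formula, the program estimates it from a handful of examples using ordinary least squares. The function names and the sample data are illustrative assumptions, not something taken from the book.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: y is approximated by slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training examples: the same temperatures in Celsius and Fahrenheit.
celsius = [-10.0, 0.0, 10.0, 20.0, 30.0]
fahrenheit = [14.0, 32.0, 50.0, 68.0, 86.0]

# "Learning": the conversion rule is estimated from the data, not written by hand.
slope, intercept = fit_line(celsius, fahrenheit)


def predict(c):
    """Apply the learned rule to a new Celsius value."""
    return slope * c + intercept


print(slope, intercept, predict(100.0))
```

The program never contains the conversion formula itself; it recovers the two coefficients from the examples and then applies them to a value it has not seen.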


Apache Spark is an open-source, distributed data processing system used for big data workloads. It uses in-memory caching and optimized query execution to run fast analytical queries on data of any size.

It provides development APIs in Java, Scala, Python and R, and supports code reuse across multiple workloads: batch processing, interactive queries, real-time analytics, machine learning and graph processing.

It is used by organizations in every sector, including FINRA, Yelp, Zillow, DataXu, Urban Institute and CrowdStrike. Apache Spark has become one of the most popular distributed big data processing frameworks, with 365,000 meetup members in 2017.

What are its advantages?

The benefits of Apache Spark are many and make it one of the most active projects in the Hadoop ecosystem. Here are a few:


Speed:

Through in-memory caching and optimized query execution, Spark can run fast analytical queries on data of any size.

User-friendliness for developers:

Apache Spark natively supports Java, Scala, R and Python, giving developers a choice of languages for building applications. These APIs make development easier because they hide the complexity of distributed processing behind simple, high-level operators that significantly reduce the amount of code needed.
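As an illustration of those high-level operators, the following sketch counts words with PySpark's RDD API. It assumes PySpark is installed and that Spark runs locally (`local[*]`) rather than on a cluster; the application name and the helper function names are hypothetical.

```python
import re


def tokenize(line):
    """Split a line of text into lowercase words."""
    return re.findall(r"[a-z0-9']+", line.lower())


def word_count_with_spark(lines):
    """Count words with Spark's RDD API, running locally on all available cores."""
    # Imported here so that tokenize() stays usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")            # assumption: local run, no cluster
        .appName("word-count-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        counts = (
            spark.sparkContext.parallelize(lines)  # distribute the input lines
            .flatMap(tokenize)                     # one record per word
            .map(lambda word: (word, 1))           # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b)       # sum the counts for each word
        )
        return dict(counts.collect())
    finally:
        spark.stop()
```

Calling `word_count_with_spark(["Spark and Python", "Python and MLlib"])` on a machine with PySpark installed returns a dictionary of word counts. The pipeline itself is four chained operators; partitioning, scheduling and shuffling are handled by Spark.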

Multiple workloads:

It offers the ability to execute multiple workloads, including interactive queries, real-time analytics, machine learning and graph processing. An application can seamlessly combine multiple workloads.

Machine Learning and Python:

Python has established itself in both the scientific and the industrial world.

Machine learning is no exception; quite the contrary.

The language's tremendous computational capabilities have allowed it to penetrate this sector, and many libraries have emerged:

Annoy, an extremely fast library for nearest-neighbor search

Caffe, a deep learning framework

Chainer, an intuitive framework for neural networks

neon, an extremely powerful deep learning framework

NuPIC, an AI platform implementing HTM learning algorithms

Shogun, a large-scale machine learning toolbox

TensorFlow, a neural network library with a high-level API

Torch, a high-performance framework for learning algorithms, with Python bindings

Theanets, a deep learning library

Most remarkably, they are generally all of high quality and used in professional environments.

However, Scikit-Learn is probably the most popular library available for this language.

It has a large number of features specialized in data analysis and data mining that make it a tool of choice for researchers and developers.
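As a rough idea of what working with Scikit-Learn looks like, here is a minimal sketch that trains a classifier on the Iris dataset bundled with the library and measures its accuracy on held-out data. The choice of dataset, model and split ratio is an assumption made for illustration, and the function name is hypothetical.

```python
def train_and_score(seed=0):
    """Train a classifier on the Iris dataset and return (model, test accuracy)."""
    # Imported inside the function so the sketch can be loaded without scikit-learn.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Load the feature matrix X and the class labels y.
    X, y = load_iris(return_X_y=True)

    # Hold out 25% of the samples to evaluate the model on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # Fit the model on the training split only.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # score() returns the mean accuracy on the given test data.
    return model, model.score(X_test, y_test)
```

Most Scikit-Learn estimators follow the same `fit` / `predict` / `score` pattern, so swapping `LogisticRegression` for another classifier usually changes a single line.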
