Data Science for Beginners 4 Course in 1 Python Programming, Data Analysis, Machine Learning Free Book PDF Full 2022
by Daoued
Data Science for Beginners: 4 Books in 1 (Python Programming, Data Analysis, Machine Learning).
A Complete Overview to Master 2022.
Python for data science: the basics of the language
This article is the first in a series called "Introduction to Data Science with Python". In this series we will tackle several facets of data science (programming, data processing, some statistics, machine learning and deep learning).
The main aim of the series is to present the different Python libraries for data science.
So why Python for data science?
In recent years Python has become one of the most popular programming languages, and it is one of the most widely used for scientific computing tasks such as analyzing and visualizing large data sets. That makes it a natural choice for data science.
Moreover, the language is quite easy to pick up, which makes it well suited to beginners. If you want to know more about the story behind the language, I leave you the Wikipedia link here.
What you need before you start
In this article the Python version we will use is Python 3.
If you are a complete beginner in programming, we recommend installing only Python 3.
You will find what you need here.
However, if you already have programming experience, we recommend installing the Anaconda distribution by following this tutorial.
Or you can start without installation with Google Colab.
Programming in python: The Basics
1. Hello World
You can’t start learning the basics of a language without going through the famous Hello World.
In Python it is really simple. Open a terminal and enter the following command to start the Python interpreter:
python
Then enter the following instruction:
print("Hello World")
You will see “Hello World” in your terminal.
This line of code is an instruction: we ask the Python interpreter to execute it. A Python program is therefore a logical sequence of instructions.
A core Python concept is indentation. Indentation is the way to define a block of instructions, that is, a group of lines of code that are executed together for a specific purpose.
Most programming languages, such as JavaScript, C++ and Java, use {braces} to define a block of code. One of Python's distinguishing features is that it uses indentation to delimit blocks of code. Spaces are used for indentation in Python. All instructions indented by the same distance belong to the same block of code. If a block needs to be more deeply nested, it is simply indented further to the right.
Let’s follow this example:
x = "Beginner"
if x == "Beginner":
    print("Hello World")
    print("Beginner")
In this example we have a block of 2 lines of code that call the print() function. Because they are indented to the right, we know that they belong to the instruction block that begins with "if x == "Beginner":". The standard in Python is to use 4 spaces to indent an instruction block.
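To illustrate the point about deeper nesting, here is a minimal sketch. The variable practice_hours and the messages are made up for this example and are not part of the original listing.

```python
x = "Beginner"
practice_hours = 2  # hypothetical value, only for this example

message = "Welcome back"
if x == "Beginner":
    # 4 spaces: these lines belong to the "if x" block
    message = "Hello World"
    if practice_hours < 10:
        # 8 spaces: this line belongs to the nested block
        message = message + ", keep practicing"
# No indentation: this line runs whatever the value of x
print(message)
```

Changing x to another value skips both indented blocks, and only the last line runs.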
2. Variables and Types
The variables
In programming, a variable is a name that refers to a value, and that value can change depending on the conditions or information passed to the program. In Python we declare a variable as follows:
x = 1
That is, we assign 1 to the variable x.
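As a small sketch of what "can change" means in practice, the same name can be assigned a new value later in the program:

```python
x = 1
print(x)    # 1
x = x + 1   # read the current value, add 1, store the result back in x
print(x)    # 2
x = x * 10  # the old value is replaced again
print(x)    # 20
```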
The Types:
In Python, when you create a variable, its type (number, text, ...) is determined automatically from the value you assign. The basic Python types are:
Integer: whole numbers, positive or negative, like 2, 3, 198, etc.
Float: "float" means "floating-point number". Put more simply, these are decimal numbers like 3.2, 10.0, 17.9818.
Boolean: Boolean values (True or False).
String: a sequence of zero or more characters enclosed in single, double or triple quotes.
Python also has complex numbers, which we will not cover in this article.
We can thus see that these basic types all come down to numerical values and text!
The following example shows the Python syntax for declaring variables of these different types:
# Integer
x = 10
X = 23  # names are case-sensitive: X is a different variable from x
# Float
y = 9.0
pi = 3.14
# Boolean
boolVal1 = True
boolVal2 = False
# String
unString = "This is a character string!"
You can use the print() function to display the different values of these variables and type() to display their type.
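For instance, a short sketch that reuses some of the variable names from the listing above:

```python
x = 10
y = 9.0
boolVal1 = True
unString = "This is a character string!"

# print() accepts several arguments and separates them with a space
print(x, type(x))                # 10 <class 'int'>
print(y, type(y))                # 9.0 <class 'float'>
print(boolVal1, type(boolVal1))  # True <class 'bool'>
print(unString, type(unString))  # This is a character string! <class 'str'>
```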
In Python, other types of variables obviously exist. They have a more complex structure than those presented above.
Analysis using machine learning:
Once the data is cleaned up, stored in a DataFrame and ready to be analyzed, you can apply machine learning tools and present the results through visualizations.
The two reference libraries for this are scikit-learn and matplotlib. Both are extremely comprehensive and have many features for python in data science.
Let's continue our IoT example. Our connected object is a device used to pay for purchases at points of sale. As soon as a purchase is made, it sends about thirty parameters related to this purchase (position, amount, etc.) to a database. The SQL database already contains 10,000 transactions.
The goal here is to work on selected variables in order to group the transactions into 6 classes.
So we have retrieved the data associated with the 10,000 transactions, and we focus on 12 quantitative variables.
We use scikit-learn and the k-means method.
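To give an idea of what the k-means method does, here is a minimal pure-Python sketch of the basic algorithm: assign each point to its nearest center, then move each center to the mean of its points, and repeat. The function names and the six sample points are invented for this illustration. scikit-learn's own implementation is more elaborate (for instance, it chooses the starting centers more carefully), so treat this as a teaching aid rather than a description of the library.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def simple_kmeans(points, k, iterations=20, seed=0):
    """Basic k-means on a list of tuples; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point gets the index of its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Two well-separated groups of made-up 2D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = simple_kmeans(points, k=2)
print(labels)   # the first three points share one label, the last three the other
print(centers)
```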
We stored the data in a DataFrame called frameTransac, of size 10,000 × 12. We first use the k-means method to obtain the 6 classes:
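from pandas import DataFrame
from sklearn.cluster import KMeans
# nom_des_variables: list of the 12 quantitative column names
model = KMeans(n_clusters=6)
# "class" is a reserved word in Python, so the labels are stored in "classes"
classes = model.fit_predict(frameTransac[nom_des_variables])
class_center = DataFrame(model.cluster_centers_, columns=nom_des_variables)
class_center.to_csv("centers.csv")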
Exporting the class centers with to_csv gives a CSV file that can be reused elsewhere. By default it is written to the working directory, but the path can be changed. Separators can also be changed to produce other formats. For example, if you want a «French» CSV (fields separated by semicolons, with commas instead of dots as the decimal mark), you can use the following command:
class_center.to_csv("centers.csv", sep=";", decimal=",")
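The whole chain can be tried without access to the transaction database. The sketch below is only an illustration under stated assumptions: the data are random numbers standing in for the real transactions, the column names var_1 to var_12 are invented, and n_init and random_state are set only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the transactions table (made-up values, fewer rows).
rng = np.random.default_rng(0)
nom_des_variables = [f"var_{i}" for i in range(1, 13)]  # hypothetical names
frameTransac = pd.DataFrame(rng.normal(size=(600, 12)), columns=nom_des_variables)

# Group the rows into 6 classes.
model = KMeans(n_clusters=6, n_init=10, random_state=0)
classes = model.fit_predict(frameTransac[nom_des_variables])  # one label per row

# One row per class center, one column per variable.
class_center = pd.DataFrame(model.cluster_centers_, columns=nom_des_variables)
class_center.index.name = "class"

# Without a path, to_csv returns the CSV text instead of writing a file.
french_csv = class_center.to_csv(sep=";", decimal=",")
print(french_csv.splitlines()[0])  # header line, fields separated by ";"
```

Passing a file name as the first argument of to_csv, as in the command above, writes the same text to disk.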
The entire process can be automated so that it keeps up with changes to the original database. We could also display graphs representing the transaction classes.
Conclusions:
Python is a language of choice for data processing and analysis. It is a serious competitor to R and allows data analysis to be integrated into IoT applications. Feel free to share your experiences with me by commenting on this article.