Exploratory data analysis for data science: A practical approach using biological data and Python

Price 3500 | USD $105

Live classes start on 8 Oct 2021.

Attend 2 demo sessions free and pay only after you are satisfied.

Hurry! Few seats available

About Course

It is well known that in data analysis, rubbish in means rubbish out. Exploratory data analysis (EDA) examines the data with descriptive and inferential statistics before any machine learning algorithm is applied. EDA identifies data patterns, detects outliers, removes null values, balances the data and finds interesting relations among the variables. In data analysis, this is where most of the time is spent. Most available explanations of EDA in Python are based on non-biological data, which can make the concepts harder to understand and implement. In this course we learn to do EDA on biological data, from genes to proteins and diseases, and build machine learning models. The demo and practice projects will give you the confidence to develop your own machine learning models in the future. The course involves a lot of coding in Python. Do not worry, we have you covered. We first learn Python in a fun way by using biological data followed by statis
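
As a rough illustration of the EDA steps named above (null values, outliers, class balance, relations among variables), here is a minimal sketch with pandas. The table, its column names and its values are made up for illustration and are not course data; median filling and the 1.5 × IQR rule are just one common choice for each step.

```python
import pandas as pd

# Hypothetical toy measurements; column names and values are invented for this sketch.
df = pd.DataFrame({
    "glucose": [85.0, 90.0, None, 110.0, 95.0, 300.0],
    "bmi": [22.1, 24.3, 23.0, 30.2, None, 28.4],
    "diabetic": [0, 0, 0, 1, 0, 1],
})

# 1. Count missing values per column.
missing_counts = df.isna().sum()

# 2. Fill missing values with the column median (one option among several).
filled = df.fillna(df.median(numeric_only=True))

# 3. Flag outliers in one column with the 1.5 * IQR rule.
q1, q3 = filled["glucose"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (filled["glucose"] < q1 - 1.5 * iqr) | (filled["glucose"] > q3 + 1.5 * iqr)
outliers = filled[is_outlier]

# 4. Check how balanced the classes are.
class_counts = filled["diabetic"].value_counts()

# 5. Look at pairwise linear relations among the variables.
corr = filled.corr()

print(missing_counts, outliers, class_counts, corr, sep="\n\n")
```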

Course highlights


  1. Python revision Part I with biological data · Install Anaconda · Google Colab introduction · Data types · Lists, sets, dictionaries and tuples · Loops: for, while · Functions
  2. Python revision Part II with biological data · How to read and write files with Python · NumPy · pandas
  3. Basic Statistics Part I (theory) · Types of data · Descriptive and inferential statistics · Normal distribution · Sampling and the central limit theorem
  4. Basic Statistics Part II (theory) · p-value and significance · Types of statistical tests: parametric and non-parametric · Which test to use when? · What are features?
  5. Practical: Introduction to the datasets used in this course: diabetes, breast cancer, solubility, microbial peptide, ChEMBL and Iris datasets
  1. EDA with Python and biological data · Application of the Seaborn and Matplotlib libraries · Univariate, bivariate and multivariate analysis
  2. Basic Statistics Part III · Correlation: theory and practical · Linear regression theory (practical: EDA and linear regression with the Delaney solubility dataset) · Multiple linear regression
  3. Logistic regression theory and ROC curve theory · EDA with the diabetes dataset in a Jupyter Notebook (Seaborn and Matplotlib application, and generation of a logistic regression model) · EDA with the breast cancer dataset and logistic regression
  4. Feature scaling and engineering · Chemical SMILES feature generation and protein feature generation (P-feature) · Data scaling and normalization: theory and practical · Dimensionality reduction theory (PCA theory with a practical example)
  1. ROC and AUC · Lazy predictors (practical using the Delaney solubility and diabetes datasets) · ROC and AUC curves with a practical example · Assignment: generate the ROC curve and AUC on the diabetes dataset
  2. Practice projects · Label binarization and one-hot encoding · QSAR dataset (ChEMBL dataset): EDA and regression (RDKit) · Demo project with the microbial peptide dataset · Assignment: generation of other features
  1. EDA on genome data: prediction of genetic disease. The dataset will be provided. Apply all the EDA concepts learned so far: look for missing values and replace them, binarize the dependent variable, check for data imbalance and apply logistic regression to classify the data.
  2. Complete cheminformatics project with a novel disease target. Aim: prediction of log IC50 values. Download the desired data from the ChEMBL database. Do EDA, feature scaling and engineering, and apply a regression model.
  3. EDA on protein data and protein feature generation. Aim: to predict the temperature optimum of proteins for biocatalysis. The dataset will be provided. Generate features from the sequence information in FASTA format using a feature-generating Python package. Perform EDA and build a regression model to predict the temperature optimum.
  4. BONUS: How to get practice datasets and make a GitHub profile? What are the available Python packages to automate the EDA process?
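
The workflow described for the genome data project (find and replace missing values, binarize the dependent variable, check imbalance, fit logistic regression) might look roughly like the sketch below. It runs on synthetic data, because the course dataset is not shown here: the column names `feature_a`, `feature_b` and `label` are hypothetical, and median filling and `class_weight="balanced"` are assumptions about how one could handle missing values and imbalance, not the course's stated method.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; all names and values are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "feature_a": rng.normal(size=n),
    "feature_b": rng.normal(size=n),
})
score = df["feature_a"] + 0.5 * df["feature_b"]
df["label"] = np.where(score > 0.8, "disease", "healthy")

# Knock out every 25th value so there is something to clean.
df.loc[df.index[::25], "feature_b"] = np.nan

# Look for missing values, then replace them with the column median.
n_missing = int(df["feature_b"].isna().sum())
df["feature_b"] = df["feature_b"].fillna(df["feature_b"].median())

# Binarize the dependent variable: 1 = disease, 0 = healthy.
df["target"] = (df["label"] == "disease").astype(int)

# Look at data imbalance as the share of each class.
class_share = df["target"].value_counts(normalize=True)

# Stratified split keeps the class ratio similar in train and test sets.
X = df[["feature_a", "feature_b"]]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency.
model = LogisticRegression(class_weight="balanced")
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"missing values replaced: {n_missing}")
print(class_share)
print(f"test accuracy: {accuracy:.2f}")
```

On imbalanced data, accuracy alone can be misleading, so the ROC and AUC topics in the outline are a natural follow-up to this step.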

Features available in our live classes

“who knows, does it live”

Certificate of completion

Lifetime access to study material

Doubt solving at any time

Job opportunities


Top instructor


5 / 5
Top reviews


What students say about us

ReadMyCourse is a one-stop destination for understanding courses related to research and new technology. With highly qualified instructors and 24x7 doubt support, difficult topics are presented in a simplified manner.

Dr. Rimanpreet Kaur | Bioanalyst
Machine learning course

I had a great learning experience with ReadMyCourse. The concepts were clearly explained by the teachers, and the way the course was simplified and presented made it more fun and easy. I got a job offer as a Bioanalyst.

Anup Singh | IISC
NGS course

It is quite understandable for me. My background in programming is a plus point, but apart from that it is also quite easy to understand with the live classes. The course was more practical, with live coding examples.

Sakina Vakhariya | AstraZeneca
Python for Bioinformatics course

Our value lies in our instructors

Anand Kumar
Python for Bioinformatics, Vaccine designing

Anand is the Co-Founder and CTO of ReadMyCourse. He has 5 years of experience in computational biology, machine learning and vaccine designing. He has co-authored various research papers in reputed journals and has advanced the careers of thousands of students.

Dr. Dibyabhaba Pradhan
NGS data analysis, Vaccine designing, Bioanalyst

Dr. Dibyabhaba Pradhan is a Post-doctoral Research Scientist. He holds a PhD in Bioinformatics and has more than 12 years of research and teaching experience in high-throughput NGS data analysis, computer-aided vaccine design, rational drug design and medical informatics. He has co-authored more than 53 research papers in international and national journals of repute.

Nirupma Singh
Python, R, Data Analysis, Machine Learning

Nirupma has 5 years of experience using the R and Python programming languages to handle and analyse biological data and to implement machine learning. She has a postgraduate degree in Microbiology, so she can understand and interpret biological data well and provide valuable insights. She enjoys interactive teaching with the learners and gives her best to it.

Sharon Priya Alexander
Drug designing, Bioinformatics tools

Sharon Priya Alexander has a Master's degree in Bioinformatics and was awarded the late Dr. P. Subramanyam IAS Gold Medal for being a university topper. He has co-authored various research papers in reputed journals and is currently pursuing a PhD in drug designing.


If you miss a class, you can attend it in any other session. You can also view the recorded sessions.

Yes, you can attend the demo sessions for free. You can attend the live demo classes before the final classes start.

Yes, you are eligible for a refund within the first three classes.

You will get the certificate after completing all the live sessions and study materials.
