Getting Data


All modules in PyCaret work directly with a pandas DataFrame, irrespective of how it was loaded into the environment. The example below loads a CSV file into a notebook using native pandas functionality.

 

Load data using pandas


 

# Importing data using pandas
import pandas as pd

data = pd.read_csv('c:/path_to_data/file.csv')
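
Because PyCaret only needs a DataFrame, the source of the data should not matter. As a minimal sketch (the column names and values here are made up for illustration and do not come from any PyCaret dataset), the same table can be built from in-memory CSV text or from a Python dictionary:

```python
import io

import pandas as pd

# Hypothetical CSV content, standing in for a file on disk.
csv_text = "age,income,purchased\n25,40000,0\n41,72000,1\n33,58000,1\n"

# read_csv accepts a file-like object as well as a path.
from_csv = pd.read_csv(io.StringIO(csv_text))

# The same table built directly from a dictionary of columns.
from_dict = pd.DataFrame(
    {
        "age": [25, 41, 33],
        "income": [40000, 72000, 58000],
        "purchased": [0, 1, 1],
    }
)

# Both objects are ordinary pandas DataFrames with the same shape and columns.
print(type(from_csv).__name__, from_csv.shape)
print(type(from_dict).__name__, from_dict.shape)
```

Either object could then be used wherever the examples on this page use data.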

 

Loading data from PyCaret’s repository


PyCaret also hosts a repository of open source datasets that are used throughout the documentation for demonstration purposes. They are hosted on PyCaret’s GitHub and can be loaded directly using the pycaret.datasets module. See an example below:

Code
# Loading data from pycaret
from pycaret.datasets import get_data
data = get_data('juice') 

 


PyCaret’s Data Repository


| Dataset | Data Types | Default Task | Target Variable | # Instances | # Attributes |
|---|---|---|---|---|---|
| anomaly | Multivariate | Anomaly Detection | None | 1000 | 10 |
| france | Multivariate | Association Rule Mining | InvoiceNo, Description | 8557 | 8 |
| germany | Multivariate | Association Rule Mining | InvoiceNo, Description | 9495 | 8 |
| bank | Multivariate | Classification (Binary) | deposit | 45211 | 17 |
| blood | Multivariate | Classification (Binary) | Class | 748 | 5 |
| cancer | Multivariate | Classification (Binary) | Class | 683 | 10 |
| credit | Multivariate | Classification (Binary) | default | 24000 | 24 |
| diabetes | Multivariate | Classification (Binary) | Class variable | 768 | 9 |
| electrical_grid | Multivariate | Classification (Binary) | stabf | 10000 | 14 |
| employee | Multivariate | Classification (Binary) | left | 14999 | 10 |
| heart | Multivariate | Classification (Binary) | DEATH | 200 | 16 |
| heart_disease | Multivariate | Classification (Binary) | Disease | 270 | 14 |
| hepatitis | Multivariate | Classification (Binary) | Class | 154 | 32 |
| income | Multivariate | Classification (Binary) | income >50K | 32561 | 14 |
| juice | Multivariate | Classification (Binary) | Purchase | 1070 | 15 |
| nba | Multivariate | Classification (Binary) | TARGET_5Yrs | 1340 | 21 |
| wine | Multivariate | Classification (Binary) | type | 6498 | 13 |
| telescope | Multivariate | Classification (Binary) | Class | 19020 | 11 |
| glass | Multivariate | Classification (Multiclass) | Type | 214 | 10 |
| iris | Multivariate | Classification (Multiclass) | species | 150 | 5 |
| poker | Multivariate | Classification (Multiclass) | CLASS | 100000 | 11 |
| questions | Multivariate | Classification (Multiclass) | Next_Question | 499 | 4 |
| satellite | Multivariate | Classification (Multiclass) | Class | 6435 | 37 |
| asia_gdp | Multivariate | Clustering | None | 40 | 11 |
| elections | Multivariate | Clustering | None | 3195 | 54 |
| facebook | Multivariate | Clustering | None | 7050 | 12 |
| ipl | Multivariate | Clustering | None | 153 | 25 |
| jewellery | Multivariate | Clustering | None | 505 | 4 |
| mice | Multivariate | Clustering | None | 1080 | 82 |
| migration | Multivariate | Clustering | None | 233 | 12 |
| perfume | Multivariate | Clustering | None | 20 | 29 |
| pokemon | Multivariate | Clustering | None | 800 | 13 |
| population | Multivariate | Clustering | None | 255 | 56 |
| public_health | Multivariate | Clustering | None | 224 | 21 |
| seeds | Multivariate | Clustering | None | 210 | 7 |
| wholesale | Multivariate | Clustering | None | 440 | 8 |
| tweets | Text | NLP | tweet | 8594 | 2 |
| amazon | Text | NLP / Classification | reviewText | 20000 | 2 |
| kiva | Text | NLP / Classification | en | 6818 | 7 |
| spx | Text | NLP / Regression | text | 874 | 4 |
| wikipedia | Text | NLP / Classification | Text | 500 | 3 |
| automobile | Multivariate | Regression | price | 202 | 26 |
| bike | Multivariate | Regression | cnt | 17379 | 15 |
| boston | Multivariate | Regression | medv | 506 | 14 |
| concrete | Multivariate | Regression | strength | 1030 | 9 |
| diamond | Multivariate | Regression | Price | 6000 | 8 |
| energy | Multivariate | Regression | Heating Load / Cooling Load | 768 | 10 |
| forest | Multivariate | Regression | area | 517 | 13 |
| gold | Multivariate | Regression | Gold_T+22 | 2558 | 121 |
| house | Multivariate | Regression | SalePrice | 1461 | 81 |
| insurance | Multivariate | Regression | charges | 1338 | 7 |
| parkinsons | Multivariate | Regression | PPE | 5875 | 22 |
| traffic | Multivariate | Regression | traffic_volume | 48204 | 8 |
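
The Target Variable column names the column to predict for each dataset. As a minimal sketch of how that name might be used, the snippet below separates features from the target with plain pandas. It uses a tiny hand-made frame as a stand-in for the juice dataset: only the Purchase column name comes from the table, while the feature columns and all values are hypothetical.

```python
import pandas as pd

# Toy stand-in for get_data('juice'). Only the 'Purchase' column name is
# taken from the table; the other columns and all values are made up.
data = pd.DataFrame(
    {
        "feature_a": [1.75, 1.99, 1.86, 1.69],
        "feature_b": [0, 1, 0, 1],
        "Purchase": ["CH", "MM", "CH", "CH"],
    }
)

target = "Purchase"

# Separate the target column from the remaining feature columns.
X = data.drop(columns=[target])
y = data[target]

print(X.shape, y.name)
```

With a real dataset, data would come from get_data and target would be the name listed for that dataset.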
