Getting Data
Every module in PyCaret works directly with a pandas DataFrame, irrespective of how the data was loaded into the environment. The example below loads a CSV file into a notebook using pandas' native functionality.
Load data using pandas
# Importing data using pandas
import pandas as pd
data = pd.read_csv('c:/path_to_data/file.csv')
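Because PyCaret only needs a DataFrame, where the data comes from does not matter. As a minimal sketch, the snippet below builds the same kind of object two ways: from CSV text held in memory (standing in for a file on disk) and from a Python dictionary. The column names and values are made up for illustration and do not come from any PyCaret dataset.

```python
import io

import pandas as pd

# CSV text held in memory; stands in for a file on disk (values are made up).
csv_text = "age,income\n25,40000\n32,52000\n47,61000\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# The same records built directly from a dictionary.
from_dict = pd.DataFrame({"age": [25, 32, 47], "income": [40000, 52000, 61000]})

# Compare the two frames cell by cell; both routes hold the same records.
same_values = from_csv.values.tolist() == from_dict.values.tolist()
print(from_csv.shape, same_values)
```

Either frame could then be passed to a PyCaret module in place of `data`.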
Loading data from PyCaret’s repository
PyCaret also hosts a repository of open-source datasets that are used throughout the documentation for demonstration purposes. They are hosted on PyCaret's GitHub and can be loaded directly using the pycaret.datasets module. See the example below:
Code
# Loading data from pycaret
from pycaret.datasets import get_data
data = get_data('juice')
PyCaret’s Data Repository
Dataset | Data Types | Default Task | Target Variable | # Instances | # Attributes |
anomaly | Multivariate | Anomaly Detection | None | 1000 | 10 |
france | Multivariate | Association Rule Mining | InvoiceNo, Description | 8557 | 8 |
germany | Multivariate | Association Rule Mining | InvoiceNo, Description | 9495 | 8 |
bank | Multivariate | Classification (Binary) | deposit | 45211 | 17 |
blood | Multivariate | Classification (Binary) | Class | 748 | 5 |
cancer | Multivariate | Classification (Binary) | Class | 683 | 10 |
credit | Multivariate | Classification (Binary) | default | 24000 | 24 |
diabetes | Multivariate | Classification (Binary) | Class variable | 768 | 9 |
electrical_grid | Multivariate | Classification (Binary) | stabf | 10000 | 14 |
employee | Multivariate | Classification (Binary) | left | 14999 | 10 |
heart | Multivariate | Classification (Binary) | DEATH | 200 | 16 |
heart_disease | Multivariate | Classification (Binary) | Disease | 270 | 14 |
hepatitis | Multivariate | Classification (Binary) | Class | 154 | 32 |
income | Multivariate | Classification (Binary) | income >50K | 32561 | 14 |
juice | Multivariate | Classification (Binary) | Purchase | 1070 | 15 |
nba | Multivariate | Classification (Binary) | TARGET_5Yrs | 1340 | 21 |
wine | Multivariate | Classification (Binary) | type | 6498 | 13 |
telescope | Multivariate | Classification (Binary) | Class | 19020 | 11 |
glass | Multivariate | Classification (Multiclass) | Type | 214 | 10 |
iris | Multivariate | Classification (Multiclass) | species | 150 | 5 |
poker | Multivariate | Classification (Multiclass) | CLASS | 100000 | 11 |
questions | Multivariate | Classification (Multiclass) | Next_Question | 499 | 4 |
satellite | Multivariate | Classification (Multiclass) | Class | 6435 | 37 |
asia_gdp | Multivariate | Clustering | None | 40 | 11 |
elections | Multivariate | Clustering | None | 3195 | 54 |
| Multivariate | Clustering | None | 7050 | 12 |
ipl | Multivariate | Clustering | None | 153 | 25 |
jewellery | Multivariate | Clustering | None | 505 | 4 |
mice | Multivariate | Clustering | None | 1080 | 82 |
migration | Multivariate | Clustering | None | 233 | 12 |
perfume | Multivariate | Clustering | None | 20 | 29 |
pokemon | Multivariate | Clustering | None | 800 | 13 |
population | Multivariate | Clustering | None | 255 | 56 |
public_health | Multivariate | Clustering | None | 224 | 21 |
seeds | Multivariate | Clustering | None | 210 | 7 |
wholesale | Multivariate | Clustering | None | 440 | 8 |
tweets | Text | NLP | tweet | 8594 | 2 |
amazon | Text | NLP / Classification | reviewText | 20000 | 2 |
kiva | Text | NLP / Classification | en | 6818 | 7 |
spx | Text | NLP / Regression | text | 874 | 4 |
wikipedia | Text | NLP / Classification | Text | 500 | 3 |
automobile | Multivariate | Regression | price | 202 | 26 |
bike | Multivariate | Regression | cnt | 17379 | 15 |
boston | Multivariate | Regression | medv | 506 | 14 |
concrete | Multivariate | Regression | strength | 1030 | 9 |
diamond | Multivariate | Regression | Price | 6000 | 8 |
energy | Multivariate | Regression | Heating Load / Cooling Load | 768 | 10 |
forest | Multivariate | Regression | area | 517 | 13 |
gold | Multivariate | Regression | Gold_T+22 | 2558 | 121 |
house | Multivariate | Regression | SalePrice | 1461 | 81 |
insurance | Multivariate | Regression | charges | 1338 | 7 |
parkinsons | Multivariate | Regression | PPE | 5875 | 22 |
traffic | Multivariate | Regression | traffic_volume | 48204 | 8 |
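The Target Variable column in the table names the column to predict for each dataset. As a rough sketch of how that information might be used, the snippet below separates the target from the features. `split_target` and `load_juice` are hypothetical helpers, not part of PyCaret; `load_juice` assumes pycaret is installed and that the dataset can be downloaded. The small stand-in frame uses made-up values so the sketch runs offline.

```python
import pandas as pd


def split_target(data: pd.DataFrame, target: str):
    """Hypothetical helper: split a frame into (features, target column)."""
    if target not in data.columns:
        raise KeyError(f"target column {target!r} not found")
    return data.drop(columns=[target]), data[target]


def load_juice() -> pd.DataFrame:
    """Fetch the 'juice' dataset; assumes pycaret is installed and the data is reachable."""
    # Imported lazily so the rest of the sketch runs without pycaret.
    from pycaret.datasets import get_data
    return get_data('juice')


# Stand-in frame with made-up values; 'Purchase' is the target listed for 'juice'.
sample = pd.DataFrame({"Purchase": ["CH", "MM", "CH"], "PriceCH": [1.75, 1.75, 1.86]})
features, target = split_target(sample, "Purchase")
print(list(features.columns), target.tolist())
```

With pycaret available, `split_target(load_juice(), 'Purchase')` would apply the same split to the real dataset.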