Module

pycaret.datasets

Module to get datasets in pycaret

Functions

get_data(dataset: str = 'index', folder: Optional[str] = None, save_copy: bool = False, profile: bool = False, verbose: bool = True, address: Optional[str] = None)

Function to load sample datasets.

Order of read: (1) Tries to read the dataset from the local folder first. (2) Then tries to read it from the folder at the GitHub "address" (see below). (3) Then tries to read it from sktime (if installed). (4) Raises an error if the dataset is found in none of these.
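The read order above can be sketched as a simple fallback chain. This is only an illustration of the pattern, not pycaret's implementation: read_in_order, the three stand-in loaders, and the placeholder dataset names are all hypothetical, and plain dictionaries stand in for the local folder, GitHub, and sktime.

```python
from typing import Any, Callable, List, Tuple

Loader = Callable[[str], Any]


def read_in_order(name: str, sources: List[Tuple[str, Loader]]) -> Tuple[str, Any]:
    """Try each (label, loader) pair in order and return the first hit."""
    for label, loader in sources:
        try:
            return label, loader(name)
        except (FileNotFoundError, KeyError, ImportError):
            # This source does not have the dataset; fall through to the next.
            continue
    # Mirrors step (4): nothing matched, so raise an error.
    raise ImportError(f"Dataset {name!r} was not found in any source")


# Stand-in sources (hypothetical): dicts instead of disk, GitHub, and sktime.
local_files = {"set_a": "set_a from local folder"}
github_files = {"set_a": "set_a from GitHub", "set_b": "set_b from GitHub"}
sktime_sets = {"set_c": "set_c from sktime"}

sources = [
    ("local", lambda name: local_files[name]),
    ("github", lambda name: github_files[name]),
    ("sktime", lambda name: sktime_sets[name]),
]

# Record which source served each placeholder dataset.
found = {name: read_in_order(name, sources)[0] for name in ("set_a", "set_b", "set_c")}
print(found)
```

Because the local source is listed first, a dataset present both locally and remotely is served from the local copy.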

The list of available datasets on GitHub can be checked using (1) get_data('index') or (2) get_data('index', folder='time_series/seasonal') (see the available "folder" options below).

Example

    from pycaret.datasets import get_data
    all_datasets = get_data('index')
    juice = get_data('juice')

dataset: str, default = 'index'

Name of the dataset to load, as listed in the index. The default, 'index', returns the list of available datasets.

folder: Optional[str], default = None

The folder from which to get the data.
If None, the data is read from the "common" folder. Other options are:
    - time_series/seasonal
    - time_series/random_walk
    - time_series/white_noise
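As a minimal sketch of the folder option, the helper below lists the datasets in one of the folders named above. The name list_datasets and its up-front check are hypothetical conveniences, not part of pycaret; the check covers only the folders documented here and is stricter than get_data itself. pycaret is imported lazily so the helper can be defined without it installed.

```python
from typing import Optional

# Folder options named in this documentation (None means the "common" folder).
DOCUMENTED_FOLDERS = (
    "time_series/seasonal",
    "time_series/random_walk",
    "time_series/white_noise",
)


def list_datasets(folder: Optional[str] = None):
    """Return the dataset index for a folder (hypothetical helper)."""
    if folder is not None and folder not in DOCUMENTED_FOLDERS:
        # Fail early on typos instead of waiting for a failed download.
        raise ValueError(
            f"Unknown folder {folder!r}; expected None or one of {DOCUMENTED_FOLDERS}"
        )
    # Imported here so that defining the helper does not require pycaret.
    from pycaret.datasets import get_data

    # verbose=False suppresses the head-of-data display.
    return get_data("index", folder=folder, verbose=False)
```

With pycaret installed and a working connection, list_datasets("time_series/seasonal") should return the index for that folder as a pandas.DataFrame.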

save_copy: bool, default = False

When set to True, a copy is saved in the current working directory.

profile: bool, default = False

When set to True, an interactive EDA report is displayed.

verbose: bool, default = True

When set to False, the head of the data is not displayed.

address: Optional[str], default = None

Download URL of the dataset. Defaults to None, which fetches the dataset from
"https://raw.githubusercontent.com/pycaret/datasets/main/". Users who have
difficulty reaching GitHub can set this to an address of their own
(e.g. "https://gitee.com/IncubatorShokuhou/pycaret/raw/master/datasets/").
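A small sketch of passing a custom address follows. Both helper names are hypothetical. The trailing-slash normalization is an assumption based only on the fact that both example URLs above end in "/"; the library may or may not require it.

```python
from typing import Any, Optional


def with_trailing_slash(address: Optional[str]) -> Optional[str]:
    """Return the address ending in '/', or None if no address was given."""
    if address is None:
        return None
    return address if address.endswith("/") else address + "/"


def get_data_from(dataset: str, address: Optional[str] = None, **kwargs: Any):
    """Call get_data with a normalized mirror address (hypothetical wrapper)."""
    # Imported lazily so the helpers can be defined without pycaret installed.
    from pycaret.datasets import get_data

    return get_data(dataset, address=with_trailing_slash(address), **kwargs)


# The mirror URL from the text above, written here without its final slash.
mirror = "https://gitee.com/IncubatorShokuhou/pycaret/raw/master/datasets"
print(with_trailing_slash(mirror))
```

With pycaret installed, get_data_from('juice', address=mirror) would request the dataset from the mirror instead of the default GitHub location.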

Returns:

pandas.DataFrame

Warnings

  • Use of get_data requires an internet connection.

Raises

ImportError

(1) When trying to import time series datasets that require sktime,
but sktime has not been installed.
(2) When the data does not exist.
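Since both failure cases above surface as ImportError, a caller can handle them in one place. The sketch below is one possible pattern, not a recommendation from the library: load_or_none and its loader parameter are hypothetical, and the parameter exists only so the error handling can be exercised without pycaret or a network connection.

```python
from typing import Any, Callable, Optional


def load_or_none(dataset: str, loader: Optional[Callable[[str], Any]] = None) -> Any:
    """Return the loaded dataset, or None if loading raises ImportError."""
    try:
        if loader is None:
            # Default to pycaret's get_data; a missing pycaret install also
            # raises ImportError and is handled by the same except clause.
            from pycaret.datasets import get_data as loader
        return loader(dataset)
    except ImportError as exc:
        print(f"Could not load {dataset!r}: {exc}")
        return None


def always_missing(name: str) -> Any:
    """Stand-in loader that behaves like a failed lookup."""
    raise ImportError(f"Data could not be read: {name}")


result = load_or_none("no_such_dataset", loader=always_missing)
```

Here result is None because the stand-in loader always raises; calling load_or_none('juice') with no loader uses get_data.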