Module to get datasets in pycaret
Module: pycaret.datasets
Functions: 1
get_data(dataset: str = 'index', folder: Optional[str] = None, save_copy: bool = False, profile: bool = False, verbose: bool = True, address: Optional[str] = None)
Function to load sample datasets.
Order of read:
(1) Tries to read the dataset from the local folder first.
(2) Then tries to read the dataset from the folder at the GitHub "address" (see below).
(3) Then tries to read from sktime (if installed).
(4) Raises an error if none exist.
The list of available datasets on GitHub can be checked using
(1) get_data('index') or
(2) get_data('index', folder='time_series/seasonal')
(see available "folder" options below)
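The read order described above amounts to a fallback chain: try each source in turn and raise only when all of them fail. The following is a minimal sketch of that pattern, not pycaret's actual implementation. The function first_successful and the loaders read_local and read_remote are hypothetical names introduced purely for illustration.

```python
from typing import Callable, Optional, Sequence


def first_successful(loaders: Sequence[Callable[[], Optional[object]]]) -> object:
    """Return the result of the first loader that yields data.

    Each loader is a zero-argument callable (a hypothetical stand-in for
    "read local file", "download from address", "ask sktime"). A loader
    signals "not found" by returning None or by raising an exception.
    """
    errors = []
    for loader in loaders:
        try:
            result = loader()
        except Exception as exc:
            # Remember why this source failed, then try the next one.
            errors.append(f"{getattr(loader, '__name__', 'loader')}: {exc}")
            continue
        if result is not None:
            return result
    # Step (4): no source produced data, so raise an error.
    raise ImportError("Data could not be read from any source. " + "; ".join(errors))


# Hypothetical loaders: the local read fails, the remote read succeeds.
def read_local():
    raise FileNotFoundError("no local copy")


def read_remote():
    return {"source": "remote"}


data = first_successful([read_local, read_remote])
```

Raising ImportError in the last step mirrors the exception type documented for get_data; a real loader chain could just as reasonably raise a different exception.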
Example
from pycaret.datasets import get_data
all_datasets = get_data('index')
juice = get_data('juice')
dataset: str, default = 'index'
Index value of dataset.
folder: Optional[str], default = None
The folder from which to get the data.
If None, gets it from the "common" folder. Other options are:
- time_series/seasonal
- time_series/random_walk
- time_series/white_noise
save_copy: bool, default = False
When set to True, it saves a copy in the current working directory.
profile: bool, default = False
When set to True, an interactive EDA report is displayed.
verbose: bool, default = True
When set to False, the head of the data is not displayed.
address: string, default = None
Download URL of the dataset. Defaults to None, which fetches the dataset from
"https://raw.githubusercontent.com/pycaret/datasets/main/". Users who have
difficulty connecting to GitHub can change the default address to their own
(e.g. "https://gitee.com/IncubatorShokuhou/pycaret/raw/master/datasets/").
Returns:
pandas.DataFrame
Warnings
- Use of get_data requires an internet connection.
Raises
ImportError
(1) When trying to import time series datasets that require sktime,
but sktime has not been installed.
(2) If the data does not exist
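Because both failure cases surface as ImportError, a caller may want to catch it explicitly. The sketch below wraps get_data using only the parameters documented above. The wrapper load_sample is a hypothetical helper, not part of pycaret, and it assumes pycaret is installed and an internet connection is available when it is called.

```python
from typing import Optional


def load_sample(name: str, folder: Optional[str] = None):
    """Load a sample dataset quietly and keep a local copy.

    Returns the DataFrame, or None if get_data raises ImportError
    (sktime missing for a time series dataset, or the dataset does not exist).
    """
    # Imported lazily so that defining this helper does not require pycaret.
    from pycaret.datasets import get_data

    try:
        # save_copy=True saves a copy in the current working directory;
        # verbose=False suppresses display of the head of the data.
        return get_data(name, folder=folder, save_copy=True, verbose=False)
    except ImportError as exc:
        print(f"Could not load {name!r}: {exc}")
        return None
```

A call such as load_sample('juice') would then return the DataFrame or None. If the saved copy sits in the folder that step (1) of the read order checks, later calls from the same directory may be served locally; that is an assumption about how the two features interact, not something this page states.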