API

pdpilot.partial_dependence(*, predict: Callable[[pandas.DataFrame], List[float]], df: pandas.DataFrame, features: List[str], resolution: int = 20, one_hot_features: Dict[str, List[Tuple[str, str]]] | None = None, nominal_features: List[str] | None = None, ordinal_features: List[str] | None = None, feature_value_mappings: Dict[str, Dict[str, str]] | None = None, num_clusters_extent: Tuple[int, int] = (2, 5), mixed_shape_tolerance: float = 0.15, compute_two_way_pdps: bool = True, cluster_preprocessing: str = 'diff', n_jobs: int = 1, seed: int | None = None, output_path: str | None = None, logging_level: str = 'INFO') → dict | None

Calculates the data needed for the widget. This includes computing the data for the PDP and ICE plots, calculating the metrics to rank the plots by, and clustering the lines within each ICE plot.

Parameters:
  • predict (Callable[[pd.DataFrame], list[float]]) – A function that takes a DataFrame of instances and returns the model’s predictions on those instances.

  • df (pd.DataFrame) – Instances to use to compute the PDPs and ICE plots.

  • features (list[str]) – List of feature names to compute the plots for.

  • resolution (int, optional) – For quantitative features, the number of evenly spaced values to use to compute the plots, defaults to 20.

  • one_hot_features (dict[str, list[tuple[str, str]]] | None, optional) – A dictionary that maps from the name of a feature to a list of tuples containing the corresponding one-hot encoded column names and feature values, defaults to None.

  • nominal_features (list[str] | None, optional) – List of nominal and binary features in the dataset that are not one-hot encoded. If None, defaults to binary features in the dataset.

  • ordinal_features (list[str] | None, optional) – List of ordinal features in the dataset. If None, defaults to integer features with 3-12 unique values.

  • feature_value_mappings (dict[str, dict[str, str]] | None, optional) – Nested dictionary that maps from the name of a nominal or ordinal feature, to a value for that feature in the dataset, to the desired label for that value in the UI, defaults to None.

  • num_clusters_extent (tuple[int, int]) – The minimum and maximum number of clusters to try when clustering the lines of ICE plots. Defaults to (2, 5).

  • mixed_shape_tolerance (float) – Quantitative and ordinal one-way PDPs are labeled as having positive, negative, or mixed shapes. A lower value for this parameter leads to more PDPs being labeled as positive or negative and fewer being labeled as mixed. A higher value leads to more being labeled as mixed. Must be in the range [0, 0.5]. Defaults to 0.15.

  • compute_two_way_pdps (bool) – Whether or not to compute two-way PDPs. Defaults to True.

  • cluster_preprocessing (str) – How to preprocess the ICE lines before clustering them. “diff” calculates the differences in y-values between successive points in the lines using np.diff. “center” centers the ICE lines so that they all begin at y = 0. Defaults to “diff”.

  • n_jobs (int, optional) – Number of jobs to use to parallelize computation, defaults to 1.

  • seed (int | None, optional) – Random state for clustering. Defaults to None.

  • output_path (str | None, optional) – A file path to write the results to. If None, then the results are instead returned.

  • logging_level (str, optional) – The verbosity of printed messages. Must be “DEBUG”, “INFO”, “WARNING”, or “ERROR”. Defaults to “INFO”.
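
The two dictionary-shaped arguments, one_hot_features and feature_value_mappings, are easier to read with a concrete instance. The sketch below builds both for a hypothetical dataset. The feature names, the column naming scheme "<feature>_<value>" (similar to what pandas.get_dummies produces by default), and the helper one_hot_spec are assumptions made for illustration; none of them are part of pdpilot.

```python
from typing import Dict, List, Tuple


def one_hot_spec(feature: str, values: List[str], sep: str = "_") -> List[Tuple[str, str]]:
    """Pair each one-hot column name with the feature value it encodes.

    Hypothetical helper: assumes columns are named "<feature><sep><value>".
    """
    return [(f"{feature}{sep}{value}", value) for value in values]


# Maps feature name -> list of (one-hot column name, feature value) tuples.
one_hot_features: Dict[str, List[Tuple[str, str]]] = {
    "color": one_hot_spec("color", ["red", "green", "blue"]),
}

# Maps feature name -> value in the dataset -> label to show in the UI.
# Keys are written as strings here, following the documented type hint.
feature_value_mappings: Dict[str, Dict[str, str]] = {
    "smoker": {"0": "no", "1": "yes"},
}

print(one_hot_features["color"])
```

Under these assumptions, the two dictionaries would be passed to partial_dependence through the keyword arguments of the same names.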

Raises:

OSError – Raised when the output_path, if provided, cannot be written to.

Returns:

Widget data, or None if an output_path is provided.

Return type:

dict | None
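
To make the resolution and cluster_preprocessing parameters more concrete, here is a small pure-Python sketch of the ideas as the parameter list describes them. The helper names are hypothetical, and the assumption that the grid includes both endpoints is ours. pdpilot is documented as using np.diff for “diff”, and its internals may differ in detail.

```python
from typing import List


def even_grid(low: float, high: float, resolution: int) -> List[float]:
    """Return `resolution` evenly spaced values from low to high, inclusive."""
    if resolution < 2:
        raise ValueError("resolution must be at least 2")
    step = (high - low) / (resolution - 1)
    return [low + i * step for i in range(resolution)]


def preprocess_diff(line: List[float]) -> List[float]:
    """Differences between successive y-values (what np.diff gives for 1-D input)."""
    return [b - a for a, b in zip(line, line[1:])]


def preprocess_center(line: List[float]) -> List[float]:
    """Shift the line vertically so that its first point sits at y = 0."""
    return [y - line[0] for y in line]


ice_line = [2.0, 3.0, 5.0, 4.0]
print(even_grid(0.0, 10.0, 5))      # [0.0, 2.5, 5.0, 7.5, 10.0]
print(preprocess_diff(ice_line))    # [1.0, 2.0, -1.0]
print(preprocess_center(ice_line))  # [0.0, 1.0, 3.0, 2.0]
```

Both transforms discard a line's vertical offset, so two ICE lines with the same shape but different baselines become identical before clustering. The “diff” variant yields one fewer point than its input.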

class pdpilot.PDPilotWidget(**kwargs: Any)

This class creates the interactive widget.

Parameters:
  • predict (Callable[[pd.DataFrame], list[float]]) – A function that takes a DataFrame of instances and returns the model’s predictions on those instances.

  • df (pd.DataFrame) – Instances to use to compute the PDPs and ICE plots.

  • labels (list[float] | list[int] | np.ndarray | pd.Series) – Ground truth labels for the instances in df.

  • pd_data (dict | str | Path) – The dictionary returned by pdpilot.pdp.partial_dependence() or a path to the file containing that data.

  • seed (int | None, optional) – Random state for clustering. Defaults to None.

  • height (int, optional) – The height of the widget in pixels, defaults to 600.

Raises:

OSError – Raised if pd_data is a str or Path and the file cannot be read.
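
Because pd_data accepts either a dictionary or a path, one way to picture the documented behavior is a small normalizing step like the one below. This is a sketch, not pdpilot's own code: the helper resolve_pd_data is hypothetical, and it assumes the file written via output_path contains JSON.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


def resolve_pd_data(pd_data: Union[dict, str, Path]) -> dict:
    """Return pd_data as a dictionary, reading it from disk if a path is given.

    Hypothetical helper; assumes the file holds JSON.
    """
    if isinstance(pd_data, dict):
        return pd_data
    # open() raises OSError (e.g. FileNotFoundError) if the file cannot be read.
    with open(Path(pd_data), "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip through a temporary file to show the path branch.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pd_data.json"
    path.write_text(json.dumps({"example": 1}), encoding="utf-8")
    loaded = resolve_pd_data(path)

print(loaded)
```

Since FileNotFoundError and PermissionError are subclasses of OSError, a missing or unreadable file surfaces as the OSError listed under Raises.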