{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Example usage\n", "\n", "This notebook demonstrates how to use the `datpro` package to summarize data, detect anomalies, and create visualizations for a dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import datpro as dp\n", "import pandas as pd\n", "import numpy as np\n", "import altair as alt\n", "from itertools import combinations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load example dataset\n", "We'll use a sample dataset to demonstrate the features of the `datpro` package. The dataset contains demographic and transactional data, and the modeling goal is to predict `Income` from the other features: `Age`, `Gender`, `Spending_Score`, and `Region`.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | Age | Income | Spending_Score | Gender | Region |
|---|---|---|---|---|---|
| 0 | 66 | NaN | 26.373678 | Male | South |
| 1 | 65 | 66369.651809 | 20.906870 | Female | South |
| 2 | 59 | 70764.092278 | 47.990597 | Male | West |
| 3 | 64 | 41432.315153 | 31.120625 | Female | North |
| 4 | 53 | 52963.994070 | 12.016596 | Female | East |
| ... | ... | ... | ... | ... | ... |
| 1005 | 18 | 62455.037248 | 22.795113 | Female | North |
| 1006 | 24 | 35361.901205 | 18.846863 | Male | South |
| 1007 | 51 | 56554.072546 | 17.076530 | Female | South |
| 1008 | 63 | 52799.136847 | 42.219961 | Male | East |
| 1009 | 42 | 52727.993826 | 27.395330 | Male | East |
1010 rows × 5 columns
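The preview already shows a data quality issue: row 0 has a missing `Income`. The notebook's own loading code is not shown here, so the sketch below is an assumption. It builds a small hypothetical stand-in DataFrame from the first five rows of the preview, not the full 1010-row dataset, and counts missing values per column with plain pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: only the first five rows from the preview,
# not the full dataset loaded in the notebook.
df = pd.DataFrame({
    'Age': [66, 65, 59, 64, 53],
    'Income': [np.nan, 66369.651809, 70764.092278, 41432.315153, 52963.994070],
    'Spending_Score': [26.373678, 20.906870, 47.990597, 31.120625, 12.016596],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Female'],
    'Region': ['South', 'South', 'West', 'North', 'East'],
})

# Count missing values in each column; in these rows only Income has one.
missing = df.isna().sum()
print(missing)
```

On the full dataset the same call would report how many `Income` values need imputing or dropping before any model is fit.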

|  | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|
| Age | 18.000000 | 31.000000 | 44.000000 | 56.000000 | 69.000000 |
| Income | 6556.169327 | 40915.394217 | 51146.204619 | 60893.485307 | 443001.985244 |
| Spending_Score | 0.536808 | 16.880278 | 26.670824 | 38.786205 | 75.010095 |
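The summary makes a likely `Income` anomaly visible: the maximum (443,001.99) is far above the 75th percentile (60,893.49). As a hedged sketch, the common 1.5 × IQR rule (Tukey's fences) can be applied to the quartiles in the table. This is a standard technique, not necessarily the rule used by `datpro`'s anomaly detection, and the helper name `iqr_fences` is hypothetical.

```python
# Quartiles and extremes of Income, copied from the summary table above.
income = {
    'min': 6556.169327,
    '25%': 40915.394217,
    '75%': 60893.485307,
    'max': 443001.985244,
}


def iqr_fences(q1, q3, k=1.5):
    # Hypothetical helper: Tukey's fences, q1 - k*IQR and q3 + k*IQR.
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = iqr_fences(income['25%'], income['75%'])
print(round(lower, 2), round(upper, 2))

# Check whether each extreme of Income lies outside the fences.
print(income['min'] < lower, income['max'] > upper)
```

With these quartiles the fences come out at roughly 10,948 and 90,861, so both the minimum and the maximum `Income` values would be flagged under this rule.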