geomapviz package#

Submodules#

geomapviz.aggregator module#

geomapviz.aggregator.compute_confidence_interval(df: DataFrame, groups: str | List[str], target: str = 'target', weight: str = 'weight', other_cols_avg: List[str] | None = None, distr: str = 'gaussian', n_std: float = 2.0)[source]#

geomapviz.aggregator.compute_weighted_average(df: DataFrame, groups: str | List[str], target: str = 'target', weight: str = 'weight', other_cols_avg: List[str] | None = None)[source]#

compute_weighted_average computes the weighted arithmetic average, grouped by the column(s) given in groups. The weighted average is :math:`\sum_{i} w_{i} x_{i} / \sum_{i} w_{i}`. If weight is None, it computes the unweighted arithmetic average :math:`\sum_{i} x_{i} / N`.

Parameters:

df – the data set
groups – the predictor(s) to group by
target – the name of the observed/target column
weight – the name of the column weight
other_cols_avg – Other columns to average, such as the predicted values of a model

Returns:

pd.DataFrame – the dataframe with the arithmetic average, by group
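The formula above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name weighted_average_sketch, the region column, and the temporary _wx column are hypothetical, and only the target and weight names follow the documented defaults.

```python
import pandas as pd

# Toy data; "region" is a hypothetical grouping column.
df = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "target": [1.0, 3.0, 10.0, 20.0],
        "weight": [1.0, 3.0, 1.0, 1.0],
    }
)


def weighted_average_sketch(df, groups, target="target", weight="weight"):
    """Illustrative only: sum(w * x) / sum(w) per group."""
    if weight is None:
        # No weights: fall back to the plain arithmetic mean, sum(x) / N.
        return df.groupby(groups)[target].mean()
    # Hypothetical temporary column holding w_i * x_i.
    tmp = df.assign(_wx=df[target] * df[weight])
    sums = tmp.groupby(groups)[["_wx", weight]].sum()
    return (sums["_wx"] / sums[weight]).rename(target)


weighted = weighted_average_sketch(df, "region")
unweighted = weighted_average_sketch(df, "region", weight=None)
print(weighted)
print(unweighted)
```

In the north group the weighted average is (1 * 1 + 3 * 3) / (1 + 3) = 2.5, whereas the unweighted mean is 2.0.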

geomapviz.aggregator.dissolve_and_aggregate(df: DataFrame, target: str, other_cols_avg: List[str] | None = None, dissolve_on: List[str] | None = None, distr: str = 'gaussian', geoid: str = 'INS', weight: List[str] | None = None, shp_file: GeoDataFrame | None = None) → GeoDataFrame[source]#

Dissolves a GeoDataFrame based on a column, and aggregates data based on the dissolved polygons.

Parameters:

df – Dataframe with the data to be aggregated.
cols_to_plot – List of columns to plot on map.
target – Column with the target variable.
other_cols_avg – Columns with the predicted values or any other columns to average.
distr – Distribution of the target variable, by default “gaussian”.
weight – Column with the weights to be used, by default None.
dissolve_on – Column to dissolve the GeoDataFrame, by default None.
geoid – Column with the geoid, by default “INS”.
shp_file – The shapefile to use for the map, as a GeoDataFrame. The default is None.

Returns:

geopandas.GeoDataFrame – geodataframe with the dissolved polygons.

geomapviz.aggregator.encode_categorical_columns(df: DataFrame) → DataFrame[source]#

Encode categorical columns in the input DataFrame using the .cat.codes method.

Parameters:

df – Input DataFrame whose categorical columns are to be encoded.

Returns:

pd.DataFrame – A new DataFrame with the categorical columns encoded.

Examples

>>> import pandas as pd
>>> from geomapviz.aggregator import encode_categorical_columns
>>> df = pd.DataFrame({'A': pd.Categorical(['a', 'b', 'c', 'a'], categories=['a', 'b', 'c']),
...                    'B': pd.Categorical(['b', 'a', 'b', 'c'], categories=['a', 'b', 'c']),
...                    'C': [1, 2, 3, 4],
...                    'D': [5, 6, 7, 8]})
>>> encoded_df = encode_categorical_columns(df)
>>> print(encoded_df)
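To show what encoding with .cat.codes produces, here is a minimal sketch in plain pandas. It assumes the function copies the frame and replaces every column of dtype category with its integer codes. That behaviour is inferred from the description above, not taken from the library's source.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.Categorical(["a", "b", "c", "a"], categories=["a", "b", "c"]),
        "B": pd.Categorical(["b", "a", "b", "c"], categories=["a", "b", "c"]),
        "C": [1, 2, 3, 4],
    }
)

# Work on a copy so the input frame keeps its categorical dtype.
encoded = df.copy()
for col in encoded.select_dtypes(include="category").columns:
    # Each level is replaced by its position in the category list.
    encoded[col] = encoded[col].cat.codes

print(encoded)
```

With the categories ordered as ['a', 'b', 'c'], the levels map to 0, 1, and 2, and non-categorical columns such as C are left unchanged.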

geomapviz.aggregator.merge_zip_df(zip_path: str, df: DataFrame, geoid: str = 'geoid', cols_to_keep: List[str] | None = None) → DataFrame[source]#

Merge a DataFrame df with a mapping table for the zipcode and other relevant geographical information (district name, sub-districts, etc.). The key is the geoid column. The zip mapper might look like this:

0 | 21004 | BRUSSEL | 50.8333 | 4.35 | 1000 | Brussels | Brussel Hoofdstad |

1 | 21015 | SCHAARBEEK | 50.85 | 4.38333 | 1030 | Brussels | Brussel Hoofdstad |

Parameters:

zip_path – The path to the zipcode mapper, a csv file with additional geo info and a geoid column
df – The DataFrame to merge with the zipcode mapper
geoid – The name of the geoid column in both the df and the zipcode mapper
cols_to_keep – The list of columns to keep from the zipcode mapper. If None, keep all columns.

Returns:

pd.DataFrame – The merged DataFrame with additional geo information

Raises:

TypeError – If cols_to_keep is not None and not a list of strings
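The merge described above can be sketched with pandas alone. The header names in the CSV below (name, lat, long, postcode, region, district) are hypothetical, since the sample rows are shown without a header, and the left join is an assumption rather than the documented behaviour.

```python
import io

import pandas as pd

# In-memory stand-in for the zipcode mapper CSV; column names are hypothetical.
zip_csv = io.StringIO(
    "geoid,name,lat,long,postcode,region,district\n"
    "21004,BRUSSEL,50.8333,4.35,1000,Brussels,Brussel Hoofdstad\n"
    "21015,SCHAARBEEK,50.85,4.38333,1030,Brussels,Brussel Hoofdstad\n"
)
# Read the key as a string so it matches the key dtype in df.
zip_mapper = pd.read_csv(zip_csv, dtype={"geoid": str})

df = pd.DataFrame(
    {"geoid": ["21004", "21015", "21004"], "target": [1.0, 2.0, 3.0]}
)

# Keep only the requested mapper columns, plus the key needed for the join.
cols_to_keep = ["name", "postcode"]
merged = df.merge(zip_mapper[["geoid"] + cols_to_keep], on="geoid", how="left")
print(merged)
```

Reading the key as a string on both sides avoids silent mismatches between integer and string geoids.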

geomapviz.aggregator.prepare_dataframe(df: DataFrame, groups: List[str] | str, target: str, other_cols_avg: List[str] | None = None, weight: str | None = None, verb: int = 0, distr: str = 'gaussian') → DataFrame[source]#

Prepare dataframe for the confidence interval computation.

Parameters:

df – Input data.
groups – List of column names containing the groups of interest.
target – Name of the target column.
other_cols_avg – Other columns to average, such as the predicted values of a model
weight – Name of the weight column. Default is None.
verb – Controls the verbosity of the warning message. Default is 0.
distr – Name of the distribution. Default is “gaussian”.

Returns:

pd.DataFrame – Prepared dataframe.

Notes

If weight is None, a weight column is added and set to 1. If the distribution is not Gaussian and no weight is provided, a warning is issued.
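The behaviour in the note can be sketched as follows. The helper name add_default_weight_sketch, the name of the added column ("weight"), and the warning text are assumptions made for illustration. The library's actual column name and message may differ.

```python
import warnings

import pandas as pd


def add_default_weight_sketch(df, weight=None, distr="gaussian"):
    """Illustrative only: add a unit weight column when none is given."""
    out = df.copy()
    if weight is None:
        if distr != "gaussian":
            warnings.warn(
                "No weight provided for a non-Gaussian distribution; "
                "using unit weights.",
                UserWarning,
            )
        weight = "weight"  # hypothetical name for the added column
        out[weight] = 1
    return out, weight


toy = pd.DataFrame({"target": [1.0, 2.0]})

# Capture the warning so that it can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    prepared, weight_name = add_default_weight_sketch(toy, weight=None, distr="t")

print(prepared)
print(len(caught))
```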

geomapviz.aggregator.weighted_average_aggregator(df: DataFrame, groups: str | List[str], target: str, other_cols_avg: List[str] | None = None, distr: str = 'gaussian', weight: str | None = None, verb: int = 0) → Tuple[DataFrame, DataFrame][source]#

Computes the weighted average and the confidence interval of a target variable in a Pandas DataFrame, grouped by one or more categorical columns.

Parameters:

df – The input DataFrame to compute the weighted average and confidence interval on.
groups – The name(s) of the column(s) in df that define the groups to aggregate. If groups is a string, it will be interpreted as a single group column name. If groups is a list of strings, it will be interpreted as multiple group column names.
target – The name of the column in df that contains the target variable to aggregate.
other_cols_avg – The predicted values of the target variable to use for computing the confidence interval or any other columns to average. If other_cols_avg is not None, it should be a list of column names.
distr – The distribution to use for computing the confidence interval. Supported distributions are ‘gaussian’ (default), ‘t’ and ‘bootstrap’.
weight – The name of the column in df that contains the weights to use for computing the weighted average. If weight is None (default), all rows are assumed to have equal weight.
verb – Verbosity level of the function (0: no message, 1: info, 2: debug). The default is 0.

Returns:

Tuple[pandas.DataFrame, pandas.DataFrame] – A tuple of two DataFrames:

- The first DataFrame contains the weighted average and the number of observations per group.
- The second DataFrame contains the confidence interval of the weighted average, computed at the 95% confidence level.

Raises:

ValueError – If any of the input arguments is invalid.

Examples

>>> import pandas as pd
>>> from geomapviz.aggregator import weighted_average_aggregator
>>> data = pd.DataFrame({'color': ['red', 'green', 'red', 'green', 'green'],
...                      'size': ['small', 'large', 'medium', 'large', 'small'],
...                      'price': [1.0, 2.0, 3.0, 4.0, 5.0]})
>>> groups = ['color', 'size']
>>> target = 'price'
>>> weights = 'weights'
>>> data[weights] = [1, 2, 3, 4, 5]
>>> result, conf = weighted_average_aggregator(df=data, groups=groups, target=target, weight=weights)
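The 95% confidence level mentioned above corresponds to roughly two standard errors under a Gaussian assumption, which is consistent with the n_std: float = 2.0 default of compute_confidence_interval. The sketch below shows one common way to build such an interval per group, ignoring weights. The helper name gaussian_interval_sketch, the zone column, and the formula mean ± n_std * std / sqrt(n) are assumptions, not a description of the library's computation.

```python
import numpy as np
import pandas as pd

# Toy data; "zone" is a hypothetical grouping column.
df = pd.DataFrame(
    {
        "zone": ["a", "a", "a", "b", "b"],
        "target": [1.0, 2.0, 3.0, 10.0, 14.0],
    }
)


def gaussian_interval_sketch(df, groups, target="target", n_std=2.0):
    """Illustrative only: mean +/- n_std standard errors per group."""
    stats = df.groupby(groups)[target].agg(["mean", "std", "count"])
    # Standard error of the mean, using the sample standard deviation.
    stderr = stats["std"] / np.sqrt(stats["count"])
    return pd.DataFrame(
        {
            "mean": stats["mean"],
            "lower": stats["mean"] - n_std * stderr,
            "upper": stats["mean"] + n_std * stderr,
        }
    )


ci = gaussian_interval_sketch(df, "zone")
print(ci)
```

For zone b, the sample standard deviation is sqrt(8) and the standard error is 2, so the interval is 12 ± 4.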

geomapviz.plot module#

Module for geographical visualization (geomapviz)

class geomapviz.plot.PlotOptions(df: DataFrame, target: str, other_cols_avg: str | None = None, weight: ndarray | None = None, plot_weight: bool = False, dissolve_on: str | None = None, geoid: str = 'nis', shp_file: GeoDataFrame | None = None, distr: str = 'gaussian', plot_uncertainty: bool = False, alpha: float = 0.5, background: str | None = None, figsize: Tuple[float, float] = (12, 12), ncols: int = 2, cmap: str | None = 'plasma', facecolor: str = '#2b303b', nbr_of_dec: int | None = None, autobin: bool = False, normalize: bool = True, n_bins: int = 7)[source]#

Bases: object

Options for generating a thematic map using Matplotlib.

Parameters:

df (pd.DataFrame) – The pandas DataFrame containing the data to plot.
target (str) – The column name of the target variable to plot.
other_cols_avg (Optional[str], optional) – The name of another column in the dataframe to average and plot alongside the target variable, by default None.
weight (Optional[np.ndarray], optional) – An array of weights for each observation, by default None.
plot_weight (bool, optional) – A boolean flag indicating whether to plot the weights on the map, by default False.
dissolve_on (Optional[str], optional) – The column name of the column to dissolve on when aggregating geometries, by default None.
geoid (str, optional) – The name of the column containing the geographic ID, by default “nis”.
shp_file (Optional[gpd.geodataframe.GeoDataFrame], optional) – A GeoDataFrame containing the geometry data to plot, by default None.
distr (str, optional) – The distribution type to use when calculating bin thresholds for the target variable, by default “gaussian”.
plot_uncertainty (bool, optional) – A boolean flag indicating whether to plot uncertainty bands around the target variable, by default False.
background (Optional[str], optional) – The name of the background map to use, by default None.
figsize (Tuple[float, float], optional) – The figure size, by default (12, 12).
ncols (int, optional) – The number of columns in the plot grid, by default 2.
cmap (Optional[str], optional) – The name of the color map to use for the plot, by default “plasma”.
facecolor (str, optional) – The background color of the plot, by default “#2b303b”.
nbr_of_dec (Optional[int], optional) – The number of decimal places to use when displaying values on the plot, by default None.
autobin (bool, optional) – A boolean flag indicating whether to automatically calculate bin thresholds for the target variable, by default False.
normalize (bool, optional) – A boolean flag indicating whether to normalize the color scale, by default True.
n_bins (int, optional) – The number of bins to use when manually calculating bin thresholds for the target variable, by default 7.
interactive (bool, optional) – Whether to use interactive charts, by default False.

Returns:

PlotOptions – A PlotOptions object containing the input arguments.

alpha: float = 0.5#

autobin: bool = False#

background: str | None = None#

cmap: str | None = 'plasma'#

df: DataFrame#

dissolve_on: str | None = None#

distr: str = 'gaussian'#

facecolor: str = '#2b303b'#

figsize: Tuple[float, float] = (12, 12)#

geoid: str = 'nis'#

interactive = False#

n_bins: int = 7#

nbr_of_dec: int | None = None#

ncols: int = 2#

normalize: bool = True#

other_cols_avg: str | None = None#

plot_uncertainty: bool = False#

plot_weight: bool = False#

shp_file: GeoDataFrame | None = None#

target: str#

weight: ndarray | None = None#

geomapviz.plot.spatial_average_facetplot(options: PlotOptions) → Figure[source]#

Create a facet plot of spatial data on a map.

Parameters:

options (PlotOptions) – An object containing the options for the plot.

Returns:

matplotlib.figure.Figure – The figure object containing the plot.

Raises:

TypeError – If the shapefile is not a GeoDataFrame.

Notes

This function uses the following helper functions: dark_or_light_color(), dissolve_and_aggregate(), calculate_bins_grouped_data(), and plot_grouped_data().

geomapviz.plot.spatial_average_plot(options: PlotOptions)[source]#

Plot data on a map using a GeoDataFrame.

This function loads the data from a DataFrame df, aggregates it by geographic area, and plots the resulting averages on a map using a GeoDataFrame shp_file. The target column in df is used as the variable to plot on the map. Other columns can also be plotted using the other_cols_avg parameter. The weight parameter can be used to weight the data. The dissolve_on and geoid parameters are used to group the data by geographic area. The autobin, normalize, and n_bins parameters control the binning of the data. The cmap parameter controls the colormap, and the facecolor parameter controls the color of the plot background. The plot_weight and plot_uncertainty parameters control whether to plot the weight and uncertainty data, respectively. The resulting plot is returned as a matplotlib Figure object.

Parameters:

options (PlotOptions) – The options for the plot.

Returns:

matplotlib.figure.Figure – The resulting figure object.

Raises:

TypeError – If shp_file in options is not a GeoDataFrame.

geomapviz.shapefiles module#

geomapviz.shapefiles.load_geometry(shp_path: str, geoid: str = 'INS') → GeoDataFrame[source]#

Load a shapefile and convert it to a GeoDataFrame with coordinates and projection set to Google Mercator.

Parameters:

shp_path – The file path of the shapefile to load.
geoid – The name of the geoid column in the GeoDataFrame, by default “INS”.

Returns:

gpd.GeoDataFrame – The GeoDataFrame with the shapefile’s geometry projected in Google Mercator and the geoid column cast to string.
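For readers unfamiliar with the projection, the sketch below shows the forward spherical Mercator formula, assuming that "Google Mercator" here means Web Mercator (EPSG:3857). It uses only the standard library and the Brussels coordinates from the zip mapper sample. The function name to_web_mercator is hypothetical, and in practice reprojection would normally be delegated to geopandas (GeoDataFrame.to_crs) rather than written by hand.

```python
import math

# Sphere radius used by EPSG:3857 (the WGS 84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lat_deg, lon_deg):
    """Project a latitude/longitude pair in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Brussels, using the latitude and longitude from the zip mapper sample.
x, y = to_web_mercator(50.8333, 4.35)
print(round(x), round(y))
```

Longitude maps linearly to x, while y grows faster than latitude, which is why areas far from the equator look stretched on such maps.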

geomapviz.shapefiles.load_shp(country: str = 'BE')[source]#

Load a shapefile of a specific country.

Parameters:

country – The ISO 3166-1 alpha-2 code of the country to load. Default is “BE” for Belgium.

Returns:

gpd.geodataframe.GeoDataFrame – A GeoDataFrame containing the shapefile data of the specified country.

Raises:

ValueError – If the specified country code is invalid or the corresponding shapefile is not found.

Examples

>>> from geomapviz.shapefiles import load_shp
>>> belgium = load_shp("BE")
>>> belgium.plot()

geomapviz.utils module#

geomapviz.utils.check_list_of_str(str_list: List[str], name: str = 'str_list') → None[source]#

Raise an exception if str_list is not a list of strings

Parameters:

str_list – the object to check
name – (default 'str_list')

Raises:

TypeError – if str_list is not a List[str]
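A check of this kind can be sketched in a few lines. The function name check_list_of_str_sketch and the error message are hypothetical. The use of name in the message is an assumption about why the parameter exists.

```python
from typing import Any


def check_list_of_str_sketch(str_list: Any, name: str = "str_list") -> None:
    """Illustrative only: raise TypeError unless str_list is a list of str."""
    is_list = isinstance(str_list, list)
    if not is_list or not all(isinstance(item, str) for item in str_list):
        raise TypeError(f"{name} must be a list of strings, got {str_list!r}")


# A valid list passes silently.
check_list_of_str_sketch(["a", "b"])

# A list holding a non-string item triggers the error.
try:
    check_list_of_str_sketch(["a", 1], name="cols_to_keep")
    raised = False
    message = ""
except TypeError as err:
    raised = True
    message = str(err)

print(raised, message)
```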

geomapviz.utils.convert_category_to_code(df: DataFrame)[source]#

convert_category_to_code converts categories (levels) to codes for easier representation

Parameters:

df – dataframe with categorical columns

Returns:

pd.DataFrame – numerical dataframe
