Evaluation Visualization

The evaluation visualization module provides tools for visualizing model performance metrics through several chart types: bar charts, radar charts, heatmaps, histograms, and violin plots. Functions are provided for single-model views and for multi-model comparisons.

Basic Usage

from comproscanner import eval_visualizer

# Bar chart for single model
fig = eval_visualizer.plot_single_bar_chart(
    result_file="evaluation.json",
    output_file="metrics.png"
)

# Radar chart comparison for multiple models
fig = eval_visualizer.plot_multiple_radar_charts(
    result_sources=["eval1.json", "eval2.json"],
    model_names=["Model A", "Model B"],
    output_file="comparison.png"
)

Single Model Visualizations

plot_single_bar_chart()

Create a bar chart visualization of evaluation metrics for a single model.

fig = eval_visualizer.plot_single_bar_chart(
    result_file="evaluation.json",
    output_file="metrics.png"
)

Required Parameters

Either result_file OR result_dict must be provided.

result_file (str)

Path to JSON file containing evaluation results.

result_dict (dict)

Dictionary containing evaluation results.

Optional Parameters

output_file (str)

Path to save the output plot image.

model_name (str)

Name of the model for display.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

colormap (str)

Matplotlib colormap name.

display_values (bool)

Whether to display metric values on bars.

title (str)

Custom title for the plot.

typical_threshold (float)

Threshold value to display as a horizontal line.

threashold_line_style (str)

Style of the threshold line.

threashold_tolerance_range (float)

Tolerance range within which the threshold line is interrupted over the bars, so the line stays visible.

threshold_color (str)

Color for the threshold line.

show_grid (bool)

Whether to display grid lines.

bar_width (float)

Width of the bars.

y_axis_label (str)

Label for the y-axis.

x_axis_label (str)

Label for the x-axis.

y_axis_range (Tuple[float, float])

Range for the y-axis.

dpi (int)

DPI for output image.

metrics_to_include (List[str])

List of metrics to include.

Default Values

output_file = None
model_name = None
figsize = (12, 8)
colormap = "Blues"
display_values = True
title = None
typical_threshold = None
threashold_line_style = "--"
threashold_tolerance_range = 0.03
threshold_color = "red"
show_grid = True
bar_width = 0.6
y_axis_label = "Score"
x_axis_label = None
y_axis_range = (0, 1)
dpi = 300
metrics_to_include = ["overall_accuracy", "overall_composition_accuracy", "overall_synthesis_accuracy", "absolute_precision", "absolute_recall", "absolute_f1_score", "normalized_precision", "normalized_recall", "normalized_f1_score"]


plot_single_radar_chart()

Create a radar chart visualization for a single model's evaluation metrics.

fig = eval_visualizer.plot_single_radar_chart(
    result_file="evaluation.json",
    output_file="radar.png"
)

Required Parameters

Either result_file OR result_dict must be provided.

result_file (str)

Path to JSON file containing evaluation results.

result_dict (dict)

Dictionary containing evaluation results.

Optional Parameters

output_file (str)

Path to save the output plot image.

model_name (str)

Name of the model for display.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

colormap (str)

Matplotlib colormap name.

display_values (bool)

Whether to display metric values.

title (str)

Custom title for the plot.

title_fontsize (int)

Font size for the title.

title_pad (float)

Padding between the title and the plot.

typical_threshold (float)

Threshold value to display as a circular line.

threshold_color (str)

Color for the threshold line.

threshold_line_style (str)

Style of the threshold line.

label_fontsize (int)

Font size for axis labels.

value_fontsize (int)

Font size for displayed values.

legend_loc (str)

Location for the legend box.

legend_fontsize (int)

Font size for the legend.

bbox_to_anchor (Tuple[float, float])

Bounding box for legend.

show_grid (bool)

Whether to display grid lines.

show_grid_labels (bool)

Whether to display grid line values/labels.

grid_line_width (float)

Width of the grid lines.

grid_line_style (str)

Style of the grid lines.

grid_line_color (str)

Color of the grid lines.

grid_line_alpha (float)

Alpha (transparency) of the grid lines (0-1).

fill_alpha (float)

Alpha (transparency) of the filled area (0-1).

marker_size (int)

Size of the markers in the radar plot.

line_width (float)

Width of the lines in the radar plot.

label_padding (float)

Padding for the labels in the radar plot.

clockwise (bool)

Whether to draw the radar chart in a clockwise direction.

start_angle (float)

Starting angle for the radar chart, in radians.

radar_range (Tuple[float, float])

Range of axes for the radar chart.

dpi (int)

DPI for output image.

metrics_to_include (List[str])

List of metrics to include.

Default Values

output_file = None
model_name = None
figsize = (10, 8)
colormap = "Blues"
display_values = False
title = None
title_fontsize = 14
title_pad = 50.0
typical_threshold = None
threshold_color = "red"
threshold_line_style = "--"
label_fontsize = 12
value_fontsize = 10
legend_loc = "best"
legend_fontsize = 10
bbox_to_anchor = None
show_grid = True
show_grid_labels = False
grid_line_width = 1.0
grid_line_style = "-"
grid_line_color = "gray"
grid_line_alpha = 0.2
fill_alpha = 0.4
marker_size = 7
line_width = 2.0
label_padding = 0.25
clockwise = True
start_angle = np.pi / 2
radar_range = (0, 1)
dpi = 300
metrics_to_include = ["overall_accuracy", "overall_composition_accuracy", "overall_synthesis_accuracy", "absolute_precision", "absolute_recall", "absolute_f1_score", "normalized_precision", "normalized_recall", "normalized_f1_score"]


plot_single_performance_heatmap()

Create a heatmap showing the distribution of scores across metrics for a single model.

fig = eval_visualizer.plot_single_performance_heatmap(
    result_file="evaluation.json",
    output_file="heatmap.png"
)

Required Parameters

Either result_file OR result_dict must be provided.

result_file (str)

Path to JSON file containing evaluation results.

result_dict (dict)

Dictionary containing evaluation results.

Optional Parameters

output_file (str)

Path to save the output plot image.

model_name (str)

Name of the model for display.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

colormap (str)

Matplotlib colormap name for heatmap.

bin_count (int)

Number of bins to divide the score range into.

score_range (Tuple[float, float])

Min and max values for score bins.

use_percentage (bool)

Whether to show percentages (True) or counts (False).

show_averages (bool)

Whether to show average scores per metric.

show_group_labels (bool)

Whether to show metric group labels.

show_annotations (bool)

Whether to show value annotations in cells.

annotation_format (str)

Format string for annotations (e.g., '.1f' or 'd').

title (str)

Custom title for the plot.

title_fontsize (int)

Font size for the title text.

title_pad (float)

Padding for the title from the top of the plot.

labels (List[str])

Labels for the x and y axes.

label_fontsize (int)

Font size for the axis labels.

dpi (int)

DPI for output image.

group_metrics (bool)

Whether to visually group related metrics together.

metric_groups (List[Dict])

Custom metric groups definition for grouping metrics.

group_colors (List[str])

Colors for metric groups.

metrics_to_include (List[str])

Specific metrics to include in the heatmap.

group_label_right_margin (int)

Right margin for group labels.

average_value_left_margin (int)

Left margin for average values.

plot_padding (float)

Padding between heatmap and axes.

Default Values

output_file = None
model_name = None
figsize = (12, 12)
colormap = "YlGnBu"
bin_count = 10
score_range = (0, 1)
use_percentage = True
show_averages = False
show_group_labels = False
show_annotations = False
annotation_format = None
title = None
title_fontsize = 14
title_pad = None
labels = ["Metrics", "Scores"]
label_fontsize = 12
dpi = 300
group_metrics = False
metric_groups = None
group_colors = None
metrics_to_include = ["overall_accuracy", "overall_composition_accuracy", "overall_synthesis_accuracy", "precision", "recall", "f1_score", "normalized_precision", "normalized_recall", "normalized_f1_score"]
group_label_right_margin = 1
average_value_left_margin = 1
plot_padding = 0.1
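
A sketch enabling metric grouping, per-metric averages, and annotated cells (the file names and annotation format are illustrative):

from comproscanner import eval_visualizer

fig = eval_visualizer.plot_single_performance_heatmap(
    result_file="evaluation.json",
    output_file="heatmap_grouped.png",
    group_metrics=True,          # visually group related metrics
    show_group_labels=True,
    show_averages=True,          # average score per metric
    show_annotations=True,
    annotation_format=".1f",     # one decimal place per cell
)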


plot_single_histogram_chart()

Create a histogram for a single metric from evaluation results.

fig = eval_visualizer.plot_single_histogram_chart(
    result_file="evaluation.json",
    metric_name="overall_accuracy",
    output_file="histogram.png"
)

Required Parameters

Either result_file OR result_dict must be provided.

result_file (str)

Path to JSON file containing evaluation results.

result_dict (dict)

Dictionary containing evaluation results.

Optional Parameters

metric_name (str)

Name of the metric to plot.

output_file (str)

Path to save the output plot image.

model_name (str)

Name of the model for display in the plot title.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

bins (int)

Number of bins, or a sequence of bin edges, for the histogram.

color (str)

Color for the histogram bars.

color_gradient (bool)

Whether to use color gradient for histogram bars.

gradient_colors (List[str])

List of colors for gradient.

show_kde (bool)

Whether to show a Kernel Density Estimation (KDE) curve over the histogram.

show_mean (bool)

Whether to show a vertical line at the mean value.

mean_color (str)

Color for the mean line.

mean_line_style (str)

Line style for the mean line.

show_median (bool)

Whether to show a vertical line at the median value.

median_color (str)

Color for the median line.

median_line_style (str)

Line style for the median line.

show_threshold (bool)

Whether to show a threshold line.

threshold_value (float)

Value for the threshold line.

threshold_color (str)

Color for the threshold line.

threshold_line_style (str)

Line style for the threshold line.

title (str)

Custom title for the plot.

title_fontsize (int)

Font size for the title.

xlabel (str)

Custom label for x-axis.

ylabel (str)

Label for y-axis.

xlabel_fontsize (int)

Font size for x-axis label.

ylabel_fontsize (int)

Font size for y-axis label.

legend_loc (str)

Location for the legend.

bbox_to_anchor (Tuple[float, float])

Bounding box for the legend.

dpi (int)

DPI for output image.

Default Values

metric_name = "overall_accuracy"
output_file = None
model_name = None
figsize = (8, 6)
bins = 10
color = "skyblue"
color_gradient = False
gradient_colors = None
show_kde = False
show_mean = False
mean_color = "green"
mean_line_style = "-"
show_median = False
median_color = "black"
median_line_style = "-"
show_threshold = False
threshold_value = 0.8
threshold_color = "red"
threshold_line_style = "--"
title = None
title_fontsize = 14
xlabel = None
ylabel = "Count"
xlabel_fontsize = 12
ylabel_fontsize = 12
legend_loc = "best"
bbox_to_anchor = None
dpi = 300
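
A sketch combining several of the overlay options (the metric and threshold value are illustrative):

from comproscanner import eval_visualizer

fig = eval_visualizer.plot_single_histogram_chart(
    result_file="evaluation.json",
    metric_name="normalized_f1_score",
    output_file="f1_histogram.png",
    bins=20,
    show_kde=True,               # overlay a density estimate
    show_mean=True,
    show_threshold=True,
    threshold_value=0.9,         # stricter cutoff than the 0.8 default
)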


plot_single_violin_chart()

Create a violin plot for all metrics from a single model's evaluation results.

fig = eval_visualizer.plot_single_violin_chart(
    result_file="evaluation.json",
    output_file="violin.png"
)

Required Parameters

Either result_file OR result_dict must be provided.

result_file (str)

Path to JSON file containing evaluation results.

result_dict (dict)

Dictionary containing evaluation results.

Optional Parameters

output_file (str)

Path to save the output plot image.

model_name (str)

Name of the model for display in the plot.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

colormap (str)

Matplotlib colormap name for the violins.

title (str)

Custom title for the plot.

title_fontsize (int)

Font size for the title text.

title_pad (float)

Padding for the title from the top of the plot.

show_box (bool)

Whether to show a box plot inside the violin.

show_mean (bool)

Whether to show the mean marker.

mean_marker (str)

Marker style for the mean.

mean_color (str)

Color for the mean marker.

show_median (bool)

Whether to show the median line.

median_color (str)

Color for the median line.

median_line_style (str)

Line style for the median.

show_grid (bool)

Whether to display grid lines.

show_threshold (bool)

Whether to show a threshold line.

threshold_value (float)

Value for the threshold line.

threshold_color (str)

Color for the threshold line.

threshold_line_style (str)

Line style for the threshold line.

violin_alpha (float)

Alpha (transparency) of the violin plots (0-1).

violin_width (float)

Width of the violin plots.

x_label (str)

Label for the x-axis.

y_label (str)

Label for the y-axis.

x_label_fontsize (int)

Font size for x-axis label.

y_label_fontsize (int)

Font size for y-axis label.

y_axis_range (Tuple[float, float])

Range for the y-axis.

label_rotation (int)

Rotation angle for x-axis labels.

inner (str)

The representation of the data points inside the violin ('box', 'stick', 'point', or None).

dpi (int)

DPI for output image.

metrics_to_include (List[str])

Specific metrics to include in the plot.

Default Values

output_file = None
model_name = None
figsize = (14, 10)
colormap = "Blues"
title = None
title_fontsize = 14
title_pad = 10.0
show_box = False
show_mean = True
mean_marker = "o"
mean_color = "red"
show_median = False
median_color = "green"
median_line_style = "-"
show_grid = True
show_threshold = False
threshold_value = 0.8
threshold_color = "red"
threshold_line_style = "--"
violin_alpha = 0.7
violin_width = 0.8
x_label = "Metrics"
y_label = "Score"
x_label_fontsize = 12
y_label_fontsize = 12
y_axis_range = (0, 1)
label_rotation = 45
inner = "box"
dpi = 300
metrics_to_include = ["overall_accuracy", "overall_composition_accuracy", "overall_synthesis_accuracy", "precision", "recall", "f1_score", "normalized_precision", "normalized_recall", "normalized_f1_score"]


Multi-Model Comparison Visualizations

plot_multiple_bar_charts()

Plot evaluation metrics from multiple result files or dictionaries as grouped bar charts.

fig = eval_visualizer.plot_multiple_bar_charts(
    result_sources=["model1.json", "model2.json"],
    output_file="comparison.png"
)

Required Parameters

Either result_sources OR folder_path must be provided.

result_sources (Union[List[str], List[Dict], str])

List of JSON file paths or dictionaries containing evaluation results for multiple models.

folder_path (str)

Path to folder containing JSON result files.

Optional Parameters

output_file (str)

Path to save the output plot image.

model_names (List[str])

Names of the models to display in the legend. Defaults to the filename or the agent_model_name field from the results.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

colormap (str)

Matplotlib colormap name for the bars.

display_values (bool)

Whether to display metric values on bars.

title (str)

Custom title for the plot.

typical_threshold (float)

Threshold value to display as a horizontal line. If not provided, no line is drawn.

threshold_line_style (str)

Style of the threshold line.

threashold_tolerance_range (float)

Tolerance range within which the threshold line is interrupted over the bars, so the line stays visible.

threshold_color (str)

Color for the threshold line.

show_grid (bool)

Whether to display horizontal grid lines in the plot.

y_label (str)

Label for the y-axis.

x_label (str)

Label for the x-axis.

group_width (float)

Width allocated for each group of bars (0-1).

bar_width (float)

Width of individual bars. Calculated automatically if None.

legend_loc (str)

Location of the legend.

legend_fontsize (int)

Font size for the legend.

y_axis_range (Tuple[float, float])

Range for the y-axis.

dpi (int)

DPI for output image.

metrics_to_include (List[str])

List of metrics to include in the plot.

Default Values

output_file = None
model_names = None
figsize = (14, 10)
colormap = "Blues"
display_values = True
title = None
typical_threshold = None
threshold_line_style = "--"
threashold_tolerance_range = 0.03
threshold_color = "red"
show_grid = True
y_label = "Score"
x_label = None
group_width = 0.8
bar_width = None
legend_loc = "best"
legend_fontsize = 10
y_axis_range = (0, 1)
dpi = 300
metrics_to_include = ["overall_accuracy", "overall_composition_accuracy", "overall_synthesis_accuracy", "precision", "recall", "f1_score", "normalized_precision", "normalized_recall", "normalized_f1_score"]


plot_multiple_radar_charts()

Plot evaluation metrics from multiple result files or dictionaries as a radar chart.

fig = eval_visualizer.plot_multiple_radar_charts(
    result_sources=["model1.json", "model2.json"],
    output_file="radar_comparison.png"
)

Required Parameters

Either result_sources OR folder_path must be provided.

result_sources (Union[List[str], List[Dict], str])

List of JSON file paths or dictionaries containing evaluation results for multiple models.

folder_path (str)

Path to folder containing JSON result files.

Optional Parameters

output_file (str)

Path to save the output plot image.

model_names (List[str])

Names of models to display in the legend.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

colormap (str)

Matplotlib colormap name for the plot lines and markers.

display_values (bool)

Whether to display metric values on the chart.

title (str)

Custom title for the plot.

title_fontsize (int)

Font size for the title.

title_pad (float)

Padding for the title from the top of the plot.

typical_threshold (float)

Threshold value to display as a circular line.

threshold_color (str)

Color for the threshold line.

threshold_line_style (str)

Style of the threshold line.

label_fontsize (int)

Font size for axis labels.

value_fontsize (int)

Font size for displayed values.

legend_loc (str)

Location of the legend.

bbox_to_anchor (Tuple[float, float])

Bounding box for the legend.

legend_fontsize (int)

Font size for the legend.

show_grid (bool)

Whether to display grid lines.

show_grid_labels (bool)

Whether to display grid line values/labels.

grid_line_width (float)

Width of the grid lines.

grid_line_style (str)

Style of the grid lines.

grid_line_color (str)

Color of the grid lines.

grid_line_alpha (float)

Alpha (transparency) of the grid lines (0-1).

fill_alpha (float)

Alpha (transparency) of the filled area (0-1).

marker_size (int)

Size of the data point markers.

line_width (float)

Width of the plot lines.

label_padding (float)

Padding between the axis labels and the plot.

clockwise (bool)

Whether to draw the radar chart in a clockwise direction.

start_angle (float)

Start angle in radians.

radar_range (Tuple[float, float])

Range for the radar axes.

dpi (int)

DPI for output image.

metrics_to_include (List[str])

List of metrics to include in the plot.

Default Values

output_file = None
model_names = None
figsize = (12, 10)
colormap = "viridis"
display_values = False
title = None
title_fontsize = 14
title_pad = 50.0
typical_threshold = None
threshold_color = "red"
threshold_line_style = "--"
label_fontsize = 12
value_fontsize = 10
legend_loc = "best"
bbox_to_anchor = None
legend_fontsize = 10
show_grid = True
show_grid_labels = False
grid_line_width = 1.0
grid_line_style = "-"
grid_line_color = "gray"
grid_line_alpha = 0.2
fill_alpha = 0.25
marker_size = 7
line_width = 2
label_padding = 0.25
clockwise = True
start_angle = np.pi / 2
radar_range = (0, 1)
dpi = 300
metrics_to_include = ["overall_accuracy", "overall_composition_accuracy", "overall_synthesis_accuracy", "precision", "recall", "f1_score", "normalized_precision", "normalized_recall", "normalized_f1_score"]


plot_multiple_performance_heatmaps()

Create a heatmap showing the distribution of scores across metrics for multiple models.

fig = eval_visualizer.plot_multiple_performance_heatmaps(
    result_sources=["model1.json", "model2.json"],
    output_file="heatmaps.png"
)

Required Parameters

Either result_sources OR folder_path must be provided.

result_sources (Union[List[str], List[Dict], str])

List of JSON file paths or dictionaries containing evaluation results for multiple models.

folder_path (str)

Path to folder containing JSON result files.

Optional Parameters

output_file (str)

Path to save the output visualization.

model_names (List[str])

Names to display for models in the plots.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

colormap (str)

Matplotlib colormap name for the heatmap.

bin_count (int)

Number of bins to divide the score range into.

score_range (Tuple[float, float])

Min and max values for score bins.

use_percentage (bool)

Whether to show percentages (True) or counts (False).

show_averages (bool)

Whether to show average scores per metric group and model.

show_group_labels (bool)

Whether to show metric group labels.

show_annotations (bool)

Whether to show value annotations in cells.

annotation_format (str)

Format string for annotations (e.g., '.1f' or 'd').

title (str)

Custom title for the plot.

title_fontsize (int)

Font size for the title.

labels (List[str])

Labels for the x and y axes.

label_fontsize (int)

Font size for the axis labels.

dpi (int)

Resolution for saved image.

group_metrics (bool)

Whether to visually group related metrics.

metric_groups (List[Dict])

Custom metric groups definition.

group_colors (List[str])

Colors for metric groups.

metrics_to_include (List[str])

Specific metrics to include. If None, includes all available.

sort_models_by (str)

Metric to sort models by when displaying multiple models.

combine_models (bool)

Whether to combine all models into a single distribution plot.

group_label_right_margin (int)

Right margin for group labels.

average_value_left_margin (int)

Left margin for average values.

plot_padding (float)

Padding between the heatmap and the axis labels and title.

Default Values

output_file = None
model_names = None
figsize = (14, 12)
colormap = "YlGnBu"
bin_count = 10
score_range = (0, 1)
use_percentage = True
show_averages = False
show_group_labels = False
show_annotations = False
annotation_format = None
title = None
title_fontsize = 14
labels = ["Metrics", "Scores"]
label_fontsize = 12
dpi = 300
group_metrics = True
metric_groups = None
group_colors = None
metrics_to_include = ["overall_accuracy", "overall_composition_accuracy", "overall_synthesis_accuracy", "precision", "recall", "f1_score", "normalized_precision", "normalized_recall", "normalized_f1_score"]
sort_models_by = "overall_accuracy"
combine_models = False
group_label_right_margin = 1
average_value_left_margin = 1
plot_padding = 0.1
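
A sketch sorting the model panels by overall accuracy and annotating cells with whole-number percentages (the file names are placeholders):

from comproscanner import eval_visualizer

fig = eval_visualizer.plot_multiple_performance_heatmaps(
    result_sources=["model1.json", "model2.json"],
    output_file="heatmaps.png",
    sort_models_by="overall_accuracy",   # order panels by this metric
    use_percentage=True,
    show_annotations=True,
    annotation_format=".0f",             # whole-number percentages
)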


plot_multiple_confusion_matrices_combined()

Create a confusion matrix-style heatmap showing all models vs all performance metrics in a single visualization.

fig = eval_visualizer.plot_multiple_confusion_matrices_combined(
    result_sources=["model1.json", "model2.json"],
    output_file="confusion_matrices.png"
)

Required Parameters

Either result_sources OR folder_path must be provided.

result_sources (Union[List[str], List[Dict], str])

List of JSON file paths or dictionaries containing evaluation results for multiple models.

folder_path (str)

Path to folder containing JSON result files.

Optional Parameters

output_file (str)

Path to save the output visualization.

model_names (List[str])

Names to display for models in the plot.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

colormap (str)

Matplotlib colormap name for the heatmap.

show_annotations (bool)

Whether to show value annotations in cells.

annotation_format (str)

Format string for annotations (e.g., '.2f' or '.1f').

annotation_fontsize (int)

Font size for the annotation values inside cells.

title (str)

Custom title for the plot.

title_fontsize (int)

Font size for the title.

title_pad (float)

Padding for the title from the top of the plot.

labels (List[str])

Labels for the x and y axes.

label_fontsize (int)

Font size for the axis labels.

tick_label_fontsize (int)

Font size for x and y tick labels.

dpi (int)

Resolution for saved image.

metrics_to_include (List[str])

Specific metrics to include. Default includes all 9 standard metrics.

sort_models_by (str)

Metric to sort models by, or "average" for average of all metrics.

value_range (Tuple[float, float])

Min and max values for color mapping.

show_colorbar (bool)

Whether to show the colorbar legend.

colorbar_label (str)

Label for the colorbar.

colorbar_fontsize (int)

Font size for colorbar labels.

plot_padding (float)

Padding between the heatmap and the axis labels and title.

Default Values

output_file = None
model_names = None
figsize = (14, 10)
colormap = "YlOrRd"
show_annotations = True
annotation_format = None
annotation_fontsize = 10
title = None
title_fontsize = 14
title_pad = 20.0
labels = ["Models", "Metrics"]
label_fontsize = 12
tick_label_fontsize = 10
dpi = 300
metrics_to_include = ["overall_accuracy", "overall_composition_accuracy", "overall_synthesis_accuracy", "precision", "recall", "f1_score", "normalized_precision", "normalized_recall", "normalized_f1_score"]
sort_models_by = "average"
value_range = (0, 1)
show_colorbar = True
colorbar_label = "Score"
colorbar_fontsize = 10
plot_padding = 0.1
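
A sketch ranking models by their average score across all metrics (the file names and colormap choice are illustrative):

from comproscanner import eval_visualizer

fig = eval_visualizer.plot_multiple_confusion_matrices_combined(
    result_sources=["model1.json", "model2.json"],
    output_file="model_metric_matrix.png",
    sort_models_by="average",        # rank models by mean of all metrics
    annotation_format=".2f",
    colormap="YlGnBu",               # alternative to the YlOrRd default
)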


plot_multiple_histogram_charts()

Create histograms for a single metric from evaluation results for multiple models.

fig = eval_visualizer.plot_multiple_histogram_charts(
    result_sources=["model1.json", "model2.json"],
    metric_name="overall_accuracy",
    output_file="histograms.png"
)

Required Parameters

Either result_sources OR folder_path must be provided.

result_sources (Union[List[str], List[Dict], str])

List of JSON file paths or dictionaries containing evaluation results for multiple models.

folder_path (str)

Path to folder containing JSON result files.

Optional Parameters

output_file (str)

Path to save the output plot image.

model_names (List[str])

Names of the models for display in the plot titles.

metric_name (str)

Name of the metric to plot.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

bins (int)

Number of bins, or a sequence of bin edges, for the histogram.

colormap (str)

Matplotlib colormap name for the histogram colors.

show_kde (bool)

Whether to show a KDE curve over the histogram.

kde_alpha (float)

Alpha value for the KDE curve.

show_mean (bool)

Whether to show a vertical line at the mean value.

mean_color (str)

Color for the mean line.

mean_line_style (str)

Line style for the mean line.

show_median (bool)

Whether to show a vertical line at the median value.

median_color (str)

Color for the median line.

median_line_style (str)

Line style for the median line.

show_threshold (bool)

Whether to show a threshold line.

threshold_value (float)

Value for the threshold line.

threshold_color (str)

Color for the threshold line.

threshold_line_style (str)

Line style for the threshold line.

show_grid (bool)

Whether to show grid lines on the plot.

title (str)

Custom title for the plot.

title_fontsize (int)

Font size for the title.

xlabel (str)

Custom label for x-axis.

ylabel (str)

Label for y-axis.

xlabel_fontsize (int)

Font size for x-axis label.

ylabel_fontsize (int)

Font size for y-axis label.

legend_loc (str)

Location for the legend.

legend_fontsize (int)

Font size for the legend.

bbox_to_anchor (Tuple[float, float])

Bounding box for the legend.

is_normalized (bool)

Whether to normalize histograms to show percentages.

shared_bins (bool)

Whether to use shared bins across all histograms.

dpi (int)

DPI for the output image.

Default Values

output_file = None
model_names = None
metric_name = "overall_accuracy"
figsize = (14, 12)
bins = 10
colormap = "tab10"
show_kde = False
kde_alpha = 0.7
show_mean = False
mean_color = "green"
mean_line_style = "-"
show_median = False
median_color = "black"
median_line_style = "-"
show_threshold = False
threshold_value = 0.8
threshold_color = "red"
threshold_line_style = "--"
show_grid = True
title = None
title_fontsize = 14
xlabel = None
ylabel = "Count"
xlabel_fontsize = 12
ylabel_fontsize = 12
legend_loc = "best"
legend_fontsize = 10
bbox_to_anchor = None
is_normalized = True
shared_bins = True
dpi = 300
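
A sketch comparing score distributions with shared bins and normalized counts, so models evaluated on different numbers of samples remain comparable (the file names are placeholders):

from comproscanner import eval_visualizer

fig = eval_visualizer.plot_multiple_histogram_charts(
    result_sources=["model1.json", "model2.json"],
    metric_name="overall_accuracy",
    output_file="accuracy_histograms.png",
    shared_bins=True,        # identical bin edges across models
    is_normalized=True,      # percentages rather than raw counts
    show_mean=True,
)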


plot_multiple_violin_charts()

Create violin plots comparing multiple models on a single metric.

fig = eval_visualizer.plot_multiple_violin_charts(
    result_sources=["model1.json", "model2.json"],
    metric_name="overall_accuracy",
    output_file="violins.png"
)

Required Parameters

Either result_sources OR folder_path must be provided.

result_sources (Union[List[str], List[Dict], str])

List of JSON file paths or dictionaries containing evaluation results for multiple models.

folder_path (str)

Path to folder containing JSON result files.

Optional Parameters

output_file (str)

Path to save the output visualization.

model_names (List[str])

Names to display for models in the plot.

metric_name (str)

Name of the metric to compare across models.

figsize (Tuple[int, int])

Figure size (width, height) in inches.

colormap (str)

Matplotlib colormap name for the violins.

title (str)

Custom title for the plot.

title_fontsize (int)

Font size for the title text.

title_pad (float)

Padding for the title from the top of the plot.

show_box (bool)

Whether to show a box plot inside the violin.

show_mean (bool)

Whether to show the mean marker.

mean_marker (str)

Marker style for the mean.

mean_color (str)

Color for the mean marker.

show_median (bool)

Whether to show the median line.

median_color (str)

Color for the median line.

median_line_style (str)

Line style for the median.

show_grid (bool)

Whether to display grid lines.

show_threshold (bool)

Whether to show a threshold line.

threshold_value (float)

Value for the threshold line.

threshold_color (str)

Color for the threshold line.

threshold_line_style (str)

Line style for the threshold line.

violin_alpha (float)

Alpha (transparency) of the violin plots (0-1).

violin_width (float)

Width of the violin plots.

x_label (str)

Label for the x-axis.

y_label (str)

Label for the y-axis.

x_label_fontsize (int)

Font size for x-axis label.

y_label_fontsize (int)

Font size for y-axis label.

y_axis_range (Tuple[float, float])

Range for the y-axis.

label_rotation (int)

Rotation angle for x-axis labels.

inner (str)

The representation of the data points inside the violin ('box', 'stick', 'point', or None).

dpi (int)

Resolution for saved image.

Default Values

output_file = None
model_names = None
metric_name = "overall_accuracy"
figsize = (12, 8)
colormap = "viridis"
title = None
title_fontsize = 14
title_pad = 50.0
show_box = False
show_mean = True
mean_marker = "o"
mean_color = "red"
show_median = False
median_color = "green"
median_line_style = "-"
show_grid = True
show_threshold = False
threshold_value = 0.8
threshold_color = "red"
threshold_line_style = "--"
violin_alpha = 0.7
violin_width = 0.8
x_label = "Models"
y_label = "Score"
x_label_fontsize = 12
y_label_fontsize = 12
y_axis_range = (0, 1)
label_rotation = 45
inner = "box"
dpi = 300
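
A sketch comparing models on normalized F1 with a threshold line and horizontal model labels (the file names and threshold are illustrative):

from comproscanner import eval_visualizer

fig = eval_visualizer.plot_multiple_violin_charts(
    result_sources=["model1.json", "model2.json"],
    metric_name="normalized_f1_score",
    output_file="f1_violins.png",
    show_box=True,
    show_threshold=True,
    threshold_value=0.8,
    label_rotation=0,        # keep model names horizontal
)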


Next Steps