
Data Visualization

Comprehensive visualization tools for extracted composition-property data, synthesis information, and material families.

Basic Usage

from comproscanner import data_visualizer

# Family pie chart
fig = data_visualizer.plot_family_pie_chart(
    data_sources=["results.json"],
    output_file="families.png"
)

# Knowledge graph
data_visualizer.create_knowledge_graph(
    result_file="results.json"
)

Available Functions

plot_family_pie_chart()

Create a pie chart showing the distribution of material families.

from comproscanner import data_visualizer

fig = data_visualizer.plot_family_pie_chart(
    data_sources=["results.json"],
    output_file="families.png"
)

Required Parameters

Either data_sources OR folder_path must be provided.

data_sources (Union[List[str], List[Dict], str])

A single JSON file path, a list of JSON file paths, or a list of dictionaries containing materials data.

folder_path (str)

Path to folder containing JSON data files.
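The input rules above can be pictured with a minimal sketch that normalizes every accepted form into a flat list of records. It assumes a JSON file holds either one record (an object) or a list of records, and that folder_path is scanned non-recursively for `*.json` files; `load_records` is a hypothetical helper for illustration, not part of the comproscanner API.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional, Union


def load_records(
    data_sources: Optional[Union[List[str], List[Dict], str]] = None,
    folder_path: Optional[str] = None,
) -> List[Dict]:
    """Hypothetical helper: turn any accepted input form into a flat list of records."""
    if data_sources is None and folder_path is None:
        raise ValueError("Either data_sources or folder_path must be provided.")

    if data_sources is not None:
        # This sketch prefers data_sources when both arguments are given.
        sources = [data_sources] if isinstance(data_sources, str) else list(data_sources)
    else:
        # Assumption: every *.json file directly inside the folder is a data file.
        sources = sorted(str(p) for p in Path(folder_path).glob("*.json"))

    records: List[Dict] = []
    for source in sources:
        if isinstance(source, dict):
            records.append(source)  # already-loaded data is used as-is
            continue
        with open(source, encoding="utf-8") as fh:
            content = json.load(fh)
        # Assumption: a file holds either one record (object) or a list of records.
        if isinstance(content, list):
            records.extend(content)
        else:
            records.append(content)
    return records
```

A single path string is treated like a one-element list, which matches the `str` member of the documented type.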

Optional Parameters

output_file (str)

Path to save the output plot image. If None, the plot is not saved.

figsize (Tuple[int, int])

Figure size as (width, height) in inches.

dpi (int)

DPI for output image.

min_percentage (float)

Minimum percentage for a category to be shown separately. Categories below this threshold are grouped into "Others".

title (str)

Title for the plot.

color_palette (str)

Matplotlib colormap name for the pie sections.

title_fontsize (int)

Font size for the title.

label_fontsize (int)

Font size for the percentage labels.

legend_fontsize (int)

Font size for the legend.

is_semantic_clustering_enabled (bool)

Whether to use semantic similarity for clustering similar families.

similarity_threshold (float)

Similarity threshold for clustering, between 0 and 1. Higher values require families to be more similar before they are grouped together.

Default Values

output_file = None
figsize = (10, 8)
dpi = 300
min_percentage = 1.0
title = "Distribution of Material Families"
color_palette = "Blues"
title_fontsize = 14
label_fontsize = 10
legend_fontsize = 10
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
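The min_percentage rule can be sketched as follows: each category's share of the total is computed, and every category below the threshold is folded into a single "Others" slice. This is an assumed reading of the parameter description rather than the library's code; `group_small_categories` and the family names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def group_small_categories(
    labels: Iterable[str], min_percentage: float = 1.0
) -> Dict[str, float]:
    """Hypothetical sketch: return {category: percentage}, folding small ones into "Others"."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares: Dict[str, float] = {}
    others = 0.0
    for label, count in counts.most_common():
        pct = 100.0 * count / total
        if pct < min_percentage:
            others += pct  # below the threshold: merged into one slice
        else:
            shares[label] = pct
    if others > 0:
        shares["Others"] = others
    return shares


families = ["perovskite"] * 6 + ["spinel"] * 3 + ["garnet"]
print(group_small_categories(families, min_percentage=15.0))
# {'perovskite': 60.0, 'spinel': 30.0, 'Others': 10.0}
```

With the default of 1.0, "garnet" (10%) would keep its own slice.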


plot_family_histogram()

Create a histogram showing how frequently each material family occurs.

from comproscanner import data_visualizer

fig = data_visualizer.plot_family_histogram(
    data_sources=["results.json"],
    output_file="families_hist.png"
)

Required Parameters

Either data_sources OR folder_path must be provided.

data_sources (Union[List[str], List[Dict], str])

A single JSON file path, a list of JSON file paths, or a list of dictionaries containing materials data.

folder_path (str)

Path to folder containing JSON data files.

Optional Parameters

output_file (str)

Path to save the output plot image. If None, the plot is not saved.

figsize (Tuple[int, int])

Figure size as (width, height) in inches.

dpi (int)

DPI for output image.

max_items (int)

Maximum number of items to display. Shows top N most frequent items.

title (str)

Title for the plot.

color_palette (str)

Matplotlib colormap name for the bars.

x_label (str)

Label for the x-axis.

y_label (str)

Label for the y-axis.

rotation (int)

Rotation angle for x-axis labels in degrees.

title_fontsize (int)

Font size for the title.

xlabel_fontsize (int)

Font size for the x-axis label.

ylabel_fontsize (int)

Font size for the y-axis label.

xtick_fontsize (int)

Font size for the x-axis tick labels.

value_label_fontsize (int)

Font size for the value labels displayed on top of bars.

grid_axis (str)

Axis for grid lines. Options: "x", "y", "both", or None for no grid.

grid_linestyle (str)

Line style for grid lines (e.g., "--", "-", ":", "-.").

grid_alpha (float)

Alpha (transparency) of the grid lines, between 0 (fully transparent) and 1 (fully opaque).

is_semantic_clustering_enabled (bool)

Whether to enable semantic clustering of families.

similarity_threshold (float)

Similarity threshold for clustering, between 0 and 1.

Default Values

output_file = None
figsize = (12, 8)
dpi = 300
max_items = 15
title = "Frequency Distribution of Material Families"
color_palette = "Blues"
x_label = "Material Family"
y_label = "Frequency"
rotation = 45
title_fontsize = 14
xlabel_fontsize = 12
ylabel_fontsize = 12
xtick_fontsize = 10
value_label_fontsize = 9
grid_axis = "y"
grid_linestyle = "--"
grid_alpha = 0.3
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
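max_items keeps only the N most frequent entries. The sketch below shows that selection with `collections.Counter.most_common`, which orders equal counts by first occurrence; whether comproscanner breaks ties the same way is an assumption, and `top_n_frequencies` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def top_n_frequencies(labels: List[str], max_items: int = 15) -> List[Tuple[str, int]]:
    """Hypothetical sketch: the (label, count) pairs a histogram would draw, tallest first."""
    return Counter(labels).most_common(max_items)


families = ["perovskite", "spinel", "perovskite", "garnet", "spinel", "perovskite"]
print(top_n_frequencies(families, max_items=2))
# [('perovskite', 3), ('spinel', 2)]
```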


plot_precursors_pie_chart()

Create a pie chart showing the distribution of precursors used in materials synthesis.

from comproscanner import data_visualizer

fig = data_visualizer.plot_precursors_pie_chart(
    data_sources=["results.json"],
    output_file="precursors_pie.png"
)

Required Parameters

Either data_sources OR folder_path must be provided.

data_sources (Union[List[str], List[Dict], str])

A single JSON file path, a list of JSON file paths, or a list of dictionaries containing materials data.

folder_path (str)

Path to folder containing JSON data files.

Optional Parameters

output_file (str)

Path to save the output plot image. If None, the plot is not saved.

figsize (Tuple[int, int])

Figure size as (width, height) in inches.

dpi (int)

DPI for output image.

min_percentage (float)

Minimum percentage for a category to be shown separately. Categories below this threshold are grouped into "Others".

title (str)

Title for the plot.

color_palette (str)

Matplotlib colormap name for the pie sections.

title_fontsize (int)

Font size for the title.

label_fontsize (int)

Font size for the percentage labels.

legend_fontsize (int)

Font size for the legend.

is_semantic_clustering_enabled (bool)

Whether to use semantic similarity for clustering similar precursors.

similarity_threshold (float)

Threshold for similarity-based clustering when is_semantic_clustering_enabled is True (0-1).

Default Values

output_file = None
figsize = (10, 8)
dpi = 300
min_percentage = 1.0
title = "Distribution of Precursors in Materials Synthesis"
color_palette = "Blues"
title_fontsize = 14
label_fontsize = 10
legend_fontsize = 10
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
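How similarity_threshold controls merging can be illustrated with greedy threshold clustering. The semantic similarity measure comproscanner uses is not described on this page, so this sketch swaps in plain character-level similarity from `difflib.SequenceMatcher` (not an embedding model): a name joins the first existing cluster whose representative scores at or above the threshold, otherwise it starts a new cluster. `cluster_names` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def cluster_names(names: List[str], similarity_threshold: float = 0.8) -> Dict[str, List[str]]:
    """Greedy threshold clustering; string similarity stands in for semantic similarity."""
    clusters: Dict[str, List[str]] = {}
    for name in names:
        for representative in clusters:
            score = SequenceMatcher(None, name.lower(), representative.lower()).ratio()
            if score >= similarity_threshold:
                clusters[representative].append(name)
                break
        else:
            # No sufficiently similar cluster exists: start a new one.
            clusters[name] = [name]
    return clusters


precursors = ["TiO2", "TiO₂", "BaCO3"]
print(cluster_names(precursors, similarity_threshold=0.7))
# {'TiO2': ['TiO2', 'TiO₂'], 'BaCO3': ['BaCO3']}
print(cluster_names(precursors, similarity_threshold=0.8))
# {'TiO2': ['TiO2'], 'TiO₂': ['TiO₂'], 'BaCO3': ['BaCO3']}
```

Here "TiO2" and "TiO₂" score 0.75, so they merge at 0.7 but stay separate at the default 0.8, consistent with higher thresholds requiring more similarity.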


plot_precursors_histogram()

Create a histogram showing how frequently each precursor occurs in materials synthesis.

from comproscanner import data_visualizer

fig = data_visualizer.plot_precursors_histogram(
    data_sources=["results.json"],
    output_file="precursors_hist.png"
)

Required Parameters

Either data_sources OR folder_path must be provided.

data_sources (Union[List[str], List[Dict], str])

A single JSON file path, a list of JSON file paths, or a list of dictionaries containing materials data.

folder_path (str)

Path to folder containing JSON data files.

Optional Parameters

output_file (str)

Path to save the output plot image. If None, the plot is not saved.

figsize (Tuple[int, int])

Figure size as (width, height) in inches.

dpi (int)

DPI for output image.

max_items (int)

Maximum number of items to display. Shows top N most frequent items.

title (str)

Title for the plot.

color_palette (str)

Matplotlib colormap name for the bars.

x_label (str)

Label for the x-axis.

y_label (str)

Label for the y-axis.

rotation (int)

Rotation angle for x-axis labels in degrees.

title_fontsize (int)

Font size for the title.

xlabel_fontsize (int)

Font size for the x-axis label.

ylabel_fontsize (int)

Font size for the y-axis label.

xtick_fontsize (int)

Font size for the x-axis tick labels.

value_label_fontsize (int)

Font size for the value labels on bars.

grid_axis (str)

Axis for grid lines ('x', 'y', 'both', or None for no grid).

grid_linestyle (str)

Line style for grid lines (e.g., "--", "-", ":", "-.").

grid_alpha (float)

Alpha (transparency) of the grid lines, between 0 (fully transparent) and 1 (fully opaque).

is_semantic_clustering_enabled (bool)

Whether to enable semantic clustering of precursors.

similarity_threshold (float)

Similarity threshold for clustering, between 0 and 1.

Default Values

output_file = None
figsize = (12, 8)
dpi = 300
max_items = 15
title = "Frequency Distribution of Precursors in Materials Synthesis"
color_palette = "Blues"
x_label = "Precursor"
y_label = "Frequency"
rotation = 45
title_fontsize = 14
xlabel_fontsize = 12
ylabel_fontsize = 12
xtick_fontsize = 10
value_label_fontsize = 9
grid_axis = "y"
grid_linestyle = "--"
grid_alpha = 0.3
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
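When clustering is enabled, variant spellings presumably have to be merged before frequencies are counted, so a cluster's bar height is the sum of its members. A minimal sketch of that counting step, assuming the clustering stage yields a variant-to-representative mapping; `merged_top_n` and the `canonical` mapping are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def merged_top_n(
    labels: List[str], canonical: Dict[str, str], max_items: int = 15
) -> List[Tuple[str, int]]:
    """Count labels after mapping each variant to its cluster representative."""
    # Labels without an entry in `canonical` are counted under their own name.
    return Counter(canonical.get(label, label) for label in labels).most_common(max_items)


precursors = ["TiO2", "TiO₂", "BaCO3", "TiO2", "BaCO3", "Nb2O5"]
canonical = {"TiO₂": "TiO2"}  # hypothetical output of a clustering step
print(merged_top_n(precursors, canonical, max_items=2))
# [('TiO2', 3), ('BaCO3', 2)]
```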


plot_characterization_techniques_pie_chart()

Create a pie chart showing the distribution of characterization techniques.

from comproscanner import data_visualizer

fig = data_visualizer.plot_characterization_techniques_pie_chart(
    data_sources=["results.json"],
    output_file="techniques_pie.png"
)

Required Parameters

Either data_sources OR folder_path must be provided.

data_sources (Union[List[str], List[Dict], str])

A single JSON file path, a list of JSON file paths, or a list of dictionaries containing materials data.

folder_path (str)

Path to folder containing JSON data files.

Optional Parameters

output_file (str)

Path to save the output plot image. If None, the plot is not saved.

figsize (Tuple[int, int])

Figure size as (width, height) in inches.

dpi (int)

DPI for output image.

min_percentage (float)

Minimum percentage for a category to be shown separately. Categories below this threshold are grouped into "Others".

title (str)

Title for the plot.

color_palette (str)

Matplotlib colormap name for the pie sections.

is_semantic_clustering_enabled (bool)

Whether to use semantic similarity for clustering similar techniques.

similarity_threshold (float)

Threshold for similarity-based clustering when is_semantic_clustering_enabled is True, between 0 and 1.

title_fontsize (int)

Font size for the title.

label_fontsize (int)

Font size for the percentage labels.

legend_fontsize (int)

Font size for the legend.

Default Values

output_file = None
figsize = (10, 8)
dpi = 300
min_percentage = 1.0
title = "Distribution of Characterization Techniques"
color_palette = "Blues"
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
title_fontsize = 14
label_fontsize = 10
legend_fontsize = 10


plot_characterization_techniques_histogram()

Create a histogram showing how frequently each characterization technique occurs.

from comproscanner import data_visualizer

fig = data_visualizer.plot_characterization_techniques_histogram(
    data_sources=["results.json"],
    output_file="techniques_hist.png"
)

Required Parameters

Either data_sources OR folder_path must be provided.

data_sources (Union[List[str], List[Dict], str])

A single JSON file path, a list of JSON file paths, or a list of dictionaries containing materials data.

folder_path (str)

Path to folder containing JSON data files.

Optional Parameters

output_file (str)

Path to save the output plot image. If None, the plot is not saved.

figsize (Tuple[int, int])

Figure size as (width, height) in inches.

dpi (int)

DPI for output image.

max_items (int)

Maximum number of items to display. Shows top N most frequent items.

title (str)

Title for the plot.

color_palette (str)

Matplotlib colormap name for the bars.

x_label (str)

Label for the x-axis.

y_label (str)

Label for the y-axis.

rotation (int)

Rotation angle for x-axis labels in degrees.

is_semantic_clustering_enabled (bool)

Whether to use semantic similarity for clustering similar techniques.

similarity_threshold (float)

Threshold for similarity-based clustering when is_semantic_clustering_enabled is True, between 0 and 1.

title_fontsize (int)

Font size for the title.

xlabel_fontsize (int)

Font size for the x-axis label.

ylabel_fontsize (int)

Font size for the y-axis label.

xtick_fontsize (int)

Font size for the x-axis tick labels.

value_label_fontsize (int)

Font size for the value labels on bars.

grid_axis (str)

Axis for grid lines ('x', 'y', 'both', or None for no grid).

grid_linestyle (str)

Line style for grid lines (e.g., "--", "-", ":", "-.").

grid_alpha (float)

Alpha (transparency) of the grid lines, between 0 (fully transparent) and 1 (fully opaque).

Default Values

output_file = None
figsize = (14, 8)
dpi = 300
max_items = 15
title = "Frequency Distribution of Characterization Techniques"
color_palette = "Blues"
x_label = "Characterization Technique"
y_label = "Frequency"
rotation = 45
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
title_fontsize = 14
xlabel_fontsize = 12
ylabel_fontsize = 12
xtick_fontsize = 10
value_label_fontsize = 9
grid_axis = "y"
grid_linestyle = "--"
grid_alpha = 0.3


create_knowledge_graph()

Create a knowledge graph from the extracted composition-property data directly in a Neo4j database. The graph captures relationships between materials, families, precursors, synthesis methods, characterization techniques, and properties.

from comproscanner import data_visualizer

data_visualizer.create_knowledge_graph(
    result_file="results.json"
)

Required Parameters

result_file (str)

Path to the JSON file containing extracted results.

Optional Parameters

is_semantic_clustering_enabled (bool)

Whether to enable clustering of similar items using semantic similarity.

family_clustering_similarity_threshold (float)

Similarity threshold for clustering material families, between 0 and 1.

method_clustering_similarity_threshold (float)

Similarity threshold for clustering synthesis methods, between 0 and 1.

precursor_clustering_similarity_threshold (float)

Similarity threshold for clustering precursors, between 0 and 1.

technique_clustering_similarity_threshold (float)

Similarity threshold for clustering characterization techniques, between 0 and 1.

keyword_clustering_similarity_threshold (float)

Similarity threshold for clustering keywords, between 0 and 1.

Default Values

is_semantic_clustering_enabled = True
family_clustering_similarity_threshold = 0.9
method_clustering_similarity_threshold = 0.8
precursor_clustering_similarity_threshold = 0.9
technique_clustering_similarity_threshold = 0.8
keyword_clustering_similarity_threshold = 0.85
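The master switch and the five per-category thresholds can be thought of as one resolved configuration. The sketch below mirrors the default values listed above; the category keys, the `resolve_thresholds` name, and the assumption that turning is_semantic_clustering_enabled off disables clustering for every category are illustrative, not taken from the library.

```python
from typing import Dict, Optional

# Mirrors the documented defaults; the short category keys are hypothetical labels.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "family": 0.9,
    "method": 0.8,
    "precursor": 0.9,
    "technique": 0.8,
    "keyword": 0.85,
}


def resolve_thresholds(
    is_semantic_clustering_enabled: bool = True,
    overrides: Optional[Dict[str, float]] = None,
) -> Dict[str, Optional[float]]:
    """Hypothetical sketch: per-category thresholds, or None everywhere when clustering is off."""
    if not is_semantic_clustering_enabled:
        # Assumption: the master switch disables clustering for every category.
        return {category: None for category in DEFAULT_THRESHOLDS}
    thresholds: Dict[str, Optional[float]] = dict(DEFAULT_THRESHOLDS)
    for category, value in (overrides or {}).items():
        if category not in DEFAULT_THRESHOLDS:
            raise KeyError(f"unknown category: {category}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"threshold for {category} must be between 0 and 1")
        thresholds[category] = value
    return thresholds


print(resolve_thresholds(overrides={"keyword": 0.7})["keyword"])
# 0.7
```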

Neo4j Database Required

The knowledge graph is created directly in a Neo4j database. Before creating a knowledge graph, make sure Neo4j is running and that the following credentials are set in your .env file:

# neo4j
# Default URI for a local Neo4j instance: bolt://localhost:7687
NEO4J_URI=YOUR_NEO4J_URI
NEO4J_USER=YOUR_NEO4J_USERNAME
NEO4J_PASSWORD=YOUR_NEO4J_PASSWORD
NEO4J_DATABASE=YOUR_NEO4J_DATABASE_NAME
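A quick pre-flight check can confirm that these four variables are visible to the Python process before a graph is created. This minimal sketch reads only `os.environ` (or any mapping passed in); `check_neo4j_env` is a hypothetical helper, and loading the .env file into the environment, for example with python-dotenv's `load_dotenv()`, is assumed to happen beforehand.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "NEO4J_DATABASE")


def check_neo4j_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical pre-flight check: return the Neo4j settings or raise if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing Neo4j settings in environment/.env: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


# Example with explicit values instead of the real environment:
settings = check_neo4j_env({
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "change-me",
    "NEO4J_DATABASE": "neo4j",
})
print(settings["NEO4J_URI"])
# bolt://localhost:7687
```

Calling `check_neo4j_env()` with no argument inspects the real process environment instead.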

Output Format

All visualization functions (except create_knowledge_graph) return a matplotlib.figure.Figure object that can be viewed interactively:

from comproscanner import data_visualizer

fig = data_visualizer.plot_family_pie_chart(
    data_sources=["results.json"]
)

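# Show the plot. Figure.show() needs an interactive Matplotlib backend and does not
# run a GUI event loop, so in a plain script the window may close immediately;
# in a Jupyter notebook, evaluating `fig` in a cell also displays it.
fig.show()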

Next Steps