Data Visualization
Comprehensive visualization tools for extracted composition-property data, synthesis information, and material families.
Basic Usage
```python
from comproscanner import data_visualizer

# Family pie chart
fig = data_visualizer.plot_family_pie_chart(
    data_sources=["results.json"],
    output_file="families.png"
)

# Knowledge graph (requires a running, configured Neo4j database)
data_visualizer.create_knowledge_graph(
    result_file="results.json"
)
```
Available Functions
plot_family_pie_chart()
Create a pie chart showing the distribution of material families.
```python
from comproscanner import data_visualizer

fig = data_visualizer.plot_family_pie_chart(
    data_sources=["results.json"],
    output_file="families.png"
)
```
Required Parameters
Either data_sources OR folder_path must be provided.
data_sources (Union[List[str], List[Dict], str])
A path or list of paths to JSON result files, or a list of dictionaries containing materials data.
folder_path (str)
Path to a folder containing JSON data files.
Optional Parameters
output_file (str)
Path to save the output plot image. If None, the plot is not saved.
figsize (Tuple[int, int])
Figure size as (width, height) in inches.
dpi (int)
Resolution (DPI) of the output image.
min_percentage (float)
Minimum percentage for a category to get its own slice. Categories below this threshold are grouped into "Others".
title (str)
Title for the plot.
color_palette (str)
Matplotlib colormap name for the pie sections.
title_fontsize (int)
Font size for the title.
label_fontsize (int)
Font size for the percentage labels.
legend_fontsize (int)
Font size for the legend.
is_semantic_clustering_enabled (bool)
Whether to use semantic similarity to cluster similar families.
similarity_threshold (float)
Similarity threshold for clustering, between 0 and 1. Higher values require greater similarity before items are grouped.
Default Values
output_file = None
figsize = (10, 8)
dpi = 300
min_percentage = 1.0
title = "Distribution of Material Families"
color_palette = "Blues"
title_fontsize = 14
label_fontsize = 10
legend_fontsize = 10
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
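The `min_percentage` option folds rare categories into a single "Others" slice. As a rough illustration only, the standard-library sketch below counts labels and merges every category whose share falls below the threshold. `group_small_categories` is a hypothetical helper, not part of `comproscanner`, the family names are made-up sample data, and the library's own implementation may differ in details.

```python
from collections import Counter


def group_small_categories(labels, min_percentage=1.0):
    """Count labels and merge any category below min_percentage into "Others".

    Hypothetical helper illustrating the documented min_percentage behaviour;
    it is not the comproscanner implementation.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    grouped = {}
    others = 0
    for label, count in counts.most_common():
        share = 100.0 * count / total
        if share < min_percentage:
            others += count  # too small to get its own slice
        else:
            grouped[label] = count
    if others:
        grouped["Others"] = others
    return grouped


# Made-up sample data: 100 family labels in total.
families = ["perovskite"] * 60 + ["spinel"] * 39 + ["garnet"]
print(group_small_categories(families, min_percentage=5.0))
# {'perovskite': 60, 'spinel': 39, 'Others': 1}
```

With the default `min_percentage = 1.0`, the single "garnet" entry (exactly 1%) would keep its own slice, since only shares strictly below the threshold are merged in this sketch.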
plot_family_histogram()
Create a histogram showing the frequency distribution of material families.
```python
from comproscanner import data_visualizer

fig = data_visualizer.plot_family_histogram(
    data_sources=["results.json"],
    output_file="families_hist.png"
)
```
Required Parameters
Either data_sources OR folder_path must be provided.
data_sources (Union[List[str], List[Dict], str])
A path or list of paths to JSON result files, or a list of dictionaries containing materials data.
folder_path (str)
Path to a folder containing JSON data files.
Optional Parameters
output_file (str)
Path to save the output plot image. If None, the plot is not saved.
figsize (Tuple[int, int])
Figure size as (width, height) in inches.
dpi (int)
Resolution (DPI) of the output image.
max_items (int)
Maximum number of items to display; only the N most frequent items are shown.
title (str)
Title for the plot.
color_palette (str)
Matplotlib colormap name for the bars.
x_label (str)
Label for the x-axis.
y_label (str)
Label for the y-axis.
rotation (int)
Rotation angle for x-axis labels, in degrees.
title_fontsize (int)
Font size for the title.
xlabel_fontsize (int)
Font size for the x-axis label.
ylabel_fontsize (int)
Font size for the y-axis label.
xtick_fontsize (int)
Font size for the x-axis tick labels.
value_label_fontsize (int)
Font size for the value labels displayed on top of the bars.
grid_axis (str)
Axis for grid lines. Options: "x", "y", "both", or None for no grid.
grid_linestyle (str)
Line style for grid lines (e.g., "--", "-", ":", "-.").
grid_alpha (float)
Transparency (alpha) of the grid lines, between 0 and 1.
is_semantic_clustering_enabled (bool)
Whether to enable semantic clustering of families.
similarity_threshold (float)
Similarity threshold for clustering, between 0 and 1.
Default Values
output_file = None
figsize = (12, 8)
dpi = 300
max_items = 15
title = "Frequency Distribution of Material Families"
color_palette = "Blues"
x_label = "Material Family"
y_label = "Frequency"
rotation = 45
title_fontsize = 14
xlabel_fontsize = 12
ylabel_fontsize = 12
xtick_fontsize = 10
value_label_fontsize = 9
grid_axis = "y"
grid_linestyle = "--"
grid_alpha = 0.3
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
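`max_items` limits the histogram to the most frequent entries. A minimal sketch of that selection using `collections.Counter.most_common`; `top_n_frequencies` is a hypothetical helper, the sample labels are made up, and whether `comproscanner` breaks ties the same way (first occurrence wins) is an assumption.

```python
from collections import Counter


def top_n_frequencies(labels, max_items=15):
    """Return (label, count) pairs for the max_items most frequent labels.

    Hypothetical helper mirroring the documented max_items behaviour.
    Counter.most_common orders equal counts by first occurrence.
    """
    return Counter(labels).most_common(max_items)


families = ["perovskite", "spinel", "perovskite", "garnet", "perovskite", "spinel"]
for family, count in top_n_frequencies(families, max_items=2):
    print(f"{family}: {count}")
# perovskite: 3
# spinel: 2
```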
plot_precursors_pie_chart()
Create a pie chart showing the distribution of precursors used in materials synthesis.
```python
from comproscanner import data_visualizer

fig = data_visualizer.plot_precursors_pie_chart(
    data_sources=["results.json"],
    output_file="precursors_pie.png"
)
```
Required Parameters
Either data_sources OR folder_path must be provided.
data_sources (Union[List[str], List[Dict], str])
A path or list of paths to JSON result files, or a list of dictionaries containing materials data.
folder_path (str)
Path to a folder containing JSON data files.
Optional Parameters
output_file (str)
Path to save the output plot image. If None, the plot is not saved.
figsize (Tuple[int, int])
Figure size as (width, height) in inches.
dpi (int)
Resolution (DPI) of the output image.
min_percentage (float)
Minimum percentage for a category to get its own slice. Categories below this threshold are grouped into "Others".
title (str)
Title for the plot.
color_palette (str)
Matplotlib colormap name for the pie sections.
title_fontsize (int)
Font size for the title.
label_fontsize (int)
Font size for the percentage labels.
legend_fontsize (int)
Font size for the legend.
is_semantic_clustering_enabled (bool)
Whether to use semantic similarity to cluster similar precursors.
similarity_threshold (float)
Similarity threshold for clustering, between 0 and 1. Used only when is_semantic_clustering_enabled is True.
Default Values
output_file = None
figsize = (10, 8)
dpi = 300
min_percentage = 1.0
title = "Distribution of Precursors in Materials Synthesis"
color_palette = "Blues"
title_fontsize = 14
label_fontsize = 10
legend_fontsize = 10
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
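With `is_semantic_clustering_enabled=True`, names whose similarity reaches `similarity_threshold` are merged, so raising the threshold produces more, smaller groups. The sketch below illustrates greedy threshold clustering with a pluggable similarity function. It is an illustration only: `cluster_by_threshold` is hypothetical, and `difflib.SequenceMatcher` (plain character overlap) stands in for the semantic similarity that `comproscanner` actually computes, whose model and clustering algorithm are not described on this page.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_by_threshold(names, threshold=0.8, similarity=string_similarity):
    """Greedily assign each name to the first cluster whose representative is at
    least `threshold` similar to it; otherwise start a new cluster."""
    clusters = []  # list of (representative, members) pairs
    for name in names:
        for representative, members in clusters:
            if similarity(name, representative) >= threshold:
                members.append(name)
                break
        else:  # no existing cluster was similar enough
            clusters.append((name, [name]))
    return {representative: members for representative, members in clusters}


precursors = ["barium carbonate", "barium carbonates", "titanium dioxide"]
print(cluster_by_threshold(precursors, threshold=0.8))
# {'barium carbonate': ['barium carbonate', 'barium carbonates'], 'titanium dioxide': ['titanium dioxide']}
```

Here "barium carbonates" scores about 0.97 against "barium carbonate", so it merges at 0.8 but would stay separate at a threshold of 0.99.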
plot_precursors_histogram()
Create a histogram showing the frequency distribution of precursors used in materials synthesis.
```python
from comproscanner import data_visualizer

fig = data_visualizer.plot_precursors_histogram(
    data_sources=["results.json"],
    output_file="precursors_hist.png"
)
```
Required Parameters
Either data_sources OR folder_path must be provided.
data_sources (Union[List[str], List[Dict], str])
A path or list of paths to JSON result files, or a list of dictionaries containing materials data.
folder_path (str)
Path to a folder containing JSON data files.
Optional Parameters
output_file (str)
Path to save the output plot image. If None, the plot is not saved.
figsize (Tuple[int, int])
Figure size as (width, height) in inches.
dpi (int)
Resolution (DPI) of the output image.
max_items (int)
Maximum number of items to display; only the N most frequent items are shown.
title (str)
Title for the plot.
color_palette (str)
Matplotlib colormap name for the bars.
x_label (str)
Label for the x-axis.
y_label (str)
Label for the y-axis.
rotation (int)
Rotation angle for x-axis labels, in degrees.
title_fontsize (int)
Font size for the title.
xlabel_fontsize (int)
Font size for the x-axis label.
ylabel_fontsize (int)
Font size for the y-axis label.
xtick_fontsize (int)
Font size for the x-axis tick labels.
value_label_fontsize (int)
Font size for the value labels on the bars.
grid_axis (str)
Axis for grid lines. Options: "x", "y", "both", or None for no grid.
grid_linestyle (str)
Line style for grid lines (e.g., "--", "-", ":", "-.").
grid_alpha (float)
Transparency (alpha) of the grid lines, between 0 and 1.
is_semantic_clustering_enabled (bool)
Whether to enable semantic clustering of precursors.
similarity_threshold (float)
Similarity threshold for clustering, between 0 and 1.
Default Values
output_file = None
figsize = (12, 8)
dpi = 300
max_items = 15
title = "Frequency Distribution of Precursors in Materials Synthesis"
color_palette = "Blues"
x_label = "Precursor"
y_label = "Frequency"
rotation = 45
title_fontsize = 14
xlabel_fontsize = 12
ylabel_fontsize = 12
xtick_fontsize = 10
value_label_fontsize = 9
grid_axis = "y"
grid_linestyle = "--"
grid_alpha = 0.3
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
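Instead of passing folder_path, you can build the data_sources list yourself, for example to include only some of the result files in a folder. A small standard-library sketch; `collect_json_files` is a hypothetical helper that assumes result files use the `.json` extension. It does not search subfolders by default, and whether `folder_path` itself does is not specified on this page.

```python
import tempfile
from pathlib import Path


def collect_json_files(folder, pattern="*.json"):
    """Return sorted paths (as strings) of files in `folder` matching `pattern`.

    Non-recursive by default; pass pattern="**/*.json" to include subfolders.
    """
    return sorted(str(path) for path in Path(folder).glob(pattern) if path.is_file())


# Demonstrate on a throwaway folder containing two JSON files and one text file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b_results.json", "a_results.json", "notes.txt"):
        (Path(tmp) / name).write_text("{}")
    paths = collect_json_files(tmp)
    print([Path(p).name for p in paths])  # ['a_results.json', 'b_results.json']
```

The returned list of path strings matches the documented List[str] form of data_sources, so it can be passed to any of the plotting functions on this page.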
plot_characterization_techniques_pie_chart()
Create a pie chart showing the distribution of characterization techniques.
```python
from comproscanner import data_visualizer

fig = data_visualizer.plot_characterization_techniques_pie_chart(
    data_sources=["results.json"],
    output_file="techniques_pie.png"
)
```
Required Parameters
Either data_sources OR folder_path must be provided.
data_sources (Union[List[str], List[Dict], str])
A path or list of paths to JSON result files, or a list of dictionaries containing materials data.
folder_path (str)
Path to a folder containing JSON data files.
Optional Parameters
output_file (str)
Path to save the output plot image. If None, the plot is not saved.
figsize (Tuple[int, int])
Figure size as (width, height) in inches.
dpi (int)
Resolution (DPI) of the output image.
min_percentage (float)
Minimum percentage for a category to get its own slice.
title (str)
Title for the plot.
color_palette (str)
Matplotlib colormap name for the pie sections.
is_semantic_clustering_enabled (bool)
Whether to use semantic similarity to cluster similar techniques.
similarity_threshold (float)
Similarity threshold for clustering, between 0 and 1. Used only when is_semantic_clustering_enabled is True.
title_fontsize (int)
Font size for the title.
label_fontsize (int)
Font size for the percentage labels.
legend_fontsize (int)
Font size for the legend.
Default Values
output_file = None
figsize = (10, 8)
dpi = 300
min_percentage = 1.0
title = "Distribution of Characterization Techniques"
color_palette = "Blues"
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
title_fontsize = 14
label_fontsize = 10
legend_fontsize = 10
plot_characterization_techniques_histogram()
Create a histogram showing the frequency distribution of characterization techniques.
```python
from comproscanner import data_visualizer

fig = data_visualizer.plot_characterization_techniques_histogram(
    data_sources=["results.json"],
    output_file="techniques_hist.png"
)
```
Required Parameters
Either data_sources OR folder_path must be provided.
data_sources (Union[List[str], List[Dict], str])
A path or list of paths to JSON result files, or a list of dictionaries containing materials data.
folder_path (str)
Path to a folder containing JSON data files.
Optional Parameters
output_file (str)
Path to save the output plot image. If None, the plot is not saved.
figsize (Tuple[int, int])
Figure size as (width, height) in inches.
dpi (int)
Resolution (DPI) of the output image.
max_items (int)
Maximum number of items to display; only the N most frequent items are shown.
title (str)
Title for the plot.
color_palette (str)
Matplotlib colormap name for the bars.
x_label (str)
Label for the x-axis.
y_label (str)
Label for the y-axis.
rotation (int)
Rotation angle for x-axis labels, in degrees.
is_semantic_clustering_enabled (bool)
Whether to use semantic similarity to cluster similar techniques.
similarity_threshold (float)
Similarity threshold for clustering, between 0 and 1. Used only when is_semantic_clustering_enabled is True.
title_fontsize (int)
Font size for the title.
xlabel_fontsize (int)
Font size for the x-axis label.
ylabel_fontsize (int)
Font size for the y-axis label.
xtick_fontsize (int)
Font size for the x-axis tick labels.
value_label_fontsize (int)
Font size for the value labels on the bars.
grid_axis (str)
Axis for grid lines. Options: "x", "y", "both", or None for no grid.
grid_linestyle (str)
Line style for grid lines (e.g., "--", "-", ":", "-.").
grid_alpha (float)
Transparency (alpha) of the grid lines, between 0 and 1.
Default Values
output_file = None
figsize = (14, 8)
dpi = 300
max_items = 15
title = "Frequency Distribution of Characterization Techniques"
color_palette = "Blues"
x_label = "Characterization Technique"
y_label = "Frequency"
rotation = 45
is_semantic_clustering_enabled = True
similarity_threshold = 0.8
title_fontsize = 14
xlabel_fontsize = 12
ylabel_fontsize = 12
xtick_fontsize = 10
value_label_fontsize = 9
grid_axis = "y"
grid_linestyle = "--"
grid_alpha = 0.3
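Clustering changes the bar heights: once similar techniques are grouped, their counts are presumably added together before the `max_items` cut-off applies. The sketch below shows that idea under this assumption, with a hand-written mapping standing in for the semantic clustering step. `merge_counts` and the mapping are hypothetical, and the technique names are sample data.

```python
from collections import Counter


def merge_counts(labels, canonical, max_items=15):
    """Sum the counts of labels that map to the same canonical name, then keep
    the max_items most frequent. Labels missing from `canonical` are kept as-is."""
    merged = Counter(canonical.get(label, label) for label in labels)
    return merged.most_common(max_items)


techniques = ["XRD", "X-ray diffraction", "SEM", "XRD", "Raman"]
canonical = {"X-ray diffraction": "XRD"}  # hypothetical clustering result
print(merge_counts(techniques, canonical, max_items=2))
# [('XRD', 3), ('SEM', 1)]
```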
create_knowledge_graph()
Build a knowledge graph from the extracted composition-property data directly in a Neo4j database. The graph captures the relationships between materials, families, precursors, synthesis methods, characterization techniques, and properties.
```python
from comproscanner import data_visualizer

data_visualizer.create_knowledge_graph(
    result_file="results.json"
)
```
Required Parameters
result_file (str)
Path to the JSON file containing the extracted results.
Optional Parameters
is_semantic_clustering_enabled (bool)
Whether to cluster similar items using semantic similarity.
family_clustering_similarity_threshold (float)
Similarity threshold for clustering material families, between 0 and 1.
method_clustering_similarity_threshold (float)
Similarity threshold for clustering synthesis methods, between 0 and 1.
precursor_clustering_similarity_threshold (float)
Similarity threshold for clustering precursors, between 0 and 1.
technique_clustering_similarity_threshold (float)
Similarity threshold for clustering characterization techniques, between 0 and 1.
keyword_clustering_similarity_threshold (float)
Similarity threshold for clustering keywords, between 0 and 1.
Default Values
is_semantic_clustering_enabled = True
family_clustering_similarity_threshold = 0.9
method_clustering_similarity_threshold = 0.8
precursor_clustering_similarity_threshold = 0.9
technique_clustering_similarity_threshold = 0.8
keyword_clustering_similarity_threshold = 0.85
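Unlike the plotting functions, create_knowledge_graph() takes a separate threshold for each kind of entity. The snippet below collects the documented defaults into a dictionary keyed by the documented parameter names, applies an override, and checks that every value lies between 0 and 1. `DEFAULT_KG_THRESHOLDS` and `build_kg_kwargs` are hypothetical conveniences, not part of `comproscanner`.

```python
DEFAULT_KG_THRESHOLDS = {
    "family_clustering_similarity_threshold": 0.9,
    "method_clustering_similarity_threshold": 0.8,
    "precursor_clustering_similarity_threshold": 0.9,
    "technique_clustering_similarity_threshold": 0.8,
    "keyword_clustering_similarity_threshold": 0.85,
}


def build_kg_kwargs(**overrides):
    """Merge threshold overrides into the documented defaults and check the 0-1 range.

    Unknown names raise ValueError so that typos are caught before the call.
    """
    unknown = set(overrides) - set(DEFAULT_KG_THRESHOLDS)
    if unknown:
        raise ValueError(f"unknown threshold(s): {sorted(unknown)}")
    kwargs = {**DEFAULT_KG_THRESHOLDS, **overrides}
    for name, value in kwargs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return kwargs


# Loosen keyword clustering only; every other threshold keeps its default.
kg_kwargs = build_kg_kwargs(keyword_clustering_similarity_threshold=0.75)
print(kg_kwargs["keyword_clustering_similarity_threshold"])  # 0.75
```

The resulting dictionary could then be unpacked into the call, e.g. `data_visualizer.create_knowledge_graph(result_file="results.json", **kg_kwargs)`.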
Neo4j Database Required
The knowledge graph is created directly in a Neo4j database. Before creating knowledge graphs, make sure Neo4j is running and the following credentials are set in your .env file.
```bash
# neo4j
NEO4J_URI=YOUR_NEO4J_URI # default URI for Neo4j is bolt://localhost:7687
NEO4J_USER=YOUR_NEO4J_USERNAME
NEO4J_PASSWORD=YOUR_NEO4J_PASSWORD
NEO4J_DATABASE=YOUR_NEO4J_DATABASE_NAME
```
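Before calling create_knowledge_graph(), it can help to confirm that these four variables are actually visible to your Python process. A standard-library sketch, assuming the .env file has already been loaded into the environment (for example with `load_dotenv()` from the `python-dotenv` package; whether `comproscanner` loads it for you is not specified on this page). `check_neo4j_env` is a hypothetical helper.

```python
import os

REQUIRED_NEO4J_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "NEO4J_DATABASE")


def check_neo4j_env(environ=os.environ):
    """Return the names of required Neo4j variables that are missing or empty."""
    return [name for name in REQUIRED_NEO4J_VARS if not environ.get(name)]


missing = check_neo4j_env()
if missing:
    print("Missing Neo4j settings:", ", ".join(missing))
else:
    print("Neo4j settings found for", os.environ["NEO4J_URI"])
```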
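Output Format
All visualization functions (except create_knowledge_graph) return a matplotlib.figure.Figure object, which can be displayed interactively or customized further with Matplotlib:
```python
from comproscanner import data_visualizer

fig = data_visualizer.plot_family_pie_chart(
    data_sources=["results.json"]
)

# Show the plot. Figure.show() does not run a GUI event loop, so in a plain
# script the window may close immediately; matplotlib.pyplot.show() blocks instead.
fig.show()
```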
Next Steps
- Learn about Evaluation Visualization
- Explore Data Extraction
- Learn about RAG Configuration