Evaluation Overview

ComProScanner provides both semantic and agentic methods for evaluating extraction quality. The following sections outline the approaches, advantages, and use cases for each method.

Evaluation Methods

Semantic Evaluation

Approach: Uses embedding models to compute the similarity between the extracted data and the ground-truth data.

Advantages:

Fast and cost-effective
Physics-aware with PhysBERT embeddings (default)
Consistent results

Best For:

Quick quality assessment
Large-scale evaluations
Budget-conscious projects
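
As a rough illustration of the embedding-similarity idea behind semantic evaluation, the sketch below compares toy vectors with cosine similarity. It is not ComProScanner's implementation: the choice of cosine similarity, the `is_match` helper, the 0.8 threshold, and the three-dimensional vectors are all assumptions made for the example. In practice the vectors would come from an embedding model such as PhysBERT.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embedding vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def is_match(extracted: Sequence[float], truth: Sequence[float],
             threshold: float = 0.8) -> bool:
    """Hypothetical helper: count a value as correct above a threshold."""
    return cosine_similarity(extracted, truth) >= threshold


# Toy stand-ins for real embeddings (assumed values, not model output).
truth_vec = [0.9, 0.1, 0.4]
close_vec = [0.8, 0.2, 0.5]  # points in nearly the same direction
far_vec = [0.0, 1.0, 0.0]    # points in a very different direction

print(is_match(close_vec, truth_vec))
print(is_match(far_vec, truth_vec))
```

Because the score depends only on the vectors, the same inputs always give the same result, which is consistent with the reproducibility advantage listed above.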

Agentic Evaluation

Approach: Uses specialized LLM agents to evaluate extraction quality with nuanced understanding.

Advantages:

More accurate assessment
Better context understanding
Handles edge cases

Best For:

High-stakes evaluations
Complex compositions
Detailed analysis

Comparison

Feature	Semantic	Agentic
Speed	Fast	Slower
Cost	Low	Higher
Accuracy	Good	Excellent
Context Understanding	Limited	Advanced
Reproducibility	High	Moderate

Evaluation Metrics

Both methods provide:

Overall Accuracy: Combined score across composition accuracy and synthesis accuracy
Composition Accuracy: Accuracy of the extracted composition-property data (compositions_property_values, property_unit, family), computed with custom weights
Synthesis Accuracy: Accuracy of the extracted synthesis-related data (method, precursors, steps, characterization_techniques), computed with custom weights
Classification Metrics: Standard Precision/Recall/F1 metrics for detailed performance analysis
Normalized Classification Metrics: Classification metrics normalized to ensure an equitable comparison between articles with significant disparities in the quantity of extractable information.

To read more about the evaluation metrics, please refer to the journal article here.
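
To make the metric definitions more concrete, here is a minimal sketch of a weighted accuracy and of precision, recall, and F1. The function names, the per-field scores, and the weights (3, 1, 1) are hypothetical and chosen only for illustration. The weights ComProScanner actually uses, and the normalization step, are defined in the journal article and are not reproduced here.

```python
from typing import Dict


def weighted_accuracy(scores: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Weighted mean of per-field scores; missing fields count as 0."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(w * scores.get(field, 0.0)
                       for field, w in weights.items())
    return weighted_sum / total_weight


def precision_recall_f1(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard classification metrics from true/false positive counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical per-field scores for one article (1.0 = fully correct).
composition_scores = {
    "compositions_property_values": 1.0,
    "property_unit": 1.0,
    "family": 0.0,
}
# Assumed weights, NOT the values used by ComProScanner.
composition_weights = {
    "compositions_property_values": 3,
    "property_unit": 1,
    "family": 1,
}

print(weighted_accuracy(composition_scores, composition_weights))
print(precision_recall_f1(tp=8, fp=2, fn=2))
```

With these assumed weights, an error in a heavily weighted field lowers the score more than an error in a lightly weighted one.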

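Quick Start

Semantic Evaluation

from comproscanner import evaluate_semantic

# Score the extracted data against the ground truth using embedding similarity
results = evaluate_semantic(
    ground_truth_file="ground_truth.json",    # reference (ground-truth) data
    test_data_file="extracted_results.json",  # extraction output to evaluate
    output_file="semantic_eval.json",         # file that receives the evaluation
)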

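Agentic Evaluation

from comproscanner import evaluate_agentic

# Score the extracted data against the ground truth using LLM agents
results = evaluate_agentic(
    ground_truth_file="ground_truth.json",    # reference (ground-truth) data
    test_data_file="extracted_results.json",  # extraction output to evaluate
    output_file="agentic_eval.json",          # file that receives the evaluation
)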
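
After either call finishes, the output file can be inspected with the standard library. The sketch below assumes only that the output is valid JSON, as the `.json` extension suggests. The layout of the file is not described on this page, so the hypothetical helpers `load_results` and `summarize` report the top-level structure rather than reading specific keys.

```python
import json
from pathlib import Path
from typing import Any


def load_results(path: str) -> Any:
    """Read an evaluation output file, assuming it contains JSON."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def summarize(results: Any) -> str:
    """Describe the top-level structure without assuming a schema."""
    if isinstance(results, dict):
        keys = ", ".join(sorted(str(k) for k in results))
        return "top-level keys: " + keys
    if isinstance(results, list):
        return f"list with {len(results)} entries"
    return f"value of type {type(results).__name__}"
```

For example, `print(summarize(load_results("semantic_eval.json")))` shows which keys are available before any further analysis is written against them.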

Detailed Guides

Semantic Evaluation - Embedding-based similarity evaluation
Agentic Evaluation - LLM agent-based evaluation

Next Steps

Continue to Semantic Evaluation or Agentic Evaluation
Explore Visualization
Learn about RAG Configuration