Changelog
[0.1.6] - 2026-04-02
Changed
- Updated README.md, CITATION.cff, and the docs to reference the published version (advance article) of the ComProScanner paper in Digital Discovery, available as fully open access:
    - ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature
Added
- Guide for API key creation for various LLM providers and publisher APIs added to the documentation at docs/getting-started/api-key-guide.md, with detailed instructions for each provider.
Fixed
- Model prefix handling in rag_tool.py standardized to reflect the docs.
- HF_TOKEN documentation clarified as optional: it is only required for gated or private Hugging Face models.
[0.1.5] - 2026-02-08
Added
- Data related to the comparison with other agentic data extraction frameworks added for the ComProScanner paper in the examples/piezo_test/comparing_existing_frameworks folder.
- New parameter apply_advanced_cleaning added to data cleaning methods in data_cleaner.py. When set to True, it triggers the advanced cleaning pipeline.
- Advanced composition cleaning methods in data_cleaner.py:
    - _remove_miller_indices() - Removes crystal plane notations from chemical formulas
    - _remove_zero_coefficient_elements() - Removes elements with zero coefficients
    - _normalize_coefficients() - Removes trailing zeros from coefficients
    - _expand_leading_and_trailing_coefficients() - Expands leading/trailing coefficient patterns
    - _expand_parenthetical_coefficients() - Expands nested bracket coefficients
- Enhanced documentation in docs/usage/data-cleaning.md:
    - Added apply_advanced_cleaning parameter documentation
    - Added Mermaid process flow diagram showing cleaning stages
    - Added advanced cleaning examples with tables for each transformation type
- Templates for GitHub issues added to .github/ISSUE_TEMPLATE for the following topics:
    - bug reports
    - feature requests
    - documentation improvements
    - support questions
- Changelog page added in the documentation. Also, CHANGELOG.md linked in README.md.
- DeepWiki integration badge added to README.md for community Q&A support.
- arXiv preprint badge added to README.md.
- CITATION.cff added for standardized citation information based on the latest release and arXiv preprint.
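Two of the composition cleaning steps listed above (dropping zero-coefficient elements and trimming trailing zeros) can be illustrated with a small regex sketch. This is not the code in data_cleaner.py: the function names drop_zero_coefficient_elements and trim_trailing_zeros are hypothetical, and the sketch assumes flat formulas written as element symbols followed by optional numeric coefficients.

```python
import re

# One element symbol followed by an optional numeric coefficient,
# e.g. "Ba0.850" or "O3". Assumption: formulas use this simple form.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def drop_zero_coefficient_elements(formula: str) -> str:
    """Remove elements whose coefficient is numerically zero (hypothetical helper)."""
    def keep(match: re.Match) -> str:
        coeff = match.group(2)
        if coeff is not None and float(coeff) == 0.0:
            return ""  # e.g. "Ca0" or "Ca0.00" disappears
        return match.group(0)

    return _TOKEN.sub(keep, formula)


def trim_trailing_zeros(formula: str) -> str:
    """Strip trailing zeros from decimal coefficients (hypothetical helper)."""
    def trim(match: re.Match) -> str:
        # "0.850" -> "0.85", "1.0" -> "1"; integers are not matched at all.
        return match.group(0).rstrip("0").rstrip(".")

    return re.sub(r"\d+\.\d+", trim, formula)


cleaned = trim_trailing_zeros(drop_zero_coefficient_elements("Ba0.850Ca0.00Ti1.0O3"))
print(cleaned)
```

Running the zero-coefficient step before the trimming step matters here: trimming first would turn "0.00" into "0", which the first function still removes, but keeping the order explicit avoids relying on that.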
Fixed
- OAWorks API replaced with OpenAlex API, as OAWorks is no longer available.
- Empty/corrupted PDFs handled in pdf_processor.py and wiley_processor.py to avoid GLYPH errors during text extraction.
- Data extraction failures fixed when composition-property text data is empty.
- CSV progress tracking in elsevier_processor.py:
    - DtypeWarning resolved by adding dtype=str, low_memory=False to pd.read_csv()
    - Data loss issue fixed with immediate CSV persistence for processed articles
    - Sleep delays optimized for batch writes
- Type annotation warnings in documentation build (griffe/mkdocstrings):
    - Added return type annotations to function signatures in comproscanner.py
    - Added return type annotations to all visualization functions in data_visualizer.py and eval_visualizer.py
    - Fixed parameter type format in docstrings from colon to comma notation
    - Added TYPE_CHECKING conditional imports for matplotlib Figure type
    - Fixed **kwargs type annotations across multiple modules
- Numbered list formatting in docs/about/contribution.md:
    - Fixed list continuation by using 4-space indentation for code blocks and nested lists
    - Disabled format on save for Markdown files in .vscode/settings.json
- GitHub Actions CI disk space issue:
    - Added --no-cache-dir flag to pip install to reduce disk usage
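The TYPE_CHECKING and **kwargs annotation fixes listed above follow a common typing pattern, sketched below. The helper save_figure is hypothetical and is not taken from data_visualizer.py or eval_visualizer.py; it only shows how a matplotlib type can appear in a signature without importing matplotlib at runtime.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen by type checkers and documentation tools only; skipped at runtime,
    # so importing this module does not require matplotlib.
    from matplotlib.figure import Figure


def save_figure(fig: "Figure", path: str, **kwargs: Any) -> str:
    """Save a figure and return the output path (hypothetical helper).

    The quoted annotation keeps the name unevaluated at runtime, and
    ``**kwargs: Any`` annotates the type of each keyword value.
    """
    fig.savefig(path, **kwargs)  # Figure.savefig accepts extra keyword options
    return path
```

Any object with a compatible savefig method works at runtime, while static tools still resolve the parameter as a matplotlib Figure.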
Changed
- README badges section converted from HTML to Markdown format for better compatibility across platforms.
[0.1.4] - 2025-12-02
Added
- New function clean_data() added for data cleaning and preprocessing as a separate step, instead of integrating it into the data extraction function.
- New documentation page for Data Cleaning added:
    - docs/usage/data-cleaning.md
    - Added to mkdocs.yml navigation.
- New API overview documentation page added:
    - docs/api.md
    - Added to mkdocs.yml navigation.
- New mkdocstrings configuration added to mkdocs.yml for automatic API documentation generation.
- New tests added for remaining utils functions.
- Added pytest coverage tracking (50%) using pytest-cov and coverage report generation using Codecov.
Fixed
- Tests updated to reflect changes in data cleaning process.
Removed
- Arguments related to data cleaning removed from data extraction function.
Changed
- README images updated with raw GitHub links for better reliability:
    - ComProScanner Logo
    - ComProScanner Workflow
[0.1.3] - 2025-11-04
Fixed
- RecursiveCharacterTextSplitter import updated for the latest langchain version to avoid import errors:
    - Changed from: from langchain.text_splitter import RecursiveCharacterTextSplitter
    - To: from langchain.text_splitter.recursive_character import RecursiveCharacterTextSplitter
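For code that has to run against more than one langchain version, the import change above could be wrapped in a fallback. The sketch below is an assumption about how one might do that, not part of ComProScanner: import_first_available is a hypothetical helper, and the two candidate paths are simply the ones quoted in this entry.

```python
import importlib
from typing import Any, Iterable, Tuple


def import_first_available(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from (module_path, name) pairs."""
    errors = []
    for module_path, attr_name in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{module_path}.{attr_name}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Import paths quoted in the changelog entry, newest first.
SPLITTER_CANDIDATES = [
    ("langchain.text_splitter.recursive_character", "RecursiveCharacterTextSplitter"),
    ("langchain.text_splitter", "RecursiveCharacterTextSplitter"),
]

# Usage (requires langchain to be installed):
# RecursiveCharacterTextSplitter = import_first_available(SPLITTER_CANDIDATES)
```

Collecting the per-candidate errors keeps the final ImportError informative when none of the paths exist in the installed version.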
[0.1.2] - 2025-10-24
Added
- Link to the ComProScanner preprint on arXiv added to the documentation index page and README.md:
    - arXiv:2510.20362
[0.1.1] - 2025-10-22
Fixed
- README images updated with external image links to fix a PyPI rendering issue:
    - ComProScanner Logo
    - ComProScanner Workflow
[0.1.0] - 2025-10-22
Added
- Initial release of ComProScanner.