Releases: aryn-ai/sycamore

v0.1.17

06 Jun 18:21
a217c92

This Sycamore release adds new writers for the Weaviate and Pinecone vector databases, enhancements to the demo UI, and numerous small features and bug fixes.

Full Changelog: v0.1.16...v0.1.17

v0.1.16

07 May 04:41
f1e4ed0

This release contains support in the SycamorePartitioner for extracting table structure and images, as well as a new transform for summarizing images. It also includes a number of bug fixes and enhancements.

What's Changed

  • fix ui error when no title is extracted and we're not in ntsb setting by @HenryL27 in #352
  • Fix almost all the pyproject.toml and poetry.lock files to have consistent requirements on python dependencies. by @eric-anderson in #345
  • Bind mount to convey SSL cert/key to Jupyter & UI by @alexaryn in #349
  • Use real SSL certificate for OpenSearch HTTP. by @alexaryn in #353
  • copy lib/poetry-lock into containers to make poetry happy by @HenryL27 in #354
  • copy lib/poetry-lock into remote-processor-service too. by @HenryL27 in #355
  • copy in all of poetry-lock, not just the pyproject files by @HenryL27 in #356
  • Update data model for table structure recognition. by @bsowell in #357
  • Put token-protected HTTPS proxy in front of UI proxy. by @alexaryn in #359
  • Arxiv switched to HTTP for these PDFs; make it work. by @alexaryn in #360
  • Add apt update to UI Dockerfiles. by @alexaryn in #361
  • Use chown in our copy commands to make sure all files are owned by app by @eric-anderson in #362
  • Add TableStructureExtractor interface and TableTransformer impl. by @bsowell in #358
  • fix zsh path by @eric-anderson in #367
  • Jupyter container improvements by @eric-anderson in #369
  • Don't say localhost if it's not going to work. by @alexaryn in #366
  • bump deploy timeout for reranking model from 60 to 120 by @HenryL27 in #363
  • ingest all ntsb docs, automatically detect docker v not, spread path … by @HenryL27 in #368
  • Fix typos in README by @hsm207 in #370
  • Fix default prep script when given an empty directory to import by @HenryL27 in #371
  • fix typo by @HenryL27 in #372
  • Add the ability to summarize images to partitioned docsets. by @bsowell in #365
  • Store element bbox as a tuple rather than BoundingBox. by @bsowell in #374
  • Jonfritz patch 1 partition update by @jonfritz in #376
  • FIX: Error on initiate conversation without a conversation id by @sohamkasar19 in #375
  • Add API docs for the SycamorePartitioner and table extraction. by @bsowell in #373
  • Fix malformed text from beautiful soup. by @bohou-aryn in #351
  • Handle deserializing JSON documents when elements is None. by @bsowell in #377
  • Bump sycamore version to 0.1.16 by @bsowell in #378

Full Changelog: v0.1.15...v0.1.16

v0.1.15

11 Apr 23:58

This release adds support for writing DocSets to JSONL files, as well as other incremental features and bug fixes.
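For readers unfamiliar with the format, JSONL stores one JSON object per line. The sketch below uses only the Python standard library and is not Sycamore's writer; the `write_jsonl` and `read_jsonl` helpers, the record fields, and the output file name are all hypothetical and chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write an iterable of dicts to `path`, one JSON object per line."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # json.dumps never emits raw newlines, so each record stays on one line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical document records; a real DocSet carries richer structure.
records = [
    {"doc_id": "a", "text": "first document"},
    {"doc_id": "b", "text": "second document"},
]
out = write_jsonl(records, Path(tempfile.gettempdir()) / "docset_sketch.jsonl")
```

Because each line is an independent JSON object, a file like this can be appended to or processed line by line without loading the whole file.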

Full Changelog: v0.1.14...v0.1.15

v0.1.14

02 Apr 19:38
8b7190b

This release includes CPU support and OCR in the Sycamore Partitioner, caching for better performance and lower cost when using Textract for table extraction, an upgraded version of Ray (2.10), and more.

Full Changelog: v0.1.13...v0.1.14

v0.1.13

15 Mar 21:28
88c691b

This release upgrades the Sycamore docker containers to use OpenSearch 2.12 and adds support for SSL. It also includes significant additions to the Sycamore documentation (https://sycamore.readthedocs.io/), and a number of other features and bug fixes.

Full Changelog: v0.1.12...v0.1.13

v0.1.12

09 Feb 01:10

This release adds components to Sycamore to enable search and analytics use cases, beyond data preparation. Sycamore can now be deployed using Docker containers, and you can also download the Python libraries for data preparation. The documentation has also been updated to reflect this change in scope.

This release also has other features and bug fixes.

Full Changelog: v0.1.11...v0.1.12

v0.1.11

03 Jan 20:12

This release removes support for OpenAI's text-davinci-003 model, which will be deprecated on 1/4/24, and replaces it with gpt-3.5-turbo-instruct. All users of Sycamore should upgrade.

What's Changed

  • Migrate from text-davinci-003 to gpt-3.5-turbo-instruct. by @bsowell in #202
  • Bump version to v0.1.11. by @bsowell in #203

Full Changelog: v0.1.10...v0.1.11

v0.1.10

21 Dec 21:55

This Sycamore release adds support for near-duplicate detection via shingling. It also includes documentation improvements and incremental bug fixes.
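As background, shingling compares documents by the overlap between their sets of short word sequences. The following is a minimal sketch of the general technique using word-level shingles and exact Jaccard similarity; it is not Sycamore's implementation, and the function names, the shingle size `k`, and the `threshold` value are assumptions made for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in `text`."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=3, threshold=0.5):
    """Flag two documents as near duplicates when shingle overlap is high.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the lazy cat"
similarity = jaccard(shingles(doc_a), shingles(doc_b))
```

Production systems usually hash the shingles and compare compact sketches rather than full sets, which keeps the comparison cheap across large collections.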

What's Changed

  • Render schema extraction documentation by @mkyl in #194
  • Additional documentation for Schema extraction by @mkyl in #195
  • Add async-timeout dependency by @eric-anderson in #198
  • Add docstrings to all public document methods so they show up on sycamore.readthedocs.io by @eric-anderson in #197
  • Near-Duplicate Detection in Sycamore: Document Tagging and Document Dropping by @alexaryn in #199
  • Bump version to v0.1.10. by @bsowell in #200

Full Changelog: v0.1.9...v0.1.10

v0.1.9

08 Dec 01:04

This Sycamore release adds improved heuristics for partitioning documents. It also includes a new method of automatically inferring entities to extract from unstructured documents, as well as incremental features and bug fixes.

What's Changed

  • Change the default merge size to 256. by @eric-anderson in #178
  • Simplify running the http crawler. by @eric-anderson in #180
  • Fix text chunking for html importing to improve result quality. by @eric-anderson in #185
  • Remove docker_compose and opensearch files. They were moved to quickstart. by @eric-anderson in #183
  • Change simple_ingest and s3_ingest to use GTE-small embedding model. by @alexaryn in #169
  • Remove unneeded mapping in OpenSearch index settings. by @alexaryn in #186
  • Added HTML ingest example. Fixed order in S3 ingester. by @alexaryn in #188
  • Simple transform to perform regex replacement on Elements. by @alexaryn in #187
  • Update README.md by @jonfritz in #179
  • Entity Extraction by @mkyl in #161
  • Merging/breaking elements based on heuristics including bbox by @alexaryn in #171
  • Update aiohttp and cryptography to address dependabot alerts. by @bsowell in #192
  • Bump version to v0.1.9. by @bsowell in #191

Full Changelog: v0.1.8...v0.1.9

v0.1.8

18 Nov 17:50

This Sycamore release contains code to build Docker containers as well as small improvements and bug fixes.

Full Changelog: v0.1.7...v0.1.8