skglobal-jsc/text-extractor

@sk-global/text-extractor

Extracts text from HTML, PDF, image, and other files.

Currently supported types ...

Installation

npm install @sk-global/text-extractor

Usage

CommonJS

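const fs = require('fs');
const { fromUrl, fromBufferWithMimeType, fromBuffer } = require('@sk-global/text-extractor');

// CommonJS has no top-level await, so the calls run inside an async function.
async function main() {
  // fromUrl: extract text from a file at a URL
  const pdfText = await fromUrl('https://www.digital.go.jp/assets/contents/node/basic_page/field_ref_resources/d6cfdcdd-75e4-460c-9ec0-af4f952e03d5/20210906_meeting_promoting_01.pdf');

  // fromBufferWithMimeType: extract text from a buffer whose MIME type is known
  const buffer = fs.readFileSync('image.png'); // placeholder path, replace with your own file
  const imageText = await fromBufferWithMimeType(buffer, 'image/png');

  // fromBuffer: extract text from a buffer without passing a MIME type
  const text = await fromBuffer(buffer);
}

main().catch(console.error);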

ES modules

import { fromUrl, fromBufferWithMimeType, fromBuffer } from '@sk-global/text-extractor';
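The three entry points above differ only in what they take as input, so a small wrapper can choose between them. The following is a minimal sketch, not part of the package: `makeExtractor` and `extractText` are hypothetical names, and the sketch assumes each function returns a promise for the extracted text, as the usage examples suggest. The extractor functions are passed in as arguments so the dispatch logic does not depend on how the package is loaded.

```javascript
// Hypothetical helper (not part of @sk-global/text-extractor): picks one of
// the three documented entry points based on the shape of the input.
function makeExtractor({ fromUrl, fromBufferWithMimeType, fromBuffer }) {
  return async function extractText(input, mimeType) {
    if (typeof input === 'string') {
      // Treat strings as URLs; new URL() throws a TypeError on malformed input.
      new URL(input);
      return fromUrl(input);
    }
    if (Buffer.isBuffer(input)) {
      // Use the MIME-aware entry point only when the caller supplies a type.
      return mimeType
        ? fromBufferWithMimeType(input, mimeType)
        : fromBuffer(input);
    }
    throw new TypeError('input must be a URL string or a Buffer');
  };
}

// Possible wiring, assuming the package is installed:
//   const extractText = makeExtractor(require('@sk-global/text-extractor'));
//   const text = await extractText(buffer, 'image/png');
```

Passing the functions in also makes the wrapper easy to exercise with stubs, without fetching files or running the real extraction.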

Roadmap

  • Add support for more file types
  • Add support for options passed to the underlying libraries
