Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

createWorker throws exception with option.langPath set in electron #925

Open
HQidea opened this issue May 24, 2024 · 7 comments
Open

createWorker throws exception with option.langPath set in electron #925

HQidea opened this issue May 24, 2024 · 7 comments
Milestone

Comments

@HQidea
Copy link

HQidea commented May 24, 2024

Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
"tesseract.js": "^5.1.0",
Describe the bug
When using tesseract.js in the electron environment, setting option.langPath to a local file path, the createWorker function results in an error.
The following code snippet and error message demonstrate the issue:

const worker: Promise<Tesseract.Worker> = createWorker("chi_sim", 1, {
  langPath: path.join(__dirname, "../../resources/ocr")
});

error:

Error: TypeError: Only absolute URLs are supported
    at Worker.<anonymous> (/path/to/node_modules/.pnpm/[email protected]/node_modules/tesseract.js/src/createWorker.js:248:15)
    at Worker.emit (node:events:518:28)
    at Worker.emit (node:domain:488:12)
    at MessagePort.<anonymous> (node:internal/worker:262:53)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:826:20)
    at exports.emitMessage (node:internal/per_context/messageport:23:28)

Upon examining the code, I found that the condition below appears to be arbitrary since in the electron environment, it should be able to load a language file from a local path as well.

// For Node.js, langPath may be a URL or local file path
// The is-url package is used to tell the difference
// For the browser version, langPath is assumed to be a URL
if (env !== 'node' || isURL(langPathDownload) || langPathDownload.startsWith('moz-extension://') || langPathDownload.startsWith('chrome-extension://') || langPathDownload.startsWith('file://')) { /** When langPathDownload is an URL */
path = langPathDownload.replace(/\/$/, '');
}

Expected behavior
It should be able to load a language file from a local path in the electron environment.

@Balearica
Copy link
Member

I would agree that the Node.js version should work similarly regardless of whether or not it is being run in Electron, so this condition should be patched if that is not the current behavior.

As an aside, is there a compelling reason to use the Node.js version of Tesseract.js rather than the browser version in an Electron application? My understanding was that Electron supports both, however it was considered a security best practice to avoid giving code access to the filesystem with Node.js integration where possible. However, I have minimal experience with Electron so honestly do not understand the tradeoffs well. The browser version of Tesseract.js can definitely be run using entirely offline resources, as seen in this example repo.

@HQidea
Copy link
Author

HQidea commented May 25, 2024

@Balearica Thx for reply. In Electron, there are two types of processes: main process and render process. As I understand it, what you mentioned and the example you provided is using Tesseract.js in the render process. Upon inspecting the code, I found that the environment is 'webworker' and it utilizes fetch to load language files.

However, in my scenario, I need to run Tesseract.js in the main process after receiving an IPC event from the render process, where the environment is detected as 'electron', and it uses 'node-fetch' to load language files, which does not support local file paths.

@Balearica
Copy link
Member

@HQidea Can you make a minimal example repo that can be cloned to replicate this issue? I ran the snippet provided in the OP within the main.js file in the example repo I posted (rather than the index-offline.html file), however everything still ran as expected. Therefore, there is presumably additional context required to replicate this.

@HQidea
Copy link
Author

HQidea commented May 26, 2024

@Balearica this is the repo that can reproduce the error.

I also tried to replicate the error using the repo you provided. However the results are inconsistent. Sometimes the environment in the worker thread is tested as 'electron', and sometimes as 'node'. When the environment is tested as 'node', everything goes well. I'm not sure why.

@Balearica
Copy link
Member

@HQidea I was able to reproduce this issue using your repo, and was able to confirm that simply commenting out the special isElectron case within getEnvironment resolved the issue.

} else if (isElectron()) {
env.type = 'electron';

I am hesitant to edit something without fully understanding what it does, so I reviewed the various issues/PRs regarding why a special case for Electron was introduced to begin with. These are linked below.

  1. Allow Explicit Specification whether the environment is Node or Browser #376 (related commit b8aba2e)
  2. Worker Script Path has 'http:/localhost' prepended, causing MODULE_NOT_FOUND error #474
  3. Fix fetch when running in electron webview #489 (related issues Cannot read property 'arrayBuffer' of undefined (Electron & React) #449, Cannot read property 'arrayBuffer' of undefined ( Electron.js + Vue.js) #465)
  4. Problem with start tesseract on electron-vue app #497 (related PR Fixed method for selecting env type #498)

After reviewing these discussions, I believe Electron was added as a special case because people wanted to use Node.js in an Electron render process. You do not need special logic to use the Node.js version of Tesseract.js in the main Electron process, and you don't need a special logic to use the browser version of Tesseract.js in the render Electron processes. The only "special case" introduced by Electron is the theoretical ability to run Node.js code in what appears to be a browser context (window.document is defined).

While my understanding of Electron is limited, my initial thought is that this special case is not something we need to support. Using Node.js in a renderer process appears to be strongly advised against in the Electron docs. Additionally, the browser version of Tesseract.js works perfectly well in Electron renderer processes, and is just as fast. If this understanding is correct, and there is not a compelling reason to (attempt to) support using the Node.js version in a renderer process, then we could just cut out the Electron-specific stuff entirely, which should solve your issue.

@HQidea
Copy link
Author

HQidea commented May 27, 2024

@Balearica I totally agree with you on not using Node.js in a renderer process.

@Balearica
Copy link
Member

It's unclear to me whether this could break any existing code, so my inclination is to combine this with any other planned breaking changes and release as v6.

@Balearica Balearica added this to the v6.0 milestone Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants