Paragraph segmentation does not work #66

Open
mirfan899 opened this issue Sep 16, 2022 · 0 comments

Hi, I tried to split a long string into paragraphs, but it fails even after changing the parameters.

Here is the test sample.

import lexnlp.nlp.en.segments.paragraphs as p
paras = p.get_paragraph_list("We will host your documentation for free, forever. There are no tricks. We help over 100,000 open source projects share their docs, including a custom domain and theme. Whenever you push code to your favorite version control service, whether that is GitHub, BitBucket, or GitLab, we will automatically build your docs so your code and documentation are never out of sync. We build and host your docs for the web, but they are also viewable as PDFs, as single page HTML, and for eReaders. No additional configuration is required. We can host and build multiple versions of your docs so having a 1.0 version of your docs and a 2.0 version of your docs is as easy as having a separate branch or tag in your version control system. Read the Docs simplifies software documentation by automating building, versioning, and hosting of your docs for you. We fund our operations through advertising, corporate-hosted documentation with Read the Docs for Business, donations, and we are supported by a number of generous sponsors. Read the Docs is open source and community supported. It depends on users like you to contribute to development, support, and operations. You can learn more about how to contribute in our docs. Thanks so much to our wonderful team who helps us run the site. Read the Docs wouldn't be possible without them.")

for para in paras:
    print(para)

The output is a single string:

We will host your documentation for free, forever. There are no tricks. We help over 100,000 open source projects share their docs, including a custom domain and theme. Whenever you push code to your favorite version control service, whether that is GitHub, BitBucket, or GitLab, we will automatically build your docs so your code and documentation are never out of sync. We build and host your docs for the web, but they are also viewable as PDFs, as single page HTML, and for eReaders. No additional configuration is required. We can host and build multiple versions of your docs so having a 1.0 version of your docs and a 2.0 version of your docs is as easy as having a separate branch or tag in your version control system. Read the Docs simplifies software documentation by automating building, versioning, and hosting of your docs for you. We fund our operations through advertising, corporate-hosted documentation with Read the Docs for Business, donations, and we are supported by a number of generous sponsors. Read the Docs is open source and community supported. It depends on users like you to contribute to development, support, and operations. You can learn more about how to contribute in our docs. Thanks so much to our wonderful team who helps us run the site. Read the Docs wouldn't be possible without them.
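One thing that may be worth checking on the input: the sample above is a single continuous line. If the segmenter only treats line breaks as candidate paragraph boundaries (an assumption here, not something confirmed from the LexNLP source), there would be nothing for it to split on. The helper below is a hypothetical name, not part of LexNLP; it is a minimal sketch that reports how many line breaks a string contains.

```python
def line_break_report(text: str) -> dict:
    """Summarize the line-break structure of a string.

    Hypothetical helper for debugging; not part of LexNLP.
    """
    return {
        "characters": len(text),
        "newlines": text.count("\n"),            # single line breaks
        "blank_line_breaks": text.count("\n\n"),  # empty lines between blocks
    }


# Short excerpt of the test sample; it contains no line breaks at all.
excerpt = "We will host your documentation for free, forever. There are no tricks."
print(line_break_report(excerpt))
```

If `newlines` is 0 for the full sample as well, a useful follow-up test would be to insert `\n` between sentences and see whether the output changes.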

Am I missing something here? I tried different parameter values.

score_threshold=0.1
score_threshold=0.2
score_threshold=0.3
score_threshold=0.5
score_threshold=0.7
# window
window_pre=3
window_pre=2
window_pre=1
window_pre=5
window_pre=7
window_post=1
window_post=2
window_post=3
window_post=4
window_post=5
window_post=7

The result is the same.
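To make the parameter trials above reproducible, here is a rough sketch of a sweep over the same keyword arguments (`score_threshold`, `window_pre`, `window_post`). The function name `sweep_paragraph_params` is hypothetical, and the segmenter is passed in as an argument so the snippet does not itself depend on LexNLP.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def sweep_paragraph_params(
    segment: Callable[..., Iterable[str]],
    text: str,
    thresholds: Iterable[float],
    windows_pre: Iterable[int],
    windows_post: Iterable[int],
) -> Dict[Tuple[float, int, int], int]:
    """Run `segment` for every parameter combination.

    Returns {(score_threshold, window_pre, window_post): paragraph_count}.
    `segment` is assumed to accept these three keyword arguments.
    """
    counts = {}
    for threshold, pre, post in product(thresholds, windows_pre, windows_post):
        paragraphs = list(
            segment(
                text,
                window_pre=pre,
                window_post=post,
                score_threshold=threshold,
            )
        )
        counts[(threshold, pre, post)] = len(paragraphs)
    return counts
```

With LexNLP installed, passing `p.get_paragraph_list` as `segment` together with the values listed above (`[0.1, 0.2, 0.3, 0.5, 0.7]`, `[1, 2, 3, 5, 7]`, `[1, 2, 3, 4, 5, 7]`) should show whether any combination yields more than one paragraph.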
