Features

adantsou edited this page Oct 5, 2021 · 3 revisions

General

EToolbox Link Inspector collects the following information about broken links:

  • link href
  • link type
  • http status code
  • status message
  • reference to the page containing the link, including the page's title and location
  • reference to the component containing the link, including the component's title, resource type and location
  • reference to the property containing the link, including the property's name and location

general-view.png

The primary goal is to detect broken links on an Author instance so that content managers can correct them before the content goes live.

Scheduled data feed generation

The tool reads a data feed generated by a scheduled task instead of calculating the data on each request. This avoids additional load on the AEM instance and shortens loading time for users. Internal and external links are retrieved from the content by traversing the repository (which is more efficient than querying for large content volumes) and are then validated (see the section Links Validation for details). All invalid links and their related details are then assembled into the data feed file. Data feed generation runs as an ordered Sling job to avoid overlapping executions.
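The traversal step can be illustrated with a minimal sketch. It models the repository as a nested Python dict, which is purely an assumption for illustration; the tool itself works against the AEM repository, and the function name iter_string_properties is hypothetical.

```python
from typing import Iterator, Tuple


def iter_string_properties(node: dict, path: str = "") -> Iterator[Tuple[str, str, str]]:
    """Depth-first walk yielding (node_path, property_name, value) for string values.

    Assumption: nested dicts stand for child nodes, strings for single-value
    properties and lists of strings for multi-value properties.
    """
    for name, value in node.items():
        if isinstance(value, dict):
            # Child node: recurse with the extended path.
            yield from iter_string_properties(value, f"{path}/{name}")
        elif isinstance(value, str):
            # Single-value property.
            yield path, name, value
        elif isinstance(value, list):
            # Multi-value property: yield each string entry separately.
            for item in value:
                if isinstance(item, str):
                    yield path, name, item
```

Each yielded tuple carries the node path and the property name, which corresponds to the page, component and property references listed in the General section.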

By default, the scheduler is disabled and is configured to start daily at 5 AM (/system/console/configMgr/com.exadel.etoolbox.linkinspector.core.schedulers.DataFeedGenerationTask):

scheduler-config

The data feed is stored as a JSON file at /content/etoolbox-link-inspector/data/datafeed.json

Manual data feed generation

Data feed generation can be triggered manually by sending a GET request to the resource /content/etoolbox-link-inspector/servlet/triggerDataFeedGeneration (admin permissions are required: the admin user or a member of the administrators group). The servlet should be used for testing purposes only or in exceptional cases not covered by the scheduled execution. Normally, the data feed should be generated by the scheduler.
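A manual trigger could look roughly like the sketch below. The host http://localhost:4502, the admin/admin credentials and the use of HTTP Basic authentication are assumptions about a local Author instance, not something this page specifies; only the servlet path comes from the text above.

```python
import base64
import urllib.request

TRIGGER_PATH = "/content/etoolbox-link-inspector/servlet/triggerDataFeedGeneration"


def build_trigger_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the trigger servlet (Basic auth is an assumption)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        host.rstrip("/") + TRIGGER_PATH,
        method="GET",
        headers={"Authorization": f"Basic {token}"},
    )


# Usage (not executed here; requires a running instance and valid credentials):
# request = build_trigger_request("http://localhost:4502", "admin", "admin")
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```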

Data Filtering

A set of configurable options controls content and link filtering, allowing a more precise inspection.

The data feed is generated based on the parameters defined in the OSGi config /system/console/configMgr/com.exadel.etoolbox.linkinspector.core.services.data.impl.GridResourcesGeneratorImpl

generator-config

Links Validation

External Links Validation

The service uses PoolingHttpClientConnectionManager to send HEAD requests concurrently when validating external links. If the returned HTTP status code is outside the range 200-207, the link is considered invalid.

Connection timeout, socket timeout and user agent are configurable (/system/console/configMgr/com.exadel.etoolbox.linkinspector.core.services.impl.ExternalLinkCheckerImpl):

external-links-check-config
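The validation rule (a status code in the range 200-207 means valid) and the concurrent HEAD requests can be sketched as follows. This is not the tool's implementation, which relies on Apache HttpClient; the function names, the thread pool size, the default timeout and the user agent string are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
import urllib.error
import urllib.request


def is_valid_status(code: int) -> bool:
    """A link is treated as valid only for status codes 200 through 207."""
    return 200 <= code <= 207


def head_status(url: str, timeout: float = 5.0) -> int:
    """Send a HEAD request and return the status code.

    Note: urllib follows redirects by default, and connection failures
    (urllib.error.URLError) are not handled in this sketch.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def find_broken(
    urls: Iterable[str],
    fetch: Callable[[str], int] = head_status,
    workers: int = 8,
) -> Dict[str, int]:
    """Check all URLs concurrently and return {url: status} for invalid ones."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(fetch, urls))
    return {url: code for url, code in zip(urls, codes) if not is_valid_status(code)}
```

Passing the fetch function as a parameter keeps the status rule testable without network access.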

The following types of external links are considered:

  • A link stored in a single-value or multi-value property. The link starts with http:// or https://
  • A link contained in a single-value or multi-value property along with text content (RTE). The link starts with http:// or https://
  • A link stored in a single-value or multi-value property. The link starts with www
  • A link contained in a single-value or multi-value property along with text content (RTE). The link starts with www
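The four cases above can be approximated with a short sketch. The regular expression is a hypothetical approximation, not the pattern the tool uses: for links embedded in text it requires "www." (with a dot) and stops at whitespace, quotes or angle brackets.

```python
import re
from typing import List

EXTERNAL_PREFIXES = ("http://", "https://", "www")

# Hypothetical pattern for links embedded in rich text.
EXTERNAL_IN_TEXT = re.compile(r"""(?:https?://|www\.)[^\s"'<>]+""")


def external_links(value: str) -> List[str]:
    """Return external links found in a single property value."""
    value = value.strip()
    # Case 1: the whole property value is a link.
    if value.startswith(EXTERNAL_PREFIXES) and not re.search(r"\s", value):
        return [value]
    # Case 2: the link is embedded in text content (RTE).
    return EXTERNAL_IN_TEXT.findall(value)
```

For a multi-value property, the function would be applied to each value separately.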

Internal Links Validation

An internal link is considered valid if the resource matching the link location is present in the repository. Parallel streams are used to improve validation performance.

The following types of internal links are considered:

  • A link stored in a single-value or multi-value property. The link starts with /content/
  • A link contained in a single-value or multi-value property along with text content (RTE). The link is present in an HTML attribute (such as href, src, action, etc.), so it is expected to start with "/content/ (preceded by a double quote)

If an internal link, retrieved from the content, contains the .html extension, the extension is removed prior to validation.
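The internal link rules can be sketched in the same way. The regular expression and the function names are hypothetical, the set of existing paths stands in for a repository lookup, and only a trailing .html extension is handled here (suffixes such as anchors or query strings are outside this sketch).

```python
import re
from typing import List, Set

# Hypothetical pattern: a /content/ path enclosed in double quotes,
# as in href="/content/...".
INTERNAL_IN_MARKUP = re.compile(r'"(/content/[^"]*)"')


def strip_html_extension(link: str) -> str:
    """Remove a trailing .html extension before validation."""
    return link[: -len(".html")] if link.endswith(".html") else link


def internal_links(value: str) -> List[str]:
    """Return internal links found in a single property value."""
    if value.startswith("/content/"):
        found = [value]
    else:
        found = INTERNAL_IN_MARKUP.findall(value)
    return [strip_html_extension(link) for link in found]


def is_valid_internal(link: str, existing_paths: Set[str]) -> bool:
    """A link is valid if a resource exists at its location."""
    return link in existing_paths
```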

CSV Report

The full report in CSV format can be downloaded by clicking the Download Full Report button:

csv-download

The report contains all found broken links and has the same structure as the UI grid:

csv-sample

If the report and data feed have not been generated yet, a corresponding warning message is displayed when attempting to download the report:

csv-download-warn.png

Support of localized output in the report

Links containing locale-specific characters are properly encoded and displayed in both the grid and the CSV report.

Notes:

  • The report is located at /content/etoolbox-link-inspector/download/report.csv
  • The UI grid is currently limited to 500 items; all collected items are available in the CSV report

Fix Broken Link

The feature is available when a single item is selected and allows replacing the selected link with a specified one:

fix-broken-link-overview.png

ACL check

The user must have read and write permissions for the selected path in order to see the Fix broken link button:

fix-broken-link-acl.png

Otherwise, the button is hidden.

Input link validation

Client side validation

The input link must be neither empty nor equal to the current link:

fix-broken-client-check.png fix-broken-client-check-2.png

Server side validation

The input link is validated on the server side (see the section Links Validation for details) after the dialog is submitted:

fix-broken-server-check.png fix-broken-server-check-2.png

The message contains details (status code, status message) about why the validation failed.

Skip server side validation

If the checkbox Skip input link check before replacement is checked, server-side validation of the input link is skipped. Any link that passes the client-side validation (non-empty and different from the current link) can then be entered, and the replacement won't be interrupted by the validation:

fix-broken-skip-check fix-broken-skip-check-2

The checkbox was introduced for cases where the input link matches neither the internal link pattern (starts with /content/) nor the external link pattern (starts with https:// or http://), e.g. vanity URLs (/my-vanity-path).
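The interplay between the client-side check, the server-side check and the skip checkbox can be summarized in a small sketch. The function names and error messages are hypothetical, and server_check stands in for the validation described in the section Links Validation.

```python
from typing import Callable, Optional


def client_side_error(new_link: str, current_link: str) -> Optional[str]:
    """Return an error message if the input fails the client-side rules."""
    if not new_link.strip():
        return "The link must not be empty"
    if new_link == current_link:
        return "The link must differ from the current one"
    return None


def may_replace(
    new_link: str,
    current_link: str,
    skip_server_check: bool,
    server_check: Callable[[str], bool],
) -> bool:
    """Decide whether the replacement may proceed."""
    # The client-side rules always apply, even when the checkbox is checked.
    if client_side_error(new_link, current_link) is not None:
        return False
    # The server-side check runs only when it is not skipped.
    return skip_server_check or server_check(new_link)
```

With the checkbox checked, a vanity URL such as /my-vanity-path is accepted even though it matches neither link pattern.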

Success message

fix-broken-success.png

Replace by Pattern

The feature applies a regex-based replacement within the scope of the detected broken links.

replace-by-pattern-overview

It is strongly recommended that only privileged users use the 'Replace by Pattern' feature, since improper use might cause broad content updates and, as a consequence, high load on the AEM instance along with undesired changes in the content.

The replacement is done by the servlet mapped to the resource /content/etoolbox-link-inspector/servlet/replaceLinksByPattern, so appropriate ACLs should be applied for this path.

The number of processed items is limited (10k by default) to avoid the side effects of massive content updates. The limit is configurable at /system/console/configMgr/com.exadel.etoolbox.linkinspector.core.servlets.ReplaceByPatternServlet:

replace-by-pattern-config
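A regex replacement with an item limit can be sketched as below. The function name is hypothetical, and counting only the links that actually change toward the limit is an assumption; the page does not say how the tool counts processed items. Python's replacement syntax for groups (\1) also differs from Java's ($1), so the example uses a plain replacement string.

```python
import re
from typing import List, Tuple


def replace_by_pattern(
    links: List[str], pattern: str, replacement: str, limit: int = 10_000
) -> List[Tuple[str, str]]:
    """Return (old_link, new_link) pairs for links changed by the pattern."""
    regex = re.compile(pattern)
    updated: List[Tuple[str, str]] = []
    for link in links:
        # Stop once the configured number of items has been reached.
        if len(updated) >= limit:
            break
        new_link = regex.sub(replacement, link)
        # Links the pattern does not change are left out of the result.
        if new_link != link:
            updated.append((link, new_link))
    return updated
```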

ACL check

The Replace By Pattern button is disabled if the user lacks sufficient read permissions for the servlet /content/etoolbox-link-inspector/servlet/replaceLinksByPattern, which implements the replacement by pattern.

The button is also disabled if the grid has no items:

replace-by-pattern-disabled.png

While links are searched by pattern within the broken links scope, an ACL check is performed for the content paths. If the user doesn't have the read/write permissions required to update the link within a content resource, the item is excluded from processing.

Input fields validation

The input fields must be neither empty nor equal to each other:

replace-by-pattern-validation-1 replace-by-pattern-validation-2

Dry Run

If the Dry Run mode is selected, changes won't be applied to the content. The purpose of the Dry Run is to validate the upcoming changes without modifying the repository.

replace-by-pattern-dry-run.png replace-by-pattern-dry-run-success
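The idea behind the Dry Run mode can be sketched as a flag that lets the update logic run and count changes without writing them. The dict-based store and the function name are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def apply_replacements(
    store: Dict[str, str], changes: List[Tuple[str, str]], dry_run: bool
) -> int:
    """Apply (property_path, new_link) changes and return how many would change.

    Assumption: store maps a property path to the link it currently holds.
    """
    count = 0
    for prop_path, new_link in changes:
        # Skip unknown properties and values that are already up to date.
        if prop_path not in store or store[prop_path] == new_link:
            continue
        count += 1
        # In dry-run mode the change is counted but never written.
        if not dry_run:
            store[prop_path] = new_link
    return count
```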

CSV output

The option to download a CSV output with the details of the replacement by pattern was introduced to make it possible to review the updated items, especially for updates affecting large content volumes:

replace-by-pattern-csv-1 replace-by-pattern-csv-2

The content of the CSV output:

replace-by-pattern-csv-3

If the checkbox Download CSV with updated items is not checked, the success message will contain the number of updated items.

replace-by-pattern-success-items

Backup

The backup package is generated before the replacement and can be further used for reverting any unexpected results:

replace-by-pattern-backup

The package belongs to the group EToolbox_Link_Inspector and has the name "replace_by_pattern_backup_" + generation date in milliseconds:

replace-by-pattern-backup-2
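The naming scheme can be reproduced with a short sketch. It assumes that "generation date in milliseconds" means milliseconds since the Unix epoch, which this page does not state explicitly; the function name is hypothetical.

```python
import time
from typing import Optional

BACKUP_GROUP = "EToolbox_Link_Inspector"
BACKUP_PREFIX = "replace_by_pattern_backup_"


def backup_package_name(now_ms: Optional[int] = None) -> str:
    """Build the backup package name from a millisecond timestamp."""
    if now_ms is None:
        # Assumption: the timestamp is milliseconds since the Unix epoch.
        now_ms = int(time.time() * 1000)
    return f"{BACKUP_PREFIX}{now_ms}"
```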

The user must have sufficient permissions to create the backup package.

Notification about necessity of data feed regeneration

After any link is updated, an alert is shown indicating that the data feed must be regenerated in order to reflect the latest changes:

regeneration-needed-notification.png

Stats popover

The popover contains the last generation stats along with filtering properties.

stats-popover.png