Microservice.Exchange

A .NET Core library that provides core web crawler functionality.

Crawler.Core

Core library for parsing HTML documents. Document parts can be defined explicitly or detected automatically. The result is returned as a DocumentPart composite.
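As a rough illustration of what a composite of document parts can look like, the sketch below models leaves (single pieces of text) and groups (containers of other parts). The type names `PartNode`, `TextPart`, and `GroupPart` are hypothetical and chosen for this example only; they are not the library's DocumentPart API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; not the library's DocumentPart API.
public abstract class PartNode
{
    public string Name { get; }
    protected PartNode(string name) => Name = name;

    // Returns every text fragment held by this node and its descendants, in order.
    public abstract IEnumerable<string> Texts();
}

// Leaf: one extracted piece of text, such as a title or a paragraph.
public sealed class TextPart : PartNode
{
    public string Text { get; }
    public TextPart(string name, string text) : base(name) => Text = text;
    public override IEnumerable<string> Texts() => new[] { Text };
}

// Composite: groups child parts, such as an article made of a title and a body.
public sealed class GroupPart : PartNode
{
    private readonly List<PartNode> _children = new List<PartNode>();
    public GroupPart(string name) : base(name) { }

    // Adds a child and returns this group so calls can be chained.
    public GroupPart Add(PartNode child)
    {
        _children.Add(child);
        return this;
    }

    public override IEnumerable<string> Texts() => _children.SelectMany(c => c.Texts());
}
```

Because leaves and groups share one base type, a caller can walk a whole page or a single part with the same `Texts()` call.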

The library also contains other core interfaces and abstractions, for example:

CrawlRequest
CrawlResponse
UserActions (interaction with a web page)

Crawler.Strategies.Core

Core library with strategies for crawling websites. A basic strategy with continuation is provided. Custom strategies can be introduced by implementing ICrawlStrategy and ICrawlContinuationStrategy.

The following strategies are provided:

Crawl all links in a page
Crawl domain-specific links
Track links in a page

Note: the CrawlConfiguration can specify how to crawl a particular website.
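To make the idea of crawling only domain-specific links concrete, here is a minimal sketch that keeps the links on a page whose host matches the page's own host. The `LinkFilter.SameDomain` helper is an assumption made for this example; the signatures of ICrawlStrategy and ICrawlContinuationStrategy are not shown here, so the sketch deliberately does not implement them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Crawler.Strategies.Core.
public static class LinkFilter
{
    // Resolves each href against the page URI and keeps unique http(s) links on the same host.
    public static IReadOnlyList<Uri> SameDomain(Uri page, IEnumerable<string> hrefs)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<Uri>();

        foreach (var href in hrefs)
        {
            // Skip anything that cannot be resolved to an absolute URI.
            if (!Uri.TryCreate(page, href, out var absolute)) continue;

            // Skip non-web schemes such as mailto:.
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps) continue;

            // Skip links that leave the page's host.
            if (!string.Equals(absolute.Host, page.Host, StringComparison.OrdinalIgnoreCase)) continue;

            // Keep the first occurrence of each link only.
            if (seen.Add(absolute.AbsoluteUri)) result.Add(absolute);
        }

        return result;
    }
}
```

A "crawl all links" variant would simply drop the host comparison. Matching on the exact host is one possible definition of "domain specific"; treating subdomains as the same site would need a different comparison.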

Crawler.WebDriver.Core

Core library defining the interfaces for a web driver. The crawler interacts with the web driver to perform user actions and to extract data.

Crawler.Scheduler.Core

Core library that provides scheduling logic for crawls, for example:

Hourly crawl of a website
Crawl links found via Crawler Strategy
Schedule outstanding crawls
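The scheduling cases above can be pictured with a small sketch that decides which recurring crawls are outstanding at a given moment. `ScheduledCrawl` and `CrawlSchedule` are hypothetical names introduced for this example and are not taken from Crawler.Scheduler.Core.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; not the Crawler.Scheduler.Core API.
public sealed class ScheduledCrawl
{
    public ScheduledCrawl(string url, TimeSpan interval, DateTimeOffset? lastRun = null)
    {
        Url = url;
        Interval = interval;
        LastRun = lastRun;
    }

    public string Url { get; }
    public TimeSpan Interval { get; }
    public DateTimeOffset? LastRun { get; private set; }

    // A crawl is due if it has never run or if at least one interval has elapsed.
    public bool IsDue(DateTimeOffset now) =>
        LastRun == null || now - LastRun.Value >= Interval;

    // Records a completed run so the next one is due one interval later.
    public void MarkRun(DateTimeOffset now) => LastRun = now;
}

public static class CrawlSchedule
{
    // Returns the crawls that should be started at the given time.
    public static List<ScheduledCrawl> Outstanding(IEnumerable<ScheduledCrawl> crawls, DateTimeOffset now) =>
        crawls.Where(c => c.IsDue(now)).ToList();
}
```

An hourly crawl would be created with `TimeSpan.FromHours(1)`. Passing `now` in as a parameter, rather than reading the clock inside the class, keeps the logic easy to unit test.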

Crawler.RequestHandling.Core

Core library for managing request throttling. Target websites must not be overloaded with requests, since that could amount to a denial of service; requests are throttled instead.
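One simple way to throttle is to enforce a minimum delay between requests to the same host. The sketch below illustrates that idea only; `HostThrottle` is a hypothetical class written for this example, and the technique shown (fixed per-host spacing) is an assumption, not a description of how Crawler.RequestHandling.Core works.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration; not the Crawler.RequestHandling.Core API.
public sealed class HostThrottle
{
    private readonly TimeSpan _minDelay;
    private readonly object _gate = new object();

    // Earliest time at which the next request to each host may be sent.
    private readonly Dictionary<string, DateTimeOffset> _nextAllowed =
        new Dictionary<string, DateTimeOffset>(StringComparer.OrdinalIgnoreCase);

    public HostThrottle(TimeSpan minDelay) => _minDelay = minDelay;

    // Reserves the next request slot for the host and returns how long
    // the caller should wait before sending the request.
    public TimeSpan Reserve(string host, DateTimeOffset now)
    {
        lock (_gate)
        {
            // Use the reserved slot if it lies in the future, otherwise send immediately.
            var slot = _nextAllowed.TryGetValue(host, out var next) && next > now ? next : now;

            // Push the following slot one delay further out.
            _nextAllowed[host] = slot + _minDelay;
            return slot - now;
        }
    }
}
```

A caller could use it as `await Task.Delay(throttle.Reserve(uri.Host, DateTimeOffset.UtcNow));` before each request. Requests to different hosts do not delay each other, while bursts to one host are spread out evenly.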

Crawler.Management.Core

Core library for bootstrapping the crawler framework, with various data sources to read requests from and to publish results to.

The following data sources are provided:

File
RabbitMQ
Elasticsearch
MongoDB

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

