Features

Crawling is a process that browses the Origines specified by the user in a methodical, automated manner.

The crawler is launched as follows: select an Origine from the Origine table and launch it. The crawler's parameters will already have been set (see Origin Parameters).

Once the crawl is complete, the contents are inserted into a Collecte table.
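As a rough illustration of this flow, the sketch below walks a tiny in-memory link graph breadth-first from one Origine and gathers every URL it reaches into a list that stands in for the Collecte table. The `LINKS` graph, the `crawl` function and the `max_urls` limit are hypothetical names made up for this example, not the product's API; a real crawler would fetch pages over the network and apply the parameters set on the Origine.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: URL -> outgoing links.
LINKS = {
    "https://example.org/": ["https://example.org/a", "https://example.org/b"],
    "https://example.org/a": ["https://example.org/b", "https://example.org/c.pdf"],
    "https://example.org/b": ["https://example.org/"],
    "https://example.org/c.pdf": [],
}


def crawl(origine, links, max_urls=100):
    """Visit URLs breadth-first from `origine`, each at most once."""
    seen = {origine}
    queue = deque([origine])
    collecte = []  # stands in for the rows inserted into the Collecte table
    while queue and len(collecte) < max_urls:
        url = queue.popleft()
        collecte.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return collecte


rows = crawl("https://example.org/", LINKS)
```

The `seen` set is what keeps the walk methodical: a page that links back to the Origine does not cause it to be collected twice.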

The Collecte list has the following structure:

Collecte List Parameter   Description
Id                        Unique ID identifying each collecte
Nom                       Name of the collecte
URL's                     Root URL (Origine) from which the collecte was started
Debut                     Time at which the collecte was started
Duree                     Total time the crawler took to collect the URLs
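For readers who think in terms of data structures, one entry of the Collecte list could be modelled as below. This is only a sketch: the class name `CollecteSummary`, the Python types chosen for each column and the derived `fin` (end time) property are assumptions for illustration and are not taken from the product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CollecteSummary:
    """One entry of the Collecte list; fields mirror the columns above."""
    id: int           # Id: unique identifier of the collecte
    nom: str          # Nom: name of the collecte
    url: str          # URL's: root URL (Origine) the collecte started from
    debut: datetime   # Debut: when the collecte started
    duree: timedelta  # Duree: total time taken by the crawler

    @property
    def fin(self) -> datetime:
        # Hypothetical helper: end time derived from Debut + Duree.
        return self.debut + self.duree


summary = CollecteSummary(
    id=1,
    nom="example-collecte",
    url="https://example.org/",
    debut=datetime(2024, 1, 1, 9, 0, 0),
    duree=timedelta(minutes=12, seconds=30),
)
```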

Selecting an individual collecte gives access to the CRUD (create, read, update, delete) operations on its Collecte table.

The Collecte table has the following structure:

Collecte Table Parameter   Description
Url                        Each URL that has been crawled from its Origine
Langue                     Language of the URL (inherited from its Origine)
Type                       Type of the crawled URL, such as xml, html or pdf
Facets                     Pre-indexation can also be done at the level of an individual URL
A Parser                   Set to oui if the URL is to be parsed, non otherwise
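The sketch below shows how one row of the Collecte table might be built, with Langue inherited from the Origine and A Parser stored as oui or non. Everything named here (`CollecteRow`, `make_row`, `guess_type`) is hypothetical, and guessing the type from the file extension is an assumption made for the example; the product may well determine the type differently. Facets are left out because their format is not described here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse
import posixpath


def guess_type(url: str, default: str = "html") -> str:
    """Guess a document type from the URL's file extension (an assumption)."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower().lstrip(".")
    return ext if ext in {"xml", "html", "pdf"} else default


@dataclass
class CollecteRow:
    url: str       # Url: the crawled URL
    langue: str    # Langue: inherited from the Origine
    type: str      # Type: e.g. "xml", "html" or "pdf"
    a_parser: str  # A Parser: "oui" if the URL is to be parsed, else "non"


def make_row(url: str, origine_langue: str, parse: bool = True) -> CollecteRow:
    """Build a row, copying the language from the Origine."""
    return CollecteRow(
        url=url,
        langue=origine_langue,
        type=guess_type(url),
        a_parser="oui" if parse else "non",
    )


row = make_row("https://example.org/docs/report.pdf", "fr", parse=False)
```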
