Features
Currently, DREAM Parser can parse the following types of documents:
HTML - Hypertext Markup Language (HTML) is a text-based approach to describing how the content of an HTML file is structured.
PDF - The Parser can also recognise the text content of Portable Document Format (PDF) files and analyse the corresponding facets for it.
XML - Extensible Markup Language (XML) is a metalanguage that allows users to define their own customized markup languages, especially in order to display documents on the Internet. XML parsing also uses an intermediary table known as the Inter Table.
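As a rough illustration of how a document could be routed to one of the three supported types, here is a minimal sketch that classifies a URL by its file extension. The mapping `SUPPORTED_TYPES` and the function `detect_document_type` are hypothetical names introduced for this example; the way DREAM Parser actually detects document types is not described here and may well differ (for instance, by inspecting the HTTP content type).

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from file extension to a supported document type.
# This is an assumption for illustration, not DREAM Parser's real logic.
SUPPORTED_TYPES = {
    ".html": "HTML",
    ".htm": "HTML",
    ".pdf": "PDF",
    ".xml": "XML",
}


def detect_document_type(url: str) -> Optional[str]:
    """Return "HTML", "PDF" or "XML" based on the URL's extension, else None."""
    # Keep only the path so query strings and fragments are ignored.
    path = urlparse(url).path
    suffix = PurePosixPath(path).suffix.lower()
    return SUPPORTED_TYPES.get(suffix)
```

A URL whose extension is not in the mapping returns `None`, which a caller could treat as "not parseable by this sketch".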
Launching
The Parser is launched as follows:
The user selects which collecte is to be parsed from the .
Once the collecte has been selected, click the Lancement button to start the parsing.
After parsing, the data is fed into a table called the Indexation Engine (IE). CRUD operations are provided for this table in case a user needs to manually update any of the URLs that have been parsed.
An example of the IE Table is shown below:
(Example pending: waiting for Marius's update of the new IE Table on production.)
The structure of the Indexation Engine is as follows:

| IE Table element | Meaning |
| --- | --- |
| url | The URL of the document that has been parsed |
| title | The title of the document that has been parsed |
| Fonction | Facet calculated during parsing; not overwritten if pre-indexed |
| Secteur | Facet calculated during parsing; not overwritten if pre-indexed |
| Type d'Info | Facet calculated during parsing; not overwritten if pre-indexed |
| Theme | Facet calculated during parsing; not overwritten if pre-indexed |
| Operations | CRUD feature for manually updating the indexation |
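The "not overwritten if pre-indexed" rule for the four facets can be sketched as follows. This is only an illustration under stated assumptions: the `IERecord` class, the `apply_parsed_facets` function, and the choice of `None` to mean "not pre-indexed" are hypothetical and are not taken from the actual Indexation Engine schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Facet names as listed in the IE Table structure.
FACETS = ("Fonction", "Secteur", "Type d'Info", "Theme")


@dataclass
class IERecord:
    """Hypothetical in-memory view of one IE Table row."""
    url: str
    title: str
    # A missing key or a None value means the facet was not pre-indexed (assumption).
    facets: Dict[str, Optional[str]] = field(default_factory=dict)


def apply_parsed_facets(record: IERecord, parsed: Dict[str, str]) -> IERecord:
    """Fill in facets calculated during parsing without overwriting pre-indexed ones."""
    for name in FACETS:
        already_indexed = record.facets.get(name) is not None
        if not already_indexed and name in parsed:
            record.facets[name] = parsed[name]
    return record


# Usage: "Secteur" was pre-indexed, so the parsed value does not replace it.
record = IERecord(
    url="https://example.com/doc.html",
    title="Example document",
    facets={"Secteur": "Banque"},
)
apply_parsed_facets(record, {"Secteur": "Assurance", "Theme": "Risque"})
```

After the call, `Secteur` keeps its pre-indexed value while `Theme` takes the value calculated during parsing. Manual corrections would go through the CRUD operations rather than through this merge step.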