DReAM

Features


Last updated 3 years ago


The DReAM Parser can currently parse the following document types:

  1. HTML - HyperText Markup Language (HTML) is a text-based markup language that describes how the content of an HTML file is structured.

  2. PDF - The Parser can also extract the text content of Portable Document Format (PDF) files and compute the corresponding facets for it.

  3. XML - Extensible Markup Language (XML) is a metalanguage that lets users define their own custom markup languages, notably for displaying documents on the Internet. When parsing XML, the Parser also uses an intermediary table known as the Inter Table.
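Before a document can be handled as HTML, PDF or XML, its type has to be determined. The source does not say how the Parser does this; the sketch below assumes detection from the HTTP `Content-Type` header, with a fallback on the file extension of the URL. The function name and both mapping tables are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from MIME types to the three supported document types.
MIME_TYPES = {
    "text/html": "HTML",
    "application/xhtml+xml": "HTML",
    "application/pdf": "PDF",
    "application/xml": "XML",
    "text/xml": "XML",
}

# Fallback mapping from file extensions, used when the MIME type is missing or unknown.
EXTENSIONS = {
    ".html": "HTML",
    ".htm": "HTML",
    ".pdf": "PDF",
    ".xml": "XML",
}


def detect_document_type(url: str, content_type: Optional[str] = None) -> Optional[str]:
    """Return "HTML", "PDF" or "XML", or None if the document type is not supported."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before looking up the MIME type.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in MIME_TYPES:
            return MIME_TYPES[mime]
    # Otherwise fall back to the extension of the URL path (the query string is ignored).
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return EXTENSIONS.get(suffix)
```

A recognised `Content-Type` such as `text/html; charset=utf-8` takes precedence; when it is absent or unrecognised, a URL ending in `.pdf` is classified as PDF, and anything else unrecognised yields `None`.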

Launching

The Parser is launched as follows:

  • The user selects the collecte to be parsed from the collecte list.

  • Once the collecte has been selected, the user clicks the Lancement (Launch) button to start parsing.
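Conceptually, clicking Lancement amounts to parsing every URL gathered by the selected collecte. The sketch below illustrates that loop under loudly stated assumptions: a collecte is modelled as a plain list of URLs, the per-document parser is passed in as a callable, and a dictionary stands in for the Indexation Engine table. `launch_parsing` and `LaunchReport` are hypothetical names, not part of the DReAM code base.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LaunchReport:
    """Outcome of one (hypothetical) parsing launch for a collecte."""
    collecte: str
    parsed: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)


def launch_parsing(
    collecte: str,
    collectes: Dict[str, List[str]],
    parse_url: Callable[[str], dict],
    index: Dict[str, dict],
) -> LaunchReport:
    """Parse every URL of the selected collecte and store each result keyed by URL.

    Assumptions: `collectes` maps a collecte name to the URLs it gathered,
    `parse_url` is the per-document parser and `index` stands in for the
    Indexation Engine table.
    """
    if collecte not in collectes:
        raise KeyError(f"unknown collecte: {collecte}")
    report = LaunchReport(collecte)
    for url in collectes[collecte]:
        try:
            index[url] = parse_url(url)
            report.parsed.append(url)
        except Exception as exc:  # record the failure and carry on with the next URL
            report.failed[url] = str(exc)
    return report
```

Failures are recorded per URL rather than aborting the run, so one malformed document does not block the rest of the collecte.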

After parsing, the data is written to a table called the Indexation Engine (IE). CRUD operations are provided on this table so that a user can manually update the entry of any parsed URL.
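The CRUD operations on the IE table can be illustrated with SQLite. This is a minimal sketch, not the DReAM implementation: the table name `indexation_engine`, the snake_case column names (mirroring url, title and the four facet columns Fonction, Secteur, Type d'Info and Theme) and the use of `sqlite3` are all assumptions.

```python
import sqlite3
from typing import Optional

# Hypothetical IE table; the real DReAM schema and database engine may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS indexation_engine (
    url       TEXT PRIMARY KEY,
    title     TEXT,
    fonction  TEXT,
    secteur   TEXT,
    type_info TEXT,
    theme     TEXT
)
"""
# Editable columns, i.e. everything except the primary key.
COLUMNS = ("title", "fonction", "secteur", "type_info", "theme")


def _check_columns(names) -> None:
    """Reject unknown column names, since they are interpolated into the SQL text."""
    unknown = set(names) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown IE columns: {sorted(unknown)}")


def create_entry(conn: sqlite3.Connection, url: str, **values: str) -> None:
    """Insert a row for a newly parsed URL (raises IntegrityError if the URL exists)."""
    _check_columns(values)
    cols = ", ".join(["url", *values])
    marks = ", ".join("?" * (len(values) + 1))
    conn.execute(f"INSERT INTO indexation_engine ({cols}) VALUES ({marks})",
                 (url, *values.values()))


def read_entry(conn: sqlite3.Connection, url: str) -> Optional[dict]:
    """Return the row for `url` as a dict, or None if the URL is not indexed."""
    row = conn.execute(
        f"SELECT url, {', '.join(COLUMNS)} FROM indexation_engine WHERE url = ?",
        (url,),
    ).fetchone()
    return None if row is None else dict(zip(("url", *COLUMNS), row))


def update_entry(conn: sqlite3.Connection, url: str, **values: str) -> bool:
    """Manually overwrite some columns of an existing row; True if a row was changed."""
    _check_columns(values)
    if not values:
        return False
    assignments = ", ".join(f"{col} = ?" for col in values)
    cur = conn.execute(f"UPDATE indexation_engine SET {assignments} WHERE url = ?",
                       (*values.values(), url))
    return cur.rowcount == 1


def delete_entry(conn: sqlite3.Connection, url: str) -> bool:
    """Remove the row for `url`; True if a row was deleted."""
    cur = conn.execute("DELETE FROM indexation_engine WHERE url = ?", (url,))
    return cur.rowcount == 1
```

Values always go through `?` placeholders, and column names are checked against a whitelist because they are written into the SQL text. The functions do not commit; call `conn.commit()` when working with a file-backed database.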

An example of the IE table will be added once Marius has updated the new IE table in production.

The Indexation Engine table has the following structure:

| IE table column | Meaning |
|---|---|
| url | URL of the parsed document |
| title | Title of the parsed document |
| Fonction | Facet (function) computed during parsing; not overwritten if the document was pre-indexed |
| Secteur | Facet (sector) computed during parsing; not overwritten if the document was pre-indexed |
| Type d'Info | Facet (information type) computed during parsing; not overwritten if the document was pre-indexed |
| Theme | Facet (theme) computed during parsing; not overwritten if the document was pre-indexed |
| Operations | CRUD actions for manually updating the indexed entry |
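The rule that facets are not overwritten if pre-indexed can be expressed as a merge between the row already stored in the IE table and the row produced by a new parse. The source does not define "pre-indexed"; this sketch assumes it means that the stored row already holds a non-empty value for a facet, in which case that value is kept. `merge_parsed_row` is a hypothetical helper.

```python
from typing import Dict, Optional

# Facet columns of the IE table, as listed in the structure above.
FACETS = ("Fonction", "Secteur", "Type d'Info", "Theme")

Row = Dict[str, Optional[str]]


def merge_parsed_row(existing: Optional[Row], parsed: Row) -> Row:
    """Combine a freshly parsed row with the row already stored in the IE table.

    Assumption: a facet is "pre-indexed" when the stored row already holds a
    non-empty value for it; that value is kept instead of the newly computed one.
    """
    if existing is None:
        # First time this URL is seen: take everything from the parse.
        return dict(parsed)
    # Start from the stored row and let the new parse refresh every field...
    merged = {**existing, **parsed}
    # ...then restore the facets that were already filled in.
    for facet in FACETS:
        if existing.get(facet):
            merged[facet] = existing[facet]
    return merged
```

With this rule, the url and title columns always reflect the latest parse, while a facet that was set earlier (for instance through a manual CRUD update) survives re-parsing.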
