Tutorial: Trends in Robust Parsing
Jacques Vergne
GREYC
Université de Caen
FRANCE
https://lucasn01.users.greyc.fr/JacquesVergne/
Aim of this tutorial:
The aim of this tutorial is to outline the fundamental trends in the
evolution of robust parsing across its variety of concepts, techniques,
and parsing processes, and to give a synthetic view of the topic that
stresses how those concepts and methods have evolved.
Today, robust parsing is moving rapidly from
tagging to chunking and clause bracketing. Partial parsing is becoming
less and less partial, while keeping computational properties that allow good
integration into industrial contexts, where linear complexity is a prerequisite:
robust parsers can process raw linguistic material at a constant
and foreseeable rate, with foreseeable results.
This tutorial is designed for PhD students and
researchers in NLP. The expected prerequisite is a basic knowledge of parsing
and tagging.
Tutorial outline:
-
Introduction :
2 meanings of parsing : parsing with formal grammars
versus robust parsing
-
Standard operations in robust parsing :
-
part-of-speech tagging
-
function of tagging : giving a part-of-speech tag to every word
for shallow parsing on raw material
or for replacing morpho-lexical analysis before syntactic analysis
-
an explicit process, but no explicit expected structures
-
resources :
extracting probabilities from hand-tagged corpora
(for instance : Church 1988 and 1993, Merialdo 1994)
or extracting symbolic rules from hand-tagged corpora (Brill tagger)
or manually writing symbolic rules (Xerox Grenoble, GREYC Caen)
-
the importance of the tagset :
for contextual deductions,
and to measure performance
-
different ways to articulate lexica and contextual resources
-
the tagging process : triggering rules on tokens
-
linear complexity, constant and foreseeable rate
-
the beginning of a renewal in parsing strategies
-
chunking
-
the concept of chunk (Abney 1991, Church 1988)
-
a prosodic segment
-
a non-recursive constituent
-
functions of chunking : delimiting and typing chunks
-
lexical resources : beginnings, endings and separators of
chunks
-
the chunking process : as in tagging, triggering rules on
tokens
-
chunking after tagging? or tagging and chunking together?
-
why tagging is easier and more accurate inside typed chunks
-
grammatical words => beginning of a chunk, and type of the
chunk
-
the type of the chunk constrains the word categories
-
linking chunks
-
with an algorithm of linear complexity (Vergne and Giguet)
-
clause bracketing and computing chunk main functions
inside a clause
-
the work done at Xerox Grenoble (Aït-Mokhtar and Chanod 1997)
-
clause bracketing before chunking (Ejerhed 1996)
-
Shared properties, and differences in robust parsing
-
non-recursive representations of constituent structures
-
imply a hierarchy of constituents of different types :
token, chunk, clause, sentence, paragraph, ...
-
and are a "comeback" of dependency representations
-
implementations
-
implementations based on statistical models (for instance:
Church 1988, Merialdo 1991 and 1995, Briscoe and Carroll 1993, Rajman 1995)
-
or finite state transducers (for instance: Ejerhed 1996,
Abney 1996, Aït-Mokhtar and Chanod 1997)
-
or rules and engine (GREYC Caen)
-
Two technologies to implement symbolic rules
-
Finite-State Transducers (FST)
-
Engine and rules
-
Typical applications
-
Comparing robust parsing with formal grammar parsing
-
Introduction to the practical
The aim of the practical was to illustrate the course
and to give participants the opportunity to practice on the "GREYC parser",
a general platform for designing and building parsers.
The "GREYC parser" is described (in French) at:
https://lucasn01.users.greyc.fr/JacquesVergne/analyseur_GREYC/analyseur_du_GREYC.html
The practical consisted of:
-
Chunking English :
-
executing a very simple chunker for English
-
modifying rules
-
Changing the natural language:
-
making a chunker for French
-
making a chunker for another language
-
Linking 2 chunks : the subject noun chunk to the verb chunk
-
Changing the scale of the computed unit: clause bracketing
-
The genericity of the GREYC engine
-
the language dimension
-
the scale dimension
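The first and third steps of the practical (chunking English, then linking the subject noun chunk to the verb chunk) can be pictured with a toy sketch in Python. It is not the GREYC parser or its rule language: the lexicon `STARTERS`, the `Chunk` class, and the functions `chunk` and `link_subjects` are all hypothetical names introduced here for illustration. The sketch only assumes what the outline states: grammatical words open a chunk and fix its type, rules are triggered token by token, and both passes run in linear time.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy lexicon: grammatical words that open a chunk and give its type.
STARTERS = {
    "the": "NC", "a": "NC", "this": "NC",   # determiners open a noun chunk
    "in": "PC", "on": "PC", "of": "PC",     # prepositions open a prepositional chunk
    "is": "VC", "has": "VC", "will": "VC",  # auxiliaries open a verb chunk
}

@dataclass
class Chunk:
    kind: str
    tokens: list = field(default_factory=list)
    subject: Optional["Chunk"] = None  # filled in for verb chunks by link_subjects

def chunk(tokens):
    """One left-to-right pass: each token either opens a chunk or joins the current one."""
    chunks = []
    for tok in tokens:
        kind = STARTERS.get(tok.lower())
        last = chunks[-1] if chunks else None
        # A determiner right after a preposition stays inside the prepositional chunk.
        inside_pc = (
            kind == "NC" and last is not None
            and last.kind == "PC" and len(last.tokens) == 1
        )
        if kind is not None and not inside_pc:
            chunks.append(Chunk(kind))
        elif last is None:
            chunks.append(Chunk("UNK"))  # no trigger word seen yet
        chunks[-1].tokens.append(tok)
    return chunks

def link_subjects(chunks):
    """One pass over the chunks: link each verb chunk to the nearest preceding noun chunk."""
    pending = None
    for c in chunks:
        if c.kind == "NC":
            pending = c
        elif c.kind == "VC" and pending is not None:
            c.subject = pending
            pending = None
    return chunks

if __name__ == "__main__":
    for c in link_subjects(chunk("the cat is sleeping on the mat".split())):
        print(c.kind, c.tokens, "subject:", c.subject.tokens if c.subject else None)
```

On the sentence "the cat is sleeping on the mat", this sketch delimits a noun chunk "the cat", a verb chunk "is sleeping" linked to that noun chunk as its subject, and a prepositional chunk "on the mat". Changing the language, as in the second step of the practical, would here amount to swapping the `STARTERS` lexicon.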
For further details, contact me by email: Jacques.Vergne@unicaen.fr
Tutorial speakers:
Jacques Vergne (Jacques.Vergne@unicaen.fr) is a lecturer and researcher
in computer science and NLP at the GREYC,
the computer science laboratory of the University of Caen (France). His
research domain is robust and accurate parsing. He built the 1998
parser, which obtained the best results in the GRACE contest (http://limsi.fr/TLP/grace/),
an international evaluation that aimed to compare taggers for French
under a single protocol.
Emmanuel Giguet acted as project manager of the team that built the
"GREYC parser". His PhD thesis gave the 1998 parser a more general
design, which is now implemented in the "GREYC parser".
Some references for a preliminary insight into the topic:
Abney S. (1991). "Parsing By Chunks". In: Robert Berwick,
Steven Abney and Carol Tenny (eds.), Principle-Based Parsing. Kluwer Academic
Publishers, Dordrecht. http://www.sfs.nphil.uni-tuebingen.de/~abney/Abney_90e.ps.gz
Abney S. (1995). "Chunks and Dependencies: Bringing Processing
Evidence to Bear on Syntax". In: Computational Linguistics and the Foundations
of Linguistic Theory. CSLI. pp. 145-164.
http://www.sfs.nphil.uni-tuebingen.de/~abney/Abney_91i.ps.gz
Abney S. (1996b). "Partial Parsing via Finite-State Cascades".
In Proceedings of the ESSLLI '96 Robust Parsing Workshop.
http://www.sfs.nphil.uni-tuebingen.de/~abney/96h.ps.gz
Aït-Mokhtar S. and Chanod J.-P. (1997). "Incremental
Finite-State Parsing". In Proceedings of ANLP'97, Washington, pp. 72-79.
http://www.xrce.xerox.com/publis/mltt/mltt-97-01.ps
Brill E. (1992). "A simple rule-based part-of-speech tagger".
In Proceedings of the Third Conference on Applied Natural Language Processing,
Trento. ACL.
Church K. and Mercer R. (1993). "Introduction to the Special
Issue on Computational Linguistics Using Large Corpora". Computational
Linguistics, volume 19, number 1, pp. 1-24.
Computational Linguistics (1993). "Special issue on Using
large corpora". Volume 19, numbers 1 and 2.
Ejerhed E. (1996). "Finite state segmentation of discourse
into clauses". In Proceedings of ECAI'96 Workshop Extended finite state
models of language, A. Kornai (Ed.), pp. 24-33. http://www.kornai.com/ECAI/ejerhed.html
Giguet E., Vergne J. (1997). "From Part-of-Speech Tagging
to Memory-based Deep Syntactic Analysis". In Proceedings of the International
Workshop on Parsing Technologies (IWPT'97), MIT, Boston, Massachusetts.
https://giguete.users.greyc.fr/iwpt97/GiguetIwpt97.pdf
Giguet E. (1998). "Méthode pour l'analyse automatique
de structures formelles sur documents multilingues". PhD thesis, Université
de Caen.
https://giguete.users.greyc.fr/these/
Grefenstette G. (1996). "Light Parsing as Finite-State
Filtering". ECAI'96 workshop on "Extended finite state models of language".
Aug. 11-12, Budapest.
http://www.xrce.xerox.com/publis/mltt/mltt-96-12.ps
Vergne J. and Giguet E. (1998). "Regards Théoriques
sur le Tagging". Cinquième conférence annuelle : Le Traitement
Automatique des Langues Naturelles, TALN'98, Paris, pp. 22-31.
https://lucasn01.users.greyc.fr/JacquesVergne/VergneGiguetTaln98.pdf