org.apache.uima.examples.cas
Class RegExAnnotator

java.lang.Object
  extended by org.apache.uima.analysis_component.AnalysisComponent_ImplBase
      extended by org.apache.uima.analysis_component.Annotator_ImplBase
          extended by org.apache.uima.analysis_component.CasAnnotator_ImplBase
              extended by org.apache.uima.examples.cas.RegExAnnotator
All Implemented Interfaces:
AnalysisComponent

public class RegExAnnotator
extends CasAnnotator_ImplBase

Annotator that finds substrings of the input document that match regular expressions.

There are two ways to specify the regular expressions: via configuration parameters or via an external resource file.

This annotator takes the following optional configuration parameters:

The indices of the Patterns and TypeNames arrays correspond, so that a substring that matches Patterns[i] will result in an annotation of type TypeNames[i].
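The index correspondence between Patterns and TypeNames can be illustrated with a minimal stand-alone sketch. It uses only java.util.regex and does not touch the CAS; the class ParallelPatternSketch, the Match holder, and the type names example.Year and example.Acronym are hypothetical names introduced for illustration, not part of this annotator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; this is not the RegExAnnotator source. */
final class ParallelPatternSketch {

    /** A matched span together with the type name it would be annotated with. */
    static final class Match {
        final String typeName;
        final int begin;
        final int end;

        Match(String typeName, int begin, int end) {
            this.typeName = typeName;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return typeName + "[" + begin + "," + end + ")";
        }
    }

    /** Pairs patterns[i] with typeNames[i] and collects every match in the text. */
    static List<Match> findMatches(String[] patterns, String[] typeNames, String text) {
        if (patterns.length != typeNames.length) {
            throw new IllegalArgumentException("Patterns and TypeNames must have the same length");
        }
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i < patterns.length; i++) {
            Matcher m = Pattern.compile(patterns[i]).matcher(text);
            while (m.find()) {
                // A substring matching patterns[i] is labelled with typeNames[i].
                matches.add(new Match(typeNames[i], m.start(), m.end()));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] patterns = {"\\d{4}", "[A-Z]{2,}"};
        String[] typeNames = {"example.Year", "example.Acronym"};
        for (Match m : findMatches(patterns, typeNames, "UIMA was released in 2013")) {
            System.out.println(m);
        }
    }
}
```

In the real annotator each match would become a CAS annotation of the corresponding type rather than a plain Java object.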

It is also possible to provide an external resource file that declares the annotation type names and the regular expressions to match. The annotator will look for this file under the resource key "PatternFile". The file format is as follows:

If a regular expression is matched, it will be annotated with the last annotation type declared (the nearest preceding line starting with %).
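The "nearest preceding line starting with %" rule can be sketched as a small parser. This is a hedged illustration, not the annotator's own loader: it assumes that a line starting with % carries a type name in the rest of the line, that every other non-blank line is a regular expression, that blank lines are skipped, and that surrounding whitespace is trimmed. The class name PatternFileSketch and the sample type names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the %-declaration rule; not the RegExAnnotator source. */
final class PatternFileSketch {

    /** Groups each regular expression under the most recently declared type name. */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> patternsByType = new LinkedHashMap<>();
        String currentType = null;
        for (String raw : lines) {
            String line = raw.trim(); // assumption: surrounding whitespace is not significant
            if (line.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            if (line.startsWith("%")) {
                // A % line declares the type used for all following expressions.
                currentType = line.substring(1).trim();
                patternsByType.computeIfAbsent(currentType, k -> new ArrayList<>());
            } else {
                if (currentType == null) {
                    // assumption: an expression with no preceding declaration is an error
                    throw new IllegalArgumentException("Expression before any % declaration: " + line);
                }
                patternsByType.get(currentType).add(line);
            }
        }
        return patternsByType;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "%example.Year",
            "\\d{4}",
            "%example.Acronym",
            "[A-Z]{2,}",
            "[A-Z]\\.[A-Z]\\.");
        System.out.println(parse(lines));
    }
}
```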


Field Summary
static java.lang.String MESSAGE_DIGEST
           
 
Constructor Summary
RegExAnnotator()
           
 
Method Summary
protected  int[] getRangesToAnnotate(CAS aCAS)
          Utility method that determines which subranges of the document text should be annotated by this annotator.
 void initialize(UimaContext aContext)
          Performs any startup tasks required by this annotator.
 void process(CAS aCAS)
          Invokes this annotator's analysis logic.
 void typeSystemInit(TypeSystem aTypeSystem)
          Acquires references to CAS Type and Feature objects that are later used during the process(CAS) method.
 
Methods inherited from class org.apache.uima.analysis_component.CasAnnotator_ImplBase
getRequiredCasInterface, process
 
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase
getCasInstancesRequired, hasNext, next
 
Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
batchProcessComplete, collectionProcessComplete, destroy, getContext, getResultSpecification, reconfigure, setResultSpecification
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MESSAGE_DIGEST

public static final java.lang.String MESSAGE_DIGEST
See Also:
Constant Field Values
Constructor Detail

RegExAnnotator

public RegExAnnotator()
Method Detail

initialize

public void initialize(UimaContext aContext)
                throws ResourceInitializationException
Performs any startup tasks required by this annotator. This implementation reads the configuration parameters and compiles the regular expressions.

Specified by:
initialize in interface AnalysisComponent
Overrides:
initialize in class AnalysisComponent_ImplBase
Parameters:
aContext - Provides access to services and resources managed by the framework. This includes configuration parameters, logging, and access to external resources.
Throws:
ResourceInitializationException - if this AnalysisComponent cannot initialize successfully.
See Also:
BaseAnnotator.initialize(AnnotatorContext)

typeSystemInit

public void typeSystemInit(TypeSystem aTypeSystem)
                    throws AnalysisEngineProcessException
Acquires references to CAS Type and Feature objects that are later used during the process(CAS) method.

Overrides:
typeSystemInit in class CasAnnotator_ImplBase
Throws:
AnalysisEngineProcessException - if the provided type system is missing types or features required by this annotator
See Also:
BaseAnnotator.typeSystemInit(TypeSystem)

process

public void process(CAS aCAS)
             throws AnalysisEngineProcessException
Invokes this annotator's analysis logic. This annotator uses the Java regular expression package (java.util.regex) to find annotations using the regular expressions defined by its configuration parameters.

Specified by:
process in class CasAnnotator_ImplBase
Parameters:
aCAS - the CAS to process
Throws:
AnalysisEngineProcessException - if a failure occurs during processing
See Also:
CasAnnotator_ImplBase.process(CAS)

getRangesToAnnotate

protected int[] getRangesToAnnotate(CAS aCAS)
Utility method that determines which subranges of the document text should be annotated by this annotator. This is done as follows:

Parameters:
aCAS - CAS currently being processed
Returns:
an array of integers indicating the document subranges eligible for annotation. Begin and end positions of the subranges are stored in successive elements of the array. For example, elements 0 and 1 are the start and end of the first subrange; elements 2 and 3 are the start and end of the second subrange, and so on.
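A caller might consume the returned array by stepping through it two elements at a time. The sketch below is a stand-alone illustration under stated assumptions: it treats each end position as exclusive, restricts matching with Matcher.region, and uses a hypothetical class name RangeArraySketch. It does not reproduce how process(CAS) actually uses the ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the flat begin/end array layout. */
final class RangeArraySketch {

    /**
     * Finds matches only inside the given subranges.
     * ranges[0], ranges[1] bound the first subrange; ranges[2], ranges[3] the second; and so on.
     * Returns {begin, end} pairs as offsets into the full text.
     */
    static List<int[]> matchWithinRanges(Pattern pattern, String text, int[] ranges) {
        if (ranges.length % 2 != 0) {
            throw new IllegalArgumentException("ranges must hold begin/end pairs");
        }
        List<int[]> spans = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        for (int i = 0; i < ranges.length; i += 2) {
            // Assumption: the end position is exclusive, as Matcher.region expects.
            m.region(ranges[i], ranges[i + 1]);
            while (m.find()) {
                spans.add(new int[] {m.start(), m.end()});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "ab 12 cd 34 ef 56";
        int[] ranges = {0, 5, 12, 17}; // two subranges: [0,5) and [12,17)
        for (int[] span : matchWithinRanges(Pattern.compile("\\d+"), text, ranges)) {
            System.out.println(span[0] + "-" + span[1] + ": " + text.substring(span[0], span[1]));
        }
    }
}
```

Storing the subranges as a flat int array avoids allocating a small object per range; the cost is that callers must keep the pairing of even and odd indices straight.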


Copyright © 2013. All Rights Reserved.