Regular Expressions
Parsing Numbers





Introduction

This tutorial presents a rudimentary explanation of regular expressions (RE's) and their use in matching and replacing text found in rib files. Fortunately, rib files have a consistent structure that makes writing RE's relatively straightforward. For example, an RE that matches the following statement in a rib file can take advantage of the fact that the values associated with the Illuminate statement are always integers rather than floating point numbers.

    Illuminate 2 1

Although the examples in this tutorial, written in Tcl, assume that the text being searched or replaced consists of rib statements originating from a rib file, the code that deals with the opening, reading and closing of text files has been omitted. Details about file handling can be found in the tutorial "Tcl: File Filtering". There are two main procedures in Tcl that use RE's, namely,

    regsub OPTIONS REGULAR_EXPRESSION  text_in  text_replace  text_out 
    regexp OPTIONS REGULAR_EXPRESSION  text_in  text_out 

Examples 1 and 2 demonstrate how these procedures are used.


Example 1 regsub


set input "Hello bafundza"
  
regsub {bafundza} $input {student} copy
puts $copy

The text within the first set of curly braces specifies a regular expression, called a pattern, consisting of literal characters that regsub will compare to the input text. The first part of the input text that matches the pattern is replaced by the text ("student") bounded by the second set of curly braces. Using Cutter to execute the Tcl script shown in example 1 generates the following output.

    Hello student

For information about the execution of Tcl scripts with Cutter refer to the tutorial "Cutter: About".
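
As a small extension of example 1, the sketch below shows two details of regsub that the later examples rely on: when a result variable is supplied, regsub returns the number of substitutions it made, and the -all switch replaces every match rather than only the first. The input text and the variable names count, n and copy_all are illustrative only.

```tcl
set input "Hello bafundza, goodbye bafundza"

# Without -all only the first match is replaced. The return value is
# the number of substitutions that were made.
set count [regsub {bafundza} $input {student} copy]
puts "$count substitution(s): $copy"

# With -all every match is replaced.
set n [regsub -all {bafundza} $input {student} copy_all]
puts "$n substitution(s): $copy_all"
```

A return value of 0 means the pattern did not match, in which case the result variable holds an unchanged copy of the input.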


Rib Files & ShadingRate

Rib files produced by Maya (mtor and RfM Pro) and Houdini have one or more ShadingRate statements. ShadingRate controls the fineness of the micro-polygons into which a surface will be divided by a RenderMan renderer. Control over the size of micro-polygons is an important way to "balance" rendering quality and rendering speed. Maya and Houdini can take a long time to generate rib files for a complex scene. If one or more ShadingRate values have been set inappropriately, it is more efficient to change an existing sequence of rib files than to generate a new set. The following examples of the use of regsub address the task of making such edits to a rib file. A ShadingRate statement can appear in a rib file in these formats,

    # integer format
    ShadingRate 2
    
    # decimal format
    ShadingRate 0.5
    
    # exponential format - possible but highly unlikely!
    ShadingRate 9.59616e-017

Exponential format will be ignored for the purposes of this tutorial. Instead of dealing directly with a rib file, assume we have a single line of text consisting of,

    ShadingRate 5

that must be changed to,

    ShadingRate 10

This could be accomplished as follows.


Example 1b


set input "ShadingRate 5"
  
regsub {ShadingRate \d} $input {ShadingRate 10} copy
puts $copy

The pattern means,

    match ShadingRate,
    followed by a digit

However, the pattern works only if the value of ShadingRate is a single digit. With the following input it matches only "ShadingRate 2" and leaves the "3" behind, so the result is wrong!

   ShadingRate 23
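
Running the pattern from example 1b on this input shows what goes wrong. regsub does not report an error; it simply replaces the part of the text that matched.

```tcl
set input "ShadingRate 23"

# \d matches a single digit, so only "ShadingRate 2" is replaced
# and the trailing "3" is left in place.
regsub {ShadingRate \d} $input {ShadingRate 10} copy
puts $copy    ;# ShadingRate 103
```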

To specify one or more digits the regular expression quantifier "+" can be used.


Example 1c


set input "ShadingRate 23"
  
regsub {ShadingRate \d+} $input {ShadingRate 10} copy
puts $copy

The pattern means,

    match ShadingRate,
    followed by one or more digits

But with this input the pattern matches only "ShadingRate 2", leaving ".5" behind!

   ShadingRate 2.5

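The pattern could be changed to {ShadingRate \d+.\d+}. While this pattern will appear to be successful, it is so only because the period is a regular expression metacharacter that means "match any single character". It would also match, for example, "ShadingRate 2x5". To match only a literal decimal point the period must be escaped as "\." or enclosed in square brackets as "[.]".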
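
The difference between an unescaped and an escaped period can be checked with regexp, which returns 1 for a match and 0 otherwise. The sample text "ShadingRate 2x5" is made up purely to expose the problem, and the variable names are illustrative.

```tcl
# The unescaped period matches any character, not just a decimal point.
set loose_ok  [regexp {ShadingRate \d+.\d+} "ShadingRate 2.5"]
set loose_bad [regexp {ShadingRate \d+.\d+} "ShadingRate 2x5"]
puts "$loose_ok $loose_bad"     ;# 1 1  (the second match is unwanted)

# Escaping the period, or placing it in a bracket expression,
# restricts it to a literal decimal point.
set strict_bad [regexp {ShadingRate \d+\.\d+} "ShadingRate 2x5"]
set strict_ok  [regexp {ShadingRate \d+[.]\d+} "ShadingRate 2.5"]
puts "$strict_bad $strict_ok"   ;# 0 1
```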

The next pattern attempts to cater for the optional inclusion of a decimal point.


Example 1d


set input "ShadingRate 2.5"
  
regsub {ShadingRate \d*[.]?\d*} $input {ShadingRate 10} copy
puts $copy

The pattern means,

    match ShadingRate,
    followed by zero or more digits,
    followed by an optional decimal point,
    followed by zero or more digits

The pattern uses the quantifiers "*" and "?". Unfortunately, the pattern is flawed because it defines every sub-component of the number pattern as optional. As a result it doesn't even require a numeric value in order to achieve a match!
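
The flaw can be demonstrated with an input that contains no number at all. In this sketch the text "ShadingRate high" is a made-up, invalid statement, and the variable names pattern and match are illustrative.

```tcl
set pattern {ShadingRate \d*[.]?\d*}

# Every part of the number is optional, so "ShadingRate " alone
# (including the space) is enough to satisfy the pattern.
set found [regexp $pattern "ShadingRate high" match]
puts "$found '$match'"    ;# 1 'ShadingRate '

regsub $pattern "ShadingRate high" {ShadingRate 10} copy
puts $copy                ;# ShadingRate 10high
```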

To derive a reasonably robust pattern we must note that the numeric value of ShadingRate can have an integer format or, alternatively, a decimal format. A choice between formats is expressed by using the "|" alternation metacharacter.


Example 1e


set input "ShadingRate 5"
  
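regsub {ShadingRate (\d+([.]\d*)?|[.]\d+)} $input {ShadingRate 10} copy
puts $copy

The pattern means,

    match ShadingRate,
    followed by either,
        one or more digits, optionally followed by a decimal point and zero or more digits
    OR
        a decimal point and at least one digit

The parentheses around the two alternatives are required. Without them "|" would split the entire pattern in two, so that it would match either ShadingRate followed by an optional number or a bare decimal fraction anywhere in the text.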
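
A pattern of this kind is easiest to trust after it has been run against a range of sample values. The following sketch is one way to do that. It assumes a number pattern whose two alternatives are enclosed in parentheses, and it adds the anchors "^" and "$" so that the whole line must fit the pattern. The variable names number, value and results are illustrative.

```tcl
# Integer or decimal number; the parentheses limit the scope of "|".
set number {(\d+([.]\d*)?|[.]\d+)}
set results {}

foreach value {5 23 2.5 .5 7. high} {
    set input "ShadingRate $value"
    # The backslash keeps Tcl from treating the final $ as a variable.
    if {[regexp "^ShadingRate $number\$" $input]} {
        regsub "ShadingRate $number" $input {ShadingRate 10} copy
        lappend results $copy
        puts "$input -> $copy"
    } else {
        lappend results {}
        puts "$input -> no match"
    }
}
```

Every numeric sample is rewritten to "ShadingRate 10", while the non-numeric sample "high" is reported as having no match.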


Example 2 regexp


set input {Student, unjani wena?}
  
if {[regexp -nocase {student} $input]} {
    regsub {unjani wena} $input {how are you} copy
    }
puts $copy

The procedure regexp returns the value "1" if it can match the regular expression pattern (the text within the curly braces) with any part of the input text, and "0" otherwise. The procedure supports a number of "switches", i.e. flags, that can be used to control its behavior. For example, the "-nocase" switch ensures the matching mechanism of regexp is case-insensitive. Executing the script in Cutter generates the following output.

    Student, how are you?

The documentation on this procedure can be displayed in Cutter by alt + double clicking on the word "regexp". Refer to the tutorial "Cutter: Integration with Tcl".
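
Besides returning 1 or 0, regexp can copy what it matched into variables named after the input text. The first variable receives the entire match and each following variable receives the text captured by a pair of parentheses, counted by their opening parenthesis. A minimal sketch, in which the variable names whole and value are illustrative:

```tcl
set input "ShadingRate 2.5"

# "whole" receives the complete match, "value" the text captured by
# the first pair of parentheses.
if {[regexp {ShadingRate (\d+([.]\d*)?|[.]\d+)} $input whole value]} {
    puts "matched text: $whole"
    puts "current value: $value"
    puts "doubled: [expr {$value * 2}]"
}
```

This makes it possible to base a replacement on the existing value, for example scaling a ShadingRate instead of overwriting it.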


Rib Files & Surface Shading

Rib files generally contain one or more Surface statements. The Surface statement specifies which surface shader a renderer is to use to "colorize" an object. Following the name of a shader is a list of the parameters that belong to the shader - their names and values. Rib files produced by Maya (mtor/RfM Pro) also specify the datatype of each parameter. During "look development" of a very complex scene it is often more efficient to repeatedly edit, through the use of a Python or Tcl script, a single "experimental" rib file than to incur the delay of generating a fresh rib.

A typical Maya (mtor/RfM Pro) Surface statement is shown below. Houdini and Maya write such statements on a single line.

    Surface "foo" "float bar" [1.0] "float Kd" [0.5]

Suppose the value of the "bar" parameter must be altered to 2.5. Although somewhat unlikely, it is possible that another, entirely different, shader may use a parameter with the same name. Therefore, before a Python or Tcl script attempts to substitute a value for "bar" it must check if the parameter "belongs" to "foo". For example,


Example 2b


set input {Surface "foo" "float bar" [1.0] "float Kd" [0.5]}
  
if {[regexp {Surface "foo"} $input]} {
    regsub {("float bar") \[(\d+([.]\d*)?|[.]\d+)\]} $input {\1 [2.5]} copy
    # note:  <--group1-->    <-----group2------>
    }
puts $copy

The regsub uses the RE grouping/capturing mechanism "()" so that the separate parts of the pattern can be referenced later. The number pattern has been extended so that it also matches the opening and closing square brackets that form part of the parameter value. For convenience the substitution string {\1 [2.5]} refers to group 1, which stands for the text matched by the first pair of parentheses. The replacement string could equally have been {"float bar" [2.5]}.
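
The same check-then-substitute approach can be applied to each line of a rib file in turn. File handling is omitted here, as elsewhere in this tutorial, so the sketch uses a short list of sample lines. The shader name "baz" and the variable names lines, line and output are made up for illustration.

```tcl
# Three sample lines standing in for the contents of a rib file.
set lines [list \
    {Surface "foo" "float bar" [1.0] "float Kd" [0.5]} \
    {Surface "baz" "float bar" [1.0]} \
    {ShadingRate 5}]
set output {}

foreach line $lines {
    # Only Surface statements that use the "foo" shader are edited.
    if {[regexp {^Surface "foo"} $line]} {
        regsub {("float bar") \[(\d+([.]\d*)?|[.]\d+)\]} $line {\1 [2.5]} line
    }
    lappend output $line
    puts $line
}
```

Only the first line changes. The "bar" parameter of the "baz" shader is left alone because its line fails the regexp test.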

Limitations
The RE patterns presented on this page have several potential flaws. The patterns,

  • assume one space between parameter name and value,
  • assume values are always positive, and
  • do not match values written in exponential format.
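
One possible way to relax these limitations is sketched below. It is an untested-in-production variant rather than a definitive pattern: it assumes an optional sign, an optional exponent and any amount of white space are acceptable. The placeholder NUM and the variable names number, pattern and results are illustrative.

```tcl
# Optional sign, integer or decimal digits, optional exponent.
set number {[-+]?(\d+([.]\d*)?|[.]\d+)([eE][-+]?\d+)?}

# NUM is a placeholder that string map replaces with the number
# pattern; \s* allows any amount of white space.
set pattern [string map [list NUM $number] {("float bar")\s*\[\s*NUM\s*\]}]
set results {}

foreach input {
    {Surface "foo" "float bar" [1.0]}
    {Surface "foo" "float bar"   [ -0.25 ]}
    {Surface "foo" "float bar" [9.59616e-017]}
} {
    regsub $pattern $input {\1 [2.5]} copy
    lappend results $copy
    puts $copy
}
```

All three sample statements are rewritten to the same text, with the value replaced by [2.5] and the spacing normalized to a single space.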



© 2002- Malcolm Kesson. All rights reserved.