Recognizing Patterns of Text

Regular expressions can also be used to search for patterns of text. For example, if you want to search for a text pattern that starts with the letter s and ends with the letters rt (so that you would find words such as smart, start, skirt, and sport), you can use a regular expression. The .NET Framework has a number of regular expression classes that can be referenced in the System.Text.RegularExpressions namespace. You can add regular expressions to the list of smart tag expressions by creating an instance of the regular expression and passing a particular pattern.

First, we create a simple smart tag and add the word smart as a term to recognize. Later, we convert this to a regular expression and show some of the expressions you can use to recognize text patterns.

1. Add a reference to Microsoft Smart Tags 2.0 Type Library.

2. Right-click ThisDocument, and click View Code.

3. Add the following Imports statement to the top of the code file: Imports System.Text.RegularExpressions

4. Add the code in Listing 9.14 to the ThisDocument class. This code creates a smart tag and then adds a term that the smart tag will recognize. Note that this code also replaces the Startup event handler of ThisDocument.

Listing 9.14. Creating a simple smart tag

' Declared WithEvents so that the Click handler below can use a Handles clause.
WithEvents SampleAction As Microsoft.Office.Tools.Word.Action

Private Sub ThisDocument_Startup(ByVal sender As Object, _
    ByVal e As System.EventArgs) Handles Me.Startup

    ' Create the smart tag with a unique identifier and a caption.
    Dim MySmartTag As New Microsoft.Office.Tools.Word.SmartTag( _
        "www.aw-bc.com/Demo#RegExExample", "RegEx Example")

    ' Recognize the literal term "smart".
    MySmartTag.Terms.Add("smart")

    ' Create the action that appears in the smart tag menu.
    SampleAction = New Microsoft.Office.Tools.Word.Action( _
        "Perform Action")

    MySmartTag.Actions = _
        New Microsoft.Office.Tools.Word.Action() {SampleAction}

    ' Register the smart tag with the document.
    Me.VstoSmartTags.Add(MySmartTag)
End Sub

Private Sub SampleAction_Click(ByVal sender As Object, _
    ByVal e As Microsoft.Office.Tools.Word.ActionEventArgs) _
    Handles SampleAction.Click

    MsgBox("Text recognized and action performed.")
End Sub

5. In Solution Explorer, right-click ThisDocument and click View Designer.

6. Add the following text to the document: Starting with Office XP, you can create smart tags that are smarter than you think. Smart tags enable you to start taking action on a recognized term.

7. Press F5 to run the code.

8. Move your cursor over the word smart (it now has a dotted line below it). Click the smart tag drop-down, and then click Perform Action. A message box will appear, as shown in Figure 9.12.

Now let's replace the code that added a single term with code that adds a regular expression to the smart tag's Expressions collection. Revise the code in the Startup event handler of ThisDocument as shown in Listing 9.15. Notice that the code for adding a term has been commented out.

Listing 9.15. Adding a regular expression to a smart tag

'MySmartTag.Terms.Add("smart")
MySmartTag.Expressions.Add(New Regex("smart"))

When you run this code, you don't get quite the same results even though our regular expression is just a series of characters that represent the same word that we passed to the Terms collection earlier. Notice that this time, Word places a dotted line under the characters smart in the word smarter, as shown in Figure 9.13.

Figure 9.12. Using a smart tag


Figure 9.13. Using a regular expression to create a simple smart tag
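The result in Figure 9.13 can be previewed outside Word with a small console program. This is only a minimal sketch under stated assumptions: the module name TermVersusExpression and the helper FindMatches are hypothetical and not part of the VSTO project, and the sketch assumes the smart tag applies the pattern the same way Regex.Matches does.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TermVersusExpression

    ' Hypothetical helper: returns every substring of input that matches pattern.
    Function FindMatches(ByVal input As String, ByVal pattern As String) As List(Of String)
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            results.Add(m.Value)
        Next
        Return results
    End Function

    Sub Main()
        Dim sample As String = _
            "you can create smart tags that are smarter than you think."

        ' The pattern is matched anywhere in the text, so the first five
        ' letters of "smarter" are reported as a second match.
        For Each found As String In FindMatches(sample, "smart")
            Console.WriteLine(found)
        Next
    End Sub

End Module
```

Running the sketch writes smart twice: once for the standalone word and once for the beginning of smarter.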

The word smarter is recognized because with regular expressions, any text that matches the pattern is recognized, whether or not it constitutes a whole word. (As you'll see later in this chapter, you can add expressions that limit recognition to a word boundary.) First, we add an expression that will satisfy our original goal, which was to recognize any word that contains any characters between an s and rt. In Listing 9.16, we change the code that adds the regular expression so that we supply a pattern that recognizes any characters between the s and rt.

Listing 9.16. Adding a regular expression to the Expressions collection of a smart tag

MySmartTag.Expressions.Add(New Regex("s+[a-z]+rt"))

The character class [a-z] matches any single lowercase letter from a through z, and the + that follows it means one or more of them. (The + after the s likewise allows one or more s characters.) This time, in addition to the original recognized text, the word start is recognized, as shown in Figure 9.14.


Figure 9.14. Creating a simple smart tag with an [a-z] regular expression
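To check which parts of the sample document the pattern s+[a-z]+rt picks out, the same text can be run through the Regex class in a console program. The following is a sketch only; the module name LetterRangePreview and the helper JoinMatches are hypothetical names introduced for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LetterRangePreview

    ' Hypothetical helper: lists all matches of pattern in input, separated by commas.
    Function JoinMatches(ByVal input As String, ByVal pattern As String) As String
        Dim parts As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            parts.Add(m.Value)
        Next
        Return String.Join(", ", parts.ToArray())
    End Function

    Sub Main()
        ' The text added to the document in step 6.
        Dim sample As String = _
            "Starting with Office XP, you can create smart tags that " & _
            "are smarter than you think. Smart tags enable you to " & _
            "start taking action on a recognized term."

        ' s+      one or more lowercase s characters
        ' [a-z]+  one or more lowercase letters
        ' rt      the literal characters rt
        Console.WriteLine(JoinMatches(sample, "s+[a-z]+rt"))
    End Sub

End Module
```

The sketch writes smart, smart, start: the word smart, the first five letters of smarter, and the word start. Starting and Smart are skipped because the pattern requires a lowercase s.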

You might have noticed that the first word in the first sentence (Starting) and the first word in the second sentence (Smart) were not recognized as smart tags. This is because both of these words start with a capital letter S, and regular expressions are case-sensitive by default. If you want your regular expression to include uppercase S as well as lowercase s, you can use the code shown in Listing 9.17. Here, we add a pipe character between the two choices and surround the choices with parentheses: (s|S).

Listing 9.17. Recognizing lower- and uppercase characters

MySmartTag.Expressions.Add(New Regex("(s|S)+[a-z]+rt"))

Now, all instances of words that have any characters between an s and rt are recognized whether the word starts with a lowercase s or an uppercase S, as shown in Figure 9.15.
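The alternation (s|S) is one of several ways to accept either case. The console sketch below compares it with a character class and with the RegexOptions.IgnoreCase option of the Regex class. The module and field names are hypothetical, and whether a smart tag honors options set on the Regex instance is an assumption you should verify in your own project.

```vb
Imports System
Imports System.Text.RegularExpressions

Module CaseHandlingPreview

    ' Alternation, as used in Listing 9.17.
    Public ReadOnly Alternation As New Regex("(s|S)+[a-z]+rt")

    ' A character class that accepts the same two letters.
    Public ReadOnly CharacterClass As New Regex("[sS]+[a-z]+rt")

    ' IgnoreCase makes the whole pattern case-insensitive, not just the s.
    Public ReadOnly WholePattern As New Regex("s+[a-z]+rt", RegexOptions.IgnoreCase)

    Sub Main()
        ' All three accept a leading capital S.
        Console.WriteLine(Alternation.IsMatch("Smart"))     ' True
        Console.WriteLine(CharacterClass.IsMatch("Smart"))  ' True
        Console.WriteLine(WholePattern.IsMatch("Smart"))    ' True

        ' Only the IgnoreCase version accepts text that is entirely uppercase,
        ' because [a-z] and rt remain case-sensitive in the other two.
        Console.WriteLine(Alternation.IsMatch("START"))     ' False
        Console.WriteLine(CharacterClass.IsMatch("START"))  ' False
        Console.WriteLine(WholePattern.IsMatch("START"))    ' True
    End Sub

End Module
```

Which form to choose depends on the goal: (s|S) and [sS] relax only the first letter, whereas IgnoreCase relaxes every letter in the pattern.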

Finally, if you want to change the regular expression so that only words that end in rt are recognized (complete words), you can add "\b" to the expression (indicating a word boundary), as shown in Listing 9.18.


Figure 9.15. Enabling the smart tag to recognize both upper- and lowercase S using (s|S)

Listing 9.18. Recognizing word boundaries

MySmartTag.Expressions.Add(New Regex("(s|S)+[a-z]+rt\b"))

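Now when you run this code, a match must end at a word boundary immediately after rt. The words smart, Smart, and start are recognized, but the smart in smarter and the Start in Starting are not, as shown in Figure 9.16. Because the pattern has no \b at the beginning, it can still match the end of a longer word.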


Figure 9.16. Enabling the smart tag to recognize words that start with s or S and end with rt
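The pattern in Listing 9.18 anchors only the end of the match. The following console sketch compares it with a variant that also places \b at the beginning. The variant, the module name WordBoundaryPreview, and the test word restart are illustrative assumptions and do not come from the listings above.

```vb
Imports System
Imports System.Text.RegularExpressions

Module WordBoundaryPreview

    ' The pattern from Listing 9.18: boundary after rt only.
    Public Const TrailingBoundary As String = "(s|S)+[a-z]+rt\b"

    ' Hypothetical variant: boundaries on both sides of the word.
    Public Const BothBoundaries As String = "\b(s|S)+[a-z]+rt\b"

    Sub Main()
        For Each word As String In New String() {"smart", "smarter", "restart"}
            Console.WriteLine("{0}: trailing={1}, both={2}", _
                word, _
                Regex.IsMatch(word, TrailingBoundary), _
                Regex.IsMatch(word, BothBoundaries))
        Next
        ' smart:   trailing=True,  both=True
        ' smarter: trailing=False, both=False
        ' restart: trailing=True,  both=False
    End Sub

End Module
```

With only the trailing \b, the start inside restart still qualifies; adding the leading \b limits recognition to whole words that begin with s or S.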

You can use regular expressions to create many other patterns. Table 9.1 lists some of the more common expressions.

Table 9.1. Regular Expressions

Description                                                  Expression
Match any single character                                   .
Match zero or more instances of preceding expression         *
Match one or more instances of preceding expression          +
Match any character within a set                             [ ]
Match any character not in the set                           [^ ]
Match any alphanumeric character                             [a-zA-Z0-9]
Match any alphabetic character                               [a-zA-Z]
Match any numeric character                                  [0-9]
Match one character or the other, usually within a group     |
Group characters                                             ( )
Escape a character                                           \
Word boundary                                                \b
Tab character                                                \t
Any whitespace character                                     \s
Decimal digit                                                \d

This table provides only a very small subset of the characters and matching patterns you can use. We encourage you to become more familiar with regular expressions and the Regex class so that you can enable powerful text recognition in your smart tags. In addition, a number of tools are available for generating regular expressions, eliminating the need to build the expressions by hand. A Web search will reveal numerous tools you might use.
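Before adding a pattern to a smart tag, it can help to try it against a few strings in a console program. The sketch below exercises several entries from Table 9.1 with the static Regex.IsMatch method; the module name TablePreview, the helper Describe, and the sample strings are hypothetical and chosen only for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TablePreview

    ' Hypothetical helper: reports whether pattern is found anywhere in input.
    Function Describe(ByVal input As String, ByVal pattern As String) As String
        Return String.Format("{0} / {1}: {2}", _
            input, pattern, Regex.IsMatch(input, pattern))
    End Function

    Sub Main()
        ' Visual Basic string literals do not treat the backslash as an
        ' escape character, so patterns such as \d can be typed as-is.
        Console.WriteLine(Describe("sort", "s.rt"))           ' . matches the o: True
        Console.WriteLine(Describe("sart", "sm*art"))         ' zero m characters: True
        Console.WriteLine(Describe("s9rt", "s[^a-z]rt"))      ' 9 is not in a-z: True
        Console.WriteLine(Describe("sort", "s[^a-z]rt"))      ' o is in a-z: False
        Console.WriteLine(Describe("10 tags", "\d+\s[a-z]+")) ' digits, space, letters: True
        Console.WriteLine(Describe("3.5", "3\.5"))            ' escaped period: True
        Console.WriteLine(Describe("345", "3\.5"))            ' 4 is not a period: False
    End Sub

End Module
```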
