Mastering Regular Expressions

✍ Scribed by Jeffrey E.F. Friedl


Publisher: O'Reilly Media
Year: 2006
Tongue: English
Leaves: 542
Edition: Third Edition
Category: Library

✦ Synopsis


I've been using regular expressions (knowingly / intentionally) for the better part of the last two decades, and yet until I read this book, I didn't fully realise what a powerful set of tools I had at my disposal. Jeffrey Friedl's explanation of what makes the various types of engines work, and the chapter on optimising regexes for NFAs, were extremely helpful. As a network security administrator I find myself having to parse through massive amounts of text and data on a regular basis, and thanks to this book, I've been able to better automate a lot of these processes and glean more valuable data from the waves of information waiting to be processed. This book really should be essential reading not just for programmers and web developers, but for anyone who works in IT or might benefit from the use of any kind of scripting / automation.

✦ Table of Contents


Cover......Page 1
Title page......Page 3
Copyright page......Page 4
Dedication......Page 5
Table of Contents......Page 7
The Need for This Book......Page 17
Intended Audience......Page 18
Organization......Page 19
The Details......Page 20
Typographical Conventions......Page 21
Exercises......Page 22
Safari® Enabled......Page 23
Personal Comments and Acknowledgments......Page 24
1 Introduction to Regular Expressions......Page 25
Solving Real Problems......Page 26
The Filename Analogy......Page 28
The Language Analogy......Page 29
Searching Text Files: Egrep......Page 30
Start and End of the Line......Page 32
Matching any one of several characters......Page 33
Negated character classes......Page 34
Matching Any Character with Dot......Page 35
Matching any one of several subexpressions......Page 37
Ignoring Differences in Capitalization......Page 38
Word Boundaries......Page 39
In a Nutshell......Page 40
Optional Items......Page 41
Other Quantifiers: Repetition......Page 42
Parentheses and Backreferences......Page 44
The Great Escape......Page 46
A Few More Examples......Page 47
Dollar amount (with optional cents)......Page 48
An HTTP/HTML URL......Page 49
Time of day, such as "9:17 am" or "12:30 pm"......Page 50
Flavor......Page 51
Extending the Time Regex to Handle a 24-Hour Clock......Page 52
Character......Page 53
Improving on the Status Quo......Page 54
Summary......Page 56
Personal Glimpses......Page 57
2 Extended Introductory Examples......Page 59
About the Examples......Page 60
A Short Introduction to Perl......Page 61
Matching Text with Regular Expressions......Page 62
Side Effects of a Successful Match......Page 64
Intertwined Regular Expressions......Page 67
A short aside—metacharacters galore......Page 68
Non-Capturing Parentheses: (?:•••)......Page 69
Generic "whitespace" with \s......Page 71
Intermission......Page 73
Example: Form Letter......Page 74
Example: Prettifying a Stock Price......Page 75
A Small Mail Utility......Page 77
A Sample Email Message......Page 78
A Warning About .*......Page 80
Real-world problems, real-world solutions......Page 82
Adding Commas to a Number with Lookaround......Page 83
Lookaround doesn't "consume" text......Page 84
A few more lookahead examples......Page 85
Back to the comma example.........Page 88
Word boundaries and negative lookaround......Page 89
Text-to-HTML Conversion......Page 91
Cooking special characters......Page 92
Separating paragraphs......Page 93
"Linkizing" an email address......Page 94
Matching the username and hostname......Page 95
Putting it together......Page 97
"Linkizing" an HTTP URL......Page 98
Building a regex library......Page 100
That Doubled-Word Thing......Page 101
Moving bits around: operators, functions, and objects......Page 104
Regular Expressions and Cars......Page 107
In This Chapter......Page 108
The Origins of Regular Expressions......Page 109
Egrep evolves......Page 110
POSIX—An attempt at standardization......Page 111
Perl evolves......Page 112
A partial consolidation of flavors......Page 114
At a Glance......Page 115
Care and Handling of Regular Expressions......Page 117
Integrated Handling......Page 118
A procedural example......Page 119
Regex handling in VB and other .NET languages......Page 120
Why do approaches differ?......Page 121
Search and replace in Java......Page 122
Search and replace in PHP......Page 123
GNU Emacs......Page 124
Strings as Regular Expressions......Page 125
Strings in Java......Page 126
Strings in PHP......Page 127
Strings in Tcl......Page 128
Character-Encoding Issues......Page 129
Unicode......Page 130
Characters versus combining-character sequences......Page 131
Multiple code points for the same character......Page 132
Unicode line terminator......Page 133
Case-insensitive match mode......Page 134
Dot-matches-all match mode (a.k.a., "single-line mode")......Page 135
Enhanced line-anchor match mode (a.k.a., "multiline mode")......Page 136
Common Metacharacters and Features......Page 137
These are machine dependent?......Page 139
Octal escape—\num......Page 140
Control characters: \cchar......Page 141
Normal classes: [a-z] and [^a-z]......Page 142
Dot versus a negated character class......Page 143
Class shorthands: \w, \d, \s, \W, \D, \S......Page 144
Unicode properties, scripts, and blocks: \p{Prop}, \P{Prop}......Page 145
Scripts......Page 146
Other properties/qualities......Page 148
Full class set operations: [[a-z] && [^aeiou]]......Page 149
Mimicking class set operations with lookaround......Page 150
POSIX bracket-expression "character class": [[:alpha:]]......Page 151
Emacs syntax classes......Page 152
End of line/string: $, \Z, \z......Page 153
Start of match (or end of previous match): \G......Page 154
End of previous match, or start of the current match?......Page 155
Advanced Use of \G with Perl......Page 156
Lookahead (?=•••), (?!•••); Lookbehind, (?<=•••), (?<!•••)......Page 157
Mode-modified span: (?modifier:•••), such as (?i:•••)......Page 159
Literal-text span: \Q•••\E......Page 160
Grouping-only parentheses: (?:•••)......Page 161
Named capture: (?<Name>•••)......Page 162
Alternation: •••|•••|•••......Page 163
Using lookaround as the test......Page 164
Lazy quantifiers: *?, +?, ??, {num,num}?......Page 165
Guide to the Advanced Chapters......Page 166
Start Your Engines!......Page 167
The impact of standards......Page 168
Regex Engine Types......Page 169
Traditional NFA or not?......Page 170
About the Examples......Page 171
The "transmission" and the bump-along......Page 172
Capturing parentheses......Page 173
No "electric" parentheses, backreferences, or lazy quantifiers......Page 174
A subjective example......Page 175
Being too greedy......Page 176
NFA Engine: Regex-Directed......Page 177
DFA Engine: Text-Directed......Page 179
Consequences to us as users......Page 180
Backtracking......Page 181
A crummy little example......Page 182
Saved States......Page 183
A non-match......Page 184
A lazy match......Page 185
Revisiting a fuller example......Page 186
More About Greediness and Backtracking......Page 187
Problems of Greediness......Page 188
Multi-Character "Quotes"......Page 189
Using Lazy Quantifiers......Page 190
Greediness and Laziness Always Favor a Match......Page 191
The Essence of Greediness, Laziness, and Backtracking......Page 192
Possessive Quantifiers and Atomic Grouping......Page 193
The essence of atomic grouping......Page 194
Faster failures with atomic grouping......Page 195
Possessive Quantifiers, ?+, *+, ++, and {m,n}+......Page 196
The Backtracking of Lookaround......Page 197
Is Alternation Greedy?......Page 198
Taking Advantage of Ordered Alternation......Page 199
Ordered alternation pitfalls......Page 200
Really, the longest......Page 201
POSIX and the Longest-Leftmost Rule......Page 202
DFA efficiency......Page 203
DFA versus NFA: Differences in the pre-use compile......Page 204
DFA versus NFA: Differences in what is matched......Page 205
DFA Speed with NFA Capabilities: Regex Nirvana?......Page 206
Summary......Page 207
5 Practical Regex Techniques......Page 209
Continuing with Continuation Lines......Page 210
Matching an IP Address......Page 211
Know your context......Page 213
Removing the leading path from a filename......Page 214
Accessing the filename from a path......Page 215
Both leading path and filename......Page 216
Matching Balanced Sets of Parentheses......Page 217
Watching Out for Unwanted Matches......Page 218
Allowing escaped quotes in double-quoted strings......Page 220
Knowing Your Data and Making Assumptions......Page 222
Stripping Leading and Trailing Whitespace......Page 223
Matching an HTML Tag......Page 224
Matching an HTML Link......Page 225
Validating a Hostname......Page 227
Link Checker in VB.NET......Page 228
Plucking Out a URL in the Real World......Page 230
Extended Examples......Page 232
Keeping in Sync with Your Data......Page 233
Keeping the match in sync with expectations......Page 234
Maintaining sync after a non-match as well......Page 235
This example in perspective......Page 236
Parsing CSV Files......Page 237
Distrusting the bump-along......Page 239
CSV Processing in Java......Page 241
Other CSV formats......Page 242
CSV Processing in VB.NET......Page 243
6 Crafting an Efficient Expression......Page 245
A Sobering Example......Page 246
Efficiency Versus Correctness......Page 247
Effects of a Simple Change......Page 248
Advancing Further—Localizing the Greediness......Page 249
"Exponential" matches......Page 250
A Global View of Backtracking......Page 252
More Work for a POSIX NFA......Page 253
Work Required During a Non-Match......Page 254
Alternation Can Be Expensive......Page 255
Benchmarking......Page 256
Benchmarking with PHP......Page 258
Benchmarking with Java......Page 259
Benchmarking with VB.NET......Page 261
Benchmarking with Python......Page 262
Benchmarking with Tcl......Page 263
No Free Lunch......Page 264
The Mechanics of Regex Application......Page 265
Compile caching......Page 266
DFAs, Tcl, and Hand-Tuning Regular Expressions......Page 267
Compile caching in the object-oriented approach......Page 268
Length-cognizance optimization......Page 269
End of string/line anchor optimization......Page 270
Simple quantifier optimization......Page 271
Character following lazy quantifier optimization......Page 272
"Excessive" backtracking detection......Page 273
State-suppression with possessive quantifiers......Page 274
Small quantifier equivalence......Page 275
Techniques for Faster Expressions......Page 276
Don't use superfluous character classes......Page 278
"Factor out" required components from the front of alternation......Page 279
Lazy Versus Greedy: Be Specific......Page 280
Split Into Multiple Regular Expressions......Page 281
Mimic Initial-Character Discrimination......Page 282
Use Atomic Grouping and Possessive Quantifiers......Page 283
Put the most likely alternative first......Page 284
Unrolling the Loop......Page 285
Method 1: Building a Regex From Past Experiences......Page 286
Constructing a general "unrolling-the-loop" pattern......Page 287
1) The start of special and normal must never intersect......Page 288
3) Special must be atomic......Page 289
Method 2: A Top-Down View......Page 290
Method 3: An Internet Hostname......Page 291
Using Atomic Grouping and Possessive Quantifiers......Page 292
Making a neverending match safe with atomic grouping......Page 293
Unrolling the continuation-line example......Page 294
Unrolling the CSV regex......Page 295
To unroll or to not unroll.........Page 296
A direct approach......Page 297
Making it work......Page 298
Unrolling the C loop......Page 299
Return to reality......Page 300
A Helping Hand to Guide the Match......Page 301
A Well-Guided Regex is a Fast Regex......Page 303
In Summary: Think!......Page 305
7 Perl......Page 307
Regular Expressions as a Language Component......Page 309
Perl's Regex Flavor......Page 310
Regex Operands and Regex Literals......Page 312
Features supported by regex literals......Page 313
Picking your own regex delimiters......Page 315
Regex Modifiers......Page 316
Regex-Related Perlisms......Page 317
Contorting an expression......Page 318
Dynamically scoped values......Page 319
Regex side effects and dynamic scoping......Page 322
Special Variables Modified by a Match......Page 323
Building and Using Regex Objects......Page 327
Match modes (or lack thereof) are very sticky......Page 328
Viewing Regex Objects......Page 329
The Match Operator......Page 330
Using a regex object......Page 331
The default target......Page 332
Different Uses of the Match Operator......Page 333
Normal "pluck data from a string"—list context, without /g......Page 334
"Pluck all matches"—list context, with the /g modifier......Page 335
Iterative Matching: Scalar Context, with /g......Page 336
The "current match location" and the pos() function......Page 337
Presetting a string's pos......Page 338
"Tag-team" matching with /gc......Page 339
The Match Operator's Environmental Relations......Page 340
Outside influences on the match operator......Page 341
The Substitution Operator......Page 342
The /e Modifier......Page 343
Multiple uses of /e......Page 344
The Split Operator......Page 345
Target string operand......Page 346
Advanced split......Page 347
Special matches at the ends of the string......Page 348
Split has no side effects......Page 349
Fun with Perl Enhancements......Page 350
Using a Dynamic Regex to Match Nested Pairs......Page 352
Using embedded code to display match-time information......Page 355
Using embedded code to see all matches......Page 356
Finding the longest match......Page 358
Using local in an Embedded-Code Construct......Page 359
Sanitizing user input for interpolation......Page 361
A Warning About Embedded Code and my Variables......Page 362
Matching Nested Constructs with Embedded Code......Page 364
Adding start- and end-of-word metacharacters......Page 365
Adding support for possessive quantifiers......Page 367
Mimicking Named Capture......Page 368
Perl Efficiency Issues......Page 371
Regex Compilation, the /o Modifier, qr/•••/, and Efficiency......Page 372
Unconditional caching......Page 374
On-demand recompilation......Page 375
Potential "gotchas" of /o......Page 376
Using m/•••/ with regex objects......Page 377
Using the default regex for efficiency......Page 378
The pre-match copy is not always needed......Page 379
How expensive is the pre-match copy?......Page 380
Don't use naughty modules......Page 381
How to Check Whether Your Code is Tainted by $&......Page 382
When not to use study......Page 383
Benchmarking......Page 384
Regex Debugging Information......Page 385
Run-time debugging information......Page 386
Final Comments......Page 387
8 Java......Page 389
Java's Regex Flavor......Page 390
Special Java character properties......Page 393
Unicode Line Terminators......Page 394
Using java.util.regex......Page 395
The Pattern.compile() Factory......Page 396
The Matcher Object......Page 397
Applying the Regex......Page 399
Querying Match Results......Page 400
Simple Search and Replace......Page 402
Simple search and replace examples......Page 403
Advanced Search and Replace......Page 404
Search-and-replace examples......Page 405
In-Place Search and Replace......Page 406
Using a different-sized replacement......Page 407
The Matcher's Region......Page 408
Points to keep in mind......Page 409
Looking outside the current region......Page 410
Transparent bounds......Page 411
Anchoring bounds......Page 412
Methods for Building a Scanner......Page 413
Examples illustrating hitEnd and requireEnd......Page 415
Other Matcher Methods......Page 416
Other Pattern Methods......Page 418
Pattern's split Method, with One Argument......Page 419
Split with a limit greater than zero......Page 420
Adding Width and Height Attributes to Image Tags......Page 421
Validating HTML with Multiple Patterns Per Matcher......Page 423
Multiple Patterns and the One-Argument find()......Page 424
Java Version Differences......Page 425
Unicode-support differences between 1.4.2 and 1.5.0......Page 426
Differences Between 1.5.0 and 1.6......Page 427
9 .NET......Page 429
.NET's Regex Flavor......Page 430
Conditional tests......Page 433
"Compiled" expressions......Page 434
Right-to-left matching......Page 435
ECMAScript mode......Page 436
Quickstart: Checking a string for match......Page 437
Quickstart: Search and replace......Page 438
Importing the regex namespace......Page 439
Regex objects......Page 440
Match objects......Page 441
Core Object Details......Page 442
Regex options......Page 443
Using Regex Objects......Page 445
Using a replacement delegate......Page 448
Using Split with capturing parentheses......Page 450
Using Match Objects......Page 451
Displaying Information about a Regex Object......Page 452
Using Group Objects......Page 454
Static "Convenience" Functions......Page 455
Support Functions......Page 456
Regex Assemblies......Page 458
Creating Your Own Regex Library with an Assembly......Page 459
Matching Nested Constructs......Page 460
Capture Objects......Page 461
10 PHP......Page 463
PHP's Regex Flavor......Page 465
The Preg Function Interface......Page 467
PHP single-quoted strings......Page 468
Delimiters......Page 469
Mode modifiers outside the regex......Page 470
PHP-specific modifiers......Page 471
"Unknown Modifier" Errors......Page 472
preg_match......Page 473
Trailing "non-participatory" elements stripped......Page 474
Named capture......Page 475
Getting more details on the match: PREG_OFFSET_CAPTURE......Page 476
preg_match_all......Page 477
Collecting match data......Page 478
The default PREG_PATTERN_ORDER arrangement......Page 479
preg_match_all and the PREG_OFFSET_CAPTURE flag......Page 480
preg_match_all with named capture......Page 481
preg_replace......Page 482
Basic one-string, one-pattern, one-replacement preg_replace......Page 483
Multiple subjects, patterns, and replacements......Page 484
Ordering of array arguments......Page 486
preg_replace_callback......Page 487
preg_split......Page 489
preg_split's limit argument......Page 490
preg_split's flag arguments......Page 492
preg_grep......Page 493
preg_quote......Page 494
"Missing" Preg Functions......Page 495
The solution......Page 496
Syntax-Checking an Unknown Pattern Argument......Page 498
Matching Text with Nested Parentheses......Page 499
Recursive reference via named capture......Page 500
No Backtracking Into Recursion......Page 501
The S Pattern Modifier: "Study"......Page 502
Enhancing the optimization with the S pattern modifier......Page 503
CSV Parsing with PHP......Page 504
The main body of this expression......Page 505
Real-world XML......Page 507
HTML?......Page 508
......Page 509
$......Page 511
.......Page 512
A......Page 513
B......Page 514
C......Page 515
D......Page 517
E......Page 518
G......Page 520
H......Page 521
I......Page 522
L......Page 523
M......Page 525
N......Page 527
P......Page 528
R......Page 533
S......Page 535
U......Page 537
V......Page 538
Z......Page 539
Colophon......Page 541


📜 SIMILAR VOLUMES


Mastering Regular Expressions
✍ Jeffrey Friedl 📂 Library 📅 2002 🏛 O'Reilly Media, Inc. 🌐 English

"This is the International Edition. The content is in English, same as US version but different cover. Please DO NOT buy if you can not accept this difference. Ship from Shanghai China, please allow about 3 weeks on the way to US or Europe. Message me if you have any questions."

Mastering Regular Expressions
✍ Jeffrey E.F. Friedl 📂 Library 📅 2006 🏛 O'Reilly Media 🌐 English

I have had occasion to require regular expressions in my code and it has always been a matter of trial and error, or looking online for something that should work. Some books have a chapter dedicated to regular expressions but I have always found them confusing. Enter Jeffrey Friedl and this excelle

Mastering Regular Expressions
✍ Jeffrey E.F. Friedl 📂 Library 📅 1997 🏛 O'Reilly Media 🌐 English

Regular expressions are a powerful tool for manipulating text and data. This book, with its unprecedented detail and breadth of coverage, will help you discover a whole new world of mastery over your computer and its data. Regular expressions are not a tool in and of themselves, but are include