Text::Shellwords::Cursor(3pm)

Text::Shellwords::Cursor(3)  User Contributed Perl Documentation  Text::Shellwords::Cursor(3)

NAME

       Text::Shellwords::Cursor - Parse a string into tokens

SYNOPSIS

        use Text::Shellwords::Cursor;
        my $parser = Text::Shellwords::Cursor->new();
        my $str = 'ab cdef "ghi"    j"k\"l "';
        my ($tok1) = $parser->parse_line($str);
        #   $tok1 = ['ab', 'cdef', 'ghi', 'j', 'k"l ']
        my ($tok2, $tokno, $tokoff) = $parser->parse_line($str, cursorpos => 6);
        #   as above, but $tokno == 1 and $tokoff == 3 (under the 'f')

DESCRIPTION

       This module is very similar to Text::Shellwords and Text::ParseWords.
       However, it has one very significant difference: it keeps track of a
       character position in the line it is parsing.  For instance, if you
       pass it ("zq fmgb", cursorpos=>6), it returns (['zq', 'fmgb'], 1, 3).
       The cursorpos parameter tells where in the input string the cursor
       resides (just before the 'b'), and the result tells you that the
       cursor was on token 1 ('fmgb'), character 3 ('b').  This is very
       useful when computing command-line completions involving quoting,
       escaping, and tokenizing characters (like '(' or '=').
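
       A completion routine typically consumes the (tokens, tokno, tokoff)
       result directly.  As a minimal sketch, assuming a hypothetical helper
       named cursor_prefix that is not part of this module, the part of the
       cursor's token that lies before the cursor can be extracted with
       substr:

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::Shellwords::Cursor: given the
# (tokens, tokno, tokoff) triple that parse_line returns, return the part
# of the cursor's token that lies before the cursor.  This is usually the
# prefix a completion routine tries to complete.
sub cursor_prefix {
    my ($tokens, $tokno, $tokoff) = @_;
    return '' unless defined $tokno;    # cursor position was not tracked
    return substr($tokens->[$tokno], 0, $tokoff);
}

# The triple from the "zq fmgb" example above (cursorpos => 6).
my $prefix = cursor_prefix(['zq', 'fmgb'], 1, 3);
print "$prefix\n";    # prints "fmg"
```

       Because tokoff may point one past the last character, a cursor at
       the end of a token yields the whole token, and a zero-length token
       yields the empty string.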

       A few helper utilities are included as well.  You can escape a string
       to ensure that parsing it will produce the original string
       (parse_escape).  You can also reassemble the tokens with a visually
       pleasing amount of whitespace between them (join_line).

       This module started out as an integral part of Term::GDBUI using code
       loosely based on Text::ParseWords.  However, it is now basically a
       ground-up reimplementation.  It was split out of Term::GDBUI for
       version 0.8.

METHODS

       new
          Creates a new parser.  Takes the following named arguments:

          keep_quotes
              Normally all unescaped, unnecessary quote marks are stripped.
              If you specify "keep_quotes=>1", however, they are preserved.
              This is useful if you need to know whether the string was
              quoted or not (string constants) or what type of quotes were
              around it (affecting variable interpolation, for instance).

          token_chars
              This argument specifies the characters that should be
              considered tokens all by themselves.  For instance, if I pass
              token_chars=>'=', then 'ab=123' is parsed to ('ab', '=',
              '123').  Without token_chars, 'ab=123' remains a single string.

              NOTE: you cannot change token_chars after the constructor has
              been called!  The regexps that use it are compiled once (m//o).
              Also, until the GNU Readline library can accept "=[]," without
              diving into an endless loop, we will not tell history expansion
              to use token_chars (it uses " \t\n()<>;&|" by default).

          debug
              Turns on rather copious debugging to try to show what the
              parser is thinking at every step.

          space_none
          space_before
          space_after
              These variables affect how whitespace in the line is
              normalized and how the tokens are reassembled into a string.
              See the join_line routine.

          error
              This is a reference to a routine that should be called to
              display a parse error.  The routine takes two arguments: a
              reference to the parser, and the error message to display as a
              string.
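
          To make the effect of keep_quotes and token_chars concrete, here
          is a minimal sketch of a hypothetical tokenizer, mini_tokenize,
          that honors both options.  It is not the module's parser: it
          assumes shell-like quoting, omits backslash escapes and cursor
          tracking, and rebuilds its regexps on every call instead of
          compiling them once.

```perl
use strict;
use warnings;

# Hypothetical mini-tokenizer, NOT the module's parser, showing how the
# keep_quotes and token_chars options change tokenization.  Assumes
# shell-like quoting; backslash escapes and cursor tracking are omitted.
sub mini_tokenize {
    my ($line, %opt) = @_;
    my $tc = $opt{token_chars} // '';

    # A token_char is a token by itself.  (?!) never matches, so an empty
    # token_chars string disables this branch.
    my $tc_re = length $tc ? qr/[\Q$tc\E]/ : qr/(?!)/;

    # A word is a run of unquoted characters (excluding whitespace, quotes,
    # and token_chars) and complete "..." or '...' pieces.
    my $word_re = qr/(?:[^\s"'\Q$tc\E]+|"[^"]*"|'[^']*')+/;

    my @tokens;
    while ($line =~ /\G\s*(?:($tc_re)|($word_re))/gc) {
        if (defined $1) {
            push @tokens, $1;
            next;
        }
        my $tok = $2;
        # Strip the quote marks unless keep_quotes was requested.
        $tok =~ s/(["'])(.*?)\1/$2/g unless $opt{keep_quotes};
        push @tokens, $tok;
    }
    # Anything left other than trailing whitespace is an unterminated quote.
    die "unterminated quote\n" if substr($line, pos($line) // 0) =~ /\S/;
    return @tokens;
}

print join('|', mini_tokenize('ab=123', token_chars => '=')), "\n";
# prints: ab|=|123
print join('|', mini_tokenize(q{say "hi there"}, keep_quotes => 1)), "\n";
# prints: say|"hi there"
```

          Note that a token_char inside quotes stays part of the word, so
          q{x="a=b"} with token_chars=>'=' yields ('x', '=', 'a=b') in this
          sketch.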

       parsebail(msg)
          If the parsel routine or any of its subroutines runs into a fatal
          error, it calls parsebail to present a very descriptive
          diagnostic.

       parsel
          This is the heinous routine that actually does the parsing.  You
          should never need to call it directly.  Call parse_line instead.
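
          One common Perl way to structure this kind of bail-out is to die
          from the inner parser and catch the failure with eval in the public
          entry point.  The sketch below is hypothetical and is not the
          module's parsel/parsebail code: the names bail, inner_parse, and
          entry_point are invented for illustration.  It reports failures
          through a callback with the ($parser, $message) signature that the
          error argument of new is documented to take, and it suppresses
          them when messages=>0 is passed.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT the module's parsel/parsebail code.  The inner
# parser dies with a diagnostic; the entry point catches it with eval and
# reports it through an error callback taking ($parser, $message).

sub bail {
    my ($line, $pos, $msg) = @_;
    # Point a caret at the offending column.
    die "$msg\n  $line\n  " . (' ' x $pos) . "^\n";
}

sub inner_parse {
    my ($line) = @_;
    my $quotes = () = $line =~ /"/g;    # count double quotes
    bail($line, rindex($line, '"'), 'unterminated double quote')
        if $quotes % 2;
    return [ split ' ', $line ];        # quote removal omitted for brevity
}

sub entry_point {
    my ($self, $line, %args) = @_;
    my $messages = $args{messages} // 1;    # messages default to on
    my $tokens = eval { inner_parse($line) };
    return $tokens if defined $tokens;
    $self->{error}->($self, $@) if $messages;
    return;
}

my @reported;
my $stub_parser = {
    error => sub { my ($p, $msg) = @_; push @reported, $msg; return },
};
entry_point($stub_parser, 'say "hi');                  # error is reported
entry_point($stub_parser, 'say "hi', messages => 0);   # fails silently
print scalar(@reported), "\n";                         # prints 1
```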

       parse_line(line, named args)
          This is the entry point to this module's parsing functionality.
          It converts a line into tokens, respecting quoted text, escaped
          characters, etc.  It also keeps track of a cursor position on
          the input text, returning the token number and offset within the
          token where that position can be found in the output.

          This routine originally bore some resemblance to
          Text::ParseWords.  It has changed almost completely, however, to
          support keeping track of the cursor position.  It also has nicer
          failure modes, modular quoting, token characters (see
          token_chars in "new"), etc.  This routine now does much more.

          Arguments:

          line
              This is a string containing the command line to parse.

          This routine also accepts the following named parameters:

          cursorpos
              This is the character position in the line to keep track of.
              Pass undef (by not specifying it) or the empty string to have
              the line processed with cursorpos ignored.

              Note that passing undef is not the same as passing some
              random number and ignoring the result!  For instance, if you
              pass 0 and the line begins with whitespace, you'll get a
              0-length token at the beginning of the line to represent the
              cursor in the middle of the whitespace.  This allows command
              completion to work even when the cursor is not near any
              tokens.  If you pass undef, all whitespace at the beginning
              and end of the line is trimmed, as you would expect.

              If it is ambiguous whether the cursor should belong to the
              previous token or to the following one (for instance, if it
              is between two quoted strings such as "a""b", or between a
              token and a token_char), it always gravitates to the previous
              token.  This makes more sense when completing.

          fixclosequote
              Sometimes you want to try to recover from a missing close
              quote (for instance, when calculating completions), but
              usually you want a missing close quote to be a fatal error.
              fixclosequote=>1 will implicitly insert the correct quote if
              it's missing.  fixclosequote=>0 is the default.

          messages
              parse_line is capable of printing very informative error
              messages.  However, sometimes you don't care enough to print
              a message (like when calculating completions).  Messages are
              printed by default, so pass messages=>0 to turn them off.
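
          The recovery that fixclosequote=>1 performs can be approximated
          by scanning the line while tracking which quote is open and
          appending the matching quote if one is still open at the end.
          The helper fix_close_quote below is a hypothetical sketch, not
          the module's implementation, and it assumes shell-like rules:
          backslash escapes apply outside quotes and inside double quotes,
          but not inside single quotes.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's code, approximating fixclosequote:
# if a quote is still open at the end of the line, append the matching
# close quote.  Assumes shell-like rules: backslash escapes apply outside
# quotes and inside double quotes, not inside single quotes.  A trailing
# lone backslash is not handled specially.
sub fix_close_quote {
    my ($line) = @_;
    my $open;    # quote character currently open, if any
    my $i = 0;
    while ($i < length $line) {
        my $c = substr($line, $i, 1);
        if ($c eq '\\' && (!defined $open || $open eq '"')) {
            $i += 2;    # skip the backslash and the character it escapes
            next;
        }
        if (defined $open) {
            undef $open if $c eq $open;    # close the current quote
        }
        elsif ($c eq '"' || $c eq "'") {
            $open = $c;                    # open a new quote
        }
        $i++;
    }
    return defined $open ? $line . $open : $line;
}

print fix_close_quote('cd "My Doc'), "\n";    # prints: cd "My Doc"
```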

          This function returns a list of three items:

          tokens
              The tokens that the line was split into (a reference to an
              array of strings).

          tokno
              The number of the token (index into the previous array) that
              contains cursorpos.

          tokoff
              The character offset of cursorpos within token tokno.

          If the cursor is at the end of a token, tokoff points one
          character past the last character of that token, a nonexistent
          character.  If the cursor is between tokens (surrounded by
          whitespace), a zero-length token is created for it.
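
          The cursor rules above can be illustrated with a hypothetical,
          whitespace-only tokenizer named tokenize_with_cursor.  It is a
          sketch under simplifying assumptions (no quotes, escapes, or
          token_chars) and not the module's parser, but it returns the same
          kind of (\@tokens, $tokno, $tokoff) list: a cursor in whitespace
          gets a zero-length token, a cursor just past a token stays with
          that token, and an undefined cursorpos simply trims the line.

```perl
use strict;
use warnings;

# Hypothetical whitespace-only tokenizer (no quotes, escapes, or
# token_chars), NOT the module's parser.  It follows the cursorpos rules
# described above and returns (\@tokens, $tokno, $tokoff).
sub tokenize_with_cursor {
    my ($line, $cursorpos) = @_;
    my $track = defined $cursorpos && $cursorpos ne '';
    my (@tokens, $tokno, $tokoff);

    while ($line =~ /(\S+)/g) {
        my ($word, $start) = ($1, $-[1]);
        my $end = $start + length $word;    # one past the last character
        if ($track && !defined $tokno) {
            if ($cursorpos < $start) {
                # Cursor is in the whitespace before this word:
                # represent it with a zero-length token.
                push @tokens, '';
                ($tokno, $tokoff) = ($#tokens, 0);
            }
            elsif ($cursorpos <= $end) {
                # Inside the word, or just past its end (which gravitates
                # to this, the previous, token).
                ($tokno, $tokoff) = (scalar @tokens, $cursorpos - $start);
            }
        }
        push @tokens, $word;
    }
    if ($track && !defined $tokno) {
        # Cursor is in trailing whitespace, or the line is empty.
        push @tokens, '';
        ($tokno, $tokoff) = ($#tokens, 0);
    }
    return (\@tokens, $tokno, $tokoff);
}

my ($toks, $no, $off) = tokenize_with_cursor('zq fmgb', 6);
print "@$toks / $no / $off\n";    # prints "zq fmgb / 1 / 3"
```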

       parse_escape(lines)
          Escapes characters that would otherwise be interpreted by the
          parser.  Accepts either a single string or an arrayref of
          strings (which are modified in place).
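
          The idea behind escaping is to put a backslash in front of every
          character the parser would otherwise treat specially.  The
          function escape_for_parser below is a hypothetical sketch, not the
          module's parse_escape: it escapes only backslashes, quotes, and
          whitespace (a parser configured with token_chars would need those
          escaped too), and it mirrors the documented calling convention of
          accepting either a string or an array reference whose elements
          are modified in place.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT the module's parse_escape.  Backslash-escapes
# backslashes, quotes, and whitespace.  Accepts a string (returns the
# escaped copy) or an array reference (elements are modified in place).
sub escape_for_parser {
    my ($arg) = @_;
    my $escape = sub {
        my ($s) = @_;
        $s =~ s/([\\"'\s])/\\$1/g;
        return $s;
    };
    if (ref $arg eq 'ARRAY') {
        $_ = $escape->($_) for @$arg;    # $_ aliases each array element
        return $arg;
    }
    return $escape->($arg);
}

print escape_for_parser(q{say "hi there"}), "\n";
# prints: say\ \"hi\ there\"
```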

       join_line(tokens)
          This routine does a somewhat intelligent job of joining tokens
          back into a command line.  If token_chars (see "new") is empty
          (the default), it simply escapes backslashes and quotes and
          joins the tokens with spaces.

          However, if token_chars is nonempty, it tries to insert a
          visually pleasing amount of space between the tokens.  For
          instance, rather than 'a ( b , c )', it tries to produce
          'a (b, c)'.  It won't reformat any tokens that aren't found in
          $self->{token_chars}, of course.

          To change the formatting, you can redefine the variables
          $self->{space_none}, $self->{space_before}, and
          $self->{space_after}.  Each variable is a string containing all
          characters that should not be surrounded by whitespace, should
          have whitespace before, and should have whitespace after,
          respectively.  Any character found in token_chars but not in
          any of these space_ variables will have space placed both
          before and after.
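
          One way to apply the space_none, space_before, and space_after
          rules is to ask each token whether it wants space before and
          after it, and to emit a space between two tokens only when both
          sides agree.  The function join_tokens below is a hypothetical
          sketch of that approach, not the module's join_line; the
          both-sides-agree rule, the option names passed to it, and the
          settings in the usage line are assumptions chosen to reproduce
          the 'a (b, c)' example, and token escaping is omitted.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT the module's join_line.  A token that is a
# single character from token_chars asks for space according to the
# space_* sets; every other token asks for space on both sides.  A space
# is emitted between two tokens only when the left one wants space after
# it AND the right one wants space before it.  Escaping is omitted.
sub join_tokens {
    my ($tokens, %opt) = @_;
    my $tc = $opt{token_chars} // '';
    my %none   = map { ($_ => 1) } split //, ($opt{space_none}   // '');
    my %before = map { ($_ => 1) } split //, ($opt{space_before} // '');
    my %after  = map { ($_ => 1) } split //, ($opt{space_after}  // '');

    # Returns (wants space before, wants space after) for one token.
    my $wants = sub {
        my ($t) = @_;
        return (1, 1) unless length($t) == 1 && index($tc, $t) >= 0;
        return (0, 0) if $none{$t};
        return (1, 0) if $before{$t};
        return (0, 1) if $after{$t};
        return (1, 1);    # in token_chars but in no space_ set
    };

    my ($out, $prev_after) = ('', 0);
    for my $i (0 .. $#$tokens) {
        my ($sp_before, $sp_after) = $wants->($tokens->[$i]);
        $out .= ' ' if $i > 0 && $prev_after && $sp_before;
        $out .= $tokens->[$i];
        $prev_after = $sp_after;
    }
    return $out;
}

# Hypothetical settings chosen to reproduce the 'a (b, c)' example.
print join_tokens(['a', '(', 'b', ',', 'c', ')'],
    token_chars  => '(),',
    space_none   => ')',
    space_before => '(',
    space_after  => ','), "\n";    # prints "a (b, c)"
```

          With token_chars empty, every token wants space on both sides,
          so the tokens are simply joined with single spaces.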

BUGS

       None known.

LICENSE

       Copyright (c) 2003-2011 Scott Bronson, all rights reserved.  This
       program is covered by the MIT license.

AUTHOR

       Scott Bronson <bronson@rinspin.com>

perl v5.38.0                      2023-07-21       Text::Shellwords::Cursor(3)