1Q() Q()
2
3
4
6 q - Text as Data
7
9 `q <flags> <query>`
10
11 Simplest execution is `q "SELECT * FROM myfile"` which prints the entire file.
12
14 q allows performing SQL-like statements on tabular text data. Its pur‐
15 pose is to bring SQL expressive power to the Linux command line and to
16 provide easy access to text as actual data.
17
18 Query should be an SQL-like query which contains filenames instead of
19 table names (or - for stdin). The query itself should be provided as
20 one parameter to the tool (i.e. enclosed in quotes).
21
22 Use -H to signify that the input contains a header line. Column names
23 will be detected automatically in that case, and can be used in the
24 query. If this option is not provided, columns will be named cX, start‐
25 ing with 1 (e.g. q "SELECT c3,c8 from ...").
26
27 Use -d to specify the input delimiter.
28
29 Column types are auto detected by the tool, no casting is needed.
30
31 Please note that column names that include spaces need to be used in
32 the query with back-ticks, as per the sqlite standard.
33
34 Query/Input/Output encodings are fully supported (and q tries to pro‐
35 vide out-of-the-box usability in that area). Please use -e,-E and -Q to
36 control encoding if needed.
37
38 All sqlite3 SQL constructs are supported, including joins across files
39 (use an alias for each table).
40
41 See https://github.com/harelba/q for more details.
42
44 q gets one parameter - An SQL-like query.
45
46 Any standard SQL expression, condition (both WHERE and HAVING), GROUP
47 BY, ORDER BY etc. are allowed.
48
49 JOINs are supported and Subqueries are supported in the WHERE clause,
50 but unfortunately not in the FROM clause for now. Use table aliases
51 when performing JOINs.
52
53 The SQL syntax itself is sqlite´s syntax. For details look at
54 http://www.sqlite.org/lang.html or search the net for examples.
55
56 NOTE: Full type detection is implemented, so there is no need for any
57 casting or anything.
58
59 NOTE2: When using the -O output header option, use column name aliases
60 if you want to control the output column names. For example, q -O -H
61 "select count(*) cnt,sum(*) as mysum from -" would output cnt and mysum
62 as the output header column names.
63
65 q can also get some runtime flags. The following parameters can be
66 used, all optional:
67
68
69
70 Options:
71 -h, --help show this help message and exit
72 -v, --version Print version
73
74 Input Data Options:
75 -H, --skip-header Skip header row. This has been changed from earlier
76 version - Only one header row is supported, and the
77 header row is used for column naming
78 -d DELIMITER, --delimiter=DELIMITER
79 Field delimiter. If none specified, then space is used
80 as the delimiter.
81 -t, --tab-delimited
82 Same as -d <tab>. Just a shorthand for handling
83 standard tab delimited file You can use $´\t´ if you
84 want (this is how Linux expects to provide tabs in the
85 command line
86 -e ENCODING, --encoding=ENCODING
87 Input file encoding. Defaults to UTF-8. set to none
88 for not setting any encoding - faster, but at your own
89 risk...
90 -z, --gzipped Data is gzipped. Useful for reading from stdin. For
91 files, .gz means automatic gunzipping
92 -A, --analyze-only Analyze sample input and provide information about
93 data types
94 -m MODE, --mode=MODE
95 Data parsing mode. fluffy, relaxed and strict. In
96 strict mode, the -c column-count parameter must be
97 supplied as well
98 -c COLUMN_COUNT, --column-count=COLUMN_COUNT
99 Specific column count when using relaxed or strict
100 mode
101 -k, --keep-leading-whitespace
102 Keep leading whitespace in values. Default behavior
103 strips leading whitespace off values, in order to
104 provide out-of-the-box usability for simple use cases.
105 If you need to preserve whitespace, use this flag.
106 --disable-double-double-quoting
107 Disable support for double double-quoting for escaping
108 the double quote character. By default, you can use ""
109 inside double quoted fields to escape double quotes.
110 Mainly for backward compatibility.
111 --disable-escaped-double-quoting
112 Disable support for escaped double-quoting for
113 escaping the double quote character. By default, you
114 can use \" inside double quoted fields to escape
115 double quotes. Mainly for backward compatibility.
116 -w INPUT_QUOTING_MODE, --input-quoting-mode=INPUT_QUOTING_MODE
117 Input quoting mode. Possible values are all, minimal
118 and none. Note the slightly misleading parameter name,
119 and see the matching -W parameter for output quoting.
120
121 Output Options:
122 -D OUTPUT_DELIMITER, --output-delimiter=OUTPUT_DELIMITER
123 Field delimiter for output. If none specified, then
124 the -d delimiter is used if present, or space if no
125 delimiter is specified
126 -T, --tab-delimited-output
127 Same as -D <tab>. Just a shorthand for outputting tab
128 delimited output. You can use -D $´\t´ if you want.
129 -O, --output-header
130 Output header line. Output column-names are determined
131 from the query itself. Use column aliases in order to
132 set your column names in the query. For example,
133 ´select name FirstName,value1/value2 MyCalculation
134 from ...´. This can be used even if there was no
135 header in the input.
136 -b, --beautify Beautify output according to actual values. Might be
137 slow...
138 -f FORMATTING, --formatting=FORMATTING
139 Output-level formatting, in the format X=fmt,Y=fmt
140 etc, where X,Y are output column numbers (e.g. 1 for
141 first SELECT column etc.
142 -E OUTPUT_ENCODING, --output-encoding=OUTPUT_ENCODING
143 Output encoding. Defaults to ´none´, leading to
144 selecting the system/terminal encoding
145 -W OUTPUT_QUOTING_MODE, --output-quoting-mode=OUTPUT_QUOTING_MODE
146 Output quoting mode. Possible values are all, minimal,
147 nonnumeric and none. Note the slightly misleading
148 parameter name, and see the matching -w parameter for
149 input quoting.
150
151 Query Related Options:
152 -q QUERY_FILENAME, --query-filename=QUERY_FILENAME
153 Read query from the provided filename instead of the
154 command line, possibly using the provided query
155 encoding (using -Q).
156 -Q QUERY_ENCODING, --query-encoding=QUERY_ENCODING
157 query text encoding. Experimental. Please send your
158 feedback on this
159
160
161
162 Table names
163 The table names are the actual file names that you want to read from.
164 Path names are allowed. Use "-" if you want to read from stdin (e.g. q
165 "SELECT * FROM -")
166
167 Multiple files can be concatenated by using one of both of the follow‐
168 ing ways:
169
170 · Separating the filenames with a + sign: SELECT * FROM
171 datafile1+datafile2+datefile3.
172
173 · Using glob matching: SELECT * FROM mydata*.dat
174
175
176
177 Files with .gz extension are considered to be gzipped and decompressed
178 on the fly.
179
180 Parsing Modes
181 q supports multiple parsing modes:
182
183 · relaxed - This is the default mode. It tries to lean towards sim‐
184 plicity of use. When a row doesn´t contains enough columns, they´ll
185 be filled with nulls, and when there are too many, the extra values
186 will be merged to the last column. Defining the number of expected
187 columns in this mode is done using the -c parameter. If it is not
188 provided, then the number of columns is detected automatically (In
189 most use cases, there is no need to specify -c)
190
191 · strict - Strict mode is for hardcore csv/tsv parsing. Whenever a
192 row doesn´t contain the proper number of columns, processing will
193 stop. -c must be provided when using this mode
194
195 · fluffy - This mode should not be used, and is just some kind of
196 "backward compatible" parsing mode which was used by q previously.
197 It´s left as a separate parsing mode on purpose, in order to accom‐
198 modate existing users. If you are such a user, please open a bug
199 for your use case, and I´ll see how I can incorporate it into the
200 other modes. It is reasonable to say that this mode will be removed
201 in the future.
202
203
204
205 Output formatting option
206 The format of F is as a list of X=f separated by commas, where X is a
207 column number and f is a python format:
208
209 · X - column number - This is the SELECTed column (or expression)
210 number, not the one from the original table. E.g, 1 is the first
211 SELECTed column, 3 is the third SELECTed column.
212
213 · f - A python formatting string - See
214 http://docs.python.org/release/2.4.4/lib/typesseq-strings.html for
215 details if needed. ** Example: -f 3=%-10s,5=%4.3f,1=%x
216
217
218
220 Example 1: ls -ltrd * | q "select c1,count(1) from - group by c1"
221
222
223
224 This example would print a count of each unique permission string in the current folder.
225
226
227
228 Example 2: seq 1 1000 | q "select avg(c1),sum(c1) from -"
229
230
231
232 This example would provide the average and the sum of the numbers in the range 1 to 1000
233
234
235
236 Example 3: sudo find /tmp -ls | q "select c5,c6,sum(c7)/1024.0/1024 as
237 total from - group by c5,c6 order by total desc"
238
239
240
241 This example will output the total size in MB per user+group in the /tmp subtree
242
243
244
245 Example 4: ps -ef | q -H "select UID,count(*) cnt from - group by UID
246 order by cnt desc limit 3"
247
248
249
250 This example will show process counts per UID, calculated from ps data. Note that the column names provided by ps are being used as column name in the query (The -H flag activates that option)
251
252
253
255 Harel Ben-Attia (harelba@gmail.com)
256
257 @harelba https://twitter.com/harelba on Twitter
258
259 Any feedback/suggestions/complaints regarding this tool would be much
260 appreciated. Contributions are most welcome as well, of course.
261
263 Copyright (C) 2012--2014 Harel Ben Attia
264
265 This program is free software; you can redistribute it and/or modify it
266 under the terms of the GNU General Public License as published by the
267 Free Software Foundation; either version 3, or (at your option) any
268 later version.
269
270 This program is distributed in the hope that it will be useful,but
271 WITHOUT ANY WARRANTY; without even the implied warranty of MER‐
272 CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
273 Public License for more details. You should have received a copy of the
274 GNU General Public License along with this program; if not, write to
275 the Free Software Foundation, Inc., 51 Franklin Street - Fifth Floor,
276 Boston, MA 02110-1301, USA
277
278
279
280 March 2018 Q()