1Q() Q()
2
3
4
6 q - Text as Data
7
9 `q <flags> <query>`
10
11 Simplest execution is `q "SELECT * FROM myfile"` which prints the entire file.
12
14 q allows performing SQL-like statements on tabular text data. Its pur‐
15 pose is to bring SQL expressive power to the Linux command line and to
16 provide easy access to text as actual data.
17
18 Query should be an SQL-like query which contains filenames instead of
19 table names (or - for stdin). The query itself should be provided as
20 one parameter to the tool (i.e. enclosed in quotes).
21
22 Use -H to signify that the input contains a header line. Column names
23 will be detected automatically in that case, and can be used in the
24 query. If this option is not provided, columns will be named cX, start‐
25 ing with 1 (e.g. q "SELECT c3,c8 from ...").
26
27 Use -d to specify the input delimiter.
28
29 Column types are auto detected by the tool, no casting is needed.
30
31 Please note that column names that include spaces need to be used in
32 the query with back-ticks, as per the sqlite standard.
33
34 Query/Input/Output encodings are fully supported (and q tries to pro‐
35 vide out-of-the-box usability in that area). Please use -e,-E and -Q to
36 control encoding if needed.
37
38 All sqlite3 SQL constructs are supported, including joins across files
39 (use an alias for each table).
40
41 See https://github.com/harelba/q for more details.
42
44 q gets one parameter - An SQL-like query.
45
46 Any standard SQL expression, condition (both WHERE and HAVING), GROUP
47 BY, ORDER BY etc. are allowed.
48
49 JOINs are supported and Subqueries are supported in the WHERE clause,
50 but unfortunately not in the FROM clause for now. Use table aliases
51 when performing JOINs.
52
53 The SQL syntax itself is sqlite´s syntax. For details look at
54 http://www.sqlite.org/lang.html or search the net for examples.
55
56 NOTE: Full type detection is implemented, so there is no need for any
57 casting or anything.
58
59 NOTE2: When using the -O output header option, use column name aliases
60 if you want to control the output column names. For example, q -O -H
61 "select count(*) cnt,sum(*) as mysum from -" would output cnt and mysum
62 as the output header column names.
63
65 q can also get some runtime flags. The following parameters can be
66 used, all optional:
67
68 ```` Options: -h, --help show this help message and exit -v, --version
69 Print version
70
71 Input Data Options: -H, --skip-header Skip header row. This has been
72 changed from earlier version - Only one header row is supported, and
73 the header row is used for column naming -d DELIMITER, --delim‐
74 iter=DELIMITER Field delimiter. If none specified, then space is used
75 as the delimiter. -t, --tab-delimited Same as -d tab. Just a shorthand
76 for handling standard tab delimited file You can use $´\t´ if you want
77 (this is how Linux expects to provide tabs in the command line -e
78 ENCODING, --encoding=ENCODING Input file encoding. Defaults to UTF-8.
79 set to none for not setting any encoding - faster, but at your own
80 risk... -z, --gzipped Data is gzipped. Useful for reading from stdin.
81 For files, .gz means automatic gunzipping -A, --analyze-only Analyze
82 sample input and provide information about data types -m MODE,
83 --mode=MODE Data parsing mode. fluffy, relaxed and strict. In strict
84 mode, the -c column-count parameter must be supplied as well -c COL‐
85 UMN_COUNT, --column-count=COLUMN_COUNT Specific column count when using
86 relaxed or strict mode -k, --keep-leading-whitespace Keep leading
87 whitespace in values. Default behavior strips leading whitespace off
88 values, in order to provide out-of-the-box usability for simple use
89 cases. If you need to preserve whitespace, use this flag. --dis‐
90 able-double-double-quoting Disable support for double double-quoting
91 for escaping the double quote character. By default, you can use ""
92 inside double quoted fields to escape double quotes. Mainly for back‐
93 ward compatibility. --disable-escaped-double-quoting Disable support
94 for escaped double-quoting for escaping the double quote character. By
95 default, you can use \" inside double quoted fields to escape double
96 quotes. Mainly for backward compatibility. -w INPUT_QUOTING_MODE,
97 --input-quoting-mode=INPUT_QUOTING_MODE Input quoting mode. Possible
98 values are all, minimal and none. Note the slightly misleading parame‐
99 ter name, and see the matching -W parameter for output quoting.
100
101 Output Options: -D OUTPUT_DELIMITER, --output-delimiter=OUTPUT_DELIM‐
102 ITER Field delimiter for output. If none specified, then the -d delim‐
103 iter is used if present, or space if no delimiter is specified -T,
104 --tab-delimited-output Same as -D tab. Just a shorthand for outputting
105 tab delimited output. You can use -D $´\t´ if you want. -O, --out‐
106 put-header Output header line. Output column-names are determined from
107 the query itself. Use column aliases in order to set your column names
108 in the query. For example, ´select name FirstName,value1/value2 MyCal‐
109 culation from ...´. This can be used even if there was no header in the
110 input. -b, --beautify Beautify output according to actual values. Might
111 be slow... -f FORMATTING, --formatting=FORMATTING Output-level format‐
112 ting, in the format X=fmt,Y=fmt etc, where X,Y are output column num‐
113 bers (e.g. 1 for first SELECT column etc. -E OUTPUT_ENCODING, --out‐
114 put-encoding=OUTPUT_ENCODING Output encoding. Defaults to ´none´, lead‐
115 ing to selecting the system/terminal encoding -W OUTPUT_QUOTING_MODE,
116 --output-quoting-mode=OUTPUT_QUOTING_MODE Output quoting mode. Possible
117 values are all, minimal, nonnumeric and none. Note the slightly mis‐
118 leading parameter name, and see the matching -w parameter for input
119 quoting.
120
121 Query Related Options: -q QUERY_FILENAME, --query-filename=QUERY_FILE‐
122 NAME Read query from the provided filename instead of the command line,
123 possibly using the provided query encoding (using -Q). -Q QUERY_ENCOD‐
124 ING, --query-encoding=QUERY_ENCODING query text encoding. Experimental.
125 Please send your feedback on this ````
126
127 Table names
128 The table names are the actual file names that you want to read from.
129 Path names are allowed. Use "-" if you want to read from stdin (e.g. q
130 "SELECT * FROM -")
131
132 Multiple files can be concatenated by using one of both of the follow‐
133 ing ways:
134
135 · Separating the filenames with a + sign: SELECT * FROM
136 datafile1+datafile2+datefile3.
137
138 · Using glob matching: SELECT * FROM mydata*.dat
139
140
141
142 Files with .gz extension are considered to be gzipped and decompressed
143 on the fly.
144
145 Parsing Modes
146 q supports multiple parsing modes:
147
148 · relaxed - This is the default mode. It tries to lean towards sim‐
149 plicity of use. When a row doesn´t contains enough columns, they´ll
150 be filled with nulls, and when there are too many, the extra values
151 will be merged to the last column. Defining the number of expected
152 columns in this mode is done using the -c parameter. If it is not
153 provided, then the number of columns is detected automatically (In
154 most use cases, there is no need to specify -c)
155
156 · strict - Strict mode is for hardcore csv/tsv parsing. Whenever a
157 row doesn´t contain the proper number of columns, processing will
158 stop. -c must be provided when using this mode
159
160 · fluffy - This mode should not be used, and is just some kind of
161 "backward compatible" parsing mode which was used by q previously.
162 It´s left as a separate parsing mode on purpose, in order to accom‐
163 modate existing users. If you are such a user, please open a bug
164 for your use case, and I´ll see how I can incorporate it into the
165 other modes. It is reasonable to say that this mode will be removed
166 in the future.
167
168
169
170 Output formatting option
171 The format of F is as a list of X=f separated by commas, where X is a
172 column number and f is a python format:
173
174 · X - column number - This is the SELECTed column (or expression)
175 number, not the one from the original table. E.g, 1 is the first
176 SELECTed column, 3 is the third SELECTed column.
177
178 · f - A python formatting string - See
179 http://docs.python.org/release/2.4.4/lib/typesseq-strings.html for
180 details if needed. ** Example: -f 3=%-10s,5=%4.3f,1=%x
181
182
183
185 Example 1: ls -ltrd * | q "select c1,count(1) from - group by c1"
186
187
188
189 This example would print a count of each unique permission string in the current folder.
190
191
192
193 Example 2: seq 1 1000 | q "select avg(c1),sum(c1) from -"
194
195
196
197 This example would provide the average and the sum of the numbers in the range 1 to 1000
198
199
200
201 Example 3: sudo find /tmp -ls | q "select c5,c6,sum(c7)/1024.0/1024 as
202 total from - group by c5,c6 order by total desc"
203
204
205
206 This example will output the total size in MB per user+group in the /tmp subtree
207
208
209
210 Example 4: ps -ef | q -H "select UID,count(*) cnt from - group by UID
211 order by cnt desc limit 3"
212
213
214
215 This example will show process counts per UID, calculated from ps data. Note that the column names provided by ps are being used as column name in the query (The -H flag activates that option)
216
217
218
220 Harel Ben-Attia (harelba@gmail.com)
221
222 @harelba https://twitter.com/harelba on Twitter
223
224 Any feedback/suggestions/complaints regarding this tool would be much
225 appreciated. Contributions are most welcome as well, of course.
226
228 Copyright (C) 2012--2014 Harel Ben Attia
229
230 This program is free software; you can redistribute it and/or modify it
231 under the terms of the GNU General Public License as published by the
232 Free Software Foundation; either version 3, or (at your option) any
233 later version.
234
235 This program is distributed in the hope that it will be useful,but
236 WITHOUT ANY WARRANTY; without even the implied warranty of MER‐
237 CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
238 Public License for more details. You should have received a copy of the
239 GNU General Public License along with this program; if not, write to
240 the Free Software Foundation, Inc., 51 Franklin Street - Fifth Floor,
241 Boston, MA 02110-1301, USA
242
243
244
245 March 2018 Q()