q-text-as-data(1)

1Q()                                                                        Q()
2
3
4

NAME

6       q - Text as Data
7

SYNOPSIS

9       `q <flags> <query>`
10
11       Simplest execution is `q "SELECT * FROM myfile"` which prints the entire file.
12

DESCRIPTION

14       q  allows performing SQL-like statements on tabular text data. Its pur‐
15       pose is to bring SQL expressive power to the Linux command line and  to
16       provide easy access to text as actual data.
17
18       Query  should  be an SQL-like query which contains filenames instead of
19       table names (or - for stdin). The query itself should  be  provided  as
20       one parameter to the tool (i.e. enclosed in quotes).
21
22       Use  -H  to signify that the input contains a header line. Column names
23       will be detected automatically in that case, and can  be  used  in  the
24       query. If this option is not provided, columns will be named cX, start‐
25       ing with 1 (e.g. q "SELECT c3,c8 from ...").
26
27       Use -d to specify the input delimiter.
28
29       Column types are auto detected by the tool, no casting is needed.
30
31       Please note that column names that include spaces need to  be  used  in
32       the query with back-ticks, as per the sqlite standard.
33
34       Query/Input/Output  encodings  are fully supported (and q tries to pro‐
35       vide out-of-the-box usability in that area). Please use -e,-E and -Q to
36       control encoding if needed.
37
38       All  sqlite3 SQL constructs are supported, including joins across files
39       (use an alias for each table).
40
41       See https://github.com/harelba/q for more details.
42

QUERY

44       q gets one parameter - An SQL-like query.
45
46       Any standard SQL expression, condition (both WHERE and  HAVING),  GROUP
47       BY, ORDER BY etc. are allowed.
48
49       JOINs  are  supported and Subqueries are supported in the WHERE clause,
50       but unfortunately not in the FROM clause for  now.  Use  table  aliases
51       when performing JOINs.
52
53       The  SQL  syntax  itself  is  sqlite´s  syntax.  For  details  look  at
54       http://www.sqlite.org/lang.html or search the net for examples.
55
56       NOTE: Full type detection is implemented, so there is no need  for  any
57       casting or anything.
58
59       NOTE2:  When using the -O output header option, use column name aliases
60       if you want to control the output column names. For example,  q  -O  -H
61       "select count(*) cnt,sum(*) as mysum from -" would output cnt and mysum
62       as the output header column names.
63

RUNTIME OPTIONS

65       q can also get some runtime flags.  The  following  parameters  can  be
66       used, all optional:
67
68
69
70           Options:
71             -h, --help            show this help message and exit
72             -v, --version         Print version
73
74             Input Data Options:
75               -H, --skip-header   Skip header row. This has been changed from earlier
76                                   version - Only one header row is supported, and the
77                                   header row is used for column naming
78               -d DELIMITER, --delimiter=DELIMITER
79                                   Field delimiter. If none specified, then space is used
80                                   as the delimiter.
81               -t, --tab-delimited
82                                   Same as -d <tab>. Just a shorthand for handling
83                                   standard tab delimited file You can use $´\t´ if you
84                                   want (this is how Linux expects to provide tabs in the
85                                   command line
86               -e ENCODING, --encoding=ENCODING
87                                   Input file encoding. Defaults to UTF-8. set to none
88                                   for not setting any encoding - faster, but at your own
89                                   risk...
90               -z, --gzipped       Data is gzipped. Useful for reading from stdin. For
91                                   files, .gz means automatic gunzipping
92               -A, --analyze-only  Analyze sample input and provide information about
93                                   data types
94               -m MODE, --mode=MODE
95                                   Data parsing mode. fluffy, relaxed and strict. In
96                                   strict mode, the -c column-count parameter must be
97                                   supplied as well
98               -c COLUMN_COUNT, --column-count=COLUMN_COUNT
99                                   Specific column count when using relaxed or strict
100                                   mode
101               -k, --keep-leading-whitespace
102                                   Keep leading whitespace in values. Default behavior
103                                   strips leading whitespace off values, in order to
104                                   provide out-of-the-box usability for simple use cases.
105                                   If you need to preserve whitespace, use this flag.
106               --disable-double-double-quoting
107                                   Disable support for double double-quoting for escaping
108                                   the double quote character. By default, you can use ""
109                                   inside double quoted fields to escape double quotes.
110                                   Mainly for backward compatibility.
111               --disable-escaped-double-quoting
112                                   Disable support for escaped double-quoting for
113                                   escaping the double quote character. By default, you
114                                   can use \" inside double quoted fields to escape
115                                   double quotes. Mainly for backward compatibility.
116               -w INPUT_QUOTING_MODE, --input-quoting-mode=INPUT_QUOTING_MODE
117                                   Input quoting mode. Possible values are all, minimal
118                                   and none. Note the slightly misleading parameter name,
119                                   and see the matching -W parameter for output quoting.
120
121             Output Options:
122               -D OUTPUT_DELIMITER, --output-delimiter=OUTPUT_DELIMITER
123                                   Field delimiter for output. If none specified, then
124                                   the -d delimiter is used if present, or space if no
125                                   delimiter is specified
126               -T, --tab-delimited-output
127                                   Same as -D <tab>. Just a shorthand for outputting tab
128                                   delimited output. You can use -D $´\t´ if you want.
129               -O, --output-header
130                                   Output header line. Output column-names are determined
131                                   from the query itself. Use column aliases in order to
132                                   set your column names in the query. For example,
133                                   ´select name FirstName,value1/value2 MyCalculation
134                                   from ...´. This can be used even if there was no
135                                   header in the input.
136               -b, --beautify      Beautify output according to actual values. Might be
137                                   slow...
138               -f FORMATTING, --formatting=FORMATTING
139                                   Output-level formatting, in the format X=fmt,Y=fmt
140                                   etc, where X,Y are output column numbers (e.g. 1 for
141                                   first SELECT column etc.
142               -E OUTPUT_ENCODING, --output-encoding=OUTPUT_ENCODING
143                                   Output encoding. Defaults to ´none´, leading to
144                                   selecting the system/terminal encoding
145               -W OUTPUT_QUOTING_MODE, --output-quoting-mode=OUTPUT_QUOTING_MODE
146                                   Output quoting mode. Possible values are all, minimal,
147                                   nonnumeric and none. Note the slightly misleading
148                                   parameter name, and see the matching -w parameter for
149                                   input quoting.
150
151             Query Related Options:
152               -q QUERY_FILENAME, --query-filename=QUERY_FILENAME
153                                   Read query from the provided filename instead of the
154                                   command line, possibly using the provided query
155                                   encoding (using -Q).
156               -Q QUERY_ENCODING, --query-encoding=QUERY_ENCODING
157                                   query text encoding. Experimental. Please send your
158                                   feedback on this
159
160
161
162   Table names
163       The  table  names are the actual file names that you want to read from.
164       Path names are allowed. Use "-" if you want to read from stdin (e.g.  q
165       "SELECT * FROM -")
166
167       Multiple  files can be concatenated by using one of both of the follow‐
168       ing ways:
169
170       ·   Separating  the  filenames  with  a   +   sign:   SELECT   *   FROM
171           datafile1+datafile2+datefile3.
172
173       ·   Using glob matching: SELECT * FROM mydata*.dat
174
175
176
177       Files  with .gz extension are considered to be gzipped and decompressed
178       on the fly.
179
180   Parsing Modes
181       q supports multiple parsing modes:
182
183       ·   relaxed - This is the default mode. It tries to lean  towards  sim‐
184           plicity of use. When a row doesn´t contains enough columns, they´ll
185           be filled with nulls, and when there are too many, the extra values
186           will  be merged to the last column. Defining the number of expected
187           columns in this mode is done using the -c parameter. If it  is  not
188           provided,  then the number of columns is detected automatically (In
189           most use cases, there is no need to specify -c)
190
191       ·   strict - Strict mode is for hardcore csv/tsv  parsing.  Whenever  a
192           row  doesn´t  contain the proper number of columns, processing will
193           stop. -c must be provided when using this mode
194
195       ·   fluffy - This mode should not be used, and is  just  some  kind  of
196           "backward  compatible" parsing mode which was used by q previously.
197           It´s left as a separate parsing mode on purpose, in order to accom‐
198           modate  existing  users.  If you are such a user, please open a bug
199           for your use case, and I´ll see how I can incorporate it  into  the
200           other modes. It is reasonable to say that this mode will be removed
201           in the future.
202
203
204
205   Output formatting option
206       The format of F is as a list of X=f separated by commas, where X  is  a
207       column number and f is a python format:
208
209       ·   X  -  column  number  - This is the SELECTed column (or expression)
210           number, not the one from the original table. E.g, 1  is  the  first
211           SELECTed column, 3 is the third SELECTed column.
212
213       ·   f      -      A      python     formatting     string     -     See
214           http://docs.python.org/release/2.4.4/lib/typesseq-strings.html  for
215           details if needed. ** Example: -f 3=%-10s,5=%4.3f,1=%x
216
217
218

EXAMPLES

220       Example 1: ls -ltrd * | q "select c1,count(1) from - group by c1"
221
222
223
224           This example would print a count of each unique permission string in the current folder.
225
226
227
228       Example 2: seq 1 1000 | q "select avg(c1),sum(c1) from -"
229
230
231
232           This example would provide the average and the sum of the numbers in the range 1 to 1000
233
234
235
236       Example  3: sudo find /tmp -ls | q "select c5,c6,sum(c7)/1024.0/1024 as
237       total from - group by c5,c6 order by total desc"
238
239
240
241           This example will output the total size in MB per user+group in the /tmp subtree
242
243
244
245       Example 4: ps -ef | q -H "select UID,count(*) cnt from - group  by  UID
246       order by cnt desc limit 3"
247
248
249
250           This example will show process counts per UID, calculated from ps data. Note that the column names provided by ps are being used as column name in the query (The -H flag activates that option)
251
252
253

AUTHOR

255       Harel Ben-Attia (harelba@gmail.com)
256
257       @harelba https://twitter.com/harelba on Twitter
258
259       Any  feedback/suggestions/complaints  regarding this tool would be much
260       appreciated. Contributions are most welcome as well, of course.
261

COPYRIGHT

263       Copyright (C) 2012--2014 Harel Ben Attia
264
265       This program is free software; you can redistribute it and/or modify it
266       under  the  terms of the GNU General Public License as published by the
267       Free Software Foundation; either version 3, or  (at  your  option)  any
268       later version.
269
270       This  program  is  distributed  in  the hope that it will be useful,but
271       WITHOUT ANY  WARRANTY;  without  even  the  implied  warranty  of  MER‐
272       CHANTABILITY  or  FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
273       Public License for more details. You should have received a copy of the
274       GNU  General  Public  License along with this program; if not, write to
275       the Free Software Foundation, Inc., 51 Franklin Street -  Fifth  Floor,
276       Boston, MA 02110-1301, USA
277
278
279
280                                  March 2018                               Q()