q-text-as-data(1)

1Q()                                                                        Q()
2
3
4

NAME

6       q - Text as Data
7

SYNOPSIS

9       `q <flags> <query>`
10
11       Simplest execution is `q "SELECT * FROM myfile"` which prints the entire file.
12

DESCRIPTION

14       q  allows performing SQL-like statements on tabular text data. Its pur‐
15       pose is to bring SQL expressive power to the Linux command line and  to
16       provide easy access to text as actual data.
17
18       Query  should  be an SQL-like query which contains filenames instead of
19       table names (or - for stdin). The query itself should  be  provided  as
20       one parameter to the tool (i.e. enclosed in quotes).
21
22       Use  -H  to signify that the input contains a header line. Column names
23       will be detected automatically in that case, and can  be  used  in  the
24       query. If this option is not provided, columns will be named cX, start‐
25       ing with 1 (e.g. q "SELECT c3,c8 from ...").
26
27       Use -d to specify the input delimiter.
28
29       Column types are auto detected by the tool, no casting is needed.
30
31       Please note that column names that include spaces need to  be  used  in
32       the query with back-ticks, as per the sqlite standard.
33
34       Query/Input/Output  encodings  are fully supported (and q tries to pro‐
35       vide out-of-the-box usability in that area). Please use -e,-E and -Q to
36       control encoding if needed.
37
38       All  sqlite3 SQL constructs are supported, including joins across files
39       (use an alias for each table).
40
41       See https://github.com/harelba/q for more details.
42

QUERY

44       q gets one parameter - An SQL-like query.
45
46       Any standard SQL expression, condition (both WHERE and  HAVING),  GROUP
47       BY, ORDER BY etc. are allowed.
48
49       JOINs  are  supported and Subqueries are supported in the WHERE clause,
50       but unfortunately not in the FROM clause for  now.  Use  table  aliases
51       when performing JOINs.
52
53       The  SQL  syntax  itself  is  sqlite´s  syntax.  For  details  look  at
54       http://www.sqlite.org/lang.html or search the net for examples.
55
56       NOTE: Full type detection is implemented, so there is no need  for  any
57       casting or anything.
58
59       NOTE2:  When using the -O output header option, use column name aliases
60       if you want to control the output column names. For example,  q  -O  -H
61       "select count(*) cnt,sum(*) as mysum from -" would output cnt and mysum
62       as the output header column names.
63

RUNTIME OPTIONS

65       q can also get some runtime flags.  The  following  parameters  can  be
66       used, all optional:
67
68       ````  Options: -h, --help show this help message and exit -v, --version
69       Print version
70
71       Input Data Options: -H, --skip-header Skip header row.  This  has  been
72       changed  from  earlier  version - Only one header row is supported, and
73       the header row  is  used  for  column  naming  -d  DELIMITER,  --delim‐
74       iter=DELIMITER  Field  delimiter. If none specified, then space is used
75       as the delimiter. -t, --tab-delimited Same as -d tab. Just a  shorthand
76       for  handling standard tab delimited file You can use $´\t´ if you want
77       (this is how Linux expects to provide  tabs  in  the  command  line  -e
78       ENCODING,  --encoding=ENCODING  Input file encoding. Defaults to UTF-8.
79       set to none for not setting any encoding -  faster,  but  at  your  own
80       risk...  -z,  --gzipped Data is gzipped. Useful for reading from stdin.
81       For files, .gz means automatic gunzipping  -A,  --analyze-only  Analyze
82       sample  input  and  provide  information  about  data  types  -m  MODE,
83       --mode=MODE Data parsing mode. fluffy, relaxed and  strict.  In  strict
84       mode,  the  -c  column-count parameter must be supplied as well -c COL‐
85       UMN_COUNT, --column-count=COLUMN_COUNT Specific column count when using
86       relaxed  or  strict  mode  -k,  --keep-leading-whitespace  Keep leading
87       whitespace in values. Default behavior strips  leading  whitespace  off
88       values,  in  order  to  provide out-of-the-box usability for simple use
89       cases. If you need  to  preserve  whitespace,  use  this  flag.  --dis‐
90       able-double-double-quoting  Disable  support  for double double-quoting
91       for escaping the double quote character. By default,  you  can  use  ""
92       inside  double  quoted fields to escape double quotes. Mainly for back‐
93       ward compatibility.  --disable-escaped-double-quoting  Disable  support
94       for  escaped double-quoting for escaping the double quote character. By
95       default, you can use \" inside double quoted fields  to  escape  double
96       quotes.  Mainly  for  backward  compatibility.  -w  INPUT_QUOTING_MODE,
97       --input-quoting-mode=INPUT_QUOTING_MODE Input  quoting  mode.  Possible
98       values  are all, minimal and none. Note the slightly misleading parame‐
99       ter name, and see the matching -W parameter for output quoting.
100
101       Output Options: -D  OUTPUT_DELIMITER,  --output-delimiter=OUTPUT_DELIM‐
102       ITER  Field delimiter for output. If none specified, then the -d delim‐
103       iter is used if present, or space if  no  delimiter  is  specified  -T,
104       --tab-delimited-output  Same as -D tab. Just a shorthand for outputting
105       tab delimited output. You can use -D $´\t´  if  you  want.  -O,  --out‐
106       put-header  Output header line. Output column-names are determined from
107       the query itself. Use column aliases in order to set your column  names
108       in  the query. For example, ´select name FirstName,value1/value2 MyCal‐
109       culation from ...´. This can be used even if there was no header in the
110       input. -b, --beautify Beautify output according to actual values. Might
111       be slow... -f FORMATTING, --formatting=FORMATTING Output-level  format‐
112       ting,  in  the format X=fmt,Y=fmt etc, where X,Y are output column num‐
113       bers (e.g. 1 for first SELECT column etc.  -E  OUTPUT_ENCODING,  --out‐
114       put-encoding=OUTPUT_ENCODING Output encoding. Defaults to ´none´, lead‐
115       ing to selecting the system/terminal encoding  -W  OUTPUT_QUOTING_MODE,
116       --output-quoting-mode=OUTPUT_QUOTING_MODE Output quoting mode. Possible
117       values are all, minimal, nonnumeric and none. Note  the  slightly  mis‐
118       leading  parameter  name,  and  see the matching -w parameter for input
119       quoting.
120
121       Query Related Options: -q QUERY_FILENAME,  --query-filename=QUERY_FILE‐
122       NAME Read query from the provided filename instead of the command line,
123       possibly using the provided query encoding (using -Q). -Q  QUERY_ENCOD‐
124       ING, --query-encoding=QUERY_ENCODING query text encoding. Experimental.
125       Please send your feedback on this ````
126
127   Table names
128       The table names are the actual file names that you want to  read  from.
129       Path  names are allowed. Use "-" if you want to read from stdin (e.g. q
130       "SELECT * FROM -")
131
132       Multiple files can be concatenated by using one of both of the  follow‐
133       ing ways:
134
135       ·   Separating   the   filenames   with   a   +  sign:  SELECT  *  FROM
136           datafile1+datafile2+datefile3.
137
138       ·   Using glob matching: SELECT * FROM mydata*.dat
139
140
141
142       Files with .gz extension are considered to be gzipped and  decompressed
143       on the fly.
144
145   Parsing Modes
146       q supports multiple parsing modes:
147
148       ·   relaxed  -  This is the default mode. It tries to lean towards sim‐
149           plicity of use. When a row doesn´t contains enough columns, they´ll
150           be filled with nulls, and when there are too many, the extra values
151           will be merged to the last column. Defining the number of  expected
152           columns  in  this mode is done using the -c parameter. If it is not
153           provided, then the number of columns is detected automatically  (In
154           most use cases, there is no need to specify -c)
155
156       ·   strict  -  Strict  mode is for hardcore csv/tsv parsing. Whenever a
157           row doesn´t contain the proper number of columns,  processing  will
158           stop. -c must be provided when using this mode
159
160       ·   fluffy  -  This  mode  should not be used, and is just some kind of
161           "backward compatible" parsing mode which was used by q  previously.
162           It´s left as a separate parsing mode on purpose, in order to accom‐
163           modate existing users. If you are such a user, please  open  a  bug
164           for  your  use case, and I´ll see how I can incorporate it into the
165           other modes. It is reasonable to say that this mode will be removed
166           in the future.
167
168
169
170   Output formatting option
171       The  format  of F is as a list of X=f separated by commas, where X is a
172       column number and f is a python format:
173
174       ·   X - column number - This is the  SELECTed  column  (or  expression)
175           number,  not  the  one from the original table. E.g, 1 is the first
176           SELECTed column, 3 is the third SELECTed column.
177
178       ·   f     -     A     python     formatting      string      -      See
179           http://docs.python.org/release/2.4.4/lib/typesseq-strings.html  for
180           details if needed. ** Example: -f 3=%-10s,5=%4.3f,1=%x
181
182
183

EXAMPLES

185       Example 1: ls -ltrd * | q "select c1,count(1) from - group by c1"
186
187
188
189           This example would print a count of each unique permission string in the current folder.
190
191
192
193       Example 2: seq 1 1000 | q "select avg(c1),sum(c1) from -"
194
195
196
197           This example would provide the average and the sum of the numbers in the range 1 to 1000
198
199
200
201       Example 3: sudo find /tmp -ls | q "select c5,c6,sum(c7)/1024.0/1024  as
202       total from - group by c5,c6 order by total desc"
203
204
205
206           This example will output the total size in MB per user+group in the /tmp subtree
207
208
209
210       Example  4:  ps -ef | q -H "select UID,count(*) cnt from - group by UID
211       order by cnt desc limit 3"
212
213
214
215           This example will show process counts per UID, calculated from ps data. Note that the column names provided by ps are being used as column name in the query (The -H flag activates that option)
216
217
218

AUTHOR

220       Harel Ben-Attia (harelba@gmail.com)
221
222       @harelba https://twitter.com/harelba on Twitter
223
224       Any feedback/suggestions/complaints regarding this tool would  be  much
225       appreciated. Contributions are most welcome as well, of course.
226

COPYRIGHT

228       Copyright (C) 2012--2014 Harel Ben Attia
229
230       This program is free software; you can redistribute it and/or modify it
231       under the terms of the GNU General Public License as published  by  the
232       Free  Software  Foundation;  either  version 3, or (at your option) any
233       later version.
234
235       This program is distributed in the hope  that  it  will  be  useful,but
236       WITHOUT  ANY  WARRANTY;  without  even  the  implied  warranty  of MER‐
237       CHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the  GNU  General
238       Public License for more details. You should have received a copy of the
239       GNU General Public License along with this program; if  not,  write  to
240       the  Free  Software Foundation, Inc., 51 Franklin Street - Fifth Floor,
241       Boston, MA 02110-1301, USA
242
243
244
245                                  March 2018                               Q()