HTMLMIN(1)                          htmlmin                         HTMLMIN(1)


NAME

       htmlmin - htmlmin Documentation

       An HTML Minifier with Seatbelts

QUICKSTART

       For single invocations, there is the htmlmin.minify function. It takes
       input HTML as a string for its first argument and returns minified
       HTML. It accepts several options that let you tune how much
       minification is done, with the defaults being the safest available
       options:

          >>> import htmlmin
          >>> input_html = '''
          ...     <body   style="background-color: tomato;">
          ...       <h1>  htmlmin   rocks</h1>
          ...       <pre>
          ...         and rolls
          ...       </pre>
          ...     </body>'''
          >>> htmlmin.minify(input_html)
          u' <body style="background-color: tomato;"> <h1> htmlmin rocks</h1> <pre>\n        and rolls\n      </pre> </body>'
          >>> print(htmlmin.minify(input_html))
           <body style="background-color: tomato;"> <h1> htmlmin rocks</h1> <pre>
                  and rolls
                </pre> </body>

       If there is a chunk of HTML that you do not want minified, put a pre
       attribute on an HTML tag that wraps it. htmlmin will leave the contents
       of the tag alone and will remove the pre attribute before it is output:

          >>> import htmlmin
          >>> input_html = '''<span>   minified   </span><span pre>   not minified   </span>'''
          >>> htmlmin.minify(input_html)
          u'<span> minified </span><span>   not minified   </span>'

       Attributes will be condensed to their smallest possible representation
       by default. You can prefix an individual attribute with pre- to leave
       it unchanged:

          >>> import htmlmin
          >>> input_html = '''<input value="&lt;minified&gt;" /><input pre-value="&lt;not minified&gt;" />'''
          >>> htmlmin.minify(input_html)
          u'<input value="<minified>"><input value=&lt;not minified&gt;>'

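       The first attribute above keeps its quotes because < and > may not
       appear in an unquoted HTML5 attribute value. As a rough sketch of that
       kind of decision, assuming only the HTML5 rule for unquoted values (not
       empty, and no whitespace, quotes, =, <, > or backtick), the
       hypothetical helper format_attr below picks the lightest quoting a
       value allows. It is an illustration, not htmlmin's own code.

```python
import re

# HTML5 forbids these in an unquoted attribute value: whitespace,
# double and single quotes, '=', '<', '>' and the backtick.
_NEEDS_QUOTES = re.compile(r"[\s\"'=<>`]")


def format_attr(name, value):
    """Render name=value using the lightest quoting HTML5 allows (sketch)."""
    if value is None:
        # Valueless attribute, e.g. <input disabled>.
        return name
    if value and not _NEEDS_QUOTES.search(value):
        # Nothing forbidden: the quotes are optional, so drop them.
        return "{}={}".format(name, value)
    if '"' in value and "'" not in value:
        # Single quotes avoid escaping the embedded double quote.
        return "{}='{}'".format(name, value)
    # Otherwise use double quotes and escape any embedded double quotes.
    return '{}="{}"'.format(name, value.replace('"', "&quot;"))


print(format_attr("class", "note"))        # class=note
print(format_attr("value", "<minified>"))  # value="<minified>"
print(format_attr("title", 'say "hi"'))    # title='say "hi"'
```

       Passing --keep-optional-attribute-quotes (or
       remove_optional_attribute_quotes=False) corresponds to skipping the
       unquoted branch entirely.
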
       The minify function works well for one-off minifications. However, if
       you are going to minify several pieces of HTML, the Minifier class is
       provided. It works similarly, but persists options between
       invocations and recycles the internal data structures used for
       minification.

   Command Line
       htmlmin is invoked by running:

          htmlmin input.html output.html

       If no output file is specified, it prints to stdout. If no input file
       is specified, it reads from stdin. Help with options can be retrieved
       at any time by running htmlmin -h:

          htmlmin -h
          usage: htmlmin [-h] [-c] [-s] [--remove-all-empty-space]
                         [--keep-optional-attribute-quotes] [-H] [-k] [-a PRE_ATTR]
                         [-p [TAG [TAG ...]]] [-e ENCODING]
                         [INPUT] [OUTPUT]

          Minify HTML

          positional arguments:
            INPUT                 File path to html file to minify. Defaults to stdin.
            OUTPUT                File path to output to. Defaults to stdout.

          optional arguments:
            -h, --help            show this help message and exit
            -c, --remove-comments
                                  When set, comments will be removed. They can be kept on an individual basis
                                  by starting them with a '!': <!--! comment -->. The '!' will be removed from
                                  the final output. If you want a '!' as the leading character of your comment,
                                  put two of them: <!--!! comment -->.

            -s, --remove-empty-space
                                  When set, this removes empty space between tags in certain cases.
                                  Specifically, it will remove empty space if and only if a newline
                                  character occurs within the space. Thus, code like
                                  '<span>x</span> <span>y</span>' will be left alone, but code such as
                                  '   ...
                                    </head>
                                    <body>
                                      ...'
                                  will become '...</head><body>...'. Note that this CAN break your
                                  html if you spread two inline tags over two lines. Use with caution.

            --remove-all-empty-space
                                  When set, this removes ALL empty space between tags. WARNING: this can and
                                  likely will cause unintended consequences. For instance, '<i>X</i> <i>Y</i>'
                                  will become '<i>X</i><i>Y</i>'. Putting whitespace along with other text will
                                  avoid this problem. Only use if you are confident in the result. Whitespace is
                                  not removed from inside of tags, thus '<span> </span>' will be left alone.

            --keep-optional-attribute-quotes
                                  When set, this keeps all attribute quotes, even if they are optional.

            -H, --in-head         If you are parsing only a fragment of HTML, and the fragment occurs in the
                                  head of the document, setting this will remove some extra whitespace.

            -k, --keep-pre-attr   HTMLMin supports the proprietary attribute 'pre' that can be added to elements
                                  to prevent minification. This attribute is removed by default. Set this flag to
                                  keep the 'pre' attributes in place.

            -a PRE_ATTR, --pre-attr PRE_ATTR
                                  The attribute htmlmin looks for to find blocks of HTML that it should not
                                  minify. This attribute will be removed from the HTML unless '-k' is
                                  specified. Defaults to 'pre'. You can also prefix individual tag attributes
                                  with '{pre_attr}-' to prevent the contents of the individual attribute from
                                  being changed.

            -p [TAG [TAG ...]], --pre-tags [TAG [TAG ...]]
                                  By default, the contents of 'pre' and 'textarea' tags are left unminified.
                                  You can specify different tags using the --pre-tags option. 'script' and 'style'
                                  tags are always left unminified.

            -e ENCODING, --encoding ENCODING
                                  Encoding to read and write with. Default 'utf-8'.

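       The difference between -s and --remove-all-empty-space comes down to
       one test on each whitespace-only gap between two tags. The regular
       expression sketch below models only that test, as the help text
       describes it; the function name squeeze_between_tags is hypothetical,
       and unlike htmlmin it ignores protected tags such as pre as well as
       the '<span> </span>' exception.

```python
import re

# A whitespace-only run that follows a '>' and is followed by a '<'.
_GAP = re.compile(r">(\s+)(?=<)")


def squeeze_between_tags(html, remove_all=False):
    """Sketch of the -s / --remove-all-empty-space decision for each gap.

    remove_all=False models -s: drop the gap only if it contains a newline
    or carriage return, otherwise collapse it to a single space.
    remove_all=True models --remove-all-empty-space: drop every gap.
    """
    def replace(match):
        gap = match.group(1)
        if remove_all or "\n" in gap or "\r" in gap:
            return ">"
        return "> "

    return _GAP.sub(replace, html)


print(squeeze_between_tags("<span>x</span> <span>y</span>"))
# <span>x</span> <span>y</span>
print(squeeze_between_tags("<title>t</title>\n  </head>\n  <body>"))
# <title>t</title></head><body>
print(squeeze_between_tags("<i>X</i> <i>Y</i>", remove_all=True))
# <i>X</i><i>Y</i>
```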

TUTORIAL & EXAMPLES

       Coming soon…

API REFERENCE

   Main Functions
       htmlmin.minify(input, remove_comments=False, remove_empty_space=False,
       remove_all_empty_space=False, reduce_empty_attributes=True,
       reduce_boolean_attributes=False, remove_optional_attribute_quotes=True,
       convert_charrefs=True, keep_pre=False, pre_tags=('pre', 'textarea'),
       pre_attr='pre', cls=<class 'htmlmin.parser.HTMLMinParser'>)
              Minifies HTML in one shot.

              Parameters

                     • input – A string containing the HTML to be minified.

                     • remove_comments – Remove comments found in HTML.
                       Individual comments can be maintained by putting a !
                       as the first character inside the comment. Thus:

                          <!-- FOO --> <!--! BAR -->

                       Will become simply:

                          <!-- BAR -->

                       The added exclamation mark is removed.

                     • remove_empty_space – Remove empty space found in HTML
                       between tags when it contains a newline or carriage
                       return. If the whitespace consists only of spaces
                       and/or tabs, it is turned into a single space. Be
                       careful: this can have unintended consequences.

                     • remove_all_empty_space – A more extreme version of
                       remove_empty_space, this removes all whitespace found
                       between tags. This is almost guaranteed to break your
                       HTML unless you are very careful.

                     • reduce_boolean_attributes – Where allowed by the HTML5
                       specification, attributes such as ‘disabled’ and
                       ‘readonly’ will have their value removed, so
                       ‘disabled="true"’ will simply become ‘disabled’. This
                       is generally a good option to turn on except when
                       JavaScript relies on the values.

                     • remove_optional_attribute_quotes – When True, optional
                       quotes around attributes are removed. When False, all
                       attribute quotes are left intact. Defaults to True.

                     • convert_charrefs – Decode character references such as
                       &amp; and &#46; to their single character values where
                       safe. This currently only applies to attributes. Data
                       content between tags will be left encoded.

                     • keep_pre – By default, htmlmin uses the special
                       attribute pre to allow you to demarcate areas of HTML
                       that should not be minified. It removes this attribute
                       as it finds it. Setting this value to True tells
                       htmlmin to leave the attribute in the output.

                     • pre_tags – A list of tag names that should never be
                       minified. You are free to change this list as you see
                       fit, but you will probably want to include pre and
                       textarea if you make any changes to the list. Note that
                       <script> and <style> tags are never minified.

                     • pre_attr – Specifies the attribute that, when found in
                       an HTML tag, indicates that the content of the tag
                       should not be minified. Defaults to pre. You can also
                       prefix individual tag attributes with {pre_attr}- to
                       prevent the contents of the individual attribute from
                       being changed.

              Returns
                     A string containing the minified HTML.

              If you are going to be minifying multiple HTML documents, each
              with the same settings, consider using Minifier.

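       The '!' keep marker described for remove_comments (and for -c on the
       command line) fits in a short regular expression pass. This is a
       simplified sketch that assumes comments never nest and ignores
       protected tags; strip_comments is a hypothetical name, not part of
       htmlmin.

```python
import re

# Non-greedy, so each <!-- ... --> is handled separately; DOTALL lets a
# comment span several lines.
_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def strip_comments(html):
    """Sketch of remove_comments=True honoring the '!' keep marker."""
    def replace(match):
        body = match.group(1)
        if body.startswith("!"):
            # Keep the comment but drop exactly one leading '!', so
            # '<!--!! x -->' comes out as '<!--! x -->'.
            return "<!--" + body[1:] + "-->"
        # Ordinary comment: remove it entirely.
        return ""

    return _COMMENT.sub(replace, html)


print(repr(strip_comments("<!-- FOO --> <!--! BAR -->")))  # ' <!-- BAR -->'
print(repr(strip_comments("a<!--!! keep -->b")))           # 'a<!--! keep -->b'
```
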
       class htmlmin.Minifier(remove_comments=False, remove_empty_space=False,
       remove_all_empty_space=False, reduce_empty_attributes=True,
       reduce_boolean_attributes=False, remove_optional_attribute_quotes=True,
       convert_charrefs=True, keep_pre=False, pre_tags=('pre', 'textarea'),
       pre_attr='pre', cls=<class 'htmlmin.parser.HTMLMinParser'>)
              An object that supports HTML minification.

              Options are passed into this class at initialization time and
              are then persisted across each use of the instance. If you are
              going to be minifying multiple pieces of HTML, this will be more
              efficient than using htmlmin.minify.

              See htmlmin.minify for an explanation of options.

              minify(*input)
                     Runs HTML through the minifier in one pass.

                     Parameters
                            input – HTML to be fed into the minifier. Multiple
                            chunks of HTML can be provided, and they are fed
                            in sequentially as if they were concatenated.

                     Returns
                            A string containing the minified HTML.

                     This is the simplest way to use an existing Minifier
                     instance. This method takes in HTML and minifies it,
                     returning the result. Note that this method resets the
                     internal state of the parser before it does any work. If
                     there is pending HTML in the buffers, it will be lost.

              input(*input)
                     Feed more HTML into the input stream.

                     Parameters
                            input – HTML to be fed into the minifier. Multiple
                            chunks of HTML can be provided, and they are fed
                            in sequentially as if they were concatenated. You
                            can also call this method multiple times to
                            achieve the same effect.

              property output
                     Retrieve the minified output generated thus far.

              finalize()
                     Finishes the current input HTML and returns the minified
                     result.

                     This method flushes any remaining input HTML and returns
                     the minified result. It resets the state of the internal
                     parser in the process so that new HTML can be minified.
                     Be sure to call this method before you reuse the Minifier
                     instance on a new HTML document.

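       Because input() and finalize() split one minification into a feed
       phase and a flush phase, a tag can be cut in half between two input()
       calls without harm. The standard library's html.parser.HTMLParser
       follows the same feed()/close() model, and htmlmin's default cls,
       htmlmin.parser.HTMLMinParser, is assumed here to build on it. The
       sketch below uses only the standard library; TagCollector is a
       hypothetical class that records what the parser sees rather than
       minifying anything.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start tags and text; does no minification itself."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


parser = TagCollector()
# The <em> tag is split across two chunks, much like two input() calls.
for chunk in ["<p>How are <e", "m>you</em> doing?</p>"]:
    parser.feed(chunk)
# close() processes anything still buffered, much like finalize().
parser.close()

print(parser.tags)           # ['p', 'em']
print("".join(parser.text))  # How are you doing?
```
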
   WSGI Middleware
       class htmlmin.middleware.HTMLMinMiddleware(app, by_default=True,
       keep_header=False, debug=False, **kwargs)
              WSGI middleware that minifies HTML on the way out.

              Parameters

                     • by_default – Specifies if minification should be turned
                       on or off by default. Defaults to True.

                     • keep_header – The middleware recognizes one custom HTTP
                       header that can be used to turn minification on or off
                       on a per-request basis: X-HTML-Min-Enable. Setting the
                       header to true will turn minification on; anything else
                       will turn minification off. If by_default is set to
                       False, this header is how you would turn minification
                       back on. By default, the middleware removes the header
                       from the output. Setting this to True leaves the header
                       intact.

                     • debug – A quick setting to turn all minification off.
                       The middleware is effectively bypassed.

              This simple middleware minifies any HTML content that passes
              through it. Any additional keyword arguments beyond the three
              settings the middleware has are passed on to the internal
              minifier. The documentation for the options can be found under
              htmlmin.minify.

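       To make the per-request switch concrete, the following self-contained
       sketch shows one way a WSGI middleware can act on an X-HTML-Min-Enable
       response header. It is not HTMLMinMiddleware: the class
       ToyMinMiddleware, the text/html Content-Type check, the UTF-8
       assumption, the case-insensitive "true" comparison and the
       whitespace-collapsing naive_minify stand-in are all illustrative
       assumptions, and exc_info handling is left out.

```python
import re


def naive_minify(html):
    """Stand-in for a real minifier: collapse each whitespace run to one space."""
    return re.sub(r"\s+", " ", html)


class ToyMinMiddleware:
    """Hypothetical WSGI middleware honoring an X-HTML-Min-Enable response header."""

    HEADER = "x-html-min-enable"

    def __init__(self, app, by_default=True, keep_header=False):
        self.app = app
        self.by_default = by_default
        self.keep_header = keep_header

    def __call__(self, environ, start_response):
        captured = {}
        chunks = []

        def capture(status, headers, exc_info=None):
            # Hold the response back until the body has been inspected.
            captured["status"] = status
            captured["headers"] = headers
            return chunks.append  # data passed to write() is collected too

        result = self.app(environ, capture)
        try:
            chunks.extend(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        enabled = self.by_default
        headers = []
        for name, value in captured["headers"]:
            if name.lower() == self.HEADER:
                # Only the value "true" (compared case-insensitively here,
                # an assumption) switches minification on.
                enabled = value.strip().lower() == "true"
                if not self.keep_header:
                    continue
            headers.append((name, value))

        body = b"".join(chunks)
        content_type = next(
            (v for n, v in headers if n.lower() == "content-type"), "")
        if enabled and content_type.startswith("text/html"):
            # Assumes a UTF-8 body; a real middleware would read the charset.
            body = naive_minify(body.decode("utf-8")).encode("utf-8")
            headers = [(n, v) for n, v in headers if n.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))

        start_response(captured["status"], headers)
        return [body]


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>\n   Hello,   world!\n</p>"]


seen = {}


def fake_start_response(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, headers


body = b"".join(ToyMinMiddleware(demo_app)({}, fake_start_response))
print(body)  # b'<p> Hello, world! </p>'
```

       Buffering the whole body before calling the real start_response is
       what lets the sketch rewrite Content-Length; the trade-off is that
       streamed responses are held in memory.
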
   Decorator
       htmlmin.decorator.htmlmin(*args, **kwargs)
              Minifies HTML that is returned by a function.

              A simple decorator that minifies the HTML output of any function
              that it decorates. It supports all the same options that
              htmlmin.minify has. With no options, it uses minify’s default
              settings:

                 from htmlmin.decorator import htmlmin

                 @htmlmin
                 def foobar():
                    return '   minify me!   '

              or:

                 @htmlmin(remove_comments=True)
                 def foobar():
                    return '   minify me!  <!-- and remove me! -->'

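       Supporting both the bare @htmlmin form and the called @htmlmin(...)
       form is commonly done by checking whether the decorator received a
       single callable. The sketch below shows that pattern with a
       hypothetical minified decorator and a whitespace-collapsing stand-in
       for htmlmin.minify; it is an assumption about the technique, not a
       copy of htmlmin.decorator.

```python
import functools
import re


def _toy_minify(html, remove_comments=False):
    # Stand-in for htmlmin.minify: optionally drop comments, then collapse
    # every whitespace run to a single space.
    if remove_comments:
        html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"\s+", " ", html)


def minified(*args, **kwargs):
    """Hypothetical decorator usable as @minified or @minified(**options)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*f_args, **f_kwargs):
            return _toy_minify(func(*f_args, **f_kwargs), **kwargs)
        return wrapper

    if len(args) == 1 and callable(args[0]) and not kwargs:
        # Bare form: @minified hands us the function itself.
        return decorate(args[0])
    # Called form: @minified(remove_comments=True) returns the decorator.
    return decorate


@minified
def plain():
    return "   minify me!   "


@minified(remove_comments=True)
def with_comment():
    return "   minify me!  <!-- and remove me! -->"


print(repr(plain()))         # ' minify me! '
print(repr(with_comment()))  # ' minify me! '
```
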
       htmlmin is an HTML minifier that just works. It comes with safe
       defaults and an easily configurable set of options. It can turn this:

          <html>
            <head>
              <title>  Hello, World!  </title>
            </head>
            <body>
              <p> How are <em>you</em> doing?  </p>
            </body>
          </html>

       Into this:

          <html><head><title>Hello, World!</title><body><p> How are <em>you</em> doing? </p></body></html>

       When we say that htmlmin has ‘seatbelts’, we mean that it comes with
       features that you can use to safely minify beyond the defaults, but
       you have to put them in yourself. For instance, by default, htmlmin
       will never minify the content between <pre>, <textarea>, <script>,
       and <style> tags. You can also explicitly tell it not to minify
       additional tags, either globally by name or by adding the custom pre
       attribute to a tag in your HTML. htmlmin automatically removes the pre
       attributes as it parses your HTML.

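       The pre/textarea/script/style seatbelt amounts to tracking, while
       parsing, whether the parser is currently inside a protected element.
       The following minimal sketch does this with the standard library's
       html.parser.HTMLParser and a depth counter. WhitespaceCollapser and
       collapse are hypothetical names, the sketch only collapses whitespace
       in text, and it is not htmlmin's implementation.

```python
import re
from html.parser import HTMLParser


class WhitespaceCollapser(HTMLParser):
    """Collapses whitespace in text, except inside protected tags (sketch)."""

    def __init__(self, protected=("pre", "textarea", "script", "style")):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.protected = set(protected)
        self.depth = 0  # > 0 while inside a protected element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.protected:
            self.depth += 1
        self.out.append(self.get_starttag_text())  # tag exactly as written

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit once, with no closing tag.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.protected and self.depth:
            self.depth -= 1
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.depth:
            data = re.sub(r"\s+", " ", data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))

    def handle_comment(self, data):
        self.out.append("<!--{}-->".format(data))

    def handle_decl(self, decl):
        self.out.append("<!{}>".format(decl))


def collapse(html):
    parser = WhitespaceCollapser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(repr(collapse("<p>  a \n  b </p><pre>  keep\n   this </pre>")))
# '<p> a b </p><pre>  keep\n   this </pre>'
```

       Using a counter rather than a boolean keeps the sketch correct when
       protected elements are nested.
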
       It also includes a command-line tool for easy invocation and
       integration with existing workflows.

       To install via pip:

          pip install htmlmin

       Source code is available on GitHub at https://github.com/mankyd/htmlmin:

          git clone https://github.com/mankyd/htmlmin.git

       • Safely minify HTML with either a function call or from the command
         line.

       • Extend what elements can and cannot be minified.

       • Intelligently remove whitespace completely or reduce it to single
         spaces.

       • Properly handles unclosed HTML5 tags.

       • Optionally remove comments while marking some comments to keep.

       • Simple function decorator to minify all function output.

       • Simple WSGI middleware to minify web app output.

       Tested in both Python 2.7 and 3.2.

AUTHOR

       Dave Mankoff

COPYRIGHT
       2023, Dave Mankoff

0.1                              Jan 20, 2023                       HTMLMIN(1)