HTMLMIN(1) htmlmin HTMLMIN(1)

htmlmin - htmlmin Documentation

An HTML Minifier with Seatbelts

For single invocations, there is the htmlmin.minify function. It takes
the input HTML as a string for its first argument and returns the
minified HTML. It accepts several options that let you tune the amount
of minification being done; the defaults are the safest available
options:

>>> import htmlmin
>>> input_html = '''
<body style="background-color: tomato;">
<h1> htmlmin rocks</h1>
<pre>
and rolls
</pre>
</body>'''
>>> htmlmin.minify(input_html)
' <body style="background-color: tomato;"> <h1> htmlmin rocks</h1> <pre>\n and rolls\n </pre> </body>'
>>> print(htmlmin.minify(input_html))
<body style="background-color: tomato;"> <h1> htmlmin rocks</h1> <pre>
and rolls
</pre> </body>

If there is a chunk of HTML that you do not want minified, put a pre
attribute on an HTML tag that wraps it. htmlmin will leave the contents
of the tag alone and will remove the pre attribute before output:

>>> import htmlmin
>>> input_html = '''<span> minified </span><span pre> not minified </span>'''
>>> htmlmin.minify(input_html)
'<span> minified </span><span> not minified </span>'

Attributes will be condensed to their smallest possible representation
by default. You can prefix an individual attribute with pre- to leave
it unchanged:

>>> import htmlmin
>>> input_html = '''<input value="&lt;minified&gt;" /><input pre-value="&lt;not minified&gt;" />'''
>>> htmlmin.minify(input_html)
'<input value="<minified>"><input value=&lt;not minified&gt;>'

The minify function works well for one-off minifications. However, if
you are going to minify several pieces of HTML, the Minifier class is
provided. It works similarly, but persists options between invocations
and recycles the internal data structures used for minification.
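The reuse pattern described above can be sketched as follows, assuming
htmlmin is installed. The helper name minify_all is hypothetical;
Minifier and its minify method are the ones documented under Main
Functions. The import is guarded only so that the snippet still loads
where the package is missing.

```python
try:
    import htmlmin  # third-party package: pip install htmlmin
except ImportError:
    htmlmin = None  # lets this sketch load without the package


def minify_all(documents, **options):
    """Minify several HTML strings with one shared Minifier (hypothetical helper)."""
    # Options are given once and persist across every call below.
    minifier = htmlmin.Minifier(**options)
    # Minifier.minify resets the parser state before it processes each document.
    return [minifier.minify(doc) for doc in documents]
```

Calling minify_all(pages, remove_comments=True) would then apply the same
settings to every string in pages.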

Command Line
htmlmin is invoked by running:

htmlmin input.html output.html

If no output file is specified, it will print to stdout. If no input
file is specified, it reads from stdin. Help with options can be
retrieved at any time by running htmlmin -h:

htmlmin -h
usage: htmlmin [-h] [-c] [-s] [--remove-all-empty-space]
               [--keep-optional-attribute-quotes] [-H] [-k] [-a PRE_ATTR]
               [-p [TAG [TAG ...]]] [-e ENCODING]
               [INPUT] [OUTPUT]

Minify HTML

positional arguments:
  INPUT                 File path to html file to minify. Defaults to stdin.
  OUTPUT                File path to output to. Defaults to stdout.

optional arguments:
  -h, --help            show this help message and exit
  -c, --remove-comments
                        When set, comments will be removed. They can be kept on an individual basis
                        by starting them with a '!': <!--! comment -->. The '!' will be removed from
                        the final output. If you want a '!' as the leading character of your comment,
                        put two of them: <!--!! comment -->.

  -s, --remove-empty-space
                        When set, this removes empty space between tags in certain cases.
                        Specifically, it will remove empty space if and only if a newline
                        character occurs within the space. Thus, code like
                        '<span>x</span> <span>y</span>' will be left alone, but code such as
                        ' ...
                        </head>
                        <body>
                        ...'
                        will become '...</head><body>...'. Note that this CAN break your
                        html if you spread two inline tags over two lines. Use with caution.

  --remove-all-empty-space
                        When set, this removes ALL empty space between tags. WARNING: this can and
                        likely will cause unintended consequences. For instance, '<i>X</i> <i>Y</i>'
                        will become '<i>X</i><i>Y</i>'. Putting whitespace along with other text will
                        avoid this problem. Only use if you are confident in the result. Whitespace is
                        not removed from inside of tags, thus '<span> </span>' will be left alone.

  --keep-optional-attribute-quotes
                        When set, this keeps all attribute quotes, even if they are optional.

  -H, --in-head         If you are parsing only a fragment of HTML, and the fragment occurs in the
                        head of the document, setting this will remove some extra whitespace.

  -k, --keep-pre-attr   HTMLMin supports the proprietary attribute 'pre' that can be added to elements
                        to prevent minification. This attribute is removed by default. Set this flag to
                        keep the 'pre' attributes in place.

  -a PRE_ATTR, --pre-attr PRE_ATTR
                        The attribute htmlmin looks for to find blocks of HTML that it should not
                        minify. This attribute will be removed from the HTML unless '-k' is
                        specified. Defaults to 'pre'. You can also prefix individual tag attributes
                        with '{pre_attr}-' to prevent the contents of the individual attribute from
                        being changed.

  -p [TAG [TAG ...]], --pre-tags [TAG [TAG ...]]
                        By default, the contents of 'pre' and 'textarea' tags are left unminified.
                        You can specify different tags using the --pre-tags option. 'script' and 'style'
                        tags are always left unminified.

  -e ENCODING, --encoding ENCODING
                        Encoding to read and write with. Default 'utf-8'.
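
The comment convention described under -c can be modelled in a few
lines. This is a simplified, regex-based sketch for illustration, not
htmlmin's parser: the function name strip_comments is hypothetical, and
unlike htmlmin it leaves the whitespace around a removed comment
untouched.

```python
import re

# Non-greedy match of one HTML comment, including multi-line ones.
_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def strip_comments(html):
    """Drop comments, keeping those whose body starts with '!' (simplified model)."""
    def replace(match):
        body = match.group(1)
        if body.startswith("!"):
            # Keep the comment but remove the single marker character,
            # so '<!--!! x -->' ends up with one leading '!'.
            return "<!--" + body[1:] + "-->"
        return ""
    return _COMMENT.sub(replace, html)
```

Only the first '!' is consumed, which is why doubling it preserves a
literal leading exclamation mark.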

Coming soon…

Main Functions
htmlmin.minify(input, remove_comments=False, remove_empty_space=False,
remove_all_empty_space=False, reduce_empty_attributes=True,
reduce_boolean_attributes=False, remove_optional_attribute_quotes=True,
convert_charrefs=True, keep_pre=False, pre_tags=('pre', 'textarea'),
pre_attr='pre', cls=<class 'htmlmin.parser.HTMLMinParser'>)
Minifies HTML in one shot.

Parameters

· input – A string containing the HTML to be minified.

· remove_comments – Remove comments found in HTML. Individual
comments can be kept by putting a ! as the first character
inside the comment. Thus:

<!-- FOO --> <!--! BAR -->

will become simply:

<!-- BAR -->

The added exclamation mark is removed.

· remove_empty_space – Remove empty space found in HTML
between an opening and a closing tag when it contains a
newline or carriage return. If the whitespace consists only
of spaces and/or tabs, it is turned into a single space. Be
careful: this can have unintended consequences.

· remove_all_empty_space – A more extreme version of
remove_empty_space, this removes all empty whitespace
found between tags. This is almost guaranteed to break
your HTML unless you are very careful.

· reduce_boolean_attributes – Where allowed by the HTML5
specification, attributes such as ‘disabled’ and ‘readonly’
will have their value removed, so ‘disabled="true"’ will
simply become ‘disabled’. This is generally a good option to
turn on except when JavaScript relies on the values.

· remove_optional_attribute_quotes – When True, optional
quotes around attributes are removed. When False, all
attribute quotes are left intact. Defaults to True.

· convert_charrefs – Decode character references such as
&amp; and &#46; to their single-character values where
safe. This currently only applies to attributes. Data
content between tags will be left encoded.

· keep_pre – By default, htmlmin uses the special
attribute pre to let you demarcate areas of HTML
that should not be minified. It removes this attribute
as it finds it. Setting this value to True tells
htmlmin to leave the attribute in the output.

· pre_tags – A list of tag names that should never be
minified. You are free to change this list as you see
fit, but you will probably want to include pre and
textarea if you make any changes to the list. Note that
<script> and <style> tags are never minified.

· pre_attr – Specifies the attribute that, when found in
an HTML tag, indicates that the content of the tag
should not be minified. Defaults to pre. You can also
prefix individual tag attributes with {pre_attr}- to
prevent the contents of the individual attribute from
being changed.

Returns
A string containing the minified HTML.

If you are going to be minifying multiple HTML documents, each
with the same settings, consider using Minifier.
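
The whitespace rule documented for remove_empty_space can be
approximated as below. This is a simplified model, not htmlmin's
implementation: it treats every tag boundary alike, ignores pre_tags
and the pre attribute, and the name squeeze_gaps is hypothetical.

```python
import re

# A run of whitespace sitting directly between two tags, e.g. ">  \n <".
_GAP = re.compile(r"(?<=>)\s+(?=<)")


def squeeze_gaps(html):
    """Model of remove_empty_space for whitespace-only gaps between tags."""
    def replace(match):
        gap = match.group(0)
        if "\n" in gap or "\r" in gap:
            return ""   # gap spans a line break: remove it entirely
        return " "      # spaces and/or tabs only: collapse to one space
    return _GAP.sub(replace, html)
```

Under this model two inline tags on one line keep a separating space,
while tags split across lines are joined, which is the case the
documentation warns can change rendering.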

class htmlmin.Minifier(remove_comments=False, remove_empty_space=False,
remove_all_empty_space=False, reduce_empty_attributes=True,
reduce_boolean_attributes=False, remove_optional_attribute_quotes=True,
convert_charrefs=True, keep_pre=False, pre_tags=('pre', 'textarea'),
pre_attr='pre', cls=<class 'htmlmin.parser.HTMLMinParser'>)
An object that supports HTML Minification.

Options are passed into this class at initialization time and
are then persisted across each use of the instance. If you are
going to be minifying multiple pieces of HTML, this will be more
efficient than using htmlmin.minify.

See htmlmin.minify for an explanation of options.

minify(*input)
Runs HTML through the minifier in one pass.

Parameters
input – HTML to be fed into the minifier. Multiple
chunks of HTML can be provided, and they are fed in
sequentially as if they were concatenated.

Returns
A string containing the minified HTML.

This is the simplest way to use an existing Minifier
instance. This method takes in HTML and minifies it,
returning the result. Note that this method resets the
internal state of the parser before it does any work. If
there is pending HTML in the buffers, it will be lost.

input(*input)
Feed more HTML into the input stream.

Parameters
input – HTML to be fed into the minifier. Multiple
chunks of HTML can be provided, and they are fed in
sequentially as if they were concatenated. You can
also call this method multiple times to achieve the
same effect.

property output
Retrieve the minified output generated thus far.

finalize()
Finishes the current input HTML and returns the minified
result.

This method flushes any remaining input HTML and returns
the minified result. It resets the state of the internal
parser in the process so that new HTML can be minified.
Be sure to call this method before you reuse the Minifier
instance on a new HTML document.
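
The incremental methods above can be combined as in the following
sketch, assuming htmlmin is installed. The helper name minify_stream is
hypothetical; input and finalize are the methods documented above. The
import is guarded only so that the snippet loads without the package.

```python
try:
    import htmlmin  # third-party package: pip install htmlmin
except ImportError:
    htmlmin = None  # lets this sketch load without the package


def minify_stream(chunks, **options):
    """Feed HTML chunks one at a time and return the minified result."""
    minifier = htmlmin.Minifier(**options)
    for chunk in chunks:
        # Chunks are processed as if they had been concatenated.
        minifier.input(chunk)
    # finalize() flushes pending input and resets the parser for reuse.
    return minifier.finalize()
```

This shape suits input that arrives in pieces, such as a file read in
blocks, where building one large string first would be wasteful.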

WSGI Middleware
class htmlmin.middleware.HTMLMinMiddleware(app, by_default=True,
keep_header=False, debug=False, **kwargs)
WSGI Middleware that minifies HTML on the way out.

Parameters

· by_default – Specifies if minification should be turned
on or off by default. Defaults to True.

· keep_header – The middleware recognizes one custom HTTP
header that can be used to turn minification on or off
on a per-request basis: X-HTML-Min-Enable. Setting the
header to true will turn minification on; anything else
will turn minification off. If by_default is set to
False, this header is how you would turn minification
back on. The middleware, by default, removes the header
from the output. Setting this to True leaves the header
intact.

· debug – A quick setting to turn all minification off.
The middleware is effectively bypassed.

This simple middleware minifies any HTML content that passes
through it. Any additional keyword arguments beyond the three
settings the middleware has are passed on to the internal
minifier. The documentation for the options can be found under
htmlmin.minify.
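
The interplay of by_default, keep_header and debug can be modelled as a
small pure function. This is an illustrative sketch of the documented
behaviour, not the middleware's code: the name plan_response is
hypothetical, comparing the header value case-insensitively is an
assumption, and any other checks the real middleware performs are not
modelled.

```python
def plan_response(headers, by_default=True, keep_header=False, debug=False):
    """Return (should_minify, headers_to_send) for a list of (name, value) pairs."""
    if debug:
        # Bypass: nothing is minified and the headers pass through untouched.
        return False, list(headers)
    should_minify = by_default
    kept = []
    for name, value in headers:
        if name.lower() == "x-html-min-enable":
            # "true" turns minification on; any other value turns it off.
            should_minify = value.strip().lower() == "true"
            if not keep_header:
                continue  # strip the control header from the output
        kept.append((name, value))
    return should_minify, kept
```

With by_default=False, an application would opt a single response in by
emitting X-HTML-Min-Enable: true.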

Decorator
htmlmin.decorator.htmlmin(*args, **kwargs)
Minifies HTML that is returned by a function.

A simple decorator that minifies the HTML output of any function
that it decorates. It supports all the same options that
htmlmin.minify has. With no options, it uses minify’s default
settings:

from htmlmin.decorator import htmlmin

@htmlmin
def foobar():
    return ' minify me! '

or:

@htmlmin(remove_comments=True)
def foobar():
    return ' minify me! <!-- and remove me! -->'
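
A decorator that works both bare and with options, as the two forms
above do, is commonly written along these lines. This is a generic
sketch of the pattern, not htmlmin's source: toy_minify is a stand-in
that only collapses whitespace and optionally drops comments, and the
names toy_minify and minified are hypothetical.

```python
import functools
import re


def toy_minify(html, remove_comments=False):
    """Stand-in minifier (NOT htmlmin): collapse whitespace, optionally drop comments."""
    if remove_comments:
        html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"\s+", " ", html)


def minified(*args, **kwargs):
    """Usable as @minified or as @minified(option=value)."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*call_args, **call_kwargs):
            return toy_minify(func(*call_args, **call_kwargs), **kwargs)
        return inner
    # Bare use: Python passes the decorated function as the only argument.
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return wrap(args[0])
    # Use with options: return the real decorator.
    return wrap


@minified
def plain():
    return "  minify   me!  "


@minified(remove_comments=True)
def without_comments():
    return " minify me! <!-- and remove me! -->"
```

The check on args is what distinguishes the two call styles; it assumes
options are always passed by keyword.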

htmlmin is an HTML minifier that just works. It comes with safe
defaults and an easily configurable set of options. It can turn this:

<html>
<head>
<title> Hello, World! </title>
</head>
<body>
<p> How are <em>you</em> doing? </p>
</body>
</html>

into this:

<html><head><title>Hello, World!</title><body><p> How are <em>you</em> doing? </p></body></html>

When we say that htmlmin has ‘seatbelts’, what we mean is that it comes
with features that you can use to safely minify beyond the defaults,
but you have to put them in yourself. For instance, by default, htmlmin
will never minify the content between <pre>, <textarea>, <script>,
and <style> tags. You can also explicitly tell it not to minify
additional tags, either globally by name or by adding the custom pre
attribute to a tag in your HTML. htmlmin will automatically remove the
pre attributes as it parses your HTML.

It also includes a command-line tool for easy invocation and
integration with existing workflows.

To install via pip:

pip install htmlmin

Source code is available on GitHub at https://github.com/mankyd/htmlmin:

git clone git://github.com/mankyd/htmlmin.git

· Safely minify HTML with either a function call or from the command
line.

· Extend what elements can and cannot be minified.

· Intelligently remove whitespace completely or reduce to single
spaces.

· Properly handles unclosed HTML5 tags.

· Optionally remove comments while marking some comments to keep.

· Simple function decorator to minify all function output.

· Simple WSGI middleware to minify web app output.

· Tested in both Python 2.7 and 3.2.

Dave Mankoff

2020, Dave Mankoff

0.1 Jul 29, 2020 HTMLMIN(1)