HTMLMIN(1) htmlmin HTMLMIN(1)

htmlmin - htmlmin Documentation

An HTML Minifier with Seatbelts

For single invocations, there is the htmlmin.minify function. It takes
the input HTML as a string for its first argument and returns the
minified HTML. It accepts several options that let you tune how much
minification is done; the defaults are the safest available options:

>>> import htmlmin
>>> input_html = '''
<body style="background-color: tomato;">
<h1> htmlmin rocks</h1>
<pre>
and rolls
</pre>
</body>'''
>>> htmlmin.minify(input_html)
u' <body style="background-color: tomato;"> <h1> htmlmin rocks</h1> <pre>\n and rolls\n </pre> </body>'
>>> print(htmlmin.minify(input_html))
<body style="background-color: tomato;"> <h1> htmlmin rocks</h1> <pre>
and rolls
</pre> </body>

If there is a chunk of HTML which you do not want minified, put a pre
attribute on an HTML tag that wraps it. htmlmin will leave the contents
of the tag alone and will remove the pre attribute before it is output:

>>> import htmlmin
>>> input_html = '''<span> minified </span><span pre> not minified </span>'''
>>> htmlmin.minify(input_html)
u'<span> minified </span><span> not minified </span>'

Attributes will be condensed to their smallest possible representation
by default. You can prefix an individual attribute with pre- to leave
it unchanged:

>>> import htmlmin
>>> input_html = '''<input value="&lt;minified&gt;" /><input pre-value="&lt;not minified&gt;" />'''
>>> htmlmin.minify(input_html)
u'<input value="<minified>"><input value=&lt;not minified&gt;>'

The minify function works well for one-off minifications. However, if
you are going to minify several pieces of HTML, the Minifier class is
provided. It works similarly, but allows for persistence of options
between invocations and recycles the internal data structures used for
minification.
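
As a rough illustration of that reuse, the sketch below minifies several documents with one configured instance. The helper name minify_all and the sample pages are made up for this example; Minifier, its remove_comments option, and its minify method are the ones documented on this page. htmlmin is imported inside main() only so that the helper can be tried without the package installed.

```python
def minify_all(minifier, documents):
    """Minify each document with one shared, pre-configured minifier.

    `minifier` is expected to expose minify(html) -> str, as
    htmlmin.Minifier does. (minify_all is a hypothetical helper.)
    """
    return [minifier.minify(document) for document in documents]


def main():
    import htmlmin  # third-party package; needs `pip install htmlmin`

    # Options are set once and persist across every minify() call.
    minifier = htmlmin.Minifier(remove_comments=True)
    pages = ["<p> first <!-- note --> page </p>", "<p> second page </p>"]
    for result in minify_all(minifier, pages):
        print(result)
```

Calling main() prints one minified string per page; the exact output depends on the installed htmlmin version.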

Command Line
htmlmin is invoked by running:

htmlmin input.html output.html

If no output file is specified, it will print to stdout. If no input
file is specified, it reads from stdin. Help with options can be
retrieved at any time by running htmlmin -h:

htmlmin -h
usage: htmlmin [-h] [-c] [-s] [--remove-all-empty-space]
[--keep-optional-attribute-quotes] [-H] [-k] [-a PRE_ATTR]
[-p [TAG [TAG ...]]] [-e ENCODING]
[INPUT] [OUTPUT]

Minify HTML

positional arguments:
INPUT File path to html file to minify. Defaults to stdin.
OUTPUT File path to output to. Defaults to stdout.

optional arguments:
-h, --help show this help message and exit
-c, --remove-comments
When set, comments will be removed. They can be kept on an individual basis
by starting them with a '!': <!--! comment -->. The '!' will be removed from
the final output. If you want a '!' as the leading character of your comment,
put two of them: <!--!! comment -->.

-s, --remove-empty-space
When set, this removes empty space between tags in certain cases.
Specifically, it will remove empty space if and only if a newline
character occurs within the space. Thus, code like
'<span>x</span> <span>y</span>' will be left alone, but code such as
' ...
</head>
<body>
...'
will become '...</head><body>...'. Note that this CAN break your
html if you spread two inline tags over two lines. Use with caution.

--remove-all-empty-space
When set, this removes ALL empty space between tags. WARNING: this can and
likely will cause unintended consequences. For instance, '<i>X</i> <i>Y</i>'
will become '<i>X</i><i>Y</i>'. Putting whitespace along with other text will
avoid this problem. Only use if you are confident in the result. Whitespace is
not removed from inside of tags, thus '<span> </span>' will be left alone.

--keep-optional-attribute-quotes
When set, this keeps all attribute quotes, even if they are optional.

-H, --in-head If you are parsing only a fragment of HTML, and the fragment occurs in the
head of the document, setting this will remove some extra whitespace.

-k, --keep-pre-attr HTMLMin supports the proprietary attribute 'pre' that can be added to elements
to prevent minification. This attribute is removed by default. Set this flag to
keep the 'pre' attributes in place.

-a PRE_ATTR, --pre-attr PRE_ATTR
The attribute htmlmin looks for to find blocks of HTML that it should not
minify. This attribute will be removed from the HTML unless '-k' is
specified. Defaults to 'pre'. You can also prefix individual tag attributes
with '{pre_attr}-' to prevent the contents of the individual attribute from
being changed.

-p [TAG [TAG ...]], --pre-tags [TAG [TAG ...]]
By default, the contents of 'pre', and 'textarea' tags are left unminified.
You can specify different tags using the --pre-tags option. 'script' and 'style'
tags are always left unminified.

-e ENCODING, --encoding ENCODING
Encoding to read and write with. Default 'utf-8'.
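
The newline rule described for -s can be approximated with a regular expression. The sketch below is only an illustration of that rule, not htmlmin's implementation: collapse_between_tags is a made-up name, and the function knows nothing about pre, textarea, script, or style content. Collapsing space-and-tab-only runs to a single space follows the remove_empty_space description under Main Functions.

```python
import re

# A run of whitespace sitting directly between one tag's '>' and the
# next tag's '<'. Text such as '> hi <' does not match.
_BETWEEN_TAGS = re.compile(r">(\s+)<")


def collapse_between_tags(html):
    """Approximate the -s rule on whitespace-only gaps between tags."""

    def replace(match):
        gap = match.group(1)
        if "\n" in gap or "\r" in gap:
            return "><"   # the gap spans a line break: drop it entirely
        return "> <"      # spaces/tabs only: keep exactly one space

    return _BETWEEN_TAGS.sub(replace, html)
```

With this sketch, '<span>x</span> <span>y</span>' comes back unchanged, while a '</head>' and '<body>' separated by a line break are joined, which is also why two inline tags split across lines lose the space between them.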

Coming soon…

Main Functions
htmlmin.minify(input, remove_comments=False, remove_empty_space=False,
remove_all_empty_space=False, reduce_empty_attributes=True,
reduce_boolean_attributes=False, remove_optional_attribute_quotes=True,
convert_charrefs=True, keep_pre=False, pre_tags=(u'pre', u'textarea'),
pre_attr='pre', cls=<class htmlmin.parser.HTMLMinParser>)
Minifies HTML in one shot.

Parameters

· input – A string containing the HTML to be minified.

· remove_comments – Remove comments found in HTML. Individual
comments can be kept by putting a ! as the first character
inside the comment. Thus:

<!-- FOO --> <!--! BAR -->

will become simply:

<!-- BAR -->

The added exclamation mark is removed.

· remove_empty_space – Remove empty space found in HTML
between an opening and a closing tag when it contains a
newline or carriage return. If the whitespace consists only
of spaces and/or tabs, it is collapsed to a single space.
Be careful: this can have unintended consequences.

· remove_all_empty_space – A more extreme version of
remove_empty_space, this removes all empty whitespace
found between tags. This is almost guaranteed to break
your HTML unless you are very careful.

· reduce_boolean_attributes – Where allowed by the HTML5
specification, attributes such as 'disabled' and 'readonly'
will have their value removed, so 'disabled="true"' will
simply become 'disabled'. This is generally a good option
to turn on except when JavaScript relies on the values.

· remove_optional_attribute_quotes – When True, optional
quotes around attributes are removed. When False, all
attribute quotes are left intact. Defaults to True.

· convert_charrefs – Decode character references such as
&amp; and &#46; to their single character values where
safe. This currently only applies to attributes. Data
content between tags will be left encoded.

· keep_pre – By default, htmlmin uses the special
attribute pre to allow you to demarcate areas of HTML
that should not be minified. It removes this attribute
as it finds it. Setting this value to True tells
htmlmin to leave the attribute in the output.

· pre_tags – A list of tag names that should never be
minified. You are free to change this list as you see
fit, but you will probably want to include pre and
textarea if you make any changes to the list. Note that
<script> and <style> tags are never minified.

· pre_attr – Specifies the attribute that, when found in
an HTML tag, indicates that the content of the tag
should not be minified. Defaults to pre. You can also
prefix individual tag attributes with {pre_attr}- to
prevent the contents of the individual attribute from
being changed.

Returns
A string containing the minified HTML.

If you are going to be minifying multiple HTML documents, each
with the same settings, consider using Minifier.
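
The keep-marker convention described for remove_comments can be sketched in a few lines. This is a simplified stand-in written for illustration, not the code htmlmin runs: strip_comments is a hypothetical name, and the regular expression does not special-case conditional comments or comments inside script, style, or pre content.

```python
import re

# Any HTML comment; DOTALL lets a comment span several lines.
_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def strip_comments(html):
    """Drop comments, except those whose body starts with '!'."""

    def replace(match):
        body = match.group(1)
        if body.startswith("!"):
            # Kept comment: emit it without the leading marker.
            return "<!--" + body[1:] + "-->"
        return ""

    return _COMMENT.sub(replace, html)
```

Because only one leading '!' is consumed, a comment written as <!--!! note --> survives as <!--! note -->, matching the doubled-marker rule in the command-line help.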

class htmlmin.Minifier(remove_comments=False, remove_empty_space=False,
remove_all_empty_space=False, reduce_empty_attributes=True,
reduce_boolean_attributes=False, remove_optional_attribute_quotes=True,
convert_charrefs=True, keep_pre=False, pre_tags=(u'pre', u'textarea'),
pre_attr='pre', cls=<class htmlmin.parser.HTMLMinParser>)
An object that supports HTML Minification.

Options are passed into this class at initialization time and
are then persisted across each use of the instance. If you are
going to be minifying multiple pieces of HTML, this will be more
efficient than using htmlmin.minify.

See htmlmin.minify for an explanation of options.

minify(*input)
Runs HTML through the minifier in one pass.

Parameters
input – HTML to be fed into the minifier. Multiple
chunks of HTML can be provided, and they are
fed in sequentially as if they were concatenated.

Returns
A string containing the minified HTML.

This is the simplest way to use an existing Minifier
instance. This method takes in HTML and minifies it,
returning the result. Note that this method resets the
internal state of the parser before it does any work. If
there is pending HTML in the buffers, it will be lost.

input(*input)
Feed more HTML into the input stream.

Parameters
input – HTML to be fed into the minifier. Multiple
chunks of HTML can be provided, and they are
fed in sequentially as if they were concatenated.
You can also call this method multiple times to
achieve the same effect.

output Retrieve the minified output generated thus far.

finalize()
Finishes current input HTML and returns minified
result.

This method flushes any remaining input HTML and returns
the minified result. It resets the state of the internal
parser in the process so that new HTML can be minified.
Be sure to call this method before you reuse the Minifier
instance on a new HTML document.
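
A minimal sketch of the incremental interface, assuming the input() and finalize() behaviour described above. The helper name minify_chunks and the sample chunks are hypothetical; htmlmin is imported inside main() so the helper can be tried with any object that offers the same two methods.

```python
def minify_chunks(minifier, chunks):
    """Feed HTML piece by piece, then flush and return the result.

    `minifier` needs input(chunk) and finalize() -> str, as documented
    for htmlmin.Minifier. (minify_chunks is a hypothetical helper.)
    """
    for chunk in chunks:
        minifier.input(chunk)
    # finalize() also resets the parser, so the instance can be reused.
    return minifier.finalize()


def main():
    import htmlmin  # third-party package; needs `pip install htmlmin`

    minifier = htmlmin.Minifier()
    # The chunks are treated as if they had been concatenated.
    print(minify_chunks(minifier, ["<p> streamed ", "in two parts </p>"]))
```

This shape suits input that arrives in pieces, such as a file read in blocks; for HTML that is already in memory, minify() does the same work in one call.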

WSGI Middleware
class htmlmin.middleware.HTMLMinMiddleware(app, by_default=True,
keep_header=False, debug=False, **kwargs)
WSGI Middleware that minifies html on the way out.

Parameters

· by_default – Specifies if minification should be turned
on or off by default. Defaults to True.

· keep_header – The middleware recognizes one custom HTTP
header that can be used to turn minification on or off
on a per-request basis: X-HTML-Min-Enable. Setting the
header to true will turn minification on; anything else
will turn minification off. If by_default is set to
False, this header is how you would turn minification
back on. The middleware, by default, removes the header
from the output. Setting this to True leaves the header
intact.

· debug – A quick setting to turn all minification off.
The middleware is effectively bypassed.

This simple middleware minifies any HTML content that passes
through it. Any additional keyword arguments beyond the three
settings the middleware has are passed on to the internal
minifier. The documentation for the options can be found under
htmlmin.minify.
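
The following sketch shows one way the middleware might be wired around a small WSGI application. It assumes that X-HTML-Min-Enable is read from the headers the wrapped application sends, which the text above implies but does not state outright. hello_app, wrap_with_minifier, and the port number are invented for this example; the constructor arguments are the ones listed above.

```python
def hello_app(environ, start_response):
    """A tiny WSGI app that opts in to minification for this response."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Assumption: the middleware inspects this response header.
        ("X-HTML-Min-Enable", "true"),
    ]
    start_response("200 OK", headers)
    return [b"<html>\n  <body>\n    <p> hello </p>\n  </body>\n</html>"]


def wrap_with_minifier(app, middleware_cls):
    """Wrap `app`; extra keyword arguments go to the internal minifier."""
    return middleware_cls(
        app,
        by_default=False,      # minify only when the header asks for it
        keep_header=False,     # strip X-HTML-Min-Enable from the output
        remove_comments=True,  # passed through to the minifier
    )


def main():
    from wsgiref.simple_server import make_server
    from htmlmin.middleware import HTMLMinMiddleware  # third-party

    application = wrap_with_minifier(hello_app, HTMLMinMiddleware)
    make_server("localhost", 8000, application).serve_forever()
```

Setting by_default=False makes minification opt-in per response, which can be a cautious starting point when only some pages have been checked against the more aggressive options.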

Decorator
htmlmin.decorator.htmlmin(*args, **kwargs)
Minifies HTML that is returned by a function.

A simple decorator that minifies the HTML output of any function
that it decorates. It supports all the same options that
htmlmin.minify has. With no options, it uses minify's default
settings:

from htmlmin.decorator import htmlmin

@htmlmin
def foobar():
    return ' minify me! '

or:

@htmlmin(remove_comments=True)
def foobar():
    return ' minify me! <!-- and remove me! -->'
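
For readers curious how a single decorator can accept both the bare form and the form with options, here is a generic sketch of the pattern. It is not taken from htmlmin's source: make_minify_decorator is a hypothetical factory, and the minify callable is passed in so the pattern can be tried without htmlmin installed.

```python
import functools


def make_minify_decorator(minify):
    """Build a decorator usable as @deco or as @deco(option=value)."""

    def decorator(*args, **options):
        def wrap(func):
            @functools.wraps(func)
            def wrapper(*call_args, **call_kwargs):
                html = func(*call_args, **call_kwargs)
                return minify(html, **options)
            return wrapper

        # Bare use: the only positional argument is the function itself.
        if len(args) == 1 and callable(args[0]) and not options:
            return wrap(args[0])
        # Use with options: return the real decorator.
        return wrap

    return decorator
```

Passing htmlmin.minify as the minify argument would give a decorator that behaves along the lines described above, with the keyword options forwarded on every call.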

htmlmin is an HTML minifier that just works. It comes with safe
defaults and an easily configurable set of options. It can turn this:

<html>
<head>
<title> Hello, World! </title>
</head>
<body>
<p> How are <em>you</em> doing? </p>
</body>
</html>

Into this:

<html><head><title>Hello, World!</title><body><p> How are <em>you</em> doing? </p></body></html>

When we say that htmlmin has ‘seatbelts’, what we mean is that it comes
with features that you can use to safely minify beyond the defaults,
but you have to put them in yourself. For instance, by default, htmlmin
will never minify the content between <pre>, <textarea>, <script>,
and <style> tags. You can also explicitly tell it not to minify
additional tags, either globally by name or by adding the custom pre
attribute to a tag in your HTML. htmlmin will automatically remove the
pre attributes as it parses your HTML.

It also includes a command-line tool for easy invocation and
integration with existing workflows.

To install via pip:

pip install htmlmin

Source code is available on GitHub at https://github.com/mankyd/htmlmin:

git clone https://github.com/mankyd/htmlmin.git

· Safely minify HTML with either a function call or from the command
line.

· Extend what elements can and cannot be minified.

· Intelligently remove whitespace completely or reduce to single
spaces.

· Properly handles unclosed HTML5 tags.

· Optionally remove comments while marking some comments to keep.

· Simple function decorator to minify all function output.

· Simple WSGI middleware to minify web app output.

· Tested in both Python 2.7 and 3.2.

Dave Mankoff

2013, Dave Mankoff

0.1 Feb 02, 2019 HTMLMIN(1)