1ANALYZE(7) PostgreSQL 14.3 Documentation ANALYZE(7)
2
3
4
6 ANALYZE - collect statistics about a database
7
9 ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
10 ANALYZE [ VERBOSE ] [ table_and_columns [, ...] ]
11
12 where option can be one of:
13
14 VERBOSE [ boolean ]
15 SKIP_LOCKED [ boolean ]
16
17 and table_and_columns is:
18
19 table_name [ ( column_name [, ...] ) ]
20
22 ANALYZE collects statistics about the contents of tables in the
23 database, and stores the results in the pg_statistic system catalog.
24 Subsequently, the query planner uses these statistics to help determine
25 the most efficient execution plans for queries.
26
27 Without a table_and_columns list, ANALYZE processes every table and
28 materialized view in the current database that the current user has
29 permission to analyze. With a list, ANALYZE processes only those
30 table(s). It is further possible to give a list of column names for a
31 table, in which case only the statistics for those columns are
32 collected.
33
34 When the option list is surrounded by parentheses, the options can be
35 written in any order. The parenthesized syntax was added in PostgreSQL
36 11; the unparenthesized syntax is deprecated.
37
39 VERBOSE
40 Enables display of progress messages.
41
42 SKIP_LOCKED
43 Specifies that ANALYZE should not wait for any conflicting locks to
44 be released when beginning work on a relation: if a relation cannot
45 be locked immediately without waiting, the relation is skipped.
46 Note that even with this option, ANALYZE may still block when
47 opening the relation's indexes or when acquiring sample rows from
48 partitions, table inheritance children, and some types of foreign
49 tables. Also, while ANALYZE ordinarily processes all partitions of
50 specified partitioned tables, this option will cause ANALYZE to
51 skip all partitions if there is a conflicting lock on the
52 partitioned table.
53
54 boolean
55 Specifies whether the selected option should be turned on or off.
56 You can write TRUE, ON, or 1 to enable the option, and FALSE, OFF,
57 or 0 to disable it. The boolean value can also be omitted, in which
58 case TRUE is assumed.
59
60 table_name
61 The name (possibly schema-qualified) of a specific table to
62 analyze. If omitted, all regular tables, partitioned tables, and
63 materialized views in the current database are analyzed (but not
64 foreign tables). If the specified table is a partitioned table,
65 both the inheritance statistics of the partitioned table as a whole
66 and statistics of the individual partitions are updated.
67
68 column_name
69 The name of a specific column to analyze. Defaults to all columns.
70
72 When VERBOSE is specified, ANALYZE emits progress messages to indicate
73 which table is currently being processed. Various statistics about the
74 tables are printed as well.
75
77 To analyze a table, one must ordinarily be the table's owner or a
78 superuser. However, database owners are allowed to analyze all tables
79 in their databases, except shared catalogs. (The restriction for shared
80 catalogs means that a true database-wide ANALYZE can only be performed
81 by a superuser.) ANALYZE will skip over any tables that the calling
82 user does not have permission to analyze.
83
84 Foreign tables are analyzed only when explicitly selected. Not all
85 foreign data wrappers support ANALYZE. If the table's wrapper does not
86 support ANALYZE, the command prints a warning and does nothing.
87
88 In the default PostgreSQL configuration, the autovacuum daemon (see
89 Section 25.1.6) takes care of automatic analyzing of tables when they
90 are first loaded with data, and as they change throughout regular
91 operation. When autovacuum is disabled, it is a good idea to run
92 ANALYZE periodically, or just after making major changes in the
93 contents of a table. Accurate statistics will help the planner to
94 choose the most appropriate query plan, and thereby improve the speed
95 of query processing. A common strategy for read-mostly databases is to
96 run VACUUM and ANALYZE once a day during a low-usage time of day. (This
97 will not be sufficient if there is heavy update activity.)
98
99 ANALYZE requires only a read lock on the target table, so it can run in
100 parallel with other activity on the table.
101
102 The statistics collected by ANALYZE usually include a list of some of
103 the most common values in each column and a histogram showing the
104 approximate data distribution in each column. One or both of these can
105 be omitted if ANALYZE deems them uninteresting (for example, in a
106 unique-key column, there are no common values) or if the column data
107 type does not support the appropriate operators. There is more
108 information about the statistics in Chapter 25.
109
110 For large tables, ANALYZE takes a random sample of the table contents,
111 rather than examining every row. This allows even very large tables to
112 be analyzed in a small amount of time. Note, however, that the
113 statistics are only approximate, and will change slightly each time
114 ANALYZE is run, even if the actual table contents did not change. This
115 might result in small changes in the planner's estimated costs shown by
116 EXPLAIN. In rare situations, this non-determinism will cause the
117 planner's choices of query plans to change after ANALYZE is run. To
118 avoid this, raise the amount of statistics collected by ANALYZE, as
119 described below.
120
121 The extent of analysis can be controlled by adjusting the
122 default_statistics_target configuration variable, or on a
123 column-by-column basis by setting the per-column statistics target with
124 ALTER TABLE ... ALTER COLUMN ... SET STATISTICS. The target value sets
125 the maximum number of entries in the most-common-value list and the
126 maximum number of bins in the histogram. The default target value is
127 100, but this can be adjusted up or down to trade off accuracy of
128 planner estimates against the time taken for ANALYZE and the amount of
129 space occupied in pg_statistic. In particular, setting the statistics
130 target to zero disables collection of statistics for that column. It
131 might be useful to do that for columns that are never used as part of
132 the WHERE, GROUP BY, or ORDER BY clauses of queries, since the planner
133 will have no use for statistics on such columns.
134
135 The largest statistics target among the columns being analyzed
136 determines the number of table rows sampled to prepare the statistics.
137 Increasing the target causes a proportional increase in the time and
138 space needed to do ANALYZE.
139
140 One of the values estimated by ANALYZE is the number of distinct values
141 that appear in each column. Because only a subset of the rows are
142 examined, this estimate can sometimes be quite inaccurate, even with
143 the largest possible statistics target. If this inaccuracy leads to bad
144 query plans, a more accurate value can be determined manually and then
145 installed with ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...).
146
147 If the table being analyzed has one or more children, ANALYZE will
148 gather statistics twice: once on the rows of the parent table only, and
149 a second time on the rows of the parent table with all of its children.
150 This second set of statistics is needed when planning queries that
151 traverse the entire inheritance tree. The autovacuum daemon, however,
152 will only consider inserts or updates on the parent table itself when
153 deciding whether to trigger an automatic analyze for that table. If
154 that table is rarely inserted into or updated, the inheritance
155 statistics will not be up to date unless you run ANALYZE manually.
156
157 For partitioned tables, ANALYZE gathers statistics by sampling rows
158 from all partitions; in addition, it will recurse into each partition
159 and update its statistics. Each leaf partition is analyzed only once,
160 even with multi-level partitioning. No statistics are collected for
161 only the parent table (without data from its partitions), because with
162 partitioning it's guaranteed to be empty.
163
164 By contrast, if the table being analyzed has inheritance children,
165 ANALYZE gathers two sets of statistics: one on the rows of the parent
166 table only, and a second including rows of both the parent table and
167 all of its children. This second set of statistics is needed when
168 planning queries that process the inheritance tree as a whole. The
169 child tables themselves are not individually analyzed in this case.
170
171 The autovacuum daemon does not process partitioned tables, nor does it
172 process inheritance parents if only the children are ever modified. It
173 is usually necessary to periodically run a manual ANALYZE to keep the
174 statistics of the table hierarchy up to date.
175
176 If any child tables or partitions are foreign tables whose foreign data
177 wrappers do not support ANALYZE, those tables are ignored while
178 gathering inheritance statistics.
179
180 If the table being analyzed is completely empty, ANALYZE will not
181 record new statistics for that table. Any existing statistics will be
182 retained.
183
184 Each backend running ANALYZE will report its progress in the
185 pg_stat_progress_analyze view. See Section 28.4.1 for details.
186
188 There is no ANALYZE statement in the SQL standard.
189
191 VACUUM(7), vacuumdb(1), Section 20.4.4, Section 25.1.6, Section 28.4.1
192
193
194
195PostgreSQL 14.3 2022 ANALYZE(7)