Are Regular VACUUM ANALYZE Still Recommended Under 9.1?


Answer :

VACUUM is only needed on updated or deleted rows in non-temporary tables. Obviously you're doing lots of INSERTs but it's not obvious from the description that you're also doing lots of UPDATEs or DELETEs.

These operations can be tracked with the pg_stat_all_tables view, specifically the n_tup_upd and n_tup_del columns. Also, even more to the point, there is a n_dead_tup column that tells, per table, how much rows need to be vacuumed. (see Monitoring statistics in the doc for functions and views related to statistics gathering).

A possible strategy in your case would be to suppress the scheduled VACUUM, keeping an eye on this view and checking on which tables the n_dead_tup is going up significantly. Then apply the aggressive VACUUM to these tables only. This will be a win if there are large tables whose rows never get deleted nor updated and the aggressive VACUUM is really necessary only on smaller tables.

But keep running the ANALYZE for the optimizer to always have fresh statistics.


I see nothing in your question that autovacuum would not take care of. It largely depends on the pattern of your writing activities. You mention 3 million new rows per week, but INSERT (or COPY) typically don't create table and index bloat. (autovacuum only has to take care of column statistics, the visibility map and some minor jobs). UPDATE and DELETE are the dominant cause of table and index bloat, especially when targeting random rows. I don't see any of that in your question.

autovacuum has come a long way and is doing a great job in Postgres 9.1 or later. I would have a look at the autovacuum settings. If vacuuming tends to interfere with your work load, also have a look at "Cost-based Vacuum Delay". Manual vacuuming should be the rare exception.

If you have lots of random UPDATEs, you may want to set the FILLFACTOR to something lower than 100, to allow HOT updates right away and reduce the need for VACUUM. More on HOT updates:

  • Redundant data in update statements

Note also, that temporary tables need manual VACUUM & ANALYZE. I quote the manual on CREATE TABLE:

The autovacuum daemon cannot access and therefore cannot vacuum or analyze temporary tables. For this reason, appropriate vacuum and analyze operations should be performed via session SQL commands. For example, if a temporary table is going to be used in complex queries, it is wise to run ANALYZE on the temporary table after it is populated.


While I agree that using the auto features is best instead of running it database wide, in most cases per table tuning is necessary.

I don't quite agree with the design choice of postgres to tie together vacuum and analyze, I have seen several instances where databases that do a lot of insert/update but little delete never get analyze done and start to perform badly.

The solution is to go into the tables that get used heavily and are subject to large queries and set the auto analyze settings for those tables down to something to where they are getting analyzed once or every other day.

You can get to the per table settings in the gui on the auto vacuum tab and you will see analyze settings there which you can set independently of vacuum.

The settings end up in the reloptions table and can be seen with the query

SELECT c.relname, c.reloptions FROM pg_class c where reloptions is not null 

and a sample value there of an agressive analyze might be

{autovacuum_enabled=true,autovacuum_analyze_threshold=10,autovacuum_analyze_scale_factor=.01} 

To see when the last time your tables got auto analyzed query

select      relname,      n_dead_tup,      n_tup_ins,      n_tup_upd,      n_tup_del,      last_autoanalyze,      autoanalyze_count  from pg_stat_user_tables  where last_autoanalyze is not null  order by last_autoanalyze desc; 

Comments

Popular posts from this blog

Chemistry - Bond Angles In NH3 And NCl3

Change The Font Size Of Visual Studio Solution Explorer