By Saad Ahmad, Executive Vice President of Smart IS International
I have heard this myth so many times: we supposedly have issues "because we have too many rows in table x"; a support request is answered with "this is because your purges are not set up properly"; or over-engineered solutions are created because "pckwrk has too many rows".
Note that the number of rows in a table is never, by itself, the performance issue; what matters is the access path to the data. Furthermore, in a system like RedPrairie/JDA WMS, an Online Transaction Processing (OLTP) system, most of the data is served from the database cache, so row counts are simply not the problem. A related tangent: on Oracle, do not spend so much time distributing your tablespaces across different disks. This is not 1995!
The current version of this article relies on data from an Oracle database, but the same concepts apply to SQL Server.
What is a table?
A table, from the point of view of the database, is not a monolith that you read from top to bottom to find your data. It is made of segments that are made of blocks. You simply need to get to the right segment and block to read your data, and you get there through the data dictionary and indexes.
So, from the point of view of the overall access path, if you divide your 10-million-row table into 10 tables, consult some "index" to decide which table to go to, and then go to that table, your overall solution will be no better than querying the 10-million-row table itself.
Data partitioning for extremely large tables is very useful and I am not disputing that. There are other good reasons to split data, but performance in an OLTP system is not one of them.
Index Depth is the Key
Oracle uses a variant of a B-Tree index. The key concept here is the "height" of the index: how many reads are needed to get to the desired data block. Even if you have 10 million rows in a table, your height should be <= 3. That is pretty good and should not cause any performance issues. While we are at it, let's bust another myth: it is not necessary to rebuild indexes for performance. The B-Tree is kept balanced as data is inserted. It is not 1985! Every time you rebuild an index to gain performance, an engineer at Oracle/Microsoft gets heartburn.
To see the index height, analyze the index in Oracle and then look at the BLEVEL column in the "ind" view. The actual height is that number plus 1. You will rarely see this value > 2.
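As a concrete sketch, the check could look like the following (the index name PK_DLYTRN is illustrative; substitute your own):

```sql
-- Gather fresh statistics on the index so BLEVEL is current
-- (PK_DLYTRN is a hypothetical index name)
exec dbms_stats.gather_index_stats(user, 'PK_DLYTRN');

-- BLEVEL is the number of branch levels; actual height = BLEVEL + 1
select index_name, blevel, leaf_blocks, num_rows
  from user_indexes
 where index_name = 'PK_DLYTRN';
```

Even for a multi-million-row table you should see BLEVEL of 2 or less, i.e. three reads or fewer to reach any row.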
Putting the concept to test
The following tests are on a development machine that has not been tuned for production use. I am comparing primary-key access performance between two tables of very different sizes.
- dlytrn has 17,471 rows. The BLEVEL is 1 for the primary key.
- usr_large_dlytrn has 626,542 rows. The BLEVEL is 2 for the primary key.
Access Times for dlytrn (same PK accessed multiple times)
- 1,000: 2.4s
- 10,000: 14.29s
- 100,000: 127.88s
Access Times for usr_large_dlytrn (same PK accessed multiple times)
- 100,000: 104.54s
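A test like this can be run with a minimal PL/SQL loop along the following lines (a hedged sketch; the table and primary-key column names are illustrative, and `set serveroutput on` is needed in SQL*Plus to see the result):

```sql
declare
  l_start number := dbms_utility.get_time;  -- hundredths of a second
  l_row   dlytrn%rowtype;
begin
  -- Fetch the same primary key repeatedly to isolate index-access cost
  for i in 1 .. 100000 loop
    select *
      into l_row
      from dlytrn
     where dlytrn_id = :some_pk_value;  -- bind a known PK value
  end loop;
  dbms_output.put_line('Elapsed: ' ||
      (dbms_utility.get_time - l_start) / 100 || 's');
end;
/
```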
So in this test the larger table gave better performance, but that is not the important part; it may have been faster for other reasons. The key message is that the number of rows in the table itself had no effect, especially in an OLTP system where our queries typically return a very small percentage of the rows and access them through indexes and cache.
But I swear — when we had performance issues last year and we deleted data from pckwrk — things improved!
I believe you. But when someone faces acute performance issues, we often forget basic troubleshooting strategies and start "trying out options", sometimes expensive options, without any basis for them. For example, people will right away suggest "rebuild indexes" and "purge" without any evidence to support those decisions. And when such actions are taken, we often do not capture the results objectively. For instance, if we rebuild an index, we should check whether the IND view in Oracle shows any improvement.
When you purged data in pckwrk, most likely the access path to pckwrk changed and that is what improved your performance — not the number of rows. So in that case we should have identified the problem query and analyzed its query plan. If we analyzed the same query after purging the data we would have noticed a different plan. Had we used the improved plan on the larger table — we would have seen the same improvement.
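Identifying the problem query's plan is straightforward; a minimal sketch (the statement shown is a placeholder for your actual problem query):

```sql
-- Explain the suspect statement as the application runs it
-- (with a bind variable, since MOCA auto-binds literals)
explain plan for
  select * from pckwrk where pcksts = :1;

-- Show the plan Oracle chose
select * from table(dbms_xplan.display);
```

Comparing this output before and after a purge would have shown that the plan, not the row count, was what changed.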
Right Indexes vs More Indexes
In many cases dropping some indexes can improve performance as well. Several indexes delivered by the standard product simply do not make any sense. On top of that, many projects create even more indexes without analyzing the need, and those indexes are often the culprit when we see performance issues. For example, adding an index on the wh_id column of pckwrk by itself will hurt performance. We should add indexes only when they are selective enough to return fewer than roughly 10% of the rows.
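A rough selectivity check along these lines can tell you whether a candidate column is worth indexing (a sketch; wh_id on pckwrk is the example from the text):

```sql
-- How skewed is the column? If one value owns most of the rows,
-- an index on that column alone will not help.
select wh_id,
       count(*) cnt,
       round(100 * ratio_to_report(count(*)) over (), 1) pct
  from pckwrk
 group by wh_id
 order by cnt desc;
```

If the top value accounts for far more than 10% of the rows, the single-column index fails the selectivity test described above.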
MOCA "auto-binds" SQL statements: if you run [select * from pckwrk where pcksts = 'P'], MOCA will execute it as [select * from pckwrk where pcksts = :1]. With the literal 'P', Oracle has enough information to decide that an index on pcksts is good; with :1 it does not know the value, so it may reason that 99% of rows in pckwrk share the same pcksts and decide the index is not worth using. These are the situations where deleting rows from pckwrk may appear to help, because pcksts becomes more selective, but as pckwrk grows again the same issue will reappear. As of 2011, MOCA provides a "#nobind" hint. If that is used, Oracle runs the query without bind variables and can make the correct decision.
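To make the effect concrete, these are the two forms of the same statement as Oracle sees them (table and column names come from the example above):

```sql
-- What the developer wrote (literal): the optimizer can use column
-- statistics/histograms for 'P' and decide the pcksts index is worthwhile
select * from pckwrk where pcksts = 'P';

-- What MOCA actually submits (auto-bound): without the literal, the
-- optimizer assumes average selectivity across all pcksts values and
-- may skip the index entirely
select * from pckwrk where pcksts = :1;
```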
Adding more columns to an index
This is a lesser-known strategy and often has extremely good results. As tables grow organically, even with a selective index the database needs to visit the base table to pick up additional columns. For example, let's say we match 100 locations, join to the inventory tables, and the query needs invlod.loducc. To get that one column, the database must access the invlod table in addition to the index on invlod. The resulting overhead per execution may be very low, but such queries often run hundreds of times within a transaction, so it adds up. The explain plan would look good but would include "TABLE ACCESS BY ROWID FOR INVLOD", and that can be an issue. If you modify the index definition to include the columns your query needs, then accessing the index alone is enough and the improvement can be quite significant.
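A sketch of the idea, using the invlod.loducc example (the index name and leading column are illustrative assumptions, not the product's actual definitions):

```sql
-- Hypothetical custom index: append loducc so queries filtering on the
-- leading column can read loducc from the index without visiting the
-- invlod table ("TABLE ACCESS BY ROWID" disappears from the plan)
create index usr_invlod_cov on invlod (stoloc, loducc);
```

The trade-off is slightly larger index maintenance on insert/update, which is usually a good bargain for a column read hundreds of times per transaction.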
Joining with views in general and poldat_view in particular
As indicated in Top 10 Mistakes, policies are often over-used and we end up with an unbalanced data distribution. So even for a given polcod, polvar, and polval we may not get good selectivity. That, coupled with the complexity of poldat_view, which tries to add rows for each warehouse, is often a major problem.
If you are about to put a join to "poldat_view" in your query: stop, take a glass of water, and think through some alternate strategies. Although a join is usually faster than piping ("|") into multiple queries, when poldat_view is involved it may be better to skip the join and use "|" with a second query to get the data from poldat_view. Another option is to read poldat_view first into a record-set and then access that inside the loop of your query.
If you must use poldat_view, then see if using base poldat is an option. For example, if your data is always expected to be warehouse specific, then you could use poldat.wh_id_tmpl of the actual warehouse.
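A sketch of the base-table approach (the polcod/polvar/polval values are placeholders; poldat.wh_id_tmpl is the template-warehouse column mentioned above):

```sql
-- Query base poldat directly for a warehouse-specific policy instead of
-- joining to poldat_view (policy identifiers below are hypothetical)
select rtstr1
  from poldat
 where polcod    = 'USR-SOME-POLICY'
   and polvar    = 'MISC'
   and polval    = 'FLAG'
   and wh_id_tmpl = :wh_id;
```

This avoids the per-warehouse row expansion that poldat_view performs, at the cost of requiring that your policies really are maintained per warehouse.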
Any time you use views in a query, and especially poldat_view, always run an explain plan and consider whether the database could execute it badly under certain scenarios. I generally favor not joining to the view, especially poldat_view, and using one of the other strategies instead.
Changing Query Plans
Often when the symptom is "performance was fine yesterday", the issue is that some change in data distribution caused the database to start using a different plan. In such situations, it helps to look at the problem query and its plans historically, and then pin the plan that was performing well.
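On Oracle 11g and later, pinning the known-good plan can be sketched with a SQL plan baseline (the sql_id and plan_hash_value below are placeholders; look them up in V$SQL for your problem statement):

```sql
-- Load the good plan from the cursor cache as an accepted baseline so
-- the optimizer keeps using it (identifiers are placeholders)
declare
  l_plans pls_integer;
begin
  l_plans := dbms_spm.load_plans_from_cursor_cache(
               sql_id          => '&good_sql_id',
               plan_hash_value => &good_plan_hash);
  dbms_output.put_line('Plans loaded: ' || l_plans);
end;
/
```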
Purging data is not a bad idea, and we should have an appropriate purge and archive strategy, but that falls in the domain of "data management", not "performance management". The number of rows in a table is not a measure of performance. When faced with performance issues, it is important to understand why we are facing the problem and only then take corrective action. Every corrective action should then be scrutinized to see if it really helped or had no effect. An incorrect assumption can become a problem over time: when we later try to do things right, there is resistance because "all of these changes were made to improve performance". So the key messages are:
- The number of rows in a table is not, by itself, a measure of performance.
- Do not create complex solutions that try to "not use pckwrk because it has so many rows".
- Do not worry that prtmst has 1M rows.
- Rebuilding indexes will rarely, if ever, help performance.
- For RedPrairie WMS, data is accessed mostly through indexes and cache, so do not over-engineer a data distribution strategy that puts tables on different physical disks. You have bought that expensive disk array; let it do its job!
- When faced with performance issues, always look at the query plan (with bind variables). Look at older plans and check whether the execution plan changed.
- Remove unnecessary indexes.
- Employ a change-management regime to control changes made for performance reasons, and capture the improvements objectively.
Originally published at http://saadwmsblog.blogspot.com.