I was trying to understand how Tableau version 9.0 manages the cache, and in the process of researching this I uncovered some interesting facts. I have compiled these findings here so others won’t have to search in multiple places to find the answers they need.  To begin, here are some basics you may or may not know about the query cache.

What is a query cache?

In Tableau server, when a view or a dashboard is accessed the respective underlying database is queried for the answers to be shown on the view or the dashboard.  This applies for both the live databases as well as data extracts. One of the important processes on Tableau server, called as “VizQL Server”, connects to these databases through the appropriate database drivers and executes the SQL queries. These queries are processed by the databases, and the resulting data set or the output of these queries is retrieved from the databases and stored in the random access memory (RAM) of the Tableau server. This In-memory result set is referred to as the query cache or, alternatively, the data cache.

Why is the query cache important?

The results of the queries are populated in the local memory, so when similar requests are received the data is directly served from the cache rather than being retrieved from the databases again. This helps reduce latency and improves performance. In certain situations, such as “real time” dashboards, the query cache will not play any role because the data will update frequently and it would be appropriate to always query the databases directly to ensure retrieval of the most current data. However, for analytical requirements, a query cache will always bring in benefits in terms of performance.

How was the query cache handled in 8.x?

In Tableau version 8.3 and earlier, each VizQL process had an associated internal memory cache which was managed by the respective VizQL process itself. The results of the queries executed by any VizQL process were stored and managed within the memory cache of the respective VizQL process. As it was closely associated with the VizQL process executing the queries, the cache could not be shared with other Tableau services.

What has changed in 9.0?

Tableau 9.0 has introduced a new server process called “Cache Server” which is responsible for storing, managing and sharing the query cache. Thus, in the new version, the query cache management which was with the VizQL in earlier versions has been assigned to the Cache Server. The cache managed by the Cache Server process is referred to as the “external query cache” – alternately known as the “distributed query cache” or the “shared query cache” because it can also be shared with other Tableau services.  The external query cache can be shared across other services on the same server nod. It can be shared with services on the other nodes running across the cluster.

Does this mean VizQL doesn’t manage the Query Cache anymore?

In fact, the VizQL still has an in-memory cache associated which is used for storing and managing the query cache. For simplicity we call this the “Internal Query Cache”.

To understand how these two sets of caches are communicating with each other, let’s take a look at the series of events that take place when a request for a view or a dashboard is received by Tableau Server for the first time.

  1. The VizQL process generates the SQL query and sends it to the database.
  2. The database processes the query and produces the result set.
  3. The result set is received by Tableau and stored in the RAM which is managed by the Cache Server, also called the external query cache.
  4. The internal query cache of the VizQL that executed the query is populated with the data from the external query cache, thus the same data is duplicated on two caches. This is how the cache is shared among other services.
  5. The VizQL utilizes the internal query cache for all further processing.

The following image explains the process:

Tableau Internal Query Cache
Tableau Internal Query Cache

For any VizQL process, if the internal query cache has been invalided for any reason, the external cache will be referred to re-populate the internal query cache. If the external cache is also not available, the database will be queried, and all steps above will be performed.

If there are multiple VizQL processes running on the server, these processes will check if any other VizQL process has already populated the external cache for the respective query, to populate their internal query cache.

What will happen if the view is refreshed?

While accessing a dashboard, the users may press the “refresh” button on the toolbar. Let’s take a look at few scenarios to understand what will happen to the cache upon refreshing.

Scenario 1: Single VizQL Process configured on Tableau server.

When there is a single VizQL process running on server, all the user sessions will be associated with this VizQL process by default. This means all users will be served by the internal query cache of this particular VizQL.

If any of the users refreshes the view, the internal as well as the external cache will be invalidated and the entire cycle will be repeated to populate both the internal and external cache. Subsequent to this the other users will be served by the newly populated internal query cache.

Scenario 2: Multiple VizQL Process configured on Tableau server.

It might happen that all user sessions would be associated with the same VizQL process. In this case the outcome will be similar to the Scenario 1.

Let’s assume that User 1 is associated with VizQL 1 and User 2 is associated with VizQL 2.

  1. If User 1 is the first one to access the server, all the steps mentioned above will be performed.
  2. User 2 will then simply refer to the already-populated external cache and build its internal query cache.
  3. Thus, there will be a single external query cache but two copies of internal query cache populated for each VizQL.
  4. Suppose User 2 refreshes the view. This will result in invalidating the internal query cache for VizQL 2 as well as the external query cache.
  5. The action by the User 2 will also result in repopulation of both caches after they are invalidated.
  6. However the internal query cache of the VizQL 1 will be unaffected and will contain the data initially populated.

If the data in the database has changed between step 1 and step 4, the common external query cache and the internal query cache of VizQL 2 will have the latest data whereas the internal query cache of VizQL 1 will still refer to the original data.

With respect to the scenario above when the User 2 refreshes the view, Tableau server should also repopulate the VizQL 1 associated with User 1’s internal query cache, but it is not. However we do need to remember that Tableau is intended to be used as an analytical tool and it would be a rare situation when regular updates in the cache would be required so the above situation can be handled by the User 1 also refreshing the view.

 

This Tableau Tip was written by:

Sourabh Dasgupta

S_DasguptaSourabh Dasgupta is the Director of Technology & BI at Corporate Renaissance Group India and is also the Product Manager of CRG’s in-house software products such as FlexABM and Cost Allocator. Over the past 14 years he has been involved as a technical expert in various projects and assignments related to implementation of Activity Based Costing and Business Intelligence and reporting systems in India as well as South Africa.

He is a one of the professional trainers for Tableau software and has conducted many trainings for the clients. He has been involved in designing and development of Tableau reports and dashboards, and was awarded with a Tableau Desktop 8 Qualified Associate certificate as well as a Tableau Server 8 Qualified Associate certificate.  He has successfully conducted trainings on the Tableau Software (Desktop and Server versions) for many clients, which included basic and advanced features of the Tableau Desktop and Tableau Server.

Sourabh has a Bachelor of Science degree (Computer Science, Physics, and Mathematics) from Nagpur University, India and a Diploma from the National Institute of Information Technology (NIIT), Pune, India. He also holds a Microsoft Certified Professional certificate for Designing and Implementing Desktop Applications with Microsoft Visual basic.

If you wish to accelerate your skills and empower your learning with hands-on education, or are seeking an Expert-on-Demand, reach out to Sourabh at sdasgupta@crgroup.com.