Friday, March 30, 2012

ROLAP and partitioning questions

I have done all my dev work on a new AS database and now I need to start tuning processing/partitioning for deployment in production. The database needs to support drilling down to leaf level, so I have implemented a fact dimension. Testing with this dimension as MOLAP yields a 2.8 GB database for 20 days of data partitioned by day. With the same partition scheme, if I change the dimension to ROLAP, the database goes down to 124 MB.

With MOLAP, after the cube is processed, it opens nearly immediately in the cube browser or in ProClarity. Good performance, but too big. With ROLAP, it has been "Loading..." for over an hour. Good size, but terrible performance. Is this normal behavior? My guess is that since it is trying to load the default row/column view, it has to aggregate all the ROLAP leaves on the fly. I'll have to try some MDX against it for comparison.
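For reference, the kind of probe I have in mind is something like the sketch below: an MDX query wrapped in an XMLA Execute so it can be run through ascmd. The cube, measure, and attribute names are placeholders for my actual objects.

<!-- Minimal MDX probe; all object names below are placeholders. -->
<Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
  <Command>
    <Statement>
      SELECT [Measures].[Quantity] ON COLUMNS,
             NON EMPTY [Fact Detail].[Item Number].[Item Number].Members ON ROWS
      FROM [MyCube]
    </Statement>
  </Command>
  <Properties>
    <PropertyList>
      <Catalog>MyASDatabase</Catalog>
    </PropertyList>
  </Properties>
</Execute>

If this comes back in seconds while the browser still hangs, that would point at the default-view metadata load rather than the ROLAP leaves themselves.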

Should processing a database with a ROLAP dimension take just as long as with a MOLAP dimension? The fact dimension by itself processes instantaneously, but when processing the database it spends quite a bit of time on the ROLAP fact dimension. How can I see where all the time is being spent? Both versions take ~1.5 hours to build.

With partitioning, is it best practice to keep a single partition in the solution and create the partition scheme programmatically against the server instance?
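To make the question concrete, this is roughly what I mean by creating partitions programmatically: one XMLA Create per day, run against the server instance. Database, cube, measure group, data source, and table names are all placeholders for my real objects.

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ParentObject>
    <DatabaseID>MyASDatabase</DatabaseID>
    <CubeID>MyCube</CubeID>
    <MeasureGroupID>FactSales</MeasureGroupID>
  </ParentObject>
  <ObjectDefinition>
    <Partition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <ID>FactSales_20120330</ID>
      <Name>FactSales_20120330</Name>
      <!-- Each daily partition binds to the same fact table,
           restricted to one day key. -->
      <Source xsi:type="QueryBinding">
        <DataSourceID>MyDataSource</DataSourceID>
        <QueryDefinition>
          SELECT * FROM dbo.FactSales WHERE DateKey = 20120330
        </QueryDefinition>
      </Source>
      <StorageMode>Molap</StorageMode>
    </Partition>
  </ObjectDefinition>
</Create>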

I have a case where I need to delete and add an arbitrary number of daily partitions each day, depending on what data comes in. I am looking at using ASCMD to do this, as our ETL is third party. I can see how to delete and create the objects, but how can I tell what items need to be processed after the partition work is done? Is there an easy way to figure out the list/order of items that need to be processed?
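The daily delete/recreate I am picturing is a batch like the sketch below, run with something like ascmd.exe -S MyServer -d MyASDatabase -i DailyPartitions.xmla (again, all object names are placeholders). Whether the Process step on the new partition is enough, or whether other affected objects also need processing, is exactly what I am unsure about; I have read that the Process command accepts a ProcessAffectedObjects option that asks the server to handle dependents, but I have not verified it.

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Drop the day's partition if it already exists;
       IgnoreFailures lets the batch continue when it does not. -->
  <Delete IgnoreFailures="true">
    <Object>
      <DatabaseID>MyASDatabase</DatabaseID>
      <CubeID>MyCube</CubeID>
      <MeasureGroupID>FactSales</MeasureGroupID>
      <PartitionID>FactSales_20120330</PartitionID>
    </Object>
  </Delete>
  <!-- Recreate it with a Create command like the one above, then: -->
  <Process>
    <Object>
      <DatabaseID>MyASDatabase</DatabaseID>
      <CubeID>MyCube</CubeID>
      <MeasureGroupID>FactSales</MeasureGroupID>
      <PartitionID>FactSales_20120330</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>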

Thanks,

|||

I will always recommend MOLAP before ROLAP. Do you have a 100 percent aggregation level on your partitions? If so, you should start with a lower percentage for aggregations, like 20-30 percent. Large cubes can also result from having many dimensions, 20-30 or more, in one cube.

ROLAP builds summary tables in the relational database, and it is always slower than MOLAP.

Use SQL Server Profiler to study cube processing; the Progress Report Begin/End events show how long each object takes.

Regards

Thomas Ivarsson

|||

If your cube is partitioned by day, you need to run a distinct query on the day column and iterate through the partitions that need to be processed.
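As a sketch (object IDs are placeholders): run SELECT DISTINCT DateKey FROM your staging table to find which days arrived, then emit one Process command per day key inside a parallel batch:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <!-- One Process per day key returned by the distinct query -->
    <Process>
      <Object>
        <DatabaseID>MyASDatabase</DatabaseID>
        <CubeID>MyCube</CubeID>
        <MeasureGroupID>FactSales</MeasureGroupID>
        <PartitionID>FactSales_20120329</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
    <Process>
      <Object>
        <DatabaseID>MyASDatabase</DatabaseID>
        <CubeID>MyCube</CubeID>
        <MeasureGroupID>FactSales</MeasureGroupID>
        <PartitionID>FactSales_20120330</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
  </Parallel>
</Batch>

Dimensions should be processed first (e.g. ProcessUpdate) in the same run if their source data changed.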

MOLAP cubes are larger, but if they are partitioned and you have usage-based optimization implemented, you will get good performance. Also, go ahead and prune any unnecessary attributes from the dimensions, e.g. ETL load date. This will reduce the size of the dimensions and the cube. A customer or product dimension would normally have a lot of attributes that you can prune, e.g. BOM info, address.

|||

Agreed. That is why I was using MOLAP. With the fact dimension as MOLAP, my partitions were only at 13% aggregation and my database size was still 2.8 GB.

If I move the fact dimension to ROLAP, the database size goes down to 153 MB. I do seem to be having better luck with performance now, though. It appears that it is best to redesign your aggregations and reprocess when you change a dimension's storage mode. I have changed my aggregations, which are now at 24%. The browse feature in VS is still impossibly slow to come up, but ProClarity and SQL Server Management Studio now appear to be playing nice. Before, it looked like all three were taking over an hour to load the metadata.

The thing I still don't understand is that when I process the ROLAP dimension by itself, it runs nearly instantaneously, as expected. However, when I process the database, it still spends 75 minutes on the ROLAP dimension as part of the cube. Does that make sense? I was thinking that using ROLAP would speed up the cube build as well as save on storage.

Keehan

|||

That is why I was using MOLAP. When I realized how much space the fact dimension was consuming, I was forced to shift. I have already trimmed all the unnecessary attributes. The fact dimension now contains only items that users may need to drill down to, like item_number, equipment_id, etc.

|||

I have not used ROLAP in SQL 2005 apart from with MicroStrategy.

I have a large cube that is 6 GB, and performance is pretty good when it is designed correctly and aggregated. The SQL database is 50 GB.

If you want performance, then MOLAP is the way to go. Partitioning helps too: multiple partitions can be processed in parallel, and queries only scan the partitions they need.

You also do not specify how many days of data you are planning to keep. It should really be a sliding window. The end users' expectation of being able to drill down to leaf level comes from using SQL query tools. I am fighting a similar battle. You should set the expectation that they will only be allowed to drill down to leaf level for the last 30 to 90 days; beyond that, it becomes unmanageable. The first thing to ask the end users is what the use case is and what decisions they are going to base on the transactional data. A typical decision-making scenario has multiple steps:

- An analytics view highlights outliers

|||

The last line seems to be the key. We have a similar cube which used to grow at about 200,000 records a day. My data modeller and I decided to keep the transactions out of the cube, which meant our fact size grows by only 7,100 records a day. Performance has hugely improved; the downside is that we have to drill through into the transaction system now. If you need mass data storage, contact EMC in Hopkinton, MA!

thanks

Adi
