Talking about Slowly Changing Dimensions

00:18

Hi everyone, 

i was working in a dimension with SCD Type 2 included in Tuenti. I´m not going to explain what is an SCD, because in Wikipedia its very well explained. Also, i´m not going to explain in detail our data model, but i took some useful concepts for the community (and for you!).

I would like to explain here, how and why we made this solution in Tuenti.

Concepts

The first of all, you need to know "how are we managing our customers and users?"(in data model ing terms).

As in another companies, we have this two main different concepts to split the business in two modules:

- Business values in terms of money. When i said money, i´m saying customers which have an account and give us recurring revenues.
- Business values in terms of value. This means that a user its more o less valuable for us. For instance, when a user use the application in Tuenti, that means that this user is most valuable for us, because this type of users have less CHURN ( this its a business concept that represent the average of users - in a defined time range - that leaves the company by some reason - more or less -).

Problem

So in Tuenti we can have a subscription with an user_id associated (we call "linked" subscriptions) or by other side, we can have a subscription_id without user_id associated. Also, a subscription_id can have multiple user_id during their life cycle.

By other hand, we are in a telecommunication company, and at least in Spain, we have two type of subscriptions, "prepay" and "postpay" (if you are interested in this concepts, i can explain them bellow, in the comments).

The fields

So, once we know the main concepts, in tuenti we need to manage some interesting fields:

- subscription_id. Don´t change and its unique.
- The user_id. This changes through the time. 
- Line type. This changes through the time.

Solution

Well, we were thinking about the best solution to integrate the different subscriptions events. The best solution was to create a new dimension called "subscritpions" and we include the changes through the time.

How?

We included three new fields (we have more, but this its a summary):
- row_effective_time
- row_end_time
- source

With those fields now we are able to manage the changes through the time :)

One example about our dimension with SCD Type Two fields:


We have Vertica as main Datawarehouse database, so we used a couple of analytical functions included in the database (i think that they are included in another database, but i can´t verify this). The functions are: LAG, LEAD, FIRST_VALUE and OVER.  But, you can do this SCD with your ETL tool or via scripts.

I expect this meaning helps how are you doing modeling the data in your company.

The next chapter, more detailed and better!

You Might Also Like

1 comentarios

  1. Yes you have this analitical functions in sql server 2012 and avobe, sap hana and i think Oracle... nice Blog

    ReplyDelete

Sé respetuoso/a, en este blog caben todo tipo de opiniones con respeto y serenidad.

statistics :: ヽ(*・ω・)ノ

Contact Form :: (」゜ロ゜)」