OLAP - 计算流失三角形,包括样本数据和立方体(PostgreSQL/Mondrian)

11

现实描述: 我们有一些项目清单。每个项目都有很多账户。您可以在每个账户上执行许多操作。我定义了以下简化的维度和事实表:

Dimensions and attributes:
 Project
  project_key
  project_name
  industry
  number_of_accounts
 Distance
  distance_key
  distance_in_months
  distance_in_quarters
 Account
  account_key
  project_key
  account_id
Fact Table and attributes:
 Action_Fact_Table
  project_key
  distance_key
  account_key
  action_id

现在,我想使用赔付三角法来分析数据(可能不是真正的赔付三角形式,但方法是相同的)。最简单的三角形如下:

              Distance in Months
Project name|     1    2    3    4    5    6    7    8    9    10
-------------------------------------------------------------------------
 Project1   |     5   10   15   20   25   30   35   40   45    50
 Project2   |     7   14   21   28   35   42   49   56   63
 Project3   |     2    5    8   11   14   20   25   30
 Project4   |     0    2    5   10   18   23   40
 Project5   |     5   12   18   20   21   30

按行动作数量的累加和,显示距离项目开始日期的月份间隔。你可以使用季度间隔(或在间隔维度中定义的任何其他周期)创建类似的三角形。

您还可以为项目维度中不同层次创建三角形,例如行业(Project1-Project3 = Industry1,Project4-Project5 = Industry2):

              Distance in Months
Project name|     1    2    3    4    5    6    7    8    9    10
-------------------------------------------------------------------------
 Industry1   |    14   29   44   59   74   92  109  126  108   50
 Industry2   |     5   14   23   30   39   53   40   

还有更高级的赔付三角形,其中您将动作的累加值除以账户数量。假设我们的项目有以下账户数量:

Project_name number_of_accounts  
-----------------------------
Project1     100
Project2     100
Project3     100
Project4     100
Project5     200

然后我想得到以下三角形:

              Distance in Months
Project    |     1    2    3    4    5    6    7    8    9    10
------------------------------------------------------------------------
 Project1  |   .05  .01  .15  .20  .25  .30  .35  .40  .45   .50
 Project2  |    .7  .14  .21  .28  .35  .42  .49  .56  .63
 Project3  |    .2   .5   .8  .11  .14  .20  .25  .30
 Project4  |    .0   .2   .5  .10  .18  .23  .40
 Project5  |   .05  .06  .09  .10 .105  .15

对于希望比较项目及其活动的情况,其中一个项目中的账户数量与其他项目不同,这种情况下,特别有用。问题是是否可能在OLAP中创建这样的计算。我曾考虑过在项目表中使用“项目中的账户数量”,但无法弄清楚如何实现。另一个选项是在账户维度中汇总数据。我在谷歌上也没有找到任何有用信息,可能是因为我问的问题不正确。

这个问题的解决方案在许多行业都广泛适用,特别是在保险和银行业。它可以应用于所有具有长时间性能窗口并可由定义明确、可比较的单位批次跟踪的过程中。

(我们正在使用PostgreSQL、Saiku,在Schema Workbench中定义了立方体)

测试数据(PostgreSQL语法,如果需要其他内容,请告诉我)

--drop table if exists project cascade;
create table project (
  project_key int primary key,
  project_name character varying,
  industry character varying,
  number_of_accounts int
);

--drop table if exists distance cascade;
create table distance (
  distance_key int primary key,
  distance_in_months int,
  distance_in_quarters int);

--drop table if exists account cascade;
create table account (
  account_key int primary key,
  project_key int references project (project_key)
);

--drop table if exists action_fact_table cascade;
create table action_fact_table (
  project_key int references project (project_key),
  distance_key int references distance (distance_key),
  account_key int references account (account_key),
  action_id int
);

-- project data
insert into project values (1,'Project1','Industry1',100);
insert into project values (2,'Project2','Industry1',100);
insert into project values (3,'Project3','Industry1',100);
insert into project values (4,'Project4','Industry2',100);
insert into project values (5,'Project5','Industry2',200);

-- distance data
insert into distance values(1,1,1);
insert into distance values(2,2,1);
insert into distance values(3,3,1);
insert into distance values(4,4,2);
insert into distance values(5,5,2);
insert into distance values(6,6,2);
insert into distance values(7,7,3);
insert into distance values(8,8,3);
insert into distance values(9,9,3);
insert into distance values(10,10,4);
insert into distance values(11,11,4);
insert into distance values(12,12,4);

-- account data
/* let me know if you need insert statement for every row */
insert into account (
select generate_series (1,100), 1 union all
select generate_series (101,200), 2 union all
select generate_series (201,300), 3 union all
select generate_series (301,400), 4 union all
select generate_series (401,600), 5
);

insert into action_fact_table values(1,1,90,10001);
insert into action_fact_table values(1,1,32,10002);
insert into action_fact_table values(1,1,41,10003);
insert into action_fact_table values(1,1,54,10004);
insert into action_fact_table values(1,1,45,10005);
insert into action_fact_table values(1,2,22,10006);
insert into action_fact_table values(1,2,29,10007);
insert into action_fact_table values(1,2,41,10008);
insert into action_fact_table values(1,2,89,10009);
insert into action_fact_table values(1,2,15,10010);
insert into action_fact_table values(1,3,32,10011);
insert into action_fact_table values(1,3,100,10012);
insert into action_fact_table values(1,3,72,10013);
insert into action_fact_table values(1,3,80,10014);
insert into action_fact_table values(1,3,10,10015);
insert into action_fact_table values(1,4,12,10016);
insert into action_fact_table values(1,4,45,10017);
insert into action_fact_table values(1,4,83,10018);
insert into action_fact_table values(1,4,42,10019);
insert into action_fact_table values(1,4,33,10020);
insert into action_fact_table values(1,5,22,10021);
insert into action_fact_table values(1,5,27,10022);
insert into action_fact_table values(1,5,59,10023);
insert into action_fact_table values(1,5,32,10024);
insert into action_fact_table values(1,5,70,10025);
insert into action_fact_table values(1,6,32,10026);
insert into action_fact_table values(1,6,5,10027);
insert into action_fact_table values(1,6,15,10028);
insert into action_fact_table values(1,6,70,10029);
insert into action_fact_table values(1,6,43,10030);
insert into action_fact_table values(1,7,59,10031);
insert into action_fact_table values(1,7,9,10032);
insert into action_fact_table values(1,7,99,10033);
insert into action_fact_table values(1,7,79,10034);
insert into action_fact_table values(1,7,31,10035);
insert into action_fact_table values(1,8,56,10036);
insert into action_fact_table values(1,8,34,10037);
insert into action_fact_table values(1,8,48,10038);
insert into action_fact_table values(1,8,79,10039);
insert into action_fact_table values(1,8,42,10040);
insert into action_fact_table values(1,9,10,10041);
insert into action_fact_table values(1,9,10,10042);
insert into action_fact_table values(1,9,49,10043);
insert into action_fact_table values(1,9,61,10044);
insert into action_fact_table values(1,9,49,10045);
insert into action_fact_table values(1,10,99,10046);
insert into action_fact_table values(1,10,69,10047);
insert into action_fact_table values(1,10,84,10048);
insert into action_fact_table values(1,10,99,10049);
insert into action_fact_table values(1,10,3,10050);
insert into action_fact_table values(2,1,182,10051);
insert into action_fact_table values(2,1,127,10052);
insert into action_fact_table values(2,1,197,10053);
insert into action_fact_table values(2,1,174,10054);
insert into action_fact_table values(2,1,187,10055);
insert into action_fact_table values(2,1,144,10056);
insert into action_fact_table values(2,1,160,10057);
insert into action_fact_table values(2,2,155,10058);
insert into action_fact_table values(2,2,153,10059);
insert into action_fact_table values(2,2,119,10060);
insert into action_fact_table values(2,2,188,10061);
insert into action_fact_table values(2,2,125,10062);
insert into action_fact_table values(2,2,147,10063);
insert into action_fact_table values(2,2,123,10064);
insert into action_fact_table values(2,3,136,10065);
insert into action_fact_table values(2,3,163,10066);
insert into action_fact_table values(2,3,187,10067);
insert into action_fact_table values(2,3,138,10068);
insert into action_fact_table values(2,3,168,10069);
insert into action_fact_table values(2,3,132,10070);
insert into action_fact_table values(2,3,138,10071);
insert into action_fact_table values(2,4,158,10072);
insert into action_fact_table values(2,4,171,10073);
insert into action_fact_table values(2,4,153,10074);
insert into action_fact_table values(2,4,141,10075);
insert into action_fact_table values(2,4,182,10076);
insert into action_fact_table values(2,4,165,10077);
insert into action_fact_table values(2,4,143,10078);
insert into action_fact_table values(2,5,190,10079);
insert into action_fact_table values(2,5,181,10080);
insert into action_fact_table values(2,5,163,10081);
insert into action_fact_table values(2,5,134,10082);
insert into action_fact_table values(2,5,145,10083);
insert into action_fact_table values(2,5,190,10084);
insert into action_fact_table values(2,5,198,10085);
insert into action_fact_table values(2,6,137,10086);
insert into action_fact_table values(2,6,133,10087);
insert into action_fact_table values(2,6,135,10088);
insert into action_fact_table values(2,6,103,10089);
insert into action_fact_table values(2,6,187,10090);
insert into action_fact_table values(2,6,127,10091);
insert into action_fact_table values(2,6,117,10092);
insert into action_fact_table values(2,7,116,10093);
insert into action_fact_table values(2,7,139,10094);
insert into action_fact_table values(2,7,111,10095);
insert into action_fact_table values(2,7,150,10096);
insert into action_fact_table values(2,7,151,10097);
insert into action_fact_table values(2,7,181,10098);
insert into action_fact_table values(2,7,109,10099);
insert into action_fact_table values(2,8,102,10100);
insert into action_fact_table values(2,8,101,10101);
insert into action_fact_table values(2,8,118,10102);
insert into action_fact_table values(2,8,147,10103);
insert into action_fact_table values(2,8,186,10104);
insert into action_fact_table values(2,8,136,10105);
insert into action_fact_table values(2,8,160,10106);
insert into action_fact_table values(2,9,149,10107);
insert into action_fact_table values(2,9,119,10108);
insert into action_fact_table values(2,9,169,10109);
insert into action_fact_table values(2,9,176,10110);
insert into action_fact_table values(2,9,195,10111);
insert into action_fact_table values(2,9,183,10112);
insert into action_fact_table values(2,9,140,10113);
insert into action_fact_table values(3,1,224,10114);
insert into action_fact_table values(3,1,241,10115);
insert into action_fact_table values(3,2,295,10116);
insert into action_fact_table values(3,2,249,10117);
insert into action_fact_table values(3,2,260,10118);
insert into action_fact_table values(3,3,298,10119);
insert into action_fact_table values(3,3,267,10120);
insert into action_fact_table values(3,3,297,10121);
insert into action_fact_table values(3,4,211,10122);
insert into action_fact_table values(3,4,253,10123);
insert into action_fact_table values(3,4,214,10124);
insert into action_fact_table values(3,5,248,10125);
insert into action_fact_table values(3,5,223,10126);
insert into action_fact_table values(3,5,288,10127);
insert into action_fact_table values(3,6,207,10128);
insert into action_fact_table values(3,6,296,10129);
insert into action_fact_table values(3,6,221,10130);
insert into action_fact_table values(3,6,201,10131);
insert into action_fact_table values(3,6,227,10132);
insert into action_fact_table values(3,6,209,10133);
insert into action_fact_table values(3,7,267,10134);
insert into action_fact_table values(3,7,282,10135);
insert into action_fact_table values(3,7,215,10136);
insert into action_fact_table values(3,7,285,10137);
insert into action_fact_table values(3,7,212,10138);
insert into action_fact_table values(3,8,239,10139);
insert into action_fact_table values(3,8,294,10140);
insert into action_fact_table values(3,8,296,10141);
insert into action_fact_table values(3,8,251,10142);
insert into action_fact_table values(3,8,281,10143);
insert into action_fact_table values(4,2,392,10144);
insert into action_fact_table values(4,2,347,10145);
insert into action_fact_table values(4,3,318,10146);
insert into action_fact_table values(4,3,400,10147);
insert into action_fact_table values(4,3,378,10148);
insert into action_fact_table values(4,4,315,10149);
insert into action_fact_table values(4,4,318,10150);
insert into action_fact_table values(4,4,394,10151);
insert into action_fact_table values(4,4,382,10152);
insert into action_fact_table values(4,4,317,10153);
insert into action_fact_table values(4,5,314,10154);
insert into action_fact_table values(4,5,354,10155);
insert into action_fact_table values(4,5,338,10156);
insert into action_fact_table values(4,5,375,10157);
insert into action_fact_table values(4,5,317,10158);
insert into action_fact_table values(4,5,329,10159);
insert into action_fact_table values(4,5,342,10160);
insert into action_fact_table values(4,5,380,10161);
insert into action_fact_table values(4,6,313,10162);
insert into action_fact_table values(4,6,311,10163);
insert into action_fact_table values(4,6,336,10164);
insert into action_fact_table values(4,6,380,10165);
insert into action_fact_table values(4,6,355,10166);
insert into action_fact_table values(4,7,386,10167);
insert into action_fact_table values(4,7,322,10168);
insert into action_fact_table values(4,7,311,10169);
insert into action_fact_table values(4,7,367,10170);
insert into action_fact_table values(4,7,350,10171);
insert into action_fact_table values(4,7,384,10172);
insert into action_fact_table values(4,7,391,10173);
insert into action_fact_table values(4,7,331,10174);
insert into action_fact_table values(4,7,373,10175);
insert into action_fact_table values(4,7,314,10176);
insert into action_fact_table values(4,7,305,10177);
insert into action_fact_table values(4,7,331,10178);
insert into action_fact_table values(4,7,350,10179);
insert into action_fact_table values(4,7,376,10180);
insert into action_fact_table values(4,7,387,10181);
insert into action_fact_table values(4,7,312,10182);
insert into action_fact_table values(4,7,397,10183);
insert into action_fact_table values(5,1,404,10184);
insert into action_fact_table values(5,1,562,10185);
insert into action_fact_table values(5,1,511,10186);
insert into action_fact_table values(5,1,594,10187);
insert into action_fact_table values(5,1,541,10188);
insert into action_fact_table values(5,2,506,10189);
insert into action_fact_table values(5,2,427,10190);
insert into action_fact_table values(5,2,481,10191);
insert into action_fact_table values(5,2,463,10192);
insert into action_fact_table values(5,2,579,10193);
insert into action_fact_table values(5,2,455,10194);
insert into action_fact_table values(5,2,527,10195);
insert into action_fact_table values(5,3,465,10196);
insert into action_fact_table values(5,3,562,10197);
insert into action_fact_table values(5,3,434,10198);
insert into action_fact_table values(5,3,401,10199);
insert into action_fact_table values(5,3,464,10200);
insert into action_fact_table values(5,3,500,10201);
insert into action_fact_table values(5,4,554,10202);
insert into action_fact_table values(5,4,600,10203);
insert into action_fact_table values(5,5,483,10204);
insert into action_fact_table values(5,6,552,10205);
insert into action_fact_table values(5,6,565,10206);
insert into action_fact_table values(5,6,586,10207);
insert into action_fact_table values(5,6,544,10208);
insert into action_fact_table values(5,6,436,10209);
insert into action_fact_table values(5,6,531,10210);
insert into action_fact_table values(5,6,409,10211);
insert into action_fact_table values(5,6,524,10212);
insert into action_fact_table values(5,6,564,10213);

样本立方体(Mondrian):

<Schema name="RunoffTriangleSchema">
  <Cube name="RunoffTriangleCube" visible="true" cache="true" enabled="true">
    <Table name="action_fact_table" schema="public">
    </Table>
    <Dimension type="StandardDimension" visible="true" foreignKey="project_key" name="Project">
      <Hierarchy name="Project" visible="true" hasAll="true">
        <Table name="project" schema="public" alias="">
        </Table>
        <Level name="Industry" visible="true" column="industry" uniqueMembers="false">
        </Level>
        <Level name="Project Name" visible="true" column="project_name" uniqueMembers="false">
        </Level>
      </Hierarchy>
    </Dimension>
    <Dimension type="StandardDimension" visible="true" foreignKey="distance_key" name="Distance">
      <Hierarchy name="Distance" visible="true" hasAll="true">
        <Table name="distance" schema="public" alias="">
        </Table>
        <Level name="Distance In Quarters" visible="true" column="distance_in_quarters" uniqueMembers="false">
        </Level>
        <Level name="Distance In Months" visible="true" column="distance_in_months" uniqueMembers="false">
        </Level>
      </Hierarchy>
    </Dimension>
    <Dimension type="StandardDimension" visible="true" foreignKey="account_key" name="Account">
      <Hierarchy name="Account" visible="true" hasAll="true">
        <Table name="account" schema="public">
        </Table>
        <Level name="Account Key" visible="true" column="account_key" uniqueMembers="false">
        </Level>
      </Hierarchy>
    </Dimension>
    <Measure name="CountActions" column="action_id" aggregator="count" visible="true">
    </Measure>
  </Cube>
</Schema>
2个回答

2

两个赏金但没有答案,我很惊讶。我已找到一个解决方法 - 使用SQL和BIRT引擎,现在接近我要寻找的内容。我仍然希望有人能够解决针对OLAP的问题。


为使其工作,我已经:

  • 编写了自定义函数来动态返回所选列
  • 使用SQL计算基于所选列的挂账三角数据
  • 在BIRT 2.6.1中创建报告以显示结果并提供参数选择界面

动态返回列

    CREATE or replace FUNCTION bizdata.getColumns(_column1 text, _column2 text, _column3 text, _column4 text, _table text, _rqdl text)
      RETURNS TABLE(cmf1 text, cmf2 text, cmf3 text, outval numeric, rqdl text) AS $$
    BEGIN
        RETURN QUERY EXECUTE 
            'SELECT ' 
                || case when _column1 = 'None' then quote_literal('None') else quote_ident(_column1) end || '::text as cmf1,' 
                || case when _column2 = 'None' then quote_literal('None') else quote_ident(_column2) end || '::text as cmf2,' 
                || case when _column3 = 'None' then quote_literal('None') else quote_ident(_column3) end || '::text as cmf3,'   
                || quote_ident(_column4) || '::numeric as baseline,'
                || case when _rqdl = 'None' then 0::text else quote_ident(_rqdl)::text end || '::text as rqdl'  
            ' FROM '
                || 'bizdata.' || _table; 
    END;
     $$ LANGUAGE plpgsql;

Thi function takes the following as input variables:

 - _column1 - common mapping field number 1
 - _column2 - common mapping field number 2
 - _column3 - common mapping field number 3
 - _column4 - column used for aggregation (sum)
 - _table - table used for getting data
 - _rqdl - requested distance level

计算数据

Using bizdata.getColumns() function I can calculate triangle data using the following statement:


with 

params as (
    select 'cmf1'::varchar as prm_name, 'project_owner_name_short'::varchar as prm_value union all 
    select 'cmf2'::varchar as prm_name, 'project_source_name_short'::varchar as prm_value union all
    select 'cmf3'::varchar as prm_name, 'None'::varchar as prm_value union all
    select 'fact'::varchar as prm_name, 'amount'::varchar as prm_value union all    
    select 'fact_table'::varchar as prm_name, 'dwv_daily_allocation_fact'::varchar as prm_value union all       
    select 'baseline'::varchar as prm_name, 'tmp_nominal_value'::varchar as prm_value union all 
    select 'baseline_table'::varchar as prm_name, 'dw_project'::varchar as prm_value union all
    select 'rqdl'::varchar as prm_name, 'year_distance'::varchar as prm_value 
)

,baseline_data as (
    select 
        cmf1,
        cmf2,
        cmf3,
        sum(coalesce(outval,0)) as baseline
    from 
        bizdata.getColumns(
            (select prm_value from params where prm_name = 'cmf1'::text),
            (select prm_value from params where prm_name = 'cmf2'::text),
            (select prm_value from params where prm_name = 'cmf3'::text),
            (select prm_value from params where prm_name = 'baseline'::text),
            (select prm_value from params where prm_name = 'baseline_table'::text), 
            'None'
            )
    group by
        cmf1,
        cmf2,
        cmf3

)




,fact_data as (
    select 
        cmf1,
        cmf2,
        cmf3,
        rqdl::int as rqdl,
        sum(coalesce(outval,0)) as fact
    from 
        bizdata.getColumns(
            (select prm_value from params where prm_name = 'cmf1'::text),
            (select prm_value from params where prm_name = 'cmf2'::text),
            (select prm_value from params where prm_name = 'cmf3'::text),
            (select prm_value from params where prm_name = 'fact'::text),
            (select prm_value from params where prm_name = 'fact_table'::text),
            (select prm_value from params where prm_name = 'rqdl'::text)
            )
    group by
        cmf1,
        cmf2,
        cmf3,
        rqdl

)

select 
    case when cmf1 = 'None' then null else cmf1 end as cmf1,
    case when cmf2 = 'None' then null else cmf1 end as cmf,
    case when cmf3 = 'None' then null else cmf1 end as cmf1,
    rqdl,
    fact,
    baseline,
    sum(fact) over (partition by cmf1, cmf2, cmf3 order by rqdl) as cfact,
    sum(fact) over (partition by cmf1, cmf2, cmf3 order by rqdl) / baseline as cfactpct
from 
    fact_data 
    join baseline_data using (cmf1, cmf2, cmf3)

您可以看到,我可以使用最多3个分组变量(cmf1、cmf2、cmf3),选择任何距离属性(只要该属性在dwv_daily_allocation_fact中可用)。分组变量应该在基线表和事实表中都可用(以获取共同的组级别)。
报告:
最后一步是在BIRT(2.6.1)中创建报告,其中SQL参数部分中的参数将被数据集参数替换,并链接到报告参数。那些使用BIRT的人可能会理解,其他人必须找到其他方法。
参数选择GUI
输出报告
我仍然需要弄清楚正确的表排序方式(使历史最长的组首先出现)。
编辑:我已经在BIRT crosstab中弄清楚了排序,现在它看起来像一个真正的三角形。
如果您需要更详细的说明,请告诉我。

很高兴听到你正在取得进展。我猜测缺乏回复是因为你正在使用一些相对较为晦涩的技术栈。 - Mike Honey
@MikeHoney 可能是一种晦涩的技术,但问题是通用的 - 如果您知道 MS 技术或其他任何技术的答案,请告诉我们。 - Tomas Greif
我可能会在SSRS(报表服务)中使用RunningValue函数。 - Mike Honey

0
我创建了一个名为 pgvint 的 R 包,可以轻松计算陈旧曲线(逐步减少三角形)。该程序包位于github,目前仅支持 PostgreSQL 作为数据源。
示例输出: enter image description here

enter image description here

此外,还有一个Shiny应用程序,可以交互地以不同的布局显示老数据:

enter image description here


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接