
Best practices for writing and optimising Monitoring scenarios

Updated over 12 months ago

Here are some tips from our Data Science team on how to write scenarios so that they are dependable, performant, human-readable, and leave a clear audit trail when changes are made.

General guidelines

  1. Whenever writing a new scenario or making changes, utilise the Note field and always write a summary of what was changed. This is not only necessary for internal and external audits - it also helps other users to track changes and understand the purpose behind a change.

  2. Often during transaction monitoring, you will want to join the person table to have access to person-level data. When doing this, please make use of organisationId and personId. This narrows the search for the person whose data you wish to join and helps to speed up the query.

    join person p on 
    p.id = $personId
    and p.customer_id = $organisationId

  3. For TRANSACTION scenarios, always use $transactionTimestamp as the “current” time indicator when applying time limits to a scenario, as seen in the examples below. In other words, instead of built-in Postgres timestamps such as now() or current_date, or new.timestamp, use the placeholder $transactionTimestamp.

    When joining the person_aggregate table and setting time-specific restrictions on person_aggregate.aggregate_date, please use the cast version of the transaction timestamp - $transactionTimestamp::timestamp - as that column has a different data type.

    This placeholder fetches the timestamp from the current transaction (i.e. new.id = $transactionId) and substitutes it straight into the SQL query in a separate step. We have seen a considerable improvement in scenario performance from this, since more optimal indexes are used. Also, if now() is used in scenarios, the testing functionality cannot be used, since time ranges would be fixed at the moment the scenario test is initiated.

    So for example when joining past transactions and person_aggregate table:

    select true
    from transaction new
    join transaction past on past.person_id = $personId
    join person p on p.id = $personId
    join person_aggregate on person_aggregate.person_id = $personId
    where new.id = $transactionId
    and p.customer_id = $organisationId
    and p.type = 'INDIVIDUAL'
    and p.created_time > $transactionTimestamp - '1 month'::interval
    and past.timestamp between $transactionTimestamp - '1 days'::interval
    and $transactionTimestamp
    and person_aggregate.organisation_id = $organisationId
    and person_aggregate.aggregate_type = 'SUM_TRANSACTIONS'
    and person_aggregate.aggregate_operation = 'SUM'
    and person_aggregate.aggregate_date between $transactionTimestamp::timestamp - '1 days'::interval
    and $transactionTimestamp::timestamp

  4. It is generally a good idea to use as few subqueries as possible. Try as we might, there is usually no outsmarting the Postgres engine. The outer where clause is the main tool for narrowing the search and keeping queries fast and performant.
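
    As a minimal sketch of what this looks like in practice (the 7-day window and count threshold below are illustrative, not a recommendation), the history check sits in a flat join with all restrictions in the outer where clause, rather than in a nested subquery:

    select count(past.id) > 5
    from transaction new
    join transaction past on past.person_id = $personId
    where new.id = $transactionId
    and past.timestamp between $transactionTimestamp - '7 days'::interval
    and $transactionTimestamp
    -- the same check written as a subquery inside the select list would
    -- usually be harder for the planner to optimise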

  5. One intuitive and helpful naming convention when using aliases in queries is to use new when referring to the current (triggering) transaction and past when joining historic transactions. For other tables, we typically use the first letter(s) of the table name as an alias, e.g. person_aggregate -> pa (or past), person -> p.

    select sum(past.amount)>10000 
    ,sum(past.amount)
    from transaction new
    join transaction past on past.person_id = $personId
    join person p on p.id = $personId
    where
    new.id = $transactionId
    and past.timestamp between $transactionTimestamp -'1 days'::interval and $transactionTimestamp
    and p.customer_id = $organisationId
    and p.type = 'INDIVIDUAL'

  6. To streamline the scenario, restrict the triggering transaction (new) in the where clause as much as possible.
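
    For instance (the direction and amount filters here are illustrative), restricting new early means the rest of the query only runs for transactions that could actually trigger an alert:

    select true
    from transaction new
    join transaction past on past.person_id = $personId
    where new.id = $transactionId
    -- restrict the triggering transaction as tightly as possible:
    and new.direction = 'OUTGOING'
    and new.amount > 1000
    and past.timestamp between $transactionTimestamp - '1 days'::interval
    and $transactionTimestamp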

  7. For ease of editing scenarios, when selecting multiple fields in the select statement, place the comma at the beginning of the line rather than at the end. This makes it easier to comment out individual fields.
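
    For example (the field names below are illustrative), a single field can be commented out without touching its neighbours:

    select true
    ,new.amount
    --,new.direction
    ,new.timestamp
    from transaction new
    where new.id = $transactionId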

  8. When necessary, feel free to use CTEs. They are not the first choice, but when a subquery would otherwise be repeated several times in the main query, they can be useful.
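
    As a sketch (the 30-day window and both thresholds are illustrative), a CTE lets the same filtered transaction history feed several checks without repeating the filter:

    with recent as (
    select past.amount
    from transaction past
    where past.person_id = $personId
    and past.timestamp between $transactionTimestamp - '30 days'::interval
    and $transactionTimestamp
    )
    select (select sum(amount) from recent) > 10000
    or (select count(*) from recent) > 50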

  9. One common question relates to how to JOIN tables and in which order. Usually the Postgres query optimiser takes over, and the order of joins does not play a major role in performance. However, we recommend joining smaller tables first and, if possible, always preferring inner joins to left outer joins.

  10. To get a float number, we recommend casting with ::float instead of multiplying by 1.0, as it is more human-readable.
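
    A minimal sketch: without the ::float cast below, integer division would silently truncate the ratio to 0 or 1.

    select sum(case when past.direction = 'INCOMING' then 1 else 0 end)::float
    / count(past.id) as incoming_share
    from transaction past
    where past.person_id = $personId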

  11. Many fields in the transaction and person tables are extracted from the attributes which you send to the Salv system with your transaction or person data, and are accessible as table columns in queries in the usual way, e.g. transaction.sender_account, person.type. Fields which are not extracted into their own columns can be queried in the following way:

    1. Transaction attributes (transaction.attributes) use hstore format and require the following notation:

      1. transaction.attributes -> 'transaction_attribute'

    2. Person attributes (person.attributes) use jsonb format and require the following notation:

      1. person.attributes ->> 'person_attribute'
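
    For example, combining both notations in one query (the attribute keys 'merchant_category' and 'citizenship' are hypothetical - substitute the attribute names you actually send):

    select true
    ,new.attributes -> 'merchant_category'  -- hstore notation
    ,p.attributes ->> 'citizenship'         -- jsonb notation
    from transaction new
    join person p on p.id = $personId
    where new.id = $transactionId
    and p.customer_id = $organisationId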

  12. When checking conditions in transaction history or some other attribute, using EXISTS and/or NOT EXISTS clauses is a best practice and is much quicker than using a JOIN. You may want to make use of this when checking for a dormant account, or whether this is the first transaction of a specific type (e.g. the first incoming transaction for a business customer):

    select
    not exists(
    select 1
    from transaction new
    join person p on p.id = $personId
    where p.type in ('BUSINESS', 'LEGAL')
    and new.person_id = $personId
    and new.direction = 'INCOMING'
    and new.id != $transactionId )

    If no other transaction of this type exists, then the current one must be the first.

    select 
    *
    from person_aggregate pa
    where not exists
    (select 1 from service_provider_person sps where sps.person_id=pa.person_id) -- person is not a MSB

  13. A HAVING clause should only be used when filtering on an aggregated field, e.g. HAVING count(p.id) > 5.
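
    For example (the 7-day window and threshold are illustrative), to check whether more than 5 of the person's transactions in the past week involve the same sender account:

    select past.sender_account
    ,count(past.id)
    from transaction past
    where past.person_id = $personId
    and past.timestamp between $transactionTimestamp - '7 days'::interval
    and $transactionTimestamp
    group by past.sender_account
    having count(past.id) > 5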

  14. In order to extract median and percentiles of a distribution, please use the following notation:

    select 
    avg(extract( epoch from (ast.created_time - a.created_time))/3600) as avg_turnaround_time
    ,percentile_disc(0.50) within group (order by extract( epoch from (ast.created_time - a.created_time))/3600) as median_turnaround_time
    ,percentile_disc(0.95) within group (order by extract( epoch from (ast.created_time - a.created_time))/3600) as "95th_percentile_turnaround_time"
    ,percentile_disc(0.99) within group (order by extract( epoch from (ast.created_time - a.created_time))/3600) as "99th_percentile_turnaround_time"
    from alert a
    join alert_status ast on a.id=ast.alert_id
    where customer_id=18
    and status in ('CLOSED_AS_FALSE_POSITIVE','CLOSED_AS_TRUE_POSITIVE')
    and end_time is null

  15. The LAG function, which is used to access a previous row's data at a defined offset, is useful when checking, for example, the previous three values. It is less useful when looking further back over a time window, e.g. two hours or two days.

    with last_three_loans as ( 
    select
    ((attributes -> 'credit_count')::numeric - LAG((attributes -> 'credit_count')::numeric,
    -1) OVER (ORDER BY timestamp desc)) > 0 as closed_loans
    ,(timestamp - (attributes -> 'credit_given_date')::timestamp) <= '3 days'::interval as closing_time_short
    from transaction
    where person_id = $personId
    order by timestamp desc
    limit 3 )
    select
    count(case when closed_loans and closing_time_short THEN 1 END) > 2
    from last_three_loans

  16. The Alerting interval for your scenarios is your risk decision and will depend on the risk appetite of your organisation. When asked about alerting intervals for periodic scenarios, we recommend the following:

    1. 32 days (1 month)

    2. 63 days (2 months)

    3. 93 days (3 months)

  17. When monitoring the payment reference field for red-flag words, the following notation can come in handy:

    select reference ~* '\y($query_template_danger_words)\y', reference, amount
    from transaction where id = $transactionId

    E.g.:

    select 'PS5kas car€ as cart' ~* '\y(PS5|car)\y'
    --returns true

    select 'PS5kas car^ as cart' ~* '\y(PS5|car)\y'
    --also returns true

    select 'PS5kas scar as cart' ~* '\y(PS5|car)\y'
    --returns false

    select 'PS5kas carl as cart' ~* '\y(PS5|car)\y'
    --returns false

  18. If your organisation is using aggregations to speed up query performance, please note that the person_aggregate table is synchronously calculated/updated and can be used in real-time scenarios. However, the service_provider_aggregate table is asynchronously calculated and should not be used for real-time monitoring.

    • For example, consider a scenario which generates an alert if the current (triggering) transaction amount exceeds 2x the historic average over the last 3 months. If we calculate the average using the person_aggregate table, the current transaction is already included in the aggregates. Therefore, instead of:

      sum(case when past.aggregate_operation = 'SUM' then value end) / sum(case when past.aggregate_operation = 'COUNT' then value end) 
      as average
    • the comparison we should use is:

      (sum(case when past.aggregate_operation = 'SUM' then value end) - new.amount) / (sum(case when past.aggregate_operation = 'COUNT' then value end) - 1)
    • In this case, the overall scenario logic would be:

      select 
      --logic test
      --parameters for alert details
      from transaction new
      join person_aggregate past on new.person_id=past.person_id
      where new.id = $transactionId
      and past.aggregate_date >= date(new.timestamp) - '3 months'::interval
      and past.aggregate_date <= new.timestamp

Some further useful tricks

  • ~* matches a regular expression, case-insensitively: 'thomas' ~* '.*Thomas.*' (see the PostgreSQL pattern matching documentation).

  • For TRANSACTION scenarios always use $transactionTimestamp as the “current” time indicator when limiting scenario time wise as seen in examples above (instead of inbuilt Postgres current timestamps like now() or current_date).
