Here are some tips from our Data Science team on how to write scenarios so that they are dependable, performant, and human-readable, and have a clear audit trail for when changes are made.
General guidelines
Whenever writing a new scenario or making changes, utilise the Note field and always write a summary of what was changed. This is not only necessary for both internal and external audits - it also helps other users to track changes and understand the purpose behind a change.

Often during transaction monitoring, you will want to join the person table to have access to person-level data. When doing this, please make use of organisationId and personId. This narrows the search to the person whose data you wish to join and helps to speed up the query.

join person p on
p.id = $personId
and p.customer_id = $organisationId

For TRANSACTION scenarios always use $transactionTimestamp as the “current” time indicator when limiting the scenario time-wise, as seen in the examples below. So instead of the built-in Postgres current timestamps like now() or current_date, and instead of new.timestamp, use the placeholder $transactionTimestamp.

When joining the person_aggregate table and setting time-specific restrictions on person_aggregate.aggregate_date, please use the casted version of the transaction timestamp - $transactionTimestamp::timestamp - as the column has a different type.

This placeholder fetches the timestamp from the current transaction (i.e. new.id = $transactionId) and substitutes it straight into the SQL query in a separate step. We have seen a considerable improvement in scenario performance since more optimal indexes are used. Also, if now() is used in scenarios, the testing functionality is not usable, since time ranges will be fixed as of initiating a scenario test.

So for example, when joining past transactions and the person_aggregate table:
select true
from transaction new
join transaction past on past.person_id = $personId
join person p on p.id = $personId
join person_aggregate on person_aggregate.person_id = $personId
where new.id = $transactionId
and p.customer_id = $organisationId
and p.type = 'INDIVIDUAL'
and p.created_time > $transactionTimestamp - '1 month'::interval
and past.timestamp between $transactionTimestamp - '1 days'::interval
and $transactionTimestamp
and person_aggregate.organisation_id = $organisationId
and person_aggregate.aggregate_type = 'SUM_TRANSACTIONS'
and person_aggregate.aggregate_operation = 'SUM'
and person_aggregate.aggregate_date between $transactionTimestamp::timestamp - '1 days'::interval
and $transactionTimestamp::timestamp

It is generally a good idea to use as few subqueries as possible. Try as we might, there is usually no outsmarting the Postgres engine. The outer where clause is the main tool to narrow the search and make queries fast and performant.

One naming convention that is intuitive and helpful when using aliases in queries is to use new when referring to the current (triggering) transaction and past when joining historic transactions. For other tables, we typically use the first letter(s) as an alias to abbreviate the full table name, e.g. person_aggregate -> pa (or past), person -> p.
select sum(past.amount)>10000
,sum(past.amount)
from transaction new
join transaction past on past.person_id = $personId
join person p on p.id = $personId
where
new.id = $transactionId
and past.timestamp between $transactionTimestamp -'1 days'::interval and $transactionTimestamp
and p.customer_id = $organisationId
and p.type = 'INDIVIDUAL'

In order to streamline the scenario, try to limit the scenario on the triggering transaction (new) in the where clause as much as possible.

For ease of editing scenarios, when selecting certain fields in the select statement, place the comma at the beginning of the row rather than at the end. This makes it easier to comment out individual fields.

When necessary, feel free to use CTEs. They are not the first choice, but when a subquery would be used several times in the main query, they can be useful, as sketched below.
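For illustration, a minimal sketch of a CTE whose result is reused several times in the main query. The daily_totals name, the 30-day window and the "busiest day vs. average day" comparison are assumptions made up for this example, not part of any existing scenario; note also the leading commas, which make individual fields easy to comment out:

with daily_totals as (
select
date(past.timestamp) as day
,sum(past.amount) as total_amount
from transaction past
where past.person_id = $personId
and past.timestamp between $transactionTimestamp - '30 days'::interval and $transactionTimestamp
group by date(past.timestamp) )
select
max(total_amount) > 2 * avg(total_amount) -- logic test: the busiest day is more than double the average day
,max(total_amount)
,avg(total_amount)
from daily_totals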
One common question relates to how to JOIN tables and in which order. Usually the Postgres engine and query optimiser take over, and the order of joins does not play a major role in performance. However, we recommend joining smaller tables first and, if possible, always preferring inner joins to left outer joins, as in the short sketch below.
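A small illustration of the inner-join preference, assuming the scenario only applies when the person row actually exists (the tables and placeholders are the same ones used throughout this article):

select true
from transaction new
join person p on p.id = $personId -- inner join: the person row is required, so the planner can filter early
-- left join person p on p.id = $personId -- avoid when the person row is required anyway
where new.id = $transactionId
and p.customer_id = $organisationId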
In order to get a float number, we recommend casting using ::float instead of multiplying by *1.0, as it is more human-readable.

Many fields in the transaction and person tables are extracted from the attributes which you send to the Salv system with your transaction or person data, and are accessible as table columns in queries in the usual way, e.g. transaction.sender_account, person.type. Other fields which are not extracted into their own columns can be queried in the following way:

Transaction attributes (transaction.attributes) use the hstore format and require the following notation: transaction.attributes -> 'transaction_attribute'

Person attributes (person.attributes) use the jsonb format and require the following notation: person.attributes ->> 'person_attribute'
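As a sketch of both notations together, with a ::float cast thrown in - the attribute keys 'channel' and 'monthly_income' and the value 'MOBILE' are hypothetical examples, not fields guaranteed to exist in your data:

select
new.attributes -> 'channel' = 'MOBILE' -- hstore attribute on the transaction
,(p.attributes ->> 'monthly_income')::float -- jsonb attribute on the person, cast to float
,new.amount::float / nullif((p.attributes ->> 'monthly_income')::float, 0) -- transaction as a share of monthly income
from transaction new
join person p on p.id = $personId
where new.id = $transactionId
and p.customer_id = $organisationId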
When checking conditions in transaction history or some other attribute, using EXISTS and/or NOT EXISTS clauses is a best practice and is much quicker than using a JOIN. You may want to make use of this when checking for a dormant account, or when checking whether this is the first transaction of a specific type (e.g. the first incoming transaction of a business customer):
select
not exists(
select
1
from transaction new
join person p on p.id = $personId
where p.type in ('BUSINESS', 'LEGAL')
and new.direction = 'INCOMING'
and new.person_id = $personId
and new.id != $transactionId ) -- no other transaction of this type exists for this person, so the current one must be the first
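Along the same lines, a hedged sketch of a dormancy check using NOT EXISTS - the 6-month window is an assumption chosen for illustration, not a recommended threshold:

select
not exists(
select
1
from transaction past
where past.person_id = $personId
and past.id != $transactionId
and past.timestamp between $transactionTimestamp - '6 months'::interval
and $transactionTimestamp ) -- no other activity in the last 6 months, so the account was dormant before this transaction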
select
*
from person_aggregate pa
where not exists
(select 1 from service_provider_person sps where sps.person_id = pa.person_id) -- person is not an MSB

If you use a HAVING clause at the end, it should only be used when filtering on an aggregated field, e.g. HAVING count(p.id) > 5.
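For instance, a minimal sketch that only returns a row once a person has made more than five transactions in the last day - the threshold of five and the 1-day window are assumptions for illustration:

select
p.id
,count(past.id) -- number of transactions in the last day
from transaction past
join person p on p.id = $personId
where past.person_id = $personId
and p.customer_id = $organisationId
and past.timestamp between $transactionTimestamp - '1 days'::interval and $transactionTimestamp
group by p.id
having count(past.id) > 5 -- the aggregated condition belongs in HAVING, not WHERE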
In order to extract the median and percentiles of a distribution, please use the following notation:
select
avg(extract( epoch from (ast.created_time - a.created_time))/3600) as avg_turnaround_time
,percentile_disc(0.50) within group (order by extract( epoch from (ast.created_time - a.created_time))/3600) as median_turnaround_time
,percentile_disc(0.95) within group (order by extract( epoch from (ast.created_time - a.created_time))/3600) as "95th_percentile_turnaround_time"
,percentile_disc(0.99) within group (order by extract( epoch from (ast.created_time - a.created_time))/3600) as "99th_percentile_turnaround_time"
from alert a
join alert_status ast on a.id=ast.alert_id
where customer_id=18
and status in ('CLOSED_AS_FALSE_POSITIVE','CLOSED_AS_TRUE_POSITIVE')
and end_time is null

The LAG function, which is used to access a previous row's data at a defined offset, is useful when checking, for example, the previous three values. It is not useful when looking further back in time, e.g. two hours, two days, etc.
with last_three_loans as (
select
((attributes -> 'credit_count')::numeric - LAG((attributes -> 'credit_count')::numeric,
-1) OVER (ORDER BY timestamp desc)) > 0 as closed_loans
,(timestamp - (attributes -> 'credit_given_date')::timestamp) <= '3 days'::interval as closing_time_short
from transaction
where person_id = $personId
order by timestamp desc
limit 3 )
select
count(case when closed_loans and closing_time_short THEN 1 END) > 2
from last_three_loans

The alerting interval for your scenarios is your risk decision and will depend on the risk appetite of your organisation. When asked about alerting intervals for periodic scenarios, we recommend the following:
32 days (1 month)
63 days (2 months)
93 days (3 months)
When monitoring the payment reference field for red-flag words, the following notation can come in handy:

select reference ~* '\y($query_template_danger_words)\y', reference, amount
from transaction where id = $transactionId

E.g.:
select 'PS5kas car€ as cart' ~* '\y(PS5|car)\y'
--returns true
select 'PS5kas car^ as cart' ~* '\y(PS5|car)\y'
--also returns true
select 'PS5kas scar as cart' ~* '\y(PS5|car)\y'
--returns false
select 'PS5kas carl as cart' ~* '\y(PS5|car)\y'
--returns false

If your organisation is using aggregations to speed up query performance, please note that the person_aggregate table is synchronously calculated/updated and can be used in real-time scenarios. However, the service_provider_aggregate table is asynchronously calculated and should not be used for real-time monitoring.

For example, consider a scenario which generates an alert if the current (triggering) transaction amount exceeds 2x the historic average of the last 3 months. If we calculate the average using the person_aggregate table, the current transaction is already included. Therefore, instead of:

sum(case when past.aggregate_operation = 'SUM' then value end) / sum(case when past.aggregate_operation = 'COUNT' then value end) as average

the comparison we should use is:

(sum(case when past.aggregate_operation = 'SUM' then value end) - new.amount) / (sum(case when past.aggregate_operation = 'COUNT' then value end) - 1)

In this case, the overall scenario logic would be:
select
--logic test
--parameters for alert details
from transaction new
join person_aggregate past on new.person_id=past.person_id
where new.id = $transactionId
and past.aggregate_date >= date(new.timestamp) - '3 months'::interval
and past.aggregate_date <= new.timestamp
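Filling in the skeleton, a hedged sketch of what the full query could look like - the 2x multiplier, the 'SUM_TRANSACTIONS' aggregate type and the nullif() guard against division by zero are assumptions for illustration, not a prescribed implementation:

select
new.amount > 2 * (
  (sum(case when past.aggregate_operation = 'SUM' then past.value end) - new.amount)
  / nullif(sum(case when past.aggregate_operation = 'COUNT' then past.value end) - 1, 0)
) -- logic test: current amount exceeds 2x the historic average, excluding the current transaction
,new.amount -- parameter for alert details
from transaction new
join person_aggregate past on new.person_id = past.person_id
where new.id = $transactionId
and past.aggregate_type = 'SUM_TRANSACTIONS'
and past.aggregate_date >= date(new.timestamp) - '3 months'::interval
and past.aggregate_date <= new.timestamp
group by new.id, new.amount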
Some further useful tricks
~* matches a regular expression, case-insensitively: 'thomas' ~* '.*Thomas.*' - link to documentation.

For TRANSACTION scenarios always use $transactionTimestamp as the “current” time indicator when limiting the scenario time-wise, as seen in the examples above (instead of the built-in Postgres current timestamps like now() or current_date).