Here are some tips from our Data Science team on how to write scenarios so that they are dependable, performant, and human-readable, and have a clear audit trail for when changes are made.
General guidelines
Whenever writing a new scenario or making changes, utilise the Note field and always write a summary of what was changed. This is not only necessary for both internal and external audits - it also helps other users to track changes and understand the purpose behind a change.

Often during transaction monitoring, you will want to join the person table to have access to person-level data. When doing this, please make use of organisationId and personId. This narrows the search to the person whose data you wish to join and helps to speed up the query.

join person p on
p.id = $personId
and p.customer_id = $organisationId

For TRANSACTION scenarios always use $transactionTimestamp as the “current” time indicator when limiting the scenario time-wise, as seen in the examples below. So instead of the built-in Postgres current timestamps like now() or current_date, and instead of new.timestamp, use the placeholder $transactionTimestamp.

When joining the person_aggregate table and setting time-specific restrictions on person_aggregate.aggregate_date, please use the casted version of the transaction timestamp - $transactionTimestamp::timestamp - as the column has a different type.

This placeholder fetches the timestamp from the current transaction (i.e. new.id = $transactionId) and substitutes it straight into the SQL query in a separate step. We have seen a considerable improvement in scenario performance since more optimal indexes are used. Also, if now() is used in scenarios, the testing functionality is not usable, since time ranges will be fixed as of initiating a scenario test.

So for example, when joining past transactions and the person_aggregate table:
select true
from transaction new
join transaction past on past.person_id = $personId
join person p on p.id = $personId
join person_aggregate on person_aggregate.person_id = $personId
where new.id = $transactionId
and p.customer_id = $organisationId
and p.type = 'INDIVIDUAL'
and p.created_time > $transactionTimestamp - '1 month'::interval
and past.timestamp between $transactionTimestamp - '1 days'::interval
and $transactionTimestamp
and person_aggregate.organisation_id = $organisationId
and person_aggregate.aggregate_type = 'SUM_TRANSACTIONS'
and person_aggregate.aggregate_operation = 'SUM'
and person_aggregate.aggregate_date between $transactionTimestamp::timestamp - '1 days'::interval
and $transactionTimestamp::timestamp

It is generally a good idea to use as few subqueries as possible. Try as we might, there is usually no outsmarting the Postgres engine. The outer where clause is the main tool to narrow the search and make queries fast and performant.

One naming convention that is intuitive and helpful when using aliases in queries is to use new when referring to the current (triggering) transaction and past when joining historic transactions. For other tables, we typically use the first letter(s) as an alias to abbreviate the full table name, e.g. person_aggregate -> pa (or past), person -> p.
select sum(past.amount)>10000
,sum(past.amount)
from transaction new
join transaction past on past.person_id = $personId
join person p on p.id = $personId
where
new.id = $transactionId
and past.timestamp between $transactionTimestamp -'1 days'::interval and $transactionTimestamp
and p.customer_id = $organisationId
and p.type = 'INDIVIDUAL'

In order to streamline the scenario, try to limit the scenario on the triggering transaction (new) in the where clause as much as possible.

For ease of editing scenarios, when selecting certain fields in the select statement, place the comma at the beginning of the row rather than at the end. This makes it easier to comment out individual fields.

When necessary, feel free to use CTEs. They are not the first choice, but when a subquery would be used several times in the main query, they can be useful, as sketched below.
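For illustration, a minimal sketch of a CTE whose result is reused several times in the main query. The daily_totals name, the 30-day window and the "busiest day vs. average day" comparison are assumptions made up for this example, not part of any existing scenario; note also the leading commas, which make individual fields easy to comment out:

with daily_totals as (
select
date(past.timestamp) as day
,sum(past.amount) as total_amount
from transaction past
where past.person_id = $personId
and past.timestamp between $transactionTimestamp - '30 days'::interval and $transactionTimestamp
group by date(past.timestamp) )
select
max(total_amount) > 2 * avg(total_amount) -- logic test: the busiest day is more than double the average day
,max(total_amount)
,avg(total_amount)
from daily_totals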
One common question relates to how to JOIN tables and in which order. Usually the Postgres engine and query optimiser take over, and the order of joins does not play a major role in performance. However, we recommend joining smaller tables first and, if possible, always preferring inner joins to left outer joins, as in the short sketch below.
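A small illustration of the inner-join preference, assuming the scenario only applies when the person row actually exists (the tables and placeholders are the same ones used throughout this article):

select true
from transaction new
join person p on p.id = $personId -- inner join: the person row is required, so the planner can filter early
-- left join person p on p.id = $personId -- avoid when the person row is required anyway
where new.id = $transactionId
and p.customer_id = $organisationId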
In order to get a float number, we recommend casting using ::float instead of multiplying by *1.0, as it is more human-readable.

Many fields in the transaction and person tables are extracted from the attributes which you send to the Salv system with your transaction or person data, and are accessible as table columns in queries in the usual way, e.g. transaction.sender_account, person.type. Other fields which are not extracted into their own columns can be queried in the following way:

Transaction attributes (transaction.attributes) use the hstore format and require the following notation: transaction.attributes -> 'transaction_attribute'

Person attributes (person.attributes) use the jsonb format and require the following notation: person.attributes ->> 'person_attribute'
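As a sketch of both notations together, with a ::float cast thrown in - the attribute keys 'channel' and 'monthly_income' and the value 'MOBILE' are hypothetical examples, not fields guaranteed to exist in your data:

select
new.attributes -> 'channel' = 'MOBILE' -- hstore attribute on the transaction
,(p.attributes ->> 'monthly_income')::float -- jsonb attribute on the person, cast to float
,new.amount::float / nullif((p.attributes ->> 'monthly_income')::float, 0) -- transaction as a share of monthly income
from transaction new
join person p on p.id = $personId
where new.id = $transactionId
and p.customer_id = $organisationId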
When checking conditions in transaction history or some other attribute, using EXISTS and/or NOT EXISTS clauses is a best practice and is much quicker than using a JOIN. You may want to make use of this when checking for a dormant account, or when checking whether this is the first transaction of a specific type (e.g. the first incoming transaction of a business customer):
select
not exists(
select
1
from transaction new
join person p on p.id = $personId
where p.type in ('BUSINESS', 'LEGAL')
and new.direction = 'INCOMING'
and new.person_id = $personId
and new.id != $transactionId ) -- no other transaction of this type exists for this person, so the current one must be the first
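Along the same lines, a hedged sketch of a dormancy check using NOT EXISTS - the 6-month window is an assumption chosen for illustration, not a recommended threshold:

select
not exists(
select
1
from transaction past
where past.person_id = $personId
and past.id != $transactionId
and past.timestamp between $transactionTimestamp - '6 months'::interval
and $transactionTimestamp ) -- no other activity in the last 6 months, so the account was dormant before this transaction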
select
*
from person_aggregate pa
where not exists
(select 1 from service_provider_person sps where sps.person_id = pa.person_id) -- person is not an MSB

If you use a HAVING clause at the end, it should only be used when filtering on an aggregated field, e.g. HAVING count(p.id) > 5.
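For instance, a minimal sketch that only returns a row once a person has made more than five transactions in the last day - the threshold of five and the 1-day window are assumptions for illustration:

select
p.id
,count(past.id) -- number of transactions in the last day
from transaction past
join person p on p.id = $personId
where past.person_id = $personId
and p.customer_id = $organisationId
and past.timestamp between $transactionTimestamp - '1 days'::interval and $transactionTimestamp
group by p.id
having count(past.id) > 5 -- the aggregated condition belongs in HAVING, not WHERE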
In order to extract the median and percentiles of a distribution, please use the following notation:
select
avg(extract( epoch from (ast.created_time - a.created_time))/3600) as avg_turnaround_time
,percentile_disc(0.50) within group (order by extract( epoch from (ast.created_time - a.created_time))/3600) as median_turnaround_time
,percentile_disc(0.95) within group (order by extract( epoch from (ast.created_time - a.created_time))/3600) as "95th_percentile_turnaround_time"
,percentile_disc(0.99) within group (order by extract( epoch from (ast.created_time - a.created_time))/3600) as "99th_percentile_turnaround_time"
from alert a
join alert_status ast on a.id=ast.alert_id
where customer_id=18
and status in ('CLOSED_AS_FALSE_POSITIVE','CLOSED_AS_TRUE_POSITIVE')
and end_time is null

The LAG function, which is used to access a previous row's data at a defined offset, is useful when checking, for example, the previous three values. It is not useful when looking further back in time, e.g. two hours, two days, etc.
with last_three_loans as (
select
((attributes -> 'credit_count')::numeric - LAG((attributes -> 'credit_count')::numeric,
-1) OVER (ORDER BY timestamp desc)) > 0 as closed_loans
,(timestamp - (attributes -> 'credit_given_date')::timestamp) <= '3 days'::interval as closing_time_short
from transaction
where person_id = $personId
order by timestamp desc
limit 3 )
select
count(case when closed_loans and closing_time_short THEN 1 END) > 2
from last_three_loans

The alerting interval for your scenarios is your risk decision and will depend on the risk appetite of your organisation. When asked about alerting intervals for periodic scenarios, we recommend the following:
32 days (1 month)
63 days (2 months)
93 days (3 months)
When monitoring the payment reference field for red-flag words, the following notation can come in handy:

select reference ~* '\y($query_template_danger_words)\y', reference, amount
from transaction where id = $transactionId

E.g.:
select 'PS5kas car€ as cart' ~* '\y(PS5|car)\y'
--returns true
select 'PS5kas car^ as cart' ~* '\y(PS5|car)\y'
--also returns true
select 'PS5kas scar as cart' ~* '\y(PS5|car)\y'
--returns false
select 'PS5kas carl as cart' ~* '\y(PS5|car)\y'
--returns false

If your organisation is using aggregations to speed up query performance, please note that the person_aggregate table is synchronously calculated/updated and can be used in real-time scenarios. However, the service_provider_aggregate table is asynchronously calculated and should not be used for real-time monitoring.

For example, consider a scenario which generates an alert if the current (triggering) transaction amount exceeds 2x the historic average of the last 3 months. If we calculate the average using the person_aggregate table, the current transaction is already included. Therefore, instead of:

sum(case when past.aggregate_operation = 'SUM' then value end) / sum(case when past.aggregate_operation = 'COUNT' then value end) as average

the comparison we should use is:

(sum(case when past.aggregate_operation = 'SUM' then value end) - new.amount) / (sum(case when past.aggregate_operation = 'COUNT' then value end) - 1)

In this case, the overall scenario logic would be:
select
--logic test
--parameters for alert details
from transaction new
join person_aggregate past on new.person_id=past.person_id
where new.id = $transactionId
and past.aggregate_date >= date(new.timestamp) - '3 months'::interval
and past.aggregate_date <= new.timestamp
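Filling in the skeleton, a hedged sketch of what the full query could look like - the 2x multiplier, the 'SUM_TRANSACTIONS' aggregate type and the nullif() guard against division by zero are assumptions for illustration, not a prescribed implementation:

select
new.amount > 2 * (
  (sum(case when past.aggregate_operation = 'SUM' then past.value end) - new.amount)
  / nullif(sum(case when past.aggregate_operation = 'COUNT' then past.value end) - 1, 0)
) -- logic test: current amount exceeds 2x the historic average, excluding the current transaction
,new.amount -- parameter for alert details
from transaction new
join person_aggregate past on new.person_id = past.person_id
where new.id = $transactionId
and past.aggregate_type = 'SUM_TRANSACTIONS'
and past.aggregate_date >= date(new.timestamp) - '3 months'::interval
and past.aggregate_date <= new.timestamp
group by new.id, new.amount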
Some further useful tricks
~* matches a regular expression, case-insensitively: 'thomas' ~* '.*Thomas.*' - link to documentation.

For TRANSACTION scenarios always use $transactionTimestamp as the “current” time indicator when limiting the scenario time-wise, as seen in the examples above (instead of the built-in Postgres current timestamps like now() or current_date).