Data Transformations - User Guide
NOTE: This feature is considered and Administrative/Power User function. This functionality is only available to Administrative users. If this option is required and doesn't appear in the Navigation Bar, contact Alterian Support to raise a request for access.
Overview & Background
The number of unique Location values in data being collected from website tags in particular, or other integration code via DDE for the Brand in question, can be an issue to give useful Insights and Analytics to Alterian Customers, due to extended URL strings.
Items in the full URL such as GUID’s, #fragments, sessionid’s etc, can cause “like” Locations to become unique due the addition of these items in the URL.
For example in the URL
AcadianFinance/myportal/myaccount/paycreditcard/#Z7_31MEH4C0NGQ450QO24SVRO1SI2
the /# section is for session identification, which within the Locations column, is not desired as it makes the /paycreditcard unique for every Bank Customer and therefore cannot be used to analyse common behaviour. To facilitate analysis of, for example, behaviour leading up to, or after the /paycreditcard Location, it must be a common Location across all Bank Customers.
Therefore the URL above needs to be “transformed” to simply
AcadianFinance/myportal/myaccount/paycreditcard
This feature will allow the definition of a comprehensive Data Transform against the Location column that can be inserted into the execution of both Journey Visualisation (Sankey) and JA Dashboards to give more power to the Location analytics
IMPORTANT NOTE: At time of writing (CX 6.21), most clients will have a form of this Data Transformation functionality in place, specifically for iLoader exports to, predominantly, the JA Dashboards. This is designed to simply remove the very low occurring Locations. By way of a safety measure, any “low frequency cut” transforms already existing as part of the Power BI JA Dashboard creation, will still be in place after the installation of CX 6.22, and will be executed AFTER, any new Data Transform published via this feature. This is to ensure that functionality is preserved.
Accessing Data Transformation
The Data Transform feature is only available to those in the Administrators Group within the CX based security and is therefore off by default for analytics user. The analyst will “use” the transform when it is assigned by the Power User, but generally would not be hands on in its creation, It is important though that the analyst has input, as discussed later.
Once available simply click on Data Transformations in the CX Navigation Bar under the Data Preparation section.
This will display the homepage grid that simply displays any existing Transformation on the system.
Creating a New Data Transform
The new icon allows the creation of a new Transform
Source Data
Selection of the Source Data allows the user to select the Event Stream that the Transform will be built against and therefore assigned to.
A single Transform can be applied to one Event Stream/Target only, and therefore currently each Target must have its own Transform.
This section allows the definition of the column for the Transform to be created on. In this release, the only available option is the Location column, but in future this will be enhanced to allow, for example, Category Transforms.
Sampling
From a performance perspective, it may not practical to build and test a Transform on an entire data set. Therefore we allow the user to defined a sample count and type, with a maximum of 1 million, selecting first, latest or random. Exceeding this limit will result in an error situation. If this occurs simply reduce the sample size.
Transformations
This section of the UI allows the user to build a Transformation for use with the Visualisations using a number of tools provided.
The left hand section allows the addition of Data Transformation functions to be applied to the data in series.
To add the first Transform use the option
This will populate with a new Data Transform function placeholder, and displays a count of the number of unique Locations in the sampled data. Each additional Data Transformation function will display the count of unique Locations after that function has been performed.
IMPORTANT TO REMEMBER: A Journey Visualisation process has a limit of 32,000 discrete Locations to be able to build the visualisation. So viewing the count of discrete Locations as the user builds a transform is useful to ensure a Journey Visualisation will display. The focus must also be to work with analysts to define the section of the URL they want to analyse. The Post-Transform result, is the label that will be given to Journey Visualisation Nodes and data displayed in the Dashboard. It is imperative that this transform is done in conjunction with, and with the agreement of, the CX Analyst.
The right hand of the screen contains a two section grid displaying the resulting Location before and after the series of functions has been applied, labelled as pre and post transform.
The first and second columns are the frequency count and the Pre-Transform Locations in the sampled data.
The third and fourth columns are the frequency count and the Post-Transform Locations in the sampled data.
Both frequencies can be sorted for flexible ease of use.
A search box in available on both the Pre and Post Transforms for ease of use allowing search for specific strings that are know to require Transform, or to check the Post Transform to help validate the function performed.
A toggle allows the view of All Data or filtered to Transformed and Non-Transformed.
These three filter options work in conjunction with each other at all times
Important Usability Notes:
When building Transforms it is important to note the order in which the Transformation function are placed in. They run is sequence from top to bottom, so the result of one Transform function passes the result to the next, so the order of these is important.
For example, a function that Transforms Locations of low occurrence, for example remove any Locations that occur less than 10 times, should only be done at the very end of the full Transform, Low occurrence Locations may become more common, when additions strings such as # fragments are removed. Therefore the Find, replace functions, such as described above with the /paycreditcard example, should be done before any low frequency cuts. If the low frequency cut was aced first, the /paycreditcard paths with the sessionId would be transformed, possible to an undesirable label
For example, the URL below may be unique (only occur once) in the data due to the /# fragment.
myportal/acadianfinance/myaccount/transfer/#Z7_31MEH4C0NGQ450QO24SVRO1SI2
If a low frequency cut is placed at the start, this would be Transformed to a generic value, whereas the correct approach would be to initially have a transform that removes that /# fragment first to create
myportal/acadianfinance/myaccount/transfer
Which is likely now, to be a very common and high frequency Location and very desirable in the data.
Creating a Transform Process
Select the Create New option and save.
Set the Event Stream and Sampling
Create the Transform based on the available functions below
Available Functions
The following functions exists for use in the UI
Replace a full path if it contains specific characters
This function performs a full replacement of the Location value if it contains the specific character. This might typically be used to transform low frequency Locations with a static value
Define the Qualifier, Target characters(s) and replacement value.
An optional Frequency input is available to allow this to be performed low frequency URL’s. Leaving this option blank performs on all column values
Find and replace specific characters within a path
This function replaces the defined Target, with the defined Replacement, leaving the rest of the URL in place. This is typically used without a Frequency value to strip out or replace sections of a Location, perhaps with a more user friendly label.
Define Target and Replacement
An optional Frequency input is available to allow this to be performed low frequency URL’s. Leaving this option blank performs on all column values
Find and Replace before value(s)
This function Matches the Target string, and replaces the Target along with all text to the left of the target with the Replacement. The UI allows definition of either the first or last occurrence of the Target in the Location value. This is typically used to remove everything to the left of, for example, the domain, removing the https:// etc
Define the Qualifier, Target and Replacement.
An optional Frequency input is available to allow this to be performed low frequency URL’s. Leaving this option blank performs on all column values
Find and replace after value(s)
This function Matches the Target string, and replaces the Target along with all text to the right of the target with the Replacement. The UI allows definition of either the first or last occurrence of the Target in the Location value. This is typically used to remove everything after, for example a /# section of a URL path, which would typically be replaced with nothing i.e. removed
Define the Qualifier, Target and Replacement.
An optional Frequency input is available to allow this to be performed low frequency URL’s. Leaving this option blank performs on all column values
Transform using RegEx
This function allows the definition of the Target as a Regex expression. Based non Regex knowledge, this can have a broad usage.
Example: define Target as \d+ and replacing with blank would remove all numeric characters from the Location
Example: Define Target as ^[/-] removes any leading characters that are not letters i.e. “/” or “-”
NOTE: Some characters are specific to Regex, using these incorrectly may cause an undefined error. Always refer to regex syntax documentation
Lowercase all values
Replaces all string values to their lowercase value.
Fine and replace by frequency
This function performs a replace of a URL or other Location value, when the number of occurrences, or frequency, is less that the defined value. Replacing with no value would mean this Location is not considered in Sankey processing. This is typically used to trim off Location that are very spare in the data and not of value.
Define Replace with value and frequency threshold
Find and replace by string length
This function performs a replace of a URL or other Location value, based on the length of the string. Replaces with no value would mean this Location is not considered in Sankey processing. This is typically used to trim off Location that are very spare in the data and not of value.
Define Replace with value and minimum length of value to be transformed
An optional Frequency input is available to allow this to be performed low frequency URL’s. Leaving this option blank performs on all column values.
Assigning Transforms to Visualisations
Once a Transform has been created, it can be assigned to the Visualisations. This will mean that the Transform will execute when extracting Event data for analysis in any of the existing Visualisations.
Form the File menu simply select, Add to Visualisations
Important Usability Notes:
Once a transform is assigned, it will be used by all Analysts on all visualisation, so working to ensure they approve the Transform and then see its us in the application and have the chance to feed back, is a very important part of the process. Creations of Transforms may well be iterative
Once a Transform has been assigned, it can still be worked on in the background. The Journey Visualisation screen will display a label of the Transform in use, along with its version. The Data Transform Homepage will display the current version of the document, which may not be the same as the published version.
There is currently no feature to un-assign, so if the Transform is required to be removed, it is suggested a blank Transform is created saved, and assigned.