R in Insurance, London 2016
The fourth R in Insurance conference took place at Cass Business School, London on 11 July 2016.
Agenda
[09:00 - 09:05] Introduction
- Opening Remarks (Andreas Tsanakas, Cass Business School) [slides]
[09:05 - 11:00] Session 1: Data and Technical Solutions
- (New) Challenges in Actuarial Science (Mario Wüthrich, RiskLab ETH Zurich) [slides]
- Acquiring External Data with R (Mark Chisholm, XLCatlin) [slides]
- Efficient, consistent and flexible Credit Risk simulation with TRNG and RcppParallel (Riccardo Porreca, Mirai Solutions) [slides]
- Grid Computing in R with Easy Scalability in the Cloud (Jonathan Adams, ARMtech Insurance Services) [slides]
[11:00 - 11:30] Coffee break
[11:30 - 12:30] Session 2: Lightning talks
- Investigating the correlation between month of birth and diagnosis of specific diseases (David Smith) [slides]
- Measuring the Length of the Great Recession via Lapse Rates: A Bayesian Approach to Change-Point Detection (Michael Crawford, Applied AI) [slides]
- Data Science vs Actuary: A Perspective using Shiny and HTMLWidgets (Richard Pugh, Mango) [slides]
- estudy2: an R package for the event study in insurance (Iegor Rudnytskyi, University of Lausanne) [slides]
- R as a Service (Mark Sellors, Mango Solutions) [slides]
- RPGM, an example of use with IBNR (Nicolas Baradel, PGM Solutions) [slides]
[12:30 - 13:30] Lunch
[13:30 - 14:30] Session 3: Insurance and statistical modelling in R
- Telematics insurance: Impact on tarification (Roel Verbelen, KU Leuven) [slides]
- Modelling the impact of reserving in high inflation environments (Marcela Granados, EY) [slides]
- An R package of a partial internal model for life insurance (Jinsong Zheng, Talanx AG / University of Duisburg-Essen) [slides]
[14:30 - 15:00] Panel discussion:
- Analytics: Transforming Insurance Businesses
[15:00 - 15:30] Coffee
[15:30 - 17:30] Session 4: Case studies with R in action
- Global Teleconnections (Sundeep Chahal, Lloyd’s) [no slides]
- Probabilistic Graphical Models for Detecting Underwriting Fraud (Mick Cooney, Applied AI) [slides]
- R, Shiny and the Oasis Loss Modelling Framework – a toolkit for Catastrophe modelling (Mark Pinkerton, OASIS) [slides]
- Persuasive Advice for Senior Management: the Three-C’s (Dan Murphy, Trinostics) [slides]
- Announcement R in Insurance 2017 [slides]
[18:30 - 22:00] Conference Dinner
Abstracts
(New) Challenges in Actuarial Science
Mario Wüthrich, RiskLab ETH Zurich
Actuarial functions are currently going through massive changes. Many of these changes are data driven; in particular, data analytics and machine learning techniques will likely play a key role in actuarial science.
This talk gives actuarial examples of these (new) challenges and highlights where the R community can support the actuarial profession.
Acquiring External Data with R
Mark Chisholm, XLCatlin
Many web sites provide data that may be relevant to insurers. This presentation will demonstrate how R can be a helpful tool for acquiring external data that is available on the internet.
Data is a scarce asset in many classes of business in the insurance industry, which makes it difficult to perform multivariate analysis. One reason this problem arises is because an insurer’s internal databases may only accurately record a handful of policy attributes, which limits the number of variables that can be studied. Another reason is that a company may have only recently entered a new class of business, which limits the number of policyholders that can be included in an analysis.
Underwriters can benefit from data available on web sites, but extracting this information can be manual and time-intensive. There are packages available to R users who wish to gather data from web sites without having to learn a new programming language. For example, the RSelenium package provides R bindings for the Selenium WebDriver. While this tool is commonly used for testing web applications, its ability to automate web browsers makes it useful for obtaining data from web sites. In addition, the rvest package can be helpful for extracting tables from web sites and storing them in R data frames. Once this information is downloaded and organised, it can be used to enrich an insurer’s internal data.
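As a flavour of the approach, the short sketch below uses rvest to pull an HTML table into a data frame; the URL and selector are placeholders rather than a data source from the talk.

```r
# A minimal sketch using rvest; the URL below is a placeholder and any page
# containing an HTML table could be substituted.
library(rvest)

url  <- "https://example.com/industry-statistics"   # hypothetical page
page <- read_html(url)

# Extract the first HTML table on the page into a data frame
tbl <- html_table(html_element(page, "table"))

# Once organised, the table can be joined to an insurer's internal data
head(tbl)
```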
Attendees will learn:
- How R packages can be used to extract data from web sites, and their relevance to insurance
- Examples of RSelenium and rvest in action
- Considerations when accessing data from web sites
Efficient, consistent and flexible Credit Risk simulation with TRNG and RcppParallel
Riccardo Porreca, Mirai Solutions
We will show how we have combined RcppParallel with TRNG (Tina’s Random Number Generation) to achieve efficient, flexible and reproducible parallel Monte Carlo simulations in modelling credit default risk in correlation with market risk.
For rare yet correlated default events in large portfolios (with securities from several thousand counterparties), a substantial simulation effort is required to produce meaningful risk measures. A further complication is the need to run frequent simulations of sub-portfolios – to gain extra insight under stressed conditions or to perform impact assessments and what-if scenarios – while possibly maintaining exact reproducibility of full-portfolio results (i.e. such a stand-alone simulation should be identical to extracting the results of the corresponding subset from a full simulation run).
It is typically desired – although often not achieved for various reasons – to have a solution that is “playing fair”, meaning that a parallel execution on a multi-core architecture yields results independent of the architecture, parallelisation techniques and number of parallel processes. At the same time, (Pseudo) Random Number Generators used to draw random numbers in Monte Carlo simulations are intrinsically sequential.
TRNG together with RcppParallel addresses and resolves all these challenges with speed and elegance. RcppParallel offers in-memory, thread-safe access to R objects from its workers. This example stems from the context of a Solvency II Internal Model (Economic Capital Model, Solvency Capital Requirements) and was originally carried out as a project for a large global insurer. What we present here shares the ideas and concepts but is a newer model and implementation.
We are hosting a new R package on GitHub (https://github.com/miraisolutions/rTRNG), which embeds the TRNG C++ sources and headers, provides simple examples of how to use parallel RNG with RcppParallel, and exposes some functionality from TRNG into R for easier access, testing and benchmarking.
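To illustrate the fair-playing property, the sketch below re-creates a sub-simulation by jumping ahead in the random number sequence; the function names (TRNGkind, TRNGseed, TRNGjump, runif_trng) follow our reading of the rTRNG README and should be treated as assumptions rather than a definitive API reference.

```r
# A minimal sketch of "playing fair" with rTRNG, assuming the TRNGkind()/
# TRNGseed()/TRNGjump()/runif_trng() interface described in the package README.
library(rTRNG)

TRNGkind("yarn2")                  # parallel-capable TRNG generator
TRNGseed(117)
x_full <- runif_trng(1000)         # full-portfolio simulation draws

# Stand-alone re-simulation of draws 501-1000 only: re-seed and jump ahead
TRNGseed(117)
TRNGjump(500)
x_sub <- runif_trng(500)

# Should be TRUE if each draw consumes exactly one step of the generator
all.equal(x_sub, x_full[501:1000])
```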
Grid Computing in R with Easy Scalability in the Cloud
Jonathan Adams, ARMtech Insurance Services
Parallel computing is useful for speeding up computing tasks, and many R packages exist to aid in using it. Unfortunately, it is not always trivial to parallelise jobs, and doing so can take a significant amount of time – time that may be unavailable. Several modelling tasks have arisen in my line of work that have required a lot of compute time but whose results have been needed quickly. These tasks have ranged from building multiple neural network models for crop yields to running many realisations of a catastrophe model.
My presentation will demonstrate an alternative method that allows for processing of multiple jobs simultaneously across any number of servers using Redis message queues. This method has proven very useful since I began implementing it at my company over two years ago. In this method, a main Redis server handles communication with R processes on any number of servers. These processes, known as workers, inform the server that they are available for processing and then wait indefinitely until the server passes them a task.
In this presentation, it will be demonstrated how trivial it is to scale up or down by adding or removing workers, and how cheap and easy it can be to perform this scaling in the cloud. This will be demonstrated with sample jobs run on workers in the Amazon cloud. Additionally, this presentation will show you how to implement such a system yourself with the rminions package I have been developing. This package is based on what I have learned over the past couple of years and contains functionality to easily start workers, queue jobs, and even perform R-level maintenance (such as installing packages) on all connected servers simultaneously!
GitHub Repository: https://github.com/PieceMaker/rminions
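The sketch below illustrates the queue/worker pattern with the redux Redis client rather than the rminions package itself; the queue names and the string-based job format are illustrative assumptions.

```r
# A minimal sketch of the Redis queue/worker pattern (not the rminions API),
# assuming a Redis server is reachable on localhost.
library(redux)

con <- redux::hiredis()

# Client side: push a task (an R expression as text) onto the "jobs" queue
con$LPUSH("jobs", "sum(1:10)")

# Worker side (run in a separate R process, possibly on another server):
# block until a job arrives, evaluate it, and push the result back
repeat {
  msg    <- con$BRPOP("jobs", 0)           # list(queue, payload); 0 = wait forever
  result <- eval(parse(text = msg[[2]]))
  con$LPUSH("results", as.character(result))
}
```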
Investigating the correlation between month of birth and diagnosis of specific diseases
Ben Rickayzen, David Smith, Leonel Rodrigues Lopes Junior, City University
There have been a number of studies into whether an individual's month of birth affects the probability of them developing particular diseases later in life. The conclusions of these studies, though, are contradictory, with some finding correlations and others not.
Using data from a group of insured lives from the Unimed-BH Medical Cooperative, located in southeast Brazil, we investigate possible links between month of birth and the following diseases: diabetes, asthma, cardiovascular disease, chronic kidney disease, chronic obstructive pulmonary disease, nephrolithiasis and mental health conditions.
To improve the detection of seasonal influence we use R to carry out the Rayleigh test on our data after transforming it into directional data. We also discuss the appropriateness of using the Bonferroni correction for our investigations.
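For readers unfamiliar with the Rayleigh test, a minimal sketch is given below using the circular package with simulated months of birth; it illustrates the technique and is not the authors' code.

```r
# A minimal sketch: months of birth mapped onto the circle and tested for
# seasonal concentration with the Rayleigh test (circular package; simulated data).
library(circular)

months <- sample(1:12, 500, replace = TRUE)         # illustrative birth months
angles <- circular((months - 0.5) * 2 * pi / 12)    # month -> angle in radians

rayleigh.test(angles)   # tests the null hypothesis of no seasonal direction
```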
Measuring the Length of the Great Recession via Lapse Rates: A Bayesian Approach to Change-Point Detection
Michael Crawford, Applied AI
Many life insurance companies in Ireland noticed a marked increase in the lapse rates of their policies during the “Great Recession”. Various techniques can be employed to analyse this, but simple graphical and visualisation methods are not particularly revealing when it comes to estimating when this increase in lapses started and when it ended.
This matters because any analysis based on data that includes this period will tend to overestimate future lapse rates. Careful data manipulation can help measure the effect, but fixing a start and an end point is important for adjusting models to remove it.
In this talk a Bayesian approach to change-point detection is proposed, with an implementation in Stan. Simple visualisations are discussed, along with ideas for how to use the output of this model in subsequent predictive models such as survival analysis.
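To make the idea concrete, the sketch below detects a single change-point in simulated monthly lapse counts by marginalising over the change month under conjugate Beta priors; the talk's Stan model goes further and locates both the start and the end of the elevated period.

```r
# A minimal sketch of Bayesian change-point detection (not the speaker's Stan
# model): simulated monthly lapse counts with a single rate increase at month 60.
set.seed(1)
n_months <- 120
exposed  <- rep(1000, n_months)
rate     <- ifelse(seq_len(n_months) < 60, 0.015, 0.03)
lapses   <- rbinom(n_months, exposed, rate)

# Log marginal likelihood of a segment under a Beta(1, 1) prior on the lapse rate
log_ml <- function(k, n) lbeta(1 + sum(k), 1 + sum(n - k))

# Uniform prior over the change month tau; posterior by enumeration
taus     <- 2:n_months
log_post <- sapply(taus, function(tau) {
  log_ml(lapses[1:(tau - 1)], exposed[1:(tau - 1)]) +
    log_ml(lapses[tau:n_months], exposed[tau:n_months])
})
post <- exp(log_post - max(log_post))
post <- post / sum(post)

taus[which.max(post)]   # most probable month at which the lapse rate changed
```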
Data Science vs Actuary: A Perspective using Shiny and HTMLWidgets
Richard Pugh, Mango Solutions
The advent of “Data Science” is having a profound effect on the role of analytics within organisations. In some respects, sectors where analytics is not yet an established concept are able to use data science more strategically than “analytically mature” industries such as insurance. Building on the work presented jointly by Richard Pugh (Mango) and Chris Reynolds (PartnerRe) at the LIFE Conference in 2015, this presentation will:
- Contrast the anticipated skills of a “Data Scientist” vs Actuaries
- Outline possible opportunities for proactive data science in the insurance sector
- Look at the ways in which bodies such as the IFoA are promoting “Data Science” skills
- Discuss ways in which insurance companies are looking to add “Data Science” to supplement actuarial teams
- Suggest key challenges ahead
To compare the skills of “Data Scientists” and Actuaries, we will present Mango’s “Data Science Radar”, a Shiny app with embedded JavaScript using HTMLwidgets. This will be used during the presentation to illustrate the ease with which bespoke R-based applications can be created as an intuitive way to communicate complex ideas. The full Shiny code, based on the radarchart package, will be made available.
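As an indication of how lightweight such a widget is, the sketch below draws a comparable radar chart with the radarchart package; the skill labels and scores are invented for illustration and are not Mango's Data Science Radar data.

```r
# A minimal sketch with the radarchart package; labels and scores are
# illustrative only.
library(radarchart)

labs   <- c("Statistics", "Programming", "Domain knowledge",
            "Communication", "Data wrangling", "Visualisation")
scores <- list("Actuary"        = c(5, 2, 5, 4, 3, 3),
               "Data Scientist" = c(4, 5, 2, 3, 5, 4))

chartJSRadar(scores = scores, labs = labs, maxScale = 5)
```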
estudy2: an R package for the event study in insurance
Iegor Rudnytskyi, University of Lausanne
The impact of relevant events on the stock market valuation of companies has been the subject of many studies. An event study is a statistical toolbox that allows one to examine the impact of certain events on firms’ stock valuations. Given the rationality of market participants, the prices of securities immediately incorporate any relevant announcements, information, and updates.
The idea of the event study is to compare the market valuation of companies during periods related to an event with other (non-event) periods. If the behaviour of stocks is significantly different in the event period, we conclude that the event has an impact on the market valuation; otherwise we conclude that there is no effect.
The major stream of research focuses on the insurance industry and catastrophe events; therefore, cross-sectional dependence cannot be neglected. Furthermore, returns are typically not normally distributed. These points lead to misspecification of the classical parametric tests and require the results to be validated by more tailored and accurate tests (both parametric and nonparametric).
To address all these issues we developed the package estudy2 (planned to be submitted to CRAN by August 2016). First, estudy2 handles all technical aspects of the rate-of-return calculation (the core computation is done in C++ using Rcpp). The package also implements three traditional market models: mean-adjusted returns, market-adjusted returns and the single-index market model.
Finally, six parametric and six nonparametric tests of daily cross-sectional abnormal returns have been implemented. In addition, the package contains tests for cumulative abnormal returns (CAR).
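The base-R sketch below illustrates the single-index market model workflow on simulated returns (estimate on a pre-event window, test abnormal returns in the event window); it is a conceptual illustration and not the estudy2 API.

```r
# A minimal conceptual sketch of a single-index market model event study
# on simulated daily returns (not the estudy2 interface).
set.seed(42)
market <- rnorm(250, 0, 0.01)
stock  <- 0.0002 + 1.2 * market + rnorm(250, 0, 0.008)
stock[201:205] <- stock[201:205] - 0.03        # simulated event impact

est_win  <- 1:200                              # estimation window
evt_win  <- 201:205                            # event window
fit      <- lm(stock[est_win] ~ market[est_win])
expected <- coef(fit)[1] + coef(fit)[2] * market[evt_win]
ar       <- stock[evt_win] - expected          # abnormal returns
car      <- sum(ar)                            # cumulative abnormal return

# Simple parametric test of the CAR against the estimation-window residual sd
t_stat <- car / (sd(residuals(fit)) * sqrt(length(evt_win)))
c(CAR = car, t = t_stat)
```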
In the proposed talk we demonstrate an example from current research, namely the impact of major catastrophes on insurance firms’ market valuation, in order to validate the specification of the tests.
R as a Service
Matt Aldridge, Mango Solutions
Building “R as a Service” to support Production Applications: A Health Insurance Use Case
R is an incredibly powerful language for data analysis, providing a wealth of capabilities to support an analyst’s workflow. However, when we look to use R in production systems, a number of challenges arise, such as:
- Big Data – R must scale to enable the application of R modelling approaches to large data sources
- Scale – how could R be used in a scalable, parallel, “always available” manner?
- Centralised – R code must be centralised, versioned and managed to enable change without disrupting the wider applications
- Integration – to be part of a production “capability”, R must be easily integrated into a wider set of systems
We will propose and describe some ways in which the above challenges can be overcome. In particular, we will present a real-world use case where R was used “as a service” to support a Health Insurance production application for a major insurer.
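One lightweight way to expose R “as a service” is an HTTP API; the sketch below uses the plumber package as a hypothetical example and is not a description of the architecture used in the case study.

```r
# A minimal sketch of an R HTTP endpoint with plumber (a hypothetical example,
# not the production architecture described in the talk). Save as api.R:

#* Return a toy risk score for an applicant
#* @param age the applicant's age
#* @get /score
function(age) {
  age <- as.numeric(age)
  list(score = round(1 / (1 + exp(-(age - 40) / 10)), 3))  # placeholder model
}

# Then, from another R session, launch the service:
# plumber::plumb("api.R")$run(port = 8000)
```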
RPGM, an example of use with IBNR
Nicolas Baradel and William Jouot, PGM Solutions
We present a method for estimating the number of claims above a specific threshold (so-called large claims), including IBNR claims, using the method of Schnieper. We show its application with RPGM. RPGM is a software package that enables developers to create R programs without any extra knowledge. The developers build a sequencer, alternating R code execution with GUIs (Graphical User Interfaces). Those RPGM programs can then be used by everyone: no knowledge of R is needed. RPGM executes each step of the sequencer; the user only sees the GUIs and the results. Using such a program, we show the methodology of estimating the IBNR with automatically generated graphics and outputs – different sets of data from csv and xlsx files are used.
Telematics insurance: impact on tarification
Roel Verbelen, Katrien Antonio, and Gerda Claeskens, KU Leuven
Telematics technology – the integrated use of telecommunications and informatics – may fundamentally change the car insurance industry by allowing insurers to base their prices on real driving behaviour instead of on traditional policyholder characteristics and historical claims information. Telematics insurance or usage-based insurance (UBI) can drive down the cost for low-mileage clients and good drivers.
Car insurance is traditionally priced based on self-reported information from the policyholder, most importantly: age, license age, postal code, engine power, use of the vehicle, and claims history. Over time, insurers try to refine this a priori risk classification and restore fairness using no-claim discounts and claim penalties in the form of the bonus-malus system. It is expected that these traditional methods of risk assessment will become obsolete. Your car usage and your driving abilities can be better assessed from the telematics data collected, such as: the distance driven, the time of day, how long you have been driving, the location, the speed, harsh or smooth braking, aggressive acceleration or deceleration, your cornering and parking skills… This high-dimensional data, collected on the fly, will force pricing actuaries to change their current practice. New statistical models will have to be developed to adequately set premiums based on individual policyholders’ motoring habits instead of the risk associated with their peer group.
In this work, we take a first step in this direction. We analyse a telematics data set from a European insurer, collected between 2010 and 2014, in which information is recorded on the number of meters insureds drive. Besides the total distance driven, we also registered how this distance is divided over different road types and time slots. These data allow car insurers to use real driving exposure to price the contract. We build claim frequency models combining traditional and telematics information and discover the relevance and impact of adding the new telematics insights.
List of R packages used: mgcv, data.table, ggplot2, ggmap, compositions, parallel.
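A minimal sketch of the kind of claim-frequency model described above is shown below, using mgcv on simulated policy data; the variable names are illustrative and do not reflect the actual data set.

```r
# A minimal sketch of a telematics claim-frequency GAM with mgcv; the policy
# data are simulated and the covariates are illustrative.
library(mgcv)

set.seed(1)
n <- 5000
policies <- data.frame(
  age         = runif(n, 18, 80),            # driver age
  km_driven   = rgamma(n, 2, scale = 7500),  # annual distance driven (km)
  night_share = rbeta(n, 2, 8),              # share of distance driven at night
  exposure    = runif(n, 0.5, 1)             # policy duration in years
)
mu <- with(policies,
           exposure * exp(-3 + 2e-05 * km_driven + 1.5 * night_share))
policies$claims <- rpois(n, mu)

fit <- gam(claims ~ s(age) + s(km_driven) + s(night_share) +
             offset(log(exposure)),
           family = poisson(link = "log"), data = policies)
summary(fit)
```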
Modelling the impact of reserving in high inflation environments
Marcela Granados, EY
An insurance company needs to keep sufficient reserves to fulfil its long-term future payments. However, reserving methods that rely on historical payments fail under volatile inflation scenarios. This is more pronounced for long-tailed classes involving inflation-sensitive cash flows such as litigation expenses and indemnity/medical payments. The reason is that chain-ladder methods assume that expected incremental losses are proportional to reported losses, and that losses in an accident year are independent of losses in other accident years. High and volatile inflation produces a calendar-year effect which invalidates these assumptions.
In such cases, historical payments might not be truly representative of future payments. The reserves in such situations are highly sensitive to inflation, making it the most critical assumption. Therefore, future payments need to be explicitly adjusted for inflation and discounted appropriately, for which actuaries need to get comfortable with inflation-rate assumptions. Thus, projecting future inflation rates becomes important for the entire reserve estimation process. This also has implications for pricing, where losses need to be trended and developed to devise rate indications. Explicit recognition of inflation in reserve risk projections is also crucial for capital modelling.
This research study explores the use of time series models for forecasting inflation rates, especially in countries like Argentina that are infamous for highly volatile inflation. Due to the dynamic nature of inflation volatility, non-linear time series models that account for changing variance over time, such as ARCH and GARCH, are used. Model performance is assessed on how well the model captures the stochastic volatility in the data. Statistical tests are performed to assess goodness of fit, accuracy of results and model stability. Based on these tests, the appropriate model is selected and its parameters are then used to forecast inflation over a longer time horizon.
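A minimal sketch of such a fit is shown below with the rugarch package on a simulated monthly inflation series; it illustrates the GARCH workflow rather than the study's actual data or model selection.

```r
# A minimal sketch: fit a GARCH(1,1) with an AR(1) mean to a simulated monthly
# inflation series and forecast ahead (rugarch package; illustrative only).
library(rugarch)

set.seed(7)
infl <- rnorm(240, mean = 0.02, sd = 0.01)   # simulated monthly inflation rates

spec <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model     = list(armaOrder = c(1, 0))
)
fit <- ugarchfit(spec, data = infl)
ugarchforecast(fit, n.ahead = 24)            # two-year-ahead forecast
```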
An R package of a partial internal model for life insurance
Jinsong Zheng, Quantitative Methods, Group Risk Management, Talanx AG / Chair for Energy Trading and Finance, University of Duisburg-Essen
Under the Solvency II framework, in order to protect the interests of shareholders and policyholders, an insurance company should be adequately capitalised to fulfil the capital requirement for solvency. Two main components should therefore be taken into account: the available capital and the solvency capital requirement.
The available capital refers to the shareholder’s net asset value and is defined as the difference between the market value of assets and liabilities. In general, a stochastic model is used to perform the market-consistent valuation of the assets and liabilities. We develop a stochastic cash flow projection model (the asset portfolio consists of coupon bonds and stocks, while the liability portfolio consists of life insurance products with profit sharing and an interest rate guarantee) to capture the evolution of the cash flows of assets and liabilities, together with an Economic Scenario Generator (ESG), consisting of an interest rate model and an equity model, to generate economic scenarios for the financial market risk factors through Monte Carlo simulation after calibration to market data. Under this framework, we calculate the available capital for a life insurance company by Monte Carlo simulation.
For the Solvency Capital Requirement (SCR), the distribution of the available capital at t=1 is considered. In principle, the so-called nested stochastic simulation should be applied; however, this results in very high computational times and is not practical. We therefore develop the replicating-portfolio proxy method, which is widely used in the insurance industry, to determine the SCR, and compare it to the nested simulation to check the estimation quality.
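The idea behind the replicating-portfolio proxy can be illustrated with a small sketch: regress simulated liability values on candidate instrument payoffs across scenarios. The sketch below is conceptual and not taken from the authors' package.

```r
# A minimal conceptual sketch of a replicating-portfolio fit (not the authors'
# implementation): liability values across scenarios regressed on instrument payoffs.
set.seed(123)
n_scen <- 5000
S      <- rlnorm(n_scen, -0.02, 0.2)                      # equity index at t = 1
liab   <- 0.9 + pmax(1 - S, 0) + rnorm(n_scen, 0, 0.02)   # stylised liability value

# Candidate replicating instruments: cash, the equity index, a put struck at 1
X <- cbind(cash = 1, equity = S, put = pmax(1 - S, 0))
w <- lm.fit(X, liab)$coefficients                         # replicating weights
round(w, 3)                                               # roughly cash = 0.9, put = 1
```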
In all, we construct a partial internal model to illustrate the calculation of the available capital and the SCR for a life insurance company, given market data. The whole implementation is provided as an R package using Rcpp.
Global Teleconnections
Sundeep Chahal, Lloyd’s
Lloyd’s has worked with the Met Office to model in R the global connections between natural perils such as hurricanes, tornadoes, flooding and wildfire. This was done by first defining a mathematical relationship between global climate drivers such as the El Niño Southern Oscillation (ENSO) and the Atlantic Multi-decadal Oscillation (AMO). Once defined, the relationships between individual perils and climate drivers were established, and Monte Carlo simulations were run to express the complex interconnectivity amongst the sixteen perils of interest. These simulations have then been used to sensitivity-test the parameters of the Lloyd’s Catastrophe Model.
Probabilistic Graphical Models for Detecting Underwriting Fraud
Mick Cooney, Applied AI
Medical non-disclosure is a major cost in underwriting life insurance. This is where the applicant lies about, omits, or is unaware of information on pre-existing medical conditions of relevance to an underwriter. Medical exams help mitigate non-disclosure, but tend to be expensive in both time and money, and may result in the applicant not taking up the policy, even when there are no health issues of concern.
Probabilistic Graphical Models in general, and Bayesian Belief Networks in particular, are a way to help underwriters with this process. This talk proposes a simple initial model to serve mainly as a proof of concept for the approach. Methods for dealing with incomplete and missing data will be discussed, along with realistic expectations for what such a model could reasonably produce and possible avenues for improving it, such as scaling it to larger networks of variables.
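A toy Bayesian belief network for non-disclosure, built with the bnlearn package, is sketched below; the structure and probabilities are invented purely for illustration and are not the model proposed in the talk.

```r
# A minimal sketch of a two-node belief network for non-disclosure (bnlearn);
# structure and probabilities are made up for illustration.
library(bnlearn)

dag <- model2network("[Smoker][Disclosed|Smoker]")

cpt_smoker    <- matrix(c(0.25, 0.75), ncol = 2,
                        dimnames = list(NULL, c("yes", "no")))
cpt_disclosed <- matrix(c(0.60, 0.40,    # P(Disclosed = yes/no | Smoker = yes)
                          0.99, 0.01),   # P(Disclosed = yes/no | Smoker = no)
                        ncol = 2,
                        dimnames = list(Disclosed = c("yes", "no"),
                                        Smoker    = c("yes", "no")))

fit <- custom.fit(dag, dist = list(Smoker = cpt_smoker, Disclosed = cpt_disclosed))

# Posterior probability the applicant smokes, given they disclosed "no"
cpquery(fit, event = (Smoker == "yes"), evidence = (Disclosed == "no"))
```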
R, Shiny and the Oasis Loss Modelling Framework – a toolkit for Catastrophe modelling
Mark Pinkerton, OASIS
Oasis is a not-for-profit initiative to create a global community around catastrophe modelling, based around open standards and software. The core Oasis system has been designed to be agnostic in that it can execute models from many different suppliers. A unique feature of the Oasis architecture is that it provides a core set of components that can be used directly by modellers or analysts, embedded in other software or deployed in enterprise risk management systems.
R is a great fit for running analytics using the Oasis components, and Shiny is a powerful tool for deploying R’s data visualization and geo-spatial capabilities. R is already widely used in the model development and actuarial communities, and we are starting to see some adoption by the catastrophe modelling community.
This talk will cover:
- The catastrophe modelling problem space
- The Oasis technical architecture
- The use of R in Oasis and potential uses by the catastrophe modelling community
Code examples will be provided for operating Oasis within R, as well as a demo of the Oasis user interface developed in RShiny.
Persuasive Advice for Senior Management: the Three-C’s
Dan Murphy, Trinostics
You are the analyst whom Senior Management named to the 2016 strategic team. While collaborators and Management may not appreciate the technical foundations for your conclusions, how can you nevertheless persuade them to see the value in your message? Persuasive analysts use analytical tools that help them deliver advice with three C’s: Context, Confidence, and Clarity.
For decades, spreadsheets have been the insurance analytical tool of choice, primarily because they can prototype models relatively quickly (high Context). But spreadsheets can be difficult to tech-check and audit (low Confidence) and time-consuming to modify for refocused objectives (low Clarity). More recently, R and its derivative products have been evolving into a computing toolset that is particularly well-honed to deliver Three-C analytical advice to insurance Senior Management.