model-income-tax.Rmd
The functions model_income_tax and project are the core of the grattan package. Grattan applies them to the ATO’s 2% sample files to produce costings of changes to tax policy. Both functions are \(X^n \to X^n\): they take a sample file and return a mutated sample file.
With the mutated sample file, the costing for that particular tax year is the weighted sum of the difference between the new_tax and the baseline_tax columns. We can also use the mutated sample file to perform distributional analysis, such as the average change in tax by taxable income percentile.
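The costing arithmetic described above can be sketched in base R on a toy table. This is a minimal illustration, not output from grattan: the four rows and their values are invented, and only the column names (Taxable_Income, baseline_tax, new_tax, WEIGHT) follow the sample file.

```r
# Toy stand-in for a mutated sample file; all values are invented
toy <- data.frame(
  Taxable_Income = c(20000, 45000, 90000, 200000),
  baseline_tax   = c(0, 6000, 21000, 65000),
  new_tax        = c(0, 6100, 21500, 67000),
  WEIGHT         = 50L
)

# Costing: weighted sum of the change in tax
toy$delta <- toy$new_tax - toy$baseline_tax
costing <- sum(toy$delta * toy$WEIGHT)

# Distributional view: average change in the bottom and top halves by income
toy$half <- ifelse(toy$Taxable_Income > median(toy$Taxable_Income),
                   "top", "bottom")
avg_delta <- tapply(toy$delta, toy$half, mean)
```

With a real sample file the same two steps apply, with finer groups (percentiles or quintiles) in place of the two halves used here.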
Since the input data consists of tax returns and the grattan package does not purport to generate inferences about the wider Australian population, these functions cannot (directly) analyse the effect of policies on households or on the wider population. For example, policies affecting welfare payments, changes to the tax settings of businesses or super funds, or changes which would tax people who do not currently file tax returns are not amenable to the kind of analysis these functions perform.
model_income_tax
model_income_tax takes a sample file and returns a sample file under the settings given by the function arguments.
To start, let’s load the (minimal) packages we need. We’ll use the synthetic 2015-16 sample file contained in the suggested package taxstats1516. See ?install_taxstats for installation instructions. For future years, use the latest sample file from the ATO.
library(knitr)
library(data.table)
library(magrittr)
library(hutils)
library(grattan)
require_taxstats1516()
# Use the actual sample file if you've got it
s1516 <- as.data.table(sample_file_1516_synth)
s1516[, WEIGHT := 50L]
The following helper function, dollar, is purely cosmetic.
#' @return Number formatted as dollar e.g. 30e3 => $30,000
dollar <- function(x, digits = 0) {
  nsmall <- digits
  commaz <- format(abs(x), nsmall = nsmall, trim = TRUE, big.mark = ",",
                   scientific = FALSE, digits = 1L)
  # if_else() is provided by hutils (loaded above)
  if_else(x < 0,
          paste0("\U2212", "$", commaz),
          paste0("$", commaz))
}
Every call to model_income_tax has two mandatory arguments: sample_file and baseline_fy. These define the baseline_tax column in the result. When an argument is left as NULL, the new_tax column is calculated using the corresponding tax setting that applied in baseline_fy.
s1516 %>%
model_income_tax(baseline_fy = "2015-16") %>%
select_grep("tax$", "Taxable_Income") %>% # just look at relevant cols
head %>%
kable
Taxable_Income | baseline_tax | new_tax |
---|---|---|
28849 | 2155 | 2155.29 |
210436 | 72060 | 72060.64 |
22285 | 426 | 426.15 |
58461 | 11592 | 11592.96 |
0 | 0 | 0.00 |
20078 | 0 | 0.00 |
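Note that by default new_tax is a double precision vector, not rounded. You can use return. = "sample_file.int" to return integer values instead. Comparing the two tables here (for example, 72060.64 becomes 72060) shows that the cents are dropped rather than rounded to the nearest dollar.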
With the use of a simple function to test equality, we can see that new_tax is just the same as baseline_tax, as expected.
is_all_equal <- function(x, y) {
  if (is.integer(x) && is.integer(y)) {
    all(x == y)
  } else {
    isTRUE(all.equal(x, y))
  }
}
s1516 %>%
  model_income_tax(baseline_fy = "2015-16",
                   return. = "sample_file.int") %>%
  select_grep("tax$", "Taxable_Income") %T>%
  .[, stopifnot(is_all_equal(baseline_tax, new_tax))] %>%
  head %>%
  kable
Taxable_Income | baseline_tax | new_tax |
---|---|---|
28849 | 2155 | 2155 |
210436 | 72060 | 72060 |
22285 | 426 | 426 |
58461 | 11592 | 11592 |
0 | 0 | 0 |
20078 | 0 | 0 |
The choice of rounded, unrounded, or truncated values may be important for some analyses. For instance, tax liabilities are calculated using whole dollar amounts, so a truncated value may be appropriate when the values of new_tax for each row need to be very precise. Unrounded values may be important to determine changes in marginal tax rates. Rounded values may be the most appropriate choice for costings.
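The difference between these choices can be seen in base R on the unrounded new_tax values shown in the first table. This is only a sketch of the arithmetic; it does not call grattan, and the weight of 50 is the one assigned to the synthetic sample file earlier.

```r
# new_tax values copied from the unrounded output shown earlier
new_tax <- c(2155.29, 72060.64, 426.15, 11592.96, 0, 0)

rounded   <- round(new_tax)  # nearest whole dollar
truncated <- trunc(new_tax)  # cents dropped

# Per-row differences are under a dollar, but they accumulate once the
# rows are summed and scaled by the sample weight
weight <- 50
gap <- (sum(rounded) - sum(truncated)) * weight
```

Over a full sample file the gap between the two conventions grows with the number of rows, which is why the choice matters for costings.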
You can change how the ‘ordinary tax’ is calculated by changing the arguments ordinary_tax_thresholds and ordinary_tax_rates. To replicate the 2015-16 tax scales, one would use:
# Temp budget repair levy not refundable against SBTO
s1516_no_changes <-
  s1516 %>%
  model_income_tax(baseline_fy = "2015-16",
                   ordinary_tax_thresholds = c(0, 18200, 37000, 80000, 180000),
                   # the final 0.02 is the temp budget repair levy
                   ordinary_tax_rates = c(0, 0.19, 0.325, 0.37, 0.45 + 0.02),
                   return. = "sample_file.int")
Note that the temporary budget repair levy is not included by default, so it is simulated here by topping up the marginal tax rate above $180,000. This simulation is imperfect because the small business tax offset does not offset levies. As a result, baseline_tax and new_tax are slightly different in s1516_no_changes. This is not a problem for tax years from 2018-19 onwards.
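To make the role of the two arguments concrete, the sketch below computes tax from a vector of thresholds and marginal rates in base R. The helper bracket_tax is hypothetical and is not how grattan computes tax (it ignores offsets and levies entirely); it only shows how topping up the last rate by 0.02 changes the result above $180,000.

```r
# Hypothetical helper (not part of grattan): tax from marginal brackets
bracket_tax <- function(income, thresholds, rates) {
  upper <- c(thresholds[-1], Inf)
  vapply(income, function(y) {
    # income falling within each bracket, taxed at that bracket's rate
    sum(pmax(0, pmin(y, upper) - thresholds) * rates)
  }, numeric(1))
}

thresholds <- c(0, 18200, 37000, 80000, 180000)
rates      <- c(0, 0.19, 0.325, 0.37, 0.45)
rates_levy <- c(0, 0.19, 0.325, 0.37, 0.45 + 0.02)

incomes      <- c(37000, 80000, 180000, 200000)
without_levy <- bracket_tax(incomes, thresholds, rates)
with_levy    <- bracket_tax(incomes, thresholds, rates_levy)

# Non-zero only for incomes above $180,000
levy_effect <- with_levy - without_levy
```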
The Medicare levy is more complex to calculate than ordinary income tax. There are parameters relating to two thresholds, as well as different thresholds for families and SAPTO-eligible individuals. Even the simplest modification requires changes to multiple parameters. Warnings are emitted whenever parameters are not internally consistent.
Let’s try to increase the Medicare levy rate from 2% to 3%. Observe the warning messages.
## Warning: `medicare_levy_upper_threshold` was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
## medicare_levy_upper_threshold = 30479
## Warning: `medicare_levy_upper_sapto_threshold` was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
## medicare_levy_upper_sapto_threshold = 48197
Note that the warning message says the parameter has been changed. Do not simply tolerate the warning, however; instead, set the parameter explicitly to the suggested value (if you agree with the warning message’s advice).
m1516a <-
  s1516 %>%
  model_income_tax("2015-16",
                   # Increase to 3%
                   medicare_levy_rate = 0.03,
                   medicare_levy_upper_threshold = 30479,
                   medicare_levy_upper_sapto_threshold = 48197)
Since there are many degrees of freedom, and since thresholds are generally the things that are actually contemplated when making changes, warnings will suggest changing thresholds in preference to the rate or taper if there is a conflict. Only when the thresholds have been manually selected and there is still a conflict is a change to the taper or rate suggested. For example, if we wanted to keep the upper threshold at its 2015-16 value of $26,670 rather than change it, we could insist:
m1516a <-
  s1516 %>%
  model_income_tax("2015-16",
                   # Increase to 3%
                   medicare_levy_rate = 0.03,
                   # but keep the upper threshold the same
                   medicare_levy_upper_threshold = 26670,
                   medicare_levy_upper_sapto_threshold = 48197)
## Warning: `medicare_levy_lower_threshold` was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
## medicare_levy_lower_threshold = 18668
The warning still assumes the taper and rate are the same, but it can no longer suggest a change to the upper threshold (since we provided it), so it suggests a change to the lower threshold. Only once we exhaust the thresholds it can adjust does the warning message start to include changing the taper:
m1516a <-
  s1516 %>%
  model_income_tax("2015-16",
                   # Increase to 3%
                   medicare_levy_rate = 0.03,
                   # and specify both the lower and upper thresholds
                   medicare_levy_lower_threshold = 21335,
                   medicare_levy_upper_threshold = 26670,
                   medicare_levy_upper_sapto_threshold = 48197)
## Warning: `medicare_levy_taper` was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
## medicare_levy_taper = 0.15
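The values suggested in these warnings appear to follow from requiring the tapered levy to meet the full levy at the upper threshold, that is, taper × (upper − lower) = rate × upper. The sketch below is base R arithmetic under that assumption, with a 10% taper assumed as the default. The helper names are hypothetical and the package's own rounding is not reproduced.

```r
# Hypothetical helpers: solve taper * (upper - lower) = rate * upper
# for whichever parameter was left unspecified
implied_upper <- function(lower, rate, taper) lower * taper / (taper - rate)
implied_lower <- function(upper, rate, taper) upper * (taper - rate) / taper
implied_taper <- function(lower, upper, rate) rate * upper / (upper - lower)

# Rate raised to 3%, lower threshold at 21335, taper assumed to be 0.10
upper_3pc <- implied_upper(lower = 21335, rate = 0.03, taper = 0.10)

# Upper threshold pinned at 26670 instead
lower_3pc <- implied_lower(upper = 26670, rate = 0.03, taper = 0.10)

# Both thresholds pinned, so only the taper can move
taper_3pc <- implied_taper(lower = 21335, upper = 26670, rate = 0.03)
```

Under these assumptions the three results are about 30478.57, 18669 and 0.14997, which agree with the warnings above to within a dollar for the thresholds and to two decimal places for the taper.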
project
The function project takes a sample file and returns a sample file. The other mandatory argument is h, the whole number of years ahead of the sample file to project.
Thus, to get a forecast for the 2018-19 tax year:
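One way to write that call is sketched below. The helper fy_diff is hypothetical (it is not part of grattan) and simply counts the years between two financial-year strings. The call to project only runs if grattan and the s1516 object from earlier are available, and the name s1819 is chosen to match its use later in this section.

```r
# Hypothetical helper: whole years between two financial years, e.g. "2015-16"
fy_diff <- function(from_fy, to_fy) {
  as.integer(substr(to_fy, 1, 4)) - as.integer(substr(from_fy, 1, 4))
}

h <- fy_diff("2015-16", "2018-19")  # 3L

# Assumes s1516 was created as shown earlier in this vignette
if (exists("s1516") && requireNamespace("grattan", quietly = TRUE)) {
  s1819 <- grattan::project(s1516, h = h)
}
```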
This uses the internal forecast methods. To specify particular forecast outcomes, you can use the wage.series and lf.series arguments.
To compare the tax collections under these different assumptions, one would use income_tax separately:
tax_Grattan_1819 <-
  s1819 %$%
  income_tax(Taxable_Income, "2018-19", .dots.ATO = copy(s1819)) %>%
  sum %>%
  # All rows share one weight, so scale the total once
  multiply_by(s1819[["WEIGHT"]][1L])
tax_2pc_1819 <-
  s1819_lf2pc_wage2pc %$%
  # use this projection's own columns and weight, not those of s1819
  income_tax(Taxable_Income, "2018-19", .dots.ATO = copy(s1819_lf2pc_wage2pc)) %>%
  sum %>%
  multiply_by(s1819_lf2pc_wage2pc[["WEIGHT"]][1L])
Currently there is no interface for using the upper or lower bounds of the labour force or wage price indices. If you wanted the 80% upper bound of the prediction interval for salary out to 2020-21, for instance, you would pass Sw_amt to excl_vars and inflate it manually.
s2021_wage80pc <-
  s1516 %>%
  copy %>%
  .[, Sw_amt := wage_inflator(Sw_amt,
                              from_fy = "2015-16",
                              to_fy = "2020-21",
                              forecast.level = 80,
                              forecast.series = "upper")] %>%
  .[] %>%
  project(h = 5L,
          excl_vars = "Sw_amt",
          .copyDT = FALSE) %>% # just for memory frugality
  .[]
## [1] "$50,884"
## [1] "$51,648"
## [1] "$65,782"
## [1] "$66,544"
To cost a reduction in the capital gains tax discount from 50% to 25% over the four years from 2018-19, we would run:
cgt_25pc_fwd_estimates <-
  lapply(yr2fy(2019:2022), function(fy) {
    s1516 %>%
      project_to(to_fy = fy) %>%
      model_income_tax("2018-19",
                       cgt_discount_rate = 0.25) %>%
      .[, fy_year := fy]
  }) %>%
  rbindlist
Note that this takes a few seconds, most of which is spent within project. We could improve the speed of this by caching the intermediate objects, either as objects in the environment or as files (say, .fst files). You should consider doing this when you find yourself running project many times, since you are likely just repeating calculations.
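A file-based cache along these lines can be sketched in base R. This is an illustration only: it uses .rds files via saveRDS and readRDS rather than .fst, the helper cached is hypothetical, and slow_projection is a stand-in for a call to project.

```r
# Cache directory; a temporary one here, a project folder in practice
cache_dir <- tempfile("projection-cache-")
dir.create(cache_dir)

# Hypothetical helper: compute once, then read the saved result thereafter
cached <- function(key, compute) {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Stand-in for an expensive call such as project(); counts its invocations
n_calls <- 0L
slow_projection <- function() {
  n_calls <<- n_calls + 1L
  data.frame(Taxable_Income = c(10000, 50000), WEIGHT = 50L)
}

first  <- cached("s1819", slow_projection)  # computed and written to disk
second <- cached("s1819", slow_projection)  # read back from disk
```

The key should identify everything that affects the result (for example the target year and any custom series), otherwise a stale file may be returned.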
cgt_25pc_fwd_estimates %>%
  mutate_ntile("Taxable_Income", n = 5L, keyby = "fy_year") %>%
  .[, delta := new_tax - baseline_tax] %>%
  .[, .(totDelta = sum(delta),
        avgDelta = mean(delta)),
    keyby = .(fy_year, Taxable_IncomeQuintile)] %>%
  # cosmetic
  .[, lapply(.SD, round), keyby = key(.)] %>%
  kable
fy_year | Taxable_IncomeQuintile | totDelta | avgDelta |
---|---|---|---|
2018-19 | 1 | 0 | 0 |
2018-19 | 2 | 380973 | 7 |
2018-19 | 3 | 1686609 | 31 |
2018-19 | 4 | 3335203 | 62 |
2018-19 | 5 | 72763741 | 1349 |
2019-20 | 1 | 0 | 0 |
2019-20 | 2 | 398814 | 7 |
2019-20 | 3 | 1748443 | 32 |
2019-20 | 4 | 3452320 | 64 |
2019-20 | 5 | 76423334 | 1417 |
2020-21 | 1 | 0 | 0 |
2020-21 | 2 | 434734 | 8 |
2020-21 | 3 | 1767220 | 33 |
2020-21 | 4 | 3609380 | 67 |
2020-21 | 5 | 82787537 | 1535 |
2021-22 | 1 | 0 | 0 |
2021-22 | 2 | 455595 | 8 |
2021-22 | 3 | 1834093 | 34 |
2021-22 | 4 | 3694293 | 69 |
2021-22 | 5 | 86490642 | 1604 |
lito_multi for custom offsets
While model_income_tax cannot account for the future imagination of tax policy makers, the argument lito_multi does provide a powerful mechanism for handling complicated offsets. The argument, if provided, must be a list of two components, x and y. Together they define an offset: for each pair (x_i, y_i), the offset at a taxable income of x_i is y_i, and the offset at incomes between two consecutive points is interpolated linearly.
For example, to simply mimic LITO in 2015-16:
s1516 %>%
  model_income_tax("2015-16",
                   lito_multi = list(x = c(-Inf, 37e3, 200e3/3, Inf),
                                     y = c(445, 445, 0, 0)),
                   return. = "sample_file.int") %>%
  .[new_tax != baseline_tax]
## Empty data.table (0 rows) of 67 cols: Gender,age_range,Occ_code,Partner_status,Region,Lodgment_method...
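The interpolation rule can be illustrated outside grattan with base R's approx. This is a sketch of the rule as described above, not the package's implementation; offset_from_points is a hypothetical helper, and rule = 2 (hold the end values constant) stands in for the -Inf and Inf points.

```r
# Hypothetical helper mirroring the (x, y) description of lito_multi.
# rule = 2 holds the first and last y values constant outside range(x).
offset_from_points <- function(income, x, y) {
  approx(x = x, y = y, xout = income, rule = 2)$y
}

# Full offset of $445 up to $37,000, reaching zero at $66,667
x <- c(37e3, 200e3 / 3)
y <- c(445, 0)

lito_values <- offset_from_points(c(20e3, 37e3, 50e3, 70e3), x, y)
```

At $50,000 the interpolated offset is $250, the same as 445 less 1.5 cents for each dollar above $37,000.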
Budget_... parameters
These were used to cost policies proposed in the 2018 Budget period by the Government and the Opposition. They’re unlikely to have much use except in reproducing past results.
The Seniors and Pensioners Tax Offset (SAPTO) can also be modified. To cost the abolition of SAPTO, one would use:
To model a change to lower the SAPTO threshold from $32,279 to $27,000:
To cost the proposal in Age of entitlement: age-based tax breaks (2016):
s1718_AgeOfEntitlement <-
  project(s1516,
          h = 2L) %>%
  model_income_tax("2017-18",
                   sapto_lower_threshold = 27e3,
                   sapto_lower_threshold_married = 42e3,
                   sapto_max_offset = 1160,
                   sapto_max_offset_married = 390,
                   medicare_levy_lower_sapto_threshold = 27000,
                   medicare_levy_upper_sapto_threshold = 33750,
                   medicare_levy_upper_family_threshold = 46361,
                   medicare_levy_lower_family_sapto_threshold = 42000,
                   medicare_levy_upper_family_sapto_threshold = 52500)
revenue_foregone(s1718_AgeOfEntitlement)
## [1] "$383 million"