Statistical information

This section describes the key statistical concepts and processes that apply to the data presented by PHIDU, as follows:

Data adjustment

Introduced random error

For the Census data, the Australian Bureau of Statistics (ABS) uses a method of ‘introduced random error’ to ensure that no data are released which could risk the identification of individuals in the statistics. The technique slightly adjusts all cells, resulting in small introduced random errors. Whilst the totals and subtotals in the Census summary tables are also subjected to small adjustments to preserve additivity within the tables, data at a higher geographic level may not equal the sum of the data for the component geographic units.

For further information, please refer to the ABS Census Dictionary, 2021 (ABS Cat. No. 2901.0).

Definition of ‘Unknown/ ABS cell adjustment’ in PHIDU data spreadsheets

Census data

The ‘Unknown/ ABS cell adjustment’ cells in the spreadsheets shown for the Census data are the difference between the sum of the small area level data (e.g., SA2) and the State/ Territory totals for the indicator of interest. These cells include data with ‘no usual address’, data where the small area level data were unknown or not provided, and the randomly adjusted counts (which may in some cases result in a negative number).
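As an illustration of the arithmetic described above, the residual cell can be sketched as the State/ Territory total minus the sum of the small area counts. The function name and all figures below are hypothetical and are not taken from any PHIDU workbook.

```python
def unknown_cell(state_total, small_area_counts):
    """Residual between a State/Territory total and the sum of its small areas.

    Hypothetical helper for illustration only.
    """
    return state_total - sum(small_area_counts)


# Hypothetical SA2 counts for one indicator (they sum to 500).
sa2_counts = [120, 85, 240, 55]

# Total exceeds the sum of the areas, e.g. records with no usual address.
print(unknown_cell(512, sa2_counts))  # 12

# Random adjustment can leave the total below the sum of the areas.
print(unknown_cell(497, sa2_counts))  # -3
```

The second call shows how a negative value can appear in this cell when randomly adjusted counts sum to more than the published total.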

Non-Census data

The 'Unknown'* cells in the spreadsheets shown for the non-Census data are the difference between the sum of the small area level data (e.g., SA2) and the State/ Territory totals for the indicator of interest. These cells include data where the small area level data were unknown or not provided (e.g., some data are confidentialised by removal of cells with values of fewer than five or fewer than 20 before being provided to PHIDU); or, where the small area level data were produced from postcodes, the 'Unknown'* cells include data which are non-mappable, such as Post Office Boxes.

*Labelled ‘Unknown/ ABS cell adjustment’ in all data workbooks because, in some cases, the workbooks include both ABS and non-ABS data indicators.

Age-standardisation of rates and standardised ratios

Age-standardisation methods

Age-standardisation is a method of adjusting a crude rate to eliminate the effect of differences in population age structures when comparing crude rates for different periods of time, different geographic areas and/or different population sub-groups (e.g., between one year and the next and/or States and Territories, Indigenous and non-Indigenous populations).

Adjustments are undertaken for each of the populations being examined (or the study population) against a standard population.

There are two methods, namely direct and indirect, of calculating age-standardised rates:

The direct method is generally used for comparisons between study groups.
The indirect method is used when calculating rates for small populations where fluctuations in age-specific rates can affect the reliability of rates calculated using the direct method.

The indirect method is also used to calculate standardised mortality ratios (SMRs) and other standardised ratios, for example for health service utilisation. These ratios express the overall experience of a comparison population in terms of the standard population by calculating the ratio of observed to expected deaths in the comparison population. For details of the calculation of age-standardised rates, see the following link:

http://meteor.aihw.gov.au/content/index.phtml/itemId/327276/pageDefinitionItemId/tag.MeteorPrinterFriendlyPage
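The indirect method described above can be sketched as follows. This is a minimal illustration of the general textbook approach, not PHIDU's own implementation: the function names, age groups, rates and counts are all hypothetical. Expected events are obtained by applying the standard population's age-specific rates to the study population; the standardised ratio is observed over expected (with the standard set to 100); and the indirectly standardised rate scales the standard population's crude rate by that ratio.

```python
def expected_count(standard_rates_per_100k, study_population):
    """Events expected if the study population had the standard's age-specific rates."""
    total = sum(rate * pop for rate, pop in zip(standard_rates_per_100k, study_population))
    return total / 100_000


def standardised_ratio(observed, expected):
    """Observed relative to expected events, where the standard population = 100."""
    return 100.0 * observed / expected


def indirect_rate(observed, expected, standard_crude_rate_per_100k):
    """Indirectly age-standardised rate per 100,000."""
    return observed * standard_crude_rate_per_100k / expected


# Hypothetical data for three age groups.
standard_rates = [100, 500, 2000]       # standard population rates per 100,000
study_population = [10000, 5000, 2000]  # study population by age group
observed = 90                           # events observed in the study population
standard_crude_rate = 400               # crude rate of the standard, per 100,000

expected = expected_count(standard_rates, study_population)
print(expected)                                              # 75.0
print(standardised_ratio(observed, expected))                # 120.0
print(indirect_rate(observed, expected, standard_crude_rate))  # 480.0
```

In this made-up example the study population has 20% more events than would be expected from the standard population's rates.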

PHIDU's age-standardised rates

PHIDU's data are indirectly age-standardised; the standard population is the relevant Australian population for the respective indicator:

For the Social Health Atlas of Australia and the Indigenous status (comparison) atlas it is the Australian (Aboriginal plus non-Indigenous) standard for the respective indicator.
For the Aboriginal Social Health Atlas of Australia it is the Aboriginal population (for the respective indicator).

95% confidence intervals

A confidence interval is a range in which it is estimated the true population value lies. Confidence intervals of different sizes can be created to represent different levels of confidence that the true population value will lie within a particular range. A common confidence interval used in statistics is the 95% confidence interval. For a normally distributed estimate, the 95% confidence interval extends approximately two (more precisely, 1.96) standard errors either side of the estimate.

A confidence interval can be used to describe how reliable the data are. The greater the level of confidence required, the wider the range between the lower and upper confidence limits. A major factor determining the width of a confidence interval is the size of the population: the smaller the population, the wider the interval tends to be.
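The normal-approximation interval described above can be sketched as the estimate plus or minus 1.96 standard errors. The function name and the figures are hypothetical, and the sketch assumes a standard error is already available; it does not show how PHIDU derives the limits published in its workbooks.

```python
def confidence_interval_95(estimate, standard_error):
    """Lower and upper 95% limits under a normal approximation."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin


# Hypothetical estimate with two different standard errors.
low, high = confidence_interval_95(50.0, 2.5)
print(f"{low:.1f} to {high:.1f}")  # 45.1 to 54.9

# A larger standard error (e.g. from a smaller population) widens the interval.
low, high = confidence_interval_95(50.0, 5.0)
print(f"{low:.1f} to {high:.1f}")  # 40.2 to 59.8
```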

95% confidence intervals (labelled '95% C.I.s') in PHIDU data workbooks

For indicators where confidence intervals are provided, the upper and lower confidence limits are shown at the 95% level for both the age-standardised rates and associated ratios.

Statistical significance

Statistical significance is calculated using a Z score, which measures how far the observed number departs from the expected number for the local population. The expected number is calculated using the indirect method of age-standardisation.

The Z score enables a user to assess the statistical significance of the difference between the observed and expected numbers. Z scores are significant at the 95% level if their absolute value is greater than or equal to 1.96, and at the 99% level if their absolute value is greater than or equal to 2.58.

Definition of Z score (labelled ‘Sig.') in PHIDU data workbooks

Data which are statistically significant at the 95% confidence level have been marked with a single asterisk symbol (*). Data which are statistically significant at the 99% confidence level have been marked with a double asterisk symbol (**).
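The thresholds and markers above can be sketched in code. The text does not give the Z score formula itself, so the version below is an assumption: it uses a common Poisson approximation in which the variance of the observed count is taken to equal the expected count. The function names and figures are hypothetical.

```python
import math


def z_score(observed, expected):
    """Z score for observed versus expected counts.

    Assumption (not stated in the source): Poisson approximation,
    so the standard deviation of the count is sqrt(expected).
    """
    return (observed - expected) / math.sqrt(expected)


def significance_marker(z):
    """Return '**' at the 99% level, '*' at the 95% level, otherwise ''."""
    if abs(z) >= 2.58:
        return "**"
    if abs(z) >= 1.96:
        return "*"
    return ""


# Hypothetical observed counts against an expected count of 75.
for observed in (80, 93, 100, 55):
    z = z_score(observed, 75)
    print(observed, f"{z:.2f}", significance_marker(z))
```

Because the test uses the absolute value, an observed count well below the expected count is flagged in the same way as one well above it.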

Modelled estimates

National Health Survey, 2017-18 (modelled estimates)

The estimates have been synthetically predicted at the Population Health Area (PHA) level from the 2017-18 National Health Survey (NHS), conducted by the ABS: refer to the individual indicator notes.

National Health Survey, 2014-15 (modelled estimates)

The estimates have been synthetically predicted at the Population Health Area (PHA) level from the 2014-15 National Health Survey (NHS), conducted by the ABS: refer to the individual indicator notes.

General Social Survey, 2014 (modelled estimates)

The modelled estimates presented have been synthetically predicted at the Population Health Area (PHA) level from the 2014 General Social Survey (GSS), conducted by the ABS: for details, refer to the notes on the data.

Australian Health Survey, 2011-12 (modelled estimates)

The estimates have been synthetically predicted at the Population Health Area (PHA) level from the 2011-12 Australian Health Survey (AHS), conducted by the ABS: refer to the note on modelled estimates.

National Health Survey, 2007-08 (modelled estimates)

The estimates have been synthetically predicted at the Statistical Local Area (SLA) level from the 2007-08 National Health Survey (NHS), conducted by the ABS: refer to the note on modelled estimates.

Relative Root Mean Squared Error

The Relative Root Mean Squared Error (RRMSE) is a measure of the propensity for the synthetic predictions to give a different value to that which would be found for a region if it were possible to do an accurate census collecting that health indicator. Where the RRMSE values are high for some health indicators, it would be inappropriate to interpret the synthetic predictions as the actual proportion in a region at a given time.

The RRMSE is primarily a measure of prediction error, but its calculation may also include some aspects of modelling and sampling error. The RRMSE generally decreases as the population size increases, and is used to assess the reliability of predictions.

RRMSE in PHIDU data workbooks

Predictions with RRMSEs from 0.25 to 0.50 have been marked with a single tilde symbol (~) to indicate that they should be used with caution. Predictions with RRMSEs greater than 0.50 but less than 1 have been marked with a double tilde symbol (~~) to indicate that the prediction is considered too unreliable for general use.
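The marking rule above can be sketched as a small function. The function name is hypothetical. The text does not say how predictions with an RRMSE of 1 or more are treated, so the sketch returns None for those rather than guessing.

```python
def rrmse_marker(rrmse):
    """Map an RRMSE value to the marker described in the text."""
    if rrmse < 0.25:
        return ""    # no marker
    if rrmse <= 0.50:
        return "~"   # use with caution
    if rrmse < 1:
        return "~~"  # considered too unreliable for general use
    return None      # RRMSE of 1 or more: treatment not specified in the text


# Hypothetical RRMSE values.
for value in (0.10, 0.25, 0.50, 0.75, 1.20):
    print(value, repr(rrmse_marker(value)))
```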

Correlation

Correlation is the degree to which one variable is statistically associated with another. The correlation coefficient is a measure of the strength of this association. When high values for one variable are matched by high values for the other (or when low values are matched by low values), then they are positively correlated. Where the interdependence is inverse (i.e. high values for one are matched by low values for the other), the two variables are negatively correlated. A coefficient of 0 indicates no correlation, while a coefficient of 1 (or -1) indicates complete positive (or negative) correlation.
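As an illustration, the sketch below computes the Pearson product-moment coefficient, which is one common correlation coefficient; the text does not state which coefficient PHIDU uses, so this choice is an assumption. The function name and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical values for two indicators across four areas.
xs = [1, 2, 3, 4]
print(pearson_r(xs, [2, 4, 6, 8]))  # high matched with high: 1.0
print(pearson_r(xs, [8, 6, 4, 2]))  # high matched with low: -1.0
```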

Additional resources