How Can Students Sitting the Test on Different Days Be Assessed Fairly?
― The Role of Equating and Standardisation in Large-Scale Exams
May 06 2025

One common concern among parents is how students who sit different versions of a test—sometimes even on different days—can be compared fairly. This question has gained particular attention since the NSW Opportunity Class (OC) and Selective High School Placement Tests moved to a computer-based format.
This article outlines how test-score adjustment (equating) and standardisation are applied in large-scale assessments worldwide.
Please note that the NSW Department of Education (DoE) has not publicly released the exact methods used in the OC or Selective exams. The explanations that follow therefore describe generally accepted practices, along with their strengths and limitations.
If you skim the opening and it still feels confusing, please don’t worry—this piece is meant as light, background reading, not a technical manual.
1. Why Use Multiple Test Versions?
In very large cohorts it is rarely practical, or secure, for every candidate to sit precisely the same paper at the same time. Computer-based delivery therefore often employs multiple versions spread across several sittings. The main reasons are:
- Stronger security – reduces the risk of item exposure and allows for legitimate re-sits.
- Operational flexibility – schools and centres can book sessions that suit local timetables.
- Fairness – score-adjustment methods allow results from different versions to be compared on the same scale.
2. What Are Equating and Standardisation?
◼️ EQUATING (TEST-SCORE ADJUSTMENT)
Purpose – to neutralise difficulty differences between versions.
Process – each test form (e.g. Form A and Form B) contains anchor items that appear in both forms.
Suppose candidates average 80% correct on the anchors in Form A but only 70% in Form B. Form B is deemed harder. Consequently, a candidate with 28 correct in Form B may be equated to a candidate with 30 correct in Form A.
Example ● Mohan (Form A): 30 raw correct answers. ● Wang (Form B): 28 raw correct. ● After equating, Wang’s 28 equates to 30 on the Form A scale, as Form B was harder.
▶ Simple Explanation
Mohan’s version of the test (Form A) turns out to be a bit easier. He answers 30 questions correctly.
Wang’s version (Form B) is trickier, so even though he works hard he manages 28 correct answers.
Think of it like basketball hoops: Mohan is shooting at a hoop 2 metres high, while Wang has to aim for one 2.5 metres off the ground. Because Wang’s hoop is higher, the umpire decides he deserves a little bonus for every shot he sinks. After the adjustment, Wang’s raw score of 28 is lifted by 2 bonus points, making his equated score 30—now both friends are level. That whole balancing act is what exam experts call equating.
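To make the “bonus points” concrete, here is a minimal Python sketch of one simple equating approach, known as mean equating via anchor items. The function name and the assumption of 20 anchor items are purely illustrative; the DoE has not published its actual procedure, which is likely far more sophisticated.

```python
def mean_equate(raw_b, anchor_mean_a, anchor_mean_b, num_anchor_items):
    """Shift a Form B raw score onto the Form A scale using the
    difference in average performance on the shared anchor items."""
    # Difference in expected anchor performance, converted to raw points
    shift = (anchor_mean_a - anchor_mean_b) * num_anchor_items
    return raw_b + shift

# Using the article's figures: anchors average 80% on Form A vs 70% on
# Form B. With 20 anchor items (an assumed number), that is a 2-point
# adjustment, lifting Wang's 28 raw to an equated 30.
print(mean_equate(28, 0.80, 0.70, 20))  # -> 30.0
```

Real testing programs typically use more robust methods (such as IRT-based equating), but the underlying idea is the same: the harder form earns a compensating adjustment.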
◼️ STANDARDISATION
Purpose – to place the adjusted scores on a common reporting scale so candidates can see where they stand within the whole cohort.
Process – the equated scores are fitted to a standard-score distribution with a mean of 60 and a standard deviation of 12. These standard scores are then converted to a 0–100 reporting scale for publication.
Continuing the Example ● Both Mohan and Wang now have an equated mark of 30. ● If the overall mean is 28 and the standard deviation is 4: ● Standard score = 60 + ((30 – 28) ⁄ 4) × 12 = 66. ● That standard score might then be mapped to a public score of 82 / 100.
▶ Simple Explanation
Next, the exam team wants to show where Mohan and Wang sit among all the students who took the test. To do that, they convert everyone’s equated scores to a common scale with an average (mean) of 60 and a spread (standard deviation) of 12. It’s a bit like turning centimetres into star points so the numbers are easy to read at a glance.
Suppose the overall mean for the cohort is 28 and the standard deviation is 4. Because Mohan and Wang have equated marks of 30, which is two points above the mean, they each receive a standard score:
Standard score = 60 + (30 – 28) ÷ 4 × 12 = 66
To make reports even clearer, that standard score of 66 is finally re-labelled on a 0–100 scale, ending up at about 82/100.
That’s all equating and standardisation really do—one adds the right bonus, the other uses a common measuring stick—so no matter which day you sat the test, your result means the same thing.
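For readers who like to see the arithmetic, the standard-score step above can be sketched in Python. The function below is illustrative only; the final conversion from standard scores to the 0–100 reporting scale has not been publicly specified, so it is not modelled here.

```python
def standard_score(equated_mark, cohort_mean, cohort_sd,
                   target_mean=60, target_sd=12):
    """Linearly rescale an equated mark onto a mean-60, SD-12 scale."""
    z = (equated_mark - cohort_mean) / cohort_sd  # distance from the mean, in SDs
    return target_mean + z * target_sd

# Mohan and Wang: equated mark 30, cohort mean 28, cohort SD 4
print(standard_score(30, 28, 4))  # -> 66.0
```

A mark exactly at the cohort mean would land at 60, and each standard deviation above or below the mean moves the result by 12 points.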
3. Where Is This Method Used?
Equating and standardisation are routine features of many international tests, including:
- SAT (USA university entrance) – each sitting has a different form; equating ensures, for example, that 540 means the same whichever month it is earned.
- ACT (USA college readiness) – anchor items and psychometric modelling equate multiple forms.
- NAPLAN Online (Australia) – an adaptive design selects questions dynamically; scores are subsequently adjusted so all test pathways share a common scale.
4. Advantages
- Comparable scores – candidates are judged on the same metric even if they answered different questions.
- Flexible scheduling – tests can run over multiple days and still remain comparable.
- Better security – limits the impact of leaked content or pre-knowledge of questions.
5. Limitations
- Standard error of measurement – every reported mark carries some statistical noise, often on the order of ±2–3 points.
- Fine-cut selection pressure – in rank-order systems with fixed places even small equating variations can influence outcomes near the cut-off.
- Higher complexity and cost – designing, trialling and analysing multiple forms demands more psychometric expertise and resources.
6. Take-Home Message
Equating and standardisation are internationally accepted tools for keeping large-scale testing fair and practicable when multiple versions are used. They do not remove all uncertainty, but they greatly reduce any systematic advantage tied to sitting a particular form.
Because NSW DoE has not disclosed its precise scoring model, the information here is descriptive, not prescriptive. When you interpret any score, remember that all testing carries a margin of error, and the bigger picture of a student’s learning journey matters more than a single number.
🔸 Disclaimer
This article offers a brief, general overview of equating and standardisation. The actual procedures used by NSW DoE (or any other agency) may differ.
For authoritative advice, consult official sources or a qualified measurement specialist. The author accepts no legal responsibility for actions taken solely on the basis of this information.