NSW has one of the most valuable property datasets in Australia: public sales records from the Valuer General. In theory, this is exactly what everyday buyers need. In practice, it is not a neat spreadsheet, not a friendly CSV, not a polished PDF, and not something most mums and dads can casually open after dinner.

The official bulk files arrive as ZIP archives. Inside those ZIPs are .DAT files. Inside those DAT files are multiple record types. The only record type that carries the useful sale-level property information is the Type B record. To use it properly, you first need to know that the Type B records matter, then you need to parse them, then you need to clean them, and only then can you begin asking useful market questions.

Our NSW pipeline starts with 2,170,107 raw sale rows.
After filtering, deduplication, classification and bulk-sale handling, the current base tables contain 1,037,644 house records and 585,741 unit records.

The First Trap: The Data Format

NSW bulk property sales information is published as weekly downloads for current data and annual downloads for history. The data is public, but it arrives as delimited flat ASCII text in DAT files, packaged inside ZIP archives. That is technically usable. It is not user friendly.

If you are an analyst, you can write a parser. If you are an ordinary buyer trying to understand whether a suburb is moving, you are probably going to end up downloading ZIPs, unpacking files, opening something that looks alien, and wondering why a public dataset is not simply available as a clean CSV.

B;district_code;property_id;sequence_number;file_date;property_name;unit_number;house_number;street_name;locality;postcode;area;area_type;contract_date;settlement_date;purchase_price;zoning;nature_of_property;primary_purpose;...

That is the shape of the problem. The data is there, but it does not meet normal users where they are. It assumes you can read a file layout guide, understand fixed business meanings, identify the right record type, and build your own importer.
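As a rough illustration of what "build your own importer" means, here is a minimal Python sketch that opens a ZIP, reads every .DAT file inside it, and keeps only the lines that start with B. The field names are copied from the layout line above. The function name iter_type_b_records, the "extra" key and the ASCII decoding choice are assumptions made for this example, not the pipeline's actual code, and anything past the named fields is kept unparsed because the layout above is truncated.

```python
import io
import zipfile

# Field names copied from the Type B layout line shown above. That layout is
# truncated ("..."), so anything past the named fields is kept unparsed.
B_FIELDS = [
    "district_code", "property_id", "sequence_number", "file_date",
    "property_name", "unit_number", "house_number", "street_name",
    "locality", "postcode", "area", "area_type", "contract_date",
    "settlement_date", "purchase_price", "zoning", "nature_of_property",
    "primary_purpose",
]


def iter_type_b_records(zip_source):
    """Yield one dict per Type B line found in any .DAT file in the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.upper().endswith(".DAT"):
                continue
            with archive.open(name) as raw:
                # Assumption: the files decode as ASCII; odd bytes are replaced.
                text = io.TextIOWrapper(raw, encoding="ascii", errors="replace")
                for line in text:
                    parts = line.rstrip("\r\n").split(";")
                    if parts[0] != "B":
                        continue  # skip every other record type
                    values = parts[1:]
                    record = dict(zip(B_FIELDS, values))
                    record["extra"] = values[len(B_FIELDS):]
                    yield record
```

Every value stays a string at this stage. Converting prices and dates is deliberately left for the cleaning steps, because that is where the bad values show up.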

Modern detached house in Sydney
For a normal buyer, the question is simple: what did similar houses sell for? The official bulk data makes that answer surprisingly difficult to extract. Image: Sardaka, CC BY-SA 4.0, local copy.

The Second Trap: Noisy Property Types

The raw file is not just houses and units. It includes commercial property, vacant land, car spaces, warehouses, shops, offices, factories, rural land, hotels and plenty of mixed labels. The most common raw purpose is RESIDENCE, but that label alone does not tell you whether the record should be treated as a house or a unit.

In our current raw table, the largest purpose groups include:

Raw primary purpose | Rows | Why it matters
RESIDENCE | 1,810,509 | Can be either house or unit depending on unit number and strata fields.
VACANT LAND | 229,432 | Not a dwelling resale, so it must not be mixed into house medians.
COMMERCIAL | 50,823 | Useful for other work, but irrelevant to residential buyer charts.
FARM / RURAL labels | 31,000+ | Some may contain dwellings, but they are not simple suburban house sales.
CARSPACE / CAR SPACE | 3,147 | Can create absurdly low "unit" prices if not removed.

Even after you get the file open, you are not looking at a clean residential sales table. You are looking at a registry-style dump that needs interpretation.

Harder Than It Looks: House or Unit?

Distinguishing a house from a unit is not as simple as reading one column. In our pipeline, a RESIDENCE with a strata lot number becomes a unit. A RESIDENCE with a unit number becomes a unit. Explicit labels such as UNIT, STRATA UNIT, FLATS, HOME UNITS and APARTMENT are also treated as units. HOUSE, TOWNHOUSE, DUPLEX and VILLA are usually houses unless the surrounding fields say otherwise.

That sounds straightforward only after the rule exists. Before that, the raw data simply gives you a collection of imperfect clues. If you classify too loosely, you mix apartments into house medians. If you classify too aggressively, you throw away valid records.
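The rules described above can be sketched as a small Python function. This is an illustration, not the pipeline's exact logic. The function name and parameter names are hypothetical, and two choices are assumptions: a RESIDENCE with no unit clue is treated as a house, and a unit number or strata lot is taken to be the "surrounding field" that overrides a house-style label.

```python
UNIT_LABELS = {"UNIT", "STRATA UNIT", "FLATS", "HOME UNITS", "APARTMENT"}
HOUSE_LABELS = {"HOUSE", "TOWNHOUSE", "DUPLEX", "VILLA"}


def classify_dwelling(primary_purpose, unit_number="", strata_lot_number=""):
    """Return "unit", "house", or None for rows that are neither."""
    purpose = (primary_purpose or "").strip().upper()
    has_unit_clue = bool((unit_number or "").strip()) or bool(
        (strata_lot_number or "").strip()
    )
    if purpose in UNIT_LABELS:
        return "unit"
    if purpose == "RESIDENCE":
        # Assumption: a RESIDENCE with no unit clue is counted as a house.
        return "unit" if has_unit_clue else "house"
    if purpose in HOUSE_LABELS:
        # Assumption: a unit number or strata lot overrides a house-style label.
        return "unit" if has_unit_clue else "house"
    return None  # vacant land, commercial, car spaces and so on
```

Returning None for everything else keeps car spaces and vacant land out of both tables, even when a car space happens to carry a unit number.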

Apartment building in Sydney
Units are not always labelled cleanly as units. In NSW sales data, strata fields, unit numbers and purpose labels all have to be interpreted together. Image: Sardaka, CC BY-SA 4.0, local copy.

Data Trap: Bulk Transactions Can Break the Median

The most dangerous issue is the bulk transaction. Sometimes one dealing covers multiple properties. The source can repeat the same total contract price on every property row. If you do not detect it, one portfolio sale can look like multiple individual properties sold at impossible prices.

Here is a real example from the NSW data:

Dealing | Suburb | Rows | Price shown on each row | False total if summed | Adjusted per property
AP951161 | North Richmond | 32 | $20,184,000 | $645,888,000 | $630,750

The transaction includes 32 house records on De Havilland Way in North Richmond. Each row carries the same $20.184 million price. Read naively, it looks like 32 houses sold for $20.184 million each. The cleaner interpretation is that the total transaction was $20.184 million across 32 properties, or $630,750 per property.

Across the current cleaned Stage3 table, we identified 4,973 bulk dealings and 11,240 bulk rows. Their repeated raw prices add up to about $33.45 billion. After per-property adjustment, that becomes about $11.98 billion. That gap is not a rounding error. It is the difference between a usable dataset and a nonsense one.
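The per-property adjustment is simple arithmetic once the bulk dealing is detected. The Python sketch below is a minimal version under stated assumptions: rows are already deduplicated, the keys "dealing_number", "property_id" and "price" are hypothetical names, and a dealing is treated as bulk when the same price repeats across more than one distinct property. It splits the price equally, which matches the $630,750 figure above but may not reflect the true value of each property.

```python
from collections import defaultdict


def adjust_bulk_prices(rows):
    """Return copies of rows with "bulk_size" and "adjusted_price" added.

    Assumes deduplicated dicts with hypothetical keys "dealing_number",
    "property_id" and "price" (whole dollars).
    """
    # Count distinct properties that share a dealing number and a price.
    properties_by_deal = defaultdict(set)
    for row in rows:
        key = (row["dealing_number"], row["price"])
        properties_by_deal[key].add(row["property_id"])

    adjusted = []
    for row in rows:
        size = len(properties_by_deal[(row["dealing_number"], row["price"])])
        adjusted.append(
            {**row, "bulk_size": size, "adjusted_price": row["price"] / size}
        )
    return adjusted
```

Fed 32 rows for dealing AP951161 at $20,184,000 each, this returns an adjusted price of 630750.0 per row, and the adjusted prices sum back to the single contract total instead of $645,888,000.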

Altitude Apartments in Parramatta
Bulk apartment or estate transactions can appear as repeated high-price rows. Without correction, a single portfolio transaction can distort suburb-level medians. Image: Sardaka, CC BY-SA 4.0, local copy.

Manual Check Required: Impossible Prices

Some records are simply too extreme to trust without outside proof. A $130 million Point Piper sale might be real. A $415 million Middle Cove house sale probably needs serious checking. The problem is that the database itself cannot always know which is which.

We maintain manual exclusion lists for known bad property IDs and dealing numbers. These are not elegant, but they are necessary. A rule like "remove everything over $100 million" would wrongly remove some legitimate ultra-prime sales. A rule like "keep everything because it is official" would let obvious errors pollute the medians.

Example record | Suburb | Raw price | Why it needs review
Property ID 969539 | Middle Cove | $415,000,000 | Recorded as a residence. Implausible enough to be manually excluded.
Property ID 1986586 | Maroubra | $210,000,000 | Same property later appears around $3.44m, so the huge record is not trusted.
Property ID 2009977 | Coogee | $139,500,000 | Same address has normal-scale sales nearby in the record history.

This is exactly where manual checking matters. If an extreme sale changes a suburb chart, the next step should be boring and old-fashioned: search the address, look for independent evidence, and decide whether the number describes a real sale, a bulk transaction, a commercial asset, or a data error.
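One way to combine an exclusion list with a review queue is sketched below in Python. It is an illustration only. The single property ID comes from the table above, where it is described as manually excluded. The dealing-number set is left empty because no excluded dealings are listed here. The $100 million figure is a hypothetical review threshold, used to flag rows for a human rather than to delete them.

```python
# The one property ID described above as manually excluded.
EXCLUDED_PROPERTY_IDS = {"969539"}
# Left empty: no excluded dealing numbers are listed in this article.
EXCLUDED_DEALING_NUMBERS = set()

# Hypothetical threshold. It sends rows to review; it does not remove them.
REVIEW_ABOVE = 100_000_000


def triage(row, review_above=REVIEW_ABOVE):
    """Return "exclude", "review" or "keep" for one sale row (a dict)."""
    if str(row["property_id"]) in EXCLUDED_PROPERTY_IDS:
        return "exclude"
    if row.get("dealing_number") in EXCLUDED_DEALING_NUMBERS:
        return "exclude"
    if row["price"] >= review_above:
        return "review"
    return "keep"
```

Keying the list on property ID alone drops every sale of that property, including any normal-scale ones, so a fuller version would probably pair the ID with a dealing number, date or price.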

Bad Dates Are Real Too

Dates need cleaning before any timeline chart can be trusted. We found records with missing contract dates, dates in the years 0015 and 0016, and even a future contract date. Some of those rows are harmless once filtered. But if you are copy-pasting into Excel and grouping by year, they can quietly break the analysis.

Suburb | Recorded contract date | Price | Issue
Kingswood | 0015-08-29 | $474,000 | Clearly not a modern NSW sale date.
Moorebank | 2026-10-20 | $996,400 | Future date relative to this analysis.
Wilton | NULL | $20,000,000 | Missing contract date, unusable for timelines.
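A date check along these lines can be sketched in a few lines of Python. The sketch assumes dates have already been converted to the YYYY-MM-DD form shown in the table. The function name is hypothetical and the earliest_year default of 1990 is an arbitrary assumption, not a known property of the dataset. The analysis date is passed in explicitly so that "future" means future relative to the analysis rather than to the day the script runs.

```python
from datetime import date


def check_contract_date(value, analysis_date, earliest_year=1990):
    """Return None if the date looks usable, else a short reason string."""
    if value is None or not str(value).strip():
        return "missing"
    try:
        parsed = date.fromisoformat(str(value).strip())
    except ValueError:
        return "unparseable"
    if parsed.year < earliest_year:
        # Assumption: anything before earliest_year is a data-entry error.
        return "implausibly early"
    if parsed > analysis_date:
        return "future"
    return None
```

With an analysis date of 1 January 2026 chosen purely for illustration, the Kingswood date comes back as "implausibly early", the Moorebank date as "future", and a missing value as "missing".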

Duplicates Are Not Rare

Duplicate records are not a small edge case. In the Stage2 table, we found 49,703 duplicated property/dealing groups, representing 81,345 extra rows to remove. Some duplicates are straightforward repeats. Others appear many times for the same property and dealing. If you do not remove them, sale counts inflate and medians can drift.

That matters because people use sales counts as a confidence signal. A suburb with 80 sales feels more reliable than a suburb with 8 sales. But if some of those sales are duplicated registry rows, the confidence is fake.
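Removing repeats by property and dealing can be sketched as follows. This is a minimal Python version, assuming each row is a dict with the hypothetical keys "property_id" and "dealing_number", and that the first occurrence is the one worth keeping. A real pipeline might prefer the row with the latest file date instead.

```python
def drop_duplicate_sales(rows):
    """Keep the first row per (property_id, dealing_number) pair.

    Returns the kept rows and the number of extra rows removed.
    """
    seen = set()
    kept = []
    for row in rows:
        key = (row["property_id"], row["dealing_number"])
        if key in seen:
            continue  # a repeat of a sale already kept
        seen.add(key)
        kept.append(row)
    return kept, len(rows) - len(kept)
```

Because the key includes the property ID, the rows of a bulk dealing, which share a dealing number but cover different properties, are not collapsed by this step.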

Why the "Show Sales" Button Matters

This is why heatmaps.com.au keeps the raw sale table accessible for NSW through the Show Sales button. A median chart is useful, but it is a summary. When a suburb looks strange, the only responsible next step is to inspect the underlying sales.

The Ultimo example from our yield work is a good reminder. A high unit yield looked suspicious until the individual sale records showed many tiny student-accommodation-style units selling at low prices. The ranking was not necessarily wrong. The median was describing a very specific type of stock.

Where AI Helps

The good news is that these days you do not need to be a full-time data engineer to make progress. You can use AI prompt by prompt: ask it to read the file layout, identify the useful record type, write an importer, check duplicates, separate houses from units, flag impossible prices, and produce a suburb-level summary.

But AI should not be treated as an oracle. It can help you move faster through messy public data, but it can also confidently apply the wrong rule. The most important records still need human review: very high prices, very low prices, sudden suburb spikes, bulk transactions, and anything that changes your conclusion.

Practical rule: use AI to clean and shortlist. Use manual checking to trust. For extreme values, search the address and look for independent proof before believing the chart.

The Real Lesson

NSW property sales data is public and powerful, but it is not ready to use. It is a government data product, not a consumer research product. The hard work is not only downloading the files. The hard work is knowing which rows matter, which rows to ignore, which rows to correct, and which rows to manually verify.

That is the difference between having data and having evidence.

Sources: NSW Valuer General property sales data guides, NSW bulk property sales information downloads, NSW Government property sales information guide, heatmaps.com.au processing pipeline and local NSW sales database. Not financial advice.
