At a glance

Mean absolute error (MAE) of MoleBench against experiment, across the benchmark sets below:

Property | Method | MAE | Verdict
Bond lengths | GFN2-xTB geometry | ≈ 0.007 Å | excellent
Dipole moment | GFN2-xTB | ≈ 0.22 D | good (slight over-prediction)
¹H chemical shift | GIAO B3LYP/6-31G* | ≈ 0.15 ppm | good
¹³C chemical shift | GIAO B3LYP/6-31G* | ≈ 1.5 ppm* | good
pKa | GFN2 ΔG + per-class scaling | ≈ 0.3 units | good (per functional group)
UV-Vis λmax | TD-B3LYP/6-31G* | ≈ 5 nm | good (basis-dependent)

*Excluding one gas-phase carboxylic-acid outlier (the acetic-acid carbonyl, discussed below). All calculations were run on this site; you can reproduce any of them in the Studio.

The honest one-liner: MoleBench is excellent for geometry and trends, and good and quantitative for NMR, dipoles and (after a per-class re-fit) pKa. Use it to understand and compare — and reach for the literature when you need a publication number.

Geometry — GFN2-xTB bond lengths

Getting the shape right is the foundation of everything else, and here MoleBench's default engine is genuinely strong: bond lengths land within about a hundredth of an ångström of experiment.

Molecule | Bond | MoleBench (Å) | Experiment (Å) | Δ (Å)
water | O–H | 0.959 | 0.958 | +0.001
methane | C–H | 1.082 | 1.087 | −0.005
ethane | C–C | 1.522 | 1.535 | −0.013
ethane | C–H | 1.088 | 1.094 | −0.006
benzene | C–C | 1.385 | 1.397 | −0.012
benzene | C–H | 1.080 | 1.084 | −0.004
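The MAE in the summary table is simply the average of the absolute Δ column. Below is a minimal Python sketch of that arithmetic for the six bonds above. It also computes a mean signed error. That is an extra statistic added here for illustration (not one MoleBench reports), and it separates a systematic bias from random scatter:

```python
# (molecule, bond, MoleBench Å, experiment Å), copied from the table above
BONDS = [
    ("water", "O–H", 0.959, 0.958),
    ("methane", "C–H", 1.082, 1.087),
    ("ethane", "C–C", 1.522, 1.535),
    ("ethane", "C–H", 1.088, 1.094),
    ("benzene", "C–C", 1.385, 1.397),
    ("benzene", "C–H", 1.080, 1.084),
]


def mean_absolute_error(pairs):
    """Average of |predicted - reference| over (predicted, reference) pairs."""
    return sum(abs(p - r) for p, r in pairs) / len(pairs)


pairs = [(calc, expt) for _, _, calc, expt in BONDS]
bond_mae = mean_absolute_error(pairs)
# Mean signed error: a negative value means bonds come out short on average.
bond_mse = sum(calc - expt for calc, expt in pairs) / len(pairs)
print(f"MAE = {bond_mae:.4f} Å, mean signed error = {bond_mse:+.4f} Å")
```

The MAE comes out at about 0.0068 Å, which matches the ≈ 0.007 Å above. The mean signed error is about −0.0065 Å, so most of that error is a small, consistent shortening rather than random scatter: five of the six bonds come out short.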

Dipole moments — GFN2-xTB

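The dipole tests the electronic structure, not just the shape. Trends are captured well. Non-polar benzene comes out at zero and acetonitrile comes out most polar. The ordering is preserved except for water and chloromethane, which differ by only 0.02 D experimentally. On top of that sits a mild systematic over-prediction of a few tenths of a Debye for most of the polar molecules, largest for acetone (+0.54 D). Formaldehyde and acetonitrile are the exceptions.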

Molecule | MoleBench (D) | Experiment (D) | Δ (D)
benzene | 0.00 | 0.00 | 0.00
formaldehyde | 2.33 | 2.33 | 0.00
acetonitrile | 3.85 | 3.92 | −0.07
chloromethane | 2.04 | 1.87 | +0.17
methanol | 1.92 | 1.70 | +0.22
dimethyl ether | 1.58 | 1.30 | +0.28
ammonia | 1.78 | 1.47 | +0.31
water | 2.22 | 1.85 | +0.37
acetone | 3.42 | 2.88 | +0.54
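Both "trends" and "over-prediction" can be expressed as a number. Spearman rank correlation measures how well the ordering survives, and the mean signed error measures the systematic shift. Here is a minimal pure-Python sketch over the nine molecules above. The helper names are ours, and the rank formula used is the tie-free form, which is valid here because no two values tie:

```python
# Dipole moments in debye, copied from the table above: name -> (MoleBench, experiment)
DIPOLES = {
    "benzene": (0.00, 0.00),
    "formaldehyde": (2.33, 2.33),
    "acetonitrile": (3.85, 3.92),
    "chloromethane": (2.04, 1.87),
    "methanol": (1.92, 1.70),
    "dimethyl ether": (1.58, 1.30),
    "ammonia": (1.78, 1.47),
    "water": (2.22, 1.85),
    "acetone": (3.42, 2.88),
}


def ranks(values):
    """Ascending ranks 1..n; assumes no tied values (true for this data set)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n*(n^2 - 1)), tie-free form."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


calc = [c for c, _ in DIPOLES.values()]
expt = [e for _, e in DIPOLES.values()]
errors = [c - e for c, e in zip(calc, expt)]
mae = sum(abs(err) for err in errors) / len(errors)  # scatter plus bias
mse = sum(errors) / len(errors)                      # average signed shift
rho = spearman_rho(calc, expt)                       # 1.0 = identical ordering
print(f"MAE {mae:.2f} D, mean signed error {mse:+.2f} D, Spearman rho {rho:.3f}")
```

This gives an MAE of about 0.22 D, a mean signed error of about +0.20 D and ρ ≈ 0.983. The only rank swap is water and chloromethane.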

¹³C NMR — GIAO B3LYP/6-31G* (Advanced tier)

The quantum NMR is calibrated against experiment and performs well across nearly 200 ppm, from the shielded toluene methyl (21.4 ppm) to the deshielded acetone carbonyl (206.0 ppm).

Molecule | Carbon | MoleBench (ppm) | Experiment (ppm) | Δ (ppm)
benzene | CH | 128.7 | 128.5 | +0.2
acetone | C=O | 205.7 | 206.0 | −0.3
acetone | CH₃ | 28.2 | 30.9 | −2.7
toluene | C1 (ipso) | 139.2 | 137.8 | +1.4
toluene | C2–C6 (avg) | 128.1 | 127.8 | +0.3
toluene | CH₃ | 22.3 | 21.4 | +0.9
methanol | CH₃ | 53.1 | 50.4 | +2.7
ethanol | CH₂ | 61.9 | 58.4 | +3.5
acetic acid | C=O | 168.7 | 178.1 | −9.4

⚠ The acetic-acid carbonyl is the one large miss, and it is instructive. The calculation treats a single gas-phase molecule, whereas real acetic acid hydrogen-bonds and dimerizes, which shifts that carbon by several ppm. This is a model error (the chemistry of the environment is missing), not a method failure, and it is exactly why we show it. Every other carbon lands within 3.5 ppm, and most within ~2–3 ppm.
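The ¹³C MAE in the summary is quoted without the acetic-acid carbonyl. Recomputing it from the nine rows above, with and without that one point, shows how far a single environmental outlier moves the average. This is a minimal sketch, and `mae` is a hypothetical helper defined here:

```python
# (label, MoleBench ppm, experiment ppm), copied from the ¹³C table above
C13 = [
    ("benzene CH", 128.7, 128.5),
    ("acetone C=O", 205.7, 206.0),
    ("acetone CH3", 28.2, 30.9),
    ("toluene C1", 139.2, 137.8),
    ("toluene C2-C6", 128.1, 127.8),
    ("toluene CH3", 22.3, 21.4),
    ("methanol CH3", 53.1, 50.4),
    ("ethanol CH2", 61.9, 58.4),
    ("acetic acid C=O", 168.7, 178.1),
]


def mae(rows, exclude=()):
    """Mean absolute error over rows whose label is not in `exclude`."""
    kept = [(calc, expt) for label, calc, expt in rows if label not in exclude]
    return sum(abs(calc - expt) for calc, expt in kept) / len(kept)


print(f"all nine carbons:           {mae(C13):.2f} ppm")
print(f"without acetic-acid C=O:    {mae(C13, exclude={'acetic acid C=O'}):.2f} ppm")
```

Including the outlier gives about 2.38 ppm, and excluding it gives 1.50 ppm. The one gas-phase miss accounts for most of the difference.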

¹H NMR — GIAO B3LYP/6-31G*

Molecule | Proton | MoleBench (ppm) | Experiment (ppm) | Δ (ppm)
ethanol | CH₃ | 1.16 | 1.21 | −0.05
toluene | CH₃ | 2.32 | 2.34 | −0.02
acetone | CH₃ | 1.95 | 2.09 | −0.14
benzene | ArH | 7.12 | 7.26 | −0.14
acetic acid | CH₃ | 1.93 | 2.10 | −0.17
methanol | CH₃ | 3.66 | 3.40 | +0.26
ethanol | CH₂ | 3.96 | 3.69 | +0.27

O–H and N–H protons are omitted: their shifts are dominated by hydrogen bonding and concentration, so a gas-phase value is not comparable to a solution measurement.

pKa — GFN2 deprotonation + per-class calibration

This one has a story. An earlier single global calibration carried a +1.5–2 unit high bias on carboxylic acids — which this very benchmark exposed. The cause: the GFN2 deprotonation energy maps to pKa with a class-dependent slope (carboxylic acids, phenols and alcohols each follow a different line), so no single line can fit them all. We re-fit per functional-group class against a 20-acid set spanning pKa 0–17. The bias is gone, and the error dropped to ~0.3 units:

Acid | Class | MoleBench | Experiment | Δ
trifluoroacetic acid | acid | 0.1 | 0.23 | −0.1
formic acid | acid | 3.3 | 3.75 | −0.5
acetic acid | acid | 4.7 | 4.76 | −0.1
benzoic acid | acid | 4.6 | 4.20 | +0.4
p-nitrophenol | phenol | 6.4 | 7.15 | −0.8
phenol | phenol | 10.1 | 9.99 | +0.1
thiophenol | thiol | 6.8 | 6.62 | +0.2
ethanol | alcohol | 16.1 | 16.0 | +0.1
phosphoric acid | P-oxyacid | 2.0 | 2.15 | −0.2
methanesulfonic acid | S-oxyacid | −2.0 | −1.9 | −0.1
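The per-class re-fit can be sketched as a straight line, pKa ≈ a·ΔG + b, fitted separately for each class. The text does not say which fitting procedure MoleBench uses, so ordinary least squares is an assumption here. The training data are hypothetical, noise-free points on two lines with different slopes (ΔG in arbitrary units), built only to show why one global line cannot fit both. All function names are illustrative:

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_per_class(records):
    """records: iterable of (acid_class, dG, pKa_exp). Returns {class: (a, b)}."""
    by_class = defaultdict(lambda: ([], []))
    for cls, dg, pka in records:
        by_class[cls][0].append(dg)
        by_class[cls][1].append(pka)
    return {cls: fit_line(xs, ys) for cls, (xs, ys) in by_class.items()}


def predict(models, acid_class, dg):
    """Apply the line belonging to the molecule's class."""
    a, b = models[acid_class]
    return a * dg + b


# Hypothetical, noise-free data on two lines with different slopes.
# NOT MoleBench's calibration set.
TRAIN = [("acid", x, 0.2 * x - 58) for x in (300, 305, 310, 315)] + [
    ("phenol", x, 0.4 * x - 121) for x in (320, 325, 330)
]

models = fit_per_class(TRAIN)
for cls, (a, b) in sorted(models.items()):
    print(f"{cls}: slope {a:.3f}, intercept {b:.2f}")

# One global line through every point, for comparison.
ga, gb = fit_line([x for _, x, _ in TRAIN], [y for _, _, y in TRAIN])
worst_global = max(abs(y - (ga * x + gb)) for _, x, y in TRAIN)
print(f"largest residual of a single global line: {worst_global:.2f} pKa units")
```

Each class recovers its own slope and intercept. The single global line still leaves residuals of up to about 0.86 pKa units, even on this noise-free data, which is the same kind of structured error a global calibration introduces.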

Held out from the calibration set, then predicted blind: 2-naphthol 9.8 (exp 9.51) and propanoic acid 4.7 (exp 4.87). Both land within 0.3 units, so the fit generalizes rather than memorizing. The acidity ranking across the table is reliable, with only close pairs such as thiophenol and p-nitrophenol swapping. Absolute values are now good to roughly ±0.5 unit for the common classes.
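The blind-test arithmetic is simple enough to spell out. This sketch re-uses only the four numbers quoted above and checks them against the ±0.5-unit figure. The names are illustrative:

```python
# Blind predictions quoted above: name -> (MoleBench pKa, experimental pKa)
HELD_OUT = {"2-naphthol": (9.8, 9.51), "propanoic acid": (4.7, 4.87)}
TOLERANCE = 0.5  # the "roughly ±0.5 unit" figure for the common classes

errors = {name: pred - expt for name, (pred, expt) in HELD_OUT.items()}
for name, err in errors.items():
    print(f"{name}: {err:+.2f} (within ±{TOLERANCE}: {abs(err) <= TOLERANCE})")
```

The errors are +0.29 for 2-naphthol and −0.17 for propanoic acid, both inside the tolerance. Two molecules are a small sample, so this is a sanity check rather than a statistical validation.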

Two honest limitations, now flagged in the tool itself. Phosphorus and sulfur oxyacids (phosphoric, phosphonic, sulfonic) were originally mis-scored as alcohols (phosphoric came out at ~8 instead of ~2). They now use dedicated P- and S-oxyacid classes and carry an "approximate, strong acid" note. Amino acids are detected and labelled. The gas-phase neutral model cannot form the zwitterion that dominates in water, so glycine's −COOH reads ≈3.9 rather than the measured ≈2.35. The tool now says so up front instead of quietly handing you the wrong number.
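The design choice here is to attach the caveat to the result rather than return a bare number, and it can be sketched in a few lines. The class labels and the "approximate, strong acid" wording come from the text above. The function, data structure and amino-acid caveat wording are hypothetical illustrations, not MoleBench's code:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical caveat table keyed by class label (labels echo the text above).
CLASS_CAVEATS = {
    "P-oxyacid": "approximate, strong acid",
    "S-oxyacid": "approximate, strong acid",
}
AMINO_ACID_CAVEAT = (
    "gas-phase neutral model cannot form the zwitterion that dominates in water; "
    "not comparable to a measured aqueous pKa"
)


@dataclass
class PkaResult:
    value: float
    acid_class: str
    caveat: Optional[str] = None


def annotate_pka(value: float, acid_class: str, is_amino_acid: bool = False) -> PkaResult:
    """Bundle the number with any known caveat so a caller cannot drop it silently."""
    if is_amino_acid:
        return PkaResult(value, acid_class, AMINO_ACID_CAVEAT)
    return PkaResult(value, acid_class, CLASS_CAVEATS.get(acid_class))


print(annotate_pka(2.0, "P-oxyacid"))
print(annotate_pka(3.9, "acid", is_amino_acid=True).caveat)
print(annotate_pka(4.7, "acid").caveat)  # no known caveat for a plain carboxylic acid
```

Returning a small record instead of a float means any interface that displays the value also has the caveat on hand to display.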

UV-Vis & why the basis set matters

UV-Vis is a great lesson in how method choice drives accuracy. The strong π→π* absorption of paracetamol (experimental λmax ≈ 243–249 nm) marches steadily toward experiment as the basis set improves:

Basis set | Predicted λmax (nm) | Δ vs exp. ~244 nm (nm)
STO-3G (minimal) | 212 | −32
3-21G (Quick tier) | 235 | −9
6-31G* (Advanced tier) | 240 | −4
6-31+G* (diffuse) | 246 | +2

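This is why the Studio's Quick UV uses 3-21G and Advanced uses 6-31G*. TD-DFT with B3LYP is reliable for ordinary valence (π→π*, n→π*) excitations. It should not be trusted for charge-transfer states, which global hybrids such as B3LYP tend to place too low in energy. Nor should it be trusted for Rydberg states, which need diffuse basis functions that 3-21G and 6-31G* lack.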
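TD-DFT errors are usually quoted in electron-volts rather than nanometres. The two are related by E = hc/λ, with hc ≈ 1239.84 eV·nm. This small illustrative sketch converts the basis-set table that way. `nm_to_ev` is a helper defined here, not a Studio function:

```python
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV·nm


def nm_to_ev(wavelength_nm: float) -> float:
    """Photon energy in eV for a wavelength in nm (E = hc / λ)."""
    return HC_EV_NM / wavelength_nm


EXP_NM = 244.0  # experimental paracetamol λmax used as the reference above
PREDICTED_NM = {"STO-3G": 212, "3-21G": 235, "6-31G*": 240, "6-31+G*": 246}

for basis, lam in PREDICTED_NM.items():
    error_ev = nm_to_ev(lam) - nm_to_ev(EXP_NM)  # too-short λ means too-high energy
    print(f"{basis:8s} {lam} nm = {nm_to_ev(lam):.2f} eV, error {error_ev:+.2f} eV")
```

The STO-3G miss is about +0.77 eV, and the 6-31G* result is within about 0.1 eV. Near 244 nm, 1 nm corresponds to roughly 0.02 eV (|dE/dλ| = hc/λ²), so the ≈ 5 nm MAE in the summary amounts to about 0.1 eV.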

The instant (empirical) NMR — and its honest limits

The Quick NMR returns shifts in milliseconds using substituent-additivity rules. For the chemistry it was built for, it is remarkably good. It also knows its limits, and now tells you so.

Molecule | Works well? | Why
aspirin, paracetamol, toluene | yes (within ~2–3 ppm) | substituted benzenes, carbonyls and simple aliphatics (its sweet spot)
pyridine, furan (heteroaromatic) | no | additivity has no good base values; flagged "low confidence"
caffeine (fused rings) | no | fused/heteroaromatic; flagged, with a "Run Advanced" button
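Substituent additivity for a benzene ring starts from benzene's 128.5 ppm. It then adds a tabulated increment for each substituent, chosen by whether the carbon is ipso, ortho, meta or para to that substituent. Below is a minimal sketch. The CH₃ increments are textbook-style values used for illustration only, because MoleBench's own rule table is not published here. `benzene_shifts` is a hypothetical name, and raising an error for an unknown group stands in for the "low confidence" flag:

```python
BENZENE_BASE = 128.5  # ppm, the benzene ¹³C shift

# Illustrative increments (ppm) for (ipso, ortho, meta, para); an assumption,
# not MoleBench's actual rule table.
INCREMENTS = {
    "CH3": (9.3, 0.7, -0.1, -2.9),
}


def relative_position(k: int, j: int) -> int:
    """Ring distance on a six-membered ring: 0 ipso, 1 ortho, 2 meta, 3 para."""
    d = abs(k - j)
    return min(d, 6 - d)


def benzene_shifts(substituents: dict) -> list:
    """substituents maps ring position (0-5) to a group name; returns six shifts."""
    unknown = [g for g in substituents.values() if g not in INCREMENTS]
    if unknown:
        # No base value or increment to add: refuse rather than guess.
        raise ValueError(f"no increments for {unknown}: low confidence, run the QM tier")
    return [
        BENZENE_BASE
        + sum(INCREMENTS[g][relative_position(k, j)] for j, g in substituents.items())
        for k in range(6)
    ]


toluene = benzene_shifts({0: "CH3"})
p_xylene = benzene_shifts({0: "CH3", 3: "CH3"})
print([round(s, 1) for s in toluene])
print([round(s, 1) for s in p_xylene])
```

For toluene this gives 137.8 ppm for C1, the experimental value in the ¹³C table. For p-xylene it gives 134.9 ppm for the substituted carbons and 129.1 ppm for the CH carbons. A heteroaromatic such as pyridine has no benzene-like base value to start from, which is why this kind of scheme breaks down there.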

When the instant estimate is unreliable, MoleBench shows a warning and offers to run the real quantum calculation instead, so a fast estimate never masquerades as a trustworthy one. The Advanced (quantum) tier handles these cases well. On heteroaromatics of the kind the instant tier flags, it lands within ~2–3 ppm of experiment:

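Molecule | Carbon | Advanced (QM) (ppm) | Experiment (ppm) | Δ (ppm)
pyridine | C2/C6 | 151.8 | 149.9 | +1.9
pyridine | C4 | 136.3 | 136.0 | +0.3
pyridine | C3/C5 | 124.2 | 123.8 | +0.4
furan | C2/C5 | 141.3 | 142.8 | −1.5
furan | C3/C4 | 111.6 | 109.6 | +2.0
thiophene | C2/C5 | 124.8 | 125.4 | −0.6
thiophene | C3/C4 | 129.9 | 127.2 | +2.7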

The two tiers are complementary by design: the instant estimate for speed on its sweet spot, the quantum calculation for the cases it can't reach — and the tool always tells you which one you should be using.

How to read this

  • Trends are more reliable than absolutes. Comparing two similar molecules with the same method cancels most systematic error — relative answers are the safest use of any of these tools.
  • The model matters as much as the method. Several of the larger errors above are gas-phase vs. solution effects, not the quantum chemistry being wrong.
  • Pick the right tier. Quick tiers are for speed and exploration; Advanced (quantum) tiers are for the numbers you'll quote.
  • Everything here is reproducible. Build any of these molecules in the Studio and run the same calculation yourself.
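The first point can be checked on the dipole table. For two pairs of similar molecules, the error in the predicted difference is far smaller than either absolute error, because the shared over-prediction cancels. This is a two-pair illustration, not a general proof, and the helper names are hypothetical:

```python
# Dipoles in debye from the table above: name -> (MoleBench, experiment)
DIPOLES = {
    "methanol": (1.92, 1.70),
    "dimethyl ether": (1.58, 1.30),
    "water": (2.22, 1.85),
    "ammonia": (1.78, 1.47),
}


def absolute_error(name: str) -> float:
    """Predicted minus experimental dipole for one molecule."""
    calc, expt = DIPOLES[name]
    return calc - expt


def relative_error(a: str, b: str) -> float:
    """Error in the predicted difference a - b; a shared systematic offset cancels."""
    (calc_a, expt_a), (calc_b, expt_b) = DIPOLES[a], DIPOLES[b]
    return (calc_a - calc_b) - (expt_a - expt_b)


for a, b in [("methanol", "dimethyl ether"), ("water", "ammonia")]:
    print(
        f"{a} vs {b}: absolute errors {absolute_error(a):+.2f} and "
        f"{absolute_error(b):+.2f} D; error in the difference {relative_error(a, b):+.2f} D"
    )
```

Methanol and dimethyl ether are individually off by +0.22 and +0.28 D, but their difference is off by only −0.06 D. Water and ammonia are off by +0.37 and +0.31 D, and their difference by +0.06 D.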


Benchmark run on the live MoleBench compute service. Experimental values from standard reference compilations (CRC Handbook, NIST, SDBS and the primary literature). Updated 2026-06-25.