They have a very large collection of documents, some of which used Texas Instruments calculator fonts containing maths symbols that didn’t always render properly under font substitution. There were several other examples, including barcode fonts, where font substitution can show the numeric value instead: no information is lost, but the functionality is.
The top 10 fonts in a collection tend to be the same; it’s the long tail of up to 3,000 or so that might be the problem. Font names help a bit, but they vary hugely, e.g. 50+ names for Arial alone! In fact, it’s quite difficult to get useful matches between font names and the fonts in font tables, some of which have very weak information content. Times New Roman satisfies about 38% of the documents in their collection; Windows XP plus Word satisfies about 80%; the large collection of fonts they assembled would satisfy about 95%, and many more fonts would be needed to push that higher.
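To make the name-matching and coverage idea concrete, here is a minimal sketch in Python. It is not the speakers' method: the style-token list, the `normalize_font_name` heuristic and the `coverage` function are all hypothetical, and I'm assuming a document counts as "satisfied" only when every font it names is available.

```python
import re

# Hypothetical style/vendor tokens to strip; a real table would be far longer.
STYLE_TOKENS = {"bold", "italic", "oblique", "regular", "light", "mt", "ps", "psmt"}

def normalize_font_name(raw: str) -> str:
    """Reduce a raw font name to a rough family key (illustrative heuristic only)."""
    name = raw.strip()
    # Drop a PDF-style subset prefix such as "ABCDEF+".
    name = re.sub(r"^[A-Z]{6}\+", "", name)
    # Insert a space at lowercase-to-uppercase boundaries ("ArialMT" -> "Arial MT").
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    # Treat whitespace, commas, hyphens and underscores as separators.
    tokens = re.split(r"[\s,\-_]+", name.lower())
    kept = [t for t in tokens if t and t not in STYLE_TOKENS]
    return "".join(kept)

def coverage(doc_fonts, available):
    """Fraction of documents whose fonts all normalize to an available family.

    Assumption: a document is 'satisfied' only if every font it names is covered.
    """
    if not doc_fonts:
        return 0.0
    avail = {normalize_font_name(f) for f in available}
    satisfied = sum(
        1 for fonts in doc_fonts
        if all(normalize_font_name(f) in avail for f in fonts)
    )
    return satisfied / len(doc_fonts)

# Made-up example data, not figures from the talk.
docs = [
    ["ArialMT"],
    ["Arial,Bold", "TimesNewRomanPSMT"],
    ["Glasnost-light"],
]
share = coverage(docs, ["Arial", "Times New Roman"])
```

Even this toy version shows why the long tail hurts: each new naming convention needs another rule, and a name with little information in it gives the heuristic nothing to work with.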
The worst example was a Cyrillic font called Glasnost-light, which rendered as ASCII; the problem was related to the pre-Unicode code space in some way I didn’t understand. A font substitution looked hopeful; it produced Cyrillic, but unfortunately not Russian, as the encoding was different.
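I don't know which encodings were involved in the Glasnost-light case, but the "Cyrillic, but not Russian" effect is easy to reproduce with two common pre-Unicode Cyrillic encodings. The sketch below assumes, purely for illustration, that the text was stored as KOI8-R and then interpreted as Windows-1251; both codecs ship with Python.

```python
# Illustrative assumption: text stored as KOI8-R, then read as Windows-1251.
original = "Привет"  # Russian for "hello"

raw = original.encode("koi8_r")   # the bytes as the document stored them
wrong = raw.decode("cp1251")      # the same bytes read with the wrong code page

def is_cyrillic(text: str) -> bool:
    """True if every character lies in the basic Unicode Cyrillic block."""
    return all("\u0400" <= ch <= "\u04FF" for ch in text)

print(original, "->", wrong)
print("all Cyrillic:", is_cyrillic(wrong), "| still readable:", wrong == original)
```

The bytes are untouched, and every one of them maps to a Cyrillic letter in the second encoding, so the result looks plausible at a glance but is not the original Russian.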
Comment: this is a difficult problem that the commercial community has dealt with a great deal, using secret tables. But even Adobe only deals with a couple of thousand fonts.