xref offset wrong in cross-reference stream #68

Open
g-tc opened this issue Jun 27, 2023 · 0 comments

The following applies to PDF v1.5 or later files that use cross-reference streams. I have not looked at how the issue affects xref tables in PDF v1.4 and earlier.

The cross-reference stream data includes an entry for the cross-reference stream object itself, but the offset SynPDF records for that entry is not correct. The offset is redundant information: a reader only reads the stream because it found it through the offset in the trailer, so almost no software cares about it. In the last 12 months or so, however, PDF-XChange from Tracker Software has been reporting: "Error loading object: misaligned object". The load proceeds and PDF-XChange reports having automatically corrected the document, so it's not a big deal, but it worries users (and it annoys developers who want to know their PDFs are valid).
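For context, the kind of consistency check a strict reader could perform can be sketched as follows. This is a Python model (chosen so it runs without SynPDF) on hand-made bytes that are not a valid PDF; the helper names `find_startxref`, `object_at`, `is_aligned` and `build_sample` are hypothetical, and nothing here is claimed to match how PDF-XChange actually implements its check.

```python
import re


def find_startxref(data: bytes) -> int:
    """Return the integer that follows the last 'startxref' keyword."""
    idx = data.rfind(b"startxref")
    if idx < 0:
        raise ValueError("no startxref keyword found")
    tokens = data[idx + len(b"startxref"):].split()
    if not tokens:
        raise ValueError("startxref is not followed by an offset")
    return int(tokens[0].decode("ascii"))


def object_at(data: bytes, offset: int):
    """Return (object number, generation) if 'N G obj' starts at offset, else None."""
    m = re.match(rb"(\d+)\s+(\d+)\s+obj\b", data[offset:])
    return (int(m.group(1)), int(m.group(2))) if m else None


def is_aligned(data: bytes, obj_num: int, offset: int) -> bool:
    """True if the object header found at offset belongs to obj_num."""
    found = object_at(data, offset)
    return found is not None and found[0] == obj_num


def build_sample():
    """Hand-made, deliberately simplified bytes: object 7, then object 6 (the xref)."""
    data = b"%PDF-1.5\n"
    meta_offset = len(data)
    data += b"7 0 obj\n<< /Type /Metadata >>\nendobj\n"
    xref_offset = len(data)
    data += b"6 0 obj\n<< /Type /XRef >>\nendobj\n"
    data += b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n"
    return data, meta_offset, xref_offset


data, meta_offset, xref_offset = build_sample()
print(find_startxref(data) == xref_offset)   # True: startxref points at object 6
print(is_aligned(data, 6, xref_offset))      # True: the offset the entry should hold
print(is_aligned(data, 6, meta_offset))      # False: object 7 lives at that offset
```

The last line mirrors the situation described above: an entry for object 6 that carries the offset of a different object.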

A typical PDF from SynPDF contains xref data something like this heavily cut-down example:

0: 0 0000 ff
1: 2 0007 00
2: 2 0007 01
3: 2 0007 02
4: 2 0007 03
5: 1 000f 00
6: 1 0042 00    << xref object
7: 1 0042 00    << metadata object
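A dump in this shape can also be checked mechanically. The sketch below is Python, assuming the row layout shown above (`index: type field2 field3`, with the last two fields in hex); `parse_dump` and `duplicate_offsets` are hypothetical helper names. It flags type-1 (in-use, uncompressed) entries that share a byte offset. Type-2 entries are skipped because their second field is an object stream number, which several objects legitimately share.

```python
from collections import defaultdict

DUMP = """\
0: 0 0000 ff
1: 2 0007 00
2: 2 0007 01
3: 2 0007 02
4: 2 0007 03
5: 1 000f 00
6: 1 0042 00
7: 1 0042 00
"""


def parse_dump(text):
    """Parse 'index: type field2 field3' rows into (index, type, field2, field3) ints."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        index, fields = line.split(":", 1)
        # Only the first three fields are used, so trailing annotations are ignored.
        kind, second, third = fields.split()[:3]
        rows.append((int(index), int(kind), int(second, 16), int(third, 16)))
    return rows


def duplicate_offsets(rows):
    """Map each byte offset used by more than one type-1 entry to its object numbers."""
    by_offset = defaultdict(list)
    for index, kind, second, _third in rows:
        if kind == 1:  # type 1: the second field is a byte offset in the file
            by_offset[second].append(index)
    return {off: objs for off, objs in by_offset.items() if len(objs) > 1}


print(duplicate_offsets(parse_dump(DUMP)))  # {66: [6, 7]}, i.e. offset 0x42
```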

Notice the last two objects have the same offset of 0042. That is the result of this code:

procedure TPdfDocument.SaveToStreamDirectEnd;
[...]
    for i := 1 to FXref.ItemCount-1 do
      with FXref.Items[i] do
      if ByteOffset<=0 then begin
        fByteOffset := fSaveToStreamWriter.Position;
        if Value<>FTrailer.FCrossReference then
          Value.WriteValueTo(fSaveToStreamWriter);
      end;
[...]

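That loop means that the xref entry for FCrossReference ends up with the same fByteOffset as the next object in the collection: its position is recorded but nothing is written, so the stream position has not moved when the next object's offset is recorded. We can't give it the correct offset there because we don't know the correct offset yet.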
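The effect of that loop can be reproduced with a small model. This is a Python sketch, not SynPDF code: `assign_offsets` is a hypothetical name, the object sizes are invented for illustration, and the `ByteOffset<=0` guard is left out for brevity.

```python
def assign_offsets(sizes, xref_index, start=0):
    """Record the current position for every item, but advance the position
    only for items that are actually written (everything except the xref)."""
    position = start
    offsets = {}
    for index, size in sizes.items():
        offsets[index] = position
        if index != xref_index:
            position += size  # stands in for Value.WriteValueTo(...)
    return offsets, position


# Invented sizes in bytes; object 6 plays the role of FTrailer.FCrossReference.
sizes = {5: 0x33, 6: 120, 7: 200}
offsets, end_position = assign_offsets(sizes, xref_index=6, start=0x0F)

for index, offset in offsets.items():
    print(f"{index}: {offset:04x}")
print(f"position after the loop: {end_position:04x}")
```

With these numbers, objects 6 and 7 both get `0042`, as in the dump above, while the position after the loop is `010a`. That later position is where the cross-reference stream would actually be written in this model.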

The simplest fix I could see is in the following function, inserting the three marked lines:

procedure TPdfTrailer.WriteTo(var W: TPdfWrite);
[...]
    for i := 1 to FXRef.ItemCount-1 do
    with FXRef.Items[i] do begin
      if ObjectStreamIndex>=0 then begin
        assert(FObjectStream<>nil);
        WR.AddIntegerBin(ord(xrefInUseCompressed),TYPEWIDTH);
        WR.AddIntegerBin(FObjectStream.ObjectNumber,offsetWidth);
        WR.AddIntegerBin(ObjectStreamIndex,genWidth);
      end else begin
        WR.AddIntegerBin(ord(xrefInUse),TYPEWIDTH);
        if Value = FCrossReference then              //<<<< INSERT special handling
          WR.AddIntegerBin(FXrefAddress,offsetWidth) //<<<< for Cross Reference entry
        else                                         //<<<< needs actual offset
          WR.AddIntegerBin(ByteOffset,offsetWidth);
        WR.AddIntegerBin(GenerationNumber,genWidth);
      end;
    end;
    FCrossReference.WriteValueTo(W);
  end;
  W.Add(CRLF + 'startxref' + CRLF).Add(FXrefAddress).
    Add(CRLF + '%%EOF' + CRLF);
end;

But you may have a better option.
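To illustrate what the three inserted lines change in the encoded stream data, here is a Python model of the in-use branch. It is a sketch under assumptions: `int_bin` stands in for `AddIntegerBin` and is assumed to write big-endian (the byte order PDF uses for cross-reference stream fields), and `encode_in_use` plus the field widths and offsets are hypothetical values chosen to match the cut-down example.

```python
XREF_IN_USE = 1  # entry type for an uncompressed, in-use object


def int_bin(value, width):
    """Big-endian fixed-width integer; raises OverflowError if value does not fit."""
    return value.to_bytes(width, "big")


def encode_in_use(entries, xref_index, xref_address,
                  type_width=1, offset_width=2, gen_width=1):
    """Encode {index: (byte_offset, generation)} as xref stream rows, substituting
    the real xref address for the cross-reference stream's own entry."""
    out = bytearray()
    for index, (byte_offset, generation) in entries.items():
        offset = xref_address if index == xref_index else byte_offset
        out += int_bin(XREF_IN_USE, type_width)
        out += int_bin(offset, offset_width)
        out += int_bin(generation, gen_width)
    return bytes(out)


# Both entries recorded 0x42; the stream itself is assumed to start at 0x10a.
entries = {6: (0x42, 0), 7: (0x42, 0)}
print(encode_in_use(entries, xref_index=6, xref_address=0x10A).hex(" ", 4))
```

One thing this model makes visible: the xref address is normally the largest offset in the file, so the offset field has to be wide enough to hold it. Whether `offsetWidth` in SynPDF already accounts for `FXrefAddress` has not been checked here.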
