Encoding problem after "save as"
- komelensoso
- Posts: 10
- Joined: February 10, 2017 - 14:41
- coolsoft
- Posts: 1978
- Joined: March 25, 2012 - 01:19
Could you please post a small sample PDF file, before and after the SaveAs?
The smaller the better...
- komelensoso
- Posts: 10
- Joined: February 10, 2017 - 14:41
Hi,
Here you are: two PDFs. The second is a copy of the first after the Info dictionary was deleted by a "save as" (so only the UTF-8 XMP metadata remains).
(I observed the same problem with hundreds of PDFs.)
--
pc
- Attachments (Only registered users)
- BeforeAndAfterSaveAs.rar
- coolsoft
- Posts: 1978
- Joined: March 25, 2012 - 01:19
I've found an issue in reading XMP objects: PDFPropertyExtension doesn't respect the content encoding.
Attached here you'll find an updated version with the fix.
Please test it and report any issues here.
- Attachments (Only registered users)
- PdfPropertyExtension_1.8.3-beta1.zip
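For readers wondering what "respecting the content encoding" of an XMP packet can look like, here is a minimal Python sketch. It is not the extension's actual code; the function names `detect_xmp_encoding` and `decode_xmp` are hypothetical. It assumes the packet bytes have already been extracted from the PDF, and it guesses the encoding either from a leading byte order mark or from the zero-byte pattern around the first `<` character.

```python
import codecs


def detect_xmp_encoding(packet: bytes) -> str:
    """Guess the text encoding of raw XMP packet bytes (hypothetical helper)."""
    # Check for a leading BOM first. UTF-32 must be tested before UTF-16,
    # because the UTF-32 LE BOM starts with the UTF-16 LE BOM bytes.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if packet.startswith(bom):
            return name
    # No leading BOM: a packet normally starts with '<', so the position of
    # the zero bytes around it reveals the code unit width and byte order.
    if packet[:4] == b"<\x00\x00\x00":
        return "utf-32-le"
    if packet[:4] == b"\x00\x00\x00<":
        return "utf-32-be"
    if packet[:2] == b"<\x00":
        return "utf-16-le"
    if packet[:2] == b"\x00<":
        return "utf-16-be"
    # Assumption: anything else is treated as UTF-8.
    return "utf-8"


def decode_xmp(packet: bytes) -> str:
    """Decode packet bytes with the detected encoding and drop a leading BOM."""
    text = packet.decode(detect_xmp_encoding(packet))
    return text.lstrip("\ufeff")
```

Decoding UTF-8 bytes with a single-byte code page instead is the typical way accented characters end up garbled, which is the kind of symptom this thread is about.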
- komelensoso
- Posts: 10
- Joined: February 10, 2017 - 14:41
Hi,
Thank you, it's all right now: I checked many files, and the XMP content is now displayed accurately. This means I can now batch "save as" my folders.
This matters for several reasons. First, I don't need to keep the metadata update history in the PDF (the history increases the file size and creates a security/privacy risk). Second, and this is very important, it could help solve a remaining problem with displaying the metadata in Explorer in two cases:
1 when performing a search like *.pdf in a folder
2 when using any arrangement other than "by folder" in a library: in any arrangement where PDFs from different folders must be displayed in the same list, the metadata is not displayed.
I suspect this is linked to the update of the index (Windows.edb). (According to Microsoft, any folder that is part of a library is gathered by the indexing process.)
What I don't understand is why Explorer displays the correct metadata as soon as I save or "save as" a PDF, yet has to access Windows.edb to display that metadata when I explore a library or perform a search (with the delay imposed by the indexing process before the right values appear).
Finally, as I understand it (but I'm not sure), the PDF property handler is used by Windows to update the index with the new metadata, so:
1 Where is that property handler? Which one should I use, and which program could have installed one on my computer?
2 More basically: what exactly is the "official" metadata? What I mean is: when a modified PDF is gathered, what is a keyword? The dc:subject? The PDF keywords (pdf:Keywords)? Or the Keywords entry in the Info dictionary? If none of these three places is blank and they have different values, which value will be set in the index? What are the rules? Are they the reason why I don't see any metadata when exploring a library or performing a search in Explorer?
--
pc
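To make the "three places" question concrete, here is a minimal Python sketch that pulls the two XMP candidates, `dc:subject` and `pdf:Keywords`, out of an already decoded XMP packet so they can be compared. The function name `xmp_keyword_fields` is hypothetical, the Info dictionary is not read at all (that would need a PDF parser), and the sketch makes no claim about which value a given property handler or Windows Search actually puts in the index.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs used by XMP for these properties.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}


def xmp_keyword_fields(xmp_xml: str) -> dict:
    """Return the keyword-like values found in an XMP document (hypothetical helper)."""
    root = ET.fromstring(xmp_xml)

    # dc:subject is an unordered list: one rdf:li item per keyword.
    subjects = [
        li.text or ""
        for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    ]

    # pdf:Keywords is a single string. It can be serialized either as a
    # child element or as an attribute of rdf:Description.
    keywords = None
    node = root.find(".//pdf:Keywords", NS)
    if node is not None:
        keywords = node.text or ""
    else:
        attr_name = "{%s}Keywords" % NS["pdf"]
        for desc in root.iter("{%s}Description" % NS["rdf"]):
            if attr_name in desc.attrib:
                keywords = desc.attrib[attr_name]
                break

    return {"dc:subject": subjects, "pdf:Keywords": keywords}
```

Running this on a file where the two fields disagree shows quickly whether the value displayed in Explorer matches one of them or neither.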
- komelensoso
- Posts: 10
- Joined: February 10, 2017 - 14:41
I did some more testing, and there are some tricky issues:
I want to be able to see the metadata in Explorer in several ways:
-A) when I search for *.pdf in a folder, in order to get all the PDFs in that folder and its subfolders (sorted by a specific key/metadata field)
-B) when I use a library not arranged by folder. I arrange by type and then explore the PDF type: this way I can see all the PDFs in the library sorted by my specific key.
I had problems because the metadata of most files was not displayed in these two cases.
What I did: I uninstalled NirSoft ShellExView. Then I saw that the advanced attributes of the PDFs (in Explorer: right-click, Properties, Advanced) had been set _not_ to "allow this file to have contents indexed in addition to file properties". So I selected all the PDFs in all my PDF folders and turned that option on. (I also did it at the folder level, through the folders' properties.)
Then I also discovered that two batches of PDFs were "blocked" (Explorer, right-click, Properties). I downloaded Sysinternals Streams and cleaned them all from the command line.
After a while the indexing process ended, and most of the metadata was displayed in cases A) and B). "Most of it": for 17 PDF files it was still not displayed (out of a total of 1677 PDFs).
What I saw during the indexing process was SearchProtocolHost calling PdfPropertyExtension.dll.
For the 17 faulty files, in A), the modified date was incorrect, both in the Explorer column and in the Details tab after right-click/Properties on the file.
In Acrobat, these 17 files had file attachments. I ran "remove hidden info", removed the metadata and the file attachments, then rebuilt the metadata and saved the files with a "save as". Nothing happened in Explorer (case A)) until I closed Acrobat. What happened next is that the System process (not SearchProtocolHost, which called PdfPropertyExtension.dll during the indexing process) accessed the saved files, and then, after a few seconds, the metadata was displayed accurately in Explorer (A).
I did another test before cleaning all the faulty PDFs: I copied some PDF folders to another location (a new folder not marked for indexing). When I searched for *.pdf in that location, all the metadata was displayed accurately.
What I think I understand at the moment is this: metadata can be displayed both in a folder that is indexed and in a folder that is not. If the folder is indexed, file attachments inside a PDF may cause problems that prevent the indexing process from getting the correct modified date.
And it seems that Windows treats libraries as a set of indexed folders. That is, if you want to see the metadata in a library, the indexing must be correct (in that case too, you have to clean the PDFs by removing the troublesome embedded file attachments).
If you think it would be interesting to take a look at the faulty PDFs, I am uploading a few of them (the smallest ones, from before I removed the attachments).
Thank you for your help.
--
pc
- Attachments (Only registered users)
- 5faulty.rar
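The two per-file checks described above (the "contents indexed" option and the "blocked" flag) can be scripted rather than clicked through for 1677 files. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the presence of a `Zone.Identifier` alternate data stream is used as a stand-in for "blocked", and `st_file_attributes` is only available on Windows, so on other systems the attribute check always reports False. It only reports; it changes nothing on disk.

```python
import os
import stat


def is_not_content_indexed(attrs: int) -> bool:
    """True if the Windows 'not content indexed' attribute bit is set."""
    return bool(attrs & stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)


def is_blocked(path: str) -> bool:
    """True if the file carries a Zone.Identifier alternate data stream.

    Assumption: Explorer's "blocked" state corresponds to this NTFS stream
    being present on the file.
    """
    try:
        with open(path + ":Zone.Identifier", "rb"):
            return True
    except OSError:
        return False


def scan_pdfs(folder: str) -> list:
    """Walk a folder tree and report both flags for every PDF found.

    Returns a list of (path, not_content_indexed, blocked) tuples.
    """
    report = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            # st_file_attributes exists only on Windows; default to 0 elsewhere.
            attrs = getattr(os.stat(path), "st_file_attributes", 0)
            report.append((path, is_not_content_indexed(attrs), is_blocked(path)))
    return report
```

Filtering the result for rows where either flag is True gives the list of files that Windows Search is likely to skip or treat differently, which is a useful starting point before comparing against the files whose metadata is missing.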
- coolsoft
- Posts: 1978
- Joined: March 25, 2012 - 01:19
Thanks for the sample files.
I can confirm that PDFPropertyExtension has no issue with PDF file attachments, so I suppose that Windows Search is at fault.
The attached files displayed correctly on my development machine (Win7-x64), in both Search and the default Explorer view, but I disabled Windows Search a long time ago because I don't trust it and I consider it a big CPU hog.
I often need to search file contents (the "search content" feature) and Windows Search is simply not reliable.
The issue you're describing is just another reason to disable it ;)