19 comments on “Print Archiving via PaperCut”

  1. Wow. A+ for such a detailed blog post! After that much technical effort I have to jump in with a few “developer thoughts”. You raise some very valid points!

    (FYI – I lead the development of the GhostTrap component that’s used in print archiving).

    Account Permissions:
    Your point around targeting “least user privilege” is spot on and it’s good to see you’re striving for this. We also factor a lot of this type of thinking into much of PaperCut’s design. This is why PaperCut is split up into separate processes/services performing different roles with different responsibilities. A good example would be the print archive image generation process itself. This task is delegated to GhostTrap which is run with very restricted privileges in a sandbox.

    The downside to “least user privilege” is the trade-off between simplicity and security! As software designers, it’s our job to do our best to abstract this and make it simple!

    Setup Complexity:
    The issue of setup complexity is very valid and one we debated (at length) when we first released print archiving. Print archiving works very well when print queues are all hosted on the one system – just set up GhostTrap and it works! It gets complex quickly when you involve multiple systems. Our choices were:

    1) Don’t support multiple systems (at least for the first release of print archiving).

    2) Support multiple systems, but have a “less than ideal” setup process, and improve over time.

    3) Delay release or cut other features and make it easy/automatic to support multiple systems.

    Our decision was 2 knowing that most sites with multiple systems/servers generally have good system administrators that will put up with the complexities of shares and permissions, etc. The failing of this however is that completing the 2nd phase (improving over time) has been a little slow in coming!

    Improvements:
    Our thinking moving forward is that we’ll remove the reliance on network shares (e.g. secondary servers at the moment copy the archived spool file into a shared folder). Instead we’ll transfer files to the Application Server via HTTPS (e.g. a file upload). At the time we thought a file share was the best option because you get security, encryption and file locking/retry for free. It was also “zero development”. In hindsight it’s at the cost of setup complexity!

    I’d welcome your thoughts here. Do you think an HTTPS transfer that works out of the box would be a better option? The downside is that it would be a little less efficient and less fault tolerant.

    Documentation:
    You’re 100% correct. The statement “Create a new domain account with access to the [archive] share […] and full management rights of print spooler on the local machine” is very light on detail. In most environments with say two print servers, a domain admin account is fine, but I fully agree that it’s not the right solution for desktop systems. We have been deliberately vague here as it would have been wrong to say “just use a domain admin account” for this reason.

    I’ll suggest that we consider an improvement here.

    Keep em coming!


  2. Thanks so much for the comments Chris; they mean quite a lot coming from one of the guys writing the software.

    Given the three choices you outlined, I agree #2 was the right way to go. And I can’t fault your not placing a higher priority on making this process easier without knowing to what extent it’s impacting other customers; presumably you have a feel for not only how many customers are using print archiving, but also how many are facing a large quantity of local printers as we are. Believe me, my staff and I would love to eliminate all the local printers, and we are doing so as quickly as funds allow. They are a hassle in more ways than one.

    As for a possible architectural change in how archiving works, I see two issues. The difficulty with the current archive setup is not the network share per se, but the permission changes necessary on the print servers to access that share. If you went the HTTPS route, can you make that work with the PCPrintProvider service running as its default of Local System, and without having to make the permission changes to the print spooler and each individual printer? If so, then that’s a huge advantage.

    What I’m not clear on however, either in the current configuration or with the proposed HTTPS change, is which server(s) bear the brunt of the CPU load for archiving. I was under the impression the print server was the heavier loaded machine, but the more I learn about how this all works, the more it seems the application server itself has the higher workload.

    Depending on the answer to that question and any possible (load) changes in moving from the network share to the HTTPS upload method, it could change how someone in my position might choose to architect their print archive setup. And that may not be desirable for those of us who already have archiving functioning.

    Bottom line though, I think the technology used for the archiving matters less than having an easier way of setting it all up. If the HTTPS method eliminates the need to configure all of these permissions, that’s a substantial improvement.

    If it doesn’t, then ideally the Windows installer (I can’t speak to OSX or Linux) should automatically make any permission changes that are necessary. In addition – and I didn’t reference this in my original post – since permission changes are currently needed on *every* printer, the PCPrintProvider service would need to monitor for new printers and change permissions on any new ones that are added. (We have a script running on every startup that does this.)
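    The "monitor for new printers" idea can be sketched roughly as follows. This is only an illustration in Python, not how PCPrintProvider or our startup script actually works: the function name, the printer names, and the `grant` callback (standing in for whatever command changes a printer's permissions) are all made up for the example.

```python
def apply_to_new_printers(seen: set, current, grant) -> list:
    """Run grant(name) once for every printer not seen before.

    seen    -- set of printer names already handled (updated in place)
    current -- names of the printers installed right now
    grant   -- hypothetical callback that applies the permission change
    """
    new = sorted(set(current) - seen)
    for name in new:
        grant(name)
        seen.add(name)
    return new


# Example run with made-up printer names; granted records the callback calls.
seen = {"HP-Front"}
granted = []
first = apply_to_new_printers(seen, ["HP-Front", "Canon-Lab"], granted.append)
second = apply_to_new_printers(seen, ["HP-Front", "Canon-Lab"], granted.append)
```

    Calling this on a timer or at startup means each printer is touched only once, the first time it appears.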

    I have no idea which option is easier to implement or would ultimately work better, but hopefully my feedback helps a little. Archiving is a great feature and it’s unfortunate if people are choosing not to use it because of the complexity involved to set it up.

    Thanks again for visiting and commenting! I look forward to speaking to you more.

    James


  3. I had a chat about this with the support guys and one of the other devs today. Our thinking is that maybe we should shoot for the best of both worlds. Out-of-the-box, PaperCut would use the HTTP(S) transfer method meaning it will work with minimal or no extra configuration. However we’d leave in the option to also configure the file share method. That way larger sites with more volume, or a preference for some technical reason, can select this method. Again we’d document this a little better 🙂

    At the technical level: The HTTP(S) method would work fine under the Local System account. Your idea of having PCPrintProvider do the permission setting on a per queue basis is also a great one.

    On the load question: You’re correct. The Application Server bears the brunt of the archiving load. The Application Server performs the conversion of the document from a raw spool file into the image formats, and manages the archiving of these image formats. Having said that, we do a lot of work to ensure this load does not overwhelm the system. Some of the mitigations are:

    * Image conversion is only done for the first few pages. Subsequent image pages are generated on-demand as you page through the document.

    * Image conversion is done using a “worker pool”. The default worker pool size is two. This means that if, say, 100 documents arrive all at the same time, at most two are ever processed in parallel and the rest are queued. This way the CPU should never be swamped.

    * Image conversion is done in low priority threads. Other work on the server will take CPU priority.
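    To make the worker pool point concrete, here is a minimal Python sketch of the idea. It is not PaperCut's code: `convert` is a stand-in for the real image conversion, the names are invented for the example, and thread priority is not modelled at all. It only shows that with a pool of two, 100 queued jobs never run more than two at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2  # matches the default pool size described above

_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of conversions seen running at once


def convert(doc_id: int) -> int:
    """Stand-in for image conversion; tracks how many jobs overlap."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # simulate conversion work
    with _lock:
        _active -= 1
    return doc_id


def process_all(doc_ids):
    # Jobs beyond POOL_SIZE wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        return list(pool.map(convert, doc_ids))


results = process_all(range(100))
```

    After the run, `peak_active` never exceeds `POOL_SIZE`, which is the property that keeps a burst of documents from swamping the CPU.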

    The architecture design has proven very successful. Even the largest sites have had no problem scaling.

    Moving to HTTP(S) transfer would add some CPU overhead on the App Server, however I/O overhead and/or resilience on connection failure is more of a question at the moment.
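    On the resilience question, one common approach is to retry a failed upload with a growing delay. The Python sketch below is purely illustrative and says nothing about how PaperCut will implement it: `send` is a hypothetical callable standing in for the real HTTPS upload, and the attempt count and delays are arbitrary.

```python
import time


def upload_with_retry(send, payload: bytes, attempts: int = 3,
                      base_delay: float = 0.01) -> int:
    """Call send(payload) until it succeeds; return the number of tries used.

    send is a hypothetical transport function that raises ConnectionError
    when the connection drops. The last failure is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            send(payload)
            return attempt
        except ConnectionError:
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


# Simulated transport that drops the first two attempts.
failures = {"left": 2}
received = []


def flaky_send(payload: bytes) -> None:
    if failures["left"] > 0:
        failures["left"] -= 1
        raise ConnectionError("simulated drop")
    received.append(payload)


tries = upload_with_retry(flaky_send, b"spool-bytes")
```

    A file share gets this kind of retry behaviour from the OS; with an upload it has to be written explicitly, which is part of the trade-off.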

    Thanks again for your feedback!


  4. Thanks Chris, this is great information. I agree giving users a choice between the current SMB and the proposed HTTPS upload method would be ideal and would be keeping in line with how flexible PaperCut already is in most areas. Thanks again for your comments.


    • Good day

      Hi James,
      I have tried to make the PaperCut archive work, all in vain. I have enabled “Enable Print Archiving” under “Options -> General -> Enable Archiving”. After printing some documents, if I check the archived document under the printer logs, it shows that the archived document is expired or not available, and sure enough there is nothing in the Archive folder. I have followed everything the PaperCut guidelines state and installed the additional software (Ghost Trap, Ghostscript, GhostPCL), but I still cannot view or preview the archived files.

      After a long search I found that some archived files are located on the local machine hosting the printer under C:/program files/papercut/server/data/Archive/. The files there are in .EMF format and they can’t be read or converted to JPEG. How do I read the files?

      Please help me with a possible simple solution for retrieving and viewing the archived files on the PaperCut server.


      • Hi Jacob,

        I’m one of the developers here at PaperCut. We’d like to look more into this issue. The fact that you’re getting the *.EMF files in the archive directory suggests that the archiving is working – it’s just the conversion to an image that’s not working. EMF file conversion should work out of the box without the need for Ghost Trap.

        If you haven’t already done so, please report this problem to our support team. They will ask you to enable some debug logging and/or provide one of the EMF files for further analysis.

        If you continue to have issues, please also feel free to reach out to me at chris papercut com.

        Cheers,

        Chris


        • Hi Chris,
          Yeah, the files are being archived; only the conversion to an image is the problem.
          I have sent you an email with the details and the print screens.

          Rgds

          Jacob


  5. Jacob – The fact that you found “some archived files […] located on the local machine hosting the printer” makes it sound as if your PaperCut app server is on a different system than the printer, and the print-provider.conf file (on the machine hosting the printer) has not been changed to point to a network share.

    If that’s the case then you need to go through essentially the entire setup in this blog post to get things to work.

    Let me know…I’m curious.


  6. Hmm, it seems like your site ate my first comment (it was super long) so I guess I’ll just sum up what I had written and say, I’m thoroughly enjoying your
    blog. I as well am an aspiring blogger but I’m still new to the whole thing.
    Do you have any pointers for rookie blog writers? I’d definitely appreciate it.


    • Thanks, I appreciate the kind words. I think I’m still a rookie poster myself, so I’m not sure how much advice I can give you other than to say post what interests you. Also, try to make sure your posts are unique and not just simple rehashes of what someone else has posted elsewhere. That’s been my approach and I seem to get a reasonable amount of traffic; oddly enough the more obscure the post the more traffic it seems to get. I guess that’s a function of there not being much else out there on these subjects, which was my intent to begin with. Good luck!


  7. Really appreciate your time and effort with this.

    I also moved our archive, which in turn stopped working. Googled and found your post.
    Tried your suggestion and script on server 2012, didn’t have much luck.

    Then it dawned on me, there is an easier way (at least for me).

    1 Use the default archive option.
    (Which I believe is: C:\Program Files\Papercut MF\server\data\archive)

    2 Rename the “archive” folder to something else, otherwise the following command will fail.

    3 Open command prompt and type the following. (my archive will be located at D:\Archive)
    mklink /d "C:\Program Files\Papercut MF\server\data\archive" D:\Archive

    You can also share this linked folder for your second printer server.
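    For anyone who would rather script steps 2 and 3, the same rename-then-link sequence looks roughly like this in Python. Treat it as a sketch: the function name and the ".old" suffix are my own choices for the example, and on Windows creating a symlink normally needs an elevated prompt (or Developer Mode). The demo uses a temporary folder instead of the real PaperCut path.

```python
import os
import tempfile
from pathlib import Path


def relocate_archive(default_dir: Path, new_dir: Path) -> Path:
    """Rename the existing archive folder, then link the default path to new_dir.

    Returns the path the old folder was (or would have been) renamed to.
    """
    backup = default_dir.with_name(default_dir.name + ".old")
    if default_dir.is_symlink():
        return backup  # already linked, nothing to do
    if default_dir.exists():
        default_dir.rename(backup)  # step 2: move the real folder out of the way
    new_dir.mkdir(parents=True, exist_ok=True)
    # step 3: the equivalent of "mklink /d <default_dir> <new_dir>"
    os.symlink(new_dir, default_dir, target_is_directory=True)
    return backup


# Demo in a throwaway directory rather than the real install path.
root = Path(tempfile.mkdtemp())
default_dir = root / "server" / "data" / "archive"
default_dir.mkdir(parents=True)
new_dir = root / "D_Archive"
backup = relocate_archive(default_dir, new_dir)
```

    Note that anything already in the old folder stays in the renamed copy; it is not moved to the new location automatically.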

    Hope this helps.


  8. Hi James.

    Had a hard time with that archive config, eh?

    I simply used a domain service account that I created for PaperCut (with the logon as a service right) and put it in the local admin group of my app server plus all four of my print servers, then gave that account FULL control in the Archive share permissions on my app server as well as put the UNC path for that share in the four print-provider.conf files. Changed all four Print Provider services to use that service account instead of SYSTEM (as it cannot open network shares), restarted the Print Spooler services on all four print servers (which restarts the Print Provider services too), et voilà!

    A quick check of the four print-provider.log files showed no errors and the archive thumbnails showed up in the PaperCut Job Log.

    🙂

    I ended up on your blog looking for instructions on how to set up Ghost Trap.


    • Oops… A closer look at the PaperCut archiving docs shows that a restart of the app server service gets PaperCut to automatically detect and use Ghost Trap (developed in partnership with PaperCut). Very slick!!!

      🙂

      All done now.


    • Thanks for the comments John.

      If you are comfortable granting the domain service account local admin rights on every one of your print servers then by all means this is a much easier process. In our case, while I consider that a reasonably acceptable solution for an actual print server, we have many end-user workstations that needed the PaperCut “secondary print server” software installed to support archiving local printers, and I did not want to add a domain account to the local admin group of every PC.


      • Oh yeah James. In that case, I would do the same. You could have your script run at login and check to see if a control file (e.g. archive-setup.txt) exists in the “PaperCut MF\providers\print\win” folder signifying that your customized print-provider.conf file is in place. If not, all of your commands execute, which include a line that copies that empty control file as well. If a new machine, whose image includes the print provider install, gets put on the network, it will automatically be set up for archiving the first time anybody logs into it. 🙂
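        The control-file check boils down to something like the Python sketch below. It is only a rough outline of the idea: the function name and the placeholder steps are hypothetical, and the real script would run whatever commands the archive setup actually needs. The demo uses a temporary folder in place of the providers\print\win folder.

```python
import tempfile
from pathlib import Path


def ensure_archive_setup(provider_dir: Path, setup_steps) -> bool:
    """Run the setup steps once, using archive-setup.txt as the control file.

    Returns True if the steps ran, False if the marker was already there.
    """
    marker = provider_dir / "archive-setup.txt"
    if marker.exists():
        return False  # already configured on an earlier login
    for step in setup_steps:
        step()
    marker.touch()  # drop the empty control file last
    return True


# Demo with placeholder steps that just record what would have been done.
provider_dir = Path(tempfile.mkdtemp())
calls = []
steps = [
    lambda: calls.append("copy print-provider.conf"),
    lambda: calls.append("set permissions"),
]
first = ensure_archive_setup(provider_dir, steps)
second = ensure_archive_setup(provider_dir, steps)
```

        Writing the marker only after every step has run means a login that fails partway through will simply retry the whole setup next time.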

