Topic: Document merging, ASP and me!
NiX
Wiki Admin
Posts: 7770

Locomotive Pandamonium


on: April 05, 2007, 08:21:44 AM

So, here we go....

Requirements:
- Create a website that generates a new document (Word/PDF) by modifying existing Word documents (a different combination of several Word documents every time) with data entered by the end user.  All documents have fields which will be replaced, represented by literal strings.
- Must be done in ASP, IIS, MS SQL Server
- Word cannot be run on the server (not recommended by Microsoft).

What is currently done:
- By creating all combinations of the Word documents and saving them as XML, we bypassed the need to open Word.  By doing this, though, we run into the problem of maintaining the Word documents (if one document changes, it's possible that all combinations will have to change).

What's needed:
- A better method of doing this, so that maintaining the documents is a lot easier.

This is KILLING me. It really is. No, we can't switch from MS. The company insists on that environment.
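
For illustration, the field replacement described in the requirements (literal strings swapped for user-entered data) can be sketched as below. This is Python rather than ASP, and the `[[name]]` placeholder syntax and the `fill_placeholders` helper are assumptions made up for the example, not anything taken from the actual documents. The one non-obvious step is escaping the user's input, so that characters like `&` and `<` do not break the saved XML.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical placeholder syntax: [[field_name]]
PLACEHOLDER = re.compile(r"\[\[(\w+)\]\]")


def fill_placeholders(xml_text, values):
    """Replace every [[name]] in xml_text with the XML-escaped value for name."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            # Fail loudly instead of shipping a document with a raw placeholder
            raise KeyError("no value supplied for field %r" % key)
        return escape(values[key])

    return PLACEHOLDER.sub(substitute, xml_text)


template = "<p>Dear [[name]], re: [[company]]</p>"
filled = fill_placeholders(template, {"name": "Ann", "company": "Smith & Sons <Ltd>"})
print(filled)
```

Raising on a missing field is a design choice for the sketch; a real site might prefer to collect all missing fields and report them to the user in one go.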
schild
Administrator
Posts: 60350


Reply #1 on: April 05, 2007, 09:25:55 AM

Did you take a class on "Windows, The Internet, And How It Screws You" or something?

Why? Why would you need to do this? I'm sure SOMEONE can help you, but dear god. Why?
Evil Elvis
Terracotta Army
Posts: 963


Reply #2 on: April 05, 2007, 09:27:05 AM

This probably won't be too helpful, but I'd strongly push to convert all your .doc files to .pdf format.  Stress stronger interoperability and forward planning for the likelihood of a decline of Windows on the desktop.  Let them know that Office now exports to PDF.

PDF has FDF (Forms Data Format), which makes it pretty easy to search/replace data.  I just recently finished doing something similar: auto-generating PDF application forms from graduate student information entered into our database, using TCPDF and PHP.  I'm sure there are similar products out there for ASP.  It's going to be a lot easier than scraping the data from every .doc file and doing your own search/replace.
Krakrok
Terracotta Army
Posts: 2190


Reply #3 on: April 05, 2007, 09:51:55 AM


Yeah, PDF. Just keep all the Word docs there, and when they get changed just convert them to PDF after the change. Then combine the PDFs in whatever order and output the single PDF. There are all kinds of PDF combiners out there. Here is a Word-to-PDF COM component, but I can't tell if it requires Word.
Sairon
Terracotta Army
Posts: 866


Reply #4 on: April 05, 2007, 10:10:43 AM

What freaking company do you work at? ASP is only used for legacy reasons; the platform was abandoned years ago, even by Microsoft themselves. Try to make a push for at least ASP.NET. It doesn't even have to be on the same server as the one running your old software. I think you will have problems finding a component for ASP which lets you alter Word documents if they've changed the format in any way in the last few years.
Lantyssa
Terracotta Army
Posts: 20848


Reply #5 on: April 05, 2007, 10:11:26 AM

I've seen freeware converters that you "print" a Word document to, and they generate the PDF.  That way not everyone needs to have Acrobat or whatever.

Hahahaha!  I'm really good at this!
Roac
Terracotta Army
Posts: 3338


Reply #6 on: April 05, 2007, 10:44:18 AM

Quote from: Sairon on April 05, 2007, 10:10:43 AM
What freaking company do you work at? ASP is only used for legacy reasons; the platform was abandoned years ago, even by Microsoft themselves. Try to make a push for at least ASP.NET. It doesn't even have to be on the same server as the one running your old software. I think you will have problems finding a component for ASP which lets you alter Word documents if they've changed the format in any way in the last few years.

If he used ASP.NET, and he's already using the XML format for Word docs, it shouldn't be much more than building the right XSL doc and management pieces in .NET.  Not terribly difficult, so long as the documents aren't that large or complex.

-Roac
King of Ravens

"Young people who pretend to be wise to the ways of the world are mostly just cynics. Cynicism masquerades as wisdom, but it is the farthest thing from it. Because cynics don't learn anything. Because cynicism is a self-imposed blindness, a rejection of the world because we are afraid it will hurt us or disappoint us." -SC
Evil Elvis
Terracotta Army
Posts: 963


Reply #7 on: April 05, 2007, 10:58:43 AM

It sounded like they're just generating their own XML format so it'll be easier to parse the elements in ASP.

But I think the real problem he described is that their documents are all generated manually, not programmatically.  Doing combinations by hand is going to get real ugly real quick.
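
As a rough illustration of generating the combinations programmatically: each source document is stored once, and the combined document is assembled only when a request comes in, so an edit to one source never touches a pre-built combination. The sketch is Python, and the tiny `<doc><body>...</body></doc>` layout and the `merge_documents` helper are assumptions for the example. Real Word 2003 XML also carries styles, sections and headers/footers, which this ignores completely.

```python
import xml.etree.ElementTree as ET


def merge_documents(sources, order):
    """Build one combined document from individually stored source documents.

    sources: dict mapping a document name to its XML text (hypothetical layout)
    order:   list of document names, in the order they should appear
    """
    merged = ET.Element("doc")
    merged_body = ET.SubElement(merged, "body")
    for name in order:
        body = ET.fromstring(sources[name]).find("body")
        if body is None:
            raise ValueError("document %r has no <body> element" % name)
        # Copy the paragraphs across in their original order
        merged_body.extend(list(body))
    return ET.tostring(merged, encoding="unicode")


sources = {
    "intro": "<doc><body><p>Intro</p></body></doc>",
    "terms": "<doc><body><p>Terms A</p><p>Terms B</p></body></doc>",
}
combined = merge_documents(sources, ["terms", "intro"])
print(combined)
```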
NiX
Wiki Admin
Posts: 7770

Locomotive Pandamonium


Reply #8 on: April 05, 2007, 12:09:18 PM

- It is done in ASP.NET (my mistake in assuming ASP = ASP.NET).

- All documents have different page formatting, fonts, header/footer and are maintained by someone who will be using Word and that's the extent of the control I have over the documents that I do get back.  I do not have the ability to say you must use this font, this size, etc, etc.

- It's already a pain just to merge the documents manually because of this (no standard font, page margins, header/footer margins).  Copy and paste does NOT WORK correctly because of this (maybe it does for 2007, but we're working with 2003).

- This is done in Word 2003 (no export to PDF as far as I know for this without something that needs to be purchased).  I'm aware that Word 2007 does have a PDF export function though.

- I don't have a problem doing the actual replace because it's an XML file, but the files are not created BY the website.

- Will look into the FDF and see if that helps.

- To top it all off, this is all being done in Visual Studio 2005, and the Word documents are done through a VSTO Windows application.

- The Word documents are mainly text, so I cannot have a blank box/area to fill in stuff (like your tax returns).  The text needs to actually wrap depending on its length.  At least, this is what I'm noticing when I look at the FDF that was mentioned (it looks like its fields are meant for a grid/table).


Keeping this in points so I don't confuse myself or lose track of what I've said.
« Last Edit: April 05, 2007, 12:26:53 PM by NiX »
bhodi
Moderator
Posts: 6817

No lie.


Reply #9 on: April 05, 2007, 12:24:53 PM

I feel for you. That's a situation where even the best solution is going to be kludgy.
naum
Terracotta Army
Posts: 4263


Reply #10 on: April 05, 2007, 01:59:43 PM

I face this battle in my present position.

Word-generated XML is just as abominable as the mounds of grossly foul HTML it can produce.

While I have only dented the document store in our network, I have implemented Ruby scripts (using the win32ole library) to convert .doc files to .html. I have even implemented a parser that takes Word documents in "outline format" and ports them out to an HTML format.

Generating PDFs is a piece of cake, either on the Mac (where it is as simple as "Print as PDF") or on Windows (where I have a full copy of Acrobat).

"Should the batman kill Joker because it would save more lives?" is a fundamentally different question from "should the batman have a bunch of machineguns that go BATBATBATBATBAT because its totally cool?". ~Goumindong
Trippy
Administrator
Posts: 23657


Reply #11 on: April 05, 2007, 03:25:02 PM

Look into the RTF format. It's kind of cryptic, though Microsoft does publish a specification for it, and it's stored as plain text rather than the binary format of a regular .DOC file, making it much more suitable for programmatic editing. Office 2007 uses XML, but since you aren't using that version that's not an option.
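
To show why plain text makes this easier, here is a minimal Python sketch of filling fields in an RTF template. The `<<NAME>>` placeholder syntax and both helper names are assumptions for the example. It also assumes each placeholder sits in one unbroken run of text, which may not hold for files saved by Word, since Word can split a visible string across several formatting groups.

```python
import re
import struct

# Hypothetical placeholder syntax: <<FIELD>>
FIELD = re.compile(r"<<(\w+)>>")


def rtf_escape(text):
    """Escape plain text so it is safe inside an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)      # backslash and braces are RTF syntax
        elif ch == "\n":
            out.append("\\line ")      # explicit line break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit decimal per UTF-16 code unit;
            # the trailing "?" is the fallback for readers without Unicode
            for (unit,) in struct.iter_unpack("<h", ch.encode("utf-16-le")):
                out.append("\\u%d?" % unit)
    return "".join(out)


def fill_rtf(template, values):
    """Replace each <<FIELD>> with its escaped value (KeyError if missing)."""
    return FIELD.sub(lambda m: rtf_escape(values[m.group(1)]), template)


template = "{\\rtf1\\ansi Dear <<NAME>>, you owe <<AMOUNT>>.}"
filled = fill_rtf(template, {"NAME": "J{o}hn", "AMOUNT": "5 \u20ac"})
print(filled)
```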
Evil Elvis
Terracotta Army
Posts: 963


Reply #12 on: April 05, 2007, 06:20:41 PM

The easiest way for you to convert those docs to PDFs by hand is how Lantyssa suggested: get a PDF print driver (http://www.pdfforge.org/products/pdfcreator).  Then open the file and tell it to print using the PDF writer as the printer.

FDF should be able to wrap lines of text.  FDF does require the areas where data is to be replaced to be marked.  Get a copy of Adobe Acrobat Pro.  With it you just select an area, tell it that you want it to be a form field, assign a variable to it, and you're done.  Then you'd use some (hopefully) open source code to manage the PDFs in ASP and do search/replace on the form field variable.
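
For a sense of what the data side looks like, here is a minimal Python sketch that writes an FDF file holding values for named form fields. The field names and the `build_fdf` helper are hypothetical, and the sketch only handles ASCII values. It covers just the FDF text itself; merging that data into the PDF form is left to whatever PDF tool or component ends up being used.

```python
def pdf_string(text):
    """Wrap text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def build_fdf(fields):
    """Return the text of a minimal FDF file for a dict of field name -> value."""
    entries = "".join(
        "<< /T %s /V %s >>\n" % (pdf_string(name), pdf_string(value))
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        + entries +
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


# Hypothetical field names, as they would be assigned in Acrobat
fdf_text = build_fdf({"applicant_name": "Ann (Lee)", "program": "MSc"})
print(fdf_text)
```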

You could do the doc->PDF conversion yourself and just add the FDF fields, if it's not too much hassle.  Personally, if you think you could do what you need with ASP and FDF, I'd get a trial version of Acrobat Pro, learn how to make a PDF with FDF, and then show your boss how easy it is.  Convince them that the $120 for Acrobat and the few hours it would take to retrain the staff to use it will be a lot cheaper (and more future-proof) than what they want from you.

Or convince your boss to shell out $900 for an aspose.com doc reader assembly.
Murgos
Terracotta Army
Posts: 7474


Reply #13 on: April 06, 2007, 05:56:43 AM

Eh?  Word outputs well-formed XML.  Yes, it's bloated with a bazillion things you don't care about, but it is a very ordered and well-documented schema.  I also think I recall that you can generate your own schema for Word to export files to if you don't need all the crap tags that Word gives you.

It's really not that difficult to parse an XML tree, save out the relevant bits, and regenerate the documents on the fly when requested.  A form seems naturally suited to this.

Create a schema with the field names from the document.  Map them to the word document.  Export to XML.  Parse the XML for relevant info and store it in your DB of choice.

When a doc is requested, query the DB, grab the data, populate an XML tree and import it back into Word for the user to manipulate.
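
The round trip described above (parse the exported XML, store the fields, rebuild a tree on request) can be sketched roughly as follows. This is Python with an in-memory SQLite database standing in for the DB of choice, and the flat `<contract>` layout, the `field` table and the function names are all assumptions for the example rather than what Word would actually export.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_fields(xml_text):
    """Read a flat custom-schema export into a dict of tag name -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


def save_fields(conn, doc_id, fields):
    """Store the field values for one document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS field ("
        "doc_id TEXT, name TEXT, value TEXT, PRIMARY KEY (doc_id, name))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO field VALUES (?, ?, ?)",
        [(doc_id, name, value) for name, value in fields.items()],
    )
    conn.commit()


def build_xml(conn, doc_id, root_tag="contract"):
    """Rebuild an XML tree for one document from the stored field values."""
    root = ET.Element(root_tag)
    rows = conn.execute(
        "SELECT name, value FROM field WHERE doc_id = ? ORDER BY name", (doc_id,)
    )
    for name, value in rows:
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


exported = "<contract><client>Acme &amp; Co</client><term>12 months</term></contract>"
conn = sqlite3.connect(":memory:")
save_fields(conn, "doc-1", extract_fields(exported))
rebuilt = build_xml(conn, "doc-1")
print(rebuilt)
```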

If you are clever you can even embed this into Word with macros in the forms and be transparent.  Or, as was mentioned earlier use an XSLT to transform the Word XML tree to a more relevant tree and proceed from there.

This is not a new problem with documentation, and it is one that has been solved numerous times before.  I guarantee you that Google will be your friend in this matter.

"You have all recieved youre last warning. I am in the process of currently tracking all of youre ips and pinging your home adressess. you should not have commencemed a war with me" - Aaron Rayburn
naum
Terracotta Army
Posts: 4263


Reply #14 on: April 06, 2007, 08:15:41 AM

Quote from: Murgos on April 06, 2007, 05:56:43 AM
Eh?  Word outputs well-formed XML.  Yes, it's bloated with a bazillion things you don't care about, but it is a very ordered and well-documented schema.  I also think I recall that you can generate your own schema for Word to export files to if you don't need all the crap tags that Word gives you.

It's bloated beyond usefulness.

Again, using the win32ole library, it was much easier to parse paragraphs (text ranges)…
Murgos
Terracotta Army
Posts: 7474


Reply #15 on: April 06, 2007, 08:21:27 AM

I agree; you will notice that the vast majority of my post concerned using a custom schema, which, even if you have never done one before, will probably take you all of one morning to figure out.  There are helpful tutorials all over the place; I recommend w3schools.
Roac
Terracotta Army
Posts: 3338


Reply #16 on: April 06, 2007, 09:34:31 AM

Quote from: naum on April 06, 2007, 08:15:41 AM
It's bloated beyond usefulness.

Again, using the win32ole library, it was much easier to parse paragraphs (text ranges)…

Yes, it's bloated.  Doesn't matter.  If you don't know XSL, a project like this is a great opportunity to learn it, and throwing XSL against a document which has known fields beforehand should make it a snap to pull out and reorg however you like.  If the data is already being supplied in XML format, this is really the way to go.  I mean, easy manipulation of tagged data is the whole point of XSL.
Jayce
Terracotta Army
Posts: 2647

Diluted Fool


Reply #17 on: April 07, 2007, 07:03:12 PM

I don't know if this can specifically help you, since you seem to have an XML solution going pretty well, but I think that you can use the Word PIA (Primary Interop Assembly) in a web (or other) app without the full install of Word - it's just a DLL.  I would think that would at least give you the ability to generate a Word file from the various bits of XML you have as your components, and maybe programmatically generate the XML from given text.  I haven't actually used it though, I just know it exists.

Witty banter not included.