SharePoint Blogs / SharePoint University

Word documents to Sharepoint

We currently face the challenge of converting over a thousand Word 2003 documents into SharePoint publishing pages. Here is our approach:
Step 1: Run the Microsoft document conversion tool to convert the Word documents (.doc) to .docx.
You can download it free from Microsoft at: http://www.microsoft.com/downloads/details.aspx?familyid=13580cd7-a8bc-40ef-8281-dd2c325a5a81&displaylang=en
Step 2: Call the out-of-the-box document converter to convert .docx to .aspx.
With thousands of files to convert, we have to write code to do the job. Use the Microsoft.SharePoint.Publishing.PublishingPageCollection class to add a new publishing page:

public PublishingPage Add(
    string newPageName,
    SPFile fileToConvert,
    Guid transformerId,
    PageConversionPriority priority)

This approach works, but we now have two issues:
1. About 10% of the documents do not convert. The error message only reports an internal error, which does not help us identify the problem. After spending a lot of time on the Word documents, we think the error is related to document formatting, such as bullets and section breaks. At this time, we cannot be certain exactly what the problem is.
2. Some Word formatting is lost, or styles are changed, after conversion.

To work around this, I have another idea that bypasses the converter. For example, to convert 100.doc to 100.aspx we follow these steps:
1. Create an empty 100.aspx publishing page first. Of course, we know the content type and the field in which to save the HTML content.
2. Convert 100.doc to 100.htm using the default Office behavior. You could do this by writing a piece of code, or just open 100.doc and save it as an HTML file.
3. Programmatically paste the HTML text into the content field of the 100.aspx publishing page.
Note: the HTML saved by Word contains a CSS style section within the page. Some of its classes might conflict with your SharePoint master page, so make sure you delete those classes. However, keeping the CSS styles in the HTML and pasting them into the content field is not the best practice. I'd rather save those styles into the CSS file that the master page uses. If you do this, make sure you delete the whole style section from the HTML content before pasting it into the content field.
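
The clean-up described in the note can be sketched as follows. This is only a minimal illustration, not the final solution: it assumes the Word-saved HTML is available as a string, it uses regular expressions rather than a full HTML parser, and the function names split_word_html and drop_classes are hypothetical. It is written in Python only to stay self-contained; similar patterns should carry over to .NET code running against SharePoint.

```python
import re

# Matches each <style>...</style> block that Word embeds in the saved page.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)
# Matches the content between <body ...> and </body>.
BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)
# Word usually wraps its CSS in an HTML comment; this strips that wrapper.
COMMENT_WRAPPER = re.compile(r"^\s*<!--|-->\s*$")


def split_word_html(html):
    """Return (content_html, css_text) for a page saved by Word.

    content_html is the body markup with every style section removed,
    ready to be written to the publishing page content field.
    css_text is the extracted CSS, to be reviewed and merged by hand
    into the stylesheet used by the master page.
    """
    css_parts = [m.group(1) for m in STYLE_BLOCK.finditer(html)]
    css_text = "\n".join(COMMENT_WRAPPER.sub("", part).strip() for part in css_parts)
    without_styles = STYLE_BLOCK.sub("", html)
    body = BODY.search(without_styles)
    content = body.group(1) if body else without_styles
    return content.strip(), css_text


def drop_classes(content_html, class_names):
    """Remove class attributes whose value is one of class_names.

    Intended for classes that conflict with the master page. Handles
    quoted and unquoted single-class attributes, as Word emits them.
    """
    names = "|".join(re.escape(name) for name in class_names)
    pattern = re.compile(r"\s+class\s*=\s*([\"']?)(?:%s)\1(?=[\s>])" % names, re.IGNORECASE)
    return pattern.sub("", content_html)


if __name__ == "__main__":
    sample = (
        "<html><head><style>\n<!--\np.MsoNormal {margin:0}\n-->\n</style></head>"
        "<body lang=EN-US><p class=MsoNormal>Hi</p></body></html>"
    )
    content, css = split_word_html(sample)
    print(drop_classes(content, ["MsoNormal"]))
    print(css)
```

The extracted CSS is returned separately rather than discarded, so it can be reviewed before anything is added to the master page stylesheet.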

This is faster than the converter, and all the Word formatting is kept, even the bullets and lines in the Word document. I am really not sure whether this is a good idea, but at least it gives you an alternative way of converting Word documents into SharePoint.

I will post the entire solution when the code is complete.


Posted 10-06-2007 12:11 AM by mingssn

Comments

Need Help wrote re: Word documents to Sharepoint
on 07-12-2008 11:50 PM

I need to read the content of an uploaded Word document into the wiki field. Do you have any suggestions for this? Thanks.

Posts (c) their respective authors. Everything else (c) 2009 SharePoint Experts, Inc.