Sharepoint Space

It is all about Sharepoint

Word documents to Sharepoint

We currently face the challenge of converting over a thousand Word 2003 documents into SharePoint publishing pages. Here is our approach:
Step 1: Run Microsoft's document conversion tool to convert the Word documents (.doc) into .docx
You can download it for free from Microsoft at: http://www.microsoft.com/downloads/details.aspx?familyid=13580cd7-a8bc-40ef-8281-dd2c325a5a81&displaylang=en
Step 2: Call the out-of-the-box document converter to convert .docx into .aspx
With thousands of files to convert, we have to write code to do the job. Use the Add method of the Microsoft.SharePoint.Publishing.PublishingPageCollection class to add a new publishing page:

public PublishingPage Add (
string newPageName,
SPFile fileToConvert,
Guid transformerId,
PageConversionPriority priority )

This approach works; however, we now have two issues:
1. About 10% of the documents do not convert. The error message reports only an internal error, which does not help us identify the problem. After spending a lot of time on the Word documents, we think the error is related to Word formatting, such as bullets and section breaks, but at this time we cannot be certain of the exact cause.
2. Some of the Word formatting is lost, or styles are changed, after conversion.

To work around this, I have another idea that bypasses the converter. For example, to convert 100.doc to 100.aspx, we follow these steps:
1. Create an empty 100.aspx publishing page first. Of course, we know the content type and the field in which to save the HTML content.
2. Convert 100.doc to 100.htm using the default Office behavior. You could do this by writing a piece of code, or just open 100.doc and save it as an HTML file.
3. Programmatically paste the HTML text into the content field of the 100.aspx publishing page.
Note: the HTML saved by Word includes a CSS style section within the page. Some of its classes might conflict with your SharePoint master page; make sure to delete those classes. However, keeping the CSS styles in the HTML and pasting them into the content field is not best practice. I'd rather move those styles into the CSS file that the master page uses. If you do this, make sure to delete every style section from the HTML before pasting it into the content field.
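
As a minimal sketch of the cleanup described in the note (written in Python for brevity; the same logic ports to C#), the function below splits a Word-saved HTML file into the body markup to paste into the content field and the CSS text to move into the master page's stylesheet. The name `split_word_html` and the regular expressions are assumptions made for illustration, not the finished solution. The sketch does not handle Word's conditional comments, and it does not rename or remove conflicting class names.

```python
import re

# Matches every <style>...</style> block, capturing the CSS inside it.
STYLE_RE = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)
# Matches the <body>...</body> element, capturing its inner markup.
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def split_word_html(html):
    """Return (body_html, css_text) for an HTML page saved by Word.

    body_html is the markup inside <body> with all style sections removed.
    css_text is the combined content of those style sections, with the
    HTML comment markers that Word wraps around the CSS stripped off.
    """
    # Collect the CSS from every style section (hypothetical helper logic).
    css_text = "\n".join(block.strip() for block in STYLE_RE.findall(html))
    css_text = css_text.replace("<!--", "").replace("-->", "").strip()

    # Delete the style sections, then keep only what is inside <body>.
    without_styles = STYLE_RE.sub("", html)
    match = BODY_RE.search(without_styles)
    body_html = match.group(1) if match else without_styles
    return body_html.strip(), css_text
```

The returned `css_text` still needs a manual review for class names that clash with the master page before it goes into the shared CSS file.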

It is faster, and all the Word formatting is kept, even the bullets and lines in the Word document. However, I am really not sure whether this is a good idea, but at least it gives you an alternative way of converting Word documents into SharePoint.

I will post the entire solution when the code is complete.

Published Oct 06 2007, 12:11 AM by mingssn
Comments

 





Need Help said:

I need to read the content of an uploaded Word document into the wiki field. Do you have any suggestions? Thanks.

July 12, 2008 11:50 PM


Posts (c) their respective authors. Everything else (c) 2007 SharePoint Experts