Saturday, 7 March 2009

Converting RTF to plain TXT

As part of my work on the Excel spreadsheet mentioned in the previous post, I also had to display contents of a RTF field.
Since Excel doesn't parse RTF in any way (well, except for import as a file, but that sucks too), I had to convert the RTF field retrieved from the database into plain text so that I could display it for the user.

Following is code that will convert the RTF to plain text.
However, the function makes some assumptions:
  1. The RTF is normal text, maybe with some font formatting
  2. No tables, lists or any special RTF structures are supported. They will be converted to plain text with no special formatting. If you need special formatting, you'll have to add appropriate lines of code into the parser...
  3. The convertor assumes that the code page of the RTF is the same as the code page of the client computer. This is important if you have special (language specific) characters in the RTF itself.
  4. Also in regard to code page, this convertor will only work on ANSI code pages. This means, it will only convert single-byte characters, not multi byte ones. Since I don't have a multi byte RTF available, I don't know how hard it is to fix this one.
  5. Also note that I'm no Excel guru, so the code in the macro may be sub-optimal
So here it goes:

Private Function hexcode(ss)
 If ss = "0" Then
   hexcode = 0
 ElseIf ss = "1" Then
   hexcode = 1
 ElseIf ss = "2" Then
   hexcode = 2
 ElseIf ss = "3" Then
   hexcode = 3
 ElseIf ss = "4" Then
   hexcode = 4
 ElseIf ss = "5" Then
   hexcode = 5
 ElseIf ss = "6" Then
   hexcode = 6
 ElseIf ss = "7" Then
   hexcode = 7
 ElseIf ss = "8" Then
   hexcode = 8
 ElseIf ss = "9" Then
   hexcode = 9
 ElseIf ss = "a" Or ss = "A" Then
   hexcode = 10
 ElseIf ss = "b" Or ss = "B" Then
   hexcode = 11
 ElseIf ss = "c" Or ss = "C" Then
   hexcode = 12
 ElseIf ss = "d" Or ss = "D" Then
   hexcode = 13
 ElseIf ss = "e" Or ss = "E" Then
   hexcode = 14
 ElseIf ss = "f" Or ss = "F" Then
   hexcode = 15
 Else
   hexcode = 0
 End If
End Function

Private Function RTF2TXT(ss)
 While (Right(ss, 1) = Chr(10) Or Right(ss, 1) = Chr(13) Or Right(ss, 1) = " " Or Right(ss, 1) = "}")
   ss = Left(ss, Len(ss) - 1)
 Wend
 If (Len(ss) >= 1) Then
   ss = Right(ss, Len(ss) - 1)
 End If
 iPos = 1
 sResult = ""

 While (Len(ss) > 0)
   If (Mid(ss, iPos, 1) = "\") Then
     If (Mid(ss, iPos + 1, 3) = "tab") Then
       sResult = sResult + Chr(9)
       iPos = iPos + 4
     ElseIf (Mid(ss, iPos + 1, 3) = "par") And (Mid(ss, iPos + 1, 4) <> "pard") Then
       sResult = sResult + Chr(10) 'Chr(13) + chr(10) seems to not work, #13 is displayed as a square char
       iPos = iPos + 4
     ElseIf (Mid(ss, iPos + 1, 1) = "'") Then
       sResult = sResult + Chr(hexcode(Mid(ss, iPos + 2, 1)) * 16 + hexcode(Mid(ss, iPos + 3, 1)))
       iPos = iPos + 4
     Else
       iPos = iPos + 1
       While Mid(ss, iPos, 1) <> "\" And Mid(ss, iPos, 1) <> "{" And Mid(ss, iPos, 1) <> Chr(13) And Mid(ss, iPos, 1) <> Chr(10) And Mid(ss, iPos, 1) <> " "
         iPos = iPos + 1
       Wend
       If Mid(ss, iPos, 1) = " " Then
         iPos = iPos + 1
       End If
     End If
   ElseIf (Mid(ss, iPos, 1) = "{") Then
     iLevel = 1
     iPos = iPos + 1
     While iLevel > 0
       If Mid(ss, iPos, 1) = "{" Then
         iLevel = iLevel + 1
       ElseIf Mid(ss, iPos, 1) = "}" Then
         iLevel = iLevel - 1
       End If
       iPos = iPos + 1
     Wend
   ElseIf (Mid(ss, iPos, 1) = Chr(10) Or Mid(ss, iPos, 1) = Chr(13)) Then
     iPos = iPos + 1
   Else
     sResult = sResult + Mid(ss, 1, 1)
     iPos = iPos + 1
   End If
   If iPos = Len(ss) Then
     ss = ""
   Else
     ss = Mid(ss, iPos)
   End If
   iPos = 1
 Wend

 While (Right(sResult, 1) = Chr(10) Or Right(sResult, 1) = Chr(13) Or Right(sResult, 1) = " ")
   sResult = Left(sResult, Len(sResult) - 1)
 Wend
 RTF2TXT = sResult
End Function

Hope it helps anyone. I couldn't find anything like this function after searching for days.