Tuesday, 24 March 2009

Faster data fetching from multiple SQL tables with multiple base elements to analyze

What this method gives you:
  1. This method doesn't require you to massively rewrite the code.
  2. The code is relatively easy to understand.
  3. The speed improvements are simply massive (30 - 50 times faster code, up to 200 in some scenarios).
  4. and best of all: you can implement this method step by step, one subquery at a time and observe the speed improvements as you go.
Note that I don't read insane amount of books. That said I haven't yet seen this method in a book and therefore don't know it's "official" name. But since my fellow programmers never know what I mean when I say it's name, I'll state for the record that I call this method "synchronized queries". The name stems from the fact that at fetching time, some synchronization must be performed among the queries affected.

Also, this method is to be considered intermediate level at best. It's no rocket science. But since I implemented it on a fair bit of different code segments, I guess it isn't the typical programmer's first line of thought.

On to the method then:

If you've ever programmed a business type application, like MRP, CRM or some such, you've most certainly had to implement an analysis that took multiple base documents as input and then had to fetch the data about those documents from a series of database tables.

The most common scenario that comes to mind is order analysis.
To analyze an order, one has to fetch data from multiple tables like some of the following:
Purchases (materials, services)
Shipping documents (finished products)
Invoices (finished product)
And then a whole myriad of data in the production which you entered to keep track of the production of the ordered product:
Work orders
Material issues
Semi-product and product completion
Worktime + tools usage
etc, etc.

In such an analysis all these tables have the requested order(s) to be analyzed in common. Meaning that for each table you can by some means get to the order id. Either it's in the table already or there's some other table that creates the connection between the two.

The most common solution to do such an analysis that I have seen goes something like this:

for each order do
analyze purchases
analyze shipping
analyze invoices
analyze work orders
analyze material issues
...
next

As you can see, this solution is OK for situations where you only have to analyze one order and it's respective details. You will have one compile for the order and one compile for each detail.

However, when you are doing for example a yearly analysis, it quickly becomes obvious that we have potentionally thousands of orders and therefore thousands of compiles for the poor SQL server.

A compile (statement execute) is relatively the most time consuming part of fetching data from an SQL database. Especially if the query itself is complex and returns relatively few records. Having no indexes on the filter and order fields helps a lot too :)

So the above example behaves optimally when there is one order to analyze and decreases in efficiency with each following order since all subqueries have to be recompiled to return data for each subsequent order.

Knowing that a compile is very expensive, the best way to go should be quite obvious - just decrease the number of compiles. But how can one do that? If I make the queries such that they will return data for all requested orders, that data will just be mixed all together making the analysis very hard to do, especially if we consider the volume of all that data. There's no way it can be fit into memory and retrieved when needed.

Well, the solution is quite simple:
In order to keep the actual analysis routine as close to the original algorithm as possible and still reap the benefits of fewer compiles, one needs to fulfill ony one additional condition: all queries must be sortable by order number.
Since SQL has an adequate command for that, namely "order by", this condition is easy to meet.

There is one additional issue that typically needs to be solved for such situations: normally you won't get a list of orders that need to be analyzed. Rather, the filter will be more like: orders from - to (date), order state, seller, etc.
So, in order to solve this problem, we have to somehow get order numbers since all these details will not be stored in every table.
The solution is quite simple: we just create a temporary table from the original order query with a select into statement. We can then use this temporary table as a join in all subsequent detail queries.

So, the example for one such detail table will be:

Create the temp table
select f1, f2, f3, f4, f5, ...
into #myTempOrders
from Orders
where c1 = xxx and c2 = yyy...

Make the detail query for purchases:
select f1, f2, f3, f4, ...
from purchases
join #myTempOrders on purchases.orderid = #myTempOrders.orderid
order by purchases.orderid

This purchases query will return data for all orders being analyzed if data for them exists.

So to complete the analysis, our new code looks like this:
Create temp table (if needed)
Compile all detail tables
for each order do
analyze purchases
analyze shipping
analyze invoices
analyze work orders
analyze material issues
...
next

Surprising, how the code is actually the same, isn't it? :)
The only additions are initial compiles that will speed up our analysis.

But there is one significant difference:
In our initial code "analyze" meant: compile the detail query, fetch the records, do what you have to do with them.
In our new code "analyze" means: synchronize the master and detail query, fetch the records, do what you have to do with them.

What is changed is "compile" versus "synchronize".
A compile would look like this (a little remodelled query from before):
select f1, f2, f3, f4, ...
from purchases
where purchases.orderid = xxx

A synchronize on the other hand looks like this:
int CmpRec(SQL sqlOrder, SQL sqlAny)
{
//Compares the "key", that is order number in both datasets
if (sqlOrder.asInt("orderid") > sqlAny.asInt("orderid")) return -1;
else if (sqlOrder.asInt("orderid") == sqlAny.asInt("orderid")) return 0;
else return 1;
}

//Advances the detail dataset until equal or greater order id is fetched
while (CmpRec(sqlVk, sVrDok, iStNal, iStPos) < 0)
if (!sqlVk.ForEach()) break;

while (CmpRec(sqlVk, sVrDok, iStNal, iStPos) == 0)
{
//Do your stuff
if (!sqlVk.ForEach()) break;
}

Have fun creating faster programs :)

Autosizing multi-line merged cells in Excel using macros

As part of my previously mentioned Excel project i also had to display multi-line content in cells. Incidentally, most of this content was in the RTF fields mentioned in my previous post.

However, Excel has some issues with auto sizing merged cells containing such content. To be more specific - if the cell containing such text is merged, all Excel methods for auto sizing fail miserably.

So I went ahead and looked for this on the net. After quite a bit of searching I managed to find a forum thread dealing with this problem. Since my memory is worse than that of a fish, I'm afraid I can't give proper credit to the author of the code that solves the problem (just spent another half an hour searching for that thread, but can't find it :( ).

Anyway, the original solution was made so that a macro would first search for all merged cells and then auto-size their respective lines based on the content of those cells. Although this solution may fail if you have single cells that would resize higher than other cells, it was fine for me.

I have modified the original algorithm since I already knew which cells would require auto-sizing. So the "gathering" algorithm is in my case simplified to adding appropriate cell info into the array as I add new cells to the final report.
The actual resizing algorithm is unmodified.

This is what needs to be done:


  'add merged cells into an array
  If iFirst = 1 Then
    ReDim myTexts(0)
    iFirst = 0
  Else
    ReDim Preserve myTexts(UBound(myTexts) + 1)
  End If
  opisi(UBound(myTexts)) = "D" & iRow & ":G" & iRow
  'note that columns D to G were used in my .xls. You can use whatever range you want, just make sure you add all the cells

  'Actual resizing code
  If iFirst = 0 Then
    'Do this only if you added any elements into the array
    For i = LBound(opisi) To UBound(opisi)
      sReport.Range(opisi(i)).Select
      With ActiveCell.MergeArea
        If .Rows.Count = 1 And .WrapText = True Then
          'Do the magic
          CurrentRowHeight = .RowHeight
          ActiveCellWidth = ActiveCell.ColumnWidth
          For Each CurrCell In Selection
            MergedCellRgWidth = CurrCell.ColumnWidth + MergedCellRgWidth
          Next
          .MergeCells = False
          .Cells(1).ColumnWidth = MergedCellRgWidth
          .EntireRow.AutoFit
          PossNewRowHeight = .RowHeight
          .Cells(1).ColumnWidth = ActiveCellWidth
          .MergeCells = True
          .RowHeight = IIf(CurrentRowHeight > PossNewRowHeight, _
            CurrentRowHeight, PossNewRowHeight)
        End If
      End With
      MergedCellRgWidth = 0
    Next i
  End If

Saturday, 7 March 2009

Converting RTF to plain TXT

As part of my work on the Excel spreadsheet mentioned in the previous post, I also had to display contents of a RTF field.
Since Excel doesn't parse RTF in any way (well, except for import as a file, but that sucks too), I had to convert the RTF field retrieved from the database into plain text so that I could display it for the user.

Following is code that will convert the RTF to plain text.
However, the function makes some assumptions:
  1. The RTF is normal text, maybe with some font formatting
  2. No tables, lists or any special RTF structures are supported. They will be converted to plain text with no special formatting. If you need special formatting, you'll have to add appropriate lines of code into the parser...
  3. The convertor assumes that the code page of the RTF is the same as the code page of the client computer. This is important if you have special (language specific) characters in the RTF itself.
  4. Also in regard to code page, this convertor will only work on ANSI code pages. This means, it will only convert single-byte characters, not multi byte ones. Since I don't have a multi byte RTF available, I don't know how hard it is to fix this one.
  5. Also note that I'm no Excel guru, so the code in the macro may be sub-optimal
So here it goes:

Private Function hexcode(ss)
 If ss = "0" Then
   hexcode = 0
 ElseIf ss = "1" Then
   hexcode = 1
 ElseIf ss = "2" Then
   hexcode = 2
 ElseIf ss = "3" Then
   hexcode = 3
 ElseIf ss = "4" Then
   hexcode = 4
 ElseIf ss = "5" Then
   hexcode = 5
 ElseIf ss = "6" Then
   hexcode = 6
 ElseIf ss = "7" Then
   hexcode = 7
 ElseIf ss = "8" Then
   hexcode = 8
 ElseIf ss = "9" Then
   hexcode = 9
 ElseIf ss = "a" Or ss = "A" Then
   hexcode = 10
 ElseIf ss = "b" Or ss = "B" Then
   hexcode = 11
 ElseIf ss = "c" Or ss = "C" Then
   hexcode = 12
 ElseIf ss = "d" Or ss = "D" Then
   hexcode = 13
 ElseIf ss = "e" Or ss = "E" Then
   hexcode = 14
 ElseIf ss = "f" Or ss = "F" Then
   hexcode = 15
 Else
   hexcode = 0
 End If
End Function

Private Function RTF2TXT(ss)
 While (Right(ss, 1) = Chr(10) Or Right(ss, 1) = Chr(13) Or Right(ss, 1) = " " Or Right(ss, 1) = "}")
   ss = Left(ss, Len(ss) - 1)
 Wend
 If (Len(ss) >= 1) Then
   ss = Right(ss, Len(ss) - 1)
 End If
 iPos = 1
 sResult = ""

 While (Len(ss) > 0)
   If (Mid(ss, iPos, 1) = "\") Then
     If (Mid(ss, iPos + 1, 3) = "tab") Then
       sResult = sResult + Chr(9)
       iPos = iPos + 4
     ElseIf (Mid(ss, iPos + 1, 3) = "par") And (Mid(ss, iPos + 1, 4) <> "pard") Then
       sResult = sResult + Chr(10) 'Chr(13) + chr(10) seems to not work, #13 is displayed as a square char
       iPos = iPos + 4
     ElseIf (Mid(ss, iPos + 1, 1) = "'") Then
       sResult = sResult + Chr(hexcode(Mid(ss, iPos + 2, 1)) * 16 + hexcode(Mid(ss, iPos + 3, 1)))
       iPos = iPos + 4
     Else
       iPos = iPos + 1
       While Mid(ss, iPos, 1) <> "\" And Mid(ss, iPos, 1) <> "{" And Mid(ss, iPos, 1) <> Chr(13) And Mid(ss, iPos, 1) <> Chr(10) And Mid(ss, iPos, 1) <> " "
         iPos = iPos + 1
       Wend
       If Mid(ss, iPos, 1) = " " Then
         iPos = iPos + 1
       End If
     End If
   ElseIf (Mid(ss, iPos, 1) = "{") Then
     iLevel = 1
     iPos = iPos + 1
     While iLevel > 0
       If Mid(ss, iPos, 1) = "{" Then
         iLevel = iLevel + 1
       ElseIf Mid(ss, iPos, 1) = "}" Then
         iLevel = iLevel - 1
       End If
       iPos = iPos + 1
     Wend
   ElseIf (Mid(ss, iPos, 1) = Chr(10) Or Mid(ss, iPos, 1) = Chr(13)) Then
     iPos = iPos + 1
   Else
     sResult = sResult + Mid(ss, 1, 1)
     iPos = iPos + 1
   End If
   If iPos = Len(ss) Then
     ss = ""
   Else
     ss = Mid(ss, iPos)
   End If
   iPos = 1
 Wend

 While (Right(sResult, 1) = Chr(10) Or Right(sResult, 1) = Chr(13) Or Right(sResult, 1) = " ")
   sResult = Left(sResult, Len(sResult) - 1)
 Wend
 RTF2TXT = sResult
End Function

Hope it helps anyone. I couldn't find anything like this function after searching for days.

Database queries and how to use them with macros in Microsoft Excel

I'm no Excel guru. In fact, I've never before written an Excel macro. Also, the last time I was programming in Basic, it was Simon's Basic on my beloved Commodore 64 some 25 years ago.

That said, I recently got a request from a client to do a bill of materials output from SQL database directly into Excel. The root cause of this was that Crystal Reports in our MRP application that we use for reports is unable to export to Excel if the report contains a RTF field + subreports + whatnot.

Since my knowledge of macros was quite insufficient, what I mostly did was record a macro, see what Excel generated and copied that code into my functions.

What I needed first was some kind of parameter entry. The user needs to enter the item, it's variant, number of levels that the BOM would be outputted with and the language.
So I looked up "Excel running macro on cell change". This one was pretty straight forward:
  1. You have to run the Visual Basic editor, embedded into Excel. In pre 2007 versions, that would be Tools / Macros / Visual Basic Editor. For Excel 2007 (and presumably up), you first have to enable the developer ribbon (Office button, Excel Options, Popular, Show developer tab in the ribbon). After that you open the developer tab and click the leftmost icon, saying "Visual Basic".
  2. Once in the editor, right click the sheet in question and select "View code" from the popup menu
  3. At the top of the window that opens, there are two combos. Select "Worksheet" from the first and "Change" from the second
  4. Write the function, something like:

Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Column = 3 Then
If Target.Row >= 6 And Target.Row <= 9 Then
'Your code here
End If
End If
End Sub


Then I needed the query that would return the bill of materials. So I went recording a macro and defined my query. Note that Microsoft Query isn't capable of defining complex queries. The most it will do is allow you to define a join or two :) But, fortunately it does have functionality needed to define queries of any complexity.
To write any custom query and have it's results returned to Excel sheet, you have to select "Execute SQL..." from File menu. You will get a dialog, which is unfortunately not resizable, but you can just as well paste the query you made previously in your tool of choice.
After you have written your query, just press the "Execute" button and you will get the results for the query into a new window in the Microsoft Query.
Having this window on top (if it's not the only one), just select "Return data to microsoft office Excel", again from the File menu.
You now have the query in your sheet of choice in the Excel. So you can stop recording the macro and go view it's code.
You can now grab the generated query code and go beautify it some since your SQL statement was garbled hideously by Excel macro recording. Additionally, insert the user input variables into the query so that the query will respond to what user types as parameters.
For me, the end result is something like this:


iIdent = Range("C6").Value
sVarianta = Range("C7").Value
iStNivojev = Range("C8").Value
sJezik = Range("C9").Value

With ActiveSheet.QueryTables.Add(Connection:= _
"ODBC;DSN=IIS;UID=juhuhu;" _
, Destination:=ActiveSheet.Range("A1"))
.CommandText = Array( _
"select * from (" & Chr(13) & "" & Chr(10) & _
" select KoSumNivo, KoInfZapSt, KoKolMateriala, isnull(KomOpis, KoMPNaziv) as KoMPNaziv, m1.MPTeza," & Chr(13) & "" & Chr(10) & _
" KoPodSifMP, KoMPSifKarKlj, m1.MPMaterial, KoOpomba2, KoMPSifEM1, m1.MPDoNaziv," & Chr(13) & "" & Chr(10) _
, _
" KoSumKolMatNIzm, KoSumZapSt, m1.MPOpis, KoMasterMP, KoMasterVarianta, GetKosovnica1.datum," & Chr(13) & "" & Chr(10) & _
" m2.MPNaziv as MasterNaziv, m2.MPDoNaziv as MasterDoNaziv, m2.MPTeza as MasterTeza, KoSeNaroca" & Chr(13) & "" & Chr(10) & "" _
, _
" from GetKosovnica1(" & iIdent & ", '" & sVarianta & "', 2)" & Chr(13) & "" & Chr(10) & _
" join MaticniPodatki m1 on KoPodSifMP = m1.MPSifra" & Chr(13) & "" & Chr(10) & _
" join MaticniPodatki m2 on KoMasterMP = m2.MPSifra" & Chr(13) & "" & Chr(10) & _
" left outer join KomOpisMat on KoPodSifMP = KomSifMP and KomJezik = '" & sJezik & "'" & Chr(13) & "" & Chr(10) _
, _
" ) as Tmp1" & Chr(13) & "" & Chr(10) & _
" where KoSumNivo <= " & iStNivojev & Chr(13) & "" & Chr(10) & _
"order by KoSumZapSt" _ )
.Name = "Query1"
'.... Lots of properties here
End With


I start out with putting the parameters into respective variables. I then proceed to use the variables in the query itself. If you'll go looking at the SQL itself, note that the GetKosovnica is a stored SQL function that I wrote, which parses recursively a table that defines bills of materials. Now that you have this out of the way, you can start modifying the parameters and see how a different results will be returned for different parameters. But, there's a catch: Everytime you change a parameter, you'll have one more query in the sheet. In my case the previous queries were shifting to the left. So now we have to either reuse or delete any previous query on the sheet. I have searched a lot for a way of deleting an existing query on a sheet. I found several methods, but none of them really worked for me, so in the end I decided to go for query reusing. So this is how you reuse an existing query in Microsoft Excel:

If ActiveSheet.QueryTables.Count = 0 Then
'The query does not exist yet, so let's create a new one
With ActiveSheet.QueryTables.Add(Connection:= _
..... Query parameters here
End With
Else
'The query is already there, so let's reuse it
With ActiveSheet.QueryTables(1)
.CommandText = Array( _
..... Query parameters here
End With
End If


Now you will only have one query on the sheet. However, before you can use the data the query returns, you will have to modify it a bit.
By default Excel queries are created as background queries. This means that when you run the query, Excel will run the query in the background and refresh the data only when it is actually returned. Any macro or user operation before this will work on a previous set of data (if any).

In order to have the macro wait for the query to actually execute and replace the data with a new set, you have to set the following query parameters:

.BackgroundQuery = False
.Refresh

Well, actually ".Refresh" is calling the refresh method. Do this both on the newly created query and on the reused query part.

Now you can use the data from the query and copy it into a nicely formatted table.

Special word of caution: while the Query result table will always display the data as it came from the SQL database, the copied data will have leading zeros removed if the entire cell is composed of numerics. To make Excel understand that you want those zeros, you have to copy the data like this:

Dim sODBC As Worksheet
Dim sReport As Worksheet

Set sODBC = Sheets("ODBC result set")
Set sReport = Sheets("Final output")
sReport.Range("K" & iVrstica).Value = "'" & sODBC.Range("H" & iSrc).Text


Note the single quote in double quotes. This tells Excel that you're entering a string, not a number. This also works with manual entry.
The sReport and sODBC are Sheet objects I predefined so that I don't have to type Sheets("ODBC result set") and Sheets("Final output") for each copying command.

This is it for the first part of my Excel tips. Hope you find it useful.

Who's got the balls?

It's here. The economic crisis, I mean.
To be quite honest, it was due for quite some time now. With the global economy booming and everybody spending like there's no tomorrow, without much thought on prices and actual need, we got just what we deserve. Nobody, I mean NOBODY was thinking about saving anything in case things wouldn't be so good in the future.
An economic crisis is certainly no fun, don't get me wrong. Aside from businesses closing down, it has a nasty habit of hitting the average Joe. Lots of people lost their jobs. I wish I could say it was for a greater good, but it wasn't. In my opinion, our current situation is a result first and foremost of greed. Distant second is plain carelesness.

So a couple of trillion US $ just vanished from the market. How miserable an excuse can that be for a crisis?!? The most virtual stuff man managed to invent, easily redefined in any way we see fit - and we call that a good excuse to make several hundred million people unemployed?

World governments failed to react in due time. The measure was quite simple: just replace the vanished money. Yes, I know, that would just boost the inflation in the mid-term. But it would fix the immediate problem in the short term.
But even if the governments did react in time, it would only delay the inevitable cleansing of the economy (by means of a crisis).

It's what the governments are doing now to get us out of the crisis that matters. It is known that government spending boosts economy. But while our governments were spending like mad during the good times, now they are squeaking how there's no money to spend. Oh please - I've lived in a country with 2000% plus inflation. So I know money can be made at will. They won't even do warranties so that banks would start lending again.

So while Obama is signing bills that will cost U.S. taxpayers in excess of 2 trillion US $, the Euro group is clinging to the monetary criteria like we're all going to die if the criteria would be temporarily forgotten to get us out of the crisis. None of the member countries conforms to the criteria anyway, they just have to lie about it now... Not to even mention the program our own prime minister made: a couple of hundred million € savings in government spending with special emphasis that he would be buying his own coffee from now on. OK, forgetting the coffee thing - you managed to save a bit. But where will you put this money? No idea? Oh, well...

To cut this short:
I believe the Americans are doing the right thing at the moment. Government warranties + direct spending to re-boot the banks is currently all that is needed. Sure it will boost inflation, but it will also revive the industry.
The EU on the other hand has no approach at all. Even when the french president decided to throw a hefty pack of cash to the automobile industry, everybody just cried foul at him. At least he did something...

Guess who's going to come out of this stronger?

I'm back

Long time no hear :)
I've been pretty busy lately and as a result this blog suffered accordingly. But at the same time, material was gathering and now I have quite a few things to say.

This post will just be an intro (to help me remember what I have prepared) :) so here's a kind of TOC tor the following posts:
1. First, I'll have to do a short rant on current economic crisis and the way EU + member governments are dealing with it. I just have to rant a little :D

After this I'll follow with some more useful posts, beginning with programming. Since I've had a request from a client to do a rather complex Excel application, I'll begin with that:
2. The first post will deal with Excel and using data queries in macros
3. The second will be about RTF to plain text conversion, also in an Excel macro
4. The third will be about multi line cell auto-sizing, since Excel doesn't quite cut it with it's standard functions
Although I had to research many aspects of Excel macro programming while I was doing this, these three seem worth publishing as the solutions for the particular problems were not readily available on the net.

5. Continuing with programming tips, I'll explain a rather simple method to increase data fetching speed from multiple SQL queries.

Last, but not least, I've been lax in posting about some hardware that I have tested, so you'll have my thoughts on the following:
6. Auzentech X-Fi Prelude 7.1 audio card
7. Zalman Reserator 2
8. Antec P182 computer case

9. To finish up, I have some code that allows one to sort huge amounts of data and still do it in their lifetime. I'm talking billions of records and hundreds of gigabytes of data. This one will take a bit since I have to clean the implementation specifics from the code, but eventually I'll post this too :). Bottom line with this code is that this code helped reduce some reporting from 48+ hours (data in SQL database) down to 20 minutes (data in custom files, but on the same server).

So here goes. Lots of work to be done. I'm hoping to have all these articles done in march, but one never knows :)