With every new Delphi version there is this kind of movement:
Do your Unicode conversion today and use our new RAD Studio version to do your work even better than before (often combined with a special offer).
I fully understand this and, to be honest, it is true and it's also a good idea. Using old Delphi versions is horrible. Yes, the old versions are a little bit faster and perhaps the IDE is a little bit more stable. But this is a bad tradeoff, because you miss all the new stuff that makes your life so much easier.
Ok, if everything is so fine and easy, why this blog post?
If you have old source code in your legacy application and the oldest unit was created after the year 20xx, you probably have no problems, because you've used the RAD approach and have clicked DB components onto your form, directly bound to your database. Perhaps you used the BDE, but converting this to FireDAC is not so hard.
So far so good... Buy the latest Delphi Version to go ahead...
But if you're old and you really learned how to code in the old days with Turbo Pascal, you are not using DB components at all.
You are using a record to hold your data. The records had to be aligned by byte or had to be marked as a packed record.
You stored a date in 3 bytes and, of course, you had to use sized string types to match your needs.
In these days you had this kind of declaration because every byte counts:
type
  Str6  = String[6];
  Str26 = String[26];
  Str40 = String[40];
  Str80 = String[80];

  TAddress = Packed Record
    FirstName : Str26;
    LastName  : Str26;
    Street    : Str80;
    Zip       : Str6;
    Town      : Str80;
  end;

var
  Address : TAddress;
With this definition, you just wrote an address to disk by using BlockWrite.
This kind of code works from TP 1 up to Sydney. So what is the problem?
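That flat-file pattern looks roughly like this (a minimal sketch; the file name and surrounding code are my invention, not from the original app):

```delphi
var
  F       : File;        // untyped file
  Address : TAddress;
begin
  AssignFile(F, 'address.dat');
  Rewrite(F, 1);                            // record size = 1 byte
  BlockWrite(F, Address, SizeOf(Address));  // dump the record byte for byte
  CloseFile(F);
end;
```

Because every TAddress has the same fixed size, record number N starts at byte N * SizeOf(TAddress) - no index needed.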
If you change these (short) strings to a UnicodeString or even to an AnsiString, you are unable to write them to disk anymore, because a long string is only a reference, not the actual data. No problem, you just have to serialize the data for reading from and writing to a stream. But then each dataset in the stream has a different length, and you have to create a jump table to find each record's starting position. This is the best point to stop storing your data in a flat file at all.
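Serializing a long string usually means writing a length prefix followed by the bytes. A sketch of what that looks like (the helper name and the choice of UTF-8 are my assumptions, not from the original code):

```delphi
uses System.Classes, System.SysUtils;

procedure WriteStringTo(Stream: TStream; const S: string);
var
  Bytes : TBytes;
  Len   : Integer;
begin
  Bytes := TEncoding.UTF8.GetBytes(S);   // pick one encoding and stick to it
  Len   := Length(Bytes);
  Stream.WriteBuffer(Len, SizeOf(Len));  // length prefix first...
  if Len > 0 then
    Stream.WriteBuffer(Bytes[0], Len);   // ...then the actual character data
end;
```

Reading works the same way in reverse: read the length, then read exactly that many bytes - which is why records no longer have a fixed size on disk.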
Let's assume you have managed this in your whole app...I still haven't yet.
Long before I had converted my source code to be able to compile with XE, I thought it was a good idea to convert every String to an AnsiString, every Char to an AnsiChar and every PChar to a PAnsiChar.
(At this point I was not aware that the VCL does not work like the Windows API, where nearly every function has an A and a W version for Ansi and wide strings.)
Label1.Caption := Address.FirstName;
produces a warning while compiling. The compiler is doing the trick here and converts the short string to the Unicode Caption string - so it works. By the way, this is called a possible solution in most white papers. It is not. Yes, it compiles and yes, it runs, but the warnings are the problem, as we will see later.
Again where is the problem? I call it the "Var Param Problem".
Imagine you have the record from above. Then you probably have some methods to deal with this data structure, like:
Function MakeName(var Firstname, LastName : Str26) : Str80;
Using var was a good idea to avoid copying 54 bytes on the stack.
Perhaps the MakeName function calls another method inside:
Function Crunch(var S : Str26) : Str26;
begin
  While (S[0] > #0) and (S[Byte(S[0])] = ' ') do
    S[0] := Chr(Byte(S[0]) - 1);
  Crunch := S;
end;

Function EmptyName(var S : Str26) : boolean;
begin
  EmptyName := (Crunch(S) = '');
end;

Function MakeName(var Firstname, LastName : Str26) : Str80;
begin
  if EmptyName(Firstname)
    then MakeName := Crunch(LastName)
    else MakeName := Crunch(LastName) + ', ' + Crunch(Firstname);
end;
Yes, if you did not know this: before Result existed, assigning the value to the function name was the way you had to write functions!
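The "Var Param Problem" shows up the moment a converted caller meets one of these helpers. A hypothetical call site (not from the original code) makes it concrete:

```delphi
var
  Name : string;    // a long (Unicode) string after the conversion
begin
  Name := 'John  ';
  // E2033: Types of actual and formal var parameters must be identical.
  // The compiler cannot silently convert here, because Crunch writes
  // back through the var parameter - a hidden copy would be lost.
  Crunch(Name);
end;
```

With value parameters the compiler can insert a conversion and merely warn; with var parameters it must refuse, so every such call is a hard error you have to fix by hand.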
There were more stupid things we did in the old days:

Move(Address, OldAddress, SizeOf(Address));

If the record contains a long string, this kills the reference held by OldAddress and produces a memory leak - and it only moves the reference from Address instead of creating a copy of the actual string data.
With hundreds of methods using short or special short string types as var parameters all over your source code, there is no easy way to convert all of this step by step to a different string type. Once you have changed one function that is called by nearly everybody, you get a snowball effect of compiler errors.
Yes, most of the conversion besides the "Var Param Problem" is done by the compiler. The problem is: you will get thousands of warnings, and you cannot ignore them, because some of them you have to deal with.
The hard task is: find the 500 or more real problems among 60,000 warnings. I started with XE2 and, by ignoring most of the "probably lost" warnings caused by assigning a string to a short string, I'm down to ~7500 warnings.
But here is some good news: with 10.4 we get custom managed records that can have their own Initialize, Finalize and Assign operators. For the first time there is a possibility to handle long strings in records the right way, by creating a proper copy on assignment.
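A sketch of such a managed record in 10.4 (the record and field names are invented for illustration):

```delphi
type
  TUnicodeAddress = record
    FirstName : string;
    LastName  : string;
    class operator Initialize(out Dest: TUnicodeAddress);
    class operator Assign(var Dest: TUnicodeAddress;
                          const [ref] Src: TUnicodeAddress);
  end;

class operator TUnicodeAddress.Initialize(out Dest: TUnicodeAddress);
begin
  Dest.FirstName := '';
  Dest.LastName  := '';
end;

class operator TUnicodeAddress.Assign(var Dest: TUnicodeAddress;
  const [ref] Src: TUnicodeAddress);
begin
  // Normal string assignment updates the reference count correctly,
  // so every copy of the record carries properly managed strings -
  // unlike Move(), which blindly duplicates the raw reference.
  Dest.FirstName := Src.FirstName;
  Dest.LastName  := Src.LastName;
end;
```

The Assign operator runs whenever one record is assigned to another, which is exactly the hook that was missing for records containing long strings.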
It would be so perfect if we had a compiler switch to turn off Unicode in XEx - sorry, just dreaming.
While refactoring the code towards XE, everything has to match binary 1:1, of course, so every not-yet-converted data structure written to disk stays exactly the same data. By using a source code repository you are able to maintain both trees (Unicode and non-Unicode) for a short time, until the conflicts become too much. So the only way to go is to make changes to your non-Unicode source that produce the same results with fewer warnings when compiled with the XEx compiler. On this way you want as few IFDEFs in your source as possible, because you already have a big pile of code to refactor, and IFDEFs make it less readable in the long term.
So, how to find a way to go?
It would be perfect if I could keep the short strings in my records as long as possible. Outside of these records, everything could be "normal" string.
Everything is calling everybody and nearly every unit is linked to every unit - not always in a direct way, but through some other unit in most cases.
Side note: Don't watch Uncle Bob's Clean Code stuff, because after that you will hate your work of the past 35 years even more than you already do. (We were young, we needed the money.)
No, this is wrong: EVERY developer should watch sessions 1-6 from Uncle Bob.
Stop reading here and click on this link. If after that you still think you do not need to write unit tests - sorry, in this case I can't help you at all. (The only possibility is: you have skipped some parts of the videos.)
If I already had a trustworthy coverage of unit tests in place in my legacy app, EVERYTHING would be so much easier, with no fear of changing some of the old methods and dependencies.
And now? I've started a different approach: using the source code tokenizer from my source code formatter project, I was able to find all dependencies in all my unit uses clauses and removed ~400 units that are no longer needed in "this" unit. Not bad, but not enough.
I plan to restructure my dependencies with brute force or a neural network - we will see if this can work. The next thing would be a "path through the source" finder, to convert all var parameters from short string to string where possible.
So there is a long way to go, and in the end we are on VCL 32-bit. 64-bit would be the next step. If everybody is going the ARM way, perhaps we have to convert to FMX some time in the future.
To get something done, I'm going two ways: first, decoupling and writing unit tests; second, trying to get rid of the short strings, or at least of the non-critical warnings.
And of course everything just in my spare time. I wish I had...
If you have any good ideas, I'd like to read them in the comments.
So long... happy coding, and watch Uncle Bob!