With every new Delphi version there is this kind of movement:
Do your Unicode conversion today and use our new RAD Studio version to do your work even better than before (often combined with a special offer).
I fully understand this and, to be honest, it is true and it's also a good idea. Using old Delphi versions is horrible. Yes, the old versions are a little bit faster and perhaps the IDE is a little bit more stable. But this is a bad tradeoff, because you miss all the new stuff that makes your life so much easier.
Ok, if everything is so fine and easy, why this blog post?
If you have old source code in your legacy application and the oldest unit was created after the year 20xx, you probably have no problems, because you've used the RAD approach and have clicked DB components onto your form, directly bound to your database. Perhaps you used the BDE, but converting this to FireDAC is not so hard.
So far so good... Buy the latest Delphi Version to go ahead...
But if you're old and you really learned how to code in the old days with Turbo Pascal, you are not using DB components at all.
You are using a record to hold your data. The records had to be aligned by byte or had to be marked as a packed record.
You stored a date in 3 bytes and, of course, you had to use sized string types to match your needs.
In these days you had this kind of declaration because every byte counts:
type
  Str6  = String[6];
  Str26 = String[26];
  Str40 = String[40];
  Str80 = String[80];

  TAddress = Packed Record
    FirstName : Str26;
    LastName  : Str26;
    Street    : Str80;
    Zip       : Str6;
    Town      : Str80;
  end;

var
  Address : TAddress;
With this definition, you just wrote an address to disk by using BlockWrite.
This kind of code works from TP 1 up to Sydney. So what is the problem?
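That flat-file pattern looks roughly like this (a minimal sketch; the file name and surrounding code are my invention, not from the original app):

```delphi
var
  F       : File;        // untyped file
  Address : TAddress;
begin
  AssignFile(F, 'address.dat');
  Rewrite(F, 1);                            // record size = 1 byte
  BlockWrite(F, Address, SizeOf(Address));  // dump the record byte for byte
  CloseFile(F);
end;
```

Because every TAddress has the same fixed size, record number N starts at byte N * SizeOf(TAddress) - no index needed.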
If you change these (short) strings to a UnicodeString or even to an AnsiString, you are unable to write them to disk anymore, because a long string is only a reference, not the actual data. No problem, you just have to serialize the data for reading from and writing to a stream. But then each dataset in the stream has a different length, and you have to create a jump table to find each record's starting position. This is the best point to stop storing your data in a flat file at all.
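Serializing a long string usually means writing a length prefix followed by the bytes. A sketch of what that looks like (the helper name and the choice of UTF-8 are my assumptions, not from the original code):

```delphi
uses System.Classes, System.SysUtils;

procedure WriteStringTo(Stream: TStream; const S: string);
var
  Bytes : TBytes;
  Len   : Integer;
begin
  Bytes := TEncoding.UTF8.GetBytes(S);   // pick one encoding and stick to it
  Len   := Length(Bytes);
  Stream.WriteBuffer(Len, SizeOf(Len));  // length prefix first...
  if Len > 0 then
    Stream.WriteBuffer(Bytes[0], Len);   // ...then the actual character data
end;
```

Reading works the same way in reverse: read the length, then read exactly that many bytes - which is why records no longer have a fixed size on disk.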
Let's assume you have managed this in your whole app...I still haven't yet.
Long before I had converted my source code to be able to compile with XE, I thought it was a good idea to convert every String to an AnsiString, every Char to an AnsiChar and every PChar to a PAnsiChar.
(At this point I was not aware that the VCL does not work like the Windows API, where nearly every function has an A and a W version for Ansi and wide strings.)
Label1.Caption := Address.FirstName;
produces a warning while compiling. The compiler is doing the trick here and converts the short string to the Unicode Caption string - so it works. By the way, this is called a possible solution in most white papers. It is not. Yes, it compiles and yes, it runs, but the warnings are the problem, as we will see later.
Again where is the problem? I call it the "Var Param Problem".
Imagine you have the record from above. Then you probably have some methods to deal with this data structure, like:
Function MakeName(var Firstname, LastName : Str26) : Str80;
Using var was a good idea to avoid copying 54 bytes on the stack.
Perhaps the MakeName function calls another method inside:
Function Crunch(var S : Str26) : Str26;
begin
  While (S[0] > #0) and (S[Byte(S[0])] = ' ') do
    S[0] := Chr(Byte(S[0]) - 1);
  Crunch := S;
end;

Function EmptyName(var S : Str26) : boolean;
begin
  EmptyName := (Crunch(S) = '');
end;

Function MakeName(var Firstname, LastName : Str26) : Str80;
begin
  if EmptyName(Firstname)
    then MakeName := Crunch(LastName)
    else MakeName := Crunch(LastName) + ', ' + Crunch(Firstname);
end;
Yes, if you did not know this: before Result existed, assigning the value to the function name was the way you had to write functions!
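The "Var Param Problem" shows up the moment a converted caller meets one of these helpers. A hypothetical call site (not from the original code) makes it concrete:

```delphi
var
  Name : string;    // a long (Unicode) string after the conversion
begin
  Name := 'John  ';
  // E2033: Types of actual and formal var parameters must be identical.
  // The compiler cannot silently convert here, because Crunch writes
  // back through the var parameter - a hidden copy would be lost.
  Crunch(Name);
end;
```

With value parameters the compiler can insert a conversion and merely warn; with var parameters it must refuse, so every such call is a hard error you have to fix by hand.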
There were more stupid things we did in the old days:

Move(Address, OldAddress, SizeOf(Address));

If the record contains a long string, this kills the reference held by OldAddress and produces a memory leak - and it only moves the reference from Address instead of creating a copy of the actual string data.
With hundreds of methods using short or special short string types as var parameters all over your source code, there is no easy way to convert all of this step by step to a different string type. Once you have changed one function that is called by nearly everybody, you get a snowball effect of compiler errors.
Yes, most of the conversion besides the "Var Param Problem" is done by the compiler. The problem is: you will get thousands of warnings, and you cannot ignore them, because some of them you have to deal with.
The hard task is: find the 500 or more real problems among 60,000 warnings. I started with XE2 and, by ignoring most of the "probably lost" warnings caused by assigning a string to a short string, I'm down to ~7500 warnings.
But here is some good news: with 10.4 we get custom managed records that can have their own Initialize, Finalize and Assign operators. For the first time there is a possibility to handle long strings in records the right way, by creating a proper copy on assignment.
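A sketch of such a managed record in 10.4 (the record and field names are invented for illustration):

```delphi
type
  TUnicodeAddress = record
    FirstName : string;
    LastName  : string;
    class operator Initialize(out Dest: TUnicodeAddress);
    class operator Assign(var Dest: TUnicodeAddress;
                          const [ref] Src: TUnicodeAddress);
  end;

class operator TUnicodeAddress.Initialize(out Dest: TUnicodeAddress);
begin
  Dest.FirstName := '';
  Dest.LastName  := '';
end;

class operator TUnicodeAddress.Assign(var Dest: TUnicodeAddress;
  const [ref] Src: TUnicodeAddress);
begin
  // Normal string assignment updates the reference count correctly,
  // so every copy of the record carries properly managed strings -
  // unlike Move(), which blindly duplicates the raw reference.
  Dest.FirstName := Src.FirstName;
  Dest.LastName  := Src.LastName;
end;
```

The Assign operator runs whenever one record is assigned to another, which is exactly the hook that was missing for records containing long strings.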
It would be so perfect if we had a compiler switch to turn off Unicode in XEx - sorry, just dreaming.
While refactoring the code towards XE, everything has to match binary 1:1, of course, so every not-yet-converted data structure written to disk stays exactly the same data. By using a source code repository you are able to maintain both trees (Unicode and non-Unicode) for a short time, until the conflicts become too much. So the only way to go is to make changes to your non-Unicode source that produce the same results with fewer warnings when compiled with the XEx compiler. On this way you want as few IFDEFs in your source as possible, because you already have a big pile of code to refactor, and IFDEFs make it less readable in the long term.
So, how to find a way to go?
It would be perfect if I could keep the short strings in my records as long as possible. Outside of these records, everything could be "normal" string.
Everything is calling everybody and nearly every unit is linked to every unit - not always in a direct way, but through some other unit in most cases.
Side note: Don't watch Uncle Bob's Clean Code stuff, because after that you will hate your work of the past 35 years even more than you already do. (We were young, we needed the money.)
No, this is wrong: EVERY developer should watch sessions 1-6 from Uncle Bob.
Stop reading here and click on this link. If after that you still think you do not need to write unit tests - sorry, in this case I can't help you at all. (The only possibility is: you have skipped some parts of the videos.)
If I already had a trustworthy coverage of unit tests in place in my legacy app, EVERYTHING would be so much easier, with no fear of changing some of the old methods and dependencies.
And now? I've started a different approach: using the source code tokenizer from my source code formatter project, I was able to find all dependencies in all my unit uses clauses and removed ~400 units that are no longer needed in "this" unit. Not bad, but not enough.
I plan to restructure my dependencies with brute force or a neural network - we will see if this can work. The next thing would be a "path through the source" finder, to convert all var parameters from short string to string where possible.
So there is a long way to go, and in the end we are on VCL 32-bit. 64-bit would be the next step. If everybody is going the ARM way, perhaps we have to convert to FMX some time in the future.
To get something done, I'm going two ways: first, decoupling and writing unit tests; second, trying to get rid of the short strings, or at least of the non-critical warnings.
And of course everything just in my spare time. I wish I had...
If you have any good ideas, I'd like to read them in the comments.
So long... happy coding, and watch Uncle Bob!