Intelligent rumblings about programming, AI, the world and life in general

7 Jan 2012

Makoto Development Journal 2: Character Strings

This is another post about the Makoto Engine, which I am developing in whatever free time graduate studies at Tokyo University leave me. In this post I would like to address the topic of character strings. We have all tried to use std::string and know how frustrating it is to deal with its deficiencies. In my opinion it is not well designed and does not provide enough functionality. Over the years I have used many string libraries, such as wxString from the wxWidgets library, QString from the Qt library and the string utilities of the Boost library. Of these, I would say the most useful has been QString, just for the amazing integration it has with the rest of Qt, with Boost coming in as a close second. The Boost string library is very complete and provides an enormous number of text processing functions.

But in the case of the Makoto Engine, smooth integration with the rest of the engine was a top priority. A complete string library that can be extended as needed was a requirement. Moreover, since the Makoto Engine is mostly written in C, the engine needs strings that are compatible with C rather than a C++ String class. For that reason I chose a library called "Refu Library", which I have been developing for a very long time for use in my personal projects. It contains code I use in more than one project, such as a complete string library, an XML reader, threads, etc. A bit of trivia: I named the library after the nickname the Japanese gave me to shorten my name when I first visited Japan with a youth exchange program. In every part of the Refu library I strive for optimality and speed, but I don't claim that it is better than any of the alternatives out there as far as strings are concerned. I have simply been working with it for years, so it is the one I feel most comfortable with. Not to mention that I can make any change I like at will.

So let us see how the String is defined. In C the string is defined as a struct holding a pointer to an array of bytes. There are two types of Strings, the normal String and the extended StringX, both shown below in their C versions. The difference is that the normal String's internal representation is as minimal as that of a C string, to save space and time during calculations. StringX, on the other hand, contains two additional fields which prove quite useful if you are manipulating frequently changing text or parsing text from a file. So whenever you are doing text editing, StringX is the better choice.

typedef struct RF_String
{
    //! The string's data
    char* bytes;
}RF_String;
 
typedef struct RF_StringX
{
    //! The string's data
    char* bytes;
    //! The buffer index, denotes how far from the start of the buffer the start of the string has moved
    int bIndex;
    //! The size of the buffer allocated for this extended String
    int bSize;
}RF_StringX;
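To give a feel for why the two extra fields help when parsing, here is a minimal sketch. It assumes that `bytes` always points at the current start of the string and that `bIndex` counts how many bytes that start has moved. The type `SketchStringX` and the functions `sketch_move_forward` and `sketch_reset` are hypothetical names made up for this illustration and are not part of the Refu API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for RF_StringX, for illustration only. */
typedef struct SketchStringX
{
    char* bytes;  /* current start of the string inside the buffer */
    int bIndex;   /* how many bytes the start has moved from the buffer start */
    int bSize;    /* total size of the allocated buffer in bytes */
} SketchStringX;

/* Drops the first n bytes of the string without copying anything.
   Returns 1 on success and 0 if the string is shorter than n bytes.
   The caller must make sure n lands on a UTF-8 character boundary. */
int sketch_move_forward(SketchStringX* s, int n)
{
    if (n < 0 || (size_t)n > strlen(s->bytes))
        return 0;
    s->bytes += n;
    s->bIndex += n;
    return 1;
}

/* Moves the start of the string back to the beginning of the buffer. */
void sketch_reset(SketchStringX* s)
{
    s->bytes -= s->bIndex;
    s->bIndex = 0;
}
```

With a layout like this, skipping over text that has already been parsed is only a pointer adjustment, and the full text can be restored at any time because the buffer itself is never modified.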

Of course a String library would not even deserve the name if it did not support Unicode. The Refu String is Unicode by default. In the past I kept a separate option for ASCII-only strings, but I decided against it since I saw no use for it. The internal representation is UTF-8, and functions exist to convert to and from all the other Unicode encodings. Why choose UTF-8 over the other encodings?

The advantages of UTF-8 are many; for a complete list do check the Wikipedia article on UTF-8. In my opinion the three biggest are:

  • In UTF-8 all the ASCII characters are represented with the same bytes as in ASCII. A text written in English, totally oblivious of the existence of Unicode, can still be read by the String, and an application written a decade ago can still read UTF-8 strings that contain only English characters.
  • For text that is mostly ASCII, UTF-8 is the most memory-efficient of the Unicode encodings, so saving memory is a big advantage. The only memory disadvantage is in East Asian languages, where most characters take 2 bytes in UTF-16 but 3 bytes in UTF-8.
  • But the biggest advantage of all, and one that makes a big difference in a String library, is that the end-of-string character '\0' is represented by a single zero byte and not by two as in UTF-16 or four as in UTF-32, and that no other character contains a zero byte. This is the same as in ASCII, which in turn means that all the original C string functions can work with UTF-8 strings, albeit oblivious to the actual number of characters inside the string.
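The size trade-off in the list above can be checked with a small sketch that computes how many bytes one code point needs in each encoding. The function names `utf8_encoded_length` and `utf16_encoded_length` are hypothetical helpers written for this post, not functions of the Refu library.

```c
#include <stdint.h>

/* Bytes needed to encode one code point in UTF-8,
   or 0 if the value is not a valid Unicode scalar value. */
int utf8_encoded_length(uint32_t cp)
{
    if (cp < 0x80) return 1;                    /* ASCII range */
    if (cp < 0x800) return 2;                   /* e.g. Greek, Cyrillic */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not characters */
    if (cp < 0x10000) return 3;                 /* rest of the BMP, e.g. kanji */
    if (cp <= 0x10FFFF) return 4;               /* supplementary planes */
    return 0;
}

/* Bytes needed to encode one code point in UTF-16,
   or 0 if the value is not a valid Unicode scalar value. */
int utf16_encoded_length(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) return 2;                 /* one 16-bit unit */
    if (cp <= 0x10FFFF) return 4;               /* a surrogate pair */
    return 0;
}
```

For the letter 'A' (U+0041) this gives 1 byte in UTF-8 against 2 in UTF-16, while for a CJK ideograph such as U+8AA0 it gives 3 bytes against 2, which is the East Asian disadvantage mentioned in the list.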

For example the functions below will work

//assume we got two strings encoded in UTF-8
char* str = func_that_inits_string_utf8();
char* otherstr = func_that_inits_another_string_utf8();
 
//The string comparison function still compares the strings byte per byte oblivious to the fact that they are UTF-8.
//Note that strcmp returns an int which is 0 when the strings are equal
int equal = (strcmp(str,otherstr) == 0);
 
//Will actually get the BYTE length of the string, be careful not the character length, since we would
//need to traverse it as UTF-8 and count chars with a special function to get that one
size_t byteLength = strlen(str);
 
//Works as intended. Will return a pointer to the position of "otherstr" inside "str", or NULL if it is not found
char* pos = strstr(str,otherstr);
 
//Will copy "str" into otherstr successfully, byte to byte, as long as the buffer
//of otherstr can hold byteLength+1 bytes
strcpy(otherstr,str);

So as you can understand, with a little bit of care UTF-8 makes things really easy as far as using the C string functions is concerned.
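The "special function" for counting characters rather than bytes can be quite small. The sketch below relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting the bytes that do not match that pattern counts the characters. The name `utf8_char_count` is a hypothetical helper and the code assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Counts the characters (code points) of a null terminated UTF-8 string.
   Assumes the input is valid UTF-8. */
size_t utf8_char_count(const char* s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
    {
        /* continuation bytes look like 10xxxxxx and do not start a character */
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a string holding the two Greek letters alpha and beta, strlen reports 4 bytes while this function reports 2 characters.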

A Few RF_String functions

Now let's see some of the actual string functions implemented in the Refu String library. Below we have a function that retrieves the code point, that is the Unicode number, of a character in the String.

//! Retrieves the unicode code point of the parameter character. <i>Can be used with StringX</i>
//! @param thisstr The string whose character code point we need
//! @param c The character index whose unicode code point to return. Must be a positive (including zero) integer.
//! @return Returns the code point or OPERATION_FAILURE in case of character index out of bounds
int rfString_GetChar(RF_String* thisstr,unsigned int c);
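As a rough idea of what a function like this has to do internally, here is a sketch that walks a UTF-8 byte array to the requested character and decodes it. It is not the Refu implementation. The name `utf8_code_point_at` is hypothetical, the return value -1 stands in for OPERATION_FAILURE, and the sketch does not reject overlong encodings.

```c
#include <stdint.h>

/* Returns the code point of character number `index` (starting from zero)
   of a null terminated UTF-8 string, or -1 if the index is out of bounds
   or the bytes at that position are malformed. */
int32_t utf8_code_point_at(const char* s, unsigned int index)
{
    const unsigned char* p = (const unsigned char*)s;
    int extra;
    int32_t cp;

    /* skip `index` whole characters, stepping over continuation bytes */
    while (*p != 0 && index > 0)
    {
        p++;
        while ((*p & 0xC0) == 0x80)
            p++;
        index--;
    }
    if (*p == 0)
        return -1;              /* index out of bounds */
    if (*p < 0x80)
        return *p;              /* plain ASCII */

    /* the lead byte tells us how many continuation bytes follow */
    if ((*p & 0xE0) == 0xC0)      { extra = 1; cp = *p & 0x1F; }
    else if ((*p & 0xF0) == 0xE0) { extra = 2; cp = *p & 0x0F; }
    else if ((*p & 0xF8) == 0xF0) { extra = 3; cp = *p & 0x07; }
    else return -1;             /* stray continuation or invalid byte */

    while (extra-- > 0)
    {
        p++;
        if ((*p & 0xC0) != 0x80)
            return -1;          /* truncated sequence */
        cp = (cp << 6) | (*p & 0x3F);
    }
    return cp;
}
```

Note that reaching character n costs a walk over everything before it, which is the price UTF-8 pays for its compactness.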

Here we have a function that returns the substring found between two given substrings.

//! Returns the first substring existing between the left and right parameter substrings.  <i>Can be used with StringX</i>
//! @note The Returned Substring needs to be freed by the user. BEWARE when assigning to a string using this function since if any previous string exists there IS NOT getting freed. You have to free it explicitly
//! @param thisstr This current string
//! @param lstr The left substring that will define the new substring
//! @param rstr The right substring that will define the new substring
//! @return Returns the substring between left and right substrings if they are found. If they are not returns a null pointer.
RF_String* rfString_Between(RF_String* thisstr,RF_String* lstr,RF_String* rstr);
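The same idea can be sketched on plain C strings with nothing but strstr, which works on UTF-8 because it only compares bytes. The function `sketch_between` is a hypothetical illustration and not the library's code. Like the real function, it hands ownership of the result to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the text between the first occurrence
   of `left` and the first occurrence of `right` that follows it.
   Returns NULL if either is not found or if the allocation fails.
   The caller owns the result and must free() it. */
char* sketch_between(const char* s, const char* left, const char* right)
{
    const char* start;
    const char* end;
    size_t len;
    char* result;

    start = strstr(s, left);
    if (start == NULL)
        return NULL;
    start += strlen(left);          /* the substring begins after `left` */

    end = strstr(start, right);
    if (end == NULL)
        return NULL;

    len = (size_t)(end - start);
    result = malloc(len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, start, len);
    result[len] = '\0';
    return result;
}
```

Calling it on "<b>bold</b>" with "<b>" and "</b>" yields a new string "bold". The warning in the documentation above applies here too: assigning the result over a pointer that already owns a string leaks the old one unless it is freed first.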

This is a function to append another String to this String.

//! Appends the parameter String to this one. <b>Can't be used with RF_StringX</b>
//! @param thisstr The string to append to
//! @param other The string to add to this string
void rfString_Append(RF_String* thisstr,RF_String* other);
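With a representation as minimal as a single pointer, appending generally means growing the allocation first. The sketch below shows that pattern on a heap allocated C string. The function `sketch_append` is hypothetical, and the real rfString_Append may well handle memory differently.

```c
#include <stdlib.h>
#include <string.h>

/* Appends `other` to the heap allocated string *dest, growing it as needed.
   Returns 1 on success and 0 if memory could not be allocated, in which
   case *dest is left untouched. */
int sketch_append(char** dest, const char* other)
{
    size_t oldLen = strlen(*dest);
    size_t addLen = strlen(other);
    char* grown = realloc(*dest, oldLen + addLen + 1);
    if (grown == NULL)
        return 0;
    /* copy the new bytes including the terminating zero byte */
    memcpy(grown + oldLen, other, addLen + 1);
    *dest = grown;
    return 1;
}
```

Since UTF-8 strings are just byte arrays ending in a zero byte, no decoding is needed here at all.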

This function removes characters from inside the string, starting at a given position and counting backwards from it.

//! Removes n characters from the position p (including the character at p) of the string counting backwards. If there is no space to do so, nothing is done and returns false.
//! <i>Can be used with StringX</i>
//! @param thisstr The string to prune from
//! @param p The position to remove the characters from. Must be a positive integer. Indexing starts from zero.
//! @param n The number of characters to remove from the position and back. Must be a positive integer.
//! @return Returns true in case of successful removal and false in any other case.
char rfString_PruneMiddleB(RF_String* thisstr,unsigned int p,unsigned int n);
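Because positions are given in characters and not in bytes, a function like this first has to translate character positions into byte offsets. Here is a sketch of how that could look on a plain C buffer. The names `sketch_char_offset` and `sketch_prune_middle_b` are hypothetical and the input is assumed to be valid UTF-8.

```c
#include <string.h>

/* Byte offset of character number `index`, or -1 if the string has fewer
   than `index` characters. An index equal to the character count returns
   the offset of the terminating zero byte. */
static long sketch_char_offset(const char* s, unsigned int index)
{
    long i = 0;
    while (index > 0)
    {
        if (s[i] == '\0')
            return -1;
        i++;
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;                    /* step over continuation bytes */
        index--;
    }
    return i;
}

/* Removes n characters ending at (and including) character p, in place.
   Returns 1 on success and 0 if there are not enough characters. */
int sketch_prune_middle_b(char* s, unsigned int p, unsigned int n)
{
    long last, start, end;

    if (n == 0 || n - 1 > p)
        return 0;                   /* cannot go back past the start */
    last = sketch_char_offset(s, p);
    if (last < 0 || s[last] == '\0')
        return 0;                   /* p is not a character of the string */

    start = sketch_char_offset(s, p - n + 1);
    end = sketch_char_offset(s, p + 1);
    /* close the gap, moving the terminating zero byte as well */
    memmove(s + start, s + end, strlen(s + end) + 1);
    return 1;
}
```

For example, pruning 2 characters at position 3 of "abcdef" removes 'c' and 'd' and leaves "abef".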

A Few RF_StringX functions

In this section we have examples of StringX functions. These are meant for Strings that are intended for heavy text editing use.
Below we can see a function that inserts a string at a given position inside a string.

//! Inserts a string to this extended string at the parameter character position.
//! @param thisstr The extended string to insert to
//! @param pos     The character position in the string to add it. Should be a positive (or zero) integer. If the position is over the string's size nothing happens.
//! @param other   The string to add
void rfStringX_Insert(RF_StringX* thisstr,unsigned int pos,RF_String* other);
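Knowing the size of the buffer is what makes an insertion like this cheap: when there is room, the tail of the string only has to be shifted. The sketch below shows this on a fixed buffer. The function `sketch_insert` is hypothetical, and simply failing when the buffer is too small is my simplification for the example, since a real implementation would presumably grow the buffer instead.

```c
#include <stddef.h>
#include <string.h>

/* Inserts `other` before character number `pos` of the string stored in
   `buf`, which has room for `bufSize` bytes in total. Returns 1 on success
   and 0 if pos is past the end of the string or the buffer is too small.
   Assumes valid UTF-8. */
int sketch_insert(char* buf, size_t bufSize, unsigned int pos, const char* other)
{
    size_t offset = 0;
    size_t oldLen, addLen;

    /* translate the character position into a byte offset */
    while (pos > 0)
    {
        if (buf[offset] == '\0')
            return 0;               /* position is over the string's size */
        offset++;
        while (((unsigned char)buf[offset] & 0xC0) == 0x80)
            offset++;
        pos--;
    }

    oldLen = strlen(buf);
    addLen = strlen(other);
    if (oldLen + addLen + 1 > bufSize)
        return 0;                   /* not enough room in the buffer */

    /* open a gap (moving the terminating zero byte too), then fill it */
    memmove(buf + offset + addLen, buf + offset, oldLen - offset + 1);
    memcpy(buf + offset, other, addLen);
    return 1;
}
```

Inserting "," at position 5 of "Hello world" in a 32 byte buffer turns it into "Hello, world" without any allocation.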

This function replaces what exists between the left and right substrings inside the string with the replacement string, either for one specific occurrence or for all of them.

//! Replaces what exists between the ith left and right substrings of this extended String. Utilizes the internal string pointer.
//! @param thisstr The extended string to work on
//! @param left The left substring that will define the new substring
//! @param right The right substring that will define the new substring
//! @param rstr The string to act as replacement
//! @param i The specific between occurrence to replace. Should be 1 or greater. If 0 all occurrences will be replaced
//! @return Returns true if the replacing happened and false if either the left or the right strings were not found
char rfStringX_ReplaceBetween(RF_StringX* thisstr,RF_String* left,RF_String* right,RF_String* rstr,int i);

Finally, below is a function that exhibits the internal pointer of StringX: it returns a substring located between two specific sequences in the string and also moves the pointer after them.

//! Returns the first substring existing between the left and right substrings of this String and moves the internal pointer right after them
//! @note The Returned Substring needs to be freed by the user. BEWARE when assigning to a string using this function since if any previous string exists there IS NOT getting freed. You have to free it explicitly
//! @param thisstr The extended string to work on
//! @param left The left substring that will define the new substring
//! @param right The right substring that will define the new substring
//! @return Returns the substring between left and right substrings if they are found. If they are not returns a NULL String.
RF_StringX* rfStringX_BetweenMove(RF_StringX* thisstr,RF_String* left,RF_String* right);
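The value of moving the pointer shows when the same call is repeated to walk through a text. Below is a sketch of that pattern using a plain cursor into a C string in place of the StringX internal pointer. The function `sketch_between_move` is hypothetical and only illustrates the idea.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the text between the next `left` and
   the `right` that follows it, searching from *cursor, and advances
   *cursor to just past `right`. Returns NULL and leaves *cursor untouched
   if nothing is found or allocation fails. The caller frees the result. */
char* sketch_between_move(const char** cursor, const char* left, const char* right)
{
    const char* start;
    const char* end;
    size_t len;
    char* result;

    start = strstr(*cursor, left);
    if (start == NULL)
        return NULL;
    start += strlen(left);

    end = strstr(start, right);
    if (end == NULL)
        return NULL;

    len = (size_t)(end - start);
    result = malloc(len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, start, len);
    result[len] = '\0';

    *cursor = end + strlen(right);  /* the next call continues from here */
    return result;
}
```

Calling it in a loop on "<td>one</td><td>two</td>" with "<td>" and "</td>" returns "one", then "two", then NULL, which is exactly the kind of sequential parsing StringX is meant for.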

C++ String Wrapper

As mentioned above, this is a C library with all the functions written for use in C, but for C++ a wrapper is provided which presents the String as a C++ class.

class RF_String
{
    public:
        /** String Constructors/Destructor **/
 
        //! The string's main constructor
        //! @param str the string's content in UTF-8 encoding
        RF_String(const char* str);
        //! The string default constructor, for uninitialized NULL strings
        RF_String();
 
        //e.t.c.
        ...
        ...
};

As an example, below we can see some of the functions of the C++ wrapper class

        //! Adds two strings together
        //! @param s1 A constant reference to the first string to be added
        //! @param s2 A constant reference to the second string to be added
        //! @return Returns the new string which is the addition of s1 and s2
        friend RF_String operator+(RF_String const& s1,RF_String const& s2);
        //! Adds a string and an integer, converting the integer to a string in the process
        //! @param s1 A constant reference to the string to be added
        //! @param num A constant reference to the number to be added
        //! @return Returns the new string which is the addition of the string and the number
        friend RF_String operator+(RF_String const& s1, const int& num);

with their implementations being as simple as calling the equivalent C functions from the C library

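RF_String operator+(RF_String const& s1,RF_String const& s2)
{
    //work on a copy so that the operands stay untouched (assumes the wrapper has a copy constructor)
    RF_String result(s1);
    //str is the wrapped C string that the class holds
    rfString_Append(result.str,s2.str);
    return result;
}
 
RF_String operator+(RF_String const& s1, const int& num)
{
    RF_String result(s1);
    //the integer is converted to a string while being appended
    rfString_Append_i(result.str,num);
    return result;
}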

which in turn allows nice things not available in C, namely operator overloading. For example, with the matching += overload:

RF_String str("This is No.");
str+=5;
//now str contains :"This is No.5"

So basically the Refu Strings can be used by both C and C++ projects, but what is used in the core of the Makoto Engine is the C version, since the engine itself is written in C.

In conclusion, the Refu String library is under continuous development, gaining functionality whenever that is deemed necessary, and it has combined very well with the development of the Makoto Engine. It is a very useful String library, but as I see it, it has one big disadvantage which I plan to correct in the near future: it has no implementation of regular expressions, which are very useful in String manipulation. Regular expression functions will be added to it soon. Finally, as soon as the Makoto Engine gets released the String will be available to all users, since it is the String used universally by the engine, and users can utilize it for whatever purpose they like.

25 Nov 2011

Makoto Development Journal 1: Introduction

I know that I have not made a blog post in ages and the reasons are varied: too busy with life, a move to a new apartment in a different part of Tokyo and an unrelenting workload from the university. But I would like to start posting again by introducing a new category of posts, namely a development journal for the project I have been working on in all my free time for almost a year now. If one adds up the many sub-projects that were absorbed into it, that makes a good 3 years of my life. The project's name is "Makoto GUI Engine".

The Makoto Engine is basically a mix of a GUI library and a 3D engine. It will enable programmers to develop a hardware accelerated GUI for their software via an intuitive and easy-to-use API. Moreover it will introduce some new and hopefully useful 3D GUI elements (widgets) that a programmer can utilize in order to make a project more modern and easy to use. The purpose of this project is to:

  • Be a modern library with the needs of today's programmers and end-users in mind
  • Be multi-platform by employing OpenGL
  • Offer a complete, easy to use and intuitive API that will allow users to:
    • create a hardware accelerated 2D/3D GUI for their software
    • provide fully customizable elements (widgets). The programmer will have full control of how every single GUI element looks
    • add unique 3D elements (widgets) to their GUI, increasing the usability and the options provided to the end-users
    • easily transfer their projects between different platforms thanks to a consistent API among all supported platforms
    • easily transfer their projects between different compilers thanks to ABI consistency.

Below you can see a rendering by the engine of its own name using the Consolas font.

Makoto Engine Rendering 3D Glyphs of the Consolas Font

An additional requirement that I have set for the project myself is to have as few external dependencies as possible, except for OpenGL itself. The reason is that the Makoto Engine started as an amalgamation of numerous even older projects of mine into one greater project. I started it out of love for learning new things, experimenting and above all creating. This, at times, includes reinventing the wheel, but that is, I believe, a part of the learning process. Not to mention that, having done so, the engine will consist of modules that are developed to work with each other and not just tweaked into working after messing with other people's code. Some of the old projects that eventually became a part of the engine are, among others, a .3ds file parser, a .ttf font parser and triangulation algorithms. So far the engine is totally independent of any other library and I hope to keep it so. Of course I cannot foresee the future or the needs the project might have in the long run, needs that may require the use of an external library.

Let me go into a bit more detail about the engine's development. It is written exclusively in C because of the portable ABI the language provides, even between different compilers. Furthermore I, as the main author and maintainer of the engine, am really familiar with C, so it was a natural choice. Moreover, practically every language out there can interface with C, so if, and hopefully when, bindings have to be made for another language, that will be possible and easy to accomplish. C is the lowest common denominator and that is why it has been chosen.
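A common way for C libraries to keep their ABI stable is to expose only opaque handles and plain functions, so that callers never depend on the layout of a struct. The sketch below illustrates that general pattern only. Every name in it (`SketchWidget` and the `sketch_widget_*` functions) is hypothetical and is not the Makoto API.

```c
#include <stdlib.h>

/* --- what a public header would contain --- */

/* Opaque handle: users only ever see a pointer to it. */
typedef struct SketchWidget SketchWidget;

SketchWidget* sketch_widget_create(int width, int height);
int sketch_widget_width(const SketchWidget* w);
void sketch_widget_destroy(SketchWidget* w);

/* --- what stays private inside the library --- */

struct SketchWidget
{
    int width;
    int height;
    /* fields can be added here later without breaking existing callers */
};

SketchWidget* sketch_widget_create(int width, int height)
{
    SketchWidget* w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    w->width = width;
    w->height = height;
    return w;
}

int sketch_widget_width(const SketchWidget* w)
{
    return w->width;
}

void sketch_widget_destroy(SketchWidget* w)
{
    free(w);
}
```

An interface of this shape uses only pointers, integers and plain function calls, which is what makes it straightforward to call from other compilers and to wrap from other languages.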

One might wonder why the name makoto, and what it means. Well, having lived in Japan for the last 1.7 years of my life and with another 1.5 years ahead of me, I have taken a liking to some of the kanji (Chinese characters) used here. My favorite in particular is read makoto, and in Japanese it means faith, fidelity, trust, confidence and so on. I like its meaning and its shape, and since the conception and the beginning of development took place in Japan I deemed it a nice name to go with. Below you can see a rendering of that very kanji by the engine using the free and beautiful Sword Kanji TrueType font.

A 3D rendering of the Kanji (Chinese character) for makoto by the engine

At the moment many different parts of the engine are being developed simultaneously in all my free time (and sometimes even while multi-tasking with other things). The hierarchy of the 2D GUI elements of the engine has been created, along with a system for signals generated by the OS and their propagation to the elements, a font system, an engine-native 3D model file format, basic 3D elements and more.

Even though the original API is written in C, I have at the same time been developing a scripting language, Makoto-Go, that makes it a lot easier and a lot less error-prone to develop software using the engine. The language syntax is still being worked on, but its main goals are to be useful, to have a clear syntax and to offer maximum control of the engine while providing additional error checking for the user-programmer.

As far as the distribution of the engine is concerned, the Makoto Engine will be free for non-commercial use but not open-source. A commercial license will be available for a minor fee. Some friends of mine have asked me why I do not want to make it open-source. The answer is pretty simple. At this particular moment in time I do not think that being open-source would be the best thing for the Makoto Engine. Furthermore, when a person devotes literally years of his life to a project, it is natural to expect to gain something out of it in the future. Making it open-source would leave me unable to do so. Of course in the world of software things change, and ways to make a living out of open-source software are constantly popping up, so I cannot rule out anything for the far future.

Below you can observe the positioning system of the engine at work, where with just a few lines of code the elements are positioned accordingly. In this case the goal was to form a game-like menu. Since the 3D elements are not mature yet, this is only an example with 2D elements. You will notice that the elements appear to be a tad plain, but that is only temporary and will be fixed in the future. The point of the screenshot is to showcase successful positioning.

A Simplistic Mock Game Menu rendered by the engine

Of course, while working on a project such as this one you have to consider the potential end-users, and in the case of an engine like this that means my fellow programmers who want to easily create a nice GUI for their software. So who should use the Makoto Engine?

  • Someone who needs to create beautiful, fully customizable GUI easily and quickly without wasting time or getting lost writing pages of code for something really simple
  • Someone who wants to incorporate fast, reliable, hardware accelerated and unique GUI in their software
  • Someone who would like to use 3D GUI elements in their software in order to increase usability and not just for the usual eye candy effect that 3D elements have been used for in the past

Naturally the Makoto Engine is not meant for everyone. So far the major reason not to use the engine is that it does not conform to the native look and feel of the operating system it is running on. At the moment I do not plan to make it do so. The default look and feel of the engine will be consistent across all platforms and the elements are all going to be fully customizable, as mentioned above. So if you are making applications that are required to have the native look and feel of your target operating system, then you are better off using any of the other great GUI engines out there.

This concludes the first of the development journals of the Makoto Engine. Some of the information is of course subject to change since the engine is a constant work in progress. After all those months of development, even if only in my free time, I do have a lot to write about, so I am certain that there will be many more posts related to the development of the engine. Till next time!