So let’s start by defining what we mean when we talk about matrices. Matrices are rectangular representations of the coefficients of systems of linear equations with several unknowns.

A special subset of matrices are invertible matrices, which belong to systems with as many linear equations as unknowns, and these are the ones we will discuss in this post. They are a key concept in linear algebra and are used in almost every area of mathematics to represent linear transformations, such as the rotation of vectors in three-dimensional space. Knowing this, a representation of the data as a two-dimensional vector feels natural, not only because of the structure of the data but also because of how the data is accessed. As a result, a first rough implementation of a matrix could look like this

To get a simpler, and more natural, way to access the data, the implementation provides two overloads of the function call `operator()` (1,2), which make the matrix a function object (functor). The two overloads are necessary because we not only want to manipulate the underlying data of the matrix but also want to be able to pass the matrix as a constant reference to other functions. The private `AssertData()` (3) class function guarantees that we only allow square matrices. For comparison, the matrix provides a ‘naive’ implementation of matrix multiplication, realized with an overloaded `operator*()` (4), whose algorithm has a computational complexity of $O(n^3)$. Real-life implementations of matrix multiplication can use Strassen’s algorithm, with a computational complexity of roughly $O(n^{2.807})$, or even more sophisticated algorithms. For our purpose, we only need the matrix multiplication to explore the performance of the matrix implementation a little and to verify the matrix decomposition methods.

So far we still don’t know whether, in terms of computational performance, a two-dimensional vector is really the right structure to represent the data. Looking at other linear algebra libraries, such as LAPACK or Intel’s MKL, we might get the idea that the ‘naive’ approach of storing the data in a two-dimensional vector is suboptimal. Instead, it is a good idea to store the data completely in a one-dimensional structure, such as the STL containers `std::vector` and `std::array`, or a raw C array. To answer this question we will examine the possibilities and test them with quick-bench.com.

`std::vector<T>` is a container that stores its data dynamically (1,2,3) on the heap, so that the size of the container can be defined at runtime.

The excerpt of libstdc++ also illustrates the implementation of the access `operator[]` (4). Because `std::vector<T>` allocates its memory on the heap, it has to resolve each data query via a pointer to the address of the data. This indirection over a pointer isn’t really a big deal concerning performance, as illustrated in the chapter heap-vs-stack further down the article. The real problem arises with a two-dimensional `std::vector<T>` in the matrix multiplication `operator*()` (5), where the access to the right-hand side, `rhs(j,k)`, works against the row-major order of C++: consecutive accesses with a changing row index `j` land in different rows. A possible solution could be to transpose the `rhs` matrix upfront, or to store the complete data in a one-dimensional `std::vector<T>` with a slightly more advanced function call `operator()` which retrieves the data.
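As a rough illustration of the one-dimensional storage idea, here is a minimal sketch assuming a square matrix of `double` values. The class name `FlatMatrix`, its members, and the free function `multiply` are hypothetical and not taken from the original implementation. The function call `operator()` maps a (row, column) pair onto the index `row * size + col` of a single contiguous `std::vector`, and the multiplication orders its loops so that the innermost loop walks along rows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical square matrix storing all elements row by row in one vector.
class FlatMatrix {
public:
    explicit FlatMatrix(std::size_t size) : size_(size), data_(size * size, 0.0) {}

    // Mutable access: element (row, col) lives at index row * size + col.
    double& operator()(std::size_t row, std::size_t col) { return data_[row * size_ + col]; }

    // Read-only access for matrices passed by constant reference.
    const double& operator()(std::size_t row, std::size_t col) const { return data_[row * size_ + col]; }

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    std::vector<double> data_;
};

// Naive O(n^3) multiplication of two matrices of equal size. The loops are
// ordered i-k-j so the innermost loop runs along a row of rhs and of result.
FlatMatrix multiply(const FlatMatrix& lhs, const FlatMatrix& rhs) {
    const std::size_t n = lhs.size();
    FlatMatrix result(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double factor = lhs(i, k);
            for (std::size_t j = 0; j < n; ++j) {
                result(i, j) += factor * rhs(k, j);
            }
        }
    }
    return result;
}
```

Whether this layout is actually faster than the two-dimensional variant depends on compiler and matrix size, so it is worth measuring rather than assuming.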

`std::vector<T>` fulfills the requirements of a ContiguousContainer, which means that it stores its data in contiguous memory locations. Because of this, `std::vector<T>` is a cache-friendly container (Locality of Reference), which makes it fast to use. The example below illustrates the contiguous memory locations of a `std::vector<int>` with a 4-byte integer size.

A slightly extended example with a two-dimensional array points out the problem we will have when storing the matrix data in this type of container. As soon as a `std::vector<T>` has more than one dimension, it violates the Locality of Reference principle between the rows of the matrix, because each row is a separate heap allocation, and is therefore not cache-friendly. This behavior gets worse with every additional dimension added to the `std::vector<T>`.

`std::array<T, size>` is a container that stores its data directly inside the object, which for a local variable means on the stack. Because no dynamic allocation is involved, the size of `std::array<T, size>` needs to be defined at compile time. The excerpt of libstdc++ illustrates the implementation of `std::array<T, size>`, which is in principle a convenience wrapper around a classical C array (1).

The code below shows a simple test of the performance difference between memory allocated on the heap and allocated on the stack. The difference can be explained by the fact that memory management on the heap needs an additional level of indirection, a pointer which points to the address in memory of the data element, which is slowing down the heap a little bit.

The code below benchmarks the 4 different ways to store the data (two- and one-dimensional `std::vector`, `std::array`, C array) and in addition shows the performance of a C array wrapped in `std::unique_ptr<T[]>`. All benchmarks are done with Clang-8.0 and GCC-9.1.

The graphs below confirm what we already suspected: the performance of a two-dimensional `std::vector` is worse than the performance of a one-dimensional `std::vector`, `std::array`, C array, and `std::unique_ptr<T[]>`. GCC seems to be a bit better (or more aggressive) at code optimization than Clang, but I’m not sure how comparable performance tests between GCC and Clang are on quick-bench.com.

If we chose Clang as the compiler, it wouldn’t make a difference whether we took a `std::vector` or one of the others, but for GCC it does. `std::array` is not an option because we want to define the size of the matrix at runtime. For our further examination of a possible matrix implementation we could choose a C array, but for simplicity and safer memory management we will use `std::unique_ptr<T[]>`. Using a `std::vector<T>` to store the data and querying the data via `data()` would also be an option.

Let’s finally start discussing different matrix decomposition methods after all these preceding performance considerations.

As long as we can guarantee that all main diagonal elements (also called pivot elements) are unequal to zero at every iteration of the decomposition, we can use a simple LU-Decomposition.

The LU-Decomposition (also known as Gaussian elimination) method is decomposing a given matrix *A* into two resulting matrices, the lower (*L*) matrix which contains quotients and main diagonal elements of value 1, and the upper (*U*) matrix which contains the resulting elements. Through back substitution, the upper matrix can be used to get the solution of the linear equation system.

At first (1) the main diagonal elements of the *L* matrix need to be initialized with 1. The quotients (2) of the *L* matrix can then be calculated by

And the results (3) of the matrix *U* can afterward be calculated by

An example of a matrix *A* and its decomposition matrices *L* and *U* would look like the following.
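The decomposition without pivoting can be sketched as follows. This is a minimal illustration, not the original implementation: the function name `decomposeLU`, the row-major `std::vector<double>` storage and the exception on a zero pivot are assumptions. It uses the standard elimination step, with the quotient $l_{ik} = u_{ik} / u_{kk}$ and the update $u_{ij} = u_{ij} - l_{ik} u_{kj}$.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<double>;  // n x n values stored row by row

// Returns {L, U} with A = L * U. No pivoting is applied, so the function
// throws as soon as a pivot element is exactly zero.
std::pair<Matrix, Matrix> decomposeLU(const Matrix& a, std::size_t n) {
    Matrix lower(n * n, 0.0);
    Matrix upper = a;  // U starts as a copy of A
    for (std::size_t i = 0; i < n; ++i) {
        lower[i * n + i] = 1.0;  // main diagonal of L is 1
    }
    for (std::size_t k = 0; k + 1 < n; ++k) {
        if (upper[k * n + k] == 0.0) {
            throw std::domain_error("zero pivot element");
        }
        for (std::size_t i = k + 1; i < n; ++i) {
            const double quotient = upper[i * n + k] / upper[k * n + k];
            lower[i * n + k] = quotient;  // quotient goes into L
            for (std::size_t j = k; j < n; ++j) {
                upper[i * n + j] -= quotient * upper[k * n + j];  // eliminate in U
            }
        }
    }
    return {lower, upper};
}
```

For the assumed input matrix with rows (2, 1, 1), (4, 3, 3) and (8, 7, 9), this sketch yields an *L* with rows (1, 0, 0), (2, 1, 0), (4, 3, 1) and a *U* with rows (2, 1, 1), (0, 1, 1), (0, 0, 2).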

We have now a reliable algorithm to decompose an invertible matrix and solve the linear equation system. So let’s say we have one of these invertible matrices which have non zero main diagonal pivot elements. It should be no problem to solve the linear equation system with the algorithm described above, now. Let’s find out and say we have to solve the following linear equation system, with an accuracy of 5 digits, which should lead to the results and :

After solving the linear equation system we can get the results of and by back substitution

Unfortunately, the result of is quite off-target. The problem is the loss of significance caused by subtracting two values of almost the same size. To solve this problem we need the partial pivoting strategy, which swaps rows so that the element with the largest absolute value in the column becomes the pivot element. Ok, let’s try again with the example above, but this time we have exchanged both rows to have the maximum values at the main diagonal pivots:

Again after solving the linear equation system we can get the results of and by back substitution

Now the results are much better. But unfortunately, we can prove that also performing a partial pivoting, according to the largest element in the column, is not always sufficient. Let’s say we have the following linear equation system

And after solving the linear equation system and applying partial pivoting before each iteration, the solution looks like

As a result, after the back substitution, we get and , but the exact results are and . Again the solution is off-target. The difference between the exact and the numerically calculated solution can be explained by the big coefficients after the first iteration, which already lead to a loss of information due to rounding. Additionally, the values of and again lead to a loss of significance at . The reason behind this behavior is the small value of the first pivot element at the first iteration step compared to the other values of the first row. The matrix was neither strictly diagonally dominant nor weakly diagonally dominant.

A solution to this problem is the relative scaled pivoting strategy. This pivoting strategy is scaling indirectly by choosing the pivot element whose value is, relative to the sum of the values of the other elements of the row, maximum before each iteration.

As long as the *p*-row gets exchanged by the *k*-row. The exchange of rows can be represented by a permutation matrix and therefore

After initializing the *U* matrix with the *A* matrix and the permutation matrix *P* with an identity matrix, the algorithm is calculating (1) the sum of all elements of row *i* where column and afterward (2) the quotient *q*. As long as the maximum is not zero and the *pk* and *p* row of *L*, *U* and *P* matrix will be swapped.
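The pivot selection of the relative scaled strategy could look roughly like the sketch below. The function name `selectScaledPivotRow` and the row-major storage are assumptions, and so is the choice to sum the absolute values of columns *k* to *n-1* including the pivot candidate itself. Including the candidate does not change which row wins, because both variants of the quotient grow together, and it avoids a division by zero when all other elements of the row are zero.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the row index p >= k whose candidate pivot |u(p,k)| is largest
// relative to the sum of absolute values in columns k..n-1 of that row.
// Returns n when every quotient is zero, i.e. no usable pivot exists.
std::size_t selectScaledPivotRow(const std::vector<double>& u, std::size_t n, std::size_t k) {
    std::size_t pivotRow = n;
    double maxQuotient = 0.0;
    for (std::size_t i = k; i < n; ++i) {
        double rowSum = 0.0;
        for (std::size_t j = k; j < n; ++j) {
            rowSum += std::fabs(u[i * n + j]);
        }
        if (rowSum == 0.0) {
            continue;  // row holds only zeros in the remaining columns
        }
        const double quotient = std::fabs(u[i * n + k]) / rowSum;
        if (quotient > maxQuotient) {
            maxQuotient = quotient;
            pivotRow = i;
        }
    }
    return pivotRow;
}
```

For a 2x2 matrix with rows (2, 1000) and (1, 1), plain partial pivoting would keep the first row because 2 > 1, whereas this selection returns row index 1, since 1/2 is much larger than 2/1002.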

The results after back substitution are and illustrate the superiority of an LU-Decomposition algorithm with a relative scaled pivot strategy compared to an LU-Decomposition with a plain diagonal pivoting strategy. Because of the rounding errors of the LU-Decomposition, with and without relative scaled pivoting, an iterative refinement is necessary, which is not part of this post.

Many problems solved with matrices, such as the Finite Element Method, depend on the law of conservation of energy. Important properties of these matrices are that they are symmetric and positive definite. A matrix $A$ is positive definite if its corresponding quadratic form $x^T A x$ is positive for every non-zero vector $x$.

That means that the elements of a symmetric positive definite matrix are necessarily fulfilling the criteria

- there is one
*k*with

A symmetric positive definite matrix can be decomposed with the Cholesky-Decomposition algorithm, which results in a lower triangular matrix *L*. *L* multiplied by its transpose results in *A*.

The formulas for the elements of the Cholesky-Decomposition follow the same scheme as those of the LU-Decomposition, but because of the symmetry, the algorithm only needs to take care of the elements on and below the diagonal.
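A minimal sketch of the Cholesky-Decomposition, assuming row-major storage in a `std::vector<double>`; the function name `decomposeCholesky` and the exception for non-positive diagonal terms are illustrative choices, not the original code. Only the diagonal and the elements below it are read and written.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns the lower triangular factor L (row-major, n x n) with A = L * L^T.
// Only the diagonal and the elements below the diagonal of A are read.
std::vector<double> decomposeCholesky(const std::vector<double>& a, std::size_t n) {
    std::vector<double> lower(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) {
                sum -= lower[i * n + k] * lower[j * n + k];
            }
            if (i == j) {
                if (sum <= 0.0) {
                    throw std::domain_error("matrix is not positive definite");
                }
                lower[i * n + i] = std::sqrt(sum);          // diagonal element
            } else {
                lower[i * n + j] = sum / lower[j * n + j];  // element below the diagonal
            }
        }
    }
    return lower;
}
```

For the assumed symmetric input with rows (4, 12, -16), (12, 37, -43) and (-16, -43, 98), the sketch returns an *L* with rows (2, 0, 0), (6, 1, 0) and (-8, 5, 3).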

Both algorithms, LU- and Cholesky-Decomposition, have a computational complexity of $O(n^3)$. But because of the symmetry, the Cholesky-Decomposition needs only around half of the operations of an LU-Decomposition for a large number of elements. If the symmetry of the symmetric positive definite matrices were also exploited by a more advanced storage scheme, it would also be possible to reduce the space complexity significantly.

Matrices are a key concept in solving linear equation systems. Efficient implementations of matrices consider not only the computational complexity but also the space complexity of the matrix data. If the size of a matrix is already known at compile time, `std::array<T, size>` can be a very efficient choice. If the number of elements needs to be defined at runtime, either a `std::vector<T>` or a raw C array (respectively a `std::unique_ptr<T[]>`) could be the container of choice. To choose a fitting decomposition algorithm, the characteristics of the linear equation system need to be known. Luckily this is the case for many applications.

Did you like the post?

What are your thoughts?

Feel free to comment and share this post.


In the following chapters, we have a closer look at several algorithms used for root approximation of functions. We will compare all algorithms against the nonlinear function which has **only one root inside a defined range** [*a*,*b*] (this criterion is quite important, otherwise it might be hard to tell which solution we got from the algorithm). An additional criterion is defined by . Please keep in mind that we don’t limit the number of iterations we run. In real code such a limit might be necessary because we usually don’t want to run infinite loops. As usual, you can find all the sources at GitHub.

Let’s start with a method which is mostly used to search for values in arrays of every size, Bisection. But it can be also used for root approximation. The benefits of the Bisection method are its implementation simplicity and its guaranteed convergence (if there is a solution, bisection will find it). The algorithm is rather simple:

- Calculate the midpoint of a given interval [*a*,*b*]
- Calculate the function result of midpoint *m*, *f(m)*
- If the new *m-a* or *|f(m)|* fulfills the convergence criteria, we have the solution *m*
- Compare the sign of *f(m)* and replace either *f(a)* or *f(b)* so that the resulting interval includes the sought root
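The four steps above can be sketched as follows. This is a minimal illustration rather than the original source: the function name `bisection`, the default tolerance and the iteration cap `maxIterations` are assumptions.

```cpp
#include <cmath>
#include <functional>
#include <stdexcept>

// Bisection on [a, b]; requires f(a) and f(b) to have opposite signs.
double bisection(const std::function<double(double)>& f, double a, double b,
                 double tolerance = 1e-9, int maxIterations = 200) {
    double fa = f(a);
    const double fb = f(b);
    if (fa == 0.0) return a;
    if (fb == 0.0) return b;
    if (fa * fb > 0.0) {
        throw std::invalid_argument("f(a) and f(b) must differ in sign");
    }
    double m = a;
    for (int i = 0; i < maxIterations; ++i) {
        m = a + (b - a) / 2.0;    // step 1: midpoint of the interval
        const double fm = f(m);   // step 2: function value at the midpoint
        if (std::fabs(fm) < tolerance || std::fabs(b - a) / 2.0 < tolerance) {
            break;                // step 3: convergence criteria fulfilled
        }
        if (fa * fm < 0.0) {      // step 4: keep the half with the sign change
            b = m;
        } else {
            a = m;
            fa = fm;
        }
    }
    return m;
}
```

Called with the assumed test function `x * x - 2` on the range [0, 2], the sketch converges towards the square root of 2.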

As we can see from the source code, the actual implementation of Bisection is so simple that most of the code is just there to produce the output. The important part happens inside the while loop, which executes as long as the result of *f(x)* does not fulfill the criteria . The new position *x* is calculated in the method `calculateX()`, which checks the sign of *f(x)* and, depending on the result, assigns *x* to *a* or *b* to define the new range that includes the root. It is important to note that the Bisection algorithm always needs results of *f(a)* and *f(b)* with different signs, which is assured by the method `checkAndFixAlgorithmCriteria()`. The algorithm is rather slow, with 36 iterations to reach the convergence criteria of , but given enough time it will converge regardless of what happens.

As long as we also have the derivative of the function and the function is smooth, we can use the much faster Newton method, where the next point closer to the root can be calculated by $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$.

This algorithm also iterates as long as the result of *f(x)* does not fulfill the criteria . The method `calculateX()` not only applies the formula of the Newton method, it also checks whether the derivative of *f(x)* becomes zero. Because floating point numbers can’t be compared directly, we have to compare *f'(x)* against the smallest possible floating point number close to zero, which we can get by calling `std::numeric_limits<double>::min()`. In such a case, where *f'(x)* is zero, the algorithm wouldn’t be able to converge (stationary point), as shown in the image below.
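A minimal sketch of the Newton iteration with the stationary point check described above. The function name `newton`, the default tolerance and the iteration cap are assumptions and not part of the original source; the comparison against `std::numeric_limits<double>::min()` mirrors the check mentioned in the text.

```cpp
#include <cmath>
#include <functional>
#include <limits>
#include <stdexcept>

// Newton iteration x_{n+1} = x_n - f(x_n) / f'(x_n), starting at x.
double newton(const std::function<double(double)>& f,
              const std::function<double(double)>& derivative,
              double x, double tolerance = 1e-9, int maxIterations = 100) {
    for (int i = 0; i < maxIterations; ++i) {
        const double fx = f(x);
        if (std::fabs(fx) < tolerance) {
            return x;  // convergence criteria fulfilled
        }
        const double slope = derivative(x);
        if (std::fabs(slope) < std::numeric_limits<double>::min()) {
            throw std::runtime_error("stationary point: derivative is zero");
        }
        x -= fx / slope;  // Newton step
    }
    throw std::runtime_error("no convergence within maxIterations");
}
```

With the assumed test function `x * x - 2`, its derivative `2 * x` and the starting point 1.5, the iteration approaches the square root of 2; starting at 0 triggers the stationary point error.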

The advantage of the Newton method is its raw speed. In our case it just needs 6 iterations until the algorithm is reaching the convergence criteria of . But the algorithm has also several drawbacks as implied above. Such as:

- Only suitable if we know the derivative of a given function
- Only suitable for smooth and continuous functions
- If the starting point is chosen wrong, or the calculated point is at or close to a local maximum or minimum, the derivative *f'(x)* gets 0 and we have a stationary point condition.
- If the derivative of a function is not well behaved, the Newton method tends to overshoot. The next point might then be way too far from the root to be useful to the algorithm

In many cases, we don’t have a derivative of a function *f(x)*, or it might be too complex to derive. In such cases, we can use the Secant method. This method does basically the same as the Newton method, but calculates the necessary slope not via the function’s derivative *f'(x)* but through the quotient of the differences of two *x* and *y* values calculated with *f(x)*.
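A minimal sketch of the Secant method, using the standard update $x_{n+1} = x_n - f(x_n)\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$. The function name `secant`, the default tolerance and the iteration cap are assumptions; the check for a zero denominator corresponds to the case where the slope vanishes.

```cpp
#include <cmath>
#include <functional>
#include <stdexcept>

// Secant iteration started with the two points a and b.
double secant(const std::function<double(double)>& f, double a, double b,
              double tolerance = 1e-9, int maxIterations = 100) {
    double fa = f(a);
    double fb = f(b);
    for (int i = 0; i < maxIterations; ++i) {
        if (std::fabs(fb) < tolerance) {
            return b;  // convergence criteria fulfilled
        }
        const double denominator = fb - fa;
        if (denominator == 0.0) {
            throw std::runtime_error("secant slope is zero");
        }
        const double next = b - fb * (b - a) / denominator;  // secant step
        a = b;
        fa = fb;
        b = next;
        fb = f(b);
    }
    throw std::runtime_error("no convergence within maxIterations");
}
```

Compared with the Newton method, only function values are needed, at the price of having to supply two starting points.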

This algorithm needs a range [*a*,*b*] which might include (not absolutely necessary, but it helps) the root we are searching for. The method `calculateX()` not only calculates the next *x*, it also checks whether the denominator is 0. This algorithm is also very fast: it needs only 7 iterations in a well-chosen range and 10 iterations in a broader range. Both calculations fulfill the same convergence criteria of . Like the Newton method, this algorithm also has its drawbacks which we have to be aware of:

- Only suitable for smooth and continuous functions
- If one of the calculated points is at or close to a local maximum or minimum, the derivative *f'(x)* gets 0 and we have a stationary point condition.
- If the calculated slope quotient of a function has a very low steepness, the Secant method also tends to overshoot. The next point might then be way too far from the root to be useful to the algorithm
- If the range [*a*,*b*] is chosen wrong, e.g. including a local minimum and maximum, the algorithm tends to oscillate. In general, minima and maxima, local or global, are in some cases a problem for the Secant method

The idea of Dekker’s method is now to combine the speed of the Newton/Secant method with the convergence guarantee of Bisection. The algorithm is defined as the following:

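- $b_k$ is the current iteration guess of the root and $a_k$ is the current counterpoint of $b_k$ such that $f(a_k)$ and $f(b_k)$ have opposite signs.
- The algorithm has to fulfill the criteria $|f(b_k)| \le |f(a_k)|$ such that $b_k$ is the closest solution to the root
- $b_{k-1}$ is the last iteration value, starting with $b_{-1} = a_0$ at the beginning of the algorithm
- If *s* is between *m* and $b_k$ then $b_{k+1} = s$, otherwise $b_{k+1} = m$
- If $f(a_k) f(b_{k+1}) < 0$ then $a_{k+1} = a_k$, otherwise $a_{k+1} = b_k$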

As we can see in the implementation of Dekker’s method, we always calculate both values, *m* and *s*, with the methods `calculateSecant()` and `calculateBisection()`, and assign the result depending on whether the method `useSecantMethod()` confirms that *s* is between *m* and $b_k$ (as defined in rule 4). In lines 32-33 we check whether the resulting values of the function *f(x)* fulfill rule number 5. Because of the Bisection, we have to check and correct after each iteration whether the condition $f(a_k)f(b_k) < 0$ is still fulfilled, which is done with the method `checkAndFixAlgorithmCriteria()`.
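The interplay of the secant value *s* and the bisection value *m* can be sketched as below. This is a compact illustration under assumptions, not the original implementation: the function name `dekker`, the default tolerance and the iteration cap are made up for this example. Here `b` is the current guess, `a` is its counterpoint with a function value of opposite sign, and `previous` is the guess of the last iteration.

```cpp
#include <cmath>
#include <functional>
#include <stdexcept>
#include <utility>

double dekker(const std::function<double(double)>& f, double a, double b,
              double tolerance = 1e-9, int maxIterations = 200) {
    double fa = f(a);
    double fb = f(b);
    if (fa * fb > 0.0) {
        throw std::invalid_argument("f(a) and f(b) must differ in sign");
    }
    if (std::fabs(fa) < std::fabs(fb)) {  // b must be the better guess
        std::swap(a, b);
        std::swap(fa, fb);
    }
    double previous = a;    // last iteration value, starts as the counterpoint
    double fPrevious = fa;
    for (int i = 0; i < maxIterations; ++i) {
        if (std::fabs(fb) < tolerance || std::fabs(b - a) < tolerance) {
            return b;
        }
        const double m = (a + b) / 2.0;  // bisection candidate
        // Secant candidate; falls back to m when the slope is zero.
        const double s = (fb != fPrevious)
                             ? b - fb * (b - previous) / (fb - fPrevious)
                             : m;
        // Use s only if it lies strictly between m and b, otherwise use m.
        const bool useSecant = s > std::fmin(m, b) && s < std::fmax(m, b);
        const double next = useSecant ? s : m;
        const double fNext = f(next);
        previous = b;
        fPrevious = fb;
        if (fa * fNext >= 0.0) {  // keep a counterpoint with opposite sign
            a = b;
            fa = fb;
        }
        b = next;
        fb = fNext;
        if (std::fabs(fa) < std::fabs(fb)) {  // b must stay the better guess
            std::swap(a, b);
            std::swap(fa, fb);
        }
    }
    throw std::runtime_error("no convergence within maxIterations");
}
```

Because the counterpoint always keeps a sign change in the interval, the sketch never leaves the bracket, even when the secant candidate is rejected.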

The Dekker algorithm is as fast as the Newton/Secant method but also guarantees convergence. It takes 7 iterations in case of a well-chosen range and 9 iterations in case of a broader range. As we can see, the Dekker algorithm is very fast, but there are examples where $|b_k - b_{k-1}|$ is extremely small while the algorithm is still using the secant method. In such cases, the algorithm will take even more iterations than pure Bisection would take.

Because of Dekker’s slow convergence problem, the method was extended by Brent; the result is now known as Brent’s method or the Brent-Dekker method. This algorithm extends Dekker’s algorithm by using four points (*a*, *b*, and ) instead of just three points, by additionally using Inverse Quadratic Interpolation instead of just linear interpolation and Bisection, and by adding conditions to prevent slow convergence. The algorithm decides with the following conditions which of the methods to use, Bisection, Secant method or Inverse Quadratic Interpolation:

- Bisection (B = last iteration was using Bisection)
- Inverse Quadratic Interpolation
- In all other cases use the Secant method

With these modifications, the Brent algorithm is at least as fast as Bisection but in best cases slightly faster than using the pure Secant method. The Brent algorithm takes 6 iterations in case of a well-chosen range and 9 iterations in case of a broader range.

We have seen five different algorithms we can use to approximate the root of a function: Bisection, the Newton method, the Secant method, Dekker, and Brent. All of them have different strengths and drawbacks. In general, we could choose the algorithm as follows:

- Use the Newton method in case of smooth and well-behaved functions where you also have the function’s derivative
- Use the Secant method in case of smooth and well-behaved functions where you don’t have the function’s derivative
- Use the Brent method in cases where you’re not sure, or where you know your functions have jumps or other problems.

Algorithm | Start/Range | No. of Iterations |
---|---|---|
Bisection | [0, 2] | 36 |
Newton | x0 = 1.5 | 6 |
Secant Good | [1.5, 2] | 7 |
Secant Bad | [0, 2] | 10 |
Dekker Good | [1.5, 2] | 7 |
Dekker Bad | [0, 2] | 9 |
Brent Good | [1.5, 2] | 6 |
Brent Bad | [0, 2] | 9 |


- Gauss-Tschebyschow
- Gauss-Hermite
- Gauss-Laguerre
- Gauss-Lobatto
- Gauss-Kronrod

The idea of the Gauss integration algorithm is to approximate, similar to the Simpson Rule, the function *f(x)* by

While *w(x)* is a weighting function, is a polynomial function (Legendre-Polynomials) with defined nodes which can be exactly integrated. A general form for a range of *a-b* looks like the following.

The Legendre-Polynomials are defined by the general formula and its derivative

The following image is showing the 3rd until the 7th Legendre Polynomials, the 1st and 2nd polynomials are just *1* and *x* and therefore not necessary to show.

Let’s have a closer look at the source code:

The integral is calculated by the `gaussLegendreIntegral` function (line 69), which initializes the `LegendrePolynomial` class and afterward solves the integral (lines 77 – 80). Something very interesting to note: we need to calculate the Legendre-Polynomials only once and can use them for any function of order *n* in the range *a-b*. The Gauss-Legendre integration is therefore extremely fast for all subsequent integrations.

The method `calculatePolynomialValueAndDerivative` calculates the value at a certain node (line 50) and its derivative (line 51). Both results are used in the method `calculateWeightAndRoot` to calculate the node by the Newton-Raphson method (lines 33 – 37).

The weight *w(x)* will be calculated (line 40) by $w_i = \frac{2}{(1 - x_i^2)\,[P_n'(x_i)]^2}$.

As we can see in the screen capture below, the resulting approximation of

is very accurate. We end up with an error of only . Gauss-Legendre integration works very well for integrating smooth functions and results in higher accuracy with the same number of nodes compared to Newton-Cotes integration. A drawback of Gauss-Legendre integration might be the performance in case of dynamic integration, where the number of nodes changes.
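The whole procedure, computing the nodes once and reusing them for any integrand, can be sketched as follows. The names `LegendreRule`, `makeLegendreRule` and `integrate` are hypothetical and differ from the original source. The sketch assumes the standard three-term recurrence $k P_k(x) = (2k-1)\,x\,P_{k-1}(x) - (k-1)\,P_{k-2}(x)$, the derivative $P_n'(x) = \frac{n}{x^2-1}\left(x P_n(x) - P_{n-1}(x)\right)$, a cosine-based starting guess for each root, and the usual mapping of the nodes from [-1, 1] to [*a*, *b*].

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct LegendreRule {
    std::vector<double> nodes;    // roots of P_n on [-1, 1]
    std::vector<double> weights;  // matching integration weights
};

// Computes nodes and weights of the n-point Gauss-Legendre rule.
LegendreRule makeLegendreRule(std::size_t n) {
    const double pi = std::acos(-1.0);
    const double order = static_cast<double>(n);
    LegendreRule rule{std::vector<double>(n), std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        // Starting guess for the i-th root of P_n.
        double x = std::cos(pi * (static_cast<double>(i) + 0.75) / (order + 0.5));
        double derivative = 1.0;
        for (int iteration = 0; iteration < 100; ++iteration) {
            double previous = 1.0;  // P_0(x)
            double current = x;     // P_1(x)
            for (std::size_t k = 2; k <= n; ++k) {
                const double kd = static_cast<double>(k);
                const double next = ((2.0 * kd - 1.0) * x * current - (kd - 1.0) * previous) / kd;
                previous = current;
                current = next;
            }
            derivative = order * (x * current - previous) / (x * x - 1.0);  // P_n'(x)
            const double step = current / derivative;  // Newton-Raphson step
            x -= step;
            if (std::fabs(step) < 1e-15) {
                break;
            }
        }
        rule.nodes[i] = x;
        rule.weights[i] = 2.0 / ((1.0 - x * x) * derivative * derivative);
    }
    return rule;
}

// Integrates f over [a, b] by mapping the nodes from [-1, 1] to [a, b].
double integrate(const LegendreRule& rule, const std::function<double(double)>& f,
                 double a, double b) {
    const double halfWidth = (b - a) / 2.0;
    const double midpoint = (a + b) / 2.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < rule.nodes.size(); ++i) {
        sum += rule.weights[i] * f(halfWidth * rule.nodes[i] + midpoint);
    }
    return halfWidth * sum;
}
```

A rule created once with `makeLegendreRule(5)` integrates polynomials up to degree 9 exactly (up to rounding) and can be passed to `integrate` for any number of integrands and ranges.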


The Newton-Cotes formulas are quadrature rules for the numerical approximation of integrals. The idea is to interpolate the function which shall be integrated by a polynomial with equidistant nodes. The polynomial can, for example, take the form of aligned trapezoids or aligned parabolas. Common Newton-Cotes integration rules are

- Trapezoid rule
- Simpson rule
- Romberg
- Pulcherrima
- Milne/Boole rule
- 6-Node rule
- Weddle rule

In this post, we will have a closer look at the first two integration rules, Trapezoidal and Simpson, and at the Romberg integration built on top of them.

The integral approximation by trapezoids is very simple to explain. All we need to do is to subdivide the example function

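we want to integrate into equidistant areas whose exact integrations we can sum up. Clearly, the accuracy of the approximation depends on the number of subdivisions *N*. The width of each area is therefore defined by $h = \frac{b-a}{N}$ with $x_i = a + i\,h$ and $i = 0, \dots, N$.

The area of each trapezoid can be calculated by $A_i = \frac{h}{2}\left(f(x_i) + f(x_{i+1})\right)$, and therefore the approximated integral of our function *f(x)* can be defined as $\int_a^b f(x)\,dx \approx \frac{h}{2}\sum_{i=0}^{N-1}\left(f(x_i) + f(x_{i+1})\right)$.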

The implementation of the Trapezoidal integration is taking 4 parameters, the range [*a*,*b*] of integration of the function *f(x)*, the number of subdivisions *n*, and the function *f(x)*.

The Simpson rule approximates the integral of the function *f(x)* by the exact integration of a parabola *p(x)* with nodes at *a*, *b*, and $m = \frac{a+b}{2}$. In order to increase the approximation accuracy, the range can be subdivided into *N* parts, similar to the Trapezoidal integral approximation.

The exact integration can be done by summing up the area of 6 rectangles with equidistant width. The height of the first rectangle is defined by $f(a)$, the height of the next 4 rectangles is defined by $f(m)$, and the height of the last rectangle is defined by $f(b)$. As a result, the formula of the approximated integral according to the Simpson rule can be defined as $\int_a^b f(x)\,dx \approx \frac{b-a}{6}\left(f(a) + 4 f(m) + f(b)\right)$.

The implementation of the Simpson integration is, similar to the Trapezoidal based solution, taking 4 parameters, the range [*a*,*b*] of integration of the function *f(x)*, the number of subdivisions *n*, and the function *f(x)*

If we integrate the function *f(x)* with the Trapezoidal approach but bisecting the step size based on the step size of the previous step, we get the approximation sequence

The idea of the Romberg integration is now to introduce a y-axis symmetric parable which is crossing the points and extrapolate to .

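Therefore every term of the first column *R(n,0)* of the Romberg integration is equivalent to the Trapezoidal integration, whereas every solution of the second column *R(n,1)* is equivalent to the Simpson rule, and every solution of the third column *R(n,2)* is equivalent to Boole’s rule. As a result, the formulas for the Romberg integration are $R(n,0) = \frac{1}{2}R(n-1,0) + h_n \sum_{k=1}^{2^{n-1}} f\left(a + (2k-1)\,h_n\right)$ and $R(n,m) = R(n,m-1) + \frac{R(n,m-1) - R(n-1,m-1)}{4^m - 1}$ with $h_n = \frac{b-a}{2^n}$.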

The repository numericalIntegration can be found at GitHub. It will also contain the other numerical integration methods, as the name suggests, later on.


Let’s start with a multi-project example we already know from my post Introduction into an Automated C++ Build Setup with Jenkins and CMake. I have just slightly changed the example hello world application. This time the main function prints “Hello World!” onto the console using a shared library called greeter. The Greeter class itself uses the external library {fmt} to print “Hello World” onto the screen. If you’ve wondered about the Gradle directory and files, those are provided by the Gradle wrapper, which enables us to build the project without even installing Gradle upfront.

The Gradle native build plugin is quite straightforward to configure. Every Gradle project needs a build.gradle file at its root directory as an entry point and one in each subproject. In most cases, we will do general configuration in the build.gradle file located at the root directory, but there is no need to; it can also be empty. By default, Gradle looks for sources in the directory `src/main/cpp`. For libraries, public headers are expected in `src/main/public`, and in case they should only be used inside the library (private), the default directory is `src/main/headers`. In case we also place headers in `src/main/cpp`, they are treated as private as well. If we would like to override the default source directories, we just need to define them according to this example. To be able to resolve dependencies between subprojects, we need to define a settings.gradle file which includes our subprojects: `include 'app', 'greeter', 'testLib'`.

    components.withType(ProductionCppComponent) {
        // By convention, source files are located in the root directory/Sources/
        source.from rootProject.file("Sources/${subproject.name.capitalize()}")
        privateHeaders.from rootProject.file("Sources/${subproject.name.capitalize()}/include")
    }
    components.withType(CppLibrary) {
        // By convention, public header files are located in the root directory/Sources//include
        publicHeaders.from rootProject.file("Sources/${subproject.name.capitalize()}/include")
    }

As build definition of our root directory we just simply define IDE support for each subproject. CLion, for example, has native Gradle support, so importing Gradle projects works smooth as silk. Therefore our root build.gradle file looks like the following.

    allprojects {
        apply plugin: 'xcode'
        apply plugin: 'visual-studio'
    }

The application build configuration is defined in the app directory. It starts by applying the `cpp-application` plugin, which generates an executable file that can be found and executed at `app/build/install/main/{buildType}/{machine}`. Project-internal dependencies can be defined in the `dependencies` clause, with the implementation of the dependency defined as a project and the given name of the dependency. By default, Gradle assumes the current host as the target machine. If we want to consider other target machines, we have to declare them, as we do in our example with the `targetMachines` statement.

The library build configuration is defined in the greeter directory. It starts by applying the `cpp-library` plugin and defining the type of linkage, which can be STATIC or SHARED. Gradle assumes we want SHARED libraries by default. A bit special is the way we have to resolve the dependency on the header-only library {fmt}. Unfortunately, Gradle does not support header-only libraries out of the box, but we can work around this by adding the include path to the `includePathConfiguration` of the resulting binary. All other dependencies can be defined as `api`, in case we want to share the external dependency’s API with all consumers of our own library, or as `implementation`, in case we only want to use the dependency’s API privately within our own library. A good example can be found in Gradle’s example repository.

With Gradle, we can not only build applications and libraries, but we can also execute tests to check the resulting artifacts. A test can be defined with the `cpp-unit-test` plugin, which generates a test executable. In principle, we could use any of the existing big test libraries, such as googletest, but in my opinion, the out-of-the-box solution is pretty neat and lightweight and can be extended quite easily with external libraries.

With this project setup, we can build all artifacts with the command `./gradlew assemble` and run the tests with `./gradlew check`. If we want to build everything and run all tests together, we can invoke `./gradlew build`. In case we need a list of all available tasks provided by Gradle and its plugins, we can simply list them, including their descriptions, with the command `./gradlew tasks`. At GitHub you can find the resulting repository.

UPDATE: Thanks to D. Lacasse I updated the source set configuration section to customize the folder structure of a project.


The IEEE 754 standard defines the binary representation of floating-point numbers, which is what we focus on now, as well as their operations. A floating-point number *x* is defined as $x = (-1)^s \cdot m \cdot b^e$

where *s* is the sign of the number represented by one bit (1 for negative and 0 for positive numbers), *m* is the mantissa (also called precision because its size defines the precision of the floating-point model) represented by *p* bits, *b* is the base of the numeral system (which is 2 for our binary numeral system on today’s computers), and *e* is the exponent represented by *r* bits.

The mantissa *m* is defined with a precision *k*, the number (which is in a binary system 0 or 1), and as

To also be able to store negative exponents *e* as a positive number, we need to add a bias, called *B*. Therefore our exponent *e* can be stored as the positive number *E* with $E = e + B$, and *B* can be calculated by $B = 2^{r-1} - 1$. Typical values for single and double precision are:

Type | Size (bits) | Mantissa (bits) | Exponent (bits) | Value of Exponent | Bias |
---|---|---|---|---|---|
Single | 32 | 23 | 8 | 127 | 127 |
Double | 64 | 52 | 11 | 1023 | 1023 |
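To make the three fields and the bias a bit more tangible, here is a minimal sketch that splits a single-precision value into its stored parts. The names `FloatFields`, `decompose`, and `unbiasedExponent` are made up for this illustration, and the code assumes a platform where `float` is a 32-bit IEEE 754 type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stored fields of a single-precision number (illustrative names, not from the post).
struct FloatFields {
    std::uint32_t sign;      // s: 1 bit
    std::uint32_t exponent;  // E: stored (biased) exponent, 8 bits
    std::uint32_t mantissa;  // stored fraction bits, 23 bits (implicit bit not included)
};

// Splits a float into sign, biased exponent, and stored mantissa bits.
FloatFields decompose(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bit pattern
    return {bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Recovers e = E - B with B = 127; only meaningful for normalized numbers.
int unbiasedExponent(const FloatFields& fields) {
    assert(fields.exponent != 0 && fields.exponent != 0xFF);
    return static_cast<int>(fields.exponent) - 127;
}
```

Calling `decompose(3.14f)` (a value picked only for this sketch) gives a sign of 0 and a stored exponent of 128, so `unbiasedExponent` returns 1, which matches $E = e + B$ with $B = 127$.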

Let’s try an example now. How would be the representation of look like?

As a result ‘s binary representation is 11,00100011110101110001 and after normalization with and bit

we get the mantissa and an exponent with bias of . Now looks like the following as represented in the floating point model:

But wait, I remember values like 0.3 are a problem in binary representation. Is that true? Let’s try.

Ok, that really seems to be a problem. Fractions like 0.3 can’t be represented exactly in a binary system because we would need an infinite number of bits, and therefore rounding has to be applied when converting between the decimal and the binary representation. The machine epsilon gives an upper bound on the relative rounding error.
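The effect is easy to reproduce in code. The following is a small sketch, not taken from the original post: the helper names are hypothetical, and the machine epsilon is taken here in the C++ sense, meaning the gap between 1 and the next representable value. With round-to-nearest, the worst-case relative rounding error is half of that gap.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Returns true if adding 0.1 three times yields exactly the double closest to 0.3.
bool sumOfTenthsIsExact() {
    double sum = 0.0;
    for (int i = 0; i < 3; ++i) {
        sum += 0.1;
    }
    return sum == 0.3;
}

// Determines the gap between 1 and the next larger representable value by halving.
template <typename T>
T machineEpsilon() {
    T eps = T(1);
    while (true) {
        volatile T next = T(1) + eps / T(2);  // volatile forces rounding to type T
        if (next == T(1)) {
            break;
        }
        eps /= T(2);
    }
    return eps;
}

// Relative comparison; the default tolerance of 4 epsilon is an arbitrary choice.
bool nearlyEqual(double a, double b,
                 double relTol = 4 * std::numeric_limits<double>::epsilon()) {
    assert(relTol >= 0.0);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= relTol * scale;
}
```

On an IEEE 754 platform `sumOfTenthsIsExact()` returns false, while `nearlyEqual(0.1 + 0.1 + 0.1, 0.3)` returns true, and `machineEpsilon<double>()` agrees with `std::numeric_limits<double>::epsilon()`.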

What we have talked about until now is valid for normalized numbers, but what exactly are normalized numbers? And are there denormalized numbers? In short, yes, there are. But let’s start with normalized numbers. Normalized numbers are numbers which are at least as big as the smallest value that can be stored with full precision and therefore have a leading implicit bit. You remember the 1,1001…? Because every normalized number has this implicit bit, we can save one bit of storage. Denormalized numbers, on the other hand, are smaller than that and therefore can’t be represented with the implicit leading 1 but only with a leading 0 and the fixed exponent *de*, the smallest possible normal exponent. So let’s have a look at how all this looks in a number series and think about what this tells us.

The number series shows all normalized numbers in red and all denormalized numbers in blue. As we can see, the first red line represents the smallest (or closest) possible positive value to zero. For single precision the smallest absolute value is . For double precision the smallest absolute value is . It is also very important to point out that the numbers are not equidistant in a binary system: the distance between neighboring numbers doubles with each power of two. This is one reason why comparing floating-point numbers can be a hard job.
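Both observations, the growing spacing and the existence of denormalized values, can be checked with the standard library. This is only a sketch with hypothetical helper names; it assumes IEEE 754 floats and that the compiler does not flush denormalized values to zero (as some fast-math modes do).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to the next larger representable float.
float spacingAbove(float x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// True if x is a denormalized (subnormal) value.
bool isDenormal(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```

`spacingAbove(2.0f)` is exactly twice `spacingAbove(1.0f)`, and `std::numeric_limits<float>::min()` (the smallest positive normalized value) is not denormal, whereas half of it and `std::numeric_limits<float>::denorm_min()` are.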

With this I would like to close this post. We’ve had an exciting little journey into the number representation of binary systems and experienced their benefits, but also their drawbacks. And for sure it’s not an easy topic; there is quite a lot more to say about it.

Did you like the post?

What are your thoughts?

Feel free to comment and share this post.

- Building a ready-to-deploy release on every commit
- Execution of all tests
- Running static code analysis to track code quality
- And easy to extend with automated deployment (CD)

For all the necessary sources, I prepared a GitHub repository. We are focusing here only on the technical part of an automated build process, which is a prerequisite of a CI/CD (Continuous Integration/Continuous Deployment) process. A company needs quite a bit more than just tools to fully embrace the ideas behind a CI/CD process, but with an automated build and test setup you’re well on your way.

The build of a Qt and C++ based example desktop application will be orchestrated by a Jenkins declarative pipeline. The example application is based on a simple CMake project generated with CLion. As the static code analysis tool, we are using cppcheck. Testing is done with my favorite testing framework, Catch2.

In this post, I presume you’re already familiar with a basic Jenkins setup. If you’re not familiar with Jenkins, jenkins.io is a good source of information.

A Jenkins build is described by a declarative pipeline defined in a file called Jenkinsfile. This file has to be located in the project’s root folder. The declarative pipeline of Jenkins is based upon Groovy as a DSL and provides a very expressive way to define a build process. Even though the DSL is very powerful (because it is based on Groovy, you can actually write little scripts), its documentation is, unfortunately, a bit mixed up with that of its predecessor, the scripted pipeline. For our example setup, it looks the following way.

The syntax is pretty straightforward. A Jenkinsfile always starts with the declaration of a pipeline block, followed by the declaration of an agent. An agent describes the environment in which our build should be executed. In our case, we want it to run on any environment, but it could also be a labeled or a Docker environment.

With the options directive, we define that we want to keep the last 10 build artifacts and code analysis results. With options we could also define a general build timeout, the number of retries we allow in case of a build failure, or the timestamp for console output while running the build.

The parameters directive gives us the possibility to define arbitrary build parameters of the types **string**, **text**, **booleanParam**, **choice**, **file**, and **password**. In our case, we use **booleanParam** to let the user choose which additional stages should be executed in case of a manual execution of the project.

But even if those configuration possibilities of Jenkins are vast and very interesting, the really important part of the build declaration file is defined by the stages section with its stage directives. With an arbitrary number of possible stages, we have all the freedom to define our build process as we need to. Even parallel stages, for example, concurrent execution of testing and static code analysis, are possible.

With the first stage, “Build”, we are instructing Jenkins to invoke its CMake plugin to generate our build setup and resolve all necessary dependencies via a Vcpkg CMake file. Afterward, the build is executed with `cmake --build .` through the `withCmake: true` setting. User-defined toolchains are no problem either, so we could also have defined several build setups with GCC, Clang, and the Visual Studio compiler.

The other stages, Test, Analyse, and Deploy, are again pretty straightforward. All of them have one notable thing in common, which is the when directive. With this directive, a stage gets executed only if the condition inside the curly braces returns true. In our case, we use the when directive to evaluate the build parameters we introduced at the beginning of this post. The syntax might be a bit irritating at first glance, but after all, it does its job.

To get Jenkins to execute our nice pipeline, we just need to tell it from which repository to pull. This can be done via the project configuration. Here you just need to choose the option ‘Pipeline script from SCM’. If everything is set up correctly, you end up with a smoothly running automated build process. Even better, if you have configured Jenkins correctly and it has a static connection to the internet, you can connect Jenkins to GitHub via a webhook. This means that GitHub will trigger a build every time someone pushes a commit to the repository.

To conclude this post, I would like to point out that this isn’t **THE BEST WAY** to configure automated builds. It is one of millions of possible ways, and it’s the way which worked out pretty well for me in my daily work. I introduced an adapted version of this several years ago in our company, and Jenkins has served us well ever since. One important property of the Jenkins declarative pipeline is that it’s versioned in the SCM just like the rest of the code, and that’s a very important feature.

Did you like the post?

What are your thoughts?

Feel free to comment and share the post.


I’ve been working in this company for quite a while, basically since I finished my studies in mechanical engineering. But had I been prepared for the professional development of desktop applications? Well… not really. Back at university, we had C++ seminars for about a year, and the fact that we were taught C++ was quite new in 2008. If you had studied mechanical engineering before that, you would probably have been taught how to code in Fortran 77, or, if you were lucky, in Fortran 95. And basically, the level of programming in C++ was the same as it was in the good old Fortran days. No classes, no encapsulation, no C++ idioms, just pure procedural programming. They just exchanged the syntax and the compiler, that’s it. Clearly, the intention was not to fully educate us in the domain of programming, but I think it’s also a bad idea, if not a dangerous one, to give students such a primitive bunch of half-knowledge. And that’s in principle what you get as a programming background in classical engineering areas (electrical engineers might be on a higher level). You end up with a team in which C++ knowledge is very heterogeneous, from pure procedural programming with barely any knowledge of the STL to very experienced, self-educated people who are well versed in advanced metaprogramming and memory management topics. So somehow there has to be a way to raise the level of programming knowledge for everyone: not only for the most proactive developers, but also for the ones who, for so many reasons, are simply not able to raise their programming skills on their own.

One day I stumbled upon Jonathan Boccara’s blog Fluent {C++}. And there was one small page on the side, called Daily C++. Jonathan states that you learn best while you’re teaching, and from my own experience of teaching in seminars I totally agree. So I thought it would be a good idea to adapt his concept of Daily C++ to our needs and introduce it into the company. Many of his ideas work nicely for us, for example the short 10 to 20 minute presentations and the fine-grained topics which fit into that time frame. We just felt that having daily presentations, even such short ones, would be excessive for us. Once a week would be fine because, with only 17 developers, everyone would have at least a week to prepare.

To start, we compiled a list of topics we wanted to introduce or deepen our knowledge of. Not only C++, but also topics about patterns, established architectures, frameworks, and tools, but well… mostly C++. We ended up with a list of around 20 presentation topics, such as RAII, the rule of zero/three/five, operator overloading, memory management, the observer and factory patterns, and much more. We encourage the developers to pick a topic and prepare to present it in any way they like. That can be on a whiteboard, in PowerPoint, purely in code, or, most often, a mixture of several forms. The first requirement is that the topic is documented and self-explanatory. That way developers who haven’t been able to attend can read it whenever they like. The second and last requirement is that everyone should try to pick the topic they are most unfamiliar with. Picking the topic they know least about gives the developers the opportunity to learn the most from the weeklies because it’s not just a simple repetition for them.

Contrary to Jonathan’s approach, we always gather in a dedicated and equipped meeting room. We experienced an increase in attention and broader discussions afterward. Additionally, I think a dedicated room not only helps to focus but is also much quieter. Noise can really be a big problem in large open-plan offices.

Let’s summarize the benefits of the Weekly Knowledge Candy:

- Small, easily digestible topics
- Developers train their ability to give presentations and speeches
- 10 to 20 minutes of focus help to understand a topic
- Every developer focuses on their weakest topics. This way the whole team gains the most benefit.

Did you like the post?

What are your thoughts?

Feel free to comment and share the post.


To use it, all you need to do is add a single header and source file to your build. Unfortunately, that’s, in my opinion, also a drawback. Yes, it’s easier to install and to use in training, but it can’t be exchanged by whoever operates the system after deployment. In such cases, where the operator wants to define how and what to use for logging, a generic logging interface (facade pattern), such as slf4cxx, which is similar to its Java counterpart slf4j, would be preferable.

Loguru supports a variety of features, such as callbacks for logging and fatal errors, verbosity levels, assertions and aborts, and stack traces in case of aborts. It even supports {fmt}. Everyone who has ever been used to Java/Spring log outputs will recognize the similarities.

As you can see, it’s pretty straightforward to use. We are not only logging several messages on the INFO verbosity level, but also a message in a named thread called “complex lambda”. If we hadn’t defined the thread name with `loguru::set_thread_name("complex lambda")`, loguru would identify the thread by a hex ID. The main thread gets its name by calling `loguru::init(…)`. Because our small tool is crashing, loguru prints a stack trace, which in my opinion is not as helpful as expected, but with `ERROR_CONTEXT` we get a slightly better output.

That’s it for now with this post. We now have a short introduction to a so far rather unknown but promising logging library. Loguru is not only capable of producing thread-safe and human-readable logging messages but also provides a very simple and handy interface.

Did you like the post?

What are your thoughts?

Feel free to comment and share the post.


- My God, It’s Full Of Stars: The N-Body-Problem Series
- My God, It’s Full Of Stars: And then there was CMake
- My God, It’s Full Of Stars: Testing Point Mass Attraction with Catch2

In the last post we implemented two simple tests, but while implementing the solver I realized we clearly need a few more tests. Not only to test the solver: we will also need some sort of Vector2D (later we might need a 3D variant) because we have to store and manipulate the position, velocity, and acceleration of a point mass. All of these properties can be expressed as 2-dimensional vectors in a 2D space. To be useful, Vector2D has to provide a couple of typical operations, such as addition, subtraction, and multiplication and division with scalars. Additionally, we need to perform some negative tests for comparing a Vector2D. We would also need to test a division-by-zero case, which is asserted by operator/(), but as far as I found out, Catch2 can’t handle such cases at version 2.6.1, which we are using. This is what the test of Vector2D looks like.

With the Vector2D class, we fulfill our tests and make them pass. The operator overloads provide arithmetic operations between mathematical 2-dimensional vectors and scalars. It is important to note that the equality operator==() compares the absolute difference of both vectors. If they are equal, the difference is close to zero, which is checked by comparing it against std::numeric_limits<double>::min(). This utility template, part of the limits header, represents for floating-point types the smallest positive normalized value. An alternative would be a relative comparison of both values, but for the moment this would cause too many follow-up issues, such as division by zero. Bruce Dawson suggests a comparison against an absolute epsilon-based value for comparisons against zero, which is what we are doing with std::numeric_limits<double>::min().
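To illustrate the idea, here is a minimal sketch of such a vector type. It is not the class from the repository: the member names and the exact set of operators are assumptions, and only the comparison against `std::numeric_limits<double>::min()` and the asserted division follow the description above.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Minimal 2D vector; member names are chosen for this sketch only.
struct Vector2D {
    double x;
    double y;
};

inline Vector2D operator+(const Vector2D& a, const Vector2D& b) { return {a.x + b.x, a.y + b.y}; }
inline Vector2D operator-(const Vector2D& a, const Vector2D& b) { return {a.x - b.x, a.y - b.y}; }
inline Vector2D operator*(const Vector2D& v, double s) { return {v.x * s, v.y * s}; }

inline Vector2D operator/(const Vector2D& v, double s) {
    assert(s != 0.0);  // division by zero is treated as a programming error
    return {v.x / s, v.y / s};
}

// Equal if the absolute difference of each component does not exceed
// the smallest positive normalized double.
inline bool operator==(const Vector2D& a, const Vector2D& b) {
    const double eps = std::numeric_limits<double>::min();
    return std::fabs(a.x - b.x) <= eps && std::fabs(a.y - b.y) <= eps;
}

inline bool operator!=(const Vector2D& a, const Vector2D& b) { return !(a == b); }
```

Because the threshold is so tiny, this comparison behaves almost like an exact one: it tolerates differences in the denormalized range but not the usual rounding noise of longer computations, which is worth keeping in mind when writing the tests.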

We also need to extend the solver tests, which now run a couple of performance tests as well. We are doing this because we want to roughly determine how efficient our algorithm is. Additionally, this gives us the ability to compare different algorithms against each other later on. According to the paper of Christoph Schäfer, we expect a computational complexity of around $O(N^2)$. Even though the PERFORMANCE clause is in a beta state, it works quite well.

Because we will need to generate a random number of point masses, we also introduced a simple ParticleBuilder utility class, based on an adapted and simplified builder pattern with a fluent interface, which can generate single particles or any arbitrary number of particles needed. In our case, we use this to generate our benchmark particles. The basic principle of the builder pattern is that each method which configures a specific property of a point mass returns a reference to the builder itself. This way we can chain all (or, if necessary, only some) methods together and build the point mass at the end. The actual generation of the point mass is done at the very end of the definition process, which is called lazy initialization. To generate the random values of the point mass properties we take, as suggested by Vorbrodt’s article about random number generators, std::random_device as a random seed for std::mt19937 (a std::mersenne_twister_engine), which produces the actual random numbers we need. The random numbers are uniformly distributed between 1 and the number of point masses to generate.
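A stripped-down sketch of such a fluent builder could look like the following. The type `PointMass`, the method names, and the choice to randomize mass and position are assumptions made for this illustration; only the chaining via `return *this` and the seeding of `std::mt19937` with `std::random_device` follow the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Simplified point mass used only in this sketch.
struct PointMass {
    double mass;
    double x;
    double y;
};

class ParticleBuilder {
public:
    // Each configuring method returns the builder itself so calls can be chained.
    ParticleBuilder& mass(double value) { mass_ = value; return *this; }
    ParticleBuilder& position(double x, double y) { x_ = x; y_ = y; return *this; }

    // The point mass is only created at the very end of the chain.
    PointMass build() const { return {mass_, x_, y_}; }

    // Generates `count` point masses with values uniformly distributed in [1, count).
    std::vector<PointMass> build(std::size_t count) const {
        assert(count > 0);
        std::mt19937 engine{std::random_device{}()};  // random_device provides the seed
        std::uniform_real_distribution<double> dist(1.0, static_cast<double>(count));
        std::vector<PointMass> result;
        result.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            const double m = dist(engine);
            const double px = dist(engine);
            const double py = dist(engine);
            result.push_back({m, px, py});
        }
        return result;
    }

private:
    double mass_ = 1.0;
    double x_ = 0.0;
    double y_ = 0.0;
};
```

A single particle is then built with something like `ParticleBuilder().mass(2.0).position(1.0, -1.0).build()`, while `ParticleBuilder().build(100)` returns 100 randomized point masses for a benchmark.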

The Particle class is then basically straightforward. The only important thing we have to define is how we determine whether two particles are actually the same. This can’t be done by simply comparing their references, because we never alter the particles themselves but produce new ones after each operation. All particles are handled as immutable. Because of this, we need to assign a unique ID to every point mass, which we do by simply assigning a class-level IDCounter to the id member and incrementing IDCounter afterward.
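The ID mechanism can be sketched in a few lines. This is an assumption-laden illustration rather than the actual class: the `withMass` method is hypothetical and stands for any operation that produces a new particle from an existing one, and the plain static counter is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>

class Particle {
public:
    // A freshly created particle takes the current counter value as its ID.
    explicit Particle(double mass) : id_(IDCounter++), mass_(mass) {}

    // Hypothetical operation: returns a new particle that keeps the identity of this one.
    Particle withMass(double mass) const {
        Particle copy(*this);  // the copy constructor preserves the ID
        copy.mass_ = mass;
        return copy;
    }

    std::size_t id() const { return id_; }
    double mass() const { return mass_; }

    // Two particles are the same point mass if their IDs match.
    friend bool operator==(const Particle& a, const Particle& b) { return a.id_ == b.id_; }

private:
    static std::size_t IDCounter;  // shared by all particles
    std::size_t id_;
    double mass_;
};

std::size_t Particle::IDCounter = 0;
```

Two separately constructed particles compare unequal even with identical properties, while the result of `withMass` still compares equal to the particle it was derived from.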

Last but not least, we have the actual implementation of the Euler method. The Solver class gets initialized with the necessary time step and has only one simple interface method, which takes a std::vector<Particle> as parameter and returns the result after the computation is done. In order to execute its computation, the solve method uses the methods calculateAcceleration, calculateVelocity, and calculatePosition, which basically all have the same interface. The essence of the computation, the calculation of the acceleration of a point mass by the gravitational force of all the other point masses, is then done inside the static function AccumulateAcceleration. You might notice that we strongly focused on using functionality provided by the STL; there is not even a real for loop. This is done that way because I had the idea of using the execution policies of the STL later on, but then realized that they are not yet supported by libc++ (P0024R2).
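The structure described here can be condensed into a small sketch. It is not the Solver from the repository: the types `Vec2` and `Body`, the function names, the gravitational constant of 1, and the use of the explicit Euler variant (position updated with the old velocity) are all assumptions for this illustration. What it does share with the description is the combination of `std::transform` over all point masses with a `std::for_each` that accumulates the acceleration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { double x; double y; };
struct Body { double mass; Vec2 position; Vec2 velocity; };

// Gravitational constant in arbitrary units (assumption for this sketch).
const double G = 1.0;

// Sums the acceleration that all other bodies exert on `target`.
Vec2 accumulateAcceleration(const Body& target, const std::vector<Body>& all) {
    Vec2 acc{0.0, 0.0};
    std::for_each(all.begin(), all.end(), [&](const Body& other) {
        const double dx = other.position.x - target.position.x;
        const double dy = other.position.y - target.position.y;
        const double r2 = dx * dx + dy * dy;
        if (r2 == 0.0) {
            return;  // skips the body itself (and coincident bodies)
        }
        const double invR3 = 1.0 / (r2 * std::sqrt(r2));
        acc.x += G * other.mass * dx * invR3;
        acc.y += G * other.mass * dy * invR3;
    });
    return acc;
}

// One explicit Euler step: returns new bodies and leaves the input untouched.
std::vector<Body> eulerStep(const std::vector<Body>& bodies, double dt) {
    assert(dt > 0.0);
    std::vector<Body> next(bodies.size());
    std::transform(bodies.begin(), bodies.end(), next.begin(), [&](const Body& b) {
        const Vec2 a = accumulateAcceleration(b, bodies);
        const Vec2 p{b.position.x + b.velocity.x * dt, b.position.y + b.velocity.y * dt};
        const Vec2 v{b.velocity.x + a.x * dt, b.velocity.y + a.y * dt};
        return Body{b.mass, p, v};
    });
    return next;
}
```

For two resting unit masses one length unit apart and a time step of 1, the first step leaves the positions unchanged and gives the bodies velocities of 1 and -1 toward each other. The `std::for_each` inside the `std::transform` is the nested loop over all point masses.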

We accomplished quite a bit up to this point. We now have a simple but working algorithm which computes a solution for a discrete time step of the n-body problem. But how does our first draft perform? First of all, all tests are passing, great. And we are even able to calculate with a reasonable number of point masses. But we have to remind ourselves that for only one time step with 25.6K point masses we need around 45 seconds. That’s quite a lot. If we just look at the numbers, we might recall the statement that the computational complexity of the Euler method is $O(N^2)$; the diagram below shows it even more drastically. And it’s totally true. If we exchanged the std::transform of the calculateAcceleration method and the std::for_each algorithm of the AccumulateAcceleration function with simple for loops, we would easily see that we have a nested for loop over N point masses. The rest is then just math, $N \cdot N = N^2$.

Up to this point, we focused only on readability as much as possible, and I think we could do even more. But in the next post I would like to focus on two specific issues, the efficiency of the algorithm and parallelization. If you would like to download the project at this state, feel free to get v0.4.0.

Did you like this post?

What are your thoughts?

Feel free to comment and share the post.
