
Tag Archives: Technology

Computers, math, numbers… calculations have changed, has your IT shop done the same?

Here is a geek question to ask your friends and family, “Are computers good at math?” I had a little get together with a wide assortment of people and I went to each one of them and asked that very question. The results were overwhelming: 90% of the people responded yes, while one person responded: “not at all!” With all that information, you should be able to tell that I had 10 people in the group with 9 saying yes and 1 no, but I digress… nobody likes word math problems, especially computers.

The funny part is, most math is hard for a computer. Simple integer math is not bad, but fractions, really big numbers, or really small numbers are complex and time-consuming. Let’s start with a fairly easy refresher just to get everybody on the same page. Take, for instance, 1 + 2. That is simple enough: convert the easy-to-read decimal numbers to binary, and then calculate:

  00000001
+ 00000010
= 00000011

* This assumes an unsigned 1-byte (8-bit) binary number.
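Here is the same addition as a small Python sketch. Python is used only for illustration here. `to_byte` is a hypothetical helper that shows a value as an unsigned 8-bit binary string.

```python
def to_byte(n: int) -> str:
    """Format n as an unsigned 8-bit binary string; only 0..255 fits."""
    if not 0 <= n <= 255:
        raise ValueError(f"{n} does not fit in an unsigned byte")
    return format(n, "08b")

a, b = 1, 2
total = a + b  # 1 + 2 = 3

print(to_byte(a), "+")   # 00000001 +
print(to_byte(b), "=")   # 00000010 =
print(to_byte(total))    # 00000011
```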

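Just in case you have forgotten how binary works, here is a very simple refresher. Each of the 8 bits in a byte stands for a power of two. From left to right, those values are 128, 64, 32, 16, 8, 4, 2, and 1. A number’s value is the sum of the places that hold a 1.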


Yes, the original byte, or 8 bits if you like, can hold any value from 0 to 255. That gives a total of 256 possible values, because zero is a value. For non-computer people that does not really matter, but for programmers it does. Depending on the language being used to develop an application, an array index might start at 0 or at 1, and over the years that has caused many bugs… but again, I digress.
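The binary chart from the original post did not survive. This short Python sketch rebuilds the same idea. It shows the place value of each bit in an unsigned byte and the 0 to 255 range, and it decodes 00000011 back to 3. The `decode` helper is illustrative only.

```python
# Place value of each bit in an unsigned byte, leftmost (most significant) first.
place_values = [2 ** bit for bit in range(7, -1, -1)]
print(place_values)        # [128, 64, 32, 16, 8, 4, 2, 1]

# All eight bits set gives the largest value; all bits clear gives zero.
max_value = sum(place_values)
print(max_value)           # 255
print(max_value + 1)       # 256 possible values, because zero counts too

def decode(bits: str) -> int:
    """Add up the place values wherever the bit is 1."""
    return sum(pv for pv, bit in zip(place_values, bits) if bit == "1")

print(decode("00000011"))  # 3 (2 + 1)
```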

However, when it comes to math and computers, that “is” the way it is, with many gotchas that you don’t realize are there until you dig deeper. And since most software does calculations, most people assume that computers are good at math.

Back to our example of adding 1 + 2 = 3. If you add up the place values of the 1 bits, you can see that 00000011 equals 3 (2 + 1). Now, what if you wanted 6 divided by 2? Easy as pie: both are integers, and the result is a whole number. However, how does a computer calculate a simple 13 divided by 7?

Go ahead, use a calculator to divide 13 by 7, or work it out by hand. The result depends on the software being used (I am using Excel 2010) and the rounding approach being applied, but it will be close to 1.857142857. Is that number right?
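A small Python sketch shows the same effect. The machine stores 13/7 as a binary double, and what you see depends on how many digits the program chooses to display. This assumes the usual case where Python floats are IEEE 754 doubles. Asking for 10 significant digits reproduces the value above.

```python
q = 13 / 7                 # Python floats are IEEE 754 doubles on mainstream platforms

print(repr(q))             # 1.8571428571428572 (shortest string that round-trips)
print(format(q, ".10g"))   # 1.857142857 (10 significant digits)
print(6 // 2, 13 // 7, 13 % 7)  # 3 1 6 (integer division and remainder stay exact)
```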


I know you are going to love the answer: maybe! It depends on what your PRECISION is set to, that is, how many significant digits are kept. In many standard math libraries that maximum is about 15 significant digits, as it is in Microsoft Excel, for example. Please do not think I’m picking on Microsoft Excel. I use that software because I know many people use it and rely on the numbers it calculates to be correct. For the vast majority of calculations they are, unless the numbers are really big or really small, and that is where the problem comes in. Again, depending on the computer hardware, language, and compilers being used, the numbers returned may not be as exact as one might expect.
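Python’s float is an IEEE 754 double on mainstream platforms. You can use it to see where a 15-digit limit comes from and what happens past it. This is a sketch of general double-precision behavior, not of Excel’s internals.

```python
import sys

# Decimal digits a double is guaranteed to carry through a text round trip.
print(sys.float_info.dig)              # 15

# Past 2**53, doubles can no longer represent every integer.
big = 2 ** 53
print(float(big + 1) == float(big))    # True: big + 1 rounds back to big

# A 17-digit integer is exact as a Python int but not as a float.
n = 12345678901234567
print(int(float(n)) == n)              # False
print(int(float(n)))                   # 12345678901234568
```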

This is where High Precision Math (HPM) and Arbitrary Precision libraries come onto the scene. These libraries let users keep many more significant digits, and that can make the difference between being close and being exact. For the rest of this write-up, I will use HPM to mean High Precision Math.
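Python’s standard library ships two tools of this kind. `decimal` lets you choose the precision, and `fractions` keeps exact rational values. This sketch only illustrates the idea. It is not the library used by Microsoft Calculator or by the author’s company.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

# Standard double precision: roughly 15 to 17 significant digits.
print(13 / 7)                       # 1.8571428571428572

# decimal: you choose the precision (34 digits here, the size of IEEE 754 decimal128).
with localcontext() as ctx:
    ctx.prec = 34
    hp = Decimal(13) / Decimal(7)
print(hp)                           # 1.857142857142857142857142857142857

# fractions: keeps the exact rational value, so nothing is rounded at all.
exact = Fraction(13, 7)
print(exact)                        # 13/7
print(exact * 7 == 13)              # True
```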

As an aside, the number returned by Excel was 1.857142857, but the Microsoft Calculator program in Windows Vista and later returns 1.8571428571428571428571428571429. The newer Calculator program supports HPM, which returns up to 34 significant digits. Again, the difference is stark for many reasons:

1.857142857 (Excel)

1.8571428571428571428571428571429 (MS Calculator)

Do you see the issue beyond the smaller number of digits? Notice the rounding.

In the Excel result the last digit shown is 7. The HPM result shows that the digit after that 7 is a 1, so the value rounds down and the 7 is kept.

In the HPM result the last digit shown is 9. Because the decimal repeats (…857142…), we know the true digit in that position is an 8 followed by a 5, so rounding up turns the 8 into a 9 and the rest is dropped.

This is consistent with the two round-to-nearest modes in IEEE 754-2008: ties to even, and ties away from zero. The specification also defines three other rounding modes, called directed rounding: toward zero, toward positive infinity, and toward negative infinity. They apply only when you explicitly select them for an operation. [1]
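Both displays can be reproduced with Python’s `decimal` module by asking for 10 and then 32 significant digits. This is a sketch that assumes round half to even. Excel and Calculator have their own internal implementations. No tie occurs for 13/7, so ties away from zero would give the same digits. `divide` is an illustrative helper name.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_EVEN

def divide(a: int, b: int, digits: int, rounding: str = ROUND_HALF_EVEN) -> Decimal:
    """Divide a by b, keeping `digits` significant digits with the given rounding rule."""
    ctx = Context(prec=digits, rounding=rounding)
    return ctx.divide(Decimal(a), Decimal(b))

print(divide(13, 7, 10))               # 1.857142857   (next digit is 1: round down)
print(divide(13, 7, 32))               # 1.8571428571428571428571428571429 (8 then 5...: round up)
print(divide(13, 7, 32, ROUND_DOWN))   # 1.8571428571428571428571428571428 (plain truncation)
```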

While the number of significant digits is very important in calculations, there is another subject that has to be brought up: the ROUNDING strategy used when a calculation is executed. Even for rounding to whole numbers there are tie-breaking rules. These can usually be set by the programmer, and some applications let the user set them too. These rules include round half up, round half down, round half towards zero, round half away from zero, round half to even, round half to odd, and others depending on the numbers being used. Other rounding approaches include rounding simple fractions, scaled rounding, round to available value, floating-point rounding, and my favorite, double rounding.
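Several of these tie-breaking rules can be compared side by side with Python’s `decimal` module. Watch the naming. In `decimal`, `ROUND_HALF_UP` means “ties away from zero” and `ROUND_HALF_DOWN` means “ties toward zero.” The two “toward infinity” rules below are therefore hand-rolled. The `RULES` table is illustrative and covers only some of the rules listed above.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

HALF = Decimal("0.5")

# Note: decimal's ROUND_HALF_UP means "ties away from zero" and ROUND_HALF_DOWN
# means "ties toward zero"; the two "toward infinity" rules are built by hand.
RULES = {
    "half up (toward +inf)": lambda x: (x + HALF).to_integral_value(rounding=ROUND_FLOOR),
    "half down (toward -inf)": lambda x: (x - HALF).to_integral_value(rounding=ROUND_CEILING),
    "half toward zero": lambda x: x.to_integral_value(rounding=ROUND_HALF_DOWN),
    "half away from zero": lambda x: x.to_integral_value(rounding=ROUND_HALF_UP),
    "half to even": lambda x: x.to_integral_value(rounding=ROUND_HALF_EVEN),
}

# Round the same three ties with every rule.
for name, rule in RULES.items():
    results = [rule(Decimal(v)) for v in ("2.5", "3.5", "-2.5")]
    print(f"{name:24}", *results)
```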

What the heck is double rounding, and why should I care? Double rounding is the process of rounding a number twice in succession, to two different precisions. For example, you could take 99.46, round it up to 99.5, and then round up again to 100! Rounded once to the nearest whole number, 99.46 would be 99. There were two cases, Martinez v. Allstate and Sendejo v. Farmers, litigated between 1995 and 1997. In both, the insurance companies argued that double rounding premiums was permissible and in fact required. The US courts ruled against the insurance companies and ordered them to adopt rules to ensure single rounding. [2]
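The 99.46 example, sketched in Python with `decimal`, assumes round half up in the “ties away from zero” sense for both steps. `round_to` is an illustrative helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to(x: Decimal, step: str) -> Decimal:
    """Round x to a multiple of `step` ("0.1", "1", ...), ties away from zero."""
    return x.quantize(Decimal(step), rounding=ROUND_HALF_UP)

premium = Decimal("99.46")
single = round_to(premium, "1")                    # 99.46 -> 99
double = round_to(round_to(premium, "0.1"), "1")   # 99.46 -> 99.5 -> 100
print(single, double)                              # 99 100
```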

In addition, depending on the computer language being used and the standards followed, these rules may have to be tweaked. For instance, the Java language is supposed to run the same on different machines, and special tricks had to be used to make the answers the same on x86 architectures. Java now allows for different behavior as a standard, but offers the strictfp (strict floating point) modifier when results must match across architectures.
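Java’s `strictfp` cannot be shown in Python, but the root cause it guards against is easy to see. Every intermediate floating-point step rounds, so merely regrouping an addition changes the last bit of the answer. Hardware that keeps intermediates at a different precision, as the x87 unit did with its 80-bit registers, can shift results in a similar way. This sketch shows only the regrouping effect.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # 0.1 + 0.2 rounds to 0.30000000000000004 first
right = a + (b + c)   # 0.2 + 0.3 happens to land exactly on 0.5 first

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```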

Let that sink in for a second. You could have a bill that calculates to $5.00 on an x86 architecture. When that Java application is moved to a different hardware architecture it returns $5.10, while the “correct” calculation was $5.00. Now, on different hardware, we have a rounding issue that adds 10 cents to each bill. If the company has 12 million customers who pay every month, that would net it an additional $14,400,000 a year (12,000,000 × $0.10 × 12)! That is exaggerated, so say it was a penny: it is still $1,440,000 extra per year! That is why it matters!
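The arithmetic behind those figures can be sketched in Python. `Decimal` is used so the money math itself does not add any float error. `annual_drift` is an illustrative name.

```python
from decimal import Decimal

def annual_drift(customers: int, error_per_bill: Decimal, bills_per_year: int = 12) -> Decimal:
    """Extra money collected per year when every bill is off by error_per_bill."""
    return customers * error_per_bill * bills_per_year

print(annual_drift(12_000_000, Decimal("0.10")))  # 14400000.00
print(annual_drift(12_000_000, Decimal("0.01")))  # 1440000.00
```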

However, when it comes to rounding, I’m not done. There are other specialized standards, and two of them dictate the rounding strategy used in weather observations.

The first is U.S. weather observations. Back in 1966, the U.S. Office of the Federal Coordinator for Meteorology dictated that weather data should use the “round half up” approach, so 1.5 would be rounded to 2 and -1.5 would be rounded to -1.

The second is negative zero in meteorology: write “−0” to indicate a temperature between 0.0 and −0.5 degrees (exclusive) that was rounded to an integer. [3] [4]

Some of the definitions and examples are from the Wikipedia page on Rounding. I recommend that anybody interested in more detail on rounding check that page for more information and links to even more articles on the subject.
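Both weather conventions are easy to express in code. This Python sketch assumes readings arrive as `Decimal` values. It also assumes that a reading of exactly −0.5 is reported as “0”, because the boundary is excluded per the description above. The function names are illustrative and are not taken from the NOAA handbook.

```python
from decimal import Decimal, ROUND_FLOOR

HALF = Decimal("0.5")

def round_half_up(t: Decimal) -> Decimal:
    """Round half up, toward +infinity: 1.5 -> 2, -1.5 -> -1."""
    return (t + HALF).to_integral_value(rounding=ROUND_FLOOR)

def report_temperature(t: Decimal) -> str:
    """Whole-degree report; readings strictly between -0.5 and 0.0 are written as "-0"."""
    if Decimal("-0.5") < t < 0:
        return "-0"
    return str(round_half_up(t))

for reading in ("1.5", "-1.5", "-0.2", "-0.5", "-0.7"):
    print(reading, "->", report_temperature(Decimal(reading)))
```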


Wow! So far, this write-up has tried to show that math calculations can be difficult for computers. We found out several things:
  • Many applications don’t support HPM, or the even higher precision called Arbitrary Precision.
  • Different rounding strategies can produce different outcomes.
  • Computer languages, hardware, and math libraries can produce different results as well.
Hopefully you have some questions by now, and one of them should be, “Does this matter?”

The answer is, “it should!” In most cases, you want the calculations to be as exact as possible. Moving to a math standard within an organization should be important to all people who care if the numbers are correct. These number differences can have huge consequences on various calculations. 

Here is a real-world example. The company I was working for used the standard math library with 15-digit precision. A customer created a cipher calculation to encrypt his customers’ sensitive data files. The numbers being generated were huge, beyond the 15-digit precision, so the compiler/application returned those really large numbers in Scientific Notation. When we introduced HPM, the customer was excited: they could finally get the real numbers and, technically, a better cipher for encryption.

However, when he changed over to HPM, there was a catch: Scientific Notation (SN) had dropped the actual digits. Take 4660000000000000 = 4.66 × 10^15 as an example. Multiplied by 107, it results in 4.9862E+17, or 498,620,000,000,000,000. However, the real number might have been 4667148014973410. When you multiply that by 107, the real result is 499,384,837,602,154,870, or 4.99385E+17 in the old Scientific Notation.
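The numbers in this example can be checked in Python, whose integers are arbitrary precision. `Decimal` stands in for the value whose digits were dropped. This checks only the arithmetic. It is not the customer’s cipher.

```python
from decimal import Decimal

multiplier = 107
real = 4667148014973410            # the full-precision value from the example
truncated = Decimal("4.66E15")     # the same value after its digits were dropped

exact = real * multiplier          # Python ints are exact at any size
approx = int(truncated * multiplier)

print(exact)                           # 499384837602154870
print(approx)                          # 498620000000000000
print(exact - approx)                  # 764837602154870
print(format(Decimal(exact), ".5E"))   # 4.99385E+17
print(int(float(real) * multiplier) == exact)  # False: a double cannot hold all 18 digits
```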


Needless to say, when he calculated the cipher with HPM, the old numbers and the new numbers did not match, which meant they could not decrypt the existing files. So they had to decrypt using the old algorithm and re-encrypt with the new algorithm, using HPM and the more exact numbers. The net result was a much more durable cipher. With the old approach, anyone who knew the seed number and knew that Scientific Notation was used for numbers with more than 15 significant digits could mount a brute-force attack much more easily, and the data was much easier to expose.

Therefore, what should you do? First, know that Microsoft Excel does not support HPM. There are libraries and add-ons that will support it, but out of the box it supports 15 significant digits. In addition, you now know that compilers, hardware, and math libraries can give vastly different answers to the same calculation. You need to ask the following questions:

  • Do we have calculations that could require high precision, which again usually means numbers that need more than 15 significant digits?
  • Do we have any calculation using Scientific Notation (loss of precision)?
  • Make sure all computer languages being used return the same numbers using the same calculations so that you don’t lose or gain precision when data crosses barriers.
  • Define a standard you want to get to, for example the IEEE Standard for Floating-Point Arithmetic (IEEE 754), a technical standard for floating-point computation established in 1985.
  • Ensure all Compilers and Hardware support the established standard: does Google’s Go language which supports Arbitrary Precision match the C++ and Math library we are using? Are there differences in calculations?
  • Unit testing and beyond: make sure your tests check the actual returned numbers, on different hardware, to ensure consistent calculations across your infrastructure.
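Here is a minimal sketch of that kind of test in Python, assuming a pytest-style runner. `monthly_bill` is a hypothetical stand-in for one of your own calculations. The tests pin exact expected outputs, or golden values. Running the same suite on each platform, runtime, or library version then shows whether anything drifted.

```python
from decimal import Decimal, ROUND_HALF_UP

def monthly_bill(usage_units: int, rate: Decimal) -> Decimal:
    """Hypothetical billing rule: usage * rate, rounded once to whole cents."""
    return (usage_units * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def test_known_bill():
    # 1234 * 0.0405 = 49.977, which rounds once to 49.98.
    assert monthly_bill(1234, Decimal("0.0405")) == Decimal("49.98")

def test_exact_digits_not_approximately():
    # Compare the exact string, not an "almost equal" float,
    # so a one-cent drift on another platform fails loudly.
    assert str(monthly_bill(100, Decimal("0.05"))) == "5.00"
```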

Keep in mind that there are many computer models out there in the world today that we base our livelihoods on, such as crash simulation tests for automobiles, airplanes, and city growth patterns, and so much more. It is important to understand that the old adage is true; garbage in, garbage out. Some of these models may rely on older approaches and over time when new math libraries get introduced, the outcomes could become much different. Therefore, understand what is available, figure out if there is an exposure possible, and then fix as needed and always test, test, and I would test one more time for good measure.

One final example of why higher math precision is a good thing. In addition, it should be fun for all, where “all” means people who like this kind of esoteric knowledge! Again, using higher precision does not give a CORRECT answer. It gives you a “more correct,” or more precise, answer than lower precision does. Question: “How many gallons of water are currently on the Earth?” We know that about 71% of the Earth’s surface is covered with water, and the oceans hold about 97% of that water.

Now, let’s break this down so that we can understand it better:


Now it is just simple multiplication of the two numbers in both Excel 2016 and Microsoft Calculator (Windows® Vista and above):


Note that the Scientific Notation used by Excel applies the round half up strategy, which makes the calculated significant digits 365,771,000 (trillion gallons). The high-precision number calculated by Calculator is 365,770,900, a difference of about 100 trillion gallons. Not bad, unless that 100 trillion gallons of water is covering the land where you are standing. That would be around 91 cubic miles of water, since one cubic mile holds about 1.1 trillion gallons. That may be an important point if that is what is being calculated. However, the Scientific Notation number 3.65771E+20 may be fine if you just want a spitball estimate.
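The unit conversion can be checked quickly in Python. This sketch uses two exact definitions, 1 US gallon = 231 cubic inches and 1 mile = 63,360 inches, together with exact fractions. That way the conversion itself adds no rounding error. `gallons_to_cubic_miles` is an illustrative name.

```python
from fractions import Fraction

CUBIC_INCHES_PER_GALLON = 231   # US gallon, exact by definition
INCHES_PER_MILE = 63_360        # 5,280 ft * 12 in, exact

def gallons_to_cubic_miles(gallons: int) -> Fraction:
    """Exact conversion from US gallons to cubic miles."""
    return Fraction(gallons * CUBIC_INCHES_PER_GALLON, INCHES_PER_MILE ** 3)

# Difference between the two results, converted from trillions of gallons to gallons.
difference = (365_771_000 - 365_770_900) * 10**12   # 100 trillion gallons
print(round(float(gallons_to_cubic_miles(difference)), 1))  # 90.8 cubic miles
```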

Looking at the cipher, the rounding strategies, Scientific Notation, and higher precision, we can see why the way a computer calculates numbers is so important. Again, I’m always for higher precision, as it makes for more “accurate” numbers. I hope this write-up gives a better understanding of what it takes, how it works, and how numbers can be manipulated into almost anything a person wants.

I have also included a link [1] to an outstanding article for programmers, “What Every Computer Scientist Should Know About Floating-Point Arithmetic” (from 1991). I recommend reviewing it; it is a great article.

[1] http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

[2] http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=57469 ISBN 0-8330-2601-1.

[3] https://www.coursehero.com/file/p2kb4ot/In-a-guideline-issued-in-mid-1966-the-US-Office-of-the-Federal-Coordinator-for/

[4] http://www.nws.noaa.gov/om/coop/Publications/coophandbook2.pdf, page 36

 

Posted by on July 3, 2019 in Opinion

 


Are Algorithms Really Unbiased?

You hear it all the time: “the search algorithm is unbiased in its results.” The program uses algorithms to do whatever tasks are needed. But the big question should be: do algorithms have a bias?

Well, the answer is YES! An algorithm is as biased as the programmer who programs it. At this point, most people reading are going to ask: what is an algorithm?

Definition: A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer. (Google)

Let’s put this into practice and look at some pseudocode:

1. Flip the coin.

2. Is it heads or is it tails?

This is the classic coin-flip problem. Because the coin is two-sided, it should come up heads 50% of the time and tails 50% of the time. If you flip the coin 10 times, you may get more of one side than the other. If you keep flipping, say 100 times, the result will most likely get close to 50% heads and 50% tails. Now, that assumes the coin is not weighted or tampered with. But what if someone took the time to weight a coin so that it would come up a certain way more often? Does that not sound like a bias?

In computer science, we would use a random function to return a random number between 0 and 1 based on a seed number. Closer to 1 means heads, and closer to 0 means tails. Let me state this before someone who programs for a living does: yes, I know random is not always random, but for this simple example it will do the trick. As an aside, it turns out writing a truly random number generator is not as easy as one might think!

Sorry, back to our example. If the random number is closer to 1, it is heads; if it is closer to 0, it is tails. In most cases, if you ran the program as described above, it would eventually come out close to 50% for each side.
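The two-step pseudocode can be turned into a small Python sketch. `random.Random` is a seeded pseudo-random generator, which fits the “random is not always random” caveat above. Splitting at 0.5 implements the “closer to 1 or closer to 0” rule. The function names are illustrative.

```python
import random

def flip(rng: random.Random) -> str:
    """Fair coin: random() returns a float in [0.0, 1.0); 0.5 and above counts as heads."""
    return "heads" if rng.random() >= 0.5 else "tails"

def heads_share(flips: int, seed: int = 42) -> float:
    """Fraction of heads over `flips` tosses; the fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    heads = sum(flip(rng) == "heads" for _ in range(flips))
    return heads / flips

# Small samples wander; large samples settle near 0.5.
for n in (10, 100, 100_000):
    print(n, heads_share(n))
```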

What if we want the coin flip to come up tails more often? For example, say we found out that customers who flip the coin and get tails before shopping buy 20% more goods at our online store. I could simply add a rounding step, and depending on the rounding strategy I pick, make the tails side come up more often… but that is way too random (pun intended). I need more control, and I have to make the coin flip still look random. If you came to the store, flipped the coin each time, and got tails 90 times out of 100, you might start asking questions or thinking the system is rigged. So we need to increase the tails outcome by only 30%. I add a quick calculation so that 30% of the time the flip is made to come up tails, and the rest of the time it is a fair flip.
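Here is one way the “30% bias” could be written, as a Python sketch. The interpretation is an assumption. 30% of the time the code quietly forces tails, and otherwise it flips fairly. That makes tails come up about 65% of the time (0.30 + 0.70 × 0.5): noticeably more often, but far from a suspicious 90 out of 100. The names are hypothetical.

```python
import random

def biased_flip(rng: random.Random, bias: float = 0.30) -> str:
    """Hypothetical rigged flip: `bias` of the time force tails, otherwise flip fairly."""
    if rng.random() < bias:
        return "tails"
    return "heads" if rng.random() >= 0.5 else "tails"

def tails_share(flips: int, bias: float = 0.30, seed: int = 7) -> float:
    """Fraction of tails over `flips` tosses with the given bias."""
    rng = random.Random(seed)
    tails = sum(biased_flip(rng, bias) == "tails" for _ in range(flips))
    return tails / flips

# Expected tails rate is bias + (1 - bias) * 0.5, i.e. about 0.65 for bias = 0.30.
print(tails_share(100_000))
```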

What have we just done? We have made the algorithm biased! As you can see, programming a bias into an algorithm is rather easy.

Let me give one more concrete example of how this works every day in Las Vegas! The gaming regulators basically state that the machines must pay out X amount of wins over a period of time. The casino can manipulate the slot algorithm to pay out no wins for 10 hours and then allow a streak to happen. That gains more attention: people passing by see it and start playing because the machines are hot! However, they may not be. It may be that the slot machine has not been played for 10 hours and must pay out to reach the defined 30% win ratio! This is a completely biased algorithm.

If the casinos wanted to and could get away with it, they would manipulate the machine’s algorithm to hardly ever pay! This does actually occur in some bars that have a one-off slot machine, which may not be regulated the way the regular casinos are.

I hope this helps you better understand what you hear on the news. It also shows why this matters as we get further down the road with Artificial Intelligence (AI) and systems doing things for us: all these capabilities are based on algorithms, and they are as biased as the programmers who write them. So yes, algorithms can have a bias!

 

Posted by on June 29, 2019 in Opinion

 
