- Trending Categories
- Data Structure
- Operating System
- MS Excel
- C Programming
- Social Studies
- Fashion Studies
- Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Signed floating point numbers
Were present real numbers in our daily life is not convenient for representing very small numbers, like +0.00000012347650. This same number can be more conveniently represented in scientific notation as +1.23476× 10−07. But this actually stands for +0.000000123476. So there is an error of 0.00000000000005, which forms a very small percentage error.
Floating-point representation is similar in concept to scientific notation. Logically, a floating-point number consists of:
A signed (meaning positive or negative) digit string of a given length in a given base(or radix).This digit string is referred to as the significand, mantissa, or coefficient.
A signed integer exponent which modifies the magnitude of the number.
It is important to note that floating-point numbers suffer from loss of precision when represented with a fixed number of bits (e.g., 32-bit or 64-bit). This is because there is an infinite number of real numbers (even within a small range of says 0.0 to 0.1). On the other hand, a n-bit binary pattern can represent a finite 2n distinct numbers. Hence, not all the real numbers can be represented. Floating number arithmetic is very much less efficient than integer arithmetic. Hence, it is better to use integers if an application does not require floating-point numbers. The nearest approximation will be used instead, resulted in the loss of accuracy. Modern computers adopt IEEE 754 standard for representing floating-point numbers. There are two representation schemes: 32-bit single-precision and 64-bit double-precision.
IEEE floating point numbers have three basic components: the sign, the exponent, and the mantissa. The mantissa is composed of the fraction and an implicit leading digit (explained below). The exponent base (2) is implicit and need not be stored.
The following table shows the layout for the single (32-bit) and double(64-bit) precision floating-point values. The number of bits for each field is shown, followed by the bit ranges in square brackets. 00 =least-significant bit.
|Single Precision||1 ||8 [30–23]||23 [22–00]|
|DoublePrecision||1 ||11 [62–52]||52 [51–00]|
Laid out as bits, floating point numbers look like this −
Single Precision − SEEEEEEE EFFFFFFF FFFFFFFF FFFFFFFF
Double Precision − SEEEEEEE EEEEFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFFFFFFFFFF
- Related Articles
- Reinterpret 64-bit signed integer to a double-precision floating point number in C#
- Java program to multiply given floating point numbers
- C Program to Multiply two Floating Point Numbers?
- Java Program to Multiply Two Floating-Point Numbers
- Swift Program to Multiply Two Floating-Point Numbers
- Haskell program to multiply two floating point numbers
- Kotlin Program to Multiply Two Floating-Point Numbers
- Convert the specified double-precision floating point number to a 64-bit signed integer in C#
- How to Multiply Two Floating-Point Numbers in Golang?
- Program to find GCD of floating point numbers in C++
- Explain what floating point numbers are and what types of floating points are available in Swift
- Write a one line C function to round floating point numbers
- Explain the IEEE Standard 754 Floating-Point Numbers in computer architecture?
- C++ Floating Point Manipulation
- Fixed Point and Floating Point Number Representations