Data Manipulation

A software metric is a standard of measure, and software metrics as a field covers many activities that each involve some degree of measurement. The success of software measurement depends on the quality of the data collected and analyzed.

What is Good Data?

Collected data can be considered good if the answer to each of the following questions is yes −

  • Are they correct? − Data is correct if it was collected according to the exact rules of the metric's definition.

  • Are they accurate? − Accuracy refers to the difference between the data and the actual value. The smaller the difference, the more accurate the data.

  • Are they appropriately precise? − Precision deals with the number of decimal places needed to express the data.

  • Are they consistent? − Data is consistent if it does not differ significantly from one measuring device to another.

  • Are they associated with a particular activity or time period? − If the data is associated with a particular activity or time period, then it should be clearly specified in the data.

  • Can they be replicated? − Investigations such as surveys, case studies, and experiments are frequently repeated under different circumstances, so the data should be easy to replicate.
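
The accuracy and consistency questions above can be turned into simple checks. The following is a minimal sketch, not a standard procedure: the function names, the sample line counts, and the tolerance of 50 lines are all hypothetical and chosen only for illustration.

```python
def accuracy_error(measured, actual):
    """Accuracy: absolute difference between the recorded and the actual value."""
    return abs(measured - actual)


def is_consistent(readings, tolerance):
    """Consistency: readings of the same attribute from different measuring
    devices (here, counting tools) stay within an agreed tolerance."""
    return max(readings) - min(readings) <= tolerance


# Hypothetical example: lines of code reported for the same module.
actual_loc = 10000
tool_readings = [10250, 10240, 10260]  # three different counting tools

error = accuracy_error(tool_readings[0], actual_loc)
consistent = is_consistent(tool_readings, tolerance=50)
print(error, consistent)
```

What counts as an acceptable error or tolerance depends on the metric's definition and has to be agreed before collection starts.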

How to Define the Data?

Data collected for measurement purposes is of two types −

  • Raw data − Raw data results from the initial measurement of processes, products, or resources. For example, the weekly timesheets of the employees in an organization.

  • Refined data − Refined data results from extracting essential data elements from the raw data to derive values for attributes.
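
As an illustration of the difference between the two types, the sketch below refines raw timesheet entries into total effort per activity. The record layout (employee, week, activity, hours) and all values are assumptions made for this example, not a prescribed format.

```python
from collections import defaultdict

# Hypothetical raw data: one row per timesheet entry.
raw_timesheets = [
    {"employee": "E01", "week": "2024-W01", "activity": "coding", "hours": 30.0},
    {"employee": "E01", "week": "2024-W01", "activity": "testing", "hours": 10.0},
    {"employee": "E02", "week": "2024-W01", "activity": "coding", "hours": 25.0},
    {"employee": "E02", "week": "2024-W01", "activity": "review", "hours": 15.0},
]


def effort_by_activity(entries):
    """Refine raw timesheet rows into total effort (hours) per activity."""
    totals = defaultdict(float)
    for entry in entries:
        totals[entry["activity"]] += entry["hours"]
    return dict(totals)


refined = effort_by_activity(raw_timesheets)
print(refined)
```

Here the timesheet rows are the raw data, and the per-activity totals are the refined data from which an attribute such as effort can be derived.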

Data can be defined according to the following points −

  • Location
  • Timing
  • Symptoms
  • End result
  • Mechanism
  • Cause
  • Severity
  • Cost
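
One possible way to use the points above is as the fields of a single record. The sketch below assumes the data being defined is a problem report. That is an assumption, as are the class name, the field types, the unit for cost, and the sample values.

```python
from dataclasses import dataclass, asdict


@dataclass
class ProblemRecord:
    """Hypothetical record with one field per point in the list above."""
    location: str    # where the problem was observed
    timing: str      # when it was observed
    symptoms: str    # what was observed
    end_result: str  # the consequence of the problem
    mechanism: str   # how the problem came about
    cause: str       # why it occurred
    severity: str    # how serious it is
    cost: float      # cost of the problem (unit assumed: person-hours)


record = ProblemRecord(
    location="billing module",
    timing="system test",
    symptoms="wrong invoice total",
    end_result="invoice rejected",
    mechanism="rounding applied before summation",
    cause="ambiguous requirement",
    severity="major",
    cost=6.5,
)
print(asdict(record))
```

Fixing the fields in one place helps every person who records data to supply the same elements in the same form.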

How to Collect Data?

Collection of data requires human observation and reporting. Managers, system analysts, programmers, testers, and users must record raw data on forms. To collect accurate and complete data, it is important to −

  • Keep procedures simple

  • Avoid unnecessary recording

  • Train employees in the need to record data and in the procedures to be used

  • Provide the results of data capture and analysis to the original providers promptly and in a useful form that will assist them in their work

  • Validate all data collected at a central collection point
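
Validation at a central collection point can be partly automated. The following is a minimal sketch that assumes each submitted form is a record with the hypothetical fields employee, week, activity, and hours. The rules shown (required fields, hours between 0 and 168 per week) are examples only.

```python
REQUIRED_FIELDS = ("employee", "week", "activity", "hours")
HOURS_PER_WEEK = 168  # upper bound used as a plausibility check


def validate_entry(entry):
    """Return a list of problems found in one submitted record (empty if valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in entry or entry[field] in ("", None):
            problems.append(f"missing {field}")
    hours = entry.get("hours")
    if isinstance(hours, (int, float)) and not 0 <= hours <= HOURS_PER_WEEK:
        problems.append("hours out of range")
    return problems


good = {"employee": "E01", "week": "2024-W01", "activity": "coding", "hours": 38}
bad = {"employee": "E01", "week": "", "activity": "coding", "hours": 200}

print(validate_entry(good))
print(validate_entry(bad))
```

Rejected records can be returned to the people who submitted them, which also supports the point about giving prompt feedback to the original providers.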

Planning of data collection involves several steps −

  • Decide which products to measure based on the Goal-Question-Metric (GQM) analysis

  • Make sure that the product is under configuration control

  • Decide exactly which attributes to measure and how indirect measures will be derived

  • Once the set of metrics is clear and the set of components to be measured has been identified, devise a scheme for identifying each activity involved in the measurement process

  • Establish a procedure for handling the forms, analyzing the data, and reporting the results
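
To illustrate how an indirect measure can be derived from directly measured attributes, the sketch below computes defect density as defects per thousand lines of code. Defect density is used here only as a common example. The function name and the sample numbers are hypothetical.

```python
def defect_density(defects, lines_of_code):
    """Indirect measure: defects per thousand lines of code (KLOC),
    derived from two direct measures."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Direct measures for a hypothetical component.
density = defect_density(defects=12, lines_of_code=4000)
print(density)
```

Writing the derivation down as a formula before collection begins makes it clear which direct measures have to be recorded.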

Data collection planning must begin when project planning begins. Actual data collection takes place during many phases of development.

For example − Some data related to project personnel can be collected at the start of the project, while the collection of other data, such as effort, begins at the start of the project and continues through operation and maintenance.

How to Store and Extract Data?

In software engineering, data should be stored in a database that is set up using a Database Management System (DBMS). An example of a database structure is shown in the following figure. This database stores the details of different employees working in different departments of an organization.

[Figure: Database Management System]

In the above diagram, each box is a table in the database, and each arrow denotes a many-to-one mapping from one table to another. The mappings define the constraints that preserve the logical consistency of the data.

Once the database is designed and populated with data, we can use a data manipulation language, such as SQL, to extract the data for analysis.
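
The sketch below shows these ideas using SQLite through Python's built-in sqlite3 module. The two tables, their columns, and the sample rows are assumptions made for illustration and do not reproduce the figure. The foreign key from employee to department stands for a many-to-one mapping: many employees can belong to one department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: each employee row maps to exactly one department.
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
""")

conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Development"), (2, "Testing")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 1), (2, "Ben", 1), (3, "Chen", 2)])

# The mapping acts as a constraint: an employee cannot reference a missing department.
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Dan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Extract data for analysis: number of employees per department.
rows = conn.execute("""
    SELECT d.name, COUNT(*)
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(rejected, rows)
```

The rejected insert shows how the mapping preserves logical consistency, and the final query shows how a data manipulation language extracts data for analysis.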