README-en.md edited online with Bitbucket

Jose R Ortiz Ubarri 8 years ago
parent
commit
c6b00a0aa1
1 changed file with 15 additions and 9 deletions

README-en.md (+15 -9)

@@ -45,7 +45,7 @@ As part of your new job as IT auditor you suspect that someone in the Chicago Tr
 
 ---
 
-## What is Benford’s Law? (copied from the ISACA journal)
+## What is Benford’s Law? (adapted from the ISACA journal [1])
 
 Benford’s Law, named for physicist Frank Benford, who worked on the theory in 1938, is the mathematical theory of leading digits. Specifically, in data sets, the leading digit(s) is (are) distributed in a specific, nonuniform way. While one might think that the number 1 would appear as the first digit 11 percent of the time (i.e., one of nine possible numbers), it actually appears about 30 percent of the time (see Figure 1). The number 9, on the other hand, is the first digit less than 5 percent of the time. The theory covers the first digit, second digit, first two digits, last digit and other combinations of digits because the theory is based on a logarithm of probability of occurrence of digits.
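
The percentages quoted above agree with the usual statement of Benford’s Law for first digits, $$P(d) = \log_{10}(1 + 1/d)$$ for $$d = 1, \dots, 9$$. The following is a minimal sketch of that formula in C++. The function name `benfordProbability` is hypothetical and is not part of the code provided with the lab.

```cpp
#include <cassert>
#include <cmath>

// Expected proportion of values whose leading digit is d (1..9)
// under Benford's Law: P(d) = log10(1 + 1/d).
// Hypothetical helper, not part of the lab's provided code.
double benfordProbability(int d) {
    assert(d >= 1 && d <= 9);  // a leading digit is never 0
    return std::log10(1.0 + 1.0 / d);
}
```

For instance, `benfordProbability(1)` is roughly 0.301 and `benfordProbability(9)` is roughly 0.046, in line with the "about 30 percent" and "less than 5 percent" figures in the paragraph above.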
 
@@ -100,6 +100,8 @@ After reading through all the data, the content of each array element will be th
 
 ---
 
+### Frequency of occurrence
+
 The **frequency of occurrence** is defined as the number of times that a digit appears divided by the total number of data items. For example, the frequency of leading digit `1` in the example would be computed as $$9 / 20 = 0.45$$. **Histograms** are the preferred visualization of frequency distributions in a data set. In essence, a histogram is a bar chart where the $$y$$-axis is the frequency and a vertical bar is drawn for each of the counted classifications (in our case, for each digit).
 
 ---
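
The frequency computation described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the names `leadingDigit` and `leadingDigitFrequencies` are hypothetical, the values are assumed to be positive integers, and the code provided with the lab may organize the counting differently.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Leading (most significant) decimal digit of a positive integer.
// Hypothetical helper, not taken from the lab's provided code.
int leadingDigit(long n) {
    assert(n > 0);
    while (n >= 10) n /= 10;  // drop trailing digits until one remains
    return static_cast<int>(n);
}

// freq[d] = (number of values whose leading digit is d) / (number of values).
// Index 0 stays at 0 because a positive number never starts with 0.
std::array<double, 10> leadingDigitFrequencies(const std::vector<long>& data) {
    std::array<int, 10> counts{};  // zero-initialized counters
    for (long value : data) ++counts[leadingDigit(value)];

    std::array<double, 10> freq{};
    if (data.empty()) return freq;  // avoid dividing by zero
    for (int d = 1; d <= 9; ++d)
        freq[d] = static_cast<double>(counts[d]) / data.size();
    return freq;
}
```

With 20 values of which 9 start with `1`, `freq[1]` would be $$9 / 20 = 0.45$$, as in the example in the text. The cast to `double` matters: dividing two integers would truncate every frequency to 0.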
@@ -112,16 +114,16 @@ The **frequency of occurrence** is defined as the ratio of times that a digit ap
 
 ---
 
-!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-01.html"
+!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-01.html"
 <br>
 
-!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-02.html"
+!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-02.html"
 <br>
 
-!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-03.html"
+!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-03.html"
 <br>
 
-!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-04.html"
+!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-04.html"
 <br>
 
 ---
@@ -130,11 +132,15 @@ The **frequency of occurrence** is defined as the ratio of times that a digit ap
 
 ##Laboratory session
 
-###Exercise 1: Familiarizing yourself with the data files and the provided code 
+###Exercise 1: Understand the data files and the provided code 
 
 ####Instructions
 
-1. Load the project `BenfordsLaw` onto QtCreator by double clicking the file `BenfordsLaw.pro` in the folder `Documents/eip/Arrays-BenfordsLaw` on your computer. You can also go to `http://bitbucket.org/eip-uprrp/arrays-benfordslaw` to download the `Arrays-BenfordsLaw` folder to your computer.
+1. Load the project `BenfordsLaw` into `QtCreator`. There are two ways of doing this:
+
+ a. Using the virtual machine: Double click the file `BenfordsLaw.pro` located in the folder `/home/eip/labs/arrays-benfordslaw` of your virtual machine.
+
+ b. Downloading the project’s folder from `Bitbucket`: Use a terminal and write the command `git clone http://bitbucket.org/eip-uprrp/arrays-benfordslaw` to download the folder `arrays-benfordslaw` from `Bitbucket`. Double click the file `BenfordsLaw.pro` located in the folder that you downloaded to your computer.
 
 2. The text files `cta-a.txt`, `cta-b.txt`, `cta-c.txt`, `cta-d.txt`, and `cta-e.txt` in the `data` directory contain either real or bogus data. Each line of the file specifies the bus route code and the number of users for that route on a certain day. Open the file `cta-a.txt` to understand the data format. This will be important when reading the file sequentially using C++. Notice that some of the route codes contain characters.
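
Because some route codes contain characters, one reasonable way to read such a file sequentially is to treat the route code as a string and only the user count as a number. The sketch below assumes that each line holds the two fields separated by whitespace, which should be checked against `cta-a.txt`. The names `RouteCount`, `readRouteCounts`, and `parseRouteCounts` are hypothetical and do not come from the provided code.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One line of a data file. Hypothetical type for illustration only.
struct RouteCount {
    std::string route;  // kept as text: some codes contain letters
    long riders;        // number of users for that route on that day
};

// Reads "route count" pairs until the stream ends or a pair fails to parse.
// Assumes whitespace-separated fields; verify this against the real files.
std::vector<RouteCount> readRouteCounts(std::istream& in) {
    std::vector<RouteCount> rows;
    std::string route;
    long riders;
    while (in >> route >> riders) rows.push_back({route, riders});
    return rows;
}

// Convenience wrapper for trying the reader on an in-memory string.
std::vector<RouteCount> parseRouteCounts(const std::string& text) {
    std::istringstream in(text);
    return readRouteCounts(in);
}
```

To read one of the lab files, the same function could be given a `std::ifstream` opened on, for example, `data/cta-a.txt`, since `std::ifstream` is also a `std::istream`.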
 
@@ -164,9 +170,9 @@ In the provided code, notice that the data in the arrays `histoNames` and `histo
 
 ##Deliverables
 
-1. Use "Deliverables 1" in Moodle to upload the `main.cpp` file with the modifications you made in **Exercise 2**. Remember to use good programming techniques, include the names of the programmers involved, and to document your program.
+1. Use "Deliverable 1" in Moodle to upload the `main.cpp` file with the modifications you made in **Exercise 2**. Remember to use good programming techniques, include the names of the programmers involved, and to document your program.
 
-2. Use "Deliverables 2" in Moodle to upload a **pdf** file that contains screen shots of the histograms produced after analyzing each text file. Please caption each figure with the name of the text file and provide your decision as to whether the file contained real or bogus data.
+2. Use "Deliverable 2" in Moodle to upload a **pdf** file that contains screen shots of the histograms produced after analyzing each text file. Please caption each figure with the name of the text file and provide your decision as to whether the file contained real or bogus data.
 
 ---