README-en.md edited online with Bitbucket

Jose R Ortiz Ubarri 8 years ago
parent
commit
c6b00a0aa1
1 changed file with 15 additions and 9 deletions
README-en.md

45
 
45
 
46
 ---
46
 ---
47
 
47
 
48
-## What is Benford’s Law? (copied from the ISACA journal)
48
+## What is Benford’s Law? (adapted from the ISACA journal [1])
49
 
49
 
50
 Benford’s Law, named for physicist Frank Benford, who worked on the theory in 1938, is the mathematical theory of leading digits. Specifically, in data sets, the leading digits are distributed in a specific, non-uniform way. While one might think that the number 1 would appear as the first digit 11 percent of the time (i.e., one of nine possible digits), it actually appears about 30 percent of the time (see Figure 1). The number 9, on the other hand, is the first digit less than 5 percent of the time. The theory covers the first digit, second digit, first two digits, last digit and other combinations of digits because the probability of occurrence of each digit is given by a logarithmic formula.
50
 Benford’s Law, named for physicist Frank Benford, who worked on the theory in 1938, is the mathematical theory of leading digits. Specifically, in data sets, the leading digits are distributed in a specific, non-uniform way. While one might think that the number 1 would appear as the first digit 11 percent of the time (i.e., one of nine possible digits), it actually appears about 30 percent of the time (see Figure 1). The number 9, on the other hand, is the first digit less than 5 percent of the time. The theory covers the first digit, second digit, first two digits, last digit and other combinations of digits because the probability of occurrence of each digit is given by a logarithmic formula.
51
 
51
 
100
 
100
 
101
 ---
101
 ---
102
 
102
 
103
+### Frequency of occurrence
104
+
103
 The **frequency of occurrence** is defined as the number of times that a digit appears divided by the total number of data points.  For example, the frequency of leading digit `1` in the example would be computed as $$9 / 20 = 0.45$$.  **Histograms** are the preferred visualization of frequency distributions in a data set. In essence, a histogram is a bar chart where the $$y$$-axis is the frequency and a vertical bar is drawn for each of the counted classifications (in our case, for each digit). 
105
 The **frequency of occurrence** is defined as the number of times that a digit appears divided by the total number of data points.  For example, the frequency of leading digit `1` in the example would be computed as $$9 / 20 = 0.45$$.  **Histograms** are the preferred visualization of frequency distributions in a data set. In essence, a histogram is a bar chart where the $$y$$-axis is the frequency and a vertical bar is drawn for each of the counted classifications (in our case, for each digit). 
104
 
106
 
105
 ---
107
 ---
112
 
114
 
113
 ---
115
 ---
114
 
116
 
115
-!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-01.html"
117
+!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-01.html"
116
 <br>
118
 <br>
117
 
119
 
118
-!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-02.html"
120
+!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-02.html"
119
 <br>
121
 <br>
120
 
122
 
121
-!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-03.html"
123
+!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-03.html"
122
 <br>
124
 <br>
123
 
125
 
124
-!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-04.html"
126
+!INCLUDE "../../eip-diagnostic/benfords-law/en/diag-benford-law-04.html"
125
 <br>
127
 <br>
126
 
128
 
127
 ---
129
 ---
130
 
132
 
131
 ##Laboratory session
133
 ##Laboratory session
132
 
134
 
133
-###Exercise 1: Familiarizing yourself with the data files and the provided code 
135
+###Exercise 1: Understand the data files and the provided code 
134
 
136
 
135
 ####Instructions
137
 ####Instructions
136
 
138
 
137
-1. Load the project `BenfordsLaw` onto QtCreator by double clicking the file `BenfordsLaw.pro` in the folder `Documents/eip/Arrays-BenfordsLaw` on your computer. You can also go to `http://bitbucket.org/eip-uprrp/arrays-benfordslaw` to download the `Arrays-BenfordsLaw` folder to your computer.
139
+1.	Load the project  `BenfordsLaw` into `QtCreator`. There are two ways of doing this:
140
+
141
+ a.	Using the virtual machine: Double-click the file `BenfordsLaw.pro` located in the folder `/home/eip/labs/arrays-benfordslaw` of your virtual machine.
142
+
143
+ b.	Downloading the project’s folder from `Bitbucket`: Use a terminal and write the command `git clone http://bitbucket.org/eip-uprrp/arrays-benfordslaw` to download the folder `arrays-benfordslaw` from `Bitbucket`. Double-click the file `BenfordsLaw.pro` located in the folder that you downloaded to your computer.
138
 
144
 
139
 2. The text files `cta-a.txt`, `cta-b.txt`, `cta-c.txt`, `cta-d.txt`, and `cta-e.txt` in the `data` directory contain either real or bogus data. Each line of a file specifies the bus route code and the number of users for that route on a certain day. Open the file `cta-a.txt` to understand the data format. This will be important when reading the file sequentially using C++. Notice that some of the route codes contain non-numeric characters.
145
 2. The text files `cta-a.txt`, `cta-b.txt`, `cta-c.txt`, `cta-d.txt`, and `cta-e.txt` in the `data` directory contain either real or bogus data. Each line of a file specifies the bus route code and the number of users for that route on a certain day. Open the file `cta-a.txt` to understand the data format. This will be important when reading the file sequentially using C++. Notice that some of the route codes contain non-numeric characters.
140
 
146
 
164
 
170
 
165
 ##Deliverables
171
 ##Deliverables
166
 
172
 
167
-1. Use "Deliverables 1" in Moodle to upload the `main.cpp` file with the modifications you made in **Exercise 2**. Remember to use good programming techniques, include the names of the programmers involved, and to document your program.
173
+1. Use "Deliverable 1" in Moodle to upload the `main.cpp` file with the modifications you made in **Exercise 2**. Remember to use good programming techniques, include the names of the programmers involved, and document your program.
168
 
174
 
169
-2. Use "Deliverables 2" in Moodle to upload a **pdf** file that contains screen shots of the histograms produced after analyzing each text file. Please caption each figure with the name of the text file and provide your decision as to whether the file contained real or bogus data.
175
+2. Use "Deliverable 2" in Moodle to upload a **PDF** file that contains screenshots of the histograms produced after analyzing each text file. Please caption each figure with the name of the text file and state your decision as to whether the file contained real or bogus data.
170
 
176
 
171
 ---
177
 ---
172
 
178