

C. IT Fundamentals

1. Hardware, Software, and Data Organization

OPERATING SYSTEMS

Most people are familiar with a version of Microsoft’s Windows as their operating system. There are others as well, such as Unix, Linux, OS/2, and Apple’s Mac OS. Among the responsibilities of the operating system are the allocation of computer resources (processors, main memory, printers, etc.) to applications and the scheduling of those applications. Thus, it is critical that the operating system (OS) protect itself and its users.

For security, the OS should have a log-on procedure, requiring a user ID and password. A successful log-on may then create an access token containing key information about the user and his privileges. The OS might also include an access control list defining the access privileges of each valid user for all system resources.
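
A minimal Python sketch of that flow, with made-up user IDs, resources, and privileges (none of these names come from the text):

```python
# Illustrative only: a log-on yields an access token, and an access control
# list is consulted before any system resource may be used.
USERS = {"jdoe": "s3cret"}                      # user ID -> password
ACL = {"payroll_file": {"jdoe": {"read"}}}      # resource -> user -> privileges

def log_on(user_id, password):
    """Return an access token on success, None on failure."""
    if USERS.get(user_id) != password:
        return None
    return {"user": user_id}                    # the "access token"

def can_access(token, resource, action):
    """Consult the access control list for this user's privileges."""
    if token is None:
        return False
    return action in ACL.get(resource, {}).get(token["user"], set())

token = log_on("jdoe", "s3cret")
print(can_access(token, "payroll_file", "read"))    # True
print(can_access(token, "payroll_file", "write"))   # False
```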

System audit trails record activity at the system level. They may include keystroke monitoring, recording every key pressed by every user. Or they may involve simply event monitoring, listing activities executed, the user ID, and the starting/stopping times.

PROGRAMMING LANGUAGES

Software is written in a programming language. When computers were first invented, all programs were written in first-generation language, or machine language, which instructed the computer precisely what to do: for example, which piece of data (the precise numerical location in internal memory and the number of bytes/bits) to move to which other location, or to move into which internal register to be subtracted from other data already moved to that register. And all of this was written in binary code. Eventually, language translators (assemblers, compilers, and interpreters) were invented to translate from a more natural language (the “source program”) into machine language for execution (the “object program” or “load module”).

Second-generation language – This was an assembler language, which was essentially a one-for-one mapping into machine language instructions from a form that allowed the programmer to use mnemonic abbreviations representing binary-coded instructions.

Third-generation languages – These were rudimentary English-like sentences that could be compiled or interpreted into machine language procedures. They were thus also called procedure-oriented languages, and included FORTRAN, COBOL, BASIC, C, and PL/1.

Fourth-generation languages – These are even more powerful, such as SQL (Structured Query Language) for querying a database.

Event-driven languages – Here we have moved beyond procedural languages, since the program is not executed every time in the programmer’s specified sequence. Instead, the user may alter the flow according to events that he initiates, such as clicking an icon. Languages that let humans interact with the computer through screen icons support GUIs (graphical user interfaces).

Object-oriented languages – Here, languages like Java manipulate objects, which are software packets containing both data and instructions.

FILES AND DATABASE MANAGEMENT SYSTEMS (DBMSs)

Until recently (and even now, to a large extent), all companies organized their data into files, such as the customer master file. That file would be composed of –

- Records (e.g., one record for each customer), which would be composed of

- Fields (e.g., the customer name field, or the customer zip code field), which would be composed of

- Bytes, or characters/digits, which would be composed of

- Bits, the binary ones and zeroes a computer uses to represent data.

Files may be organized in the following ways:

- Sequential, in which all records are in sequence according to their primary keys (e.g., by customer number for a customer file). Sequentially organized files are efficient for updating or processing an entire master file with a batch of current transactions.

- Indexed, in which records may be retrieved by the operating system’s searching through a file’s index for the primary key; the index provides the disk address, just as an index in the back of a book refers you from a key word to a page number.

- Randomized, in which a “hashing scheme,” or algorithm performed on the primary key, provides the disk address. For example, the system might divide the key by a fixed number and use the remainder of that division as the disk address of the record with that key (see the sketch following this list). Randomized files provide very fast access for querying particular records.
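
A minimal Python sketch of such a hashing scheme, assuming a fixed pool of numbered storage slots; the key and record contents are illustrative:

```python
# A "hashing scheme" for a randomized file: hash the primary key to a slot.
NUM_SLOTS = 97   # often a prime near the number of available disk addresses

def disk_address(primary_key: int) -> int:
    # Divide the key by the number of slots; the remainder is the address.
    return primary_key % NUM_SLOTS

storage = {}                                     # stands in for direct-access storage
storage[disk_address(44123)] = {"custno": 44123, "name": "Acme Co."}

# Querying a particular record is fast: hash the key and go straight to the slot.
print(storage[disk_address(44123)])
```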

Files are stored on secondary storage devices, such as –

- Tape, if sequential access only is needed

- Magnetic disk, CD, or DVD, if direct or random access is needed

Primary storage consists chiefly of RAM (random access memory) where data and programs are temporarily stored during processing, but it also contains ROM (read-only memory) and cache (very fast-access temporary memory for frequently used items).

Now, database management systems – DBMSs – (software enabling users to create, modify, and utilize an organization’s information, previously stored on multiple separate files) are quite popular, providing:

- Data independence (data exist independent of any particular application, and can be used by any application that is authorized. Thus, for example, “quantity-on-hand” for an inventory item may be used by both the purchasing and the inventory control applications, but is stored and maintained only once.)

- Reduced data redundancy (each data item is stored only once, regardless of how many applications use it)

- Accessibility of data by many users, and flexibility in designing new forms of output that may draw on different data

- Organizational cooperation, since one user must be careful not to erroneously change data used by others

- Vulnerability. This is a disadvantage, as the reduced redundancy and the accessibility by many users do require special precautions, such as passwords, frequent back-ups of the database, and transaction logs (files) of each change made to an item of data.

DBMSs include a data description language and a data manipulation language to facilitate the design, querying, input, and reporting of data. Structured query language (SQL) is a standard, text-based programming language using the keywords “select,” “from,” and “where” to retrieve data.
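
A minimal sketch of the select/from/where pattern, run here through Python’s built-in sqlite3 module; the customer table and its one row are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (custno TEXT, name TEXT, zip TEXT)")
con.execute("INSERT INTO customers VALUES ('C100', 'Acme Co.', '10001')")

# SELECT names the columns wanted, FROM names the table, WHERE filters the rows.
rows = con.execute("SELECT name FROM customers WHERE zip = '10001'").fetchall()
print(rows)   # [('Acme Co.',)]
```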

The database administrator designs and controls the data dictionary (where each data item in the database is defined and explained), the overall schema (blueprint or layout), and each user’s individual view (subschema).

DIFFERENT DATABASE SCHEMA DESIGNS

CHAIN (LINKED LIST)

In this example, each record contains data about the invoice, and a “pointer.” By “pointer,” we mean that the physical disk address (or some number from which the disk address may be easily derived) of another record is stored within the first record. So here, the first record contains necessary data fields about Invoice 1 (e.g., invoice number, date, amount) and also contains a field which points to the next invoice in the chain of invoices outstanding for that customer. The asterisk (*) at the end indicates nothing further to which to point. The Chain schema is more like a flat-file than a database, as it is a one-to-one configuration, or “cardinality.” The cardinality is the nature and extent of the relationships among the records of the database.
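
A minimal Python sketch of the chain described above, using an object reference to play the role of the embedded pointer; the invoice fields and values are illustrative:

```python
class InvoiceRecord:
    def __init__(self, number, date, amount):
        self.number = number
        self.date = date
        self.amount = amount
        self.next = None   # the pointer field; None plays the role of the asterisk (*)

def outstanding_invoices(first):
    """Navigate the chain from the first invoice to the end-of-chain marker."""
    rec = first
    while rec is not None:
        yield rec
        rec = rec.next

inv1 = InvoiceRecord("INV-1", "2024-01-05", 100.00)
inv2 = InvoiceRecord("INV-2", "2024-02-05", 250.00)
inv3 = InvoiceRecord("INV-3", "2024-03-05", 75.00)
inv1.next, inv2.next = inv2, inv3   # inv3.next stays None: nothing further to point to

for rec in outstanding_invoices(inv1):
    print(rec.number, rec.amount)
```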

TREE (HIERARCHY) (ONE-TO-MANY “CARDINALITY”)

Here, one “parent” record may have many pointers – “children” – and each child may have its own children records.

So the customer record points to (contains the disk addresses of) the invoices outstanding for that customer. In turn, each invoice record points to the inventory line items billed on that invoice.

NETWORK (MANY-TO-MANY “CARDINALITIES”)

Here, one “parent” record (finished good record) may point to many “children” records (raw material records).

Moreover, a given child may have many (more than one) parents. And redundancy is reduced, because a given raw material record need exist in only one place, while being pointed to by many finished goods records. Records may be physically dispersed, but logically connected.

[Figure: Chain – Invoice 1 data → Invoice 2 data → Invoice 3 data → * (end of chain)]

[Figure: Tree – the Customer Master Record points to Invoice 1 and Invoice 2; Invoice 1 points to Lines 1–3, and Invoice 2 points to Lines 4–5]

[Figure: Network – Hamburger points to Beef and Bun; Cheeseburger points to Beef, Bun, and Cheese, so Beef and Bun each have more than one parent]

The three previous schema designs are “navigational” models, where the user must navigate pre-defined “structured” paths, with embedded pointers. The user must know the structure, and the database may be accessed only along the pre-defined (and thus inflexible) path.

This is different from the –

RELATIONAL MODEL

PARTS

PARTNO   PARTNAME
P1032    WHATZIT
P1048    FRAMMIS
P1079    GIZMO
P1083    WHACHACALLIT

SUPPLIERS

SUPPNO   SUPPNAME        SUPPADDRESS
S129     Joe's Junk      23 Main St. Mytown, State
S234     Sam's Stuff     1 Chestnut Yourtown, State
S386     Gary's Garage   3 Broadway Histown, State

PRICES – PARTS BY SUPPLIER

PARTNO   SUPPNO   PRICE
P1032    S234     2.39
P1032    S386     2.45
P1048    S129     4.95
P1079    S129     1.67
P1079    S234     1.89
P1079    S386     1.95
P1083    S129     3.12
P1083    S386     3.08

The relational model keeps its data in multiple separate tables (rows & columns) using no explicit pointers. Instead, relationships may be formed on an ad hoc basis as needed. A supplier’s address need be stored and maintained in only one place, even though that supplier may provide many parts. Similarly a given part may be supplied by many different suppliers, so many-to-many cardinalities are supported, but may require a linking table (parts-by-supplier in this case). The linking table has a composite primary key, partno-suppno in this case. A different linking table may relate parts to finished goods, or to customers who buy them. So there is great flexibility in the relational schema, where many different paths may co-exist.
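
A minimal sketch, using Python’s built-in sqlite3 module, of forming such an ad hoc relationship at query time; it loads a few of the rows shown above and joins the three tables by matching values in the linking table’s composite key, with no embedded pointers:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parts     (partno TEXT PRIMARY KEY, partname TEXT);
    CREATE TABLE suppliers (suppno TEXT PRIMARY KEY, suppname TEXT, suppaddress TEXT);
    CREATE TABLE prices    (partno TEXT, suppno TEXT, price REAL,
                            PRIMARY KEY (partno, suppno));
    INSERT INTO parts     VALUES ('P1079', 'GIZMO');
    INSERT INTO suppliers VALUES ('S129', 'Joe''s Junk',  '23 Main St. Mytown, State');
    INSERT INTO suppliers VALUES ('S234', 'Sam''s Stuff', '1 Chestnut Yourtown, State');
    INSERT INTO prices    VALUES ('P1079', 'S129', 1.67);
    INSERT INTO prices    VALUES ('P1079', 'S234', 1.89);
""")

# Who supplies GIZMO, and at what price? The relationship is formed as needed.
for suppname, price in con.execute("""
        SELECT s.suppname, pr.price
        FROM parts p
        JOIN prices pr ON pr.partno = p.partno
        JOIN suppliers s ON s.suppno = pr.suppno
        WHERE p.partname = 'GIZMO'
        ORDER BY pr.price"""):
    print(suppname, price)   # Joe's Junk 1.67, then Sam's Stuff 1.89
```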

In the relational model, tables are normalized, resulting in more tables but fewer columns in each one, as each individual table refers to only a single concept. This reduces redundancy and possible anomalies. Anomalies are types of inconsistencies that would exist if you had everything stored in one large table, for example, with one row per part and all the information about that part on that row. Then, if an inventory part were deleted, you would also delete all the information about its supplier (the delete anomaly) if you purchase only that one part from that supplier. Or, if a supplier moves, you would have the supplier’s address listed next to each part that supplier supplies, requiring many updates, one or more of which might be neglected (the update anomaly). Or, you may not be able to insert a new supplier until you purchase something from him (the insert anomaly). So, if you imagine that everything is initially stored in one large table, normalization involves systematically decomposing it into a set of tables eventually in “third normal form” (3NF). In third normal form, the database is free of these anomalies. The database designer systematically eliminates repetitions and unnecessary dependencies, reducing the number of columns (fields, attributes) in each table and instead spinning off additional tables, moving through 1NF and 2NF and finally to 3NF.


The relational DBMS can enforce referential integrity. The reference to supplier S234 in the parts-by-supplier table has integrity, because there is such a supplier in the suppliers table.
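
A minimal sketch, again with sqlite3, of referential integrity being enforced; note that SQLite checks foreign keys only when the PRAGMA below is set on the connection:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE suppliers (suppno TEXT PRIMARY KEY, suppname TEXT)")
con.execute("""CREATE TABLE prices (
                   partno TEXT,
                   suppno TEXT REFERENCES suppliers(suppno),
                   price  REAL,
                   PRIMARY KEY (partno, suppno))""")
con.execute("INSERT INTO suppliers VALUES (?, ?)", ("S234", "Sam's Stuff"))

con.execute("INSERT INTO prices VALUES (?, ?, ?)", ("P1032", "S234", 2.39))  # OK: S234 exists
try:
    con.execute("INSERT INTO prices VALUES (?, ?, ?)", ("P1032", "S999", 9.99))
except sqlite3.IntegrityError as err:
    print("Rejected:", err)   # no supplier S999, so the reference would lack integrity
```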

2. Systems Operations and Processing Modes

Many transaction processing systems use batch processing, in which transactions are accumulated into groups or batches for processing at some regular interval (e.g., daily, weekly, monthly). These batches of transaction records are usually sorted into the same sequence as the master file records before processing against the master file. Hence, 3 physical files are involved: the transaction file, the old (or current) master file, and the new (or updated) master file.
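
A minimal Python sketch of such a batch update run, assuming both files are already sorted by primary key; the record layouts (“key,” “ytd_hours”) are illustrative:

```python
# Merge the sorted transaction file against the old master to produce a new master.
def batch_update(old_master, transactions):
    new_master = []
    txns = iter(transactions)
    txn = next(txns, None)
    for rec in old_master:
        rec = dict(rec)                            # copy: the old master stays intact
        while txn is not None and txn["key"] <= rec["key"]:
            if txn["key"] == rec["key"]:
                rec["ytd_hours"] += txn["hours"]   # apply this period's activity
            # else: unmatched transaction -- in a real run it goes to an error report
            txn = next(txns, None)
        new_master.append(rec)                     # write to the new master file
    return new_master

old = [{"key": 101, "ytd_hours": 500}, {"key": 102, "ytd_hours": 480}]
txns = [{"key": 101, "hours": 40}, {"key": 102, "hours": 35}]
print(batch_update(old, txns))   # ytd_hours become 540 and 515
```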

Batch processing is appropriate for applications with high activity ratio – that is, a high percentage of records in the master file are affected by each update run. Payroll is a good example.

The other chief mode is on-line processing, in which individual transactions are processed as they are received, usually at their point of origin. On-line real-time processing processes transactions immediately as they happen (or are captured), providing updated information to users on a timely basis.

On-line real-time processing is appropriate for applications with high volatility – that is, there are many changes to the file per hour.

Let’s consider the phases involved in the batch processing of a payroll application, as an example. First there would be data capture, in which data from time cards may be keyed in. In more modern systems, data may be captured electronically, when employees swipe their plastic ID cards through a reader. Then, there would be an edit phase, where errors in the input transactions (e.g., missing employee numbers) may be detected. Then, if this is batch processing, the time data transactions would likely need to be sorted by employee number preceding the master file maintenance phase, in which the employee master file is updated with the current period’s transactions. In the course of, or following, the maintenance, the reporting phase produces internal reports (such as pay by cost center and payroll register for accounting purposes, labor variances for control, and position control report for management) and external reports (such as taxes withheld).
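
A minimal Python sketch of the edit phase alone, with illustrative field names and a made-up reasonableness test on hours:

```python
# Screen input transactions for errors before the master file maintenance run.
def edit_transactions(transactions):
    accepted, errors = [], []
    for txn in transactions:
        if not txn.get("empno"):
            errors.append((txn, "missing employee number"))
        elif not 0 <= txn.get("hours", -1) <= 80:
            errors.append((txn, "hours outside reasonable range"))
        else:
            accepted.append(txn)
    return accepted, errors

accepted, errors = edit_transactions([
    {"empno": "E101", "hours": 40},
    {"empno": "", "hours": 38},        # flagged for correction and resubmission
])
print(len(accepted), "accepted;", errors)
```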

If there is on-line access, authorized personnel can produce queries, such as lists of employees with certain skills, or ad hoc reports designed by users and produced as needed for a particular purpose.

An audit trail should exist in the payroll (or any other) application, by which transactions may be traced from the original time record to the payroll register, and backwards through the phases.

A telecommunications information system uses communication technology to move data between distant points.

Distributed data processing (DDP) distributes the processing of the data to users, so that each user can process his own transactions. It may use a centralized database, in which remote users request data from the central location, process it locally, and then transmit it back. Or, it may distribute the database to the users. With distributed data, the system may use a partitioned database, in which each user gets, at his local workstation, the segment of the database for which he is the primary user. Or, the distributed data may be replicated, where each user gets a complete copy of the database. The replicated database implementation is primarily justifiable to support read-only queries with a high degree of data sharing.

With DDP, we want data concurrency, where each user has access to accurate and up-to-date records. Thus lockout procedures may be necessary, in which software prevents simultaneous accesses to the same data item, where two users may be attempting to change the same record at the same time.
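
A minimal Python sketch of a lockout procedure, using a lock so that two concurrent updates to the same record cannot interleave; the record and quantities are illustrative:

```python
import threading

record = {"qty_on_hand": 100}
record_lock = threading.Lock()

def post_issue(qty):
    with record_lock:                    # a second user waits here until released
        current = record["qty_on_hand"]  # read...
        record["qty_on_hand"] = current - qty   # ...then update, with no interleaving

threads = [threading.Thread(target=post_issue, args=(10,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(record["qty_on_hand"])             # 50 -- no lost updates
```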

A local area network (LAN) is a linked federation of computers in close proximity (same floor or same building).

Each workstation needs a network interface card fitted into one of the PC’s expansion slots. Generally, there is (at least) one server to store common software and data.

A wide area network (WAN) is a network more geographically dispersed. It uses bridges to connect same-type LANs, and gateways to connect different types of LANs, or LANs to WANs, or PCs to mainframes.

The topology (physical arrangement) may be a star (one server computer in the middle, with an individual link from it to each workstation), a hierarchy (connected like an organization chart), a ring (a circle of equal workstations, also called peer-to-peer), or a bus (a single cable, like a bus going down the street, picking up workstation messages and dropping them off).

Instead of purchasing and maintaining its own transmission media for, say, electronic data interchange with a trading partner, a company may use a VAN (value-added network). This is a public network that adds value to the data communications process by handling the interfacing with multiple types of hardware and software used by different companies, each with its own “mailbox” on the VAN.

Many CPA firms use VPNs (virtual private networks) to allow their associates to use the Internet in a secure, encrypted manner to communicate while working outside the office. The remote worker uses the LAN as if he were in the office (except for slower response time).

A client-server system distributes processing between a server (a central file storage site which may search for and distribute an individual record requested by a user) and the clients (workstations which may read or update the record). It can work with different topologies. The server stores shared databases and system software, while individual applications (e.g., spreadsheets) and data may reside on the client workstation.

Client-server systems are replacing mainframe systems, because they use cheaper hardware & software, and they are flexible & expandable. Instead of centralizing all data, applications, and expertise, client-server systems distribute them. Empowering users, they also require more skill from users in technology, output design, & controls.
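
A minimal Python sketch of that division of labor: a toy server searches its shared data for one requested record and ships just that record to the client. The port, record contents, and single-request design are illustrative assumptions; a real system would need proper error handling:

```python
import json
import socket
import threading
import time

RECORDS = {"C100": {"name": "Acme Co.", "balance": 250.0}}   # shared data on the server

def serve_once(port):
    with socket.socket() as srv:
        srv.bind(("127.0.0.1", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            key = conn.recv(1024).decode()                   # the client's request
            conn.sendall(json.dumps(RECORDS.get(key, {})).encode())

threading.Thread(target=serve_once, args=(5050,), daemon=True).start()
time.sleep(0.2)                                              # let the server start (sketch only)

with socket.socket() as cli:                                 # the client workstation
    cli.connect(("127.0.0.1", 5050))
    cli.sendall(b"C100")
    print(json.loads(cli.recv(4096).decode()))               # {'name': 'Acme Co.', ...}
```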

CASE (Computer-aided software engineering) tools are now widely employed to use computer software to build computer software, increasing the productivity of systems professionals. For example, they can take a data flow diagram and lead the developer to create a system based on it.