PDF Data structure and algorithm in Python - Array-Based Sequences - GNU

(1)

Array-Based Sequences

Seongjin Lee

Dept of Aerospace and Software Engineering, Gyeongsang National University

(2)

Python’s Sequence Types

In this chapter, we explore Python’s various “sequence” classes, namely the built-inlist, tuple, andstrclasses.

Commonality

• supports indexing to access an individual element of a sequence, using a syntax such asseq[k];

• uses a low-level concept known as an arrayto represent the sequence.

Diﬀerences

• the abstractions that these classes represent

• the way that instances of these classes are represented internally by Python

2

(5)

Python’s Sequence Types

Commonality

• uses a low-level concept known as anarray to represent the sequence.

2

(6)

Python’s Sequence Types

Commonality

• uses a low-level concept known as anarray to represent the sequence.

Diﬀerences

2

(7)

Because these classes are used so widely in Python programs, and because they will become building blocks upon which we will develop more complex data structures, it is imperative that we establish a clear understanding of both thepublic behavior andinner workingsof these classes.

3

(8)

Low-Level’s Arrays

(9)

Memory Address

Each byte of memory is associated with a unique number that serves as its address.

Memory addresses are typically coordinated with the physical layout of the memory system, and so we often portray the numbers in sequential fashion.

4

(10)

Low-Level’s Arrays

Definition : Array

A group of related variables can be stored one after another in a contiguous portion of the computer’s memory. Such a representation is denoted as array.

5

(11)

Example

A text string is stored as an ordered sequence of individual characters.

We describe this as an array of six characters, even though it requires 12 bytes of memory. We will refer to each location within an array as acell, and will use an integer index to describe its location within the array, with cells numbered starting with 0, 1, 2, and so on.

6

(12)

Low-Level’s Arrays

Each cell of an array must use the same number of bytes. This

requirement is what allows an arbitrary cell of the array to be accessed in constant time based on its index.

Example

Given the memory address at which an array starts, the number of bytes per element, and a disired index within the array, the appropriate memory address can be computed usingstart + cellsize * index.

7

(13)

Referential Arrays

(14)

Referential Arrays

In Python,each cell of the array must use the same number of bytes.

Given a list of names:

[ ’ R e n e ’ , ’ J o s e p h ’ , ’ J a n e t ’ , ’ J o n a s ’ , ’ H e l e n ’ , ’ V i r g i n i a ’ ]

How to represent such a list within an array?

Way 1: Reserve enough space for each cell to hold the maximum length string, but that would be wastfull.

8

(15)

Referential Arrays

How to represent such a list within an array?

string, but that would be wastfull.

8

(16)

Referential Arrays

How to represent such a list within an array?

Way 1: Reserve enough space for each cell to hold the maximum length string, but that would be wastfull.

8

(17)

Way 2: Python represents a list or tuple instance using an internal storage mechanism of an array of objectreferences. At the lowest level, what is stored isa consecutive sequence of memory addresses at which the elements of the sequence reside.

9

(18)

Referential Arrays

Although the relative size of the individual elements may vary,the number of bits used to store the memory address of each element is fixed. In this way,Python can support constant-time access to a list or tuple element based on its index.

10

(19)

A single list instance may includemultiple references to the same object as elements of the list, and it is possible for a single object to be an element of two or more lists, asthose lists simply store referencesback to that object.

11

(20)

Referential Arrays

Example

When you compute a slice of a list, the result is a new list instance, but that new list has references to the same elements that are in the original list.

12

(21)

13

(22)

Referential Arrays

> > > d a t a = [0] * 8

produces a list of length 8, with all 8 elements being the value 0.

Technically,all 8 cells of the list reference the same object.

14

(23)

> > > d a t a [2] += 1

Due to the fact thatthe referenced integer is immutable, the above command does not change the value of the existing integer instance.

15

(24)

Referential Arrays

> > > p r i m e s = [2 ,3 ,5 ,7 ,11 ,13 ,17 ,19]

> > > e x t r a s = [23 ,29 ,31]

> > > p r i m e s . e x t e n d ( e x t r a s )

Theextendcommand is used to add all elements from one list to the end of another list.

The extended list does not receive copies of those elements, it receives references to those elements.

16

(25)

> > > p r i m e s = [2 ,3 ,5 ,7 ,11 ,13 ,17 ,19]

> > > e x t r a s = [23 ,29 ,31]

> > > p r i m e s . e x t e n d ( e x t r a s )

Theextendcommand is used to add all elements from one list to the end of another list.

The extended list does not receive copies of those elements, it receives

references to those elements. 16

(26)

Low-Level’s Arrays

Compact Arrays in Python

(27)

Strings are represented using an array of characters(NOT an array of references).

We will refer to this more direct representation as acompact array becausethe array is storing the bits that represent the primary data (characters, in the case of strings).

17

(28)

Compact Arrays in Python

Compact arrays have several advantages over referential structures in terms of computing performance.

• overall memory usage will be much lower

• the primary data are stored consecutively in memory

18

(29)

Primary support for compact arrays is in a module namedarray. That module defines a class, also namedarray, providing compact storage for arrays of primitive data types.

19

(30)

Compact Arrays in Python

> > > f r o m a r r a y i m p o r t a r r a y

> > > p r i n t( a r r a y ( ’ i ’ , [1 ,2 ,3 ,4 ,5]) ) a r r a y ( ’ i ’ , [1 , 2 , 3 , 4])

> > > p r i n t( a r r a y ( ’ f ’ , [1 ,2 ,3 ,4 ,5]) ) a r r a y ( ’ f ’ , [1.0 , 2.0 , 3.0 , 4 . 0 ] )

> > > p r i n t( a r r a y ( ’ d ’ , [1 ,2 ,3 ,4 ,5]) ) a r r a y ( ’ d ’ , [1.0 , 2.0 , 3.0 , 4 . 0 ] )

20

(31)

21

(32)

Dynamic Arrays

(33)

When creating a low-level array in a computer system, the precise size of that array must be explicitly declared in order for the system to properly allocate a consecutive piece of memory for its storage.

22

(34)

Dynamic Arrays

Because the system might dedicate neighboring memory locations to store other data, the capacity of an array cannot trivially be increased by expanding into subsequent cells. In the context of representing a Python tupleorstrinstance, this constraint is no problem.

23

(35)

Python’s list class presents a more interesting abstraction.

Although a list has a particular length when constructed, the class allows us to add elements to the list, with no apparent limit on the overall capacity of the list.

To provide this abstraction, Python relies on an algorithmic sleight of hand known as adynamic array.

24

(36)

Dynamic Arrays

A list instance maintains an underlying array that often hasgreater capacity thanthe current length of the list.

This extra capacity makes it easy to append a new element to the list by using the next available cell of the array.

25

(37)

Dynamic Arrays

If a user continues to append elements to a list, any reserved capacity will eventually be exhausted.

• In that case, the class requests a new, larger array from the system, and initializes the new array so that its prefix matches that of the existing smaller array.

• At that point in time, the old array is no longer needed, so it is reclaimed by the system.

26

(38)

Dynamic Arrays

If a user continues to append elements to a list, any reserved capacity will eventually be exhausted.

• In that case, the class requests a new, larger array from the system, and initializes the new array so that its prefix matches that of the existing smaller array.

• At that point in time, the old array is no longer needed, so it is reclaimed by the system.

Like hermit crab

26

(39)

listsize.py i m p o r t sys

try:

n = int( sys . a r g v [ 1 ] ) e x c e p t:

n = 100 d a t a = []

for k in r a n g e( n ) : # N O T E : m u s t fix c h o i c e of n a = len( d a t a ) # n u m b e r of e l e m e n t s

b = sys . g e t s i z e o f ( d a t a ) # a c t u a l si z e in b y t e s p r i n t( ’ L e n g t h : { 0 : 3 d }; S i z e in b y t e s : { 1 : 4 d } ’ . f o r m a t( a , b ) )

d a t a . a p p e n d ( N o n e ) # i n c r e a s e l e n g t h by one

27

(40)

Dynamic Arrays

$ p y t h o n l i s t s i z e . py 6

L e n g t h : 0; S i z e in b y t e s : 64 L e n g t h : 1; S i z e in b y t e s : 96 L e n g t h : 2; S i z e in b y t e s : 96 L e n g t h : 3; S i z e in b y t e s : 96 L e n g t h : 4; S i z e in b y t e s : 96 L e n g t h : 5; S i z e in b y t e s : 128

28

(41)

Implementing a Dynamic Array

(42)

Implementing a Dynamic Array

Although the Python list class provides a highly optimized

implementation of dynamic arrays, upon which we rely for the remainder of this book, it is instructive to see how such a class might be

implemented.

The key is to provide means to grow the array A that stores the elements of a list. Of course, we cannot actually grow that array, as its capacity is fixed.

29

(43)

Although the Python list class provides a highly optimized

implementation of dynamic arrays, upon which we rely for the remainder of this book, it is instructive to see how such a class might be

implemented.

The key is to provide means to grow the array A that stores the elements of a list. Of course, we cannot actually grow that array, as its capacity is fixed.

29

(44)

Implementing a Dynamic Array

If an element is appended to a list at a time when the underlying array is full, we perform the following steps:

• Allocate a new array B with larger capacity.

• Set B[i]₌A[i], fori₌0, ...,n1, wherendenotes current number of items.

• Set A₌B, that is, we henceforth useB as the array supporting the list.

• Insert the new element in the new array.

30

(45)

31

(46)

Implementing a Dynamic Array

The remaining issue to consider is how large of a new array to create. A commonly used rule is for the new array to have twice the capacity of the existing array that has been filled.

32

(47)

i m p o r t c t y p e s

c l a s s D y n a m i c A r r a y :

""" A d y n a m i c a r r a y c l a s s ak i n to a s i m p l i f i e d P y t h o n l i s t . """

def _ _ i n i t _ _ ( s e l f ) :

""" C r e a t e an e m p t y a r r a y . """

s e l f . _n = 0

s e l f . _ c a p a c i t y = 1

s e l f . _A = s e l f . _ m a k e _ a r r a y ( s e l f . _ c a p a c i t y )

def _ m a k e _ a r r a y ( self , c ) :

""" R e t u r n new a r r a y wi t h c a p a c i t y c . """

r e t u r n ( c * c t y p e s . p y _ o b j e c t ) ()

33

(48)

Implementing a Dynamic Array

def _ _ l e n _ _ ( s e l f ) :

""" R e t u r n n u m b e r of e l e m e n t s s t o r e d in the a r r a y . """

r e t u r n s e l f . _n

def _ _ g e t i t e m _ _ ( self , k ) :

""" R e t u r n e l e m e n t at i n d e x k . """

if not 0 <= k < s e l f . _n :

r a i s e I n d e x E r r o r ( ’ i n v a l i d i n d e x ’ ) r e t u r n s e l f . _A [ k ]

34

(49)

def _ r e s i z e ( self , c ) :

""" R e s i z e i n t e r n a l a r r a y to c a p a c i t y c . """

B = s e l f . _ m a k e _ a r r a y ( c ) for k in r a n g e( s e l f . _n ) :

B [ k ] = s e l f . _A [ k ] s e l f . _A = B

s e l f . _ c a p a c i t y = c

35

(50)

Implementing a Dynamic Array

def a p p e n d ( self , obj ) :

""" Add o b j e c t to end of the a r r a y . """

if s e l f . _n == s e l f . _ c a p a c i t y : se l f . _ r e s i z e (2 * s e l f . _ c a p a c i t y ) s e l f . _A [ s e l f . _n ] = obj

s e l f . _n += 1

36

(51)

(52)

Eﬃciency of Python’s Sequence Types

Python’s List and Tuple Classes

(53)

37

(54)

Constant Operation

• len(data)

• data[j]

38

(55)

• count

the loop for computing the count must proceed through the entire sequence,

• index,__contains__

the loops for checking containment of an element or determining the index of an element immediately exit once they find the leftmost occurrence of the desired value, if one exists.

39

(56)

Searching for Occurrences of a Value

Example

data = list(range(10000000))

• 5 in data: Best

• 9999995 in data: Middle

• -5 in data: Worst

40

(57)

The asymptotic behavior is proportional to the length of the result.

Example

• The slicedata[6000000:6000008]can be constructed almost immediately because it has only eight elements;

• the slicedata[6000000:7000000]has one million elements, and thus is more time-consuming to create

41

(58)

Mutating Behaviors

42

(59)

The simplest of those behaviors has syntaxdata[j] = val, and is supported by the special__setitem__method. This operation has worst-caseO(1)running time because

• it simply replaces one element of a list with a new value;

• no other elements are aﬀected and the size of the underlying array does not change.

43

(60)

Adding Elements to a List

• Theappendmethod

requires O(n)time because the underlying array is resized, but uses O(1)time in the amortized sense.

44

(61)

• Theinsertmethod

insert(k, value)inserts a given value into the list at index 0_≤k_≤nwhile shifting all subsequent elements back one slot to make room.

45

(62)

Adding Elements to a List

def i n s e r t ( self , k , v a l u e ) :

""" I n s e r t v a l u e at i n d e x k , s h i f t i n g s u b s e q u e n t v a l u e s r i g h t w a r d . """

# ( for s i m p l i c i t y , we a s s u m e 0 <= k <= n in t h i s v e r i o n )

if s e l f . _n == s e l f . _ c a p a c i t y : se l f . _ r e s i z e (2 * s e l f . _ c a p a c i t y ) for j in r a n g e( s e l f . _n , k , -1) :

se l f . _A [ j ] = s e l f . _A [ j -1]

s e l f . _A [ k ] = v a l u e s e l f . _n += 1

46

(63)

Adding Elements to a List

time per operation;

• Inserting at the middle requires about half the time as inserting at the beginning, yet is stillO(n)time;

• Inserting at the end displaysO(1)behavior, akin to append.

47

(64)

Adding Elements to a List

• Inserting at the beginning of a list is most expensive, requiring linear time per operation;

• Inserting at the middle requires about half the time as inserting at the beginning, yet is stillO(n)time;

• Inserting at the end displaysO(1)behavior, akin to append.

47

(65)

• pop(): removes the last element from a list.

This is most eﬃcient, because all other elements remain in their original location. This is eﬀectively an O(1)operation, but the bound is amortized because Python will occasionally shrink the underlying dynamic array to conserve memory.

48

(66)

Removing Elements from a List

• pop(k): removes the element that is at indexk_<nof a list, shifting all subsequent elements leftward to fill the gap that results from the removal.

The eﬃciency of this operation isO(nk), as the amount of shifting depends upon the choice of index k.

49

(67)

• Theremovemethod

remove(value)allows the caller to specify the valuethat should be removed.

The eﬃciency of this operation isO(n). One part of the process searches from the beginning until finding the value at indexk, while the rest iterates fromk to the end in order to shift elements leftward.

50

(68)

Removing Elements from a List

def r e m o v e ( self , v a l u e ) :

""" R e m o v e f i r s t o c c u r r e n c e of v a l u e ( or r a i s e V a l u e E r r o r ) . """

# no t e : we do not c o n s i d e r s h r i n k i n g the d y n a m i c a r r a y in t h i s v e r s i o n

for k in r a n g e( s e l f . _n ) : if s e l f . _A [ k ] == v a l u e :

for j in r a n g e( k , s e l f . _n - 1) : se l f . _A [ j ] = s e l f . _A [ j +1]

se l f . _A [ s e l f . _n - 1] = N o n e

se l f . _n -= 1 # we h a v e one le s s i t e m r e t u r n # e x i t i m m e d i a t e l y r a i s e V a l u e E r r o r ( ’ v a l u e not f o u n d ’ )

51

(69)

• Theextendmethod

• add all elements of one list to the end of a second list

• A call todata.extend(other)produces the same outcome as the code,

for e l e m e n t in o t h e r : da t a . a p p e n d ( e l e m e n t )

In either case, the running time is proportional to the length of the other list, and amortized because the underlying array for the first list may be resized to accommodate the additional elements.

52

(70)

Extending a List

In practice, theextendmethod ispreferable to repeated calls toappend becausethe constant factorshidden in the asymptotic analysis are significantly smaller.

53

(71)

The greater eﬃciency of extend is threefold.

1. There is always some advantage to using an appropriate Python method, because those methods are often implemented natively in a compiled language (rather than as interpreted Python code).

2. There is less overhead to a single function call that accomplishes all the work, versus many individual function calls.

3. Increased eﬃciency of extend comes from the fact that the resulting size of the updated list can be calculated in advance. If the second data set is quite large, there is some risk that the underlying dynamic array might be resized multiple times when using repeated calls to append. With a single call to extend, at most one resize operation will be performed.

54

(72)

Constructing New Lists

• List Comprehension

s q u a r e s = [ k * k for k in r a n g e(1 , n +1) ]

• Loop

s q u a r e s = []

for k in r a n g e (1 , n +1) : s q u a r e s . a p p e n d ( k * k )

The list comprehension syntax is significantly faster than building the list by repeatedly appending.

55

(73)

Initialize a list of constant values using the multiplication operator, as in [0] * nto produce a list of lengthnwith all values equal to zero. It is more eﬃcient than building such a list incrementally.

56

(74)

Using Array-Based Sequences

(75)

Storing High Scores for a Game

(76)

Storing High Scores for a Game

c l a s s G a m e E n t r y :

""" R e p r e s e n t s one e n t r y of a li s t of hi g h s c o r e s . """

def _ _ i n i t _ _ ( self , name , s c o r e ) : s e l f . _ n a m e = n a m e

s e l f . _ s c o r e = s c o r e

def g e t _ n a m e ( s e l f ) : r e t u r n s e l f . _ n a m e

def g e t _ s c o r e ( s e l f ) : r e t u r n s e l f . _ s c o r e

def _ _ s t r _ _ ( s e l f ) :

r e t u r n ’ ({0} , { 1 } ) ’ .f o r m a t( s e l f . _name , s e l f .

_ s c o r e ) # e . g . , ’( Bob , 98) ’ 57

(77)

58

(78)

Storing High Scores for a Game

c l a s s S c o r e b o a r d :

""" Fixed - l e n g t h s e q u e n c e of h i g h s c o r e s in n o n d e c r e a s i n g o r d e r . """

def _ _ i n i t _ _ ( self , c a p a c i t y = 1 0 ) :

""" I n i t i a l i z e s c o r e b o a r d wi t h g i v e n m a x i m u m c a p a c i t y .

All e n t r i e s are i n i t i a l l y N o n e .

"""

s e l f . _ b o a r d = [ N o n e ] * c a p a c i t y s e l f . _n = 0

def _ _ g e t i t e m _ _ ( self , k ) : r e t u r n s e l f . _ b o a r d [ k ]

def _ _ s t r _ _ ( s e l f ) :

r e t u r n ’ \ n ’ . j o i n (str( s e l f . _ b o a r d [ j ]) for j

in r a n g e( s e l f . _n ) ) 59

(79)

60

(80)

Storing High Scores for a Game

def add ( self , e n t r y ) :

s c o r e = e n t r y . g e t _ s c o r e ()

g o o d = s e l f . _n < len( s e l f . _ b o a r d ) or s c o r e >

s e l f . _ b o a r d [ -1]. g e t _ s c o r e ()

if g o o d :

if s e l f . _n < len( s e l f . _ b o a r d ) : se l f . _n += 1

j = s e l f . _n - 1

w h i l e j > 0 and s e l f . _ b o a r d [ j - 1 ] . g e t _ s c o r e () < s c o r e :

se l f . _ b o a r d [ j ] = s e l f . _ b o a r d [ j -1]

j -= 1

se l f . _ b o a r d [ j ] = e n t r y

61

(81)

if _ _ n a m e _ _ == ’ _ _ m a i n _ _ ’ : b o a r d = S c o r e b o a r d (5) for e in (

( ’ Rob ’ , 7 5 0 ) , ( ’ M i k e ’ ,1105) , ( ’ R o s e ’ , 5 9 0 ) , ( ’ J i l l ’ , 7 4 0 ) ,( ’ J a c k ’ , 5 1 0 ) , ( ’ A n n a ’ , 6 6 0 ) , ( ’ P a u l ’ , 7 2 0 ) , ( ’ Bob ’ , 4 0 0 ) ,

) :

ge = G a m e E n t r y ( e [0] , e [ 1 ] ) b o a r d . add ( ge )

p r i n t( ’ A f t e r c o n s i d e r i n g {0} , s c o r e b o a r d is :

’ .f o r m a t( ge ) ) p r i n t( b o a r d ) p r i n t()

62

(82)

Using Array-Based Sequences

Sorting a Sequence

(83)

Starting with an unordered sequence of elements and rearranging them into nondecreasing order.

63

(84)

Insert-Sorting

• First start with the first element in the array. One element by itself is already sorted.

• Then consider the next element in the array. If it is smaller than the first, we swap them.

• Next consider the third element in the array. Swap it leftward until it is in its proper order with the first two elements.

• Then consider the fourth element, and swap it leftward until it is in the proper order with the first three.

• Continue in this manner with the fifth element, the sixth, and so on, until the whole array is sorted.

64

(85)

65

(86)

Insert-Sorting

66

(87)

def i n s e r t i o n _ s o r t ( A ) :

""" S o r t li s t of c o m p a r a b l e e l e m e n t s i n t o n o n d e c r e a s i n g o r d e r . """

for k in r a n g e(1 , len( A ) ) : # fr o m 1 to n -1

cur = A [ k ] # c u r r e n t

e l e m e n t to be i n s e r t e d

j = k # fi n d

c o r r e c t i n d e x j for c u r r e n t

w h i l e j > 0 and A [ j -1] > cur : # e l e m e n t A [ j -1] mu s t be a f t e r c u r r e n t

A [ j ] = A [ j -1]

j -= 1

A [ j ] = cur # cur is

now in the r i g h t p l a c e

67

PDF Data structure and algorithm in Python - Array-Based Sequences - GNU

Array-Based Sequences

Table of contents

Python’s Sequence Types

Commonality

Diﬀerences

Python’s Sequence Types

Commonality

Python’s Sequence Types

Commonality

Diﬀerences

Low-Level’s Arrays

Memory Address

Low-Level’s Arrays

Low-Level’s Arrays

Referential Arrays

Referential Arrays

How to represent such a list within an array?

Referential Arrays

How to represent such a list within an array?

Referential Arrays

How to represent such a list within an array?

Referential Arrays

Referential Arrays

Referential Arrays

Referential Arrays

Low-Level’s Arrays

Compact Arrays in Python

Compact Arrays in Python

Compact Arrays in Python

Dynamic Arrays

Dynamic Arrays

Dynamic Arrays

Dynamic Arrays

Dynamic Arrays

Like hermit crab

Dynamic Arrays

Implementing a Dynamic Array

Implementing a Dynamic Array

Implementing a Dynamic Array

Implementing a Dynamic Array

Implementing a Dynamic Array

Implementing a Dynamic Array

Eﬃciency of Python’s Sequence Types

Python’s List and Tuple Classes

Constant Operation

Searching for Occurrences of a Value

Mutating Behaviors

Adding Elements to a List

Adding Elements to a List

Adding Elements to a List

Adding Elements to a List

Removing Elements from a List

Removing Elements from a List

Extending a List

Constructing New Lists

Using Array-Based Sequences

Storing High Scores for a Game

Storing High Scores for a Game

Storing High Scores for a Game

Storing High Scores for a Game

Using Array-Based Sequences

Sorting a Sequence

Insert-Sorting

Insert-Sorting