
http://dx.doi.org/10.3837/tiis.2016.07.017 ISSN : 1976-7277

A Semi-Markov Decision Process (SMDP) for Active State Control of A

Heterogeneous Network

Janghoon Yang1

1Department of New Media Contents, Seoul Media Institute of Technology 661 Deungchon-dong, Kangseo-gu, Seoul 157-030, Korea

[e-mail: jhyang@smit.ac.kr]

*Corresponding author: Janghoon Yang

Received December 21, 2015; revised May 5, 2016; accepted May 31, 2016; published July 31, 2016

Abstract

Due to growing demand on wireless data traffic, a large number of different types of base stations (BSs) have been installed. However, space-time dependent wireless data traffic densities can result in a significant number of idle BSs, which implies the waste of power resources. To deal with this problem, we propose an active state control algorithm based on semi-Markov decision process (SMDP) for a heterogeneous network. A MDP in discrete time domain is formulated from continuous domain with some approximation. Suboptimal on-line learning algorithm with a random policy is proposed to solve the problem. We explicitly include coverage constraint so that active cells can provide the same signal to noise ratio (SNR) coverage with a targeted outage rate. Simulation results verify that the proposed algorithm properly controls the active state depending on traffic densities without increasing the number of handovers excessively while providing average user perceived rate (UPR) in a more power efficient way than a conventional algorithm.

Keywords:MDP, active state control, load balancing, heterogeneous networks, handover


3172 Yang: A Markov Decision Process (MDP) based Load Balancing Algorithm for Multi-cell Networks with Multi-carriers

1. Introduction

As mobile data traffic increases in a cellular network, the density of cells increases to provide satisfactory wireless service. In the past, micro cells or pico cells were installed to deal with this problem. Nowadays various types of cells, including femto cells, are deployed, which constitutes a heterogeneous cellular network. As the cellular system evolves, a variety of heterogeneities arises. In particular, the Internet of Things (IoT) is likely to have a great impact on the architecture of a next generation cellular system. From the perspective of cellular systems, more heterogeneous cells are likely to appear. Severe heterogeneity in cells may incur several critical issues such as load balancing, handover, and inter-cell interference. Thus, controlling cellular system parameters to optimize performance for heterogeneous networks will be a very difficult task.

As a large number of cells are installed in a system, managing cells in an energy-efficient way also raises some issues. The energy consumption incurred by cellular systems is known to account for several tenths of a percent of the world's energy consumption [1]. To address energy efficiency more systematically, the energy efficiency evaluation framework (E3F) was developed [2]. According to the base station (BS) power consumption model of [3], the power amplifier usually consumes 55-60% of the overall power at full load for macro cells, but only around 30% for small cells. Power control at BSs can be broadly classified into transmit power control, which manages interference to improve quality of service (QoS), and active power state control for energy efficiency (EE). The former reduces power consumption only marginally. Thus, active state control that accounts for QoS needs to be developed to enable energy-efficient operation under a given cell load condition.

Several studies of this problem have been conducted over the past few years. A joint active state control and cell association scheme, mapped to a Knapsack-like problem, was proposed to minimize the total energy used in the network [27]. Since brute-force optimal BS control has complexity exponential in the number of BSs, many approaches take the form of a greedy algorithm [4][8][17][19]. In [8], active set control was applied to a subset of BSs deployed to support peak traffic, while the others were always kept on. Starting with all BSs in the active state, the dynamic cell reconfiguration algorithm of [19] sequentially switches off BSs in a greedy way to minimize total power under a cell load constraint. Discontinuous transmission schemes have also been considered for energy efficiency [7][18]. In [7], BSs were switched off to save energy at the expense of transmission delay, without offloading the mobile traffic, while [18] sought optimal delay with a cost function of average delay and average power. Some novel methods were also introduced [5][10]. An ecological self-organization based method for distributed inter-BS cooperation heuristically switches off a BS when its cell load falls below a certain threshold, and turns it back on at the request of surrounding neighbor cells asking for load sharing [10]. Tabu search was exploited to find an active BS set providing the best BS-RS (relay station) association that minimizes energy consumption under a QoS constraint [5]. Simpler methods, such as choosing one of several predefined sets of active BSs based on mobile traffic density [6] or determining the active state from the average distance between mobile stations (MSs) and a BS [11], have also been developed.

Active state control can be formulated as a sequential decision problem. One of the most efficient methods for solving a sequential decision problem is to exploit the framework of the Markov decision process (MDP). Several studies have applied MDP to wireless network optimization problems such as call admission control for multiple radio access


technologies (RATs) [12][14] and joint radio resource management [13]. Only a few are closely related to network optimization for EE. Femto cell active state control in a heterogeneous network, using an MDP for a very specific model, tried to minimize total power by maximizing total reward [16]. Delay-optimal control of BS discontinuous transmission was formulated as a partially observable MDP (POMDP) with a cost function of average delay and average power, rather than focusing on energy-efficient operation [18].

There are two important limitations in existing related research. First, much of it focuses on homogeneous networks, in which the effect of differing coverage is not considered properly. There are some studies of heterogeneous networks; however, most of them control the active state of only a single type of cell. A policy for controlling the sleep mode of small cells in a heterogeneous network based on stochastic geometry was developed to maximize energy efficiency (EE) in [23]. Algorithms for switching macro cells on and off in a heterogeneous network have been developed to improve EE in [24] and to trade off EE against the cost of mobile network operators (MNOs) with a combinatorial auction framework in [25]. Second, many existing algorithms do not consider broadcast coverage for the signaling channel when the active state is controlled. One common assumption in existing research on energy-efficient optimization of a wireless network is that all MSs are within the coverage of some cell regardless of the number of active cells. However, this can hardly be true in a realistic network environment, and it is especially problematic in a heterogeneous network where small cells and macro cells co-exist. In [8], the coverage of the wireless network was explicitly considered in active state control. However, the coverage condition had to be verified whenever a BS was switched off, which could not be implemented in a realistic system due to the resulting service instability. An optimal density of micro BSs and macro BSs for energy-efficient operation under a coverage constraint was also calculated numerically using stochastic geometry theory [15]. However, to the best of the author's knowledge, a method for controlling the active state of each cell in consideration of cell coverage has not been developed.

In this paper, we propose an active state control (ASC) algorithm based on the framework of the semi-Markov decision process (SMDP) for energy-efficient operation of a heterogeneous network under a coverage constraint. The algorithm starts by learning the channel environment from the measurement reports of each MS so as to identify the feasible sets for active state control that satisfy the coverage condition. It then learns whether each BS should be switched off or on depending on cell load.

The rest of the paper is organized as follows. A system model is given in Section 2. In Section 3, the problem is formulated in terms of the power of each cell and the coverage constraint. In Section 4, the SMDP framework for minimizing power under the coverage constraint, which approximately transforms the problem from the continuous domain to the discrete domain, is developed by defining the associated cost, transition probabilities, actions, and states. Since the transition probabilities are not available in closed form, an on-line reinforcement learning algorithm is introduced in Section 5. In Section 6, the simulation setup is specified, and the performance of the proposed ASC algorithm is verified through numerical simulation. We make some concluding remarks in Section 7.

2. System Model

We consider a heterogeneous downlink multi-cell system with frequency reuse-1 in which different types of cells coexist. For simplicity, two types of cells, macro cells and small cells, are considered throughout this paper, without loss of generality. There are $N_{MC}$


macro cells and $N_{SC}$ small cells. Each BS may have a single sector or multiple sectors depending on the network design. With a slight abuse of terminology, we refer to a sector as a cell throughout this paper. We make several assumptions to restrict the scope of this research and to clarify the considered system setup.

Every BS has only one transmit antenna, to be free from dependency on the type of multi-input multi-output (MIMO) technique and to focus on the characterization of the ASC algorithm. We differentiate macro cells and small cells only by transmit power. Each BS is perfectly synchronized to a common clock, such as the global positioning system (GPS), so that there is no synchronization error in the received signal of an MS. Likewise, each MS keeps perfect synchronization with its serving BS and has perfect channel estimation. There exists a centralized processor which gathers perfect information on the loading of each cell and controls its active state. In practice there may be some uncertainty in this information due to delay and estimation error. However, this is unlikely to have a significant effect on performance, since active state control is performed on a long-term basis, on the order of minutes or hours, and averaging cell load over a sufficient time interval provides stable statistics.

One of the important factors influencing cell load is the mobile traffic distribution. The arrival of mobile traffic is assumed to follow a Poisson point process with arrival rate per unit area $\lambda$, where the size of each traffic request is exponentially distributed with mean $1/\mu$. Consequently, the traffic density $\gamma$ can be calculated as $\gamma = \lambda/\mu$. For simplicity, we focus on homogeneous mobile traffic; the extension to the inhomogeneous case is straightforward by making these quantities location dependent. A system load density associated with cell $c$ can be defined as follows [4][5]:

$$\varrho_c(z, F_{ON}) = \frac{\gamma}{r_c(z, F_{ON})} \qquad (1)$$

where $F_{ON}$ is the set of active cells and $r_c(z, F_{ON})$ is the achievable transmission rate, given the quality of the radio frequency channel, at location $z$ served by cell $c$. Exploiting (1), one can define the loading of cell $c$ as

$$L_c = \int_R \varrho_c(z, F_{ON})\, q_c(z)\, dz \qquad (2)$$

where $q_c(z)$ is the probability of being associated with cell $c$ at location $z$, and $R$ is the region of the system. $q_c(z)$ is 0 if location $z$ is out of the coverage of cell $c$.
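As an illustration of (1) and (2), the cell load can be approximated numerically by summing the load density over a grid. The rate model, cell position, region, and traffic parameters below are illustrative assumptions for a single cell, not the paper's simulation setup.

```python
import math

def achievable_rate(z, bs_pos, bandwidth=20e6, tx_power=20.0, noise=1e-13):
    """Shannon rate (bit/s) at location z from a BS, path-loss exponent 4 (assumed)."""
    d = max(math.dist(z, bs_pos), 1.0)
    snr = tx_power * d ** -4 / noise
    return bandwidth * math.log2(1.0 + snr)

def cell_load(bs_pos, gamma, region, step):
    """Grid approximation of L_c = integral of (gamma / r_c(z)) q_c(z) dz,
    with q_c(z) = 1 inside the region (single-cell illustration of (2))."""
    (x0, x1), (y0, y1) = region
    load = 0.0
    x = x0
    while x < x1:
        y = y0
        while y < y1:
            # rho_c(z) = gamma / r_c(z), weighted by the grid-cell area
            load += gamma / achievable_rate((x, y), bs_pos) * step * step
            y += step
        x += step
    return load

# gamma = lambda / mu, e.g. arrivals per m^2 per second times mean bits per arrival
load = cell_load(bs_pos=(0.0, 0.0), gamma=0.1,
                 region=((-500.0, 500.0), (-500.0, 500.0)), step=50.0)
print(load)
```

A finer grid step tightens the approximation of the integral at the cost of more rate evaluations.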

Cell coverage usually depends on the signal to noise ratio (SNR) or the signal to interference plus noise ratio (SINR). For simplicity, we consider SNR as the measure defining coverage throughout this paper. Let $E_C(z)$ be the event that the maximum SNR over the cells belonging to the set $C$ is less than $\Gamma_T$ at location $z$ when they are active. The probability of this event can be expressed as

$$\Pr(E_C(z)) = \Pr\Big(\max_{c \in C} \gamma_c(z) < \Gamma_T\Big) = \prod_{c \in C} \Pr(\gamma_c(z) < \Gamma_T) \qquad (3)$$

where $\gamma_c(z)$ is the SNR of the signal received from cell $c$. Exploiting this probability, one may define a feasible set collection $F$ consisting of the sets of active cells which support outage level $\delta$ over the region $R$ in the following way:

$$F = \Big\{ F_i \,\Big|\, \int_R \Pr(E_{F_i}(z))\, f_Z(z)\, dz \le \delta \Big\} \qquad (4)$$


where $f_Z(z)$ is the geometric traffic density. The active set control problem of minimizing total power under the outage constraint can be formulated as

$$\min_{F_i \in F} \sum_{c \in F_i} P_c \qquad (5)$$

where $P_c$ is the power consumption of cell $c$. Even though the solution of this problem minimizes total transmit power, the quality of service (QoS) may not be good enough for it to be a practical solution. One may alternatively add a loading constraint to balance the traffic and provide QoS:

$$\min_{F_i \in F} \sum_{c \in F_i} \big( P_c + \eta_c (L_c - L_T) \big) \qquad (6)$$

where $\eta_c$ is a nonnegative constant parameter and $L_T$ is the allowable maximum loading at each cell. It is noted that computing the optimal solution to (5) or (6) is a combinatorial problem whose computational complexity is exponential in the number of cells. Throughout this paper, we propose a suboptimal solution that tackles this problem with moderate complexity. The notation and the acronyms used in this paper are summarized in Table 1 and Table 2, respectively.
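A brute-force solver for (5) makes the exponential complexity concrete: it enumerates every subset of cells, filters by a coverage test, and keeps the cheapest feasible set. The toy layout, the distance-based coverage rule standing in for the SNR-outage constraint (4), and the power values are all hypothetical.

```python
from itertools import combinations

cells = {  # hypothetical cell id -> (position, power consumption in watts)
    "macro": ((0.0, 0.0), 20.0),
    "small1": ((300.0, 0.0), 1.0),
    "small2": ((-300.0, 0.0), 1.0),
}
users = [(250.0, 0.0), (-250.0, 0.0)]
RADIUS = {"macro": 600.0, "small1": 200.0, "small2": 200.0}  # assumed coverage radii

def covered(active):
    """Stand-in feasibility test: every user within range of some active cell."""
    return all(any(((ux - cells[c][0][0]) ** 2 + (uy - cells[c][0][1]) ** 2) ** 0.5
                   <= RADIUS[c] for c in active) for ux, uy in users)

best, best_power = None, float("inf")
for k in range(1, len(cells) + 1):          # 2^N - 1 subsets: exponential in N
    for active in combinations(cells, k):
        if covered(active):
            power = sum(cells[c][1] for c in active)
            if power < best_power:
                best, best_power = set(active), power
print(best, best_power)
```

Here the two small cells together cover both users for 2 W, beating the 20 W macro-only solution; with realistic cell counts this enumeration is intractable, which motivates the suboptimal approach of the paper.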

Table 1. Description of notation.

Notation Description
N_MC number of macro cells
N_SC number of small cells
λ arrival rate per unit area
1/μ average traffic size
γ = λ/μ traffic density
ϱ_c system load density associated with cell c
L_c cell load at cell c
P_c power consumption of cell c
β positive discount parameter for defining cost in the continuous time domain
l_MC average cell load of the macro cell type
l_SC average cell load of the small cell type
s_MC discrete cell load state of the macro cell type
s_SC discrete cell load state of the small cell type
T_H threshold for high cell load
T_L threshold for low cell load
δ outage threshold


Table 2. Summary of acronyms and their meanings.

Acronym Description
BS Base Station
MDP Markov Decision Process
SMDP Semi-Markov Decision Process
SNR Signal to Noise Ratio
UPR User Perceived Rate
ASC Active State Control
MS Mobile Station
SCM Spatial Channel Model
3GPP The 3rd Generation Partnership Project
HO Handover
M-GOFF Modified Greedy OFF

3. Problem Formulation

In this section, we formulate a semi-Markov decision process for controlling the active state of cells.

One can easily see that the solution of (5) is the same regardless of mobile traffic conditions; it depends only on the geometry of the system environment. This solution may not be a good working solution when there is space-time variation in the volume of data traffic. The solution of (6), in contrast, may provide a trade-off between power minimization and cell load balancing. We implicitly exploit this fact to formulate the problem as a semi-Markov decision process.

Let $x_k$, $t_k$, and $a_k$ be the state, the continuous time, and the action at time step $k$, respectively. A transition distribution can be expressed as

$$T_{i,j}(\tau, a) = \Pr\{ t_{k+1} - t_k \le \tau,\; x_{k+1} = j \mid x_k = i,\; a_k = a \} \qquad (7)$$

With simple manipulation, this distribution can be expressed as

$$T_{i,j}(\tau, a) = P_{i,j}(a)\, f_{i,j}(\tau \mid a) \qquad (8)$$

where $f_{i,j}(\tau \mid a) = \Pr\{ t_{k+1} - t_k \le \tau \mid x_k = i,\, x_{k+1} = j,\, a_k = a \}$ and the transition probability is $P_{i,j}(a) = \lim_{\tau \to \infty} T_{i,j}(\tau, a)$. One can define the total discounted cost under policy $\pi = [a_0, a_1, \ldots]$ starting from state $i$ as

$$C^{\pi}(i) = \lim_{N \to \infty} E\Big\{ \sum_{k=0}^{N-1} \int_{t_k}^{t_{k+1}} e^{-\beta t}\, g(x_k, a_k)\, dt \,\Big|\, x_0 = i \Big\} \qquad (9)$$

where $g(x_k, a_k)$ is the cost per stage, assumed constant over the interval $[t_k, t_{k+1}]$, and $\beta$ is a positive discount parameter in the continuous time domain. $C^{\pi}(i)$ can be decomposed into two parts, the expected single-stage cost $G(i, a_0(i))$ and the expected cost-to-go from the next state [20]:

$$C^{\pi}(i) = G(i, a_0(i)) + E\{ e^{-\beta t_1} C^{\pi_1}(j) \mid i, a_0(i) \} \qquad (10)$$

where $\pi_1 = [a_1, a_2, \ldots]$,

$$G(i, a_0(i)) = \sum_{j=1}^{N_S} P_{ij}(a_0(i)) \int_0^{\infty} \Big( \int_0^{u} e^{-\beta t}\, g(i, a_0(i))\, dt \Big)\, f_{ij}(u \mid a_0(i))\, du,$$

and $N_S$ is the number of states. The second term on the right side of (10) can be further calculated as

$$E\{ e^{-\beta t_1} C^{\pi_1}(j) \mid i, a_0(i) \} = E\big\{ E\{ e^{-\beta t_1} \mid i, a_0(i), j \}\, C^{\pi_1}(j) \,\big|\, i, a_0(i) \big\} = \sum_{j=1}^{N_S} P_{ij}\, \hat{\beta}_{ij}(a_0(i))\, C^{\pi_1}(j) \qquad (11)$$

where $\hat{\beta}_{ij}(a_0(i)) = \int_0^{\infty} e^{-\beta \tau}\, df_{ij}(\tau \mid a_0(i))$. Using (11), (10) can be rearranged as

$$C^{\pi}(i) = G(i, a_0(i)) + \sum_{j=1}^{N_S} P_{ij}\, \hat{\beta}_{ij}(a_0(i))\, C^{\pi_1}(j) \qquad (12)$$

Even though this equation appears to be the same as the system equation of a discrete MDP, it is not, since $\sum_j P_{ij} \hat{\beta}_{ij}(a_0(i))$ is not equal to 1. Thus, we derive the system equation of a Markov decision process with discounted cost from an upper bound on $P_{ij} \hat{\beta}_{ij}(a_0(i))$:

$$C^{\pi}(i) \le G(i, a_0(i)) + \alpha \sum_{j=1}^{N_S} P_{ij}\, C^{\pi_1}(j) \qquad (13)$$

$$\alpha = \max\{ \hat{\beta}_{ij}(a) \mid i \in S,\; a \in A,\; j \in S \} \qquad (14)$$

For instance, if the transition time follows an exponential distribution with transition rate $b_{ij}(a)$, then $\alpha$ can be determined as $\max\{ 1/(1 + \beta/b_{ij}(a)) \mid i \in S, a \in A, j \in S \}$. Setting $\alpha$ as in (14), we can find the relationship between $\alpha$ and $\beta$ with the following proposition.

Proposition 1: $\alpha$ is monotonically non-increasing in $\beta$.

Proof: Since $\hat{\beta}_{ij}(a_0(i))$ is monotonically non-increasing in $\beta$, its maximum over $\{(i,a,j) \mid i \in S, a \in A, j \in S\}$ is also monotonically non-increasing.

This proposition implies that a larger $\alpha$ is equivalent to a smaller $\beta$ in the continuous time domain. That is, setting $\alpha$ larger in the discrete domain corresponds to weighting future cost more aggressively in the continuous time domain.

The expected single-stage cost $G(i, a_0(i))$ can be upper bounded, with approximation, in the following way:

$$G(i, a_0(i)) = E\Big\{ \int_0^{\tau} e^{-\beta t}\, g(i, a_0(i))\, dt \Big\} = g(i, a_0(i))\, E\Big\{ \frac{1 - e^{-\beta \tau}}{\beta} \Big\} \le g(i, a_0(i))\, \frac{1 - \exp(-\beta\, \bar{\tau}_i(a_0(i)))}{\beta} \approx g(i, a_0(i))\, \bar{\tau}_i(a_0(i)) \qquad (15)$$

where $\bar{\tau}_i(a_0(i)) = \sum_{j=1}^{N_S} P_{ij}(a_0(i))\, E\{ \tau \mid i, a_0(i), j \}$. In (15), the inequality and the approximation come from Jensen's inequality and a Taylor series expansion, respectively. It is noted that the approximation is valid when $E\{\tau \mid i, a_0(i), j\}$ is close to 0.
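For the exponential-transition example following (14), $\hat{\beta}_{ij}$ has the closed form $E[e^{-\beta\tau}] = b_{ij}(a)/(b_{ij}(a)+\beta)$, so $\alpha$ and the monotonicity claimed in Proposition 1 can be checked numerically. The transition rates below are arbitrary illustrative values.

```python
import math
import random

def alpha(rates, beta):
    """alpha from (14) for exponential sojourn times: max of b/(b+beta)."""
    return max(b / (b + beta) for b in rates)

rates = [0.2, 0.5, 1.0, 2.0]               # hypothetical b_ij(a) values
alphas = [alpha(rates, beta) for beta in (0.1, 0.5, 1.0, 2.0)]
print(alphas)                               # non-increasing in beta (Proposition 1)

# Monte Carlo confirmation of beta_hat = E[exp(-beta*tau)] for one rate
random.seed(0)
beta, b = 0.5, 1.0
est = sum(math.exp(-beta * random.expovariate(b)) for _ in range(200_000)) / 200_000
print(est, b / (b + beta))                  # the two values should nearly agree
```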


4. Markov Decision Process for Active State Control

In this section, we explicitly define the elements of the MDP for active state control in a heterogeneous network. An MDP is described by a tuple $(S, A, T, C)$, where $S$ is a set of states, $A$ is a set of actions, $T = \{P_{ij}(a) \mid i \in S, a \in A, j \in S\}$ is a set of transition probabilities, and $C : S \times A \to \mathbb{R}$ is a cost function. In an MDP, the next state and the cost depend only on the current state and the chosen action, which is the Markov property. An ASC problem can be formulated in many different ways as an MDP by defining the elements of the MDP differently.

It is known that an MDP can suffer from the curse of dimensionality when the number of states is large. Thus, the proposed MDP is constructed such that it can be implemented with moderate complexity.

State information is usually used for determining an action. One may determine the active state of each cell based on the active states of other cells, the loading of each cell, the number of MSs in the system, and so forth. Theoretically, considering every possible variable in defining the state may provide the best performance. However, it can lead to huge complexity and excessive convergence time if some parameters must be learned. Thus, we define a state as a two-dimensional vector $(l_{MC}, l_{SC})$, where $l_{MC}$ and $l_{SC}$ represent the average cell loads of the macro cells and the small cells. Since the decision on the active state of each cell is closely related to average cell load, this information is taken as the state. Even though the state has only two dimensions, the number of states is infinite if the state takes continuous values. Thus, we consider a discretized state space:

$$S = \{ (s_{MC}, s_{SC}) \mid s_{MC}, s_{SC} \in \{H, M, L\} \} \qquad (16)$$

where $s_{MC}$ and $s_{SC}$ are the discrete cell load states of the macro cells and the small cells, and $H$, $M$, and $L$ represent three cell load states depending on the average cell load. There is no strict rule for defining the cell load state; it may depend on the goal of system operation and the service management scheme. In this paper, it is set by defining the intervals of cell load that determine the cell load state:

$$s_i = \begin{cases} H, & \text{if } l_i \ge T_H \\ L, & \text{if } l_i \le T_L \\ M, & \text{otherwise} \end{cases} \qquad (17)$$

where $T_H$ and $T_L$ are the thresholds for high and low cell load, respectively, and $i \in \{MC, SC\}$.
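The state mapping of (16)-(17) can be sketched in a few lines. The thresholds 0.9 and 0.7 are the values used later in the simulation setup.

```python
T_H, T_L = 0.9, 0.7   # high/low load thresholds from the simulation setup

def load_state(l):
    """Map an average cell load in [0, 1] to 'H', 'M', or 'L' per (17)."""
    if l >= T_H:
        return "H"
    if l <= T_L:
        return "L"
    return "M"

def state(l_mc, l_sc):
    """Two-dimensional discrete state (s_MC, s_SC) of (16)."""
    return (load_state(l_mc), load_state(l_sc))

print(state(0.95, 0.4))   # ('H', 'L')
print(state(0.8, 0.75))   # ('M', 'M')
```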

For active state control, the action space may be simply defined as turning on and turning off.

However, since we are interested in a heterogeneous system, we distinguish actions depending on the type of cell. Consequently, the action space can be defined as

$$A = \{ a \mid a \in \{ M_{On}, S_{On}, M_{Off}, S_{Off} \} \} \qquad (18)$$

where $M_{On}$, $S_{On}$, $M_{Off}$, and $S_{Off}$ represent "turn on a macro cell", "turn on a small cell", "turn off a macro cell", and "turn off a small cell", respectively. One may define a more refined action space that also specifies the set of cells which need to change their active states. However, that would make the MDP too complicated. Thus, the decision on this set is made rather heuristically, as defined in a subsequent section.

Taking an action in the current state incurs cost. Even though an objective function is given in (6), it cannot be used directly, since it requires deciding the weight parameter in (6). In addition, the active state depends on the cell load condition. There are infinitely many ways to define the cost function, and no systematic method of constructing it is known to the best of the author's knowledge. Thus, from the observation that it is reasonable to control the active state of a cell based on cell loads, the single-stage cost is proposed as follows:

$$G_{-0.5\, l_{MC}} = \{ ((H,*), M_{On}) \}, \quad G_{-0.5\, l_{SC}} = \{ ((*,H), S_{On}) \},$$
$$G_{-0.5(1 - l_{MC})} = \{ ((L,*), M_{Off}) \}, \quad G_{-0.5(1 - l_{SC})} = \{ ((*,L), S_{Off}) \},$$
$$G_{0.5} = \{ ((H,*), M_{Off}),\; ((*,H), S_{Off}),\; ((L,*), M_{On}),\; ((*,L), S_{On}) \} \qquad (19)$$

where $G_{\kappa} \subset \{(s,a) \mid s \in S, a \in A\}$ denotes the set of state-action pairs whose single-stage cost is $\kappa$, and $*$ means that any state value is allowed. The cost is designed to take values between -0.5 and 0.5. A cost of 0.5 is assigned to the state-action pairs which are not desired to happen. The cost of turning on a cell when the average cell load is high is designed to decrease linearly with average cell load, while the cost of turning off a cell when the average cell load is low increases linearly with average cell load. No cost is defined for the state $(M, M)$, since it is considered a terminal state.

For the approximate MDP defined in (13), the system equations for optimality can be written as [20]

$$V^*(s) = \min_{a \in A} \Big[ G(s,a) + \alpha \sum_{s'} P_{s,s'}(a)\, V^*(s') \Big] \quad \text{for } s \in S \qquad (20)$$

where $V^*(s)$ is called the value function. This is the Bellman equation for the MDP. The corresponding optimal policy $\pi^*$ can be expressed as

$$\pi^*(s) = \arg\min_{a \in A} \Big[ G(s,a) + \alpha \sum_{s'} P_{s,s'}(a)\, V^*(s') \Big] \qquad (21)$$

This equation is usually solved through value iteration or policy iteration [20]. To do so, knowledge of the transition probabilities is required. However, this information is not available in the given problem formulation. When it is not available, a heuristic algorithm or a learning algorithm can be an alternative.

5. Reinforcement learning for Active State Control

Some learning algorithms are known to converge to the optimal solution under certain conditions [21]. These learning algorithms are broadly classified into off-line and on-line algorithms. Since an off-line algorithm requires precise modeling of the physical environment, and its performance may degrade in the presence of model mismatch, we focus on an on-line learning algorithm.

To derive the on-line learning algorithm, we first define the Q-factor $Q(s,a)$, often called the action-value function. The Bellman equation for the optimal Q-factor $Q^*(s,a)$ can be expressed as

$$Q^*(s,a) = G(s,a) + \alpha \sum_{s'} P_{s,s'}(a) \min_{a' \in A} Q^*(s',a') \quad \text{for } s \in S \text{ and } a \in A \qquad (22)$$

One of the most efficient algorithms for approximating $Q^*(s,a)$ is the Sarsa algorithm [22], which updates the Q-value and the policy at each step [22]. The updating equation of the Sarsa algorithm at the $k$th step is

$$Q(s_k, a_k) \leftarrow Q(s_k, a_k) + \nu \big( G(s_k, a_k) + \alpha\, Q(s_{k+1}, a_{k+1}) - Q(s_k, a_k) \big) \qquad (23)$$

where $\nu$ is a step size which determines the averaging width. At each step, once the current state and action are given, the associated single-stage cost and the next state can be determined. The action


of the next stage is decided from a given policy. The policy may be updated at each step with the updated Q-values if necessary. This procedure is repeated at every step. Convergence is guaranteed with an ε-greedy policy if all state-action pairs are visited asymptotically infinitely often [21].
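The Sarsa update (23), written for cost minimization, can be sketched on a toy environment. The two-state dynamics, costs, and ε-greedy exploration below are illustrative stand-ins for the ASC setting, not the paper's simulator.

```python
import random

random.seed(1)
ALPHA, NU, EPS = 0.9, 0.1, 0.1
STATES, ACTIONS = ["low", "high"], ["off", "on"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(s, a):
    """Toy dynamics: 'on' drives load low, 'off' drives it high."""
    s_next = "low" if a == "on" else "high"
    cost = 0.5 if (s, a) in {("low", "on"), ("high", "off")} else -0.5
    return cost, s_next

def choose(s):
    """Epsilon-greedy action over Q (minimization)."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return min(ACTIONS, key=lambda a: Q[(s, a)])

s = "low"
a = choose(s)
for _ in range(5000):
    g, s_next = step(s, a)
    a_next = choose(s_next)                      # on-policy: next action first
    Q[(s, a)] += NU * (g + ALPHA * Q[(s_next, a_next)] - Q[(s, a)])  # eq. (23)
    s, a = s_next, a_next

print(min(ACTIONS, key=lambda a: Q[("high", a)]))  # learned: turn on when high
```

Unlike Q-learning, Sarsa bootstraps from the action the policy actually takes next, which is why the policy and the Q-update interact at every step.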

At each step, an action needs to be determined from a policy, which is a mapping from a state to an action. We adopt a heuristic random policy which combines a deterministic policy and a softmax policy:

$$\pi(a \mid s) = \frac{e^{-Q(s,a)}}{\sum_{a' \in A} e^{-Q(s,a')}} \quad \text{for } s \in \{ (L,L), (L,H), (H,L), (H,H) \},$$
$$\pi(M_{Off} \mid (L,M)) = 1, \quad \pi(S_{Off} \mid (M,L)) = 1, \quad \pi(S_{On} \mid (M,H)) = 1, \quad \pi(M_{On} \mid (H,M)) = 1 \qquad (24)$$

The proposed policy uses a softmax policy for the states consisting of combinations of $L$ and $H$, and a deterministic policy for the other states. The softmax policy is applied to the cases where there is ambiguity in choosing a proper action for a given state, whereas the deterministic policy is applied to the states for which the proper action is intuitively clear. For example, when $s = (L,L)$ it is quite reasonable to turn off a macro cell or a small cell to lower the average cell load, but it is unclear whether it is better to turn off a macro cell or a small cell. Thus, the softmax policy probabilistically determines which action is taken.
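The mixed policy of (24) can be sketched as follows: softmax over $e^{-Q}$ for the ambiguous L/H combinations, a fixed action elsewhere, with the terminal state $(M,M)$ excluded. The Q-values used to drive the draw are hypothetical.

```python
import math
import random

random.seed(0)
ACTIONS = ["M_On", "S_On", "M_Off", "S_Off"]
DETERMINISTIC = {("L", "M"): "M_Off", ("M", "L"): "S_Off",
                 ("M", "H"): "S_On", ("H", "M"): "M_On"}
SOFTMAX_STATES = {("L", "L"), ("L", "H"), ("H", "L"), ("H", "H")}

def policy(s, Q):
    """Sample an action per (24): deterministic or softmax depending on s."""
    if s in DETERMINISTIC:
        return DETERMINISTIC[s]
    weights = [math.exp(-Q[(s, a)]) for a in ACTIONS]   # lower Q -> likelier
    return random.choices(ACTIONS, weights=weights)[0]

Q = {(s, a): 0.0 for s in SOFTMAX_STATES for a in ACTIONS}
Q[(("L", "L"), "M_Off")] = -2.0   # hypothetical learned preference

picks = [policy(("L", "L"), Q) for _ in range(2000)]
print(picks.count("M_Off") / len(picks))  # the preferred action dominates
print(policy(("H", "M"), Q))              # always 'M_On'
```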

The action determined by the reinforcement learning algorithm does not specify the set of cells to be controlled at each stage. For simplicity, we allow at most one cell to change its active state per stage. For the purpose of load balancing, it is advantageous to turn on a cell whose neighbors are heavily loaded.

$$c = \arg\max_{\{ j \,\mid\, P_{j,y} = 1 \}} \frac{1}{|W_j|} \sum_{i \in W_j} L_i \qquad (25)$$

where $y \in \{MC, SC\}$ is the cell type indicated by the action, $P_{j,y}$ represents the active state of cell $j$ of type $y$, taking the value 0 when it is turned on and 1 when it is turned off, and $W_j$ is the set of neighbor cells of cell $j$, which is defined more explicitly later. (25) selects the cell with the largest loading averaged over its neighbor cells. Similarly, the cell to be turned off is determined as

$$c = \arg\min_{\{ j \,\mid\, P_{j,y} = 0,\; Y_j \ne \emptyset \}} \frac{1}{|W_j|} \sum_{i \in W_j} L_i \qquad (26)$$

where $Y_j = \{ C \in \hat{Y}_j \mid \sum_{(j',y') \in C} P_{j',y'} = 0 \}$ and $\hat{Y}_j$ is the collection of neighbor supporter sets of cell $j$, which is defined in Section 6. (26) simply selects the cell with the lowest loading averaged over its neighbor cells among the cells which can be turned off safely while satisfying the coverage constraint.
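The selection rules (25)-(26) reduce to two argmax/argmin passes over neighbor-averaged loads. The five-cell topology, loads, on/off flags, and supporter availability below are made up for illustration.

```python
load = {1: 0.9, 2: 0.8, 3: 0.2, 4: 0.3, 5: 0.85}
neighbors = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
is_off = {1: False, 2: True, 3: False, 4: True, 5: False}
has_supporters = {1: True, 3: True, 5: False}   # coverage condition Y_j nonempty

def avg_neighbor_load(j):
    return sum(load[i] for i in neighbors[j]) / len(neighbors[j])

def cell_to_turn_on():
    """(25): off-cell whose neighbors carry the largest average load."""
    off = [j for j in load if is_off[j]]
    return max(off, key=avg_neighbor_load)

def cell_to_turn_off():
    """(26): on-cell, safely removable, with the smallest neighbor load."""
    on = [j for j in load if not is_off[j] and has_supporters.get(j, False)]
    return min(on, key=avg_neighbor_load)

print(cell_to_turn_on())   # cell 2: its neighbors 1 and 3 average 0.55
print(cell_to_turn_off())  # cell 3: lowest neighbor load among safe cells
```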

The complexity of the ASC algorithm depends on the reinforcement learning algorithm and on the cell selection in (25) and (26). The complexity of the reinforcement learning algorithm itself is given in [28]. Since the number of neighbor cells and the number of active or inactive cells are proportional to the number of cells, the complexity due to (25) and (26) scales linearly with the number of cells. No extra signaling is required for this algorithm. However, the cell load of each cell needs to be collected at the centralized processor in every period of the load balancing algorithm, which is likely to be a very minor overhead.


6. Simulation Results

6.1 Simulation Setup

We consider a conventional wrap-around hexagonal downlink network consisting of 57 macro cells, where three cells are co-located as three sectors with a nominal coverage of 120 degrees, and $N_{SC}$ randomly located omni-directional small cells. Following the conventional transmit power setup [3][26], the transmit powers of the macro cells and the small cells are set to 20 watts and 1 watt, respectively. Since we treat each sector as a cell, the total number of cells is $57 + N_{SC}$. Each cell transmits with a single antenna. Each MS, with two receive antennas, performs maximum ratio combining on the received signal. The channel model follows the urban micro model of the 3GPP SCM with non-line of sight. There are 10 MSs per macro cell at the start of the simulation. They are geometrically randomly distributed with uniformly distributed velocities; that is, the $m$th MS has velocity $(100/570)m$. Each MS moves along a straight line in a random direction.

The average data traffic size per call arrival was set to 1 kbit. The discount factor $\alpha$ was set to 0.9. $T_H$ and $T_L$ were 0.9 and 0.7, respectively, chosen to keep the cell load of each active cell rather high so that it operates in a power-efficient manner. The simulation was executed over 50000 frames with a frame length of 1 sec. During the first 10000 frames, the feasible neighbor sets supporting the coverage of each cell were estimated. The learning algorithm was then executed over the remaining 40000 frames with a period of 10 frames. In practice, active state control may be executed every several minutes or several tens of minutes. However, since it is important to include the effect of short-term channel characteristics, which are closely related to channel velocity and quality of service, we focus the simulation on the short-term rather than the long-term scale. Even though a learning period of 10 sec is rather short, shortening the control period is not expected to have a great impact on the characterization of the proposed algorithm's performance. The simulation parameters were set by adopting values conventionally used in practical cellular systems; the main parameters are summarized in Table 3.

To simulate the proposed algorithm, the feasible set collection $F$ needs to be estimated. Brute-force search requires complexity exponential in the number of cells, which is not feasible. To deal with this problem, we define a collection of neighbor supporter sets $Y_c$ for cell $c$ as follows:

$$Y_c = \Big\{ S_c \,\Big|\, \int_{A_c} \Pr(E_{S_c}(x))\, f_X(x \mid A_c)\, dx \le \delta \Big\} \qquad (27)$$

where $S_c$ is a set of active cells with the power of cell $c$ turned off, and $A_c$ is the SNR coverage of cell $c$, which satisfies the inequality $\int_{A_c} \Pr(E_{\{c\}}(x))\, f_X(x \mid A_c)\, dx \le \delta$.

Table 3. Simulation parameters.

parameter                           value
Transmit power of macro cells       20 watts
Transmit power of small cells       1 watt
Path loss exponent                  4.05
Bandwidth                           20 MHz
Operating frequency                 1850 MHz
Number of MSs                       570
Minimum inter-macro-BS distance     1000 m


This inequality implies that the SNR outage is less than ε. Thus, Y_c is the collection of sets consisting of neighbors that provide the same SNR outage coverage of cell c while the power of cell c is turned off. To find Y_c, we define a set of candidate neighbor cells W_c as

W_c = { a | ∃ x_0 ∈ A_c such that γ_a(x_0) ≥ T_SNR }    (28)

For a given W_c, there are 2^{|W_c|} − 1 possible candidate sets which can be included in Y_c. If W_c is large, this adds extra complexity. Thus, one can reduce the size of W_c at some sacrifice of performance, which is expected to be negligible. Each cell counts the number of events in which both the SNR of cell c and the SNR of cell a are greater than the SNR threshold. The counts are sorted in decreasing order, and the cells corresponding to the top N of the sorted list are selected as the candidates W_c used for finding Y_c approximately. At each frame, the event count of the i th combination of neighbor supporter cells is updated as follows:

V_{i,c} ← V_{i,c} + Σ_{m ∈ M_c} I( ∨_{a ∈ C_{i,c}} (γ_{a,m} ≥ T_SNR) )    (29)

where C_{i,c} is the i th combination of neighbor supporter cells for cell c, M_c is the set of MSs served by cell c, γ_{a,m} is the SNR of cell a reported by MS m, ∨ is a logical OR operation, and I(·) is an indicator function which has value 1 if the condition inside the brackets is true and 0 otherwise. Exploiting this information, one can estimate Y_c as

Ŷ_c = { C_{i,c} | (N_{T,c} − V_{i,c}) / N_{T,c} ≤ ε, i = 1, 2, ..., 2^{|W_c|} − 1 }    (30)

where N_{T,c} is the total number of SNR reports at cell c. (30) implies that the collection of neighbor supporter cells is determined from the estimated outage probability of each set of neighbor supporter cells. In the simulation, W_c is estimated over the first 5000 frames. If the number of elements is greater than 10, then some elements of W_c are removed with the rule described above. Then, Ŷ_c is determined from the V_{i,c} estimated over the second 5000 frames.
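The counting step of (29) and the selection rule of (30) can be sketched as follows. This is a simplified, hypothetical implementation for a single cell c (the array shapes, the top-N reduction rule, and the function name are assumptions), not the simulator's actual code:

```python
import itertools
import numpy as np

def estimate_feasible_sets(snr_samples, snr_threshold, eps, max_candidates=10):
    """Sketch of the two-phase estimation of Y_c for one cell c.

    snr_samples: array of shape (num_reports, num_candidates); SNRs reported
    by MSs of cell c for each candidate neighbor in W_c.
    Phase 1: rank candidates by how often their SNR clears the threshold and
    keep at most max_candidates, as described for |W_c| > 10.
    Phase 2 (eqs. (29)-(30)): for every nonempty subset C_i of the kept
    candidates, count reports where at least one member clears the threshold
    (logical OR), and keep subsets whose estimated outage is at most eps.
    """
    covered = snr_samples >= snr_threshold   # per-report, per-candidate events
    n_reports = snr_samples.shape[0]         # N_{T,c}
    # Phase 1: top-N candidate reduction.
    counts = covered.sum(axis=0)
    keep = np.argsort(counts)[::-1][:max_candidates]
    feasible = []
    # Phase 2: enumerate the 2^{|W_c|} - 1 nonempty subsets.
    for r in range(1, len(keep) + 1):
        for subset in itertools.combinations(keep, r):
            v = np.any(covered[:, list(subset)], axis=1).sum()  # V_{i,c}
            if (n_reports - v) / n_reports <= eps:              # outage test
                feasible.append(set(subset))
    return feasible

# Three SNR reports for two candidate neighbors; neighbor 0 always clears 5 dB.
reports = np.array([[10.0, 0.0], [10.0, 0.0], [10.0, 0.0]])
print(estimate_feasible_sets(reports, snr_threshold=5.0, eps=0.0))
```

For the toy input above, every feasible set must contain neighbor 0, since neighbor 1 alone leaves all three reports in outage.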

6.2. Simulation Results

In this subsection, we investigate the performance of the proposed ASC algorithm through simulation results. Three performance metrics are considered: the percentage of active cells, the number of HOs, and the average user perceived rate (UPR). The percentage of active cells is defined as the ratio between the number of active cells and the total number of cells. It is given separately for macro cells and small cells to assess the power efficiency more accurately in relation to the other metrics. The number of HOs is also an important parameter to consider for controlling the active state of cells, since an HO often incurs delay and degradation in service reliability. HO performance in a practical system depends on the HO algorithm. However, in this simulation an HO is determined instantly from the received signal strength, for simplicity and to be free from a specific HO algorithm and its parameterization. Thus, the number of HOs depends on the mobility of the MSs and on how often the ASC algorithm switches cells on or off. Depending on how many cells are turned off and which cells are off, service quality may differ. One representative metric for service quality at an MS is the average UPR. UPR is defined as the ratio between the data traffic size and the total time to finish its transmission; thus, UPR represents the end user's experience of the wireless service. In summary, the percentage of active cells, the number of HOs, and the average UPR are chosen as representative performance metrics to assess power efficiency, service reliability, and end-user service experience, respectively.
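Since UPR is a simple ratio, it can be written in one line (the function name and the example values are illustrative only):

```python
def user_perceived_rate(traffic_bits, completion_time_s):
    """UPR: data traffic size divided by the total time to finish its transmission."""
    return traffic_bits / completion_time_s

# The average call in the setup carries 1k bits; finishing it in 0.5 ms gives 2 Mbps.
print(user_perceived_rate(1000, 0.0005))  # 2000000.0
```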


The performance of the proposed algorithm is likely to depend mostly on the thresholds T_H and T_L, which are used for deciding the active state depending on the cell load. In Fig. 1, the effects of these thresholds on the active state are shown for two different traffic densities, 0.01 and 0.001. Two different types of plots are drawn together. First, the percentage of active cells is shown as T_L increases from 0.1 to 0.8 in steps of 0.1 while fixing T_H = 0.9. It can be observed that the percentage of active macro cells decreases as T_L increases, because there is more chance to turn off a cell when T_L is high. However, the percentage of active small cells stays roughly constant at about 5%: high loading is required to turn on a cell for a high T_H, and small cells usually have low loading due to their small coverage unless the wireless network is heavily crowded. Second, the percentage of active small cells is shown to decrease as T_H decreases from 0.9 to 0.2 in steps of 0.1 while fixing T_L = 0.1. A low T_L tends to make a relatively large percentage of small cells active, since it activates cells with small loading. However, the percentage of active macro cells does not change much with increasing T_H. As long as the traffic density is not very low and the traffic is geometrically uniformly distributed, each macro cell may have some loading due to its large coverage. Thus, every macro cell is active when T_L = 0.1 and the traffic density is 0.01, while there is a slight decrease in the percentage of active macro cells when T_L = 0.1 and the traffic density is 0.001. These results confirm that a high T_L leads to aggressive active state control, while a low T_H and a low T_L lead to conservative active state control.
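The role of the two thresholds can be summarized as a hysteresis-style rule. The sketch below is only an illustration of how T_H and T_L shape the on/off decision; in the proposed algorithm the actual decision comes from the learned SMDP policy, and the function name and default values here are assumptions for illustration:

```python
def next_active_state(active, cell_load, t_high=0.9, t_low=0.7):
    """Hysteresis-style active state decision driven by cell load.

    A sleeping cell is switched on only when its load would exceed t_high;
    an active cell becomes a switch-off candidate when its load drops below
    t_low. Loads in [t_low, t_high] leave the state unchanged, which avoids
    rapid on/off toggling (and hence excessive handovers).
    """
    if not active and cell_load > t_high:
        return True
    if active and cell_load < t_low:
        return False
    return active

print(next_active_state(True, 0.8))    # True: load in the dead band, stay on
print(next_active_state(True, 0.5))    # False: below t_low, switch-off candidate
print(next_active_state(False, 0.95))  # True: above t_high, switch on
```

The gap between t_low and t_high is exactly what makes a high T_L "aggressive" (cells turn off more readily) and a low T_H "conservative" (cells turn on more readily), matching the trends observed in Fig. 1.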

Fig. 1. Effects of load thresholds on active state (solid line : macro cell, dotted line : small cell).



Fig. 2. Effects of load thresholds on the number of HOs.

The effect of the load thresholds on the number of HOs is shown in Fig. 2. There is no significant change depending on the load threshold and the traffic density. The number of HOs with a traffic density of 0.001 is slightly larger than that with 0.01: since the number of active cells is small when the traffic density is small, turning power on or off influences more MSs. When the traffic density is fixed at 0.01, the number of HOs is found to increase slightly, for the same reason, as T_L increases while T_H = 0.9. Since HOs are incurred both by the mobility of MSs and by changes in the active state of cells, the variation with the load threshold depends on the period of active state control and the observation time. Nonetheless, considering the short period of active state control, the number of HOs is expected not to be very sensitive to the load thresholds.

Fig. 3. The effect of the number of small cells on performances (solid line : traffic density of 0.1, dotted line : traffic density of 0.01).


The effect of the number of small cells on performance is shown in Fig. 3. It was investigated for two different traffic densities, 0.1 and 0.01. In the top figure, it can be observed that, while there is no significant change in the number of active small cells for a traffic density of 0.01, it starts to increase around 70 small cells for a traffic density of 0.1. It is conjectured that, as the number of small cells increases, there is more chance of having small cells with relatively large loading, which produces this result. It is interesting to note that increasing the number of small cells from 10 to 110 does not change the number of active macro cells much. This result may not generalize, since a massively large number of cells may have a different effect. Due to the limited memory of the simulator and the simulation time, we could not evaluate the effect of a very large number of small cells, which is left for future research with a different simulation methodology. In the middle figure, the number of HOs is plotted against the increasing number of small cells. As in Fig. 2, the number of HOs with a traffic density of 0.01 is slightly larger than that with a traffic density of 0.1, since the percentage of active macro cells is relatively smaller than that with the traffic density of 0.1.

It is also observed that the number of HOs increases slightly as the number of small cells increases; however, the increase is not significant due to the small coverage of small cells. In the bottom figure, the average UPRs are compared. Since the active state is controlled such that the average loading of the active cells is maintained between T_L and T_H, the UPR is not expected to be constant over different numbers of small cells. Even though there does not seem to be any particular relationship between the UPR and the number of small cells, the relative variation of the UPR appears to be marginal over the different numbers of small cells.

Fig. 4. The percentage of active cells for different data traffic densities (solid line : small cells, dotted line : macro cells).

The proposed algorithm is compared with an existing algorithm and with the case of all cells being active in Fig. 4. To the best of the author's knowledge, there is no existing ASC algorithm for a heterogeneous network that is applicable to the system setup considered. The greedy OFF (GOFF) algorithm in [4] has a good theoretical foundation, with good performance verified by simulation results for a homogeneous network. However, GOFF algorithm stops when there is

