Design and Implementation of Dynamic Recommendation Service in Big Data Environment



Kim Ryong*, Kyung-Hye Park**

Abstract

Recommendation Systems are information technologies that E-commerce merchants have adopted so that online shoppers can receive suggestions on items that might be interesting or complementary to their purchased items. These systems provide valuable assistance to users' purchasing decisions and deliver a quality push service. Traditionally, Recommendation Systems have been designed as centralized systems, but information services are growing rapidly and demand strong scalability. Next-generation information technologies such as Cloud Computing and Big Data environments can handle massive data and support enormous processing power. Nevertheless, analytic technologies still lack several capabilities needed to process big data. Accordingly, we design a conceptual service model with a newly proposed algorithm and user adaptation for a dynamic recommendation service in a big data environment.

Keywords: Big Data Environment, Dynamic Recommendation Service, E-commerce Merchants, Information Technology


Received: 2019. 10. 01. Revised: 2019. 10. 28. Final Acceptance: 2019. 10. 31.

※ This study was supported by research fund of ChungNam National University.

* First Author, Doctoral Candidate, MIS Major, Dept. of Management, ChungNam National University, e-mail: [email protected]

** Corresponding Author, Professor, School of Business, College of Economics and Management, ChungNam National University, 99 Daehak-ro, Yuseong-gu, Daejeon, Tel: 042-821-5578, e-mail: [email protected]


for useful information [Ahmad, 2017; Greg, 2003]. Traditional database management systems, such as relational databases, have proven well suited to structured data but brittle when handling semi-structured and unstructured data [Nandigam, 2005]. In reality, however, data come from different sources in various formats, and the vast majority of these data are unstructured or semi-structured in nature [Balci, 2008]. Moreover, database systems are being pushed to the limits of their storage capacity. As a result, organizations are struggling to extract useful information from the unpredictable explosion of data captured inside and outside the organization. This explosion of data is referred to as “Big Data” [Ibrar, 2016]. Big data is a collection of large volumes of complex data that exceeds the processing capacity of conventional database architectures. Traditional databases and data warehousing technologies do not scale to billions of rows of data and cannot effectively store unstructured and semi-structured data [Bourret, 2000; Broisin, 2005]. Enterprises have adopted platforms for massive data processing; however, many of them report that analytic techniques lack capabilities when processing big data [Sanjay, 2013].

In this research paper, the focus is on designing a Decentralized Recommendation System (RS) to fully implement and benefit from

Rating Prediction and finding the Top-N items in the context of RS [Tewari, 2018; Evangelia, 2016]. RS inspired by artificial life and evolutionary algorithms have been applied in a practical scenario: recommending relevant advertisements to online shopping users [Kim, 2016].

2. Related Works

2.1 Hadoop

Hadoop is the open-source implementation of the Map-Reduce framework. Hadoop allows a large amount of work to be split into many pieces and sends these pieces of work to worker units [Mohd, 2015]. The worker units can be very primitive computation engines based on cheap commodity hardware. Ideally, they use some form of direct-attached storage to minimize the network bottleneck. Usually, there is one job tracker and one name node. The job tracker receives a job as a large amount of work and splits it into many tasks. A name node is usually installed on the same node. The name node holds a list of all available data nodes and the locations of files, including the replicas of each file [Neyland, 2016]. Computation resources are measured in map and reduce capacity. With that information, a task can be sent to the node that holds the data locally, which avoids unnecessary network traffic and saves valuable task setup time. <Figure 1> illustrates the architecture of Hadoop and its ecosystem.

<Figure 1> Hadoop Ecosystem for Recommender Service
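The split, map, group, and reduce flow described above can be illustrated with a small in-memory Python simulation. This is only a sketch of the programming model, not Hadoop's API: the record format `user_id,service_id`, the function names, and the idea of counting (user, service) pairs to build a usage matrix are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (user_id, service_id); hypothetical record key


def map_phase(split: Iterable[str]) -> Iterator[Tuple[Key, int]]:
    """Map task: emit ((user, service), 1) for every usage record in one split."""
    for line in split:
        user, service = line.strip().split(",")
        yield (user, service), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Key, List[int]]) -> Dict[Key, int]:
    """Reduce task: sum the counts for each (user, service) key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(splits: List[List[str]]) -> Dict[Key, int]:
    # Each split stands in for one HDFS block handled by its own map task;
    # here the tasks simply run one after another in a single process.
    intermediate: List[Tuple[Key, int]] = []
    for split in splits:
        intermediate.extend(map_phase(split))
    return reduce_phase(shuffle(intermediate))


if __name__ == "__main__":
    blocks = [["u1,s1", "u1,s2"], ["u2,s1", "u1,s1"]]
    print(run_job(blocks))
```

In a real cluster, the map tasks would run in parallel on the nodes holding each block, and only the grouped intermediate pairs would cross the network.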

2.2 Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) plays an important role in the entire Hadoop cluster because it holds all the data of the whole cluster. It scales well, provides fault tolerance through inter-node replication, and handles file system errors transparently [Robert, 2016]. HDFS stores files as blocks of a predefined size, commonly set to around 64 or 128 megabytes.

For a cluster storing many terabytes, such a block size can lead to a poor configuration, because it generates a very large number of blocks that must all be managed by the name node. In addition, to achieve optimal data locality on the worker nodes, the job tracker generates one map task per block. A map task is a costly unit, because it requires an expensive task setup on the worker node. Setting an appropriate block size is therefore an important configuration decision. Block size also matters when a non-splittable compression algorithm is used [Brent, 2016].

The coordinator of HDFS is the name node, which is not fault-tolerant. If the node holding the name node image (the meta-information of HDFS) fails, the whole HDFS becomes unavailable. For this reason, alternative file systems exist. HDFS works best for read access, which makes it well suited to data warehouses.
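To make the block-count argument concrete, the short Python sketch below counts blocks (and therefore map tasks, under the one-map-task-per-block rule stated above) for a single large file. Binary units and a single 10 TB file are illustrative assumptions, and `num_blocks` is a hypothetical helper.

```python
MB = 1024 ** 2  # binary megabyte (assumption: binary units)
TB = 1024 ** 4  # binary terabyte


def num_blocks(file_size: int, block_size: int) -> int:
    """Blocks needed to store one file; the last block may be partially filled."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Integer ceiling division avoids floating-point rounding on very large sizes.
    return (file_size + block_size - 1) // block_size


if __name__ == "__main__":
    for size_mb in (64, 128):
        blocks = num_blocks(10 * TB, size_mb * MB)
        # With one map task per block, this is also the number of map tasks.
        print(f"{size_mb} MB blocks -> {blocks} blocks / map tasks")
```

For 10 TB this gives 163,840 blocks at 64 MB versus 81,920 at 128 MB; doubling the block size halves both the metadata the name node must track and the number of map-task setups.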

2.3 Hive and HiveQL

Hive is mostly designed for offline OLAP queries, as used in common data warehouses.

The data are saved as files and folders in HDFS. Hive lets users write their own serialization/deserialization methods, which can be used, for example, to enable compression. HiveQL is a high-level query language for Map-Reduce, provided by Hive to define and manipulate data stored in Hive.

The ANSI SQL-92 standard is not fully implemented, but the language is similar to the well-known SQL language. It supports most primitive data types, collections such as arrays and maps, and user-defined types.

Table creation statements are provided by the Data Definition Language (DDL). Indexes can also be defined, and data can be loaded into tables using LOAD and INSERT statements.


<Figure 2> Dynamic Recommendation Service Architecture

the services that are relevant to their selected positions in a designed process. <Figure 2> shows the recommended design.

In the data flow presented, each element is linked through the recommender engine. To achieve the first objective, we use usage data and adapt well-known Collaborative Filtering (CF) techniques. We aim to discover the user's interests hidden in the usage data. We also intend to use CF techniques that have been developed for item recommendation and prediction. However, we do not ask users to provide additional information such as a profile, ratings, or comments when they select a service. Our Dynamic Recommendation Service dynamically recommends services relevant to the user's interests.

To do so, we first identify user interests based on past usage data and then integrate these interests into CF algorithms to calculate similarities between users and services. Based

and other services using the vector space model (VSM). The services are then sorted in descending order of similarity, and the Top-l recommended services are displayed to the user. The pseudo code of service recommendation based on item-based Top-N CF is described in <Algorithm 1>.
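A minimal Python sketch of <Algorithm 1>, assuming the usage matrix is held in memory as a mapping from each service to its row of per-user usage counts and using plain cosine similarity on raw counts (the TF-IDF weighting is left out here for brevity). The names `cosine` and `service_driven` are hypothetical, and excluding Sx from its own recommendation list is our assumption.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def service_driven(usage: Dict[str, List[int]], sx: str, l: int) -> List[str]:
    """Recommend the l services whose usage rows are most similar to the row of sx."""
    target = usage[sx]
    # Score every other service Si against the currently used service Sx.
    scored = [(si, cosine(row, target)) for si, row in usage.items() if si != sx]
    # Sort in descending order of similarity and keep the Top-l.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [si for si, _ in scored[:l]]


if __name__ == "__main__":
    # Rows: services; columns: three users (counts are made up for illustration).
    usage = {"s1": [3, 0, 1], "s2": [2, 0, 1], "s3": [0, 4, 0], "s4": [1, 1, 1]}
    print(service_driven(usage, "s1", 2))  # ['s2', 's4']
```

Because the whole row set is scanned for every request, this direct form is O(number of services × number of users) per recommendation; a production system would precompute or index similarities.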

The key step of the algorithm is finding the similarity between a service Si and the currently used service Sx. To compute this similarity, we apply VSM, which was developed to compute the similarity between two individual documents. VSM represents documents in a k-dimensional space, where k is the number of distinct terms. Each document is represented as a vector with k elements, and each element of a document vector corresponds to a term appearing in the document. The value of a vector element is the weight of the corresponding term, computed from term frequency (TF) and inverse document frequency (IDF). The similarity between two documents is computed as the cosine of the angle between the two corresponding vectors.

input: Sx, the currently used service
output: a recommended list of l services
S = set of services;
for each service Si in S do
    Compute the similarity between Si and Sx;
end
Sort Si ∈ S in descending order of similarity values;
Select Top-l services for recommendation;

<Algorithm 1> Service-Driven Recommendation

input: Ux, the active user
output: a recommended list of l services
U = set of users;
for each user Uj in U do
    Compute the similarity between Uj and Ux;
end
Sort Uj ∈ U in descending order of similarity values;
Select Top-k users from the sorted list of Uj ∈ U;
for each of the k selected users do
    Select the t most frequently used services not used by the active user;
end
Make a recommended list of l = k×t services;

<Algorithm 2> User-Oriented Recommendation

<Figure 3> UML Classification of Recommendation Service

In our approach, by analogy, we treat each row (service) of the usage matrix as a document and each column (user) as a term. The value of each element in the usage matrix is treated as the number of times the corresponding term appears in the corresponding document. The similarity between two services is inferred from the similarity between their two row vectors. We also apply term frequency (TF) and inverse document frequency (IDF) to the usage matrix to compute the weight of each user (term).
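The TF-IDF weighting of the usage matrix described above can be sketched in Python as follows. The paper does not give the exact formulas, so we assume the textbook variant: TF is the raw usage count and IDF(user) = log(number of services / number of services that user has used). The function names are hypothetical.

```python
import math
from typing import List


def tfidf_weight(usage: List[List[int]]) -> List[List[float]]:
    """Weight a services x users usage matrix with TF-IDF.

    Rows are documents (services) and columns are terms (users).
    Assumed formulas: TF = raw count, IDF(j) = log(n_services / df_j).
    """
    n_services = len(usage)
    n_users = len(usage[0]) if usage else 0
    # Document frequency: in how many services each user appears.
    df = [sum(1 for row in usage if row[j] > 0) for j in range(n_users)]
    idf = [math.log(n_services / df[j]) if df[j] else 0.0 for j in range(n_users)]
    return [[row[j] * idf[j] for j in range(n_users)] for row in usage]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def service_similarity(usage: List[List[int]], i: int, x: int) -> float:
    """Similarity of services i and x: cosine of their TF-IDF-weighted rows."""
    weighted = tfidf_weight(usage)
    return cosine(weighted[i], weighted[x])
```

Under this IDF, a user who has used every service gets weight 0 and contributes nothing to similarity, mirroring how very common words are discounted in document retrieval.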

3.2 User-Oriented Algorithm (UOA)

Inspired by the observation that users with similar interests tend to select similar items, we use this algorithm to find users with similar interests, i.e., users who used similar services.

The pseudo code of User-Oriented Recommendation is described in <Algorithm 2>. In this algorithm, we treat each user as a document and each service as a term, the reverse of the service-driven approach based on Sx. Recommendations are made by selecting the most frequently used services that were used by the most similar users but not by the active user.

The system generates recommendations with a three-step algorithm. First, it computes the similarity between the active user and the other users based on their usage data. Second, it sorts the other users in descending order of similarity and selects the top-k users in the list. Finally, for each selected user, it selects the t most frequently used services that were not used by the active user to make recommendations, as illustrated in <Figure 3>.
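The three steps above map directly onto a short Python sketch. We assume sparse usage data (user to {service: count}) and cosine similarity between users; skipping a service that was already recommended by an earlier neighbour is our own choice, so the list can be shorter than k×t. All names are hypothetical.

```python
import math
from typing import Dict, List

Usage = Dict[str, Dict[str, int]]  # user -> {service: usage count}


def cosine_sparse(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity of two sparse usage vectors keyed by service."""
    dot = sum(count * b.get(service, 0) for service, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_oriented(usage: Usage, ux: str, k: int, t: int) -> List[str]:
    active = usage[ux]
    # Steps 1 and 2: rank the other users by similarity to the active user.
    others = [u for u in usage if u != ux]
    others.sort(key=lambda u: cosine_sparse(usage[u], active), reverse=True)
    recommended: List[str] = []
    # Step 3: from each of the top-k users, take their t most-used services
    # that the active user has not used yet.
    for user in others[:k]:
        unseen = [s for s in usage[user] if s not in active]
        unseen.sort(key=lambda s: usage[user][s], reverse=True)
        for service in unseen[:t]:
            if service not in recommended:  # skip duplicates across users
                recommended.append(service)
    return recommended


if __name__ == "__main__":
    usage = {
        "u1": {"s1": 2, "s2": 1},
        "u2": {"s1": 1, "s2": 1, "s3": 5, "s4": 1},
        "u3": {"s5": 3},
        "u4": {"s1": 1, "s6": 2},
    }
    print(user_oriented(usage, "u1", k=2, t=1))  # ['s6', 's3']
```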


<Figure 4> UML-based Activity Model of Recommender Service

input: Ux, the current user; Sy, the currently used service
output: a recommended list of l services
U = set of users;
for each user Uj in U do
    Compute the similarity between Uj and Ux;
end
Sort Uj ∈ U in descending order of similarity values;
Select Top-k users from the sorted list of Uj ∈ U;
A′[m×k] = usage data of the selected users;
S = set of services;
for each service Si in S do
    Compute the similarity between Si and Sy based on the new usage matrix A′[m×k];
end
Sort Si ∈ S in descending order of similarity values;
Select Top-l services for recommendation;
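The combined procedure above (select similar users, build the reduced matrix A′[m×k], then rank services against Sy) can be sketched in Python. We assume in-memory sparse usage data, cosine similarity for both steps, and that Sy itself is excluded from its own ranking; the function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Usage = Dict[str, Dict[str, int]]  # user -> {service: usage count}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid(usage: Usage, ux: str, sy: str, k: int, l: int) -> List[str]:
    services = sorted({s for counts in usage.values() for s in counts})

    def user_vector(user: str) -> List[int]:
        return [usage[user].get(s, 0) for s in services]

    # Select the top-k users most similar to the current user Ux.
    active = user_vector(ux)
    others = [u for u in usage if u != ux]
    others.sort(key=lambda u: cosine(user_vector(u), active), reverse=True)
    neighbours = others[:k]

    # Reduced usage matrix A'[m x k]: one row per service, one column per selected user.
    reduced = {s: [usage[u].get(s, 0) for u in neighbours] for s in services}

    # Rank services by similarity to the currently used service Sy within A'.
    target = reduced[sy]
    scored = [(s, cosine(row, target)) for s, row in reduced.items() if s != sy]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [s for s, _ in scored[:l]]
```

Restricting the service vectors to the k nearest users means that service similarity reflects the behaviour of users like Ux rather than of the whole population.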

to search for relevant services based on contexts, but also allows process analysts to add constraints to the requested context to filter the search results. The query language in our approach consists of three parameters: the associated service, the connection constraint, and the radius. The associated service is the service whose neighborhood context is matched against other contexts.

Connection constraints are services or connection flows to be included in or excluded from the query's results. The radius is the number of connection layers taken into account for neighborhood context matching; it specifies the size of the considered contexts.


During query execution, the query filters the services returned by context matching. In general, the query is executed as follows: 1) Capture the context of the associated service. This neighborhood context is identified by the associated service and the connection flows to its neighbors, and its size is specified by the radius parameter. 2) Match the context of the associated service against the contexts of services in other business processes. 3) Refine the matching result by selecting only services whose contexts satisfy the query's constraints. 4) Sort the selected services by matching value and pick the Top-N services, where N can be tuned by the process analyst issuing the query.
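The four execution steps can be illustrated with a simplified Python sketch. Several assumptions are ours, not the paper's: the process model is a single undirected adjacency map over service IDs (connection-flow types are ignored), a context is the set of services within `radius` hops, contexts are matched with Jaccard similarity, and constraints are reduced to include/exclude sets of service IDs. All names are hypothetical.

```python
from collections import deque
from typing import AbstractSet, Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # service -> neighbouring services (hypothetical model)


def context(graph: Graph, service: str, radius: int) -> Set[str]:
    """Step 1: services reachable from `service` within `radius` connection layers."""
    seen = {service}
    queue = deque([(service, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == radius:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    seen.discard(service)
    return seen


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def execute_query(graph: Graph, associated: str, radius: int,
                  include: AbstractSet[str] = frozenset(),
                  exclude: AbstractSet[str] = frozenset(),
                  n: int = 5) -> List[Tuple[str, float]]:
    target = context(graph, associated, radius)
    results = []
    for service in graph:
        if service == associated:
            continue
        ctx = context(graph, service, radius)
        # Step 2: match this service's context against the associated service's context.
        score = jaccard(ctx, target)
        # Step 3: keep only services whose contexts satisfy the include/exclude constraint.
        if include <= ctx and not (ctx & exclude):
            results.append((service, score))
    # Step 4: sort by matching value and return the Top-N.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:n]
```

Jaccard similarity is only one possible matching function; a fuller implementation would also compare the connection-flow types (sequence, AND/OR/XOR splits and joins) along each path.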

5. Conclusions

Recommendation Systems are information technologies that E-commerce merchants have adopted so that online shoppers can receive suggestions on items that might be interesting or complementary to their purchased items. These systems provide valuable assistance to users' purchasing decisions and deliver a quality push service.

In this paper, we designed a conceptual service model, the ‘Dynamic Recommendation Service’, with a newly proposed algorithm and user adaptation for a big data environment, and we proposed an algorithm for a recommendation service on a big data platform. To achieve a long-term, real-time model that evolves together with users' interests, it is necessary to apply feedback techniques that provide recommendation information about this evolution. However, given the scope of our research, we do not analyze the relevance of user behavior as indicated by the feedback shown. Nevertheless, our research proposes a new model, the Dynamic Recommendation Service, which dynamically recommends services relevant to users' interests. In addition, the recommendation service data can be served from the big data system in order to resolve specific queries expressed in the <Query Model>.

Query ::= ServiceID, ':', [Constraint], ':', Radius;

ServiceID ::= Character, {Character | Digit};

Constraint ::= ('+' | '-'), Term | Constraint, '|', Term;

Term ::= Item | Term, '+', Item | Term, '-', Item;

Item ::= ServiceID | ConFlow | '(', Constraint, ')';

ConFlow ::= '<', [ServiceID], '[', FlowString, ']', [ServiceID], '>';

FlowString ::= ConElement, {ConElement};

ConElement ::= 'sequence' | 'AND-split' | 'AND-join'

| 'OR-split' | 'OR-join' | 'XOR-split' | 'XOR-join';

Radius ::= DigitNotZero, {Digit};
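A recursive-descent parser for a subset of this grammar is sketched below in Python. It is an illustration under stated assumptions: ConFlow items are omitted (their hyphenated element names such as `AND-split` would need a smarter tokenizer), the left-recursive Constraint and Term rules are rewritten as loops, and the nested-tuple output format and the `QueryParser` name are our own. The grammar does not say how the sign and the `|` alternatives combine semantically, so the parser only builds structure.

```python
import re
from typing import List, Optional, Tuple

# Token kinds: "num" (Radius digits), "id" (ServiceID), "sym" (punctuation).
TOKEN_RE = re.compile(r"\s*(?:(?P<num>[0-9]+)|(?P<id>[A-Za-z][A-Za-z0-9]*)|(?P<sym>[:+\-|()]))")


def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class QueryParser:
    """Recursive-descent parser for the query grammar without ConFlow items."""

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self) -> Tuple[Optional[str], Optional[str]]:
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, kind: str, value: Optional[str] = None) -> str:
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise ValueError(f"expected {value or kind}, got {tok_value!r}")
        self.i += 1
        return tok_value

    # Query ::= ServiceID ':' [Constraint] ':' Radius
    def query(self) -> dict:
        service = self.take("id")
        self.take("sym", ":")
        constraint = self.constraint() if self.peek()[1] in ("+", "-") else None
        self.take("sym", ":")
        radius = self.take("num")
        if radius.startswith("0"):
            raise ValueError("Radius must start with a non-zero digit")
        if self.peek()[0] is not None:
            raise ValueError("unexpected trailing input")
        return {"service": service, "constraint": constraint, "radius": int(radius)}

    # Constraint ::= ('+'|'-') Term {'|' Term}
    def constraint(self):
        sign = self.take("sym")
        if sign not in ("+", "-"):
            raise ValueError("a constraint must start with '+' or '-'")
        terms = [self.term()]
        while self.peek()[1] == "|":
            self.take("sym", "|")
            terms.append(self.term())
        return (sign, terms)

    # Term ::= Item {('+'|'-') Item}; the first item carries no operator (None).
    def term(self):
        items = [(None, self.item())]
        while self.peek()[1] in ("+", "-"):
            op = self.take("sym")
            items.append((op, self.item()))
        return items

    # Item ::= ServiceID | '(' Constraint ')'
    def item(self):
        if self.peek()[1] == "(":
            self.take("sym", "(")
            inner = self.constraint()
            self.take("sym", ")")
            return inner
        return self.take("id")
```

For example, `QueryParser("S1:+S2-S3|(+S4):2").query()` yields the associated service `S1`, a constraint whose first term includes `S2` and removes `S3` and whose second term is the parenthesised constraint on `S4`, and a radius of 2.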

In this paper, we have studied a data collection method named ‘Dynamic Recommendation Service’ for Big Data processing. In future work, it will be necessary to build a system based on the method proposed in this paper and to compare its results and performance improvement with those of existing methods.

References

[1] Ahmad et al., Big data management in participatory sensing: Issues, trends and future directions, Future Generation Computer Systems, 2017.

[2] Balci, O. and Ormsby, W. F., Network-centric military system architecture assessment methodology, International Journal of System of Systems Engineering, Vol. 1, No. 1, 2008, pp. 271-292.

[3] Bourret, R., Bornhovd, C., and Buchmann, A., A generic load/extract utility for data transfer between XML documents and relational databases, Second Internatio-


service of learning objects virtualization, Science and Technology Information and Communication for Education and Training, Vol. 12, 2005, pp. 177-204.

[6] Evangelia, C. and George, K., Local Item-Item Models for Top-N Recommendation, Proceedings of the 10th ACM Conference on Recommender Systems, 2016, pp. 67-74.

[7] Fady, D., Esther, P., and Bettina, K., A P2P Recommendation System for Large-Scale Data Sharing, Transactions on Large-Scale Data and Knowledge-Centered Systems, 2011, pp. 87-11.

[8] Giancarlo, R., Rossano, S., and Enrico, G., A Decentralized Recommendation System Based on Self-organizing Partnerships, International Conference on Research in Networking, 2006, pp. 618-629.

[9] Greg, L., Brent, S., and Jeremy, Y., Amazon.com recommendations: Item-to-item collaborative filtering, IEEE Computer Society, Vol. 7, 2003, pp. 76-80.

[10] Ibrar et al., Big data: From beginning to future, International Journal of Information Management, Vol. 36, No. 6, 2016, pp. 1231-1247.

lavala, M., Semantic web services, The Journal of Computing Sciences in Colleges, Vol. 21, No. 1, 2005, pp. 50-63.

[14] Neyland, D., Bearing accountable witness to the ethical algorithmic system, Science, Technology and Human, Vol. 41, No. 1, 2016, pp. 50-76.

[15] Qian, Z., Gediminas, A., Maxwell, H., Martijn, W., and Joseph, A. K., Toward Better Interactions in Recommender Systems: Cycling and Serpentining Approaches for Top-N Item Lists, Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, 2017, pp. 1444-1453.

[16] Robert, S., Patrick, M., and Phil, T., Transparent fault tolerance for scalable functional computation, Journal of Functional Programming, Vol. 26, 2016.

[17] Sanjay, M., E-Commerce Strategy, E-Commerce Strategy, 2013, pp. 155-171.

[18] Tewari, A. S., Singh, J. P., and Barman, A. G., Generating Top-N Items Recommendation Set Using Collaborative, Content Based Filtering and Rating Variance, Procedia Computer Science, Vol. 132, 2018, pp. 1678-1684.


Author Profile

Kim Ryong

Kim Ryong received the B.S. and M.S. degrees from the Department of Computer Science at ChungNam National University and the B.S. and M.S. degrees in Business Administration from Korea National Open University. His current research interests include Big Data Modeling & Simulation, Personalization