Clustering of optimized data for email forensics
RAIRO - Operations Research - Recherche Opérationnelle, Special issue - Advanced Optimization Approaches and Modern OR-Applications, Volume 50 (2016) no. 4-5, pp. 951-963.

Forensics is the study of evidence to help the police solve crimes. In computer science, the crimes in question are mainly network attacks, most of which arrive by email, now the most popular means of communication over the Internet. Masses of emails reach our inboxes without our being aware of them. It is therefore necessary to build an automatic checking system that separates good emails from bad ones. In this paper, we propose a new email-processing approach that uses the Singular Value Decomposition (SVD) to optimize email data before applying a data mining technique (clustering) to extract the bad emails held on the mail servers that host users' inboxes. Our study filters emails (good and bad) by clustering the optimized data, and compares the result with clustering of the unoptimized data.
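The idea of reducing email data with the SVD before clustering can be illustrated with a minimal sketch. It is not the authors' pipeline: the six toy emails, the function names (`term_matrix`, `reduce_with_svd`, `kmeans`), the plain term-count weighting, the rank of 2 and the small k-means routine are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): three spam-like emails
# followed by three work-like emails.
emails = [
    "win free money now",
    "free prize win money",
    "claim free prize now",
    "meeting agenda project report",
    "project report review meeting",
    "agenda review project schedule budget",
]

def term_matrix(docs):
    """Build a document-by-term count matrix (rows = emails)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for word in doc.split():
            A[row, index[word]] += 1.0
    return A, vocab

def reduce_with_svd(A, rank):
    """Keep the `rank` largest singular values; return reduced email coordinates."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :rank] * S[:rank]

def kmeans(points, k, iters=20):
    """Small deterministic k-means: farthest-point seeding, then Lloyd updates."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

A, vocab = term_matrix(emails)
reduced = reduce_with_svd(A, rank=2)   # "optimized" (reduced) representation
labels = kmeans(reduced, k=2)          # cluster in the reduced space
print(A.shape, "->", reduced.shape)
print(labels.tolist())
```

Clustering `A` directly instead of `reduced` gives the unoptimized baseline against which the abstract's comparison would be made; on a corpus this small the two agree, so the sketch shows only the mechanics, not the reported benefit.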

DOI : 10.1051/ro/2015057
Classification : 05C12, 05C50, 05B10, 91C20, 15A18, 34A05
Keywords: Email, forensics, spam, SVD, LSI, optimization, data mining, clustering
Salhi, Dhai Eddine 1 ; Tari, Abdelkamel 1 ; Kechadi, M-Tahar 2

1 University Abderrahmane Mira of Bejaia, LIMED Laboratory, 06000 Bejaia, Algeria.
2 University College Dublin, Parallel Computational Research Group Laboratory, Dublin 4, Dublin, Ireland.
@article{RO_2016__50_4-5_951_0,
     author = {Salhi, Dhai Eddine and Tari, Abdelkamel and Kechadi, M-Tahar},
     title = {Clustering of optimized data for email forensics},
     journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
     pages = {951--963},
     publisher = {EDP-Sciences},
     volume = {50},
     number = {4-5},
     year = {2016},
     doi = {10.1051/ro/2015057},
     mrnumber = {3570541},
     language = {en},
     url = {http://archive.numdam.org/articles/10.1051/ro/2015057/}
}
Salhi, Dhai Eddine; Tari, Abdelkamel; Kechadi, M-Tahar. Clustering of optimized data for email forensics. RAIRO - Operations Research - Recherche Opérationnelle, Special issue - Advanced Optimization Approaches and Modern OR-Applications, Volume 50 (2016) no. 4-5, pp. 951-963. doi: 10.1051/ro/2015057. http://archive.numdam.org/articles/10.1051/ro/2015057/

