Universidad EAFIT

CAOBA (2016-2017)

This report covers the methodological and theoretical aspects of extracting information from data in order to build useful knowledge for establishing candidate groups, based on the data provided by Bancolombia for the project “Risk and Value in Financial Communities” within the framework of the Alianza CAOBA initiative. The main goal is to provide the most useful feasible interaction matrices relating the members of the network, to be used by the Economy team researchers in estimating the econometric model. The econometric model will reveal whether the variables chosen were the most relevant for defining the economic community. From the biological point of view, the definition of community diverges from the one used in economics; the body of this report tries to establish both the relationships and the differences between the two.


Within the framework of the CAOBA initiative, Universidad EAFIT is leading a research project entitled “Risk and Value in Financial Communities”, which seeks to establish the possible communities within the subset of clients that share certain features in terms of value and risk for Bancolombia. To this end, the Universidad EAFIT team laid out the steps to follow, drawing on several fields of knowledge such as Finances, Economy and Sciences. To properly establish the relationships between members of the network and to define the economic communities, the Economy research team involved in this project defined an econometric model based on several assumptions [5, 10]:

1. Disjoint groups. 

2. Interaction matrix related to “transactionality” (see report from Economy team).
3. Interaction matrix related to certain financial attributes (see report from Finances team).

Our goal, as researchers from the Sciences field, is to build matrices that fulfill the following conditions:

  1. Matrices obtained from candidate groups found via non-supervised methods, each group being a candidate economic community in terms of the econometric model.

  2. Sparse adjacency matrices with main diagonal entries equal to zero, obtained from the relationships between certain members of the candidate communities.

  3. Block diagonal matrices whose diagonal blocks are square matrices that describe the interactions within the candidate communities.
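
As an illustration of conditions 2 and 3, the following minimal sketch builds a block diagonal 0/1 matrix with a zero main diagonal from a list of disjoint group labels. The function name `block_diagonal_interaction` and the toy labels are hypothetical, and the sketch assumes that every pair of members in the same group interacts with weight 1; the matrices described above relate only certain members, so they would be sparser inside each block.

```python
def block_diagonal_interaction(labels):
    """Build a block diagonal 0/1 interaction matrix from disjoint group labels.

    Members are reordered so that each group is contiguous. Entry (r, c) is 1
    when the members in positions r and c share a group and r != c, so the
    main diagonal stays at zero.
    Returns the ordering (original indices) and the matrix as a list of rows.
    """
    # Stable sort keeps the original relative order inside each group.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    n = len(order)
    matrix = [[0] * n for _ in range(n)]
    for r, i in enumerate(order):
        for c, j in enumerate(order):
            # Assumption: all members of the same group interact (weight 1).
            if r != c and labels[i] == labels[j]:
                matrix[r][c] = 1
    return order, matrix


# Toy example: five clients assigned to three candidate groups.
labels = [0, 1, 0, 2, 1]
order, W = block_diagonal_interaction(labels)
for row in W:
    print(row)
```

A group with a single member produces a 1x1 zero block, which is why the last row and column of `W` are empty in this toy case.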

The aforementioned matrices will be obtained from two datasets that describe clients in terms of their transactionality and their financial states. In particular, these datasets are generated by the Finances team following certain criteria (see Finances team report).

It is important to note that, even though the datasets provided by the Finances team were filtered and certain variables were selected for this research, we must first preprocess them because of the unstructured nature of certain attributes.

On the other hand, we are not yet going to use text mining techniques (filtering/stemming, stop words) [16], but we are going to apply certain alpha cuts to the results obtained by non-crisp clustering algorithms.
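
To make the idea of an alpha cut concrete, here is a minimal sketch under stated assumptions: the membership matrix `U` is made-up toy data, the function name `alpha_cut` is hypothetical, and the threshold 0.6 is chosen only for illustration. For each cluster, the alpha cut keeps the members whose membership degree is at least alpha.

```python
def alpha_cut(memberships, alpha):
    """Return, for each cluster, the indices of the members whose
    membership degree in that cluster is greater than or equal to alpha."""
    n_clusters = len(memberships[0])
    return [
        [i for i, row in enumerate(memberships) if row[k] >= alpha]
        for k in range(n_clusters)
    ]


# Toy fuzzy partition: rows are clients, columns are clusters,
# and each row sums to one (an assumption of this sketch).
U = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
    [0.5, 0.5],
]
groups = alpha_cut(U, 0.6)
print(groups)
```

When each row sums to one and alpha is greater than 0.5, no client can pass the threshold in two clusters, so the resulting groups are disjoint. Clients with ambiguous memberships, such as the last row of `U`, are left unassigned.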

We propose to develop several non-supervised strategies to find the required matrices. To this end, we will use a set of clustering techniques based on both crisp and fuzzy principles [23, 16], as well as on adjacency [21, 12], together with dimensionality reduction and visualization algorithms used mainly in metagenomic applications [19, 9]. Our procedure for extracting information from data in order to build useful knowledge will be tested on this problem, but it still needs to be validated later with the econometric model estimated by Bayesian inference. Also, to verify the information retrieved, interdisciplinary work is important in order to determine whether the results are consistent with the possible interactions among Bancolombia clients in terms of their value and risk.

This report is organized as follows: section 2 presents the general methodology to follow, section 3 introduces the feature spaces generated from the data and describes the procedures for their generation, and section 4 presents the results and a discussion. Finally, section 5 gives the concluding remarks, and section 6 provides supplementary material for this report.



  1. [1]  Santiago Rodríguez C., Manuela O. Bastidas, and O. Lucia Quintero M. Order dependent one-vs-all tree based binary classification scheme for multiclass automatic speech emotion recognition. In Memorias del XVI Congreso Latinoamericano de Control Automático, CLCA 2014, 2014.

  2. [2]  Stephen L. Chiu. Fuzzy model identification based on cluster estimation. Journal of Intelligent and Fuzzy Systems, 2(3):267–278, 1994.

  3. [3]  James W. Demmel. Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.

  4. [4]  Geoff Dougherty. Pattern Recognition and Classification. Springer New York, 2013.

  5. [5]  Lung-fei Lee. Identification and estimation of econometric models with group interactions, contextual factors and fixed effects. Journal of Econometrics, 140(2):333–374, 2007.

  6. [6]  K. He, Y. Sun, D. Bindel, J. Hopcroft, and Y. Li. Detecting overlapping communities from local spectral subspaces. In 2015 IEEE International Conference on Data Mining (ICDM), pages 769–774, 2015.

  7. [7]  K. He, Y. Sun, D. Bindel, J. Hopcroft, and Y. Li. Detecting overlapping communities from local spectral subspaces. ArXiv e-prints, 2015.

  8. [8]  Jyh-Shing Roger Jang, Chuen-Tsai Sun, and Eiji Mizutani. Neuro-fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1997.

  9. [9]  Cedric C. Laczny, Nicolas Pinel, Nikos Vlassis, and Paul Wilmes. Alignment-free visualization of metagenomic data by nonlinear dimension reduction. Scientific Reports, 4:4516, March 2014.

  10. [10]  Lung-fei Lee, Xiaodong Liu, and Xu Lin. Specification and estimation of social interaction models with network structures. Econometrics Journal, 13(2):145–176, 2010.

  11. [11]  Y. Li, K. He, D. Bindel, and J. Hopcroft. Overlapping community detection via local spectral clustering. ArXiv e-prints, 2015.

  12. [12]  Y. Li, K. He, D. Bindel, and J. Hopcroft. Uncovering the small community structure in large networks: A local spectral approach. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15, pages 658–668, New York, NY, USA, 2015. ACM.

  13. [13]  L. Lovász. Random walks on graphs: A survey. In D. Miklós, V. T. Sós, and T. Szőnyi, editors, Combinatorics, Paul Erdős is Eighty, volume 2, pages 353–398. János Bolyai Mathematical Society, Budapest, 1996.

  14. [14]  Bryan F. J. Manly. Multivariate statistical methods: A primer. Chapman & Hall/CRC Press, 3rd edition, 2005. 

  15. [15]  Kimball Martin. Graph theory and social networks - spring 2014 notes, April 2014. Accessed: 28/05/2016.

  16. [16]  O. Lucia Quintero Montoya, Luisa F. Villa, Santiago Muñoz, Ana C. Ruiz Arenas, and Manuela Bastidas. Information retrieval on documents methodology based on entropy filtering methodologies. International Journal of Business Intelligence and Data Mining, 10(3):280–296, 2015.

  17. [17]  Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 849–856, 2001.

  18. [18]  Daniel Spielman. Spectral graph theory, 2015. Accessed: 28/05/2016.

  19. [19]  Laurens van der Maaten. Barnes-Hut-SNE. CoRR, abs/1301.3342, 2013.

  20. [20]  Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.

  21. [21]  Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

  22. [22]  Martin Wattenberg, Fernanda Viégas, and Ian Johnson. How to use t-SNE effectively, 2016.

  23. [23]  Rui Xu and Donald C. Wunsch. Clustering algorithms in biomedical research: A review. IEEE Reviews in Biomedical Engineering, 3:120–154, 2010.

  24. [24]  R. R. Yager and D. P. Filev. Approximate clustering via the mountain method. IEEE Transactions on Systems, Man, and Cybernetics, 24(8):1279–1284, Aug 1994.

  25. [25]  L.A. Zadeh. Fuzzy sets. Information and Control, 8(3):338–353, 1965.

  26. [26]  L.A. Zadeh. Fuzzy algorithms. Information and Control, 12(2):94–102, 1968.

Last modified: 07/02/2017 23:10