Automated Text Clustering of Newspaper and Scientific Texts in Brazilian Portuguese: Analysis and Comparison of Methods
References
Aires, R. V. X. (2000). Implementação, Adaptação, Combinação e Avaliação de Etiquetadores para o Português do Brasil. [Implementation, Adaptation, Combination and Evaluation of Brazilian Portuguese Taggers.] Unpublished MSc Thesis. Universidade de São Paulo, São Paulo, Brazil.
Aluísio, S. M., & Almeida, G. M. B. (2006). O que é e como se constrói um corpus? Lições aprendidas na compilação de vários corpora para pesquisa lingüística. [What is a corpus and how to build a corpus? Lessons learned during the compilation of various corpora for linguistic research.] Calidoscópio, 4(3), 155-177.
Basic, B. D., Berecek, B., & Cvitas, A. (2005). Mining textual data in Croatian. In Proceedings of the 28th International Conference, MIPRO 2005, Business Intelligence Systems. (pp. 61–66).
Bezerra, G. B., Barra, T. V., Ferreira, H. M., & Von Zuben, F. J. (2006). A hierarchical immune-inspired approach for text clustering. Selected papers based on the presentations at the 2006 conference on Information Processing and Management of Uncertainty, IPMU, Paris, France. (pp. 131-142).
Biderman, M. T. C. (2001). O Português Brasileiro e o Português Europeu: Identidade e contrastes. [The Brazilian Portuguese and European Portuguese: Identity and contrasts.] Revue belge de philologie et d'histoire, 79(3), 963-975.
Caldas Junior, J., Imamura, C. Y. M., & Rezende, S. O. (2001). Avaliação de um Algoritmo de Stemming para a Língua Portuguesa. [Evaluation of a Stemming Algorithm for the Portuguese Language.] The Proceedings of the 2nd Congress of Logic Applied to Technology, São Paulo, Brazil. (pp. 267-274). São Paulo, Brazil: SENAC/Plêiade.
Camargo, Y. B. L. (2007). Abordagem lingüística na classificação de textos em português. [A linguistic approach to the classification of texts in Portuguese.] Unpublished MSc Thesis. COPPE - Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil.
DaSilva, C. F., Vieira, R., Osório, F. S., & Quaresma, P. (2004). Mining Linguistically Interpreted Texts. In Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora.
Fellbaum, C. (1998). WordNet. An electronic lexical database. Cambridge, MA: MIT Press.
Furlanetto, M. M. (2008). Neological Formations in Brazilian Portuguese: a Discursive View. Fórum Lingüístico, 5(2), 1-22, Florianópolis, Brazil.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley.
Hall, M. A. (1998). Correlation-based Feature Subset Selection for Machine Learning. Ph.D. dissertation. University of Waikato, Hamilton, New Zealand.
IBGE (2000). Brasil: 500 anos de povoamento [Brazil: 500 years of settlement]. Rio de Janeiro: IBGE.
Hammouda, K. M., & Kamel, M. S. (2004). Efficient phrase-based document indexing for web document clustering. IEEE Transactions on Knowledge and Data Engineering, 16(10), 1279–1296.
Jones, G., Robertson, A. M., Santimetvirul, C., & Willett, P. (1995). Non-hierarchic document clustering using a genetic algorithm. Information Research, 1(1). Retrieved 15 April, 2012 from http://InformationR.net/ir/1-1/paper1.html.
Kohonen, T. (1997). Self-Organizing Maps. 2nd ed., Berlin: Springer-Verlag.
Kohonen, T. (1998). Self-Organization of Very Large Document Collections: State of the Art. Proceedings of ICANN98, the 8th International Conference on Artificial Neural Networks, Skövde, Sweden, 2-4 September, 1998, 8(1), 65-74: Springer.
Maia, L. C., & Souza, R. R. (2010). Uso de sintagmas nominais na classificação automática de documentos eletrônicos. [The use of noun phrases in automatic classification of electronic documents.] Perspect. Ciênc. Inf., 15(1), 154-172.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi: Cambridge University Press.
Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts: The MIT Press.
Markov, Z., & Larose, D. T. (2007). Data Mining the Web: Uncovering Patterns in Web Content, Structure and Usage. Hoboken, New Jersey: John Wiley & Sons, Inc.
Palmeira, E., & Freitas, F. (2007). Ontologias detalhadas e classificação de texto: uma união promissora. [Detailed ontologies and text classification: a promising union.] ENIA 2007: VI Encontro Nacional de Inteligência Artificial. Rio de Janeiro, Brazil, July, 03-06, 2007. Rio de Janeiro: Instituto Militar de Engenharia.
Ranganathan, S. R. (1967). Prolegomena to Library Classification. London: Asia Publishing House.
Ratnaparkhi, A. (1996). A Maximum Entropy Model for Part-of-Speech Tagging. Proceedings of the First Empirical Methods in NLP Conference. University of Pennsylvania, May 17-18, 1996. (pp. 133-142).
Reis, J. J. (2000). Presença Negra: conflitos e encontros. [Black Presence: conflicts and encounters.] In Brasil: 500 anos de povoamento (p. 91). Rio de Janeiro: IBGE.
Rosell, M., & Velupillai, S. (2005). The Impact of Phrases in Document Clustering for Swedish. Proceedings of the 15th NODALIDA conference, NoDaLiDa 2005, Joensuu, Finland. (pp. 173-179).
Seno, E. R. M., & Nunes, M. D. V. (2008). Some Experiments on Clustering Similar Sentences of Texts in Portuguese. In Teixeira, A., StrubeDeLima, V. L., CaldasDeOliveira, L., Quaresma, P. (Eds.), Lecture Notes in Artificial Intelligence, Vol. 5190. 8th International Conference on Computational Processing of the Portuguese Language, PROPOR 2008, Aveiro, Portugal, September 08-10, 2008. (pp. 133-142). Berlin, Germany: Springer-Verlag.
Silva, A. S. (2006). Sociolinguística cognitiva e o estudo da convergência/divergência entre o Português Europeu e o Português Brasileiro. [Cognitive Sociolinguistics and the study of convergence/divergence between European Portuguese and Brazilian Portuguese.] Veredas: Revista de Estudos Lingüísticos, 10 (2006). Universidade Federal de Juiz de Fora.
Slonim, N., Friedman, N., & Tishby, N. (2002). Unsupervised document classification using sequential information maximization. Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, August 11-15, 2002. (pp. 129-136). New York: ACM Press.
Song, W., & Park, S. C. (2006). Genetic Algorithm-based Text Clustering Technique. In Licheng Jiao, Lipo Wang, Xinbo Gao, Jing Liu, Feng Wu (Eds.), Lecture Notes in Computer Science, Vol. 4221. Advances in Natural Computation, Second International Conference, ICNC 2006, Xi'an, China, September 24-28, 2006. (pp. 779-782). Berlin: Springer-Verlag.
Stefanowski, J., & Weiss, D. (2003). Web search results clustering in Polish: experimental evaluation of Carrot. Advances in Soft Computing, Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'03 Conference, Zakopane, Poland, vol. 579 (14). (pp. 209-22).
Viera, A. F. G., & Virgil, J. (2007). Uma revisão dos algoritmos de radicalização em língua portuguesa. [A review of stemming algorithms for the Portuguese language.] Information Research, 12(3), paper 315. Retrieved 15 April, 2012 from http://InformationR.net/ir/12-3/paper315.html.
Witten, I. H., & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Amsterdam, Boston, Heidelberg, London, New York, Oxford, Paris, San Diego, San Francisco, Singapore, Sydney, Tokyo: Elsevier.
DOI: http://dx.doi.org/10.4301/s1807-17752014000200011