The Webometrics Ranking formally and explicitly adheres to the Berlin Principles on Ranking of Higher Education Institutions. The ultimate aim is the continuous improvement and refinement of the methodologies according to a set of agreed principles of good practice.
0) Background of the project.
The “World Universities' ranking on the Web” is an initiative of the Cybermetrics Lab, a research group of the Centro de Información y Documentación (CINDOC), part of the National Research Council (CSIC), the largest public research body in Spain.
The Cybermetrics Lab is devoted to the quantitative analysis of the Internet and Web contents, especially those related to the processes of generation and scholarly communication of scientific knowledge. This is a new emerging discipline that has been called Cybermetrics (our team has developed and published the free electronic journal Cybermetrics since 1997) or Webometrics.
We believe that websites can better reflect the output of professors and researchers, as they provide a more complete picture of their activities. The Web offers the possibility of using not only formal (e-journals, repositories) but also informal scholarly communication for that purpose. Web publication is cheaper, while maintaining the high standards of quality of peer review processes. It can also reach much larger potential audiences, and with the current emphasis on "Open Access" initiatives, it can give access to scientific knowledge to researchers and institutions located in developing countries. Third parties (economic, industrial, political or cultural stakeholders) can also benefit from the universal access granted by Web publication.
With these rankings we intend to provide extra motivation to researchers worldwide for publishing more and better scientific content on the Web, making it available to colleagues and people wherever they are located.
The "Webometrics Ranking of World Universities" was officially launched in 2004, and it is updated every 6 months (data collected in January and July and published one month later). The Web indicators used are based on and correlated with traditional scientometric and bibliometric indicators, and the goal of the project is to convince academic and political communities of the importance of web publication not only for the dissemination of academic knowledge but also for measuring scientific activities, performance and impact.
A) Purposes and Goals of Rankings
1. Assessment of higher education (processes, and outputs) in the Web.
The Web-based rankings could be combined with other non-web indicators, and we are already publishing comparative analyses with similar initiatives. But the current objective of the Webometrics Ranking is to promote Web publication by universities, evaluating these organizations' commitment to electronic distribution and fighting a very concerning academic digital divide that is evident even among world universities from developed countries. However, even though we do not intend to assess university performance solely on the basis of web output, the Webometrics Ranking measures a wider range of activities than the current generation of bibliometric indicators, which focuses only on the activities of the scientific elite.
2. Ranking purpose and target groups. The Webometrics Ranking measures the volume, visibility and impact of the web pages published by universities, with special emphasis on scientific output (refereed papers, conference contributions, pre-prints, monographs, theses, reports, …) but also taking into account other materials (courseware, seminar or workshop documentation, digital libraries, databases, multimedia, personal pages, …) and general information on the institution, its departments, research groups or supporting services, and the people working there or attending courses.
The direct target group of the Ranking is university authorities. If the web performance of an institution is below the position expected from its academic excellence, they should reconsider their web policy, promoting substantial increases in the volume and quality of their electronic publications.
Faculty members are an indirect target group, as we expect that in the near future web information could be as important as other bibliometric and scientometric indicators for evaluating the scientific performance of scholars and their research groups.
Finally, candidate students should not use this data as the sole guide for choosing a university, although a top position means that the institution has a policy that encourages new technologies and has resources for their adoption.
3. Diversity of institutions: Missions and goals of the institutions. Quality measures for research-oriented institutions, for example, are quite different from those that are appropriate for institutions that provide broad access to underserved communities. Institutions that are being ranked and the experts that inform the ranking process should be consulted often.
4. Information sources and interpretation of the data provided. Access to Web information is done mainly through search engines. These intermediaries are free, universal, and very powerful even considering their shortcomings (coverage limitations and biases, lack of transparency, commercial secrets and strategies, irregular behaviour). Search engines are key for measuring the visibility and impact of universities' websites.
There are a limited number of sources that can be useful for webometric purposes: seven general search engines (Google*, Yahoo Search*, Live (MSN) Search*, Exalead*, Ask (Teoma), Gigablast and Alexa) and two specialised scientific databases (Google Scholar* and Live Academic). All of them have very large independent databases, but due to the availability of their data collection procedures (APIs), only those marked with an asterisk are used in compiling the Webometrics Ranking.
5. Linguistic, cultural, economic, and historical contexts. The project intends to have true global coverage, not narrowing the analysis to a few hundred institutions (world-class universities) but including as many organizations as possible. The only requirement in our international rankings is having an autonomous web presence with an independent web domain. This approach allows a larger number of institutions to monitor their current ranking and the evolution of this position after adopting specific policies and initiatives. Universities in developing countries have the opportunity to know precisely the indicator thresholds that mark the limit of the elite.
Currently identified biases of the Webometrics Ranking include the traditional linguistic one (more than half of Internet users are English speakers) and a new disciplinary one (technology, rather than biomedicine, is at the moment the hot topic). Since in most cases the infrastructure (web space) and connectivity to the Internet already exist, the economic factor is not considered a major limitation (at least for the Top 3,000 universities).
B) Design and Weighting of Indicators
6. Methodology used to create the rankings. The unit of analysis is the institutional domain, so only universities and research centres with an independent web domain are considered. If an institution has more than one main domain, two or more entries are used with the different addresses. About 5-10% of the institutions have no independent web presence, most of them located in developing countries. Our catalogue of institutions includes not only universities but also other higher education institutions, following the recommendations of UNESCO. Names and addresses were collected from both national and international sources including, among others:
University activity is multi-dimensional, and this is reflected in its web presence. So the best way to build the ranking is to combine a group of indicators that measure these different aspects. Almind & Ingwersen proposed the first Web indicator, the Web Impact Factor (WIF), based on link analysis, which combines the number of external inlinks and the number of pages of the website in a 1:1 ratio between visibility and size. This ratio is used for the ranking, but two new indicators are added to the size component: the number of documents, measured from the number of rich files in a web domain, and the number of publications collected by the Google Scholar database. As already noted, the four indicators were obtained from the quantitative results provided by the main search engines as follows:
Size (S). Number of pages recovered from four engines: Google, Yahoo, Live Search and Exalead. For each engine, results are log-normalised to 1 for the highest value. Then, for each domain, the maximum and minimum results are excluded and every institution is assigned a rank according to the combined sum.
Visibility (V). The total number of unique external links received (inlinks) by a site can only be confidently obtained from Yahoo Search, Live Search and Exalead. For each engine, results are log-normalised to 1 for the highest value and then combined to generate the rank.
Rich Files (R). After evaluation of their relevance to academic and publication activities and considering the volume of the different file formats, the following were selected: Adobe Acrobat (.pdf), Adobe PostScript (.ps), Microsoft Word (.doc) and Microsoft Powerpoint (.ppt). These data were extracted using Google and merging the results for each filetype after log-normalising in the same way as described before.
Scholar (Sc). Google Scholar provides the number of papers and citations for each academic domain. These results from the Scholar database represent papers, reports and other academic items.
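The per-engine log-normalisation and extreme-exclusion steps described for the indicators above can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the engine names, domains and page counts are invented, and the combination rule follows the textual description (normalise each engine's counts to 1 for the highest value, drop the maximum and minimum result per domain, sum the rest).

```python
import math

def log_normalise(counts):
    """Log-normalise raw counts so the highest value maps to 1."""
    logs = {domain: math.log(c + 1) for domain, c in counts.items()}
    top = max(logs.values())
    return {domain: v / top for domain, v in logs.items()}

# Invented page counts per domain, one dict per search engine.
engines = {
    "google":  {"uni-a.edu": 120000, "uni-b.edu": 40000, "uni-c.edu": 5000},
    "yahoo":   {"uni-a.edu":  90000, "uni-b.edu": 60000, "uni-c.edu": 8000},
    "live":    {"uni-a.edu": 150000, "uni-b.edu": 30000, "uni-c.edu": 4000},
    "exalead": {"uni-a.edu":  80000, "uni-b.edu": 50000, "uni-c.edu": 6000},
}

normalised = {engine: log_normalise(c) for engine, c in engines.items()}

def size_score(domain):
    """Drop the maximum and minimum normalised result, sum the rest."""
    vals = sorted(normalised[engine][domain] for engine in engines)
    return sum(vals[1:-1])

# Higher combined score = better (earlier) position.
ranking = sorted(engines["google"], key=size_score, reverse=True)
```

The same normalisation would apply to the Visibility indicator, only with inlink counts from the three engines that report them instead of page counts from four.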
The four ranks were combined according to a formula in which each component receives a different weight:

Webometrics Rank (position) = w_S·Rank(S) + w_V·Rank(V) + w_R·Rank(R) + w_Sc·Rank(Sc)

where the w terms are the component weights, with visibility (V) carrying the largest weight.
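The weighted combination of the four component ranks can be sketched as below. The numeric weights here are illustrative placeholders (summing to 1, with visibility weighted most heavily), not necessarily the published values, and the two institutions' component ranks are invented.

```python
# Assumed, illustrative weights for Size, Visibility, Rich Files, Scholar.
WEIGHTS = {"S": 0.2, "V": 0.5, "R": 0.15, "Sc": 0.15}

def webometrics_rank(ranks):
    """Weighted sum of component rank positions (lower = better)."""
    return sum(WEIGHTS[k] * ranks[k] for k in WEIGHTS)

# Two hypothetical institutions with component positions 1..N.
uni_a = {"S": 3, "V": 1, "R": 2, "Sc": 4}
uni_b = {"S": 1, "V": 5, "R": 1, "Sc": 1}

components = {"uni_a": uni_a, "uni_b": uni_b}
better_first = sorted(components, key=lambda u: webometrics_rank(components[u]))
```

Because visibility dominates the weighting, uni_a (first in visibility) places ahead of uni_b despite worse positions on the other three components.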
7. Relevance and validity of the indicators. The choice of indicators was made according to several criteria (see note), some of them trying to capture quality and academic and institutional strengths, others intending to promote web publication and Open Access initiatives. The inclusion of the total number of pages is based on the recognition of a new global market for academic information, the Web being the adequate platform for the internationalization of institutions. A strong and detailed web presence providing exact descriptions of the structure and activities of the university can attract new students and scholars worldwide. The number of external inlinks received by a domain is a measure that represents the visibility and impact of the published material, and although there is a great diversity of motivations for linking, a significant fraction of links work in a similar way to bibliographic citations. The success of self-archiving and other repository-related initiatives can be roughly represented by the rich file and Scholar data. The huge numbers involved with the pdf and doc formats mean that not only administrative reports and bureaucratic forms are involved. PostScript and PowerPoint files are clearly related to academic activities.
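The per-format rich-file counts can be gathered with the standard `site:` and `filetype:` search operators supported by the engines of that period. The sketch below only builds the query strings; the domain is a placeholder, and actually fetching counts would go through an engine's API, which is not shown here.

```python
# File formats selected for the Rich Files (R) indicator.
RICH_FORMATS = ["pdf", "ps", "doc", "ppt"]

def rich_file_queries(domain):
    """One count query per selected file format, restricted to one web domain."""
    return [f"site:{domain} filetype:{ext}" for ext in RICH_FORMATS]

queries = rich_file_queries("example.edu")
# e.g. the first query is "site:example.edu filetype:pdf"
```

The four per-format counts would then be log-normalised and merged exactly as described for the Size indicator.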
8. Measure outcomes in preference to inputs whenever possible. Data on inputs are relevant as they reflect the general condition of a given establishment and are more frequently available. Measures of outcomes provide a more accurate assessment of the standing and/or quality of a given institution or program. We expect to offer a better balance in the future, but the current edition intends to call attention to incomplete strategies, inadequate policies and bad practices in web publication before attempting a more complete scenario.
9. Weighting the different indicators: Current and future evolution. The current rules for the ranking indicators, including the described weighting model, have been tested and published in scientific papers. More research is still being done on this topic, but the final aim is to develop a model that includes additional quantitative data, especially bibliometric and scientometric indicators.
C) Collection and Processing of Data
10. Ethical standards. We have identified some relevant biases in the search engine data, including under-representation of some countries and languages. As the behaviour differs for each engine, a good practice consists of combining results from several sources. Any other mistake or error is unintentional and should not affect the credibility of the ranking. Please contact us if you think the ranking is not objective and impartial in any way.
11. Audited and verifiable data. The only source for the data of the Webometrics Ranking is a small set of globally available, free-access search engines. All the results can be duplicated according to the described methodologies, taking into account the explosive growth of web contents, their volatility and the irregular behaviour of the commercial engines.
12. Data collection. Data are collected during the same week, in two consecutive rounds for each strategy, with the higher value selected. Every website under a common institutional domain is explored, but no attempt has been made to combine contents or links from different domains.
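The two-round rule above amounts to taking the maximum of two measurements per query. A minimal sketch, assuming the per-round counts have already been retrieved (the query strings and numbers below are invented):

```python
def collect(round1, round2):
    """Keep the higher of the two counts reported for each query,
    per the two-consecutive-rounds collection rule."""
    return {q: max(round1[q], round2[q]) for q in round1}

round1 = {"site:example.edu": 41000, "link:example.edu": 900}
round2 = {"site:example.edu": 39500, "link:example.edu": 1020}
best = collect(round1, round2)
```

Taking the higher value partially compensates for the engines' irregular behaviour, since reported counts can fluctuate between identical queries.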
13. Quality of the ranking processes. After automatic collection of the data, positions are checked manually and compared with previous editions. Some of the processes are duplicated, and new expertise is added from a variety of sources. Pages that link to the Webometrics Ranking are explored, and comments from blogs and other fora are taken into account. Finally, our mailbox receives many requests and suggestions, which are acknowledged individually.
14. Organizational measures to enhance credibility. The ranking results and methodologies are discussed in scientific journals, and presented in international conferences. We expect international advisory or even supervisory bodies to take part in future developments of the ranking.
D) Presentation of Ranking Results
15. Display of data and factors involved.
The published tables show all the Web indicators used in a very synthetic and visual way. Rankings are provided not only from a central Top 3000 classification but also considering several regional rankings for comparative purposes.
16. Updating and error correction. The listings are served from dynamic ASP pages built on several databases that can be corrected when errors or typos are detected.
Our group welcomes comments, suggestions and proposals that can help improve this website. We try to maintain an objective position on the quantitative data provided, but mistakes can occur. Please take into account that mergers, domain changes or network problems can affect the ranking of the institutions.
Currently the members of our team are Isidro F. AGUILLO, José Luis ORTEGA, Mario FERNÁNDEZ (Webmaster) and Helena ZAMORA.
For more information please contact:
Isidro F. Aguillo
CINDOC - CSIC
Joaquín Costa, 22
28002 Madrid. SPAIN
- Aguillo, I. F.; Granadino, B.; Ortega, J. L.; Prieto, J. A. (2006). Scientific research activity and communication measured with cybermetric indicators. Journal of the American Society for Information Science and Technology, 57(10): 1296-1302.
- Wouters, P.; Reddy, C.; Aguillo, I. F. (2006). On the visibility of information on the Web: an exploratory experimental approach. Research Evaluation, 15(2): 107-115.
- Ortega, J. L.; Aguillo, I. F.; Prieto, J. A. (2006). Longitudinal study of contents and elements in the scientific Web environment. Journal of Information Science, 32(4): 344-351.
- Kretschmer, H.; Aguillo, I. F. (2005). New indicators for gender studies in Web networks. Information Processing & Management, 41(6): 1481-1494.
- Aguillo, I. F.; Granadino, B.; Ortega, J. L.; Prieto, J. A. (2005). What the Internet says about Science. The Scientist, 19(14): 10, Jul. 18, 2005.
- Kretschmer, H.; Aguillo, I. F. (2004). Visibility of collaboration on the Web. Scientometrics, 61(3): 405-426.
- Cothey, V.; Aguillo, I. F.; Arroyo, N. (2006). Operationalising "Websites": lexically, semantically or topologically? Cybermetrics, 10(1): Paper 4. http://www.cindoc.csic.es/cybermetrics/articles/v10i1p4.html