Abstract
Image search is the second most frequently used search service on the Web. However, very few studies have investigated any aspect of it. In this study, we investigate the precision of the Google and Bing Web image search engines for popular and less popular entities using text-based queries. Furthermore, we investigate four additional aspects of Web image search engines that have not been studied before. We used 60 different queries in total, drawn from three different domains and divided into popular and less popular categories. We examined the relevancy of the top 100 images for each query. Our results indicate that image search is a solved problem for popular entities: the engines deliver 97% precision on average. However, precision values are much lower for less popular entities. For the top 100 results, average precision is 48% for Google and 33% for Bing. The most important problem seems to be the worst cases, in which the precision can be less than 10%. The results show that significant improvement is needed to better identify relevant images for less popular entities. One of the main issues is the association problem: when a Web page has query words and multiple images, both Google and Bing have difficulty determining the relevant images.
1. Introduction
There are many images on the Web and this number is growing every day. ‘More than 250 billion photos have been uploaded to Facebook as of September 2013, and more than 350 million photos are uploaded every day on average’. 1 These images contain very valuable information and users frequently search for images on the Web. According to Google [1], image search is the second most frequently used search service after the general Web search. Google reports that it performs image search among hundreds of billions of images as of 2009. Therefore, image search is a very important form of information retrieval on the Web.
General algorithms for text-based document searches on the Web are well established. There are well-known algorithms for crawling, indexing, ranking and performing the searches [2]. However, the general algorithms for image search are less established. In the case of document searches, both the query and the documents are natural language texts. Search engines can compute the similarity of a query to documents by using text similarity methods and select the most similar documents. However, when performing image searches, the query and the target images are in two different forms. While the query is in the form of natural language text, the target documents are image pixels. Therefore, there is no way of directly comparing the similarity of queries to target images.
Web image search engines rely heavily on the texts in Web pages to determine the aboutness of images on those Web pages [1, 3, 4]. For every image, they determine a set of relevant keywords by using the alt text of the image source tag, image file name, image caption, surrounding text, title of the Web page, content of the Web page, etc. Then, they calculate the similarity of queries to the relevant keywords of images. In addition, they use the hyperlink structure of the Web to determine the popularity and authority of images [5, 6].
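The text-based indexing described above can be sketched as follows. This is a minimal illustration, not the method of any particular search engine: the function names, the choice of fields and the overlap score are all assumptions made for the example, and real systems weight the fields and use far richer similarity measures.

```python
import re
from urllib.parse import urlparse

def tokenize(text):
    """Lowercase a string and split it into runs of letters and digits."""
    return re.findall(r"[^\W_]+", text.lower())

def image_keywords(alt_text, image_url, caption, page_title):
    """Collect candidate keywords for one image from its textual context."""
    file_name = urlparse(image_url).path.rsplit("/", 1)[-1]
    stem = file_name.rsplit(".", 1)[0]  # drop the file extension
    keywords = set()
    for field in (alt_text, stem, caption, page_title):
        keywords.update(tokenize(field))
    return keywords

def query_score(query, keywords):
    """Fraction of query terms found among the image keywords (hypothetical score)."""
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    return len(terms & keywords) / len(terms)

# Hypothetical image and page data, for illustration only.
keywords = image_keywords(
    alt_text="Empire State Building at night",
    image_url="https://example.com/img/empire-state.jpg",
    caption="View from 5th Avenue",
    page_title="New York landmarks",
)
score = query_score("empire state building", keywords)
```

Because every keyword comes from the page text, an object that appears in the image but is never mentioned on the page cannot be matched.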
There are two main problems with this method. First, many objects or concepts that exist in images may not be mentioned in these texts. Usually, an image contains so much information that even a person cannot determine all relevant keywords for it. Therefore, indexing images based only on the texts results in poor recall. Second, many of the keywords on those Web pages may not be relevant to the indexed images on the Web pages. This results in poor precision. In addition, many Web pages contain more than one image and a lot of text. It is not easy to associate the correct keywords on a Web page with the correct images on that Web page. There have been many studies addressing this problem [4, 7], but it is still a difficult issue.
There have been many studies on developing methods to automatically annotate images and determine the relevant keywords by examining the image content. The objective of these studies is to determine a more complete set of relevant keywords for images. This should improve both the precision and recall of image search systems. A review of recent work is given in [8]. The recent trend in automatic image annotation is to use machine-learning methods to label images or their subsections. However, these methods are computing-intensive and work with a limited set of keywords. In general, automatic image annotation is a very difficult problem [9]. It is unlikely to achieve acceptable performance at the scale of the Web, where there are millions of keywords and billions of images.
While there are many studies investigating various aspects of Web search engines for document retrieval [10], there are far fewer studies investigating aspects of Web image search engines. We were able to find only three studies [11, 12, 13] in the literature related to the evaluation of Web image search engines.
In this study, we investigate the precision of Web image search engines for popular and less popular entities. The main difference between our study and those three previous studies is that we determined separate query sets for popular and less popular categories. In all the previous studies, most of the selected queries represent popular entities and some represent less popular ones. However, there is no discussion of the effect of the popularity of searched entities on precision. Our study shows that the popularity of searched entities affects the precision significantly. In addition, the limits of Web image search engines are better observed by using less popular queries. It is more difficult for Web image search engines to locate entities that have fewer instances on the Web. These tests reveal more about the strengths and weaknesses of Web image search engines. They show the areas where more research is needed. Moreover, we select unambiguous queries to better evaluate the relevancy of returned results. In addition, the extra research questions that we investigate have not been studied before.
1.1. Research questions
In this study, we focus on two Web image search engines that crawl the public Web for images and accept natural language queries: Google image search and Bing image search. We investigate their precision for natural language queries. More specifically, we address the following research questions.
What is the precision of Web image search engines for popular entities that are very common and have many images on the Web?
What is the precision of Web image search engines for less popular entities that are less common and have fewer images on the Web?
We also examine the following subquestions.
Does the type of searched entities affect the precision of Web image search engines?
Do these image search engines employ automatic image annotation methods when indexing images or do they rely solely on texts in Web pages?
Can they associate the correct keywords on Web pages with the relevant images on the same Web page?
Do they index images for the text on them using OCR?
Comparing the two image search engines is not our primary goal. Our primary goal is to understand the state of the art in Web image search engines. We would like to determine the strengths and weaknesses of Web image search engines. This should help researchers to understand and identify the problems in this field better. In addition, it would help end users to better understand and use Web image search engines.
The paper is organized as follows. In the next section, we give an overview of image search engines on the Web. In section 3, we review the literature. In section 4, we explain the method that we used. In sections 5 and 6, we investigate the precision of image search engines for popular and less popular entities. In section 7, we investigate the subquestions outlined above. In section 8, we present the conclusions.
2. Image search engines
Image search is a difficult problem and there are many proposed solutions. We provide a brief overview of current image search engines. We divide the image search engines into the following five categories based on the methods they use and the services they provide.
Web image search engines.
Tag-based image search engines.
Graph-based image search engines.
Content-based image search engines.
Image-to-query search engines.
Web image search engines crawl the public Web for images and mainly use the texts on Web pages to index them. They also use the graph structure of the Web to calculate the popularity and authority of images. They primarily provide search services for text-based queries. Primary examples include Google Image Search, Bing Image Search, Yandex Image Search and Yahoo Image Search.
Tag-based image search engines compile their own image sets by encouraging users to upload images to their Web sites. Users tag their images when uploading to the Web site. They may also provide a title and description for images. Other users can post comments, like them or tag as favourites. Queries are natural language texts and image search is primarily performed on image tags, descriptions and titles. They also use comments, favourites and the number of views to rank the results. Primary examples include Flickr and Instagram.
Graph-based image search engines construct a graph database consisting of entities and relationships. Entities are people, places, institutions, businesses, concepts, etc. Relationships are the connections between them. Images are associated with entities in this graph and can be searched with semi-structured text queries. Some sample queries are ‘photos uploaded by my friends’, ‘Photos of Empire State Building’ and ‘photos of empire state building uploaded by my friends’. A primary example is Facebook image search.
Content-based image search engines take an image as a query and return a ranked list of similar images. They measure the similarity of the query image to the images in the search engine's collection. They find exact copies, different sizes and modified images from different sources. Google, Bing and TinEye all provide such a search service.
Image-to-query search engines accept an image as a query and return a text description for that image. When the image of the Empire State Building is submitted as a query, it should return the text ‘Empire State Building’. This service is provided only by Google. When returning the text description of the query image, it also performs a general Web search on the determined description. This service is designed to be used primarily by mobile users. They may take a picture of something and query the search engine for more information.
3. Literature review
We provide the literature review for studies investigating the precision of Web document search engines, studies investigating the precision of Web image search engines and studies investigating various aspects of Web image search. There have been many studies evaluating retrieval effectiveness of search engines. A review of search engine evaluation studies is given in [10].
In studies investigating the retrieval effectiveness of Web search engines, (1) a set of information needs and corresponding queries is determined, (2) queries are submitted to the search engines and the top k documents returned are tagged for relevancy, (3) precision and relative recall values are calculated for quantifying the retrieval effectiveness.
In one of the earliest studies, Gordon and Pathak [14] investigated and compared the retrieval effectiveness of seven Web search engines. They collected 33 information needs from faculty members. Experienced searchers determined the best queries and performed the searches. Faculty members who provided the information needs tagged the top 20 documents of each query for relevancy. They calculated the precision values at various document cut-off values (DCV). They also calculated the relative recall values by assuming the total number of relevant documents to be all relevant documents retrieved by these seven search engines.
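The pooling idea behind relative recall can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure of [14]: the engine names and document identifiers below are hypothetical, and the pool is simply the union of the relevant documents retrieved by all engines.

```python
def relative_recall(retrieved_relevant, pooled_relevant):
    """Share of the pooled relevant documents that one engine retrieved."""
    if not pooled_relevant:
        return 0.0
    return len(retrieved_relevant & pooled_relevant) / len(pooled_relevant)

# Hypothetical identifiers of the relevant documents each engine returned.
relevant_by_engine = {
    "engine_a": {"d1", "d2", "d3"},
    "engine_b": {"d2", "d4"},
    "engine_c": {"d5"},
}

# The pool stands in for the unknown total number of relevant documents.
pool = set().union(*relevant_by_engine.values())
recall_a = relative_recall(relevant_by_engine["engine_a"], pool)
```

Relative recall computed this way is an upper bound on true recall, since relevant documents that no engine retrieved are missing from the pool.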
Another study that investigated and compared the retrieval effectiveness of early search engines was by Hawking et al. [15]. They investigated 20 Web search engines, and selected 54 queries from search logs of two search engines. They selected longer queries (5.9 words in length on average) to make sure that the intent of the queries is clear. They submitted all queries to all search engines and tagged the top seven documents for relevancy. They calculated the precision values at cut-off points of 1 and 5. They did not calculate the recall values since that depends on accurate estimates of the total number of relevant documents in the collection.
A more recent study [16] in 2007 evaluated the retrieval effectiveness of five search engines (Google, Yahoo, Live, Ask, AOL) for 50 queries related to the field of library and information science. They collected the top ten documents for each query from each search engine and evaluated the relevancy of each document. In addition, they measured the stability of results by conducting the same experiment one month later. They compared these search engines for their precision, coverage and stability.
The most recent retrieval effectiveness study is by D. Lewandowski in 2015 [17]. While previous studies used queries in the order of tens, usually specific to a domain, he used a much larger and more representative query set. Lewandowski used 1,000 queries from a search engine log. He assembled two different query sets for informational and navigational queries, and submitted the queries to Google and Bing, then retrieved the top ten documents for each informational query and only the top link for each navigational query. He used university students to judge the relevancy of documents and paid them for each completed task. Lewandowski found that the average precision of both search engines was similar for informational queries, with Google leading by 3%. The difference was more significant for navigational queries. Google outperformed Bing with a 19% difference.
Investigating the retrieval effectiveness of image search engines is similar to the investigation of the retrieval effectiveness of Web search engines. A set of information needs or queries is determined. These are submitted to the search system and the returning images are tagged for relevancy. Then, retrieval effectiveness parameters are calculated.
Retrieval effectiveness of Web image search engines was first investigated by Çakir, Bahçeci & Bitirim [11]. They determined seven topics from the top search terms used on the Web. They selected five queries from each topic and submitted these 35 queries to four different Web image search engines. They retrieved the top 40 images for each query and determined the relevancy of each image. Another study [12], by Fendley & Kidambi, investigated the retrieval effectiveness of three Web image search engines for queries related to the game of cricket. They focused particularly on the effects of three factors on retrieval effectiveness: query type, number of images retrieved and type of search engine. They determined four types of queries, selected five queries from each type and submitted these 20 queries to each Web image search engine. The top 40 images were evaluated for relevancy. The third study, by Tokgöz et al. [13], investigated the retrieval effectiveness of Web and image search engines for Turkish queries. Under seven categories, four queries were determined and submitted to four different search engines. The top 40 results were retrieved for evaluation.
There has been an extensive amount of research on many aspects of Web image searches. We review some of those research efforts to illustrate the difficulties and proposed solutions. One of the earliest projects was WebSeer [18] from the University of Chicago in 1996. They crawled the Web for images and developed a Web image search engine. It accepted text-based queries and returned a list of relevant images. For indexing, they used the various texts around the images on Web pages, such as filenames, captions, alt text, HTML title, etc. Another project was WebSeek [19] from Columbia University in 1997. They used image content analysis methods in addition to text processing. They provided text-based querying. However, they also supported query modification using content-based relevance feedback. Users could refine the searches by selecting some of the initially returned images. They used a colour histogram similarity measure to categorize the images and determine which images were most similar to the queried images.
As Google became popular by using the hyperlink-based PageRank algorithm [20], similar methods were used in Web image search systems. The developers of the project PicASHOW [5] used the hyperlink graph of the Web to determine the authoritative images on the Web. In addition, they used the ideas from co-citation networks to relate images on the same Web pages and the images linked from the same Web pages. In another project named iFind [6], the ideas from hyperlink analysis were improved by dividing Web pages into vision-based blocks and applying block-based hyperlink analysis methods [21]. This method is less sensitive to noisy links and can better reflect the semantic relationship between images. In this method, surrounding texts are also extracted from blocks instead of whole Web pages.
An important problem for Web image searching is to associate the correct keywords in a Web page with the correct images on that Web page. The texts surrounding images are very noisy and it is difficult to identify the relevant keywords. A project named AnnoSearch [22] targeted this problem by determining visually similar images and combining the common phrases around them. It is assumed that visually similar images should also have similar keywords. Another project named ARISTA [23] applied this method to near-duplicate images on the Web. They examined two billion Web images and discovered that 22% have near-duplicates and 8.1% have more than ten near-duplicates. They combined the surrounding texts of near-duplicate images and determined a more accurate set of keywords for them. Their approach is particularly helpful for accurately tagging popular concepts with many near-duplicate images.
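The idea of combining the texts around near-duplicate images can be illustrated with a simple vote over terms. This sketch is not the AnnoSearch or ARISTA algorithm: it assumes the near-duplicate copies have already been found, and the function name, the `min_share` threshold and the sample texts are hypothetical.

```python
from collections import Counter
import re

def shared_keywords(surrounding_texts, min_share=0.5):
    """Terms that occur around at least min_share of the near-duplicate copies."""
    doc_freq = Counter()
    for text in surrounding_texts:
        # Count each term once per copy, however often it repeats there.
        doc_freq.update(set(re.findall(r"[^\W_]+", text.lower())))
    threshold = min_share * len(surrounding_texts)
    return {term for term, n in doc_freq.items() if n >= threshold}

# Hypothetical texts found around three copies of the same image.
texts = [
    "Sunset over the Golden Gate bridge",
    "golden gate bridge, San Francisco",
    "My trip photos: Golden Gate at dusk",
]
common = shared_keywords(texts, min_share=1.0)
```

Terms that appear around only one copy are treated as noise, which is why the approach works best for images with many near-duplicates.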
Automatic image annotation [8] is another active research area for image search. The main idea is to learn semantic concepts from a large number of image samples by feature extraction and use machine learning models to label new images. This method requires sample images to be semantically labelled in advance. However, it is a very difficult task to build training datasets for concepts at the scale of the Web. In addition, the accuracy of machine learning algorithms decreases as the number of concepts increases. These difficulties motivated some researchers to look for data-intensive solutions. Torralba et al. [24] studied the impact of having very large datasets when applying simple techniques for object and scene recognition. They used 80 million tiny images gathered from the Web for 75,000 non-abstract nouns from WordNet [25]. They showed that simple non-parametric methods, in conjunction with large datasets, could give a reasonable performance on object recognition tasks.
4. The method
We investigated the precision of Web image search engines for queries from three different topics. We selected the topics to cover frequently searched domains and also to be diverse. In addition, the topics should cover images with various levels of difficulty for image searching. We determined the following query topics: persons, plants and logos.
4.1. Query types
We determined two sets of queries for each query topic. Each set had ten entries. One set included the queries for popular entities and the other set included the queries for less popular entities. We selected the popular entities by choosing the very well-known entities of each topic. Our criterion for selecting the popular entities is that when we submit the query to the Bing image search engine, the hit count for that query must be at least 1,000. However, the hit counts of the selected entities are usually much higher.
Selection of less popular entities is more challenging. There should be both a lower bound and an upper bound on the number of images indexed. There should be some number of indexed images on the Web but not a lot. Therefore, our criterion is that the hit count for each entity must be between 100 and 500 in Bing image search engine.
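The two selection criteria can be written as a small check. The thresholds come from the text above; the function name and the handling of counts that fall between the two ranges are assumptions made for this sketch.

```python
def popularity_category(hit_count):
    """Classify an entity by its estimated image hit count.

    Returns "popular" for at least 1,000 hits, "less popular" for 100 to 500
    hits, and None when the entity meets neither criterion.
    """
    if hit_count >= 1000:
        return "popular"
    if 100 <= hit_count <= 500:
        return "less popular"
    return None

# Hypothetical hit counts, for illustration only.
categories = [popularity_category(n) for n in (40000, 266, 700, 50)]
```

Entities with counts between 500 and 1,000, or below 100, are left out of both query sets under these criteria.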
At the time of conducting this research, the Google image search engine did not provide hit count estimates for image searches. Therefore, we retrieved estimated hit counts from the Bing image search engine. However, during this study, around March 2014, Bing also stopped providing hit count estimates for image searches. After that time, we used hit count values from the Yandex image search engine. The hit count values from Yandex are shown with a Y next to them.
Previous studies [30] investigating the accuracy of search engine hit counts for document searches showed that they are estimates of the number of matching documents. They cannot be taken as the true values. However, they can be used as approximate values. We assume the hit count estimates for image searches to be similar and use them in our study as a general indicator of the commonness of images for selected entities.
4.2. Constructing unambiguous query sets
When selecting entities, we make sure that the selected query for the entity is not ambiguous. Many queries may refer to multiple entities. For example, there may be many people sharing the same name. A plant name may also be used for other things. While ‘orange’ is a plant name, the word ‘orange’ also refers to a colour, various companies with this name, a county in California, multiple restaurants around the world, etc. When evaluating the relevancy of resulting images, it is important that the query used refers predominantly to a single entity.
We assume that a query predominantly refers to a single entity if Google Web search returns at least eight relevant documents out of the top ten results for that entity. After determining an initial entity set, we test each query on Google Web search and count the number of relevant documents. We leave out those that do not return at least eight relevant documents among the top ten results.
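The filtering rule above amounts to a threshold on manually assigned relevance labels. In this sketch the labels are assumed to come from a human judge inspecting the top Web search results; the function name, parameter names and sample labels are hypothetical.

```python
def is_unambiguous(top_results_relevant, min_relevant=8, top_n=10):
    """True if at least min_relevant of the first top_n results are relevant.

    top_results_relevant holds one boolean per Web search result, in rank order.
    """
    top = top_results_relevant[:top_n]
    return sum(bool(label) for label in top) >= min_relevant

# Hypothetical manual labels for the top ten Web results of two queries.
mostly_one_entity = [True] * 9 + [False]
mixed_entities = [True, False, True, False, True, True, False, True, False, True]

keep = is_unambiguous(mostly_one_entity)
drop = not is_unambiguous(mixed_entities)
```

The check uses Web search results rather than image results, so a query can pass it and still be ambiguous for image search.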
4.3. Entities and queries
We construct the queries to target the requested entities. Usually the queries are composed of the entity names. However, we add extra words to clarify the intent of the search for some queries. The queries for person entities are the full names of the selected persons. The queries for plant entities are English names of the plants. The queries for logo pictures are constructed by adding the word ‘logo’ to the company names.
4.4. Search engine settings
We used the English Web interfaces of both image search engines. We used the global .com Web sites of each search engine: www.google.com and www.bing.com. All settings were in their most general form except the adult filters. We enabled the adult filters in both search engines.
4.5. Query submission and relevancy evaluation
We submitted each query as text to both image search engines through their browser interfaces. We submitted each query manually and saved the result pages as images for later examination. We retrieved 100 images for each query and evaluated them for relevancy. One of the authors evaluated the relevancy of each returned image to the query. Each returned image is labelled as either relevant or non-relevant by using a binary judgement. For person queries, the images that have the picture of the queried person are evaluated as relevant and others are evaluated as non-relevant. For plant queries, the images that have the picture of the queried plant are evaluated as relevant and others are evaluated as non-relevant. For logo queries, the images that have the picture of the queried logo are evaluated as relevant and others are evaluated as non-relevant. We calculated the precision values at three different cut-off points: top 10, top 30 and top 100.
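Precision at a cut-off point is the number of relevant images among the first k results divided by k. The following sketch computes it at the three cut-off points used here. The function name is an assumption, and the judgement list is hypothetical and does not correspond to any query in the study.

```python
def precision_at_cutoffs(judgements, cutoffs=(10, 30, 100)):
    """Return {k: precision in percent} for binary judgements in rank order."""
    result = {}
    for k in cutoffs:
        if len(judgements) < k:
            raise ValueError(f"need at least {k} judged images, got {len(judgements)}")
        result[k] = 100.0 * sum(judgements[:k]) / k
    return result

# Hypothetical judgements for one query: relevant images cluster near the top.
judged = [True] * 8 + [False] * 2 + [True] * 10 + [False] * 80
precisions = precision_at_cutoffs(judged)
```

With 100 judged images, the percentage at the top-100 cut-off equals the count of relevant images.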
For the queries in the less popular category, the actual number of relevant images on the Web may be less than 100. However, there is no way of knowing the actual number of relevant images for these queries. Nonetheless, this does not affect the calculation of precision values. For the accurate calculation of precision values, we only need 100 returned images and their correct evaluations for relevancy. In our case, both image search engines returned more than 100 images for all queries in the less popular category. Therefore, we calculated the precision values at those three cut-off points. If we had known that some queries had fewer than 100 relevant images on the Web, then the precision values at top 10 and top 30 would have been more meaningful.
Some images were difficult for the evaluator to identify. In particular, some plant images were difficult to distinguish from others. To minimize the number of incorrect labellings, we took two precautions. First, the evaluator examined many images and built up some experience in advance to recognize the queried entities better. Second, when she was not sure, she visited the source Web page and examined the text and other content in it.
5. Precision for popular entities
Although image search is a difficult task on the Web, it is easier to locate relevant images for popular entities. There are many images on the Web about the queried entities. Image search engines should be able to locate some of those relevant images and return them to the users.
Table 1 shows the list of ten person queries, estimated hit count values and precision values for Google and Bing searches. Estimated hit counts are all more than 40,000. This means that there are tens of thousands of images on the Web related to these queries. Since the precision values are very high for popular entities, we only report the precision values for the top 100 images. We report the precision values as percentages, which is more convenient because precision percentage also shows the number of relevant images among the top 100 results.
Table 1. Precision values of popular person searches for the first 100 result images.
The results in this table show that these two image search engines can return highly relevant images, while there are very few non-relevant images. The only exception is the query ‘Leonardo da Vinci’. For this query, both search engines return around 30 relevant images. Non-relevant images are usually his paintings. In a sense, this query is highly ambiguous for image search. Our criterion for the determination of query ambiguity failed for this query.
If we exclude the query ‘Leonardo da Vinci’, the average precision values are 99.3% for Google and 98.3% for Bing. These results show that both image search engines can retrieve highly relevant images for people that have many images on the Web. In addition, the results indicate that the query should be unambiguous for high retrieval effectiveness.
Non-relevant images are usually the images of other people. For example, two non-relevant images for the query ‘Steve Jobs’ on Google belong to two other people. Similarly, ten non-relevant images for the query ‘John F. Kennedy’ on Bing belong to another person from the Kennedy family.
Table 2 shows the results for popular plant queries. The smallest hit count value is 55,000. The results are very similar to the results of popular person searches. On average, Google provides 98.8% precision and Bing provides 97.3% precision.
Table 2. Precision values of popular plant searches for the first 100 result images.
Table 3 shows the results for popular logo queries. Average precision values for logo searches are much lower compared with the popular person and popular plant searches. Average precision values are 86% for Google and 81% for Bing. The primary reason for this result is the fact that some of the queried entities have much smaller estimated hit count values. Only the first five queries have estimated hit counts of more than 40,000. The last three queries have estimated hit counts of less than 10,000.
Table 3. Precision values of popular logo searches for the first 100 result images.
Precision values of the first five queries are similar to the precision values of queries from the other two topics. For these five queries, average precision values are 95% for Google and 97% for Bing. As the hit count estimates decrease, precision values decrease significantly. The query with the lowest hit count value is ‘koç holding logo’ and both precision values are lower than 50% for this query.
In summary, Web image search engines can retrieve images with very high precision values as long as the searched entities have tens of thousands of images on the Web. As the estimated hit counts decrease, the precision of image search engines also decreases. There is a positive correlation between the hit count estimates and the precision values.
6. Precision for less popular entities
Image search engines index much smaller numbers of images for the entities in the less popular category. Estimated hit count values are in the range of 100 to 500. This means that there are at most around 500 associated images for these queries, and for many of them the number is much smaller. On average, the estimated hit count value is 266. Therefore, we expect the precision values to be much lower compared with the popular entities category.
Table 4 shows the queries and the precision values for three different cut-off points for less popular person queries. Selected persons are usually academics from some Turkish universities. Compared with the popular person queries, the precision values are much smaller. For the top 100 results, on average only 24 of the returned images belong to the queried person for Google and only 19 of the returned images belong to the queried person for Bing. For popular person queries, this value was more than 95. Therefore, image search engines return many non-relevant images for less common entities. Almost four out of five images are non-relevant to the queries.
Table 4. Precision values for less popular person queries.
The lowest precision value for Google results is for the query ‘Murat Uzam’. Only six relevant images are returned among the first 100 images. The lowest precision value for Bing results is for the query ‘Lemi Orhan Ergin’. Only three relevant images are returned among the first 100 images. These results imply that the precision values for image searches can be really low for some queries.
Precision values increase significantly for images returned toward the top of the result lists for both search engines. Average precision for the top ten images is 71% for Google and 48% for Bing. This is a big improvement compared with the top 100 results. Average precision for the top 30 images is 48% for Google and 32% for Bing.
There are many types of non-relevant images. The majority of non-relevant images belong to other people. However, there are other types of non-relevant images. Some non-relevant images are book covers that may or may not be related to the queried persons, some are document or presentation images and some are images of buildings, logos, plants, etc. These results imply that Google and Bing image search engines are not automatically performing face detection for the person queries. However, they have the capability of face detection on images and allow users to search for person images by selecting the proper filters in image search interfaces. When the appropriate filter is turned on, they only return images with human faces in them. We have not investigated the precision of these queries with the face detection filter turned on.
Table 5 shows the queries and the precision values for less popular plants. Although the precision values are much lower compared with the precision values of popular plants, they are much higher compared with the precision values of less popular person queries. For the top 100 results, on average 69 of the returned images belong to the queried plants for Google and 64 of the returned images belong to the queried plants for Bing.
Precision values for less popular plant queries.
Similar to the person queries, higher ranking images in result lists are much more likely to be relevant than lower ranking images. Average precision for the top ten images is 98% for both Google and Bing. We investigate the possible reason for the higher precision values of less popular plant queries in subsection 7.1.
There are many types of non-relevant images for less popular plant queries. Some of the non-relevant images belong to images of other plants, some belong to images of statistical information from books and presentations and others belong to insects, maps, articles, drugs, people, etc.
The lowest precision values among the top 100 results for both search engines are for the query ‘Centaurea Behen’. Of the Google image results, 36 are relevant; of the Bing results, 19 are relevant. This plant seems to have the most ambiguous query in our list, since many other plants share the same genus name ‘Centaurea’.
Table 6 shows the queries and the precision values for less popular logos. Nine of the logo queries are company logos from Turkey, and the remaining one is the logo of a university. The precision values of the less popular logo queries are higher than the precision values of less popular person queries. However, they are much lower compared with the precision values of less popular plant queries.
Precision values for less popular logo queries.
The difference between the precision values of the two search engines is much larger than for the other two topics. For the top 100 results, the average precision value is 51% for Google and 17% for Bing. Similarly, average precision values for cut-off points of 10 and 30 are much higher for Google. It seems that, for logo images, Google produces much better results.
7. Investigating the extra research questions
7.1. Effects of query types on precision
Precision values for less popular plant searches are much higher than the precision values for less popular person and logo searches. The difference is largest between the precision values of less popular person results and less popular plant results. Therefore, we examined the target Web pages that had the returned images for these two topics. We counted the number of images on the target Web pages and determined the number of relevant images, in order to understand whether there are significant differences between these two types of Web pages.
We selected five queries from each query set. We retrieved the top ten images for these queries from both search engines and examined the contents of Web pages that contain these images. We could not retrieve two Web pages for less popular person queries and six Web pages for less popular plant queries. In total, we examined 98 Web pages for less popular person queries and 94 Web pages for less popular plant queries.
Table 7 shows the summary of the results. The second column shows the number of Web pages where all images on the page are relevant to the query. These Web pages do not have any non-relevant images, so search engines cannot make a mistake when selecting the relevant image from them. Out of 98 Web pages for less popular person queries, 28 such Web pages exist. However, the number of Web pages with all relevant images is much higher for less popular plant queries: out of 94 Web pages, 68 have all relevant images. On the other hand, the number of Web pages that do not have any relevant images is much higher for less popular person queries. While there are only three Web pages without any relevant images for less popular plant queries, there are 23 for less popular person queries.
Relevancy of images on Web pages to queries.
These results explain the difference between the precision values of less popular person searches and less popular plant searches. These two types of Web pages have very different properties. The Web pages that have plant images are mostly informational Web pages dedicated to a single topic. They contain information about plants with some pictures, and most of the information and pictures belong to the same plant. However, the Web pages that have person names comprise a diverse set. Those Web pages may be news articles, blog entries, etc. These kinds of Web pages have more information and pictures about many people and many types of entities. Therefore, the type of query significantly affects the precision of image searches.
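The page-level comparison in this subsection can be sketched as follows. This is only an illustration, assuming each examined Web page is represented by a list of per-image relevance judgments; the function name `classify_page` and the category labels are hypothetical. The counts are the ones quoted in the text above.

```python
def classify_page(image_judgments):
    """Label a Web page by how many of its images are relevant to the query.

    image_judgments: list of booleans, one per image on the page.
    """
    if not image_judgments:
        return "no images"
    relevant = sum(image_judgments)
    if relevant == len(image_judgments):
        return "all relevant"
    if relevant == 0:
        return "none relevant"
    return "mixed"


# Counts quoted in the text for the two query types.
pages = {"person": 98, "plant": 94}
all_relevant = {"person": 28, "plant": 68}
none_relevant = {"person": 23, "plant": 3}

for topic in pages:
    share_all = all_relevant[topic] / pages[topic]
    share_none = none_relevant[topic] / pages[topic]
    print(f"{topic}: {share_all:.0%} all relevant, {share_none:.0%} none relevant")
```

Expressed as shares, roughly 29% of the person pages but about 72% of the plant pages contain only relevant images, while about 23% of the person pages and about 3% of the plant pages contain no relevant image at all.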
7.2. Indexing images
To understand whether these two image search engines employ automated image annotation methods or rely solely on the texts in Web pages, we examined the content of Web pages that contained non-relevant images. We determined the non-relevant images among the top 100 results for two queries for less popular persons. We examined whether the target Web pages contained the query words. Table 8 shows the results of the tests for Google. Among 153 pages, there are only two Web pages that do not have any of the query words. The majority of the other Web pages contain all query words. Only nine out of 153 Web pages contain only some of the query words.
Existence of query words on target Web pages for non-relevant images for Google results.
Table 9 shows the results of the tests for Bing. In this case, there are more Web pages that do not contain any of the query terms. Out of 157 Web pages, 11 do not contain any of the query terms. We examined these Web pages and it seems that the majority of those pages are frequently updated Web pages. Some of them are dynamic social media Web pages. They may have contained the query words at the time of indexing those images but have been updated later.
Existence of query words on target Web pages for non-relevant images for Bing results.
These results suggest that both Google and Bing are mainly performing text-based indexing for images.
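The check behind Tables 8 and 9 amounts to sorting each target Web page into one of three groups: it contains all, some or none of the query words. A minimal sketch of such a check is given below. It assumes that the visible text of the page has already been extracted, and it uses simple lowercase word matching; the function name `query_word_coverage` and the sample page texts are hypothetical.

```python
import re


def query_word_coverage(query, page_text):
    """Return 'all', 'some' or 'none' depending on how many query words
    occur as whole words in the page text.

    Simplification: plain lowercasing is used, so language-specific case
    rules (for example the Turkish dotted and dotless i) are not handled.
    """
    words = query.lower().split()
    if not words:
        raise ValueError("query must contain at least one word")
    tokens = set(re.findall(r"\w+", page_text.lower()))
    found = sum(1 for word in words if word in tokens)
    if found == len(words):
        return "all"
    if found == 0:
        return "none"
    return "some"


# Made-up page texts used only to exercise the three outcomes.
print(query_word_coverage("Lemi Orhan Ergin", "A talk by Lemi Orhan Ergin"))
print(query_word_coverage("Lemi Orhan Ergin", "A novel by Orhan Pamuk"))
print(query_word_coverage("Lemi Orhan Ergin", "Weather forecast for today"))
```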
7.3. Association problem
We define the association problem as linking terms in a Web page with relevant images on that Web page. Usually, Web pages have many terms and many images. Some of those terms are related to some of the images on that Web page, while other terms may not be related to any of the images. There are many studies in the literature addressing this problem [4, 7]. In this subsection, we investigate whether the association problem is a serious issue for these two search engines.
We examine the content of the Web pages that contain non-relevant images for two queries of less popular persons. If a non-relevant image is returned from a Web page that actually has a relevant image for that query, we conclude that there is an association problem. The search engine cannot accurately associate the correct terms with relevant images on that Web page.
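The criterion above can be written down as a simple rule. The following sketch assumes that each returned result carries two manual judgments: whether the returned image is relevant, and whether its source Web page contains at least one relevant image. The `Result` class, its field names and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    image_relevant: bool     # manual judgment of the returned image
    page_has_relevant: bool  # does the source page contain a relevant image?


def is_association_error(result):
    """A non-relevant image returned from a page that holds a relevant one."""
    return (not result.image_relevant) and result.page_has_relevant


def association_error_share(results):
    """Share of non-relevant results that are association errors."""
    non_relevant = [r for r in results if not r.image_relevant]
    if not non_relevant:
        return 0.0
    errors = sum(is_association_error(r) for r in non_relevant)
    return errors / len(non_relevant)


# Made-up judgments: one relevant result and three non-relevant ones.
sample = [
    Result(image_relevant=True, page_has_relevant=True),
    Result(image_relevant=False, page_has_relevant=True),
    Result(image_relevant=False, page_has_relevant=True),
    Result(image_relevant=False, page_has_relevant=False),
]
print(association_error_share(sample))
```

Only non-relevant results enter the denominator, which matches the way this subsection restricts its analysis to Web pages that produced non-relevant images.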
Table 10 shows the existence of relevant images on the target Web pages for Google results. We examined the target Web pages and determined whether they contain the images of persons we are searching for. For the query ‘Tankut Yalçınöz’, half of the target Web pages contained the images of that person. For the query ‘Lemi Orhan Ergin’, 75% of the target Web pages contained the images of that person. In total for two queries, 57% of Web pages contained relevant images.
Existence of relevant images on Web pages for Google results.
Table 11 shows the existence of relevant images on the target Web pages for Bing results. The results are similar to Google results. In this case, in total 46% of Web pages contain a relevant image for the searched query.
Existence of relevant images on Web pages for Bing results.
These results show that the association problem is a very important issue for Web image search engines. When a Web page has query words and multiple images, both Google and Bing have difficulty determining the correct relevant images. Around half of non-relevant images could have been avoided if search engines had better algorithms to select among multiple images on Web pages.
7.4. Indexing with optical character recognition (OCR)
There are many images on the Web with text on them. These texts may help search engines to determine the aboutness of images. Image search engines may recognize the text on images using OCR methods and index the images based on that text. There have been many studies proposing OCR methods to extract the text from Web images [e.g. 31]. However, not all text on images can be accurately extracted. Indeed, CAPTCHA applications are based on the assumption that some text on images may be easily recognized by humans but not by algorithms [9].
Compared with other image content analysis methods such as object recognition and scene detection, OCR provides higher accuracy rates. Therefore image search engines may employ OCR methods to index Web images. We conducted an experiment to determine whether these two image search engines use OCR to index images. We selected ten images from the Web with text on them, and we made sure that the text on images does not appear in the corresponding Web pages. Then, we submitted the text on images as the query to both image search engines. We examined all returned images to see whether they include the original image.
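The procedure of this experiment can be outlined in code. The sketch below is an illustration only: `search_image_pages` is a hypothetical callable standing in for the manual inspection of a search engine's result list, and matching by source page URL is a simplification of checking whether the original image itself was returned. No real search engine API is assumed.

```python
def run_ocr_indexing_test(cases, search_image_pages):
    """Check, for each case, whether the source page appears in the results.

    cases: list of (text_on_image, source_page_url) pairs.
    search_image_pages: hypothetical callable that takes a query string and
    returns the source-page URLs of the returned images.
    """
    outcomes = []
    for text_on_image, source_page_url in cases:
        returned_pages = set(search_image_pages(text_on_image))
        outcomes.append((text_on_image, source_page_url in returned_pages))
    return outcomes


def fake_search(query):
    # Stand-in that never returns the source page, mimicking an engine
    # that does not index images by the text on them.
    return ["http://example.com/unrelated-page"]


cases = [
    ("Animation by Paco Zeng Start Take Card out Correct Position",
     "http://www.ee.ryerson.ca/~courses/coe428/sorting/insertionsort.html"),
]
print(run_ocr_indexing_test(cases, fake_search))
```

Because the text on each selected image does not appear on its Web page, a positive outcome for a case could only come from the engine reading the text on the image itself.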
Table 12 shows the queries, Web page URLs for the images and the outcome of the test. None of the queries returned the searched images with text on them. These results strongly suggest that these two search engines are not indexing images using the text on them.
Results of the test for indexing images by the text on them.
| # | Query | URL | Google | Bing |
|---|---|---|---|---|
| 1 | türkiye cumhuriyeti ilelebet payidar kalacaktır | http://www.istanbul.gov.tr/Default.aspx?pid=399 | No | No |
| 2 | Nankör diye haykırmış, Saatler her geçen an’a, meğer arkadaş değilmiş akreple yelkovan.. | http://siirsevenlere.blogcu.com/sessiz-gemi/4628417 | No | No |
| 3 | İnce bir sızı dizlerimde Yokuşlarda halsizim… Bir çocuk ağlar içimde susturamam! Boncuk boncuk buzyaşlarım… Dökülür yanaklarımdan ayak uçlarıma… tutamam! | http://birgo.mynet.com/teardrop_2008/yazi/sessiz-gemi… | No | No |
| 4 | Sometimes, you read a book and it fills you with this weird evangelical zeal, and you become convinced that the shattered world will never be put back together unless and until all living humans read the book. John Green, The Fault in Our Stars | http://www.thesilverpen.com/inspired-living-celebrating-life/inspirational-books-women-children/bookworm-the-fault-in-our-stars-by-john-green/ | No | No |
| 5 | $L000917: inc dword ptr [EBP-4] mov EAX, dword ptr [EBP-4] | https://hplusplus.wordpress.com/tag/the-h-sorting-library/ | No | No |
| 6 | Animation by Paco Zeng Start Take Card out Correct Position | http://www.ee.ryerson.ca/~courses/coe428/sorting/insertionsort.html | No | No |
| 7 | LECTURER, AND RESEARCHER AT THAMAR UNIVERSITY Mohammed HUSSEIN | http://www.slideshare.net/MohammedHussein8/quick-sort-merge-sort-heap-sort | No | No |
| 8 | ilgiyi üzerine, tozları içine çeken teknoloji: Yeni Arçelik Tornado | http://www.arcelik.com.tr/kucuk-ev-aletleri.html | No | No |
| 9 | Arçelik Gurme çay makinesi, Filter Sense özel demleme teknolojisi ve kişiye özel lezzet seçimleri ile çayınız hep ilk içtiğiniz tazelikte. | http://www.arcelik.com.tr/kucuk-ev-aletleri.html | No | No |
| 10 | Dilara Kınalı 8 yaşında. İstanbul’da yaşıyor. Öğrenci. Büyümek istiyorum Ben küçük bir çocuğum ne kötü ama ben büyümek istiyorum Büyüt beni ana | http://galeri.uludagsozluk.com/g/%C5%9Fiir-yazmak/ | No | No |
