Abstract
Introduction
The emergence of platforms primarily driven by visual content, such as Instagram, has intensified the significance of visual communication. Social photography on Instagram has been examined from multiple perspectives, such as visibility labor in advertorials (Abidin, 2016), platform effects on street art and graffiti (MacDowall & de Souza, 2018), and promotion of body type and objectification (Tiggemann & Zaccardo, 2016), using different analytical approaches (e.g., Fardouly, 2018; Ging & Garvey, 2018; Guidry et al., 2018). As these studies indicate, a range of methodologies, including ethnography, semiotics, and content analysis, has been used to examine Instagram. Despite the significance of such work, diversity of content and the increasing volume of images pose challenges for researchers. As the relevance of images for a hashtag is defined by users and the quality of such images is not uniform, making generalized claims regarding the content of visual expressions within a hashtag and its effects is difficult at best. This demands meaning-independent analyses of online visual content that could benefit from measures that can be applied across different samples, including large volumes of images, as diversity of content may depend on sample size. Design features, such as “like buttons,” allow metrification of user affect and engagement by converting them into numbers (Gerlitz & Helmond, 2013). However, scholars who highlight the need for alternative approaches to study digital platforms have stressed the need to move beyond “vanity metrics” such as brute counting of views and likes (e.g., Rogers, 2018). Lack of accessible measures beyond vanity metrics can limit the scope of scholarly work that examines social photography.
While measures afforded by platforms, such as the number of views and likes, help quantify user engagement with visual content to some extent, social science research can benefit from measures that capture the essence of visual communication while allowing automated analysis to quantify the nature of visual content in social photography.
This study aims to contribute to the growing body of Instagram research on two levels. First, our goal is to address the issue mentioned above by suggesting a meaning-independent measure of visual content that allows examining Instagram images using image segmentation—that is, partitioning images into multiple regions or objects (Rajinikanth & Couceiro, 2015)—and argue that the number of machine-readable regions in an image can be used as a measure of visual affluence (“richness” of visual content). As we demonstrate in the study, visual affluence can serve as a distant measure for examining online social photography. A large number of studies that examine Instagram images tend to focus on specific social contexts, such as “pro-ana” (pro-anorexia) and “thinspiration” communities (Ging & Garvey, 2018), body image concerns and objectification among young women (Fardouly, 2018), and political campaigns (Filimonov et al., 2016). This shows a “socially oriented” turn in visual social media research. While context-specific work is necessary to examine local-level effects, it is important to theorize social photography beyond contexts. Caliandro (2018, p. 551) claimed that
the main task for the ethnographer moving across social media environments should not be exclusively that of identifying an online community to delve into but of mapping the practices through which Internet users and digital devices structure social formations around a focal object (e.g., a brand).
However, analysis of social photography can be extended beyond specific social contexts and/or focal objects, as visual content can be examined beyond representation and meaning-making. For instance, although psychological and behavioral effects of color have been subject to substantial inquiry in various fields (e.g., Ab. Jalil et al., 2012; Spence, 2018; van Esch et al., 2019), there is only a limited number of studies that examine the role color plays in online images (e.g., Hyun & Kim, 2019; Kim & Hyun, 2018).
There are many approaches to analyzing images that go beyond meaning-making. For instance, notions such as visual clutter (a “state in which excess items, or their representation or organization, lead to a degradation of performance at some task”; Rosenholtz et al., 2007, p. 3), visual complexity (the presence of high amounts of information in a texture [Amadasun & King, 1989] and the difficulty in providing verbal descriptions [Heaps & Handel, 1999]), and entropy (the amount of information needed to describe the behavior of a given system; Schieber & Gilland, 2008) have been used to examine visual content. We aim to propose visual affluence as a related measure that can be applied across different types of visual content and sample sizes, from a few images to large volumes of images. From a technical point of view, visual affluence is a more accessible way to automate the quantification of visual content, as it does not require the algorithm training that methods such as deep learning require (see Wason, 2018). In this study, we use the R EBImage package (Pau et al., 2010), which provides general-purpose functionality for image analysis, to examine the applicability of image segmentation as a measure of visual affluence. In the following section, we raise the need for “visually oriented” social media image research, highlighting internal incongruence and variability in visual hashtags. Then we segment a series of images to examine the potential of image segmentation as a measure of visual affluence. We also map visual affluence across a range of hashtags, identifying “levels of affluence” on Instagram to initiate application of visual affluence across hashtags.
Social Photography
For more than a century, photographic images have played an exceptional role in the way we see and think about the world, ourselves, and others (Lister, 1995). In a time of network connectivity, when mobile technologies abound, photography presents us with the appearance of things and promotes the consumption of a “great many more images around, claiming our attention” (Sontag, 1973, p. 1), urging closer attention to our relationship with the photographic image. The emergence of social network sites (SNSs), especially platforms such as Instagram that are driven by visual content, has amplified the role visual content plays in everyday life. The real-time dissemination and exchange of a vast volume of photographic imagery to a networked public has challenged and shifted photographic practices (Zappavigna, 2016) toward “ubiquitous photography” (Hand, 2012; Kember, 2013). The act of photography is therefore social, networked, user-based, amateur, personal, or even vernacular (Batchen, 2002; Rubinstein & Sluis, 2008; Van House, 2011). The social media photograph moves between platforms and devices, presented in collection with other images and media, and locational and temporal data. From a photography studies perspective, Rubinstein and Sluis (2013) provide an analysis of the current state of the photographic image as a networked and an algorithmic one. They note that “the image within the network is doing something other than showing us pictures, and it is doubtful if we have the right vocabulary to address this image economy” (p. 156). Highlighting the “undecidability” of online images, Rubinstein and Sluis argue that the networked image delivers an image of the multiplicity engendered by the network to the screen rather than identity. This suggests that online social photography should be read in a variety of modes, including levels of analysis that exceed meaning and representation. 
In the following section, we discuss the internal incongruence and variability of content as characteristics of online social photography and raise the need for a wave of “visually oriented” social media research.
Visual Hashtags, Internal Incongruence, and Variability
Central to our inquiry of online social photography is the sociotechnical affordance of hashtags. Coined by Gibson (1986), the notion of affordances captures what an environment affords or offers to an animal. Gibson suggests that affordances in a given environment should be measured relative to the animal. Evans et al. (2017) define affordances as a relational structure between technology and user that allows or constrains behavioral outcomes. Hashtags should not be seen as mere technological elements, as users and social contexts are essential for the emergence and performance of hashtags. As Rathnayake and Suthers (2018) note, hashtags can be considered as affordances, as platforms afford their creation and different acts emerge from their use. Highlighting the user-driven construction, Caliandro (2018, p. 18) claims that hashtags are “markers through which users develop a specific thread of conversation or self-categorize their own contents.” Similarly, Bruns and Burgess (2011) describe hashtags as a user-generated mechanism for topical tagging and collating online content. Zappavigna (2016) argues that hashtags operate as social metadata in the sense that they are a form of descriptive annotation produced by users, also indicating a shift toward coordinating activity and commentary rather than simply categorizing artifacts. These “user-generated” tags can relate to a practice termed “folksonomy” (Wal, 2007) and function as “searchable signatures” (Schlesselman-Tarango, 2013). Instagram affordances promote visual and textual communication, and its hashtag architecture is similar to that of Twitter (Highfield & Leaver, 2015). The use of hashtags on Instagram indicates less of a conversation. However, participation in a community, presentation of the self, and collective dimensions of engagement, such as supporting visibility, characterize the use of Instagram. For instance, examining the hashtag #hipster, Caliandro (2018, p. 569) notes that “Instagram functions as a public space through which internet users co-create a specific social imaginary related to the concept of hipsterism, and in doing so helping the research to better define this phenomenon.”
Although these studies viewed user-driven construction through hashtags as a basis for content classification, self-categorization may not necessarily result in well-defined categories. In other words, utterances in co-created social imaginaries (Caliandro, 2018) may not strictly adhere to the meanings associated with the hashtag. The act of using a hashtag with specific images indicates relevance from the perspective of the user. This subjective relevance is a central characteristic of social photography. From this perspective, objective meaning in social photography is at risk in collective settings and that demands constructs and concepts which can provide interpretation beyond what is afforded by meanings associated with objects included in such images. This, we argue, shows another less discussed character of hashtags, that is, inconsistency in meaning. We suggest that this internal incongruence—diversity of content within hashtags, sometimes exceeding direct meanings with which hashtags are associated—should be taken into account when examining hashtags. Figure 1 shows four pairs of Instagram images, each representing a hashtag. Images on the left in each pair can be directly connected to the meaning of the hashtag. Images on the right are not directly associated with the hashtag. Accordingly, even strictly defined Instagram hashtags, such as #Foodporn, #Trump, and #Brexit, which can drive content creation, can still include images that do not necessarily adhere to the meanings ascribed to the hashtag. General hashtags, such as #Instagood, include a range of images that indicate relevance from the perspective of users, rather than constructing a collective and objective meaning. This internal incongruence does not mean that images that do not directly express association with the hashtag do not belong in the hashtag. On the contrary, this inconsistency should be seen as a defining characteristic of user-driven construction. 
Internal incongruence in content is not similar to “undecidability” (Rubinstein & Sluis, 2013) as the latter is conceptualized based on the role played by metadata by releasing the image from its “stillness” and continuing reinvention. Instead, internal incongruence acknowledges the diversity that users bring into hashtags as they define relevance from their point of view in the process of content creation.

Instagram hashtags and internal incongruence. (a) #Sushi. (b) #Trump. (c) #Brexit. (d) #Graffiti.
Although internal incongruence in social photography is unavoidable, current literature does not adequately deal with its causes and outcomes. This may have been caused by the socially oriented focus of Instagram studies. In other words, many Instagram studies focus on meaning-making within a networked social setting (i.e., hashtag) rather than how meaning is challenged in its construction. For instance, Tiggemann and Zaccardo (2016) focus on how body types are contained in Instagram images. Similarly, Rodriguez and Hernandez (2018) demonstrate how Instagram images reinforce hegemonic masculinity by fostering objectification of women. From the perspective of engagement, Filimonov and colleagues (2016) examine how the platform is used for specific purposes, such as political campaigns and mobilization. While these studies provide important insight, explaining how Instagram is used in different social contexts, they do not adequately explain internal diversity of content. We therefore argue that there is a need for a visually oriented line of inquiry focusing on Instagram content beyond meaning-making processes. Such work needs to consider effects of visual attributes, such as color combinations, shades, brightness, clutter, and complexity of online content.
Variability, that is, differences among images that elicit similar content, is also crucial in approaching a meaning-independent reading of online images. Such variability can be caused by a range of factors including the use of filters, quality of cameras, lighting, and shapes and shades in the background. Figure 2 shows two images shared using the hashtag #Graffiti. These images are highly similar, as they include a female figure with a sword climbing a bridge located above a pipeline containing graffiti art. From the perspective of meaning, the objects in the images are similar. However, these images are considerably different from each other from the perspective of quality attributes, such as background details, angles, color variation, and brightness. Such differences may result in different reactions from users, since attributes such as color, as Kuzinas (2013) argues, can affect viewers independently of other elements. As Jue and Kwon (2013) demonstrate, color can be effective in estimating psychological states. Their work also shows that, while some colors, such as red and black, are perceived as aggressive and anxious, excessive use of black may darken the images and impressions of content. Moreover, people tend to associate brightness with positivity (Specker et al., 2018). According to Gong et al. (2017), backgrounds and hue influence the perception of color emotions to varying degrees. Examining visual attributes beyond explicit meaning is an interdisciplinary endeavor, and few studies take such an initiative. Hyun and Kim (2019), for instance, examine relationships between user characteristics and color features of images shared on Instagram. Their study shows associations between user characteristics (gender, agreeableness, neuroticism, and openness) and visual attributes such as color diversity and harmony.
Similarly, a study conducted by Kim and Hyun (2018) shows that pixel features, such as variance of RGB pixel values, hue, share of red color, and the share of warmth (sum of red, orange, and yellow) correlated with user personalities. While these studies provide evidence of correlation between visual properties and user attributes, such as personality, work that examines effects of such visual properties is underrepresented in social media research.

Variability in Instagram images (#Graffiti).
Quantifying Social Photography: Image Segmentation as an Approach to Measuring Visual Affluence
In this section, we discuss how image segmentation can be used to develop a visually oriented basis for classifying images. Image segmentation allows the extraction of meaningful information from images by separating them into regions or objects and it is widely applied in a range of fields, including remote sensing, medical imaging, and pattern recognition (Rajinikanth & Couceiro, 2015). Previous work has demonstrated that image segmentation can be used to examine a variety of images, such as lung images (Skourt et al., 2018), catenary images (Wu et al., 2018), sonar images (Song et al., 2019), and breast ultrasound images (Xian et al., 2018; Xu et al., 2019) and help detect critical problems (e.g., breast cancer). While this is a well-established approach in fields such as medicine, its potential for explaining content in “everyday images” has not been examined. We segmented an image of a bird (Figure 3) to examine the possibility of using image segmentation to extract information from non-microscopic images. This image was segmented using the R EBImage package (Pau et al., 2010), which provides general-purpose functionality for image analysis. The “bwlabel” function included in the package detects connected sets in a binary image and assigns labels to each set. Setting threshold values for segmentation is crucial, as it can decide the granularity of segmentation. Thresholding produces image objects with binarized pixel values (Oleś et al., 2018) that can be used to isolate objects in an image. However, the number of “objects” identified by segmentation is not similar to the number of actual (physical) objects in images. Figure 3 shows three versions of the same image segmented with different threshold values. Colors in each segmented image represent each connected set of pixels. While a threshold value close to 1 or 0 returns a lower number of objects, a threshold value close to 0.4 returns a higher number of objects. 
This shows that a medium-level threshold can increase the granularity of segmentation, thereby providing a more nuanced assessment of the presence of different colors, objects, and shades in the image. This “richness” can be conceptualized as a “distant measure” of image content. As shown in Figure 3, the number of objects identified by the image segmentation function does not indicate the number of physical objects that a human reader or an object detection algorithm may recognize. Instead, the number of objects in images as identified by the “bwlabel” function of the EBImage package is an indicator of the visual richness of the image.
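The thresholding and labeling step described above can be illustrated with a minimal sketch. It is written in plain Python rather than with the R EBImage package used in the study, so it should be read as an analogue of the procedure and not as the authors' code. The function names (`binarize`, `count_regions`, `visual_affluence`), the toy intensity grid, and the choice of 8-connectivity are assumptions made for illustration only.

```python
from collections import deque

def binarize(gray, threshold):
    """Return a binary mask: True where a pixel's intensity exceeds the threshold."""
    return [[px > threshold for px in row] for row in gray]

def count_regions(mask):
    """Count connected sets of True pixels using a breadth-first flood fill.

    Assumption: pixels are connected if they touch horizontally, vertically,
    or diagonally (8-connectivity). Other tools may define connectivity differently.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            count += 1                      # found a new, unlabeled region
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def visual_affluence(gray, threshold=0.5):
    """Number of connected regions in the thresholded image (hypothetical helper)."""
    return count_regions(binarize(gray, threshold))

# Toy grayscale "image" with intensities in [0, 1]; values are invented for illustration.
toy_image = [
    [0.9, 0.9, 0.1, 0.1, 0.8],
    [0.9, 0.1, 0.1, 0.1, 0.8],
    [0.1, 0.1, 0.6, 0.1, 0.1],
    [0.3, 0.1, 0.1, 0.1, 0.7],
]

for t in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(t, visual_affluence(toy_image, t))
```

On this toy grid, a threshold near 0 merges every pixel into a single region and a threshold near 1 leaves no foreground at all, while intermediate thresholds separate several regions. This mirrors, in a very simplified form, the observation that extreme thresholds return fewer objects than mid-range ones.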

Original versus segmented image. (a) Original image. (b) Segmented image (threshold: 0.90). No. of objects: 60. (c) Segmented image (threshold: 0.60). No. of objects: 394. (d) Segmented image (threshold: 0.40). No. of objects: 497. (e) Segmented image (threshold: 0.20). No. of objects: 304. (f) Segmented image (threshold: 0.10). No. of objects: 241.
As a metric, the number of objects identified by image segmentation is different from constructs such as visual complexity, clutter, and entropy that have been applied widely in vision research. Perceived visual complexity of images is caused by a range of factors, such as the quantity of objects, clutter, openness, symmetry, organization, and variety of colors (Oliva et al., 2004). Similarly, Pieters et al. (2010) argued that visual imagery, such as in advertisements, is complex if it has dense perceptual features and/or elaborate creative design. Although the number of objects identified through the image segmentation process indicates the extent of perceptual features and may correlate with the presence of objects and/or a range of colors in a given image, this measure cannot be considered as a metric of visual complexity. This is because the number of objects recognized by the segmentation function does not take into account the relationship between physical objects included in the image and the perceived complexity caused by their location or arrangement. Oliva et al. (2004) note that visual complexity relates to both object variety (i.e., quantity as well as the range of objects) and surface variety (i.e., complexity caused by the variety of materials and surface styles). Although image segmentation can detect the extent of visual stimuli in an image, it cannot differentiate between object variety and surface variety in visual content.
Visual clutter, another measure used to examine visual content, relates to a surplus of objects in a display, creating a “state in which excess items, or their representation or organization, lead to a degradation of performance at some task” (Rosenholtz et al., 2007, p. 3). As Moacdieh and Sarter (2007) noted, clutter relates not only to the number of objects in a display but also to their structure, organization, and order. Although defining clutter in terms of regions in an image rather than objects makes the problem more tractable (Bravo & Farid, 2008), we do not treat the number of objects identified by the segmentation process as a metric of visual clutter. For instance, the image on the left (i.e., nature) in Figure 4 may be perceived as less cluttered and/or complex and more aesthetically pleasing than the image on the right (i.e., junk shop), although the former has a considerably higher number of objects (or regions) than the latter (see Figure 4c and d). Moreover, the number of segments in an image does not indicate visual entropy, a metric of system complexity which, according to Schieber and Gilland (2008), shows that a system is more complex if more information is needed for it to be specified. Entropy can be seen as a measure of diversity, and it is maximized if items in a collection of things are different from each other (Stamps, 2003). Stamps claims that entropy is a predictor of impressions of visual diversity. Visual affluence is different from diversity or entropy, as an image that contains a pattern made up of one repeated object may include more segments than an image in which that object appears only once, provided that no other object is present in either image.
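The distinction between segment counts and entropy can be made concrete with a small worked example. The sketch below rests on stated assumptions: it uses Shannon entropy over hypothetical object-type labels as a stand-in for the diversity measure discussed by Stamps (2003), and it assumes that each listed object would form exactly one segment. The function name and the labels are invented for illustration.

```python
from collections import Counter
from math import log2

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the distribution of labels in a collection."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical image A: a pattern made of nine copies of the same object.
repeated_pattern = ["leaf"] * 9
# Hypothetical image B: three different objects, each appearing once.
mixed_objects = ["cup", "plate", "fork"]

# Assumption: one segment per object, so the list length stands in for the segment count.
print(len(repeated_pattern), shannon_entropy(repeated_pattern))
print(len(mixed_objects), shannon_entropy(mixed_objects))
```

Under these assumptions, the repeated pattern has more segments (nine versus three) but zero entropy, whereas the mixed image has fewer segments and higher entropy, which is why the segment count is not treated as a diversity or entropy measure.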

Segmentation of an image representing a scenery and object clutter. (a) Original image—Nature. (b) Original image—Junk Shop. (c) Segmented image—Nature. (d) Segmented image—Junk Shop.
Given that the proposed measure detects “regions” in images, we suggest that the number of objects identified through image segmentation can be considered as a measure of

Differences in visual affluence in facial photographs. (a) Original image—Face A. (b) Original image—Face A. (c) Original image—Face B. (d) Segmented image.

A close-up of a selected area in a highly affluent image.
Differences in visual affluence between images within a hashtag may indicate internal incongruence as well as variability in image attributes. Figure 7 includes four segmented images representing two hashtags (#Trump and #Graffiti) that display internal incongruence as well as variability. As mentioned previously, the pair of images representing the hashtag #Trump shows internal incongruence (see Figure 1b), as the direct relevance of the image on the right (female figure) to the hashtag cannot be established without knowledge of the user’s perspective. In contrast, the image on the left is directly related to the hashtag, as it includes President Trump and information related to his performance. These images were segmented (Figure 7, top) to examine their affluence levels. The results showed that images that contain letters and dark, single-color backgrounds are less affluent than images that include subtle shades. Accordingly, the image that seems irrelevant to the hashtag (female figure) was more affluent (No. of objects: 757) than the more relevant image (No. of objects: 516). This shows that differences in visual affluence do not explain relevance. In other words, visual affluence does not depend on or relate to relevance.

Internal incongruence and variability captured using image segmentation. (a) #Trump. (b) #Graffiti.
We also segmented the pair of images given in Figure 2 to examine differences in affluence between images that show variability. The segmented images are given in Figure 7 (bottom). The results showed that the larger image on the left that includes a complex background is more affluent than the other, although the images may convey similar meanings. From this perspective, visual affluence is capable of explaining variability more accurately than internal incongruence. To elaborate on visual affluence further, we segmented three images representing the hashtag #food. These images were selected to focus on three different qualities: (1) details in the objects in the foreground (top), (2) patterns/details in the background (middle), and (3) blurriness (bottom). Figure 8 shows images before (left) and after segmentation (right). The image on the top that included detailed objects in the foreground had the highest number of objects (2,643). The segmentation function also identified the rough pattern in the background of the picture in the middle (No. of objects: 780). However, the image with a less complicated background and a blurry region had the fewest objects (258). This shows that images containing objects with complex texture and background patterns are more affluent, while images that include blurry backgrounds are less affluent than others.

Images with different levels of affluence (#Food). (a) Objects with texture, 2,643 objects. (b) Detailed/textured background, 780 objects. (c) Blurry background, 258 objects.
To apply the proposed measure across hashtags, we measured visual affluence in a sample representing five Instagram hashtags (#food, #nature, #graffiti, #minimalism, and #instagood). The sample used for analysis included 2,683 images (#Food: 500, #Nature: 518, #Minimalism: 468, #Graffiti: 500, #Instagood: 697) and was obtained using Netlytic (Gruzd, 2016) before Instagram decided to limit API access. Prior to examining differences among hashtags, we calculated the visual affluence of randomly selected images representing three hashtags (Figure 9: top: #Minimalism, middle: #Nature, and bottom: #Food). These images were segmented at three different thresholds (low: 0.25, middle: 0.50, and high: 0.75) to identify a global threshold level for analysis. In general, the number of objects was different between images at all three threshold levels. A threshold level of 0.5 captured a higher number of objects in three Instagram images representing three hashtags. Therefore, 0.5 was selected as the optimum global threshold level for segmentation. Table 1 shows minimum and maximum affluence levels, means and standard deviations, and skewness and kurtosis statistics for each hashtag. According to these statistics, two hashtags (#Nature and #Graffiti) had higher mean values than the other hashtags. Mean ranks and Mann–Whitney

Segmented Instagram images.
Descriptive Statistics—Visual Affluence in Instagram Hashtags.
Mean Ranks and Mann–Whitney
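The hashtag-level summary and comparison described in this section can be outlined as follows. This is an illustrative Python sketch, not the analysis code used in the study: the function names, the moment-based definitions of skewness and kurtosis, and the toy affluence counts are all assumptions, and statistical packages may apply different small-sample corrections.

```python
from statistics import mean, pstdev, stdev

def describe(counts):
    """Summary statistics for a list of per-image affluence counts.

    Skewness and excess kurtosis use simple moment-based definitions
    (an assumption). Requires at least two distinct values.
    """
    n = len(counts)
    m = mean(counts)
    sd_pop = pstdev(counts)
    z = [(x - m) / sd_pop for x in counts]
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": m,
        "sd": stdev(counts),                              # sample standard deviation
        "skewness": sum(v ** 3 for v in z) / n,
        "excess_kurtosis": sum(v ** 4 for v in z) / n - 3,
    }

def mann_whitney_u(a, b):
    """U statistic for sample a: count of pairs (x, y) with x > y, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Invented affluence counts for two hypothetical hashtag samples (not the study's data).
toy_nature = [900, 1200, 750, 1500]
toy_minimalism = [60, 120, 45, 300]

print(describe(toy_nature))
print(mann_whitney_u(toy_nature, toy_minimalism))
```

A rank-based comparison of this kind makes no normality assumption, which fits affluence counts whose distributions are skewed. A full test would also convert the U statistic into a p value, which this sketch omits.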
Conclusion
Understanding the visual perception of social media images and the processes of meaning-making around them is becoming urgent given the variety and volume of images on social media platforms. Despite the apparent “visual turn” (Gibbs et al., 2015) of social media, research in this domain is still in its preliminary stages compared with the textual analysis of social media communication (Faulkner et al., 2018; Highfield & Leaver, 2016). The social media photograph and the interpretation of its image-based and intertextual content are more complex than those of a physical print. As discussed previously, the communicative purpose and immediate qualities of social media photography are intertwined with the dynamics of the platform and algorithmic processes (Rubinstein & Sluis, 2013). A recurring theme in the literature has been the question, challenge, and “ambiguity” of a single image interpretation. The huge volume of mobile social media images proliferates at considerable speed across networks, systems, and audiences, while rarely being looked at (Lister, 2013). In relation to the image economy of the web (Rubinstein & Sluis, 2013), establishing the context for interpreting individual images is difficult (Hand, 2012), which challenges the traditional approach of visual qualitative research. Highlighting the issues of internal incongruence and variability, we encourage a new line of inquiry by framing visual affluence as a meaning-independent basis to capture the “richness” of online images. Image segmentation is an established technique, especially in the life sciences, and we demonstrate that it can be used to quantify the richness of online social photography, including large samples. Visual affluence should be treated as a concept (or a visual property) rather than a “big data” analysis technique. Nor is it an alternative to techniques such as deep learning that are used for object identification in images.
As visual affluence can be applied to a single image as well as any number of images, it is not subject to challenges related to deep learning, such as the need for large volumes of data for training algorithms, network overfitting, and brittleness (see Wason, 2018). Accordingly, this work should not be considered as an invention of a new technique to read images.
This discussion can be used to encourage what can be called a visually oriented turn in online social photography research, at least within the field of social media studies. Such a turn may benefit from hybrid methods that integrate automated data analysis with other methodologies, such as experimental designs and surveys. While our discussion of the literature focuses on highlighting issues such as internal incongruence, we have not adequately dealt with the rich body of academic work related to vision that can provide interdisciplinary insight into initiating such a turn. Previous work that shows associations between visual qualities and psychological state (e.g., Gong et al., 2017; Jue & Kwon, 2013; Kim & Hyun, 2018; Specker et al., 2018) can be extended with experiments examining correlations between visual affluence and such attributes. Moreover, survey-based research can examine correlations between user attributes and visual affluence. Such analysis may help explain the psychological effects of visual hashtags with varying degrees of affluence. This is particularly important as we have demonstrated that visual hashtags may contain different levels of affluence. Moreover, associations between user characteristics, such as personality aspects (see Hyun & Kim, 2019), content preferences, and visual affluence can be examined to understand how images of different levels of affluence appeal to certain personalities. Further work can also examine accumulation of and variances in affluence within hashtags from a more microscopic point of view. The proposed measure can be applied beyond online social photography. For instance, variances in visual affluence across frames in online videos can be used to examine user reactions to such content. Such analysis can extend work that focuses on platforms such as YouTube and Vimeo to a meaning-independent analysis of effects. 
Moreover, work on implications of visual affluence can benefit a range of disciplines beyond social media studies, such as advertising, branding, and political communication. For instance, previous work that examined effects of factors such as color and perceived complexity on behavioral outcomes (e.g., persuasion, comprehension, preferences) (e.g., Kareklas et al., 2014, 2019; van Mulken & Forceville, 2010) can be used as a basis to examine how visual affluence affects consumer reactions to marketing communication campaigns.
The empirical analysis we have discussed has several limitations. First, our sample was limited and the distributions that represented each hashtag were not normal. Therefore, more analysis needs to be conducted using larger samples. We collected our data before the platform limited accessibility for data collection purposes. Issues arising from global threshold levels should also be examined, as the effectiveness of segmentation depends on threshold levels.
