
Abstract

Objective

To evaluate the quality, reliability, and user engagement of endometriosis-related videos on TikTok and Bilibili, identifying variations by platform, uploader type, and content category to inform digital health strategies.

Methods

The top 100 videos per platform were retrieved using the Chinese keyword for “endometriosis.” After excluding irrelevant or promotional content, 195 videos (99 TikTok, 96 Bilibili) were analyzed. Categorization included uploader type (professional individuals, nonprofessionals, institutions) and content (disease knowledge, treatment, Traditional Chinese Medicine, other). Quality was assessed via Global Quality Score (GQS), modified DISCERN (mDISCERN), JAMA benchmarks, and Video Information and Quality Index (VIQI). Engagement (likes, collections, comments, shares) and duration were recorded. Analyses used Wilcoxon rank-sum, Kruskal–Wallis, and Fisher's exact tests and Spearman correlation.

Results

Professionals uploaded 83.6% of videos; disease knowledge dominated (64.1%). Bilibili videos were longer (median 281.5 vs. 64.67 s; P < .0001) with higher GQS (3.29 vs. 3.04; P = .0123), mDISCERN (3 vs. 2; P < .0001), and JAMA (1 vs. 0; P < .0001). TikTok excelled in engagement (e.g., likes 355 vs. 18.5; P < .0001). Professional sources scored higher (P < .001–.003). Treatment content was most engaging but shorter (P < .001). Engagement metrics correlated strongly with one another (Spearman ρ > .7) but weakly with quality (ρ < .3).

Conclusions

Videos show moderate quality, with Bilibili emphasizing reliability and TikTok virality. Professional content is superior, but the popularity-quality disconnect highlights needs for verification and education to reduce misinformation.

Keywords

Endometriosis, short-video platforms, health information quality, TikTok, Bilibili

Introduction

Endometriosis, a chronic inflammatory condition characterized by the presence of endometrial-like tissue outside the uterine cavity, affects a significant proportion of women of reproductive age, contributing to pelvic pain, infertility, and reduced quality of life.^1–5 Globally, it affects approximately 10% of women of reproductive age, equating to around 190 million individuals, with similar prevalence rates in China, where the age-standardized incidence has increased by 3.51% since 1990. Despite its substantial burden, diagnosis is often delayed by 7–10 years due to nonspecific symptoms and limited awareness, exacerbating physical and psychological impacts.^6,7

In recent years, social media has emerged as a pivotal channel for health information dissemination, particularly for conditions like endometriosis, where patients seek peer support and educational resources.^8 Platforms such as TikTok (Douyin in China) and Bilibili, with over 600 million and 300 million active users respectively, facilitate rapid sharing of short videos, enabling accessible health education. TikTok (Douyin) and Bilibili represent two distinct ecosystems in the Chinese social media landscape. To provide context for the engagement metrics, we considered the distinct characteristics of each platform: TikTok (Douyin) is characterized by short-form videos driven by algorithmic recommendations for a broad user base,^9,10 whereas Bilibili focuses on medium-to-long-form content with a unique “danmu” (bullet screen) commenting system, attracting a predominantly younger demographic.^11,12 These structural differences likely influence how health information is disseminated and consumed on each platform. However, the absence of formal medical editorial oversight or pre-publication verification on these platforms raises concerns about content accuracy and reliability, potentially leading to misinformation that influences patient decisions.^13,14

Prior studies have examined health video quality on these platforms for conditions such as hypertension, thyroid eye disease, and cataract,^15–17 consistently finding moderate to low quality, with professional sources outperforming others but engagement not correlating with reliability. Despite this, endometriosis-specific analyses on Chinese platforms remain scarce. This study addresses this gap by assessing video adherence to clinical health information standards and engagement on TikTok and Bilibili, aiming to inform strategies for improving digital health communication.

To address the research gap on the quality of endometriosis information delivered via short-video platforms, this study conducts a cross-sectional content analysis of videos on TikTok and Bilibili. We evaluate video quality and reliability using validated instruments—the Global Quality Score (GQS), modified DISCERN (mDISCERN), Journal of the American Medical Association (JAMA) benchmarks, and the Video Information and Quality Index (VIQI). By systematically appraising platform content, we aim to identify shortcomings in current health communication practices. We advance three hypotheses. First, content quality differs between platforms because of differences in user demographics and recommendation algorithms. Second, videos produced by healthcare professionals and institutions achieve higher quality scores than videos from nonprofessional sources. Third, user engagement metrics such as likes and shares do not correlate with objective quality scores.

Methods

Video selection and data extraction

Videos were searched using the simplified Chinese keyword “子宫内膜异位症” (endometriosis) on both TikTok and Bilibili. Searches were conducted in incognito mode without user login to prevent algorithmic personalization and ensure reproducibility. The top 100 videos from each platform were retrieved based on the platforms’ default sorting algorithms, which prioritize relevance and popularity. Inclusion criteria required videos to be publicly accessible, in Chinese, and directly addressing aspects of endometriosis such as etiology, symptoms, diagnosis, treatment options, prevention strategies, or patient experiences. Exclusion criteria encompassed advertisements, promotional content, duplicates, and videos irrelevant to endometriosis. Five videos were excluded, leaving 195 videos (99 from TikTok and 96 from Bilibili), as depicted in the flowchart (Figure 1). For each included video, data collection was conducted in strict accordance with platform guidelines: videos were not downloaded, no personal data (such as usernames or identifiable information) were stored, and collection was limited to publicly available metrics. The following variables were manually extracted: platform (TikTok or Bilibili); upload date; video duration in seconds; number of likes; number of collections (favorites or bookmarks); number of comments; number of shares; uploader type, classified as professional individuals (licensed physicians, nurses, or health experts with verifiable credentials), nonprofessional individuals (patients, laypersons, or influencers without medical qualifications), or professional institutions (hospitals, clinics, or medical organizations); and content type, categorized as disease knowledge, treatment, Traditional Chinese Medicine, or other.

Figure 1.

Study selection flow for short videos on endometriosis. Initial retrieval identified 200 records using the Chinese keyword for endometriosis, comprising 100 TikTok and 100 Bilibili videos.

Quality and reliability assessment

Video quality and reliability were evaluated using four validated instruments. The Global Quality Score (GQS) is a 5-point Likert scale assessing overall educational value: 1 indicates poor quality and poor flow with completely useless information; 2, generally poor quality and poor flow with limited usefulness; 3, moderate quality and suboptimal flow with somewhat useful information; 4, good quality and generally good flow with useful content; and 5, excellent quality and excellent flow with highly useful information.^18,19 The modified DISCERN (mDISCERN) is a 5-item binary tool (1 point for yes, 0 for no) evaluating reliability through questions on whether aims are clear and achieved, reliable sources of information are used, the information is balanced and unbiased, additional sources are listed for reference, and areas of uncertainty are mentioned, yielding a total score from 0 to 5.^20,21 The JAMA benchmarks form a 4-point scale assessing credibility, with 1 point awarded each for authorship (provision of authors’ names, affiliations, and credentials), attribution (clear listing of references and sources with copyright information), disclosure (prominent declaration of ownership, sponsorship, funding, or conflicts of interest), and currency (indication of content posting and update dates), resulting in scores from 0 to 4.^22 The Video Information and Quality Index (VIQI) is a 20-point scale comprising four domains, each rated from 1 (poor) to 5 (excellent): flow of the video (smoothness and logical progression), clarity of information (accuracy and comprehensibility), quality of production (sound and image resolution), and precision (alignment between title, description, and content). The total score is the sum of the domain ratings.^23 Two board-certified gynecologists with >5 years of experience independently scored all videos while blinded to each other's assessments to reduce bias. Reviewers watched each video fully at normal speed, pausing as needed for notes, and applied scoring tools via a predefined rubric.
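The scoring arithmetic described above can be sketched in a few lines. This is an illustrative Python sketch, not the reviewers' actual rubric: the item names, the helper functions `binary_total` and `viqi_total`, and the example ratings are all hypothetical, and only the point rules (binary items summed for mDISCERN and JAMA, four 1-to-5 domains summed for VIQI) come from the text.

```python
# Item and domain names below are hypothetical labels for the criteria in the text.
MDISCERN_ITEMS = (
    "aims_clear_and_achieved",
    "reliable_sources_used",
    "balanced_and_unbiased",
    "additional_sources_listed",
    "uncertainty_mentioned",
)
JAMA_ITEMS = ("authorship", "attribution", "disclosure", "currency")
VIQI_DOMAINS = ("flow", "clarity", "production_quality", "precision")


def binary_total(answers, items):
    """Sum yes/no items: 1 point per True answer, 0 per False."""
    missing = [name for name in items if name not in answers]
    if missing:
        raise ValueError(f"unanswered items: {missing}")
    return sum(1 for name in items if answers[name])


def viqi_total(ratings):
    """Sum the four VIQI domain ratings, each required to be an integer 1..5."""
    for name in VIQI_DOMAINS:
        if ratings[name] not in (1, 2, 3, 4, 5):
            raise ValueError(f"{name} must be an integer from 1 to 5")
    return sum(ratings[name] for name in VIQI_DOMAINS)


# Hypothetical ratings for a single video (not data from the study).
example_mdiscern = {
    "aims_clear_and_achieved": True,
    "reliable_sources_used": True,
    "balanced_and_unbiased": True,
    "additional_sources_listed": False,
    "uncertainty_mentioned": False,
}
example_jama = {
    "authorship": True,
    "attribution": False,
    "disclosure": False,
    "currency": False,
}
example_viqi = {"flow": 3, "clarity": 3, "production_quality": 3, "precision": 3}

print("mDISCERN:", binary_total(example_mdiscern, MDISCERN_ITEMS))  # range 0-5
print("JAMA:", binary_total(example_jama, JAMA_ITEMS))              # range 0-4
print("VIQI:", viqi_total(example_viqi))                            # range 4-20
```

Because every VIQI domain has a minimum rating of 1, the lowest attainable VIQI total under this description is 4 rather than 0.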

Statistical analysis

Data were analyzed using R (version 4.2.3) with the packages stats for basic statistical tests, irr for inter-rater reliability calculations, and dplyr for data manipulation. Normality of continuous variables was assessed using the Shapiro–Wilk test (shapiro.test). Descriptive statistics presented continuous variables as mean ± standard deviation (SD) for normally distributed data or median [interquartile range (IQR)] for non-normal distributions, and categorical variables as frequencies and percentages. Group comparisons for continuous variables employed the Wilcoxon rank-sum test (wilcox.test) for two groups and the Kruskal–Wallis test (kruskal.test) for more than two groups, followed by Dunn's post hoc test if significant. Distributions of categorical variables were compared between platforms using Fisher's exact test. Associations between variables were examined using Spearman's rank correlation coefficient. All tests were two-sided with statistical significance set at P < .05, and no adjustments for multiple comparisons were applied given the exploratory nature of the study.
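To make the two-group comparison concrete, the following is a minimal Python sketch of a Wilcoxon rank-sum test. It is illustrative only: the study used R's wilcox.test, whereas this sketch uses a normal approximation with tie and continuity corrections, and the function names `midranks` and `rank_sum_test` and the toy inputs are assumptions introduced here.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (w, p), where w is the rank sum of x minus its minimum possible
    value (the Mann-Whitney U statistic for x).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = midranks(pooled)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_w = n1 * n2 / 2
    # Variance with the standard correction for tied observations.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return w, 1.0
    # Continuity correction of 0.5, never pushing z below zero.
    z = max(abs(w - mean_w) - 0.5, 0.0) / math.sqrt(var_w)
    p = math.erfc(z / math.sqrt(2))  # two-sided tail of the standard normal
    return w, p


# Toy data only (not values from the study).
w_stat, p_value = rank_sum_test([1, 2, 3], [4, 5, 6])
print(w_stat, round(p_value, 4))
```

For samples as small as this toy example an exact test would normally be preferred; with roughly 100 videos per platform and many tied counts, the normal approximation is the more typical choice.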

Ethical considerations

This study was conducted in accordance with the Declaration of Helsinki and ethical guidelines for internet-mediated research. The study protocol was reviewed by the Ethics Review Committee of Huzhou Maternity and Child Health Care Hospital, which granted a formal exemption from full review (Waiver No. 2026-J-S-002), as the study involved the analysis of publicly available secondary data. All data were ethically sourced, and all videos analyzed were publicly accessible at the time of data collection. Individual informed consent was not required as no direct interaction with human participants occurred, and no personally identifiable information was published. Data collection and analysis were conducted in full compliance with the Terms of Service of both TikTok (Douyin) and Bilibili.

Results

Engagement and sources

Across 195 videos, platform-level engagement patterns differed markedly. Compared with TikTok, Bilibili videos accrued fewer likes, collections, comments, and shares on a per-video median basis yet were substantially longer (Table 1). Median likes were 18.5 on Bilibili versus 355 on TikTok, median collections were 30.5 versus 118, median comments were 1 versus 57, and median shares were 0 versus 101. All four between-platform differences were significant by Wilcoxon rank-sum testing (P < .0001 for likes, comments, and shares; P = .0013 for collections). Video duration showed the opposite pattern: the median duration was 281.5 seconds on Bilibili compared with 64.67 seconds on TikTok (P < .0001). These findings align with platform design differences, where TikTok favors short, rapidly consumed content that tends to concentrate engagement.

Table 1.

Platform-level engagement and video duration for endometriosis-related short videos on Bilibili and TikTok.

Variable	Bilibili (mean ± SD), n = 96	TikTok (mean ± SD), n = 99	Bilibili (median [IQR]), n = 96	TikTok (median [IQR]), n = 99	P value
Likes^a	2317.68 ± 9242.31	1381.74 ± 3695.24	18.5 [5.50–64.00]	355 [46.00–1005.00]	<.0001
Collections^a	565.26 ± 2431.36	428.67 ± 810.94	30.5 [7.50–119.50]	118 [19.00–438.00]	.00129
Comments^a	132.19 ± 461.25	290.78 ± 606.70	1 [0.00–5.50]	57 [6.00–242.00]	<.0001
Shares^a	153.29 ± 820.09	532.54 ± 1095.81	0 [0.00–4.00]	101 [8.00–425.00]	<.0001
Duration^a	797.40 ± 1085.12	106.02 ± 140.81	281.5 [124.00–1152.50]	64.67 [42.60–94.97]	<.0001

^a Wilcoxon rank-sum test. Likes, collections, comments, and shares denote per-video engagement counts. Duration is measured in seconds.
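The large gaps between means and medians in Table 1 are typical of right-skewed count data, which is why medians with IQRs are the more informative summary here. The sketch below illustrates this with Python's standard library. The like counts are invented for illustration and are not study data, and the helper name `summarize` is an assumption. The `method="inclusive"` setting interpolates at position (n − 1)·p, which should correspond to the default of R's quantile function.

```python
import statistics


def summarize(values):
    """Return (mean, sd, median, q1, q3) for a list of per-video counts."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return mean, sd, median, q1, q3


# Hypothetical like counts: most videos are small, one went viral.
hypothetical_likes = [2, 5, 9, 18, 40, 64, 9000]
mean, sd, median, q1, q3 = summarize(hypothetical_likes)

print(f"mean ± SD: {mean:.2f} ± {sd:.2f}")
print(f"median [IQR]: {median} [{q1:.2f}–{q3:.2f}]")
```

A single viral video pulls the mean far above the third quartile while leaving the median untouched, the same pattern seen in the Bilibili likes row.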

Uploader composition was dominated by professional individuals. As shown in Figure 2A, professional individuals accounted for the vast majority of uploads (163 of 195), followed by nonprofessional individuals (23) and professional institutions (9). The distribution of uploader types differed by platform (Figure 2B). On Bilibili, professional individuals represented 76% of uploaders, nonprofessional individuals 18.8%, and professional institutions 5.2%. On TikTok, professional individuals constituted 90.9%, nonprofessional individuals 5.1%, and professional institutions 4%. The between-platform distribution differed significantly by Fisher's exact test (P = .009), suggesting that platform ecosystems attract distinct contributor profiles.

Figure 2.

Distribution of uploader types overall and by platform. (A) Overall composition of uploader categories among 195 videos, with professional individuals constituting the majority, followed by nonprofessional individuals and professional institutions. (B) Proportional distribution of uploader types within each platform (Bilibili and TikTok). The between-platform distribution differed significantly by Fisher's exact test (P = .009). Percentages denote the share of each uploader category within each platform.
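For readers who want to see how an exact test on a 2 × 3 table works, here is a minimal Python sketch of the Fisher–Freeman–Halton procedure. It is not the authors' code (the study used R). The function name `fisher_exact_2xk` is an assumption, and the cell counts are reconstructed here from the reported percentages and platform totals (for example, 76% of 96 is about 73), so they should be treated as approximate.

```python
from math import comb


def fisher_exact_2xk(row1, row2):
    """Two-sided exact test for a 2 x k table with fixed margins.

    Sums the probabilities of all tables that are no more likely than the
    observed one. Integer weights are compared to avoid rounding issues.
    """
    cols = [a + b for a, b in zip(row1, row2)]
    r1 = sum(row1)
    total = comb(sum(cols), r1)

    def weight(cells):
        # Number of ways to draw these first-row cells from the column totals.
        w = 1
        for col_total, x in zip(cols, cells):
            w *= comb(col_total, x)
        return w

    def first_rows(idx, remaining):
        # Enumerate every first row compatible with the margins.
        if idx == len(cols) - 1:
            if 0 <= remaining <= cols[idx]:
                yield (remaining,)
            return
        for x in range(min(cols[idx], remaining) + 1):
            for rest in first_rows(idx + 1, remaining - x):
                yield (x,) + rest

    observed = weight(row1)
    extreme = sum(w for w in map(weight, first_rows(0, r1)) if w <= observed)
    return extreme / total


# Counts reconstructed from reported percentages (an assumption):
# columns are professional individuals, nonprofessionals, institutions.
bilibili = (73, 18, 5)  # n = 96
tiktok = (90, 5, 4)     # n = 99
p_uploader = fisher_exact_2xk(bilibili, tiktok)
print(f"P = {p_uploader:.4f}")
```

Because the counts are back-calculated, the printed value may differ slightly from the P value reported in the text.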

Engagement metrics varied by uploader type (Table 2). Relative to nonprofessional individuals, content from professional individuals attracted higher median likes (85 versus 9), collections (69 versus 18), and shares (13 versus 0), with Kruskal–Wallis P values of <.001, .014, and .008, respectively. Comments showed a similar tendency (8 versus 3) but did not reach statistical significance (P = .1). Professional institutions did not consistently outperform professional individuals, showing intermediate engagement values. Video length also differed across uploader categories (P = .002), with nonprofessional individuals posting the longest videos (median 284 seconds), whereas professional individuals and institutions posted shorter content (medians of approximately 94 and 96 seconds, respectively). These patterns imply that professional creators achieve greater visibility and interaction even with briefer formats, whereas longer videos from nonprofessionals may not translate into higher audience engagement.

Table 2.

Engagement and duration by uploader type across all videos.

Variable	Total	Nonprofessional individuals (n = 23)	Professional individuals (n = 163)	Professional institutions (n = 9)	P value
Likes^a	56.00 (11.00, 647.00)	9.00 (2.00, 66.00)	85.00 (16.00, 764.00)	45.00 (3.00, 1005.00)	<.001
Collections^a	56.00 (10.00, 251.00)	18.00 (2.00, 83.00)	69.00 (12.00, 313.00)	26.00 (5.00, 194.00)	.014
Comments^a	6.00 (1.00, 133.00)	3.00 (0.00, 25.00)	8.00 (1.00, 156.00)	4.00 (1.00, 34.00)	.1
Shares^a	5.00 (0.00, 141.00)	0.00 (0.00, 4.00)	13.00 (0.00, 155.00)	2.00 (0.00, 129.00)	.008
Duration^a	104.58 (59.87, 320.60)	284.00 (143.00, 1182.00)	94.03 (55.97, 302.00)	96.00 (39.78, 156.00)	.002

^a Kruskal–Wallis test. Values are median (IQR). Uploader types comprise nonprofessional individuals, professional individuals, and professional institutions. Duration is measured in seconds.
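The three-group comparisons in Table 2 rest on the Kruskal–Wallis H statistic, which can be sketched as follows. This is an illustrative Python version, not the authors' R code. The function name `kruskal_wallis_3` and the toy data are assumptions. The sketch is deliberately limited to three groups because the chi-square tail probability with 2 degrees of freedom has the closed form exp(−H/2), which avoids any external library.

```python
import math
from collections import Counter


def kruskal_wallis_3(g1, g2, g3):
    """Kruskal-Wallis test for exactly three groups, with tie correction.

    Returns (h, p). The P value uses the chi-square approximation with
    2 degrees of freedom, whose upper tail equals exp(-h / 2).
    """
    groups = [list(g1), list(g2), list(g3)]
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Map each distinct value to its average (mid) rank.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Divide by the tie-correction factor; it equals 1 when there are no ties.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # all observations identical
    h /= correction
    return h, math.exp(-h / 2)


# Toy data only (not values from the study).
h_stat, p_value = kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h_stat, 3), round(p_value, 5))
```

A significant H only indicates that at least one group differs; identifying which pairs differ requires a post hoc procedure such as the Dunn test named in the statistical analysis section.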

Together, the platform contrasts and uploader effects indicate that audience interaction is shaped both by the structural characteristics of short-video ecosystems and by the credibility signaled by uploader identity. The lack of a significant difference in comments across uploader types, despite pronounced differences in likes and shares, may reflect divergent user behaviors across platforms and content styles rather than uniformly higher participatory discourse.

Content themes and engagement

Video content was dominated by disease knowledge. As illustrated in Figure 3A, nearly two thirds of videos focused on disease knowledge (125 of 195), followed by treatment (47), Traditional Chinese Medicine (17), and other topics (6). The composition of content differed significantly by platform (Figure 3B). On Bilibili, disease knowledge accounted for 81.2% of videos, whereas on TikTok it represented 47.5%, with a corresponding enrichment of treatment content on TikTok (42.4%). Fisher's exact testing indicated a significant between-platform difference (P < .001), suggesting that platform ecosystems prioritize distinct informational niches.

Figure 3.

Distribution of video content categories overall and by platform. (A) Overall composition of content among 195 videos, categorized as disease knowledge, treatment, Traditional Chinese Medicine, and other. (B) Proportional distribution of content categories within Bilibili and TikTok. The composition differed significantly between platforms by Fisher's exact test (P < .001).

Engagement varied substantially across content categories (Table 3). Treatment videos received the highest audience response, with markedly elevated medians for likes (445), collections (134), comments (79), and shares (99); differences across categories were significant by Kruskal–Wallis testing (P values ranging from <.001 to .014). Disease-knowledge videos, although most prevalent, showed modest median engagement (likes 41, collections 41, comments 4, shares 2). Traditional Chinese Medicine content drew comparatively limited interaction across all metrics, and the small other category exhibited low engagement as well. These patterns imply that audience attention is preferentially captured by treatment-oriented content, potentially reflecting user demand for actionable management information.

Table 3.

Engagement and duration by video content category across all videos.

Variable	Total	Disease knowledge (n = 125)	Traditional Chinese Medicine (n = 17)	Treatment (n = 47)	Other (n = 6)	P value
Likes^a	56.00 (11.00, 647.00)	41.00 (9.00, 682.00)	33.00 (10.00, 46.00)	445.00 (85.00, 920.00)	86.50 (10.00, 118.00)	.001
Collections^a	56.00 (10.00, 251.00)	41.00 (9.00, 210.00)	27.00 (7.00, 86.00)	134.00 (27.00, 543.00)	77.50 (20.00, 174.00)	.014
Comments^a	6.00 (1.00, 133.00)	4.00 (1.00, 106.00)	2.00 (1.00, 6.00)	79.00 (10.00, 305.00)	1.50 (0.00, 4.00)	<.001
Shares^a	5.00 (0.00, 141.00)	2.00 (0.00, 101.00)	3.00 (0.00, 17.00)	99.00 (9.00, 333.00)	0.50 (0.00, 1.00)	<.001
Duration^a	104.58 (59.87, 320.60)	161.00 (72.00, 586.82)	67.85 (50.09, 154.00)	76.81 (50.97, 102.35)	236.00 (70.00, 807.00)	<.001

^a Kruskal–Wallis test. Values are median (IQR). Content categories include disease knowledge, Traditional Chinese Medicine, treatment, and other. Duration is measured in seconds.

Video duration also differed by content type (P < .001). Disease-knowledge and other videos tended to be longer (medians of 161 and 236 seconds, respectively), whereas treatment and Traditional Chinese Medicine videos were shorter (medians of approximately 77 and 68 seconds, respectively). The inverse association between length and engagement across categories suggests that concise, management-focused narratives may be more effective at eliciting user interaction on short-video platforms, while longer expository formats may be less likely to stimulate reactive behaviors such as liking, sharing, or collecting.

Quality assessment overview

Platform comparisons indicated modest but meaningful differences in informational quality (Table 4). Bilibili videos scored higher than TikTok videos on GQS (mean 3.29 versus 3.04, with a median of 3 on both platforms; P = .0123), mDISCERN (median 3 versus 2; P < .0001), and the JAMA benchmark (median 1 versus 0; P < .0001). VIQI did not differ significantly between platforms (P = .106). Taken together, these findings suggest that Bilibili content exhibits greater informational rigor and source transparency, while audiovisual production quality is broadly comparable across platforms.

Table 4.

Quality metrics by platform for endometriosis-related short videos on Bilibili and TikTok.

Variable	Bilibili (mean ± SD), n = 96	TikTok (mean ± SD), n = 99	Bilibili (median [IQR]), n = 96	TikTok (median [IQR]), n = 99	P value
GQS^a	3.29 ± 0.81	3.04 ± 0.78	3 [3.00–4.00]	3 [3.00–4.00]	.0123
mDISCERN^a	2.78 ± 0.91	2.18 ± 0.68	3 [2.00–3.00]	2 [2.00–3.00]	<.0001
JAMA^a	0.84 ± 0.73	0.35 ± 0.48	1 [0.00–1.00]	0 [0.00–1.00]	<.0001
VIQI^a	11.22 ± 2.63	10.75 ± 1.84	12 [9.50–13.00]	11 [9.00–12.00]	.106

^a Wilcoxon rank-sum test. Quality instruments were the Global Quality Score (GQS), modified DISCERN (mDISCERN), Journal of the American Medical Association (JAMA) benchmark, and Video Information and Quality Index (VIQI).

Uploader type was strongly associated with quality metrics (Table 5 and Figure 4). Videos from professional individuals and professional institutions scored higher than those from nonprofessional individuals across all instruments. Median GQS was 3 among professional creators versus 2 among nonprofessionals (Kruskal–Wallis P < .001). A similar gradient was observed for mDISCERN, with medians of 3 versus 2 (P = .003). JAMA scores were lowest in nonprofessional content (median 0), whereas professional individuals and institutions both reached a median of 1 (P < .001), indicating better disclosure of authorship, attribution, and currency among professional sources. VIQI paralleled these trends, with higher medians in professional groups (11–12) compared with 9 among nonprofessionals (P < .001), reflecting superior technical and structural presentation.

Figure 4.

Quality metrics by uploader type. (A) Global Quality Score (GQS) by uploader category. (B) Modified DISCERN (mDISCERN) by uploader category. (C) Journal of the American Medical Association (JAMA) benchmark by uploader category. (D) Video Information and Quality Index (VIQI) by uploader category. Boxplots display distributions with individual data points overlaid; annotations indicate overall ANOVA results and multiple-comparison markers.

Table 5.

Quality metrics by uploader type across all videos.

Variable	Total	Nonprofessional individuals	Professional individuals	Professional institutions	P value
GQS^a	3.00 (3.00, 4.00)	2.00 (2.00, 3.00)	3.00 (3.00, 4.00)	3.00 (3.00, 4.00)	<.001
mDISCERN^a	3.00 (2.00, 3.00)	2.00 (2.00, 2.00)	3.00 (2.00, 3.00)	3.00 (2.00, 3.00)	.003
JAMA^a	1.00 (0.00, 1.00)	0.00 (0.00, 0.00)	1.00 (0.00, 1.00)	1.00 (0.00, 1.00)	<.001
VIQI^a	11.00 (9.00, 12.00)	9.00 (8.00, 10.00)	11.00 (10.00, 13.00)	12.00 (9.00, 13.00)	<.001

^a Kruskal–Wallis test. Values are median (IQR). Uploader types comprise nonprofessional individuals, professional individuals, and professional institutions. Quality instruments were the Global Quality Score (GQS), modified DISCERN (mDISCERN), Journal of the American Medical Association (JAMA) benchmark, and Video Information and Quality Index (VIQI).

The dispersion of scores in Figure 4 underscores these contrasts. Nonprofessional content clustered at lower values with narrow spread for GQS, mDISCERN, and JAMA, consistent with uniformly limited reference to evidence and source information. Professional individuals displayed consistently higher central tendencies and tighter distributions for GQS and mDISCERN, suggesting more reliable and balanced information delivery. Professional institutions exhibited comparable or slightly higher medians than professional individuals for VIQI and GQS, albeit with greater variability, which may reflect heterogeneous production standards across institutional accounts.

Overall, the convergence of platform-level and uploader-level analyses indicates that both ecosystem context and source identity shape the quality of short-video health information. The absence of a platform difference in VIQI alongside significant differences in GQS, mDISCERN, and JAMA implies that surface production features do not guarantee substantive informational quality. These patterns reinforce the study hypothesis that professional provenance is associated with higher objective quality while also highlighting opportunities for platforms to incentivize evidence citation and source disclosure.

Correlations among metrics

Across all videos, engagement indicators were strongly intercorrelated, while their relationships with objective quality were weak or inconsistent (Figure 5A). Likes, collections, comments, and shares showed high positive correlations with one another (Spearman rho approximately 0.78–1.00, all P < .001), indicating that popular videos tended to be uniformly popular across engagement dimensions. In contrast, correlations between engagement and GQS or mDISCERN were small in magnitude and mixed in direction. Notably, GQS displayed a negligible association with likes and collections and a modest positive correlation with VIQI. JAMA correlated positively with mDISCERN and GQS but showed negative or near-null correlations with engagement, underscoring that source transparency and evidence citation do not systematically translate into higher audience reactions. VIQI showed moderate positive correlations with mDISCERN, GQS, and JAMA, suggesting that better technical presentation often co-occurs with stronger informational features, though the overlap was far from complete.

Figure 5.

Spearman correlation matrices for engagement and quality metrics. (A) All videos combined (n = 195). (B) TikTok subset (n = 99). (C) Bilibili subset (n = 96). Cells display pairwise Spearman rho values with significance annotations; color intensity reflects correlation magnitude. Metrics include likes, collections, comments, shares, Global Quality Score (GQS), modified DISCERN (mDISCERN), Journal of the American Medical Association (JAMA) benchmark, and Video Information and Quality Index (VIQI).

Platform-stratified analyses revealed similar patterns with some divergences (Figure 5B and 5C). On TikTok, mDISCERN, GQS, JAMA, and VIQI were positively interrelated (rho approximately 0.41–1.00), and each showed only weak associations with engagement metrics. Shares on TikTok were essentially uncorrelated with JAMA, reinforcing the decoupling between disclosure standards and virality. On Bilibili, quality instruments were again moderately to strongly correlated with one another, and their associations with engagement were minimal. The magnitude of engagement–engagement correlations was slightly lower than on TikTok yet remained substantial. Taken together, these matrices corroborate the hypothesis that user engagement is a poor proxy for objective informational quality, while production quality aligns only partially with evidentiary rigor and transparency.
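Each cell of the matrices described above is a Spearman coefficient, which is simply the Pearson correlation computed on ranks. The following Python sketch shows that calculation with average ranks for ties. It is illustrative only (the study used R), and the function names `average_ranks` and `spearman_rho` and the toy vectors are assumptions.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy vectors only (not values from the study).
likes = [10, 20, 20, 30]
quality = [1, 3, 2, 4]
print(round(spearman_rho(likes, quality), 4))
```

Working on ranks makes the coefficient insensitive to the extreme right skew of engagement counts, which is why it suits comparisons between likes or shares and bounded ordinal quality scores.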

Discussion

This cross-sectional analysis demonstrates that endometriosis-related videos on TikTok and Bilibili are characterized by moderate overall quality, with significant variations attributable to platform architecture and uploader credentials. Bilibili videos exhibited superior scores on reliability (mDISCERN), transparency (JAMA), and global quality (GQS) compared to TikTok, consistent with prior evaluations of health content on these platforms. This disparity may stem from Bilibili's emphasis on longer-form content, which allows for more comprehensive explanations and source citations, whereas TikTok's short-video model favors succinct, engaging narratives that may sacrifice depth for accessibility. Conversely, TikTok's higher engagement metrics—likes, shares, comments, and collections—align with its algorithmic promotion of viral, emotionally resonant material, a feature that has been noted in studies of other health topics such as irritable bowel syndrome and radiotherapy.^24–27 It is important to acknowledge a potential framework-content mismatch when applying tools like mDISCERN and JAMA—originally designed for formal medical information—to user-generated content. These platforms are often valued by patients for lived experience, emotional support, and peer validation, dimensions that clinical assessment tools may systematically penalize. Consequently, a “low quality” score in this study reflects a divergence from established clinical information standards rather than a lack of value for patient community building. However, it is crucial to distinguish between content that scores low due to its narrative nature and content that propagates misinformation. The primary risk arises not from the narrative format itself, but when such content contains unverified medical claims that could be interpreted as a substitute for professional medical advice.

The predominance of professional individuals as uploaders (83.6%) and their association with elevated quality scores across all metrics reinforce the value of expert involvement in digital health dissemination. Nonprofessional content, while comprising a smaller proportion, consistently scored lower, particularly in JAMA benchmarks for authorship and attribution, echoing findings from analyses of endometriosis information on platforms like Instagram and Facebook.^28,29 This gradient suggests that credentialed sources are more likely to adhere to evidence-based standards, yet the persistence of nonprofessional videos highlights a potential vulnerability in user-generated ecosystems, where misinformation can proliferate amid limited oversight. Platform-specific uploader distributions, with TikTok favoring professionals and Bilibili hosting more nonprofessionals, may reflect differing community norms and algorithmic incentives, as observed in comparative studies of gastric cancer and gastrointestinal bleeding content.^29,30

Content themes further illuminate user priorities, with disease knowledge dominating overall but treatment-focused videos eliciting the strongest engagement. This preference for actionable information mirrors patient journeys documented in social media analyses, where individuals with endometriosis seek practical guidance amid diagnostic delays and chronic symptoms. However, the shorter duration and lower quality of treatment videos raise concerns about incomplete or anecdotal advice, potentially exacerbating misinformation in a condition already plagued by diagnostic challenges. The inverse relationship between video length and engagement underscores a broader tension in short-video platforms: brevity enhances virality but may compromise informational completeness.

Critically, the weak correlations between engagement metrics and quality scores indicate that popularity is not a reliable indicator of accuracy or reliability. This decoupling, evident across both platforms, aligns with systematic reviews of health videos on social media, where viral content often prioritizes sensationalism over evidence. Recommendation algorithms on short-video platforms are designed to maximize watch time and interaction, often prioritizing visually stimulating or emotionally charged content over medically accurate but drier educational material. This algorithmic bias likely contributes significantly to the observed lack of correlation between information quality and user engagement. Such patterns pose risks for vulnerable audiences, as endometriosis patients frequently turn to social media for support and information amid gaps in traditional healthcare. The lack of correlation between engagement and quality scores likely reflects the dichotomy between user needs and clinical standards: users engage with content that offers emotional resonance and narrative support, which naturally differs from the rigid, evidence-based structures required by medical assessment frameworks. However, this disconnect remains critical to document, as patients may inadvertently treat highly engaging but anecdotal content as actionable medical advice.

These results have important implications for digital health strategies. Platforms like TikTok and Bilibili could enhance content moderation by promoting verified professional accounts and requiring source disclosures, thereby aligning engagement with quality. Clinicians should educate patients on evaluating online resources, perhaps integrating social media literacy into consultations. Moreover, collaborations between health organizations and influencers could amplify evidence-based messaging.

Limitations include the cross-sectional design, which captures a snapshot and may not reflect temporal changes in content. The focus on Chinese platforms limits the generalizability of our findings to other linguistic and cultural contexts. The user demographics and algorithmic architectures of TikTok (Douyin) and Bilibili create a specific digital ecosystem that may not mirror Western platforms such as YouTube or Instagram, although similar trends regarding misinformation appear in global studies. Subjective elements in the scoring tools, despite high inter-rater reliability, introduce potential bias. Future research could employ longitudinal tracking, user surveys to assess impact on health behaviors, or interventions to improve content quality. Furthermore, the methodological framework applied here, which combines GQS, mDISCERN, JAMA benchmarks, and VIQI, is readily adaptable and could be used to evaluate health information quality for other chronic conditions and on emerging digital platforms.

Conclusion

This study provides a comprehensive evaluation of endometriosis-related videos on Bilibili and TikTok, revealing that platform architecture and uploader identity significantly influence information quality and engagement. Bilibili's ecosystem, characterized by longer video durations, was associated with superior reliability, transparency, and global quality scores compared to TikTok, which favored briefer, high-engagement content that often lacked evidentiary depth. A critical finding was the dominance of professional individuals and institutions in producing high-quality material, consistently outperforming nonprofessional creators across the quality benchmarks; however, this superior quality was only weakly correlated with user engagement metrics such as likes or shares. Instead, audience attention was disproportionately directed toward treatment-oriented and emotionally resonant content regardless of medical accuracy, highlighting a concerning decoupling in which popularity serves as a poor proxy for clinical validity. Consequently, while short-video platforms offer accessible health education, the prevalence of general disease knowledge themes, contrasted with the high user demand for specific treatment advice, creates a vulnerability for patients seeking actionable guidance.

Footnotes

ORCID iD

Yufei Liang

Funding

This work was supported by the Huzhou Science and Technology Bureau [Project number: 2023GYB21].

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Contributorship

ZL: conceptualization; methodology; data curation; supervision; writing—review and editing; critical revision for intellectual content. YM: data curation; investigation; formal analysis; visualization; writing—original draft. YL: resources; validation; data curation; project administration; writing—review and editing. All authors contributed to manuscript writing and editing and approved the final version for submission.

References

1. Horne, Missmer. Pathophysiology, diagnosis, and management of endometriosis. Br Med J 2022; 379.

2. Salliss, Farland, Mahnert, et al. The role of gut and genital microbiota and the estrobolome in endometriosis, infertility and chronic pelvic pain. Hum Reprod Update 2022; 28: 92–131.

3. Bonavina, Taylor. Endometriosis-associated infertility: from pathophysiology to tailored treatment. Front Endocrinol (Lausanne) 2022; 13: 1020827.

4. Coccia, Nardone, Rizzello. Endometriosis and infertility: a long-life approach to preserve reproductive integrity. Int J Environ Res Public Health 2022; 19: 6162.

5. Maggiore ULR, Chiappa, Ceccaroni, et al. Epidemiology of infertility in women with endometriosis. Best Pract Res Clin Obstet Gynaecol 2024; 92: 102454.

6. Velarde, Bucu MEM, Habana MAE. Endometriosis as a highly relevant yet neglected gynecologic condition in Asian women. Endocr Connect 2023; 12: e230169.

7. Shen, et al. Global, regional, and national prevalence and disability-adjusted life-years for endometriosis in 204 countries and territories, 1990–2019: findings from a global burden of disease study. European Journal of Obstetrics & Gynecology and Reproductive Biology: X 2025; 25: 100363.

8. Chen, Wang. Social media use for health purposes: systematic review. J Med Internet Res 2021; 23: e17917.

9. Wang Y-H, T-J, Wang S-Y. Causes and characteristics of short video platform internet community taking the TikTok short video application as an example. IEEE 2019: 1–2.

10. Montag, Yang, Elhai. On the psychology of TikTok use: a first glimpse from empirical findings. Front Public Health 2021; 9: 641673.

11. Zhou. Understanding user behaviors of creative practice on short video sharing platforms-a case study of TikTok and Bilibili. Ohio, United States: University of Cincinnati, 2019.

12. Wang. Community-building on Bilibili: the social impact of danmu comments. Media and Communication 2022; 10: 54–65.

13. Suarez-Lledo, Alvarez-Galvez. Prevalence of health misinformation on social media: systematic review. J Med Internet Res 2021; 23: e17187.

14. Moorhead, Hazlett, Harrison, et al. A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. J Med Internet Res 2013; 15: 85.

15. Che, et al. The quality and reliability of short videos about hypertension on TikTok: a cross-sectional study. Sci Rep 2025; 15: 25042.

16. Wang, Zhang, Cao, et al. Quality and content evaluation of thyroid eye disease treatment information on TikTok and Bilibili. Sci Rep 2025; 15: 25134.

17. Cao, Zhang, Zhu, et al. Quality of cataract-related videos on TikTok and its influencing factors: a cross-sectional study. Digital Health 2025; 11: 20552076251365086.

18. Yeung AWK. Evaluation of content quality of online health information by global quality score: a case study of researchers misnaming it and citing secondary sources. Publications 2025; 13: 23.

19. Sahin, Seyyar. Assessing the scientific quality and reliability of YouTube videos about chemotherapy. Medicine (Baltimore) 2023; 102: e35916.

20. Uzun. Assessment of reliability and quality of videos on medial epicondylitis shared on YouTube. Cureus 2023; 15: e37250.

21. Zheng, Chan, Liu, et al. Hepatocellular carcinoma: current drug therapeutic status, advances and challenges. Cancers (Basel) 2024; 16: 1582.

22. Ozsoy. Evaluation of YouTube videos about smile design using the DISCERN tool and Journal of the American Medical Association benchmarks. J Prosthet Dent 2021; 125: 151–154.

23. Albayrak, Büyükçavuş. Does YouTube offer high-quality information? Evaluation of patient experience videos after orthognathic surgery. Angle Orthod 2023; 93: 409–416.

24. Feng, et al. Evaluating the content and quality of irritable bowel syndrome videos on social media platforms in China: focus on TikTok and Bilibili. Digital Health 2025; 11: 20552076251382029.

25. Yao, et al. Short video platforms as sources of health information about HPV vaccine: a content and quality analysis. Digital Health 2025; 11: 20552076251379340.

26. Zhang, Huang, Tong, et al. Quality assessment of spinal cord injury-related health information on short-form video platforms: cross-sectional content analysis of TikTok, Kwai, and BiliBili. Digital Health 2025; 11: 20552076251374226.

27. Liang, Yang, et al. Quality and reliability of prostate cancer-videos on TikTok and Bilibili: cross-sectional content analysis study. Digital Health 2025; 11: 20552076251376263.

28. Shiplo, Gholiof, Sarin, et al. Endometriosis influencers on Instagram: who are they and what are they posting? J Minim Invasive Gynecol 2025; 32: 693–700.

29. Wang, Yao, Wang, et al. Bilibili, TikTok, and YouTube as sources of information on gastric cancer: assessment and analysis of the content and quality. BMC Public Health 2024; 24: 57.

30. Wang, Liu, Yang, et al. Assessing the content and quality of GI bleeding information on Bilibili, TikTok, and YouTube: a cross-sectional study. Sci Rep 2025; 15: 14856.

Cross-platform comparison of the quality, reliability, and engagement of endometriosis-related videos on TikTok and Bilibili: A cross-sectional study
