All methods were performed in strict accordance with the relevant guidelines and regulations to ensure ethical compliance. This research received formal authorisation and approval from the Ethics and Institutional Review Committee at Hainan Provincial People’s Hospital (Approval No: Med-Eth-Re [2023] 01). Furthermore, all patients provided written informed consent after being thoroughly briefed about the study prior to their participation. The flow diagram of our study design is shown in Supplementary Fig. 1.
Acquiring and refining data
For this study, we retrieved the gene expression profile statistics of OA and normal tissues from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/, accessed on 21 November 2023). Using ‘osteoarthritis’ as the search term, we obtained sequencing data for OA synovial tissues from two datasets, GSE55235 and GSE46750, along with their respective platform data, GPL96 and GPL10558. Specifically, GSE55235 included sequencing data from 10 OA and 10 normal synovial tissue samples, while GSE46750 provided sequencing data from 12 OA and 12 normal synovial tissue samples. NRGs were sourced from two key resources: 159 NRGs were identified from the necroptosis pathway (hsa04217) in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (https://www.kegg.jp/, accessed on 21 November 2023), and an additional 67 NRGs were curated from existing literature11. Finally, we integrated the intersections of these two gene sets, including the 204 NRGs, for subsequent analyses.
Differential analysis of NRGs
We utilised R software (version 4.3.1) to normalise the expression data matrix obtained from the GSE55235 dataset. Differential expression analysis was conducted on the normalized gene expression using the ‘limma’ R package to identify statistically significant differences between OA and normal synovial tissues. A boxplot was subsequently generated to visualize the differentially expressed NRGs between OA and normal synovial tissues, ensuring a clear depiction of the variations. An adjusted P value of < 0.05 and |log2FC| > 1 were used as the threshold for significance, and a heatmap was produced using the ‘pheatmap’ package.
Weighted gene co-expression network analysis (WGCNA)
The ‘WGCNA’ R package was employed to construct a weighted gene co-expression network using the OA tissue expression matrix. A sample tree was initially drawn to detect and exclude any potential outliers. The adjacency matrix was subsequently transformed into a topological overlap matrix (TOM), ensuring that the scale-free topology property was preserved with an R2 value set to 0.90. Modules were identified using the dynamic tree-cut algorithm, with parameters deepSplit set to 2 and minModuleSize set to 50. Finally, we conducted a correlation analysis between the identified modules and occurrence of OA, pinpointing the module that demonstrated the strongest correlation with the disease phenotype.
Enrichment analysis
Using bioinformatics and evolutionary genomics tools, we intersected the differentially expressed genes (DEGs) identified from the WGCNA results that showed a positive correlation with OA. To visually represent the overlap, a Venn diagram was constructed, enabling us to pinpoint the NRGs specifically associated with OA. This approach allowed for a clear identification of OA-related NRGs through the intersection of relevant gene sets. The gene set was then analysed using the ‘ClusterProfiler’ R package for Gene Ontology (GO) and KEGG enrichment analysis. GO enrichment analysis provided insights into the biological roles of core genes, whereas KEGG enrichment analysis identified core gene signalling pathways, with significance criteria set at P value < 0.05 and q value < 0.05.
Protein–protein interaction (PPI) networks of core genes
The gene sets identified through the Venn analysis were imported into the STRING database (https://string-db.org/, accessed on 15 January 2024) to construct the PPI network. The analysis was restricted to Homo sapiens to ensure species specificity. An interaction score threshold of > 0.7 was applied to select active interaction sources for constructing the PPI networks. A PPI enrichment p-value of < 0.05 was set to determine the statistical significance of the network. The resulting PPI network was further analyzed using Cytoscape software (version 3.9.1), where the degree plugin was utilized to rank genes based on their connectivity. The top 10 genes with the highest interaction scores were selected for the PPI network analysis.
Machine learning-based screening of OA-related NRGs
We merged the GSE55235 and GSE46750 datasets from the GEO database and applied three different machine learning algorithms, including support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGB), to establish classifier models for OA and normal tissues. Using these methods, we produced residual box plots, residual reverse cumulative distribution plots, and gene importance distribution plots. Using the GEO data, ROC curves were plotted, and the AUC was computed to assess the predictive accuracy of the classifier models. We classified the accuracy levels as excellent (0.9 ≤ AUC < 1), good (0.8 ≤ AUC < 0.9), and non-informative (AUC = 0.5) according to the guidelines.
Correlation between CASP1 and immune cell infiltration in OA tissues
We used the single-sample gene set enrichment analysis (ssGSEA) method to appraise the level of immune cell infiltration in OA tissues. Box plots showed the differential expression of 29 immune-related gene sets between OA and normal tissues. The CIBERSORT algorithm assessed immune cell infiltration levels in OA and normal tissues. The xCell algorithm analysed the infiltration levels of 64 types of immune cells in OA tissues, with box plots displaying the differential expression levels of these cells.
Enrichment analysis of CASP1 higher and lower expression groups
The downloaded expression data matrix was normalised using R software. We used the ‘limma’ package for differential expression analysis of the normalised data, with criteria of |log2FC| > 1 and an adjusted P < 0.05 to verify DEGs. Volcano and heat maps were generated using the ‘ggplot2’ and ‘pheatmap’ packages. GO and KEGG enrichment analyses were performed on the gene sets using the ‘ClusterProfiler’ R package, which highlighted the signalling pathways and biological functions of the gene sets.
Validation of CASP1
We collected normal knee (n = 3) and OA (n = 3) synovial tissues from patients undergoing arthroscopic surgery for acute trauma and knee arthroplasty, respectively. Whole RNA was extracted using Beyozol (Beyotime Bio, Inc.), and cDNA was synthesised using the BeyoRT™ II cDNA Synthesis Kit. Quantitative real-time PCR (qRT-PCR) was then performed using the BeyoFast™ SYBR Green One-Step Kit on an Applied Biosystems Real-Time PCR System, and gene expression was quantified relative to GAPDH using the 2−ΔΔCt method.
In the western blot (WB) analysis, proteins were separated using sodium dodecyl sulphate–polyacrylamide gel electrophoresis (Bio-Rad) and transferred onto polyvinylidene fluoride membranes (Millipore). The membranes were then treated with a CASP1-specific primary antibody (Abcam) and subsequently with an HRP-conjugated secondary antibody (Santa Cruz Biotechnology). Signal detection was performed using enhanced chemiluminescence reagents (Thermo Fisher Scientific), and protein band intensity was measured with ImageJ software (version 1.53e), with GAPDH serving as the internal control.
Statistical analysis
All statistical analyses were carried out using R software for comprehensive data evaluation. Power calculations for sample size determination were also performed in R, with a power level set at 0.80 to ensure an adequate sample size for detecting the expected effect.