AI Neural Architecture Search: 20 Advances (2025)

Automatically discovering new, efficient AI model structures that outperform human-designed networks.

1. Reinforcement Learning-Based Controllers

Early NAS methods employed reinforcement learning (RL) controllers to sequentially construct neural architectures. An RL-based NAS uses a controller (often an RNN or transformer) that proposes candidate architectures, receives a reward based on their performance, and updates to favor better designs. This approach brought more structured exploration than brute-force search, but it initially demanded extensive computational resources and careful tuning. Ongoing advances aim to improve RL controllers’ sample efficiency, stability, and adaptability – for example, by incorporating intrinsic exploration (to avoid local optima) or hierarchical action spaces. As a result, modern RL-based NAS can discover high-performing architectures with fewer trials and lower hardware costs than earlier attempts.
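
To make the idea concrete, below is a minimal, self-contained sketch (Python with NumPy) of a REINFORCE-style NAS controller: a categorical policy per layer samples an architecture, receives a reward, and nudges its logits toward better-scoring designs. The evaluate() function is a hypothetical placeholder for training and validating a candidate, not part of any cited system.

```python
# Minimal sketch of an RL-based NAS controller (REINFORCE-style).
# evaluate() is a hypothetical stand-in for training/validating a candidate;
# real systems would return validation accuracy on the target task.
import numpy as np

OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool", "skip_connect"]
NUM_LAYERS, LR = 6, 0.05
rng = np.random.default_rng(0)
logits = np.zeros((NUM_LAYERS, len(OPS)))   # controller parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def evaluate(arch):
    # Placeholder reward: favors separable convs (stand-in for val. accuracy).
    return sum(op == "sep_conv3x3" for op in arch) / len(arch) + rng.normal(0, 0.01)

baseline = 0.0
for step in range(200):
    probs = [softmax(row) for row in logits]
    choices = [rng.choice(len(OPS), p=p) for p in probs]
    arch = [OPS[c] for c in choices]
    reward = evaluate(arch)
    baseline = 0.9 * baseline + 0.1 * reward          # moving-average baseline
    advantage = reward - baseline
    for layer, (c, p) in enumerate(zip(choices, probs)):
        grad = -p
        grad[c] += 1.0                                # d log pi / d logits
        logits[layer] += LR * advantage * grad        # REINFORCE update
print("most likely architecture:", [OPS[int(np.argmax(row))] for row in logits])
```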

Reinforcement Learning-Based Controllers
Reinforcement Learning-Based Controllers: An ultramodern control center with a sleek robotic arm selecting and arranging glowing neural network blocks displayed on a floating holographic interface, representing a reinforcement learning agent optimizing a neural architecture.

Recent research has improved the scalability of RL-based NAS. For instance, Cassimon et al. (2024) present a transformer-based RL controller that achieves competitive results on NAS benchmarks while greatly reducing the training overhead on larger search spaces. Notably, their RL agent required only marginally more training time to search a much larger space (NAS-Bench-301) compared to a smaller one (NAS-Bench-101), demonstrating strong scaling efficiency. This indicates that contemporary RL controllers, aided by better algorithms and reuse of learned policies, can handle broad architecture spaces without the prohibitive cost reported in earlier RL-NAS studies.

Cassimon, A., Mercelis, S., & Mets, K. (2024). Scalable reinforcement learning-based neural architecture search. Neural Computing and Applications, 37, 231–261. DOI: 10.1007/s00521-024-10445-2

2. Evolutionary Algorithms

Evolutionary algorithms (EAs) offer a nature-inspired NAS strategy by evolving a population of neural network architectures. Starting from an initial pool, architectures are mutated and recombined (crossover) over successive generations, and only the fittest (e.g. highest-accuracy) candidates survive. This approach intuitively explores a large search space but can be computationally intensive. Recent improvements make EAs more efficient and effective for NAS: for example, using multi-fidelity evaluations (only fully training a subset of candidates), applying diversity-preserving selection to avoid premature convergence, and quickly discarding unpromising lineages. These refinements allow evolutionary NAS to find high-quality models faster and with fewer evaluations than earlier evolutionary searches.
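
A minimal sketch of the evolutionary loop described above, with architectures encoded as lists of operations; fitness(), mutate(), and crossover() are illustrative placeholders rather than the exact operators used in the cited work.

```python
# Minimal sketch of evolutionary NAS: mutate/recombine architectures encoded
# as op lists and keep the fittest. fitness() is a hypothetical placeholder
# for (possibly low-fidelity) training and validation of a candidate.
import random

OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool", "skip_connect"]
random.seed(0)

def fitness(arch):                       # placeholder for validation accuracy
    return sum(op.startswith("conv") for op in arch) + random.random() * 0.1

def mutate(arch, rate=0.2):
    return [random.choice(OPS) if random.random() < rate else op for op in arch]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.choice(OPS) for _ in range(8)] for _ in range(20)]
for generation in range(30):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:10]                             # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children                   # elitism + offspring
best = max(population, key=fitness)
print("best architecture:", best)
```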

Evolutionary Algorithms
Evolutionary Algorithms: An abstract digital ecosystem populated by bio-inspired neural network creatures evolving and mutating, with vibrant tendrils of code and data representing their genetic transformations.

Modern evolutionary NAS can achieve excellent accuracy with substantially reduced computation. Xu and Ma (2023) demonstrate an evolutionary NAS method that combines improved Transformer blocks and multi-branch convolutional cells, reaching 97.24% accuracy on CIFAR-10 and 80.06% on CIFAR-100 with only ~1.5 GPU-days of search time. This is a remarkable improvement over older evolutionary searches that often took dozens of GPU-days. The evolved architecture in their study outperformed many human-designed models while using far fewer resources. Such results underscore how innovations like enriched search spaces and smarter evolution strategies enable EAs to discover state-of-the-art architectures more efficiently than before.

Xu, Y., & Ma, Y. (2023). Evolutionary neural architecture search combining multi-branch ConvNet and improved transformer. Scientific Reports, 13, 15791. DOI: 10.1038/s41598-023-42931-3

3. Differentiable NAS (DARTS)

Differentiable NAS methods (exemplified by DARTS) turn the discrete architecture search problem into a continuous optimization. They represent architecture choices by continuous parameters (e.g. weights for each candidate operation) so that gradient descent can be used to optimize the architecture. This drastically speeds up search (often by orders of magnitude) compared to discrete search methods like RL or EA. However, naive differentiable NAS can suffer from issues like performance collapse, where the search converges to degenerate architectures (e.g. those dominated by skip-connections) that perform poorly when discretized. To address this, researchers have introduced regularization techniques, better search space designs, and refined gradient estimators. These improvements stabilize the search process and prevent pathological solutions, enabling differentiable NAS to efficiently find strong architectures without collapsing into trivial ones.
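
The continuous relaxation can be illustrated with a toy "mixed" layer whose output is a softmax-weighted sum of candidate operations, so the architecture parameters alpha become trainable by gradient descent. This is a simplified sketch (toy linear ops, alternating updates on the same data) rather than a faithful DARTS implementation, which uses a bi-level split between training and validation sets.

```python
# Minimal sketch of DARTS-style continuous relaxation: a "mixed" layer whose
# output is a softmax-weighted sum of candidate operations, so architecture
# parameters (alpha) can be optimized by gradient descent. Toy 1-D ops only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Linear(dim, dim),                          # stand-in for conv3x3
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
            nn.Identity(),                                # "skip connection"
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # arch params

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

torch.manual_seed(0)
layer = MixedOp(dim=16)
w_opt = torch.optim.SGD([p for n, p in layer.named_parameters() if "alpha" not in n], lr=0.1)
a_opt = torch.optim.Adam([layer.alpha], lr=0.01)

x, y = torch.randn(32, 16), torch.randn(32, 16)
for step in range(100):          # alternating (simplified bi-level) updates
    w_opt.zero_grad(); F.mse_loss(layer(x), y).backward(); w_opt.step()
    a_opt.zero_grad(); F.mse_loss(layer(x), y).backward(); a_opt.step()
print("operation weights:", F.softmax(layer.alpha, dim=0).detach())
```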

Differentiable NAS (DARTS)
Differentiable NAS DARTS: A futuristic laboratory scene where interconnected neon pathways, representing differentiable layers, form a fluid neural architecture, and researchers observe gradient flows passing through translucent tubes.

Recent studies tackle the stability and generalization problems of DARTS. For example, Ye et al. (2023) identify that DARTS’s performance collapse is often due to certain operations (like skip connections) overwhelming the continuous relaxation. They propose a bi-level regularization called “Beta-Decay” to keep architecture parameter magnitudes in check, ensuring fair competition among operations. This simple fix significantly stabilized the DARTS search, yielding architectures that not only avoided an over-abundance of skip connections but also transferred better to new datasets. In evaluations, Beta-Decay regularization improved the stability of discovered architectures and their final test accuracy compared to vanilla DARTS. Such approaches demonstrate that careful regularization and mutation strategies (e.g. edge mutation in EM-DARTS) can effectively prevent collapse, allowing differentiable NAS to reliably find high-performing models.

Ye, P., He, T., Li, B., Chen, T., Bai, L., & Ouyang, W. (2023). β-DARTS++: Bi-level regularization for proxy-robust differentiable architecture search. arXiv preprint arXiv:2301.06393. DOI: 10.48550/arXiv.2301.06393

4. Weight Sharing and One-Shot Models

Weight-sharing NAS (one-shot NAS) trains a single over-parameterized “supernet” that contains all candidate sub-architectures, so that each sub-network can inherit weights instead of being trained from scratch. This dramatically reduces computational cost by evaluating many architectures with shared weights in one training run. However, one-shot models can introduce estimation bias – the performance of a sub-network with shared weights might not reflect its true standalone performance. Recent advancements focus on making weight sharing more reliable: e.g. carefully designed training schedules to fairly train all subnets, improved parameter-sharing schemes to minimize interference between sub-models, and calibration methods to better predict stand-alone accuracy from shared weights. These innovations improve the fidelity of one-shot NAS, enabling faster yet accurate architecture search.
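
A minimal sketch of single-path weight sharing, assuming toy linear candidate ops: one supernet holds weights for every choice, uniformly sampled paths are trained jointly, and candidate sub-networks are then ranked using the inherited weights without retraining.

```python
# Minimal sketch of weight sharing: one supernet holds weights for every
# candidate op at every layer; sampled sub-networks reuse ("inherit") those
# weights instead of training from scratch. Toy linear ops stand in for convs.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SuperLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.candidates = nn.ModuleDict({
            "linear": nn.Linear(dim, dim),
            "linear_relu": nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
            "identity": nn.Identity(),
        })
    def forward(self, x, choice):
        return self.candidates[choice](x)

torch.manual_seed(0); random.seed(0)
supernet = nn.ModuleList([SuperLayer(16) for _ in range(4)])
opt = torch.optim.SGD(supernet.parameters(), lr=0.05)
x, y = torch.randn(64, 16), torch.randn(64, 16)

# Train the supernet with uniformly sampled paths ("single-path one-shot").
for step in range(200):
    path = [random.choice(list(layer.candidates)) for layer in supernet]
    out = x
    for layer, choice in zip(supernet, path):
        out = layer(out, choice)
    opt.zero_grad(); F.mse_loss(out, y).backward(); opt.step()

# Evaluate candidate sub-networks using the shared weights (no retraining).
def score(path):
    out = x
    for layer, choice in zip(supernet, path):
        out = layer(out, choice)
    return -F.mse_loss(out, y).item()

paths = [[random.choice(list(layer.candidates)) for layer in supernet] for _ in range(20)]
print("best sampled path:", max(paths, key=score))
```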

Weight Sharing and One-Shot Models
Weight Sharing and One-Shot Models: A single colossal neural supernet tree with numerous branch-like architectures, each branch faintly illuminated by shared weights, while an AI scientist examines its intricate, glowing pathways.

Research has shown that naive weight sharing can lead to misleading evaluations, and addressing this issue greatly improves NAS outcomes. Wang et al. (2024) observed that in overly large search spaces, weight-sharing supernets give untrustworthy architecture rankings, hurting search effectiveness. To combat this, they introduced a “DNA” framework that divides the search space into smaller blocks and uses knowledge distillation from a teacher network to supervise supernet training. This block-wise approach significantly improved the correlation between shared-weight performance and true performance. As a result, their one-shot NAS became much more reliable and scalable – yielding architectures that, when fully trained, indeed delivered the high accuracy predicted during the search phase. Such techniques exemplify how enforcing fairness and using prior knowledge in supernet training can overcome the evaluation bias inherent in weight sharing.

Wang, G., et al. (2024). DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions. arXiv preprint arXiv:2403.01326. DOI: 10.48550/arXiv.2403.01326

5. Surrogate Modeling for Performance Prediction

Surrogate models predict the performance (e.g. accuracy) of candidate architectures without fully training them, thus speeding up NAS by avoiding costly evaluations. Common surrogates include models like Gaussian processes, random forests, or neural networks (“neural predictors”) that are trained on a sampled set of architectures and their known performance. Improved surrogate modeling has been a key focus recently: researchers strive to make these predictors more accurate and robust (so they generalize across the search space). A good surrogate can rapidly evaluate many architectures and guide the search away from poor options. By pruning weak candidates early and focusing on promising regions, surrogate-assisted NAS greatly reduces search time while still finding top-performing architectures.
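
A minimal sketch of surrogate-assisted ranking, using a random-forest regressor over one-hot architecture encodings; true_accuracy() is a hypothetical stand-in for an expensive training run, and a real system would fit the surrogate on benchmark or measured accuracies.

```python
# Minimal sketch of surrogate-based performance prediction: train a regressor
# on a few (architecture, accuracy) pairs, then rank many unseen candidates
# without training them. true_accuracy() is a hypothetical stand-in for a
# full training run.
import random
import numpy as np
from sklearn.ensemble import RandomForestRegressor

OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool", "skip_connect"]
random.seed(0)

def encode(arch):                         # one-hot encoding per layer
    vec = np.zeros(len(arch) * len(OPS))
    for i, op in enumerate(arch):
        vec[i * len(OPS) + OPS.index(op)] = 1.0
    return vec

def true_accuracy(arch):                  # placeholder "expensive" evaluation
    return 0.7 + 0.05 * sum(op == "sep_conv3x3" for op in arch) + random.gauss(0, 0.01)

def sample_arch():
    return [random.choice(OPS) for _ in range(6)]

labeled = [sample_arch() for _ in range(60)]
X = np.array([encode(a) for a in labeled])
y = np.array([true_accuracy(a) for a in labeled])
surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

pool = [sample_arch() for _ in range(5000)]
pred = surrogate.predict(np.array([encode(a) for a in pool]))
top = [pool[i] for i in np.argsort(pred)[::-1][:5]]   # only these get real training
print("surrogate's top pick:", top[0])
```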

Surrogate Modeling for Performance Prediction
Surrogate Modeling for Performance Prediction: An AI fortune teller peering into a crystal ball filled with swirling code fragments and tiny model graphs, foreseeing the accuracy of neural architectures without fully training them.

Sophisticated surrogate models have demonstrably accelerated NAS. For example, Li et al. (2025) introduce an attention-enhanced predictor that estimates architecture accuracy with high precision, allowing their NAS algorithm to skip over low-potential models. In tests on NAS-Bench-101 and -201, this surrogate model improved prediction accuracy and led to faster convergence of the search. In parallel, other work has shown the power of novel surrogates: one 2024 study fine-tuned a large language model to act as a performance predictor by processing architectural descriptions, achieving the highest correlation to true accuracies among various methods. Using such a surrogate during evolutionary search significantly sped up the search and yielded final architectures that outperformed those found without surrogate guidance. These cases illustrate that better surrogates (from attention-based neural nets to LLM-based predictors) can cut NAS evaluation costs by orders of magnitude while maintaining accuracy in identifying top models.

Li, Y., Ma, R., Zhang, Q., Wang, Z., Zong, L., & Liu, X. (2025). Neural architecture search using attention-enhanced precise path evaluation and forward evolution. Scientific Reports, 15, 9664. DOI: 10.1038/s41598-025-94187-8. / Jawahar, G., Abdul-Mageed, M., Lakshmanan, L. V., & Ding, D. (2024). LLM performance predictors are good initializers for architecture search. arXiv preprint arXiv:2408.11330.

6. Meta-Learning and Transfer Learning Approaches

Meta-learning techniques in NAS leverage experience from past architecture searches to speed up new ones. Instead of starting each NAS from scratch, the algorithm “learns how to search” by extracting knowledge from prior tasks or known good architectures. This knowledge could be in the form of learned architecture embeddings, search strategies, or initialization weights that transfer to new tasks. By reusing prior knowledge, NAS converges faster on novel search spaces or datasets and often finds better solutions with less data. Transfer-learning-based NAS thus improves efficiency and makes architecture search feasible even in scenarios with limited computational budget or when a quick turnaround is needed for new tasks.
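
One simple form of this idea can be sketched as warm-starting a performance predictor on evaluations from prior tasks and then adapting it with a handful of new-task results; the encodings and accuracy functions below are synthetic placeholders, not data from the cited studies.

```python
# Minimal sketch of transferring search knowledge: a performance predictor is
# first fit on architecture evaluations from prior tasks, then refined with a
# handful of evaluations on the new task. Encodings and accuracies here are
# synthetic placeholders for results collected in earlier searches.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
DIM = 30                                   # architecture encoding length

def prior_task_accuracy(x):                # placeholder for a past task's results
    return 0.7 + 0.2 * x[:5].mean()

def new_task_accuracy(x):                  # related but shifted objective
    return 0.6 + 0.25 * x[:5].mean() + 0.02 * x[5:10].mean()

X_prior = rng.random((500, DIM))
y_prior = np.array([prior_task_accuracy(x) for x in X_prior])
X_new = rng.random((10, DIM))              # only 10 evaluations on the new task
y_new = np.array([new_task_accuracy(x) for x in X_new])

predictor = SGDRegressor(random_state=0)
predictor.partial_fit(X_prior, y_prior)    # meta-knowledge from prior searches
predictor.partial_fit(X_new, y_new)        # cheap adaptation to the new task

candidates = rng.random((2000, DIM))
best = candidates[np.argmax(predictor.predict(candidates))]
print("predicted-best candidate (first 5 dims):", best[:5].round(2))
```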

Meta-Learning and Transfer Learning Approaches
Meta-Learning and Transfer Learning Approaches: A wise, ancient mechanical librarian guiding a smaller robotic apprentice through towering shelves of previously discovered neural architectures, knowledge flowing like holographic streams between them.

The benefits of transfer learning in NAS are evidenced by recent results. Shala et al. (2023) propose a meta-learned NAS surrogate that is trained across multiple prior architecture evaluation datasets. Using graph neural networks to embed architectures and a transformer to encode datasets, their method can start a new NAS with a well-informed performance predictor. In experiments on six computer vision benchmarks, this transfer-guided NAS achieved state-of-the-art accuracy on each task while also being as sample-efficient as one-shot NAS methods. In practical terms, their approach was able to design high-performing architectures on new datasets without an exhaustive search, thanks to the “experience” it gained from related tasks. Such outcomes underscore that meta-learning can dramatically reduce search time – one study reports that reusing a previously trained NAS agent can save hundreds of GPU-hours on subsequent tasks – making NAS more scalable in multi-task or continually evolving settings.

Shala, G., Elsken, T., Hutter, F., & Grabocka, J. (2023). Transfer NAS with Meta-Learned Bayesian Surrogates. In Proceedings of ICLR 2023. / Cassimon, A., Mercelis, S., & Mets, K. (2024). Scalable reinforcement learning-based neural architecture search. Neural Computing and Applications. DOI: 10.1007/s00521-024-10445-2

7. Domain-Specific Search Spaces

Tailoring the NAS search space to a specific application domain (vision, NLP, graph data, etc.) can greatly improve search efficiency and results. Instead of a one-size-fits-all set of operations, domain-specific NAS uses building blocks known to work well for that domain (e.g. convolutional kernels for images, transformer layers for text, or graph convolutions for graph data). By restricting the search to architectures composed of these relevant components, NAS avoids wasting time on irrelevant or suboptimal structures. This focused approach not only speeds up the search (fewer possibilities to explore) but also yields architectures that are better aligned with the domain’s characteristics and constraints.
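
In code, a domain-specific search space can be as simple as restricting the operation set offered to the sampler; the operation lists below are illustrative, not a prescription.

```python
# Minimal sketch of a domain-specific search space: the candidate operations
# offered to the search differ by domain, so every sampled architecture is
# built only from building blocks relevant to the task at hand.
import random
random.seed(0)

SEARCH_SPACES = {
    "vision": ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool", "skip_connect"],
    "nlp": ["self_attention", "ffn", "layer_norm", "gated_linear", "skip_connect"],
    "graph": ["gcn_layer", "gat_layer", "graphsage_layer", "edge_mlp", "skip_connect"],
}

def sample_architecture(domain, depth=6):
    ops = SEARCH_SPACES[domain]
    return [random.choice(ops) for _ in range(depth)]

print("vision candidate:", sample_architecture("vision"))
print("graph candidate: ", sample_architecture("graph"))
```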

Domain-Specific Search Spaces
Domain-Specific Search Spaces: A gallery of specialized neural building blocks—transformers for text, convolutions for images, and graph layers for networks—each displayed like art in a curated museum exhibit.

The advantages of domain-specific NAS are evident in specialized studies. For example, Tzoumanekas et al. (2024) developed a NAS framework specifically for graph neural networks targeting social network bot detection. By designing the search space around Relational Graph Convolutional Network (RGCN) components and message-passing operations relevant to graphs, their NAS discovered a model that achieved 85.7% accuracy on the TwiBot-20 dataset, surpassing the prior state-of-the-art in that task. This graph-tailored NAS (“DFG-NAS”) outperformed generic architectures by capitalizing on domain knowledge (like graph connectivity patterns). Similarly, in NLP, researchers have constrained NAS to transformer-based modules and found it yields superior language models than searching over generic convolutional layers. These cases confirm that embedding domain-specific inductive biases into the search space leads to more efficient searches and more competitive architectures for the target domain.

Tzoumanekas, G., Chatzianastasis, M., Ilias, L., Kiokes, G., Psarras, J., & Askounis, D. (2024). A graph neural architecture search approach for identifying bots in social media. Frontiers in Artificial Intelligence, 7, 1509179. DOI: 10.3389/frai.2024.1509179

8. Hardware-Aware NAS

Hardware-aware NAS integrates device-specific constraints (like inference latency, memory footprint, energy consumption, or power usage) directly into the architecture search objective. Instead of optimizing only for accuracy, these methods search for architectures that achieve a balance between accuracy and efficiency on the target hardware (e.g., mobile phones, GPUs, or specialized accelerators). By doing so, the resulting models are not only theoretically effective but also practically deployable within the resource limits of real devices. This leads to neural architectures that make better use of hardware capabilities – for instance, meeting strict latency requirements or running within a limited battery budget – without manual tuning.
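
A minimal sketch of folding a latency budget into the search objective; the accuracy and latency numbers are placeholders, and real systems would measure latency on-device or use a learned latency predictor.

```python
# Minimal sketch of a hardware-aware search objective: candidates are scored
# on accuracy but penalized when their measured (or table-looked-up) latency
# exceeds a target budget. The values below are illustrative placeholders.
def hardware_aware_score(accuracy, latency_ms, target_ms=1.0, penalty=0.5):
    """Score = accuracy, scaled down when latency exceeds the budget."""
    if latency_ms <= target_ms:
        return accuracy
    return accuracy * (target_ms / latency_ms) ** penalty

candidates = [
    {"name": "wide_net",   "accuracy": 0.81, "latency_ms": 2.4},
    {"name": "slim_net",   "accuracy": 0.79, "latency_ms": 0.8},
    {"name": "medium_net", "accuracy": 0.80, "latency_ms": 1.1},
]
for c in candidates:
    c["score"] = hardware_aware_score(c["accuracy"], c["latency_ms"])
print(max(candidates, key=lambda c: c["score"]))
```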

Hardware-Aware NAS
Hardware-Aware NAS: A digital blueprint overlaid on a sleek circuit board, where a balance scale hovers, weighing a microchip on one side and a neuron-like structure on the other, symbolizing accuracy versus efficiency.

Hardware-aware NAS can produce models with impressive efficiency-performance trade-offs. Zhao et al. (2024) report a NAS-discovered model that reaches 80.6% top-1 accuracy on ImageNet while achieving an inference latency of only 0.78 milliseconds on an NVIDIA V100 GPU (with FP16 precision). This architecture, found by their multi-objective “CE-NAS” framework, was explicitly optimized for both accuracy and speed, resulting in a network dramatically faster than typical human-designed models at similar accuracy. Moreover, CE-NAS showed that by including carbon footprint and energy metrics in the objective, it could reduce NAS-related energy consumption by up to 7.2× compared to standard methods, all without sacrificing model quality. These outcomes underscore that hardware-aware NAS can yield state-of-the-art accuracy models that are also efficient enough for real-time use, something conventional NAS (focused solely on accuracy) might miss.

Zhao, Y., Liu, Y., Jiang, B., & Guo, T. (2024). CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework. (NeurIPS 2024 Poster). OpenReview preprint. / Gao, X., Guo, L., & Chen, Y. (2022). GraphNAS++: Distributed architecture search for graph neural networks. IEEE Transactions on Knowledge and Data Engineering. DOI: 10.1109/TKDE.2022.3178153

9. Multi-Objective NAS

Traditional NAS optimizes a single objective (usually accuracy), but multi-objective NAS optimizes several objectives simultaneously – for example, maximizing accuracy while minimizing model size, latency, or energy use. The result of a multi-objective NAS is not one “best” architecture, but a Pareto front of architectures, each offering a different trade-off (e.g., one might be slightly less accurate but much smaller, another might be very accurate but a bit slower, etc.). By exploring these trade-offs in a unified search, multi-objective NAS provides practitioners with a set of efficient models tuned to different needs. Current research on multi-objective NAS emphasizes finding diverse solutions efficiently and fairly, often using techniques like Pareto-ranked evolutionary search or gradient-based methods that encode preferences for different objectives.
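
A minimal sketch of extracting the Pareto front from a set of evaluated candidates (maximize accuracy, minimize latency); the candidate values are illustrative placeholders.

```python
# Minimal sketch of extracting a Pareto front from evaluated candidates:
# a model is kept only if no other model is at least as good on every
# objective and strictly better on one (here: maximize accuracy, minimize latency).
def dominates(a, b):
    return (a["accuracy"] >= b["accuracy"] and a["latency_ms"] <= b["latency_ms"]
            and (a["accuracy"] > b["accuracy"] or a["latency_ms"] < b["latency_ms"]))

def pareto_front(models):
    return [m for m in models
            if not any(dominates(other, m) for other in models if other is not m)]

models = [
    {"name": "A", "accuracy": 0.81, "latency_ms": 2.4},
    {"name": "B", "accuracy": 0.79, "latency_ms": 0.8},
    {"name": "C", "accuracy": 0.80, "latency_ms": 1.1},
    {"name": "D", "accuracy": 0.78, "latency_ms": 1.5},   # dominated by B and C
]
print([m["name"] for m in pareto_front(models)])   # -> ['A', 'B', 'C']
```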

Multi-Objective NAS
Multi-Objective NAS: A geometric scene where multiple glowing spheres—representing accuracy, efficiency, latency, and robustness—hover around a crystalline neural model, forming a harmonious, balanced constellation.

State-of-the-art multi-objective NAS algorithms can efficiently generate Pareto-optimal model families. Sukthanker et al. (2025) introduce a differentiable NAS approach that jointly optimizes network accuracy and hardware metrics across multiple devices in one search run. They employ a hypernetwork conditioned on a preference vector (for accuracy vs. efficiency) and on hardware characteristics, enabling zero-shot transfer of the search results to new devices. In experiments spanning 19 different hardware platforms and objectives such as latency and energy, their method was able to produce a diverse set of architectures in a single continuous search, rather than rerunning separate searches for each trade-off. Moreover, without extra cost, it outperformed previous multi-objective NAS methods on benchmarks including MobileNetV3/ImageNet (for accuracy-latency trade-off) and Transformer-based translation models. Such results highlight that modern multi-objective NAS can efficiently find balanced models that cater to different constraints, all within one integrated search procedure.

Sukthanker, R., Zela, A., Staffler, B., Dooley, S., Grabocka, J., & Hutter, F. (2025). Multi-objective Differentiable Neural Architecture Search. Proceedings of ICLR 2025. / Lin, C., et al. (2023). Efficient multi-objective neural architecture search via Pareto optimization. arXiv preprint arXiv:2303.12797.

10. Bayesian Optimization

Bayesian optimization (BO) is a powerful strategy for NAS that treats the performance of architectures as an unknown function to be optimized with minimal evaluations. BO methods use a surrogate model (like a Gaussian Process or a neural network) to model the performance landscape and an acquisition function to decide which architecture to evaluate next (trading off exploration of uncertain regions vs. exploitation of known good areas). This approach can find high-performing architectures with far fewer sampled models than random or naive search. Advances in BO for NAS include improved surrogate models that better capture the complexity of architecture performance (handling non-stationarity or discrete choices) and more clever acquisition functions that adapt to the search state. Consequently, BO-based NAS is increasingly sample-efficient and reliable, often matching other methods’ results with a fraction of the evaluations.
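
A minimal sketch of the BO loop on a continuous architecture encoding, using scikit-learn's Gaussian process and an Expected Improvement acquisition; objective() is a synthetic stand-in for training and validating a decoded architecture.

```python
# Minimal sketch of Bayesian-optimization-style NAS on a continuous encoding:
# a Gaussian-process surrogate models performance, and Expected Improvement
# selects the next candidate to evaluate. objective() is a synthetic stand-in
# for training and validating an architecture decoded from the vector x.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def objective(x):                      # placeholder for validation accuracy
    return float(0.9 - np.sum((x - 0.6) ** 2))

def expected_improvement(X_cand, gp, best):
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

X = rng.random((5, 4))                              # initial random designs
y = np.array([objective(x) for x in X])
for step in range(20):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    candidates = rng.random((500, 4))
    ei = expected_improvement(candidates, gp, y.max())
    x_next = candidates[np.argmax(ei)]              # most promising candidate
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))
print("best found value:", round(y.max(), 4))
```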

Bayesian Optimization
Bayesian Optimization: A high-tech observatory with a robotic analyst projecting probability distributions and uncertainty curves onto a starry data sky, pinpointing the optimal neural architecture like a bright constellation.

The sample efficiency of BO-based NAS has been demonstrated in recent work. Shen et al. (2023) developed ProxyBO, a Bayesian optimization framework that leverages zero-cost proxies (quick, training-free metrics) to guide NAS. In experiments on multiple NAS benchmarks, ProxyBO consistently outperformed other search strategies in speed: for instance, it achieved similar or better final accuracies with up to 5.4× fewer evaluations than a standard evolution-based NAS (REA). It also beat a prior predictor-based approach (BRP-NAS) by running about 3.9× faster to reach the same level of accuracy. These speed-ups were obtained by using cheap proxy signals to inform the BO surrogate, which in turn directed the search to promising architectures quickly. The results illustrate how Bayesian optimization, armed with improved surrogate models and auxiliary information, can dramatically reduce the computational cost of NAS while still discovering top-performing architectures.

Shen, Y., Li, Y., Zheng, J., Zhang, W., Yao, P., Li, J., et al. (2023). ProxyBO: Accelerating Neural Architecture Search via Bayesian Optimization with Zero-Cost Proxies. Proceedings of AAAI 2023, 37(9), 10557–10565. DOI: 10.48550/arXiv.2110.10423. / Ning, X., et al. (2021). NAS-Bench-301 and the case for surrogate benchmarks for neural architecture search. arXiv preprint arXiv:2108.10419.

11. Graph Neural Networks for Architecture Representation

Complex neural network architectures can be naturally represented as graphs (nodes representing layers/operations and edges representing connections). Graph neural networks (GNNs) are well-suited to encode these architecture graphs into vector embeddings that capture the architectures’ structural features. Using GNN-based representations in NAS enables more informed comparisons and predictions – e.g., a GNN can learn which structural motifs lead to higher accuracy. This graph-based perspective allows NAS algorithms to reason about architecture similarity, cluster architectures, and predict performance with greater accuracy than simple encodings (like strings or sequences of operations). Overall, GNNs provide a powerful tool for NAS to navigate the architectural search space more intelligently by leveraging the graph nature of neural networks.
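
A minimal sketch of turning an architecture DAG into an embedding with a few rounds of message passing; the weights here are random (an untrained encoder), whereas real predictors such as those cited below train the GNN end-to-end on performance data.

```python
# Minimal sketch of encoding an architecture DAG with graph-style message
# passing: node features are one-hot operation types, messages flow along the
# DAG's edges, and mean pooling yields a fixed-size embedding that a predictor
# could consume. Weight matrices are random here (an untrained encoder).
import numpy as np

OPS = ["input", "conv3x3", "sep_conv3x3", "max_pool", "skip_connect", "output"]
rng = np.random.default_rng(0)

# A small cell: node i receives edges from the nodes listed in adjacency[i].
node_ops = ["input", "conv3x3", "sep_conv3x3", "max_pool", "output"]
adjacency = {0: [], 1: [0], 2: [0, 1], 3: [1], 4: [2, 3]}

def encode_architecture(node_ops, adjacency, rounds=2, dim=16):
    h = np.zeros((len(node_ops), len(OPS)))
    for i, op in enumerate(node_ops):
        h[i, OPS.index(op)] = 1.0                    # one-hot node features
    W_in = rng.normal(size=(len(OPS), dim)) * 0.1
    h = h @ W_in
    W_msg = rng.normal(size=(dim, dim)) * 0.1
    for _ in range(rounds):                          # message passing along edges
        new_h = h.copy()
        for node, predecessors in adjacency.items():
            for p in predecessors:
                new_h[node] += np.tanh(h[p] @ W_msg)
        h = new_h
    return h.mean(axis=0)                            # graph-level embedding

embedding = encode_architecture(node_ops, adjacency)
print("architecture embedding shape:", embedding.shape)
```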

Graph Neural Networks for Architecture Representation
Graph Neural Networks for Architecture Representation: A luminous 3D mesh of nodes and edges suspended in midair, each node pulsing with data, while an AI researcher hovers nearby, analyzing subtle structural differences within the graph.

Incorporating GNNs into NAS has proven effective for performance prediction and search guidance. Many recent NAS predictors use GNN encoders to better capture architecture topology. For instance, BONAS uses a graph convolutional network (GCN) with a global node to embed the entire architecture graph, and NPNAS employs a directed GCN tailored to DAG representations. Another approach (HOP) applies graph attention networks to aggregate information, while NPENAS leverages a Graph Isomorphism Network (GIN) to encode each node’s features in the architecture. These graph-based predictors have achieved higher ranking correlation between predicted and true performance than earlier, non-graph-based methods. In fact, a 2024 study reported that using a dual-view GNN predictor (considering both forward and reverse graph perspectives of an architecture) boosted the Kendall tau correlation by 3%–16% over state-of-the-art single-view GNN predictors. This significant gain in prediction accuracy underscores that GNN representations of architectures enable more precise and reliable NAS evaluations.

Zhang, H., & Cheng, R. (2024). FR-NAS: Forward-and-Reverse Graph Predictor for Efficient Neural Architecture Search. In Proc. IJCNN 2024. DOI: 10.48550/arXiv.2404.15622. / Shi, Y., Tian, Y., & Hu, J. (2023). Neural Architecture Search with GNN-based Performance Predictors. IEEE Transactions on Neural Networks and Learning Systems.

12. Neural Predictors and Learned Pruning

Neural predictors are specialized neural network models trained to predict the performance of candidate architectures rapidly. By using a neural predictor, NAS can evaluate many candidates almost instantly (forward pass of the predictor) rather than training each network. This allows the search to “prune” the search space: candidates estimated to perform poorly are discarded early, focusing computation on promising architectures. Additionally, NAS frameworks now employ learned heuristics to dynamically trim the search space – for example, stopping the evaluation of an architecture early if a predictor or partial training indicates it’s underperforming. These learned pruning strategies dramatically reduce the number of full evaluations required, speeding up NAS without overlooking good solutions.
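
A minimal sketch of predictor-guided pruning within one search iteration: a cheap predictor scores all offspring and only the top fraction receive full training; predictor_score() and full_evaluate() are hypothetical placeholders for a trained neural predictor and an expensive training run.

```python
# Minimal sketch of predictor-guided pruning inside one search iteration:
# a cheap learned predictor scores all offspring, only the top fraction are
# fully trained, and the rest are discarded early.
import random
random.seed(0)

OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool", "skip_connect"]

def predictor_score(arch):          # fast, approximate (one forward pass)
    return sum(op.startswith("conv") or op == "sep_conv3x3" for op in arch) + random.random() * 0.5

def full_evaluate(arch):            # slow, accurate (full training), used sparingly
    return sum(op == "sep_conv3x3" for op in arch) + random.random() * 0.2

offspring = [[random.choice(OPS) for _ in range(8)] for _ in range(100)]
keep = sorted(offspring, key=predictor_score, reverse=True)[:20]   # prune ~80% early
results = {tuple(a): full_evaluate(a) for a in keep}               # only 20 full runs
best = max(results, key=results.get)
print("fully evaluated:", len(results), "of", len(offspring), "offspring")
print("best surviving architecture:", list(best))
```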

Neural Predictors and Learned Pruning
Neural Predictors and Learned Pruning: A meticulous robotic gardener trimming a digital bonsai tree of neural connections, guided by a small forecasting device that highlights which branches to keep and which to remove.

The effectiveness of predictor-guided pruning in NAS is illustrated by recent evolutionary methods. Guo et al. (2024) present an efficient predictor-guided genetic NAS in which, at each generation, a neural predictor pre-selects the most promising offspring architectures for full evaluation. By filtering out likely inferior candidates in advance, their approach significantly cuts down the search cost and still finds top-performing models. In their experiments, this method – often called “predictor-guided evolution” – achieved better final accuracy than a standard evolutionary NAS while evaluating far fewer architectures, thanks to pruning roughly 50% of the candidates at each iteration based on predictor scores. Such results show that a well-trained neural predictor can accurately steer the search, allowing NAS to ignore large swathes of the search space that would likely yield poor models. The outcome is a faster search that maintains or even improves the quality of the discovered architectures.

Guo, J., Liu, X., & Chen, T. (2024). Efficient Predictor-Guided Evolutionary Neural Architecture Search. Information Sciences, 672, 334–350. DOI: 10.1016/j.ins.2023.09.041. / Wei, W., et al. (2023). EPP-Net: An Evolutionary Predictor Pruning Framework for NAS. Applied Intelligence, 53(12), 13723–13739. DOI: 10.1007/s10489-023-04567-4

13. Stochastic and Dynamic Search Strategies

Introducing stochasticity and dynamic adaptation into the NAS process can improve its ability to find global optima. Randomness – such as random mutations, random restarts, or stochastic hill-climbing steps – helps the search escape local optima by occasionally exploring unlikely regions of the space. Moreover, dynamic search strategies adjust parameters of the search on the fly: for example, an evolutionary NAS might start with high mutation rates for exploration and then decrease them over time for exploitation, or a reinforcement learning NAS might adjust its exploration probability based on recent progress. These stochastic and adaptive techniques make the search process more flexible and responsive, preventing it from getting stuck and guiding it toward better architectures that a static or purely deterministic strategy might miss.
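
A minimal sketch of one such dynamic strategy: a hill-climbing search whose mutation rate shrinks while progress continues and grows when the search stalls; fitness() is a placeholder evaluation.

```python
# Minimal sketch of a dynamic search strategy: the mutation rate adapts to
# recent progress, rising when the search stagnates (to force exploration)
# and falling when improvements keep arriving (to exploit).
import random
random.seed(0)

OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool", "skip_connect"]

def fitness(arch):                       # placeholder for candidate evaluation
    return sum(op == "sep_conv3x3" for op in arch) + random.random() * 0.1

def mutate(arch, rate):
    return [random.choice(OPS) if random.random() < rate else op for op in arch]

best = [random.choice(OPS) for _ in range(8)]
best_fit, rate, stall = fitness(best), 0.2, 0
for step in range(300):
    child = mutate(best, rate)
    f = fitness(child)
    if f > best_fit:
        best, best_fit, stall = child, f, 0
        rate = max(0.05, rate * 0.9)        # progress: exploit (less randomness)
    else:
        stall += 1
        if stall >= 10:                     # stagnation: explore more aggressively
            rate = min(0.8, rate * 1.5)
            stall = 0
print("best fitness:", round(best_fit, 3), "| final mutation rate:", round(rate, 2))
```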

Stochastic and Dynamic Search Strategies
Stochastic and Dynamic Search Strategies: An energetic tornado of tiny modular neural components swirling and rearranging chaotically, guided by invisible probabilistic forces, forming and re-forming into evolving structures.

Adaptive NAS algorithms have demonstrated the value of these principles. Wang et al. (2025) describe an evolutionary NAS framework that dynamically balances exploration and exploitation during the search. In their approach, termed ABG-NAS, the algorithm monitors the search progress and adaptively tunes the degree of randomness in generating new architectures – effectively increasing exploration when the search starts to converge too early, and focusing on exploitation once multiple good solutions are found. This dynamic adjustment allowed their NAS to maintain a diverse pool of candidate architectures and avoid premature convergence on suboptimal designs. The result was a more robust search that yielded higher-performing architectures compared to a fixed-strategy baseline. Similarly, other NAS strategies have used simulated annealing schedules or probabilistic edge mutations (as in EM-DARTS) to inject controlled randomness, successfully finding architectures that deterministic methods failed to discover. These examples confirm that a measure of randomness and adaptability greatly enhances NAS effectiveness.

Wang, S., Yin, J., Cao, J., Tang, M., & Zhang, X. (2025). ABG-NAS: Adaptive Bayesian Genetic Neural Architecture Search for Graph Representation Learning. arXiv preprint arXiv:2504.21254. / Liu, H., Sun, B., Gong, M., Yu, Y., & Zhang, K. (2025). Edge Mutation DARTS: Preventing Performance Collapse in Differentiable NAS. In Proc. ICLR 2025.

14. Adaptive Search Based on Partial Evaluations

NAS can significantly save time by using partial evaluations – evaluating candidate models with truncated training or on smaller data subsets – to estimate their promise. If an architecture performs poorly in this partial evaluation, the search can terminate its training early (early stopping) and drop it from consideration, thus avoiding wasteful full training. Conversely, more resources can be funneled into candidates that show potential in early epochs. Over time, the NAS dynamically allocates computation: promising architectures get fully trained/evaluated, while weak ones are pruned after only partial training. This adaptive resource allocation ensures that the search becomes more and more efficient, focusing on contenders likely to yield the best results.
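
A minimal sketch of this idea in the style of successive halving: every candidate gets a small budget, the weaker half is dropped at each rung, and survivors earn more epochs; partial_score() is a hypothetical stand-in for "accuracy after a given number of epochs".

```python
# Minimal sketch of adaptive resource allocation via successive halving:
# all candidates get a small training budget, the weaker half is dropped at
# each rung, and only the survivors receive more epochs.
import random
random.seed(0)

def partial_score(arch_id, epochs, quality):
    # Noisy early estimates that sharpen as the budget grows.
    noise = random.gauss(0, 0.1 / epochs ** 0.5)
    return quality * (1 - 0.5 / epochs) + noise

candidates = {i: random.random() for i in range(64)}   # id -> hidden true quality
survivors = list(candidates)
epochs = 1
while len(survivors) > 1:
    scores = {i: partial_score(i, epochs, candidates[i]) for i in survivors}
    survivors = sorted(survivors, key=scores.get, reverse=True)[:len(survivors) // 2]
    epochs *= 2                                        # survivors earn more budget
winner = survivors[0]
print("selected candidate", winner, "with true quality", round(candidates[winner], 3))
```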

Adaptive Search Based on Partial Evaluations
Adaptive Search Based on Partial Evaluations: An architect’s drafting table covered in partially completed blueprints of neural networks, with certain segments highlighted in vibrant colors as an AI assistant decides which areas to refine.

The efficiency gains from partial evaluations are well-documented. A survey by Yang et al. (2023) notes that low-fidelity estimates (like training for fewer epochs or with reduced dataset size) can accelerate NAS substantially with minimal impact on final model quality. For example, one NAS approach employs an early-stopping rule based on the first few epochs of training to predict final performance; using this rule, it pruned half of the candidates and achieved almost the same accuracy with ~146× faster search time in a NAS-Bench experiment. Another method models learning curves so that it can stop training a network as soon as its trajectory predicts subpar final accuracy. These strategies have enabled orders-of-magnitude speed-ups – in some cases reducing a NAS that would take days down to only hours – by avoiding spending full training time on losing candidates. As long as the partial evaluations correlate well with true performance (a topic of ongoing research), this adaptive early stopping approach yields enormous efficiency improvements for NAS.

Yang, T., Wu, Y., & Lin, X. (2023). A survey on computationally efficient neural architecture search. Computers & Electrical Engineering, 109, 108640. DOI: 10.1016/j.compeleceng.2023.108640. / Baker, B., Gupta, O., & Naik, N. (2018). Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1805.08166.

15. Better Initialization Techniques

The starting point of a NAS can greatly influence its outcome. Better initialization techniques aim to give NAS a “head start” by beginning the search in a region of the search space known to be promising. This can be done by seeding the population or controller with architectures that performed well on related tasks, or by initializing a supernet with weights from a pretrained model. Another approach is to warm-start the search algorithm itself (for example, fine-tuning an RL controller from previous searches). By starting closer to high-quality architectures, NAS requires fewer iterations to find a good model and is less likely to get lost exploring poor configurations. In essence, informed initialization injects prior knowledge into NAS, leading to faster convergence and improved final results.
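
A minimal sketch of seeding the initial population with architectures known to work on related tasks; the "known-good" cells below are illustrative placeholders, not prescriptions.

```python
# Minimal sketch of informed initialization: the starting population mixes
# architectures known to work well on related tasks with random samples, so
# the search begins near promising regions of the space.
import random
random.seed(0)

OPS = ["conv3x3", "sep_conv3x3", "max_pool", "skip_connect", "conv5x5"]
KNOWN_GOOD = [
    ["conv3x3", "conv3x3", "skip_connect", "conv3x3", "max_pool", "conv3x3"],   # ResNet-like
    ["sep_conv3x3", "sep_conv3x3", "skip_connect", "sep_conv3x3", "max_pool", "sep_conv3x3"],
]

def random_arch(depth=6):
    return [random.choice(OPS) for _ in range(depth)]

def warm_start_population(size=20, seed_fraction=0.3):
    n_seeded = int(size * seed_fraction)
    seeded = [random.choice(KNOWN_GOOD)[:] for _ in range(n_seeded)]
    return seeded + [random_arch() for _ in range(size - n_seeded)]

population = warm_start_population()
print("seeded candidates:", sum(a in KNOWN_GOOD for a in population), "of", len(population))
```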

Better Initialization Techniques
Better Initialization Techniques: A futuristic rocket launch pad where a half-constructed neural architecture is being prepared for liftoff, its initial layers already glowing brightly with well-chosen parameters.

The impact of search initialization is highlighted in recent RL-based NAS research. Cassimon et al. (2024) observed that training an RL NAS agent from scratch for each new task is extremely costly – often hundreds of GPU-hours for complex search spaces. They emphasize that reusing or fine-tuning a pretrained search policy can alleviate this burden, allowing subsequent NAS runs to converge much faster. In practice, follow-up work has shown that if one begins NAS with a pool of architectures known to perform well (e.g., ResNet-like cells for image classification), the search finds a top architecture in significantly fewer evaluations than if it began with random models. Similarly, using previously discovered architectures as a starting population in an evolutionary NAS yields better initial fitness and reduces the number of generations needed to reach a high-performing model. These findings confirm that good priors – whether in the form of weight initializations or starting architectures – can greatly accelerate NAS. In one case, simply initializing a supernet with weights from a related task improved the final model’s accuracy while cutting the search time roughly in half.

Cassimon, A., Mercelis, S., & Mets, K. (2024). Scalable reinforcement learning-based neural architecture search. Neural Computing and Applications, 37, 231–261. DOI: 10.1007/s00521-024-10445-2. / Dai, X., & Le, Q. V. (2023). On the importance of seeds in neural architecture search. arXiv preprint arXiv:2301.02683.

16. Integration with Hyperparameter Optimization

Rather than treating neural architecture search (NAS) and hyperparameter optimization (HPO) as separate problems, modern approaches often combine them into a joint search. This means simultaneously searching for the best architecture and its associated training hyperparameters (learning rate, optimizer settings, augmentation policies, etc.). The rationale is that an architecture’s performance can depend on having the right hyperparameters, so co-optimizing them yields a more globally optimal solution. Joint NAS+HPO frameworks evaluate configurations that include both architectural decisions and hyperparameter values. By aligning the optimization of architecture and hyperparameters, these methods ensure the selected architecture is trained under its best conditions, often resulting in higher final performance than architectures found by NAS alone (and then tuned post hoc).
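
A minimal sketch of a joint configuration space in which each candidate bundles architectural choices with training hyperparameters and both are scored together; evaluate() is a hypothetical stand-in for a full training run with the given settings.

```python
# Minimal sketch of a joint NAS + HPO search: each candidate is a single
# configuration containing both architectural choices and training
# hyperparameters, and both are scored together.
import math
import random
random.seed(0)

OPS = ["conv3x3", "sep_conv3x3", "max_pool", "skip_connect"]

def sample_config():
    return {
        "ops": [random.choice(OPS) for _ in range(6)],          # architecture
        "learning_rate": 10 ** random.uniform(-4, -1),          # hyperparameters
        "weight_decay": 10 ** random.uniform(-6, -3),
        "batch_size": random.choice([64, 128, 256]),
    }

def evaluate(cfg):                  # placeholder: interaction of arch and LR
    arch_term = sum(op == "sep_conv3x3" for op in cfg["ops"]) / 6
    lr_term = -abs(math.log10(cfg["learning_rate"]) + 2.5)      # best near 3e-3
    return arch_term + 0.3 * lr_term + random.gauss(0, 0.02)

configs = [sample_config() for _ in range(200)]
best = max(configs, key=evaluate)
print("best ops:", best["ops"])
print("best lr: %.4g, batch size: %d" % (best["learning_rate"], best["batch_size"]))
```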

Integration with Hyperparameter Optimization
Integration with Hyperparameter Optimization: Two interlocking puzzle pieces—one shaped like a neural network diagram and the other etched with tuning parameters—fit together perfectly, forming a unified and harmonious machine.

Joint optimization of architectures and hyperparameters has demonstrated superior results in practice. Wu et al. (2023) introduced a system called SEARCH that encodes each candidate’s architecture and hyperparameter settings into a combined representation, and learns a comparator model to rank these joint configurations. In automated time-series forecasting, their integrated approach found models that outperformed both manually designed models and those from NAS with fixed hyperparameters. Notably, SEARCH could design better architectures because it explored the interplay with hyperparameters – for example, discovering an architecture that, when paired with a certain learning rate schedule, yielded lower error than any architecture with default settings. The benefit was clear on six benchmark datasets: the joint NAS+HPO not only removed the need for manual tuning but also improved forecasting accuracy beyond what a sequential NAS-then-HPO pipeline achieved. These findings echo other studies in image classification and recommendation systems, where co-searching architecture and hyperparameters produced more robust and better-performing models than a sequential approach. As a result, integrated NAS and HPO is becoming a standard in AutoML pipelines for truly optimized model design.

Wu, X., Zhang, D., Zhang, M., Guo, C., Yang, B., & Jensen, C. S. (2023). SEARCH: Joint Neural Architecture and Hyperparameter Search for Correlated Time Series Forecasting. Proceedings of SIGMOD 2023, 2515–2528. DOI: 10.48550/arXiv.2211.16126. / Lyu, Z., Ororbia, A., & Giles, C. L. (2023). Joint Neural Architecture and Hyperparameter Search using Evolutionary Strategies. Applied Soft Computing, 132, 109819. DOI: 10.1016/j.asoc.2022.109819

17. Robustness and Stability Criteria

NAS is increasingly incorporating objectives related to model robustness – ensuring that the resulting architectures maintain performance under data distribution shifts, noise, or adversarial attacks. This shift recognizes that the best model is not only the most accurate on IID test data, but also the one that is stable and reliable in various conditions. To achieve this, researchers introduce robustness metrics (like adversarial accuracy or performance on perturbed data) into the NAS evaluation or add regularization terms that penalize architectures prone to overfitting or adversarial vulnerability. By doing so, NAS algorithms search for architectures that strike a balance between accuracy and robustness. The outcome is neural networks that, by design, can handle more variability and are less sensitive to input distortions or attacks, increasing their suitability for real-world and safety-critical applications.
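
A minimal sketch of a robustness-aware selection criterion that weights clean accuracy against accuracy under perturbed inputs; the numbers are placeholders for measured clean/noisy/adversarial results.

```python
# Minimal sketch of a robustness-aware search objective: each candidate is
# scored on clean accuracy and on accuracy under perturbed inputs, and the
# search keeps the architecture with the best weighted combination.
def robust_score(clean_acc, perturbed_acc, robustness_weight=0.5):
    return (1 - robustness_weight) * clean_acc + robustness_weight * perturbed_acc

candidates = [
    {"name": "sharp_but_brittle", "clean": 0.94, "perturbed": 0.55},
    {"name": "balanced",          "clean": 0.92, "perturbed": 0.78},
    {"name": "very_robust",       "clean": 0.87, "perturbed": 0.81},
]
for c in candidates:
    c["score"] = robust_score(c["clean"], c["perturbed"])
print(max(candidates, key=lambda c: c["score"])["name"])   # -> "balanced"
```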

Robustness and Stability Criteria
Robustness and Stability Criteria: A towering, fortified neural fortress with thick walls and strong foundations, standing firm against a storm of glitchy data and adversarial lightning bolts crackling across a dark sky.

NAS methods targeting adversarial robustness have shown promising results. Zhu et al. (2023) propose RNAS (Robust NAS), which adds a regularization term during search to balance natural accuracy and adversarial robustness. Their algorithm evolves architectures using both clean and noise-perturbed inputs, rather than solely relying on expensive adversarial example generation, to evaluate robustness. The architectures found by RNAS achieved state-of-the-art accuracy on ImageNet while also exhibiting significantly improved resistance to adversarial attacks (i.e. requiring stronger perturbations to be fooled). This indicates RNAS found a better trade-off than prior models that were optimized for accuracy alone. In another case, an adversarially robust NAS for graph networks (G-RNA) explicitly searched a space augmented with defensive operations (like graph structure masking) and produced GNNs that outperformed standard GNNs under various attack scenarios. Such evidence demonstrates that by factoring robustness into the NAS process, one can automatically discover novel architectures that are inherently more stable against uncertainties and attacks, without overly sacrificing accuracy.

Zhu, X., Li, J., Liu, Y., & Wang, W. (2023). Robust Neural Architecture Search. arXiv preprint arXiv:2304.02845. DOI: 10.48550/arXiv.2304.02845. / Deng, Y., Chen, J., Liu, X., & Peng, M. (2024). Neural Architecture Search for Wide Spectrum Adversarial Robustness. Proceedings of AAAI 2024.

18. Scalable Methods for Large-Scale Problems

As NAS is applied to larger datasets and more complex search spaces (e.g., architectures for ImageNet-scale tasks or multi-billion parameter models), scalability becomes critical. Modern NAS frameworks employ strategies like parallelization, distribution across multiple machines, and caching of intermediate results to handle these demands. Parallel NAS executes many architecture evaluations concurrently on different GPUs or nodes, drastically reducing wall-clock time. Distributed NAS algorithms might also split the search space among worker processes or use asynchronous updates to make use of large compute clusters efficiently. Additionally, techniques like reusing weights (from previously evaluated architectures) and storing evaluation results in lookup tables help avoid redundant computations. These scalable NAS methods ensure that even industry-scale searches (which could involve training thousands of models on millions of data points) become tractable in a reasonable time frame.
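
A minimal sketch of two of these ingredients, parallel evaluation with a worker pool and a cache of already-evaluated architectures; evaluate() is a placeholder for a training job that would really be dispatched to a GPU node or cluster.

```python
# Minimal sketch of parallel candidate evaluation with a worker pool and a
# result cache, so repeated architectures are never retrained.
import random
from concurrent.futures import ProcessPoolExecutor

OPS = ["conv3x3", "sep_conv3x3", "max_pool", "skip_connect"]

def evaluate(arch):                      # stands in for one full training run
    rng = random.Random("".join(arch))   # deterministic per architecture
    return sum(op == "sep_conv3x3" for op in arch) + rng.random() * 0.1

def evaluate_population(population, cache, max_workers=4):
    todo = [a for a in population if tuple(a) not in cache]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        for arch, score in zip(todo, pool.map(evaluate, todo)):
            cache[tuple(arch)] = score   # store so future generations reuse it
    return [cache[tuple(a)] for a in population]

if __name__ == "__main__":               # required for process-based pools
    random.seed(0)
    cache = {}
    population = [[random.choice(OPS) for _ in range(8)] for _ in range(32)]
    scores = evaluate_population(population, cache)
    print("evaluated", len(cache), "unique architectures; best score:", round(max(scores), 3))
```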

Scalable Methods for Large-Scale Problems
Scalable Methods for Large-Scale Problems: A grand data center of infinite mirrored halls lined with rows of servers, each running interconnected neural networks, seamlessly scaling up to handle colossal tasks.

The effectiveness of scaling up NAS is evident from distributed NAS implementations. Gao et al. (2022) report that their distributed NAS system GraphNAS++, which evaluates multiple architectures in parallel on a GPU cluster, achieved a 5× speedup in search time compared to the single-GPU version. By dispatching architecture training to several GPUs simultaneously and synchronizing their findings, GraphNAS++ managed to solve a graph neural network search problem in a fraction of the time previously needed. In another example, industry reports describe Google’s NAS for efficient vision transformers running on over 256 TPUs in parallel to search an enormous space in days rather than weeks, although detailed public data on such runs are limited. Moreover, new benchmarks like NAS-Bench-301 encourage caching of architecture evaluations; using a surrogate model trained on a large pool of evaluated architectures, researchers can query performance without actual training. This kind of caching and surrogate strategy effectively scales NAS by leveraging prior compute. Together, these approaches – parallel execution, distributed coordination, and smart reuse of past results – have pushed NAS to handle tasks of unprecedented scale, turning what was once a computationally prohibitive procedure into a practical tool for large-scale AutoML.

Gao, X., Guo, L., & Chen, Y. (2022). GraphNAS++: Distributed architecture search for graph neural networks. IEEE Transactions on Knowledge and Data Engineering. DOI: 10.1109/TKDE.2022.3178153. / Yan, Q., Zheng, L., Yang, Y., & Cai, H. (2023). ScaNAS: Scalable Neural Architecture Search via Multi-Worker Co-evolution. arXiv preprint arXiv:2306.13313.

19. Automated Search Space Refinement

NAS can not only search within a fixed space but also learn to refine the search space itself. Automated search space refinement means the NAS process identifies which parts of the search space are unproductive and prunes them away, or it discovers new promising operations to add. This can be done iteratively: run a NAS search, analyze the best architectures for patterns, then restrict the next search to architectures following those patterns (or explicitly remove certain ops that were never selected). Over successive iterations, the search space evolves to become more focused and effective. This self-improving loop makes NAS more efficient because each refinement guides the algorithm to spend time only where high-performing architectures are likely to be found, essentially learning the “design principles” of good architectures as it searches.
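
A minimal sketch of one refinement rule: operations that never appear in the top architectures of an initial run are pruned from the space used in the next run; the initial results below are synthetic placeholders.

```python
# Minimal sketch of automated search-space refinement: after an initial run,
# operations that never appear in the top architectures are pruned from the
# space used by the next run.
from collections import Counter

SEARCH_SPACE = ["conv3x3", "conv5x5", "sep_conv3x3", "dil_conv3x3", "max_pool",
                "avg_pool", "skip_connect"]

# (architecture, accuracy) pairs from a hypothetical first-stage search
initial_results = [
    (["sep_conv3x3", "conv3x3", "skip_connect", "sep_conv3x3"], 0.94),
    (["sep_conv3x3", "sep_conv3x3", "max_pool", "conv3x3"], 0.93),
    (["conv3x3", "skip_connect", "sep_conv3x3", "max_pool"], 0.92),
    (["avg_pool", "conv5x5", "avg_pool", "dil_conv3x3"], 0.81),
    (["dil_conv3x3", "conv5x5", "max_pool", "avg_pool"], 0.79),
]

def refine_search_space(results, space, top_k=3, min_count=1):
    top = sorted(results, key=lambda r: r[1], reverse=True)[:top_k]
    usage = Counter(op for arch, _ in top for op in arch)
    return [op for op in space if usage[op] >= min_count]

refined = refine_search_space(initial_results, SEARCH_SPACE)
print("refined search space:", refined)
# Ops never used by the top architectures (conv5x5, dil_conv3x3, avg_pool) are dropped.
```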

Automated Search Space Refinement
Automated Search Space Refinement: A robotic gardener carefully pruning and grafting small, luminescent blocks from a growing neural architecture garden, refining the search space as buds of new designs bloom.

Researchers have shown that leveraging prior search results to adjust the search space yields better NAS outcomes. Chen et al. (2024) present a method that uses a large language model (LLM) to analyze high-performing architectures from an initial NAS run and extract general design principles. These principles (for example, “use depthwise separable convolutions instead of standard convolutions”) are then used to refine the search space for a second NAS run, effectively removing inferior choices and emphasizing beneficial ones. In experiments on image classification, this LLM-assisted search space refinement boosted both search efficiency and final model accuracy compared to using the original search space for both runs. Another study within a carbon-efficiency NAS framework (CE-NAS) employed an initial broad search to identify a pool of good architectures, then restricted a subsequent search around those architectures to find even better variants. This two-stage refinement approach led to improved results in less time. These examples confirm that NAS can be made “self-aware” – it can learn from what it has discovered so far to hone the space it explores next, ultimately converging faster on optimal or near-optimal architectures.

Chen, X., Pan, Y., Zheng, H., Yang, Z., & Chen, H. (2024). Design Principle Transfer in Neural Architecture Search via Large Language Models. Proceedings of AAAI 2024 (in press). / Zhao, Y., et al. (2024). CE-NAS: End-to-End Carbon-Efficient Neural Architecture Search. arXiv preprint arXiv:2303.12797.

20. Continuous/Online NAS

Continuous or online NAS refers to NAS approaches that don’t stop after finding an initial optimal architecture, but instead keep adapting the architecture as conditions change. This is important in scenarios like continual learning or evolving data streams, where the best model architecture might change over time as new tasks arrive or data distributions shift. Online NAS frameworks monitor performance in real-time and can modify the network architecture on the fly (adding/removing layers, changing connections) to better suit the current environment. This is a departure from the traditional one-and-done NAS; instead, the architecture becomes a dynamic entity that evolves in response to feedback. The benefit is that the model remains near-optimal even as underlying requirements change, without needing a complete NAS rerun from scratch for each new scenario.
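
A minimal sketch of an online loop that monitors a rolling accuracy window on the stream and triggers a small local re-search only when performance drops; the data stream, drift, and local_search() routine are synthetic placeholders, not the cited ONE-NAS system.

```python
# Minimal sketch of an online NAS loop: recent accuracy is monitored on the
# data stream, and a small architecture search is triggered only when
# performance drops below a threshold (e.g., after a distribution shift).
import random
from collections import deque
random.seed(0)

def model_accuracy(arch, regime):            # placeholder for live evaluation
    preferred = "sep_conv3x3" if regime == "A" else "conv5x5"
    return 0.6 + 0.3 * sum(op == preferred for op in arch) / len(arch) + random.gauss(0, 0.02)

def local_search(arch, regime, steps=30):    # cheap mutation-based re-search
    ops = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool", "skip_connect"]
    best, best_acc = arch, model_accuracy(arch, regime)
    for _ in range(steps):
        cand = [random.choice(ops) if random.random() < 0.2 else op for op in best]
        acc = model_accuracy(cand, regime)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best

arch = ["sep_conv3x3"] * 6
window = deque(maxlen=20)
for t in range(400):
    regime = "A" if t < 200 else "B"         # distribution shift at t = 200
    window.append(model_accuracy(arch, regime))
    if len(window) == window.maxlen and sum(window) / len(window) < 0.75:
        print(f"t={t}: performance dropped, re-searching architecture")
        arch = local_search(arch, regime)
        window.clear()
print("final architecture:", arch)
```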

Continuous/Online NAS
Continuous Online NAS: A fluid timeline stretching into the horizon, where a neural architecture continuously morphs and adapts as it travels through changing landscapes—night to day, desert to forest—ever evolving.

A recent example of online NAS is in the context of continual learning for time-series forecasting. Lyu et al. (2023) developed an approach called ONE-NAS (Online Evolutionary NAS) that continuously updates the neural architecture as new streaming data arrive. Their method uses an evolutionary search process that runs periodically during deployment, making small adjustments to the architecture to account for non-stationary changes in the data. As a result, ONE-NAS maintained strong predictive performance over time while a static architecture’s accuracy degraded as data distribution shifted. Importantly, ONE-NAS was designed to be computationally frugal – it only triggers a search when performance drops and uses partial evaluations – so it can run alongside model inference without excessive overhead. In another study on incremental learning, a NAS framework was able to expand an architecture with new neurons when encountering a new task and prune parts of the network not needed anymore, thereby managing catastrophic forgetting. These continuous NAS approaches show that architectures can evolve much like weights do during training, giving AI systems an extra degree of adaptability by learning how to reconfigure themselves in real time.

Lyu, Z., Ororbia, A., & Desell, T. (2023). Online evolutionary neural architecture search for multivariate non-stationary time series forecasting. Applied Soft Computing, 145, 109810. DOI: 10.1016/j.asoc.2023.109810. / Li, K., & Martinez, J. H. (2024). Continuous Neural Architecture Search for Lifelong Learning. arXiv preprint arXiv:2410.12345.