Navigating Complex Search Tasks with AI Copilots: The Undiscovered Country and References

26 Apr 2024

This paper is available on arXiv under a CC 4.0 license.


(1) Ryen W. White, Microsoft Research, Redmond, WA, USA.



AI copilots will transform how we search. Tasks are central to people’s lives and more support is needed for complex tasks in search settings. Some limited support for these tasks already exists in search engines, but copilots will expand the task frontier to make more tasks actionable and address the “last mile” in search interaction: task completion [58]. Moving forward, search providers should invest in “better together” experiences that utilize copilots plus traditional search, make these joint experiences more seamless for searchers, and add more support for their use in practice, e.g., help people to quickly understand copilot capabilities and potential and/or recommend the best modality for the current task or task stage. This includes experiences where both modalities are offered separately and can be selected by searchers and those where there is unification and the selection happens automatically based on the query and the conversation context.

The foundation models that power copilots have other search-related applications, e.g., for generating and applying intent taxonomies [43] or for evaluation [19]. We must retain a continued focus on human-AI cooperation, where searchers stay in control while the degree of system support increases as needed [44], and on AI safety. Searchers need to be able to trust copilots in general but also be able to verify their answers with minimal effort. Overall, the future is bright for IR, and AI research in general, with the advent of generative AI and the copilots that build upon it. Copilots will help augment and empower searchers in their information seeking journeys. Computer science researchers and practitioners should embrace this new era of assistive agents and engage across the full spectrum of exciting practical and scientific opportunities, both within information seeking as we focused on here, and onwards into other important domains such as personal productivity [5] and scientific discovery [22].

References


[1] Eugene Agichtein, Ryen W White, Susan T Dumais, and Paul N Bennett. 2012. Search, interrupted: understanding and predicting search task continuation. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. 315–324.

[2] Marcia J Bates. 1990. Where should the person stop and the information search interface start? Information Processing & Management 26, 5 (1990), 575–591.

[3] Nicholas J Belkin. 1980. Anomalous states of knowledge as a basis for information retrieval. Canadian Journal of Information Science 5, 1 (1980), 133–143.

[4] Paul N Bennett, Ryen W White, Wei Chu, Susan T Dumais, Peter Bailey, Fedor Borisyuk, and Xiaoyuan Cui. 2012. Modeling the impact of short-and long-term behavior on search personalization. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. 185–194.

[5] Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. 2022. Taking Flight with Copilot: Early insights and opportunities of AI-powered pair-programming tools. Queue 20, 6 (2022), 35–57.

[6] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021).

[7] Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. 2021. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 141–159.

[8] Andrei Broder. 2002. A taxonomy of web search. In ACM SIGIR Forum, Vol. 36. ACM New York, NY, USA, 3–10.

[9] Andrei Z Broder and Preston McAfee. 2023. Delphic Costs and Benefits in Web Search: A utilitarian and historical analysis. arXiv preprint arXiv:2308.07525 (2023).

[10] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023).

[11] Katriina Byström and Kalervo Järvelin. 1995. Task complexity affects information seeking and use. Information Processing & Management 31, 2 (1995), 191–213.

[12] Robert Capra and Jaime Arguello. 2023. How does AI chat change search behaviors? arXiv preprint arXiv:2307.03826 (2023).

[13] Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E Gonzalez, et al. 2023. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. See https://vicuna.lmsys.org (accessed 14 April 2023) (2023).

[14] Antonia Creswell and Murray Shanahan. 2022. Faithful reasoning using large language models. arXiv preprint arXiv:2208.14271 (2022).

[15] Brenda Dervin. 1998. Sense-making theory and practice: An overview of user interests in knowledge seeking and use. Journal of Knowledge Management 2, 2 (1998), 36–46.

[16] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

[17] Karl Duncker and Lynne S Lees. 1945. On problem-solving. Psychological monographs 58, 5 (1945), i.

[18] Brad Everman, Trevor Villwock, Dayuan Chen, Noe Soto, Oliver Zhang, and Ziliang Zong. 2023. Evaluating the Carbon Impact of Large Language Models at the Inference Stage. In 2023 IEEE International Performance, Computing, and Communications Conference (IPCCC). IEEE, 150–157.

[19] Guglielmo Faggioli, Laura Dietz, Charles LA Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Noriko Kando, Evangelos Kanoulas, Martin Potthast, Benno Stein, et al. 2023. Perspectives on Large Language Models for Relevance Judgment. In Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval. 39–50.

[20] Jianfeng Gao, Chenyan Xiong, Paul Bennett, and Nick Craswell. 2023. Neural Approaches to Conversational Information Retrieval. Vol. 44. Springer Nature.

[21] Ahmed Hassan Awadallah, Ryen W White, Patrick Pantel, Susan T Dumais, and Yi-Min Wang. 2014. Supporting complex search tasks. In Proceedings of the 23rd ACM international conference on information and knowledge management. 829–838.

[22] Tom Hope, Doug Downey, Daniel S Weld, Oren Etzioni, and Eric Horvitz. 2023. A computational inflection for scientific discovery. Commun. ACM 66, 8 (2023), 62–73.

[23] Peter Ingwersen and Kalervo Järvelin. 2005. The turn: Integration of information seeking and retrieval in context. Vol. 18. Springer Science & Business Media.

[24] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. Comput. Surveys 55, 12 (2023), 1–38.

[25] Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. 133–142.

[26] Jeonghyun Kim. 2006. Task difficulty as a predictor and indicator of web searching interaction. In CHI’06 extended abstracts on human factors in computing systems. 959–964.

[27] David R Krathwohl. 2002. A revision of Bloom’s taxonomy: An overview. Theory Into Practice 41, 4 (2002), 212–218.

[28] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474.

[29] Yuelin Li and Nicholas J Belkin. 2008. A faceted approach to conceptualizing tasks in information seeking. Information Processing & Management 44, 6 (2008), 1822–1837.

[30] Yuanchun Li and Oriana Riva. 2021. Glider: A reinforcement learning approach to extract UI scripts from websites. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1420–1430.

[31] Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2021. Towards understanding and mitigating social biases in language models. In International Conference on Machine Learning. PMLR, 6565–6576.

[32] Gary Marchionini. 2006. Exploratory search: from finding to understanding. Commun. ACM 49, 4 (2006), 41–46.

[33] James Mayfield, Eugene Yang, Dawn Lawrie, Samuel Barham, Orion Weller, Marc Mason, Suraj Nair, and Scott Miller. 2023. Synthetic Cross-language Information Retrieval Training Data. arXiv preprint arXiv:2305.00331 (2023).

[34] Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, and Ahmed Awadallah. 2023. Orca: Progressive learning from complex explanation traces of GPT-4. arXiv preprint arXiv:2306.02707 (2023).

[35] Marc Najork. 2023. Generative Information Retrieval. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1–1.

[36] Alexandra Olteanu, Jean Garcia-Gathright, Maarten de Rijke, Michael D Ekstrand, Adam Roegiest, Aldo Lipani, Alex Beutel, Alexandra Olteanu, Ana Lucic, Ana-Andreea Stoica, et al. 2021. FACTS-IR: fairness, accountability, confidentiality, transparency, and safety in information retrieval. In ACM SIGIR Forum, Vol. 53. ACM New York, NY, USA, 20–43.

[37] Soo Young Rieh, Kevyn Collins-Thompson, Preben Hansen, and Hye-Jung Lee. 2016. Towards searching as a learning process: A review of current perspectives and future directions. Journal of Information Science 42, 1 (2016), 19–34.

[38] Shawon Sarkar and Chirag Shah. 2021. An integrated model of task, information needs, sources and uncertainty to design task-aware search systems. In Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval. 83–92.

[39] Reijo Savolainen. 2012. Expectancy-value beliefs and information needs as motivators for task-based information seeking. Journal of Documentation 68, 4 (2012), 492–511.

[40] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761 (2023).

[41] Chirag Shah. 2023. Generative AI and the Future of Information Access. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (Birmingham, United Kingdom) (CIKM ’23). Association for Computing Machinery, New York, NY, USA, 3.

[42] Chirag Shah, Ryen White, Paul Thomas, Bhaskar Mitra, Shawon Sarkar, and Nicholas Belkin. 2023. Taking search to task. In Proceedings of the 2023 Conference on Human Information Interaction and Retrieval. 1–13.

[43] Chirag Shah, Ryen W White, Reid Andersen, Georg Buscher, Scott Counts, Sarkar Snigdha Sarathi Das, Ali Montazer, Sathish Manivannan, Jennifer Neville, Xiaochuan Ni, et al. 2023. Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies. arXiv preprint arXiv:2309.13063 (2023).

[44] Ben Shneiderman. 2022. Human-centered AI. Oxford University Press.

[45] Adish Singla, Ryen White, and Jeff Huang. 2010. Studying trailfinding algorithms for enhanced web search. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. 443–450.

[46] Jaime Teevan, Kevyn Collins-Thompson, Ryen W White, and Susan Dumais. 2014. Slow search. Commun. ACM 57, 8 (2014), 36–38.

[47] Jaime Teevan, Susan T Dumais, and Eric Horvitz. 2005. Personalizing search via automated analysis of interests and activities. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. 449–456.

[48] Jaime Teevan, Meredith Ringel Morris, and Steve Bush. 2009. Discovering and using groups to improve personalized search. In Proceedings of the second ACM international conference on web search and data mining. 15–24.

[49] Maartje ter Hoeve, Robert Sim, Elnaz Nouri, Adam Fourney, Maarten de Rijke, and Ryen W White. 2020. Conversations with documents: An exploration of document-centered assistance. In Proceedings of the 2020 Conference on Human Information Interaction and Retrieval. 43–52.

[50] Paul Thomas, Seth Spielman, Nick Craswell, and Bhaskar Mitra. 2023. Large language models can accurately predict searcher preferences. arXiv preprint arXiv:2309.10621 (2023).

[51] Randall H Trigg. 1988. Guided tours and tabletops: Tools for communicating in a hypertext environment. ACM Transactions on Information Systems (TOIS) 6, 4 (1988), 398–414.

[52] Sarah K Tyler and Jaime Teevan. 2010. Large scale query log analysis of re-finding. In Proceedings of the third ACM international conference on Web search and data mining. 191–200.

[53] Pertti Vakkari. 2001. A theory of the task-based information retrieval process: A summary and generalisation of a longitudinal study. Journal of Documentation 57, 1 (2001), 44–60.

[54] Pertti Vakkari. 2016. Searching as learning: A systematization based on literature. Journal of Information Science 42, 1 (2016), 7–18.

[55] Nicholas Vincent. 2022. The Paradox of Reuse, Language Models Edition. Accessed: 2023-09-12.

[56] Yu Wang, Xiao Huang, and Ryen W White. 2013. Characterizing and supporting cross-device search tasks. In Proceedings of the sixth ACM international conference on Web search and data mining. 707–716.

[57] Ryen W White. 2016. Interactions with search systems. Cambridge University Press.

[58] Ryen W White. 2018. Opportunities and challenges in search interaction. Commun. ACM 61, 12 (2018), 36–38.

[59] Ryen W White. 2018. Skill discovery in virtual assistants. Commun. ACM 61, 11 (2018), 106–113.

[60] Ryen W White. 2022. Intelligent futures in task assistance. Commun. ACM 65, 11 (2022), 35–39.

[61] Ryen W White. 2023. Tasks, Copilots, and the Future of Search. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (Taipei, Taiwan) (SIGIR ’23). Association for Computing Machinery, New York, NY, USA, 5–6.

[62] Ryen W White, Mikhail Bilenko, and Silviu Cucerzan. 2007. Studying the use of popular destinations to enhance web search interaction. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. 159–166.

[63] Ryen W White, Wei Chu, Ahmed Hassan, Xiaodong He, Yang Song, and Hongning Wang. 2013. Enhancing personalized search by mining and modeling task behavior. In Proceedings of the 22nd international conference on World Wide Web. 1411–1420.

[64] Ryen W White, Adam Fourney, Allen Herring, Paul N Bennett, Nirupama Chandrasekaran, Robert Sim, Elnaz Nouri, and Mark J Encarnación. 2019. Multi-device digital assistance. Commun. ACM 62, 10 (2019), 28–31.

[65] Ryen W White, Ian Ruthven, and Joemon M Jose. 2005. A study of factors affecting the utility of implicit relevance feedback. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. 35–42.

[66] Ryen W White, Ian Ruthven, Joemon M Jose, and CJ Van Rijsbergen. 2005. Evaluating implicit feedback models using searcher simulations. ACM Transactions on Information Systems (TOIS) 23, 3 (2005), 325–361.

[67] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. 2023. AutoGen: Enabling nextgen LLM applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155 (2023).

[68] Iris Xie. 2008. Interactive information retrieval in digital environments. IGI Global.

[69] Jinyun Yan, Wei Chu, and Ryen W White. 2014. Cohort modeling for enhanced personalized search. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. 505–514.

[70] Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, et al. 2021. Differentially private fine-tuning of language models. arXiv preprint arXiv:2110.06500 (2021).

[71] Hamed Zamani, Susan Dumais, Nick Craswell, Paul Bennett, and Gord Lueck. 2020. Generating clarifying questions for information retrieval. In Proceedings of The Web Conference 2020. 418–428.

[72] Jieyu Zhang, Ranjay Krishna, Ahmed H Awadallah, and Chi Wang. 2023. EcoAssistant: Using LLM Assistant More Affordably and Accurately. arXiv preprint arXiv:2310.03046 (2023).

[73] Yi Zhang, Sujay Kumar Jauhar, Julia Kiseleva, Ryen White, and Dan Roth. 2021. Learning to decompose and organize complex tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2726–2735.

[74] Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, Lingpeng Kong, Jiajun Chen, Lei Li, and Shujian Huang. 2023. Multilingual machine translation with large language models: Empirical results and analysis. arXiv preprint arXiv:2304.04675 (2023).

[75] Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593 (2019).