2020
83. | Larson, Stefan; Mahendran, Anish; Lee, Andrew; Kummerfeld, Jonathan K; Hill, Parker; Laurenzano, Michael A; Hauswald, Johann; Tang, Lingjia; Mars, Jason Systems and methods for automatically configuring training data for training machine learning models of a machine learning-based dialogue system including seeding training samples or curating a corpus of training data based on instances of training data identified as anomalous Miscellaneous 2020, (US Patent 10,679,150). @misc{larson2020systems, title = {Systems and methods for automatically configuring training data for training machine learning models of a machine learning-based dialogue system including seeding training samples or curating a corpus of training data based on instances of training data identified as anomalous}, author = {Stefan Larson and Anish Mahendran and Andrew Lee and Jonathan K Kummerfeld and Parker Hill and Michael A Laurenzano and Johann Hauswald and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/12/US9117447.pdf}, year = {2020}, date = {2020-06-01}, abstract = {A system and method for improving a machine learning-based dialogue system includes: sourcing a corpus of raw machine learning training data from sources of training data based on a plurality of seed training samples, wherein the corpus of raw machine learning training data comprises a plurality of distinct instances of training data; generating a vector representation for each distinct instance of training data; identifying statistical characteristics of the corpus of raw machine learning training data based on a mapping of the vector representation for each distinct instance of training data; identifying anomalous instances of the plurality of distinct instances of training data of the corpus of raw machine learning training data based on the identified statistical characteristics of the corpus; and curating the corpus of raw machine learning training data based on each of the instances of training data identified as anomalous instances.}, note = {US Patent 10,679,150}, keywords = {}, pubstate = {published}, tppubtype = {misc} } A system and method for improving a machine learning-based dialogue system includes: sourcing a corpus of raw machine learning training data from sources of training data based on a plurality of seed training samples, wherein the corpus of raw machine learning training data comprises a plurality of distinct instances of training data; generating a vector representation for each distinct instance of training data; identifying statistical characteristics of the corpus of raw machine learning training data based on a mapping of the vector representation for each distinct instance of training data; identifying anomalous instances of the plurality of distinct instances of training data of the corpus of raw machine learning training data based on the identified statistical characteristics of the corpus; and curating the corpus of raw machine learning training data based on each of the instances of training data identified as anomalous instances. |
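A minimal sketch of the curation step this patent describes, assuming a vector representation (e.g., a sentence embedding) has already been computed for each training instance; the distance-to-mean statistic, the z-score threshold, and all names below are illustrative choices, not the patent's specification:

```python
import numpy as np

def curate(embeddings, utterances, z_threshold=2.0):
    """Flag and drop anomalous training samples. The statistical
    characteristic used here is each sample's distance to the corpus mean,
    standardized into a z-score."""
    center = embeddings.mean(axis=0)
    dists = np.linalg.norm(embeddings - center, axis=1)
    z = (dists - dists.mean()) / dists.std()   # standardized distances
    keep = z < z_threshold                     # anomalous if far from the center
    curated = [u for u, k in zip(utterances, keep) if k]
    flagged = [u for u, k in zip(utterances, keep) if not k]
    return curated, flagged

# Toy usage with random vectors standing in for real sentence embeddings.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(6, 1, (2, 8))])
texts = [f"utterance {i}" for i in range(52)]
kept, anomalous = curate(emb, texts)
print(len(kept), "kept;", len(anomalous), "flagged as anomalous")
```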
82. | Mars, Jason; Tang, Lingjia; Laurenzano, Michael; Hauswald, Johann; Hill, Parker System and method for implementing an artificially intelligent virtual assistant using machine learning Miscellaneous 2020, (US Patent 10,572,801). @misc{mars2020system, title = {System and method for implementing an artificially intelligent virtual assistant using machine learning}, author = {Jason Mars and Lingjia Tang and Michael Laurenzano and Johann Hauswald and Parker Hill}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/US20190130244A1.pdf}, year = {2020}, date = {2020-02-01}, abstract = {Systems and methods for implementing an artificially intelligent virtual assistant includes collecting a user query; using a competency classification machine learning model to generate a competency label for the user query; using a slot identification machine learning model to segment the text of the query and label each of the slots of the query; generating a slot value for each of the slots of the query; generating a handler for each of the slot values; and using the slot values to: identify an external data source relevant to the user query, fetch user data from the external data source, and apply one or more operations to the query to generate response data; and using the response data, to generate a response to the user query.}, note = {US Patent 10,572,801}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Systems and methods for implementing an artificially intelligent virtual assistant includes collecting a user query; using a competency classification machine learning model to generate a competency label for the user query; using a slot identification machine learning model to segment the text of the query and label each of the slots of the query; generating a slot value for each of the slots of the query; generating a handler for each of the slot values; and using the slot values to: identify an external data source relevant to the user query, fetch user data from the external data source, and apply one or more operations to the query to generate response data; and using the response data, to generate a response to the user query. |
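To make the claimed pipeline concrete, here is a toy sketch of the competency-classification, slot-identification, and handler flow; the trivial rule-based stand-ins for the machine learning models, the banking example, and every name below are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Slot:
    label: str   # slot classification label, e.g. "account"
    text: str    # the query segment assigned to this slot

def competency_model(query: str) -> str:
    # Stand-in for the competency classification ML model.
    return "balance_inquiry" if "balance" in query else "unknown"

def slot_model(query: str) -> List[Slot]:
    # Stand-in for the slot identification ML model: segment and label.
    return [Slot("account", w) for w in query.split() if w in ("checking", "savings")]

HANDLERS: Dict[str, Callable[[Slot], str]] = {
    # One handler per slot label; a real system would fetch user data from
    # an external data source identified by the slot value.
    "account": lambda s: f"fetched balance for {s.text} account",
}

def respond(query: str) -> str:
    competency = competency_model(query)
    slots = slot_model(query)
    results = [HANDLERS[s.label](s) for s in slots if s.label in HANDLERS]
    return f"[{competency}] " + "; ".join(results)

print(respond("what is my checking balance"))
```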
81. | Kang, Yiping; Zhang, Yunqi; Kummerfeld, Jonathan K; Hill, Parker; Hauswald, Johann; Laurenzano, Michael A; Tang, Lingjia; Mars, Jason Systems and methods for intelligently curating machine learning training data and improving machine learning model performance Miscellaneous 2020, (US Patent 10,679,100). @misc{kang2020systems, title = {Systems and methods for intelligently curating machine learning training data and improving machine learning model performance}, author = {Yiping Kang and Yunqi Zhang and Jonathan K Kummerfeld and Parker Hill and Johann Hauswald and Michael A Laurenzano and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/12/US10679100.pdf}, year = {2020}, date = {2020-01-01}, abstract = {Systems and methods of intelligent formation and acquisition of machine learning training data for implementing an artificially intelligent dialogue system includes constructing a corpora of machine learning test corpus that comprise a plurality of historical queries and commands sampled from production logs of a deployed dialogue system; configuring training data sourcing parameters to source a corpora of raw machine learning training data from remote sources of machine learning training data; calculating efficacy metrics of the corpora of raw machine learning training data, wherein calculating the efficacy metrics includes calculating one or more of a coverage metric value and a diversity metric value of the corpora of raw machine learning training data; using the corpora of raw machine learning training data to train the at least one machine learning classifier if the calculated coverage metric value of the corpora of machine learning training data satisfies a minimum coverage metric threshold.}, note = {US Patent 10,679,100}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Systems and methods of intelligent formation and acquisition of machine learning training data for implementing an artificially intelligent dialogue system includes constructing a corpora of machine learning test corpus that comprise a plurality of historical queries and commands sampled from production logs of a deployed dialogue system; configuring training data sourcing parameters to source a corpora of raw machine learning training data from remote sources of machine learning training data; calculating efficacy metrics of the corpora of raw machine learning training data, wherein calculating the efficacy metrics includes calculating one or more of a coverage metric value and a diversity metric value of the corpora of raw machine learning training data; using the corpora of raw machine learning training data to train the at least one machine learning classifier if the calculated coverage metric value of the corpora of machine learning training data satisfies a minimum coverage metric threshold. |
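One plausible reading of the coverage and diversity metrics, sketched under the assumption that training and test samples are represented as embedding vectors; the radius, the 0.8 acceptance threshold, and the metric definitions are illustrative, not the patent's formulas:

```python
import numpy as np

def coverage(train_emb, test_emb, radius):
    """Fraction of test samples within `radius` of at least one training
    sample, one plausible reading of a coverage metric."""
    d = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    return float((d.min(axis=1) <= radius).mean())

def diversity(train_emb):
    """Mean pairwise distance between training samples, one plausible
    reading of a diversity metric."""
    d = np.linalg.norm(train_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    n = len(train_emb)
    return float(d.sum() / (n * (n - 1)))   # average over off-diagonal pairs

rng = np.random.default_rng(1)
train, test = rng.normal(size=(100, 16)), rng.normal(size=(20, 16))
MIN_COVERAGE = 0.8                          # hypothetical acceptance threshold
cov, div = coverage(train, test, radius=5.0), diversity(train)
print(f"coverage={cov:.2f} diversity={div:.2f}")
if cov >= MIN_COVERAGE:
    print("coverage threshold satisfied: corpus would be used for training")
```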
80. | Liu, Tianyi; He, Sen; Huang, Sunzhou; Tsang, Danny; Tang, Lingjia; Mars, Jason; Wang, Wei A Benchmarking Framework for Interactive 3D Applications in the Cloud Inproceedings 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 881–894, IEEE 2020. @inproceedings{liu2020benchmarking, title = {A Benchmarking Framework for Interactive 3D Applications in the Cloud}, author = {Tianyi Liu and Sen He and Sunzhou Huang and Danny Tsang and Lingjia Tang and Jason Mars and Wei Wang}, url = {https://www.jasonmars.org/wp-content/uploads/2020/12/2006.13378.pdf}, year = {2020}, date = {2020-01-01}, booktitle = {2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)}, pages = {881--894}, organization = {IEEE}, abstract = {With the growing popularity of cloud gaming and cloud virtual reality (VR), interactive 3D applications have become a major class of workloads for the cloud. However, despite their growing importance, there is limited public research on how to design cloud systems to efficiently support these applications due to the lack of an open and reliable research infrastructure, including benchmarks and performance analysis tools. The challenges of generating human-like inputs under various system/application nondeterminism and dissecting the performance of complex graphics systems make it very difficult to design such an infrastructure. In this paper, we present the design of a novel research infrastructure, Pictor, for cloud 3D applications and systems. Pictor employs AI to mimic human interactions with complex 3D applications. It can also track the processing of user inputs to provide in-depth performance measurements for the complex software and hardware stack used for cloud 3D-graphics rendering. With Pictor, we designed a benchmark suite with six interactive 3D applications. Performance analyses were conducted with these benchmarks, which show that cloud system designs, including both system software and hardware designs, are crucial to the performance of cloud 3D applications. The analyses also show that energy consumption can be reduced by at least 37% when two 3D applications share a could server. To demonstrate the effectiveness of Pictor, we also implemented two optimizations to address two performance bottlenecks discovered in a state-of-the-art cloud 3D-graphics rendering system. These two optimizations improved the frame rate by 57.7% on average.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } With the growing popularity of cloud gaming and cloud virtual reality (VR), interactive 3D applications have become a major class of workloads for the cloud. However, despite their growing importance, there is limited public research on how to design cloud systems to efficiently support these applications due to the lack of an open and reliable research infrastructure, including benchmarks and performance analysis tools. The challenges of generating human-like inputs under various system/application nondeterminism and dissecting the performance of complex graphics systems make it very difficult to design such an infrastructure. In this paper, we present the design of a novel research infrastructure, Pictor, for cloud 3D applications and systems. Pictor employs AI to mimic human interactions with complex 3D applications. It can also track the processing of user inputs to provide in-depth performance measurements for the complex software and hardware stack used for cloud 3D-graphics rendering. 
With Pictor, we designed a benchmark suite with six interactive 3D applications. Performance analyses were conducted with these benchmarks, which show that cloud system designs, including both system software and hardware designs, are crucial to the performance of cloud 3D applications. The analyses also show that energy consumption can be reduced by at least 37% when two 3D applications share a cloud server. To demonstrate the effectiveness of Pictor, we also implemented two optimizations to address two performance bottlenecks discovered in a state-of-the-art cloud 3D-graphics rendering system. These two optimizations improved the frame rate by 57.7% on average. |
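A toy illustration of the frame-level measurement a harness like Pictor performs; the paper's actual infrastructure drives real 3D applications with AI-generated inputs and dissects the full rendering stack, whereas `render_frame` below is just a sleep stand-in:

```python
import time, statistics

def render_frame():
    time.sleep(0.016)   # stand-in for one frame of 3D rendering work

def benchmark(n_frames=60):
    """Timestamp around each frame, then report mean FPS and tail frame time,
    the kind of metrics a cloud-gaming benchmark harness collects."""
    frame_times = []
    for _ in range(n_frames):
        t0 = time.perf_counter()
        render_frame()
        frame_times.append(time.perf_counter() - t0)
    fps = 1.0 / statistics.mean(frame_times)
    p99 = sorted(frame_times)[int(0.99 * (len(frame_times) - 1))]
    print(f"mean FPS: {fps:.1f}, 99th-percentile frame time: {p99 * 1000:.1f} ms")

benchmark()
```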
79. | Mars, Jason; Tang, Lingjia; Laurenzano, Michael A; Hauswald, Johann; Hill, Parker; Kang, Yiping; Zhang, Yunqi Systems and methods for intelligently configuring and deploying a machine learning-based dialogue system Miscellaneous 2020, (US Patent 10,769,384). @misc{mars2020systems, title = {Systems and methods for intelligently configuring and deploying a machine learning-based dialogue system}, author = {Jason Mars and Lingjia Tang and Michael A Laurenzano and Johann Hauswald and Parker Hill and Yiping Kang and Yunqi Zhang}, url = {https://www.jasonmars.org/wp-content/uploads/2020/12/US10769384.pdf}, year = {2020}, date = {2020-01-01}, abstract = {A system and method for intelligently configuring a machine learning-based dialogue system includes a conversational deficiency assessment of a target dialog system, wherein implementing the conversational deficiency assessment includes: (i) identifying distinct corpora of mishandled utterances based on an assessment of the distinct corpora of dialogue data; (ii) identifying candidate corpus of mishandled utterances from the distinct corpora of mishandled utterances as suitable candidates for building new dialogue competencies for the target dialogue system if candidate metrics of the candidate corpus of mishandled utterances satisfy a candidate threshold; building the new dialogue competencies for the target dialogue system for each of the candidate corpus of mishandled utterances having candidate metrics that satisfy the candidate threshold; and configuring a dialogue system control structure for the target dialogue system based on the new dialogue competencies, wherein the dialogue system control structure governs an operation of an automated dialogue agent.}, note = {US Patent 10,769,384}, keywords = {}, pubstate = {published}, tppubtype = {misc} } A system and method for intelligently configuring a machine learning-based dialogue system includes a conversational deficiency assessment of a target dialog system, wherein implementing the conversational deficiency assessment includes: (i) identifying distinct corpora of mishandled utterances based on an assessment of the distinct corpora of dialogue data; (ii) identifying candidate corpus of mishandled utterances from the distinct corpora of mishandled utterances as suitable candidates for building new dialogue competencies for the target dialogue system if candidate metrics of the candidate corpus of mishandled utterances satisfy a candidate threshold; building the new dialogue competencies for the target dialogue system for each of the candidate corpus of mishandled utterances having candidate metrics that satisfy the candidate threshold; and configuring a dialogue system control structure for the target dialogue system based on the new dialogue competencies, wherein the dialogue system control structure governs an operation of an automated dialogue agent. |
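A hedged sketch of the candidate-selection step, assuming mishandled utterances have already been grouped into clusters of embeddings; cluster size and mean cosine similarity to the centroid are stand-in candidate metrics, and the thresholds are invented:

```python
import numpy as np

def candidate_corpora(clusters, min_size=20, min_coherence=0.5):
    """Select clusters of mishandled utterances worth building a new
    dialogue competency for. `clusters` maps a cluster id to an (n, d)
    array of utterance embeddings."""
    selected = []
    for cid, emb in clusters.items():
        centroid = emb.mean(axis=0)
        sims = emb @ centroid / (np.linalg.norm(emb, axis=1) * np.linalg.norm(centroid))
        if len(emb) >= min_size and sims.mean() >= min_coherence:
            selected.append(cid)
    return selected

rng = np.random.default_rng(2)
clusters = {"refunds": rng.normal(1, 0.2, (40, 8)), "noise": rng.normal(0, 1, (5, 8))}
print("build new competencies for:", candidate_corpora(clusters))
```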
78. | Peper, Joseph; Hill, Parker; Leach, Kevin; Stapleton, Sean; Kummerfeld, Jonathan K; Hauswald, Johann; Laurenzano, Michael; Tang, Lingjia; Mars, Jason Systems and methods for machine learning-based multi-intent segmentation and classification Miscellaneous 2020, (US Patent 10,824,818). @misc{peper2020systems, title = {Systems and methods for machine learning-based multi-intent segmentation and classification}, author = {Joseph Peper and Parker Hill and Kevin Leach and Sean Stapleton and Jonathan K Kummerfeld and Johann Hauswald and Michael Laurenzano and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/12/US10824818.pdf}, year = {2020}, date = {2020-01-01}, abstract = {Systems and methods for synthesizing training data for multi-intent utterance segmentation include identifying a first corpus of utterances comprising a plurality of distinct single-intent in-domain utterances; identifying a second corpus of utterances comprising a plurality of distinct single-intent out-of-domain utterances; identifying a third corpus comprising a plurality of distinct conjunction terms; forming a multi-intent training corpus comprising synthetic multi-intent utterances, wherein forming each distinct multi-intent utterance includes: selecting a first distinct in-domain utterance from the first corpus of utterances; probabilistically selecting one of a first out-of-domain utterance from the second corpus and a second in-domain utterance from the first corpus; probabilistically selecting or not selecting a distinct conjunction term from the third corpus; and forming a synthetic multi-intent utterance including appending the first in-domain utterance with one of the first out-of-domain utterance from the second corpus of utterances and the second in-domain utterance from the first corpus of utterances.}, note = {US Patent 10,824,818}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Systems and methods for synthesizing training data for multi-intent utterance segmentation include identifying a first corpus of utterances comprising a plurality of distinct single-intent in-domain utterances; identifying a second corpus of utterances comprising a plurality of distinct single-intent out-of-domain utterances; identifying a third corpus comprising a plurality of distinct conjunction terms; forming a multi-intent training corpus comprising synthetic multi-intent utterances, wherein forming each distinct multi-intent utterance includes: selecting a first distinct in-domain utterance from the first corpus of utterances; probabilistically selecting one of a first out-of-domain utterance from the second corpus and a second in-domain utterance from the first corpus; probabilistically selecting or not selecting a distinct conjunction term from the third corpus; and forming a synthetic multi-intent utterance including appending the first in-domain utterance with one of the first out-of-domain utterance from the second corpus of utterances and the second in-domain utterance from the first corpus of utterances. |
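The synthesis recipe in this abstract maps almost directly to code. Below is a sketch with invented corpora and probabilities (the patent does not publish specific values):

```python
import random

def synth_multi_intent(in_domain, out_of_domain, conjunctions,
                       p_in_domain_second=0.5, p_conjunction=0.7, rng=random):
    """One synthetic multi-intent utterance: pick an in-domain utterance,
    probabilistically pick the second utterance from the in-domain or
    out-of-domain corpus, and probabilistically join them with a
    conjunction term."""
    first = rng.choice(in_domain)
    pool = in_domain if rng.random() < p_in_domain_second else out_of_domain
    second = rng.choice(pool)
    joiner = f" {rng.choice(conjunctions)} " if rng.random() < p_conjunction else " "
    return first + joiner + second

in_dom = ["check my balance", "transfer five dollars"]
out_dom = ["what's the weather", "play some jazz"]
conj = ["and", "also", "then"]
random.seed(3)
print("\n".join(synth_multi_intent(in_dom, out_dom, conj) for _ in range(3)))
```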
77. | Lee, Andrew; Larson, Stefan; Clarke, Christopher; Leach, Kevin; Kummerfeld, Jonathan K; Hill, Parker; Hauswald, Johann; Laurenzano, Michael A; Tang, Lingjia; Mars, Jason; others, Systems and methods for constructing an artificially diverse corpus of training data samples for training a contextually-biased model for a machine learning-based dialogue system Miscellaneous 2020, (US Patent 10,796,104). @misc{lee2020systems, title = {Systems and methods for constructing an artificially diverse corpus of training data samples for training a contextually-biased model for a machine learning-based dialogue system}, author = {Andrew Lee and Stefan Larson and Christopher Clarke and Kevin Leach and Jonathan K Kummerfeld and Parker Hill and Johann Hauswald and Michael A Laurenzano and Lingjia Tang and Jason Mars and others}, url = {https://www.jasonmars.org/wp-content/uploads/2020/12/US10796104.pdf}, year = {2020}, date = {2020-01-01}, abstract = {Systems and methods for constructing an artificially diverse corpus of training data includes evaluating a corpus of utterance-based training data samples, identifying a slot replacement candidate; deriving distinct skeleton utterances that include the slot replacement candidate, wherein deriving the distinct skeleton utterances includes replacing slots of each of the plurality of distinct utterance training samples with one of a special token and proper slot classification labels; selecting a subset of the distinct skeleton utterances; converting each of the distinct skeleton utterances of the subset back to distinct utterance training samples while still maintaining the special token at a position of the slot replacement candidate; altering a percentage of the distinct utterance training samples with a distinct randomly-generated slot token value at the position of the slot replacement candidate; and constructing the artificially diverse corpus of training samples based on a collection of the percentage of the distinct utterance training samples.}, note = {US Patent 10,796,104}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Systems and methods for constructing an artificially diverse corpus of training data includes evaluating a corpus of utterance-based training data samples, identifying a slot replacement candidate; deriving distinct skeleton utterances that include the slot replacement candidate, wherein deriving the distinct skeleton utterances includes replacing slots of each of the plurality of distinct utterance training samples with one of a special token and proper slot classification labels; selecting a subset of the distinct skeleton utterances; converting each of the distinct skeleton utterances of the subset back to distinct utterance training samples while still maintaining the special token at a position of the slot replacement candidate; altering a percentage of the distinct utterance training samples with a distinct randomly-generated slot token value at the position of the slot replacement candidate; and constructing the artificially diverse corpus of training samples based on a collection of the percentage of the distinct utterance training samples. |
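A small sketch of the skeleton-and-refill procedure, with hypothetical slot labels and a made-up special token; the fraction of samples that receive randomly generated slot values is likewise illustrative:

```python
import random
import string

SLOT_TOKEN = "<SLOT>"   # special token standing in for the replacement candidate

def skeletonize(utterance, slots, candidate):
    """Replace each slot span with its slot classification label, except
    the slot replacement candidate, which gets the special token. `slots`
    maps slot text to a label; names here are illustrative."""
    for text, label in slots.items():
        token = SLOT_TOKEN if label == candidate else f"[{label}]"
        utterance = utterance.replace(text, token)
    return utterance

def diversify(skeleton, p_random=0.4, rng=random):
    """Convert a skeleton back to a training sample, filling the special
    token with a randomly generated value for a percentage of samples."""
    value = "".join(rng.choices(string.ascii_lowercase, k=6))
    return skeleton.replace(SLOT_TOKEN, value if rng.random() < p_random else "acme")

random.seed(4)
skel = skeletonize("pay my acme bill on friday",
                   {"acme": "payee", "friday": "date"}, candidate="payee")
print(skel)                                   # pay my <SLOT> bill on [date]
print([diversify(skel) for _ in range(3)])
```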
2019
76. | Kang, Yiping; Zhang, Yunqi; Kummerfeld, Jonathan K; Hill, Parker; Hauswald, Johann; Laurenzano, Michael A; Tang, Lingjia; Mars, Jason Systems and methods for intelligently curating machine learning training data and improving machine learning model performance Miscellaneous 2019, (US Patent 10,303,978). @misc{kang2019systems, title = {Systems and methods for intelligently curating machine learning training data and improving machine learning model performance}, author = {Yiping Kang and Yunqi Zhang and Jonathan K Kummerfeld and Parker Hill and Johann Hauswald and Michael A Laurenzano and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/US20190294925A1.pdf}, year = {2019}, date = {2019-05-01}, abstract = {Systems and methods of intelligent formation and acquisition of machine learning training data for implementing an artificially intelligent dialogue system includes constructing a corpora of machine learning test corpus that comprise a plurality of historical queries and commands sampled from production logs of a deployed dialogue system; configuring training data sourcing parameters to source a corpora of raw machine learning training data from remote sources of machine learning training data; calculating efficacy metrics of the corpora of raw machine learning training data, wherein calculating the efficacy metrics includes calculating one or more of a coverage metric value and a diversity metric value of the corpora of raw machine learning training data; using the corpora of raw machine learning training data to train the at least one machine learning classifier if the calculated coverage metric value of the corpora of machine learning training data satisfies a minimum coverage metric threshold.}, note = {US Patent 10,303,978}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Systems and methods of intelligent formation and acquisition of machine learning training data for implementing an artificially intelligent dialogue system includes constructing a corpora of machine learning test corpus that comprise a plurality of historical queries and commands sampled from production logs of a deployed dialogue system; configuring training data sourcing parameters to source a corpora of raw machine learning training data from remote sources of machine learning training data; calculating efficacy metrics of the corpora of raw machine learning training data, wherein calculating the efficacy metrics includes calculating one or more of a coverage metric value and a diversity metric value of the corpora of raw machine learning training data; using the corpora of raw machine learning training data to train the at least one machine learning classifier if the calculated coverage metric value of the corpora of machine learning training data satisfies a minimum coverage metric threshold. |
75. | Kannan, Ram Srivatsa; Subramanian, Lavanya; Raju, Ashwin; Ahn, Jeongseob; Mars, Jason; Tang, Lingjia GrandSLAm: Guaranteeing SLAs for jobs in microservices execution frameworks Inproceedings Proceedings of the Fourteenth EuroSys Conference 2019, pp. 1–16, 2019. @inproceedings{kannan2019grandslam, title = {GrandSLAm: Guaranteeing SLAs for jobs in microservices execution frameworks}, author = {Ram Srivatsa Kannan and Lavanya Subramanian and Ashwin Raju and Jeongseob Ahn and Jason Mars and Lingjia Tang}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/3302424.3303958.pdf}, year = {2019}, date = {2019-01-01}, booktitle = {Proceedings of the Fourteenth EuroSys Conference 2019}, pages = {1--16}, abstract = {The microservice architecture has dramatically reduced user effort in adopting and maintaining servers by providing a catalog of functions as services that can be used as building blocks to construct applications. This has enabled datacenter operators to look at managing datacenter hosting microservices quite differently from traditional infrastructures. Such a paradigm shift calls for a need to rethink resource management strategies employed in such execution environments. We observe that the visibility enabled by a microservices execution framework can be exploited to achieve high throughput and resource utilization while still meeting Service Level Agreements, especially in multi-tenant execution scenarios. In this study, we present GrandSLAm, a microservice execution framework that improves utilization of datacenters hosting microservices. GrandSLAm estimates time of completion of requests propagating through individual microservice stages within an application. It then leverages this estimate to drive a runtime system that dynamically batches and reorders requests at each microservice in a manner where individual jobs meet their respective target latency while achieving high throughput. GrandSLAm significantly increases throughput by up to 3x compared to our baseline, without violating SLAs for a wide range of real-world AI and ML applications.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The microservice architecture has dramatically reduced user effort in adopting and maintaining servers by providing a catalog of functions as services that can be used as building blocks to construct applications. This has enabled datacenter operators to look at managing datacenter hosting microservices quite differently from traditional infrastructures. Such a paradigm shift calls for a need to rethink resource management strategies employed in such execution environments. We observe that the visibility enabled by a microservices execution framework can be exploited to achieve high throughput and resource utilization while still meeting Service Level Agreements, especially in multi-tenant execution scenarios. In this study, we present GrandSLAm, a microservice execution framework that improves utilization of datacenters hosting microservices. GrandSLAm estimates time of completion of requests propagating through individual microservice stages within an application. It then leverages this estimate to drive a runtime system that dynamically batches and reorders requests at each microservice in a manner where individual jobs meet their respective target latency while achieving high throughput. GrandSLAm significantly increases throughput by up to 3x compared to our baseline, without violating SLAs for a wide range of real-world AI and ML applications. |
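The core scheduling idea, estimating remaining per-stage time and reordering requests accordingly, can be sketched in a few lines; the stage names, latency estimates, and earliest-slack-first policy below are an illustration, not GrandSLAm's exact algorithm:

```python
def reorder(requests, stage_estimates):
    """Order pending requests by slack: deadline minus the estimated time
    still needed in the remaining microservice stages. Each request is
    (id, deadline_ms, remaining_stages); all values are invented."""
    def slack(req):
        rid, deadline_ms, stages = req
        return deadline_ms - sum(stage_estimates[s] for s in stages)
    return [rid for rid, _, _ in sorted(requests, key=slack)]

estimates = {"asr": 40, "nlu": 15, "tts": 30}   # per-stage latency estimates (ms)
pending = [("a", 120, ["asr", "nlu", "tts"]),   # slack 120 - 85 = 35
           ("b", 60,  ["nlu", "tts"]),          # slack 60 - 45  = 15
           ("c", 90,  ["tts"])]                 # slack 90 - 30  = 60
print(reorder(pending, estimates))              # ['b', 'a', 'c']
```

Requests with the least slack run first, which is how a scheduler can batch aggressively while still keeping each job inside its target latency.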
74. | Arora, Manish; Skach, Matt; Huang, Wei; An, Xudong; Mars, Jason; Tang, Lingjia; Tullsen, Dean M Understanding the Impact of Socket Density in Density Optimized Servers Inproceedings 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 687–700, IEEE 2019. @inproceedings{arora2019understanding, title = {Understanding the Impact of Socket Density in Density Optimized Servers}, author = {Manish Arora and Matt Skach and Wei Huang and Xudong An and Jason Mars and Lingjia Tang and Dean M Tullsen}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/08675196.pdf}, year = {2019}, date = {2019-01-01}, booktitle = {2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)}, pages = {687--700}, organization = {IEEE}, abstract = {The increasing demand for computational power has led to the creation and deployment of large-scale data centers. During the last few years, data centers have seen improvements aimed at increasing computational density - the amount of throughput that can be achieved within the allocated physical footprint. This need to pack more compute in the same physical space has led to density optimized server designs. Density optimized servers push compute density significantly beyond what can be achieved by blade servers by using innovative modular chassis based designs. This paper presents a comprehensive analysis of the impact of socket density on intra-server thermals and demonstrates that increased socket density inside the server leads to large temperature variations among sockets due to inter-socket thermal coupling. The paper shows that traditional chip-level and data center-level temperature-aware scheduling techniques do not work well for thermally-coupled sockets. The paper proposes new scheduling techniques that account for the thermals of the socket a task is scheduled on, as well as thermally coupled nearby sockets. The proposed mechanisms provide 2.5% to 6.5% performance improvements across various workloads and as much as 17% over traditional temperature-aware schedulers for computation-heavy workloads.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The increasing demand for computational power has led to the creation and deployment of large-scale data centers. During the last few years, data centers have seen improvements aimed at increasing computational density - the amount of throughput that can be achieved within the allocated physical footprint. This need to pack more compute in the same physical space has led to density optimized server designs. Density optimized servers push compute density significantly beyond what can be achieved by blade servers by using innovative modular chassis based designs. This paper presents a comprehensive analysis of the impact of socket density on intra-server thermals and demonstrates that increased socket density inside the server leads to large temperature variations among sockets due to inter-socket thermal coupling. The paper shows that traditional chip-level and data center-level temperature-aware scheduling techniques do not work well for thermally-coupled sockets. The paper proposes new scheduling techniques that account for the thermals of the socket a task is scheduled on, as well as thermally coupled nearby sockets. The proposed mechanisms provide 2.5% to 6.5% performance improvements across various workloads and as much as 17% over traditional temperature-aware schedulers for computation-heavy workloads. |
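A minimal sketch of a coupling-aware placement decision in the spirit of the proposed schedulers; the coupling coefficient and adjacency map are invented, whereas a real system would derive them from measured chassis thermals:

```python
def pick_socket(temps, neighbors, coupling=0.3):
    """Choose the socket for the next task by scoring each socket's own
    temperature plus a weighted contribution from thermally coupled
    neighbors; lower score is better."""
    def score(s):
        return temps[s] + coupling * sum(temps[n] for n in neighbors[s])
    return min(temps, key=score)

temps = {0: 55.0, 1: 48.0, 2: 47.0, 3: 70.0}        # socket temperatures (C)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # chassis adjacency
print("schedule on socket", pick_socket(temps, neighbors))
```

Note that socket 0 wins even though socket 2 is cooler, because socket 2 sits next to the 70 C hot spot; that neighbor awareness is exactly what traditional temperature-aware schedulers miss.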
73. | Larson, Stefan; Mahendran, Anish; Lee, Andrew; Kummerfeld, Jonathan K; Hill, Parker; Laurenzano, Michael A; Hauswald, Johann; Tang, Lingjia; Mars, Jason Outlier Detection for Improved Data Quality and Diversity in Dialog Systems Journal Article Proceedings of NAACL-HLT 2019, pp. 517–527, 2019. @article{larson2019outlier, title = {Outlier Detection for Improved Data Quality and Diversity in Dialog Systems}, author = {Stefan Larson and Anish Mahendran and Andrew Lee and Jonathan K Kummerfeld and Parker Hill and Michael A Laurenzano and Johann Hauswald and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/N19-1051.pdf}, year = {2019}, date = {2019-01-01}, journal = {Proceedings of NAACL-HLT 2019}, pages = {517–527}, abstract = {In a corpus of data, outliers are either errors: mistakes in the data that are counterproductive, or are unique: informative samples that improve model robustness. Identifying outliers can lead to better datasets by (1) removing noise in datasets and (2) guiding collection of additional data to fill gaps. However, the problem of detecting both outlier types has received relatively little attention in NLP, particularly for dialog systems. We introduce a simple and effective technique for detecting both erroneous and unique samples in a corpus of short texts using neural sentence embeddings combined with distance-based outlier detection. We also present a novel data collection pipeline built atop our detection technique to automatically and iteratively mine unique data samples while discarding erroneous samples. Experiments show that our outlier detection technique is effective at finding errors while our data collection pipeline yields highly diverse corpora that in turn produce more robust intent classification and slot-filling models.}, keywords = {}, pubstate = {published}, tppubtype = {article} } In a corpus of data, outliers are either errors: mistakes in the data that are counterproductive, or are unique: informative samples that improve model robustness. Identifying outliers can lead to better datasets by (1) removing noise in datasets and (2) guiding collection of additional data to fill gaps. However, the problem of detecting both outlier types has received relatively little attention in NLP, particularly for dialog systems. We introduce a simple and effective technique for detecting both erroneous and unique samples in a corpus of short texts using neural sentence embeddings combined with distance-based outlier detection. We also present a novel data collection pipeline built atop our detection technique to automatically and iteratively mine unique data samples while discarding erroneous samples. Experiments show that our outlier detection technique is effective at finding errors while our data collection pipeline yields highly diverse corpora that in turn produce more robust intent classification and slot-filling models. |
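A compact sketch of the detection side, with random vectors standing in for neural sentence embeddings; ranking by distance from the mean embedding is one of the distance-based schemes in this space, and `k` is an arbitrary review budget:

```python
import numpy as np

def rank_outliers(embeddings, texts, k=3):
    """Rank samples by distance from the mean embedding; the farthest
    samples are the outlier candidates (errors or unique, informative
    samples) to surface for review."""
    dists = np.linalg.norm(embeddings - embeddings.mean(axis=0), axis=1)
    order = np.argsort(dists)[::-1]   # most distant first
    return [(texts[i], float(dists[i])) for i in order[:k]]

rng = np.random.default_rng(5)
emb = rng.normal(size=(30, 8))
emb[7] += 4.0                         # plant one outlier
texts = [f"query {i}" for i in range(30)]
for text, d in rank_outliers(emb, texts):
    print(f"{text}: {d:.2f}")
```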
72. | Tang, Lingjia; Mars, Jason; Hundt, Robert System and methods for sharing memory subsystem resources among datacenter applications Miscellaneous 2019, (US Patent 10,313,265). @misc{tang2019system, title = {System and methods for sharing memory subsystem resources among datacenter applications}, author = {Lingjia Tang and Jason Mars and Robert Hundt}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/US9401869.pdf}, year = {2019}, date = {2019-01-01}, abstract = {Systems and methods for mapping applications onto system resource of a computing platform are discussed. The computing platform may receive, using control circuitry, a request to run a plurality of applications on a computing platform having a plurality of system resources. The computing platform may determine a plurality of mapping configurations for the plurality of applications onto the plurality of system resources. The computing platform may execute the plurality of applications with each of the plurality of mapping configurations. The computing platform may determine at least one performance metric based on the executed plurality of applications for each of the plurality of mapping configurations. The computing platform may select a selected mapping configuration among the plurality of mapping configurations based on at least one determined performance metric.}, note = {US Patent 10,313,265}, keywords = {}, pubstate = {published}, tppubtype = {misc} } Systems and methods for mapping applications onto system resource of a computing platform are discussed. The computing platform may receive, using control circuitry, a request to run a plurality of applications on a computing platform having a plurality of system resources. The computing platform may determine a plurality of mapping configurations for the plurality of applications onto the plurality of system resources. The computing platform may execute the plurality of applications with each of the plurality of mapping configurations. The computing platform may determine at least one performance metric based on the executed plurality of applications for each of the plurality of mapping configurations. The computing platform may select a selected mapping configuration among the plurality of mapping configurations based on at least one determined performance metric. |
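The select-by-measurement loop in this abstract reduces to a small brute-force search; `measure` below is a random stand-in for actually executing the applications and collecting a performance metric:

```python
import itertools
import random

def best_mapping(apps, resources, measure):
    """Enumerate mapping configurations of applications onto system
    resources, measure each, and keep the configuration with the best
    performance metric."""
    best, best_score = None, float("-inf")
    for perm in itertools.permutations(resources, len(apps)):
        mapping = dict(zip(apps, perm))
        score = measure(mapping)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score

random.seed(6)
fake_metric = lambda mapping: random.random()   # stand-in performance metric
mapping, score = best_mapping(["web", "batch"], ["core0", "core1", "core2"], fake_metric)
print(mapping, round(score, 3))
```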
71. | Kannan, Ram Srivatsa; Laurenzano, Michael; Ahn, Jeongseob; Mars, Jason; Tang, Lingjia Caliper: Interference estimator for multi-tenant environments sharing architectural resources Journal Article ACM Transactions on Architecture and Code Optimization (TACO), 16 (3), pp. 1–25, 2019. @article{kannan2019caliper, title = {Caliper: Interference estimator for multi-tenant environments sharing architectural resources}, author = {Ram Srivatsa Kannan and Michael Laurenzano and Jeongseob Ahn and Jason Mars and Lingjia Tang}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/3323090.pdf}, year = {2019}, date = {2019-01-01}, journal = {ACM Transactions on Architecture and Code Optimization (TACO)}, volume = {16}, number = {3}, pages = {1--25}, publisher = {ACM New York, NY, USA}, abstract = {We introduce Caliper, a technique for accurately estimating performance interference occurring in shared servers. Caliper overcomes the limitations of prior approaches by leveraging a micro-experiment-based technique. In contrast to state-of-the-art approaches that focus on periodically pausing co-running applications to estimate slowdown, Caliper utilizes a strategic phase-triggered technique to capture interference due to co-location. This enables Caliper to orchestrate an accurate and low-overhead interference estimation technique that can be readily deployed in existing production systems. We evaluate Caliper for a broad spectrum of workload scenarios, demonstrating its ability to seamlessly support up to 16 applications running simultaneously and outperform the state-of-the-art approaches.}, keywords = {}, pubstate = {published}, tppubtype = {article} } We introduce Caliper, a technique for accurately estimating performance interference occurring in shared servers. Caliper overcomes the limitations of prior approaches by leveraging a micro-experiment-based technique. In contrast to state-of-the-art approaches that focus on periodically pausing co-running applications to estimate slowdown, Caliper utilizes a strategic phase-triggered technique to capture interference due to co-location. This enables Caliper to orchestrate an accurate and low-overhead interference estimation technique that can be readily deployed in existing production systems. We evaluate Caliper for a broad spectrum of workload scenarios, demonstrating its ability to seamlessly support up to 16 applications running simultaneously and outperform the state-of-the-art approaches. |
70. | Larson, Stefan; Mahendran, Anish; Peper, Joseph J; Clarke, Christopher; Lee, Andrew; Hill, Parker; Kummerfeld, Jonathan K; Leach, Kevin; Laurenzano, Michael A; Tang, Lingjia; Mars, Jason An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction Journal Article Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, pp. 1311–1316, 2019. @article{larson2019evaluation, title = {An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction}, author = {Stefan Larson and Anish Mahendran and Joseph J Peper and Christopher Clarke and Andrew Lee and Parker Hill and Jonathan K Kummerfeld and Kevin Leach and Michael A Laurenzano and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/D19-1131.pdf}, year = {2019}, date = {2019-01-01}, journal = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing}, pages = {1311–1316}, abstract = {Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope---i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an important gap in the field, offering a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope---i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an important gap in the field, offering a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems. |
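One simple out-of-scope identification scheme in the family this dataset is designed to evaluate is confidence thresholding on the classifier's output distribution; the labels and the threshold value below are illustrative:

```python
import numpy as np

def classify_with_oos(probs, labels, threshold=0.6):
    """Predict the top in-scope intent only when the classifier is
    confident enough; otherwise label the query out-of-scope."""
    top = int(np.argmax(probs))
    return labels[top] if probs[top] >= threshold else "out_of_scope"

labels = ["transfer", "balance", "travel_alert"]
print(classify_with_oos(np.array([0.85, 0.10, 0.05]), labels))  # transfer
print(classify_with_oos(np.array([0.40, 0.35, 0.25]), labels))  # out_of_scope
```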
2018
69. | Hill, Parker; Zamirai, Babak; Lu, Shengshuo; Chao, Yu-Wei; Laurenzano, Michael; Samadi, Mehrzad; Papaefthymiou, Marios; Mahlke, Scott; Wenisch, Thomas; Deng, Jia; Tang, Lingjia; Mars, Jason Rethinking numerical representations for deep neural networks Journal Article arXiv preprint arXiv:1808.02513, 2018. @article{hill2018rethinking, title = {Rethinking numerical representations for deep neural networks}, author = {Parker Hill and Babak Zamirai and Shengshuo Lu and Yu-Wei Chao and Michael Laurenzano and Mehrzad Samadi and Marios Papaefthymiou and Scott Mahlke and Thomas Wenisch and Jia Deng and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/1808.02513.pdf}, year = {2018}, date = {2018-01-01}, journal = {arXiv preprint arXiv:1808.02513}, abstract = {With ever-increasing computational demand for deep learning, it is critical to investigate the implications of the numeric representation and precision of DNN model weights and activations on computational efficiency. In this work, we explore unconventional narrow-precision floating-point representations as they relate to inference accuracy and efficiency to steer the improved design of future DNN platforms. We show that inference using these custom numeric representations on production-grade DNNs, including GoogLeNet and VGG, achieves an average speedup of 7.6x with less than 1% degradation in inference accuracy relative to a state-of-the-art baseline platform representing the most sophisticated hardware using single-precision floating point. To facilitate the use of such customized precision, we also present a novel technique that drastically reduces the time required to derive the optimal precision configuration.}, keywords = {}, pubstate = {published}, tppubtype = {article} } With ever-increasing computational demand for deep learning, it is critical to investigate the implications of the numeric representation and precision of DNN model weights and activations on computational efficiency. In this work, we explore unconventional narrow-precision floating-point representations as they relate to inference accuracy and efficiency to steer the improved design of future DNN platforms. We show that inference using these custom numeric representations on production-grade DNNs, including GoogLeNet and VGG, achieves an average speedup of 7.6x with less than 1% degradation in inference accuracy relative to a state-of-the-art baseline platform representing the most sophisticated hardware using single-precision floating point. To facilitate the use of such customized precision, we also present a novel technique that drastically reduces the time required to derive the optimal precision configuration. |
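A software emulation of one point in this design space: rounding values to a custom narrow floating-point format with chosen mantissa and exponent widths. The round-to-nearest mantissa, symmetric exponent clamp, and lack of subnormal handling are simplifications for illustration, not the paper's exact quantizer:

```python
import math

def quantize(x, mantissa_bits, exponent_bits):
    """Round a float to a custom narrow floating-point format."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << mantissa_bits
    m = round(m * scale) / scale         # keep mantissa_bits of fraction
    e_max = 1 << (exponent_bits - 1)     # clamp exponent to representable range
    e = max(-e_max, min(e_max, e))
    return math.ldexp(m, e)

for bits in [(23, 8), (8, 5), (4, 5)]:   # float32-like and two narrow formats
    print(bits, quantize(math.pi, *bits))
```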
68. | Lin, Shih-Chieh; Zhang, Yunqi; Hsu, Chang-Hong; Skach, Matt; Haque, Md E; Tang, Lingjia; Mars, Jason The architectural implications of autonomous driving: Constraints and acceleration Inproceedings Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 751–766, 2018. @inproceedings{lin2018architectural, title = {The architectural implications of autonomous driving: Constraints and acceleration}, author = {Shih-Chieh Lin and Yunqi Zhang and Chang-Hong Hsu and Matt Skach and Md E Haque and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/AutonomousCar-ASPLOS18.pdf}, year = {2018}, date = {2018-01-01}, booktitle = {Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems}, pages = {751--766}, abstract = {Autonomous driving systems have attracted a significant amount of interest recently, and many industry leaders, such as Google, Uber, Tesla, and Mobileye, have invested a large amount of capital and engineering power on developing such systems. Building autonomous driving systems is particularly challenging due to stringent performance requirements in terms of both making the safe operational decisions and finishing processing at real-time. Despite the recent advancements in technology, such systems are still largely under experimentation and architecting end-to-end autonomous driving systems remains an open research question. To investigate this question, we first present and formalize the design constraints for building an autonomous driving system in terms of performance, predictability, storage, thermal and power. We then build an end-to-end autonomous driving system using state-of-the-art award-winning algorithms to understand the design trade-offs for building such systems. In our real-system characterization, we identify three computational bottlenecks, which conventional multicore CPUs are incapable of processing under the identified design constraints. To meet these constraints, we accelerate these algorithms using three accelerator platforms including GPUs, FPGAs, and ASICs, which can reduce the tail latency of the system by 169x, 10x, and 93x respectively. With accelerator-based designs, we are able to build an end-to-end autonomous driving system that meets all the design constraints, and explore the trade-offs among performance, power and the higher accuracy enabled by higher resolution cameras.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Autonomous driving systems have attracted a significant amount of interest recently, and many industry leaders, such as Google, Uber, Tesla, and Mobileye, have invested a large amount of capital and engineering power on developing such systems. Building autonomous driving systems is particularly challenging due to stringent performance requirements in terms of both making the safe operational decisions and finishing processing at real-time. Despite the recent advancements in technology, such systems are still largely under experimentation and architecting end-to-end autonomous driving systems remains an open research question. To investigate this question, we first present and formalize the design constraints for building an autonomous driving system in terms of performance, predictability, storage, thermal and power. 
We then build an end-to-end autonomous driving system using state-of-the-art award-winning algorithms to understand the design trade-offs for building such systems. In our real-system characterization, we identify three computational bottlenecks, which conventional multicore CPUs are incapable of processing under the identified design constraints. To meet these constraints, we accelerate these algorithms using three accelerator platforms including GPUs, FPGAs, and ASICs, which can reduce the tail latency of the system by 169x, 10x, and 93x respectively. With accelerator-based designs, we are able to build an end-to-end autonomous driving system that meets all the design constraints, and explore the trade-offs among performance, power and the higher accuracy enabled by higher resolution cameras. |
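The latency constraint can be checked with a simple tail-percentile computation; the synthetic per-stage latencies and the 100 ms budget below are hypothetical placeholders, not the paper's measured numbers or derived constraint:

```python
import numpy as np

def tail_latency(samples_ms, percentile=99.99):
    """Tail latency of the end-to-end pipeline, the kind of performance
    constraint the paper formalizes."""
    return float(np.percentile(samples_ms, percentile))

rng = np.random.default_rng(7)
# Hypothetical per-frame latencies (ms) for three pipeline stages, e.g.
# detection, tracking, and localization; end-to-end latency is their sum.
stages = rng.gamma(shape=4.0, scale=5.0, size=(10000, 3))
end_to_end = stages.sum(axis=1)
budget_ms = 100.0
p = tail_latency(end_to_end)
print(f"p99.99 = {p:.1f} ms ->", "meets budget" if p <= budget_ms else "violates budget")
```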
67. | Hsu, Chang-Hong; Deng, Qingyuan; Mars, Jason; Tang, Lingjia Smoothoperator: Reducing power fragmentation and improving power utilization in large-scale datacenters Inproceedings Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 535–548, 2018. @inproceedings{hsu2018smoothoperator, title = {Smoothoperator: Reducing power fragmentation and improving power utilization in large-scale datacenters}, author = {Chang-Hong Hsu and Qingyuan Deng and Jason Mars and Lingjia Tang}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/smooth_operator.pdf}, year = {2018}, date = {2018-01-01}, booktitle = {Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems}, pages = {535--548}, abstract = {With the ever growing popularity of cloud computing and web services, Internet companies are in need of increased computing capacity to serve the demand. However, power has become a major limiting factor prohibiting the growth in industry: it is often the case that no more servers can be added to datacenters without surpassing the capacity of the existing power infrastructure. In this work, we first investigate the power utilization in Facebook datacenters. We observe that the combination of provisioning for peak power usage, highly fluctuating traffic, and multi-level power delivery infrastructure leads to significant power budget fragmentation problem and inefficiently low power utilization. To address this issue, our insight is that heterogeneity of power consumption patterns among different services provides opportunities to re-shape the power profile of each power node by re-distributing services. By grouping services with asynchronous peak times under the same power node, we can reduce the peak power of each node and thus creating more power head-rooms to allow more servers hosted, achieving higher throughput. Based on this insight, we develop a workload-aware service placement framework to systematically spread the service instances with synchronous power patterns evenly under the power supply tree, greatly reducing the peak power draw at power nodes. We then leverage dynamic power profile reshaping to maximally utilize the headroom unlocked by our placement framework. Our experiments based on real production workload and power traces show that we are able to host up to 13% more machines in production, without changing the underlying power infrastructure. Utilizing the unleashed power headroom with dynamic reshaping, we achieve up to an estimated total of 15% and 11% throughput improvement for latency-critical service and batch service respectively at the same time, with up to 44% of energy slack reduction.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } With the ever growing popularity of cloud computing and web services, Internet companies are in need of increased computing capacity to serve the demand. However, power has become a major limiting factor prohibiting the growth in industry: it is often the case that no more servers can be added to datacenters without surpassing the capacity of the existing power infrastructure. In this work, we first investigate the power utilization in Facebook datacenters. 
We observe that the combination of provisioning for peak power usage, highly fluctuating traffic, and multi-level power delivery infrastructure leads to a significant power budget fragmentation problem and inefficiently low power utilization. To address this issue, our insight is that heterogeneity of power consumption patterns among different services provides opportunities to re-shape the power profile of each power node by re-distributing services. By grouping services with asynchronous peak times under the same power node, we can reduce the peak power of each node, creating more power headroom to allow more servers to be hosted and achieve higher throughput. Based on this insight, we develop a workload-aware service placement framework to systematically spread the service instances with synchronous power patterns evenly under the power supply tree, greatly reducing the peak power draw at power nodes. We then leverage dynamic power profile reshaping to maximally utilize the headroom unlocked by our placement framework. Our experiments based on real production workload and power traces show that we are able to host up to 13% more machines in production, without changing the underlying power infrastructure. Utilizing the unleashed power headroom with dynamic reshaping, we achieve up to an estimated total of 15% and 11% throughput improvement for latency-critical services and batch services, respectively, at the same time, with up to 44% of energy slack reduction. |
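The benefit of grouping services with asynchronous peaks falls out of simple arithmetic on power traces, as this toy example shows; the diurnal curves are synthetic stand-ins for real production traces:

```python
import numpy as np

def node_peak(power_traces):
    """Peak of the aggregate power drawn under one power node."""
    return float(np.sum(power_traces, axis=0).max())

# Hypothetical hourly power traces (kW) for two service instances whose
# peaks are asynchronous: a user-facing service peaking at midday and a
# batch service peaking overnight.
t = np.arange(24)
web = 3.0 + 2.0 * np.sin((t - 6) * np.pi / 12).clip(min=0)
batch = 3.0 + 2.0 * np.sin((t - 18) * np.pi / 12).clip(min=0)

print("synchronous pair peak:", node_peak([web, web]))     # 10.0 kW
print("asynchronous pair peak:", node_peak([web, batch]))  # lower peak, more headroom
```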
66. | Jain, Animesh; Phanishayee, Amar; Mars, Jason; Tang, Lingjia; Pekhimenko, Gennady Gist: Efficient data encoding for deep neural network training Inproceedings 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 776–789, IEEE 2018. @inproceedings{jain2018gist, title = {Gist: Efficient data encoding for deep neural network training}, author = {Animesh Jain and Amar Phanishayee and Jason Mars and Lingjia Tang and Gennady Pekhimenko}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/08416872.pdf}, year = {2018}, date = {2018-01-01}, booktitle = {2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)}, pages = {776--789}, organization = {IEEE}, abstract = {Training modern deep neural networks (DNNs) typically relies on GPUs to train complex networks with hundreds of layers. A significant problem facing both researchers and industry practitioners is that, as networks get deeper, available GPU main memory becomes a primary bottleneck, limiting the size of the networks they can train. In this paper, we investigate widely used DNNs and find that the major contributors to memory footprint are intermediate layer outputs (feature maps). We then introduce a framework for DNN-layer-specific optimizations (e.g., convolution, ReLU, pool) that significantly reduces this source of main memory pressure on GPUs. We find that a feature map typically has two uses that are spread far apart temporally. Our key approach is to store an encoded representation of feature maps during this temporal gap and decode the data for use in the backward pass; the full-fidelity feature maps are used in the forward pass and relinquished immediately. Based on this approach, we present Gist, a system that employs two classes of layer-specific encoding schemes, lossless and lossy, to exploit existing value redundancy in DNN training and significantly reduce the memory consumption of targeted feature maps. For example, one insight is that, by taking advantage of the computational nature of back propagation from the pool layer to the ReLU layer, we can store the intermediate feature map using just 1 bit instead of 32 bits per value. We deploy these mechanisms in a state-of-the-art DNN framework (CNTK) and observe that Gist reduces the memory footprint by up to 2× across 5 state-of-the-art image classification DNNs, with an average of 1.8× and only 4% performance overhead. We also show that further software (e.g., cuDNN) and hardware (e.g., dynamic allocation) optimizations can result in even larger footprint reductions (up to 4.1×).}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Training modern deep neural networks (DNNs) typically relies on GPUs to train complex networks with hundreds of layers. A significant problem facing both researchers and industry practitioners is that, as networks get deeper, available GPU main memory becomes a primary bottleneck, limiting the size of the networks they can train. In this paper, we investigate widely used DNNs and find that the major contributors to memory footprint are intermediate layer outputs (feature maps). We then introduce a framework for DNN-layer-specific optimizations (e.g., convolution, ReLU, pool) that significantly reduces this source of main memory pressure on GPUs. We find that a feature map typically has two uses that are spread far apart temporally.
Our key approach is to store an encoded representation of feature maps during this temporal gap and decode the data for use in the backward pass; the full-fidelity feature maps are used in the forward pass and relinquished immediately. Based on this approach, we present Gist, a system that employs two classes of layer-specific encoding schemes, lossless and lossy, to exploit existing value redundancy in DNN training and significantly reduce the memory consumption of targeted feature maps. For example, one insight is that, by taking advantage of the computational nature of back propagation from the pool layer to the ReLU layer, we can store the intermediate feature map using just 1 bit instead of 32 bits per value. We deploy these mechanisms in a state-of-the-art DNN framework (CNTK) and observe that Gist reduces the memory footprint by up to 2× across 5 state-of-the-art image classification DNNs, with an average of 1.8× and only 4% performance overhead. We also show that further software (e.g., cuDNN) and hardware (e.g., dynamic allocation) optimizations can result in even larger footprint reductions (up to 4.1×). |
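The 1-bit pool-to-ReLU example in this abstract can be sketched in a few lines. The snippet below is a simplified illustration of that single idea, not Gist's actual CNTK implementation: ReLU's backward pass needs to know only which activations were positive, so the stashed feature map can be a packed bit mask instead of 32-bit floats.

import numpy as np

def relu_forward(x):
    """Forward ReLU; stash a 1-bit mask instead of the 32-bit input."""
    y = np.maximum(x, 0.0)
    mask = np.packbits(x > 0)   # 1 bit per value instead of 32
    return y, mask              # y is consumed by the next layer, then freed

def relu_backward(grad_out, mask):
    """Backward ReLU reconstructed from the 1-bit mask alone."""
    positive = np.unpackbits(mask, count=grad_out.size).astype(bool)
    return np.where(positive.reshape(grad_out.shape), grad_out, 0.0)

x = np.random.randn(4, 8).astype(np.float32)
y, mask = relu_forward(x)                 # mask: 4 bytes instead of 128
grad_in = relu_backward(np.ones_like(y), mask)
assert np.array_equal(grad_in, (x > 0).astype(np.float32))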
65. | Kang, Yiping; Zhang, Yunqi; Kummerfeld, Jonathan K; Tang, Lingjia; Mars, Jason Data collection for dialogue system: A startup perspective Inproceedings Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), pp. 33–40, 2018. @inproceedings{kang2018data, title = {Data collection for dialogue system: A startup perspective}, author = {Yiping Kang and Yunqi Zhang and Jonathan K Kummerfeld and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/N18-3005.pdf}, year = {2018}, date = {2018-01-01}, booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)}, pages = {33--40}, abstract = {Industrial dialogue systems such as Apple Siri and Google Now rely on large-scale, diverse, and robust training data to enable their sophisticated conversation capabilities. Crowdsourcing provides a scalable and inexpensive way to collect data, but collecting high-quality data efficiently requires thoughtful orchestration of the crowdsourcing jobs. Prior studies of this topic have focused only on tasks in academic settings with limited scope, or provide only intrinsic dataset analysis without indicating how data collection choices affect trained model performance. In this paper, we present a study of crowdsourcing methods for a user intent classification task in our deployed dialogue system. Our task requires classification among 47 possible user intents and contains many intent pairs with subtle differences. We consider different crowdsourcing job types and job prompts, and quantitatively analyze both the quality of the collected data and the downstream model performance on a test set of real user queries from production logs. Our observations provide insights into designing efficient crowdsourcing jobs and recommendations for future dialogue system data collection processes.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Industrial dialogue systems such as Apple Siri and Google Now rely on large-scale, diverse, and robust training data to enable their sophisticated conversation capabilities. Crowdsourcing provides a scalable and inexpensive way to collect data, but collecting high-quality data efficiently requires thoughtful orchestration of the crowdsourcing jobs. Prior studies of this topic have focused only on tasks in academic settings with limited scope, or provide only intrinsic dataset analysis without indicating how data collection choices affect trained model performance. In this paper, we present a study of crowdsourcing methods for a user intent classification task in our deployed dialogue system. Our task requires classification among 47 possible user intents and contains many intent pairs with subtle differences. We consider different crowdsourcing job types and job prompts, and quantitatively analyze both the quality of the collected data and the downstream model performance on a test set of real user queries from production logs. Our observations provide insights into designing efficient crowdsourcing jobs and recommendations for future dialogue system data collection processes. |
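The evaluation loop this abstract describes, training on crowdsourced paraphrases and testing on production queries, can be illustrated with a toy pipeline. The sketch below is an assumed setup; the intents, example queries, and bag-of-words classifier are all stand-ins, not the paper's system.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical crowdsourced training data: (paraphrase, intent) pairs
# collected from one crowdsourcing job design.
train = [("transfer money to savings", "transfer"),
         ("send a transfer to my checking account", "transfer"),
         ("what is my balance", "balance"),
         ("how much is my balance right now", "balance")]
# Held-out queries standing in for real production logs (evaluation only).
test = [("please transfer fifty dollars", "transfer"),
        ("current balance please", "balance")]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit([q for q, _ in train], [y for _, y in train])
accuracy = model.score([q for q, _ in test], [y for _, y in test])
print(f"downstream accuracy: {accuracy:.2f}")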
64. | Kannan, Ram Srivatsa; Jain, Animesh; Laurenzano, Michael A; Tang, Lingjia; Mars, Jason Proctor: Detecting and investigating interference in shared datacenters Inproceedings 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 76–86, IEEE 2018. @inproceedings{kannan2018proctor, title = {Proctor: Detecting and investigating interference in shared datacenters}, author = {Ram Srivatsa Kannan and Animesh Jain and Michael A Laurenzano and Lingjia Tang and Jason Mars}, url = {https://www.jasonmars.org/wp-content/uploads/2020/04/08366937.pdf}, year = {2018}, date = {2018-01-01}, booktitle = {2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)}, pages = {76--86}, organization = {IEEE}, abstract = {Cloud-scale datacenter management systems utilize virtualization to provide performance isolation while maximizing the utilization of the underlying hardware infrastructure. However, virtualization does not provide complete performance isolation, as Virtual Machines (VMs) still compete for non-reservable shared resources (such as caches, network, and I/O bandwidth). This becomes highly challenging to address in datacenter environments housing tens of thousands of VMs, and it causes degradation in application performance. Addressing this problem for production datacenters requires a non-intrusive, scalable solution that 1) detects performance intrusion and 2) identifies both the intrusive VMs causing the interference and the resource(s) for which the VMs are competing. To address this problem, this paper introduces Proctor, a real-time, lightweight, and scalable analytics fabric that detects performance-intrusive VMs and identifies the root causes from among the arbitrary VMs running in shared datacenters, across 4 key hardware resources: network, I/O, cache, and CPU. Proctor is based on a robust statistical approach that requires no special profiling phases, standing in stark contrast to a wide body of prior work that assumes application-level information is acquired before execution. By detecting performance degradation and identifying the root-cause VMs and their metrics, Proctor can be utilized to dramatically improve the performance outcomes of applications executing in large-scale datacenters. Our experiments show that when Proctor is deployed in a datacenter housing a mix of I/O-, network-, compute-, and cache-sensitive applications, it effectively pinpoints performance-intrusive VMs. Further, we observe that when Proctor is applied together with migration, application-level Quality-of-Service improves by an average of 2.2× compared to systems that are unable to detect, identify, and pinpoint performance intrusion and its root causes.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Cloud-scale datacenter management systems utilize virtualization to provide performance isolation while maximizing the utilization of the underlying hardware infrastructure. However, virtualization does not provide complete performance isolation, as Virtual Machines (VMs) still compete for non-reservable shared resources (such as caches, network, and I/O bandwidth). This becomes highly challenging to address in datacenter environments housing tens of thousands of VMs, and it causes degradation in application performance.
Addressing this problem for production datacenters requires a non-intrusive, scalable solution that 1) detects performance intrusion and 2) identifies both the intrusive VMs causing the interference and the resource(s) for which the VMs are competing. To address this problem, this paper introduces Proctor, a real-time, lightweight, and scalable analytics fabric that detects performance-intrusive VMs and identifies the root causes from among the arbitrary VMs running in shared datacenters, across 4 key hardware resources: network, I/O, cache, and CPU. Proctor is based on a robust statistical approach that requires no special profiling phases, standing in stark contrast to a wide body of prior work that assumes application-level information is acquired before execution. By detecting performance degradation and identifying the root-cause VMs and their metrics, Proctor can be utilized to dramatically improve the performance outcomes of applications executing in large-scale datacenters. Our experiments show that when Proctor is deployed in a datacenter housing a mix of I/O-, network-, compute-, and cache-sensitive applications, it effectively pinpoints performance-intrusive VMs. Further, we observe that when Proctor is applied together with migration, application-level Quality-of-Service improves by an average of 2.2× compared to systems that are unable to detect, identify, and pinpoint performance intrusion and its root causes. |
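The kind of statistical root-cause analysis the abstract describes can be sketched as follows. This is illustrative only, not Proctor's actual method: it derives a degradation signal from a victim VM's latency, then ranks every other VM's per-resource usage by correlation with that signal to nominate a culprit VM and resource. All names and data are invented.

import numpy as np

RESOURCES = ("network", "io", "cache", "cpu")

def find_culprit(victim_latency, vm_usage):
    """vm_usage maps VM name -> resource name -> usage time series.
    Returns the (vm, resource) pair most correlated with degradation."""
    # Degradation signal: how far latency rises above its median.
    degradation = victim_latency - np.median(victim_latency)
    best, best_score = None, -1.0
    for vm, usage in vm_usage.items():
        for res in RESOURCES:
            score = np.corrcoef(degradation, usage[res])[0, 1]
            if score > best_score:
                best, best_score = (vm, res), score
    return best, best_score

# Toy example: vm2's cache usage spikes exactly when latency spikes.
t = np.arange(100)
spikes = (t % 20 < 5).astype(float)
latency = 10 + 5 * spikes + np.random.rand(100)
vm_usage = {
    "vm1": {r: np.random.rand(100) for r in RESOURCES},
    "vm2": {**{r: np.random.rand(100) for r in RESOURCES},
            "cache": 0.9 * spikes},
}
print(find_culprit(latency, vm_usage))   # likely ("vm2", "cache")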