Temporal difference learning

Temporal difference (TD) learning is a class of model-free reinforcement learning methods that learn by bootstrapping from the current estimate of the value function. These methods sample from the environment, as Monte Carlo methods do, and perform updates based on current estimates, as dynamic programming methods do.[1]
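As an illustration, the simplest member of this family, commonly called TD(0), can be written as a single update rule. The notation below follows common textbook usage and is an assumption here, since the article itself defines no symbols: V is the current estimate of the value function, S_t is the state at time t, R_{t+1} is the reward received after leaving S_t, α is a step size, and γ is a discount factor.

$$
V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]
$$

The bracketed term is the temporal difference error. The target R_{t+1} + γV(S_{t+1}) combines a sampled reward with the current estimate for the next state, which is the sense in which the update both samples and bootstraps.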

Whereas Monte Carlo methods adjust their estimates only once the final outcome is known, TD methods adjust each prediction to match later, more accurate predictions about the future, before the final outcome is known.[2] This form of bootstrapping is illustrated by the following example:

Suppose you want to predict the weather for Saturday, and you have a model that predicts Saturday's weather given the weather on each day of the week. In the standard case, you would wait until Saturday and then adjust all your models. However, when it is, for example, Friday, you should already have a good idea of what the weather will be on Saturday, and so you can change the model for Saturday before Saturday arrives.[2]

TD learning methods are related to the temporal difference model of animal learning.[3][4][5][6][7]

References

[1] Richard Sutton & Andrew Barto (1998). Reinforcement Learning. MIT Press. ISBN 978-0-585-02445-5. Archived from the original on 30 March 2017.
[2] Richard Sutton (1988). "Learning to predict by the methods of temporal differences". Machine Learning. 3 (1): 9–44. doi:10.1007/BF00115009. (A revised version is available on Richard Sutton's publication page, archived 2017-03-30 at the Wayback Machine.)
[3] Schultz, W.; Dayan, P.; Montague, P. R. (1997). "A neural substrate of prediction and reward". Science. 275 (5306): 1593–1599. CiteSeerX 10.1.1.133.6176. doi:10.1126/science.275.5306.1593. PMID 9054347.
[4] Montague, P. R.; Dayan, P.; Sejnowski, T. J. (1996-03-01). "A framework for mesencephalic dopamine systems based on predictive Hebbian learning" (PDF). The Journal of Neuroscience. 16 (5): 1936–1947. doi:10.1523/JNEUROSCI.16-05-01936.1996. ISSN 0270-6474. PMID 8774460. Archived from the original (PDF) on 21 July 2018.
[5] Montague, P. R.; Dayan, P.; Nowlan, S. J.; Pouget, A.; Sejnowski, T. J. (1993). "Using aperiodic reinforcement for directed self-organization" (PDF). Advances in Neural Information Processing Systems. 5: 969–976. Archived from the original (PDF) on 12 March 2006.
[6] Montague, P. R.; Sejnowski, T. J. (1994). "The predictive brain: temporal coincidence and temporal order in synaptic learning mechanisms". Learning & Memory. 1 (1): 1–33. ISSN 1072-0502. PMID 10467583.
[7] Sejnowski, T. J.; Dayan, P.; Montague, P. R. (1995). "Predictive hebbian learning". Proceedings of Eighth ACM Conference on Computational Learning Theory: 15–18. doi:10.1145/230000/225300/p15-sejnowski (inactive 2019-08-20). Archived from the original (PDF) on 13 April 2020.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
